Cannot load saved model from S3 bucket after tensorflow-serving 2.7 #1963

Open
jeongukjae opened this issue Jan 18, 2022 · 21 comments

@jeongukjae

jeongukjae commented Jan 18, 2022

Bug Report

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow Serving installed from (source or binary): Docker image (tensorflow/serving:2.7.0)
  • TensorFlow Serving version: 2.7.0

Describe the problem

A SavedModel cannot be loaded from an S3 bucket since tensorflow-serving 2.7. The server raises an error like the one below:

2022-01-18 02:30:30.032712: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:365] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path s3://some-s3-path..... for servable model with error UNIMPLEMENTED: File system scheme 's3' not implemented (file: 's3://some-s3-path.....')

The Docker image tensorflow/serving:2.6.2 runs without any error in the same configuration.

Exact Steps to Reproduce

docker run --rm -it \
  --env AWS_REGION=some-aws-region \
  --env MODEL_BASE_PATH=some-s3-path \
  tensorflow/serving:2.7.0

Source code / logs

2022-01-18 02:30:30.032712: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:365] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path s3://some-s3-path..... for servable model with error UNIMPLEMENTED: File system scheme 's3' not implemented (file: 's3://some-s3-path.....')

I think this is because of the Modular File System Migration mentioned in the TF 2.7.0 release notes. Is there any way to link tensorflow-io at the build step?

@siddharth-agrawal

I asked a question on Stack Overflow just a few days before this issue was opened: https://stackoverflow.com/questions/70700005/using-s3-bucket-for-savedmodel-with-tensorflow-serving2-7-0-gpu-docker-image. TF support asked me to follow up here, which is what I'm doing.

@jeongukjae
Author

Is there any progress here?

@mihaimaruseac
Contributor

The cloud filesystem implementations moved to https://github.com/tensorflow/io.

Users will have to install https://pypi.org/project/tensorflow-io/ and import it. This has the side effect of loading the plugin for the S3 filesystem, so there will be an implementation.

So, from 2.7 onwards, users need to pip install tensorflow-io and import tensorflow_io, and then the code will work as before.

For any 2.6 patch or earlier release there is nothing to change.
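
For the Python side, a minimal sketch of that workaround (this is a plain Python environment, not TF Serving; the package versions and the s3:// path are placeholders, not taken from this thread):

# Workaround sketch for a plain Python environment.
# Assumption: tensorflow and tensorflow-io versions must be compatible; the bucket path is a placeholder.
pip install tensorflow tensorflow-io
# Importing tensorflow_io registers the s3:// filesystem plugin, so gfile /
# SavedModel loading can read from S3 again (AWS credentials are picked up
# from the usual environment variables).
python -c "import tensorflow as tf, tensorflow_io; print(tf.io.gfile.listdir('s3://my-bucket/my-model/1/'))"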

@jeongukjae
Author

@mihaimaruseac Thanks for your response :)

But I think that only makes sense in a Python environment. To fix this issue, tensorflow-io should probably be linked at the Bazel/C++ level. For now, I have downgraded the TensorFlow Serving Docker image version to load saved models from S3.
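
For reference, the downgrade workaround amounts to pinning the image tag to a 2.6.x release, using the same placeholders as the repro command in the issue description:

# 2.6.x is the last image line that still bundles the built-in S3 filesystem.
docker run --rm -it \
  --env AWS_REGION=some-aws-region \
  --env MODEL_BASE_PATH=some-s3-path \
  tensorflow/serving:2.6.2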

@richieyoum

@mihaimaruseac are you referring to importing tfio in the training script? How would TF Serving be affected by tfio that way?

@haitong

haitong commented May 8, 2022

Any updates? @yongtang you removed S3 support from TensorFlow in this PR: https://github.com/tensorflow/tensorflow/pull/51032/files. Do you know how to build S3 support into TensorFlow Serving? TensorFlow Serving compiles against the TensorFlow source code, so if TensorFlow no longer supports S3, TF Serving will fail to load models from S3 too.

@hsahovic

My team wanted to upgrade some of our production systems to TensorFlow 2.8.x, but this is a blocking issue. @shan3290 is there any update on a way to get S3 support with TensorFlow Serving?

@lumenghe

lumenghe commented Jul 1, 2022

Is there any news? We would also like to use TensorFlow 2.8 with S3. Thanks.

@RakeshRaj97

Any updates on this issue?

@521bibi

521bibi commented Aug 5, 2022

docker run --rm -p 8501:8501 --name tfs-s3 -e AWS_ACCESS_KEY_ID=minioadmin -e AWS_SECRET_ACCESS_KEY=minioadmin -e S3_ENDPOINT=http://127.0.0.1:9000 -e AWS_REGION=us-east-1 --env MODEL_BASE_PATH=S3://models --env MODEL_NAME=half -t tensorflow/serving:latest
my error is:
[screenshot of the error output, which mentions tensorflow-io]

The latest tag points to 2.5.1, right? Why am I also getting a tfio error? What can I do to use S3?

@shan3290
Member

shan3290 commented Aug 8, 2022

Hi, the 'latest' tag points to 2.5.1, but that release is a year old and not the newest one. We now tag by explicit versions. Per the comments above, could you try 2.6.x (e.g. tensorflow/serving:2.6.2)?

For this issue, TF Serving is adding a dependency on TF IO to bring this feature back. Sorry for the breakage; we will let you know once it's fixed.

@fsonntag

This is also a blocking issue for our team; it would be great if this could be fixed soon!

@glynjackson

This still seems to be an issue. Any idea when the removed S3 support will be added back, rather than relying on workarounds?

@mihaimaruseac
Contributor

It won't be added back. The TF team at Google does not maintain the S3 filesystem, so the code in TF would just rot. Having it in SIG IO helps by keeping it up to date and adding new features.

@RakeshRaj97

RakeshRaj97 commented Oct 20, 2022

@mihaimaruseac Can you please add a tutorial on how to use TF IO inside the TF Serving ecosystem? Currently I do the manual work of copying the models into the tf-serving containers and running a docker commit. This is extremely inefficient: I have to maintain heavy, fat Docker images, and it takes a lot of time to push and pull them to my internal Artifactory from my local dev machine. It would be great if I could automate this process by loading the models straight from S3/MinIO buckets. Many of us are facing this problem, and I would like to hear solutions that make this process much more efficient.

@fsonntag

@mihaimaruseac
It makes sense not to maintain the code, but it could be added as a dependency, like TF Decision Forests.

I also tried compiling TF Serving with TF IO as a dependency, but as someone with no Bazel or TF development experience it's not easy, and I couldn't make it work.

@glynjackson

glynjackson commented Oct 20, 2022

That makes total sense @mihaimaruseac thank you, but unless I'm misunderstanding this, why isn’t tensorflow-io a package or dependency inside the container just like any other?

@mihaimaruseac
Contributor

For all these questions, adding @yongtang

The system is similar to the GCS filesystem support, except that one is also distributed as a separate pip package.

@TaylorZowtuk

@yongtang I am also interested in the response to the following question, and in any future plans for including tensorflow-io as a dependency in the serving image. Do you have any comments or updates?

That makes total sense @mihaimaruseac thank you, but unless I'm misunderstanding this, why isn’t tensorflow-io a package or dependency inside the container just like any other?

@jeongukjae
Author

jeongukjae commented Jan 17, 2023

Hi everyone. I tried to build TensorFlow Serving with the S3 filesystem implemented in TensorFlow IO, and it was successful.

Here is the code to build it and the resulting Docker image:

Code: https://github.com/jeongukjae/tf-serving-s3
Docker image: https://github.com/jeongukjae/tf-serving-s3/pkgs/container/tf-serving-s3

The Docker image size is 465 MB, which is a little bigger than the official image (tensorflow/serving:2.11.0 is 459 MB).

Maybe the optimal solution is for the TensorFlow team or the community (SIG IO?) to maintain a TF Serving image with TensorFlow IO, but I hope my solution helps someone who needs it.
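
A hedged usage sketch for that image (the image path follows the package link above; the tag and the placeholder values are assumptions, so check the package page for the tags that actually exist):

# Community-built TF Serving image with the TF IO S3 plugin.
# The tag 2.11.0 is an assumption based on the version mentioned above.
docker run --rm -p 8501:8501 \
  --env AWS_REGION=some-aws-region \
  --env AWS_ACCESS_KEY_ID=... \
  --env AWS_SECRET_ACCESS_KEY=... \
  --env MODEL_BASE_PATH=s3://some-s3-path \
  --env MODEL_NAME=some-model \
  ghcr.io/jeongukjae/tf-serving-s3:2.11.0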

singhniraj08 assigned nniuzft and unassigned shan3290 on Feb 17, 2023
@adriangay

For anyone coming here looking for a solution, @jeongukjae's excellent work above is what worked for us; I cannot thank him enough! We are now serving models in EKS, pulling from AWS S3.

If you try this, be aware that the Bazel build parameters in the Dockerfile may need to be adjusted depending on the resources of the host machine: reduce the default number of parallel jobs and restrict memory and CPU. These values enabled me to build locally on a 2019 MacBook Pro and in our Jenkins-based CI/CD without failures due to OOMs:

    --noshow_progress \
    --jobs=4 \
    --local_ram_resources=HOST_RAM*.5 \
    --local_cpu_resources=HOST_CPUS-2 \
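
For context, a sketch of where flags like these sit in the Bazel invocation; the actual command and target come from the Dockerfile in the linked repository, so treat this purely as an illustration:

    # Illustration only: limiting Bazel's parallelism and resource usage while
    # building the model server (target name as in upstream TF Serving).
    bazel build \
        --noshow_progress \
        --jobs=4 \
        --local_ram_resources=HOST_RAM*.5 \
        --local_cpu_resources=HOST_CPUS-2 \
        //tensorflow_serving/model_servers:tensorflow_model_server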
