Cannot load saved model from S3 bucket after tensorflow-serving 2.7 #1963
Comments
I asked a question on SO just a few days before this issue was opened: https://stackoverflow.com/questions/70700005/using-s3-bucket-for-savedmodel-with-tensorflow-serving2-7-0-gpu-docker-image. TF support asked me to follow up here, which is what I'm doing. |
Is there any progress here? |
The cloud filesystem implementation moved to https://github.com/tensorflow/io. Users will have to install https://pypi.org/project/tensorflow-io/ and import it. This has the side effect of loading the plugin for the S3 filesystem, so there will be an implementation available. So, from 2.7 onwards, users need to pip install tensorflow-io and import tensorflow_io, and then code will work as before. For any 2.6 patch or previous releases there is nothing to change. |
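The step described above can be sketched for a Python process as follows. This is only a minimal sketch: the helper names are made up for illustration, and it applies to Python code that uses TensorFlow, not to the tensorflow_model_server binary inside the serving image.

```python
import importlib.util


def s3_plugin_installed() -> bool:
    """Return True if the tensorflow-io package can be found by the importer."""
    return importlib.util.find_spec("tensorflow_io") is not None


def savedmodel_exists(model_dir: str) -> bool:
    """Check whether saved_model.pb exists under model_dir (for example an s3:// URI).

    Imports are local so that this module can be loaded without TensorFlow installed.
    """
    import tensorflow as tf
    # Imported only for its side effect: it registers the S3 filesystem plugin.
    import tensorflow_io  # noqa: F401

    return tf.io.gfile.exists(model_dir.rstrip("/") + "/saved_model.pb")
```

With TensorFlow 2.7 or later and tensorflow-io installed, calling savedmodel_exists on a path such as s3://my-bucket/models/my_model/1 (a hypothetical bucket) should go through the plugin, provided AWS credentials are available in the environment.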
@mihaimaruseac Thanks for your response :) But I think that only makes sense in a Python environment. To fix this issue, maybe tensorflow-io should be linked at the Bazel or C++ code level. For now, I have downgraded the TensorFlow Serving Docker image version to load the saved model from S3. |
@mihaimaruseac Are you referring to importing tfio in the training script? How would TF Serving be affected by tfio that way? |
Any updates? @yongtang You removed S3 support from TensorFlow in this PR: https://github.com/tensorflow/tensorflow/pull/51032/files Do you know how to build S3 support into TensorFlow Serving? TensorFlow Serving compiles against the TensorFlow source code. If the TensorFlow source code doesn't support S3 anymore, TF Serving will fail to load models from S3 too. |
My team wanted to upgrade some of our production systems to tensorflow |
Is there any news? We would also like to use TensorFlow 2.8 with S3. Thanks. |
Any updates on this issue? |
The latest version is 2.5.1, right? Why is it also a tfio error? What can I do to use S3? |
Hi, the 'latest' tag points to 2.5.1. But that was tagged a year ago and is not the newest release. We have recently been tagging by explicit versions. According to the comments above, could you try 2.6.x (e.g. tensorflow/serving:2.6.2)? For this issue, TF Serving is adding a dependency on TF IO to bring this feature back. Sorry for the breakage; we will let you know once it's fixed. |
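The suggestion to pin an explicit 2.6.x tag could look roughly like the sketch below, which only builds a docker run command string. The bucket, model name, and helper name are hypothetical, and the environment variables (MODEL_NAME and MODEL_BASE_PATH for the image entrypoint, AWS_* for credentials) are assumptions about a typical setup rather than something stated in this thread.

```python
import shlex


def serving_command(model_name, s3_base_path, tag="2.6.2",
                    env_passthrough=("AWS_ACCESS_KEY_ID",
                                     "AWS_SECRET_ACCESS_KEY",
                                     "AWS_REGION")):
    """Build a `docker run` command that pins an explicit tensorflow/serving tag."""
    image = f"tensorflow/serving:{tag}"  # explicit version instead of 'latest'
    args = ["docker", "run", "--rm", "-p", "8501:8501"]
    for name in env_passthrough:
        # `-e NAME` without a value forwards the variable from the host environment.
        args += ["-e", name]
    args += ["-e", f"MODEL_NAME={model_name}",
             "-e", f"MODEL_BASE_PATH={s3_base_path}",
             image]
    return shlex.join(args)


# Hypothetical model name and bucket, for illustration only.
print(serving_command("my_model", "s3://my-bucket/models"))
```

Building the command as a list and joining it with shlex.join keeps the quoting correct if a path or name ever contains spaces.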
This is also a blocking issue for our team, would be great if this can be fixed soon! |
This still seems to be an issue. Any idea when the removed support for S3 will be added back in instead of workarounds? |
It won't be added back. The TF team at Google does not maintain the S3 filesystem, so the code in TF would just rot, whereas having it in SIG IO helps by keeping it up to date and adding new features. |
@mihaimaruseac Could you please add a tutorial on how to use TF IO inside the TF Serving ecosystem? Currently I manually copy the models into the tf-serving containers and do a docker commit. This is extremely inefficient: I need to maintain heavy, fat Docker images, and pushing to and pulling from my internal Artifactory from my local dev machine takes a lot of time. It would be great if I could automate this process by loading the models straight from S3/MinIO buckets. I expect many of us are facing this problem, and I would like to hear solutions that make this process a lot more efficient. |
@mihaimaruseac I also tried compiling TF Serving with TF IO as a dependency, but as someone with no Bazel or TF development experience, I found it difficult and couldn't make it work. |
That makes total sense @mihaimaruseac thank you, but unless I'm misunderstanding this, why isn’t tensorflow-io a package or dependency inside the container just like any other? |
For all these questions, adding @yongtang. The system is similar to the GCS filesystem support, except that one is also distributed as a separate pip package. |
@yongtang I am also interested in the response to the following question and potential future plans for including tensorflow-io as a dependency in the serving image. Do you have any comments / update?
|
Hi everyone. I tried to build TensorFlow Serving with the S3 filesystem implemented in TensorFlow IO, and it was successful. Here is the code to build it, along with Docker images. Code: https://github.com/jeongukjae/tf-serving-s3 The Docker image size is 465MB, which is a little bigger than the official image. Maybe the optimal solution is for the TensorFlow team or community (SIG IO?) to maintain a TF Serving image with TensorFlow IO, but I hope my solution helps someone who needs this. |
For anyone coming here looking for a solution, this excellent piece of work worked for us. I cannot thank @jeongukjae enough for this! We are now serving models in EKS, pulling from AWS S3. If you try this, be aware that the Bazel build parameters in the Dockerfile may need to be adjusted depending on the resources of the host machine, by reducing the default number of parallel jobs and restricting memory and CPU. These values enabled me to build locally on a MacBook Pro 2019 and in our Jenkins-based CI/CD, without failures due to OOMs:
|
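The kind of Bazel resource limits mentioned in the comment above might be expressed with a small helper like this one. It is a sketch only: the helper name is hypothetical, the numbers in the example are illustrative and are not the values that commenter used, and the flag names (--jobs, --local_ram_resources, --local_cpu_resources) should be checked against the Bazel version used in the Dockerfile.

```python
from typing import List


def bazel_resource_flags(jobs: int, ram_mb: int, cpus: int) -> List[str]:
    """Return Bazel flags that cap parallel jobs, RAM (in MB), and CPU cores."""
    if min(jobs, ram_mb, cpus) < 1:
        raise ValueError("jobs, ram_mb and cpus must all be positive integers")
    return [
        f"--jobs={jobs}",                    # limit concurrent build actions
        f"--local_ram_resources={ram_mb}",   # RAM Bazel may assume, in MB
        f"--local_cpu_resources={cpus}",     # CPU cores Bazel may assume
    ]


# Illustrative values only; tune them to the build host.
print(" ".join(bazel_resource_flags(jobs=4, ram_mb=4096, cpus=2)))
```

Lower values trade build time for a smaller peak memory footprint, which is usually what matters when the build is killed by the OOM killer.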
Bug Report
If this is a bug report, please fill out the following form in full:
System information
tensorflow/serving:2.7.0
Describe the problem
Describe the problem clearly here. Be sure to convey here why it's a bug in
TensorFlow Serving.
A SavedModel cannot be loaded from an S3 bucket after tensorflow-serving 2.7. The server raised an error like the one below:
Docker image tensorflow/serving:2.6.2 runs without any error in the same configuration.
Exact Steps to Reproduce
Please include all steps necessary for someone to reproduce this issue on their
own machine. If not, skip this section.
Source code / logs
Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached. Try to provide a reproducible test case that is the bare
minimum necessary to generate the problem.
I think this is because of the Modular File System Migration mentioned in the TF 2.7.0 release notes. Is there any way to link tensorflow-io in the build step?