Ensemble Scheduler: Internal response allocation is not allocating memory at all #7593
Update: I updated to
Going deeper into the logs, I found that:
Particularly:
It seems that the ensemble scheduler is not able to allocate memory for the internal response (everything is zero...). I don't know whether I have to allocate the memory myself or not...
Update: Still in
I converted my ensemble model into BLS, to check whether it is really the ensemble scheduler that causes the issue, but I still get the same problem:
None of the tutorials about ensemble scheduling or BLS mention any memory allocation for models "in the middle". I found issues #4478 and #95 with somewhat similar errors, but their solutions do not apply here because the individual models work perfectly. @Tabrizian @szalpal, sorry for tagging, but any help would be appreciated, since I think I have just missed something...
I ran further tests to try to understand. I built a VERY simple ensemble with two Python models:
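For reference, a minimal pass-through Python backend model of this kind might look like the sketch below (the INPUT0/OUTPUT0 tensor names are illustrative assumptions, not the original ones; the second model is identical):

```python
# model.py -- minimal pass-through Python backend model (illustrative sketch;
# the INPUT0/OUTPUT0 tensor names are assumptions, not the original ones).
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the input tensor and echo it back unchanged.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```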
And I got the same error... As a desperate move, I tried just removing the
By removing
in both
Is there any update on the memory allocation error? We have a similar problem: we have two models running with ONNX Runtime and one BLS. We can get responses from the models separately, but when I send requests to the BLS that ensembles the models with some logic, it does not work. The BLS config:
The error:
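For context, a minimal BLS `execute` that chains two composing models typically looks like the sketch below; the model and tensor names (`model_a`, `model_b`, `INPUT0`, `OUTPUT0`) are assumptions for illustration, not the actual config:

```python
# Sketch of a BLS model.py execute() that chains two composing models.
# Model and tensor names (model_a, model_b, INPUT0, OUTPUT0) are assumptions.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # First hop: run model_a on the incoming tensor.
            req_a = pb_utils.InferenceRequest(
                model_name="model_a",
                requested_output_names=["OUTPUT0"],
                inputs=[pb_utils.Tensor("INPUT0", in0.as_numpy())])
            resp_a = req_a.exec()
            if resp_a.has_error():
                raise pb_utils.TritonModelException(resp_a.error().message())
            mid = pb_utils.get_output_tensor_by_name(resp_a, "OUTPUT0")

            # Second hop: feed model_a's output into model_b.
            req_b = pb_utils.InferenceRequest(
                model_name="model_b",
                requested_output_names=["OUTPUT0"],
                inputs=[pb_utils.Tensor("INPUT0", mid.as_numpy())])
            resp_b = req_b.exec()
            if resp_b.has_error():
                raise pb_utils.TritonModelException(resp_b.error().message())
            out = pb_utils.get_output_tensor_by_name(resp_b, "OUTPUT0")

            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUTPUT0", out.as_numpy())]))
        return responses
```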
Hi, unfortunately I never got any answer from the staff...
Hi, while I may not be the best person to help with this, I'll try my best. Could you tell me how you run the
Hi,
I'm using a Docker container, yes. I already tried changing the
Just for visibility: running into the same issue on 24.09-py3. Testing with 23.02-py3 now.
@gpadiolleau this issue might be related: #7647. It would explain the difference you are getting when specifying an environment... Maybe your environment is built with NumPy 2.0 or above, which seems not to be fully supported by the Python backend.
@SDJustus nice catch!!
@gpadiolleau downgrading to numpy<2 fixed the issue!
I finally got time to test: downgrading NumPy to 1.26 fixed the issue too.
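For anyone landing here, a quick way to verify the Python backend environment before deploying is a version guard like this sketch (the `<2` bound just reflects the workaround above, not an official support statement):

```python
# Sanity check for the Python backend environment (sketch): the workaround
# above pins NumPy below 2.0, so fail fast if a 2.x build is resolved.
import numpy as np

assert int(np.__version__.split(".")[0]) < 2, (
    f"numpy {np.__version__} is >= 2.0; downgrade, e.g. pip install 'numpy<2'"
)
```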
Description
Individual models work as expected, but the ensemble pipeline of these individuals raises:

```
[StatusCode.INTERNAL] in ensemble 'depthcomp_pipeline', onnx runtime error 2: not enough space: expected 1048576, got 0
```

The expected byte size of 1048576 exactly matches the byte size of my `rgbd_img` (1x4x512x512 = 1048576) ONNX model first input (see the config file of depthcomp_model below).

Triton Information
I am using the Triton container: `nvcr.io/nvidia/tritonserver:23.12-py3`
Host NVIDIA driver version: 545.29.06
Host CUDA version: 12.3
HW: NVIDIA GeForce RTX 4060, 8 GB
To Reproduce
I am using gRPC inference (with shared memory) for a pipeline ensemble called depthcomp_pipeline, composed of three models: depthcomp_preprocessing, depthcomp_model and depthcomp_postprocessing (the client flow is sketched below).
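The client flow is roughly the following sketch. The region names and the UINT8 dtype are assumptions, chosen so the 1x4x512x512 element count equals the 1048576-byte region from the error; the actual ensemble input name may differ:

```python
# Sketch of a gRPC call over system shared memory against the ensemble.
# The region names, /rgbd_in key and UINT8 dtype are assumptions chosen so
# the 1x4x512x512 element count equals the 1048576-byte region.
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.shared_memory as shm

client = grpcclient.InferenceServerClient(url="localhost:8001")
byte_size = 1 * 4 * 512 * 512  # 1048576

# Create and register the input region, then copy the request data into it.
ip_handle = shm.create_shared_memory_region("rgbd_in", "/rgbd_in", byte_size)
shm.set_shared_memory_region(ip_handle, [np.zeros((1, 4, 512, 512), np.uint8)])
client.register_system_shared_memory("rgbd_in", "/rgbd_in", byte_size)

# Point the ensemble input at the region instead of sending raw bytes.
infer_input = grpcclient.InferInput("rgbd_img", [1, 4, 512, 512], "UINT8")
infer_input.set_shared_memory("rgbd_in", byte_size)

result = client.infer(model_name="depthcomp_pipeline", inputs=[infer_input])

# Clean up the region once done.
client.unregister_system_shared_memory("rgbd_in")
shm.destroy_shared_memory_region(ip_handle)
```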
-> Note that each individual model works separately, and I can run gRPC inference with shared memory for each.
-> The exact same configuration works perfectly in `nvcr.io/nvidia/tritonserver:23.02-py3` (I just changed the custom Python backend to python3.10).

Here are the config files used:
Expected behavior
I would expect the ensemble to work properly, since each individual model works on its own; only the ensemble fails.
I didn't find any change in the release notes that could cause this error, but I may have missed something; in that case, please point it out to me.