feat: Enable EpContext OVIR Encapsulation #704
base: ovep-develop
Conversation
Pull Request Overview
This PR adds support for the EPContext OVIR Encapsulation feature by updating the model import, compilation, and inference logic across the OpenVINO provider. Key changes include:
- Adding a new parameter (enable_causallm) to the OVCore::ImportModel API (see the signature sketch after this list).
- Updating the OVCore::ImportModel implementation to branch on XML model stream detection and enable stateful compilation.
- Introducing a helper function to detect XML model streams and enforcing OpenVINO SDK version compatibility in the onnx_ctx_model_helper.
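Based on the call sites in this PR, the updated declaration plausibly looks like the sketch below; the return type and all parameter names other than enable_causallm are assumptions, not confirmed by the diff.

```cpp
// Hedged sketch of the updated OVCore::ImportModel declaration in
// ov_interface.h. Only enable_causallm and the call-site arguments are
// confirmed by this PR; other names and types are assumptions.
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const ov::AnyMap& device_config,
                         bool enable_causallm,           // new: selects stateful compilation
                         std::string model_file_path);   // replaces subgraph_context_.subgraph_name
```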
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
File | Description
--- | ---
onnxruntime/core/providers/openvino/ov_interface.h | Added the bool enable_causallm parameter to ImportModel. |
onnxruntime/core/providers/openvino/ov_interface.cc | Refactored ImportModel logic and repositioned log messages based on the model format. |
onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc | Added XML stream check and enforced SDK version compatibility with an error message. |
onnxruntime/core/providers/openvino/backends/basic_backend.cc | Updated the call to ImportModel to supply the new parameter and model path name. |
onnxruntime/core/providers/openvino/backend_utils.h | Declared the new IsModelStreamXML helper. |
onnxruntime/core/providers/openvino/backend_utils.cc | Implemented IsModelStreamXML to detect XML headers in the model stream. |
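For context, a minimal sketch of what the IsModelStreamXML helper could look like, assuming it only peeks at the first non-whitespace byte and restores the stream position (the actual implementation may differ):

```cpp
#include <istream>

// Sketch: an OpenVINO IR .xml stream starts with '<' (e.g. "<?xml" or
// "<net ...>"), while a serialized ONNX protobuf never does.
bool IsModelStreamXML(std::istream& model_stream) {
  std::streampos initial_pos = model_stream.tellg();  // remember position
  char first_char = 0;
  model_stream >> std::ws;                            // skip leading whitespace
  model_stream.get(first_char);                       // read first meaningful byte
  model_stream.clear();                               // reset any eof/fail bits
  model_stream.seekg(initial_pos);                    // restore for the caller
  return first_char == '<';
}
```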
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/openvino/ov_interface.h:82
- [nitpick] Consider renaming 'enable_causallm' to 'enableCausalLM' to follow typical C++ camelCase naming conventions and improve readability.
bool enable_causallm,
LGTM. I tested this on my LNL machine with some EPCtx models that are used for AI Toolkit. They seem to work fine.
The only issue I see, which is somewhat outside the scope of this specific PR, is that, compared to the msb_release_v2 branch, Python applications are unable to recover from `KV-Cache is full.` exceptions.
With this branch (and ovep-develop), when these exceptions are thrown, the infer request is deleted from the infer request queue (with a `delete Request0` message printed), and when the application tries to start another generation sequence, it crashes.
With msb_release_v2, the application is able to catch these exceptions and then proceed / try again with the next generation sequence (after rewinding the KV-Cache state, etc.).
if (enable_causallm) {
  exe = OVCore::Get()->StatefulCompileModel(model, hw_target, device_config);
} else {
  auto obj = core.compile_model(model, hw_target, device_config);
@ankitm3k: What would be the path to enable the EPCtx cache post-compile to avoid the FEIL penalty on every load?
Can be tested by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -i "device_type|NPU" -C "ep.context_enable|1 ep.context_file_path|C:\resnet50_int8_st_epctx.onnx" C:\resnet50_int8_st.onnx
// where weights from the bin file are directly consumed
std::string xml_file_name = name;
if (name.size() >= 5 && name.substr(name.size() - 5) == ".onnx") {
  xml_file_name.replace(name.size() - 5, 5, ".xml");
Have we validated this with CreateSessionFromArray, where the model is passed in memory? Is there a way to decouple from the location on disk, since the ONNX model and its contents should be portable?
This is just file name handling code, where your input model name string, i.e. model.onnx, is now represented as an input XML file name string, i.e. model.xml; this won't impact the memory usage.
This is not regarding memory usage but whether a model passed in memory would work.
This can be verified by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx
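Equivalently, from the C++ API (this maps to the C API's CreateSessionFromArray; the file path and EP options below are placeholders):

```cpp
#include <fstream>
#include <iterator>
#include <vector>
#include <onnxruntime_cxx_api.h>

int main() {
  // Read the model into memory so the EP never sees an on-disk .onnx path.
  std::ifstream file("resnet50_int8_st.onnx", std::ios::binary);
  std::vector<char> model_bytes((std::istreambuf_iterator<char>(file)),
                                std::istreambuf_iterator<char>());

  Ort::Env env;
  Ort::SessionOptions so;
  // ...append the OpenVINO EP here (device_type, ep.context_file_path, ...)...

  // Same path CreateSessionFromArray exercises: the model arrives as a buffer.
  Ort::Session session(env, model_bytes.data(), model_bytes.size(), so);
  return 0;
}
```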
@@ -73,7 +73,8 @@ BasicBackend::BasicBackend(std::unique_ptr<ONNX_NAMESPACE::ModelProto>& model_pr
   exe_network_ = OVCore::Get()->ImportModel(*model_stream,
                                             hw_target,
                                             device_config,
-                                            subgraph_context_.subgraph_name);
+                                            enable_causallm,
+                                            session_context_.onnx_model_path_name.string());
@javier-intel, @preetha-intel: Does this change start supporting OV IR wrapped in ONNX but impact pre-compiled and partitioned ONNX models, since we no longer pass any reference to subgraph_context into ImportModel?
The field subgraph_context_.subgraph_name was a redundant entity in the current implementation, used only in exception handling with the graph name; we are using the original model name here so the model XML contents can be parsed while loading the model.
With subgraph_context_.subgraph_name we get better error handling inside ImportModel, so let's try to retain the argument.
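If the argument is retained as suggested, the declaration could carry both (a hypothetical sketch; parameter names are assumptions):

```cpp
// Hypothetical: keep subgraph_name alongside the new parameters, used
// only to enrich exception messages inside ImportModel.
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const ov::AnyMap& device_config,
                         bool enable_causallm,
                         std::string model_file_path,
                         const std::string& subgraph_name);
```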
Co-authored-by: Copilot <[email protected]>
Description
This PR enables the EPContext OVIR Encapsulation model import, compilation & inference feature.
https://jira.devtools.intel.com/browse/CVS-169087