feat: Enable EpContext OVIR Encapsulation #704


Open · wants to merge 2 commits into ovep-develop

Conversation

ankitm3k

Description

This PR enables the EPContext OVIR Encapsulation model import, compilation & inference feature.

https://jira.devtools.intel.com/browse/CVS-169087

@Copilot (Copilot AI) left a comment


Pull Request Overview

This PR adds support for the EPContext OVIR Encapsulation feature by updating the model import, compilation, and inference logic across the OpenVINO provider. Key changes include:

  • Adding a new parameter (enable_causallm) to the OVCore::ImportModel API.
  • Updating the OVCore::ImportModel implementation to branch on XML model stream detection and enable stateful compilation (see the sketch just after this list).
  • Introducing a helper function to detect XML model streams and enforcing OpenVINO SDK version compatibility in the onnx_ctx_model_helper.
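For orientation, here is a minimal sketch of that import branch, assuming the public OpenVINO 2.x C++ API. The StatefulCompileModel plumbing, the exact signatures, and the empty-weights tensor are assumptions, not the PR's actual implementation:

#include <openvino/openvino.hpp>
#include <istream>
#include <sstream>
#include <string>

bool IsModelStreamXML(std::istream& stream);  // helper; a sketch appears after the file table below

// Hypothetical sketch: branch between the OVIR (XML) path and the
// pre-compiled blob path, as the overview above describes.
ov::CompiledModel ImportModelSketch(ov::Core& core,
                                    std::istream& model_stream,
                                    const std::string& hw_target,
                                    const ov::AnyMap& device_config) {
  if (IsModelStreamXML(model_stream)) {
    // OVIR encapsulation path: the stream holds IR XML, so build an ov::Model first.
    // An empty weights tensor is assumed here; per the review thread below, the
    // real code consumes weights from the accompanying .bin file.
    std::stringstream ss;
    ss << model_stream.rdbuf();
    std::shared_ptr<ov::Model> model = core.read_model(ss.str(), ov::Tensor{});
    // The PR additionally routes causal-LM models through stateful compilation here.
    return core.compile_model(model, hw_target, device_config);
  }
  // Pre-compiled blob path: hand the stream to import_model, as before.
  return core.import_model(model_stream, hw_target, device_config);
}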

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File | Description
onnxruntime/core/providers/openvino/ov_interface.h | Added the bool enable_causallm parameter to ImportModel.
onnxruntime/core/providers/openvino/ov_interface.cc | Refactored the ImportModel logic and repositioned log messages based on the model format.
onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc | Added the XML stream check and enforced SDK version compatibility with an error message.
onnxruntime/core/providers/openvino/backends/basic_backend.cc | Updated the call to ImportModel to supply the new parameter and the model path name.
onnxruntime/core/providers/openvino/backend_utils.h | Declared the new IsModelStreamXML helper.
onnxruntime/core/providers/openvino/backend_utils.cc | Implemented IsModelStreamXML to detect XML headers in the model stream.
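The IsModelStreamXML helper itself is not shown in this summary. A minimal, self-contained sketch of such a check (peek at the first bytes for an XML prolog, then rewind the stream) might look as follows; this is an assumption about the approach, not the PR's code:

#include <iostream>
#include <istream>
#include <sstream>
#include <string>

bool IsModelStreamXML(std::istream& stream) {
  const std::streampos start = stream.tellg();
  char header[5] = {};
  stream.read(header, sizeof(header));
  const bool is_xml = stream.gcount() == static_cast<std::streamsize>(sizeof(header)) &&
                      std::string(header, sizeof(header)) == "<?xml";
  stream.clear();       // clear EOF/fail bits if the stream was shorter than 5 bytes
  stream.seekg(start);  // rewind so the caller can still consume the whole model
  return is_xml;
}

int main() {
  std::istringstream xml_stream("<?xml version=\"1.0\"?><net/>");
  std::istringstream onnx_stream(std::string("\x08\x07\x12\x00", 4));  // arbitrary binary bytes
  std::cout << IsModelStreamXML(xml_stream) << "\n";   // prints 1
  std::cout << IsModelStreamXML(onnx_stream) << "\n";  // prints 0
  return 0;
}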
Comments suppressed due to low confidence (1)

onnxruntime/core/providers/openvino/ov_interface.h:82

  • [nitpick] Consider renaming 'enable_causallm' to 'enableCausalLM' to follow typical C++ camelCase naming conventions and improve readability.
bool enable_causallm,

@RyanMetcalfeInt8 left a comment


LGTM. I tested this on my LNL machine with some EPCtx models that are used for AI Toolkit. They seem to work fine.

The only issue I see, which is somewhat outside the scope of this specific PR: compared to the msb_release_v2 branch, Python applications are unable to recover from "KV-Cache is full." exceptions.

With this branch (and ovep-develop), when these exceptions are thrown, the infer request is deleted from the infer request queue (with a "delete Request0" message printed), and when the application tries to start another generation sequence, it crashes.

With msb_release_v2, the application is able to catch these exceptions and then proceed / try again with the next generation sequence (after rewinding the KV-Cache state, etc.).

if (enable_causallm) {
  exe = OVCore::Get()->StatefulCompileModel(model, hw_target, device_config);
} else {
  auto obj = core.compile_model(model, hw_target, device_config);


@ankitm3k: What would be the path to enable the EPCtx cache post-compile, to avoid the FEIL penalty on every load?


Can be tested by:

onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -i "device_type|NPU" -C "ep.context_enable|1 ep.context_file_path|C:\resnet50_int8_st_epctx.onnx" C:\resnet50_int8_st.onnx

// where weights from the bin file are directly consumed
std::string xml_file_name = name;
if (name.size() >= 5 && name.substr(name.size() - 5) == ".onnx") {
  xml_file_name.replace(name.size() - 5, 5, ".xml");
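As a side note on the snippet above, the same .onnx-to-.xml swap can be expressed with std::filesystem, which avoids the manual length arithmetic. A small sketch (not the PR's code; the path below is made up):

#include <filesystem>
#include <iostream>
#include <string>

int main() {
  // Derive the IR file name from the ONNX model name, mirroring the
  // substring replacement shown above.
  std::string name = "C:/models/model.onnx";  // hypothetical input path
  std::filesystem::path xml_path(name);
  xml_path.replace_extension(".xml");         // -> "C:/models/model.xml"
  std::cout << xml_path.string() << "\n";
  return 0;
}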


Have we validated this with CreateSessionFromArray, where the model is passed in memory? Is there a way to decouple from the location on disk, since the ONNX model and its contents should be portable?

ankitm3k (Author)


This is just file-name handling code: the input model name string, i.e. model.onnx, is now represented as an input XML file name string, i.e. model.xml. This won't impact the memory usage.


This is not regarding memory usage, but whether a model passed in memory would work.

This can be verified by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx

@@ -73,7 +73,8 @@ BasicBackend::BasicBackend(std::unique_ptr<ONNX_NAMESPACE::ModelProto>& model_pr
 exe_network_ = OVCore::Get()->ImportModel(*model_stream,
                                           hw_target,
                                           device_config,
-                                          subgraph_context_.subgraph_name);
+                                          enable_causallm,
+                                          session_context_.onnx_model_path_name.string());


@javier-intel, @preetha-intel: Does this change start supporting OV IR wrapped in ONNX but impact pre-compiled and partitioned ONNX models, since we no longer pass any reference to subgraph_context into ImportModel?

ankitm3k (Author)


The field subgraph_context_.subgraph_name was redundant in the current implementation; it was only used in exception handling with the graph name. We now use the original model name here so that the model XML contents can be parsed while loading the model.


With subgraph_context_.subgraph_name we get better error handling inside ImportModel, so let's try to retain the argument.
