feat: Enable EpContext OVIR Encapsulation #704
base: ovep-develop
Conversation
Pull Request Overview
This PR adds support for the EPContext OVIR Encapsulation feature by updating the model import, compilation, and inference logic across the OpenVINO provider. Key changes include:
- Adding a new parameter (enable_causallm) to the OVCore::ImportModel API (see the signature sketch after this list).
- Updating the OVCore::ImportModel implementation to branch on XML model stream detection and enable stateful compilation.
- Introducing a helper function to detect XML model streams and enforcing OpenVINO SDK version compatibility in the onnx_ctx_model_helper.
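Based on the call sites in this PR, the updated declaration plausibly looks like the sketch below; the return type and all parameter names other than enable_causallm are assumptions, not confirmed by the diff.

```cpp
// Hedged sketch of the updated OVCore::ImportModel declaration in
// ov_interface.h. Only enable_causallm and the call-site arguments are
// confirmed by this PR; other names and types are assumptions.
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const ov::AnyMap& device_config,
                         bool enable_causallm,           // new: selects stateful compilation
                         std::string model_file_path);   // replaces subgraph_context_.subgraph_name
```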
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
File | Description
--- | ---
onnxruntime/core/providers/openvino/ov_interface.h | Added the bool enable_causallm parameter to ImportModel. |
onnxruntime/core/providers/openvino/ov_interface.cc | Refactored ImportModel logic and repositioned log messages based on the model format. |
onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc | Added XML stream check and enforced SDK version compatibility with an error message. |
onnxruntime/core/providers/openvino/backends/basic_backend.cc | Updated the call to ImportModel to supply the new parameter and model path name. |
onnxruntime/core/providers/openvino/backend_utils.h | Declared the new IsModelStreamXML helper. |
onnxruntime/core/providers/openvino/backend_utils.cc | Implemented IsModelStreamXML to detect XML headers in the model stream. |
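For context, a minimal sketch of what the IsModelStreamXML helper could look like, assuming it only peeks at the first non-whitespace byte and restores the stream position (the actual implementation may differ):

```cpp
#include <istream>

// Sketch: an OpenVINO IR .xml stream starts with '<' (e.g. "<?xml" or
// "<net ...>"), while a serialized ONNX protobuf never does.
bool IsModelStreamXML(std::istream& model_stream) {
  std::streampos initial_pos = model_stream.tellg();  // remember position
  char first_char = 0;
  model_stream >> std::ws;                            // skip leading whitespace
  model_stream.get(first_char);                       // read first meaningful byte
  model_stream.clear();                               // reset any eof/fail bits
  model_stream.seekg(initial_pos);                    // restore for the caller
  return first_char == '<';
}
```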
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/openvino/ov_interface.h:82
- [nitpick] Consider renaming 'enable_causallm' to 'enableCausalLM' to follow typical C++ camelCase naming conventions and improve readability.
bool enable_causallm,
LGTM. I tested this on my LNL machine with some EPCtx models that are used for AI Toolkit. They seem to work fine.
The only issue I see, which is somewhat outside the scope of this specific PR, is that, compared to the msb_release_v2 branch, Python applications are unable to recover from `KV-Cache is full.` exceptions.
With this branch (and ovep-develop), when these exceptions are thrown, the infer request is deleted from the infer request queue (with a `delete Request0` message printed), and when the application tries to start another generation sequence, it crashes.
With msb_release_v2, the application is able to catch these exceptions and then proceed / try again with the next generation sequence (after rewinding the KV-Cache state, etc.).
if (enable_causallm) {
  exe = OVCore::Get()->StatefulCompileModel(model, hw_target, device_config);
} else {
  auto obj = core.compile_model(model, hw_target, device_config);
@ankitm3k: What would be the path to enable the EPCtx cache post-compile to avoid the FEIL penalty on every load?
Can be tested by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -i "device_type|NPU" -C "ep.context_enable|1 ep.context_file_path|C:\resnet50_int8_st_epctx.onnx" C:\resnet50_int8_st.onnx
// where weights from the bin file are directly consumed
std::string xml_file_name = name;
if (name.size() >= 5 && name.substr(name.size() - 5) == ".onnx") {
  xml_file_name.replace(name.size() - 5, 5, ".xml");
Have we validated this with CreateSessionFromArray, where the model is passed in memory? Is there a way to decouple from the location on disk, since the ONNX model and its contents should be portable?
This is just file name handling code, where your input model name string, i.e. model.onnx, is now represented as an input XML file name string, i.e. model.xml; this won't impact the memory usage.
This is not regarding memory usage but whether a model passed in memory would work.
This can be verified by:
onnxruntime_perf_test.exe -e openvino -m times -r 1 -o 0 -I -l -i "device_type|NPU" -C "ep.context_file_path|C:\resnet50_int8_st.onnx" C:\resnet50_int8_st.onnx
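Equivalently, from the C++ API (this maps to the C API's CreateSessionFromArray; the file path and EP options below are placeholders):

```cpp
#include <fstream>
#include <iterator>
#include <vector>
#include <onnxruntime_cxx_api.h>

int main() {
  // Read the model into memory so the EP never sees an on-disk .onnx path.
  std::ifstream file("resnet50_int8_st.onnx", std::ios::binary);
  std::vector<char> model_bytes((std::istreambuf_iterator<char>(file)),
                                std::istreambuf_iterator<char>());

  Ort::Env env;
  Ort::SessionOptions so;
  // ...append the OpenVINO EP here (device_type, ep.context_file_path, ...)...

  // Same path CreateSessionFromArray exercises: the model arrives as a buffer.
  Ort::Session session(env, model_bytes.data(), model_bytes.size(), so);
  return 0;
}
```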
@@ -73,7 +73,8 @@ BasicBackend::BasicBackend(std::unique_ptr<ONNX_NAMESPACE::ModelProto>& model_pr
   exe_network_ = OVCore::Get()->ImportModel(*model_stream,
                                             hw_target,
                                             device_config,
-                                            subgraph_context_.subgraph_name);
+                                            enable_causallm,
+                                            session_context_.onnx_model_path_name.string());
@javier-intel, @preetha-intel: Does this change start supporting OV IR wrapped in ONNX but impact pre-compiled and partitioned ONNX models, since we no longer pass any reference to subgraph_context into ImportModel?
The field subgraph_context_.subgraph_name was a redundant entity in the current implementation, used only in exception handling with the graph name; we are using the original model name here so the model XML contents can be parsed while loading the model.
With subgraph_context_.subgraph_name we get better error handling inside ImportModel, so let's try to retain the argument.
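If the argument is retained as suggested, the declaration could carry both (a hypothetical sketch; parameter names are assumptions):

```cpp
// Hypothetical: keep subgraph_name alongside the new parameters, used
// only to enrich exception messages inside ImportModel.
OVExeNetwork ImportModel(std::istream& model_stream,
                         std::string hw_target,
                         const ov::AnyMap& device_config,
                         bool enable_causallm,
                         std::string model_file_path,
                         const std::string& subgraph_name);
```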
Co-authored-by: Copilot <[email protected]>
Description
This PR enables the EPContext OVIR Encapsulation model import, compilation & inference feature.
https://jira.devtools.intel.com/browse/CVS-169087