
Modin fails on Core (13th Gen i9) #1957

Open
@weiseng-yeap

Description

Summary

I installed the oneAPI Base Toolkit and ran the Modin getting-started sample:
https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

The run fails with the following raylet error:
(raylet) [2023-10-10 22:04:54,885 E 21639 21688] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
(raylet) - The version of grpcio doesn't follow Ray's requirement. Agent can segfault with the incorrect grpcio version. Check the grpcio version pip freeze | grep grpcio.
(raylet) - The agent failed to start because of unexpected error or port conflict. Read the log cat /tmp/ray/session_latest/dashboard_agent.log. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
(raylet) - The agent is killed by the OS (e.g., out of memory).

Version

oneAPI toolkit version: 2023.2.0

Environment

OS: Ubuntu 22.04.2 LTS (Linux)
CPU: 13th Gen Intel(R) Core(TM) i9-13900
RAM: 32 GB

Steps to reproduce

In a conda environment, run the Modin sample app shipped with the oneAPI samples:
https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

Observed behavior

The raylet fails with the following log:

(raylet) [2023-10-10 22:04:54,885 E 21639 21688] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
(raylet) - The version of grpcio doesn't follow Ray's requirement. Agent can segfault with the incorrect grpcio version. Check the grpcio version pip freeze | grep grpcio.
(raylet) - The agent failed to start because of unexpected error or port conflict. Read the log cat /tmp/ray/session_latest/dashboard_agent.log. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
(raylet) - The agent is killed by the OS (e.g., out of memory).
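
The first two causes listed in the log can be checked with a short script. The following is only a minimal sketch, not an official Ray or Modin diagnostic: it prints the installed versions of `ray`, `grpcio`, and `modin`, then the last lines of the agent log at the path quoted in the message. The helper names `installed_version` and `tail` are hypothetical and exist only for this illustration.

```python
from importlib import metadata
from pathlib import Path

# Log path quoted in the raylet message above.
AGENT_LOG = Path("/tmp/ray/session_latest/dashboard_agent.log")


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tail(path, lines=40):
    """Return the last `lines` lines of a text file, or [] if it is missing."""
    path = Path(path)
    if lines <= 0 or not path.is_file():
        return []
    # errors="replace" keeps the script from failing on odd bytes in the log.
    return path.read_text(errors="replace").splitlines()[-lines:]


def main():
    # Versions relevant to the grpcio hint in the raylet message.
    for package in ("ray", "grpcio", "modin"):
        print(f"{package}: {installed_version(package) or 'not installed'}")

    # Tail of the agent log, which may show a startup error or port conflict.
    print(f"--- last lines of {AGENT_LOG} ---")
    for line in tail(AGENT_LOG):
        print(line)


if __name__ == "__main__":
    main()
```

Running this in the same conda environment on both the Xeon and the Core machine and comparing the output may show whether the package versions differ between the two setups. It does not cover the third cause (the agent being killed by the OS), which would need the kernel log.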

Expected behavior

The sample should run without the raylet error, as it does on a Xeon system with the same setup. On this Core system it fails.

Labels

bug
