Possible undefined behavior with bots connected #761

Open
TestingPlant opened this issue Dec 19, 2024 · 4 comments
Labels: bug 🐞, help wanted 🫂, prio 🚨

Comments

@TestingPlant
Collaborator

The tag program aborts in debug mode and panics in release mode when tested with 500 bots from rust-mc-bot.

Debug:

2024-12-19T13:12:42.207209Z  INFO crates/hyperion/src/simulation/skin.rs:60: player skin cache miss for a730781f-f613-0a8b-8ba5-ef6309ead7bc
2024-12-19T13:12:42.207386Z  INFO crates/hyperion/src/ingress/mod.rs:127: Starting login: Bot_410 553abe3e-dc6c-26cf-de5c-05c477767348
2024-12-19T13:12:42.207517Z DEBUG /home/remote-dev/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.10/src/client/legacy/pool.rs:270: reuse idle connection for ("https", mowojang.matdoes.dev)
target/debug/tag(+0x1a41b8c) [0xae714ad91b8c]
target/debug/tag(+0x1a58f00) [0xae714ada8f00]
target/debug/tag(+0x1a59038) [0xae714ada9038]
target/debug/tag(+0x1a597e0) [0xae714ada97e0]
target/debug/tag(+0x1a44754) [0xae714ad94754]
target/debug/tag(+0x1a45484) [0xae714ad95484]
target/debug/tag(+0x1a29928) [0xae714ad79928]
target/debug/tag(+0x78adf4) [0xae7149adadf4]
target/debug/tag(+0x6f0ca0) [0xae7149a40ca0]
target/debug/tag(+0x99b35c) [0xae7149ceb35c]
target/debug/tag(+0x6ec010) [0xae7149a3c010]
target/debug/tag(+0x85e298) [0xae7149bae298]
target/debug/tag(+0x810e18) [0xae7149b60e18]
target/debug/tag(+0x99f8bc) [0xae7149cef8bc]
target/debug/tag(+0x9f1c04) [0xae7149d41c04]
target/debug/tag(+0x1ad9344) [0xae714ae29344]
target/debug/tag(+0x1ab02e0) [0xae714ae002e0]
target/debug/tag(+0x1ab18a8) [0xae714ae018a8]
target/debug/tag(+0x1ab482c) [0xae714ae0482c]
target/debug/tag(+0x1ab1e84) [0xae714ae01e84]
target/debug/tag(+0x1a51b84) [0xae714ada1b84]
target/debug/tag(+0x1a51d30) [0xae714ada1d30]
target/debug/tag(+0x1a51b24) [0xae714ada1b24]
target/debug/tag(+0x1a51cd8) [0xae714ada1cd8]
target/debug/tag(+0x19da018) [0xae714ad2a018]
target/debug/tag(+0x518384) [0xae7149868384]
target/debug/tag(+0x1bc310) [0xae714950c310]
target/debug/tag(+0x1b6780) [0xae7149506780]
target/debug/tag(+0x1b79a4) [0xae71495079a4]
target/debug/tag(+0x1b5378) [0xae7149505378]
target/debug/tag(+0x1bfd558) [0xae714af4d558]
target/debug/tag(+0x1b5340) [0xae7149505340]
target/debug/tag(+0x1bcf0c) [0xae714950cf0c]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xf238d68773fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xf238d68774cc]
target/debug/tag(+0x1b08b0) [0xae71495008b0]
Aborted (core dumped)

Release:

2024-12-19T13:16:33.400346Z  INFO player_join_world{name="Bot_65"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_65 joined the world
2024-12-19T13:16:33.399757Z  INFO player_join_world{name="Bot_66"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_66 joined the world
2024-12-19T13:16:33.400207Z  INFO player_join_world{name="Bot_55"}: crates/hyperion/src/egress/player_join/mod.rs:179: sending skins for 47 players
2024-12-19T13:16:33.400971Z  INFO player_join_world{name="Bot_55"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_55 joined the world
2024-12-19T13:16:33.402026Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.402451Z  INFO crates/hyperion/src/ingress/mod.rs:127: Starting login: Bot_77 ee925f6b-4c61-eedb-b03b-2bf9341085ec
2024-12-19T13:16:33.403465Z  INFO crates/hyperion/src/ingress/mod.rs:127: Starting login: Bot_79 f82e45d7-b5ac-58ca-dd6b-37a06fbc969e
2024-12-19T13:16:33.403515Z  INFO crates/hyperion/src/ingress/mod.rs:127: Starting login: Bot_76 963f5be3-fafe-9c74-338d-3a84999fec96
2024-12-19T13:16:33.403719Z  INFO crates/hyperion/src/simulation/skin.rs:60: player skin cache miss for ee925f6b-4c61-eedb-b03b-2bf9341085ec
2024-12-19T13:16:33.404235Z  INFO crates/hyperion/src/ingress/mod.rs:127: Starting login: Bot_78 3fb4580b-c2fa-ff5d-1de9-150816fdd002
2024-12-19T13:16:33.404733Z  INFO crates/hyperion/src/simulation/skin.rs:60: player skin cache miss for f82e45d7-b5ac-58ca-dd6b-37a06fbc969e
2024-12-19T13:16:33.404807Z  INFO crates/hyperion/src/simulation/skin.rs:60: player skin cache miss for 963f5be3-fafe-9c74-338d-3a84999fec96
2024-12-19T13:16:33.404840Z  INFO crates/hyperion/src/simulation/skin.rs:60: player skin cache miss for 3fb4580b-c2fa-ff5d-1de9-150816fdd002
2024-12-19T13:16:33.406650Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.407077Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.408920Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.420370Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.437485Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.437522Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.443026Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.445178Z ERROR crates/hyperion/src/ingress/mod.rs:136: failed to get skin failed to parse json from response: "". Using empty skin
2024-12-19T13:16:33.446019Z  INFO generate_ingress_events: crates/hyperion/src/ingress/mod.rs:316: player_connect
2024-12-19T13:16:33.446216Z  INFO generate_ingress_events: crates/hyperion/src/ingress/mod.rs:316: player_connect
2024-12-19T13:16:33.446224Z  INFO generate_ingress_events: crates/hyperion/src/ingress/mod.rs:316: player_connect
2024-12-19T13:16:33.446227Z  INFO generate_ingress_events: crates/hyperion/src/ingress/mod.rs:316: player_connect
thread '2024-12-19T13:16:33.449438Z  INFO player_join_world{name="Bot_57"}: crates/hyperion/src/egress/player_join/mod.rs:179: sending skins for 57 players
2024-12-19T13:16:33.449665Z  INFO player_join_world{name="Bot_57"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_57 joined the world
2024-12-19T13:16:33.450355Z  INFO player_join_world{name="Bot_31"}: crates/hyperion/src/egress/player_join/mod.rs:179: sending skins for 57 players
2024-12-19T13:16:33.450474Z  INFO player_join_world{name="Bot_31"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_31 joined the world
2024-12-19T13:16:33.451081Z  INFO player_join_world{name="Bot_70"}: crates/hyperion/src/egress/player_join/mod.rs:179: sending skins for 57 players
2024-12-19T13:16:33.451222Z  INFO player_join_world{name="Bot_70"}: crates/hyperion/src/egress/player_join/mod.rs:367: Bot_70 joined the world
<unnamed>' panicked at /usr/local/cargo/git/checkouts/flecs-rust-ff75a97755951fdf/db1b212/flecs_ecs/src/core/get_tuple.rs:414:1:
Component `hyperion::simulation::Name` not found on `EntityView::get`operation
with parameters: `(&hyperion::simulation::Uuid, &hyperion::simulation::Name, &hyperion::simulation::Position, &hyperion::simulation::Yaw, &hyperion::simulation::Pitch, &hyperion::net::ConnectionId)`.
Use `try_get` variant to avoid assert/panicking if you want to handle the error
or use `Option<&hyperion::simulation::Name> instead to handle individual cases.
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: rayon::iter::plumbing::bridge_producer_consumer::helper
   3: rayon_core::join::join_context::{{closure}}
   4: rayon::iter::plumbing::bridge_producer_consumer::helper
   5: rayon_core::join::join_context::{{closure}}
   6: rayon::iter::plumbing::bridge_producer_consumer::helper
   7: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
   8: rayon_core::registry::WorkerThread::wait_until_cold
   9: rayon_core::registry::ThreadBuilder::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Since the behavior differs between debug and release mode, I suspect some UB is occurring and that the panic/abort is just a symptom. In addition, on commit d358e2, release mode sometimes caused a segfault and sometimes caused a panic, although on the latest main (ef00b8) I only seem to get a panic in release mode.

A git bisect shows that the abort started in commit 22d628. However, that commit doesn't seem to have any obviously unsound code, so I believe the UB was introduced earlier than that, and 22d628 is the first commit to show a symptom of UB.
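
The panic message in the release log suggests using `try_get` or `Option<&Name>` instead of the asserting `get`. As a rough illustration of that difference, here is a self-contained sketch that does not use flecs at all: `Names`, `EntityId`, and `joined_names` are hypothetical stand-ins, not hyperion or flecs types. If the missing component really is a symptom of UB, a fallible lookup like this would only make the symptom easier to log; it would not fix the underlying cause.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an ECS entity id (not the flecs type).
type EntityId = u64;

/// Toy component store, used only to illustrate the two lookup styles.
#[derive(Default)]
struct Names {
    by_entity: HashMap<EntityId, String>,
}

impl Names {
    /// Asserting accessor: panics when the component is missing,
    /// similar in spirit to the `get` call named in the panic message.
    fn get(&self, entity: EntityId) -> &str {
        self.by_entity
            .get(&entity)
            .map(String::as_str)
            .expect("Component `Name` not found")
    }

    /// Fallible accessor: returns `None` when the component is missing,
    /// so the caller decides how to react.
    fn try_get(&self, entity: EntityId) -> Option<&str> {
        self.by_entity.get(&entity).map(String::as_str)
    }
}

/// Splits the given entities into the names that were found and the
/// ids whose `Name` was missing, instead of panicking on the first gap.
fn joined_names(names: &Names, entities: &[EntityId]) -> (Vec<String>, Vec<EntityId>) {
    let mut found = Vec::new();
    let mut missing = Vec::new();
    for &entity in entities {
        match names.try_get(entity) {
            Some(name) => found.push(name.to_owned()),
            None => missing.push(entity),
        }
    }
    (found, missing)
}
```

Collecting the missing ids rather than dropping them silently keeps the information needed to investigate why an entity that is joining the world has no `Name` yet.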

@TestingPlant TestingPlant self-assigned this Dec 19, 2024
@TestingPlant
Collaborator Author

I'm currently working on this branch which replaces unsafe code with safe code to try to find which unsafe block is causing issues: https://github.com/TestingPlant/hyperion/tree/reduce-unsafe

@andrewgazelka
Member

btw we have https://github.com/hyperion-mc/hyperion/tree/main/tools/antithesis-bot

all machines have 48 cores so they should be able to run hundreds of bots. https://youtu.be/m3HwXlQPCEU?si=ZULB7Be7BVF58sK3

@TestingPlant TestingPlant added bug 🐞 Something isn't working prio 🚨 labels Jan 21, 2025
@TestingPlant TestingPlant added the help wanted 🫂 Extra attention is needed label Jan 25, 2025
@TestingPlant
Collaborator Author

I've been testing more in this branch: https://github.com/TestingPlant/hyperion/tree/reduce-unsafe-2

This crash still occurs with all the play c2s packet handling removed. I've looked through most of the unsafe blocks in this branch and most either:

  • don't seem unsound (at least to me)
  • would trigger assert_unsafe_precondition in debug mode if it were used in an unsound manner (such as unwrap_unchecked)
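
To illustrate the kind of substitution meant by replacing unsafe code with safe code, here is a minimal sketch of a checked replacement for an `unwrap_unchecked` call. Unlike the standard library's precondition check, which the comment above describes as firing in debug mode, this variant panics in every build profile, so a violated assumption shows up as a panic in release mode too. The names `unwrap_checked` and `connection_for` are hypothetical and do not come from the hyperion code base.

```rust
/// Hypothetical helper: a safe, always-checked replacement for
/// `Option::unwrap_unchecked`. It panics with a descriptive message
/// in both debug and release builds instead of invoking UB.
#[track_caller]
fn unwrap_checked<T>(value: Option<T>, what: &str) -> T {
    match value {
        Some(inner) => inner,
        None => panic!("precondition violated: {what} was None"),
    }
}

/// Hypothetical call site: looks up a connection id in a slot table.
/// An out-of-range index and an empty slot both count as a violated
/// precondition and panic at the caller's location.
fn connection_for(ids: &[Option<u64>], index: usize) -> u64 {
    unwrap_checked(ids.get(index).copied().flatten(), "connection id")
}
```

Because of `#[track_caller]`, the reported panic location is the call site of `unwrap_checked`, which helps narrow down which former unsafe block had its assumption broken.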

The server also aborts with a stack trace when using ctrl+c to exit it. I'm not sure if that's related.

I've also only tested this on aarch64 but not on x86_64. I haven't heard reports of this issue from anyone else so this crash might be specific to aarch64, although the bug is likely caused by UB which should be fixed anyways.

This could also possibly be an issue in flecs (either the C library or the Rust bindings) or with how we use flecs, since there are ways to use flecs in an unsound manner without unsafe.

@andrewgazelka
Member

andrewgazelka commented Jan 25, 2025

> I've been testing more in this branch: TestingPlant/hyperion@reduce-unsafe-2
>
> This crash still occurs with all the play c2s packet handling removed. I've looked through most of the unsafe blocks in this branch and most either:
>
>   • don't seem unsound (at least to me)
>   • would trigger assert_unsafe_precondition in debug mode if it were used in an unsound manner (such as unwrap_unchecked)
>
> The server also aborts with a stack trace when using ctrl+c to exit it. I'm not sure if that's related.
>
> I've also only tested this on aarch64 but not on x86_64. I haven't heard reports of this issue from anyone else so this crash might be specific to aarch64, although the bug is likely caused by UB which should be fixed anyways.
>
> This could also possibly be an issue in flecs (either the C library or the Rust bindings) or with how we use flecs, since there are ways to use flecs in an unsound manner without unsafe.

btw a lot of the work I do is on a macbook, which is aarch64 but is still slightly different from linux... but I haven't done a lot of stuff recently with this project.

2 participants