Index out of range in ghc-events show and dependencies of ghc-events #109
I think we're failing in
Running ghc-events on an eventlog from the reproducer triggers an assertion failure at ghc-events/src/GHC/RTS/Events/Binary.hs, lines 758 to 765 (commit 2168f61).
It seems like the payloadLen is wrong for certain EVENT_PROF_SAMPLE_COST_CENTRE events.
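For context, here is a simplified sketch of the kind of consistency check involved. Variable-sized eventlog events carry their payload length up front, so a decoder can compare that length against what the event's own fields imply. The layout below (a big-endian Word16 length, a Word8 stack depth, then one Word8 per stack entry) and all names in it are made up for illustration. It is not the real ghc-events decoder or the exact EVENT_PROF_SAMPLE_COST_CENTRE encoding.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word8)

-- Hypothetical simplified record: stack depth plus that many stack entries.
data Sample = Sample
  { sampleDepth :: Word8
  , sampleStack :: [Word8]
  }
  deriving (Eq, Show)

-- Read a big-endian Word16 from the front of the input.
getWord16be :: [Word8] -> Maybe (Word16, [Word8])
getWord16be (hi : lo : rest) =
  Just ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo, rest)
getWord16be _ = Nothing

-- Decode one record and cross-check the declared payload length
-- against the stack depth found inside the payload.
parseSample :: [Word8] -> Either String (Sample, [Word8])
parseSample input = do
  (len, rest) <- maybe (Left "truncated length field") Right (getWord16be input)
  let n = fromIntegral len :: Int
      (payload, remaining) = splitAt n rest
  if length payload /= n
    then Left "input ends before the declared payload does"
    else Right ()
  case payload of
    [] -> Left "empty payload"
    (depth : entries)
      | length entries /= fromIntegral depth ->
          Left ("payload length " ++ show len ++ " disagrees with stack depth " ++ show depth)
      | otherwise -> Right (Sample depth entries, remaining)
```

A record whose depth byte says 5 while its declared length only leaves room for two entries is rejected, instead of the decoder silently consuming bytes that belong to the next event.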
Yup, the last four parses before it blows up look like this:
evSpec: HeapProfSampleCostCentre {heapProfId = 0, heapProfResidency = 912, heapProfStackDepth = 1, heapProfStack = [78]}, etRef: 163
evSpec: ProfSampleCostCentre {profCapset = 369098752, profTicks = 4333765376, profStackDepth = 0, profCcsStack = []}, etRef: 167
evSpec: HeapProfSampleString {heapProfId = 0, heapProfResidency = 2029359963648, heapProfLabel = "\SO"}, etRef: 164
evSpec: CreateThread {thread = 22784}, etRef: 0
It looks like the parse of
@TeofilC, is it possible that the CCS stack is too deep for the Word8 profStackDepth field? And then it overflows and we don't read the
That doesn't seem to be it; you're right, the
Yes, on the GHC side we truncate to 255, which should be fine.
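The overflow hypothesis can be illustrated in isolation. Narrowing an Int depth to Word8 with fromIntegral wraps modulo 256, so a depth of 300 would be written as 44. If the writer still emitted all 300 entries, a decoder would then misread the rest of the event. Clamping the depth and dropping the excess entries keeps the header and the payload consistent. The helper names below are hypothetical; this is a sketch, not the RTS code.

```haskell
import Data.Word (Word8)

-- Narrowing with fromIntegral wraps modulo 256: 300 becomes 44.
wrapDepth :: Int -> Word8
wrapDepth = fromIntegral

-- Clamping first saturates at the largest depth a Word8 can hold.
clampDepth :: Int -> Word8
clampDepth n = fromIntegral (max 0 (min 255 n))

-- Keep at most 255 stack entries so that the depth field and the
-- number of entries actually written always agree.
truncateStack :: [a] -> (Word8, [a])
truncateStack ccs = (fromIntegral (length kept), kept)
  where
    kept = take 255 ccs
```

Since the GHC side already truncates to 255, this kind of overflow should not occur in practice, which fits the comments ruling it out.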
So, as far as I understand, this means that either the parsing of the header on the
Strangely enough, I can't seem to reproduce this with
Also, don't they acquire the global eventBuf lock before writing?
I also can't reproduce it if I only do heap profiling or only time profiling with the non-threaded RTS, so it seems we need to be doing both with the non-threaded RTS. I think it's highly likely that we somehow try to write a heap sample and a time sample to the eventlog at the same time.
Ah, but that lock only exists for the threaded RTS.
So -threaded safeguards against that because it does proper locking, while the non-threaded RTS doesn't but is still somehow concurrent? That's weird.
What seems to be happening is:
So we end up with something garbled. This story is backed up by adding a bunch of traces to the eventlog printing functions in the RTS. This is the order of events they suggest
So there must be a context switch somewhere in
Is it possible that this is happening because the time profiler is running asynchronously? (see
So maybe it would work if we just kept the
@TeofilC, I don't think your observation holds in general: I was going to try whether I can completely circumvent the issue by using
Is there perhaps a similar issue where the initialisation events (cost centre definitions) are being posted to the output and that is interrupted by a heap profile event before all of the definitions are dumped? Does it happen if you are not using a profiled executable? (i.e., don't compile with
Another thing to try is a longer profiling interval (
Interesting, @MangoIV. It sounds like there are potentially multiple bugs. In your larger example, maybe you could try to find the last few events before the eventlog gets corrupted; that might help pinpoint which event is going wrong.
I've written up the bug we found here: https://gitlab.haskell.org/ghc/ghc/-/issues/25165
This seems to confirm what @TeofilC suggested about time profiling events interrupting the writing of other events and leading to corruption.
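A toy, single-threaded model of that failure mode: one event is copied into a shared buffer byte by byte, and a timer tick fires partway through and writes its own event. Without any guard, the tick's bytes land in the middle of the first event. A handler that checks a busy flag and postpones its event keeps every event contiguous. EventBuf, writeEvent, unsafeTick and deferringTick are hypothetical names. This only illustrates the interleaving and is not how the non-threaded RTS is implemented, nor necessarily how the GHC issue above is fixed.

```haskell
import Control.Monad (forM_, when)
import Data.IORef

-- Hypothetical event buffer shared by normal code and a timer-tick handler.
data EventBuf = EventBuf
  { bufBytes    :: IORef [Int]   -- bytes written so far, in order
  , bufBusy     :: IORef Bool    -- True while an event is mid-write
  , bufDeferred :: IORef [[Int]] -- events postponed by the tick handler (newest first)
  }

newBuf :: IO EventBuf
newBuf = EventBuf <$> newIORef [] <*> newIORef False <*> newIORef []

emitByte :: EventBuf -> Int -> IO ()
emitByte b x = modifyIORef' (bufBytes b) (++ [x])

-- Write an event byte by byte, running `tick` right after byte number k
-- to simulate the profiling timer interrupting the write.
writeEvent :: EventBuf -> [Int] -> Int -> IO () -> IO ()
writeEvent b ev k tick = do
  writeIORef (bufBusy b) True
  forM_ (zip [1 ..] ev) $ \(i, x) -> do
    emitByte b x
    when (i == k) tick
  writeIORef (bufBusy b) False
  -- Flush anything the tick handler postponed, oldest first.
  pending <- readIORef (bufDeferred b)
  writeIORef (bufDeferred b) []
  forM_ (reverse pending) (mapM_ (emitByte b))

-- Unguarded handler: writes straight into the buffer, even mid-event.
unsafeTick :: EventBuf -> [Int] -> IO ()
unsafeTick b = mapM_ (emitByte b)

-- Guarded handler: postpones its event while another one is being written.
deferringTick :: EventBuf -> [Int] -> IO ()
deferringTick b ev = do
  busy <- readIORef (bufBusy b)
  if busy
    then modifyIORef' (bufDeferred b) (ev :)
    else mapM_ (emitByte b) ev

-- Write a 4-byte "heap sample" [1,1,1,1]; a "time sample" [2,2] fires after byte 2.
runWith :: (EventBuf -> [Int] -> IO ()) -> IO [Int]
runWith handler = do
  b <- newBuf
  writeEvent b [1, 1, 1, 1] 2 (handler b [2, 2])
  readIORef (bufBytes b)
```

With unsafeTick the buffer ends up as [1,1,2,2,1,1]: the start of one event, then the interrupting event, then the rest of the first one. That is the kind of garbled stream the parses earlier in the thread suggest. deferringTick yields [1,1,1,1,2,2].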
It also happens when only using
Hi! I have had a couple of problems with eventlog2html and hs-speedscope recently, and they seem to be caused either by the library or by the eventlog that the GHC RTS emits. I can more or less reliably reproduce this error (sometimes it doesn't happen) with the following program:
Compile with
ghc -rtsopts -prof -fprof-late -O0 ./bla.hs
Run with
./bla +RTS -hc -p -l-au
I have not tried to further reduce the example.
This happens with GHC 9.6, 9.8 and 9.10 at least, on ghc-events HEAD according to @TeofilC, and at least on ghc-events 0.19.0.1. It affects not only ghc-events but also hs-speedscope and eventlog2html.
Related issue on the eventlog2html issue tracker: mpickering/eventlog2html#136