-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bounded memory sorting #32
Comments
Sorry for the delayed response. I haven't looked into the patch yet but in general I think external sorting is a useful addition to the API. |
I recently had a similar problem. Loading a 1.3G eventlog uses more than 32G memory. Profiling output shows that most allocations are done by |
Indeed, I came here to suggest precisely @facundominguez's patch. Sorting via secondary storage would be great; I routinely run into issues like this when processing eventlogs. |
This is addressed by #61. |
I feel like it should be possible to read an eventlog in sorted order with constant memory overhead and without an intermediate file. Does that sound reasonable? If so, I can try to implement it at some point when I have time |
Hi @TeofilC. That was a long time ago, so perhaps it's changed, but I remember fixing bugs in Threadscope that came from assuming evens for a single capability come sorted. If they still don't, I think the perturbation is probably minimal, one or two positions, but it's worth checking on a few different computers (slow, fast, under load, etc.) and, if needed. be prepared to react. OTOH, perhaps memory deceives me and it was events for threads, not HECs, coming out of order or events that are expected not arriving at all. |
I never got around to working on this. Since then my thinking has shifted. I'm not really sure if eventlogs need to be sorted at all. I think most use cases are better solved by having some sort of index of how the different Capabilities' streams are interleaved. This can be produced without having to modify the eventlog at all and allow us to efficiently read the events in sorted order. |
While writing an analysis tool, the need for sorting the events in an eventlog file arose. The problem with the existing
GHC.RTS.Events.sortEvents
is that it is not memory bounded, leading to heap exhaustion on large inputs.How about we provide a memory bounded version?
I ended up writing something similar to this. It splits events per capability to different temporary files, and then merges the events back into a single list.
I could submit a PR if it looks useful to others.
This version is a bit wasteful in that it has to deserialize and serialize all the events, where for the sake of splitting the events, only the capability number and the encoded form of the event would be needed.
Another odd aspect is that I couldn't make it stream when using Data.ByteString.Builder to write the events to the temporary files. Heap would be exhausted the same.
The text was updated successfully, but these errors were encountered: