Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic offset tracking for stream queues #661

Open
wants to merge 68 commits into
base: main
Choose a base branch
from

Conversation

viktorerlingsson
Copy link
Member

@viktorerlingsson viktorerlingsson commented Apr 17, 2024

WHAT is this pull request doing?

Adds broker tracking of consumer offsets in streams if no x-stream-offset is provided by the consumer. Does not track if the consumer tag is generated by the broker.

✅ When to run cleanup_consumer_offsets?
✅ IndexError when trying to cleanup if msg_size => segment_size

HOW can this pull request be tested?

Run specs

@viktorerlingsson viktorerlingsson force-pushed the streams_automatic_offset_tracking branch 2 times, most recently from 2ecc8d3 to e00b4e5 Compare April 23, 2024 13:56
src/lavinmq/mfile.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson viktorerlingsson linked an issue May 14, 2024 that may be closed by this pull request
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson viktorerlingsson force-pushed the streams_automatic_offset_tracking branch from 194926e to e58ad9b Compare May 23, 2024 14:38
@viktorerlingsson viktorerlingsson marked this pull request as ready for review June 3, 2024 08:09
src/lavinmq/client/channel/stream_consumer.cr Outdated Show resolved Hide resolved
spec/stream_queue_spec.cr Show resolved Hide resolved
Copy link
Member

@carlhoerberg carlhoerberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with consumer_offset_capacity is reached? MFile doesn't auto expand.

src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/client/channel/stream_consumer.cr Outdated Show resolved Hide resolved
src/lavinmq/mfile.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson
Copy link
Member Author

viktorerlingsson commented Jun 17, 2024

What happens with consumer_offset_capacity is reached? MFile doesn't auto expand.

Fixed here

src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson viktorerlingsson force-pushed the streams_automatic_offset_tracking branch from d52b6f5 to a74d4a6 Compare June 19, 2024 13:39
Copy link
Member

@carlhoerberg carlhoerberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think about replication of the consumer offset file.

src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson
Copy link
Member Author

@carlhoerberg Performance seems pretty similar with both solutions, maybe even slightly quicker with append-only (I increased file size to Config.instance.segment_size to make sure we don't have to constantly GC). So I think we can go with that solution 👍

@carlhoerberg
Copy link
Member

Cool! Havent read all code but looks like the file is still expanded? Is that nessecary? Shouldnt a GC be issued instead? The file size should be a minimum segment size yes, but larger if there are many consumers we are tracking for, so calc the size of the file after GC and the capacity should be a multiple of that, say 1000-10000 larger or so, so that the offset of each consumer could be updated that many times without causing a new GC.

@viktorerlingsson
Copy link
Member Author

Cool! Havent read all code but looks like the file is still expanded? Is that nessecary? Shouldnt a GC be issued instead? The file size should be a minimum segment size yes, but larger if there are many consumers we are tracking for, so calc the size of the file after GC and the capacity should be a multiple of that, say 1000-10000 larger or so, so that the offset of each consumer could be updated that many times without causing a new GC.

Yeah, it will try to compact the file first (only keep the latest offset for each ctag). If the file is still full after that (i.e. there was nothing to compact) it will be expanded. Probably safer to set the capacity to a multiple of tracked ctags as you say though. Will look at that tomorrow!

        cleanup_consumer_offsets if consumer_offset_file_full?(consumer_tag)
        expand_consumer_offset_file if consumer_offset_file_full?(consumer_tag)

@viktorerlingsson
Copy link
Member Author

Cool! Havent read all code but looks like the file is still expanded? Is that nessecary? Shouldnt a GC be issued instead? The file size should be a minimum segment size yes, but larger if there are many consumers we are tracking for, so calc the size of the file after GC and the capacity should be a multiple of that, say 1000-10000 larger or so, so that the offset of each consumer could be updated that many times without causing a new GC.

@carlhoerberg Added this. Do we need to set some form of max size for the offsets file? capacity could in theory be very large if there are many consumers (and/or if ctags are long).

@viktorerlingsson
Copy link
Member Author

@carlhoerberg added some replication for the consumer offsets file. Not really sure what the best approach is for when we replace the file during compaction though. Just replace_file after we've done the replace? Or do we also want to stream writing to a temp file and then replace the current (and then deleting temp file)? We don't have support for renaming files in replication now I guess

@carlhoerberg
Copy link
Member

Ah, well spotted, can implement replace file on the client side with write-to-tempfile/rename in another PR

@carlhoerberg
Copy link
Member

Should revive this

@carlhoerberg carlhoerberg added this to the v2.1 milestone Nov 1, 2024
viktorerlingsson and others added 28 commits November 25, 2024 16:44
Co-authored-by: Carl Hörberg <[email protected]>
@viktorerlingsson viktorerlingsson marked this pull request as ready for review December 3, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stream queues: Automatic offset tracking
3 participants