This repository has been archived by the owner on May 25, 2021. It is now read-only.

Improve compaction task status updates #241

Open
wants to merge 1 commit into
base: master
from

Member

@davisp davisp commented Mar 30, 2017

Previously, the emsort-related operations did not update the compaction
task status. For large databases this leads to some very long waits
while the compaction task stays at 100%. This change adds progress
reports to the steps that sort document ids and copy them back into the
database file.


emsort_cb(_Ems, {merge, chain}, {init, Copied, Nodes}) ->
    {init, Copied, Nodes + 1};
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) when Copied >= 1000 ->
Contributor

Magic constant 1000.

Member Author

I assume you mean you'd rather that be a define? I just copied the same shape as merge_docids down below.

Contributor

Yes, I mean that it would be better to have a define, like ?UPDATE_FREQ or ?BATCH_SIZE or something.
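
A minimal sketch of what that suggestion could look like. The macro name ?UPDATE_FREQ comes from the comment above; the module name, the reset-to-zero body, and the absence of any real progress reporting are assumptions made for illustration, not the code in this PR.

```erlang
-module(emsort_progress_sketch).
-export([emsort_cb/3]).

%% Hypothetical macro name taken from the review suggestion. The value
%% 1000 mirrors the literal in the clause under review.
-define(UPDATE_FREQ, 1000).

%% Once the batch counter reaches the threshold, reset it. A real
%% callback would also report progress here; this sketch does not.
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) when Copied >= ?UPDATE_FREQ ->
    {init, 0, Nodes};
%% Otherwise count one more copied row.
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) ->
    {init, Copied + 1, Nodes}.
```

With a define, the threshold is named once and can be changed in one place if several clauses share it.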

Member Author

davisp commented Mar 30, 2017

Also, I need to change my use of DocCount to TotalChanges. While testing the optimizations I tried, I noticed that it's currently not correct during compaction retries.

Contributor

@iilyak iilyak left a comment

It looks good, but I have a few questions.

]),
0;

emsort_cb(_Ems, row_copy, Copied) when is_integer(Copied), Copied > 1000 ->
Contributor

Magic constant 1000.

    {init, 0, Nodes};
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) ->
    {init, Copied + 1, Nodes};
emsort_cb(Ems, {merge_start, reverse}, {init, Copied, Nodes}) ->
Contributor

I noticed that we can also get a {merge_start, forward} event. Should we handle it as well?

Contributor

It looks like we wouldn't have progress updates for the second pass. Is that intentional?

@asfgit asfgit force-pushed the feat-improve-compaction-task-status branch from 1db1337 to ae00a5a Compare March 31, 2017 15:57
Member Author

davisp commented Mar 31, 2017

Updated to use defines for batch sizes and fixed the changes/doc count mixup.

As for {merge_start, forward}, that's only there for completeness. It felt a bit weird not to include that event even though I don't necessarily need it for this work.

Updates during the second phase happen here:

https://github.com/apache/couchdb-couch/pull/241/files#diff-f6f654ab26b490bab95be6f502c49d89R1376

This is a bit subtle, but it seemed like the best I could do without starting to modify the on-disk contents of files, which I like to avoid when at all possible.

It's a bit funky, but without storing the total number of rows in emsort, the best approach I had was to guess, using the total number of changes processed in this run as the input, and then just count how many rows we have during the first phase of the merge sort. Once that first pass is over we can calculate the total number of rows that will be copied and then update progress in the normal fashion. It's a bit awkward, but emsort_cb effectively switches modes once it sees that the first phase has ended, and from then on it's purely listening for row_copy events.
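
The mode switch described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the callback in this PR: the module name, the `first_pass_done` event, the state tuples, and the one-row-per-event total are all hypothetical, and the real calculation of the total may differ.

```erlang
-module(emsort_two_phase_sketch).
-export([new/0, event/2, progress/1]).

%% Phase 1 state: {counting, Rows}
%% Phase 2 state: {copying, Copied, Total}
new() ->
    {counting, 0}.

%% Phase 1: the total is unknown, so just count rows as they go by.
event(row_copy, {counting, Rows}) ->
    {counting, Rows + 1};
%% Hypothetical marker for the end of the first pass: the counted rows
%% become the total and the callback switches modes.
event(first_pass_done, {counting, Rows}) ->
    {copying, 0, Rows};
%% Phase 2: only row_copy events matter from here on.
event(row_copy, {copying, Copied, Total}) ->
    {copying, Copied + 1, Total};
%% Ignore everything else.
event(_Other, State) ->
    State.

%% Percentage is only meaningful once the total is known.
progress({counting, _Rows}) ->
    unknown;
progress({copying, _Copied, 0}) ->
    100;
progress({copying, Copied, Total}) ->
    (Copied * 100) div Total.
```

Keeping the count in callback state, rather than in emsort itself, is what avoids touching the on-disk format in this sketch.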

Member Author

davisp commented Mar 31, 2017

And I hesitate to store the total number of rows because there'd then be upgrade things to worry about and generally speaking this will just sort itself out while still giving an approximate progress update.

@hubot hubot deleted the feat-improve-compaction-task-status branch April 28, 2017 15:46
@hubot hubot restored the feat-improve-compaction-task-status branch April 28, 2017 20:41
2 participants