Issue1322 post exchange split revamp #1329
Merged
fix #1322
partially fix #1297
The post_exchangeSplit option was originally conceived as a means of scaling duplicate suppression beyond what a single process could support. Duplicates were defined as files with the same checksum, so splitting messages among exchanges by checksum made sense initially. A side effect of this method of load distribution is that the same file path can be assigned to different exchanges whenever its versions are not identical (i.e. do not have the same checksum).
As work on mirroring has proceeded, it has become clear that a file is really a stream of changes to a given path, and to make good decisions about it, a single process needs to receive all the updates for that path. If a file is modified and then removed, the removal event does not necessarily carry a checksum related to the modification, so checksum-based splitting can send the two events to different exchanges. Symbolic links have the same problem.
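To illustrate the difference, here is a minimal sketch contrasting the two keying strategies. It is not the post_exchangeSplit implementation: the helper names (`exchange_for`, `split_by_checksum`, `split_by_path`), the message dict keys (`path`, `checksum`), the MD5-modulo hash, and the `base` + two-digit exchange naming are all hypothetical, chosen only to show why keying on the path keeps every event for a file on one exchange.

```python
import hashlib


def exchange_for(key: str, base: str, n: int) -> str:
    """Map a string key onto one of n exchanges named base00 .. base{n-1}.

    Hypothetical naming and hashing scheme, for illustration only.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % n
    return f"{base}{index:02d}"


def split_by_checksum(msg: dict, base: str, n: int) -> str:
    # Old approach (sketch): key on the file's checksum.
    # A removal event may carry no checksum, so it falls back to "".
    return exchange_for(msg.get("checksum") or "", base, n)


def split_by_path(msg: dict, base: str, n: int) -> str:
    # New approach (sketch): key on the path, so every event for a
    # given path (create, modify, remove, link) lands on one exchange.
    return exchange_for(msg["path"], base, n)


# Hypothetical events for one path: two modifications and a removal.
modify1 = {"path": "/data/a/report.txt", "checksum": "3f2a9c0b"}
modify2 = {"path": "/data/a/report.txt", "checksum": "77e1d4aa"}
remove = {"path": "/data/a/report.txt", "checksum": None}

for event in (modify1, modify2, remove):
    print(split_by_checksum(event, "xs_", 4), split_by_path(event, "xs_", 4))
```

With path-based keying the second column is identical for all three events, while the checksum-based column depends on each event's checksum and may differ from event to event.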
Changing the hashing algorithm to be based on the path name rather than the checksum: