Issue1322 post exchange split revamp #1329

Merged: petersilva merged 2 commits into development from issue1322_post_exchangeSplit_revamp on Dec 6, 2024

petersilva (Contributor)

fix #1322
partially fix #1297

The post_exchangeSplit option was originally conceived as a way to scale duplicate suppression beyond what a single process could support. Duplicates were defined as files with the same checksum, so splitting messages among exchanges based on the checksum is what made sense initially. However, this method of load distribution will assign the same file path to different exchanges whenever the versions of the file are not identical (i.e. do not have the same checksum).
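As a rough illustration of the problem, here is a minimal sketch of checksum-based splitting. It does not reproduce sarracenia's actual code: the function name `exchange_by_checksum`, the `relPath`/`checksum` message fields, the `xs_test_` exchange prefix and the "last character modulo N" rule are all hypothetical. What matters is that the path is not an input, so two versions of one file can be routed to different exchanges.

```python
def exchange_by_checksum(base: str, checksum: str, split: int) -> str:
    """Pick one of `split` exchanges using only the file's checksum.

    Hypothetical rule: the index is the code point of the checksum's last
    character modulo `split`. The file path plays no part in the choice.
    """
    index = ord(checksum[-1]) % split
    return f"{base}{index:02d}"


# Two versions of the same path, with different contents (checksums).
v1 = {"relPath": "data/obs.csv", "checksum": "3f9a"}
v2 = {"relPath": "data/obs.csv", "checksum": "3f9b"}

for msg in (v1, v2):
    print(msg["relPath"], "->", exchange_by_checksum("xs_test_", msg["checksum"], 4))
# data/obs.csv -> xs_test_01
# data/obs.csv -> xs_test_02
```

Identical files (same checksum) always meet on one exchange, which is why this worked for duplicate suppression, but updates to a single path are scattered.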

As work with mirroring has proceeded, it has become clear that a file is really a stream of changes to a given path, and to make good decisions about it, a single process needs to receive all the updates for that path. If a file is modified and then removed, the removal event would not necessarily have a checksum related to the modification. The same applies to a symbolic link.
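A toy model can show why a process needs the whole stream for a path. This is a hypothetical sketch, not sarracenia code: events are reduced to `(path, action)` pairs with made-up action strings, and the "last event wins" reduction stands in for whatever decision logic a real mirroring flow applies.

```python
def final_state(events):
    """Reduce a stream of (path, action) events to the last action per path.

    Hypothetical model: actions are plain strings such as "modify", "link"
    or "remove"; a later event for a path supersedes earlier ones.
    """
    latest = {}
    for path, action in events:
        latest[path] = action
    return latest


stream = [("data/obs.csv", "modify"), ("data/obs.csv", "remove")]

# One process sees the whole stream and concludes the file is gone.
print(final_state(stream))        # {'data/obs.csv': 'remove'}

# Split across two processes, each sees only part of the stream and they
# reach contradictory conclusions about the same path.
print(final_state(stream[:1]))    # {'data/obs.csv': 'modify'}
print(final_state(stream[1:]))    # {'data/obs.csv': 'remove'}
```

With the events split, one participant might, for example, fetch a file that another has just removed from the mirror.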

Changing the hashing algorithm to be based on the path name, rather than the checksum, has the following effects:

  • If files have the same path and the same checksum, they are still duplicates, so duplicate suppression still works.
  • If files have the same path but different checksums, they are processed the same way as today, except that they are all handled by the same participant in the exchangeSplit pool of flows.
  • If events have the same path but a different type (link, directory, file, or remove event), they now show up at the same participant in the exchangeSplit pool of flows.
  • An unfortunate side effect of hashing on checksums was that it hampered the effectiveness of duplicate suppression: each participant only received a subset of the possible checksum values, which removed some hashes from its input and resulted in an unbalanced tree. Hashing on the path is likely to yield better, faster associative arrays for duplicate suppression within each participant in the exchangeSplit pool of flows.
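The path-based approach can be sketched the same way. Again this is a hypothetical illustration, not the merged implementation: `exchange_by_path`, the `relPath`/`event` fields and the `xs_test_` prefix are assumptions, and SHA-256 is just one stable hash that could be used.

```python
import hashlib


def exchange_by_path(base: str, rel_path: str, split: int) -> str:
    """Pick one of `split` exchanges from the file path alone (hypothetical sketch).

    hashlib is used instead of the built-in hash(), whose value for str is
    randomized per interpreter run and would differ between processes.
    """
    digest = hashlib.sha256(rel_path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % split
    return f"{base}{index:02d}"


events = [
    {"relPath": "data/obs.csv", "event": "modify", "checksum": "3f9a"},
    {"relPath": "data/obs.csv", "event": "modify", "checksum": "3f9b"},
    {"relPath": "data/obs.csv", "event": "link"},
    {"relPath": "data/obs.csv", "event": "remove"},
]

# Every event for the path lands on the same exchange, whatever its type or checksum.
targets = {exchange_by_path("xs_test_", e["relPath"], 4) for e in events}
print(targets)
```

Because true duplicates share a path as well as a checksum, they still meet at one participant, so per-participant duplicate suppression keeps working.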


github-actions bot commented Dec 5, 2024

Test Results

238 tests: 235 passed ✅, 1 skipped 💤, 2 failed ❌ (1 suite, 1 file, 1m 26s ⏱️)

For more details on these failures, see this check.

Results for commit 2d7466b.

reidsunderland (Member) left a comment


This looks good. It's a draft, so I'm not sure if I was supposed to look at it yet.

petersilva marked this pull request as ready for review on December 6, 2024 00:44
petersilva merged commit cf36775 into development on Dec 6, 2024
32 of 59 checks passed
petersilva deleted the issue1322_post_exchangeSplit_revamp branch on December 13, 2024 22:07