tutorial/lesson: deduplication and chunking algorithms #22

Open

flyingzumwalt opened this issue May 19, 2017 · 1 comment

Labels

requested tutorial/lesson

Collaborator

flyingzumwalt commented May 19, 2017

Explain how content-addressed tools like IPFS are able to deduplicate content by breaking it into chunks.

- different senses of deduplication:
  - deduplication of unchanged portions of a file/dataset when you're storing a series of versions of that file/dataset
  - deduplication of repeated content that appears multiple times in a corpus (i.e. the same file in 2 places)
- chunking algorithms, in principle
- Rabin fingerprinting vs. fixed-size chunks
- the ideal: match the chunking algorithm to the content type
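
A rough sketch of what the lesson could demonstrate, for discussion only. It is not how IPFS implements chunking: the names `fixed_size_chunks`, `content_defined_chunks`, and `store` are hypothetical, the parameters are arbitrary, and the rolling hash is a simplified polynomial hash standing in for real Rabin fingerprinting. It stores two versions of a file (the second has one byte inserted at the front) in a hash-keyed block store and counts how many chunks of the second version were already present.

```python
import hashlib
import random


def fixed_size_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 16) -> list[bytes]:
    """Cut where a rolling hash of the last `window` bytes matches a bit pattern.

    Assumption: a plain polynomial rolling hash, not true Rabin fingerprinting.
    """
    base, mod = 257, (1 << 31) - 1
    drop = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * drop) % mod  # remove the oldest byte
        h = (h * base + byte) % mod                  # add the newest byte
        long_enough = i + 1 - start >= min_size and i + 1 >= window
        if long_enough and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def store(blocks: dict, data: bytes, chunker) -> list[str]:
    """Add each chunk to a content-addressed store; return its chunk hashes."""
    keys = []
    for chunk in chunker(data):
        key = hashlib.sha256(chunk).hexdigest()
        blocks[key] = chunk  # identical chunks share a key, so they are stored once
        keys.append(key)
    return keys


rng = random.Random(0)
v1 = bytes(rng.getrandbits(8) for _ in range(4096))
v2 = b"!" + v1  # second version: one byte inserted at the front

for name, chunker in [("fixed-size", fixed_size_chunks),
                      ("content-defined", content_defined_chunks)]:
    blocks = {}
    k1 = store(blocks, v1, chunker)
    k2 = store(blocks, v2, chunker)
    shared = len(set(k1) & set(k2))
    print(f"{name}: {shared} of {len(k2)} chunks in v2 were already stored")
```

The point to draw out: with fixed-size chunks the inserted byte shifts every later boundary, so almost nothing is reused, while content-defined boundaries depend only on nearby bytes and line up again shortly after the edit. Storing the exact same file twice deduplicates fully with either chunker.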

flyingzumwalt added the requested tutorial/lesson label

Collaborator Author

flyingzumwalt commented May 20, 2017 (edited)

Could reuse the diagrams from this: https://github.com/swadeshi/decentralized-web-primer/blob/gh-pages/presentations/Replication%20Patterns.pdf

(I made some of these slides into animations):

How Bittorrent Replicates Files (video)
Replicating Changes using Hash Trees (video)
