
docs/data-structures: tie CDC back into dedup rationale

enkore 2021-11-27 18:45:19 +00:00 committed by GitHub
parent 5b297849d3
commit 79cb4e43e5


@@ -626,7 +626,11 @@ The idea of content-defined chunking is assigning every byte where a
 cut *could* be placed a hash. The hash is based on some number of bytes
 (the window size) before the byte in question. Chunks are cut
 where the hash satisfies some condition
-(usually "n numbers of trailing/leading zeroes").
+(usually "n number of trailing/leading zeroes"). This causes chunks to be cut
+in the same location relative to the file's contents, even if bytes are inserted
+or removed before/after a cut, as long as the bytes within the window stay the same.
+This results in a high chance that a single cluster of changes to a file will only
+result in one or two new chunks, aiding deduplication.
 Using normal hash functions this would be extremely slow,
 requiring hashing ``window size * file size`` bytes.
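
For illustration, the following is a minimal content-defined chunking sketch in Python. It is not Borg's actual chunker (Borg uses a seeded buzhash rolling hash implemented in C, with minimum/maximum chunk sizes on top); the window size, mask width, and fixed table seed below are assumptions chosen only to demonstrate the rolling hash and the "trailing zeroes" cut condition::

    import random

    WINDOW_SIZE = 48   # bytes in the rolling window (illustrative choice)
    MASK_BITS = 12     # cut where the low 12 bits are zero -> ~4 KiB average chunks

    # Fixed-seed table mapping each byte value to a random 32-bit value.
    _rng = random.Random(0)
    _TABLE = [_rng.getrandbits(32) for _ in range(256)]

    def _rotl(x: int, n: int) -> int:
        """Rotate a 32-bit value left by n bits."""
        n %= 32
        return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

    def cut_points(data: bytes) -> list[int]:
        """Offsets where a chunk ends: positions whose rolling hash over the
        previous WINDOW_SIZE bytes has MASK_BITS trailing zero bits."""
        mask = (1 << MASK_BITS) - 1
        h = 0
        cuts = []
        for i, b in enumerate(data):
            h = _rotl(h, 1) ^ _TABLE[b]       # slide window: byte i enters
            if i >= WINDOW_SIZE:
                # byte i - WINDOW_SIZE leaves; its table entry has been
                # rotated WINDOW_SIZE times by now, so rotate before removing
                h ^= _rotl(_TABLE[data[i - WINDOW_SIZE]], WINDOW_SIZE)
            if i + 1 >= WINDOW_SIZE and (h & mask) == 0:
                cuts.append(i + 1)            # cut just after byte i
        return cuts

    if __name__ == "__main__":
        rng = random.Random(42)
        data = bytes(rng.getrandbits(8) for _ in range(200_000))
        edited = data[:100_000] + b"XYZ" + data[100_000:]  # insert 3 bytes mid-file

        def chunks(d: bytes) -> set[bytes]:
            bounds = [0] + cut_points(d) + [len(d)]
            return {d[a:b] for a, b in zip(bounds, bounds[1:])}

        old, new = chunks(data), chunks(edited)
        print(f"{len(old & new)} of {len(new)} chunks reused after the insert")

On random input this typically reports all but one or two chunks as reused, which is the deduplication behaviour the added paragraph describes: once the window has slid past the edit, every later position sees the same window contents as before, so the later cuts land at the same content-relative positions. Each byte is processed in O(1) rather than by rehashing the whole window, which is how rolling hashes avoid the ``window size * file size`` cost mentioned above.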