docs/data-structures: tie CDC back into dedup rationale
parent 5b297849d3
commit 79cb4e43e5
1 changed file with 5 additions and 1 deletion
@@ -626,7 +626,11 @@ The idea of content-defined chunking is assigning every byte where a
 cut *could* be placed a hash. The hash is based on some number of bytes
 (the window size) before the byte in question. Chunks are cut
 where the hash satisfies some condition
-(usually "n numbers of trailing/leading zeroes").
+(usually "n numbers of trailing/leading zeroes"). This causes chunks to be cut
+in the same location relative to the file's contents, even if bytes are inserted
+or removed before/after a cut, as long as the bytes within the window stay the same.
+This results in a high chance that a single cluster of changes to a file will only
+result in 1-2 new chunks, aiding deduplication.
 
 Using normal hash functions this would be extremely slow,
 requiring hashing ``window size * file size`` bytes.
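
For illustration, a minimal Python sketch of a content-defined chunker along these lines. All names, the window size, and the mask are made up for the example; Borg's actual chunker is a buzhash-based rolling hash implemented in C, with a randomized table and minimum/maximum chunk sizes.

# Naive content-defined chunker sketch (hypothetical, not Borg's implementation).

WINDOW_SIZE = 48            # bytes considered before each candidate cut point
CHUNK_MASK = (1 << 12) - 1  # cut where the low 12 bits of the hash are zero,
                            # giving roughly 4 KiB average chunks

def window_hash(window: bytes) -> int:
    # Rehashes the whole window for every candidate cut point -- this is the
    # slow path described above: it touches window size * file size bytes.
    h = 0
    for b in window:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h

def chunk(data: bytes):
    # Yield chunks, cutting wherever the hash over the preceding window
    # satisfies the condition (low bits all zero).
    start = 0
    for i in range(WINDOW_SIZE, len(data)):
        if window_hash(data[i - WINDOW_SIZE:i]) & CHUNK_MASK == 0:
            yield data[start:i]
            start = i
    if start < len(data):
        yield data[start:]

A rolling hash avoids the cost shown in window_hash by updating the hash in O(1) per byte as the window slides, which is why chunkers use hash functions designed for that (such as buzhash) rather than general-purpose ones.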