mirror of https://github.com/borgbackup/borg.git synced 2025-01-20 14:29:25 +00:00

be clear about what buzhash is used for, fixes #2390

and what it is not used for (deduplication).

also say already in the readme that we use a cryptohash
for dedupe, so people don't worry.
Thomas Waldmann 2017-04-25 23:38:55 +02:00
parent 6f47b797f9
commit bf69b049e9
2 changed files with 12 additions and 0 deletions


@@ -27,6 +27,10 @@ Main features
of bytes stored: each file is split into a number of variable length chunks
and only chunks that have never been seen before are added to the repository.
A chunk is considered duplicate if its id_hash value is identical.
A cryptographically strong hash or MAC function is used as id_hash, e.g.
(hmac-)sha256.
To deduplicate, all the chunks in the same repository are considered, no
matter whether they come from different machines, from previous backups,
from the same backup or even from the same single file.
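
The deduplication rule described in this hunk can be sketched in Python as follows. This is only an illustration under stated assumptions, not Borg's implementation: the names ``id_hash``, ``store_chunks`` and the dict-based ``repo`` are hypothetical, and the choice between plain SHA-256 and HMAC-SHA-256 is modeled simply by passing an optional key.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional


def id_hash(data: bytes, key: Optional[bytes] = None) -> bytes:
    """Hypothetical chunk id: HMAC-SHA256 if a key is given, else SHA256."""
    if key is None:
        return hashlib.sha256(data).digest()
    return hmac.new(key, data, hashlib.sha256).digest()


def store_chunks(chunks: Iterable[bytes], repo: Dict[bytes, bytes],
                 key: Optional[bytes] = None) -> int:
    """Add only never-seen chunks to repo (id -> data); return how many were added."""
    added = 0
    for chunk in chunks:
        chunk_id = id_hash(chunk, key)
        if chunk_id not in repo:  # identical id_hash value: treated as a duplicate
            repo[chunk_id] = chunk
            added += 1
    return added
```

Because the lookup is keyed only on the id, a chunk is skipped no matter which file, backup or machine produced it first, as long as it goes into the same ``repo``.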


@@ -69,6 +69,9 @@ Normally the keys are computed like this::
The id_hash function depends on the :ref:`encryption mode <borg_init>`.
As the id / key is used for deduplication, id_hash must be a cryptographically
strong hash or MAC.
Segments
~~~~~~~~
@@ -243,6 +246,11 @@ The |project_name| chunker uses a rolling hash computed by the Buzhash_ algorith
It triggers (chunks) when the last HASH_MASK_BITS bits of the hash are zero,
producing chunks of 2^HASH_MASK_BITS Bytes on average.
Buzhash is **only** used for cutting the chunks at places defined by the
content; the buzhash value is **not** used as the deduplication criterion
(for that, we use a cryptographically strong hash/MAC over the chunk
contents, the id_hash).
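
To make the role of the rolling hash concrete, here is a minimal Python sketch of content-defined cutting with a buzhash-style (cyclic polynomial) rolling hash. It is not Borg's chunker: the seeded random table, the function names and the small default parameters (``mask_bits``, ``window``, ``min_size``, ``max_size``) are assumptions chosen for illustration. The hash only decides where a chunk ends; it plays no part in deciding whether a chunk is a duplicate.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def make_table(seed=0):
    """Hypothetical byte -> 32-bit value table (not Borg's table)."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def cut_points(data, mask_bits=6, window=16, min_size=32, max_size=1024, table=None):
    """Return the end offsets of the chunks found in data."""
    table = table or make_table()
    mask = (1 << mask_bits) - 1
    cuts = []
    start = 0  # offset where the current chunk begins
    h = 0      # rolling hash, restarted at every chunk start
    for i, b in enumerate(data):
        pos = i - start
        h = rotl32(h, 1) ^ table[b]  # bring the new byte into the window
        if pos >= window:
            # drop the byte that just left the window
            h ^= rotl32(table[data[i - window]], window)
        size = pos + 1
        triggered = size >= min_size and size >= window and (h & mask) == 0
        if triggered or size >= max_size:
            cuts.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        cuts.append(len(data))  # the remainder forms the last chunk
    return cuts
```

Slicing ``data`` at the returned offsets yields the chunks; each chunk would then be identified by the id_hash over its contents, not by the rolling hash value that triggered the cut.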
``borg create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE``
can be used to tune the chunker parameters, the default is: