From bf69b049e9dca6d66e5a30b7addf70aea86510e8 Mon Sep 17 00:00:00 2001
From: Thomas Waldmann
Date: Tue, 25 Apr 2017 23:38:55 +0200
Subject: [PATCH 1/2] be clear about what buzhash is used for, fixes #2390

and what it is not used for (deduplication).

also say already in the readme that we use a cryptohash for dedupe,
so people don't worry.
---
 README.rst                         | 4 ++++
 docs/internals/data-structures.rst | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/README.rst b/README.rst
index 41765b80f..ba7c735f0 100644
--- a/README.rst
+++ b/README.rst
@@ -27,6 +27,10 @@ Main features
   of bytes stored: each file is split into a number of variable length chunks
   and only chunks that have never been seen before are added to the repository.

+  A chunk is considered duplicate if its id_hash value is identical.
+  A cryptographically strong hash or MAC function is used as id_hash, e.g.
+  (hmac-)sha256.
+
   To deduplicate, all the chunks in the same repository are considered, no
   matter whether they come from different machines, from previous backups,
   from the same backup or even from the same single file.
diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index a76f14e1b..339338a37 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -69,6 +69,9 @@ Normally the keys are computed like this::

 The id_hash function depends on the :ref:`encryption mode `.

+As the id / key is used for deduplication, id_hash must be a cryptographically
+strong hash or MAC.
+
 Segments
 ~~~~~~~~

@@ -243,6 +246,11 @@ The |project_name| chunker uses a rolling hash computed by the Buzhash_ algorith
 It triggers (chunks) when the last HASH_MASK_BITS bits of the hash are zero,
 producing chunks of 2^HASH_MASK_BITS Bytes on average.

+Buzhash is **only** used for cutting the chunks at places defined by the
+content, the buzhash value is **not** used as the deduplication criterion (we
+use a cryptographically strong hash/MAC over the chunk contents for this, the
+id_hash).
+
 ``borg create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE``
 can be used to tune the chunker parameters, the default is:

From ba20d8d1310139c2e781544c25a77428c223cbd0 Mon Sep 17 00:00:00 2001
From: Thomas Waldmann
Date: Wed, 26 Apr 2017 03:16:12 +0200
Subject: [PATCH 2/2] document borg init behaviour via append-only borg serve, fixes #2440

---
 docs/usage.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/usage.rst b/docs/usage.rst
index a4250aea2..42bda5535 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -677,6 +677,11 @@ in ``.ssh/authorized_keys`` ::
   command="borg serve --append-only ..." ssh-rsa
   command="borg serve ..." ssh-rsa

+Please note that if you run ``borg init`` via a ``borg serve --append-only``
+server, the repository config will be created with an ``append_only=1`` entry.
+This behaviour is subject to change in a later borg version. So, be aware of
+it for now, but do not rely on it.
+
 Example
 +++++++
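
Illustration for PATCH 1/2 (not part of either diff): the split between the rolling hash and id_hash can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not borg's chunker: the names ``chunks``, ``store``, ``WINDOW``, ``MASK`` and ``TABLE`` are hypothetical, the window and mask are far smaller than any realistic setting, and ``id_hash`` here is only a stand-in showing the plain sha256 and hmac-sha256 cases named in the README text. The rolling hash decides only where to cut; whether a chunk is a duplicate is decided only by comparing id_hash values.

```python
import hashlib
import hmac
import random

WINDOW = 16          # hypothetical rolling-window size in bytes
MASK = (1 << 6) - 1  # hypothetical: cut when the low 6 hash bits are zero
_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def chunks(data):
    """Cut data where the rolling hash of the last WINDOW bytes hits the mask."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ TABLE[byte]
        if i - start >= WINDOW:
            # Cancel the byte that just slid out of the window.
            h ^= rotl32(TABLE[data[i - WINDOW]], WINDOW)
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]  # trailing chunk ended by end of input


def id_hash(chunk, key=None):
    """Stand-in id_hash: sha256, or hmac-sha256 when a key is given."""
    if key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(key, chunk, hashlib.sha256).digest()


def store(repo, data, key=None):
    """Add unseen chunks to repo (a dict keyed by id); return how many were new."""
    new = 0
    for chunk in chunks(data):
        chunk_id = id_hash(chunk, key)
        if chunk_id not in repo:  # dedup decision uses id_hash only
            repo[chunk_id] = chunk
            new += 1
    return new


rng = random.Random(1)
data = bytes(rng.getrandbits(8) for _ in range(4096))
repo = {}
first = store(repo, data)   # every distinct chunk is new
second = store(repo, data)  # same content again: all ids already known
```

The rolling hash value is thrown away once a cut point is found; it never enters ``repo``. A weak 32-bit rolling hash is therefore harmless for correctness, because a collision there can only move a chunk boundary, never cause two different chunks to be treated as the same.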