mirror of https://github.com/borgbackup/borg.git synced 2024-12-24 00:37:56 +00:00

improve chunker params docs, fixes #362

This commit is contained in:
Thomas Waldmann 2015-11-02 19:47:09 +01:00
parent 36cc377329
commit 734dae80ef
2 changed files with 45 additions and 5 deletions


@ -196,6 +196,7 @@ to the archive metadata.
A chunk is stored as an object as well, of course.
.. _chunker_details:
Chunks
------
@ -212,16 +213,13 @@ can be used to tune the chunker parameters, the default is:
- HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 KiB)
- HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
The default parameters are suitable for relatively small backup data volumes
and repository sizes, provided that plenty of memory (RAM) and disk space is
available for the chunk index. If that does not apply, you are advised to tune
these parameters to keep the chunk count lower than with the defaults.
The buzhash table is altered by XORing it with a seed that is randomly generated
once for the archive and stored encrypted in the keyfile. This is to prevent
chunk-size-based fingerprinting attacks on your encrypted repo contents (guessing
what files you have based on a specific set of chunk sizes).
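The seeding step described above can be sketched roughly as follows. This is only an illustration, not Borg's actual code: the 256-entry table of 32-bit values, the 32-bit seed, and the names ``base_table`` and ``seeded_table`` are assumptions made for the example, and the table contents are random stand-ins.

```python
import random


def seeded_table(base_table, seed):
    """Return a copy of the table with every entry XORed with the seed.

    Assumption for this sketch: entries and seed are 32-bit unsigned values.
    """
    return [(entry ^ seed) & 0xFFFFFFFF for entry in base_table]


# Hypothetical stand-in for a buzhash table (NOT Borg's real table values).
rng = random.Random(0)
base_table = [rng.getrandbits(32) for _ in range(256)]

# Hypothetical seed; in the scheme described above it would be generated
# randomly once and kept secret.
seed = 0x1234ABCD

table = seeded_table(base_table, seed)
```

Because XOR is its own inverse, applying the same seed twice gives back the original table. Without the seed, an observer cannot reproduce the cut points that the seeded table produces.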
For some more general usage hints see also `--chunker-params`.
Indexes / Caches
----------------


@ -391,6 +391,48 @@ Additional Notes
Here are miscellaneous notes on topics that may not be covered in enough detail in the usage section.
--chunker-params
~~~~~~~~~~~~~~~~
The chunker params influence how input files are cut into pieces (chunks)
which are then considered for deduplication. They also have a big impact on
resource usage (RAM and disk space), because the amount of resources needed
depends, among other things, on the total number of chunks in the repository
(see `Indexes / Caches memory usage` for details).
`--chunker-params=10,23,16,4095` (the default) results in fine-grained deduplication
and creates a large number of chunks, and thus uses a lot of resources to manage them.
This is good for relatively small data volumes and if the machine has a good
amount of free RAM and disk space.
`--chunker-params=19,23,21,4095` results in coarse-grained deduplication and
creates a much smaller number of chunks, and thus uses fewer resources.
This is good for relatively big data volumes and if the machine has a relatively
low amount of free RAM and disk space.
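To get a feeling for the difference between the two settings, the following rough sketch estimates the chunk count for a given data volume. It assumes that the four values are, in order, the minimum chunk size exponent, the maximum chunk size exponent, the hash mask bits and the hash window size, and that the typical chunk size is about 2^HASH_MASK_BITS bytes. The helper names are made up for this example, and real chunk counts depend on the data (for instance, many small files yield one chunk each).

```python
from collections import namedtuple

# Field names are illustrative; the order is an assumption of this sketch.
ChunkerParams = namedtuple(
    "ChunkerParams",
    "chunk_min_exp chunk_max_exp hash_mask_bits hash_window_size",
)


def parse_chunker_params(value):
    """Parse a comma-separated value such as '10,23,16,4095'."""
    parts = [int(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("expected 4 comma-separated integers")
    return ChunkerParams(*parts)


def estimate_chunk_count(total_bytes, params):
    """Rough estimate: data volume divided by 2**hash_mask_bits."""
    target_chunk_size = 2 ** params.hash_mask_bits
    return max(1, total_bytes // target_chunk_size)


TIB = 2 ** 40
fine = parse_chunker_params("10,23,16,4095")
coarse = parse_chunker_params("19,23,21,4095")

fine_count = estimate_chunk_count(TIB, fine)      # 16777216
coarse_count = estimate_chunk_count(TIB, coarse)  # 524288
print(fine_count, coarse_count, fine_count // coarse_count)
# prints: 16777216 524288 32
```

Under these assumptions, 1 TiB of large files yields roughly 32 times fewer chunks with the coarse-grained setting, so the chunk index has correspondingly fewer entries to manage.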
If you have already made some archives in a repository and you then change
the chunker params, this of course impacts deduplication, as the chunks will be
cut differently.
In the worst case (all files are big and were touched in between backups), this
will store all content into the repository again.
Usually, it is not that bad though:
- usually most files are not touched, so it will just re-use the old chunks
it already has in the repo
- files smaller than both the old and the new minimum chunk size result in only
one chunk anyway, so the resulting chunks are the same and deduplication will apply
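The small-file case from the list above can be checked with a minimal sketch. It assumes that the first value of the chunker params is the exponent of the minimum chunk size (so `10` means 2^10 B = 1 KiB and `19` means 2^19 B = 512 KiB); the function name is hypothetical.

```python
def is_single_chunk_under_both(file_size, old_min_exp, new_min_exp):
    """True if the file is smaller than both minimum chunk sizes.

    Such a file ends up as exactly one chunk with either setting, so the
    chunk is identical and deduplicates across the parameter change.
    """
    smallest_min_chunk_size = 2 ** min(old_min_exp, new_min_exp)
    return file_size < smallest_min_chunk_size


# Switching from minimum exponent 10 to minimum exponent 19:
print(is_single_chunk_under_both(800, 10, 19))         # True
print(is_single_chunk_under_both(100 * 1024, 10, 19))  # False
```

The second file is below the new minimum chunk size but above the old one, so it may have been cut into several chunks before the change and would be stored again as one chunk afterwards.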
If you switch chunker params to save resources for an existing repo that
already has some backup archives, you will see an increasing effect over time,
as more and more files are touched and stored again using the bigger
chunk size **and** all references to the smaller, older chunks are removed
(by deleting / pruning archives).
If you want to see an immediate big effect on resource usage, it is better to start
a new repository when changing chunker params.
For more details, see :ref:`chunker_details`.
--read-special
~~~~~~~~~~~~~~