mirror of
https://github.com/borgbackup/borg.git
synced 2024-12-23 16:26:29 +00:00
improve chunker params docs, fixes #362
parent
36cc377329
commit
734dae80ef
2 changed files with 45 additions and 5 deletions
@@ -196,6 +196,7 @@ to the archive metadata.
A chunk is stored as an object as well, of course.

.. _chunker_details:

Chunks
------

@@ -212,16 +213,13 @@ can be used to tune the chunker parameters, the default is:
- HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 kiB)
- HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
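
The arithmetic behind the two defaults can be checked with a minimal sketch. It assumes (this is not stated above) that the chunker cuts wherever the lowest HASH_MASK_BITS bits of the rolling hash are all zero, so that a cut happens at roughly one position in 2^HASH_MASK_BITS. The helper names are hypothetical.

```python
def cut_mask(hash_mask_bits: int) -> int:
    """Bit mask selecting the lowest `hash_mask_bits` bits of a hash value."""
    return (1 << hash_mask_bits) - 1


def statistical_chunk_size(hash_mask_bits: int) -> int:
    """Statistical chunk size in bytes, assuming a cut roughly every
    2**hash_mask_bits positions (minimum/maximum limits are ignored)."""
    return 1 << hash_mask_bits


HASH_MASK_BITS = 16
HASH_WINDOW_SIZE = 0xFFF  # 4095 bytes

size = statistical_chunk_size(HASH_MASK_BITS)
print(size, "B =", size // 1024, "kiB")
print(hex(cut_mask(HASH_MASK_BITS)), HASH_WINDOW_SIZE)
```

Raising HASH_MASK_BITS by one doubles the statistical chunk size, which is why the parameter has such a strong effect on the chunk count.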

The default parameters suit relatively small backup data volumes and
repository sizes, provided a lot of memory (RAM) and disk space is available
for the chunk index. If that does not apply, you are advised to tune these
parameters to keep the chunk count lower than with the defaults.

The buzhash table is altered by XORing it with a seed randomly generated once
for the archive, and stored encrypted in the keyfile. This is to prevent
chunk-size-based fingerprinting attacks on your encrypted repo contents (to
guess what files you have based on a specific set of chunk sizes).
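
The seeding step described above can be illustrated with a short sketch. The table contents, the 32-bit entry width, and the function name are assumptions made for illustration only; the table is a placeholder, not the real buzhash table.

```python
import secrets


def seed_table(base_table: list[int], seed: int) -> list[int]:
    """Return a copy of the table with every entry XORed with the seed."""
    return [entry ^ seed for entry in base_table]


# Placeholder 256-entry table of 32-bit values (NOT the real buzhash table).
base_table = [(i * 2654435761) & 0xFFFFFFFF for i in range(256)]

# Generated once and kept secret; here it is simply a random 32-bit value.
seed = secrets.randbits(32)
table = seed_table(base_table, seed)

# XOR is its own inverse: applying the same seed again restores the base table.
assert seed_table(table, seed) == base_table
```

Because every entry changes with the seed, the rolling hash values, and therefore the cut points and chunk sizes, differ for anyone who does not know the seed.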

For some more general usage hints, see also `--chunker-params`.


Indexes / Caches
----------------

@@ -391,6 +391,48 @@ Additional Notes
Here are miscellaneous notes about topics that may not be covered in enough detail in the usage section.

--chunker-params
~~~~~~~~~~~~~~~~
The chunker params influence how input files are cut into pieces (chunks),
which are then considered for deduplication. They also have a big impact on
resource usage (RAM and disk space), as the amount of resources needed is
(also) determined by the total number of chunks in the repository (see
`Indexes / Caches memory usage` for details).

`--chunker-params=10,23,16,4095` (the default) results in fine-grained
deduplication and creates a large number of chunks, and thus uses a lot of
resources to manage them. This is good for relatively small data volumes and
if the machine has a good amount of free RAM and disk space.

`--chunker-params=19,23,21,4095` results in coarse-grained deduplication and
creates a much smaller number of chunks, and thus uses fewer resources.
This is good for relatively big data volumes and if the machine has a
relatively low amount of free RAM and disk space.
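
To get a feeling for the difference between the two settings, here is a rough back-of-the-envelope sketch. It assumes the four values are, in order, a minimum chunk size exponent, a maximum chunk size exponent, HASH_MASK_BITS and HASH_WINDOW_SIZE (the field names are hypothetical), and it ignores the minimum/maximum limits, small files and deduplication, so the numbers are only an order-of-magnitude estimate.

```python
from typing import NamedTuple


class ChunkerParams(NamedTuple):
    chunk_min_exp: int     # assumed: minimum chunk size is 2**chunk_min_exp bytes
    chunk_max_exp: int     # assumed: maximum chunk size is 2**chunk_max_exp bytes
    hash_mask_bits: int    # statistical chunk size is about 2**hash_mask_bits bytes
    hash_window_size: int  # rolling hash window in bytes


def parse_chunker_params(text: str) -> ChunkerParams:
    """Parse a string such as '10,23,16,4095' into its four integer fields."""
    return ChunkerParams(*(int(part) for part in text.split(",")))


def estimated_chunks(total_bytes: int, params: ChunkerParams) -> int:
    """Rough chunk count: data volume divided by the statistical chunk size."""
    return max(1, total_bytes // (1 << params.hash_mask_bits))


TIB = 1 << 40
fine = parse_chunker_params("10,23,16,4095")
coarse = parse_chunker_params("19,23,21,4095")

print(estimated_chunks(TIB, fine))    # 2**40 / 2**16 = 16777216
print(estimated_chunks(TIB, coarse))  # 2**40 / 2**21 = 524288
```

Under these assumptions the coarse setting produces about 32 times fewer chunks for the same data volume, and the chunk index shrinks accordingly.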

If you have already made some archives in a repository and then change the
chunker params, this of course impacts deduplication, as the chunks will be
cut differently.

In the worst case (all files are big and were touched in between backups), this
will store all content into the repository again.

Usually, it is not that bad though:

- usually most files are not touched, so it will just re-use the old chunks
  it already has in the repo
- files smaller than the (both old and new) minimum chunksize result in only
  one chunk anyway, so the resulting chunks are the same and deduplication
  will apply
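
The second point can be demonstrated with a toy content-defined chunker. This is a sketch under loud assumptions: it uses a simple additive rolling hash instead of buzhash, a small hypothetical window, and hypothetical parameter names, so it only illustrates the cutting rules, not the real chunker.

```python
def toy_chunks(data: bytes, min_exp: int, max_exp: int,
               mask_bits: int, window: int = 16) -> list[bytes]:
    """Cut data where the low `mask_bits` bits of a rolling byte sum are zero,
    but never below 2**min_exp or above 2**max_exp bytes per chunk."""
    min_size, max_size = 1 << min_exp, 1 << max_exp
    mask = (1 << mask_bits) - 1
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        size = i - start + 1
        if size >= max_size or (size >= min_size and rolling & mask == 0):
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # remainder becomes the last chunk
    return chunks


small_file = b"x" * 500  # smaller than both 2**10 and 2**19 bytes
old = toy_chunks(small_file, 10, 23, 16)
new = toy_chunks(small_file, 19, 23, 21)
print(len(old), len(new), old == new)
```

Since no cut is allowed before the minimum size is reached, the small file comes out as one identical chunk under both settings, so the chunk already stored in the repository can be reused.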

If you switch chunker params to save resources for an existing repo that
already has some backup archives, you will see an increasing effect over time,
as more and more files are touched and stored again using the bigger
chunksize **and** all references to the smaller older chunks have been removed
(by deleting / pruning archives).

If you want to see an immediate big effect on resource usage, you should start
a new repository when changing chunker params.

For more details, see :ref:`chunker_details`.

--read-special
~~~~~~~~~~~~~~
