mirror of https://github.com/borgbackup/borg.git (synced 2024-12-24 08:45:13 +00:00)
improve chunker params docs, fixes #362
parent 36cc377329
commit 734dae80ef
2 changed files with 45 additions and 5 deletions
@@ -196,6 +196,7 @@ to the archive metadata.
 
 A chunk is stored as an object as well, of course.
 
+.. _chunker_details:
 
 Chunks
 ------
@@ -212,16 +213,13 @@ can be used to tune the chunker parameters, the default is:
 - HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 kiB)
 - HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
 
-The default parameters are OK for relatively small backup data volumes and
-repository sizes and a lot of available memory (RAM) and disk space for the
-chunk index. If that does not apply, you are advised to tune these parameters
-to keep the chunk count lower than with the defaults.
-
 The buzhash table is altered by XORing it with a seed randomly generated once
 for the archive, and stored encrypted in the keyfile. This is to prevent chunk
 size based fingerprinting attacks on your encrypted repo contents (to guess
 what files you have based on a specific set of chunk sizes).
 
+For some more general usage hints see also `--chunker-params`.
+
 
 Indexes / Caches
 ----------------
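The seeding step described in the hunk above can be illustrated with a minimal Python sketch. It is not borg's implementation: the base table here is filled with hypothetical pseudo-random values, the names `make_base_table` and `seed_table` are invented for this example, and the 256-entry, 32-bit layout is an assumption about how a buzhash table is typically organised.

```python
import random
import secrets

TABLE_SIZE = 256      # assumed: one table entry per possible byte value
MASK_32 = 0xFFFFFFFF  # assumed: entries are 32-bit values


def make_base_table(rng_seed: int = 0) -> list[int]:
    """Build a hypothetical fixed base table (a real chunker ships its own constants)."""
    rng = random.Random(rng_seed)
    return [rng.getrandbits(32) for _ in range(TABLE_SIZE)]


def seed_table(base_table: list[int], seed: int) -> list[int]:
    """Return a copy of the table with every entry XORed with the secret seed."""
    return [(value ^ seed) & MASK_32 for value in base_table]


base = make_base_table()
seed = secrets.randbits(32)  # generated once, then stored with the key material
seeded = seed_table(base, seed)
```

Because XOR is its own inverse, applying `seed_table` again with the same seed yields the base table. Without the seed, an observer cannot reproduce the cut points, and therefore the chunk sizes, that a known file would generate.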
@@ -391,6 +391,48 @@ Additional Notes
 
 Here are misc. notes about topics that are maybe not covered in enough detail in the usage section.
 
+--chunker-params
+~~~~~~~~~~~~~~~~
+The chunker params influence how input files are cut into pieces (chunks)
+which are then considered for deduplication. They also have a big impact on
+resource usage (RAM and disk space) as the amount of resources needed is
+(also) determined by the total amount of chunks in the repository (see
+`Indexes / Caches memory usage` for details).
+
+`--chunker-params=10,23,16,4095 (default)` results in a fine-grained deduplication
+and creates a big amount of chunks and thus uses a lot of resources to manage them.
+This is good for relatively small data volumes and if the machine has a good
+amount of free RAM and disk space.
+
+`--chunker-params=19,23,21,4095` results in a coarse-grained deduplication and
+creates a much smaller amount of chunks and thus uses less resources.
+This is good for relatively big data volumes and if the machine has a relatively
+low amount of free RAM and disk space.
+
+If you already have made some archives in a repository and you then change
+chunker params, this of course impacts deduplication as the chunks will be
+cut differently.
+
+In the worst case (all files are big and were touched in between backups), this
+will store all content into the repository again.
+
+Usually, it is not that bad though:
+- usually most files are not touched, so it will just re-use the old chunks
+  it already has in the repo
+- files smaller than the (both old and new) minimum chunksize result in only
+  one chunk anyway, so the resulting chunks are same and deduplication will apply
+
+If you switch chunker params to save resources for an existing repo that
+already has some backup archives, you will see an increasing effect over time,
+when more and more files have been touched and stored again using the bigger
+chunksize **and** all references to the smaller older chunks have been removed
+(by deleting / pruning archives).
+
+If you want to see an immediate big effect on resource usage, you better start
+a new repository when changing chunker params.
+
+For more details, see :ref:`chunker_details`.
+
 --read-special
 ~~~~~~~~~~~~~~
 
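The trade-off between the two parameter sets in the `--chunker-params` text above can be made concrete with a rough estimate. The sketch below is only an illustration under stated assumptions: the four values are read as minimum size exponent, maximum size exponent, HASH_MASK_BITS and HASH_WINDOW_SIZE; the typical chunk size is taken as 2^HASH_MASK_BITS bytes, as in the internals hunk; and deduplication, small files and the minimum/maximum limits are ignored. `ChunkerParams`, `parse_chunker_params` and `estimated_chunk_count` are hypothetical names, not part of borg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkerParams:
    min_exp: int      # assumed: minimum chunk size is 2**min_exp bytes
    max_exp: int      # assumed: maximum chunk size is 2**max_exp bytes
    mask_bits: int    # HASH_MASK_BITS: typical chunk size ~ 2**mask_bits bytes
    window_size: int  # HASH_WINDOW_SIZE in bytes


def parse_chunker_params(text: str) -> ChunkerParams:
    """Parse a value such as '10,23,16,4095' into its four integer fields."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated integers")
    return ChunkerParams(*parts)


def estimated_chunk_count(data_bytes: int, params: ChunkerParams) -> int:
    """Rough chunk count: data volume divided by the typical chunk size."""
    return max(1, data_bytes // 2 ** params.mask_bits)


one_tib = 2 ** 40
for text in ("10,23,16,4095", "19,23,21,4095"):
    params = parse_chunker_params(text)
    typical_kib = 2 ** params.mask_bits // 1024
    count = estimated_chunk_count(one_tib, params)
    print(f"{text}: ~{typical_kib} KiB per chunk, ~{count} chunks per TiB")
```

Under these assumptions, raising the mask bits from 16 to 21 cuts the estimated chunk count for the same data volume by a factor of 2^5 = 32, which is why the coarser setting needs fewer resources for the chunk index.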