update internals doc about chunker params, memory usage and compression
Commit b5bdb52b6a (parent b2f460d591): 1 changed file with 46 additions and 13 deletions.
@@ -168,13 +168,27 @@ A chunk is stored as an object as well, of course.

Chunks
------
The |project_name| chunker uses a rolling hash computed by the Buzhash_ algorithm.
It triggers (chunks) when the last HASH_MASK_BITS bits of the hash are zero,
producing chunks of 2^HASH_MASK_BITS Bytes on average.
create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE

can be used to tune the chunker parameters; the defaults are:

- CHUNK_MIN_EXP = 10 (minimum chunk size = 2^10 B = 1 kiB)
- CHUNK_MAX_EXP = 23 (maximum chunk size = 2^23 B = 8 MiB)
- HASH_MASK_BITS = 16 (statistical medium chunk size ~= 2^16 B = 64 kiB)
- HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
The default parameters are OK for relatively small backup data volumes and
repository sizes, provided there is a lot of available memory (RAM) and disk
space for the chunk index. If that does not apply, you are advised to tune
these parameters to keep the chunk count lower than with the defaults.
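To make the trigger condition and the min/max clamping concrete, here is a
much-simplified Python sketch. It is not |project_name|'s actual chunker
(the real one streams input and restarts the rolling window for each chunk,
and the `chunk_sizes` helper name is ours); the seeded table here only
imitates the XOR-seeding described below::

    import random

    def rotl32(v, n):
        """Rotate a 32-bit value left by n bits (n taken mod 32)."""
        n %= 32
        return ((v << n) | (v >> (32 - n))) & 0xFFFFFFFF

    def chunk_sizes(data, seed=0, min_exp=10, max_exp=23,
                    mask_bits=16, window_size=4095):
        """Yield chunk sizes: cut where the low mask_bits bits of a
        buzhash-style rolling hash are zero, clamped to [2^min_exp, 2^max_exp]."""
        rnd = random.Random(seed)              # stand-in for the seeded table
        table = [rnd.getrandbits(32) for _ in range(256)]
        mask = (1 << mask_bits) - 1
        min_size, max_size = 1 << min_exp, 1 << max_exp
        h = start = 0
        for i, b in enumerate(data):
            h = rotl32(h, 1) ^ table[b]        # slide the new byte into the hash
            if i >= window_size:               # slide the oldest byte out
                h ^= rotl32(table[data[i - window_size]], window_size)
            size = i - start + 1
            if (size >= min_size and h & mask == 0) or size >= max_size:
                yield size
                start = i + 1
        if start < len(data):
            yield len(data) - start            # trailing partial chunk

    data = random.Random(42).randbytes(1 << 22)   # 4 MiB of pseudo-random bytes
    sizes = list(chunk_sizes(data))
    print(sum(sizes) / len(sizes))   # average lands near 2^16 B for random input

Raising mask_bits from 16 to 20 makes cuts 16x rarer, which is exactly the
chunk-count reduction the memory estimates below rely on.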
The buzhash table is altered by XORing it with a seed randomly generated once
for the archive, and stored encrypted in the keyfile. This is to prevent
chunk-size-based fingerprinting attacks on your encrypted repo contents (to
guess what files you have based on a specific set of chunk sizes).
Indexes / Caches
@@ -243,7 +257,7 @@ Indexes / Caches memory usage
Here is the estimated memory usage of |project_name|:
chunk_count ~= total_file_size / 2 ^ HASH_MASK_BITS
repo_index_usage = chunk_count * 40
@@ -252,20 +266,32 @@ Here is the estimated memory usage of |project_name|:
files_cache_usage = total_file_count * 240 + chunk_count * 80
mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
           = chunk_count * 164 + total_file_count * 240
All units are Bytes.
It is assuming every chunk is referenced exactly once (if you have a lot of
duplicate chunks, you will have fewer chunks than estimated above).

It is also assuming that typical chunk size is 2^HASH_MASK_BITS (if you have
a lot of files smaller than this statistical medium chunk size, you will have
more chunks than estimated above, because 1 file is at least 1 chunk).
If a remote repository is used, the repo index will be allocated on the remote side.
E.g. backing up a total count of 1Mi files with a total size of 1TiB:
a) with create --chunker-params 10,23,16,4095 (default):

   mem_usage = 2.8GiB

b) with create --chunker-params 10,23,20,4095 (custom):

   mem_usage = 0.4GiB
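These two figures can be reproduced from the formulas above; a minimal Python
cross-check (the 44 B/chunk chunks-cache term is not visible in this diff
excerpt, but is implied by the 164 B/chunk total)::

    def estimated_mem_usage(total_file_count, total_file_size, hash_mask_bits):
        chunk_count = total_file_size / 2 ** hash_mask_bits
        repo_index = chunk_count * 40
        chunks_cache = chunk_count * 44     # implied: 164 - 40 - 80
        files_cache = total_file_count * 240 + chunk_count * 80
        return repo_index + chunks_cache + files_cache

    GiB = 2 ** 30
    print(estimated_mem_usage(2**20, 2**40, 16) / GiB)   # ~2.8 (defaults)
    print(estimated_mem_usage(2**20, 2**40, 20) / GiB)   # ~0.4 (custom)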
Note: there is also the --no-files-cache option to switch off the files cache.
You'll save some memory, but it will then need to read / chunk all the files,
as it cannot skip unmodified files.
Encryption

@@ -291,6 +317,7 @@ Encryption keys are either derived from a passphrase or kept in a key file.
The passphrase is passed through the ``BORG_PASSPHRASE`` environment variable
or prompted for interactive usage.
Key files
---------
@@ -355,4 +382,10 @@ representation of the repository id.
Compression
-----------
|project_name| currently always pipes all data through a zlib compressor which
supports compression levels 0 (no compression, fast) to 9 (high compression, slow).

See ``borg create --help`` about how to specify the compression level and its default.

Note: zlib level 0 creates a little bit more output data than it gets as input,
due to zlib protocol overhead.
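The level-0 overhead is easy to observe with Python's own zlib module (a quick
illustration, not |project_name| code)::

    import zlib

    chunk = bytes(65536)                # one 64 kiB chunk
    stored = zlib.compress(chunk, 0)    # level 0: "stored" blocks, no compression
    packed = zlib.compress(chunk, 9)    # level 9: maximum compression

    # level-0 output is slightly larger than the input: the zlib header,
    # stored-block headers and the adler32 checksum all add a few bytes
    assert len(stored) > len(chunk)
    print(len(chunk), len(stored), len(packed))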