mirror of
https://github.com/borgbackup/borg.git
synced 2025-03-10 14:15:43 +00:00
update internals doc about chunker params, memory usage and compression
parent
b2f460d591
commit
b5bdb52b6a
1 changed files with 46 additions and 13 deletions
|
@ -168,13 +168,27 @@ A chunk is stored as an object as well, of course.
|
|||
Chunks
|
||||
------
|
||||
|
||||
|project_name| uses a rolling hash computed by the Buzhash_ algorithm, with a
|
||||
window size of 4095 bytes (`0xFFF`), with a minimum chunk size of 1024 bytes.
|
||||
It triggers (chunks) when the last 16 bits of the hash are zero, producing
|
||||
chunks of 64kiB on average.
|
||||
The |project_name| chunker uses a rolling hash computed by the Buzhash_ algorithm.
|
||||
It triggers (chunks) when the last HASH_MASK_BITS bits of the hash are zero,
|
||||
producing chunks of 2^HASH_MASK_BITS Bytes on average.
|
||||
|
||||
create --chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE
|
||||
can be used to tune the chunker parameters. The defaults are:
|
||||
|
||||
- CHUNK_MIN_EXP = 10 (minimum chunk size = 2^10 B = 1 kiB)
|
||||
- CHUNK_MAX_EXP = 23 (maximum chunk size = 2^23 B = 8 MiB)
|
||||
- HASH_MASK_BITS = 16 (statistical average chunk size ~= 2^16 B = 64 kiB)
|
||||
- HASH_WINDOW_SIZE = 4095 [B] (`0xFFF`)
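
The relationship between the four parameters and the resulting sizes can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the list above; the names ``ChunkerParams``, ``parse_chunker_params`` and ``describe`` are hypothetical helpers, not part of the project's code.

```python
from typing import NamedTuple


class ChunkerParams(NamedTuple):
    """Hypothetical container for the four --chunker-params values."""
    chunk_min_exp: int
    chunk_max_exp: int
    hash_mask_bits: int
    hash_window_size: int


def parse_chunker_params(spec: str) -> ChunkerParams:
    """Parse 'CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE'."""
    parts = [int(p) for p in spec.split(",")]
    if len(parts) != 4:
        raise ValueError("expected exactly 4 comma-separated integers")
    return ChunkerParams(*parts)


def describe(params: ChunkerParams) -> dict:
    """Translate the exponents into sizes in bytes."""
    return {
        "min_chunk_bytes": 2 ** params.chunk_min_exp,
        "max_chunk_bytes": 2 ** params.chunk_max_exp,
        "average_chunk_bytes": 2 ** params.hash_mask_bits,
        "window_bytes": params.hash_window_size,
    }


defaults = describe(parse_chunker_params("10,23,16,4095"))
```

For the defaults this gives a 1 kiB minimum, an 8 MiB maximum and a 64 kiB statistical average chunk size.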
|
||||
|
||||
The default parameters are OK for relatively small backup data volumes and
|
||||
repository sizes, combined with plenty of available memory (RAM) and disk space
|
||||
for the chunk index. If that does not apply, you are advised to tune these
|
||||
parameters to keep the chunk count lower than with the defaults.
|
||||
|
||||
The buzhash table is altered by XORing it with a seed randomly generated once
|
||||
for the archive, and stored encrypted in the keyfile.
|
||||
for the archive, and stored encrypted in the keyfile. This is to prevent
|
||||
chunk-size-based fingerprinting attacks on your encrypted repo contents (to guess
|
||||
what files you have based on a specific set of chunk sizes).
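
A toy model may help to show how a seeded table and a mask-bit trigger interact. The sketch below is not the project's chunker: the 32-bit hash width, the base table, the small window and the small size limits are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def seeded_table(seed: int) -> list:
    """Toy 256-entry table with every entry XORed with the secret seed."""
    rng = random.Random(1234)  # hypothetical base table, not the real one
    return [rng.getrandbits(32) ^ (seed & MASK32) for _ in range(256)]


def toy_chunks(data: bytes, table: list, mask_bits: int = 8, window: int = 31,
               min_size: int = 64, max_size: int = 4096) -> list:
    """Cut data where the low mask_bits bits of the rolling hash are zero."""
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Roll the hash forward: rotate, then mix in the incoming byte.
        h = rotl32(h, 1) ^ table[byte]
        if i >= window:
            # Cancel the contribution of the byte that left the window.
            h ^= rotl32(table[data[i - window]], window)
        size = i + 1 - start
        if size >= max_size or (size >= min_size and (h & mask) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder, may be short
    return chunks


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(20000))
sizes_a = [len(c) for c in toy_chunks(data, seeded_table(0x1234ABCD))]
sizes_b = [len(c) for c in toy_chunks(data, seeded_table(0x0BADF00D))]
```

The same input chunked with two different seeds would typically yield two different lists of chunk sizes, which is the property the seed is meant to provide: an observer who does not know the seed cannot easily predict the chunk sizes of a known file.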
|
||||
|
||||
|
||||
Indexes / Caches
|
||||
|
@ -243,7 +257,7 @@ Indexes / Caches memory usage
|
|||
|
||||
Here is the estimated memory usage of |project_name|:
|
||||
|
||||
chunk_count ~= total_file_size / 65536
|
||||
chunk_count ~= total_file_size / 2 ^ HASH_MASK_BITS
|
||||
|
||||
repo_index_usage = chunk_count * 40
|
||||
|
||||
|
@ -252,20 +266,32 @@ Here is the estimated memory usage of |project_name|:
|
|||
files_cache_usage = total_file_count * 240 + chunk_count * 80
|
||||
|
||||
mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
|
||||
= total_file_count * 240 + total_file_size / 400
|
||||
= chunk_count * 164 + total_file_count * 240
|
||||
|
||||
All units are Bytes.
|
||||
|
||||
It is assuming every chunk is referenced exactly once and that typical chunk size is 64kiB.
|
||||
This assumes every chunk is referenced exactly once (if you have a lot of
|
||||
duplicate chunks, you will have fewer chunks than estimated above).
|
||||
|
||||
It also assumes that the typical chunk size is 2^HASH_MASK_BITS (if you have
|
||||
a lot of files smaller than this statistical average chunk size, you will have
|
||||
more chunks than estimated above, because 1 file is at least 1 chunk).
|
||||
|
||||
If a remote repository is used, the repo index will be allocated on the remote side.
|
||||
|
||||
E.g. backing up a total count of 1Mi files with a total size of 1TiB:
|
||||
E.g. backing up a total count of 1Mi files with a total size of 1TiB.
|
||||
|
||||
mem_usage = 1 * 2**20 * 240 + 1 * 2**40 / 400 = 2.8GiB
|
||||
a) with create --chunker-params 10,23,16,4095 (default):
|
||||
|
||||
Note: there is a commandline option to switch off the files cache. You'll save
|
||||
some memory, but it will need to read / chunk all the files then.
|
||||
mem_usage = 2.8GiB
|
||||
|
||||
b) with create --chunker-params 10,23,20,4095 (custom):
|
||||
|
||||
mem_usage = 0.4GiB
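
The two figures above can be reproduced from the combined formula ``mem_usage ~= chunk_count * 164 + total_file_count * 240``. The following is a minimal sketch of that arithmetic only; the function name ``estimate_mem_usage`` is hypothetical, and the same assumptions apply as for the formula itself (every chunk referenced once, typical chunk size of 2^HASH_MASK_BITS).

```python
GiB = 2 ** 30


def estimate_mem_usage(total_file_count: int, total_file_size: int,
                       hash_mask_bits: int = 16) -> float:
    """Estimated index/cache memory usage in bytes."""
    chunk_count = total_file_size / 2 ** hash_mask_bits
    return chunk_count * 164 + total_file_count * 240


# 1Mi files with a total size of 1TiB
usage_default = estimate_mem_usage(2 ** 20, 2 ** 40, hash_mask_bits=16) / GiB
usage_custom = estimate_mem_usage(2 ** 20, 2 ** 40, hash_mask_bits=20) / GiB

print(f"a) {usage_default:.1f}GiB  b) {usage_custom:.1f}GiB")
```

Raising HASH_MASK_BITS from 16 to 20 cuts the chunk count by a factor of 16, so only the per-file part of the files cache (``total_file_count * 240``) stays the same.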
|
||||
|
||||
Note: there is also the --no-files-cache option to switch off the files cache.
|
||||
You'll save some memory, but |project_name| will then need to read and chunk all
|
||||
the files, as it cannot skip unmodified files.
|
||||
|
||||
|
||||
Encryption
|
||||
|
@ -291,6 +317,7 @@ Encryption keys are either derived from a passphrase or kept in a key file.
|
|||
The passphrase is passed through the ``BORG_PASSPHRASE`` environment variable
|
||||
or prompted for interactive usage.
|
||||
|
||||
|
||||
Key files
|
||||
---------
|
||||
|
||||
|
@ -355,4 +382,10 @@ representation of the repository id.
|
|||
Compression
|
||||
-----------
|
||||
|
||||
Currently, compression is disabled by default. Zlib compression can be enabled by passing ``--compression level`` on the command line. Level can be anything from 0 (no compression, fast) to 9 (high compression, slow).
|
||||
|project_name| currently always pipes all data through a zlib compressor which
|
||||
supports compression levels 0 (no compression, fast) to 9 (high compression, slow).
|
||||
|
||||
See ``borg create --help`` about how to specify the compression level and its default.
|
||||
|
||||
Note: zlib level 0 creates a little bit more output data than it gets as input,
|
||||
due to zlib protocol overhead.
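
The level 0 overhead is easy to observe with Python's standard ``zlib`` module. This is a small illustration only, using made-up input data; it does not show how the project itself invokes the compressor.

```python
import zlib

data = b"borg" * 256  # 1024 bytes of highly repetitive sample input

stored = zlib.compress(data, 0)  # level 0: no compression, only framing
packed = zlib.compress(data, 9)  # level 9: highest compression

print(len(data), len(stored), len(packed))
```

The level 0 output is slightly larger than the input because the zlib stream still carries its header, block framing and checksum, while level 9 shrinks this repetitive input considerably.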
|
||||
|
|