mirror of https://github.com/borgbackup/borg.git
Merge pull request #1882 from ThomasWaldmann/docs-rsrc-usage
add more details about resource usage
commit 8465979956
@@ -364,7 +364,7 @@ varies between 33% and 300%.
 Indexes / Caches memory usage
 -----------------------------
 
-Here is the estimated memory usage of |project_name|:
+Here is the estimated memory usage of |project_name| - it's complicated:
 
     chunk_count ~= total_file_size / 2 ^ HASH_MASK_BITS
 
@@ -377,6 +377,14 @@ Here is the estimated memory usage of |project_name|:
     mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
               = chunk_count * 164 + total_file_count * 240
 
+Due to the hashtables, the best/usual/worst cases for memory allocation can
+be estimated like that:
+
+    mem_allocation = mem_usage / load_factor  # l_f = 0.25 .. 0.75
+
+    mem_allocation_peak = mem_allocation * (1 + growth_factor)  # g_f = 1.1 .. 2
+
 All units are Bytes.
 
 It is assuming every chunk is referenced exactly once (if you have a lot of
@@ -388,6 +396,17 @@ more chunks than estimated above, because 1 file is at least 1 chunk).
 
 If a remote repository is used the repo index will be allocated on the remote side.
 
+The chunks cache, files cache and the repo index are all implemented as hash
+tables. A hash table must have a significant amount of unused entries to be
+fast - the so-called load factor gives the used/unused elements ratio.
+
+When a hash table gets full (load factor getting too high), it needs to be
+grown (allocate new, bigger hash table, copy all elements over to it, free old
+hash table) - this will lead to short-time peaks in memory usage each time this
+happens. Usually does not happen for all hashtables at the same time, though.
+For small hash tables, we start with a growth factor of 2, which comes down to
+~1.1x for big hash tables.
+
 E.g. backing up a total count of 1 Mi (IEC binary prefix i.e. 2^20) files with a total size of 1TiB.
 
 a) with ``create --chunker-params 10,23,16,4095`` (custom, like borg < 1.0 or attic):
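The estimate in the diff above can be checked with a short Python sketch. This is a minimal illustration, not borg code: the 164 and 240 bytes-per-entry constants and the load/growth factors come from the documented formulas, and ``HASH_MASK_BITS = 16`` is taken from the ``10,23,16,4095`` chunker-params example (the third parameter).

```python
# Sketch of the documented memory estimate (all units are Bytes).
# Assumptions: 1 Mi files totalling 1 TiB, every chunk referenced
# exactly once, HASH_MASK_BITS = 16 (chunker params 10,23,16,4095).
HASH_MASK_BITS = 16
total_file_size = 2 ** 40    # 1 TiB
total_file_count = 2 ** 20   # 1 Mi files

chunk_count = total_file_size // 2 ** HASH_MASK_BITS
mem_usage = chunk_count * 164 + total_file_count * 240

# Hashtable allocation: best/worst load factors 0.75 / 0.25, plus a
# short-time peak while a big table grows (growth factor ~1.1).
mem_allocation_best = mem_usage / 0.75
mem_allocation_worst = mem_usage / 0.25
mem_allocation_peak = mem_allocation_worst * (1 + 1.1)

print(f"chunk_count:       {chunk_count / 2 ** 20:.0f} Mi")
print(f"mem_usage:         {mem_usage / 2 ** 30:.2f} GiB")
print(f"allocation (best): {mem_allocation_best / 2 ** 30:.2f} GiB")
```

With these inputs the sketch arrives at 16 Mi chunks and roughly 2.8 GiB of index/cache memory usage on the client.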
@@ -206,36 +206,80 @@ Resource Usage
 
 |project_name| might use a lot of resources depending on the size of the data set it is dealing with.
 
-CPU:
+If one uses |project_name| in a client/server way (with a ssh: repository),
+the resource usage occurs in part on the client and in another part on the
+server.
+
+If one uses |project_name| as a single process (with a filesystem repo),
+all the resource usage occurs in that one process, so just add up client +
+server to get the approximate resource usage.
+
+CPU client:
+    borg create: does chunking, hashing, compression, crypto (high CPU usage)
+    chunks cache sync: quite heavy on CPU, doing lots of hashtable operations.
+    borg extract: crypto, decompression (medium to high CPU usage)
+    borg check: similar to extract, but depends on options given.
+    borg prune / borg delete archive: low to medium CPU usage
+    borg delete repo: done on the server
     It won't go beyond 100% of 1 core as the code is currently single-threaded.
     Especially higher zlib and lzma compression levels use significant amounts
-    of CPU cycles.
+    of CPU cycles. Crypto might be cheap on the CPU (if hardware accelerated) or
+    expensive (if not).
+
+CPU server:
+    It usually doesn't need much CPU, it just deals with the key/value store
+    (repository) and uses the repository index for that.
 
-Memory (RAM):
+    borg check: the repository check computes the checksums of all chunks
+    (medium CPU usage)
+    borg delete repo: low CPU usage
+
+CPU (only for client/server operation):
+    When using borg in a client/server way with a ssh:-type repo, the ssh
+    processes used for the transport layer will need some CPU on the client and
+    on the server due to the crypto they are doing - esp. if you are pumping
+    big amounts of data.
+
+Memory (RAM) client:
     The chunks index and the files index are read into memory for performance
-    reasons.
+    reasons. Might need big amounts of memory (see below).
     Compression, esp. lzma compression with high levels might need substantial
     amounts of memory.
 
-Temporary files:
-    Reading data and metadata from a FUSE mounted repository will consume about
-    the same space as the deduplicated chunks used to represent them in the
-    repository.
+Memory (RAM) server:
+    The server process will load the repository index into memory. Might need
+    considerable amounts of memory, but less than on the client (see below).
 
-Cache files:
-    Contains the chunks index and files index (plus a compressed collection of
-    single-archive chunk indexes).
-
-Chunks index:
+Chunks index (client only):
     Proportional to the amount of data chunks in your repo. Lots of chunks
     in your repo imply a big chunks index.
     It is possible to tweak the chunker params (see create options).
 
-Files index:
-    Proportional to the amount of files in your last backup. Can be switched
-    off (see create options), but next backup will be much slower if you do.
+Files index (client only):
+    Proportional to the amount of files in your last backups. Can be switched
+    off (see create options), but next backup might be much slower if you do.
+    The speed benefit of using the files cache is proportional to file size.
+
+Repository index (server only):
+    Proportional to the amount of data chunks in your repo. Lots of chunks
+    in your repo imply a big repository index.
+    It is possible to tweak the chunker params (see create options) to
+    influence the amount of chunks being created.
+
+Temporary files (client):
+    Reading data and metadata from a FUSE mounted repository will consume up to
+    the size of all deduplicated, small chunks in the repository. Big chunks
+    won't be locally cached.
+
+Temporary files (server):
+    None.
+
+Cache files (client only):
+    Contains the chunks index and files index (plus a collection of single-
+    archive chunk indexes which might need huge amounts of disk space,
+    depending on archive count and size - see FAQ about how to reduce).
 
-Network:
+Network (only for client/server operation):
     If your repository is remote, all deduplicated (and optionally compressed/
     encrypted) data of course has to go over the connection (ssh: repo url).
     If you use a locally mounted network filesystem, additionally some copy
@@ -243,7 +287,8 @@ Network:
 you backup multiple sources to one target repository, additional traffic
 happens for cache resynchronization.
 
-In case you are interested in more details, please read the internals documentation.
+In case you are interested in more details (like formulas), please see
+:ref:`internals`.
 
 
 Units
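The short-time allocation peaks described in the hash table paragraph of the first diff above can be illustrated with a toy simulation. The 0.75 load-factor bound, the 164-byte entry size and the growth factor of 2 for small tables are taken from the documentation text; the starting capacity and the number of inserted entries are made up for illustration and this is not borg's actual implementation.

```python
# Toy simulation of hashtable growth (illustrative, not borg's code).
# Grow when the load factor exceeds 0.75; small tables double in size.
# While growing, old and new tables exist at once -> short memory peak.
ENTRY_SIZE = 164          # Bytes per chunks-index entry (from the docs)
GROWTH_FACTOR = 2         # for small tables; comes down to ~1.1 for big ones
MAX_LOAD_FACTOR = 0.75

capacity = 1024           # illustrative starting size
used = 0
peak = capacity * ENTRY_SIZE

for _ in range(100_000):  # insert 100k entries (illustrative)
    used += 1
    if used / capacity > MAX_LOAD_FACTOR:
        new_capacity = capacity * GROWTH_FACTOR
        # during the copy, both tables are allocated at the same time:
        peak = max(peak, (capacity + new_capacity) * ENTRY_SIZE)
        capacity = new_capacity

steady = capacity * ENTRY_SIZE
print(f"steady-state allocation: {steady} Bytes, peak during growth: {peak} Bytes")
```

The peak is noticeably higher than the steady-state allocation because the old and the new (twice as big) table coexist during the copy, matching the best/worst-case bounds given by the formulas.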