docs: misc. updates

- remove outdated stuff
- fix infos for borg 2.0
Thomas Waldmann 2022-08-01 23:47:26 +02:00
parent b0480a06d6
commit 89e4a52c56
11 changed files with 85 additions and 164 deletions

@ -53,7 +53,7 @@ requests (if you don't have GitHub or don't want to use it you can
send smaller patches via the borgbackup mailing list to the maintainers).
Stable releases are maintained on maintenance branches named ``x.y-maint``, eg.
-the maintenance branch of the 1.0.x series is ``1.0-maint``.
+the maintenance branch of the 1.2.x series is ``1.2-maint``.
Most PRs should be filed against the ``master`` branch. Only if an
issue affects **only** a particular maintenance branch a PR should be

@ -51,43 +51,26 @@ Can I copy or synchronize my repo to another location?
If you want to have redundant backup repositories (preferably at separate
locations), the recommended way to do that is like this:
-- ``borg rcreate repo1``
-- ``borg rcreate repo2``
+- ``borg rcreate repo1 --encryption=X``
+- ``borg rcreate repo2 --encryption=X --other-repo=repo1``
+- maybe do a snapshot to have stable and same input data for both borg create.
- client machine ---borg create---> repo1
- client machine ---borg create---> repo2
-This will create distinct repositories (separate repo ID, separate
-keys) and nothing bad happening in repo1 will influence repo2.
+This will create distinct (different repo ID), but related repositories.
+Related means using the same chunker secret and the same id_key, thus producing
+the same chunks / the same chunk ids if the input data is the same.
-Some people decide against above recommendation and create identical
-copies of a repo (using some copy / sync / clone tool).
+The 2 independent borg create invocations mean that there is no error propagation
+from repo1 to repo2 when done like that.
-While this might be better than having no redundancy at all, you have
-to be very careful about how you do that and what you may / must not
-do with the result (if you decide against our recommendation).
+An alternative way would be to use ``borg transfer`` to copy backup archives
+from repo1 to repo2. Likely a bit more efficient and the archives would be identical,
+but suffering from potential error propagation.
-What you would get with this is:
-- client machine ---borg create---> repo
-- repo ---copy/sync---> copy-of-repo
-There is no special borg command to do the copying, you could just
-use any reliable tool that creates an identical copy (cp, rsync, rclone
-might be options).
-But think about whether that is really what you want. If something goes
-wrong in repo, you will have the same issue in copy-of-repo.
-Make sure you do the copy/sync while no backup is running, see
-:ref:`borg_with-lock` about how to do that.
-Also, you must not run borg against multiple instances of the same repo
-(like repo and copy-of-repo) as that would create severe issues:
-- Data loss: they have the same repository ID, so the borg client will
-  think they are identical and e.g. use the same local cache for them
-  (which is an issue if they happen to be not the same).
-  See :issue:`4272` for an example.
+Warning: using borg with multiple repositories with identical repository ID (like when
+creating 1:1 repository copies) is not supported and can lead to all sorts of issues,
+like e.g. cache coherency issues, malfunction, data corruption.
"this is either an attack or unsafe" warning
--------------------------------------------
@ -192,7 +175,13 @@ that option under any normal circumstances.
How can I backup huge file(s) over an unstable connection?
---------------------------------------------------------
-This is not a problem anymore.
+Yes. For more details, see :ref:`checkpoints_parts`.
+How can I restore huge file(s) over an unstable connection?
+-----------------------------------------------------------
+If you cannot manage to extract the whole big file in one go, you can extract
+all the part files and manually concatenate them together.
+For more details, see :ref:`checkpoints_parts`.
@ -203,21 +192,12 @@ You could do that (via borg config REPO append_only 0/1), but using different
ssh keys and different entries in ``authorized_keys`` is much easier and also
maybe has less potential of things going wrong somehow.
My machine goes to sleep causing `Broken pipe`
----------------------------------------------
-When backing up your data over the network, your machine should not go to sleep.
+While backing up your data over the network, your machine should not go to sleep.
On macOS you can use `caffeinate` to avoid that.
-How can I restore huge file(s) over an unstable connection?
------------------------------------------------------------
-If you cannot manage to extract the whole big file in one go, you can extract
-all the part files and manually concatenate them together.
-For more details, see :ref:`checkpoints_parts`.
How can I compare contents of an archive to my local filesystem?
-----------------------------------------------------------------
@ -385,9 +365,9 @@ Why is the time elapsed in the archive stats different from wall clock time?
----------------------------------------------------------------------------
Borg needs to write the time elapsed into the archive metadata before finalizing
-the archive, compacting the segments, and committing the repo & cache. This means
-when Borg is run with e.g. the ``time`` command, the duration shown in the archive
-stats may be shorter than the full time the command runs for.
+the archive and committing the repo & cache.
+This means when Borg is run with e.g. the ``time`` command, the duration shown
+in the archive stats may be shorter than the full time the command runs for.
How do I configure different prune policies for different directories?
----------------------------------------------------------------------
@ -810,13 +790,12 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat
fast). If so, the processing of the chunk is completed here. Otherwise it needs to
process the chunk:
- Compresses (the default lz4 is super fast)
-- Encrypts (AES, usually fast if your CPU has AES acceleration as usual
-  since about 10y)
-- Authenticates ("signs") using hmac-sha256 or blake2b (see above),
+- Encrypts and authenticates (AES-OCB, usually fast if your CPU has AES acceleration as usual
+  since about 10y, or chacha20-poly1305, fast pure-software crypto)
- Transmits to repo. If the repo is remote, this usually involves an SSH connection
(does its own encryption / authentication).
- Stores the chunk into a key/value store (the key is the chunk id, the value
-is the data). While doing that, it computes a CRC32 of the data (repo low-level
+is the data). While doing that, it computes CRC32 / XXH64 of the data (repo low-level
checksum, used by borg check --repository) and also updates the repo index
(another hashtable).
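As a rough sketch of the flow just described (illustrative Python only, not Borg's real API; ``id_key``, ``chunks_cache``, ``repo`` and ``encrypt_and_authenticate`` are placeholder names)::

    import hashlib, hmac, zlib   # zlib stands in for the default lz4 compressor

    def process_chunk(data, id_key, chunks_cache, repo):
        chunk_id = hmac.digest(id_key, data, hashlib.sha256)  # MAC of the plaintext
        if chunk_id in chunks_cache:        # fast hashtable lookup
            return chunk_id                 # already stored: deduplicated, done
        compressed = zlib.compress(data)
        sealed = encrypt_and_authenticate(compressed)  # AES-OCB / chacha20-poly1305
        repo[chunk_id] = sealed             # key/value store; repo layer adds CRC32/XXH64
        chunks_cache.add(chunk_id)
        return chunk_id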
@ -860,13 +839,6 @@ If you feel your Borg backup is too slow somehow, here is what you can do:
``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
when your backup consists of many small files.
-If you feel that Borg "freezes" on a file, it could be in the middle of processing a
-large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
-with the file. This can lead to displaying the name of a small file, while processing the
-next (larger) file. For very big files this can lead to the progress display show some
-previous short file for a long time while it processes the big one. With Borg 1.2 this
-was changed to announcing the filename before starting to process it.
To see what files have changed and take more time processing, you can also add
``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output,
including a file list (with file status characters) and also some statistics at

@ -143,9 +143,14 @@ Index, hints and integrity
The **repository index** is stored in ``index.<TRANSACTION_ID>`` and is used to
determine an object's location in the repository. It is a HashIndex_,
-a hash table using open addressing. It maps object keys_ to two
-unsigned 32-bit integers; the first integer gives the segment number,
-the second indicates the offset of the object's entry within the segment.
+a hash table using open addressing.
+It maps object keys_ to:
+* segment number (uint32)
+* offset of the object's entry within the segment (uint32)
+* size of the payload, not including the entry header (uint32)
+* flags (uint32)
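Conceptually, each index entry can be pictured like this (an illustrative Python model, not the actual C ``HashIndex`` implementation)::

    from typing import NamedTuple

    class IndexEntry(NamedTuple):
        segment: int   # uint32: segment number
        offset: int    # uint32: offset of the object's entry within the segment
        size: int      # uint32: payload size, not including the entry header
        flags: int     # uint32: entry flags

    repo_index = {}    # maps object key (32 bytes) -> IndexEntry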
The **hints file** is a msgpacked file named ``hints.<TRANSACTION_ID>``.
It contains:
@ -153,6 +158,8 @@ It contains:
* version
* list of segments
* compact
+* shadow_index
+* storage_quota_use
The **integrity file** is a msgpacked file named ``integrity.<TRANSACTION_ID>``.
It contains checksums of the index and hints files and is described in the
@ -176,17 +183,9 @@ Since writing a ``DELETE`` tag does not actually delete any data and
thus does not free disk space, any log-based data store will need a
compaction strategy (somewhat analogous to a garbage collector).
-Borg uses a simple forward compacting algorithm,
-which avoids modifying existing segments.
+Borg uses a simple forward compacting algorithm, which avoids modifying existing segments.
Compaction runs when a commit is issued with ``compact=True`` parameter, e.g.
by the ``borg compact`` command (unless the :ref:`append_only_mode` is active).
-One client transaction can manifest as multiple physical transactions,
-since compaction is transacted, too, and Borg does not distinguish between the two::
-
-    Perspective| Time -->
-    -----------+--------------
-    Client     | Begin transaction - Modify Data - Commit | <client waits for repository> (done)
-    Repository | Begin transaction - Modify Data - Commit | Compact segments - Commit | (done)
The compaction algorithm requires two inputs in addition to the segments themselves:
@ -198,9 +197,6 @@ The compaction algorithm requires two inputs in addition to the segments themsel
to be stored as well. Therefore, Borg stores a mapping ``(segment
id,) -> (number of sparse bytes,)``.
-The 1.0.x series used a simpler non-conditional algorithm,
-which only required the list of sparse segments. Thus,
-it only stored a list, not the mapping described above.
(ii) Each segment's reference count, which indicates how many live objects are in a segment.
This is not strictly required to perform the algorithm. Rather, it is used to validate
that a segment is unused before deleting it. If the algorithm is incorrect, or the reference
@ -209,14 +205,7 @@ The compaction algorithm requires two inputs in addition to the segments themsel
These two pieces of information are stored in the hints file (`hints.N`)
next to the index (`index.N`).
-When loading a hints file, Borg checks the version contained in the file.
-The 1.0.x series writes version 1 of the format (with the segments list instead
-of the mapping, mentioned above). Since Borg 1.0.4, version 2 is read as well.
-The 1.1.x series writes version 2 of the format and reads either version.
-When reading a version 1 hints file, Borg 1.1.x will
-read all sparse segments to determine their sparsity.
-This process may take some time if a repository has been kept in append-only mode
+Compaction may take some time if a repository has been kept in append-only mode
or ``borg compact`` has not been used for a longer time, which both have caused
the number of sparse segments to grow.
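In rough pseudocode, the forward compaction described above might look like this (a conceptual sketch only; ``read_entries``, ``is_live``, ``write_entry`` and ``delete_segment`` are hypothetical helpers, and the real implementation is transacted)::

    def compact(segments, sparse_bytes, refcount, threshold):
        for seg in sorted(segments):
            if sparse_bytes.get(seg, 0) < threshold:
                continue                  # too little reclaimable space
            for entry in read_entries(seg):
                if is_live(entry):
                    write_entry(entry)    # copy forward into the current segment
            # after copying live entries forward, nothing references seg anymore
            assert refcount[seg] == 0     # validate before deleting
            delete_segment(seg)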
@ -578,7 +567,7 @@ dictionary created by the ``Item`` class that contains:
* source (for symlinks)
* hlid (for hardlinks)
* rdev (for device files)
-* mtime, atime, ctime in nanoseconds
+* mtime, atime, ctime, birthtime in nanoseconds
* xattrs
* acl (various OS-dependent fields)
* flags
@ -689,14 +678,14 @@ In memory, the files cache is a key -> value mapping (a Python *dict*) and conta
- file inode number
- file size
-- file mtime_ns
+- file ctime_ns (or mtime_ns)
- age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
- list of chunk ids representing the file's contents
To determine whether a file has not changed, cached values are looked up via
the key in the mapping and compared to the current file attribute values.
-If the file's size, mtime_ns and inode number is still the same, it is
+If the file's size, timestamp and inode number is still the same, it is
considered to not have changed. In that case, we check that all file content
chunks are (still) present in the repository (we check that via the chunks
cache).
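A simplified sketch of that check (illustrative Python; the entry field names are made up for the example)::

    def file_is_unchanged(entry, st, chunks_cache):
        if entry is None:                 # not in the files cache
            return False
        if (entry.size, entry.ctime_ns, entry.inode) != \
           (st.st_size, st.st_ctime_ns, st.st_ino):
            return False                  # metadata changed: file must be re-read
        # all content chunks must still be present in the chunks cache
        return all(cid in chunks_cache for cid in entry.chunk_ids)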
@ -714,7 +703,7 @@ different files, as a single path may not be unique across different
archives in different setups.
Not all filesystems have stable inode numbers. If that is the case, borg can
-be told to ignore the inode number in the check via --ignore-inode.
+be told to ignore the inode number in the check via --files-cache.
The age value is used for cache management. If a file is "seen" in a backup
run, its age is reset to 0, otherwise its age is incremented by one.
@ -802,7 +791,7 @@ For small hash tables, we start with a growth factor of 2, which comes down to
E.g. backing up a total count of 1 Mi (IEC binary prefix i.e. 2^20) files with a total size of 1TiB.
-a) with ``create --chunker-params buzhash,10,23,16,4095`` (custom, like borg < 1.0 or attic):
+a) with ``create --chunker-params buzhash,10,23,16,4095`` (custom, like borg < 1.0):
mem_usage = 2.8GiB
@ -887,7 +876,8 @@ Encryption
AEAD modes
~~~~~~~~~~
-Uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
+For new repositories, borg only uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
For each borg invocation, a new sessionkey is derived from the borg key material
and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
to our IV, so we'll just count up by 1 per chunk).
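For illustration, the per-message IV management could be sketched like this (not Borg's code; it only shows the counting scheme)::

    def next_message_iv(session):
        iv = session["message_iv"]        # starts at 0 for a fresh session key
        assert iv < 2 ** 48               # the 48-bit IV must never wrap in a session
        session["message_iv"] = iv + 1
        return iv.to_bytes(6, "big")      # cipher internally appends a 32-bit counter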
@ -909,24 +899,11 @@ even higher limit.
Legacy modes
~~~~~~~~~~~~
-AES_-256 is used in CTR mode (so no need for padding). A 64 bit initialization
-vector is used, a MAC is computed on the encrypted chunk
-and both are stored in the chunk. Encryption and MAC use two different keys.
-Each chunk consists of ``TYPE(1)`` + ``MAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``:
+Old repositories (which used AES-CTR mode) are supported read-only to be able to
+``borg transfer`` their archives to new repositories (which use AEAD modes).
-.. figure:: encryption.png
-   :figwidth: 100%
-   :width: 100%
-In AES-CTR mode you can think of the IV as the start value for the counter.
-The counter itself is incremented by one after each 16 byte block.
-The IV/counter is not required to be random but it must NEVER be reused.
-So to accomplish this Borg initializes the encryption counter to be
-higher than any previously used counter value before encrypting new data.
-To reduce payload size, only 8 bytes of the 16 bytes nonce is saved in the
-payload, the first 8 bytes are always zeros. This does not affect security but
-limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes).
+AES-CTR mode is not supported for new repositories and the related code will be
+removed in a future release.
Both modes
~~~~~~~~~~
@ -947,13 +924,11 @@ Key files
.. seealso:: The :ref:`key_encryption` section for an in-depth review of the key encryption.
-When initialized with the ``init -e keyfile`` command, Borg
-needs an associated file in ``$HOME/.config/borg/keys`` to read and write
-the repository. The format is based on msgpack_, base64 encoding and
-PBKDF2_ SHA256 hashing, which is then encoded again in a msgpack_.
+When initializing a repository with one of the "keyfile" encryption modes,
+Borg creates an associated key file in ``$HOME/.config/borg/keys``.
-The same data structure is also used in the "repokey" modes, which store
-it in the repository in the configuration file.
+The same key is also used in the "repokey" modes, which store it in the repository
+in the configuration file.
The internal data structure is as follows:
@ -963,24 +938,20 @@ version
repository_id
the ``id`` field in the ``config`` ``INI`` file of the repository.
-enc_key
-  the key used to encrypt data with AES (256 bits)
-enc_hmac_key
-  the key used to HMAC the encrypted data (256 bits)
+crypt_key
+  the initial key material used for the AEAD crypto (512 bits)
id_key
-  the key used to HMAC the plaintext chunk data to compute the chunk's id
+  the key used to MAC the plaintext chunk data to compute the chunk's id
chunk_seed
the seed for the buzhash chunking table (signed 32 bit integer)
These fields are packed using msgpack_. The utf-8 encoded passphrase
-is processed with PBKDF2_ (SHA256_, 100000 iterations, random 256 bit salt)
-to derive a 256 bit key encryption key (KEK).
+is processed with argon2_ to derive a 256 bit key encryption key (KEK).
-A `HMAC-SHA256`_ checksum of the packed fields is generated with the KEK,
-then the KEK is also used to encrypt the same packed fields using AES-CTR.
+Then the KEK is used to encrypt and authenticate the packed data using
+the chacha20-poly1305 AEAD cipher.
The result is stored in another msgpack_ formatted as follows:
@ -990,15 +961,12 @@ version
salt
random 256 bits salt used to process the passphrase
-iterations
-  number of iterations used to process the passphrase (currently 100000)
+argon2_*
+  some parameters for the argon2 kdf
algorithm
-  the hashing algorithm used to process the passphrase and do the HMAC
-  checksum (currently the string ``sha256``)
-hash
-  HMAC-SHA256 of the *plaintext* of the packed fields.
+  the algorithms used to process the passphrase
+  (currently the string ``argon2 chacha20-poly1305``)
data
The encrypted, packed fields.
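For illustration, the scheme could be implemented roughly like this (a minimal sketch assuming the ``argon2-cffi``, ``cryptography`` and ``msgpack`` packages; the argon2 parameters shown are placeholders, not Borg's defaults)::

    import os
    import msgpack
    from argon2.low_level import hash_secret_raw, Type
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

    def encrypt_keyblob(packed_fields, passphrase):
        salt = os.urandom(32)             # random 256 bit salt
        kek = hash_secret_raw(            # argon2 -> 256 bit KEK
            secret=passphrase.encode("utf-8"), salt=salt,
            time_cost=3, memory_cost=65536, parallelism=4,  # placeholder values
            hash_len=32, type=Type.ID)
        # a constant IV is safe here: the fresh salt yields a fresh KEK each time
        data = ChaCha20Poly1305(kek).encrypt(bytes(12), packed_fields, None)
        return msgpack.packb({"version": 1, "salt": salt,
                              "algorithm": "argon2 chacha20-poly1305",
                              "data": data})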

(binary image files changed; previews not shown)

@ -152,7 +152,7 @@ each encrypted message.
Session::
sessionid = os.urandom(24)
-    ikm = enc_key || enc_hmac_key
+    ikm = crypt_key
salt = "borg-session-key-CIPHERNAME"
sessionkey = HKDF(ikm, sessionid, salt)
message_iv = 0
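An executable rendering of that pseudocode (illustrative; HKDF per RFC 5869 built on ``hmac.digest``, with the label used as HKDF salt and the sessionid as info — an assumption about how the inputs map)::

    import hashlib, hmac, os

    def hkdf_sha512(ikm, salt, info, length=32):
        prk = hmac.digest(salt, ikm, hashlib.sha512)    # extract
        okm, block, counter = b"", b"", 1
        while len(okm) < length:                        # expand
            block = hmac.digest(prk, block + info + bytes([counter]), hashlib.sha512)
            okm += block
            counter += 1
        return okm[:length]

    crypt_key = os.urandom(64)    # placeholder for the real key material
    sessionid = os.urandom(24)
    sessionkey = hkdf_sha512(crypt_key, b"borg-session-key-CIPHERNAME", sessionid)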
@ -216,32 +216,23 @@ For offline storage of the encryption keys they are encrypted with a
user-chosen passphrase.
A 256 bit key encryption key (KEK) is derived from the passphrase
-using PBKDF2-HMAC-SHA256 with a random 256 bit salt which is then used
-to Encrypt-*and*-MAC (unlike the Encrypt-*then*-MAC approach used
-otherwise) a packed representation of the keys with AES-256-CTR with a
-constant initialization vector of 0. A HMAC-SHA256 of the plaintext is
-generated using the same KEK and is stored alongside the ciphertext,
-which is converted to base64 in its entirety.
+using argon2_ with a random 256 bit salt. The KEK is then used
+to Encrypt-*then*-MAC a packed representation of the keys using the
+chacha20-poly1305 AEAD cipher and a constant IV == 0.
+The ciphertext is then converted to base64.
This base64 blob (commonly referred to as *keyblob*) is then stored in
the key file or in the repository config (keyfile and repokey modes
respectively).
-This scheme, and specifically the use of a constant IV with the CTR
-mode, is secure because an identical passphrase will result in a
-different derived KEK for every key encryption due to the salt.
-The use of Encrypt-and-MAC instead of Encrypt-then-MAC is seen as
-uncritical (but not ideal) here, since it is combined with AES-CTR mode,
-which is not vulnerable to padding attacks.
+The use of a constant IV is secure because an identical passphrase will
+result in a different derived KEK for every key encryption due to the salt.
.. seealso::
Refer to the :ref:`key_files` section for details on the format.
-Refer to issue :issue:`747` for suggested improvements of the encryption
-scheme and password-based key derivation.
Implementations used
--------------------
@ -249,27 +240,18 @@ Implementations used
We do not implement cryptographic primitives ourselves, but rely
on widely used libraries providing them:
-- AES-CTR, AES-OCB, CHACHA20-POLY1305 and HMAC-SHA-256 from OpenSSL 1.1 are used,
+- AES-OCB and CHACHA20-POLY1305 from OpenSSL 1.1 are used,
which is also linked into the static binaries we provide.
We think this is not an additional risk, since we don't ever
use OpenSSL's networking, TLS or X.509 code, but only their
primitives implemented in libcrypto.
- SHA-256, SHA-512 and BLAKE2b from Python's hashlib_ standard library module are used.
-  Borg requires a Python built with OpenSSL support (due to PBKDF2), therefore
-  these functions are delegated to OpenSSL by Python.
-- HMAC, PBKDF2 and a constant-time comparison from Python's hmac_ standard
-  library module is used. While the HMAC implementation is written in Python,
-  the PBKDF2 implementation is provided by OpenSSL. The constant-time comparison
-  (``compare_digest``) is written in C and part of Python.
+- HMAC and a constant-time comparison from Python's hmac_ standard library module are used.
+- argon2 is used via argon2-cffi.
Implemented cryptographic constructions are:
+- AEAD modes: AES-OCB and CHACHA20-POLY1305 are straight from OpenSSL.
+- Legacy modes: Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
   or keyed BLAKE2b256 as described above under Encryption_.
-- Encrypt-and-MAC based on AES-256-CTR and HMAC-SHA-256
-  as described above under `Offline key security`_.
-- HKDF_-SHA-512
+- HKDF_-SHA-512 (using ``hmac.digest`` from Python's hmac_ standard library module)
.. _Horton principle: https://en.wikipedia.org/wiki/Horton_Principle
.. _HKDF: https://tools.ietf.org/html/rfc5869

@ -5,8 +5,7 @@
Quick Start
===========
-This chapter will get you started with Borg and covers
-various use cases.
+This chapter will get you started with Borg and covers various use cases.
A step by step example
----------------------
@ -83,7 +82,7 @@ root, just run it as your normal user.
For a local repository just always use the same user to invoke borg.
-For a remote repository: always use e.g. borg@remote_host. You can use this
+For a remote repository: always use e.g. ssh://borg@remote_host. You can use this
from different local users, the remote user running borg and accessing the
repo will always be `borg`.
@ -113,7 +112,7 @@ common techniques to achieve this.
- Dump databases or stop the database servers.
-- Shut down virtual machines before backing up their images.
+- Shut down virtual machines before backing up their disk image files.
- Shut down containers before backing up their storage volumes.
@ -144,7 +143,7 @@ After the backup this script also uses the :ref:`borg_prune` subcommand to keep
only a certain number of old archives and deletes the others.
Finally, it uses the :ref:`borg_compact` subcommand to remove deleted objects
-from the segment files in the repository to preserve disk space.
+from the segment files in the repository to free disk space.
Before running, make sure that the repository is initialized as documented in
:ref:`remote_repos` and that the script has the correct permissions to be executable

@ -37,7 +37,7 @@ Examples
# Make a big effort in fine granular deduplication (big chunk management
# overhead, needs a lot of RAM and disk space, see formula in internals
-# docs - same parameters as borg < 1.0 or attic):
+# docs - same parameters as borg < 1.0):
$ borg create --chunker-params buzhash,10,23,16,4095 small /smallstuff
# Backup a raw device (must not be active/in use/mounted at that time)

@ -19,14 +19,14 @@ Examples
--other-repo ssh://borg2@borgbackup/./tests/b12 -e repokey-blake2-aes-ocb
# 2. Check what and how much it would transfer:
-$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \
+$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \
--other-repo ssh://borg2@borgbackup/./tests/b12 --dry-run
# 3. Transfer (copy) archives from old repo into new repo (takes time and space!):
-$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \
+$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \
--other-repo ssh://borg2@borgbackup/./tests/b12
# 4. Check if we have everything (same as 2.):
-$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \
+$ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \
--other-repo ssh://borg2@borgbackup/./tests/b12 --dry-run