diff --git a/docs/development.rst b/docs/development.rst
index 506bf7254..be4ed84d9 100644
--- a/docs/development.rst
+++ b/docs/development.rst
@@ -53,7 +53,7 @@ requests (if you don't have GitHub or don't want to use it you can send smaller
 patches via the borgbackup mailing list to the maintainers).

 Stable releases are maintained on maintenance branches named ``x.y-maint``, eg.
-the maintenance branch of the 1.0.x series is ``1.0-maint``.
+the maintenance branch of the 1.2.x series is ``1.2-maint``.

 Most PRs should be filed against the ``master`` branch. Only if an issue affects
 **only** a particular maintenance branch a PR should be
diff --git a/docs/faq.rst b/docs/faq.rst
index a6fb31edb..dbcdf9d6a 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -51,43 +51,26 @@ Can I copy or synchronize my repo to another location?
 If you want to have redundant backup repositories (preferably at separate
 locations), the recommended way to do that is like this:

-- ``borg rcreate repo1``
-- ``borg rcreate repo2``
+- ``borg rcreate repo1 --encryption=X``
+- ``borg rcreate repo2 --encryption=X --other-repo=repo1``
+- optionally take a filesystem snapshot first, so that both ``borg create``
+  runs see stable, identical input data.
- client machine ---borg create---> repo1
- client machine ---borg create---> repo2

-This will create distinct repositories (separate repo ID, separate
-keys) and nothing bad happening in repo1 will influence repo2.
+This will create distinct (different repo ID) but related repositories.
+Related means they use the same chunker secret and the same id_key, thus
+producing the same chunks / the same chunk ids for the same input data
+(see the sketch below).

-Some people decide against above recommendation and create identical
-copies of a repo (using some copy / sync / clone tool).
+Because the two ``borg create`` invocations are independent, errors in repo1
+cannot propagate to repo2.

-While this might be better than having no redundancy at all, you have
-to be very careful about how you do that and what you may / must not
-do with the result (if you decide against our recommendation).
+An alternative is to use ``borg transfer`` to copy backup archives from repo1
+to repo2. This is likely a bit more efficient and the archives would be
+identical, but it is susceptible to error propagation (an error in repo1
+could be copied into repo2).

-What you would get with this is:
-
-- client machine ---borg create---> repo
-- repo ---copy/sync---> copy-of-repo
-
-There is no special borg command to do the copying, you could just
-use any reliable tool that creates an identical copy (cp, rsync, rclone
-might be options).
-
-But think about whether that is really what you want. If something goes
-wrong in repo, you will have the same issue in copy-of-repo.
-
-Make sure you do the copy/sync while no backup is running, see
-:ref:`borg_with-lock` about how to do that.
-
-Also, you must not run borg against multiple instances of the same repo
-(like repo and copy-of-repo) as that would create severe issues:
-
-- Data loss: they have the same repository ID, so the borg client will
- think they are identical and e.g. use the same local cache for them
- (which is an issue if they happen to be not the same).
- See :issue:`4272` for an example.
+Warning: using borg with multiple repositories that share an identical
+repository ID (as created by making 1:1 repository copies) is not supported
+and can lead to all sorts of issues, e.g. cache coherency problems,
+malfunction or data corruption.
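+To illustrate "related": borg computes a chunk's id by MAC'ing the plaintext
+chunk with the id_key (HMAC-SHA256 or keyed BLAKE2b, depending on the mode),
+so repositories sharing the id_key assign identical ids to identical chunks.
+A minimal sketch of that property (the key value is hypothetical)::
+
+    import hashlib
+    import hmac
+
+    def chunk_id(id_key: bytes, chunk: bytes) -> bytes:
+        # same id_key + same chunk data -> same chunk id, in repo1 and repo2
+        return hmac.new(id_key, chunk, hashlib.sha256).digest()
+
+    id_key = b"k" * 32                  # hypothetical shared id_key
+    assert chunk_id(id_key, b"same data") == chunk_id(id_key, b"same data")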
"this is either an attack or unsafe" warning -------------------------------------------- @@ -192,7 +175,13 @@ that option under any normal circumstances. How can I backup huge file(s) over a unstable connection? --------------------------------------------------------- -This is not a problem anymore. +Yes. For more details, see :ref:`checkpoints_parts`. + +How can I restore huge file(s) over an unstable connection? +----------------------------------------------------------- + +If you cannot manage to extract the whole big file in one go, you can extract +all the part files and manually concatenate them together. For more details, see :ref:`checkpoints_parts`. @@ -203,21 +192,12 @@ You could do that (via borg config REPO append_only 0/1), but using different ssh keys and different entries in ``authorized_keys`` is much easier and also maybe has less potential of things going wrong somehow. - My machine goes to sleep causing `Broken pipe` ---------------------------------------------- -When backing up your data over the network, your machine should not go to sleep. +While backing up your data over the network, your machine should not go to sleep. On macOS you can use `caffeinate` to avoid that. -How can I restore huge file(s) over an unstable connection? ------------------------------------------------------------ - -If you cannot manage to extract the whole big file in one go, you can extract -all the part files and manually concatenate them together. - -For more details, see :ref:`checkpoints_parts`. - How can I compare contents of an archive to my local filesystem? ----------------------------------------------------------------- @@ -385,9 +365,9 @@ Why is the time elapsed in the archive stats different from wall clock time? ---------------------------------------------------------------------------- Borg needs to write the time elapsed into the archive metadata before finalizing -the archive, compacting the segments, and committing the repo & cache. This means -when Borg is run with e.g. the ``time`` command, the duration shown in the archive -stats may be shorter than the full time the command runs for. +the archive and committing the repo & cache. +This means when Borg is run with e.g. the ``time`` command, the duration shown +in the archive stats may be shorter than the full time the command runs for. How do I configure different prune policies for different directories? ---------------------------------------------------------------------- @@ -810,13 +790,12 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat fast). If so, the processing of the chunk is completed here. Otherwise it needs to process the chunk: - Compresses (the default lz4 is super fast) -- Encrypts (AES, usually fast if your CPU has AES acceleration as usual - since about 10y) -- Authenticates ("signs") using hmac-sha256 or blake2b (see above), +- Encrypts and authenticates (AES-OCB, usually fast if your CPU has AES acceleration as usual + since about 10y, or chacha20-poly1305, fast pure-software crypto) - Transmits to repo. If the repo is remote, this usually involves an SSH connection (does its own encryption / authentication). - Stores the chunk into a key/value store (the key is the chunk id, the value - is the data). While doing that, it computes a CRC32 of the data (repo low-level + is the data). While doing that, it computes CRC32 / XXH64 of the data (repo low-level checksum, used by borg check --repository) and also updates the repo index (another hashtable). 
@@ -860,13 +839,6 @@ If you feel your Borg backup is too slow somehow, here is what you can do:
   ``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
   when your backup consists of many small files.

-If you feel that Borg "freezes" on a file, it could be in the middle of processing a
-large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
-with the file. This can lead to displaying the name of a small file, while processing the
-next (larger) file. For very big files this can lead to the progress display show some
-previous short file for a long time while it processes the big one. With Borg 1.2 this
-was changed to announcing the filename before starting to process it.
-
 To see what files have changed and take more time processing, you can also add
 ``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output,
 including a file list (with file status characters) and also some statistics at
diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index de91f9ab0..fe35e4ad5 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -143,9 +143,14 @@ Index, hints and integrity

 The **repository index** is stored in ``index.`` and is used to
 determine an object's location in the repository. It is a HashIndex_,
-a hash table using open addressing. It maps object keys_ to two
-unsigned 32-bit integers; the first integer gives the segment number,
-the second indicates the offset of the object's entry within the segment.
+a hash table using open addressing.
+
+It maps object keys_ to:
+
+* segment number (uint32)
+* offset of the object's entry within the segment (uint32)
+* size of the payload, not including the entry header (uint32)
+* flags (uint32)

 The **hints file** is a msgpacked file named ``hints.``.
 It contains:
@@ -153,6 +158,8 @@ It contains:
 * version
 * list of segments
 * compact
+* shadow_index
+* storage_quota_use

 The **integrity file** is a msgpacked file named ``integrity.``. It contains
 checksums of the index and hints files and is described in the
@@ -176,17 +183,9 @@ Since writing a ``DELETE`` tag does not actually delete any data and thus
 does not free disk space any log-based data store will need a
 compaction strategy (somewhat analogous to a garbage collector).

-Borg uses a simple forward compacting algorithm,
-which avoids modifying existing segments.
+Borg uses a simple forward compacting algorithm, which avoids modifying existing segments.
 Compaction runs when a commit is issued with ``compact=True`` parameter,
 e.g. by the ``borg compact`` command (unless the :ref:`append_only_mode` is active).
-One client transaction can manifest as multiple physical transactions,
-since compaction is transacted, too, and Borg does not distinguish between the two::
-
-    Perspective| Time -->
-    -----------+--------------
-    Client     | Begin transaction - Modify Data - Commit | (done)
-    Repository | Begin transaction - Modify Data - Commit | Compact segments - Commit | (done)

 The compaction algorithm requires two inputs in addition to the segments themselves:

@@ -198,9 +197,6 @@ The compaction algorithm requires two inputs in addition to the segments themsel
    to be stored as well. Therefore, Borg stores a mapping ``(segment
    id,) -> (number of sparse bytes,)``.

-   The 1.0.x series used a simpler non-conditional algorithm,
-   which only required the list of sparse segments. Thus,
-   it only stored a list, not the mapping described above.
(ii) Each segment's reference count, which indicates how many live objects are in a segment.
     This is not strictly required to perform the algorithm. Rather, it is used to validate
     that a segment is unused before deleting it. If the algorithm is incorrect, or the reference
@@ -209,14 +205,7 @@ The compaction algorithm requires two inputs in addition to the segments themsel
 These two pieces of information are stored in the hints file (`hints.N`)
 next to the index (`index.N`).

-When loading a hints file, Borg checks the version contained in the file.
-The 1.0.x series writes version 1 of the format (with the segments list instead
-of the mapping, mentioned above). Since Borg 1.0.4, version 2 is read as well.
-The 1.1.x series writes version 2 of the format and reads either version.
-When reading a version 1 hints file, Borg 1.1.x will
-read all sparse segments to determine their sparsity.
-
-This process may take some time if a repository has been kept in append-only mode
+Compaction may take some time if a repository has been kept in append-only mode
 or ``borg compact`` has not been used for a longer time, which both has caused
 the number of sparse segments to grow.

@@ -578,7 +567,7 @@ dictionary created by the ``Item`` class that contains:
 * source (for symlinks)
 * hlid (for hardlinks)
 * rdev (for device files)
-* mtime, atime, ctime in nanoseconds
+* mtime, atime, ctime, birthtime in nanoseconds
 * xattrs
 * acl (various OS-dependent fields)
 * flags
@@ -689,14 +678,14 @@ In memory, the files cache is a key -> value mapping (a Python *dict*) and conta

   - file inode number
   - file size
-  - file mtime_ns
+  - file ctime_ns (or mtime_ns)
   - age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
   - list of chunk ids representing the file's contents

 To determine whether a file has not changed, cached values are looked up via
 the key in the mapping and compared to the current file attribute values.

-If the file's size, mtime_ns and inode number is still the same, it is
+If the file's size, timestamp and inode number are still the same, it is
 considered to not have changed. In that case, we check that all file content
 chunks are (still) present in the repository (we check that via the chunks
 cache).
@@ -714,7 +703,7 @@ different files, as a single path may not be unique across different
 archives in different setups.

 Not all filesystems have stable inode numbers. If that is the case, borg can
-be told to ignore the inode number in the check via --ignore-inode.
+be told to ignore the inode number in the check via --files-cache.

 The age value is used for cache management. If a file is "seen" in a backup
 run, its age is reset to 0, otherwise its age is incremented by one.
@@ -802,7 +791,7 @@ For small hash tables, we start with a growth factor of 2, which comes down to

 E.g. backing up a total count of 1 Mi (IEC binary prefix i.e. 2^20) files with a total size of 1TiB.

-a) with ``create --chunker-params buzhash,10,23,16,4095`` (custom, like borg < 1.0 or attic):
+a) with ``create --chunker-params buzhash,10,23,16,4095`` (custom, like borg < 1.0):

   mem_usage  =  2.8GiB

@@ -887,7 +876,8 @@ Encryption
 AEAD modes
 ~~~~~~~~~~

-Uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
+For new repositories, borg only uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
+
 For each borg invocation, a new sessionkey is derived from the borg key material
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 to our IV, so we'll just count up by 1 per chunk).
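+A minimal sketch of this session keying and IV counting (the class and the
+HMAC-based derivation stand-in are illustrative; borg's real derivation is
+the HKDF described in the security internals)::
+
+    import hashlib
+    import hmac
+    import os
+
+    class Session:
+        def __init__(self, crypt_key: bytes, ciphername: bytes):
+            self.sessionid = os.urandom(24)        # fresh per borg invocation
+            salt = b"borg-session-key-" + ciphername
+            # stand-in for HKDF-SHA-512(crypt_key, sessionid, salt)
+            self.sessionkey = hmac.new(salt, crypt_key + self.sessionid,
+                                       hashlib.sha512).digest()[:32]
+            self.message_iv = 0                    # 48 bit IV, starts at 0
+
+        def next_iv(self) -> int:
+            iv = self.message_iv
+            self.message_iv += 1                   # count up by 1 per chunk
+            assert self.message_iv < 2 ** 48       # an IV must never repeat
+            return iv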
@@ -909,24 +899,11 @@ even higher limit. Legacy modes ~~~~~~~~~~~~ -AES_-256 is used in CTR mode (so no need for padding). A 64 bit initialization -vector is used, a MAC is computed on the encrypted chunk -and both are stored in the chunk. Encryption and MAC use two different keys. -Each chunk consists of ``TYPE(1)`` + ``MAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``: +Old repositories (which used AES-CTR mode) are supported read-only to be able to +``borg transfer`` their archives to new repositories (which use AEAD modes). -.. figure:: encryption.png - :figwidth: 100% - :width: 100% - -In AES-CTR mode you can think of the IV as the start value for the counter. -The counter itself is incremented by one after each 16 byte block. -The IV/counter is not required to be random but it must NEVER be reused. -So to accomplish this Borg initializes the encryption counter to be -higher than any previously used counter value before encrypting new data. - -To reduce payload size, only 8 bytes of the 16 bytes nonce is saved in the -payload, the first 8 bytes are always zeros. This does not affect security but -limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes). +AES-CTR mode is not supported for new repositories and the related code will be +removed in a future release. Both modes ~~~~~~~~~~ @@ -947,13 +924,11 @@ Key files .. seealso:: The :ref:`key_encryption` section for an in-depth review of the key encryption. -When initialized with the ``init -e keyfile`` command, Borg -needs an associated file in ``$HOME/.config/borg/keys`` to read and write -the repository. The format is based on msgpack_, base64 encoding and -PBKDF2_ SHA256 hashing, which is then encoded again in a msgpack_. +When initializing a repository with one of the "keyfile" encryption modes, +Borg creates an associated key file in ``$HOME/.config/borg/keys``. -The same data structure is also used in the "repokey" modes, which store -it in the repository in the configuration file. +The same key is also used in the "repokey" modes, which store it in the repository +in the configuration file. The internal data structure is as follows: @@ -963,24 +938,20 @@ version repository_id the ``id`` field in the ``config`` ``INI`` file of the repository. -enc_key - the key used to encrypt data with AES (256 bits) - -enc_hmac_key - the key used to HMAC the encrypted data (256 bits) +crypt_key + the initial key material used for the AEAD crypto (512 bits) id_key - the key used to HMAC the plaintext chunk data to compute the chunk's id + the key used to MAC the plaintext chunk data to compute the chunk's id chunk_seed the seed for the buzhash chunking table (signed 32 bit integer) These fields are packed using msgpack_. The utf-8 encoded passphrase -is processed with PBKDF2_ (SHA256_, 100000 iterations, random 256 bit salt) -to derive a 256 bit key encryption key (KEK). +is processed with argon2_ to derive a 256 bit key encryption key (KEK). -A `HMAC-SHA256`_ checksum of the packed fields is generated with the KEK, -then the KEK is also used to encrypt the same packed fields using AES-CTR. +Then the KEK is used to encrypt and authenticate the packed data using +the chacha20-poly1305 AEAD cipher. 
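+A minimal sketch of this KEK derivation and keyblob encryption, assuming the
+argon2-cffi and cryptography packages are available; the argon2 cost
+parameters shown are placeholders, the actual values are stored in the key's
+argon2_* fields (see below)::
+
+    import os
+
+    from argon2.low_level import Type, hash_secret_raw
+    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
+
+    def encrypt_keyblob(passphrase: bytes, packed_key: bytes):
+        salt = os.urandom(32)                    # random 256 bit salt
+        kek = hash_secret_raw(passphrase, salt,  # argon2 KDF; the cost values
+                              time_cost=3,       # here are placeholders
+                              memory_cost=65536,
+                              parallelism=4,
+                              hash_len=32, type=Type.ID)
+        # constant IV == 0 is safe: the random salt makes every KEK unique
+        ciphertext = ChaCha20Poly1305(kek).encrypt(bytes(12), packed_key, None)
+        return salt, ciphertext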
The result is stored in another msgpack_ formatted as follows:

@@ -990,15 +961,12 @@ version
 salt
     random 256 bits salt used to process the passphrase

-iterations
-    number of iterations used to process the passphrase (currently 100000)
+argon2_*
+    some parameters for the argon2 KDF algorithm

 algorithm
-    the hashing algorithm used to process the passphrase and do the HMAC
-    checksum (currently the string ``sha256``)
-
-hash
-    HMAC-SHA256 of the *plaintext* of the packed fields.
+    the algorithms used to process the passphrase
+    (currently the string ``argon2 chacha20-poly1305``)

 data
     The encrypted, packed fields.
diff --git a/docs/internals/encryption-aead.odg b/docs/internals/encryption-aead.odg
index 0f74fb428..a28a63b21 100644
Binary files a/docs/internals/encryption-aead.odg and b/docs/internals/encryption-aead.odg differ
diff --git a/docs/internals/encryption-aead.png b/docs/internals/encryption-aead.png
index b9eb2339b..1bcfbd178 100644
Binary files a/docs/internals/encryption-aead.png and b/docs/internals/encryption-aead.png differ
diff --git a/docs/internals/encryption.odg b/docs/internals/encryption.odg
deleted file mode 100644
index ce1916729..000000000
Binary files a/docs/internals/encryption.odg and /dev/null differ
diff --git a/docs/internals/encryption.png b/docs/internals/encryption.png
deleted file mode 100644
index 0f1b80642..000000000
Binary files a/docs/internals/encryption.png and /dev/null differ
diff --git a/docs/internals/security.rst b/docs/internals/security.rst
index e8b9bbf05..e90549d0f 100644
--- a/docs/internals/security.rst
+++ b/docs/internals/security.rst
@@ -152,7 +152,7 @@ each encrypted message.

 Session::

    sessionid = os.urandom(24)
-   ikm = enc_key || enc_hmac_key
+   ikm = crypt_key
    salt = "borg-session-key-CIPHERNAME"
    sessionkey = HKDF(ikm, sessionid, salt)
    message_iv = 0

@@ -216,32 +216,23 @@ For offline storage of the encryption keys they are encrypted with a
 user-chosen passphrase.

 A 256 bit key encryption key (KEK) is derived from the passphrase
-using PBKDF2-HMAC-SHA256 with a random 256 bit salt which is then used
-to Encrypt-*and*-MAC (unlike the Encrypt-*then*-MAC approach used
-otherwise) a packed representation of the keys with AES-256-CTR with a
-constant initialization vector of 0. A HMAC-SHA256 of the plaintext is
-generated using the same KEK and is stored alongside the ciphertext,
-which is converted to base64 in its entirety.
+using argon2_ with a random 256 bit salt. The KEK is then used
+to Encrypt-*then*-MAC a packed representation of the keys using the
+chacha20-poly1305 AEAD cipher and a constant IV == 0.
+The ciphertext is then converted to base64.

 This base64 blob (commonly referred to as *keyblob*) is then stored in
 the key file or in the repository config (keyfile and repokey modes
 respectively).

-This scheme, and specifically the use of a constant IV with the CTR
-mode, is secure because an identical passphrase will result in a
-different derived KEK for every key encryption due to the salt.
-
-The use of Encrypt-and-MAC instead of Encrypt-then-MAC is seen as
-uncritical (but not ideal) here, since it is combined with AES-CTR mode,
-which is not vulnerable to padding attacks.
+The use of a constant IV is secure because an identical passphrase will
+result in a different derived KEK for every key encryption due to the salt.

 .. seealso::

    Refer to the :ref:`key_files` section for details on the format.

-   Refer to issue :issue:`747` for suggested improvements of the encryption
-   scheme and password-based key derivation.
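+The HKDF used for the session keys above can be built from Python's ``hmac``
+module alone (see the next section). A minimal RFC 5869 sketch; mapping the
+sessionid to HKDF's ``info`` parameter is an assumption about argument order::
+
+    import hmac
+    import os
+
+    def hkdf_sha512(ikm: bytes, info: bytes, salt: bytes, length: int) -> bytes:
+        prk = hmac.digest(salt, ikm, "sha512")           # extract step
+        okm, block = b"", b""
+        for counter in range(1, -(-length // 64) + 1):   # expand step
+            block = hmac.digest(prk, block + info + bytes([counter]), "sha512")
+            okm += block
+        return okm[:length]
+
+    sessionid = os.urandom(24)
+    sessionkey = hkdf_sha512(b"k" * 64, sessionid,       # ikm = crypt_key
+                             b"borg-session-key-AES-OCB", 32)
+    assert len(sessionkey) == 32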
Implementations used -------------------- @@ -249,27 +240,18 @@ Implementations used We do not implement cryptographic primitives ourselves, but rely on widely used libraries providing them: -- AES-CTR, AES-OCB, CHACHA20-POLY1305 and HMAC-SHA-256 from OpenSSL 1.1 are used, +- AES-OCB and CHACHA20-POLY1305 from OpenSSL 1.1 are used, which is also linked into the static binaries we provide. We think this is not an additional risk, since we don't ever use OpenSSL's networking, TLS or X.509 code, but only their primitives implemented in libcrypto. - SHA-256, SHA-512 and BLAKE2b from Python's hashlib_ standard library module are used. - Borg requires a Python built with OpenSSL support (due to PBKDF2), therefore - these functions are delegated to OpenSSL by Python. -- HMAC, PBKDF2 and a constant-time comparison from Python's hmac_ standard - library module is used. While the HMAC implementation is written in Python, - the PBKDF2 implementation is provided by OpenSSL. The constant-time comparison - (``compare_digest``) is written in C and part of Python. +- HMAC and a constant-time comparison from Python's hmac_ standard library module are used. +- argon2 is used via argon2-cffi. Implemented cryptographic constructions are: -- AEAD modes: AES-OCB and CHACHA20-POLY1305 are straight from OpenSSL. -- Legacy modes: Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256 - or keyed BLAKE2b256 as described above under Encryption_. -- Encrypt-and-MAC based on AES-256-CTR and HMAC-SHA-256 - as described above under `Offline key security`_. -- HKDF_-SHA-512 +- HKDF_-SHA-512 (using ``hmac.digest`` from Python's hmac_ standard library module) .. _Horton principle: https://en.wikipedia.org/wiki/Horton_Principle .. _HKDF: https://tools.ietf.org/html/rfc5869 diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 5754a02cf..1711fee2d 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -5,8 +5,7 @@ Quick Start =========== -This chapter will get you started with Borg and covers -various use cases. +This chapter will get you started with Borg and covers various use cases. A step by step example ---------------------- @@ -83,7 +82,7 @@ root, just run it as your normal user. For a local repository just always use the same user to invoke borg. -For a remote repository: always use e.g. borg@remote_host. You can use this +For a remote repository: always use e.g. ssh://borg@remote_host. You can use this from different local users, the remote user running borg and accessing the repo will always be `borg`. @@ -113,7 +112,7 @@ common techniques to achieve this. - Dump databases or stop the database servers. -- Shut down virtual machines before backing up their images. +- Shut down virtual machines before backing up their disk image files. - Shut down containers before backing up their storage volumes. @@ -144,7 +143,7 @@ After the backup this script also uses the :ref:`borg_prune` subcommand to keep only a certain number of old archives and deletes the others. Finally, it uses the :ref:`borg_compact` subcommand to remove deleted objects -from the segment files in the repository to preserve disk space. +from the segment files in the repository to free disk space. 
Before running, make sure that the repository is initialized as documented in :ref:`remote_repos` and that the script has the correct permissions to be executable diff --git a/docs/usage/create.rst b/docs/usage/create.rst index a64d63b96..b4b978a3c 100644 --- a/docs/usage/create.rst +++ b/docs/usage/create.rst @@ -37,7 +37,7 @@ Examples # Make a big effort in fine granular deduplication (big chunk management # overhead, needs a lot of RAM and disk space, see formula in internals - # docs - same parameters as borg < 1.0 or attic): + # docs - same parameters as borg < 1.0): $ borg create --chunker-params buzhash,10,23,16,4095 small /smallstuff # Backup a raw device (must not be active/in use/mounted at that time) diff --git a/docs/usage/transfer.rst b/docs/usage/transfer.rst index cab2d2a62..ccc395782 100644 --- a/docs/usage/transfer.rst +++ b/docs/usage/transfer.rst @@ -19,14 +19,14 @@ Examples --other-repo ssh://borg2@borgbackup/./tests/b12 -e repokey-blake2-aes-ocb # 2. Check what and how much it would transfer: - $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \ + $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \ --other-repo ssh://borg2@borgbackup/./tests/b12 --dry-run # 3. Transfer (copy) archives from old repo into new repo (takes time and space!): - $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \ + $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \ --other-repo ssh://borg2@borgbackup/./tests/b12 # 4. Check if we have everything (same as 2.): - $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer \ + $ borg --repo ssh://borg2@borgbackup/./tests/b20 transfer --upgrader=From12To20 \ --other-repo ssh://borg2@borgbackup/./tests/b12 --dry-run