
Merge pull request #8332 from ThomasWaldmann/use-borgstore

use borgstore and other big changes
TW 2024-09-08 15:16:24 +02:00 committed by GitHub
commit ea08e49210
166 changed files with 6744 additions and 8421 deletions

View file

@ -12,4 +12,4 @@ jobs:
- uses: actions/checkout@v4
- uses: psf/black@stable
with:
version: "~= 23.0"
version: "~= 24.0"

View file

@ -107,8 +107,7 @@ jobs:
pip install -r requirements.d/development.txt
- name: Install borgbackup
run: |
# pip install -e .
python setup.py -v develop
pip install -e .
- name: run tox env
env:
XDISTN: "4"

View file

@ -1,6 +1,6 @@
repos:
- repo: https://github.com/psf/black
rev: 23.1.0
rev: 24.8.0
hooks:
- id: black
- repo: https://github.com/astral-sh/ruff-pre-commit

View file

@ -69,7 +69,7 @@ Main features
**Speed**
* performance-critical code (chunking, compression, encryption) is
implemented in C/Cython
* local caching of files/chunks index data
* local caching
* quick detection of unmodified files
**Data encryption**

View file

@ -12,8 +12,8 @@ This section provides information about security and corruption issues.
Upgrade Notes
=============
borg 1.2.x to borg 2.0
----------------------
borg 1.2.x/1.4.x to borg 2.0
----------------------------
Compatibility notes:
@ -21,11 +21,11 @@ Compatibility notes:
We tried to put all the necessary "breaking" changes into this release, so we
hopefully do not need another breaking release in the near future. The changes
were necessary for improved security, improved speed, unblocking future
improvements, getting rid of legacy crap / design limitations, having less and
simpler code to maintain.
were necessary for improved security, improved speed and parallelism,
unblocking future improvements, getting rid of legacy crap and design
limitations, having less and simpler code to maintain.
You can use "borg transfer" to transfer archives from borg 1.1/1.2 repos to
You can use "borg transfer" to transfer archives from borg 1.2/1.4 repos to
a new borg 2.0 repo, but it will need some time and space.
Before using "borg transfer", you must have upgraded to borg >= 1.2.6 (or
@ -84,6 +84,7 @@ Compatibility notes:
- removed --nobsdflags (use --noflags)
- removed --noatime (default now, see also --atime)
- removed --save-space option (does not change behaviour)
- removed --bypass-lock option
- using --list together with --progress is now disallowed (except with --log-json), #7219
- the --glob-archives option was renamed to --match-archives (the short option
name -a is unchanged) and extended to support different pattern styles:
@ -114,12 +115,61 @@ Compatibility notes:
fail now that somehow "worked" before (but maybe didn't work as intended due to
the contradicting options).
.. _changelog:
Change Log 2.x
==============
Version 2.0.0b10 (2024-09-09)
-----------------------------
TL;DR: this is a huge change and the first truly fundamental change in how borg
works, ever:
- you will need to create new repos.
- likely more exciting than previous betas, definitely not for production.
New features:
- borgstore based repository, file:, ssh: and sftp: for now, more possible.
- repository stores objects separately now, not using segment files.
this has more fs overhead, but needs much less I/O because no segment
file compaction is required anymore. Also, no repository index is
needed anymore because we can directly find the objects by their ID.
- locking: new borgstore based repository locking with automatic stale
lock removal (if the lock does not get refreshed in time, or if the lock owner process is dead).
- simultaneous repository access for many borg commands except check/compact.
the cache lock for adhocwithfiles is still exclusive though, so use
BORG_CACHE_IMPL=adhoc if you want to try that out using only 1 machine
and 1 user (that implementation doesn't use a cache lock). When using
multiple client machines or users, it also works with the default cache.
- delete/prune: much quicker now and can be undone.
- check --repair --undelete-archives: bring archives back from the dead.
- rspace: manage reserved space in repository (avoid dead-end situation if
repository fs runs full).
Bugs/issues fixed:
- a lot! all linked from PR #8332.
Other changes:
- repository: remove transactions, solved differently and much simpler now
(convergence and write order primarily).
- repository: replaced precise reference counting with "object exists in repo?"
and "garbage collection of unused objects".
- cache: remove transactions, remove chunks cache.
removed LocalCache, BORG_CACHE_IMPL=local, solving all related issues.
as in beta 9, adhocwithfiles is the default implementation.
- compact: needs the borg key now (run it clientside), -v gives nice stats.
- transfer: archive transfers from borg 1.x need the --from-borg1 option.
- check: reimplemented / bigger changes.
- code: got rid of a metric ton of unneeded complexity.
when borg no longer needs to read borg 1.x repos/archives (after users
have transferred their archives), even more can be removed.
- docs: updated / removed outdated stuff.
Version 2.0.0b9 (2024-07-20)
----------------------------

View file

@ -3469,7 +3469,7 @@ Other changes:
- archiver tests: add check_cache tool - lints refcounts
- fixed cache sync performance regression from 1.1.0b1 onwards, #1940
- syncing the cache without chunks.archive.d (see :ref:`disable_archive_chunks`)
- syncing the cache without chunks.archive.d
now avoids any merges and is thus faster, #1940
- borg check --verify-data: faster due to linear on-disk-order scan
- borg debug-xxx commands removed, we use "debug xxx" subcommands now, #1627

View file

@ -105,7 +105,7 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
#
# Options for borg create
BORG_OPTS="--stats --one-file-system --compression lz4 --checkpoint-interval 86400"
BORG_OPTS="--stats --one-file-system --compression lz4"
# Set BORG_PASSPHRASE or BORG_PASSCOMMAND somewhere around here, using export,
# if encryption is used.

View file

@ -68,8 +68,6 @@ can be filled to the specified quota.
If storage quotas are used, ensure that all deployed Borg releases
support storage quotas.
Refer to :ref:`internals_storage_quota` for more details on storage quotas.
**Specificities: Append-only repositories**
Running ``borg init`` via a ``borg serve --append-only`` server will **not**

View file

@ -14,7 +14,7 @@ What is the difference between a repo on an external hard drive vs. repo on a se
If Borg is running in client/server mode, the client uses SSH as a transport to
talk to the remote agent, which is another Borg process (Borg is installed on
the server, too) started automatically by the client. The Borg server is doing
storage-related low-level repo operations (get, put, commit, check, compact),
storage-related low-level repo operations (list, load and store objects),
while the Borg client does the high-level stuff: deduplication, encryption,
compression, dealing with archives, backups, restores, etc., which reduces the
amount of data that goes over the network.
@ -27,17 +27,7 @@ which is slower.
Can I back up from multiple servers into a single repository?
-------------------------------------------------------------
In order for the deduplication used by Borg to work, it
needs to keep a local cache containing checksums of all file
chunks already stored in the repository. This cache is stored in
``~/.cache/borg/``. If Borg detects that a repository has been
modified since the local cache was updated it will need to rebuild
the cache. This rebuild can be quite time consuming.
So, yes it's possible. But it will be most efficient if a single
repository is only modified from one place. Also keep in mind that
Borg will keep an exclusive lock on the repository while creating
or deleting archives, which may make *simultaneous* backups fail.
Yes, you can! Even simultaneously.
Can I back up to multiple, swapped backup targets?
--------------------------------------------------
@ -124,50 +114,31 @@ Are there other known limitations?
remove files which are in the destination, but not in the archive.
See :issue:`4598` for a workaround and more details.
.. _checkpoints_parts:
.. _interrupted_backup:
If a backup stops mid-way, does the already-backed-up data stay there?
----------------------------------------------------------------------
Yes, Borg supports resuming backups.
During a backup, a special checkpoint archive named ``<archive-name>.checkpoint``
is saved at every checkpoint interval (the default value for this is 30
minutes) containing all the data backed-up until that point.
This checkpoint archive is a valid archive, but it is only a partial backup
(not all files that you wanted to back up are contained in it and the last file
in it might be a partial file). Having it in the repo until a successful, full
backup is completed is useful because it references all the transmitted chunks up
to the checkpoint. This means that in case of an interruption, you only need to
retransfer the data since the last checkpoint.
Yes, the data transferred into the repo stays there - just avoid running
``borg compact`` before you have completed the backup, because that would remove
chunks that were already transferred to the repo, but not (yet) referenced
by an archive.
If a backup was interrupted, you normally do not need to do anything special,
just invoke ``borg create`` as you always do. If the repository is still locked,
you may need to run ``borg break-lock`` before the next backup. You may use the
same archive name as in previous attempt or a different one (e.g. if you always
include the current datetime), it does not matter.
just invoke ``borg create`` as you always do. You may use the same archive name
as in previous attempt or a different one (e.g. if you always include the
current datetime), it does not matter.
Borg always does full single-pass backups, so it will start again
from the beginning - but it will be much faster, because some of the data was
already stored into the repo (and is still referenced by the checkpoint
archive), so it does not need to get transmitted and stored again.
Once your backup has finished successfully, you can delete all
``<archive-name>.checkpoint`` archives. If you run ``borg prune``, it will
also care for deleting unneeded checkpoints.
Note: the checkpointing mechanism may create a partial (truncated) last file
in a checkpoint archive named ``<filename>.borg_part``. Such partial files
won't be contained in the final archive.
This is done so that checkpoints work cleanly and promptly while a big
file is being processed.
already stored into the repo, so it does not need to get transmitted and stored
again.
How can I back up huge file(s) over an unstable connection?
----------------------------------------------------------
Yes. For more details, see :ref:`checkpoints_parts`.
Yes. For more details, see :ref:`interrupted_backup`.
How can I restore huge file(s) over an unstable connection?
-----------------------------------------------------------
@ -220,23 +191,6 @@ Yes, if you want to detect accidental data damage (like bit rot), use the
If you want to be able to detect malicious tampering also, use an encrypted
repo. It will then be able to check using CRCs and HMACs.
Can I use Borg on SMR hard drives?
----------------------------------
SMR (shingled magnetic recording) hard drives are very different from
regular hard drives. Applications have to behave in certain ways or
performance will be heavily degraded.
Borg ships with default settings suitable for SMR drives,
and has been successfully tested on *Seagate Archive v2* drives
using the ext4 file system.
Some Linux kernel versions between 3.19 and 4.5 had various bugs
handling device-managed SMR drives, leading to IO errors, unresponsive
drives and unreliable operation in general.
For more details, refer to :issue:`2252`.
.. _faq-integrityerror:
I get an IntegrityError or similar - what now?
@ -355,7 +309,7 @@ Why is the time elapsed in the archive stats different from wall clock time?
----------------------------------------------------------------------------
Borg needs to write the time elapsed into the archive metadata before finalizing
the archive and committing the repo & cache.
the archive and saving the files cache.
This means when Borg is run with e.g. the ``time`` command, the duration shown
in the archive stats may be shorter than the full time the command runs for.
@ -391,8 +345,7 @@ will of course delete everything in the archive, not only some files.
:ref:`borg_recreate` command to rewrite all archives with a different
``--exclude`` pattern. See the examples in the manpage for more information.
Finally, run :ref:`borg_compact` with the ``--threshold 0`` option to delete the
data chunks from the repository.
Finally, run :ref:`borg_compact` to delete the data chunks from the repository.
Can I safely change the compression level or algorithm?
--------------------------------------------------------
@ -402,6 +355,7 @@ are calculated *before* compression. New compression settings
will only be applied to new chunks, not existing chunks. So it's safe
to change them.
Use ``borg rcompress`` to efficiently recompress a complete repository.
Security
########
@ -704,38 +658,6 @@ serialized way in a single script, you need to give them ``--lock-wait N`` (with
N being a bit more than the time the server needs to terminate broken down
connections and release the lock).
.. _disable_archive_chunks:
The borg cache eats way too much disk space, what can I do?
-----------------------------------------------------------
This may especially happen if borg needs to rebuild the local "chunks" index -
either because it was removed, or because it was not coherent with the
repository state any more (e.g. because another borg instance changed the
repository).
To optimize this rebuild process, borg caches per-archive information in the
``chunks.archive.d/`` directory. It won't help the first time it happens, but it
will make the subsequent rebuilds faster (because it needs to transfer less data
from the repository). While being faster, the cache needs quite some disk space,
which might be unwanted.
You can disable the cached archive chunk indexes by setting the environment
variable ``BORG_USE_CHUNKS_ARCHIVE`` to ``no``.
This has some pros and cons, though:
- much less disk space needs for ~/.cache/borg.
- chunk cache resyncs will be slower as it will have to transfer chunk usage
metadata for all archives from the repository (which might be slow if your
repo connection is slow) and it will also have to build the hashtables from
that data.
chunk cache resyncs happen e.g. if your repo was written to by another
machine (if you share same backup repo between multiple machines) or if
your local chunks cache was lost somehow.
The long term plan to improve this is called "borgception", see :issue:`474`.
Can I back up my root partition (/) with Borg?
----------------------------------------------
@ -779,7 +701,7 @@ This can make creation of the first archive slower, but saves time
and disk space on subsequent runs. Here is what Borg does when you run ``borg create``:
- Borg chunks the file (using the relatively expensive buzhash algorithm)
- It then computes the "id" of the chunk (hmac-sha256 (often slow, except
- It then computes the "id" of the chunk (hmac-sha256 (slow, except
if your CPU has sha256 acceleration) or blake2b (fast, in software))
- Then it checks whether this chunk is already in the repo (local hashtable lookup,
fast). If so, the processing of the chunk is completed here. Otherwise it needs to
@ -790,9 +712,8 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat
- Transmits to repo. If the repo is remote, this usually involves an SSH connection
(does its own encryption / authentication).
- Stores the chunk into a key/value store (the key is the chunk id, the value
is the data). While doing that, it computes CRC32 / XXH64 of the data (repo low-level
checksum, used by borg check --repository) and also updates the repo index
(another hashtable).
is the data). While doing that, it computes XXH64 of the data (repo low-level
checksum, used by borg check --repository).
Subsequent backups are usually very fast if most files are unchanged and only
a few are new or modified. The high performance on unchanged files primarily depends
@ -826,10 +747,9 @@ If you feel your Borg backup is too slow somehow, here is what you can do:
- Don't use any expensive compression. The default is lz4 and super fast.
Uncompressed is often slower than lz4.
- Just wait. You can also interrupt it and start it again as often as you like,
it will converge against a valid "completed" state (see ``--checkpoint-interval``,
maybe use the default, but in any case don't make it too short). It is starting
it will converge against a valid "completed" state. It is starting
from the beginning each time, but it is still faster then, as it does not store
data into the repo which it already has there from last checkpoint.
data into the repo which it already has there.
- If you don't need additional file attributes, you can disable them with ``--noflags``,
``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
when your backup consists of many small files.
@ -1021,6 +941,12 @@ To achieve this, run ``borg create`` within the mountpoint/snapshot directory:
cd /mnt/rootfs
borg create rootfs_backup .
Another way (without changing the directory) is to use the slashdot hack:
::
borg create rootfs_backup /mnt/rootfs/./
I am having troubles with some network/FUSE/special filesystem, why?
--------------------------------------------------------------------
@ -1100,16 +1026,6 @@ to make it behave correctly::
.. _workaround: https://unix.stackexchange.com/a/123236
Can I disable checking for free disk space?
-------------------------------------------
In some cases, the free disk space of the target volume is reported incorrectly.
This can happen for CIFS- or FUSE shares. If you are sure that your target volume
will always have enough disk space, you can use the following workaround to disable
checking for free disk space::
borg config -- additional_free_space -2T
How do I rename a repository?
-----------------------------
@ -1126,26 +1042,6 @@ It may be useful to set ``BORG_RELOCATED_REPO_ACCESS_IS_OK=yes`` to avoid the
prompts when renaming multiple repositories or in a non-interactive context
such as a script. See :doc:`deployment` for an example.
The repository quota size is reached, what can I do?
----------------------------------------------------
The simplest solution is to increase or disable the quota and resume the backup:
::
borg config /path/to/repo storage_quota 0
If you are bound to the quota, you have to free repository space. The first to
try is running :ref:`borg_compact` to free unused backup space (see also
:ref:`separate_compaction`):
::
borg compact /path/to/repo
If your repository is already compacted, run :ref:`borg_prune` or
:ref:`borg_delete` to delete archives that you do not need anymore, and then run
``borg compact`` again.
My backup disk is full, what can I do?
--------------------------------------
@ -1159,11 +1055,6 @@ conditions, but generally this should be avoided. If your backup disk is already
full when Borg starts a write command like `borg create`, it will abort
immediately and the repository will stay as-is.
If you run a backup that stops due to a disk running full, Borg will roll back,
delete the new segment file and thus freeing disk space automatically. There
may be a checkpoint archive left that has been saved before the disk got full.
You can keep it to speed up the next backup or delete it to get back more disk
space.
Miscellaneous
#############

(binary image files changed, not shown; Before: 324 KiB)

View file

@ -19,63 +19,51 @@ discussion about internals`_ and also on static code analysis.
Repository
----------
.. Some parts of this description were taken from the Repository docstring
Borg stores its data in a `Repository`, which is a key-value store and has
the following structure:
Borg stores its data in a `Repository`, which is a file system based
transactional key-value store. Thus the repository does not know about
the concept of archives or items.
config/
readme
simple text object telling that this is a Borg repository
id
the unique repository ID encoded as hexadecimal number text
version
the repository version encoded as decimal number text
manifest
some data about the repository, binary
last-key-checked
repository check progress (partial checks, full checks' checkpointing),
path of last object checked as text
space-reserve.N
purely random binary data to reserve space, e.g. for disk-full emergencies
Each repository has the following file structure:
There is a list of pointers to archive objects in this directory:
README
simple text file telling that this is a Borg repository
archives/
0000... .. ffff...
config
repository configuration
The actual data is stored into a nested directory structure, using the full
object ID as name. Each (encrypted and compressed) object is stored separately.
data/
directory where the actual data is stored
00/ .. ff/
00/ .. ff/
0000... .. ffff...
hints.%d
hints for repository compaction
keys/
repokey
When using encryption in repokey mode, the encrypted, passphrase protected
key is stored here as a base64 encoded text.
index.%d
repository index
locks/
used by the locking system to manage shared and exclusive locks.
lock.roster and lock.exclusive/*
used by the locking system to manage shared and exclusive locks
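
As a small illustration of the nested ``data/`` layout described above, here is
a sketch (a hypothetical helper, not borg's actual code) of how an object ID
could map to its storage path:

.. code-block:: python

    def object_path(obj_id: bytes) -> str:
        """Map a 32-byte object ID to its nested path below data/ (sketch)."""
        h = obj_id.hex()
        return f"data/{h[0:2]}/{h[2:4]}/{h}"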
Transactionality is achieved by using a log (aka journal) to record changes. The log is a series of numbered files
called segments_. Each segment is a series of log entries. The segment number together with the offset of each
entry relative to its segment start establishes an ordering of the log entries. This is the "definition" of
time for the purposes of the log.
.. _config-file:
Config file
~~~~~~~~~~~
Each repository has a ``config`` file which is a ``INI``-style file
and looks like this::
[repository]
version = 2
segments_per_dir = 1000
max_segment_size = 524288000
id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6
This is where the ``repository.id`` is stored. It is a unique
identifier for repositories. It will not change if you move the
repository around so you can make a local transfer then decide to move
the repository to another (even remote) location at a later time.
Keys
~~~~
Repository keys are byte-strings of fixed length (32 bytes), they
don't have a particular meaning (except for the Manifest_).
Normally the keys are computed like this::
Repository object IDs (which are used as keys into the key-value store) are
byte-strings of fixed length (256 bits = 32 bytes), computed like this::
key = id = id_hash(plaintext_data) # plain = not encrypted, not compressed, not obfuscated
@ -84,247 +72,68 @@ The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.
As the id / key is used for deduplication, id_hash must be a cryptographically
strong hash or MAC.
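
For illustration, a minimal sketch of such an id_hash, assuming an HMAC-SHA256
based mode (the key name ``id_key`` is hypothetical):

.. code-block:: python

    import hashlib
    import hmac

    def id_hash(id_key: bytes, plaintext_data: bytes) -> bytes:
        # 32-byte MAC of the plaintext; doubles as the deduplication key
        return hmac.new(id_key, plaintext_data, hashlib.sha256).digest()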
Segments
~~~~~~~~
Repository objects
~~~~~~~~~~~~~~~~~~
Objects referenced by a key are stored inline in files (`segments`) of approx.
500 MB size in numbered subdirectories of ``repo/data``. The number of segments
per directory is controlled by the value of ``segments_per_dir``. If you change
this value in a non-empty repository, you may also need to relocate the segment
files manually.
Each repository object is stored separately, under its ID, as data/xx/yy/xxyy...
A segment starts with a magic number (``BORG_SEG`` as an eight byte ASCII string),
followed by a number of log entries. Each log entry consists of (in this order):
A repo object has a structure like this:
* crc32 checksum (uint32):
- for PUT2: CRC32(size + tag + key + digest)
- for PUT: CRC32(size + tag + key + payload)
- for DELETE: CRC32(size + tag + key)
- for COMMIT: CRC32(size + tag)
* size (uint32) of the entry (including the whole header)
* tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
* key (256 bit) - only for PUT/PUT2/DELETE
* payload (size - 41 bytes) - only for PUT
* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
* payload (size - 41 - 8 bytes) - only for PUT2
* 32bit meta size
* 32bit data size
* 64bit xxh64(meta)
* 64bit xxh64(data)
* meta
* data
PUT2 is new since repository version 2. For new log entries PUT2 is used.
PUT is still supported to read version 1 repositories, but not generated any more.
If we talk about ``PUT`` in general, it shall usually mean PUT2 for repository
version 2+.
The size and xxh64 hashes can be used for server-side corruption checks without
needing to decrypt anything (which would require the borg key).
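
A sketch of splitting such an object into its parts, assuming little-endian
fields in the order listed above (the authoritative encoding is defined by
borg's source, not by this example):

.. code-block:: python

    import struct

    HDR = struct.Struct("<IIQQ")  # meta size, data size, xxh64(meta), xxh64(data)

    def split_repo_obj(buf: bytes):
        meta_size, data_size, meta_xxh64, data_xxh64 = HDR.unpack_from(buf, 0)
        meta = buf[HDR.size:HDR.size + meta_size]
        data = buf[HDR.size + meta_size:HDR.size + meta_size + data_size]
        return meta, data, meta_xxh64, data_xxh64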
Those files are strictly append-only and modified only once.
The overall size of repository objects varies from very small (a small source
file will be stored as a single repo object) to medium (big source files will
be cut into medium-sized chunks of some MB).
When an object is written to the repository a ``PUT`` entry is written
to the file containing the object id and payload. If an object is deleted
a ``DELETE`` entry is appended with the object id.
Metadata and data are separately encrypted and authenticated (depending on
the user's choices).
A ``COMMIT`` tag is written when a repository transaction is
committed. The segment number of the segment containing
a commit is the **transaction ID**.
See :ref:`data-encryption` for a graphic outlining the anatomy of the
encryption.
When a repository is opened any ``PUT`` or ``DELETE`` operations not
followed by a ``COMMIT`` tag are discarded since they are part of a
partial/uncommitted transaction.
Repo object metadata
~~~~~~~~~~~~~~~~~~~~
The size of individual segments is limited to 4 GiB, since the offset of entries
within segments is stored in a 32-bit unsigned integer in the repository index.
Metadata is a msgpacked (and encrypted/authenticated) dict with:
Objects / Payload structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~
- ctype (compression type 0..255)
- clevel (compression level 0..255)
- csize (overall compressed (and maybe obfuscated) data size)
- psize (only when obfuscated: payload size without the obfuscation trailer)
- size (uncompressed size of the data)
All data (the manifest, archives, archive item stream chunks and file data
chunks) is compressed, optionally obfuscated and encrypted. This produces some
additional metadata (size and compression information), which is separately
serialized and also encrypted.
See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
What you see at the bottom there is done twice: once for the data and once for the metadata.
An object (the payload part of a segment file log entry) must be like:
- length of encrypted metadata (16bit unsigned int)
- encrypted metadata (incl. encryption header), when decrypted:
- msgpacked dict with:
- ctype (compression type 0..255)
- clevel (compression level 0..255)
- csize (overall compressed (and maybe obfuscated) data size)
- psize (only when obfuscated: payload size without the obfuscation trailer)
- size (uncompressed size of the data)
- encrypted data (incl. encryption header), when decrypted:
- compressed data (with an optional all-zero-bytes obfuscation trailer)
This new, more complex repo v2 object format was implemented to be able to query the
metadata efficiently without having to read, transfer and decrypt the (usually much bigger)
data part.
The metadata is encrypted not to disclose potentially sensitive information that could be
used for e.g. fingerprinting attacks.
Having this separately encrypted metadata makes it more efficient to query
the metadata without having to read, transfer and decrypt the (usually much
bigger) data part.
The compression `ctype` and `clevel` are explained in :ref:`data-compression`.
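
For illustration, the metadata could look like this before it gets encrypted
(all values made up; msgpack is what borg uses for serialization):

.. code-block:: python

    import msgpack

    meta = {"ctype": 2, "clevel": 3, "csize": 1480, "size": 4096}
    packed = msgpack.packb(meta)  # then encrypted/authenticated separately from the data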
Index, hints and integrity
~~~~~~~~~~~~~~~~~~~~~~~~~~
The **repository index** is stored in ``index.<TRANSACTION_ID>`` and is used to
determine an object's location in the repository. It is a HashIndex_,
a hash table using open addressing.
It maps object keys_ to:
* segment number (unit32)
* offset of the object's entry within the segment (uint32)
* size of the payload, not including the entry header (uint32)
* flags (uint32)
The **hints file** is a msgpacked file named ``hints.<TRANSACTION_ID>``.
It contains:
* version
* list of segments
* compact
* shadow_index
* storage_quota_use
The **integrity file** is a msgpacked file named ``integrity.<TRANSACTION_ID>``.
It contains checksums of the index and hints files and is described in the
:ref:`Checksumming data structures <integrity_repo>` section below.
If the index or hints are corrupted, they are re-generated automatically.
If they are outdated, segments are replayed from the index state to the currently
committed transaction.
Compaction
~~~~~~~~~~
For a given key only the last entry regarding the key, which is called current (all other entries are called
superseded), is relevant: If there is no entry or the last entry is a DELETE then the key does not exist.
Otherwise the last PUT defines the value of the key.
``borg compact`` is used to free repository space. It will:
By superseding a PUT (with either another PUT or a DELETE) the log entry becomes obsolete. A segment containing
such obsolete entries is called sparse, while a segment containing no such entries is called compact.
- list all object IDs present in the repository
- read all archives and determine which object IDs are in use
- remove all unused objects from the repository
- inform / warn about anything remarkable it found:
Since writing a ``DELETE`` tag does not actually delete any data and
thus does not free disk space any log-based data store will need a
compaction strategy (somewhat analogous to a garbage collector).
- warn about IDs used, but not present (data loss!)
- inform about previously lost IDs that have reappeared
- compute statistics about:
Borg uses a simple forward compacting algorithm, which avoids modifying existing segments.
Compaction runs when a commit is issued with ``compact=True`` parameter, e.g.
by the ``borg compact`` command (unless the :ref:`append_only_mode` is active).
- compression and deduplication factors
- repository space usage and space freed
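
A rough sketch of this mark-and-sweep style garbage collection (``list_ids``,
``referenced_ids`` and ``delete`` are illustrative names, not borg's actual API):

.. code-block:: python

    def compact(repo, archives):
        present = set(repo.list_ids())             # all object IDs in the store
        used = set()
        for archive in archives:
            used.update(archive.referenced_ids())  # IDs reachable from this archive
        for obj_id in used - present:
            print(f"warning: {obj_id.hex()} is used, but not present (data loss!)")
        for obj_id in present - used:
            repo.delete(obj_id)                    # free the space of unused objects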
The compaction algorithm requires two inputs in addition to the segments themselves:
(i) Which segments are sparse, to avoid scanning all segments (impractical).
Further, Borg uses a conditional compaction strategy: Only those
segments that exceed a threshold sparsity are compacted.
To implement the threshold condition efficiently, the sparsity has
to be stored as well. Therefore, Borg stores a mapping ``(segment
id,) -> (number of sparse bytes,)``.
(ii) Each segment's reference count, which indicates how many live objects are in a segment.
This is not strictly required to perform the algorithm. Rather, it is used to validate
that a segment is unused before deleting it. If the algorithm is incorrect, or the reference
count was not accounted correctly, then an assertion failure occurs.
These two pieces of information are stored in the hints file (`hints.N`)
next to the index (`index.N`).
Compaction may take some time if a repository has been kept in append-only mode
or ``borg compact`` has not been used for a longer time, which both has caused
the number of sparse segments to grow.
Compaction processes sparse segments from oldest to newest; sparse segments
which don't contain enough deleted data to justify compaction are skipped. This
avoids doing e.g. 500 MB of writing current data to a new segment when only
a couple kB were deleted in a segment.
Segments that are compacted are read in entirety. Current entries are written to
a new segment, while superseded entries are omitted. After each segment an intermediary
commit is written to the new segment. Then, the old segment is deleted
(asserting that the reference count diminished to zero), freeing disk space.
A simplified example (excluding conditional compaction and with simpler
commit logic) showing the principal operation of compaction:
.. figure:: compaction.png
:figwidth: 100%
:width: 100%
(The actual algorithm is more complex to avoid various consistency issues, refer to
the ``borg.repository`` module for more comments and documentation on these issues.)
.. _internals_storage_quota:
Storage quotas
~~~~~~~~~~~~~~
Quotas are implemented at the Repository level. The active quota of a repository
is determined by the ``storage_quota`` `config` entry or a run-time override (via :ref:`borg_serve`).
The currently used quota is stored in the hints file. Operations (PUT and DELETE) during
a transaction modify the currently used quota:
- A PUT adds the size of the *log entry* to the quota,
i.e. the length of the data plus the 41 byte header.
- A DELETE subtracts the size of the deleted log entry from the quota,
which includes the header.
Thus, PUT and DELETE are symmetric and cancel each other out precisely.
The quota does not track on-disk size overheads (due to conditional compaction
or append-only mode). In normal operation the inclusion of the log entry headers
in the quota act as a faithful proxy for index and hints overheads.
By tracking effective content size, the client can *always* recover from a full quota
by deleting archives. This would not be possible if the quota tracked on-disk size,
since journaling DELETEs requires extra disk space before space is freed.
Tracking effective size on the other hand accounts DELETEs immediately as freeing quota.
.. rubric:: Enforcing the quota
The storage quota is meant as a robust mechanism for service providers, therefore
:ref:`borg_serve` has to enforce it without loopholes (e.g. modified clients).
The following sections refer to using quotas on remotely accessed repositories.
For local access, consider *client* and *serve* the same.
Accordingly, quotas cannot be enforced with local access,
since the quota can be changed in the repository config.
The quota is enforcible only if *all* :ref:`borg_serve` versions
accessible to clients support quotas (see next section). Further, quota is
per repository. Therefore, ensure clients can only access a defined set of repositories
with their quotas set, using ``--restrict-to-repository``.
If the client exceeds the storage quota the ``StorageQuotaExceeded`` exception is
raised. Normally a client could ignore such an exception and just send a ``commit()``
command anyway, circumventing the quota. However, when ``StorageQuotaExceeded`` is raised,
it is stored in the ``transaction_doomed`` attribute of the repository.
If the transaction is doomed, then commit will re-raise this exception, aborting the commit.
The transaction_doomed indicator is reset on a rollback (which erases the quota-exceeding
state).
.. rubric:: Compatibility with older servers and enabling quota after-the-fact
If no quota data is stored in the hints file, Borg assumes zero quota is used.
Thus, if a repository with an enabled quota is written to with an older ``borg serve``
version that does not understand quotas, then the quota usage will be erased.
The client version is irrelevant to the storage quota and has no part in it.
The form of error messages due to exceeding quota varies with client versions.
A similar situation arises when upgrading from a Borg release that did not have quotas.
Borg will start tracking quota use from the time of the upgrade, starting at zero.
If the quota shall be enforced accurately in these cases, either
- delete the ``index.N`` and ``hints.N`` files, forcing Borg to rebuild both,
re-acquiring quota data in the process, or
- edit the msgpacked ``hints.N`` file (not recommended and thus not
documented further).
The object graph
----------------
@ -344,10 +153,10 @@ More on how this helps security in :ref:`security_structural_auth`.
The manifest
~~~~~~~~~~~~
The manifest is the root of the object hierarchy. It references
all archives in a repository, and thus all data in it.
Since no object references it, it cannot be stored under its ID key.
Instead, the manifest has a fixed all-zero key.
Compared to borg 1.x:
- the manifest moved from object ID 0 to config/manifest
- the archives list has been moved from the manifest to archives/*
The manifest is rewritten each time an archive is created, deleted,
or modified. It looks like this:
@ -523,17 +332,18 @@ these may/may not be implemented and purely serve as examples.
Archives
~~~~~~~~
Each archive is an object referenced by the manifest. The archive object
itself does not store any of the data contained in the archive it describes.
Each archive is an object referenced by an entry below archives/.
The archive object itself does not store any of the data contained in the
archive it describes.
Instead, it contains a list of chunks which form a msgpacked stream of items_.
The archive object itself further contains some metadata:
* *version*
* *name*, which might differ from the name set in the manifest.
* *name*, which might differ from the name set in the archives/* object.
When :ref:`borg_check` rebuilds the manifest (e.g. if it was corrupted) and finds
more than one archive object with the same name, it adds a counter to the name
in the manifest, but leaves the *name* field of the archives as it was.
in archives/*, but leaves the *name* fields of the archives as they were.
* *item_ptrs*, a list of "pointer chunk" IDs.
Each "pointer chunk" contains a list of chunk IDs of item metadata.
* *command_line*, the command line which was used to create the archive
@ -676,7 +486,7 @@ In memory, the files cache is a key -> value mapping (a Python *dict*) and conta
- file size
- file ctime_ns (or mtime_ns)
- age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
- list of chunk ids representing the file's contents
- list of chunk (id, size) tuples representing the file's contents
To determine whether a file has not changed, cached values are looked up via
the key in the mapping and compared to the current file attribute values.
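
For illustration, an entry and the unchanged-file test could look roughly like
this (``FileCacheEntry`` and the comparison are simplified sketches):

.. code-block:: python

    import os
    from collections import namedtuple

    FileCacheEntry = namedtuple("FileCacheEntry", "age size ctime_ns chunks")

    def file_unchanged(entry, st: os.stat_result) -> bool:
        # a cache miss or any attribute mismatch means the file is re-read/re-chunked
        return (entry is not None
                and entry.size == st.st_size
                and entry.ctime_ns == st.st_ctime_ns)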
@ -717,9 +527,9 @@ The on-disk format of the files cache is a stream of msgpacked tuples (key, valu
Loading the files cache involves reading the file, one msgpack object at a time,
unpacking it, and msgpacking the value (in an effort to save memory).
The **chunks cache** is stored in ``cache/chunks`` and is used to determine
whether we already have a specific chunk, to count references to it and also
for statistics.
The **chunks cache** is not persisted to disk, but dynamically built in memory
by querying the existing object IDs from the repository.
It is used to determine whether we already have a specific chunk.
The chunks cache is a key -> value mapping and contains:
@ -728,14 +538,10 @@ The chunks cache is a key -> value mapping and contains:
- chunk id_hash
* value:
- reference count
- size
- reference count (always MAX_VALUE as we do not refcount anymore)
- size (0 for previously existing objects; we can't query their plaintext size)
The chunks cache is a HashIndex_. Due to some restrictions of HashIndex,
the reference count of each given chunk is limited to a constant, MAX_VALUE
(introduced below in HashIndex_), approximately 2**32.
If a reference count hits MAX_VALUE, decrementing it yields MAX_VALUE again,
i.e. the reference count is pinned to MAX_VALUE.
The chunks cache is a HashIndex_.
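
A sketch of how this in-memory chunks cache can be built (``list_ids`` is an
illustrative method name and MAX_VALUE stands for the pinned refcount constant):

.. code-block:: python

    MAX_VALUE = 2**32 - 1025  # illustrative value for the pinned reference count

    def build_chunks_cache(repo):
        # value: (refcount, size) - refcount pinned, size unknown for existing objects
        return {obj_id: (MAX_VALUE, 0) for obj_id in repo.list_ids()}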
.. _cache-memory-usage:
@ -747,14 +553,12 @@ Here is the estimated memory usage of Borg - it's complicated::
chunk_size ~= 2 ^ HASH_MASK_BITS (for buzhash chunker, BLOCK_SIZE for fixed chunker)
chunk_count ~= total_file_size / chunk_size
repo_index_usage = chunk_count * 48
chunks_cache_usage = chunk_count * 40
files_cache_usage = total_file_count * 240 + chunk_count * 80
files_cache_usage = total_file_count * 240 + chunk_count * 165
mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
= chunk_count * 164 + total_file_count * 240
mem_usage ~= chunks_cache_usage + files_cache_usage
= chunk_count * 205 + total_file_count * 240
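
To make this concrete, a rough worked example, assuming the default buzhash
chunker (~2 MiB chunks), 1 TiB of data and one million files:

.. code-block:: python

    chunk_count  = 2**40 // 2**21                        # 524288 chunks
    chunks_cache = chunk_count * 40                      # ~20.0 MiB
    files_cache  = 1_000_000 * 240 + chunk_count * 165   # ~311.4 MiB
    mem_usage    = chunk_count * 205 + 1_000_000 * 240   # ~331.4 MiB total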
Due to the hashtables, the best/usual/worst cases for memory allocation can
be estimated like that::
@ -772,11 +576,9 @@ It is also assuming that typical chunk size is 2^HASH_MASK_BITS (if you have
a lot of files smaller than this statistical medium chunk size, you will have
more chunks than estimated above, because 1 file is at least 1 chunk).
If a remote repository is used the repo index will be allocated on the remote side.
The chunks cache, files cache and the repo index are all implemented as hash
tables. A hash table must have a significant amount of unused entries to be
fast - the so-called load factor gives the used/unused elements ratio.
The chunks cache and files cache are both implemented as hash tables.
A hash table must have a significant amount of unused entries to be fast -
the so-called load factor gives the used/unused elements ratio.
When a hash table gets full (load factor getting too high), it needs to be
grown (allocate new, bigger hash table, copy all elements over to it, free old
@ -802,7 +604,7 @@ b) with ``create --chunker-params buzhash,19,23,21,4095`` (default):
HashIndex
---------
The chunks cache and the repository index are stored as hash tables, with
The chunks cache is implemented as a hash table, with
only one slot per bucket, spreading hash collisions to the following
buckets. As a consequence the hash is just a start position for a linear
search. If a key is looked up that is not in the table, then the hash table
@ -905,7 +707,7 @@ Both modes
~~~~~~~~~~
Encryption keys (and other secrets) are kept either in a key file on the client
('keyfile' mode) or in the repository config on the server ('repokey' mode).
('keyfile' mode) or in the repository under keys/repokey ('repokey' mode).
In both cases, the secrets are randomly generated and then encrypted by a
key derived from your passphrase (this happens on the client before the key
is stored into the keyfile or as repokey).
@ -923,8 +725,7 @@ Key files
When initializing a repository with one of the "keyfile" encryption modes,
Borg creates an associated key file in ``$HOME/.config/borg/keys``.
The same key is also used in the "repokey" modes, which store it in the repository
in the configuration file.
The same key is also used in the "repokey" modes, which store it in the repository.
The internal data structure is as follows:
@ -1016,11 +817,10 @@ methods in one repo does not influence deduplication.
See ``borg create --help`` about how to specify the compression level and its default.
Lock files
----------
Lock files (fslocking)
----------------------
Borg uses locks to get (exclusive or shared) access to the cache and
the repository.
Borg uses filesystem locks to get (exclusive or shared) access to the cache.
The locking system is based on renaming a temporary directory
to `lock.exclusive` (for
@ -1037,24 +837,46 @@ to `lock.exclusive`, it has the lock for it. If renaming fails
denotes a thread on the host which is still alive), lock acquisition fails.
The cache lock is usually in `~/.cache/borg/REPOID/lock.*`.
The repository lock is in `repository/lock.*`.
Locks (storelocking)
--------------------
To implement locking based on ``borgstore``, borg stores objects below locks/.
The objects contain:
- a timestamp of when the lock was created (or last refreshed)
- host / process / thread information about lock owner
- lock type: exclusive or shared
Using that information, borg implements:
- lock auto-expiry: if a lock is old and has not been refreshed in time,
it will be automatically ignored and deleted. The primary purpose of this
is to get rid of stale locks left by borg processes on other machines.
- lock auto-removal if the owner process is dead. The primary purpose of this
is to quickly get rid of stale locks left by borg processes on the same machine.
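
A sketch of how auto-expiry and dead-owner detection can work (the lock object
fields and helpers below follow the list above, but are illustrative only):

.. code-block:: python

    import json
    import os
    import time

    LOCK_TTL = 30 * 60  # assumed refresh deadline in seconds

    def pid_alive(pid: int) -> bool:
        try:
            os.kill(pid, 0)  # signal 0: existence check only
            return True
        except OSError:
            return False

    def lock_is_stale(lock_blob: bytes, my_hostname: str) -> bool:
        info = json.loads(lock_blob)  # {"time": ..., "host": ..., "pid": ..., ...}
        if time.time() - info["time"] > LOCK_TTL:
            return True  # not refreshed in time -> ignore and delete it
        return info["host"] == my_hostname and not pid_alive(info["pid"])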
Breaking the locks
------------------
In case you run into trouble with the locks, you can use the ``borg break-lock``
command, but only after you have made sure that no Borg process is
running on any machine that accesses this resource. Be very careful: the cache
or repository might get damaged if multiple processes use it at the same time.
If there is an issue just with the repository lock, it will usually resolve
automatically (see above); just retry later.
Checksumming data structures
----------------------------
As detailed in the previous sections, Borg generates and stores various files
containing important meta data, such as the repository index, repository hints,
chunks caches and files cache.
containing important meta data, such as the files cache.
Data corruption in these files can damage the archive data in a repository,
e.g. due to wrong reference counts in the chunks cache. Only some parts of Borg
were designed to handle corrupted data structures, so a corrupted files cache
may cause crashes or write incorrect archives.
Data corruption in the files cache could create incorrect archives, e.g. due
to wrong object IDs or sizes stored in it.
Therefore, Borg calculates checksums when writing these files and tests checksums
when reading them. Checksums are generally 64-bit XXH64 hashes.
@ -1086,11 +908,11 @@ xxHash was expressly designed for data blocks of these sizes.
Lower layer — file_integrity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To accommodate the different transaction models used for the cache and repository,
there is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile)
wrapping a file-like object, performing streaming calculation and comparison of checksums.
Checksum errors are signalled by raising an exception (borg.crypto.file_integrity.FileIntegrityError)
at the earliest possible moment.
There is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile)
wrapping a file-like object, performing streaming calculation and comparison
of checksums.
Checksum errors are signalled by raising an exception at the earliest possible
moment (borg.crypto.file_integrity.FileIntegrityError).
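
A minimal sketch of such a wrapper, assuming the third-party ``xxhash`` module
for XXH64 (the real class in borg.crypto.file_integrity differs in detail):

.. code-block:: python

    import xxhash

    class IntegrityCheckedReader:
        def __init__(self, fileobj, expected_hexdigest: str):
            self.f = fileobj
            self.h = xxhash.xxh64()
            self.expected = expected_hexdigest

        def read(self, n: int = -1) -> bytes:
            data = self.f.read(n)
            self.h.update(data)  # streaming checksum calculation
            if not data and self.h.hexdigest() != self.expected:
                raise IOError("integrity check failed")  # borg: FileIntegrityError
            return data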
.. rubric:: Calculating checksums
@ -1134,19 +956,13 @@ The *digests* key contains a mapping of part names to their digests.
Integrity data is generally stored by the upper layers, introduced below. An exception
is the DetachedIntegrityCheckedFile, which automatically writes and reads it from
a ".integrity" file next to the data file.
It is used for archive chunks indexes in chunks.archive.d.
Upper layer
~~~~~~~~~~~
Storage of integrity data depends on the component using it, since they have
different transaction mechanisms, and integrity data needs to be
transacted with the data it is supposed to protect.
.. rubric:: Main cache files: chunks and files cache
The integrity data of the ``chunks`` and ``files`` caches is stored in the
cache ``config``, since all three are transacted together.
The integrity data of the ``files`` cache is stored in the cache ``config``.
The ``[integrity]`` section is used:
@ -1162,7 +978,7 @@ The ``[integrity]`` section is used:
[integrity]
manifest = 10e...21c
chunks = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
The manifest ID is duplicated in the integrity section due to the way all Borg
versions handle the config file. Instead of creating a "new" config file from
@ -1182,52 +998,6 @@ easy to tell whether the checksums concern the current state of the cache.
Integrity errors are fatal in these files, terminating the program,
and are not automatically corrected at this time.
.. rubric:: chunks.archive.d
Indices in chunks.archive.d are not transacted and use DetachedIntegrityCheckedFile,
which writes the integrity data to a separate ".integrity" file.
Integrity errors result in deleting the affected index and rebuilding it.
This logs a warning and increases the exit code to WARNING (1).
.. _integrity_repo:
.. rubric:: Repository index and hints
The repository associates index and hints files with a transaction by including the
transaction ID in the file names. Integrity data is stored in a third file
("integrity.<TRANSACTION_ID>"). Like the hints file, it is msgpacked:
.. code-block:: python
{
'version': 2,
'hints': '{"algorithm": "XXH64", "digests": {"final": "411208db2aa13f1a"}}',
'index': '{"algorithm": "XXH64", "digests": {"HashHeader": "846b7315f91b8e48", "final": "cb3e26cadc173e40"}}'
}
The *version* key started at 2, the same version used for the hints. Since Borg has
many versioned file formats, this keeps the number of different versions in use
a bit lower.
The other keys map an auxiliary file, like *index* or *hints* to their integrity data.
Note that the JSON is stored as-is, and not as part of the msgpack structure.
Integrity errors result in deleting the affected file(s) (index/hints) and rebuilding the index,
which is the same action taken when corruption is noticed in other ways (e.g. HashIndex can
detect most corrupted headers, but not data corruption). A warning is logged as well.
The exit code is not influenced, since remote repositories cannot perform that action.
Raising the exit code would be possible for local repositories, but is not implemented.
Unlike the cache design this mechanism can have false positives whenever an older version
*rewrites* the auxiliary files for a transaction created by a newer version,
since that might result in a different index (due to hash-table resizing) or hints file
(hash ordering, or the older version 1 format), while not invalidating the integrity file.
For example, using 1.1 on a repository, noticing corruption or similar issues and then running
``borg-1.0 check --repair``, which rewrites the index and hints, results in this situation.
Borg 1.1 would erroneously report checksum errors in the hints and/or index files and trigger
an automatic rebuild of these files.
HardLinkManager and the hlid concept
------------------------------------

(binary image files changed, not shown; Before: 380 KiB, After: 98 KiB)

View file

@ -31,14 +31,14 @@ deleted between attacks).
Under these circumstances Borg guarantees that the attacker cannot
1. modify the data of any archive without the client detecting the change
2. rename, remove or add an archive without the client detecting the change
2. rename or add an archive without the client detecting the change
3. recover plain-text data
4. recover definite (heuristics based on access patterns are possible)
structural information such as the object graph (which archives
refer to what chunks)
The attacker can always impose a denial of service by definition (they could
forbid connections to the repository, or delete it entirely).
forbid connections to the repository, or delete it partly or entirely).
.. _security_structural_auth:
@ -47,12 +47,12 @@ Structural Authentication
-------------------------
Borg is fundamentally based on an object graph structure (see :ref:`internals`),
where the root object is called the manifest.
where the root objects are the archives.
Borg follows the `Horton principle`_, which states that
not only the message must be authenticated, but also its meaning (often
expressed through context), because every object used is referenced by a
parent object through its object ID up to the manifest. The object ID in
parent object through its object ID up to the archive list entry. The object ID in
Borg is a MAC of the object's plaintext, therefore this ensures that
an attacker cannot change the context of an object without forging the MAC.
@ -64,8 +64,8 @@ represent packed file metadata. On their own, it's not clear that these objects
would represent what they do, but by the archive item referring to them
in a particular part of its own data structure assigns this meaning.
This results in a directed acyclic graph of authentication from the manifest
to the data chunks of individual files.
This results in a directed acyclic graph of authentication from the archive
list entry to the data chunks of individual files.
The above was all there was in borg 1.x and was the reason why it needed the
tertiary authentication mechanism (TAM) for the manifest and archives.
@ -80,11 +80,23 @@ the object ID (via giving the ID as AAD), there is no way an attacker (without
access to the borg key) could change the type of the object or move content
to a different object ID.
This effectively 'anchors' the manifest (and also other metadata, like archives)
to the key, which is controlled by the client, thereby anchoring the entire DAG,
making it impossible for an attacker to add, remove or modify any part of the
This effectively 'anchors' each archive to the key, which is controlled by the
client, thereby anchoring the DAG starting from the archives list entry,
making it impossible for an attacker to add or modify any part of the
DAG without Borg being able to detect the tampering.
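
A sketch of the "object ID as AAD" idea using an AEAD cipher (AES-GCM is used
here purely for illustration; borg 2 has its own session-key based AEAD
schemes, and ``repo.store`` is an illustrative call):

.. code-block:: python

    import os

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def store_object(repo, key: bytes, obj_id: bytes, plaintext: bytes) -> None:
        nonce = os.urandom(12)
        # binding obj_id as AAD: the blob cannot be moved to another ID
        # without decryption/authentication failing on read
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, obj_id)
        repo.store(obj_id, nonce + ciphertext)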
Please note that removing an archive by removing an entry from archives/*
is possible and is done by ``borg delete`` and ``borg prune`` within their
normal operation. An attacker could also remove some entries there, but, due to
encryption, would not know what exactly they are removing. An attacker with
repository access could also remove other parts of the repository or the whole
repository, so there is not much point in protecting against archive removal.
The borg 1.x way of having the archives list within the manifest chunk was
problematic as it required a read-modify-write operation on the manifest,
requiring a lock on the repository. We want to try less locking and more
parallelism in the future.
Passphrase notes
----------------

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-BENCHMARK-CPU" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-BENCHMARK-CPU" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-benchmark-cpu \- Benchmark CPU bound operations.
.SH SYNOPSIS

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-BENCHMARK-CRUD" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-BENCHMARK-CRUD" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-benchmark-crud \- Benchmark Create, Read, Update, Delete for archives.
.SH SYNOPSIS

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-BENCHMARK" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-BENCHMARK" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-benchmark \- benchmark command
.SH SYNOPSIS

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-BREAK-LOCK" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-BREAK-LOCK" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-break-lock \- Break the repository lock (e.g. in case it was left by a dead borg).
.SH SYNOPSIS

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-CHECK" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-CHECK" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-check \- Check repository consistency
.SH SYNOPSIS
@ -40,8 +40,8 @@ It consists of two major steps:
.INDENT 0.0
.IP 1. 3
Checking the consistency of the repository itself. This includes checking
the segment magic headers, and both the metadata and data of all objects in
the segments. The read data is checked by size and CRC. Bit rot and other
the file magic headers, and both the metadata and data of all objects in
the repository. The read data is checked by size and hash. Bit rot and other
types of accidental damage can be detected this way. Running the repository
check can be split into multiple partial checks using \fB\-\-max\-duration\fP\&.
When checking a remote repository, please note that the checks run on the
@ -77,13 +77,12 @@ archive checks, nor enable repair mode. Consequently, if you want to use
.sp
\fBWarning:\fP Please note that partial repository checks (i.e. running it with
\fB\-\-max\-duration\fP) can only perform non\-cryptographic checksum checks on the
segment files. A full repository check (i.e. without \fB\-\-max\-duration\fP) can
also do a repository index check. Enabling partial repository checks excepts
archive checks for the same reason. Therefore partial checks may be useful with
very large repositories only where a full check would take too long.
repository files. Enabling partial repository checks excludes archive checks
for the same reason. Therefore, partial checks may be useful only for very
large repositories, where a full check would take too long.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as
opposed to checking the CRC32 of the segment) of data, which means reading the
opposed to checking just the xxh64) of data, which means reading the
data from the repository, decrypting and decompressing it. It is a complete
cryptographic verification and hence very time consuming, but will detect any
accidental and malicious corruption. Tamper\-resistance is only guaranteed for
@ -122,17 +121,15 @@ by definition, a potentially lossy task.
In practice, repair mode hooks into both the repository and archive checks:
.INDENT 0.0
.IP 1. 3
When checking the repository\(aqs consistency, repair mode will try to recover
as many objects from segments with integrity errors as possible, and ensure
that the index is consistent with the data stored in the segments.
When checking the repository\(aqs consistency, repair mode removes corrupted
objects from the repository after a second attempt to read them correctly has failed.
.IP 2. 3
When checking the consistency and correctness of archives, repair mode might
remove whole archives from the manifest if their archive metadata chunk is
corrupt or lost. On a chunk level (i.e. the contents of files), repair mode
will replace corrupt or lost chunks with a same\-size replacement chunk of
zeroes. If a previously zeroed chunk reappears, repair mode will restore
this lost chunk using the new chunk. Lastly, repair mode will also delete
orphaned chunks (e.g. caused by read errors while creating the archive).
this lost chunk using the new chunk.
.UNINDENT
.sp
Most steps taken by repair mode have a one\-time effect on the repository, like
@ -152,6 +149,12 @@ replace the all\-zero replacement chunk by the reappeared chunk. If all lost
chunks of a \(dqzero\-patched\(dq file reappear, this effectively \(dqheals\(dq the file.
Consequently, if lost chunks were repaired earlier, it is advised to run
\fB\-\-repair\fP a second time after creating some new backups.
.sp
If \fB\-\-repair \-\-undelete\-archives\fP is given, Borg will scan the repository
for archive metadata and if it finds some where no corresponding archives
directory entry exists, it will create the entries. This is basically undoing
\fBborg delete archive\fP or \fBborg prune ...\fP commands, and it is only possible before
\fBborg compact\fP would remove the archives\(aq data completely.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
@ -170,6 +173,9 @@ perform cryptographic archive data integrity verification (conflicts with \fB\-\
.B \-\-repair
attempt to repair any inconsistencies found
.TP
.B \-\-undelete\-archives
attempt to undelete archives (use with \-\-repair)
.TP
.BI \-\-max\-duration \ SECONDS
do only a partial repo check for max. SECONDS seconds (Default: unlimited)
.UNINDENT

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-COMMON" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-COMMON" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-common \- Common options of Borg commands
.SH SYNOPSIS
@ -64,10 +64,7 @@ format using IEC units (1KiB = 1024B)
Output one JSON object per log line instead of formatted text.
.TP
.BI \-\-lock\-wait \ SECONDS
wait at most SECONDS for acquiring a repository/cache lock (default: 1).
.TP
.B \-\-bypass\-lock
Bypass locking mechanism
wait at most SECONDS for acquiring a repository/cache lock (default: 10).
.TP
.B \-\-show\-version
show/log the borg version

View file

@ -27,40 +27,25 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-COMPACT" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-COMPACT" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-compact \- compact segment files in the repository
borg-compact \- Collect garbage in repository
.SH SYNOPSIS
.sp
borg [common options] compact [options]
.SH DESCRIPTION
.sp
This command frees repository space by compacting segments.
Free repository space by deleting unused chunks.
.sp
Use this regularly to avoid running out of space \- you do not need to use this
after each borg command though. It is especially useful after deleting archives,
because only compaction will really free repository space.
borg compact analyzes all existing archives to find out which chunks are
actually used. There might be unused chunks resulting from borg delete or prune,
which can be removed to free space in the repository.
.sp
borg compact does not need a key, so it is possible to invoke it from the
client or also from the server.
.sp
Depending on the amount of segments that need compaction, it may take a while,
so consider using the \fB\-\-progress\fP option.
.sp
A segment is compacted if the amount of saved space is above the percentage value
given by the \fB\-\-threshold\fP option. If omitted, a threshold of 10% is used.
When using \fB\-\-verbose\fP, borg will output an estimate of the freed space.
.sp
See \fIseparate_compaction\fP in Additional Notes for more details.
Unlike borg 1.x, borg2\(aqs compact needs the borg key if the repo is
encrypted.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS optional arguments
.INDENT 0.0
.TP
.BI \-\-threshold \ PERCENT
set minimum threshold for saved space in PERCENT (Default: 10)
.UNINDENT
.SH EXAMPLES
.INDENT 0.0
.INDENT 3.5

View file

@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\"