mirror of
https://github.com/borgbackup/borg.git
synced 2024-12-11 02:27:57 +00:00
Merge pull request #8332 from ThomasWaldmann/use-borgstore
use borgstore and other big changes
This commit is contained in:
commit
ea08e49210
166 changed files with 6744 additions and 8421 deletions
2
.github/workflows/black.yaml
vendored
2
.github/workflows/black.yaml
vendored
|
@ -12,4 +12,4 @@ jobs:
|
|||
- uses: actions/checkout@v4
|
||||
- uses: psf/black@stable
|
||||
with:
|
||||
version: "~= 23.0"
|
||||
version: "~= 24.0"
|
||||
|
|
3
.github/workflows/ci.yml
vendored
3
.github/workflows/ci.yml
vendored
|
@ -107,8 +107,7 @@ jobs:
|
|||
pip install -r requirements.d/development.txt
|
||||
- name: Install borgbackup
|
||||
run: |
|
||||
# pip install -e .
|
||||
python setup.py -v develop
|
||||
pip install -e .
|
||||
- name: run tox env
|
||||
env:
|
||||
XDISTN: "4"
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
repos:
|
||||
- repo: https://github.com/psf/black
|
||||
rev: 23.1.0
|
||||
rev: 24.8.0
|
||||
hooks:
|
||||
- id: black
|
||||
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||
|
|
|
@ -69,7 +69,7 @@ Main features
|
|||
**Speed**
|
||||
* performance-critical code (chunking, compression, encryption) is
|
||||
implemented in C/Cython
|
||||
* local caching of files/chunks index data
|
||||
* local caching
|
||||
* quick detection of unmodified files
|
||||
|
||||
**Data encryption**
|
||||
|
|
|
@ -12,8 +12,8 @@ This section provides information about security and corruption issues.
|
|||
Upgrade Notes
|
||||
=============
|
||||
|
||||
borg 1.2.x to borg 2.0
|
||||
----------------------
|
||||
borg 1.2.x/1.4.x to borg 2.0
|
||||
----------------------------
|
||||
|
||||
Compatibility notes:
|
||||
|
||||
|
@ -21,11 +21,11 @@ Compatibility notes:
|
|||
|
||||
We tried to put all the necessary "breaking" changes into this release, so we
|
||||
hopefully do not need another breaking release in the near future. The changes
|
||||
were necessary for improved security, improved speed, unblocking future
|
||||
improvements, getting rid of legacy crap / design limitations, having less and
|
||||
simpler code to maintain.
|
||||
were necessary for improved security, improved speed and parallelism,
|
||||
unblocking future improvements, getting rid of legacy crap and design
|
||||
limitations, having less and simpler code to maintain.
|
||||
|
||||
You can use "borg transfer" to transfer archives from borg 1.1/1.2 repos to
|
||||
You can use "borg transfer" to transfer archives from borg 1.2/1.4 repos to
|
||||
a new borg 2.0 repo, but it will need some time and space.
|
||||
|
||||
Before using "borg transfer", you must have upgraded to borg >= 1.2.6 (or
|
||||
|
@ -84,6 +84,7 @@ Compatibility notes:
|
|||
- removed --nobsdflags (use --noflags)
|
||||
- removed --noatime (default now, see also --atime)
|
||||
- removed --save-space option (does not change behaviour)
|
||||
- removed --bypass-lock option
|
||||
- using --list together with --progress is now disallowed (except with --log-json), #7219
|
||||
- the --glob-archives option was renamed to --match-archives (the short option
|
||||
name -a is unchanged) and extended to support different pattern styles:
|
||||
|
@ -114,12 +115,61 @@ Compatibility notes:
|
|||
fail now that somehow "worked" before (but maybe didn't work as intended due to
|
||||
the contradicting options).
|
||||
|
||||
|
||||
.. _changelog:
|
||||
|
||||
Change Log 2.x
|
||||
==============
|
||||
|
||||
Version 2.0.0b10 (2024-09-09)
|
||||
-----------------------------
|
||||
|
||||
TL;DR: this is a huge change and the first very fundamental change in how borg
|
||||
works since ever:
|
||||
|
||||
- you will need to create new repos.
|
||||
- likely more exciting than previous betas, definitely not for production.
|
||||
|
||||
New features:
|
||||
|
||||
- borgstore based repository, file:, ssh: and sftp: for now, more possible.
|
||||
- repository stores objects separately now, not using segment files.
|
||||
this has more fs overhead, but needs much less I/O because no segment
|
||||
files compaction is required anymore. also, no repository index is
|
||||
needed anymore because we can directly find the objects by their ID.
|
||||
- locking: new borgstore based repository locking with automatic stale
|
||||
lock removal (if lock does not get refreshed, if lock owner process is dead).
|
||||
- simultaneous repository access for many borg commands except check/compact.
|
||||
the cache lock for adhocwithfiles is still exclusive though, so use
|
||||
BORG_CACHE_IMPL=adhoc if you want to try that out using only 1 machine
|
||||
and 1 user (that implementation doesn't use a cache lock). When using
|
||||
multiple client machines or users, it also works with the default cache.
|
||||
- delete/prune: much quicker now and can be undone.
|
||||
- check --repair --undelete-archives: bring archives back from the dead.
|
||||
- rspace: manage reserved space in repository (avoid dead-end situation if
|
||||
repository fs runs full).
|
||||
|
||||
Bugs/issues fixed:
|
||||
|
||||
- a lot! all linked from PR #8332.
|
||||
|
||||
Other changes:
|
||||
|
||||
- repository: remove transactions, solved differently and much simpler now
|
||||
(convergence and write order primarily).
|
||||
- repository: replaced precise reference counting with "object exists in repo?"
|
||||
and "garbage collection of unused objects".
|
||||
- cache: remove transactions, remove chunks cache.
|
||||
removed LocalCache, BORG_CACHE_IMPL=local, solving all related issues.
|
||||
as in beta 9, adhowwithfiles is the default implementation.
|
||||
- compact: needs the borg key now (run it clientside), -v gives nice stats.
|
||||
- transfer: archive transfers from borg 1.x need the --from-borg1 option
|
||||
- check: reimplemented / bigger changes.
|
||||
- code: got rid of a metric ton of not needed complexity.
|
||||
when borg does not need to read borg 1.x repos/archives anymore, after
|
||||
users have transferred their archives, even much more can be removed.
|
||||
- docs: updated / removed outdated stuff
|
||||
|
||||
|
||||
Version 2.0.0b9 (2024-07-20)
|
||||
----------------------------
|
||||
|
||||
|
|
|
@ -3469,7 +3469,7 @@ Other changes:
|
|||
- archiver tests: add check_cache tool - lints refcounts
|
||||
|
||||
- fixed cache sync performance regression from 1.1.0b1 onwards, #1940
|
||||
- syncing the cache without chunks.archive.d (see :ref:`disable_archive_chunks`)
|
||||
- syncing the cache without chunks.archive.d
|
||||
now avoids any merges and is thus faster, #1940
|
||||
- borg check --verify-data: faster due to linear on-disk-order scan
|
||||
- borg debug-xxx commands removed, we use "debug xxx" subcommands now, #1627
|
||||
|
|
|
@ -105,7 +105,7 @@ modify it to suit your needs (e.g. more backup sets, dumping databases etc.).
|
|||
#
|
||||
|
||||
# Options for borg create
|
||||
BORG_OPTS="--stats --one-file-system --compression lz4 --checkpoint-interval 86400"
|
||||
BORG_OPTS="--stats --one-file-system --compression lz4"
|
||||
|
||||
# Set BORG_PASSPHRASE or BORG_PASSCOMMAND somewhere around here, using export,
|
||||
# if encryption is used.
|
||||
|
|
|
@ -68,8 +68,6 @@ can be filled to the specified quota.
|
|||
If storage quotas are used, ensure that all deployed Borg releases
|
||||
support storage quotas.
|
||||
|
||||
Refer to :ref:`internals_storage_quota` for more details on storage quotas.
|
||||
|
||||
**Specificities: Append-only repositories**
|
||||
|
||||
Running ``borg init`` via a ``borg serve --append-only`` server will **not**
|
||||
|
|
163
docs/faq.rst
163
docs/faq.rst
|
@ -14,7 +14,7 @@ What is the difference between a repo on an external hard drive vs. repo on a se
|
|||
If Borg is running in client/server mode, the client uses SSH as a transport to
|
||||
talk to the remote agent, which is another Borg process (Borg is installed on
|
||||
the server, too) started automatically by the client. The Borg server is doing
|
||||
storage-related low-level repo operations (get, put, commit, check, compact),
|
||||
storage-related low-level repo operations (list, load and store objects),
|
||||
while the Borg client does the high-level stuff: deduplication, encryption,
|
||||
compression, dealing with archives, backups, restores, etc., which reduces the
|
||||
amount of data that goes over the network.
|
||||
|
@ -27,17 +27,7 @@ which is slower.
|
|||
Can I back up from multiple servers into a single repository?
|
||||
-------------------------------------------------------------
|
||||
|
||||
In order for the deduplication used by Borg to work, it
|
||||
needs to keep a local cache containing checksums of all file
|
||||
chunks already stored in the repository. This cache is stored in
|
||||
``~/.cache/borg/``. If Borg detects that a repository has been
|
||||
modified since the local cache was updated it will need to rebuild
|
||||
the cache. This rebuild can be quite time consuming.
|
||||
|
||||
So, yes it's possible. But it will be most efficient if a single
|
||||
repository is only modified from one place. Also keep in mind that
|
||||
Borg will keep an exclusive lock on the repository while creating
|
||||
or deleting archives, which may make *simultaneous* backups fail.
|
||||
Yes, you can! Even simultaneously.
|
||||
|
||||
Can I back up to multiple, swapped backup targets?
|
||||
--------------------------------------------------
|
||||
|
@ -124,50 +114,31 @@ Are there other known limitations?
|
|||
remove files which are in the destination, but not in the archive.
|
||||
See :issue:`4598` for a workaround and more details.
|
||||
|
||||
.. _checkpoints_parts:
|
||||
.. _interrupted_backup:
|
||||
|
||||
If a backup stops mid-way, does the already-backed-up data stay there?
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Yes, Borg supports resuming backups.
|
||||
|
||||
During a backup, a special checkpoint archive named ``<archive-name>.checkpoint``
|
||||
is saved at every checkpoint interval (the default value for this is 30
|
||||
minutes) containing all the data backed-up until that point.
|
||||
|
||||
This checkpoint archive is a valid archive, but it is only a partial backup
|
||||
(not all files that you wanted to back up are contained in it and the last file
|
||||
in it might be a partial file). Having it in the repo until a successful, full
|
||||
backup is completed is useful because it references all the transmitted chunks up
|
||||
to the checkpoint. This means that in case of an interruption, you only need to
|
||||
retransfer the data since the last checkpoint.
|
||||
Yes, the data transferred into the repo stays there - just avoid running
|
||||
``borg compact`` before you completed the backup, because that would remove
|
||||
chunks that were already transferred to the repo, but not (yet) referenced
|
||||
by an archive.
|
||||
|
||||
If a backup was interrupted, you normally do not need to do anything special,
|
||||
just invoke ``borg create`` as you always do. If the repository is still locked,
|
||||
you may need to run ``borg break-lock`` before the next backup. You may use the
|
||||
same archive name as in previous attempt or a different one (e.g. if you always
|
||||
include the current datetime), it does not matter.
|
||||
just invoke ``borg create`` as you always do. You may use the same archive name
|
||||
as in previous attempt or a different one (e.g. if you always include the
|
||||
current datetime), it does not matter.
|
||||
|
||||
Borg always does full single-pass backups, so it will start again
|
||||
from the beginning - but it will be much faster, because some of the data was
|
||||
already stored into the repo (and is still referenced by the checkpoint
|
||||
archive), so it does not need to get transmitted and stored again.
|
||||
|
||||
Once your backup has finished successfully, you can delete all
|
||||
``<archive-name>.checkpoint`` archives. If you run ``borg prune``, it will
|
||||
also care for deleting unneeded checkpoints.
|
||||
|
||||
Note: the checkpointing mechanism may create a partial (truncated) last file
|
||||
in a checkpoint archive named ``<filename>.borg_part``. Such partial files
|
||||
won't be contained in the final archive.
|
||||
This is done so that checkpoints work cleanly and promptly while a big
|
||||
file is being processed.
|
||||
already stored into the repo, so it does not need to get transmitted and stored
|
||||
again.
|
||||
|
||||
|
||||
How can I back up huge file(s) over a unstable connection?
|
||||
----------------------------------------------------------
|
||||
|
||||
Yes. For more details, see :ref:`checkpoints_parts`.
|
||||
Yes. For more details, see :ref:`interrupted_backup`.
|
||||
|
||||
How can I restore huge file(s) over an unstable connection?
|
||||
-----------------------------------------------------------
|
||||
|
@ -220,23 +191,6 @@ Yes, if you want to detect accidental data damage (like bit rot), use the
|
|||
If you want to be able to detect malicious tampering also, use an encrypted
|
||||
repo. It will then be able to check using CRCs and HMACs.
|
||||
|
||||
Can I use Borg on SMR hard drives?
|
||||
----------------------------------
|
||||
|
||||
SMR (shingled magnetic recording) hard drives are very different from
|
||||
regular hard drives. Applications have to behave in certain ways or
|
||||
performance will be heavily degraded.
|
||||
|
||||
Borg ships with default settings suitable for SMR drives,
|
||||
and has been successfully tested on *Seagate Archive v2* drives
|
||||
using the ext4 file system.
|
||||
|
||||
Some Linux kernel versions between 3.19 and 4.5 had various bugs
|
||||
handling device-managed SMR drives, leading to IO errors, unresponsive
|
||||
drives and unreliable operation in general.
|
||||
|
||||
For more details, refer to :issue:`2252`.
|
||||
|
||||
.. _faq-integrityerror:
|
||||
|
||||
I get an IntegrityError or similar - what now?
|
||||
|
@ -355,7 +309,7 @@ Why is the time elapsed in the archive stats different from wall clock time?
|
|||
----------------------------------------------------------------------------
|
||||
|
||||
Borg needs to write the time elapsed into the archive metadata before finalizing
|
||||
the archive and committing the repo & cache.
|
||||
the archive and saving the files cache.
|
||||
This means when Borg is run with e.g. the ``time`` command, the duration shown
|
||||
in the archive stats may be shorter than the full time the command runs for.
|
||||
|
||||
|
@ -391,8 +345,7 @@ will of course delete everything in the archive, not only some files.
|
|||
:ref:`borg_recreate` command to rewrite all archives with a different
|
||||
``--exclude`` pattern. See the examples in the manpage for more information.
|
||||
|
||||
Finally, run :ref:`borg_compact` with the ``--threshold 0`` option to delete the
|
||||
data chunks from the repository.
|
||||
Finally, run :ref:`borg_compact` to delete the data chunks from the repository.
|
||||
|
||||
Can I safely change the compression level or algorithm?
|
||||
--------------------------------------------------------
|
||||
|
@ -402,6 +355,7 @@ are calculated *before* compression. New compression settings
|
|||
will only be applied to new chunks, not existing chunks. So it's safe
|
||||
to change them.
|
||||
|
||||
Use ``borg rcompress`` to efficiently recompress a complete repository.
|
||||
|
||||
Security
|
||||
########
|
||||
|
@ -704,38 +658,6 @@ serialized way in a single script, you need to give them ``--lock-wait N`` (with
|
|||
being a bit more than the time the server needs to terminate broken down
|
||||
connections and release the lock).
|
||||
|
||||
.. _disable_archive_chunks:
|
||||
|
||||
The borg cache eats way too much disk space, what can I do?
|
||||
-----------------------------------------------------------
|
||||
|
||||
This may especially happen if borg needs to rebuild the local "chunks" index -
|
||||
either because it was removed, or because it was not coherent with the
|
||||
repository state any more (e.g. because another borg instance changed the
|
||||
repository).
|
||||
|
||||
To optimize this rebuild process, borg caches per-archive information in the
|
||||
``chunks.archive.d/`` directory. It won't help the first time it happens, but it
|
||||
will make the subsequent rebuilds faster (because it needs to transfer less data
|
||||
from the repository). While being faster, the cache needs quite some disk space,
|
||||
which might be unwanted.
|
||||
|
||||
You can disable the cached archive chunk indexes by setting the environment
|
||||
variable ``BORG_USE_CHUNKS_ARCHIVE`` to ``no``.
|
||||
|
||||
This has some pros and cons, though:
|
||||
|
||||
- much less disk space needs for ~/.cache/borg.
|
||||
- chunk cache resyncs will be slower as it will have to transfer chunk usage
|
||||
metadata for all archives from the repository (which might be slow if your
|
||||
repo connection is slow) and it will also have to build the hashtables from
|
||||
that data.
|
||||
chunk cache resyncs happen e.g. if your repo was written to by another
|
||||
machine (if you share same backup repo between multiple machines) or if
|
||||
your local chunks cache was lost somehow.
|
||||
|
||||
The long term plan to improve this is called "borgception", see :issue:`474`.
|
||||
|
||||
Can I back up my root partition (/) with Borg?
|
||||
----------------------------------------------
|
||||
|
||||
|
@ -779,7 +701,7 @@ This can make creation of the first archive slower, but saves time
|
|||
and disk space on subsequent runs. Here what Borg does when you run ``borg create``:
|
||||
|
||||
- Borg chunks the file (using the relatively expensive buzhash algorithm)
|
||||
- It then computes the "id" of the chunk (hmac-sha256 (often slow, except
|
||||
- It then computes the "id" of the chunk (hmac-sha256 (slow, except
|
||||
if your CPU has sha256 acceleration) or blake2b (fast, in software))
|
||||
- Then it checks whether this chunk is already in the repo (local hashtable lookup,
|
||||
fast). If so, the processing of the chunk is completed here. Otherwise it needs to
|
||||
|
@ -790,9 +712,8 @@ and disk space on subsequent runs. Here what Borg does when you run ``borg creat
|
|||
- Transmits to repo. If the repo is remote, this usually involves an SSH connection
|
||||
(does its own encryption / authentication).
|
||||
- Stores the chunk into a key/value store (the key is the chunk id, the value
|
||||
is the data). While doing that, it computes CRC32 / XXH64 of the data (repo low-level
|
||||
checksum, used by borg check --repository) and also updates the repo index
|
||||
(another hashtable).
|
||||
is the data). While doing that, it computes XXH64 of the data (repo low-level
|
||||
checksum, used by borg check --repository).
|
||||
|
||||
Subsequent backups are usually very fast if most files are unchanged and only
|
||||
a few are new or modified. The high performance on unchanged files primarily depends
|
||||
|
@ -826,10 +747,9 @@ If you feel your Borg backup is too slow somehow, here is what you can do:
|
|||
- Don't use any expensive compression. The default is lz4 and super fast.
|
||||
Uncompressed is often slower than lz4.
|
||||
- Just wait. You can also interrupt it and start it again as often as you like,
|
||||
it will converge against a valid "completed" state (see ``--checkpoint-interval``,
|
||||
maybe use the default, but in any case don't make it too short). It is starting
|
||||
it will converge against a valid "completed" state. It is starting
|
||||
from the beginning each time, but it is still faster then as it does not store
|
||||
data into the repo which it already has there from last checkpoint.
|
||||
data into the repo which it already has there.
|
||||
- If you don’t need additional file attributes, you can disable them with ``--noflags``,
|
||||
``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
|
||||
when your backup consists of many small files.
|
||||
|
@ -1021,6 +941,12 @@ To achieve this, run ``borg create`` within the mountpoint/snapshot directory:
|
|||
cd /mnt/rootfs
|
||||
borg create rootfs_backup .
|
||||
|
||||
Another way (without changing the directory) is to use the slashdot hack:
|
||||
|
||||
::
|
||||
|
||||
borg create rootfs_backup /mnt/rootfs/./
|
||||
|
||||
|
||||
I am having troubles with some network/FUSE/special filesystem, why?
|
||||
--------------------------------------------------------------------
|
||||
|
@ -1100,16 +1026,6 @@ to make it behave correctly::
|
|||
.. _workaround: https://unix.stackexchange.com/a/123236
|
||||
|
||||
|
||||
Can I disable checking for free disk space?
|
||||
-------------------------------------------
|
||||
|
||||
In some cases, the free disk space of the target volume is reported incorrectly.
|
||||
This can happen for CIFS- or FUSE shares. If you are sure that your target volume
|
||||
will always have enough disk space, you can use the following workaround to disable
|
||||
checking for free disk space::
|
||||
|
||||
borg config -- additional_free_space -2T
|
||||
|
||||
How do I rename a repository?
|
||||
-----------------------------
|
||||
|
||||
|
@ -1126,26 +1042,6 @@ It may be useful to set ``BORG_RELOCATED_REPO_ACCESS_IS_OK=yes`` to avoid the
|
|||
prompts when renaming multiple repositories or in a non-interactive context
|
||||
such as a script. See :doc:`deployment` for an example.
|
||||
|
||||
The repository quota size is reached, what can I do?
|
||||
----------------------------------------------------
|
||||
|
||||
The simplest solution is to increase or disable the quota and resume the backup:
|
||||
|
||||
::
|
||||
|
||||
borg config /path/to/repo storage_quota 0
|
||||
|
||||
If you are bound to the quota, you have to free repository space. The first to
|
||||
try is running :ref:`borg_compact` to free unused backup space (see also
|
||||
:ref:`separate_compaction`):
|
||||
|
||||
::
|
||||
|
||||
borg compact /path/to/repo
|
||||
|
||||
If your repository is already compacted, run :ref:`borg_prune` or
|
||||
:ref:`borg_delete` to delete archives that you do not need anymore, and then run
|
||||
``borg compact`` again.
|
||||
|
||||
My backup disk is full, what can I do?
|
||||
--------------------------------------
|
||||
|
@ -1159,11 +1055,6 @@ conditions, but generally this should be avoided. If your backup disk is already
|
|||
full when Borg starts a write command like `borg create`, it will abort
|
||||
immediately and the repository will stay as-is.
|
||||
|
||||
If you run a backup that stops due to a disk running full, Borg will roll back,
|
||||
delete the new segment file and thus freeing disk space automatically. There
|
||||
may be a checkpoint archive left that has been saved before the disk got full.
|
||||
You can keep it to speed up the next backup or delete it to get back more disk
|
||||
space.
|
||||
|
||||
Miscellaneous
|
||||
#############
|
||||
|
|
Binary file not shown.
Binary file not shown.
Before Width: | Height: | Size: 324 KiB |
|
@ -19,63 +19,51 @@ discussion about internals`_ and also on static code analysis.
|
|||
Repository
|
||||
----------
|
||||
|
||||
.. Some parts of this description were taken from the Repository docstring
|
||||
Borg stores its data in a `Repository`, which is a key-value store and has
|
||||
the following structure:
|
||||
|
||||
Borg stores its data in a `Repository`, which is a file system based
|
||||
transactional key-value store. Thus the repository does not know about
|
||||
the concept of archives or items.
|
||||
config/
|
||||
readme
|
||||
simple text object telling that this is a Borg repository
|
||||
id
|
||||
the unique repository ID encoded as hexadecimal number text
|
||||
version
|
||||
the repository version encoded as decimal number text
|
||||
manifest
|
||||
some data about the repository, binary
|
||||
last-key-checked
|
||||
repository check progress (partial checks, full checks' checkpointing),
|
||||
path of last object checked as text
|
||||
space-reserve.N
|
||||
purely random binary data to reserve space, e.g. for disk-full emergencies
|
||||
|
||||
Each repository has the following file structure:
|
||||
There is a list of pointers to archive objects in this directory:
|
||||
|
||||
README
|
||||
simple text file telling that this is a Borg repository
|
||||
archives/
|
||||
0000... .. ffff...
|
||||
|
||||
config
|
||||
repository configuration
|
||||
The actual data is stored into a nested directory structure, using the full
|
||||
object ID as name. Each (encrypted and compressed) object is stored separately.
|
||||
|
||||
data/
|
||||
directory where the actual data is stored
|
||||
00/ .. ff/
|
||||
00/ .. ff/
|
||||
0000... .. ffff...
|
||||
|
||||
hints.%d
|
||||
hints for repository compaction
|
||||
keys/
|
||||
repokey
|
||||
When using encryption in repokey mode, the encrypted, passphrase protected
|
||||
key is stored here as a base64 encoded text.
|
||||
|
||||
index.%d
|
||||
repository index
|
||||
locks/
|
||||
used by the locking system to manage shared and exclusive locks.
|
||||
|
||||
lock.roster and lock.exclusive/*
|
||||
used by the locking system to manage shared and exclusive locks
|
||||
|
||||
Transactionality is achieved by using a log (aka journal) to record changes. The log is a series of numbered files
|
||||
called segments_. Each segment is a series of log entries. The segment number together with the offset of each
|
||||
entry relative to its segment start establishes an ordering of the log entries. This is the "definition" of
|
||||
time for the purposes of the log.
|
||||
|
||||
.. _config-file:
|
||||
|
||||
Config file
|
||||
~~~~~~~~~~~
|
||||
|
||||
Each repository has a ``config`` file which is a ``INI``-style file
|
||||
and looks like this::
|
||||
|
||||
[repository]
|
||||
version = 2
|
||||
segments_per_dir = 1000
|
||||
max_segment_size = 524288000
|
||||
id = 57d6c1d52ce76a836b532b0e42e677dec6af9fca3673db511279358828a21ed6
|
||||
|
||||
This is where the ``repository.id`` is stored. It is a unique
|
||||
identifier for repositories. It will not change if you move the
|
||||
repository around so you can make a local transfer then decide to move
|
||||
the repository to another (even remote) location at a later time.
|
||||
|
||||
Keys
|
||||
~~~~
|
||||
|
||||
Repository keys are byte-strings of fixed length (32 bytes), they
|
||||
don't have a particular meaning (except for the Manifest_).
|
||||
|
||||
Normally the keys are computed like this::
|
||||
Repository object IDs (which are used as key into the key-value store) are
|
||||
byte-strings of fixed length (256bit, 32 bytes), computed like this::
|
||||
|
||||
key = id = id_hash(plaintext_data) # plain = not encrypted, not compressed, not obfuscated
|
||||
|
||||
|
@ -84,247 +72,68 @@ The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.
|
|||
As the id / key is used for deduplication, id_hash must be a cryptographically
|
||||
strong hash or MAC.
|
||||
|
||||
Segments
|
||||
~~~~~~~~
|
||||
Repository objects
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Objects referenced by a key are stored inline in files (`segments`) of approx.
|
||||
500 MB size in numbered subdirectories of ``repo/data``. The number of segments
|
||||
per directory is controlled by the value of ``segments_per_dir``. If you change
|
||||
this value in a non-empty repository, you may also need to relocate the segment
|
||||
files manually.
|
||||
Each repository object is stored separately, under its ID into data/xx/yy/xxyy...
|
||||
|
||||
A segment starts with a magic number (``BORG_SEG`` as an eight byte ASCII string),
|
||||
followed by a number of log entries. Each log entry consists of (in this order):
|
||||
A repo object has a structure like this:
|
||||
|
||||
* crc32 checksum (uint32):
|
||||
- for PUT2: CRC32(size + tag + key + digest)
|
||||
- for PUT: CRC32(size + tag + key + payload)
|
||||
- for DELETE: CRC32(size + tag + key)
|
||||
- for COMMIT: CRC32(size + tag)
|
||||
* size (uint32) of the entry (including the whole header)
|
||||
* tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
|
||||
* key (256 bit) - only for PUT/PUT2/DELETE
|
||||
* payload (size - 41 bytes) - only for PUT
|
||||
* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
|
||||
* payload (size - 41 - 8 bytes) - only for PUT2
|
||||
* 32bit meta size
|
||||
* 32bit data size
|
||||
* 64bit xxh64(meta)
|
||||
* 64bit xxh64(data)
|
||||
* meta
|
||||
* data
|
||||
|
||||
PUT2 is new since repository version 2. For new log entries PUT2 is used.
|
||||
PUT is still supported to read version 1 repositories, but not generated any more.
|
||||
If we talk about ``PUT`` in general, it shall usually mean PUT2 for repository
|
||||
version 2+.
|
||||
The size and xxh64 hashes can be used for server-side corruption checks without
|
||||
needing to decrypt anything (which would require the borg key).
|
||||
|
||||
Those files are strictly append-only and modified only once.
|
||||
The overall size of repository objects varies from very small (a small source
|
||||
file will be stored as a single repo object) to medium (big source files will
|
||||
be cut into medium sized chunks of some MB).
|
||||
|
||||
When an object is written to the repository a ``PUT`` entry is written
|
||||
to the file containing the object id and payload. If an object is deleted
|
||||
a ``DELETE`` entry is appended with the object id.
|
||||
Metadata and data are separately encrypted and authenticated (depending on
|
||||
the user's choices).
|
||||
|
||||
A ``COMMIT`` tag is written when a repository transaction is
|
||||
committed. The segment number of the segment containing
|
||||
a commit is the **transaction ID**.
|
||||
See :ref:`data-encryption` for a graphic outlining the anatomy of the
|
||||
encryption.
|
||||
|
||||
When a repository is opened any ``PUT`` or ``DELETE`` operations not
followed by a ``COMMIT`` tag are discarded since they are part of a
partial/uncommitted transaction.
Repo object metadata
~~~~~~~~~~~~~~~~~~~~

The size of individual segments is limited to 4 GiB, since the offset of entries
within segments is stored in a 32-bit unsigned integer in the repository index.
Metadata is a msgpacked (and encrypted/authenticated) dict with:

Objects / Payload structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~
- ctype (compression type 0..255)
- clevel (compression level 0..255)
- csize (overall compressed (and maybe obfuscated) data size)
- psize (only when obfuscated: payload size without the obfuscation trailer)
- size (uncompressed size of the data)

All data (the manifest, archives, archive item stream chunks and file data
chunks) is compressed, optionally obfuscated and encrypted. This produces some
additional metadata (size and compression information), which is separately
serialized and also encrypted.

See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
What you see at the bottom there is done twice: once for the data and once for the metadata.

An object (the payload part of a segment file log entry) must be like:

- length of encrypted metadata (16-bit unsigned int)
- encrypted metadata (incl. encryption header), when decrypted:

  - msgpacked dict with:

    - ctype (compression type 0..255)
    - clevel (compression level 0..255)
    - csize (overall compressed (and maybe obfuscated) data size)
    - psize (only when obfuscated: payload size without the obfuscation trailer)
    - size (uncompressed size of the data)

- encrypted data (incl. encryption header), when decrypted:

  - compressed data (with an optional all-zero-bytes obfuscation trailer)

This new, more complex repo v2 object format was implemented to be able to query the
metadata efficiently without having to read, transfer and decrypt the (usually much bigger)
data part.

The metadata is encrypted so as not to disclose potentially sensitive information that could be
used for e.g. fingerprinting attacks.
Having this separately encrypted metadata makes it more efficient to query
the metadata without having to read, transfer and decrypt the (usually much
bigger) data part.

The compression `ctype` and `clevel` are explained in :ref:`data-compression`.


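The two-part layout described above can be sketched in Python. This is a simplified, unauthenticated stand-in: real Borg objects carry encryption headers and msgpacked metadata, which are abstracted away here, and the little-endian byte order is an assumption for illustration.

```python
import struct

def build_object(meta: bytes, data: bytes) -> bytes:
    """Prefix the metadata part with its 16-bit length, then append the data."""
    return struct.pack("<H", len(meta)) + meta + data

def split_object(blob: bytes):
    """Split a repo object into (metadata, data) without touching the data part.

    Layout: 16-bit unsigned length of the metadata, the metadata bytes,
    then the (usually much bigger) data bytes.
    """
    (meta_len,) = struct.unpack("<H", blob[:2])
    meta = blob[2 : 2 + meta_len]
    data = blob[2 + meta_len :]
    return meta, data

obj = build_object(b"metadata-dict-bytes", b"compressed payload")
meta, data = split_object(obj)
```

Because the metadata length is stored up front, a reader that only needs size or compression information can stop after the metadata part, which is the efficiency argument made in the text.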
Index, hints and integrity
~~~~~~~~~~~~~~~~~~~~~~~~~~

The **repository index** is stored in ``index.<TRANSACTION_ID>`` and is used to
determine an object's location in the repository. It is a HashIndex_,
a hash table using open addressing.

It maps object keys_ to:

* segment number (uint32)
* offset of the object's entry within the segment (uint32)
* size of the payload, not including the entry header (uint32)
* flags (uint32)

The **hints file** is a msgpacked file named ``hints.<TRANSACTION_ID>``.
It contains:

* version
* list of segments
* compact
* shadow_index
* storage_quota_use

The **integrity file** is a msgpacked file named ``integrity.<TRANSACTION_ID>``.
It contains checksums of the index and hints files and is described in the
:ref:`Checksumming data structures <integrity_repo>` section below.

If the index or hints are corrupted, they are re-generated automatically.
If they are outdated, segments are replayed from the index state to the currently
committed transaction.

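The fixed-size index value above can be sketched as a struct-packing exercise. The field order and byte order here are assumptions for illustration, not the actual on-disk layout of HashIndex; the point is that four uint32 fields make each value exactly 16 bytes, and a uint32 offset is what limits segments to 4 GiB.

```python
import struct

# 4 uint32 fields: segment number, offset, payload size, flags
INDEX_VALUE = struct.Struct("<IIII")

def pack_value(segment: int, offset: int, size: int, flags: int = 0) -> bytes:
    return INDEX_VALUE.pack(segment, offset, size, flags)

def unpack_value(buf: bytes):
    return INDEX_VALUE.unpack(buf)

v = pack_value(segment=7, offset=4096, size=1234)
assert len(v) == 16  # fixed-size values keep the hash table compact
```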
Compaction
~~~~~~~~~~

For a given key only the last entry regarding the key, which is called current (all other entries are called
superseded), is relevant: if there is no entry or the last entry is a DELETE, then the key does not exist.
Otherwise the last PUT defines the value of the key.
``borg compact`` is used to free repository space. It will:

By superseding a PUT (with either another PUT or a DELETE) the log entry becomes obsolete. A segment containing
such obsolete entries is called sparse, while a segment containing no such entries is called compact.
- list all object IDs present in the repository
- read all archives and determine which object IDs are in use
- remove all unused objects from the repository
- inform / warn about anything remarkable it found:

Since writing a ``DELETE`` tag does not actually delete any data and
thus does not free disk space, any log-based data store will need a
compaction strategy (somewhat analogous to a garbage collector).
  - warn about IDs used, but not present (data loss!)
  - inform about IDs that reappeared that were previously lost
- compute statistics about:

Borg uses a simple forward compacting algorithm, which avoids modifying existing segments.
Compaction runs when a commit is issued with the ``compact=True`` parameter, e.g.
by the ``borg compact`` command (unless the :ref:`append_only_mode` is active).
  - compression and deduplication factors
  - repository space usage and space freed

The compaction algorithm requires two inputs in addition to the segments themselves:

(i) Which segments are sparse, to avoid scanning all segments (impractical).
    Further, Borg uses a conditional compaction strategy: Only those
    segments that exceed a threshold sparsity are compacted.

    To implement the threshold condition efficiently, the sparsity has
    to be stored as well. Therefore, Borg stores a mapping ``(segment
    id,) -> (number of sparse bytes,)``.

(ii) Each segment's reference count, which indicates how many live objects are in a segment.
     This is not strictly required to perform the algorithm. Rather, it is used to validate
     that a segment is unused before deleting it. If the algorithm is incorrect, or the reference
     count was not accounted correctly, then an assertion failure occurs.

These two pieces of information are stored in the hints file (`hints.N`)
next to the index (`index.N`).

Compaction may take some time if a repository has been kept in append-only mode
or ``borg compact`` has not been used for a longer time, which both have caused
the number of sparse segments to grow.

Compaction processes sparse segments from oldest to newest; sparse segments
which don't contain enough deleted data to justify compaction are skipped. This
avoids doing e.g. 500 MB of writing current data to a new segment when only
a couple of kB were deleted in a segment.

Segments that are compacted are read in their entirety. Current entries are written to
a new segment, while superseded entries are omitted. After each segment an intermediary
commit is written to the new segment. Then, the old segment is deleted
(asserting that the reference count diminished to zero), freeing disk space.

A simplified example (excluding conditional compaction and with simpler
commit logic) showing the principal operation of compaction:

.. figure:: compaction.png
   :figwidth: 100%
   :width: 100%

(The actual algorithm is more complex to avoid various consistency issues, refer to
the ``borg.repository`` module for more comments and documentation on these issues.)

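The core idea of forward compaction, copy the current entries into a new segment and drop superseded ones, can be sketched with a toy in-memory model. Real segments, intermediary commits, conditional thresholds and reference counting are deliberately omitted.

```python
# A log entry is (op, key, value); op is "PUT" or "DELETE".

def current_state(segments):
    """Replay all entries in order; the last entry per key wins."""
    state = {}
    for segment in segments:
        for op, key, value in segment:
            if op == "PUT":
                state[key] = value
            else:  # DELETE: the key no longer exists
                state.pop(key, None)
    return state

def compact(segments):
    """Write only the current PUT entries into one new segment."""
    state = current_state(segments)
    return [[("PUT", k, v) for k, v in state.items()]]

old = [
    [("PUT", "a", 1), ("PUT", "b", 2)],   # segment 0
    [("PUT", "a", 3), ("DELETE", "b", None)],  # segment 1 supersedes both
]
new = compact(old)
```

After compaction the replayed state is unchanged, but all superseded entries (the first PUT of ``a`` and the whole history of ``b``) are gone, which is exactly the disk space that gets freed.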
.. _internals_storage_quota:

Storage quotas
~~~~~~~~~~~~~~

Quotas are implemented at the Repository level. The active quota of a repository
is determined by the ``storage_quota`` `config` entry or a run-time override (via :ref:`borg_serve`).
The currently used quota is stored in the hints file. Operations (PUT and DELETE) during
a transaction modify the currently used quota:

- A PUT adds the size of the *log entry* to the quota,
  i.e. the length of the data plus the 41 byte header.
- A DELETE subtracts the size of the deleted log entry from the quota,
  which includes the header.

Thus, PUT and DELETE are symmetric and cancel each other out precisely.

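The symmetric accounting can be written out directly; the 41-byte header size is taken from the text above, the tracker class itself is just an illustration:

```python
HEADER_SIZE = 41  # per-log-entry header, as described above

class QuotaTracker:
    def __init__(self):
        self.used = 0

    def put(self, data: bytes):
        # a PUT adds the size of the whole log entry
        self.used += HEADER_SIZE + len(data)

    def delete(self, data: bytes):
        # a DELETE subtracts the size of the deleted log entry, header included
        self.used -= HEADER_SIZE + len(data)

q = QuotaTracker()
q.put(b"x" * 1000)     # used grows by 1041
q.delete(b"x" * 1000)  # back to 0: PUT and DELETE cancel precisely
```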
The quota does not track on-disk size overheads (due to conditional compaction
or append-only mode). In normal operation the inclusion of the log entry headers
in the quota acts as a faithful proxy for index and hints overheads.

By tracking effective content size, the client can *always* recover from a full quota
by deleting archives. This would not be possible if the quota tracked on-disk size,
since journaling DELETEs requires extra disk space before space is freed.
Tracking effective size on the other hand accounts DELETEs immediately as freeing quota.

.. rubric:: Enforcing the quota

The storage quota is meant as a robust mechanism for service providers, therefore
:ref:`borg_serve` has to enforce it without loopholes (e.g. modified clients).
The following sections refer to using quotas on remotely accessed repositories.
For local access, consider *client* and *serve* the same.
Accordingly, quotas cannot be enforced with local access,
since the quota can be changed in the repository config.

The quota is enforceable only if *all* :ref:`borg_serve` versions
accessible to clients support quotas (see next section). Further, quota is
per repository. Therefore, ensure clients can only access a defined set of repositories
with their quotas set, using ``--restrict-to-repository``.

If the client exceeds the storage quota, the ``StorageQuotaExceeded`` exception is
raised. Normally a client could ignore such an exception and just send a ``commit()``
command anyway, circumventing the quota. However, when ``StorageQuotaExceeded`` is raised,
it is stored in the ``transaction_doomed`` attribute of the repository.
If the transaction is doomed, then commit will re-raise this exception, aborting the commit.

The ``transaction_doomed`` indicator is reset on a rollback (which erases the quota-exceeding
state).

.. rubric:: Compatibility with older servers and enabling quota after-the-fact

If no quota data is stored in the hints file, Borg assumes zero quota is used.
Thus, if a repository with an enabled quota is written to with an older ``borg serve``
version that does not understand quotas, then the quota usage will be erased.

The client version is irrelevant to the storage quota and has no part in it.
The form of error messages due to exceeding quota varies with client versions.

A similar situation arises when upgrading from a Borg release that did not have quotas.
Borg will start tracking quota use from the time of the upgrade, starting at zero.

If the quota shall be enforced accurately in these cases, either

- delete the ``index.N`` and ``hints.N`` files, forcing Borg to rebuild both,
  re-acquiring quota data in the process, or
- edit the msgpacked ``hints.N`` file (not recommended and thus not
  documented further).

The object graph
----------------

@@ -344,10 +153,10 @@ More on how this helps security in :ref:`security_structural_auth`.
The manifest
~~~~~~~~~~~~

The manifest is the root of the object hierarchy. It references
all archives in a repository, and thus all data in it.
Since no object references it, it cannot be stored under its ID key.
Instead, the manifest has a fixed all-zero key.
Compared to borg 1.x:

- the manifest moved from object ID 0 to config/manifest
- the archives list has been moved from the manifest to archives/*

The manifest is rewritten each time an archive is created, deleted,
or modified. It looks like this:

@@ -523,17 +332,18 @@ these may/may not be implemented and purely serve as examples.
Archives
~~~~~~~~

Each archive is an object referenced by the manifest. The archive object
itself does not store any of the data contained in the archive it describes.
Each archive is an object referenced by an entry below archives/.
The archive object itself does not store any of the data contained in the
archive it describes.

Instead, it contains a list of chunks which form a msgpacked stream of items_.
The archive object itself further contains some metadata:

* *version*
* *name*, which might differ from the name set in the manifest.
* *name*, which might differ from the name set in the archives/* object.
  When :ref:`borg_check` rebuilds the manifest (e.g. if it was corrupted) and finds
  more than one archive object with the same name, it adds a counter to the name
  in the manifest, but leaves the *name* field of the archives as it was.
  in archives/*, but leaves the *name* field of the archives as they were.
* *item_ptrs*, a list of "pointer chunk" IDs.
  Each "pointer chunk" contains a list of chunk IDs of item metadata.
* *command_line*, the command line which was used to create the archive

@@ -676,7 +486,7 @@ In memory, the files cache is a key -> value mapping (a Python *dict*) and contains:
- file size
- file ctime_ns (or mtime_ns)
- age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
- list of chunk ids representing the file's contents
- list of chunk (id, size) tuples representing the file's contents

To determine whether a file has not changed, cached values are looked up via
the key in the mapping and compared to the current file attribute values.

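The "has the file changed?" check can be sketched as a dict lookup plus attribute comparison. This is a simplified model: the real files cache also handles inode numbers, the TTL/age mechanism and timestamp granularity, all ignored here.

```python
import os
from collections import namedtuple

# simplified cache entry: size, ctime_ns, list of (chunk_id, size) tuples
Entry = namedtuple("Entry", "size ctime_ns chunks")

def file_unchanged(cache, path):
    """Return the cached chunk list if the file looks unmodified, else None."""
    entry = cache.get(path)
    if entry is None:
        return None  # never seen: must be chunked and hashed
    st = os.stat(path)
    if st.st_size != entry.size or st.st_ctime_ns != entry.ctime_ns:
        return None  # attributes differ: treat as modified
    return entry.chunks
```

A hit lets Borg reuse the cached chunk list instead of re-reading and re-chunking the file, which is the "quick detection of unmodified files" feature.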
@@ -717,9 +527,9 @@ The on-disk format of the files cache is a stream of msgpacked tuples (key, value).
Loading the files cache involves reading the file, one msgpack object at a time,
unpacking it, and msgpacking the value (in an effort to save memory).

The **chunks cache** is stored in ``cache/chunks`` and is used to determine
whether we already have a specific chunk, to count references to it and also
for statistics.
The **chunks cache** is not persisted to disk, but dynamically built in memory
by querying the existing object IDs from the repository.
It is used to determine whether we already have a specific chunk.

The chunks cache is a key -> value mapping and contains:

@@ -728,14 +538,10 @@ The chunks cache is a key -> value mapping and contains:
  - chunk id_hash

* value:

  - reference count
  - size
  - reference count (always MAX_VALUE as we do not refcount anymore)
  - size (0 for prev. existing objects, we can't query their plaintext size)

The chunks cache is a HashIndex_. Due to some restrictions of HashIndex,
the reference count of each given chunk is limited to a constant, MAX_VALUE
(introduced below in HashIndex_), approximately 2**32.
If a reference count hits MAX_VALUE, decrementing it yields MAX_VALUE again,
i.e. the reference count is pinned to MAX_VALUE.
The chunks cache is a HashIndex_.

.. _cache-memory-usage:

@@ -747,14 +553,12 @@ Here is the estimated memory usage of Borg - it's complicated::
chunk_size ~= 2 ^ HASH_MASK_BITS (for buzhash chunker, BLOCK_SIZE for fixed chunker)
chunk_count ~= total_file_size / chunk_size

repo_index_usage = chunk_count * 48

chunks_cache_usage = chunk_count * 40

files_cache_usage = total_file_count * 240 + chunk_count * 80
files_cache_usage = total_file_count * 240 + chunk_count * 165

mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
          = chunk_count * 164 + total_file_count * 240
mem_usage ~= chunks_cache_usage + files_cache_usage
          = chunk_count * 205 + total_file_count * 240

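Plugging concrete numbers into the new formula makes the estimate tangible. The workload figures here (1 TiB of data, ~2 MiB average chunks, one million files) are illustrative assumptions; the per-entry constants come from the formula above.

```python
total_file_size = 1 << 40   # 1 TiB of backed-up data (assumed)
chunk_size = 1 << 21        # ~2 MiB average chunk, i.e. 2^HASH_MASK_BITS
total_file_count = 1_000_000

chunk_count = total_file_size // chunk_size  # 524288 chunks

chunks_cache_usage = chunk_count * 40
files_cache_usage = total_file_count * 240 + chunk_count * 165
mem_usage = chunks_cache_usage + files_cache_usage

mem_usage_mib = mem_usage / (1 << 20)  # roughly 331 MiB for this workload
```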
Due to the hashtables, the best/usual/worst cases for memory allocation can
be estimated like that::

@@ -772,11 +576,9 @@ It is also assuming that typical chunk size is 2^HASH_MASK_BITS (if you have
a lot of files smaller than this statistical medium chunk size, you will have
more chunks than estimated above, because 1 file is at least 1 chunk).

If a remote repository is used, the repo index will be allocated on the remote side.

The chunks cache, files cache and the repo index are all implemented as hash
tables. A hash table must have a significant amount of unused entries to be
fast - the so-called load factor gives the used/unused elements ratio.
The chunks cache and files cache are both implemented as hash tables.
A hash table must have a significant amount of unused entries to be fast -
the so-called load factor gives the used/unused elements ratio.

When a hash table gets full (load factor getting too high), it needs to be
grown (allocate new, bigger hash table, copy all elements over to it, free old

@@ -802,7 +604,7 @@ b) with ``create --chunker-params buzhash,19,23,21,4095`` (default):
HashIndex
---------

The chunks cache and the repository index are stored as hash tables, with
The chunks cache is implemented as a hash table, with
only one slot per bucket, spreading hash collisions to the following
buckets. As a consequence the hash is just a start position for a linear
search. If a key is looked up that is not in the table, then the hash table

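Open addressing with linear probing, as described above, can be sketched like this. It is a toy fixed-size table: the real HashIndex is a C implementation with an on-disk layout, resizing and tombstones, none of which appear here.

```python
EMPTY = object()  # sentinel for an unused bucket

class LinearProbingTable:
    def __init__(self, nbuckets=8):
        self.buckets = [EMPTY] * nbuckets

    def _probe(self, key):
        # the hash is just a start position for a linear search
        start = hash(key) % len(self.buckets)
        for i in range(len(self.buckets)):
            yield (start + i) % len(self.buckets)

    def put(self, key, value):
        for idx in self._probe(key):
            slot = self.buckets[idx]
            if slot is EMPTY or slot[0] == key:
                self.buckets[idx] = (key, value)
                return
        raise RuntimeError("table full, needs growing")

    def get(self, key):
        for idx in self._probe(key):
            slot = self.buckets[idx]
            if slot is EMPTY:
                return None  # reached an empty bucket: key is not present
            if slot[0] == key:
                return slot[1]
        return None
```

Note how a miss can terminate early at the first empty bucket, which is why the load factor (the share of used buckets) dominates lookup speed.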
@@ -905,7 +707,7 @@ Both modes
~~~~~~~~~~

Encryption keys (and other secrets) are kept either in a key file on the client
('keyfile' mode) or in the repository config on the server ('repokey' mode).
('keyfile' mode) or in the repository under keys/repokey ('repokey' mode).
In both cases, the secrets are generated randomly and then encrypted by a
key derived from your passphrase (this happens on the client before the key
is stored into the keyfile or as repokey).

@@ -923,8 +725,7 @@ Key files
When initializing a repository with one of the "keyfile" encryption modes,
Borg creates an associated key file in ``$HOME/.config/borg/keys``.

The same key is also used in the "repokey" modes, which store it in the repository
in the configuration file.
The same key is also used in the "repokey" modes, which store it in the repository.

The internal data structure is as follows:

@@ -1016,11 +817,10 @@ methods in one repo does not influence deduplication.

See ``borg create --help`` about how to specify the compression level and its default.

Lock files
----------
Lock files (fslocking)
----------------------

Borg uses locks to get (exclusive or shared) access to the cache and
the repository.
Borg uses filesystem locks to get (exclusive or shared) access to the cache.

The locking system is based on renaming a temporary directory
to `lock.exclusive` (for

@@ -1037,24 +837,46 @@ to `lock.exclusive`, it has the lock for it. If renaming fails
denotes a thread on the host which is still alive), lock acquisition fails.

The cache lock is usually in `~/.cache/borg/REPOID/lock.*`.
The repository lock is in `repository/lock.*`.

Locks (storelocking)
--------------------

To implement locking based on ``borgstore``, borg stores objects below locks/.

The objects contain:

- a timestamp when the lock was created (or refreshed)
- host / process / thread information about the lock owner
- lock type: exclusive or shared

Using that information, borg implements:

- lock auto-expiry: if a lock is old and has not been refreshed in time,
  it will be automatically ignored and deleted. The primary purpose of this
  is to get rid of stale locks left by borg processes on other machines.
- lock auto-removal if the owner process is dead. The primary purpose of this
  is to quickly get rid of stale locks left by borg processes on the same machine.

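A lock object and its auto-expiry check might be modeled like this. The field names and the expiry interval are illustrative assumptions, not borgstore's actual schema.

```python
import time

EXPIRY_SECONDS = 30 * 60  # assumed auto-expiry interval

def make_lock(host, pid, tid, exclusive, now=None):
    """Build a lock object: creation/refresh time, owner info, lock type."""
    return {
        "time": now if now is not None else time.time(),
        "host": host,
        "pid": pid,
        "tid": tid,
        "exclusive": exclusive,
    }

def is_stale(lock, now=None):
    """A lock that was not refreshed in time is ignored (and deleted)."""
    now = now if now is not None else time.time()
    return now - lock["time"] > EXPIRY_SECONDS

lock = make_lock("client1", 4242, 1, exclusive=True, now=1000.0)
```

Refreshing a held lock simply means rewriting it with a current timestamp; any reader that finds a lock older than the expiry interval may treat it as stale.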
Breaking the locks
------------------

In case you run into trouble with the locks, you can use the ``borg break-lock``
command after you first have made sure that no Borg process is
running on any machine that accesses this resource. Be very careful, the cache
or repository might get damaged if multiple processes use it at the same time.

If there is an issue just with the repository lock, it will usually resolve
automatically (see above), just retry later.


Checksumming data structures
----------------------------

As detailed in the previous sections, Borg generates and stores various files
containing important metadata, such as the repository index, repository hints,
chunks caches and files cache.
containing important metadata, such as the files cache.

Data corruption in these files can damage the archive data in a repository,
e.g. due to wrong reference counts in the chunks cache. Only some parts of Borg
were designed to handle corrupted data structures, so a corrupted files cache
may cause crashes or write incorrect archives.
Data corruption in the files cache could create incorrect archives, e.g. due
to wrong object IDs or sizes in the files cache.

Therefore, Borg calculates checksums when writing these files and tests checksums
when reading them. Checksums are generally 64-bit XXH64 hashes.

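The write-then-verify pattern can be sketched as follows. SHA-256 from the standard library stands in for XXH64 (which needs the third-party ``xxhash`` package); the wrapper idea mirrors, but does not reproduce, borg's IntegrityCheckedFile.

```python
import hashlib

class FileIntegrityError(Exception):
    """Raised when a stored file no longer matches its recorded digest."""

def write_with_digest(path, data: bytes) -> str:
    """Write the file and return the digest to be stored alongside it."""
    with open(path, "wb") as f:
        f.write(data)
    return hashlib.sha256(data).hexdigest()

def read_checked(path, expected_digest: str) -> bytes:
    """Read the file, verifying its digest before returning any data."""
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise FileIntegrityError(path)
    return data
```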
@@ -1086,11 +908,11 @@ xxHash was expressly designed for data blocks of these sizes.
Lower layer — file_integrity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To accommodate the different transaction models used for the cache and repository,
there is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile)
wrapping a file-like object, performing streaming calculation and comparison of checksums.
Checksum errors are signalled by raising an exception (borg.crypto.file_integrity.FileIntegrityError)
at the earliest possible moment.
There is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile)
wrapping a file-like object, performing streaming calculation and comparison
of checksums.
Checksum errors are signalled by raising an exception at the earliest possible
moment (borg.crypto.file_integrity.FileIntegrityError).

.. rubric:: Calculating checksums

@@ -1134,19 +956,13 @@ The *digests* key contains a mapping of part names to their digests.
Integrity data is generally stored by the upper layers, introduced below. An exception
is the DetachedIntegrityCheckedFile, which automatically writes and reads it from
a ".integrity" file next to the data file.
It is used for archive chunks indexes in chunks.archive.d.

Upper layer
~~~~~~~~~~~

Storage of integrity data depends on the component using it, since they have
different transaction mechanisms, and integrity data needs to be
transacted with the data it is supposed to protect.

.. rubric:: Main cache files: chunks and files cache

The integrity data of the ``chunks`` and ``files`` caches is stored in the
cache ``config``, since all three are transacted together.
The integrity data of the ``files`` cache is stored in the cache ``config``.

The ``[integrity]`` section is used:

@@ -1162,7 +978,7 @@ The ``[integrity]`` section is used:

[integrity]
manifest = 10e...21c
chunks = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}
files = {"algorithm": "XXH64", "digests": {"HashHeader": "eab...39e3", "final": "e2a...b24"}}

The manifest ID is duplicated in the integrity section due to the way all Borg
versions handle the config file. Instead of creating a "new" config file from

@@ -1182,52 +998,6 @@ easy to tell whether the checksums concern the current state of the cache.
Integrity errors are fatal in these files, terminating the program,
and are not automatically corrected at this time.

.. rubric:: chunks.archive.d

Indices in chunks.archive.d are not transacted and use DetachedIntegrityCheckedFile,
which writes the integrity data to a separate ".integrity" file.

Integrity errors result in deleting the affected index and rebuilding it.
This logs a warning and increases the exit code to WARNING (1).

.. _integrity_repo:

.. rubric:: Repository index and hints

The repository associates index and hints files with a transaction by including the
transaction ID in the file names. Integrity data is stored in a third file
("integrity.<TRANSACTION_ID>"). Like the hints file, it is msgpacked:

.. code-block:: python

   {
       'version': 2,
       'hints': '{"algorithm": "XXH64", "digests": {"final": "411208db2aa13f1a"}}',
       'index': '{"algorithm": "XXH64", "digests": {"HashHeader": "846b7315f91b8e48", "final": "cb3e26cadc173e40"}}'
   }

The *version* key started at 2, the same version used for the hints. Since Borg has
many versioned file formats, this keeps the number of different versions in use
a bit lower.

The other keys map an auxiliary file, like *index* or *hints*, to their integrity data.
Note that the JSON is stored as-is, and not as part of the msgpack structure.

Integrity errors result in deleting the affected file(s) (index/hints) and rebuilding the index,
which is the same action taken when corruption is noticed in other ways (e.g. HashIndex can
detect most corrupted headers, but not data corruption). A warning is logged as well.
The exit code is not influenced, since remote repositories cannot perform that action.
Raising the exit code would be possible for local repositories, but is not implemented.

Unlike the cache design, this mechanism can have false positives whenever an older version
*rewrites* the auxiliary files for a transaction created by a newer version,
since that might result in a different index (due to hash-table resizing) or hints file
(hash ordering, or the older version 1 format), while not invalidating the integrity file.

For example, using 1.1 on a repository, noticing corruption or similar issues and then running
``borg-1.0 check --repair``, which rewrites the index and hints, results in this situation.
Borg 1.1 would erroneously report checksum errors in the hints and/or index files and trigger
an automatic rebuild of these files.

HardLinkManager and the hlid concept
------------------------------------

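Building and checking such a record can be sketched with the standard library. ``json`` stands in for msgpack (a third-party package) as the outer container too; the important detail preserved from the text is that the per-file integrity data is embedded as a JSON *string*, not as a nested structure. Digest values are illustrative.

```python
import json

def make_integrity_record(digests_by_file):
    """digests_by_file: e.g. {"hints": {"final": "..."}, "index": {...}}"""
    record = {"version": 2}
    for name, digests in digests_by_file.items():
        # the per-file integrity data is stored as a JSON string, as-is
        record[name] = json.dumps({"algorithm": "XXH64", "digests": digests})
    return record

def check_part(record, name, part, digest):
    """Verify one named digest (e.g. 'final') of one auxiliary file."""
    meta = json.loads(record[name])
    return meta["digests"].get(part) == digest

rec = make_integrity_record({"hints": {"final": "411208db2aa13f1a"}})
```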
Binary file not shown.
Binary file not shown.
Before Width: | Height: | Size: 380 KiB | After Width: | Height: | Size: 98 KiB
@@ -31,14 +31,14 @@ deleted between attacks).

Under these circumstances Borg guarantees that the attacker cannot

1. modify the data of any archive without the client detecting the change
2. rename, remove or add an archive without the client detecting the change
2. rename or add an archive without the client detecting the change
3. recover plain-text data
4. recover definite (heuristics based on access patterns are possible)
   structural information such as the object graph (which archives
   refer to what chunks)

The attacker can always impose a denial of service per definition (he could
forbid connections to the repository, or delete it entirely).
forbid connections to the repository, or delete it partly or entirely).


.. _security_structural_auth:

@@ -47,12 +47,12 @@ Structural Authentication
-------------------------

Borg is fundamentally based on an object graph structure (see :ref:`internals`),
where the root object is called the manifest.
where the root objects are the archives.

Borg follows the `Horton principle`_, which states that
not only the message must be authenticated, but also its meaning (often
expressed through context), because every object used is referenced by a
parent object through its object ID up to the manifest. The object ID in
parent object through its object ID up to the archive list entry. The object ID in
Borg is a MAC of the object's plaintext, therefore this ensures that
an attacker cannot change the context of an object without forging the MAC.

@@ -64,8 +64,8 @@ represent packed file metadata. On their own, it's not clear that these objects
would represent what they do, but by the archive item referring to them
in a particular part of its own data structure assigns this meaning.

This results in a directed acyclic graph of authentication from the manifest
to the data chunks of individual files.
This results in a directed acyclic graph of authentication from the archive
list entry to the data chunks of individual files.

The above used to be all of it for borg 1.x and was the reason why it needed the
tertiary authentication mechanism (TAM) for manifest and archives.

@@ -80,11 +80,23 @@ the object ID (via giving the ID as AAD), there is no way an attacker (without
access to the borg key) could change the type of the object or move content
to a different object ID.

This effectively 'anchors' the manifest (and also other metadata, like archives)
to the key, which is controlled by the client, thereby anchoring the entire DAG,
making it impossible for an attacker to add, remove or modify any part of the
This effectively 'anchors' each archive to the key, which is controlled by the
client, thereby anchoring the DAG starting from the archives list entry,
making it impossible for an attacker to add or modify any part of the
DAG without Borg being able to detect the tampering.

Please note that removing an archive by removing an entry from archives/*
is possible and is done by ``borg delete`` and ``borg prune`` within their
normal operation. An attacker could also remove some entries there, but, due to
encryption, would not know what exactly they are removing. An attacker with
repository access could also remove other parts of the repository or the whole
repository, so there is not much point in protecting against archive removal.

The borg 1.x way of having the archives list within the manifest chunk was
problematic as it required a read-modify-write operation on the manifest,
requiring a lock on the repository. We want to try less locking and more
parallelism in future.

Passphrase notes
----------------

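The "object ID is a MAC of the plaintext" idea behind structural authentication can be sketched with HMAC-SHA256 from the standard library. This is a toy model: borg's actual ID derivation, key material and chunk formats differ, but it shows why only the key holder can mint IDs that verify.

```python
import hmac
import hashlib

KEY = b"\x01" * 32  # stand-in for the (client-controlled) borg key

def object_id(plaintext: bytes) -> bytes:
    # the ID depends on the secret key, so an attacker without the key
    # cannot compute a valid ID for tampered content
    return hmac.new(KEY, plaintext, hashlib.sha256).digest()

def verify(plaintext: bytes, claimed_id: bytes) -> bool:
    return hmac.compare_digest(object_id(plaintext), claimed_id)

data = b"file chunk contents"
oid = object_id(data)
```

Because every parent object stores the IDs of its children, verifying IDs while walking from an archive list entry down to the data chunks authenticates the whole DAG.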
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|||
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
||||
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
|
||||
..
|
||||
.TH "BORG-BENCHMARK-CPU" 1 "2024-07-19" "" "borg backup tool"
|
||||
.TH "BORG-BENCHMARK-CPU" 1 "2024-09-08" "" "borg backup tool"
|
||||
.SH NAME
|
||||
borg-benchmark-cpu \- Benchmark CPU bound operations.
|
||||
.SH SYNOPSIS
|
||||
|
|
|
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|||
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
||||
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
|
||||
..
|
||||
.TH "BORG-BENCHMARK-CRUD" 1 "2024-07-19" "" "borg backup tool"
|
||||
.TH "BORG-BENCHMARK-CRUD" 1 "2024-09-08" "" "borg backup tool"
|
||||
.SH NAME
|
||||
borg-benchmark-crud \- Benchmark Create, Read, Update, Delete for archives.
|
||||
.SH SYNOPSIS
|
||||
|
|
|
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|||
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
||||
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
|
||||
..
|
||||
.TH "BORG-BENCHMARK" 1 "2024-07-19" "" "borg backup tool"
|
||||
.TH "BORG-BENCHMARK" 1 "2024-09-08" "" "borg backup tool"
|
||||
.SH NAME
|
||||
borg-benchmark \- benchmark command
|
||||
.SH SYNOPSIS
|
||||
|
|
|
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-BREAK-LOCK" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-BREAK-LOCK" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-break-lock \- Break the repository lock (e.g. in case it was left by a dead borg).
.SH SYNOPSIS
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-CHECK" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-CHECK" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-check \- Check repository consistency
.SH SYNOPSIS
@ -40,8 +40,8 @@ It consists of two major steps:
.INDENT 0.0
.IP 1. 3
Checking the consistency of the repository itself. This includes checking
the segment magic headers, and both the metadata and data of all objects in
the segments. The read data is checked by size and CRC. Bit rot and other
the file magic headers, and both the metadata and data of all objects in
the repository. The read data is checked by size and hash. Bit rot and other
types of accidental damage can be detected this way. Running the repository
check can be split into multiple partial checks using \fB\-\-max\-duration\fP\&.
When checking a remote repository, please note that the checks run on the
@ -77,13 +77,12 @@ archive checks, nor enable repair mode. Consequently, if you want to use
.sp
\fBWarning:\fP Please note that partial repository checks (i.e. running it with
\fB\-\-max\-duration\fP) can only perform non\-cryptographic checksum checks on the
segment files. A full repository check (i.e. without \fB\-\-max\-duration\fP) can
also do a repository index check. Enabling partial repository checks excepts
archive checks for the same reason. Therefore partial checks may be useful with
very large repositories only where a full check would take too long.
repository files. Enabling partial repository checks excepts archive checks
for the same reason. Therefore partial checks may be useful with very large
repositories only where a full check would take too long.
.sp
The \fB\-\-verify\-data\fP option will perform a full integrity verification (as
opposed to checking the CRC32 of the segment) of data, which means reading the
opposed to checking just the xxh64) of data, which means reading the
data from the repository, decrypting and decompressing it. It is a complete
cryptographic verification and hence very time consuming, but will detect any
accidental and malicious corruption. Tamper\-resistance is only guaranteed for
@ -122,17 +121,15 @@ by definition, a potentially lossy task.
In practice, repair mode hooks into both the repository and archive checks:
.INDENT 0.0
.IP 1. 3
When checking the repository\(aqs consistency, repair mode will try to recover
as many objects from segments with integrity errors as possible, and ensure
that the index is consistent with the data stored in the segments.
When checking the repository\(aqs consistency, repair mode removes corrupted
objects from the repository after it did a 2nd try to read them correctly.
.IP 2. 3
When checking the consistency and correctness of archives, repair mode might
remove whole archives from the manifest if their archive metadata chunk is
corrupt or lost. On a chunk level (i.e. the contents of files), repair mode
will replace corrupt or lost chunks with a same\-size replacement chunk of
zeroes. If a previously zeroed chunk reappears, repair mode will restore
this lost chunk using the new chunk. Lastly, repair mode will also delete
orphaned chunks (e.g. caused by read errors while creating the archive).
this lost chunk using the new chunk.
.UNINDENT
.sp
Most steps taken by repair mode have a one\-time effect on the repository, like
@ -152,6 +149,12 @@ replace the all\-zero replacement chunk by the reappeared chunk. If all lost
chunks of a \(dqzero\-patched\(dq file reappear, this effectively \(dqheals\(dq the file.
Consequently, if lost chunks were repaired earlier, it is advised to run
\fB\-\-repair\fP a second time after creating some new backups.
.sp
If \fB\-\-repair \-\-undelete\-archives\fP is given, Borg will scan the repository
for archive metadata and if it finds some where no corresponding archives
directory entry exists, it will create the entries. This is basically undoing
\fBborg delete archive\fP or \fBborg prune ...\fP commands and only possible before
\fBborg compact\fP would remove the archives\(aq data completely.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
@ -170,6 +173,9 @@ perform cryptographic archive data integrity verification (conflicts with \fB\-\
.B \-\-repair
attempt to repair any inconsistencies found
.TP
.B \-\-undelete\-archives
attempt to undelete archives (use with \-\-repair)
.TP
.BI \-\-max\-duration \ SECONDS
do only a partial repo check for max. SECONDS seconds (Default: unlimited)
.UNINDENT
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-COMMON" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-COMMON" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-common \- Common options of Borg commands
.SH SYNOPSIS
@ -64,10 +64,7 @@ format using IEC units (1KiB = 1024B)
Output one JSON object per log line instead of formatted text.
.TP
.BI \-\-lock\-wait \ SECONDS
wait at most SECONDS for acquiring a repository/cache lock (default: 1).
.TP
.B \-\-bypass\-lock
Bypass locking mechanism
wait at most SECONDS for acquiring a repository/cache lock (default: 10).
.TP
.B \-\-show\-version
show/log the borg version
@ -27,40 +27,25 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BORG-COMPACT" 1 "2024-07-19" "" "borg backup tool"
.TH "BORG-COMPACT" 1 "2024-09-08" "" "borg backup tool"
.SH NAME
borg-compact \- compact segment files in the repository
borg-compact \- Collect garbage in repository
.SH SYNOPSIS
.sp
borg [common options] compact [options]
.SH DESCRIPTION
.sp
This command frees repository space by compacting segments.
Free repository space by deleting unused chunks.
.sp
Use this regularly to avoid running out of space \- you do not need to use this
after each borg command though. It is especially useful after deleting archives,
because only compaction will really free repository space.
borg compact analyzes all existing archives to find out which chunks are
actually used. There might be unused chunks resulting from borg delete or prune,
which can be removed to free space in the repository.
.sp
borg compact does not need a key, so it is possible to invoke it from the
client or also from the server.
.sp
Depending on the amount of segments that need compaction, it may take a while,
so consider using the \fB\-\-progress\fP option.
.sp
A segment is compacted if the amount of saved space is above the percentage value
given by the \fB\-\-threshold\fP option. If omitted, a threshold of 10% is used.
When using \fB\-\-verbose\fP, borg will output an estimate of the freed space.
.sp
See \fIseparate_compaction\fP in Additional Notes for more details.
Differently than borg 1.x, borg2\(aqs compact needs the borg key if the repo is
encrypted.
.SH OPTIONS
.sp
See \fIborg\-common(1)\fP for common options of Borg commands.
.SS optional arguments
.INDENT 0.0
.TP
.BI \-\-threshold \ PERCENT
set minimum threshold for saved space in PERCENT (Default: 10)
.UNINDENT
.SH EXAMPLES
.INDENT 0.0
.INDENT 3.5
@ -27,7 +27,7 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
.\"