- implement updating exit code based on severity, including modern codes
- extend print_warning with kwargs wc (warning code) and wt (warning type)
- update a global warnings_list with warning_info elements
- create a class hierarchy below BorgWarning class similar to Error class
- diff: change harmless warnings about speed to rc == 0
- delete --force --force: change harmless warnings to rc == 0
Also:
- have BackupRaceConditionError as a more precise subclass of BackupError
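A minimal sketch of how the pieces above could fit together. The names FileChangedWarning, WarningInfo, exit_mcode and exit_code, the numeric value 100, and the "one distinct code wins" policy are assumptions made for illustration, not taken from this change:

```python
from dataclasses import dataclass
from typing import Optional

EXIT_SUCCESS = 0
EXIT_WARNING = 1  # generic ("classic") warning code


class BorgWarning(Exception):
    """Base of the warning hierarchy; subclasses override exit_mcode."""

    exit_mcode = EXIT_WARNING  # attribute name is an assumption


class FileChangedWarning(BorgWarning):  # hypothetical subclass
    exit_mcode = 100  # illustrative "modern" code


@dataclass
class WarningInfo:  # hypothetical shape of a warning_info element
    wc: int  # warning code
    wt: Optional[str]  # warning type
    msg: str


warnings_list = []


def print_warning(msg, *, wc=EXIT_WARNING, wt=None):
    """Record the warning; a real implementation would also log it."""
    warnings_list.append(WarningInfo(wc=wc, wt=wt, msg=msg))


def exit_code(modern=True):
    """Assumed policy: a single distinct modern code wins, else generic warning."""
    if not warnings_list:
        return EXIT_SUCCESS
    codes = {w.wc for w in warnings_list}
    if modern and len(codes) == 1:
        return codes.pop()
    return EXIT_WARNING
```

With this policy, a run that only emitted FileChangedWarning-style warnings would exit with that specific code, while mixed warnings fall back to the generic one.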
also: do a small optimisation in borg check:
if the type of the repo object is not ROBJ_ARCHIVE_META, we
can skip the object; it cannot contain valid archive metadata.
if the type is correct, this is already a sufficient check, so
we can be quite sure that there will be valid archive metadata
in the object.
writing: put type into repoobj metadata
reading: check wanted type against type we got
repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.
a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).
also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
authentication/decryption would fail on mismatch.
- the type check would fail.
thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
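A simplified sketch of the write-side and read-side type handling. Encryption and authentication are omitted: the plain id comparison only stands in for the AEAD binding to the chunk ID. The function names, the dict layout, the constant values and the exception class are assumptions for illustration:

```python
ROBJ_MANIFEST = "M"  # illustrative values
ROBJ_ARCHIVE_META = "A"
ROBJ_DONTCARE = "*"


class IntegrityError(Exception):
    pass


def format_obj(obj_id, data, ro_type):
    """Writing: put the type into the repoobj metadata."""
    return {"id": obj_id, "meta": {"type": ro_type}, "data": data}


def parse_obj(obj_id, obj, ro_type):
    """Reading: check the wanted type against the type we got."""
    if obj["id"] != obj_id:
        # stand-in for AEAD decryption failing on an id (AAD) mismatch
        raise IntegrityError("object is not bound to the requested id")
    got = obj["meta"]["type"]
    if ro_type != ROBJ_DONTCARE and got != ro_type:
        raise IntegrityError(f"type mismatch: wanted {ro_type!r}, got {got!r}")
    return obj["data"]
```

Swapping in a differently typed object under the same id trips the type check, and swapping in an object stored under another id trips the id check.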
For many use cases, the repo-wide "rcompress" is more efficient.
Also, recreate --recompress calls add_chunk with overwrite=True,
which is unsupported with the AdHocCache.
rebuild_refcounts verifies and recreates the TAM.
Now it re-uses the salt, so that the archive ID does not change
just because of a new salt if the archive still has the same data.
list: shows either "verified" or "none", depending on
whether a TAM auth tag could be verified or was
missing (old archives from borg < 1.0.9).
when loading an archive, we now try to verify the archive
TAM, but we do not require it. people might still have
old archives in their repos and we want to be able to
list such repos without fatal exceptions.
This part of the archive checker recreates the Archive
items (always, just in case some missing chunks needed
repairing).
When loading the Archive item, we now verify the TAM.
When saving the (potentially modified) Archive item,
we now (re-)generate the TAM.
Archives without a valid TAM are dropped rather than TAM-authenticated
when saving them. There shouldn't be any archives without a valid TAM:
- borg has written archive TAMs for a long time (since 1.0.9)
- users are expected to TAM-authenticate archives created
by older borg when upgrading to borg 1.2.5.
Also:
Archive.set_meta: TAM-authenticate new archive
This is also used by Archive.rename and .recreate.
Replacing the internals should make the implementation faster
and simpler since the order tracking is done by the `OrderedDict`.
Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its use more clear.
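A minimal sketch of an `OrderedDict`-backed LRU cache with a `replace` method. Apart from `LRUCache` and `replace`, the signatures, the `dispose` callback and the eviction details are assumptions and may differ from the actual class:

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    def __init__(self, capacity: int, dispose: Callable[[V], None]) -> None:
        self._cache: "OrderedDict[K, V]" = OrderedDict()
        self._capacity = capacity
        self._dispose = dispose  # called with every evicted value

    def __setitem__(self, key: K, value: V) -> None:
        assert key not in self._cache, "use replace() for existing keys"
        while len(self._cache) >= self._capacity:
            # the first entry is the least recently used one
            _, evicted = self._cache.popitem(last=False)
            self._dispose(evicted)
        self._cache[key] = value

    def __getitem__(self, key: K) -> V:
        self._cache.move_to_end(key)  # mark as most recently used
        return self._cache[key]

    def replace(self, key: K, value: V) -> None:
        assert key in self._cache, "replace() needs an existing key"
        self._cache[key] = value
        # counting a replace as a use is a choice made in this sketch
        self._cache.move_to_end(key)

    def __contains__(self, key: object) -> bool:
        return key in self._cache
```

The `OrderedDict` keeps the recency order itself, so no separate list of keys has to be maintained.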
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:
$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'
Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.
This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..' elements, however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
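The rules described above can be sketched as follows. The function name `sanitize_path` and the choice of ValueError are assumptions for illustration, and only '/' is treated as a separator:

```python
def sanitize_path(path: str) -> str:
    """Make a path relative, drop '.' elements and superfluous slashes.

    '..' elements are rejected instead of being resolved.
    """
    parts = path.split("/")
    if ".." in parts:
        raise ValueError(f"unexpected '..' element in path {path!r}")
    # empty parts come from leading, trailing or doubled slashes
    kept = [part for part in parts if part not in ("", ".")]
    return "/".join(kept) or "."
```

For example, '/etc//shadow' becomes 'etc/shadow' and './a/./b' becomes 'a/b', while the path from the example above is rejected because of its '..' elements.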
With paths being sanitized as they are read, these "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitization done when reading the archive
can be removed once support for reading v1 repositories is dropped.
V2 repositories will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to be kept in
place to detect tampered archives.
I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:
a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after an OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as
the same user. In such a case, an adversary must be assumed to be able
to replace any file directly.
This also (partially) fixes #7099.
diff: include changes in ctime and mtime, fixes #7248
also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes
Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
Errors handled for backup src files:
- BackupOSError (converted from OSError), e.g. I/O Error
- BackupError (stats race, file changed while we backed it up)
Error Handling:
- retry the same file after some sleep time
- sleep time starts from 1ms, increases exponentially up to 10s
- 10 tries
If retrying does not help:
- BackupOSError: skip the file, log it with "E" status
- BackupError: last try will back it up, log it with "C" status
Works for:
- borg create's normal (builtin) fs recursion
- borg create --paths-from-command
- borg create --paths-from-stdin
Notes:
- update stats.files_stats late (so we don't get wrong
stats in case of e.g. IOErrors while reading the file).
- _process_any: no changes to the big block, just indented
for adding the retry loop and the try/except.
- test_create_erroneous_file succeeds because we retry the file.
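A minimal sketch of the retry loop described above. The `process` callable, its `last_try` parameter, the returned status letters as return values, and the growth factor of sqrt(10) per retry are assumptions for illustration. The factor was chosen so that the delay goes from 1ms to 10s within 10 tries:

```python
import time


class BackupError(Exception):
    """E.g. the file changed while we backed it up."""


class BackupOSError(Exception):
    """Converted from OSError, e.g. an I/O error."""


def backup_with_retries(process, path, *, max_tries=10, sleep=time.sleep):
    """Call process(path, last_try=...) until it succeeds or tries run out."""
    for retry in range(max_tries):
        last_try = retry == max_tries - 1
        try:
            # on the last try, process() is expected to back up a changed
            # file anyway and to return "C" instead of raising BackupError
            return process(path, last_try=last_try)
        except (BackupOSError, BackupError):
            if last_try:
                return "E"  # give up: skip the file, log it with "E" status
            # retry 0 sleeps 1ms, retry 8 sleeps 10s
            sleep(min(10.0, 0.001 * 10 ** (retry / 2)))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.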
we do book-keeping in item.chunks:
in case something goes wrong and we need to clean up,
we will have a list with chunks to decref in item.chunks.
also:
- make variable naming more consistent
- cosmetic changes
if we run into some issue reading an input file, e.g. an I/O error,
the BackupOSError exception raised due to that will skip the current
file and no archive item will be created for this file.
But we may have already added some of its content chunks to the repo:
we have either written them as new chunks or incref'd some identical chunk
in the repo.
Added an exception handler that decrefs (and deletes if refcount reaches 0)
these chunks again before re-raising the exception, so the repo is in a
consistent state again and we do not have orphaned content chunks in the repo.
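A minimal sketch of the cleanup idea, using a hypothetical in-memory refcount table in place of the real cache and repo. ChunkCache, its methods and store_chunks are illustrative names:

```python
class BackupOSError(Exception):
    pass


class ChunkCache:  # hypothetical stand-in for the chunks cache
    def __init__(self):
        self.refcounts = {}

    def add_chunk(self, chunk_id):
        # a new chunk starts at 1, an identical known chunk gets incref'd
        self.refcounts[chunk_id] = self.refcounts.get(chunk_id, 0) + 1

    def chunk_decref(self, chunk_id):
        self.refcounts[chunk_id] -= 1
        if self.refcounts[chunk_id] == 0:
            del self.refcounts[chunk_id]  # delete when refcount reaches 0


def store_chunks(cache, chunk_ids):
    """Add all chunks of one file; undo the additions if reading fails."""
    added = []  # book-keeping, plays the role of item.chunks
    try:
        for chunk_id in chunk_ids:
            cache.add_chunk(chunk_id)
            added.append(chunk_id)
    except BackupOSError:
        for chunk_id in added:
            cache.chunk_decref(chunk_id)
        raise  # the caller still skips the file
    return added
```

After the handler ran, the refcounts are the same as before the failed file was started.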
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.
people can recognize via the file name it is a partial file.
nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
no need to maintain statistics about count and size of part
files.
checkpoint archives might have a single, incomplete part file as last item.
part files are always a prefix of the full file, growing in size from
checkpoint to checkpoint.
we now manage the archive items metadata stream in a special way:
- checkpoint archive A(n) might end with a partial item PI(n)
- checkpoint archive A(n+1) does not contain PI(n)
- checkpoint archive A(n+1) contains a new partial item PI(n+1)
- the final archive does not contain any partial items
not having this had created orphaned item_ptrs chunks for checkpoint archives.
also:
- borg check: show id of orphaned chunks
- borg check: archive list with explicit consider_checkpoints=True (this is the default, but better make sure).
check --archives: add --newer/--older/--newest/--oldest, fixes #7062
Options accept a timespan, like Nd for N days or Nm for N months.
Use these to do date-based matching on archives and only check some of them,
like: borg check --archives --newer=1m --newest=7d
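A rough sketch of timespan-based matching. The helper names are hypothetical, only the d and m suffixes mentioned above are handled, and a month is approximated as 30 days, which is an assumption of this sketch and not necessarily how borg computes it:

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESPAN = re.compile(r"(\d+)([dm])")
_DAYS_PER_UNIT = {"d": 1, "m": 30}  # 30-day months are an approximation


def parse_timespan(text):
    """Turn 'Nd' or 'Nm' into a timedelta."""
    match = _TIMESPAN.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid timespan: {text!r}")
    count, unit = match.groups()
    return timedelta(days=int(count) * _DAYS_PER_UNIT[unit])


def select_newer(archives, timespan, now):
    """Keep the names of (name, timestamp) pairs newer than now - timespan."""
    cutoff = now - parse_timespan(timespan)
    return [name for name, ts in archives if ts >= cutoff]


now = datetime(2023, 1, 31, tzinfo=timezone.utc)
archives = [
    ("old", datetime(2022, 11, 1, tzinfo=timezone.utc)),
    ("recent", datetime(2023, 1, 20, tzinfo=timezone.utc)),
]
newer = select_newer(archives, "1m", now)
```

Here only the archive from 2023-01-20 falls inside the one-month window.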
Author: Michael Deyaso <mdeyaso@fusioniq.io>
Same change for .recreate_cmdline -> .recreate_command_line .
JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
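The difference between the two representations can be seen with a short example. The argv list is made up for illustration:

```python
import shlex

# hypothetical sys.argv of a borg invocation
argv = ["borg", "create", "--comment", "two words", "archive", "/home/user"]

command_line_v1 = argv  # borg 1.x style: list of str
command_line_v2 = shlex.join(argv)  # borg 2 style: one shell-quoted str

# the string form can be split back into the original list
round_trip = shlex.split(command_line_v2)
```

Arguments containing whitespace are quoted by `shlex.join`, so consumers of the JSON can recover the original list with `shlex.split`.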