writing: put type into repoobj metadata
reading: check wanted type against type we got
repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.
a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).
also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
authentication/decryption would fail on mismatch.
- the type check would fail.
thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:
$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'
Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.
This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
With paths being sanitized as they are being read, this "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repository will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to kept in
place to detect tempered archives.
I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:
a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.
This also (partially) fixes#7099.
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes
Note: the sleep() in the test is needed due to timestamp granularity on linux being much more coarse than expected (uses the system timer, 100Hz or 250Hz).
borg < 2:
obj = encrypted(compressed(data))
borg 2:
obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))
handle compr / decompr in repoobj
move the assert_id call from decrypt to RepoObj.parse
also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
see ticket and borg.helpers.msgpack docstring.
this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).
still needed compat to the past is done via want_bytes decoder in borg.item.
this is different default behaviour than in borg < 1.2:
default (numeric_owner=False) is to use the user/group name from the archive,
look up the local uid / gid and then use that for the FUSE fs.
when --numeric-owner is given (numeric_owner=True), then the uid/gid
from the archive is directly used (as it was the default behaviour in
borg < 1.2).
this was implemented like this (changing the default behaviour) to make
borg mount and borg extract behave more similar considering usage of
user/group numeric archived ids or archived names mapped to corresponding
numeric local system ids.
also, both now use the same function to get the uid/gid from the item.
fuse:
- add user and group name entries to default_dir
- also: set internal_dict(!) of new Item with data from Item.as_dict()
wrap msgpack to avoid future upstream api changes making troubles
or that we would have to globally spoil our code with extra params.
make sure the packing is always with use_bin_type=False,
thus generating "old" msgpack format (as borg always did) from
bytes objects.
make sure the unpacking is always with raw=True,
thus generating bytes objects.
note:
safe unicode encoding/decoding for some kinds of data types is done in Item
class (see item.pyx), so it is enough if we care for bytes objects on the
msgpack level.
also wrap exception handling, so borg code can catch msgpack specific
exceptions even if the upstream msgpack code raises way too generic
exceptions typed Exception, TypeError or ValueError.
We use own Exception classes for this, upstream classes are deprecated
Before this changeset, async responses were:
- if not an error: ignored
- if an error: raised as response to the arbitrary/unrelated next command
Now, after sending async commands, the async_response command must be used
to process outstanding responses / exceptions.
We are avoiding to pile up lots of stuff in cases of high latency, because we do NOT
first wait until ALL responses have arrived, but we just can begin to process responses.
Calls with wait=False will just return what we already have received.
Repeated calls with wait=True until None is returned will fetch all responses.
Async commands now actually could have non-exception non-None results, but
this is not used yet. None responses are still dropped.
The motivation for this is to have a clear separation between a request
blowing up because it (itself) failed and failures unrelated to that request /
to that line in the sourcecode.
also: fix processing for async repo obj deletes
exception_ignored is a special object used that is "not None" (as None is used to signal
"finished with processing async results") but also not a potential async response result value.
Also:
added wait=True to chunk_decref() and add_chunk()
this makes async processing explicit - the default is synchronous and you only
need to be careful and do extra steps for async processing if you explicitly
request async by calling with wait=False (usually for speed reasons).
to process async results, use async_response, see above.
This is some 15 times faster than @contextmanager, because no instance
creation is involved and no generator has to be maintained. Overall
difference is low, but still nice for a very simple change.
currently it is just the same global stats also shown in "borg info ARCHIVE",
just without the archive-specific stats.
also: add separate test for "borg info".