Commit Graph

115 Commits

Author SHA1 Message Date
Thomas Waldmann 6087d7487b
WIP (tests failing) implement NewCache
Also:
- move common code to ChunksMixin
- always use ._txn_active (not .txn_active)
2024-04-04 13:33:10 +02:00
Thomas Waldmann ab0b111af0
give clean error msg for invalid nonce file, see #7967
this is a fwd port from 1.4-maint. as we don't have nonce files
any more in master, only the generally useful stuff has been ported.

- add Error / ErrorWithTraceback exception classes to RPC layer.
- add hex_to_bin helper
2024-02-18 14:47:52 +01:00
Thomas Waldmann 9de07ebd46
update "modern" error RCs (docs and code) 2024-02-13 22:58:02 +01:00
Thomas Waldmann 73284db53f
PATH: do not accept empty strings, fixes #4221 2024-01-02 19:17:55 +01:00
Thomas Waldmann fa1e9df0d1
--sort-by: support "archive" as alias of "name", fixes #7873 2023-11-05 18:05:27 +01:00
Thomas Waldmann 6a68ad5cd6
remove archive TAMs 2023-09-24 20:10:51 +02:00
Thomas Waldmann 1b6f928917
ro_type: typed repo objects, see #7670
writing: put type into repoobj metadata
reading: check wanted type against type we got

repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.

a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).

also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
  authentication/decryption would fail on mismatch.
- the type check would fail.

thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
2023-09-24 20:10:50 +02:00
Thomas Waldmann a0f5264cbd
rlist: remove support for {tam} placeholder
archives are now always TAM-authenticated.
2023-09-03 22:27:24 +02:00
Thomas Waldmann b23e6cb73d
list: support {tam} placeholder. check archive TAM.
list: shows either "verified" or "none", depending on
whether a TAM auth tag could be verified or was
missing (old archives from borg < 1.0.9).

when loading an archive, we now try to verify the archive
TAM, but we do not require it. people might still have
old archives in their repos and we want to be able to
list such repos without fatal exceptions.
2023-08-30 00:58:02 +02:00
Thomas Waldmann 5013121bd8
fix E501 2023-07-26 01:24:20 +02:00
Thomas Waldmann 6151b369c4
fix E741 2023-07-26 01:24:00 +02:00
Thomas Waldmann 35ac39b751
fix F401 2023-07-26 01:23:37 +02:00
Tarrailt 616d5e7330
Add --format option to `borg diff`, resolve issue #4634 (#7534)
diff: add --format option

also: refactoring/improvements of BaseFormatter
2023-06-11 22:41:36 +02:00
Thomas Waldmann db96c0c487
subclass MakePathSafeAction from Highlander 2023-06-10 11:41:31 +02:00
Peter Gerber 438cf2e7ef
Sanitize paths during archive creation/extraction/...
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:

$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root   root          7 Sun, 2022-10-23 19:14:27  x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'

Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.

This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.

With paths being sanitized as they are being read, this "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repository will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to kept in
place to detect tempered archives.

I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:

a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.

Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.

This also (partially) fixes #7099.
2023-06-07 23:23:53 +02:00
Thomas Waldmann ffc59dd071
implement unix domain (ipc) socket support
server (listening) side:
borg serve --socket  # default location
borg serve --socket=/path/to/socket

client side:
borg -r socket:///path/to/repo create ...
borg --socket=/path/to/socket -r socket:///path/to/repo ...

served connections:
- for ssh: proto: one connection
- for socket: proto: many connections (one after the other)

The socket has user and group permissions (770).

skip socket tests on win32, they hang infinitely, until
github CI terminates them after 60 minutes.

socket tests: use unique socket name

don't use the standard / default socket name, otherwise tests
running in parallel would interfere with each other by using
the same socket / the same borg serve process.

write a .pid file, clean up .pid and .sock file at exit

add stderr print for accepted/finished socket connection
2023-06-06 21:12:54 +02:00
Thomas Waldmann 05bf29f504
fix SortBySpec validator 2023-04-12 01:21:43 +02:00
Thomas Waldmann 0f923c8c4a
fix FilesCacheMode validator 2023-04-12 01:18:05 +02:00
Thomas Waldmann 6d38530ff1
fix ChunkerParams validator and tests 2023-04-12 01:15:46 +02:00
Thomas Waldmann 7a2c757c69
rlist: add size and nfiles to format keys, fixes #6086 2023-04-07 16:37:17 +02:00
Thomas Waldmann 355a50225f
ArchiveFormatter.get_meta: add default value argument 2023-04-07 14:23:43 +02:00
Thomas Waldmann 52793be923
pyupgrade --py39-plus ./**/*.py 2023-04-02 02:14:54 +02:00
Michael Deyaso 2c232449b0
Modified Item.pyx to include diffs in ctime and mtime (#7335)
diff: include changes in ctime and mtime, fixes #7248

also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes

Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
2023-03-06 23:18:36 +01:00
Thomas Waldmann 7f973a5b34
implement "fail" chunker for testing purposes
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of 4096b fixed size (last chunk in a file can be less)
- read chunks 0, 1 and 2 successfully
- error at chunk 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks

Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.

Read chunks as well as failed chunks count up by 1.
2023-02-13 17:15:45 +01:00
Rayyan Ansari 03a27654e0 parseformat: remove Windows-specific hack
This fixes the rest of the tests.
2023-02-11 16:32:19 +00:00
Michael Deyaso b2654bc17d
Support for date-based matching during archive listing (#7272)
check --archives: add --newer/--older/--newest/--oldest, fixes #7062

Options accept a timespan, like Nd for N days or Nm for N months.

Use these to do date-based matching on archives and only check some of them,
like: borg check --archives --newer=1m --newest=7d

Author: Michael Deyaso <mdeyaso@fusioniq.io>
2023-01-23 15:00:05 +01:00
Thomas Waldmann bf667170a7
ArchiveItem.cmdline list-of-str -> .command_line str, fixes #7246
Same change for .recreate_cmdline -> .recreate_command_line .

JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
2023-01-20 00:19:00 +01:00
Thomas Waldmann 249189e04e
refactor: reuse code from remove_surrogates 2023-01-19 20:58:58 +01:00
Thomas Waldmann b3da7d0e72
make sure hostname and username have no surrogate escapes
if they are present, process them through json_text().
this replaces s-e by "?" for the key and puts the binary
representation into key_b64, if needed.

likely this is rarely needed.
2023-01-19 20:46:40 +01:00
Thomas Waldmann 1672aee031
Item: symlinks: rename .source to .target, fixes #7245
Also, in JSON:
- rename "linktarget" to "target" for symlinks
- remove "source" for symlinks
2023-01-16 20:28:25 +01:00
Thomas Waldmann 262812e76f
get_item_data: do not require item.uid/gid, see #7249
if uid is not present, use None.
if gid is not present, use None.
2023-01-16 18:17:03 +01:00
Thomas Waldmann 491f898612
borg2 archive names and comments are always pure unicode 2023-01-16 17:45:48 +01:00
Thomas Waldmann e63cfcd708
json output: use text_to_json, fixes #6151
item: path, source, user, group

for non-unicode stuff borg 1.2 had "bpath".

now we have:
path - unicode approximation (invalid stuff replaced by ?)
path_b64 - base64(path_bytes)  # only if needed

source has the same issue as path and is now covered also.

user and group are usually unicode or even pure ASCII,
but we rather are cautious and cover them also.
2023-01-16 17:45:34 +01:00
Thomas Waldmann 32d430a1b0
implement text_to_json / binary_to_json, see #6151
binary bytes:
- json_key = <key>_b64
- json_value == base64(value)

text (potentially with surrogate escapes):
- json_key1 = <key>
- json_value1 = value_text (s-e replaced by ?)
- json_key2 = <key>_b64
- json_value2 = base64(value_binary)

json_key2/_value2 is only present if value_text required
replacement of surrogate escapes (and thus does not represent
the original value, but just an approximation).
value_binary then gives the original bytes value (e.g. a
non-utf8 bytes sequence).
2023-01-16 17:45:27 +01:00
Rayyan Ansari 66a407ff9f Use \n for the {NL} format specifier
\n is automatically converted on write to the platform-dependent os.linesep.
Using os.linesep instead of \n means that on Windows, the line ending becomes "\r\r\n".
Also switches mentions of {LF} to {NL} in code and docs.
2022-12-18 14:05:24 +00:00
Thomas Waldmann a475227e18
use archivename_validator everywhere
also: simplify, reuse code from text_validator.
2022-12-15 22:54:46 +01:00
Thomas Waldmann b9dc2be661
remove support for bpath placeholder
this needs to get done differently later, maybe base64 for json output.
2022-12-12 19:39:52 +01:00
Thomas Waldmann 1e2741ad3d
remove support for barchive placeholder
not needed any more as the archive name is now validated pure text.
2022-12-12 19:10:58 +01:00
Thomas Waldmann 1517db07ec
remove support for bcomment placeholder
not needed any more as the comment is now validated pure text.
2022-12-12 18:55:14 +01:00
Thomas Waldmann 99c6afbc61
add generic text and comment validator 2022-12-12 18:01:07 +01:00
Thomas Waldmann fe2b2bc007
archive names: validate more strictly, fixes #2290
we want to be able to use an archive name as a directory name,
e.g. for the FUSE fs built by borg mount.

thus we can not allow "/" in an archive name on linux.

on windows, the rules are more restrictive, disallowing
quite some more characters (':<>"|*?' plus some more).
we do not have FUSE fs / borg mount on windows yet, but
we better avoid any issues.
we can not avoid ":" though, as our {now} placeholder
generates ISO-8601 timestamps, including ":" chars.

also, we do not want to have leading/trailing blanks in
archive names, neither surrogate-escapes.

control chars are disallowed also, including chr(0).
we have python str here, thus chr(0) is not expected in there
(is not used to terminate a string, like it is in C).
2022-12-12 17:05:01 +01:00
Thomas Waldmann 287907b218 bsdflags cleanup, #6908
https://github.com/borgbackup/borg/issues/6908#issuecomment-1224839170
2022-09-14 11:24:50 +02:00
Thomas Waldmann d3a2d831b7 refactor replace_placeholders, fixes #6966
fix replacing placeholders in archive name, --comment and --glob-archives values (even if overridden by other options like
--timestamp).

add test.
2022-09-09 23:28:34 +02:00
Thomas Waldmann fa986a9f19 repoobj: add a layer to format/parse repo objects
borg < 2:

obj = encrypted(compressed(data))

borg 2:

obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))

handle compr / decompr in repoobj

move the assert_id call from decrypt to RepoObj.parse

also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
  and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
2022-09-04 00:49:38 +02:00
Thomas Waldmann 9beaced33c move manifest module from helpers to borg.manifest 2022-08-13 21:55:12 +02:00
Thomas Waldmann ade08ce842 use timezones
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_locatime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information

then we know when a backup was made and even from
which timezone it was made. if we want to compute
utc, we can do that using these infos.

this makes a quite nice archives list, with timestamps
as expected (in local time with timezone info).

at some places we just enforce utc, like for the
repo manifest timestamp or for the transaction log,
these are usually not looked at by the user.
2022-08-13 18:31:22 +02:00
Thomas Waldmann 15cb54f623 list: fix {flags:<WIDTH>} formatting, fixes #6081
item.bsdflags is either not present or an int, thus we default to 0 (== no flags) if not present.
2022-07-29 10:44:37 +02:00
Thomas Waldmann b8e48c5036 mypy: fixes / annotations 2022-07-15 14:54:48 +02:00
Thomas Waldmann ff411b5d98 move serve command to archiver.serve 2022-07-09 15:13:12 +02:00
Thomas Waldmann 7957af562d blacken all the code
https://black.readthedocs.io/
2022-07-06 16:34:38 +02:00