403 Commits

Author SHA1 Message Date
Thomas Waldmann 1269c852bf
create/extract: ignore OSError if ACLs are not supported (ENOTSUP)
but do not silence other OSErrors.
2024-04-02 01:38:18 +02:00
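
The error handling described in this commit can be sketched roughly as follows. This is only an illustration: `read_acl_or_none` and the caller-supplied `getter` are hypothetical names, not borg's actual API. Only ENOTSUP is swallowed; any other OSError propagates.

```python
import errno


def read_acl_or_none(getter, path):
    """Call getter(path); return None if ACLs are not supported (ENOTSUP).

    Both this helper and `getter` are hypothetical names for illustration.
    """
    try:
        return getter(path)
    except OSError as e:
        if e.errno == errno.ENOTSUP:
            return None  # no ACL support here: nothing to store or restore
        raise  # do not silence other OSErrors
```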
Thomas Waldmann eb79b1f13f
create: deal with EBUSY, fixes #8123
I put it into the same class as EPERM and EACCES:
BackupPermissionError: borg is not permitted to access the file.
2024-02-25 12:17:09 +01:00
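
A minimal sketch of the errno grouping mentioned in this commit, assuming simplified stand-in classes and a hypothetical `classify` helper (borg's real classes and conversion code may look different):

```python
import errno


class BackupOSError(Exception):
    """Simplified stand-in for borg's wrapper around OSError."""


class BackupPermissionError(BackupOSError):
    """borg is not permitted to access the file."""


# errno values treated as "not permitted", as listed in the commit message
PERMISSION_ERRNOS = {errno.EPERM, errno.EACCES, errno.EBUSY}


def classify(os_error):
    """Pick the exception class for an OSError (hypothetical helper)."""
    if os_error.errno in PERMISSION_ERRNOS:
        return BackupPermissionError
    return BackupOSError
```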
Thomas Waldmann e7bd18d7f3
create: add the slashdot hack, fixes #4685 2024-02-20 04:08:09 +01:00
Thomas Waldmann 97dd287584
raise BackupOSError subclasses 2024-02-15 17:53:53 +01:00
Thomas Waldmann 900a1674df
move Backup*Error to errors module 2024-02-14 01:40:55 +01:00
Thomas Waldmann c704e5ea9e
new warnings infrastructure to support modern exit codes
- implement updating exit code based on severity, including modern codes
- extend print_warning with kwargs wc (warning code) and wt (warning type)
- update a global warnings_list with warning_info elements
- create a class hierarchy below BorgWarning class similar to Error class
- diff: change harmless warnings about speed to rc == 0
- delete --force --force: change harmless warnings to rc == 0

Also:

- have BackupRaceConditionError as a more precise subclass of BackupError
2024-02-14 01:26:12 +01:00
Thomas Waldmann 9de07ebd46
update "modern" error RCs (docs and code) 2024-02-13 22:58:02 +01:00
Thomas Waldmann d1fde11645
tests: borg check must not add a spoofed archive to manifest
also: do a small optimisation in borg check:

if the type of the repo object is not ROBJ_ARCHIVE_META, we
can skip the object, as it cannot contain valid archive metadata.

if the type is correct, this is already a sufficient check, so
we can be quite sure that there will be valid archive metadata
in the object.
2023-09-24 20:10:58 +02:00
Thomas Waldmann 6a68ad5cd6
remove archive TAMs 2023-09-24 20:10:51 +02:00
Thomas Waldmann 1b6f928917
ro_type: typed repo objects, see #7670
writing: put type into repoobj metadata
reading: check wanted type against type we got

repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.

a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).

also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
  authentication/decryption would fail on mismatch.
- the type check would fail.

thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
2023-09-24 20:10:50 +02:00
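
The two checks described in this commit (binding to the ID, then the type check) can be illustrated with a toy model. Loud assumptions: HMAC-SHA256 stands in for the AEAD scheme and nothing is encrypted here; `format_obj`, `parse_obj`, `ROBJ_OTHER`, the key, and the type values are all made up for the sketch. Only the name ROBJ_ARCHIVE_META comes from the commit messages above.

```python
import hashlib
import hmac
import json

KEY = b"demo key, illustration only"
ROBJ_ARCHIVE_META = "archive-meta"  # placeholder value, not borg's constant
ROBJ_OTHER = "other"                # hypothetical second type


class IntegrityError(Exception):
    pass


def _tag(obj_id, part, payload):
    # obj_id is mixed in like AAD: authenticated, but not stored in the payload
    return hmac.new(KEY, obj_id + part + payload, hashlib.sha256).digest()


def format_obj(obj_id, ro_type, data):
    """Writing: put the type into the metadata, bind meta and data to obj_id."""
    meta = json.dumps({"type": ro_type}).encode()
    return _tag(obj_id, b"meta", meta), meta, _tag(obj_id, b"data", data), data


def parse_obj(obj_id, blob, ro_type):
    """Reading: verify the binding to obj_id, then check the wanted type."""
    meta_tag, meta, data_tag, data = blob
    meta_ok = hmac.compare_digest(meta_tag, _tag(obj_id, b"meta", meta))
    data_ok = hmac.compare_digest(data_tag, _tag(obj_id, b"data", data))
    if not (meta_ok and data_ok):
        raise IntegrityError("authentication failed: wrong ID or tampered object")
    got = json.loads(meta)["type"]
    if got != ro_type:
        raise IntegrityError(f"type mismatch: wanted {ro_type!r}, got {got!r}")
    return data
```

In this model, serving object B under the ID of object A fails the authentication step, and asking for B under its own ID but with the wrong type fails the type check.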
Thomas Waldmann 15c24cbe7e
recreate: remove --recompress option
For many use cases, the repo-wide "rcompress" is more efficient.

Also, recreate --recompress calls add_chunk with overwrite=True,
which is unsupported with the AdHocCache.
2023-09-23 00:01:39 +02:00
Thomas Waldmann a0f5264cbd
rlist: remove support for {tam} placeholder
archives are now always TAM-authenticated.
2023-09-03 22:27:24 +02:00
Thomas Waldmann 2d78fa89a5
always implicitly require archive TAMs
they must be there since the upgrade to borg 1.2.6 (or other
borg versions that also have a fix for CVE-2023-36811).
2023-09-03 22:02:35 +02:00
Thomas Waldmann 1338646b9d
check: improve logging for TAM issues, fixes #7797 2023-09-03 17:15:09 +02:00
Thomas Waldmann 5cd2060345
rebuild_refcounts: keep archive ID, if possible
rebuild_refcounts verifies and recreates the TAM.
Now it re-uses the salt, so that the archive ID does not change
just because of a new salt if the archive still has the same data.
2023-08-30 01:13:52 +02:00
Thomas Waldmann b23e6cb73d
list: support {tam} placeholder. check archive TAM.
list: shows either "verified" or "none", depending on
whether a TAM auth tag could be verified or was
missing (old archives from borg < 1.0.9).

when loading an archive, we now try to verify the archive
TAM, but we do not require it. people might still have
old archives in their repos and we want to be able to
list such repos without fatal exceptions.
2023-08-30 00:58:02 +02:00
Thomas Waldmann 462c1bdf2e
check: rebuild_refcounts verify and recreate TAM
This part of the archive checker recreates the Archive
items (always, just in case some missing chunks needed
repairing).

When loading the Archive item, we now verify the TAM.
When saving the (potentially modified) Archive item,
we now (re-)generate the TAM.

Archives without a valid TAM are dropped rather than TAM-authenticated
when saving them. There shouldn't be any archives without a valid TAM:

- borg has written archive TAMs for a long time (since 1.0.9)
- users are expected to TAM-authenticate archives created
  by older borg when upgrading to borg 1.2.5.

Also:

Archive.set_meta: TAM-authenticate new archive

This is also used by Archive.rename and .recreate.
2023-08-30 00:57:33 +02:00
Thomas Waldmann a2ee13fd34
check: rebuild_manifest must verify archive TAM 2023-08-29 21:10:32 +02:00
Thomas Waldmann 5013121bd8
fix E501 2023-07-26 01:24:20 +02:00
Thomas Waldmann 35ac39b751
fix F401 2023-07-26 01:23:37 +02:00
Tarrailt 616d5e7330
Add --format option to `borg diff`, resolve issue #4634 (#7534)
diff: add --format option

also: refactoring/improvements of BaseFormatter
2023-06-11 22:41:36 +02:00
TW 8506c05ab6
Merge pull request #7642 from Deric-W/typehints
replace `LRUCache` internals with `OrderedDict`
2023-06-11 17:11:41 +02:00
Eric Wolf e683c80c75
replace `LRUCache` internals with `OrderedDict`
Replacing the internals should make the implementation faster
and simpler since the order tracking is done by the `OrderedDict`.

Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its use more clear.
2023-06-10 20:57:32 +02:00
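
For readers unfamiliar with the idea, here is a minimal sketch of an LRU cache built on `OrderedDict`. It is not borg's `LRUCache`: the constructor arguments and the exact method set are assumptions. Only the `replace` method name is taken from the commit message.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache; the OrderedDict keeps entries in usage order."""

    def __init__(self, capacity, dispose=lambda value: None):
        self._cache = OrderedDict()
        self._capacity = capacity
        self._dispose = dispose  # called with each evicted value

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)  # mark as most recently used
        while len(self._cache) > self._capacity:
            _, evicted = self._cache.popitem(last=False)  # least recently used
            self._dispose(evicted)

    def __getitem__(self, key):
        self._cache.move_to_end(key)  # raises KeyError if key is missing
        return self._cache[key]

    def replace(self, key, value):
        """Update an existing key; raises KeyError if it is not cached."""
        self._cache.move_to_end(key)
        self._cache[key] = value

    def __contains__(self, key):
        return key in self._cache
```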
Peter Gerber 438cf2e7ef
Sanitize paths during archive creation/extraction/...
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:

$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root   root          7 Sun, 2022-10-23 19:14:27  x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'

Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.

This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..' elements, however, seem unlikely to be the result
of an accident and could indicate a tampered repository.

With paths being sanitized as they are being read, these "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitization done when reading the archive
can be removed once support for reading v1 repositories is dropped.
V2 repositories will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to be kept in
place to detect tampered archives.

I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:

a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after an OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.

Notably absent is the creation and extraction on the same host as
the same user. In such a case, an adversary must be assumed to be able
to replace any file directly.

This also (partially) fixes #7099.
2023-06-07 23:23:53 +02:00
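
The sanitization rules from this commit message (make paths relative, remove '.' elements and superfluous slashes, reject '..') can be sketched as below. `sanitize_path` is a hypothetical helper written for illustration, not the function added by the patch, and it assumes '/' as the separator.

```python
def sanitize_path(path):
    """Return a relative path without '.' or empty elements; reject '..'."""
    parts = []
    for element in path.split("/"):
        if element == "..":
            # unlikely to be an accident, could indicate tampering
            raise ValueError(f"unexpected '..' element in path {path!r}")
        if element in ("", "."):
            continue  # leading '/', doubled '//' and './' all vanish here
        parts.append(element)
    return "/".join(parts)
```

With this sketch, `./a//b` and `/a/b` both become `a/b`, while the `x/../../../../../etc/shadow` path from the example above is rejected.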
Thomas Waldmann 573275e678
extract --continue: continue a previously interrupted extraction, fixes #1356
This skips over all previously fully extracted regular files,
but will delete and fully re-extract incomplete files.
2023-04-16 21:09:48 +02:00
Thomas Waldmann 7786cc7cb4
extract: support extraction of atime/mtime on win32 2023-04-16 20:40:35 +02:00
Thomas Waldmann 9e534c1929
Archive.extract_item: remove unused params, make most params kwargs
stripped_components: this is done already in do_extract, it modifies item.path accordingly.

original_path: not used any more.

also: run black.
2023-04-16 15:40:36 +02:00
Thomas Waldmann 52793be923
pyupgrade --py39-plus ./**/*.py 2023-04-02 02:14:54 +02:00
Michael Deyaso 2c232449b0
Modified Item.pyx to include diffs in ctime and mtime (#7335)
diff: include changes in ctime and mtime, fixes #7248

also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes

Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
2023-03-06 23:18:36 +01:00
Soumik Dutta cad138aa23
Add files changed while reading to Statistics class #7354 (#7378)
add files changed while reading to Statistics class, fixes #7354

Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
2023-02-25 01:47:39 +01:00
Thomas Waldmann 7e6afc93e9
create: implement retries for individual fs files
Errors handled for backup src files:
- BackupOSError (converted from OSError), e.g. I/O Error
- BackupError (stats race, file changed while we backed it up)

Error Handling:
- retry the same file after some sleep time
- sleep time starts from 1ms, increases exponentially up to 10s
- 10 tries

If retrying does not help:
- BackupOSError: skip the file, log it with "E" status
- BackupError: last try will back it up, log it with "C" status

Works for:
- borg create's normal (builtin) fs recursion
- borg create --paths-from-command
- borg create --paths-from-stdin

Notes:
- update stats.files_stats late (so we don't get wrong
  stats in case of e.g. IOErrors while reading the file).
- _process_any: no changes to the big block, just indented
  for adding the retry loop and the try/except.
- test_create_erroneous_file succeeds because we retry the file.
2023-02-23 01:19:19 +01:00
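
A rough sketch of the retry policy listed in this commit. Assumptions: the exception classes are stand-ins, `backup_with_retries` and the `process(path, last_try=...)` callable are hypothetical, and the growth factor (square root of 10 per retry, so that nine sleeps span 1ms to 10s) is a guess, since the message only gives the start value, the cap, and the number of tries.

```python
import time


class BackupError(Exception):
    """Stand-in: stats race, file changed while we backed it up."""


class BackupOSError(Exception):
    """Stand-in: converted from OSError, e.g. I/O error."""


def backup_with_retries(process, path, *, tries=10, sleep=time.sleep):
    """Return a status char after calling process(path, last_try=bool).

    `process` is expected to return a status char. On the last try it
    should not raise BackupError but back the file up anyway and
    return "C".
    """
    for retry in range(tries):
        last_try = retry == tries - 1
        try:
            return process(path, last_try=last_try)
        except BackupOSError:
            if last_try:
                return "E"  # retrying did not help: skip the file
        except BackupError:
            if last_try:
                raise  # should not happen, see docstring
        # exponential backoff: 1ms for the first retry, capped at 10s
        sleep(min(0.001 * 10 ** (retry / 2), 10.0))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.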
TW 6da5b7d1ba
Merge pull request #7367 from ThomasWaldmann/new-crypto-assert-id
assert_id: better be paranoid, fixes #7362
2023-02-23 01:14:25 +01:00
vhadzhiev a08a3eb173
fixed Statistics.__add__(), fixes #7355 2023-02-20 11:25:28 +02:00
Thomas Waldmann 74a19ee2a0
verify_data: always decompress and call assert_id(), see #7362 2023-02-19 21:25:24 +01:00
Thomas Waldmann 71f8dd3a17
FilesystemObjectProcessors.process_pipe: also add same exception handler there 2023-02-13 20:46:48 +01:00
Thomas Waldmann c9dbe323e3
TarfileObjectProcessors.process_file: also add same exception handler there 2023-02-13 20:46:46 +01:00
Thomas Waldmann d0c61bbbf1
FilesystemObjectProcessors.process_file: clean up orphaned chunks in case of exceptions
Note: no changes inside the indented block,
just added the try and the except block.
2023-02-13 20:46:45 +01:00
Thomas Waldmann f1981715e4
2nd+ hardlinks: add to item.chunks after incref'ing
we do book-keeping in item.chunks:
in case something goes wrong and we need to clean up,
we will have a list with chunks to decref in item.chunks.

also:
- make variable naming more consistent
- cosmetic changes
2023-02-13 20:46:21 +01:00
Thomas Waldmann 5cb3a17796
Revert "avoid orphan content chunks on BackupOSError, fixes #6709"
This reverts commit ffe32316a5.
2023-02-13 18:24:28 +01:00
Thomas Waldmann 303c474f21
better included/excluded status chars, docs, fixes #7321
more consistent now between dry-run and non-dry-run mode.

--filter=... users might need to update the status chars they filter for.
2023-02-10 01:13:21 +01:00
Thomas Waldmann ffe32316a5
avoid orphan content chunks on BackupOSError, fixes #6709
if we run into some issue reading an input file, e.g. an I/O error,
the BackupOSError exception raised due to that will skip the current
file and no archive item will be created for this file.

But we may already have added some of its content chunks to the repo,
we have either written them as new chunks or incref'd some identical chunk
in the repo.

Added an exception handler that decrefs (and deletes if refcount reaches 0)
these chunks again before re-raising the exception, so the repo is in a
consistent state again and we do not have orphaned content chunks in the repo.
2023-02-03 01:35:12 +01:00
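
The cleanup idea from this commit can be sketched with a toy refcounting store. `ChunkStore` and `process_file` here are illustrative stand-ins, not borg's cache or processor classes, and a plain OSError stands in for BackupOSError.

```python
class ChunkStore:
    """Toy refcounting store for chunk IDs."""

    def __init__(self):
        self.refcounts = {}

    def incref(self, chunk_id):
        self.refcounts[chunk_id] = self.refcounts.get(chunk_id, 0) + 1

    def decref(self, chunk_id):
        self.refcounts[chunk_id] -= 1
        if self.refcounts[chunk_id] == 0:
            del self.refcounts[chunk_id]  # delete if refcount reaches 0


def process_file(store, chunk_ids):
    """Add all chunks of one file; undo the additions if reading fails."""
    added = []
    try:
        for chunk_id in chunk_ids:  # iterating may raise, e.g. on an I/O error
            store.incref(chunk_id)
            added.append(chunk_id)
    except OSError:
        for chunk_id in added:
            store.decref(chunk_id)  # repo is consistent again
        raise
    return added
```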
Thomas Waldmann b92f4aa487
remove --consider-part-files, related stats code, update docs
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.

people can recognize by the file name that it is a partial file.

nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
no need to maintain statistics about count and size of part
files.
2023-02-01 13:04:18 +01:00
Thomas Waldmann 0fed44110a
remove part files from final archive
checkpoint archives might have a single, incomplete part file as last item.
part files are always a prefix of the full file, growing in size from
checkpoint to checkpoint.

we now manage the archive items metadata stream in a special way:
- checkpoint archive A(n) might end with a partial item PI(n)
- checkpoint archive A(n+1) does not contain PI(n)
- checkpoint archive A(n+1) contains a new partial item PI(n+1)
- the final archive does not contain any partial items
2023-02-01 13:04:12 +01:00
Thomas Waldmann a0330d578e
run black 23.1.0 on the code 2023-02-01 12:30:37 +01:00
Thomas Waldmann 15d1bc0c49
fix checkpointing: add item_ptrs chunks cleanup
not having this had created orphaned item_ptrs chunks for checkpoint archives.

also:
- borg check: show id of orphaned chunks
- borg check: archive list with explicit consider_checkpoints=True (this is the default, but better make sure).
2023-01-31 14:40:41 +01:00
Thomas Waldmann 7e31fab754
cleanup: remove Archive.checkpoint_interval (not used)
checkpoint_interval and checkpoint_volume are only needed for
the ChunksProcessor.
2023-01-31 04:01:39 +01:00
Thomas Waldmann 56b6f1d2e0
create/recreate/import-tar: add --checkpoint-volume option
volume based checkpointing is easier to test than its time based cousin.

also added first checkpointing test.
2023-01-31 04:01:37 +01:00
Thomas Waldmann 61904dd683
ArchiveChecker.check: reorder args, make most kwargs-only 2023-01-23 15:18:11 +01:00
Michael Deyaso b2654bc17d
Support for date-based matching during archive listing (#7272)
check --archives: add --newer/--older/--newest/--oldest, fixes #7062

Options accept a timespan, like Nd for N days or Nm for N months.

Use these to do date-based matching on archives and only check some of them,
like: borg check --archives --newer=1m --newest=7d

Author: Michael Deyaso <mdeyaso@fusioniq.io>
2023-01-23 15:00:05 +01:00
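
A small sketch of how timespans like Nd or Nm could be parsed and applied. Everything here is an assumption made for illustration: the helper names, the approximation of a month as 30 days, and the reading of "newer" as "timestamp not older than now minus the timespan". It does not reproduce borg's actual matching logic.

```python
import re
from datetime import datetime, timedelta

DAYS_PER_UNIT = {"d": 1, "m": 30}  # assumption: a month is treated as 30 days


def parse_timespan(text):
    """Turn '7d' or '1m' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([dm])", text)
    if match is None:
        raise ValueError(f"invalid timespan: {text!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(days=count * DAYS_PER_UNIT[unit])


def newer_than(archives, span, now):
    """Keep (name, timestamp) pairs whose timestamp is within span of now."""
    cutoff = now - parse_timespan(span)
    return [(name, ts) for name, ts in archives if ts >= cutoff]
```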
Thomas Waldmann bf667170a7
ArchiveItem.cmdline list-of-str -> .command_line str, fixes #7246
Same change for .recreate_cmdline -> .recreate_command_line .

JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
2023-01-20 00:19:00 +01:00
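
The difference between the two representations can be seen with the standard library alone. The argument list below is made up for the example:

```python
import shlex

# borg 1.x style: list of str, as found in sys.argv
argv = ["borg", "create", "--comment", "two words", "archive-1", "my dir"]

# borg 2 style: a single shell-quoted str
command_line = shlex.join(argv)
print(command_line)  # borg create --comment 'two words' archive-1 'my dir'

# the quoting keeps the argument boundaries recoverable
assert shlex.split(command_line) == argv
```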