Commit Graph

65 Commits

Author SHA1 Message Date
Thomas Waldmann 21d4407170
always implicitly require manifest TAMs
remove a lot of complexity from the code that was just there to
support legacy borg versions < 1.0.9 which did not TAM authenticate
the manifest.

since then, borg writes TAM authentication to the manifest,
even if the repo is unencrypted.
if the repo is unencrypted, borg did not check that somewhat pointless
authentication, which was generated without any secret. but
if we add that fake TAM, we can also verify it.

if somebody explicitly switches off all crypto, they can not
expect authentication.

for everybody else, borg now always generates the TAM and also
verifies it.
2023-09-03 22:01:46 +02:00
Thomas Waldmann 51e68c24e4
manifest: move item_keys into config dict, fixes #7710
also: manifest.version == 2 now
2023-07-05 01:11:24 +02:00
Tarrailt 616d5e7330
Add --format option to `borg diff`, resolve issue #4634 (#7534)
diff: add --format option

also: refactoring/improvements of BaseFormatter
2023-06-11 22:41:36 +02:00
Peter Gerber 438cf2e7ef
Sanitize paths during archive creation/extraction/...
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:

$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root   root          7 Sun, 2022-10-23 19:14:27  x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'

Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.

This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..' elements, however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
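
The rules described above can be sketched roughly as follows. This is a
minimal illustration only, not the code from this patch: the function name
`sanitize_path` and the use of ValueError are assumptions made for the example.

```python
def sanitize_path(path: str) -> str:
    """Return a relative, normalized path; reject any '..' element.

    Hypothetical helper for illustration, not borg's actual implementation.
    """
    parts = []
    for part in path.split("/"):
        if part in ("", "."):
            # Empty parts come from leading, trailing or doubled slashes;
            # '.' elements carry no information. Both are dropped silently.
            continue
        if part == "..":
            # Unlikely to be accidental, so refuse instead of "fixing" it.
            raise ValueError(f"path contains '..' element: {path!r}")
        parts.append(part)
    return "/".join(parts)
```

With this sketch, "/etc//shadow" becomes "etc/shadow" and "./a/./b/" becomes
"a/b", while "x/../../etc/shadow" is rejected.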

With paths being sanitized as they are being read, these "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitization, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repositories will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to be kept in
place to detect tampered archives.

I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:

a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after an OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.

Notably absent is the creation and extraction on the same host as
the same user. In such a case, an adversary must be assumed to be able
to replace any file directly.

This also (partially) fixes #7099.
2023-06-07 23:23:53 +02:00
Michael Deyaso 2c232449b0
Modified Item.pyx to include diffs in ctime and mtime (#7335)
diff: include changes in ctime and mtime, fixes #7248

also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes

Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
2023-03-06 23:18:36 +01:00
Thomas Waldmann b92f4aa487
remove --consider-part-files, related stats code, update docs
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.

people can recognize via the file name it is a partial file.

nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
no need to maintain statistics about count and size of part
files.
2023-02-01 13:04:18 +01:00
Thomas Waldmann bf667170a7
ArchiveItem.cmdline list-of-str -> .command_line str, fixes #7246
Same change for .recreate_cmdline -> .recreate_command_line .

JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
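
A small sketch of the difference between the two representations, using a
made-up argument list in place of the real sys.argv (the argument values are
assumptions for illustration only):

```python
import shlex

# Example argument list standing in for sys.argv (values are made up).
argv = ["borg", "create", "--comment", "nightly backup", "archive-1", "/home/user"]

# borg 1.x style: the list itself. borg 2 style: one shell-quoted string.
command_line = shlex.join(argv)

# shlex.split() reverses shlex.join(), so the original list can be recovered.
recovered = shlex.split(command_line)
```

Arguments containing whitespace are quoted by shlex.join, so the string form
stays unambiguous.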
2023-01-20 00:19:00 +01:00
Thomas Waldmann 1672aee031
Item: symlinks: rename .source to .target, fixes #7245
Also, in JSON:
- rename "linktarget" to "target" for symlinks
- remove "source" for symlinks
2023-01-16 20:28:25 +01:00
Thomas Waldmann 215ccaebea cosmetic: spaces, typos 2022-09-29 20:40:07 +02:00
Thomas Waldmann fd5019a7b2 cpdef variables -> cdef
warning: src/borg/item.pyx:199:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:200:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
warning: src/borg/item.pyx:202:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
2022-09-29 20:40:07 +02:00
Thomas Waldmann ce2dd6df24 item.pyx: use more cython and turn PropDict properties to a descriptor
this turns all python level classes into extension type classes.

additionally it turns the indirect properties into direct descriptors.

test_propdict_attributes runs about 30% faster.

base memory usage as reported by sys.getsizeof(Item()):
before: 48 bytes, after this PR: 40 bytes
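
To illustrate what "direct descriptors" means here, a pure-Python sketch of a
descriptor that exposes one key of an internal dict as a type-checked
attribute. The names `DictProperty`, `ExampleItem` and `_dict` are
hypothetical; the real code is Cython and differs in detail.

```python
class DictProperty:
    """Descriptor mapping an attribute to a key of the owner's internal dict."""

    def __init__(self, value_type):
        self.value_type = value_type

    def __set_name__(self, owner, name):
        # The attribute name doubles as the dict key.
        self.key = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return instance._dict[self.key]
        except KeyError:
            raise AttributeError(f"attribute {self.key} not found") from None

    def __set__(self, instance, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.key} must be {self.value_type.__name__}")
        instance._dict[self.key] = value


class ExampleItem:
    path = DictProperty(str)
    mode = DictProperty(int)

    def __init__(self, **kwargs):
        self._dict = {}
        for key, value in kwargs.items():
            setattr(self, key, value)
```

An unset key raises AttributeError on access, so hasattr() can be used to
test whether an item carries a given value.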

Author: @RonnyPfannschmidt in PR #5763
2022-09-29 20:39:06 +02:00
Thomas Waldmann 57ca9f6e74 faster implementation of item.chunks_contents_equal
This is about 10x faster than before, thanks to Ronny!

Author: @RonnyPfannschmidt in PR #5763
2022-09-28 18:57:40 +02:00
Thomas Waldmann b71ab084ba transfer: fix user/group == None crash with borg1 archives 2022-09-26 23:00:29 +02:00
Thomas Waldmann fb74fdb710 massively increase per archive metadata stream size limit, fixes #1473
implemented by introducing one level of indirection, the limit is now
very high, so it is not practically relevant any more.

we always use the indirection (storing the metadata stream chunk ids list not
directly into the archive item, but into some repo objects referenced by the new
ArchiveItem.item_ptrs list).

thus, the code behaves the same for all archive sizes.
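
The indirection can be pictured with the following toy sketch. Only the name
item_ptrs comes from this commit; the dict-based repo, the JSON encoding,
the SHA-256 keys and the block size are assumptions chosen to keep the example
self-contained.

```python
import hashlib
import json

IDS_PER_BLOCK = 3  # tiny value for illustration only


def store(repo: dict, data: bytes) -> str:
    """Store data in the toy repo under its SHA-256 hex digest."""
    key = hashlib.sha256(data).hexdigest()
    repo[key] = data
    return key


def write_item_ptrs(repo: dict, chunk_ids: list) -> list:
    """Split the chunk id list into blocks, store each block as a repo
    object and return the list of pointers to these objects."""
    ptrs = []
    for start in range(0, len(chunk_ids), IDS_PER_BLOCK):
        block = chunk_ids[start:start + IDS_PER_BLOCK]
        ptrs.append(store(repo, json.dumps(block).encode()))
    return ptrs


def read_chunk_ids(repo: dict, item_ptrs: list) -> list:
    """Follow the pointers and reassemble the full chunk id list."""
    chunk_ids = []
    for ptr in item_ptrs:
        chunk_ids.extend(json.loads(repo[ptr]))
    return chunk_ids
```

The archive item then only holds the short pointer list, so its size grows
much more slowly than the metadata stream it describes.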
2022-08-06 19:01:41 +02:00
Thomas Waldmann d5df53732d increase Key.version to 2
Old borg (< 2.0) can not read/process the new keys that have crypt_key instead of enc_key and enc_hmac_key.
2022-08-03 12:25:58 +02:00
Thomas Waldmann 3ee69bc7ba Key: crypt_key instead of enc_key + enc_hmac_key, fixes #6611 2022-08-03 12:04:23 +02:00
Thomas Waldmann b726aa5665 remove csize support from get_size 2022-06-12 15:48:33 +02:00
Thomas Waldmann ace5957524 remove csize from item.chunks elements 2022-06-12 15:48:33 +02:00
Thomas Waldmann b9f9623a6d prepare to remove csize (set it to 0 for now) 2022-06-12 15:48:33 +02:00
Thomas Waldmann 08228fbd32 Item: remove unused hardlink_masters param 2022-06-09 17:57:28 +02:00
Thomas Waldmann 58009f6773 Key: fix once, remove decode=... 2022-06-09 17:57:28 +02:00
Thomas Waldmann ed22f721f3 EncryptedKey: fix once, remove decode=... 2022-06-09 17:57:28 +02:00
Thomas Waldmann f2b085787b Item: disallow None value for .user/group/chunks/chunks_healthy
If we do not know the value, just do not have that key/value pair in the item.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 64cc16a9f4 Item: fix xattr processing
Item.xattrs is now always a StableDict mapping bytes keys -> bytes values.

The special casing of empty values (b'') getting replaced by None was removed.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 9d684120a2 Item: assert type also in property getter
also: fixed Item.xattrs to be StableDict (not just a dict, as the
msgpack unpacker gives us)
2022-06-09 17:57:28 +02:00
Thomas Waldmann 7b138cc710 Item: convert timestamps once, get rid of bigint code 2022-06-09 17:57:28 +02:00
Thomas Waldmann 8e58525fc6 Item: remove some decode= params
update_internal() makes sure they have the desired type already.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 655c1b9cc2 update docstrings / comments 2022-06-09 17:57:28 +02:00
Thomas Waldmann 33444be926 more str vs bytes fixing 2022-06-09 17:57:28 +02:00
Thomas Waldmann 8e87f1111b cleanup msgpack related str/bytes mess, fixes #968
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
2022-06-09 17:57:28 +02:00
Thomas Waldmann f8dbe5b542 cleanup msgpack related str/bytes mess, see #968
see ticket and borg.helpers.msgpack docstring.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 6bfdb3f630 refactor hardlink_master processing globally
borg now has the chunks list in every item with content.
due to the symmetric way borg now deals with hardlinks using
item.hlid, processing gets much simpler.

but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation in one place and avoids some
code duplication we had before).

item.hlid is computed via hardlink_id function.

support hardlinked symlinks, fixes #2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.

2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.

ItemFormatter: support {hlid} instead of {source} for hardlinks
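
A rough sketch of grouping items by item.hlid as described above. The helper
`example_hlid`, its (dev, ino) input and the BLAKE2b digest are stand-ins
chosen for this example; they are not the hardlink_id function of this commit.

```python
import hashlib
from collections import defaultdict


def example_hlid(dev: int, ino: int) -> bytes:
    """Stand-in id: same (dev, ino) pair gives the same 8-byte digest."""
    return hashlib.blake2b(f"{dev}:{ino}".encode(), digest_size=8).digest()


def group_hardlinks(items: list) -> dict:
    """Map each hlid to the paths of all items sharing it."""
    groups = defaultdict(list)
    for item in items:
        groups[item["hlid"]].append(item["path"])
    return dict(groups)


items = [
    {"path": "a", "hlid": example_hlid(1, 100)},
    {"path": "b", "hlid": example_hlid(1, 100)},  # hardlink of "a"
    {"path": "c", "hlid": example_hlid(1, 200)},
]
groups = group_hardlinks(items)
```

No item in a group is special: every member is treated the same way, and
there is no master/slave distinction.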
2022-05-18 14:20:01 +02:00
Thomas Waldmann 7903dad183 transfer: convert timestamps int/bigint -> msgpack.Timestamp, see #2323
Timestamp scales to 64 or 96bit serialization formats, that should be enough for everybody.

We use this in archived items and also in the files cache.
2022-05-18 14:20:01 +02:00
Thomas Waldmann e4a97ea8cc transfer: all hardlinks have chunks, maybe chunks_healthy, hlid
Item.hlid: same id, same hardlink (xxh64 digest)
Item.hardlink_master: not used for new archives any more
Item.source: not used for hardlink slaves any more
2022-05-18 14:20:01 +02:00
TW c60a314ee0 diff: support presence change for blkdev, chrdev and fifo items (1.2-maint) (#6615)
diff: support presence change for blkdev, chrdev and fifo items

also: refactor / clean up / reuse code.
2022-04-19 16:49:21 +02:00
Andrey Andreyevich Bienkowski 56c27a99d0
Argon2 the second part: implement key encryption / decryption (#6469)
Argon2 the second part: implement encryption/decryption of argon2 keys

borg init --key-algorithm=argon2 (new default, older pbkdf2 also still available)

borg key change-passphrase: keep key algorithm the same
borg key change-location: keep key algorithm the same

use env var BORG_TESTONLY_WEAKEN_KDF=1 to resource limit (cpu, memory, ...) the kdf when running the automated tests.
2022-04-07 16:22:34 +02:00
TW c114e060ec
Merge pull request #5788 from RonnyPfannschmidt/move-chunks-equals
move chunk_equals to module level and modernize tox.ini
2021-05-02 21:14:59 +02:00
Ronny Pfannschmidt 603023bbd5 transform _chunk_content_equal into a global function to ease later benchmarking 2021-05-02 17:29:37 +02:00
Thomas Waldmann 6f9b9e5a53 s/numeric_owner/numeric_ids/g 2021-04-16 15:02:16 +02:00
Robert Blenis b2dea4422e add --json-lines option to diff command 2021-03-13 11:50:55 -05:00
Thomas Waldmann 95ee729086 PropDict: refactor / micro-optimize
- do not call update methods if there is nothing to do (empty dict)
- order if/elif/else by simplicity / probability
2020-11-10 13:49:15 +01:00
Thomas Waldmann 0e1cf2056b PropDict: fail early if internal_dict is not a dict 2020-11-10 13:35:07 +01:00
Thomas Waldmann a65cefb7bb bump API_VERSIONs to 1.2_xx 2019-02-24 19:45:41 +01:00
Thomas Waldmann e569595974 include size/csize/nfiles[_parts] stats into archive, fixes #3241 2019-02-23 15:05:07 +01:00
Thomas Waldmann 58f177aa82 add comment about unused recreate_* members in ArchiveItem 2019-02-23 10:49:24 +01:00
Thomas Waldmann fc30a0765b remove ARCHIVE_KEYS duplication
also: get key set in sync, obviously we have "recreate_partial_chunks"
in ArchiveItem still.
2019-02-23 10:09:40 +01:00
Thomas Waldmann c4ffbd2a17 prepare to support multiple chunkers 2019-02-13 04:24:14 +01:00
Sam H b0141c1dc9 include item birthtime in archive (where available) (#3313)
include item birthtime in archive, fixes #3272

* use `safe_ns` when reading birthtime into attributes
* proper order for `birthtime` in `ITEM_KEYS` list
* use `bigint` wrapper for consistency
* Add tests to verify that birthtime is normally preserved, but not preserved when `--nobirthtime` is passed to `borg create`.
2017-11-13 14:55:10 +01:00
Simon Frei 9dc22d230f Refactor the diff functionality
This factors out a lot of the logic in do_diff in archiver.py to Archive in
archive.py and a new class ItemDiff in item.pyx. The idea is to move methods
to the classes that are affected and to make it reusable, primarily for a new
option to fuse (#2475).
2017-08-13 21:23:04 +02:00
Simon Frei 37f75519cf Only style changes - still NOT functional 2017-08-06 01:46:20 +02:00