403 Commits

Author SHA1 Message Date
Thomas Waldmann 1672aee031
Item: symlinks: rename .source to .target, fixes #7245
Also, in JSON:
- rename "linktarget" to "target" for symlinks
- remove "source" for symlinks
2023-01-16 20:28:25 +01:00
Thomas Waldmann 4dcc48f5c4
extract: chown only if we have u/g info in archived item, see #7249
also: move get_item_uid_gid() to "not is_win32" block for now.
2023-01-16 18:17:13 +01:00
Thomas Waldmann b338eb0ce8
process_pipe: allow creating item w/o user/group/uid/gid, see #7249 2023-01-16 18:17:09 +01:00
Thomas Waldmann 4f9cda1aab
get_item_uid_gid: do not require item.uid/gid, see #7249
if uid is not present, fall back to uid_default.
if gid is not present, fall back to gid_default.
2023-01-16 18:12:34 +01:00
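The fallback chain described here (and in the mount --numeric-owner / numeric_ids commits further down) can be sketched as below: archived names first, then archived numeric ids, then a default. This is not borg's code. The item is a plain dict, and the keyword defaults are assumptions; only the names get_item_uid_gid, numeric_ids, uid_default and gid_default come from the log. It uses the Unix-only pwd/grp modules, which fits the "not is_win32" placement.

```python
import grp
import pwd


def get_item_uid_gid(item, *, numeric_ids, uid_default=0, gid_default=0):
    """Return (uid, gid) for an archived item, given here as a plain dict."""
    uid = gid = None
    if not numeric_ids:
        # prefer the archived user/group names, mapped to local ids
        if "user" in item:
            try:
                uid = pwd.getpwnam(item["user"]).pw_uid
            except KeyError:
                pass  # name unknown on this system
        if "group" in item:
            try:
                gid = grp.getgrnam(item["group"]).gr_gid
            except KeyError:
                pass
    # then the archived numeric ids, then the defaults (uid/gid may be absent)
    if uid is None:
        uid = item.get("uid", uid_default)
    if gid is None:
        gid = item.get("gid", gid_default)
    return uid, gid
```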
TW d49665526c
Merge pull request #7232 from ThomasWaldmann/json_b64
implement and use (text|binary)_to_json
2023-01-16 18:10:52 +01:00
Thomas Waldmann e63cfcd708
json output: use text_to_json, fixes #6151
item: path, source, user, group

for non-unicode stuff borg 1.2 had "bpath".

now we have:
path - unicode approximation (invalid stuff replaced by ?)
path_b64 - base64(path_bytes)  # only if needed

source has the same issue as path and is now covered as well.

user and group are usually unicode or even pure ASCII,
but we are cautious and cover them as well.
2023-01-16 17:45:34 +01:00
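A minimal sketch of the path / path_b64 idea, starting from raw bytes. The helper name bytes_to_json is hypothetical (borg's own text_to_json takes surrogate-escaped str values). Here undecodable bytes become "?" in the approximation, and the _b64 field is only emitted when the bytes are not valid UTF-8.

```python
import base64


def bytes_to_json(key, value):
    """Map raw bytes to JSON-safe fields named `key` and, if needed, `key`_b64."""
    try:
        return {key: value.decode("utf-8")}  # clean UTF-8: no _b64 field
    except UnicodeDecodeError:
        # each undecodable byte becomes a lone surrogate, which
        # str.encode(errors="replace") then turns into "?"
        approx = value.decode("utf-8", errors="surrogateescape")
        approx = approx.encode("utf-8", errors="replace").decode("utf-8")
        return {
            key: approx,
            key + "_b64": base64.b64encode(value).decode("ascii"),
        }
```

For example, b"caf\xe9" (Latin-1 "café") yields "caf?" plus the exact bytes in path_b64.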
Thomas Waldmann aa78314ffe
recreate: when --target is given, do not detect "nothing to do"
use case:

borg recreate -a src --target dst can be used to make a copy
of an archive inside the same repository, see #7254.
2023-01-14 19:05:50 +01:00
Thomas Waldmann 5d8801e72c
macOS: fix mtime timestamp extraction if ResourceFork xattr is present, fixes #7234
setting the timestamps after xattrs helps for correct mtime,
but atime is still broken in this case.
2023-01-06 21:58:35 +01:00
Paul D a85b643866 Docs grammar fixes.
One cannot "to not x", but one can "not to x".
Avoiding split infinitives gives the added bonus that machine
translation yields better results.

setup (n/adj) vs set(v) up. We don't say "I setup it" but "I set it up".

Likewise for login(n/adj) and log(v) in, backup(n/adj) and back(v) up.
2022-12-29 00:01:48 +00:00
Thomas Waldmann 8747644540
remove --save-space
this option had not changed behaviour for a long time;
we had only kept it for API compatibility.

as a borg2 repo server won't have old clients talking to it,
we can safely remove this everywhere now.
2022-12-17 16:48:54 +01:00
Thomas Waldmann 1f859c9f17
refactor: get archive timestamps via archive_ts_now() 2022-12-04 10:55:17 +01:00
Franco Ayala 2ed7f317d3
Adding performance statistics to borg create (#6991)
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes

Note: the sleep() in the test is needed because timestamp granularity on linux is much coarser than expected (it uses the system timer, 100Hz or 250Hz).
2022-10-19 21:40:02 +02:00
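The file status counters and phase timers listed above could be collected roughly like this. CreateStats and its methods are hypothetical names for illustration, rx_bytes/tx_bytes are left out, and time.perf_counter is an assumed clock choice.

```python
import hashlib
import time
from collections import Counter
from contextlib import contextmanager


class CreateStats:
    """Hypothetical holder for per-run counters and timers."""

    def __init__(self):
        self.status = Counter()   # file status letters, e.g. A, M, E
        self.timings = Counter()  # accumulated seconds per phase

    def count(self, status):
        self.status[status] += 1

    @contextmanager
    def timer(self, phase):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[phase] += time.perf_counter() - start


stats = CreateStats()
for status in "AAME":  # two added, one modified, one error
    stats.count(status)
with stats.timer("chunking"):
    chunks = [b"x" * 1024] * 3
with stats.timer("hashing"):
    ids = [hashlib.sha256(chunk).digest() for chunk in chunks]
```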
Thomas Waldmann 2e83d18d35 archive.save(): always use metadata from stats, fixes #7072
e.g. nfiles, size, etc.

fixes:
- checkpoint archives did not have this metadata yet
- borg import-tar did not have this metadata yet
2022-10-03 23:25:02 +02:00
Thomas Waldmann c339be7df9 get_chunker: fix missing sparse=False argument, fixes #7056 2022-10-02 14:09:19 +02:00
TW 78b1301b98
Merge pull request #7028 from ThomasWaldmann/match-archives
implement pattern support for --match-archives, fixes #6504
2022-09-27 11:38:27 +02:00
Thomas Waldmann c4e54ca44e repository.scan: use same end_segment within same scan
achieved by putting it into the state that is now used instead of the marker.
2022-09-19 21:14:25 +02:00
Thomas Waldmann 49a4884cfe repository.scan: do not use chunkid as marker, but (segment, offset)
when using .scan(limit, marker), we used to use the last chunkid from
the previously returned scan result to remember how far we got and
from where we need to continue.

as this approach used the repo index to look up the respective segment/offset,
it was problematic if the code using scan rewrote the chunk to
a new segment/offset and updated the repo index (e.g. when recompressing a chunk),
basically destroying the information about where we need to continue
scanning.

thus, directly returning (segment, offset) as marker is easier and solves this issue.
2022-09-19 12:03:13 +02:00
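A toy model of a resumable scan whose marker is (segment, offset) rather than a chunk id, also keeping a fixed end_segment in the state as the later commit describes. Segments are a dict of lists; the function name and the state layout are assumptions. The point is that a chunk rewritten into a newer segment during the scan neither breaks resumption nor gets returned twice.

```python
def scan(segments, limit, state=None):
    """Return (chunk_ids, state); pass state back in to continue the scan.

    segments: {segment_number: [(offset, chunk_id), ...]} with sorted offsets
    state:    None to start, else (end_segment, (segment, offset))
    """
    if state is None:
        end_segment = max(segments, default=-1)  # fixed for the whole scan
        last = None
    else:
        end_segment, last = state
    result = []
    for segment in sorted(s for s in segments if s <= end_segment):
        for offset, chunk_id in segments[segment]:
            if last is not None and (segment, offset) <= last:
                continue  # already returned by an earlier call
            result.append(chunk_id)
            last = (segment, offset)
            if len(result) == limit:
                return result, (end_segment, last)
    return result, (end_segment, last)


segs = {0: [(0, "a"), (10, "b")], 1: [(0, "c")]}
first, state = scan(segs, 2)          # ["a", "b"]
segs[2] = [(0, "a")]                  # "a" rewritten, e.g. recompressed
second, state = scan(segs, 2, state)  # ["c"]; segment 2 is past end_segment
```

Since the marker never consults the index, rewriting "a" does not change where the next call resumes.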
Thomas Waldmann 1a6b60f415 mode, user/group id/name: minor code refactor, remove None values at transfer time, #6908
https://github.com/borgbackup/borg/issues/6908#issuecomment-1224910916
2022-09-16 21:12:29 +02:00
Thomas Waldmann 4493d396e6 implement pattern support for --match-archives, fixes #6504
also:
- rename --glob-archives option to --match-archives (short: -a, unchanged)
- globbing patterns now need sh: prefix
- regex patterns need re: prefix
- "identical" match "patterns" use an id: prefix
- new default style is id: pattern (--glob-archives used sh: glob pattern)
- source code: glob -> match, GLOB -> PATTERN
2022-09-16 15:10:13 +02:00
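The prefix dispatch can be sketched as follows. fnmatch stands in for borg's own shell-pattern matching (which may differ in details), and re.search is an assumed regex mode; treat both as illustration only. Without a prefix, the sketch falls back to the id: (identical) style, the new default.

```python
import fnmatch
import re


def match_archive(pattern, name):
    """Match an archive name against an sh:/re:/id: prefixed pattern."""
    if pattern.startswith("sh:"):
        return fnmatch.fnmatchcase(name, pattern[3:])
    if pattern.startswith("re:"):
        return re.search(pattern[3:], name) is not None
    if pattern.startswith("id:"):
        pattern = pattern[3:]
    # id: (also the default without a prefix) means an identical match
    return name == pattern
```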
Thomas Waldmann 6e2419f3b2 timestamps: minor code refactor, nothing else to do, #6908
https://github.com/borgbackup/borg/issues/6908#issuecomment-1224886207
2022-09-14 18:19:35 +02:00
Thomas Waldmann 6a1c64b0dc xattrs cleanup, #6908
https://github.com/borgbackup/borg/issues/6908#issuecomment-1224870018
2022-09-14 13:57:40 +02:00
Thomas Waldmann 287907b218 bsdflags cleanup, #6908
https://github.com/borgbackup/borg/issues/6908#issuecomment-1224839170
2022-09-14 11:24:50 +02:00
TW c258eb45f4
Merge pull request #7008 from KN4CK3R/forwardport-6990
xattrs / extended stat: improve exception handling (master)
2022-09-10 00:52:38 +02:00
Thomas Waldmann b28d6ee657 recompress: only read metadata to check for ctype/clevel 2022-09-08 20:47:40 +02:00
Thomas Waldmann 4c9ed2a6c6 refactor compressors to new api
legacy: add/remove ctype/clevel bytes prefix of compressed data

new: use a separate metadata dict

compressors: use an int as ID, not a len 1 bytestring
2022-09-07 19:23:47 +02:00
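To illustrate the difference between the legacy and the new layout, here is a sketch using zlib. The integer ID 5 and the function names are hypothetical; only the idea of moving ctype/clevel from a bytes prefix into a metadata dict comes from the commit. With the metadata separate, a recompress check like the one in b28d6ee657 can look at ctype/clevel without touching the compressed payload.

```python
import zlib

ZLIB_ID = 5  # hypothetical integer compressor ID


def compress_legacy(data, level=6):
    # legacy: ctype/clevel bytes are prefixed to the compressed data
    return bytes([ZLIB_ID, level]) + zlib.compress(data, level)


def decompress_legacy(blob):
    assert blob[0] == ZLIB_ID  # blob[1] would be clevel
    return zlib.decompress(blob[2:])


def compress_new(data, level=6):
    # new: payload stays clean, ctype/clevel go into a separate metadata dict
    meta = {"ctype": ZLIB_ID, "clevel": level}
    return meta, zlib.compress(data, level)


def decompress_new(meta, blob):
    assert meta["ctype"] == ZLIB_ID
    return zlib.decompress(blob)
```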
Thomas Waldmann 1e156ca02b fix upgrader 2022-09-07 19:23:11 +02:00
TW 68e43911f5 Merge pull request #6990 from ThomasWaldmann/more-fine-grained-extended-stat-1.2
xattrs / extended stat: improve exception handling (1.2-maint)
2022-09-07 09:34:52 +02:00
Thomas Waldmann b6cbf045ff add a test for borg 1 -> 2 repo objects transformation 2022-09-05 22:17:51 +02:00
Thomas Waldmann fa986a9f19 repoobj: add a layer to format/parse repo objects
borg < 2:

obj = encrypted(compressed(data))

borg 2:

obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))

handle compr / decompr in repoobj

move the assert_id call from decrypt to RepoObj.parse

also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
  and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
2022-09-04 00:49:38 +02:00
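The borg 2 object layout above can be sketched with struct. Assumptions: the 32-bit length is little-endian, json replaces msgpack (so the sketch needs only the standard library), and encrypt/decrypt are identity placeholders where a real key would encrypt and authenticate each part.

```python
import json
import struct

LEN32 = struct.Struct("<I")  # assumption: little-endian unsigned 32-bit length


def encrypt(data):  # placeholder: a real key would encrypt + authenticate here
    return data


def decrypt(data):  # placeholder inverse of encrypt()
    return data


def format_obj(meta, compressed):
    enc_meta = encrypt(json.dumps(meta).encode())  # borg uses msgpack, not json
    return LEN32.pack(len(enc_meta)) + enc_meta + encrypt(compressed)


def parse_obj(obj):
    (meta_len,) = LEN32.unpack_from(obj, 0)
    start = LEN32.size
    meta = json.loads(decrypt(obj[start:start + meta_len]))
    compressed = decrypt(obj[start + meta_len:])
    return meta, compressed
```

The length prefix is what lets a reader decode only the metadata part without touching the (possibly large) data part.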
Thomas Waldmann 578639b35e move lrucache module to borg.helpers 2022-08-13 22:02:04 +02:00
Thomas Waldmann 9beaced33c move manifest module from helpers to borg.manifest 2022-08-13 21:55:12 +02:00
Thomas Waldmann ade08ce842 use timezones
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_localtime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information

then we know when a backup was made and even from
which timezone it was made. if we want to compute
utc, we can do that using this information.

this makes for quite a nice archive list, with timestamps
as expected (in local time with timezone info).

at some places we just enforce utc, like for the
repo manifest timestamp or for the transaction log;
these are usually not looked at by the user.
2022-08-13 18:31:22 +02:00
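The standard library's datetime already covers the pieces listed above. The sketch uses a fixed +01:00 offset (an assumption, so the output is deterministic) and one of the commit timestamps from this page.

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=1))  # fixed +01:00 offset for the example
start = datetime(2023, 1, 16, 20, 28, 25, tzinfo=tz)

iso = start.isoformat()                              # "2023-01-16T20:28:25+01:00"
compact = start.strftime("%Y-%m-%d %H:%M:%S %z")     # "2023-01-16 20:28:25 +0100"
as_utc = start.astimezone(timezone.utc).isoformat()  # "2023-01-16T19:28:25+00:00"

# "now" in local time, with the local UTC offset attached
now_local = datetime.now().astimezone()
```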
Thomas Waldmann bab68a8d25 use py37+ datetime.isoformat / .fromisoformat
since python 3.6, .isoformat() is usable IF timespec != "auto"
is given ("auto" [default] would be as evil as before, sometimes
formatting with, sometimes without microseconds).

also since python 3.7, there is now .fromisoformat().
2022-08-11 21:18:56 +02:00
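The timespec behaviour and the fromisoformat round trip can be checked directly with the standard library:

```python
from datetime import datetime

ts_whole = datetime(2022, 8, 11, 21, 18, 56)
ts_frac = datetime(2022, 8, 11, 21, 18, 56, 123)

# timespec="auto" (the default): microseconds only appear when non-zero
auto_whole = ts_whole.isoformat()  # "2022-08-11T21:18:56"
auto_frac = ts_frac.isoformat()    # "2022-08-11T21:18:56.000123"

# explicit timespec: always the same layout
fixed_whole = ts_whole.isoformat(timespec="microseconds")  # "2022-08-11T21:18:56.000000"

# parsing back (Python 3.7+)
parsed = datetime.fromisoformat(fixed_whole)
```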
Thomas Waldmann fb74fdb710 massively increase per archive metadata stream size limit, fixes #1473
implemented by introducing one level of indirection; the limit is now
so high that it is no longer practically relevant.

we always use the indirection (storing the list of metadata stream chunk ids
not directly in the archive item, but in repo objects referenced by the new
ArchiveItem.item_ptrs list).

thus, the code behaves the same for all archive sizes.
2022-08-06 19:01:41 +02:00
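A sketch of the item_ptrs indirection. A dict serves as the repository, ids are hex sha256 digests and id lists are json-encoded; all of these, plus the tiny IDS_PER_OBJ value, are assumptions chosen to keep the example short.

```python
import hashlib
import json

IDS_PER_OBJ = 2  # tiny for the demo; a real limit would be much larger


def store(repo, data):
    chunk_id = hashlib.sha256(data).hexdigest()
    repo[chunk_id] = data
    return chunk_id


def make_item_ptrs(repo, chunk_ids):
    """Store the chunk id list in repo objects, return pointers to them."""
    ptrs = []
    for i in range(0, len(chunk_ids), IDS_PER_OBJ):
        part = chunk_ids[i:i + IDS_PER_OBJ]
        ptrs.append(store(repo, json.dumps(part).encode()))
    return ptrs


def load_chunk_ids(repo, item_ptrs):
    ids = []
    for ptr in item_ptrs:
        ids.extend(json.loads(repo[ptr]))
    return ids


repo = {}
meta_ids = [store(repo, f"item metadata {i}".encode()) for i in range(5)]
archive_item = {"name": "archive-1", "item_ptrs": make_item_ptrs(repo, meta_ids)}
```

Small and large archives take the same code path; only the number of pointers grows.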
Thomas Waldmann 53830ecae9 check: try harder to create the key, fixes #5719
the old code made only one attempt to detect the repo decryption key.
if the first chunkid we got from the chunks hashtable iterator was accidentally
the id of the chunk we intentionally corrupted in test_delete_double_force,
setup of the key failed and that made the test crash.

in practice, this could of course also happen if chunks are corrupted, thus
we now do many retries with other chunks before giving up.

error handling was improved: rather than returning None instead of a key,
which just leads to weird crashes elsewhere, we now fail early with an
IntegrityError and a reasonable error msg.

rename method to make_key to avoid confusion with borg.crypto.key.identify_key.
2022-07-29 10:34:58 +02:00
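The retry-then-fail-early logic could look like this. The make_key signature, the identify callback, max_attempts and the local IntegrityError class are hypothetical stand-ins, not borg's API.

```python
from itertools import islice


class IntegrityError(Exception):
    pass


def make_key(repo, chunk_ids, identify, max_attempts=10):
    """Try up to max_attempts chunks until identify() yields a key."""
    for chunk_id in islice(chunk_ids, max_attempts):
        try:
            return identify(repo[chunk_id])
        except IntegrityError:
            continue  # corrupt chunk, try the next one
    # fail early and loudly instead of returning None
    raise IntegrityError("could not set up the key: no usable chunk found")


def toy_identify(data):  # hypothetical: raises on corrupt input
    if data == b"corrupt":
        raise IntegrityError("bad chunk")
    return "key-from-" + data.decode()


key = make_key({b"id1": b"corrupt", b"id2": b"good"}, [b"id1", b"id2"], toy_identify)
try:
    make_key({b"id1": b"corrupt"}, [b"id1"], toy_identify)
    error = None
except IntegrityError as exc:
    error = exc
```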
Thomas Waldmann b0db800b5a check: --verify-data does not need to decompress with new crypto modes 2022-07-20 15:51:33 +02:00
Thomas Waldmann 7bc7f01342 remove remainders of attic legacy
we expect that everybody has already upgraded their attic
repos to borg using borg 1.2.x or older, thus we do not need
to care about attic repos any more in borg2.
2022-07-13 16:55:29 +02:00
Thomas Waldmann 7957af562d blacken all the code
https://black.readthedocs.io/
2022-07-06 16:34:38 +02:00
TW 80289215d6
Merge pull request #6837 from ThomasWaldmann/recreate-recompress-considering-level
recreate: consider level for recompression, fixes #6698, fixes #3622
2022-07-06 14:11:06 +02:00
Thomas Waldmann 0dc25000a9 recreate: consider level for recompression, fixes #6698, fixes #3622 2022-07-05 02:38:09 +02:00
Thomas Waldmann 350393c9fd remove unused imports 2022-07-05 00:05:07 +02:00
Thomas Waldmann c36c75db59 borg check: remove --name, better use -a
The glob can also match precisely one archive,
so this does the same with less code.
2022-06-25 22:17:29 +02:00
Thomas Waldmann 31a081f695 simplify stats output
also:
- move stats related stuff to Statistics class
- repo ops give repo / overall stats
- archive ops give archive stats
- adapt tests
2022-06-23 16:00:12 +02:00
Thomas Waldmann 49adb77157 calc_stats: deduplicated size now, was deduplicated csize
also: remove pre12_meta cache
2022-06-12 17:15:13 +02:00
Thomas Waldmann 19dfbe5c5c compute the deduplicated size before compression
so we do not need csize for it.
2022-06-12 17:15:13 +02:00
Thomas Waldmann 2c1f7951c4 remove csize from ChunkIndexEntry 2022-06-12 17:15:13 +02:00
Thomas Waldmann b726aa5665 remove csize support from get_size 2022-06-12 15:48:33 +02:00
Thomas Waldmann ace5957524 remove csize from item.chunks elements 2022-06-12 15:48:33 +02:00
Thomas Waldmann b9f9623a6d prepare to remove csize (set it to 0 for now) 2022-06-12 15:48:33 +02:00
Thomas Waldmann f2b085787b Item: disallow None value for .user/group/chunks/chunks_healthy
If we do not know the value, just do not have that key/value pair in the item.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 8e87f1111b cleanup msgpack related str/bytes mess, fixes #968
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

the remaining compatibility with the past is handled by the want_bytes decoder in borg.item.
2022-06-09 17:57:28 +02:00
Thomas Waldmann f8dbe5b542 cleanup msgpack related str/bytes mess, see #968
see ticket and borg.helpers.msgpack docstring.
2022-06-09 17:57:28 +02:00
Thomas Waldmann 32a3601e4a compute hlid from inode / device 2022-06-09 17:49:16 +02:00
Thomas Waldmann d3dfa3be30 use version 2 for new archives
but still be able to read v1 archives
for borg transfer.
2022-06-09 17:49:16 +02:00
Thomas Waldmann e5f1a4fb4d recreate: cachedir_masters not needed any more
now all hardlinked regular file items have chunks.
2022-05-18 14:20:01 +02:00
Thomas Waldmann 6bfdb3f630 refactor hardlink_master processing globally
borg now has the chunks list in every item with content.
due to the symmetric way borg now deals with hardlinks using
item.hlid, processing gets much simpler.

but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation at one place and avoids some
code duplication we had before).

item.hlid is computed via hardlink_id function.

support hardlinked symlinks, fixes #2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.

2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.

ItemFormatter: support {hlid} instead of {source} for hardlinks
2022-05-18 14:20:01 +02:00
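A sketch of deriving an hlid from (st_dev, st_ino), plus a dict-based manager. Hashing f"{dev}/{ino}" with sha256 is an assumption, as are the HardLinkManager method names; the demo shows that two hardlinks to one inode get the same hlid.

```python
import hashlib
import os
import tempfile


def hardlink_id(dev, ino):
    """Derive a stable hardlink group id from (st_dev, st_ino)."""
    return hashlib.sha256(f"{dev}/{ino}".encode()).digest()


class HardLinkManager:
    """Tiny dict wrapper: remember per-hlid data (e.g. chunks) seen so far."""

    def __init__(self):
        self._seen = {}

    def remember(self, hlid, info):
        self._seen[hlid] = info

    def retrieve(self, hlid, default=None):
        return self._seen.get(hlid, default)


with tempfile.TemporaryDirectory() as d:
    first, second = os.path.join(d, "a"), os.path.join(d, "b")
    with open(first, "wb") as f:
        f.write(b"content")
    os.link(first, second)  # second hardlink to the same inode
    st1, st2 = os.stat(first), os.stat(second)
    hlid1 = hardlink_id(st1.st_dev, st1.st_ino)
    hlid2 = hardlink_id(st2.st_dev, st2.st_ino)

hlm = HardLinkManager()
hlm.remember(hlid1, ["chunk-id-1"])  # chunks of the first hardlink seen
```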
Thomas Waldmann 98b7dc0bf5 transfer: clean item of attic 0.13 'acl' bug remnants
also: remove attic bug support code from borg check.

borg transfer removes the acl key. we do not run borg check on old repos.
2022-05-18 14:20:00 +02:00
Thomas Waldmann 1c0937958d show_progress: add finished=true/false to archive_progress json, fixes #6570
also:
- remove empty values from final json
- add test
2022-05-08 18:32:07 +02:00
Thomas Waldmann cc0e33da65 fix key.decrypt calls
the id must now always be given correctly because
the AEAD crypto modes authenticate the chunk id.

the special case when id == MANIFEST_ID is now handled
inside assert_id, so we never need to give a None id.
2022-05-02 20:56:50 +02:00
Thomas Waldmann e199f5bc6c metadata stream can produce all-zero chunks, fixes #6587
all-zero chunks are propagated as:
CH_ALLOC, data=None, size=len(zeros)

other chunks are:
CH_DATA, data=data, size=len(data)

also: remove the comment with the wrong assumption
2022-04-14 00:22:05 +02:00
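The two chunk shapes can be produced by a small classifier. The CH_DATA/CH_ALLOC values here are placeholders (borg defines its own constants), and classify/materialize are hypothetical helper names.

```python
CH_DATA, CH_ALLOC = "data", "alloc"  # placeholder values, not borg's


def classify(data):
    """Return (allocation, data, size) for one metadata stream chunk."""
    if data and data == bytes(len(data)):
        # all-zero chunk: only the size is kept, the bytes are not carried along
        return CH_ALLOC, None, len(data)
    return CH_DATA, data, len(data)


def materialize(allocation, data, size):
    """Turn a classified chunk back into bytes."""
    return bytes(size) if allocation == CH_ALLOC else data
```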
Thomas Waldmann b5f7f2376c check archives: improve error handling for corrupt archive metadata block
this is similar to #4777.

borg check must not crash if an archive metadata block does not decrypt.

Instead, report the archive_id, remove the archive from the manifest and skip to the next archive.
2022-04-12 17:47:43 +02:00
Thomas Waldmann ced3d8b9d5 check archive: make robust_iterator more robust, fixes #4777
borg check must not crash if an archive metadata chunk does not decrypt.

Instead, report the chunk and skip to the next one.
2022-04-12 17:47:32 +02:00
TW 28fa9e0f0b
Merge pull request #6523 from ThomasWaldmann/pax-borg-item-master
import/export-tar: --tar-format=BORG: roundtrip ALL item metadata
2022-04-09 20:22:36 +02:00
TW 1e213e93a3
Merge pull request #6544 from ThomasWaldmann/fix-progress-archivename-master
escape % chars in archive name, fixes #6500
2022-04-07 20:22:20 +02:00
Thomas Waldmann 911da7a1cf escape % chars in archive name, fixes #6500
also: fix percentage format for float value.
2022-04-07 18:07:50 +02:00
Björn Ketelaars e86fde5364 Fix OpenBSD symlink mode test failure (#2055)
OpenBSD does not have `lchmod()` causing `os.lchmod` to be unavailable
on this platform. As a result ArchiverTestCase::test_basic_functionality
fails when run manually (#2055).

OpenBSD does have `fchmodat()`, which has a flag that makes it behave
like `lchmod()`. In Python this can be used via `os.chmod(path, mode,
follow_symlinks=False)`.

As of Python 3.3 `os.lchmod(path, mode)` is equivalent to
`os.chmod(path, mode, follow_symlinks=False)`. As such, switching to the
latter is preferred as it enables more platforms to do the right thing.
2022-04-04 21:55:48 +02:00
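The portable call described in the commit, with a fallback for platforms where symlink permissions cannot be set at all. set_mode is a hypothetical helper. CPython typically raises NotImplementedError when follow_symlinks=False is unsupported for chmod on a given path (for example, a symlink on Linux).

```python
import os
import stat
import tempfile


def set_mode(path, mode, is_symlink):
    """chmod without following symlinks; return False if the OS cannot do it."""
    if not is_symlink:
        os.chmod(path, mode)
        return True
    try:
        # equivalent to os.lchmod(path, mode), but not limited to platforms
        # that provide lchmod()
        os.chmod(path, mode, follow_symlinks=False)
        return True
    except NotImplementedError:
        # e.g. Linux: symlink permissions cannot be changed
        return False


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    link = os.path.join(d, "link")
    open(target, "w").close()
    set_mode(target, 0o600, is_symlink=False)
    os.symlink(target, link)
    link_mode_set = set_mode(link, 0o777, is_symlink=True)
    target_mode = stat.S_IMODE(os.stat(target).st_mode)  # unaffected either way
```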
Thomas Waldmann e8069a8f80 import/export-tar: --tar-format=BORG: roundtrip ALL item metadata, fixes #5830
export-tar: just msgpack and b64encode all item metadata and
            put that into a BORG specific PAX header.
            this is *additional* to the standard tar metadata.

import-tar: when detecting the BORG specific PAX header, just get
            all metadata from there (and ignore the standard tar
            metadata).
2022-04-02 22:25:44 +02:00
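The export/import idea can be sketched with tarfile's PAX support. The header name BORG.item.meta is hypothetical, and json replaces msgpack to stay within the standard library; the base64 step mirrors the commit.

```python
import base64
import io
import json
import tarfile

META_KEY = "BORG.item.meta"  # hypothetical PAX header name


def add_with_meta(tar, name, content, item_meta):
    info = tarfile.TarInfo(name)
    info.size = len(content)
    # standard tar fields stay; the full item metadata rides along in addition
    encoded = base64.b64encode(json.dumps(item_meta).encode()).decode("ascii")
    info.pax_headers = {META_KEY: encoded}
    tar.addfile(info, io.BytesIO(content))


def read_meta(member):
    value = member.pax_headers.get(META_KEY)
    if value is None:
        return None  # plain tar member: use the standard tar metadata instead
    return json.loads(base64.b64decode(value))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    add_with_meta(tar, "file.txt", b"hello", {"path": "file.txt", "mode": 33188, "user": "tw"})

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    restored = read_meta(tar.getmember("file.txt"))
```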
Thomas Waldmann 78e92fa9e1 import/export-tar: --tar-format, support ctime/atime
--tar-format=GNU|PAX (default: GNU)

changed the tests which use GNU tar cli tool to use --tar-format=GNU
explicitly, so they don't break in case we change the default.

atime timestamp is only present in output if the archive item has it
(which is not the case by default, needs "borg create --atime ...").
2022-04-02 18:30:55 +02:00
Thomas Waldmann d3b78a6cf5 minor key.encrypt api change/cleanup
we already have .decrypt(id, data, ...).
I changed .encrypt(chunk) to .encrypt(id, data).

the old borg crypto won't really need or use the id,
but the new AEAD crypto will authenticate the id in the future.
2022-03-26 17:05:57 +01:00
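To show why the id must be passed to both calls, here is an authentication-only toy that treats the chunk id like AEAD associated data. It uses HMAC-SHA256 from the standard library instead of a real AEAD cipher and does not encrypt anything. ToyKey is hypothetical and assumes fixed-length ids, so that id + data is unambiguous.

```python
import hashlib
import hmac
import os


class IntegrityError(Exception):
    pass


class ToyKey:
    """Authentication-only stand-in for an AEAD key (no actual encryption)."""

    def __init__(self, secret):
        self.secret = secret

    def _tag(self, chunk_id, data):
        # the chunk id is authenticated like AEAD "associated data"
        return hmac.new(self.secret, chunk_id + data, hashlib.sha256).digest()

    def encrypt(self, chunk_id, data):
        return self._tag(chunk_id, data) + data

    def decrypt(self, chunk_id, blob):
        tag, data = blob[:32], blob[32:]
        if not hmac.compare_digest(tag, self._tag(chunk_id, data)):
            raise IntegrityError("authentication failed (wrong id or tampered data)")
        return data


key = ToyKey(os.urandom(32))
chunk_id = hashlib.sha256(b"chunk data").digest()
blob = key.encrypt(chunk_id, b"chunk data")
try:
    key.decrypt(hashlib.sha256(b"something else").digest(), blob)
    wrong_id_rejected = False
except IntegrityError:
    wrong_id_rejected = True
```

A blob stored under one id cannot be passed off as the content of another id.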
Thomas Waldmann 2bcee08b88 import-tar: fix mtime type bug
looks like with a .tar file created by the tar tool,
tarinfo.mtime is a float [s]. So, after converting to
nanoseconds, we need to cast to int because that's what
Item.mtime wants.

also added a safe_ns() there to clip values to the safe range.
2022-03-05 16:24:59 -05:00
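The conversion and clipping step, sketched. The bounds (0 and 2**63 - 1 ns) are assumptions for illustration, not necessarily the range borg's safe_ns uses.

```python
MIN_NS = 0          # assumption: no negative timestamps
MAX_NS = 2**63 - 1  # assumption: fit into a signed 64-bit integer


def safe_ns(ns):
    """Clip a nanosecond timestamp into [MIN_NS, MAX_NS]."""
    return max(MIN_NS, min(ns, MAX_NS))


def tar_mtime_to_ns(mtime):
    # tarinfo.mtime may be a float (seconds); Item.mtime wants an int (ns)
    return safe_ns(int(mtime * 1_000_000_000))
```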
Thomas Waldmann cbeef56454 pyupgrade --py38-plus ./**/*.py 2022-02-27 20:11:56 +01:00
TW 4896fe1560
Merge pull request #6296 from ThomasWaldmann/cache-pre12-archive-meta
info: use a pre12-meta cache to accelerate stats for borg < 1.2 archives
2022-02-14 18:29:17 +01:00
Thomas Waldmann a2fb9cde4e calc_stats progress display: add archive name 2022-02-14 18:00:02 +01:00
Thomas Waldmann 25e27a1539 info: use a pre12-meta cache to accelerate stats for borg < 1.2 archives
the first time borg info is invoked on a borg 1.1 repo, it can take
a rather long time computing and caching some stats values for
1.1 archives, which borg 1.2 archives have in their archive
metadata structure. be patient, esp. if you have lots of old
archives.

subsequent invocations are much faster.
2022-02-14 18:00:02 +01:00
Tomás Andrighetti a2ae36bb54 Exclude directories in is_hardlink_master 2022-02-13 19:23:40 -03:00
Thomas Waldmann 5064ec3c9a fix hardlinkable file type check, fixes #6037 2021-11-16 14:36:43 +01:00
Jim Paris 7a0ffed7f0 create: fix passing device nodes and symlinks to --paths-from-stdin
Paths that come from --paths-from-stdin or --paths-from-command don't
have a parent_fd or name, so we need to use the os_stat helper that
falls back on the full path if those are missing.

Fixes borgbackup/borg#6009
2021-10-14 11:46:10 -04:00
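A sketch of the fallback, loosely modeled on the description above: stat relative to parent_fd when both parent_fd and name exist, else use the full path. The keyword-only signature is an assumption.

```python
import os
import stat
import tempfile


def os_stat(*, path=None, parent_fd=None, name=None, follow_symlinks=False):
    if parent_fd is not None and name is not None:
        # stat relative to an already opened parent directory
        return os.stat(name, dir_fd=parent_fd, follow_symlinks=follow_symlinks)
    # e.g. --paths-from-stdin: no parent_fd/name available, use the full path
    return os.stat(path, follow_symlinks=follow_symlinks)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file")
    link = os.path.join(d, "link")
    open(target, "w").close()
    os.symlink(target, link)
    fd = os.open(d, os.O_RDONLY)
    try:
        st_fd = os_stat(path=link, parent_fd=fd, name="link")
    finally:
        os.close(fd)
    st_path = os_stat(path=link)  # no parent_fd/name: falls back to the path
```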
Thomas Waldmann 506c01dc8f import-tar: fix empty user/group name in TarInfo, fixes #5853
if the tar has no information about user/group name (empty string),
we must assign None to Item.user/group (not the empty string).
2021-06-17 15:59:41 +02:00
Thomas Waldmann 4572974218 fix missing parameter in "did not consistently fail" msg, see #5822 2021-06-15 23:10:37 -05:00
Thomas Waldmann b0af91837d minor fixes 2021-06-14 16:03:49 +02:00
Thomas Waldmann fb2efd88fe implement TarfileObjectProcessors similar to FilesystemObjectProcessors 2021-06-14 15:37:58 +02:00
Elmar Hoffmann 938e7f295c add progress indicator for archive check
Depending on the number of archives in a repository, the archive check part
of the check operation can take some time, so it should have a progress
indicator as well.
2021-05-15 23:15:31 +02:00
Thomas Waldmann 76dfd64aba create/recreate: print preliminary file status early, fixes #5417
if we back up stdin / pipes / regular files (or devices with --read-special),
that may take longer, depending on the amount of content data (could be many GiBs).

usually borg announces file status AFTER backing up the file,
when the final status of the file is known.

with this change, borg announces a preliminary file status before
the content data is processed. if the file status changes afterwards,
e.g. due to an error, it will also announce that as final file status.
2021-04-30 20:34:13 +02:00
Romain Vimont 9ddcfaf4f7 info / create --stats: add --iec option
If --iec is passed, then sizes are expressed in powers of 1024
instead of 1000.
2021-04-28 15:17:40 +02:00
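How the two unit systems differ, as a hypothetical format_size helper (the unit spellings and two-decimal rounding are assumptions):

```python
def format_size(n, iec=False):
    """Format a byte count with decimal (kB, MB, ...) or IEC (KiB, MiB, ...) units."""
    base = 1024 if iec else 1000
    prefixes = ["Ki", "Mi", "Gi", "Ti"] if iec else ["k", "M", "G", "T"]
    if abs(n) < base:
        return f"{n} B"
    value = float(n)
    for prefix in prefixes:
        value /= base
        if abs(value) < base or prefix == prefixes[-1]:
            return f"{value:.2f} {prefix}B"
```

For example, format_size(1_500_000) gives "1.50 MB", while format_size(1_500_000, iec=True) gives "1.43 MiB".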
Thomas Waldmann dec1664a7e missing / healed chunks: always tell chunk ID, fixes #5704 2021-04-19 23:46:21 +02:00
Thomas Waldmann 6f9b9e5a53 s/numeric_owner/numeric_ids/g 2021-04-16 15:02:16 +02:00
Thomas Waldmann bbccdbd81c mount: implement --numeric-owner (default: False!), fixes #2377
this default behaviour differs from borg < 1.2:

default (numeric_owner=False) is to use the user/group name from the archive,
look up the local uid / gid and then use that for the FUSE fs.

when --numeric-owner is given (numeric_owner=True), then the uid/gid
from the archive is used directly (as was the default behaviour in
borg < 1.2).

the default behaviour was changed like this to make
borg mount and borg extract behave more similarly regarding the use of
archived numeric user/group ids vs. archived names mapped to the
corresponding numeric local system ids.

also, both now use the same function to get the uid/gid from the item.

fuse:
- add user and group name entries to default_dir
- also: set internal_dict(!) of new Item with data from Item.as_dict()
2021-03-07 18:16:23 +01:00
Thomas Waldmann 2211b840a3 verbose files cache logging via --debug-topic=files_cache, fixes #5659 2021-02-28 22:39:44 +01:00
Thomas Waldmann d4971e2819 some micro-opts in stat_ext_attrs 2021-02-16 23:24:05 +01:00
Thomas Waldmann 1b65db990d create/extract: add --noxattrs option, #3955
when given with borg create, borg will not get xattrs from input files (and thus, it will not archive xattrs).

when given with borg extract, borg will not read xattrs from archive and it will not set xattrs on extracted files.
2021-02-16 23:20:28 +01:00
Thomas Waldmann 9412a8430e create/extract: add --noacls option, #3955
when given with borg create, borg will not get ACLs from input files (and thus, it will not archive ACLs).

when given with borg extract, borg will not read ACLs from archive and it will not set ACLs on extracted files.
2021-02-16 22:43:08 +01:00
Manu a84ead8e7c Pass args.log_json to FilesystemObjectProcessors/Statistics instance 2021-02-07 10:42:46 +08:00
Thomas Waldmann 6dc334422e fixup: improve comment about assumptions in the item metadata stream chunker 2021-01-15 21:51:15 +01:00
Thomas Waldmann 8162e2e67b cached_hash is only used in archive, move it there 2021-01-14 20:50:12 +01:00
Thomas Waldmann be257728ca move zeros to constants module 2021-01-14 20:02:18 +01:00
Thomas Waldmann 3b9798cffc remove max_chunk_size (unused) 2021-01-14 19:56:39 +01:00
Thomas Waldmann ef19d937ed use cached_hash also to generate all-zero replacement chunks
at least for large numbers of fixed-size replacement chunks,
this will be much faster, with less memory management overhead too.
2021-01-08 23:39:53 +01:00
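A sketch of caching the id of all-zero chunks per size, reusing the CH_ALLOC representation (data=None) from e199f5bc6c. Plain sha256 as id_hash, the placeholder constants and the lru_cache size are assumptions; borg's real id hash is keyed.

```python
import hashlib
from functools import lru_cache

CH_DATA, CH_ALLOC = "data", "alloc"  # placeholder values, not borg's


def id_hash(data):
    return hashlib.sha256(data).digest()  # stand-in for the key's id hash


@lru_cache(maxsize=16)
def zero_chunk_id(size):
    # the all-zero buffer is built and hashed only once per distinct size
    return id_hash(bytes(size))


def cached_hash(allocation, data, size):
    if allocation == CH_ALLOC:
        return zero_chunk_id(size)  # data is None here
    return id_hash(data)
```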
Thomas Waldmann f3088a9893 rename chunk_to_id_data to cached_hash 2021-01-08 23:39:53 +01:00
Thomas Waldmann 92f221075a refactor recreate to use chunk_to_id_data 2021-01-08 23:39:53 +01:00
Thomas Waldmann b3659e0b8c reuse chunker.zeros for sparse extraction 2021-01-08 23:39:53 +01:00