Commit Graph

213 Commits

Author SHA1 Message Date
Thomas Waldmann bf9f42320e repository: sync write file in get_fd
this fixes a strange test failure that did not happen until now:
it could not read the MAGIC bytes from a (quite new) segment file,
it just returned the empty string.

maybe its appearance is related to the removed I/O calls.
2022-06-14 14:48:56 +02:00
Thomas Waldmann 3ce3fbcdff repository index: add payload size (==csize) and flags to NSIndex entries
This saves some segment file random IO that was previously necessary
just to determine the size of the data to be deleted.

Keep old one as NSIndex1 for old borg compatibility.
Choose NSIndex or NSIndex1 based on repo index layout from HashHeader.

for an old repo index, repo.get(key) returns segment, offset, None, None
2022-06-14 14:48:56 +02:00
Thomas Waldmann ba1f8926cc secure_erase: avoid collateral damage, fixes #6768
if a hardlink copy of a repo was made and a new repo config
shall be saved, do NOT fill in random garbage before deleting
the previous repo config, because that would damage the hardlink
copy.
2022-06-13 15:57:01 +02:00
Thomas Waldmann 8e87f1111b cleanup msgpack related str/bytes mess, fixes #968
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

the backwards compatibility that is still needed is handled by the want_bytes decoder in borg.item.
2022-06-09 17:57:28 +02:00
TW 7b08222256 Merge pull request #6722 from ThomasWaldmann/debug-get-chunk-1.2
borg debug dump-repo-objs --ghost: new --segment=S --offset=O options
2022-05-28 01:42:26 +02:00
Thomas Waldmann 1aba534c5a better error msg for defect or unsupported repo configs, fixes #6566 2022-04-18 09:27:26 +02:00
TW 8d3db4637d
Merge pull request #6564 from ThomasWaldmann/deleted-key-master
load_key: no key is same as empty key, fixes #6441
2022-04-12 19:06:12 +02:00
Jakub Wilk 3a5c79e881 remove stray punctuation from secure-erase message 2022-04-11 18:47:59 +02:00
Thomas Waldmann f5cddf0224 load_key: no key is same as empty key, fixes #6441
when migrating from repokey to keyfile, we just store an empty key into the repo config,
because we do not have a "delete key" RPC api. thus, empty key means "there is no key".

here we fix load_key, so that it does not behave differently for no key and empty key:
in both cases, it just returns an empty value.

additionally, we strip the value we get from the config, so whitespace does not matter.

All callers now check for the repokey not being empty, otherwise RepoKeyNotFoundError
is raised.
2022-04-10 20:58:59 +02:00
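The normalization described in this message can be sketched as follows. This is a minimal illustration, not borg's code: the section and option names and the local RepoKeyNotFoundError class are assumptions made for the example.

```python
import configparser


class RepoKeyNotFoundError(Exception):
    """Local stand-in for the exception named in the commit message."""


def load_key(config: configparser.ConfigParser) -> str:
    # fallback="" makes "no key" indistinguishable from "empty key";
    # strip() makes surrounding whitespace irrelevant.
    return config.get("repository", "key", fallback="").strip()


def require_key(config: configparser.ConfigParser) -> str:
    # Callers are responsible for rejecting the empty value.
    key = load_key(config)
    if not key:
        raise RepoKeyNotFoundError("no repokey stored in repo config")
    return key
```

With this shape, a config that lacks the option and a config that stores only whitespace both yield an empty string from load_key.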
Thomas Waldmann 38f390ae45 repository: create and use version 2 repos only for now
for now, this code shall only work on v2 repos (created by this code).

the code to read v1 repos is still present though, so for experiments,
it is possible to change the repo version in the repo config from 1 to
2 manually.

having version 2 in the repo config also prevents borg < 1.3 from being
used on such a repo, which would cause damage:
old borg would not recognize the PUT2 tagged segment entries and
old borg check --repair would likely kill them all due to that.

also: keep repo version in Repository.version
2022-04-09 18:58:47 +02:00
Thomas Waldmann 52f75d7722 repository: implement PUT2: header crc32, overall xxh64, fixes #1704
note: this required a slight increase of MAX_OBJECT_SIZE so that MAX_DATA_SIZE
      could stay the same as before.

For PUT2, compute the hash over the whole entry (header and content, excluding
hash and crc32 fields, because the crc32 computation includes the hash).

Also: refactor crc32 checks into function, use f-strings, structure _read in
a more logical sequential order.

write_put: avoid creating a large temporary bytes object

why use xxh64?
- fast even without hw acceleration
- borg depends on it already anyway
- stronger than crc32 and strong enough for this purpose
2022-04-09 18:58:47 +02:00
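The checksum coverage described above (a crc32 over the header including the hash, and a hash over header plus content excluding the crc32 and hash fields) can be sketched like this. The field layout and the tag value are hypothetical, and because xxh64 is not in the Python standard library, an 8-byte BLAKE2b digest is used purely as a stand-in for it.

```python
import hashlib
import struct
import zlib

# Hypothetical layout: crc32(4) | size(4) | tag(1) | key(32) | hash(8) | data
SIZE_TAG = struct.Struct("<IB")
TAG_PUT2 = 3  # illustrative value only


def digest64(*parts: bytes) -> bytes:
    # Stand-in for xxh64: any 64-bit digest shows the coverage rules.
    h = hashlib.blake2b(digest_size=8)
    for part in parts:
        h.update(part)
    return h.digest()


def build_entry(key: bytes, data: bytes) -> bytes:
    assert len(key) == 32
    size = 4 + SIZE_TAG.size + 32 + 8 + len(data)
    size_tag = SIZE_TAG.pack(size, TAG_PUT2)
    # The hash covers header and content, but not the crc32 or hash fields.
    entry_hash = digest64(size_tag, key, data)
    # The crc32 covers the rest of the header, including the hash.
    crc = zlib.crc32(size_tag + key + entry_hash)
    return struct.pack("<I", crc) + size_tag + key + entry_hash + data


def verify_entry(entry: bytes) -> bool:
    (crc,) = struct.unpack_from("<I", entry, 0)
    size_tag, key = entry[4:9], entry[9:41]
    entry_hash, data = entry[41:49], entry[49:]
    if zlib.crc32(size_tag + key + entry_hash) != crc:
        return False  # header damaged
    return digest64(size_tag, key, data) == entry_hash
```

In this arrangement the cheap header check can run without touching the payload, while the hash protects the whole entry.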
Thomas Waldmann c7b1cd56d8 upgrade: remove the "attic backup" repo upgrader and tests
attic is borg's parent project, but it stalled in 2015 and has not been updated since then.

guess we can assume that most attic users have meanwhile noticed this and already
converted their repos to borg.

if some did not yet, they are advised to use borg < 1.3 to do that ASAP.

note: borg can still DETECT an attic repo by recognizing its ATTIC_MAGIC value
      and then gives exactly that advice.
2022-04-01 12:41:11 +02:00
Thomas Waldmann cfa34bdf71 repository: simplify LoggedIO._read
Code gets simpler if we always only use the (shorter) header_fmt.
That format ALWAYS applies, to all tags borg writes.

If the tag unpacked from there indicates that there is also a chunkid
to read (like for PUT and DEL), we can decide that inside _read and
then read the chunkid from the fd.
2022-03-31 20:50:55 +02:00
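A rough sketch of the reading strategy described in this message: always unpack the short header first, then decide from the tag whether a chunkid follows. The header format, tag values and key size below are assumptions for illustration, and CRC verification is left out.

```python
import io
import struct

HEADER = struct.Struct("<IIB")  # crc32, size, tag (assumed short header)
TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2  # assumed tag values
KEY_SIZE = 32  # assumed chunkid length


def read_entry(fd):
    """Return (tag, key, payload) for the entry at the current position."""
    crc, size, tag = HEADER.unpack(fd.read(HEADER.size))
    remaining = size - HEADER.size
    key = None
    if tag in (TAG_PUT, TAG_DELETE):
        # Only these tags carry a chunkid after the short header.
        key = fd.read(KEY_SIZE)
        remaining -= KEY_SIZE
    payload = fd.read(remaining)
    return tag, key, payload


# Usage: a PUT entry followed by a COMMIT entry in one in-memory "segment".
put = HEADER.pack(0, HEADER.size + KEY_SIZE + 3, TAG_PUT) + b"k" * KEY_SIZE + b"abc"
commit = HEADER.pack(0, HEADER.size, TAG_COMMIT)
segment = io.BytesIO(put + commit)
first = read_entry(segment)
second = read_entry(segment)
```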
Thomas Waldmann cc3b5c062c remove algorithms package, move checksums module to borg package 2022-03-17 00:24:49 +01:00
Thomas Waldmann 2e536bcbe2 borg key change-location 2022-03-11 23:05:32 +01:00
Thomas Waldmann cbeef56454 pyupgrade --py38-plus ./**/*.py 2022-02-27 20:11:56 +01:00
Thomas Waldmann b292e158a6 rename truncate_and_unlink to safe_unlink
it usually does not truncate any more,
only under "disk full" circumstances and only if there is only one hardlink.
2022-02-15 21:08:34 +01:00
Thomas Waldmann 17e8aef394 compact: not "freeable", but "maybe freeable"
e.g. if there is a ton of DELs in a segment, they all are maybe freeable,
but only if we also got rid of the respective PUTs (see also #6289).
2022-02-12 20:37:28 +01:00
Thomas Waldmann e80b5c2272 compact: derive freed space from quota use before/after, fixes #5679
due to the way quota accounting is done, this is likely not
100% precise, but much better than selling the hints as the truth.
2022-02-12 20:37:18 +01:00
Thomas Waldmann 925daf30b7 fix intermediate commits, shall be at end of segment
compact_segments produced separate 17-byte files for intermediate commits, although they were intended to be end-of-segment-file commits.

this is because when the intermediate commit is triggered, we are already at an offset beyond the limit.
thus, we needed to add the no_new flag to indicate that we do not want a new segment file just for the commit IF it is an intermediate commit.
2022-02-01 19:45:29 +01:00
Thomas Waldmann 57e0724108 repository: fix compactable space computation for empty segment file 2022-01-22 01:32:04 +01:00
Thomas Waldmann f4b9f63856 repository: fix used quota computation
storage_quota_use should reflect current disk space usage (not considering some overheads like for the index etc.).

if a chunk is deleted, but the segment file containing the chunk is not yet compacted, the chunk's disk space is still in use!

when compact_segments is dropping the unused chunks, it is the right time to reduce storage_quota_use.

storage_quota_use includes the put header overhead.
2022-01-22 01:27:23 +01:00
Peter Gerber 6c21404143
Validate tag ID when --repair[ing] an object
This too should make the scan faster as, assuming the data is
random, we can skip CRC checks for almost 94% of the incorrect
header locations solely based on the tag.

As a drawback, this will limit the number of tags that can be
added without breaking backwards compatibility to 16, with
13 currently unused.
2021-10-28 14:13:37 +00:00
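The "almost 94%" figure follows from simple arithmetic, assuming the tag is a single byte and byte values at wrong header locations are uniformly random. The constant names here are illustrative.

```python
TAG_BYTE_VALUES = 256  # a one-byte tag field can hold 256 values
ACCEPTED_TAGS = 16     # tag IDs accepted by the check, per the message

# Fraction of random tag bytes rejected before any CRC is computed.
rejected_fraction = 1 - ACCEPTED_TAGS / TAG_BYTE_VALUES
print(f"{rejected_fraction:.2%}")  # 93.75%
```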
Peter Gerber 2bc91e5010
Speed up search for next valid object in segment in --repair mode
When an object is corrupted, the start position of the next object
will not be known as the size field belonging to the corrupted
object may be corrupted as well. In order to find the next object
within the segment, the remainder is scanned for the next valid
object, byte-by-byte. An object is considered valid if the CRC
checksum matches the content. However, doing so the scan accepted
any object size that fit within the remainder of the segment. As a
result, in particular when the corruption occurred near the start
of a segment, CRC checksums were calculated for large objects,
often hundreds of megabytes in size, despite the size being limited
to 20 MiB. This change makes it so that CRC calculation is skipped
when the object header indicates an impossible size, thereby,
greatly reducing the number of CPU cycles used for CRC calculations.
In my case, this brought down the time for repair from hours to mere
minutes.

This also has the additional benefit that there is some verification
in addition to the CRC checksum. The 4-byte checksum is rather
short considering the amount of data that might be in an archive.

Likely fixes the hanging --repair in #5995 also.
2021-10-28 10:59:11 +00:00
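The scan described above can be sketched as follows. This is an illustration under assumptions, not the actual repair code: the header layout, the exact size limit and the region covered by the CRC are all assumed.

```python
import struct
import zlib

HEADER = struct.Struct("<IIB")  # crc32, size, tag (assumed layout)
MAX_OBJECT_SIZE = 20 * 1024 * 1024  # "20 MiB" per the message; exact value assumed


def find_next_valid(buf: bytes, start: int = 0):
    """Return the offset of the next plausible entry in buf, or None."""
    view = memoryview(buf)
    for offset in range(start, len(buf) - HEADER.size + 1):
        crc, size, tag = HEADER.unpack_from(view, offset)
        # Cheap plausibility checks first: an impossible size means we
        # never pay for a CRC over (potentially) hundreds of megabytes.
        if size < HEADER.size or size > MAX_OBJECT_SIZE:
            continue
        if size > len(buf) - offset:
            continue
        # Assumption: the CRC covers everything after the crc32 field.
        if zlib.crc32(view[offset + 4:offset + size]) == crc:
            return offset
    return None
```

The byte-by-byte walk stays the same as before; only the order of the checks changes, so the expensive one runs last.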
Thomas Waldmann d44836a865 config: accept non-int value for max_segment_size
borg config REPO max_segment_size 500M

note: when setting a non-int value for this in a repo config, using the repo will require borg >= 1.1.16.
2021-02-28 22:28:58 +01:00
Thomas Waldmann 99aa15b850 config: accept non-int value for storage_quota
borg config REPO storage_quota 100G

note: when setting a non-int value for this in a repo config, using the repo will require borg >= 1.1.16.
2021-02-28 22:27:48 +01:00
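Values such as 500M or 100G need to be turned into an integer byte count at some point. A minimal sketch of such a parser follows; the function name is hypothetical, and the use of decimal (power-of-1000) units is an assumption.

```python
def parse_size(text: str) -> int:
    """Parse '1234', '500M' or '1.5G' into a number of bytes."""
    units = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in units:
        # Allow fractional values such as '1.5G'.
        return int(float(text[:-1]) * units[suffix])
    return int(text)  # plain integers keep working as before
```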
Thomas Waldmann 3d0c61a184 revert incorrect fix for put updating shadow_index, fixes #5661
A) the compaction code needs the shadow index only for this case:

segment A: PUT x, segment B: DEL x, with A < B  (DEL shadows the PUT).

B) for the following case, we have no shadowing DEL (or rather: it does not matter,
because there is a PUT right after the DEL) and x is in the repo index,
thus the shadow_index is not needed for the special case in the compaction code:

segment A: PUT x, segment B: DEL x PUT x

see also PR #5636.

reverts f079a83fed
and clarifies the code by more comments.

we keep the code deduplication of 5f32b5666a
and just add an update_shadow_index param to make it not look like there was
something accidentally forgotten, which was the whole reason for the reverted
"fix".
2021-02-04 02:29:43 +01:00
Thomas Waldmann f079a83fed fix updating shadow_index also in put
The shadow_index should be in same state after both of these sequences
(let's assume that A is not in repo yet for simplicity, but it does not matter):

a) explicit delete: put(A), delete(A), put(A), resulting in: PUT A, DEL A, PUT A repo contents

b) implicit delete: put(A), put(A), resulting in: PUT A, DEL A, PUT A repo contents
2021-01-29 17:05:01 +01:00
Thomas Waldmann 5f32b5666a deduplicate code of put and delete, no functional change 2021-01-29 17:05:01 +01:00
Thomas Waldmann 6f00b025d8 remove empty shadowed_segments lists, fixes #5275
also:
- add test for removed empty shadowed_segments list
- add some comments
- add repo_dump test debug tool
2021-01-29 15:44:49 +01:00
Andrea Gelmini 72e7c46fa7 Fix typos 2021-01-07 17:54:33 +01:00
Thomas Waldmann f2cb17d66c check: debug log segment filename 2021-01-03 18:23:52 +01:00
Dan Hipschman 1a94c2e27a Allow EIO with warning when trying to hardlink 2020-11-01 14:26:56 -08:00
Thomas Waldmann bf8706b741 fixup: invert nesting of context managers
cleaner teardown of contexts:

close mmap, close src_fd (reading), close dst_fd (and rename)

maybe it was not a real problem to rename a still open-for-reading / mmapped file,
but in any case it is cleaner like now.
2020-09-08 18:26:03 +02:00
Thomas Waldmann b198160257 check --repair: fix potential data loss, fixes #5325
We have long used the SaveFile context manager in other places.
By using it, the original segment file stays in place until recovery of it
is completed (writing/syncing into *.tmp).
On successful completion, .tmp is renamed over original + dir syncing.
If aborted by some exception, including Ctrl-C, the original file is unmodified.
2020-09-08 18:25:36 +02:00
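The write-to-temp-then-rename pattern described here can be sketched with a small context manager. This is a simplified stand-in, not borg's SaveFile: the name is hypothetical and the directory sync mentioned in the message is omitted.

```python
import contextlib
import os


@contextlib.contextmanager
def save_file(path):
    """Write to path + '.tmp'; replace path only on successful completion."""
    tmp_path = path + ".tmp"
    fd = open(tmp_path, "wb")
    try:
        yield fd
        fd.flush()
        os.fsync(fd.fileno())
    except BaseException:
        # Covers Ctrl-C (KeyboardInterrupt) too: drop the temp file and
        # leave the original untouched.
        fd.close()
        os.unlink(tmp_path)
        raise
    fd.close()
    os.replace(tmp_path, path)  # atomic rename over the original
```

If the body of the with block raises, the original file keeps its old contents; only a clean exit renames the temporary file into place.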
Thomas Waldmann 7bfa766192 persist shadow_index in between borg runs, fixes #4830
in borg 1.1, compact_segments() was always run directly after some repo writing
operation (in same borg process). but now, only "borg compact" is used to compact
segments and it is a separate borg invocation (new process), so we need to persist
the shadow_index so we do not lose that information.
2020-07-28 21:15:56 +02:00
finefoot e49a17143d Add option to bypass locking mechanism 2020-04-11 17:04:52 +02:00
Thomas Waldmann dd7c08ae91 do not emit warning headline, there might be no mismatches to report
instead, use a slightly different format for the warnings themselves.
2020-03-09 21:48:46 +01:00
Thomas Waldmann d124cf0761 check: improve error output for matching index size, see #4829
if the rebuilt index size matched the on-disk index size AND there
was a difference in e.g. 1 key, the old code only output the key/value
for one index, but not what is present in the other index.

we already had better code in the branch for different index sizes,
so just use that for both cases.

additionally we tell when the index size matches (new) because we
also tell if there is a mismatch.
2020-03-09 21:47:03 +01:00
TW 6520fa2bb7
Merge pull request #5009 from ThomasWaldmann/fix-commit-freespace-calc-missing-segment-file-master
commit-time free space calc: ignore bad compact map entries, fixes #4796
2020-03-09 16:01:56 +01:00
Thomas Waldmann d5a1979d87 commit-time free space calc: ignore bad compact map entries, fixes #4796
at least it does not crash now when committing.

the question why the compact map points to a missing segment file
is not answered yet, there might be another problem...
2020-03-09 00:16:32 +01:00
Thomas Waldmann 2211aaab48 fix crash when upgrading erroneous hints file, fixes #4922
if an old hints file gets converted to the new format and it
has entries referring to non-existent segment files, a crash
occurred.

with this code, the crash is avoided and the erroneous hints
entry is removed.
2020-03-09 00:08:39 +01:00
TW 597b09a993 support platforms with no os.link (#4903)
support platforms with no os.link, fixes #4901

if we don't have os.link, we just extract another copy instead of making a hardlink.

for that to work, we need to have (and keep) the chunks list in hardlink_masters.
2020-03-03 23:34:54 -05:00
Thalian 2209f56cd5 Feature/4674 compact threshold (#4798)
compact: add --threshold option, fixes #4674
2019-10-24 10:12:58 +02:00
Thomas Waldmann 851db7fe21 ignore EACCES (errno 13) when hardlinking, fixes #4730
we create the hardlink to be able to secure erase the old config file.

if we can't do that because there is just a problem with hardlinks not
working, the old config will be just overwritten normally (not secure
erased). the user will get a warning in that case, but other than that,
the overall borg operation will succeed.

if there is a bigger problem (like a general lack of permissions or a
general issue with the underlying fs), subsequent operations will fail.
2019-10-03 15:19:03 +02:00
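A possible shape for this behaviour is sketched below: treat EACCES from os.link as "hardlinks do not work here", warn, and let the caller fall back to a plain overwrite. The function name and the log message are hypothetical.

```python
import errno
import logging
import os

logger = logging.getLogger(__name__)


def try_hardlink(src: str, dst: str) -> bool:
    """Return True if the hardlink was created, False if it was skipped."""
    try:
        os.link(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.EACCES:
            # Hardlinking is not possible here; the caller overwrites the
            # old config normally instead of secure-erasing it.
            logger.warning("hardlink failed (%s), skipping secure erase", e)
            return False
        raise  # anything else is a real problem and should surface
```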
TW 373bd8abd3
Merge pull request #4696 from jrast/win10
WIP jrast/borg:win10, PR for better review and testing
2019-08-25 22:41:05 +02:00
Jürg Rast 6b426d08d7 Initial work to build and run borg under windows
- Created a batch file to build borg on windows
- Adjusted setup.py to be runnable on windows and build the windows
extension
- Extracted the free space check to a function in the platform module
- Created the minimal needed (dummy) functions for the windows platform
module
2019-08-24 10:17:18 +02:00
Thomas Waldmann 8b49c4d2df Repository.check_can_create_repository: use stat() to check
similar issue as #4695.

(cherry picked from commit 4911720faf)
2019-08-09 15:10:15 +05:30
Thomas Waldmann bb7a9e6c20 Repository.open: use stat() to check for repo dir, fixes #4695
(cherry picked from commit ec3fad0f85)
2019-08-09 15:09:48 +05:30
Thomas Waldmann 8b75dde0fa compact: log freed space at INFO level
note: correctness of value depends on correctness/completeness of
repository.compact datastructure.
2019-05-06 22:47:25 +02:00
user062 a83739fda8 give invalid repo error msg if repo config not found, fixes #4411
if the repo config is not there, we definitely have an invalid repo.

for other problems (like permission issues), we'll just let it blow
up with a traceback, so the user can see what the precise problem is.
2019-04-20 17:36:30 +02:00
Thomas Waldmann 6ae5530507 lrucache: regularly remove old FDs, fixes #4427 2019-03-11 02:38:24 +01:00
TW d493806e5c
incremental repo check (#4422)
incremental repo check, fixes #1657
2019-03-10 20:21:22 +01:00
Thomas Waldmann 7ad5290501 redo stale lock handling, fixes #3986
drop BORG_HOSTNAME_IS_UNIQUE (please use BORG_HOST_ID if needed)

borg now always assumes it has a unique hostid - either automatically
from fqdn plus uuid.getnode() or overridden via BORG_HOST_ID.
2019-03-04 21:07:05 +01:00
Thomas Waldmann 25264dce1f compact: require >10% freeable space in a segment, fixes #2985
before this, it over-eagerly compacted "small" segments ("small"
being < 100MB by default) if there were only a few bytes to be freed.

also:
- improve debug logging
- as compaction is a separate borg command now, use the module logger
2019-02-22 16:18:41 +01:00
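The rule from this message can be expressed as a small predicate. This is a sketch only; the function and parameter names are illustrative, and the handling of empty segments is an assumption.

```python
def should_compact(freeable: int, segment_size: int, threshold: float = 0.10) -> bool:
    """Compact a segment only if more than `threshold` of it is freeable."""
    if segment_size <= 0:
        return False  # assumption: nothing to gain from an empty segment
    return freeable / segment_size > threshold
```

For a 100-byte segment, 5 freeable bytes would not trigger compaction, while 11 would.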
Thomas Waldmann 600e798201 borg init --make-parent-dirs parent1/parent2/repo_dir, fixes #4235 2019-02-04 17:12:11 +01:00
TW 422d9cf170
Merge pull request #4275 from ThomasWaldmann/fix-empty-segment-crash-master
recover_segment: handle too small segment files correctly, see #4272
2019-02-02 00:06:44 +01:00
Manu c3a882b509 Use f_frsize instead of f_bsize to calculate free space. Fixes #4289 2019-01-31 14:26:21 +08:00
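POSIX defines the block counts returned by statvfs in units of f_frsize, which is why that field is the right multiplier. A minimal sketch (Unix only; the function name is illustrative):

```python
import os


def free_space(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem of path."""
    st = os.statvfs(path)
    # f_bavail counts blocks of f_frsize bytes, not of f_bsize bytes.
    return st.f_bavail * st.f_frsize
```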
Thomas Waldmann 2c94d5ba58 recover_segment: handle too small segment files correctly, see #4272
nothing left to recover there, but at least we must not crash in mmap().
2019-01-29 19:21:51 +01:00
TW 2bcff382cb
Merge pull request #4247 from ThomasWaldmann/memoryview-cm
correctly release memoryview
2019-01-29 15:53:47 +01:00
Thomas Waldmann b4c68de128 avoid diaper pattern in configparser by opening files, fixes #4263
this will fail early with correct error msg / exception traceback
if a config file is not readable.
2019-01-27 03:28:11 +01:00
Thomas Waldmann 78361744ea keep "data" as is, use "d" for slices
so that the data.release() call is on the original memoryview and
also we can delete the last reference to a slice of it first.
2019-01-25 02:09:00 +01:00
Thomas Waldmann 2910d13055 use try/finally to ensure correct memoryview release
see #4243.
2019-01-25 02:09:00 +01:00
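The release pattern from these memoryview commits can be sketched as follows, with "data" as the original view and "d" as a slice of it. The function itself is only an illustration.

```python
def first_bytes(buf: bytearray, n: int) -> bytes:
    data = memoryview(buf)
    try:
        d = data[:n]  # slices are separate memoryview objects
        try:
            return bytes(d)
        finally:
            d.release()  # drop the slice first ...
    finally:
        data.release()   # ... then release the original view


buf = bytearray(b"hello world")
head = first_bytes(buf, 5)
# Resizing would raise BufferError if a view on buf were still exported.
buf.extend(b"!")
```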
Thomas Waldmann 02f3daebbe use a contextmanager to ensure correct memoryview release
see #4243.
2019-01-25 02:09:00 +01:00
Emmo Emminghaus 733a2bfa30 Introduce borg.platformflags.is_<os> 2018-11-10 23:34:43 +01:00
Emmo Emminghaus 558ca61d20 remove posix issues and fixup for unsupported methods 2018-11-10 21:48:46 +01:00
Thomas Waldmann d6cb39a6d6 implement borg debug dump-repo-objs --ghost
intended as a last resort measure to export all segment file contents
in a relatively easy to use format.

use this if you want to dig into a damaged repo (e.g. missing segment files,
missing commits) and you know what you are doing.

note: dump-repo-objs --ghost must not use repo.list()

because this would need the repo index and call get_transaction_id and
check_transaction methods, which can easily fail on a damaged repo.

thus we use the same low level scan method as we use anyway to get
some encrypted piece of data to setup the decryption "key".

(cherry picked from commit 8738e85967)
2018-08-09 08:29:34 +02:00
Thomas Waldmann 3c173cc03b wrap msgpack, fixes #3632, fixes #2738
wrap msgpack so that future upstream api changes do not cause trouble
and we do not have to clutter our code globally with extra params.

make sure the packing is always with use_bin_type=False,
thus generating "old" msgpack format (as borg always did) from
bytes objects.

make sure the unpacking is always with raw=True,
thus generating bytes objects.

note:

safe unicode encoding/decoding for some kinds of data types is done in Item
class (see item.pyx), so it is enough if we care for bytes objects on the
msgpack level.

also wrap exception handling, so borg code can catch msgpack specific
exceptions even if the upstream msgpack code raises way too generic
exceptions typed Exception, TypeError or ValueError.
We use our own exception classes for this; the upstream classes are deprecated.
2018-08-06 17:32:55 +02:00
Thomas Waldmann 3715d2da3e slightly refactor write_commit using new "want_new" flag 2018-07-14 14:29:28 +02:00
Thomas Waldmann 1f387d911a start new segment file for put/del to MANIFEST_ID
specialcase deleting / writing the manifest to be in a separate, new
segment file, so that when we supersede and compact it later, less
segment data has to be shuffled around - compaction can then just
delete this segment file and that's all.
2018-07-14 14:29:28 +02:00
Thomas Waldmann 755eaeec0a borg compact --cleanup-commits to get rid of leftover 17byte segments
see #2850.
2018-07-14 14:29:28 +02:00
Marian Beermann aeef082483 repository: track commits in hints 2018-07-14 14:29:28 +02:00
Thomas Waldmann de4afa097c separate borg compact command, fixes #2195 2018-07-14 14:29:28 +02:00
Thomas Waldmann 5b5546d7e9 avoid stale filehandle issues, fixes #3265 2018-06-24 01:29:15 +02:00
TW 3e2d5b2b22
Merge pull request #3581 from ThomasWaldmann/borg-config-validation
borg config: add some validation, fixes #3566
2018-03-05 23:40:12 +01:00
Thomas Waldmann 0e0e6da585 make sure all segment file offsets fit into uint32, fixes #3592
C code and the repo index use uint32 type for segment file offsets,
so when opening a repo and the config max_segment_size is too big,
fail early.

Also disallow setting a too big value via "borg config".
2018-03-05 17:50:53 +01:00
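A fail-early check of this kind might look like the sketch below. The names are hypothetical, and the limit is simplified to the largest uint32 value; a real limit may need to leave additional headroom.

```python
UINT32_MAX = 2**32 - 1  # largest offset a uint32 field can hold


def validate_max_segment_size(value: int) -> int:
    """Reject segment sizes whose offsets could overflow a uint32."""
    if not 0 < value <= UINT32_MAX:
        raise ValueError(f"max_segment_size must be in 1..{UINT32_MAX}, got {value}")
    return value
```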
Thomas Waldmann fe65ccf95a be more clear in secure-erase warning message, fixes #3591 2018-03-05 15:21:17 +01:00
Josh Holland 9f400633f2 Correct some confusing error messages from `borg init` (#3485)
init: more clear exception messages for borg create, fixes #3465

also: refactor
2017-12-29 01:15:07 +01:00
Thomas Waldmann 203a5c8f19 catch ENOTSUP for os.link, fixes #3107 2017-10-10 01:57:58 +02:00
TW 67cb76809a Merge pull request #2998 from ThomasWaldmann/fix-2994
fix .isoformat() issues
2017-09-07 14:54:46 +02:00
Thomas Waldmann 928bde8676 get rid of datetime.isoformat to avoid bugs like #2994 2017-09-07 14:11:07 +02:00
Thomas Waldmann 7122913825 repo cleanup/write: invalidate cached FDs 2017-09-06 06:11:39 +02:00
TW 86c0b66de3 Merge pull request #2988 from ThomasWaldmann/recover-segments-memory-usage
recover_segment: use mmap(), fixes #2982
2017-09-02 17:48:04 +02:00
Thomas Waldmann 9fc4d00bf6 recover_segment: use mmap(), fixes #2982 2017-09-01 05:26:27 +02:00
Thomas Waldmann 57f808e4bb add debug logging for repository cleanup
so we can know whether it did a cleanup and if so,
which and how many segments were cleaned up.
2017-08-31 22:49:30 +02:00
enkore 11653d8bc2 Merge pull request #2920 from lfos/detect-attic-repos
Detect non-upgraded Attic repositories
2017-08-16 17:47:02 +02:00
Lukas Fleischer 0943b322e3 Detect non-upgraded Attic repositories
When opening a repository, always try to read the magic number of the
latest segment and compare it to the Attic segment magic (unless the
repository is opened for upgrading). If an Attic segment is detected,
raise a dedicated exception, telling the user to upgrade the repository
first.

Fixes #1933.
2017-08-15 19:58:30 +02:00
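The detection step can be sketched as reading the first bytes of the latest segment and comparing them with the Attic magic. The magic value and the exception name below are assumptions for illustration.

```python
import io

ATTIC_MAGIC = b"ATTICSEG"  # assumed value of the Attic segment magic


class AtticRepositoryError(Exception):
    """Local stand-in for the dedicated exception mentioned above."""


def check_not_attic(segment_fd, for_upgrade: bool = False) -> None:
    if for_upgrade:
        return  # the upgrader is expected to open Attic repositories
    if segment_fd.read(len(ATTIC_MAGIC)) == ATTIC_MAGIC:
        raise AtticRepositoryError("Attic repository detected, please upgrade it first")


# Usage with an in-memory stand-in for a segment file:
check_not_attic(io.BytesIO(b"ATTICSEG" + b"\x00" * 8), for_upgrade=True)
```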
Thomas Waldmann 6f94949a36 migrate locks to child PID when daemonize is used
also:

increase platform api version due to change in get_process_id behaviour.
2017-08-08 03:46:44 +02:00
Thomas Waldmann b7b428edc2 repository: fix assert expression to not have a side effect
lgtm:
This 'assert' statement contains an expression which may have side effects.
2017-07-22 01:51:19 +02:00
Thomas Waldmann 89f3cab6cd move get_limited_unpacker to helpers
also: move some constants to borg.constants
2017-06-25 23:36:28 +02:00
Marian Beermann 8aa745ddbd create: --no-cache-sync 2017-06-18 02:01:26 +02:00
Andrea Gelmini e4247cc0d2 Fix typos 2017-06-09 16:49:30 +02:00
Marian Beermann 1135114520 helpers: truncate_and_unlink doc 2017-06-06 19:52:08 +02:00
Marian Beermann ed0a5c798f platform.SaveFile: truncate_and_unlink temporary
SaveFile is typically used for small files where this is not
necessary. The sole exception is the files cache.
2017-06-06 18:13:20 +02:00
Marian Beermann 95064cd241 repository: truncate segments before unlinking 2017-06-06 17:21:45 +02:00
Marian Beermann 54e023c75a repository: add complementary index corruption test 2017-06-02 21:44:45 +02:00
Marian Beermann 2e067a7ae8 repository: add refcount corruption test 2017-06-02 21:44:45 +02:00
Marian Beermann f61ee038d0 repository: checksum index and hints 2017-06-02 21:44:45 +02:00
Marian Beermann 6c91a750d1 algorithms: rename crc32 to checksums 2017-06-01 21:26:42 +02:00
Marian Beermann 4edf77788d Implement storage quotas 2017-05-31 18:36:03 +02:00