borg init calls this. If there is a PermissionError, it is
usually a filesystem permission issue with the path or its parent directory.
Don't show a traceback; emit an error message and a specific exit code instead.
Also: use ERROR loglevel for these (not WARNING).
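A minimal sketch of that behaviour, assuming a hypothetical `init_repository()` helper and a made-up exit code (borg's real exit code values are not given here). `makedirs` is injectable only so the error path can be exercised without a real permission problem:

```python
import logging
import os
from typing import Callable

logger = logging.getLogger(__name__)

EXIT_REPO_PERMISSION_DENIED = 3  # hypothetical exit code, not borg's actual value


def init_repository(path: str, makedirs: Callable[[str], None] = os.makedirs) -> int:
    """Create the repo directory; map PermissionError to an ERROR log line + exit code."""
    try:
        makedirs(path)
    except PermissionError as e:
        # No traceback: one ERROR-level message pointing at the likely cause.
        logger.error(
            "Permission denied creating repository at %s "
            "(check permissions of the path and its parent directory): %s",
            path, e,
        )
        return EXIT_REPO_PERMISSION_DENIED
    return 0
```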
A different number of index entries was already logged as an error
and led to "error_found = True" in repository.check.
Different values in the rebuilt index vs. the on-disk index were
only logged at warning level and did not lead to error_found = True.
There seems to be no reason why these should not be errors and lead to
error_found = True, so this commit fixes that.
Minor related change: change the report_error function args, so it can be
called like logger.error - i.e. with a format string AND args.
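A simplified sketch of both changes, assuming the indexes are plain dicts mapping chunk id to (segment, offset); `check_index()` is a hypothetical stand-in for the relevant part of repository.check. `report_error()` takes a format plus args exactly like `logger.error()` and sets `error_found` as a side effect:

```python
import logging

logger = logging.getLogger(__name__)


def check_index(on_disk: dict, rebuilt: dict) -> bool:
    """Compare on-disk vs. rebuilt index; return True if any error was found."""
    error_found = False

    def report_error(msg, *args):
        # Same calling convention as logger.error(msg, *args).
        nonlocal error_found
        error_found = True
        logger.error(msg, *args)

    if len(on_disk) != len(rebuilt):
        report_error("Index object count mismatch: on-disk %d, rebuilt %d",
                     len(on_disk), len(rebuilt))
    # Value mismatches are now errors too (previously only warnings).
    for key in on_disk.keys() | rebuilt.keys():
        if on_disk.get(key) != rebuilt.get(key):
            report_error("Index mismatch for key %r: on-disk %r, rebuilt %r",
                         key, on_disk.get(key), rebuilt.get(key))
    return error_found
```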
Before this fix, borg check --repair just created an
empty shadow index, which can lead to incomplete
entries if entries are added later.
Such incomplete (but present) entries can lead to
compact_segments() resurrecting old PUTs by accidentally
dropping related DELs.
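To make the failure mode concrete, here is a simplified model (not borg's actual code) of the decision compaction has to make for a DEL: keep it if an older PUT of the same key may still exist. It assumes the shadow index maps a key to the segment numbers holding shadowed PUTs, and that a missing key is treated conservatively. An entry that is present but incomplete is exactly what lets the DEL be dropped:

```python
from typing import Dict, List, Set


def shadowed_put_exists(key: bytes, del_segment: int,
                        shadow_index: Dict[bytes, List[int]],
                        existing_segments: Set[int]) -> bool:
    """Simplified model: must the DEL in del_segment be kept?"""
    if key not in shadow_index:
        return True  # no information: conservatively keep the DEL
    # Keep the DEL if any older segment listed for this key still exists.
    return any(seg < del_segment and seg in existing_segments
               for seg in shadow_index[key])


existing = {1, 5}  # segment 1 still holds an old PUT of b"k", the DEL is in segment 5
print(shadowed_put_exists(b"k", 5, {b"k": [1]}, existing))  # complete entry: True, DEL kept
print(shadowed_put_exists(b"k", 5, {b"k": [3]}, existing))  # incomplete entry: False, DEL dropped
```

In the second call the entry was only filled in after the repair, so it misses segment 1; dropping the DEL makes the old PUT in segment 1 visible again.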
Replacing the internals should make the implementation faster
and simpler, since the order tracking is done by the `OrderedDict`.
Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its purpose clearer.
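A sketch of an `OrderedDict`-backed LRU cache in that spirit. The `dispose` callback and the exact semantics of `replace` (swap the value of an existing key without disposing the old one) are assumptions for illustration, not a copy of borg's class:

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """OrderedDict order = recency: least recently used first, most recent last."""

    def __init__(self, capacity: int, dispose: Callable[[V], None] = lambda _: None) -> None:
        self._cache: "OrderedDict[K, V]" = OrderedDict()
        self._capacity = capacity
        self._dispose = dispose

    def __setitem__(self, key: K, value: V) -> None:
        assert key not in self._cache, "use replace() for existing keys"
        while len(self._cache) >= self._capacity:
            _, old = self._cache.popitem(last=False)  # evict least recently used
            self._dispose(old)
        self._cache[key] = value  # new keys are appended as most recent

    def __getitem__(self, key: K) -> V:
        self._cache.move_to_end(key)  # KeyError if missing; mark as most recent
        return self._cache[key]

    def __delitem__(self, key: K) -> None:
        self._dispose(self._cache.pop(key))

    def __contains__(self, key: object) -> bool:
        return key in self._cache

    def __len__(self) -> int:
        return len(self._cache)

    def replace(self, key: K, value: V) -> None:
        """Swap the value of an existing key without disposing the old value."""
        assert key in self._cache, "replace() needs an existing key"
        self._cache[key] = value
```

No manual bookkeeping of access order is needed: `move_to_end()` and `popitem(last=False)` are O(1) on an `OrderedDict`.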
Compact moves data to new segments and then removes the old segments.
When enough segments are moved, directories holding the now-removed segments
may become empty.
With this commit, any empty directories are removed after segment compaction.
Fixes #6823
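A minimal sketch of the cleanup step, assuming segment files live in numbered subdirectories of a data directory; `remove_empty_segment_dirs()` is a hypothetical helper name:

```python
import os
from typing import List


def remove_empty_segment_dirs(data_dir: str) -> List[str]:
    """Remove empty subdirectories of data_dir; return the removed names, sorted."""
    with os.scandir(data_dir) as it:
        # Snapshot the subdirectories before removing anything.
        subdirs = [e for e in it if e.is_dir(follow_symlinks=False)]
    removed = []
    for entry in subdirs:
        with os.scandir(entry.path) as sub:
            is_empty = next(sub, None) is None  # look at one entry at most
        if is_empty:
            os.rmdir(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```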
+ os.scandir instead of os.listdir:
  improved speed and added flexibility with attributes (name, path, is_dir(), is_file())
+ use is_dir / is_file to make sure we're reading only dirs / files, respectively
+ built-in filtering to a particular start/end index range
+ move the value bounds of segment (index) into the constants module and use them instead
Resolves #7597
(forward patch from commits c9f35a16e9bf9e7073c486553177cef79ff1cb06^..edb5e749f512b7737b6933e13b7e61fefcd17bcb)
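A sketch of a directory listing along those lines. The function name and the bound values are assumptions (the real bounds live in borg's constants module and are not reproduced here):

```python
import os
from typing import List

# Hypothetical bounds standing in for the values in the constants module.
MIN_SEGMENT_DIR_INDEX = 0
MAX_SEGMENT_DIR_INDEX = 2**32 - 1


def get_segment_dirs(data_dir: str,
                     start_index: int = MIN_SEGMENT_DIR_INDEX,
                     end_index: int = MAX_SEGMENT_DIR_INDEX) -> List[os.DirEntry]:
    """Numbered subdirectories with start_index <= index <= end_index, sorted numerically."""
    with os.scandir(data_dir) as it:
        dirs = [
            e for e in it
            if e.is_dir()              # skip files, even if their names are numeric
            and e.name.isdecimal()     # skip non-numeric names
            and start_index <= int(e.name) <= end_index
        ]
    return sorted(dirs, key=lambda e: int(e.name))
```

Returning `os.DirEntry` objects instead of bare names keeps `name`, `path` and the cached `is_dir()` / `is_file()` results available to callers.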
Not needed for borg2 repos (we derive a new session key for each borg
invocation and start counting from 0).
Also not needed for borg 1.x repos, because we only read them (borg transfer)
and won't write new encrypted data to them.
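Why a per-invocation session key makes persisted counter state unnecessary, as a sketch: nonces only need to be unique per key, so restarting the counter at 0 is safe when every invocation uses a fresh key. HMAC-SHA256 below is a stand-in KDF chosen for illustration; borg2's actual key derivation differs in detail:

```python
import hashlib
import hmac
import os


def derive_session_key(master_key: bytes, session_id: bytes) -> bytes:
    # Stand-in KDF for illustration only, not borg2's real derivation.
    return hmac.new(master_key, b"session-key" + session_id, hashlib.sha256).digest()


class Session:
    """One borg invocation: fresh random session id -> fresh key -> counter from 0."""

    def __init__(self, master_key: bytes) -> None:
        self.session_id = os.urandom(24)
        self.key = derive_session_key(master_key, self.session_id)
        self.counter = 0  # never persisted: (key, counter) pairs can't repeat across sessions

    def next_nonce(self) -> int:
        n = self.counter
        self.counter += 1
        return n
```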
We don't write "to not x", but "not to x".
Avoiding split infinitives has the added bonus that machine
translation yields better results.
setup (n/adj) vs. set (v) up: we don't write "I setup it" but "I set it up".
Likewise for login (n/adj) vs. log (v) in, and backup (n/adj) vs. back (v) up.
This option has not changed behaviour for a long time;
we only kept it for API compatibility.
As a borg2 repo server won't have old clients talking to it,
we can safely remove this everywhere now.
Support reading the new, improved hashindex header format, fixes #6960
It is a bit of a pain to work with that code:
- it is C code,
- it still needs to be able to read the old hashindex file format,
- while also supporting the new file format.
- the hash computed while reading the file causes additional problems, because
  it expects every part of the file to be read exactly once and in sequential order.
I solved this by opening the file separately in the Python part of the code and
checking for the magic.
BORG_IDX means the legacy file format and legacy layout of the hashtable,
BORG2IDX means the new file format and the new layout of the hashtable.
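A sketch of the separate magic check in Python. Both magics are 8 bytes long; the function names are hypothetical, and mapping the legacy magic to version 1 follows the "could be denoted as version 1" idea:

```python
MAGIC_LEN = 8
MAGIC_LEGACY = b"BORG_IDX"  # legacy file format + legacy hashtable layout
MAGIC_V2 = b"BORG2IDX"      # new file format + new hashtable layout


def hashindex_version_from_magic(magic: bytes) -> int:
    if magic == MAGIC_LEGACY:
        return 1
    if magic == MAGIC_V2:
        return 2
    raise ValueError(f"unknown hashindex magic: {magic!r}")


def detect_hashindex_version(path: str) -> int:
    # A separate open() peeks at the magic without touching the reader that
    # hashes the file sequentially, so that reader still sees every byte once.
    with open(path, "rb") as f:
        return hashindex_version_from_magic(f.read(MAGIC_LEN))
```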
Done:
- added a version int32 directly after the magic and set it to 2 (like borg 2).
  The old header had no version info, but could be denoted as version 1 in case
  we ever need it (currently, the format is decided based on the magic).
- added num_empty, as indicated by a TODO in count_empty, so it does not need a
  full hashtable scan to determine the number of empty buckets.
- to keep it simpler, I just filled the HashHeader struct with a
  `char reserved[1024 - 32];`
  1024 being the desired overall header size and 32 being the currently used size.
  This alignment might be useful in case we mmap() the hashindex file one day.
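A sketch of the 1024-byte header using `struct`. Only magic, version, num_empty and the 32/1024 sizes come from the text above; the remaining field names and order (num_entries, num_buckets, key_size, value_size as int32) and little-endian byte order are assumptions that happen to add up to 32 bytes:

```python
import struct

# 8-byte magic + 6 x int32 = 32 used bytes, then 992 reserved (zero) bytes = 1024.
# Field order after magic/version and little-endian encoding are assumptions.
HEADER_V2 = struct.Struct("<8s6i992x")


def pack_header(num_entries: int, num_buckets: int, num_empty: int,
                key_size: int, value_size: int) -> bytes:
    return HEADER_V2.pack(b"BORG2IDX", 2, num_entries, num_buckets,
                          num_empty, key_size, value_size)


def unpack_header(data: bytes) -> dict:
    fields = HEADER_V2.unpack(data[:HEADER_V2.size])  # pad bytes yield no values
    names = ("magic", "version", "num_entries", "num_buckets",
             "num_empty", "key_size", "value_size")
    return dict(zip(names, fields))
```

Reading `num_empty` from the header replaces a full bucket scan with a constant-time lookup.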
When using .scan(limit, marker), we used to use the last chunkid from
the previously returned scan result to remember how far we got and
where we need to continue from.
As this approach used the repo index to look up the respective segment/offset,
it was problematic if the code using scan was rewriting the chunk to
a new segment/offset and updating the repo index (e.g. when recompressing a chunk),
basically destroying the memory of where we need to continue
scanning.
Thus, directly returning (segment, offset) as the marker is easier and solves this issue.
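A simplified, in-memory model of a scan that returns (segment, offset) as its marker. The data layout and signature are assumptions for illustration, not borg's `Repository.scan()`. Because the marker is a position in the log rather than a chunk id, rewriting a chunk elsewhere cannot move it:

```python
from typing import Dict, List, Optional, Tuple

Segments = Dict[int, List[Tuple[int, bytes]]]  # segment -> [(offset, chunk_id), ...]
Marker = Tuple[int, int]                        # (segment, offset) of the last returned entry


def scan(segments: Segments, limit: int,
         marker: Optional[Marker] = None) -> Tuple[List[bytes], Optional[Marker]]:
    """Return up to `limit` chunk ids after `marker`, plus the marker to resume from."""
    result: List[bytes] = []
    last = marker
    for segment in sorted(segments):
        for offset, chunk_id in segments[segment]:
            if marker is not None and (segment, offset) <= marker:
                continue  # already returned by a previous call
            result.append(chunk_id)
            last = (segment, offset)
            if len(result) == limit:
                return result, last
    return result, last


segs = {1: [(0, b"a"), (10, b"b")], 2: [(0, b"c")]}
print(scan(segs, 2))           # ([b'a', b'b'], (1, 10))
print(scan(segs, 2, (1, 10)))  # ([b'c'], (2, 0))
```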
Stop scanning at the last commit. Otherwise, if we scan+get+put (e.g. if we read/modify/write chunks to
recompress them), it would scan past the last commit and run into the
newly written chunks (and potentially never terminate).
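A self-contained model of the read/modify/write loop. It assumes each rewritten chunk lands in a new segment and that the commit boundary is captured as the highest segment number before writing starts (both are assumptions for illustration). Bounding the scan by that snapshot is what makes the loop terminate:

```python
from typing import Dict, List, Optional, Tuple

Segments = Dict[int, List[Tuple[int, bytes]]]  # segment -> [(offset, chunk_id), ...]
Marker = Tuple[int, int]


def scan(segments: Segments, limit: int, marker: Optional[Marker],
         last_committed: int) -> Tuple[List[bytes], Optional[Marker]]:
    """Like a marker-based scan, but never looks past `last_committed`."""
    result: List[bytes] = []
    last = marker
    for segment in sorted(s for s in segments if s <= last_committed):
        for offset, chunk_id in segments[segment]:
            if marker is not None and (segment, offset) <= marker:
                continue
            result.append(chunk_id)
            last = (segment, offset)
            if len(result) == limit:
                return result, last
    return result, last


def recompress_all(segments: Segments, limit: int = 2) -> int:
    """Rewrite every chunk once; each rewrite is 'put' into a new segment."""
    last_committed = max(segments)  # snapshot before we start writing
    marker: Optional[Marker] = None
    rewritten = 0
    while True:
        ids, marker = scan(segments, limit, marker, last_committed)
        if not ids:
            return rewritten
        for chunk_id in ids:
            segments[max(segments) + 1] = [(0, chunk_id)]  # newly written chunk
            rewritten += 1
```

Without the `last_committed` bound, every rewritten chunk would show up in a later scan and get rewritten again, forever.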
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_localtime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information,
  then we know when a backup was made and even in
  which timezone it was made. If we want to compute
  UTC, we can do that using this information.
This makes for a quite nice archive list, with timestamps
as expected (in local time with timezone info).
In some places we just enforce UTC, e.g. for the
repo manifest timestamp or for the transaction log;
these are usually not looked at by the user.
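A short illustration with the standard `datetime` module of keeping the offset and rendering it as +HHMM or +HH:MM. A fixed +02:00 offset is used so the output is deterministic; in practice `datetime.now().astimezone()` yields local time with the local offset:

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=2))  # fixed offset for a deterministic example
start = datetime(2022, 8, 1, 12, 30, 0, tzinfo=tz)

print(start.strftime("%Y-%m-%d %H:%M:%S %z"))  # 2022-08-01 12:30:00 +0200
print(start.isoformat(timespec="seconds"))     # 2022-08-01T12:30:00+02:00

# Since the offset is kept, UTC can always be derived later:
print(start.astimezone(timezone.utc).isoformat(timespec="seconds"))  # 2022-08-01T10:30:00+00:00
```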
Since Python 3.7, .isoformat() is usable IF timespec != "auto"
is given ("auto", the default, would be as evil as before, sometimes
formatting with and sometimes without microseconds).
Also since Python 3.7, there is .fromisoformat().
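The "auto" pitfall and the round trip, shown with the standard library:

```python
from datetime import datetime, timezone

a = datetime(2022, 8, 1, 12, 0, 0, tzinfo=timezone.utc)       # microsecond == 0
b = datetime(2022, 8, 1, 12, 0, 0, 500, tzinfo=timezone.utc)  # microsecond == 500

# timespec="auto" (the default) omits the fraction when microsecond == 0:
print(a.isoformat())  # 2022-08-01T12:00:00+00:00
print(b.isoformat())  # 2022-08-01T12:00:00.000500+00:00

# An explicit timespec always yields the same shape ...
a_str = a.isoformat(timespec="microseconds")  # 2022-08-01T12:00:00.000000+00:00
# ... and fromisoformat() parses it back, offset included.
print(datetime.fromisoformat(a_str) == a)  # True
```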
There was no way to tell the repository version of a remote repo.
borg 2 needs that to refuse most operations on an old repo,
except for what borg transfer needs.
v2 is the default repo version for borg 2.0.
v1 repos must only be used in a read-only way, e.g. for
--other-repo=V1_REPO with borg init and borg transfer!
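A sketch of the resulting policy as a guard function. The function and exception names are hypothetical; how the version is actually queried from a remote repo is not shown here:

```python
class UnsupportedRepoVersion(Exception):
    """Hypothetical error type for this sketch."""


def check_repo_version(version: int, *, read_only: bool) -> None:
    """v2: anything goes; v1: read-only use only (e.g. --other-repo for borg transfer)."""
    if version == 2:
        return
    if version == 1 and read_only:
        return
    raise UnsupportedRepoVersion(
        f"repository version {version} is not supported for this operation"
        + ("" if read_only else " (v1 repos may only be read)")
    )
```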