While updating the docs I needed a way to test the changes: build
the documentation and inspect the output. Running
python setup.py build_man threw exceptions:

1. The import-tar parser had a None description, causing:

     File "/home/user/borg/setup_docs.py", line 451, in write_heading
       write(char * len(header))
   TypeError: object of type 'NoneType' has no len()

2. docs/usage/import-tar.rst did not exist, which caused an exception too.
This re-introduces a race between the os.path.exists check and SaveFile creating that file, but because of the way SaveFile works, a good cache tag file is still guaranteed to be in place in the end.
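
Why the race is harmless can be illustrated with a minimal sketch. It assumes SaveFile follows the common pattern of writing to a temporary file and then renaming it into place; `write_atomically` and `ensure_cache_tag` are hypothetical names used for illustration only, not borg's actual code.

```python
import os
import tempfile

CACHE_TAG_NAME = "CACHEDIR.TAG"
# Signature defined by the Cache Directory Tagging convention.
CACHE_TAG_CONTENTS = b"Signature: 8a477f597d28d172789f06886806bc55\n"


def write_atomically(path, data):
    """Hypothetical stand-in for SaveFile: write a temp file, then rename it."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The rename replaces any existing file in one step, so readers
        # see either the old complete file or the new complete file.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def ensure_cache_tag(directory):
    path = os.path.join(directory, CACHE_TAG_NAME)
    if not os.path.exists(path):
        # Another process may create the file between the check above and
        # the write below. That is the race, but the loser simply replaces
        # a good tag file with an identical good tag file.
        write_atomically(path, CACHE_TAG_CONTENTS)
    return path
```

Under these assumptions, two processes racing through `ensure_cache_tag` can both write, yet the directory never contains a partially written tag file.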
While reading the docs I noticed that in `borg list` the options --list-format and --format do the same thing. Using `git log -S` I found that --list-format used to be deprecated and was supposed to be removed in c87393cab7, but you overlooked it and undeprecated it instead. What should we do now: just remove it, or deprecate it again?
BORG_LIBC was added in a4f7e69 to allow borg to work on systems where
ctypes.util.find_library() fails. Since 9914968 borg no longer uses
find_library().
This should also make the scan faster: assuming the data is
random, we can skip the CRC check for almost 94% of the incorrect
header locations based solely on the tag.
As a drawback, this limits the number of tags that can be
added without breaking backwards compatibility to 16, with
13 currently unused.
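
The "almost 94%" figure can be checked with a little arithmetic. This sketch assumes the tag is a single byte (256 possible values) and that a tag is plausible when it falls within the first 16 values; `MAX_TAGS` and `tag_is_plausible` are illustrative names, not taken from the code base.

```python
TAG_VALUES = 256  # assumption: the tag occupies one byte
MAX_TAGS = 16     # tags accepted by the check (3 in use, 13 unused)


def tag_is_plausible(tag: int) -> bool:
    """Return True if the tag value could belong to a valid object."""
    return 0 <= tag < MAX_TAGS


# Count how many of the possible byte values are rejected by the tag alone.
rejected = sum(not tag_is_plausible(t) for t in range(TAG_VALUES))
fraction_skipped = rejected / TAG_VALUES

print(f"{rejected} of {TAG_VALUES} tag values rejected: {fraction_skipped:.2%}")
# prints "240 of 256 tag values rejected: 93.75%"
```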
When an object is corrupted, the start position of the next object
will not be known as the size field belonging to the corrupted
object may be corrupted as well. In order to find the next object
within the segment, the remainder is scanned for the next valid
object, byte-by-byte. An object is considered valid if the CRC
checksum matches the content. However, the scan previously accepted
any object size that fit within the remainder of the segment. As a
result, in particular when the corruption occurred near the start
of a segment, CRC checksums were calculated for large objects,
often hundreds of megabytes in size, despite the size being limited
to 20 MiB. This change skips the CRC calculation
when the object header indicates an impossible size, thereby
greatly reducing the number of CPU cycles used for CRC calculations.
In my case, this brought down the time for repair from hours to mere
minutes.
This also has the additional benefit that there is some verification
in addition to the CRC checksum. The 4-byte checksum is rather
short considering the amount of data that might be in an archive.
Likely also fixes the hanging --repair in #5995.
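
The size check described above can be sketched as follows. This is a simplified illustration, not the actual repository code: the header layout (little-endian CRC32, size, tag), the rule that the CRC covers everything after the CRC field, and the names `make_object` and `scan_valid_objects` are all assumptions made for the example.

```python
import struct
import zlib

HEADER = struct.Struct("<IIB")      # assumed layout: crc32, size, tag
MAX_OBJECT_SIZE = 20 * 1024 * 1024  # the 20 MiB limit mentioned above


def make_object(tag, payload):
    """Build one object; size counts the header plus the payload."""
    size = HEADER.size + len(payload)
    body = struct.pack("<IB", size, tag) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def scan_valid_objects(data):
    """Yield (offset, tag, payload) for every valid object found in data."""
    offset = 0
    while offset + HEADER.size <= len(data):
        crc, size, tag = HEADER.unpack_from(data, offset)
        # Cheap checks first: an impossible size means this cannot be a
        # real header, so move on one byte without computing any CRC.
        if (size < HEADER.size or size > MAX_OBJECT_SIZE
                or offset + size > len(data)):
            offset += 1
            continue
        # Only now pay for the CRC over the candidate object.
        if zlib.crc32(data[offset + 4:offset + size]) != crc:
            offset += 1
            continue
        yield offset, tag, data[offset + HEADER.size:offset + size]
        offset += size


# Three stray bytes stand in for a damaged region between two objects.
segment = make_object(0, b"first") + b"\xff" * 3 + make_object(1, b"second")
found = list(scan_valid_objects(segment))
```

In this example the scan resynchronizes on the second object at offset 17, and every candidate header inside the damaged region is rejected by the size check alone.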
Paths that come from --paths-from-stdin or --paths-from-command don't
have a parent_fd or name, so we need to use the os_stat helper that
falls back on the full path if those are missing.
Fixes borgbackup/borg#6009
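
The fallback described above could look roughly like this. It is a sketch under assumptions, not the actual os_stat helper: the name `stat_with_fallback` and its signature are hypothetical, and `os.stat(..., dir_fd=...)` is only available on platforms that support it (mainly POSIX).

```python
import os


def stat_with_fallback(*, path, parent_fd=None, name=None,
                       follow_symlinks=False):
    """Stat relative to parent_fd when possible, else use the full path."""
    if parent_fd is not None and name is not None:
        # Normal recursion case: stat the entry relative to the already
        # opened parent directory.
        return os.stat(name, dir_fd=parent_fd,
                       follow_symlinks=follow_symlinks)
    # Paths read from stdin or from a command have no parent_fd or name,
    # so fall back on the full path.
    return os.stat(path, follow_symlinks=follow_symlinks)
```

A caller that only has a path would use `stat_with_fallback(path=p)`, while a directory walk that holds an open parent directory would also pass `parent_fd` and `name`.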