support reading new, improved hashindex header format, fixes#6960
Bit of a pain to work with that code:
- C code
- needs to still be able to read the old hashindex file format,
- while also supporting the new file format.
- the hash computed while reading the file causes additional problems because
it expects all places in the file get read exactly once and in sequential order.
I solved this by separately opening the file in the python part of the code and
checking for the magic.
BORG_IDX means the legacy file format and legacy layout of the hashtable,
BORG2IDX means the new file format and the new layout of the hashtable.
Done:
- added a version int32 directly after the magic and set it to 2 (like borg 2).
the old header had no version info, but could be denoted as version 1 in case
we ever need it (currently it decides based on the magic).
- added num_empty as indicated by a TODO in count_empty, so it does not need a
full hashtable scan to determine the amount of empty buckets.
- to keep it simpler, I just filled the HashHeader struct with a
`char reserved[1024 - 32];`
1024 being the desired overall header size and 32 being the currently used size.
this alignment might be useful in case we mmap() the hashindex file one day.
borg < 2:
obj = encrypted(compressed(data))
borg 2:
obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))
handle compr / decompr in repoobj
move the assert_id call from decrypt to RepoObj.parse
also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
/vagrant/borg/borg/.tox/py36-none/lib/python3.6/site-packages/borg/testsuite/remote.py:73:
PytestDeprecationWarning: @pytest.yield_fixture is deprecated.
Use @pytest.fixture instead; they are the same.
Docs: https://docs.pytest.org/en/stable/warnings.html
our data structures need strict_map_key=False, which is not the
default of msgpack 1.0.0. i made it default in our wrapper API.
call our wrapper for performance profile creation/conversion also
to avoid msgpack compat issues.
remove encoding from wrapper api, we do not use it any more.
remove raw is True check, we need false for profiles
strict_map_key is only supported for msgpack >= 1.0.0.
the problem was that the upper layer code did not have enough information
about the file, whether it is known or not - and thus, could not decide
correctly whether status should be M)odified or A)dded.
now, file_known_and_unchanged method returns an additional "known"
boolean to fix this.
also: add comment about files cache loading in cache_mode='r'
This is a (relatively) simple state machine running in the
data callbacks invoked by the msgpack unpacking stack machine
(the same machine is used in msgpack-c and msgpack-python,
changes are minor and cosmetic, e.g. removal of msgpack_unpack_object,
removal of the C++ template thus porting to C and so on).
Compared to the previous solution this has multiple advantages
- msgpack-c dependency is removed
- this approach is faster and requires fewer and smaller
memory allocations
Testability of the two solutions does not differ in my
professional opinion(tm).
Two other changes were rolled up; _hashindex.c can be compiled
without Python.h again (handy for fuzzing and testing);
a "small" bug in the cache sync was fixed which allocated too
large archive indices, leading to excessive archive.chunks.d
disk usage (that actually gave me an idea).