An empty index file is most likely the result of an unclean
shutdown in the middle of a write, e.g. on ext4 with delayed
allocation enabled (the default).
Ignoring such a file gets it recreated by other parts of the code,
whereas not ignoring it leads to an exception about
not being able to read enough bytes from the index.
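a minimal sketch of that behaviour, assuming the index is read from a plain
file (function name and return convention here are illustrative, not borg's
actual API):

    import os

    def read_index(path):
        """treat a zero-length index file like a missing one, so the caller
        falls back to the code path that recreates the index."""
        try:
            if os.path.getsize(path) == 0:
                return None  # ignore empty file -> index gets recreated
        except FileNotFoundError:
            return None
        with open(path, 'rb') as fd:
            data = fd.read()
        # ... parse header / entries here; a truncated-but-nonempty file
        # would still raise when not enough bytes can be read.
        return data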
this commit fixes #1195
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
do not ignore bad placeholders and just return the empty string;
that could have bad consequences, e.g. with --prefix '{invalidplaceholder}':
a typo in the placeholder name would silently make the prefix the empty string.
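a minimal sketch of the stricter behaviour, assuming the prefix / format
string is expanded against a dict of placeholder values (names are
illustrative, not borg's actual code):

    def format_line(fmt, data):
        try:
            return fmt.format_map(data)
        except KeyError as e:
            # unknown placeholder: fail loudly instead of returning ''
            raise ValueError('invalid placeholder %s in %r' % (e, fmt))

    # e.g. --prefix '{invalidplaceholder}' now errors out instead of
    # silently producing an empty prefix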
in OpenSSL 1.1, the cipher context is opaque and its members cannot
be accessed directly. we only used this for ctx.iv, to determine
the current IV (counter value).
now, we just remember the original IV, count the AES blocks we
process and then compute iv = iv_orig + blocks.
that way, it works on OpenSSL 1.0.x and >= 1.1 in the same way.
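a sketch of that bookkeeping (class and method names are illustrative; the
real code lives in borg's crypto layer):

    class CounterIV:
        """track the AES-CTR counter without reading ctx.iv (opaque in 1.1)."""
        def __init__(self, iv: bytes):
            self.iv_orig = int.from_bytes(iv, 'big')
            self.blocks = 0

        def consume(self, nbytes: int):
            # AES-CTR advances the counter once per 16-byte block
            self.blocks += (nbytes + 15) // 16

        @property
        def iv(self) -> bytes:
            # current counter value = original IV + number of blocks processed
            return (self.iv_orig + self.blocks).to_bytes(16, 'big')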
found out that xfs does stuff behind the scenes: it pre-allocates 16MB
to prevent fragmentation (16MB in my case; the value depends on misc factors).
fixed the test so it just checks that the extracted sparse file uses less (not
necessarily much less) space than a non-sparse file would use.
another problem showed up when I tried to verify the holes in the sparse file
via SEEK_HOLE / SEEK_DATA:
after the few bytes of real data in the file, there was another 16MB of
preallocated space.
so I ended up checking just the hole at the start of the file.
tested on: ext4, xfs, zfs, btrfs
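the start-of-file hole check roughly looks like this (a sketch assuming Linux
and a filesystem that supports SEEK_DATA; the actual test also compares disk
usage):

    import errno, os

    def has_hole_at_start(path):
        fd = os.open(path, os.O_RDONLY)
        try:
            try:
                # offset of first data byte; > 0 means the file starts with a hole
                return os.lseek(fd, 0, os.SEEK_DATA) > 0
            except OSError as e:
                if e.errno == errno.ENXIO:
                    return True  # file is one big hole, no data at all
                raise
        finally:
            os.close(fd)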
this was already done in a similar way for item metadata dict validation.
also: check for some more required keys - the old code would crash if the 'name' or 'time' key was missing.
the previous check only checked that we got a dict, but did not validate the dict keys.
this triggered issues with e.g. (invalid) integer keys.
now it validates the keys:
- some required keys must be present
- the set of keys is a subset of all valid keys
we need a list of valid item metadata keys. using a list stored in the repo manifest
is more future-proof than the hardcoded ITEM_KEYS in the source code.
keys that are in union(item_keys_from_repo, item_keys_from_source) are considered valid.
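a sketch of the validation described above (REQUIRED_ITEM_KEYS and the exact
key names are illustrative; borg 1.0 uses byte string keys):

    REQUIRED_ITEM_KEYS = frozenset((b'path', b'mtime'))  # illustrative

    def valid_item(obj, valid_keys):
        # valid_keys = union(item_keys_from_repo, item_keys_from_source)
        if not isinstance(obj, dict):
            return False
        keys = set(obj)
        return REQUIRED_ITEM_KEYS.issubset(keys) and keys.issubset(valid_keys)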
when trying to resync and skip invalid data, borg tries to qualify a byte sequence as
a valid-looking msgpacked item metadata dict (or not) before even invoking msgpack's unpack.
besides the previously hard-to-understand code, there were 2 issues:
- a missing check for map16 - this is the type msgpack uses if the dict has more than
  15 items (could happen in the future, not for 1.0.x).
- missing checks for str8/16/32 - str16 is what msgpack uses if the bytestring has more than 31 bytes
  (borg does not have such long key names, thus this wasn't causing any harm)
this misqualification (valid data considered invalid) could lead to a wrong resync, skipping valid items.
added more comments and tests.
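a sketch of the leading-byte check with the added map16 / str cases (type-byte
constants per the msgpack spec; the real code sanity-checks more of the header):

    def looks_like_item_dict_start(data):
        if len(data) < 2:
            return False
        head, key = data[0], data[1]
        # dict header: fixmap (<= 15 entries), map16, map32
        if not (0x80 <= head <= 0x8f or head in (0xde, 0xdf)):
            return False
        # first key must look like a (byte)string: fixstr, str8, str16, str32
        return 0xa0 <= key <= 0xbf or key in (0xd9, 0xda, 0xdb)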