this was also the loop contents of hashindex_merge, but we also need it callable from Cython/Python code.
this saves some cycles, esp. if the key is already present in the index.
The read_msgpack and write_msgpack functions were only used in one place
each. Since msgpack is read and written in lots of places, having
functions with these generic names is confusing. Also, the helpers
module is quite a mess, so reducing its size seems to be a good idea.
the problem here was that we do not just have changed and unchanged items,
but also a lot of items besides regular files which we just back up "as is" without
determining whether they are changed or not. thus, we can't support changed/unchanged
in a way users would expect them to work.
the A/M/U status only applies to the data content of regular files (compared to the index).
for all items, we ALWAYS save the metadata, there is no changed / not changed detection there.
thus, I replaced this with a --filter option where you can just specify which
status chars you want to see listed in the output.
E.g. --filter AM will only show regular files with A(dded) or M(odified) state, but nothing else.
Not giving --filter defaults to showing all items no matter what status they have.
Output is emitted via logger at info level, so it won't show up except if the logger is at that level.
BUCKET_UPPER_LIMIT: 90% load degrades hash table performance severely,
so I lowered that to 75% (which is a usual value - java uses 75%, python uses 66%).
I chose the higher value of both because we also should not consume too much
memory, considering the RAM usage already is rather high.
MIN_BUCKETS: I can't explain why, but benchmarks showed that choosing 2^N as
table size severely degrades performance (by 3 orders of magnitude!). So a prime
start value improves this a lot, even if we later still use the grow-by-2x algorithm.
hashindex_resize: removed the hashindex_get() call as we already know that the values
come at key + key_size address.
hashindex_init: do not calloc X*Y elements of size 1, but rather X elements of size Y.
Makes the code simpler, not sure if it affects performance.
The tests needed fixing as the resulting hashtable blob is now of course different due
to the above changes, so its sha hash changed.
print_verbose is now simply logger.info() and is always displayed if
log level allows it. this affects only the `prune` and `mount`
commands which were the only users of the --verbose option. the
additional display is which archives are kept and pruned and a single
message when the fileystem is mounted.
files iteration in create and extract is now printed through a
separate function which will be later controled through a topical
flag.