mirror/borg - Z·K·N·T GIT

Commit Graph

Author	SHA1	Message	Date
Dan Christensen	8eff0b2f02	_hashindex.c: fix comment	2023-02-12 08:24:27 -05:00
Dan Christensen	36aa395e49	_hashindex.c: set min_empty and num_empty even when permit_compact=True	2023-02-11 20:11:09 -05:00
Dan Christensen	60bea46eb7	_hashindex.c: rewrite hashindex_compact	2023-02-11 20:10:02 -05:00
Thomas Waldmann	3d57dc0590	hashindex: add tests, misc. minor fixes/changes test hashtable expansion/rebuild. hashindex_lookup: - return -2 for a compact / completely full hashtable - return -1 and (via start_idx pointer) the deleted/tombstone bucket index. fix size assertion (we add 1 element to trigger rebuild) fix upper_limit check - since we'll be adding 1 to num_entries below, the condition should be >=: hashindex_compact: set min_empty/upper_limit Co-authored-by: Dan Christensen <jdc+github@uwo.ca>	2023-02-09 22:14:07 +01:00
Thomas Waldmann	316bb7c937	add num_entries assertion	2023-02-08 22:45:11 +01:00
Thomas Waldmann	b79766a933	hashindex: simplify size_idx function Thanks to @jdchristensen for the code.	2023-02-08 22:11:49 +01:00
Thomas Waldmann	a13d53ec1e	Simplify full HT scan assertion	2023-02-08 22:11:47 +01:00
Thomas Waldmann	4fc7815f11	hashindex: always have at least 1 empty bucket avoid rounding / integer conversion issues bringing this down to 0.	2023-02-08 22:11:44 +01:00
Thomas Waldmann	4f8c4aea19	implement ht idx wrap around less strangely, add comment	2023-02-08 22:11:43 +01:00
Thomas Waldmann	907da00931	if HT is full with entries and tombstones: give up/fail early	2023-02-08 22:11:41 +01:00
Thomas Waldmann	0098ac9e63	more comments for hashindex_lookup	2023-02-08 22:11:34 +01:00
Thomas Waldmann	9bf352d00c	bugfix: do not resize hashindex with wrong num_empty otherwise we would lose the decrement operation on num_empty.	2023-02-08 22:10:50 +01:00
Thomas Waldmann	83e6b4269e	hashindex: simplify assert	2023-02-08 16:25:15 +01:00
Thomas Waldmann	c9573c04ac	_hashindex: easier to understand code, dubious loops removed, asserts hashindex_index returns the perfect hashtable index, but does not check what's in the bucket there, so we had these loops afterwards to search for an empty or deleted bucket. problem: if the HT were completely filled with no empty and no deleted buckets, that loop would never end. due to our HT resizing, it can never happen, but still not pretty. when using hashindex_lookup (as also used some lines above), the code is easier to understand, because (after we resized the HT), we freshly create the same situation as after the first call of that function: - return value < 0, because we (still) can not find the key - start_idx will point to an empty bucket Thus, we do not need the problematic loops we had there. Modified the checks to make sure we really have an empty or deleted bucket before overwriting it with data. Added some additional asserts to make sure the code behaves.	2023-02-08 14:51:54 +01:00
TW	c29d4a096b	Hashindex header work, fixes #6960 (#7064 ) support reading new, improved hashindex header format, fixes #6960 Bit of a pain to work with that code: - C code - needs to still be able to read the old hashindex file format, - while also supporting the new file format. - the hash computed while reading the file causes additional problems because it expects all places in the file get read exactly once and in sequential order. I solved this by separately opening the file in the python part of the code and checking for the magic. BORG_IDX means the legacy file format and legacy layout of the hashtable, BORG2IDX means the new file format and the new layout of the hashtable. Done: - added a version int32 directly after the magic and set it to 2 (like borg 2). the old header had no version info, but could be denoted as version 1 in case we ever need it (currently it decides based on the magic). - added num_empty as indicated by a TODO in count_empty, so it does not need a full hashtable scan to determine the amount of empty buckets. - to keep it simpler, I just filled the HashHeader struct with a `char reserved[1024 - 32];` 1024 being the desired overall header size and 32 being the currently used size. this alignment might be useful in case we mmap() the hashindex file one day.	2022-10-02 14:35:21 +02:00
Thomas Waldmann	350393c9fd	remove unused imports	2022-07-05 00:05:07 +02:00
Thomas Waldmann	2157a35212	hashindex_compact: fix eval order (check idx before use), fixes #5899 also: fix "off by one" comment	2022-06-29 20:12:29 +02:00
James Buren	5f3d61e2f0	src/borg/_hashindex.c: fix compiler warnings The value argument of hashindex_set is causing harmless pointer type mismatches. This resolves the issue by changing the type to void* which silences these types of warnings.	2022-02-26 14:02:29 -06:00
Thomas Waldmann	fa63150e14	fix bug in hashindex_set on resize, fixes #4829 the problem was that after a resize that was triggered by too few empty buckets, the rebuilt new hash table had entries at different positions than before, but the idx where to SET the entry was not recomputed afterwards.	2020-03-01 17:47:12 +01:00
Thomas Waldmann	7e7ea69e92	add comment about hashtable sizes, fixes #2830	2019-03-20 14:35:05 +01:00
motwok	78a3f2475e	Basic MSC Compatibility (#4147 ) MSC Compatibility (Native Compile on Windows)	2018-11-03 18:52:54 +01:00
Emmo Emminghaus	b2c3bc4d9e	removing accidental change	2018-11-02 14:10:23 +01:00
Emmo Emminghaus	718abfd6f8	cleanup void * in _hashindex.c (2nd)	2018-10-27 17:45:10 +02:00
Emmo Emminghaus	9d1276af04	cleanup void * in _hashindex.c	2018-10-27 17:34:25 +02:00
Thomas Waldmann	ce4d248e0b	add debug_print macro for hashindex debugging, see #3807	2018-05-19 22:26:56 +02:00
Thomas Waldmann	a3cecf599f	add parens for C preprocessor macro argument usages this is needed for correctness because the preprocessor is just doing text replacement. This is the correct way: #define MUL(x, y) ((x) * (y)) MUL(1+2, 3-4) -> ((1+2) * (3-4)) # not: (1+2 * 3-4) I didn't put parens around all arg usages for readability. Some stuff (like index) is not expected to be an expression. Also, when a arg is only used in another macro or function call, no parens are needed either. I reviewed the code: no harm was done (yet) due to this fault. Thanks to @rciorba who found this.	2017-12-13 04:01:59 +01:00
Radu Ciorba	12e0f55991	replace modulo with if to check for wraparound in hashmap Integer division is slow, and this improves the speed of all operations on the hashmap. Benchmarked this patch on the rciorba/master-bench branch: `9e5d61e03c/results.html`	2017-07-20 13:22:34 +03:00
Marian Beermann	019a258709	create _endian.h	2017-07-11 19:12:19 +02:00
Marian Beermann	9a856533ba	fuse: versions view, linear numbering by archive time	2017-07-03 12:38:10 +02:00
Marian Beermann	9827578df5	hashindex: don't pass side effect into macro Py_XDECREF and friends are explicitly written to use op only once in CPython (and other code relies on this, Py_XDECREF(something()) is fairly common), but other implementations don't guarantee this. So, let's make a rule: don't pass side effects into macros, full stop.	2017-07-02 15:20:52 +02:00
Thomas Waldmann	72ef24cbc0	hashindex: implement KeyError	2017-06-17 23:25:32 +02:00
Marian Beermann	783a5926d6	cache sync: introduce BORG_NO_PYTHON textshell edition	2017-06-11 20:23:17 +02:00
enkore	13f396d5ad	Merge pull request #2638 from enkore/f/fastcachesync-minify Compact chunks.archive.d	2017-06-10 17:13:25 +02:00
enkore	d06ee5648c	Merge pull request #2572 from enkore/f/fastcachesync Improve cache sync speed	2017-06-10 17:12:51 +02:00
Marian Beermann	1d5d50463c	hashindex_compact: use memmove for possibly overlapping copy	2017-06-09 12:23:27 +02:00
Marian Beermann	6e011b9354	cache: compact hashindex before writing to chunks.archive.d	2017-06-09 12:23:26 +02:00
Marian Beermann	b75c214af5	hashindex: Cython defines PY_SSIZE_T_CLEAN	2017-06-08 15:16:52 +02:00
Marian Beermann	c786a5941e	CacheSynchronizer: redo as quasi FSM on top of unpack.h This is a (relatively) simple state machine running in the data callbacks invoked by the msgpack unpacking stack machine (the same machine is used in msgpack-c and msgpack-python, changes are minor and cosmetic, e.g. removal of msgpack_unpack_object, removal of the C++ template thus porting to C and so on). Compared to the previous solution this has multiple advantages - msgpack-c dependency is removed - this approach is faster and requires fewer and smaller memory allocations Testability of the two solutions does not differ in my professional opinion(tm). Two other changes were rolled up; _hashindex.c can be compiled without Python.h again (handy for fuzzing and testing); a "small" bug in the cache sync was fixed which allocated too large archive indices, leading to excessive archive.chunks.d disk usage (that actually gave me an idea).	2017-06-02 17:43:15 +02:00
Marian Beermann	d463dd89aa	hashindex: read/write: use hash_part for HashHeader	2017-05-25 17:44:01 +02:00
enkore	6cd7d415ca	hashindex: Use Python I/O (#2496 ) - Preparation for #1688 / #1101 - Support hash indices >2 GB - Better error reporting	2017-05-09 21:30:14 +02:00
Thomas Waldmann	fd0649767a	hashindex: rebuild hashtable if we have too little empty buckets, fixes #2246 if there are too many deleted buckets (tombstones), hashtable performance goes down the drain. in the worst case of 0 empty buckets and lots of tombstones, this results in full table scans for new / unknown keys. thus we make sure we always have a good amount of empty buckets.	2017-03-04 03:09:18 +01:00
TW	c6ea34be96	Merge pull request #2111 from ThomasWaldmann/merge-1.0-maint Merge 1.0-maint	2017-02-01 12:13:37 +01:00
Radu Ciorba	a85cf75465	fix wrong skip_hint on hashindex_set when encountering tombstones hashindex_lookup would always hint at skipping whatever it's probe length had been with no regard for tombstones it had encountered. This meant new keys would not overwrite first tombstones, but would always land on empty buckets. The regression was introduced in #1748	2017-01-30 23:29:08 +02:00
Thomas Waldmann	c0dc644ef6	Merge branch '1.0-maint' into merge-1.0-maint # Conflicts: # MANIFEST.in # Vagrantfile # docs/changes.rst # docs/usage/mount.rst.inc # src/borg/archiver.py # src/borg/fuse.py # src/borg/repository.py	2017-01-29 05:49:53 +01:00
Radu Ciorba	766a43afc3	shortcut hashindex_set by having hashindex_lookup hint about address	2016-10-22 12:33:15 +03:00
Marian Beermann	e9a73b808f	Check for sufficient free space before committing	2016-07-30 00:04:27 +02:00
Thomas Waldmann	3baa8a3728	Merge branch '1.0-maint' # Conflicts: # docs/changes.rst # docs/usage/mount.rst.inc # src/borg/archive.py # src/borg/archiver.py # src/borg/fuse.py # src/borg/testsuite/archiver.py	2016-07-11 01:23:27 +02:00
Thomas Waldmann	d1ea925a5b	move borg package to src/	2016-05-05 20:19:50 +02:00

48 Commits