mirror of
https://github.com/borgbackup/borg.git
synced 2024-12-22 15:57:15 +00:00
docs: internals: more HashIndex details
parent cf77aa53d7
commit 19b425a5c8
1 changed file with 30 additions and 3 deletions
@@ -499,7 +499,11 @@ The chunks cache is a key -> value mapping and contains:
 - size
 - encrypted/compressed size
 
-The chunks cache is a HashIndex_.
+The chunks cache is a HashIndex_. Due to some restrictions of HashIndex,
+the reference count of each given chunk is limited to a constant, MAX_VALUE
+(introduced below in HashIndex_), approximately 2**32.
+If a reference count hits MAX_VALUE, decrementing it yields MAX_VALUE again,
+i.e. the reference count is pinned to MAX_VALUE.
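
The pinning rule described in the added lines can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names `incref` and `decref` are hypothetical, and the saturating increment is an assumption derived from "limited to a constant".

```python
# MAX_VALUE as defined in the HashIndex section of the same document.
MAX_VALUE = 2**32 - 1025


def incref(refcount: int) -> int:
    """Increment a reference count, saturating at MAX_VALUE (assumed behaviour)."""
    return min(refcount + 1, MAX_VALUE)


def decref(refcount: int) -> int:
    """Decrement a reference count; a count that hit MAX_VALUE stays pinned."""
    if refcount == MAX_VALUE:
        # The true count is no longer known, so it is never lowered again.
        return MAX_VALUE
    if refcount <= 0:
        raise ValueError("reference count is already zero")
    return refcount - 1
```

Once a count is pinned, the chunk can no longer reach a reference count of zero through decrements alone.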
 
 .. _cache-memory-usage:
 
@@ -598,9 +602,32 @@ outputs of a cryptographic hash or MAC and thus already have excellent distribut
 Thus, HashIndex simply uses the first 32 bits of the key as its "hash".
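
A rough sketch of how such a "hash" could select a bucket, assuming the first four key bytes are read as a little-endian unsigned integer and reduced modulo the number of buckets. The function name `bucket_index` and the byte order are assumptions for illustration, not taken from the diff.

```python
import struct


def bucket_index(key: bytes, num_buckets: int) -> int:
    """Return the starting bucket for a key (hypothetical helper)."""
    # Interpret the first 32 bits of the key as an unsigned integer.
    # Little-endian byte order is an assumption of this sketch.
    (key_hash,) = struct.unpack_from("<I", key)
    return key_hash % num_buckets
```

Because the keys are already outputs of a cryptographic hash or MAC, no further mixing step is applied before the modulo.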
 
 The format is easy to read and write, because the buckets array has the same layout
-in memory and on disk. Only the header formats differ.
+in memory and on disk. Only the header formats differ. The on-disk header is
+``struct HashHeader``:
 
-.. todo:: Describe HashHeader
+- First, the HashIndex magic, the eight byte ASCII string "BORG_IDX".
+- Second, the signed 32-bit number of entries (i.e. buckets which are not deleted and not empty).
+- Third, the signed 32-bit number of buckets, i.e. the length of the buckets array
+  contained in the file, and the modulus for index calculation.
+- Fourth, the signed 8-bit length of keys.
+- Fifth, the signed 8-bit length of values. This has to be at least four bytes.
+
+All fields are packed.
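
The header fields listed above could be read with Python's `struct` module roughly as follows. This is a sketch under stated assumptions: little-endian byte order is assumed for the integer fields, and the names `HashHeader`, `parse_header` and the field names are illustrative rather than taken from the Borg sources.

```python
import struct
from collections import namedtuple

# "<" means little-endian (assumed) with no padding, matching "all fields are packed":
# 8-byte magic, two signed 32-bit integers, two signed 8-bit integers.
HEADER_FORMAT = "<8siibb"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 + 4 + 4 + 1 + 1 bytes
MAGIC = b"BORG_IDX"

# Field names are hypothetical; the order follows the list in the text.
HashHeader = namedtuple(
    "HashHeader", "magic num_entries num_buckets key_size value_size"
)


def parse_header(data: bytes) -> HashHeader:
    """Parse and sanity-check the on-disk header at the start of data."""
    header = HashHeader(*struct.unpack_from(HEADER_FORMAT, data))
    if header.magic != MAGIC:
        raise ValueError("not a HashIndex file: bad magic")
    if header.value_size < 4:
        raise ValueError("value size must be at least 4 bytes")
    return header
```

With this layout the buckets array would start right after the header, at offset `HEADER_SIZE`, and hold `num_buckets` entries of `key_size + value_size` bytes each.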
+
+The HashIndex is *not* a general purpose data structure.
+The value size must be at least 4 bytes, and these first bytes are used for in-band
+signalling in the data structure itself.
+
+The constant MAX_VALUE (defined as 2**32-1025 = 4294966271) defines the valid range for
+these 4 bytes when interpreted as an uint32_t from 0 to MAX_VALUE (inclusive).
+The following reserved values beyond MAX_VALUE are currently in use (byte order is LE):
+
+- 0xffffffff marks empty buckets in the hash table
+- 0xfffffffe marks deleted buckets in the hash table
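
The in-band signalling described above can be illustrated by classifying the first four bytes of a bucket's value. This is a sketch only; the constant names `EMPTY` and `DELETED` and the function `classify_bucket` are hypothetical.

```python
import struct

MAX_VALUE = 2**32 - 1025  # 4294966271, upper bound for ordinary values
EMPTY = 0xFFFFFFFF        # marks an empty bucket
DELETED = 0xFFFFFFFE      # marks a deleted bucket


def classify_bucket(value: bytes) -> str:
    """Classify a bucket by the first 4 bytes of its value (little-endian)."""
    (first,) = struct.unpack_from("<I", value)
    if first == EMPTY:
        return "empty"
    if first == DELETED:
        return "deleted"
    if first <= MAX_VALUE:
        return "used"
    # Beyond MAX_VALUE but not currently assigned a meaning.
    return "reserved"
```

The range between MAX_VALUE and the two markers leaves room for further reserved values without changing the valid range of ordinary values.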
+
+HashIndex is implemented in C and wrapped with Cython in a class-based interface.
+The Cython wrapper checks every passed value against these reserved values and
+raises an AssertionError if they are used.
 
 Encryption
 ----------