Mirror of https://github.com/borgbackup/borg.git, synced 2024-12-24 16:55:36 +00:00
improve cache / index docs, esp. files cache docs, fixes #1825
parent df5482d7fc
commit c8b58e0fd8
1 changed file with 76 additions and 26 deletions
@@ -252,44 +252,94 @@ For some more general usage hints see also ``--chunker-params``.
 Indexes / Caches
 ----------------

-The **files cache** is stored in ``cache/files`` and is indexed on the
-``file path hash``. At backup time, it is used to quickly determine whether we
-need to chunk a given file (or whether it is unchanged and we already have all
-its pieces).
-It contains:
+The **files cache** is stored in ``cache/files`` and is used at backup time to
+quickly determine whether a given file is unchanged and we have all its chunks.

-* age
-* file inode number
-* file size
-* file mtime_ns
-* file content chunk hashes
+The files cache is a key -> value mapping and contains:

-The inode number is stored to make sure we distinguish between
+* key:
+
+  - full, absolute file path id_hash
+* value:
+
+  - file inode number
+  - file size
+  - file mtime_ns
+  - list of file content chunk id hashes
+  - age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
+
+To determine whether a file has not changed, cached values are looked up via
+the key in the mapping and compared to the current file attribute values.
+
+If the file's size, mtime_ns and inode number is still the same, it is
+considered to not have changed. In that case, we check that all file content
+chunks are (still) present in the repository (we check that via the chunks
+cache).
+
+If everything is matching and all chunks are present, the file is not read /
+chunked / hashed again (but still a file metadata item is written to the
+archive, made from fresh file metadata read from the filesystem). This is
+what makes borg so fast when processing unchanged files.
+
+If there is a mismatch or a chunk is missing, the file is read / chunked /
+hashed. Chunks already present in repo won't be transferred to repo again.
+
+The inode number is stored and compared to make sure we distinguish between
 different files, as a single path may not be unique across different
 archives in different setups.

-The files cache is stored as a python associative array storing
-python objects, which generates a lot of overhead.
+Not all filesystems have stable inode numbers. If that is the case, borg can
+be told to ignore the inode number in the check via --ignore-inode.

-The **chunks cache** is stored in ``cache/chunks`` and is indexed on the
-``chunk id_hash``. It is used to determine whether we already have a specific
-chunk, to count references to it and also for statistics.
-It contains:
+The age value is used for cache management. If a file is "seen" in a backup
+run, its age is reset to 0, otherwise its age is incremented by one.
+If a file was not seen in BORG_FILES_CACHE_TTL backups, its cache entry is
+removed. See also: :ref:`always_chunking` and :ref:`a_status_oddity`

-* reference count
-* size
-* encrypted/compressed size
+The files cache is a python dictionary, storing python objects, which
+generates a lot of overhead.

-The **repository index** is stored in ``repo/index.%d`` and is indexed on the
-``chunk id_hash``. It is used to determine a chunk's location in the repository.
-It contains:
+Borg can also work without using the files cache (saves memory if you have a
+lot of files or not much RAM free), then all files are assumed to have changed.
+This is usually much slower than with files cache.

-* segment (that contains the chunk)
-* offset (where the chunk is located in the segment)
+The **chunks cache** is stored in ``cache/chunks`` and is used to determine
+whether we already have a specific chunk, to count references to it and also
+for statistics.
+
+The chunks cache is a key -> value mapping and contains:
+
+* key:
+
+  - chunk id_hash
+* value:
+
+  - reference count
+  - size
+  - encrypted/compressed size
+
+The chunks cache is a hashindex, a hash table implemented in C and tuned for
+memory efficiency.
+
+The **repository index** is stored in ``repo/index.%d`` and is used to
+determine a chunk's location in the repository.
+
+The repo index is a key -> value mapping and contains:
+
+* key:
+
+  - chunk id_hash
+* value:
+
+  - segment (that contains the chunk)
+  - offset (where the chunk is located in the segment)
+
+The repo index is a hashindex, a hash table implemented in C and tuned for
+memory efficiency.

-The repository index file is random access.

 Hints are stored in a file (``repo/hints.%d``).

 It contains:

 * version
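
As an illustration of the files cache check and the age rule described in the new text of this diff, here is a minimal Python sketch. It is not borg's actual implementation: ``FilesCacheEntry``, ``cached_chunk_ids``, ``age_entries`` and the ``ttl`` parameter (standing in for ``BORG_FILES_CACHE_TTL``) are hypothetical names, and both caches are modelled as plain dictionaries keyed by opaque ``bytes`` hashes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FilesCacheEntry:
    """Value part of the files cache mapping (field names are assumptions)."""
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: List[bytes]
    age: int = 0  # 0 = seen in the most recent backup run


# Hypothetical model of the chunks cache:
# chunk id -> (reference count, size, encrypted/compressed size)
ChunksCache = Dict[bytes, Tuple[int, int, int]]


def cached_chunk_ids(
    files_cache: Dict[bytes, FilesCacheEntry],
    chunks_cache: ChunksCache,
    path_hash: bytes,
    inode: int,
    size: int,
    mtime_ns: int,
    ignore_inode: bool = False,
) -> Optional[List[bytes]]:
    """Return the cached chunk ids if the file counts as unchanged, else None."""
    entry = files_cache.get(path_hash)
    if entry is None:
        return None  # never seen: the file must be read / chunked / hashed
    if entry.size != size or entry.mtime_ns != mtime_ns:
        return None  # metadata mismatch
    if not ignore_inode and entry.inode != inode:
        return None  # different inode: treated as a different file
    # Metadata matches; every content chunk must still be known to the chunks cache.
    if not all(chunk_id in chunks_cache for chunk_id in entry.chunk_ids):
        return None
    return entry.chunk_ids


def age_entries(
    files_cache: Dict[bytes, FilesCacheEntry], seen: Set[bytes], ttl: int
) -> None:
    """After a backup run: reset the age of seen entries, age and expire the rest."""
    for key in list(files_cache):
        entry = files_cache[key]
        if key in seen:
            entry.age = 0
        else:
            entry.age += 1
            if entry.age >= ttl:
                # Not seen in `ttl` consecutive runs: drop the entry.
                del files_cache[key]
```

A caller would treat a ``None`` result as "read, chunk and hash the file again", and would pass ``ignore_inode=True`` to mimic the effect the text attributes to ``--ignore-inode``. With this aging rule, stored ages stay in the range ``0 .. ttl - 1``, matching the range given in the list above.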