improve cache / index docs, esp. files cache docs, fixes #1825

2024-12-24 16:55:36 +00:00 · 2016-11-24 01:53:23 +01:00 · 2016-11-24 01:53:23 +01:00 · c8b58e0fd8
commit c8b58e0fd8
parent df5482d7fc
1 changed file with 76 additions and 26 deletions
--- a/docs/internals.rst
+++ b/docs/internals.rst
@@ -252,44 +252,94 @@ For some more general usage hints see also ``--chunker-params``.
 Indexes / Caches
 ----------------
-The **files cache** is stored in ``cache/files`` and is indexed on the
+The **files cache** is stored in ``cache/files`` and is used at backup time to
-``file path hash``. At backup time, it is used to quickly determine whether we
+quickly determine whether a given file is unchanged and we have all its chunks.
 need to chunk a given file (or whether it is unchanged and we already have all
 its pieces).
 It contains:
-* age
+The files cache is a key -> value mapping and contains:
 * file inode number
 * file size
 * file mtime_ns
 * file content chunk hashes
-The inode number is stored to make sure we distinguish between
+* key:
  - full, absolute file path id_hash
 * value:
  - file inode number
  - file size
  - file mtime_ns
  - list of file content chunk id hashes
  - age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
 To determine whether a file has not changed, cached values are looked up via
 the key in the mapping and compared to the current file attribute values.
 If the file's size, mtime_ns and inode number are still the same, it is
 considered to not have changed. In that case, we check that all file content
 chunks are (still) present in the repository (we check that via the chunks
 cache).
 If everything is matching and all chunks are present, the file is not read /
 chunked / hashed again (but still a file metadata item is written to the
 archive, made from fresh file metadata read from the filesystem). This is
 what makes borg so fast when processing unchanged files.
 If there is a mismatch or a chunk is missing, the file is read / chunked /
 hashed. Chunks already present in repo won't be transferred to repo again.
 The inode number is stored and compared to make sure we distinguish between
 different files, as a single path may not be unique across different
 archives in different setups.
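
The lookup described above can be sketched roughly as follows. This is only an illustration, not borg's actual code: ``FileCacheEntry``, ``known_unchanged`` and the plain-dict caches are hypothetical names, and the ``ignore_inode`` parameter merely models the ``--ignore-inode`` option mentioned in this section.

```python
from typing import NamedTuple, Optional


class FileCacheEntry(NamedTuple):
    """Hypothetical stand-in for one files cache value."""
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: list
    age: int


def known_unchanged(files_cache, chunks_cache, path_hash,
                    inode, size, mtime_ns, ignore_inode=False) -> Optional[list]:
    """Return the cached chunk ids if the file can be skipped, else None."""
    entry = files_cache.get(path_hash)
    if entry is None:
        return None  # never seen before: must read / chunk / hash
    if entry.size != size or entry.mtime_ns != mtime_ns:
        return None  # attributes differ: treat the file as changed
    if not ignore_inode and entry.inode != inode:
        return None  # same path, but possibly a different file
    if not all(chunk_id in chunks_cache for chunk_id in entry.chunk_ids):
        return None  # a chunk is missing: must read / chunk / hash again
    return entry.chunk_ids
```

A ``None`` result corresponds to the "read / chunk / hash" path; a list result means only a fresh metadata item needs to be written.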
-The files cache is stored as a python associative array storing
+Not all filesystems have stable inode numbers. If that is the case, borg can
-python objects, which generates a lot of overhead.
+be told to ignore the inode number in the check via --ignore-inode.
-The **chunks cache** is stored in ``cache/chunks`` and is indexed on the
+The age value is used for cache management. If a file is "seen" in a backup
-``chunk id_hash``. It is used to determine whether we already have a specific
+run, its age is reset to 0, otherwise its age is incremented by one.
-chunk, to count references to it and also for statistics.
+If a file was not seen in BORG_FILES_CACHE_TTL backups, its cache entry is
-It contains:
+removed. See also: :ref:`always_chunking` and :ref:`a_status_oddity`
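
The age handling can be pictured with a minimal sketch like the one below. It assumes the cache is a plain dict whose values carry an ``age`` field and that aging happens once per backup run; ``age_files_cache``, ``seen_keys`` and ``ttl`` are hypothetical names, with ``ttl`` standing in for ``BORG_FILES_CACHE_TTL``.

```python
def age_files_cache(files_cache, seen_keys, ttl):
    """Return a new cache with ages updated after one backup run.

    Entries seen in this run get age 0, all others get one run older.
    Entries whose age would reach ``ttl`` are dropped, so stored ages
    stay within 0 .. ttl - 1.
    """
    aged = {}
    for key, entry in files_cache.items():
        age = 0 if key in seen_keys else entry["age"] + 1
        if age < ttl:
            aged[key] = {**entry, "age": age}
    return aged
```

With ``ttl=3``, for example, an entry survives two consecutive runs in which its file is not seen and is removed on the third.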
-* reference count
+The files cache is a python dictionary, storing python objects, which
-* size
+generates a lot of overhead.
 * encrypted/compressed size
-The **repository index** is stored in ``repo/index.%d`` and is indexed on the
+Borg can also work without using the files cache (saves memory if you have a
-``chunk id_hash``. It is used to determine a chunk's location in the repository.
+lot of files or not much RAM free), then all files are assumed to have changed.
-It contains:
+This is usually much slower than with files cache.
-* segment (that contains the chunk)
+The **chunks cache** is stored in ``cache/chunks`` and is used to determine
-* offset (where the chunk is located in the segment)
+whether we already have a specific chunk, to count references to it and also
 for statistics.
 The chunks cache is a key -> value mapping and contains:
 * key:
  - chunk id_hash
 * value:
  - reference count
  - size
  - encrypted/compressed size
 The chunks cache is a hashindex, a hash table implemented in C and tuned for
 memory efficiency.
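
As a rough illustration of how such a mapping supports deduplication and reference counting, consider the sketch below. It uses a plain Python dict instead of the C hashindex, and ``add_chunk`` and the ``upload`` callback are hypothetical names, not borg's API.

```python
def add_chunk(chunks_cache, chunk_id, size, csize, upload):
    """Record one more reference to a chunk; upload it only if it is new.

    ``chunks_cache`` maps chunk id -> (reference count, size, csize).
    Returns True if the chunk had to be uploaded, False otherwise.
    """
    entry = chunks_cache.get(chunk_id)
    if entry is not None:
        refcount, known_size, known_csize = entry
        # Already known: just count the additional reference.
        chunks_cache[chunk_id] = (refcount + 1, known_size, known_csize)
        return False
    upload(chunk_id)  # new chunk: transfer it once
    chunks_cache[chunk_id] = (1, size, csize)
    return True
```

Calling ``add_chunk`` twice with the same id triggers a single upload and leaves the reference count at 2.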
 The **repository index** is stored in ``repo/index.%d`` and is used to
 determine a chunk's location in the repository.
 The repo index is a key -> value mapping and contains:
 * key:
  - chunk id_hash
 * value:
  - segment (that contains the chunk)
  - offset (where the chunk is located in the segment)
 The repo index is a hashindex, a hash table implemented in C and tuned for
 memory efficiency.
 The repository index file is random access.
 Hints are stored in a file (``repo/hints.%d``).
 It contains:
 * version