mirror of
https://github.com/borgbackup/borg.git
synced 2024-12-25 17:27:31 +00:00
Merge pull request #2530 from enkore/f/compact-revisit@2
Repository compaction docs
commit 58791583d9
4 changed files with 51 additions and 8 deletions
BIN
docs/internals/compaction.png
Normal file
Binary file not shown (size: 757 KiB).
BIN
docs/internals/compaction.vsd
Normal file
Binary file not shown.
@@ -122,11 +122,49 @@ such obsolete entries is called sparse, while a segment containing no such entri
 Since writing a ``DELETE`` tag does not actually delete any data and
 thus does not free disk space any log-based data store will need a
-compaction strategy.
+compaction strategy (somewhat analogous to a garbage collector).
+Borg uses a simple forward compacting algorithm,
+which avoids modifying existing segments.
+Compaction runs when a commit is issued (unless the :ref:`append_only_mode` is active).
+One client transaction can manifest as multiple physical transactions,
+since compaction is transacted, too, and Borg does not distinguish between the two::

-Borg tracks which segments are sparse and does a forward compaction
-when a commit is issued (unless the :ref:`append_only_mode` is
-active).
+    Perspective| Time -->
+    -----------+--------------
+    Client     | Begin transaction - Modify Data - Commit | <client waits for repository> (done)
+    Repository | Begin transaction - Modify Data - Commit | Compact segments - Commit | (done)
+
+The compaction algorithm requires two inputs in addition to the segments themselves:
+
+(i) Which segments are sparse, to avoid scanning all segments (impractical).
+    Further, Borg uses a conditional compaction strategy: Only those
+    segments that exceed a threshold sparsity are compacted.
+
+    To implement the threshold condition efficiently, the sparsity has
+    to be stored as well. Therefore, Borg stores a mapping ``(segment
+    id,) -> (number of sparse bytes,)``.
+
+    The 1.0.x series used a simpler non-conditional algorithm,
+    which only required the list of sparse segments. Thus,
+    it only stored a list, not the mapping described above.
+(ii) Each segment's reference count, which indicates how many live objects are in a segment.
+     This is not strictly required to perform the algorithm. Rather, it is used to validate
+     that a segment is unused before deleting it. If the algorithm is incorrect, or the reference
+     count was not accounted correctly, then an assertion failure occurs.
+
+These two pieces of information are stored in the hints file (`hints.N`)
+next to the index (`index.N`).
+
+When loading a hints file, Borg checks the version contained in the file.
+The 1.0.x series writes version 1 of the format (with the segments list instead
+of the mapping, mentioned above). Since Borg 1.0.4, version 2 is read as well.
+The 1.1.x series writes version 2 of the format and reads either version.
+When reading a version 1 hints file, Borg 1.1.x will
+read all sparse segments to determine their sparsity.
+
+This process may take some time if a repository is kept in the append-only mode,
+which causes the number of sparse segments to grow. Repositories not in append-only
+mode have no sparse segments in 1.0.x, since compaction is unconditional.

 Compaction processes sparse segments from oldest to newest; sparse segments
 which don't contain enough deleted data to justify compaction are skipped. This
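The conditional strategy described in this hunk (compact only segments above a threshold sparsity, oldest first) can be illustrated with a minimal sketch. It is not Borg's implementation: the function name, the ``segment_sizes`` input and the 10 % threshold are assumptions made for the example; only the ``segment id -> sparse bytes`` mapping comes from the text above.

```python
def segments_to_compact(sparse_bytes, segment_sizes, threshold=0.1):
    """Return ids of segments worth compacting, oldest (lowest id) first.

    sparse_bytes:  {segment_id: number of sparse bytes}, as in the hints mapping
    segment_sizes: {segment_id: total size in bytes} (hypothetical input)
    threshold:     minimum sparse fraction (assumed value, for illustration)
    """
    selected = []
    for segment_id in sorted(sparse_bytes):
        size = segment_sizes[segment_id]
        # Skip segments that do not hold enough deleted data to justify a rewrite.
        if size > 0 and sparse_bytes[segment_id] / size > threshold:
            selected.append(segment_id)
    return selected


sparse = {3: 4096, 7: 900_000, 12: 2_500_000}
sizes = {3: 5_000_000, 7: 5_000_000, 12: 5_000_000}
print(segments_to_compact(sparse, sizes))  # segment 3 is skipped as barely sparse
```

Because the sparsity is stored per segment, the threshold check needs no segment I/O, which is the point of keeping a mapping instead of a plain list.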
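The hints version handling described in the first hunk (version 1 stores a list of sparse segments, version 2 stores the sparsity mapping) could look roughly like the sketch below. The dictionary keys ``sparse_segments`` and ``sparse_bytes`` and the ``count_sparse_bytes`` callback are hypothetical names chosen for this example, not the on-disk format.

```python
def load_sparsity(hints, count_sparse_bytes):
    """Return {segment_id: sparse_bytes} from a version 1 or version 2 hints dict.

    hints:              already-decoded hints data (key names are assumptions)
    count_sparse_bytes: callable that reads one segment and returns its
                        number of sparse bytes (hypothetical helper)
    """
    version = hints["version"]
    if version == 1:
        # Version 1 only lists the sparse segments, so every listed segment
        # has to be read once to determine how sparse it is.
        return {seg: count_sparse_bytes(seg) for seg in hints["sparse_segments"]}
    if version == 2:
        # Version 2 already carries the mapping.
        return dict(hints["sparse_bytes"])
    raise ValueError(f"unsupported hints version: {version!r}")


# Stand-in for actually scanning a segment file.
print(load_sparsity({"version": 1, "sparse_segments": [4, 9]}, lambda seg: seg * 100))
```

The version 1 branch is the one that can be slow on append-only repositories, since its cost grows with the number of sparse segments.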
@@ -135,8 +173,14 @@ a couple kB were deleted in a segment.
 Segments that are compacted are read in entirety. Current entries are written to
 a new segment, while superseded entries are omitted. After each segment an intermediary
-commit is written to the new segment, data is synced and the old segment is deleted --
-freeing disk space.
+commit is written to the new segment. Then, the old segment is deleted
+(asserting that the reference count diminished to zero), freeing disk space.

+A simplified example (excluding conditional compaction and with simpler
+commit logic) showing the principal operation of compaction:
+
+.. figure::
+    compaction.png
+
 (The actual algorithm is more complex to avoid various consistency issues, refer to
 the ``borg.repository`` module for more comments and documentation on these issues.)
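The forward compaction step described in this hunk can be sketched on an in-memory model. This is a simplified illustration under stated assumptions, not the code in ``borg.repository``: segments are lists of ``(tag, key, value)`` tuples, the index maps a key to ``(segment_id, offset)``, ``DELETE`` handling is left out, and all names are hypothetical.

```python
def compact_segment(segments, index, refcounts, old_id, new_id):
    """Copy the live entries of segment old_id to new_id, then drop old_id."""
    new_entries = segments.setdefault(new_id, [])
    refcounts.setdefault(new_id, 0)
    for offset, (tag, key, value) in enumerate(segments[old_id]):
        # An entry is current only if the index still points at this exact spot;
        # superseded or deleted entries are simply not copied.
        if tag == "PUT" and index.get(key) == (old_id, offset):
            index[key] = (new_id, len(new_entries))
            new_entries.append(("PUT", key, value))
            refcounts[new_id] += 1
            refcounts[old_id] -= 1
    # Intermediary commit in the new segment before the old one goes away.
    new_entries.append(("COMMIT", None, None))
    # The reference count is only a safety check, as the documentation says.
    assert refcounts[old_id] == 0, "segment still referenced"
    del segments[old_id]
    del refcounts[old_id]


segments = {
    1: [("PUT", "a", b"A1"), ("PUT", "b", b"B"), ("PUT", "c", b"C"), ("COMMIT", None, None)],
    2: [("PUT", "a", b"A2"), ("DELETE", "b", None), ("COMMIT", None, None)],
}
index = {"a": (2, 0), "c": (1, 2)}
refcounts = {1: 1, 2: 1}
compact_segment(segments, index, refcounts, old_id=1, new_id=3)
print(segments[3], index["c"])
```

Existing segments are never modified in place: live data moves forward into a new segment and the old file is removed as a whole.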
@@ -31,8 +31,7 @@
 # the header, and the total size was set to 20 MiB).
 MAX_DATA_SIZE = 20971479

-# A few hundred files per directory to go easy on filesystems which don't like too many files per dir (NTFS)
-DEFAULT_SEGMENTS_PER_DIR = 500
+DEFAULT_SEGMENTS_PER_DIR = 2000

 CHUNK_MIN_EXP = 19  # 2**19 == 512kiB
 CHUNK_MAX_EXP = 23  # 2**23 == 8MiB
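The arithmetic behind the constants in this hunk can be checked with a few lines. The values are copied from the hunk; the variable ``header_bytes`` is a name introduced here, and reading the difference to 20 MiB as header space follows the (truncated) comment above ``MAX_DATA_SIZE`` rather than anything verified against the code.

```python
MIB = 1024 * 1024

MAX_DATA_SIZE = 20971479
CHUNK_MIN_EXP = 19
CHUNK_MAX_EXP = 23

# Bytes that remain within a 20 MiB total once MAX_DATA_SIZE is subtracted.
header_bytes = 20 * MIB - MAX_DATA_SIZE
print(header_bytes)

# Chunk size bounds implied by the exponents.
print(2 ** CHUNK_MIN_EXP // 1024, "KiB")
print(2 ** CHUNK_MAX_EXP // MIB, "MiB")
```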