mirror of https://github.com/borgbackup/borg.git
editing
This commit is contained in:
parent
45ee62e5ea
commit
b8e40fdce6
|
@ -729,11 +729,22 @@ were designed to handle corrupted data structures, so a corrupted files cache
|
||||||
may cause crashes or write incorrect archives.
|
may cause crashes or write incorrect archives.
|
||||||
|
|
||||||
Therefore, Borg calculates checksums when writing these files and tests checksums
|
Therefore, Borg calculates checksums when writing these files and tests checksums
|
||||||
when reading them. Checksums are generally 64-bit XXH64 checksums.
|
when reading them. Checksums are generally 64-bit XXH64 hashes.
|
||||||
|
The canonical xxHash representation is used, i.e. big-endian.
|
||||||
|
Checksums are stored as hexadecimal ASCII strings.
|
||||||
|
|
||||||
|
For compatibility, checksums are not required and absent checksums do not trigger errors.
|
||||||
|
The mechanisms have been designed to avoid false-positives when various Borg
|
||||||
|
versions are used alternately on the same repositories.
|
||||||
|
|
||||||
|
Checksums are a data safety mechanism. They are not a security mechanism.
|
||||||
|
|
||||||
|
.. rubric:: Choice of algorithm
|
||||||
|
|
||||||
XXH64 has been chosen for its high speed on all platforms, which avoids performance
|
XXH64 has been chosen for its high speed on all platforms, which avoids performance
|
||||||
degradation in CPU-limited parts (e.g. cache synchronization). Unlike CRC32,
|
degradation in CPU-limited parts (e.g. cache synchronization).
|
||||||
it does neither require hardware support (crc32c or CLMUL) nor vectorized code
|
Unlike CRC32, it neither requires hardware support (crc32c or CLMUL)
|
||||||
nor large, cache-unfriendly lookup tables to achieve good performance.
|
nor vectorized code nor large, cache-unfriendly lookup tables to achieve good performance.
|
||||||
This simplifies deployment of it considerably (cf. src/borg/algorithms/crc32...).
|
This simplifies deployment of it considerably (cf. src/borg/algorithms/crc32...).
|
||||||
|
|
||||||
Further, XXH64 is a non-linear hash function and thus has a "more or less" good
|
Further, XXH64 is a non-linear hash function and thus has a "more or less" good
|
||||||
|
@ -742,32 +753,36 @@ of detection decreases with error size.
|
||||||
|
|
||||||
The 64-bit checksum length is considered sufficient for the file sizes typically
|
The 64-bit checksum length is considered sufficient for the file sizes typically
|
||||||
checksummed (individual files up to a few GB, usually less).
|
checksummed (individual files up to a few GB, usually less).
|
||||||
|
xxHash was expressly designed for data blocks of these sizes.
|
||||||
The canonical xxHash representation is used, i.e. big-endian.
|
|
||||||
Checksums are generally stored as hexadecimal ASCII strings.
|
|
||||||
|
|
||||||
Lower layer — file_integrity
|
Lower layer — file_integrity
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
To accommodate the different transaction models used for the cache and repository,
|
To accommodate the different transaction models used for the cache and repository,
|
||||||
there is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile) which
|
there is a lower layer (borg.crypto.file_integrity.IntegrityCheckedFile)
|
||||||
wraps a file-like object and performs streaming calculation and comparison of checksums.
|
wrapping a file-like object, performing streaming calculation and comparison of checksums.
|
||||||
Checksum errors are signalled by raising an exception (borg.crypto.file_integrity.FileIntegrityError)
|
Checksum errors are signalled by raising an exception (borg.crypto.file_integrity.FileIntegrityError)
|
||||||
at the earliest possible moment.
|
at the earliest possible moment.
|
||||||
|
|
||||||
.. rubric:: Calculating checksums
|
.. rubric:: Calculating checksums
|
||||||
|
|
||||||
|
Before feeding the checksum algorithm any data, the file name (i.e. without any path)
|
||||||
|
is mixed into the checksum, since the name encodes the context of the data for Borg.
|
||||||
|
|
||||||
The various indices used by Borg have separate header and main data parts.
|
The various indices used by Borg have separate header and main data parts.
|
||||||
IntegrityCheckedFile allows to checksum them independently, which avoids
|
IntegrityCheckedFile allows to checksum them independently, which avoids
|
||||||
even reading the data when the header is corrupted. When a part is signalled,
|
even reading the data when the header is corrupted. When a part is signalled,
|
||||||
the length of the pathname is mixed into the checksum state first (encoded
|
the length of the part name is mixed into the checksum state first (encoded
|
||||||
as an ASCII string via `%10d` printf format), then the name of the part
|
as an ASCII string via `%10d` printf format), then the name of the part
|
||||||
is mixed in as an UTF-8 string. Lastly, the current position (length)
|
is mixed in as an UTF-8 string. Lastly, the current position (length)
|
||||||
in the file is mixed in as well.
|
in the file is mixed in as well.
|
||||||
|
|
||||||
The checksum state is not reset at part boundaries.
|
The checksum state is not reset at part boundaries.
|
||||||
|
|
||||||
A final checksum is always calculated from the entire state.
|
A final checksum is always calculated in the same way as the parts described above,
|
||||||
|
after seeking to the end of the file. The final checksum cannot prevent code
|
||||||
|
from processing corrupted data during reading, however, it prevents use of the
|
||||||
|
corrupted data.
|
||||||
|
|
||||||
.. rubric:: Serializing checksums
|
.. rubric:: Serializing checksums
|
||||||
|
|
||||||
|
@ -790,7 +805,8 @@ The *digests* key contains a mapping of part names to their digests.
|
||||||
|
|
||||||
Integrity data is generally stored by the upper layers, introduced below. An exception
|
Integrity data is generally stored by the upper layers, introduced below. An exception
|
||||||
is the DetachedIntegrityCheckedFile, which automatically writes and reads it from
|
is the DetachedIntegrityCheckedFile, which automatically writes and reads it from
|
||||||
a ".integrity" file next to the data file. It is used for archive chunks in chunks.archive.d.
|
a ".integrity" file next to the data file.
|
||||||
|
It is used for archive chunks indexes in chunks.archive.d.
|
||||||
|
|
||||||
Upper layer
|
Upper layer
|
||||||
~~~~~~~~~~~
|
~~~~~~~~~~~
|
||||||
|
@ -840,8 +856,8 @@ and are not automatically corrected at this time.
|
||||||
|
|
||||||
.. rubric:: chunks.archive.d
|
.. rubric:: chunks.archive.d
|
||||||
|
|
||||||
Indices in chunks.archive.d are not transacted and use DetachedIntegrityCheckedFile, which
|
Indices in chunks.archive.d are not transacted and use DetachedIntegrityCheckedFile,
|
||||||
writes the integrity data to a separate ".integrity" file.
|
which writes the integrity data to a separate ".integrity" file.
|
||||||
|
|
||||||
Integrity errors result in deleting the affected index and rebuilding it.
|
Integrity errors result in deleting the affected index and rebuilding it.
|
||||||
This logs a warning and increases the exit code to WARNING (1).
|
This logs a warning and increases the exit code to WARNING (1).
|
||||||
|
|
Loading…
Reference in New Issue