update docs

2022-09-08 21:52:32 +02:00 · 2022-09-08 21:52:32 +02:00 · c8830cde44
parent 5afe94883a
commit c8830cde44
3 changed files with 46 additions and 17 deletions
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@ -77,7 +77,7 @@ don't have a particular meaning (except for the Manifest_).

 Normally the keys are computed like this::

-  key = id = id_hash(unencrypted_data)
+  key = id = id_hash(plaintext_data)  # plain = not encrypted, not compressed, not obfuscated

 The id_hash function depends on the :ref:`encryption mode <borg_rcreate>`.

@ -98,15 +98,15 @@ followed by a number of log entries. Each log entry consists of (in this order):

 * crc32 checksum (uint32):
  - for PUT2: CRC32(size + tag + key + digest)
-  - for PUT: CRC32(size + tag + key + data)
+  - for PUT: CRC32(size + tag + key + payload)
  - for DELETE: CRC32(size + tag + key)
  - for COMMIT: CRC32(size + tag)
 * size (uint32) of the entry (including the whole header)
 * tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
 * key (256 bit) - only for PUT/PUT2/DELETE
-* data (size - 41 bytes) - only for PUT
-* xxh64 digest (64 bit) = XXH64(size + tag + key + data) - only for PUT2
-* data (size - 41 - 8 bytes) - only for PUT2
+* payload (size - 41 bytes) - only for PUT
+* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
+* payload (size - 41 - 8 bytes) - only for PUT2

 PUT2 is new since repository version 2. For new log entries PUT2 is used.
 PUT is still supported to read version 1 repositories, but not generated any more.
@ -116,7 +116,7 @@ version 2+.
 Those files are strictly append-only and modified only once.

 When an object is written to the repository a ``PUT`` entry is written
-to the file containing the object id and data. If an object is deleted
+to the file containing the object id and payload. If an object is deleted
 a ``DELETE`` entry is appended with the object id.

 A ``COMMIT`` tag is written when a repository transaction is
@ -130,13 +130,42 @@ partial/uncommitted transaction.
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.

-Objects
-~~~~~~~
+Objects / Payload structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All data (the manifest, archives, archive item stream chunks and file data
+chunks) is compressed, optionally obfuscated and encrypted. This produces some
+additional metadata (size and compression information), which is separately
+serialized and also encrypted.
+
+See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
+What you see at the bottom there is done twice: once for the data and once for the metadata.
+
+An object (the payload part of a segment file log entry) must be like:
+
+- length of encrypted metadata (16bit unsigned int)
+- encrypted metadata (incl. encryption header), when decrypted:
+
+  - msgpacked dict with:
+
+    - ctype (compression type 0..255)
+    - clevel (compression level 0..255)
+    - csize (overall compressed (and maybe obfuscated) data size)
+    - psize (only when obfuscated: payload size without the obfuscation trailer)
+    - size (uncompressed size of the data)
+- encrypted data (incl. encryption header), when decrypted:
+
+  - compressed data (with an optional all-zero-bytes obfuscation trailer)
+
+This new, more complex repo v2 object format was implemented to be able to efficiently
+query the metadata without having to read, transfer and decrypt the (usually much bigger)
+data part.
+
+The metadata is encrypted to not disclose potentially sensitive information that could be
+used for e.g. fingerprinting attacks.
+
+The compression `ctype` and `clevel` is explained in :ref:`data-compression`.

-All objects (the manifest, archives, archive item streams chunks and file data
-chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
-graphic outlining the anatomy of an object in Borg. The `type` for compression
-is explained in :ref:`data-compression`.

 Index, hints and integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -855,7 +884,7 @@ For each borg invocation, a new sessionkey is derived from the borg key material
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 to our IV, so we'll just count up by 1 per chunk).

-The chunk layout is best seen at the bottom of this diagram:
+The encryption layout is best seen at the bottom of this diagram:

 .. figure:: encryption-aead.png
    :figwidth: 100%
@ -954,14 +983,14 @@ representation of the repository id.
 Compression
 -----------

-Borg supports the following compression methods, each identified by a type
-byte:
+Borg supports the following compression methods, each identified by a ctype value
+in the range between 0 and 255 (and augmented by a clevel 0..255 value for the
+compression level):

 - none (no compression, pass through data 1:1), identified by 0x00
 - lz4 (low compression, but super fast), identified by 0x01
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
-  identified by 0x03
+  speed, level 22 is higher compression and lower speed) - identified by 0x03
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
  level 1 is low, level 9 is high compression), identified by 0x05
 - lzma (level 0-9, level 0 is low, level 9 is high compression), identified
--- a/docs/internals/encryption-aead.odg
+++ b/docs/internals/encryption-aead.odg
--- a/docs/internals/encryption-aead.png
+++ b/docs/internals/encryption-aead.png