From bd89fb717e57967e60662a144d787f09514ff54d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jonas=20Sch=C3=A4fer?=
Date: Fri, 29 Jul 2022 22:36:57 +0200
Subject: [PATCH] docs: provide more details on object layout

While writing my own out-of-band decoder, I had a hard time figuring
out how to unpack the manifest. From the description, I was only able
to read that the manifest is msgpack'd, but I had not been able to
figure out that it's also going through the same encryption+compression
logic as all other things do.

This should make it a little clearer and provide the necessary
information to understand how the compression works.
---
 docs/internals/data-structures.rst | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index 4c191ece0..0760b1b56 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -121,6 +121,14 @@ partial/uncommitted transaction.
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.
 
+Objects
+~~~~~~~
+
+All objects (the manifest, archives, archive item streams chunks and file data
+chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
+graphic outlining the anatomy of an object in Borg. The `type` for compression
+is explained in :ref:`data-compression`.
+
 Index, hints and integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -860,6 +868,8 @@ HashIndex is implemented in C and wrapped with Cython in a class-based interface
 The Cython wrapper checks every passed value against these reserved values and
 raises an AssertionError if they are used.
 
+.. _data-encryption:
+
 Encryption
 ----------
 
@@ -961,18 +971,23 @@ key file, wrapped using the standard ``textwrap`` module with a header.
 The header is a single line with a MAGIC string, a space and a hexadecimal
 representation of the repository id.
 
+.. _data-compression:
+
 Compression
 -----------
 
-Borg supports the following compression methods:
+Borg supports the following compression methods, each identified by two bytes:
 
-- none (no compression, pass through data 1:1)
-- lz4 (low compression, but super fast)
+- none (no compression, pass through data 1:1), identified by ``\x00\x00``
+- lz4 (low compression, but super fast), identified by ``\x01\x00``
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4
+  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
+  identified by ``\x03\x00``
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
-  level 1 is low, level 9 is high compression)
-- lzma (level 0-9, level 0 is low, level 9 is high compression).
+  level 1 is low, level 9 is high compression), identified by a zlib header
+  (``\x.8\x..``)
+- lzma (level 0-9, level 0 is low, level 9 is high compression), identified
+  by ``\x02\x00``.
 
 Speed:  none > lz4 > zlib > lzma, lz4 > zstd
 Compression: lzma > zlib > lz4 > none, zstd > lz4
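
For readers writing an out-of-band decoder like the one mentioned in the commit
message, the identifier table added in the last hunk can be sketched in Python
as follows. This is a minimal illustration, not Borg's implementation:
``identify_compression`` and ``FIXED_IDS`` are hypothetical names, the input is
assumed to be the already-decrypted payload with the two identifier bytes at
its start, and the zlib check is an assumption based on the zlib header rules
of RFC 1950 (compression method 8 in the low nibble of the first byte, and the
first two bytes forming a multiple of 31 when read as a big-endian integer).

```python
# Two-byte identifiers exactly as listed in the patch above.
FIXED_IDS = {
    b"\x00\x00": "none",
    b"\x01\x00": "lz4",
    b"\x02\x00": "lzma",
    b"\x03\x00": "zstd",
}


def identify_compression(data: bytes) -> str:
    """Return the compression method name suggested by the first two bytes.

    Hypothetical helper: assumes `data` is a decrypted payload that starts
    with the two-byte compression identifier.
    """
    if len(data) < 2:
        raise ValueError("need at least two bytes to identify the compression")

    head = bytes(data[:2])
    if head in FIXED_IDS:
        return FIXED_IDS[head]

    # zlib has no dedicated identifier in the table; it is recognised by its
    # own stream header (the ``\x.8\x..`` pattern from the patch).
    cmf, flg = head
    if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
        return "zlib"

    raise ValueError(f"unknown compression identifier: {head!r}")
```

The fixed identifiers are looked up before the zlib test, so a decoder built
this way only falls back to header sniffing when none of the explicit two-byte
values match.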