From c834b2969cea159cf1a163666381dec63c73ebcf Mon Sep 17 00:00:00 2001
From: Thomas Waldmann
Date: Fri, 12 Aug 2016 17:54:15 +0200
Subject: [PATCH] document archive limitation, #1452

---
 docs/faq.rst       | 11 +++++++++++
 docs/internals.rst | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/docs/faq.rst b/docs/faq.rst
index 88418b180..c772f5fa7 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -62,6 +62,17 @@ Which file types, attributes, etc. are *not* preserved?
       holes in a sparse file.
     * filesystem specific attributes, like ext4 immutable bit, see :issue:`618`.
 
+Are there other known limitations?
+----------------------------------
+
+- A single archive can only reference a limited volume of file/dir metadata,
+  usually corresponding to tens or hundreds of millions of files/dirs.
+  When trying to go beyond that limit, you will get a fatal IntegrityError
+  exception stating that the (archive) object is too big.
+  An easy workaround is to create multiple archives with fewer items each.
+  See also the :ref:`archive_limitation` and :issue:`1452`.
+
+
 Why is my backup bigger than with attic? Why doesn't |project_name| do compression by default?
 ----------------------------------------------------------------------------------------------
 
diff --git a/docs/internals.rst b/docs/internals.rst
index b088f68eb..82be188bf 100644
--- a/docs/internals.rst
+++ b/docs/internals.rst
@@ -160,12 +160,40 @@ object that contains:
 
 * version
 * name
-* list of chunks containing item metadata
+* list of chunks containing item metadata (size: count * ~40B)
 * cmdline
 * hostname
 * username
 * time
 
+.. _archive_limitation:
+
+Note about archive limitations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The archive is currently stored as a single object in the repository
+and thus limited in size to MAX_OBJECT_SIZE (20MiB).
+
+As one chunk list entry is ~40B, that means we can reference ~500,000 item
+metadata stream chunks per archive.
+
+Each item metadata stream chunk is ~128kiB (see hardcoded ITEMS_CHUNKER_PARAMS).
+
+So the whole item metadata stream is limited to ~64GiB (~500,000 * ~128kiB).
+If compression is used, the amount of storable metadata is larger by the
+compression factor.
+
+If the average size of an item entry is 100B (small file, no ACLs/xattrs),
+that means a limit of ~640 million files/directories per archive.
+
+If the average size of an item entry is 2kB (files of ~100MB, or more
+ACLs/xattrs), the limit will be ~32 million files/directories per archive.
+
+If one tries to create an archive object bigger than MAX_OBJECT_SIZE, a fatal
+IntegrityError will be raised.
+
+A workaround is to create multiple archives with fewer items each, see
+also :issue:`1452`.
 
 The Item
 --------
@@ -174,7 +202,7 @@ Each item represents a file, directory or other fs item and is stored as an
 ``item`` dictionary that contains:
 
 * path
-* list of data chunks
+* list of data chunks (size: count * ~40B)
 * user
 * group
 * uid