mirror of https://github.com/borgbackup/borg.git
Merge branch 'master' into loggedio-exceptions
Conflicts: borg/repository.py
commit 08688fbc13
AUTHORS (3 changed lines)

@@ -1,6 +1,7 @@
 Borg Developers / Contributors ("The Borg Collective")
 ``````````````````````````````````````````````````````
-- Thomas Waldmann
+- Thomas Waldmann <tw@waldmann-edv.de>
+- Antoine Beaupré
 
 
 Borg is a fork of Attic. Attic is written and maintained
CHANGES (120 changed lines)

@@ -1,56 +1,108 @@
 Borg Changelog
 ==============
 
-Version <TBD>
--------------
+Version 0.24.0
+--------------
+
+New features:
+
+- borg create --chunker-params ... to configure the chunker.
+  See docs/misc/create_chunker-params.txt for more information.
+- borg info now reports chunk counts in the chunk index.
+
+Bug fixes:
+
+- reduce memory usage, see --chunker-params, fixes #16.
+  This can be used to reduce chunk management overhead, so borg does not create
+  a huge chunks index/repo index and eats all your RAM if you back up lots of
+  data in huge files (like VM disk images).
+- better Exception msg if there is no Borg installed on the remote repo server.
+
+Other changes:
+
+- Fedora/Fedora-based install instructions added to docs.
+- added docs/misc directory for misc. writeups that won't be included "as is"
+  into the html docs.
+
+
+I forgot to list some stuff already implemented in 0.23.0, here they are:
+
+New features:
+
+- efficient archive list from manifest, meaning a big speedup for slow
+  repo connections and "list <repo>", "delete <repo>", "prune"
+- big speedup for chunks cache sync (esp. for slow repo connections), fixes #18
+- hashindex: improve error messages
+
+Other changes:
+
+- explicitly specify binary mode to open binary files
+- some easy micro optimizations
+
+
+Version 0.23.0
+--------------
+
+Incompatible changes (compared to attic, fork related):
 
 - changed sw name and cli command to "borg", updated docs
-- package name and name in urls uses "borgbackup" to have less collisions
+- package name (and name in urls) uses "borgbackup" to have less collisions
 - changed repo / cache internal magic strings from ATTIC* to BORG*,
-  changed cache location to .cache/borg/
-- give specific path to xattr.is_enabled(), disable symlink setattr call that
-  always fails
-- fix misleading hint the fuse ImportError handler gave, fixes attic #237
-- source: misc. cleanups, pep8, style
-- implement check --last N
-- check: sort archives in reverse time order
+  changed cache location to .cache/borg/ - this means that it currently won't
+  accept attic repos (see issue #21 about improving that)
+
+Bug fixes:
+
 - avoid defect python-msgpack releases, fixes attic #171, fixes attic #185
-- check unpacked data from RPC for tuple type and correct length, fixes attic #127
-- less memory usage: add global option --no-cache-files
 - fix traceback when trying to do unsupported passphrase change, fixes attic #189
 - datetime does not like the year 10.000, fixes attic #139
-- docs and faq improvements, fixes, updates
-- cleanup crypto.pyx, make it easier to adapt to other modes
-- extract: if --stdout is given, write all extracted binary data to stdout
+- fix "info" all archives stats, fixes attic #183
+- fix parsing with missing microseconds, fixes attic #282
+- fix misleading hint the fuse ImportError handler gave, fixes attic #237
+- check unpacked data from RPC for tuple type and correct length, fixes attic #127
+- fix Repository._active_txn state when lock upgrade fails
+- give specific path to xattr.is_enabled(), disable symlink setattr call that
+  always fails
+- fix test setup for 32bit platforms, partial fix for attic #196
+- upgraded versioneer, PEP440 compliance, fixes attic #257
+
+New features:
+
+- less memory usage: add global option --no-cache-files
+- check --last N (only check the last N archives)
+- check: sort archives in reverse time order
+- rename repo::oldname newname (rename repository)
+- create -v output more informative
+- create --progress (backup progress indicator)
+- create --timestamp (utc string or reference file/dir)
 - create: if "-" is given as path, read binary from stdin
-- do os.fsync like recommended in the python docs
+- extract: if --stdout is given, write all extracted binary data to stdout
+- extract --sparse (simple sparse file support)
 - extra debug information for 'fread failed'
+- delete <repo> (deletes whole repo + local cache)
 - FUSE: reflect deduplication in allocated blocks
 - only allow whitelisted RPC calls in server mode
 - normalize source/exclude paths before matching
-- fix "info" all archives stats, fixes attic #183
-- implement create --timestamp, utc string or reference file/dir
-- simple sparse file support (extract --sparse)
-- fix parsing with missing microseconds, fixes attic #282
 - use posix_fadvise to not spoil the OS cache, fixes attic #252
-- source: Let chunker optionally work with os-level file descriptor.
-- source: Linux: remove duplicate os.fsencode calls
-- fix test setup for 32bit platforms, partial fix for attic #196
-- source: refactor _open_rb code a bit, so it is more consistent / regular
-- implement rename repo::oldname newname
-- implement create --progress
-- source: refactor indicator (status) and item processing
-- implement delete repo (also deletes local cache)
-- better create -v output
-- upgraded versioneer, PEP440 compliance, fixes attic #257
-- source: use py.test for better testing, flake8 for code style checks
-- source: fix tox >=2.0 compatibility
 - toplevel error handler: show tracebacks for better error analysis
 - sigusr1 / sigint handler to print current file infos - attic PR #286
-- pypi package: add python version classifiers, add FreeBSD to platforms
-- fix Repository._active_txn state when lock upgrade fails
 - RPCError: include the exception args we get from remote
+
+Other changes:
+
+- source: misc. cleanups, pep8, style
+- docs and faq improvements, fixes, updates
+- cleanup crypto.pyx, make it easier to adapt to other AES modes
+- do os.fsync like recommended in the python docs
+- source: Let chunker optionally work with os-level file descriptor.
+- source: Linux: remove duplicate os.fsencode calls
+- source: refactor _open_rb code a bit, so it is more consistent / regular
+- source: refactor indicator (status) and item processing
+- source: use py.test for better testing, flake8 for code style checks
+- source: fix tox >=2.0 compatibility (test runner)
+- pypi package: add python version classifiers, add FreeBSD to platforms
 
 
 Attic Changelog
 ===============
MANIFEST.in

@@ -1,4 +1,4 @@
-include README.rst LICENSE CHANGES MANIFEST.in versioneer.py
+include README.rst AUTHORS LICENSE CHANGES MANIFEST.in versioneer.py
 recursive-include borg *.pyx
 recursive-include docs *
 recursive-exclude docs *.pyc
README.rst (10 changed lines)

@@ -10,8 +10,12 @@ are stored.
 Borg is a fork of Attic and maintained by "The Borg Collective" (see AUTHORS file).
 
 BORG IS NOT COMPATIBLE WITH ORIGINAL ATTIC.
-UNTIL FURTHER NOTICE, EXPECT THAT WE WILL BREAK COMPATIBILITY REPEATEDLY.
-THIS IS SOFTWARE IN DEVELOPMENT, DECIDE YOURSELF IF IT FITS YOUR NEEDS.
+EXPECT THAT WE WILL BREAK COMPATIBILITY REPEATEDLY WHEN MAJOR RELEASE NUMBER
+CHANGES (like when going from 0.x.y to 1.0.0). Please read CHANGES document.
+
+NOT RELEASED DEVELOPMENT VERSIONS HAVE UNKNOWN COMPATIBILITY PROPERTIES.
+
+THIS IS SOFTWARE IN DEVELOPMENT, DECIDE YOURSELF WHETHER IT FITS YOUR NEEDS.
 
 Read issue #1 on the issue tracker, goals are being defined there.
 
@@ -66,7 +70,7 @@ Where are the tests?
 The tests are in the borg/testsuite package. To run the test suite use the
 following command::
 
-  $ fakeroot -u tox  # you need to have tox installed
+  $ fakeroot -u tox  # you need to have tox and pytest installed
 
 .. |build| image:: https://travis-ci.org/borgbackup/borg.svg
         :alt: Build Status
borg/_hashindex.c

@@ -18,8 +18,11 @@
 #error Unknown byte order
 #endif
 
+#define MAGIC "BORG_IDX"
+#define MAGIC_LEN 8
+
 typedef struct {
-    char magic[8];
+    char magic[MAGIC_LEN];
     int32_t num_entries;
     int32_t num_buckets;
     int8_t key_size;
@@ -37,7 +40,6 @@ typedef struct {
     int upper_limit;
 } HashIndex;
 
-#define MAGIC "BORG_IDX"
 #define EMPTY _htole32(0xffffffff)
 #define DELETED _htole32(0xfffffffe)
 #define MAX_BUCKET_SIZE 512
@@ -162,7 +164,7 @@ hashindex_read(const char *path)
         EPRINTF_PATH(path, "fseek failed");
         goto fail;
     }
-    if(memcmp(header.magic, MAGIC, 8)) {
+    if(memcmp(header.magic, MAGIC, MAGIC_LEN)) {
         EPRINTF_MSG_PATH(path, "Unknown MAGIC in header");
         goto fail;
     }
@@ -359,14 +361,18 @@ hashindex_get_size(HashIndex *index)
 }
 
 static void
-hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize, long long *total_unique_size, long long *total_unique_csize)
+hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize,
+                    long long *total_unique_size, long long *total_unique_csize,
+                    long long *total_unique_chunks, long long *total_chunks)
 {
-    int64_t size = 0, csize = 0, unique_size = 0, unique_csize = 0;
+    int64_t size = 0, csize = 0, unique_size = 0, unique_csize = 0, chunks = 0, unique_chunks = 0;
     const int32_t *values;
     void *key = NULL;
 
     while((key = hashindex_next_key(index, key))) {
-        values = key + 32;
+        values = key + index->key_size;
+        unique_chunks++;
+        chunks += values[0];
         unique_size += values[1];
         unique_csize += values[2];
         size += values[0] * values[1];
@@ -376,4 +382,6 @@ hashindex_summarize(HashIndex *index, long long *total_size, long long *total_cs
     *total_csize = csize;
     *total_unique_size = unique_size;
     *total_unique_csize = unique_csize;
+    *total_unique_chunks = unique_chunks;
+    *total_chunks = chunks;
 }
borg/archive.py

@@ -21,12 +21,14 @@ from .helpers import parse_timestamp, Error, uid2user, user2uid, gid2group, grou
     Manifest, Statistics, decode_dict, st_mtime_ns, make_path_safe, StableDict, int_to_bigint, bigint_to_int
 
 ITEMS_BUFFER = 1024 * 1024
-CHUNK_MIN = 1024
-CHUNK_MAX = 10 * 1024 * 1024
-WINDOW_SIZE = 0xfff
-CHUNK_MASK = 0xffff
 
-ZEROS = b'\0' * CHUNK_MAX
+CHUNK_MIN_EXP = 10  # 2**10 == 1kiB
+CHUNK_MAX_EXP = 23  # 2**23 == 8MiB
+HASH_WINDOW_SIZE = 0xfff  # 4095B
+HASH_MASK_BITS = 16  # results in ~64kiB chunks statistically
+
+# defaults, use --chunker-params to override
+CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)
 
 utime_supports_fd = os.utime in getattr(os, 'supports_fd', {})
 utime_supports_follow_symlinks = os.utime in getattr(os, 'supports_follow_symlinks', {})
@@ -69,12 +71,12 @@ class DownloadPipeline:
 class ChunkBuffer:
     BUFFER_SIZE = 1 * 1024 * 1024
 
-    def __init__(self, key):
+    def __init__(self, key, chunker_params=CHUNKER_PARAMS):
         self.buffer = BytesIO()
         self.packer = msgpack.Packer(unicode_errors='surrogateescape')
         self.chunks = []
         self.key = key
-        self.chunker = Chunker(WINDOW_SIZE, CHUNK_MASK, CHUNK_MIN, CHUNK_MAX, self.key.chunk_seed)
+        self.chunker = Chunker(self.key.chunk_seed, *chunker_params)
 
     def add(self, item):
         self.buffer.write(self.packer.pack(StableDict(item)))
@@ -104,8 +106,8 @@ class ChunkBuffer:
 
 class CacheChunkBuffer(ChunkBuffer):
 
-    def __init__(self, cache, key, stats):
-        super(CacheChunkBuffer, self).__init__(key)
+    def __init__(self, cache, key, stats, chunker_params=CHUNKER_PARAMS):
+        super(CacheChunkBuffer, self).__init__(key, chunker_params)
         self.cache = cache
         self.stats = stats
 
@@ -127,7 +129,8 @@ class Archive:
 
 
     def __init__(self, repository, key, manifest, name, cache=None, create=False,
-                 checkpoint_interval=300, numeric_owner=False, progress=False):
+                 checkpoint_interval=300, numeric_owner=False, progress=False,
+                 chunker_params=CHUNKER_PARAMS):
         self.cwd = os.getcwd()
         self.key = key
         self.repository = repository
@@ -142,8 +145,8 @@ class Archive:
         self.numeric_owner = numeric_owner
         self.pipeline = DownloadPipeline(self.repository, self.key)
         if create:
-            self.items_buffer = CacheChunkBuffer(self.cache, self.key, self.stats)
-            self.chunker = Chunker(WINDOW_SIZE, CHUNK_MASK, CHUNK_MIN, CHUNK_MAX, self.key.chunk_seed)
+            self.items_buffer = CacheChunkBuffer(self.cache, self.key, self.stats, chunker_params)
+            self.chunker = Chunker(self.key.chunk_seed, *chunker_params)
             if name in manifest.archives:
                 raise self.AlreadyExists(name)
             self.last_checkpoint = time.time()
@@ -158,6 +161,7 @@ class Archive:
                 raise self.DoesNotExist(name)
             info = self.manifest.archives[name]
             self.load(info[b'id'])
+        self.zeros = b'\0' * (1 << chunker_params[1])
 
     def _load_meta(self, id):
         data = self.key.decrypt(id, self.repository.get(id))
@@ -286,7 +290,7 @@ class Archive:
             with open(path, 'wb') as fd:
                 ids = [c[0] for c in item[b'chunks']]
                 for data in self.pipeline.fetch_many(ids, is_preloaded=True):
-                    if sparse and ZEROS.startswith(data):
+                    if sparse and self.zeros.startswith(data):
                         # all-zero chunk: create a hole in a sparse file
                         fd.seek(len(data), 1)
                     else:
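To illustrate what the new CHUNKER_PARAMS tuple means in practice, here is a small sketch (not part of the patch; it only re-applies the shifts the Chunker now performs internally, using the default values shown above)::

  # CHUNKER_PARAMS = (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE)
  chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size = (10, 23, 16, 0xfff)

  min_size = 1 << chunk_min_exp          # 1024 bytes (1 kiB) minimum chunk size
  max_size = 1 << chunk_max_exp          # 8388608 bytes (8 MiB) maximum chunk size
  avg_size = 1 << hash_mask_bits         # ~65536 bytes (64 kiB) statistical average
  hash_mask = (1 << hash_mask_bits) - 1  # 0xffff, a chunk boundary triggers when hash & hash_mask == 0

  print(min_size, max_size, avg_size, hex(hash_mask), hash_window_size)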
borg/archiver.py

@@ -13,7 +13,7 @@ import textwrap
 import traceback
 
 from . import __version__
-from .archive import Archive, ArchiveChecker
+from .archive import Archive, ArchiveChecker, CHUNKER_PARAMS
 from .repository import Repository
 from .cache import Cache
 from .key import key_creator
@@ -21,7 +21,7 @@ from .helpers import Error, location_validator, format_time, format_file_size, \
     format_file_mode, ExcludePattern, exclude_path, adjust_patterns, to_localtime, timestamp, \
     get_cache_dir, get_keys_dir, format_timedelta, prune_within, prune_split, \
     Manifest, remove_surrogates, update_excludes, format_archive, check_extension_modules, Statistics, \
-    is_cachedir, bigint_to_int
+    is_cachedir, bigint_to_int, ChunkerParams
 from .remote import RepositoryServer, RemoteRepository
 
 
@@ -104,7 +104,8 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
         cache = Cache(repository, key, manifest, do_files=args.cache_files)
         archive = Archive(repository, key, manifest, args.archive.archive, cache=cache,
                           create=True, checkpoint_interval=args.checkpoint_interval,
-                          numeric_owner=args.numeric_owner, progress=args.progress)
+                          numeric_owner=args.numeric_owner, progress=args.progress,
+                          chunker_params=args.chunker_params)
         # Add cache dir to inode_skip list
         skip_inodes = set()
         try:
@@ -515,8 +516,12 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
         parser = argparse.ArgumentParser(description='Borg %s - Deduplicated Backups' % __version__)
         subparsers = parser.add_subparsers(title='Available commands')
 
+        serve_epilog = textwrap.dedent("""
+        This command starts a repository server process. This command is usually not used manually.
+        """)
         subparser = subparsers.add_parser('serve', parents=[common_parser],
-                                          description=self.do_serve.__doc__)
+                                          description=self.do_serve.__doc__, epilog=serve_epilog,
+                                          formatter_class=argparse.RawDescriptionHelpFormatter)
         subparser.set_defaults(func=self.do_serve)
         subparser.add_argument('--restrict-to-path', dest='restrict_to_paths', action='append',
                                metavar='PATH', help='restrict repository access to PATH')
@@ -621,6 +626,10 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
                                metavar='yyyy-mm-ddThh:mm:ss',
                                help='manually specify the archive creation date/time (UTC). '
                                     'alternatively, give a reference file/directory.')
+        subparser.add_argument('--chunker-params', dest='chunker_params',
+                               type=ChunkerParams, default=CHUNKER_PARAMS,
+                               metavar='CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE',
+                               help='specify the chunker parameters. default: %d,%d,%d,%d' % CHUNKER_PARAMS)
         subparser.add_argument('archive', metavar='ARCHIVE',
                                type=location_validator(archive=True),
                                help='archive to create')
borg/chunker.pyx

@@ -20,8 +20,11 @@ cdef extern from "_chunker.c":
 cdef class Chunker:
     cdef _Chunker *chunker
 
-    def __cinit__(self, window_size, chunk_mask, min_size, max_size, seed):
-        self.chunker = chunker_init(window_size, chunk_mask, min_size, max_size, seed & 0xffffffff)
+    def __cinit__(self, seed, chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size):
+        min_size = 1 << chunk_min_exp
+        max_size = 1 << chunk_max_exp
+        hash_mask = (1 << hash_mask_bits) - 1
+        self.chunker = chunker_init(hash_window_size, hash_mask, min_size, max_size, seed & 0xffffffff)
 
     def chunkify(self, fd, fh=-1):
         """
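A usage sketch of the changed constructor (an illustration, not from the patch; the file name is a placeholder): the seed now comes first and the remaining four positional values are exactly the chunker params tuple, which is how archive.py calls it above::

  from borg.chunker import Chunker
  from borg.archive import CHUNKER_PARAMS   # (10, 23, 16, 0xfff) by default

  seed = 0  # in real use this is key.chunk_seed
  chunker = Chunker(seed, *CHUNKER_PARAMS)
  # the pre-patch equivalent was roughly:
  #   Chunker(WINDOW_SIZE, CHUNK_MASK, CHUNK_MIN, CHUNK_MAX, seed)
  with open('somefile', 'rb') as fd:        # 'somefile' is a hypothetical input path
      for chunk in chunker.chunkify(fd):
          print(len(chunk))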
borg/hashindex.pyx

@@ -11,7 +11,9 @@ cdef extern from "_hashindex.c":
     HashIndex *hashindex_read(char *path)
     HashIndex *hashindex_init(int capacity, int key_size, int value_size)
     void hashindex_free(HashIndex *index)
-    void hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize, long long *unique_size, long long *unique_csize)
+    void hashindex_summarize(HashIndex *index, long long *total_size, long long *total_csize,
+                             long long *unique_size, long long *unique_csize,
+                             long long *total_unique_chunks, long long *total_chunks)
     int hashindex_get_size(HashIndex *index)
     int hashindex_write(HashIndex *index, char *path)
     void *hashindex_get(HashIndex *index, void *key)
@@ -179,9 +181,11 @@ cdef class ChunkIndex(IndexBase):
         return iter
 
     def summarize(self):
-        cdef long long total_size, total_csize, unique_size, unique_csize
-        hashindex_summarize(self.index, &total_size, &total_csize, &unique_size, &unique_csize)
-        return total_size, total_csize, unique_size, unique_csize
+        cdef long long total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks
+        hashindex_summarize(self.index, &total_size, &total_csize,
+                            &unique_size, &unique_csize,
+                            &total_unique_chunks, &total_chunks)
+        return total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks
 
 
 cdef class ChunkKeyIterator:
borg/helpers.py

@@ -174,11 +174,14 @@ class Statistics:
         self.usize += csize
 
     def print_(self, label, cache):
-        total_size, total_csize, unique_size, unique_csize = cache.chunks.summarize()
+        total_size, total_csize, unique_size, unique_csize, total_unique_chunks, total_chunks = cache.chunks.summarize()
         print()
         print('                       Original size      Compressed size    Deduplicated size')
         print('%-15s %20s %20s %20s' % (label, format_file_size(self.osize), format_file_size(self.csize), format_file_size(self.usize)))
         print('All archives:   %20s %20s %20s' % (format_file_size(total_size), format_file_size(total_csize), format_file_size(unique_csize)))
+        print()
+        print('                       Unique chunks         Total chunks')
+        print('Chunk index:    %20d %20d' % (total_unique_chunks, total_chunks))
 
     def show_progress(self, item=None, final=False):
         if not final:
@@ -310,6 +313,11 @@ def timestamp(s):
         raise ValueError
 
 
+def ChunkerParams(s):
+    window_size, chunk_mask, chunk_min, chunk_max = s.split(',')
+    return int(window_size), int(chunk_mask), int(chunk_min), int(chunk_max)
+
+
 def is_cachedir(path):
     """Determines whether the specified path is a cache directory (and
     therefore should potentially be excluded from the backup) according to
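As a usage sketch (not from the patch): ChunkerParams simply splits the comma-separated value that --chunker-params accepts into four ints; note that the local variable names inside it are historic, while the four positions actually mean CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS and HASH_WINDOW_SIZE, as the archiver.py metavar above spells out::

  >>> from borg.helpers import ChunkerParams
  >>> ChunkerParams('10,23,16,4095')      # same values as the built-in CHUNKER_PARAMS default
  (10, 23, 16, 4095)
  >>> ChunkerParams('19,23,21,4095')      # hypothetical coarser params to shrink the chunk index
  (19, 23, 21, 4095)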
borg/remote.py

@@ -141,7 +141,10 @@ class RemoteRepository:
         self.r_fds = [self.stdout_fd]
         self.x_fds = [self.stdin_fd, self.stdout_fd]
 
-        version = self.call('negotiate', 1)
+        try:
+            version = self.call('negotiate', 1)
+        except ConnectionClosed:
+            raise Exception('Server immediately closed connection - is Borg installed and working on the server?')
         if version != 1:
             raise Exception('Server insisted on using unsupported protocol version %d' % version)
         self.id = self.call('open', location.path, create)
borg/repository.py

@@ -14,6 +14,7 @@ from .lrucache import LRUCache
 
 MAX_OBJECT_SIZE = 20 * 1024 * 1024
 MAGIC = b'BORG_SEG'
+MAGIC_LEN = len(MAGIC)
 TAG_PUT = 0
 TAG_DELETE = 1
 TAG_COMMIT = 2
@@ -481,7 +482,7 @@ class LoggedIO:
             os.mkdir(dirname)
         self._write_fd = open(self.segment_filename(self.segment), 'ab')
         self._write_fd.write(MAGIC)
-        self.offset = 8
+        self.offset = MAGIC_LEN
         return self._write_fd
 
     def get_fd(self, segment):
@@ -504,9 +505,9 @@ class LoggedIO:
     def iter_objects(self, segment, include_data=False):
         fd = self.get_fd(segment)
         fd.seek(0)
-        if fd.read(8) != MAGIC:
+        if fd.read(MAGIC_LEN) != MAGIC:
             raise IntegrityError('Invalid segment magic')
-        offset = 8
+        offset = MAGIC_LEN
         header = fd.read(self.header_fmt.size)
         while header:
             try:
borg/testsuite/archiver.py

@@ -12,7 +12,7 @@ import unittest
 from hashlib import sha256
 
 from .. import xattr
-from ..archive import Archive, ChunkBuffer, CHUNK_MAX
+from ..archive import Archive, ChunkBuffer, CHUNK_MAX_EXP
 from ..archiver import Archiver
 from ..cache import Cache
 from ..crypto import bytes_to_long, num_aes_blocks
@@ -213,7 +213,7 @@ class ArchiverTestCase(ArchiverTestCaseBase):
         sparse_support = sys.platform != 'darwin'
         filename = os.path.join(self.input_path, 'sparse')
         content = b'foobar'
-        hole_size = 5 * CHUNK_MAX  # 5 full chunker buffers
+        hole_size = 5 * (1 << CHUNK_MAX_EXP)  # 5 full chunker buffers
         with open(filename, 'wb') as fd:
             # create a file that has a hole at the beginning and end (if the
             # OS and filesystem supports sparse files)
borg/testsuite/chunker.py

@@ -1,27 +1,27 @@
 from io import BytesIO
 
 from ..chunker import Chunker, buzhash, buzhash_update
-from ..archive import CHUNK_MAX
+from ..archive import CHUNK_MAX_EXP
 from . import BaseTestCase
 
 
 class ChunkerTestCase(BaseTestCase):
 
     def test_chunkify(self):
-        data = b'0' * int(1.5 * CHUNK_MAX) + b'Y'
-        parts = [bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(data))]
+        data = b'0' * int(1.5 * (1 << CHUNK_MAX_EXP)) + b'Y'
+        parts = [bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(data))]
         self.assert_equal(len(parts), 2)
         self.assert_equal(b''.join(parts), data)
-        self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(b''))], [])
-        self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fooba', b'rboobaz', b'fooba', b'rboobaz', b'fooba', b'rboobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fo', b'obarb', b'oob', b'azf', b'oobarb', b'oob', b'azf', b'oobarb', b'oobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(2, 0x3, 2, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'ar', b'boobazfoob', b'ar', b'boobazfoob', b'ar', b'boobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boo', b'bazfo', b'obar', b'boo', b'bazfo', b'obar', b'boobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 3, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foo', b'barboobaz', b'foo', b'barboobaz', b'foo', b'barboobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 0).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 1).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boobazfo', b'obar', b'boobazfo', b'obar', b'boobaz'])
-        self.assert_equal([bytes(c) for c in Chunker(3, 0x3, 4, CHUNK_MAX, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'arboobaz', b'foob', b'arboobaz', b'foob', b'arboobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b''))], [])
+        self.assert_equal([bytes(c) for c in Chunker(0, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fooba', b'rboobaz', b'fooba', b'rboobaz', b'fooba', b'rboobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(1, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'fo', b'obarb', b'oob', b'azf', b'oobarb', b'oob', b'azf', b'oobarb', b'oobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(2, 1, CHUNK_MAX_EXP, 2, 2).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'ar', b'boobazfoob', b'ar', b'boobazfoob', b'ar', b'boobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(0, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
+        self.assert_equal([bytes(c) for c in Chunker(1, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobar', b'boobazfo', b'obar', b'boobazfo', b'obar', b'boobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(2, 2, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foob', b'arboobaz', b'foob', b'arboobaz', b'foob', b'arboobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(0, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz' * 3])
+        self.assert_equal([bytes(c) for c in Chunker(1, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarbo', b'obazfoobar', b'boobazfo', b'obarboobaz'])
+        self.assert_equal([bytes(c) for c in Chunker(2, 3, CHUNK_MAX_EXP, 2, 3).chunkify(BytesIO(b'foobarboobaz' * 3))], [b'foobarboobaz', b'foobarboobaz', b'foobarboobaz'])
 
     def test_buzhash(self):
         self.assert_equal(buzhash(b'abcdefghijklmnop', 0), 3795437769)
docs theme CSS

@@ -161,8 +161,8 @@ p.admonition-title:after {
 }
 
 div.note {
-    background-color: #0f5;
-    border-bottom: 2px solid #d22;
+    background-color: #002211;
+    border-bottom: 2px solid #22dd22;
 }
 
 div.seealso {
docs/faq.rst

@@ -51,7 +51,7 @@ Which file types, attributes, etc. are *not* preserved?
   recreate them in any case). So, don't panic if your backup misses a UDS!
 * The precise on-disk representation of the holes in a sparse file.
   Archive creation has no special support for sparse files, holes are
-  backed up up as (deduplicated and compressed) runs of zero bytes.
+  backed up as (deduplicated and compressed) runs of zero bytes.
   Archive extraction has optional support to extract all-zero chunks as
   holes in a sparse file.
 
docs/installation.rst

@@ -62,21 +62,60 @@ Some of the steps detailled below might be useful also for non-git installs.
   # optional: for unit testing
   apt-get install fakeroot
 
-  # install virtualenv tool, create and activate a virtual env
-  apt-get install python-virtualenv
-  virtualenv --python=python3 borg-env
-  source borg-env/bin/activate   # always do this before using!
-
-  # install some dependencies into virtual env
-  pip install cython  # to compile .pyx -> .c
-  pip install tox pytest  # optional, for running unit tests
-  pip install sphinx  # optional, to build the docs
-
   # get |project_name| from github, install it
   git clone |git_url|
 
+  apt-get install python-virtualenv
+  virtualenv --python=python3 borg-env
+  source borg-env/bin/activate   # always before using!
+
+  # install borg + dependencies into virtualenv
+  pip install cython  # compile .pyx -> .c
+  pip install tox pytest  # optional, for running unit tests
+  pip install sphinx  # optional, to build the docs
   cd borg
   pip install -e .  # in-place editable mode
 
   # optional: run all the tests, on all supported Python versions
   fakeroot -u tox
 
+
+Korora / Fedora 21 installation (from git)
+------------------------------------------
+Note: this uses latest, unreleased development code from git.
+While we try not to break master, there are no guarantees on anything.
+
+Some of the steps detailled below might be useful also for non-git installs.
+
+.. parsed-literal::
+
+  # Python 3.x (>= 3.2) + Headers, Py Package Installer
+  sudo dnf install python3 python3-devel python3-pip
+
+  # we need OpenSSL + Headers for Crypto
+  sudo dnf install openssl-devel openssl
+
+  # ACL support Headers + Library
+  sudo dnf install libacl-devel libacl
+
+  # optional: lowlevel FUSE py binding - to mount backup archives
+  sudo dnf install python3-llfuse fuse
+
+  # optional: for unit testing
+  sudo dnf install fakeroot
+
+  # get |project_name| from github, install it
+  git clone |git_url|
+
+  dnf install python3-virtualenv
+  virtualenv --python=python3 borg-env
+  source borg-env/bin/activate   # always before using!
+
+  # install borg + dependencies into virtualenv
+  pip install cython  # compile .pyx -> .c
+  pip install tox pytest  # optional, for running unit tests
+  pip install sphinx  # optional, to build the docs
+  cd borg
+  pip install -e .  # in-place editable mode
+
+  # optional: run all the tests, on all supported Python versions
+  fakeroot -u tox
docs/internals.rst

@@ -6,38 +6,43 @@ Internals
 
 This page documents the internal data structures and storage
 mechanisms of |project_name|. It is partly based on `mailing list
-discussion about internals`_ and also on static code analysis. It may
-not be exactly up to date with the current source code.
+discussion about internals`_ and also on static code analysis.
+
+It may not be exactly up to date with the current source code.
+
+Repository and Archives
+-----------------------
 
 |project_name| stores its data in a `Repository`. Each repository can
 hold multiple `Archives`, which represent individual backups that
 contain a full archive of the files specified when the backup was
 performed. Deduplication is performed across multiple backups, both on
-data and metadata, using `Segments` chunked with the Buzhash_
-algorithm. Each repository has the following file structure:
+data and metadata, using `Chunks` created by the chunker using the Buzhash_
+algorithm.
+
+Each repository has the following file structure:
 
 README
-  simple text file describing the repository
+  simple text file telling that this is a |project_name| repository
 
 config
-  description of the repository, includes the unique identifier. also
-  acts as a lock file
+  repository configuration and lock file
 
 data/
-  directory where the actual data (`segments`) is stored
+  directory where the actual data is stored
 
 hints.%d
-  undocumented
+  hints for repository compaction
 
 index.%d
-  cache of the file indexes. those files can be regenerated with
-  ``check --repair``
+  repository index
 
 Config file
 -----------
 
-Each repository has a ``config`` file which which is a ``INI``
-formatted file which looks like this::
+Each repository has a ``config`` file which which is a ``INI``-style file
+and looks like this::
 
   [repository]
   version = 1
@@ -48,20 +53,35 @@ formatted file which looks like this::
 This is where the ``repository.id`` is stored. It is a unique
 identifier for repositories. It will not change if you move the
 repository around so you can make a local transfer then decide to move
-the repository in another (even remote) location at a later time.
+the repository to another (even remote) location at a later time.
 
-|project_name| will do a POSIX read lock on that file when operating
+|project_name| will do a POSIX read lock on the config file when operating
 on the repository.
 
 
+Keys
+----
+The key to address the key/value store is usually computed like this:
+
+key = id = id_hash(unencrypted_data)
+
+The id_hash function is:
+
+* sha256 (no encryption keys available)
+* hmac-sha256 (encryption keys available)
+
+
 Segments and archives
 ---------------------
 
-|project_name| is a "filesystem based transactional key value
-store". It makes extensive use of msgpack_ to store data and, unless
+A |project_name| repository is a filesystem based transactional key/value
+store. It makes extensive use of msgpack_ to store data and, unless
 otherwise noted, data is stored in msgpack_ encoded files.
 
-Objects referenced by a key (256bits id/hash) are stored inline in
-files (`segments`) of size approx 5MB in ``repo/data``. They contain:
+Objects referenced by a key are stored inline in files (`segments`) of approx.
+5MB size in numbered subdirectories of ``repo/data``.
+
+They contain:
 
 * header size
 * crc
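A sketch of the id computation described in the new Keys section above (illustrative only; the real implementations live in borg's key classes, and the key argument here stands in for the id key material)::

  import hashlib, hmac

  def id_hash(data, key=None):
      # no encryption keys available -> plain sha256
      if key is None:
          return hashlib.sha256(data).digest()
      # encryption keys available -> hmac-sha256 over the unencrypted data
      return hmac.new(key, data, hashlib.sha256).digest()

  chunk_id = id_hash(b'some unencrypted chunk data')   # 32 byte key into the store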
@@ -77,21 +97,26 @@ Tag is either ``PUT``, ``DELETE``, or ``COMMIT``. A segment file is
 basically a transaction log where each repository operation is
 appended to the file. So if an object is written to the repository a
 ``PUT`` tag is written to the file followed by the object id and
-data. And if an object is deleted a ``DELETE`` tag is appended
+data. If an object is deleted a ``DELETE`` tag is appended
 followed by the object id. A ``COMMIT`` tag is written when a
 repository transaction is committed. When a repository is opened any
 ``PUT`` or ``DELETE`` operations not followed by a ``COMMIT`` tag are
 discarded since they are part of a partial/uncommitted transaction.
 
-The manifest is an object with an id of only zeros (32 bytes), that
-references all the archives. It contains:
+
+The manifest
+------------
+
+The manifest is an object with an all-zero key that references all the
+archives.
+It contains:
 
 * version
-* list of archives
+* list of archive infos
 * timestamp
 * config
 
-Each archive contains:
+Each archive info contains:
 
 * name
 * id
@@ -102,21 +127,21 @@ each time.
 
 The archive metadata does not contain the file items directly. Only
 references to other objects that contain that data. An archive is an
-object that contain metadata:
+object that contains:
 
 * version
 * name
-* items list
+* list of chunks containing item metadata
 * cmdline
 * hostname
 * username
 * time
 
-Each item represents a file or directory or
-symlink is stored as an ``item`` dictionary that contains:
+Each item represents a file, directory or other fs item and is stored as an
+``item`` dictionary that contains:
 
 * path
-* list of chunks
+* list of data chunks
 * user
 * group
 * uid
|
||||||
All items are serialized using msgpack and the resulting byte stream
|
All items are serialized using msgpack and the resulting byte stream
|
||||||
is fed into the same chunker used for regular file data and turned
|
is fed into the same chunker used for regular file data and turned
|
||||||
into deduplicated chunks. The reference to these chunks is then added
|
into deduplicated chunks. The reference to these chunks is then added
|
||||||
to the archive metadata. This allows the archive to store many files,
|
to the archive metadata.
|
||||||
beyond the ``MAX_OBJECT_SIZE`` barrier of 20MB.
|
|
||||||
|
|
||||||
A chunk is an object as well, of course. The chunk id is either
|
A chunk is stored as an object as well, of course.
|
||||||
HMAC-SHA256_, when encryption is used, or a SHA256_ hash otherwise.
|
|
||||||
|
|
||||||
Hints are stored in a file (``repo/hints``) and contain:
|
|
||||||
|
|
||||||
* version
|
|
||||||
* list of segments
|
|
||||||
* compact
|
|
||||||
|
|
||||||
Chunks
|
Chunks
|
||||||
------
|
------
|
||||||
|
|
||||||
|project_name| uses a rolling checksum with Buzhash_ algorithm, with
|
|project_name| uses a rolling hash computed by the Buzhash_ algorithm, with a
|
||||||
window size of 4095 bytes (`0xFFF`), with a minimum of 1024, and triggers when
|
window size of 4095 bytes (`0xFFF`), with a minimum chunk size of 1024 bytes.
|
||||||
the last 16 bits of the checksum are null, producing chunks of 64kB on
|
It triggers (chunks) when the last 16 bits of the hash are zero, producing
|
||||||
average. All these parameters are fixed. The buzhash table is altered
|
chunks of 64kiB on average.
|
||||||
by XORing it with a seed randomly generated once for the archive, and
|
|
||||||
stored encrypted in the keyfile.
|
|
||||||
|
|
||||||
Indexes
|
The buzhash table is altered by XORing it with a seed randomly generated once
|
||||||
-------
|
for the archive, and stored encrypted in the keyfile.
|
||||||
|
|
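To make the trigger condition concrete, a simplified sketch (not from the patch; the real chunker is the C implementation driven by buzhash over the 4095 byte window)::

  HASH_MASK_BITS = 16
  hash_mask = (1 << HASH_MASK_BITS) - 1     # 0xffff

  def is_chunk_boundary(rolling_hash):
      # cut when the last 16 bits of the rolling hash are zero;
      # with a roughly uniform hash this happens about once every 2**16 bytes (~64 kiB)
      return rolling_hash & hash_mask == 0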
||||||
There are two main indexes: the chunk lookup index and the repository
|
|
||||||
index. There is also the file chunk cache.
|
|
||||||
|
|
||||||
The chunk lookup index is stored in ``cache/chunk`` and is indexed on
|
Indexes / Caches
|
||||||
the ``chunk hash``. It contains:
|
----------------
|
||||||
|
|
||||||
* reference count
|
The files cache is stored in ``cache/files`` and is indexed on the
|
||||||
* size
|
``file path hash``. At backup time, it is used to quickly determine whether we
|
||||||
* ciphered size
|
need to chunk a given file (or whether it is unchanged and we already have all
|
||||||
|
its pieces).
|
||||||
The repository index is stored in ``repo/index.%d`` and is also
|
It contains:
|
||||||
indexed on ``chunk hash`` and contains:
|
|
||||||
|
|
||||||
* segment
|
|
||||||
* offset
|
|
||||||
|
|
||||||
The repository index files are random access but those files can be
|
|
||||||
recreated if damaged or lost using ``check --repair``.
|
|
||||||
|
|
||||||
Both indexes are stored as hash tables, directly mapped in memory from
|
|
||||||
the file content, with only one slot per bucket, but that spreads the
|
|
||||||
collisions to the following buckets. As a consequence the hash is just
|
|
||||||
a start position for a linear search, and if the element is not in the
|
|
||||||
table the index is linearly crossed until an empty bucket is
|
|
||||||
found. When the table is full at 90% its size is doubled, when it's
|
|
||||||
empty at 25% its size is halfed. So operations on it have a variable
|
|
||||||
complexity between constant and linear with low factor, and memory
|
|
||||||
overhead varies between 10% and 300%.
|
|
||||||
|
|
||||||
The file chunk cache is stored in ``cache/files`` and is indexed on
|
|
||||||
the ``file path hash`` and contains:
|
|
||||||
|
|
||||||
* age
|
* age
|
||||||
* inode number
|
* file inode number
|
||||||
* size
|
* file size
|
||||||
* mtime_ns
|
* file mtime_ns
|
||||||
* chunks hashes
|
* file content chunk hashes
|
||||||
|
|
||||||
The inode number is stored to make sure we distinguish between
|
The inode number is stored to make sure we distinguish between
|
||||||
different files, as a single path may not be unique across different
|
different files, as a single path may not be unique across different
|
||||||
archives in different setups.
|
archives in different setups.
|
||||||
|
|
||||||
The file chunk cache is stored as a python associative array storing
|
The files cache is stored as a python associative array storing
|
||||||
python objects, which generate a lot of overhead. This takes around
|
python objects, which generates a lot of overhead.
|
||||||
240 bytes per file without the chunk list, to be compared to at most
|
|
||||||
64 bytes of real data (depending on data alignment), and around 80
|
|
||||||
bytes per chunk hash (vs 32), with a minimum of ~250 bytes even if
|
|
||||||
only one chunk hash.
|
|
||||||
|
|
||||||
Indexes memory usage
|
The chunks cache is stored in ``cache/chunks`` and is indexed on the
|
||||||
--------------------
|
``chunk id_hash``. It is used to determine whether we already have a specific
|
||||||
|
chunk, to count references to it and also for statistics.
|
||||||
|
It contains:
|
||||||
|
|
||||||
Here is the estimated memory usage of |project_name| when using those
|
* reference count
|
||||||
indexes.
|
* size
|
||||||
|
* encrypted/compressed size
|
||||||
|
|
||||||
Repository index
|
The repository index is stored in ``repo/index.%d`` and is indexed on the
|
||||||
40 bytes x N ~ 200MB (If a remote repository is
|
``chunk id_hash``. It is used to determine a chunk's location in the repository.
|
||||||
used this will be allocated on the remote side)
|
It contains:
|
||||||
|
|
||||||
Chunk lookup index
|
* segment (that contains the chunk)
|
||||||
44 bytes x N ~ 220MB
|
* offset (where the chunk is located in the segment)
|
||||||
|
|
||||||
File chunk cache
|
The repository index file is random access.
|
||||||
probably 80-100 bytes x N ~ 400MB
|
|
||||||
|
Hints are stored in a file (``repo/hints.%d``).
|
||||||
|
It contains:
|
||||||
|
|
||||||
|
* version
|
||||||
|
* list of segments
|
||||||
|
* compact
|
||||||
|
|
||||||
|
hints and index can be recreated if damaged or lost using ``check --repair``.
|
||||||
|
|
||||||
|
The chunks cache and the repository index are stored as hash tables, with
|
||||||
|
only one slot per bucket, but that spreads the collisions to the following
|
||||||
|
buckets. As a consequence the hash is just a start position for a linear
|
||||||
|
search, and if the element is not in the table the index is linearly crossed
|
||||||
|
until an empty bucket is found.
|
||||||
|
|
||||||
|
When the hash table is almost full at 90%, its size is doubled. When it's
|
||||||
|
almost empty at 25%, its size is halved. So operations on it have a variable
|
||||||
|
complexity between constant and linear with low factor, and memory overhead
|
||||||
|
varies between 10% and 300%.
|
||||||
|
|
||||||
|
|
||||||

Indexes / Caches memory usage
-----------------------------

Here is the estimated memory usage of |project_name|:

chunk_count ~= total_file_size / 65536

repo_index_usage = chunk_count * 40

chunks_cache_usage = chunk_count * 44

files_cache_usage = total_file_count * 240 + chunk_count * 80

mem_usage ~= repo_index_usage + chunks_cache_usage + files_cache_usage
           = total_file_count * 240 + total_file_size / 400

All units are Bytes.

This assumes that every chunk is referenced exactly once and that the typical
chunk size is 64kiB.

If a remote repository is used, the repo index will be allocated on the remote side.

E.g. backing up a total count of 1Mi files with a total size of 1TiB:

mem_usage = 1 * 2**20 * 240 + 1 * 2**40 / 400 = 2.8GiB
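
As a quick check, this estimate can be reproduced with a few lines of Python.
This is just a restatement of the formulas above; the function name is made up
for illustration::

    def estimate_mem_usage(total_file_count, total_file_size):
        """Rough borg RAM estimate in bytes, per the formulas above."""
        chunk_count = total_file_size // 65536       # ~64kiB target chunk size
        repo_index_usage = chunk_count * 40
        chunks_cache_usage = chunk_count * 44
        files_cache_usage = total_file_count * 240 + chunk_count * 80
        return repo_index_usage + chunks_cache_usage + files_cache_usage

    # 1Mi files, 1TiB of data -> roughly 2.8 GiB
    print(estimate_mem_usage(2**20, 2**40) / 2**30)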

Note: there is a commandline option to switch off the files cache. You'll save
some memory, but it will need to read / chunk all the files then.

Encryption
----------

AES_ is used in CTR mode (so no need for padding). A 64bit initialization
vector is used, a `HMAC-SHA256`_ is computed on the encrypted chunk with a
random 64bit nonce and both are stored in the chunk.
The header of each chunk is: ``TYPE(1)`` + ``HMAC(32)`` + ``NONCE(8)`` + ``CIPHERTEXT``.
Encryption and HMAC use two different keys.

In AES CTR mode you can think of the IV as the start value for the counter.
The counter itself is incremented by one after each 16 byte block.
The IV/counter is not required to be random but it must NEVER be reused.
So to accomplish this |project_name| initializes the encryption counter to be
higher than any previously used counter value before encrypting new data.

To reduce payload size, only 8 bytes of the 16-byte nonce are saved in the
payload; the first 8 bytes are always zeros. This does not affect security but
limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes).
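
The chunk layout can be illustrated with a short Python sketch. This is
illustrative only: it uses just the standard library, leaves the AES-CTR
encryption step out, the ``TYPE`` value is a placeholder, and the exact bytes
covered by the MAC in borg's real code may differ::

    import hashlib
    import hmac
    import struct

    TYPE = b"\x02"        # placeholder type byte, not borg's real constant

    def pack_chunk(ciphertext, nonce, hmac_key):
        # header layout described above: TYPE(1) + HMAC(32) + NONCE(8) + CIPHERTEXT
        nonce_bytes = struct.pack(">Q", nonce)   # the 8 nonce bytes stored in the chunk
        mac = hmac.new(hmac_key, nonce_bytes + ciphertext, hashlib.sha256).digest()
        return TYPE + mac + nonce_bytes + ciphertext

    def ctr_iv(nonce):
        # the full 16-byte AES-CTR counter: 8 zero bytes, then the 8 stored nonce bytes
        return b"\x00" * 8 + struct.pack(">Q", nonce)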

Encryption keys are either derived from a passphrase or kept in a key file.
The passphrase is passed through the ``BORG_PASSPHRASE`` environment variable
or prompted for interactive usage.

Key files
---------

@ -274,22 +311,20 @@ enc_key
the key used to encrypt data with AES (256 bits)

enc_hmac_key
the key used to HMAC the encrypted data (256 bits)

id_key
the key used to HMAC the plaintext chunk data to compute the chunk's id

chunk_seed
the seed for the buzhash chunking table (signed 32 bit integer)

Those fields are processed using msgpack_. The utf-8 encoded passphrase
is processed with PBKDF2_ (SHA256_, 100000 iterations, random 256 bit salt)
to give us a derived key. The derived key is 256 bits long.
A `HMAC-SHA256`_ checksum of the above fields is generated with the derived
key, then the derived key is also used to encrypt the above pack of fields.
Then the result is stored in another msgpack_ formatted as follows:

version
currently always an integer, 1

@ -315,3 +350,9 @@ The resulting msgpack_ is then encoded using base64 and written to the
key file, wrapped using the standard ``textwrap`` module with a header.
The header is a single line with a MAGIC string, a space and a hexadecimal
representation of the repository id.
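
The passphrase-to-key derivation described above maps directly onto the Python
standard library. A minimal sketch with the parameters stated above; the helper
name is made up and this is not a quote of borg's key handling code::

    import hashlib
    import os

    def derive_key(passphrase, salt=None):
        """Derive a 256 bit key from a utf-8 passphrase via PBKDF2-SHA256."""
        salt = salt if salt is not None else os.urandom(32)    # random 256 bit salt
        key = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                                  salt, 100000, dklen=32)      # 100000 iterations
        return key, salt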

Compression
-----------

Currently, zlib level 6 is used as compression.
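
That corresponds to the standard library call below; a sketch only, not a quote
of borg's code::

    import zlib

    data = b"example chunk data" * 100
    compressed = zlib.compress(data, 6)          # zlib level 6
    assert zlib.decompress(compressed) == data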

@ -0,0 +1,116 @@
About borg create --chunker-params
==================================

--chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE

CHUNK_MIN_EXP and CHUNK_MAX_EXP give the exponent N of the 2^N minimum and
maximum chunk size. Required: CHUNK_MIN_EXP < CHUNK_MAX_EXP.

Defaults: 10 (2^10 == 1KiB) minimum, 23 (2^23 == 8MiB) maximum.

HASH_MASK_BITS is the number of least-significant bits of the rolling hash
that need to be zero to trigger a chunk cut.
Recommended: CHUNK_MIN_EXP + X <= HASH_MASK_BITS <= CHUNK_MAX_EXP - X, X >= 2
(this allows the rolling hash some freedom to make its cut at a place
determined by the window contents rather than the min/max chunk size).

Default: 16 (statistically, chunks will be about 2^16 == 64kiB in size)

HASH_WINDOW_SIZE: the size of the window used for the rolling hash computation.
Default: 4095B
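
A tiny Python sketch of how these parameters relate. The constraint check and
the expected chunk size just restate the description above; the function name
is made up:

    def expected_chunk_size(chunk_min_exp, chunk_max_exp, hash_mask_bits):
        assert chunk_min_exp < chunk_max_exp                     # required
        recommended = chunk_min_exp + 2 <= hash_mask_bits <= chunk_max_exp - 2
        if not recommended:
            print("warning: HASH_MASK_BITS outside the recommended range")
        return 2 ** hash_mask_bits        # statistically expected chunk size

    print(expected_chunk_size(10, 23, 16))   # defaults -> 65536 (~64kiB)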

Trying it out
=============

I backed up a VM directory to demonstrate how different chunker parameters
influence repo size, index size / chunk count, compression, deduplication.

repo-sm: ~64kiB chunks (16 bits chunk mask), min chunk size 1kiB (2^10B)
(these are attic / borg 0.23 internal defaults)

repo-lg: ~1MiB chunks (20 bits chunk mask), min chunk size 64kiB (2^16B)

repo-xl: 8MiB chunks (2^23B max chunk size), min chunk size 64kiB (2^16B).
The chunk mask bits were set to 31, so the mask (almost) never triggers.
This degrades the rolling hash based dedup to a fixed-offset dedup,
as the cutting point is now (almost) always the end of the buffer
(at 2^23B == 8MiB).

The repo index size is an indicator for the RAM needs of Borg.
In this special case, the total RAM needs are about 2.1x the repo index size.
You see the index size of repo-sm is 16x larger than that of repo-lg, which
corresponds to the ratio of the different target chunk sizes.

Note: RAM needs were not a problem in this specific case (37GB data size).
But just imagine you have 37TB of such data and much less than 42GB RAM,
then you'd definitely want the "lg" chunker params so you only need
2.6GB RAM. Or even bigger chunks than shown for "lg" (see "xl").

You also see compression works better for larger chunks, as expected.
Deduplication works worse for larger chunks, also as expected.

small chunks
============

$ borg info /extra/repo-sm::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 10,23,16,4095 /extra/repo-sm::1 /home/tw/win
Number of files: 3

                      Original size      Compressed size    Deduplicated size
This archive:              37.12 GB             14.81 GB             12.18 GB
All archives:              37.12 GB             14.81 GB             12.18 GB

                      Unique chunks         Total chunks
Chunk index:                  378374               487316

$ ls -l /extra/repo-sm/index*

-rw-rw-r-- 1 tw tw 20971538 Jun 20 23:39 index.2308

$ du -sk /extra/repo-sm
11930840  /extra/repo-sm

large chunks
============

$ borg info /extra/repo-lg::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,20,4095 /extra/repo-lg::1 /home/tw/win
Number of files: 3

                      Original size      Compressed size    Deduplicated size
This archive:              37.10 GB             14.60 GB             13.38 GB
All archives:              37.10 GB             14.60 GB             13.38 GB

                      Unique chunks         Total chunks
Chunk index:                   25889                29349

$ ls -l /extra/repo-lg/index*

-rw-rw-r-- 1 tw tw 1310738 Jun 20 23:10 index.2264

$ du -sk /extra/repo-lg
13073928  /extra/repo-lg

xl chunks
=========

(borg-env)tw@tux:~/w/borg$ borg info /extra/repo-xl::1
Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,31,4095 /extra/repo-xl::1 /home/tw/win
Number of files: 3

                      Original size      Compressed size    Deduplicated size
This archive:              37.10 GB             14.59 GB             14.59 GB
All archives:              37.10 GB             14.59 GB             14.59 GB

                      Unique chunks         Total chunks
Chunk index:                    4319                 4434

$ ls -l /extra/repo-xl/index*
-rw-rw-r-- 1 tw tw 327698 Jun 21 00:52 index.2011

$ du -sk /extra/repo-xl/
14253464  /extra/repo-xl/

@ -50,6 +50,9 @@ Examples
NAME="root-`date +%Y-%m-%d`"
$ borg create /mnt/backup::$NAME / --do-not-cross-mountpoints

# Backup huge files with little chunk management overhead
$ borg create --chunker-params 19,23,21,4095 /mnt/backup::VMs /srv/VMs

.. include:: usage/extract.rst.inc