403 Commits

Author SHA1 Message Date
Thomas Waldmann 9fd284ce1a refactor new zero chunk handling to be reusable 2021-01-08 23:39:53 +01:00
Thomas Waldmann 6d0f9a52eb detect all-zero chunks, avoid hashing them
comparing zeros is quicker than hashing them.
the comparison should fail quickly inside non-zero data.
2021-01-08 17:40:06 +01:00
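
The idea in the commit above (compare against zeros first, hash only when needed) can be sketched as follows. This is a minimal illustration, not borg's code: `is_all_zero`, `chunk_id`, the 1 MiB buffer size and the use of plain SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

# Preallocated zero buffer; the 1 MiB size is an assumption for this sketch.
_ZEROS = bytes(1 << 20)
# Maps chunk length -> digest of an all-zero chunk of that length.
_zero_id_cache = {}


def is_all_zero(data: bytes) -> bool:
    """Return True if data contains only zero bytes."""
    n = len(data)
    if n <= len(_ZEROS):
        # Compare against a zero-copy slice of the preallocated buffer.
        # The comparison can stop at the first non-zero byte.
        return memoryview(_ZEROS)[:n] == data
    return data == bytes(n)


def chunk_id(data: bytes) -> bytes:
    """Hash data, reusing a cached digest for all-zero chunks."""
    if is_all_zero(data):
        n = len(data)
        if n not in _zero_id_cache:
            # Hash the zeros once per length, then reuse the result.
            _zero_id_cache[n] = hashlib.sha256(bytes(n)).digest()
        return _zero_id_cache[n]
    return hashlib.sha256(data).digest()
```

With a fixed chunk size, nearly all zero chunks share one length, so the cache would hold very few entries.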
Thomas Waldmann 52bd55b29a integrate Chunk type, avoid hashing holes 2021-01-08 17:39:51 +01:00
Thomas Waldmann b8bb0494f6 create --sparse, file map support for the "fixed" chunker, see #14
a file map can be:

- created internally inside chunkify by calling sparsemap, which uses
  SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a
  seekable sparse file.
  Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ...
  BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!

- made by some other means and given to the chunkify function.
  this is not used yet, but in future this could be used to only read
  the changed parts and seek over the (known) unchanged parts of a file.

sparsemap: the generated range sizes are multiples of the fs block size.
           the tests assume 4kiB fs block size.
2020-12-27 22:06:08 +01:00
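
How a file map might be derived with SEEK_DATA / SEEK_HOLE can be sketched like this. It is a simplified illustration under assumptions: the function name mirrors the commit message, but the `(start, length, is_data)` tuple layout is invented for the example, and `os.SEEK_DATA` / `os.SEEK_HOLE` are only available on platforms that support them (e.g. Linux).

```python
import errno
import os


def sparsemap(fd):
    """Yield (start, length, is_data) ranges covering the whole file behind fd."""
    file_len = os.lseek(fd, 0, os.SEEK_END)
    pos = 0
    while pos < file_len:
        try:
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            # ENXIO: no data after pos, so the rest of the file is a hole.
            yield (pos, file_len - pos, False)
            return
        if data_start > pos:
            # There is a hole between pos and the next data range.
            yield (pos, data_start - pos, False)
        # The end of the file counts as an implicit hole, so this always succeeds.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield (data_start, hole_start - data_start, True)
        pos = hole_start
```

On a filesystem without hole support, the kernel reports the whole file as a single data range, so a caller still sees a complete map.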
Thomas Waldmann 24d3400dd4 borg export-tar: fix memory leak with ssh: remote repository, fixes #5568
also: added a comment how to avoid this kind of memory leak.
2020-12-17 22:55:13 +01:00
Guinness 9052c1cc54 Add repo location to the stats in borg create 2020-12-16 13:46:29 +01:00
Lapinot 34f6cfcd81 Outsource recursive directory walking (#5492)
Split recursive directory walking/processing into walking and item processing.
2020-11-15 15:31:01 +01:00
Phil Kulin c0504c0669 create: implement --stdin-mode, --stdin-user and --stdin-group, #5333 2020-11-01 20:45:56 +03:00
Thomas Waldmann 0839ac3034 prettier error message when archive gets too big, fixes #5307 2020-09-08 21:00:27 +02:00
Thomas Waldmann d2536de4ee fix hardlinked CACHEDIR.TAG processing, fixes #4911 2020-06-14 22:00:02 +02:00
Thomas Waldmann dee402652f --read-special: .part files also should be regular files, fixes #5217 2020-06-14 15:36:22 +02:00
Peter Gerber 00b09370c0 Allow creating archives using stdout of given command (#5174)
allow creating archives using stdout of given command

In addition to allowing:

some-command --param value | borg create REPO::ARCH -

also allow:

borg create --content-from-command REPO::ARCH -- some-command --param value

The difference is that the latter approach deals with errors properly.
In the former example, an archive is created no matter what. Even if
`some-command` aborts and the output is truncated, Borg won't notice.
In the latter example, the status code is checked and archive creation
is aborted properly when appropriate.
2020-06-02 22:24:14 +02:00
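
The difference described above comes down to who sees the exit status. A shell pipe hides the status of `some-command` from the consumer, whereas a parent process that spawns the command itself can check it. A minimal sketch of that idea, with `read_command_output` as a hypothetical helper (not borg's implementation):

```python
import subprocess


def read_command_output(cmd, bufsize=64 * 1024):
    """Run cmd and return its stdout; raise if it exits with a non-zero status."""
    chunks = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            data = proc.stdout.read(bufsize)
            if not data:
                break
            chunks.append(data)
        # Only after EOF on stdout do we learn whether the command succeeded.
        rc = proc.wait()
    if rc != 0:
        # The output may be truncated, so it is discarded instead of archived.
        raise RuntimeError(f"command exited with status {rc}")
    return b"".join(chunks)
```

A real backup tool would stream the data instead of buffering it in memory and would abort the archive on failure. The buffering here only keeps the sketch short.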
Elmar Hoffmann dad3aa9dae rename local preload() function to not overwrite keyword argument of same name
The locally defined preload() function overwrites the preload boolean keyword
argument, always evaluating to true, so preloading is done, even when not
requested by the caller, causing a memory leak.
Also move its definition outside of the loop.

This issue was found by Antonio Larrosa in borg issue #5202.
2020-06-01 17:12:51 +02:00
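
The shadowing bug described above is easy to reproduce in isolation. The following sketch uses hypothetical function names and a list instead of real chunk preloading. It shows why the boolean argument is lost and how renaming the inner function fixes it.

```python
def fetch_buggy(ids, preload=False):
    """The local def rebinds `preload`, so the boolean argument is lost."""
    preloaded = []
    for chunk_id in ids:
        def preload(chunk_ids):  # overwrites the keyword argument of the same name
            preloaded.extend(chunk_ids)
        if preload:  # a function object is always truthy
            preload([chunk_id])
    return preloaded


def fetch_fixed(ids, preload=False):
    """The helper has its own name and is defined once, outside the loop."""
    preloaded = []

    def _preload(chunk_ids):
        preloaded.extend(chunk_ids)

    for chunk_id in ids:
        if preload:  # now really the caller's boolean
            _preload([chunk_id])
    return preloaded
```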
Thalian 08a7661e67 [FEATURE] #4489 – Deprecate --nobsdflags option
Replaced by --noflags. In internal data structure the key 'bsdflags' is kept for backwards compatibility.
2020-03-25 06:35:15 +01:00
Thomas Waldmann 046dea8643 check: do not stumble over invalid item key, fixes #4845
The code used for error reporting crashes due to an invalid utf-8
sequence. Use errors='replace' to never crash there. Errors
are expected in input data when borg check is run.
2020-03-09 00:12:36 +01:00
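
The effect of errors='replace' can be shown with a short sketch. `describe_item_key` is a hypothetical helper name. The point is only that decoding for an error message must not raise on corrupt input.

```python
def describe_item_key(raw_key: bytes) -> str:
    """Decode a possibly corrupt key for use in an error message."""
    # Invalid UTF-8 sequences become U+FFFD instead of raising UnicodeDecodeError.
    return raw_key.decode("utf-8", errors="replace")
```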
TW 597b09a993 support platforms with no os.link (#4903)
support platforms with no os.link, fixes #4901

if we don't have os.link, we just extract another copy instead of making a hardlink.

for that to work, we need to have (and keep) the chunks list in hardlink_masters.
2020-03-03 23:34:54 -05:00
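
The fallback described above could look roughly like this. It is a sketch under assumptions: `extract_hardlink` and its parameters are invented for the example, and the chunks are passed in as plain bytes objects rather than fetched from a repository.

```python
import os


def extract_hardlink(dest, master_dest, master_chunks, link=getattr(os, "link", None)):
    """Create dest as a hardlink to master_dest, or as a second copy if linking is unavailable."""
    if link is not None:
        link(master_dest, dest)
    else:
        # No os.link on this platform: write another copy from the master's chunks.
        # This is why the chunks list of the master has to be kept around.
        with open(dest, "wb") as f:
            for chunk in master_chunks:
                f.write(chunk)
```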
Thomas Waldmann a8831f4978 fix ProgressIndicator msgids, fixes #4935
add some to code, fix docs.
2020-03-03 23:57:36 +01:00
Rémi Oudin a029d686b5 Borg recreate timestamp is a no op (#4815)
recreate: support --timestamp option, fixes #4745
2019-11-16 11:03:34 +01:00
TW aa7df50a2d Merge pull request #4635 from ThomasWaldmann/ctrlc-checkpoint
first ctrl-c: checkpoint and abort, fixes #4606
2019-09-06 21:44:07 +02:00
Thomas Waldmann cb2d31ed98 fix partial extract for hardlinked contentless file types, fixes #4725
if the file is not a regular file, but a hardlink slave with a not
extracted hardlink master, chunks will be None and we must not call
preload(chunks).

(cherry picked from commit 291d58efa1)
2019-08-27 19:20:20 +05:30
Thomas Waldmann 9732fe4965 special behaviour on first ctrl-c, fixes #4606
like:
 - try saving a checkpoint if borg create is ctrl-c-ed
2019-08-25 22:49:09 +02:00
Jürg Rast bff97a99e1 Windows specific directory handling
On Windows, os.open does not work for directories.
If borg tries to open a directory on Windows, None is returned
as the file descriptor. The archive and archiver were adjusted to
handle the case where the file descriptor is None.
2019-08-24 10:17:18 +02:00
Thomas Waldmann 71c7efd17c extract: fix KeyError for "partial" extraction, fixes #4607
note that "partial" even applied to giving an always matching condition.

"full" is only assumed if no conditions are given.
2019-06-10 20:18:44 +02:00
Thomas Waldmann f33f318d81 preload chunks for hardlink slaves w/o preloaded master, fixes #4350
also split the hardlink extraction test into 2 tests.
2019-05-06 02:06:58 +02:00
Thomas Waldmann 502ebe63be delete archive: consider part files correctly for stats, see #4507 2019-04-19 19:29:30 +02:00
Thomas Waldmann cd4f6b41ca create: only run stat_simple_attrs() once
the second call was done in stat_attrs().

this improves performance of backups with lots of unchanged files by ~5%.
2019-04-08 21:34:09 +02:00
Thomas Waldmann b3751b107d determine whether a file has changed while being backed up, fixes #1750 2019-03-11 22:55:27 +01:00
Thomas Waldmann 6809f6f7fa calc_stats: use archive stats metadata, if available
by default, we still have to compute unique_csize the slow way,
but the code offers want_unique=False param to not compute it.
2019-02-23 15:05:07 +01:00
Thomas Waldmann e569595974 include size/csize/nfiles[_parts] stats into archive, fixes #3241 2019-02-23 15:05:07 +01:00
Thomas Waldmann 23eeded7c5 fix --read-special behaviour: follow symlinks pointing to special files
also: added a test for this.
2019-02-20 10:13:09 +01:00
Thomas Waldmann ec17f0a607 check for stat race conditions, see #908
we must avoid a handler processing a fs item of wrong file type,
so check if it has changed.
2019-02-20 09:16:57 +01:00
Thomas Waldmann 39922e88e5 micro-opt: get xattrs directly before acls
on linux, acls are based on xattrs, so do these closeby:

1. listxattr -> keys (without acl related keys)
2. for all keys: getxattr
3. acl-related getxattr by acl library
2019-02-17 02:46:03 +01:00
Thomas Waldmann 85b711fc88 opening device files is troublesome, don't do it
for fd-based operations, we would have to open the file, but for
char / block devices this has unwanted effects, even if we do not
read from the device.

thus, we use path (or dir_fd + name) based ops here.
2019-02-14 09:20:04 +01:00
Thomas Waldmann 833c49f834 use *at style functions (e.g. openat, statat) to avoid races
races via changing path components can be avoided by opening the
parent directory and using parent_fd + file_name combination with
*at style functions to access the directories' contents.
2019-02-14 09:20:04 +01:00
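
In Python, the *at style calls are reached through the `dir_fd` parameter of functions such as `os.open` and `os.stat`. The sketch below uses a hypothetical helper, `open_child`, and assumes a platform where `dir_fd` is supported. It opens an entry relative to an already opened parent directory and then takes all metadata from the resulting fd.

```python
import os


def open_child(parent_fd, name):
    """Open `name` relative to the directory behind parent_fd; return (fd, stat result)."""
    # O_NOFOLLOW is not defined on every platform, so fall back to 0 there.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(name, flags, dir_fd=parent_fd)
    # fstat on the fd describes exactly the file we opened, even if the
    # path components are changed by someone else afterwards.
    st = os.fstat(fd)
    return fd, st
```

Because the name is resolved relative to `parent_fd`, swapping a directory higher up in the path between two calls cannot redirect the access.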
Thomas Waldmann ad5b9a1dfd _process / process_*: change to kwargs only
we'll add/remove some args soon, so many pos args would be just bad.
2019-02-14 09:20:03 +01:00
Thomas Waldmann 8220c6eac8 move/refactor Archive._open_rb function to helpers.os_open
also:
- add and use OsOpen context manager
- add O_NONBLOCK, O_NOFOLLOW, O_NOCTTY (inspired by gnu tar)
2019-02-14 09:20:03 +01:00
Thomas Waldmann 677102f292 process_file: avoid race condition: stat data vs. content
always open the file and then do all operations with the fd:
- fstat
- read
- get xattrs, acls, bsdflags
2019-02-14 09:20:03 +01:00
Thomas Waldmann ac0803fe0b chunker algorithms: use constants to avoid typos 2019-02-13 04:36:09 +01:00
Thomas Waldmann c4ffbd2a17 prepare to support multiple chunkers 2019-02-13 04:24:14 +01:00
TW b204201fb5 Merge pull request #4302 from ThomasWaldmann/repair-output
add archive name to check --repair output, fixes #3447
2019-02-04 03:29:58 +01:00
TW c3f40de606 cache_sync: compute size/count stats, borg info: consider part files (#4286)
cache_sync: compute size/count stats, borg info: consider part files

fixes #3522
2019-02-04 03:26:45 +01:00
Thomas Waldmann 18b62f63a6 add archive name to check --repair output, fixes #3447
so it does not look like duplicated and also informs the user about
affected archives.
2019-02-01 23:30:45 +01:00
Emmo Emminghaus 733a2bfa30 Introduce borg.platformflags.is_<os> 2018-11-10 23:34:43 +01:00
Emmo Emminghaus 558ca61d20 remove posix issues and fixup for unsupported methods 2018-11-10 21:48:46 +01:00
Emmo Emminghaus b997d5ba5b move code from borg.helpers.usergroup to borg.platform.posix 2018-11-10 21:43:45 +01:00
Thomas Waldmann 10cdadb2f8 flake8: fix F841 2018-10-29 12:36:03 +01:00
Thomas Waldmann 3c173cc03b wrap msgpack, fixes #3632, fixes #2738
wrap msgpack so that future upstream api changes do not cause trouble
and we do not have to clutter our code globally with extra params.

make sure the packing is always with use_bin_type=False,
thus generating "old" msgpack format (as borg always did) from
bytes objects.

make sure the unpacking is always with raw=True,
thus generating bytes objects.

note:

safe unicode encoding/decoding for some kinds of data types is done in Item
class (see item.pyx), so it is enough if we care for bytes objects on the
msgpack level.

also wrap exception handling, so borg code can catch msgpack specific
exceptions even if the upstream msgpack code raises way too generic
exceptions typed Exception, TypeError or ValueError.
We use our own Exception classes for this; the upstream classes are deprecated.
2018-08-06 17:32:55 +02:00
Thomas Waldmann d2e2f1b89d call socket.gethostname only once 2018-08-04 17:40:40 +02:00
Thomas Waldmann de4afa097c separate borg compact command, fixes #2195 2018-07-14 14:29:28 +02:00
Thomas Waldmann 13e6970437 create: do not give chunker a py file object, it is not needed
the os level file handle is enough, the chunker will prefer it if
valid and won't use the file obj, so we can give None there.

this saves these unneeded syscalls:

fstat(5, {st_mode=S_IFREG|0664, st_size=227063, ...}) = 0
ioctl(5, TCGETS, 0x7ffd635635f0)  = -1 ENOTTY (Inappropriate ioctl for device)
lseek(5, 0, SEEK_CUR)             = 0
2018-07-07 18:06:57 +02:00
Thomas Waldmann 018b62c845 bsdflags: use fd instead of path
this optimization is only needed for linux, the bsd-like platforms
do not need an open file to run an ioctl against, but have bsdflags
in the stat result already.

on linux, this optimization saves 1 file open/close per input file.
2018-07-07 17:30:17 +02:00
Thomas Waldmann 7e47e68e29 acls: use fd instead of path 2018-07-07 17:02:37 +02:00
Thomas Waldmann 113b0eabec xattr: use fd for get_all
when processing regular files, use a fd to query xattrs.

when the file was modified and we chunked it, we have it open anyways.

if not, we open the file once and then query xattrs, in the hope that
this is more efficient than the path based calls.

guess it is less prone to race conditions in any case.
2018-07-07 15:47:56 +02:00
Thomas Waldmann 394d59e6d8 xattr: implement set_all to complement get_all
also: follow_symlinks param defaults to False (we never use True)

fix tests, xattrs are set via FD now.
2018-07-07 15:47:56 +02:00
Thomas Waldmann c29c3063b0 xattr: use bytes typed path for listxattr, getxattr, setxattr 2018-07-07 15:47:56 +02:00
Thomas Waldmann 9deb90db71 xattr: use bytes typed names for listxattr, getxattr, setxattr 2018-07-07 15:47:56 +02:00
Thomas Waldmann b5a9ac5682 xattr: use bytes typed values for listxattr, getxattr, setxattr
- getxattr should only return bytes, not None
- setxattr should not get a None value, just bytes
- remove unneeded tmp vars
2018-07-07 15:47:56 +02:00
Thomas Waldmann de113bab23 move capacity calculation to IndexBase, fixes #2646
we just give how many "usable" hashtable entries we want and it computes
the hashtable capacity internally via int(usable / MAX_LOAD_FACTOR).
2018-06-12 22:25:27 +02:00
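
The capacity formula quoted above is simple enough to show directly. In this sketch the value 0.75 for `MAX_LOAD_FACTOR` is an assumption chosen for illustration, and `capacity_for` is a hypothetical name.

```python
MAX_LOAD_FACTOR = 0.75  # assumed value, for illustration only


def capacity_for(usable: int) -> int:
    """Return the hashtable capacity needed to hold `usable` entries."""
    return int(usable / MAX_LOAD_FACTOR)
```

Under that assumption, asking for 750 usable entries gives a table with 1000 slots, so the table stays at or below the maximum load factor when full.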
Thomas Waldmann e064fcd99b borg check: show progress while rebuilding missing manifest, fixes #3787
(cherry picked from commit 85bc590c75)
2018-05-19 01:28:55 +02:00
Thomas Waldmann 7792cec03a borg check: fixup for "deleting orphaned objs" msgs, fixes #3795
only output msgs if there is actually something to delete.
be more precise, show count of orphaned / superseded objects.

(cherry picked from commit d671e9acf2)
2018-05-18 22:05:38 +02:00
Thomas Waldmann be4fdee3ae more borg check --repair output
(cherry picked from commit e6e1d18f9a)
2018-05-18 22:03:03 +02:00
Thomas Waldmann 1ee4397c1c xattrs: fix borg exception handling on ENOSPC error, fixes #3808
(cherry picked from commit 959beb867b)
2018-05-18 17:27:51 +02:00
TW b80dfc727e Merge pull request #3725 from ThomasWaldmann/issue-3448
set rc=1 when extracting damaged files, fixes #3448
2018-03-25 20:47:37 +02:00
Thomas Waldmann 232f051c10 cleanup: move "processing files" message to expected place
(now possible as we do not lazy load the files cache any more)
2018-03-24 17:04:20 -07:00
Thomas Waldmann e2f71b5dc3 cleanup: get rid of ignore_inode, replace with cache_mode
ignore_inode == ('i' not in cache_mode)  # i)node
2018-03-24 17:04:20 -07:00
Thomas Waldmann b1e7e7f90a cleanup: get rid of Cache.do_files, replace with cache_mode
not do_files == (cache_mode == 'd')  # d)isabled
2018-03-24 17:04:20 -07:00
Thomas Waldmann 91e5e231f1 read files cache early, init checkpoint timer after that, see #3394
reading the files cache can take a considerable amount of time (a user
reported 1h 42min for a 700MB files cache for a repo with 8M files and
15TB total), so we must init the checkpoint timer after that or borg
will create the checkpoint too early.

creating a checkpoint means (among other stuff) saving the files cache,
which will also take a lot of time in such a case, one time too much.

doing this in a clean way required some refactoring:
- cache_mode is now given to Cache initializer and stored in instance
- the files cache is loaded early in _do_open (if needed)
2018-03-24 17:04:13 -07:00
Thomas Waldmann 1c97efd81e set rc=1 when extracting damaged files, fixes #3448
- size inconsistencies
- file has all-zero replacement chunks

introduced new BackupError exception. when raised while extracting
files, gets handled via emitting a warning, setting rc=1 and
proceeding to next file.
2018-03-25 00:21:06 +01:00
Thomas Waldmann dc48377dc6 fix Archive's checkpoint_interval arg default (300 -> 1800s)
the commandline arg default was already at 1800, so likely this is
only a cosmetic fix.
2018-03-24 16:05:05 -07:00
Thomas Waldmann f979349f07 fix borg recreate --progress (broken by previous commit)
fixup for cb7887836a
2018-03-10 15:41:01 +01:00
Rémi Oudin cb7887836a Fix --progress option. (#3557)
Fix --progress option, fixes #3431
2018-03-10 15:11:08 +01:00
Thomas Waldmann 4e0f369d0a fix borg create never showing M status
the problem was that the upper layer code did not have enough information
about the file, whether it is known or not - and thus, could not decide
correctly whether status should be M)odified or A)dded.

now, file_known_and_unchanged method returns an additional "known"
boolean to fix this.

also: add comment about files cache loading in cache_mode='r'
2018-02-26 11:07:20 +01:00
Alexander 'Leo' Bergolth 74c10e4643 add chunker_params to archive info (at least to json output) 2018-01-25 21:02:39 +01:00
Thomas Waldmann 57a2d920cb check --repair: fix malfunctioning validator, fixes #3444
the major problem was the ('path' in item) expression.
the dict has bytes-typed keys there, so it never succeeded as it
looked for a str key. this is a 1.1 regression, 1.0 was fine.

the dict -> StableDict change is just for being more specific,
the check triggered correctly as StableDict subclasses dict,
it was just a bit too general.

(cherry picked from commit e09892caec)
2017-12-16 21:44:35 +01:00
Sam H b0141c1dc9 include item birthtime in archive (where available) (#3313)
include item birthtime in archive, fixes #3272

* use `safe_ns` when reading birthtime into attributes
* proper order for `birthtime` in `ITEM_KEYS` list
* use `bigint` wrapper for consistency
* Add tests to verify that birthtime is normally preserved, but not preserved when `--nobirthtime` is passed to `borg create`.
2017-11-13 14:55:10 +01:00
Thomas Waldmann 66cd1cd240 stats: do not count data volume twice when checkpointing, fixes #3224 2017-11-05 00:48:17 +01:00
TW 41ccd3d7d1 Merge pull request #3266 from ThomasWaldmann/set-bsdflags-last
set bsdflags last (include immutable flag), fixes #3263
2017-11-04 20:10:34 +01:00
Thomas Waldmann 7aafcc517a recreate: move chunks_healthy when excluding hardlink master, fixes #3228 2017-11-04 18:39:00 +01:00
Thomas Waldmann 90186ad12b get rid of already existing invalid chunks_healthy metadata, see #3218 2017-11-04 18:39:00 +01:00
Thomas Waldmann 7211bb2211 get rid of chunks_healthy when rechunking, fixes #3218 2017-11-04 18:39:00 +01:00
Thomas Waldmann 2c6f9634bc set bsdflags last (include immutable flag), fixes #3263 2017-11-04 15:18:55 +01:00
Thomas Waldmann 427e2ca5fb borg create: fix stats
master branch only (not present in 1.1-maint):

stats were computed at 2 different places, but the summing up was missing.
2017-11-02 18:06:39 +01:00
TW 38dd1f11ac Merge pull request #3181 from ThomasWaldmann/hardlinked-symlink-warning
remove hardlinked symlink warning, update docs
2017-10-17 21:30:53 +02:00
Thomas Waldmann 10adadf685 implement --nobsdflags and --exclude-nodump, fixes #3160
do not read/archive bsdflags: borg create --nobsdflags ...
do not extract/set bsdflags: borg extract --nobsdflags ...

use cases:

- fs shows wrong / random bsdflags (bug in filesystem)
- fs does not support bsdflags anyway
- already archived bsdflags are wrong / unwanted
- borg shows any sort of unwanted effect due to get_flags, esp. on Linux

the nodump flag ("do not backup this file") is not honoured any more by
default because this functionality (esp. if it happened by mistake or
unexpectedly) was rather confusing and at first inexplicable to users.

if you want that "do not backup NODUMP-flagged files" behaviour, use:
borg create --exclude-nodump ...
2017-10-17 18:45:32 +02:00
Thomas Waldmann e674822888 remove hardlinked symlinks warning, update docs, fixes #3175
the warning was annoying for people with a lot of such items and
they can not do anything about it anyway.

thus, just document this as a limitation.
2017-10-17 18:34:32 +02:00
Thomas Waldmann 9d6b125e98 borg recreate: correctly compute part file sizes, fixes #3157
when doing in-file checkpointing, borg creates *.borg_part_N files.
complete_file = part_1 + part_2 + ... + part_N

the source item for recreate already has a precomputed (total) size
member, thus we must force recomputation from the (partial) chunks
list to correct the size to be the part's size only.

borg create avoided this problem by computing the size member after
writing all the parts. this is now not required any more.

the bug is mostly cosmetic, borg check will complain, borg extract on
a part file would also complain. but all the complaints only refer to
the wrong metadata of the part files, the part files' contents are
correct.

usually you will never extract or look at part files, but only deal
with the full file, which will be completely valid, all metadata and
content.

you can get rid of the archives with these cosmetic errors by running
borg recreate on them with a fixed borg version. the old part files
will get dropped (because they are usually ignored) and any new part
file created due to checkpointing will be correct.
2017-10-14 04:24:26 +02:00
TW 13a4439bb8 Merge pull request #3120 from ThomasWaldmann/fix-nonlocal-path-detection
fix detection of non-local path, fixes #3108
2017-10-11 01:01:17 +02:00
Thomas Waldmann 60e9249100 fix detection of non-local path, fixes #3108
filenames like ..foobar are valid, so, to detect stuff in upper dirs,
we need to include the path separator and check if it starts with '../'.
2017-10-10 01:36:44 +02:00
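
The distinction made above ('..foobar' is a valid name, '../' points to an upper directory) can be sketched as follows. `is_outside_base` is a hypothetical helper. The normalization step and the extra check for a bare '..' are additions for the example, and POSIX path rules are assumed.

```python
import posixpath


def is_outside_base(relpath: str) -> bool:
    """Return True if relpath refers to something in an upper directory."""
    normalized = posixpath.normpath(relpath)
    # Checking for ".." plus the separator avoids flagging names like "..foobar".
    return normalized == ".." or normalized.startswith("../")
```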
Thomas Waldmann 9d3daebd5f recreate: don't crash on attic archives w/o time_end, fixes #3109 2017-10-10 01:17:56 +02:00
Thomas Waldmann 5e2de8ba67 implement files cache mode control, fixes #911
You can now control the files cache mode using this option:

--files-cache={ctime,mtime,size,inode,rechunk,disabled}*

(only some combinations are supported)

Previously, only these modes were supported:
- mtime,size,inode (default of borg < 1.1.0rc4)
- mtime,size (by using --ignore-inode)
- disabled (by using --no-files-cache)

Now, you additionally get:
- ctime alternatively to mtime (more safe), e.g.:
  ctime,size,inode (this is the new default of borg >= 1.1.0rc4)
- rechunk (consider all files as changed, rechunk them)

Deprecated:
- --ignore-inode (use modes without "inode")
- --no-files-cache (use "disabled" mode)

The tests needed some changes:
- previously, we used os.utime() to set a file's mtime (atime) to specific
  values, but that does not work for ctime.
- now use time.sleep() to create the "latest file" that usually does
  not end up in the files cache (see FAQ)
2017-10-01 00:52:32 +02:00
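
How a mode string could drive the "is this file unchanged?" decision is sketched below. This is not borg's implementation. The one-letter codes 'c', 'm' and 's' are assumptions by analogy with the 'i', 'd' and 'r' codes that appear in other commit messages in this list, and `FileState` and `known_and_unchanged` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record of what a files cache remembers about one file.
FileState = namedtuple("FileState", "size inode mtime_ns ctime_ns")


def known_and_unchanged(cache_mode, cached, current):
    """Return True if the cache entry proves the file does not need re-reading."""
    if cache_mode == "d" or "r" in cache_mode:
        # d)isabled: no files cache; r)echunk: treat every file as changed.
        return False
    if cached is None:
        return False  # file is not known to the cache
    if "s" in cache_mode and cached.size != current.size:
        return False
    if "i" in cache_mode and cached.inode != current.inode:
        return False
    if "c" in cache_mode and cached.ctime_ns != current.ctime_ns:
        return False
    if "m" in cache_mode and cached.mtime_ns != current.mtime_ns:
        return False
    return True
```

Leaving 'i' out of the mode string corresponds to what the older --ignore-inode option did: an inode change alone no longer marks the file as changed.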
Thomas Waldmann 928bde8676 get rid of datetime.isoformat to avoid bugs like #2994 2017-09-07 14:11:07 +02:00
TW 95d267493e Merge pull request #2959 from ThomasWaldmann/fix-timestamp-option
borg create --timestamp: set start time, fixes #2957
2017-08-25 04:36:44 +02:00
Thomas Waldmann 8a299ae24c borg create --timestamp: set start time, fixes #2957 2017-08-24 04:07:37 +02:00
enkore 1ac49380b1 Merge pull request #2925 from enkore/issue/2376
Datetime formatting
2017-08-22 17:33:17 +02:00
Marian Beermann a836f451ab one datetime formatter to rule them all
# Conflicts:
#	src/borg/helpers.py
2017-08-22 17:32:21 +02:00
Simon Frei 9dc22d230f Refactor the diff functionality
This factors out a lot of the logic in do_diff in archiver.py to Archive in
archive.py and a new class ItemDiff in item.pyx. The idea is to move methods
to the classes that are affected and to make it reusable, primarily for a new
option to fuse (#2475).
2017-08-13 21:23:04 +02:00
Simon Frei 9f6df7d999 Only move and change indentation of code - NOT functional 2017-08-06 01:42:32 +02:00
Marian Beermann a88519d540 archive: delete unused Archive.list_archives 2017-07-29 19:37:37 +02:00
Marian Beermann c93dba0195 archive: create FilesystemObjectProcessors class 2017-07-29 19:37:37 +02:00
Thomas Waldmann 8752039bec integrate new crypto code 2017-07-27 23:33:15 +02:00
Thomas Waldmann fc3498ac53 chunk_incref: use "size" for public api 2017-07-23 13:53:48 +02:00
Thomas Waldmann 186123cb68 give known chunk size to chunk_incref, fixes #2853
chunk_incref was called when dealing with part files without giving the
known chunk size in the size_ parameter.

adjusted LocalCache.chunk_incref to have same signature.
2017-07-23 13:53:47 +02:00
Thomas Waldmann 199f192a65 archive: closely wrap next() called from generator
lgtm:
Calling next() in a generator may cause unintended early termination of
an iteration.

It seems that lgtm did not detect the more loose wrapping that we used
before.
2017-07-23 02:00:55 +02:00
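
"Closely wrapping" next() means each call gets its own try/except, so running out of input is handled where it happens instead of leaking a StopIteration out of the generator body. A generic sketch (the `pairs` generator is invented for the example and unrelated to borg's code):

```python
def pairs(iterable):
    """Yield consecutive pairs; a trailing single item is paired with None."""
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return  # input exhausted at a pair boundary: finish cleanly
        try:
            second = next(it)
        except StopIteration:
            yield (first, None)  # odd number of items: emit the leftover
            return
        yield (first, second)
```

Since Python 3.7, a StopIteration that escapes from a generator body is turned into a RuntimeError, so the tight wrapping also avoids that failure mode.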
Thomas Waldmann 75c602d294 support borg list repo --format {comment}, fixes #2081
Also supported: {bcomment} for binary and {end} for backup end time.

Refactor so ArchiveFormatter works similar to ItemFormatter.
2017-07-05 23:37:42 +02:00
Thomas Waldmann 726051b9d1 fix double delete in rebuild_refcounts
in case of the Manifest having an IntegrityError,
the entry for the manifest was already deleted.
2017-06-17 23:25:32 +02:00
Marian Beermann e189a4d302 info: use CacheSynchronizer & HashIndex.stats_against 2017-06-13 14:34:10 +02:00
Marian Beermann 5f5371f0b1 implement --glob-archives/-a 2017-06-11 12:15:12 +02:00
Martin Hostettler fd0250d34a Add minimal version of in repository mandatory feature flags.
This should allow us to make sure older borg versions can be cleanly
prevented from doing operations that are no longer safe because of
repository format evolution. This allows more fine grained control than
just incrementing the manifest version. So for example a change that
still allows new archives to be created but would corrupt the repository
when an old version tries to delete an archive or check the repository
would add the new feature to the check and delete set but leave it out
of the write set.

This is somewhat inspired by ext{2,3,4} which uses sets for
compat (everything except fsck), ro-compat (may only be accessed
read-only by older versions) and features (refuse all access).
2017-06-09 23:13:31 +02:00
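
The per-operation feature sets described above can be modelled with plain sets. In this sketch, all names (`check_operation`, `MandatoryFeatureUnsupported`, the feature name "new-index" and the operation keys) are hypothetical and only illustrate the example given in the commit message.

```python
class MandatoryFeatureUnsupported(Exception):
    """Raised when a repository requires a feature this client does not know."""


def check_operation(mandatory, operation, supported):
    """Refuse `operation` if the repository demands features we do not support."""
    missing = set(mandatory.get(operation, ())) - set(supported)
    if missing:
        raise MandatoryFeatureUnsupported(
            f"{operation}: unsupported features {sorted(missing)}"
        )


# The example from the commit message: creating archives stays safe for old
# clients, but check and delete would corrupt the repository.
repo_features = {
    "write": set(),
    "check": {"new-index"},
    "delete": {"new-index"},
}
```

An old client that supports no extra features would pass `check_operation(repo_features, "write", set())` but be stopped on "delete" or "check".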
Marian Beermann 3f8a0221ee Revert "move chunker to borg.algorithms"
This reverts commit 956b50b29c.

# Conflicts:
#	setup.py
#	src/borg/archive.py
#	src/borg/helpers.py
2017-06-07 23:51:42 +02:00
TW 50bcd7843d recreate: keep timestamps as in original archive, fixes #2384 (#2607)
the timestamps of the recreated archive (in the archive metadata and
also in the manifest) are now as they were for the original archive.

they are important metadata about the archive contents and should
therefore be kept "as is".

note: when using -v --stats, the timestamps shown there for recreate
      are about the recreate start/end/duration.
2017-06-05 09:59:17 +02:00
Marian Beermann 8ad309ae2a recreate: if single archive is not processed, exit 2 2017-06-03 15:47:01 +02:00
Thomas Waldmann efec00b39c use stat with follow_symlinks=False
should be equivalent to using os.lstat() before.
2017-05-22 17:54:42 +02:00
Thomas Waldmann 094376a8ad require and use chown with follow_symlinks=False
should be equivalent to using os.lchown() before.
2017-05-22 17:54:42 +02:00
Marian Beermann a976e11a63 create crypto package with key, keymanager, low_level 2017-05-02 20:49:27 +02:00
Marian Beermann 956b50b29c move chunker to borg.algorithms 2017-05-02 19:15:01 +02:00
Marian Beermann 580496b592 create patterns module 2017-05-01 22:20:33 +02:00
Thomas Waldmann 28b0700437 verify_data: fix IntegrityError handling for defect chunks, fixes #2442
just getting data from the repo can already raise IntegrityErrors
in LoggedIO, so we need to catch them also.

see also the code a few lines above where this is done in the same way.
2017-04-25 15:48:16 +02:00
Mark Edgington 798127f636 allow excluding parent and including child, fixes #2314
This fixes the problem raised by issue #2314 by requiring that each root
subtree be fully traversed.

The problem occurs when a patterns file excludes a parent directory P later
in the file, but earlier in the file a subdirectory S of P is included.
Because a tree is processed recursively with a depth-first search, P is
processed before S is.  Previously, if P was excluded, then S would not even
be considered.  Now, it is possible to recurse into P nonetheless, while not
adding P (as a directory entry) to the archive.

With this commit, a `-` in a patterns-file will allow an excluded directory
to be searched for matching descendants.  If the old behavior is desired, it
can be achieved by using a `!` in place of the `-`.

The following is a list of specific changes made by this commit:

 * renamed InclExclPattern named-tuple -> CmdTuple (with names 'val' and 'cmd'), since it is used more generally for commands, and not only for representing patterns.
 * represent commands as IECommand enum types (RootPath, PatternStyle, Include, Exclude, ExcludeNoRecurse)
 * archiver: Archiver.build_matcher() paths arg renamed -> include_paths to prevent confusion as to whether the list of paths are to be included or excluded.
 * helpers: PatternMatcher has recurse_dir attribute that is used to communicate whether an excluded dir should be recursed (used by Archiver._process())
 * archiver: Archiver.build_matcher() now only returns a PatternMatcher instance, and not an include_patterns list -- this list is now created and housed within the PatternMatcher instance, and can be accessed from there.
 * moved operation of finding unmatched patterns from Archiver to PatternMatcher.get_unmatched_include_patterns()
 * added / modified some documentation of code
 * renamed _PATTERN_STYLES -> _PATTERN_CLASSES since "style" is ambiguous and this helps clarify that the set contains classes and not instances.
 * have PatternBase subclass instances store whether excluded dirs are to be recursed.  Because PatternBase objs are created corresponding to each +, -, ! command it is necessary to differentiate - from ! within these objects.
 * add test for '!' exclusion rule (which doesn't recurse)
2017-04-12 12:06:18 -04:00
enkore 736a815972 Merge pull request #2342 from ThomasWaldmann/generic-hardlinks
Generic hardlinks
2017-04-05 14:34:29 +02:00
Thomas Waldmann 155f38c233 remove comment about strange hardlink_masters term
(maybe revisit this later, this is not in scope of the generic hardlinks refactor)
2017-04-05 13:56:57 +02:00
Thomas Waldmann 8f769a9b24 implement and use hardlinkable() helper 2017-04-05 13:38:27 +02:00
Thomas Waldmann cb86bda413 extract: implement extract_helper context manager
Most code of the CM is just moved 1:1 from the regular file block.

Use the CM for regular files, FIFOs and devices, but not for:
- directories (can not have hardlinks)
- symlinks (we can not support hardlinked symlinks)
2017-04-05 13:36:09 +02:00
Thomas Waldmann cda7465038 extract: indent code, no semantics change
prepare for an extract_helper context manager

(some changes may seem superfluous, but see the following changesets)
2017-04-05 13:36:00 +02:00
Thomas Waldmann 3cc1cdd2ed extract: refactor hardlinks related code
prepare for an extract_helper context manager

(some changes may seem superfluous, but see the following changesets)
2017-04-05 13:03:58 +02:00
Thomas Waldmann 23cc679617 no hardlinking for directories and symlinks
- nlink > 1 for dirs does not mean hardlinking
  (at least not everywhere, wondering how apple does it)
- we can not archive hardlinked symlinks due to item.source dual-use,
  see issue #2343.

likely nobody uses this anyway.
2017-04-05 13:03:53 +02:00
Thomas Waldmann 1f6dc55eab simplify char/block device file dispatching 2017-04-05 13:01:04 +02:00
Thomas Waldmann 9478e8abd0 support hardlinks via create_helper context manager
also: reduce code duplication
2017-04-05 12:58:25 +02:00
Thomas Waldmann e5d094d0ce use same finalizing code for hardlink masters and slaves
hardlink slaves get a precomputed size attribute now.
2017-04-05 12:31:15 +02:00
Thomas Waldmann a206a85890 indent block, no semantics change 2017-04-05 12:31:11 +02:00
Thomas Waldmann 66f4cd1a29 minor refactor for regular file hardlink processing 2017-04-05 12:24:08 +02:00
Marian Beermann b2953357ed recreate: add --recompress flag, avoid weirdo use of args.compression 2017-04-04 15:11:15 +02:00
Marian Beermann 2ff75d58f2 remove Chunk() 2017-04-04 00:16:15 +02:00
Marian Beermann 69fb9bd403 remove --compression-from 2017-04-04 00:16:15 +02:00
Marian Beermann 0847c3f9a5 Unify ComprSpec and CompressionSpec; don't instantiate Compressors right away 2017-04-04 00:16:14 +02:00
Marian Beermann d1826cca92 Rename CompressionDecider1 -> CompressionDecider 2017-04-03 21:31:28 +02:00
Marian Beermann 0c7410104c Rename Chunk.meta[compress] => Chunk.meta[compressor] 2017-04-03 21:31:28 +02:00
Marian Beermann a27f585eaa refactor CompressionDecider2 into a meta Compressor 2017-04-03 21:31:28 +02:00
Thomas Waldmann bdbcbf7bb8 extract: remove duplicate code
anything at <path> gets nuked already a few lines above, if possible.
2017-04-01 16:56:21 +02:00
Thomas Waldmann d4e27e2952 extract: small bugfix and refactoring for parent dir creation
make_parent(path) helper to reduce code duplication.
also use it for directories although makedirs can also do it.

bugfix: also create parent dir for device files, if needed.
2017-03-28 23:22:25 +02:00
Thomas Waldmann ceaf4a8fcf extract: small bugfix and optimization for hardlink masters
if a hardlink master is not in the to-be-extracted subset, the "x"
status was not displayed for it.

also, the matcher was called twice for matching items.
2017-03-28 22:02:54 +02:00
Thomas Waldmann 945880af47 implement async_response, add wait=True for add_chunk/chunk_decref
Before this changeset, async responses were:
- if not an error: ignored
- if an error: raised as response to the arbitrary/unrelated next command

Now, after sending async commands, the async_response command must be used
to process outstanding responses / exceptions.

This avoids piling up lots of stuff in cases of high latency, because we do NOT
first wait until ALL responses have arrived, but can begin to process responses as they come in.
Calls with wait=False will just return what we already have received.
Repeated calls with wait=True until None is returned will fetch all responses.

Async commands now actually could have non-exception non-None results, but
this is not used yet. None responses are still dropped.

The motivation for this is to have a clear separation between a request
blowing up because it (itself) failed and failures unrelated to that request /
to that line in the sourcecode.

also: fix processing for async repo obj deletes

exception_ignored is a special object that is "not None" (as None is used to signal
"finished with processing async results") but is also not a potential async response result value.

Also:

added wait=True to chunk_decref() and add_chunk()

this makes async processing explicit - the default is synchronous and you only
need to be careful and do extra steps for async processing if you explicitly
request async by calling with wait=False (usually for speed reasons).

to process async results, use async_response, see above.
2017-03-26 17:33:19 +02:00
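The calling pattern described in this commit message can be sketched roughly as follows. This is a toy model under stated assumptions, not borg's code: `ToyRepo`, `put` and the rule that the key "bad" fails are hypothetical; only the `wait` parameter and the `async_response` name come from the message above.

```python
from collections import deque


class ToyRepo:
    """Hypothetical stand-in for a remote repository client (not borg's API)."""

    def __init__(self):
        self._in_flight = deque()  # requests sent, response not yet received
        self._arrived = deque()    # responses received, not yet processed

    def put(self, key, wait=True):
        # Toy failure rule (assumption): the key "bad" always fails.
        outcome = KeyError(key) if key == "bad" else None
        if wait:
            # Synchronous default: a failure is raised by the failing call itself.
            if isinstance(outcome, Exception):
                raise outcome
            return outcome
        # Asynchronous: remember the outcome, report it via async_response().
        self._in_flight.append(outcome)

    def async_response(self, wait=True):
        """Return the next non-None result, raise a stored error, or return None when done."""
        while True:
            if not self._arrived:
                if wait and self._in_flight:
                    # Simulate receiving one more response from the remote.
                    self._arrived.append(self._in_flight.popleft())
                else:
                    return None
            outcome = self._arrived.popleft()
            if isinstance(outcome, Exception):
                raise outcome
            if outcome is not None:
                return outcome
            # None responses are dropped; keep looking.


repo = ToyRepo()
for key in ("a", "bad", "c"):
    repo.put(key, wait=False)

errors = []
while True:
    try:
        if repo.async_response(wait=True) is None:
            break  # all outstanding responses were processed
    except KeyError as exc:
        errors.append(exc.args[0])
```

In this sketch the error for "bad" surfaces in the loop that drains responses, rather than being raised by an unrelated later command.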
Thomas Waldmann 2414cd4df7 use immutable data structure for the compression spec, fixes #2331
the bug was compr_args.update(compr_spec), helpers.py:2168 - that mutated
the compression spec dict (and not just some local one, but the compr spec
dict parsed from the commandline args).

so a change that was intended just for 1 chunk changed the desired
compression level on the archive scope.

I refactored the stuff to use a namedtuple (which is immutable, so such
effects can not happen again).
2017-03-24 03:09:55 +01:00
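A minimal sketch of this class of bug and of the namedtuple fix. The names `CompressionSpec`, `per_chunk_args_buggy` and `per_chunk_spec`, and the `name`/`level` fields, are hypothetical illustrations and are not taken from helpers.py.

```python
from collections import namedtuple


def per_chunk_args_buggy(shared_spec, override):
    # Bug pattern: no copy is made, so update() mutates the caller's dict.
    compr_args = shared_spec
    compr_args.update(override)
    return compr_args


# Hypothetical immutable spec; the field names are assumptions.
CompressionSpec = namedtuple("CompressionSpec", "name level")


def per_chunk_spec(shared_spec, **override):
    # _replace() returns a new tuple and leaves shared_spec untouched.
    return shared_spec._replace(**override)


cli_dict = {"name": "zlib", "level": 6}
per_chunk_args_buggy(cli_dict, {"level": 0})
# cli_dict["level"] is now 0 for every later chunk as well.

cli_spec = CompressionSpec(name="zlib", level=6)
one_chunk = per_chunk_spec(cli_spec, level=0)
# cli_spec.level is still 6; only one_chunk carries the override.
```

Assigning to a field of the namedtuple (for example `cli_spec.level = 0`) raises AttributeError, so an accidental in-place change fails loudly instead of silently affecting the whole archive.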
TW 10d4c97cad Merge pull request #2309 from ThomasWaldmann/fix-2304
clamp (nano)second values to unproblematic range, fixes #2304
2017-03-16 20:31:39 +01:00
Thomas Waldmann b7a17a6db7 clamp (nano)second values to unproblematic range, fixes #2304
filesystem -> clamp -> archive (create)
2017-03-16 20:31:05 +01:00
enkore 883a7eefb2 Archive: allocate zeros when needed (#2308)
fixes huge memory usage of mount (8 MiB × number of archives)
2017-03-15 17:08:07 +01:00
Marian Beermann cdb4df0885 --log-json: time property on most progress/log objects, remove is_prompt 2017-03-09 21:36:37 +01:00
Abdel-Rahman 63b5cbfc99 extract: warning RC for unextracted big extended attributes, followup (#2258)
* Set warning exit code when xattr is too big

* Warnings for more extended attributes errors (ENOTSUP, EACCES)

* Add tests for all xattr warnings
2017-03-08 17:13:42 +01:00
TW 89114d4885 Merge pull request #2198 from Abogical/too-big-xattr
Handle big extended attributes. Fixes #2161
2017-03-04 17:54:58 +01:00
Marian Beermann d5515b6952 add msgid to progress output 2017-02-28 01:19:20 +01:00
enkore 7c9c4b61d7 Merge pull request #2157 from ThomasWaldmann/add-filesize
archived file items: add size metadata
2017-02-27 18:05:43 +01:00
TW 9bc825a27a Merge pull request #2184 from ThomasWaldmann/zap
borg delete --force --force to delete severely corrupted archives, fixes #1975
2017-02-26 18:44:31 +01:00
Marian Beermann 70c11976bc Add --log-json option for structured logging output 2017-02-26 16:25:58 +01:00
Thomas Waldmann 4d81b186ec borg delete --force --force to delete severely corrupted archives, fixes #1975 2017-02-24 01:28:42 +01:00
Abogical 4c9bc96fb7 Print a warning for too big extended attributes 2017-02-23 23:42:56 +02:00
Thomas Waldmann 7da0a9c982 borg extract: check file size consistency 2017-02-23 21:46:15 +01:00
Thomas Waldmann adc4da280d borg check: check file size consistency 2017-02-23 21:46:15 +01:00
Marian Beermann 4f1db82f6d info <archive>: use Archive.info() for both JSON and human display 2017-02-23 21:39:56 +01:00
Thomas Waldmann 50068c596d rename Item.file_size -> get_size
file_size is misleading here because one thinks of on-disk file size,
but for compressed=True, there is no such on-disk file.
2017-02-23 21:27:05 +01:00
Thomas Waldmann 0021052dbd reduce code duplication 2017-02-23 21:24:37 +01:00
Thomas Waldmann a52b54dc3c archived file items: add size metadata
if an item has a chunk list, pre-compute the total size and store it into "size" metadata entry.

this speeds up access to item size (e.g. for regular files) and could also be used to verify the validity of the chunks list.

note about hardlinks: size is only stored for hardlink masters (only they have an own chunk list)
2017-02-23 21:24:37 +01:00
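The idea can be sketched as below, assuming an item is a plain dict and each chunk list entry is an `(id, size)` pair. Both are simplifications for illustration; `total_size`, `add_size_metadata` and `size_is_consistent` are hypothetical helper names.

```python
def total_size(chunks):
    """Sum the sizes of all (chunk_id, size) entries in a chunk list."""
    return sum(size for _chunk_id, size in chunks)


def add_size_metadata(item):
    # Only items that own a chunk list get a precomputed size.
    if "chunks" in item:
        item["size"] = total_size(item["chunks"])
    return item


def size_is_consistent(item):
    # The stored size can later be used to check the chunk list.
    if "size" not in item or "chunks" not in item:
        return True  # nothing to compare
    return item["size"] == total_size(item["chunks"])


regular_file = add_size_metadata(
    {"path": "data.bin", "chunks": [(b"id1", 4096), (b"id2", 1000)]}
)
# An entry without its own chunk list gets no size here.
hardlink_slave = add_size_metadata({"path": "link", "source": "data.bin"})
```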
Marian Beermann 6180f5055c info: --json for archives 2017-02-23 14:28:15 +01:00
Marian Beermann 7cbade2f8c create: add --json option 2017-02-23 12:00:21 +01:00
Marian Beermann 69f7810658 info: show utilization of maximum archive size
See #1452

This is 100 % accurate.

Also increases maximum data size by ~41 bytes. Not 100 % side-effect free;
if you manage to exactly land in that area then older Borg would not read
it. OTOH it gives us a nice round number there.
2017-02-22 23:47:21 +01:00
TW 268d74bb43 Merge pull request #2181 from ThomasWaldmann/fix-2180
archive check: detect and fix missing replacement chunks, fixes #2180
2017-02-21 21:57:23 +01:00
Thomas Waldmann b82f648875 archive check: detect and fix missing all-zero replacement chunks, fixes #2180 2017-02-19 03:05:41 +01:00
Thomas Waldmann b05893e723 borg rpc: use limited msgpack.Unpacker, fixes #2139
we do not trust the remote, so we are careful unpacking its responses.

the remote could return manipulated msgpack data that announces e.g.
a huge array or map or string. the local would then need to allocate huge
amounts of RAM in expectation of that data (no matter whether really
that much is coming or not).

by using limits in the Unpacker, a ValueError will be raised if unexpected
amounts of data shall get unpacked. memory DoS will be avoided.
2017-02-17 05:44:48 +01:00
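The principle behind the limits can be illustrated without msgpack: check the size a peer announces before allocating anything for it. The sketch below uses a made-up 4-byte big-endian length prefix rather than the msgpack wire format, and `MAX_ITEMS` and `read_announced_count` are hypothetical names.

```python
import struct

MAX_ITEMS = 1000  # hypothetical limit; a real limit depends on the protocol


def read_announced_count(header, limit=MAX_ITEMS):
    """Parse a 4-byte big-endian item count and reject oversized announcements."""
    (count,) = struct.unpack(">I", header[:4])
    if count > limit:
        # Refuse before allocating memory for the announced data.
        raise ValueError(f"peer announced {count} items, limit is {limit}")
    return count


small = read_announced_count(struct.pack(">I", 3))

try:
    read_announced_count(struct.pack(">I", 2**31))
    rejected = False
except ValueError:
    rejected = True
```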
TW c6ea34be96 Merge pull request #2111 from ThomasWaldmann/merge-1.0-maint
Merge 1.0-maint
2017-02-01 12:13:37 +01:00
Leo Antunes dd6b90fe6c change dir_is_tagged to use os.path.exists()
Add --keep-exclude-tags option as alias to --keep-tag-files and
deprecate the latter. Also make tagging accept directories as tags,
allowing things like `--exclude-if-present .git`.

fixes #1999
2017-01-29 18:13:51 +01:00
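A rough sketch of the check this commit describes, assuming the function receives a directory path and a list of tag names. The signature and return value are assumptions for illustration and may differ from borg's actual `dir_is_tagged`.

```python
import os
import tempfile


def dir_is_tagged(path, tag_names):
    """Return the tag names present in path, whether they are files or directories."""
    # os.path.exists() is true for both files and directories,
    # so a directory such as .git can act as a tag.
    return [name for name in tag_names if os.path.exists(os.path.join(path, name))]


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, ".git"))  # directory used as a tag
    with open(os.path.join(root, "CACHEDIR.TAG"), "w") as f:
        f.write("tag file\n")  # regular file used as a tag
    found = dir_is_tagged(root, [".git", "CACHEDIR.TAG", ".nobackup"])
```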
Thomas Waldmann c0dc644ef6 Merge branch '1.0-maint' into merge-1.0-maint
# Conflicts:
#	MANIFEST.in
#	Vagrantfile
#	docs/changes.rst
#	docs/usage/mount.rst.inc
#	src/borg/archiver.py
#	src/borg/fuse.py
#	src/borg/repository.py
2017-01-29 05:49:53 +01:00
Marian Beermann 5cc292c52c fix performance regression in "borg info ::archive" 2017-01-13 15:33:38 +01:00
Marian Beermann 7923088ff9 check: pick better insufficient archives matched warning from TW's merge 2017-01-12 17:04:51 +01:00
Marian Beermann ecad0ed53a Merge branch '1.0-maint' into merge/1.0-maint
# Conflicts: ... everywhere ...
#	.travis.yml
#	Vagrantfile
#	borg/testsuite/key.py
#	docs/changes.rst
#	docs/quickstart.rst
#	docs/usage.rst
#	docs/usage/upgrade.rst.inc
#	src/borg/archive.py
#	src/borg/archiver.py
#	src/borg/crypto.pyx
#	src/borg/helpers.py
#	src/borg/key.py
#	src/borg/remote.py
#	src/borg/repository.py
#	src/borg/testsuite/archive.py
#	src/borg/testsuite/archiver.py
#	src/borg/testsuite/crypto.py
#	src/borg/testsuite/helpers.py
#	src/borg/testsuite/repository.py
#	src/borg/upgrader.py
#	tox.ini
2017-01-12 15:01:41 +01:00
Marian Beermann d15fb241bd check: handle duplicate archive items neatly
# Conflicts:
#	src/borg/archive.py
2016-12-20 22:53:55 +01:00
Marian Beermann 5e1cb9d899 Add tertiary authentication for metadata (TAM) 2016-12-20 22:53:53 +01:00
Marian Beermann 63ce627a35 fix in-file checkpoints when clock jumps 2016-12-17 13:59:37 +01:00
Marian Beermann a9db2a2e55 Merge branch '1.0-maint' into master
# Conflicts:
#	src/borg/archive.py
#	src/borg/archiver.py
#	src/borg/helpers.py
2016-12-17 13:26:28 +01:00
Marian Beermann 34e19ccb6a mention failed operation in per-file warnings
on the one hand one can say it's ugly global state, on the other it's
totally handy!

just have to keep that in mind for MT, but it's rather obvious.
2016-12-14 15:20:08 +01:00
Marian Beermann b7eaeee266 clean imports, remove unused code 2016-12-03 17:50:50 +01:00
Marian Beermann b3707f7175 Replace backup_io with a singleton
This is some 15 times faster than @contextmanager, because no instance
creation is involved and no generator has to be maintained. Overall
difference is low, but still nice for a very simple change.
2016-12-03 11:52:48 +01:00
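The two styles compared in this commit message can be sketched as follows. `WrappedOSError`, `wrap_io` and `wrap_io_generator` are hypothetical names for illustration; this is not borg's `backup_io` code and it makes no claim about the measured speedup.

```python
from contextlib import contextmanager


class WrappedOSError(Exception):
    """Hypothetical wrapper exception for I/O errors."""


class _ErrorWrapper:
    """Stateless context manager, so one shared instance can be reused."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None and issubclass(exc_type, OSError):
            raise WrappedOSError(str(exc)) from exc
        return False  # do not swallow other exceptions


# Singleton style: no new object and no generator per "with" block.
wrap_io = _ErrorWrapper()


@contextmanager
def wrap_io_generator():
    # Generator style: each call creates a generator plus a helper object.
    try:
        yield
    except OSError as exc:
        raise WrappedOSError(str(exc)) from exc


def probe(manager):
    """Report whether an OSError raised inside the block gets wrapped."""
    try:
        with manager:
            raise OSError("simulated read failure")
    except WrappedOSError:
        return "wrapped"
    return "not wrapped"
```

Both variants behave the same for the caller: `probe(wrap_io)` and `probe(wrap_io_generator())` each return "wrapped". The difference is only in how much work each `with` statement does.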
Marian Beermann a9395dd8b1 recreate: don't rechunkify unless explicitly told so 2016-12-02 20:19:59 +01:00
Marian Beermann 30df63c509 recreate: remove special-cased --dry-run 2016-12-02 18:15:11 +01:00
Marian Beermann c1ccad82c3 recreate: update/remove/rename outdated comments 2016-12-02 12:54:27 +01:00
Marian Beermann eade10a0a8 recreate: fix crash on checkpoint 2016-12-02 11:39:10 +01:00
Marian Beermann eb940e6779 recreate: fix rechunking dropping all chunks on the floor 2016-12-02 11:20:26 +01:00
Marian Beermann b410392899 recreate repo: fix only one archive being processed 2016-12-02 11:09:52 +01:00
Thomas Waldmann a100fb67eb Merge branch '1.0-maint' into merge-1.0-maint
# Conflicts:
#	AUTHORS
#	src/borg/archive.py
#	src/borg/key.py
2016-11-30 05:38:04 +01:00
Thomas Waldmann c83a124e65 Merge branch '1.0-maint' (into master) 2016-11-28 02:23:32 +01:00
Marian Beermann 93b03ea231 recreate: re-use existing checkpoint functionality 2016-11-20 18:13:51 +01:00
Marian Beermann 44935aa8ea recreate: remove interruption blah, autocommit blah, resuming blah 2016-11-19 16:49:20 +01:00
enkore cf8f8fb746 Merge pull request #1846 from Abogical/master
Improve extract progress display, for #1721
2016-11-14 21:43:35 +01:00
Abogical b737866905 Improve extract progress display and ProgressIndicatorPercent 2016-11-13 23:41:01 +02:00
Marian Beermann 0d2b76fa7d Merge branch '1.0-maint' into merge/1.0-maint 2016-11-13 15:58:42 +01:00
Thomas Waldmann 4c884fd075 borg check --first / --last / --sort / --prefix, fixes #1663 2016-10-20 16:51:26 +02:00
Thomas Waldmann b5f9858055 move first/last/sort_by-multiple functionality into Manifest.list
also: rename list_filtered to list_considering
2016-10-15 01:04:56 +02:00
Thomas Waldmann b88e82d99d remove debug-xxx commands, fixes #1627
we use "debug xxx" subcommands now. docs updated.

also makes "borg help" shorter as not all debug-xxx commands
show up, but just 1 main "debug" command.
2016-10-10 00:22:01 +02:00
Thomas Waldmann cdb8d64fe2 check for index vs. segment files object count mismatch 2016-10-05 17:36:36 +02:00
Thomas Waldmann 6624ca9cdb verify_data: do a linear scan in disk-order 2016-10-05 17:36:36 +02:00
Thomas Waldmann 19eb75984e borg check --verify-data tuning 2016-09-29 18:40:02 +02:00
Marian Beermann 9cef0a9ed8 Fix broken --progress ellipsis for double-cell paths 2016-09-27 11:35:45 +02:00
Marian Beermann 8164524d99 Fix broken --progress for double-cell paths 2016-09-25 22:18:37 +02:00