see ticket and borg.helpers.msgpack docstring.
this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).
still needed compat to the past is done via want_bytes decoder in borg.item.
borg now has the chunks list in every item with content.
due to the symmetric way how borg now deals with hardlinks using
item.hlid, processing gets much simpler.
but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation at one place and avoids some
code duplication we had before).
item.hlid is computed via hardlink_id function.
support hardlinked symlinks, fixes#2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.
2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.
ItemFormatter: support {hlid} instead of {source} for hardlinks
the id must now always be given correctly because
the AEAD crypto modes authenticate the chunk id.
the special case when id == MANIFEST_ID is now handled
inside assert_id, so we never need to give a None id.
all-zero chunks are propagated as:
CH_ALLOC, data=None, size=len(zeros)
other chunks are:
CH_DATA, data=data, size=len(data)
also: remove the comment with the wrong assumption
this is similar to #4777.
borg check must not crash if an archive metadata block does not decrypt.
Instead, report the archive_id, remove the archive from the manifest and skip to the next archive.
OpenBSD does not have `lchmod()` causing `os.lchmod` to be unavailable
on this platform. As a result ArchiverTestCase::test_basic_functionality
fails when run manually (#2055).
OpenBSD does have `fchmodat()`, which has a flag that makes it behave
like `lchmod()`. In Python this can be used via `os.chmod(path, mode,
follow_symlinks=False)`.
As of Python 3.3 `os.lchmod(path, mode)` is equivalent to
`os.chmod(path, mode, follow_symlinks=False)`. As such, switching to the
latter is preferred as it enables more platforms to do the right thing.
export-tar: just msgpack and b64encode all item metadata and
put that into a BORG specific PAX header.
this is *additional* to the standard tar metadata.
import-tar: when detecting the BORG specific PAX header, just get
all metadata from there (and ignore the standard tar
metadata).
--tar-format=GNU|PAX (default: GNU)
changed the tests which use GNU tar cli tool to use --tar-format=GNU
explicitly, so they don't break in case we change the default.
atime timestamp is only present in output if the archive item has it
(which is not the case by default, needs "borg create --atime ...").
we already have .decrypt(id, data, ...).
i changed .encrypt(chunk) to .encrypt(id, data).
the old borg crypto won't really need or use the id,
but the new AEAD crypto will authenticate the id in future.
looks like with a .tar file created by the tar tool,
tarinfo.mtime is a float [s]. So, after converting to
nanoseconds, we need to cast to int because that's what
Item.mtime wants.
also added a safe_ns() there to clip values to the safe range.
first time borg info is invoked on a borg 1.1 repo, it can take
a rather long time computing and caching some stats values for
1.1 archives, which borg 1.2 archives have in their archive
metadata structure. be patient, esp. if you have lots of old
archives.
following invocations are much faster.
Paths that come from --paths-from-stdin or --paths-from-command don't
have a parent_fd or name, so we need to use the os_stat helper that
falls back on the full path if those are missing.
Fixesborgbackup/borg#6009
Depending on the number of archives in a repository, the archive check part
of the check operation can take some time, so it should have a progress
indicator as well.
if we back up stdin / pipes / regular files (or devices with --read-special),
that may take longer, depending on the amount of content data (could be many GiBs).
usually borg announces file status AFTER backing up the file,
when the final status of the file is known.
with this change, borg announces a preliminary file status before
the content data is processed. if the file status changes afterwards,
e.g. due to an error, it will also announce that as final file status.
this is different default behaviour than in borg < 1.2:
default (numeric_owner=False) is to use the user/group name from the archive,
look up the local uid / gid and then use that for the FUSE fs.
when --numeric-owner is given (numeric_owner=True), then the uid/gid
from the archive is directly used (as it was the default behaviour in
borg < 1.2).
this was implemented like this (changing the default behaviour) to make
borg mount and borg extract behave more similar considering usage of
user/group numeric archived ids or archived names mapped to corresponding
numeric local system ids.
also, both now use the same function to get the uid/gid from the item.
fuse:
- add user and group name entries to default_dir
- also: set internal_dict(!) of new Item with data from Item.as_dict()
when given with borg create, borg will not get xattrs from input files (and thus, it will not archive xattrs).
when given with borg extract, borg will not read xattrs from archive and it will not set xattrs on extracted files.
when given with borg create, borg will not get ACLs from input files (and thus, it will not archive ACLs).
when given with borg extract, borg will not read ACLs from archive and it will not set ACLs on extracted files.
a file map can be:
- created internally inside chunkify by calling sparsemap, which uses
SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a
seekable sparse file.
Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ...
BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!
- made by some other means and given to the chunkify function.
this is not used yet, but in future this could be used to only read
the changed parts and seek over the (known) unchanged parts of a file.
sparsemap: the generate range sizes are multiples of the fs block size.
the tests assume 4kiB fs block size.
allow creating archives using stdout of given command
In addition to allowing:
some-command --param value | borg create REPO::ARCH -
also allow:
borg create --content-from-command create REPO::ARCH -- some-command --param value
The difference is that the latter approach deals with errors properly.
In the former example, an archive is created no matter what. Even, if
`some-command` aborts and the output is truncated, Borg won't realize.
In the latter example, the status code is checked and archive creation
is aborted properly when appropriate.
The locally defined preload() function overwrites the preload boolean keyword
argument, always evaluating to true, so preloading is done, even when not
requested by the caller, causing a memory leak.
Also move its definition outside of the loop.
This issue was found by Antonio Larrosa in borg issue #5202.
The code used for error reporting crashes due to an invalid utf-8
sequence. Use errors='replace' to never crash there. Errors
are expected in input data when borg check is run.
support platforms with no os.link, fixes#4901
if we don't have os.link, we just extract another copy instead of making a hardlink.
for that to work, we need to have (and keep) the chunks list in hardlink_masters.
if the file is not a regular file, but a hardlink slave with a not
extracted hardlink master, chunks will be None and we must not call
preload(chunks).
(cherry picked from commit 291d58efa1)
On windows os.open does not work for directories.
If borg tries to open an directory on windows, None is returned
as file descriptor. The archive and archiver where adjusted to
handle the case if a file descriptor is None.
on linux, acls are based on xattrs, so do these closeby:
1. listxattr -> keys (without acl related keys)
2. for all keys: getxattr
3. acl-related getxattr by acl library
for fd-based operations, we would have to open the file, but for
char / block devices this has unwanted effects, even if we do not
read from the device.
thus, we use path (or dir_fd + name) based ops here.
races via changing path components can be avoided by opening the
parent directory and using parent_fd + file_name combination with
*at style functions to access the directories' contents.
wrap msgpack to avoid future upstream api changes making troubles
or that we would have to globally spoil our code with extra params.
make sure the packing is always with use_bin_type=False,
thus generating "old" msgpack format (as borg always did) from
bytes objects.
make sure the unpacking is always with raw=True,
thus generating bytes objects.
note:
safe unicode encoding/decoding for some kinds of data types is done in Item
class (see item.pyx), so it is enough if we care for bytes objects on the
msgpack level.
also wrap exception handling, so borg code can catch msgpack specific
exceptions even if the upstream msgpack code raises way too generic
exceptions typed Exception, TypeError or ValueError.
We use own Exception classes for this, upstream classes are deprecated
the os level file handle is enough, the chunker will prefer it if
valid and won't use the file obj, so we can give None there.
this saves these unneeded syscalls:
fstat(5, {st_mode=S_IFREG|0664, st_size=227063, ...}) = 0
ioctl(5, TCGETS, 0x7ffd635635f0) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(5, 0, SEEK_CUR) = 0
this optimization is only needed for linux, the bsd-like platforms
do not need an open file to run a ioctl against, but have bsdflags
in the stat result already.
on linux, this optimization saves 1 file open/close per input file.
when processing regular files, use a fd to query xattrs.
when the file was modified and we chunked it, we have it open anyways.
if not, we open the file once and then query xattrs, in the hope that
this is more efficient than the path based calls.
guess it is less prone to race conditions in any case.
only output msgs if there is actually something to delete.
be more precise, show count of orphaned / superseded objects.
(cherry picked from commit d671e9acf2)
reading the files cache can take considerable amount of time (a user
reported 1h 42min for a 700MB files cache for a repo with 8M files and
15TB total), so we must init the checkpoint timer after that or borg
will create the checkpoint too early.
creating a checkpoint means (among other stuff) saving the files cache,
which will also take a lot of time in such a case, one time too much.
doing this in a clean way required some refactoring:
- cache_mode is now given to Cache initializer and stored in instance
- the files cache is loaded early in _do_open (if needed)
- size inconsistencies
- file has all-zero replacement chunks
introduced new BackupError exception. when raised while extracting
files, gets handled via emitting a warning, setting rc=1 and
proceeding to next file.
the problem was that the upper layer code did not have enough information
about the file, whether it is known or not - and thus, could not decide
correctly whether status should be M)odified or A)dded.
now, file_known_and_unchanged method returns an additional "known"
boolean to fix this.
also: add comment about files cache loading in cache_mode='r'
the major problem was the ('path' in item) expression.
the dict has bytes-typed keys there, so it never succeeded as it
looked for a str key. this is a 1.1 regression, 1.0 was fine.
the dict -> StableDict change is just for being more specific,
the check triggered correctly as StableDict subclasses dict,
it was just a bit too general.
(cherry picked from commit e09892caec)
include item birthtime in archive, fixes#3272
* use `safe_ns` when reading birthtime into attributes
* proper order for `birthtime` in `ITEM_KEYS` list
* use `bigint` wrapper for consistency
* Add tests to verify that birthtime is normally preserved, but not preserved when `--nobirthtime` is passed to `borg create`.
do no read/archive bsdflags: borg create --nobsdflags ...
do not extract/set bsdflags: borg extract --nobsdflags ...
use cases:
- fs shows wrong / random bsdflags (bug in filesystem)
- fs does not support bsdflags anyway
- already archived bsdflags are wrong / unwanted
- borg shows any sort of unwanted effect due to get_flags, esp. on Linux
the nodump flag ("do not backup this file") is not honoured any more by
default because this functionality (esp. if it happened by error or
unexpected) was rather confusing and unexplainable at first to users.
if you want that "do not backup NODUMP-flagged files" behaviour, use:
borg create --exclude-nodump ...
when doing in-file checkpointing, borg creates *.borg_part_N files.
complete_file = part_1 + part_2 + ... + part_N
the source item for recreate already has a precomputed (total) size
member, thus we must force recomputation from the (partial) chunks
list to correct the size to be the part's size only.
borg create avoided this problem by computing the size member after
writing all the parts. this is now not required any more.
the bug is mostly cosmetic, borg check will complain, borg extract on
a part file would also complain. but all the complaints only refer to
the wrong metadata of the part files, the part files' contents are
correct.
usually you will never extract or look at part files, but only deal
with the full file, which will be completely valid, all metadata and
content.
you can get rid of the archives with these cosmetic errors by running
borg recreate on them with a fixed borg version. the old part files
will get dropped (because they are usually ignored) and any new part
file created due to checkpointing will be correct.
You can now control the files cache mode using this option:
--files-cache={ctime,mtime,size,inode,rechunk,disabled}*
(only some combinations are supported)
Previously, only these modes were supported:
- mtime,size,inode (default of borg < 1.1.0rc4)
- mtime,size (by using --ignore-inode)
- disabled (by using --no-files-cache)
Now, you additionally get:
- ctime alternatively to mtime (more safe), e.g.:
ctime,size,inode (this is the new default of borg >= 1.1.0rc4)
- rechunk (consider all files as changed, rechunk them)
Deprecated:
- --ignore-inodes (use modes without "inode")
- --no-files-cache (use "disabled" mode)
The tests needed some changes:
- previously, we use os.utime() to set a files mtime (atime) to specific
values, but that does not work for ctime.
- now use time.sleep() to create the "latest file" that usually does
not end up in the files cache (see FAQ)
This factors out a lot of the logic in do_diff in archiver.py to Archive in
archive.py and a new class ItemDiff in item.pyx. The idea is to move methods
to the classes that are affected and to make it reusable, primarily for a new
option to fuse (#2475).