item: path, source, user, group
for non-unicode stuff borg 1.2 had "bpath".
now we have:
path - unicode approximation (invalid stuff replaced by ?)
path_b64 - base64(path_bytes) # only if needed
source has the same issue as path and is now covered also.
user and group are usually unicode or even pure ASCII,
but we rather are cautious and cover them also.
One cannot "to not x", but one can "not to x".
Avoiding split infinitives gives the added bonus that machine
translation yields better results.
setup (n/adj) vs set(v) up. We don't "I setup it" but "I set it up".
Likewise for login(n/adj) and log(v) in, backup(n/adj) and back(v) up.
this option did not change behaviour since longer,
we only had kept it for API compatibility.
as a borg2 repo server won't have old clients talking to it,
we can safely remove this everywhere now.
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes
Note: the sleep() in the test is needed due to timestamp granularity on linux being much more coarse than expected (uses the system timer, 100Hz or 250Hz).
when using .scan(limit, marker), we used to use the last chunkid from
the previously returned scan result to remember how far we got and
from where we need to continue.
as this approach used the repo index to look up the respective segment/offset,
it was problematic if the code using scan was re-writing the chunk to
a new segment/offset, updating the repo index (e.g. when recompressing a chunk)
and basically destroying the memory about from where we need to continue
scanning.
thus, directly returning (segment, offset) as marker is easier and solves this issue.
legacy: add/remove ctype/clevel bytes prefix of compressed data
new: use a separate metadata dict
compressors: use an int as ID, not a len 1 bytestring
borg < 2:
obj = encrypted(compressed(data))
borg 2:
obj = enc_meta_len32 + encrypted(msgpacked(meta)) + encrypted(compressed(data))
handle compr / decompr in repoobj
move the assert_id call from decrypt to RepoObj.parse
also:
- for AEADKeyBase, add a dummy assert_id (not needed here)
- only test assert_id for other if not AEADKeyBase instance
- remove test_getting_wrong_chunk. assert_id is called elsewhere
and is not needed any more anyway with the new AEAD crypto.
- only give manifest (includes key, repo, repo_objs)
- only return manifest from Manifest.load (includes key, repo, repo_objs)
- timezone aware timestamps
- str representation with +HHMM or +HH:MM
- get rid of to_locatime
- fix with_timestamp
- have archive start/end time always in local time with tz or as given
- idea: do not lose tz information
then we know when a backup was made and even from
which timezone it was made. if we want to compute
utc, we can do that using these infos.
this makes a quite nice archives list, with timestamps
as expected (in local time with timezone info).
at some places we just enforce utc, like for the
repo manifest timestamp or for the transaction log,
these are usually not looked at by the user.
since python 3.7, .isoformat() is usable IF timespec != "auto"
is given ("auto" [default] would be as evil as before, sometimes
formatting with, sometimes without microseconds).
also since python 3.7, there is now .fromisoformat().
implemented by introducing one level of indirection, the limit is now
very high, so it is not practically relevant any more.
we always use the indirection (storing the metadata stream chunk ids list not
directly into the archive item, but into some repo objects referenced by the new
ArchiveItem.item_ptrs list).
thus, the code behaves the same for all archive sizes.
the old code did just 1 attempt to detect the repo decryption key.
if the first chunkid we got from the chunks hashtable iterator was accidentally
the id of the chunk we intentionally corrupted in test_delete_double_force,
setup of the key failed and that made the test crash.
in practice, this could of course also happen if chunks are corrupted, thus
we now do many retries with other chunks before giving up.
error handling was improved: do not return None (instead of a key), it just
leads to weird crashes elsewhere, but fail early with IntegrityError and a
reasonable error msg.
rename method to make_key to avoid confusion with borg.crypto.key.identify_key.
see ticket and borg.helpers.msgpack docstring.
this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).
still needed compat to the past is done via want_bytes decoder in borg.item.
borg now has the chunks list in every item with content.
due to the symmetric way how borg now deals with hardlinks using
item.hlid, processing gets much simpler.
but some places where borg deals with other "sources" of hardlinks
still need to do some hardlink management:
borg uses the HardLinkManager there now (which is not much more
than a dict, but keeps documentation at one place and avoids some
code duplication we had before).
item.hlid is computed via hardlink_id function.
support hardlinked symlinks, fixes#2379
as we use item.hlid now to group hardlinks together,
there is no conflict with the item.source usage for
symlink targets any more.
2nd+ hardlinks now add to the files count as did the 1st one.
for borg, now all hardlinks are created equal.
so any hardlink item with chunks now adds to the "file" count.
ItemFormatter: support {hlid} instead of {source} for hardlinks
the id must now always be given correctly because
the AEAD crypto modes authenticate the chunk id.
the special case when id == MANIFEST_ID is now handled
inside assert_id, so we never need to give a None id.
all-zero chunks are propagated as:
CH_ALLOC, data=None, size=len(zeros)
other chunks are:
CH_DATA, data=data, size=len(data)
also: remove the comment with the wrong assumption
this is similar to #4777.
borg check must not crash if an archive metadata block does not decrypt.
Instead, report the archive_id, remove the archive from the manifest and skip to the next archive.
OpenBSD does not have `lchmod()` causing `os.lchmod` to be unavailable
on this platform. As a result ArchiverTestCase::test_basic_functionality
fails when run manually (#2055).
OpenBSD does have `fchmodat()`, which has a flag that makes it behave
like `lchmod()`. In Python this can be used via `os.chmod(path, mode,
follow_symlinks=False)`.
As of Python 3.3 `os.lchmod(path, mode)` is equivalent to
`os.chmod(path, mode, follow_symlinks=False)`. As such, switching to the
latter is preferred as it enables more platforms to do the right thing.
export-tar: just msgpack and b64encode all item metadata and
put that into a BORG specific PAX header.
this is *additional* to the standard tar metadata.
import-tar: when detecting the BORG specific PAX header, just get
all metadata from there (and ignore the standard tar
metadata).
--tar-format=GNU|PAX (default: GNU)
changed the tests which use GNU tar cli tool to use --tar-format=GNU
explicitly, so they don't break in case we change the default.
atime timestamp is only present in output if the archive item has it
(which is not the case by default, needs "borg create --atime ...").
we already have .decrypt(id, data, ...).
i changed .encrypt(chunk) to .encrypt(id, data).
the old borg crypto won't really need or use the id,
but the new AEAD crypto will authenticate the id in future.
looks like with a .tar file created by the tar tool,
tarinfo.mtime is a float [s]. So, after converting to
nanoseconds, we need to cast to int because that's what
Item.mtime wants.
also added a safe_ns() there to clip values to the safe range.
first time borg info is invoked on a borg 1.1 repo, it can take
a rather long time computing and caching some stats values for
1.1 archives, which borg 1.2 archives have in their archive
metadata structure. be patient, esp. if you have lots of old
archives.
following invocations are much faster.
Paths that come from --paths-from-stdin or --paths-from-command don't
have a parent_fd or name, so we need to use the os_stat helper that
falls back on the full path if those are missing.
Fixesborgbackup/borg#6009
Depending on the number of archives in a repository, the archive check part
of the check operation can take some time, so it should have a progress
indicator as well.
if we back up stdin / pipes / regular files (or devices with --read-special),
that may take longer, depending on the amount of content data (could be many GiBs).
usually borg announces file status AFTER backing up the file,
when the final status of the file is known.
with this change, borg announces a preliminary file status before
the content data is processed. if the file status changes afterwards,
e.g. due to an error, it will also announce that as final file status.
this is different default behaviour than in borg < 1.2:
default (numeric_owner=False) is to use the user/group name from the archive,
look up the local uid / gid and then use that for the FUSE fs.
when --numeric-owner is given (numeric_owner=True), then the uid/gid
from the archive is directly used (as it was the default behaviour in
borg < 1.2).
this was implemented like this (changing the default behaviour) to make
borg mount and borg extract behave more similar considering usage of
user/group numeric archived ids or archived names mapped to corresponding
numeric local system ids.
also, both now use the same function to get the uid/gid from the item.
fuse:
- add user and group name entries to default_dir
- also: set internal_dict(!) of new Item with data from Item.as_dict()
when given with borg create, borg will not get xattrs from input files (and thus, it will not archive xattrs).
when given with borg extract, borg will not read xattrs from archive and it will not set xattrs on extracted files.
when given with borg create, borg will not get ACLs from input files (and thus, it will not archive ACLs).
when given with borg extract, borg will not read ACLs from archive and it will not set ACLs on extracted files.