writing: put type into repoobj metadata
reading: check wanted type against type we got
repoobj metadata is encrypted and authenticated.
repoobj data is encrypted and authenticated, also (separately).
encryption and decryption of both metadata and data get the
same "chunk ID" as AAD, so both are "bound" to that (same) ID.
a repo-side attacker can neither see cleartext metadata/data,
nor successfully tamper with it (AEAD decryption would fail).
also, a repo-side attacker could not replace a repoobj A with a
differently typed repoobj B without borg noticing:
- the metadata/data is cryptographically bound to its ID.
authentication/decryption would fail on mismatch.
- the type check would fail.
thus, the problem (see CVEs in changelog) solved in borg 1 by the
manifest and archive TAMs is now already solved by the type check.
+ os.scandir instead of os.listdir
Improved speed and added flexibility with attributes (name, path, is_dir(), is_file())
+ use is_dir / is_file to make sure we're reading only dirs / files respectively
+ Filtering to particular start, end index range built in
+ Move value bounds of segment (index) into constants module and use them instead
Resolves#7597
(forward patch from commits c9f35a16e9bf9e7073c486553177cef79ff1cb06^..edb5e749f512b7737b6933e13b7e61fefcd17bcb)
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of 4096b fixed size (last chunk in a file can be less)
- read chunks 0, 1 and 2 successfully
- error at chunk 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks
Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.
Read chunks as well as failed chunks count up by 1.
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.
people can recognize via the file name it is a partial file.
nobody cares for statistics of checkpoint files and the final
archive now does not contain any partial files any more, thus
no needs to maintain statistics about count and size of part
files.
Same change for .recreate_cmdline -> .recreate_command_line .
JSON output key "command_line":
borg 1.x: sys.argv [list of str]
borg 2: shlex.join(sys.argv) [str]
since python 3.7, .isoformat() is usable IF timespec != "auto"
is given ("auto" [default] would be as evil as before, sometimes
formatting with, sometimes without microseconds).
also since python 3.7, there is now .fromisoformat().
implemented by introducing one level of indirection, the limit is now
very high, so it is not practically relevant any more.
we always use the indirection (storing the metadata stream chunk ids list not
directly into the archive item, but into some repo objects referenced by the new
ArchiveItem.item_ptrs list).
thus, the code behaves the same for all archive sizes.
* make constants for files cache mode more clear
Traditionally, DEFAULT_FILES_CACHE_MODE_UI and DEFAULT_FILES_CACHE_MODE
were - as the naming scheme implies - the same setting, one being the UI
representation as given to the --files-cache command line option and the
other being the same default value in the internal representation.
It happended that the actual value used in borg create always comes from
DEFAULT_FILES_CACHE_MODE_UI (because that does have the --files-cache
option) whereas for all other commands (that do not use the files cache) it
comes from DEFAULT_FILES_CACHE_MODE.
PR #5777 then abused this fact to implement the optimisation to skip loading
of the files cache in those other commands by changing the value of
DEFAULT_FILES_CACHE_MODE to disabled.
This however also changes the meaning of that variable and thus redesignates
it to something not matching the original naming anymore.
Anyone not aware of this change and the intention behind it looking at the
code would have a hard time figuring this out and be easily mislead.
This does away with the confusion making the code more maintainable by
renaming DEFAULT_FILES_CACHE_MODE to FILES_CACHE_MODE_DISABLED, making the
new intention of that internal default clear.
* make constant for files cache mode UI default match naming scheme
Item.hlid: same id, same hardlink (xxh64 digest)
Item.hardlink_master: not used for new archives any more
Item.source: not used for hardlink slaves any more
note: this required a slight increase of MAX_OBJECT_SIZE so that MAX_DATA_SIZE
could stay the same as before.
For PUT2, compute the hash over the whole entry (header and content, excluding
hash and crc32 fields, because the crc32 computation includes the hash).
Also: refactor crc32 checks into function, use f-strings, structure _read in
a more logical sequential order.
write_put: avoid creating a large temporary bytes object
why use xxh64?
- fast even without hw acceleration
- borg depends on it already anyway
- stronger than crc32 and strong enough for this purpose
Argon2 the second part: implement encryption/decryption of argon2 keys
borg init --key-algorithm=argon2 (new default, older pbkdf2 also still available)
borg key change-passphrase: keep key algorithm the same
borg key change-location: keep key algorithm the same
use env var BORG_TESTONLY_WEAKEN_KDF=1 to resource limit (cpu, memory, ...) the kdf when running the automated tests.
also:
cleanup class structure: less inheritance, more mixins.
define type bytes using the 4:4 split
upper 4 bits are ciphersuite:
0 == legacy AES-CTR based stuff
1+ == new AEAD stuff
lower 4 bits are keytype:
legacy: a bit mixed up, as it was...
new stuff: 0=keyfile 1=repokey, ...
C code and the repo index use uint32 type for segment file offsets,
so when opening a repo and the config max_segment_size is too big,
fail early.
Also disallow setting a too big value via "borg config".
include item birthtime in archive, fixes#3272
* use `safe_ns` when reading birthtime into attributes
* proper order for `birthtime` in `ITEM_KEYS` list
* use `bigint` wrapper for consistency
* Add tests to verify that birthtime is normally preserved, but not preserved when `--nobirthtime` is passed to `borg create`.
You can now control the files cache mode using this option:
--files-cache={ctime,mtime,size,inode,rechunk,disabled}*
(only some combinations are supported)
Previously, only these modes were supported:
- mtime,size,inode (default of borg < 1.1.0rc4)
- mtime,size (by using --ignore-inode)
- disabled (by using --no-files-cache)
Now, you additionally get:
- ctime alternatively to mtime (more safe), e.g.:
ctime,size,inode (this is the new default of borg >= 1.1.0rc4)
- rechunk (consider all files as changed, rechunk them)
Deprecated:
- --ignore-inodes (use modes without "inode")
- --no-files-cache (use "disabled" mode)
The tests needed some changes:
- previously, we use os.utime() to set a files mtime (atime) to specific
values, but that does not work for ctime.
- now use time.sleep() to create the "latest file" that usually does
not end up in the files cache (see FAQ)
if an item has a chunk list, pre-compute the total size and store it into "size" metadata entry.
this speeds up access to item size (e.g. for regular files) and could also be used to verify the validity of the chunks list.
note about hardlinks: size is only stored for hardlink masters (only they have an own chunk list)
See #1452
This is 100 % accurate.
Also increases maximum data size by ~41 bytes. Not 100 % side-effect free;
if you manage to exactly land in that area then older Borg would not read
it. OTOH it gives us a nice round number there.