as borg now uses repository.store_load and .store_save to load
and save the chunks cache, we need a rather high limit here.
this is a quick fix; the real fix might be to store the data in
chunks (preferably <= MAX_OBJECT_SIZE), so there is less to unpack
at once.
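
A possible shape of that "real fix", as a rough sketch only: split the
serialized cache before store_save and reassemble it in store_load. The
part key naming scheme and the MAX_OBJECT_SIZE value below are assumptions,
not borg's actual layout:

    # sketch: store the chunks cache split into parts <= MAX_OBJECT_SIZE,
    # so each load/unpack step deals with less data at once.
    MAX_OBJECT_SIZE = 20 * 1024 * 1024  # assumption: example value only

    def store_save_chunked(repository, data, name="cache/chunks"):
        """Save data split into parts <= MAX_OBJECT_SIZE; return part count."""
        parts = 0
        for offset in range(0, len(data), MAX_OBJECT_SIZE):
            repository.store_save(f"{name}.{parts:05d}", data[offset:offset + MAX_OBJECT_SIZE])
            parts += 1
        return parts

    def store_load_chunked(repository, parts, name="cache/chunks"):
        """Load and reassemble the parts written by store_save_chunked."""
        return b"".join(repository.store_load(f"{name}.{i:05d}") for i in range(parts))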
Read or modify this set; only add validated str values to it:
Archive.tags: Optional[set[str]]
borg info [--json] <archive> displays a list of comma-separated archive tags (currently always empty).
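
A minimal sketch of the "only add validated str" rule; the validate_tag
helper and its allowed character set are assumptions, not borg's actual
validation:

    import re

    TAG_RE = re.compile(r"^[A-Za-z0-9._-]+$")  # assumption: the real allowed charset may differ

    def validate_tag(tag):
        """Return tag if it is a valid str, else raise ValueError."""
        if not isinstance(tag, str) or not TAG_RE.match(tag):
            raise ValueError(f"invalid archive tag: {tag!r}")
        return tag

    def add_tags(archive, tags):
        """Add validated tags to Archive.tags (Optional[set[str]])."""
        if archive.tags is None:
            archive.tags = set()
        archive.tags.update(validate_tag(t) for t in tags)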
borg 1.x encouraged users to put everything into the archive name:
- name of the dataset
- timestamp (usually used to make the archive name unique)
- maybe also hostname (when backing up to same repo from multiple hosts)
- maybe also username (when backing up to same repo from multiple users)
borg2 now discourages users from putting the timestamp into the name,
because we rather want the same name within a series of archives - thus,
the field width for the name can be narrower.
the ID of the archive is now the only unique identifier, thus it is
moved to the leftmost place.
256 bits (64 hex digits) was a bit much, and as borg can also deal with
abbreviated IDs, we only show 32 bits (8 hex digits) by default.
the ID is followed by the timestamp (also quite "interesting", because
it usually differs for different archives).
then follow archive name, user name and host name - these might always
be the same if there is only one series of archives in a repo.
use 2 blanks to separate the fields for better readability.
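
As a rough sketch, the resulting line could be built like this; the
attribute names are assumptions, only the field order, the abbreviated
8-hex-digit ID and the 2-blank separator come from the description above:

    def format_archive_line(archive):
        fields = [
            archive.id.hex()[:8],                      # abbreviated archive ID, leftmost
            archive.ts.isoformat(timespec="seconds"),  # timestamp
            archive.name,                              # series name, not unique
            archive.username,
            archive.hostname,
        ]
        return "  ".join(fields)  # 2 blanks between fields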
Needed to change this because listing just the
archive names is pretty useless if names are not
unique.
The short list is likely mostly used by scripts to
iterate over all archives, so outputting IDs is
better.
Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.
We can also detect that we are finished if .list()
returns fewer results than the limit we gave it.
Also: reduce code duplication by using repo_lister func.
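
A minimal sketch of that termination logic and the repo_lister idea,
assuming .list(limit=..., marker=...) semantics; the parameter names and
the marker handling are assumptions:

    def repo_lister(repository, *, limit=10000):
        """Yield all results of repository.list(), batch by batch.

        Stops as soon as a batch is smaller than limit, so no extra
        (empty) call is needed just to notice that we are done.
        """
        marker = None
        while True:
            result = repository.list(limit=limit, marker=marker)
            if not result:
                break
            yield from result
            if len(result) < limit:
                break  # short batch: this was the last one
            marker = result[-1]  # assumption: the real marker may be the last entry's id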
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.
When the chunks index is needed, we first try to get it from cache/chunks.
If that fails, we fall back to building the chunks index via repository.list()
(which can be rather slow) and immediately cache the resulting ChunkIndex in
the repo.
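
A minimal sketch of that get-or-build behaviour, reusing the repo_lister
sketch from above; the ChunkIndex (de)serialization calls and the
per-entry fields are assumptions:

    def get_chunk_index(repository):
        # try the cached index in the repo first
        try:
            data = repository.store_load("cache/chunks")
            return ChunkIndex.deserialize(data)  # assumption: (de)serialization API
        except Exception:
            pass
        # fall back: build the index from a full repository listing (can be rather slow) ...
        chunk_index = ChunkIndex()
        for id, stored_size in repo_lister(repository):  # assumption: list() yields (id, size)
            chunk_index.add(id, stored_size)             # assumption: how entries are added
        # ... and immediately cache the result in the repo for the next time it is needed
        repository.store_save("cache/chunks", chunk_index.serialize())
        return chunk_index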
borg check --repair currently just deletes the chunks cache, because the
repair might have deleted some invalid chunks from the repo (which would
make the cached index stale).
cache.close now saves the chunks index to cache/chunks in the repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.
cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
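
A minimal sketch of that validation; the hash algorithm and the idea that
cache/chunks_hash stores the raw digest are assumptions:

    import hashlib

    def local_chunks_copy_is_valid(repository, local_chunks_data):
        """Check a locally cached copy of cache/chunks against cache/chunks_hash."""
        expected = repository.store_load("cache/chunks_hash")
        return hashlib.sha256(local_chunks_data).digest() == expected  # assumption: sha256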
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.
so, even if we back up an inconsistent file that was
changed while we backed it up, we would not have a
files cache entry for it and would fully
read/chunk/hash it again in the next backup.
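
A minimal sketch of that discarding step; the files cache layout and the
entry's timestamp fields are assumptions:

    def discard_recent_entries(files_cache, backup_start_ns):
        """Drop entries for files whose timestamps are at/after backup start.

        Such files may have been backed up in an inconsistent state, so they
        must be fully read/chunked/hashed again on the next run.
        """
        stale = [path for path, entry in files_cache.items()
                 if max(entry.mtime_ns, entry.ctime_ns) >= backup_start_ns]
        for path in stale:
            del files_cache[path]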
if we detect the conditions for this (rare) race,
we abort reading the file and retry.
The caller (_process_any) will retry up to MAX_RETRIES
times before giving up. If it gives up, a warning is logged
and the file is not written to the archive and won't
be memorized in the files cache either.
Thus, the file will be read/chunked/hashed again at
the next borg create run.
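
A minimal sketch of the retry behaviour described above; only MAX_RETRIES
and the abort-and-retry idea come from the text, the helper and exception
names are made up for illustration:

    import logging

    logger = logging.getLogger(__name__)

    MAX_RETRIES = 10  # assumption: the actual value in borg may differ

    class FileChangedWhileReading(Exception):
        """Raised when we detect that the file changed while we were reading it."""

    def process_file_with_retries(path, read_and_chunk):
        """Read/chunk/hash a file, retrying up to MAX_RETRIES times on the race."""
        for _attempt in range(MAX_RETRIES):
            try:
                return read_and_chunk(path)  # expected to raise on the detected race
            except FileChangedWhileReading:
                continue  # rare race: file changed underneath us, abort and retry
        # gave up: the file is neither written to the archive nor memorized in the
        # files cache, so the next borg create run will read/chunk/hash it again.
        logger.warning("%s: file changed while we backed it up, not archived.", path)
        return None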