Read or modify this set, but only add validated str values to it:
Archive.tags: Optional[set[str]]
borg info [--json] <archive> displays a list of comma-separated archive tags (currently always empty).
borg 1.x encouraged users to put everything into the archive name:
- name of the dataset
- timestamp (usually used to make the archive name unique)
- maybe also hostname (when backing up to same repo from multiple hosts)
- maybe also username (when backing up to same repo from multiple users)
borg2 now discourages users from putting the timestamp into the name,
because we would rather have the same name within a series of archives. thus,
the field width for the name can be narrower.
the ID of the archive is now the only unique identifier, thus it is
moved to the leftmost place.
256 bits (64 hex digits) was a bit much, and as borg can also deal with
abbreviated IDs, we only show 32 bits (8 hex digits) by default.
the ID is followed by the timestamp (also quite "interesting", because
it usually differs for different archives).
then following are: archive name, user name, host name - these might be
always the same if there is only one series of archives in a repo.
use 2 blanks separating the fields for better readability.
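
A minimal sketch of the field order described above (abbreviated ID, timestamp, archive name, user name, host name, 2 blanks between fields). The function name, its parameters and the timestamp format are assumptions for illustration, not borg's actual formatting code:

```python
from datetime import datetime


def format_archive_line(archive_id_hex, ts, name, username, hostname, id_digits=8):
    """Render one listing line with the unique ID in the leftmost place."""
    fields = [
        archive_id_hex[:id_digits],        # abbreviated ID (8 hex digits = 32 bits)
        ts.strftime("%Y-%m-%d %H:%M:%S"),  # timestamp format is an assumption
        name,
        username,
        hostname,
    ]
    return "  ".join(fields)  # 2 blanks between fields
```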
Needed to change this because listing just the
archive names is pretty useless if names are not
unique.
The short list is likely mostly used by scripts to
iterate over all archives, so outputting IDs is
better.
Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.
We can also detect that we are finished if .list()
returns fewer items than the limit we gave it.
Also: reduce code duplication by using repo_lister func.
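
The termination rule can be sketched as follows. This is not the actual repo_lister implementation; list_all, list_page, marker and make_fake_list are hypothetical names, and the paginated call is modelled as "return up to limit items after marker":

```python
def list_all(list_page, limit):
    """Yield every item, stopping as soon as a page comes back short.

    list_page(limit, marker) is a hypothetical callable that returns up to
    `limit` items following `marker` (or from the start if marker is None).
    """
    marker = None
    while True:
        page = list_page(limit, marker)
        yield from page
        if len(page) < limit:
            # A short (or empty) page means nothing is left to fetch, so no
            # extra call is needed just to see an empty result.
            break
        marker = page[-1]


def make_fake_list(items):
    """In-memory stand-in for a paginated .list(); records each call."""
    calls = []

    def list_page(limit, marker):
        calls.append(marker)
        start = 0 if marker is None else items.index(marker) + 1
        return items[start:start + limit]

    return list_page, calls
```

With 5 items and a limit of 2, this needs 3 calls; waiting for an empty result would need 4. Only when the item count is an exact multiple of the limit is the final empty call still required.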
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.
When the chunks index is needed, borg first tries to get it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list(),
which can be rather slow, and immediately caches the resulting ChunkIndex in the
repo.
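
A rough sketch of this "try the cache, else build and cache" fallback. The store is modelled as a plain dict and build_from_listing is a hypothetical callable for the slow path; using KeyError to signal a missing cache is also an assumption:

```python
def get_chunk_index(store, build_from_listing, key="cache/chunks"):
    """Return the cached index if present, else build it and cache it.

    `store` is any dict-like object standing in for the repository store;
    `build_from_listing` stands in for the slow repository.list() path.
    """
    try:
        return store[key]
    except KeyError:
        index = build_from_listing()
        store[key] = index  # cache immediately so the next caller is fast
        return index
```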
borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.
cache.close now saves the chunks index to cache/chunks in repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.
cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
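
One way such a validation could look, as a sketch only. The hash function (sha256 here) and both function names are assumptions; the text above does not say how cache/chunks_hash is computed:

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash of the serialized chunks index (sha256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def local_copy_is_valid(local_data: bytes, repo_hash: str) -> bool:
    """A local copy is usable only if its hash matches the one in the repo."""
    return content_hash(local_data) == repo_hash
```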
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.
so, even if we backed up an inconsistent file
that was changed while we were backing it up, we would
not have a files cache entry for it and would fully
read/chunk/hash it again in the next backup.
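
The discard rule can be sketched like this. The dict layout, the field names ctime_ns / mtime_ns and the function name are assumptions for illustration, not the real files cache format:

```python
def discard_newer_entries(files_cache, backup_start_ns):
    """Keep only entries whose timestamps are not after the backup start.

    files_cache maps path -> {"ctime_ns": int, "mtime_ns": int} (hypothetical
    layout). An entry is dropped if either timestamp is after backup_start_ns.
    """
    return {
        path: entry
        for path, entry in files_cache.items()
        if max(entry["ctime_ns"], entry["mtime_ns"]) <= backup_start_ns
    }
```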
if we detect the conditions for this (rare) race,
abort reading the file and retry.
The caller (_process_any) will try up to MAX_RETRIES
times before giving up. If it gives up, a warning is logged,
the file is not written to the archive, and it won't
be memorized in the files cache either.
Thus, the file will be read/chunked/hashed again at
the next borg create run.
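
A simplified sketch of the retry behaviour, not the actual _process_any code. The exception class, the helper names and the value of MAX_RETRIES are assumptions, and "up to MAX_RETRIES" is interpreted here as the total number of attempts:

```python
MAX_RETRIES = 3  # hypothetical value, for illustration only


class FileChangedWhileReading(Exception):
    """Hypothetical signal that the race condition was detected."""


def process_with_retries(read_file, path, log_warning):
    """Try reading `path` up to MAX_RETRIES times; return None on give-up."""
    for _attempt in range(MAX_RETRIES):
        try:
            return read_file(path)
        except FileChangedWhileReading:
            continue  # abort this read and retry from scratch
    # Giving up: warn, and return nothing so the caller neither archives
    # the file nor memorizes it in the files cache.
    log_warning(f"{path}: file changed while we read it, giving up")
    return None
```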
- on explicit request, update .last_refresh_dt inside _create_lock / _delete_lock
- reset .last_refresh_dt if we kill our own lock
- be more precise, have exactly the datetime of the lock in .last_refresh_dt
- cosmetic: do refresh/stale time comparisons always in the same way
- changes to locally stored files cache:
  - store as files.<H(archive_name)>
  - user can manually control suffix via env var
  - if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
  archive's metadata from the repo (better than starting with
  empty files cache and needing to read/chunk/hash all files).
  previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
  repo from same machine/user).
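
A sketch of how a per-series files cache name could be derived. The hash H is not specified above, so sha256 is an assumption; the env var name EXAMPLE_FILES_CACHE_SUFFIX is purely hypothetical and not the name borg uses:

```python
import hashlib
import os


def files_cache_name(archive_name, env=os.environ):
    """Return 'files.<suffix>', one cache file per archive series."""
    # Hypothetical env var that lets the user control the suffix manually.
    suffix = env.get("EXAMPLE_FILES_CACHE_SUFFIX")
    if suffix is None:
        # H(archive_name): sha256 is an assumption for this sketch.
        suffix = hashlib.sha256(archive_name.encode("utf-8")).hexdigest()
    return "files." + suffix
```

Because the name depends only on the archive name, two backups of different series running at the same time use different cache files.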
Cache entries now have ctime AND mtime.
Note: TTL and age are still needed for discarding removed files.
But due to the separate files caches per series, the TTL
was lowered to 2 (from 20).
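
How TTL and age can interact to discard removed files, as a rough sketch. The exact bookkeeping in borg may differ; the function name and the dict layout (path -> age) are assumptions:

```python
FILES_CACHE_TTL = 2  # value taken from the note above


def age_and_prune(entries, seen_paths, ttl=FILES_CACHE_TTL):
    """Return the entries to keep after one backup run.

    entries maps path -> age (hypothetical layout). Files seen in this run
    get age 0, all others get older; entries reaching the TTL are dropped.
    """
    kept = {}
    for path, age in entries.items():
        new_age = 0 if path in seen_paths else age + 1
        if new_age < ttl:
            kept[path] = new_age
    return kept
```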
repository.list is slow, so rather use the chunkindex,
which might be cached in the future. currently, it also uses
repository.list, but at least we can then solve the problem
in one place.
under all circumstances, we must prevent the lock from
going stale because it was not refreshed in time.
there is some internal rate limiting in _lock_refresh,
so calling it often should be no problem.
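
Why frequent calls are cheap can be illustrated with a small rate-limited refresh. This is a sketch and not the actual _lock_refresh code; the class, its attributes and the injectable clock are assumptions:

```python
import time


class RefreshingLock:
    """Toy lock that only really refreshes after min_interval has passed."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_refresh = clock()  # creation counts as a refresh
        self.refresh_count = 0

    def refresh(self):
        """Cheap to call often: returns False if skipped by the rate limit."""
        now = self.clock()
        if now - self.last_refresh < self.min_interval:
            return False
        self.last_refresh = now
        self.refresh_count += 1  # stands in for the real (expensive) refresh
        return True
```

Callers can invoke refresh() inside long-running loops without worrying about cost, since most calls return early.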