Because it ended the loop only when .list() returned an
empty result, this always needed one call more than
necessary.
We can also detect that we are finished if .list()
returns fewer items than the limit we gave it.
Also: reduce code duplication by using repo_lister func.
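
A minimal sketch of the early-exit condition described above. The names list_all and fake_list, and the (limit, marker) calling convention, are assumptions made for illustration; this is not the actual repo_lister code.

```python
def list_all(list_page, limit=1000):
    """Collect all items from a paginated list_page(limit=..., marker=...) callable."""
    items = []
    marker = None
    while True:
        page = list_page(limit=limit, marker=marker)
        items.extend(page)
        # A short (or empty) page means nothing is left to fetch, so we do
        # not need one more call just to see an empty result.
        if len(page) < limit:
            break
        marker = page[-1]
    return items


# Hypothetical in-memory stand-in for a repository holding 5 objects.
calls = []


def fake_list(limit, marker):
    calls.append(marker)
    data = [1, 2, 3, 4, 5]
    start = 0 if marker is None else data.index(marker) + 1
    return data[start:start + limit]


result = list_all(fake_list, limit=2)
```

With 5 objects and a limit of 2, the loop stops after the third call because that page is short. If the object count is an exact multiple of the limit, one final call returning an empty page is still needed.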
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.
When the chunks index is needed, borg first tries to load it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list()
(which can be rather slow) and immediately caches the resulting ChunkIndex in
the repo.
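
The lookup order above can be sketched as follows. FakeRepo, its dict-based store and get_chunk_index are hypothetical stand-ins, and a plain set takes the place of ChunkIndex; the real code will differ.

```python
class FakeRepo:
    """Hypothetical stand-in: a key/value store plus a slow list() call."""

    def __init__(self, chunk_ids):
        self.chunk_ids = set(chunk_ids)
        self.store = {}
        self.list_calls = 0

    def list(self):
        self.list_calls += 1  # count how often the slow path is taken
        return sorted(self.chunk_ids)


def get_chunk_index(repo):
    """Return the chunk index, preferring the copy cached in the repo."""
    cached = repo.store.get("cache/chunks")
    if cached is not None:
        return set(cached)
    # Cache miss: build the index the slow way and cache it immediately.
    index = set(repo.list())
    repo.store["cache/chunks"] = frozenset(index)
    return index


repo = FakeRepo(["c1", "c2", "c3"])
first = get_chunk_index(repo)   # slow path, fills the cache
second = get_chunk_index(repo)  # served from cache/chunks
```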
borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.
cache.close now saves the chunks index to cache/chunks in repo if it
was modified.
thus, borg create will update the cached chunks index with new chunks.
cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.
so, even if we back up an inconsistent file that was
changed while we backed it up, we will not have a files
cache entry for it and will fully read/chunk/hash it
again in the next backup.
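
A small sketch of that filtering step. The entry layout (a dict with ctime_ns and mtime_ns) and the function name are assumptions for illustration, as is the choice to compare the newer of the two timestamps against the backup start time.

```python
def discard_recent_entries(files_cache, backup_start_ns):
    """Drop entries whose newest timestamp lies after the backup start."""
    kept = {}
    for path, entry in files_cache.items():
        newest = max(entry["ctime_ns"], entry["mtime_ns"])
        if newest > backup_start_ns:
            # File changed after we started: do not trust this entry.
            continue
        kept[path] = entry
    return kept


cache = {
    "old.txt": {"ctime_ns": 100, "mtime_ns": 90},
    "racy.txt": {"ctime_ns": 100, "mtime_ns": 250},
}
kept = discard_recent_entries(cache, backup_start_ns=200)
```

Here only old.txt keeps its entry; racy.txt will be read, chunked and hashed again next time.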
if we detect the conditions for this (rare) race,
abort reading the file and retry.
The caller (_process_any) will retry up to MAX_RETRIES
times before giving up. If it gives up, a warning is logged
and the file is not written to the archive and won't
be memorized in the files cache either.
Thus, the file will be read/chunked/hashed again at
the next borg create run.
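
The retry behaviour can be sketched roughly like this. The exception class, the helper names and the value of MAX_RETRIES are assumptions, and whether the first attempt counts toward the limit is also a guess; this is not the actual _process_any code.

```python
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 3  # assumed value, the real constant may differ


class FileChangedWhileReading(Exception):
    """Hypothetical signal that the race condition was detected."""


def process_with_retries(read_file, path, max_retries=MAX_RETRIES):
    """Try once, then retry up to max_retries times; return None on give-up."""
    for attempt in range(1 + max_retries):
        try:
            return read_file(path)
        except FileChangedWhileReading:
            logger.debug("%s: changed while reading (attempt %d)", path, attempt + 1)
    # Giving up: warn and skip the file, so nothing gets archived or cached.
    logger.warning("%s: file kept changing, skipping it", path)
    return None


attempts = []


def flaky_read(path):
    attempts.append(path)
    if len(attempts) < 3:
        raise FileChangedWhileReading(path)
    return b"stable content"


data = process_with_retries(flaky_read, "some/file")
```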
- on explicit request, update .last_refresh_dt inside _create_lock / _delete_lock
- reset .last_refresh_dt if we kill our own lock
- be more precise, have exactly the datetime of the lock in .last_refresh_dt
- cosmetic: do refresh/stale time comparisons always in the same way
- changes to locally stored files cache:
  - store as files.<H(archive_name)>
  - user can manually control suffix via env var
  - if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
  archive's metadata from the repo (better than starting with
  empty files cache and needing to read/chunk/hash all files).
  previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
  repo from same machine/user).
Cache entries now have ctime AND mtime.
Note: TTL and age still needed for discarding removed files.
But due to the separate files caches per series, the TTL
was lowered to 2 (from 20).
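
To illustrate the per-series naming of the local files cache, here is a sketch under stated assumptions: sha256 stands in for the unspecified hash H, and EXAMPLE_FILES_CACHE_SUFFIX is a made-up name for the env var that overrides the suffix.

```python
import hashlib
import os

# Hypothetical env var name, used only for this illustration.
SUFFIX_ENV_VAR = "EXAMPLE_FILES_CACHE_SUFFIX"


def files_cache_name(archive_name, environ=None):
    """Return the files cache file name for one archive series."""
    environ = os.environ if environ is None else environ
    suffix = environ.get(SUFFIX_ENV_VAR)
    if not suffix:
        # Default: derive the suffix from the archive (series) name.
        # sha256 is a stand-in; the hash actually used is not stated here.
        suffix = hashlib.sha256(archive_name.encode("utf-8")).hexdigest()[:16]
    return "files." + suffix


docs_cache = files_cache_name("docs", environ={})
photos_cache = files_cache_name("photos", environ={})
manual_cache = files_cache_name("docs", environ={SUFFIX_ENV_VAR: "mine"})
```

Because each series gets its own cache file, backups of different series do not evict each other's entries, which is what allows the lower TTL.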
repository.list is slow, so use the chunk index instead,
which might be cached in the future. currently, building it
also uses repository.list, but at least we can then solve
the problem in one place.
under all circumstances, we must prevent the lock from
going stale because it was not refreshed in time.
there is some internal rate limiting in _lock_refresh,
so calling it often should be no problem.
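
Why frequent calls are cheap can be seen in a sketch of such rate limiting. RefreshingLock, min_interval and the injectable clock are hypothetical; the real _lock_refresh is not reproduced here.

```python
import time


class RefreshingLock:
    """Hypothetical lock whose refresh() is rate limited internally."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_refresh = clock()
        self.refresh_count = 0

    def refresh(self):
        now = self.clock()
        if now - self.last_refresh < self.min_interval:
            return False  # refreshed recently enough, nothing to do
        self.last_refresh = now
        self.refresh_count += 1  # stands in for the real (costly) refresh
        return True


# Simulate calling refresh() once per second for three minutes.
fake_now = [0.0]
lock = RefreshingLock(min_interval=60.0, clock=lambda: fake_now[0])
for second in range(1, 181):
    fake_now[0] = float(second)
    lock.refresh()
```

Out of 180 calls, only three do actual work (at 60, 120 and 180 seconds).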
in borg 1.x, we used to put a timestamp into the archive name to make
it unique, because borg1 required that.
borg2 does not require unique archive names; it even encourages you
to use an identical archive name within the same SERIES of archives.
that makes matching (e.g. for prune, but also at other places) much
simpler and borg KNOWS which archives belong to the same series.
for the archives directory, we only need to know the archive IDs,
everything else can be fetched from the ArchiveItem in the repo.
so we store empty files into archives/* with the archive ID as name.
this makes some "by-id" operations much easier and we don't have to
deal with a useless "store_key" anymore.
removed .delete method - we can't delete by name anymore as we
allow duplicate names for the series feature. everything uses
delete_by_id() now.
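
A sketch of an archives directory keyed only by ID. The class, its dict-backed store and the add/ids methods are hypothetical; only the archives/ prefix and the delete_by_id name come from the text above.

```python
class ArchivesDirectory:
    """Hypothetical archives directory: empty entries named by archive ID."""

    PREFIX = "archives/"

    def __init__(self, store):
        self.store = store  # any mutable mapping of key -> bytes

    def add(self, archive_id):
        # The entry carries no data; name etc. live in the ArchiveItem.
        self.store[self.PREFIX + archive_id] = b""

    def ids(self):
        n = len(self.PREFIX)
        return sorted(k[n:] for k in self.store if k.startswith(self.PREFIX))

    def delete_by_id(self, archive_id):
        del self.store[self.PREFIX + archive_id]


directory = ArchivesDirectory({})
directory.add("a1b2c3")
directory.add("77aa00")
directory.delete_by_id("a1b2c3")
remaining = directory.ids()
```

Since entries are keyed by ID, two archives with the same name never collide in the directory.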
also: simplify, clean up, refactor
- we should always output name and id when talking about an archive
- no problem anymore if names in archives directory are "duplicate"
- use "by-id" archives directory entry delete function
- rewrite/simplify test for borg check --undelete-archives
so if one works with backup series, one can just do:
    borg prune --keep-daily 30 seriesname
seriesname is then matched exactly against the archive names,
which selects that series.
aid:<archive-id-prefix> can be used for -a / --match-archives
to match on the archive id (prefix) instead of the name.
NAME positional argument now also supports matching (and aid:),
but requires that there is exactly ONE result.
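
The matching rules above can be sketched as follows. ArchiveInfo, match_archives and get_one are hypothetical names, and only exact name matching and the aid: prefix form are covered; other pattern styles are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveInfo:
    name: str
    id: str  # hex archive ID


def match_archives(archives, pattern):
    """Match by ID prefix for 'aid:<prefix>', else by exact name."""
    if pattern.startswith("aid:"):
        prefix = pattern[len("aid:"):]
        return [a for a in archives if a.id.startswith(prefix)]
    return [a for a in archives if a.name == pattern]


def get_one(archives, pattern):
    """Like the NAME positional argument: require exactly ONE result."""
    matches = match_archives(archives, pattern)
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} archives, need exactly 1")
    return matches[0]


archives = [
    ArchiveInfo("docs", "a1b2c3"),
    ArchiveInfo("docs", "a1ffee"),
    ArchiveInfo("photos", "77aa00"),
]
series = match_archives(archives, "docs")  # whole series, 2 archives
single = get_one(archives, "aid:a1b2")     # unique ID prefix
```

Here get_one(archives, "docs") would raise, because the series name matches two archives.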