reuse_chunk is the complement of add_chunk for already existing chunks.
It doesn't do refcounting anymore.
.seen_chunk does not return the refcount anymore, but just whether the chunk exists.
When a new chunk is added, its refcount is immediately set to MAX_VALUE, so
there is no difference anymore between previously existing chunks and newly
added chunks. This makes the stats even less useful, but reduces complexity.
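
A minimal sketch of the semantics described above, not borg's actual code.
SketchChunkIndex, its dict layout and the concrete MAX_VALUE are hypothetical;
only the method names seen_chunk, add_chunk and reuse_chunk come from the text.

```python
# Hypothetical, simplified model of the chunks index; not borg's real classes.
MAX_VALUE = 2**32 - 1  # assumed stand-in for the "infinite" refcount marker


class SketchChunkIndex:
    def __init__(self):
        self.chunks = {}  # chunk id -> (refcount, size)

    def seen_chunk(self, chunk_id):
        # Reports only whether the chunk exists, never a refcount.
        return chunk_id in self.chunks

    def reuse_chunk(self, chunk_id, size):
        # Complement of add_chunk for existing chunks: no refcounting at all.
        return chunk_id, size

    def add_chunk(self, chunk_id, data):
        if self.seen_chunk(chunk_id):
            return self.reuse_chunk(chunk_id, len(data))
        # A new chunk is pinned at MAX_VALUE right away, so afterwards it is
        # indistinguishable from a chunk that was already in the repository.
        self.chunks[chunk_id] = (MAX_VALUE, len(data))
        return chunk_id, len(data)
```

Because both paths return the same kind of result, callers do not need to
know whether a chunk was new or reused.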
.init_chunks has just built self.chunks using repository.list(), so don't
call that again, but just iterate over self.chunks.
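
As an illustration only: the sketch below assumes a hypothetical FakeRepository
whose list() returns chunk IDs (the real repository.list() may return something
different) and shows later code iterating over self.chunks instead of listing
the repository a second time.

```python
MAX_VALUE = 2**32 - 1  # assumed stand-in for the "infinite" refcount marker


class FakeRepository:
    """Hypothetical stand-in that counts how often list() is called."""

    def __init__(self, ids):
        self._ids = list(ids)
        self.list_calls = 0

    def list(self):
        self.list_calls += 1
        return list(self._ids)


class SketchListingCache:
    def __init__(self, repository):
        self.repository = repository
        self.chunks = {}

    def init_chunks(self):
        # The only place where the repository is listed.
        for chunk_id in self.repository.list():
            self.chunks[chunk_id] = MAX_VALUE

    def known_ids(self):
        # Later consumers iterate over self.chunks, not repository.list().
        return sorted(self.chunks)
```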
Also some other changes that make the code much simpler.
When the AdhocCache(WithFiles) queries chunk IDs from the repo to build the chunks
index, it does not know their refcounts. Thus all chunks in the index have their
refcount set to MAX_VALUE (representing "infinite"), which never decreases, so it
can never reach zero and get the chunk deleted from the repo.
Only completely new chunks first written in the current borg run have a valid
refcount.
In some exception handlers, borg tried to clean up chunks that won't be used
by an item by decref'ing them. That is either:
- pointless due to refcount being at MAX_VALUE
- inefficient, because the user might retry the backup and would need to
transmit these chunks to the repo again.
We now rely solely on borg compact to clean up any unused/orphaned chunks.
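
A rough sketch of what such an exception handler looks like without decref.
process_item, the dict used as a repository and the simulated OSError are all
hypothetical; the point is only that already stored chunks are left alone.

```python
def process_item(repo, item_chunks, fail_at=None):
    """Store the chunks of one item; return their IDs, or None on failure.

    repo is a plain dict standing in for the repository (an assumption).
    fail_at simulates a read error before the chunk with that index.
    """
    stored = []
    try:
        for n, (chunk_id, data) in enumerate(item_chunks):
            if fail_at is not None and n == fail_at:
                raise OSError("simulated read error")
            repo.setdefault(chunk_id, data)
            stored.append(chunk_id)
    except OSError:
        # No decref and no delete here: chunks stored so far stay in the
        # repo. If they end up unused, a later compact removes them; if
        # the backup is retried, they do not need to be transmitted again.
        return None
    return stored
```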
borg1 needed this due to its transactional / rollback behaviour:
if there was uncommitted data in the repo, the next repo opening automatically
rolled back to the last commit. Thus we needed checkpoint archives to reference
chunks and commit the repo.
borg2 does not do that anymore; unused chunks are only removed when the
user invokes borg compact.
Thus, if a borg create gets interrupted, the user can just run borg create
again and it will find that some chunks are already in the repo, so it makes
progress even if borg create gets interrupted frequently.
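
To illustrate why repeated interrupted runs still make progress, here is a toy
model. sketch_create and its max_uploads parameter (which simulates an
interruption) are hypothetical and the repo is just a dict.

```python
def sketch_create(repo, source_chunks, max_uploads=None):
    """Upload chunks that are missing from repo.

    Returns (uploaded, finished). max_uploads simulates an interruption
    after that many transmissions; None means the run is not interrupted.
    """
    uploaded = 0
    for chunk_id, data in source_chunks:
        if chunk_id in repo:
            continue  # already stored by an earlier (interrupted) run
        if max_uploads is not None and uploaded >= max_uploads:
            return uploaded, False  # simulated interruption
        repo[chunk_id] = data
        uploaded += 1
    return uploaded, True
```

With five source chunks and an interruption after every two uploads, the third
run only has one chunk left to transmit and completes.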
This was an implementation-specific "in on-disk order" list method that only made
sense with the log-like segment files of borg 1.x.
But we now store objects separately, so there is no "in on-disk order" anymore.
This was used for an implementation detail of the borg 1.x
repository code, dumping uncommitted objects. Not needed any more.
Also remove the local repository method scan_low_level; it was only used by --ghost.
Tests were a bit tricky as there is validation on 2 layers now:
- repository3 does an xxh64 check, which already finds most corruptions
- on the archives level, borg also does an even stronger cryptographic check
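
A sketch of why testing this is tricky, using stdlib stand-ins only: zlib.crc32
replaces xxh64 and an HMAC-SHA256 chunk ID replaces borg's cryptographic check.
These substitutions, the KEY and all function names are assumptions made for
illustration. To reach the second layer, a test has to corrupt the payload and
also fix up the outer checksum.

```python
import hashlib
import hmac
import zlib

KEY = b"hypothetical-key"  # assumed key, for illustration only


def store(payload):
    # Archive level: chunk ID is a keyed MAC over the payload (stand-in).
    chunk_id = hmac.new(KEY, payload, hashlib.sha256).digest()
    # Repository level: cheap checksum prefix (crc32 as stand-in for xxh64).
    return chunk_id, zlib.crc32(payload).to_bytes(4, "big") + payload


def repo_check(obj):
    return zlib.crc32(obj[4:]).to_bytes(4, "big") == obj[:4]


def archive_check(chunk_id, obj):
    expected = hmac.new(KEY, obj[4:], hashlib.sha256).digest()
    return hmac.compare_digest(expected, chunk_id)


def corrupt(obj, fix_checksum):
    # Flip all bits of the first payload byte; optionally repair the
    # checksum so that only the archive-level check can notice the damage.
    payload = bytes([obj[4] ^ 0xFF]) + obj[5:]
    if fix_checksum:
        return zlib.crc32(payload).to_bytes(4, "big") + payload
    return obj[:4] + payload
```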
The dummy returns all-zero stats from that call.
The problem was that these values can't be computed from the chunks cache
anymore: there are no correct refcounts and often no size information.
Also removed hashindex.ChunkIndex.summarize (previously used by the above-mentioned
.stats() call) and .stats_against (unused) for the same reason.
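
A possible shape of such a dummy, as a sketch. SketchSummary, its field names
and SketchStatsCache are hypothetical; only the idea of a .stats() call that
returns all zeros is taken from the text.

```python
from collections import namedtuple

# Hypothetical field names; the real summary may use different ones.
SketchSummary = namedtuple(
    "SketchSummary",
    ["total_size", "unique_size", "total_unique_chunks", "total_chunks"],
)


class SketchStatsCache:
    def stats(self):
        # Without valid refcounts and sizes nothing meaningful can be
        # computed, so callers get zeros in the expected structure.
        return SketchSummary(0, 0, 0, 0)._asdict()
```

Keeping the return structure intact means existing callers keep working while
the expensive computation is gone.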
Lots of low-level code was written back then to optimize the runtime of some
functions.
We'll solve this differently by computing fewer stats, especially where they are
expensive to compute.