This removes the list of in-flight blobs from the master index and
instead keeps a list of "known" blobs in the Archiver. "known" here
means: either already processed, or included in an index. This property
is tested atomically: when the blob is not in the list of "known" blobs,
it is added to the list, and the caller is then responsible for saving
the blob.
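The atomic test-and-add described above can be sketched roughly as follows. This is a minimal illustration, not the Archiver's actual code: the names `BlobID`, `KnownBlobs` and `TestAndAdd` are hypothetical, and a plain mutex-protected map is assumed.

```go
package main

import (
	"fmt"
	"sync"
)

// BlobID is a hypothetical stand-in for the real blob identifier type.
type BlobID string

// KnownBlobs tracks blobs that were already processed or are in an index.
type KnownBlobs struct {
	mu    sync.Mutex
	known map[BlobID]struct{}
}

// NewKnownBlobs returns an empty set of known blobs.
func NewKnownBlobs() *KnownBlobs {
	return &KnownBlobs{known: make(map[BlobID]struct{})}
}

// TestAndAdd reports whether id was already known. If it was not, id is
// added while the lock is still held, so exactly one caller sees "false"
// and becomes responsible for saving the blob.
func (k *KnownBlobs) TestAndAdd(id BlobID) (alreadyKnown bool) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if _, ok := k.known[id]; ok {
		return true
	}
	k.known[id] = struct{}{}
	return false
}

func main() {
	k := NewKnownBlobs()
	var wg sync.WaitGroup
	var savesMu sync.Mutex
	saves := 0

	// Several workers encounter the same blob concurrently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !k.TestAndAdd("8114b1") {
				// Only the first caller gets here and "saves" the blob.
				savesMu.Lock()
				saves++
				savesMu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("number of saves:", saves)
}
```

Because the check and the insert happen under one lock, two workers can never both conclude that the same blob still needs to be saved.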
This adds code to the master index to allow saving duplicate blobs
within the repacker. In this mode, only the list of currently in flight
blobs is consulted, and not the index. This is correct because while
repacking, a unique list of blobs is saved again to the index.
This commit fixes a situation reported by a user where two indexes
contained information about the same pack without overlap, e.g.:
Index 3e6a32 contained:
    {
      "id": "c02e3b",
      "blobs": [
        {
          "id": "8114b1",
          "type": "data",
          "offset": 0,
          "length": 530107
        }
      ]
    }
And index 62da5f contained:
    {
      "id": "c02e3b",
      "blobs": [
        {
          "id": "e344f8",
          "type": "data",
          "offset": 1975848,
          "length": 3426468
        },
        {
          "id": "939ed9",
          "type": "data",
          "offset": 530107,
          "length": 1445741
        }
      ]
    }
This commit adds all blobs in a pack in one atomic operation so that
intermediate states such as these do not happen.
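One way to picture the atomic per-pack update is the sketch below. It is an illustration under assumptions, not the actual index code: `Index`, `PackedBlob`, `StorePack` and `Snapshot` are hypothetical names, and a single mutex is assumed to guard the index.

```go
package main

import (
	"fmt"
	"sync"
)

// PackedBlob mirrors the fields shown in the JSON index excerpts.
type PackedBlob struct {
	ID     string
	Type   string
	Offset uint
	Length uint
}

// Index is a hypothetical, heavily simplified in-memory index.
type Index struct {
	mu    sync.Mutex
	packs map[string][]PackedBlob
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{packs: make(map[string][]PackedBlob)}
}

// StorePack records every blob of one pack while holding the lock once,
// so a concurrent Snapshot sees either none or all of the pack's blobs.
func (idx *Index) StorePack(packID string, blobs []PackedBlob) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	idx.packs[packID] = append(idx.packs[packID], blobs...)
}

// Snapshot returns a copy of the entries currently stored for a pack.
func (idx *Index) Snapshot(packID string) []PackedBlob {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	return append([]PackedBlob(nil), idx.packs[packID]...)
}

func main() {
	idx := NewIndex()
	idx.StorePack("c02e3b", []PackedBlob{
		{ID: "8114b1", Type: "data", Offset: 0, Length: 530107},
		{ID: "939ed9", Type: "data", Offset: 530107, Length: 1445741},
		{ID: "e344f8", Type: "data", Offset: 1975848, Length: 3426468},
	})
	fmt.Println("blobs recorded for pack c02e3b:", len(idx.Snapshot("c02e3b")))
}
```

With per-blob inserts, an index written between two inserts could contain only part of a pack, which matches the split seen in the two indexes above.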
... by first adding a preliminary index entry and making this fail if
an index entry for the same blob already exists.
A preliminary index entry is characterized by not yet being associated
with a pack. Until now, these entries were added to the index just
like final index entries using index.Store, which silently overwrites
existing index entries.
This commit adds a new method index.StoreInProgress which refuses to
overwrite existing index entries and allows for creating preliminary
index entries only. The existing method index.Store has not been
changed and continues to silently overwrite existing index entries.
This distinction is important, as otherwise, it would be impossible to
update a preliminary index entry after the blob has been written to a
pack.
Resolves: restic#292
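The difference between the two methods can be illustrated with the following sketch. The method names `Store` and `StoreInProgress` come from the commit message, but the signatures, the `entry` type, `ErrExists` and `Lookup` are assumptions made for this example and do not reflect the real index package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrExists is a hypothetical error for an already present index entry.
var ErrExists = errors.New("index entry already exists")

// entry is preliminary as long as packID is empty.
type entry struct {
	packID string
}

// Index is a hypothetical, simplified index keyed by blob ID.
type Index struct {
	mu      sync.Mutex
	entries map[string]entry
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{entries: make(map[string]entry)}
}

// StoreInProgress creates a preliminary entry (no pack yet). It refuses
// to overwrite any existing entry, preliminary or final.
func (idx *Index) StoreInProgress(blobID string) error {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	if _, ok := idx.entries[blobID]; ok {
		return ErrExists
	}
	idx.entries[blobID] = entry{}
	return nil
}

// Store silently overwrites, which is what allows a preliminary entry
// to be replaced once the blob has been written to a pack.
func (idx *Index) Store(blobID, packID string) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	idx.entries[blobID] = entry{packID: packID}
}

// Lookup returns the pack ID stored for a blob, if an entry exists.
func (idx *Index) Lookup(blobID string) (packID string, ok bool) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	e, ok := idx.entries[blobID]
	return e.packID, ok
}

func main() {
	idx := NewIndex()
	fmt.Println("first claim: ", idx.StoreInProgress("8114b1"))
	fmt.Println("second claim:", idx.StoreInProgress("8114b1"))

	// After the blob was written to a pack, the entry is finalized.
	idx.Store("8114b1", "c02e3b")
	pack, _ := idx.Lookup("8114b1")
	fmt.Println("final pack:  ", pack)
}
```

In this sketch, the second worker that tries to claim the same blob receives an error and can skip saving it, while the first worker later finalizes the entry through `Store`.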
This changes `repository.LoadBlob()` so that a destination buffer must
be provided, which enables the fuse code to use a buffer from a
`sync.Pool`. In addition, release the buffers when the file is closed.
At the moment, the max memory usage is defined by the max file size that
is read in one go (e.g. with `cat`). It could be further optimized by
implementing an LRU caching scheme.
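The buffer handling described above might look roughly like this. It is a sketch under assumptions: `loadBlob` is a hypothetical stand-in for `repository.LoadBlob()` backed by a map, `maxBlobSize` is an invented, deliberately tiny limit, and the `file` type only hints at what the fuse code does.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxBlobSize is a hypothetical upper bound, kept tiny for the example.
const maxBlobSize = 16

// bufPool hands out reusable buffers large enough for any blob.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, maxBlobSize) },
}

// loadBlob stands in for repository.LoadBlob: the caller supplies the
// destination buffer, and the used portion of that buffer is returned.
func loadBlob(store map[string][]byte, id string, buf []byte) ([]byte, error) {
	data, ok := store[id]
	if !ok {
		return nil, errors.New("blob not found")
	}
	if len(buf) < len(data) {
		return nil, errors.New("buffer too small")
	}
	n := copy(buf, data)
	return buf[:n], nil
}

// file remembers every buffer it borrowed so that Close can return them.
type file struct {
	bufs [][]byte
}

// read loads one blob into a pooled buffer owned by the file until Close.
func (f *file) read(store map[string][]byte, id string) ([]byte, error) {
	buf := bufPool.Get().([]byte)
	f.bufs = append(f.bufs, buf)
	return loadBlob(store, id, buf)
}

// Close releases all borrowed buffers back to the pool.
func (f *file) Close() {
	for _, b := range f.bufs {
		bufPool.Put(b)
	}
	f.bufs = nil
}

func main() {
	store := map[string][]byte{"8114b1": []byte("hello")}
	f := &file{}
	data, err := f.read(store, "8114b1")
	fmt.Printf("%q %v\n", data, err)
	f.Close() // data must not be used after this point
}
```

Since buffers are only returned on `Close`, a file that is read in one go holds one buffer per blob, which is why memory usage is bounded by the largest file read this way.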
Since backend.ID is always a slice of constant length, use an array
instead of a slice. Arrays mostly behave like slices, except that an
array cannot be nil, so use `*backend.ID` instead of `backend.ID` in
places where the absence of an ID is possible (e.g. for the Subtree of a
Node, which may not be present when the node is a file node).
This change allows backend.ID to be used directly as the key for a map,
so that arbitrary data structures (e.g. a Set implemented as a
map[backend.ID]struct{}) can easily be formed.
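A short sketch of what the array type makes possible is shown below. The types `ID`, `IDSet` and `Node` are simplified stand-ins for the real ones, and the ID length is assumed to be that of a SHA-256 hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ID is a stand-in for backend.ID: a fixed-size array, not a slice.
// Arrays of comparable elements are comparable, so ID can be a map key.
type ID [sha256.Size]byte

// IDSet is a set of IDs implemented as a map with empty struct values.
type IDSet map[ID]struct{}

// Insert adds id to the set.
func (s IDSet) Insert(id ID) { s[id] = struct{}{} }

// Has reports whether id is in the set.
func (s IDSet) Has(id ID) bool {
	_, ok := s[id]
	return ok
}

// Node uses a pointer for Subtree, because an array cannot be nil and
// file nodes have no subtree.
type Node struct {
	Name    string
	Subtree *ID
}

func main() {
	id := ID(sha256.Sum256([]byte("example tree")))

	seen := IDSet{}
	seen.Insert(id)
	fmt.Println(seen.Has(id), seen.Has(ID{}))

	dir := Node{Name: "dir", Subtree: &id}
	file := Node{Name: "file"} // Subtree stays nil
	fmt.Println(dir.Subtree != nil, file.Subtree == nil)
}
```

A slice-based ID would not compile as a map key, so code had to convert IDs to strings first; with an array the ID itself serves as the key.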