- The SaveBlob method now checks for duplicates.
- Moves handling of pending blobs to MasterIndex.
-> also cleans up pending index entries when they are saved in the index
-> when using SaveBlob no need to care about index any longer
- Always check for full index and save it when storing packs.
-> removes the need of an index uploader
-> also removes the verbose "uploaded intermediate index" messages
- The Flush method now also saves the index
- Fix race condition when checking and saving full/non-finalized indexes
When looking up a blob in the master index, with several
indexes present in the master index, a significant amount of time
is spent generating errors for each failed lookup. However, these
errors are often used to check if a blob is present, but the contents
are not inspected making the overhead of the error not useful.
Instead, change Index.Lookup (and Index.LookupSize) to instead return
a boolean denoting if the blob was found instead of an error. Also change
all the calls to these functions to handle the new function signature.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 820 897 +9.39%
BenchmarkMasterIndexLookupMultipleIndex-6 12821 2001 -84.39%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 5378 492 -90.85%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 17026 1649 -90.31%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 9 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 59 19 -67.80%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 22 6 -72.73%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 72 16 -77.78%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 160 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 3200 240 -92.50%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 1232 48 -96.10%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 4272 128 -97.00%
When setting up the index used for benchmarking, use math/rand instead of
crypto/rand since the generated ids don't need to be evenly distributed,
and not be secure against guessing. As such, use a different random id
function (only available during tests) that uses math/rand instead.
When backing up several million files (>14M tested here) with few changes,
a large amount of time is spent failing to find an id in an index and creating
an error to signify this. Since this is checked using the Has method,
which doesn't use this error, this time creating the error is wasted.
Instead, directly check if the given id and type are present in the index.
This also avoids reporting all the packs containing this blob, further
reducing cpu usage.