In some rare cases files could be created which contain null IDs (all
zero) in their content list. This was caused by a race condition between
growing the `Content` slice and inserting the blob IDs into it. In some
cases the blob ID was written to the old slice, which a short time
afterwards was replaced with a larger copy, that did not yet contain the
blob ID.
As the FileSaver is asynchronously waiting for all blobs of a file to be
stored, the number of active files is higher than the number of files
from which restic is reading concurrently. Thus to not confuse users,
only display files in the status from which restic is currently reading.
After reading and chunking all data in a file, the FutureFile still has
to wait until the FutureBlobs are completed. This was done synchronously
which results in blocking the file saver and prevents the next file from
being read.
By replacing the FutureBlob with a callback, it becomes possible to
complete the FutureFile asynchronously.
Archiver.Save queries the current time multiple times. This commit
removes one of these calls as they showed up while profiling a backup of
a nearly unchanged dataset containing 3 million files.
if x { return true } return false => return x
fmt.Sprintf("%v", x) => fmt.Sprint(x) or x.String()
The fmt.Sprintf idiom is still used in the SecretString tests, where it
serves security hardening.
The backup command failed if a directory contains duplicate entries.
Downgrade the severity of this problem from fatal error to a warning.
This allows users to still create a backup.
SaveTree did not use the TreeSaver but rather managed the tree
collection and upload itself. This prevents using the parallelism
offered by the TreeSaver and duplicates all related code. Using the
TreeSaver can provide some speed-ups as all steps within the backup tree
now rely on FutureNodes. This can be especially relevant for backups
with large amounts of explicitly specified files.
The main difference between SaveTree and SaveDir is, that only the
former can save tree blobs in which nodes have a different name than the
actual file on disk. This is the result of resolving name conflicts
between multiple files with the same name. The filename that must be
used within the snapshot is now passed directly to
restic.NodeFromFileInfo. This ensures that a FutureNode already contains
the correct filename.
FutureBlob now uses a Take() method as a more memory-efficient way to
retrieve the futures result. In addition, futures are now collected
while saving the file. As only a limited number of blobs can be queued
for uploading, for a large file nearly all FutureBlobs already have
their result ready, such that the FutureBlob object just consumes
memory.
There is no real difference between the FutureTree and FutureFile
structs. However, differentiating both increases the size of the
FutureNode struct.
The FutureNode struct is now only 16 bytes large on 64bit platforms.
That way is has a very low overhead if the corresponding file/directory
was not processed yet.
There is a special case for nodes that were reused from the parent
snapshot, as a go channel seems to have 96 bytes overhead which would
result in a memory usage regression.
After the `BlobSaver` job is submitted, the buffer can be released and
reused by another `FileSaver` even before `BlobSaver.Save` returns. That
FileSaver will the change `buf.Data` leading to wrong backup statistics.
Found by `go test -race ./...`:
WARNING: DATA RACE
Write at 0x00c0000784a0 by goroutine 41:
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:176 +0x789
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Previous read at 0x00c0000784a0 by goroutine 29:
github.com/restic/restic/internal/archiver.(*BlobSaver).Save()
/home/michael/Projekte/restic/restic/internal/archiver/blob_saver.go:57 +0x1dd
github.com/restic/restic/internal/archiver.(*BlobSaver).Save-fm()
<autogenerated>:1 +0xac
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:191 +0x855
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Now with the asynchronous uploaders there's no more benefit from using
more blob savers than we have CPUs. Thus use just one blob saver for
each CPU we are allowed to use.
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
This isn't doing anything. Channels should get cleaned up by the GC when
the last reference to them disappears, just like all other data
structures. Also inlined BufferPool.Put in Buffer.Release, its only
caller.
This can be caused when the test has uploaded four blobs, then queues
two blobs for upload which are delayed. Then a seventh file can be
opened which lead to a test failure.
Before, the scanner would could files twice if they were included in the
list of backup targets twice, e.g. `restic backup foo foo/bar` would
could the file `foo/bar` twice.
This commit uses the tree structure from the archiver to run the
scanner, so both parts see the same files.