Implemented sparse file support to remove this blocker for people backing up lots of
huge sparse files (like VM images). Attic could not support this use case yet as it would
have restored all files to their fully expanded size, possibly running out of disk space if
the total expanded size would be bigger than the available space.
Please note that this is a very simple implementation of sparse file support - at backup time,
it does not do anything special (it just reads all these zero bytes, chunks, compresses and
encrypts them as usual). At restore time, it detects chunks that are completely filled with zeros
and does a seek on the output file rather than a normal data write, so it creates a hole in
a sparse file. The chunk size for these all-zero chunks is currently 10MiB, so it'll create holes
of multiples of that size (depends also a bit on fs block size, alignment, previously written data).
Special cases like sparse files starting and/or ending with a hole are supported.
Please note that it will currently always create sparse files at restore time if it detects all-zero
chunks.
Also improved:
I needed a constant for the max. chunk size, so I introduced CHUNK_MAX (see also
existing CHUNK_MIN) for the maximum chunk size (which is the same as the chunk
buffer size).
Attic still always uses 10MiB chunk buffer size now, but it could be changed now more easily.
Archive timestamps are stored as the output of datetime.isoformat().
This function omits microseconds in the string output if the
microseconds are zero (as documented and explained at
https://bugs.python.org/issue7342).
Parsing of timestamps assumes there are always microseconds present
after a decimal point. This is not always true. Handle this case where
it is not true by explicitly using '0' microseconds when not present.
This commit fixes#282
datetime.isoformat() has different output depending on whether
microseconds are zero or not. Add test cases to ensure we handle both
cases correctly in an archive.
less calls to posix_fadvise (which seem to force a write-cache sync-to-disk and
wait for that to complete) - if we call it after we synced anyway, we don't lose time.
also: fixed a bug in the os.fsync call, it needs the fileno.
note:
- we call this frequently AFTER re-filling the chunker buffer,
so even big input files have little cache impact.
- there is still some cache impact due to output files caching,
if the repository is on a locally mounted filesystem.
this safes some back-and-forth between C and Python code and also some memory
management overhead as we can always reuse the same read_buf instead of letting
Python allocate and free a up to 10MB big buffer for each buffer filling read.
we can't use os-level file descriptors all the time though, as chunkify gets also invoked
on objects like BytesIO that are not backed by a os-level file.
Note: this changeset is also a preparation for O_DIRECT support which can be
implemented a lot easier on C level.
shows original, compressed and deduped size plus path name.
output is 79 chars wide, so 80x24 terminal does not wrap/scroll.
long path names are shortened (in a rather simplistic way).
output happens when a new item is started, but not more often than 5/s
(thus, not every pathname is shown)
at the end, the output line is cleared but not scrolled, so it basically vanishes.
process_item was used only for dirs and fifo, replaced it by process_dir and process_fifo,
so the status can be generated there (as it is done for the other item types).
before this changesets, most informations about exceptions/tracebacks
on the remote side were lost. now they are transmitted and displayed,
together with the remote attic version.
don't catch "Exception" when OSError was meant (otherwise e.errno is not there anyway)
don't use bare "except:" if one can avoid (copied code fragment from similar handler)
added "nonlocal euid" - without this, euid just gets redefined in inner scope instead of assigned to outer scope
added check for euid 0 - if we run as root, we always have permissions (not just if we are file owner)
note: due to caching and OS behaviour on linux, the bug was a bit tricky to reproduce
and also the fix was a bit tricky to test.
one needs strictatime mount option to enfore traditional atime updating.
for repeated tests, always change file contents (e.g. from /dev/urandom) or attic's caching
will prevent that the file gets read ("accessed") again.
check atimes with ls -lu
i could reproduce code was broken and is fixed with this changeset. and root now doesn't touch any atimes.