os.uname is UNIX-only, sys.platform is portable.
note:
- this doesn't imply that attic will now work on windows.
- windows is untested / unsupported and there might be a lot of other issues left.
- attic's xattr module already used sys.platform, so this is better for internal consistency also.
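
a minimal sketch of a portable platform check based on sys.platform. the helper name is_platform and the backend variable are hypothetical and only illustrate the idea, they are not taken from attic's code:

```python
import sys


def is_platform(prefix):
    """Return True if sys.platform starts with the given prefix."""
    return sys.platform.startswith(prefix)


# sys.platform exists wherever Python runs, while os.uname() is only
# available on Unix-like systems and raises AttributeError elsewhere.
if is_platform('linux'):
    backend = 'linux'
elif is_platform('darwin'):
    backend = 'darwin'
elif is_platform('freebsd'):
    backend = 'freebsd'
else:
    backend = 'unsupported'
```

prefix matching is used here because some platforms historically appended a version number to the sys.platform value.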
the problem was that calc_stats() dirties cache.chunks by decrementing
the chunk reference counters (so it can compute the deduplicated size
of the archive correctly).
the fix is to create a local Cache instance inside calc_stats, so the dirty cache
instance cannot be used elsewhere.
also:
fix internal consistency of calc_stats function: always use "cache" (not "self.cache").
minor cosmetic pep8 fixes
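
a toy sketch of the underlying idea: do the reference count decrements on a private copy, so the caller's counters never get dirty. it uses a plain dict copy in place of a second Cache instance, and the function and parameter names are hypothetical, not attic's API:

```python
def calc_deduplicated_size(chunk_refs, archive_chunks):
    """Return the total size of chunks referenced only by this archive.

    chunk_refs:     dict mapping chunk id -> reference count (not modified)
    archive_chunks: iterable of (chunk id, size) pairs used by the archive
    """
    # Work on a local copy so the decrements below cannot leak to the caller.
    refs = dict(chunk_refs)
    unique_size = 0
    for chunk_id, size in archive_chunks:
        refs[chunk_id] -= 1
        if refs[chunk_id] == 0:
            # No other archive references this chunk.
            unique_size += size
    return unique_size


shared_refs = {'a': 1, 'b': 2}
dedup_size = calc_deduplicated_size(shared_refs, [('a', 10), ('b', 20)])
```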
Implemented sparse file support to remove this blocker for people backing up lots of
huge sparse files (like VM images). Until now, Attic could not support this use case, as it
would have restored all files to their fully expanded size, possibly running out of disk space
if the total expanded size was bigger than the available space.
Please note that this is a very simple implementation of sparse file support - at backup time,
it does not do anything special (it just reads all these zero bytes, chunks, compresses and
encrypts them as usual). At restore time, it detects chunks that are completely filled with zeros
and does a seek on the output file rather than a normal data write, so it creates a hole in
a sparse file. The chunk size for these all-zero chunks is currently 10MiB, so it'll create holes
of multiples of that size (depends also a bit on fs block size, alignment, previously written data).
Special cases like sparse files starting and/or ending with a hole are supported.
Please note that it will currently always create sparse files at restore time if it detects all-zero
chunks.
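
A minimal sketch of the restore-side idea, assuming the chunks arrive as bytes objects in file order. The function name write_chunks_sparse is hypothetical and this is not the actual Attic code. Whether the result really occupies fewer disk blocks depends on the filesystem.

```python
import os
import tempfile


def write_chunks_sparse(path, chunks):
    """Write chunks to path, seeking over all-zero chunks instead of writing them."""
    with open(path, 'wb') as fd:
        for chunk in chunks:
            if chunk == bytes(len(chunk)):
                # All-zero chunk: move the file position forward, leaving a hole.
                fd.seek(len(chunk), os.SEEK_CUR)
            else:
                fd.write(chunk)
        # If the file ends with a hole, no write has extended it to the final
        # position yet, so set the file size explicitly.
        fd.truncate(fd.tell())


# Usage: data, one all-zero chunk, then data again.
chunks = [b'head', bytes(4096), b'tail']
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'restored')
    write_chunks_sparse(path, chunks)
    with open(path, 'rb') as fd:
        restored = fd.read()
```

The final truncate call is what makes a file ending with a hole come out at the right length. A file starting with a hole needs no special handling, because the first write after the seek lands at the right offset.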
Also improved:
I needed a constant for the max. chunk size, so I introduced CHUNK_MAX (see also
existing CHUNK_MIN) for the maximum chunk size (which is the same as the chunk
buffer size).
Attic still always uses a 10MiB chunk buffer size, but this can now be changed more easily.
Archive timestamps are stored as the output of datetime.isoformat().
This function omits microseconds in the string output if the
microseconds are zero (as documented and explained at
https://bugs.python.org/issue7342).
Parsing of timestamps assumes there are always microseconds present
after a decimal point. This is not always true. Handle this case where
it is not true by explicitly using '0' microseconds when not present.
This commit fixes #282
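
A minimal sketch of tolerant parsing, assuming naive timestamps without a UTC offset. The function name parse_timestamp is used for illustration and the body is not necessarily what the commit contains.

```python
from datetime import datetime


def parse_timestamp(timestamp):
    """Parse datetime.isoformat() output with or without a microseconds part."""
    if '.' in timestamp:
        fmt = '%Y-%m-%dT%H:%M:%S.%f'
    else:
        # isoformat() omitted the fraction, so microseconds are 0.
        fmt = '%Y-%m-%dT%H:%M:%S'
    return datetime.strptime(timestamp, fmt)


with_us = datetime(2015, 5, 1, 12, 30, 45, 123456)
without_us = datetime(2015, 5, 1, 12, 30, 45)
```

Both values round-trip: parse_timestamp(with_us.isoformat()) and parse_timestamp(without_us.isoformat()) return the original datetime objects.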
datetime.isoformat() has different output depending on whether
microseconds are zero or not. Add test cases to ensure we handle both
cases correctly in an archive.
fewer calls to posix_fadvise (which seems to force a write-cache sync-to-disk and
wait for that to complete) - if we call it after we synced anyway, we don't lose time.
also: fixed a bug in the os.fsync call, it needs the fileno.
note:
- we call this frequently AFTER re-filling the chunker buffer,
so even big input files have little cache impact.
- there is still some cache impact due to output files caching,
if the repository is on a locally mounted filesystem.
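
a minimal sketch of the "sync first, then advise" ordering. the helper name sync_and_drop_cache is hypothetical, and the hasattr guard is an assumption added so the example also runs where os.posix_fadvise does not exist:

```python
import os
import tempfile


def sync_and_drop_cache(fd):
    """Flush and fsync a file object, then ask the kernel to drop its cached pages."""
    fd.flush()
    # os.fsync needs the OS-level file descriptor, not the file object.
    os.fsync(fd.fileno())
    if hasattr(os, 'posix_fadvise'):
        # The data is on disk already, so the advice has nothing left to wait for.
        # A length of 0 means "up to the end of the file".
        os.posix_fadvise(fd.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)


with tempfile.TemporaryFile() as tmp_file:
    tmp_file.write(b'x' * 1000)
    sync_and_drop_cache(tmp_file)
    tmp_file.seek(0)
    read_back = tmp_file.read()
```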
this saves some back-and-forth between C and Python code and also some memory
management overhead, as we can always reuse the same read_buf instead of letting
Python allocate and free a buffer of up to 10MB for each buffer-filling read.
we can't use OS-level file descriptors all the time though, as chunkify also gets invoked
on objects like BytesIO that are not backed by an OS-level file.
Note: this changeset is also a preparation for O_DIRECT support, which can be
implemented much more easily at the C level.
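
a Python-level analogy of the buffer reuse, not the C code from this changeset: readinto() fills one preallocated buffer on every read instead of allocating a fresh bytes object each time. the function name hash_stream and the use of sha256 as the consumer are assumptions for illustration only:

```python
import hashlib
import io


def hash_stream(fd, buf_size=1024 * 1024):
    """Hash a binary file-like object while reusing a single read buffer."""
    buf = bytearray(buf_size)   # allocated once, reused for every read
    view = memoryview(buf)      # slicing a memoryview does not copy
    digest = hashlib.sha256()
    while True:
        n = fd.readinto(view)
        if not n:
            break
        digest.update(view[:n])
    return digest.hexdigest()


# readinto() also works on objects that are not backed by an OS-level file.
data = b'attic' * 1000
stream_digest = hash_stream(io.BytesIO(data), buf_size=4096)
```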
sure it is "prettier" without, but a lot of useful information for debugging is lost if the traceback is not shown.
even for KeyboardInterrupt:
there may be a bad reason why one has to use Ctrl-C - if attic was stuck somewhere, we want to know where it was.
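
a minimal sketch of a top-level handler that keeps the traceback visible on Ctrl-C. the wrapper name run and its return codes are hypothetical and do not describe attic's actual entry point:

```python
import traceback


def run(main):
    """Call main() and report a KeyboardInterrupt with its full traceback."""
    try:
        main()
    except KeyboardInterrupt:
        # Print the traceback to stderr, so it shows where the program
        # was when it got interrupted.
        traceback.print_exc()
        return 1
    return 0
```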