9 Commits

Author SHA1 Message Date
Thomas Waldmann b6ed1c742b PR #284 - Merge branch 'sparse_files' into merge 2015-04-15 16:43:07 +02:00
Thomas Waldmann a2bf2aea22 simple sparse file support, made chunk buffer size flexible
Implemented sparse file support to remove a blocker for people backing up lots of
huge sparse files (like VM images). Until now, Attic could not support this use case, as it would
have restored all files to their fully expanded size, possibly running out of disk space if
the total expanded size exceeded the available space.

Please note that this is a very simple implementation of sparse file support: at backup time,
it does nothing special (it just reads all the zero bytes, then chunks, compresses and
encrypts them as usual). At restore time, it detects chunks that are completely filled with zeros
and seeks in the output file instead of writing the data, which creates a hole in
the sparse file. The chunk size for these all-zero chunks is currently 10MiB, so it will create holes
in multiples of that size (this also depends somewhat on fs block size, alignment and previously written data).

Special cases like sparse files starting and/or ending with a hole are supported.

Please note that it will currently always create sparse files at restore time if it detects all-zero
chunks.
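
The restore-time behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Attic's actual code: the function name `restore_sparse` and the use of a plain list of byte chunks are assumptions made for the example.

```python
import os
import tempfile


def restore_sparse(path, chunks):
    """Write chunks to path, seeking over all-zero chunks to leave holes.

    Hypothetical helper for illustration; not taken from Attic.
    """
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk == bytes(len(chunk)):
                # All-zero chunk: advance the file offset instead of writing.
                out.seek(len(chunk), os.SEEK_CUR)
            else:
                out.write(chunk)
        # If the file ends with a hole, nothing was written at the final
        # offset, so set the file length to the current position explicitly.
        out.truncate(out.tell())


if __name__ == "__main__":
    zero = bytes(4096)
    chunks = [zero, b"data" * 1024, zero]  # hole at the start and at the end
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "restored.img")
        restore_sparse(target, chunks)
        print(os.path.getsize(target))
```

Whether a skipped range really becomes a hole on disk depends on the filesystem; reading the file back yields zeros in either case.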

Also improved:
I needed a constant for the maximum chunk size, so I introduced CHUNK_MAX (see also
the existing CHUNK_MIN); it is the same as the chunk buffer size.

Attic still always uses a 10MiB chunk buffer size, but it can now be changed more easily.
2015-04-15 16:29:18 +02:00
Thomas Waldmann c7d232c4ce use posix_fadvise to avoid spoiling the OS cache
note:
- we call this frequently AFTER re-filling the chunker buffer,
  so even big input files have little cache impact.
- there is still some cache impact due to output file caching,
  if the repository is on a locally mounted filesystem.
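
The idea in the note above can be illustrated with a small sketch. It is written in Python rather than the chunker's C, and it assumes that `POSIX_FADV_DONTNEED` is the advice in question; the function name `read_without_caching` is hypothetical.

```python
import os

# Mirrors the 10MiB chunk buffer size mentioned elsewhere in this log.
BUF_SIZE = 10 * 1024 * 1024


def read_without_caching(path, buf_size=BUF_SIZE):
    """Yield blocks of a file, advising the OS to drop each block from its cache.

    Hypothetical helper for illustration; not taken from Attic.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            block = os.read(fd, buf_size)
            if not block:
                break
            if hasattr(os, "posix_fadvise"):
                # Advise the kernel right after each buffer fill that the
                # pages just read will not be needed again.
                os.posix_fadvise(fd, offset, len(block), os.POSIX_FADV_DONTNEED)
            offset += len(block)
            yield block
    finally:
        os.close(fd)
```

`os.posix_fadvise` is only available on Unix-like systems, hence the `hasattr` guard; on other platforms the sketch simply reads the file normally.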
2015-04-11 01:09:03 +02:00
Thomas Waldmann 7ad1093951 let chunker optionally work with os-level file descriptor
this saves some back-and-forth between C and Python code and also some memory
management overhead, as we can always reuse the same read_buf instead of letting
Python allocate and free a buffer of up to 10MB for each buffer-filling read.

we can't use os-level file descriptors all the time though, as chunkify is also invoked
on objects like BytesIO that are not backed by an os-level file.

Note: this changeset is also a preparation for O_DIRECT support, which can be
implemented much more easily at the C level.
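
The two read paths described above can be sketched in Python. The real change lives in the C chunker; the function name `fill_buffer` and the convention of passing a file descriptor as an `int` are assumptions made for this example.

```python
import io
import os


def fill_buffer(source, buf):
    """Read into the preallocated bytearray buf; return the number of bytes read.

    Hypothetical helper for illustration; not taken from Attic.
    """
    if isinstance(source, int):
        # OS-level file descriptor: read straight into the reused buffer,
        # so no new bytes object is allocated per read (Unix only).
        return os.readv(source, [buf])
    # File-like object without a descriptor (e.g. BytesIO).
    return source.readinto(buf)


if __name__ == "__main__":
    buf = bytearray(8)  # allocated once, reused for every read
    n = fill_buffer(io.BytesIO(b"abc"), buf)
    print(n, bytes(buf[:n]))
```

Either way the caller keeps a single buffer alive across reads, which is the allocation saving the message refers to.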
2015-04-08 18:43:53 +02:00
Jonas Borgström 9f64e39d9f Reuse chunker buffer between files. 2014-08-03 15:04:41 +02:00
Cyril Roussillon 0b4e324af2 chunker: optimized the barrel shift
move the modulo out of the barrel shift and use 32-bit variables so
that the compiler recognizes it and uses the "rol*" asm instructions.
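
The arithmetic behind this change can be illustrated as follows. This is a Python sketch of the rotation only; it cannot show the compiler effect, which applies to the C code. Both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, shift):
    """Rotate a 32-bit value left; shift must already be in the range 0..31."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def rol32_with_modulo(value, shift):
    """Same rotation, but reducing the shift count inside the rotate itself."""
    shift &= 0x1F  # equivalent to shift % 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


if __name__ == "__main__":
    v = 0x80000001
    # Reducing the shift count once, outside the rotate, gives the same result.
    for s in range(64):
        assert rol32(v, s % 32) == rol32_with_modulo(v, s)
    print(hex(rol32(v, 1)))
```

Keeping the rotate expression free of the modulo is what leaves it in the plain shift-or form that, per the message, the C compiler can recognize.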

Before: 245 MiB/s
After: 338 MiB/s

CPU: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz

Modification by Jonas: commit message formatting and added
Cyril Roussillon to AUTHORS
2014-05-13 23:05:13 +02:00
Tung Dao 6d77808bec Fix for ISO C90 compliance 2014-03-30 22:43:31 +07:00
Jonas Borgström 1e4fd4e18a PyBuffer_FromMemory should be a static function 2014-02-09 21:25:05 +01:00
Jonas Borgström b718a443a8 Project rename 2013-07-09 20:14:18 +02:00
Renamed from darc/_chunker.c