
6 Commits

Author SHA1 Message Date
Thomas Waldmann b6ed1c742b PR #284 - Merge branch 'sparse_files' into merge 2015-04-15 16:43:07 +02:00
Thomas Waldmann a2bf2aea22 simple sparse file support, made chunk buffer size flexible
Implemented sparse file support to remove a blocker for people backing up lots of
huge sparse files (like VM images). Until now, Attic could not support this use case, as it
restored all files to their fully expanded size, possibly running out of disk space if
the total expanded size exceeded the available space.

Please note that this is a very simple implementation of sparse file support. At backup time,
it does nothing special (it just reads all the zero bytes, then chunks, compresses and
encrypts them as usual). At restore time, it detects chunks that are completely filled with zeros
and seeks forward in the output file instead of writing the data, which creates a hole in
a sparse file. The chunk size for these all-zero chunks is currently 10MiB, so it creates holes
in multiples of that size (this also depends somewhat on fs block size, alignment and previously written data).

Special cases like sparse files starting and/or ending with a hole are supported.

Please note that it currently always creates sparse files at restore time when it detects all-zero
chunks.

Also improved:
I needed a constant for the maximum chunk size, so I introduced CHUNK_MAX (analogous to the
existing CHUNK_MIN). The maximum chunk size is the same as the chunk buffer size.

Attic still always uses a 10MiB chunk buffer size, but this can now be changed more easily.
2015-04-15 16:29:18 +02:00
Thomas Waldmann 7ad1093951 let chunker optionally work with os-level file descriptor
This saves some back-and-forth between C and Python code, and also some memory
management overhead, as we can always reuse the same read_buf instead of letting
Python allocate and free a buffer of up to 10MB for each buffer-filling read.

We can't always use os-level file descriptors, though, as chunkify also gets invoked
on objects like BytesIO that are not backed by an os-level file.

Note: this changeset also prepares for O_DIRECT support, which is much easier to
implement at the C level.
2015-04-08 18:43:53 +02:00
Jonas Borgström 9f64e39d9f Reuse chunker buffer between files. 2014-08-03 15:04:41 +02:00
Jonas Borgström 92c333c071 Add a method to detect out-of-date binary extension modules 2014-03-18 22:04:08 +01:00
Jonas Borgström b718a443a8 Project rename 2013-07-09 20:14:18 +02:00
Renamed from darc/chunker.pyx
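Commit a2bf2aea22 describes the restore-side sparse handling only in prose. The following is a minimal Python sketch of that idea, not Attic's actual code. The function name `restore_sparse` is hypothetical, chunks are assumed to arrive as plain `bytes` objects, and `CHUNK_MAX` is assumed to be 10MiB as stated in the message. All-zero chunks become a forward seek instead of a write, and a final `truncate()` covers the "file ends with a hole" case, because seeking past the end alone does not extend the file.

```python
import io
import os
import tempfile

CHUNK_MAX = 10 * 1024 * 1024  # assumption: 10MiB maximum chunk size
ZEROS = bytes(CHUNK_MAX)      # reusable all-zero reference buffer


def restore_sparse(path, chunks):
    """Write chunks to path, leaving a hole wherever a chunk is all zeros (sketch)."""
    with open(path, "wb") as f:
        for data in chunks:
            if ZEROS.startswith(data):
                # All-zero chunk: skip ahead instead of writing. On filesystems
                # that support sparse files, this leaves a hole.
                f.seek(len(data), io.SEEK_CUR)
            else:
                f.write(data)
        # If the last chunk was a hole, the position is past EOF; truncate()
        # at the current position extends the file to its full logical size.
        f.truncate()


# Example: a file that both starts and ends with a hole.
out = os.path.join(tempfile.mkdtemp(), "restored.img")
restore_sparse(out, [bytes(4096), b"payload", bytes(8192)])
print(os.path.getsize(out))  # 4096 + 7 + 8192 = 12295
```

Whether the skipped regions actually become unallocated holes depends on the filesystem and block alignment, matching the caveat in the commit message; on filesystems without sparse support, the extended region is typically just zero-filled.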
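Commits 7ad1093951 and 9f64e39d9f describe reusing one read buffer and reading either from an os-level file descriptor or from a file-like object such as BytesIO. A rough Python sketch of that dual path follows. The class `BufferFiller`, its `fill()` method and the `fd`/`fileobj` parameters are hypothetical names for illustration, not the chunker's real API. `os.readv` is POSIX-only.

```python
import io
import os

CHUNK_MAX = 10 * 1024 * 1024  # assumption: buffer size equals the 10MiB max chunk size


class BufferFiller:
    """Hypothetical helper: fills one preallocated buffer from an fd or a file object."""

    def __init__(self, bufsize=CHUNK_MAX):
        self.buf = bytearray(bufsize)  # allocated once, reused for every read
        self.view = memoryview(self.buf)

    def fill(self, fd=-1, fileobj=None):
        """Read until the buffer is full or EOF; return a view of the bytes read.

        The returned view aliases the shared buffer, so its contents are
        overwritten by the next fill() call; copy with bytes() if needed.
        """
        filled = 0
        while filled < len(self.buf):
            target = self.view[filled:]
            if fd >= 0:
                # OS-level path: read directly into the reused buffer (POSIX-only).
                n = os.readv(fd, [target])
            else:
                # Fallback for objects like BytesIO that have no OS-level fd.
                n = fileobj.readinto(target)
            if not n:  # 0 means EOF; None means a non-blocking source had no data
                break
            filled += n
        return self.view[:filled]


filler = BufferFiller(bufsize=8)
src = io.BytesIO(b"hello world")
print(bytes(filler.fill(fileobj=src)))  # b'hello wo'
print(bytes(filler.fill(fileobj=src)))  # b'rld'
```

Allocating the buffer once and handing out slices avoids a fresh allocation per read, which is the overhead the commit message mentions. The file-object branch keeps in-memory sources working.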