26 Commits

Author SHA1 Message Date
Thomas Waldmann 7f973a5b34
implement "fail" chunker for testing purposes
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of a fixed size of 4096 bytes (the last chunk in a file can be smaller)
- read chunks 0, 1 and 2 successfully
- error at chunk 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks

Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.

Read chunks as well as failed chunks count up by 1.
2023-02-13 17:15:45 +01:00
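The semantics of the "fail" chunker in commit 7f973a5b34 can be illustrated with a minimal Python sketch. This is not borg's implementation: the class name `FailingChunker`, its method names, and the choice to treat every position beyond the end of the map as a successful read are assumptions made for illustration only.

```python
import errno
import io


class FailingChunker:
    """Hypothetical sketch of a fixed-size chunker driven by a failure map."""

    def __init__(self, block_size, fail_map):
        self.block_size = block_size
        self.fail_map = fail_map  # e.g. "rrrEErrrr": 'E' = simulated error
        self.count = 0            # per-instance counter, never reset

    def chunkify(self, fd):
        while True:
            # Assumption: positions beyond the map behave like a successful read.
            action = self.fail_map[self.count] if self.count < len(self.fail_map) else "r"
            if action == "E":
                self.count += 1   # failed chunks count up by 1
                raise OSError(errno.EIO, "simulated I/O error")
            data = fd.read(self.block_size)
            if not data:
                return            # end of file, nothing counted
            self.count += 1       # read chunks count up by 1
            yield data


chunker = FailingChunker(4096, "rrrEErrrr")
fd = io.BytesIO(bytes(4096 * 5))
chunks, errors = [], 0
while True:
    try:
        for chunk in chunker.chunkify(fd):
            chunks.append(chunk)
        break
    except OSError:
        errors += 1               # caller retries with the same instance
```

Because the counter lives in the instance, a caller that retries after an error continues at the next map position rather than starting over.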
Paul D 253d8e8d4e Docs grammar fixes
joined split infinitives, and relocated adverbs appropriately.
2022-12-29 22:26:54 +00:00
Franco Ayala 2ed7f317d3
Adding performance statistics to borg create (#6991)
- file status A/M/E counters
- chunking time
- hashing time
- rx_bytes / tx_bytes

Note: the sleep() in the test is needed because timestamp granularity on Linux is much coarser than expected (it uses the system timer, 100Hz or 250Hz).
2022-10-19 21:40:02 +02:00
Thomas Waldmann f04b2bd255 remove coding: from cython files, utf-8 is default encoding 2022-07-05 00:08:51 +02:00
Thomas Waldmann 350393c9fd remove unused imports 2022-07-05 00:05:07 +02:00
Thomas Waldmann 2391d160a8 add all-zero detection to buzhash chunk data processing 2021-01-15 21:27:29 +01:00
Thomas Waldmann 2d76365214 cosmetic: directly set allocation instead of going via is_zero 2021-01-15 21:10:07 +01:00
Thomas Waldmann 8162e2e67b cached_hash is only used in archive, move it there 2021-01-14 20:50:12 +01:00
Thomas Waldmann be257728ca move zeros to constants module 2021-01-14 20:02:18 +01:00
Thomas Waldmann 3b9798cffc remove max_chunk_size (unused) 2021-01-14 19:56:39 +01:00
Thomas Waldmann 4e3be1db5e reuse zeros also in fixed-size chunker for all-zero chunk detection
also: zeros.startswith() is faster
2021-01-08 23:39:53 +01:00
Thomas Waldmann f3088a9893 rename chunk_to_id_data to cached_hash 2021-01-08 23:39:53 +01:00
Thomas Waldmann 9fd284ce1a refactor new zero chunk handling to be reusable 2021-01-08 23:39:53 +01:00
Thomas Waldmann 6d0f9a52eb detect all-zero chunks, avoid hashing them
comparing zeros is quicker than hashing them.
the comparison should fail quickly inside non-zero data.
2021-01-08 17:40:06 +01:00
Thomas Waldmann 8c299696aa Chunker: yield Chunk namedtuple instead of bytes/memoryview 2021-01-08 01:10:44 +01:00
Thomas Waldmann c0c0da9c76 skip sparse tests if has_seek_hole is False
also: do the os.SEEK_(HOLE|DATA) check only once
2020-12-27 22:06:08 +01:00
Thomas Waldmann b8bb0494f6 create --sparse, file map support for the "fixed" chunker, see #14
a file map can be:

- created internally inside chunkify by calling sparsemap, which uses
  SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a
  seekable sparse file.
  Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ...
  BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!

- made by some other means and given to the chunkify function.
  this is not used yet, but in future this could be used to only read
  the changed parts and seek over the (known) unchanged parts of a file.

sparsemap: the generated range sizes are multiples of the fs block size.
           the tests assume a 4 KiB fs block size.
2020-12-27 22:06:08 +01:00
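How a file map might drive a fixed-size chunker, as outlined in commit b8bb0494f6, is sketched below. The map is assumed to be a list of `(start, length, is_data)` tuples supplied by the caller; this tuple layout, the `Piece` type, and the function name `chunkify_with_map` are hypothetical. Building the map via SEEK_DATA / SEEK_HOLE is left out so that the example behaves the same on any filesystem.

```python
import io
from typing import NamedTuple, Optional


class Piece(NamedTuple):
    is_data: bool
    size: int
    data: Optional[bytes]        # None for holes: nothing is read


def chunkify_with_map(fd, block_size, fmap):
    """Yield block_size pieces for each (start, length, is_data) range."""
    for start, length, is_data in fmap:
        offset, end = start, start + length
        while offset < end:
            size = min(block_size, end - offset)
            if is_data:
                fd.seek(offset)
                data = fd.read(size)
                yield Piece(True, len(data), data)
            else:
                # Hole range: report its size without touching the file.
                yield Piece(False, size, None)
            offset += size


fd = io.BytesIO(b"a" * 8 + b"\x00" * 8 + b"b" * 4)
fmap = [(0, 8, True), (8, 8, False), (16, 4, True)]
pieces = list(chunkify_with_map(fd, 4, fmap))
```

The same loop would also serve the second use mentioned in the commit message: a map that marks known-unchanged ranges could be skipped over in the way holes are skipped here.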
Thomas Waldmann a65cefb7bb bump API_VERSIONs to 1.2_xx 2019-02-24 19:45:41 +01:00
Thomas Waldmann 80e0b42f7d add fixed blocksize chunker, fixes #1086 2019-02-13 04:24:14 +01:00
Thomas Waldmann c4ffbd2a17 prepare to support multiple chunkers 2019-02-13 04:24:14 +01:00
Marian Beermann faf2d0b537 chunker: fix invalid use of types
With the argument specified as unsigned char *, Cython emits
code in the Python wrapper to convert string-like objects to
unsigned char* (essentially PyBytes_AS_STRING).

Because the len(data) call is performed on a cdef'd string-ish type,
Cython emits a strlen() call, on the result of PyBytes_AS_STRING.

This is not correct, since embedded null bytes are entirely possible.

Incidentally, the code generated by Cython was also not correct,
since the Clang Static Analyzer found a path of execution where
passing arguments in a weird way from Python resulted in strlen(NULL).

Formulated like this, Cython emits essentially:

c_buzhash(
 PyBytes_AS_STRING(data),
 PyObject_Length(data),
 ...
)

which is correct.
2017-06-14 19:16:36 +02:00
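Why a strlen()-style length is wrong for binary data, as commit faf2d0b537 explains, can be shown from plain Python with ctypes. This is only an illustration of NUL-terminated versus explicit lengths; it does not reproduce the Cython-generated code.

```python
import ctypes

data = b"ab\x00cd"                        # binary data with an embedded null byte
buf = ctypes.create_string_buffer(data)   # copies data and appends a trailing NUL

c_string_view = buf.value                 # read as a NUL-terminated C string
explicit_view = buf.raw[:len(data)]       # read with the explicit Python length

strlen_like = len(c_string_view)          # what a strlen() call would report
real_length = len(data)                   # what PyObject_Length(data) reports
```

Here `strlen_like` is 2 while `real_length` is 5, so a hash computed over the strlen()-style length would silently ignore everything after the first null byte.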
Marian Beermann 3f8a0221ee Revert "move chunker to borg.algorithms"
This reverts commit 956b50b29c.

# Conflicts:
#	setup.py
#	src/borg/archive.py
#	src/borg/helpers.py
2017-06-07 23:51:42 +02:00
Marian Beermann 956b50b29c move chunker to borg.algorithms 2017-05-02 19:15:01 +02:00
Thomas Waldmann e431d60cc5 merge 1.0-maint into master
# Conflicts:
#	src/borg/crypto.pyx
#	src/borg/hashindex.pyx
#	src/borg/helpers.py
#	src/borg/platform/__init__.py
#	src/borg/platform/darwin.pyx
#	src/borg/platform/freebsd.pyx
#	src/borg/platform/linux.pyx
#	src/borg/remote.py
2017-01-14 03:07:11 +01:00
Thomas Waldmann 045e5a1203 Merge branch 'master' into move-to-src 2016-05-30 19:38:16 +02:00
Thomas Waldmann d1ea925a5b move borg package to src/ 2016-05-05 20:19:50 +02:00
Renamed from borg/chunker.pyx