38 Commits

Author SHA1 Message Date
Thomas Waldmann b92f4aa487
remove --consider-part-files, related stats code, update docs
we now just treat that one .borg_part file we might have inside
checkpoint archives as a normal file.

people can recognize from the file name that it is a partial file.

nobody cares about statistics of checkpoint files, and the final
archive no longer contains any partial files, thus there is
no need to maintain statistics about the count and size of part
files.
2023-02-01 13:04:18 +01:00
TW c29d4a096b
Hashindex header work, fixes #6960 (#7064)
support reading new, improved hashindex header format, fixes #6960

Bit of a pain to work with that code:
- C code
- needs to still be able to read the old hashindex file format,
- while also supporting the new file format.
- the hash computed while reading the file causes additional problems because
  it expects every byte of the file to be read exactly once and in sequential order.
  I solved this by separately opening the file in the python part of the code and
  checking for the magic.
  BORG_IDX means the legacy file format and legacy layout of the hashtable,
  BORG2IDX means the new file format and the new layout of the hashtable.

Done:
- added a version int32 directly after the magic and set it to 2 (like borg 2).
  the old header had no version info, but could be denoted as version 1 in case
  we ever need it (currently it decides based on the magic).
- added num_empty as indicated by a TODO in count_empty, so it does not need a
  full hashtable scan to determine the amount of empty buckets.
- to keep it simpler, I just filled the HashHeader struct with a
  `char reserved[1024 - 32];`
  1024 being the desired overall header size and 32 being the currently used size.
  this alignment might be useful in case we mmap() the hashindex file one day.
2022-10-02 14:35:21 +02:00
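
The magic-based format detection described in the c29d4a096b message can be sketched in Python. This is only an illustration under stated assumptions: the function name `detect_index_format` is hypothetical, the version field is assumed to be little-endian, and the code is not taken from borg itself. Only the two magic strings, the int32 version following the magic, and the 1024-byte header size come from the commit message.

```python
import io
import struct

MAGIC_LEGACY = b"BORG_IDX"  # legacy file format and hashtable layout
MAGIC_V2 = b"BORG2IDX"      # new file format and hashtable layout
HEADER_SIZE_V2 = 1024       # desired overall size of the new header


def detect_index_format(fd):
    """Return 1 for the legacy format, or the version stored in a new header.

    Hypothetical helper. The file position is restored afterwards so that a
    later sequential reader still sees every byte exactly once.
    """
    start = fd.tell()
    try:
        magic = fd.read(len(MAGIC_V2))
        if magic == MAGIC_LEGACY:
            return 1  # the old header carries no version field
        if magic == MAGIC_V2:
            raw = fd.read(4)
            if len(raw) != 4:
                raise ValueError("truncated hashindex header")
            # Assumption: the int32 version is stored little-endian.
            (version,) = struct.unpack("<i", raw)
            return version
        raise ValueError(f"unknown hashindex magic: {magic!r}")
    finally:
        fd.seek(start)


# Build a fake new-style header: magic, version 2, zero padding up to 1024 bytes.
fake_header = MAGIC_V2 + struct.pack("<i", 2)
fake_header += bytes(HEADER_SIZE_V2 - len(fake_header))
new_file = io.BytesIO(fake_header)
legacy_file = io.BytesIO(MAGIC_LEGACY + bytes(16))
```

Restoring the file position is one way to keep a checksumming reader happy; the commit instead opens the file a second time, which achieves the same separation.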
Thomas Waldmann 8c9fed105d hashindex: make NSIndex1 api compatible
some new stuff is not supported for NSIndex1,
but we can avoid crashing due to function signature mismatches or
missing methods and instead raise clearer exceptions.
2022-09-21 08:56:37 +02:00
Thomas Waldmann d003046078 hashindex.pyx: fix signedness warning 2022-08-04 10:50:38 +02:00
Thomas Waldmann f04b2bd255 remove coding: from cython files, utf-8 is default encoding 2022-07-05 00:08:51 +02:00
Thomas Waldmann 350393c9fd remove unused imports 2022-07-05 00:05:07 +02:00
Thomas Waldmann e5ea016115 repository: set/query flags, iteration over flagged items (NSIndex)
use this to query or set/clear flags in the "extra" word.

also: remove direct access to the "extra" word, adapt tests.
2022-06-14 14:48:56 +02:00
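
A flag word with mask-based query and set/clear, as mentioned in the e5ea016115 message, might look like the following sketch. All names here (`query_flags`, `update_flags`, `iter_flagged`, `FLAG_A`, `FLAG_B`) and the 32-bit width are assumptions made for illustration; this is not the NSIndex API.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: the "extra" word is 32 bits wide

FLAG_A = 0x1  # hypothetical flag bits
FLAG_B = 0x2


def query_flags(extra, mask=WORD_MASK):
    """Return only the bits of `extra` selected by `mask`."""
    return extra & mask


def update_flags(extra, mask, value):
    """Set the bits selected by `mask` to the matching bits of `value`."""
    return ((extra & ~mask) | (value & mask)) & WORD_MASK


def iter_flagged(entries, mask, value):
    """Yield keys whose masked flag bits equal the masked `value`."""
    for key, extra in entries.items():
        if (extra & mask) == (value & mask):
            yield key


entries = {b"k1": FLAG_A, b"k2": FLAG_A | FLAG_B, b"k3": 0}
```

Passing `value=0` together with a mask clears the selected bits, so one function covers both setting and clearing.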
Thomas Waldmann 3ce3fbcdff repository index: add payload size (==csize) and flags to NSIndex entries
This saves some segment file random IO that was previously necessary
just to determine the size of the data to be deleted.

Keep old one as NSIndex1 for old borg compatibility.
Choose NSIndex or NSIndex1 based on repo index layout from HashHeader.

for an old repo index, repo.get(key) returns segment, offset, None, None
2022-06-14 14:48:56 +02:00
Thomas Waldmann 2c1f7951c4 remove csize from ChunkIndexEntry 2022-06-12 17:15:13 +02:00
Thomas Waldmann b82a39c3b3 remove csize from stats_against() 2022-06-12 15:48:33 +02:00
Thomas Waldmann 0211948cac remove csize from summarize return tuple 2022-06-12 15:48:33 +02:00
Thomas Waldmann ace5957524 remove csize from item.chunks elements 2022-06-12 15:48:33 +02:00
Thomas Waldmann b9f9623a6d prepare to remove csize (set it to 0 for now) 2022-06-12 15:48:33 +02:00
Thomas Waldmann 603b58f6a1 implement more standard hashindex.setdefault behaviour
the .get() like behaviour (== returning the value) was missing.

it's still not 100% like dict.setdefault, because there is no
default value None. but None doesn't make sense here, because we
usually need a N-tuple matching the hash table's value format.

note: this "bug" (or unusual implementation) was without consequences,
      because hashindex.setdefault is not used anywhere in borg, so
      it was also not used in a wrong way anywhere.

https://docs.python.org/3/library/stdtypes.html#dict.setdefault
2022-02-13 03:47:44 +01:00
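
The setdefault semantics described in the 603b58f6a1 message (return the stored value, but require an explicit N-tuple default instead of None) can be illustrated with a small pure-Python stand-in. The class `TupleIndex` is hypothetical and backed by a plain dict; it only mirrors the described behaviour and is not the hashindex implementation.

```python
class TupleIndex:
    """Hypothetical dict-backed stand-in for a hash index with N-tuple values."""

    def __init__(self, value_len):
        self._value_len = value_len
        self._table = {}

    def setdefault(self, key, value):
        # Unlike dict.setdefault, there is no None default: the caller must
        # supply a tuple matching the table's value format.
        if len(value) != self._value_len:
            raise ValueError(f"expected a {self._value_len}-tuple")
        if key not in self._table:
            self._table[key] = tuple(value)
        # The .get()-like part: always return the value now stored.
        return self._table[key]


idx = TupleIndex(value_len=2)
first = idx.setdefault(b"key", (1, 2))   # key is new, the default is stored
second = idx.setdefault(b"key", (3, 4))  # key exists, stored value is kept
```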
Thomas Waldmann a65cefb7bb bump API_VERSIONs to 1.2_xx 2019-02-24 19:45:41 +01:00
TW c3f40de606
cache_sync: compute size/count stats, borg info: consider part files (#4286)

fixes #3522
2019-02-04 03:26:45 +01:00
Emmo Emminghaus f8ef6af454 hashindex: clean void* arithmetic up #2677
lowlevel: clean void* arithmetic up
unpack: replace nonstandard false with 0
2018-10-24 21:40:05 +02:00
Thomas Waldmann de113bab23 move capacity calculation to IndexBase, fixes #2646
we just give how many "usable" hashtable entries we want and it computes
the hashtable capacity internally via int(usable / MAX_LOAD_FACTOR).
2018-06-12 22:25:27 +02:00
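
The capacity formula from the de113bab23 message, int(usable / MAX_LOAD_FACTOR), can be tried out with a short sketch. The load factor value 0.75 and the function name `capacity_for` are assumptions for illustration only; the commit message does not state the actual constant.

```python
MAX_LOAD_FACTOR = 0.75  # assumed value, for illustration only


def capacity_for(usable):
    """Return the hashtable capacity for `usable` wanted entries.

    Mirrors the formula from the commit message:
    int(usable / MAX_LOAD_FACTOR).
    """
    if usable < 0:
        raise ValueError("usable must be non-negative")
    return int(usable / MAX_LOAD_FACTOR)


capacity = capacity_for(750)
```

Because int() truncates, very small inputs can round down; a caller that needs a strict guarantee would have to round up instead.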
Marian Beermann 9a856533ba fuse: versions view, linear numbering by archive time 2017-07-03 12:38:10 +02:00
Marian Beermann 8aa745ddbd create: --no-cache-sync 2017-06-18 02:01:26 +02:00
Thomas Waldmann 72ef24cbc0 hashindex: implement KeyError 2017-06-17 23:25:32 +02:00
Marian Beermann e189a4d302 info: use CacheSynchronizer & HashIndex.stats_against 2017-06-13 14:34:10 +02:00
enkore 13f396d5ad Merge pull request #2638 from enkore/f/fastcachesync-minify
Compact chunks.archive.d
2017-06-10 17:13:25 +02:00
Marian Beermann 310a71e4f0 cache sync: use ro_buffer to accept bytes, memoryview, ... 2017-06-10 10:17:28 +02:00
Marian Beermann 6e011b9354 cache: compact hashindex before writing to chunks.archive.d 2017-06-09 12:23:26 +02:00
Marian Beermann c786a5941e CacheSynchronizer: redo as quasi FSM on top of unpack.h
This is a (relatively) simple state machine running in the
data callbacks invoked by the msgpack unpacking stack machine
(the same machine is used in msgpack-c and msgpack-python,
changes are minor and cosmetic, e.g. removal of msgpack_unpack_object,
removal of the C++ template thus porting to C and so on).

Compared to the previous solution this has multiple advantages
- msgpack-c dependency is removed
- this approach is faster and requires fewer and smaller
  memory allocations

Testability of the two solutions does not differ in my
professional opinion(tm).

Two other changes were rolled up; _hashindex.c can be compiled
without Python.h again (handy for fuzzing and testing);
a "small" bug in the cache sync was fixed which allocated too
large archive indices, leading to excessive chunks.archive.d
disk usage (that actually gave me an idea).
2017-06-02 17:43:15 +02:00
Marian Beermann 740898d83b CacheSynchronizer 2017-06-02 17:43:14 +02:00
Marian Beermann 06cf15cc6d hashindex: read/write: accept file-like objects for path 2017-05-25 14:04:41 +02:00
enkore 6cd7d415ca hashindex: Use Python I/O (#2496)
- Preparation for #1688 / #1101
- Support hash indices >2 GB
- Better error reporting
2017-05-09 21:30:14 +02:00
Thomas Waldmann 8d7dfe739f fix ChunkIndex.__contains__ assertion for big-endian archs
also: add some missing assertion messages

severity:

- no issue on little-endian platforms (== most, including x86/x64)
- harmless even on big-endian as long as refcount is below 0xfffbffff,
  which is very likely always the case in practice anyway.
2017-02-20 07:38:55 +01:00
Thomas Waldmann e431d60cc5 merge 1.0-maint into master
# Conflicts:
#	src/borg/crypto.pyx
#	src/borg/hashindex.pyx
#	src/borg/helpers.py
#	src/borg/platform/__init__.py
#	src/borg/platform/darwin.pyx
#	src/borg/platform/freebsd.pyx
#	src/borg/platform/linux.pyx
#	src/borg/remote.py
2017-01-14 03:07:11 +01:00
Thomas Waldmann 8df6cb8156 hashindex: bump api_version
note:
merging the respective changeset from 1.0-maint was not effective
as we already had version 3, so there was no increase.
2016-09-30 23:59:41 +02:00
Thomas Waldmann ba30098079 Merge branch '1.0-maint' into merge-1.0-maint 2016-09-29 12:57:29 +02:00
Thomas Waldmann 1287d1ae92 Merge branch '1.0-maint' into merge-1.0-maint
# Conflicts:
#	docs/development.rst
#	src/borg/archive.py
#	src/borg/archiver.py
#	src/borg/hashindex.pyx
#	src/borg/testsuite/hashindex.py
2016-09-14 02:53:41 +02:00
Marian Beermann e9a73b808f Check for sufficient free space before committing 2016-07-30 00:04:27 +02:00
Thomas Waldmann 3baa8a3728 Merge branch '1.0-maint'
# Conflicts:
#	docs/changes.rst
#	docs/usage/mount.rst.inc
#	src/borg/archive.py
#	src/borg/archiver.py
#	src/borg/fuse.py
#	src/borg/testsuite/archiver.py
2016-07-11 01:23:27 +02:00
Thomas Waldmann 045e5a1203 Merge branch 'master' into move-to-src 2016-05-30 19:38:16 +02:00
Thomas Waldmann d1ea925a5b move borg package to src/ 2016-05-05 20:19:50 +02:00
Renamed from borg/hashindex.pyx