mirror of https://github.com/borgbackup/borg.git synced 2025-01-03 05:35:58 +00:00
8673 commits

Author SHA1 Message Date
TW
fd1a7ddb76
Merge pull request #8417 from ThomasWaldmann/ids-per-chunk
increase IDS_PER_CHUNK, fixes #6945
2024-09-25 22:48:16 +02:00
Thomas Waldmann
015e3a43aa
increase IDS_PER_CHUNK, fixes #6945 2024-09-25 20:57:28 +02:00
TW
67b62b5989
Merge pull request #8411 from ThomasWaldmann/optimize-repo-list-usage
bugfix: remove superfluous repository.list() call
2024-09-25 11:06:09 +02:00
Thomas Waldmann
1436bbba1a
bugfix: remove superfluous repository.list() call
Because it ended the loop only when .list() returned an
empty result, this always needed one more call than
necessary.

We can also detect that we are finished if .list()
returns fewer items than the limit we gave it.

Also: reduce code duplication by using repo_lister func.
2024-09-24 23:43:08 +02:00
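
The loop-termination idea in this commit message can be illustrated with a minimal sketch. This is not borg's code: `iter_all`, `FakeStore` and the `list_page(marker, limit)` signature are hypothetical names chosen for illustration. A page shorter than the limit means there is nothing left to list, so no further call is needed.

```python
from typing import Callable, Iterator, List, Optional


def iter_all(
    list_page: Callable[[Optional[str], int], List[str]], limit: int = 3
) -> Iterator[str]:
    """Yield every item of a paged listing, stopping on the first short page."""
    marker: Optional[str] = None
    while True:
        page = list_page(marker, limit)
        yield from page
        if len(page) < limit:
            # A short (or empty) page means the listing is exhausted.
            break
        marker = page[-1]


class FakeStore:
    """Hypothetical in-memory stand-in for a repository, counting list calls."""

    def __init__(self, ids: List[str]) -> None:
        self.ids = sorted(ids)
        self.calls = 0

    def list_page(self, marker: Optional[str], limit: int) -> List[str]:
        self.calls += 1
        start = 0 if marker is None else self.ids.index(marker) + 1
        return self.ids[start:start + limit]


store = FakeStore(["a", "b", "c", "d", "e"])
items = list(iter_all(store.list_page, limit=3))
```

With five items and a limit of three, the sketch makes two calls; a loop that only stops on an empty result would need a third. When the item count is an exact multiple of the limit, one extra call is still required.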
TW
7d02fe2b8f
Merge pull request #8403 from ThomasWaldmann/cache-chunkindex
chunks index caching
2024-09-24 23:37:29 +02:00
Thomas Waldmann
36e3d63474
chunks index caching, fixes #8397
borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.

When the chunks index is needed, borg first tries to load it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list(),
which can be rather slow, and immediately caches the resulting ChunkIndex in the
repo.

borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.

cache.close now saves the chunks index to cache/chunks in the repo if it
was modified. Thus, borg create will update the cached chunks index with
new chunks.

cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
2024-09-24 22:25:00 +02:00
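
The load-or-rebuild pattern described in this commit message might look roughly like the sketch below. It is only an illustration under stated assumptions: the store is a plain dict, the index is a set of chunk ids serialized as JSON, and SHA-256 is used for the hash. Only the key names cache/chunks and cache/chunks_hash come from the commit message; borg's real ChunkIndex format and hashing are not shown here.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, Set

CACHE_KEY = "cache/chunks"
HASH_KEY = "cache/chunks_hash"


def _serialize(ids: Iterable[bytes]) -> bytes:
    # Assumed format for this sketch: a sorted JSON list of hex ids.
    return json.dumps(sorted(i.hex() for i in ids)).encode()


def save_chunk_ids(store: Dict[str, bytes], ids: Iterable[bytes]) -> None:
    """Write the index and a hash that can later validate it."""
    data = _serialize(ids)
    store[CACHE_KEY] = data
    store[HASH_KEY] = hashlib.sha256(data).hexdigest().encode()


def load_chunk_ids(
    store: Dict[str, bytes], slow_list: Callable[[], Iterable[bytes]]
) -> Set[bytes]:
    """Return the cached index if it validates, else rebuild and re-cache it."""
    data = store.get(CACHE_KEY)
    expected = store.get(HASH_KEY)
    if data is not None and expected == hashlib.sha256(data).hexdigest().encode():
        return {bytes.fromhex(h) for h in json.loads(data)}
    ids = set(slow_list())  # slow path: full listing of the repository
    save_chunk_ids(store, ids)  # cache immediately for the next caller
    return ids
```

A local copy of the index can be validated the same way: if its hash equals the stored cache/chunks_hash value, it does not need to be downloaded again.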
TW
1e6f71f2f5
Merge pull request #8408 from helmutg/faq-slow-fat
FAQ: Why is backing up an unmodified FAT filesystem slow on Linux?
2024-09-23 13:46:46 +02:00
Helmut Grohne
70f173caa7
FAQ: Why is backing up an unmodified FAT filesystem slow on Linux? 2024-09-23 10:36:56 +02:00
TW
527454840b
Merge pull request #8405 from ThomasWaldmann/support-rclone-borgstore
add support for rclone:// repositories (via borgstore)
2024-09-22 23:27:27 +02:00
Thomas Waldmann
bd6caf835d
add support for rclone:// repositories (via borgstore) 2024-09-22 22:26:07 +02:00
TW
4d8954ecbb
Merge pull request #8404 from ThomasWaldmann/fix-build-files-cache
cache: fix crash in _build_files_cache
2024-09-22 01:34:21 +02:00
Thomas Waldmann
e5e685fd1f
cache: fix crash in _build_files_cache 2024-09-22 00:36:30 +02:00
TW
b862f2b95f
Merge pull request #8389 from ThomasWaldmann/files-cache-from-archive
files cache improvements
2024-09-21 15:17:29 +02:00
Thomas Waldmann
ec9d412756
fix race condition with data loss potential, fixes #3536
we discard all files cache entries referring to files
with timestamps AFTER we started the backup.

so, even if we back up an inconsistent file that was
changed while we were backing it up, we will not have
a files cache entry for it and will fully
read/chunk/hash it again in the next backup.
2024-09-21 11:34:34 +02:00
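
As a rough illustration of the rule in this commit message, the sketch below filters a files cache by timestamp. `FilesCacheEntry` and `drop_entries_since` are hypothetical names, not borg's. The sketch also drops entries whose timestamp equals the backup start time, which is a conservative assumption of this example rather than something the commit message states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FilesCacheEntry:
    ctime_ns: int
    mtime_ns: int


def drop_entries_since(
    cache: Dict[str, FilesCacheEntry], backup_start_ns: int
) -> Dict[str, FilesCacheEntry]:
    """Keep only entries whose newest timestamp is before the backup start."""
    return {
        path: entry
        for path, entry in cache.items()
        if max(entry.ctime_ns, entry.mtime_ns) < backup_start_ns
    }
```

Any file touched during the backup then has no cache entry and is read in full on the next run.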
Thomas Waldmann
b60378cf0e
fix race condition with data loss potential, fixes #3536
if we detect the conditions for this (rare) race,
abort reading the file and retry.

The caller (_process_any) will retry up to MAX_RETRIES
times before giving up. If it gives up, a warning is logged,
the file is not written to the archive, and it won't
be recorded in the files cache either.

Thus, the file will be read/chunked/hashed again at
the next borg create run.
2024-09-21 11:34:31 +02:00
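
The abort-and-retry behaviour can be sketched as follows. Everything here except the MAX_RETRIES name is hypothetical: the exception class, the `read_file` callback, the value 3, and counting MAX_RETRIES as the total number of attempts are all assumptions made for the example.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # assumed value, for illustration only


class FileChangedWhileReading(Exception):
    """Hypothetical signal that the race condition was detected."""


def process_with_retries(
    path: str,
    read_file: Callable[[str], bytes],
    max_retries: int = MAX_RETRIES,
    log: Callable[[str], None] = print,
) -> Optional[bytes]:
    """Try to read a file, starting over whenever it changed under us."""
    for _attempt in range(max_retries):
        try:
            return read_file(path)
        except FileChangedWhileReading:
            continue  # abort this read and start over
    # Giving up: the caller must neither archive nor cache this file.
    log(f"warning: {path}: file changed while reading, skipped")
    return None
```

Returning `None` stands in for "not written to the archive and not recorded in the files cache", so the next run will process the file from scratch.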
TW
275e5e136c
Merge pull request #8399 from ThomasWaldmann/storelocking-updates
storelocking: fixes / cleanups
2024-09-20 14:28:01 +02:00
Thomas Waldmann
31e5318e66
storelocking: fixes / cleanups
- on explicit request, update .last_refresh_dt inside _create_lock / _delete_lock
- reset .last_refresh_dt if we kill our own lock
- be more precise, have exactly the datetime of the lock in .last_refresh_dt
- cosmetic: do refresh/stale time comparisons always in the same way
2024-09-20 11:49:40 +02:00
Thomas Waldmann
c100e7b1f5
files cache: update ctime, mtime of known and "unchanged" files, fixes #4915 2024-09-20 00:44:55 +02:00
Thomas Waldmann
a891559578
files cache improvements, fixes #8385, fixes #5658
- changes to locally stored files cache:

  - store as files.<H(archive_name)>
  - user can manually control suffix via env var
  - if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
  archive's metadata from the repo (better than starting with
  empty files cache and needing to read/chunk/hash all files).
  previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
  repo from same machine/user).

Cache entries now have ctime AND mtime.

Note: TTL and age still needed for discarding removed files.
      But due to the separate files caches per series, the TTL
      was lowered to 2 (from 20).
2024-09-20 00:40:49 +02:00
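
A per-series cache file name of the form files.<H(archive_name)> could be derived as in the sketch below. The choice of SHA-256 for H and the `suffix` parameter (standing in for the user-controlled override mentioned above) are assumptions of this example, not necessarily what borg does.

```python
import hashlib
from typing import Optional


def files_cache_filename(archive_name: str, suffix: Optional[str] = None) -> str:
    """Return the files cache file name for one archive series."""
    if suffix is None:
        # Assumed hash function; any stable hash of the name would serve.
        suffix = hashlib.sha256(archive_name.encode("utf-8")).hexdigest()
    return f"files.{suffix}"
```

Because the name depends only on the archive name, every backup in the same series finds the same cache file, while different series never share one.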
TW
385eeeb4d5
Merge pull request #8398 from ThomasWaldmann/repo-compress-using-chunkindex
repo-compress: use chunkindex rather than repository.list()
2024-09-19 19:42:41 +02:00
Thomas Waldmann
f1a39a059e
repo-compress: use chunkindex rather than repository.list()
repository.list is slow, so rather use the chunkindex,
which might be cached in the future. currently, building the
chunkindex also uses repository.list, but at least we can then
solve the problem in one place.
2024-09-19 18:59:03 +02:00
TW
3fd1587dd7
Merge pull request #8396 from ThomasWaldmann/storelocking-debug-logging
storelocking: add debug logging
2024-09-19 16:38:26 +02:00
Thomas Waldmann
d322889972
storelocking: avoid raising a NotLocked exception when releasing the lock during exception handling 2024-09-19 15:15:22 +02:00
Thomas Waldmann
6a283200f2
storelocking: add debug logging 2024-09-19 15:15:20 +02:00
TW
8d37c00f7b
Merge pull request #8395 from ThomasWaldmann/msys-updates
msys2: disable SETUPTOOLS_USE_DISTUTILS=stdlib hack
2024-09-19 13:29:21 +02:00
Thomas Waldmann
4c1a0b1ca0
msys2: disable SETUPTOOLS_USE_DISTUTILS=stdlib hack
The msys2 changelog says it is only needed for setuptools < 70.2.0:

https://www.msys2.org/docs/python/#known-issues

https://setuptools.pypa.io/en/stable/history.html#v70-2-0
2024-09-19 12:36:18 +02:00
TW
11b72efffe
Merge pull request #8394 from ThomasWaldmann/list-refresh-lock-frequently
repository.list: refresh lock more frequently
2024-09-19 12:19:29 +02:00
Thomas Waldmann
2a20ebeec7
repository.list: refresh lock more frequently
under all circumstances, we must prevent the lock
from going stale because it was not refreshed in time.

there is some internal rate limiting in _lock_refresh,
so calling it often should be no problem.
2024-09-19 11:38:49 +02:00
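
Internal rate limiting of this kind makes frequent calls cheap. A minimal sketch, assuming a hypothetical `RateLimited` wrapper (borg's `_lock_refresh` is not reproduced here):

```python
import time
from typing import Callable, Optional


class RateLimited:
    """Run an action at most once per min_interval seconds."""

    def __init__(
        self,
        action: Callable[[], None],
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._action = action
        self._min_interval = min_interval
        self._clock = clock
        self._last: Optional[float] = None

    def __call__(self) -> bool:
        """Return True if the action ran, False if the call was skipped."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False  # too soon since the last run, do nothing
        self._action()
        self._last = now
        return True
```

A long-running loop can then call the wrapper on every iteration, and the underlying refresh only happens once per interval.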
TW
97d1e18626
Merge pull request #8393 from ThomasWaldmann/fix-compact-nonunique-names
fixes for non-unique archive names
2024-09-19 01:52:08 +02:00
Thomas Waldmann
03b139ee53
ArchiveFormatter: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!
2024-09-19 00:58:45 +02:00
Thomas Waldmann
eb75390240
tests: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!
2024-09-19 00:58:32 +02:00
Thomas Waldmann
948bc4cdf9
mount: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!
2024-09-19 00:58:21 +02:00
Thomas Waldmann
f5b3ab66e9
transfer: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!
2024-09-19 00:58:05 +02:00
Thomas Waldmann
4bce862d95
diff: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!

Also: remove with_archive decorator usage to get more consistent code.
2024-09-19 00:57:04 +02:00
Thomas Waldmann
6b68b5a4a7
compact: fix for non-unique archive names
For Archive(), always use the archive id, not the archive name!

Also: sorting by timestamp, improved output.
2024-09-19 00:56:25 +02:00
TW
61dd29b815
Merge pull request #8391 from ThomasWaldmann/fix-hashing-time-test
remove the hashing/chunking time tests
2024-09-18 19:39:55 +02:00
Thomas Waldmann
2b69e71465
remove the hashing/chunking time tests
They were frequently failing on fast machines.
2024-09-18 19:01:05 +02:00
TW
29c7ce4e1f
Merge pull request #8379 from ThomasWaldmann/backup-series
backup series, fixes #7930
2024-09-18 15:03:27 +02:00
Thomas Waldmann
1bc5902718
docs: update about archive series
in borg 1.x, we used to put a timestamp into the archive name to make
it unique, because borg1 required that.

borg2 does not require unique archive names; it even encourages you
to use an identical archive name within the same SERIES of archives.
that makes matching (e.g. for prune, but also in other places) much
simpler, and borg KNOWS which archives belong to the same series.
2024-09-18 14:05:12 +02:00
Thomas Waldmann
a4d2fc8dd7
remote: allow get_manifest method (bugfix)
LegacyRepository of borg2 knows this method.
2024-09-17 20:18:47 +02:00
Thomas Waldmann
dd78b774a0
transfer: checking name is not enough, check name/id and name/ts 2024-09-17 20:18:45 +02:00
Thomas Waldmann
426d1c7dda
manifest: reorder methods, no other changes 2024-09-17 20:18:44 +02:00
Thomas Waldmann
77fc3fb884
manifest: use empty archives/<archive_hex_id>
for the archives directory, we only need to know the archive IDs,
everything else can be fetched from the ArchiveItem in the repo.

so we store empty files into archives/* with the archive ID as name.

this makes some "by-id" operations much easier and we don't have to
deal with a useless "store_key" anymore.

removed .delete method - we can't delete by name anymore as we
allow duplicate names for the series feature. everything uses
delete_by_id() now.

also: simplify, clean up, refactor
2024-09-17 20:18:40 +02:00
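
The directory layout described in this commit message, empty entries named by archive id, can be sketched with a dict acting as the store. `ArchivesDirectory` and its methods are hypothetical, apart from the `delete_by_id` name taken from the message.

```python
from typing import Dict, List


class ArchivesDirectory:
    """Archive ids stored as empty entries under archives/<hex id>."""

    PREFIX = "archives/"

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store

    def add(self, archive_id: bytes) -> None:
        # The entry carries no payload; the key itself is the information.
        self._store[self.PREFIX + archive_id.hex()] = b""

    def ids(self) -> List[bytes]:
        return [
            bytes.fromhex(key[len(self.PREFIX):])
            for key in sorted(self._store)
            if key.startswith(self.PREFIX)
        ]

    def delete_by_id(self, archive_id: bytes) -> None:
        del self._store[self.PREFIX + archive_id.hex()]
```

Since names may repeat within a series, the id is the only unambiguous handle, which is why deletion by name no longer makes sense.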
Thomas Waldmann
6ce5f2f230
manifest: use id from directory, fetch other metadata from archive 2024-09-17 20:18:38 +02:00
Thomas Waldmann
bb5cf96fe8
check: fix/enhance code, rewrite test
- we should always output name and id when talking about an archive
- no problem anymore if names in archives directory are "duplicate"
- use "by-id" archives directory entry delete function
- rewrite/simplify test for borg check --undelete-archives
2024-09-17 18:17:07 +02:00
Thomas Waldmann
81a27c1dbe
info/delete/prune: allow positional NAME argument
so if one works with backup series, one can just do:

borg prune --keep-daily 30 seriesname

borg will then do an exact match of seriesname against the
archive names and select that series.
2024-09-17 18:16:36 +02:00
Thomas Waldmann
8237e6beca
NAME is the series, archive id is the hash
aid:<archive-id-prefix> can be used for -a / --match-archives
to match on the archive id (prefix) instead of the name.

NAME positional argument now also supports matching (and aid:),
but requires that there is exactly ONE result.
2024-09-17 18:16:31 +02:00
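
The matching rules in this commit message can be illustrated with a small sketch. `ArchiveInfo`, `match_archives` and `resolve_single` are hypothetical names; the sketch handles only exact names and the aid: prefix form, not the other pattern styles borg supports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArchiveInfo:
    name: str    # the series name, may be shared by many archives
    id_hex: str  # the archive id as a hex string, unique per archive


def match_archives(archives: List[ArchiveInfo], pattern: str) -> List[ArchiveInfo]:
    """Match by id prefix for 'aid:<prefix>', otherwise by exact name."""
    if pattern.startswith("aid:"):
        prefix = pattern[len("aid:"):]
        return [a for a in archives if a.id_hex.startswith(prefix)]
    return [a for a in archives if a.name == pattern]


def resolve_single(archives: List[ArchiveInfo], pattern: str) -> ArchiveInfo:
    """Resolve a positional NAME, which must select exactly one archive."""
    matches = match_archives(archives, pattern)
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} archives, need exactly 1")
    return matches[0]
```

A series name that covers several archives is fine for multi-archive selection, but fails in `resolve_single`, where an id prefix that is long enough to be unique is needed instead.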
TW
ed31131fb6
Merge pull request #8384 from ThomasWaldmann/archive-inode
create: also archive inode number, fixes #8362
2024-09-17 12:50:54 +02:00
Thomas Waldmann
ae4abdfe1c
create: also archive inode number, fixes #8362
we could use this if we use the "previous archive" instead of the "files cache"
to determine whether a file is unchanged.
2024-09-17 11:49:05 +02:00
TW
e4b5a59be0
Merge pull request #8383 from ThomasWaldmann/sftp-url-docs
docs: user@ and :port are optional in sftp and ssh URLs
2024-09-17 10:06:54 +02:00