A borgbackup-2.0.0b6 test fails on OpenBSD with the message below.
```
=================================== FAILURES ===================================
_____________________________ test_get_runtime_dir _____________________________
path = '/run/user/55/borg', mode = 511, pretty_deadly = True
def ensure_dir(path, mode=stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO, pretty_deadly=True):
"""
Ensures that the dir exists with the right permissions.
1) Make sure the directory exists in a race-free operation
2) If mode is not None and the directory has been created, give the right
permissions to the leaf directory. The current umask value is masked out first.
3) If pretty_deadly is True, catch exceptions, reraise them with a pretty
message.
Returns if the directory has been created and has the right permissions,
An exception otherwise. If a deadly exception happened it is reraised.
"""
try:
> os.makedirs(path, mode=mode, exist_ok=True)
build/lib.openbsd-7.3-amd64-cpython-310/borg/helpers/fs.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
```
If `$XDG_RUNTIME_DIR` is not set `platformdirs.user_runtime_dir()`
returns one of 3 different paths
(https://github.com/platformdirs/platformdirs/pull/201). Proposed fix is
to check if `get_runtime_dir()` returns one of these paths.
macFUSE supports a volname mount option to give what
finder displays on desktop / in directory list.
if the user did not specify it, we make something up,
because otherwise it would be "macFUSE Volume 0 (Python)".
about 10-50% of the github windows CI runs fail due to
this - root cause unknown.
Example failure:
# we first check if we could create a sparse input file:
sparse_support = is_sparse(filename, total_size, hole_size)
if sparse_support:
# we could create a sparse input file, so creating a backup of it and
# extracting it again (as sparse) should also work:
self.cmd(f"--repo={self.repository_location}", "rcreate", RK_ENCRYPTION)
self.cmd(f"--repo={self.repository_location}", "create", "test", "input")
with changedir(self.output_path):
self.cmd(f"--repo={self.repository_location}", "extract", "test", "--sparse")
self.assert_dirs_equal("input", "output/input")
filename = os.path.join(self.output_path, "input", "sparse")
with open(filename, "rb") as fd:
# check if file contents are as expected
> self.assert_equal(fd.read(hole_size), b"\0" * hole_size)
E AssertionError: b'\x0[8388602 chars]x00\xf0Y\xb5\xe3\xee\xf3\x1f\xe3L\xcf\xae\x92\[159253621 chars]\x00' != b'\x0[8388602 chars]x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0[159383505 chars]\x00'
src/borg/testsuite/archiver/extract_cmd.py:212: AssertionError
Replacing the internals should make the implementation faster
and simpler since the order tracking is done by the `OrderedDict`.
Furthermore, this commit adds type hints to `LRUCache` and
renames the `upd` method to `replace` to make its use more clear.
Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:
$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'
Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.
This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.
With paths being sanitized as they are being read, this "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repository will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to kept in
place to detect tempered archives.
I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:
a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.
This also (partially) fixes#7099.
shutting down logging is problematic as it is global
and we do multi-threaded execution, e.g. in tests.
thus, rather just flush the important loggers and keep
them alive.
server (listening) side:
borg serve --socket # default location
borg serve --socket=/path/to/socket
client side:
borg -r socket:///path/to/repo create ...
borg --socket=/path/to/socket -r socket:///path/to/repo ...
served connections:
- for ssh: proto: one connection
- for socket: proto: many connections (one after the other)
The socket has user and group permissions (770).
skip socket tests on win32, they hang infinitely, until
github CI terminates them after 60 minutes.
socket tests: use unique socket name
don't use the standard / default socket name, otherwise tests
running in parallel would interfere with each other by using
the same socket / the same borg serve process.
write a .pid file, clean up .pid and .sock file at exit
add stderr print for accepted/finished socket connection
- tears down logging (so no new log output is generated afterwards)
- sends all queued log output
- then returns
also: make stdin_fd / stdout_fd instance variables
for normal borg command invocation:
- logging is set up in Archiver.run
- the atexit handler calls logging.shutdown when process terminates
for tests:
- Archiver.run called by exec_cmd
- no atexit handler executed as process lives on
- borg.logger.teardown (calls shutdown and configured=False) now
called in exec_cmd
- simplify progress output (no \r, no terminal size related tweaks)
- emit progress output via the logging system (so it does not use stderr
of borg serve)
- progress code always logs a json string, the json has all needed
to either do json log output or plain text log output.
- use formatters to generate plain or json output from that.
- clean up setup_logging
- use a StderrHandler that always uses the **current** sys.stderr
- tweak TestPassphrase to not accidentally trigger just because of seeing 12 in output
Instead, install a handler that sends the LogRecord dicts to a queue.
That queue is then emptied in the borg serve main loop and
the LogRecords are sent msgpacked via stdout to the client,
similar to the RPC results.
On the client side, the LogRecords are recreated from the
received dicts and fed into the clientside logging system.
As we use msgpacked LogRecord dicts, we don't need JSON for
this purpose on the borg serve side any more.
On the client side, the LogRecords will then be either formatted
as normal text or as JSON log output (by the clientside log
formatter).
Compact moves data to new segments, and then removes the old segments.
When enough segments are moved, directories holding the now cleared segments
may thus become empty.
With this commit any empty directories are cleared after segments compacting.
Fixes#6823
+ os.scandir instead of os.listdir
Improved speed and added flexibility with attributes (name, path, is_dir(), is_file())
+ use is_dir / is_file to make sure we're reading only dirs / files respectively
+ Filtering to particular start, end index range built in
+ Move value bounds of segment (index) into constants module and use them instead
Resolves#7597
(forward patch from commits c9f35a16e9bf9e7073c486553177cef79ff1cb06^..edb5e749f512b7737b6933e13b7e61fefcd17bcb)
this used to call get_base_dir (and would have needed
legacy=True now to work like expected).
rather implemented the desired behaviour locally and
got rid of the legacy call (which was a bit strange
anyway as it also considered BORG_BASE_DIR, which is
unexpected when resolving ~).
in the sysinfo function, there is a way to suppress
all sysinfo output via an env var and just return an
empty string.
so we can expect it is always in unpacked, but it
might be the empty string.
log output:
always expect json, remove $LOG format support.
we keep limited support for unstructured format also,
just not to lose anything from remote stderr.
rpc format:
ancient borg used tuples in the rpc protocol,
but recent ones use easier-to-work-with dicts.
version info:
we expect dicts with server/client version now.
setUp enters the context manager, so let's .reopen() leave it.
then create a fresh Repository instance in self.repository and
enter the context manager again. tearDown then will leave that.
"if self.repository" did not work as expected:
- Repository has a __len__ method, so the boolean evaluation was calling that.
- self.repository is also not set to None anywhere.
while on macOS the new and old security dir location is the same path,
this is not the case on e.g. Linux, it could move from .config/borg/security to
.local/share/borg/security .
See #5760.
not needed for borg2 repos (we derive a new session key for each borg
invocation and start counting from 0).
also not needed for borg 1.x repos because we only read them (borg transfer)
and won't write new encrypted data to them.
use this to only list the kept (or pruned) archives.
--list-pruned and --list-kept also work in a additive way.
implied logging: support multiple prune options activating same logger
if any of --list / --list-kept / --list-pruned is used,
it should put the borg.output.list logger to INFO level,
otherwise to WARN level.
as a first step, i moved all the traceback formatting
to format_tb.
also, it **first** prints the error and then the traceback
as additional information for a bug report, as suggested
by @jimparis in that ticket.
saying "must be a writable directory" can distract
from the real root cause as seen in #7496.
so we better first check if the mountpoint is an
existing directory and if not, just tell that.
after that, we check permissions and if they are not
like required, tell that.
fix config dir compatibility issue, fixes#7445
- add tests
- make sure the result of get_cache_dir matches pre and post #7300 where desired
- harmonize implementation of config_dir_compat and cache_dir_compat tests
Co-authored-by: nain <126972030+F49FF806@users.noreply.github.com>
this needs to decompress and to hash the chunk data,
but better let's play safe.
at least we still can avoid the (re-)compression with
borg transfer (which is often much more expensive
than decompression).
diff: include changes in ctime and mtime, fixes#7248
also:
- sort JSON output alphabetically
- add --content-only to ignore metadata changes
Co-authored-by: Michael Deyaso <mdeyaso@fusioniq.io>
this is an incompatible change:
before:
borg debug put-obj path1 path2 ...
(and borg computed all IDs automatically) (*)
after:
borg debug put-obj id path
(id must be given)
(*) the code just using sha256(data) was outdated and incorrect anyway.
also: debug get-obj: improve error handling
Errors handled for backup src files:
- BackupOSError (converted from OSError), e.g. I/O Error
- BackupError (stats race, file changed while we backed it up)
Error Handling:
- retry the same file after some sleep time
- sleep time starts from 1ms, increases exponentially up to 10s
- 10 tries
If retrying does not help:
- BackupOSError: skip the file, log it with "E" status
- BackupError: last try will back it up, log it with "C" status
Works for:
- borg create's normal (builtin) fs recursion
- borg create --paths-from-command
- borg create --paths-from-stdin
Notes:
- update stats.files_stats late (so we don't get wrong
stats in case of e.g. IOErrors while reading the file).
- _process_any: no changes to the big block, just indented
for adding the retry loop and the try/except.
- test_create_erroneous_file succeeds because we retry the file.
we do book-keeping in item.chunks:
in case something goes wrong and we need to clean up,
we will have a list with chunks to decref in item.chunks.
also:
- make variable naming more consistent
- cosmetic changes
if a file can't be read (like here: there is a simulated
I/O error in the 2nd chunk of file2), it should be logged
with "E" status, skipped and backup shall proceed with
next file(s).
also, check that the repo has no orphan chunks (exception
handling code needs to deal with 1st chunk of file2 which
already has been written / incref'd in the repo).
--chunker-params=fail,4096,rrrEErrrr means:
- cut chunks of 4096b fixed size (last chunk in a file can be less)
- read chunks 0, 1 and 2 successfully
- error at chunk 3 and 4 (simulated OSError(errno.EIO))
- read successfully again for the next 4 chunks
Chunks are counted inside the chunker instance, starting
from 0, always increasing while the same instance is used.
Read chunks as well as failed chunks count up by 1.
also add a test: recreate without --chunker-params shall not rechunk
before the fix, it triggered rechunking if an archive
was created with non-default chunker params.
but it only should rechunk if borg recreate is invoked with explicitly giving --chunker-params=....
test hashtable expansion/rebuild.
hashindex_lookup:
- return -2 for a compact / completely full hashtable
- return -1 and (via start_idx pointer) the deleted/tombstone bucket index.
fix size assertion (we add 1 element to trigger rebuild)
fix upper_limit check - since we'll be adding 1 to num_entries below,
the condition should be >=:
hashindex_compact: set min_empty/upper_limit
Co-authored-by: Dan Christensen <jdc+github@uwo.ca>
hashindex_index returns the perfect hashtable index, but does not
check what's in the bucket there, so we had these loops afterwards
to search for an empty or deleted bucket.
problem: if the HT were completely filled with no empty and no deleted
buckets, that loop would never end. due to our HT resizing, it can
never happen, but still not pretty.
when using hashindex_lookup (as also used some lines above), the code
is easier to understand, because (after we resized the HT), we freshly
create the same situation as after the first call of that function:
- return value < 0, because we (still) can not find the key
- start_idx will point to an empty bucket
Thus, we do not need the problematic loops we had there.
Modified the checks to make sure we really have an empty or deleted
bucket before overwriting it with data.
Added some additional asserts to make sure the code behaves.
if we run into some issue reading an input file, e.g. an I/O error,
the BackupOSError exception raised due to that will skip the current
file and no archive item will be created for this file.
But we maybe have already added some of its content chunks to the repo,
we have either written them as new chunks or incref'd some identical chunk
in the repo.
Added an exception handler that decrefs (and deletes if refcount reaches 0)
these chunks again before re-raising the exception, so the repo is in a
consistent state again and we do not have orphaned content chunks in the repo.