For small remainders of files (last chunk), we do not need to buzhash if it
is already clear that there is not enough left (we want at least min_size big
chunks).
Small files are handled by the same code: as they only give 1 chunk, that
chunk is also the last chunk (see above).
See "Cases" considerations below.
For big files, we do not need to buzhash the first min_size bytes of a chunk -
we do not want to cut there anyway, so we can start buzhashing at offset
min_size.
Cases (before this change)
--------------------------
- A) remaining <= window_size:
  - would do 2 chunker_fill calls (both line 253) and trigger eof with the 2nd call
  - no buzhashing
  - result is 1 <remaining> length chunk
- B) window_size < remaining <= min_size:
  - the chunker would do 1 chunker_fill call (line 253) that would read the entire remaining file (but not trigger eof yet)
  - would compute all possible remaining - window_size + 1 buzhashes, but without a chance for a cut,
    because there is also the n < min_size condition
  - would do another chunker_fill call (line 282), but not get more data, so loop ends
  - result is 1 <remaining> length chunk
- C) file > min_size:
  - normal chunking
Cases (after this change)
-------------------------
- A) similar to above A), but up to remaining < min_size + window_size + 1,
  so it does not buzhash if there is no chance for a cut.
- B) see C) above
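
The decision described in the cases above can be sketched in Python as
follows. This is only an illustration: the real chunker is not written this
way, and the helper name buzhash_start is hypothetical. Only min_size,
window_size and the "remaining" byte count come from the text.

```python
def buzhash_start(remaining, min_size, window_size):
    """Return the offset at which buzhashing should start, or None to skip it.

    Hypothetical helper: all arguments are byte counts named after the
    quantities used in the case analysis above.
    """
    # Case A (after the change): too little data left for any cut, so the
    # whole remainder becomes one chunk and no hash needs to be computed.
    if remaining < min_size + window_size + 1:
        return None
    # Otherwise a cut before min_size is never wanted, so hashing can begin
    # at offset min_size instead of at the start of the chunk.
    return min_size
```

With min_size=512 and window_size=64, a remainder of 576 bytes is still
emitted as one unhashed chunk, while 577 bytes is the first size that gets
hashed.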
There are persistent questions about why output from options like --list
and --stats doesn't show up. Also, borg currently isn't able to
show *just* the output for a given option (--list, --stats,
--show-rc, --show-version, or --progress), without other INFO level
messages.
The solution is to use more granular loggers, so that messages
specific to a given option go to a logger designated for that
option. That option-specific logger can then be configured
separately from the regular loggers.
Those option-specific loggers can also be used as a hook in a
BORG_LOGGING_CONF config file to log the --list output to a separate
file, or send --stats output to a network socket where some daemon
could analyze it.
Steps:
- create an option-specific logger for each of the implied output options
- modify the messages specific to each option to go to the correct logger
- if an implied output option is passed, change the option-specific
  logger (only) to log at INFO level
- test that root logger messages don't come through option-specific loggers.
  They shouldn't, per https://docs.python.org/3/howto/logging.html#logging-flow
  but test just the same. Particularly test a message that can come from
  remote repositories.
Fixes #526, #573, #665, #824
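
A minimal sketch of the option-specific logger idea, using only the standard
logging module. The logger name "borg.output.stats", the ListHandler class
and the setup_logging function are assumptions made for this illustration,
and a top-level "borg" logger stands in for the regular loggers.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


def setup_logging(show_stats=False):
    """Keep regular loggers at WARNING; raise only the stats logger to INFO."""
    handler = ListHandler()
    base = logging.getLogger("borg")
    base.handlers = [handler]
    base.propagate = False          # keep this demo out of the real root logger
    base.setLevel(logging.WARNING)
    # Assumed logger name for the --stats output.
    stats = logging.getLogger("borg.output.stats")
    stats.setLevel(logging.INFO if show_stats else logging.NOTSET)
    return handler


handler = setup_logging(show_stats=True)
logging.getLogger("borg.output.stats").info("stats line")  # passes: INFO enabled here
logging.getLogger("borg.archiver").info("regular info")    # dropped: inherits WARNING
logging.getLogger("borg.archiver").warning("a warning")    # passes
```

Only the level of the option-specific logger changes, so other INFO level
messages stay hidden while the requested output still reaches the handler.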
Parser now understands both old format messages (to keep talking to
old servers) and new format messages that pass a logger name. If a
logger name is passed, the message is directed to the same logger
locally.
This could be cherry-picked to 1.x-maint (and 0.x-maint?) to allow
point releases to understand borg 1.1 server messages changed in the
next commit. Worst case, currently existing 0.x and 1.0.x clients
talking to a 1.1.x server will see messages like:
    borg.repository Remote: hi
    borg.archiver Remote: foo

instead of

    Remote: hi
    Remote: foo
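
A rough sketch of a parser that accepts both formats. The wire format used
here ("<LEVEL> Remote: <text>" for old messages and
"<LEVEL> <logger name> Remote: <text>" for new ones) is an assumption made
for illustration, not the actual protocol, and the function names are
hypothetical. Falling back to the root logger for old format messages is
also an assumption.

```python
import logging

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def parse_remote_line(line):
    """Split one line into (level, logger_name, message).

    logger_name is None for the assumed old format, which carries no name.
    """
    level_name, rest = line.split(" ", 1)
    level = LEVELS.get(level_name, logging.CRITICAL)
    if rest.startswith("Remote: "):
        return level, None, rest           # old format
    logger_name, message = rest.split(" ", 1)
    return level, logger_name, message     # new format


def dispatch(line):
    """Log the message locally on the logger named by the server, if any."""
    level, logger_name, message = parse_remote_line(line)
    # logging.getLogger(None) returns the root logger.
    logging.getLogger(logger_name).log(level, message)
```

A client without this logic treats the logger name as part of the message
text, which is where the "borg.repository Remote: hi" output comes from.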
- Instead of very small (5 MB-ish) segment files, use larger ones
- Request asynchronous write-out or write-through (TODO) where it is supported,
  to achieve a continuously high throughput for writes
- Instead of depending on ordered writes (write data, commit tag, sync)
  for consistency, do a double-sync commit as more serious RDBMS also do,
  i.e. write data, sync, write commit tag, sync.
  Since commits are very expensive in Borg at the moment, this makes no
  difference performance-wise.
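
The double-sync sequence (write data, sync, write commit tag, sync) could
look roughly like this in Python. This is a sketch under stated assumptions,
not the SyncFile implementation: the function name double_sync_commit and the
b"COMMIT" tag are made up for the example.

```python
import os
import tempfile


def double_sync_commit(path, data, commit_tag):
    """Append data, sync, append the commit tag, sync again."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # hand Python's buffer to the OS
        os.fsync(f.fileno())   # first sync: data is on disk before the tag is written
        f.write(commit_tag)
        f.flush()
        os.fsync(f.fileno())   # second sync: the commit tag itself is on disk


with tempfile.TemporaryDirectory() as tmp:
    segment = os.path.join(tmp, "segment")
    double_sync_commit(segment, b"payload", b"COMMIT")
    with open(segment, "rb") as f:
        written = f.read()
```

Because the tag is only written after the first sync has returned, a commit
tag found on disk implies that the data before it was written out as well,
without relying on the order in which the OS flushes writes.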
New platform APIs: SyncFile, sync_dir
[x] Naive implementation (equivalent to what Borg did before)
[x] Linux implementation
[ ] Windows implementation
[-] OSX implementation (F_FULLFSYNC)