Add discussion and note on file attrs.

2024-12-24 16:55:36 +00:00 · 2022-03-09 14:06:40 +04:00 · 2022-03-09 14:06:40 +04:00 · dd1023df44
commit dd1023df44
parent a82673e708
1 changed files with 49 additions and 9 deletions
--- a/docs/faq.rst
+++ b/docs/faq.rst
@ -880,13 +880,30 @@ If you run into that, try this:
 What's the expected backup performance?
 ---------------------------------------

-A first backup will usually be somehow "slow" because there is a lot of data
-to process. Performance here depends on a lot of factors, so it is hard to
-give specific numbers.
+Compared to simply copying files (e.g. with ``rsync``), Borg has more work to do.
+This can make creation of the first archive slower, but saves time 
+and disk space on subsequent runs. Here what Borg does when you run ``borg create``:
+
+- Borg chunks the file (using the relatively expensive buzhash algorithm)
+- It then computes the "id" of the chunk (hmac-sha256 (often slow, except 
+  if your CPU has sha256 acceleration) or blake2b (fast, in software))
+- Then it checks whether this chunk is already in the repo (local hashtable lookup, 
+  fast). If so, the processing of the chunk is completed here. Otherwise it needs to 
+  process the chunk:
+- Compresses the chunks (the default lz4 is super fast)
+- Encrypts (AES, usually fast if your CPU has AES acceleration as usual
+  since about 10y)
+- Authenticates ("signs") using hmac-sha256 or blake2b (see above),
+- Transmits to repo (remote repo), usually involving an SSH connection 
+  (does its own encryption)
+- Stores the chunk into a key/value store (the key is the chunk id, the value 
+  is the data). While doing that, it computes a CRC32 of the data (repo low-level
+  checksum, used by borg check --repository) and also updates the repo index 
+  (another hashtable).

 Subsequent backups are usually very fast if most files are unchanged and only
 a few are new or modified. The high performance on unchanged files primarily depends
-only on a few factors (like fs recursion + metadata reading performance and the
+only on a few factors (like FS recursion + metadata reading performance and the
 files cache working as expected) and much less on other factors.

 E.g., for this setup:
@ -904,14 +921,37 @@ few FAQ entries below.

 .. _slow_backup:

-Why is backup slow for me?
+Why is my backup so slow?
 --------------------------

-So, if you feel your Borg backup is too slow somehow, you should find out why.
+If you feel your Borg backup is too slow somehow, here is what you can do:

-The usual way to approach this is to add ``--list --filter=AME --stats`` to your
-``borg create`` call to produce more log output, including a file list (with file status
-characters) and also some statistics at the end of the backup.
+- Make sure Borg has enough RAM (depends on how big your repo is / how many
+  files you have)
+- Use one of the blake2 modes for --encryption except if you positively know 
+  your CPU (and openssl) accelerates sha256 (then stay with hmac-sha256).
+- Don't use any expensive compression. The default is lz4 and super fast.
+  Uncompressed is often slower than lz4.
+- Just wait. You can also interrupt it and start it again as often as you like,
+  it will converge against a valid "completed" state (see ``--checkpoint-interval``,
+  maybe use the default, but in any case don't make it too short). It is starting
+  from the beginning each time, but it is still faster then as it does not store 
+  data into the repo which it already has there from last checkpoint.
+- If you don’t need additional file attributes, you can disable them with ``--noflags``,
+  ``--noacls``, ``--noxattrs``. This can lead to noticable performance improvements
+  when your backup consists of many small files.
+
+If you feel that Borg "freezes" on a file, it could be in the middle of processing a
+large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
+with the file. This can lead to displaying the name of a small file, while processing the
+next (larger) file. For very big files this can lead to the progress display show some
+previous short file for a long time while it processes the big one. With Borg 1.2 this 
+was changed to announcing the filename before starting to process it.
+
+To see what files have changed and take more time processing, you can also add
+``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output, 
+including a file list (with file status characters) and also some statistics at 
+the end of the backup.

 Then you do the backup and look at the log output: