Merge pull request #5475 from milkey-mouse/fp-5377

docs: how to approach borg speed issues (forward port)
2025-02-22 06:01:54 +00:00 · 2020-11-03 15:55:15 +01:00 · 2020-11-03 15:55:15 +01:00 · 47e96bc261
commit 47e96bc261
parent 02d917b460 60ee199a0f
2 changed files with 62 additions and 0 deletions
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -798,6 +798,67 @@ If you run into that, try this:
  the parent directory (or even everything)
 - mount the repo using FUSE and use some file manager

+.. _expected_performance:
+
+What's the expected backup performance?
+---------------------------------------
+
+The first backup will usually be comparatively slow because there is a lot of data
+to process. Its performance depends on many factors, so it is hard to
+give specific numbers.
+
+Subsequent backups are usually very fast if most files are unchanged and only
+a few are new or modified. Performance on unchanged files depends primarily on
+a few factors, such as filesystem traversal and metadata reading speed and
+whether the files cache works as expected, and much less on anything else.
+
+For example, with this setup:
+
+- server grade machine (4C/8T 2013 Xeon, 64GB RAM, 2x good 7200RPM disks)
+- local ZFS filesystem (mirrored) containing the backup source data
+- repository is remote (does not matter much for unchanged files)
+- backup job runs while machine is otherwise idle
+
+|project_name| was observed to process about
+**1 million unchanged files (and a few small changed ones) in 4 minutes**.
+
+If you see much lower throughput under similar circumstances, read the next
+few FAQ entries.
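To compare your own runs against that figure, it helps to express it as a rate:
1,000,000 files in 240 seconds is roughly 4,200 files per second. The Python
sketch below does that arithmetic. The function names are hypothetical, and you
would supply the file count and duration yourself (e.g. from the ``--stats``
output, assuming your version reports both).

```python
def files_per_second(n_files: int, seconds: float) -> float:
    """Average number of files processed per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return n_files / seconds


# Reference figure from the setup above: 1 million files in 4 minutes.
REFERENCE_RATE = files_per_second(1_000_000, 4 * 60)


def relative_speed(n_files: int, seconds: float) -> float:
    """Ratio of an observed rate to the reference rate (1.0 means the same speed)."""
    return files_per_second(n_files, seconds) / REFERENCE_RATE


print(round(REFERENCE_RATE))  # 4167
```

For instance, ``relative_speed(200_000, 240)`` is about 0.2, i.e. one fifth of
the reference rate. Keep in mind that the reference assumes mostly unchanged files.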
+
+.. _slow_backup:
+
+Why is backup slow for me?
+--------------------------
+
+If your |project_name| backup seems too slow, the first step is to find out why.
+
+A good way to approach this is to add ``--list --filter=AME --stats`` to your
+``borg create`` call. This produces more log output: a list of files (each with
+its file status character) and some statistics at the end of the backup.
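If you start your backups from a Python script, the extra options can be added to
the command line there. The sketch below only builds the argument list; the helper
name, the repository path, the archive name and the source paths are placeholders,
and it assumes the ``REPOSITORY::ARCHIVE`` syntax of borg 1.x.

```python
import shlex
from typing import List, Sequence


def borg_create_debug_argv(archive: str, paths: Sequence[str]) -> List[str]:
    """Hypothetical helper: a ``borg create`` command line with extra logging.

    ``archive`` is assumed to use the borg 1.x REPOSITORY::ARCHIVE form.
    """
    return [
        "borg", "create",
        "--list",        # log one line per file, prefixed by its status character
        "--filter=AME",  # only list Added, Modified and Error entries
        "--stats",       # print statistics at the end of the backup
        archive,
        *paths,
    ]


argv = borg_create_debug_argv("/path/to/repo::debug-run", ["/home/user"])
print(shlex.join(argv))
```

You could then run it with ``subprocess.run(argv)``. |project_name| normally writes
this log output to stderr, so redirect stderr to a file if you want to keep it.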
+
+Then run the backup and look at the log output:
+
+- stats: Are there really only a few changes, or more than you expected?
+  The stats show the overall volume of changed data that had to be
+  added to the repository. If that volume is large, it may be why the backup is slow.
+- ``A`` status ("added") in the file list:
+  If you see this often, you have many new files (files that |project_name| did not find
+  in the files cache). If you think that is wrong (e.g. the file was already present
+  in the previous backup), read the FAQ entries below.
+- ``M`` status ("modified") in the file list:
+  If you see this often, |project_name| thinks that many of your files might be modified
+  (|project_name| found them in the files cache, but the metadata read from the filesystem did
+  not match the metadata stored in the files cache).
+  In that case, |project_name| needs to read and process the files' full contents, which is
+  much slower than handling unmodified files (whose contents |project_name| does not read at all).
+  The metadata values used in this comparison are determined by the ``--files-cache`` option
+  and could be, e.g., size, ctime and inode number (see the ``borg create`` docs for more
+  details and potential issues).
+  You can run the ``stat`` command on such files to inspect their filesystem metadata and
+  check whether an unexpected change is triggering the ``M`` status.
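If the file list is long, counting the status characters gives a quicker overview
than reading it line by line. The sketch below assumes that each listed file
appears as a line of the form ``<status> <path>`` (the status character, a space,
then the path) and skips every other line, such as the ``--stats`` summary.
``count_statuses`` is a hypothetical helper, not part of |project_name|.

```python
from collections import Counter
from typing import Iterable


def count_statuses(lines: Iterable[str], statuses: str = "AME") -> Counter:
    """Count file status characters in ``borg create --list`` log lines.

    Assumes file lines look like "<status> <path>"; all other lines are ignored.
    """
    counts = Counter()
    for line in lines:
        # A status line has a known status character followed by a single space.
        if len(line) > 2 and line[0] in statuses and line[1] == " ":
            counts[line[0]] += 1
    return counts


sample = [
    "A /home/user/new-report.pdf",
    "M /home/user/notes.txt",
    "M /home/user/todo.txt",
    "Archive name: debug-run",  # stats line: 'A' is not followed by a space
]
print(count_statuses(sample))  # Counter({'M': 2, 'A': 1})
```

For a real run, you could save the log output to a file (``borg-create.log`` is
just an example name) and pass ``open("borg-create.log")`` as ``lines``.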
+
+See also the next few FAQ entries for more details.
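The ``stat`` check suggested for the ``M`` status can also be scripted. The Python
sketch below records a few ``os.stat`` fields of a file and reports which ones
differ between two snapshots, e.g. one taken right after a backup and one just
before the next. The field selection (``FIELDS``) and the helper names are
hypothetical; they loosely follow the size/ctime/inode example above and are not
how |project_name| implements its files cache.

```python
import os
import tempfile
from typing import Dict, List

# Hypothetical mapping of short names to os.stat_result attributes, loosely
# mirroring the "size, ctime and inode number" example (not borg's own code).
FIELDS = {
    "size": "st_size",
    "ctime": "st_ctime_ns",
    "mtime": "st_mtime_ns",
    "inode": "st_ino",
}


def snapshot(path: str) -> Dict[str, int]:
    """Record the selected metadata fields of ``path``."""
    st = os.stat(path)
    return {name: getattr(st, attr) for name, attr in FIELDS.items()}


def changed_fields(old: Dict[str, int], new: Dict[str, int]) -> List[str]:
    """Sorted names of the fields whose values differ between two snapshots."""
    return sorted(name for name in old if old[name] != new[name])


# Demo: appending data changes the size (and usually ctime/mtime), not the inode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    before = snapshot(path)
    with open(path, "a") as f:
        f.write("more data\n")
    after = snapshot(path)
    print(changed_fields(before, after))
```

As the ``borg create`` docs note, a metadata-only change such as chmod or chown
also updates the ctime, so a changed ``ctime`` with unchanged ``size`` and
``mtime`` often points to such a change rather than to new content.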
+
 .. _a_status_oddity:

 I am seeing 'A' (added) status for an unchanged file!?
--- a/src/borg/archiver.py
+++ b/src/borg/archiver.py
@@ -3078,6 +3078,7 @@ def define_borg_mount(parser):
          as it can not be set from userspace. But, a metadata-only change will already
          update the ctime, so there might be some unnecessary chunking/hashing even
          without content changes. Some filesystems do not support ctime (change time).
+          For example, running chown or chmod on a file changes its ctime.
        - mtime usually works and only updates if file contents were changed. But mtime
          can be arbitrarily set from userspace, e.g. to set mtime back to the same value
          it had before a content change happened. This can be used maliciously as well as