From 454317dc0271ef23baca370d316e0b9a9cc715c9 Mon Sep 17 00:00:00 2001 From: Milkey Mouse Date: Sun, 5 Nov 2017 20:42:44 -0800 Subject: [PATCH 1/3] Add instructions for ntfsclone (fixes #81) --- docs/faq.rst | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/docs/faq.rst b/docs/faq.rst index 62e5f8c67..527422a4d 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -48,6 +48,64 @@ and platform/software dependency. Combining Borg with the mechanisms provided by the platform (snapshots, hypervisor features) will be the best approach to start tackling them. +How can I decrease the size of disk image backups? +-------------------------------------------------- + +Full disk images are as large as the full disk when uncompressed and might not get much +smaller post-deduplication after heavy use. This is because virtually all file systems +don't actually delete the data on disk (that is the place of so-called "secure delete") +but instead delete the filesystem entries referring to the data. This leaves the random +data on disk until the FS eventually claims it for another file. Therefore, if a hard +drive nears capacity and files are deleted again, the change will barely decrease the +space it takes up when compressed and deduplicated. Depending on the filesystem of the +VM (or physical computer, if for some reason a normal filesystem backup can't be taken), +there are several ways to decrease the size of a full image: + +Using ntfsclone (NTFS, i.e. Windows VMs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +ntfsclone can only operate on filesystems with the journal cleared (i.e. turned-off +machines) which somewhat limits its utility in the case of VM snapshots. However, +when it can be used, its special image format is even more efficient than just zeroing +and deduplicating. For backup, save the disk header and the contents of each partition:: + + HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo) + PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') + dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo - + echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM - + done + # to backup non-NTFS partitions as well: + echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + borg create --read-special repo::hostname-part$PARTNUM $x + done + +Restoration is similar to the above process, but done in reverse:: + + borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe + PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') + borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1) + if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then + borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) - + else + borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1) + fi + done + +.. note:: + + When backing up a disk image (as opposed to a real block device), mount it as + a loopback image to use the above snippets:: + + DISK=$(losetup -Pf --show /path/to/disk/image) + # do backup as shown above + sync $DISK + losetup -d $DISK + Can I backup from multiple servers into a single repository? ------------------------------------------------------------ From f9ed3b3ed7998bbb5166ca5dfb2783edbeb84efa Mon Sep 17 00:00:00 2001 From: Milkey Mouse Date: Thu, 9 Nov 2017 20:36:39 -0800 Subject: [PATCH 2/3] Add instructions for zerofree --- docs/faq.rst | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/docs/faq.rst b/docs/faq.rst index 527422a4d..bf539ff0e 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -103,9 +103,30 @@ Restoration is similar to the above process, but done in reverse:: DISK=$(losetup -Pf --show /path/to/disk/image) # do backup as shown above - sync $DISK losetup -d $DISK +Using zerofree (ext2, ext3, ext4) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +zerofree works similarly to ntfsclone in that it zeros out unused chunks of the FS, except +that it works in place, zeroing the original partition. This makes the backup process a bit +simpler:: + + sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree + borg create --read-special repo::hostname-disk $DISK + +Because the partitions were zeroed in place, restoration is only one command:: + + borg extract --stdout repo::hostname-disk | dd of=$DISK + +.. note:: The "traditional" way to zero out space on a partition, especially one already + mounted, is to simply ``dd`` from ``/dev/zero`` to a temporary file and delete + it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page: + + - it is slow + - it makes the disk image (temporarily) grow to its maximal extent + - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail. + Can I backup from multiple servers into a single repository? ------------------------------------------------------------ From 46698bde6ed1af40ffd914511abda132fa9c73bc Mon Sep 17 00:00:00 2001 From: Milkey Mouse Date: Thu, 23 Nov 2017 11:36:05 -0800 Subject: [PATCH 3/3] Move image backup-related FAQ entries to a new page --- docs/deployment.rst | 1 + docs/deployment/image-backup.rst | 119 +++++++++++++++++++++++++++++++ docs/faq.rst | 119 ------------------------------- 3 files changed, 120 insertions(+), 119 deletions(-) create mode 100644 docs/deployment/image-backup.rst diff --git a/docs/deployment.rst b/docs/deployment.rst index 7b1caf925..c7c3a6ee1 100644 --- a/docs/deployment.rst +++ b/docs/deployment.rst @@ -12,3 +12,4 @@ This chapter details deployment strategies for the following scenarios. deployment/central-backup-server deployment/hosting-repositories deployment/automated-local + deployment/image-backup diff --git a/docs/deployment/image-backup.rst b/docs/deployment/image-backup.rst new file mode 100644 index 000000000..fd684c3f2 --- /dev/null +++ b/docs/deployment/image-backup.rst @@ -0,0 +1,119 @@ +.. include:: ../global.rst.inc +.. highlight:: none + +Backing up entire disk images +============================= + +Backing up disk images can still be efficient with Borg because its `deduplication`_ +technique makes sure only the modified parts of the file are stored. Borg also has +optional simple sparse file support for extract. + +Decreasing the size of image backups +------------------------------------ + +Disk images are as large as the full disk when uncompressed and might not get much +smaller post-deduplication after heavy use because virtually all file systems don't +actually delete file data on disk but instead delete the filesystem entries referencing +the data. Therefore, if a disk nears capacity and files are deleted again, the change +will barely decrease the space it takes up when compressed and deduplicated. Depending +on the filesystem, there are several ways to decrease the size of a disk image: + +Using ntfsclone (NTFS, i.e. Windows VMs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``ntfsclone`` can only operate on filesystems with the journal cleared (i.e. turned-off +machines), which somewhat limits its utility in the case of VM snapshots. However, when +it can be used, its special image format is even more efficient than just zeroing and +deduplicating. For backup, save the disk header and the contents of each partition:: + + HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo) + PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') + dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo - + echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM - + done + # to backup non-NTFS partitions as well: + echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + borg create --read-special repo::hostname-part$PARTNUM $x + done + +Restoration is a similar process:: + + borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe + PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') + borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do + PARTNUM=$(echo $x | grep -Eo "[0-9]+$") + PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1) + if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then + borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) - + else + borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1) + fi + done + +.. note:: + + When backing up a disk image (as opposed to a real block device), mount it as + a loopback image to use the above snippets:: + + DISK=$(losetup -Pf --show /path/to/disk/image) + # do backup as shown above + losetup -d $DISK + +Using zerofree (ext2, ext3, ext4) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``zerofree`` works similarly to ntfsclone in that it zeros out unused chunks of the FS, +except it works in place, zeroing the original partition. This makes the backup process +a bit simpler:: + + sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree + borg create --read-special repo::hostname-disk $DISK + +Because the partitions were zeroed in place, restoration is only one command:: + + borg extract --stdout repo::hostname-disk | dd of=$DISK + +.. note:: The "traditional" way to zero out space on a partition, especially one already + mounted, is to simply ``dd`` from ``/dev/zero`` to a temporary file and delete + it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page: + + - it is slow + - it makes the disk image (temporarily) grow to its maximal extent + - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail. + +Virtual machines +---------------- + +If you use non-snapshotting backup tools like Borg to back up virtual machines, then +the VMs should be turned off for the duration of the backup. Backing up live VMs can +(and will) result in corrupted or inconsistent backup contents: a VM image is just a +regular file to Borg with the same issues as regular files when it comes to concurrent +reading and writing from the same file. + +For backing up live VMs use filesystem snapshots on the VM host, which establishes +crash-consistency for the VM images. This means that with most file systems (that +are journaling) the FS will always be fine in the backup (but may need a journal +replay to become accessible). + +Usually this does not mean that file *contents* on the VM are consistent, since file +contents are normally not journaled. Notable exceptions are ext4 in data=journal mode, +ZFS and btrfs (unless nodatacow is used). + +Applications designed with crash-consistency in mind (most relational databases like +PostgreSQL, SQLite etc. but also for example Borg repositories) should always be able +to recover to a consistent state from a backup created with crash-consistent snapshots +(even on ext4 with data=writeback or XFS). Other applications may require a lot of work +to reach application-consistency; it's a broad and complex issue that cannot be explained +in entirety here. + +Hypervisor snapshots capturing most of the VM's state can also be used for backups and +can be a better alternative to pure file system based snapshots of the VM's disk, since +no state is lost. Depending on the application this can be the easiest and most reliable +way to create application-consistent backups. + +Borg doesn't intend to address these issues due to their huge complexity and +platform/software dependency. Combining Borg with the mechanisms provided by the platform +(snapshots, hypervisor features) will be the best approach to start tackling them. \ No newline at end of file diff --git a/docs/faq.rst b/docs/faq.rst index bf539ff0e..495c92bf4 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -8,125 +8,6 @@ Frequently asked questions Usage & Limitations ################### -Can I backup VM disk images? ----------------------------- - -Yes, the `deduplication`_ technique used by -|project_name| makes sure only the modified parts of the file are stored. -Also, we have optional simple sparse file support for extract. - -If you use non-snapshotting backup tools like Borg to back up virtual machines, -then the VMs should be turned off for the duration of the backup. Backing up live VMs can (and will) -result in corrupted or inconsistent backup contents: a VM image is just a regular file to -Borg with the same issues as regular files when it comes to concurrent reading and writing from -the same file. - -For backing up live VMs use file system snapshots on the VM host, which establishes -crash-consistency for the VM images. This means that with most file systems -(that are journaling) the FS will always be fine in the backup (but may need a -journal replay to become accessible). - -Usually this does not mean that file *contents* on the VM are consistent, since file -contents are normally not journaled. Notable exceptions are ext4 in data=journal mode, -ZFS and btrfs (unless nodatacow is used). - -Applications designed with crash-consistency in mind (most relational databases -like PostgreSQL, SQLite etc. but also for example Borg repositories) should always -be able to recover to a consistent state from a backup created with -crash-consistent snapshots (even on ext4 with data=writeback or XFS). - -Hypervisor snapshots capturing most of the VM's state can also be used for backups -and can be a better alternative to pure file system based snapshots of the VM's disk, -since no state is lost. Depending on the application this can be the easiest and most -reliable way to create application-consistent backups. - -Other applications may require a lot of work to reach application-consistency: -It's a broad and complex issue that cannot be explained in entirety here. - -Borg doesn't intend to address these issues due to their huge complexity -and platform/software dependency. Combining Borg with the mechanisms provided -by the platform (snapshots, hypervisor features) will be the best approach -to start tackling them. - -How can I decrease the size of disk image backups? --------------------------------------------------- - -Full disk images are as large as the full disk when uncompressed and might not get much -smaller post-deduplication after heavy use. This is because virtually all file systems -don't actually delete the data on disk (that is the place of so-called "secure delete") -but instead delete the filesystem entries referring to the data. This leaves the random -data on disk until the FS eventually claims it for another file. Therefore, if a hard -drive nears capacity and files are deleted again, the change will barely decrease the -space it takes up when compressed and deduplicated. Depending on the filesystem of the -VM (or physical computer, if for some reason a normal filesystem backup can't be taken), -there are several ways to decrease the size of a full image: - -Using ntfsclone (NTFS, i.e. Windows VMs) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -ntfsclone can only operate on filesystems with the journal cleared (i.e. turned-off -machines) which somewhat limits its utility in the case of VM snapshots. However, -when it can be used, its special image format is even more efficient than just zeroing -and deduplicating. For backup, save the disk header and the contents of each partition:: - - HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo) - PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') - dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo - - echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do - PARTNUM=$(echo $x | grep -Eo "[0-9]+$") - ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM - - done - # to backup non-NTFS partitions as well: - echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do - PARTNUM=$(echo $x | grep -Eo "[0-9]+$") - borg create --read-special repo::hostname-part$PARTNUM $x - done - -Restoration is similar to the above process, but done in reverse:: - - borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe - PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d') - borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do - PARTNUM=$(echo $x | grep -Eo "[0-9]+$") - PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1) - if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then - borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) - - else - borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1) - fi - done - -.. note:: - - When backing up a disk image (as opposed to a real block device), mount it as - a loopback image to use the above snippets:: - - DISK=$(losetup -Pf --show /path/to/disk/image) - # do backup as shown above - losetup -d $DISK - -Using zerofree (ext2, ext3, ext4) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -zerofree works similarly to ntfsclone in that it zeros out unused chunks of the FS, except -that it works in place, zeroing the original partition. This makes the backup process a bit -simpler:: - - sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree - borg create --read-special repo::hostname-disk $DISK - -Because the partitions were zeroed in place, restoration is only one command:: - - borg extract --stdout repo::hostname-disk | dd of=$DISK - -.. note:: The "traditional" way to zero out space on a partition, especially one already - mounted, is to simply ``dd`` from ``/dev/zero`` to a temporary file and delete - it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page: - - - it is slow - - it makes the disk image (temporarily) grow to its maximal extent - - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail. - Can I backup from multiple servers into a single repository? ------------------------------------------------------------