Simultaneous Hot Spare Resilver
Imagine you have a 12-disk raidz3 vdev and not one, but two of the disks are producing significant SMART errors ... monitoring suggests that each of these drives will fail soon.
Normally, you might fail out the offending drive, physically replace it, and commence resilvering onto the new drive. Since this is raidz3 you're fairly well protected against additional failures during the resilver - you can lose another two drives after all and not suffer data loss. Your workflow would look like this:
zpool offline POOL da13
zpool replace POOL da13
This is different, however, because now there are two drives in pre-failure status ... you don't have the redundancy you expected to have - so it would be smart to resilver onto a hot spare without offlining an existing drive:
zpool replace POOL da13 da99
(Note that we did not offline the drive before initiating the 'zpool replace', so the failing drive stays in the vdev and keeps contributing redundancy while the resilver runs ...)
The problem here is that a resilver is an intensive operation for the entire vdev - you might beat up other marginal drives during the resilver, causing them to also fail.
Further, resilvering can have a significant performance impact.
For these reasons it would be very nice not to have to do two consecutive resilvers - one for each failing drive. Luckily, ZFS allows you to amortize a single resilver operation over multiple drives.
The workflow is almost identical - you begin by doing a hot-spare resilver of the first drive:
zpool replace POOL da13 da99
... but then, after that command completes and you verify that the resilver has properly begun (by running 'zpool status') you simply run a second 'zpool replace' command with the other pair of failing/spare drives:
zpool replace POOL da15 da100
Your 'zpool status' output will then show two drives resilvering with two different hot-spares and your time to completion will not increase much as compared to when you were only resilvering one drive.
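The two-step sequence above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a tested procedure: the function names resilver_in_progress and replace_two are hypothetical, POOL, da13, da99, da15 and da100 are the placeholder names used above, and it assumes that 'zpool status' reports an active resilver with the phrase "resilver in progress" on its scan line.

```sh
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Succeeds if the 'zpool status' text on stdin reports an active resilver.
# Assumes the scan line contains the phrase "resilver in progress".
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Start the first replace, confirm the resilver began, then start the second.
# Arguments: pool, failing drive 1, spare 1, failing drive 2, spare 2
replace_two() {
    pool=$1
    zpool replace "$pool" "$2" "$3" || return 1
    if zpool status "$pool" | resilver_in_progress; then
        zpool replace "$pool" "$4" "$5"
    else
        echo "resilver has not started on $pool; not issuing second replace" >&2
        return 1
    fi
}

# Example (not run here):
#   replace_two POOL da13 da99 da15 da100
```

The check between the two commands mirrors the manual step of reading 'zpool status' before issuing the second replace; if the first resilver is not reported, the helper stops rather than guessing.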
Allan sez ....
borg Binary Configuration for Performance
rsync.net has the 'borg' backup utility built into our platform. You can point your 'borg' client to your rsync.net account and the 'borg' binary executable that we maintain on this end will answer and perform backup functions for you.
The borg website is here:
https://borgbackup.readthedocs.io/en/stable/
... and a good description of how it works and why you should use it is here:
https://www.stavros.io/posts/holy-grail-backups/
... and we have had tremendous success allowing end users to point their borg backup jobs offsite to our platform.
We have learned about two different deployment and configuration options that are noteworthy and, in our case, made a tremendous difference in performance:
First, borg runs an initial set of unit tests every time it is run:
https://github.com/borgbackup/borg/blob/1.1-maint/src/borg/selftest.py
... and thanks to Thomas Waldmann, the project lead for borg, as of v1.1.17 there is now a switch to disable those unit tests:
# first test borg without this and if it works ok,
# use this to optimize startup performance:
export BORG_SELFTEST=disabled
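Because the switch only exists from v1.1.17 onward, a wrapper might set it conditionally. The following is a minimal sketch, assuming that 'sort -V' (version sort) is available and that 'borg --version' prints "borg" followed by the version number; the function names version_ge and maybe_disable_selftest are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Succeeds if dotted version $1 is greater than or equal to $2.
# Relies on 'sort -V' (version sort), found in GNU and FreeBSD sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Export BORG_SELFTEST=disabled only when the given borg version supports it.
# Argument: a version string such as 1.1.17
maybe_disable_selftest() {
    if version_ge "$1" 1.1.17; then
        export BORG_SELFTEST=disabled
    fi
}

# Example (not run here); assumes 'borg --version' prints "borg <version>":
#   maybe_disable_selftest "$(borg --version | awk '{print $2}')"
```

On older versions the variable is simply left unset, so borg continues to run its startup tests as before.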
Second - and more impactful for performance - we learned that the binary distribution of 'borg' comes in two flavors: the "directory" flavor, which is distributed as a .tgz file, and the "single file" flavor, which is built with pyinstaller and must unpack all of its bundled files to a temp location every time it is executed.
If you are running many hundreds of 'borg' processes answering connections from users, the .tgz distribution will be significantly more performant as you will have unpacked all of the bundled files just once, during installation, and it never needs to be done again.
In our case, we had both inefficiencies together - we were running thousands of 'borg' unit tests every hour and every call of the binary resulted in a temporary unpacking of the bundled files (which generated I/O, etc.)
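A one-time install of the "directory" flavor might look like the sketch below. It is an illustration under stated assumptions rather than our actual deployment: the function name install_borg_dir is hypothetical, as are the example paths, and the default executable location inside the archive (borg-dir/borg.exe) is an assumption about the tarball layout that should be checked against the release you download. The install directory should be an absolute path, since it is written into the wrapper script.

```sh
#!/bin/sh
# Hypothetical helper; names, paths and archive layout are assumptions.

# Unpack a directory-flavor borg tarball once and write a small wrapper
# script that execs the unpacked binary.
# Arguments: tarball, install directory (absolute), wrapper path,
#            executable path inside the archive
#            (assumed default: borg-dir/borg.exe)
install_borg_dir() {
    tarball=$1
    dest=$2
    wrapper=$3
    exe=${4:-borg-dir/borg.exe}
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    if [ ! -x "$dest/$exe" ]; then
        echo "expected executable $dest/$exe not found" >&2
        return 1
    fi
    # The wrapper only execs the unpacked binary; nothing is unpacked per call.
    printf '#!/bin/sh\nexec "%s" "$@"\n' "$dest/$exe" > "$wrapper" || return 1
    chmod +x "$wrapper"
}

# Example (not run here):
#   install_borg_dir borg-linux64.tgz /usr/local/lib/borg /usr/local/bin/borg
```

The point of the design is that the unpacking cost is paid once at install time, so each incoming connection starts the already-unpacked binary directly.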
