Backup storage for thousands of virtual machines using free tools

Hi, I recently came across an interesting task: setting up a storage server to back up a large number of block devices.

Every week we back up all virtual machines in our cloud, so we need to handle thousands of backups and do it as quickly and efficiently as possible.

Unfortunately, the standard RAID5 and RAID6 levels are not suitable, because the recovery process on disks as large as ours would be painfully long and would most likely never finish successfully.

Let’s consider what the alternatives are:

Erasure Coding: an analogue of RAID5 and RAID6, but with a configurable parity level. Fault tolerance is also provided for each object separately rather than for whole block devices. The easiest way to try Erasure Coding is to deploy minio.

DRAID: a currently unreleased feature of ZFS. Unlike RAIDZ, DRAID distributes parity groups and spare capacity across all disks and uses every disk in the array during recovery. This lets it cope with disk failures better and recover faster than standard RAID levels.
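To get a feel for why recovery time matters so much, here is a rough sketch that computes an idealized lower bound on the time needed to rewrite the contents of one 6TB disk. The 150 MB/s sustained write speed is an assumption picked only for illustration, not a measured value, and the function name is hypothetical. With a dedicated spare every rebuilt byte lands on a single disk; with distributed spare space the writes can be spread over the surviving disks (here 44 out of a 45-disk array). Real rebuilds are slower than either bound, since the data also has to be read and the array keeps serving normal load.

```python
DISK_BYTES = 6 * 10**12          # one 6TB disk
ASSUMED_WRITE_BPS = 150 * 10**6  # assumption: 150 MB/s sustained write per disk


def rebuild_lower_bound_hours(disk_bytes, write_bps, target_disks=1):
    """Idealized lower bound: bytes to rewrite / aggregate write bandwidth."""
    if target_disks < 1:
        raise ValueError("target_disks must be at least 1")
    seconds = disk_bytes / (write_bps * target_disks)
    return seconds / 3600


# Dedicated hot spare: all rebuilt data is written to a single disk.
dedicated_hours = rebuild_lower_bound_hours(DISK_BYTES, ASSUMED_WRITE_BPS)

# Distributed spare: writes are spread over the 44 surviving disks.
distributed_hours = rebuild_lower_bound_hours(
    DISK_BYTES, ASSUMED_WRITE_BPS, target_disks=44
)

print(f"dedicated spare:   at least {dedicated_hours:.1f} h")
print(f"distributed spare: at least {distributed_hours:.2f} h")
```

These are bounds, not predictions: they only show that a single target disk caps the rebuild speed, while distributed spare space removes that particular cap.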


For this setup I got a Fujitsu Primergy RX300 S7 server with an Intel Xeon CPU E5-2650L 0 @ 1.80GHz processor, nine Samsung DDR3-1333 8GB PC3L-10600R ECC Registered RAM modules (M393B1K70DH0-YH9), a Supermicro SuperChassis 847E26-RJBOD1 disk shelf connected via a Dual LSI SAS2X36 Expander, and 45 Seagate ST6000NM0115-1YZ110 disks of 6TB each.

Before making any decisions, we first need to test everything properly.

To do this, I prepared and tested various configurations. I used minio as an S3 gateway and started it in different modes with different numbers of targets.

Basically, I was choosing between minio with erasure coding and software RAID configurations with the same number of disks and the same parity level: RAID6, RAIDZ2 and DRAID2.

For reference: when you run minio with just one target, it works simply as an S3 gateway, exposing your local file system as S3 storage. If you run minio with several targets, Erasure Coding mode is turned on automatically; in this case it spreads data across your targets and provides fault tolerance for your objects.
By default, minio divides targets into groups of 16 disks, where each group has 2 parity disks. This means that two disks in a group can fail at the same time without data loss.
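The arithmetic behind such a group is simple enough to sketch. The snippet below takes the figures from the text (16 disks, 2 parity, 6TB per disk) and works out the usable capacity and the number of tolerated failures. The function name and the returned field names are hypothetical; nothing here calls minio itself.

```python
def erasure_set_summary(disks, parity, disk_tb):
    """Usable capacity and fault tolerance for one erasure-coded group."""
    if not 0 < parity < disks:
        raise ValueError("parity must be between 1 and disks - 1")
    data = disks - parity  # disks' worth of space left for actual data
    return {
        "data_shards": data,
        "parity_shards": parity,
        "raw_tb": disks * disk_tb,
        "usable_tb": data * disk_tb,
        "efficiency": data / disks,
        "tolerated_failures": parity,  # one lost disk per parity shard
    }


summary = erasure_set_summary(disks=16, parity=2, disk_tb=6)
print(summary)
```

The same numbers apply to RAID6 and RAIDZ2 on 16 disks, which is why these layouts can be compared at equal parity level.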

For the benchmark I used 16 disks of 6TB each and wrote small objects of 1MB each. This describes our future load quite accurately, since all modern backup tools divide data into blocks of several megabytes and write them this way.

I used the s3bench utility, which ran on a remote server and sent tens of thousands of such objects to minio in a hundred streams. Afterwards it read them back in the same way.
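For readers who want to picture the load pattern, here is a minimal sketch of the same write-then-read scheme. It is not s3bench and does not talk to S3: `InMemoryStore` is a hypothetical stand-in for the minio endpoint, and the object count and number of streams are scaled far down so that it runs anywhere in a moment. The throughput it prints says nothing about real storage.

```python
import time
from concurrent.futures import ThreadPoolExecutor

OBJECT_SIZE = 1024 * 1024  # 1 MiB objects, like the benchmark objects


class InMemoryStore:
    """Hypothetical stand-in for an S3 endpoint such as minio."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


def run_phase(operation, keys, streams):
    """Apply `operation` to every key using `streams` parallel threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = list(pool.map(operation, keys))
    return results, time.perf_counter() - start


def benchmark(store, num_objects, streams, object_size=OBJECT_SIZE):
    payload = bytes(object_size)  # zero-filled body shared by all objects
    keys = [f"bench/object-{i:06d}" for i in range(num_objects)]
    # Phase 1: write all objects. Phase 2: read them back the same way.
    _, write_s = run_phase(lambda key: store.put(key, payload), keys, streams)
    bodies, read_s = run_phase(store.get, keys, streams)
    total_mib = num_objects * object_size / (1024 * 1024)
    return {
        "objects_read": len(bodies),
        "bytes_read": sum(len(body) for body in bodies),
        "write_mib_s": total_mib / write_s,
        "read_mib_s": total_mib / read_s,
    }


result = benchmark(InMemoryStore(), num_objects=64, streams=8)
print(result)
```

In the real test the store is the minio endpoint reached over the network, the object count is in the tens of thousands and the number of streams is a hundred.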

The benchmark results are shown in the following table:

As we can see, minio in erasure coding mode performs much worse on writes than minio running on top of software RAID6, RAIDZ2 or DRAID2 in the same configuration.

Additionally, I was asked to test minio on ext4 vs XFS. Surprisingly, XFS was significantly slower than ext4 for my type of load.

In the first batch of tests, mdadm outperformed ZFS, but later George Melikov suggested a few options that significantly improved ZFS performance:

xattr=sa atime=off recordsize=1M

After applying them, the ZFS test results got a lot better.
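One plausible reason why recordsize=1M in particular suits this load can be shown with a bit of arithmetic. ZFS uses a recordsize of 128K by default, so a 1MB object stored as a single file is split into several records, while with recordsize=1M it fits into one. The sketch below only counts records; the helper name is hypothetical, and treating each object as exactly one 1 MiB file is a simplifying assumption.

```python
import math

KIB = 1024
DEFAULT_RECORDSIZE = 128 * KIB  # ZFS default recordsize
TUNED_RECORDSIZE = 1024 * KIB   # recordsize=1M from the options above
OBJECT_SIZE = 1024 * KIB        # assumption: one object is one 1 MiB file


def records_per_object(object_size, recordsize):
    """Number of ZFS records needed to hold one object of the given size."""
    return math.ceil(object_size / recordsize)


default_records = records_per_object(OBJECT_SIZE, DEFAULT_RECORDSIZE)
tuned_records = records_per_object(OBJECT_SIZE, TUNED_RECORDSIZE)
print(f"recordsize=128K: {default_records} records per object")
print(f"recordsize=1M:   {tuned_records} record per object")
```

Fewer records per object means fewer block pointers and checksums to maintain per write, although I did not measure how much each of the three options contributed on its own.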

In the last two tests I also tried moving the metadata (special) and the ZIL (log) to a mirror of SSDs. Moving the metadata didn’t give much gain in write speed, and when the ZIL was moved, my SSDSC2KI128G8 drives sat at 100% utilization and slowed everything down, so I considered this test a failure. I don’t rule out that faster SSDs could have greatly improved my results, but unfortunately I didn’t have any.

In the end, I decided to settle on DRAID: despite its beta status, it is the fastest and most efficient storage solution in our case.

I created a simple DRAID2 configuration with three groups and two distributed spares:

# zpool status data
  pool: data
 state: ONLINE
  scan: none requested

    NAME                 STATE     READ WRITE CKSUM
    data                 ONLINE       0     0     0
      draid2:3g:2s-0     ONLINE       0     0     0
        sdy              ONLINE       0     0     0
        sdam             ONLINE       0     0     0
        sdf              ONLINE       0     0     0
        sdau             ONLINE       0     0     0
        sdab             ONLINE       0     0     0
        sdo              ONLINE       0     0     0
        sdw              ONLINE       0     0     0
        sdak             ONLINE       0     0     0
        sdd              ONLINE       0     0     0
        sdas             ONLINE       0     0     0
        sdm              ONLINE       0     0     0
        sdu              ONLINE       0     0     0
        sdai             ONLINE       0     0     0
        sdaq             ONLINE       0     0     0
        sdk              ONLINE       0     0     0
        sds              ONLINE       0     0     0
        sdag             ONLINE       0     0     0
        sdi              ONLINE       0     0     0
        sdq              ONLINE       0     0     0
        sdae             ONLINE       0     0     0
        sdz              ONLINE       0     0     0
        sdan             ONLINE       0     0     0
        sdg              ONLINE       0     0     0
        sdac             ONLINE       0     0     0
        sdx              ONLINE       0     0     0
        sdal             ONLINE       0     0     0
        sde              ONLINE       0     0     0
        sdat             ONLINE       0     0     0
        sdaa             ONLINE       0     0     0
        sdn              ONLINE       0     0     0
        sdv              ONLINE       0     0     0
        sdaj             ONLINE       0     0     0
        sdc              ONLINE       0     0     0
        sdar             ONLINE       0     0     0
        sdl              ONLINE       0     0     0
        sdt              ONLINE       0     0     0
        sdah             ONLINE       0     0     0
        sdap             ONLINE       0     0     0
        sdj              ONLINE       0     0     0
        sdr              ONLINE       0     0     0
        sdaf             ONLINE       0     0     0
        sdao             ONLINE       0     0     0
        sdh              ONLINE       0     0     0
        sdp              ONLINE       0     0     0
        sdad             ONLINE       0     0     0
      s0-draid2:3g:2s-0  AVAIL   
      s1-draid2:3g:2s-0  AVAIL   

errors: No known data errors