Nicholas Helke
nhelke@gmail.com

Using different interfaces to improve redundancy in a RAID setup

19 October 2009

For close to two months I have been managing a small, six-drive ZFS pool. The pool is made up of three mirrors, equivalent to a RAID 10 setup. I built it with the drives I had on hand, so they were spread over the SATA, ATA and USB interfaces as follows:

SATA       USB
 120      320  640 – 640
    \     /
    120 320
      ATA

This morning my ZFS pool suffered its first disk failure (cf. Twitter): one of my 640 GB USB drives failed. My other 640 was exactly the same model, so I was concerned that it too might fail before I got a chance to replace the first one and resilver. I also started thinking about what might happen in a catastrophic USB interface failure, such as an inadvertent unplugging of both 640 drives. The above figure shows quite clearly that the 640s depend solely on the USB interface.
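The same observation can be checked mechanically. The following is a minimal Python sketch, not anything ZFS itself provides: it models the layout from the figure as three mirrors of (size, interface) pairs and reports any interface whose loss would leave a mirror with no working drive. The names `OLD_LAYOUT` and `fatal_interfaces` are made up for this illustration.

```python
# Each mirror is a pair of (size_gb, interface) drives, transcribed from the figure.
OLD_LAYOUT = [
    [(120, "SATA"), (120, "ATA")],
    [(320, "USB"), (320, "ATA")],
    [(640, "USB"), (640, "USB")],
]


def fatal_interfaces(layout):
    """Return the interfaces whose loss leaves some mirror with no working drive."""
    interfaces = {iface for mirror in layout for _, iface in mirror}
    return sorted(
        iface
        for iface in interfaces
        # A mirror is lost when every one of its drives sits on the failed interface.
        if any(all(d_iface == iface for _, d_iface in mirror) for mirror in layout)
    )


print(fatal_interfaces(OLD_LAYOUT))  # ['USB']
```

Only USB shows up, because the 640 mirror is the only one with both halves on the same interface.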

I decided to fix this single point of failure by connecting the replacement 640 to the SATA interface, as follows:

  640 – 640
SATA      USB
120        320
  \        /
  120    320
     ATA

The new setup is particularly robust. Any one interface can fail without service interruption and up to three drives can fail simultaneously as long as they all belong to different mirrored sets.
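Both claims can be verified by brute force. This is a rough Python sketch under a simplified model in which the pool stays up as long as every mirror keeps at least one working drive. The drive labels (`120-a`, `640-b` and so on) and the function names are hypothetical; only the sizes and interfaces come from the figure.

```python
from itertools import combinations

# Hypothetical drive labels; sizes and interfaces follow the second figure.
NEW_LAYOUT = {
    "mirror-120": [("120-a", "SATA"), ("120-b", "ATA")],
    "mirror-320": [("320-a", "USB"), ("320-b", "ATA")],
    "mirror-640": [("640-a", "SATA"), ("640-b", "USB")],
}


def pool_survives(layout, failed):
    """The pool stays up while every mirror keeps at least one working drive."""
    return all(
        any(name not in failed for name, _ in drives)
        for drives in layout.values()
    )


def drives_on(layout, iface):
    """Names of all drives attached to the given interface."""
    return {name for drives in layout.values() for name, i in drives if i == iface}


def survivable(layout, k):
    """Count the k-drive failure combinations that the pool survives."""
    names = [name for drives in layout.values() for name, _ in drives]
    return sum(pool_survives(layout, set(combo)) for combo in combinations(names, k))


interface_ok = {
    iface: pool_survives(NEW_LAYOUT, drives_on(NEW_LAYOUT, iface))
    for iface in ("SATA", "ATA", "USB")
}
print(interface_ok)               # every interface maps to True
print(survivable(NEW_LAYOUT, 3))  # 8
print(survivable(NEW_LAYOUT, 4))  # 0
```

Under this model only 8 of the 20 possible three-drive failures are survivable, exactly those that take one drive from each mirror, and no four-drive failure is, which is why three is the upper limit.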

As it happens, all's well that ends well. The failure turned out to be in the enclosure rather than in the drive itself, so I was able to rescue the SATA drive from inside the case and, after a quick restart to install it internally, get the pool back to the ONLINE state. It is worth noting that the pool never fell below the DEGRADED state.

Today's failure is in fact the second premature case failure I have experienced with the WD AAKS enclosure, so I would recommend avoiding that model. More generally, when an external drive fails it is always worth checking whether the internal drive is still viable, or whether the case is still viable with a new internal disk.