Sunday, January 12, 2014

Replacing a bad hard drive in a ZFS pool - Linux/zfs-fuse

Thought I'd re-post this here for convenience. I've got a home-brew NAS server running Fedora, zfs-fuse, and CIFS. The situation:
  • "Disk Utility" reports that drive /dev/sde has many bad sectors.
  • zpool status shows a degraded state for the main pool. The drive is listed in the pool by its ID.
    # zpool status -v
      pool: nasdata
     state: DEGRADED
    status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-4J
     scrub: resilver completed after 0h28m with 0 errors on Sat Feb 23 23:54:32 2013
    config:
    
        NAME                                      STATE     READ WRITE CKSUM
        nasdata                                   DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PDZ1  ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PZJQ  ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_6XW1MKZZ  UNAVAIL      0   193     6  experienced I/O failures
            disk/by-id/ata-ST31500541AS_6XW1KRR9  ONLINE       0     0     0
    
    errors: No known data errors
    
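About that first bullet: before trusting "Disk Utility", I like to double-check with smartctl from the smartmontools package. A quick sanity check, assuming the failing drive is still visible as /dev/sde:

    # smartctl -H /dev/sde
    # smartctl -a /dev/sde | grep -i reallocated

The first command prints the drive's overall SMART health assessment; the second pulls out the reallocated-sector attributes, which climb steadily as a disk goes bad.
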
Well, that pretty much sums it up: no data errors in the array itself, but the disk is unavailable. Here's the process for replacing it:

  1. Tell zfs to take the disk offline:
    # zpool offline nasdata /dev/disk/by-id/ata-ST31500541AS_6XW1MKZZ
    

    Note that I'm using /dev/disk/by-id here, since that's how the drive is listed in the pool.
  2. Shut the machine down.
  3. Add the new disk.  I also removed the failing disk because it was causing problems during POST.
    NOTE: REMEMBER TO LABEL YOUR DISKS! This really helps when the time comes to replace them! 
  4. Start the machine up.
  5. Tell zfs about the new disk:
    # zpool replace nasdata /dev/disk/by-id/ata-ST31500541AS_6XW1MKZZ /dev/disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450
    

    Note: I had to use the disk IDs because the pool set itself up that way in the first place (I had switched the drives to a new SATA card). If you need to find the ID of a brand-new disk, see the tip after step 7.
  6. ZFS immediately begins replacing the disk:
    # zpool status 
      pool: nasdata
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 0.00% done, 2695h59m to go
    config:
        NAME                                                 STATE     READ WRITE CKSUM
        nasdata                                              DEGRADED     0     0     0
          raidz1-0                                           DEGRADED     0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PDZ1             ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PZJQ             ONLINE       0     0     0
            replacing-2                                      DEGRADED     0     0     0
              disk/by-id/ata-ST31500541AS_6XW1MKZZ           OFFLINE      0   193     6
              disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450  ONLINE       0     0     0  2.34M resilvered
            disk/by-id/ata-ST31500541AS_6XW1KRR9             ONLINE       0     0     0
    errors: No known data errors
    

    Now hopefully this won't take 2695 hours to complete! :) That early estimate settles down as the resilver gets going; later on the status drops to about 11h. Okay, that's doable. (There's a tip at the end for watching progress without babysitting the console.)
  7. Several hours later, the new drive is incorporated into the pool:
    # zpool status -v
      pool: nasdata
     state: ONLINE
     scrub: resilver completed after 9h2m with 0 errors on Sun Feb 24 10:30:21 2013
    config:
            NAME                                               STATE     READ WRITE CKSUM
            nasdata                                            ONLINE       0     0     0
              raidz1-0                                         ONLINE       0     0     0
                disk/by-id/ata-ST31500541AS_5XW0PDZ1           ONLINE       0     0     0
                disk/by-id/ata-ST31500541AS_5XW0PZJQ           ONLINE       0     0     0
                disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450  ONLINE       0     0     0  916G resilvered
                disk/by-id/ata-ST31500541AS_6XW1KRR9           ONLINE       0     0     0
    errors: No known data errors
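
As promised in step 5, here's an easy way to find the by-id name of a brand-new disk: list the udev symlinks and match the serial number against the sticker on the drive (labeling pays off again). The grep -v part just hides the per-partition entries:

    # ls -l /dev/disk/by-id/ | grep -v part

Each symlink points at the corresponding /dev/sdX device, so you can tell exactly which physical drive is which.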
So that's it. While the pool was resilvering, the ZFS filesystem stayed completely available. How nice! My SMB/CIFS shares kept working just fine.
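
One last tip: rather than re-running zpool status by hand during a resilver, watch will refresh it for you (here every 60 seconds). And once the resilver finishes, kicking off a scrub is a nice bit of extra assurance that everything on the new disk checks out:

    # watch -n 60 zpool status nasdata
    # zpool scrub nasdata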