Thought I'd re-post this here, for convenience. I've got a home-brew NAS server that is running Fedora, zfs-fuse, and CIFS. The situation:
Well that pretty much sums it up. No data errors in the array itself, but the disk is unavailable. Here's the process for replacing it:
- Tell zfs to take the disk offline:
# zpool offline nasdata /dev/disk/by-id/ata-ST31500541AS_6XW1MKZZ
Note that I'm using /dev/disk/by-id here. This is because that is how it is listed in the pool.
- Shut the machine down.
- Add the new disk. I also removed the failing disk because it was causing problems during POST.
NOTE: REMEMBER TO LABEL YOUR DISKS! This really helps when the time comes to replace them!
- Start the machine up.
- Tell zfs about the new disk:
# zpool replace nasdata /dev/disk/by-id/ata-ST31500541AS_6XW1MKZZ /dev/disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450
Note: I had to use the disk IDs because the pool set itself up that way in the first place (I had switched the drives to a new SATA card).
- Immediately ZFS begins replacing the disk:
# zpool status
  pool: nasdata
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 2695h59m to go
config:

        NAME                                                 STATE     READ WRITE CKSUM
        nasdata                                              DEGRADED     0     0     0
          raidz1-0                                           DEGRADED     0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PDZ1             ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PZJQ             ONLINE       0     0     0
            replacing-2                                      DEGRADED     0     0     0
              disk/by-id/ata-ST31500541AS_6XW1MKZZ           OFFLINE      0   193     6
              disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450  ONLINE       0     0     0  2.34M resilvered
            disk/by-id/ata-ST31500541AS_6XW1KRR9             ONLINE       0     0     0

errors: No known data errors
Now hopefully this won't take 2695 hours to complete! :) Later on, the estimated time remaining drops to 11h. Okay, that's doable.
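Rather than re-running zpool status by hand to watch that estimate, the check could be polled. The following is only a sketch and not part of the original procedure: the function names resilver_line and watch_resilver are hypothetical, the 10-minute default interval is arbitrary, and it assumes the progress text appears on the "scrub:" line as in the output above (newer ZFS releases label that line "scan:", so the pattern accepts either).

```shell
# Print the progress text from `zpool status` output read on stdin.
# Assumption: progress is reported on a line starting with "scrub:" or "scan:".
resilver_line() {
    sed -n -E 's/^[[:space:]]*(scrub|scan):[[:space:]]*//p'
}

# Hypothetical helper: poll a pool until no resilver is reported in progress.
# $1 = pool name, $2 = seconds between checks (default 600).
watch_resilver() {
    pool="$1"
    interval="${2:-600}"
    while zpool status "$pool" | resilver_line | grep -q 'resilver in progress'; do
        zpool status "$pool" | resilver_line
        sleep "$interval"
    done
    # Print the final line, e.g. the "resilver completed" summary.
    zpool status "$pool" | resilver_line
}
```

Usage would be along the lines of: watch_resilver nasdata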
- Several hours later, the new drive is incorporated into the pool:
# zpool status -v
  pool: nasdata
 state: ONLINE
 scrub: resilver completed after 9h2m with 0 errors on Sun Feb 24 10:30:21 2013
config:

        NAME                                                 STATE     READ WRITE CKSUM
        nasdata                                              ONLINE       0     0     0
          raidz1-0                                           ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PDZ1             ONLINE       0     0     0
            disk/by-id/ata-ST31500541AS_5XW0PZJQ             ONLINE       0     0     0
            disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450    ONLINE       0     0     0  916G resilvered
            disk/by-id/ata-ST31500541AS_6XW1KRR9             ONLINE       0     0     0

errors: No known data errors
So that's it. While it was resilvering, the ZFS filesystem was completely available. How nice! My SMB/CIFS shares were working just fine.
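For the zpool replace step above, a mistyped by-id path is an easy mistake to make, so a small wrapper could check that the new device path exists before handing it to ZFS. This is a minimal sketch under assumptions: replace_disk is a hypothetical helper name, and the only check it makes is that the new path exists. It calls nothing beyond the zpool replace and zpool status commands already shown.

```shell
# Hypothetical wrapper around the replace step.
# $1 = pool, $2 = old device as listed in the pool, $3 = new device path.
replace_disk() {
    pool="$1"; old="$2"; new="$3"
    if [ ! -e "$new" ]; then
        # Refuse to continue if the new device path does not exist.
        echo "new device $new not found" >&2
        return 1
    fi
    # Start the replacement, then show the pool so the resilver is visible.
    zpool replace "$pool" "$old" "$new" && zpool status "$pool"
}
```

With the disks from this post it would be called as: replace_disk nasdata /dev/disk/by-id/ata-ST31500541AS_6XW1MKZZ /dev/disk/by-id/ata-SAMSUNG_HD204UI_S2HGJ90BA09450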
Great post. I am getting ready to replace a drive and need to know the new drive's ID. Every command I see to find it shows the UUID which is not the same as the "by-id" id ZFS uses. How do I know what the new drive's ID is?
Nevermind...found the command
ls -l /dev/disk/by-id/
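The ls output also includes one entry per partition, which can make the new drive harder to spot. As a rough sketch, the listing could be narrowed to whole disks and paired with the kernel device each link points to. The function name list_disk_ids is hypothetical, the ata-* pattern is an assumption based on the drive names in this post (other buses use other prefixes), and readlink -f is the GNU coreutils form.

```shell
# Hypothetical helper: list whole-disk by-id links and their kernel devices.
# $1 = directory to scan (defaults to /dev/disk/by-id).
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue        # skip if nothing matched the glob
        name="${link##*/}"
        case "$name" in
            *-part[0-9]*) continue ;;     # skip partition entries
        esac
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

Running list_disk_ids with no arguments prints one line per disk, so a newly added drive shows up as the entry that is not yet listed in zpool status.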