zfs pool self-healing, scrub, and pre-replacing "bad" disks

Another of ZFS's strengths is self-healing of bad blocks, provided the pool has redundancy (e.g. mirror, raidz1, raidz2, raidz3, ...) so that the correct data can be reconstructed from the redundant copy or parity and verified against the block checksums.

ZFS also checksums every block, much as ECC DIMMs protect memory; the default algorithm is fletcher4, and a stronger checksum such as SHA-256 can be enabled per dataset.
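
For example, the checksum algorithm can be inspected and changed per dataset (a minimal sketch; the pool and dataset names here are only placeholders):

# zfs get checksum zptest
# zfs set checksum=sha256 zptest/data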

Use scrub to verify that the block devices underlying a zpool are healthy. For SAS or FC disks, scrubbing once a month is usually enough; for low-end SATA or SCSI devices, once a week is recommended.

Scrubs can be driven from cron; for example, the entry below starts one at 00:01 every day.

crontab -e
1 0 * * * /opt/zfs0.6.2/sbin/zpool scrub zptest
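
If you would rather follow the monthly/weekly guideline above, the crontab entries might look like this instead (a sketch; binary path and pool name are copied from the example above):

# weekly scrub for consumer-grade disks, Sunday at 01:00
0 1 * * 0 /opt/zfs0.6.2/sbin/zpool scrub zptest
# monthly scrub for SAS/FC disks, 1st of each month at 01:00
0 1 1 * * /opt/zfs0.6.2/sbin/zpool scrub zptest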

Disks whose counters start to look bad can be replaced proactively (with zpool replace).

The relevant fields:

The rows in the "zpool status" command give you vital information about the pool, most of which are self-explanatory. They are defined as follows:

pool - The name of the pool.
state - The current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.
status - A description of what is wrong with the pool. This field is omitted if no problems are found.
action - A recommended action for repairing the errors. This field is an abbreviated form directing the user to one of the following sections. This field is omitted if no problems are found.
see - A reference to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated, and should always be referenced for the most up-to-date repair procedures. This field is omitted if no problems are found.
scrub - Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub in progress, or if no scrubbing was requested.
errors - Identifies known data errors or the absence of known data errors.
config - Describes the configuration layout of the devices comprising the pool, as well as their state and any errors generated from the devices. The state can be one of the following: ONLINE, FAULTED, DEGRADED, UNAVAILABLE, or OFFLINE. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.
The columns in the status output, "READ", "WRITE" and "CKSUM", are defined as follows:

NAME - The name of each VDEV in the pool, presented in a nested order.
STATE - The state of each VDEV in the pool. The state can be any of the states found in "config" above.
READ - I/O errors that occurred while issuing a read request.
WRITE - I/O errors that occurred while issuing a write request.
CKSUM - Checksum errors. The device returned corrupted data as the result of a read request.
Scrubbing ZFS storage pools is not something that happens automatically. You need to do it manually, and it's highly recommended that you do it on a regularly scheduled interval. The recommended frequency at which you should scrub the data depends on the quality of the underlying disks. If you have SAS or FC disks, then once per month should be sufficient. If you have consumer-grade SATA or SCSI disks, you should scrub once per week. You can start a scrub with the following command:
# zpool scrub tank
# zpool status tank
  pool: tank
 state: ONLINE
 scan: scrub in progress since Sat Dec  8 08:06:36 2012
    32.0M scanned out of 48.5M at 16.0M/s, 0h0m to go
    0 repaired, 65.99% done
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0

errors: No known data errors
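
The READ/WRITE/CKSUM counters above are one early-warning signal; SMART data from the drive itself is another worth checking before deciding to pre-replace a disk (a sketch using smartmontools; /dev/sde is only a placeholder device):

# smartctl -H /dev/sde
# smartctl -A /dev/sde | grep -i -E 'reallocated|pending|uncorrect'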

For example, create a pool named zp with raidz1 redundancy (plus a mirrored log device).

[root@spark01 ~]# zpool create zp raidz1 /home/digoal/zfs.disk1 /home/digoal/zfs.disk2 /home/digoal/zfs.disk3 /home/digoal/zfs.disk4 log mirror /home/digoal/zfs.log1 /home/digoal/zfs.log2

[root@spark01 ~]# zpool status
  pool: zp
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors
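
To confirm the layout and usable capacity of the new raidz1 pool, zpool list and zfs list can be run as well (sketch, output not shown):

# zpool list zp
# zfs list -r zp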

Copy some files into a dataset.

[root@spark01 ~]# cd /home/digoal
[root@spark01 digoal]# ll
total 10575000
drwxr-xr-x.  9 digoal digoal       4096 Mar 31 17:15 hadoop-2.4.0
-rw-rw-r--.  1 digoal digoal  138943699 Mar 31 17:16 hadoop-2.4.0.tar.gz
drwxr-xr-x. 10   7900   7900       4096 May 19 01:24 spl-0.6.2
-rw-r--r--.  1 root   root       565277 Aug 24  2013 spl-0.6.2.tar.gz
drwxr-xr-x. 13   7900   7900       4096 May 19 01:28 zfs-0.6.2
-rw-r--r--.  1 root   root      2158948 Aug 24  2013 zfs-0.6.2.tar.gz
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk1
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk2
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk3
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk4
-rw-r--r--.  1 root   root   1048576000 May 19 05:54 zfs.log1
-rw-r--r--.  1 root   root   1048576000 May 19 05:54 zfs.log2
[root@spark01 digoal]# zfs create zp/test
[root@spark01 digoal]# cp -r spl-0.6.2* zfs-0.6.2* hadoop-2.4.0* /zp/test/

[root@spark01 digoal]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        31G  1.2G   29G   5% /
tmpfs            12G     0   12G   0% /dev/shm
/dev/sda3        89G   11G   74G  13% /home
zp              5.4G     0  5.4G   0% /zp
zp/test         5.9G  535M  5.4G   9% /zp/test

Check this pool with zpool scrub.

[root@spark01 digoal]# zpool scrub zp
[root@spark01 digoal]# zpool status
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 05:56:17 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0
errors: No known data errors

To stop a scrub that is still running, use zpool scrub -s (here the previous scrub had already finished, so there was nothing to cancel):

[root@spark01 test]# zpool scrub -s zp
cannot cancel scrubbing zp: there is no active scrub
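
On a larger pool a scrub takes much longer; its progress can be followed with something like (a sketch using watch):

# watch -n 10 zpool status zp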

Next, let's test replacing, online, a block device that scrub has flagged as bad. Here I simulate a failed disk by deleting one of the zfs.disk files.

[root@spark01 digoal]# rm -f zfs.disk1
[root@spark01 digoal]# zpool scrub zp    # scrub does not notice the deleted file
[root@spark01 digoal]# zpool status
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 05:56:44 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors

But because raidz1 is used, the data can still be read after disk1 is deleted (the original data is reconstructed from parity; raidz1 tolerates the loss of one disk). Note that the scrub above also still shows everything ONLINE because the unlinked file stays open inside the pool, so its blocks remain readable until the pool releases it.

[root@spark01 digoal]# cd /zp/test
[root@spark01 test]# ll
total 138651
drwxr-xr-x.  9 root root        12 May 19 05:55 hadoop-2.4.0
-rw-r--r--.  1 root root 138943699 May 19 05:56 hadoop-2.4.0.tar.gz
drwxr-xr-x. 10 root root        30 May 19 05:55 spl-0.6.2
-rw-r--r--.  1 root root    565277 May 19 05:55 spl-0.6.2.tar.gz
drwxr-xr-x. 13 root root        37 May 19 05:55 zfs-0.6.2
-rw-r--r--.  1 root root   2158948 May 19 05:55 zfs-0.6.2.tar.gz
[root@spark01 test]# du -sh *
250M    hadoop-2.4.0
133M    hadoop-2.4.0.tar.gz
39M     spl-0.6.2
643K    spl-0.6.2.tar.gz
193M    zfs-0.6.2
2.2M    zfs-0.6.2.tar.gz
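
To further convince yourself the data is intact, compare a checksum of a file in the pool with the original copy that is still under /home/digoal (sketch, output omitted):

# md5sum /home/digoal/hadoop-2.4.0.tar.gz /zp/test/hadoop-2.4.0.tar.gz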

Create a new file to take the place of the deleted zfs.disk1; the new file may keep the same name as zfs.disk1 or use a different one.

[root@spark01 test]# cd /home/digoal/
[root@spark01 digoal]# dd if=/dev/zero of=./zfs.disk1 bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 1.29587 s, 1.7 GB/s
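
A sparse file of the same size would also work here and is created instantly (a sketch; truncate is part of coreutils):

# truncate -s 2G /home/digoal/zfs.disk1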

Use zpool replace to replace the bad disk:

[root@spark01 digoal]# zpool replace -h
usage:
        replace [-f] <pool> <device> [new-device]

[root@spark01 digoal]# zpool replace zp /home/digoal/zfs.disk1 /home/digoal/zfs.disk1
[root@spark01 digoal]# zpool scrub zp
[root@spark01 digoal]# zpool status zp
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 06:01:19 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors
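
On a pool holding real data, the replace triggers a resilver that zpool status reports while it runs; a crude way to wait for it to finish is a loop like this (sketch):

# while zpool status zp | grep -q 'resilver in progress'; do sleep 60; done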

Use zpool status -x to check the pool's overall health.

[root@spark01 digoal]# zpool status zp -x
pool 'zp' is healthy

Note that when replacing a physical disk in a real environment, a hot-swappable disk can simply be swapped and then handed to the pool with zpool replace.

For disks that cannot be hot-swapped, shut the machine down, swap the disk, and then use zpool replace to replace the bad one, as sketched below.
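
A typical sequence for that case might look like this (a sketch; the by-id paths are placeholders for the real disk identifiers):

# zpool offline zp /dev/disk/by-id/ata-OLD_DISK
# shutdown -h now
... swap the physical disk, power the machine back on ...
# zpool replace zp /dev/disk/by-id/ata-OLD_DISK /dev/disk/by-id/ata-NEW_DISK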

Before pulling a disk, look up its device identifier (or serial number; when swapping, compare the serial number printed on the drive itself so the wrong disk is not pulled).

hdparm -I shows the drive's identity (including the serial number); match it against the device name listed in zpool status.
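
For example (a sketch; /dev/sdb is a placeholder):

# hdparm -I /dev/sdb | grep -i 'serial number'
# ls -l /dev/disk/by-id/ | grep sdb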

[References]
1. http://docs.oracle.com/cd/E26502_01/pdf/E29007.pdf

2. http://www.root.cz/clanky/suborovy-system-zfs-konzistentnost-dat/

3. https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/

4. https://pthree.org/2012/12/05/zfs-administration-part-ii-raidz/
