ZFS pool self-healing, scrub, and pre-replacing "bad" disks

Another strength of ZFS is self-healing of bad blocks (provided the pool has redundancy, e.g. raidz1, raidz2, raidz3, ..., so that the correct block can be reconstructed from the redundant data).

ZFS also checksums every block, similar in spirit to ECC DIMMs; the checksum algorithm is configurable per dataset (fletcher4 by default, with SHA-256 available).

Use scrub to check whether the block devices under a zpool are healthy. For SAS or FC disks, scrubbing once a month is usually enough; for low-end SATA or SCSI devices, once a week is preferable.

These scrubs can be put in a cron job, e.g. start one at 00:01 every day:

crontab -e
1 0 * * * /opt/zfs0.6.2/sbin/zpool scrub zptest
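
If a daily scrub is more than the hardware needs, the same entry can be restricted to the weekly cadence recommended above for consumer disks; a sketch, assuming the same binary path and pool name, running at 00:01 every Sunday:

1 0 * * 0 /opt/zfs0.6.2/sbin/zpool scrub zptest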

Disks whose health indicators look bad can be replaced preemptively (using zpool replace).
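
A minimal sketch of that command (pool name and device names are placeholders; a full walkthrough with zpool replace follows later in this post):

# zpool replace <pool> <old-device> [new-device]

If the new device takes the failed device's place at the same path, the trailing new-device argument can be omitted.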

Indicators:

The rows in the "zpool status" command give you vital information about the pool, most of which are self-explanatory. They are defined as follows:

pool- The name of the pool.
state- The current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.
status- A description of what is wrong with the pool. This field is omitted if no problems are found.
action- A recommended action for repairing the errors. This field is an abbreviated form directing the user to one of the following sections. This field is omitted if no problems are found.
see- A reference to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated, and should always be referenced for the most up-to-date repair procedures. This field is omitted if no problems are found.
scrub- Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub in progress, or if no scrubbing was requested.
errors- Identifies known data errors or the absence of known data errors.
config- Describes the configuration layout of the devices comprising the pool, as well as their state and any errors generated from the devices. The state can be one of the following: ONLINE, FAULTED, DEGRADED, UNAVAILABLE, or OFFLINE. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.
The columns in the status output, "READ", "WRITE" and "CKSUM", are defined as follows:

NAME- The name of each VDEV in the pool, presented in a nested order.
STATE- The state of each VDEV in the pool. The state can be any of the states found in "config" above.
READ- I/O errors occurred while issuing a read request.
WRITE- I/O errors occurred while issuing a write request.
CKSUM- Checksum errors. The device returned corrupted data as the result of a read request.
Scrubbing ZFS storage pools is not something that happens automatically. You need to do it manually, and it's highly recommended that you do it on a regularly scheduled interval. The recommended frequency at which you should scrub the data depends on the quality of the underlying disks. If you have SAS or FC disks, then once per month should be sufficient. If you have consumer grade SATA or SCSI, you should do once per week. You can schedule a scrub easily with the following command:
# zpool scrub tank
# zpool status tank
  pool: tank
 state: ONLINE
 scan: scrub in progress since Sat Dec  8 08:06:36 2012
    32.0M scanned out of 48.5M at 16.0M/s, 0h0m to go
    0 repaired, 65.99% done
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0

errors: No known data errors

For example, create a pool named zp with raidz1 redundancy (plus a mirrored log device):

[root@spark01 ~]# zpool create zp raidz1 /home/digoal/zfs.disk1 /home/digoal/zfs.disk2 /home/digoal/zfs.disk3 /home/digoal/zfs.disk4 log mirror /home/digoal/zfs.log1 /home/digoal/zfs.log2

[root@spark01 ~]# zpool status
  pool: zp
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors

Copy some files into a dataset.

[root@spark01 ~]# cd /home/digoal
[root@spark01 digoal]# ll
total 10575000
drwxr-xr-x.  9 digoal digoal       4096 Mar 31 17:15 hadoop-2.4.0
-rw-rw-r--.  1 digoal digoal  138943699 Mar 31 17:16 hadoop-2.4.0.tar.gz
drwxr-xr-x. 10   7900   7900       4096 May 19 01:24 spl-0.6.2
-rw-r--r--.  1 root   root       565277 Aug 24  2013 spl-0.6.2.tar.gz
drwxr-xr-x. 13   7900   7900       4096 May 19 01:28 zfs-0.6.2
-rw-r--r--.  1 root   root      2158948 Aug 24  2013 zfs-0.6.2.tar.gz
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk1
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk2
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk3
-rw-r--r--.  1 root   root   2147483648 May 19 05:54 zfs.disk4
-rw-r--r--.  1 root   root   1048576000 May 19 05:54 zfs.log1
-rw-r--r--.  1 root   root   1048576000 May 19 05:54 zfs.log2
[root@spark01 digoal]# zfs create zp/test
[root@spark01 digoal]# cp -r spl-0.6.2* zfs-0.6.2* hadoop-2.4.0* /zp/test/

[root@spark01 digoal]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        31G  1.2G   29G   5% /
tmpfs            12G     0   12G   0% /dev/shm
/dev/sda3        89G   11G   74G  13% /home
zp              5.4G     0  5.4G   0% /zp
zp/test         5.9G  535M  5.4G   9% /zp/test
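
The same usage can be viewed from ZFS's side with the pool- and dataset-level listing commands (a quick sketch; output omitted here):

# zpool list zp
# zfs list -r zp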

Check this pool with zpool scrub.

[root@spark01 digoal]# zpool scrub zp
[root@spark01 digoal]# zpool status
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 05:56:17 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0
errors: No known data errors

To stop a scrub that is in progress, use zpool scrub -s (here it fails because the earlier scrub has already completed):

[root@spark01 test]# zpool scrub -s zp
cannot cancel scrubbing zp: there is no active scrub

Next, test replacing a problematic block device online after scrub flags it. Here I simulate a bad disk by deleting one of the zfs.disk backing files.

[root@spark01 digoal]# rm -f zfs.disk1
[root@spark01 digoal]# zpool scrub zp    # scrub does not detect the deleted disk file
[root@spark01 digoal]# zpool status
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 05:56:44 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors

However, because raidz1 is used, the data is still readable after disk1 is removed (the original data is reconstructed from the parity; raidz1 tolerates losing one disk).

[root@spark01 digoal]# cd /zp/test
[root@spark01 test]# ll
total 138651
drwxr-xr-x.  9 root root        12 May 19 05:55 hadoop-2.4.0
-rw-r--r--.  1 root root 138943699 May 19 05:56 hadoop-2.4.0.tar.gz
drwxr-xr-x. 10 root root        30 May 19 05:55 spl-0.6.2
-rw-r--r--.  1 root root    565277 May 19 05:55 spl-0.6.2.tar.gz
drwxr-xr-x. 13 root root        37 May 19 05:55 zfs-0.6.2
-rw-r--r--.  1 root root   2158948 May 19 05:55 zfs-0.6.2.tar.gz
[root@spark01 test]# du -sh *
250M    hadoop-2.4.0
133M    hadoop-2.4.0.tar.gz
39M     spl-0.6.2
643K    spl-0.6.2.tar.gz
193M    zfs-0.6.2
2.2M    zfs-0.6.2.tar.gz
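
The listing above mostly touches metadata. To make ZFS read back (and, where necessary, reconstruct from parity) every data block of a file, checksumming one of the copied files is a quick sanity test (a sketch, not part of the original run):

# md5sum /zp/test/hadoop-2.4.0.tar.gz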

Create a new file to replace the deleted zfs.disk1; the new file may reuse the name zfs.disk1 or use a different one.

[root@spark01 test]# cd /home/digoal/
[root@spark01 digoal]# dd if=/dev/zero of=./zfs.disk1 bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 1.29587 s, 1.7 GB/s
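
As an aside, a sparse file created with truncate works just as well as a file-backed vdev and is created instantly (a sketch, not part of the original run):

# truncate -s 2G /home/digoal/zfs.disk1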

Replace the bad disk with zpool replace:

[root@spark01 digoal]# zpool replace -h
usage:
        replace [-f] <pool> <device> [new-device]

[root@spark01 digoal]# zpool replace zp /home/digoal/zfs.disk1 /home/digoal/zfs.disk1
[root@spark01 digoal]# zpool scrub zp
[root@spark01 digoal]# zpool status zp
  pool: zp
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 19 06:01:19 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        zp                          ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            /home/digoal/zfs.disk1  ONLINE       0     0     0
            /home/digoal/zfs.disk2  ONLINE       0     0     0
            /home/digoal/zfs.disk3  ONLINE       0     0     0
            /home/digoal/zfs.disk4  ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            /home/digoal/zfs.log1   ONLINE       0     0     0
            /home/digoal/zfs.log2   ONLINE       0     0     0

errors: No known data errors
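
These file-backed vdevs are tiny, so the resilver triggered by zpool replace finishes almost immediately. On real disks the rebuild takes a while; its progress appears in the scan line of zpool status and can be followed with something like:

# watch -n 5 zpool status zp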

Use the -x option of zpool status for a quick health summary of the pool.

[root@spark01 digoal]# zpool status zp -x
pool 'zp' is healthy

Note: when replacing real hardware, hot-swappable disks can be swapped in place and then brought into the pool with zpool replace.

Disks that cannot be hot-swapped require shutting the machine down, replacing the disk, and then running zpool replace against the bad device; a sketch of the sequence follows.
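
Assuming the failing device is /home/digoal/zfs.disk1 and the replacement ends up at the same path (adjust the names to your environment):

# zpool offline zp /home/digoal/zfs.disk1
(power off, swap the physical disk, power on)
# zpool replace zp /home/digoal/zfs.disk1

With a single device argument, zpool replace assumes the new disk has taken the old one's place.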

Before pulling a drive, note its device name and serial number, and compare the serial number on the physical disk you pull against the recorded one, so the wrong disk is not removed.

hdparm -I reports this; match it against the device names shown by zpool status.
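
For example (assuming the suspect device is /dev/sdb; the identify output of an ATA disk includes a Serial Number field):

# hdparm -I /dev/sdb | grep -i 'serial number'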

[References]
1. http://docs.oracle.com/cd/E26502_01/pdf/E29007.pdf

2. http://www.root.cz/clanky/suborovy-system-zfs-konzistentnost-dat/

3. https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/

4. https://pthree.org/2012/12/05/zfs-administration-part-ii-raidz/
