Verifying MySQL Master-Slave Replication with pt-table-checksum

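pt-table-checksum is an online tool for verifying data consistency in a MySQL master-slave replication setup. It runs on the master and executes checksum queries against the replicated tables; the same queries are replayed on the slaves, so comparing the results on both sides shows whether the data is consistent. When the check finishes, the tool lists the objects that differ from the master.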

1. How a master and slave become inconsistent

    Non-deterministic statements on the master (e.g. CURRENT_USER(), UUID())
    An incorrect failover procedure
    Accidental or direct DML on the slave
    Rolling upgrades
    Mixing tables that use transactional and non-transactional storage engines
    Skipped replication events (SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N)
    Temporary tables
    Replication filters
    UPDATE/DELETE statements with a LIMIT clause but no ORDER BY

2. pt-table-checksum features

    pt-table-checksum connects to the server you specify, and finds databases and tables that
    match the filters you specify (if any). It works one table at a time, so it does not accumulate
    large amounts of memory or do a lot of work before beginning to checksum. This makes it usable
    on very large servers. We have used it on servers with hundreds of thousands of databases and tables,
    and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.

    One reason it can work on very large tables is that it divides each table into chunks of rows,
    and checksums each chunk with a single REPLACE..SELECT query. It varies the chunk size to make
    the checksum queries run in the desired amount of time. The goal of chunking the tables, instead of
    doing each table with a single big query, is to ensure that checksums are unintrusive and don’t cause too
    much replication lag or load on the server. That’s why the target time for each chunk is 0.5 seconds by default.

    The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks
    as it learns more about the server’s performance. It uses an exponentially decaying weighted average
    to keep the chunk size stable, yet remain responsive if the server’s performance changes during checksumming
    for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during
    a traffic spike or a background task, for example.

    After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all
    detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to
    see if they have the same data as the master, and then prints a line of output with the results.
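
The chunk-size adaptation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's actual code: the function name suggest_chunk_sizes and the weight 0.75 are assumptions, while the 0.5-second target comes from the text. Each input line is one observation ("rows seconds") for a finished chunk. Older observations are multiplied by the weight every time a new one arrives, so their influence decays exponentially.

```shell
# Hypothetical sketch of an exponentially decaying weighted average.
# stdin : one "rows seconds" pair per finished chunk.
# stdout: suggested size (in rows) of the next chunk after each observation.
# $1 = target seconds per chunk (0.5, as in the text)
# $2 = decay weight (0.75 is an assumed value, chosen for illustration)
suggest_chunk_sizes() {
    awk -v target="${1:-0.5}" -v weight="${2:-0.75}" '
        NF == 2 {
            # Older observations shrink by "weight" each time a new one arrives.
            avg_rows = avg_rows * weight + $1
            avg_secs = avg_secs * weight + $2
            if (avg_secs > 0)
                printf "%d\n", int(avg_rows / avg_secs * target)
        }'
}

# A fast chunk followed by a slow one: the suggested size drops.
printf '1000 0.25\n2000 1.0\n' | suggest_chunk_sizes
```

With these two observations the suggestion goes from 2000 rows to 1157 rows, which is the self-throttling behaviour the text describes for a server that slows down mid-run.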

3. pt-table-checksum demonstration

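-- Environment: Master 192.168.1.8, Slave 192.168.1.12; replication is already set up
-- In this demo the mysql prompt has the form user@host[database]
-- For example, in master@localhost[test] the name "master" means the session runs on the master; "slave" means it runs on the slave
-- The replication filters are as follows: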
[root@vdbsrv4 ~]# mysql -uroot -p -e "show slave status\G"|grep "Replicate"
Enter password:
              Replicate_Do_DB: sakila,test
          Replicate_Ignore_DB: mysql
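a. Environment preparation
-- Grant privileges to the user that runs the checksum. Note: if the mysql system database is not replicated, create the same user on the slave as well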
master@localhost[test]> grant select, process, super, replication slave on *.* to
 ->  'checksums'@'192.168.1.%' identified by 'xxx';
Query OK, 0 rows affected (0.00 sec)

-- Create a table on the master and insert rows
master@localhost[test]> create table t(id tinyint primary key auto_increment,ename varchar(20));
Query OK, 0 rows affected (0.01 sec)

master@localhost[test]> insert into t(ename) values('Leshami'),('Henry'),('Jack');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

-- Query the table on the slave
slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |
|  2 | Henry   |
|  3 | Jack    |
+----+---------+

-- Simulate an inconsistency by deleting a row on the slave
slave@localhost[test]> delete from t where id=2;

b. Single-table check
-- Run pt-table-checksum
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:14:32      0      1        3       1       0   0.031 test.t

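TS            : time at which the check of the table finished.
ERRORS        : number of errors and warnings raised during the check.
DIFFS         : number of chunks that differ from the master on one or more slaves; 0 means
                consistent. With --no-replicate-check it is always 0; with
                --replicate-check-only only tables with differences are printed.
ROWS          : number of rows selected and checksummed from the table.
CHUNKS        : number of chunks the table was divided into.
SKIPPED       : number of chunks skipped because of errors, warnings, or being oversized.
TIME          : time spent checksumming the table.
TABLE         : name of the table that was checked.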
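
For unattended runs it can help to reduce the report to the tables that need attention. The sketch below is one possible way to do that with awk. The function name report_problem_tables is hypothetical, and the code assumes the eight-column layout described above (ERRORS in column 2, DIFFS in column 3, table name in column 8).

```shell
# Hypothetical helper: read a pt-table-checksum report on stdin and print
# only the tables whose ERRORS or DIFFS column is non-zero.
# Assumes the 8-column layout: TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE.
report_problem_tables() {
    awk 'NF == 8 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && ($2 > 0 || $3 > 0) {
        printf "%s errors=%s diffs=%s\n", $8, $2, $3
    }'
}

# Example input: the header plus two report lines taken from this article.
report_problem_tables <<'EOF'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:14:32      0      1        3       1       0   0.031 test.t
08-06T13:52:17      0      0      200       1       0   0.083 sakila.actor
EOF
```

The header line and any free-form messages are ignored because their second and third fields are not numeric or the field count differs. In practice the function would sit at the end of a pipe, for example after the pt-table-checksum command shown above.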

-- Inspect the checksum results on the slave with a SQL script
slave@localhost[test]> system more check_sync_stat.sql;
SELECT
    db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM
    test.checksum
WHERE
    (master_cnt <> this_cnt
        OR master_crc <> this_crc
        OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db , tbl;

slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          2 |      1 |
+------+-----+------------+--------+

-- Insert a row on the slave
slave@localhost[test]> insert into t(ename) values('Robin');
Query OK, 1 row affected (0.00 sec)

slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |   #Author : Leshami
|  3 | Jack    |   #Blog   : http://blog.csdn.net/leshami
|  4 | Robin   |
+----+---------+

-- Run pt-table-checksum on the master again (omitted here), then check the result:
slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          3 |      1 |
+------+-----+------------+--------+

c. How pt-table-checksum works
-- With --explain the checksum is not run; the tool only prints the SQL statements it would execute
Show, but do not execute, checksum queries (disables --[no]empty-replicate-table). If specified
twice, the tool actually iterates through the chunking algorithm, printing the upper and lower boundary values
for each chunk, but not executing the checksum queries.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum --explain
--
-- test.t
--

REPLACE INTO `test`.`checksum` (db, tbl, chunk, chunk_index, lower_boundary,
upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ename`,
CONCAT(ISNULL(`ename`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
  /*checksum table*/
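
The REPLACE ... SELECT statement above stores two values per chunk: the row count and a CRC obtained by XOR-ing the per-row CRC32 of the '#'-joined column values. The sketch below imitates that structure in shell to show why row order does not matter while a missing row changes the result. It uses the POSIX cksum utility as a stand-in for MySQL's CRC32(), so the numbers are not the ones MySQL would compute; chunk_crc is a hypothetical name.

```shell
# Hypothetical sketch: combine per-row checksums the way the query above does
# (COUNT(*) plus BIT_XOR over a per-row CRC, printed in lowercase hex).
# NOTE: cksum is only a stand-in for MySQL's CRC32(); the values differ.
# stdin: one row per line, columns already joined with '#'
#        (id#ename#ISNULL(ename), mirroring CONCAT_WS in the query).
chunk_crc() {
    crc=0
    cnt=0
    while IFS= read -r row; do
        row_crc=$(printf '%s' "$row" | cksum | cut -d' ' -f1)
        crc=$(( crc ^ row_crc ))      # XOR does not depend on row order
        cnt=$(( cnt + 1 ))
    done
    printf '%d %x\n' "$cnt" "$crc"    # "this_cnt this_crc"
}

# Master has three rows; on the slave the row with id=2 was deleted.
master=$(printf '1#Leshami#0\n2#Henry#0\n3#Jack#0\n' | chunk_crc)
slave=$(printf '3#Jack#0\n1#Leshami#0\n' | chunk_crc)
[ "$master" = "$slave" ] && echo "consistent" || echo "different"
```

Because both the count and the XOR-ed CRC are written to the --replicate table on every server, comparing the master_* and this_* columns on a slave is enough to spot a chunk that diverged.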

d. Database-level check
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> --databases=sakila --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T13:52:17      0      0      200       1       0   0.083 sakila.actor
08-06T13:52:17      0      0      603       1       0   0.024 sakila.address
08-06T13:52:17      0      0       16       1       0   0.012 sakila.category
08-06T13:52:17      0      0      600       1       0   0.025 sakila.city
08-06T13:52:17      0      0      109       1       0   0.019 sakila.country
08-06T13:52:17      0      0      599       1       0   0.019 sakila.customer
08-06T13:52:17      0      0     1000       1       0   0.035 sakila.film
08-06T13:52:17      0      0     5462       1       0   0.295 sakila.film_actor
08-06T13:52:17      0      0     1000       1       0   0.019 sakila.film_category
08-06T13:52:17      0      0     1000       1       0   0.015 sakila.film_text
08-06T13:52:17      0      0     4581       1       0   0.041 sakila.inventory
08-06T13:52:17      0      0        6       1       0   0.012 sakila.language
08-06T13:52:18      0      0    16049       1       0   0.367 sakila.payment
08-06T13:52:18      0      0    16044       1       0   0.357 sakila.rental
08-06T13:52:18      0      0        2       1       0   0.013 sakila.staff
08-06T13:52:18      0      0        2       1       0   0.012 sakila.store

-- Drop a table on the slave
slave@localhost[test]> drop table sakila.payment;
Query OK, 0 rows affected (0.01 sec)

-- Run pt-table-checksum again; it reports the following:
08-06T13:56:42 Skipping table sakila.payment because it has problems on these replicas:
Table sakila.payment does not exist on replica vdbsrv4
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
08-06T13:56:42 Error checksumming table sakila.payment: DBD::mysql::db selectrow_hashref failed:
Table 'sakila.payment' doesn't exist
[for Statement "EXPLAIN SELECT * FROM `sakila`.`payment` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6530.

e. Checking multiple slaves
-- The following demonstrates a consistency check with more than one slave
-- By default:
-- Option: --recursion-method ; type: array; default: processlist,hosts.
--            Preferred recursion method for discovering replicas.
--  pt-table-checksum performs several “REPLICA CHECKS” before and while running.

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|         1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+------+------+-----------+--------------------------------------+

root@localhost[(none)]> show variables like 'port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 3307  |
+---------------+-------+

root@localhost[(none)]> delete from test.t where id=1;
Query OK, 1 row affected (0.00 sec)

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest \
> -tt --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum \
> --recursion-method=hosts

# A software update is available:
#  * The current version for Percona::Toolkit is 2.2.14.

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:12:52      0      1        3       1       0   0.034 test.t

4. Option descriptions

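--nocheck-replication-filters
  Do not check for replication filters; enabling this is recommended. Use --databases afterwards to name the databases to check.
--no-check-binlog-format
  Do not check the binlog format used for replication. Without this option the tool stops with an error when a slave's binlog_format is ROW or MIXED.
--replicate-check-only
  Only report tables that are out of sync; no new checksums are run.
--replicate=
  Write the checksum results to the given table; writing it into the database being checked is recommended.
--databases=
  Databases to check; separate several with commas.
--tables=
  Tables to check; separate several with commas.
DSN fields
  h=127.0.0.1 : address of the master
  u=root : user name
  p=123456 : password
  P=3306 : port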

5. Common problems

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --replicate=test.checksums
Replica vdbsrv4 has binlog_format MIXED which could cause pt-table-checksum to break replication.
Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.
  If you understand the risks, specify --no-check-binlog-format to disable this check.
The message above appears when a slave uses the MIXED binlog format.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --no-check-binlog-format
DBD::mysql::db do failed: Access denied for user 'checksums'@'192.168.1.%' to database 'percona'
[for Statement "CREATE DATABASE IF NOT EXISTS `percona` /* pt-table-checksum */"]
at /usr/bin/pt-table-checksum line 10743.
07-29T08:42:03 --replicate database percona does not exist and it cannot be created automatically.
You need to create the database.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format  --replicate=test.checksum
Cannot connect to P=3306,h=vdbsrv4,p=...,u=checksums
Diffs cannot be detected because no slaves were found.
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:03:10      0      0        3       1       0   0.023 test.t

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum --recursion-method=hosts
Cannot connect to P=3306,h=,p=...,u=checksums
Cannot connect to P=3307,h=,p=...,u=checksums
Diffs cannot be detected because no slaves were found.
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:02:27      0      0        3       1       0   0.016 test.t

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|         1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
+-----------+------+------+-----------+--------------------------------------+

-- Add the report_host parameter and restart the slave
[root@vdbsrv4 ~]# grep report_host /etc/my.cnf
report_host='192.168.1.12'

master@localhost[(none)]> show slave hosts;
+-----------+--------------+------+-----------+--------------------------------------+
| Server_id | Host         | Port | Master_id | Slave_UUID                           |
+-----------+--------------+------+-----------+--------------------------------------+
|        11 | 192.168.1.12 | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|         1 | 192.168.1.12 | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+--------------+------+-----------+--------------------------------------+

Time: 2024-09-12 04:40:37
