-------------------------------------------------------------------------------------------------正文---------------------------------------------------------------------------------------------------------------
场景:
MySQL-5.7.17, 主从架构, 业务读写分离, 只读从库不定期出现延迟, 并触发Innodb的Semaphore wait导致从库主动Crash;
结论:
数据库升级到5.7.19, 调整block size和buffer_pool_instances;
重点!:
毕竟看不懂innodb的源代码, 没有理清楚详细的逻辑, 只能对问题现象和代码碎片进行分析;
分析:
源代码为5.7.17
现象上, 所有实例中只有这一个业务的实例有这种问题, 且通过更换虚拟机/物理机的方式确认了并非外部原因导致同步延迟;
通过监控的观察, 确认了在延迟出现的时候没有突发性的CPU使用率和IO使用率;
先上error log
点击(此处)折叠或打开
- 2017-05-24T15:27:30.303424+08:00 0 [Warning] InnoDB: A long semaphore wait:
- --Thread 140484883478272 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:
- X-lock on RW-latch at 0x7fd08b2ed088 created in file buf0buf.cc line 1459
- a writer (thread id 140484736354048) has reserved it in mode exclusive
- number of readers 0, waiters flag 1, lock_word: 0
- Last time read locked in file trx0undo.ic line 190
- Last time write locked in file /export/home/pb2/build/sb_0-21378219-1480360739.71/release-ET46834/mysql-5.7.17/storage/innobase/include/trx0undo.ic line 171
- 2017-05-24T15:27:30.303552+08:00 0 [Warning] InnoDB: A long semaphore wait:
- --Thread 140484736354048 has waited at trx0rseg.ic line 48 for 241.00 seconds the semaphore:
- X-lock on RW-latch at 0x7fd34dff7f60 created in file buf0buf.cc line 1459
- a writer (thread id 140484883478272) has reserved it in mode exclusive
- number of readers 0, waiters flag 1, lock_word: 0
- Last time read locked in file not yet reserved line 0
- Last time write locked in file /export/home/pb2/build/sb_0-21378219-1480360739.71/release-ET46834/mysql-5.7.17/storage/innobase/include/trx0rseg.ic line 48
- InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
- InnoDB: Pending preads 0, pwrites 0
- ......
- ......
- InnoDB: ###### Diagnostic info printed to the standard error stream
- 2017-05-24T15:39:22.319643+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
- 2017-05-24 15:39:22 0x7f4223c21700 InnoDB: Assertion failure in thread 139922044491520 in file ut0ut.cc line 916
多数情况下, long semaphore wait是因为突发性的preads或者pwrites,
但是这次看到error log里面的记录都是0, 而且监控也没有发现突发性的CPU和IO负载, 所以感觉这次问题不在负载上面;
在几次出问题的过程中, 也用gdb抓了thread的堆栈信息, 摘抄上面error log中对应的数据
点击(此处)折叠或打开
- Thread 334 (Thread 0x7fc526ce8700 (LWP 58644)):
- #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
- #1 0x0000000000ffe7f6 in os_event::wait_low(long) ()
- #2 0x00000000010b96bc in sync_array_wait_event(sync_array_t*, sync_cell_t*&) ()
- #3 0x00000000010bc510 in rw_lock_x_lock_func(rw_lock_t*, unsigned long, char const*, unsigned long) ()
- #4 0x000000000114123d in ?? ()
- #5 0x0000000001147a6e in buf_page_get_gen(page_id_t const&, page_size_t const&, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, bool) ()
- #6 0x00000000010c9d2a in trx_purge_add_update_undo_to_history(trx_t*, trx_undo_ptr_t*, unsigned char*, bool, unsigned long, mtr_t*) ()
- #7 0x00000000010fc5ef in trx_undo_update_cleanup(trx_t*, trx_undo_ptr_t*, unsigned char*, bool, unsigned long, mtr_t*) ()
- #8 0x00000000010f42f0 in trx_write_serialisation_history(trx_t*, mtr_t*) ()
- #9 0x00000000010f4d9b in trx_commit_low(trx_t*, mtr_t*) ()
- #10 0x00000000010f5714 in trx_commit(trx_t*) ()
- #11 0x00000000010f6277 in trx_commit_for_mysql(trx_t*) ()
- #12 0x0000000000f6a0e7 in innobase_commit_low(trx_t*) ()
- #13 0x0000000000f7a133 in ?? ()
- #14 0x00000000007f53f7 in ha_commit_low(THD*, bool, bool) ()
- #15 0x0000000000e10fd1 in MYSQL_BIN_LOG::process_commit_stage_queue(THD*, THD*) ()
- #16 0x0000000000e1ac75 in MYSQL_BIN_LOG::ordered_commit(THD*, bool, bool) ()
- #17 0x0000000000e1c128 in MYSQL_BIN_LOG::commit(THD*, bool) ()
- #18 0x00000000007f5e22 in ha_commit_trans(THD*, bool, bool) ()
- #19 0x0000000000cfa189 in trans_commit(THD*) ()
- #20 0x0000000000deade2 in Xid_log_event::do_commit(THD*) ()
- #21 0x0000000000deaf5f in Xid_apply_log_event::do_apply_event_worker(Slave_worker*) ()
- #22 0x0000000000e52353 in slave_worker_exec_job_group(Slave_worker*, Relay_log_info*) ()
- #23 0x0000000000e34f73 in handle_slave_worker ()
- #24 0x0000000000e97824 in pfs_spawn_thread ()
- #25 0x00007fdba9cc6064 in start_thread (arg=0x7fc526ce8700) at pthread_create.c:309
- #26 0x00007fdba878962d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
- ......
- ......
- Thread 494 (Thread 0x7fc52f937700 (LWP 11164)):
- #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
- #1 0x0000000000ffe7f6 in os_event::wait_low(long) ()
- #2 0x00000000010b96bc in sync_array_wait_event(sync_array_t*, sync_cell_t*&) ()
- #3 0x00000000010bc510 in rw_lock_x_lock_func(rw_lock_t*, unsigned long, char const*, unsigned long) ()
- #4 0x000000000114123d in ?? ()
- #5 0x0000000001147a6e in buf_page_get_gen(page_id_t const&, page_size_t const&, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, bool) ()
- #6 0x00000000010cb380 in ?? ()
- #7 0x00000000010d0e48 in ?? ()
- #8 0x00000000010d1bdd in trx_purge(unsigned long, unsigned long, bool) ()
- #9 0x00000000010abb06 in srv_purge_coordinator_thread ()
- #10 0x00007fdba9cc6064 in start_thread (arg=0x7fc52f937700) at pthread_create.c:309
- #11 0x00007fdba878962d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
标记出一些方法的信息, 通过搜索, 找到这些方法的描述:
buf_page_get_gen() : 代替了buf_page_get方法, 用来访问page, 在从file读取page到bp, 获取page对应的buf_pool->mutex时, 都会使用rw_lock_x_lock来尝试加锁;
代码位于./mysql-server/storage/innobase/buf/buf0buf.cc
rw_lock_x_lock_func() : 代替了很多底层的rw_lock_x_lock_XXX的方法, 是线程用来加X锁用的方法, 如果无法获得X锁, 则会等待一段时间之后才会阻塞线程;
通过查看./mysql-server/storage/innobase/sync/sync0rw.cc中的源代码, 发现线程可能是始终无法获得对应page的X锁,
一直处于如下的循环: 尝试加锁->加锁失败->sync_array_wait_event->下一次循环;
从源代码的注释和逻辑上看, 这两个线程互相在等待对方的锁;
再通过error log的信息, 看看trx0undo.ic line 171, 190和trx0rseg.ic line 48在干嘛 :
点击(此处)折叠或打开
- trx0undo.ic
- trx_undo_page_get(
- const page_id_t& page_id,
- const page_size_t& page_size,
- mtr_t* mtr)
- {
- buf_block_t* block = buf_page_get(page_id, page_size,
- RW_X_LATCH, mtr); -- 171
- buf_block_dbg_add_level(block, SYNC_TRX_UNDO_PAGE);
- return(buf_block_get_frame(block));
- }
- ......
- trx_undo_page_get_s_latched(
- const page_id_t& page_id,
- const page_size_t& page_size,
- mtr_t* mtr)
- {
- buf_block_t* block = buf_page_get(page_id, page_size,
- RW_S_LATCH, mtr); -- 190
- buf_block_dbg_add_level(block, SYNC_TRX_UNDO_PAGE);
- return(buf_block_get_frame(block));
- }
从代码的文件名就可以很清晰的看出来, 这是在对undo log的内容进行加锁操作;
点击(此处)折叠或打开
- trx0rseg.ic
- /** Gets a rollback segment header.
- @param[in] space space where placed
- @param[in] page_no page number of the header
- @param[in] page_size page size
- @param[in,out] mtr mini-transaction
- @return rollback segment header, page x-latched */
- UNIV_INLINE
- trx_rsegf_t*
- trx_rsegf_get(
- ulint space,
- ulint page_no,
- const page_size_t& page_size,
- mtr_t* mtr)
- {
- buf_block_t* block;
- trx_rsegf_t* header;
- block = buf_page_get(
- page_id_t(space, page_no), page_size, RW_X_LATCH, mtr); -- 48
- buf_block_dbg_add_level(block, SYNC_RSEG_HEADER);
- header = TRX_RSEG + buf_block_get_frame(block);
- return(header);
- }
从注释可以看到, 这个方法是用来获取回滚段的header的;
再结合gdb中thread的堆栈信息, 可以大致还原出问题的场景:
两个线程在对undo log & rollback segment进行操作, 在对bp的pages进行加锁(从file读取page, 或者是对bp_instance加锁)发生争用, 导致这两个thread一直处于os_wait状态,
最终引起了Semaphore wait导致主动Crash;
而引起这种争用的原因就是对undo log的操作(purge和get), 所以考虑从两个方面来入手解决问题:
增加bp_instance的数量, 减少争用的可能性;
增大block_size, 减少争用的可能性, 并降低undo log的相关操作(往rollback segment list添加内容或者是唤醒purge线程的频率)
那么最终给出了解决方案的建议:
调整buffer_pool_instance的数量 : 8 -> 32
修改block_size的数量 : 8k -> 16k
数据库升级 : 5.7.17 -> 5.7.19
在5.7.18有一个和undo log相关的修复, 感觉可能有效, 就一起升级了吧~
点击(此处)折叠或打开
- InnoDB: The restriction that required the first undo tablespace to use space ID 1 was removed. The first undo tablespace may now be assigned a space ID other than 1. Space ID values for undo tablespaces are still assigned in a consecutive sequence. (Bug #25551311)
彩蛋:
这个问题其实持续了比较长的时间, 从开始排查到最后给出解决建议差不多有半个月了, 抓取故障信息的过程中, 也找到了引起这个问题的嫌疑SQL;
写了个shell脚本, 挂在后台跑, 没几个小时就复现了这个问题, 也是Semaphore wait, 不过没有导致同步延迟, 残念~_(:з」∠)_ .....
不过从出问题的error log来看, 是在代码的同一行出的问题, 可惜系统包版本的问题, gdb抓不到堆栈信息, 所以没办法拿出确凿的证据证明这个SQL就是罪魁祸首;
惊喜:
难得看到最后, 给各位看到最后的看官一个惊喜~ 业务更倾向于......
数据库降级!
数据库降级?
数据库降级.......
(╯‵□′)╯︵┻━┻
所以??
所以把那个SQL挂起来吧, 嗯嗯...
点击(此处)折叠或打开
- select r.id
- from tb_r r
- inner join tb_a a on r.id = a.id
- left join tb_asf asf on r.id = asf.id and asf.created_time > '2999-99-99 99:99:99'
- inner join tb_ua ua on a.id = ua.id
- where r.type=0 and r.status = 0 and a.type > 0 and r.num > 0 and ua.num=2
- GROUP BY r.id
- order by a.col1 desc , SUM(IFNULL(asf.col2,0)) desc, a.col3 desc
- limit 28
PS:
所谓的惊喜还是吐槽为主啦~
其实挂了一个修正过后的只读实例, 一直跑着这个语句进行验证, 持续了三天也没有出现Semaphore wait的问题了~
业务还是觉得求稳, 用以前没出问题的老版本也无可厚非~
此事已结~持续半个月的debug is finished