Percona Server 5.7: Parallel Doublewrite

In this blog post, we’ll discuss the ins and outs of Percona Server 5.7 parallel doublewrite.

After implementing parallel LRU flushing as described in the previous post, we went back to benchmarking. At first, we tested with the doublewrite buffer turned off. We wanted to isolate the effect of the parallel LRU flusher, and the results validated the design. Then we turned the doublewrite buffer back on and saw very little, if any, gain from the parallel LRU flusher. What happened? Let’s take a look at the data:

We see that the doublewrite buffer mutex is gone as expected and that the top waiters are the rseg mutexes and the index lock (shouldn’t this be fixed in 5.7?). Then we checked PMP:

Again we see that PFS is not telling the whole story, this time due to a missing annotation in XtraDB. Whereas the PFS results might lead us to leave the flushing analysis and focus on the rseg/undo/purge or check the index lock, PMP clearly shows that a lack of free pages is the biggest source of waits. Turning on the doublewrite buffer makes LRU flushing inadequate again. This data, however, doesn’t tell us why that is.

To see how enabling the doublewrite buffer makes LRU flushing perform worse, we collect PFS and PMP data only for the server flusher (cleaner coordinator, cleaner worker, and LRU flusher) threads and I/O completion threads:

If we zoom in from the whole server to the flushers only, the doublewrite mutex is back. Since we removed its contention for single-page flushes, it must be the flusher threads' batch use of the doublewrite buffer that causes it to reappear. The doublewrite buffer has a single 120-page area that is shared and filled by the flusher threads. Adding a page to the batch is protected by the doublewrite mutex, which serialises the adds and results in the following picture:

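That shared scheme can be sketched as a toy model. This is Python rather than the server's C++, and `SharedDoublewrite`, the thread count, and the per-thread page count are all illustrative, not InnoDB code:

```python
import threading

DBLWR_PAGES = 120  # size of the shared batch area described above

class SharedDoublewrite:
    """Toy model of the shared scheme: one batch buffer for all
    flusher threads, every page add serialised by a single mutex."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.batch = []
        self.flushes = 0

    def add_page(self, page):
        # Every flusher thread contends on this one mutex.
        with self.mutex:
            self.batch.append(page)
            if len(self.batch) == DBLWR_PAGES:
                self._flush_locked()

    def _flush_locked(self):
        # In the real server the full batch is written twice to
        # storage; here we just count the flush and empty the batch.
        self.flushes += 1
        self.batch.clear()

dblwr = SharedDoublewrite()
# Four "flusher threads" adding 60 pages each: 240 adds, two full batches.
threads = [threading.Thread(target=lambda: [dblwr.add_page(p) for p in range(60)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every add takes the same mutex, the adds from all four threads execute strictly one at a time, however many flusher threads are running.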

By now we should be wary of reviewing PFS data without checking its results against PMP. Here it is:

As with the single-page flush doublewrite contention and the waits for a free page in the previous posts, here we have a doublewrite OS event wait that is not annotated for Performance Schema (the same bug 80979):

This is as bad as it looks (the comment is outdated). A running doublewrite flush blocks any doublewrite page add attempts from all the other flusher threads for the duration of the flush (up to 120 data pages written twice to storage):

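A minimal timing sketch of that blocking (again Python with made-up names, not server code; the 0.2-second sleep stands in for writing up to 120 pages twice to storage):

```python
import threading
import time

dblwr_mutex = threading.Lock()

def flush_batch(duration):
    # The batch flush holds the doublewrite mutex for its whole
    # duration, just as described in the post.
    with dblwr_mutex:
        time.sleep(duration)

def timed_page_add():
    # Another flusher thread trying to add a page must wait for the
    # running flush to finish before it can take the mutex.
    start = time.monotonic()
    with dblwr_mutex:
        pass  # the page add itself would happen here
    return time.monotonic() - start

flush_thread = threading.Thread(target=flush_batch, args=(0.2,))
flush_thread.start()
time.sleep(0.05)  # let the flush grab the mutex first
waited = timed_page_add()
flush_thread.join()
print(f"page add blocked for ~{waited:.2f}s")
```

The add is trivial work, yet it is delayed for nearly the entire flush duration; scale the sleep up to a real 120-page double write and the cost to the other flusher threads becomes obvious.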

The issue also occurs with MySQL 5.7 multi-threaded flusher but becomes more acute with the PS 5.7 multi-threaded LRU flusher. There is no inherent reason why all the parallel flusher threads must share the single doublewrite buffer. Each thread can have its own private buffer, and doing so allows us to add to the buffers and flush them independently. This means a lot of synchronisation simply disappears. Adding pages to parallel buffers is fully asynchronous:

And so is flushing them:

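The private-buffer scheme can be sketched the same way (a Python toy model with illustrative names, not the actual Percona Server implementation):

```python
import threading

DBLWR_PAGES = 120  # same batch size, but now one buffer per thread

class PrivateDoublewrite:
    """Toy model of the parallel scheme: each flusher thread owns a
    private buffer, so adds need no mutex, and a flush waits only on
    its own I/O instead of blocking the other threads."""
    def __init__(self):
        self.batch = []
        self.flushes = 0

    def add_page(self, page):
        # Single-owner buffer: no lock taken on the add path.
        self.batch.append(page)
        if len(self.batch) == DBLWR_PAGES:
            self.flush()

    def flush(self):
        # Stands in for writing this thread's batch twice to storage.
        self.flushes += 1
        self.batch.clear()

buffers = [PrivateDoublewrite() for _ in range(4)]

def flusher(buf):
    for page in range(DBLWR_PAGES * 2):  # two full batches per thread
        buf.add_page(page)

threads = [threading.Thread(target=flusher, args=(b,)) for b in buffers]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since each buffer has exactly one owner, the add path takes no lock at all and one thread's flush never delays another thread's adds: the synchronisation between flusher threads simply disappears.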

This behavior is what we shipped in the 5.7.11-4 release, and the performance results were shown in a previous post. To see how the private doublewrite buffer affects flusher threads, let’s look at isolated data for those threads again.

Performance Schema:

It shows the redo log mutex as the current top contention source from the PFS point of view, which is not caused directly by flushing.

PMP data looks better too:

The buf_dblwr_flush_buffered_writes now waits for its own thread I/O to complete and doesn’t block other threads from proceeding. The other top mutex waits belong to the LRU list mutex, which is again not caused directly by flushing.

This concludes the description of the current flushing implementation in Percona Server. To sum up, in this post series we took you through the road to the current XtraDB 5.7 flushing implementation:

  • Under high-concurrency I/O-bound workloads, the server has a high demand for free buffer pages. This demand can be satisfied by either LRU batch flushing or single-page flushing.
  • Single-page flushes cause a lot of doublewrite buffer contention and are bad even with the doublewrite buffer disabled.
  • As in XtraDB 5.6, we removed single-page flushing altogether.
  • Existing cleaner LRU flushing could not satisfy free page demand.
  • Multi-threaded LRU flushing design addresses this issue – if the doublewrite buffer is disabled.
  • If the doublewrite buffer is enabled, MT LRU flushing contends on it, negating its improvements.
  • Parallel doublewrite buffers address this bottleneck.
