Distributed Communication Optimization: IRQ Affinity

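      During a C500K performance stress test, I ran into a problem: on a machine with 8 processors, the load was concentrated almost entirely on CPU0, whose utilization exceeded 70%, and mpstat showed that CPU0 was spending a large share of its time servicing interrupts (%irq + %soft).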

      This article grew out of investigating, solving, and reflecting on that problem. I hope you enjoy it, and comments and discussion are welcome.

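      Before getting to the main topic, let's review two basic performance-related concepts: interrupts and context switches. In practice I have found that more than 90% of engineers cannot explain them clearly, so I hope this article helps. If you have questions, feel free to get in touch.

      Interrupts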

        Hardware interrupts are used by devices to communicate that they require attention from the operating system. Internally, hardware interrupts are implemented using electronic alerting signals that are sent to the processor from an external device, which is either a part of the computer itself, such as a disk controller, or an external peripheral. For example, pressing a key on the keyboard or moving the mouse triggers hardware interrupts that cause the processor to read the keystroke or mouse position. Unlike the software type (described below), hardware interrupts are asynchronous and can occur in the middle of instruction execution, requiring additional care in programming. The act of initiating a hardware interrupt is referred to as an interrupt request (IRQ).

        A software interrupt is caused either by an exceptional condition in the processor itself, or a special instruction in the instruction set which causes an interrupt when it is executed. The former is often called a trap or exception and is used for errors or events occurring during program execution that are exceptional enough that they cannot be handled within the program itself. For example, if the processor's arithmetic logic unit is commanded to divide a number by zero, this impossible demand will cause a divide-by-zero exception, perhaps causing the computer to abandon the calculation or display an error message. Software interrupt instructions function similarly to subroutine calls and are used for a variety of purposes, such as to request services from low-level system software such as device drivers. For example, computers often use software interrupt instructions to communicate with the disk controller to request data be read or written to the disk.

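        In short: a hard interrupt is raised by hardware to interrupt the CPU and is usually asynchronous. A soft interrupt is raised by an instruction or a condition inside the processor and comes in two flavors: exceptions, and interrupts that behave like subroutine calls. The latter can be used to implement system calls.

      Context switches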

        In computing, a context switch is the process of storing and restoring the state (context) of a process or thread so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU and is an essential feature of a multitasking operating system. What constitutes the context is determined by the processor and the operating system. Context switches are usually computationally intensive, and much of the design of operating systems is to optimize the use of context switches. Switching from one process to another requires a certain amount of time for doing the administration – saving and loading registers and memory maps, updating various tables and lists etc. A context switch can mean a register context switch, a task context switch, a stack frame switch, a thread context switch, or a process context switch.

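        A context switch is carried out in kernel mode and is computationally expensive. A system call by itself is only a kernel mode switch, not a context switch. Context switches come in several forms: between processes, between threads, between stack frames, and so on. For example, the cswch/s and nvcswch/s metrics reported by pidstat -w are per-process figures (voluntary and involuntary context switches per second). Where they come from is easy to see: run grep ctxt /proc/${process_id}/status and it is clear at a glance.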
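
        As a rough illustration of the grep above, the sketch below pulls the two counters out of the text of a status file. The function name ctxt_counts is made up for this example; voluntary_ctxt_switches and nonvoluntary_ctxt_switches are the field names the kernel prints in /proc/<pid>/status.

```shell
#!/bin/sh
# ctxt_counts: hypothetical helper, not part of pidstat or the kernel.
# Reads the contents of a /proc/<pid>/status file on stdin and prints
# "<voluntary> <nonvoluntary>" context switch counts on one line.
ctxt_counts() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { vol = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { nonvol = $2 }
        END { print vol + 0, nonvol + 0 }
    '
}

# Usage on a Linux host (counters of the current shell):
#   ctxt_counts < /proc/$$/status
```

        Sampling the two numbers twice and dividing the difference by the interval gives per-second rates comparable to the two pidstat -w columns.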

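        So here is the question: what exactly is the quantitative relationship between interrupts and context switches? I went through a lot of literature and came back empty-handed. In the end I checked the source code behind the cs, %soft and %irq statistics and found that the interrupt statistics are gathered as its own comment puts it: "Read stats from /proc/interrupts or /proc/softirqs". See the source for the full implementation. Per-process context switch counts, in turn, are obtained by reading /proc/${process_id}/status. As for the quantitative relationship between the two, I cannot derive it with what I know today, and any pointers are welcome.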

        OK, next let's look at the basic techniques you need to master for CPU affinity tuning: RPS/RFS, irqbalance, and IRQ affinity.

      

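      RPS/RFS - Receive Packet Steering/Receive Flow Steering

        These are patches developed by engineers at Google and merged into the kernel in 2.6.35. Put simply, RPS hashes the TCP or UDP packet headers to choose the CPU that handles the softirq work for a packet, and RFS additionally takes into account the CPU on which the consuming application is running. One passage in the documentation sums up the use case best (quoted below). Roughly, it says that for a single-queue NIC, or when there are fewer hardware queues than CPU cores, RPS is a very effective tool as long as the chosen CPUs share the same memory domain.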

        For a single queue device, a typical RPS configuration would be to set the rps_cpus to the CPUs in the same memory domain of the interrupting CPU. If NUMA locality is not an issue, this could also be all CPUs in the system. At high interrupt rate, it might be wise to exclude the interrupting CPU from the map since that already performs much work. For a multi-queue system, if RSS is configured so that a hardware receive queue is mapped to each CPU, then RPS is probably redundant and unnecessary. If there are fewer hardware queues than CPUs, then RPS might be beneficial if the rps_cpus for each queue are the ones that share the same memory domain as the interrupting CPU for that queue.
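
        To make the advice about excluding the interrupting CPU concrete, here is a minimal sketch that builds such an rps_cpus bitmap. The helper name rps_mask_excluding and the eth0/rx-0 names in the usage comment are assumptions for illustration; the sysfs layout /sys/class/net/<dev>/queues/rx-<n>/rps_cpus is the one given in the kernel's scaling.txt. The sketch assumes CPUs are numbered 0..N-1 and that N is below 63, so the mask fits in shell integer arithmetic.

```shell
#!/bin/sh
# rps_mask_excluding: hypothetical helper for illustration only.
# $1 = number of CPUs (assumed to be numbered 0..N-1, N < 63)
# $2 = CPU that services the NIC's hardware interrupt
# Prints a hex bitmap with every CPU set except the interrupting one.
rps_mask_excluding() (
    ncpus=$1
    irq_cpu=$2
    all=$(( (1 << ncpus) - 1 ))                   # bits 0..N-1 set
    printf '%x\n' "$(( all & ~(1 << irq_cpu) ))"  # clear the IRQ CPU's bit
)

# Usage (device and queue names are assumptions; requires root):
#   rps_mask_excluding 8 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```

        On the 8-processor machine from the introduction, with the NIC interrupt landing on CPU0, rps_mask_excluding 8 0 prints fe. Whether all of those CPUs share a memory domain with CPU0 still has to be checked separately, for example with lscpu.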

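        That raises two more questions: how do you tell whether a NIC is multi-queue, and how do you make sure the CPUs share memory? Here is one approach. For the first question, start by locating the NIC with this command: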

        lspci -vvv | grep 'Ethernet controller'

        

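        Then look at the capabilities listed for that device in the full lspci -vvv output: if MSI-X is present with Enable+ and TabSize > 1, the NIC is multi-queue. For the second question, consider using lscpu to inspect the CPU layout and then binding the interrupts to specific physical CPUs.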

      Irqbalance

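        The man page describes it as follows: "distribute hardware interrupts across processors on a multiprocessor system". It has had quite a few problems on SMP systems; see the Ubuntu bug tracker. 褚霸 (Chu Ba) has also published a detailed analysis of its source code, which is worth reading if you are interested.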

       SMP IRQ Affinity

        Finally, let's look at SMP IRQ Affinity, which was added in kernel 2.4:

        An interrupt request (IRQ) is a request for service, sent at the hardware level. Interrupts can be sent by either a dedicated hardware line, or across a hardware bus as an information packet (a Message Signaled Interrupt, or MSI). When interrupts are enabled, receipt of an IRQ prompts a switch to interrupt context. Kernel interrupt dispatch code retrieves the IRQ number and its associated list of registered Interrupt Service Routines (ISRs), and calls each ISR in turn. The ISR acknowledges the interrupt and ignores redundant interrupts from the same IRQ, then queues a deferred handler to finish processing the interrupt and stop the ISR from ignoring future interrupts.

       /proc/interrupts lists the IRQ number, the number of that interrupt handled by each CPU core, the interrupt type, and a comma-delimited list of drivers that are registered to receive that interrupt. (Refer to the proc(5) man page for further details: man 5 proc).
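
       As a small sketch of how that table can be read programmatically, the function below prints the per-CPU counters of a single IRQ. The name irq_counts is invented for this example, and it assumes the usual layout: a header row of CPU names followed by rows that start with "<irq>:" and one counter per CPU.

```shell
#!/bin/sh
# irq_counts: hypothetical helper for illustration only.
# $1 = IRQ number; the contents of /proc/interrupts are read from stdin.
# Prints one line such as "CPU0=1000 CPU1=5" for the requested IRQ.
irq_counts() (
    irq=$1
    awk -v irq="$irq:" '
        NR == 1 {                       # header row: CPU0 CPU1 ...
            ncpu = NF
            for (i = 1; i <= ncpu; i++) name[i] = $i
            next
        }
        $1 == irq {                     # counters follow the "<irq>:" label
            for (i = 1; i <= ncpu; i++)
                printf "%s=%s%s", name[i], $(i + 1), (i < ncpu ? " " : "\n")
        }
    '
)

# Usage on a Linux host (IRQ 42 is the example number used in this post):
#   irq_counts 42 < /proc/interrupts
```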

       /proc/irq/IRQ_NUMBER/smp_affinity: smp_affinity describes the CPU affinity of an interrupt. This property can be used to improve application performance by assigning both interrupt affinity and the application's thread affinity to one or more specific CPU cores. This allows cache line sharing between the specified interrupt and application threads.
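
       The value stored in smp_affinity is a hexadecimal bitmask in which bit n stands for CPU n. The sketch below turns a list of CPU numbers into such a mask; cpus_to_mask is a name made up for this example, and it assumes CPU numbers below 63 so the mask fits in shell integer arithmetic.

```shell
#!/bin/sh
# cpus_to_mask: hypothetical helper for illustration only.
# Arguments: CPU numbers (each assumed to be < 63).
# Prints the matching hexadecimal affinity bitmask.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
    done
    printf '%x\n' "$mask"
)

# Usage (IRQ 42 is the example number used in this post; requires root):
#   cpus_to_mask 2 3 | sudo tee /proc/irq/42/smp_affinity
```

       For example, cpus_to_mask 0 prints 1 (CPU0 only), and cpus_to_mask 0 1 2 3 4 5 6 7 prints ff, the all-CPUs value for an 8-processor machine.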

       How do you verify that your interrupt affinity settings work? Follow the steps below:

       a. Find the NIC's interrupt number:

       cat /proc/interrupts

       b. Check the current CPU affinity of that interrupt:

       sudo cat /proc/irq/42/smp_affinity

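       c. Change the binding (the write has to happen as root, so pipe into sudo tee; with sudo echo the redirection would still run as the unprivileged user):

       echo ff | sudo tee /proc/irq/42/smp_affinity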

       d. Generate network traffic (flood ping requires root):

       sudo ping -f www.creative.com

       e. Check the result of the binding:

       cat /proc/interrupts | grep  'CPU\|42:'
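
       The counters in /proc/interrupts are cumulative, so it is easier to judge the effect of a binding from the difference between two snapshots taken before and after step d. The following is a minimal sketch of that comparison; irq_delta and the /tmp file names are assumptions made for this example.

```shell
#!/bin/sh
# irq_delta: hypothetical helper for illustration only.
# $1 = IRQ number, $2 = earlier snapshot file, $3 = later snapshot file
# (both snapshots are copies of /proc/interrupts).
# Prints "<cpu> <increase>" per CPU for the given IRQ.
irq_delta() (
    irq=$1 before=$2 after=$3
    awk -v irq="$irq:" '
        FNR == 1 {                      # header row of either file
            ncpu = NF
            for (i = 1; i <= ncpu; i++) name[i] = $i
            next
        }
        $1 != irq { next }              # skip all other IRQs
        NR == FNR {                     # first file: remember old counters
            for (i = 1; i <= ncpu; i++) old[i] = $(i + 1)
            next
        }
        {                               # second file: print the increase
            for (i = 1; i <= ncpu; i++) print name[i], $(i + 1) - old[i]
        }
    ' "$before" "$after"
)

# Usage on a Linux host:
#   cat /proc/interrupts > /tmp/irq.before
#   (generate traffic as in step d)
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta 42 /tmp/irq.before /tmp/irq.after
```

       If the binding took effect, the increase should show up only on the CPUs selected by the mask written in step c.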

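Summary:

       On my multi-queue NIC, I set the SMP IRQ Affinity values by hand and ruled out interference from the other two mechanisms, which resolved the performance problem described at the start. The quantitative relationship mentioned earlier still needs further digging; if I find out more, I will share it promptly. I hope you enjoyed the article!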

References:

1. http://en.wikipedia.org/wiki/Interrupt
2. http://en.wikipedia.org/wiki/Context_switch
3. http://wenku.baidu.com/view/315d2c8571fe910ef12df838.html
4. https://www.kernel.org/doc/Documentation/networking/scaling.txt
5. http://kernelnewbies.org/Linux_2_6_35
6. https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt
7. http://www.linfo.org/context_switch.html
8. http://lwn.net/Articles/328339/
9. http://lwn.net/Articles/398385/
10. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html
11. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html (smp_affinity)
12. http://www.softpanorama.org/Admin/Monitoring/Sar/linux_implementation_of_sar.shtml
13. http://choices.cs.uiuc.edu/ExpCS07.pdf
14. http://sebastien.godard.pagesperso-orange.fr/
15. http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

Time: 2025-01-19 20:39:30
