Problem description:
In the production environment, the company's in-house instant messaging software kept dropping offline at irregular intervals.
Troubleshooting:
Operations and test staff discovered and reported a network anomaly in the production environment: the virtual IP address of the login server could not be pinged, and the instant messaging tool kept dropping offline intermittently.
In this situation, the first reaction of the on-site staff was that we were under external attack (we had run into attacks before), because they saw messages like the following:
...
Apr 20 18:21:48 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:24:37 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:25:50 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:27:02 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:29:01 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:30:14 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:31:28 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:32:44 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:35:33 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:37:06 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:37:52 localhost ntpd_intres[1732]: host name not found: 0.centos.pool.ntp.org
Apr 20 18:38:12 localhost ntpd_intres[1732]: host name not found: 1.centos.pool.ntp.org
Apr 20 18:38:20 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:38:32 localhost ntpd_intres[1732]: host name not found: 2.centos.pool.ntp.org
Apr 20 18:38:52 localhost ntpd_intres[1732]: host name not found: 3.centos.pool.ntp.org
Apr 20 18:39:29 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:40:43 localhost kernel: possible SYN flooding on port 80. Sending cookies.
...
as well as the contents of the packet capture.
After an initial investigation by the development team, the possibility of an external attack was ruled out, based on the following findings:
- the large number of HTTP requests all came from addresses belonging to the company's own terminal devices;
- the packet capture showed that a problematic terminal was sending roughly 1,000 TCP SYN packets per second (in fact, more than one problematic terminal was found);
- nginx had no explicit backlog configuration, i.e. it was using the nginx default NGX_LISTEN_BACKLOG, which is 511 (see the sketch right after this list);
- the backend detection script performs HTTP status checks against nginx's /status page, and was getting back a 500 status code;
- after 3 consecutive failed checks, the detection script actively triggers a virtual-address switchover, and while the switchover is in progress, services accessed through the virtual IP address are unavailable (this part is expected behavior);
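The backlog point deserves a closer look: as the listen(2) excerpt quoted at the end of this post explains, whatever backlog value the application asks for is silently capped by net.core.somaxconn. A minimal C sketch of that situation (the port is a placeholder and the backlog value 511 simply mirrors NGX_LISTEN_BACKLOG; this is an illustration, not nginx's actual code):

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);             /* placeholder port, not the real service */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }

    /* Ask for a backlog of 511, i.e. the nginx default NGX_LISTEN_BACKLOG.
     * With net.core.somaxconn = 128 the kernel silently truncates this to 128,
     * so the effective accept queue is much smaller than the application expects.
     * (Can be verified with "ss -lnt": the Send-Q column of a LISTEN socket.) */
    if (listen(fd, 511) < 0) { perror("listen"); return 1; }

    printf("listening; requested backlog 511, capped by net.core.somaxconn\n");
    pause();                                  /* keep the socket open for inspection */
    close(fd);
    return 0;
}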
Putting the above together, the cause of the problem could essentially be pinned down as:
- under certain conditions, the problematic terminals keep initiating TCP SYN requests, which makes it look as if a DoS attack is taking place;
- neither the kernel parameters nor the nginx configuration had ever been tuned, so performance bottlenecks existed;
The system configuration at the time was:
- net.core.somaxconn = 128
- net.ipv4.tcp_max_syn_backlog = 2048
- net.ipv4.tcp_syncookies = 1
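These values can be checked directly under /proc/sys (they are what sysctl reports as net.core.somaxconn, net.ipv4.tcp_max_syn_backlog and net.ipv4.tcp_syncookies); a small C sketch for dumping them:

#include <stdio.h>

/* Print the kernel parameters relevant to this incident by reading /proc/sys. */
int main(void)
{
    const char *paths[] = {
        "/proc/sys/net/core/somaxconn",
        "/proc/sys/net/ipv4/tcp_max_syn_backlog",
        "/proc/sys/net/ipv4/tcp_syncookies",
    };
    for (int i = 0; i < 3; i++) {
        FILE *f = fopen(paths[i], "r");
        if (!f) { perror(paths[i]); continue; }
        char buf[64] = "";
        if (fgets(buf, sizeof(buf), f))
            printf("%s = %s", paths[i], buf);   /* buf keeps its trailing newline */
        fclose(f);
    }
    return 0;
}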
In the follow-up e-mail discussion, a certain guru offered the following conclusion:
The platform has SYN attack detection enabled, so the platform will consider port 80 to be under a SYN flood attack, and in that situation even normal keepalive heartbeat detection will be affected. Yesterday's problem was caused by the SYN attack detection, not by the TCP SYN backlog queue being full.
When this conclusion was first given, I raised no objection (since I hadn't studied the matter in depth either~). After some follow-up research, however, I found that the conclusion above is actually flawed:
- regarding keepalive: just like the other kernel parameters, net.ipv4.tcp_keepalive_time had never been adjusted either, and adjusting it globally is generally not recommended anyway; its default value is 7200 seconds. On that time scale, keepalive is simply not a factor when facing something that looks like a SYN flood (if shorter probing were really needed, it would normally be configured per socket; see the sketch right after this list);
- regarding whether the TCP SYN queue was full: the relevant documentation and source code both say that "the SYN cookies mechanism is triggered only when the SYN queue is already full", so the statement above is in fact wrong;
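For the keepalive point, here is a minimal C sketch of the per-socket alternative on Linux (SO_KEEPALIVE plus the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT overrides); the numeric values are illustrative, not taken from the actual deployment:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable TCP keepalive on one socket and override the global defaults
 * (tcp_keepalive_time = 7200s etc.) for this connection only. */
static int enable_keepalive(int fd)
{
    int on = 1;
    int idle = 60;     /* seconds of idle before the first probe (illustrative) */
    int interval = 10; /* seconds between probes (illustrative) */
    int count = 3;     /* failed probes before the connection is declared dead */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0) {
        perror("setsockopt");
        return -1;
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    if (enable_keepalive(fd) == 0)
        printf("keepalive enabled with per-socket overrides\n");
    close(fd);
    return 0;
}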
Although the conclusion above was somewhat off the mark, it still helped push the investigation forward. The actual fix was fairly straightforward: in this incident it is quite clear that the "principal offender" was the problematic terminals, while the untuned kernel parameters and nginx configuration were merely "accomplices". Deal with the principal offender first and the problem is basically solved; the accomplices can, in theory, be let off with a suspended sentence.
As the investigation continued, further conclusions emerged:
- when forwarding terminal requests, nginx sometimes forwarded them to the wrong address (the cause of this is unknown), which in turn invalidated the token the terminal had obtained;
- meanwhile, according to the terminal's logic, it keeps re-establishing connections to fetch the token, which is why the capture showed the terminal continuously sending large numbers of TCP packets (there are actually even bigger problems in the terminal's reconnection logic, but let's not go there).
In addition, the following was also captured while reproducing the problem.
As you can see, at the very beginning of the capture there were not only SYN packets from the terminal; the exchange went SYN -> SYN,ACK -> RST. Only after things had been running for a while did it degrade to SYN packets being sent with no response at all.
This capture was extremely valuable, and further investigation led to the following conclusion:
After sending a SYN, the terminal starts a timer in a separate thread to decide whether the current fd becomes writable within a timeout (reportedly 10 seconds). Under certain conditions (which, given the untuned kernel parameters, were presumably easy to hit), this timeout fires, the business layer decides the connection was not established, and closes the socket with close(). Since no TCP connection has actually been established at that point, the client-side stack does not send a FIN. Later, when the SYN,ACK from the server arrives (the server has no idea that the connection has already been closed), the client's TCP stack simply replies with an RST.
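The terminal's source code is not shown here, but the behaviour described above matches the common non-blocking connect pattern; a minimal C sketch under that assumption (the 10-second timeout, the port and the TEST-NET address are illustrative, not the real VIP):

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Non-blocking connect with a writability timeout: if the SYN,ACK does not
 * arrive within the timeout, the application close()s the fd.  Because no
 * connection was ever established, the stack sends no FIN, and a SYN,ACK
 * that arrives later is answered with an RST by the kernel. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(8080);                      /* illustrative port */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr); /* TEST-NET address, not the real VIP */

    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0 && errno != EINPROGRESS) {
        perror("connect");
        close(fd);
        return 1;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready = poll(&pfd, 1, 10 * 1000);            /* reportedly a 10-second timer */

    int err = 0;
    socklen_t len = sizeof(err);
    if (ready <= 0 ||
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
        /* The business layer decides the connection failed and closes the fd;
         * the half-open entry on the server side is unaffected by this close(). */
        fprintf(stderr, "connect timed out or failed, closing socket\n");
        close(fd);
        return 1;
    }

    printf("connected\n");
    close(fd);
    return 0;
}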
At this point the problem was solved, leaving one last question: how was the production issue triggered in the first place? In theory the problem should have been there all along~~
The conclusion (inferred) is roughly as follows: even without kernel parameter tuning, the machine running nginx can handle a certain level of concurrent connections; as long as the request volume is modest and each request is served quickly, it provides service normally. Some other services had recently gone through version upgrades, and it is suspected that the processing time of some requests increased, dragging down overall request throughput. In addition, the nginx status detection script itself relies on an HTTP interface for its checks, which inevitably tightens the squeeze on connection resources; and once a check fails and a virtual-address switchover is performed, the problem is aggravated further. All of this combined led to the outbreak described above.
Notes on the relevant parameters:
man 2 listen
...
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int listen(int sockfd, int backlog);
...
The behavior of the backlog argument on TCP sockets changed with Linux 2.2.
Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests.
The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.
When syncookies are enabled there is no logical maximum length and this setting is ignored.
See tcp(7) for more information.
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value;
the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.
man 7 tcp
tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
Enable resetting connections if the listening service is too slow and unable to keep up and accept them.
It means that if overflow occurred due to a burst, the connection will recover.
Enable this option only if you are really sure that the listening daemon cannot be tuned to accept connections faster.
Enabling this option can harm the clients of your server.
...
tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
The maximum number of queued connection requests which have still not received an acknowledgement from the connecting client.
If this number is exceeded, the kernel will begin dropping requests.
The default value of 256 is increased to 1024 when the memory present in the system is adequate or greater (>= 128Mb),
and reduced to 128 for those systems with very low memory (<= 32Mb).
It is recommended that if this needs to be increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be modified to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be recompiled.
...
tcp_synack_retries (integer; default: 5; since Linux 2.2)
The maximum number of times a SYN/ACK segment for a passive TCP connection will be retransmitted.
This number should not be higher than 255.
tcp_syncookies (Boolean; since Linux 2.2)
Enable TCP syncookies.
The kernel must be compiled with CONFIG_SYN_COOKIES.
Send out syncookies when the syn backlog queue of a socket overflows.
The syncookies feature attempts to protect a socket from a SYN flood attack.
This should be used as a last resort, if at all.
This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions.
It can cause problems for clients and relays.
It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions.
For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.
...