这几天web经常出现Nginx 502的问题,先开始也像很多人一样认为是Nginx的问题,从网上查了查原来是php-fpm在作怪。
web使用的是nginx+php的架构,网站上线还没多久,所以优化方面基本只是做了些初始的配置。
查看php-fpm.log发现有警告,这些警告和网站的挂了个时间基本吻合。我就从这里开始入手。
先开始也是找了些文档,但是第二天还是出现问题。后来查看配置文件并翻译了下(百度),英文底子不好。pm模块类似apache的模块,是分静态和动态的。
网上说的很多调整都是基于动态居多,但是并没说么定义这个模块。所以大家用动态和静态还是要仔细看看配置文件(/usr/local/php/etc/php-fpm.conf)
pm = static
; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI. The below defaults are based on a server without much resources. Don't
; forget to tweak pm.* to fit your needs.
; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
; Note: This value is mandatory.
pm.max_children = 300
; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
;pm.start_servers = 50
; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
;pm.min_spare_servers = 20
; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
;pm.max_spare_servers = 500
; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
pm.process_idle_timeout = 10s;
; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
pm.max_requests = 10240
红色字段就是定义方式的,定义好这个再去根据服务器情况设置参数
假如使用静态 pm.max_children这个参数会起作用,其余不会。动态反之。
2G内存pm.max_children大概开启50左右,按照实际情况来调优,这个是很必要的。
补充:
1.php-fpm进程数不够用
使用 netstat -napo |grep "php-fpm" | wc -l 查看一下当前fastcgi进程个数,如果个数接近conf里配置的上限,就需要调高进程数。
但也不能无休止调高,可以根据服务器内存情况,可以把php-fpm子进程数调到100或以上,在4G内存的服务器上200就可以。
2. 调高调高linux内核打开文件数量
可以使用这些命令(必须是root帐号)
echo 'ulimit -HSn 65536' >> /etc/profile
echo 'ulimit -HSn 65536' >> /etc/rc.local
source /etc/profile
3.脚本执行时间超时
如果脚本因为某种原因长时间等待不返回 ,导致新来的请求不能得到处理,可以适当调小如下配置。
nginx.conf里面主要是如下
fastcgi_connect_timeout 300;
fastcgi_send_timeout 300;
fastcgi_read_timeout 300;
php-fpm.conf里如要是如下
request_terminate_timeout = 10s
4.缓存设置比较小
修改或增加配置到nginx.conf
proxy_buffer_size 64k;
proxy_buffers 512k;
proxy_busy_buffers_size 128k;
5. recv() failed (104: Connection reset by peer) while reading response header from upstream
可能的原因机房网络丢包或者机房有硬件防火墙禁止访问该域名
但最重要的是程序里要设置好超时,不要使用php-fpm的request_terminate_timeout,
最好设成request_terminate_timeout=0;
因为这个参数会直接杀掉php进程,然后重启php进程,这样前端nginx就会返回104: Connection reset by peer。这个过程是很慢,总体感觉就是网站很卡。
May 01 10:50:58.044162 [WARNING] [pool www] child 4074, script '/usr/local/nginx/html/quancha/sameip/detail.php' execution timed out (15.129933 sec), terminating
May 01 10:50:58.045725 [WARNING] [pool www] child 4074 exited on signal 15 SIGTERM after 90.227060 seconds from start
May 01 10:50:58.046818 [NOTICE] [pool www] child 4082 started
说一千道一万最重要的就是程序里控制好超时,gethostbyname、curl、file_get_contents等函数的都要设置超时时间。
另一个就是多说,这个东西是增加了网站的交互性,但是使用的多了反应就慢了,如果你网站超时且使用了多说是,可以关闭它。
如果哪里有不足希望大家提意见,502解决办法。