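We found that some IPs were scraping large amounts of data from our API, so I wrote this script to find IPs that access only a single endpoint and never touch the others. Behavior like this is usually abnormal.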
The script analyzes the access log of our front-end nginx load balancer. The log format is as follows:
114.249.4.96 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
222.128.172.215 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/button_log/ HTTP/1.1" 200 48 "-" "-" "-"
110.72.182.177 - - [15/Jan/2016:23:59:47 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
58.63.7.92 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getgoodsdetail/ HTTP/1.1" 200 877 "-" "-" "-"
117.177.160.218 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
117.177.160.218 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
163.142.55.76 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getuserinfo/ HTTP/1.1" 200 546 "-" "-" "-"
114.112.89.34 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 9532 "-" "-" "-"
58.61.225.110 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"
114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"
114.112.89.34 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 9532 "-" "-" "-"
125.39.170.239 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 30 "-" "-" "-"
110.84.169.57 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
42.81.46.142 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
110.84.169.57 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
117.136.40.148 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 1024 "-" "-" "-"
117.12.243.251 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 48 "-" "-" "-"
117.12.243.251 - - [15/Jan/2016:23:59:50 +0800] "POST /api2/realtimetrack/ HTTP/1.1" 200 82 "-" "-" "-"
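Splitting on whitespace works for the sample lines above, where the IP is field 0 and the URL is field 6. As a minimal sketch of a stricter alternative, the regular expression below pulls out named fields and rejects lines that do not match. The pattern and the `parse_line` helper are assumptions based only on the sample lines shown here, not on the server's actual `log_format`.

```python
import re

# Hypothetical pattern derived from the sample lines above; adjust it if
# your nginx log_format differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


sample = ('58.63.7.92 - - [15/Jan/2016:23:59:48 +0800] '
          '"POST /api2/getgoodsdetail/ HTTP/1.1" 200 877 "-" "-" "-"')
fields = parse_line(sample)
print(fields['ip'], fields['url'], fields['status'])
```

Returning `None` for non-matching lines lets the caller skip malformed entries instead of failing with an `IndexError`.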
Python analysis code:
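#!/usr/bin/env python
# coding: utf8
"""
Check the IPs in the nginx access log for programs that scrape the API.
Rule: find IPs that call the getgoodslist endpoint but hardly any other endpoint.
"""
__author__ = '戴儒锋'

log_path = '/home/logs/nginx/access.log'

# Per-IP hit count for each URL, e.g. {'10.0.0.1': {'/api2/getgoodslist/': 15}}
ip_info = {}

with open(log_path, 'r') as f:
    # Iterate over the file directly rather than loading it all with readlines()
    for line in f:
        fields = line.split()
        # Skip blank or truncated lines that have no URL field
        if len(fields) < 7:
            continue
        # Client IP address
        ip = fields[0]
        # Requested endpoint URL
        url = fields[6]
        # New IP: add it as a key, with an inner dict counting this URL once.
        # Known IP, new URL: add the URL to the inner dict with a count of 1.
        # Known IP, known URL: increment that URL's count.
        if ip not in ip_info:
            ip_info[ip] = {url: 1}
        else:
            if url not in ip_info[ip]:
                ip_info[ip][url] = 1
            else:
                ip_info[ip][url] += 1

# Print every IP that accessed fewer than 3 distinct endpoints
# and called getgoodslist more than 100 times
for ip, value in ip_info.items():
    if len(value) < 3 and value.get('/api2/getgoodslist/', 0) > 100:
        print("IP:%s URL-COUNT:%s" % (ip, value))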
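The nested-dictionary counting in the script can also be written with `collections.defaultdict` and `collections.Counter` from the standard library. The sketch below is one possible restructuring, not the original author's code. The function names `count_urls_by_ip` and `find_suspects` and their parameters are hypothetical, and the demo runs on three lines copied from the log sample with a lowered threshold so that it produces output.

```python
from collections import Counter, defaultdict


def count_urls_by_ip(lines):
    """Map each client IP to a Counter of the URLs it requested."""
    ip_info = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:  # skip blank or truncated lines
            continue
        ip_info[fields[0]][fields[6]] += 1
    return ip_info


def find_suspects(ip_info, target='/api2/getgoodslist/', max_urls=3, min_hits=100):
    """Yield (ip, counts) for IPs that hit fewer than max_urls distinct
    endpoints and called target more than min_hits times."""
    for ip, counts in ip_info.items():
        if len(counts) < max_urls and counts.get(target, 0) > min_hits:
            yield ip, counts


# Three lines taken from the log sample; min_hits is lowered to 1 for the demo.
sample_lines = [
    '114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"',
    '114.244.195.163 - - [15/Jan/2016:23:59:49 +0800] "POST /api2/getgoodslist/ HTTP/1.1" 200 47834 "-" "-" "-"',
    '163.142.55.76 - - [15/Jan/2016:23:59:48 +0800] "POST /api2/getuserinfo/ HTTP/1.1" 200 546 "-" "-" "-"',
]
for ip, counts in find_suspects(count_urls_by_ip(sample_lines), min_hits=1):
    print("IP:%s URL-COUNT:%s" % (ip, dict(counts)))
```

Because `count_urls_by_ip` accepts any iterable of lines, an open file object can be passed in directly, and the thresholds can be tuned without editing the loop.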
Analysis results:
IP:58.63.7.92 URL-COUNT:{'/api2/getgoodsdetail/': 3383, '/api2/getgoodslist/': 550}
IP:58.63.4.71 URL-COUNT:{'/api2/getgoodsdetail/': 4499, '/api2/getgoodslist/': 275}
IP:118.122.120.146 URL-COUNT:{'/api2/getgoodslist/': 443}
IP:114.244.195.163 URL-COUNT:{'/api2/getgoodslist/': 568}
IP:124.72.23.174 URL-COUNT:{'/api2/getgoodslist/': 132}
IP:183.30.79.59 URL-COUNT:{'/api2/getgoodslist/': 322, '/api2/realtimetrack/': 6}
IP:61.140.50.120 URL-COUNT:{'/api2/getgoodslist/': 1402}
IP:171.221.25.108 URL-COUNT:{'/api2/getgoodslist/': 1136}
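To make the output easier to read, the flagged IPs can be ranked by their getgoodslist count, together with the share of each IP's total requests that went to that endpoint. The sketch below copies the counts from the results above. The `summarize` helper and its output layout are hypothetical additions for illustration.

```python
# Counts copied from the analysis results above.
results = {
    '58.63.7.92': {'/api2/getgoodsdetail/': 3383, '/api2/getgoodslist/': 550},
    '58.63.4.71': {'/api2/getgoodsdetail/': 4499, '/api2/getgoodslist/': 275},
    '118.122.120.146': {'/api2/getgoodslist/': 443},
    '114.244.195.163': {'/api2/getgoodslist/': 568},
    '124.72.23.174': {'/api2/getgoodslist/': 132},
    '183.30.79.59': {'/api2/getgoodslist/': 322, '/api2/realtimetrack/': 6},
    '61.140.50.120': {'/api2/getgoodslist/': 1402},
    '171.221.25.108': {'/api2/getgoodslist/': 1136},
}


def summarize(results, target='/api2/getgoodslist/'):
    """Return (ip, target_hits, total_hits, share) rows, most target hits first."""
    rows = []
    for ip, counts in results.items():
        total = sum(counts.values())
        hits = counts.get(target, 0)
        rows.append((ip, hits, total, hits / float(total)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


for ip, hits, total, share in summarize(results):
    print("%-16s %5d / %5d  (%.0f%%)" % (ip, hits, total, share * 100))
```

Five of the eight IPs sent all of their requests to getgoodslist. 58.63.7.92 and 58.63.4.71 mostly called getgoodsdetail, but they still match the filter because they touched fewer than three endpoints.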