Analyzing an Application Problem on Linux with systemtap Monitoring

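Scenario: one day a log file appeared under the PHP code path on our server. We had never noticed it before, and its format was clearly not one we generate: it was very simple, without even a function name or a log level, so it was obviously the output of some third-party library we use. Which process was calling that library and writing the file? We had a suspect, of course, and the content of the log also pointed to one particular process. But we had no proof.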

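Some readers will say: just run lsof -p pid against the suspect process, or lsof file against the file. If the process has this mysterious file open, that proves it is the one writing the log. Unfortunately the process is not a daemon and it runs for a very short time, too short to run lsof against it.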

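This is where systemtap comes in. Because it is so customizable, systemtap is close to an all-purpose Swiss Army knife. The first idea is to monitor sys_open and see which process opens this mysterious file.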

Method 1: monitor sys_open

systemtap can monitor system calls, and open is a system call, so we can monitor it. If the file being opened is the one we are tracking, we print pid and execname, and the culprit is revealed.

The script (file_monitor.stp) is as follows:

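// O_CREAT is 0100 octal (0x40) on x86 Linux, not 0x4; other architectures may differ
function is_open_creating:long (flag:long)
{
    CREAT_FLAG = 0x40
    if (flag & CREAT_FLAG)
    {
        return 1
    }
    return 0
}

probe begin
{
    printf("monitor file begin\n")
}

// On newer kernels the sys_open symbol may not exist under this name;
// the syscall.open tapset probe is an alternative.
probe kernel.function("sys_open")
{
    // The path must match exactly what the process passes to open()
    if (user_string($filename) == "/home/manu/code/shell/temp/abc.log")
    {
        // O_CREAT is carried in the flags argument, not in mode
        creating = is_open_creating($flags);
        if (creating)
        {
            printf("pid %d (%s) create the file %s\n", pid(), execname(), user_string($filename));
        }
        else
        {
            printf("pid %d (%s) open the file %s\n", pid(), execname(), user_string($filename));
        }
    }
}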
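To make the flag test concrete, here is a small shell sketch of the same bit check on real numbers. It assumes the x86 Linux values from asm-generic/fcntl.h (O_WRONLY = 01, O_CREAT = 0100, O_TRUNC = 01000, all octal); has_o_creat is a helper name made up for this illustration, not part of systemtap.

```shell
#!/bin/sh
# Open flags for x86 Linux, written in octal as in asm-generic/fcntl.h.
# Assumption: other architectures may use different values.
O_WRONLY=01
O_CREAT=0100
O_TRUNC=01000

# Succeeds (exit status 0) when the O_CREAT bit is set in the flags word $1.
has_o_creat() {
    [ $(( $1 & $O_CREAT )) -ne 0 ]
}

# Flags a shell typically passes for "> file": write-only, create, truncate.
flags=$(( $O_WRONLY | $O_CREAT | $O_TRUNC ))
if has_o_creat "$flags"; then
    echo "flags $flags: O_CREAT is set"
fi
```

Note that 0100 is octal here, that is 0x40. Reading the same digits as binary gives 0x4, which is a different bit, and a permission mode such as 0666 happens to have that bit set.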

OK, let's start monitoring and see whether we can catch the troublemaker.

root@manu:~/code/systemtap# stap file_monitor.stp
monitor file begin

In another terminal, we use echo to create /home/manu/code/shell/temp/abc.log:

root@manu:~/code/shell/temp# echo abefdf >/home/manu/code/shell/temp/abc.log
root@manu:~/code/shell/temp#

stap captured the event:

root@manu:~/code/systemtap# stap file_monitor.stp
monitor file begin
pid 3024 (bash) create the file /home/manu/code/shell/temp/abc.log

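stap caught the process named bash, PID 3024, creating the file. So far so good, but this method has a fatal flaw. filename is whatever name the process passes to the open system call. It may be the full path /home/manu/code/shell/temp/abc.log, or it may be a relative path, and if it is a relative path our stap script does not catch the event.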

For example, let's append to abc.log:

root@manu:~/code/shell/temp# echo "second line " >> abc.log
root@manu:~/code/shell/temp# cat abc.log
abefdf
second line
root@manu:~/code/shell/temp#

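In the other terminal, stap detected no event.
This method is flawed because we cannot assume whether the process passes an absolute path or a relative one.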
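The mismatch can be reproduced outside systemtap. The sketch below contrasts the literal string a process might pass to open with the absolute path the script compares against; the two only agree after the relative name is resolved against the current directory. abs_path and matches_literally are hypothetical helpers written for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: make a path absolute by resolving its directory
# part against the current working directory.
abs_path() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    # Run cd in a subshell so the caller's working directory is untouched.
    (cd "$dir" 2>/dev/null && printf '%s/%s\n' "$(pwd -P)" "$base")
}

# Literal string comparison, which is all the sys_open script performs.
matches_literally() {
    [ "$1" = "$2" ]
}
```

Run from the directory that holds abc.log, matches_literally abc.log "$(abs_path abc.log)" fails, while abs_path abc.log and abs_path ./abc.log print the same string. The probe handler only sees the literal argument, so it would also need the working directory of the calling process to do this resolution.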

Method 2: monitor the file's inode

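A file can be named in many ways. For example, if the current directory is /home/manu/code/shell/temp/, all of the following refer to the same file, which is what defeats the previous method.

abc.log
./abc.log
../temp/abc.log
/home/manu/code/shell/temp/abc.log
...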
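One way to confirm that these spellings name the same file is to compare device and inode numbers instead of strings. This sketch assumes GNU stat (the -c format option, with %d for the device number and %i for the inode, and -L to follow symlinks); file_id and same_file are hypothetical helper names.

```shell
#!/bin/sh
# Print "device:inode" for a path. Two paths that print the same pair
# refer to the same file, however they are spelled.
file_id() {
    stat -L -c '%d:%i' "$1"
}

# Succeeds when both paths exist and resolve to the same device and inode.
same_file() {
    a=$(file_id "$1") && b=$(file_id "$2") && [ "$a" = "$b" ]
}
```

From inside the directory that holds the file, same_file abc.log ../temp/abc.log succeeds even though the two strings differ.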
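If the file is on disk, the triple (major device number, minor device number, inode) uniquely identifies it. We keep monitoring the same abc.log. My file is on /dev/sda6, whose major and minor device numbers are (0x8, 0x6), and the inode of abc.log is:

361way:/opt # ll -ali abc.log
2099351 -rwxr-xr-x 1 root root 17623 Sep 2 22:55 abc.log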

Our triple is (0x08, 0x06, 2099351).
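The triple can also be derived from the file itself rather than looked up by hand. The sketch below is one possible way, assuming GNU stat (%d prints st_dev in decimal, %i the inode) and the usual glibc layout of st_dev with a major number below 4096 and a minor number below 2^20; split_st_dev and stap_triple are hypothetical helper names.

```shell
#!/bin/sh
# Split a userspace st_dev value (decimal) into "major minor".
# Assumes the glibc encoding with major < 4096 and minor < 2^20.
split_st_dev() {
    dev=$1
    major=$(( ($dev >> 8) & 0xfff ))
    minor=$(( ($dev & 0xff) | (($dev >> 12) & 0xfff00) ))
    printf '%d %d\n' "$major" "$minor"
}

# Print "major minor inode" for a file, in the order the script's
# three command-line arguments expect.
stap_triple() {
    printf '%s %s\n' "$(split_st_dev "$(stat -c %d "$1")")" "$(stat -c %i "$1")"
}
```

For a file on /dev/sda6, st_dev is 2054 (0x806) and split_st_dev 2054 prints 8 6. The output of stap_triple abc.log could then be passed to the monitoring script as its three arguments.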
Below is our monitoring script, inode_monitor.stp:

probe begin
{
    printf("watch inode %d %d %ld begin\n", $1, $2, $3)
}

probe vfs.write
{
    if (dev == MKDEV($1,$2) # major/minor device
        && ino == $3)
        printf("%s(%d) %s 0x%x/%u\n", execname(), pid(), probefunc(), dev, ino)
}

Then we let stap monitor the file that this inode belongs to:

root@manu:~/code/systemtap# stap inode_monitor.stp 0x8 0x6 2099351
watch inode 8 6 2099351 begin

Now the experiment. In a shell terminal we echo two lines into abc.log, then write a script test.sh that also writes to abc.log:

root@manu:~/code/shell/temp# echo abefdf >abc.log
root@manu:~/code/shell/temp# echo "second line" > ./abc.log
root@manu:~/code/shell/temp# cat test.sh
#!/bin/sh
echo "third line " >> ./abc.log
root@manu:~/code/shell/temp# ./test.sh

The stap monitoring script prints:

root@manu:~/code/systemtap# stap inode_monitor.stp 0x8 0x6 2099351
watch inode 8 6 2099351 begin
bash(3024) vfs_write 0x800006/2099351
bash(3024) vfs_write 0x800006/2099351
test.sh(9484) vfs_write 0x800006/2099351

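All three events were captured, whichever spelling of the path was used. That achieves our goal: monitor the file and determine which process writes to the log.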
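The value 0x800006 in the output is the kernel's internal device number, which packs the major number above a 20-bit minor number. The arithmetic can be checked with a short shell sketch; mkdev and split_kdev are hypothetical helper names that mirror the kernel's MKDEV, MAJOR and MINOR macros.

```shell
#!/bin/sh
# Kernel-internal dev_t layout: the minor number occupies the low 20 bits
# (MINORBITS in include/linux/kdev_t.h) and the major number sits above it.
MINORBITS=20

# Combine major ($1) and minor ($2) into a kernel dev_t, printed in hex.
mkdev() {
    printf '0x%x\n' $(( ($1 << $MINORBITS) | $2 ))
}

# Split a kernel dev_t ($1) back into "major minor" in decimal.
split_kdev() {
    printf '%d %d\n' $(( $1 >> $MINORBITS )) $(( $1 & ((1 << $MINORBITS) - 1) ))
}

mkdev 0x8 0x6          # the major/minor pair used for /dev/sda6 above
split_kdev 0x800006    # the dev value printed by the probe
```

The first call prints 0x800006 and the second prints 8 6, which matches the (0x8, 0x6) pair passed on the stap command line.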

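As a tool for dynamically monitoring and tracing the Linux kernel, systemtap has many more capabilities worth exploring. For more, see the IBM community or the official systemtap website. Similar tools are also worth a look: DTrace and ProbeVue (both on Unix platforms) and fanotify (the next-generation inotify file monitoring).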

Other performance tools such as OProfile, Valgrind, Perf and redhat MGR will be covered in a later write-up.

Time: 2024-09-13 03:24:38
