OProfile & SystemTap

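OProfile adds little overhead when the CPU provides hardware performance counters, which most current CPUs do. Unlike SystemTap (stap), however, it cannot use a timer to print interval or cumulative statistics while it runs; SystemTap can, at the cost of higher overhead. OProfile is suited to performance diagnosis: finding which process on the system uses the most CPU, which functions in that process are the hottest, and which lines within a function cost the most. operf starts the profiling session; opreport and opannotate then report per-symbol statistics or annotate source and assembly with sample counts. SystemTap is better suited to tracing. An example: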


[root@digoal ~]# cd /data06
[root@digoal data06]# operf --system-wide --lazy-conversion
operf: Press Ctl-c or 'kill -SIGINT 45366' to stop profiling
operf: Profiler started
^C
Profiling done.
Converting profile data to OProfile format
................

Generate the report:


[root@digoal data06]# opreport -l -f -w -x -t 1
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        app name                 symbol name
007827a0 2091381  26.6819  /opt/pgsql9.4.1/bin/postgres HeapTupleSatisfiesVacuum
00490300 988600   12.6126  /opt/pgsql9.4.1/bin/postgres heap_page_prune
0078a8c0 698665    8.9136  /opt/pgsql9.4.1/bin/postgres pg_qsort
0058afb0 676022    8.6247  /opt/pgsql9.4.1/bin/postgres vac_cmp_itemptr
0058baf0 385039    4.9123  /opt/pgsql9.4.1/bin/postgres lazy_vacuum_rel
004c4d00 365497    4.6630  /opt/pgsql9.4.1/bin/postgres XLogInsert
00675420 229805    2.9319  /opt/pgsql9.4.1/bin/postgres itemoffcompare
00675d20 184668    2.3560  /opt/pgsql9.4.1/bin/postgres PageRepairFragmentation
0078a7e0 169808    2.1664  /opt/pgsql9.4.1/bin/postgres swapfunc
00655590 147647    1.8837  /opt/pgsql9.4.1/bin/postgres BufferGetBlockNumber
00488940 139389    1.7783  /opt/pgsql9.4.1/bin/postgres heap_prepare_freeze_tuple
007624d0 86239     1.1002  /opt/pgsql9.4.1/bin/postgres hash_search_with_hash_value

[root@digoal data06]# opreport -l -f -g -w -x -t 1 /opt/pgsql/bin/postgres
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        linenr info                 symbol name
007827a0 2091381  26.7572  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c:1116 HeapTupleSatisfiesVacuum
00490300 988600   12.6482  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/pruneheap.c:174 heap_page_prune
0078a8c0 698665    8.9387  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:104 pg_qsort
0058afb0 676022    8.6491  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:1728 vac_cmp_itemptr
0058baf0 385039    4.9262  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:172 lazy_vacuum_rel
004c4d00 365497    4.6762  /opt/soft_bak/postgresql-9.4.1/src/backend/access/transam/xlog.c:844 XLogInsert
00675420 229805    2.9401  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:415 itemoffcompare
00675d20 184668    2.3626  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:433 PageRepairFragmentation
0078a7e0 169808    2.1725  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:78 swapfunc
00655590 147647    1.8890  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/buffer/bufmgr.c:1898 BufferGetBlockNumber
00488940 139389    1.7833  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/heapam.c:5756 heap_prepare_freeze_tuple
007624d0 86239     1.1033  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/hash/dynahash.c:824 hash_search_with_hash_value

The report shows which functions consume the most CPU.
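The per-symbol rows of the linenr report above can also be rolled up per source file to see which files dominate. The following is a minimal sketch, assuming the "vma samples % file:line symbol" column layout shown above and paths without spaces. The function name samples_by_file is made up for illustration and is not part of OProfile.

```shell
# samples_by_file: sum opreport sample counts per source file.
# Reads opreport -l -g style rows on stdin (hypothetical helper, not an OProfile tool).
samples_by_file() {
    awk '
        # Keep only data rows: hex address, numeric sample count, file:line field.
        $1 ~ /^[0-9a-f]+$/ && $2 ~ /^[0-9]+$/ && $4 ~ /:[0-9]+$/ {
            file = $4
            sub(/:[0-9]+$/, "", file)   # strip the trailing :line number
            total[file] += $2
        }
        END {
            for (f in total) print total[f], f
        }
    ' | sort -rn                        # largest sample count first
}

# Usage with three rows copied from the report above.
samples_by_file <<'EOF'
vma      samples  %        linenr info                 symbol name
0078a8c0 698665    8.9387  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:104 pg_qsort
0058afb0 676022    8.6491  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:1728 vac_cmp_itemptr
0078a7e0 169808    2.1725  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:78 swapfunc
EOF
```

With these three rows, qsort.c (pg_qsort plus swapfunc, 868473 samples) ranks above vacuumlazy.c (676022 samples). On a real session the same function could presumably be fed directly from a pipe such as opreport -l -g ... | samples_by_file.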


[root@digoal data06]# opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum|less
Using /data06/oprofile_data/samples/ for session-dir
/* 
 * Command line: opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum 
 * 
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output files where samples count reach 1% of the samples
 * 
 * CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
 */
/* 
 * Total samples for file : "/opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c"
 * 
 * 2091381 100.000
 */
               :/*-------------------------------------------------------------------------
               : *
               : * tqual.c
               : *        POSTGRES "time qualification" code, ie, tuple visibility rules.
               : *
               : * NOTE: all the HeapTupleSatisfies routines will update the tuple's
               : * "hint" status bits if we see that the inserting or deleting transaction
               : * has now committed or aborted (and it is safe to set the hint bits).
               : * If the hint bits are changed, MarkBufferDirtyHint is called on
               : * the passed-in buffer.  The caller must hold not only a pin, but at least
               : * shared buffer content lock on the buffer containing the tuple.
               : *
               : * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
......
1879024 89.8461 :       if (!HeapTupleHeaderXminCommitted(tuple))
               :        {
    63  0.0030 :                if (HeapTupleHeaderXminInvalid(tuple))
               :                        return HEAPTUPLE_DEAD;
               :                /* Used by pre-9.0 binary upgrades */
    18 8.6e-04 :                else if (tuple->t_infomask & HEAP_MOVED_OFF)
               :                {
               :                        TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
               :
......

Almost all of the samples in this file fall on a single line of the code:

if (!HeapTupleHeaderXminCommitted(tuple))
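The percentage opannotate prints for that line can be reproduced from the raw sample counts, and the "count 100000" in the report header gives a rough idea of how many cycles the samples represent. The sketch below is a back-of-envelope calculation only: it assumes each sample stands for about 100000 unhalted cycles and that the estimated clock speed (1995.14 MHz) held throughout, and the function name hot_line_stats is made up for illustration.

```shell
# hot_line_stats HOT_SAMPLES FILE_SAMPLES COUNT MHZ
# All four arguments are read off the opannotate output above (hypothetical helper).
hot_line_stats() {
    awk -v hot="$1" -v total="$2" -v count="$3" -v mhz="$4" 'BEGIN {
        # Share of the file samples that landed on the hot line.
        printf "share of samples: %.4f%%\n", 100 * hot / total
        # Assumption: one sample is taken roughly every COUNT unhalted cycles.
        cycles = total * count
        printf "approx. cycles: %.0f\n", cycles
        # Convert cycles to seconds using the estimated clock speed in MHz.
        printf "approx. CPU seconds: %.1f\n", cycles / (mhz * 1000000)
    }'
}

hot_line_stats 1879024 2091381 100000 1995.14
```

The first output line reproduces the 89.8461% shown by opannotate. The cycle and time figures are only estimates, since the profile was collected system-wide and the clock speed is itself reported as estimated.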

To list the events OProfile supports, run opcontrol --list-events:


[root@digoal data06]# opcontrol --list-events
oprofile: available events for CPU type "Intel Core/i7"
See Intel Architecture Developer's Manual Volume 3B, Appendix A and
Intel Architecture Optimization Reference Manual
For architectures using unit masks, you may be able to specify
unit masks by name.  See 'opcontrol' or 'operf' man page for more details.
CPU_CLK_UNHALTED: (counter: all)
        Clock cycles when not halted (min count: 6000)
UNHALTED_REFERENCE_CYCLES: (counter: all)
        Unhalted reference cycles (min count: 6000)
        Unit masks (default 0x1)
        ----------
        0x01: No unit mask
......

Event configuration:


       --events / -e event1[,event2[,...]]
              This option is for passing a comma-separated list of event specifications for profiling. Each event spec
              is of the form:
                 name:count[:unitmask[:kernel[:user]]]
              You can specify unit mask values using either a numerical value (hex values must begin with "0x") or a
              symbolic name (if the name=<um_name> field is shown in the ophelp output). For some named unit masks,
              the hex value is not unique; thus, OProfile tools enforce specifying such unit masks value by name.

              Event names for some IBM PowerPC systems include a _GRP<n> (group number) suffix. You can pass either
              the full event name or the base event name (i.e., without the suffix) to operf. If the base event name
              is passed, operf will automatically choose an appropriate group number suffix for the event; thus,
              OProfile post-processing tools will always show real event names that include the group number suffix.

              When no event specification is given, the default event for the running processor type will be used for
              profiling. Use ophelp to list the available events for your processor type.
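The name:count[:unitmask[:kernel[:user]]] form above can be assembled mechanically. The following is a minimal sketch; the helper name event_spec is hypothetical, and treating count as a plain decimal integer is an assumption rather than something the excerpt states. It only joins the fields in the documented order and does not check them against the events available on a given CPU.

```shell
# event_spec NAME COUNT [UNITMASK [KERNEL [USER]]]
# Joins the fields with ':' in the order given by the man page excerpt above.
# Hypothetical helper for illustration; not part of OProfile.
event_spec() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        echo "usage: event_spec name count [unitmask [kernel [user]]]" >&2
        return 1
    fi
    # Assumption: the sampling count is a plain decimal integer.
    case "$2" in
        ''|*[!0-9]*)
            echo "event_spec: count must be a decimal integer: $2" >&2
            return 1
            ;;
    esac
    spec="$1:$2"
    shift 2
    # Append whichever optional fields were supplied, in order.
    for field in "$@"; do
        spec="$spec:$field"
    done
    printf '%s\n' "$spec"
}

event_spec CPU_CLK_UNHALTED 100000
event_spec CPU_CLK_UNHALTED 100000 0x00 1 1
```

The event name, count, and unit mask are the ones shown in the report header earlier; the trailing 1 1 are assumed here to switch on kernel-space and user-space sampling. The result could then be passed along the lines of operf --system-wide --events "$(event_spec CPU_CLK_UNHALTED 100000)".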

The following is excerpted from the Red Hat System Administrator's Guide:

OProfile is a low overhead, system-wide performance monitoring tool. It uses the performance monitoring hardware on the processor to retrieve information about the kernel and executables on the system, such as when memory is referenced, the number of L2 cache requests, and the number of hardware interrupts received. On a Red Hat Enterprise Linux system, the oprofile package must be installed to use this tool.

Many processors include dedicated performance monitoring hardware. This hardware makes it possible to detect when certain events happen (such as the requested data not being in cache). The hardware normally takes the form of one or more counters that are incremented each time an event takes place. When the counter value increments, an interrupt is generated, making it possible to control the amount of detail (and therefore, overhead) produced by performance monitoring.

OProfile uses this hardware (or a timer-based substitute in cases where performance monitoring hardware is not present) to collect samples of performance-related data each time a counter generates an interrupt. These samples are periodically written out to disk; later, the data contained in these samples can then be used to generate reports on system-level and application-level performance.

Be aware of the following limitations when using OProfile:

  • Use of shared libraries — Samples for code in shared libraries are not attributed to the particular application unless the --separate=library option is used.
  • Performance monitoring samples are inexact — When a performance monitoring register triggers a sample, the interrupt handling is not precise like a divide by zero exception. Due to the out-of-order execution of instructions by the processor, the sample may be recorded on a nearby instruction.
  • opreport does not associate samples for inline functions properly — opreport uses a simple address range mechanism to determine which function an address is in. Inline function samples are not attributed to the inline function but rather to the function the inline function was inserted into.
  • OProfile accumulates data from multiple runs — OProfile is a system-wide profiler and expects processes to start up and shut down multiple times. Thus, samples from multiple runs accumulate. Use the command opcontrol --reset to clear out the samples from previous runs.
  • Hardware performance counters do not work on guest virtual machines — Because the hardware performance counters are not available on virtual systems, you need to use the timer mode. Enter the command opcontrol --deinit, and then execute modprobe oprofile timer=1 to enable the timer mode.
  • Non-CPU-limited performance problems — OProfile is oriented to finding problems with CPU-limited processes. OProfile does not identify processes that are asleep because they are waiting on locks or for some other event to occur (for example an I/O device to finish an operation).

SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the operating system in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for the collected information.

While using OProfile is suggested in cases of collecting data on where and why the processor spends time in a particular area of code, it is less usable when finding out why the processor stays idle.

You might want to use SystemTap when instrumenting specific places in code. Because SystemTap allows you to run the code instrumentation without having to stop and restart the instrumented code, it is particularly useful for instrumenting the kernel and daemons.

[References]
1. http://oprofile.sourceforge.net/

2. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-OProfile.html

Time: 2024-09-20 19:52:10
