taskset - retrieve or set a process's CPU affinity (affect SYSTEMTAP TIME)

在使用systemtap监控进程或者内核的运行状况时, 我们会发现使用systemtap和不使用systemtap时, 某些操作的运行时间差别会比较大. 这是因为systemtap本身带来的开销导致的, 那么如何减少这部分开销呢?

可选的方法较多, 例如精简systemtap, 减少systemtap的触发事件的范围, 简化handler的逻辑等等.

除此之外, 还有其他的方法, 例如设置CPU亲和, Linux进程使用哪个CPU资源是由内核进行调度的.

被跟踪进程的亲和与stap运行的进程亲和分开, 可以有效的减少stap的影响吗. 实际上影响甚微, 但是有效果.  

其次我发现CentOS 6.4 x64系统下面, 使用CPU 0的话, 性能会远远不如使用其他任何CPU.  (这才是本文主题)

所以改亲和的话, 只要不使用0号CPU就可以了.

例如 : 

服务器上有8个CPU

pg93@db-172-16-3-150-> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7
cpubind: 0
nodebind: 0
membind: 0
pg93@db-172-16-3-150-> cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 16
initial apicid  : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 18
initial apicid  : 18
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 20
initial apicid  : 20
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 22
initial apicid  : 22
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

从以上CPUINFO可以看出, 这里有2个物理CPU, 0和1.

使用dmidecode也可以看到这部分信息

[root@db-172-16-3-150 ~]# dmidecode -t processor
# dmidecode 2.11
SMBIOS 2.6 present.

Handle 0x0400, DMI type 4, 40 bytes
Processor Information
        Socket Designation: CPU1
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: A5 06 01 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 26, Stepping 5
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Multi-threading)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
        Voltage: 1.2 V
        External Clock: 4800 MHz
        Max Speed: 3600 MHz
        Current Speed: 1600 MHz
        Status: Populated, Enabled
        Upgrade: Socket LGA1366
        L1 Cache Handle: 0x0700
        L2 Cache Handle: 0x0701
        L3 Cache Handle: 0x0702
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 4
        Core Enabled: 4
        Thread Count: 4
        Characteristics:
                64-bit capable

Handle 0x0401, DMI type 4, 40 bytes
Processor Information
        Socket Designation: CPU2
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: A5 06 01 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 26, Stepping 5
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Multi-threading)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
        Voltage: 1.2 V
        External Clock: 4800 MHz
        Max Speed: 3600 MHz
        Current Speed: 1600 MHz
        Status: Populated, Idle
        Upgrade: Socket LGA1366
        L1 Cache Handle: 0x0703
        L2 Cache Handle: 0x0704
        L3 Cache Handle: 0x0705
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 4
        Core Enabled: 4
        Thread Count: 4
        Characteristics:
                64-bit capable
[root@db-172-16-3-150 ~]# cat /proc/cpuinfo |grep "core id"
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3
[root@db-172-16-3-150 ~]# cat /proc/cpuinfo |grep "physical id"
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0

在没有运行业务的情况下, 我们要测试PostgreSQL某SQL的IO请求次数和请求时间.

启动PostgreSQL数据库时, 把CPU亲和设置为0. (也就是文章开头提到的性能最烂的做法.)

pg93@db-172-16-3-150-> taskset -c 0 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1

使用亲和1启动psql.

pg93@db-172-16-3-150-> taskset -c 1 psql
psql (9.3.1)
Type "help" for help.
digoal=# select pg_backend_pid();
 pg_backend_pid
----------------
          24781
(1 row)

然后在启动stap时把亲和设置为和PostgreSQL不一样的CPU id. 例如3(这样也确保了不在一个物理core上.). 

[root@db-172-16-3-150 ~]# taskset -c 3 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
  delete a
  println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
  t = gettimeofday_ns() - @entry(gettimeofday_ns())
  # if (execname() == "postgres" && devname != "N/A")  # 这句判断也有一定的影响, 所以不用. 我们使用target模式.
    a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
  printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
  println("query__done ", user_string($arg1), "pid:", pid())
  delete a
}' -x 24781

在psql中连续快速的执行SQL

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.023..3154.782 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared read=94393
 Total runtime: 4710.788 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.040..3151.680 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=32 read=94361
 Total runtime: 4696.888 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.035..3158.134 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=64 read=94329
 Total runtime: 4706.135 ms
(4 rows)

stap输出

query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94417**5461
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94361**5429
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94329**5529
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781

我们看到每次read IO的时间为5461纳秒左右.

如果把stap的亲和设置为非0, 即使与stap的亲和一致也没关系. 那么会发生什么情况呢?

pg93@db-172-16-3-150-> taskset -c 3 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
pg93@db-172-16-3-150-> taskset -c 3 psql
psql (9.3.1)
Type "help" for help.

digoal=# select pg_backend_pid();
 pg_backend_pid
----------------
          24829
(1 row)
[root@db-172-16-3-150 ~]# taskset -c 3 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
  delete a
  println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
  t = gettimeofday_ns() - @entry(gettimeofday_ns())
  # if (execname() == "postgres" && devname != "N/A")  # 这句判断也有一定的影响, 所以不用. 建议使用target模式.
    a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
  printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
  println("query__done ", user_string($arg1), "pid:", pid())
  delete a
}' -x 24829

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.023..2039.248 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared read=94393
 Total runtime: 3065.989 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.036..2039.643 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=32 read=94361
 Total runtime: 3066.841 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.034..2039.005 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=64 read=94329
 Total runtime: 3066.746 ms
(4 rows)

query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94417**3277
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94361**3272
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94329**3273
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829

每次IO响应时间降到了3273纳秒左右, 几乎下降了一半, 所以说CPU 0不行啊.

最后把STAP的亲和设置为4, (与postgres进程分开使用物理CPU)那么会怎么样呢?

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.055..2032.953 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=96 read=94297
 Total runtime: 3059.635 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.044..2034.114 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=128 read=94265
 Total runtime: 3060.832 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.043..2038.548 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=160 read=94233
 Total runtime: 3065.674 ms
(4 rows)

测得的IO响应时间又有了一定的下降, 说明还是有一定效果的. 但是相比不使用CPU 0 这里显得效果比较弱了.

query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94297**3194
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94265**3200
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94233**3195
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829

最后在附一个未开启stap得到的结果.

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.037..1812.712 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=192 read=94201
 Total runtime: 2833.760 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.038..1814.074 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=224 read=94169
 Total runtime: 2835.363 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           

------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.039..1814.641 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=256 read=94137
 Total runtime: 2840.508 ms
(4 rows)

stap开启后, 查询时间长了220毫秒左右. 本文中每个handler带来的额外开销是 : 

digoal=# select 220/94137.0;
        ?column?
------------------------
 0.00233701945037551653
(1 row)
0.002337毫秒.
2337纳秒.

[参考]
1. man taskset

2. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/main-cpu.html

3. http://docs.mongodb.org/manual/administration/production-notes/#mongodb-on-numa-hardware

时间: 2025-01-01 22:04:15

taskset - retrieve or set a process's CPU affinity (affect SYSTEMTAP TIME)的相关文章

CPU亲合力(CPU Affinity)

Bind specific processes to specific processors with a new system call. 使用新的系统调用,可以将一个进程绑定到一个特定的CPU上运行. The ability in Linux to bind one or more processes to one or more processors, called CPU affinity, is a long-requested feature. The idea is to say

起源自天文学的PostgreSQL 优化器成本因子校对

标签 PostgreSQL , 优化器 , 成本因子 , 校对 背景 最近在写一个PostgreSQL数据库培训PPT, 其中有explain的讲解需求, 刚刚接触PostgreSQL的童鞋对PostgreSQL的explain输出中cost的部分可能会一头雾水, 看不懂cost的值和SQL实际运行时间有什么联系. 为了让大家能更加深刻的了解explain, 我打算对explain的cost常量做一次校准, 这个校准的灵感来自天文望远镜赤道仪的校准. PostgreSQL cost常量在校准后,

system Idle Process CPU占用率高是什么原因?如何解决

在操作系统服务里面,都没有禁止它的选项::默认它是占用除了当前应用程序所分配的处理器(CPU)百分比之外的所有占用率,下面我们来了解一下system Idle Process CPU占用率高的问题 system Idle Process是Windows页面内存管理进程项.不能禁止,不能结束. 此项CPU使用率越大,说明可供分配的CPU使用率越多,数字越小则表示CPU使用率越少. 简单的多,此项进程越大越好. 一旦应用程序发出请求,处理器会立刻响应的.在这个进程里出现的CPU占用数值并不是真正的占

【系统】如何控制cpu资源使用

 在我们运维的db环境中,一个机器跑多个mysql实例经常会出现某个进程占用了几乎所有的cpu 进而影响其他的实例运行.基于此我们需要限制某个进程的cpu资源,将其使用 的cpu限定在某个或者某几个固定的cpu上,避免对其他的进程产生影响. 限制cpu的方法有以下几种: 1 taskset 2 cpulimit 3 使用nice和renice设置程序执行的优先级 4 使用ulimit 限制cpu   下面着重介绍taskset cpulimit 的方法. 方法 3 4 在现有环境中操作性不强.

[20130727]linux下使用taskset命令.txt

[20130727]linux下使用taskset命令.txt # rpm -qilf `which taskset`Name        : schedutils                   Relocations: (not relocatable)Version     : 1.4.0                             Vendor: Red Hat, Inc.Release     : 2                             Build

Linux工具集taskset的语法及示例

语法: taskset [options] mask command [arg]... taskset [options] -p [mask] pid 参数: -a, --all-tasks 操作所有的任务线程 -p, --pid 操作已存在的pid -c, --cpu-list 通过列表显示方式设置CPU(逗号相隔) 实例: 指定PID为9588的进程在CPU0上运行: taskset -pc 0 9588 pid 9588's current affinity list: 0,1 pid 9

Linux的进程运行在哪个CPU内核上的方法

  当你在 多核 NUMA 处理器上运行需要较高性能的 HPC(高性能计算)程序或非常消耗网络资源的程序时,CPU/memory 的亲和力是限度其发挥最大性能的重要因素之一.在同一 NUMA 节点上调度最相关的进程可以减少缓慢的远程内存访问.像英特尔 Sandy Bridge 处理器,该处理器有一个集成的 PCIe 控制器,你可以在同一 NUMA 节点上调度网络 I/O 负载(如网卡)来突破 PCI 到 CPU 亲和力限制. 作为性能优化和故障排除的一部分,你可能想知道特定的进程被调度到哪个 C

system idle process是什么进程,可以关闭吗

电脑开启后就多了一个进程,名字是system idle process,不知道system idle process是什么进程,可以关闭吗?还是说电脑中了病毒呢!下面小编将针对这一问题向大家详细介绍system idle process是什么进程. 如上图所示,我们会在任务管理器中找到System Idle Process 进程,那么System Idle Process 系统进程是不是病毒呢?这里可以肯定的告诉大家这个系统进程不是病毒,是Microsoft 公司windows操作系统带有的系统

Win7系统中system idle process是什么进程

  Win7系统中的system idle process是什么进程. 步骤如下: 如上图所示,我们会在任务管理器中找到System Idle Process 进程,那么System Idle Process 系统进程是不是病毒呢?这里可以肯定的告诉大家这个系统进程不是病毒,是Microsoft 公司windows操作系统带有的系统进程.您无法从任务管理器中结束此进程. 该进程是在各个处理器上运行的单个线程,它唯一的任务是在系统没有处理其他线程时占用处理器时间.在任务管理器中,此进程会占用绝大部