ganglia metric extended by mod_python

Ganglia samples monitoring data (metrics) in a modular way. For example:

The built-in metrics are actually implemented in shared libraries written in C; they are loaded and scheduled through the gmond.conf configuration.

[root@db-172-16-3-221 ganglia]# cd /opt/ganglia-core-3.6.0/
[root@db-172-16-3-221 ganglia-core-3.6.0]# ll
total 24
drwxr-xr-x 2 root root 4096 Sep  9 11:03 bin
drwxr-xr-x 3 root root 4096 Sep 15 09:31 etc
drwxr-xr-x 2 root root 4096 Sep  9 11:03 include
drwxr-xr-x 3 root root 4096 Sep  9 11:03 lib64
drwxr-xr-x 2 root root 4096 Sep  9 11:03 sbin
drwxr-xr-x 3 root root 4096 Sep  9 11:03 share
[root@db-172-16-3-221 ganglia-core-3.6.0]# cd lib64/
[root@db-172-16-3-221 lib64]# ll
total 660
drwxr-xr-x 2 root root   4096 Sep  9 11:03 ganglia
lrwxrwxrwx 1 root root     25 Sep  9 11:03 libganglia-3.6.0.so.0 -> libganglia-3.6.0.so.0.0.0
-rwxr-xr-x 1 root root 257554 Sep  9 11:03 libganglia-3.6.0.so.0.0.0
-rw-r--r-- 1 root root 408276 Sep  9 11:03 libganglia.a
-rwxr-xr-x 1 root root   1270 Sep  9 11:03 libganglia.la
lrwxrwxrwx 1 root root     25 Sep  9 11:03 libganglia.so -> libganglia-3.6.0.so.0.0.0
[root@db-172-16-3-221 lib64]# cd ganglia/
[root@db-172-16-3-221 ganglia]# ll
total 700
-rwxr-xr-x 1 root root 86896 Sep  9 11:03 modcpu.so
-rwxr-xr-x 1 root root 84118 Sep  9 11:03 moddisk.so
-rwxr-xr-x 1 root root 17896 Sep  9 11:03 modgstatus.so
-rwxr-xr-x 1 root root 84078 Sep  9 11:03 modload.so
-rwxr-xr-x 1 root root 85832 Sep  9 11:03 modmem.so
-rwxr-xr-x 1 root root 31655 Sep  9 11:03 modmulticpu.so
-rwxr-xr-x 1 root root 84480 Sep  9 11:03 modnet.so
-rwxr-xr-x 1 root root 83862 Sep  9 11:03 modproc.so
-rwxr-xr-x 1 root root 53954 Sep  9 11:03 modpython.so
-rwxr-xr-x 1 root root 85136 Sep  9 11:03 modsys.so

These modules must be loaded in gmond before they can be used to sample metrics:

[root@db-172-16-3-221 ganglia]# cd /opt/ganglia-core-3.6.0/etc/
[root@db-172-16-3-221 etc]# ll
total 24
drwxr-xr-x 2 root root 4096 Sep  9 11:26 conf.d
-rw-r--r-- 1 root root 7973 Sep 10 17:37 gmetad.conf
-rw-r--r-- 1 root root 9157 Sep 15 09:31 gmond.conf
[root@db-172-16-3-221 etc]# less gmond.conf
/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
  module {
    name = "disk_module"
    path = "moddisk.so"
  }
  module {
    name = "load_module"
    path = "modload.so"
  }
  module {
    name = "mem_module"
    path = "modmem.so"
  }
  module {
    name = "net_module"
    path = "modnet.so"
  }
  module {
    name = "proc_module"
    path = "modproc.so"
  }
  module {
    name = "sys_module"
    path = "modsys.so"
  }
}

The modules above are all written in C. To make development easier, Ganglia metric extensions also provide interfaces for PHP, Python, Perl, and other languages.

The relevant options must be specified when compiling Ganglia:

[root@db-172-16-3-221 etc]# cd /opt/soft_bak/ganglia-3.6.0
[root@db-172-16-3-221 ganglia-3.6.0]# ./configure --help
  --enable-perl           include mod_perl and support for metric modules written in perl
  --enable-php            include mod_php and support for metric modules written in php

  --with-python=PATH      Specify prefix for python or full path to interpreter
  --with-perl=PATH        Specify prefix for perl or full path to interpreter
  --with-php=PATH         Specify prefix for php or full path to php-config
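
For example, a build matching the install prefix used in this article might be configured like this (a hypothetical invocation; Python support is built by default when a usable interpreter is found):

./configure --prefix=/opt/ganglia-core-3.6.0 --with-python=/usr/bin/python
make && make install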

Python extension support is built by default; as shown below, the modpython.so shared library is present.

[root@db-172-16-3-221 ~]# cd /opt/ganglia-core-3.6.0/
[root@db-172-16-3-221 ganglia-core-3.6.0]# cd lib64/
[root@db-172-16-3-221 lib64]# cd ganglia/
[root@db-172-16-3-221 ganglia]# ll
total 700
...
-rwxr-xr-x 1 root root 53954 Sep  9 11:03 modpython.so
...

Note modpython.so: this module is obviously also written in C, but it does not sample metrics itself. It embeds a Python interpreter to run Python scripts, and those scripts must be written according to a fixed convention.

For examples, see Ganglia's gmond_python_modules project on GitHub.

https://github.com/ganglia/gmond_python_modules

The following walks briefly through extending metric sampling with Python.

I. Configure gmond to support Python metric modules.

First, write a configuration file that loads modpython.so and specifies both the directory holding the .py scripts and the directory holding their .pyconf configuration files.

[root@db-172-16-3-221 etc]# cd /opt/ganglia-core-3.6.0/etc/conf.d/
[root@db-172-16-3-221 conf.d]# vi modpython.conf
/*
  params - path to the directory where mod_python
           should look for python metric modules

  the "pyconf" files in the include directory below
  will be scanned for configurations for those modules
*/
modules {
  module {
    name = "python_module"
    path = "modpython.so"
    params = "/opt/ganglia-core-3.6.0/lib64/ganglia/python_modules"   # 存放.py脚本的目录
  }
}

include ("/opt/ganglia-core-3.6.0/etc/conf.d/*.pyconf")     # 存放.pyconf配置文件的目录.

Each Python metric module requires two files: a .py script and a .pyconf configuration file.

For example:

the postgres.py script and its corresponding postgres.pyconf configuration file.
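
Under the layout configured above, the two files would live at the following paths (these follow the modpython.conf shown earlier; the python_modules directory may need to be created):

/opt/ganglia-core-3.6.0/lib64/ganglia/python_modules/postgres.py
/opt/ganglia-core-3.6.0/etc/conf.d/postgres.pyconf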

Include the modpython.conf configuration file in gmond.conf:

[root@db-172-16-3-221 etc]# vi /opt/ganglia-core-3.6.0/etc/gmond.conf
include ("/opt/ganglia-core-3.6.0/etc/conf.d/*.conf")

II. Writing a Python metric module

1. Write the .py script

The .py script must follow Ganglia's convention and must contain three kinds of functions:

metric_init(params), metric_handler(name), and metric_cleanup().
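
A minimal skeleton of this convention looks roughly like the following (the names and values here are illustrative only):

def metric_init(params):
    # build and return a list of metric descriptor dictionaries
    return [{'name': 'example_metric', 'call_back': metric_handler,
             'time_max': 90, 'value_type': 'uint', 'units': 'count',
             'slope': 'both', 'format': '%u',
             'description': 'An example metric'}]

def metric_handler(name):
    # return the current value of the metric called <name>
    return 0

def metric_cleanup():
    # release any resources held by the module when gmond shuts down
    pass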

The metric_init(params) function takes exactly one argument, params. It is called when the module is initialized, and its main purpose is to build the metric descriptor dictionaries. (You can of course also run any other initialization logic you need here.)

The params argument is a dictionary whose contents come from the param/value pairs defined in the module section of the .py script's corresponding .pyconf configuration file. For example:

gmond_python_modules / postgresql / conf.d / postgres.pyconf

modules {
   module {
     name = "postgres"
     language = "python"
     param host {
       value = "hostname_goes_here"  # 这里需要填写真实的主机名或IP, 例如127.0.0.1
     }
     param port {
       value = "port_goes_here"    # 这里需要填写真实的postgresql监听端口, 如5432
     }
     param dbname {
       value = "database_goes_here"   # 这里需要填写真实的数据库名, 如postgres
     }
     param username {
       value = "username_goes_here"    # 这里需要填写真实的用户名
     }
     param password {
       value = "password_goes_here"   # 这里需要填写真实的密码
     }
   }
}

The params defined in this .pyconf file are host, port, dbname, username, and password.

In metric_init(params) they can then be used like this:

# Metric descriptors are initialized here
def metric_init(params):
    HOST = str(params.get('host'))
    PORT = str(params.get('port'))
    DB = str(params.get('dbname'))
    USER = str(params.get('username'))
    PASSWORD = str(params.get('password'))

    global pgdsn
    pgdsn = "dbname=" + DB + " host=" + HOST + " user=" + USER + " port=" + PORT + " password=" + PASSWORD

Given the above, the PostgreSQL Python metric module for Ganglia needs to access the database, so it is best to install gmond on the database server itself and create a dedicated superuser that is only allowed to connect from 127.0.0.1, with every other source rejected.

For example:

=> create role digoal superuser login encrypted password 'digoal';
vi $PGDATA/pg_hba.conf
host all digoal 127.0.0.1/32 trust     # allow passwordless local access
host all digoal 0.0.0.0/0 reject    # reject access from any other source
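
Note that pg_hba.conf changes only take effect after a configuration reload, for example:

pg_ctl reload -D $PGDATA
=> select pg_reload_conf();   -- or, equivalently, from a superuser session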

What exactly is a metric descriptor dictionary? Here is an example:

    descriptors = [
        {'name':'Pypg_idle_sessions','units':'Sessions','slope':'both','description':'PG Idle Sessions'},
        {'name':'Pypg_active_sessions','units':'Sessions','slope':'both','description':'PG Active Sessions'},
        {'name':'Pypg_idle_in_transaction_sessions','units':'Sessions','slope':'both','description':'PG Idle In Transaction Sessions'},
        {'name':'Pypg_waiting_sessions','units':'Sessions','slope':'both','description':'PG Waiting Sessions Blocked'},
        {'name':'Pypg_longest_xact','units':'Seconds','slope':'both','description':'PG Longest Transaction in Seconds'},
        {'name':'Pypg_longest_query','units':'Seconds','slope':'both','description':'PG Longest Query in Seconds'},
        {'name':'Pypg_locks_accessexclusive','units':'Locks','slope':'both','description':'PG AccessExclusive Locks read write blocking'},
        {'name':'Pypg_locks_otherexclusive','units':'Locks','slope':'both','description':'PG Exclusive Locks write blocking'},
        {'name':'Pypg_locks_shared','units':'Locks','slope':'both','description':'PG Shared Locks NON blocking'},
        {'name':'Pypg_bgwriter_checkpoints_timed','units':'checkpoints','slope':'positive','description':'PG scheduled checkpoints'},
        {'name':'Pypg_bgwriter_checkpoints_req','units':'checkpoints','slope':'positive','description':'PG unscheduled checkpoints'},
        {'name':'Pypg_bgwriter_checkpoint_write_time','units':'ms','slope':'positive','description':'PG time to write checkpoints to disk'},
        {'name':'Pypg_bgwriter_checkpoint_sync_time','units':'checkpoints','slope':'positive','description':'PG time to sync checkpoints to disk'},
        {'name':'Pypg_bgwriter_buffers_checkpoint','units':'buffers','slope':'positive','description':'PG number of buffers written during checkpoint'},
        {'name':'Pypg_bgwriter_buffers_clean','units':'buffers','slope':'positive','description':'PG number of buffers written by the background writer'},
        {'name':'Pypg_bgwriter_buffers_backend','units':'buffers','slope':'positive','description':'PG number of buffers written directly by a backend'},
        {'name':'Pypg_bgwriter_buffers_alloc','units':'buffers','slope':'positive','description':'PG number of buffers allocated'},
        {'name':'Pypg_transactions','units':'xacts','slope':'positive','description':'PG Transactions'},
        {'name':'Pypg_inserts','units':'tuples','slope':'positive','description':'PG Inserts'},
        {'name':'Pypg_updates','units':'tuples','slope':'positive','description':'PG Updates'},
        {'name':'Pypg_deletes','units':'tuples','slope':'positive','description':'PG Deletes'},
        {'name':'Pypg_reads','units':'tuples','slope':'positive','description':'PG Reads'},
        {'name':'Pypg_blks_diskread','units':'blocks','slope':'positive','description':'PG Blocks Read from Disk'},
        {'name':'Pypg_blks_memread','units':'blocks','slope':'positive','description':'PG Blocks Read from Memory'},
        {'name':'Pypg_tup_seqscan','units':'tuples','slope':'positive','description':'PG Tuples sequentially scanned'},
        {'name':'Pypg_tup_idxfetch','units':'tuples','slope':'positive','description':'PG Tuples fetched from indexes'},
        {'name':'Pypg_hours_since_last_vacuum','units':'hours','slope':'both','description':'PG hours since last vacuum'},
        {'name':'Pypg_hours_since_last_analyze','units':'hours','slope':'both','description':'PG hours since last analyze'}]

    for d in descriptors:
        # Add default values to the dictionary; in this example every metric shares the same call_back function and uses the uint value type
        d.update({'call_back': metric_handler, 'time_max': 90, 'value_type': 'uint', 'format': '%d', 'groups': 'Postgres'})

    return descriptors

The metric descriptor dictionaries returned by metric_init(params) must contain the following basic fields. They may also contain other entries, such as SPOOF_HOST and SPOOF_NAME for spoofing, or groups, which gweb uses to group the graphs:

{'name': '<name>',                           #Name used in configuration
 'call_back': <handler_function>,            #Call back function queried by gmond
 'time_max': int(<time_max>),                #Maximum metric value gathering interval
 'value_type': '<data_type>',                #Data type (string, uint, float, double)
 'units': '<label>',                         #Units label
 'slope': '<slope_type>',                    #Slope ('zero' constant values, 'both' numeric values)
 'format': '<format>',                       #String formatting ('%s', '%u','%f')
 'description': '<description>'}             #Free form metric description

These are the basic fields that every metric must include.

When a metric is sampled, gmond calls the function named in call_back, passing the value of the metric's name field as the argument. That callback is the second required function, metric_handler(name).

For example:

metric_handler('Pypg_hours_since_last_analyze')

In the example above, every metric uses the same metric_handler function:

    for d in descriptors:
        # Add default values to dictionary
        d.update({'call_back': metric_handler, 'time_max': 90, 'value_type': 'uint', 'format': '%d', 'groups': 'Postgres'})

This function is defined as follows:

# Metric handler uses dictionary pg_metrics keys to return values from queries based on metric name
def metric_handler(name):
    pg_metrics = pg_metrics_queries()
    return int(pg_metrics[name])

This handler fires every time any metric is sampled, so every callback goes through pg_metrics_queries(), and much of the work would be repeated.

The module's author clearly took a shortcut here: a cleaner design would be one call_back function per metric, or a parameter on pg_metrics_queries() identifying which metric triggered the call. In practice, though, the @Cache(60) decorator below means the queries actually run at most once per 60-second window, so the repetition is largely harmless.
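
For illustration, the per-metric alternative might look like the sketch below. The METRIC_QUERIES mapping and its queries are hypothetical, not part of the upstream module; the sketch reuses the pgdsn connection string built in metric_init:

# Hypothetical per-metric dispatch: each metric runs only its own query.
METRIC_QUERIES = {
    'Pypg_idle_sessions':
        "select count(*) from pg_stat_activity where state = 'idle'",
    'Pypg_active_sessions':
        "select count(*) from pg_stat_activity where state = 'active'",
}

def metric_handler(name):
    db_conn = psycopg2.connect(pgdsn)
    db_curs = db_conn.cursor()
    db_curs.execute(METRIC_QUERIES[name])
    value = int(db_curs.fetchone()[0])
    db_curs.close()
    db_conn.close()
    return value

The upstream module instead batches all the queries into one function and caches the result for 60 seconds, which keeps the number of database connections down. Its full source follows: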

import psycopg2
import psycopg2.extras
import syslog
import functools
import time

# Cache for postgres query values, this prevents opening db connections for each metric_handler callback
class Cache(object):
    def __init__(self, expiry):
        self.expiry = expiry
        self.curr_time = 0
        self.last_time = 0
        self.last_value = None

    def __call__(self, func):
        @functools.wraps(func)
        def deco(*args, **kwds):
            self.curr_time = time.time()
            if self.curr_time - self.last_time > self.expiry:
                self.last_value = func(*args, **kwds)
                self.last_time = self.curr_time
            return self.last_value
        return deco

# Queries update the pg_metrics dict with their values based on cache interval
@Cache(60)
def pg_metrics_queries():
    pg_metrics = {}
    db_conn = psycopg2.connect(pgdsn)
    db_curs = db_conn.cursor()

    # single session state query avoids multiple scans of pg_stat_activity
    # state is a different column name in postgres 9.2, previous versions will have to update this query accordingly
    db_curs.execute(
        'select state, waiting, \
        extract(epoch from current_timestamp - xact_start)::int, \
        extract(epoch from current_timestamp - query_start)::int from pg_stat_activity;')
    results = db_curs.fetchall()
    active = 0
    idle = 0
    idleintxn = 0
    waiting = 0
    active_results = []
    for state, waiting_flag, xact_start_sec, query_start_sec in results:
        if state == 'active':
            active = int(active + 1)
            # build a list of query start times where query is active
            active_results.append(query_start_sec)
        if state == 'idle':
            idle = int(idle + 1)
        if state == 'idle in transaction':
            idleintxn = int(idleintxn + 1)
        # the upstream code shadowed the waiting counter with the per-row flag
        # and incremented an undefined name; count into a separate variable
        if waiting_flag == True:
            waiting = int(waiting + 1)

    # determine longest transaction in seconds
    sorted_by_xact = sorted(results, key=lambda tup: tup[2], reverse=True)
    longest_xact_in_sec = (sorted_by_xact[0])[2]

    # determine longest active query in seconds
    sorted_by_query = sorted(active_results, reverse=True)
    longest_query_in_sec = sorted_by_query[0]

    pg_metrics.update(
        {'Pypg_idle_sessions':idle,
        'Pypg_active_sessions':active,
        'Pypg_waiting_sessions':waiting,
        'Pypg_idle_in_transaction_sessions':idleintxn,
        'Pypg_longest_xact':longest_xact_in_sec,
        'Pypg_longest_query':longest_query_in_sec})

    # locks query
    db_curs.execute('select mode, locktype from pg_locks;')
    results = db_curs.fetchall()
    accessexclusive = 0
    otherexclusive = 0
    shared = 0
    for mode, locktype in results:
        if (mode == 'AccessExclusiveLock' and locktype != 'virtualxid'):
            accessexclusive = int(accessexclusive + 1)
        if (mode != 'AccessExclusiveLock' and locktype != 'virtualxid'):
            if 'Exclusive' in mode:
                otherexclusive = int(otherexclusive + 1)
        if ('Share' in mode and locktype != 'virtualxid'):
            shared = int(shared + 1)
    pg_metrics.update(
        {'Pypg_locks_accessexclusive':accessexclusive,
        'Pypg_locks_otherexclusive':otherexclusive,
        'Pypg_locks_shared':shared})

    # background writer query returns one row that needs to be parsed
    db_curs.execute(
        'select checkpoints_timed, checkpoints_req, checkpoint_write_time, \
        checkpoint_sync_time, buffers_checkpoint, buffers_clean, \
        buffers_backend, buffers_alloc from pg_stat_bgwriter;')
    results = db_curs.fetchall()
    bgwriter_values = results[0]
    checkpoints_timed = int(bgwriter_values[0])
    checkpoints_req = int(bgwriter_values[1])
    checkpoint_write_time = int(bgwriter_values[2])
    checkpoint_sync_time = int(bgwriter_values[3])
    buffers_checkpoint = int(bgwriter_values[4])
    buffers_clean = int(bgwriter_values[5])
    buffers_backend = int(bgwriter_values[6])
    buffers_alloc = int(bgwriter_values[7])
    pg_metrics.update(
        {'Pypg_bgwriter_checkpoints_timed':checkpoints_timed,
        'Pypg_bgwriter_checkpoints_req':checkpoints_req,
        'Pypg_bgwriter_checkpoint_write_time':checkpoint_write_time,
        'Pypg_bgwriter_checkpoint_sync_time':checkpoint_sync_time,
        'Pypg_bgwriter_buffers_checkpoint':buffers_checkpoint,
        'Pypg_bgwriter_buffers_clean':buffers_clean,
        'Pypg_bgwriter_buffers_backend':buffers_backend,
        'Pypg_bgwriter_buffers_alloc':buffers_alloc})

    # database statistics returns one row that needs to be parsed
    db_curs.execute(
        'select (sum(xact_commit) + sum(xact_rollback)), sum(tup_inserted), \
        sum(tup_updated), sum(tup_deleted), (sum(tup_returned) + sum(tup_fetched)), \
        sum(blks_read), sum(blks_hit) from pg_stat_database;')
    results = db_curs.fetchall()
    pg_stat_db_values = results[0]
    transactions = int(pg_stat_db_values[0])
    inserts = int(pg_stat_db_values[1])
    updates = int(pg_stat_db_values[2])
    deletes = int(pg_stat_db_values[3])
    reads = int(pg_stat_db_values[4])
    blksdisk = int(pg_stat_db_values[5])
    blksmem = int(pg_stat_db_values[6])
    pg_metrics.update(
        {'Pypg_transactions':transactions,
        'Pypg_inserts':inserts,
        'Pypg_updates':updates,
        'Pypg_deletes':deletes,
        'Pypg_reads':reads,
        'Pypg_blks_diskread':blksdisk,
        'Pypg_blks_memread':blksmem})

    # table statistics returns one row that needs to be parsed
    db_curs.execute(
        'select sum(seq_tup_read), sum(idx_tup_fetch), \
        extract(epoch from now() - min(last_vacuum))::int/60/60, \
        extract(epoch from now() - min(last_analyze))::int/60/60 \
        from pg_stat_all_tables;')
    results = db_curs.fetchall()
    pg_stat_table_values = results[0]
    seqscan = int(pg_stat_table_values[0])
    idxfetch = int(pg_stat_table_values[1])
    hours_since_vacuum = int(pg_stat_table_values[2])
    hours_since_analyze = int(pg_stat_table_values[3])
    pg_metrics.update(
        {'Pypg_tup_seqscan':seqscan,
        'Pypg_tup_idxfetch':idxfetch,
        'Pypg_hours_since_last_vacuum':hours_since_vacuum,
        'Pypg_hours_since_last_analyze':hours_since_analyze})

    db_curs.close()
    return pg_metrics

The pg_metrics dictionary above already contains a value for every metric name, so the metric's call_back function simply returns int(pg_metrics[name]):

def metric_handler(name):
    pg_metrics = pg_metrics_queries()
    return int(pg_metrics[name])

The last required function is metric_cleanup(), which is called when gmond shuts down. It is typically used to close open files, disconnect from the network, and so on.

This function will be called once when gmond is shutting down. The cleanup function
should include any code that is required to close files, disconnect from the network,
or perform any other type of cleanup. Like the metric_init function, the cleanup
function must be called metric_cleanup and must not take any parameters. In
addition, the cleanup function must not return a value.

For example:

# ganglia requires metric cleanup
def metric_cleanup():
    '''Clean up the metric module.'''
    pass
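
If a module did hold resources open across samples, the cleanup function would release them. A hypothetical variant follows (a module-level db_conn global is an assumption; the upstream module opens and closes its connection inside pg_metrics_queries):

def metric_cleanup():
    '''Close the persistent database connection, if any.'''
    global db_conn
    if db_conn is not None:
        db_conn.close()
        db_conn = None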

2. Write the .pyconf configuration file for the .py script. Note that its location must match the paths configured in modpython.conf.

Its content differs little from gmond.conf (apart from the modules section).

For example, postgres.pyconf:

modules {
   module {
     name = "postgres"
     language = "python"
     param host {
       value = "hostname_goes_here"
     }
     param port {
       value = "port_goes_here"
     }
     param dbname {
       value = "database_goes_here"
     }
     param username {
       value = "username_goes_here"
     }
     param password {
       value = "password_goes_here"
     }
   }
}

collection_group {
   collect_every = 120
   time_threshold = 120
   metric {
     name = "Pypg_idle_sessions"
     title = "Postgres Idle Sessions"
     value_threshold = 0
   }
   metric {
     name = "Pypg_idle_in_transaction_sessions"
     title = "Postgres Idle In Transaction Sessions"
     value_threshold = 0
   }
   metric {
     name = "Pypg_active_sessions"
     title = "Postgres Active Sessions"
     value_threshold = 0
   }
   metric {
     name = "Pypg_hours_since_last_vacuum"
     title = "Postgres Hours Since Last Vacuum"
     value_threshold = 0
   }
   metric {
     name = "Pypg_hours_since_last_analyze"
     title = "Postgres Hours Since Last Analyze"
     value_threshold = 0
   }
   metric {
     name = "Pypg_waiting_sessions"
     title = "Postgres Waiting Sessions blocked"
     value_threshold = 0
   }
   metric {
     name = "Pypg_locks_accessexclusive"
     title = "Postgres AccessExclusive Locks read write blocking"
     value_threshold = 0
   }
   metric {
     name = "Pypg_locks_otherexclusive"
     title = "Postgres Exclusive Locks write blocking"
     value_threshold = 0
   }
   metric {
     name = "Pypg_locks_shared"
     title = "Postgres Shared Locks NON blocking"
     value_threshold = 0
   }
   metric {
     name = "Pypg_longest_xact"
     title = "Postgres Longest Transaction in Minutes"
     value_threshold = 0
   }
   metric {
     name = "Pypg_longest_query"
     title = "Postgres Longest Active Query in Minutes"
     value_threshold = 0
   }
   metric {
     name = "Pypg_transactions"
     title = "Postgres Transactions"
     value_threshold = 0
   }
   metric {
     name = "Pypg_inserts"
     title = "Postgres Inserts"
     value_threshold = 0
   }
   metric {
     name = "Pypg_updates"
     title = "Postgres Updates"
     value_threshold = 0
   }
   metric {
     name = "Pypg_deletes"
     title = "Postgres Deletes"
     value_threshold = 0
   }
   metric {
     name = "Pypg_reads"
     title = "Postgres Reads"
     value_threshold = 0
   }
   metric {
     name = "Pypg_blks_diskread"
     title = "Postgres Blocks Read From Disk"
     value_threshold = 0
   }
   metric {
     name = "Pypg_blks_memread"
     title = "Postgres Blocks Read From Memory Buffer Cache"
     value_threshold = 0
   }
   metric {
     name = "Pypg_tup_seqscan"
     title = "Postgres Tuples Read From Table Sequence Scans"
     value_threshold = 0
   }
   metric {
     name = "Pypg_tup_idxfetch"
     title = "Postgres Tuples Fetched From Indexes"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_checkpoints_timed"
     title = "Postgres Scheduled Checkpoints"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_checkpoints_req"
     title = "Postgres Unscheduled Checkpoints"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_checkpoint_write_time"
     title = "Postgres Time to Write Checkpoints to Disk"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_checkpoint_sync_time"
     title = "Postgres Time to Sync Checkpoints to Disk"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_buffers_checkpoint"
     title = "Postgres Buffers Written During Checkpoint"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_buffers_clean"
     title = "Postgres Buffers Written By Background Writer"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_buffers_backend"
     title = "Postgres Buffers Written Directly By Backend"
     value_threshold = 0
   }
   metric {
     name = "Pypg_bgwriter_buffers_alloc"
     title = "Postgres Number Of Buffers Allocated"
     value_threshold = 0
   }
}
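
Roughly speaking, collect_every = 120 makes gmond invoke each metric's call_back every 120 seconds; time_threshold = 120 forces the collected values to be sent on the messaging channel at least once every 120 seconds; and value_threshold sets the minimum change in value that triggers an immediate send.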

III. Debugging the Python metric module. The debugging harness also lives in the .py script and prints each metric's value.

For example:

# this code is for debugging and unit testing
if __name__ == '__main__':
    descriptors = metric_init({"host":"hostname_goes_here","port":"port_goes_here","dbname":"database_name_goes_here","username":"username_goes_here","password":"password_goes_here"})
    while True:
        for d in descriptors:
            v = d['call_back'](d['name'])
            print 'value for %s is %u' % (d['name'],  v)
        time.sleep(5)
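
With real connection parameters filled into the __main__ block, the script can be exercised outside gmond by running it directly, which prints one "value for <name> is <value>" line per metric every 5 seconds:

[root@db-172-16-3-221 python_modules]# python postgres.py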

IV. Configuring and deploying the Python metric module

As covered above, this means configuring gmond.conf and the module's .pyconf file.
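
A sketch of the remaining deployment steps under this article's layout (the restart command is an assumption; use whatever mechanism manages gmond on your system):

cp postgres.py /opt/ganglia-core-3.6.0/lib64/ganglia/python_modules/
cp postgres.pyconf /opt/ganglia-core-3.6.0/etc/conf.d/
/opt/ganglia-core-3.6.0/sbin/gmond -m | grep Pypg    # verify the new metrics are registered
service gmond restart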

[References]
1. http://blog.163.com/digoal@126/blog/static/16387704020148345219914/

2. http://blog.163.com/digoal@126/blog/static/163877040201481835916946/

3. http://blog.163.com/digoal@126/blog/static/163877040201481843417188/

4. https://github.com/ganglia/gmond_python_modules/blob/master/postgresql/conf.d/postgres.pyconf

5. https://github.com/ganglia/gmond_python_modules/blob/master/postgresql/python_modules/postgres.py

6. https://github.com/ganglia/gmond_python_modules
