ganglia customized module : postgresql module

前面我们简单的聊了一下ganglia的架构, 以及它和其他监控软件如nagios, zabbix对比的优缺点.

gmond只整合了一些常用的监控metric, 如果需要监控不在gmond范围内的metric, 那么ganglia提供了一个接口, 可以用于扩展.

例如你可以用C/C++写扩展模块, 也可以用python写扩展模块, 如果你不想写扩展模块, 也可以使用gmetric直接向send channel发送自定义的metric数据.

本文将拿一个开源的ganglia module for postgresql为例, 结合PostgreSQL常用的监控指标来实现ganglia采样postgresql监控指标数据.

这个模块的代码请参见 : 

https://github.com/ganglia/gmond_python_modules

https://github.com/mentalblock/ganglia_postgresql/tree/master/python_modules

https://github.com/elecnix/postgres_gmetric

postgresql python module的详解, 参考

http://blog.163.com/digoal@126/blog/static/1638770402014823102256805/

这个模块提到的监控项有 : 

# Create your queries here. Keys whose names match those defined in the default
# set are overridden. Any additional key-value pairs (i.e. query) will not be
# added to the Ganglia metric definition but can be useful for data purposes.
metric_defs = {
    "pg_backends_waiting": {
        "description": "Number of postgres backends that are waiting",
        "units": "backends",
        "query": "SELECT count(*) AS backends_waiting FROM " + \
            "pg_stat_activity WHERE waiting = 't';"
    },
    "pg_database_size": {
        "description": "Total size of all databases in bytes",
        "value_type": "double",
        "format": "%.0f",
        "units": "bytes",
        "query": "SELECT sum(pg_database_size(d.oid)) AS " + \
            "size_database FROM pg_database d ORDER BY 1 DESC;"
    },
    "pg_idx_blks_read": {
        "description": "Total index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_read) AS idx_blks_read " + \
            "FROM pg_statio_all_indexes;"
    },
    "pg_idx_blks_hit": {
        "description": "Total index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_hit) AS idx_blks_hit " + \
            "FROM pg_statio_all_indexes;"
    },
    "pg_locks": {
        "description": "Number of locks held",
        "units": "locks",
        "query": "SELECT count(*) FROM pg_locks;"
    },
    "pg_query_time_idle_in_txn": {
        "description": 'Age of longest _idle in transaction_ transaction',
        "units": "seconds",
        "query": "SELECT COALESCE(max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-query_start)),0)),0) AS " + \
            "query_time_idle_in_txn FROM pg_stat_activity " + \
            "WHERE current_query = '% in transaction';"
    },
    "pg_max_idle_txn_time": {
        "description": "Age of longest idle transaction",
        "units": "seconds",
        "query": "SELECT COALESCE(max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-query_start)),0)),0) as query_time_max FROM " + \
             "pg_stat_activity WHERE current_query <> '<IDLE>';"
    },
    "pg_txn_time_max": {
        "description": "Age of longest transaction",
        "units": "seconds",
        "query": "SELECT max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-xact_start)),0)) as txn_time_max " + \
            "FROM pg_stat_activity WHERE xact_start IS NOT NULL;"
    },
    "pg_connections": {
        "description": "Number of connections",
        "units": "connctions",
        "query": "SELECT sum(numbackends) FROM pg_stat_database;"
    },
    "pg_wal_files": {
        "description": "number of wal files in pg_xlog directory",
        "units": "# wal files",
        "query": "SELECT count(*) AS wal_files FROM " + \
            "pg_ls_dir('pg_xlog') WHERE pg_ls_dir ~ E'^[0-9A-F]{24}$';"
    },
    "pg_xact_commit": {
        "description": "Transactions committed",
        "slope": "positive",
        "units": "transactions",
        "query": "SELECT sum(xact_commit) as xact_commit FROM " + \
            "pg_stat_database;",
    },
    "pg_xact_rollback": {
        "description": "Transactions rolled back",
        "slope": "positive",
        "units": "transactions",
        "query": "SELECT sum(xact_rollback) as xact_rollback FROM " + \
            "pg_stat_database;",
    },
    "pg_blks_read": {
        "description": "Blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(blks_read) as blks_read FROM " + \
            "pg_stat_database;",
    },
    "pg_blks_hit": {
        "description": "Blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(blks_hit) as blks_hit FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_returned": {
        "description": "Tuples returned",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_returned) as tup_returned FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_fetched": {
        "description": "Tuples fetched",
        "slope"       : "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_fetched) as tup_fetched FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_inserted": {
        "description": "Tuples inserted",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_inserted) as tup_inserted FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_updated": {
        "description": "Tuples updated",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_updated) as tup_updated FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_deleted": {
        "description": "Tuples deleted",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_deleted) as tup_deleted FROM " + \
            "pg_stat_database;",
    },
    "pg_heap_blks_read": {
        "description": "Heap blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(heap_blks_read) as heap_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_heap_blks_hit": {
        "description": "Heap blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(heap_blks_hit) as heap_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_idx_blks_read_tbl": {
        "description": "Index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_read) as idx_blks_read_tbl FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_idx_blks_hit_tbl": {
        "description": "Index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_hit) as idx_blks_hit_tbl FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_toast_blks_read": {
        "description": "Toast blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(toast_blks_read) as toast_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_toast_blks_hit": {
        "description": "Toast blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(toast_blks_hit) as toast_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_tidx_blks_read": {
        "description": "Toast index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(tidx_blks_read) as tidx_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_tidx_blks_hit": {
        "description": "Toast index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(tidx_blks_hit) as tidx_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
}

除此以外, 还可以丰富一下监控项. 如下 : 

postgresql常见监控项 : 

http://blog.163.com/digoal@126/blog/static/163877040201412763135184/

1. 是否打开归档, 布尔逻辑值.

2. 是否打开autovacuum, 布尔逻辑值.

3. 数据库年龄, 数字, 大于2, 低于20亿(大约).

4. 连接数, 低于最大连接数

5. 提交和回滚的比例(问题修复后,手工清除统计信息pg_stat_reset(), 连接到对应的库执行).

6. standby延迟, 字节数, 延迟高说明网络或STANDBY的IO有问题. 

7. 锁等待, 锁等待时间, 长则说明业务或在SQL的处理效率上有问题. 

8. 长事务/空闲事务, 长事务一般可能是业务设计的问题. 

9. prepared事务, 长时间的PREPARED事务可能是业务层出了问题, 例如未及时提交, 或业务以及挂了, prepared事务变成了僵尸事务. 

10. 序列剩余量(每个库查询), 序列如果没有开启cycle, 那么需要注意是否耗尽的问题. 

11. 未使用的索引(每个库查询), 如果索引不常使用, 那么说明索引不如不建. 

以上监控指标都可以量化, 所以使用ganglia来监控是可以的.

常见巡检项 : 

http://blog.163.com/digoal@126/blog/static/1638770402014252816497/

[参考]

1. https://github.com/mentalblock/ganglia_postgresql

2. http://blog.163.com/digoal@126/blog/static/1638770402014833639454/

3. http://blog.163.com/digoal@126/blog/static/1638770402014252816497/

4. http://blog.163.com/digoal@126/blog/static/163877040201412763135184/

5. http://bucardo.org/wiki/Check_postgres

时间: 2024-11-03 21:30:31

ganglia customized module : postgresql module的相关文章

Java9新特性——module模块系统

官方文档:https://docs.oracle.com/javase/9/index.html 关于 java9的新特性,官方原文:https://docs.oracle.com/javase/9/whatsnew/toc.htm 这玩意就是一个列表,具体的技术细节需要根据官方文档挖一挖. modular-模块系统 java9的模块化,从一个独立的开源项目而来,名为Jigsaw. 项目官网:http://openjdk.java.net/projects/jigsaw/ 为什么要使用模块化 j

Es6系列之module and class

Ecmascript 6简称es6,是javascript下一代标准,还处在开发阶段,估计2014年底发布,有关更多浏览器对es6的支持情况,点击这里 今天说说es6里新增的Module和Class. Class 关于class其实前端已有太多的模拟了,因为js本身的弱类型决定了只要你有想法什么编程模式都可以模拟出来,跟class相关的oop模式早已在后端领域扎根了,前端class概念大多是通过function函数来实现,现在我们来看看es6提供了什么语法?,我们先以下例子来说明,这样比较直观

require、module、exports dojo中的三个特殊模块标识

查看dojo源码过程中,发现这三个模块名并不以dojo.dijit.dojox开头,在dojo加载器中属于特殊模块名. require 这是一个上下文智能的加载器. 我们通过dojoConfig配置了一个package:myApp,现在对myApp目录下的文件可以使用如下方式请求: // this is "myApp/topLevelHandlers" define(["dojo"], function(dojo){ dojo.connect(dojo.byId(&

Magento 自定义module

 创建Magento模块 由于我在做我自己的Magento项目,我将使用我自己的项目名"App". 然后,我们要创建以下目录结构 Java代码   app/code/local/App/Shopping/Block   app/code/local/App/Shopping/Controller //controllers基类   app/code/local/App/Shopping/controllers   app/code/local/App/Shopping/etc   ap

《JavaScript设计模式》——9.2 Module(模块)模式

9.2 Module(模块)模式 模块是任何强大应用程序架构中不可或缺的一部分,它通常能够帮助我们清晰地分离和组织项目中的代码单元. 在JavaScript中,有几种用于实现模块的方法,包括: 对象字面量表示法 Module模式 AMD模块 CommonJS模块 ECMAScript Harmony模块 我们稍后将在本书第11章探索后三种方法.Module模式在某种程度上是基于对象字面量,因此首先重新认识对象字面量是有意义的. 9.2.1 对象字面量 在对象字面量表示法中,一个对象被描述为一组包

JavaScript的Module模式编程深入分析_javascript技巧

基础知识 首先我们要大概了解一下Module模式(2007年由YUI的EricMiraglia在博客中提出),如果你已熟悉 Module 模式,可以跳过本部分,直接阅读"高级模式". 匿名函数闭包 匿名函数闭包是JavaScript最棒的特征,没有之一,是它让一切都成为了可能.现在我们来创建一个匿名函数然后立即执行.函数中所有的代码都是在一个闭包中执行的,闭包决定了在整个执行过程中这些代码的私有性和状态. 复制代码 代码如下: (function () { // ... all var

seajs1.3.0源码解析之module依赖有序加载_javascript技巧

这里是seajs loader的核心部分,有些IE兼容的部分还不是很明白,主要是理解各个模块如何依赖有序加载,以及CMD规范. 代码有点长,需要耐心看: 复制代码 代码如下: /** * The core of loader */ ;(function(seajs, util, config) { // 模块缓存 var cachedModules = {} // 接口修改缓存 var cachedModifiers = {} // 编译队列 var compileStack = [] // 模

浅谈angularjs module返回对象的坑(推荐)_AngularJS

通过将module中不同的部件拆分到不同的js文件中,在组装的时候发现module存在一个奇怪的问题,不知道是不是AngularJS的bug.至今没有找到解释. 问题是这样的,按照理解,angular.module('app.main', []);这样一句话相当于从app.main命名空间返回出一个app对象.那么,不论在任何js文件中,我通过该方法获得的app变量所储存的指针都应该指向唯一的一个堆内存,而这个内存中存储的就是这个app对象.这种操作在module的js文件,和controlle

Node.js Error: Cannot find module ‘node-static’错误解决办法

Cannot find module 问题 在测试WebRTC codelab的时候,运行codelab.complete.step1出错,抛出Error: module.js:340     throw err;           ^ Error: Cannot find module 'node-static'     at Function.Module._resolveFilename (module.js:338:15)     at Function.Module._load (m