zabbix template pg_monz for PostgreSQL

I have previously written some articles on Zabbix and Ganglia monitoring; interested readers can refer to:

http://blog.163.com/digoal@126/blog/#m=0&t=1&c=fks_084075080085088066084084082095085080082075083081086071084

Ganglia's use cases are limited. Its strengths are graphing and real-time metric collection from a large number of monitored targets, and thanks to rrdtool you can do quite a bit with the graphs, such as adding event regions, aggregating graph attributes, comparing graphs, and so on.

However, Ganglia lacks threshold-based alerting; that kind of functionality is provided by monitoring software such as Zabbix or Nagios.

For what to monitor in PostgreSQL, refer to:

http://blog.163.com/digoal@126/blog/static/1638770402014252816497/

http://blog.163.com/digoal@126/blog/static/163877040201412763135184/

This article covers pg_monz, a Zabbix monitoring template for PostgreSQL:

http://pg-monz.github.io/pg_monz/index-en.html

The deployment steps are as follows:

1. Deploy the Zabbix server

http://blog.163.com/digoal@126/blog/static/1638770402014731111811804/

http://blog.163.com/digoal@126/blog/static/1638770402014811040957/

2. Deploy a Zabbix proxy (optional)

http://blog.163.com/digoal@126/blog/static/16387704020148110173574/

3. Deploy the Zabbix agent (it is recommended to install it on the monitored database server itself, connect to the database via 127.0.0.1 or a Unix socket, and use trust authentication or a .pgpass file; a minimal pg_hba.conf sketch follows the link below)

http://blog.163.com/digoal@126/blog/static/163877040201481103924780/
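A minimal sketch of the local trust entries in pg_hba.conf on the monitored server (adjust the role name to whatever you use for PGROLE):

local   all    postgres                    trust
host    all    postgres    127.0.0.1/32    trust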

4. Download the template (on the Zabbix server)

https://raw.githubusercontent.com/pg-monz/pg_monz/master/pg_monz/pg_monz_template.xml

(Note: do not simply use the browser's "Save As" on that page; it is best to git clone the repository, or copy the raw contents into an XML file by hand.)

# git clone https://github.com/pg-monz/pg_monz
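If you only need the template file, fetching the raw XML directly also works, for example:

# curl -O https://raw.githubusercontent.com/pg-monz/pg_monz/master/pg_monz/pg_monz_template.xml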

5. Import the template (on the Zabbix server)

In the web UI, click Configuration -> Templates -> Import.


Once the import completes, the pg_monz template appears in the template list.

6. Configure the template:

Adjust certain macros to suit your environment.


Pay attention to what each macro is used for, for example PGSCRIPTDIR:

[root@db-172-16-3-221 pg_monz]# grep -r PGSCRIPTDIR *
pg_monz_template.xml:                    <key>db_table.list.discovery[{$PGHOST},{$PGPORT},{$PGROLE},{$PGDATABASE},{$PGSCRIPTDIR}]</key>
pg_monz_template.xml:                    <key>db.list.discovery[{$PGHOST},{$PGPORT},{$PGROLE},{$PGDATABASE},{$PGSCRIPTDIR}]</key>
pg_monz_template.xml:                    <macro>{$PGSCRIPTDIR}</macro>

Clearly, it is used to locate and call find_dbname.sh and find_dbname_table.sh.
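For example, if you install the scripts under /usr/local/bin as done in step 7 below, set:

{$PGSCRIPTDIR} = /usr/local/bin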

PGLOGDIR is the PostgreSQL log directory:

[root@db-172-16-3-221 pg_monz]# grep -r PGLOGDIR *
pg_monz_template.xml:                    <key>logrt["{$PGLOGDIR}/postgresql-.*\.log","PANIC|FATAL|ERROR|[Ee]rror"]</key>
pg_monz_template.xml:                    <macro>{$PGLOGDIR}</macro>

However, the log directory may differ from one database installation to another, so you can create a symbolic link on every database server to standardize it.

For example:

ln -s $PGDATA/pg_log /var/pg_log
{$PGLOGDIR} = /var/pg_log  (use this value everywhere)

PGROLE may need to be a PostgreSQL superuser, or at least a role that can run every SQL statement used by this template.

If a password is required, configuring .pgpass is recommended.
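If you prefer a dedicated monitoring role instead of postgres, a minimal sketch (the role name zabbix_mon and the password are placeholders; a superuser is the simplest way to guarantee access to all of the statistics views used by the template):

psql -h 127.0.0.1 -p 5432 -U postgres -c "create role zabbix_mon login superuser password 'your_password'"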

7. Download the monitoring scripts and the configuration file (on the Zabbix agent side)

https://github.com/pg-monz/pg_monz/blob/master/pg_monz/find_dbname.sh

https://github.com/pg-monz/pg_monz/blob/master/pg_monz/find_dbname_table.sh

https://github.com/pg-monz/pg_monz/blob/master/pg_monz/userparameter_pgsql.conf

The scripts must be placed in the directory specified by the PGSCRIPTDIR template macro

and made executable:

[root@db-172-16-3-221 pg_monz]# cp find_dbname.sh find_dbname_table.sh /usr/local/bin/
[root@db-172-16-3-221 pg_monz]# chown postgres:postgres /usr/local/bin/find_dbname*.sh
[root@db-172-16-3-221 pg_monz]# chmod 555 /usr/local/bin/find_dbname*.sh

Let's take a quick look at one of the scripts.

find_dbname.sh lists all of the databases in the PostgreSQL instance:

[root@db-172-16-3-221 pg_monz]# cat find_dbname.sh
#!/bin/bash

# Get list of Database Name which you want to monitor.
# By default, template databases (template0/template1) are excluded.
#
# :Example
#
# If you want to monitor "foo" and "bar" databases, you set the GETDB as
# GETDB="select datname from pg_database where datname in ('foo','bar');"

GETDB="select datname from pg_database where datistemplate = 'f';"
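# The loop below assembles the result into Zabbix low-level discovery (LLD) JSON,
# e.g. {"data":[{"{#DBNAME}":"db1"},{"{#DBNAME}":"db2"} ]}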

for dbname in $(psql -h $1 -p $2 -U $3 -d $4 -t -c "${GETDB}"); do
    dblist="$dblist,"'{"{#DBNAME}":"'$dbname'"}'
done
echo '{"data":['${dblist#,}' ]}'

[root@db-172-16-3-221 pg_monz]# export PATH=/opt/pgsql/bin:$PATH
[root@db-172-16-3-221 pg_monz]# . ./find_dbname.sh 127.0.0.1 5432 postgres postgres
{"data":[{"{#DBNAME}":"postgres"},{"{#DBNAME}":"digoal"} ]}

find_dbname_table.sh outputs the tables of each database in the same way.

Once the databases and tables have been discovered, monitoring queries can be run per database and per table.
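In the template, the discovered {#DBNAME} (and the per-table macros) are substituted into item prototypes; the database-level keys look roughly like the following (see pg_monz_template.xml for the exact definitions):

psql.db_size[{$PGHOST},{$PGPORT},{$PGROLE},{$PGDATABASE},{#DBNAME}]
psql.db_connections[{$PGHOST},{$PGPORT},{$PGROLE},{$PGDATABASE},{#DBNAME}]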

8. Configure the Zabbix agent.

8.1. Point the database's pg_log directory to /var/pg_log via a symlink (it only needs to match PGLOGDIR).

Also confirm that the other macros match the monitored database.

[root@db-172-16-3-221 pg_monz]# ln -s /data01/pgdata/pg_root/pg_log /var/

8.2. You may need to add a .pgpass file so that the connections required by the template work without a password prompt.
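.pgpass lives in the home directory of the OS user that runs zabbix_agentd (postgres here), one connection per line in the form host:port:database:user:password, and must be mode 0600; the password below is of course a placeholder:

echo "127.0.0.1:5432:*:postgres:your_password" > ~postgres/.pgpass
chown postgres:postgres ~postgres/.pgpass
chmod 0600 ~postgres/.pgpass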

8.3. Copy userparameter_pgsql.conf into the Zabbix agent's configuration include directory.

[root@db-172-16-3-221 pg_monz]# cp userparameter_pgsql.conf /opt/zabbix/etc/zabbix_agentd.conf.d/

The monitoring items it defines are as follows:

[root@db-172-16-3-221 zabbix_agentd.conf.d]# cat userparameter_pgsql.conf
# PostgreSQL user parameter

#
# Server specific examples
#
# Get the total number of committed transactions
UserParameter=psql.tx_committed[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select sum(xact_commit) from pg_stat_database"
# Get the total number of rolled back transactions
UserParameter=psql.tx_rolledback[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select sum(xact_rollback) from pg_stat_database"
# Max Connections
UserParameter=psql.server_maxcon[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "show max_connections"
# PostgreSQL is running
UserParameter=psql.running[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select 1" > /dev/null 2>&1 ; echo $?

# Added by SRA OSS
# Get number of checkpoint count (by checkpoint_timeout)
UserParameter=psql.checkpoints_timed[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select checkpoints_timed from pg_stat_bgwriter"
# Get number of checkpoint count (by checkpoint_segments)
UserParameter=psql.checkpoints_req[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select checkpoints_req from pg_stat_bgwriter"
# Get the total number of connections
UserParameter=psql.server_connections[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(*) from pg_stat_activity;"
# Get the total number of active (on processing SQL) connections
UserParameter=psql.active_connections[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(state) from pg_stat_activity where state = 'active'"
# Get the total number of idle connections
UserParameter=psql.idle_connections[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(state) from pg_stat_activity where state = 'idle'"
# Get the total number of idle in transaction connections
UserParameter=psql.idle_tx_connections[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(state) from pg_stat_activity where state = 'idle in transaction'"
# Get the total number of lock-waiting connections
UserParameter=psql.locks_waiting[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(*) from pg_stat_activity where waiting = 't'"

# Get buffer information
UserParameter=psql.buffers_checkpoint[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select buffers_checkpoint from pg_stat_bgwriter"
UserParameter=psql.buffers_clean[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select buffers_clean from pg_stat_bgwriter"
UserParameter=psql.maxwritten_clean[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select maxwritten_clean from pg_stat_bgwriter"
UserParameter=psql.buffers_backend[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select buffers_backend from pg_stat_bgwriter"
UserParameter=psql.buffers_backend_fsync[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select buffers_backend_fsync from pg_stat_bgwriter"
UserParameter=psql.buffers_alloc[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select buffers_alloc from pg_stat_bgwriter"

# Get number of slow queries
UserParameter=psql.slow_queries[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(*) from pg_stat_activity where state = 'active' and now() - query_start > '$5 sec'::interval"
UserParameter=psql.slow_select_queries[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(*) from pg_stat_activity where state = 'active' and now() - query_start > '$5 sec'::interval and query ilike 'select%'"
UserParameter=psql.slow_dml_queries[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select count(*) from pg_stat_activity where state = 'active' and now() - query_start > '$5 sec'::interval and query ~* '^(insert|update|delete)'"

#
# Database specific examples
#
# Get the size of a Database (in bytes)
UserParameter=psql.db_size[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select pg_database_size('$5') from pg_database where datname = '$5'"
# Get number of active connections for a specified database
UserParameter=psql.db_connections[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select numbackends from pg_stat_database where datname = '$5'"
# Get number of tuples returned for a specified database
UserParameter=psql.db_returned[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select tup_returned from pg_stat_database where datname = '$5'"
# Get number of tuples fetched for a specified database
UserParameter=psql.db_fetched[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select tup_fetched from pg_stat_database where datname = '$5'"
# Get number of tuples inserted for a specified database
UserParameter=psql.db_inserted[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select tup_inserted from pg_stat_database where datname = '$5'"
# Get number of tuples updated for a specified database
UserParameter=psql.db_updated[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select tup_updated from pg_stat_database where datname = '$5'"
# Get number of tuples deleted for a specified database
UserParameter=psql.db_deleted[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select tup_deleted from pg_stat_database where datname = '$5'"
# Get number of committed/rolled back transactions for a specified database
UserParameter=psql.db_tx_committed[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select xact_commit from pg_stat_database where datname = '$5'"
UserParameter=psql.db_tx_rolledback[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select xact_rollback from pg_stat_database where datname = '$5'"

# Cache Hit Ratio
UserParameter=psql.cachehit_ratio[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "SELECT round(blks_hit*100/(blks_hit+blks_read), 2) AS cache_hit_ratio FROM pg_stat_database WHERE datname = '$5' and blks_read > 0 union all select 0.00 AS cache_hit_ratio order by cache_hit_ratio desc limit 1"

# Added by SRA OSS
# Get number of temp files
UserParameter=psql.db_temp_files[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select temp_files from pg_stat_database where datname = '$5'"
# Get temp file size (in bytes)
UserParameter=psql.db_temp_bytes[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select temp_bytes from pg_stat_database where datname = '$5'"
# Get percentage of dead tuples of all tables for a specified database
UserParameter=psql.db_dead_tup_ratio[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select round(sum(n_dead_tup)*100/sum(n_live_tup+n_dead_tup), 2) as dead_tup_ratio from pg_stat_all_tables where n_live_tup > 0"
# Get number of deadlocks for a specified database (9.2 or later)
UserParameter=psql.db_deadlocks[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select deadlocks from pg_stat_database where datname = '$5'"

#
# Table specific examples
#
# Get table cache hit ratio of a specific table
UserParameter=psql.table_cachehit_ratio[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select round(heap_blks_hit*100/(heap_blks_hit+heap_blks_read), 2) as cache_hit_ratio from pg_statio_user_tables where schemaname = '$5' and relname = '$6' and heap_blks_read > 0 union all select 0.00 as cache_hit_ratio order by cache_hit_ratio desc limit 1"
# Get number of sequential scans of a specific table
UserParameter=psql.table_seq_scan[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select seq_scan from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
# Get number of index scan of a specific table
UserParameter=psql.table_idx_scan[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select coalesce(idx_scan,0) from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
# Get number of vacuum count of a specific table
UserParameter=psql.table_vacuum_count[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select vacuum_count from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
# Get number of analyze count of a specific table
UserParameter=psql.table_analyze_count[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select analyze_count from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
# Get number of autovacuum count of a specific table
UserParameter=psql.table_autovacuum_count[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select autovacuum_count from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
# Get number of autoanalyze count of a specific table
UserParameter=psql.table_autoanalyze_count[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select autoanalyze_count from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"

# Get number of tuples of a specific table
UserParameter=psql.table_n_tup_ins[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_tup_ins from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_n_tup_upd[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_tup_upd from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_n_tup_del[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_tup_del from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_seq_tup_read[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select seq_tup_read from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_idx_tup_fetch[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select coalesce(idx_tup_fetch,0) from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_n_tup_hot_upd[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_tup_hot_upd from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_n_live_tup[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_live_tup from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"
UserParameter=psql.table_n_dead_tup[*],psql -h $1 -p $2 -U $3 -d $4 -t -c "select n_dead_tup from pg_stat_user_tables where schemaname = '$5' and relname = '$6'"

#
# Discovery Rule
#
# Database Discovery
UserParameter=db.list.discovery[*],$5/find_dbname.sh $1 $2 $3 $4
UserParameter=db_table.list.discovery[*],$5/find_dbname_table.sh $1 $2 $3 $4
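Combined with the keys seen in the template earlier (e.g. db.list.discovery[{$PGHOST},{$PGPORT},{$PGROLE},{$PGDATABASE},{$PGSCRIPTDIR}]), the agent substitutes the positional parameters $1..$5, so the discovery rule effectively runs:

{$PGSCRIPTDIR}/find_dbname.sh {$PGHOST} {$PGPORT} {$PGROLE} {$PGDATABASE}

which, with the macro values used in this article, becomes /usr/local/bin/find_dbname.sh 127.0.0.1 5432 postgres postgres.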

8.4. Configure zabbix_agentd.conf (passive checks)

[root@db-172-16-3-221 etc]# cd /opt/zabbix/etc/
[root@db-172-16-3-221 etc]# ll
total 20
-rw-r--r-- 1 root root 2485 Sep  1 14:45 zabbix_agent.conf
drwxr-xr-x 2 root root 4096 Sep 22 14:24 zabbix_agent.conf.d
-rw-r--r-- 1 root root 7456 Sep  1 17:19 zabbix_agentd.conf
drwxr-xr-x 2 root root 4096 Sep  1 14:45 zabbix_agentd.conf.d

[root@db-172-16-3-221 etc]# vi zabbix_agentd.conf
Include=/opt/zabbix/etc/zabbix_agentd.conf.d/
PidFile=/tmp/zabbix_agentd.pid
LogFile=/tmp/zabbix_agentd.log
LogFileSize=10
EnableRemoteCommands=1
LogRemoteCommands=1
Server=172.16.3.150
ListenPort=10050
ListenIP=0.0.0.0
StartAgents=8
ServerActive=172.16.3.150:10052
Hostname=agent221
RefreshActiveChecks=60

8.5. Start the agent

[root@db-172-16-3-221 etc]# su - postgres -c "/opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/zabbix_agentd.conf"
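To verify locally that the agent resolves the new keys, zabbix_agentd's test mode (-t) can be used; a quick check run as postgres, with the parameters used in this article, might look like:

[root@db-172-16-3-221 etc]# su - postgres -c "/opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/etc/zabbix_agentd.conf -t 'psql.running[127.0.0.1,5432,postgres,postgres]'"
psql.running[127.0.0.1,5432,postgres,postgres]  [t|0]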

9. Create a monitored host, or link the template to a host that is already being monitored.

10. Check that the items of the PostgreSQL template are being refreshed with data.

[Other notes]

1. If a macro value is not universal (for example, hundreds of databases are monitored but their listen ports differ), you can override it with host-level macros.

2. If no data from the PostgreSQL template is being refreshed, check the script permissions and the PATH on the agent side. Besides the two discovery scripts, the agent also has to be able to invoke psql.
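From the Zabbix server (or proxy) side, zabbix_get is handy for an end-to-end check; for example, against the agent used in this article (IP and key parameters are those of this setup):

# zabbix_get -s 172.16.3.221 -p 10050 -k "psql.running[127.0.0.1,5432,postgres,postgres]"
0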

[References]
1. http://pg-monz.github.io/pg_monz/index-en.html

2. https://github.com/pg-monz/pg_monz/tree/master/pg_monz

3. http://blog.163.com/digoal@126/blog/static/1638770402014731111811804/
