传统的监控系统, 通常采用agent+server的方式, agent负责收集监控信息, 主动或被动发送给server, server负责向agent请求监控数据(agent被动), server和agent都通常使用TCP来进行连接.
传统监控的主要弊端, 当被监控的主机很多的情况下, server端的压力会很大, 例如要监控2万台主机的30个监控项, 就有60万个监控数据要从agent收集, 假设每分钟收集一次监控数据, 每秒需要上千次的metric get请求.
ganglia的设计思路比较巧妙, 有效的避免了这些问题.
ganglia分成3个主要组件.
gmond: 负责收集监控数据(metric), 有别于传统的agent, gmond除了收集自己的数据, 同时可以整合整个多播域的监控数据, 也就是说, 一个多播域里面, 单个gmond就可以包含所有的数据. 例如一个多播域有200台主机, 那么200台主机的监控数据可以只从1台gmond获取, 从而减少了服务端以往要从200个主机获取的链接. 并且gmond之间是使用UDP来传输消息的, 在本地网络中比tcp效率要高. gmond 整合了一些常规监控(metric)例如cpu, network, memory, 同时支持c, python, gmetric来扩展监控项.
配置文件在gmond本地配置, 监控数据则通过XDR格式传输(http://en.wikipedia.org/wiki/External_Data_Representation)
gmond之间共享数据主要交给2个模块进行, sender和receiver, sender只负责往多播域发数据, receiver只负责从多播地址监听端口接收数据. 而且sender和receiver可以独立开启, 也就是说一个gmond可以配置为只发数据的模式, 那就类似传统的agent. 而如果配置为只接收数据的话, 就类似传统解决方案的proxy. 例如把整个多播域的所有gmond的数据全部接收到一个或几个gmond主机, 然后server则只需要从这几台中的任意一台gmond get metric即可.
只发不收的成为deaf(聋子), 只收不发的成为mute(哑巴).
gmetad: 负责从gmond获取metric数据, 解析gmond的监控数据, 按照每台主机的每个metric, 将数据写入RRDtools文件, 即每台主机的每个metric对应一个rrdtools文件(s).
因为gmetad的功能比较单一, 所以不使用gmetad, 直接使用SHELL或python写相关功能的脚本也可以代替gmetad的功能.
gmetad除了基本的功能, gmetad还支持从其他gmetad获取数据, 将数据发生给其他监控系统(如Graphite), 或者其他监控系统主动向gmetad请求数据(如nagios).
gweb: 负责监控数据的可视化, 使用RRD数据库.
扩展模块: c, python, gmetric, 因为gmond只整合了一些常见的metric, 如果要扩展监控的话, 需要写扩展模块, 或者直接使用gmetric来向gmond的sender通道发送监控数据, 例如我们要监控一个数据库的指标, 可以自己扩展监控模块.
ganglia core的帮助文件可见一斑 : (core不包含gweb)
[root@db-172-16-3-221 mans]# pwd
/opt/soft_bak/ganglia-3.6.0/mans
-rw-r--r-- 1 root root 2104 May 7 2013 gmetad.1
-rw-r--r-- 1 root root 1177 May 7 2013 gmetad.py.1
-rw-r--r-- 1 root root 2894 May 7 2013 gmetric.1
-rw-r--r-- 1 root root 2680 May 7 2013 gmond.1
-rw-r--r-- 1 root root 2412 May 7 2013 gstat.1
gmetad : Ganglia Meta Daemon
DESCRIPTION
The Ganglia Meta Daemon (gmetad) collects information from multiple gmond or gmetad data sources, saves the
information to local round-robin databases, and exports XML which is the concatentation of all data sources
gmetad.py : Ganglia Meta Daemon in Python
gmond: Ganglia Monitor Daemon
DESCRIPTION
The Ganglia Monitoring Daemon (gmond) listens to the cluster message channel, stores the data in-memory and
when requested will output an XML description of the state of the cluster
gmetric: Ganglia Custom Metric Utility
DESCRIPTION
The Ganglia Metric Client (gmetric) announces a metric on the list of defined send channels defined in a con-
figuration file
其他,
当然ganglia也有其弱点, 例如没有像nagios, zabbix这种监控软件强大的事件管理功能. 需要结合ganglia和类nagios来使用.
也没有像pgstatsinfo这种监控软件的专业方面的功能.
我们一般可以利用ganglia来做数据波动类的监控, 例如负载, 内存使用量, 流量, TPS, 响应延迟, 队列数量, 数据库容量变化, 服务响应延迟, 等.
[参考]
1. http://ganglia.sourceforge.net/
3. gmond, gmetad, gmetric依赖 :
* APR (http://apr.apache.org/)
* libConfuse (http://www.nongnu.org/confuse/)
* expat (http://expat.sourceforge.net/)
* pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
* python (http://www.python.org/)
* PCRE (http://www.pcre.org/)
* RRDtool (http://oss.oetiker.ch/rrdtool/)
4.
Name
ganglia - distributed monitoring system
Version
ganglia 3.6.0
The latest version of this software and document will always be found at
http://ganglia.sourceforge.net/.
Synopsis
______ ___
/ ____/___ _____ ____ _/ (_)___ _
/ / __/ __ `/ __ \/ __ `/ / / __ `/
/ /_/ / /_/ / / / / /_/ / / / /_/ /
\____/\__,_/_/ /_/\__, /_/_/\__,_/
/____/ Distributed Monitoring System
Ganglia is a scalable distributed monitoring system for high-performance
computing systems such as clusters and Grids. It is based on a
hierarchical design targeted at federations of clusters. It relies on a
multicast-based listen/announce protocol to monitor state within
clusters and uses a tree of point-to-point connections amongst
representative cluster nodes to federate clusters and aggregate their
state. It leverages widely used technologies such as XML for data
representation, XDR for compact, portable data transport, and RRDtool
for data storage and visualization. It uses carefully engineered data
structures and algorithms to achieve very low per-node overheads and
high concurrency. The implementation is robust, has been ported to an
extensive set of operating systems and processor architectures, and is
currently in use on over 500 clusters around the world. It has been used
to link clusters across university campuses and around the world and can
scale to handle clusters with 2000 nodes.
The ganglia system is comprised of two unique daemons, a PHP-based web
frontend and a few other small utility programs.
Ganglia Monitoring Daemon (gmond)
Gmond is a multi-threaded daemon which runs on each cluster node you
want to monitor. Installation is easy. You don't have to have a
common NFS filesystem or a database backend, install special
accounts, maintain configuration files or other annoying hassles.
Gmond has four main responsibilities: monitor changes in host state,
announce relevant changes, listen to the state of all other ganglia
nodes via a unicast or multicast channel and answer requests for an
XML description of the cluster state.
Each gmond transmits in information in two different ways:
unicasting/multicasting host state in external data representation
(XDR) format using UDP messages or sending XML over a TCP
connection.
Ganglia Meta Daemon (gmetad)
Federation in Ganglia is achieved using a tree of point-to-point
connections amongst representative cluster nodes to aggregate the
state of multiple clusters. At each node in the tree, a Ganglia Meta
Daemon ("gmetad") periodically polls a collection of child data
sources, parses the collected XML, saves all numeric, volatile
metrics to round-robin databases and exports the aggregated XML over
a TCP sockets to clients. Data sources may be either "gmond"
daemons, representing specific clusters, or other "gmetad" daemons,
representing sets of clusters. Data sources use source IP addresses
for access control and can be specified using multiple IP addresses
for failover. The latter capability is natural for aggregating data
from clusters since each "gmond" daemon contains the entire state of
its cluster.
Ganglia PHP Web Frontend
The Ganglia web frontend provides a view of the gathered information
via real-time dynamic web pages. Most importantly, it displays
Ganglia data in a meaningful way for system administrators and
computer users. Although the web frontend to ganglia started as a
simple HTML view of the XML tree, it has evolved into a system that
keeps a colorful history of all collected data.
The Ganglia web frontend caters to system administrators and users.
For example, one can view the CPU utilization over the past hour,
day, week, month, or year. The web frontend shows similar graphs for
Memory usage, disk usage, network statistics, number of running
processes, and all other Ganglia metrics.
The web frontend depends on the existence of the "gmetad" which
provides it with data from several Ganglia sources. Specifically,
the web frontend will open the local port 8651 (by default) and
expects to receive a Ganglia XML tree. The web pages themselves are
highly dynamic; any change to the Ganglia data appears immediately
on the site. This behavior leads to a very responsive site, but
requires that the full XML tree be parsed on every page access.
Therefore, the Ganglia web frontend should run on a fairly powerful,
dedicated machine if it presents a large amount of data.
The Ganglia web frontend is written in the PHP scripting language,
and uses graphs generated by "gmetad" to display history
information. It has been tested on many flavours of Unix (primarily
Linux) with the Apache webserver and the PHP module (5.0.0 or
later). The GD graphics library for PHP is used to generate pie
charts in the frontend and needs to be installed separately. On
RPM-based system, it is usually provided by the php-gd package.