ganglia - distributed monitor system

传统的监控系统, 通常采用agent+server的方式, agent负责收集监控信息, 主动或被动发送给server, server负责向agent请求监控数据(agent被动), server和agent都通常使用TCP来进行连接.

传统监控的主要弊端, 当被监控的主机很多的情况下, server端的压力会很大, 例如要监控2万台主机的30个监控项, 就有60万个监控数据要从agent收集, 假设每分钟收集一次监控数据, 每秒需要上千次的metric get请求.

ganglia的设计思路比较巧妙, 有效的避免了这些问题.

ganglia分成3个主要组件.

gmond: 负责收集监控数据(metric), 有别于传统的agent, gmond除了收集自己的数据, 同时可以整合整个多播域的监控数据, 也就是说, 一个多播域里面, 单个gmond就可以包含所有的数据. 例如一个多播域有200台主机, 那么200台主机的监控数据可以只从1台gmond获取, 从而减少了服务端以往要从200个主机获取的链接. 并且gmond之间是使用UDP来传输消息的, 在本地网络中比tcp效率要高. gmond 整合了一些常规监控(metric)例如cpu, network, memory, 同时支持c, python, gmetric来扩展监控项.

配置文件在gmond本地配置, 监控数据则通过XDR格式传输(http://en.wikipedia.org/wiki/External_Data_Representation)

gmond之间共享数据主要交给2个模块进行, sender和receiver, sender只负责往多播域发数据, receiver只负责从多播地址监听端口接收数据. 而且sender和receiver可以独立开启, 也就是说一个gmond可以配置为只发数据的模式, 那就类似传统的agent. 而如果配置为只接收数据的话, 就类似传统解决方案的proxy. 例如把整个多播域的所有gmond的数据全部接收到一个或几个gmond主机, 然后server则只需要从这几台中的任意一台gmond get metric即可.

只发不收的成为deaf(聋子), 只收不发的成为mute(哑巴).

gmetad: 负责从gmond获取metric数据, 解析gmond的监控数据, 按照每台主机的每个metric, 将数据写入RRDtools文件, 即每台主机的每个metric对应一个rrdtools文件(s).

因为gmetad的功能比较单一, 所以不使用gmetad, 直接使用SHELL或python写相关功能的脚本也可以代替gmetad的功能.

gmetad除了基本的功能, gmetad还支持从其他gmetad获取数据, 将数据发生给其他监控系统(如Graphite), 或者其他监控系统主动向gmetad请求数据(如nagios).

gweb: 负责监控数据的可视化, 使用RRD数据库.

扩展模块: c, python, gmetric, 因为gmond只整合了一些常见的metric, 如果要扩展监控的话, 需要写扩展模块, 或者直接使用gmetric来向gmond的sender通道发送监控数据, 例如我们要监控一个数据库的指标, 可以自己扩展监控模块.

ganglia core的帮助文件可见一斑 : (core不包含gweb)

[root@db-172-16-3-221 mans]# pwd
/opt/soft_bak/ganglia-3.6.0/mans

-rw-r--r-- 1 root root  2104 May  7  2013 gmetad.1
-rw-r--r-- 1 root root  1177 May  7  2013 gmetad.py.1
-rw-r--r-- 1 root root  2894 May  7  2013 gmetric.1
-rw-r--r-- 1 root root  2680 May  7  2013 gmond.1
-rw-r--r-- 1 root root  2412 May  7  2013 gstat.1

gmetad : Ganglia Meta Daemon
DESCRIPTION
       The  Ganglia  Meta  Daemon  (gmetad) collects information from multiple gmond or gmetad data sources, saves the
       information to local round-robin databases, and exports XML which is the concatentation of all data sources

gmetad.py : Ganglia Meta Daemon in Python

gmond: Ganglia Monitor Daemon
DESCRIPTION
       The  Ganglia  Monitoring  Daemon  (gmond) listens to the cluster message channel, stores the data in-memory and
       when requested will output an XML description of the state of the cluster

gmetric: Ganglia Custom Metric Utility
DESCRIPTION
       The  Ganglia  Metric Client (gmetric) announces a metric on the list of defined send channels defined in a con-
       figuration file

其他,

当然ganglia也有其弱点, 例如没有像nagios, zabbix这种监控软件强大的事件管理功能. 需要结合ganglia和类nagios来使用.

也没有像pgstatsinfo这种监控软件的专业方面的功能.

我们一般可以利用ganglia来做数据波动类的监控, 例如负载, 内存使用量, 流量, TPS, 响应延迟, 队列数量, 数据库容量变化, 服务响应延迟, 等.

[参考]
1. http://ganglia.sourceforge.net/

2. https://github.com/ganglia

3. gmond, gmetad, gmetric依赖 :

* APR (http://apr.apache.org/)

* libConfuse (http://www.nongnu.org/confuse/)

* expat (http://expat.sourceforge.net/)

* pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)

* python (http://www.python.org/)

* PCRE (http://www.pcre.org/)

* RRDtool (http://oss.oetiker.ch/rrdtool/)

Name
    ganglia - distributed monitoring system

Version
    ganglia 3.6.0

    The latest version of this software and document will always be found at
    http://ganglia.sourceforge.net/.

Synopsis
         ______                  ___
        / ____/___ _____  ____ _/ (_)___ _
       / / __/ __ `/ __ \/ __ `/ / / __ `/
      / /_/ / /_/ / / / / /_/ / / / /_/ /
      \____/\__,_/_/ /_/\__, /_/_/\__,_/
                       /____/ Distributed Monitoring System

    Ganglia is a scalable distributed monitoring system for high-performance
    computing systems such as clusters and Grids. It is based on a
    hierarchical design targeted at federations of clusters. It relies on a
    multicast-based listen/announce protocol to monitor state within
    clusters and uses a tree of point-to-point connections amongst
    representative cluster nodes to federate clusters and aggregate their
    state. It leverages widely used technologies such as XML for data
    representation, XDR for compact, portable data transport, and RRDtool
    for data storage and visualization. It uses carefully engineered data
    structures and algorithms to achieve very low per-node overheads and
    high concurrency. The implementation is robust, has been ported to an
    extensive set of operating systems and processor architectures, and is
    currently in use on over 500 clusters around the world. It has been used
    to link clusters across university campuses and around the world and can
    scale to handle clusters with 2000 nodes.

    The ganglia system is comprised of two unique daemons, a PHP-based web
    frontend and a few other small utility programs.

    Ganglia Monitoring Daemon (gmond)
        Gmond is a multi-threaded daemon which runs on each cluster node you
        want to monitor. Installation is easy. You don't have to have a
        common NFS filesystem or a database backend, install special
        accounts, maintain configuration files or other annoying hassles.

        Gmond has four main responsibilities: monitor changes in host state,
        announce relevant changes, listen to the state of all other ganglia
        nodes via a unicast or multicast channel and answer requests for an
        XML description of the cluster state.

        Each gmond transmits in information in two different ways:
        unicasting/multicasting host state in external data representation
        (XDR) format using UDP messages or sending XML over a TCP
        connection.

    Ganglia Meta Daemon (gmetad)
        Federation in Ganglia is achieved using a tree of point-to-point
        connections amongst representative cluster nodes to aggregate the
        state of multiple clusters. At each node in the tree, a Ganglia Meta
        Daemon ("gmetad") periodically polls a collection of child data
        sources, parses the collected XML, saves all numeric, volatile
        metrics to round-robin databases and exports the aggregated XML over
        a TCP sockets to clients. Data sources may be either "gmond"
        daemons, representing specific clusters, or other "gmetad" daemons,
        representing sets of clusters. Data sources use source IP addresses
        for access control and can be specified using multiple IP addresses
        for failover. The latter capability is natural for aggregating data
        from clusters since each "gmond" daemon contains the entire state of
        its cluster.

    Ganglia PHP Web Frontend
        The Ganglia web frontend provides a view of the gathered information
        via real-time dynamic web pages. Most importantly, it displays
        Ganglia data in a meaningful way for system administrators and
        computer users. Although the web frontend to ganglia started as a
        simple HTML view of the XML tree, it has evolved into a system that
        keeps a colorful history of all collected data.

        The Ganglia web frontend caters to system administrators and users.
        For example, one can view the CPU utilization over the past hour,
        day, week, month, or year. The web frontend shows similar graphs for
        Memory usage, disk usage, network statistics, number of running
        processes, and all other Ganglia metrics.

        The web frontend depends on the existence of the "gmetad" which
        provides it with data from several Ganglia sources. Specifically,
        the web frontend will open the local port 8651 (by default) and
        expects to receive a Ganglia XML tree. The web pages themselves are
        highly dynamic; any change to the Ganglia data appears immediately
        on the site. This behavior leads to a very responsive site, but
        requires that the full XML tree be parsed on every page access.
        Therefore, the Ganglia web frontend should run on a fairly powerful,
        dedicated machine if it presents a large amount of data.

        The Ganglia web frontend is written in the PHP scripting language,
        and uses graphs generated by "gmetad" to display history
        information. It has been tested on many flavours of Unix (primarily
        Linux) with the Apache webserver and the PHP module (5.0.0 or
        later). The GD graphics library for PHP is used to generate pie
        charts in the frontend and needs to be installed separately. On
        RPM-based system, it is usually provided by the php-gd package.

时间： 2024-11-27 08:40:20

ganglia - distributed monitor system

ganglia - distributed monitor system的相关文章

bigtable: A Distributed Storage System for Structured Data

笔记：Ceph: A Scalable, High-Performance Distributed File System

Distributed File System(簇文件系统)

Distributed Message System

第 21 章 Distributed File System(簇文件系统)

ganglia man page : gmond gmetad gmetad.py gmetric gstat gmond.conf

Design and Application Learning of the Distributed Call Tracing System

分布式系统(Distributed System)资料

Pregel: A System for Large-Scale Graph Processing