ganglia - distributed monitoring system

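Traditional monitoring systems usually follow an agent + server model. The agent collects monitoring data and either pushes it to the server (active mode) or waits for the server to request it (passive mode). The server and the agents usually communicate over TCP.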

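The main drawback of this model is that the server comes under heavy load once the number of monitored hosts grows. For example, monitoring 30 items on each of 20,000 hosts means collecting 600,000 data points from the agents. If the data is collected once per minute, that is about 10,000 metric get requests per second.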
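The load estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the host count, item count, and one-minute interval from the example; it assumes one request per data point.

```python
# Back-of-the-envelope load on a polling server, using the example's numbers.
hosts = 20_000
metrics_per_host = 30
interval_seconds = 60  # one collection cycle per minute

# Assumption: every data point costs one "metric get" request.
data_points = hosts * metrics_per_host
requests_per_second = data_points / interval_seconds

print(f"{data_points} data points per cycle")
print(f"{requests_per_second:.0f} metric get requests per second")
```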

ganglia's design is rather clever and effectively avoids these problems.

ganglia consists of 3 main components.

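gmond: collects monitoring data (metrics). Unlike a traditional agent, gmond not only collects the data of its own host but can also aggregate the data of the whole multicast domain. In other words, within one multicast domain a single gmond can hold all of the data. For example, if a multicast domain has 200 hosts, the monitoring data of all 200 hosts can be fetched from just 1 gmond, so the server no longer needs a connection to each of the 200 hosts. In addition, gmond instances exchange messages over UDP, which has less overhead than TCP on a local network. gmond ships with a set of common metrics such as cpu, network, and memory, and can be extended with C modules, Python modules, or gmetric.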

Each gmond is configured through a local configuration file, while monitoring data is transmitted in XDR format (http://en.wikipedia.org/wiki/External_Data_Representation).
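To give a feel for what XDR encoding looks like on the wire, here is a minimal sketch of the basic XDR rules: big-endian 4-byte integers and floats, and length-prefixed strings padded to a multiple of 4 bytes. The helper names are made up for illustration, and the payload below is not the actual ganglia message layout, only a name/value pair encoded with XDR primitives.

```python
import struct

def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)

def xdr_float(value: float) -> bytes:
    """XDR float: IEEE 754 single precision, big-endian."""
    return struct.pack(">f", value)

def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, the bytes, then zero padding to a 4-byte boundary."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding

# Hypothetical payload: a metric name followed by its value.
payload = xdr_string("load_one") + xdr_float(0.5)
print(payload.hex())
```

The fixed alignment is what makes XDR compact and cheap to parse compared with a text format such as XML.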

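Data sharing between gmond instances is handled by 2 modules, the sender and the receiver. The sender only sends data to the multicast domain, and the receiver only listens on the multicast address and port for incoming data. The sender and the receiver can be enabled independently. A gmond configured to only send data behaves like a traditional agent, while a gmond configured to only receive data behaves like the proxy of a traditional solution. For example, the data of every gmond in a multicast domain can be received by one or a few gmond hosts, and the server then only needs to get metrics from any one of them.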

A gmond that only sends and does not receive is called deaf; one that only receives and does not send is called mute.
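As a rough illustration, a receive-only (mute) collector node might be configured in gmond.conf along the following lines. This is a sketch, not a complete configuration: the multicast address and port shown are assumed values and should be checked against your own gmond.conf.

```
/* Collector node: listens to the multicast channel but sends nothing itself. */
globals {
  deaf = no    /* keep receiving metrics from other gmond instances */
  mute = yes   /* do not announce this host's own metrics */
}

/* Assumed multicast address and port; use the values of your cluster. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
}

/* TCP port on which the server polls the XML cluster state. */
tcp_accept_channel {
  port = 8649
}
```

An agent-like node would do the opposite: set deaf = yes, keep mute = no, and define a udp_send_channel instead of the receive channel.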

gmetad: fetches metric data from gmond, parses it, and writes it into RRDtool files per host and per metric, that is, each metric of each host has its own RRDtool file(s).

Because gmetad's job is fairly simple, it can be replaced by a shell or Python script that implements the same functions.
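A minimal sketch of such a replacement script is shown below. It assumes gmond dumps its XML to any client that connects to its TCP port (8649 is assumed here), and that the XML uses CLUSTER, HOST, and METRIC elements with NAME and VAL attributes. The rrd_path helper and the directory layout are illustrative assumptions, and actually writing the RRD files is left out.

```python
import socket
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the XML dump from gmond until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

def parse_metrics(xml_bytes: bytes) -> dict:
    """Map (cluster, host, metric) to the metric value as a string."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                key = (cluster.get("NAME"), host.get("NAME"), metric.get("NAME"))
                result[key] = metric.get("VAL")
    return result

def rrd_path(root_dir: str, cluster: str, host: str, metric: str) -> str:
    """Hypothetical layout: one RRD file per metric per host."""
    return str(PurePosixPath(root_dir) / cluster / host / f"{metric}.rrd")

# Hand-written sample standing in for a real gmond response.
SAMPLE = b"""<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="db">
    <HOST NAME="db-1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="1024" TYPE="float" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

metrics = parse_metrics(SAMPLE)
for (cluster, host, name), value in sorted(metrics.items()):
    print(rrd_path("/var/lib/ganglia/rrds", cluster, host, name), value)
```

Against a live gmond, `parse_metrics(fetch_gmond_xml("some-gmond-host"))` would replace the sample; the host name is a placeholder.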

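Beyond these basics, gmetad can also fetch data from other gmetad instances, send data to other monitoring systems (such as Graphite), and serve data to monitoring systems that poll it (such as Nagios).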

gweb: visualizes the monitoring data, using the RRD databases.

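Extension modules: C, Python, gmetric. gmond only ships with some common metrics, so monitoring anything else requires writing an extension module or using gmetric to send data directly to gmond's send channel. For example, to monitor database metrics, you can write your own module.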
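As a sketch of what a Python extension module can look like: gmond's Python module interface expects a metric_init function that returns a list of metric descriptors, a callback per metric, and a metric_cleanup function. The metric name, the group, and the fixed return value below are hypothetical placeholders; a real module would query the database inside the callback.

```python
def get_db_connections(name):
    """Callback invoked with the metric name; returns the current value."""
    # Placeholder value: a real module would query the database here.
    return 42

def metric_init(params):
    """Called once at load time; params carries the module's configuration."""
    descriptors = [{
        "name": "db_connections",          # hypothetical metric name
        "call_back": get_db_connections,
        "time_max": 90,
        "value_type": "uint",
        "units": "connections",
        "slope": "both",
        "format": "%u",
        "description": "Open database connections (illustrative)",
        "groups": "database",
    }]
    return descriptors

def metric_cleanup():
    """Called once when gmond shuts down."""
    pass

if __name__ == "__main__":
    # Allows testing the module from the command line, outside gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

Running the file directly exercises the callbacks without gmond, which is a convenient way to debug a module before deploying it.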

 

The man pages of ganglia core give a quick overview (core does not include gweb):

[root@db-172-16-3-221 mans]# pwd
/opt/soft_bak/ganglia-3.6.0/mans

-rw-r--r-- 1 root root  2104 May  7  2013 gmetad.1
-rw-r--r-- 1 root root  1177 May  7  2013 gmetad.py.1
-rw-r--r-- 1 root root  2894 May  7  2013 gmetric.1
-rw-r--r-- 1 root root  2680 May  7  2013 gmond.1
-rw-r--r-- 1 root root  2412 May  7  2013 gstat.1

gmetad : Ganglia Meta Daemon
DESCRIPTION
       The  Ganglia  Meta  Daemon  (gmetad) collects information from multiple gmond or gmetad data sources, saves the
       information to local round-robin databases, and exports XML which is the concatenation of all data sources

gmetad.py : Ganglia Meta Daemon in Python

gmond: Ganglia Monitor Daemon
DESCRIPTION
       The  Ganglia  Monitoring  Daemon  (gmond) listens to the cluster message channel, stores the data in-memory and
       when requested will output an XML description of the state of the cluster

gmetric: Ganglia Custom Metric Utility
DESCRIPTION
       The  Ganglia  Metric Client (gmetric) announces a metric on the list of defined send channels defined in a
       configuration file
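As an illustration of announcing a custom metric with gmetric, the sketch below only builds the command line and prints it. The helper name, the metric name db_tps, and its value are hypothetical; the --name, --value, --type, and --units options are the gmetric options assumed here.

```python
import shlex

def gmetric_command(name, value, value_type, units=""):
    """Build a gmetric command line as an argument list (nothing is executed)."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", value_type]
    if units:
        cmd += ["--units", units]
    return cmd

# Hypothetical metric: transactions per second of a database.
cmd = gmetric_command("db_tps", 1250, "uint32", "tx/s")
print(shlex.join(cmd))
```

On a host with ganglia installed, passing the list to `subprocess.run(cmd, check=True)` would actually send the metric on the configured send channels.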

Other notes:

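ganglia also has weaknesses. For example, it lacks the powerful event management found in monitoring software such as Nagios and Zabbix, so it needs to be combined with a Nagios-like system.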

It also lacks the specialized features of tools such as pg_statsinfo.

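ganglia is generally a good fit for monitoring values that fluctuate over time, such as load, memory usage, traffic, TPS, queue length, changes in database size, and service response latency.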

[References]
1. http://ganglia.sourceforge.net/

2. https://github.com/ganglia

3. Dependencies of gmond, gmetad, and gmetric:

* APR (http://apr.apache.org/)

* libConfuse (http://www.nongnu.org/confuse/)

* expat (http://expat.sourceforge.net/)

* pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)

* python (http://www.python.org/)

* PCRE (http://www.pcre.org/)

* RRDtool (http://oss.oetiker.ch/rrdtool/)

4. 

Name
    ganglia - distributed monitoring system

Version
    ganglia 3.6.0

    The latest version of this software and document will always be found at
    http://ganglia.sourceforge.net/.

Synopsis
         ______                  ___
        / ____/___ _____  ____ _/ (_)___ _
       / / __/ __ `/ __ \/ __ `/ / / __ `/
      / /_/ / /_/ / / / / /_/ / / / /_/ /
      \____/\__,_/_/ /_/\__, /_/_/\__,_/
                       /____/ Distributed Monitoring System

    Ganglia is a scalable distributed monitoring system for high-performance
    computing systems such as clusters and Grids. It is based on a
    hierarchical design targeted at federations of clusters. It relies on a
    multicast-based listen/announce protocol to monitor state within
    clusters and uses a tree of point-to-point connections amongst
    representative cluster nodes to federate clusters and aggregate their
    state. It leverages widely used technologies such as XML for data
    representation, XDR for compact, portable data transport, and RRDtool
    for data storage and visualization. It uses carefully engineered data
    structures and algorithms to achieve very low per-node overheads and
    high concurrency. The implementation is robust, has been ported to an
    extensive set of operating systems and processor architectures, and is
    currently in use on over 500 clusters around the world. It has been used
    to link clusters across university campuses and around the world and can
    scale to handle clusters with 2000 nodes.

    The ganglia system is comprised of two unique daemons, a PHP-based web
    frontend and a few other small utility programs.

    Ganglia Monitoring Daemon (gmond)
        Gmond is a multi-threaded daemon which runs on each cluster node you
        want to monitor. Installation is easy. You don't have to have a
        common NFS filesystem or a database backend, install special
        accounts, maintain configuration files or other annoying hassles.

        Gmond has four main responsibilities: monitor changes in host state,
        announce relevant changes, listen to the state of all other ganglia
        nodes via a unicast or multicast channel and answer requests for an
        XML description of the cluster state.

        Each gmond transmits information in two different ways:
        unicasting/multicasting host state in external data representation
        (XDR) format using UDP messages or sending XML over a TCP
        connection.

    Ganglia Meta Daemon (gmetad)
        Federation in Ganglia is achieved using a tree of point-to-point
        connections amongst representative cluster nodes to aggregate the
        state of multiple clusters. At each node in the tree, a Ganglia Meta
        Daemon ("gmetad") periodically polls a collection of child data
        sources, parses the collected XML, saves all numeric, volatile
        metrics to round-robin databases and exports the aggregated XML over
        TCP sockets to clients. Data sources may be either "gmond"
        daemons, representing specific clusters, or other "gmetad" daemons,
        representing sets of clusters. Data sources use source IP addresses
        for access control and can be specified using multiple IP addresses
        for failover. The latter capability is natural for aggregating data
        from clusters since each "gmond" daemon contains the entire state of
        its cluster.

    Ganglia PHP Web Frontend
        The Ganglia web frontend provides a view of the gathered information
        via real-time dynamic web pages. Most importantly, it displays
        Ganglia data in a meaningful way for system administrators and
        computer users. Although the web frontend to ganglia started as a
        simple HTML view of the XML tree, it has evolved into a system that
        keeps a colorful history of all collected data.

        The Ganglia web frontend caters to system administrators and users.
        For example, one can view the CPU utilization over the past hour,
        day, week, month, or year. The web frontend shows similar graphs for
        Memory usage, disk usage, network statistics, number of running
        processes, and all other Ganglia metrics.

        The web frontend depends on the existence of the "gmetad" which
        provides it with data from several Ganglia sources. Specifically,
        the web frontend will open the local port 8651 (by default) and
        expects to receive a Ganglia XML tree. The web pages themselves are
        highly dynamic; any change to the Ganglia data appears immediately
        on the site. This behavior leads to a very responsive site, but
        requires that the full XML tree be parsed on every page access.
        Therefore, the Ganglia web frontend should run on a fairly powerful,
        dedicated machine if it presents a large amount of data.

        The Ganglia web frontend is written in the PHP scripting language,
        and uses graphs generated by "gmetad" to display history
        information. It has been tested on many flavours of Unix (primarily
        Linux) with the Apache webserver and the PHP module (5.0.0 or
        later). The GD graphics library for PHP is used to generate pie
        charts in the frontend and needs to be installed separately. On
        RPM-based systems, it is usually provided by the php-gd package.
Time: 2024-11-27 08:40:20
