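MogileFS is an efficient component for automatic file replication. It was developed at Danga Interactive (later acquired by Six Apart) and is widely used on Web 2.0 sites such as LiveJournal.
MogileFS consists of three parts:
Part 1 is the server side, which comprises two programs, mogilefsd and mogstored. The former is the tracker; it keeps global information such as the site's domains, classes and hosts in a database. The latter is the storage node; it is essentially an HTTP daemon that listens on port 7500 by default and accepts file storage requests from clients. After installation, run the mogadm tool to register every storage node in the mogilefsd database; mogilefsd then manages and monitors these nodes.
Part 2 is utils (the tool set), mainly MogileFS administration tools such as mogadm.
Part 3 is the client API. At present there are client APIs for Perl (MogileFS.pm) and PHP; with these modules you can write client programs that store and manage files.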
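Introduction
MogileFS was my introduction to distributed file systems. Since ttlsa has already covered Gearman, MogileFS deserves a write-up too; both were developed by the same person. MogileFS cleverly uses HTTP PUT to build a distributed storage service, and it is well suited to storing small files.
Methodology
When I get to know a system, I work through the following steps:
What is the system for, and what problem does it exist to solve?
What does the system look like?
How do you talk to the system?
Accordingly, this article starts with the origins of MogileFS, then sketches its architecture, introduces its basic usage, and finally covers its administration.
The end of the article describes how to integrate MogileFS into third-party applications.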
mind map
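Background
MogileFS is a distributed file system developed by Danga Interactive to solve the storage problems of LiveJournal, the site the company was running at the time.
Before that, the engineering team had already adopted techniques such as database partitioning, so MogileFS also embodies the divide-and-conquer idea. Today MogileFS is widely used on high-performance Web 2.0 sites; the most typical example given is Instagram, which reportedly uses it as its image storage cluster.
Terms and definitions
Understanding the terms used in MogileFS is essential to understanding its architecture.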
Term | Definition
application | thing that wants to store/load files
database | the database that stores the MogileFS metadata (the namespace, and which files are where). This should be set up in an HA config so you don’t have a single point of failure.
tracker | event-based parent process/message bus that manages all client communication from applications (requesting operations to be performed), including load balancing those requests onto “query workers”, and handles all communication between mogilefsd child processes.
storage node | where files are stored. The storage nodes are just HTTP servers that do DELETE, PUT, etc. Any WebDAV server is fine, but mogstored is recommended. mogilefsd can be configured to use two servers on different ports: mogstored for all DAV operations (and sideband monitoring), and your fast/light HTTP server of choice for GET operations. Typically people have one fat SATA disk per mountpoint, each mounted at /var/mogdata/devNN.
domain | A domain is the top level separation of files. File keys are unique within domains. A domain consists of a set of classes that define the files within the domain. Examples of domains: fotobilder, livejournal.
class | Every file is part of exactly one class. A class is part of exactly one domain. A class, in effect, specifies the minimum replica count of a file. Examples of classes: userpicture, userbackup, phonepost. Classes may have extra replication policies defined.
minimum replica count (mindevcount) | This is a property of a class. This defines how many times the files in that class need to be replicated onto different devices in order to ensure redundancy among the data and prevent loss.
key | A key is a unique textual string that identifies a file. Keys are unique within domains. Examples of keys: userpicture:34:39, phonepost:93:3834, userbackup:15. Fake structures work too: /pics/hello.png, any string.
file | A file is a defined collection of bits uploaded to MogileFS to store. Files are replicated according to their minimum replica count. Each file has a key, is a part of one class, and is located in one domain. Files are the things that MogileFS stores for you.
fid | A fid is an internal numerical representation of a file. Every file is assigned a unique fid. If a file is overwritten, it is given a new fid.
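MogileFS installation and configuration
MogileFS architecture
The architecture of MogileFS is as follows.
A MogileFS cluster contains nodes in three roles:
Tracker node
  Task dispatch and scheduling
Meta Database node
  Stores the cluster's metadata
    Host information
    Device information
    Domain information
    Class information
    Key information
    File information
Storage node
  File storage
MogileFS has two programs:
mogilefsd # implements the tracker role
mogstored # implements the storage node role
In MogileFS, a file is defined as a series of bits uploaded to a storage node and is identified within the system by a key that is unique within its domain. A file belongs to exactly one class, and a class is a set of attribute values.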
Installing MogileFS
Server environment
ip hostname
10.1.192.63 cluster-database
10.1.192.58 cluster-master01
10.1.192.59 cluster-master02
10.1.192.60 cluster-segment01
10.1.192.61 cluster-segment02
10.1.192.62 cluster-segment03
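These six servers are virtual machines on VMware vSphere. They are attached to a newly added vmware network2 port and are connected to one another through a VMware switch with a port speed of 10000 Mbps.
Because module dependencies do not split cleanly along server roles, it is advisable to install the following packages on every server: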
MogileFS-Utils-2.28.tar.gz
MogileFS-Server-2.70.tar.gz
MogileFS-Client-1.17.tar.gz
Installation procedure
Initialize the database on cluster-database
Create the user and the database
CREATE DATABASE mogilefs;
GRANT ALL ON mogilefs.* TO 'mogile'@'cluster-database';
SET PASSWORD FOR 'mogile'@'cluster-database' = OLD_PASSWORD( 'mo' );
GRANT ALL ON mogilefs.* TO 'mogile'@'%';
SET PASSWORD FOR 'mogile'@'%' = OLD_PASSWORD( 'mo' );
FLUSH PRIVILEGES;
Initialize the database schema
mogdbsetup --dbname=mogilefs --dbuser=mogile --dbpass=mo
Configure the tracker node
mkdir -p /etc/mogilefs
cat << END > /etc/mogilefs/mogilefsd.conf
db_dsn = DBI:mysql:mogilefs:host=cluster-database;port=3306;mysql_connect_timeout=5
# database connection string
db_user = mogile
db_pass = mo
conf_port = 7001
# management port
listener_jobs = 5
node_timeout = 5
rebalance_ignore_missing = 1
END
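Configure the storage nodes
mkdir -p /etc/mogilefs
cat << END > /etc/mogilefs/mogstored.conf
httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/data/mogData
# document root served by the HTTP server
END
Create the device directories on each storage node
mkdir -p /data/mogData/dev1    # on segment01; use dev2 on segment02 and dev3 on segment03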
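The device directory step can be wrapped in a small helper when a node carries several devices. What follows is only a minimal sketch, assuming a POSIX shell; make_device_dirs is a hypothetical helper name and not part of MogileFS, and the demonstration runs against a throwaway directory rather than the real docroot.

```shell
#!/bin/sh
# Hypothetical helper (not part of MogileFS): create one dev<N> directory
# under the mogstored docroot for each device id passed as an argument.
make_device_dirs() {
    docroot=$1
    shift
    for devid in "$@"; do
        mkdir -p "$docroot/dev$devid" || return 1
    done
}

# Demonstration against a temporary directory instead of /data/mogData.
demo_root=$(mktemp -d)
make_device_dirs "$demo_root" 1 2 3
ls "$demo_root"
```

On a real storage node the call would look like make_device_dirs /data/mogData 1, using the same device id that is later registered for that host with mogadm device add.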
Adding hosts and devices
Start the tracker
mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
Register the hosts and devices
mogadm --trackers=cluster-master01:7001 host add segment01 --ip=10.1.192.60 --status=alive
mogadm --trackers=cluster-master01:7001 host add segment02 --ip=10.1.192.61 --status=alive
mogadm --trackers=cluster-master01:7001 host add segment03 --ip=10.1.192.62 --status=alive
mogadm --trackers=cluster-master01:7001 device add segment01 1
mogadm --trackers=cluster-master01:7001 device add segment02 2
mogadm --trackers=cluster-master01:7001 device add segment03 3
Using MogileFS
Downloading a file
mogfetch --trackers=cluster-master01:7001 --domain=abc --key="speach_of_dependence" --file=./speach_of_dependence_income.words
Files live inside a domain, so the domain parameter must be specified when downloading.
Uploading a file
mogupload --trackers=cluster-master01:7001 --domain=abc --class=test01.abc --key="speach_of_dependence" --file=./speach_of_dependence.words
Every file has a class attribute, so both the class and domain parameters must be specified when uploading.
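The upload above presumes that the domain abc and the class test01.abc already exist. A minimal sketch of how they could be created with mogadm follows. The helper print_namespace_setup is hypothetical, and the mindevcount value of 2 is an assumption chosen for illustration; the helper only prints the commands so that it can be tried without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: print (rather than run) the mogadm commands that
# create a domain and a class. Pipe the output to sh to execute them.
# Arguments: tracker address, domain, class, mindevcount.
print_namespace_setup() {
    tracker=$1
    domain=$2
    class=$3
    mindevcount=$4
    echo "mogadm --trackers=$tracker domain add $domain"
    echo "mogadm --trackers=$tracker class add $domain $class --mindevcount=$mindevcount"
}

# mindevcount=2 is an illustrative assumption, not a value from the article.
print_namespace_setup cluster-master01:7001 abc test01.abc 2
```

Printing first makes it easy to review the commands before they touch the tracker; once they look right, print_namespace_setup ... | sh would run them.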
Listing files
moglistkeys --trackers=cluster-master01:7001 --domain=abc
Listing storage devices
mogadm --trackers=cluster-master01:7001 device list
Listing hosts
mogadm --trackers=cluster-master01:7001 host list
Listing domains
mogadm --trackers=cluster-master01:7001 domain list
Listing classes
mogadm --trackers=cluster-master01:7001 class list
All requests are sent to the tracker node.
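Inside MogileFS
Key-file
MogileFS does not keep the original file name; a "file" is simply the bit stream received by a storage node. Internally, MogileFS identifies a file by a key that is visible within its domain.
File placement
MogileFS assigns every file a fid and stores the file with a .fid suffix; the system maintains the mapping from fid to path. The fid, zero-padded to ten digits, is split into four parts by (\d)(\d{3})(\d{3})(\d{3}), and the file is placed in the directory /dev<devid>/$1/$2/$3. Which devid to use is decided by the tracker and handed to the client.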
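The fid-to-path rule can be illustrated with a short shell function. This is a sketch under stated assumptions, not the MogileFS implementation: fid_path is a hypothetical name, and it assumes that the fid is zero-padded to ten digits and that the file name is that padded fid followed by .fid.

```shell
#!/bin/sh
# Hypothetical helper: derive the on-disk path of a fid on a given device,
# following the (\d)(\d{3})(\d{3})(\d{3}) split of the ten-digit fid.
# Assumption: the file name is the zero-padded fid plus the .fid suffix.
fid_path() {
    devid=$1
    fid=$2
    nfid=$(printf '%010d' "$fid")           # zero-pad to ten digits
    b=$(printf '%s' "$nfid" | cut -c1)      # $1: first digit
    mmm=$(printf '%s' "$nfid" | cut -c2-4)  # $2: next three digits
    ttt=$(printf '%s' "$nfid" | cut -c5-7)  # $3: next three digits
    printf '/dev%s/%s/%s/%s/%s.fid\n' "$devid" "$b" "$mmm" "$ttt" "$nfid"
}

fid_path 1 5         # prints /dev1/0/000/000/0000000005.fid
fid_path 2 1234567   # prints /dev2/0/001/234/0001234567.fid
```

The last three digits of the fid do not appear in the directory part, so each leaf directory holds at most a thousand files.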
File redundancy
The mindevcount attribute of a class ensures that files are stored redundantly within the system.
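As the terms table puts it, the minimum replica count says how many different devices must hold a copy of each file in the class. The check below is only an illustration of that rule, not how the tracker implements it; replica_status is a hypothetical helper that takes the required count followed by the devids currently holding a copy.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file has enough replicas.
# Usage: replica_status <mindevcount> <devid>...
replica_status() {
    mindevcount=$1
    shift
    # Copies on the same device add no redundancy, so count distinct devids.
    distinct=$(printf '%s\n' "$@" | sort -u | grep -c .)
    if [ "$distinct" -ge "$mindevcount" ]; then
        echo "ok: $distinct of $mindevcount"
    else
        echo "under-replicated: $distinct of $mindevcount"
    fi
}

replica_status 2 1 3     # copies on dev1 and dev3
replica_status 3 1 1 2   # two entries point at dev1, so only two distinct devices
```

Counting distinct devices rather than copies reflects the wording "replicated onto different devices" in the definition of mindevcount.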
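A look into MogileFS
Since MogileFS is written in Perl, let's look at the relevant source code.
mogdbsetup
This program initializes the meta database when the database node is installed.
Code analysis
Modules used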
use MogileFS::Config;
use MogileFS::Store;
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; # not running under some shell
use strict;
use Getopt::Long;
use lib 'lib';
use MogileFS::Store;
use MogileFS::Config;
#
# usage and option handling omitted
#
MogileFS::Config->load_config;
my $sto = $sclass->new_from_mogdbsetup(
    map { $_ => $args{$_} }
    qw(dbhost dbport dbname
       dbrootuser dbrootpass
       dbuser dbpass)
);
my $dbh = $sto->dbh;
$sto->setup_database
    or die "Database upgrade failed.\n";
my $latestver = MogileFS::Store->latest_schema_version;
if ($opt_noschemabump) {
    warn "\n*\n* Per your request, NOT UPGRADING to $latestver. I assume you understand why.\n*\n";
} else {
    $sto->set_schema_vesion($latestver);
}
warn "Done.\n" if $opt_verbose;
exit 0;
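mogdbsetup calls the setup_database subroutine in MogileFS::Store to initialize the database, and uses SCHEMA_VERSION to determine whether the current operation is a fresh installation or an upgrade.
mogilefsd
The tracker node process, which dispatches tasks for the whole cluster.
Code analysis
Modules used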
use MogileFS::Server;
#!/usr/bin/perl
......
# Rename binary in process list to make init scripts saner
$0 = "mogilefsd";
my $s = MogileFS::Server->server;
$s->run;
1;
The program simply calls the run subroutine in MogileFS::Server.
MogileFS as a whole is an event-based cluster.
mogstored
The storage node process, which performs the actual file operations.
Code analysis
Modules used
use Perlbal 1.73;
use FindBin qw($Bin $RealScript);
use Mogstored::HTTPServer;
use Mogstored::HTTPServer::Perlbal;
use Mogstored::HTTPServer::Lighttpd;
use Mogstored::HTTPServer::None;
use Mogstored::HTTPServer::Apache;
use Mogstored::HTTPServer::Nginx;
use Mogstored::SideChannelListener;
use Mogstored::SideChannelClient;
......
# initialize basic required Perlbal machinery, for any HTTP server
my $perlbal_init = qq{
CREATE SERVICE mogstored
  SET role = web_server
  SET docroot = $docroot
# don't listen... this is just a stub service.
CREATE SERVICE mgmt
  SET role = management
ENABLE mgmt
};
$perlbal_init .= "\nSERVER pidfile = $pidfile" if defined($pidfile);
Perlbal::run_manage_commands($perlbal_init , sub { print STDERR "$_[0]\n"; });
# start HTTP server
my $httpsrv_class = "Mogstored::HTTPServer::" . ucfirst($server);
my $httpsrv = $httpsrv_class->new(
    listen   => $http_listen,
    docroot  => $docroot,
    maxconns => $max_conns,
    bin      => $serverbin,
);
# Configure Perlbal HTTP listener after daemonization since it can create a
# kqueue on *BSD. kqueue descriptors are automatically invalidated on fork(),
# making them unusable after daemonize. For non-Perlbal, starting the
# server before daemonization improves error reporting as daemonization
# redirects stdout/stderr to /dev/null.
$httpsrv->start if $server ne "perlbal";
if ($opt_daemonize) {
    $httpsrv->pre_daemonize;
    Perlbal::daemonize();
} else {
    print "Running.\n";
}
# It is now safe for Perlbal to create a kqueue
$httpsrv->start if $server eq "perlbal";
$httpsrv->post_daemonize;
# kill our children processes on exit:
my $parent_pid = $$;
$SIG{TERM} = $SIG{INT} = sub {
    return unless $$ == $parent_pid; # don't let this be inherited
    kill 'TERM', grep { $_ } keys %on_death;
    POSIX::_exit(0);
};
setup_iostat_pipes();
start_disk_usage_process();
start_iostat_process() if $opt_iostat;
harvest_dead_children(); # every 2 seconds, it reschedules itself
setup_sidechannel_listener();
# now start the main loop
Perlbal::run();
Managing MogileFS
1. Starting and stopping the system
Start the tracker
mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
Start a storage node
mogstored --daemon
Stop the tracker
echo '!shutdown' | nc cluster-master01 7001
Stop a storage node
killall mogstored
2. Checking system status
mogadm --trackers=cluster-master01:7001 check
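[Traffic distribution within the system]
The system carries three kinds of traffic, plus the mogstored management port:
TCP 7001 on tracker # request traffic from clients to the tracker
TCP 3306 on mysql # traffic between the tracker and the meta database
TCP 7500 on storage node # data traffic between clients and storage nodes
TCP 7501 on storage node # mogstored management port (mgmtlisten in mogstored.conf)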