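At a time when the Java-based Hadoop is at its peak, the open-source cloud computing world has a C++-based dark horse: Sector/Sphere, which challenges Hadoop on performance. In software tests on the Open Cloud Testbed (OCT), built by the Open Cloud Consortium (OCC), Sector is about twice as fast as Hadoop.
This post takes the dark horse for a hands-on test run to get a first feel for it; the next post digs into its design principles and explores the essence of cloud computing.
OCT is a compute cluster spanning multiple data centers connected over a 10 Gb/s research and education network backbone.
It is being built in two phases: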
Phase 1. Phase 1 was operational in June 2008 and consists of 240 cores distributed across four cities in the U.S. This was upgraded in September, 2008 to 480 cores.
Here is a diagram of the testbed. The Phase 1 equipment consists of four racks. Each rack contains 30 nodes. Each node has 4 cores. The racks are located in:
University of Illinois at Chicago (Chicago); StarLight (Chicago); Calit2 (La Jolla); Johns Hopkins University (Baltimore)
All the racks are connected by a wide area 10 Gb/s network.
Phase 2. Phase 2 of the Open Cloud Testbed is planned to be operational by June, 2009. The testbed will add 4 racks of equipment for a total of 8 racks containing over 1000 cores. In addition, two more sites will be connected by 10 Gb/s networks. Phase 2 racks will be located at:
Johns Hopkins University (Baltimore); Calit2 (La Jolla); MIT Lincoln Lab (Cambridge); Pittsburgh Supercomputing Center/Carnegie Mellon University (Pittsburgh); StarLight (Chicago); University of Illinois at Chicago (Chicago)
In addition, in Phase 2, the Open Cloud Testbed will add shared, non-dedicated resources.
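With companies and universities joining forces, the open-source cloud computing field keeps growing. Hadoop, mentioned above, is one of the software stacks used on OCT. Here we focus on the other dark horse, Sector/Sphere, which is also one of the core pieces of software adopted by OCT, a sign of its weight. Sector/Sphere is designed to run across the public Internet and emphasizes the security of core data; it also gives developers fluent in C++ an open-source cloud computing framework. Let's take a look at this dark horse that came out about twice as fast as Hadoop in the performance tests.
Sector/Sphere has a clear design, but documentation and other material are still scarce, which makes large-scale adoption harder.
Before trying it out, let's look at the architecture diagram of Sector/Sphere.
One notable feature of Sector is its dedicated security server, which provides a measure of security for cloud computing over a wide area network.
The software is very compact. Download the latest version, codeblue.1.23c.tar.gz; questions can be discussed on the forum.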
http://sourceforge.net/forum/?group_id=172838
Before building with make, check that a few basic packages are installed on the Debian system:
libssl-dev, gcc, g++, and also libfuse-dev if you plan to try the FUSE feature.
debian:~# tar xvzf codeblue.1.23c.tar.gz
debian:~/codeblue2/conf# ls
client.conf master_node.cert masters.list security_node.key slave.conf topology.conf
master.conf master_node.key security_node.cert slave_acl.conf slaves.list users
debian:~/codeblue2/conf# pwd
/root/codeblue2/conf
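Edit the security, master, slave, and client configuration files to match your deployment.
The configuration files are very clear; in most cases you only need to change the host addresses and the data directory.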
debian:~/codeblue2/conf# more master.conf
#SECTOR server port number
SECTOR_PORT
6000
#security server address
SECURITY_SERVER
localhost:5000
debian:~/codeblue2/conf# more slave.conf
#Master address
MASTER_ADDRESS
localhost:6000
#Data directory
DATA_DIRECTORY
/root/data/
debian:~/codeblue2/conf# more client.conf
#Master address
MASTER_ADDRESS
localhost:6000
Build the software; once make completes successfully, the services can be started.
Start the services:
debian:~/codeblue2/security# ./sserver &
[1] 8637
debian:~/codeblue2/security# Sector Security server running at port 5000
The server is started successfully; there is no further output from this program. Please do not shutdown the security server; otherwise no client may be able to login. If the server is down for any reason, you can restart it without restarting the masters and the slaves
debian:~/codeblue2/security# cd ../master/
debian:~/codeblue2/master# ./start_master &
[2] 8638
debian:~/codeblue2/master# Sector master is successfully running now. check sector.log for more details.
There is no further screen output from this program.
debian:~/codeblue2/master# cd ../slave/
debian:~/codeblue2/slave# ls
COPYING serv_file.cpp serv_spe.cpp slave.cpp slave.o start_slave.cpp
Makefile serv_file.o serv_spe.o slave.h start_slave
debian:~/codeblue2/slave# ./start_slave &
[3] 8652
debian:~/codeblue2/slave# scaning /root/data/
This Sector slave is successfully initialized and running now.
slave process: GMP 47087 DATA 42064
debian:~/codeblue2/slave#
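By default Sector reserves 10 GB of space, and the generated test data is also 10 GB. If you want to verify things with a smaller data set, you can change the source code.
For example, suppose you want to generate 100 MB of test data for sorting.
Then:
vi randwriter.cpp
Edit the two loop bounds by dropping their last two zeros, which shrinks the test data from 10 GB to 100 MB.
// 100 MB = 100 bytes * 1000000 records (10 GB before the edit)
for (long long int i = 0; i < 1000000; ++ i)
{
   keygen(record);
   ofs.write(record, 100);
}

// further down in the same file: one 8-byte offset per record boundary
for (long long int i = 0; i < 1000001; ++ i)
{
   long long int d = i * 100;
   idx.write((char*)&d, 8);
}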
In mrsort.cpp, one block needs to be commented out, otherwise the program will not run.
debian:~/codeblue2/client/examples# vi mrsort.cpp
/* if (3 != argc)
{
cout << "usage: mrsort" << endl;
return 0;
}
*/
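Then run make, or go to the codeblue2 directory and run make clean followed by make.
Now the tests below can begin without filling up your disk. That said, if you are going to play with cloud computing, it is worth setting aside more disk space: many benchmark programs only become representative once the data volume reaches a certain scale, and that scale is also what shows off how big the cloud is.
Generate the test data.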
debian:~/codeblue2/client/examples# ./testfs
recv cmd 127.0.0.1 6000 type 105
recv cmd 127.0.0.1 6000 type 103
recv cmd 127.0.0.1 6000 type 110
===> start file server 127.0.0.1 6000
open file tmp/guide.dat 127.0.0.1 60833
rendezvous connect source 127.0.0.1 45180 /root/data//tmp/guide.dat
connected
file server closed 127.0.0.1 45180 0
report 127.0.0.1 6000 14,/tmp/guide.dat,0,1245914942,4
recv cmd 127.0.0.1 6000 type 110
===> start file server 127.0.0.1 6000
rendezvous connect source 127.0.0.1 45180 /root/data//tmp/guide.dat.idx
connected
open file tmp/guide.dat.idx 127.0.0.1 60833
file server closed 127.0.0.1 45180 0
report 127.0.0.1 6000 18,/tmp/guide.dat.idx,0,1245914943,16
start time 1245914943
JOB 4 1
1 spes found! 1 data seg total.
recv cmd 127.0.0.1 6000 type 203
starting SPE ... 0 45180 randwriter 3
rendezvous connect 127.0.0.1 45180
connected
connect SPE 127.0.0.1 3
new job /tmp/guide.dat 0 1
completed 100 127.0.0.1 46922
sending data back... 0
report 127.0.0.1 6000 21,test/sort_input.0.dat,0,1245914946,100000000
report 127.0.0.1 6000 25,test/sort_input.0.dat.idx,0,1245914946,8000008
recv cmd 127.0.0.1 6000 type 105
comp server closed 127.0.0.1 46922 2
reportSphere 127.0.0.1 6000 3
Check the Sector system information with ./sysinfo.
debian:~/codeblue2/client/tools# ./sysinfo
Sector System Information:
Running since Thu Jun 25 03:28:39 2009
Available Disk Size 27413 MB
Total File Size 102 MB
Total Number of Files 2
Total Number of Slave Nodes 1
------------------------------------------------------------
Total number of clusters 4
Cluster_ID  Total_Nodes  AvailDisk(MB)  FileSize(MB)  NetIn(MB)  NetOut(MB)
0:          1            27413          102           0          0
1:          0            0              0             0          0
2:          0            0              0             0          0
3:          0            0              0             0          0
------------------------------------------------------------
SLAVE_ID  IP         TS(us)            AvailDisk(MB)  TotalFile(MB)  Mem(MB)  CPU(us)  NetIn(MB)  NetOut(MB)
1:        127.0.0.1  1245915399257411  27413          102            0        3440000  0          0
debian:~/codeblue2/client/tools# ./ls /
test <dir>
debian:~/codeblue2/client/tools# ./ls /test
sort_input.0.dat 100000000 bytes Thu Jun 25 03:29:06 2009
sort_input.0.dat.idx 8000008 bytes Thu Jun 25 03:29:06 2009
As you can see, the test data has been generated.
Run the sorting experiment with testdc.
debian:~/codeblue2/client/examples# ./testdc
start time 1245915520
JOB 100000000 1000000
request shuffler 127.0.0.1 41406
1 spes found! 1 data seg total.
connect SPE 127.0.0.1 5
stage 1 accomplished 1245915552
JOB 100000000 1000000
2 spes found! 16 data seg total.
connect SPE 127.0.0.1 6
connect SPE 127.0.0.1 7
stage 2 accomplished 1245915557
SPE COMPLETED
debian:~/codeblue2/client/examples#
Next, run a wordcount example; Hadoop ships a corresponding example as well.
debian:~/codeblue2/client/tools# ./mkdir html
debian:~/codeblue2/client/tools# ./upload mv.cpp
usage: upload <src file/dir> <dst dir>
debian:~/codeblue2/client/tools# ./upload mv.cpp /html
uploading mv.cpp of 1821 bytes
open file /html/mv.cpp 127.0.0.1 60833
Uploading accomplished! AVG speed 0.0121632 Mb/s.
debian:~/codeblue2/client/tools# cd ../examples/
debian:~/codeblue2/client/examples# ./wordcount
start time 1245915644
JOB 1821 -1
request shuffler 127.0.0.1 41406
1 spes found! 1 data seg total.
connect SPE 127.0.0.1 10
stage 1 accomplished 1245915645
SPE COMPLETED
debian:~/codeblue2/client/examples#
If you are interested, visit http://sector.sourceforge.net/ to learn more.