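I previously tried to deploy a ROOT cluster on CentOS 7, but found that neither a source build nor the official binary package provided the essential xproofd executable, so PoD could not run. With no other option, I tried a different OS and chose Ubuntu 14.04.
Deployment preparation
Change the apt sources
Edit /etc/apt/sources.list to use the 163 mirror in China, which makes downloads faster and more stable. The entries below use the precise codename; the codename has to match the installed release (Ubuntu 14.04 is trusty).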
# vim /etc/apt/sources.list
deb http://mirrors.163.com/ubuntu/ precise main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ precise-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ precise-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ precise-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ precise-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ precise main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ precise-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ precise-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ precise-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ precise-backports main restricted universe multiverse
Then run apt-get update to refresh the package index.
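Typing ten nearly identical lines by hand is error-prone, so here is a minimal sketch that prints them for a given codename. The function name gen_sources is my own, and the mirror URL and pocket list are simply copied from the entries above.

```shell
# Print sources.list entries for one mirror and one release codename.
# The default mirror and the list of pockets follow the entries shown above.
gen_sources() {
    codename="$1"
    mirror="${2:-http://mirrors.163.com/ubuntu/}"
    for kind in deb deb-src; do
        for pocket in "" -security -updates -proposed -backports; do
            echo "$kind $mirror $codename$pocket main restricted universe multiverse"
        done
    done
}

# Example (as root): back up the old file, write the new one, refresh the index.
# cp /etc/apt/sources.list /etc/apt/sources.list.bak
# gen_sources "$(lsb_release -cs)" > /etc/apt/sources.list
# apt-get update
```

Taking the codename from lsb_release -cs keeps the entries matched to the installed release.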
Install gcc and g++
If gcc and g++ are already installed on the system, skip this step.
# apt-get install gcc
# apt-get install g++
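Install cmake
The cmake installed directly through apt-get causes problems when installing ROOT components, so I recommend building it from source. I used version 2.8.8. It can be downloaded from the official site at https://cmake.org/files/, where you can pick the version that suits you.
- Extract: tar xvf cmake-2.8.8.tar.gz
- Enter the extracted directory: cd cmake-2.8.8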
- ./bootstrap
- make
- make install
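Install the zlib library
zlib can be downloaded from GitHub (https://github.com/madler/zlib). I used version 1.2.3, available at https://github.com/madler/zlib/archive/v1.2.3.zip
- Extract: unzip zlib-1.2.3.zip
- Enter the extracted directory: cd zlib-1.2.3
- ./configure
Note: before running make, edit the Makefile, otherwise errors occur when the library is used. Find the line CFLAGS=-O3 -DUSE_MMAP and append -fPIC, so that it becomes CFLAGS=-O3 -DUSE_MMAP -fPIC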
- make
- make install
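The Makefile edit described in the note can also be scripted. The following is only a sketch, assuming GNU sed and a Makefile whose flags sit on a single line starting with CFLAGS=; the function name add_fpic is mine.

```shell
# Append -fPIC to the CFLAGS line of a Makefile unless it is already there.
add_fpic() {
    makefile="$1"
    if grep -q '^CFLAGS=.*-fPIC' "$makefile"; then
        return 0
    fi
    # GNU sed: edit in place and keep the original as <file>.bak.
    sed -i.bak 's/^CFLAGS=.*/& -fPIC/' "$makefile"
}

# Usage, inside the zlib-1.2.3 directory after ./configure:
# add_fpic Makefile && make && make install
```

The check before the substitution makes the function safe to run more than once.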
Other libraries
apt-get install procmail
Deploy the ROOT cluster
Install ROOT
Binary installation: https://root.cern.ch/content/release-60606. Choose the build that matches your OS, extract it, and move it to /opt:
# tar zxvf root_v6.06.06.Linux-ubuntu14-x86_64-gcc4.8.tar.gz
# mv root /opt
Then write the ROOT settings into an initialization file. Here I append the following lines to /etc/profile.d/root.sh:
export ROOTSYS=/opt/root
export PATH=$PATH:$ROOTSYS/bin
source $ROOTSYS/bin/thisroot.sh
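Since /etc/profile.d/root.sh gets extended several times during this deployment, a small helper that avoids duplicate lines may be handy. This is a sketch of my own (append_once is not part of ROOT or PoD):

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as root); single quotes keep $ROOTSYS and $PATH unexpanded in the file:
# append_once 'export ROOTSYS=/opt/root' /etc/profile.d/root.sh
# append_once 'export PATH=$PATH:$ROOTSYS/bin' /etc/profile.d/root.sh
# append_once 'source $ROOTSYS/bin/thisroot.sh' /etc/profile.d/root.sh
```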
Run source /etc/profile.d/root.sh to apply the settings, then run root -b to check that ROOT starts correctly:
# root -b
root: error while loading shared libraries: libXpm.so.4: cannot open shared object file: No such file or directory
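The libXpm library is missing; install it with apt-get install libxpm4. The install may report that the package cannot be found, which is related to the local package index: sync with the remote source first (apt-get update) and then install the package. The installation then succeeds.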
# apt-get install libxpm4
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
libxpm4
0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
Need to get 37.0 kB of archives.
……
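Rather than discovering missing shared libraries one failed start at a time, ldd can list them all at once. The filter below is a small sketch; missing_libs is a name I made up, and it only relies on the "=> not found" marker that ldd prints for unresolved libraries.

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
# ldd "$ROOTSYS/bin/root" | missing_libs
```

If the apt-file package is installed, apt-file search followed by the library file name should help find the package that ships it.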
Running root -b again to test produces another error.
# root -b
ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
echo | LC_ALL=C c++ -pipe -m64 -Wall -W -Woverloaded-virtual -fsigned-char -fPIC -pthread -std=c++11 -Wno-deprecated-declarations -Wno-comment -Wno-unused-parameter -Wno-maybe-uninitialized -Wno-unused-but-set-variable -Wno-missing-field-initializers -fPIC -fvisibility-inlines-hidden -std=c++11 -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wcast-qual -fno-strict-aliasing -pedantic -Wno-long-long -Wall -W -Wno-unused-parameter -Wwrite-strings -Wno-unused-local-typedefs -O2 -DNDEBUG -xc++ -E -v - 2>&1 >/dev/null | awk '/^#include </,/^End of search/{if (!/^#include </ && !/^End of search/){ print }}' | grep -E "(c|g)\+\+"
results in
results in
with exit code 256
input_line_1:1:10: fatal error: 'new' file not found
#include <new>
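The C++ standard header <new> cannot be found. This error is most likely caused by a missing C++ compiler, since ROOT and its components are written in C++. So gcc and g++ need to be installed.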
# apt-get install gcc
……
# apt-get install g++
……
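Running root -b now finally succeeds without errors.
Install XRootD
There are two ways to install XRootD: through the script shipped in the ROOT source package, or by downloading the source from the official site and building it.
Install XRootD through the script in the ROOT source package
Enter the ROOT source directory and run: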
./build/unix/installXrootd.sh -v 3.0.0 /opt
Install XRootD from source:
After extracting the archive, enter the source directory:
# mkdir build; cd build
# cmake /root/xrootd-3.3.0 -DCMAKE_INSTALL_PREFIX=/opt/xrootd
# make
# make install
If everything succeeded, write the related settings into the initialization file. They can again be appended to /etc/profile.d/root.sh:
source $ROOTSYS/bin/setxrd.sh /opt/xrootd/
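Install PoD
Download the source from the official site (http://pod.gsi.de). I used version 3.16: pod.gsi.de/releases/pod/3.16/PoD-3.16-Source.tar.gz. If the link no longer works, search for a current one. Extract the source archive and enter the source directory.
The cmake step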
mkdir build
cd build
cmake -C ../BuildSetup.cmake ..
cmake reports that the Boost libraries are missing, so Boost needs to be installed.
apt-get install libboost-dev
After that, running the cmake command above still fails, reporting that the following libraries are missing:
The following Boost libraries could not be found:
boost_thread
boost_program_options
boost_filesystem
boost_system
boost_unit_test_framework
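A small tip: installing these with apt-get install followed by the library name does not work, because the package names do not match the library names exactly. Use apt-cache search to find the package name and then install it. Taking boost_thread as an example: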
# apt-cache search boost | grep thread
libboost-thread-dev - portable C++ multi-threading (default version)
libboost-thread1.46-dev - portable C++ multi-threading
libboost-thread1.46.1 - portable C++ multi-threading
libboost-thread1.48-dev - portable C++ multi-threading
libboost-thread1.48.0 - portable C++ multi-threading
Based on this output, apt-get install libboost-thread-dev installs what is needed. The full list is:
apt-get install libboost-thread-dev
apt-get install libboost-program-options-dev
apt-get install libboost-filesystem-dev
apt-get install libboost-system-dev
apt-get install libboost-test-dev
Running cmake -C ../BuildSetup.cmake .. again finally succeeds.
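The mapping from the library names cmake prints to the packages installed above follows a simple pattern with one exception. It can be sketched as below; boost_pkg is a hypothetical helper, and the naming rule is only inferred from the five packages listed in this post, so confirm any other name with apt-cache search.

```shell
# Map a Boost library name as printed by cmake to a guessed apt package name.
boost_pkg() {
    case "$1" in
        # Exception seen above: the unit test framework ships in libboost-test-dev.
        boost_unit_test_framework) echo "libboost-test-dev" ;;
        # General pattern: boost_foo_bar -> libboost-foo-bar-dev
        boost_*) echo "lib$(echo "$1" | tr '_' '-')-dev" ;;
        *) return 1 ;;
    esac
}

# Usage (as root):
# for lib in boost_thread boost_program_options boost_filesystem boost_system boost_unit_test_framework; do
#     apt-get install -y "$(boost_pkg "$lib")"
# done
```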
The make step
Running make produces another error.
/usr/include/boost/thread/xtime.hpp:23:5: error: expected identifier before numeric constant
TIME_UTC=1
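This is a known bug in Boost versions below 1.50: the enumerator name TIME_UTC collides with the TIME_UTC macro that newer glibc versions define in <time.h>. The fix is simple: open /usr/include/boost/thread/xtime.hpp and change TIME_UTC to TIME_UTC_ on lines 23 and 71, so that the names no longer clash.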
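The same rename can be done with sed instead of an editor. This is a sketch assuming GNU sed; fix_time_utc is my own name, and it renames every standalone TIME_UTC in the given file rather than only two specific lines, so review the result before building.

```shell
# Rename the identifier TIME_UTC to TIME_UTC_ in a header file.
fix_time_utc() {
    header="$1"
    # \b (GNU sed) matches only the whole word TIME_UTC, so an already
    # renamed TIME_UTC_ is left alone and the function can be re-run safely.
    sed -i.bak 's/\bTIME_UTC\b/TIME_UTC_/g' "$header"
}

# Usage (as root); the original is kept as xtime.hpp.bak:
# fix_time_utc /usr/include/boost/thread/xtime.hpp
```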
Running make again produces yet another error.
/root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h:88:13: error: 'uint16_t' does not name a type
uint16_t xpdPort() const
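The compiler does not recognize the uint16_t type alias. The fix is simple: include the header. Add #include <stdint.h> at line 19 of /root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h. The exact position may differ between PoD versions, but anyone with some C or C++ background should easily find a suitable spot.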
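For those who prefer not to count lines, here is a sketch that places the include in front of the first existing #include instead of at a fixed line number. That placement is my own choice, not necessarily line 19; it assumes GNU sed and a header that already contains at least one #include line (otherwise nothing is changed). The name add_stdint_include is hypothetical.

```shell
# Insert "#include <stdint.h>" before the first #include line of a header.
add_stdint_include() {
    header="$1"
    # Nothing to do if the include is already present.
    grep -q '^#include <stdint.h>' "$header" && return 0
    # GNU sed: the 0,/re/ range restricts the substitution to the first match;
    # the empty pattern in s// reuses that regular expression.
    sed -i.bak '0,/^#include/s//#include <stdint.h>\n&/' "$header"
}

# Usage:
# add_stdint_include /root/PoD-3.16-Source/app/MiscCommon/proof_status_file/ProofStatusFile.h
```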
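Running make again finally passes cleanly.
The make install step
This command runs without any errors. Unless configured otherwise, PoD is installed in the PoD directory under the user's home directory. Since I installed as root, it went into /root/PoD.
Final PoD installation step
Write the related settings into the initialization file. They can again be appended to /etc/profile.d/root.sh: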
source /root/PoD/3.16/PoD_env.sh
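Run source /etc/profile.d/root.sh to apply the settings, then run pod-server start. On the first run it downloads the wn_bins directory of required components into /root/PoD/3.16/bin/. If the server has no access to the external network, go through all the steps above on a virtual machine that does, and download the wn_bins directory there. The downloaded wn_bins directory is the same regardless of OS and can be copied over directly.
Form the ROOT cluster
Run pod-server start. After it has downloaded the wn_bins directory, and if no errors occurred, the output looks like this: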
# pod-server start
Starting PoD server...
updating xproofd configuration file...
starting xproofd...
starting PoD agent...
preparing PoD worker package...
selecting pre-compiled bins to be added to worker package...
PoD worker package: /root/.PoD/wrk/PoDWorker.sh
------------------------
XPROOFD [1809] port: 21001
PoD agent [1848] port: 22002
PROOF connection string: root@mac00000102030a.hostname.com:21001
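The last line of that output is the one needed later to connect to the cluster, so it can be worth capturing. A minimal sketch that filters it out of the text shown above (proof_conn is my own helper name, and it only relies on the "PROOF connection string:" label seen in this output):

```shell
# Read pod-server output on stdin and print only the PROOF connection string.
proof_conn() {
    sed -n 's/^[[:space:]]*PROOF connection string:[[:space:]]*//p'
}

# Usage:
# pod-server start | proof_conn
```

The resulting string is what a ROOT session would pass to TProof::Open to attach to the cluster.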
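Using all the steps above, set up two servers to build a small cluster with one server and one client. ROOT servers can communicate in several ways; here we use the simplest and most direct one, ssh. First, the two servers need mutual ssh trust so that ssh logins require no password. For a setup guide see: http://chenlb.iteye.com/blog/211809.
Next, choose server A as the server and server B as the client (worker). On the server, edit /root/pod_ssh.cfg with the following content: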
@bash_begin@
. /etc/profile.d/root.sh
@bash_end@
r1, root@109.105.115.249,,/tmp/test, 2
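The first three lines are the script to run after ssh-ing to the client; here it applies the ROOT settings on each client, such as environment variables. The fourth line is the client access configuration. Each client has one line, and since we have only one client there is only one line. The comma-separated fields of this line are:
- Field 1: unique client identifier, must not repeat
- Field 2: user name@IP or host name
- Field 3: ssh options, may be empty
- Field 4: working directory on the client
- Field 5: desired number of workers on the client, may be empty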
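With more than one client, writing these entries by hand gets tedious. The sketch below prints a config in the same shape as the file above; pod_ssh_entry and pod_ssh_cfg are helper names I invented, and the spacing simply copies the sample line.

```shell
# Print one worker entry: id, user@host, ssh options, work dir, worker count.
pod_ssh_entry() {
    printf '%s, %s,%s,%s, %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Print a complete config: the inline script block, then one line per client.
pod_ssh_cfg() {
    printf '@bash_begin@\n. /etc/profile.d/root.sh\n@bash_end@\n'
    pod_ssh_entry r1 root@109.105.115.249 "" /tmp/test 2
    # Add further clients here, each with a unique id, for example:
    # pod_ssh_entry r2 root@<second-host> "" /tmp/test 2
}

# Usage:
# pod_ssh_cfg > /root/pod_ssh.cfg
```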
Then run pod-ssh -c /root/pod_ssh.cfg submit --debug on the server to build the cluster. Output like the following means the server side succeeded:
# pod-ssh -c /root/pod_ssh.cfg submit --debug
** [Mon, 29 Aug 2016 10:40:18 +0800] preparing PoD worker package...
** [Mon, 29 Aug 2016 10:40:18 +0800] selecting pre-compiled bins to be added to worker package...
** [Mon, 29 Aug 2016 10:40:18 +0800] PoD worker package: /root/.PoD/wrk/PoDWorker.sh
** [Mon, 29 Aug 2016 10:40:18 +0800] pod-ssh config contains an inline shell script. It will be injected it into wrk. package
** [Mon, 29 Aug 2016 10:40:18 +0800] preparing PoD worker package...
** [Mon, 29 Aug 2016 10:40:18 +0800] inline shell script is found and will be added to the package...
** [Mon, 29 Aug 2016 10:40:18 +0800] selecting pre-compiled bins to be added to worker package...
** [Mon, 29 Aug 2016 10:40:18 +0800] PoD worker package: /root/.PoD/wrk/PoDWorker.sh
** [Mon, 29 Aug 2016 10:40:18 +0800] There are 5 threads in the tread-pool.
** [Mon, 29 Aug 2016 10:40:18 +0800] Number of PoD workers: 1
** [Mon, 29 Aug 2016 10:40:18 +0800] Number of PROOF workers: 2
** [Mon, 29 Aug 2016 10:40:18 +0800] Workers list:
** [Mon, 29 Aug 2016 10:40:18 +0800] [r1] with 2 workers at root@109.105.115.249:/tmp/test/r1
r1 [Mon, 29 Aug 2016 10:40:18 +0800] pod-ssh-submit-worker is started for root@109.105.115.249 (dir: /tmp/test/r1, nworkers: 2, sshopt: )
** [Mon, 29 Aug 2016 10:40:19 +0800]
*******************
Successfully processed tasks: 1
Failed tasks: 0
*******************
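In a script, the summary at the end is the part worth checking. A small sketch that pulls the failure count out of output like the above (failed_tasks is a hypothetical helper that relies only on the "Failed tasks:" line shown here):

```shell
# Read pod-ssh output on stdin and print the number after "Failed tasks:".
failed_tasks() {
    awk -F': *' '/Failed tasks:/ { print $2 }'
}

# Usage:
# n="$(pod-ssh -c /root/pod_ssh.cfg submit --debug | failed_tasks)"
# [ "$n" = "0" ] && echo "all workers submitted"
```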
Next, log in to the client and enter the client working directory set in /root/pod_ssh.cfg (here /tmp/test/r1, as shown in the log above).
# ls
libboost_filesystem-mt.so.5 libpod_protocol.so PoD.cfg PoDWorker.sh proof.conf user_worker_env.sh xpd.log
libboost_program_options-mt.so.5 libproof_status_file.so pod-user-defaults pod-wrk-bin-3.16-Darwin-universal.tar.gz server_info.cfg version
libboost_system-mt.so.5 libSSHTunnel.so PoDWorker.lock pod-wrk-bin-3.16-Linux-amd64.tar.gz ssh-tunnel xpd.cf
libboost_thread-mt.so.5 pod-agent PoDWorker.pid pod-wrk-bin-3.16-Linux-x86.tar.gz ssh_worker.log xpd.cf.bup
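As you can see, these are libraries, configuration files, logs and so on. For now the file of interest is the log ssh_worker.log. If its tail looks like the following, everything succeeded: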
*** [Mon, 29 Aug 2016 10:44:48 +0800] Attempt to start pod-agent (1 out of 3)
*** [Mon, 29 Aug 2016 10:44:48 +0800] Attempt to start and detect xproofd (1 out of 10)
*** [Mon, 29 Aug 2016 10:44:48 +0800] trying to use XPROOF port: 21002
*** [Mon, 29 Aug 2016 10:44:48 +0800] starting xproofd...
*** [Mon, 29 Aug 2016 10:44:48 +0800] xproofd is running. pid=[2794] port=[21002]
*** [Mon, 29 Aug 2016 10:44:48 +0800] starting pod-agent...
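To check this without reading the log by eye, the pid and port can be extracted from the "xproofd is running" line. This is a sketch based only on the log format shown above; xproofd_status is my own name for it.

```shell
# Print "<pid> <port>" from the last "xproofd is running" line of a worker log.
# Prints nothing if the log contains no such line.
xproofd_status() {
    log="$1"
    sed -n 's/.*xproofd is running\. pid=\[\([0-9]*\)] port=\[\([0-9]*\)].*/\1 \2/p' "$log" | tail -n 1
}

# Usage, inside the client working directory:
# xproofd_status ssh_worker.log
```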