Hadoop Server Cluster: HDFS Installation and Configuration in Detail

A quick description of the related systems:
HBase – distributed key/value database
Zookeeper – coordination service that supports distributed applications
Hive – SQL parsing engine for Hadoop
Flume – distributed log collection system

I. Environment Overview:
s1:
hadoop-master
namenode, jobtracker;
secondarynamenode;
datanode, tasktracker

s2:
hadoop-node-1
datanode, tasktracker

s3:
hadoop-node-2
datanode, tasktracker

namenode – manages the namespace of the entire HDFS
secondarynamenode – performs periodic checkpoints of the namenode's metadata (often described as a redundant namenode, but it is not a hot standby)
jobtracker – job management service for parallel (MapReduce) computation
datanode – HDFS storage node service
tasktracker – job execution service for parallel (MapReduce) computation

II. Prerequisite System Configuration:
1. Add hosts entries (on all machines)
hwl@hadoop-master:~$ cat /etc/hosts
192.168.242.128 hadoop-master
192.168.242.128 hadoop-secondary
192.168.242.129 hadoop-node-1
192.168.242.130 hadoop-node-2
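To confirm the entries resolve correctly, a quick check from each machine might look like this (a minimal sketch using the hostnames above):
getent hosts hadoop-master hadoop-node-1 hadoop-node-2   # print the IP each name resolves to
ping -c 1 hadoop-node-1                                  # confirm the node is reachable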

2. Set the hostname
hwl@hadoop-master:~$ cat /etc/hostname
hadoop-master
hwl@hadoop-node-1:~$ cat /etc/hostname
hadoop-node-1
hwl@hadoop-node-2:~$ cat /etc/hostname
hadoop-node-2
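Editing /etc/hostname only takes effect at the next boot; to apply the name immediately, something like the following can be run on each machine with its own name (sketch):
sudo hostname hadoop-master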

3. Set up passwordless SSH key login between all machines (details omitted; see the sketch below)
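A minimal sketch of that SSH setup, assuming the hwl user and the hostnames used in this article:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate a key pair with an empty passphrase
ssh-copy-id hwl@hadoop-master              # copy the public key to every machine in the cluster
ssh-copy-id hwl@hadoop-node-1
ssh-copy-id hwl@hadoop-node-2
ssh hwl@hadoop-node-1 hostname             # should log in without asking for a password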

III. Hadoop Environment Configuration:
1. Choose the installation packages
To make deployment of the Hadoop cluster easier and more standardized, we use Cloudera's integrated distribution (CDH).
Cloudera tunes the Hadoop-related components and spares us many of the bugs caused by version mismatches between them.

https://ccp.cloudera.com/display/DOC/Documentation//

2. Install the Java environment
Hadoop is written mostly in Java, so a JVM is required.
Add an APT source that provides a matching Java version.
Install on all servers:
apt-get install python-software-properties
vim /etc/apt/sources.list.d/sun-java-community-team-sun-java6-maverick.list
deb http://ppa.launchpad.net/sun-java-community-team/sun-java6/ubuntu maverick main
deb-src http://ppa.launchpad.net/sun-java-community-team/sun-java6/ubuntu maverick main

Install sun-java6-jdk:
add-apt-repository ppa:sun-java-community-team/sun-java6
apt-get update
apt-get install sun-java6-jdk
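To confirm the JDK landed where the later hadoop-env.sh setting expects it, a quick check (sketch) is:
java -version                    # should report a Sun Java 1.6.x runtime
ls -d /usr/lib/jvm/java-6-sun    # the path referenced by JAVA_HOME below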

3. Add Cloudera's Hadoop package repository
vim /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian maverick-cdh3u3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3u3 contrib
apt-get install curl
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
apt-get update
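To verify that APT now sees the CDH3 packages, something like the following can be used (sketch):
apt-cache policy hadoop-0.20     # the candidate version should end in -cdh3u3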

4. Install the Hadoop packages
On hadoop-master, install:
apt-get install hadoop-0.20-namenode
apt-get install hadoop-0.20-datanode
apt-get install hadoop-0.20-secondarynamenode
apt-get install hadoop-0.20-jobtracker

On both hadoop-node-1 and hadoop-node-2, install:
apt-get install hadoop-0.20-datanode
apt-get install hadoop-0.20-tasktracker
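If passwordless SSH and passwordless sudo are already in place, the node installs can also be driven from the master with a small loop (a sketch, not required):
for h in hadoop-node-1 hadoop-node-2; do
  ssh $h "sudo apt-get -y install hadoop-0.20-datanode hadoop-0.20-tasktracker"
done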

5. Create the Hadoop configuration files
cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster

6. Activate the new configuration
update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50   (50 is the priority of this alternative)
Check the current configuration:
update-alternatives --display hadoop-0.20-conf

7. Configure the Hadoop files
7.1 On all servers, set the Java environment variable:
hwl@hadoop-master:~$ cat /etc/hadoop/conf/hadoop-env.sh
# Set Hadoop-specific environment variables here.
export JAVA_HOME="/usr/lib/jvm/java-6-sun"
7.2 On all servers, configure the master and slave host names:
hwl@hadoop-master:~$ cat /etc/hadoop/conf/masters
hadoop-master
hwl@hadoop-master:~$ cat /etc/hadoop/conf/slaves
hadoop-node-1
hadoop-node-2
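Because every server needs identical configuration files, one convenient approach (a sketch, assuming the conf.my_cluster directory created above) is to edit them once on hadoop-master and copy them out:
scp -r /etc/hadoop-0.20/conf.my_cluster hadoop-node-1:/tmp/
scp -r /etc/hadoop-0.20/conf.my_cluster hadoop-node-2:/tmp/
# then, on each node: sudo cp -r /tmp/conf.my_cluster /etc/hadoop-0.20/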
7.3 Create the HDFS directories
mkdir -p /data/storage
mkdir -p /data/hdfs
chmod 700 /data/hdfs
chown -R hdfs:hadoop /data/hdfs
chmod 777 /data/storage
chmod o+t /data/storage
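A quick sanity check of the modes just set (sketch):
ls -ld /data/storage /data/hdfs
# expected: /data/storage shows drwxrwxrwt (sticky bit), /data/hdfs shows drwx------ owned by hdfs:hadoop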
7.4 Configure core-site.xml on all servers
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/storage</value>
    <description>A directory for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
</configuration>
hadoop.tmp.dir is the base directory where files uploaded to Hadoop are stored, so make sure it sits on a sufficiently large volume.
fs.default.name specifies the address and port of the NameNode.
7.5 Configure hdfs-site.xml on all servers
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hdfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-secondary:50090</value>
  </property>
</configuration>
dfs.data.dir specifies where the DataNodes store their data blocks.
dfs.replication specifies how many copies of each block are kept for redundancy; the value must not exceed the number of DataNodes, otherwise errors will occur.
dfs.datanode.max.xcievers sets the upper limit on the number of files a DataNode can serve concurrently.
7.6 Configure mapred-site.xml on all servers
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://hadoop-master:8021</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
</configuration>
mapred.job.tracker specifies the address and port of the JobTracker.
mapred.system.dir specifies the directory in HDFS where the MapReduce framework keeps its system files.
8. Format the HDFS distributed file system
hwl@hadoop-master:~$ sudo -u hdfs hadoop namenode -format
[sudo] password for hwl:
14/05/11 19:18:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop-master/192.168.242.128
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2-cdh3u3
STARTUP_MSG: build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~maverick -r 318bc781117fa276ae81a3d111f5eeba0020634f; compiled by 'root' on Tue Mar 20 13:45:02 PDT 2012
************************************************************/
14/05/11 19:18:31 INFO util.GSet: VM type = 32-bit
14/05/11 19:18:31 INFO util.GSet: 2% max memory = 19.33375 MB
14/05/11 19:18:31 INFO util.GSet: capacity = 2^22 = 4194304 entries
14/05/11 19:18:31 INFO util.GSet: recommended=4194304, actual=4194304
14/05/11 19:18:32 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
14/05/11 19:18:32 INFO namenode.FSNamesystem: fsOwner=hdfs (auth:SIMPLE)
14/05/11 19:18:32 INFO namenode.FSNamesystem: supergroup=supergroup
14/05/11 19:18:32 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/05/11 19:18:32 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
14/05/11 19:18:32 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/05/11 19:18:32 INFO common.Storage: Image file of size 110 saved in 0 seconds.
14/05/11 19:18:32 INFO common.Storage: Storage directory /data/storage/dfs/name has been successfully formatted.
14/05/11 19:18:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.242.128
************************************************************/

9. Start the daemons
9.1 On the master:
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1218. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-0.20-namenode.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-jobtracker start   (it only started on the second attempt; the first attempt's log showed SHUTDOWN)
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-hadoop-master.out
hadoop-0.20-jobtracker.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
Starting Hadoop secondarynamenode daemon: secondarynamenode running as process 1586. Stop it first.
hadoop-0.20-secondarynamenode.
hwl@hadoop-master:~$ sudo netstat -tnpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 838/sshd
tcp6 0 0 :::38197 :::* LISTEN 1589/java
tcp6 0 0 :::50070 :::* LISTEN 2070/java
tcp6 0 0 :::22 :::* LISTEN 838/sshd
tcp6 0 0 :::50010 :::* LISTEN 1274/java
tcp6 0 0 :::50075 :::* LISTEN 1274/java
tcp6 0 0 :::50020 :::* LISTEN 1274/java
tcp6 0 0 :::50090 :::* LISTEN 1589/java
tcp6 0 0 :::45579 :::* LISTEN 2070/java
tcp6 0 0 :::36590 :::* LISTEN 1274/java
tcp6 0 0 192.168.242.128:8020 :::* LISTEN 2070/java
hwl@hadoop-master:~$ sudo jps
2070 NameNode
3117 Jps
1589 SecondaryNameNode
1274 DataNode
3061 JobTracker

9.2 On the nodes:
hwl@hadoop-node-1:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1400. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-node-1:~$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-hadoop-node-1.out
hadoop-0.20-tasktracker.
hwl@hadoop-node-1:~$ sudo jps
1926 TaskTracker
1968 Jps
1428 DataNode
hwl@hadoop-node-2:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1156. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-node-2:~$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-hadoop-node-2.out
hadoop-0.20-tasktracker.
hwl@hadoop-node-2:~$ sudo jps
1864 TaskTracker
1189 DataNode
1905 Jps

10. Create the HDFS directory for mapred.system.dir
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -mkdir /mapred/system
14/05/11 19:30:54 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
14/05/11 19:31:11 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
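To confirm the directory exists with the right owner (sketch):
sudo -u hdfs hadoop fs -ls /mapred
# the /mapred/system entry should be listed as owned by mapred:hadoop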

11. Test basic HDFS operations
hwl@hadoop-master:~$ echo "Hello" > hello.txt
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -mkdir /hwl
14/05/11 19:31:52 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -copyFromLocal hello.txt /hwl
14/05/11 19:32:03 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -ls /hwl
14/05/11 19:32:17 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Found 1 items
-rw-r--r-- 2 hdfs supergroup 14 2014-05-11 19:32 /hwl/hello.txt
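The file can also be read back to confirm the round trip (sketch):
sudo -u hdfs hadoop fs -cat /hwl/hello.txt
# should print Hello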

12. Check the cluster status:
12.1 Via the web UI

NameNode web UI: http://192.168.242.128:50070/
JobTracker web UI: http://192.168.242.128:50030/
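If no browser is available, the web interfaces can also be probed from the shell (sketch):
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.242.128:50070/   # NameNode UI, expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.242.128:50030/   # JobTracker UI, expect 200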

12.2 Via the command line
hwl@hadoop-master:~$ sudo -u hdfs hadoop dfsadmin -report
14/05/11 19:45:11 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Configured Capacity: 252069396480 (234.76 GB)
Present Capacity: 234272096256 (218.18 GB)
DFS Remaining: 234271989760 (218.18 GB)
DFS Used: 106496 (104 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

————————————————-
Datanodes available: 3 (3 total, 0 dead)

Name: 192.168.242.128:50010
Decommission Status : Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5935935488 (5.53 GB)
DFS Remaining: 78087155712(72.72 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:11 PDT 2014

Name: 192.168.242.129:50010
Decommission Status : Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5931614208 (5.52 GB)
DFS Remaining: 78091489280(72.73 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:08 PDT 2014

Name: 192.168.242.130:50010
Decommission Status : Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 5929750528 (5.52 GB)
DFS Remaining: 78093344768(72.73 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:08 PDT 2014
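Beyond dfsadmin -report, overall file system health can also be checked with fsck (sketch):
sudo -u hdfs hadoop fsck /
# a healthy cluster ends the report with: The filesystem under path '/' is HEALTHY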
