Open Source Cloud Computing Series (7): Cloudera (Hadoop 0.20)

Set up a virtual machine running CentOS 5.3.

Download jdk-6u16-linux-i586-rpm.bin, then make it executable and run the installer:

[root@hadoop ~]# chmod +x jdk-6u16-linux-i586-rpm.bin

[root@hadoop ~]# ./jdk-6u16-linux-i586-rpm.bin

[root@hadoop ~]# java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK Client VM (build 1.6.0-b09, mixed mode)
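Note that java -version still reports the distribution's bundled OpenJDK: the Sun JDK RPM installs itself under /usr/java/jdk1.6.0_16 (the daemon process listing further down shows Hadoop using it), but it does not take over the java command automatically. To make java resolve to the Sun JDK as well, one option is the standard RHEL/CentOS alternatives tool (a sketch; the priority 16000 is arbitrary):

alternatives --install /usr/bin/java java /usr/java/jdk1.6.0_16/bin/java 16000
alternatives --set java /usr/java/jdk1.6.0_16/bin/java
java -version

After this, java -version should report the Sun HotSpot VM, build 1.6.0_16.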

Next, add Cloudera's testing repository to yum:

[root@hadoop yum.repos.d]# wget http://archive.cloudera.com/redhat/cdh/cloudera-testing.repo

[root@hadoop yum.repos.d]# ls
CentOS-Base.repo  CentOS-Base.repo.bak  CentOS-Media.repo  cloudera-testing.repo
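cloudera-testing.repo is a plain yum repository definition; its contents should look roughly like this (an illustrative sketch only, with the baseurl guessed from the archive layout; the authoritative file is the one just downloaded):

[cloudera-testing]
name=Cloudera's Distribution for Hadoop (testing)
baseurl=http://archive.cloudera.com/redhat/cdh/testing/
enabled=1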

Now install Hadoop 0.20 itself:

[root@hadoop ~]# yum install hadoop-0.20 -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package hadoop-0.20.noarch 0:0.20.0+69-1 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================
Package                 Arch               Version                   Repository                    Size
=========================================================================================================
Installing:
hadoop-0.20             noarch             0.20.0+69-1               cloudera-testing              18 M

Transaction Summary
=========================================================================================================
Install      1 Package(s)        
Update       0 Package(s)        
Remove       0 Package(s)

Total download size: 18 M
Downloading Packages:
hadoop-0.20-0.20.0+69-1.noarch.rpm                                                |  18 MB     01:34    
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hadoop-0.20                                       [1/1]

Installed: hadoop-0.20.noarch 0:0.20.0+69-1
Complete!
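At this point the hadoop-0.20 wrapper command is on the PATH. As a quick sanity check (assuming the stock Hadoop version subcommand), print the installed version; it should show 0.20.0 with Cloudera's +69 patch level:

hadoop-0.20 version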

Then install the pseudo-distributed configuration package, which provides a ready-made single-node setup:

[root@hadoop conf]# yum install hadoop-0.20-conf-pseudo -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package hadoop-0.20-conf-pseudo.noarch 0:0.20.0+69-1 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================
Package                          Arch            Version                Repository                 Size
=========================================================================================================
Installing:
hadoop-0.20-conf-pseudo          noarch          0.20.0+69-1            cloudera-testing           11 k

Transaction Summary
=========================================================================================================
Install      1 Package(s)        
Update       0 Package(s)        
Remove       0 Package(s)

Total download size: 11 k
Downloading Packages:
hadoop-0.20-conf-pseudo-0.20.0+69-1.noarch.rpm                                    |  11 kB     00:00    
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : hadoop-0.20-conf-pseudo                           [1/1]

Installed: hadoop-0.20-conf-pseudo.noarch 0:0.20.0+69-1
Complete!

After installation, you can see the files the package laid down:

[root@hadoop conf.pseudo]# rpm -ql hadoop-0.20-conf-pseudo
/etc/hadoop-0.20/conf.pseudo
/etc/hadoop-0.20/conf.pseudo/README
/etc/hadoop-0.20/conf.pseudo/capacity-scheduler.xml
/etc/hadoop-0.20/conf.pseudo/configuration.xsl
/etc/hadoop-0.20/conf.pseudo/core-site.xml
/etc/hadoop-0.20/conf.pseudo/fair-scheduler.xml
/etc/hadoop-0.20/conf.pseudo/hadoop-env.sh
/etc/hadoop-0.20/conf.pseudo/hadoop-metrics.properties
/etc/hadoop-0.20/conf.pseudo/hadoop-policy.xml
/etc/hadoop-0.20/conf.pseudo/hdfs-site.xml
/etc/hadoop-0.20/conf.pseudo/log4j.properties
/etc/hadoop-0.20/conf.pseudo/mapred-site.xml
/etc/hadoop-0.20/conf.pseudo/masters
/etc/hadoop-0.20/conf.pseudo/slaves
/etc/hadoop-0.20/conf.pseudo/ssl-client.xml.example
/etc/hadoop-0.20/conf.pseudo/ssl-server.xml.example
/var/lib/hadoop-0.20
/var/lib/hadoop-0.20/cache

[root@hadoop conf.pseudo]# pwd
/etc/hadoop-0.20/conf.pseudo
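A note on layout: Cloudera's packaging expects /etc/hadoop-0.20/conf to resolve to the active configuration directory, and installing conf.pseudo makes the pseudo-distributed files the live configuration. To confirm which configuration is in effect (a sketch, assuming the package registers itself with the standard alternatives mechanism):

ls -l /etc/hadoop-0.20/conf
alternatives --display hadoop-0.20-conf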

Take a look at core-site.xml, which holds the filesystem settings:

[root@hadoop conf.pseudo]# more core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>
</configuration>
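core-site.xml thus points the default filesystem at a NameNode on localhost:8020 and relocates Hadoop's working data under /var/lib/hadoop-0.20/cache. The other site files follow the same single-node pattern; roughly (a sketch of typical pseudo-distributed values rather than a verbatim copy from this machine), hdfs-site.xml drops the replication factor to 1 because there is only one DataNode, and mapred-site.xml points the JobTracker at localhost:

<!-- hdfs-site.xml: one DataNode, so a single replica -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- mapred-site.xml: JobTracker on this machine (8021 is the usual CDH port) -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:8021</value>
</property>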

Start the Hadoop services:

[root@hadoop conf.pseudo]# for service in /etc/init.d/hadoop-0.20-*
> do
> sudo $service start
> done
Starting Hadoop datanode daemon (hadoop-datanode): starting datanode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-datanode-hadoop.out
                                                           [  OK  ]
Starting Hadoop jobtracker daemon (hadoop-jobtracker): starting jobtracker, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-jobtracker-hadoop.out
                                                           [  OK  ]
Starting Hadoop namenode daemon (hadoop-namenode): starting namenode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-namenode-hadoop.out
                                                           [  OK  ]
Starting Hadoop secondarynamenode daemon (hadoop-secondarynamenode): starting secondarynamenode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
                                                           [  OK  ]
Starting Hadoop tasktracker daemon (hadoop-tasktracker): starting tasktracker, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-tasktracker-hadoop.out
                                                           [  OK  ]

Verify that the five daemons are running:

[root@hadoop conf.pseudo]# ps -ef | grep hadoop
hadoop    3503     1  8 18:33 ?        00:00:03 /usr/java/jdk1.6.0_16/bin/java -Xmx1000m -Dcom.sun.manage
hadoop    3577     1 10 18:33 ?        00:00:04 /usr/java/jdk1.6.0_16/bin/java -Xmx1000m -Dcom.sun.manage
hadoop    3657     1 15 18:33 ?        00:00:05 /usr/java/jdk1.6.0_16/bin/java -Xmx1000m -Dcom.sun.manage
hadoop    3734     1 11 18:33 ?        00:00:04 /usr/java/jdk1.6.0_16/bin/java -Xmx1000m -Dcom.sun.manage
hadoop    3827     1  7 18:33 ?        00:00:02 /usr/java/jdk1.6.0_16/bin/java -Xmx1000m -Dhadoop.log.di
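Beyond the process listing, two more quick checks are worth running (assuming stock Hadoop 0.20 ports and subcommands):

hadoop-0.20 dfsadmin -report

This should report one live DataNode. The web consoles should also answer on the usual ports: the NameNode at http://localhost:50070/ and the JobTracker at http://localhost:50030/.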

Now run a few examples. First create an input directory in HDFS and copy in the configuration files:

[root@hadoop conf.pseudo]# hadoop-0.20 fs -mkdir input
[root@hadoop conf.pseudo]# hadoop-0.20 fs -put /etc/hadoop-0.20/conf/*.xml input
[root@hadoop conf.pseudo]# hadoop-0.20 fs -ls input
Found 6 items
-rw-r--r--   1 root supergroup       6275 2009-08-25 18:34 /user/root/input/capacity-scheduler.xml
-rw-r--r--   1 root supergroup        338 2009-08-25 18:34 /user/root/input/core-site.xml
-rw-r--r--   1 root supergroup       3032 2009-08-25 18:34 /user/root/input/fair-scheduler.xml
-rw-r--r--   1 root supergroup       4190 2009-08-25 18:34 /user/root/input/hadoop-policy.xml
-rw-r--r--   1 root supergroup        496 2009-08-25 18:34 /user/root/input/hdfs-site.xml
-rw-r--r--   1 root supergroup        213 2009-08-25 18:34 /user/root/input/mapred-site.xml

Then run the grep example from the bundled examples jar; it searches the input for strings matching the given regex and launches two MapReduce jobs, the search itself followed by a sort of the results:

[root@hadoop conf.pseudo]# hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
09/08/25 18:34:59 INFO mapred.FileInputFormat: Total input paths to process : 6
09/08/25 18:35:00 INFO mapred.JobClient: Running job: job_200908251833_0001
09/08/25 18:35:01 INFO mapred.JobClient:  map 0% reduce 0%
09/08/25 18:35:20 INFO mapred.JobClient:  map 33% reduce 0%
09/08/25 18:35:33 INFO mapred.JobClient:  map 66% reduce 11%
09/08/25 18:35:42 INFO mapred.JobClient:  map 66% reduce 22%
09/08/25 18:35:45 INFO mapred.JobClient:  map 100% reduce 22%
09/08/25 18:35:57 INFO mapred.JobClient:  map 100% reduce 100%
09/08/25 18:35:59 INFO mapred.JobClient: Job complete: job_200908251833_0001
09/08/25 18:35:59 INFO mapred.JobClient: Counters: 18
09/08/25 18:35:59 INFO mapred.JobClient:   Job Counters
09/08/25 18:35:59 INFO mapred.JobClient:     Launched reduce tasks=1
09/08/25 18:35:59 INFO mapred.JobClient:     Launched map tasks=6
09/08/25 18:35:59 INFO mapred.JobClient:     Data-local map tasks=6
09/08/25 18:35:59 INFO mapred.JobClient:   FileSystemCounters
09/08/25 18:35:59 INFO mapred.JobClient:     FILE_BYTES_READ=100
09/08/25 18:35:59 INFO mapred.JobClient:     HDFS_BYTES_READ=14544
09/08/25 18:35:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=422
09/08/25 18:35:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=204
09/08/25 18:35:59 INFO mapred.JobClient:   Map-Reduce Framework
09/08/25 18:35:59 INFO mapred.JobClient:     Reduce input groups=4
09/08/25 18:35:59 INFO mapred.JobClient:     Combine output records=4
09/08/25 18:35:59 INFO mapred.JobClient:     Map input records=364
09/08/25 18:35:59 INFO mapred.JobClient:     Reduce shuffle bytes=124
09/08/25 18:35:59 INFO mapred.JobClient:     Reduce output records=4
09/08/25 18:35:59 INFO mapred.JobClient:     Spilled Records=8
09/08/25 18:35:59 INFO mapred.JobClient:     Map output bytes=86
09/08/25 18:35:59 INFO mapred.JobClient:     Map input bytes=14544
09/08/25 18:35:59 INFO mapred.JobClient:     Combine input records=4
09/08/25 18:35:59 INFO mapred.JobClient:     Map output records=4
09/08/25 18:35:59 INFO mapred.JobClient:     Reduce input records=4
09/08/25 18:35:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/08/25 18:35:59 INFO mapred.FileInputFormat: Total input paths to process : 1
09/08/25 18:36:00 INFO mapred.JobClient: Running job: job_200908251833_0002
09/08/25 18:36:01 INFO mapred.JobClient:  map 0% reduce 0%
09/08/25 18:36:12 INFO mapred.JobClient:  map 100% reduce 0%
09/08/25 18:36:24 INFO mapred.JobClient:  map 100% reduce 100%
09/08/25 18:36:26 INFO mapred.JobClient: Job complete: job_200908251833_0002
09/08/25 18:36:26 INFO mapred.JobClient: Counters: 18
09/08/25 18:36:26 INFO mapred.JobClient:   Job Counters
09/08/25 18:36:26 INFO mapred.JobClient:     Launched reduce tasks=1
09/08/25 18:36:26 INFO mapred.JobClient:     Launched map tasks=1
09/08/25 18:36:26 INFO mapred.JobClient:     Data-local map tasks=1
09/08/25 18:36:26 INFO mapred.JobClient:   FileSystemCounters
09/08/25 18:36:26 INFO mapred.JobClient:     FILE_BYTES_READ=100
09/08/25 18:36:26 INFO mapred.JobClient:     HDFS_BYTES_READ=204
09/08/25 18:36:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=232
09/08/25 18:36:26 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=62
09/08/25 18:36:26 INFO mapred.JobClient:   Map-Reduce Framework
09/08/25 18:36:26 INFO mapred.JobClient:     Reduce input groups=1
09/08/25 18:36:26 INFO mapred.JobClient:     Combine output records=0
09/08/25 18:36:26 INFO mapred.JobClient:     Map input records=4
09/08/25 18:36:26 INFO mapred.JobClient:     Reduce shuffle bytes=0
09/08/25 18:36:26 INFO mapred.JobClient:     Reduce output records=4
09/08/25 18:36:26 INFO mapred.JobClient:     Spilled Records=8
09/08/25 18:36:26 INFO mapred.JobClient:     Map output bytes=86
09/08/25 18:36:26 INFO mapred.JobClient:     Map input bytes=118
09/08/25 18:36:26 INFO mapred.JobClient:     Combine input records=0
09/08/25 18:36:26 INFO mapred.JobClient:     Map output records=4
09/08/25 18:36:26 INFO mapred.JobClient:     Reduce input records=4

Inspect the results in HDFS:

[root@hadoop conf.pseudo]# hadoop-0.20 fs -ls
Found 2 items
drwxr-xr-x   - root supergroup          0 2009-08-25 18:34 /user/root/input
drwxr-xr-x   - root supergroup          0 2009-08-25 18:36 /user/root/output

[root@hadoop conf.pseudo]# hadoop-0.20 fs -ls output
Found 2 items
drwxr-xr-x   - root supergroup          0 2009-08-25 18:36 /user/root/output/_logs
-rw-r--r--   1 root supergroup         62 2009-08-25 18:36 /user/root/output/part-00000

[root@hadoop conf.pseudo]# hadoop-0.20 fs -cat output/part-00000 | head
1       dfs.name.dir
1       dfs.permissions
1       dfs.replication
1       dfsadmin
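The four dfs* hits above come from the configuration files copied into input. As a follow-up exercise (assuming the wordcount example that ships in the same examples jar), run a word count over the same input; note that a job's output directory must not already exist, so give each run a fresh one:

hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount input output-wordcount
hadoop-0.20 fs -cat output-wordcount/part-00000 | head

When you are done, the daemons can be stopped with the same loop used to start them, substituting stop for start.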
