Hadoop Installation
Install the JDK
vim ~/.bash_profile
export JAVA_HOME="YOUR_JAVA_HOME"
export PATH=$PATH:$JAVA_HOME/bin
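To make the variables take effect in the current shell (assuming bash is your login shell, as the profile name suggests):
source ~/.bash_profile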
After the configuration is done, run:
java -version
--------------
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
Passwordless SSH Login
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost # verify
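If ssh localhost still prompts for a password, directory permissions are the usual culprit; a quick fix (standard OpenSSH requirements, nothing specific to Hadoop):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys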
Configure Hadoop
Download Hadoop and extract it to a directory of your choice; /opt is used here.
Configure the environment variables:
vim ~/.bash_profile
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_PREFIX=$HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
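Reload the profile and confirm that the hadoop command resolves:
source ~/.bash_profile
hadoop version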
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (the krb5 options below are a commonly used workaround for Kerberos-related warnings when running Hadoop on macOS):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
Edit $HADOOP_HOME/etc/hadoop/core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/micmiu/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.native.lib.available</name>
<value>false</value>
<description>Default value is true: should native hadoop libraries, if present, be used.</description>
</property>
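Before starting Hadoop, make sure the hadoop.tmp.dir directory actually exists; the path below mirrors the value configured above, so adjust it to your own machine:
mkdir -p /Users/micmiu/tmp/hadoop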
Edit hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
<!-- Use 1 for a single node; for a cluster, set this according to the actual number of nodes -->
</property>
Edit yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Edit mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
Format the NameNode (the older hadoop namenode -format form still works but is deprecated in 2.x):
hdfs namenode -format
Start HDFS and YARN:
start-dfs.sh
start-yarn.sh
Check that the daemons are running:
jps
6917 DataNode
6838 NameNode
2810 Launcher
7130 ResourceManager
7019 SecondaryNameNode
7772 Jps
7215 NodeManager
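The web UIs are another quick check (default ports for Hadoop 2.x):
open http://localhost:50070   # NameNode UI
open http://localhost:8088    # ResourceManager UI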
WordCount Example
hdfs dfs -mkdir -p /user/jjzhu/wordcount/in
hdfs dfs -put xxxxx.txt /user/jjzhu/wordcount/in
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/jjzhu/wordcount/in /user/jjzhu/wordcount/out
The job run looks like this:
17/04/07 13:04:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/07 13:04:10 INFO input.FileInputFormat: Total input paths to process : 1
17/04/07 13:04:10 INFO mapreduce.JobSubmitter: number of splits:1
17/04/07 13:04:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491532908338_0004
17/04/07 13:04:11 INFO impl.YarnClientImpl: Submitted application application_1491532908338_0004
17/04/07 13:04:11 INFO mapreduce.Job: The url to track the job: http://jjzhu:8088/proxy/application_1491532908338_0004/
17/04/07 13:04:11 INFO mapreduce.Job: Running job: job_1491532908338_0004
17/04/07 13:04:18 INFO mapreduce.Job: Job job_1491532908338_0004 running in uber mode : false
17/04/07 13:04:18 INFO mapreduce.Job: map 0% reduce 0%
17/04/07 13:04:23 INFO mapreduce.Job: map 100% reduce 0%
17/04/07 13:04:29 INFO mapreduce.Job: map 100% reduce 100%
17/04/07 13:04:29 INFO mapreduce.Job: Job job_1491532908338_0004 completed successfully
17/04/07 13:04:29 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1141
        FILE: Number of bytes written=239913
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=869
        HDFS: Number of bytes written=779
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2859
        Total time spent by all reduces in occupied slots (ms)=2527
        Total time spent by all map tasks (ms)=2859
        Total time spent by all reduce tasks (ms)=2527
        Total vcore-milliseconds taken by all map tasks=2859
        Total vcore-milliseconds taken by all reduce tasks=2527
        Total megabyte-milliseconds taken by all map tasks=2927616
        Total megabyte-milliseconds taken by all reduce tasks=2587648
    Map-Reduce Framework
        Map input records=1
        Map output records=118
        Map output bytes=1219
        Map output materialized bytes=1141
        Input split bytes=122
        Combine input records=118
        Combine output records=89
        Reduce input groups=89
        Reduce shuffle bytes=1141
        Reduce input records=89
        Reduce output records=89
        Spilled Records=178
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=103
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=329252864
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=747
    File Output Format Counters
        Bytes Written=779
View the results:
hdfs dfs -ls /user/jjzhu/wordcount/out
-rw-r--r-- 1 didi supergroup 0 2017-04-07 13:04 /user/jjzhu/wordcount/out/_SUCCESS
-rw-r--r-- 1 didi supergroup 779 2017-04-07 13:04 /user/jjzhu/wordcount/out/part-r-00000
hdfs dfs -cat /user/jjzhu/wordcount/out/part-r-00000
A 1
Other 1
Others 1
Some 2
There 1
a 1
access 2
access); 1
according 1
adding 1
allowing 1
......
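To merge the whole result into a single local file, hdfs dfs -getmerge is handy (the local file name here is just an example):
hdfs dfs -getmerge /user/jjzhu/wordcount/out ./wordcount_result.txt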
Stop Hadoop
stop-dfs.sh
stop-yarn.sh
Install Hive
Download and extract it, then configure the environment variables:
export HIVE_HOME=/opt/hive-2.1.1
export PATH=$HIVE_HOME/bin:$PATH
Configure Hive
cd /opt/hive-2.1.1/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
vim hive-env.sh
HADOOP_HOME=/opt/hadoop-2.7.3
export HIVE_CONF_DIR=/opt/hive-2.1.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive-2.1.1/lib
Download mysql-connector-xx.xx.xx.jar into the lib directory.
vim hive-site.xml
Replace every occurrence of ${system:java.io.tmpdir} with a concrete local directory (e.g. /tmp/hive) and ${system:user.name} with your user name.
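One way to do the substitution in bulk, as a sketch (BSD/macOS sed syntax; /tmp/hive and the user name didi are placeholders, substitute your own values):
sed -i '' -e 's#\${system:java\.io\.tmpdir}#/tmp/hive#g' -e 's#\${system:user\.name}#didi#g' hive-site.xml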
Then configure the MySQL connection settings:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
Create the HDFS directories for Hive:
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod -R 777 /user/hive
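A quick sanity check that the directories were created:
hdfs dfs -ls /user/hive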
Initialize the metastore database (make sure MySQL is running first):
./bin/schematool -initSchema -dbType mysql
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
mysql> use hive;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WRITE_SET |
+---------------------------+
Start Hive
jjzhu:opt didi$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
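A quick smoke test from the command line (the database name test_db is just an example):
hive -e 'create database if not exists test_db; show databases;'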
Install Sqoop
Download and extract it, then configure the environment variables:
export SQOOP_HOME=/opt/sqoop-1.99.7
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$SQOOP_HOME/bin:$PATH
Modify the Sqoop configuration
The conf directory holds the two main configuration files, sqoop.properties and sqoop_bootstrap.properties.
The one that mainly needs editing is sqoop.properties:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.7.3/etc/hadoop
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
Verify that the configuration is valid:
jjzhu:bin didi$ ./sqoop2-tool verify
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
12 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
jjzhu:bin didi$
Start the server:
./bin/sqoop2-server start
jps
9505 SqoopJettyServer
....
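With the server up, the bundled shell can connect to it; a minimal sketch using the standard Sqoop2 shell commands (12000 is the default server port):
./bin/sqoop2-shell
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
sqoop:000> show version --all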