高可用Hadoop平台-应用JAR部署

1.概述

  今天在观察集群时,发现NN节点的负载过高,虽然对NN节点的资源进行了调整,同时对NN节点上的应用程序进行重新打包调整,负载问题暂时得到
缓解。但是,我想了想,这样也不是长久之计。通过这个问题,我重新分析了一下以前应用部署架构图,发现了一些问题的所在,之前的部署架构是,将打包的应用
直接部署在Hadoop集群上,虽然这没什么不好,但是我们分析得知,若是将应用部署在DN节点,那么时间长了应用程序会不会抢占DN节点的资源,那么如
果我们部署在NN节点上,又对NN节点计算任务时造成影响,于是,经过讨论后,我们觉得应用程序不应该对Hadoop集群造成干扰,他们应该是属于一种松
耦合的关系,所有的应用应该部署在一个AppServer集群上。下面,我就为大家介绍今天的内容。

2.应用部署剖析

  由于之前的应用程序直接部署在Hadoop集群上,这堆集群或多或少造成了一些影响。我们知道在本地开发Hadoop应用的时候,都可以直接运
行相关Hadoop代码,这里我们只用到了Hadoop的HDFS的地址,那我们为什么不能直接将应用单独部署呢?其实本地开发就可以看作是
AppServer集群的一个节点,借助这个思路,我们将应用单独打包后,部署在一个独立的AppServer集群,只需要用到Hadoop集群的
HDFS地址即可,这里需要注意的是,保证AppServer集群与Hadoop集群在同一个网段。下面我给出解耦后应用部署架构图,如下图所示:

  从图中我们可以看出,AppServer集群想Hadoop集群提交作业,两者之间的数据交互,只需用到Hadoop的HDFS地址和Java API。在AppServer上的应用不会影响到Hadoop集群的正常运行。

3.示例

  下面为大家演示相关示例,以WordCountV2为例子,代码如下所示:

package cn.hadoop.hdfs.main;

import java.io.IOException;
import java.util.Random;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import cn.hadoop.hdfs.util.SystemConfig;

/**
 * @Date Apr 23, 2015
 *
 * @Author dengjie
 *
 * @Note Wordcount的例子是一个比较经典的mapreduce例子,可以叫做Hadoop版的hello world。
 *       它将文件中的单词分割取出,然后shuffle,sort(map过程),接着进入到汇总统计
 *       (reduce过程),最后写道hdfs中。基本流程就是这样。
 */
public class WordCountV2 {

    private static Logger logger = LoggerFactory.getLogger(WordCountV2.class);
    private static Configuration conf;

    /**
     * 设置高可用集群连接信息
     */
    static {
        String tag = SystemConfig.getProperty("dev.tag");
        String[] hosts = SystemConfig.getPropertyArray(tag + ".hdfs.host", ",");
        conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://cluster1");
        conf.set("dfs.nameservices", "cluster1");
        conf.set("dfs.ha.namenodes.cluster1", "nna,nns");
        conf.set("dfs.namenode.rpc-address.cluster1.nna", hosts[0]);
        conf.set("dfs.namenode.rpc-address.cluster1.nns", hosts[1]);
        conf.set("dfs.client.failover.proxy.provider.cluster1",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /**
         * 源文件:a b b
         *
         * map之后:
         *
         * a 1
         *
         * b 1
         *
         * b 1
         */
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());// 整行读取
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());// 按空格分割单词
                context.write(word, one);// 每次统计出来的单词+1
            }
        }
    }

    /**
     * reduce之前:
     *
     * a 1
     *
     * b 1
     *
     * b 1
     *
     * reduce之后:
     *
     * a 1
     *
     * b 2
     */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
                InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();// 分组累加
            }
            result.set(sum);
            context.write(key, result);// 按相同的key输出
        }
    }

    public static void main(String[] args) {
        try {
            if (args.length < 1) {
                logger.info("args length is 0");
                run("hello.txt");
            } else {
                logger.info("args length is not 0");
                run(args[0]);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
            logger.error(ex.getMessage());
        }
    }

    private static void run(String name) throws Exception {
        long randName = new Random().nextLong();// 重定向输出目录
        logger.info("output name is [" + randName + "]");

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCountV2.class);
        job.setMapperClass(TokenizerMapper.class);// 指定Map计算的类
        job.setCombinerClass(IntSumReducer.class);// 合并的类
        job.setReducerClass(IntSumReducer.class);// Reduce的类
        job.setOutputKeyClass(Text.class);// 输出Key类型
        job.setOutputValueClass(IntWritable.class);// 输出值类型

        String sysInPath = SystemConfig.getProperty("hdfs.input.path.v2");
        String realInPath = String.format(sysInPath, name);
        String syOutPath = SystemConfig.getProperty("hdfs.output.path.v2");
        String realOutPath = String.format(syOutPath, randName);

        FileInputFormat.addInputPath(job, new Path(realInPath));// 指定输入路径
        FileOutputFormat.setOutputPath(job, new Path(realOutPath));// 指定输出路径

        System.exit(job.waitForCompletion(true) ? 0 : 1);// 执行完MR任务后退出应用
    }
}

  在本地IDE中运行正常,截图如下所示:

4.应用打包部署

  然后,我们将WordCountV2应用打包后部署到AppServer1节点,这里由于工程是基于Maven结构的,我们使用Maven命令直接打包,打包命令如下所示:



mvn assembly:assembly

  然后,我们使用scp命令将打包后的JAR文件上传到AppServer1节点,上传命令如下所示:



scp hadoop-ubas-1.0.0-jar-with-dependencies.jar hadoop@apps:~/

  接着,我们在AppServer1节点上运行我们打包好的应用,运行命令如下所示:



java -jar hadoop-ubas-1.0.0-jar-with-dependencies.jar

  但是,这里却很无奈的报错了,错误信息如下所示:

java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:518)
    at cn.hadoop.hdfs.main.WordCountV2.run(WordCountV2.java:134)
    at cn.hadoop.hdfs.main.WordCountV2.main(WordCountV2.java:108)
2015-05-10 23:31:21 ERROR [WordCountV2.main] - No FileSystem for scheme: hdfs

5.错误分析

  首先,我们来定位下问题原因,我将打包后的JAR在Hadoop集群上运行,是可以完成良好的运行,并计算出结果信息的,为什么在非
Hadoop集群却报错呢?难道是这种架构方式不对?经过仔细的分析错误信息,和我们的Maven依赖环境,问题原因定位出来了,这里我们使用了
Maven的assembly插件来打包应用。只是因为当我们使用Maven组件时,它将所有的JARS合并到一个文件中,所有的META-
INFO/services/org.apache.hadoop.fs.FileSystem被互相覆盖,仅保留最后一个加入的,在这种情况下
FileSystem的列表从Hadoop-Commons重写到Hadoop-HDFS的列表,而DistributedFileSystem就会找不
到相应的声明信息。因而,就会出现上述错误信息。在原因找到后,我们剩下的就是去找到解决方法,这里通过分析,我找到的解决办法如下,在Loading相
关Hadoop的Configuration时,我们设置相关FileSystem即可,配置代码如下所示:

conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

  接下来,我们重新打包应用,然后在AppServer1节点运行该应用,运行正常,并正常统计结果,运行日志如下所示:

[hadoop@apps example]$ java -jar hadoop-ubas-1.0.0-jar-with-dependencies.jar
2015-05-11 00:08:15 INFO  [SystemConfig.main] - Successfully loaded default properties.
2015-05-11 00:08:15 INFO  [WordCountV2.main] - args length is 0
2015-05-11 00:08:15 INFO  [WordCountV2.main] - output name is [6876390710620561863]
2015-05-11 00:08:16 WARN  [NativeCodeLoader.main] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-05-11 00:08:17 INFO  [deprecation.main] - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-05-11 00:08:17 INFO  [JvmMetrics.main] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-05-11 00:08:17 WARN  [JobSubmitter.main] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2015-05-11 00:08:17 INFO  [FileInputFormat.main] - Total input paths to process : 1
2015-05-11 00:08:18 INFO  [JobSubmitter.main] - number of splits:1
2015-05-11 00:08:18 INFO  [JobSubmitter.main] - Submitting tokens for job: job_local519626586_0001
2015-05-11 00:08:18 INFO  [Job.main] - The url to track the job: http://localhost:8080/
2015-05-11 00:08:18 INFO  [Job.main] - Running job: job_local519626586_0001
2015-05-11 00:08:18 INFO  [LocalJobRunner.Thread-14] - OutputCommitter set in config null
2015-05-11 00:08:18 INFO  [LocalJobRunner.Thread-14] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2015-05-11 00:08:18 INFO  [LocalJobRunner.Thread-14] - Waiting for map tasks
2015-05-11 00:08:18 INFO  [LocalJobRunner.LocalJobRunner Map Task Executor #0] - Starting task: attempt_local519626586_0001_m_000000_0
2015-05-11 00:08:18 INFO  [Task.LocalJobRunner Map Task Executor #0] -  Using ResourceCalculatorProcessTree : [ ]
2015-05-11 00:08:18 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - Processing split: hdfs://cluster1/home/hdfs/test/in/hello.txt:0+24
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - (EQUATOR) 0 kvi 26214396(104857584)
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - mapreduce.task.io.sort.mb: 100
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - soft limit at 83886080
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - bufstart = 0; bufvoid = 104857600
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - kvstart = 26214396; length = 6553600
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-05-11 00:08:19 INFO  [LocalJobRunner.LocalJobRunner Map Task Executor #0] -
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - Starting flush of map output
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - Spilling map output
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - bufstart = 0; bufend = 72; bufvoid = 104857600
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - kvstart = 26214396(104857584); kvend = 26214352(104857408); length = 45/6553600
2015-05-11 00:08:19 INFO  [MapTask.LocalJobRunner Map Task Executor #0] - Finished spill 0
2015-05-11 00:08:19 INFO  [Task.LocalJobRunner Map Task Executor #0] - Task:attempt_local519626586_0001_m_000000_0 is done. And is in the process of committing
2015-05-11 00:08:19 INFO  [LocalJobRunner.LocalJobRunner Map Task Executor #0] - map
2015-05-11 00:08:19 INFO  [Task.LocalJobRunner Map Task Executor #0] - Task 'attempt_local519626586_0001_m_000000_0' done.
2015-05-11 00:08:19 INFO  [LocalJobRunner.LocalJobRunner Map Task Executor #0] - Finishing task: attempt_local519626586_0001_m_000000_0
2015-05-11 00:08:19 INFO  [LocalJobRunner.Thread-14] - map task executor complete.
2015-05-11 00:08:19 INFO  [LocalJobRunner.Thread-14] - Waiting for reduce tasks
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - Starting task: attempt_local519626586_0001_r_000000_0
2015-05-11 00:08:19 INFO  [Task.pool-6-thread-1] -  Using ResourceCalculatorProcessTree : [ ]
2015-05-11 00:08:19 INFO  [ReduceTask.pool-6-thread-1] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@16769723
2015-05-11 00:08:19 INFO  [MergeManagerImpl.pool-6-thread-1] - MergerManager: memoryLimit=177399392, maxSingleShuffleLimit=44349848, mergeThreshold=117083600, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2015-05-11 00:08:19 INFO  [EventFetcher.EventFetcher for fetching Map Completion Events] - attempt_local519626586_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2015-05-11 00:08:19 INFO  [LocalFetcher.localfetcher#1] - localfetcher#1 about to shuffle output of map attempt_local519626586_0001_m_000000_0 decomp: 50 len: 54 to MEMORY
2015-05-11 00:08:19 INFO  [InMemoryMapOutput.localfetcher#1] - Read 50 bytes from map-output for attempt_local519626586_0001_m_000000_0
2015-05-11 00:08:19 INFO  [MergeManagerImpl.localfetcher#1] - closeInMemoryFile -> map-output of size: 50, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->50
2015-05-11 00:08:19 INFO  [EventFetcher.EventFetcher for fetching Map Completion Events] - EventFetcher is interrupted.. Returning
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - 1 / 1 copied.
2015-05-11 00:08:19 INFO  [MergeManagerImpl.pool-6-thread-1] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2015-05-11 00:08:19 INFO  [Merger.pool-6-thread-1] - Merging 1 sorted segments
2015-05-11 00:08:19 INFO  [Merger.pool-6-thread-1] - Down to the last merge-pass, with 1 segments left of total size: 46 bytes
2015-05-11 00:08:19 INFO  [MergeManagerImpl.pool-6-thread-1] - Merged 1 segments, 50 bytes to disk to satisfy reduce memory limit
2015-05-11 00:08:19 INFO  [MergeManagerImpl.pool-6-thread-1] - Merging 1 files, 54 bytes from disk
2015-05-11 00:08:19 INFO  [MergeManagerImpl.pool-6-thread-1] - Merging 0 segments, 0 bytes from memory into reduce
2015-05-11 00:08:19 INFO  [Merger.pool-6-thread-1] - Merging 1 sorted segments
2015-05-11 00:08:19 INFO  [Merger.pool-6-thread-1] - Down to the last merge-pass, with 1 segments left of total size: 46 bytes
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - 1 / 1 copied.
2015-05-11 00:08:19 INFO  [deprecation.pool-6-thread-1] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2015-05-11 00:08:19 INFO  [Job.main] - Job job_local519626586_0001 running in uber mode : false
2015-05-11 00:08:19 INFO  [Job.main] -  map 100% reduce 0%
2015-05-11 00:08:19 INFO  [Task.pool-6-thread-1] - Task:attempt_local519626586_0001_r_000000_0 is done. And is in the process of committing
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - 1 / 1 copied.
2015-05-11 00:08:19 INFO  [Task.pool-6-thread-1] - Task attempt_local519626586_0001_r_000000_0 is allowed to commit now
2015-05-11 00:08:19 INFO  [FileOutputCommitter.pool-6-thread-1] - Saved output of task 'attempt_local519626586_0001_r_000000_0' to hdfs://cluster1/home/hdfs/test/out/6876390710620561863/_temporary/0/task_local519626586_0001_r_000000
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - reduce > reduce
2015-05-11 00:08:19 INFO  [Task.pool-6-thread-1] - Task 'attempt_local519626586_0001_r_000000_0' done.
2015-05-11 00:08:19 INFO  [LocalJobRunner.pool-6-thread-1] - Finishing task: attempt_local519626586_0001_r_000000_0
2015-05-11 00:08:19 INFO  [LocalJobRunner.Thread-14] - reduce task executor complete.
2015-05-11 00:08:20 INFO  [Job.main] -  map 100% reduce 100%
2015-05-11 00:08:20 INFO  [Job.main] - Job job_local519626586_0001 completed successfully
2015-05-11 00:08:20 INFO  [Job.main] - Counters: 38
    File System Counters
        FILE: Number of bytes read=77813788
        FILE: Number of bytes written=78928898
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=48
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=2
        Map output records=12
        Map output bytes=72
        Map output materialized bytes=54
        Input split bytes=108
        Combine input records=12
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=54
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=53
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=241442816
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=24
    File Output Format Counters
        Bytes Written=24

6.总结

  这里需要注意的是,我们应用部署架构没问题,思路是正确的,问题出在打包上,在打包的时候需要特别注意,另外,有些同学使用IDE的Export导出时也要注意一下,相关依赖是否存在,还有常见的第三方打包工具Fat,这个也是需要注意的。

7.结束语

  这篇博客就和大家分享到这里,如果大家在研究学习的过程当中有什么问题,可以加群进行讨论或发送邮件给我,我会尽我所能为您解答,与君共勉!

时间: 2025-01-29 23:25:33

高可用Hadoop平台-应用JAR部署的相关文章

高可用Hadoop平台-启航

1.概述 在上篇博客中,我们搭建了<配置高可用Hadoop平台>, 接下来我们就可以驾着Hadoop这艘巨轮在大数据的海洋中遨游了.工欲善其事,必先利其器.是的,没错:我们开发需要有开发工具(IDE):本篇文章, 我打算讲解如何搭建和使用开发环境,以及编写和讲解WordCount这个例子,给即将在Hadoop的海洋驰骋的童鞋入个门.上次,我在<网站日志统计案例分析与实现>中说会将源码放到Github,后来,我考虑了下,决定将<高可用的Hadoop平台>做一个系列,后面基

高可用Hadoop平台-实战尾声篇

1.概述 今天这篇博客就是<高可用Hadoop平台>的尾声篇了,从搭建安装到入门运行 Hadoop 版的 HelloWorld(WordCount 可以称的上是 Hadoop 版的 HelloWorld ),在到开发中需要用到的各个套件以及对套件的安装使用,在到 Hadoop 的实战,一路走来我们对在Hadoop平台下开发的基本流程应该都熟悉了.今天我们来完成在高可用Hadoop平台开发的最后一步,导出数据. 2.导出数据目的 首先,我来说明下为什么要导出数据,导出数据的目的是为了干嘛? 我们

高可用Hadoop平台-探索

1.概述 上篇<高可用Hadoop平台-启航>博客已经让我们初步了解了Hadoop平台:接下来,我们对Hadoop做进一步的探索,一步一步的揭开Hadoop的神秘面纱.下面,我们开始赘述今天的探索之路. 2.探索 在探索之前,我们来看一下Hadoop解决了什么问题,Hadoop就是解决了大数据(大到单台服务器无法进行存储,单台服务器无法在限定的时间内进行处理)的可靠存储和处理. HDFS:在由普通或廉价的服务器(或PC)组成的集群上提供高可用的文件存储,通过将块保存多个副本的办法解决服务器或硬

高可用Hadoop平台-集成Hive HAProxy

1.概述 这篇博客是接着<高可用Hadoop平台>系列讲,本篇博客是为后面用 Hive 来做数据统计做准备的,介绍如何在 Hadoop HA 平台下集成高可用的 Hive 工具,下面我打算分以下流程来赘述: 环境准备 集成并配置 Hive 工具 使用 Java API 开发 Hive 代码 下面开始进行环境准备. 2.环境准备 Hive版本:<Hive-0.14> HAProxy版本:<HAProxy-1.5.11> 注:前提是 Hadoop 的集群已经搭建完成,若还没

高可用Hadoop平台-实战

1.概述 今天继续<高可用的Hadoop平台>系列,今天开始进行小规模的实战下,前面的准备工作完成后,基本用于统计数据的平台都拥有了,关于导出统计结果的文章留到后面赘述.今天要和大家分享的案例是一个基于电商网站的用户行为分析,这里分析的指标包含以下指标: 统计每日PV 每日注册用户 每日IP 跳出用户 其他指标可以参考上述4个指标进行拓展,下面我们开始今天的分析之旅. 2.流程 首先,在开发之前我们需要注意哪些问题?我们不能盲目的按照自己的意愿去开发项目,这样到头来得不到产品的认可,我们的工作

高可用Hadoop平台-Flume NG实战图解篇

1.概述 今天补充一篇关于Flume的博客,前面在讲解高可用的Hadoop平台的时候遗漏了这篇,本篇博客为大家讲述以下内容: Flume NG简述 单点Flume NG搭建.运行 高可用Flume NG搭建 Failover测试 截图预览 下面开始今天的博客介绍. 2.Flume NG简述 Flume NG是一个分布式,高可用,可靠的系统,它能将不同的海量数据收集,移动并存储到一个数据存储系统中.轻量,配置简单,适用于各种日志收集,并支持Failover和负载均衡.并且它拥有非常丰富的组件.Fl

高可用Hadoop平台-Ganglia安装部署

1.概述 最近,有朋友私密我,Hadoop有什么好的监控工具,其实,Hadoop的监控工具还是蛮多的.今天给大家分享一个老牌监控工具 Ganglia,这个在企业用的也算是比较多的,Hadoop对它的兼容也很好,不过就是监控界面就不是很美观.下次给大家介绍另一款工具--Hue,这 个界面官方称为Hadoop UI,界面美观,功能也比较丰富.今天,在这里主要给大家介绍Ganglia这款监控工具,介绍的内容主要包含如下: Ganglia背景 Ganglia安装部署.配置 Hadoop集群配置Gangl

高可用Hadoop平台-HBase集群搭建

1.概述 今天补充一篇HBase集群的搭建,这个是高可用系列遗漏的一篇博客,今天抽时间补上,今天给大家介绍的主要内容目录如下所示: 基础软件的准备 HBase介绍 HBase集群搭建 单点问题验证 截图预览 那么,接下来我们开始今天的HBase集群搭建学习. 2.基础软件的准备 由于HBase的数据是存放在HDFS上的,所以我们在使用HBase时,确保Hadoop集群已搭建完成,并运行良好.若是为搭建Hadoop集群,请参考我写的<配置高可用的Hadoop平台>来完成Hadoop平台的搭建.另

高可用Hadoop平台-答疑篇

1.概述 这篇博客不涉及到具体的编码,只是解答最近一些朋友心中的疑惑.最近,一些朋友和网友纷纷私密我,我总结了一下,疑问大致包含以下几点: 我学 Hadoop 后能从事什么岗位? 在遇到问题,我该如何去寻求解决方案? 针对以上问题,我在这里赘述下个人的经验,给即将步入 Hadoop 行业的同学做个参考. 2.我学 Hadoop 后能从事什么岗位 目前 Hadoop 相关的工作大致分为三类:应用,运维,二次开发 2.1 应用 这方面的主要工作是编写MapReduce作业,利用Hive之类的套件来进