As input
When a compressed file is used as MapReduce input, MapReduce automatically infers the appropriate codec from the file extension and decompresses the data with it.
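The lookup behind this is extension-based: Hadoop's CompressionCodecFactory maps known file extensions to codec classes. The following self-contained sketch only illustrates that mapping idea (the table and class here are illustrative stand-ins, not Hadoop's API; the real resolution lives in org.apache.hadoop.io.compress.CompressionCodecFactory):

```java
import java.util.HashMap;
import java.util.Map;

public class CodecLookup {
    // Illustrative extension-to-codec table; the real mapping is built by
    // org.apache.hadoop.io.compress.CompressionCodecFactory from the
    // configured codec classes.
    private static final Map<String, String> CODECS = new HashMap<>();
    static {
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    // Return the codec class name for a path, or null when the file has
    // no recognized compression extension (the input is then read as-is).
    public static String codecFor(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return CODECS.get(path.substring(dot));
    }

    public static void main(String[] args) {
        System.out.println(codecFor("input/data.gz"));
        System.out.println(codecFor("input/data.txt"));
    }
}
```

A file such as input/data.gz therefore resolves to GzipCodec, while a plain .txt file resolves to no codec at all.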
As output
When the MapReduce output needs to be compressed, set mapred.output.compress to true and mapred.output.compression.codec to the class name of the codec you want to use. You can also set these two properties in code, via the static methods of FileOutputFormat. Let's look at the code:
package com.sweetop.styhadoop;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * User: lastsweetop
 * Date: 13-6-27
 */
public class MaxTemperatureWithCompression {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max Temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatrueMapper.class);
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Equivalent to setting mapred.output.compress and
        // mapred.output.compression.codec in the configuration.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
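Alternatively, the same two properties can be passed on the command line without touching the code. This is only a sketch: it assumes the driver parses generic options (e.g. via ToolRunner/GenericOptionsParser, which the class above does not do), and job.jar is a hypothetical jar name. Note that the mapred.* names used in this article are the old Hadoop 1.x names; newer releases rename them to mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec.

```shell
# Sketch: enable gzip-compressed output via -D properties instead of code
# (assumes a ToolRunner-based driver; "job.jar" is a placeholder name)
hadoop jar job.jar com.sweetop.styhadoop.MaxTemperatureWithCompression \
  -Dmapred.output.compress=true \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  input/data.gz output/
```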
Run it with a compressed file as the input as well:
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperatureWithCompression input/data.gz output/
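With gzip output enabled, the reducer output files carry a .gz extension. They can be inspected with hadoop fs -text, which applies the matching codec before printing (the part file name below assumes the default single-reducer naming; adjust to what your run actually produced):

```shell
# List the job output, then decompress-and-print a part file.
# part-r-00000.gz is the conventional name for the first reducer's
# output under the new MapReduce API; verify it with the ls first.
hadoop fs -ls output/
hadoop fs -text output/part-r-00000.gz
```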