Environment: Hadoop 1.0.4, Mahout 0.5.
Mahout ships with a class for reading the output of its clustering algorithms, called ClusterDumper. Its output format generally looks like this:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
    Weight:  Point:
    1.0: [1.000, 3.000]
    ...
    1.0: [3.000, 2.500]
VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}
    Weight:  Point:
    1.0: [1.000, 5.000]
    ...
    1.0: [4.000, 4.500]
VL-14{n=8 c=[4.750, 3.438] r=[0.433, 0.682]}
    Weight:  Point:
    1.0: [4.000, 3.000]
    ...
    1.0: [5.000, 4.000]
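For reference, a typical programmatic invocation of ClusterDumper looks roughly like the sketch below; the DumpExample class name and both paths are placeholders of mine, not part of Mahout:

import org.apache.hadoop.fs.Path;
import org.apache.mahout.utils.clustering.ClusterDumper;

// A minimal sketch, assuming a finished KMeans run whose results sit under
// "output"; both paths below are placeholders for your own job output.
public class DumpExample {
    public static void main(String[] args) throws Exception {
        ClusterDumper clusterDumper = new ClusterDumper(
                new Path("output/clusters-1"),        // directory with the final clusters
                new Path("output/clusteredPoints"));  // directory with point assignments
        clusterDumper.printClusters(null);            // null: no term dictionary
    }
}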
However, if all I want is a file containing only the cluster centers, ClusterDumper alone won't do. My first thought was to subclass ClusterDumper, but it is declared final, so I decided to write my own.
The relevant part of the ClusterDumper source reads as follows:
for (Cluster value : new SequenceFileDirValueIterable<Cluster>(
        new Path(seqFileDir, "part-*"), PathType.GLOB, conf)) {
    String fmtStr = value.asFormatString(dictionary);
    if (subString > 0 && fmtStr.length() > subString) {
        writer.write(':');
        writer.write(fmtStr, 0, Math.min(subString, fmtStr.length()));
    } else {
        writer.write(fmtStr);
    }
    // ... (remainder of the loop omitted)
}
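Note that asFormatString(...) is what produces the VL-2{n=6 c=[...] r=[...]} strings shown above; as far as I can tell, the subString option merely truncates that string for display.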
Alternatively, see my earlier post "mahout源码KMeansDriver分析之二中心点文件分析(无语篇)", which also covers reading cluster centers.
Based on that, we can write a ClusterCenterDump class as follows:
package com.caic.cloud.util;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Writer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

/**
 * Just output the center vector of each cluster to a given local file.
 * @author fansy
 */
public class ClusterCenterDump {

    private Log log = LogFactory.getLog(ClusterCenterDump.class);
    private Configuration conf;
    private Path centerPathDir;
    private String outputPath;

    public ClusterCenterDump(Configuration conf, String centerPathDir, String outputPath) {
        this.conf = conf;
        this.centerPathDir = new Path(centerPathDir);
        this.setOutputPath(outputPath);
    }

    /**
     * Write the cluster centers to the given local file.
     * @return true on success, false otherwise
     * @throws FileNotFoundException
     */
    public boolean writeCenterToLocal() throws FileNotFoundException {
        if (this.conf == null || this.outputPath == null || this.centerPathDir == null) {
            log.info("error:\nshould initialize the configuration, outputPath and centerPath");
            return false;
        }
        Writer writer = null;
        try {
            File outputFile = new File(outputPath);
            writer = Files.newWriter(outputFile, Charsets.UTF_8);
            // read every cluster from the part-* sequence files under centerPathDir
            this.writeTxtCenter(writer, new SequenceFileDirValueIterable<Cluster>(
                    new Path(centerPathDir, "part-*"), PathType.GLOB, conf));
            writer.flush();
        } catch (IOException e) {
            log.info("write error:\n" + e.getMessage());
            return false;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                log.info("close writer error:\n" + e.getMessage());
            }
        }
        return true;
    }

    /**
     * Write each cluster's formatted string to the writer, one cluster per line.
     */
    private boolean writeTxtCenter(Writer writer, Iterable<Cluster> clusters) throws IOException {
        for (Cluster cluster : clusters) {
            String fmtStr = cluster.asFormatString(null);
            System.out.println("fmtStr:" + fmtStr);
            writer.write(fmtStr);
            writer.write("\n");
        }
        return true;
    }

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Path getCenterPathDir() {
        return centerPathDir;
    }

    public void setCenterPathDir(Path centerPathDir) {
        this.centerPathDir = centerPathDir;
    }

    public String getOutputPath() {
        return outputPath;
    }

    public void setOutputPath(String outputPath) {
        this.outputPath = outputPath;
    }
}
Below is a test class:
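What follows is a minimal sketch of such a driver; the ClusterCenterDumpTest name and both paths are hypothetical placeholders:

package com.caic.cloud.util;

import org.apache.hadoop.conf.Configuration;

// A minimal sketch of a test driver for ClusterCenterDump; the two paths
// below are hypothetical placeholders for your own environment.
public class ClusterCenterDumpTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical HDFS directory holding the final KMeans clusters
        String centerPathDir = "hdfs://localhost:9000/user/fansy/output/clusters-2";
        // hypothetical local file that will receive one center per line
        String outputPath = "/tmp/cluster-centers.txt";

        ClusterCenterDump dumper = new ClusterCenterDump(conf, centerPathDir, outputPath);
        boolean ok = dumper.writeCenterToLocal();
        System.out.println("write centers " + (ok ? "succeeded" : "failed"));
    }
}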