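1. My original program
My original program was actually quite simple: it was adapted directly from the SearchFiles and IndexFiles classes in the Lucene demo. The only difference is that it uses the SmartCN analyzer.
Here are just the parts I changed.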
IndexhChinese.java:
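Date start = new Date();
try {
    // Create a new index (create = true), analyzing text with SmartCN.
    IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
            new SmartChineseAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);
    indexDocs(writer, docDir);
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
    System.out.println("Optimizing...");
    // writer.optimize();  // optimization is left disabled
    writer.close();
    Date end = new Date();
    System.out.println(end.getTime() - start.getTime() + " total milliseconds");
} catch (IOException e) {
    // Same handling as the demo's IndexFiles: report the failure and move on.
    System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
}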
SearchChinese.java:
Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_CURRENT);
BufferedReader in = null;
if (queries != null) {
    in = new BufferedReader(new FileReader(queries));
} else {
    in = new BufferedReader(new InputStreamReader(System.in, "GBK"));
}
Here I specified that the query typed in on standard input is GBK-encoded.
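To make concrete what the charset argument of InputStreamReader does, here is a minimal sketch that uses only the JDK and no Lucene. The class name GbkQuerySketch and the helper readLine are hypothetical names made up for this illustration, and it assumes the running JDK ships the GBK charset. It reads the same GBK bytes twice, once decoded as GBK and once as UTF-8, to show that the query string only survives when the reader's charset matches the bytes actually typed in.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the Lucene demo.
public class GbkQuerySketch {

    // Reads one line from raw bytes, decoding them with the given charset,
    // in the same way the search code wraps System.in in an InputStreamReader.
    public static String readLine(byte[] rawBytes, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(rawBytes), charset))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes the GBK charset is available in this JDK.
        Charset gbk = Charset.forName("GBK");

        // The two characters of "Chinese" (zhong wen), written as Unicode
        // escapes so the source file's own encoding cannot interfere.
        String query = "\u4e2d\u6587";

        // Simulate a GBK console sending the query followed by a newline.
        byte[] typed = (query + "\n").getBytes(gbk);

        // Matching charset: the query is recovered intact.
        System.out.println(query.equals(readLine(typed, gbk)));                    // prints "true"

        // Mismatched charset: the bytes are not valid UTF-8, so the text is garbled.
        System.out.println(query.equals(readLine(typed, StandardCharsets.UTF_8))); // prints "false"
    }
}
```

Note that this only covers the console branch. The FileReader branch in the snippet above takes no charset argument and falls back to the platform default encoding.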
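Then I ran it, full of confidence... and found that Chinese text could not be retrieved at all, while searches for the English text worked normally.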