TokenStream
org.apache.lucene.analysis.TokenStream
An abstract class. A TokenStream enumerates a sequence of tokens, either from the fields of a document or from query text.
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
TokenStream org.apache.lucene.analysis.Analyzer.tokenStream(String fieldName, Reader reader)
Returns a TokenStream produced by running this Analyzer's tokenization over the text in reader.
Creates a TokenStream which tokenizes all the text in the provided Reader.
void org.apache.lucene.analysis.TokenStream.reset() throws IOException
Resets the TokenStream's cursor to the initial position.
Resets this stream to the beginning.
boolean org.apache.lucene.analysis.TokenStream.incrementToken() throws IOException
A consumer (i.e., IndexWriter) uses this method to obtain the next token.
Consumers (i.e., IndexWriter) use this method to advance the stream to the next token.
org.apache.lucene.analysis.tokenattributes.CharTermAttribute
The term text of a token.
The term text of a Token.
<A extends Attribute> A org.apache.lucene.util.AttributeSource.getAttribute(Class<A> attClass)
Obtains the specified Attribute.
The caller must pass in a Class<? extends Attribute> value. Returns the instance of the passed in Attribute contained in this AttributeSource.
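The methods above combine into the standard consumption loop: obtain a TokenStream from an Analyzer, grab the CharTermAttribute, reset, then call incrementToken until it returns false. A minimal sketch against the Lucene 3.x API (the WhitespaceAnalyzer, field name, and input string are illustrative choices, not from these notes):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumeTokens {
    // Collect every term text emitted by the analyzer for the given text.
    public static List<String> tokensOf(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        // addAttribute returns the registered CharTermAttribute instance,
        // creating it if necessary; getAttribute would throw if it were absent.
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        ts.reset();                      // rewind the stream to the beginning
        while (ts.incrementToken()) {    // advance to the next token
            tokens.add(termAtt.toString());
        }
        ts.end();
        ts.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokensOf(new WhitespaceAnalyzer(Version.LUCENE_36), "Hello World"));
    }
}
```

Note that reset() must be called before the first incrementToken(), and end()/close() after the loop; consumers like IndexWriter follow this same contract.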
Tokenizer
org.apache.lucene.analysis.Tokenizer
A Tokenizer is a TokenStream whose input is a Reader.
A Tokenizer is a TokenStream whose input is a Reader.
TokenFilter
org.apache.lucene.analysis.TokenFilter
A TokenFilter is a TokenStream whose input is another TokenStream; it is used for filtering.
A TokenFilter is a TokenStream whose input is another TokenStream.
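Writing a TokenFilter means wrapping another TokenStream and overriding incrementToken() to pull tokens from the protected input field. A sketch of a hypothetical filter (MinLengthFilter is an invented name, not a Lucene class) that drops short tokens, written against the Lucene 3.x API:

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example filter: drops tokens shorter than minLength characters.
public final class MinLengthFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final int minLength;

    public MinLengthFilter(TokenStream input, int minLength) {
        super(input);   // the wrapped TokenStream becomes the protected field 'input'
        this.minLength = minLength;
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Pull tokens from the wrapped stream until one passes the filter.
        while (input.incrementToken()) {
            if (termAtt.length() >= minLength) {
                return true;
            }
        }
        return false;   // the wrapped stream is exhausted
    }
}
```

Because a filter shares its wrapped stream's attributes, modifying termAtt here would also change what downstream filters see; this is how LowerCaseFilter rewrites the term text in place.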
org.apache.lucene.analysis.LowerCaseFilter
Normalizes token text to lower case.
Normalizes token text to lower case.
org.apache.lucene.analysis.StopFilter
Removes stop words from a TokenStream.
Removes stop words from a token stream.
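These filters are typically chained: a Tokenizer produces the raw stream, and each TokenFilter wraps the previous stage. A sketch using the Lucene 3.x constructor signatures (the whitespace tokenizer and the built-in English stop set are illustrative choices):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class FilterChain {
    // Tokenize on whitespace, lower-case each token, then drop English stop words.
    public static List<String> analyze(String text) throws IOException {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
        ts = new LowerCaseFilter(Version.LUCENE_36, ts);
        ts = new StopFilter(Version.LUCENE_36, ts, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            tokens.add(termAtt.toString());
        }
        ts.end();
        ts.close();
        return tokens;
    }
}
```

Order matters here: LowerCaseFilter runs before StopFilter so that "The" is normalized to "the" and matched against the lower-cased stop set.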
Analyzer
org.apache.lucene.analysis.KeywordAnalyzer
Treats the entire stream as a single token. Useful for data such as zip codes and product names.
"Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names.
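A quick illustration of that behavior, assuming Lucene 3.x (the field name and input are made up for the example): no matter how many words the input contains, the stream yields exactly one token.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class KeywordDemo {
    // Returns the single token KeywordAnalyzer emits for the given text.
    public static String singleToken(String text) throws IOException {
        KeywordAnalyzer analyzer = new KeywordAnalyzer();
        TokenStream ts = analyzer.tokenStream("sku", new StringReader(text));
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        String token = ts.incrementToken() ? termAtt.toString() : null;
        ts.close();
        return token;   // the whole input, untouched, as one token
    }
}
```

This is why KeywordAnalyzer suits exact-match fields: the indexed term equals the original value, spaces included.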
org.apache.lucene.analysis.ReusableAnalyzerBase
A convenience subclass of Analyzer that makes it easy to implement TokenStream reuse.
A convenience subclass of Analyzer that makes it easy to implement TokenStream reuse.
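A subclass only implements createComponents, which builds the Tokenizer-plus-filter chain once; the base class caches the resulting TokenStreamComponents per thread and reuses them across calls. A sketch against the Lucene 3.x API (LowercaseWhitespaceAnalyzer is an invented name for the example):

```java
import java.io.Reader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer: whitespace tokenization followed by lower-casing.
public final class LowercaseWhitespaceAnalyzer extends ReusableAnalyzerBase {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_36, reader);
        // The components built here are cached and reset with a new Reader
        // on subsequent calls, instead of being re-created each time.
        return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_36, source));
    }
}
```

Reuse avoids allocating a fresh Tokenizer and filter chain for every field of every document, which matters when indexing at volume.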