Problem description
- Lucene TokenStream.incrementToken() throws an exception
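I'm new to Lucene. I found some examples online (e.g. on CSDN Mobile Q&A), but when I ran one on my own machine it threw an exception. My code: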
public static void main(String[] args) throws IOException {
    String s = "Good Afternoon Doesn't IS a good body names NAMES 1,671,000 hy body";
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42);
    TokenStream ts = analyzer.tokenStream(s, new StringReader(s));
    CharTermAttribute cab = ts.addAttribute(CharTermAttribute.class);
    ts.incrementToken();
    /*while (ts.incrementToken()) {
        System.out.println(cab.toString());
    }*/
}
Running it fails with:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at java.lang.Character.codePointAtImpl(Unknown Source)
at java.lang.Character.codePointAt(Unknown Source)
at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
at pim.topicmap.FormatConverter.main(FormatConverter.java:69)
The failing line is ts.incrementToken(). Can anyone help?
Solution
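The example you followed was most likely written for Lucene 3.5 or thereabouts.
Lucene 4 reworked the TokenStream contract, and both the API docs and the source code describe it: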
The workflow of the new TokenStream API is as follows:
1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls reset().
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
5. The consumer calls end() so that any end-of-stream operations can be performed.
6. The consumer calls close() to release any resource when finished using the TokenStream.
To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in incrementToken().
1. Call reset() before the while loop.
2. Call end() after the while loop.
3. Then close the stream with close().
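Applied to the code from the question, the three steps give roughly the following. This is a sketch that assumes Lucene 4.2 with the lucene-analyzers-common module on the classpath (for WhitespaceAnalyzer) and Java 7 or later (for try-with-resources). The class name TokenStreamDemo, the helper tokenize and the field name "content" are illustrative choices, not part of the original question or of Lucene.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamDemo {

    // Hypothetical helper: consumes a TokenStream in the order the Lucene 4.x
    // workflow prescribes (reset -> get attributes -> incrementToken loop -> end -> close).
    static List<String> tokenize(Analyzer analyzer, String field, String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // TokenStream implements Closeable, so try-with-resources calls close()
        // even if incrementToken() throws.
        try (TokenStream ts = analyzer.tokenStream(field, new StringReader(text))) {
            ts.reset();                                   // must precede the first incrementToken()
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            while (ts.incrementToken()) {                 // false once the input is exhausted
                tokens.add(term.toString());              // read the attribute after each call
            }
            ts.end();                                     // end-of-stream operations (e.g. final offset)
        }                                                 // close() runs here
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String s = "Good Afternoon Doesn't IS a good body names NAMES 1,671,000 hy body";
        Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42);
        // "content" is an arbitrary field name; the question passed the text itself here
        for (String token : tokenize(analyzer, "content", s)) {
            System.out.println(token);
        }
    }
}
```

WhitespaceAnalyzer only splits on whitespace, so case and punctuation are preserved: "NAMES" and "1,671,000" come out unchanged, and the sample sentence yields 12 tokens.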