一. TermHashPerField.add()方法
这一章继续上面的内容, 上一章谈到TermHashPerField.add()方法就是把一个Term加入到posting表的过程, 那么下面我将从算法的角度来分析这个add()方法:
final char[] tokenText = termAtt.termBuffer();;
final int tokenTextLen = termAtt.termLength();
这两行主要做的事情就是获取当前需要分析的单词的内容以及长度.
int downto = tokenTextLen;
int code = 0;
while (downto > 0) {
char ch = tokenText[--downto];
if (ch >= UnicodeUtil.UNI_SUR_LOW_START && ch <= UnicodeUtil.UNI_SUR_LOW_END) {
if (0 == downto) {
// Unpaired
ch = tokenText[downto] = UnicodeUtil.UNI_REPLACEMENT_CHAR;
} else {
final char ch2 = tokenText[downto-1];
if (ch2 >= UnicodeUtil.UNI_SUR_HIGH_START && ch2 <= UnicodeUtil.UNI_SUR_HIGH_END) {
// OK: high followed by low. This is a valid
// surrogate pair.
code = ((code*31) + ch)*31+ch2;
downto--;
continue;
} else {
// Unpaired
ch = tokenText[downto] = UnicodeUtil.UNI_REPLACEMENT_CHAR;
}
}
} else if (ch >= UnicodeUtil.UNI_SUR_HIGH_START && (ch <= UnicodeUtil.UNI_SUR_HIGH_END ||
ch == 0xffff)) {
// Unpaired or 0xffff
ch = tokenText[downto] = UnicodeUtil.UNI_REPLACEMENT_CHAR;
}
code = (code*31) + ch;
}