Full-text search: Coreseek returns no results for Chinese queries, but English keywords work fine

Problem description

When searching with Coreseek, Chinese queries return no results, while English keyword searches work normally.

[root@abc testpack]# /usr/local/coreseek/bin/indexer -c etc/sphinx.conf --all
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file 'etc/sphinx.conf'...
indexing index 'test1'...
WARNING: Attribute count is 0: switching to none docinfo
collected 5 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 5 docs, 186 bytes
total 0.064 sec, 2870 bytes/sec, 77.16 docs/sec
total 2 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 6 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
Searching in Chinese returns no results:
[root@abc testpack]# /usr/local/coreseek/bin/search -c etc/sphinx.conf '水火不容'
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file 'etc/sphinx.conf'...
index 'test1': query '水火不容 ': returned 0 matches of 0 total in 0.000 sec

words:
1. '水火': 0 documents, 0 hits
2. '不容': 0 documents, 0 hits
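Note that this output is itself a clue: the query was split into '水火' and '不容', so mmseg segmentation is working at query time. The zero document/hit counts mean the index simply contains no Chinese tokens, which usually points at the indexing side (e.g. the tokenizer treating CJK codepoints as separators when the index was built). A minimal sketch of that failure mode, assuming a crude model of Sphinx-style tokenization (the `tokenize` helper and the character ranges are illustrative, not Coreseek's actual code):

```python
# Minimal sketch: characters missing from the tokenizer's character set
# act as separators, so a pure-CJK string produces zero index tokens.
def tokenize(text, valid):
    """Crude model of tokenization: runs of valid characters form tokens."""
    tokens, cur = [], ""
    for ch in text:
        if ord(ch) in valid:
            cur += ch
        elif cur:
            tokens.append(cur)
            cur = ""
    if cur:
        tokens.append(cur)
    return tokens

latin_only = set(range(ord("a"), ord("z") + 1))   # no CJK configured
cjk = set(range(0x3000, 0x2FA20)) | latin_only    # the ngram_chars range plus latin

print(tokenize("水火不容", latin_only))  # [] -> nothing gets indexed
print(tokenize("apple", latin_only))     # ['apple'] -> English still works
print(tokenize("水火不容", cjk))         # ['水火不容'] -> CJK survives as a token
```

This matches the observed symptom exactly: English rows are searchable, Chinese rows vanish.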

Searching in English does return a result:
[root@abc testpack]# /usr/local/coreseek/bin/search -c etc/sphinx.conf 'apple'
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file 'etc/sphinx.conf'...
index 'test1': query 'apple ': returned 1 matches of 1 total in 0.001 sec

displaying matches:
1. document=5, weight=2780
id=5
title=apple
content=apple,banana

words:
1. 'apple': 1 documents, 2 hits

Here is the database table:
mysql> select * from tt;
+----+--------------+-----------------+
| id | title        | content         |
+----+--------------+-----------------+
|  1 | 西水         | 水水            |
|  2 | 水火不容     | 水火不容        |
|  3 | 水啊啊       | 啊水货          |
|  4 | 东南西水     | 啊西西哈哈      |
|  5 | apple        | apple,banana    |
+----+--------------+-----------------+
5 rows in set (0.00 sec)
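The data looks fine in the mysql client, but it is also worth confirming that the indexer's own connection talks UTF-8 (the config below does include `sql_query_pre = SET NAMES utf8`, which is the right setting). A hedged sketch of what happens when the connection charset is mismatched; the encodings here just illustrate the mechanism:

```python
# Sketch: a UTF-8 string fetched over a latin-1 connection turns into
# mojibake - the bytes survive, but no CJK codepoints reach the indexer.
s = "水火不容"
garbled = s.encode("utf-8").decode("latin-1")   # wrong connection charset
assert not any("\u4e00" <= ch <= "\u9fff" for ch in garbled)  # no CJK left
restored = garbled.encode("latin-1").decode("utf-8")
assert restored == s   # only the interpretation differed, not the data
print(repr(garbled))
```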

Below is the configuration file:
#
# Sphinx configuration file sample
#
# WARNING! While this sample file mentions all available options,
# it contains (very) short helper descriptions only. Please refer to
# doc/sphinx.html for details.
#

#############################################################################
## data source definition
#############################################################################

source src1
{
# data source type. mandatory, no default value
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
type = mysql

#####################################################################
## SQL settings (for 'mysql' and 'pgsql' types)
#####################################################################

# some straightforward parameters for SQL source types
sql_host        = localhost
sql_user        = root
sql_pass        = 123456
sql_db          = haha
sql_port        = 3306  # optional, default is 3306

# UNIX socket name
# optional, default is empty (reuse client library defaults)
# usually '/var/lib/mysql/mysql.sock' on Linux
# usually '/tmp/mysql.sock' on FreeBSD
#
 sql_sock       = /var/lib/mysql/mysql.sock

# MySQL specific client connection flags
# optional, default is 0
#
# mysql_connect_flags   = 32 # enable compression

# MySQL specific SSL certificate settings
# optional, defaults are empty
#
# mysql_ssl_cert        = /etc/ssl/client-cert.pem
# mysql_ssl_key     = /etc/ssl/client-key.pem
# mysql_ssl_ca      = /etc/ssl/cacert.pem

# MS SQL specific Windows authentication mode flag
# MUST be in sync with charset_type index-level setting
# optional, default is 0
#
# mssql_winauth     = 1 # use currently logged on user credentials

# MS SQL specific Unicode indexing flag
# optional, default is 0 (request SBCS data)
#
# mssql_unicode     = 1 # request Unicode data from server

# ODBC specific DSN (data source name)
# mandatory for odbc source type, no default value
#
# odbc_dsn      = DBQ=C:data;DefaultDir=C:data;Driver={Microsoft Text Driver (*.txt; *.csv)};
# sql_query     = SELECT id, data FROM documents.csv

# ODBC and MS SQL specific, per-column buffer sizes
# optional, default is auto-detect
#
# sql_column_buffers    = content=12M, comments=1M

# pre-query, executed before the main fetch query
# multi-value, optional, default is empty list of queries
#
 sql_query_pre      = SET NAMES utf8
 sql_query_pre      = SET SESSION query_cache_type=OFF

# main document fetch query
# mandatory, integer document ID field MUST be the first selected column
sql_query       =
    SELECT id, title, content FROM tt

# joined/payload field fetch query
# joined fields let you avoid (slow) JOIN and GROUP_CONCAT
# payload fields let you attach custom per-keyword values (eg. for ranking)
#
# syntax is FIELD-NAME 'from'  ( 'query' | 'payload-query' ); QUERY
# joined field QUERY should return 2 columns (docid, text)
# payload field QUERY should return 3 columns (docid, keyword, weight)
#
# REQUIRES that query results are in ascending document ID order!
# multi-value, optional, default is empty list of queries
#
# sql_joined_field  = tags from query; SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC
# sql_joined_field  = wtags from payload-query; SELECT docid, tag, tagweight FROM tags ORDER BY docid ASC

# file based field declaration
#
# content of this field is treated as a file name
# and the file gets loaded and indexed in place of a field
#
# max file size is limited by max_file_field_buffer indexer setting
# file IO errors are non-fatal and get reported as warnings
#
# sql_file_field        = content_file_path
    # sql_query_info        = SELECT * FROM tt  WHERE id=$id

# range query setup, query that must return min and max ID values
# optional, default is empty
#
# sql_query will need to reference $start and $end boundaries
# if using ranged query:
#
# sql_query     =
#   SELECT doc.id, doc.id AS group, doc.title, doc.data
#   FROM documents doc
#   WHERE id>=$start AND id<=$end
#
# sql_query_range       = SELECT MIN(id),MAX(id) FROM documents

# range query step
# optional, default is 1024
#
# sql_range_step        = 1000

# unsigned integer attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# optional bit size can be specified, default is 32
#
# sql_attr_uint     = author_id
# sql_attr_uint     = forum_id:9 # 9 bits for forum_id
#sql_attr_uint      = group_id

# boolean attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# equivalent to sql_attr_uint with 1-bit size
#
# sql_attr_bool     = is_deleted

# bigint attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# declares a signed (unlike uint!) 64-bit attribute
#
# sql_attr_bigint       = my_bigint_id

# UNIX timestamp attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# similar to integer, but can also be used in date functions
#
# sql_attr_timestamp    = posted_ts
# sql_attr_timestamp    = last_edited_ts
#sql_attr_timestamp = date_added

# string ordinal attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# sorts strings (bytewise), and stores their indexes in the sorted list
# sorting by this attr is equivalent to sorting by the original strings
#
# sql_attr_str2ordinal  = author_name

# floating point attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# values are stored in single precision, 32-bit IEEE 754 format
#
# sql_attr_float        = lat_radians
# sql_attr_float        = long_radians

# multi-valued attribute (MVA) attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# MVA values are variable length lists of unsigned 32-bit integers
#
# syntax is ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]
# ATTR-TYPE is 'uint' or 'timestamp'
# SOURCE-TYPE is 'field', 'query', or 'ranged-query'
# QUERY is SQL query used to fetch all ( docid, attrvalue ) pairs
# RANGE-QUERY is SQL query used to fetch min and max ID values, similar to 'sql_query_range'
#
# sql_attr_multi        = uint tag from query; SELECT docid, tagid FROM tags
# sql_attr_multi        = uint tag from ranged-query;
#   SELECT docid, tagid FROM tags WHERE id>=$start AND id<=$end;
#   SELECT MIN(docid), MAX(docid) FROM tags

# string attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you store and retrieve strings
#
# sql_attr_string       = stitle

# wordcount attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you count the words at indexing time
#
# sql_attr_str2wordcount    = stitle

# combined field plus attribute declaration (from a single column)
# stores column as an attribute, but also indexes it as a full-text field
#
# sql_field_string  = author
# sql_field_str2wordcount   = title

# post-query, executed on sql_query completion
# optional, default is empty
#
# sql_query_post        =

# post-index-query, executed on successful indexing completion
# optional, default is empty
# $maxid expands to max document ID actually fetched from DB
#
# sql_query_post_index  = REPLACE INTO counters ( id, val )
#   VALUES ( 'max_indexed_id', $maxid )

# ranged query throttling, in milliseconds
# optional, default is 0 which means no delay
# enforces given delay before each query step
sql_ranged_throttle = 0

# document info query, ONLY for CLI search (ie. testing and debugging)
# optional, default is empty
# must contain $id macro and must fetch the document by that id
sql_query_info      = SELECT * FROM tt WHERE id=$id

# kill-list query, fetches the document IDs for kill-list
# k-list will suppress matches from preceding indexes in the same query
# optional, default is empty
#
# sql_query_killlist    = SELECT id FROM documents WHERE edited>=@last_reindex

# columns to unpack on indexer side when indexing
# multi-value, optional, default is empty list
#
# unpack_zlib       = zlib_column
# unpack_mysqlcompress  = compressed_column
# unpack_mysqlcompress  = compressed_column_2

# maximum unpacked length allowed in MySQL COMPRESS() unpacker
# optional, default is 16M
#
# unpack_mysqlcompress_maxsize  = 16M

#####################################################################
## xmlpipe2 settings
#####################################################################

# type          = xmlpipe

# shell command to invoke xmlpipe stream producer
# mandatory
#
# xmlpipe_command       = cat /usr/local/coreseek/var/test.xml

# xmlpipe2 field declaration
# multi-value, optional, default is empty
#
# xmlpipe_field     = subject
# xmlpipe_field     = content

# xmlpipe2 attribute declaration
# multi-value, optional, default is empty
# all xmlpipe_attr_XXX options are fully similar to sql_attr_XXX
#
# xmlpipe_attr_timestamp    = published
# xmlpipe_attr_uint = author_id

# perform UTF-8 validation, and filter out incorrect codes
# avoids XML parser choking on non-UTF-8 documents
# optional, default is 0
#
# xmlpipe_fixup_utf8    = 1

}

# inherited source example
#
# all the parameters are copied from the parent source,
# and may then be overridden in this source definition
#
source src1throttled : src1
{
sql_ranged_throttle = 100
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index

index test1
{
# index type
# optional, default is 'plain'
# known values are 'plain', 'distributed', and 'rt' (see samples below)
# type = plain

# document source(s) to index
# multi-value, mandatory
# document IDs must be globally unique across all sources
source          = src1

# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
#path           = /usr/local/coreseek/var/data/test1

# document attribute values (docinfo) storage mode
# optional, default is 'extern'
# known values are 'none', 'extern' and 'inline'
docinfo         = extern

# memory locking for cached data (.spa and .spi), to prevent swapping
# optional, default is 0 (do not mlock)
# requires searchd to be run from root
mlock           = 0

# a list of morphology preprocessors to apply
# optional, default is empty
#
# builtin preprocessors are 'none', 'stem_en', 'stem_ru', 'stem_enru',
# 'soundex', and 'metaphone'; additional preprocessors available from
# libstemmer are 'libstemmer_XXX', where XXX is algorithm code
# (see libstemmer_c/libstemmer/modules.txt)
#
# morphology        = stem_en, stem_ru, soundex
# morphology        = libstemmer_german
# morphology        = libstemmer_sv
morphology      = none

# minimum word length at which to enable stemming
# optional, default is 1 (stem everything)
#
# min_stemming_len  = 1

path = /root/rearch_dir
# stopword files list (space separated)
# optional, default is empty
# contents are plain text, charset_table and stemming are both applied
#
# stopwords = /usr/local/coreseek/var/data/stopwords.txt

# wordforms file, in "mapfrom > mapto" plain text format
# optional, default is empty
#
# wordforms     = /usr/local/coreseek/var/data/wordforms.txt

# tokenizing exceptions file
# optional, default is empty
#
# plain text, case sensitive, space insensitive in map-from part
# one "Map Several Words => ToASingleOne" entry per line
#
# exceptions        = /usr/local/coreseek/var/data/exceptions.txt

# minimum indexed word length
# default is 1 (index everything)
min_word_len        = 1

# charset encoding type
# optional, default is 'sbcs'
# known types are 'sbcs' (Single Byte CharSet) and 'utf-8'
charset_type        = zh_cn.utf-8
    charset_dictpath        = /usr/local/mmseg3/etc/
# charset definition and case folding rules "table"
# optional, default value depends on charset_type
#
# defaults are configured to include English and Russian characters only
# you need to change the table to include additional ones
# this behavior MAY change in future versions
#
# 'sbcs' default value is
# charset_table     = 0..9, A..Z->a..z, _, a..z, U+A8->U+B8, U+B8, U+C0..U+DF->U+E0..U+FF, U+E0..U+FF
#
# 'utf-8' default value is
#charset_table      = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F

# ignored characters list
# optional, default value is empty
#
# ignore_chars      = U+00AD

# minimum word prefix length to index
# optional, default is 0 (do not index prefixes)
#
# min_prefix_len        = 0

# minimum word infix length to index
# optional, default is 0 (do not index infixes)
#
# min_infix_len     = 0

# list of fields to limit prefix/infix indexing to
# optional, default value is empty (index all fields in prefix/infix mode)
#
# prefix_fields     = filename
# infix_fields      = url, domain

# enable star-syntax (wildcards) when searching prefix/infix indexes
# search-time only, does not affect indexing, can be 0 or 1
# optional, default is 0 (do not use wildcard syntax)
#
# enable_star       = 1

# expand keywords with exact forms and/or stars when searching fit indexes
# search-time only, does not affect indexing, can be 0 or 1
# optional, default is 0 (do not expand keywords)
#
# expand_keywords       = 1

# n-gram length to index, for CJK indexing
# only supports 0 and 1 for now, other lengths to be implemented
# optional, default is 0 (disable n-grams)
#
 ngram_len      = 0

# n-gram characters list, for CJK indexing
# optional, default is empty
#
# ngram_chars       = U+3000..U+2FA1F

# phrase boundary characters list
# optional, default is empty
#
# phrase_boundary       = ., ?, !, U+2026 # horizontal ellipsis

# phrase boundary word position increment
# optional, default is 0
#
# phrase_boundary_step  = 100

# blended characters list
# blended chars are indexed both as separators and valid characters
# for instance, AT&T will result in 3 tokens ("at", "t", and "at&t")
# optional, default is empty
#
# blend_chars       = +, &, U+23

# blended token indexing mode
# a comma separated list of blended token indexing variants
# known variants are trim_none, trim_head, trim_tail, trim_both, skip_pure
# optional, default is trim_none
#
# blend_mode        = trim_tail, skip_pure

# whether to strip HTML tags from incoming documents
# known values are 0 (do not strip) and 1 (do strip)
# optional, default is 0
html_strip      = 0

# what HTML attributes to index if stripping HTML
# optional, default is empty (do not index anything)
#
# html_index_attrs  = img=alt,title; a=title;

# what HTML elements contents to strip
# optional, default is empty (do not strip element contents)
#
# html_remove_elements  = style, script

# whether to preopen index data files on startup
# optional, default is 0 (do not preopen), searchd-only
#
# preopen           = 1

# whether to keep dictionary (.spi) on disk, or cache it in RAM
# optional, default is 0 (cache in RAM), searchd-only
#
# ondisk_dict       = 1

# whether to enable in-place inversion (2x less disk, 90-95% speed)
# optional, default is 0 (use separate temporary files), indexer-only
#
# inplace_enable        = 1

# in-place fine-tuning options
# optional, defaults are listed below
#
# inplace_hit_gap       = 0 # preallocated hitlist gap size
# inplace_docinfo_gap   = 0 # preallocated docinfo gap size
# inplace_reloc_factor  = 0.1 # relocation buffer size within arena
# inplace_write_factor  = 0.1 # write buffer size within arena

# whether to index original keywords along with stemmed versions
# enables "=exactform" operator to work
# optional, default is 0
#
# index_exact_words = 1

# position increment on overshort (less than min_word_len) words
# optional, allowed values are 0 and 1, default is 1
#
# overshort_step        = 1

# position increment on stopword
# optional, allowed values are 0 and 1, default is 1
#
# stopword_step     = 1

# hitless words list
# positions for these keywords will not be stored in the index
# optional, allowed values are 'all', or a list file name
#
# hitless_words     = all
# hitless_words     = hitless.txt

# detect and index sentence and paragraph boundaries
# required for the SENTENCE and PARAGRAPH operators to work
# optional, allowed values are 0 and 1, default is 0
#
# index_sp          = 1

# index zones, delimited by HTML/XML tags
# a comma separated list of tags and wildcards
# required for the ZONE operator to work
# optional, default is empty string (do not index zones)
#
# index_zones       = title, h*, th

}

# inherited index example
#
# all the parameters are copied from the parent index,
# and may then be overridden in this index definition

#index test1stemmed : test1
#{
#	path		= /usr/local/coreseek/var/data/test1stemmed
#	morphology	= stem_en
#}

# distributed index example
#
# this is a virtual index which can NOT be directly indexed,
# and only contains references to other local and/or remote indexes

#index dist1
#{
#	# 'distributed' index type MUST be specified
#	type				= distributed
#
#	# local index to be searched
#	# there can be many local indexes configured
#	local				= test1
#	local				= test1stemmed
#
#	# remote agent
#	# multiple remote agents may be specified
#	# syntax for TCP connections is 'hostname:port:index1,[index2[,...]]'
#	# syntax for local UNIX connections is '/path/to/socket:index1,[index2[,...]]'
#	agent				= localhost:9313:remote1
#	agent				= localhost:9314:remote2,remote3
#	# agent				= /var/run/searchd.sock:remote4
#
#	# blackhole remote agent, for debugging/testing
#	# network errors and search results will be ignored
#	#
#	# agent_blackhole		= testbox:9312:testindex1,testindex2
#
#	# remote agent connection timeout, milliseconds
#	# optional, default is 1000 ms, ie. 1 sec
#	agent_connect_timeout	= 1000
#
#	# remote agent query timeout, milliseconds
#	# optional, default is 3000 ms, ie. 3 sec
#	agent_query_timeout		= 3000
#}

# realtime index example
#
# you can run INSERT, REPLACE, and DELETE on this index on the fly
# using MySQL protocol (see 'listen' directive below)

#index rt
#{
#	# 'rt' index type must be specified to use RT index
#	type			= rt
#
#	# index files path and file name, without extension
#	# mandatory, path must be writable, extensions will be auto-appended
#	path			= /usr/local/coreseek/var/data/rt
#
#	# RAM chunk size limit
#	# RT index will keep at most this much data in RAM, then flush to disk
#	# optional, default is 32M
#	#
#	# rt_mem_limit	= 512M
#
#	# full-text field declaration
#	# multi-value, mandatory
#	rt_field		= title
#	rt_field		= content
#
#	# unsigned integer attribute declaration
#	# multi-value (an arbitrary number of attributes is allowed), optional
#	# declares an unsigned 32-bit attribute
#	rt_attr_uint	= gid
#
#	# RT indexes currently support the following attribute types:
#	# uint, bigint, float, timestamp, string
#	#
#	# rt_attr_bigint	= guid
#	# rt_attr_float		= gpa
#	# rt_attr_timestamp	= ts_added
#	# rt_attr_string	= content
#}

#############################################################################
## indexer settings
#############################################################################

indexer
{
# memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
# optional, default is 32M, max is 2047M, recommended is 256M to 1024M
mem_limit = 256M

# maximum IO calls per second (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iops      = 40

# maximum IO call size, bytes (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iosize        = 1048576

# maximum xmlpipe2 field length, bytes
# optional, default is 2M
#
# max_xmlpipe2_field    = 4M

# write buffer size, bytes
# several (currently up to 4) buffers will be allocated
# write buffers are allocated in addition to mem_limit
# optional, default is 1M
#
# write_buffer      = 1M

# maximum file field adaptive buffer size
# optional, default is 8M, minimum is 1M
#
# max_file_field_buffer = 32M

}

#############################################################################
## searchd settings
#############################################################################

searchd
{
# [hostname:]port[:protocol], or /unix/socket/path to listen on
# known protocols are 'sphinx' (SphinxAPI) and 'mysql41' (SphinxQL)
#
# multi-value, multiple listen points are allowed
# optional, defaults are 9312:sphinx and 9306:mysql41, as below
#
# listen = 127.0.0.1
# listen = 192.168.0.1:9312
# listen = 9312
# listen = /var/run/searchd.sock
listen = 9312
#listen = 9306:mysql41

# log file, searchd run info is logged here
# optional, default is 'searchd.log'
log         = /usr/local/coreseek/var/log/searchd.log

# query log file, all search queries are logged here
# optional, default is empty (do not log queries)
query_log       = /usr/local/coreseek/var/log/query.log

# client read timeout, seconds
# optional, default is 5
read_timeout        = 5

# request timeout, seconds
# optional, default is 5 minutes
client_timeout      = 300

# maximum amount of children to fork (concurrent searches to run)
# optional, default is 0 (unlimited)
max_children        = 30

# PID file, searchd process ID file name
# mandatory
pid_file        = /usr/local/coreseek/var/log/searchd.pid

# max amount of matches the daemon ever keeps in RAM, per-index
# WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
# default is 1000 (just like Google)
max_matches     = 1000

# seamless rotate, prevents rotate stalls if precaching huge datasets
# optional, default is 1
seamless_rotate     = 1

# whether to forcibly preopen all indexes on startup
# optional, default is 1 (preopen everything)
preopen_indexes     = 0

# whether to unlink .old index copies on successful rotation.
# optional, default is 1 (do unlink)
unlink_old      = 1

# attribute updates periodic flush timeout, seconds
# updates will be automatically dumped to disk this frequently
# optional, default is 0 (disable periodic flush)
#
# attr_flush_period = 900

# instance-wide ondisk_dict defaults (per-index value take precedence)
# optional, default is 0 (precache all dictionaries in RAM)
#
# ondisk_dict_default   = 1

# MVA updates pool size
# shared between all instances of searchd, disables attr flushes!
# optional, default size is 1M
mva_updates_pool    = 1M

# max allowed network packet size
# limits both query packets from clients, and responses from agents
# optional, default size is 8M
max_packet_size     = 8M

# crash log path
# searchd will (try to) log crashed query to 'crash_log_path.PID' file
# optional, default is empty (do not create crash logs)
#
# crash_log_path        = /usr/local/coreseek/var/log/crash

# max allowed per-query filter count
# optional, default is 256
max_filters     = 256

# max allowed per-filter values count
# optional, default is 4096
max_filter_values   = 4096

# socket listen queue length
# optional, default is 5
#
# listen_backlog        = 5

# per-keyword read buffer size
# optional, default is 256K
#
# read_buffer       = 256K

# unhinted read size (currently used when reading hits)
# optional, default is 32K
#
# read_unhinted     = 32K

# max allowed per-batch query count (aka multi-query count)
# optional, default is 32
max_batch_queries   = 32

# max common subtree document cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_docs_cache    = 4M

# max common subtree hit cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_hits_cache    = 8M

# multi-processing mode (MPM)
# known values are none, fork, prefork, and threads
# optional, default is fork
#
workers         = threads # for RT to work

# max threads to create for searching local parts of a distributed index
# optional, default is 0, which means disable multi-threaded searching
# should work with all MPMs (ie. does NOT require workers=threads)
#
# dist_threads      = 4

# binlog files path; use empty string to disable binlog
# optional, default is build-time configured data directory
#
# binlog_path       = # disable logging
# binlog_path       = /usr/local/coreseek/var/data # binlog.001 etc will be created there

# binlog flush/sync mode
# 0 means flush and sync every second
# 1 means flush and sync every transaction
# 2 means flush every transaction, sync every second
# optional, default is 2
#
# binlog_flush      = 2

# binlog per-file size limit
# optional, default is 128M, 0 means no limit
#
# binlog_max_log_size   = 256M

# per-thread stack size, only affects workers=threads mode
# optional, default is 64K
#
# thread_stack          = 128K

# per-keyword expansion limit (for dict=keywords prefix searches)
# optional, default is 0 (no limit)
#
# expansion_limit       = 1000

# RT RAM chunks flush period
# optional, default is 0 (no periodic flush)
#
# rt_flush_period       = 900

# query log file format
# optional, known values are plain and sphinxql, default is plain
#
# query_log_format      = sphinxql

# version string returned to MySQL network protocol clients
# optional, default is empty (use Sphinx version)
#
# mysql_version_string  = 5.0.37

# trusted plugin directory
# optional, default is empty (disable UDFs)
#
# plugin_dir            = /usr/local/sphinx/lib

# default server-wide collation
# optional, default is libc_ci
#
# collation_server      = utf8_general_ci

# server-wide locale for libc based collations
# optional, default is C
#
# collation_libc_locale = ru_RU.UTF-8

# threaded server watchdog (only used in workers=threads mode)
# optional, values are 0 and 1, default is 1 (watchdog on)
#
# watchdog              = 1

# SphinxQL compatibility mode (legacy columns and their names)
# optional, default is 0 (SQL compliant syntax and result sets)
#
# compat_sphinxql_magics    = 1

}

--eof--
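For reference, the two usual ways to get Coreseek/Sphinx to index CJK text; the paths below are the ones from the config above and should be verified on the machine (in particular, `charset_dictpath` must point at a directory that actually contains the mmseg dictionary file `uni.lib`; if the dictionary cannot be loaded, Chinese text may end up not tokenized at all). This is a sketch based on the Coreseek setup docs, not a guaranteed fix:

```
# Option 1: mmseg dictionary segmentation (Coreseek-specific)
charset_type     = zh_cn.utf-8
charset_dictpath = /usr/local/mmseg3/etc/   # must contain uni.lib
ngram_len        = 0                        # keep n-grams off with mmseg

# Option 2: plain unigram indexing (stock Sphinx, no dictionary needed)
# charset_type = utf-8
# ngram_len    = 1
# ngram_chars  = U+3000..U+2FA1F
```

After changing any of these, the index must be rebuilt (`indexer -c etc/sphinx.conf --all`, or with `--rotate` if searchd is running). As a sanity check, `indexer -c etc/sphinx.conf test1 --buildstops out.txt 100` dumps the most frequent indexed keywords; if no Chinese strings appear in `out.txt`, the index was built without CJK tokenization.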

Can anyone help? I can't figure out where the mistake is; Chinese searches just return nothing.

Posted: 2024-11-05 17:26:02
