Setting Up Full-Text Post Search for Discuz X1.5 with Sphinx

10.09.2010 by jiezhou - 3 Comments, Posted in linux, php+mysql

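Site search has always been a heavy drain on server resources.
Giving users a fast, convenient way to search a site has long been a headache for webmasters.
Sphinx is a full-text search engine developed by Andrew Aksyonoff, a Russian developer. It aims to give other applications fast, low-footprint, highly relevant full-text search.
The following describes how to install Sphinx and integrate it into Discuz X1.5.
All steps were performed on Ubuntu Server 9.04; other systems differ only slightly.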

Downloading the installation files

wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.13.tar.gz
tar zxvf coreseek-3.2.13.tar.gz
cd coreseek-3.2.13
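What we download here is coreseek, a derivative of Sphinx developed in China that is better suited to Chinese search.
It contains the modified sphinx (csft) and the Chinese word segmentation program mmseg.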

Installing mmseg

Install the mmseg developed by coreseek; it provides Chinese word segmentation for sphinx.
cd mmseg-3.2.13
./bootstrap
# On Ubuntu, run this instead: ACLOCAL_FLAGS="-I /usr/share/aclocal" ./bootstrap
./configure --prefix=/usr/local/mmseg
make
make install
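After installation, the dictionary and configuration files used by mmseg are installed automatically into /usr/local/mmseg/etc.
The file uni.lib there is the default dictionary used for word segmentation.
If this dictionary has too few words for your needs, you can download word lists from Sogou: http://pinyin.sogou.com/dict/list.php
Many word lists have no visible TXT download; you can fetch one directly from http://pinyin.sogou.com/dict/download_txt.php?id=<word list id>
Then build a dictionary as follows and replace the default one: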
/usr/local/mmseg/bin/mmseg -u words.txt # words.txt must be UTF-8 encoded
# By default the output file is named after the input file with a .uni suffix
cp words.txt.uni /usr/local/mmseg/etc/uni.lib
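
A downloaded word list is usually just one word per line, while mmseg expects a dictionary source in which every entry takes two lines. The sketch below converts such a list, assuming the two-line layout used by the unigram.txt bundled with coreseek (word, a tab, a frequency, then a placeholder line x:1). The helper name to_mmseg_source and the file name words.raw.txt are made up for illustration.

```shell
#!/bin/sh
# Convert a plain UTF-8 word list (one word per line) into the two-line
# entry layout assumed for mmseg dictionary sources:
#   <word><TAB>1
#   x:1
# to_mmseg_source is a hypothetical helper name, not part of mmseg.
to_mmseg_source() {
    # Drop Windows line endings, skip blank lines, give every word frequency 1
    tr -d '\r' | awk 'NF { printf "%s\t1\nx:1\n", $1 }'
}

# Usage (words.raw.txt is a hypothetical file name):
#   to_mmseg_source < words.raw.txt > words.txt
#   /usr/local/mmseg/bin/mmseg -u words.txt
```

Every word gets the same frequency of 1 here; if you have real frequency data, the number after the tab is where it would go.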

Installing sphinx

cd csft-3.2.13
# Run configure to set up the build:
./configure --prefix=/usr/local/sphinx \
--with-mysql=/usr/local/mysql/ \
--with-mysql-includes=/usr/local/mysql/include/mysql/ \
--with-mysql-libs=/usr/local/mysql/lib/mysql/ \
--with-mmseg=/usr/local/mmseg/ \
--with-mmseg-includes=/usr/local/mmseg/include/mmseg/ \
--with-mmseg-libs=/usr/local/mmseg/lib/
make
make install

Note: replace /usr/local/mysql/ with the directory where your MySQL is actually installed.
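
Since a wrong MySQL path only shows up as a configure or link failure later, it may be worth checking the directories first. This is a minimal sketch: check_mysql_prefix is a hypothetical helper, and it assumes the include/mysql and lib/mysql layout used in the configure options above.

```shell
#!/bin/sh
# Verify that a MySQL prefix contains the header and library directories
# passed to ./configure above. check_mysql_prefix is a hypothetical helper.
check_mysql_prefix() {
    prefix=${1%/}                      # tolerate a trailing slash
    for dir in "$prefix/include/mysql" "$prefix/lib/mysql"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
#   check_mysql_prefix /usr/local/mysql/ && echo "paths look fine"
```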

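Configuring sphinx

1. Write a configuration file suited to Discuz X1.5
vim /usr/local/sphinx/etc/csft.conf
The file contents can be downloaded here [link in the original post].
Change the database settings in it to those of your own MySQL server.
2. Create the sph_counter table by running the following statement in the database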
CREATE TABLE IF NOT EXISTS `sph_counter` (
  `counter_id` int(10) NOT NULL,
  `max_doc_id` int(10) NOT NULL,
  PRIMARY KEY (`counter_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
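
To illustrate what sph_counter is for, here is a heavily simplified main+delta sketch in the style of the Sphinx documentation. It is not the author's configuration file: the credentials, the counter_id value 1, and the table and column names (pre_forum_post, pid, tid, subject, message) are all assumptions, and only the source sections are shown. The reader comments at the end of this post indicate that the real file also uses sql_query_range, which is left out here. write_counter_sketch is a hypothetical helper that writes the fragment to a path of your choice, so it never touches the real csft.conf.

```shell
#!/bin/sh
# Write an illustrative main+delta fragment to the given file.
# All names inside the fragment are assumptions, see the text above.
write_counter_sketch() {
cat > "$1" <<'EOF'
source posts
{
    type          = mysql
    sql_host      = localhost
    sql_user      = discuz_user
    sql_pass      = secret
    sql_db        = discuz
    sql_query_pre = SET NAMES utf8
    # Remember the highest post id that the main index covers
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(pid) FROM pre_forum_post
    sql_query     = SELECT pid, tid, subject, message FROM pre_forum_post \
                    WHERE pid <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
    sql_attr_uint = tid
}

source posts_minute : posts
{
    # Override the inherited pre-queries so the counter is NOT advanced here
    sql_query_pre = SET NAMES utf8
    # Only rows newer than the remembered id go into the delta index
    sql_query     = SELECT pid, tid, subject, message FROM pre_forum_post \
                    WHERE pid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}
EOF
}

# Usage:
#   write_counter_sketch ./csft.sketch.conf
```

The main source records the highest indexed id in sph_counter, and the delta source picks up everything above it, which is why the table has to exist before the first indexer run.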

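Running sphinx

1. Build the index files
The index files are what searches run against. With a large data set, the first run will be fairly slow.
Run the following command: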

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf --all

2. Start the sphinx daemon (searchd)

/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/csft.conf

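3. Add crontab jobs
These update the indexes automatically every minute so that the search engine always has the latest data. The schedule skips 04:00 to 05:59, which leaves that window free for the daily merge that starts at 04:00.

vim /etc/crontab

Add the following: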

# Incremental Index posts data
* 0-3 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf posts_minute --rotate
* 6-23 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf posts_minute --rotate
0 4 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf posts_merge --rotate && /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf --merge posts posts_merge --rotate

# Incremental Index threads data
* 0-3 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf threads_minute --rotate
* 6-23 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf threads_minute --rotate
0 4 * * * root /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf threads_merge --rotate && /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf --merge threads threads_merge --rotate
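
Before moving on to Discuz, it can help to confirm that the indexes answer queries at all. The sketch below wraps the search command-line tool that ships with Sphinx 0.9.9. It assumes that coreseek installs it next to indexer under /usr/local/sphinx/bin and that an index named posts is defined in csft.conf, as the crontab entries above suggest. test_query and SPHINX_HOME are hypothetical names.

```shell
#!/bin/sh
# Run a test query against one index with the Sphinx "search" CLI.
# SPHINX_HOME and test_query are hypothetical names for this sketch.
SPHINX_HOME=${SPHINX_HOME:-/usr/local/sphinx}

# test_query INDEX WORD...
test_query() {
    index=$1
    shift
    "$SPHINX_HOME/bin/search" \
        --config "$SPHINX_HOME/etc/csft.conf" \
        --index "$index" "$@"
}

# Usage (the keyword must be UTF-8 encoded):
#   test_query posts discuz
```

Note that search reads the index files directly, so a successful query shows that indexing worked but says nothing about whether searchd is up.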

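Configuring Discuz X1.5

1. Admin panel settings
Discuz X1.5 ships with a simple Sphinx interface, so simple that it is unusable without modification.
Go to Admin Panel -> Global -> Search Settings and configure it as shown:
[Screenshot of the search settings page]
2. File changes
Open /source/module/search/forum.php
// Find this line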
if($srchtype == 'fulltext' && $_G['setting']['sphinxon']) {
// Change it to
if($_G['setting']['sphinxon']) {


// Then find these lines
if($srchtype == "fulltext") {
	$result = $s->query("'".$srchtxt."'", $_G['setting']['sphinxmsgindex']);
} else {
	$result = $s->query($srchtxt, $_G['setting']['sphinxsubindex']);
}
// Change them to
$_srchtxt = iconv('gbk','utf-8',$srchtxt); // convert the GBK search text to UTF-8 (for a GBK-encoded site)
if($srchtype == "fulltext") {
	$result = $s->query("'".$_srchtxt."'", $_G['setting']['sphinxmsgindex']);
} else {
	$result = $s->query($_srchtxt, $_G['setting']['sphinxsubindex']);
}
That's it, all done!
Just like that, you too can have a powerful search engine.
It should become a real highlight of your forum!

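Addendum: problems you may run into

1. Running indexer may fail with: error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file
Solution:
echo "/usr/local/mysql/lib/mysql" >> /etc/ld.so.conf
ldconfig
2. Building the index may fail with: FATAL: index 'posts': unknown charset type 'zh_cn.gbk'
This happens because the configuration file was changed to: charset_type = zh_cn.gbk
Solution:
The official advice is that UTF-8 is the more reliable choice, so change it back to charset_type = zh_cn.utf-8
As long as SET NAMES utf8 is executed when reading from the database (MySQL spells the charset utf8, not utf-8), the data is read correctly even if your tables are GBK-encoded.
Note, however, that search keywords must still be converted to UTF-8 before they are sent to the search engine.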
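
For the first problem above, ldd gives a quick way to see whether any shared library is still unresolved after running ldconfig. A small sketch, with missing_libs as a hypothetical helper name:

```shell
#!/bin/sh
# Print the names of shared libraries that the dynamic linker cannot
# resolve for a binary. missing_libs is a hypothetical helper name.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Usage:
#   missing_libs /usr/local/sphinx/bin/indexer
# No output means every library was found.
```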

3 Responses to this post

yang said:

November 25, 2010, 5:40 PM

Hello. When I ran /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf --all I got the following errors:
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/sphinx/etc/csft.conf'...
indexing index 'posts'...
collected 11116 docs, 6.8 MB
sorted 1.9 Mhits, 100.0% done
total 11116 docs, 6768273 bytes
total 5.652 sec, 1197415 bytes/sec, 1966.59 docs/sec
indexing index 'posts_minute'...
ERROR: index 'posts_minute': sql_query_range: min_id='12110', max_id='12109': min_id must be less than max_id.
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
indexing index 'posts_merge'...
ERROR: index 'posts_merge': sql_query_range: min_id='12110', max_id='12109': min_id must be less than max_id.
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
indexing index 'threads'...
collected 2036 docs, 0.1 MB
sorted 0.0 Mhits, 100.0% done
total 2036 docs, 70004 bytes
total 0.068 sec, 1028850 bytes/sec, 29923.13 docs/sec
indexing index 'threads_minute'...
ERROR: index 'threads_minute': sql_query_range: min_id='2450', max_id='2449': min_id must be less than max_id.
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
indexing index 'threads_merge'...
ERROR: index 'threads_merge': sql_query_range: min_id='2450', max_id='2449': min_id must be less than max_id.
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
total 4 reads, 0.014 sec, 1456.4 kb/call avg, 3.6 msec/call avg
total 25 writes, 0.049 sec, 544.0 kb/call avg, 1.9 msec/call avg
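What could be the cause? It looks like the minimum ID is greater than the maximum ID. Could you analyze this in more detail? Thanks!
9527 said:

August 27, 2011, 1:35 AM

posts_minute and posts_merge both index the posts made since the last merge, so the indexed range runs from max_doc_id+1 to (SELECT MAX(tid) FROM pre_forum_thread). If nothing new has been posted since the last merge, however, max_doc_id = (SELECT MAX(tid) FROM pre_forum_thread), so the lower bound ends up above the upper bound. A simple fix is to change max_doc_id+1 to max_doc_id, although that indexes one document twice. The thorough fix is to check whether there are new posts before running the incremental index.
sungyism said:

February 13, 2012, 5:27 PM

Sphinx and Discuz are installed on my server. How can I test whether searches actually go through Sphinx?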
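
The check suggested in the comments above, looking for new rows before running the incremental index, might be sketched as a small cron wrapper like the one below. The function names are hypothetical, counter_id 1 is an assumption about how the configuration file uses sph_counter, and the mysql client is assumed to find its credentials on its own (for example through ~/.my.cnf). The table and column come from the comment itself.

```shell
#!/bin/sh
# Succeed only if the current maximum id is above the last indexed id.
# has_new_docs LAST_INDEXED_ID CURRENT_MAX_ID
has_new_docs() {
    # Non-numeric input such as NULL makes the test fail, i.e. "nothing new"
    [ "${2:-0}" -gt "${1:-0}" ] 2>/dev/null
}

# Hypothetical wrapper for the threads_minute cron job.
# update_threads_delta DATABASE_NAME
update_threads_delta() {
    db=$1
    last=$(mysql -N -B -e "SELECT max_doc_id FROM sph_counter WHERE counter_id = 1" "$db")
    cur=$(mysql -N -B -e "SELECT MAX(tid) FROM pre_forum_thread" "$db")
    if has_new_docs "$last" "$cur"; then
        /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/csft.conf threads_minute --rotate
    else
        echo "no new threads, skipping threads_minute"
    fi
}

# Usage (discuz is a hypothetical database name):
#   update_threads_delta discuz
```

With the ids from the error output above, has_new_docs 2449 2449 fails and the indexer run is skipped, so the "min_id must be less than max_id" error never occurs.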
