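Reading guide
1. How do you build a RAMDirectory and load an index into memory?
2. Custom analyzers: what differs between 3.0 and 4.x?
3. How do you search multiple index directories in 4.x?
Recently I had to upgrade the Lucene version used in one of our projects. The project was built on Lucene 3.0, a very old release. In the old version we mainly relied on the following Lucene features:
1. Searching indexes spread across multiple directories.
2. Building a RAMDirectory to hold the index in memory and speed up retrieval.
3. A custom Lucene analyzer.
4. A modified version of Lucene's default scoring algorithm.
Below, the code before and after the migration is compared side by side:
1. Searching multiple index directories
3.0: building the multi-directory searcher:
- // Initialize the nationwide index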
- private boolean InitGlobal(String strRootPath) {
- try {
-
- IndexSearcher[] searchers = new IndexSearcher[2];
-
- MultiSearcher globalSearcher = null;
- if (Configution.IsMMap.equalsIgnoreCase("true")) {
-
- searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + GLABOL_INDEX))));
- searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + BUS_INDEX))));
- // searchers[2] = new IndexSearcher(new RAMDirectory(FSDirectory
- // .open(new File(strRootPath + "/" + LU_INDEX))));
- globalSearcher = new MultiSearcher(searchers);
- } else {
- searchers[0] = new IndexSearcher(FSDirectory.open(new File(
- strRootPath + "/" + GLABOL_INDEX)));
- searchers[1] = new IndexSearcher(FSDirectory.open(new File(
- strRootPath + "/" + BUS_INDEX)));
- // searchers[2] = new IndexSearcher(FSDirectory.open(new File(
- // strRootPath + "/" + LU_INDEX)));
-
- globalSearcher = new MultiSearcher(searchers);
- }
- System.out.println("finish Global");
-
- m_mapIndexName2Searcher.put("0", globalSearcher);
- m_mapAdmin2IndexName.put("0", "0");
-
- return true;
-
- } catch (Exception e) {
- e.printStackTrace();
- SearchLog.SearchLog.error("Failed to initialize the nationwide index");
- return false;
- }
- }
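This version uses MultiSearcher, which is how older Lucene releases search across several indexes. In 4.x the MultiSearcher class itself has been removed, which cost me a lot of time. To my mind this suggests that Lucene, famous for its rapid succession of releases, is not especially well designed for API stability: the code base uses few interfaces and consists mostly of classes and abstract classes.
4.x: building the multi-directory searcher:
- // Initialize the nationwide index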
- private boolean InitGlobal(String strRootPath) {
- try {
-
- IndexSearcher globalSearcher = null;
- if (Configution.IsMMap.equalsIgnoreCase("true")) {
-
- IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + GLABOL_INDEX)),new IOContext()));
-
- IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + BUS_INDEX)),new IOContext()));
-
- MultiReader mr = new MultiReader(irGlobal,irBus);
-
-
- globalSearcher = new IndexSearcher(mr);//new MultiSearcher(searchers);
- } else {
-
- IndexReader irGlobal = DirectoryReader.open(FSDirectory
- .open(new File(strRootPath + "/" + GLABOL_INDEX)));
-
- IndexReader irBus = DirectoryReader.open(FSDirectory
- .open(new File(strRootPath + "/" + BUS_INDEX)));
-
- MultiReader mr = new MultiReader(irGlobal,irBus);
- globalSearcher = new IndexSearcher(mr);//new MultiSearcher(searchers);
- }
- System.out.println("finish Global");
-
- m_mapIndexName2Searcher.put("0", globalSearcher);
- m_mapAdmin2IndexName.put("0", "0");
-
- return true;
-
- } catch (Exception e) {
- e.printStackTrace();
- SearchLog.SearchLog.error("Failed to initialize the nationwide index");
- return false;
- }
- }
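After the change, a plain IndexSearcher replaces MultiSearcher; it searches several index directories through the MultiReader passed to its constructor.
2. Building a RAMDirectory to hold the index in memory
3.0: building the in-memory index directory: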
- searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + GLABOL_INDEX))));
- searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + BUS_INDEX))));
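The on-disk Directory is passed straight to the RAMDirectory constructor. Beware of a pitfall here: the constructor copies the whole index into memory, so with a large index you will wait a long time!
4.x: building the in-memory index directory: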
- IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + GLABOL_INDEX)),new IOContext()));
-
- IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
- .open(new File(strRootPath + "/" + BUS_INDEX)),new IOContext()));
-
- MultiReader mr = new MultiReader(irGlobal,irBus);
In 4.x the 3.0 constructor call no longer works as written; you also have to pass an IOContext object.
3. Custom analyzer
3.0 custom analyzer:
- public class SingleAnalyzer extends Analyzer {
-
- /**
- * @param args
- */
-
-
- public TokenStream tokenStream(String fieldName, Reader reader){
- TokenStream result = null;
- if(fieldName.equals("name"))
- {
- result = new SingleTokenizer(reader);
- }
- if(fieldName.equals("totalcity"))
- {
- result = new IKTokenizer(reader, false);
- }
-
- // result = new StandardFilter(result);
- // result = new LowerCaseFilter(result);
- // result = new StopFilter(result, stopSet);
- return result;
- }
-
-
- public static void main(String[] args) {
- // TODO Auto-generated method stub
-
- }
-
- }
Overriding the tokenStream method is all it takes; very simple.
4.x custom analyzer:
- public class SingleAnalyzer extends Analyzer {
-
- /**
- * @param args
- */
-
-
- // public TokenStream tokenStream(String fieldName, Reader reader){
- // TokenStream result = null;
- // if(fieldName.equals("name"))
- // {
- // result = new SingleTokenizer(reader);
- // }
- // if(fieldName.equals("totalcity"))
- // {
- // result = new IKTokenizer(reader, false);
- // }
- //
- //// result = new StandardFilter(result);
- //// result = new LowerCaseFilter(result);
- // // result = new StopFilter(result, stopSet);
- // return result;
- // }
-
- @Override
- protected TokenStreamComponents createComponents(String fieldName,
- Reader reader) {
- // TODO Auto-generated method stub
- // final Tokenizer source = new ChineseTokenizer(reader);
- // return new TokenStreamComponents(source, new ChineseFilter(source));
- Tokenizer source = null;
- if(fieldName.equals("name")){
- source = new SingleTokenizer(reader);
- }else if(fieldName.equals("totalcity")){
- source = new IKTokenizer(reader, false);
- }
- return new TokenStreamComponents(source, source);
- }
-
- }
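In 4.x you override the createComponents method instead and return a TokenStreamComponents that wraps the Tokenizer chosen for the field.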
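One detail in the 4.x listing deserves care: when `fieldName` is neither `name` nor `totalcity`, `source` stays `null` and is handed to `TokenStreamComponents` anyway. The sketch below isolates only the field dispatch and fails loudly for an unconfigured field. It uses plain strings as stand-ins for the tokenizers so that it runs without Lucene on the classpath; `TokenizerChoice` and `forField` are hypothetical names, not part of the original project or of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: maps a field name to the tokenizer that should handle it.
final class TokenizerChoice {
    private static final Map<String, String> BY_FIELD = new HashMap<String, String>();

    static {
        // Same two fields as the SingleAnalyzer listing above.
        BY_FIELD.put("name", "SingleTokenizer");
        BY_FIELD.put("totalcity", "IKTokenizer");
    }

    // Returns the tokenizer label for a field, or throws instead of returning null.
    static String forField(String fieldName) {
        String tokenizer = BY_FIELD.get(fieldName);
        if (tokenizer == null) {
            throw new IllegalArgumentException("No tokenizer configured for field: " + fieldName);
        }
        return tokenizer;
    }

    public static void main(String[] args) {
        System.out.println(forField("name"));
        System.out.println(forField("totalcity"));
    }
}
```

In the real analyzer the same check would sit in `createComponents`, either throwing as here or falling back to a default tokenizer, whichever suits the project.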
4. Scoring algorithm
The scoring algorithm changed little between 3.x and 4.x, but the package did.
3.x: import org.apache.lucene.search.DefaultSimilarity; the class lives in the org.apache.lucene.search package.
4.x: import org.apache.lucene.search.similarities.*; the classes live in the org.apache.lucene.search.similarities package.
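For orientation, here is a plain-Java sketch of the two classic TF-IDF factors that `DefaultSimilarity` computes, as far as I know identically in both versions: the square root of the term frequency, and `1 + ln(numDocs / (docFreq + 1))` for the inverse document frequency. It is not Lucene code and runs without Lucene. The class name `TfIdfFormulas` and the `flatTf` variant are hypothetical, included only to show the kind of change a custom similarity typically makes.

```java
// Plain-Java sketch of the classic TF-IDF factors; names are hypothetical.
final class TfIdfFormulas {

    // Term-frequency factor: square root of the raw frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical customization: a term counts once however often it repeats.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println("tf(4)      = " + tf(4f));
        System.out.println("idf(9, 10) = " + idf(9, 10));
        System.out.println("flatTf(7)  = " + flatTf(7f));
    }
}
```

In an actual project the customization would go into a subclass of `DefaultSimilarity` imported from the package that matches the Lucene version, and only the import line would differ between 3.x and 4.x.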
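5. Query expressions: the main change is in TermRangeQuery. In 3.x the lower and upper bounds are plain String values, whereas in 4.x they are BytesRef objects wrapping the string. There are many other changes of detail as well.
3.x TermRangeQuery: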
- String left = Long
- .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
- String right = Long
- .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
- String top = Long
- .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
- String bottom = Long
- .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));
-
-
-
- TermRangeQuery query1 = new TermRangeQuery("lon", left, right,
- true, true);
- TermRangeQuery query2 = new TermRangeQuery("lat", bottom, top,
- true, true);
- searchQuery.add(query1, BooleanClause.Occur.MUST);
- searchQuery.add(query2, BooleanClause.Occur.MUST);
4.x TermRangeQuery:
- String left = Long
- .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
- String right = Long
- .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
- String top = Long
- .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
- String bottom = Long
- .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));
-
-
- BytesRef brLeft = new BytesRef(left);
- BytesRef brRight = new BytesRef(right);
- BytesRef brBottom = new BytesRef(bottom);
- BytesRef brTop = new BytesRef(top);
-
- TermRangeQuery query1 = new TermRangeQuery("lon",
- brLeft, brRight, true, true);
- TermRangeQuery query2 = new TermRangeQuery("lat",
- brBottom, brTop, true, true);
- searchQuery.add(query1, BooleanClause.Occur.MUST);
- searchQuery.add(query2, BooleanClause.Occur.MUST);
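Because a `BytesRef` built from a String holds its UTF-8 bytes, a term range is evaluated in byte order, not numeric order. For bounds produced with `Long.toString`, that only matches numeric order when all values have the same number of digits. Whether this matters here depends on how the `lon` and `lat` fields were indexed, which the listings above do not show. The sketch below demonstrates the effect in plain Java, without Lucene; `TermOrderDemo`, `compareAsTerms` and `pad` are hypothetical names, and `pad` assumes non-negative values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo of byte-ordered term comparison; not Lucene code.
final class TermOrderDemo {

    // Compares two strings by their unsigned UTF-8 bytes, shorter prefix first.
    static int compareAsTerms(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // Left-pads a non-negative value with zeros so byte order matches numeric order.
    static String pad(long value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "10" because '9' > '1'.
        System.out.println(compareAsTerms("9", "10") > 0);
        // Padded to a fixed width: order agrees with the numbers again.
        System.out.println(compareAsTerms(pad(9, 10), pad(10, 10)) < 0);
    }
}
```

If the indexed values can differ in digit count, the same padding has to be applied both at index time and when building the range bounds.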
6. Closing the IndexSearcher
In 3.x, closing the searcher is just a matter of calling its close method:
- public void UnInit() {
- if (!m_bIsInit)
- return;
-
- Iterator iter = m_mapIndexName2Searcher.keySet().iterator();
-
- while (iter.hasNext()) {
-
- String key = (String) iter.next();
-
- MultiSearcher val = (MultiSearcher) m_mapIndexName2Searcher
- .get(key);
-
- try {
-
- val.close(); // close the searcher
- } catch (IOException e) {
- e.printStackTrace();
- SearchLog.SearchLog.error("Error while closing the tiered index");
- }
- }
-
- m_mapIndexName2Searcher.clear();
- m_mapAdmin2IndexName.clear();
- m_mapIndexName2Searcher = null;
- m_mapAdmin2IndexName = null;
- m_bIsInit = false;
- }
In 4.x IndexSearcher has no close method; you call getIndexReader and then close the returned IndexReader:
- public void UnInit() {
- if (!m_bIsInit)
- return;
-
- Iterator iter = m_mapIndexName2Searcher.keySet().iterator();
-
- while (iter.hasNext()) {
-
- String key = (String) iter.next();
-
- IndexSearcher val = (IndexSearcher) m_mapIndexName2Searcher
- .get(key);
-
- try {
- val.getIndexReader().close(); // close the underlying IndexReader
- } catch (IOException e) {
- e.printStackTrace();
- SearchLog.SearchLog.error("Error while closing the tiered index");
- }
- }
-
- m_mapIndexName2Searcher.clear();
- m_mapAdmin2IndexName.clear();
- m_mapIndexName2Searcher = null;
- m_mapAdmin2IndexName = null;
- m_bIsInit = false;
- }
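The shutdown loop can also be written with generics and a for-each over the map entries, which removes the raw Iterator and the casts. The sketch below uses `java.io.Closeable` as a stand-in for the reader (Lucene's `IndexReader` implements `Closeable`), so it runs without Lucene. In the real code the map values would be IndexSearcher instances and the call inside the loop would be `getIndexReader().close()`. `CloseAll` and `closeAll` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: closes every resource in a map and keeps going on failure.
final class CloseAll {

    // Returns the number of resources whose close() threw an IOException.
    static <C extends Closeable> int closeAll(Map<String, C> resources) {
        int failures = 0;
        for (Map.Entry<String, C> entry : resources.entrySet()) {
            try {
                entry.getValue().close();
            } catch (IOException e) {
                failures++;
                System.err.println("Failed to close " + entry.getKey() + ": " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Closeable> resources = new HashMap<String, Closeable>();
        resources.put("0", new ByteArrayInputStream(new byte[0]));
        System.out.println("failures: " + closeAll(resources));
    }
}
```

Continuing after a failed close matches the original loop, which logs the exception and moves on to the next entry.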
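In short, Lucene changes a great deal between versions, and an upgrade touches many methods whose signatures or behavior have changed. You need to read the code carefully and experiment before the upgrade goes through. Once it is done, run a full functional test, because some features may behave differently or even break. Upgrading Lucene is not a pleasant job.
Please credit the source when reposting: http://www.cnblogs.com/likehua/p/4387700.html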