Lucene Index, Store, and TermVector Field Options Explained
Published: 2019-06-29


As of Lucene 3.0 (the latest version at the time of writing), a Field is configured with the following options:

Field options for indexing

Index.ANALYZED – use the analyzer to break the Field’s value into a stream of separate tokens and make each token searchable.
Index.NOT_ANALYZED – do index the field, but do not analyze the String. Instead, treat the Field’s entire value as a single token and make that token searchable. 
Index.ANALYZED_NO_NORMS – an advanced variant of Index.ANALYZED which does not store norms information in the index. 
Index.NOT_ANALYZED_NO_NORMS – just like Index.NOT_ANALYZED, but also do not store norms.
Index.NO – don’t make this field’s value available for searching at all.
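To make the difference between the indexing options concrete, here is a minimal sketch in plain Java. It does not use Lucene: the class IndexModeSketch and its methods are hypothetical, and the analyze method is only a stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters). It shows which searchable terms each kind of option would produce for the same value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of the indexing options. These are NOT Lucene classes.
class IndexModeSketch {

    // Stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like Index.ANALYZED: every token becomes a searchable term.
    static List<String> termsAnalyzed(String value) {
        return analyze(value);
    }

    // Like Index.NOT_ANALYZED: the entire value is one searchable term.
    static List<String> termsNotAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    // Like Index.NO: nothing is searchable.
    static List<String> termsNotIndexed(String value) {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        String value = "Lucene in Action";
        System.out.println("ANALYZED:     " + termsAnalyzed(value));
        System.out.println("NOT_ANALYZED: " + termsNotAnalyzed(value));
        System.out.println("NO:           " + termsNotIndexed(value));
    }
}
```

With a NOT_ANALYZED-style field, only a query for the exact original string matches, which is why that option suits identifiers rather than free text.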

Field options for storing fields

Store.YES – store the value. When the value is stored, the original String in its entirety is recorded in the index and may be retrieved by an IndexReader.
Store.NO – do not store the value. This is often used along with Index.ANALYZED to index a large text field that doesn’t need to be retrieved in its original form.

Field options for term vectors

TermVector.YES – record the unique terms that occurred, and their counts, in each document, but do not store any positions or offsets information.
TermVector.WITH_POSITIONS – record the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.
TermVector.WITH_OFFSETS – record the unique terms and their counts, with the offsets (start & end character position) of each occurrence of every term, but no positions.
TermVector.WITH_POSITIONS_OFFSETS – store unique terms and their counts, along with positions and offsets.
TermVector.NO – do not store any term vector information.
If Index.NO is specified for a field, then you must also specify TermVector.NO.
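The term vector options differ only in how much per-term detail is kept for each document. The following plain-Java sketch (not Lucene code; TermVectorSketch, Entry, and build are hypothetical names, and a whitespace tokenizer stands in for an analyzer) computes everything a WITH_POSITIONS_OFFSETS-style vector would hold: the unique terms, their counts, token positions, and character offsets. Dropping the offsets list corresponds roughly to WITH_POSITIONS, dropping the positions list to WITH_OFFSETS, and dropping both to YES. End offsets are treated as exclusive here, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a term vector for one field of one document. NOT a Lucene class.
class TermVectorSketch {

    // Data kept for one unique term.
    static class Entry {
        int count;                                           // occurrences in the document
        final List<Integer> positions = new ArrayList<>();   // token index of each occurrence
        final List<int[]> offsets = new ArrayList<>();       // {startChar, endCharExclusive}
    }

    // Tokenize on whitespace and record count, positions, and offsets per term.
    static Map<String, Entry> build(String text) {
        Map<String, Entry> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            Entry e = vector.computeIfAbsent(text.substring(start, i), k -> new Entry());
            e.count++;
            e.positions.add(position++);
            e.offsets.add(new int[] {start, i});
        }
        return vector;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Entry> e : build("to be or not to be").entrySet()) {
            StringBuilder offsets = new StringBuilder();
            for (int[] o : e.getValue().offsets) {
                offsets.append('[').append(o[0]).append(',').append(o[1]).append(')');
            }
            System.out.println(e.getKey() + " count=" + e.getValue().count
                    + " positions=" + e.getValue().positions + " offsets=" + offsets);
        }
    }
}
```

Offsets are what a highlighter needs to mark matches in the original text, which is one reason titles and bodies in the examples that follow use WITH_POSITIONS_OFFSETS.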

Some examples of how these options are used:

Index         Store  TermVector              Example usage
NOT_ANALYZED  YES    NO                      Identifiers (file names, primary keys), telephone and Social Security numbers, URLs, personal names, dates
ANALYZED      YES    WITH_POSITIONS_OFFSETS  Document title, document abstract
ANALYZED      NO     WITH_POSITIONS_OFFSETS  Document body
NO            YES    NO                      Document type, database primary key
NOT_ANALYZED  NO     NO                      Hidden keywords
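As a quick sanity check, the combinations listed above can be encoded as data and tested against the rule stated earlier, that Index.NO requires TermVector.NO. This is a plain-Java sketch only: the enums mirror Lucene's option names for readability but are hypothetical stand-ins, not Lucene's own classes, and isValid covers only that one rule.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the option combinations. The enums are NOT Lucene's classes.
class FieldOptionsSketch {

    enum Index { ANALYZED, NOT_ANALYZED, ANALYZED_NO_NORMS, NOT_ANALYZED_NO_NORMS, NO }
    enum Store { YES, NO }
    enum TermVector { YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS, NO }

    // One row of the example-usage table.
    static class Options {
        final Index index;
        final Store store;
        final TermVector termVector;
        final String usage;

        Options(Index index, Store store, TermVector termVector, String usage) {
            this.index = index;
            this.store = store;
            this.termVector = termVector;
            this.usage = usage;
        }
    }

    // Rule from the text: a field that is not indexed cannot have term vectors.
    static boolean isValid(Index index, TermVector termVector) {
        return index != Index.NO || termVector == TermVector.NO;
    }

    // The five combinations from the table.
    static List<Options> tableRows() {
        return Arrays.asList(
            new Options(Index.NOT_ANALYZED, Store.YES, TermVector.NO, "identifiers"),
            new Options(Index.ANALYZED, Store.YES, TermVector.WITH_POSITIONS_OFFSETS, "title, abstract"),
            new Options(Index.ANALYZED, Store.NO, TermVector.WITH_POSITIONS_OFFSETS, "body"),
            new Options(Index.NO, Store.YES, TermVector.NO, "document type"),
            new Options(Index.NOT_ANALYZED, Store.NO, TermVector.NO, "hidden keywords"));
    }

    public static void main(String[] args) {
        for (Options o : tableRows()) {
            System.out.println(o.usage + ": " + o.index + "/" + o.store + "/" + o.termVector
                    + " valid=" + isValid(o.index, o.termVector));
        }
    }
}
```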

When Lucene builds the inverted index, by default it stores all necessary information to implement the Vector Space model. This model requires the count of every term that occurred in the document, as well as the positions of each occurrence (needed for phrase searches).

You can tell Lucene to skip indexing the term frequency and positions for a field by calling the following on the Field instance:
Field.setOmitTermFreqAndPositions(true)
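To see why positions matter, consider this minimal plain-Java sketch of an inverted index that keeps, for each term, the documents it occurs in and the token positions within each document. It is not Lucene's implementation; PostingsSketch and its methods are hypothetical, and tokenization is a simple whitespace split. Term frequency falls out of the length of the position list, and a two-word phrase match needs the positions to check adjacency. If both were omitted, only document membership would remain, so a phrase lookup like the one below could not be answered for that field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index with positions. NOT Lucene's implementation.
class PostingsSketch {

    // term -> (docId -> token positions of that term in the document)
    final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();

    // Index one document using a lowercase whitespace split.
    void add(int docId, String text) {
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Term frequency: how many times the term occurs in the document.
    int termFreq(String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null || !docs.containsKey(docId)) {
            return 0;
        }
        return docs.get(docId).size();
    }

    // Documents in which `first` is immediately followed by `second`.
    List<Integer> phrase(String first, String second) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> a = postings.get(first);
        Map<Integer, List<Integer>> b = postings.get(second);
        if (a == null || b == null) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> next = b.get(e.getKey());
            if (next == null) {
                continue;
            }
            for (int pos : e.getValue()) {
                if (next.contains(pos + 1)) {   // adjacency check needs positions
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.add(1, "quick brown fox");
        index.add(2, "brown quick fox");
        System.out.println("phrase 'quick brown': " + index.phrase("quick", "brown"));
        System.out.println("freq of 'fox' in doc 1: " + index.termFreq("fox", 1));
    }
}
```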

 

Excerpted from: http://www.cnblogs.com/fxjwind/archive/2011/07/04/2097705.html

This article is reposted from the 张昺华-sky blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/6604399.html. To repost it, please contact the original author.
