Lucene Index, Store, and TermVector options explained - 白红宇

In Lucene 3.0, the latest release at the time of writing, a Field takes the following options:

Field options for indexing

Index.ANALYZED – use the analyzer to break the Field’s value into a stream of separate tokens and make each token searchable.

Index.NOT_ANALYZED – do index the field, but do not analyze the String. Instead, treat the Field’s entire value as a single token and make that token searchable.

Index.ANALYZED_NO_NORMS – an advanced variant of Index.ANALYZED that does not store norms (the per-field normalization factors, such as field length and index-time boost, used in scoring) in the index.

Index.NOT_ANALYZED_NO_NORMS – just like Index.NOT_ANALYZED, but also does not store norms.

Index.NO – don’t make this field’s value available for searching at all.
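To make the differences between the Index options concrete, here is a small stdlib-only Java model. It is not Lucene's API: the IndexOption enum, the searchableTokens helper, and the lowercase-and-split analyzer are hypothetical stand-ins (real Lucene analyzers are pluggable and usually do more). It only shows which searchable tokens a field value would yield under each option, and which options keep norms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the five Index options (not Lucene's Field.Index).
public class IndexOptionsDemo {

    enum IndexOption {
        ANALYZED(true, true, true),
        NOT_ANALYZED(true, false, true),
        ANALYZED_NO_NORMS(true, true, false),
        NOT_ANALYZED_NO_NORMS(true, false, false),
        NO(false, false, false);

        final boolean indexed;  // can the field be searched at all?
        final boolean analyzed; // is the value split into tokens by an analyzer?
        final boolean norms;    // are norms recorded for the field?

        IndexOption(boolean indexed, boolean analyzed, boolean norms) {
            this.indexed = indexed;
            this.analyzed = analyzed;
            this.norms = norms;
        }
    }

    // Stand-in analyzer (an assumption): lowercase the text and split it on
    // runs of characters that are neither letters nor digits.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string; skip it
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The searchable tokens a field value contributes under each option.
    static List<String> searchableTokens(String value, IndexOption option) {
        if (!option.indexed) {
            return Collections.emptyList();      // Index.NO: nothing is searchable
        }
        if (option.analyzed) {
            return analyze(value);               // ANALYZED variants: many tokens
        }
        return Collections.singletonList(value); // NOT_ANALYZED variants: one verbatim token
    }

    public static void main(String[] args) {
        String title = "Lucene in Action, 2nd Ed.";
        for (IndexOption option : IndexOption.values()) {
            System.out.println(option + " (norms=" + option.norms + "): "
                    + searchableTokens(title, option));
        }
    }
}
```

With NOT_ANALYZED the whole string is one token, so a query has to match "Lucene in Action, 2nd Ed." exactly; that is why the examples later in this post use it for identifiers.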

Field options for storing fields

Store.YES – store the value. When the value is stored, the original String in its entirety is recorded in the index and may be retrieved by an IndexReader.

Store.NO – do not store the value. This is often used along with Index.ANALYZED to index a large text field that doesn’t need to be retrieved in its original form.
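A toy index can show what Store.YES versus Store.NO changes. In this hypothetical sketch (not Lucene's API; the StoreOptionDemo class and its methods are made up for illustration), every field is tokenized into the same inverted map, but the original value is kept only for Store.YES fields, mimicking a stored title next to a large body that is searchable but not retrievable.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index showing Store.YES vs Store.NO; not Lucene's API.
public class StoreOptionDemo {

    enum Store { YES, NO }

    // "field:term" -> ids of documents containing the term in that field
    private final Map<String, Set<Integer>> inverted = new HashMap<>();
    // docId -> (field name -> original value), filled only for Store.YES
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void addField(int docId, String field, String value, Store store) {
        // Every field is indexed (analyzed with a simple split) regardless of Store.
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                inverted.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(docId);
            }
        }
        if (store == Store.YES) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
    }

    Set<Integer> search(String field, String term) {
        return inverted.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Returns the original value, or null when the field was not stored.
    String storedValue(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        StoreOptionDemo index = new StoreOptionDemo();
        index.addField(0, "title", "Lucene in Action", Store.YES);
        index.addField(0, "body", "Full text of a long article about Lucene indexing", Store.NO);
        System.out.println("body:lucene matches " + index.search("body", "lucene"));
        System.out.println("stored title = " + index.storedValue(0, "title"));
        System.out.println("stored body  = " + index.storedValue(0, "body"));
    }
}
```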

Field options for term vectors

TermVector.YES – record the unique terms that occurred, and their counts, in each document, but do not store any positions or offsets information.

TermVector.WITH_POSITIONS – record the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.

TermVector.WITH_OFFSETS – record the unique terms and their counts, with the offsets (start & end character position) of each occurrence of every term, but no positions.

TermVector.WITH_POSITIONS_OFFSETS – store unique terms and their counts, along with positions and offsets.

TermVector.NO – do not store any term vector information.
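The following sketch models, for a single document, what each TermVector option keeps. The TermVectorOption enum, the TermEntry class, and the letters-and-digits tokenizer are hypothetical; Lucene's actual term vector storage is different. Here positions are token indexes, and offsets are start/end character positions with the end exclusive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of what each TermVector option keeps for one document
// (not Lucene's Field.TermVector or its term vector file format).
public class TermVectorDemo {

    enum TermVectorOption {
        NO(false, false, false),
        YES(true, false, false),
        WITH_POSITIONS(true, true, false),
        WITH_OFFSETS(true, false, true),
        WITH_POSITIONS_OFFSETS(true, true, true);

        final boolean recorded, positions, offsets;

        TermVectorOption(boolean recorded, boolean positions, boolean offsets) {
            this.recorded = recorded;
            this.positions = positions;
            this.offsets = offsets;
        }
    }

    // One unique term's entry in the document's term vector.
    static final class TermEntry {
        int freq;                                          // kept whenever a vector is recorded
        final List<Integer> positions = new ArrayList<>(); // token index of each occurrence
        final List<int[]> offsets = new ArrayList<>();     // {start, end}, end exclusive
    }

    // Stand-in tokenizer (an assumption): runs of letters/digits, lowercased.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    static Map<String, TermEntry> termVector(String text, TermVectorOption option) {
        Map<String, TermEntry> vector = new LinkedHashMap<>();
        if (!option.recorded) {
            return vector; // TermVector.NO: nothing is kept
        }
        Matcher m = TOKEN.matcher(text);
        int position = 0;
        while (m.find()) {
            TermEntry entry = vector.computeIfAbsent(
                    m.group().toLowerCase(Locale.ROOT), k -> new TermEntry());
            entry.freq++;
            if (option.positions) {
                entry.positions.add(position);
            }
            if (option.offsets) {
                entry.offsets.add(new int[] {m.start(), m.end()});
            }
            position++;
        }
        return vector;
    }

    public static void main(String[] args) {
        String text = "To be or not to be";
        for (TermVectorOption option : TermVectorOption.values()) {
            System.out.println(option);
            for (Map.Entry<String, TermEntry> e : termVector(text, option).entrySet()) {
                StringBuilder line = new StringBuilder("  " + e.getKey() + " freq=" + e.getValue().freq);
                if (option.positions) {
                    line.append(" positions=").append(e.getValue().positions);
                }
                if (option.offsets) {
                    line.append(" offsets=");
                    for (int[] o : e.getValue().offsets) {
                        line.append('[').append(o[0]).append(',').append(o[1]).append(')');
                    }
                }
                System.out.println(line);
            }
        }
    }
}
```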

If Index.NO is specified for a field, then you must also specify TermVector.NO.
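This rule can be expressed as a small validation check. The sketch below is hypothetical: it uses local enums rather than Lucene's classes and does not reproduce Lucene's own field construction; it only rejects any TermVector other than NO when Index.NO is chosen. The reasoning is that term vectors list the terms produced by indexing, so an unindexed field has nothing to record.

```java
// Hypothetical sketch of the Index.NO / TermVector.NO rule from the text.
// The enums are local stand-ins, not Lucene's Field.Index / Field.TermVector.
public class FieldOptionCheck {

    enum Index { ANALYZED, NOT_ANALYZED, ANALYZED_NO_NORMS, NOT_ANALYZED_NO_NORMS, NO }

    enum TermVector { YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS, NO }

    // Term vectors list the terms produced while indexing, so a field that is
    // not indexed has no terms to record.
    static void checkOptions(Index index, TermVector termVector) {
        if (index == Index.NO && termVector != TermVector.NO) {
            throw new IllegalArgumentException(
                    "Index.NO requires TermVector.NO, got TermVector." + termVector);
        }
    }

    static boolean isValid(Index index, TermVector termVector) {
        try {
            checkOptions(index, termVector);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Counts the valid (Index, TermVector) pairs among all 5 x 5 combinations.
    static int countValid() {
        int valid = 0;
        for (Index index : Index.values()) {
            for (TermVector termVector : TermVector.values()) {
                if (isValid(index, termVector)) {
                    valid++;
                }
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(countValid() + " of 25 combinations are valid"); // prints "21 of 25 combinations are valid"
        try {
            checkOptions(Index.NO, TermVector.WITH_POSITIONS);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Of the 25 combinations, only the four pairs of Index.NO with a TermVector value other than NO are rejected.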

Here are some examples of how these options are typically combined:

| Index | Store | TermVector | Example usage |
| --- | --- | --- | --- |
| NOT_ANALYZED | YES | NO | Identifiers (file names, primary keys), telephone and Social Security numbers, URLs, personal names, dates |
| ANALYZED | YES | WITH_POSITIONS_OFFSETS | Document title, document abstract |
| ANALYZED | NO | WITH_POSITIONS_OFFSETS | Document body |
| NO | YES | NO | Document type, database primary key |
| NOT_ANALYZED | NO | NO | Hidden keywords |
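The table can also be read as data. The sketch below encodes each row with hypothetical local enums (limited to the values the table uses) and derives, from the option descriptions earlier in this post, whether each field is searchable, indexed as a single token, retrievable, and has term vectors. It is an illustration, not Lucene code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical encoding of the example table; the enums are local stand-ins
// limited to the values the table uses, not Lucene's classes.
public class FieldRecipes {

    enum Index { ANALYZED, NOT_ANALYZED, NO }
    enum Store { YES, NO }
    enum TermVector { NO, WITH_POSITIONS_OFFSETS }

    static final class Recipe {
        final String usage;
        final Index index;
        final Store store;
        final TermVector termVector;

        Recipe(String usage, Index index, Store store, TermVector termVector) {
            this.usage = usage;
            this.index = index;
            this.store = store;
            this.termVector = termVector;
        }

        boolean searchable()    { return index != Index.NO; }           // can a query match it?
        boolean singleToken()   { return index == Index.NOT_ANALYZED; } // whole value is one token
        boolean retrievable()   { return store == Store.YES; }          // can the value be read back?
        boolean hasTermVector() { return termVector != TermVector.NO; }
    }

    static final List<Recipe> TABLE = Arrays.asList(
            new Recipe("Identifiers, phone and SSN numbers, URLs, personal names, dates",
                    Index.NOT_ANALYZED, Store.YES, TermVector.NO),
            new Recipe("Document title, document abstract",
                    Index.ANALYZED, Store.YES, TermVector.WITH_POSITIONS_OFFSETS),
            new Recipe("Document body",
                    Index.ANALYZED, Store.NO, TermVector.WITH_POSITIONS_OFFSETS),
            new Recipe("Document type, database primary key",
                    Index.NO, Store.YES, TermVector.NO),
            new Recipe("Hidden keywords",
                    Index.NOT_ANALYZED, Store.NO, TermVector.NO));

    static Recipe find(String usage) {
        for (Recipe r : TABLE) {
            if (r.usage.equals(usage)) {
                return r;
            }
        }
        throw new IllegalArgumentException("no row for " + usage);
    }

    public static void main(String[] args) {
        for (Recipe r : TABLE) {
            System.out.printf("%-65s searchable=%-5b singleToken=%-5b retrievable=%-5b termVector=%b%n",
                    r.usage, r.searchable(), r.singleToken(), r.retrievable(), r.hasTermVector());
        }
    }
}
```

For example, the Document body row is searchable and has term vectors but cannot be read back, while Document type can be read back but is never matched by a query.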

When Lucene builds the inverted index, by default it stores all the information needed to implement the Vector Space Model and phrase searches: the count of every term that occurred in each document, which the model uses for scoring, and the positions of each occurrence, which phrase searches need.

You can tell Lucene to skip indexing term frequencies and positions for a field by calling this method on the Field instance:

Field.setOmitTermFreqAndPositions(true)
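To see what setOmitTermFreqAndPositions(true) gives up, here is a toy inverted index in plain Java. The class, its methods, and the whitespace tokenizer are hypothetical, and throwing UnsupportedOperationException when positions are missing is this sketch's own choice, not a statement about how Lucene behaves. Matching documents on a single term still works, but term counts and phrase checks need exactly the data that was omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy inverted index (not Lucene's postings format) showing what
// is lost when term frequencies and positions are omitted for a field.
public class OmitFreqPositionsDemo {

    final boolean omitTermFreqAndPositions;
    // term -> (docId -> positions of the term in that doc); the position lists
    // stay empty when frequencies and positions are omitted.
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    OmitFreqPositionsDemo(boolean omitTermFreqAndPositions) {
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }

    // Stand-in tokenizer (an assumption): lowercase and split on whitespace.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    void addDocument(int docId, String text) {
        String[] tokens = tokenize(text);
        for (int pos = 0; pos < tokens.length; pos++) {
            List<Integer> positions = postings
                    .computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>());
            if (!omitTermFreqAndPositions) {
                positions.add(pos); // when omitted, only "term occurs in docId" is kept
            }
        }
    }

    // Plain term search needs only document ids, so it works in both modes.
    Set<Integer> docsContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap()).keySet();
    }

    int termFreq(String term, int docId) {
        requirePositions();
        return positionsOf(term.toLowerCase(Locale.ROOT), docId).size();
    }

    // A phrase matches when its terms occur at consecutive positions.
    boolean phraseMatch(int docId, String phrase) {
        requirePositions();
        String[] terms = tokenize(phrase);
        for (int start : positionsOf(terms[0], docId)) {
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                all = positionsOf(terms[i], docId).contains(start + i);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    private List<Integer> positionsOf(String term, int docId) {
        return postings.getOrDefault(term, Collections.emptyMap())
                .getOrDefault(docId, Collections.emptyList());
    }

    // In this sketch, asking for data that was never indexed is an error.
    private void requirePositions() {
        if (omitTermFreqAndPositions) {
            throw new UnsupportedOperationException("term frequencies and positions were omitted");
        }
    }

    public static void main(String[] args) {
        for (boolean omit : new boolean[] {false, true}) {
            OmitFreqPositionsDemo index = new OmitFreqPositionsDemo(omit);
            index.addDocument(1, "the quick brown fox");
            index.addDocument(2, "brown dogs are quick");
            System.out.println("omit=" + omit + ", docs with 'quick': " + index.docsContaining("quick"));
            try {
                System.out.println("  \"quick brown\" in doc 1: " + index.phraseMatch(1, "quick brown"));
                System.out.println("  \"quick brown\" in doc 2: " + index.phraseMatch(2, "quick brown"));
            } catch (UnsupportedOperationException e) {
                System.out.println("  phrase search unavailable: " + e.getMessage());
            }
        }
    }
}
```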

Excerpted from: http://www.cnblogs.com/fxjwind/archive/2011/07/04/2097705.html

This article was reposted from 张昺华-sky's cnblogs blog. Original link: http://www.cnblogs.com/bonelee/p/6604399.html. To republish it, please contact the original author.