Other Indexing Improvements
Now that we can index several different types of documents, it would be nice to have a bit more control over the indexing process. We should be able to specify multiple directories to add, and also specify file path patterns. It would also be nice if we could make sure that files are added only when they need to be, that is, when they haven't been added yet or have been modified since they were added. We'd also like some way to update the index so that modified files are reindexed and deleted files are removed from the index.
To implement these requirements, we use Ruby's DBM class to
record the time each file was added to the index. DBM is basically a
storable Hash, which we will store in the /path/to/index/added_at file.
Note that since the filename added_at doesn't begin with an underscore,
it won't conflict with any of the index files. It makes sense to store it
in the same place as the rest of the index files, since it is basically
just another index file. Here is the code used to add files to the index:
    if not options.add.empty?
      include Ferret::Index
      readers_dir = File.join(File.dirname(__FILE__), "readers", "*.rb")
      Dir[readers_dir].each {|fn| require fn}
      field_infos = FieldInfos.new(:index => :untokenized_omit_norms,
                                   :term_vector => :no)
      field_infos.add_field(:content, :store => :no, :index => :yes)
      FerretFind::Reader.load_readers(field_infos)
      writer = IndexWriter.new(:path => options ...
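The timestamp bookkeeping described above can be sketched roughly as follows. This is not the book's code: the helper names (needs_indexing?, record_added, deleted_paths) are hypothetical, and the store is assumed to be any Hash-like object mapping a file path to the time it was added. Times are kept as strings because DBM stores only string keys and values.

```ruby
# Hypothetical helpers, not part of Ferret or of the listing above.
# `added_at` is any Hash-like store (a plain Hash here; a DBM handle
# should behave the same way) mapping path => epoch seconds as a String.

# True if the file was never added, or was modified after it was added.
# Compares whole seconds, so a change made within the same second as
# indexing would go unnoticed.
def needs_indexing?(added_at, path)
  stamp = added_at[path]
  stamp.nil? || File.mtime(path).to_i > stamp.to_i
end

# Record the time at which the file was added to the index.
def record_added(added_at, path, now = Time.now)
  added_at[path] = now.to_i.to_s
end

# Paths that were added earlier but no longer exist on disk; their
# documents are the candidates for deletion from the index.
def deleted_paths(added_at)
  added_at.keys.reject { |path| File.exist?(path) }
end
```

With a plain Hash as the store, needs_indexing?({}, path) is true for any file that has not been recorded yet. Passing the store in as an argument keeps the sketch runnable without the DBM library, and a DBM handle opened on the added_at file could presumably be substituted without changing the helpers.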