Tuning Indexing Performance
Ferret’s indexing performance is lightning-fast out of the box, so you’re justified in wondering whether you need to know how to make Ferret even faster. In most cases, you won’t need Ferret to go any faster than it already does. But if you are indexing gigabytes rather than megabytes and the indexing process is taking hours rather than seconds, you need to know how to push Ferret to its limits.
In-Memory Indexing
People who have used Lucene or earlier versions of Ferret might try to improve indexing speed by indexing to a RAMDirectory and then flushing the RAMDirectory to disk. That trick is now pointless; Ferret automatically indexes as many documents as it can in memory before flushing them to the Directory. You can ensure that all the indexing is done in memory by setting the parameters :max_buffered_docs and :max_buffer_memory to sufficiently large quantities.
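As a minimal sketch of that idea, the options below raise both limits well above their defaults so that a moderately sized batch can be buffered entirely in memory. The specific values, the IN_MEMORY_OPTIONS constant, and the open_in_memory_writer helper are illustrative assumptions rather than recommendations; the sketch also assumes that Ferret::Index::IndexWriter.new accepts these parameters alongside :path.

```ruby
# Illustrative limits only; size them to the memory you can spare.
IN_MEMORY_OPTIONS = {
  :max_buffered_docs => 1_000_000,          # far above the 10,000 default
  :max_buffer_memory => 512 * 1024 * 1024   # 512 MB instead of the 16 MB default
}

# Hypothetical helper: opens a writer at +path+ with the raised limits.
# Ferret is loaded lazily so the options can be inspected without the gem.
def open_in_memory_writer(path, options = IN_MEMORY_OPTIONS)
  require 'ferret'
  Ferret::Index::IndexWriter.new(options.merge(:path => path))
end

# Usage (assumes the ferret gem is installed and ./index is writable):
#   writer = open_in_memory_writer('./index')
#   writer << {:title => 'Tuning', :content => 'Indexing in memory'}
#   writer.close
```

Both limits need raising together: whichever one is reached first triggers the flush, so a large :max_buffered_docs alone has no effect once the memory limit is hit.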
Indexing Parameters
The indexing process is regulated by the parameters shown, with their defaults, in Table 3-1.
Table 3-1. Index parameters
Parameter | Default | Short description |
---|---|---|
:max_buffer_memory | 16 MB | The maximum memory used by the IndexWriter before buffered documents are flushed to the index |
:chunk_size | 1 MB | The size of the memory chunks allocated to the memory pool during indexing |
:merge_factor | 10 | The minimum number of similar sized segments needed to trigger a merge |
:max_buffered_docs | 10,000 | The maximum number of documents that will be buffered by the IndexWriter before they are flushed to the index ... |
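
To see how the two buffer limits interact, a rough back-of-the-envelope estimate can help. The sketch below is plain Ruby and does not call Ferret; the helper names docs_per_flush and estimated_flushes are hypothetical. It makes the simplifying assumption that every document occupies the same amount of buffer memory, which real indexing will not match exactly.

```ruby
# Defaults for the two buffer limits (memory in bytes).
DEFAULTS = {
  :max_buffer_memory => 16 * 1024 * 1024,
  :max_buffered_docs => 10_000
}

# Rough number of documents buffered before a flush, assuming each
# document takes +avg_doc_bytes+ of buffer memory (a simplification).
def docs_per_flush(avg_doc_bytes, params = DEFAULTS)
  by_memory = params[:max_buffer_memory] / avg_doc_bytes
  [by_memory, params[:max_buffered_docs]].min
end

# Estimated number of flushes needed to index +doc_count+ documents.
def estimated_flushes(doc_count, avg_doc_bytes, params = DEFAULTS)
  per_flush = [docs_per_flush(avg_doc_bytes, params), 1].max
  (doc_count.to_f / per_flush).ceil
end

puts docs_per_flush(1024)                    # 10000: the document limit applies
puts docs_per_flush(8 * 1024)                # 2048: the memory limit applies
puts estimated_flushes(1_000_000, 8 * 1024)  # 489
```

Under these assumptions, small documents hit :max_buffered_docs first and larger documents hit :max_buffer_memory first, which suggests which parameter to raise for a given collection.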