Tuning Indexing Performance

Ferret’s indexing performance is lightning-fast out of the box, so you’re justified in wondering whether you need to know how to make Ferret even faster. In most cases, you won’t need Ferret to go any faster than it already does. But if you are indexing gigabytes rather than megabytes and the indexing process is taking hours rather than seconds, you need to know how to push Ferret to its limits.

In-Memory Indexing

People who have used Lucene or earlier versions of Ferret might try to improve indexing speed by indexing to a `RAMDirectory` and then flushing the `RAMDirectory` to disk. That trick is now pointless; Ferret automatically indexes as many documents as it can in memory before flushing them to the `Directory`. You can ensure that all the indexing is done in memory by setting the parameters `:max_buffered_docs` and `:max_buffer_memory` to sufficiently large values.
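As a minimal sketch of that setup, the snippet below builds an options hash with both limits raised well above what a bulk load is expected to need. The helper names `in_memory_options` and `open_writer`, the index path, and the specific numbers are illustrative assumptions, not values recommended by the text. It also assumes that `:max_buffer_memory` is given in bytes and that the `ferret` gem is installed when `open_writer` is called.

```ruby
MEGABYTE = 1024 * 1024

# Hypothetical helper: returns options intended to keep all buffering in
# memory for a load of roughly `expected_docs` documents.
def in_memory_options(expected_docs:, memory_mb:)
  {
    # Set strictly above the expected document count so the document
    # limit should not be reached during the load.
    :max_buffered_docs => expected_docs + 1,
    # Assumed to be in bytes; this must fit comfortably in available RAM.
    :max_buffer_memory => memory_mb * MEGABYTE
  }
end

# Hypothetical helper: opens an IndexWriter with the given options.
# The gem is required lazily, so it is only needed when this is called.
def open_writer(path, options)
  require 'ferret'
  Ferret::Index::IndexWriter.new(options.merge(:path => path))
end

options = in_memory_options(expected_docs: 50_000, memory_mb: 512)

# Example call (needs the ferret gem and a writable path):
#   writer = open_writer('/tmp/example_index', options)
#   writer << {:title => 'Example', :content => 'Buffered until flushed'}
#   writer.close
```

Raising both limits matters because either one can trigger a flush: a generous document limit does not help if the memory limit is reached first.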

Indexing Parameters

The indexing process is regulated by the parameters listed, along with their defaults, in Table 3-1.

Table 3-1. Index parameters

Parameter	Default	Short description
`:max_buffer_memory`	16 MB	The maximum memory used by the `IndexWriter` before buffered documents are flushed to the index
`:chunk_size`	1 MB	The size of the memory chunks allocated to the memory pool during indexing
`:merge_factor`	10	The minimum number of similar sized segments needed to trigger a merge
`:max_buffered_docs`	10,000	The maximum number of documents that will be buffered by the `IndexWriter` before they are flushed to the index ...
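The visible rows of Table 3-1 can be written as an options hash and selectively overridden. This is a sketch only: `DEFAULT_INDEX_OPTIONS` and `tuned_options` are hypothetical names, the memory sizes are assumed to be expressed in bytes, and the override values are arbitrary illustrations rather than tuning advice. Only the four parameters shown in the table are included.

```ruby
# Defaults as listed in the table above, with sizes assumed to be in bytes.
DEFAULT_INDEX_OPTIONS = {
  :max_buffer_memory => 16 * 1024 * 1024,  # 16 MB
  :chunk_size        => 1024 * 1024,       # 1 MB
  :merge_factor      => 10,
  :max_buffered_docs => 10_000
}.freeze

# Hypothetical helper: merges overrides onto the defaults and rejects
# parameter names that are not in the table, to catch typos early.
def tuned_options(overrides = {})
  unknown = overrides.keys - DEFAULT_INDEX_OPTIONS.keys
  unless unknown.empty?
    raise ArgumentError, "unknown parameter(s): #{unknown.inspect}"
  end
  DEFAULT_INDEX_OPTIONS.merge(overrides)
end

# Illustrative override values only.
bulk_options = tuned_options(:merge_factor => 100, :max_buffered_docs => 100_000)
```

Keeping the defaults in one frozen hash makes it easy to see which settings a given run actually changed.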


Ferret by David Balmain
