Documents, Fields, and Boosts

Documents

The best way to think of an index is as a searchable array of documents. A Ferret document is a collection of fields representing a chunk of data that you want to make searchable. Whether that chunk of data is a database row, a Word document, or an MP3 file doesn’t matter. They are all just documents to Ferret. A Ferret document can be represented by the Ferret::Document class. This class extends Ruby’s Hash class, adding only a boost attribute. In fact, as you saw in Example 1-2, documents can also be Hashes, where the key is the name of the field and the value is the data stored in the field.
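The Hash-plus-boost structure described above can be sketched in plain Ruby. `SketchDocument` is a hypothetical stand-in, not the real `Ferret::Document`, and the 1.0 default for `boost` is an assumption. The sketch only shows that such a document is still a Hash, so it compares equal to a plain Hash holding the same fields while also carrying a boost.

```ruby
# Hypothetical stand-in for the structure described above: a Hash of
# field name => field data, plus a boost attribute.
# This is NOT the real Ferret::Document class.
class SketchDocument < Hash
  attr_accessor :boost

  # Assumption: boost defaults to 1.0, a neutral weighting.
  def initialize(boost = 1.0)
    super()
    @boost = boost
  end
end

# A plain Hash document: field names map to field data.
plain_doc = {:title => "Tom Sawyer", :author => "Mark Twain"}

# The same fields in the Hash subclass, with a boost of 2.0.
boosted_doc = SketchDocument.new(2.0)
boosted_doc[:title]  = "Tom Sawyer"
boosted_doc[:author] = "Mark Twain"

puts boosted_doc.is_a?(Hash)   # prints "true"
puts boosted_doc == plain_doc  # prints "true" (Hash#== compares contents only)
puts boosted_doc.boost         # prints "2.0"
```

Because the subclass adds nothing but `boost`, anything that accepts a Hash document can accept it unchanged.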

Ferret Field Names

Field names should always be represented by :symbols rather than strings. That is, you should add fields like this:

index << {:title => "Tom Sawyer", :author => "Mark Twain"}

Not like this:

index << {"title" => "Tom Sawyer", "author" => "Mark Twain"}
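The text does not say why Ferret prefers symbols, but one Ruby-level consequence of mixing the two styles is easy to demonstrate: `:title` and `"title"` are different Hash keys, so they would describe two different fields. The `symbolize_fields` helper below is hypothetical, a minimal sketch of normalizing string field names before a document is added to an index.

```ruby
# In a Ruby Hash, :title and "title" are two distinct keys.
mixed = {:title => "Tom Sawyer"}
mixed["title"] = "Huckleberry Finn"
puts mixed.size  # prints "2": two separate entries, not one

# Hypothetical helper: convert every field name to a symbol.
def symbolize_fields(doc)
  doc.transform_keys(&:to_sym)
end

fixed = symbolize_fields({"title" => "Tom Sawyer", "author" => "Mark Twain"})
p fixed.keys  # prints "[:title, :author]"
```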

The term “document” can be quite confusing. We often need to talk about the idea of a document in an index, which is implemented by the Document class. A document can represent a PDF or a text document, or it can represent something like a movie or a product. Make note of the formatting we use to distinguish documents from the Document class.

Earlier we mentioned that Documents have a boost attribute, but we didn’t say what boost was for. The boost attribute gives a document a higher weighting in the results of a search. By using the boost attribute, you can make more important documents appear ...
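To see how a boost can lift a document up the results, here is a toy ranking in plain Ruby. It assumes the final score is simply a base match score multiplied by the document's boost; that formula, the `Hit` struct, and the scores are all illustrative assumptions, not Ferret's actual scoring.

```ruby
# Toy ranking. ASSUMPTION: final score = base match score * document boost.
# Illustration only; this is not Ferret's real scoring formula.
Hit = Struct.new(:title, :base_score, :boost) do
  def score
    base_score * boost
  end
end

hits = [
  Hit.new("Tom Sawyer", 0.8, 1.0),
  Hit.new("Huckleberry Finn", 0.6, 2.0),  # boosted document
  Hit.new("The Prince and the Pauper", 0.7, 1.0)
]

# Highest score first.
ranked = hits.sort_by { |hit| -hit.score }
ranked.each { |hit| puts format("%-28s %.2f", hit.title, hit.score) }
```

With a boost of 2.0, "Huckleberry Finn" (1.20) moves ahead of "Tom Sawyer" (0.80) even though its base match score is lower.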

Ferret by David Balmain