By default, documents are sorted by relevance and then by
document ID if scores are equal. But what if we want to sort the result
set by the value in one of the fields (e.g., price)? One way to do this is
to retrieve the entire result set and make use of Ruby’s Array#sort
method. However, this would take too long for large result sets, not
to mention use up a lot of unnecessary memory. Searcher provides a :sort
parameter for easy sorting. The easiest way to specify a sort is to pass a
sort string. A sort string is a comma-separated list of field names with an
optional DESC modifier to reverse the sort for that field. The type of the
field is automatically detected and the field sorted accordingly. So Float
fields will be sorted by Float value, and Integer fields will be sorted by
Integer value. SCORE and DOC_ID can be used in place of field names to sort
by relevance and internal document ID, respectively. Here are some examples:
    index.search(query, :sort => "title, year DESC")
    index.search(query, :sort => "SCORE DESC, DOC_ID DESC")
    index.search(query, :sort => "SCORE, rating DESC")
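To make the sort-string format concrete, here is a minimal sketch in plain Ruby of how such a string breaks down into field names and sort directions. It does not use Ferret at all: parse_sort_string is a hypothetical helper written only for illustration, and Ferret's own parser may well treat edge cases differently.

```ruby
# Hypothetical helper (not part of Ferret): splits a sort string into
# [name, reversed] pairs so the structure of the format is visible.
def parse_sort_string(sort_string)
  sort_string.split(",").map do |part|
    # Each part is a field name, optionally followed by a modifier.
    name, modifier = part.strip.split(/\s+/, 2)
    [name, modifier.to_s.upcase == "DESC"]
  end
end

p parse_sort_string("title, year DESC")
# => [["title", false], ["year", true]]
p parse_sort_string("SCORE DESC, DOC_ID DESC")
# => [["SCORE", true], ["DOC_ID", true]]
```

Each pair corresponds to one sort key, and the order of the pairs is the order of precedence.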
Although this will do the job most of the time, you can be a little
more explicit in describing how a result set is sorted by using the Sort
API. You will also need to use the Sort API to take full advantage of sort
caching. There are two classes in the Sort API: Sort and SortField.
A SortField describes how a particular field should be sorted. To create a
SortField, you need to supply a field name and a sort type. You can also
optionally reverse the sort. Table 4-2 shows the available sort types. Note
that sort types are identified by Symbols.
Table 4-2. Sort types
Sort type | Description |
---|---|
:auto | The default type used when we supply a string sort descriptor. Ferret will look at the first term in the field’s index to detect its type. It will sort the field either by integer, float, or string depending on that first term’s type. Be careful when using :auto to sort fields that have numbers in them. If, for example, you are sorting a field with television show titles, “24” would probably be the first term in the index, making Ferret think that the field is an integer field. |
:integer | Converts every term in the field to an integer and sorts by those integers. |
:float | Converts every term in the field to a float and sorts by those floats. |
:string | Performs a locale-sensitive sort on the field. You need to make sure you have your locale set correctly for this to work. If the locale is set to ASCII or ISO-8859-1 and the field is encoded in UTF-8, the field will be incorrectly sorted. |
:byte | Sorts terms by the order they appear in the index. This will work perfectly for ASCII data and is a lot faster than a string sort. |
:doc_id | Sorts documents by their internal document ID. For this type of SortField, a field name is not necessary. |
:score | Sort documents by their relevance. This is how documents are sorted when no sort is specified. For this type of SortField, a field name is not necessary. |
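The :auto pitfall described in the table can be reproduced in plain Ruby without Ferret. In this sketch, detect_type is a hypothetical stand-in for the kind of first-term check that :auto performs, and we assume the field's terms are kept in byte order, so the smallest term is the one inspected. Ferret's real detection logic may differ in its details.

```ruby
# Plain-Ruby illustration (no Ferret required). detect_type is a
# hypothetical stand-in for the first-term check that :auto performs.
def detect_type(first_term)
  case first_term
  when /\A-?\d+\z/      then :integer
  when /\A-?\d+\.\d+\z/ then :float
  else                       :string
  end
end

titles = ["lost", "24", "heroes", "the wire"]

# Assumption: terms are kept in byte order, so "24" is the first term.
first_term = titles.min
type = detect_type(first_term)
p type  # => :integer

# Sorting by integer value turns every non-numeric title into 0, so the
# alphabetical order of those titles is lost and "24" ends up last.
sorted = titles.sort_by { |title| title.to_i }
p sorted.last   # => "24"

# A string sort keeps the titles in a meaningful order.
p titles.sort   # => ["24", "heroes", "lost", "the wire"]
```

If a field mixes numeric and non-numeric terms like this, naming the sort type explicitly (for example, :string or :byte) avoids the misdetection.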
The SortField class also has four constant SortField objects:
    SortField::SCORE
    SortField::DOC_ID
    SortField::SCORE_REV
    SortField::DOC_ID_REV
With these constants available, you generally won’t ever need to
create a SortField with the type :score or :doc_id. Here are some examples
of how to create SortFields:
    title_sort = SortField.new(:title, :type => :string)
    path_sort = SortField.new(:path, :type => :byte)
    rating_sort = SortField.new(:rating, :type => :float, :reverse => true)
The Sort object is used to hold SortFields in order of
precedence to sort a result set. It is relatively straightforward to
use. It also allows you to completely reverse all SortFields in one go
(so already reversed fields will be reversed back to normal). Here are a
couple of examples:
    title_sort = SortField.new(:title, :type => :string)
    path_sort = SortField.new(:path, :type => :byte)
    rating_sort = SortField.new(:rating, :type => :float, :reverse => true)
    sort = Sort.new([title_sort, rating_sort, SortField::SCORE])
    top_docs = index.search(query, :sort => sort)
    # reverse all sort-fields.
    sort = Sort.new([path_sort, SortField::DOC_ID_REV], true)
    top_docs = index.search(query, :sort => sort)
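The way precedence and reversal interact can be modeled in a few lines of plain Ruby. This is only a sketch of the behavior described above, not Ferret's implementation: compare_docs and the sample documents are hypothetical, and each sort key is represented as a simple [field, reversed] pair.

```ruby
# Plain-Ruby model of field precedence and reversal (not Ferret's code).
# Each key is [field, reversed]; reverse_all flips every key's direction,
# so a key that was already reversed goes back to normal order.
def compare_docs(a, b, keys, reverse_all = false)
  keys.each do |field, reversed|
    cmp = a[field] <=> b[field]
    next if cmp.zero?          # tie on this key: fall through to the next one
    flip = reversed ^ reverse_all
    return flip ? -cmp : cmp
  end
  0
end

docs = [
  { title: "b", rating: 4.5 },
  { title: "a", rating: 3.0 },
  { title: "a", rating: 5.0 }
]

# title ascending, then rating descending
keys = [[:title, false], [:rating, true]]

normal  = docs.sort { |a, b| compare_docs(a, b, keys) }
flipped = docs.sort { |a, b| compare_docs(a, b, keys, true) }

p normal.map  { |d| [d[:title], d[:rating]] }
# => [["a", 5.0], ["a", 3.0], ["b", 4.5]]
p flipped.map { |d| [d[:title], d[:rating]] }
# => [["b", 4.5], ["a", 3.0], ["a", 5.0]]
```

Note that in the flipped result the ratings within a title are ascending: the rating key was already reversed, so reversing everything puts it back in normal order.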
The Sort class also has two constants: Sort::RELEVANCE and
Sort::INDEX_ORDER. Sort::RELEVANCE orders documents by score, as is done
by default in Ferret. Sort::INDEX_ORDER sorts a result set in the order in
which the documents were added to the index.
Possibly one of the most common sorts to perform is a sort by date.
We discussed how to store date fields for sorting in the “Date Fields”
section in Chapter 2. If you have stored the date field correctly (in
YYYYMMDD format), it is very simple to sort by this field. The best sort
type to use is :byte because it will be the fastest to create the index
and otherwise performs just as well as an integer sort. Using :auto, Ferret
will sort the field by integer, which will be fine as well, so it is no
problem using the sort string descriptor (e.g., “updated_on, created_on
DESC”). Here is how you would explicitly create a Sort to sort a date
field:
    updated_on = SortField.new(:updated_on, :type => :byte)
    created_on = SortField.new(:created_on, :type => :byte, :reverse => true)
    sort = Sort.new([updated_on, created_on, SortField::DOC_ID])
    index.search(query, :sort => sort)
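The reason a :byte sort is safe for dates is that zero-padded YYYYMMDD terms compare the same way byte by byte as they do numerically. The following plain-Ruby check illustrates this with a few made-up dates. It uses only the standard library, and Ruby's String comparison stands in for the byte-order comparison.

```ruby
require "date"

# Sample dates, formatted the way the date field is stored (YYYYMMDD).
dates = [Date.new(2008, 1, 15), Date.new(2007, 12, 31), Date.new(2008, 1, 2)]
terms = dates.map { |d| d.strftime("%Y%m%d") }

byte_order    = terms.sort                    # String comparison, byte by byte
integer_order = terms.sort_by { |t| t.to_i }  # numeric comparison

p byte_order                   # => ["20071231", "20080102", "20080115"]
p byte_order == integer_order  # => true

# Without zero padding, byte order no longer matches date order.
unpadded = ["2008-1-2", "2008-1-15"]
p unpadded.sort  # => ["2008-1-15", "2008-1-2"], January 15 lands before January 2
```

This is why storing the date in a fixed-width, zero-padded format matters: the cheaper byte comparison then gives the same ordering as the integer sort.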