You can use pattern matches to look through any number of rows, but as the amount of text goes up, the match operation can become quite slow. It’s also common to look for the same text in several string columns, which with pattern matching tends to result in unwieldy queries:
SELECT * fromtbl_name
WHEREcol1
LIKE 'pat
' ORcol2
LIKE 'pat
' ORcol3
LIKE 'pat
' ...
A useful alternative (available as of MySQL 3.23.23) is to use
FULLTEXT
searching, which is designed for looking
through large amounts of text, and can search multiple columns
simultaneously. To use this capability, add a
FULLTEXT
index to your table, then use the
MATCH
operator to look for strings in the indexed
column or columns. FULLTEXT
indexing can be used
with MyISAM tables, for columns of type CHAR
,
VARCHAR
, or TEXT
.
FULLTEXT
searching is best illustrated with a
reasonably good-sized body of text. If you don’t
have a sample dataset, several repositories of freely available
electronic text are available on the Internet. For the examples here,
the one I’ve chosen is the complete text of the King
James Version of the Bible (KJV), which is relatively large and has
the advantage of being nicely structured by book, chapter, and verse.
Because of its size, this dataset is not included with the
recipes
distribution, but is available separately
as the mcb-kjv
distribution at the MySQL Cookbook
web site.[25] (See Appendix A.) The distribution
includes a file kjv.txt
that contains the verse
records. Some sample records look like this:
O Genesis 1 1 1 In the beginning God created the heaven and the earth. O Exodus 2 20 13 Thou shalt not kill. N Luke 42 17 32 Remember Lot's wife.
Each record contains the following fields:
Book section. This is either
O
orN
, signifying the Old or New Testament.Book name and corresponding book number, from 1 to 66.
Chapter and verse numbers.
Text of the verse.
To import the records into MySQL, create a table named
kjv
that looks like this:
CREATE TABLE kjv ( bsect ENUM('O','N') NOT NULL, # book section (testament) bname VARCHAR(20) NOT NULL, # book name bnum TINYINT UNSIGNED NOT NULL, # book number cnum TINYINT UNSIGNED NOT NULL, # chapter number vnum TINYINT UNSIGNED NOT NULL, # verse number vtext TEXT NOT NULL # text of verse ) TYPE = MyISAM;
Then load the kjv.txt
file into the table using
this statement:
mysql> LOAD DATA LOCAL INFILE 'kjv.txt' INTO TABLE kjv;
You’ll notice that the kjv
table
contains columns for both book names (Genesis, Exodus, ...) and book
numbers (1, 2, ...). The names and numbers have a fixed
correspondence, and one can be derived from the other—a
redundancy that means the table is not in normal form.
It’s possible to eliminate the redundancy by storing
just the book numbers (which take less space than the names), and
then producing the names when necessary in query results by joining
the numbers to a small mapping table that associates each book number
with the corresponding name. But I want to avoid using joins at this
point. Thus, the table includes book names so that search results can
be interpreted more easily, and numbers so that the results can be
sorted easily into book order.
After populating the table, prepare it for use in
FULLTEXT
searching by adding a
FULLTEXT
index. This can be done using an ALTER
TABLE
statement:[26]
mysql> ALTER TABLE kjv ADD FULLTEXT (vtext);
To perform a search using the index, use MATCH( )
to name the indexed column and
AGAINST( )
to specify what text to look for. For
example, to answer the question “How often does the
name Mizraim occur?” (you’ve often
wondered about that, right?), search the vtext
column using this query:
mysql> SELECT COUNT(*) from kjv WHERE MATCH(vtext) AGAINST('Mizraim');
+----------+
| COUNT(*) |
+----------+
| 4 |
+----------+
To find out what those verses are, select the columns you want to see
(the example here uses \G
so that the results
better fit the page):
mysql>SELECT bname, cnum, vnum, vtext
->FROM kjv WHERE MATCH(vtext) AGAINST('Mizraim')\G
*************************** 1. row *************************** bname: Genesis cnum: 10 vnum: 6 vtext: And the sons of Ham; Cush, and Mizraim, and Phut, and Canaan. *************************** 2. row *************************** bname: Genesis cnum: 10 vnum: 13 vtext: And Mizraim begat Ludim, and Anamim, and Lehabim, and Naphtuhim, *************************** 3. row *************************** bname: 1 Chronicles cnum: 1 vnum: 8 vtext: The sons of Ham; Cush, and Mizraim, Put, and Canaan. *************************** 4. row *************************** bname: 1 Chronicles cnum: 1 vnum: 11 vtext: And Mizraim begat Ludim, and Anamim, and Lehabim, and Naphtuhim,
The results come out in book, chapter, and verse number order in this
particular instance, but that’s actually just
coincidence. By default, FULLTEXT
searches compute
a relevance ranking and use it for sorting. To make sure a search
result is sorted the way you want, add an explicit
ORDER
BY
clause:
SELECT bname, cnum, vnum, vtext
FROM kjv WHERE MATCH(vtext) AGAINST('search string
')
ORDER BY bnum, cnum, vnum;
You can include additional criteria to narrow the search further. The following queries perform progressively more specific searches to find out how often the name Abraham occurs in the entire KJV, the New Testament, the book of Hebrews, and Chapter 11 of Hebrews:
mysql>SELECT COUNT(*) from kjv WHERE MATCH(vtext) AGAINST('Abraham');
+----------+ | COUNT(*) | +----------+ | 216 | +----------+ mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham')
->AND bsect = 'N';
+----------+ | COUNT(*) | +----------+ | 66 | +----------+ mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham')
->AND bname = 'Hebrews';
+----------+ | COUNT(*) | +----------+ | 10 | +----------+ mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham')
->AND bname = 'Hebrews' AND cnum = 11;
+----------+ | COUNT(*) | +----------+ | 2 | +----------+
If you expect to use search criteria that include other
non-FULLTEXT
columns frequently, you can increase
the performance of such queries by adding regular indexes to those
columns. For example, to index the book, chapter, and verse number
columns, do this:
mysql> ALTER TABLE kjv ADD INDEX (bnum), ADD INDEX (cnum), ADD INDEX (vnum);
Search strings in FULLTEXT
queries can include
more than just a single word, and you might suppose that adding
additional words would make a search more specific. But in fact that
widens it, because a FULLTEXT
search returns
records that contain any of the words. In effect, the query performs
an OR
search for any of the words. This is
illustrated by the following queries, which identify successively
larger numbers of verses as additional search words are added:
mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham');
+----------+ | COUNT(*) | +----------+ | 216 | +----------+ mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham Sarah');
+----------+ | COUNT(*) | +----------+ | 230 | +----------+ mysql>SELECT COUNT(*) from kjv
->WHERE MATCH(vtext) AGAINST('Abraham Sarah Ishmael Isaac');
+----------+ | COUNT(*) | +----------+ | 317 | +----------+
To perform a search where each word in the search string must be present, see Recipe 4.14.
If you want to use a FULLTEXT
search that looks
though multiple columns simultaneously, name them all when you
construct the index:
ALTER TABLEtbl_name
ADD FULLTEXT (col1
,col2
,col3
);
To issue a search query that uses this index, name those same columns
in the MATCH( )
list:
SELECT ... FROMtbl_name
WHERE MATCH(col1
,col2
,col3
) AGAINST('search string
');
FULLTEXT
indexes provide a quick-and-easy way to
set up a simple search engine. One way to use this capability is to
provide a web-based interface to the indexed text. The MySQL Cookbook
site includes a basic web-based KJV search page that demonstrates
this.
[25] The mcb-kjv
distribution
is derived from the KJV text available at the Unbound Bible site at
Biola University (http://unbound.biola.edu), but has been
modified somewhat to make it easier to use for the recipes in this
book. The mcb-kjv
distribution includes notes that
describe how it differs from the Biola distribution.
[26] It’s possible to include the index definition
in the initial CREATE
TABLE
statement, but it’s generally faster to create a
non-indexed table and then add the index with
ALTER
TABLE
after populating
the table than to load a large dataset into an indexed table.
Get MySQL Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.