This is a collection of some common SQL queries and how to get the same result in CouchDB. The key to remember here is that CouchDB does not work like an SQL database at all and that best practices from the SQL world do not translate well or at all to CouchDB. This chapter’s “cookbook” assumes that you are familiar with the CouchDB basics such as creating and updating databases and documents.
How you would do this in SQL:
CREATE TABLE
or:
ALTER TABLE
Using views is a two-step process. First you
define a view; then you query
it. This is analogous to defining a table structure (with indexes) using
CREATE TABLE
or ALTER TABLE
and
querying it using an SQL query.
Defining a view is done by creating a special document in a
CouchDB database. The only real specialness is the
_id
of the document, which starts with
_design/
—for example,
_design/application
. Other than that, it is just a
regular CouchDB document. To make sure CouchDB understands that you are
defining a view, you need to prepare the contents of that design
document in a special format. Here is an example:
{
"_id"
:
"_design/application"
,
"_rev"
:
"1-C1687D17"
,
"views"
:
{
"viewname"
:
{
"map"
:
"function(doc) { ... }"
,
"reduce"
:
"function(keys, values) { ... }"
}
}
}
We are defining a view viewname
. The definition
of the view consists of two functions: the map
function and the reduce function. Specifying a
reduce function is optional. We’ll look at the nature of the functions
later. Note that viewname
can be whatever you like:
users
, by-name
, or
by-date
are just some examples.
A single design document can also include multiple view definitions, each identified by a unique name:
{
"_id"
:
"_design/application"
,
"_rev"
:
"1-C1687D17"
,
"views"
:
{
"viewname"
:
{
"map"
:
"function(doc) { ... }"
,
"reduce"
:
"function(keys, values) { ... }"
},
"anotherview"
:
{
"map"
:
"function(doc) { ... }"
,
"reduce"
:
"function(keys, values) { ... }"
}
}
}
The name of the design document and the name of the view are
significant for querying the view. To query the view
viewname
, you perform an HTTP GET
request to the following URI:
/database/_design/application/_view/viewname
database
is the name of the database you
created your design document in. Next up is the design document name,
and then the view name prefixed with _view/
. To query
anotherview
, replace viewname
in
that URI with anotherview
. If you want to query a
view in a different design document, adjust the design document
name.
MapReduce is a concept that solves problems by applying a two-step
process, aptly named the map phase and the
reduce phase. The map phase looks at all documents
in CouchDB separately one after the other and creates a map
result. The map result is an ordered list of key/value pairs.
Both key
and value
can be
specified by the user writing the map function. A map function may call
the built-in emit(key, value)
function 0 to
N times per document, creating a row in the map
result per invocation.
CouchDB is smart enough to run a map function only once for every document, even on subsequent queries on a view. Only changes to documents or new documents need to be processed anew.
Map functions run in isolation for every document. They can’t modify the document, and they can’t talk to the outside world—they can’t have side effects. This is required so that CouchDB can guarantee correct results without having to recalculate a complete result when only one document gets changed.
The map result looks like this:
{
"total_rows"
:
3
,
"offset"
:
0
,
"rows"
:
[
{
"id"
:
"fc2636bf50556346f1ce46b4bc01fe30"
,
"key"
:
"Lena"
,
"value"
:
5
},
{
"id"
:
"1fb2449f9b9d4e466dbfa47ebe675063"
,
"key"
:
"Lisa"
,
"value"
:
4
},
{
"id"
:
"8ede09f6f6aeb35d948485624b28f149"
,
"key"
:
"Sarah"
,
"value"
:
6
}
}
It is a list of rows sorted by the value of
key
. The id
is added
automatically and refers back to the document that created this row.
The value
is the data you’re looking for. For
example purposes, it’s the girl’s age.
The map function that produces this result is:
function
(
doc
)
{
if
(
doc
.
name
&&
doc
.
age
)
{
emit
(
doc
.
name
,
doc
.
age
);
}
}
It includes the if
statement as a sanity
check to ensure that we’re operating on the right fields and calls the
emit
function with the name and age as the key and
value.
Reduce functions are explained in Aggregate Functions.
How you would do this in SQL:
SELECT field FROM table WHERE value="searchterm"
Use case: get a result (which can be a record
or set of records) associated with a key
("searchterm"
).
To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.
To look up a value by "searchterm"
, we need to
put all values into the key of a view. All we need is a simple map
function:
function
(
doc
)
{
if
(
doc
.
value
)
{
emit
(
doc
.
value
,
null
);
}
}
This creates a list of documents that have a
value
field sorted by the data in the
value
field. To find all the records that match
"searchterm"
, we query the view and specify the search
term as a query parameter:
/database/_design/application/_view/viewname?key="searchterm"
Consider the documents from the previous section, and say we’re
indexing on the age
field of the documents to find all
the five-year-olds:
function
(
doc
)
{
if
(
doc
.
age
&&
doc
.
name
)
{
emit
(
doc
.
age
,
doc
.
name
);
}
}
Query:
/ladies/_design/ladies/_view/age?key=5
Result:
{
"total_rows"
:
3
,
"offset"
:
1
,
"rows"
:
[
{
"id"
:
"fc2636bf50556346f1ce46b4bc01fe30"
,
"key"
:
5
,
"value"
:
"Lena"
}
]}
Easy.
Note that you have to emit a value. The view result includes the
associated document ID in every row. We can use it to look up more data
from the document itself. We can also use the
?include_docs=true
parameter to have CouchDB fetch the
documents individually for
us.
How you would do this in SQL:
SELECT field FROM table WHERE value LIKE "searchterm%"
Use case: find all documents that have a field value that starts
with searchterm
. For example, say you stored a MIME
type (like text/html
or image/jpg
)
for each document and now you want to find all documents that are images
according to the MIME type.
The solution is very similar to the previous example: all we need is a map function that is a little more clever than the first one. But first, an example document:
{
"_id"
:
"Hugh Laurie"
,
"_rev"
:
"1-9fded7deef52ac373119d05435581edf"
,
"mime-type"
:
"image/jpg"
,
"description"
:
"some dude"
}
The clue lies in extracting the prefix that we want to search for from our document and putting it into our view index. We use a regular expression to match our prefix:
function
(
doc
)
{
if
(
doc
[
"mime-type"
])
{
// from the start (^) match everything that is not a slash ([^\/]+) until
// we find a slash (\/). Slashes needs to be escaped with a backslash (\/)
var
prefix
=
doc
[
"mime-type"
].
match
(
/^[^\/]+\//
);
if
(
prefix
)
{
emit
(
prefix
,
null
);
}
}
}
We can now query this view with our desired MIME type prefix and not only find all images, but also text, video, and all other formats:
/files/_design/finder/_view/by-mime-type?key="image/"
How you would do this in SQL:
SELECT COUNT(field) FROM table
Use case: calculate a derived value from your data.
We haven’t explained reduce functions yet. Reduce functions are similar to aggregate functions in SQL. They compute a value over multiple documents.
To explain the mechanics of reduce functions, we’ll create one that doesn’t make a whole lot of sense. But this example is easy to understand. We’ll explore more useful reductions later.
Reduce functions operate on the output of the map function (also called the map result or intermediate result). The reduce function’s job, unsurprisingly, is to reduce the list that the map function produces.
Here’s what our summing reduce function looks like:
function
(
keys
,
values
)
{
var
sum
=
0
;
for
(
var
idx
in
values
)
{
sum
=
sum
+
values
[
idx
];
}
return
sum
;
}
Here’s an alternate, more idiomatic JavaScript version:
function
(
keys
,
values
)
{
var
sum
=
0
;
values
.
forEach
(
function
(
element
)
{
sum
=
sum
+
element
;
});
return
sum
;
}
This reduce function takes two arguments: a list of
keys
and a list of values
. For our
summing purposes we can ignore the keys
-list and
consider only the value
list. We’re looping over the
list and add each item to a running total that we’re returning at the end
of the function.
You’ll see one difference between the map and the reduce function.
The map function uses emit()
to create its result,
whereas the reduce function returns a value.
For example, from a list of integer values that specify the age, calculate the sum of all years of life for the news headline, “786 life years present at event.” A little contrived, but very simple and thus good for demonstration purposes. Consider the documents and the map view we used earlier in this chapter.
The reduce function to calculate the total age of all girls is:
function
(
keys
,
values
)
{
return
sum
(
values
);
}
Note that, instead of the two earlier versions, we use CouchDB’s
predefined sum()
function. It does the same thing as
the other two, but it is such a common piece of code that CouchDB has it
included.
The result for our reduce view now looks like this:
{
"rows"
:
[
{
"key"
:
null
,
"value"
:
15
}
]}
The total sum of all age
fields in all our
documents is 15
. Just what we wanted. The
key
member of the result object is
null
, as we can’t know anymore which documents took
part in the creation of the reduced result. We’ll cover more advanced
reduce cases later on.
As a rule of thumb, the reduce function should reduce a single
scalar value. That is, an integer; a string; or a small, fixed-size list
or object that includes an aggregated value (or values) from the
values
argument. It should never just return
values
or similar. CouchDB will give you a warning if
you try to use reduce “the wrong way”:
{
"error"
:
"reduce_overflow_error"
,
"message"
:
"Reduce output must shrink more rapidly:
Current output: ..."
}
How you would do this in SQL:
SELECT DISTINCT field FROM table
Getting unique values is not as easy as adding a keyword. But a reduce view and a special query parameter give us the same result. Let’s say you want a list of tags that your users have tagged themselves with and no duplicates.
First, let’s look at the source documents. We punt on
_id
and _rev
attributes here:
{
"name"
:
"Chris"
,
"tags"
:
[
"mustache"
,
"music"
,
"couchdb"
]
}
{
"name"
:
"Noah"
,
"tags"
:
[
"hypertext"
,
"philosophy"
,
"couchdb"
]
}
{
"name"
:
"Jan"
,
"tags"
:
[
"drums"
,
"bike"
,
"couchdb"
]
}
Next, we need a list of all tags. A map function will do the trick:
function
(
dude
)
{
if
(
dude
.
name
&&
dude
.
tags
)
{
dude
.
tags
.
forEach
(
function
(
tag
)
{
emit
(
tag
,
null
);
});
}
}
The result will look like this:
{
"total_rows"
:
9
,
"offset"
:
0
,
"rows"
:
[
{
"id"
:
"3525ab874bc4965fa3cda7c549e92d30"
,
"key"
:
"bike"
,
"value"
:
null
},
{
"id"
:
"3525ab874bc4965fa3cda7c549e92d30"
,
"key"
:
"couchdb"
,
"value"
:
null
},
{
"id"
:
"53f82b1f0ff49a08ac79a9dff41d7860"
,
"key"
:
"couchdb"
,
"value"
:
null
},
{
"id"
:
"da5ea89448a4506925823f4d985aabbd"
,
"key"
:
"couchdb"
,
"value"
:
null
},
{
"id"
:
"3525ab874bc4965fa3cda7c549e92d30"
,
"key"
:
"drums"
,
"value"
:
null
},
{
"id"
:
"53f82b1f0ff49a08ac79a9dff41d7860"
,
"key"
:
"hypertext"
,
"value"
:
null
},
{
"id"
:
"da5ea89448a4506925823f4d985aabbd"
,
"key"
:
"music"
,
"value"
:
null
},
{
"id"
:
"da5ea89448a4506925823f4d985aabbd"
,
"key"
:
"mustache"
,
"value"
:
null
},
{
"id"
:
"53f82b1f0ff49a08ac79a9dff41d7860"
,
"key"
:
"philosophy"
,
"value"
:
null
}
]}
As promised, these are all the tags, including duplicates. Since each document gets run through the map function in isolation, it cannot know if the same key has been emitted already. At this stage, we need to live with that. To achieve uniqueness, we need a reduce:
function
(
keys
,
values
)
{
return
true
;
}
This reduce doesn’t do anything, but it allows us to specify a special query parameter when querying the view:
/dudes/_design/dude-data/_view/tags?group=true
CouchDB replies:
{
"rows"
:
[
{
"key"
:
"bike"
,
"value"
:
true
},
{
"key"
:
"couchdb"
,
"value"
:
true
},
{
"key"
:
"drums"
,
"value"
:
true
},
{
"key"
:
"hypertext"
,
"value"
:
true
},
{
"key"
:
"music"
,
"value"
:
true
},
{
"key"
:
"mustache"
,
"value"
:
true
},
{
"key"
:
"philosophy"
,
"value"
:
true
}
]}
In this case, we can ignore the value part because it is always true, but the result includes a list of all our tags and no duplicates!
With a small change we can put the reduce to good use, too. Let’s
see how many of the non-unique tags are there for each tag. To calculate
the tag frequency, we just use the summing up we already learned about. In
the map function, we emit a 1
instead of
null
:
function
(
dude
)
{
if
(
dude
.
name
&&
dude
.
tags
)
{
dude
.
tags
.
forEach
(
function
(
tag
)
{
emit
(
tag
,
1
);
});
}
}
In the reduce function, we return the sum of all values:
function
(
keys
,
values
)
{
return
sum
(
values
);
}
Now, if we query the view with the ?group=true
parameter, we get back the count for each tag:
{
"rows"
:
[
{
"key"
:
"bike"
,
"value"
:
1
},
{
"key"
:
"couchdb"
,
"value"
:
3
},
{
"key"
:
"drums"
,
"value"
:
1
},
{
"key"
:
"hypertext"
,
"value"
:
1
},
{
"key"
:
"music"
,
"value"
:
1
},
{
"key"
:
"mustache"
,
"value"
:
1
},
{
"key"
:
"philosophy"
,
"value"
:
1
}
]}
How you would do this in SQL:
UNIQUE KEY(column)
Use case: your applications require that a certain value exists only once in a database.
This is an easy one: within a CouchDB database, each document must
have a unique _id
field. If you require unique values
in a database, just assign them to a document’s _id
field and CouchDB will enforce uniqueness for you.
There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.
Get CouchDB: The Definitive Guide now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.