Chapter 4. Developing with Couchbase

When building an application with Couchbase, your first consideration should be how you communicate and integrate with the Couchbase cluster. From a client perspective, your Couchbase cluster should operate like a “black box” environment. Although you need to be aware of the environment in which you are operating, the configuration, topology, and scope of your cluster are irrelevant to the way you will store and retrieve information.

You configure your client to communicate with the cluster and store data. The client library (or the Moxi proxy service) is responsible for the communication to individual nodes within the cluster to handle the distribution of data. Other aspects, such as rebalancing or failover, should not affect or interrupt the core operations of exchanging information.

In this chapter we’re going to start by taking a quick look at how the basic storage operations work, then look more closely at the key operations supported within Couchbase Server on the stored data, and then discuss some of the main considerations when developing an application that uses the document model with Couchbase Server to store data.

Hello Couchbase

Couchbase Server stores each piece of information as a document (the value) identified by a document ID (the key). This makes the development and deployment of your application very simple. You store a document by sending the document data and the document ID you want to store it under. To get the information back, you provide the document ID and get back exactly the data you stored.

Provided you know the ID of the document you want to retrieve, you can always get the information back. The data is stored simply as a sequence of bytes. This means that you can store raw information (such as a string or integer), more complex structures (such as JavaScript Object Notation [JSON]), or serialized objects. Serialization converts native objects for your given language into a suitable bytestring that can then be materialized back into an object when it has been retrieved from the server.

The basic storage and retrieval process is therefore very simple. For the examples below I’ve used Ruby, although all the different client languages work in the same fashion since they all use the same core protocol.

Note

You can get downloads of all the Couchbase client libraries from http://www.couchbase.com/develop. For Ruby specifically, follow the instructions on http://couchbase.com/develop/ruby/current.

With the Couchbase client library for Ruby installed, you can create and build a simple program to store and then retrieve information in your Couchbase Server. A sample program, hello-world.rb, is shown in Example 4-1.

Example 4-1. Hello World!

require 'rubygems'
require 'couchbase'

client = Couchbase.new "http://127.0.0.1:8091/pools/default"
client.quiet = false
begin
  spoon = client.get "spoon"
  puts spoon
rescue Couchbase::Error::NotFound => e
  puts "There is no spoon."
  client.set "spoon", "Hello World!", :ttl => 10
end

Dissecting the script reveals the set and get process common in Couchbase:

  • The first two lines load the necessary libraries.

  • The next line opens up a connection to your Couchbase Cluster. The definition is through a URL which should point to at least one node within your cluster. In this example, the localhost address is used. The bucket name, default, is explicitly requested. You can change this to another bucket if you have configured it.

  • The remainder of the script performs a retrieve and store operation. If the initial retrieve operation (for the document ID "spoon") fails, then we set the data into the database. If the document ID does exist, the script prints out the stored value.

You can test this script out by running it from the command line. The first time you run it, it should output this error string:

shell> ruby hello-world.rb
There is no spoon.

The specified document does not exist in the database, but is added after the error string has been printed. The second time you run it, you should get the stored document value:

shell> ruby hello-world.rb
Hello World!

As an additional demonstration, the welcome string stored has been given an expiry value of 10 seconds. This means that if you wait longer than 10 seconds after you have stored the value, the value will be deleted from the database. If you wait more than 10 seconds from the first time you ran the script and execute the script again, it should output this error string:

shell> ruby hello-world.rb
There is no spoon.

Although this is a very basic example, it demonstrates the simplicity of retrieving and storing information into Couchbase Server using the basic get/set operations.

Deployment Options

To get the best out of Couchbase Server and the client environment that you are using, you should use one of the Couchbase clients. These “smart” clients combine the core interface protocol used for storing information with the administration protocol. The latter enables the clients to communicate directly with the Couchbase cluster to understand the vBucket map so that information can be sent directly to individual nodes within the cluster. The same system also allows for changes to the vBucket map to be acted upon during failover and rebalance scenarios.

There are six client libraries supported directly by Couchbase:

  • Java

  • .NET

  • PHP

  • Ruby

  • C (libcouchbase)

  • Python

Each of these clients is a “smart” client, providing you with the best combination of key functionality and intelligent support of the cluster configuration and operation. You can get more information at http://www.couchbase.com/develop.

If you have an existing library or application that uses the memcached protocol and you want it to take advantage of the Couchbase Server cluster architecture, you can use the Moxi service.

The Moxi proxy service interfaces between the memcached protocol and a Couchbase Server cluster, and Couchbase Server is 100% memcached-compatible at a protocol level. To use it, install Moxi on each client machine, configure the Moxi service to connect to your Couchbase Server cluster, and then have your application connect to Moxi using localhost as the hostname. For more information, see the Couchbase Server manual (http://www.couchbase.com/docs).

Note

Although Couchbase Server is memcached protocol-compliant, there are additional operations within the Couchbase protocol that are not supported by memcached. You will not be able to take advantage of these operations through a memcached client.

Basic Operations

Couchbase Server operates as a document store with a very simple store/retrieve model based on the ID that you give for each document. There are no tables or structures to define when you store the information, and there are no complicated queries to write to get information in and out.

All operations within Couchbase Server follow some basic rules:

All operations are atomic

This means that there are no locking mechanisms within the server, and there is no possibility of simultaneous commands from multiple clients corrupting the stored data. However, this also means that if multiple clients perform a set operation on the same key, the value written by the last one is the value that remains when the operations have completed.

To manage concurrency and race conditions, you can use the CAS operations. These require an additional checksum value so that the values cannot be updated without supplying a suitable valid checksum.

All data operations require a key

All operations require the key name of the data being updated or retrieved. You cannot perform a global operation or an operation on multiple keys (with the exception of the multiple-get).

No implicit locking

There is no implicit locking within the system when storing or updating data. The operation will either complete successfully, or fail for a reason unrelated to the individual key/value pair (for example, a temporary out of memory error).

The different client languages and implementations work with the core protocol to communicate with Couchbase Server:

All clients implement the core protocol

Although there are some differences in the exact structure and function names used by different languages and environments, they all implement the same core protocol operations. For example, the set() protocol call is available in all implementations, although some clients may use the term “store.”

Function call structure differences

Because of the differences between languages and environments, the exact function structure may differ from the core protocol. For example, within Java, where methods cannot declare optional arguments with default values, there are multiple overloaded variants of the same function. In other languages, such as Perl, Python, and Ruby, where hashes are core variable types, these are often used for storing and returning information.

Different languages implement additional functionality

Related to the two previous examples, some of the client implementations provide additional function calls and structures that are not supported by the core protocol. For example, within Java, all operations are available as both synchronous and asynchronous operations, enabling you to continue processing information while get or set operations are executing.

Not all implementations support flags

The flags, stored by the server along with the value and specified key, are not supported by all the different client libraries.

The core protocol and operations supported with Couchbase are shown in Table 4-1.

Table 4-1. Core protocol operations

Operation                          Description
add(key, value [,expiry])          Adds a new value if the key does not exist, or returns an error.
set(key, value [,expiry])          Sets a value, whether the key already exists or not.
get(key)                           Gets a value using the supplied key.
getAndTouch(key, expiry)           Gets a value, and updates the expiry time.
getBulk(key [,key, ..., keyn])     Gets multiple values simultaneously. More efficient than multiple single get operations.
gets(key)                          Gets the value and CAS for a given key.
replace(key, value [,expiry])      Replaces an existing value if the specified key exists.
append(key, value)                 Appends data to an existing key/value pair.
prepend(key, value)                Prepends data to an existing key/value pair.
increment(key, value [, offset])   Increments a stored integer by a specified value (default 1).
decrement(key, value [, offset])   Decrements a stored integer by a specified value (default 1).
touch(key, expiry)                 Updates the expiry time for a given value.
cas(key, value, checksum)          Updates a document only when the supplied checksum matches the one stored on the server.
delete(key)                        Deletes the specified document.
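
The difference between several of the operations in Table 4-1 comes down to how they treat keys that already exist or are missing. The following is a minimal sketch of those rules only. It does not use the Couchbase client: ToyStore is a hypothetical class backed by a Ruby Hash, and its error class names are invented for the illustration.

```ruby
# ToyStore is a hypothetical in-memory stand-in, NOT the Couchbase client.
# It only mimics the key-exists / key-missing rules described in Table 4-1.
class ToyStore
  class KeyExists < StandardError; end
  class NotFound < StandardError; end

  def initialize
    @data = {}
  end

  # add: store only if the key is absent
  def add(key, value)
    raise KeyExists, key if @data.key?(key)
    @data[key] = value
  end

  # set: store unconditionally
  def set(key, value)
    @data[key] = value
  end

  # replace: store only if the key is already present
  def replace(key, value)
    raise NotFound, key unless @data.key?(key)
    @data[key] = value
  end

  # get: return the stored value, or raise if the key is missing
  def get(key)
    @data.fetch(key) { raise NotFound, key }
  end

  # append: add data to the end of an existing value
  def append(key, value)
    @data[key] = get(key) + value
  end

  # increment: add a delta (default 1) to a stored integer
  def increment(key, delta = 1)
    @data[key] = get(key) + delta
  end
end

store = ToyStore.new
store.add("greeting", "Hello")
store.append("greeting", " World!")
puts store.get("greeting")          # prints "Hello World!"
store.set("counter", 10)
puts store.increment("counter", 5)  # prints 15
```

Calling add a second time with the same key, or replace with a key that was never stored, raises an error in this model, which mirrors the "or returns an error" and "if the specified key exists" wording in the table.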

Regardless of the client library, the functions work the same across the different languages, with some differences to account for conventions. For example, you can increment a value within Ruby using:

couchbase.incr("counter", 5)

Within .NET, the function call is:

client.Increment("counter", 100, 1);

The second argument in this case is the default value if the specified document ID does not already exist.

Compare and Swap (Check and Set)

In addition to the core functions, there is one special function called compare and swap (or check and set, or compare and set, depending on who you talk to!). Compare and swap provides a checksum that ensures multiple clients do not update a document that may have subsequently changed on the server since the document was last fetched.

For example, consider the following scenario:

  1. Client A gets the value for the document “Martin”.

  2. Client B gets the value for the document “Martin”.

  3. Client A adds information to the document value and updates it.

  4. Client B adds information to the document value and updates it.

In the above sequence, the update by Client B will overwrite the information in the database, removing the data that Client A added.

To provide a solution to this, you can use the compare and swap (cas()) function. This requires that a unique CAS value be retrieved from the server. The CAS value is changed every time the document is updated, even if the document is updated to the same value. When sending the update to the server, if the CAS value known by the client does not match the CAS value currently stored for the document, then the operation will fail.

The result is a change to the above sequence:

  1. Client A gets the value for the document “Martin” and the CAS value.

  2. Client B gets the value for the document “Martin” and the CAS value.

  3. Client A adds information to the document value and updates it, using the CAS value as a check. The document is updated.

  4. Client B adds information to the document value and tries to update it using the CAS value. The operation fails, because the cached CAS value on client B is now different from the CAS value on the server after the update by client A.

CAS therefore supports an additional level of checking and verifies that the information you are updating matches the copy of the information you originally retrieved.
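
The four-step sequence above can be traced with a small in-memory model. This is a sketch under stated assumptions rather than real client code: CasStore is a hypothetical class, the CAS value is modeled as a simple counter, and the retry in the rescue block is one possible way for Client B to recover.

```ruby
# CasStore is a hypothetical in-memory model, NOT the Couchbase client.
# Each document carries a CAS token that changes on every update.
class CasStore
  class CasMismatch < StandardError; end

  def initialize
    @docs = {}      # key => [value, cas]
    @next_cas = 0
  end

  # Unconditional store; returns the new CAS token
  def set(key, value)
    @next_cas += 1
    @docs[key] = [value, @next_cas]
    @next_cas
  end

  # Returns [value, cas], like the gets() operation in Table 4-1
  def gets(key)
    @docs.fetch(key)
  end

  # Updates only if the caller's CAS matches the stored one
  def cas(key, value, cas)
    _, current = @docs.fetch(key)
    raise CasMismatch, key unless cas == current
    set(key, value)
  end
end

store = CasStore.new
store.set("Martin", "base")

value_a, cas_a = store.gets("Martin")   # Client A reads
value_b, cas_b = store.gets("Martin")   # Client B reads

store.cas("Martin", value_a + " +A", cas_a)   # Client A succeeds

begin
  store.cas("Martin", value_b + " +B", cas_b) # Client B holds a stale CAS
rescue CasStore::CasMismatch
  # Client B re-reads the document and retries with the fresh CAS value
  value_b, cas_b = store.gets("Martin")
  store.cas("Martin", value_b + " +B", cas_b)
end

puts store.gets("Martin").first   # prints "base +A +B"
```

Because Client B's first attempt fails instead of silently overwriting, the data added by Client A survives and Client B can merge its change on top.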

Within your code, CAS is a function just like the other update functions, such as set(). Depending on your environment, you may need to use a special get function (gets()) that obtains both the document information and CAS value.

For example, within Java you would update an existing document through CAS first by getting the value and stored CAS value, and then using the cas() method to update the document:

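// Fetch the current value together with its CAS value
CASValue<Object> customer = client.gets("customer");
// Update only if the document is unchanged since the gets() call
CASResponse casr = client.cas("customer", customer.getCas(), "new string value");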

The limitation of CAS is that it cannot be enforced at the client library level: nothing prevents code from calling the standard update functions instead. If you want every update to be checked, you must explicitly use cas() in place of the standard document update functions.

Storing Data in Couchbase Server

Couchbase Server is strictly a document database. That is, information is stored in the database according to the document ID (used to reference the data), and the corresponding document value. This means that there is no need to expressly set the format of the data, create a schema, or even tell Couchbase Server about the information that you are storing.

All you need to do is store document data against a document ID that you specify.

Because of the document structure, there are some different considerations when building and developing your applications. Let’s start by taking a look at the basics of the document ID and document value.

Document IDs

The document ID (or key) used to store your data is important. Keys must be unique within a bucket (because they must uniquely identify the content of the corresponding value).

The key should be used to identify the information and can be any string, generally up to 128 characters in length. There are no mechanisms within Couchbase Server to create a unique or sequential ID. If you want to use a UUID you will need to use a library within your chosen application environment.

It is also standard practice to use a prefix, type, and/or a separator to differentiate the information that you store in each bucket. For example, you might store information about a beer by using an ID like beer_9834759. Here the beer prefix identifies the record type, and the underscore acts as a separator between that and the unique beer ID.

Within the same bucket you could add brewery_893749 to store brewery information and differentiate that from the beer records.
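
The prefix-and-separator convention is easy to wrap in a pair of helper functions. The sketch below is illustrative only: make_doc_id, parse_doc_id, and ID_SEPARATOR are hypothetical names, and the UUID comes from SecureRandom in Ruby's standard library because the server does not generate IDs for you.

```ruby
require 'securerandom'

ID_SEPARATOR = "_"

# Build an ID such as "beer_9834759" from a record type and a unique part.
# If no unique part is given, a random UUID from the standard library is used.
def make_doc_id(type, unique = SecureRandom.uuid)
  "#{type}#{ID_SEPARATOR}#{unique}"
end

# Split an ID back into [type, unique part] at the first separator.
def parse_doc_id(doc_id)
  doc_id.split(ID_SEPARATOR, 2)
end

puts make_doc_id("beer", 9834759)            # prints "beer_9834759"
puts parse_doc_id("brewery_893749").inspect  # prints ["brewery", "893749"]
puts make_doc_id("beer")                     # "beer_" followed by a random UUID
```

Splitting only at the first separator means the unique part may itself contain underscores, as long as the type prefix does not.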

Couchbase Server 1.8 does not support the ability to get the list of document IDs within a bucket, or to iterate over the documents stored in the database in any way. You also cannot perform queries or lookups on the information except by knowing the ID. However, this functionality will be supported in the forthcoming Couchbase Server 2.0 release.

One way to work around this is to have your application maintain its own links to the information. For example, when a new beer record is added to the database, you can update a document called beer_list that contains a list of the individual beer records. Because updates are atomic, it should be possible to keep an up-to-date manual list of this information. The same basic principle can also be used to link records in the database. For an example of this in action, see https://blog.couchbase.com/maintaining-set-memcached.

Your application should be able to bootstrap itself by reading this fixed-name record. The name of the record can come either from a local configuration or from a configuration record stored in the bucket.
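
As a rough illustration of the beer_list idea, the sketch below keeps an index document alongside the individual records. A plain Ruby Hash stands in for the bucket, so none of this is Couchbase client code; add_beer and all_beers are hypothetical helpers, and beer_1234567 and "Example Ale" are made-up sample data.

```ruby
require 'json'

# A plain Hash stands in for the bucket here; this is NOT the Couchbase client.
# "beer_list" is the fixed, well-known document ID holding the index.
INDEX_ID = "beer_list"

# Store the record, then add its ID to the index document if it is new.
def add_beer(bucket, beer_id, record)
  bucket[beer_id] = JSON.generate(record)
  ids = JSON.parse(bucket.fetch(INDEX_ID, "[]"))
  ids << beer_id unless ids.include?(beer_id)
  bucket[INDEX_ID] = JSON.generate(ids)
end

# Walk the index document to reach every beer record by ID.
def all_beers(bucket)
  JSON.parse(bucket.fetch(INDEX_ID, "[]")).map { |id| JSON.parse(bucket.fetch(id)) }
end

bucket = {}
add_beer(bucket, "beer_9834759", { "name" => "Hoptimus Prime" })
add_beer(bucket, "beer_1234567", { "name" => "Example Ale" })
puts bucket[INDEX_ID]                                 # prints ["beer_9834759","beer_1234567"]
puts all_beers(bucket).map { |b| b["name"] }.inspect  # prints ["Hoptimus Prime", "Example Ale"]
```

Against a real cluster, the read-modify-write of the index document is exactly the kind of update where two clients can overwrite each other, so it would probably be done with the CAS operation described earlier in this chapter.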

Document Data

The data stored within a document is purely a sequence of bytes. The server makes no attempt to parse or understand the information being stored into the document. This means that you can store everything from numbers up to images.

The open-ended structure of the information means that there is no need to declare or define the structure of the information that you want to store. It also gives you the ultimate flexibility to determine the structure of the information.

To store simple information, such as a number or a string, you can simply write the data into the document value.

To store more complicated structures of information, you will want to use either native object serialization or a generic structure such as JSON.

Serialization

Serialization takes a complicated internal structure, such as a hash or object from your client language, and converts it into a sequence of bytes that can be stored within Couchbase Server’s document database structure.

More usefully, serialized structures can also be deserialized back into their internal, language-specific, format so that they can be used natively within your application.

All the Couchbase Server client libraries automatically support serialization and deserialization of a structure or object supplied to them within the storage and retrieve methods.
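
To make the round trip concrete, here is a minimal sketch using Marshal from Ruby's standard library. It shows the general idea of native serialization only; the assumption that a client library does something comparable internally is mine, and the actual format a given client uses may differ. The serialize and deserialize helper names are hypothetical.

```ruby
# Turn a native Ruby structure into a byte string suitable for a document value.
def serialize(object)
  Marshal.dump(object)
end

# Rebuild an equivalent native structure from the stored bytes.
def deserialize(bytes)
  Marshal.load(bytes)
end

beer = { :name => "Hoptimus Prime", :abv => 10.0, :tags => ["ipa", "imperial"] }

bytes = serialize(beer)
puts bytes.class              # prints String

restored = deserialize(bytes)
puts restored == beer         # prints true
puts restored[:tags].inspect  # prints ["ipa", "imperial"]
```

Note that the symbols, the float, and the nested array all come back as the same Ruby types, which is the convenience native serialization offers over a plain string.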

JSON

The problem with serialization of information is that it is language-specific. If you store an object or data structure from within Java into Couchbase Server, it will be serialized (transcoded) into a string that only the Java client library can understand. If you want to use the same information from a different client, you need to store it in a more generic format.

One of the more popular generic formats available is JSON. The popularity is based on a combination of its simplicity (it looks very similar to the nested hash structures of many scripting languages), and the fact that it can be used natively by JavaScript—and therefore within your web-based application—without any additional processing.

The JSON format is well described on the internet, and particularly at http://json.org. The best way to use JSON within Couchbase Server is to create and store your data by using the JSON hash structure to create an individual record. For example, you could define a basic beer record like this:

{
     "id": "beer_Hoptimus_Prime",
     "type": "beer",
     "abv": 10.0,
     "brewery": "Legacy Brewing Co.",
     "category": "North American Ale",
     "name": "Hoptimus Prime",
     "style": "Imperial or Double India Pale Ale"
}

The information is split into fields (for example, the brewery name), and types are implied by the JSON formatting as strings or floating point values.

Many languages include support for a similar hash, hashmap, or associative array structure, and there are libraries that will convert from an internal object into a JSON compatible format and back again.
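
In Ruby, for instance, the json library in the standard distribution handles the conversion in both directions. The sketch below builds a shortened version of the beer record as a hash, turns it into a JSON string that could be stored as the document value, and parses it back. The to_document and from_document helper names are hypothetical.

```ruby
require 'json'

beer = {
  "id"      => "beer_Hoptimus_Prime",
  "type"    => "beer",
  "abv"     => 10.0,
  "brewery" => "Legacy Brewing Co.",
  "name"    => "Hoptimus Prime"
}

# Convert the hash into a JSON string to use as the document value.
def to_document(record)
  JSON.generate(record)
end

# Parse a stored JSON document back into a Ruby hash.
def from_document(json)
  JSON.parse(json)
end

doc = to_document(beer)
record = from_document(doc)
puts record["brewery"]      # prints "Legacy Brewing Co."
puts record["abv"].class    # prints Float
puts record == beer         # prints true
```

Because the stored value is plain JSON text, a client written in another language can read the same document without knowing anything about Ruby.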

Important

Looking forward to Couchbase Server 2.0 (which is already available in Developer Preview), using JSON when storing data will allow you to take advantage of the querying and indexing functionality. This works by parsing the stored documents in JSON format and picking out individual fields and other structures used to build a view into the data.

Expiry Times

We covered the role of expiry (or time to live [TTL]) times when looking at the core architecture of Couchbase Server. The expiry time is useful because it allows you to set a timeout on the information that you are storing so that it gets automatically deleted when it is no longer usable.

Other than using the delete() function, using the expiry value on a document is the only way to delete information from the database. Once the expiry time has been reached, the data will be deleted.

Expiration times are set using a numeric value expressed as a number of seconds. For values less than 30 days (that is, 30*24*60*60 seconds), the value is taken as relative to the current time. For example, 3600 seconds would expire the document after one hour. For values higher than this, the value is interpreted as an absolute time expressed in seconds from the epoch.
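
The relative-versus-absolute rule can be written out as a small function. This is a sketch of the rule as described here, not server code: absolute_expiry is a hypothetical helper, the handling of a value of exactly 30 days follows the "less than" wording above, and treating 0 as "never expires" is an assumption carried over from memcached conventions.

```ruby
THIRTY_DAYS = 30 * 24 * 60 * 60   # 2_592_000 seconds

# Returns the absolute expiry as seconds since the epoch, or nil for "never".
# Treating 0 as "no expiry" is an assumption, not something stated in the text.
def absolute_expiry(expiry, now = Time.now.to_i)
  return nil if expiry == 0
  # Small values are offsets from now; large values are already timestamps.
  expiry < THIRTY_DAYS ? now + expiry : expiry
end

now = 1_300_000_000
puts absolute_expiry(3600, now)           # prints 1300003600 (one hour from now)
puts absolute_expiry(1_400_000_000, now)  # prints 1400000000 (absolute timestamp)
```

One practical consequence is that you cannot ask for a relative expiry of more than 30 days; to expire a document further in the future, compute the epoch timestamp yourself and pass that instead.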

Expiry can be useful in a number of different environments, but the most obvious is when using Couchbase Server to store session data for an application. You can set the expiry time to allow two hours of access to the website, with the session data being deleted once the user has stopped using the website for longer than that period.

You can use the touch() and getAndTouch() functions to update the expiry while the user is still accessing the data, without having to explicitly set the expiry time through an update operation.
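
The effect of touching a session on each request is a sliding expiry window. The following sketch models that behavior in memory with an explicit clock so the timing is easy to follow. SessionStore is a hypothetical class and not the Couchbase client, the get_and_touch method only imitates the getAndTouch() operation, and session_abc is made-up sample data.

```ruby
# SessionStore is a hypothetical in-memory model with an injectable clock,
# NOT the Couchbase client. Expiry values are relative seconds only.
class SessionStore
  def initialize
    @docs = {}   # key => [value, expires_at]
  end

  def set(key, value, ttl, now)
    @docs[key] = [value, now + ttl]
  end

  # Return the value, or nil if the document is missing or has expired.
  def get(key, now)
    value, expires_at = @docs[key]
    return nil if value.nil? || now >= expires_at
    value
  end

  # Fetch the value and push the expiry forward in one step.
  def get_and_touch(key, ttl, now)
    value = get(key, now)
    @docs[key] = [value, now + ttl] unless value.nil?
    value
  end
end

TWO_HOURS = 2 * 60 * 60
sessions = SessionStore.new
sessions.set("session_abc", "user=42", TWO_HOURS, 0)

# A request after one hour renews the session for another two hours.
puts sessions.get_and_touch("session_abc", TWO_HOURS, 3600).inspect  # prints "user=42"
# Still alive at 2.5 hours because of the touch ...
puts sessions.get("session_abc", 9000).inspect                       # prints "user=42"
# ... but gone once two idle hours have passed since the last touch.
puts sessions.get("session_abc", 3600 + TWO_HOURS).inspect           # prints nil
```

Without the touch at the one-hour mark, the session in this model would have expired at 7200 seconds regardless of how active the user was.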

Flags

In addition to the expiry time, all documents are also stored with a set of flags. Not all client libraries expose the flags, but where they are available, you can use them to add information about a document, such as the document type.

Client Interaction with the Cluster

One of the most common questions when developing applications is how clients and client libraries react to, and are affected by, the topology and topology changes that occur in a running cluster.

In general, a Couchbase Server cluster acts as a “black box” in terms of the client and database interaction. If you are using a smart client, the topology, node structure, and changes to this are entirely handled through the combination of the vBucket map and the client library.

The client library will take care of the communication between the client and the individual nodes. The node that you use to connect to the cluster when first opening a connection does not act as a proxy or distribution service.

Instead, a smart client (or Moxi) will load the vBucket map, and from this information, determine which node within the cluster should be contacted to store and retrieve different documents. The information exchange is direct with the right node for that data.
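
The routing idea can be sketched in a few lines: hash the document ID to a vBucket number, then look that vBucket up in a map to find the node. The code below is a deliberately simplified illustration. The CRC32-modulo hash, the count of 8 vBuckets, the node addresses, and all function names are assumptions made for the example, not the exact scheme used by the Couchbase clients or by Moxi.

```ruby
require 'zlib'

# Simplified, hypothetical routing. The hash function, the vBucket count, and
# the node addresses are illustrative assumptions, not the real client scheme.
NUM_VBUCKETS = 8
NODES = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]

# Build a map (array index = vBucket number) spreading vBuckets across nodes.
def build_vbucket_map(nodes, num_vbuckets = NUM_VBUCKETS)
  (0...num_vbuckets).map { |vb| nodes[vb % nodes.size] }
end

# Hash a document ID to a vBucket number; depends only on the key.
def vbucket_for(key, num_vbuckets = NUM_VBUCKETS)
  Zlib.crc32(key) % num_vbuckets
end

# Look up the node that currently owns the key's vBucket.
def node_for(key, vbucket_map)
  vbucket_map[vbucket_for(key, vbucket_map.size)]
end

vbucket_map = build_vbucket_map(NODES)
%w[beer_9834759 brewery_893749 spoon].each do |key|
  puts "#{key} -> vBucket #{vbucket_for(key)} -> #{node_for(key, vbucket_map)}"
end

# After a topology change the map is different, but the key-to-vBucket hash
# is not: a client only needs the new map to find the right node again.
rebalanced = build_vbucket_map(NODES.first(2))
puts "spoon -> #{node_for('spoon', rebalanced)}"
```

The point of the extra level of indirection is that a document never changes vBucket; only the assignment of vBuckets to nodes changes, which is why clients can follow a rebalance or failover by refreshing the map.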

During a topology change (for example, a rebalance or failover operation), the client library should automatically handle any transient errors. In all other respects, the configuration and topology of the cluster are not something you should ever have to worry about.
