Chapter 1. An Introduction to Redis
This chapter discusses some of Redis’s basic concepts. We’ll look into when Redis is a great fit, how to install the server and command-line client on your machines, and Redis’s data types.
When to use Redis
Nearly every application has to store data, and often lots of fast-changing data. Until recently, most applications stored their data using relational database management systems (RDBMS for short) like Oracle, MySQL, or PostgreSQL. Recently, however, a new paradigm of data storage has emerged from the need to store schema-less data in a more effective way—NoSQL. Choosing whether to use SQL or NoSQL is often an important first step in the design of a successful application.
There are two important things to consider when choosing whether to use SQL or NoSQL to store your data: the nature of the data and your usage pattern. Some data is a great fit for a relational storage engine, while other data benefits from the schema-free nature of a NoSQL engine like Redis or its alternatives. If you don’t rely on a particular RDBMS feature and need the performance or scalability of a NoSQL database, that might in fact be the ideal choice. So in order to decide whether your data should be stored in an RDBMS or a NoSQL engine, you need to look into a few specific things that will help you make a decision. Also bear in mind that quite often the ideal solution will be to use both.
Are your application and data a good fit for NoSQL?
When working on the web, chances are your data and data model keep changing with added functionality and business updates. Evolving the schema to support these changes in a relational database is a painful process, especially if you can’t really afford downtime—which people most often can’t these days, because applications are expected to run 24/7. As a case in point, in a recent presentation on MongoDB, Jeremy Zawodny of Craigslist mentioned how changing the schema on their database typically takes a two month-long toll on their post archival service.
Examples of data that are a particularly good fit for nonrelational storage are transactional details, historical data, and server logs. These are normally highly dynamic, changing quite often, and their storage tends to grow quite quickly, further compounding the problem of adjusting the schema to store them. They also don’t typically feel “relational”; that is, the data in them doesn’t tend to fan out in relationships to other types of data. That’s a good indication that they can use something other than an RDBMS.
Another way to gauge the fit for NoSQL is to look at whether you find yourself denormalizing your data for performance reasons, and no longer benefit from some of the advantages of a relational system, such as consistency and redundancy checks.
One thing to keep in mind is that NoSQL databases generally don’t provide ACID (atomicity, consistency, isolation, durability), or do it only partially. This allows them to make a few tradeoffs that wouldn’t be possible otherwise. Redis provides partial ACID compliance by design due to the fact that it is single threaded (which guarantees consistency and isolation), and full compliance if configured with appendfsync always, providing durability as well.
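As a minimal sketch of the durability setting mentioned above, the snippet below writes a small configuration file. The file name redis-durable.conf is hypothetical. The appendonly yes line is included on the assumption that the append-only file has to be enabled for appendfsync to have any effect.

```shell
# Write a minimal configuration fragment to a hypothetical file name.
cat > redis-durable.conf <<'EOF'
# Enable the append-only file and fsync after every write.
appendonly yes
appendfsync always
EOF

# To try it, start the server with this file (not run here):
#   redis-server ./redis-durable.conf
```

Syncing on every write trades throughput for durability, so this is worth measuring against your own workload before adopting it.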
Performance can also be a key factor. NoSQL databases are generally faster, particularly for write operations, making them a good fit for applications that are write-heavy.
All this being said, and even though NoSQL feels more flexible, there are also great arguments to be made for storing relational data in an RDBMS. If you have predictable data that is a great fit for normalization, you can reap the benefits of using a relational data storage engine. Always look at the data before making a decision.
Don’t believe the hype
NoSQL databases such as Redis are fast, scale easily, and are a great fit for many modern problems. But as with everything else, it is important to always choose the right tool for the job. Play to the strengths of your tools by looking at what you’re storing, how often you’ll access it, and how data (and its schema) might change over time.
Once you’ve weighed all the options, picking between SQL (for stable, predictable, relational data) and NoSQL (for temporary, highly dynamic data) should be an easy task. Doing this kind of thinking in advance will save you many headaches in future data migration efforts.
There are also big differences between NoSQL databases that you should account for. For example, MongoDB (a popular NoSQL database) is a feature-heavy document database that allows you to perform range queries, regular expression searches, indexing, and MapReduce. You should weigh all the factors when choosing your database. As we said earlier, the questions boil down to what your data looks like and what your usage pattern is.
For example, Redis is extremely fast, making it perfectly suited for applications that are write-heavy, data that changes often, and data that naturally fits one of Redis’s data structures (for instance, analytics data). A scenario where you probably shouldn’t use Redis is if you have a very large dataset of which only a small part is “hot” (accessed often) or a case where your dataset doesn’t fit in memory.
You want to install Redis on your computer.
There are several ways to install Redis on your computer or server, but the best and most flexible option is to compile it yourself. Nevertheless, depending on your distribution or operating system, there are other options.
Compiling From Source
Redis evolves very quickly, and package maintainers have a hard time keeping up with the latest developments. Since Redis doesn’t have any external dependencies, compilation and installation are very straightforward, so we recommend you do it yourself. Redis should build cleanly on most Linux distributions, Mac OS X, Solaris, and Cygwin on Windows.
Downloading the source
You can download Redis from the official site or directly from the GitHub project, either using Git or your browser to fetch a snapshot of one of the branches or tags. This allows you to get development versions, release candidates, etc.
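Redis compilation is straightforward. The only required tools should be a C compiler (normally GCC) and Make. If you want to run the test suite, you also need Tcl 8.5. From the root of the source tree, run:
make
This will compile Redis, which on a modern computer should take less than 10 seconds. If you’re using an x86_64 system but would like an x86 build (which uses less memory but also has a much lower memory limit), you can do so by passing along the 32bit target to Make:
make 32bit
Once compilation is done, you can start the server directly from the source tree:
cd src && ./redis-server
To install the binaries under a prefix such as /opt/local, pass PREFIX to the install target:
make PREFIX=/opt/local install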
This will install the binaries in /opt/local/bin.
After installing the Redis server, you should also copy the configuration file (redis.conf) to a path of your choice, the default being /etc/redis.conf. If your configuration file is in a different path from the default, you can pass it along as a parameter to redis-server:
redis-server /path/to/redis.conf
Installing on Linux
Most modern Linux distributions have Redis packages available for installation, but keep in mind that these are normally not up to date. However, if you prefer to use these, the installation procedure is much simpler, typically a single command through your package manager.
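As a rough sketch of what that looks like, the hypothetical helper below prints (rather than runs) an install command for whichever package manager it finds. The package names are assumptions that you should check against your distribution: redis-server on Debian and Ubuntu, and redis on Fedora and RHEL-family systems, where an additional repository may be required.

```shell
# redis_install_command: hypothetical helper that prints the install command
# for the first supported package manager found on PATH.
# Package names are assumptions; verify them for your distribution.
redis_install_command() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install redis-server"
  elif command -v dnf >/dev/null 2>&1; then
    echo "sudo dnf install redis"
  elif command -v yum >/dev/null 2>&1; then
    echo "sudo yum install redis"
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}
```

Calling redis_install_command shows the command so that you can review it before running it yourself with the appropriate privileges.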
This approach has a few advantages: by using your package management system, you can more easily keep software up to date, and you’ll most likely get at least security and stability updates. Besides that, you’ll also get startup scripts and an environment more suited to your distribution (user accounts, log files, database location, etc.).
Installing on Windows
Although Redis is not officially supported on Windows for several reasons (notably the lack of a copy-on-write fork()), there is now a native port by Microsoft Open Technologies that implements CoW in userspace and therefore should have acceptable performance.
Beware that for performance and stability reasons, the Windows versions of Redis are not recommended for production use. Consider using a native or virtualized Linux/UNIX environment instead. Despite that, you might find these versions useful for development or testing.
The main disadvantage of using the Microsoft Open Technologies version is that, because it’s a fork of Redis, there is a lag incorporating version updates from the original.
Because the port is native, compilation is simpler: it requires only Visual Studio. You can get the source and follow the project on GitHub.
Installing on Mac OS X
There are several ways to install Redis on Mac OS X. They all require you to have the Xcode developer tools installed, which include libraries and compilers. If you are a developer on a Mac, chances are you already have this package installed. If you don’t, you can either download it from Apple’s developer website or run “Install Developer Tools” on your Mac’s installation DVDs.
You can manually compile Redis from source by following the steps earlier in this chapter. Most people, however, prefer the convenience of a package manager such as Fink, MacPorts, or Homebrew. A Redis package isn’t available on Fink, so we’ll cover the other two.
Installing through MacPorts
MacPorts defines itself as “an easy to use system for compiling, installing, and managing open source software.” It is based on the FreeBSD Ports system, and to a large extent can be used in the exact same way.
In order to install Redis through MacPorts, you need to first install the package management system. There’s an extensive guide on how to do that at guide.macports.org. Once you’ve installed MacPorts, installing the Redis package is as simple as:
port install redis
Since Redis has no direct dependencies, the actual compilation and installation process is quite speedy. You will then be ready to start using Redis.
Installing through Homebrew
Homebrew is the latest entrant in the Mac package management scene. Being relatively new means that not every package you might be looking for is available on it—even though they make contributions very easy—but if you’re looking for a tool that developers use often, chances are that it’s going to be available through a Homebrew recipe.
You can install Homebrew by following the detailed instructions available on GitHub, but it is usually as simple as running the following command:
ruby -e "$(curl -fsSLk https://gist.github.com/raw/323731/install_homebrew.rb)"
Once that’s done, you’ll be ready to install packages using the Homebrew recipes system. Installing Redis is just a matter of typing:
brew install redis
You can then run redis-server manually or register it with launchd, the Mac’s own service manager, so that it starts when you reboot your computer. You can edit the configuration file /usr/local/etc/redis.conf to tweak it to your liking, and then start the server:
redis-server /usr/local/etc/redis.conf
Using Redis Data Types
You need to understand Redis data types in order to make better use of them for specific applications.
Unlike most other NoSQL solutions and key-value storage engines, Redis includes several built-in data types, allowing developers to structure their data in meaningful semantic ways. Predefined data types add the benefit of being able to perform data-type specific operations inside Redis, which is typically faster than processing the data externally. In this section, we will look at the data types Redis supports, and some of the thinking behind them.
Before we dive into the specific data types, it is important to look at a few things you should keep in mind when designing the key structure that holds your data:
Be consistent when defining your key space. Because a key can contain any characters, you can use separators to define a namespace with a semantic value for your business. An example might be using cache:project:319:tasks, where the colon acts as a namespace separator.
When defining your keys, try to limit them to a reasonable size. Retrieving a key from storage requires comparison operations, so keeping keys as small as possible is a good idea. Additionally, smaller keys are more effective in terms of memory usage.
Even though keys shouldn’t be exceptionally large, there are no big performance improvements for extremely small keys. This means you should design your keys in such a way that combines readability (to help you) and regular key sizes (to help Redis).
With this in mind, keys that are semantically crude, or that include whitespace (like user 123), would be bad. On the other hand, keys like 464A1E96B2D217EBE87449FA8B70E6C7D112560C are good, because they’re semantically meaningful. Note that this last example, an SHA1 hash, is hard to guess and predict but still semantically meaningful, and quite useful if you are storing data related to an object for which you can consistently calculate a hash.
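The key-design guidelines above can be sketched with two small shell helpers. Both function names are hypothetical. The second assumes the sha1sum utility from GNU coreutils is available (other systems may ship a different tool for the same job).

```shell
# make_key: hypothetical helper that joins its arguments with ":" so that
# every key in the application follows the same namespace layout.
make_key() (
  IFS=:
  printf '%s\n' "$*"
)

# object_key: hypothetical helper that derives an uppercase SHA1 key from
# data that can be hashed consistently (assumes sha1sum is installed).
object_key() {
  printf '%s' "$1" | sha1sum | cut -d ' ' -f 1 | tr 'a-f' 'A-F'
}

make_key cache project 319 tasks   # prints cache:project:319:tasks
```

Building keys through one helper keeps separators and ordering consistent, which is the point of the first guideline.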
Strings
The simplest data type in Redis is a string. Strings are also the typical (and frequently the sole) data type in other key-value storage engines. You can store strings of any kind, including binary data. You might, for example, want to cache image data for avatars in a social network. The only thing you need to keep in mind is that a specific value inside Redis shouldn’t go beyond 512 MB of data.
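A minimal sketch of working with strings from the shell follows. It assumes redis-cli is on your PATH and a server is listening on the default address. The function name and the key names are made up for illustration.

```shell
# string_demo: hypothetical helper; expects the path of an image file as $1.
string_demo() {
  redis-cli SET greeting "hello"           # store a plain string
  redis-cli GET greeting                   # read it back
  redis-cli -x SET avatar:user:42 < "$1"   # -x reads the value (last argument) from stdin
  redis-cli STRLEN avatar:user:42          # size of the stored value in bytes
}
```

For example, string_demo ./avatar.png would cache the contents of that file under the avatar:user:42 key.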
Lists
Lists in Redis are ordered lists of binary-safe strings, implemented as linked lists. This means that while getting an element by a specific index is a slow operation, adding to the head or tail of the data structure is extremely fast, as it should be in a database. You might want to use lists in order to implement structures such as queues, a recipe for which we’ll look into later in the book.
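As an illustration of the queue idea, here is a small sketch that pushes at the tail and pops from the head. It assumes redis-cli is on your PATH and a server is running. The function name, key name, and job values are hypothetical.

```shell
# queue_demo: hypothetical helper showing a first-in, first-out queue on a list.
queue_demo() {
  redis-cli RPUSH queue:jobs "resize:42"   # append at the tail (fast)
  redis-cli RPUSH queue:jobs "resize:43"
  redis-cli LLEN queue:jobs                # number of queued items
  redis-cli LPOP queue:jobs                # remove and return the head (fast)
  redis-cli LINDEX queue:jobs 0            # lookup by index; slow for positions deep in a long list
}
```

Pairing RPUSH with LPOP gives queue behaviour. Using LPUSH with LPOP instead would make the same list behave like a stack.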
Hashes
Much like traditional hashtables, hashes in Redis store several fields and their values inside a specific key. Hashes are a perfect option to map complex objects inside Redis, by using fields for object attributes (example fields for a car object might be “color”, “brand”, “license plate”).
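The car example could be sketched as follows, assuming redis-cli is on your PATH and a server is running. The function name, the car:1 key, and the field values are invented for illustration.

```shell
# car_demo: hypothetical helper that maps a car object onto a hash.
car_demo() {
  redis-cli HSET car:1 color "red"              # one field per object attribute
  redis-cli HSET car:1 brand "Volvo"
  redis-cli HSET car:1 license_plate "ABC-123"
  redis-cli HGET car:1 color                    # read a single field
  redis-cli HGETALL car:1                       # read every field and value
}
```

Setting one field per call keeps the sketch compatible with older servers. Reading a single attribute with HGET avoids fetching the whole object.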
Sets and Sorted Sets
Sets in Redis are unordered collections of binary-safe strings. A given set cannot contain duplicate elements. For instance, if you try to add an element wheel to a set twice, Redis will ignore the second operation. Sets allow you to perform typical set operations such as intersections and unions.
While these might look similar to lists, their implementation is quite different and they are suited to different needs due to the different operations they make available. Memory usage should be higher than when using lists.
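A short sketch of the set behaviour described above, assuming redis-cli is on your PATH and a server is running. The function and key names are hypothetical.

```shell
# set_demo: hypothetical helper showing duplicate handling and set operations.
set_demo() {
  redis-cli SADD car:1:parts wheel             # replies 1: the element was added
  redis-cli SADD car:1:parts wheel             # replies 0: already a member, so ignored
  redis-cli SADD car:1:parts engine
  redis-cli SADD car:2:parts wheel
  redis-cli SINTER car:1:parts car:2:parts     # members present in both sets
  redis-cli SUNION car:1:parts car:2:parts     # members present in either set
}
```

Because membership is checked on every SADD, the application never has to deduplicate values itself.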
Sorted sets are a particular case of the set implementation in which each element is defined by a score in addition to the typical binary-safe string. This score allows you to retrieve an ordered list of elements by using the ZRANGE command. We’ll look at some example applications for both sets and sorted sets later in this book.
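To make the role of the score concrete, here is a minimal sketch that assumes redis-cli is on your PATH and a server is running. The function name, the leaderboard key, and the members are made up.

```shell
# leaderboard_demo: hypothetical helper that stores members with scores and
# reads them back ordered by score.
leaderboard_demo() {
  redis-cli ZADD leaderboard 120 alice
  redis-cli ZADD leaderboard 95 bob
  redis-cli ZADD leaderboard 240 carol
  redis-cli ZRANGE leaderboard 0 -1              # all members, lowest score first
  redis-cli ZRANGE leaderboard 0 -1 WITHSCORES   # same order, with each score included
}
```

With these scores, ZRANGE lists bob before alice and alice before carol, regardless of insertion order.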