First released in 2009, MongoDB is relatively new on the database scene compared to contemporary giants like Oracle, which trace their first releases to the 1970s. As a document-oriented database generally grouped into the NoSQL category, it stands out among distributed key-value stores, Amazon Dynamo clones and Google BigTable reimplementations. With a focus on rich operator support and high-performance Online Transaction Processing (OLTP), MongoDB is in many ways closer to MySQL than to batch-oriented databases like HBase.
The key differences between MongoDB’s document-oriented approach and a traditional relational database are:
1. MongoDB does not support joins.
2. MongoDB does not support transactions. It does have some support for atomic operations, however.
3. MongoDB schemas are flexible. Not all documents in a collection must adhere to the same schema.
The first two differences are a direct result of the huge difficulties in making these features scale across a large distributed system while maintaining acceptable performance. They are tradeoffs made in order to allow for horizontal scalability. Although MongoDB lacks joins, it does introduce some alternative capabilities, e.g. embedding, which can be used to solve many of the same data modeling problems as joins. Of course, even if embedding doesn’t quite work, you can always perform your join in application code, by making multiple queries.
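As a rough illustration of a join performed in application code, the sketch below combines the results of two separate queries in plain Python. The lists `users` and `posts` are hypothetical stand-ins for what two queries might return, and the field names are assumptions made only for this example.

```python
# Hypothetical results of two separate queries (stand-ins for two collections).
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
posts = [
    {"_id": 10, "author_id": 1, "title": "Hello"},
    {"_id": 11, "author_id": 2, "title": "Joins"},
    {"_id": 12, "author_id": 1, "title": "Embedding"},
]


def join_posts_with_authors(posts, users):
    """Attach each post's author document, looked up by author_id."""
    # Index the users once so each lookup is a dictionary access.
    users_by_id = {user["_id"]: user for user in users}
    return [
        dict(post, author=users_by_id.get(post["author_id"]))
        for post in posts
    ]


joined = join_posts_with_authors(posts, users)
print(joined[0]["title"], "by", joined[0]["author"]["name"])
```

Embedding would instead store the author's details inside each post document, which avoids the second query at the cost of duplicated data.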
The lack of transactions can be painful at times, but fortunately MongoDB supports a fairly decent set of atomic operations, ranging from the basic atomic increment and decrement operators to the richer “findAndModify”, which is essentially an atomic read-modify-write operator.
It turns out that a flexible schema can be very beneficial, especially when you expect to be iterating quickly. While up front schema design—as used in the relational model—has its place, there is often a heavy cost in terms of maintenance. Handling schema updates in the relational world is of course doable, but comes with a price.
In MongoDB, you can add new properties at any time, dynamically, without having to worry about ALTER TABLE statements that can take hours to run and complicated data migration scripts. However, this approach does come with its own tradeoffs. For example, type enforcement must be carefully handled by the application code. Custom document versioning might be desirable to avoid large conditional blocks to handle heterogeneous documents in the same collection.
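One possible shape for custom document versioning is sketched below. The `schema_version` field, the version numbers and the name-splitting change are all hypothetical; the point is only that each document records its own version and is upgraded in one place rather than through conditionals scattered across the application.

```python
CURRENT_VERSION = 2


def upgrade_document(doc):
    """Return a copy of doc upgraded to CURRENT_VERSION."""
    doc = dict(doc)
    # Documents written before versioning was introduced count as version 1.
    version = doc.get("schema_version", 1)
    if version < 2:
        # Hypothetical change: version 2 splits "name" into two fields.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
    doc["schema_version"] = CURRENT_VERSION
    return doc


old = {"_id": 1, "name": "Ada Lovelace"}
new = upgrade_document(old)
print(new)
```

The rest of the application can then assume every document it handles is at the current version.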
The dynamic nature of MongoDB lends itself quite naturally to working with a dynamic language such as Python. The tradeoffs between a dynamically typed language such as Python and a statically typed language such as Java in many respects mirror the tradeoffs between the flexible, document-oriented model of MongoDB and the up-front and statically typed schema definition of SQL databases.
Python allows you to express MongoDB documents and queries natively, through the use of existing language features like nested dictionaries and lists. If you have worked with JSON in Python, you will immediately be comfortable with MongoDB documents and queries.
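To make this concrete, here is a small sketch of a document and a query written as ordinary Python data structures. The field names and values are invented for the example; `$gt` is MongoDB's greater-than query operator, and the dotted key refers to a field in an embedded document.

```python
import json

# A document is an ordinary dictionary; nesting uses dicts and lists.
user = {
    "username": "janedoe",
    "score": 42,
    "emails": ["jane@example.com", "jdoe@example.org"],
    "address": {"city": "London", "country": "UK"},
}

# A query is also a dictionary.
query = {"address.city": "London", "score": {"$gt": 10}}

# Both structures survive a round trip through JSON unchanged.
print(json.loads(json.dumps(user)) == user)
print(json.dumps(query))
```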
For these reasons, MongoDB and Python make a powerful combination for rapid, iterative development of horizontally scalable backend applications. For the vast majority of modern Web and mobile applications, we believe MongoDB is likely a better fit than RDBMS technology.
MongoDB, Python, 10gen’s PyMongo driver and each of the Web frameworks mentioned in this book all have good reference documentation online.
For MongoDB, we would strongly suggest bookmarking and at least
skimming over the official MongoDB manual, which is available in a few
different formats and constantly updated at http://www.mongodb.org/display/DOCS/Manual. While the
manual describes the JavaScript interface via the
mongo console utility as opposed to the Python
interface, most of the code snippets should be easily understood by a
Python programmer and more-or-less portable to PyMongo, albeit sometimes
with a little bit of work. Furthermore, the MongoDB manual goes into
greater depth on certain advanced and technical implementation and
database administration topics than is possible in this book.
For the Python language and standard library, you can use the
help() function in the interpreter or the
pydoc tool on the command line to
get API documentation for any methods or modules. For example, running
pydoc string on the command line prints the documentation for the string module.
The latest Python language and API documentation is also available for online browsing at http://docs.python.org/.
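The same documentation can also be retrieved from within a program. The following is a minimal sketch, assuming a Python 3 interpreter; it uses the standard library's pydoc module to render the help text for the json module, which is chosen here purely as an example.

```python
import pydoc

# Render the documentation for the standard library "json" module as plain text.
text = pydoc.render_doc("json", renderer=pydoc.plaintext)

# Show only the first few lines; help("json") would page through all of it.
print("\n".join(text.splitlines()[:5]))
```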
10gen’s PyMongo driver has API documentation available online to go
with each release. You can find this at http://api.mongodb.org/python/. Additionally, once you have
the PyMongo driver package installed on your system, a summary version of
the API documentation should be available to you through the Python interpreter’s
help() function. Due to an
issue with the virtualenv tool
mentioned in the next section, “pydoc” does not work inside a virtual
environment. You must instead run
python -m pydoc
For the purposes of development, it is recommended to run a MongoDB server on your local machine. This will permit you to iterate quickly and try new things without fear of destroying a production database. Additionally, you will be able to develop with MongoDB even without an Internet connection.
Depending on your operating system, you may have multiple options for how to install MongoDB locally.
Most modern UNIX-like systems will have a version of MongoDB available in their package management system. This includes FreeBSD, Debian, Ubuntu, Fedora, CentOS and ArchLinux. Installing one of these packages is likely the most convenient approach, although the version of MongoDB provided by your packaging vendor may lag behind the latest release from 10gen. For local development, as long as you have the latest major release, you are probably fine.
10gen also provides their own MongoDB packages for many systems which they update very quickly on each release. These can be a little more work to get installed but ensure you are running the latest-and-greatest. After the initial setup, they are typically trivial to keep up-to-date. For a production deployment, where you likely want to be able to update to the most recent stable MongoDB version with a minimum of hassle, this option probably makes the most sense.
In addition to the system package versions of MongoDB, 10gen provide binary zip and tar archives. These are independent of your system package manager and are provided in both 32-bit and 64-bit flavours for OS X, Windows, Linux and Solaris. 10gen also provide statically-built binary distributions of this kind for Linux, which may be your best option if you are stuck on an older, legacy Linux system lacking the modern libc and other library versions. Also, if you are on OS X, Windows or Solaris, these are probably your best bet.
Finally, you can always build your own binaries from the source code. Unless you need to make modifications to MongoDB internals yourself, this method is best avoided due to the time and complexity involved.
In the interests of simplicity, we will provide the commands required to install a stable version of MongoDB using the system package manager of the most common UNIX-like operating systems. This is the easiest method, assuming you are on one of these platforms. For Mac OS X and Windows, we provide instructions to install the binary packages from 10gen.
Ubuntu / Debian:
sudo apt-get update; sudo apt-get install mongodb
Fedora:
sudo yum install mongo-stable-server
FreeBSD:
sudo pkg_add -r mongodb
Windows:
Go to http://www.mongodb.org and download the
latest production release zip file for Windows, choosing 32-bit or 64-bit
depending on your system. Extract the contents of the zipfile to a
location like C:\mongodb and add the
bin directory to your PATH.
Mac OS X:
Go to http://www.mongodb.org and download the
latest production release compressed tar file for OS X, choosing 32-bit or
64-bit depending on your system. Extract the contents to a location like
/opt and add the
bin directory to your $PATH. For example:
cd /tmp
wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.3-rc1.tgz
tar xfz mongodb-osx-x86_64-1.8.3-rc1.tgz
sudo mkdir /usr/local/mongodb
sudo cp -r mongodb-osx-x86_64-1.8.3-rc1/bin /usr/local/mongodb/
export PATH=$PATH:/usr/local/mongodb/bin
On some platforms—such as Ubuntu—the package manager will automatically start the mongod daemon for you, and ensure it starts on boot also. On others, such as Mac OS X, you must write your own script to start it, and manually integrate with launchd so that it starts on system boot.
Note that before you can start MongoDB, its data and log directories must exist.
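If you prefer to create these directories from Python rather than by hand, a minimal sketch (assuming Python 3) might look like the following. The helper name and the example paths in the comment are hypothetical; use whatever locations you pass to mongod.

```python
import os


def ensure_directories(*paths):
    """Create each directory (and any missing parents) if it does not exist."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


# Hypothetical locations; adjust to match your own data and log paths.
# ensure_directories("/usr/local/mongodb/data", "/usr/local/mongodb/log")
```

Calling the function a second time with the same paths is harmless, because existing directories are left untouched.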
If you wish to have MongoDB start automatically on boot on Windows, 10gen have a document describing how to set this up at http://www.mongodb.org/display/DOCS/Windows+Service
To have MongoDB start automatically on boot under Mac OS X, first
you will need a plist file. Save the following (changing db and log paths
appropriately) to /Library/LaunchDaemons/org.mongodb.mongod.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>RunAtLoad</key>
    <true/>
    <key>Label</key>
    <string>org.mongodb.mongod</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/mongodb/bin/mongod</string>
        <string>--dbpath</string>
        <string>/usr/local/mongodb/data/</string>
        <string>--logpath</string>
        <string>/usr/local/mongodb/log/mongodb.log</string>
    </array>
</dict>
</plist>
Next run the following commands to activate the startup script with launchd:
sudo launchctl load /Library/LaunchDaemons/org.mongodb.mongod.plist
sudo launchctl start org.mongodb.mongod
A quick way to test whether there is a MongoDB instance already
running on your local machine is to type
mongo at the command-line. This will start the
MongoDB admin console, which attempts to connect to a database server
running on the default port (27017).
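A similar check can be made from Python without the mongo console. The sketch below, which assumes Python 3, simply tries to open a TCP connection to the default port. The function name is hypothetical, and a successful connection only shows that something is listening on that port, not that it is necessarily MongoDB.

```python
import socket


def is_server_listening(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


print(is_server_listening())
```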
In any case, you can always start MongoDB manually from the command-line. This is a useful thing to be familiar with in case you ever want to test features such as replica sets or sharding by running multiple mongod instances on your local machine.
Assuming the mongod binary is in your $PATH, run:
mongod --logpath <path/to/mongo.logfile> --port <port to listen on> --dbpath <path/to/data directory>
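When experimenting with several local instances, it can be convenient to launch them from a script. The following is a small sketch using Python's subprocess module; the function names are hypothetical, and it assumes the mongod binary is on your path and that the data and log locations already exist.

```python
import subprocess


def mongod_command(dbpath, logpath, port=27017):
    """Build the argument list for one mongod instance."""
    return [
        "mongod",
        "--dbpath", dbpath,
        "--logpath", logpath,
        "--port", str(port),
    ]


def start_mongod(dbpath, logpath, port=27017):
    """Start mongod as a child process and return the Popen handle."""
    return subprocess.Popen(mongod_command(dbpath, logpath, port))
```

Calling start_mongod several times, each with its own data directory, log file and port, gives you multiple independent instances on one machine.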
In order to be able to connect to MongoDB with Python, you need to install the PyMongo driver package. In Python, the best practice is to create what is known as a “virtual environment” in which to install your packages. This isolates them cleanly from any “system” packages you have installed and yields the added bonus of not requiring root privileges to install additional Python packages. The tool to create a “virtual environment” is called virtualenv.
There are two approaches to installing the virtualenv tool on your
system—manually and via your system package management tool. Most modern
UNIX-like systems will have the virtualenv tool in their package
repositories. For example, on Mac OS X with Mac Ports, you can run
sudo port install py27-virtualenv to
install virtualenv for Python 2.7. On Ubuntu you can run
sudo apt-get install python-virtualenv. Refer to
the documentation for your OS to learn how to install it on your specific
platform.
In case you are unable or simply don’t want to use your system’s
package manager, you can always install it yourself, by hand. In order to
manually install it, you must have the Python setuptools package. You may
already have setuptools on your system. You can test this by running
python -c "import setuptools" on the
command line. If nothing is printed and you are simply returned to the
prompt, you don’t need to do anything. If an ImportError is raised, you
need to install setuptools.
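The same test can be written as a short script that reports the result instead of raising an error. This is a minimal sketch assuming Python 3.4 or later, where importlib.util.find_spec is available; the helper name is hypothetical and it is intended for top-level module names only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("setuptools available:", is_installed("setuptools"))
```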
To manually install setuptools, first download the file http://peak.telecommunity.com/dist/ez_setup.py
and then run python ez_setup.py as root.
For Windows, first download and install the latest Python 2.7.x package from http://www.python.org. Once you have installed Python, download and install the Windows setuptools installer package from http://pypi.python.org/pypi/setuptools/. After installing Python 2.7 and setuptools, you will have the easy_install tool available on your machine in the Python scripts directory—default is C:\Python27\Scripts\.
Once you have setuptools installed on your system, run
easy_install virtualenv as root.
Now that you have the “virtualenv” tool available on your machine,
you can create your first virtual Python environment. You can do this by
executing the command
virtualenv --no-site-packages myenv. You do not need to run this command
with root privileges, and indeed should not want to. This will create a virtual
environment in the directory “myenv”. The --no-site-packages option to the
“virtualenv” utility instructs it to create a clean Python environment,
isolated from any existing packages installed in the system.
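For readers on Python 3, the standard library's venv module offers comparable functionality. Note that this is a different tool from the virtualenv utility described above, shown here only as a sketch. The function name is hypothetical, and pip is deliberately left out of the environment to keep the example fast; pass with_pip=True if you intend to install packages into it.

```python
import os
import venv


def create_environment(path):
    """Create an isolated environment at path without system site-packages."""
    venv.create(path, system_site_packages=False, with_pip=False)
    # venv writes a pyvenv.cfg file at the root of every environment it creates.
    return os.path.isfile(os.path.join(path, "pyvenv.cfg"))


# Example: create_environment("myenv")
```

Setting system_site_packages=False plays the same role as the --no-site-packages option: packages installed system-wide are not visible inside the new environment.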
You are now ready to install the PyMongo driver.
With the “myenv” directory as your working directory (i.e. after “cd
myenv”), simply execute
bin/easy_install pymongo. This will install the latest stable version of PyMongo
into your virtual Python environment. To verify that this worked
successfully, execute the command
bin/python -c "import pymongo", making sure that the “myenv” directory is still
your working directory, as with the previous command.
Assuming Python did not raise an ImportError, you now have a Python virtualenv with the PyMongo driver correctly installed and are ready to connect to MongoDB and start issuing queries!
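As a slightly more informative check, the sketch below reports which version of the driver is installed, or notes that it is missing. The function name is hypothetical; pymongo.version is the version string exposed by the driver. Run it with the Python interpreter from your virtual environment.

```python
def pymongo_version():
    """Return the installed PyMongo version string, or None if it is missing."""
    try:
        import pymongo
    except ImportError:
        return None
    return pymongo.version


print("PyMongo version:", pymongo_version())
```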