There’s a GitHub repository for the bulk of the code covered in this book, including the full Nobel Prize visualization. To get hold of it, just perform a git clone to a suitable local directory:
$ git clone https://github.com/Kyrand/dataviz-with-python-and-js.git
This should create a local dataviz-with-python-and-js directory with the key source code covered by the book.
The bulk of the libraries covered in the book are Python-based, but what might have been a challenging attempt to provide comprehensive installation instructions for the various operating systems and their quirks is made much easier by the existence of Continuum Analytics’ Anaconda, a Python platform that bundles together most of the popular analytics libraries in a convenient package.
Installing some of the bigger Python libraries used to be a challenge in itself, particularly those such as NumPy that depend on complex low-level C and Fortran packages. That’s why the existence of Anaconda is such a godsend. It does all the dependency checking and binary installs so you don’t have to. It’s also a very convenient resource for a book like this.
To get your free Anaconda install, just navigate your browser to https://www.continuum.io/downloads, choose the version for your operating system (as of late 2015, we’re going with Python 2.7), and follow the instructions. Windows and OS X get a graphical installer (just download and double-click), whereas Linux requires you to run a little bash script:
I recommend sticking to defaults when installing Anaconda.
The best way to check that your Anaconda install went well is to try firing up an IPython session at the command line. How you do this depends on your operating system:
At the Windows command prompt:
C:\> ipython
At the OS X or Linux prompt:
$ ipython
This should produce something like the following:
kyran@Tweedledum:~/projects/pyjsbook$ ipython
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 17:02:03)
Type "copyright", "credits" or "license" for more information.
IPython 3.2.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
...
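If you’d like a slightly more thorough check than just launching IPython, a short script can try importing a few of the libraries we’ll be relying on. This is a minimal sketch: the module list is an illustrative selection of packages Anaconda bundles, not its full list, and not every package exposes a __version__ attribute. The print_function import keeps it working under Python 2.7 as well as Python 3.

```python
from __future__ import print_function

import importlib
import sys

# An illustrative selection of libraries bundled with Anaconda,
# not the full package list.
LIBRARIES = ["numpy", "pandas", "matplotlib", "sqlalchemy"]


def check_libraries(names):
    """Map each module name to its version string, or None if it won't import."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Most scientific packages define __version__, but not all do.
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in sorted(check_libraries(LIBRARIES).items()):
        print("{:12} {}".format(name, "NOT FOUND" if version is None else version))
```

Save it as, say, check_install.py (a hypothetical name) and run it with python check_install.py; any library reported as NOT FOUND points to a broken or incomplete install.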
Most installation problems will stem from a badly configured environment PATH variable. This PATH needs to contain the location of the main Anaconda directory and its Scripts subdirectory. In Windows, this should look something like:
You can access and adjust the environment variables in Windows 7 by typing “environment variables” in the program’s search field and selecting “Edit environment variables”, or in XP via Control Panel→System→Advanced→Environment Variables.
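Whichever operating system you’re on, you can check which PATH your Python actually sees from within Python itself. The following minimal sketch lists any PATH entries containing the string “anaconda”; it assumes you kept the default install folder name, so adjust the marker if you didn’t.

```python
from __future__ import print_function

import os


def anaconda_path_entries(path_value=None, marker="anaconda"):
    """Return the PATH entries that look like Anaconda directories.

    path_value defaults to the current process's PATH. marker is matched
    case-insensitively and assumes the default install folder name.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # os.pathsep is ';' on Windows and ':' on OS X and Linux.
    entries = path_value.split(os.pathsep)
    return [entry for entry in entries if marker in entry.lower()]


if __name__ == "__main__":
    found = anaconda_path_entries()
    if found:
        for entry in found:
            print(entry)
    else:
        print("No Anaconda directories found on PATH")
```

On Windows you’d hope to see both the main Anaconda directory and its Scripts subdirectory in the output; on OS X and Linux, Anaconda’s bin directory.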
In OS X and Linux systems, you should be able to set your PATH variable explicitly by appending this line to the .bashrc file in your home directory:
Anaconda contains almost all the Python libraries covered in this book (see the Anaconda documentation for the full list of bundled libraries). Where we need a non-Anaconda library, we can use pip (short for Pip Installs Python), the de facto standard for installing Python libraries. Using pip to install is as easy as can be. Just call pip install followed by the name of the package from the command line, and it should be installed or, with any luck, give a sensible error:
$ pip install dataset
Virtual environments provide a way of creating a sandboxed development environment with a particular Python version and/or set of third-party libraries. Using these virtual environments avoids polluting your global Python with these installs and gives you a lot more flexibility (you can play with different package versions or change your Python version if need be). The use of virtual environments is becoming a best practice in Python development, and I strongly suggest that you follow it.
Anaconda comes with a conda system command that makes creating and using virtual environments easy. Let’s create a special one for this book, based on the full Anaconda package:
$ conda create --name pyjsviz anaconda
...
#
# To activate this environment, use:
# $ source activate pyjsviz
#
# To deactivate this environment, use:
# $ source deactivate
#
As the final message says, to use this virtual environment you need only source activate it (for Windows machines you can leave out the source):
$ source activate pyjsviz
discarding /home/kyran/anaconda/bin from PATH
prepending /home/kyran/.conda/envs/pyjsviz/bin to PATH
Note that you get a helpful cue at the command line to let you know which virtual environment you’re using.
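You can also confirm from inside Python which environment you’re running in. This is a minimal sketch that relies on the CONDA_DEFAULT_ENV variable conda’s activate script sets, treating its absence as “no environment active”; sys.prefix, meanwhile, points at the root of whichever Python installation is actually running.

```python
from __future__ import print_function

import os
import sys


def active_conda_env(environ=None):
    """Return the name of the activated conda environment, or None.

    Reads CONDA_DEFAULT_ENV, which conda's activate script sets; if it's
    missing or empty, assume no environment is active.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("Active environment:", active_conda_env())
    # With the book's environment active, this should end in envs/pyjsviz.
    print("Python prefix:", sys.prefix)
```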
The conda command can do a lot more than just facilitate virtual environments, combining the functionality of Python’s pip installer and virtualenv command, among other things. You can get a full rundown in the conda documentation.
To download Chrome, just go to the Chrome download page and get the version for your operating system, which should be detected automatically.
If you want something slightly less Google-fied, then you can use Chromium, the browser based on the open source project from which Google Chrome is derived. You can find up-to-date installation instructions on the Chromium project site, or just head to the main download page. Chromium tends to lag Chrome feature-wise but is still an eminently usable development browser.
To include a library via CDN, you use the usual <script> tag, typically placed at the bottom of your HTML page. For example, the following tag adds the latest (as of late 2015) version of D3:
Alternatively, you can download the separate libraries and put them in your local server’s static folder. In a typical folder structure, third-party libraries go in the static/libs directory off the root, like so:
nobel_viz/
└── static
    ├── css
    ├── data
    ├── libs
    │   └── d3.min.js
    └── js
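If you’d rather not create these folders by hand, a few lines of Python will do it. This is a minimal sketch that builds the static subdirectories shown above under a nobel_viz root; it doesn’t download D3 or any other library for you, and it sidesteps os.makedirs’s exist_ok argument, which Python 2.7 lacks.

```python
import os

# Subdirectories of static/ in the layout above. libs/ is where
# third-party files such as d3.min.js go once you've downloaded them.
STATIC_SUBDIRS = ["css", "data", "libs", "js"]


def make_static_tree(root="nobel_viz"):
    """Create root/static/{css,data,libs,js}, skipping any that already exist.

    Returns the list of directory paths in the layout.
    """
    paths = []
    for sub in STATIC_SUBDIRS:
        path = os.path.join(root, "static", sub)
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates root/ and root/static/ if needed
        paths.append(path)
    return paths
```

Calling make_static_tree() from the directory that should contain your project creates the tree; running it again is harmless because existing folders are left alone.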
If you organize things this way, using D3 in your scripts requires a local file reference with the <script> tag:
This book shows how to interact with the main SQL databases and MongoDB, the chief nonrelational or NoSQL database, from Python. We’ll be using SQLite, the brilliant file-based SQL database. Here are the download details for SQLite and MongoDB:
SQLite: A great, file-based, serverless SQL database. It should come standard with OS X and Linux. For Windows, follow this guide.
MongoDB: By a long shot, the most popular NoSQL database. See the official MongoDB docs for installation instructions.
Note that we’ll be using the Python SQLAlchemy library either directly or through libraries that build on it. This means we can convert any SQLite examples to another SQL backend (e.g., MySQL or PostgreSQL) by changing a configuration line or two.
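Because SQLite is serverless, you can try it straight from Python with the standard library’s sqlite3 module, no extra installs required. The sketch below deliberately uses sqlite3 rather than SQLAlchemy, and the winners table and its two rows are illustrative only, not the book’s dataset.

```python
import sqlite3


def demo(db_path=":memory:"):
    """Create a small table, insert two rows and query them back.

    ":memory:" keeps the database in RAM. Passing a filename such as
    "nobel_prizes.db" (a hypothetical name) creates an on-disk file instead;
    note that rerunning against a file inserts the rows again.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS winners "
            "(name TEXT, category TEXT, year INTEGER)"
        )
        # Parameterized inserts: sqlite3 fills in each ? placeholder.
        conn.executemany(
            "INSERT INTO winners VALUES (?, ?, ?)",
            [("Marie Curie", "Physics", 1903), ("Albert Einstein", "Physics", 1921)],
        )
        conn.commit()
        return conn.execute(
            "SELECT name, year FROM winners WHERE category = ? ORDER BY year",
            ("Physics",),
        ).fetchall()
    finally:
        conn.close()
```

With SQLAlchemy, the same database would instead be identified by a URL such as sqlite:///nobel_prizes.db, and moving to another backend is largely a matter of swapping that URL for, say, one beginning postgresql://.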
MongoDB can be a little trickier to install than some databases, but it is well worth the effort. Its JSON-like document storage makes it a natural fit for web-based dataviz work.
For OS X users, check out the official docs for MongoDB installation instructions.
This Windows-specific guide from the official docs should get your MongoDB server up and running. You will probably need to use administrator privileges to create the necessary data directories and so on.
More often than not these days, you’ll be installing MongoDB to a Linux-based server, most commonly an Ubuntu variant, which uses the Deb file format to deliver its packages. The official MongoDB docs do a good job covering an Ubuntu install.
MongoDB needs a data directory to store its data in and, depending on how you install it, you may need to create this yourself. On OS X and Linux boxes, the default is a data directory off the root directory, which you can create using mkdir as a superuser (sudo):
$ sudo mkdir /data
$ sudo mkdir /data/db
You’ll then want to set ownership to yourself:
$ sudo chown `whoami` /data/db
On Windows, if you’ve installed the MongoDB Community Edition, you can create the necessary data directory with the following command:
$ md \data\db
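Before starting a server yourself, it’s worth checking whether one is already running. Here’s a minimal sketch using only the standard library’s socket module: it attempts a TCP connection to MongoDB’s default port, 27017. A successful connection only shows that something is listening there, not that it’s definitely MongoDB; for a proper check you’d connect with a client library such as PyMongo.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    27017 is MongoDB's default port. True means *some* process is
    listening there, not necessarily mongod.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        # e.g. the hostname can't be resolved
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on port 27017; MongoDB may already be running")
    else:
        print("Nothing is listening on port 27017")
```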
The MongoDB server will often be started by default on Linux boxes; otherwise, on Linux and OS X the following command will start a server instance:
$ mongod
On Windows Community Edition, the following, run from a command prompt, will start a server instance:
For Python, I have tried a few IDEs but they’ve never stuck. The main itch I was trying to scratch was a decent debugging system. Setting breakpoints in Python with a text editor isn’t particularly elegant, and using the command-line debugger pdb feels a little too old school sometimes. Nevertheless, Python’s logging is so easy and effective that breakpoints became an edge case that didn’t justify leaving my favorite editor,4 which does pretty decent code completion and solid syntax highlighting.
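To give a flavor of the logging approach, here’s a minimal sketch using the standard library’s logging module. The mean function is a hypothetical example and the format string just one reasonable choice; the commented-out pdb line at the end shows the classic way to drop into the debugger when logging isn’t enough.

```python
import logging

# basicConfig configures the root logger (only on its first call);
# DEBUG level lets the debug() messages below through.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def mean(values):
    """Hypothetical example: average a list, logging what it sees."""
    logger.debug("mean() called with %d values: %r", len(values), values)
    if not values:
        logger.warning("mean() got an empty list; returning None")
        return None
    result = sum(values) / float(len(values))
    logger.debug("mean() result: %s", result)
    return result


if __name__ == "__main__":
    mean([1, 2, 3, 4])
    mean([])
    # When you really need to stop and poke around:
    # import pdb; pdb.set_trace()
```

Once things work, raising the level to logging.INFO silences the debug chatter without touching the calls themselves.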
In no particular order, here are a few that I’ve tried and not disliked:
1 There are a number of pretty reliable automatic converters out there.
2 The Scrapy team is working hard to rectify this. Scrapy relies on Twisted, an event-driven networking engine that is also making the journey to Python 3+ compatibility.
3 This is imported from the __future__ module (i.e., from __future__ import print_function).
4 Emacs with VIM key bindings.
5 SQLite is great for development purposes and doesn’t need a server running on your machine.