Chapter 1. Development Setup

This chapter covers the downloads and software installations needed to use this book, and sketches out a recommended development environment. As you’ll see, this isn’t as onerous as it might once have been. I’ll cover Python and JavaScript dependencies separately and give a brief overview of cross-language IDEs.

The Accompanying Code

There’s a GitHub repository for the bulk of the code covered in this book, including the full Nobel Prize visualization. To get hold of it, just perform a git clone to a suitable local directory:

$ git clone https://github.com/Kyrand/dataviz-with-python-and-js.git

This should create a local dataviz-with-python-and-js directory with the key source code covered by the book.

Python

The bulk of the libraries covered in the book are Python-based. Providing comprehensive installation instructions for the various operating systems and their quirks could have been a real challenge, but Continuum Analytics’ Anaconda makes it much easier. Anaconda is a Python platform that bundles together most of the popular analytics libraries in a convenient package.

Anaconda

Installing some of the bigger Python libraries used to be a challenge all in itself, particularly those such as NumPy that depend on complex low-level C and Fortran packages. That’s why the existence of Anaconda is such a godsend. It does all the dependency checking and binary installs so you don’t have to. It’s also a very convenient resource for a book like this.

Python 2 or 3?

Right now, Python is in transition to version 3, a process that is taking longer than many would like. There are three reasons for this: Python 2 works fine for many people, a lot of code will have to be converted,¹ and until recently some of the big libraries, such as NumPy and SciPy, only worked with Python 2.5+.

Now that most of the major libraries are compatible with Python 3, it would be a no-brainer to recommend that version for this book. Unfortunately, one of the few holdouts is Scrapy, a big tool on our toolchain,² which you’ll learn about in Chapter 6. I don’t want to oblige you to run two versions, so for that reason we’ll be using the version 2 Anaconda package.

I will be using the new print function,³ which means all the non-Scrapy code will work fine with Python 3.
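As a minimal sketch of what that looks like in practice, the snippet below imports the print function and uses it in a way that should behave the same under Python 2.7 and Python 3. The format_row helper and the sample values are my own illustrative assumptions, not code from the book’s repository.

```python
# Must be the first statement in the module (comments and docstrings aside).
from __future__ import print_function


def format_row(name, year):
    """Build the text for one output line (hypothetical helper)."""
    return "{0}: {1}".format(name, year)


# With the import above, print is a function in Python 2.7 as well as 3,
# so keyword arguments such as sep and end are accepted by both.
print(format_row("Marie Curie", 1903))
print("Physics", "Chemistry", sep=", ", end="\n")
```

Without the import, the second print call would be a syntax error in Python 2, which is why the import sits at the top of every module that needs to run on both versions.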

To get your free Anaconda install, just navigate your browser to https://www.continuum.io/downloads, choose the version for your operating system (as of late 2015, we’re going with Python 2.7), and follow the instructions. Windows and OS X get a graphical installer (just download and double-click), whereas Linux requires you to run a little bash script:

$ bash Anaconda-2.3.0-Linux-x86_64.sh

I recommend sticking to defaults when installing Anaconda.

Checking the Anaconda Install

The best way to check that your Anaconda install went well is to try firing up an IPython session at the command line. How you do this depends on your operating system:

At the Windows command prompt:

C:\Users\Kyran>ipython

At the OS X or Linux prompt:

$ ipython

This should produce something like the following:

kyran@Tweedledum:~/projects/pyjsbook$ ipython
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 17:02:03)
Type "copyright", "credits" or "license" for more information.

IPython 3.2.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
...

Most installation problems will stem from a badly configured environment PATH variable. This PATH needs to contain the location of the main Anaconda directory and its Scripts subdirectory. In Windows, this should look something like:

...C:\Anaconda;C:\Anaconda\Scripts...

You can access and adjust the environment variables in Windows 7 by typing “environment variables” in the Start menu’s program search field and selecting “Edit environment variables”, or in XP via Control Panel→System→Advanced→Environment Variables.

In OS X and Linux systems, you should be able to set your PATH variable explicitly by appending this line to the .bashrc file in your home directory:

export PATH="$HOME/anaconda/bin:$PATH"
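If you’d rather check the PATH from Python than eyeball it, something like the following sketch should do. The function names are hypothetical, and the anaconda/bin location under your home directory is an assumption based on accepting the installer defaults.

```python
import os


def dirs_on_path(path_value, sep=os.pathsep):
    """Split a PATH-style string into its non-empty entries."""
    return [entry for entry in path_value.split(sep) if entry]


def is_on_path(directory, path_value, sep=os.pathsep):
    """Return True if directory appears as an entry in path_value."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = dirs_on_path(path_value, sep)
    return any(os.path.normcase(os.path.normpath(e)) == wanted
               for e in entries)


if __name__ == "__main__":
    # Assumed default install location; adjust if you installed elsewhere.
    anaconda_bin = os.path.join(os.path.expanduser("~"), "anaconda", "bin")
    print(is_on_path(anaconda_bin, os.environ.get("PATH", "")))
```

On Windows you would check the main Anaconda directory and its Scripts subdirectory instead; os.pathsep takes care of the different separator.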

Installing Extra Libraries

Anaconda contains almost all the Python libraries covered in this book (see here for the full list of Anaconda libraries). Where we need a non-Anaconda library, we can use pip (short for Pip Installs Python), the de facto standard for installing Python libraries. Using pip to install is as easy as can be. Just call pip install followed by the name of the package from the command line and it should be installed or, with any luck, give a sensible error:

$ pip install dataset
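One quick way to confirm that an install worked is to try importing the package. The sketch below is a small, hypothetical helper that reports which of a list of import names can’t be found; the names in the usage section are just an illustrative selection, not the book’s full requirements.

```python
import importlib


def missing_packages(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names are an illustrative selection only.
    wanted = ["numpy", "pandas", "dataset"]
    print(missing_packages(wanted))
```

An empty list means everything imported cleanly; anything listed is a candidate for pip install. Bear in mind that a package’s import name isn’t always the same as the name you install it under.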

Virtual Environments

Virtual environments provide a way of creating a sandboxed development environment with a particular Python version and/or set of third-party libraries. Using these virtual environments avoids polluting your global Python with these installs and gives you a lot more flexibility (you can play with different package versions or change your Python version if need be). The use of virtual environments is becoming a best practice in Python development, and I strongly suggest that you follow it.

Anaconda comes with a conda system command that makes creating and using virtual environments easy. Let’s create a special one for this book, based on the full Anaconda package:

$ conda create --name pyjsviz anaconda
...
#
# To activate this environment, use:
# $ source activate pyjsviz
#
# To deactivate this environment, use:
# $ source deactivate
#

As the final message says, to use this virtual environment you need only source activate it (for Windows machines you can leave out the source):

$ source activate pyjsviz
discarding /home/kyran/anaconda/bin from PATH
prepending /home/kyran/.conda/envs/pyjsviz/bin to PATH
(pyjsviz) $

Note that you get a helpful cue at the command line to let you know which virtual environment you’re using.
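You can make the same check from inside Python. The sketch below assumes that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated, which is my understanding of how it works but worth verifying on your own install; the function name is hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None.

    Assumes conda exports CONDA_DEFAULT_ENV on activation.
    """
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("conda env: {0}".format(active_conda_env()))
    # sys.prefix is the root directory of the running interpreter.
    print("interpreter prefix: {0}".format(sys.prefix))
```

With pyjsviz activated, I would expect the interpreter prefix to point into the environment’s own directory rather than the main Anaconda one.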

The conda command can do a lot more than just facilitate virtual environments, combining the functionality of Python’s pip installer and virtualenv command, among other things. You can get a full rundown here.

JavaScript

The good news is that you don’t need much JavaScript software at all. The only must-have is the Chrome/Chromium web browser, which is used in this book. It offers the most powerful set of developer tools of any current browser and is cross-platform.

To download Chrome, just go here and download the version for your operating system. This should be automatically detected.

If you want something slightly less Google-fied, then you can use Chromium, the browser based on the open source project from which Google Chrome is derived. You can find up-to-date instructions on installation here or just head to the main download page. Chromium tends to lag Chrome feature-wise but is still an eminently usable development browser.

Content Delivery Networks

One of the reasons you don’t have to worry about installing JavaScript libraries is that the ones used in this book are available via content delivery networks (CDN). Rather than having the libraries installed on your local machine, the JavaScript is retrieved by the browser over the Web, from the closest available server. This should make things very fast—faster than if you served the content yourself.

To include a library via CDN, you use the usual <script> tag, typically placed at the bottom of your HTML page. For example, the following call adds the latest (as of late 2015) version of D3:

<script
 src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.6/d3.min.js"
 charset="utf-8">
</script>

Installing Libraries Locally

If you need to install JavaScript libraries locally, because, for example, you anticipate doing some offline development work or can’t guarantee an Internet connection, there are a number of fairly simple ways to do so.

You can just download the separate libraries and put them in your local server’s static folder. This is a typical folder structure. Third-party libraries go in the static/libs directory off root, like so:

nobel_viz/
└── static
    ├── css
    ├── data
    ├── libs
    │     └── d3.min.js
    └── js
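The layout above is easy enough to create by hand, but it can also be scripted. This is only a sketch: make_static_dirs is a hypothetical helper, and it creates the empty directories without downloading any libraries into them.

```python
import os


def make_static_dirs(root):
    """Create root/static/{css,data,libs,js}, skipping any that exist.

    Returns the list of directories that were newly created.
    """
    created = []
    for sub in ("css", "data", "libs", "js"):
        path = os.path.join(root, "static", sub)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

Calling make_static_dirs("nobel_viz") from your projects directory should produce the tree shown, minus d3.min.js, which you still need to download into static/libs yourself.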

If you organize things this way, using D3 in your pages now requires a local file reference in the <script> tag:

<script src="/static/libs/d3.min.js"></script>

Databases

This book shows how to interact with the main SQL databases and MongoDB, the chief nonrelational or NoSQL database, from Python. We’ll be using SQLite, the brilliant file-based SQL database. Here are the download details for SQLite and MongoDB:

SQLite: A great, file-based, serverless SQL database. It should come standard with OS X and Linux. For Windows, follow this guide.

MongoDB: By a long shot, the most popular NoSQL database. Installation instructions here.

Note that we’ll be using Python’s SQLAlchemy SQL library either directly or through libraries that build on it. This means we can convert any SQLite examples to another SQL backend (e.g., MySQL or PostgreSQL) by changing a configuration line or two.
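Before SQLAlchemy enters the picture, a quick way to confirm that SQLite is available is to use the sqlite3 module from Python’s standard library. The sketch below uses an in-memory database, so no server and no file are needed; the table name and the sample row are purely illustrative.

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a filename to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (name TEXT, year INTEGER)")
# Placeholders (?) let sqlite3 handle quoting of the values.
conn.execute("INSERT INTO winners VALUES (?, ?)", ("Albert Einstein", 1921))
conn.commit()

rows = conn.execute("SELECT name, year FROM winners").fetchall()
print(rows)
conn.close()
```

If this runs without an import error, SQLite is ready to use from Python.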

Installing MongoDB

MongoDB can be a little trickier to install than some databases, but it is well worth the effort. Its JSON-like document storage makes it a natural for web-based dataviz work.

For OS X users, check out the official docs for MongoDB installation instructions.

This Windows-specific guide from the official docs should get your MongoDB server up and running. You will probably need to use administrator privileges to create the necessary data directories and so on.

More often than not these days, you’ll be installing MongoDB to a Linux-based server, most commonly an Ubuntu variant, which uses the Deb file format to deliver its packages. The official MongoDB docs do a good job covering an Ubuntu install.

MongoDB stores its data in a data directory and, depending on how you install it, you may need to create this yourself. On OS X and Linux boxes, the default is a data/db directory off the root directory, which you can create using mkdir as a superuser (sudo):

$ sudo mkdir /data
$ sudo mkdir /data/db

You’ll then want to set ownership to yourself:

$ sudo chown $(whoami) /data/db

On Windows, with the MongoDB Community Edition installed, you can create the necessary data directory by running the following at a command prompt:

C:\>md \data\db

The MongoDB server will often be started by default on Linux boxes; otherwise, on Linux and OS X the following command will start a server instance:

$ mongod

On Windows, with the Community Edition installed, running the following from a command prompt will start a server instance:

C:\mongodb\bin\mongod.exe
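To check that the server really is up, you can test whether anything is accepting connections on MongoDB’s default port, 27017. The following is a rough sketch using only the standard library; port_is_open is a hypothetical helper, and it only tells you that something is listening on the port, not that it is definitely MongoDB.

```python
import socket


def port_is_open(host="127.0.0.1", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print(port_is_open())
```

If you configured mongod to use a different port, pass it in as the port argument.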

Integrated Development Environments

As I explain in “The Myth of IDEs, Frameworks, and Tools”, you don’t need an IDE to program in Python or JavaScript. The development tools provided by modern browsers, Chrome in particular, mean you only really need a good code editor to have pretty much the optimal setup. It’s free as in beer too.

For Python, I have tried a few IDEs but they’ve never stuck. The main itch I was trying to scratch was a decent debugging system. Setting breakpoints in Python with a text editor isn’t particularly elegant, and using the command-line debugger pdb feels a little too old school sometimes. Nevertheless, Python’s logging is so easy and effective that breakpoints became an edge case that didn’t justify leaving my favorite editor,⁴ which does pretty decent code completion and solid syntax highlighting.
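To give a flavor of the logging approach, here is a minimal sketch using the standard library’s logging module. The logger name and the parse_year function are illustrative assumptions rather than code from the book.

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("nobel_viz")  # logger name is illustrative


def parse_year(text):
    """Convert a year string to an int, logging each step."""
    logger.debug("parsing %r", text)
    try:
        return int(text)
    except ValueError:
        logger.warning("could not parse year from %r", text)
        return None


print(parse_year("1921"))
print(parse_year("n/a"))
```

Raising the level to logging.WARNING silences the debug chatter without touching the function, which covers a lot of what I would otherwise reach for a breakpoint to find out.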

In no particular order, here are a few that I’ve tried and not disliked:

PyCharm: This option offers solid code assistance and good debugging.
PyDev: If you like Eclipse and can tolerate its rather large footprint, this might well be for you.
WingIDE: This is a solid bet, with a great debugger and incremental improvements over a decade-and-a-half’s worth of development.

Summary

With free, packaged Python distributions such as Anaconda, and the inclusion of sophisticated JavaScript development tools in freely available web browsers, the necessary Python and JavaScript elements of your development environment are a couple of clicks away. Add a favorite editor and a database of choice,⁵ and you are pretty much good to go. There are additional libraries, such as node.js, that can be useful but don’t count as essential. Now that we’ve established our programming environment, the next chapters will teach the preliminaries needed to start our journey of data transformation along the toolchain, starting with a language bridge between Python and JavaScript.

¹ There are a number of pretty reliable automatic converters out there.

² The Scrapy team is working hard to rectify this. Scrapy relies on Python’s Twisted, an event-driven networking engine also making the journey to Python 3+ compatibility.

³ This is imported from the __future__ module (i.e., from __future__ import print_function).

⁴ Emacs with VIM key bindings.

⁵ SQLite is great for development purposes and doesn’t need a server running on your machine.

Data Visualization with Python and JavaScript by Kyran Dale