Chapter 1. Development Setup

This chapter covers the downloads and software installations needed to use this book, and sketches out a recommended development environment. As you’ll see, this isn’t as onerous as it might once have been. I’ll cover Python and JavaScript dependencies separately and give a brief overview of cross-language IDEs.

The Accompanying Code

There’s a GitHub repository for the bulk of the code covered in this book, including the full Nobel Prize visualization. To get hold of it, just perform a git clone to a suitable local directory:

$ git clone https://github.com/Kyrand/dataviz-with-python-and-js-ed-2.git

This should create a local dataviz-with-python-and-js-ed-2 directory with the key source code covered by the book.

Python

The bulk of the libraries covered in the book are Python-based. Providing comprehensive installation instructions for the various operating systems and their quirks could have been a challenge, but Anaconda makes it much easier: it is a Python platform that bundles together most of the popular analytics libraries in a convenient package. The book assumes you are using Python 3, which was released in 2008 and is now firmly established.

Anaconda

Installing some of the bigger Python libraries used to be a challenge in itself, particularly those such as NumPy that depend on complex low-level C and Fortran packages. That’s a great deal easier now, and most will happily install using Python’s pip installer, which superseded the older easy_install:

$ pip install NumPy

But some big number-crunching libraries are still tricky to install. Dependency management and versioning (you might need to use different versions of Python on the same machine) can make things trickier still, and this is where Anaconda comes into its own. It does all the dependency checking and binary installs so you don’t have to. It’s also a very convenient resource for a book like this.

To get your free Anaconda install, just navigate your browser to the Anaconda site, choose the version for your operating system (ideally at least Python 3.5), and follow the instructions. Windows and macOS get a graphical installer (just download and double-click), whereas Linux requires you to run a little bash script:

$ bash Anaconda3-2021.11-Linux-x86_64.sh

The latest installation instructions for each operating system are available on the Anaconda site. I recommend sticking to the defaults when installing Anaconda.

The official guide to checking your installation can also be found at the Anaconda site. Windows and macOS users can use Anaconda’s Navigator GUI or, like Linux users, the conda command-line interface.

Installing Extra Libraries

Anaconda contains almost all the Python libraries covered in this book (see the Anaconda documentation for the full list of Anaconda library packages). Where we need a non-Anaconda library, we can use pip (short for Pip Installs Packages), the de facto standard for installing Python libraries. Using pip to install is as easy as can be. Just call pip install followed by the name of the package from the command line and it should be installed or, with any luck, give a sensible error:

$ pip install dataset
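To confirm from Python that an install succeeded, one option is the standard library's importlib.metadata module (available from Python 3.8). The following is a minimal sketch rather than a required step; installed_version is a hypothetical helper name, and dataset is simply the package installed above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper name used for illustration.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on the package installed above; prints None if it is not installed.
print("dataset:", installed_version("dataset"))
```

The lookup uses the distribution name given to pip, which is not always the same as the name you import.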

Virtual Environments

Virtual environments provide a way of creating a sandboxed development environment with a particular Python version and/or set of third-party libraries. Using these virtual environments avoids polluting your global Python with these installs and gives you a lot more flexibility (you can play with different package versions or change your Python version if need be). The use of virtual environments is becoming a best practice in Python development, and I strongly suggest that you follow it.

Anaconda comes with a conda system command that makes creating and using virtual environments easy. Let’s create a special one for this book, based on the full Anaconda package:

$ conda create --name pyjsviz anaconda
...
#
# To activate this environment, use:
# $ source activate pyjsviz
#
# To deactivate this environment, use:
# $ source deactivate
#

As the final message says, to use this virtual environment you need only source activate it (on Windows machines, you can leave out the source; with recent versions of conda, the equivalent command is conda activate pyjsviz):

$ source activate pyjsviz
discarding /home/kyran/anaconda/bin from PATH
prepending /home/kyran/.conda/envs/pyjsviz/bin to PATH
(pyjsviz) $

Note that you get a helpful cue at the command line to let you know which virtual environment you’re using.
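The same information is available from inside Python. When an environment is activated, conda sets the CONDA_DEFAULT_ENV environment variable to its name, so a script can check where it is running. This is a small sketch; active_conda_env is a hypothetical helper name:

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None.

    conda sets CONDA_DEFAULT_ENV when an environment is activated.
    `active_conda_env` is a hypothetical helper name used for illustration.
    """
    return environ.get("CONDA_DEFAULT_ENV") or None


# With the environment above activated, this should report "pyjsviz".
print("Active conda environment:", active_conda_env())
```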

The conda command can do a lot more than just facilitate virtual environments, combining the functionality of Python’s pip installer and virtualenv command, among other things. You can get a full rundown in the Anaconda documentation.

If you’re confident with standard Python virtual environments, these have been made a lot easier to work with by their incorporation in Python’s Standard Library. To create a virtual environment from the command line:

$ python -m venv python-js-viz

This creates a python-js-viz directory containing the various elements of the virtual environment. This includes some activation scripts. To activate the virtual environment with macOS or Linux, run the activate script:

$ source python-js-viz/bin/activate

On Windows machines, run the .bat file:

$ python-js-viz\Scripts\activate.bat
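If you are unsure whether activation worked, you can ask the interpreter itself. Inside an environment created with venv, sys.prefix points at the environment directory while sys.base_prefix still points at the Python installation it was created from. The sketch below relies on that difference; in_virtualenv is a hypothetical helper name, and the check does not detect conda environments:

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv-style environment.

    `in_virtualenv` is a hypothetical helper name used for illustration.
    """
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```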

You can then use pip to install Python libraries to the virtual environment, avoiding polluting your global Python distribution:

(python-js-viz) $ pip install NumPy

To install all the libraries required by this book, you can use the requirements.txt file in the book’s GitHub repo:

(python-js-viz) $ pip install -r requirements.txt

You can find information on the virtual environment in the Python documentation.

JavaScript

The good news is that you don’t need much JavaScript software at all. The only must-have is the Chrome/Chromium web browser, which is used in this book. It offers the most powerful set of developer tools of any current browser and is cross-platform.

To download Chrome, just go to the home page and download the version for your operating system. This should be automatically detected.

All the JavaScript libraries used in this book can be found in the accompanying GitHub repo, but there are generally two ways to deliver them to the browser. You can use a content delivery network (CDN), which serves a cached copy of the library from a server close to the user. Alternatively, you can use a local copy of the library served to the browser. Both of these methods use the script tag in an HTML document.

Content Delivery Networks

With CDNs, rather than having the libraries installed on your local machine, the JavaScript is retrieved by the browser over the web, from the closest available server. This should make things very fast—faster than if you served the content yourself.

To include a library via CDN, you use the usual <script> tag, typically placed at the bottom of your HTML page. For example, the following call adds a current version of D3:

<script
 src="https://cdnjs.cloudflare.com/ajax/libs/d3/7.1.1/d3.min.js"
 charset="utf-8">
</script>

Installing Libraries Locally

If you need to install JavaScript libraries locally, because, for example, you anticipate doing some offline development work or can’t guarantee an internet connection, there are a number of fairly simple ways to do so.

You can just download the separate libraries and put them in your local server’s static folder. This is a typical folder structure. Third-party libraries go in the static/libs directory off root, like so:

nobel_viz/
└── static
    ├── css
    ├── data
    ├── libs
    │     └── d3.min.js
    └── js

If you organize things this way, to use D3 in your scripts now requires a local file reference with the <script> tag:

<script src="/static/libs/d3.min.js"></script>
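For quick local testing of this layout, Python's built-in http.server module can serve the project directory so that the /static/libs/d3.min.js path resolves. This is only a sketch of one option, not the server setup used later in the book; make_static_server is a hypothetical helper name, port 8000 is an arbitrary choice, and the root is assumed to be the nobel_viz directory shown above:

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(root: str = "nobel_viz", port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a development server rooted at `root`.

    With `root` set to the project directory, the file
    nobel_viz/static/libs/d3.min.js is served at /static/libs/d3.min.js.
    `make_static_server` is a hypothetical helper name.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=root)
    # Bind to localhost only: this is a development aid, not a production server.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# To serve until interrupted with Ctrl-C:
# make_static_server().serve_forever()
```

Running python -m http.server from inside the nobel_viz directory does much the same job from the command line.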

Databases

The recommended database for small to medium-sized dataviz projects is the brilliant, serverless, file-based, SQL-based SQLite. This database is used throughout the dataviz toolchain demonstrated in the book and is the only database you really need.
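Python's standard library includes the sqlite3 module, so you can see SQLite's serverless nature for yourself without installing anything. The sketch below is illustrative only: the winners table and its columns are hypothetical and not the schema used later in the book, and the in-memory database keeps the example self-contained:

```python
import sqlite3

# ":memory:" keeps the demo self-contained; pass a filename such as
# "nobel.db" (a hypothetical name) to persist the database to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (name TEXT, category TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO winners VALUES (?, ?, ?)",
    [("Marie Curie", "Physics", 1903), ("Marie Curie", "Chemistry", 1911)],
)
conn.commit()

# Parameterized queries (the ? placeholders) avoid manual string quoting.
rows = conn.execute(
    "SELECT category, year FROM winners WHERE name = ? ORDER BY year",
    ("Marie Curie",),
).fetchall()
print(rows)
conn.close()
```

No server process is involved at any point: the whole database lives in memory or in a single file.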

The book also covers basic Python interactions with MongoDB, the most popular nonrelational, or NoSQL, database:

SQLite

SQLite should come as standard with macOS and Linux machines. For Windows, follow this guide.

MongoDB

You can find installation instructions for the various operating systems in the MongoDB documentation.

Note that we’ll be using Python’s SQLAlchemy SQL library either directly or through libraries that build on it. This means we can convert any SQLite examples to another SQL backend (e.g., MySQL or PostgreSQL) by changing a configuration line or two.

Getting MongoDB Up and Running

MongoDB can be a little trickier to install than some databases. As mentioned, you can follow this book perfectly well without going through the hassle of installing the server-based MongoDB, but if you want to try it out or find yourself needing to use it at work, here are some installation notes:

For macOS users, check out the official docs for MongoDB installation instructions.

This Windows-specific guide from the official docs should get your MongoDB server up and running. You will probably need to use administrator privileges to create the necessary data directories and so on.

More often than not these days, you’ll be installing MongoDB to a Linux-based server, most commonly an Ubuntu variant, which uses the deb file format to deliver its packages. The official MongoDB docs do a good job covering an Ubuntu install.

MongoDB stores its data in a data directory and, depending on how you install it, you may need to create this yourself. On macOS and Linux boxes, the default is /data/db, a data directory off the root directory, which you can create using mkdir as a superuser (sudo):

$ sudo mkdir /data
$ sudo mkdir /data/db

You’ll then want to set ownership to yourself:

$ sudo chown $(whoami) /data/db

With Windows, installing the MongoDB Community Edition, you can create the necessary data directory with the following command:

$ cd C:\
$ md "\data\db"

The MongoDB server will often be started by default on Linux boxes; otherwise, on Linux and macOS the following command will start a server instance:

$ mongod

On Windows Community Edition, the following, run from a command prompt, will start a server instance:

C:\mongodb\bin\mongod.exe

Easy MongoDB with Docker

MongoDB can be tricky to install. For example, recent Ubuntu releases (version 22.04 onward) ship SSL libraries that are incompatible with some MongoDB packages. If you have Docker installed, a working development DB on the default port 27017 is only a single command away:

$ sudo docker run -dp 27017:27017 -v local-mongo:/data/db \
              --name local-mongo --restart=always mongo

This nicely side-steps local library incompatibilities and the like.
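However the server was started, a quick way to check that something is listening on the default port is a plain TCP connection attempt from Python. This sketch uses only the standard library and does not speak the MongoDB protocol, so it confirms that the port is open, not that the database is healthy; port_is_open is a hypothetical helper name:

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    `port_is_open` is a hypothetical helper name used for illustration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


print("MongoDB port reachable:", port_is_open())
```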

Integrated Development Environments

As I explain in “The Myth of IDEs, Frameworks, and Tools”, you don’t need an IDE to program in Python or JavaScript. The development tools provided by modern browsers, Chrome in particular, mean you only really need a good code editor to have pretty much the optimal setup.

One caveat here is that these days intermediate to advanced JavaScript tends to involve frameworks like React, Vue, and Svelte that do benefit from the bells and whistles provided by a decent IDE, particularly handling multiformat files (where HTML, CSS, and JS are all embedded together). The good news is that the freely available Visual Studio Code (VSCode) has become the de facto standard for modern web development. It’s got plug-ins for pretty much everything and a very large and active community, so questions tend to be answered and bugs hunted down fast.

For Python, I have tried a few dedicated IDEs but they’ve never stuck. The main itch I was trying to scratch was finding a decent debugging system. Setting breakpoints in Python with a text editor isn’t particularly elegant, and using the command-line debugger pdb feels a little too old school sometimes. Nevertheless, Python does have a pretty good logging system included, which takes the edge off its rather clunky default debugging. VSCode is pretty good for Python programming, but there are some Python-specific IDEs that are arguably a little smoother.
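As a rough illustration of the editor-only approach, the sketch below combines the standard logging module with the built-in breakpoint() function (Python 3.7 and later), which drops into pdb at the marked line. The logger name and the mean function are hypothetical examples rather than code from the book:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("setup_demo")  # hypothetical logger name


def mean(values):
    """Average a non-empty sequence of numbers, logging the intermediate state."""
    total = sum(values)
    logger.debug("mean(): total=%s, count=%d", total, len(values))
    # Uncomment to drop into pdb at this point (Python 3.7+):
    # breakpoint()
    return total / len(values)


print(mean([1, 2, 3]))
```

Raising the level passed to basicConfig to logging.INFO silences the debug output without touching the function.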

In no particular order, here are a few that I’ve tried and not disliked:

PyCharm

This option offers solid code assistance and good debugging and would probably top a favorite IDE poll of seasoned Pythonistas.

PyDev

If you like Eclipse and can tolerate its rather large footprint, this might well be for you.

Wing Python IDE

This is a solid bet, with a great debugger and incremental improvements over a decade-and-a-half’s worth of development.

Summary

With free, packaged Python distributions such as Anaconda, and the inclusion of sophisticated JavaScript development tools in freely available web browsers, the necessary Python and JavaScript elements of your development environment are a couple of clicks away. Add a favorite editor and a database of choice,1 and you are pretty much good to go. There are additional tools, such as Node.js, that can be useful but don’t count as essential. Now that we’ve established our programming environment, the next chapters will teach the preliminaries needed to start our journey of data transformation along the toolchain, starting with a language bridge between Python and JavaScript.

1 SQLite is great for development purposes and doesn’t need a server running on your machine.
