Chapter 4. Webdev 101

This chapter introduces the core web-development knowledge you will need to understand the web pages you scrape for data and to structure those you want to deliver as the skeleton of your JavaScripted visualizations. As you’ll see, a little knowledge goes a long way in modern webdev, particularly when your focus is building self-contained visualizations and not entire websites (see “Single-Page Apps” for more details).

The usual caveats apply: this chapter is part reference, part tutorial. There will probably be stuff here you know already, so feel free to skip over it and get to the new material.

The Big Picture

The humble web page, the basic building block of the World Wide Web (WWW)—that fraction of the Internet consumed by humans—is constructed from files of various types. Apart from the multimedia files (images, videos, sound, etc.), the key elements are textual, consisting of Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. These three, along with any necessary data files, are delivered using the Hypertext Transfer Protocol (HTTP) and used to build the page you see and interact with in your browser window, which is described by the Document Object Model (DOM), a hierarchical tree off which your content hangs. A basic understanding of how these elements interact is vital to building modern web visualizations, and the aim of this chapter is to get you quickly up to speed.

Web development is a big field, and the aim here is not to turn you into a full-fledged web developer. I assume you want to limit the amount of webdev you have to do as much as possible, focusing only on that fraction necessary to build a modern visualization. In order to build the sort of visualizations showcased at d3js.org, published in the New York Times, or incorporated in basic interactive data dashboards, you actually need surprisingly little webdev fu. The result of your labors should be easy to add to a larger website by someone dedicated to that job. In the case of small, personal websites, it’s easy enough to incorporate the visualization yourself.

Single-Page Apps

Single-page applications (SPAs) are web applications (or whole sites) that are dynamically assembled using JavaScript, often building upon a lightweight HTML backbone and CSS styles that can be applied dynamically using class and id attributes. Many modern data visualizations fit this description, including the Nobel Prize visualization this book builds toward.

Often self-contained, the SPA’s root folder can be easily incorporated in an existing website or stand alone, requiring only an HTTP server such as Apache or Nginx.

Thinking of our data visualizations in terms of SPAs removes a lot of the cognitive overhead from the webdev aspect of JavaScript visualizations, allowing us to focus on programming challenges. The skills required to put the visualization on the Web are still fairly basic and quickly amortized. Often it will be someone else’s job.

Tooling Up

As you’ll see, the webdev needed to make modern data visualizations requires no more than a decent text editor, a modern browser, and a terminal (Figure 4-1). I’ll cover what I see as the minimal requirements for a webdev-ready editor, along with some nonessential but nice-to-have features.

Figure 4-1. Primary webdev tools

My browser development tool of choice is Chrome’s web-developer kit, freely available on all platforms. It has a lot of tab-delineated functionality, of which I’ll cover the following in this chapter:

  • The Elements tab, which allows you to explore the structure of a web page, its HTML content, CSS styles, and DOM presentation

  • The Sources tab, where most of your JavaScript debugging will take place

You’ll need a terminal for output, starting your local web server, and sketching ideas with the IPython interpreter.

Before dealing with what you do need, let’s deal with a few things you don’t need when setting out, laying a couple of myths to rest on the way.

The Myth of IDEs, Frameworks, and Tools

There is a common assumption among prospective JavaScripters that programming for the Web requires a complex toolset, primarily an Integrated Development Environment (IDE), as used by enterprise (and other) coders everywhere. This is potentially expensive and presents another learning curve. The good news is that not only have I never used an IDE to program for the Web, but I can’t think of anyone I know in the discipline who does. In all probability, the wonderful web visualizations you have seen, which may have spurred you to pick up this book, were created with nothing more than a humble text editor, a modern web browser for viewing and debugging, and a console or terminal for logging and output.

There is also a common myth that one cannot be productive in JavaScript without using a framework of some kind. At the moment, a number of these frameworks are vying for control of the JS ecosystem, sponsored by the various huge companies that created them. These frameworks come and go at a dizzying rate, and my advice for anyone starting out in JavaScript is to ignore them entirely while you develop your core skills. Use small, targeted libraries, such as those in the jQuery ecosystem or Underscore’s functional programming extensions, and see how far you can get before needing a “my way or the highway” framework. Only lock yourself into a framework to meet a clear and present need, not because the current JS groupthink is raving about how great it is.1 Another important consideration is that D3, the prime web dataviz library, doesn’t really play well with any of the bigger frameworks I know, particularly the ones that want control over the DOM.

Another thing you’ll find if you hang around webdev forums, Reddit lists, and Stack Overflow is a huge range of tools constantly clamoring for attention. There are JS+CSS minifiers, watchers to automatically detect file changes and reload web pages during development, among others. While a few of these have their place, in my experience there are a lot of flaky tools that probably cost more time in hair-tearing than they gain in productivity. To reiterate, you can be very productive without these things and should only reach for one to scratch an urgent itch. Some are keepers, but very few are even remotely essential for data visualization work.

A Text-Editing Workhorse

First and foremost among your webdev tools is a text editor that you are comfortable with and which can, at the very least, do syntax highlighting for multiple languages—in our case, HTML, CSS, JavaScript, and Python. You can get away with a plain, nonhighlighting editor, but in the long run it will prove to be a pain. Things like syntax highlighting, code linting, intelligent indentation, and the like remove a huge cognitive load from the process of programming, so much so that I see their absence as a limiting factor. These are my minimal requirements for a text editor:

  • Syntax highlighting for all languages you use

  • Configurable indentation levels and types for languages (e.g., Python 4 soft tabs, JavaScript 2 soft tabs)

  • Multiple windows/panes/tabs to allow easy navigation around your code base

If you are using a relatively advanced text editor, all the above should come as standard with the exception of code linting, which will probably require a bit of configuration.

My leading candidate for nice to have is a decent code linter. If the mark of a useful tool is how much you would miss its absence, then code linting is easily in my top five. For scripting languages like Python and JavaScript, there’s only so much intelligent code analysis that can be achieved syntactically, but just sanity-checking the obvious syntax errors can be a huge time saver. In JavaScript in particular, some mistakes are transparent, in the sense that things will run in spite of them, and will quite often produce confusing error messages. A code linter can save you time here and enforce good practice. Figure 4-2 shows a contrived example of a JavaScript code linter in action.

Figure 4-2. A running code linter analyzes the JavaScript continuously, highlighting syntax errors in red and adding a ! to the left of the offending line

A relatively recent addition, introduced in ECMAScript 5, is strict mode, which enforces a modern JavaScript context. This mode is recognized by most linters and you can invoke it by placing 'use strict'; at the top of your program or at the top of a function body, which restricts it to that function. Modern browsers also honor strict mode, throwing errors for noncompliance. In strict mode, trying to assign foo = "bar"; will fail with a ReferenceError if foo hasn’t been previously declared. See John Resig’s blog for a nice explanation of strict mode.

Browser with Development Tools

One of the reasons an IDE is pretty much redundant in modern webdev is that the best place to do debugging is in the web browser itself, and such is the pace of change there that any IDE attempting to emulate that context will have its work cut out for it. On top of this, modern web browsers have evolved a powerful set of debugging and development tools. Firefox’s Firebug led the way but has since been surpassed by Chrome Developer, which offers a huge amount of functionality, from sophisticated (certainly to a Pythonista) debugging (parametric breakpoints, variable watches, etc.) to memory and processor optimization profiling, device emulation (want to know what your web page looks like on a smartphone or tablet?), and a whole lot more. Chrome Developer is my debugger of choice and will be used in this book. Like everything covered, it’s free as in beer.

Terminal or Command Prompt

The terminal or command line is where you initiate the various servers and probably output useful logging information. It’s also where you’ll try out Python modules or run a Python interpreter (IPython being in many ways the best).

In OS X and Linux, this window is called a Terminal or xterm. In Windows, it’s a command prompt that should be available through clicking Start→All Programs→Accessories.

Building a Web Page

There are four elements to a typical web visualization:

  • An HTML skeleton, with placeholders for our programmatic visualization

  • Cascading Style Sheets (CSS), which define the look and feel (e.g., border widths, colors, font sizes, placement of content blocks).

  • JavaScript to build the visualization

  • Data to be transformed

The first three of these are just text files, created using our favorite editor and delivered to the browser by the web server (see Chapter 12). Let’s examine each in turn.

Serving Pages with HTTP

The delivery of the HTML, CSS, and JS files that are used to make a particular web page (and any related data files, multimedia, etc.) is negotiated between a server and browser using the Hypertext Transfer Protocol. HTTP provides a number of methods, the most commonly used being GET, which requests a web resource, retrieving data from the server if all goes well or returning an error status code (such as 404 Not Found) if it doesn’t. We’ll be using GET, along with Python’s requests module, to scrape some web page content in Chapter 6.

To negotiate the browser-generated HTTP requests, you’ll need a server. In development, you can run a little server locally using Python’s command-line initialized SimpleHTTPServer, like this:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

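This server is now serving the contents of the current directory locally on port 8000. You can access the site it is serving by going to the URL http://localhost:8000 in your browser. Note that SimpleHTTPServer is the Python 2 module name; in Python 3 it was merged into http.server, so the equivalent command is python -m http.server.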

SimpleHTTPServer is a nice thing to have and OK for demos and the like, but it lacks a lot of basic functionality. For this reason, as we’ll see in Part IV, it’s better to master the use of a proper development (and production) server like Flask (this book’s server of choice).
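To make the GET request/response cycle a little more concrete, here is a minimal sketch using only the Python 3 standard library. That is an assumption on my part, since the command above targets Python 2. The helper names serve_directory, fetch_status, and demo are hypothetical, invented for this illustration. The sketch serves a temporary folder on a free local port, then requests one file that exists and one that doesn't.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.error
import urllib.request


def serve_directory(directory):
    """Start a throwaway HTTP server for `directory` on a free local port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the OS for any free port, avoiding clashes with port 8000.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url):
    """Issue a GET request and return (status_code, body_bytes)."""
    # An empty ProxyHandler makes sure the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, b""


def demo():
    """Serve a one-file site, then GET an existing and a missing resource."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "index.html").write_text("<!DOCTYPE html><body>hi</body>")
        server = serve_directory(tmp)
        base = "http://127.0.0.1:%d" % server.server_address[1]
        found = fetch_status(base + "/index.html")
        missing = fetch_status(base + "/no-such-file.html")
        server.shutdown()
        server.server_close()
        return found, missing


if __name__ == "__main__":
    found, missing = demo()
    print("existing file:", found[0])
    print("missing file:", missing[0])
```

The point to take away is that a failed GET is not an exception on the server side: the server still responds, just with an error status code that the client has to check.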

The DOM

The HTML files you send through HTTP are converted at the browser end into a Document Object Model, or DOM, which can in turn be adapted by JavaScript. This programmatic DOM is the basis of dataviz libraries like D3. The DOM is a tree structure, represented by hierarchical nodes, the top node being the main web page or document.

Essentially, the HTML you write or generate with a template is converted by the browser into a tree hierarchy of nodes, each one representing an HTML element. The top node is called the Document Object and all other nodes descend in a parent-child fashion. Programmatically manipulating the DOM is at the heart of such libraries as jQuery and the mighty D3, so it’s vital to have a good mental model of what’s going on. A great way to get the feel for the DOM is to use a web tool such as Chrome Developer (my recommended toolset) to inspect branches of the tree.

Whatever you see rendered on the web page, the bookkeeping of the object’s state (displayed or hidden, matrix transform, etc.) is being done with the DOM. D3’s powerful innovation was to attach data directly to the DOM and use it to drive visual changes (Data-Driven Documents).
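If the idea of HTML becoming a tree feels abstract, the following toy sketch may help. It uses Python 3's standard html.parser module to turn a scrap of HTML into nested nodes and print them as an indented outline. To be clear, this is not how a browser builds its DOM, and the TreeBuilder, outline, and parse names, as well as the sample PAGE markup, are all hypothetical choices made for the illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot have children.
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class TreeBuilder(HTMLParser):
    """Toy parser that records each element as a node with a list of children."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        # The new node hangs off whichever element is currently open.
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break


def outline(node, depth=0):
    """Return the tree as a list of indented lines, one per node."""
    label = node["tag"]
    if "id" in node["attrs"]:
        label += "#" + node["attrs"]["id"]
    lines = ["  " * depth + label]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


def parse(html):
    """Feed the HTML to the toy parser and return the root node."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


PAGE = """
<body>
  <div id="chart-holder">
    <div id="header"><h2>Title</h2></div>
    <div id="main"></div>
  </div>
</body>
"""

if __name__ == "__main__":
    print("\n".join(outline(parse(PAGE))))
```

Each level of indentation in the printed outline corresponds to one parent-child step in the tree, which is the same structure you see when expanding branches in the browser's element inspector.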

The HTML Skeleton

A typical web visualization uses an HTML skeleton, and builds the visualization on top of it using JavaScript.

HTML is the language used to describe the content of a web page. It grew out of work by physicist Tim Berners-Lee at the CERN particle accelerator complex in Switzerland, where he prototyped a hypertext system in 1980 and went on to specify HTML itself in 1990. It uses tags such as <div>, <img>, and <h1> to structure the content of the page, while CSS is used to define the look and feel.3 The advent of HTML5 has reduced the boilerplate considerably, but the essence of the language has changed little since those early days.

Fully specced HTML used to involve a lot of rather confusing header tags, but with HTML5 some thought was put into a more user-friendly minimalism. This is pretty much the minimal requirement for a starting template:4

<!DOCTYPE html>
<meta charset="utf-8">
<body>
    <!-- page content -->
</body>

So we need only declare the document to be HTML, set the character encoding to UTF-8 (a variable-width encoding of Unicode), and add a <body> tag within which to place our page content. This is a big improvement on the bookkeeping required before and provides a very low threshold to entry for creating the documents that will be turned into web pages. Note the comment tag form: <!-- comment -->.

More realistically, we would probably want to add some CSS and JavaScript. You can add both directly to an HTML document by using the <style> and <script> tags like this:

<!DOCTYPE html>
<meta charset="utf-8">
<style>
    /* CSS */
</style>
<body>
    <!-- page content -->
    <script>
        // JavaScript
    </script>
</body>

This single-page HTML form is often used in examples such as the visualizations at d3js.org. It’s convenient to have a single page to deal with when demonstrating code or keeping track of files, but generally I’d suggest separating the HTML, CSS, and JavaScript elements into separate files. The big win here, apart from easier navigation as the code base gets larger, is that you can take full advantage of your editor’s specific language enhancements such as solid syntax highlighting and code linting (essentially syntax checking on the fly). While some editors and libraries claim to deal with embedded CSS and JavaScript, I haven’t found an adequate one.

To use CSS and JavaScript files, we just include them in the HTML using <link> and <script> tags like this:

<!DOCTYPE html>
<meta charset="utf-8">
<link rel="stylesheet" href="style.css" />
<body>
    <!-- page content -->
    <script type="text/javascript" src="script.js"></script>
</body>

Marking Up Content

Visualizations often use a small subset of the available HTML tags, usually building the page programmatically by attaching elements to the DOM tree.

The most common tag is the <div>, marking a block of content. <div>s can contain other <div>s, allowing for a tree hierarchy, the branches of which are used during element selection and to propagate user interface (UI) events such as mouse clicks. Here’s a simple <div> hierarchy:

<div id="my-chart-wrapper" class="chart-holder dev">
    <div id="my-chart" class="bar chart">
         this is a placeholder, with parent #my-chart-wrapper
    </div>
</div>

Note the use of id and class attributes. These are used when you’re selecting DOM elements and to apply CSS styles. IDs are unique identifiers; each element should have only one and there should be only one occurrence of any particular id per page. The class can be applied to multiple elements, allowing bulk selection, and each element can have multiple classes.

For textual content, the main tags are <p>, <h*>, and <br>. You’ll be using these a lot. This code produces Figure 4-3:

<h2>A Level-2 Header</h2>
<p>A paragraph of body text with a line break here..<br>
and a second paragraph...</p>

Figure 4-3. An h2 header and text

Header tags are ordered by decreasing size, from the largest, <h1>, down to the smallest, <h6>.

<div>, <h*>, and <p> are what is known as block elements. They normally begin and end with a new line. The other class of tag is inline elements, which display without line breaks. Images <img>, hyperlinks <a>, and the <span> tag for inline text are among these:

<div id="inline-examples">
    <img src="path/to/image.png" id="prettypic"> <!-- 1 -->
    <p>This is a <a href="link-url">link</a> to
        <span class="url">link-url</span></p> <!-- 2 -->
</div>

  1. Note that we don’t need a closing tag for images.

  2. The span and link are continuous in the text.

Other useful tags include lists, ordered <ol> and unordered <ul>:

<ol>
    <li>First item</li>
    <li>Second item</li>
</ol>

HTML also has a dedicated <table> tag, useful if you want to present raw data in your visualization. This HTML produces the header and row in Figure 4-4:

<table id="chart-data">
  <tr> <!-- 1 -->
    <th>Name</th>
    <th>Category</th>
    <th>Country</th>
  </tr>
  <tr> <!-- 2 -->
    <td>Albert Einstein</td>
    <td>Physics</td>
    <td>Switzerland</td>
  </tr>
</table>

  1. The header row

  2. The first row of data

Figure 4-4. An HTML table

When you are making web visualizations, the most often used of the tags above are the textual tags, which provide instructions, information boxes, and so on. But the meat of our JavaScript efforts will probably be devoted to building DOM branches rooted on the Scalable Vector Graphics (SVG) <svg> and <canvas> tags. On most modern browsers, the <canvas> tag also supports a 3D WebGL context, allowing OpenGL visualizations to be embedded in the page.

We’ll deal with SVG, the focus of this book and the format used by the mighty D3 library, in “Scalable Vector Graphics”. Now let’s look at how we add style to our content blocks.

CSS

CSS, short for Cascading Style Sheets, is a language for describing the look and feel of a web page. Though you can hardcode style attributes into your HTML, it’s generally considered bad practice.5 It’s much better to label your tag with an id or class and use that to apply styles in the stylesheet.

The key word in CSS is cascading. CSS follows a precedence rule so that in the case of a clash, the latest style overrides earlier ones. This means the order of inclusion for sheets is important. Usually, you want your stylesheet to be loaded last so that you can override both the browser defaults and styles defined by any libraries you are using.

Figure 4-5 shows how CSS is used to apply styles to the HTML elements. First you select the element using hashes (#) to indicate a unique ID and dots (.) to select members of a class. You then define one or more property/value pairs. Note that the font-family property can be a list of fallbacks, in order of preference. Here we want the browser default font-family of serif (capped strokes) to be replaced with the more modern sans-serif, with Helvetica Neue as our first choice.

Figure 4-5. Styling the page with CSS

Understanding CSS precedence rules is key to successfully applying styles. In a nutshell, the order is:

  1. !important after a CSS property value trumps all.

  2. The more specific the better (i.e., ids override classes).

  3. The order of declaration: last declaration wins, subject to 1 and 2.

So, for example, say we have a <span> of class alert:

<span class="alert" id="special-alert">
something to be alerted to</span>

Putting the following in our style.css file will make the alert text red and bold:

.alert { font-weight:bold; color:red }

If we then add this to the style.css, the id color black will override the class color red, while the class font-weight remains bold:

#special-alert {background: yellow; color:black}

To enforce the color red for alerts, we can use the !important directive:6

.alert { font-weight:bold; color:red !important }

If we then add another stylesheet, style2.css, after style.css:

<link rel="stylesheet" href="style.css" type="text/css" />
<link rel="stylesheet" href="style2.css" type="text/css" />

with style2.css containing the following:

.alert { font-weight:normal }

then the font-weight of the alert will be reverted to normal because the new class style was declared last.
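The three precedence rules can be sketched as a small sort key. The Python 3 snippet below is a deliberately simplified model, not a CSS engine: it assumes every selector already matches our element, it counts specificity only as (ids, classes, tags), and the names specificity, resolve, and RULES are hypothetical. It replays the alert example, with the rules listed in the order they are declared across style.css and style2.css.

```python
import re


def specificity(selector):
    """Very simplified specificity: (number of ids, classes, tag names)."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    tags = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return (ids, classes, tags)


def resolve(rules):
    """Return the winning value for each property.

    `rules` is a list of (selector, property, value, important) tuples in
    the order they are declared across all stylesheets.
    """
    winners = {}
    for order, (selector, prop, value, important) in enumerate(rules):
        # Rule 1: !important first; rule 2: specificity; rule 3: declaration order.
        rank = (important, specificity(selector), order)
        if prop not in winners or rank > winners[prop][0]:
            winners[prop] = (rank, value)
    return {prop: value for prop, (rank, value) in winners.items()}


RULES = [
    # style.css
    (".alert", "font-weight", "bold", False),
    (".alert", "color", "red", True),  # color:red !important
    ("#special-alert", "background", "yellow", False),
    ("#special-alert", "color", "black", False),
    # style2.css, loaded after style.css
    (".alert", "font-weight", "normal", False),
]

if __name__ == "__main__":
    for prop, value in sorted(resolve(RULES).items()):
        print(prop, "->", value)
```

Comparing the rank tuples left to right mirrors the order of the rules: declaration order only matters when importance and specificity are tied, which is exactly why the later .alert rule wins for font-weight but cannot touch the !important color.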

JavaScript

JavaScript is the only first-class, browser-based programming language. In order to do anything remotely advanced (and that includes all modern web visualizations), you should have a JavaScript grounding. Other languages that claim to make client-side/browser programming easier, such as TypeScript, CoffeeScript, and the like, compile to JavaScript, which means debugging either uses (generally flaky) source-map files or involves understanding the generated JavaScript. Nearly all web visualization examples, the ones you should aim to be learning from, are in JavaScript, and voguish alternatives have a way of fading with time. In essence, good competence in (if not mastery of) JavaScript is a prerequisite for interesting web visualizations.

The good news for Pythonistas is that JavaScript is actually quite a nice language once you’ve tamed a few of its more awkward quirks.7 As I showed in Chapter 2, JavaScript and Python have a lot in common and it’s usually easy to translate from one to the other.

Data

The data needed to fuel your web visualization will be provided by the web server as static files (e.g., JSON or CSV files) or dynamically through some kind of web API (e.g., RESTful APIs), usually retrieving the data server-side from a database. We’ll be covering all these forms in Part IV.

Although a lot of data used to be delivered in XML form, modern web visualization is predominantly about JSON and, to a lesser extent, CSV or TSV files.

JSON (short for JavaScript Object Notation) is the de facto web visualization data standard and I recommend you learn to love it. It obviously plays very nicely with JavaScript, but its structure will also be familiar to Pythonistas. As we saw in “JSON”, reading and writing JSON data with Python is a snap. Here’s a little example of some JSON data:

{
  "firstName": "Groucho",
  "lastName": "Marx",
  "siblings": ["Harpo", "Chico", "Gummo", "Zeppo"],
  "nationality": "American",
  "yearOfBirth": 1890
}
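As a quick reminder of how this looks from the Python side, here is a small sketch using the standard library's json module to read the record above and write it back out. The variable names GROUCHO_JSON, person, and round_trip are just illustrative.

```python
import json

GROUCHO_JSON = """
{
  "firstName": "Groucho",
  "lastName": "Marx",
  "siblings": ["Harpo", "Chico", "Gummo", "Zeppo"],
  "nationality": "American",
  "yearOfBirth": 1890
}
"""

# JSON objects become dicts, arrays become lists, and numbers become int/float.
person = json.loads(GROUCHO_JSON)

# Serializing back to a string; indent=2 just makes the output readable.
round_trip = json.dumps(person, indent=2)

if __name__ == "__main__":
    print(person["firstName"], "had", len(person["siblings"]), "siblings")
    print(round_trip)
```

Because the JSON types map so directly onto Python's dicts and lists, the same record can pass between a Python server and a JavaScript front end with no custom parsing on either side.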

Chrome’s Developer Tools

The arms race in JavaScript engines in recent years, which has produced huge increases in performance, has been matched by an increasingly sophisticated range of development tools built into the various browsers. Firefox’s Firebug led the pack for a while but Chrome’s Developer Tools have surpassed it, and are adding functionality all the time. There’s now a huge amount you can do with Chrome’s tabbed tools, but here I’ll introduce the two most useful tabs, the HTML+CSS-focused Elements and the JavaScript-focused Sources. Both of these work in complement to Chrome’s developer console, demonstrated in “JavaScript”.

The Elements Tab

To access the Elements tab, select More Tools→Developer Tools from the righthand options menu or use the Ctrl-Shift-I keyboard shortcut (Cmd-Option-I on OS X).

Figure 4-6 shows the Elements tab at work. You can select DOM elements on the page by using the lefthand magnifying glass and see their HTML branch in the left panel. The right panel allows you to see CSS styles applied to the element and look at any event listeners that are attached or DOM properties.

Figure 4-6. Chrome Developer Tools Elements tab

One really cool feature of the Elements tab is that you can interactively change element styling for both CSS styles and attributes.8 This is a great way to refine the look and feel of your data visualizations.

Chrome’s Elements tab provides a great way to explore the structure of a page, finding out how the different elements are positioned. This is a good way to get your head around positioning content blocks with the position and float properties. Seeing how the pros apply CSS styles is a really good way to up your game and learn some useful tricks.

The Sources Tab

The Sources tab allows you to see any JavaScript included in the page. Figure 4-7 shows the tab at work. In the lefthand panel, you can select a script or an HTML file with embedded <script> tagged JavaScript. As shown, you can place a breakpoint in the code, load the page, and, on break, see the call stack and any scoped or global variables. These breakpoints are parametric (Chrome calls them conditional breakpoints), so you can set conditions for them to trigger, which is handy if you want to catch and step through a particular configuration. On break, you have the standard controls to step into, out of, and over functions, and so on.

Figure 4-7. Chrome Developer Tools Sources tab

The Sources tab is a fantastic resource and is the main reason why I hardly ever turn to console logging when trying to debug JavaScript. In fact, where JS debugging was once a hit-and-miss black art, it is now almost a pleasure.

Other Tools

There’s a huge amount of functionality in those Chrome Developer Tools tabs and they are being updated almost daily. You can do memory and CPU timelines and profiling, monitor your network downloads, and test out your pages for different form factors. But you’ll spend 99% of your time as a data visualizer in the Elements and Sources tabs.

A Basic Page with Placeholders

Now that we have covered the major elements of a web page, let’s put them together. Most web visualizations start off as HTML and CSS skeletons, with placeholder elements ready to be fleshed out with a little JavaScript plus data (see “Single-Page Apps”).

We’ll first need our HTML skeleton, using the code in Example 4-1. This consists of a tree of <div> content blocks defining three chart-elements: a header, main, and sidebar section. We’ll save this file as index.html.

Example 4-1. The file index.html, our HTML skeleton
<!DOCTYPE html>
<meta charset="utf-8">

<link rel="stylesheet" href="style.css" type="text/css" />

<body>

  <div id="chart-holder" class="dev">
    <div id="header">
      <h2>A Catchy Title Coming Soon...</h2>
      <p>Some body text describing what this visualization is all
      about and why you should care.</p>
    </div>
    <div id="chart-components">
      <div id="main">
        A placeholder for the main chart.
      </div><div id="sidebar">
        <p>Some useful information about the chart,
          probably changing with user interaction.</p>
      </div>
    </div>
  </div>

  <script src="script.js"></script>
</body>

Now that we have our HTML skeleton, we want to style it using some CSS. This will use the classes and ids of our content blocks to adjust size, position, background color, etc. To apply our CSS, in Example 4-1 we import a style.css file, shown in Example 4-2.

Example 4-2. The style.css file, providing our CSS styling
body {
    background: #ccc;
    font-family: sans-serif;
}

div.dev { /* 1 */
    border: solid 1px red;
}

div.dev div {
    border: dashed 1px green;
}

div#chart-holder {
    width: 600px;
    background: white;
    margin: auto;
    font-size: 16px;
}

div#chart-components {
    height: 400px;
    position: relative; /* 2 */
}

div#main, div#sidebar {
    position: absolute; /* 3 */
}

div#main {
    width: 75%;
    height: 100%;
    background: #eee;
}

div#sidebar {
    right: 0; /* 4 */
    width: 25%;
    height: 100%;
}

  1. This dev class is a handy way to see the border of any visual blocks, which is useful for visualization work.

  2. Makes chart-components the relative parent.

  3. Makes the main and sidebar positions relative to chart-components.

  4. Positions this block flush with the right wall of chart-components.

We use absolute positioning of the main and sidebar chart elements (Example 4-2). There are various ways to position the content blocks with CSS, but absolute positioning gives you explicit control over their placement, which is a must if you want to get the look just right.

After specifying the size of the chart-components container, the main and sidebar child elements are sized and positioned using percentages of their parent. This means any changes to the size of chart-components will be reflected in its children.

With our HTML and CSS defined, we can examine the skeleton by firing up Python’s single-line SimpleHTTPServer in the project directory containing the index.html and style.css files defined in Examples 4-1 and 4-2, like so:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

Figure 4-8 shows the resulting page with the Elements tab open, displaying the page’s DOM tree.

The chart’s content blocks are now positioned and sized correctly, ready for JavaScript to add some engaging content.

Figure 4-8. Building a basic web page

Filling the Placeholders with Content

With our content blocks defined in HTML and positioned with CSS, a modern data visualization uses JavaScript to construct its interactive charts, menus, tables, and the like. There are many ways to create visual content (aside from image or multimedia tags) in your modern browser, the main ones being:

  • Scalable Vector Graphics (SVG) using special HTML tags

  • Drawing to a 2D canvas context

  • Drawing to a 3D canvas WebGL context, allowing a subset of OpenGL commands

  • Using modern CSS to create animations, graphic primitives, and more.

Because SVG is the language of choice for D3, in many ways the biggest JavaScript dataviz library, many of the cool web data visualizations you have seen, such as those by the New York Times, are built using it. Broadly speaking, unless you anticipate having lots (>1,000) of moving elements in your visualization or need to use a specific canvas-based library, SVG is probably the way to go.

By using vectors instead of pixels to express its primitives, SVG will generally produce cleaner graphics that respond smoothly to scaling operations. It’s also much better at handling text, a crucial consideration for many visualizations. Another key advantage of SVG is that user interaction (e.g., mouse hovering or clicking) is native to the browser, being part of the standard DOM event handling.9 A final point in its favor is that because the graphic components are built on the DOM, you can inspect and adapt them using your browser’s development tools (see “Chrome’s Developer Tools”). This can make debugging and refining your visualizations much easier than trying to find errors in the canvas’s relatively black box.

canvas graphics contexts come into their own when you need to move beyond simple graphic primitives like circles and lines, such as when incorporating images like PNGs and JPGs. canvas is usually considerably more performant than SVG, so anything with lots of moving elements10 is better off rendered to a canvas. If you want to be really ambitious or move beyond 2D graphics, you can even unleash the awesome power of modern graphics cards by using a special form of canvas context, the OpenGL-based WebGL context. Just bear in mind that what would be simple user interaction with SVG (e.g., clicking on a visual element) often has to be derived from mouse coordinates manually, which adds a tricky layer of complexity.

The Nobel Prize data visualization realized at the end of this book’s toolchain is built primarily with D3, so SVG graphics are the focus of this book. Being comfortable with SVG is fundamental to modern web-based dataviz, so let’s take a little primer.

Scalable Vector Graphics

It doesn’t seem long ago that Scalable Vector Graphics seemed all washed up. Browser coverage was spotty and few big libraries were using it. It seemed inevitable that the canvas tag would act as a gateway to full-fledged, rendered graphics based on leveraging the awesome power of modern graphics cards. Pixels—not vectors—would be the building block of web graphics and SVG would go down in history as a valiant but ultimately doomed “nice idea.”

D3 might not single-handedly have rescued SVG in the browser, but it must take the lion’s share of responsibility. By demonstrating what can be done by using data to manipulate or drive the web page’s DOM, it has provided a compelling use case for SVG. D3 really needs its graphic primitives to be part of the document hierarchy, in the same domain as the other HTML content. In this sense it needed SVG as much as SVG needed it.

The <svg> Element

All SVG creations start with an <svg> root tag. All graphical elements, such as circles and lines, and groups thereof, are defined on this branch of the DOM tree. Example 4-3 shows a little SVG context we’ll use in upcoming demonstrations, a light-gray rectangle with id chart. We also include the D3 library, loaded from d3js.org, and a script.js JavaScript file in the project folder.

Example 4-3. A basic SVG context
<!DOCTYPE html>
<meta charset="utf-8">
<!-- A few CSS style rules -->
<style>
  svg#chart {
    background: lightgray;
  }
</style>

<svg id="chart" width="300" height="225">
</svg>

<!-- Third-party libraries and our JS script. -->
<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="script.js"></script>

Now that we’ve got our little SVG canvas in place, let’s start doing some drawing.

The <g> Element

We can group shapes within our <svg> element by using the group <g> element. As we’ll see in “Working with Groups”, shapes contained in a group can be manipulated together, including changing their position, scale, or opacity.

Circles

Creating SVG visualizations, from the humblest little static bar chart to full-fledged interactive, geographic masterpieces, involves putting together elements from a fairly small set of graphical primitives such as lines, circles, and the very powerful paths. Each of these elements will have its own DOM tag, which will update as it changes.11 For example, its x and y attributes will change to reflect any translations within its <svg> or group (<g>) context.

Let’s add a circle to our <svg> context to demonstrate:

<svg id="chart" width="300" height="225">
  <circle r="15" cx="100" cy="50"></circle>
</svg>

This produces Figure 4-9. Note that the y coordinate is measured from the top of the <svg> '#chart' container, a common graphic convention.

Figure 4-9. An SVG circle
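Because data values usually grow upward while SVG’s y coordinate grows downward, charting code typically flips the y value before drawing. A minimal sketch, using a hypothetical flipY helper and the 225-pixel height of our chart:

```javascript
// Hypothetical helper: convert a height measured from the bottom of the
// chart into an SVG y coordinate measured from the top.
function flipY(y, chartHeight) {
    return chartHeight - y;
}

console.log(flipY(175, 225)); // 50, the cy of the circle in Figure 4-9
console.log(flipY(0, 225));   // 225, the bottom edge of the chart
```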

Now let’s see how we go about applying styles to SVG elements.

Applying CSS Styles

The circle in Figure 4-9 is fill-colored light blue using CSS styling rules:

#chart circle{ fill: lightblue }

In modern browsers, you can set most visual SVG styles using CSS, including fill, stroke, stroke-width, and opacity. So if we wanted a thick, semi-transparent green line (with id total) we could use the following CSS:

#chart line#total {
    stroke: green;
    stroke-width: 3px;
    opacity: 0.5;
}

You can also set the styles as attributes of the tags, though CSS is generally preferable.

<circle r="15" cx="100" cy="50" fill="lightblue"></circle>
Tip

Which SVG features can be set by CSS and which can’t is a source of some confusion and plenty of gotchas. The SVG spec distinguishes between element properties and attributes, the former being more likely to be found among the valid CSS styles. You can investigate the valid CSS properties using Chrome’s Elements tab and its autocomplete. Also, be prepared for some surprises. For example, SVG text is colored by the fill, not color, property.

For fill and stroke, there are various color conventions you can use:

  • Named HTML colors, such as lightblue

  • HTML hex codes (#RRGGBB); for example, white is #FFFFFF

  • RGB values; for example, red = rgb(255, 0, 0)

  • RGBA values, where A is an alpha channel (0–1); for example, half-transparent blue is rgba(0, 0, 255, 0.5)

In addition to adjusting the color’s alpha channel with RGBA, you can fade the SVG elements using their opacity property. Opacity is used a lot in D3 animations.

Stroke width is measured in pixels by default, but you can also specify it in other units, such as points.
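The hex and RGB conventions describe the same colors, which a few lines of JavaScript can make concrete. This is only an illustrative sketch; rgbToHex is a hypothetical helper, not part of D3 or the browser:

```javascript
// Hypothetical helper: convert 0-255 RGB channels to a #rrggbb hex code.
function rgbToHex(r, g, b) {
    return '#' + [r, g, b].map(function(channel) {
        var hex = channel.toString(16);
        return hex.length === 1 ? '0' + hex : hex; // pad to two digits
    }).join('');
}

console.log(rgbToHex(255, 0, 0));      // '#ff0000', the same red as rgb(255, 0, 0)
console.log(rgbToHex(255, 255, 255));  // '#ffffff', white
```

Hex codes are case-insensitive, so #ffffff and #FFFFFF are the same color.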

Lines, Rectangles, and Polygons

We’ll add a few more elements to our chart to produce Figure 4-10.

Figure 4-10. Adding a few elements to our dummy chart

First we’ll add a couple of simple axis lines to our chart, using the <line> tag. Line positions are defined by a start coordinate (x1, y1) and an end one (x2, y2):

<line x1="20" y1="20" x2="20" y2="130"></line>
<line x1="20" y1="130" x2="280" y2="130"></line>

We’ll also add a dummy legend box in the top-right corner using an SVG rectangle. Rectangles are defined by x and y coordinates relative to their parent container, and a width and height:

<rect x="240" y="5" width="55" height="30"></rect>

You can create irregular polygons using the <polygon> tag, which takes a list of coordinate pairs. Let’s make a triangle marker in the bottom right of our chart:

<polygon points="210,100, 230,100, 220,80"></polygon>
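When the coordinates come from data, the points attribute is usually assembled in code. A minimal sketch, assuming the coordinates are held as an array of [x, y] pairs (polygonPoints is a hypothetical helper):

```javascript
// Hypothetical helper: turn [[x, y], ...] into a polygon points string.
function polygonPoints(coords) {
    return coords.map(function(p) {
        return p[0] + ',' + p[1];
    }).join(' ');
}

console.log(polygonPoints([[210, 100], [230, 100], [220, 80]]));
// '210,100 230,100 220,80'
```

The coordinate pairs here are separated by spaces rather than the commas used in the handwritten tag; SVG accepts either.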

We’ll style the elements with a little CSS:

#chart circle {fill: lightblue}
#chart line {stroke: #555555; stroke-width: 2}
#chart rect {stroke: red; fill: white}
#chart polygon {fill: green}

Now that we’ve got a few graphical primitives in place, let’s see how we add some text to our dummy chart.

Text

One of the key strengths of SVG over the rasterized canvas context is how it handles text. Vector-based text tends to look a lot clearer than its pixelated counterparts and benefits from smooth scaling, too. You can also adjust stroke and fill properties, just like any SVG element.

Let’s add a bit of text to our dummy chart: a title and labeled y-axis (see Figure 4-11).

We place text using x and y coordinates. One important property is the text-anchor, which stipulates where the text is placed relative to its x position. The options are start, middle, and end; start is the default.

We can use the text-anchor property to center our chart title. We set the x coordinate at half the chart width and then set the text-anchor to middle:

<text id="title" text-anchor="middle" x="150" y="20">
  A Dummy Chart
</text>

As with all SVG primitives, we can apply scaling and rotation transforms to our text. To label our y-axis, we’ll need to rotate the text to the vertical (Example 4-4). By convention, rotation angles are given in degrees and measured clockwise, so we’ll want a counterclockwise, –90 degree rotation. By default, rotations are around the (0,0) point of the element’s container (<svg> or group <g>). We want to rotate our text around its own position, so we first move the rotation point using the extra arguments to the rotate function. We also set the text-anchor to end, so that the y-axis label is rotated about its end point.

Example 4-4. Rotating text
<text x="20" y="20" transform="rotate(-90,20,20)"
      text-anchor="end" dy="0.71em">y axis label</text>

In Example 4-4, we make use of the text’s dy attribute, which, along with dx, can be used to make fine adjustments to the text’s position. In this case, we want to lower it so that when rotated counterclockwise it will be to the right of the y-axis.
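To see what the extra arguments to rotate do, we can work through the geometry by hand. The following sketch uses a hypothetical rotateAbout function that applies the standard rotation formula about a pivot point, with SVG’s convention that positive angles turn clockwise on screen:

```javascript
// Rotate point (x, y) by deg degrees about the pivot (cx, cy). Because
// SVG's y-axis points down, positive angles turn clockwise on screen.
function rotateAbout(x, y, deg, cx, cy) {
    var t = deg * Math.PI / 180;
    var dx = x - cx, dy = y - cy;
    return [
        cx + dx * Math.cos(t) - dy * Math.sin(t),
        cy + dx * Math.sin(t) + dy * Math.cos(t)
    ];
}

// A point 10 units to the right of the pivot (20, 20), rotated by -90
var p = rotateAbout(30, 20, -90, 20, 20);
console.log(p.map(Math.round)); // [ 20, 10 ]: now 10 units above the pivot
```

The pivot itself does not move, which is why text anchored at (20, 20) stays attached to that point as it turns.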

SVG text elements can also be styled with CSS. Here we set the font-family of the chart to sans-serif and the font-size to 16px, using the title id to make that a little bigger:

#chart {
    background: #eee;
    font-family: sans-serif;
}
#chart text{ font-size: 16px }
#chart text#title{ font-size: 18px }
Figure 4-11. Some SVG text

Note that the text elements inherit font-family from the chart’s CSS rule; you don’t have to set it on the text elements themselves.

Paths

Paths are the most complicated and powerful SVG element, enabling the creation of multiline, multicurve component paths that can be closed and filled, creating pretty much any shape you want. A simple example is adding a little chart line to our dummy chart to produce Figure 4-12.

Figure 4-12. A red line path from the chart axis

The red path in Figure 4-12 is produced by the following SVG:

<path d="M20 130L60 70L110 100L160 45"></path>

The path’s d attribute specifies the series of operations needed to make the red line. Let’s break it down:

  • “M20 130”: move to coordinate (20, 130)

  • “L60 70”: draw a line to (60, 70)

  • “L110 100”: draw a line to (110, 100)

  • “L160 45”: draw a line to (160, 45)

You can imagine d as a set of instructions to a pen: M raises the pen from the canvas and moves it to a point, while L draws a line from the current position to the next point.

A little CSS styling is needed. Note that the fill is set to none; otherwise the path would be treated as a closed shape, as if a line were drawn from its end point back to its beginning, and any enclosed areas would be filled in with the default color, black:

#chart path {stroke: red; fill: none}
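In practice, path strings like this are generated from data rather than typed by hand. D3 ships its own path generators for this; the following is just a minimal sketch of the idea, using a hypothetical linePath helper that takes an array of [x, y] pairs:

```javascript
// Hypothetical helper: build a path d string of straight segments.
// The first point gets a moveto (M); every later point gets a lineto (L).
function linePath(points) {
    return points.map(function(p, i) {
        return (i === 0 ? 'M' : 'L') + p[0] + ' ' + p[1];
    }).join('');
}

console.log(linePath([[20, 130], [60, 70], [110, 100], [160, 45]]));
// 'M20 130L60 70L110 100L160 45'
```

The output matches the d attribute of the red line in Figure 4-12.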

As well as the moveto 'M' and lineto 'L', the path has a number of other commands to draw arcs, Bézier curves, and the like. SVG arcs and curves are commonly used in dataviz work, with many of D3’s libraries making use of them.12 Figure 4-13 shows some SVG elliptical arcs created by the following code:

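<svg id="chart" width="300" height="150">
  <path d="M40 40
           A30 40  0 0 1  80 80
           A50 50  0 0 1  160 80
           A30 30  0 0 1  190 80
  "></path>
</svg>

The first arc command, “A30 40 0 0 1 80 80”, breaks down as follows:

  • “A30 40” and “80 80”: having moved to position (40, 40), draw an elliptical arc with x-radius 30, y-radius 40, and end point (80, 80).

  • “0 0 1”: the first value is the rotation of the ellipse’s x-axis, here 0 degrees. The last two flags (0, 1) are large-arc-flag, which specifies whether to use the larger or smaller arc of the ellipse, and sweep-flag, which specifies which of the two possible ellipses defined by the start and end points to use (equivalently, the direction in which the arc is drawn).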

Figure 4-13. Some SVG elliptical arcs

The key flags used in the elliptical arc (large-arc-flag and sweep-flag) are, like most things geometric, better demonstrated than described. Figure 4-14 shows the effect of changing the flags for the same relative beginning and end points, like so:

<svg id="chart" width="300" height="150">
  <path d="M40 80
           A30 40  0 0 1  80 80
           A30 40  0 0 0  120  80
           A30 40  0 1 0  160  80
           A30 40  0 1 1  200  80
  "></path>
</svg>
Figure 4-14. Changing the elliptic-arc flags
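If you want to experiment with the flags yourself, it helps to generate the arc commands rather than edit the string by hand. A minimal sketch, using a hypothetical arcTo helper that fixes the x-axis rotation at 0, as in the examples above:

```javascript
// Hypothetical helper: build one elliptical-arc command. The x-axis
// rotation is fixed at 0; largeArc and sweep are booleans for the two flags.
function arcTo(rx, ry, largeArc, sweep, x, y) {
    return 'A' + [rx, ry, 0, largeArc ? 1 : 0, sweep ? 1 : 0, x, y].join(' ');
}

// The four flag combinations used for Figure 4-14
var arcs = 'M40 80' +
    arcTo(30, 40, false, true, 80, 80) +
    arcTo(30, 40, false, false, 120, 80) +
    arcTo(30, 40, true, false, 160, 80) +
    arcTo(30, 40, true, true, 200, 80);

console.log(arcs);
// 'M40 80A30 40 0 0 1 80 80A30 40 0 0 0 120 80A30 40 0 1 0 160 80A30 40 0 1 1 200 80'
```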

As well as lines and arcs, the path element offers a number of Bézier curves, including quadratic, cubic, and compounds of the two. With a little work, these can create any line path you want. There’s a nice run-through on SitePoint with good illustrations.

For the definitive list of path commands and their arguments, go to the W3C source. And for a nice round-up, see Jakob Jenkov’s introduction.

Scaling and Rotating

As befits their vector nature, all SVG elements can be transformed by geometric operations. The most commonly used are rotate, translate, and scale, but you can also apply skewing using skewX and skewY or use the powerful, multipurpose matrix transform.

Let’s demonstrate the most popular transforms, using a set of identical rectangles. The transformed rectangles in Figure 4-15 are achieved like so:

<svg id="chart" width="300" height="150">
  <rect width="20" height="40" transform="translate(60, 55)"
        fill="blue"/>
  <rect width="20" height="40" transform="translate(120, 55),
        rotate(45)" fill="blue"/>
  <rect width="20" height="40" transform="translate(180, 55),
        scale(0.5)" fill="blue"/>
  <rect width="20" height="40" transform="translate(240, 55),
        rotate(45),scale(0.5)" fill="blue"/>
</svg>
Figure 4-15. Some SVG transforms: rotate(45), scale(0.5), scale(0.5), then rotate(45)
Note

The order in which transforms are applied is important. A rotation of 45 degrees clockwise followed by a translation along the x-axis will see the element moved southeasterly, whereas the reverse operation moves it along the x-axis and then rotates it in place.
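We can check the effect of ordering with a little arithmetic. In this sketch, translate, rotate, and applyTransforms are hypothetical helpers that mimic how a transform attribute maps a point in the element’s own coordinates to its container: the list is written left to right, as in the attribute, and applied to the point from right to left.

```javascript
// Each helper returns a function mapping a point [x, y] to a new point.
function translate(tx, ty) {
    return function(p) { return [p[0] + tx, p[1] + ty]; };
}
function rotate(deg) {
    var t = deg * Math.PI / 180;
    return function(p) {
        return [p[0] * Math.cos(t) - p[1] * Math.sin(t),
                p[0] * Math.sin(t) + p[1] * Math.cos(t)];
    };
}

// A transform list is applied to a point from right to left.
function applyTransforms(list, point) {
    return list.reduceRight(function(p, transform) {
        return transform(p);
    }, point);
}

// Round to two decimal places for display
function rounded(p) {
    return p.map(function(v) { return Math.round(v * 100) / 100; });
}

var origin = [0, 0];
// Equivalent to transform="rotate(45) translate(100, 0)"
console.log(rounded(applyTransforms([rotate(45), translate(100, 0)], origin)));
// [ 70.71, 70.71 ]: moved southeast
// Equivalent to transform="translate(100, 0) rotate(45)"
console.log(rounded(applyTransforms([translate(100, 0), rotate(45)], origin)));
// [ 100, 0 ]: moved along the x-axis only
```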

Working with Groups

Often when you are constructing a visualization, it’s helpful to group the visual elements. A couple of particular uses are:

  • When you require local coordinate schemes (e.g., if you have a text label for an icon and you want to specify its position relative to the icon, not the whole <svg> canvas).

  • If you want to apply a scaling and/or rotation transformation to a subset of the visual elements.

SVG has a group <g> tag for this, which you can think of as a mini canvas within the <svg> canvas. Groups can contain groups, allowing for very flexible geometric mappings.13

Example 4-5 groups shapes in the center of the canvas, producing Figure 4-16. Note that the position of circle, rect, and path elements is relative to the translated group.

Example 4-5. Grouping SVG shapes
<svg id="chart" width="300" height="150">
  <g id="shapes" transform="translate(150,75)">
    <circle cx="50" cy="0" r="25" fill="red" />
    <rect x="30" y="10" width="40" height="20" fill="blue" />
    <path d="M-20 -10L50 -10L10 60Z" fill="green" />
    <circle r="10" fill="yellow" />
  </g>
</svg>
Figure 4-16. Grouping shapes with SVG <g> tag

If we now apply a transform to the group, all shapes within it will be affected. Figure 4-17 shows the result of scaling Figure 4-16 by a factor of 0.5 and then rotating it 90 degrees, which we achieve by adapting the transform attribute, like so:

<svg id="chart" width="300" height="150">
  <g id="shapes"
     transform="translate(150,75),scale(0.5),rotate(90)">
     ...
  </g>
</svg>
Figure 4-17. Transforming an SVG group

Layering and Transparency

The order in which the SVG elements are added to the DOM tree is important, with later elements taking precedence, layering over others. In Figure 4-16, for example, the triangle path obscures the red circle and blue rectangle and is in turn obscured by the yellow circle.

Manipulating the DOM ordering is an important part of JavaScripted dataviz (e.g., D3’s insert method allows you to place an SVG element before an existing one).

Element transparency can be manipulated using the alpha channel of rgba(R,G,B,A) colors or the more convenient opacity property. Both can be set using CSS. For overlaid elements, opacity is cumulative, as demonstrated by the color triangle in Figure 4-18, produced by the following SVG:

<style>
  #chart circle { opacity: 0.33 }
</style>

<svg id="chart" width="300" height="150">
  <g transform="translate(150, 75)">
    <circle cx="0" cy="-20" r="30" fill="red"/>
    <circle cx="17.3" cy="10" r="30" fill="green"/>
    <circle cx="-17.3" cy="10" r="30" fill="blue"/>
  </g>
</svg>
Figure 4-18. Manipulating opacity with SVG
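To put rough numbers on that cumulative effect, we can estimate how opaque a stack of overlapping elements becomes. This is a sketch that assumes standard “source-over” compositing, in which each layer lets through (1 - opacity) of whatever lies beneath it; combinedOpacity is a hypothetical helper:

```javascript
// Combined opacity of overlapping layers, assuming source-over compositing:
// the fraction of background still showing is the product of (1 - alpha).
function combinedOpacity(alphas) {
    var transparency = alphas.reduce(function(t, a) {
        return t * (1 - a);
    }, 1);
    return 1 - transparency;
}

console.log(combinedOpacity([0.33]).toFixed(2));              // 0.33
console.log(combinedOpacity([0.33, 0.33]).toFixed(2));        // 0.55
console.log(combinedOpacity([0.33, 0.33, 0.33]).toFixed(2));  // 0.70
```

So where all three circles overlap, roughly 70% of the background is covered, even though each circle on its own is only 33% opaque.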

The SVG elements demonstrated here were handcoded in HTML, but in data visualization work they are almost always added programmatically. Thus the basic D3 workflow is to add SVG elements to a visualization, using data files to specify their attributes and properties.

JavaScripted SVG

The fact that SVG graphics are described by DOM tags has a number of advantages over a black box such as the <canvas> context. For example, it allows nonprogrammers to create or adapt graphics and is a boon for debugging.

In web dataviz, pretty much all your SVG elements will be created with JavaScript, through a library such as D3. You can inspect the results of this scripting using the browser’s Elements tab (see “Chrome’s Developer Tools”), which is a great way to refine and debug your work (e.g., nailing an annoying visual glitch).

As a little taster for things to come, let’s use D3 to scatter a few red circles on an SVG canvas. The dimensions of the canvas and circles are contained in a data object sent to a chartCircles function.

We use a little HTML placeholder for the <svg> element:

<!DOCTYPE html>
<meta charset="utf-8">

<style>
  #chart circle {fill: red}
</style>

<body>
  <svg id="chart"></svg>

  <script src="http://d3js.org/d3.v3.min.js"></script>
  <script src="script.js"></script>
</body>

With our placeholder SVG chart element in place, a little D3 in the script.js file is used to turn some data into the scattered circles (see Figure 4-19):

// script.js

var chartCircles = function(data) {

    var chart = d3.select('#chart');
    // Set the chart height and width from data
    chart.attr('height', data.height).attr('width', data.width);
    // Create some circles using the data
    chart.selectAll('circle').data(data.circles)
        .enter()
        .append('circle')
        .attr('cx', function(d) { return d.x })
        .attr('cy', function(d) { return d.y })
        .attr('r', function(d) { return d.r });
};

var data = {
    width: 300, height: 150,
    circles: [
        {'x': 50, 'y': 30, 'r': 20},
        {'x': 70, 'y': 80, 'r': 10},
        {'x': 160, 'y': 60, 'r': 10},
        {'x': 200, 'y': 100, 'r': 5},
    ]
};

chartCircles(data);
Figure 4-19. D3-generated circles
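It can help to picture the markup that ends up in the DOM after a call like this. The following sketch builds an equivalent SVG string in plain JavaScript, without D3 or a browser; circlesMarkup is a hypothetical helper for illustration only, and takes a data object of the same shape as the one passed to chartCircles:

```javascript
// Hypothetical helper: build, as a string, the kind of markup that the
// D3 code produces in the DOM for a given data object.
function circlesMarkup(data) {
    var circles = data.circles.map(function(d) {
        return '  <circle cx="' + d.x + '" cy="' + d.y +
            '" r="' + d.r + '"></circle>';
    });
    return '<svg id="chart" height="' + data.height +
        '" width="' + data.width + '">\n' +
        circles.join('\n') + '\n</svg>';
}

var sample = {
    width: 300, height: 150,
    circles: [{x: 50, y: 30, r: 20}, {x: 70, y: 80, r: 10}]
};
console.log(circlesMarkup(sample));
// <svg id="chart" height="150" width="300">
//   <circle cx="50" cy="30" r="20"></circle>
//   <circle cx="70" cy="80" r="10"></circle>
// </svg>
```

You can compare this with what the browser’s Elements tab shows for the real D3-generated chart.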

We’ll see exactly how D3 works its magic in Chapter 16. For now, let’s summarize what we’ve learned in this chapter.

Summary

This chapter provided a basic set of modern web-development skills for the budding data visualizer. It showed how the various elements of a web page (HTML, CSS stylesheets, JavaScript, and media files) are delivered by HTTP and, on being received by the browser, combined to become the web page the user sees. We saw how content blocks are described, using HTML tags such as div and p, and then styled and positioned using CSS. We also covered Chrome’s Elements and Sources tabs, which are the key browser development tools. Finally we had a little primer in SVG, the language in which most modern web data visualizations are expressed. These skills will be extended when our toolchain reaches its D3 visualization and new ones will be introduced in context.

1 I bear the scars so you don’t have to.

2 The specification for modern JavaScript is defined by the Ecmascript lineage.

3 You can code style in HTML tags using the style attribute, but it’s generally bad practice. It’s better to use classes and ids defined in CSS.

4 As demonstrated by Mike Bostock, with a hat-tip to Paul Irish.

5 This is not the same as programmatically setting styles, which is a hugely powerful technique that allows styles to adapt to user interaction.

6 This is generally considered bad practice and is usually an indication of poorly structured CSS. Use with extreme caution, as it can make life very difficult for codevelopers.

7 These are succinctly discussed in Douglas Crockford’s famously short JavaScript: The Good Parts (O’Reilly).

8 Being able to play with attributes is particularly useful when trying to get Scalable Vector Graphics (SVG) to work.

9 With a canvas graphic context, you generally have to contrive your own event handling.

10 This number changes with time and the browser in question, but as a rough rule of thumb, SVG often starts to strain in the low thousands.

11 You should be able to use your browser’s development tools to see the tag attributes updating in real time.

12 Mike Bostock’s chord diagram is a nice example, and uses D3’s chord function.

13 For example, a body group can contain an arm group, which can contain a hand group, which can contain finger elements.
