Chapter 9. Visualization

One of the best ways to communicate the meaning of data is to extract the important parts and present them graphically. This is helpful both internally, as an exploration technique for spotting patterns that aren’t obvious from the raw values, and externally, as a way to present end users with succinct, understandable results. As the Web has turned graphs from static images into interactive objects, the line between presentation and exploration has blurred. The possibilities of the new medium have led to some of the fantastic new tools I cover in this section.

Gephi is an open source Java application that creates network visualizations from raw node and edge data. It’s very useful for understanding social network information; one of the project’s founders was hired by LinkedIn, and Gephi is now used for LinkedIn visualizations. It offers several layout algorithms, each with multiple parameters you can tweak to arrange the positions of the nodes in your data. If you want to make manual changes to either the input data or the positioning, you can do so through the Data Laboratory, and once your basic graph is laid out, the Preview tab lets you customize the exact appearance of the rendered result. Though Gephi is best known for its graphical desktop interface, you can also script many of its functions from automated backend tools using its toolkit library.
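Getting raw node and edge data into Gephi often starts with exporting it as spreadsheet-style CSV files. The sketch below writes a node table and an edge table from a list of (source, target) pairs. It assumes that Gephi’s CSV importer recognizes the column names `Id`/`Label` for nodes and `Source`/`Target`/`Type` for edges. The helper `write_gephi_csv`, the file names, and the sample friendship data are hypothetical and serve only as illustration.

```python
import csv
from pathlib import Path


def write_gephi_csv(edges, out_dir):
    """Write node and edge tables as CSV files intended for Gephi's spreadsheet import.

    edges: iterable of (source, target) node-label pairs (hypothetical input format).
    Returns (nodes_path, edges_path).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    edges = list(edges)

    # Collect every node that appears at either end of an edge, in first-seen order.
    nodes = list(dict.fromkeys(n for pair in edges for n in pair))

    nodes_path = out / "nodes.csv"
    edges_path = out / "edges.csv"

    # Node table: assumed column names Id and Label.
    with nodes_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for n in nodes:
            writer.writerow([n, n])

    # Edge table: assumed column names Source, Target and Type (social ties treated as undirected).
    with edges_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type"])
        for source, target in edges:
            writer.writerow([source, target, "Undirected"])

    return nodes_path, edges_path


if __name__ == "__main__":
    # Hypothetical sample data: a tiny friendship network.
    friendships = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]
    print(write_gephi_csv(friendships, "gephi_input"))
```

You would then load the two files through Gephi’s import dialog and choose a layout algorithm from there. Keeping nodes and edges in separate tables lets you add per-node attributes later, such as a community label for coloring, without changing the edge file.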

GraphViz is a command-line network graph visualization tool. It’s ...
