Visualizing Data

Name: Visualizing Data
Author: Ben Fry
ISBN: 9780596514556

December 2007

Beginner to intermediate

382 pages

10h 29m

English

O'Reilly Media, Inc.

Preface
The Audience for This Book; Background Information; Overview of the Book; Safari® Books Online; Acknowledgments; Conventions Used in This Book; Using Code Examples; We’d Like to Hear from You
1. The Seven Stages of Visualizing Data
Why Data Display Requires Planning; Too Much Information; Data Collection; Thinking About Data; Data Never Stays the Same; What Is the Question?; A Combination of Many Disciplines; Process; An Example; What Is the Question?; Acquire; Parse; Filter; Mine; Represent; Refine; Interact; Iteration and Combination; Principles; Each Project Has Unique Requirements; Avoid the All-You-Can-Eat Buffet; Know Your Audience; Onward
2. Getting Started with Processing
Sketching with Processing; Hello World; Hello Mouse; Exporting and Distributing Your Work; Saving Your Work; Examples and Reference; More About the size( ) Method; Loading and Displaying Data; Functions; Libraries Add New Features; Sketching and Scripting; Don’t Start by Trying to Build a Cathedral; Ready?
3. Mapping
Drawing a Map; Explanation of the Processing Code; Locations on a Map; Data on a Map; Two-Sided Data Ranges; Provide More Information with a Mouse Rollover (Interact); Updating Values over Time (Acquire, Mine); Smooth Interpolation of Values over Time (Refine); Using Your Own Data; Taking Data from the User; Next Steps
4. Time Series
Milk, Tea, and Coffee (Acquire and Parse); Cleaning the Table (Filter and Mine); A Simple Plot (Represent and Refine); Labeling the Current Data Set (Refine and Interact); Drawing Axis Labels (Refine); Year Labels; Labeling Volume on the Vertical Axis; Bringing It All Together and Titling Both Axes; Choosing a Proper Representation (Represent and Refine); Using Rollovers to Highlight Points (Interact); Ways to Connect Points (Refine); Showing Data As an Area; Further Refinements and Erasing Elements; Discrete Values with a Bar Chart (Represent); Text Labels As Tabbed Panes (Interact); Adding the Necessary Variables; Drawing Tabs Instead of a Single Title; Handling Mouse Input; Better Tab Images (Refine); Interpolation Between Data Sets (Interact); End of the Series
5. Connections and Correlations
Changing Data Sources; Problem Statement; Preprocessing; Retrieving Win/Loss Data (Acquire); Data source for baseball statistics; Unpacking the Win/Loss files (Mine and Filter); Introducing regular expressions; Retrieving Team Logos (Acquire, Refine); Retrieving Salary Data (Acquire, Parse, Filter); Using the Preprocessed Data (Acquire, Parse, Filter, Mine); Team Names and Codes; Team Salaries; Win-Loss Standings; Team Logos; Finishing the Setup; Displaying the Results (Represent); Returning to the Question (Refine); Highlighting the Lines; A Better Typeface for Numeric Data; A Word About Typography; Sophisticated Sorting: Using Salary As a Tiebreaker (Mine); Moving to Multiple Days (Interact); Drawing the Dates; Load Standings for the Entire Season; Switching Between Dates; Checking Our Progress; Smoothing Out the Interaction (Refine); Deployment Considerations (Acquire, Parse, Filter)
6. Scatterplot Maps
Preprocessing; Data from the U.S. Census Bureau (Acquire); Dealing with the Zip Code Database File (Parse and Filter); Building the Preprocessor; What about a binary data file or a database?; Loading the Data (Acquire and Parse); Drawing a Scatterplot of Zip Codes (Mine and Represent); Highlighting Points While Typing (Refine and Interact); Show the Currently Selected Point (Refine); Progressively Dimming and Brightening Points (Refine); Zooming In (Interact); Changing How Points Are Drawn When Zooming (Refine); Deployment Issues (Acquire and Refine); Next Steps
7. Trees, Hierarchies, and Recursion
Using Recursion to Build a Directory Tree; Caveats When Dealing with Files (Filter); Recursively Printing Tree Contents (Represent); Using a Queue to Load Asynchronously (Interact); Showing Progress (Represent); An Introduction to Treemaps; A Simple Treemap Library; A Simple Treemap Example; Which Files Are Using the Most Space?; Reading the Directory Structure (Acquire, Parse, Filter, Mine, Represent); Viewing Folder Contents (Interact); Improving the Treemap Display (Refine); Maintaining Context (Refine); Making Colors More Useful (Mine, Refine); Flying Through Files (Interact); Updating FileItem for zoom; Updating FolderItem; Adding a Folder Selection Dialog (Interact); Next Steps
8. Networks and Graphs
Simple Graph Demo; Porting from Java to Processing; Interacting with Nodes; A More Complicated Graph; Using Text As Input (Acquire); Reading a Book (Parse); Removing Stop Words (Filter); Smarter Addition of Nodes and Edges (Mine); Viewing the Book (Represent and Refine); Saving an Image in a Vector Format; Checking Our Work; Approaching Network Problems; Advanced Graph Example; Getting Started with Java IDEs; Step-by-step instructions if you’re new to Eclipse; Obtaining a Web Server Logfile (Acquire); Reading Apache Logfiles (Parse); A Look at the Other Source Files; Moving from Processing to Java; Helpful additions in Java 1.5 (J2SE 5.0) and later; Reading and Cleaning the Data (Acquire, Parse, Filter); Filtering site addresses and aliases; Filtering for useful page information; Bringing It All Together (Mine and Represent); Mining unused nodes: Maintaining performance and readability; Depicting Branches and Nodes (Represent and Refine); Playing with Data (Interact); Drawing Node Names (Represent and Refine); Drawing Visitor Paths (Represent and Refine); Mining Additional Information
9. Acquiring Data
Where to Find Data; Data Acquisition Ethics; Tools for Acquiring Data from the Internet; Wget and cURL; NcFTP and Links; Locating Files for Use with Processing; The Data Folder; Uniform Resource Locator (URL); Absolute Path to a Local File; Specifying Output Locations; Loading Text Data; Files Too Large for loadStrings( ); Reading Files Progressively; Reading Files Asynchronously with a Thread; Parsing Large Files As They Are Acquired; Dealing with Files and Folders; Using the Java File Object to Locate Files; Listing Files in a Folder; Listing files with a filter class; Sorting file lists; Handling Numbered File Sequences; Asynchronous Image Downloads; Using openStream( ) As a Bridge to Java; Dealing with Byte Arrays; Advanced Web Techniques; Handling Web Forms; Pretending to Be a Web Browser; Using a Database; Getting Started with MySQL; Using MySQL with Processing; Other Database Options; Performance Aspects of Databases in Interactive Applications; Dealing with a Large Number of Files
10. Parsing Data
Levels of Effort; Tools for Gathering Clues; Text Is Best; Tab-Separated Values (TSV); Comma-Separated Values (CSV); Text with Fixed Column Widths; Text Markup Languages; HyperText Markup Language (HTML); Embedding Tidy into a sketch; Is a parser necessary?; Using Swing’s built-in HTML parser; Parsing and manipulating tables from HTML files; Other HTML parser libraries; Writing a custom HTML parser; Extensible Markup Language (XML); Cleaning up XML; Example: Using the Processing XML library to read geocoding data; Other methods for parsing XML; JavaScript Object Notation (JSON); Regular Expressions (regexps); Grammars and BNF Notation; Compressed Data; GZIP Streams (GZ); PKZip files (ZIP); Other compression formats; Vectors and Geometry; Scalable Vector Graphics (SVG); OBJ and AutoCAD DXF; PostScript (PS) and Portable Document Format (PDF); Shapefile and Well-Known Text; Binary Data Formats; Excel Spreadsheets (XLS); dBASE/xBase (DBF); Arbitrary Binary Formats; Bit Shifting; DataInputStream; Advanced Detective Work; Watching Network Traffic
11. Integrating Processing with Java
Programming Modes; Basic; Continuous; Java; Additional Source Files (Tabs); Using .java Source Files; The Preprocessor; API Structure; Event Handling; The size( ) Method; The main( ) Method; The frame Object; Embedding PApplet into Java Applications; Two Models for Updating the Screen; Embedding in a Swing Application; Using Java Code in a Processing Sketch; Using the Code Folder to Add .jar Files to a Sketch; Packaging Code into Libraries; Using Libraries; Building with the Source for processing.core
Bibliography
Index
About the Author
Colophon
Copyright

Chapter 10. Parsing Data

Parsing converts a raw stream of data into a structure that can be manipulated in software. Much of parsing is detective work: you spend time looking at files or data streams to figure out what’s inside. The data might arrive in an easily parsed format (such as an RSS feed in XML) or in a proprietary binary format. This chapter covers some of the ways data is stored, methods for reading common data formats, and a few detective procedures for dissecting unfamiliar data. Even if your particular data format is not covered here, the methods discussed apply to other data sources as well.
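As a minimal illustration of turning a raw stream into a structure, the following sketch splits tab-separated text (one of the formats this chapter covers) into rows of fields. It is plain Java rather than Processing, and it assumes newline-terminated rows with no quoted or escaped tabs; the class name `TsvParser` and the sample text are hypothetical. In a Processing sketch, `loadStrings()` and `split()` would play a similar role.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsvParser {
    // Split raw tab-separated text into rows, each row an array of fields.
    // Assumes "\n" or "\r\n" line endings and no quoted or escaped tabs.
    public static List<String[]> parse(String raw) {
        List<String[]> rows = new ArrayList<>();
        for (String line : raw.split("\r?\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            rows.add(line.split("\t", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a header row plus two records, the last missing a value.
        String raw = "name\tvalue\na\t1\nb\t\n";
        for (String[] row : parse(raw)) {
            System.out.println(row.length + " fields: " + Arrays.toString(row));
        }
    }
}
```

Passing `-1` as the limit to `String.split` matters here: without it, a row whose last value is missing would come back with one field too few, and the columns would no longer line up.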

Parsing may also seem quite disconnected from the actual process of data visualization. However, it’s part of the process for a reason: chances are, you’ll have to obtain data from a source that’s not under your control, and you’ll spend a lot of time figuring out how to use the data you’re given. This chapter aims to give you a sense of how files are typically structured, because more likely than not, the data you acquire will be poorly documented (if it’s documented at all). Recognizing the basic file format, or even whether the data is compressed, provides valuable clues for unpacking unknown information.
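One quick clue about an unknown file is its first few bytes, since many formats begin with a fixed “magic number.” The sketch below checks a handful of well-known signatures (GZIP, the common ZIP local-file header, PDF, and PNG); the class and method names are hypothetical, and a real investigation would check many more formats and fall back on inspecting the bytes by hand. Given a file name, it reports what the header suggests; with no arguments, it compresses a short string in memory with `GZIPOutputStream` and sniffs that instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class FormatSniffer {
    // Guess a format from its leading bytes ("magic numbers").
    // Only a few well-known signatures are checked; anything else is "unknown".
    public static String sniff(byte[] head) {
        if (startsWith(head, 0x1F, 0x8B)) return "gzip";
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) return "zip"; // "PK\3\4"
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46)) return "pdf"; // "%PDF"
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            // Java bytes are signed, so mask to 0..255 before comparing.
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    // Read at most `max` bytes from the start of a file.
    static byte[] readHead(String path, int max) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[max];
            int n = in.readNBytes(buf, 0, max);
            return Arrays.copyOf(buf, n);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(sniff(readHead(args[0], 8)));
        } else {
            // No file given: gzip a short string in memory and sniff the result.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write("hello".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println(sniff(out.toByteArray()));
        }
    }
}
```

Reading only the first eight bytes keeps the check cheap even on very large files, which suits the “gather clues first” approach described above.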

Generally, data boils down to lists (one-dimensional sets), matrices (two-dimensional tables, such as a spreadsheet), or trees and graphs (individual “nodes” of data and sets of “edges” that describe connections between them). Strictly ...
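The three shapes named above map naturally onto ordinary Java types. This hypothetical sketch, not taken from the book, holds a list in a `List`, a matrix in a two-dimensional array, and a graph as an adjacency list built from an array of edges. It assumes the edges are undirected, so each one is recorded in both directions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataShapes {
    // Build an adjacency list: each node name maps to the nodes it connects to.
    // Each edge is treated as undirected, so it is stored in both directions.
    public static Map<String, List<String>> buildGraph(String[][] edges) {
        Map<String, List<String>> adj = new LinkedHashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // List: a one-dimensional set of values.
        List<Integer> list = List.of(3, 1, 4);
        // Matrix: a two-dimensional table, indexed by [row][column].
        int[][] table = { {1, 2, 3}, {4, 5, 6} };
        // Graph: nodes plus edges that describe connections between them.
        Map<String, List<String>> graph =
            buildGraph(new String[][] { {"a", "b"}, {"b", "c"} });

        System.out.println(list);
        System.out.println(table[1][2]);
        System.out.println(graph);
    }
}
```

A `LinkedHashMap` is used only so that nodes print in the order they were first seen; a plain `HashMap` would work just as well when order does not matter.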
