Hands-On Data Visualization

by Jack Dougherty, Ilya Ilyankou

April 2021

Beginner to intermediate

471 pages

13h 16m

English

O'Reilly Media, Inc.


Audience and Overview | Advice for Hands-On Learning | Chapter Outline | Conventions Used in This Book | O’Reilly Online Learning | How to Contact Us | Acknowledgments
Why Data Visualization? | What Can You Believe? | Some Pictures Are More Persuasive | Different Shades of the Truth | Organization of the Book
Start Sketching Your Data Story | Ten Factors When Considering Tools | 1. Easy to Learn | 2. Free or Affordable | 3. Powerful | 4. Supported | 5. Portable | 6. Secure and Private | 7. Collaborative | 8. Cross-Platform | 9. Open Source | 10. Accessible for Visually Impaired Readers | Our Recommended Tools | Use a Password Manager
Select Your Spreadsheet Tools | Download to CSV or ODS Format | Make a Copy of a Google Sheet | Share Your Google Sheets | Upload and Convert to Google Sheets | Geocode Addresses in Google Sheets | Collect Data with Google Forms | Sort and Filter Data | Calculate with Formulas | Summarize Data with Pivot Tables | Match Columns with VLOOKUP | Spreadsheet Versus Relational Database
Guiding Questions for Your Search | Public and Private Data | Mask or Aggregate Sensitive Data | Open Data Repositories | Source Your Data | Recognize Bad Data | Question Your Data
Smart Cleanup with Google Sheets | Find and Replace with Blank | Transpose Rows and Columns | Split Data into Separate Columns | Example 1: Simple Splitting | Example 2: Complex Splitting | Combine Data into One Column | Extract Tables from PDFs with Tabula | Clean Data with OpenRefine | Set Up OpenRefine | Load Data and Start a New Project | Convert Dollar Amounts from Text to Numbers | Cluster Similar Spellings
Precisely Describe Comparisons | Normalize Your Data | Beware of Biased Comparisons
Chart Design Principles | Deconstruct a Chart | Some Rules Are More Important Than Others | Chart Aesthetics | Google Sheets Charts | Bar and Column Charts | Grouped Bar and Column Charts | Split Bar and Column Charts | Stacked Bar and Column Charts | Histograms | Quick Histograms with Google Sheets Column Stats | Regular Histograms with Google Sheets Charts | Pie, Line, and Area Charts | Pie Charts | Line Charts | Stacked Area Charts | Datawrapper Charts | Annotated Charts | Range Charts | Scatter and Bubble Charts | Scatter Charts with Google Sheets | Bubble Charts | Tableau Public Charts | Scatter Charts with Tableau Public | Install Tableau Public and Connect Data | Create Scatter Chart in the Worksheet | Add Title and Caption, and Publish | Filtered Line Chart | Connect Data to Tableau Public | Build and Publish a Filtered Line Chart

Map Design Principles | Deconstructing a Map | Clarify Point-Versus-Polygon Data | Map One Variable, Not Two | Choose Smaller Geographies for Choropleth Maps | Design Choropleth Colors and Intervals | Choose Choropleth Palettes to Match Your Data | Choose Color Intervals to Group Choropleth Map Data | Normalize Choropleth Map Data | Point Map with Google My Maps | Symbol Point Map with Datawrapper | Choropleth Map with Datawrapper | Choropleth Map with Tableau Public | Current Map with Socrata Open Data
Table Design Principles | Datawrapper Table with Sparklines | Other Table-Making Tools
Static Image Versus Interactive iframe | Get the Embed Code or iframe Tag | From Google Sheets | From Datawrapper | From Tableau Public | Paste Code or iframe to a Website | To WordPress.com Sites | To Self-Hosted WordPress Sites | For Squarespace, Wix, Weebly, or Other Web-Building Sites
Copy, Edit, and Host a Simple Leaflet Map Template | Convert GitHub Pages Link to iframe | Create a New Repo and Upload Files on GitHub | GitHub Desktop and Atom Text Editor to Code Efficiently
Bar or Column Chart with Chart.js | Error Bars with Chart.js | Line Chart with Chart.js | Annotated Line Chart with Highcharts | Scatter Chart with Chart.js | Bubble Chart with Chart.js
Leaflet Maps with Google Sheets | Tutorial Requirements and Overview | Leaflet Storymaps with Google Sheets | Tutorial Requirements and Overview | Get Your Google Sheets API Key | Leaflet Maps with CSV Data | Leaflet Heatmap Points with CSV Data | Leaflet Searchable Point Map | Step 1: Prepare Your Data | Step 2: Download and Edit This Template | Step 3: Publish Your Map | Leaflet Maps with Open Data APIs
Geospatial Data and GeoJSON | GeoJSON | Shapefiles | GPS Exchange Format | Keyhole Markup Language | MapInfo TAB | Find GeoJSON Boundary Files | Draw and Edit with GeoJson.io | Convert KML, GPX, and Other Formats into GeoJSON | Create GeoJSON from a CSV File | Create New GeoJSON Data with Drawing Tools | Edit and Join with Mapshaper | Import, Convert, and Export Map Boundary Files | Edit Data for Specific Polygons | Rename Data Fields | Remove Unwanted Data Fields | Simplify Map Boundaries to Reduce File Size | Dissolve Internal Polygons to Create an Outline Map | Clip a Map to Match an Outline Layer | Join Spreadsheet Data With Polygon Map | Count Points in Polygons with Mapshaper | More About Joins | Merge Selected Polygons with Join and Dissolve Commands | Convert Compressed KMZ to KML | Georeference with Map Warper | Bulk Geocode with US Census | Pivot Points into Polygon Data
How to Lie with Charts | Exaggerate Change in Charts | Diminish Change in Charts | How to Lie with Maps | Examine Data and Upload to Datawrapper | Modify the Map Color Ranges | Recognize and Reduce Data Bias | Recognize and Reduce Spatial Bias
Build a Narrative on a Storyboard | Draw Attention to Meaning | Acknowledge Sources and Uncertainty | Decide on Your Data Story Format
Tool or Platform Problems | Try a Different Browser | Diagnose with Developer Tools | Mac or Chromebook Problems | Watch Out for Bad Data | Common iframe Errors | Fix Your Code on GitHub

Content preview from Hands-On Data Visualization

Chapter 4. Clean Up Messy Data

More often than not, datasets will be messy and hard to visualize right away. They will have missing values, dates in different formats, text in numeric-only columns, multiple items in the same columns, various spellings of the same name, and other unexpected things. See Figure 4-1 for inspiration. Don’t be surprised if you find yourself spending more time cleaning up data than you do analyzing and visualizing it.
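
Some of the problems listed above, such as missing values and text in numeric-only columns, can be spotted with a quick scan before any cleanup begins. The following is a minimal Python sketch of that idea, not a method from this book, which works with spreadsheet tools. The sample rows, the column names, and the `profile` helper are all hypothetical and invented for illustration.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE = """name,amount,date
Acme Corp,1200,2021-04-01
ACME Corp.,"$1,350",04/15/2021
Beta LLC,,2021-05-02
"""


def profile(rows, numeric_column):
    """Count blank cells per column and collect non-numeric entries in one column.

    Assumption: a "numeric" cell contains only digits (a whole, non-negative
    number). Real data may need a looser rule for decimals or negatives.
    """
    blanks = {}
    non_numeric = []
    for row in rows:
        # Tally empty cells by column name.
        for column, value in row.items():
            if value.strip() == "":
                blanks[column] = blanks.get(column, 0) + 1
        # Flag non-empty cells in the numeric column that are not plain digits.
        value = row[numeric_column].strip()
        if value and not value.isdigit():
            non_numeric.append(value)
    return blanks, non_numeric


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
blanks, non_numeric = profile(rows, "amount")
print("Blank cells per column:", blanks)
print("Non-numeric entries in 'amount':", non_numeric)
```

In this sample, the scan reports one blank cell in the `amount` column and flags `$1,350` as text in a column that should hold numbers. It does not detect the inconsistent date formats or the two spellings of the same company name, which need other techniques.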

In this chapter, you’ll learn about several tools so that you can decide which one will clean up your data most efficiently. We’ll start with basic cleanup methods using Google Sheets in “Smart Cleanup with Google Sheets”, “Find and Replace with Blank”, “Transpose Rows and Columns”, “Split Data into Separate Columns”, and “Combine Data into One Column”. While we feature Google Sheets in our examples, many of these principles (and in some cases, the same formulas) apply to Microsoft Excel, LibreOffice Calc, Mac’s Numbers, or other spreadsheet packages. Next, you’ll learn how to extract table data from text-based PDF documents with Tabula, a free tool used by data journalists and researchers worldwide to analyze spending data, health reports, and all sorts of other datasets that get trapped in PDFs (see “Extract Tables from PDFs with Tabula”). Finally, we will introduce ...