Data Visualization with Python and JavaScript, 2nd Edition

Book description

How do you turn raw, unprocessed, or malformed data into dynamic, interactive web visualizations? In this practical book, author Kyran Dale shows data scientists and analysts, as well as Python and JavaScript developers, how to create the ideal toolchain for the job. By providing engaging examples and stressing hard-earned best practices, this guide teaches you how to leverage the power of best-of-breed Python and JavaScript libraries.

Python provides accessible, powerful, and mature libraries for scraping, cleaning, and processing data. And while JavaScript is the best language when it comes to programming web visualizations, its data processing abilities can't compare with Python's. Together, these two languages are a perfect complement for creating a modern web-visualization toolchain. This book gets you started.

You'll learn how to:

  • Obtain data you need programmatically, using scraping tools or web APIs: Requests, Scrapy, Beautiful Soup
  • Clean and process data using Python's heavyweight data processing libraries within the NumPy ecosystem: Jupyter notebooks with pandas, Matplotlib, and seaborn
  • Deliver the data to a browser with static files or by using Flask, the lightweight Python server, and a RESTful API
  • Pick up enough web development skills (HTML, CSS, JS) to get your visualized data on the web
  • Use the data you've mined and refined to create web charts and visualizations with Plotly, D3, Leaflet, and other libraries

Table of contents

  1. Preface
    1. Part I: Basic Toolkit
    2. Part II: Getting Your Data
    3. Part III: Cleaning and Exploring Data with pandas
    4. Part IV: Delivering the Data
    5. Part V: Visualizing Your Data with D3 and Plotly
    6. The Second Edition
    7. Conventions Used in This Book
    8. Using Code Examples
    9. O’Reilly Online Learning
    10. How to Contact Us
    11. Acknowledgments
      1. Second Edition
  2. Introduction
    1. Who This Book Is For
      1. Minimal Requirements to Use This Book
    2. Why Python and JavaScript?
      1. Why Not Python in the Browser?
      2. Why Python for Data Processing
      3. Python’s Getting Better All the Time
    3. What You’ll Learn
      1. The Choice of Libraries
      2. Preliminaries
    4. The Dataviz Toolchain
      1. 1. Scraping Data with Scrapy
      2. 2. Cleaning Data with pandas
      3. 3. Exploring Data with pandas and Matplotlib
      4. 4. Delivering Your Data with Flask
      5. 5. Transforming Data into Interactive Visualizations with Plotly and D3
      6. Smaller Libraries
    5. Using the Book
    6. A Little Bit of Context
    7. Summary
    8. Recommended Books
  3. I. Basic Toolkit
  4. 1. Development Setup
    1. The Accompanying Code
    2. Python
      1. Anaconda
      2. Installing Extra Libraries
      3. Virtual Environments
    3. JavaScript
      1. Content Delivery Networks
      2. Installing Libraries Locally
    4. Databases
      1. Getting MongoDB Up and Running
      2. Easy MongoDB with Docker
    5. Integrated Development Environments
    6. Summary
  5. 2. A Language-Learning Bridge Between Python and JavaScript
    1. Similarities and Differences
    2. Interacting with the Code
      1. Python
      2. JavaScript
    3. Basic Bridge Work
      1. Style Guidelines, PEP 8, and use strict
      2. CamelCase Versus Underscore
      3. Importing Modules, Including Scripts
      4. JavaScript Modules
      5. Keeping Your Namespaces Clean
      6. Outputting “Hello World!”
      7. Simple Data Processing
      8. String Construction
      9. Significant Whitespace Versus Curly Brackets
      10. Comments and Doc-Strings
      11. Declaring Variables Using let or var
      12. Strings and Numbers
      13. Booleans
      14. Data Containers: dicts, objects, lists, Arrays
      15. Functions
      16. Iterating: for Loops and Functional Alternatives
      17. Conditionals: if, else, elif, switch
      18. File Input and Output
      19. Classes and Prototypes
    4. Differences in Practice
      1. Method Chaining
      2. Enumerating a List
      3. Tuple Unpacking
      4. Collections
      5. Underscore
      6. Functional Array Methods and List Comprehensions
      7. Map, Reduce, and Filter with Python’s Lambdas
      8. JavaScript Closures and the Module Pattern
    5. A Cheat Sheet
    6. Summary
  6. 3. Reading and Writing Data with Python
    1. Easy Does It
    2. Passing Data Around
    3. Working with System Files
    4. CSV, TSV, and Row-Column Data Formats
    5. JSON
      1. Dealing with Dates and Times
    6. SQL
      1. Creating the Database Engine
      2. Defining the Database Tables
      3. Adding Instances with a Session
      4. Querying the Database
      5. Easier SQL with Dataset
    7. MongoDB
    8. Dealing with Dates, Times, and Complex Data
    9. Summary
  7. 4. Webdev 101
    1. The Big Picture
    2. Single-Page Apps
    3. Tooling Up
      1. The Myth of IDEs, Frameworks, and Tools
      2. A Text-Editing Workhorse
      3. Browser with Development Tools
      4. Terminal or Command Prompt
    4. Building a Web Page
      1. Serving Pages with HTTP
      2. The DOM
      3. The HTML Skeleton
      4. Marking Up Content
      5. CSS
      6. JavaScript
      7. Data
    5. Chrome DevTools
      1. The Elements Tab
      2. The Sources Tab
      3. Other Tools
    6. A Basic Page with Placeholders
    7. Positioning and Sizing Containers with Flex
      1. Filling the Placeholders with Content
    8. Scalable Vector Graphics
      1. The <g> Element
      2. Circles
      3. Applying CSS Styles
      4. Lines, Rectangles, and Polygons
      5. Text
      6. Paths
      7. Scaling and Rotating
      8. Working with Groups
      9. Layering and Transparency
      10. JavaScripted SVG
    9. Summary
  8. II. Getting Your Data
  9. 5. Getting Data Off the Web with Python
    1. Getting Web Data with the Requests Library
    2. Getting Data Files with Requests
    3. Using Python to Consume Data from a Web API
      1. Consuming a RESTful Web API with Requests
      2. Getting Country Data for the Nobel Dataviz
    4. Using Libraries to Access Web APIs
      1. Using Google Spreadsheets
      2. Using the Twitter API with Tweepy
    5. Scraping Data
      1. Why We Need to Scrape
      2. Beautiful Soup and lxml
      3. A First Scraping Foray
    6. Getting the Soup
    7. Selecting Tags
      1. Crafting Selection Patterns
      2. Caching the Web Pages
      3. Scraping the Winners’ Nationalities
    8. Summary
  10. 6. Heavyweight Scraping with Scrapy
    1. Setting Up Scrapy
    2. Establishing the Targets
    3. Targeting HTML with Xpaths
      1. Testing Xpaths with the Scrapy Shell
      2. Selecting with Relative Xpaths
    4. A First Scrapy Spider
    5. Scraping the Individual Biography Pages
    6. Chaining Requests and Yielding Data
      1. Caching Pages
      2. Yielding Requests
    7. Scrapy Pipelines
    8. Scraping Text and Images with a Pipeline
      1. Specifying Pipelines with Multiple Spiders
    9. Summary
  11. III. Cleaning and Exploring Data with pandas
  12. 7. Introduction to NumPy
    1. The NumPy Array
      1. Creating Arrays
      2. Array Indexing and Slicing
      3. A Few Basic Operations
    2. Creating Array Functions
      1. Calculating a Moving Average
    3. Summary
  13. 8. Introduction to pandas
    1. Why pandas Is Tailor-Made for Dataviz
    2. Why pandas Was Developed
    3. Categorizing Data and Measurements
    4. The DataFrame
      1. Indices
      2. Rows and Columns
      3. Selecting Groups
    5. Creating and Saving DataFrames
      1. JSON
      2. CSV
      3. Excel Files
      4. SQL
      5. MongoDB
    6. Series into DataFrames
    7. Summary
  14. 9. Cleaning Data with pandas
    1. Coming Clean About Dirty Data
    2. Inspecting the Data
    3. Indices and pandas Data Selection
      1. Selecting Multiple Rows
    4. Cleaning the Data
      1. Finding Mixed Types
      2. Replacing Strings
      3. Removing Rows
      4. Finding Duplicates
      5. Sorting Data
      6. Removing Duplicates
      7. Dealing with Missing Fields
      8. Dealing with Times and Dates
    5. The Full clean_data Function
    6. Adding the born_in column
      1. Merging DataFrames
    7. Saving the Cleaned Datasets
    8. Summary
  15. 10. Visualizing Data with Matplotlib
    1. pyplot and Object-Oriented Matplotlib
    2. Starting an Interactive Session
    3. Interactive Plotting with pyplot’s Global State
      1. Configuring Matplotlib
      2. Setting the Figure’s Size
      3. Points, Not Pixels
      4. Labels and Legends
      5. Titles and Axes Labels
      6. Saving Your Charts
    4. Figures and Object-Oriented Matplotlib
      1. Axes and Subplots
    5. Plot Types
      1. Bar Charts
      2. Scatter Plots
    6. seaborn
      1. FacetGrids
      2. PairGrids
    7. Summary
  16. 11. Exploring Data with pandas
    1. Starting to Explore
    2. Plotting with pandas
    3. Gender Disparities
      1. Unstacking Groups
      2. Historical Trends
    4. National Trends
      1. Prize Winners Per Capita
      2. Prizes by Category
      3. Historical Trends in Prize Distribution
    5. Age and Life Expectancy of Winners
      1. Age at Time of Award
      2. Life Expectancy of Winners
      3. Increasing Life Expectancies over Time
    6. The Nobel Diaspora
    7. Summary
  17. IV. Delivering the Data
  18. 12. Delivering the Data
    1. Serving the Data
      1. Organizing Your Flask Files
      2. Serving Data with Flask
    2. Delivering Data Files
    3. Dynamic Data with Flask APIs
      1. A Simple Data API with Flask
    4. Using Static or Dynamic Delivery
    5. Summary
  19. 13. RESTful Data with Flask
    1. The Tools for a RESTful Job
    2. Creating the Database
    3. A Flask RESTful Data Server
      1. Serializing with marshmallow
    4. Adding our RESTful API Routes
      1. Posting Data to the API
    5. Extending the API with MethodViews
    6. Paginating the Data Returns
    7. Deploying the API Remotely with Heroku
      1. CORS
      2. Consuming the API Using JavaScript
    8. Summary
  20. V. Visualizing Your Data with D3 and Plotly
  21. 14. Bringing Your Charts to the Web with Matplotlib and Plotly
    1. Static Charts with Matplotlib
      1. Adapting to Screen Sizes
      2. Using Remote Images or Assets
    2. Charting with Plotly
      1. Basic Charts
      2. Plotly Express
      3. Plotly Graph-Objects
      4. Mapping with Plotly
      5. Adding Custom Controls with Plotly
    3. From Notebook to Web with Plotly
    4. Native JavaScript Charts with Plotly
      1. Fetching JSON Files
    5. User-Driven Plotly with JavaScript and HTML
    6. Summary
  22. 15. Imagining a Nobel Visualization
    1. Who Is It For?
    2. Choosing Visual Elements
    3. Menu Bar
    4. Prizes by Year
    5. A Map Showing Selected Nobel Countries
    6. A Bar Chart Showing Number of Winners by Country
    7. A List of the Selected Winners
      1. A Mini-Biography Box with Picture
    8. The Complete Visualization
    9. Summary
  23. 16. Building a Visualization
    1. Preliminaries
      1. Core Components
      2. Organizing Your Files
      3. Serving the Data
    2. The HTML Skeleton
    3. CSS Styling
    4. The JavaScript Engine
      1. Importing the Scripts
      2. Modular JS with Imports
      3. Basic Data Flow
      4. The Core Code
      5. Initializing the Nobel Prize Visualization
      6. Ready to Go
      7. Data-Driven Updates
      8. Filtering Data with Crossfilter
    5. Running the Nobel Prize Visualization App
    6. Summary
  24. 17. Introducing D3—The Story of a Bar Chart
    1. Framing the Problem
    2. Working with Selections
    3. Adding DOM Elements
    4. Leveraging D3
    5. Measuring Up with D3’s Scales
      1. Quantitative Scales
      2. Ordinal Scales
    6. Unleashing the Power of D3 with Data Binding/Joining
    7. Updating the DOM with Data
    8. Putting the Bar Chart Together
    9. Axes and Labels
    10. Transitions
      1. Updating the Bar Chart
    11. Summary
  25. 18. Visualizing Individual Prizes
    1. Building the Framework
    2. Scales
    3. Axes
    4. Category Labels
    5. Nesting the Data
    6. Adding the Winners with a Nested Data-Join
    7. A Little Transitional Sparkle
      1. Updating the Bar Chart
    8. Summary
  26. 19. Mapping with D3
    1. Available Maps
    2. D3’s Mapping Data Formats
      1. GeoJSON
      2. TopoJSON
      3. Converting Maps to TopoJSON
    3. D3 Geo, Projections, and Paths
      1. Projections
      2. Paths
      3. graticules
    4. Putting the Elements Together
    5. Updating the Map
    6. Adding Value Indicators
    7. Our Completed Map
    8. Building a Simple Tooltip
      1. Updating the Map
    9. Summary
  27. 20. Visualizing Individual Winners
    1. Building the List
    2. Building the Bio-Box
      1. Updating the Winners List
    3. Summary
  28. 21. The Menu Bar
    1. Creating HTML Elements with D3
    2. Building the Menu Bar
      1. Building the Category Selector
      2. Adding the Gender Selector
      3. Adding the Country Selector
      4. Wiring Up the Metric Radio Button
    3. Summary
  29. 22. Conclusion
    1. Recap
      1. Part I: Basic Toolkit
      2. Part II: Getting Your Data
      3. Part III: Cleaning and Exploring Data with pandas
      4. Part IV: Delivering the Data
      5. Part V: Visualizing Your Data with D3 and Plotly
    2. Future Progress
      1. Visualizing Social Media Networks
      2. Machine-Learning Visualizations
    3. Final Thoughts
  30. A. D3’s enter/exit Pattern
    1. The enter Method
    2. Accessing the Bound Data
  31. Index
  32. About the Author

Product information

  • Title: Data Visualization with Python and JavaScript, 2nd Edition
  • Author(s): Kyran Dale
  • Release date: December 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098111878