Fundamentals of Data Visualization

Book Description

Effective visualization is the best way to communicate information from the increasingly large and complex datasets in the natural and social sciences. But as visualization software grows more powerful, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.

This practical book takes you through many commonly encountered visualization problems and pitfalls and provides simple and clear guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke teaches you the elements most critical to successful data visualization.

  • Explore the basic concepts of color use as a tool to highlight, distinguish, or represent a value
  • Understand the importance of redundant coding to ensure that you provide key information in multiple ways
  • Use our directory of visualizations: a graphical guide to the most commonly used types of data visualizations
  • Get extensive examples of good and bad figures; learn how to use figures in a document or report
  • Learn methods for visualizing amounts and proportions, paired data, trends, and time series
  • Visualize distributions with histograms and density plots, boxplots and violin plots, and ridgeline plots

Table of Contents

  1. Preface
    1. Thoughts on graphing software and figure-preparation pipelines
    2. Acknowledgments
  2. 1. Introduction
    1. Ugly, bad, and wrong figures
  3. I. From data to visualization
  4. 2. Visualizing data: Mapping data onto aesthetics
    1. Aesthetics and types of data
    2. Scales map data values onto aesthetics
  5. 3. Coordinate systems and axes
    1. Cartesian coordinates
    2. Nonlinear axes
    3. Coordinate systems with curved axes
  6. 4. Color scales
    1. Color as a tool to distinguish
    2. Color to represent data values
    3. Color as a tool to highlight
    4. References
  7. 5. Directory of visualizations
    1. Amounts
    2. Distributions
    3. Proportions
    4. x–y relationships
    5. Uncertainty
  8. 6. Visualizing amounts
    1. Bar plots
    2. Grouped and stacked bars
    3. Dot plots and heatmaps
  9. 7. Visualizing distributions: Histograms and density plots
    1. Visualizing a single distribution
    2. Visualizing multiple distributions at the same time
  10. 8. Visualizing distributions: Empirical cumulative distribution functions and q-q plots
    1. Empirical cumulative distribution functions
    2. Highly skewed distributions
    3. Quantile–quantile plots
    4. References
  11. 9. Visualizing many distributions at once
    1. Visualizing distributions along the vertical axis
    2. Visualizing distributions along the horizontal axis
    3. References
  12. 10. Visualizing proportions
    1. A case for pie charts
    2. A case for side-by-side bars
    3. A case for stacked bars and stacked densities
    4. Visualizing proportions separately as parts of the total
    5. References
  13. 11. Visualizing nested proportions
    1. Nested proportions gone wrong
    2. Mosaic plots and treemaps
    3. Nested pies
    4. Parallel sets
    5. References
  14. 12. Visualizing associations among two or more quantitative variables
    1. Scatter plots
    2. Correlograms
    3. Dimension reduction
    4. Paired data
  15. 13. Visualizing time series and other functions of an independent variable
    1. Individual time series
    2. Multiple time series and dose–response curves
    3. Time series of two or more response variables
    4. References
  16. II. Principles of figure design
  17. 14. The principle of proportional ink
    1. Visualizations along linear axes
    2. Visualizations along logarithmic axes
    3. Direct area visualizations
    4. References
  18. 15. Handling overlapping points
    1. Partial transparency and jittering
    2. 2d histograms
    3. Contour lines
    4. References
  19. 16. Common pitfalls of color use
    1. Encoding too much or irrelevant information
    2. Using non-monotonic color scales to encode data values
    3. Not designing for color-vision deficiency
    4. References
  20. 17. Redundant coding
    1. Designing legends with redundant coding
    2. Designing figures without legends
  21. 18. Multi-panel figures
    1. Small multiples
    2. Compound figures
    3. References
  22. 19. Titles, captions, and tables
    1. Figure titles and captions
    2. Axis and legend titles
    3. Tables
    4. References
  23. 20. Balance the data-to-ink ratio
    1. Finding the appropriate data-to-ink ratio
    2. Background grids
    3. Paired data
    4. Summary
    5. References
  24. 21. Your axis labels are too small
    1. References
  25. 22. Avoid line drawings
  26. 23. Don’t go 3D
    1. Avoid gratuitous 3D
    2. Avoid 3D position scales
    3. Appropriate use of 3D visualizations
    4. References
  27. III. Miscellaneous topics
  28. 24. Understanding the most commonly used image file formats
    1. Bitmap and vector graphics
    2. Lossless and lossy compression of bitmap graphics
    3. Converting between image formats
  29. 25. Choosing the right visualization software
    1. Reproducibility and repeatability
    2. Data exploration versus data presentation
    3. Separation of content and design
  30. 26. Telling a story and making a point
    1. What is a story?
    2. Make a figure for the generals
    3. Build up towards complex figures
    4. Be consistent but don’t be repetitive
    5. References
  31. 27. Annotated bibliography
    1. Thinking about data and visualization
    2. Programming books
    3. Statistics texts
    4. Historical texts
    5. Books on broadly related topics