Preparing Data for Analysis with JMP

Book Description

Access and clean up data easily using JMP®!

Data acquisition and preparation commonly consume approximately 75% of the effort and time of total data analysis. JMP provides many visual, intuitive, and even innovative data-preparation capabilities that enable you to make the most of your organization's data.

Preparing Data for Analysis with JMP® is organized within a framework of statistical investigations and model-building and illustrates the new data-handling features in JMP, such as the Query Builder. Useful to students and programmers with little or no JMP experience, or those looking to learn the new data-management features and techniques, it takes a practical approach to getting started, with plenty of examples. Using step-by-step demonstrations and screenshots, this book walks you through the most commonly used data-management techniques and includes many tips on how to avoid common problems.

With this book, you will learn how to:

  • Manage database operations using the JMP Query Builder
  • Get data into JMP from other formats, such as Excel, CSV, SAS, HTML, JSON, and the web
  • Identify and avoid problems with the help of JMP’s visual and automated data-exploration tools
  • Consolidate data from multiple sources with Query Builder for tables
  • Deal with common issues and repairs that include the following tasks:
    • reshaping tables (stack/unstack)
    • managing missing data with techniques such as imputation and Principal Components Analysis
    • cleaning and correcting dirty data
    • computing new variables
    • transforming variables for modeling
    • reconciling time and date
  • Subset and filter your data
  • Save data tables for exchange with other platforms

Table of Contents

  1. About This Book
  2. About The Author
  3. Chapter 1: Data Management in the Analytics Process
  4. Introduction
  5. A Continuous Process
  6. Asking Questions That Data Can Help to Answer
  7. Sourcing Relevant Data
  8. Reproducibility
  9. Combining and Reconciling Multiple Sources
  10. Identifying and Addressing Data Issues
  11. Data Requirements Shaped by Modeling Strategies
  12. Plan of the Book
  13. Conclusion
  14. References
  15. Chapter 2: Data Management Foundations
  16. Introduction
  17. Matching Form to Function
  18. JMP Data Tables
  19. Data Types and Modeling Types
    1. Data Types
    2. Modeling Types
  20. Basics of Relational Databases
  21. Conclusion
  22. References
  23. Chapter 3: Sources of Data and Their Challenges
  24. Introduction
  25. Internal Data in Flat Files
  26. Relational Databases
  27. External Data on the World Wide Web
    1. User-Facing Query Interfaces
    2. Tabular Data Pages
    3. Evolving WWW Data Standards
  28. Ethical and Legal Considerations
  29. Conclusion
  30. References
  31. Chapter 4: Single Files
  32. Introduction
  33. Review of JMP File Types
  34. Common Formats Other than JMP
    1. MS Excel
    2. Text Files
    3. SAS Files
  35. Other Data File Formats
  36. Conclusion
  37. References
  38. Chapter 5: Database Queries
  39. Introduction
  40. Sample Databases in This Chapter
  41. Connecting to a Database
  42. Extracting Data from One Table in a Database
    1. Import an Entire Table
    2. Import a Subset of a Table
  43. Querying a Database from JMP
    1. Query Builder
    2. An Illustrative Scenario: Bicycle Parts
    3. Designing a Query with Query Builder
  44. Query Builder for SAS Server Data
  45. Conclusion
  46. References
  47. Chapter 6: Importing Data from Websites
  48. Introduction
  49. Variety of Web Formats
  50. Internet Open
  51. Common Issues to Anticipate
  52. Conclusion
  53. References
  54. Chapter 7: Reshaping a Data Table
  55. Introduction
  56. What Shape Is a Data Table?
    1. Wide versus Long Format
  57. Reasons for Wide and Long Formats
  58. Stacking Wide Data
  59. Unstacking Narrow Data
  60. Additional Examples
    1. Stacking Wide Data
    2. Scripting for Reproducibility
    3. Splitting Long Data
    4. Transposing Rows and Columns
  61. Reshaping the WDI Data
  62. Conclusion
  63. References
  64. Chapter 8: Joining, Subsetting, and Filtering
  65. Introduction
  66. Combining Data from Multiple Tables with Join
  67. Saving Memory with a Virtual Join
  68. Why and How to Select a Subset
    1. A Brief Detour: Creating a New Column from an Existing Column
  69. Row Filters: Global and Local
    1. Global Filter
    2. Local Filter
    3. A More Durable Subset
  70. Combining Rows with Concatenate
  71. Query Builder for Tables
    1. Back to the Movies
    2. Olympic Medals and Development Indicators
  72. Conclusion
  73. References
  74. Chapter 9: Data Exploration: Visual and Automated Tools to Detect Problems
  75. Introduction
  76. Common Issues to Anticipate
  77. On the Hunt for Dirty Data
  78. Distribution
  79. Columns Viewer
  80. Multivariate (Correlations and Scatterplot Matrix)
    1. More Tools within the Multivariate Platform
    2. Principal Components
    3. Outlier Analysis
    4. Item Reliability
  81. Explore Outliers
    1. Quantile Range Outliers
    2. Robust Fit Outliers
    3. Multivariate Robust Outliers
    4. Multivariate k-Nearest Neighbors Outliers
  82. Explore Missing
  83. Conclusion
  84. References
  85. Chapter 10: Missing Data Strategies
  86. Introduction
  87. Much Ado about Nothing?
  88. Four Basic Approaches
  89. Working with Complete Cases
  90. Analysis with Sampling Weights
  91. Imputation-based Methods
    1. Recode
    2. Informative Missing
    3. Multivariate Normal Imputation
    4. Multivariate SVD Imputation
    5. Special Considerations for Time Series
  92. Conclusion and a Note of Caution
  93. References
  94. Chapter 11: Data Preparation for Analysis
  95. Introduction
  96. Common Issues and Appropriate Strategies
  97. Distribution of Observations
    1. Noisy Data
    2. Skewness or Outliers
    3. Scale Differences among Model Variables
    4. Too Many Levels of a Categorical Variable
  98. High Dimensionality: Abundance of Columns
    1. Correlated or Redundant Variables
    2. Missing or Sparse Observations across Columns
    3. A PCA Example
  99. Abundance of Rows
    1. Partitioning into Training, Validation, and Test Sets
    2. Aggregating Rows with Summary Tables
    3. Oversampling Rare Events
  100. Date and Time-Related Issues
    1. Formatting Dates and Times
    2. Some Date Functions: Extracting Parts
    3. Aggregation
    4. Row Functions Especially Useful in Time-Ordered Data
    5. Elapsed Time and Date Arithmetic
  101. Conclusion
  102. References
  103. Chapter 12: Exporting Work to Other Platforms
  104. Introduction
  105. Why Export or Exchange Data?
  106. Fit the Method to the Purpose
    1. Save As
    2. Export to a Database
    3. Export to a SAS Library
  107. Exporting Reports
    1. Interactive Graphics
    2. Static Images: Graphics Formats, PowerPoint, and Word
  108. Conclusion
  109. References
  110. Index