
Data Science For Dummies, 2nd Edition

ISBN: 9781119327639

by Lillian Pierson, Jake Porway

March 2017

Beginner

384 pages

9h 24m

English

For Dummies

Audiobook available


Cover
Introduction
About This Book; Foolish Assumptions; Icons Used in This Book; Beyond the Book; Where to Go from Here
Foreword
Part 1: Getting Started with Data Science
Chapter 1: Wrapping Your Head around Data Science
Seeing Who Can Make Use of Data Science; Analyzing the Pieces of the Data Science Puzzle; Exploring the Data Science Solution Alternatives; Letting Data Science Make You More Marketable
Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
Defining Big Data by the Three Vs; Identifying Big Data Sources; Grasping the Difference between Data Science and Data Engineering; Making Sense of Data in Hadoop; Identifying Alternative Big Data Solutions; Data Engineering in Action: A Case Study
Chapter 3: Applying Data-Driven Insights to Business and Industry
Benefiting from Business-Centric Data Science; Converting Raw Data into Actionable Insights with Data Analytics; Taking Action on Business Insights; Distinguishing between Business Intelligence and Data Science; Defining Business-Centric Data Science; Differentiating between Business Intelligence and Business-Centric Data Science; Knowing Whom to Call to Get the Job Done Right; Exploring Data Science in Business: A Data-Driven Business Success Story
Part 2: Using Data Science to Extract Meaning from Your Data
Chapter 4: Machine Learning: Learning from Data with Your Machine
Defining Machine Learning and Its Processes; Considering Learning Styles; Seeing What You Can Do
Chapter 5: Math, Probability, and Statistical Modeling
Exploring Probability and Inferential Statistics; Quantifying Correlation; Reducing Data Dimensionality with Linear Algebra; Modeling Decisions with Multi-Criteria Decision Making; Introducing Regression Methods; Detecting Outliers; Introducing Time Series Analysis

Chapter 6: Using Clustering to Subdivide Data
Introducing Clustering Basics; Identifying Clusters in Your Data; Categorizing Data with Decision Tree and Random Forest Algorithms
Chapter 7: Modeling with Instances
Recognizing the Difference between Clustering and Classification; Making Sense of Data with Nearest Neighbor Analysis; Classifying Data with Average Nearest Neighbor Algorithms; Classifying with K-Nearest Neighbor Algorithms; Solving Real-World Problems with Nearest Neighbor Algorithms
Chapter 8: Building Models That Operate Internet-of-Things Devices
Overviewing the Vocabulary and Technologies; Digging into the Data Science Approaches; Advancing Artificial Intelligence Innovation
Part 3: Creating Data Visualizations That Clearly Communicate Meaning
Chapter 9: Following the Principles of Data Visualization Design
Data Visualizations: The Big Three; Designing to Meet the Needs of Your Target Audience; Picking the Most Appropriate Design Style; Choosing How to Add Context; Selecting the Appropriate Data Graphic Type; Choosing a Data Graphic
Chapter 10: Using D3.js for Data Visualization
Introducing the D3.js Library; Knowing When to Use D3.js (and When Not To); Getting Started in D3.js; Implementing More Advanced Concepts and Practices in D3.js
Chapter 11: Web-Based Applications for Visualization Design
Designing Data Visualizations for Collaboration; Visualizing Spatial Data with Online Geographic Tools; Visualizing with Open Source: Web-Based Data Visualization Platforms; Knowing When to Stick with Infographics
Chapter 12: Exploring Best Practices in Dashboard Design
Focusing on the Audience; Starting with the Big Picture; Getting the Details Right; Testing Your Design
Chapter 13: Making Maps from Spatial Data
Getting into the Basics of GIS; Analyzing Spatial Data; Getting Started with Open-Source QGIS
Part 4: Computing for Data Science
Chapter 14: Using Python for Data Science
Sorting Out the Python Data Types; Putting Loops to Good Use in Python; Having Fun with Functions; Keeping Cool with Classes; Checking Out Some Useful Python Libraries; Analyzing Data with Python - an Exercise
Chapter 15: Using Open Source R for Data Science
R’s Basic Vocabulary; Delving into Functions and Operators; Iterating in R; Observing How Objects Work; Sorting Out Popular Statistical Analysis Packages; Examining Packages for Visualizing, Mapping, and Graphing in R
Chapter 16: Using SQL in Data Science
Getting a Handle on Relational Databases and SQL; Investing Some Effort into Database Design; Integrating SQL, R, Python, and Excel into Your Data Science Strategy; Narrowing the Focus with SQL Functions
Chapter 17: Doing Data Science with Excel and Knime
Making Life Easier with Excel; Using KNIME for Advanced Data Analytics
Part 5: Applying Domain Expertise to Solve Real-World Problems Using Data Science
Chapter 18: Data Science in Journalism: Nailing Down the Five Ws (and an H)
Who Is the Audience?; What: Getting Directly to the Point; Bringing Data Journalism to Life: The Black Budget; When Did It Happen?; Where Does the Story Matter?; Why the Story Matters; How to Develop, Tell, and Present the Story; Collecting Data for Your Story; Finding and Telling Your Data’s Story
Chapter 19: Delving into Environmental Data Science
Modeling Environmental-Human Interactions with Environmental Intelligence; Modeling Natural Resources in the Raw; Using Spatial Statistics to Predict for Environmental Variation across Space
Chapter 20: Data Science for Driving Growth in E-Commerce
Making Sense of Data for E-Commerce Growth; Optimizing E-Commerce Business Systems
Chapter 21: Using Data Science to Describe and Predict Criminal Activity
Temporal Analysis for Crime Prevention and Monitoring; Spatial Crime Prediction and Monitoring; Probing the Problems with Data Science for Crime Analysis
Part 6: The Part of Tens
Chapter 22: Ten Phenomenal Resources for Open Data
Digging through data.gov; Checking Out Canada Open Data; Diving into data.gov.uk; Checking Out U.S. Census Bureau Data; Knowing NASA Data; Wrangling World Bank Data; Getting to Know Knoema Data; Queuing Up with Quandl Data; Exploring Exversion Data; Mapping OpenStreetMap Spatial Data
Chapter 23: Ten Free Data Science Tools and Applications
Making Custom Web-Based Data Visualizations with Free R Packages; Examining Scraping, Collecting, and Handling Tools; Looking into Data Exploration Tools; Evaluating Web-Based Visualization Tools
About the Author
Connect with Dummies
End User License Agreement

Content preview from Data Science For Dummies, 2nd Edition

Chapter 6

Using Clustering to Subdivide Data

IN THIS CHAPTER

Understanding the basics of clustering

Clustering your data with the k-means algorithm and kernel density estimation

Getting to know hierarchical and neighborhood clustering algorithms

Checking out decision tree and random forest algorithms

Data scientists use clustering to help them divide their unlabeled data into subsets. The basics behind clustering are relatively easy to understand, but things get tricky fast when you get into using some of the more advanced algorithms. In this chapter, I introduce the basics behind clustering. I follow that by introducing several nuanced algorithms that offer clustering solutions to meet your requirements, based on the specific characteristics of your feature dataset.
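As a rough illustration of dividing unlabeled data into subsets, here is a minimal sketch of the k-means idea in plain Python. It is not the book's own code: the function name `k_means`, the sample points in `data`, and the naive choice of the first k points as starting centroids are all assumptions made for this example.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters with a basic k-means loop (illustrative only)."""
    # Assumption: naively use the first k points as the starting centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

On this sample, the first three points end up sharing one label and the last three share the other, even though no labels were supplied. Real datasets usually call for a library implementation with a more careful choice of starting centroids.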

Introducing Clustering Basics

To grasp advanced methods for use in clustering your data, you should first take a few moments to make sure you have a firm understanding of the basics that underlie all forms of clustering. Clustering is a form of machine learning — the machine in this case is your computer, and learning ...
