Data Smart: Using Data Science to Transform Information into Insight

Name: Data Smart: Using Data Science to Transform Information into Insight
Author: John W. Foreman
ISBN: 9781118661468

November 2013

Beginner to intermediate

432 pages

10h 39m

English

Wiley

Audiobook available

Cover Page
Title Page
Copyright
Dedication
Credits
About the Author
Acknowledgments
Contents
Introduction
  What Am I Doing Here?
  A Workable Definition of Data Science
  But Wait, What about Big Data?
  Who Am I?
  Who Are You?
  No Regrets. Spreadsheets Forever
  Conventions
  Let's Get Going
1: Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask
  Some Sample Data
  Moving Quickly with the Control Button
  Copying Formulas and Data Quickly
  Formatting Cells
  Paste Special Values
  Inserting Charts
  Locating the Find and Replace Menus
  Formulas for Locating and Pulling Values
  Using VLOOKUP to Merge Data
  Filtering and Sorting
  Using PivotTables
  Using Array Formulas
  Solving Stuff with Solver
  OpenSolver: I Wish We Didn't Need This, but We Do
  Wrapping Up
2: Cluster Analysis Part I: Using K-Means to Segment Your Customer Base
  Girls Dance with Girls, Boys Scratch Their Elbows
  Getting Real: K-Means Clustering Subscribers in E-mail Marketing
  K-Medians Clustering and Asymmetric Distance Measurements
  Wrapping Up
3: Naïve Bayes and the Incredible Lightness of Being an Idiot
  When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise
  The World's Fastest Intro to Probability Theory
  Using Bayes Rule to Create an AI Model
  Let's Get This Excel Party Started
  Wrapping Up
4: Optimization Modeling: Because That “Fresh Squeezed” Orange Juice Ain't Gonna Blend Itself
  Why Should Data Scientists Know Optimization?
  Starting with a Simple Trade-Off
  Fresh from the Grove to Your Glass...with a Pit Stop through a Blending Model
  Modeling Risk
  Wrapping Up
5: Cluster Analysis Part II: Network Graphs and Community Detection
  What Is a Network Graph?
  Visualizing a Simple Graph
  Brief Introduction to Gephi
  Building a Graph from the Wholesale Wine Data
  How Much Is an Edge Worth? Points and Penalties in Graph Modularity
  Let's Get Clustering!
  There and Back Again: A Gephi Tale
  Wrapping Up
6: The Granddaddy of Supervised Artificial Intelligence—Regression
  Wait, What? You're Pregnant?
  Don't Kid Yourself
  Predicting Pregnant Customers at RetailMart Using Linear Regression
  Predicting Pregnant Customers at RetailMart Using Logistic Regression
  For More Information
  Wrapping Up
7: Ensemble Models: A Whole Lot of Bad Pizza
  Using the Data from Chapter 6
  Bagging: Randomize, Train, Repeat
  Boosting: If You Get It Wrong, Just Boost and Try Again
  Wrapping Up
8: Forecasting: Breathe Easy; You Can't Win
  The Sword Trade Is Hopping
  Getting Acquainted with Time Series Data
  Starting Slow with Simple Exponential Smoothing
  You Might Have a Trend
  Holt's Trend-Corrected Exponential Smoothing
  Multiplicative Holt-Winters Exponential Smoothing
  Wrapping Up
9: Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant
  Outliers Are (Bad?) People, Too
  The Fascinating Case of Hadlum v. Hadlum
  Terrible at Nothing, Bad at Everything
  Wrapping Up
10: Moving from Spreadsheets into R
  Getting Up and Running with R
  Doing Some Actual Data Science
  Wrapping Up
Conclusion
  Where Am I? What Just Happened?
  Before You Go-Go
  Get Creative and Keep in Touch!
Index

Overview

Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions.

But how exactly does one do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.

Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet.

Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype.

But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.

Each chapter will cover a different technique in a spreadsheet so you can follow along:

Mathematical optimization, including non-linear programming and genetic algorithms
Clustering via k-means, spherical k-means, and graph modularity
Data mining in graphs, such as outlier detection
Supervised AI through logistic regression, ensemble models, and bag-of-words models
Forecasting, seasonal adjustments, and prediction intervals through Monte Carlo simulation
Moving from spreadsheets into the R programming language

You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
