Book description
Make sense of your data and predict the unpredictable
About This Book
- A unique book that centers on the six key practical skills needed to develop and implement predictive analytics
- Apply the principles and techniques of predictive analytics to effectively interpret big data
- Solve real-world analytical problems with the help of practical case studies and real-world scenarios taken from the world of healthcare, marketing, and other business domains
Who This Book Is For
This book is for those with a mathematical/statistics background who wish to understand the concepts, techniques, and implementation of predictive analytics to resolve complex analytical issues. Basic familiarity with the R programming language is expected.
What You Will Learn
- Master the core predictive analytics algorithms used in business today
- Learn to implement the six steps for a successful analytics project
- Classify the right algorithm for your requirements
- Use and apply predictive analytics to research problems in healthcare
- Implement predictive analytics to retain and acquire your customers
- Use text mining to understand unstructured data
- Develop models on your own PC or in Spark/Hadoop environments
- Implement predictive analytics products for customers
In Detail
This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions with examples from the world of marketing, healthcare, and retail. We'll get started with a brief history of predictive analytics and learn about different roles and functions people play within a predictive analytics project. Then, we will learn about various ways of installing R along with their pros and cons, combined with a step-by-step installation of RStudio, and a description of the best practices for organizing your projects.
On completing the installation, we will begin to acquire the skills necessary to input, clean, and prepare your data for modeling. We will learn the six specific steps needed to implement and successfully deploy a predictive model, starting with asking the right questions, continuing through model development, and ending with deploying your predictive model into production. We will learn why collaboration is important and how agile, iterative modeling cycles can increase your chances of developing and deploying the most successful model.
We will continue your journey in the cloud, extending your skill set with Databricks and SparkR, which allow you to develop predictive models on many gigabytes of data.
Style and Approach
This book takes a practical, hands-on approach in which the algorithms are explained with the help of real-world use cases. It is written in a well-researched academic style that mixes theoretical and practical information. Code examples are supplied for both the theoretical concepts and the case studies. Key references and summaries are provided at the end of each chapter so that you can explore those topics on your own.
Table of contents
- Preface
Getting Started with Predictive Analytics
- Predictive analytics are in so many industries
- Skills and roles that are important in Predictive Analytics
- Predictive analytics software
- Other helpful tools
- R
- How is a predictive analytics project organized?
- GUIs
- Getting started with RStudio
- The R console
- The source window
- Our first predictive model
- Your second script
- R packages
- References
- Summary
The Modeling Process
- Advantages of a structured approach
- Analytic process methodologies
- An analytics methodology outline – specific steps
- Step 2 data understanding
- Step 3 data preparation
- Step 4 modeling
- Step 5 evaluation
- Step 6 deployment
- References
- Summary
Inputting and Exploring Data
- Data input
- Joining data
- Exploring the hospital dataset
- Transposing a dataframe
- Missing values
- Imputing categorical variables
- Outliers
- Data transformations
- Variable reduction/variable importance
- References
- Summary
Introduction to Regression Algorithms
- Supervised versus unsupervised learning models
- Regression techniques
- Generalized linear models
- Logistic regression
- The odds ratio
- The logistic regression coefficients
- Example - using logistic regression in health care to predict pain thresholds
- Fitting a GLM model
- Examining the residuals
- Added variable plots
- P-values and effect size
- P-values and effect sizes
- Variable selection
- Interactions
- Goodness of fit statistics
- Confidence intervals and Wald statistics
- Basic regression diagnostic plots
- Description of the plots
- Goodness of fit – Hosmer-Lemeshow test
- Regularization
- An example – ElasticNet
- Choosing a correct lambda
- Printing out the possible coefficients based on Lambda
- Summary
Introduction to Decision Trees, Clustering, and SVM
- Decision tree algorithms
- Advantages of decision trees
- Disadvantages of decision trees
- Basic decision tree concepts
- Growing the tree
- Impurity
- Controlling the growth of the tree
- Types of decision tree algorithms
- Examining the target variable
- Using formula notation in an rpart model
- Interpretation of the plot
- Printing a text version of the decision tree
- Pruning
- Other options to render decision trees
- Cluster analysis
- Support vector machines
- References
- Summary
Using Survival Analysis to Predict and Analyze Customer Churn
- What is survival analysis?
- Our customer satisfaction dataset
- Partitioning into training and test data
- Setting the stage by creating survival objects
- Examining survival curves
- Better plots
- Contrasting survival curves
- Testing for the gender difference between survival curves
- Testing for the educational differences between survival curves
- Plotting the customer satisfaction and number of service call curves
- Improving the education survival curve by adding gender
- Transforming service calls to a binary variable
- Testing the difference between customers who called and those who did not
- Cox regression modeling
- Time-based variables
- Comparing the models
- Variable selection
- Summary
Using Market Basket Analysis as a Recommender Engine
- What is market basket analysis?
- Examining the groceries transaction file
- The sample market basket
- Association rule algorithms
- Antecedents and descendants
- Evaluating the accuracy of a rule
- Preparing the raw data file for analysis
- Analyzing the input file
- Scrubbing and cleaning the data
- Removing colors automatically
- Filtering out single item transactions
- Merging the results back into the original data
- Compressing descriptions using camelcase
- Creating the test and training datasets
- Creating the market basket transaction file
- Method two – Creating a physical transactions file
- Converting to a document term matrix
- K-means clustering of terms
- Predicting cluster assignments
- Running the apriori algorithm on the clusters
- Summarizing the metrics
- References
- Summary
Exploring Health Care Enrollment Data as a Time Series
- Time series data
- Health insurance coverage dataset
- Housekeeping
- Read the data in
- Subsetting the columns
- Description of the data
- Target time series variable
- Saving the data
- Determining all of the subset groups
- Merging the aggregate data back into the original data
- Checking the time intervals
- Picking out the top groups in terms of average population size
- Plotting the data using lattice
- Plotting the data using ggplot
- Sending output to an external file
- Examining the output
- Detecting linear trends
- Automating the regressions
- Ranking the coefficients
- Merging scores back into the original dataframe
- Plotting the data with the trend lines
- Plotting all the categories on one graph
- Performing some automated forecasting using the ets function
- Smoothing the data using moving averages
- Simple moving average
- Verifying the SMA calculation
- Exponential moving average
- Using the ets function
- Forecasting using ALL AGES
- Plotting the predicted and actual values
- The forecast (fit) method
- Plotting future values with confidence bands
- Modifying the model to include a trend component
- Running the ets function iteratively over all of the categories
- Accuracy measures produced by onestep
- Comparing the Test and Training for the "UNDER 18 YEARS" group
- Accuracy measures
- References
- Summary
Introduction to Spark Using R
- About Spark
- Spark environments
- SparkR
- Building our first Spark dataframe
- Importing the sample notebook
- Creating a new notebook
- Becoming large by starting small
- Running the code
- Running the initialization code
- Extracting the Pima Indians diabetes dataset
- Simulating the data
- Simulating the negative cases
- Running summary statistics
- Saving your work
- Summary
Exploring Large Datasets Using Spark
- Performing some exploratory analysis on positives
- Cleaning up and caching the table in memory
- Some useful Spark functions to explore your data
- Creating new columns
- Constructing a cross-tab
- Contrasting histograms
- Plotting using ggplot
- Spark SQL
- Registering tables
- Issuing SQL through the R interface
- Using SQL to examine potential outliers
- Creating some aggregates
- Picking out some potential outliers using a third query
- Changing to the SQL API
- SQL – computing a new column using the Case statement
- Evaluating outcomes based upon the Age segment
- Computing mean values for all of the variables
- Exporting data from Spark back into R
- Running local R packages
- Some tips for using Spark
- Summary
Spark Machine Learning - Regression and Cluster Models
- About this chapter/what you will learn
- Splitting the data into train and test datasets
- Spark machine learning using logistic regression
- Running predictions for the test data
- Combining the training and test dataset
- Exposing the three tables to SQL
- Validating the regression results
- Calculating goodness of fit measures
- Confusion matrix for test group
- Plotting outside of Spark
- Creating some global views
- Normalizing the data
- Characterizing the clusters by their mean values
- Summary
Spark Models – Rule-Based Learning
Product information
- Title: Practical Predictive Analytics
- Author(s):
- Release date: June 2017
- Publisher(s): Packt Publishing
- ISBN: 9781785886188