Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Chapter 4. Clustering, Classification, and Regression

In this chapter, we will cover the following recipes:

  • Introduction
  • Applying regression analysis for sales data
    • Variable identification
    • Data exploration
    • Feature engineering
    • Applying linear regression
  • Applying logistic regression on bank marketing data
    • Variable identification
    • Data exploration
    • Feature engineering
    • Applying logistic regression
  • Real-time intrusion detection using streaming k-means
    • Variable identification
    • Producer code generating real-time data
    • Applying streaming k-means

Introduction

Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. Many successful applications of machine learning exist already, including systems that analyse past sales ...
