The Path to Predictive Analytics and Machine Learning

by Conor Doherty, Steven Camina, Kevin White, Gary Orenstein
October 2016
Content level: Intermediate to advanced
87 pages
1h 50m
English
O'Reilly Media, Inc.

Appendix A. Appendix

Sample code that generates data, runs a linear regression, and plots the results:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.arange(1,15)

delta = np.random.uniform(-2,2, size=(14,))

y = .9 * x + 1 +  delta

plt.scatter(x,y, s=50)

slope, intercept, r_val, p_val, err = stats.linregress(x, y)

plt.plot(x, slope * x + intercept)
plt.xlim(0)
plt.ylim(0)

# calling show() will open your plot in a window
# you can save rather than opening the plot using savefig()
plt.show()
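
The comments above mention that savefig() can be used instead of show(). A minimal sketch of that variant follows, under a few assumptions that are not in the original: a fixed random seed so the noise is repeatable, the non-interactive "Agg" backend so no window is needed, and a hypothetical output filename, regression.png. It also reads the fit values by attribute name from the object that linregress returns, which avoids unpacking them into separate variables.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window is opened
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable noise

# Same data shape as the sample above: 14 points on a noisy line
x = np.arange(1, 15)
delta = rng.uniform(-2, 2, size=x.shape)
y = 0.9 * x + 1 + delta

# linregress returns an object whose fields can be read by name
result = stats.linregress(x, y)
print("slope:", result.slope, "intercept:", result.intercept)
print("r-squared:", result.rvalue ** 2)

fig, ax = plt.subplots()
ax.scatter(x, y, s=50)
ax.plot(x, result.slope * x + result.intercept)
ax.set_xlim(left=0)
ax.set_ylim(bottom=0)

# Write the figure to disk rather than showing it (hypothetical filename)
fig.savefig("regression.png")
plt.close(fig)
```

Because the noise is bounded by 2 in either direction, the fitted slope should land reasonably close to the true value of 0.9, though the exact figure depends on the seed.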

Sample code that generates data, runs a clustering algorithm, and plots the results:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.cluster.vq import vq, kmeans

data = np.vstack((np.random.rand(200,2) + \
       np.array([.5, .5]),np.random.rand(200,2)))

centroids2, _ = kmeans(data, 2)
idx2,_ = vq(data,centroids2)

# scatter plot without centroids
plt.figure(1)

plt.plot(data[:,0],data[:,1], 'o')

# scatter plot with 2 centroids
plt.figure(2)

plt.plot(data[:,0],data[:,1],'o')
plt.plot(centroids2[:,0],centroids2[:,1],'sm',markersize=16)

# scatter plot with 2 centroids and points colored by cluster
plt.figure(3)

plt.plot(data[idx2==0,0],data[idx2==0,1],'ob',data[idx2==1,0], \
         data[idx2==1,1],'or')
plt.plot(centroids2[:,0],centroids2[:,1],'sm',markersize=16)

centroids3, _ = kmeans(data, 3)
idx3,_ = vq(data,centroids3)

# scatter plot with 3 centroids and points colored by cluster
plt.figure(4)

plt.plot(data[idx3==0,0],data[idx3==0,1],'ob',
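
The preview ends mid-statement here, and the rest of the authors' listing is not reproduced. As a separate, self-contained sketch of the same idea (not the authors' remaining code), the following colors each point by its assigned cluster using a loop instead of one long plot() call. The fixed seed, the "Agg" backend, and the output filename clusters3.png are assumptions added for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window is opened
import matplotlib.pyplot as plt
from scipy.cluster.vq import vq, kmeans

np.random.seed(0)  # assumption: seed added so the generated data is repeatable

# Two overlapping uniform blobs, 200 points each, as in the sample above
data = np.vstack((np.random.rand(200, 2) + np.array([.5, .5]),
                  np.random.rand(200, 2)))

# Fit up to 3 centroids, then assign every point to its nearest centroid
centroids3, _ = kmeans(data, 3)
idx3, _ = vq(data, centroids3)

fig, ax = plt.subplots()
# kmeans can return fewer centroids than requested, so loop over what came back
for label in range(len(centroids3)):
    members = data[idx3 == label]
    ax.plot(members[:, 0], members[:, 1], 'o')  # one color per cluster
ax.plot(centroids3[:, 0], centroids3[:, 1], 'sm', markersize=16)

fig.savefig("clusters3.png")  # hypothetical filename
plt.close(fig)

# Number of points assigned to each centroid
print(np.bincount(idx3, minlength=len(centroids3)))
```

Looping over the returned centroids keeps the plotting code the same for any number of clusters, at the cost of relying on matplotlib's default color cycle rather than the explicit 'ob' and 'or' styles used earlier.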

Publisher Resources

ISBN: 9781492042884