Clustering

First, we use two-dimensional data to see whether any clusters emerge. The two attributes are averageRating and episodeNumber.

We need to set an index column before clustering. Let's use tconst as the index:

# set 'tconst' as the index column
pre = re.set_index('tconst')

To see the relationship between the number of episodes and ratings, we drop numVotes and seasonNumber since we only need the other variables:

ratingandepisode = pre.drop('numVotes', axis=1)
ratingandepisode = ratingandepisode.drop('seasonNumber', axis=1)
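As a minimal sketch of setting the index and dropping the unused columns, the following uses a small made-up DataFrame. Only the column names follow the text; the rows and the names sample, indexed, and features are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the text.
sample = pd.DataFrame({
    'tconst': ['tt0000001', 'tt0000002', 'tt0000003'],
    'averageRating': [7.5, 8.1, 6.4],
    'numVotes': [120, 950, 40],
    'seasonNumber': [1, 1, 2],
    'episodeNumber': [1, 2, 1],
})

# Use tconst as the row label so it is not treated as a feature.
indexed = sample.set_index('tconst')

# Keep only the two attributes used for clustering.
features = indexed.drop(columns=['numVotes', 'seasonNumber'])
print(features.columns.tolist())
```

Passing columns=[...] drops both columns in one call and is equivalent to two drop calls with axis=1.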

After that, let's standardize the data with scikit-learn's preprocessing.scale, which returns a NumPy array with each column scaled to zero mean and unit variance:

from sklearn import preprocessing
processed = preprocessing.scale(ratingandepisode)
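To make the effect of the scaling step concrete, here is a minimal sketch on made-up values. The array raw and its numbers are hypothetical; the sketch only checks that each column ends up with a mean near 0 and a standard deviation near 1.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical (averageRating, episodeNumber) pairs, not taken from the dataset.
raw = np.array([[7.5, 1.0],
                [8.1, 2.0],
                [6.4, 3.0],
                [9.0, 4.0]])

# Standardize each column independently.
scaled = preprocessing.scale(raw)

# Per-column mean and (population) standard deviation after scaling.
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
```

Standardizing matters here because ratings and episode numbers live on different scales, and distance-based clustering would otherwise be dominated by the attribute with the larger range.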

Now, let's plot the scaled data:

import matplotlib.pyplot as plt
# transpose so that each column unpacks into its own array
x, y = processed.T
plt.scatter(x, y)
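The transpose is what makes the unpacking work: an array of shape (n, 2) becomes (2, n), so its two rows unpack into x and y. A minimal sketch with a made-up array follows; the values, the axis labels, and the assumption that averageRating is the first column are all illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical scaled data with shape (3, 2): one row per title.
scaled = np.array([[-0.5, -1.2],
                   [0.3, 0.0],
                   [1.1, 1.2]])

# Transposing gives shape (2, 3); the two rows unpack into x and y.
x, y = scaled.T

fig, ax = plt.subplots()
points = ax.scatter(x, y)
# Labels assume averageRating is the first column (an assumption).
ax.set_xlabel('averageRating (scaled)')
ax.set_ylabel('episodeNumber (scaled)')
```

In a notebook or interactive session, drop the matplotlib.use('Agg') line and call plt.show() to display the figure.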

The preceding snippet should ...
