First, we use two-dimensional data to see whether any clusters appear. The two attributes are averageRating and episodeNumber.
Before clustering, we need an index column that identifies each row. Let's use tconst as the index:
# set 'tconst' as the index column
pre = re.set_index('tconst')
To see the relationship between the number of episodes and ratings, we drop numVotes and seasonNumber since we only need the other variables:
# drop the columns we don't need; columns= replaces the positional axis argument, which recent pandas versions reject
ratingandepisode = pre.drop(columns=['numVotes', 'seasonNumber'])
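To make the indexing and column-dropping steps concrete, here is a minimal sketch on a small, made-up DataFrame. The rows and the names toy, toy_indexed, and toy_pair are assumptions for illustration only; just the column names come from the text.

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration.
toy = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000003"],
    "averageRating": [7.5, 8.1, 6.3],
    "numVotes": [120, 450, 80],
    "seasonNumber": [1, 1, 2],
    "episodeNumber": [1, 2, 1],
})

# Use tconst as the row index so it is not treated as a feature.
toy_indexed = toy.set_index("tconst")

# Keep only the two attributes we want to compare.
toy_pair = toy_indexed.drop(columns=["numVotes", "seasonNumber"])

print(toy_pair)
```

After these two steps only averageRating and episodeNumber remain as columns, and tconst labels each row.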
After that, let's preprocess the data and get the arrays we need:
processed=preprocessing.scale(ratingandepisode)
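As a rough illustration of what preprocessing.scale does, the sketch below standardizes a small made-up array and compares the result with a manual computation. The values and the names toy_values, scaled, and manual are assumptions for illustration; with default arguments, the function centers each column to zero mean and scales it to unit variance.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical (averageRating, episodeNumber) pairs, invented for illustration.
toy_values = np.array([
    [7.5, 1.0],
    [8.1, 2.0],
    [6.3, 1.0],
    [9.0, 4.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation.
scaled = preprocessing.scale(toy_values)

# The same computation by hand, using the population standard deviation (ddof=0).
manual = (toy_values - toy_values.mean(axis=0)) / toy_values.std(axis=0)

col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print(col_means, col_stds)
```

Putting both attributes on the same scale keeps the one with the larger numeric range from dominating distance calculations during clustering.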
Now, let's plot the scaled data:
# transpose so that each of the two columns becomes its own array
x, y = processed.T
plt.scatter(x, y)
The preceding snippet should ...