May 2018
Beginner
490 pages
13h 16m
English
The training dataset consists of 5,000 lines. The first line is a header, kept for maintenance purposes (data checking) and not used for training. k-means clustering is an unsupervised learning algorithm: it groups unlabeled data into clusters, and the resulting cluster labels can then be used to make predictions on new data. The following code displays the dataset:
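#I.The training Dataset
import pandas as pd  # pd is used below, so the import is required

dataset = pd.read_csv('data.csv')  # the first line of the file is read as the header
print(dataset.head())  # first five rows
print(dataset)  # full dataset (pandas abbreviates long output)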
The print(dataset) line is not required, but it can be useful for checking the training data during a prototype phase or for maintenance purposes. The following output confirms that the data was correctly imported:
'''Output of print(dataset)
   Distance  location
0        80        53
1        18         8
2        55        38
...'''
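To make the idea of turning unlabeled rows into cluster-labeled rows concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the book's implementation: it uses only the three rows visible in the output above, the helper names (squared_distance, nearest, kmeans) are hypothetical, k=2 is an arbitrary choice, and the initial centroids are simply the first k points rather than a random draw.

```python
# Minimal k-means sketch (Lloyd's algorithm), standard library only.
# Assumptions: k=2 and "first k points" initialization are illustrative choices.

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) after alternating assignment and update steps."""
    centroids = [tuple(float(c) for c in p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# The three (Distance, location) rows shown in the output above.
points = [(80, 53), (18, 8), (55, 38)]
centroids, labels = kmeans(points, k=2)

# Predicting for a new, hypothetical point: pick the nearest learned centroid.
new_label = nearest((20, 10), centroids)
```

Here the unlabeled rows end up with labels [0, 1, 0], and the hypothetical new point (20, 10) falls into cluster 1 because it lies closest to that cluster's centroid. On the full 5,000-line dataset, a library implementation with a better initialization strategy would normally be preferable to this sketch.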