Chapter 3. Implementing Machine Learning Algorithms

All of the analysis that we discussed so far in this report was manual. We looked at some data, we had some idea what we wanted to find or highlight, we transformed the data, and we built a visualization. Machine learning aims to make the process more automated. In general, machine learning is the process of building models automatically from data. There are two basic kinds of algorithms. Supervised algorithms learn to generalize from data with known answers, while unsupervised algorithms automatically learn to model data without known structure.

In this chapter, we implement a basic, unsupervised machine learning algorithm called k-means clustering that automatically splits inputs into a specified number of groups. We’ll use it to group countries based on the indicators obtained in the previous chapter.

This chapter also shows the F# language from a different perspective. So far, we did not need to implement any complicated logic and mostly relied on existing libraries. In contrast, this chapter uses just the standard F# library, and you’ll see a number of ways in which F# makes it very easy to implement new algorithms—the primary way is type inference which lets us write efficient and correct code while keeping it very short and readable.

How k-Means Clustering Works

The k-means clustering algorithm takes input data, together with the number k that specifies how many clusters we want to obtain, and automatically assigns the ...

Get Analyzing and Visualizing Data with F# now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.