CHAPTER 19 Basic Principles of Machine Learning

In the following chapters, we focus on topics that commonly fall under the machine learning literature. Although q is not a typical candidate for implementing machine learning methods, we aim to show that many techniques can be implemented in q with little effort. This lets us run smart algorithms right next to the data, and quickly write, modify and adapt our logic while we crunch and review large data sets.

In this chapter, we prepare the ground by introducing two concepts. First, we discuss the various data types we meet in our empirical work and how to (pre-)process them. We then walk through the general programming technique we use to implement the algorithms. This will lay the foundation for subsequent algorithms and adjust our mindset for further exploratory journeys in q.

19.1 NON-NUMERIC FEATURES AND NORMALISATION

Before we dive into the machine learning methods themselves, let us cover two useful technical tricks: dealing with non-numeric features, and normalising features. For both, we provide q functions.

19.1.1 Non-Numeric Features

When working with large and rich data sets, we often encounter features which do not have a numerical representation. In the context of finance, such a feature can be the venue where a financial transaction took place: some assets are traded at the same time on several exchanges, which – due to regulatory ...
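As a minimal sketch of two common ways to make such a feature numeric, assume a hypothetical trade table `t` with a symbol column `venue`. The table, its columns, the venue codes and the helper `oneHot` are illustrative assumptions, not the book's own functions. Each venue can be mapped either to an integer code or to a set of 0/1 indicator columns (one-hot encoding):

```q
/ hypothetical trade table; column names and venue codes are illustrative only
t:([] price:100.1 100.2 100.0 100.3; venue:`XLON`BATE`XLON`CHIX)

/ distinct categories in order of first appearance: `XLON`BATE`CHIX
cats:distinct t`venue

/ option 1: integer code per row, i.e. the index of its venue in cats: 0 1 0 2
codes:cats?t`venue

/ option 2: one boolean indicator column per category (one-hot encoding)
/ x=/:c compares the whole column x against each category in c
oneHot:{[c;x] flip c!x=/:c}
enc:oneHot[cats;t`venue]

/ replace the symbol column by its indicator columns (row-wise join of equal-length tables)
t2:(delete venue from t),'enc
```

Integer codes are compact but impose an artificial ordering on the venues; indicator columns avoid that ordering at the cost of one extra column per distinct venue.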
