In the following chapters, we focus on topics commonly covered in the machine learning literature. Although q is not a typical candidate for implementing machine learning methods, we aim to show that many techniques can be implemented easily in q. This lets us run smart algorithms right next to the data, and write, modify and adapt our logic quickly while we crunch and review large data sets.
In this chapter, we prepare the ground by introducing two concepts. First, we discuss the various data types we meet in our empirical work and how to (pre-)process them. We then walk through the general programming technique we use to implement the algorithms. This will lay the foundation for subsequent algorithms and adjust our mindset for further exploratory journeys in
19.1 NON-NUMERIC FEATURES AND NORMALISATION
Let us first discuss two technical tricks which are useful to cover before we dive into the machine learning methods themselves: dealing with non-numeric features, and the normalisation of features. For both, we provide
19.1.1 Non-Numeric Features
When working with large and rich data sets, we often encounter features which do not have a numerical representation. In the context of finance, such a feature can be the venue where a financial transaction took place: some assets are traded at the same time on several exchanges, which – due to regulatory ...