Chapter 1. Machine Learning in Finance: The Landscape
Machine learning promises to shake up large swathes of finance
The Economist (2017)
There is a new wave of machine learning and data science in finance, and the related applications will transform the industry over the next few decades.
Currently, most financial firms, including hedge funds, investment and retail banks, and fintech firms, are adopting and investing heavily in machine learning. Going forward, financial institutions will need a growing number of machine learning and data science experts.
Machine learning in finance has become more prominent recently due to the availability of vast amounts of data and more affordable computing power. The use of data science and machine learning is growing rapidly across all areas of finance.
The success of machine learning in finance depends upon building efficient infrastructure, using the correct toolkit, and applying the right algorithms. The concepts related to these building blocks of machine learning in finance are demonstrated and utilized throughout this book.
In this chapter, we provide an introduction to the current and future application of machine learning in finance, including a brief overview of different types of machine learning. This chapter and the two that follow serve as the foundation for the case studies presented in the rest of the book.
Current and Future Machine Learning Applications in Finance
Let’s take a look at some promising machine learning applications in finance. The case studies presented in this book cover all the applications mentioned here.
Algorithmic Trading
Algorithmic trading (or simply algo trading) is the use of algorithms to conduct trades autonomously. With origins going back to the 1970s, algorithmic trading (sometimes called Automated Trading Systems, which is arguably a more accurate description) involves the use of automated preprogrammed trading instructions to make extremely fast, objective trading decisions.
Machine learning stands to push algorithmic trading to new levels. Not only can more advanced strategies be employed and adapted in real time, but machine learning–based techniques can offer even more avenues for gaining special insight into market movements. Most hedge funds and financial institutions do not openly disclose their machine learning–based approaches to trading (for good reason), but machine learning is playing an increasingly important role in calibrating trading decisions in real time.
Portfolio Management and Robo-Advisors
Asset and wealth management firms are exploring potential artificial intelligence (AI) solutions for improving their investment decisions and making use of their troves of historical data.
One example of this is the use of robo-advisors, algorithms built to calibrate a financial portfolio to the goals and risk tolerance of the user. Additionally, they provide automated financial guidance and service to end investors and clients.
A user enters their financial goals (e.g., to retire at age 65 with $250,000 in savings), age, income, and current financial assets. The advisor (the allocator) then spreads investments across asset classes and financial instruments in order to reach the user’s goals.
The system then calibrates to changes in the user’s goals and real-time changes in the market, aiming always to find the best fit for the user’s original goals. Robo-advisors have gained significant traction among consumers who do not need a human advisor to feel comfortable investing.
Fraud Detection
Fraud is a massive problem for financial institutions and one of the foremost reasons to leverage machine learning in finance.
There is currently a significant data security risk due to high computing power, frequent internet use, and an increasing amount of company data being stored online. While previous financial fraud detection systems depended heavily on complex and robust sets of rules, modern fraud detection goes beyond following a checklist of risk factors—it actively learns and calibrates to new potential (or real) security threats.
Machine learning is ideally suited to combating fraudulent financial transactions. This is because machine learning systems can scan through vast datasets, detect unusual activities, and flag them instantly. Given the incalculably high number of ways that security can be breached, genuine machine learning systems will be an absolute necessity in the days to come.
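As a rough illustration of flagging unusual activity, the sketch below scores transaction amounts against a customer's history using a robust z-score (median and median absolute deviation). This is a simple statistical stand-in rather than a learned model, and it is not the method of any particular institution. The transaction amounts, the `flag_unusual` helper, and the threshold of 3.5 are all illustrative assumptions.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return indices of amounts whose robust z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread, so nothing can be ranked as unusual
    flagged = []
    for i, amount in enumerate(amounts):
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        score = 0.6745 * abs(amount - med) / mad
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical card transactions for one customer (in dollars).
history = [23.5, 41.0, 18.2, 35.9, 29.4, 2750.0, 31.1, 27.8]
suspicious = flag_unusual(history)
print(suspicious)
```

A production system would typically learn from many features (merchant, location, time of day) and from labeled fraud cases; the single-feature rule here only shows the "detect and flag" step.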
Loans/Credit Card/Insurance Underwriting
Underwriting could be described as a perfect job for machine learning in finance, and indeed there is a great deal of worry in the industry that machines will replace a large swath of underwriting positions that exist today.
Especially at large companies (big banks and publicly traded insurance firms), machine learning algorithms can be trained on millions of examples of consumer data and financial lending or insurance outcomes, such as whether a person defaulted on their loan or mortgage.
Underlying financial trends can be assessed with algorithms and continuously analyzed to detect trends that might influence lending and underwriting risk in the future. Algorithms can perform automated tasks such as matching data records, identifying exceptions, and calculating whether an applicant qualifies for a credit or insurance product.
Automation and Chatbots
Automation is patently well suited to finance. It reduces the strain that repetitive, low-value tasks put on human employees. It tackles the routine, everyday processes, freeing up teams to finish their high-value work. In doing so, it drives enormous time and cost savings.
Adding machine learning and AI into the automation mix adds another level of support for employees. With access to relevant data, machine learning and AI can provide an in-depth data analysis to support finance teams with difficult decisions. In some cases, it may even be able to recommend the best course of action for employees to approve and enact.
AI and automation in the financial sector can also learn to recognize errors, reducing the time wasted between discovery and resolution. This means that human team members are less likely to be delayed in providing their reports and are able to complete their work with fewer errors.
AI chatbots can be implemented to support finance and banking customers. With the rise in popularity of live chat software in banking and finance businesses, chatbots are the natural evolution.
Risk Management
Machine learning techniques are transforming how we approach risk management. All aspects of understanding and controlling risk are being revolutionized through the growth of solutions driven by machine learning. Examples range from deciding how much a bank should lend a customer to improving compliance and reducing model risk.
Asset Price Prediction
Asset price prediction is one of the most frequently discussed and most sophisticated areas in finance. Predicting asset prices allows one to understand the factors that drive the market and to speculate on asset performance. Traditionally, asset price prediction was performed by analyzing past financial reports and market performance to determine what position to take for a specific security or asset class. However, with a tremendous increase in the amount of financial data, the traditional approaches for analysis and stock-selection strategies are being supplemented with ML-based techniques.
Derivative Pricing
Recent machine learning successes, as well as the fast pace of innovation, indicate that ML applications for derivatives pricing should become widely used in the coming years. The world of Black-Scholes models, volatility smiles, and Excel spreadsheet models should wane as more advanced methods become readily available.
The classic derivative pricing models rely on several impractical assumptions to reproduce the empirical relationship between the underlying input data (strike price, time to maturity, option type) and the price of the derivatives observed in the market. Machine learning methods rely on far fewer assumptions; they simply try to estimate a function between the input data and the price, minimizing the difference between the model's results and the target.
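To make the idea of estimating a function from inputs to price concrete, here is a minimal sketch. It generates call prices from the Black-Scholes formula (standing in for observed market prices) and then predicts the price for an unseen strike and maturity with a k-nearest-neighbor regressor. The k-NN model is chosen only for brevity and is an assumption of this sketch, as are the spot price, rate, volatility, grid, and function names.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, maturity, rate, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


# Assumed parameters; in practice the targets would be observed market prices.
SPOT, RATE, VOL = 100.0, 0.02, 0.2

# Training set: (strike, maturity) -> price on a regular grid.
training = []
for strike in range(80, 121, 2):
    for step in range(1, 21):
        maturity = step / 10
        price = black_scholes_call(SPOT, strike, maturity, RATE, VOL)
        training.append(((strike, maturity), price))


def knn_price(strike, maturity, k=4):
    """Average the prices of the k nearest training points."""
    def distance(features):
        # Rescale by the grid spacing so both inputs contribute comparably.
        return math.hypot((features[0] - strike) / 2.0, (features[1] - maturity) / 0.1)

    nearest = sorted(training, key=lambda row: distance(row[0]))[:k]
    return sum(price for _, price in nearest) / k


predicted = knn_price(101, 0.55)
actual = black_scholes_call(SPOT, 101, 0.55, RATE, VOL)
print(f"k-NN estimate: {predicted:.3f}, Black-Scholes: {actual:.3f}")
```

The model never sees the pricing formula; it only sees input and price pairs, which is the sense in which such methods need fewer assumptions.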
The faster deployment times achieved with state-of-the-art ML tools are just one of the advantages that will accelerate the use of machine learning in derivatives pricing.
Sentiment Analysis
Sentiment analysis involves the perusal of enormous volumes of unstructured data, such as videos, transcriptions, photos, audio files, social media posts, articles, and business documents, to determine market sentiment. Sentiment analysis is crucial for all businesses in today’s workplace and is an excellent example of machine learning in finance.
The most common use of sentiment analysis in the financial sector is the analysis of financial news—in particular, predicting the behaviors and possible trends of markets. The stock market moves in response to myriad human-related factors, and the hope is that machine learning will be able to replicate and enhance human intuition about financial activity by discovering new trends and telling signals.
However, many of the future applications of machine learning will be in understanding social media, news trends, and other data sources related to predicting the sentiments of customers toward market developments. They will not be limited to predicting stock prices and trades.
Trade Settlement
Trade settlement is the process of transferring securities into the account of a buyer and cash into the seller’s account following a transaction of a financial asset.
Despite the majority of trades being settled automatically, and with little or no interaction by human beings, about 30% of trades need to be settled manually.
Machine learning can not only identify the reason for failed trades but also analyze why the trades were rejected, provide a solution, and predict which trades may fail in the future. What would usually take a human being five to ten minutes to fix, machine learning can do in a fraction of a second.
Machine Learning, Deep Learning, Artificial Intelligence, and Data Science
For the majority of people, the terms machine learning, deep learning, artificial intelligence, and data science are confusing. In fact, a lot of people use one term interchangeably with the others.
Figure 1-1 shows the relationships between AI, machine learning, deep learning and data science. Machine learning is a subset of AI that consists of techniques that enable computers to identify patterns in data and to deliver AI applications. Deep learning, meanwhile, is a subset of machine learning that enables computers to solve more complex problems.
Data science isn’t exactly a subset of machine learning, but it uses machine learning, deep learning, and AI to analyze data and reach actionable conclusions. It combines machine learning, deep learning and AI with other disciplines such as big data analytics and cloud computing.
The following is a summary of the details about artificial intelligence, machine learning, deep learning, and data science:
- Artificial intelligence
  Artificial intelligence is the field of study by which a computer (and its systems) develops the ability to successfully accomplish complex tasks that usually require human intelligence. These tasks include, but are not limited to, visual perception, speech recognition, decision making, and translation between languages. AI is usually defined as the science of making computers do things that require intelligence when done by humans.
- Machine learning
  Machine learning is an application of artificial intelligence that provides the AI system with the ability to automatically learn from the environment and apply those lessons to make better decisions. Machine learning uses a variety of algorithms to iteratively learn from and describe data, spot patterns, and then act on those patterns.
- Deep learning
  Deep learning is a subset of machine learning that involves the study of algorithms related to artificial neural networks that contain many blocks (or layers) stacked on each other. The design of deep learning models is inspired by the biological neural network of the human brain. It strives to analyze data with a logical structure similar to how a human draws conclusions.
- Data science
  Data science is an interdisciplinary field similar to data mining that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured. Data science is different from ML and AI because its goal is to gain insight into and understanding of the data by using different scientific tools and techniques. However, there are several tools and techniques common to both ML and data science, some of which are demonstrated in this book.
Machine Learning Types
This section will outline all types of machine learning that are used in different case studies presented in this book for various financial applications. The three types of machine learning, as shown in Figure 1-2, are supervised learning, unsupervised learning, and reinforcement learning.
Supervised
The main goal in supervised learning is to train a model from labeled data that allows us to make predictions about unseen or future data. Here, the term supervised refers to a set of samples where the desired output signals (labels) are already known. There are two types of supervised learning algorithms: classification and regression.
Regression
Regression is the subcategory of supervised learning used to predict continuous outcomes. In regression, we are given a number of predictor (explanatory) variables and a continuous response variable (outcome or target), and we try to find a relationship between those variables that allows us to predict an outcome.
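A minimal sketch of regression with a single predictor follows, using ordinary least squares computed by hand. The data (a market return as the predictor and a stock return as the response) and the `fit_line` helper are hypothetical and serve only to show a relationship being estimated and then used to predict a continuous outcome.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: market return (predictor) and stock return (response), in percent.
market = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
stock = [-2.9, -1.4, 0.2, 1.6, 3.1, 4.4]

intercept, slope = fit_line(market, stock)
predicted = intercept + slope * 1.5  # continuous prediction for an unseen input
print(f"intercept={intercept:.3f}, slope={slope:.3f}, prediction={predicted:.3f}")
```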
An example of regression versus classification is shown in Figure 1-3. The chart on the left shows an example of regression. The continuous response variable is return, and the observed values are plotted against the predicted outcomes. On the right, the outcome is a categorical class label, whether the market is bull or bear, and is an example of classification.
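For contrast with regression, the following sketch predicts a categorical label (bull or bear) with a nearest-centroid classifier. The classifier is chosen here only because it is short; the two features, the labeled samples, and the function names are hypothetical.

```python
import math


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled data: (monthly return in percent, change in a volatility index).
samples = [(2.1, -1.5), (3.4, -0.8), (1.7, -2.0), (-2.8, 3.1), (-1.9, 2.4), (-3.5, 4.0)]
labels = ["bull", "bull", "bull", "bear", "bear", "bear"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, (1.2, -0.5)), predict(centroids, (-2.0, 1.8)))
```

The output is a class label rather than a number, which is the essential difference from the regression case.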
Unsupervised
Unsupervised learning is a type of machine learning used to draw inferences from datasets consisting of input data without labeled responses. There are two types of unsupervised learning: dimensionality reduction and clustering.
Dimensionality reduction
Dimensionality reduction is the process of reducing the number of features, or variables, in a dataset while preserving information and overall model performance. It is a common and powerful way to deal with datasets that have a large number of dimensions.
Figure 1-4 illustrates this concept, where the dimension of data is converted from two dimensions (X1 and X2) to one dimension (Z1). Z1 conveys similar information embedded in X1 and X2 and reduces the dimension of the data.
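The reduction from (X1, X2) to Z1 can be sketched with principal component analysis, one common dimensionality reduction technique (the text above does not name a specific method, so PCA is an assumption here). For two features, the direction of greatest variance has a closed form, so no linear algebra library is needed. The data and function name are hypothetical.

```python
import math


def project_to_one_dimension(points):
    """Project 2-D points onto their direction of greatest variance (PCA)."""
    n = len(points)
    mean_1 = sum(p[0] for p in points) / n
    mean_2 = sum(p[1] for p in points) / n
    centered = [(a - mean_1, b - mean_2) for a, b in points]
    # Entries of the 2x2 sample covariance matrix.
    var_1 = sum(a * a for a, _ in centered) / (n - 1)
    var_2 = sum(b * b for _, b in centered) / (n - 1)
    cov_12 = sum(a * b for a, b in centered) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2.0 * cov_12, var_1 - var_2)
    direction = (math.cos(angle), math.sin(angle))
    z1 = [a * direction[0] + b * direction[1] for a, b in centered]
    # Share of the total variance that survives in the single new feature.
    explained = (sum(z * z for z in z1) / (n - 1)) / (var_1 + var_2)
    return direction, z1, explained


# Hypothetical, strongly correlated features X1 and X2.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction, z1, explained = project_to_one_dimension(data)
print(f"direction={direction}, explained variance={explained:.4f}")
```

Because X1 and X2 move together in this toy data, almost all of the variance is retained in Z1, which is why one dimension can stand in for two.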
Clustering
Clustering is a subcategory of unsupervised learning techniques that allows us to discover hidden structures in data. The goal of clustering is to find a natural grouping in data so that items in the same cluster are more similar to each other than to those from different clusters.
An example of clustering is shown in Figure 1-5, where we can see the entire data clustered into two distinct groups by the clustering algorithm.
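One widely used clustering algorithm is k-means; the sketch below is a bare-bones version of it, offered as an assumption since the text above does not say which algorithm produced the figure. The fund data (annual return and volatility, in percent) and the choice of initial centroids are hypothetical.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Hypothetical funds described by (annual return %, volatility %).
funds = [(4.0, 8.0), (5.0, 9.0), (6.0, 7.0), (5.0, 8.0),
         (12.0, 25.0), (14.0, 28.0), (13.0, 24.0), (15.0, 27.0)]
assignments, centroids = k_means(funds, [funds[0], funds[-1]])
print(assignments, centroids)
```

No labels are supplied; the two groups emerge only from the distances between points.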
Reinforcement Learning
Learning from experiences, and the associated rewards or punishments, is the core concept behind reinforcement learning (RL). It is about taking suitable actions to maximize reward in a particular situation. The learning system, called an agent, can observe the environment, select and perform actions, and receive rewards (or penalties in the form of negative rewards) in return, as shown in Figure 1-6.
Reinforcement learning differs from supervised learning in this way: In supervised learning, the training data has the answer key, so the model is trained with the correct answers available. In reinforcement learning, there is no explicit answer. The learning system (agent) decides what to do to perform the given task and learns whether that was a correct action based on the reward. The algorithm determines the answer key through its experience.
The steps of reinforcement learning are as follows:
- First, the agent interacts with the environment by performing an action.
- Then the agent receives a reward based on the action it performed.
- Based on the reward, the agent receives an observation and understands whether the action was good or bad. If the action was good (that is, if the agent received a positive reward), the agent will prefer performing that action. If the reward was less favorable, the agent will try performing another action to receive a positive reward. It is basically a trial-and-error learning process.
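The loop of acting, receiving a reward, and adjusting preferences can be sketched as follows. The environment is a hypothetical one with a fixed reward per action, and the agent is a greedy learner with optimistic initial value estimates, a simple exploration device picked here so that the example is fully deterministic. The names, rewards, and parameters are illustrative assumptions.

```python
def reward_for(action):
    """Hypothetical environment: a fixed reward for each trading action."""
    return {"hold": 0.0, "buy": 1.0, "sell": -1.0}[action]


def run_agent(actions, steps=50, learning_rate=0.5, initial_value=5.0):
    """Greedy agent that learns an estimated value for each action."""
    # Optimistic initial estimates make every untried action look attractive,
    # so a purely greedy agent still ends up trying each one.
    value = {action: initial_value for action in actions}
    history = []
    for _ in range(steps):
        action = max(value, key=value.get)  # 1. choose and perform an action
        reward = reward_for(action)         # 2. receive a reward
        # 3. move the estimate for that action toward the observed reward
        value[action] += learning_rate * (reward - value[action])
        history.append(action)
    return value, history


value, history = run_agent(["hold", "buy", "sell"])
best_action = max(value, key=value.get)
print(best_action, history[:8])
```

Early on the agent switches between actions as each one disappoints relative to its optimistic estimate; once the estimates settle, it keeps choosing the action with the highest reward.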
Natural Language Processing
Natural language processing (NLP) is a branch of AI that deals with the problems of making a machine understand the structure and the meaning of natural language as used by humans. Several techniques of machine learning and deep learning are used within NLP.
NLP has many applications in the financial sector in areas such as sentiment analysis, chatbots, and document processing. A lot of information, such as sell-side reports, earnings calls, and newspaper headlines, is communicated as text, making NLP quite useful in the financial domain.
Given the extensive application of NLP algorithms based on machine learning in finance, there is a separate chapter of this book (Chapter 10) dedicated to NLP and related case studies.
Chapter Summary
Machine learning is making significant inroads across all the verticals of the financial services industry. This chapter covered different applications of machine learning in finance, from algorithmic trading to robo-advisors. These applications will be covered in the case studies later in this book.
Next Steps
In terms of the platforms used for machine learning, the Python ecosystem is growing, and Python is one of the most dominant programming languages for machine learning. In the next chapter, we will learn about the model development steps, from data preparation to model deployment in a Python-based framework.