Chapter 1. Importing and Processing Financial Data in Python
This chapter is dedicated to laying the foundation needed to analyze financial data through coding. This requires some preparation, such as downloading the right software and creating an algorithm that fetches historical data automatically.
By the end of the chapter, you should know how to automatically import historical financial data using Python, a skill that should save you time. So letâs get started.
Installing the Environment
The first step is to prepare the environment and everything else necessary for the success of the algorithms. For this, you need two programs:
- A Python interpreter that you use to write and execute code
- Charting and financial software that you use as a database
Letâs start with the Python interpreter. I use a software called SPYDER. Some people may be more familiar with other software such as Jupyter and PyCharm, but the process is the same. You can download SPYDER from the official website or, even better, download it as part of a bigger package called Anaconda, which facilitates installation and offers more tools. Note that it is open source, free-to-use software.
SPYDERâs interface is split into three windows, as you can see in Figure 1-1. The window on the left is used to write the code that is later executed (the algorithm is told to run and apply the code). Typically, you will see multiple lines of code in that area.
The window on the upper right is the variable explorer. Every time a variable is stored, you can see it there. The window on the lower right is the console that shows the result of the code, whether it is an error or an output.
The types of data that you can define and use in the code are classified into several categories:
- Integers
- These are whole numbers, which can be either positive or negative. Examples are â8 and 745. They are, however, limited to between â2147483648 and 2147483647. Any number falling outside this range is considered a different data type called a long. The difference between data types has to do with memory. Integers are 32 bits in width, whereas longs are 64 bits in width.
- Floats
- These are real numbers with decimal points such as 18.54 and 311.52.
- Strings
- These are words stored in a variable. More scientifically, they are a set of structured characters (text). In Python, you write strings between single or double quotes.
In line 1 of the code in Figure 1-1, I have defined a variable called age
and set it to 13. When you run this code, you should see the creation of age
in the variable explorer with type int
(integer) and a value of 13. In line 3, I have executed the code that defines the height
variable set to 184.50 (therefore, a float data type).
Notice that next to the definition of the variable, I have written the phrase in centimeters
, preceded by a hashmark. This is called a comment. Comments are very important in Python for explaining the code. Therefore, anything preceded by #
will not be executed. In the variable explorer, you can see the height
variable with the float
type. Line 5 defines a string, which in the variable explorer is shown as str
type (string). In the console, you can see that the code has been successfully executed because there are no errors, which would be shown in red.
The next step in preparing the environment is to install the charting software that allows you to import historical data into SPYDER. Throughout the book, I use MetaTrader 5, a benchmark charting program used by many traders around the globe. Follow these steps:
- Download SPYDER and familiarize yourself with how it works.
- Download the MetaTrader 5 software.
- Use SPYDER to import historical prices from MetaTrader 5.
From the official website, download and install MetaTrader 5. You need to create a demo account, which is simply a virtual account with imaginary money. The word demo does not refer to a limited duration of use but to the fact that it is not using real money.
To open an account, select File > Open an Account, choose MetaQuotes Software Corp, and then click Next. Next, choose the first option to open a demo account; this will let you trade virtual money. Finally, enter some basic information such as name, email, and account type. You will not receive a verification request or any type of confirmation as the demo should launch directly, allowing you to see the charts.
Figure 1-2 shows the platformâs interface. By default, MetaTrader 5 does not show all the markets it covers, so you need to make them accessible for import and visualization if necessary. Click View, click Market Watch, and then right-click any of the symbols shown in the new tab and choose Show All. This way you can see the extended list with more markets.
Creating the Importing AlgorithmÂ
Being able to automatically summon historical data of any time frame is a wonderful time-saver as it allows you to focus on research and analysis instead of wasting valuable time acquiring and cleaning the data. Letâs create a set of functions that import the historical data of a selected asset almost instantaneously.
Before proceeding to the coding part, you need to install the MetaTrader 5 Python integration library so you can use it later in SPYDER. This is easy and requires one step. Open the Anaconda prompt and type in pip install Metatrader5
, as shown in Figure 1-3.
Installation is the bridge that allows you to use Python libraries designed for MetaTrader 5 in the interpreter.
The following code block uses the import
built-in statement, which calls for internal (self-created) or external (created by third parties) libraries. A library is a store of functions, and thus, you need to import the libraries that are pertinent to what you want to do. For demonstration purposes, import the following modules, packages, and libraries:
import
datetime
import
pytz
import
pandas
as
pd
import
MetaTrader5
as
mt5
import
numpy
as
np
Â
The datetime
module gives tools for manipulating the dates and times, the pytz
library offers cross-platform time zone calculations that are needed for the import, and the pandas
and numpy
libraries are used for data manipulation and analysis.
Note
You mainly use numpy
 for most calculations and data manipulation.
The MetaTrader 5 library imports the functions relating to the softwareâs module and is the key library that will allow you to import the financial historical data.
Note that the three last lines of code contain the as
statement. This is used to give a custom name to the library when you want to use it frequently and save writing space. In other words, Python recognizes the MetaTrader 5 library as mt5 from now on.
Note
Modules are files that contain functions and variables. A package is a collection of modules; it needs to have an init.py file. A library is simply a collection of packages.
Executing the import
statements means that Python now recognizes the functions inside them and will allow you to use them in future code if you decide to call them. You must run them every time you open a new session, which is why the import
statements are usually found at the beginning of the code.
The next step is to create the universe of the time frames that you will be able to import. Even though I will be showing you how to analyze and back-test hourly data, you can define a wider universe, as shown in the following code snippet:
frame_M15
=
mt5
.
TIMEFRAME_M15
# 15-minute time
frameframe_M30
=
mt5
.
TIMEFRAME_M30
# 30-minute time frame
frame_H1
=
mt5
.
TIMEFRAME_H1
# Hourly time frame
frame_H4
=
mt5
.
TIMEFRAME_H4
# 4-hour time frame
frame_D1
=
mt5
.
TIMEFRAME_D1
# Daily time frame
frame_W1
=
mt5
.
TIMEFRAME_W1
# Weekly time frame
frame_M1
=
mt5
.
TIMEFRAME_MN1
# Monthly time frame
A time frame is the frequency with which you record the prices. With hourly data, you will record the last price printed every hour. This means that in a day, you can have up to 24 hourly prices. This allows you to see the intraday evolution of the price. However, the close price is just one of the things that you want to import. During a time period (whether hourly or daily), you will see the following:
- The first price of the time period, which is called the open price.
- The highest price printed during the time period, which is called the high price.Â
- The lowest price printed during the time period, which is called the low price.
- The last price seen before starting a new time period, which is referred to as the close price.
Altogether these are called the OHLC1 data, which are generally ordered as written.
The following code defines the current time, which is used so that the algorithm has a reference point when importing the data. Basically, you are creating a variable that stores the current time and date:
now
=
datetime
.
datetime
.
now
()
Letâs now proceed to defining the universe of the assets you want to back-test. I do a mix of four asset classes: currencies, cryptocurrencies, commodities, and equity indices:
- Currencies (also known as forex, an abbreviation for foreign exchange market) form the biggest financial market in terms of daily volume. Currencies are quoted in pairs, meaning that you cannot just buy USD in absolute terms; you have to buy it using another currency. Therefore, the EURUSD pair refers to the price of 1 EUR in terms of USD. The back-testing universe comprises EURUSD, USDCHF, GBPUSD, and USDCAD.
Note
USD is an abbreviation for the United States dollar, EUR is an abbreviation for the euro currency, CHF is an abbreviation for the Swiss franc, GBP is an abbreviation for Great Britainâs pound, and CAD is an abbreviation for the Canadian dollar.
- Cryptocurrencies (also known as cryptos) are a new, disruptive asset class characterized by severe volatility. The most well-known cryptocurrency is Bitcoin, followed by Ethereum. Note that both are expressed in terms of USD; this is why they are labeled as BTCUSD and ETHUSD.
Note
Notice that Bitcoin (BTC) and Ethereum (ETH) are quoted versus USD. They are generally considered the most liquid cryptocurrency pairs.
- Commodities are physical assets such as gold, silver, and copper. They are divided into many categories such as energy (crude oil, Brent oil, etc.) and industrial metals (copper, zinc, etc.). In the universe of assets, I stick to gold and silver.
- Equity indices are weighted calculations of a select basket of a countryâs stocks. They are used to analyze the overall stock market health of a nation. In this book, I cover the S&P 500, a proxy for US stocks, and the FTSE 100, a proxy for UK stocks:
assets
=
[
'EURUSD'
,
'USDCHF'
,
'GBPUSD'
,
'USDCAD'
,
'BTCUSD'
,
Â'ETHUSD'
,
'XAUUSD'
,
'XAGUSD'
,
'SP500m'
,
'UK100'
]
Now that you have your time and asset variables ready, all you need is to create the structure of the importing algorithm. The get_quotes()
function does this:
def
get_quotes
(
time_frame
,
year
=
2005
,
month
=
1
,
day
=
1
,
Âasset
=
"EURUSD"
):
      Âif
not
mt5
.
initialize
():
        Â(
"initialize() failed, error code ="
,
mt5
.
last_error
())
        Âquit
()
    Âtimezone
=
pytz
.
timezone
(
"Europe/Paris"
)
    Âtime_from
=
datetime
.
datetime
(
year
,
month
,
day
,
tzinfo
=
timezone
)
    Âtime_to
=
datetime
.
datetime
.
now
(
timezone
)
+
datetime
.
timedelta
(
days
=
1
)
    Ârates
=
mt5
.
copy_rates_range
(
asset
,
time_frame
,
time_from
,
time_to
)
    Ârates_frame
=
pd
.
DataFrame
(
rates
)
 Âreturn
rates_frame
Notice that in the get_quotes()
function, you finally use the pytz
and pandas
libraries. The function starts by defining the Olson time zone, which you can set yourself. Here is a brief, nonexhaustive list of what you can enter depending on your time zone:
America
/
New_York
Europe
/
London
Europe
/
Paris
Asia
/
Tokyo
Australia
/
Sydney
Afterward, I define two variables called time_from
and time_to
:
- The
time_from
variable contains the datetime referring to the beginning of the import date (e.g., 01-01-2020). - The
time_to
variable contains the datetime referring to the end of the import date (e.g., 12-31-2020).
The next step is to create a variable that imports the financial data using the time periods you have specified. This is done through the rates
variable using the mt5.copy_rates_range()
function. Finally, using pandas
, transform the data into a data frame.
Note
Throughout the book you will be dealing with arrays and not data frames; however, the function get_quotes()
first imports the values as a data frame due to compatibility, and then you will transform it into an array. At any rate, the main difference between a data frame and an array is the type of data you can keep inside and the structure of the axes.
The final function required for the importing process is the mass_import()
function. It lets you choose the time frame using the variable and then uses the get_quotes()
function to import the data and format it to an array. The following code snippet defines the mass_import()
function:
def
mass_import
(
asset
,
time_frame
):
          Âif
time_frame
==
'H1'
:
   Âdata
=
get_quotes
(
frame_H1
,
2013
,
1
,
1
,
asset
=
assets
[
asset
])
   Âdata
=
data
.
iloc
[:,
1
:
5
]
.
values
   Âdata
=
data
.
round
(
decimals
=
5
)
           Âif
time_frame
==
'D1'
:
   Âdata
=
get_quotes
(
frame_D1
,
2000
,
1
,
1
,
asset
=
assets
[
asset
])
   Âdata
=
data
.
iloc
[:,
1
:
5
]
.
values
   Âdata
=
data
.
round
(
decimals
=
5
)
            Âreturn
data
Â
The mass_import()
function automatically converts the data frame into an array, so you do not have to worry about conversion when using the automatic import.
Note
The algorithm imports a number of historical data limited by MetaTrader 5. Although that number is high, in time you may need to adjust the year argument higher in order to get the data. For instance, if you get an empty array using the mass_import()
function, try putting a more recent year in the get_quotes()
function (â2014â instead of â2013,â as shown in the preceding example).
Even though there is a MAC version of MetaTrader 5, the Python library only works on Windows. It requires an emulator on a Mac. For Mac users, you may also try the manual import method shown later in the chapter.
Putting It All Together
Letâs now see a full example of data import. Remember that the full importing code can be found in the GitHub repository of the book. Normally I am dealing only with hourly data in the book because traders use it heavily, which creates interesting signals; however, letâs try applying a couple of examples of importing after having defined the functions seen in this chapter:
my_data
=
mass_import
(
5
,
'D1'
)
my_data
=
mass_import
(
2
,
'H1'
)
There are many ways to import data into Python; some are automatic, and some are manual. You have just seen the first way of using code to communicate with a charting platform and to download the data. The manual way is to have an Excel file with OHLC data that you have downloaded from a third party. In this case, you can use the pandas
library to import it and transform it into an array.
Suppose that the Excel filename is my_data and the file is stored on your desktop. You have to make sure that the SPYDER directory is in the same place as the file. In laypersonâs terms, SPYDER must search the desktop for the Excel file. To choose the right directory, you must click the folder button next to the arrow, as shown in Figure 1-4.
You should get a separate window where you can choose the desktop location and then validate the choice. Having done this, the tab should look like Figure 1-5.
You must use the read_excel()
function to get the values inside the Excel file. Follow this syntax:
# Importing the excel file into the Python interpreter
my_data
=
pd
.
read_excel
(
'
my_data.xlsx
'
)
Right about now, you have a data frame called my_data
with four different columns representing open, high, low, and close prices. You generally have to enter the libraryâs name before using a function that belongs to it; this is why read_excel()
is preceded by pd
.
Note
Remember that pd
is the shortcut used to refer to pandas
. In parallel, np
is the shortcut used to refer to numpy
.
The following syntax shows how to convert structured elements from a data frame to an array to facilitate manipulation. The array library I use is called numpy
, which is the main library used in this book.
Note
I recommend using the automatic way for Windows users and the manual way for macOS users due to compatibility issues.
In your opinion, what should you do before using the functions of numpy
? If your answer is import the library, then you are correct. The following code snippet imports numpy
and converts my_data
into an array so that it is ready for analysis:
# Importing the library
import
numpy
as
np
# Converting from data frame to array
my_data
=
np
.
array
(
my_data
)
Alternatively, you can do all of this in one line by just adding .values
to pd.âread_excel('my_data.xlsx')
, thus becoming pd.read_excel('my_data.xlsx').âvalues
and resulting in an array instead of a data frame.
Summary
The research and trading framework is composed of four different algorithms that are discussed in detail in the next chapter. They can be summarized as follows:
- Import algorithm
- This is the algorithm shown in this chapter that deals with importing and preparing the historical OHLC data to be analyzed or back-tested. I believe that at this stage you can easily do this automatically and manually.
- Signal algorithm
- This algorithm, which you will see in Chapter 2, will be responsible for generating the buy and sell orders. In essence, it is the set of conditions that gives the green light that a pattern has appeared and a trade can take place.
- Charting algorithm
- This is the simplest algorithm. You will use it to chart the signals on the price chart. Its purpose is to visually analyze the buy and sell signals on the chart. You will also learn about this in Chapter 2.
- Performance algorithm
- This algorithm is used to calculate and analyze the results acquired from the signal algorithm. Basically, it calculates performance metrics on the signals generated following the patternsâ conditions. You will also learn about this in Chapter 2.
It is important that you know how to automatically import and prepare financial data that you will later analyze. Python offers strong and rapid tools to do so; hence, make sure you master this technique so that you supersize your research capabilities.
1 An abbreviation for open, high, low, and close.
Get Mastering Financial Pattern Recognition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.