## Chapter 4. Data Types and Structures

Linus Torvalds

This chapter introduces basic data types and data structures of `Python`. Although the `Python` interpreter itself already brings a rich variety of data structures with it, `NumPy` and other libraries add to these in a valuable fashion.

The chapter is organized as follows:

Basic data types
The first section introduces basic data types such as `int`, `float`, and `string`.
Basic data structures
The next section introduces the fundamental data structures of Python (e.g., `list` objects) and illustrates control structures, functional programming paradigms, and anonymous functions.
NumPy data structures
The following section is devoted to the characteristics and capabilities of the `NumPy` `ndarray` class and illustrates some of the benefits of this class for scientific and financial applications.
Vectorization of code
As the final section illustrates, thanks to `NumPy`’s array class vectorized code is easily implemented, leading to more compact and also better-performing code.

The spirit of this chapter is to provide a general introduction to `Python` specifics when it comes to data types and structures. If you are equipped with a background from another programming language, say `C` or `Matlab`, you should be able to easily grasp the differences that `Python` usage might bring along. The topics introduced here are all important and fundamental for the chapters to come.

## Basic Data Types

`Python` is a dynamically typed language, which means that the `Python` interpreter infers the type of an object at runtime. In comparison, compiled languages like `C` are generally statically typed. In these cases, the type of an object has to be attached to the object before compile time.[18]

### Integers

One of the most fundamental data types is the integer, or `int`:

````In` `[``1``]:` `a` `=` `10`
`type``(``a``)````
`Out[1]: int`

The built-in function `type` provides type information for all objects with standard and built-in types as well as for newly created classes and objects. In the latter case, the information provided depends on the description the programmer has stored with the class. There is a saying that “everything in `Python` is an object.” This means, for example, that even simple objects like the `int` object we just defined have built-in methods. For example, you can get the number of bits needed to represent the `int` object in-memory by calling the method `bit_length`:

``In` `[``2``]:` `a``.``bit_length``()``
`Out[2]: 4`

You will see that the number of bits needed increases the higher the integer value is that we assign to the object:

````In` `[``3``]:` `a` `=` `100000`
`a``.``bit_length``()````
`Out[3]: 17`

In general, there are so many different methods that it is hard to memorize all methods of all classes and objects. Advanced `Python` environments, like `IPython`, provide tab completion capabilities that show all methods attached to an object. You simply type the object name followed by a dot (e.g., `a.`) and then press the Tab key, e.g., `a.tab`. This then provides a collection of methods you can call on the object. Alternatively, the `Python` built-in function `dir` gives a complete list of attributes and methods of any object.
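As a small illustration of the `dir` approach, the following sketch filters the attribute list of an `int` object down to its public names. The variable name `public` is only illustrative, and the exact list of names depends on the interpreter version.

```python
a = 10

# dir() returns all attribute names; names starting with an
# underscore are internal and are skipped here.
public = [name for name in dir(a) if not name.startswith('_')]

print(public)
```

The `bit_length` method used above should show up in this list.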

A specialty of `Python` is that integers can be arbitrarily large. Consider, for example, the googol number $10^{100}$. `Python` has no problem with such large numbers, which are technically `long` objects:

````In` `[``4``]:` `googol` `=` `10` `**` `100`
`googol````
```Out[4]: 100000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000L```
``In` `[``5``]:` `googol``.``bit_length``()``
`Out[5]: 333`

### Large Integers

`Python` integers can be arbitrarily large. The interpreter simply uses as many bits/bytes as needed to represent the numbers.
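The following is a minimal sketch of what arbitrary size means in practice. It assumes a Python 3 interpreter, where the separate `long` type no longer exists and every integer is an `int`; the name `bigger` is only illustrative.

```python
# Assumption: Python 3, where int is the single arbitrary-precision type.
googol = 10 ** 100

# Arithmetic stays exact, however large the operands become.
bigger = googol * googol + 1

print(type(bigger).__name__)   # name of the integer type
print(bigger.bit_length())     # bits needed for the binary representation
print(bigger % 10)             # the last decimal digit is preserved exactly
```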

It is important to note that mathematical operations on `int` objects return `int` objects. This can sometimes lead to confusion and/or hard-to-detect errors in mathematical routines. The following expression yields the expected result:

``In` `[``6``]:` `1` `+` `4``
`Out[6]: 5`

However, the next case may return a somewhat surprising result:

``In` `[``7``]:` `1` `/` `4``
`Out[7]: 0`
``In` `[``8``]:` `type``(``1` `/` `4``)``
`Out[8]: int`
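The integer result shown here is the behavior of Python 2. As a hedged side note, assuming a Python 3 interpreter instead, the `/` operator performs true division and the floor division has to be requested explicitly with `//`:

```python
# Assumption: Python 3 semantics (true division by default).
true_quotient = 1 / 4      # float result
floor_quotient = 1 // 4    # integer result, like 1 / 4 in Python 2

# divmod returns the floor quotient and the remainder in one step.
quotient, remainder = divmod(9, 4)

print(true_quotient, floor_quotient, quotient, remainder)
```

Writing `//` whenever an integer quotient is intended makes code behave identically under both interpreter generations.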

### Floats

For the last expression to return the generally desired result of 0.25, we must operate on `float` objects, which brings us naturally to the next basic data type. Adding a dot to an integer value, like in `1.` or `1.0`, causes `Python` to interpret the object as a `float`. Expressions involving a `float` also return a `float` object in general:[19]

``In` `[``9``]:` `1.` `/` `4``
`Out[9]: 0.25`
``In` `[``10``]:` `type` `(``1.` `/` `4``)``
`Out[10]: float`

A `float` is a bit more involved in that the computerized representation of rational or real numbers is in general not exact and depends on the specific technical approach taken. To illustrate what this implies, let us define another `float` object:

````In` `[``11``]:` `b` `=` `0.35`
`type``(``b``)````
`Out[11]: float`

`float` objects like this one are always represented internally up to a certain degree of accuracy only. This becomes evident when adding 0.1 to `b`:

``In` `[``12``]:` `b` `+` `0.1``
`Out[12]: 0.44999999999999996`

The reason for this is that `float`s are internally represented in binary format; that is, a decimal number 0 < n < 1 is represented by a series of the form $n = \frac{x}{2} + \frac{y}{4} + \frac{z}{8} + \ldots$, where each coefficient is either 0 or 1. For certain floating-point numbers the binary representation might involve a large number of elements or might even be an infinite series. However, given a fixed number of bits used to represent such a number (i.e., a fixed number of terms in the representation series), inaccuracies are the consequence. Other numbers can be represented perfectly and are therefore stored exactly even with a finite number of bits available. Consider the following example:

````In` `[``13``]:` `c` `=` `0.5`
`c``.``as_integer_ratio``()````
`Out[13]: (1, 2)`

One half, i.e., 0.5, is stored exactly because it has an exact (finite) binary representation as $0.5 = \frac{1}{2}$. However, for `b = 0.35` we get something different than the expected rational number $0.35 = \frac{7}{20}$:

``In` `[``14``]:` `b``.``as_integer_ratio``()``
`Out[14]: (3152519739159347, 9007199254740992)`
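To make the size of the deviation visible, the following sketch compares the stored value with the intended rational number. It uses the standard library class `fractions.Fraction`, which is not part of the original example; the names `stored`, `intended`, and `error` are illustrative.

```python
from fractions import Fraction

b = 0.35

stored = Fraction(b)          # exact value of the binary double
intended = Fraction(7, 20)    # the rational number 0.35

# The two values differ by a tiny, but nonzero, amount.
error = abs(stored - intended)

print(stored == intended)
print(float(error))
```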

The precision is dependent on the number of bits used to represent the number. In general, all platforms that `Python` runs on use the IEEE 754 double-precision standard (i.e., 64 bits) for internal representation.[20] This translates into a 15-digit relative accuracy.

Since this topic is of high importance for several application areas in finance, it is sometimes necessary to ensure the exact, or at least best possible, representation of numbers. For example, the issue can be of importance when summing over a large set of numbers. In such a situation, a certain kind and/or magnitude of representation error might, in aggregate, lead to significant deviations from a benchmark value.
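The summation issue can be sketched with a deliberately small example. The function `math.fsum` from the standard library is one possible remedy (it is not discussed in the text above); it keeps track of the low-order bits that a naive running sum loses.

```python
import math

values = [0.1] * 10

naive = sum(values)                # one rounding error per addition
compensated = math.fsum(values)    # tracks the lost low-order bits

print(naive == 1.0)
print(compensated == 1.0)
```

With only ten terms the deviation is tiny; the point of the sketch is that such errors accumulate with the number of additions.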

The module `decimal` provides an arbitrary-precision object for floating-point numbers and several options to address precision issues when working with such numbers:

````In` `[``15``]:` `import` `decimal`
`from` `decimal` `import` `Decimal````
``In` `[``16``]:` `decimal``.``getcontext``()``
```
Out[16]: Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999,
         Emax=999999999, capitals=1, flags=[],
         traps=[Overflow, InvalidOperation, DivisionByZero])
```
````In` `[``17``]:` `d` `=` `Decimal``(``1``)` `/` `Decimal` `(``11``)`
`d````
`Out[17]: Decimal('0.09090909090909090909090909091')`

You can change the precision of the representation by changing the respective attribute value of the `Context` object:

``In` `[``18``]:` `decimal``.``getcontext``()``.``prec` `=` `4`  `# lower precision than default``
````In` `[``19``]:` `e` `=` `Decimal``(``1``)` `/` `Decimal` `(``11``)`
`e````
`Out[19]: Decimal('0.09091')`
``In` `[``20``]:` `decimal``.``getcontext``()``.``prec` `=` `50`  `# higher precision than default``
````In` `[``21``]:` `f` `=` `Decimal``(``1``)` `/` `Decimal` `(``11``)`
`f````
`Out[21]: Decimal('0.090909090909090909090909090909090909090909090909091')`

If needed, the precision can in this way be adjusted to the exact problem at hand and one can operate with floating-point objects that exhibit different degrees of accuracy:

````In` `[``22``]:` `g` `=` `d` `+` `e` `+` `f`
`g````
`Out[22]: Decimal('0.27272818181818181818181818181909090909090909090909')`

### Arbitrary-Precision Floats

The module `decimal` provides an arbitrary-precision floating-point number object. In finance, it might sometimes be necessary to ensure high precision and to go beyond the 64-bit double-precision standard.
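Two practical points can be sketched briefly. First, `Decimal` objects built from strings do not inherit the binary rounding error of a `float`. Second, `decimal.localcontext` limits a precision change to a `with` block, which avoids modifying the global context as in the examples above. The variable names are illustrative.

```python
from decimal import Decimal, localcontext

# Constructing from strings keeps the decimal values exact.
b = Decimal('0.35') + Decimal('0.1')
print(b)

# The precision is changed only inside the with block.
with localcontext() as ctx:
    ctx.prec = 6
    short = Decimal(1) / Decimal(11)

print(short)
```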

### Strings

Now that we can represent integers and floating-point numbers, we turn to text. The basic data type to represent text in `Python` is the `string` (the built-in type itself is named `str`). The `string` object has a number of really helpful built-in methods. In fact, `Python` is generally considered to be a good choice when it comes to working with text files of any kind and any size. A `string` object is generally defined by single or double quotation marks or by converting another object using the `str` function (i.e., using the object’s standard or user-defined `string` representation):

``In` `[``23``]:` `t` `=` `'this is a string object'``

With regard to the built-in methods, you can, for example, capitalize the first word in this object:

``In` `[``24``]:` `t``.``capitalize``()``
`Out[24]: 'This is a string object'`

Or you can split it into its single-word components to get a `list` object of all the words (more on `list` objects later):

``In` `[``25``]:` `t``.``split``()``
`Out[25]: ['this', 'is', 'a', 'string', 'object']`

You can also search for a word and get the position (i.e., index value) of the first letter of the word back in a successful case:

``In` `[``26``]:` `t``.``find``(``'string'``)``
`Out[26]: 10`

If the word is not in the `string` object, the method returns -1:

``In` `[``27``]:` `t``.``find``(``'Python'``)``
`Out[27]: -1`

Replacing characters in a string is a typical task that is easily accomplished with the `replace` method:

``In` `[``28``]:` `t``.``replace``(``' '``,` `'|'``)``
`Out[28]: 'this|is|a|string|object'`

The stripping of strings (i.e., deletion of certain leading/trailing characters) is also often necessary:

``In` `[``29``]:` `'http://www.python.org'``.``strip``(``'htp:/'``)``
`Out[29]: 'www.python.org'`

Table 4-1 lists a number of helpful methods of the `string` object.

Table 4-1. Selected string methods

| Method | Arguments | Returns/result |
| --- | --- | --- |
| `capitalize` | `()` | Copy of the string with first letter capitalized |
| `count` | `(sub[, start[, end]])` | Count of the number of occurrences of substring |
| `decode` | `([encoding[, errors]])` | Decoded version of the string, using `encoding` (e.g., UTF-8) |
| `encode` | `([encoding[, errors]])` | Encoded version of the string |
| `find` | `(sub[, start[, end]])` | (Lowest) index where substring is found |
| `join` | `(seq)` | Concatenation of strings in sequence `seq` |
| `replace` | `(old, new[, count])` | Replaces `old` by `new` the first `count` times |
| `split` | `([sep[, maxsplit]])` | List of words in string with `sep` as separator |
| `splitlines` | `([keepends])` | Separated lines with line ends/breaks if `keepends` is `True` |
| `strip` | `(chars)` | Copy of string with leading/trailing characters in `chars` removed |
| `upper` | `()` | Copy with all letters capitalized |
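Some of the methods from Table 4-1 that have not been shown yet can be sketched as follows, reusing the `string` object from above. The variable names are illustrative.

```python
t = 'this is a string object'

words = t.split()                  # list of the single words
joined = '-'.join(words)           # concatenation with a separator
n_s = t.count('s')                 # occurrences of the substring 's'
shout = t.upper()                  # all letters capitalized
lines = 'a\nb\nc'.splitlines()     # split at line breaks

print(joined, n_s, shout, lines)
```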

A powerful tool when working with `string` objects is regular expressions. `Python` provides such functionality in the module `re`:

``In` `[``30``]:` `import` `re``

Suppose you are faced with a large text file, such as a comma-separated value (`CSV`) file, which contains certain time series and respective date-time information. More often than not, the date-time information is delivered in a format that `Python` cannot interpret directly. However, the date-time information can generally be described by a regular expression. Consider the following `string` object, containing three date-time elements, three integers, and three strings. Note that triple quotation marks allow the definition of strings over multiple rows:

```
In [31]: series = """
         '01/18/2014 13:00:00', 100, '1st';
         '01/18/2014 13:30:00', 110, '2nd';
         '01/18/2014 14:00:00', 120, '3rd'
         """
```

The following regular expression describes the format of the date-time information provided in the `string` object:[21]

``In` `[``32``]:` `dt` `=` `re``.``compile``(``"'[0-9/:\s]+'"``)`  `# datetime``

Equipped with this regular expression, we can go on and find all the date-time elements. In general, applying regular expressions to `string` objects also leads to performance improvements for typical parsing tasks:

```
In [33]: result = dt.findall(series)
         result
Out[33]: ["'01/18/2014 13:00:00'",
          "'01/18/2014 13:30:00'",
          "'01/18/2014 14:00:00'"]
```

### Regular Expressions

When parsing `string` objects, consider using regular expressions, which can bring both convenience and performance to such operations.

The resulting `string` objects can then be parsed to generate `Python datetime` objects (cf. Appendix C for an overview of handling date and time data with `Python`). To parse the `string` objects containing the date-time information, we need to provide information on how to parse them, again as a `string` object:

```
In [34]: from datetime import datetime
         pydt = datetime.strptime(result[0].replace("'", ""),
                                  '%m/%d/%Y %H:%M:%S')
         pydt
Out[34]: datetime.datetime(2014, 1, 18, 13, 0)

In [35]: print pydt
Out[35]: 2014-01-18 13:00:00

In [36]: print type(pydt)
Out[36]: <type 'datetime.datetime'>
```
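The two steps, finding the date-time elements and parsing them, can be combined to convert all matches at once. The following self-contained sketch repeats the sample data and assumes the same format string; the name `stamps` is illustrative, and a raw string is used for the pattern so that `\s` is passed to `re` unchanged.

```python
import re
from datetime import datetime

series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

# Same pattern as above: quoted runs of digits, slashes, colons, whitespace.
dt = re.compile(r"'[0-9/:\s]+'")

# Strip the surrounding quotes and parse every match.
stamps = [datetime.strptime(s.strip("'"), '%m/%d/%Y %H:%M:%S')
          for s in dt.findall(series)]

for stamp in stamps:
    print(stamp)
```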

Later chapters provide more information on date-time data, the handling of such data, and `datetime` objects and their methods. This is just meant to be a teaser for this important topic in finance.

## Basic Data Structures

As a general rule, data structures are objects that contain a possibly large number of other objects. Among those that `Python` provides as built-in structures are:

`tuple`
A collection of arbitrary objects; only a few methods available
`list`
A collection of arbitrary objects; many methods available
`dict`
A key-value store object
`set`
An unordered collection object for other unique objects

### Tuples

A `tuple` is an advanced data structure, yet it’s still quite simple and limited in its applications. It is defined by providing objects in parentheses:

````In` `[``37``]:` `t` `=` `(``1``,` `2.5``,` `'data'``)`
`type``(``t``)````
`Out[37]: tuple`

You can even drop the parentheses and provide multiple objects separated by commas:

````In` `[``38``]:` `t` `=` `1``,` `2.5``,` `'data'`
`type``(``t``)````
`Out[38]: tuple`

Like almost all data structures in `Python` the `tuple` has a built-in index, with the help of which you can retrieve single or multiple elements of the `tuple`. It is important to remember that `Python` uses zero-based numbering, such that the third element of a `tuple` is at index position 2:

``In` `[``39``]:` `t``[``2``]``
`Out[39]: 'data'`
``In` `[``40``]:` `type``(``t``[``2``])``
`Out[40]: str`

### Zero-Based Numbering

In contrast to some other programming languages like `Matlab`, `Python` uses zero-based numbering schemes. For example, the first element of a `tuple` object has index value 0.

There are only two special methods that this object type provides: `count` and `index`. The first counts the number of occurrences of a certain object and the second gives the index value of the first appearance of it:

``In` `[``41``]:` `t``.``count``(``'data'``)``
`Out[41]: 1`
``In` `[``42``]:` `t``.``index``(``1``)``
`Out[42]: 0`

`tuple` objects are not very flexible since, once defined, they cannot be changed easily.
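What "cannot be changed" means can be sketched as follows: item assignment raises a `TypeError`, so a modified version has to be built as a new `tuple` object. The names `message` and `t2` are illustrative.

```python
t = (1, 2.5, 'data')

try:
    t[0] = 10                  # item assignment is not supported
except TypeError as err:
    message = str(err)

# A "changed" tuple is a new object built from the old one.
t2 = (10,) + t[1:]

print(message)
print(t2)
```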

### Lists

Objects of type `list` are much more flexible and powerful in comparison to `tuple` objects. From a finance point of view, you can achieve a lot working only with `list` objects, such as storing stock price quotes and appending new data. A `list` object is defined through brackets and the basic capabilities and behavior are similar to those of `tuple` objects:

````In` `[``43``]:` `l` `=` `[``1``,` `2.5``,` `'data'``]`
`l``[``2``]````
`Out[43]: 'data'`

`list` objects can also be defined or converted by using the function `list`. The following code generates a new `list` object by converting the `tuple` object from the previous example:

````In` `[``44``]:` `l` `=` `list``(``t``)`
`l````
`Out[44]: [1, 2.5, 'data']`
``In` `[``45``]:` `type``(``l``)``
`Out[45]: list`

In addition to the characteristics of `tuple` objects, `list` objects are also expandable and reducible via different methods. In other words, whereas `string` and `tuple` objects are immutable sequence objects (with indexes) that cannot be changed once created, `list` objects are mutable and can be changed via different operations. You can append `list` objects to an existing `list` object, and more:

````In` `[``46``]:` `l``.``append``([``4``,` `3``])`  `# append list at the end`
`l````
`Out[46]: [1, 2.5, 'data', [4, 3]]`
````In` `[``47``]:` `l``.``extend``([``1.0``,` `1.5``,` `2.0``])`  `# append elements of list`
`l````
`Out[47]: [1, 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]`
````In` `[``48``]:` `l``.``insert``(``1``,` `'insert'``)`  `# insert object before index position`
`l````
`Out[48]: [1, 'insert', 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]`
````In` `[``49``]:` `l``.``remove``(``'data'``)`  `# remove first occurrence of object`
`l````
`Out[49]: [1, 'insert', 2.5, [4, 3], 1.0, 1.5, 2.0]`
````In` `[``50``]:` `p` `=` `l``.``pop``(``3``)`  `# removes and returns object at index`
`print` `l``,` `p````
`Out[50]: [1, 'insert', 2.5, 1.0, 1.5, 2.0] [4, 3]`

Slicing is also easily accomplished. Here, slicing refers to an operation that breaks down a data set into smaller parts (of interest):

``In` `[``51``]:` `l``[``2``:``5``]`  `# 3rd to 5th elements``
`Out[51]: [2.5, 1.0, 1.5]`

Table 4-2 provides a summary of selected operations and methods of the `list` object.

Table 4-2. Selected operations and methods of list objects

| Method | Arguments | Returns/result |
| --- | --- | --- |
| `l[i] = x` | `[i]` | Replaces `i`th element by `x` |
| `l[i:j:k] = s` | `[i:j:k]` | Replaces every `k`th element from `i` to `j` - 1 by `s` |
| `append` | `(x)` | Appends `x` to object |
| `count` | `(x)` | Number of occurrences of object `x` |
| `del l[i:j:k]` | `[i:j:k]` | Deletes every `k`th element with index values `i` to `j` - 1 |
| `extend` | `(s)` | Appends all elements of `s` to object |
| `index` | `(x[, i[, j]])` | First index of `x` between elements `i` and `j` - 1 |
| `insert` | `(i, x)` | Inserts `x` at/before index `i` |
| `remove` | `(x)` | Removes first occurrence of `x` |
| `pop` | `(i)` | Removes element with index `i` and returns it |
| `reverse` | `()` | Reverses all items in place |
| `sort` | `([cmp[, key[, reverse]]])` | Sorts all items in place |
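The in-place operations from Table 4-2 that are not part of the preceding examples can be sketched with a small list of integers. The comments show the state of the list after each step.

```python
l = [3, 1, 2, 5, 4]

l.sort()                 # in place: [1, 2, 3, 4, 5]
l.reverse()              # in place: [5, 4, 3, 2, 1]
l[1:3] = ['a', 'b']      # slice assignment: [5, 'a', 'b', 2, 1]
del l[::2]               # deletes every 2nd element: ['a', 2]

print(l)
```

Note that `sort` and `reverse` return `None`; they modify the object they are called on.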

### Excursion: Control Structures

Although a topic in itself, control structures like `for` loops are maybe best introduced in `Python` based on `list` objects. This is due to the fact that looping in general takes place over `list` objects, which is quite different to what is often the standard in other languages. Take the following example. The `for` loop loops over the elements of the `list` object `l` with index values 2 to 4 and prints the square of the respective elements. Note the importance of the indentation (whitespace) in the second line:

```
In [52]: for element in l[2:5]:
             print element ** 2
Out[52]: 6.25
         1.0
         2.25
```

This provides a really high degree of flexibility in comparison to the typical counter-based looping. Counter-based looping is also an option with `Python`, but is accomplished based on the (standard) `list` object `range`:

````In` `[``53``]:` `r` `=` `range``(``0``,` `8``,` `1``)`  `# start, end, step width`
`r````
`Out[53]: [0, 1, 2, 3, 4, 5, 6, 7]`
``In` `[``54``]:` `type``(``r``)``
`Out[54]: list`

For comparison, the same loop is implemented using `range` as follows:

```
In [55]: for i in range(2, 5):
             print l[i] ** 2
Out[55]: 6.25
         1.0
         2.25
```

### Looping over Lists

In `Python` you can loop over arbitrary `list` objects, no matter what the content of the object is. This often avoids the introduction of a counter.
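If the index value is needed in addition to the element, the built-in function `enumerate` provides both without a manual counter, and `zip` allows looping over pairs of elements. The following sketch uses hypothetical price quotes; the names `prices` and `changes` are illustrative.

```python
prices = [100.0, 101.5, 99.75]    # hypothetical price quotes

# enumerate yields (index, element) pairs.
for i, price in enumerate(prices):
    print(i, price)

# zip pairs each quote with its successor to get the changes.
changes = [b - a for a, b in zip(prices[:-1], prices[1:])]
print(changes)
```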

`Python` also provides the typical (conditional) control elements `if`, `elif`, and `else`. Their use is comparable to that in other languages:

```
In [56]: for i in range(1, 10):
             if i % 2 == 0:  # % is for modulo
                 print "%d is even" % i
             elif i % 3 == 0:
                 print "%d is multiple of 3" % i
             else:
                 print "%d is odd" % i
Out[56]: 1 is odd
         2 is even
         3 is multiple of 3
         4 is even
         5 is odd
         6 is even
         7 is odd
         8 is even
         9 is multiple of 3
```

Similarly, `while` provides another means to control the flow:

```
In [57]: total = 0
         while total < 100:
             total += 1
         print total
Out[57]: 100
```

A specialty of `Python` is so-called `list` comprehensions. Instead of looping over existing `list` objects, this approach generates `list` objects via loops in a rather compact fashion:

````In` `[``58``]:` `m` `=` `[``i` `**` `2` `for` `i` `in` `range``(``5``)]`
`m````
`Out[58]: [0, 1, 4, 9, 16]`

In a certain sense, this already provides a first means to generate “something like” vectorized code in that loops are rather more implicit than explicit (vectorization of code is discussed in more detail later in this chapter).
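`list` comprehensions can also carry a condition or more than one loop. A brief sketch with illustrative names:

```python
# Only even numbers are squared.
squares_of_even = [i ** 2 for i in range(10) if i % 2 == 0]

# Two loops: the rightmost loop runs fastest.
pairs = [(i, j) for i in range(2) for j in range(3)]

print(squares_of_even)
print(pairs)
```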

### Excursion: Functional Programming

`Python` provides a number of tools for functional programming support as well—i.e., the application of a function to a whole set of inputs (in our case `list` objects). Among these tools are `filter`, `map`, and `reduce`. However, we need a function definition first. To start with something really simple, consider a function `f` that returns the square of the input `x`:

```
In [59]: def f(x):
             return x ** 2
         f(2)
Out[59]: 4
```

Of course, functions can be arbitrarily complex, with multiple input/parameter objects and even multiple outputs (return objects). However, consider the following function:

```
In [60]: def even(x):
             return x % 2 == 0
         even(3)
Out[60]: False
```

The return object is a Boolean. Such a function can be applied to a whole `list` object by using `map`:

``In` `[``61``]:` `map``(``even``,` `range``(``10``))``
`Out[61]: [True, False, True, False, True, False, True, False, True, False]`

To this end, we can also provide a function definition directly as an argument to `map`, by using `lambda` or anonymous functions:

``In` `[``62``]:` `map``(``lambda` `x``:` `x` `**` `2``,` `range``(``10``))``
`Out[62]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]`

Functions can also be used to filter a `list` object. In the following example, the filter returns elements of a `list` object that match the Boolean condition as defined by the `even` function:

``In` `[``63``]:` `filter``(``even``,` `range``(``15``))``
`Out[63]: [0, 2, 4, 6, 8, 10, 12, 14]`

Finally, `reduce` helps when we want to apply a function to all elements of a `list` object that returns a single value only. An example is the sum of all elements in a `list` object (assuming that summation is defined for the objects contained in the list):

``In` `[``64``]:` `reduce``(``lambda` `x``,` `y``:` `x` `+` `y``,` `range``(``10``))``
`Out[64]: 45`

An alternative, nonfunctional implementation could look like the following:

```
In [65]: def cumsum(l):
             total = 0
             for elem in l:
                 total += elem
             return total
         cumsum(range(10))
Out[65]: 45
```

### List Comprehensions, Functional Programming, Anonymous Functions

It can be considered good practice to avoid loops on the `Python` level as far as possible. `list` comprehensions and functional programming tools like `map`, `filter`, and `reduce` provide means to write code without loops that is both compact and in general more readable. `lambda` or anonymous functions are also powerful tools in this context.
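The examples in this excursion show Python 2 behavior, where `map` and `filter` return `list` objects and `reduce` is a built-in function. As a hedged sketch, assuming a Python 3 interpreter, the same results are obtained by wrapping the lazy `map` and `filter` objects in `list` and importing `reduce` from `functools`:

```python
from functools import reduce   # Python 3: reduce is not a built-in

def even(x):
    return x % 2 == 0

# map and filter are lazy in Python 3; list() materializes the results.
squares = list(map(lambda x: x ** 2, range(10)))
evens = list(filter(even, range(15)))
total = reduce(lambda x, y: x + y, range(10))

print(squares, evens, total)
```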

### Dicts

`dict` objects are dictionaries: mutable mappings (not sequences) that allow data retrieval by keys that can, for example, be `string` objects. They are so-called key-value stores. While `list` objects are ordered and sortable, `dict` objects are unordered and unsortable. An example best illustrates further differences to `list` objects. Curly brackets are what define `dict` objects:

````In` `[``66``]:` `d` `=` `{`
`'Name'` `:` `'Angela Merkel'``,`
`'Country'` `:` `'Germany'``,`
`'Profession'` `:` `'Chancelor'``,`
`'Age'` `:` `60`
`}`
`type``(``d``)````
`Out[66]: dict`
``In` `[``67``]:` `print` `d``[``'Name'``],` `d``[``'Age'``]``
`Out[67]: Angela Merkel 60`

Again, this class of objects has a number of built-in methods:

``In` `[``68``]:` `d``.``keys``()``
`Out[68]: ['Country', 'Age', 'Profession', 'Name']`
``In` `[``69``]:` `d``.``values``()``
`Out[69]: ['Germany', 60, 'Chancelor', 'Angela Merkel']`
``In` `[``70``]:` `d``.``items``()``
```Out[70]: [('Country', 'Germany'),
('Age', 60),
('Profession', 'Chancelor'),
('Name', 'Angela Merkel')]```
```
In [71]: birthday = True
         if birthday is True:
             d['Age'] += 1
         print d['Age']
Out[71]: 61
```

There are several methods to get `iterator` objects from the `dict` object. The objects behave like `list` objects when iterated over:

```
In [72]: for item in d.iteritems():
             print item
Out[72]: ('Country', 'Germany')
         ('Age', 61)
         ('Profession', 'Chancelor')
         ('Name', 'Angela Merkel')

In [73]: for value in d.itervalues():
             print type(value)
Out[73]: <type 'str'>
         <type 'int'>
         <type 'str'>
         <type 'str'>
```

Table 4-3 provides a summary of selected operations and methods of the `dict` object.

Table 4-3. Selected operations and methods of dict objects

| Method | Arguments | Returns/result |
| --- | --- | --- |
| `d[k]` | `[k]` | Item of `d` with key `k` |
| `d[k] = x` | `[k]` | Sets item key `k` to `x` |
| `del d[k]` | `[k]` | Deletes item with key `k` |
| `clear` | `()` | Removes all items |
| `copy` | `()` | Makes a copy |
| `has_key` | `(k)` | `True` if `k` is a key |
| `items` | `()` | Copy of all key-value pairs |
| `iteritems` | `()` | Iterator over all items |
| `iterkeys` | `()` | Iterator over all keys |
| `itervalues` | `()` | Iterator over all values |
| `keys` | `()` | Copy of all keys |
| `popitem` | `()` | Returns and removes an arbitrary key-value pair |
| `update` | `([e])` | Updates items with items from `e` |
| `values` | `()` | Copy of all values |
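A few further operations are worth a short sketch. The methods `get` and `pop` and the `in` operator are not listed in Table 4-3; they are standard `dict` features that work in both Python 2 and Python 3, whereas `has_key` and the `iter*` methods exist in Python 2 only. The variable names are illustrative.

```python
d = {'Name': 'Angela Merkel', 'Country': 'Germany', 'Age': 60}

age = d.get('Age')                  # value for an existing key
city = d.get('City', 'unknown')     # default instead of a KeyError
has_name = 'Name' in d              # membership test on the keys

d.update({'Age': 61, 'City': 'Berlin'})   # overwrite and add items
removed = d.pop('Country')          # removes the key, returns its value

print(age, city, has_name, removed)
print(d)
```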

### Sets

The last data structure we will consider is the `set` object. Although set theory is a cornerstone of mathematics and also finance theory, there are not too many practical applications for `set` objects. The objects are unordered collections of other objects, containing every element only once:

````In` `[``74``]:` `s` `=` `set``([``'u'``,` `'d'``,` `'ud'``,` `'du'``,` `'d'``,` `'du'``])`
`s````
`Out[74]: {'d', 'du', 'u', 'ud'}`
``In` `[``75``]:` `t` `=` `set``([``'d'``,` `'dd'``,` `'uu'``,` `'u'``])``

With `set` objects, you can implement operations as you are used to in mathematical set theory. For example, you can generate unions, intersections, and differences:

``In` `[``76``]:` `s``.``union``(``t``)`  `# all of s and t``
`Out[76]: {'d', 'dd', 'du', 'u', 'ud', 'uu'}`
``In` `[``77``]:` `s``.``intersection``(``t``)`  `# both in s and t``
`Out[77]: {'d', 'u'}`
``In` `[``78``]:` `s``.``difference``(``t``)`  `# in s but not t``
`Out[78]: {'du', 'ud'}`
``In` `[``79``]:` `t``.``difference``(``s``)`  `# in t but not s``
`Out[79]: {'dd', 'uu'}`
``In` `[``80``]:` `s``.``symmetric_difference``(``t``)`  `# in either one but not both``
`Out[80]: {'dd', 'du', 'ud', 'uu'}`

One application of `set` objects is to get rid of duplicates in a `list` object. For example:

````In` `[``81``]:` `from` `random` `import` `randint`
`l` `=` `[``randint``(``0``,` `10``)` `for` `i` `in` `range``(``1000``)]`
`# 1,000 random integers between 0 and 10`
`len``(``l``)`  `# number of elements in l````
`Out[81]: 1000`
``In` `[``82``]:` `l``[:``20``]``
`Out[82]: [8, 3, 4, 9, 1, 7, 5, 5, 6, 7, 4, 4, 7, 1, 8, 5, 0, 7, 1, 9]`
````In` `[``83``]:` `s` `=` `set``(``l``)`
`s````
`Out[83]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}`
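The set operations shown above are also available as operators. In addition, since a `set` object does not preserve the order of the original `list` object, the sketch below shows one possible way to remove duplicates while keeping the first-seen order; it relies on `dict` objects preserving insertion order, which is an assumption that holds for Python 3.7 and later only.

```python
s = {'u', 'd', 'ud', 'du'}
t = {'d', 'dd', 'uu', 'u'}

union = s | t        # same as s.union(t)
common = s & t       # same as s.intersection(t)
only_s = s - t       # same as s.difference(t)
either = s ^ t       # same as s.symmetric_difference(t)

# Assumption: Python 3.7+, where dict keys keep insertion order.
l = [8, 3, 4, 8, 3, 1]
unique = list(dict.fromkeys(l))

print(unique)
```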

## NumPy Data Structures

The previous section shows that `Python` provides some quite useful and flexible general data structures. In particular, `list` objects can be considered a real workhorse with many convenient characteristics and application areas. However, scientific and financial applications generally have a need for high-performing operations on special data structures. One of the most important data structures in this regard is the array. Arrays generally structure other (fundamental) objects in rows and columns.

Assume for the moment that we work with numbers only, although the concept generalizes to other types of data as well. In the simplest case, a one-dimensional array then represents, mathematically speaking, a vector of, in general, real numbers, internally represented by `float` objects. It then consists of a single row or column of elements only. In a more common case, an array represents an i × j matrix of elements. This concept generalizes to i × j × k cubes of elements in three dimensions as well as to general n-dimensional arrays of shape i × j × k × l × … .

Mathematical disciplines like linear algebra and vector space theory illustrate that such mathematical structures are of high importance in a number of disciplines and fields. It can therefore prove fruitful to have available a specialized class of data structures explicitly designed to handle arrays conveniently and efficiently. This is where the `Python` library `NumPy` comes into play, with its `ndarray` class.

### Arrays with Python Lists

Before we turn to `NumPy`, let us first construct arrays with the built-in data structures presented in the previous section. `list` objects are particularly suited to accomplishing this task. A simple `list` can already be considered a one-dimensional array:

``In` `[``84``]:` `v` `=` `[``0.5``,` `0.75``,` `1.0``,` `1.5``,` `2.0``]`  `# vector of numbers``

Since `list` objects can contain arbitrary other objects, they can also contain other `list` objects. In that way, two- and higher-dimensional arrays are easily constructed by nested `list` objects:

````In` `[``85``]:` `m` `=` `[``v``,` `v``,` `v``]`  `# matrix of numbers`
`m````
```Out[85]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]```

We can also easily select rows via simple indexing or single elements via double indexing (whole columns, however, are not so easy to select):

``In` `[``86``]:` `m``[``1``]``
`Out[86]: [0.5, 0.75, 1.0, 1.5, 2.0]`
``In` `[``87``]:` `m``[``1``][``0``]``
`Out[87]: 0.5`

Nesting can be pushed further for even more general structures:

````In` `[``88``]:` `v1` `=` `[``0.5``,` `1.5``]`
`v2` `=` `[``1``,` `2``]`
`m` `=` `[``v1``,` `v2``]`
`c` `=` `[``m``,` `m``]`  `# cube of numbers`
`c````
`Out[88]: [[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]`
``In` `[``89``]:` `c``[``1``][``1``][``0``]``
`Out[89]: 1`

Note that combining objects in the way just presented generally works with reference pointers to the original objects. What does that mean in practice? Let us have a look at the following operations:

````In` `[``90``]:` `v` `=` `[``0.5``,` `0.75``,` `1.0``,` `1.5``,` `2.0``]`
`m` `=` `[``v``,` `v``,` `v``]`
`m````
```Out[90]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]```

Now change the value of the first element of the `v` object and see what happens to the `m` object:

````In` `[``91``]:` `v``[``0``]` `=` `'Python'`
`m````
```Out[91]: [['Python', 0.75, 1.0, 1.5, 2.0],
['Python', 0.75, 1.0, 1.5, 2.0],
['Python', 0.75, 1.0, 1.5, 2.0]]```

This can be avoided by using the `deepcopy` function of the `copy` module:

````In` `[``92``]:` `from` `copy` `import` `deepcopy`
`v` `=` `[``0.5``,` `0.75``,` `1.0``,` `1.5``,` `2.0``]`
`m` `=` `[``deepcopy``(``v``)` `for` `_` `in` `range``(``3``)]`  `# one independent copy per row`
`m````
```Out[92]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]```
````In` `[``93``]:` `v``[``0``]` `=` `'Python'`
`m````
```Out[93]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]```
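
Whether two rows are one and the same object can be checked with the `is` operator. The following sketch contrasts list repetition, which only repeats references, with a comprehension that creates one independent copy per row. The names `shared` and `independent` are illustrative and do not appear in the session above:

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

# Repetition copies references: all three rows are the same list object as v.
shared = 3 * [v]
# One deepcopy per row: three separate list objects.
independent = [deepcopy(v) for _ in range(3)]

shared[0][0] = 'Python'       # changes v and therefore every row of shared
independent[0][0] = 'NumPy'   # changes the first row of independent only

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
print(shared[2][0], independent[2][0])   # Python 0.5
```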

### Regular NumPy Arrays

Composing array structures with `list` objects works to some extent, but it is not really convenient: the `list` class was not built with this specific goal in mind but with a much broader and more general scope. A specialized class for handling array-type structures could therefore be really beneficial.

Such a specialized class is `numpy.ndarray`, which has been built with the specific goal of handling n-dimensional arrays both conveniently and efficiently (i.e., with high performance). The basic handling of instances of this class is again best illustrated by examples:

``In` `[``94``]:` `import` `numpy` `as` `np``
````In` `[``95``]:` `a` `=` `np``.``array``([``0``,` `0.5``,` `1.0``,` `1.5``,` `2.0``])`
`type``(``a``)````
`Out[95]: numpy.ndarray`
``In` `[``96``]:` `a``[:``2``]`  `# indexing as with list objects in 1 dimension``
`Out[96]: array([ 0. ,  0.5])`

A major feature of the `numpy.ndarray` class is the multitude of built-in methods. For instance:

``In` `[``97``]:` `a``.``sum``()`  `# sum of all elements``
`Out[97]: 5.0`
``In` `[``98``]:` `a``.``std``()`  `# standard deviation``
`Out[98]: 0.70710678118654757`
``In` `[``99``]:` `a``.``cumsum``()`  `# running cumulative sum``
`Out[99]: array([ 0. ,  0.5,  1.5,  3. ,  5. ])`

Another major feature is the (vectorized) mathematical operations defined on `ndarray` objects:

``In` `[``100``]:` `a` `*` `2``
`Out[100]: array([ 0.,  1.,  2.,  3.,  4.])`
``In` `[``101``]:` `a` `**` `2``
`Out[101]: array([ 0.  ,  0.25,  1.  ,  2.25,  4.  ])`
``In` `[``102``]:` `np``.``sqrt``(``a``)``
```Out[102]: array([ 0.        ,  0.70710678,  1.        ,  1.22474487,  1.41421356])```

The transition to more than one dimension is seamless, and all features presented so far carry over to the more general cases. In particular, the indexing system is made consistent across all dimensions:

````In` `[``103``]:` `b` `=` `np``.``array``([``a``,` `a` `*` `2``])`
`b````
```Out[103]: array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
[ 0. ,  1. ,  2. ,  3. ,  4. ]])```
``In` `[``104``]:` `b``[``0``]`  `# first row``
`Out[104]: array([ 0. ,  0.5,  1. ,  1.5,  2. ])`
``In` `[``105``]:` `b``[``0``,` `2``]`  `# third element of first row``
`Out[105]: 1.0`
``In` `[``106``]:` `b``.``sum``()``
`Out[106]: 15.0`

In contrast to our `list` object-based approach to constructing arrays, the `numpy.ndarray` class knows axes explicitly. Selecting either rows or columns from a matrix is essentially the same:

````In` `[``107``]:` `b``.``sum``(``axis``=``0``)`
`# sum along axis 0, i.e. column-wise sum````
`Out[107]: array([ 0. ,  1.5,  3. ,  4.5,  6. ])`
````In` `[``108``]:` `b``.``sum``(``axis``=``1``)`
`# sum along axis 1, i.e. row-wise sum````
`Out[108]: array([  5.,  10.])`
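
The same consistency holds for selection. As a short sketch, rebuilding the `b` object from the text, a row and a column are picked with the same slice syntax:

```python
import numpy as np

a = np.array([0, 0.5, 1.0, 1.5, 2.0])
b = np.array([a, a * 2])   # shape (2, 5), as in the text

row = b[1, :]      # second row
column = b[:, 1]   # second column, using the same slice syntax

print(b.shape)          # (2, 5)
print(row.tolist())     # [0.0, 1.0, 2.0, 3.0, 4.0]
print(column.tolist())  # [0.5, 1.0]
```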

There are a number of ways to initialize (instantiate) a `numpy.ndarray` object. One is as presented before, via `np.array`. However, this assumes that all elements of the array are already available. In many cases, one would rather instantiate the `numpy.ndarray` object first and populate it later with results generated during the execution of code. To this end, we can use the following functions:

````In` `[``109``]:` `c` `=` `np``.``zeros``((``2``,` `3``,` `4``),` `dtype``=``'i'``,` `order``=``'C'``)`  `# also: np.ones()`
`c````
```Out[109]: array([[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],

[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=int32)```
````In` `[``110``]:` `d` `=` `np``.``ones_like``(``c``,` `dtype``=``'f16'``,` `order``=``'C'``)`  `# also: np.zeros_like()`
`d````
```Out[110]: array([[[ 1.0,  1.0,  1.0,  1.0],
[ 1.0,  1.0,  1.0,  1.0],
[ 1.0,  1.0,  1.0,  1.0]],

[[ 1.0,  1.0,  1.0,  1.0],
[ 1.0,  1.0,  1.0,  1.0],
[ 1.0,  1.0,  1.0,  1.0]]], dtype=float128)```

With all these functions we provide the following information:

`shape`
Either an `int`, a sequence of `int`s, or a reference to another `numpy.ndarray`
`dtype` (optional)
A `numpy.dtype`—these are `NumPy`-specific data types for `numpy.ndarray` objects
`order` (optional)
The order in which to store elements in memory: `C` for `C`-like (i.e., row-wise) or `F` for `Fortran`-like (i.e., column-wise)

Here, it becomes obvious how `NumPy` specializes the construction of arrays with the `numpy.ndarray` class, in comparison to the `list`-based approach:

• The shape/length/size of the array is homogeneous across any given dimension.
• It only allows for a single data type (`numpy.dtype`) for the whole array.
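
Both restrictions can be observed directly. The following is a minimal sketch; `np.empty` is not used in the text above and is included only as one further instantiation function:

```python
import numpy as np

# Mixed int/float input is converted to one common dtype for the whole array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64

# shape, dtype, and order as described above.
z = np.zeros((2, 3), dtype='f8', order='F')
print(z.shape, z.dtype)          # (2, 3) float64
print(z.flags['F_CONTIGUOUS'])   # True

# np.empty allocates memory without initializing it; the contents are
# arbitrary until they are overwritten.
e = np.empty((2, 3), dtype='i4')
e[:] = 7
print(e.shape, e.dtype)          # (2, 3) int32
```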

The role of the `order` parameter is discussed later in the chapter. Table 4-4 provides an overview of `numpy.dtype` objects (i.e., the basic data types `NumPy` allows).

Table 4-4. NumPy dtype objects
| dtype | Description | Example |
|---|---|---|
| `t` | Bit field | `t4` (4 bits) |
| `b` | Boolean | `b` (true or false) |
| `i` | Integer | `i8` (64 bit) |
| `u` | Unsigned integer | `u8` (64 bit) |
| `f` | Floating point | `f8` (64 bit) |
| `c` | Complex floating point | `c16` (128 bit) |
| `O` | Object | `O` (pointer to object) |
| `S`, `a` | String | `S24` (24 characters) |
| `U` | Unicode | `U24` (24 Unicode characters) |
| `V` | Other | `V12` (12-byte data block) |
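
The type strings in the table can be inspected via `np.dtype`. The sketch below prints the kind code and the size in bytes for several of the examples. One caveat worth checking on your own installation: the letters in the table correspond to the kind codes reported by `dtype.kind`, while a bare `'b'` passed to `np.dtype` is read as a signed 8-bit integer and `'?'` gives the Boolean type.

```python
import numpy as np

# Kind code and item size (in bytes) for some type strings from the table.
for code in ['i8', 'u8', 'f8', 'c16', 'S24', 'U24', 'V12']:
    dt = np.dtype(code)
    print(code, dt.kind, dt.itemsize)

# The Boolean type has kind 'b'; the bare string 'b' is a signed byte.
print(np.dtype('?').kind)   # b
print(np.dtype('b').kind)   # i
```

Note that `U24` occupies 96 bytes, since `NumPy` stores four bytes per Unicode character.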

`NumPy` provides a generalization of regular arrays that loosens at least the `dtype` restriction, but let us stick with regular arrays for a moment and see what the specialization brings in terms of performance.

As a simple exercise, suppose we want to generate a matrix/array of shape 5,000 × 5,000 elements, populated with (pseudo)random, standard normally distributed numbers. We then want to calculate the sum of all elements. First, the pure `Python` approach, where we make heavy use of `list` comprehensions and functional programming methods as well as `lambda` functions:

````In` `[``111``]:` `import` `random`
`from` `functools` `import` `reduce`  `# required in Python 3`
`I` `=` `5000````
````In` `[``112``]:` `%``time` `mat` `=` `[[``random``.``gauss``(``0``,` `1``)` `for` `j` `in` `range``(``I``)]` `for` `i` `in` `range``(``I``)]`
`# a nested list comprehension````
```Out[112]: CPU times: user 36.5 s, sys: 408 ms, total: 36.9 s
Wall time: 36.4 s```
````In` `[``113``]:` `%``time` `reduce``(``lambda` `x``,` `y``:` `x` `+` `y``,`      \
`[``reduce``(``lambda` `x``,` `y``:` `x` `+` `y``,` `row``)` \
`for` `row` `in` `mat``])````
```Out[113]: CPU times: user 4.3 s, sys: 52 ms, total: 4.35 s
Wall time: 4.07 s

678.5908519876674```

Let us now turn to `NumPy` and see how the same problem is solved there. For convenience, the `NumPy` sublibrary `random` offers a multitude of functions to initialize a `numpy.ndarray` object and populate it at the same time with (pseudo)random numbers:

``In` `[``114``]:` `%``time` `mat` `=` `np``.``random``.``standard_normal``((``I``,` `I``))``
```Out[114]: CPU times: user 1.83 s, sys: 40 ms, total: 1.87 s
Wall time: 1.87 s```
``In` `[``115``]:` `%``time` `mat``.``sum``()``
```Out[115]: CPU times: user 36 ms, sys: 0 ns, total: 36 ms
Wall time: 34.6 ms

349.49777911439384```

We observe the following:

Syntax
Although we use several approaches to compact the pure `Python` code, the `NumPy` version is even more compact and readable.
Performance
The generation of the `numpy.ndarray` object is roughly 20 times faster and the calculation of the sum is roughly 100 times faster than the respective operations in pure `Python`.
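
The comparison can be reproduced on a smaller scale with the following sketch. It assumes Python 3 (hence the `functools` import) and uses a 500 × 500 matrix instead of the 5,000 × 5,000 one from the text so that it runs quickly; absolute timings depend on the machine and are therefore only printed, not stated here.

```python
import random
import time
from functools import reduce  # reduce is not a built-in in Python 3

import numpy as np

I = 500  # much smaller than the 5,000 used in the text

# Pure Python: nested list comprehension plus reduce/lambda for the sum.
start = time.perf_counter()
mat_list = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
total_list = reduce(lambda x, y: x + y,
                    [reduce(lambda x, y: x + y, row) for row in mat_list])
t_python = time.perf_counter() - start

# NumPy: generation and sum in two vectorized calls.
start = time.perf_counter()
mat_np = np.random.standard_normal((I, I))
total_np = mat_np.sum()
t_numpy = time.perf_counter() - start

print('pure Python: %.4f s, NumPy: %.4f s' % (t_python, t_numpy))

# Sanity check on identical data: both summation approaches agree
# up to floating-point rounding.
same_data_sum = np.array(mat_list).sum()
print(abs(same_data_sum - total_list) < 1e-6)
```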

### Using NumPy Arrays

The use of `NumPy` for array-based operations and algorithms generally results in compact, easily readable code and significant performance improvements over pure `Python` code.

### Structured Arrays

The specialization of the `numpy.ndarray` class brings a number of really valuable benefits with it. However, too narrow a specialization might turn out to be too large a burden for the majority of array-based algorithms and applications. Therefore, `NumPy` provides structured arrays that allow us to have different `NumPy` data types per column, at least. What does “per column” mean? Consider the following initialization of a structured array object:

````In` `[``116``]:` `dt` `=` `np``.``dtype``([(``'Name'``,` `'S10'``),` `(``'Age'``,` `'i4'``),`
`(``'Height'``,` `'f'``),` `(``'Children/Pets'``,` `'i4'``,` `2``)])`
`s` `=` `np``.``array``([(``'Smith'``,` `45``,` `1.83``,` `(``0``,` `1``)),`
`(``'Jones'``,` `53``,` `1.72``,` `(``2``,` `2``))],` `dtype``=``dt``)`
`s````
```Out[116]: array([('Smith', 45, 1.8300000429153442, [0, 1]),
('Jones', 53, 1.7200000286102295, [2, 2])],
dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])```

In a sense, this construction comes quite close to the operation for initializing tables in a `SQL` database. We have column names and column data types, with maybe some additional information (e.g., maximum number of characters per `string` object). The single columns can now be easily accessed by their names:

``In` `[``117``]:` `s``[``'Name'``]``
```Out[117]: array(['Smith', 'Jones'],
dtype='|S10')```
``In` `[``118``]:` `s``[``'Height'``]``.``mean``()``
`Out[118]: 1.7750001`

Once a specific row (i.e., record) is selected, the resulting object behaves much like a `dict` object, in that values can be retrieved via keys:

``In` `[``119``]:` `s``[``1``][``'Age'``]``
`Out[119]: 53`

In summary, structured arrays are a generalization of the regular `numpy.ndarray` object types in that the data type only has to be the same per column, as one is used to in the context of tables in `SQL` databases. One advantage of structured arrays is that a single element of a column can be another multidimensional object and does not have to conform to the basic `NumPy` data types.
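
To illustrate the analogy to database tables a bit further, the following sketch filters and sorts records by named columns. It uses a simplified, hypothetical `people` array with a `U10` (Unicode) name column instead of `S10`, an assumption made here so that the names are plain `str` objects under Python 3:

```python
import numpy as np

dt = np.dtype([('Name', 'U10'), ('Age', 'i4'), ('Height', 'f4')])
people = np.array([('Smith', 45, 1.83), ('Jones', 53, 1.72)], dtype=dt)

# Field access returns a regular array with that column's dtype.
print(people['Age'].dtype)         # int32
print(people['Age'].mean())        # 49.0

# A Boolean condition on one column selects whole records.
older = people[people['Age'] > 50]
print(older['Name'].tolist())      # ['Jones']

# Records can be sorted by a named field.
by_height = np.sort(people, order='Height')
print(by_height['Name'].tolist())  # ['Jones', 'Smith']
```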

### Structured Arrays

`NumPy` provides, in addition to regular arrays, structured arrays that allow the description and handling of rather complex array-oriented data structures with a variety of different data types and even structures per (named) column. They bring `SQL` table-like data structures to `Python`, with all the benefits of regular `numpy.ndarray` objects (syntax, methods, performance).

## Vectorization of Code

Vectorization of code is a strategy to get more compact code that is possibly executed faster. The fundamental idea is to conduct an operation on or to apply a function to a complex object “at once” and not by iterating over the single elements of the object. In `Python`, the functional programming tools `map`, `filter`, and `reduce` provide means for vectorization. In a sense, `NumPy` has vectorization built in deep down in its core.
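
As a brief sketch of the two styles side by side, the following example applies `map`, `filter`, and `reduce` to a small hypothetical `list` and then performs the same operations on a `numpy.ndarray` object. It assumes Python 3, where `map` and `filter` return iterators and `reduce` lives in `functools`:

```python
from functools import reduce  # a built-in in Python 2, imported in Python 3

import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Functional tools on a list.
doubled = list(map(lambda x: 2 * x, values))
large = list(filter(lambda x: x > 2, values))
total = reduce(lambda x, y: x + y, values)

# The same operations on an array.
arr = np.array(values)
print(doubled, (2 * arr).tolist())    # [2.0, 4.0, 6.0, 8.0] twice
print(large, arr[arr > 2].tolist())   # [3.0, 4.0] twice
print(total, arr.sum())               # 10.0 10.0
```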

### Basic Vectorization

As we learned in the previous section, simple mathematical operations can be implemented on `numpy.ndarray` objects directly. For example, we can add two `NumPy` arrays element-wise as follows:

````In` `[``120``]:` `r` `=` `np``.``random``.``standard_normal``((``4``,` `3``))`
`s` `=` `np``.``random``.``standard_normal``((``4``,` `3``))````
``In` `[``121``]:` `r` `+` `s``
```Out[121]: array([[-1.94801686, -0.6855251 ,  2.28954806],
[ 0.33847593, -1.97109602,  1.30071653],
[-1.12066585,  0.22234207, -2.73940339],
[ 0.43787363,  0.52938941, -1.38467623]])```

`NumPy` also supports what is called broadcasting. This allows us to combine objects of different shape within a single operation. We have already made use of this before. Consider the following example:

``In` `[``122``]:` `2` `*` `r` `+` `3``
```Out[122]: array([[ 2.54691692,  1.65823523,  8.14636725],
[ 4.94758114,  0.25648128,  1.89566919],
[ 0.41775907,  0.58038395,  2.06567484],
[ 0.67600205,  3.41004636,  1.07282384]])```

In this case, the `r` object is multiplied by 2 element-wise and then 3 is added element-wise; the 3 is broadcast, or stretched, to the shape of the `r` object. This works with differently shaped arrays as well, up to a certain point:

````In` `[``123``]:` `s` `=` `np``.``random``.``standard_normal``(``3``)`
`r` `+` `s````
```Out[123]: array([[ 0.23324118, -1.09764268,  1.90412565],
[ 1.43357329, -1.79851966, -1.22122338],
[-0.83133775, -1.63656832, -1.13622055],
[-0.70221625, -0.22173711, -1.63264605]])```

This broadcasts the one-dimensional array of size 3 to a shape of (4, 3). The same does not work, for example, with a one-dimensional array of size 4:

````In` `[``124``]:` `s` `=` `np``.``random``.``standard_normal``(``4``)`
`r` `+` `s````
```Out[124]: ValueError
operands could not be broadcast together with shapes (4,3) (4,)```

However, transposing the `r` object makes the operation work again. In the following code, the `transpose` method transforms the `ndarray` object with shape (4, 3) into an object of the same type with shape (3, 4):

``In` `[``125``]:` `r``.``transpose``()` `+` `s``
```Out[125]: array([[-0.63380522,  0.5964174 ,  0.88641996, -0.86931849],
[-1.07814606, -1.74913253,  0.9677324 ,  0.49770367],
[ 2.16591995, -0.92953858,  1.71037785, -0.67090759]])```
``In` `[``126``]:` `np``.``shape``(``r``.``T``)``
`Out[126]: (3, 4)`
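
Transposing is not the only option. As a sketch, a one-dimensional array of size 4 can also be given an explicit second axis via `np.newaxis`, so that shapes (4, 3) and (4, 1) are combined. The arrays below are deterministic stand-ins for the random ones in the text, and the name `s4` is illustrative:

```python
import numpy as np

r = np.arange(12.0).reshape(4, 3)        # stand-in for the random r
s4 = np.array([10.0, 20.0, 30.0, 40.0])  # one value per row of r

# Shapes (4, 3) and (4,) do not broadcast, but (4, 3) and (4, 1) do.
result = r + s4[:, np.newaxis]

# Equivalent: transpose, add, and transpose back.
alternative = (r.T + s4).T

print(result.shape)                          # (4, 3)
print(np.array_equal(result, alternative))   # True
print(result[1].tolist())                    # [23.0, 24.0, 25.0]
```

The result keeps the original (4, 3) shape, which can be more convenient than working with the transposed object.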

As a general rule, custom-defined `Python` functions work with `numpy.ndarray`s as well. If the implementation allows, arrays can be used with functions just as `int` or `float` objects can. Consider the following function:

````In` `[``127``]:` `def` `f``(``x``):`
`return` `3` `*` `x` `+` `5````

We can pass standard `Python` objects as well as `numpy.ndarray` objects (for which the operations in the function have to be defined, of course):

``In` `[``128``]:` `f``(``0.5``)`  `# float object``
`Out[128]: 6.5`
``In` `[``129``]:` `f``(``r``)`  `# NumPy array``
```Out[129]: array([[  4.32037538,   2.98735285,  12.71955087],
[  7.9213717 ,   0.88472192,   3.34350378],
[  1.1266386 ,   1.37057593,   3.59851226],
[  1.51400308,   5.61506954,   2.10923576]])```

Strictly speaking, `NumPy` does not call `f` once per element; rather, the arithmetic operations inside `f` (the multiplication by 3 and the addition of 5) are each applied to the whole array element-wise. In that sense, by using this kind of operation we do not avoid loops; we only avoid them on the `Python` level and delegate the looping to `NumPy`. On the `NumPy` level, looping over the `numpy.ndarray` object is taken care of by highly optimized code, most of it written in `C` and therefore generally much faster than pure `Python`. This explains the “secret” behind the performance benefits of using `NumPy` for array-based use cases.
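
The equivalence between the vectorized call and explicit loops can be checked with a small sketch. The array below is a deterministic stand-in for the random `r` used in the text:

```python
import numpy as np

def f(x):
    return 3 * x + 5

r = np.arange(6.0).reshape(2, 3)

# One call on the whole array; the looping happens inside NumPy.
vectorized = f(r)

# Explicit Python-level loops producing the same numbers.
looped = np.array([[f(value) for value in row] for row in r.tolist()])

print(np.array_equal(vectorized, looped))  # True
print(vectorized.tolist())  # [[5.0, 8.0, 11.0], [14.0, 17.0, 20.0]]
```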

When working with arrays, one has to take care to call the right functions on the respective objects. For example, the `sin` function from the standard `math` module of `Python` does not work with `NumPy` arrays:

````In` `[``130``]:` `import` `math`
`math``.``sin``(``r``)````
```Out[130]: TypeError
only length-1 arrays can be converted to Python scalars```

The function is designed to handle, for example, `float` objects—i.e., single numbers, not arrays. `NumPy` provides the respective counterparts as so-called ufuncs, or universal functions:

``In` `[``131``]:` `np``.``sin``(``r``)`  `# array as input``
```Out[131]: array([[-0.22460878, -0.62167738,  0.53829193],
[ 0.82702259, -0.98025745, -0.52453206],
[-0.96114497, -0.93554821, -0.45035471],
[-0.91759955,  0.20358986, -0.82124413]])```
``In` `[``132``]:` `np``.``sin``(``np``.``pi``)`  `# float as input``
`Out[132]: 1.2246467991473532e-16`

`NumPy` provides a large number of such ufuncs that generalize typical mathematical functions to `numpy.ndarray` objects.[22]
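
As a short sketch of the relationship between the two `sin` functions, the following example confirms that `np.sin` is a ufunc and that its results on an array match those of `math.sin` applied to each element in a `Python` loop:

```python
import math

import numpy as np

x = np.array([0.0, 0.5, 1.0])

print(isinstance(np.sin, np.ufunc))   # True

# np.sin on an array matches math.sin applied element by element.
via_numpy = np.sin(x)
via_math = np.array([math.sin(value) for value in x])
print(np.allclose(via_numpy, via_math))   # True

# A ufunc accepts a single float as well.
print(np.sin(0.0))   # 0.0
```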

### Universal Functions

Be careful when using the `from library import *` approach to importing. Such an approach can cause the `NumPy` reference to the ufunc `numpy.sin` to be replaced by the reference to the `math` function `math.sin`. You should, as a rule, import both libraries by name to avoid confusion: `import numpy as np; import math`. Then you can use `math.sin` alongside `np.sin`.

### Memory Layout

When we first initialized `numpy.ndarray` objects by using `numpy.zeros`, we provided an optional argument for the memory layout. This argument specifies, roughly speaking, which elements of an array get stored in memory next to each other. When working with small arrays, this has hardly any measurable impact on the performance of array operations. However, when arrays get large the story is somewhat different, depending on the operations to be implemented on the arrays.

To illustrate this important point for memory-wise handling of arrays in science and finance, consider the following construction of multidimensional `numpy.ndarray` objects:

````In` `[``133``]:` `x` `=` `np``.``random``.``standard_normal``((``5``,` `10000000``))`
`y` `=` `2` `*` `x` `+` `3`  `# linear equation y = a * x + b`
`C` `=` `np``.``array``((``x``,` `y``),` `order``=``'C'``)`
`F` `=` `np``.``array``((``x``,` `y``),` `order``=``'F'``)`
`x` `=` `0.0``;` `y` `=` `0.0`  `# memory cleanup````
``In` `[``134``]:` `C``[:``2``]``.``round``(``2``)``
```Out[134]: array([[[-0.51, -1.14, -1.07, ...,  0.2 , -0.18,  0.1 ],
[-1.22,  0.68,  1.83, ...,  1.23, -0.27, -0.16],
[ 0.45,  0.15,  0.01, ..., -0.75,  0.91, -1.12],
[-0.16,  1.4 , -0.79, ..., -0.33,  0.54,  1.81],
[ 1.07, -1.07, -0.37, ..., -0.76,  0.71,  0.34]],

[[ 1.98,  0.72,  0.86, ...,  3.4 ,  2.64,  3.21],
[ 0.55,  4.37,  6.66, ...,  5.47,  2.47,  2.68],
[ 3.9 ,  3.29,  3.03, ...,  1.5 ,  4.82,  0.76],
[ 2.67,  5.8 ,  1.42, ...,  2.34,  4.09,  6.63],
[ 5.14,  0.87,  2.27, ...,  1.48,  4.43,  3.67]]])```

Let’s look at some really fundamental examples and use cases for both types of `ndarray` objects:

``In` `[``135``]:` `%``timeit` `C``.``sum``()``
`Out[135]: 10 loops, best of 3: 123 ms per loop`
``In` `[``136``]:` `%``timeit` `F``.``sum``()``
`Out[136]: 10 loops, best of 3: 123 ms per loop`

When summing up all elements of the arrays, there is no performance difference between the two memory layouts. However, consider the following example with the C-like memory layout:

``In` `[``137``]:` `%``timeit` `C``[``0``]``.``sum``(``axis``=``0``)``
`Out[137]: 10 loops, best of 3: 102 ms per loop`
``In` `[``138``]:` `%``timeit` `C``[``0``]``.``sum``(``axis``=``1``)``
`Out[138]: 10 loops, best of 3: 61.9 ms per loop`

Here, `C[0]` has shape (5, 10000000). Summing along axis 0 adds the five long rows element-wise and returns one large results vector with 10,000,000 entries; summing along axis 1 adds up the elements within each row and returns only five numbers. The second operation is faster in this case because, with the `C`-like layout, the elements of each row are stored next to each other in memory. With the `Fortran`-like memory layout, the relative performance changes considerably:

``In` `[``139``]:` `%``timeit` `F``.``sum``(``axis``=``0``)``
`Out[139]: 1 loops, best of 3: 801 ms per loop`
``In` `[``140``]:` `%``timeit` `F``.``sum``(``axis``=``1``)``
`Out[140]: 1 loops, best of 3: 2.23 s per loop`
``In` `[``141``]:` `F` `=` `0.0``;` `C` `=` `0.0`  `# memory cleanup``

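In this case, summing along axis 0 is faster than summing along axis 1. With the `Fortran`-like layout, the first index varies fastest in memory, so the elements combined in a sum along axis 0 are stored next to each other, which explains the relative performance advantage. Note that these two calls operate on the full `F` object with shape (2, 5, 10000000), i.e., on twice as much data as `C[0]`, so the absolute timings are not directly comparable to the previous ones. Even allowing for that, the measured times are much higher than for the `C`-like variant.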
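
What "stored next to each other" means can be inspected via the `strides` attribute, which gives the number of bytes to step in memory to reach the next element along each axis. The following sketch uses a tiny hypothetical array so that the numbers are easy to read:

```python
import numpy as np

base = np.arange(6.0).reshape(2, 3)   # 8-byte floats

c_arr = np.array(base, order='C')
f_arr = np.array(base, order='F')

print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True

# C layout: neighbors within a row are adjacent in memory.
print(c_arr.strides)   # (24, 8)
# Fortran layout: neighbors within a column are adjacent in memory.
print(f_arr.strides)   # (8, 16)

# The layout changes only the storage order, not values or results.
print(np.array_equal(c_arr, f_arr))                          # True
print(np.array_equal(c_arr.sum(axis=0), f_arr.sum(axis=0)))  # True
```

Operations that walk along the axis with the smallest stride read memory sequentially, which is the effect behind the timing differences measured above.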

## Conclusions

`Python` provides, in combination with `NumPy`, a rich set of flexible data structures. From a finance point of view, the following can be considered the most important ones:

Basic data types
In finance, the classes `int`, `float`, and `string` provide the atomic data types.
Standard data structures
The classes `tuple`, `list`, `dict`, and `set` have many application areas in finance, with `list` being the most flexible workhorse in general.
Arrays
A large class of finance-related problems and algorithms can be cast to an array setting; `NumPy` provides the specialized class `numpy.ndarray`, which provides both convenience and compactness of code as well as high performance.

This chapter shows that both the basic data structures and the `NumPy` ones allow for highly vectorized implementation of algorithms. Depending on the specific shape of the data structures, care should be taken with regard to the memory layout of arrays. Choosing the right approach here can speed up code execution by a factor of two or more.

This chapter focuses on those issues that might be of particular importance for finance algorithms and applications. However, it can only represent a starting point for the exploration of data structures and data modeling in `Python`. There are a number of valuable resources available to go deeper from here.

Here are some Internet resources to consult:

Good references in book form are:

• Goodrich, Michael et al. (2013): Data Structures and Algorithms in Python. John Wiley & Sons, Hoboken, NJ.
• Langtangen, Hans Petter (2009): A Primer on Scientific Programming with Python. Springer Verlag, Berlin, Heidelberg.

[18] The `Cython` library brings static typing and compiling features to `Python` that are comparable to those in `C`. In fact, `Cython` is a hybrid language of `Python` and `C`.

[19] Here and in the following discussion, terms like float, float object, etc. are used interchangeably, acknowledging that every float is also an object. The same holds true for other object types.

[21] It is not possible to go into details here, but there is a wealth of information available on the Internet about regular expressions in general and for `Python` in particular. For an introduction to this topic, refer to Fitzgerald, Michael (2012): Introducing Regular Expressions. O’Reilly, Sebastopol, CA.
