Chapter 1. Basic Math and Calculus Review
We will kick off the first chapter covering what numbers are and how variables and functions work on a Cartesian system. We will then cover exponents and logarithms. After that, we will learn the two basic operations of calculus: derivatives and integrals.
Before we dive into the applied areas of essential math such as probability, linear algebra, statistics, and machine learning, we should probably review a few basic math and calculus concepts. Before you drop this book and run screaming, do not worry! I will present how to calculate derivatives and integrals for a function in a way you were probably not taught in college. We have Python on our side, not a pencil and paper. Even if you are not familiar with derivatives and integrals, you still do not need to worry.
I will make these topics as tight and practical as possible, focusing only on what will help us in later chapters and what falls under the “essential math” umbrella.
This Is Not a Full Math Crash Course!
This is by no means a comprehensive review of high school and college math. If you want that, a great book to check out is No Bullshit Guide to Math and Physics by Ivan Savov (pardon my French). The first few chapters contain the best crash course on high school and college math I have ever seen. The book Mathematics 1001 by Dr. Richard Elwes has some great content as well, and in bite-sized explanations.
Number Theory
What are numbers? I promise to not be too philosophical in this book, but are numbers not a construct we have defined? Why do we have the digits 0 through 9, and not have more digits than that? Why do we have fractions and decimals and not just whole numbers? This area of math where we muse about numbers and why we designed them a certain way is known as number theory.
Number theory goes all the way back to ancient times, when mathematicians studied different number systems, and it explains why we have accepted them the way we do today. Here are different number systems that you may recognize:
 Natural numbers

These are the numbers 1, 2, 3, 4, 5…and so on. Only positive numbers are included here, and they are the earliest known system. Natural numbers are so ancient that cavemen scratched tally marks on bones and cave walls to keep records.
 Whole numbers

Adding to natural numbers, the concept of “0” was later accepted; we call these “whole numbers.” The Babylonians also developed the useful idea of placeholder notation for empty “columns” in numbers greater than 9, such as “10,” “1,000,” or “1,090.” Those zeros indicate no value occupying that column.
 Integers

Integers include positive and negative natural numbers as well as 0. We may take them for granted, but ancient mathematicians deeply distrusted the idea of negative numbers. But when you subtract 5 from 3, you get –2. This is especially useful when it comes to finances, where we measure profits and losses. In 628 AD, an Indian mathematician named Brahmagupta showed why negative numbers were necessary for arithmetic to progress with the quadratic formula, and therefore integers became accepted.
 Rational numbers

Any number that you can express as a fraction, such as 2/3, is a rational number. This includes all finite decimals and integers since they can be expressed as fractions, too, such as 687/100 = 6.87 and 2/1 = 2, respectively. They are called rational because they are ratios. Rational numbers were quickly deemed necessary because time, resources, and other quantities could not always be measured in discrete units. Milk does not always come in gallons. We may have to measure it as parts of a gallon. If I run for 12 minutes, I cannot be forced to measure in whole miles when in actuality I ran 9/10 of a mile.
 Irrational numbers

Irrational numbers cannot be expressed as a fraction. This includes the famous $\pi $, square roots of certain numbers like $\sqrt{2}$, and Euler’s number $e$, which we will learn about later. These numbers have an infinite number of decimal digits, such as 3.141592653589793238462…
There is an interesting history behind irrational numbers. The Greek mathematician Pythagoras believed all numbers are rational. He believed this so fervently, he made a religion that prayed to the number 10. “Bless us, divine number, thou who generated gods and men!” he and his followers would pray (why “10” was so special, I do not know). There is a legend that one of his followers, Hippasus, proved not all numbers are rational simply by demonstrating the square root of 2. This severely messed with Pythagoras’s belief system, and he responded by drowning Hippasus at sea.
Regardless, we now know not all numbers are rational.
 Real numbers

Real numbers include rational as well as irrational numbers. In practicality, when you are doing any data science work you can treat any decimals you work with as real numbers.
 Complex and imaginary numbers

You encounter this number type when you take the square root of a negative number. While imaginary and complex numbers have relevance in certain types of problems, we will mostly steer clear of them.
In data science, you will find most (if not all) of your work will be using whole numbers, natural numbers, integers, and real numbers. Imaginary numbers may be encountered in more advanced use cases such as matrix decomposition, which we will touch on in Chapter 4.
Complex and Imaginary Numbers
If you do want to learn about imaginary numbers, there is a great playlist Imaginary Numbers are Real on YouTube.
Order of Operations
Hopefully, you are familiar with order of operations, which is the order you solve each part of a mathematical expression. As a brief refresher, recall that you evaluate components in parentheses, followed by exponents, then multiplication, division, addition, and subtraction. Operations at the same level are then performed left-to-right. You can remember the order of operations by the mnemonic device PEMDAS (Please Excuse My Dear Aunt Sally), which corresponds to the ordering parentheses, exponents, multiplication, division, addition, and subtraction.
Take for example this expression:

$2 \times \frac{(3 + 2)^2}{5} - 4$

First we evaluate the parentheses (3 + 2), which equals 5:

$2 \times \frac{5^2}{5} - 4$

Next we solve the exponent, which we can see is squaring that 5 we just summed. That is 25:

$2 \times \frac{25}{5} - 4$

Next up we have multiplication and division. Let’s go ahead and multiply the 2 with the $\frac{25}{5}$, yielding $\frac{50}{5}$:

$\frac{50}{5} - 4$

Next we will perform the division, dividing 50 by 5, which will yield 10:

$10 - 4$

And finally, we perform any addition and subtraction. Of course, $10 - 4$ is going to give us 6:

$10 - 4 = 6$
Sure enough, if we were to express this in Python we would print a value of 6.0, as shown in Example 1-1.

Example 1-1. Solving an expression in Python

```python
my_value = 2 * (3 + 2)**2 / 5 - 4

print(my_value)  # prints 6.0
```
This may be elementary but it is still critical. In code, even if you get the correct result without them, it is a good practice to liberally use parentheses in complex expressions so you establish control of the evaluation order.
Here I group the fractional part of my expression in parentheses, helping to set it apart from the rest of the expression in Example 1-2.

Example 1-2. Making use of parentheses for clarity in Python

```python
my_value = 2 * ((3 + 2)**2 / 5) - 4

print(my_value)  # prints 6.0
```
While both examples are technically correct, the latter is more clear to us easily confused humans. If you or someone else makes changes to your code, the parentheses provide an easy reference of operation order as you make changes. This provides a line of defense against code changes to prevent bugs as well.
Variables
If you have done some scripting with Python or another programming language, you have an idea what a variable is. In mathematics, a variable is a named placeholder for an unspecified or unknown number.
You may have a variable x representing any real number, and you can multiply that variable without declaring what it is. In Example 1-3 we take a variable input x from a user and multiply it by 3.

Example 1-3. A variable in Python that is then multiplied

```python
x = int(input("Please input a number\n"))

product = 3 * x

print(product)
```
There are some standard variable names for certain variable types. If these variable names and concepts are unfamiliar, no worries! But some readers might recognize we use theta $\theta $ to denote angles and beta $\beta $ for a parameter in a linear regression. Greek symbols make awkward variable names in Python, so we would likely name these variables theta and beta in Python, as shown in Example 1-4.

Example 1-4. Greek variable names in Python

```python
beta = 1.75
theta = 30.0
```
Note also that variable names can be subscripted so that several instances of a variable name can be used. For practical purposes, just treat these as separate variables. If you encounter variables $x_1$, $x_2$, and $x_3$, just treat them as three separate variables, as shown in Example 1-5.

Example 1-5. Expressing subscripted variables in Python

```python
x1 = 3  # or x_1 = 3
x2 = 10  # or x_2 = 10
x3 = 44  # or x_3 = 44
```
Functions
Functions are expressions that define relationships between two or more variables. More specifically, a function takes input variables (also called domain variables or independent variables), plugs them into an expression, and then results in an output variable (also called dependent variable).
Take this simple linear function:

$y = 2x + 1$

For any given x-value, we solve the expression with that x to find y. When x = 1, then y = 3. When x = 2, y = 5. When x = 3, y = 7 and so on, as shown in Table 1-1.

| x | 2x + 1 | y |
|---|---|---|
| 0 | 2(0) + 1 | 1 |
| 1 | 2(1) + 1 | 3 |
| 2 | 2(2) + 1 | 5 |
| 3 | 2(3) + 1 | 7 |
Functions are useful because they model a predictable relationship between variables, such as how many fires y can we expect at x temperature. We will use linear functions to perform linear regressions in Chapter 5.
Another convention you may see for the dependent variable y is to explicitly label it a function of x, such as $f\left(x\right)$. So rather than express a function as $y=2x+1$, we can also express it as:

$f\left(x\right) = 2x + 1$
Example 1-6 shows how we can declare a mathematical function and iterate it in Python.

Example 1-6. Declaring a linear function in Python

```python
def f(x):
    return 2*x + 1

x_values = [0, 1, 2, 3]

for x in x_values:
    y = f(x)
    print(y)
```
When dealing with real numbers, a subtle but important feature of functions is they often have an infinite number of x-values and resulting y-values. Ask yourself this: how many x-values can we put through the function $y=2x+1$? Rather than just 0, 1, 2, 3…why not 0, 0.5, 1, 1.5, 2, 2.5, 3 as shown in Table 1-2?

| x | 2x + 1 | y |
|---|---|---|
| 0.0 | 2(0) + 1 | 1 |
| 0.5 | 2(.5) + 1 | 2 |
| 1.0 | 2(1) + 1 | 3 |
| 1.5 | 2(1.5) + 1 | 4 |
| 2.0 | 2(2) + 1 | 5 |
| 2.5 | 2(2.5) + 1 | 6 |
| 3.0 | 2(3) + 1 | 7 |
Or why not do quarter steps for x? Or 1/10 of a step? We can make these steps infinitely small, effectively showing $y=2x+1$ is a continuous function, where for every possible value of x there is a value for y. This segues us nicely to visualize our function as a line, as shown in Figure 1-1.
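To make those ever-smaller steps concrete, here is a small sketch (not one of the book's numbered examples) that evaluates the function at quarter steps in plain Python:

```python
def f(x):
    return 2 * x + 1

# Quarter steps from 0 to 3: 0, 0.25, 0.5, ..., 3.0
x_values = [i / 4 for i in range(0, 13)]

for x in x_values:
    print(x, f(x))
```

Shrinking the divisor (4, then 10, then 100, and so on) packs more and more points between 0 and 3, hinting at the infinitely many x-values a continuous function accepts.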
When we plot on a two-dimensional plane with two number lines (one for each variable) it is known as a Cartesian plane, x-y plane, or coordinate plane. We trace a given x-value and then look up the corresponding y-value, and plot the intersections as a line. Notice that due to the nature of real numbers (or decimals, if you prefer), there are an infinite number of x-values. This is why when we plot the function f(x) we get a continuous line with no breaks in it. There are an infinite number of points on that line, or any part of that line.
If you want to plot this using Python, there are a number of charting libraries from Plotly to matplotlib. Throughout this book we will use SymPy to do many tasks, and the first we will use is plotting a function. SymPy uses matplotlib, so make sure you have that package installed. Otherwise it will print an ugly text-based graph to your console. After that, just declare the x variable to SymPy using symbols(), declare your function, and then plot it as shown in Example 1-7 and Figure 1-2.

Example 1-7. Charting a linear function in Python using SymPy

```python
from sympy import *

x = symbols('x')
f = 2*x + 1
plot(f)
```
Example 1-8 and Figure 1-3 are another example, showing the function $f\left(x\right)={x}^{2}+1$.

Example 1-8. Charting a function involving an exponent

```python
from sympy import *

x = symbols('x')
f = x**2 + 1
plot(f)
```
Note in Figure 1-3 we do not get a straight line but rather a smooth, symmetrical curve known as a parabola. It is continuous but not linear, as it does not produce values in a straight line. Curvy functions like this are mathematically harder to work with, but we will learn some tricks to make it not so bad.
Curvilinear Functions
When a function is continuous but curvy, rather than linear and straight, we call it a curvilinear function.
Note that functions can utilize multiple input variables, not just one. For example, we can have a function with independent variables x and y. Note that y is not dependent like in previous examples:

$f\left(x,y\right) = 2x + 3y$
Since we have two independent variables (x and y) and one dependent variable (the output of f(x,y)), we need to plot this graph on three dimensions to produce a plane of values rather than a line, as shown in Example 1-9 and Figure 1-4.

Example 1-9. Declaring a function with two independent variables in Python

```python
from sympy import *
from sympy.plotting import plot3d

x, y = symbols('x y')
f = 2*x + 3*y
plot3d(f)
```
No matter how many independent variables you have, your function will typically output only one dependent variable. When you solve for multiple dependent variables, you will likely be using separate functions for each one.
Summations
I promised not to use equations full of Greek symbols in this book. However, there is one that is so common and useful that I would be remiss to not cover it. A summation is expressed as a sigma $\Sigma $ and adds elements together.
For example, if I want to iterate the numbers 1 through 5, multiply each by 2, and sum them, here is how I would express that using a summation:

$\sum_{i=1}^{5} 2i = (2\times1) + (2\times2) + (2\times3) + (2\times4) + (2\times5) = 30$

Example 1-10 shows how to execute this in Python.

Example 1-10. Performing a summation in Python

```python
summation = sum(2*i for i in range(1, 6))

print(summation)
```
Note that i is a placeholder variable representing each consecutive index value we are iterating in the loop, which we multiply by 2 and then sum all together. When you are iterating data, you may see variables like ${x}_{i}$ indicating an element in a collection at index i.
The range() function
Recall that the range() function in Python is end exclusive, meaning if you invoke range(1,4) it will iterate the numbers 1, 2, and 3. It excludes the 4 as an upper boundary.
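A quick check of that behavior:

```python
# range(1, 4) yields 1, 2, 3 -- the upper boundary 4 is excluded
print(list(range(1, 4)))  # prints [1, 2, 3]
```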
It is also common to see n represent the number of items in a collection, like the number of records in a dataset. Here is one such example where we iterate a collection of numbers of size n, multiply each one by 10, and sum them:

$\sum_{i=1}^{n} 10{x}_{i}$
In Example 1-11 we use Python to execute this expression on a collection of four numbers. Note that in Python (and most programming languages in general) we typically reference items starting at index 0, while in math we start at index 1. Therefore, we shift accordingly in our iteration by starting at 0 in our range().

Example 1-11. Summation of elements in Python

```python
x = [1, 4, 6, 2]
n = len(x)

summation = sum(10*x[i] for i in range(0, n))

print(summation)
```
That is the gist of summation. In a nutshell, a summation $\Sigma $ says, “add a bunch of things together,” and uses an index i and a maximum value n to express each iteration feeding into the sum. We will see these throughout this book.
Exponents
Exponents multiply a number by itself a specified number of times. When you raise 2 to the third power (expressed as ${2}^{3}$), that is multiplying three 2s together:

${2}^{3} = 2 \times 2 \times 2 = 8$
The base is the variable or value we are exponentiating, and the exponent is the number of times we multiply the base value. For the expression ${2}^{3}$, 2 is the base and 3 is the exponent.
Exponents have a few interesting properties. Say we multiplied ${x}^{2}$ and ${x}^{3}$ together. Observe what happens next when I expand the exponents with simple multiplication and then consolidate into a single exponent:

${x}^{2}{x}^{3} = (x \times x)(x \times x \times x) = {x}^{2+3} = {x}^{5}$
When we multiply exponents together with the same base, we simply add the exponents, which is known as the product rule. Let me emphasize that the base of all multiplied exponents must be the same for the product rule to apply.
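We can spot-check the product rule numerically in Python, using an arbitrary base of 3:

```python
x = 3  # an arbitrary base for demonstration

# x^2 * x^3 should equal x^(2+3)
print(x**2 * x**3)  # prints 243
print(x**(2 + 3))   # prints 243
```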
Let’s explore division next. What happens when we divide ${x}^{2}$ by ${x}^{5}$?

$\frac{{x}^{2}}{{x}^{5}} = \frac{x \times x}{x \times x \times x \times x \times x} = \frac{1}{{x}^{3}}$

As you can see, when we divide ${x}^{2}$ by ${x}^{5}$ we can cancel out two x’s in the numerator and denominator, leaving us with $\frac{1}{{x}^{3}}$. When a factor exists in both the numerator and denominator, we can cancel out that factor.

What about the remaining exponent of 3, you wonder? This is a good point to introduce negative exponents, which are another way of expressing an exponent operation in the denominator of a fraction. To demonstrate, $\frac{1}{{x}^{3}}$ is the same as ${x}^{-3}$:

${x}^{-3} = \frac{1}{{x}^{3}}$

Tying back the product rule, we can see it applies to negative exponents, too. To get intuition behind this, let’s approach this problem a different way. We can express this division of two exponents by making the “5” exponent of ${x}^{5}$ negative, and then multiplying it with ${x}^{2}$. When you add a negative number, it is effectively performing subtraction. Therefore, the exponent product rule summing the multiplied exponents still holds up, as shown next:

$\frac{{x}^{2}}{{x}^{5}} = {x}^{2}{x}^{-5} = {x}^{2-5} = {x}^{-3} = \frac{1}{{x}^{3}}$
Last but not least, can you figure out why any base with an exponent of 0 is 1?
The best way to get this intuition is to reason that any number divided by itself is 1. If you have $\frac{{x}^{3}}{{x}^{3}}$ it is algebraically obvious that reduces to 1. But that expression also evaluates to ${x}^{0}$:

$1 = \frac{{x}^{3}}{{x}^{3}} = {x}^{3-3} = {x}^{0}$
By the transitive property, which states that if a = b and b = c, then a = c, we know that ${x}^{0}=1$.
Now what about fractional exponents? They are an alternative way to represent roots, such as the square root. As a brief refresher, a $\sqrt{4}$ asks “What number multiplied by itself will give me 4?” which of course is 2. Note here that ${4}^{1/2}$ is the same as $\sqrt{4}$:

${4}^{1/2} = \sqrt{4} = 2$
Cubed roots are similar to square roots, but they seek a number multiplied by itself three times to give a result. A cubed root of 8 is expressed as $\sqrt[3]{8}$ and asks “What number multiplied by itself three times gives me 8?” This number would be 2 because $2 \times 2 \times 2 = 8$. In exponents a cubed root is expressed as a fractional exponent, and $\sqrt[3]{8}$ can be re-expressed as ${8}^{1/3}$:

$\sqrt[3]{8} = {8}^{1/3} = 2$

To bring it back full circle, what happens when you multiply the cubed root of 8 three times? This will undo the cubed root and yield 8. Alternatively, if we express the cubed root as the fractional exponent ${8}^{1/3}$, it becomes clear we add the exponents together to get an exponent of 1. That also undoes the cubed root:

${8}^{1/3} \times {8}^{1/3} \times {8}^{1/3} = {8}^{\frac{1}{3}+\frac{1}{3}+\frac{1}{3}} = {8}^{1} = 8$
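These fractional-exponent identities are easy to try in Python. Note that floating-point roots can be off by a tiny rounding error, hence the round() in this sketch:

```python
# Cube root of 8 expressed as a fractional exponent
print(round(8 ** (1/3), 8))              # 2.0

# Multiplying the cube root three times undoes the root
print(round(8 ** (1/3 + 1/3 + 1/3), 8))  # 8.0
```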
And one last property: an exponent of an exponent will multiply the exponents together. This is known as the power rule. So ${\left({8}^{3}\right)}^{2}$ would simplify to ${8}^{6}$:

${\left({8}^{3}\right)}^{2} = {8}^{3 \times 2} = {8}^{6}$

If you are skeptical why this is, try expanding it and you will see the product rule makes it clear:

${\left({8}^{3}\right)}^{2} = {8}^{3} \times {8}^{3} = {8}^{3+3} = {8}^{6}$
Lastly, what does it mean when we have a fractional exponent with a numerator other than 1, such as ${8}^{\frac{2}{3}}$? Well, that is taking the cube root of 8 and then squaring it. Take a look:

${8}^{\frac{2}{3}} = {\left({8}^{\frac{1}{3}}\right)}^{2} = {2}^{2} = 4$
And yes, irrational numbers can serve as exponents like ${8}^{\pi}$, which is approximately 687.2913. This may feel unintuitive, and understandably so! In the interest of time, we will not dive deep into this as it requires some calculus. But essentially, we can calculate irrational exponents by approximating with a rational number. This is effectively what computers do, since they can compute to only so many decimal places anyway.

For example $\pi $ has an infinite number of decimal places. But if we take the first 11 digits, 3.1415926535, we can approximate $\pi $ as the rational number 31415926535 / 10000000000. Sure enough, this gives us approximately 687.2913, which should approximately match any calculator:

${8}^{\frac{31415926535}{10000000000}} = 687.2913$
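Here is that approximation idea as a quick sketch in Python, comparing the rational approximation against using math.pi directly:

```python
from math import pi

# Approximate pi as a rational number in the exponent
approx = 8 ** (31415926535 / 10000000000)
exact = 8 ** pi

print(approx)  # approximately 687.2913
print(exact)   # approximately 687.2913
```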
Logarithms
A logarithm is a math function that finds a power for a specific number and base. It may not sound interesting at first, but it actually has many applications. From measuring earthquakes to managing volume on your stereo, the logarithm is found everywhere. It also finds its way into machine learning and data science a lot. As a matter of fact, logarithms will be a key part of logistic regressions in Chapter 6.
Start your thinking by asking “2 raised to what power gives me 8?” One way to express this mathematically is to use an x for the exponent:

${2}^{x} = 8$

We intuitively know the answer, $x=3$, but we need a more elegant way to express this common math operation. This is what the $log\left(\right)$ function is for:

${log}_{2}8 = x$

As you can see in the preceding logarithm expression, we have a base 2 and are finding a power to give us 8. More generally, we can re-express a variable exponent as a logarithm:

${a}^{x} = b$

${log}_{a}b = x$
Algebraically speaking, this is a way of isolating the x, which is important when solving for x. Example 1-12 shows how we calculate this logarithm in Python.

Example 1-12. Using the log function in Python

```python
from math import log

# 2 raised to what power gives me 8?
x = log(8, 2)

print(x)  # prints 3.0
```
When you do not supply a base argument to a log() function on a platform like Python, it will typically have a default base. In some fields, like earthquake measurements, the default base for the log is 10. But in data science the default base for the log is Euler’s number $e$. Python uses the latter, and we will talk about $e$ shortly.
Just like exponents, logarithms have several properties when it comes to multiplication, division, exponentiation, and so on. In the interest of time and focus, I will just present this in Table 1-3. The key idea to focus on is that a logarithm finds an exponent for a given base to result in a certain number.

If you need to dive into logarithmic properties, Table 1-3 displays exponent and logarithm behaviors side-by-side that you can use for reference.
| Operator | Exponent property | Logarithm property |
|---|---|---|
| Multiplication | ${x}^{m} \times {x}^{n} = {x}^{m+n}$ | $log(a \times b) = log\left(a\right) + log\left(b\right)$ |
| Division | $\frac{{x}^{m}}{{x}^{n}} = {x}^{m-n}$ | $log\left(\frac{a}{b}\right) = log\left(a\right) - log\left(b\right)$ |
| Exponentiation | ${\left({x}^{m}\right)}^{n} = {x}^{mn}$ | $log\left({a}^{n}\right) = n \times log\left(a\right)$ |
| Zero Exponent | ${x}^{0} = 1$ | $log\left(1\right) = 0$ |
| Inverse | ${x}^{-1} = \frac{1}{x}$ | $log\left({x}^{-1}\right) = log\left(\frac{1}{x}\right) = -log\left(x\right)$ |
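To build some trust in these logarithm properties, here is a quick sketch spot-checking a few of them with Python’s math.log (the values of a and b are arbitrary):

```python
from math import log, isclose

a, b = 4.0, 9.0

# Multiplication: log(a * b) = log(a) + log(b)
print(isclose(log(a * b), log(a) + log(b)))  # True

# Division: log(a / b) = log(a) - log(b)
print(isclose(log(a / b), log(a) - log(b)))  # True

# Exponentiation: log(a**n) = n * log(a)
print(isclose(log(a ** 3), 3 * log(a)))      # True
```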
Euler’s Number and Natural Logarithms
There is a special number that shows up quite a bit in math called Euler’s number $e$. It is a special number much like Pi $\pi $ and is approximately 2.71828. $e$ is used a lot because it mathematically simplifies a lot of problems. We will cover $e$ in the context of exponents and logarithms.
Euler’s Number
Back in high school, my calculus teacher demonstrated Euler’s number in several exponential problems. Finally I asked, “Mr. Nowe, what is $e$ anyway? Where does it come from?” I remember never being fully satisfied with the explanations involving rabbit populations and other natural phenomena. I hope to give a more satisfying explanation here.
Here is how I like to discover Euler’s number. Let’s say you loan $100 to somebody with 20% interest annually. Typically, interest will be compounded monthly, so the interest each month would be $.20/12=.01666$. How much will the loan balance be after two years? To keep it simple, let’s assume the loan does not require payments (and no payments are made) until the end of those two years.
Putting together the exponent concepts we learned so far (or perhaps pulling out a finance textbook), we can come up with a formula to calculate interest. It consists of a balance A for a starting investment P, interest rate r, time span t (number of years), and periods n (number of months in each year). Here is the formula:

$A = P \times {\left(1 + \frac{r}{n}\right)}^{nt}$

So if we were to compound interest every month, the loan would grow to $148.69 as calculated here:

$A = 100 \times {\left(1 + \frac{.20}{12}\right)}^{12 \times 2} = 148.6914$
If you want to do this in Python, try it out with the code in Example 1-13.

Example 1-13. Calculating compound interest in Python

```python
from math import exp

p = 100
r = .20
t = 2.0
n = 12

a = p * (1 + (r/n))**(n * t)

print(a)  # prints 148.69146179463576
```
But what if we compounded interest daily? What happens then? Change n to 365:

$A = 100 \times {\left(1 + \frac{.20}{365}\right)}^{365 \times 2} = 149.1661$

Huh! If we compound our interest daily instead of monthly, we would earn 47.4666 cents more at the end of two years. If we got greedy, why not compound every hour as shown next? Will that give us even more? There are 8,760 hours in a year, so set n to that value:

$A = 100 \times {\left(1 + \frac{.20}{8760}\right)}^{8760 \times 2} = 149.1817$

Ah, we squeezed out roughly 2 cents more in interest! But are we experiencing a diminishing return? Let’s try to compound every minute! Note that there are 525,600 minutes in a year, so let’s set that value to n:

$A = 100 \times {\left(1 + \frac{.20}{525600}\right)}^{525600 \times 2} = 149.1824584$
OK, we are only gaining smaller and smaller fractions of a cent the more frequently we compound. So if I keep making these periods infinitely smaller to the point of compounding continuously, where does this lead?
Let me introduce you to Euler’s number $e$, which is approximately 2.71828. Here is the formula to compound “continuously,” meaning we are compounding nonstop:

$A = P \times {e}^{rt}$

Returning to our example, let’s calculate the balance of our loan after two years if we compounded continuously:

$A = 100 \times {e}^{.20 \times 2} = 149.1825$
This is not too surprising considering compounding every minute got us a balance of 149.1824584. That got us really close to our value of 149.1824698 when compounding continuously.
Typically you use $e$ as an exponent base in Python, Excel, and other platforms using the exp() function. You will find that $e$ is so commonly used, it is the default base for both exponent and logarithm functions.
Example 1-14 calculates continuous interest in Python using the exp() function.

Example 1-14. Calculating continuous interest in Python

```python
from math import exp

p = 100  # principal, starting amount
r = .20  # interest rate, by year
t = 2.0  # time, number of years

a = p * exp(r * t)

print(a)  # prints 149.18246976412703
```
So where do we derive this constant $e$? Compare the compounding interest formula and the continuous interest formula. They structurally look similar but have some differences:

$A = P \times {\left(1 + \frac{r}{n}\right)}^{nt}$

$A = P \times {e}^{rt}$

More technically speaking, $e$ is the resulting value of the expression ${(1+\frac{1}{n})}^{n}$ as n forever gets bigger and bigger, thus approaching infinity. Try experimenting with increasingly large values for n. By making it larger and larger, you will notice something:

${\left(1 + \frac{1}{100}\right)}^{100} = 2.7048...$

${\left(1 + \frac{1}{1000}\right)}^{1000} = 2.7169...$

${\left(1 + \frac{1}{10000}\right)}^{10000} = 2.7181...$

${\left(1 + \frac{1}{100000}\right)}^{100000} = 2.7182...$
As you make n larger, there is a diminishing return and it converges approximately on the value 2.71828, which is our value $e$. You will find this $e$ used not just in studying populations and their growth. It plays a key role in many areas of mathematics.
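You can watch this convergence yourself with a short loop (a sketch, not one of the book’s numbered examples):

```python
# (1 + 1/n)**n creeps toward Euler's number as n grows
for n in [100, 10_000, 1_000_000]:
    print(n, (1 + 1 / n) ** n)
```

Each printed value lands closer to 2.71828 than the last, with diminishing returns as n grows.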
Later in the book, we will use Euler’s number to build normal distributions in Chapter 3 and logistic regressions in Chapter 6.
Natural Logarithms
When we use $e$ as our base for a logarithm, we call it a natural logarithm. Depending on the platform, we may use ln() instead of log() to specify a natural logarithm. So rather than express a natural logarithm as $lo{g}_{e}10$ to find the power raised on $e$ to get 10, we would shorthand it as $ln\left(10\right)$:

$lo{g}_{e}10 = ln\left(10\right)$
However, in Python, a natural logarithm is specified by the log() function. As discussed earlier, the default base for the log() function is $e$. Just leave the second argument for the base empty and it will default to using $e$ as the base, as shown in Example 1-15.
Example 1-15. Calculating the natural logarithm of 10 in Python

```python
from math import log

# e raised to what power gives us 10?
x = log(10)

print(x)  # prints 2.302585092994046
```
We will use $e$ in a number of places throughout this book. Feel free to experiment with exponents and logarithms using Excel, Python, Desmos.com, or any other calculation platform of your choice. Make graphs and get comfortable with what these functions look like.
Limits
As we have seen with Euler’s number, some interesting ideas emerge when we forever increase or decrease an input variable and the output variable keeps approaching a value but never reaching it. Let’s formally explore this idea.
Take this function, which is plotted in Figure 1-5:

$f\left(x\right) = \frac{1}{x}$
We are looking only at positive x values. Notice that as x forever increases, f(x) gets closer to 0. Fascinatingly, f(x) never actually reaches 0. It just forever keeps getting closer.
Therefore the fate of this function is, as $x$ forever extends into infinity, it will keep getting closer to 0 but never reach 0. The way we express a value that is forever being approached, but never reached, is through a limit:

$\lim_{x \to \infty} \frac{1}{x} = 0$
The way we read this is “as x approaches infinity, the function 1/x approaches 0 (but never reaches 0).” You will see this kind of “approach but never touch” behavior a lot, especially when we dive into derivatives and integrals.
Using SymPy, we can calculate what value we approach for $f\left(x\right)=\frac{1}{x}$ as x approaches infinity $\infty $ (Example 1-16). Note that $\infty $ is cleverly expressed in SymPy with oo.

Example 1-16. Using SymPy to calculate limits

```python
from sympy import *

x = symbols('x')
f = 1 / x
result = limit(f, x, oo)

print(result)  # 0
```
As you have seen, we discovered Euler’s number $e$ this way too. It is the result of forever extending n into infinity for this function:

$\lim_{n \to \infty} {\left(1 + \frac{1}{n}\right)}^{n} = e = 2.71828...$
Funnily enough, when we calculate Euler’s number with limits in SymPy (shown in the following code), SymPy immediately recognizes it as Euler’s number. We can call evalf() so we can actually display it as a number:

```python
from sympy import *

n = symbols('n')
f = (1 + (1/n))**n
result = limit(f, n, oo)

print(result)  # E
print(result.evalf())  # 2.71828182845905
```
Derivatives
Let’s go back to talking about functions and look at them from a calculus perspective, starting with derivatives. A derivative tells the slope of a function, and it is useful to measure the rate of change at any point in a function.
Why do we care about derivatives? They are often used in machine learning and other mathematical algorithms, especially with gradient descent. When the slope is 0, that means we are at the minimum or maximum of an output variable. This concept will be useful later when we do linear regression (Chapter 5), logistic regression (Chapter 6), and neural networks (Chapter 7).
Let’s start with a simple example and take a look at the function $f\left(x\right)={x}^{2}$ in Figure 1-6. How “steep” is the curve at x = 2?
Notice that we can measure “steepness” at any point in the curve, and we can visualize this with a tangent line. Think of a tangent line as a straight line that “just touches” the curve at a given point. It also provides the slope at a given point. You can crudely estimate a tangent line at a given x-value by creating a line intersecting that x-value and a really close neighboring x-value on the function.
Take x = 2 and a nearby value x = 2.1, which when passed to the function $f\left(x\right)={x}^{2}$ will yield f(2) = 4 and f(2.1) = 4.41, as shown in Figure 1-7. The resulting line that passes through these two points has a slope of 4.1.
You can quickly calculate the slope $m$ between two points using the simple rise-over-run formula:

$m = \frac{{y}_{2} - {y}_{1}}{{x}_{2} - {x}_{1}}$

$m = \frac{4.41 - 4.0}{2.1 - 2.0} = 4.1$
If I made the x step between the two points even smaller, like x = 2 and x = 2.00001, which would result in f(2) = 4 and f(2.00001) = 4.00004, that would get really close to the actual slope of 4. So the smaller the step is to the neighboring value, the closer we get to the slope value at a given point in the curve. Like so many important concepts in math, we find something meaningful as we approach infinitely large or infinitely small values.
Example 1-17 shows a derivative calculator implemented in Python.

Example 1-17. A derivative calculator in Python

```python
def derivative_x(f, x, step_size):
    m = (f(x + step_size) - f(x)) / ((x + step_size) - x)
    return m

def my_function(x):
    return x**2

slope_at_2 = derivative_x(my_function, 2, .00001)

print(slope_at_2)  # prints 4.000010000000827
```
Now the good news is there is a cleaner way to calculate the slope anywhere on a function. We have already been using SymPy to plot graphs, but I will show you how it can also do tasks like derivatives using the magic of symbolic computation.
When you encounter a function with an exponent like $f\left(x\right)={x}^{2}$, the derivative will make the exponent a multiplier and then decrement the exponent by 1, leaving us with the derivative $\frac{d}{dx}{x}^{2}=2x$. The $\frac{d}{dx}$ indicates a derivative with respect to x, which says we are building a derivative targeting the x-value to get its slope. So if we want to find the slope at x = 2, and we have the derivative function, we just plug in that x-value to get the slope:

$\frac{d}{dx}f\left(2\right) = 2\left(2\right) = 4$
If you intend to learn these rules to hand-calculate derivatives, there are plenty of calculus books for that. But there are some nice tools to calculate derivatives symbolically for you. The Python library SymPy is free and open source, and it nicely adapts to standard Python syntax. Example 1-18 shows how to calculate the derivative for $f\left(x\right)={x}^{2}$ with SymPy.

Example 1-18. Calculating a derivative in SymPy
from sympy import *

# Declare 'x' to SymPy
x = symbols('x')

# Now just use Python syntax to declare function
f = x**2

# Calculate the derivative of the function
dx_f = diff(f)

print(dx_f)
# prints 2*x
Wow! So by declaring variables using the symbols() function in SymPy, I can then proceed to use normal Python syntax to declare my function. After that I can use diff() to calculate the derivative function. In Example 1-19 we can then take our derivative function back to plain Python and simply declare it as another function.

Example 1-19. A derivative calculator in Python
def f(x):
    return x**2

def dx_f(x):
    return 2*x

slope_at_2 = dx_f(2.0)

print(slope_at_2)
# prints 4.0
If you want to keep using SymPy, you can call the subs() function to swap the x variable with the value 2, as shown in Example 1-20.

Example 1-20. Using the substitution feature in SymPy
# Calculate the slope at x = 2
print(dx_f.subs(x, 2))
# prints 4
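If you plan to evaluate a symbolic derivative at many points, SymPy’s lambdify() can convert the expression into a fast plain-Python function so you do not need to call subs() each time (this example is mine, not from the chapter):

```python
from sympy import symbols, diff, lambdify

x = symbols('x')
f = x ** 2
dx_f = diff(f)        # symbolic derivative: 2*x

# Convert the symbolic expression into a callable Python function
slope = lambdify(x, dx_f)

print(slope(2))   # 4
print(slope(5))   # 10
```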
Partial Derivatives
Another concept we will encounter in this book is partial derivatives, which we will use in Chapters 5, 6, and 7. These are derivatives of functions that have multiple input variables.
Think of it this way: rather than finding the slope on a one-dimensional function, we have slopes with respect to multiple variables in several directions. For each given variable derivative, we assume the other variables are held constant. Take a look at the 3D graph of $f(x,y)=2{x}^{3}+3{y}^{3}$ in Figure 1-8, and you will see we have slopes in two directions for two variables.
Let’s take the function $f(x,y)=2{x}^{3}+3{y}^{3}$. The x and y variables each get their own derivatives, $\frac{d}{dx}$ and $\frac{d}{dy}$. These represent the slope values with respect to each variable on a multidimensional surface. We technically call these “slopes” gradients when dealing with multiple dimensions. Here are the derivatives for x and y:

$\frac{d}{dx}\left(2{x}^{3}+3{y}^{3}\right) = 6{x}^{2}$

$\frac{d}{dy}\left(2{x}^{3}+3{y}^{3}\right) = 9{y}^{2}$

Example 1-21 and Figure 1-8 show how we calculate the partial derivatives for x and y, respectively, with SymPy.

Example 1-21. Calculating partial derivatives with SymPy
from sympy import *
from sympy.plotting import plot3d

# Declare x and y to SymPy
x, y = symbols('x y')

# Now just use Python syntax to declare function
f = 2*x**3 + 3*y**3

# Calculate the partial derivatives for x and y
dx_f = diff(f, x)
dy_f = diff(f, y)

print(dx_f)  # prints 6*x**2
print(dy_f)  # prints 9*y**2

# plot the function
plot3d(f)
So for the (x,y) values (1,2), the slope with respect to x is $6{\left(1\right)}^{2}=6$ and the slope with respect to y is $9{\left(2\right)}^{2}=36$.
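Continuing the preceding SymPy example, you can confirm those slope values with subs(); a minimal sketch (the point (1, 2) comes from the text above):

```python
from sympy import symbols, diff

x, y = symbols('x y')
f = 2 * x**3 + 3 * y**3

dx_f = diff(f, x)   # 6*x**2
dy_f = diff(f, y)   # 9*y**2

# Evaluate both partial derivatives at (x, y) = (1, 2)
slope_x = dx_f.subs([(x, 1), (y, 2)])
slope_y = dy_f.subs([(x, 1), (y, 2)])

print(slope_x)  # 6
print(slope_y)  # 36
```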
The Chain Rule
In Chapter 7 when we build a neural network, we are going to need a special math trick called the chain rule. When we compose the neural network layers, we will have to untangle the derivatives from each layer. But for now let’s learn the chain rule with a simple algebraic example. Let’s say you are given two functions:

$y = {x}^{2} + 1$

$z = {y}^{3} - 2$
Notice that these two functions are linked, because the y is the output variable in the first function but is the input variable in the second. This means we can substitute the first function y into the second function z like this:

$z = {({x}^{2}+1)}^{3} - 2$
So what is the derivative for z with respect to x? We already have the substitution expressing z in terms of x. Let’s use SymPy to calculate that in Example 1-24.

Example 1-24. Finding the derivative of z with respect to x
from sympy import *

x = symbols('x')
z = (x**2 + 1)**3 - 2
dz_dx = diff(z, x)

print(dz_dx)
# 6*x*(x**2 + 1)**2
So our derivative for z with respect to x is $6x{({x}^{2}+1)}^{2}$.
But look at this. Let’s start over and take a different approach. If we take the derivatives of the y and z functions separately, and then multiply them together, this also produces the derivative of z with respect to x! Let’s try it:

$\frac{dy}{dx} = 2x$

$\frac{dz}{dy} = 3{y}^{2}$

$\frac{dz}{dx} = (3{y}^{2})(2x) = 6x{y}^{2}$
All right, $6x{y}^{2}$ may not look like $6x{({x}^{2}+1)}^{2}$, but that’s only because we have not substituted the y function yet. Do that so the entire $\frac{dz}{dx}$ derivative is expressed in terms of x without y:

$\frac{dz}{dx} = 6x{({x}^{2}+1)}^{2}$
Now we see we got the same derivative function $6x{({x}^{2}+1)}^{2}$!
This is the chain rule, which says that for a given function y (with input variable x) composed into another function z (with input variable y), we can find the derivative of z with respect to x by multiplying the two respective derivatives together:

$\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx}$
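As a quick sanity check, you can also verify the chain rule numerically using the crude rise-over-run estimate from earlier in this chapter. This sketch is mine (the test point x = 2 is arbitrary); it compares a finite-difference slope of the composed function against the chain rule result $6x{({x}^{2}+1)}^{2}$:

```python
def z_of_x(x):
    # the composed function z = (x**2 + 1)**3 - 2
    return (x ** 2 + 1) ** 3 - 2

def chain_rule_slope(x):
    # dz/dx = 6x(x**2 + 1)**2 from the chain rule
    return 6 * x * (x ** 2 + 1) ** 2

x = 2.0
step = 1e-6

# crude finite-difference estimate of the slope at x
numeric = (z_of_x(x + step) - z_of_x(x)) / step

print(numeric)              # close to 300
print(chain_rule_slope(x))  # exactly 300.0
```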
Example 1-25 shows the SymPy code that makes this comparison, showing that the derivative from the chain rule is equal to the derivative of the substituted function.

Example 1-25. Calculating the derivative dz/dx with and without the chain rule, but still getting the same answer
from sympy import *

x, y = symbols('x y')

# derivative for first function
# need to underscore y to prevent variable clash
_y = x**2 + 1
dy_dx = diff(_y)

# derivative for second function
z = y**3 - 2
dz_dy = diff(z)

# Calculate derivative with and without
# chain rule, substitute y function
dz_dx_chain = (dy_dx * dz_dy).subs(y, _y)
dz_dx_no_chain = diff(z.subs(y, _y))

# Prove chain rule by showing both are equal
print(dz_dx_chain)     # 6*x*(x**2 + 1)**2
print(dz_dx_no_chain)  # 6*x*(x**2 + 1)**2
The chain rule is a key part of training a neural network with the proper weights and biases. Rather than untangle the derivative of each node in a nested onion fashion, we can multiply the derivatives across each node instead, which is mathematically a lot easier.
Integrals
The opposite of a derivative is an integral, which finds the area under the curve for a given range. In Chapters 2 and 3, we will be finding the areas under probability distributions. Although we will not use integrals directly, and instead will use cumulative distribution functions that are already integrated, it is good to be aware of how integrals find areas under curves. Appendix A contains examples of using this approach on probability distributions.
I want to take an intuitive approach for learning integrals called Riemann sums, one that flexibly adapts to any continuous function. First, let’s point out that finding the area for a range under a straight line is easy. Let’s say I have a function $f\left(x\right)=2x$ and I want to find the area under the line between 0 and 1, as shaded in Figure 1-9.
Notice that I am finding the area bounded between the line and the x-axis, and in the x range 0.0 to 1.0. If you recall basic geometry formulas, the area A for a triangle is $A=\frac{1}{2}bh$ where b is the length of the base and h is the height. We can visually spot that $b=1$ and $h=2$. So plugging into the formula, we get our area of 1.0 as calculated here:

$A = \frac{1}{2}bh = \frac{1}{2}\left(1\right)\left(2\right) = 1.0$
That was not bad, right? But let’s look at a function that is difficult to find the area under: $f\left(x\right)={x}^{2}+1$. What is the area between 0 and 1 as shaded in Figure 1-10?
Again we are interested in the area below the curve and above the x-axis, only within the x range between 0 and 1. The curviness here does not give us a clean geometric formula to find the area, but here is a clever little hack you can do.
What if we packed five rectangles of equal width under the curve as shown in Figure 1-11, where the height of each one extends from the x-axis to where the midpoint touches the curve?
The area of a rectangle is $A=\text{length}\times \text{width}$, so we could easily sum the areas of the rectangles. Would that give us a good approximation of the area under the curve? What if we packed 100 rectangles? 1,000? 100,000? As we increase the number of rectangles while decreasing their width, would we not get closer to the area under the curve? Yes we would, and it is yet another case where we increase/decrease something toward infinity to approach an actual value.
Let’s try it out in Python. First we need a function that approximates an integral; we will call it approximate_integral(). The arguments a and b specify the min and max of the x range, respectively, n is the number of rectangles to pack, and f is the function we are integrating. We implement the function in Example 1-26, and then use it to integrate our function $f\left(x\right)={x}^{2}+1$ with five rectangles, between 0.0 and 1.0.

Example 1-26. An integral approximation in Python
def approximate_integral(a, b, n, f):
    delta_x = (b - a) / n
    total_sum = 0

    for i in range(1, n + 1):
        midpoint = 0.5 * (2 * a + delta_x * (2 * i - 1))
        total_sum += f(midpoint)

    return total_sum * delta_x

def my_function(x):
    return x**2 + 1

area = approximate_integral(a=0, b=1, n=5, f=my_function)

print(area)
# prints 1.33
So we get an area of 1.33. What happens if we use 1,000 rectangles? Let’s try it out in Example 1-27.

Example 1-27. Another integral approximation in Python
area = approximate_integral(a=0, b=1, n=1000, f=my_function)

print(area)
# prints 1.333333250000001
OK, we are getting more precision here, with more decimal places settling in. What about one million rectangles, as shown in Example 1-28?

Example 1-28. Yet another integral approximation in Python
area = approximate_integral(a=0, b=1, n=1_000_000, f=my_function)

print(area)
# prints 1.3333333333332733
OK, I think we are seeing diminishing returns here, converging on the value $1.\overline{333}$ where the “.333” part repeats forever. This is likely the rational number $4/3 = 1.\overline{333}$. As we increase the number of rectangles, the approximation approaches its limit at smaller and smaller decimal differences.
Now that we have some intuition on what we are trying to achieve and why, let’s do a more exact approach with SymPy, which happens to support rational numbers, in Example 1-29.

Example 1-29. Using SymPy to perform integration
from sympy import *

# Declare 'x' to SymPy
x = symbols('x')

# Now just use Python syntax to declare function
f = x**2 + 1

# Calculate the integral of the function with respect to x
# for the area between x = 0 and 1
area = integrate(f, (x, 0, 1))

print(area)
# prints 4/3
Cool! So the area actually is 4/3, which is what our previous method converged on. Unfortunately, plain Python floats (like the numeric types in many programming languages) give us only decimal approximations, but computer algebra systems like SymPy give us exact rational numbers. We will be using integrals to find areas under curves in Chapters 2 and 3, although we will have scikit-learn do the work for us.
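Incidentally, Python’s standard fractions module can also do exact rational arithmetic. Here is a sketch (mine, not from the chapter) that reruns the five-rectangle midpoint sum with exact fractions, which also explains why the n = 5 run printed a clean 1.33:

```python
from fractions import Fraction

def f(x):
    return x ** 2 + 1

a, b, n = Fraction(0), Fraction(1), 5
delta_x = (b - a) / n

total = Fraction(0)
for i in range(1, n + 1):
    # midpoint of the i-th rectangle, kept as an exact fraction
    midpoint = a + delta_x * Fraction(2 * i - 1, 2)
    total += f(midpoint)

area = total * delta_x
print(area)  # prints 133/100, i.e. exactly 1.33
```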
Conclusion
In this chapter we covered some foundations we will use for the rest of this book. From number theory to logarithms and calculus integrals, we highlighted some important mathematical concepts relevant to data science, machine learning, and analytics. You may have questions about why these concepts are useful. That will come next!
Before we move on to discuss probability, take a little time to skim these concepts one more time and then do the following exercises. You can always revisit this chapter as you progress through this book and refresh as necessary when you start applying these mathematical ideas.
Exercises
1. Is the value 62.6738 rational or irrational? Why or why not?

2. Evaluate the expression: ${10}^{7}{10}^{5}$

3. Evaluate the expression: ${81}^{\frac{1}{2}}$

4. Evaluate the expression: ${25}^{\frac{3}{2}}$

5. Assuming no payments are made, how much would a $1,000 loan be worth at 5% interest compounded monthly after 3 years?

6. Assuming no payments are made, how much would a $1,000 loan be worth at 5% interest compounded continuously after 3 years?

7. For the function $f\left(x\right)=3{x}^{2}+1$, what is the slope at x = 3?

8. For the function $f\left(x\right)=3{x}^{2}+1$, what is the area under the curve for x between 0 and 2?
Answers are in Appendix B.