Data Science 101: Methodology, Python, and Essential Math

Video description

The opening part of Data Science 101 examines some frequently asked questions.

Following that, we will explore data science methodology with a case study. You will see the typical data science steps and techniques utilized by data professionals. Next, you will build a simple chatbot so you can get a clear sense of what is involved.

The next part is an introduction to data science in Python. You will have an opportunity to master Python for data science, as each section is followed by an assignment to practice your skills. By the end of the section, you will understand Python fundamentals, decision and looping structures, Python functions, how to work with nested data, and list comprehensions. Finally, we will wrap up with the two most popular libraries for data science: NumPy and Pandas.
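As a rough illustration of the list comprehension topic mentioned above, the sketch below builds the same list twice, once with a loop and once with a comprehension. The variable names and sample values are made up for this example and are not taken from the course materials.

```python
# Hypothetical sample data, chosen only for illustration.
values = [1, 2, 3, 4, 5, 6]

# Loop version: collect the squares of the even numbers.
squares_loop = []
for v in values:
    if v % 2 == 0:
        squares_loop.append(v ** 2)

# List comprehension version: same filter and transformation in one expression.
squares_comp = [v ** 2 for v in values if v % 2 == 0]

print(squares_loop == squares_comp)
```

The comprehension reads as "the square of each value, for each value that is even", which is why it is often preferred for short filter-and-transform steps.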

The last part delves into essential math for data science. You will get the hang of linear algebra along with probability and statistics. Our goal for the linear algebra part is to introduce all the concepts and intuition needed for an in-depth understanding of least squares, a widely used technique for data fitting. We will spend a lot of time on probability, both classical and Bayesian, because reasoning about a problem is much harder than simply computing statistics.
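To give a sense of what least squares through the pseudoinverse looks like in practice, here is a minimal sketch using NumPy. It is not the course's own code: the function name `fit_line` and the sample points are assumptions made for this example, and the fit is restricted to a straight line.

```python
import numpy as np


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points.

    Hypothetical helper for illustration: builds the design matrix A with a
    column of ones and a column of x values, then solves A @ coeffs ~ y
    using the Moore-Penrose pseudoinverse.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])
    coeffs = np.linalg.pinv(A) @ y
    return coeffs[0], coeffs[1]


# Made-up sample points that lie exactly on y = 1 + 2x.
x_sample = [0.0, 1.0, 2.0, 3.0]
y_sample = [1.0, 3.0, 5.0, 7.0]

intercept, slope = fit_line(x_sample, y_sample)
print(intercept, slope)
```

When the points do not lie on a single line, the same call returns the coefficients that minimize the sum of squared residuals; `np.linalg.lstsq` computes the same solution and is usually the more common choice for larger problems.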

By the end of this course, you will understand data science methodology and how to use essential math in your real projects.

What You Will Learn

  • Examine frequent questions asked by passionate learners
  • Explore data science methodology with a healthcare insurance case study
  • Solve a system of linear equations
  • Define the idea of a vector space
  • Recognize the proper probability model for your use case
  • Compute a least-squares solution through pseudoinverse

Audience

This course is designed for people who are new to data science or who are interested in pursuing a career in data science, as well as those who wish to obtain a broad overview before diving into specialized data science topics.

This course will also benefit students who want to master the fundamental math for data science or obtain an introduction to data science in Python.

You need not have any prior experience in data science to take up this course.

About The Author

Ermin Dedic: Ermin Dedic began his academic career with six years of psychology studies. He received his bachelor’s degree from the University of Ottawa, Canada, and his master’s degree from the University of Calgary, Canada. Ermin also spent two years in a master’s program (school/child psychology) at the University of Calgary before voluntarily withdrawing, in part to focus more on his teaching. It was through academia that he was introduced to and fell in love with statistics and statistical programming with SAS.

He is passionate about making education accessible and fun for students. Ermin believes that students learn better when they feel the passion that the instructor has for the content.

Table of contents

  1. Chapter 1 : Introduction to Data Science 101
    1. Matching Activity - Match the Project to the Data Role
    2. Introduction to Data Science
    3. What a Data Scientist Does
    4. Big Data
    5. Data Mining
    6. Machine Learning Versus Deep Learning
    7. Advice to Data Scientists
  2. Chapter 2 : Best Language for Data Science
    1. What IS the Best Language for Data Science?
    2. Python
    3. SAS (Statistical Analysis System)
    4. R
    5. SQL
  3. Chapter 3 : Data Science Methodology
    1. Data Science Methodology/Process Introduction
    2. Business Understanding
    3. Data Understanding
    4. Data Prep
    5. Modelling
    6. Evaluation
    7. Deployment
  4. Chapter 4 : Data Science Through Chatbot
    1. Purpose of Chatbot Section
    2. What is a Chatbot?
    3. Signing Up for Watson Assistant
    4. Creating a Name - Healthcare Service Chatbot
    5. Intents
    6. Entities
    7. Suggestions for More Learning
    8. Section Recap: Natural Language Processing, Machine Learning, and Use Cases
  5. Chapter 5 : Libraries, APIs, Datasets
    1. Libraries
    2. APIs
    3. Datasets
  6. Chapter 6 : GitHub
    1. Introduction to GitHub
    2. Create a Repository
    3. Create a Branch and Commit Changes
    4. Pull Request and Merging Pull Request
  7. Chapter 7 : Installation / Jupyter / Comments (Windows and MacOS/Jupyter Notebook)
    1. Windows - Download Anaconda Distribution (Includes Python!)
    2. Windows - Install Anaconda Distribution
    3. Windows - Setting Up Environment
    4. Windows - Opening Jupyter Notebook
    5. MacOS - Anaconda Download and Install
    6. MacOS - Conda Environment
    7. MacOS - Jupyter Notebook
    8. Jupyter Notebook Interface and Shortcuts
  8. Chapter 8 : Introduction to Data Science in Python - Python Fundamentals
    1. How to Use Markdown Cells (Adding Headers, Links, and Images)
    2. Comments - Inline and Block Comments
    3. Python Indentation
    4. Writing Single and Multiple Lines of Code
    5. Understanding Variables
    6. Main Data Types and Creating Them (Integer, Float, String, List, Dictionary)
    7. Lists - How to Use
    8. Dictionaries - How to Use
    9. Creating a Tuple
    10. Tuple - How to Use
    11. Creating a Set
    12. Set - How to Use
    13. Operators
  9. Chapter 9 : Introduction to Data Science in Python - Decision and Looping Structures
    1. Introducing Decision and Looping Structures
    2. If Statement
    3. Else Statement
    4. Elif
    5. For Loop
    6. While Loop
    7. Break and Continue Statements
  10. Chapter 10 : Introduction to Data Science in Python - Python Functions
    1. Introducing Functions
    2. Functions - General Syntax
    3. +1 Function
    4. Fav Band Function
    5. Celsius to Fahrenheit Function
    6. Optional Return Statement (and Comparing It to Print Statement)
    7. Defining a Function Versus Calling a Function
    8. Practical/Real World Example: Function to Get Reddit Data
    9. Lambda Introduction (Anonymous Functions)
    10. Formal Function Versus Lambda for Splitting Strings
  11. Chapter 11 : Introduction to Data Science - Nested Data, Iteration, and List Comprehension
    1. Introducing You to Nested Data and Iteration
    2. Simple Nested Example
    3. Double Indexing
    4. Assigning Values
    5. List of Dicts and Dicts of Dicts Example
    6. Nested Iteration - Iterating Through List of Lists
    7. Defining List Comprehension and Syntax
    8. List Comprehension - Simple Examples
    9. List Comp as an Alternative to Loops
    10. Practical/Real World Example - Using Common Mathematical Notation
    11. Practical/Real World Example - Creating a Constrained ID
    12. Activity: Building Intuition (Loops, Nested Data, Iteration, and List Comp)
  12. Chapter 12 : Introduction to Data Science in Python - Learn NumPy
    1. Introducing NumPy
    2. Creating Our First NumPy Array
    3. Shaping an Array (When You Know the Shape You Want)
    4. Creating a Sequence of Integers and Floats
    5. Element-Wise Operations
    6. A Range with a Shape (Arange Function with Reshape Function)
    7. NumPy Indexing
    8. NumPy Slicing
    9. Indexing and Slicing with Breast Cancer Wisconsin Dataset
    10. Delete Elements
    11. Append
    12. Insert Elements
    13. Reshape -1 Feature
    14. Flatten
    15. Transpose
    16. Concatenate
    17. Splitting
    18. Aggregate/Statistical Functions
  13. Chapter 13 : Introduction to Data Science in Python - Pandas
    1. Introducing Pandas
    2. For SAS Programmers: Analogous Terms in Pandas (Python)
    3. Using Series as Input into DataFrame
    4. Comparing Series and DataFrame
    5. Importing TSLA Dataset
    6. Index-Based Selection (iloc)
    7. Label-Based Selection (loc)
    8. Conditional Selection
    9. Summary Functions
    10. Grouping (groupby)
    11. Sorting
    12. Checking Data Types and Converting
    13. Dealing with Missing Values
    14. Dropping Columns/Variables and Records/Rows
    15. Renaming Columns/Variables and Records/Rows
    16. Concat Function + Pop Quiz
    17. Real-World Activity: Add New Columns and Predict Stock Movement
  14. Chapter 14 : Introduction to Data Science in Python - Python Activity Solutions
    1. Solution - Fill in Activity - Fundamentals
    2. Solution - Fill in Activity - Looping and Functions
    3. Solution - Fill in Activity - Nested and List Comprehension
    4. Solution - Fill in Activity - NumPy
  15. Chapter 15 : Essential Math for Data Science - Linear Algebra Made Easy
    1. Linear Equation Definition
    2. Forms of a Linear Equation
    3. Systems of Linear Equations
    4. Line and Plane
    5. Aij Notation
    6. System of Equations as a Matrix
    7. System in Corresponding Forms
    8. Row Echelon Form (Gaussian Elimination)
    9. Reduced Row Echelon Form
    10. Row Operations Rules
    11. Row Operations Example (REF)
    12. Visualizing Ax=b
    13. General Formula - Matrix Vector Multiplication
    14. Tips for Row Operations
  16. Chapter 16 : Essential Math for Data Science - Mathematical Structures
    1. Mathematical Structures
    2. Abelian Groups and Fields
    3. Vector Spaces 1
    4. Vector Spaces - Concrete Example
    5. Subspaces
    6. Linear Combinations and Span
    7. Is It in the Span?
    8. Linear Independence
    9. A Basis for a Vector Space
    10. Dim of C(A) and N(A)
    11. The Dimension of a Vector Space
    12. Linear Maps
    13. The Four Fundamental Subspaces
    14. Adding Geometry to Vector Spaces
    15. Orthogonal Projection - How to Derive Projection and Check for Orthogonality
    16. Least Squares
    17. Least Squares Through Pseudoinverse - with Python and SAS code
  17. Chapter 17 : Essential Math for Data Science - Introduction to Probability
    1. Probability Models and Axioms
    2. Simple Counting
    3. Discrete Example
    4. Conditional Bayes
    5. Conditional Example 1
    6. Conditional Healthcare (Cancer) Example 2
    7. Independence of Events (What It Means and Does Not Mean)
    8. Permutations and Combinations
  18. Chapter 18 : Essential Math for Data Science - Random Variables and Multiple Variables
    1. Random Variables
    2. Probability Mass Function and Discrete R.V.s
    3. Expectation and Variance for Discrete Random Variables
    4. Joint PMFs (Multiple Discrete Variables)
    5. Continuous Random Variables
    6. Continuous Random Variables and Probability Density Function
    7. Continuous R.V. Example
    8. Joint PDF Example - Banking
    9. Cumulative Distribution Function (CDF)
    10. Covariance, Correlation, and More on Variance
    11. Law of Large Numbers (LLN)
    12. Central Limit Theorem (CLT)
  19. Chapter 19 : Essential Math for Data Science - Statistical Inference
    1. Statistical Inference
    2. Bayesian Estimator
    3. Example - Bayesian Estimator
    4. Mean Squared Error = Variance. Why?

Product information

  • Title: Data Science 101: Methodology, Python, and Essential Math
  • Author(s): Ermin Dedic
  • Release date: April 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803242125