Becoming a Data Analyst

Book description

Get started on your data science career with in-depth explanations of key concepts, real-world applications, and career advice

Key Features

  • Conquer all essential data analysis techniques using real-world examples, case studies, and hands-on exercises
  • Get to grips with ethical considerations in data analysis while developing problem-solving and critical thinking skills
  • Get career guidance and tips for building a portfolio, regardless of your background or experience

Book Description

Are you interested in becoming a data analyst, but don't know how to begin? Look no further than this comprehensive book that covers everything you need to know to get started with data analysis.

Becoming a Data Analyst will teach you about data collection techniques, data cleaning and pre-processing, exploratory data analysis, statistical analysis, data wrangling and transformation, data modeling, data presentation and communication, and ethical considerations in data analysis.

Each chapter provides clear, step-by-step guidance on how to conduct data analysis using various techniques and tools. You'll also benefit from real-world examples, case studies, hands-on exercises, and visual aids such as diagrams, charts, and graphs to help you understand complex concepts and data.

In addition to practical skills, this book emphasizes problem-solving skills, showing you how to identify and solve real-world problems using data analysis techniques. It also addresses ethical considerations in data analysis, such as data privacy and bias, and provides guidance on how to conduct data analysis in an ethical and responsible manner.

Whether you're just starting out or looking to take your data analysis skills to the next level, Becoming a Data Analyst is the essential guide you need to succeed.

What you will learn

  • Get to grips with data collection techniques, including surveys, interviews, and experiments
  • Start cleaning and pre-processing data by removing duplicates, dealing with missing values, and handling outliers
  • Use visualization and statistical techniques to gain insights from data
  • Understand basic statistical concepts such as probability distributions, hypothesis testing, and regression analysis
  • Transform data into a format that is suitable for analysis
  • Apply machine learning algorithms to build predictive models, evaluate them, and interpret the results
  • Create and communicate effective visualizations and dashboards
  • Ensure that your analysis is conducted in an ethical and responsible manner

Who this book is for

If you’re interested in pursuing a career in data analysis or seeking to enhance your existing data analysis skills, this book is for you. It’s designed for beginners, so no prior experience in data analysis is required. However, basic computer literacy, proficiency in using Microsoft Excel, and some familiarity with mathematics and statistics will help you on your journey. The book is suitable for students, recent graduates, and professionals from various fields, including business, finance, healthcare, and education, who want to learn how to analyze and interpret data to make informed decisions.

Table of contents

  1. Becoming a Data Analyst: A beginner’s guide to kickstarting your data analysis journey
  2. 1 Understanding the Business Context of Data Analysis
    1. Join our book community on Discord
    2. A Data Analyst’s Role in the Data Analytics Lifecycle
      1. Business Understanding
      2. Data Inspection
      3. Data Pre-processing & Preparation
      4. Exploratory Data Analysis
      5. Data Validation
      6. Explanatory Data Analysis
    3. Summary
  3. 2 Introduction to SQL
    1. Join our book community on Discord
    2. SQL and its use cases
      1. Brief History of SQL
      2. SQL and data analysis
    3. Different Databases
      1. Relational vs. Non-Relational Databases
      2. Popular DBMSs
    4. SQL Terminology
      1. Query
      2. Statement
      3. Clause
      4. Keyword
      5. View
    5. Setting up your environment
      1. Choosing a DBMS
      2. Installing necessary software
      3. Creating a sample database
    6. Writing Basic SQL Queries
      1. SELECT Statement
      2. Structure of a Query
      3. INSERT Statements
      4. UPDATE Statements
      5. DELETE Statements
      6. SQL basic rules and syntax
    7. Filtering and organizing data with clauses
      1. WHERE Clause
      2. ORDER BY Clause
      3. DISTINCT Clause
      4. LIMIT Clause
    8. Using operators and functions
      1. Comparison operators
      2. Logical operators (AND, OR)
      3. LIKE operator
      4. Arithmetic operators
      5. Functions for calculations
      6. Functions for text manipulation
      7. Date functions
    9. Summary
  4. 3 Joining Tables in SQL
    1. Join our book community on Discord
    2. Table relations
      1. Implementing Relationships in SQL
      2. SQL Joins
      3. Best Practices for Using JOIN in SQL
    3. Summary
  5. 4 Creating Business Metrics with Aggregations
    1. Join our book community on Discord
    2. Aggregations in Business Metrics
      1. Aggregations in SQL to Analyze Data
      2. GROUP BY Clause
      3. HAVING clause
      4. Best Practices for Aggregations
    3. Summary
  6. 5 Advanced SQL
    1. Join our book community on Discord
    2. Working with subqueries
      1. Types of subqueries
      2. Non-code explanation of subquery
      3. Using a basic subquery on our library database
      4. Subquery vs joining tables
      5. Rules of subquery usage
      6. More advanced subquery usage on our library database
    3. Common Table Expressions
      1. Use cases for CTEs
      2. Examples with the library database
    4. Window functions: A panoramic view of your data
      1. Example with the Library Database
    5. Understanding date time manipulation
      1. Date and time functions
      2. Examples with the library database
    6. Understanding text manipulation
      1. Text functions
      2. Text functions in the library database
    7. Best practices: bringing it all together
      1. Write readable SQL Code
      2. Be careful with NULL values
      3. Use subqueries and CTEs wisely
      4. Think about performance
      5. Test your queries
    8. Summary
  7. 6 SQL for Data Analysis Case Study
    1. Join our book community on Discord
    2. Setting up the database
    3. Performing data analysis with SQL
    4. Exploring the data
      1. General data insights
    5. Analyzing the data
      1. Examining the clothing category
      2. Determining the number of customers
      3. Researching the top payment methods
      4. Gathering customer feedback
      5. Exploring the relationship between ratings and sales
      6. Finding the percentage of products with reviews
      7. Effectiveness of discounts
      8. Identifying the top customers
      9. Top-selling clothing products
      10. Top 5 high-performing products
      11. Most popular product by country
      12. Researching the performance of delivery
      13. Future projections with linear regression
    6. Summary
  8. 7 Fundamental Statistical Concepts
    1. Join our book community on Discord
    2. Descriptive statistics
      1. Levels of measurement
      2. Measures of central tendency
      3. Measures of variability
    3. Inferential statistics
      1. Probability theory
      2. Probability distributions
      3. Correlation vs causation
    4. Summary
  9. 8 Testing Hypotheses
    1. Join our book community on Discord
    2. Technical requirements
    3. Introduction to Hypothesis Testing
      1. Role of Hypothesis Testing in Data Analysis
      2. Null and Alternative Hypothesis
      3. Step by Step Guide to Performing Hypothesis Testing
    4. One Sample t-Test
    5. Conditions for Performing a One-Sample T-Test
      1. Case Study: Average Exam Scores
    6. Two Sample t-Test
      1. Case Study: Comparing Exam Scores Between Two Schools
    7. Chi Square Test
      1. Case Study: Effect of Tutoring on Passing Rates
    8. Analysis of Variance (ANOVA)
      1. Case Study: Comparing Exam Scores Among Three Schools
    9. Summary
  10. 9 Business Statistics Case Study
    1. Join our book community on Discord
    2. Technical requirements
    3. Case Study Overview
      1. Learning Objectives
      2. Questions
      3. Solutions
    4. Additional Topics to Explore
      1. Text Analytics
      2. Big Data
      3. Time Series Analysis
      4. Predictive Analytics
      5. Prescriptive Analytics & Optimization
      6. Database Management
    5. Where to practice
    6. Summary
  11. 10 Data analysis and programming
    1. Join our book community on Discord
    2. The role of programming and our case
    3. Different programming languages
      1. Python
      2. R
      3. SQL
      4. Julia
      5. MATLAB
    4. Working with the Command Line Interface (CLI)
      1. Command Line Interface (CLI) vs Graphical User Interface (GUI)
      2. Accessing the CLI
      3. Typical CLI tasks
      4. Using the CLI for programming
    5. Setting up your system for Python programming
      1. Check if Python is installed
      2. MacOS
      3. Linux
      4. Windows
      5. Browser (cloud-based)
      6. Testing the Python setup
    6. Python use cases for CleanAndGreen
      1. Data Cleaning and Preparation
      2. Data Visualization
      3. Statistical Modeling
      4. Predictive Modeling/Machine Learning
      5. General remarks on Python
    7. Summary
  12. 11 Introduction to Python
    1. Join our book community on Discord
    2. Understanding the Python Syntax
      1. Print Statements
      2. Comments
      3. Variables
      4. Operations on variables
      5. Operators and Expressions
    3. Exploring Data Types in Python
      1. Strings
      2. Integers
      3. Floats
      4. Booleans
      5. Type Conversion
    4. Indexing and Slicing in Python
    5. Unpacking Data Structures
      1. Lists
      2. Dictionaries
      3. Sets
      4. Tuples
    6. Mastering Control Flow Structures
      1. Conditional Statements in Python
      2. Looping in Python
    7. Functions in Python
      1. Creating Your Own Functions
      2. Python Built-In Functions
    8. Summary
  13. 12 Analyzing data with NumPy & Pandas
    1. Join our book community on Discord
    2. Introduction to NumPy
      1. Installing and Importing NumPy
      2. Basic NumPy Operations
    3. Statistical and Mathematical Operations
      1. Mathematical Operations with NumPy Arrays
    4. Multi-dimensional Arrays
      1. Creating Multi-dimensional Arrays
      2. Accessing elements in Multi-dimensional Arrays
      3. Reading Data from a CSV File
    5. Introduction to Pandas
      1. Series and DataFrame
      2. Loading Data with Pandas
      3. Data Analysis with Pandas
      4. Data Analysis
    6. Summary
  14. 13 Introduction to Exploratory Data Analysis
    1. Join our book community on Discord
    2. The Importance of EDA
      1. The EDA Process
      2. Tools and Techniques
    3. Univariate Analysis
      1. Analyzing Continuous Variables
      2. Analyzing Categorical Variables
    4. Bivariate Analysis
      1. Understanding bivariate analysis
      2. Correlation vs Causation
      3. Visualizing relationships between two continuous variables
    5. Multivariate analysis
      1. Heatmaps
      2. Pair plots
    6. Summary
  15. 14 Data Cleaning
    1. Join our book community on Discord
    2. Technical requirements
    3. Importance of data cleaning
      1. Impact on data quality
      2. Relevance to business decisions
    4. Common data cleaning challenges
      1. Inconsistent formats
      2. Misspellings and Inaccuracies
      3. Duplicate records
    5. Dealing with missing values
      1. Causes of missing values
      2. Strategies for handling missing values
      3. Types of missing data
    6. Dealing with duplicate values
      1. Causes of duplicate data
      2. Identification and removal
    7. Dealing with outliers
      1. Types of outliers
      2. Impact on analysis
      3. Techniques for identifying and handling outliers
    8. Cleaning and transforming data
      1. Handling inconsistencies
      2. Converting categorical data
      3. Normalizing numerical features
    9. Data validation
      1. Validation methods
    10. Summary
  16. 17 Exploratory Data Analysis Case Study
    1. Join our book community on Discord
    2. Technical Requirements
    3. E-commerce Sales Optimization Case Study
      1. Time Series Analysis
      2. Customer Segmentation
      3. Product Analysis
      4. Payment and Returns
      5. Case Study Answers
    4. Summary

Product information

  • Title: Becoming a Data Analyst
  • Author(s): Kedeisha Bryan, Maaike van Putten
  • Release date: February 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781805126416