Become a Python Data Analyst

Book description

Enhance your data analysis and predictive modeling skills using popular Python tools

Key Features

  • Cover the fundamental Python libraries for operating on and manipulating data for analysis
  • Work with real-world datasets to perform predictive analytics with Python
  • Learn modern data analysis techniques, with detailed code using scikit-learn and SciPy

Book Description

Python is one of the most popular languages among leading data analysts and statisticians for working with massive datasets and complex data visualizations.

Become a Python Data Analyst introduces Python's most essential tools and libraries for working through the data analysis process, from preparing data to performing simple statistical analyses and creating meaningful data visualizations.

In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to use Jupyter Notebook efficiently to operate on and manipulate data with NumPy and pandas. In the concluding chapters, you will gain experience building simple predictive models and carrying out statistical computation and analysis using Python tools and proven data analysis techniques.

By the end of this book, you will have hands-on experience performing data analysis with Python.

What you will learn

  • Explore important Python libraries and learn to install the Anaconda distribution
  • Understand the basics of NumPy
  • Produce informative and useful visualizations for analyzing data
  • Perform common statistical calculations
  • Build predictive models and understand the principles of predictive analytics

Who this book is for

Become a Python Data Analyst is for entry-level data analysts, data engineers, and BI professionals who want to make full use of Python tools to perform efficient data analysis. Prior knowledge of Python programming is necessary to understand the concepts covered in this book.


Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Become a Python Data Analyst
  3. Packt Upsell
    1. Why subscribe?
    2. Packt.com
  4. Contributor
    1. About the author
    2. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. The Anaconda Distribution and Jupyter Notebook
    1. The Anaconda distribution
      1. Installing Anaconda
    2. Jupyter Notebook
      1. Creating your own Jupyter Notebook
      2. Notebook user interfaces
    3. Using the Jupyter Notebook
      1. Running code in a code cell
      2. Running markdown syntax in a text cell
        1. Styles and formats
        2. Lists
      3. Useful keyboard shortcuts
    4. Summary
  7. Vectorizing Operations with NumPy
    1. Introduction to NumPy
      1. Problems and solutions
    2. NumPy arrays
      1. Creating arrays in NumPy
        1. Creating arrays from lists
        2. Creating arrays from built-in NumPy functions
      2. Attributes of arrays
      3. Basic math with arrays
      4. Common manipulations with arrays
        1. Indexing arrays
        2. Slicing arrays
        3. Reshaping arrays
    3. Using NumPy for simulations
      1. Coin flips
      2. Simulating stock returns
    4. Summary
  8. Pandas - Everyone's Favorite Data Analysis Library
    1. Introduction to the pandas library
      1. Important objects in pandas
        1. Series
          1. Creating a pandas series
        2. DataFrames
          1. Creating a pandas DataFrame
          2. Anatomy of a DataFrame
    2. Operations and manipulations of pandas
      1. Inspection of data
      2. Selection, addition, and deletion of data
      3. Slicing DataFrames
      4. Selection by labels
    3. Answering simple questions about a dataset
      1. Total employees by department in the dataset
      2. Overall attrition rate
      3. Average hourly rate
      4. Average number of years
      5. Employees with the most number of years
      6. Overall employee satisfaction
    4. Answering further questions
      1. Employees with Low JobSatisfaction
      2. Employees with both Low JobSatisfaction and JobInvolvement
      3. Employee comparison
    5. Summary
  9. Visualization and Exploratory Data Analysis
    1. Introducing Matplotlib
      1. Terminologies in Matplotlib
    2. Introduction to pyplot
    3. Object-oriented interface
    4. Common customizations
      1. Colors
        1. Color names
      2. Setting axis limits
      3. Setting ticks and tick labels
      4. Legend
      5. Annotations
      6. Producing grids, horizontal, and vertical lines
    5. EDA with seaborn and pandas
      1. Understanding the seaborn library
      2. Performing exploratory data analysis
      3. Key objectives when performing data analysis
      4. Types of variable
    6. Analyzing variables individually
      1. Understanding the main variable
      2. Numerical variables
      3. Categorical variables
    7. Relationships between variables
      1. Scatter plot
      2. Box plot
      3. Complex conditional plots
    8. Summary
  10. Statistical Computing with Python
    1. Introduction to SciPy
      1. Statistics subpackage
        1. Confidence intervals
          1. Probability calculations
    2. Hypothesis testing
      1. Performing statistical tests
    3. Summary
  11. Introduction to Predictive Analytics Models
    1. Predictive analytics and machine learning
    2. Understanding the scikit-learn library
      1. scikit-learn
    3. Building a regression model using scikit-learn
    4. Regression model to predict house prices
    5. Summary
  12. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Become a Python Data Analyst
  • Author(s): Alvaro Fuentes
  • Release date: August 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789531701