Data Analysis with Python

Book description

Learn a modern approach to data analysis using Python to harness the power of programming and AI across your data. Detailed case studies bring this modern approach to life across visual data, social media, graph algorithms, and time series analysis.

Key Features

  • Bridge your data analysis with the power of programming, complex algorithms, and AI
  • Use Python and its extensive libraries to power your way to new levels of data insight
  • Work with AI algorithms, TensorFlow, graph algorithms, NLP, and financial time series
  • Explore this modern approach across key industry case studies and hands-on projects

Book Description

Data Analysis with Python offers a modern approach to data analysis so that you can work with the latest and most powerful Python tools, AI techniques, and open source libraries. Industry expert David Taieb shows you how to bridge data science with the power of programming and algorithms in Python. You'll be working with complex algorithms and cutting-edge AI in your data analysis. Learn how to analyze data with hands-on examples using Python-based tools and Jupyter Notebook. You'll find the right balance of theory and practice, with extensive code files that you can integrate right into your own data projects.

Explore the power of this approach to data analysis by applying it across key industry case studies. Four fascinating, full projects connect you to the most critical data analysis challenges you're likely to meet today. The first is an image recognition application with TensorFlow, embracing the importance of AI in data analysis today. The second industry project analyzes social media trends, exploring big data issues and AI approaches to natural language processing. The third case study is a financial portfolio analysis application that engages you with time series analysis, which is pivotal to many data science applications today. The fourth industry use case takes you into graph algorithms and the power of programming in modern data science. You'll wrap up with a thoughtful look at the future of data science and how it will harness the power of algorithms and artificial intelligence.

What you will learn

  • A new toolset that has been carefully crafted to meet your data analysis challenges
  • Full and detailed case studies of the toolset across several of today's key industry contexts
  • Become super productive with a new toolset across Python and Jupyter Notebook
  • Look into the future of data science and which directions to develop your skills next

Who this book is for

This book is for developers wanting to bridge the gap between them and data scientists. Introducing PixieDust from its creator, the book is a great desk companion for the accomplished Data Scientist. Some fluency in data interpretation and visualization is assumed. It will be helpful to have some knowledge of Python, using Python libraries, and some proficiency in web development.

Table of contents

  1. Data Analysis with Python
    1. Table of Contents
    2. Data Analysis with Python
      1. Why subscribe?
      2. PacktPub.com
    3. Contributors
      1. About the author
      2. About the reviewers
      3. Packt is searching for authors like you
    4. Preface
      1. Why am I writing this book?
      2. Who this book is for
      3. What this book covers
      4. To get the most out of this book
        1. Download the example code files
        2. Download the color images
        3. Conventions used
      5. Get in touch
        1. Reviews
    5. 1. Programming and Data Science – A New Toolset
      1. What is data science
      2. Is data science here to stay?
      3. Why is data science on the rise?
      4. What does that have to do with developers?
      5. Putting these concepts into practice
      6. Deep diving into a concrete example
      7. Data pipeline blueprint
      8. What kind of skills are required to become a data scientist?
      9. IBM Watson DeepQA
      10. Back to our sentiment analysis of Twitter hashtags project
      11. Lessons learned from building our first enterprise-ready data pipeline
      12. Data science strategy
      13. Jupyter Notebooks at the center of our strategy
        1. Why are Notebooks so popular?
      14. Summary
    6. 2. Python and Jupyter Notebooks to Power your Data Analysis
      1. Why choose Python?
      2. Introducing PixieDust
      3. SampleData – a simple API for loading data
      4. Wrangling data with pixiedust_rosie
      5. Display – a simple interactive API for data visualization
      6. Filtering
      7. Bridging the gap between developers and data scientists with PixieApps
      8. Architecture for operationalizing data science analytics
      9. Summary
    7. 3. Accelerate your Data Analysis with Python Libraries
      1. Anatomy of a PixieApp
        1. Routes
        2. Generating requests to routes
        3. A GitHub project tracking sample application
        4. Displaying the search results in a table
        5. Invoking the PixieDust display() API using pd_entity attribute
        6. Invoking arbitrary Python code with pd_script
        7. Making the application more responsive with pd_refresh
        8. Creating reusable widgets
      2. Summary
    8. 4. Publish your Data Analysis to the Web - the PixieApp Tool
      1. Overview of Kubernetes
      2. Installing and configuring the PixieGateway server
        1. PixieGateway server configuration
        2. PixieGateway architecture
        3. Publishing an application
        4. Encoding state in the PixieApp URL
        5. Sharing charts by publishing them as web pages
        6. PixieGateway admin console
        7. Python Console
        8. Displaying warmup and run code for a PixieApp
      3. Summary
    9. 5. Python and PixieDust Best Practices and Advanced Concepts
      1. Use @captureOutput decorator to integrate the output of third-party Python libraries
        1. Create a word cloud image with @captureOutput
      2. Increase modularity and code reuse
        1. Creating a widget with pd_widget
        2. PixieDust support of streaming data
          1. Adding streaming capabilities to your PixieApp
        3. Adding dashboard drill-downs with PixieApp events
        4. Extending PixieDust visualizations
        5. Debugging
          1. Debugging on the Jupyter Notebook using pdb
          2. Visual debugging with PixieDebugger
          3. Debugging PixieApp routes with PixieDebugger
          4. Troubleshooting issues using PixieDust logging
          5. Client-side debugging
      3. Run Node.js inside a Python Notebook
      4. Summary
    10. 6. Analytics Study: AI and Image Recognition with TensorFlow
      1. What is machine learning?
      2. What is deep learning?
      3. Getting started with TensorFlow
        1. Simple classification with DNNClassifier
      4. Image recognition sample application
        1. Part 1 – Load the pretrained MobileNet model
        2. Part 2 – Create a PixieApp for our image recognition sample application
        3. Part 3 – Integrate the TensorBoard graph visualization
        4. Part 4 – Retrain the model with custom training data
      5. Summary
    11. 7. Analytics Study: NLP and Big Data with Twitter Sentiment Analysis
      1. Getting started with Apache Spark
        1. Apache Spark architecture
        2. Configuring Notebooks to work with Spark
      2. Twitter sentiment analysis application
      3. Part 1 – Acquiring the data with Spark Structured Streaming
        1. Architecture diagram for the data pipeline
        2. Authentication with Twitter
        3. Creating the Twitter stream
        4. Creating a Spark Streaming DataFrame
        5. Creating and running a structured query
        6. Monitoring active streaming queries
        7. Creating a batch DataFrame from the Parquet files
      4. Part 2 – Enriching the data with sentiment and most relevant extracted entity
        1. Getting started with the IBM Watson Natural Language Understanding service
      5. Part 3 – Creating a real-time dashboard PixieApp
        1. Refactoring the analytics into their own methods
        2. Creating the PixieApp
      6. Part 4 – Adding scalability with Apache Kafka and IBM Streams Designer
        1. Streaming the raw tweets to Kafka
        2. Enriching the tweets data with the Streaming Analytics service
        3. Creating a Spark Streaming DataFrame with a Kafka input source
      7. Summary
    12. 8. Analytics Study: Prediction - Financial Time Series Analysis and Forecasting
      1. Getting started with NumPy
        1. Creating a NumPy array
        2. Operations on ndarray
        3. Selections on NumPy arrays
        4. Broadcasting
      2. Statistical exploration of time series
        1. Hypothetical investment
        2. Autocorrelation function (ACF) and partial autocorrelation function (PACF)
      3. Putting it all together with the StockExplorer PixieApp
        1. BaseSubApp – base class for all the child PixieApps
        2. StockExploreSubApp – first child PixieApp
        3. MovingAverageSubApp – second child PixieApp
        4. AutoCorrelationSubApp – third child PixieApp
      4. Time series forecasting using the ARIMA model
        1. Build an ARIMA model for the MSFT stock time series
        2. StockExplorer PixieApp Part 2 – add time series forecasting using the ARIMA model
      5. Summary
    13. 9. Analytics Study: Graph Algorithms - US Domestic Flight Data Analysis
      1. Introduction to graphs
        1. Graph representations
        2. Graph algorithms
        3. Graph and big data
      2. Getting started with the networkx graph library
        1. Creating a graph
        2. Visualizing a graph
      3. Part 1 – Loading the US domestic flight data into a graph
        1. Graph centrality
      4. Part 2 – Creating the USFlightsAnalysis PixieApp
      5. Part 3 – Adding data exploration to the USFlightsAnalysis PixieApp
      6. Part 4 – Creating an ARIMA model for predicting flight delays
      7. Summary
    14. 10. The Future of Data Analysis and Where to Develop your Skills
      1. Forward thinking – what to expect for AI and data science
      2. References
    15. A. PixieApp Quick-Reference
      1. Annotations
      2. Custom HTML attributes
      3. Methods
    16. Other Books You May Enjoy
      1. Leave a review – let other readers know what you think
    17. Index

Product information

  • Title: Data Analysis with Python
  • Author(s): David Taieb
  • Release date: December 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789950069