Python Data Analysis

Book description

Learn how to apply powerful data analysis techniques with popular open source Python modules

In Detail

Python is a multi-paradigm programming language well suited to both object-oriented application development and functional design patterns. It has become a language of choice for data scientists working on data analysis, visualization, and machine learning, and it helps you work quickly and productively.

This book teaches novices data analysis with Python in the broadest sense, covering everything from data retrieval, cleaning, manipulation, visualization, and storage to complex analysis and modeling. It focuses on open source Python modules such as NumPy, SciPy, matplotlib, pandas, IPython, Cython, scikit-learn, and NLTK. Later chapters cover data visualization, signal processing and time-series analysis, databases, predictive analytics, and machine learning. This book will turn you into an ace data analyst in no time.

What You Will Learn

  • Install open source Python modules on various platforms
  • Learn the fundamentals of NumPy, including arrays
  • Manipulate data with pandas
  • Retrieve, process, store, and visualize data
  • Understand signal processing and time-series data analysis
  • Work with relational and NoSQL databases
  • Discover more about data modeling and machine learning
  • Get to grips with interoperability and cloud computing

Table of contents

  1. Python Data Analysis
    1. Table of Contents
    2. Python Data Analysis
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Getting Started with Python Libraries
      1. Software used in this book
        1. Installing software and setup
        2. On Windows
        3. On Linux
        4. On Mac OS X
      2. Building NumPy, SciPy, matplotlib, and IPython from source
      3. Installing with setuptools
      4. NumPy arrays
      5. A simple application
      6. Using IPython as a shell
      7. Reading manual pages
      8. IPython notebooks
      9. Where to find help and references
      10. Summary
    9. 2. NumPy Arrays
      1. The NumPy array object
        1. The advantages of NumPy arrays
      2. Creating a multidimensional array
      3. Selecting NumPy array elements
      4. NumPy numerical types
        1. Data type objects
        2. Character codes
        3. The dtype constructors
        4. The dtype attributes
      5. One-dimensional slicing and indexing
      6. Manipulating array shapes
        1. Stacking arrays
        2. Splitting NumPy arrays
        3. NumPy array attributes
        4. Converting arrays
      7. Creating array views and copies
      8. Fancy indexing
      9. Indexing with a list of locations
      10. Indexing NumPy arrays with Booleans
      11. Broadcasting NumPy arrays
      12. Summary
    10. 3. Statistics and Linear Algebra
      1. NumPy and SciPy modules
      2. Basic descriptive statistics with NumPy
      3. Linear algebra with NumPy
        1. Inverting matrices with NumPy
        2. Solving linear systems with NumPy
      4. Finding eigenvalues and eigenvectors with NumPy
      5. NumPy random numbers
        1. Gambling with the binomial distribution
        2. Sampling the normal distribution
        3. Performing a normality test with SciPy
      6. Creating a NumPy-masked array
        1. Disregarding negative and extreme values
      7. Summary
    11. 4. pandas Primer
      1. Installing and exploring pandas
      2. pandas DataFrames
      3. pandas Series
      4. Querying data in pandas
      5. Statistics with pandas DataFrames
      6. Data aggregation with pandas DataFrames
      7. Concatenating and appending DataFrames
      8. Joining DataFrames
      9. Handling missing values
      10. Dealing with dates
      11. Pivot tables
      12. Remote data access
      13. Summary
    12. 5. Retrieving, Processing, and Storing Data
      1. Writing CSV files with NumPy and pandas
      2. Comparing the NumPy .npy binary format and pickling pandas DataFrames
      3. Storing data with PyTables
      4. Reading and writing pandas DataFrames to HDF5 stores
      5. Reading and writing to Excel with pandas
      6. Using REST web services and JSON
      7. Reading and writing JSON with pandas
      8. Parsing RSS and Atom feeds
      9. Parsing HTML with Beautiful Soup
      10. Summary
    13. 6. Data Visualization
      1. matplotlib subpackages
      2. Basic matplotlib plots
      3. Logarithmic plots
      4. Scatter plots
      5. Legends and annotations
      6. Three-dimensional plots
      7. Plotting in pandas
      8. Lag plots
      9. Autocorrelation plots
      10. Plot.ly
      11. Summary
    14. 7. Signal Processing and Time Series
      1. statsmodels subpackages
      2. Moving averages
      3. Window functions
      4. Defining cointegration
      5. Autocorrelation
      6. Autoregressive models
      7. ARMA models
      8. Generating periodic signals
      9. Fourier analysis
      10. Spectral analysis
      11. Filtering
      12. Summary
    15. 8. Working with Databases
      1. Lightweight access with sqlite3
      2. Accessing databases from pandas
      3. SQLAlchemy
        1. Installing and setting up SQLAlchemy
        2. Populating a database with SQLAlchemy
        3. Querying the database with SQLAlchemy
      4. Pony ORM
      5. Dataset – databases for lazy people
      6. PyMongo and MongoDB
      7. Storing data in Redis
      8. Apache Cassandra
      9. Summary
    16. 9. Analyzing Textual Data and Social Media
      1. Installing NLTK
      2. Filtering out stopwords, names, and numbers
      3. The bag-of-words model
      4. Analyzing word frequencies
      5. Naive Bayes classification
      6. Sentiment analysis
      7. Creating word clouds
      8. Social network analysis
      9. Summary
    17. 10. Predictive Analytics and Machine Learning
      1. A tour of scikit-learn
      2. Preprocessing
      3. Classification with logistic regression
      4. Classification with support vector machines
      5. Regression with ElasticNetCV
      6. Support vector regression
      7. Clustering with affinity propagation
      8. Mean Shift
      9. Genetic algorithms
      10. Neural networks
      11. Decision trees
      12. Summary
    18. 11. Environments Outside the Python Ecosystem and Cloud Computing
      1. Exchanging information with MATLAB/Octave
      2. Installing rpy2
      3. Interfacing with R
      4. Sending NumPy arrays to Java
      5. Integrating SWIG and NumPy
      6. Integrating Boost and Python
      7. Using Fortran code through f2py
      8. Setting up Google App Engine
      9. Running programs on PythonAnywhere
      10. Working with Wakari
      11. Summary
    19. 12. Performance Tuning, Profiling, and Concurrency
      1. Profiling the code
      2. Installing Cython
      3. Calling C code
      4. Creating a process pool with multiprocessing
      5. Speeding up embarrassingly parallel for loops with Joblib
      6. Comparing Bottleneck to NumPy functions
      7. Performing MapReduce with Jug
      8. Installing MPI for Python
      9. IPython Parallel
      10. Summary
    20. A. Key Concepts
    21. B. Useful Functions
      1. matplotlib
      2. NumPy
      3. pandas
      4. scikit-learn
      5. SciPy
        1. scipy.fftpack
        2. scipy.signal
        3. scipy.stats
    22. C. Online Resources
    23. Index

Product information

  • Title: Python Data Analysis
  • Author(s): Ivan Idris
  • Release date: October 2014
  • Publisher(s): Packt Publishing
  • ISBN: 9781783553358