Hands-On Data Science with Anaconda

Book description

Develop, deploy, and streamline your data science projects with the most popular end-to-end platform, Anaconda

About This Book
  • Use Anaconda to find solutions for clustering, classification, and linear regression
  • Analyze your data efficiently with the most powerful data science stack
  • Use Anaconda Cloud to store, share, and discover projects and libraries

Who This Book Is For

Hands-On Data Science with Anaconda is for you if you are a developer looking for the best tools on the market for data science. It's also ideal for data analysts and data science professionals who want to make their data science applications more efficient by using the best libraries across multiple languages. Basic programming knowledge of R or Python and an introductory knowledge of linear algebra are expected.

What You Will Learn
  • Perform cleaning, sorting, classification, clustering, regression, and dataset modeling using Anaconda
  • Use the conda package manager to discover, install, and use efficient, scalable packages
  • Get comfortable with heterogeneous data exploration using multiple languages within a project
  • Perform distributed computing and use Anaconda Accelerate to optimize computational power
  • Discover and share packages, notebooks, and environments, and use shared project drives on Anaconda Cloud
  • Tackle advanced data prediction problems

In Detail

Anaconda is an open source platform that brings together the best tools for data science professionals, with more than 100 popular packages supporting the Python, Scala, and R languages. Hands-On Data Science with Anaconda gets you started with Anaconda and demonstrates how you can use it to perform data science operations in the real world.

The book begins by setting up the environment for the Anaconda platform so that it is accessible to tools and frameworks such as Jupyter, pandas, matplotlib, Python, R, Julia, and more. You'll walk through the conda package manager, which lets you automatically manage all packages, including cross-language dependencies, and work across Linux, macOS, and Windows. You'll explore the essentials of data science and linear algebra needed to perform data science tasks using packages such as SciPy, contrastive, scikit-learn, Rattle, and Rmixmod.

Once you're comfortable with these basics, you'll start on data science operations such as cleaning, sorting, and classifying data. You'll then learn how to perform tasks such as clustering, regression, and prediction, and how to build and optimize machine learning models. You'll also learn how to visualize data using the packages available for Julia, Python, and R.

Style and approach

This book is a step-by-step guide, full of use cases, examples, and illustrations, that gets you well versed in the concepts of Anaconda.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Data Science with Anaconda
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Ecosystem of Anaconda
    1. Introduction
      1. Reasons for using Jupyter via Anaconda
      2. Using Jupyter without pre-installation
    2. Miniconda
    3. Anaconda Cloud
    4. Finding help
    5. Summary
    6. Review questions and exercises
  8. Anaconda Installation
    1. Installing Anaconda
      1. Anaconda for Windows
    2. Testing Python
    3. Using IPython
    4. Using Python via Jupyter
    5. Introducing Spyder
    6. Installing R via Conda
    7. Installing Julia and linking it to Jupyter
    8. Installing Octave and linking it to Jupyter
    9. Finding help
    10. Summary
    11. Review questions and exercises
  9. Data Basics
    1. Sources of data
    2. UCI machine learning
    3. Introduction to the Python pandas package
    4. Several ways to input data
      1. Inputting data using R
      2. Inputting data using Python
    5. Introduction to the Quandl data delivery platform
    6. Dealing with missing data
    7. Data sorting
      1. Slicing and dicing datasets
      2. Merging different datasets
      3. Data output
    8. Introduction to the cbsodata Python package
    9. Introduction to the datadotworld Python package
    10. Introduction to the haven and foreign R packages
    11. Introduction to the dslabs R package
    12. Generating Python datasets
    13. Generating R datasets
    14. Summary
    15. Review questions and exercises
  10. Data Visualization
    1. Importance of data visualization
    2. Data visualization in R
    3. Data visualization in Python
    4. Data visualization in Julia
    5. Drawing simple graphs
      1. Various bar charts, pie charts, and histograms
      2. Adding a trend
      3. Adding legends and other explanations
    6. Visualization packages for R
    7. Visualization packages for Python
    8. Visualization packages for Julia
    9. Dynamic visualization
      1. Saving pictures as pdf
      2. Saving dynamic visualization as HTML file
    10. Summary
    11. Review questions and exercises
  11. Statistical Modeling in Anaconda
    1. Introduction to linear models
    2. Running a linear regression in R, Python, Julia, and Octave
    3. Critical value and the decision rule
    4. F-test, critical value, and the decision rule
      1. An application of a linear regression in finance
    5. Dealing with missing data
      1. Removing missing data
      2. Replacing missing data with another value
    6. Detecting outliers and treatments
    7. Several multivariate linear models
    8. Collinearity and its solution
    9. A model's performance measure
    10. Summary
    11. Review questions and exercises
  12. Managing Packages
    1. Introduction to packages, modules, or toolboxes
    2. Two examples of using packages
    3. Finding all R packages
    4. Finding all Python packages
    5. Finding all Julia packages
    6. Finding all Octave packages
    7. Task views for R
    8. Finding manuals
    9. Package dependencies
    10. Package management in R
    11. Package management in Python
    12. Package management in Julia
    13. Package management in Octave
    14. Conda – the package manager
    15. Creating a set of programs in R and Python
    16. Finding environmental variables
    17. Summary
    18. Review questions and exercises
  13. Optimization in Anaconda
    1. Why optimization is important
    2. General issues for optimization problems
      1. Expressing various kinds of optimization problems as LPP
    3. Quadratic optimization
      1. Optimization in R
      2. Optimization in Python
      3. Optimization in Julia
      4. Optimization in Octave
    4. Example #1 – stock portfolio optimization
    5. Example #2 – optimal tax policy
    6. Packages for optimization in R
    7. Packages for optimization in Python
    8. Packages for optimization in Octave
    9. Packages for optimization in Julia
    10. Summary
    11. Review questions and exercises
  14. Unsupervised Learning in Anaconda
    1. Introduction to unsupervised learning
    2. Hierarchical clustering
    3. k-means clustering
    4. Introduction to Python packages – scipy
    5. Introduction to Python packages – contrastive
    6. Introduction to Python packages – sklearn (scikit-learn)
    7. Introduction to R packages – rattle
    8. Introduction to R packages – randomUniformForest
    9. Introduction to R packages – Rmixmod
    10. Implementation using Julia
    11. Task view for Cluster Analysis
    12. Summary
    13. Review questions and exercises
  15. Supervised Learning in Anaconda
    1. A glance at supervised learning
    2. Classification
      1. The k-nearest neighbors algorithm
      2. Bayes classifiers
      3. Reinforcement learning
    3. Implementation of supervised learning via R
      1. Introduction to RTextTools
    4. Implementation via Python
      1. Using the scikit-learn (sklearn) module
    5. Implementation via Octave
    6. Implementation via Julia
      1. Task view for machine learning in R
    7. Summary
    8. Review questions and exercises
  16. Predictive Data Analytics – Modeling and Validation
    1. Understanding predictive data analytics
    2. Useful datasets
      1. The AppliedPredictiveModeling R package
      2. Time series analytics
    3. Predicting future events
      1. Seasonality
      2. Visualizing components
      3. R package – LiblineaR
      4. R package – datarobot
      5. R package – eclust
    4. Model selection
      1. Python package – model-catwalk
      2. Python package – sklearn
      3. Julia package – QuantEcon
      4. Octave package – ltfat
    5. Granger causality test
    6. Summary
    7. Review questions and exercises
  17. Anaconda Cloud
    1. Introduction to Anaconda Cloud
    2. Jupyter Notebook in depth
      1. Formats of Jupyter Notebook
      2. Sharing of notebooks
      3. Sharing of projects
      4. Sharing of environments
    3. Replicating others' environments locally
      1. Downloading a package from Anaconda
    4. Summary
    5. Review questions and exercises
  18. Distributed Computing, Parallel Computing, and HPCC
    1. Introduction to distributed versus parallel computing
      1. Task view for parallel processing
      2. Sample programs in Python
    2. Understanding MPI
      1. R package Rmpi
      2. R package plyr
      3. R package parallel
      4. R package snow
    3. Parallel processing in Python
      1. Parallel processing for word frequency
      2. Parallel Monte-Carlo options pricing
    4. Compute nodes
    5. Anaconda add-on
    6. Introduction to HPCC
    7. Summary
    8. Review questions and exercises
  19. References
    1. Chapter 01: Ecosystem of Anaconda
    2. Chapter 02: Anaconda Installation
    3. Chapter 03: Data Basics
    4. Chapter 04: Data Visualization
    5. Chapter 05: Statistical Modeling in Anaconda
    6. Chapter 06: Managing Packages
    7. Chapter 07: Optimization in Anaconda
    8. Chapter 08: Unsupervised Learning in Anaconda
    9. Chapter 09: Supervised Learning in Anaconda
    10. Chapter 10: Predictive Data Analytics – Modeling and Validation
    11. Chapter 11: Anaconda Cloud
    12. Chapter 12: Distributed Computing, Parallel Computing, and HPCC
  20. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Data Science with Anaconda
  • Author(s): Dr. Yuxing Yan, James Yan
  • Release date: May 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788831192