Python Data Science Essentials

Name: Python Data Science Essentials
Author: Alberto Boschetti
ISBN: 9781785280429

April 2015

Beginner

258 pages

5h 48m

English

Packt Publishing

Table of Contents
Python Data Science Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for

Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. First Steps
Introducing data science and Python
Installing Python
Python 2 or Python 3?
Step-by-step installation
A glance at the essential Python packages
NumPy
SciPy
pandas
Scikit-learn
IPython
Matplotlib
Statsmodels
Beautiful Soup
NetworkX
NLTK
Gensim
PyPy
The installation of packages
Package upgrades
Scientific distributions
Anaconda
Enthought Canopy
PythonXY
WinPython
Introducing IPython
The IPython Notebook
Datasets and code used in the book
Scikit-learn toy datasets
The MLdata.org public repository
LIBSVM data examples
Loading data directly from CSV or text files
Scikit-learn sample generators
Summary
2. Data Munging
The data science process
Data loading and preprocessing with pandas
Fast and easy data loading
Dealing with problematic data
Dealing with big datasets
Accessing other data formats
Data preprocessing
Data selection
Working with categorical and textual data
A special type of data – text
Data processing with NumPy
NumPy's n-dimensional array
The basics of NumPy ndarray objects
Creating NumPy arrays
From lists to unidimensional arrays
Controlling the memory size
Heterogeneous lists
From lists to multidimensional arrays
Resizing arrays
Arrays derived from NumPy functions
Getting an array directly from a file
Extracting data from pandas
NumPy fast operation and computations
Matrix operations
Slicing and indexing with NumPy arrays
Stacking NumPy arrays
Summary
3. The Data Science Pipeline
Introducing EDA
Feature creation
Dimensionality reduction
The covariance matrix
Principal Component Analysis (PCA)
A variation of PCA for big data – RandomizedPCA
Latent Factor Analysis (LFA)
Linear Discriminant Analysis (LDA)
Latent Semantical Analysis (LSA)
Independent Component Analysis (ICA)
Kernel PCA
Restricted Boltzmann Machine (RBM)
The detection and treatment of outliers
Univariate outlier detection
EllipticEnvelope
OneClassSVM
Scoring functions
Multilabel classification
Binary classification
Regression
Testing and validating
Cross-validation
Using cross-validation iterators
Sampling and bootstrapping
Hyper-parameters' optimization
Building custom scoring functions
Reducing the grid search runtime
Feature selection
Univariate selection
Recursive elimination
Stability and L1-based selection
Summary
4. Machine Learning
Linear and logistic regression
Naive Bayes
The k-Nearest Neighbors
Advanced nonlinear algorithms
SVM for classification
SVM for regression
Tuning SVM
Ensemble strategies
Pasting by random samples
Bagging with weak ensembles
Random Subspaces and Random Patches
Sequences of models – AdaBoost
Gradient tree boosting (GTB)
Dealing with big data
Creating some big datasets as examples
Scalability with volume
Keeping up with velocity
Dealing with variety
A quick overview of Stochastic Gradient Descent (SGD)
A peek into Natural Language Processing (NLP)
Word tokenization
Stemming
Word Tagging
Named Entity Recognition (NER)
Stopwords
A complete data science example – text classification
An overview of unsupervised learning
Summary
5. Social Network Analysis
Introduction to graph theory
Graph algorithms
Graph loading, dumping, and sampling
Summary
6. Visualization
Introducing the basics of matplotlib
Curve plotting
Using panels
Scatterplots
Histograms
Bar graphs
Image visualization
Selected graphical examples with pandas
Boxplots and histograms
Scatterplots
Parallel coordinates
Advanced data learning representation
Learning curves
Validation curves
Feature importance
GBT partial dependence plot
Summary
Index

Overview

"Python Data Science Essentials" is a guide to applying Python in data science. It takes you through setting up your environment, data preprocessing, machine learning, and visualization, with hands-on examples and clear explanations.

What this book will help me do

Set up a Python environment for scientific computing on various OS platforms.
Preprocess data for analysis by fixing, transforming, and exploring it.
Apply and tune machine learning algorithms to solve data science problems.
Analyze and interpret data connections using graph-based methods.
Create compelling visualizations to effectively present analysis results.

Author(s)

Alberto Boschetti is a recognized expert in data science and Python programming, with years of experience teaching and practicing in the field. He specializes in making complex technical topics accessible through practical examples and real-world applications, ensuring readers can readily apply what they learn.

Who is it for?

This book is designed for aspiring data scientists who have basic Python skills and want to advance to real-world data science tasks. Data analysts familiar with languages like R or MATLAB will benefit from the focus on Python tools. It's ideal for learners aiming to efficiently solve real-world problems using data science.
