Book description
If you use data to make critical business decisions, this book is for you. Whether you're a data analyst, research scientist, data engineer, ML engineer, data scientist, application developer, or systems developer, this guide helps you broaden your understanding of the modern data science stack, create your own machine learning pipelines, and deploy them to applications at production scale.
The AWS data science stack unifies data science, data engineering, and application development to help you level up your skills beyond your current role. Authors Antje Barth and Chris Fregly show you how to build your own ML pipelines from existing APIs, submit them to the cloud, and integrate results into your application in minutes instead of days.
- Innovate quickly and save money with AWS's on-demand, serverless, and cloud-managed services
- Implement open source technologies such as Kubeflow, Kubernetes, TensorFlow, and Apache Spark on AWS
- Build and deploy an end-to-end, continuous ML pipeline with the AWS data science stack
- Perform advanced analytics on at-rest and streaming data with AWS and Spark
- Integrate streaming data into your ML pipeline for continuous delivery of ML models using AWS and Apache Kafka
Table of contents
- 1. Automated Machine Learning
- 2. Ingest Data Into The Cloud
- 3. Explore the Dataset
  - Tools for Exploring Data in AWS
  - Visualize our Data Lake with SageMaker Studio
  - Query our Data Warehouse
  - Create Dashboards with QuickSight
  - Detect Data Quality Issues with SageMaker and Apache Spark
  - Detect Data Bias with SageMaker Clarify
  - Identify Feature Importance with SageMaker Data Wrangler Quick Model
  - Detect Different Types of Drift with SageMaker Clarify
  - Analyze our Data with AWS Glue DataBrew
  - Reduce Cost and Increase Performance
  - Summary
- 4. Prepare the Dataset for Model Training
  - Perform Feature Selection and Engineering
  - Scale Feature Engineering with SageMaker Processing Jobs
  - Share Features through a Feature Store
  - Ingest and Transform Data with SageMaker Data Wrangler
  - Track Lineage with SageMaker Lineage and Experiments
  - Ingest and Transform Data with AWS Glue DataBrew
  - Reduce Cost and Increase Performance
  - Summary
- 5. Train Your First Model
  - Understand the SageMaker Infrastructure
  - Deploy A Pre-Trained BERT Model with SageMaker JumpStart
  - Develop a SageMaker Model
  - A Brief History of Natural Language Processing
  - Training BERT from Scratch
  - Use Pre-Trained BERT Models
  - Create the Training Script
  - Launch the Training Script from a SageMaker Notebook
  - Evaluate Our Models
  - Debug and Profile Model Training with SageMaker Debugger
  - Interpret and Explain Model Predictions
  - Detect Model Bias and Explain Predictions
  - More Training Options for BERT
  - Reduce Cost and Increase Performance
    - Use Small Notebook Instances
    - Test Model-Training Scripts Locally in the Notebook
    - Profile Training Jobs with SageMaker Debugger
    - Start with a Pre-Trained Model
    - Use 16-bit Half Precision and bfloat16
    - Mixed 32-bit Full and 16-bit Half Precision
    - Quantization
    - Use Training-Optimized Hardware
    - Spot Instances and Checkpoints
    - Early Stopping Rule in SageMaker Debugger
  - Summary
- 6. Train and Optimize Models at Scale
- 7. Deploy Models to Production
  - Choose Real-Time or Batch Predictions
  - Real-Time Predictions with SageMaker Endpoints
  - Auto-Scale SageMaker Endpoints using CloudWatch
  - Strategies to Deploy New and Updated Models
  - Testing and Comparing New Models
  - Monitor Model Performance and Detect Drift
  - Monitor Data Quality of a Deployed SageMaker Endpoint
  - Monitor Model Quality of Deployed SageMaker Endpoints
  - Monitor Bias Drift of Deployed SageMaker Endpoints
  - Monitor Explainability Drift of Deployed SageMaker Endpoints
  - Perform Batch Predictions with SageMaker Batch Transform
  - Lambda Functions and API Gateway
  - Optimize and Manage Models at the Edge
  - Deploy a PyTorch Model with TorchServe
  - TensorFlow-BERT Inference with AWS Deep Java Library
  - Reduce Cost and Increase Performance
  - Summary
- 8. Pipelines and MLOps
Product information
- Title: Data Science on AWS
- Author(s): Antje Barth, Chris Fregly
- Release date: July 2021
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492079392