Video description
This course shows you how to build data pipelines and automate workflows using Python 3. From simple task-based messaging queues to complex frameworks like Luigi and Airflow, the course delivers the essential knowledge you need to develop your own automation solutions. You'll learn the basics of pipeline architecture and get an introduction to the most popular frameworks and tools.
Designed for the working data professional who is new to the world of data pipelines and distributed solutions, the course requires intermediate-level Python experience and the ability to manage your own system setup.
- Acquire a practical understanding of how to approach data pipelining using Python toolsets
- Learn to determine when a Python framework is appropriate for a project
- Understand workflow concepts like directed acyclic graphs, producers, and consumers
- Learn to integrate data flows into pipelines, workflows, and task-based automation solutions
- Understand how to parallelize data analysis, both locally and in a distributed cluster
- Practice writing simple data tests using property-based testing
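To give a flavor of that last objective, here is a minimal property-based test in the style of Hypothesis. This is an illustrative sketch, not code from the course; the `clean_whitespace` helper is a hypothetical example function.

```python
# Minimal property-based testing sketch using Hypothesis (run with pytest).
# `clean_whitespace` is a hypothetical helper, not from the course materials.
from hypothesis import given, strategies as st

def clean_whitespace(s: str) -> str:
    """Collapse any run of whitespace into a single space."""
    return " ".join(s.split())

@given(st.text())
def test_clean_whitespace_is_idempotent(s):
    # Hypothesis generates many example strings; for every one of them,
    # cleaning twice must give the same result as cleaning once.
    assert clean_whitespace(clean_whitespace(s)) == clean_whitespace(s)
```

Instead of hand-picking inputs, the test states a property (idempotence) that must hold for all inputs, and Hypothesis searches for counterexamples.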
Table of contents
Introduction
- Welcome To The Course 00:02:53
- About The Author 00:01:55
Automation 101
- Introduction To Automation 00:02:48
- Adventures With Servers 00:06:37
- Being A Good Systems Caretaker 00:06:03
- What Is A Queue? 00:02:32
- What Is A Consumer? What Is A Producer? 00:02:00
Easy Task Processing With Celery
- Why Celery? 00:01:49
- Celery Architecture & Set Up 00:05:25
- Writing Your First Tasks 00:07:49
- Deploying Your Tasks 00:06:08
- Scaling Your Workers 00:08:52
- Monitoring With Flower 00:05:05
- Advanced Celery Features 00:06:00
Scaling Data Analysis With Dask
- Why Dask? 00:03:01
- First Steps With Dask 00:10:08
- Dask Bags 00:10:18
- Dask Distributed 00:09:58
Data Pipelines With Luigi & Airflow
- What Are Data Pipelines? What Is A DAG? 00:02:37
- Luigi And Airflow: A Comparison 00:05:50
- First Steps With Luigi 00:07:12
- More Complex Luigi Tasks 00:09:17
- Introduction To Hadoop 00:08:21
- First Steps With Airflow 00:08:07
- Custom Tasks With Airflow 00:09:16
- Advanced Airflow: Subdags And Branches 00:11:17
- Using Luigi With Hadoop 00:10:15
Other Workflow Frameworks
- Apache Spark 00:08:28
- Apache Spark Streaming 00:06:32
- Django Channels 00:09:39
- And Many More 00:05:59
Testing With Pipelines
- Introduction To Testing With Python 00:07:24
- Property-Based Testing With Hypothesis 00:06:09
Conclusion
- What's Next? 00:03:57
Product information
- Title: Building Data Pipelines with Python
- Author(s):
- Release date: November 2016
- Publisher(s): Infinite Skills
- ISBN: 9781491970263