Video description
If you have some Python experience and want to take it to the next level, this practical, hands-on Learning Path will help. Its video tutorials show you how to use Python for distributed task processing and how to perform large-scale data processing in Spark using the PySpark API.
Table of contents
Building Data Pipelines with Python
- Welcome To The Course 00:02:53
- About The Author 00:01:55
- Introduction To Automation 00:02:48
- Adventures With Servers 00:06:37
- Being A Good Systems Caretaker 00:06:03
- What Is A Queue? 00:02:32
- What Is A Consumer? What Is A Producer? 00:02:00
- Why Celery? 00:01:49
- Celery Architecture & Set Up 00:05:25
- Writing Your First Tasks 00:07:49
- Deploying Your Tasks 00:06:08
- Scaling Your Workers 00:08:52
- Monitoring With Flower 00:05:05
- Advanced Celery Features 00:06:00
- Why Dask? 00:03:01
- First Steps With Dask 00:10:08
- Dask Bags 00:10:18
- Dask Distributed 00:09:58
- What Are Data Pipelines? What Is A DAG? 00:02:37
- Luigi And Airflow: A Comparison 00:05:50
- First Steps With Luigi 00:07:12
- More Complex Luigi Tasks 00:09:17
- Introduction To Hadoop 00:08:21
- First Steps With Airflow 00:08:07
- Custom Tasks With Airflow 00:09:16
- Advanced Airflow: Subdags And Branches 00:11:17
- Using Luigi With Hadoop 00:10:15
- Apache Spark 00:08:28
- Apache Spark Streaming 00:06:32
- Django Channels 00:09:39
- And Many More 00:05:59
- Introduction To Testing With Python 00:07:24
- Property-Based Testing With Hypothesis 00:06:09
- What's Next? 00:03:57
Introduction to PySpark
- Introduction And Course Overview 00:02:01
- About The Author 00:01:02
- Installing Python 00:04:38
- Installing iPython And Using Notebooks 00:06:28
- Download And Setup 00:03:24
- Running The Spark Shell 00:05:35
- Running The Spark Shell With iPython 00:06:38
- What Is A Resilient Distributed Dataset - RDD? 00:04:54
- Reading A Text File 00:03:34
- Actions 00:02:13
- Transformations 00:02:30
- Persisting Data 00:04:11
- Map 00:03:04
- Filter 00:03:56
- Flatmap 00:03:16
- MapPartitions 00:04:07
- MapPartitionsWithIndex 00:01:51
- Sample 00:02:36
- Union 00:01:11
- Intersection 00:01:28
- Distinct 00:02:02
- Cartesian 00:03:17
- Pipe 00:03:40
- Coalesce 00:02:12
- Repartition 00:02:29
- RepartitionAndSortWithinPartitions 00:03:58
- Reduce 00:04:19
- Collect 00:01:56
- Count 00:03:05
- First 00:01:20
- Take 00:01:05
- TakeSample 00:03:03
- TakeOrdered 00:02:10
- SaveAsTextFile 00:04:09
- CountByKey 00:02:40
- ForEach 00:03:11
- GroupByKey 00:02:31
- ReduceByKey 00:03:30
- AggregateByKey 00:03:44
- SortByKey 00:02:47
- Join 00:04:16
- CoGroup 00:02:09
- WholeTextFile 00:03:15
- Pickle Files 00:03:59
- HadoopInputFormat 00:05:35
- HadoopOutputFormat 00:05:31
- Broadcast Variables 00:04:17
- Accumulators 00:05:08
- Using A Custom Accumulator 00:04:52
- Partitioning 00:07:56
- Spark Standalone Cluster 00:04:26
- Mesos 00:03:38
- Yarn 00:02:28
- Client Versus Cluster Mode 00:02:41
- Spark Streaming 00:04:21
- Dataframes And SQL 00:03:28
- MLlib 00:04:29
- Resources And Where To Go From Here 00:01:02
- Wrap Up 00:01:28
Product information
- Title: Learning Path: Scaling Python for Big Data
- Author(s):
- Release date: December 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491977798