Learning Path: Scaling Python for Big Data

Video Description

If you have some Python experience and want to take it to the next level, this practical, hands-on Learning Path is a helpful resource. Its video tutorials show you how to use Python for distributed task processing and how to perform large-scale data processing in Spark using the PySpark API.

Table of Contents

  1. Building Data Pipelines with Python
    1. Welcome To The Course 00:02:53
    2. About The Author 00:01:55
    3. Introduction To Automation 00:02:48
    4. Adventures With Servers 00:06:37
    5. Being A Good Systems Caretaker 00:06:03
    6. What Is A Queue? 00:02:32
    7. What Is A Consumer? What Is A Producer? 00:02:00
    8. Why Celery? 00:01:49
    9. Celery Architecture & Set Up 00:05:25
    10. Writing Your First Tasks 00:07:49
    11. Deploying Your Tasks 00:06:08
    12. Scaling Your Workers 00:08:52
    13. Monitoring With Flower 00:05:05
    14. Advanced Celery Features 00:06:00
    15. Why Dask? 00:03:01
    16. First Steps With Dask 00:10:08
    17. Dask Bags 00:10:18
    18. Dask Distributed 00:09:58
    19. What Are Data Pipelines? What Is A DAG? 00:02:37
    20. Luigi And Airflow: A Comparison 00:05:50
    21. First Steps With Luigi 00:07:12
    22. More Complex Luigi Tasks 00:09:17
    23. Introduction To Hadoop 00:08:21
    24. First Steps With Airflow 00:08:07
    25. Custom Tasks With Airflow 00:09:16
    26. Advanced Airflow: SubDAGs And Branches 00:11:17
    27. Using Luigi With Hadoop 00:10:15
    28. Apache Spark 00:08:28
    29. Apache Spark Streaming 00:06:32
    30. Django Channels 00:09:39
    31. And Many More 00:05:59
    32. Introduction To Testing With Python 00:07:24
    33. Property-Based Testing With Hypothesis 00:06:09
    34. What's Next? 00:03:57
  2. Introduction to PySpark
    1. Introduction And Course Overview 00:02:01
    2. About The Author 00:01:02
    3. Installing Python 00:04:38
    4. Installing IPython And Using Notebooks 00:06:28
    5. Download And Setup 00:03:24
    6. Running The Spark Shell 00:05:35
    7. Running The Spark Shell With IPython 00:06:38
    8. What Is A Resilient Distributed Dataset - RDD? 00:04:54
    9. Reading A Text File 00:03:34
    10. Actions 00:02:13
    11. Transformations 00:02:30
    12. Persisting Data 00:04:11
    13. Map 00:03:04
    14. Filter 00:03:56
    15. FlatMap 00:03:16
    16. MapPartitions 00:04:07
    17. MapPartitionsWithIndex 00:01:51
    18. Sample 00:02:36
    19. Union 00:01:11
    20. Intersection 00:01:28
    21. Distinct 00:02:02
    22. Cartesian 00:03:17
    23. Pipe 00:03:40
    24. Coalesce 00:02:12
    25. Repartition 00:02:29
    26. RepartitionAndSortWithinPartitions 00:03:58
    27. Reduce 00:04:19
    28. Collect 00:01:56
    29. Count 00:03:05
    30. First 00:01:20
    31. Take 00:01:05
    32. TakeSample 00:03:03
    33. TakeOrdered 00:02:10
    34. SaveAsTextFile 00:04:09
    35. CountByKey 00:02:40
    36. ForEach 00:03:11
    37. GroupByKey 00:02:31
    38. ReduceByKey 00:03:30
    39. AggregateByKey 00:03:44
    40. SortByKey 00:02:47
    41. Join 00:04:16
    42. CoGroup 00:02:09
    43. WholeTextFile 00:03:15
    44. Pickle Files 00:03:59
    45. HadoopInputFormat 00:05:35
    46. HadoopOutputFormat 00:05:31
    47. Broadcast Variables 00:04:17
    48. Accumulators 00:05:08
    49. Using A Custom Accumulator 00:04:52
    50. Partitioning 00:07:56
    51. Spark Standalone Cluster 00:04:26
    52. Mesos 00:03:38
    53. YARN 00:02:28
    54. Client Versus Cluster Mode 00:02:41
    55. Spark Streaming 00:04:21
    56. DataFrames And SQL 00:03:28
    57. MLlib 00:04:29
    58. Resources And Where To Go From Here 00:01:02
    59. Wrap Up 00:01:28

Product Information

  • Title: Learning Path: Scaling Python for Big Data
  • Author(s): O'Reilly Media, Inc.
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491977804