Rust Data Engineering

Video description

Rust Data Engineering: Course Description

Are you a data engineer, software developer, or a tech enthusiast with a basic understanding of Rust, seeking to enhance your skills and dive deep into the realm of data engineering with Rust? Or are you a professional from another programming language background, aiming to explore the efficiency, safety, and concurrency features of Rust for data engineering tasks? If so, this course is designed for you.

A fundamental knowledge of Rust is expected. Ideally, you should also be comfortable with the basics of data structures and algorithms and have a working understanding of databases and data processing. Familiarity with SQL, the command line, and version control with Git is advantageous.

This four-week course focuses on using Rust to build efficient, safe, and concurrent data processing systems. The first week is a deep dive into Rust's data structures and collections, and the second explores Rust's safety and security features in the context of data engineering. In the third week, you'll explore data engineering libraries and tools such as Diesel, async Rust, Polars, and Apache Arrow, and learn to interface with data processing systems, REST and gRPC protocols, and the AWS SDK for cloud-based data operations. The final week focuses on designing and implementing full-fledged data processing systems in Rust.

By the end of this course, you will be well-equipped to use Rust for handling large-scale data engineering tasks, solving real-world problems with efficiency and speed. The hands-on labs and projects throughout this course will ensure you gain practical experience, putting your knowledge into action. This course is your gateway to mastering data engineering with Rust, preparing you for the next level in your data engineering journey.

Learning Rust During Live Coding

This video series teaches the Rust language iteratively through live coding. Lessons covered include:

Lesson 1: Getting Started With The Modern Rust Development Ecosystem

  • Meet the instructor & Course Overview
  • Introduction to the AI Coding Paradigm Shift
  • Introduction to cloud-based development environments
  • Introduction to GitHub Copilot Ecosystem for Rust
  • Prompt Engineering with GCP BigQuery SQL
  • Introduction to AWS CodeWhisperer for Rust
  • Using Google Bard to Enhance Productivity
  • Continuous Integration with Rust and GitHub Actions

Lesson 2: Rust Sequences and Maps

  • Introducing Rust Sequences and Maps
  • Print Rust data structures demo
  • Vector Fruit Salad demo
  • VecDeque Fruit Salad demo
  • LinkedList Fruit Salad demo
  • Fruit Salad CLI demo
  • HashMap frequency counter demo
  • HashMap language comparison
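
The HashMap frequency counter and the VecDeque fruit salad demos can both be sketched with the standard library alone. This is a minimal illustration, not the course's own code; the function names `frequencies` and `fruit_queue` are hypothetical. The counter relies on the `entry` API, and the queue shows why VecDeque suits data that grows at both ends.

```rust
use std::collections::{HashMap, VecDeque};

/// Counts how often each item appears (hypothetical helper for illustration).
pub fn frequencies(items: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for item in items {
        // entry() inserts 0 the first time a key is seen, then we increment in place.
        *counts.entry(item.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Builds a "fruit salad" queue, adding items at both ends (hypothetical helper).
pub fn fruit_queue(front: &str, middle: &[&str], back: &str) -> VecDeque<String> {
    let mut salad: VecDeque<String> = middle.iter().map(|s| s.to_string()).collect();
    salad.push_front(front.to_string()); // O(1) insert at the front, unlike Vec
    salad.push_back(front_or_back(back));
    salad
}

/// Small helper so both ends are converted the same way.
fn front_or_back(s: &str) -> String {
    s.to_string()
}
```

Calling `frequencies(&["apple", "fig", "apple"])` yields a map where `"apple"` maps to 2 and `"fig"` to 1.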

Lesson 3: Rust Sets, Graphs and Miscellaneous Data Structures

  • Analyzing UFC Fighter Network Using Graph Centrality in Rust
  • Storing Unique Fruits Using HashSet in Rust
  • Maintaining Sorted and Unique Fruits Using BTreeSet in Rust
  • Creating a Fig Priority Fruit Salad Using Binary Heap in Rust
  • PageRank algorithm for sports data
  • Showing the shortest path with Dijkstra's algorithm
  • Detecting Strongly Connected Components: A Deep Dive into Kosaraju's Algorithm
  • Simple Charting of Data Structures in Rust
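
Two topics from this lesson, the binary heap and Dijkstra's shortest path, fit together naturally. The sketch below is one common way to write Dijkstra's algorithm with the standard `BinaryHeap`, assuming a graph stored as a `HashMap` adjacency list with non-negative `u32` edge weights; the `Graph` alias and `dijkstra` function are hypothetical names, not taken from the course.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Adjacency list: node -> list of (neighbor, edge weight). Hypothetical representation.
pub type Graph = HashMap<&'static str, Vec<(&'static str, u32)>>;

/// Shortest distance from `start` to every reachable node.
/// BinaryHeap is a max-heap, so entries are wrapped in Reverse to pop the smallest distance first.
pub fn dijkstra(graph: &Graph, start: &'static str) -> HashMap<&'static str, u32> {
    let mut dist: HashMap<&'static str, u32> = HashMap::new();
    let mut heap = BinaryHeap::new();
    dist.insert(start, 0);
    heap.push(Reverse((0u32, start)));

    while let Some(Reverse((d, node))) = heap.pop() {
        // Skip stale entries that a shorter path has already superseded.
        if d > *dist.get(node).unwrap_or(&u32::MAX) {
            continue;
        }
        if let Some(edges) = graph.get(node) {
            for &(next, weight) in edges {
                let candidate = d + weight;
                if candidate < *dist.get(next).unwrap_or(&u32::MAX) {
                    dist.insert(next, candidate);
                    heap.push(Reverse((candidate, next)));
                }
            }
        }
    }
    dist
}
```

With edges A→B (4), A→C (1), C→B (2), B→D (1), and C→D (7), the shortest distance from A to D is 4 (via C and B).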

Section 2: Safety, Security and Concurrency with Rust

Lesson 1: Rust Safety and Security Features

  • Multi-Factor Authentication
  • Network Segmentation
  • Least Privilege Access
  • Encryption
  • Mutable fruit salad
  • Customize fruit salad with a CLI
  • Data Race example

Lesson 2: Security Programming with Rust

  • High Availability
  • Understanding the Homophonic Cipher: A Cryptographic Technique
  • Decoding the Secrets of the Caesar Cipher
  • Building a Caesar Cipher Command Line Interface
  • Creating a Decoder Ring: A Practical Guide
  • Detecting Duplicates with SHA-3: A Data Integrity Tool
  • Incident Response
  • Compliance

Lesson 3: Concurrency with Rust

  • Core Concepts in Concurrency
  • Dining Philosophers
  • Web Crawl Wikipedia with Rayon
  • Intelligent Chatbot with Tokio
  • Multi-threaded deduplication with Rust
  • Energy Efficiency Python vs Rust
  • Concurrency Stress test with a GPU
  • Host Efficiency Serverless Optimization problem
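
The Dining Philosophers problem is a classic concurrency exercise. One standard fix for its deadlock is resource ordering: every philosopher picks up the lower-numbered fork first, so a circular wait cannot form. The sketch below uses that approach with standard-library threads and mutexes. It is one possible solution, not necessarily the one shown in the course, and `dine` is a hypothetical name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs the dining philosophers with resource ordering and returns meals eaten per philosopher.
pub fn dine(philosophers: usize, meals_each: usize) -> Vec<usize> {
    assert!(philosophers >= 2, "need at least two philosophers (one fork each)");
    let forks: Vec<Arc<Mutex<()>>> = (0..philosophers).map(|_| Arc::new(Mutex::new(()))).collect();

    let handles: Vec<_> = (0..philosophers)
        .map(|i| {
            let (left, right) = (i, (i + 1) % philosophers);
            // Always lock the lower-numbered fork first: this breaks the circular wait.
            let first = Arc::clone(&forks[left.min(right)]);
            let second = Arc::clone(&forks[left.max(right)]);
            thread::spawn(move || {
                let mut meals: usize = 0;
                for _ in 0..meals_each {
                    let _first = first.lock().unwrap();
                    let _second = second.lock().unwrap();
                    meals += 1; // "eating" while holding both forks
                } // both guards drop here, releasing the forks
                meals
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```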

Section 3: Rust Data Engineering Libraries and Tools

Lesson 1: Using Rust to Manage Data, Files and Network Storage

  • Process CSV files in Rust
  • Using Cargo Lambda with Rust
  • List files on AWS EFS with Rust
  • Use AWS S3 Storage
  • Use AWS S3 Storage from Rust
  • Write encrypted data to tables or Parquet files
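
Processing CSV files usually means reading a header, locating columns by name, and aggregating rows while reporting bad input. The sketch below shows that flow with the standard library only. It assumes simple CSV with a header row and no quoted fields or embedded commas. Production code would typically use a dedicated CSV crate instead, and `sum_by_key` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Sums `value_col` grouped by `key_col` from simple CSV text (hypothetical helper).
/// Assumes a header row and no quoted fields or embedded commas.
pub fn sum_by_key(csv: &str, key_col: &str, value_col: &str) -> Result<HashMap<String, f64>, String> {
    let mut lines = csv.lines();
    let header: Vec<&str> = lines.next().ok_or("empty input")?.split(',').map(str::trim).collect();
    let find = |name: &str| {
        header.iter().position(|h| *h == name).ok_or_else(|| format!("missing column {name}"))
    };
    let (k, v) = (find(key_col)?, find(value_col)?);

    let mut totals = HashMap::new();
    // Enumerate before filtering so error messages report the original line number.
    for (n, line) in lines.enumerate().filter(|(_, l)| !l.trim().is_empty()) {
        let row = n + 2; // +1 for the header, +1 for 1-based numbering
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        let key = fields.get(k).ok_or_else(|| format!("row {row}: missing key"))?;
        let value = fields
            .get(v)
            .ok_or_else(|| format!("row {row}: missing value"))?
            .parse::<f64>()
            .map_err(|e| format!("row {row}: {e}"))?;
        *totals.entry(key.to_string()).or_insert(0.0) += value;
    }
    Ok(totals)
}
```

Returning `Result` rather than panicking lets a pipeline decide whether a malformed row should stop the job or simply be logged.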

Lesson 2: DataFrames with Rust, Python and Notebooks

  • What is Colab?
  • Using Bard to enhance notebook development
  • Exploring Life Expectancy in a Notebook
  • Load a DataFrame with sensitive data
  • Using MLflow with Databricks Notebooks
  • End to End ML with MLflow and Databricks
  • Comparing DataFrame Libraries between Rust and Python

Lesson 3: Data Engineering Libraries and Tools with Rust

  • Parquet file writing and reading with Rust
  • Arrow & Parquet in Rust
  • Serverless functions with Rust and AWS Lambda
  • Polars library overview
  • Building RESTful APIs with Rocket
  • Utilizing Async Rust in Web Development
  • Applying Data Cleaning Techniques with Rust
  • Deploying Rust Applications in a Kubernetes Environment

Section 4: Designing Data Processing Systems in Rust

Lesson 1: Getting Started with Rust Data Pipelines (Including ETL)

  • Jack and the Beanstalk Data Pipelines
  • Open Source Data Engineering - Pros and Cons
  • Core Components of Data Engineering Pipelines
  • Rust AWS Step Functions Pipeline
  • Rust AWS Lambda Async S3 Size Calculator
  • What is Distroless
  • Demo Deploying Rust Microservices on GCP
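
The core extract, transform, and load stages of a data pipeline can be expressed as small composable functions over iterators. The sketch below assumes `sensor,celsius` input lines. The `Reading` record, the validity filter, the Fahrenheit conversion, and the in-memory `Vec` sink are all hypothetical stand-ins for a real source, business rules, and a database or file target. It is not the course's pipeline.

```rust
/// A raw record as extracted from a source (hypothetical shape for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Reading {
    pub sensor: String,
    pub celsius: f64,
}

/// Extract: parse "sensor,celsius" lines, skipping lines that fail to parse.
pub fn extract(raw: &str) -> impl Iterator<Item = Reading> + '_ {
    raw.lines().filter_map(|line| {
        let (sensor, value) = line.split_once(',')?;
        let celsius = value.trim().parse::<f64>().ok()?;
        Some(Reading { sensor: sensor.trim().to_string(), celsius })
    })
}

/// Transform: drop values below absolute zero, then convert to Fahrenheit.
pub fn transform(readings: impl Iterator<Item = Reading>) -> impl Iterator<Item = (String, f64)> {
    readings
        .filter(|r| r.celsius >= -273.15)
        .map(|r| (r.sensor, r.celsius * 9.0 / 5.0 + 32.0))
}

/// Load: append rows to a sink and report how many were written.
pub fn load(rows: impl Iterator<Item = (String, f64)>, sink: &mut Vec<(String, f64)>) -> usize {
    let before = sink.len();
    sink.extend(rows);
    sink.len() - before
}
```

Because each stage is lazy, `load(transform(extract(raw)), &mut table)` streams records through the pipeline one at a time instead of materializing intermediate collections.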

Lesson 2: Using Rust and Python for LLMs, ONNX, Hugging Face, and PyTorch Pipelines

  • Introduction to Hugging Face Hub
  • Rust PyTorch Pre-trained Model Ecosystem
  • Rust GPU Hugging Face Translator
  • Rust PyTorch High-Performance Options
  • Rust CUDA PyTorch Stress Test
  • EFS ONNX Rust Inference with AWS Lambda
  • Theory behind model fine-tuning
  • Doing Fine Tuning

Lesson 3: Building SQL Solutions with Rust, Generative AI and Cloud

  • Selecting the correct database on GCP
  • Rust SQLite Hugging Face Zero Shot Classifier
  • Prompt Engineering for BigQuery
  • BigQuery to Colab Pipeline
  • Exploring Data with BigQuery
  • Using Public Datasets for Data Science
  • Querying Log files with BigQuery
  • There Is No One-Size-Fits-All Database
  • Course Conclusion

Learning Objectives

By the end of this course, you will be able to:

  • Leverage Rust's robust data structures and collections for efficient data manipulation.
  • Understand and utilize Rust's safety and security features to build reliable and secure data engineering solutions.
  • Utilize Rust's libraries and tools specific to data engineering, such as Diesel, async, Polars, and Apache Arrow.
  • Interface effectively with databases, data processing systems, REST and gRPC protocols, and leverage AWS SDK for cloud-based data operations in Rust.
  • Design and implement comprehensive data processing systems in Rust.
  • Apply the principles of concurrent programming in Rust to build high-performance data processing applications.
  • Identify and mitigate common data engineering problems using Rust's unique features, like its strong type system and memory safety guarantees.
  • Develop command-line applications and multi-threaded servers in Rust, focusing on efficient, safe, and concurrent processing of data.
  • Create practical projects, gaining hands-on experience in Rust for data engineering.

Table of contents

  1. Lesson 1
    1. "Meet Instructor Course Overview"
    2. "AI Pair Programming Paradigm Shift"
    3. "GitHub Codespaces Ecosystem With Copilot Chat"
    4. "Copilot Enabled Rust"
    5. "BigQuery Prompt Engineering"
    6. "AWS CodeWhisperer For Rust"
    7. "Using Bard To Enhance Productivity"
    8. "Continuous Integration Rust GitHub Actions"
    9. "Rust Sequences Maps"
    10. "Print Rust Data Structures Demo"
    11. "Vector Fruit Salad Demo"
    12. "VecDeque Fruit Salad Demo"
    13. "LinkedList Fruit Salad Demo"
    14. "Fruit Salad CLI Demo"
    15. "HashMap Frequency Counter Demo"
    16. "HashMap Language Comparison"
    17. "UFC Graph Centrality"
    18. "Unique Fruits With HashSet"
    19. "Sorted Unique Fruits With BTreeSet"
    20. "Fig Priority Fruit Salad With Binary Heap"
    21. "PageRank Sports"
    22. "Shortest Path V2"
    23. "Strongly Connected Components With Kosaraju"
    24. "ASCII Graphing"
  2. Lesson 2
    1. "Multi-Factor Authentication"
    2. "Network Segmentation"
    3. "Least Privilege Access"
    4. "Encryption"
    5. "Mutable Fruit Salad"
    6. "Customize CSV Fruit Salad"
    7. "Data Race"
    8. "High Availability"
    9. "Homophonic Cipher V2"
    10. "Caesar Cipher"
    11. "Caesar Cipher CLI"
    12. "Decoder Ring"
    13. "SHA-3 Dupe Detector"
    14. "Incident Response"
    15. "Compliance"
    16. "Core Concepts Concurrency"
    17. "Dining Philosopher"
    18. "Web Crawl Wikipedia Rayon"
    19. "Tokio Chatbot"
    20. "Data Eng With Rust Dedupe"
    21. "Energy Efficiency Python Rust"
    22. "Building CUDA Enabled Stress Test With Rust PyTorch V2"
    23. "Host Efficiency Optimization Problem"
    24. "Process CSV Rust"
    25. "Cargo Lambda Rust"
    26. "Rust EFS Lister"
    27. "Use S3 Storage"
    28. "Use Rust For S3 Storage"
    29. "Write Encrypted Data To Tables Or Parquet Files"
    30. "What Is Colab"
    31. "Using Bard To Enhance Productivity"
    32. "Life Expectancy EDA"
    33. "Load A DataFrame With Sensitive Information"
    34. "MLOps MLflow Tracking"
    35. "End To End ML Databricks MLflow V2"
    36. "Exploring Life Expectancy Polars"
    37. "Cloud Developer Workspace Advantage"
    38. "Onboard GCP"
    39. "Demo Google Cloud Shell V2"
    40. "Learn AWS CloudShell"
    41. "Prototyping AI APIs AWS CloudShell Bash"
    42. "Cloud9 With CodeWhisperer"
    43. "Demo App Engine Rust Deploy"
    44. "Intro Actix Rust Containerized Microservice"
  3. Lesson 4
    1. "Jack Beanstalk Building Data Pipelines"
    2. "Open Source DE Pro Con"
    3. "Core Components Data Engineering Pipelines"
    4. "Rust AWS Step Functions"
    5. "Rust Async S3 Size Calculator Lambda"
    6. "What Is Distroless"
    7. "Demo Build Deploy Rust Microservice Cloud Run"
    8. "Intro Hugging Face Hub"
    9. "Rust PyTorch Pretrained Models Ecosystem"
    10. "Rust GPU Hugging Face Translator"
    11. "High Performance PyTorch Rust Demo"
    12. "EFS ONNX Lambda Rust Inference MLOps"
    13. "Intro Fine Tuning Theory"
    14. "Doing Fine Tuning"
    15. "GCP Optimize Database Solution"
    16. "Rust SQLite Hugging Face Zero Shot Classifier Demo"
    17. "BigQuery Prompt Engineering V3"
    18. "BQ Colab Pipeline V2"
    19. "Exploring Data Google BigQuery V2"
    20. "Using Public Datasets"
    21. "Demo BigQuery Log Query"
    22. "One Size Database"
    23. "Conclusion"

Product information

  • Title: Rust Data Engineering
  • Author(s): Alfredo Deza, Noah Gift
  • Release date: September 2023
  • Publisher(s): Pragmatic AI Labs
  • ISBN: 07072023VIDEOPAIML