Rust Data Engineering
video

by Alfredo Deza, Noah Gift
September 2023
Intermediate
7h 42m
English
Pragmatic AI Labs

Overview

Rust Data Engineering: Course Description

Are you a data engineer, software developer, or a tech enthusiast with a basic understanding of Rust, seeking to enhance your skills and dive deep into the realm of data engineering with Rust? Or are you a professional from another programming language background, aiming to explore the efficiency, safety, and concurrency features of Rust for data engineering tasks? If so, this course is designed for you.

A fundamental knowledge of Rust is expected. Ideally, you are also comfortable with the basics of data structures and algorithms and have a working understanding of databases and data processing. Familiarity with SQL, the command line, and version control with Git is advantageous.

This four-week course focuses on leveraging Rust to create efficient, safe, and concurrent data processing systems. The journey begins with a deep dive into Rust's data structures and collections, followed by exploring Rust's safety and security features in the context of data engineering. In the subsequent week, you'll explore libraries and tools specific to data engineering like Diesel, async, Polars, and Apache Arrow, and learn to interface with data processing systems, REST, gRPC protocols, and AWS SDK for cloud-based data operations. The final week focuses on designing and implementing full-fledged data processing systems using Rust.

By the end of this course, you will be well-equipped to use Rust for handling large-scale data engineering tasks, solving real-world problems with efficiency and speed. The hands-on labs and projects throughout this course will ensure you gain practical experience, putting your knowledge into action. This course is your gateway to mastering data engineering with Rust, preparing you for the next level in your data engineering journey.

Learning Rust During Live Coding

This video series teaches the Rust language iteratively through live coding. Lessons covered include:

Lesson 1: Getting Started With The Modern Rust Development Ecosystem

  • Meet the instructor & Course Overview
  • Introduction to the AI Coding Paradigm Shift
  • Introduction to cloud-based development environments
  • Introduction to GitHub Copilot Ecosystem for Rust
  • Prompt Engineering with GCP BigQuery SQL
  • Introduction to AWS CodeWhisperer for Rust
  • Using Google Bard to Enhance Productivity
  • Continuous Integration with Rust and GitHub Actions

Lesson 2: Rust Sequences and Maps

  • Introducing Rust Sequences and Maps
  • Print Rust data structures demo
  • Vector Fruit Salad demo
  • VecDeque Fruit Salad demo
  • Linked List Fruit Salad demo
  • Fruit Salad CLI demo
  • HashMap frequency counter demo
  • HashMap language comparison
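As an illustration of the kind of exercise this lesson names, here is a minimal sketch of a HashMap frequency counter that uses only the standard library. It is not the course's own demo code; the function name `frequency_count` and the sample fruit list are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Count how many times each item appears in the slice.
/// (Hypothetical helper, not taken from the course materials.)
fn frequency_count<'a>(items: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for &item in items {
        // entry() inserts 0 the first time a key is seen; then we increment.
        *counts.entry(item).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Sample data invented for the sketch.
    let fruits = ["apple", "fig", "apple", "cherry", "fig", "apple"];
    let counts = frequency_count(&fruits);

    // HashMap iteration order is unspecified, so sort before printing.
    let mut sorted: Vec<_> = counts.iter().collect();
    sorted.sort();
    for (fruit, n) in sorted {
        println!("{fruit}: {n}");
    }
}
```

The `entry` API avoids a separate lookup and insert, which is one common reason to reach for it in counting loops.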

Lesson 3: Rust Sets, Graphs and Miscellaneous Data Structures

  • Analyzing UFC Fighter Network Using Graph Centrality in Rust
  • Storing Unique Fruits Using HashSet in Rust
  • Maintaining Sorted and Unique Fruits Using BTreeSet in Rust
  • Creating a Fig Priority Fruit Salad Using Binary Heap in Rust
  • PageRank algorithm for sports data
  • Showing shortest path with Dijkstra
  • Detecting Strongly Connected Components: A Deep Dive into Kosaraju's Algorithm
  • Simple Charting of Data Structures in Rust
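The BTreeSet and BinaryHeap topics above can be sketched with the standard library alone. This is an illustrative example under stated assumptions, not the course's demo: the helper names `unique_sorted` and `by_priority`, and the rule that figs score higher than other fruits, are invented here.

```rust
use std::collections::{BTreeSet, BinaryHeap};

/// Return the unique fruits in sorted order.
/// A BTreeSet drops duplicates and keeps its elements ordered.
fn unique_sorted<'a>(fruits: &[&'a str]) -> Vec<&'a str> {
    let set: BTreeSet<&str> = fruits.iter().copied().collect();
    set.into_iter().collect()
}

/// Return fruits highest priority first.
/// Assumed scoring for this sketch: "fig" gets 2, everything else 1.
fn by_priority<'a>(fruits: &[&'a str]) -> Vec<&'a str> {
    // BinaryHeap is a max-heap; tuples compare by priority, then by name.
    let mut heap: BinaryHeap<(u8, &str)> = fruits
        .iter()
        .map(|&f| (if f == "fig" { 2 } else { 1 }, f))
        .collect();

    let mut ordered = Vec::new();
    while let Some((_, fruit)) = heap.pop() {
        ordered.push(fruit);
    }
    ordered
}

fn main() {
    let fruits = ["fig", "apple", "fig", "cherry"];
    println!("{:?}", unique_sorted(&fruits));
    println!("{:?}", by_priority(&["apple", "fig", "cherry"]));
}
```

Because the heap compares whole tuples, ties in priority are broken by the fruit name in descending order; a real application would likely choose an explicit tie-break rule.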

Section 2: Safety, Security and Concurrency with Rust

Lesson 1: Rust Safety and Security Features

  • Multi-Factor Authentication
  • Network Segmentation
  • Least Privilege Access
  • Encryption
  • Mutable fruit salad
  • Customize fruit salad with a CLI
  • Data Race example

Lesson 2: Security Programming with Rust

  • High Availability
  • Understanding the Homophonic Cipher: A Cryptographic Technique
  • Decoding the Secrets of the Caesar Cipher
  • Building a Caesar Cipher Command Line Interface
  • Creating a Decoder Ring: A Practical Guide
  • Detecting Duplicates with SHA-3: A Data Integrity Tool
  • Incident Response
  • Compliance
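For readers unfamiliar with the Caesar cipher mentioned in this lesson, the following is a minimal sketch of the technique, assuming ASCII letters only and leaving all other characters unchanged. The function names `caesar` and `decrypt` are hypothetical and are not taken from the course.

```rust
/// Shift ASCII letters by `shift` positions, wrapping around the alphabet.
/// Non-letter characters pass through unchanged.
fn caesar(text: &str, shift: u8) -> String {
    let shift = shift % 26;
    text.chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                // Pick the start of the matching alphabet so case is preserved.
                let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
                (((c as u8 - base + shift) % 26) + base) as char
            } else {
                c
            }
        })
        .collect()
}

/// Undo a shift by applying the complementary shift.
fn decrypt(text: &str, shift: u8) -> String {
    caesar(text, 26 - (shift % 26))
}

fn main() {
    let secret = caesar("Hello, World!", 3);
    println!("{secret}");
    println!("{}", decrypt(&secret, 3));
}
```

A Caesar cipher has only 25 useful keys, so it is a teaching device rather than real encryption.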

Lesson 3: Concurrency with Rust

  • Core Concepts in Concurrency
  • Dining Philosophers
  • Web Crawl Wikipedia with Rayon
  • Intelligent Chatbot with Tokio
  • Multi-threaded deduplication with Rust
  • Energy Efficiency Python vs Rust
  • Concurrency Stress test with a GPU
  • Host Efficiency Serverless Optimization problem
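One way to picture the multi-threaded deduplication topic above is the sketch below, which shares a `HashSet` between standard-library threads through `Arc<Mutex<...>>`. It is an assumption-laden illustration, not the course's implementation (which may use other crates or hashing); the name `dedup_parallel` and the chunking strategy are invented for this example.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Deduplicate `items` using up to `workers` threads.
/// (Hypothetical helper; a sketch of the idea, not tuned for performance.)
fn dedup_parallel(items: Vec<String>, workers: usize) -> HashSet<String> {
    let seen = Arc::new(Mutex::new(HashSet::new()));
    let workers = workers.max(1);
    // Ceiling division; at least 1 because chunks() panics on a size of 0.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);

    let handles: Vec<_> = items
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let seen = Arc::clone(&seen);
            thread::spawn(move || {
                for item in chunk {
                    // The mutex serializes inserts, so no data race is possible.
                    seen.lock().unwrap().insert(item);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All threads have finished, so this is the only remaining reference.
    Arc::try_unwrap(seen)
        .expect("no other references remain")
        .into_inner()
        .unwrap()
}

fn main() {
    let items = ["a", "b", "a", "c", "b"].iter().map(|s| s.to_string()).collect();
    let unique = dedup_parallel(items, 2);
    println!("{} unique items", unique.len());
}
```

Locking on every insert keeps the sketch short; a design that builds a local set per thread and merges the results at the end would reduce lock contention.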

Section 3: Rust Data Engineering Libraries and Tools

Lesson 1: Using Rust to Manage Data, Files and Network Storage

  • Process CSV files in Rust
  • Using Cargo Lambda with Rust
  • List files on AWS EFS with Rust
  • Use AWS S3 Storage
  • Use AWS S3 Storage from Rust
  • Write encrypted data to tables or Parquet files

Lesson 2: DataFrames with Rust, Python and Notebooks

  • What is Colab?
  • Using Bard to enhance notebook development
  • Exploring Life Expectancy in a Notebook
  • Load a DataFrame with sensitive data
  • Using MLflow with Databricks Notebooks
  • End to End ML with MLflow and Databricks
  • Comparing DataFrame Libraries between Rust and Python

Lesson 3: Data Engineering Libraries and Tools with Rust

  • Parquet file writing and reading with Rust
  • Arrow & Parquet in Rust
  • Serverless functions with Rust and AWS Lambda
  • Polars library overview
  • Building RESTful APIs with Rocket
  • Utilizing Async Rust in Web Development
  • Applying Data Cleaning Techniques with Rust
  • Deploying Rust Applications in a Kubernetes Environment

Section 4: Designing Data Processing Systems in Rust

Lesson 1: Getting Started with Rust Data Pipelines (Including ETL)

  • Jack and the Beanstalk Data Pipelines
  • Open Source Data Engineering - Pros and Cons
  • Core Components of Data Engineering Pipelines
  • Rust AWS Step Functions Pipeline
  • Rust AWS Lambda Async S3 Size Calculator
  • What is Distroless
  • Demo Deploying Rust Microservices on GCP

Lesson 2: Using Rust and Python for LLMs, ONNX, Hugging Face, and PyTorch Pipelines

  • Introduction to Hugging Face Hub
  • Rust PyTorch Pre-trained Model Ecosystem
  • Rust GPU Hugging Face Translator
  • Rust PyTorch High-Performance Options
  • Rust CUDA PyTorch Stress Test
  • EFS ONNX Rust Inference with AWS Lambda
  • Theory behind model fine-tuning
  • Doing Fine Tuning

Lesson 3: Building SQL Solutions with Rust, Generative AI and Cloud

  • Selecting the correct database on GCP
  • Rust SQLite Hugging Face Zero Shot Classifier
  • Prompt Engineering for BigQuery
  • BigQuery to Colab Pipeline
  • Exploring Data with BigQuery
  • Using Public Datasets for Data Science
  • Querying Log files with BigQuery
  • There is no one-size-fits-all database
  • Course Conclusion

Learning Objectives

By the end of this course, you will be able to:

  • Leverage Rust's robust data structures and collections for efficient data manipulation.
  • Understand and utilize Rust's safety and security features to build reliable and secure data engineering solutions.
  • Utilize Rust's libraries and tools specific to data engineering, such as Diesel, async, Polars, and Apache Arrow.
  • Interface effectively with databases, data processing systems, REST and gRPC protocols, and leverage AWS SDK for cloud-based data operations in Rust.
  • Design and implement comprehensive data processing systems in Rust.
  • Apply the principles of concurrent programming in Rust to build high-performance data processing applications.
  • Identify and mitigate common data engineering problems using Rust's unique features, like its strong type system and memory safety guarantees.
  • Develop command-line applications and multi-threaded servers in Rust, focusing on efficient, safe, and concurrent processing of data.
  • Create practical projects, gaining hands-on experience in Rust for data engineering.

Publisher Resources

ISBN: 07072023VIDEOPAIML