Book description
- Install and configure .NET for Apache Spark on Windows, Linux, and macOS
- Write Apache Spark programs in C# and F# using the .NET bindings
- Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R
- Encapsulate functionality in user-defined functions
- Transform and aggregate large datasets
- Execute SQL queries against files through Apache Hive
- Distribute processing of large datasets across multiple servers
- Create your own batch, streaming, and machine learning programs
Product information
- Title: Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets
- Author(s):
- Release date: April 2021
- Publisher(s): Apress
- ISBN: 9781484269923
You might also like
- Run Llama-2 Models Locally with llama.cpp (article): Llama is Meta’s answer to the growing demand for LLMs. Unlike its well-known technological relative, ChatGPT, …
- Productive and Efficient Data Science with Python: With Modularizing, Memory profiles, and Parallel/GPU Processing (book): This book focuses on the Python-based tools and techniques to help you become highly productive at …
- Reproducible Data Science with Pachyderm (book): Create scalable and reliable data pipelines easily with Pachyderm. Key Features: Learn how to build an …
- Machine Learning Automation with TPOT (book): Discover how TPOT can be used to handle automation in machine learning and explore the different …