实时数据处理和分析指南 (Guide to Real-Time Data Processing and Analytics)

by Posts & Telecom Press, Shilpi Saxena, Saurabh Gupta
May 2024
Beginner to intermediate
296 pages
4h 54m
Chinese
Packt Publishing

Chapter 11 Spark Streaming

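This chapter introduces Spark Streaming, its architecture, and the concept of micro-batching. It examines the components of a streaming application and the internals of streaming applications that integrate a wide range of input sources. It also works through some hands-on exercises that show a streaming application executing in practice.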

This chapter covers the following topics:

  • Spark Streaming concepts
  • Introduction to Spark Streaming and its architecture
  • Packaging structure of Spark Streaming
  • Connecting Kafka and Spark Streaming

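The Spark framework, together with all of its extensions, provides a general-purpose solution for enterprise data needs spanning batch processing, analytics, and real-time processing. To process data in real time, the framework must be able to handle unbounded data streams in near real time. This capability is provided by micro-batching and stream processing in the Spark Streaming extension of the Spark framework.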

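Put simply, a data stream can be understood as an unbounded sequence of data that is generated continuously in real time. To process these continuously arriving streams, frameworks take one of the following approaches.

  • Treat each event as a separate, discrete unit and process it individually.
  • Group individual events into very small batches (micro-batches), and process each batch as a single unit.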
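
The two approaches above can be contrasted with a minimal sketch in plain Python. It does not use Spark; the function names (`process_per_event`, `micro_batches`, `process_micro_batched`) and the count-based `batch_size` parameter are assumptions made only for illustration.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def process_per_event(events: Iterable[int], handle: Callable[[int], int]) -> List[int]:
    """Approach 1: handle every event individually, one result per event."""
    return [handle(event) for event in events]


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Approach 2 (grouping step): cut the stream into batches of at most batch_size events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the (finite, simulated) stream is exhausted
        yield batch


def process_micro_batched(
    events: Iterable[int],
    batch_size: int,
    handle_batch: Callable[[List[int]], int],
) -> List[int]:
    """Approach 2: process each small batch as a single unit, one result per batch."""
    return [handle_batch(batch) for batch in micro_batches(events, batch_size)]


# A short, finite list stands in for an unbounded stream.
events = [3, 1, 4, 1, 5, 9, 2]

per_event_results = process_per_event(events, lambda e: e * 2)
batch_results = process_micro_batched(events, 3, sum)

print(per_event_results)  # one output per event
print(batch_results)      # one output per micro-batch
```

With per-event processing, each event produces its own result immediately. With micro-batching, the handler runs once per batch, which trades a small amount of latency for less per-event overhead.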

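Spark provides its streaming API as an extension of the Spark core API. It is a scalable, low-latency, high-throughput, fault-tolerant framework that uses micro-batches to process incoming streaming data in real time.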
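
Spark Streaming forms its micro-batches by time interval rather than by event count. The following plain-Python sketch simulates that idea only; it is not the Spark API, and `group_by_batch_interval`, `batch_interval`, and the sample events are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_batch_interval(
    timed_events: Iterable[Tuple[float, str]],
    batch_interval: float,
) -> Dict[int, List[str]]:
    """Assign each (timestamp_in_seconds, value) event to the batch whose interval contains it.

    Batch k covers the half-open time range [k * batch_interval, (k + 1) * batch_interval).
    """
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in timed_events:
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(value)
    return dict(batches)


# Simulated arrivals: (arrival time in seconds, payload).
timed_events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (2.5, "c"), (2.7, "a")]

batches = group_by_batch_interval(timed_events, batch_interval=1.0)

# Each batch is then processed as one unit; here we simply count its events.
counts_per_batch = {index: len(values) for index, values in sorted(batches.items())}
print(counts_per_batch)
```

A shorter interval lowers latency but produces more, smaller batches; a longer interval does the opposite.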

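Real-time processing solutions built on the Spark framework come in handy in a number of use cases, including monitoring of infrastructure, applications, or processes; fraud detection; marketing and advertising; and the Internet of Things (IoT).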

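Figure 11-1 captures some statistics on the rate at which real-time data is generated around the world; all of the scenarios it depicts are potential use cases for Spark Streaming.

Figure 11-1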

Spark Streaming is a very useful extension of the Spark core API and is widely used to process real-time or near ...


ISBN: 9781836208617