Learning Spark (SPARK學習手冊)
by Holden Karau, Andy Konwinski, Patrick We
September 2016
Intermediate to advanced
288 pages
6h 6m
Chinese
GoTop Information, Inc.
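Some input formats (SequenceFiles, for example) let us compress only the values in key/value pairs, which is helpful for lookup-style applications. Other input formats have their own compression controls: for example, many of the formats in Twitter's Elephant Bird package use LZO compression.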
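Filesystems
Spark supports access to many filesystems, and we can use whichever one we prefer.
Local/"Regular" FS
While Spark supports loading files from the local filesystem, it requires that the files be available at the same path on every compute node in the cluster.
Some network filesystems, such as NFS, AFS, and MapR's NFS layer, appear to the user as a regular filesystem. If your data is already in one of these systems, you can use it as an input source by specifying a file:// path; Spark will handle the files as long as the filesystem is mounted at the same path on each node (see Example 5-29).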
Example 5-29. Loading a compressed text file from the local filesystem in Scala
val rdd = sc.textFile("file:///home/holden/happypandas.gz")
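If your file is not already on every node in the cluster, you can first read it locally in the driver program without going through Spark, and then call parallelize to distribute its contents to the worker nodes. This approach can be quite slow, so we recommend storing your files in a shared filesystem such as HDFS, NFS, or S3.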
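The driver-side approach described above might look roughly like the following. This is a minimal sketch, not code from the book: the object name `ParallelizeLocalFile`, the helper `readLocalLines`, and the default path are hypothetical, and the file is assumed to be uncompressed UTF-8 text that fits in the driver's memory.

```scala
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeLocalFile {
  // Read every line of a file that exists only on the driver machine.
  // Plain Scala I/O; Spark is not involved at this point.
  def readLocalLines(path: String): List[String] = {
    val source = Source.fromFile(path, "UTF-8")
    try source.getLines().toList finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical default path, used only for illustration.
    val localPath = args.headOption.getOrElse("/home/holden/happypandas.txt")

    val conf = new SparkConf()
      .setAppName("ParallelizeLocalFile")
      // Assumption: fall back to local mode when no master is supplied.
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    try {
      // The whole file is held in driver memory before being shipped out.
      val lines = readLocalLines(localPath)
      // parallelize turns the local collection into an RDD spread across workers.
      val rdd = sc.parallelize(lines)
      println(s"Distributed ${rdd.count()} lines")
    } finally {
      sc.stop()
    }
  }
}
```

Because all of the data passes through the driver, this only suits small files, which is consistent with the recommendation above to prefer a shared filesystem.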
Amazon S3
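Amazon S3 is an increasingly popular option for storing large datasets. S3 access is especially fast when your compute nodes are located inside Amazon EC2. However, if files must be transferred over the public Internet, performance becomes considerably ...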
ISBN: 9789864760466