April 2015
328 pages
This chapter covers
In the last chapter you saw the requirements for storing a master dataset and how a distributed filesystem is a great fit for those requirements. But you also saw how using a filesystem API directly felt way too low-level for the kinds of operations you need to do on the master dataset. In this chapter we’ll show you how to use a specific distributed filesystem—HDFS—and then show how to automate the tasks you need to do with a higher-level API.
Like all illustration chapters, we’ll focus on specific tools to show the nitty-gritty of applying the higher-level ...