Chapter 5

Reading and Writing Data

In This Chapter

Compressing data

Managing files with the Hadoop file system commands

Ingesting log data with Flume

This chapter tells you all about getting data into and out of Hadoop, basic operations along the path to big data discovery.

We begin by describing the importance of data compression for optimizing the performance of your Hadoop installation, and we briefly outline some of the compression utilities that Hadoop supports. We also give you an overview of the Hadoop file system (FS) shell, a command-line interface with a number of shell-like commands that you can use to interact directly with the Hadoop Distributed File System (HDFS) and the other file systems that Hadoop supports. Finally, we describe how you can use Apache Flume, the Hadoop community technology for collecting log files and storing them in Hadoop, to efficiently ingest huge volumes of log data.
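To give you a quick taste of the FS shell before we dig in, here is a minimal sketch of a session that copies a file into HDFS and reads it back. The directory and file names (/user/dummies/logs, weblog.txt) are invented for illustration; the commands themselves are standard FS shell commands that we cover later in the chapter.

    # Create a directory in HDFS (the path is hypothetical)
    hadoop fs -mkdir -p /user/dummies/logs
    # Copy a local file into HDFS
    hadoop fs -put weblog.txt /user/dummies/logs
    # List the new directory, then print the file to the console
    hadoop fs -ls /user/dummies/logs
    hadoop fs -cat /user/dummies/logs/weblog.txt

Notice how closely these commands mirror their Unix cousins (mkdir, ls, cat); if you're comfortable at a Linux prompt, the FS shell will feel familiar right away.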

Tip: We use the word “ingest” all over this chapter and this book. In short, ingesting data simply means to accept data from an outside source so that it can be stored, and later processed, in Hadoop.
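And to preview what Flume-based ingestion looks like, here is a minimal sketch of a Flume agent configuration that tails a log file and writes the events to HDFS. The agent, source, channel, and sink names (a1, r1, c1, k1), the tailed file, and the HDFS path are all assumptions made for illustration; the property keys themselves are standard Flume configuration settings.

    # Name the components of an agent called a1 (all names are illustrative)
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: tail a hypothetical application log file
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1

    # Channel: buffer events in memory on their way to the sink
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000

    # Sink: write the events to a hypothetical directory in HDFS
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/logs
    a1.sinks.k1.channel = c1

You would save a configuration like this in a file (say, agent.conf) and launch it with flume-ng agent --conf-file agent.conf --name a1. We walk through what sources, channels, and sinks actually do later in this chapter.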
