November 2017
Beginner to intermediate
290 pages
7h 34m
English
The first operator (or group of operators) relies on scanning a directory for available files and then reads each file incrementally while emitting records. The operator is built on the Hadoop filesystem abstraction and can therefore be used with HDFS, S3, MaprFS, FTP, local or mounted filesystems such as NFS, and other supported systems. How the file content is split and what types of records are produced are defined by specializations of a common base class, which is how the various file formats are supported.
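
The scan-then-read pattern described above can be illustrated with a minimal sketch. It is written in plain Python against the local filesystem only and does not use the Hadoop filesystem abstraction or the actual operator classes; the names DirectoryScanningReader, LineReader, FixedWidthReader, and demo are hypothetical and chosen purely for illustration. The base class handles scanning and incremental reading, while each specialization decides how file content is split into records.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Iterator, List, Set


class DirectoryScanningReader(ABC):
    """Hypothetical base class: scans a directory and reads new files incrementally."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self._seen: Set[str] = set()  # paths already handed out for reading

    def scan(self) -> List[str]:
        """Return paths of regular files not yet processed, in name order."""
        new_files = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if os.path.isfile(path) and path not in self._seen:
                new_files.append(path)
        return new_files

    def emit_records(self) -> Iterator[object]:
        """Read each newly found file piece by piece, yielding records as they are parsed."""
        for path in self.scan():
            self._seen.add(path)
            with open(path, "rb") as handle:
                yield from self.split(handle)

    @abstractmethod
    def split(self, handle: BinaryIO) -> Iterator[object]:
        """Specializations define how content is split and what record type results."""


class LineReader(DirectoryScanningReader):
    """Specialization that emits one text record per line."""

    def split(self, handle: BinaryIO) -> Iterator[str]:
        for raw in handle:  # iterating a file reads it one line at a time
            yield raw.decode("utf-8").rstrip("\r\n")


class FixedWidthReader(DirectoryScanningReader):
    """Specialization that emits fixed-length byte records."""

    def __init__(self, directory: str, record_length: int) -> None:
        super().__init__(directory)
        self.record_length = record_length

    def split(self, handle: BinaryIO) -> Iterator[bytes]:
        while True:
            chunk = handle.read(self.record_length)
            if not chunk:
                break
            yield chunk


def demo():
    """Write one small file and read it with both specializations."""
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "a.txt"), "wb") as out:
            out.write(b"one\ntwo\n")
        lines = list(LineReader(directory).emit_records())
        fixed = list(FixedWidthReader(directory, 4).emit_records())
    return lines, fixed
```

Calling demo() reads the same file twice, once as lines and once as 4-byte records, which shows that only the split step differs between specializations. A second call to emit_records() on the same reader yields nothing for files it has already processed, a simplified stand-in for the bookkeeping a real scanner needs.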

The preceding diagram shows some of the available specializations and the properties that the user can configure. It is impossible to support all file ...