July 2017
Intermediate to advanced
796 pages
18h 55m
English
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by moving them from another location within the same filesystem. File names starting with a dot (.) are ignored, so this is an obvious choice for the moved file names in the monitored directory. Using an atomic file rename function call, the filename which starts with . can be now renamed to an actual usable filename so that fileStream can pick it up and let us process the file content:
def fileStream[K: ClassTag, V: ClassTag, F <: NewInputFormat[K, V]: ClassTag] (directory: String): InputDStream[(K, V)]
Read now
Unlock full access