November 2017
Beginner to intermediate
290 pages
7h 34m
English
The first transform in the pipeline reads the works of Shakespeare from a public bucket on Google Cloud Storage:
TextIO.read().from("gs://apache-beam-samples/shakespeare/*")
This produces a PCollection containing one String element for each line of the complete works of Shakespeare. TextIO is Beam's built-in connector for reading lines stored across many text files. The read is split both across files and within a single file, so TextIO handles a large number of files as well as very large individual files.
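To make the idea of splitting within a file more concrete, here is a simplified, JDK-only Java sketch of one common technique. Each reader is given a byte range and emits only the lines that *start* inside that range. It skips the partial line at the front of its range and finishes any line that runs past the end. This is not Beam's actual TextIO implementation. The class and method names (`LineSplitSketch`, `readSplit`, `splitRanges`, `readAll`) are hypothetical, and the sketch assumes in-memory UTF-8 data with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Beam's TextIO code: splits a text read across files
// and within each file so that every line is emitted exactly once.
// Assumes in-memory UTF-8 data with '\n' line endings.
final class LineSplitSketch {

    // Returns the lines that START inside the byte range [start, end).
    // A line that begins before `start` belongs to an earlier split;
    // a line that begins before `end` is read to completion, even past `end`.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            // Skip the tail of a line that started in an earlier split:
            // a line starts at offset 0 or immediately after a '\n'.
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        while (pos < end && pos < data.length) {
            int newline = pos;
            while (newline < data.length && data[newline] != '\n') {
                newline++;
            }
            lines.add(new String(data, pos, newline - pos, StandardCharsets.UTF_8));
            pos = newline + 1; // first byte of the next line
        }
        return lines;
    }

    // Cuts a file of `length` bytes into `numSplits` contiguous byte ranges.
    static List<int[]> splitRanges(int length, int numSplits) {
        List<int[]> ranges = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            int start = (int) ((long) length * i / numSplits);
            int end = (int) ((long) length * (i + 1) / numSplits);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    // Splits across files (outer loop) and within each file (inner loop).
    // In a distributed runner, each split could be handed to a different worker.
    static List<String> readAll(List<byte[]> files, int splitsPerFile) {
        List<String> out = new ArrayList<>();
        for (byte[] file : files) {
            for (int[] range : splitRanges(file.length, splitsPerFile)) {
                out.addAll(readSplit(file, range[0], range[1]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> files = List.of(
            "To be, or not to be\nthat is the question\n".getBytes(StandardCharsets.UTF_8),
            "All the world's a stage\nAnd all the men and women merely players"
                .getBytes(StandardCharsets.UTF_8));
        // The same four lines come back whether each file is read as 1 split or 5.
        System.out.println(readAll(files, 1));
        System.out.println(readAll(files, 5));
    }
}
```

Each line is assigned to exactly one split, based on its starting offset. As a result, the output is the same list of lines no matter how many splits each file is cut into. That property is what allows one very large file to be read in parallel.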