November 2017
Beginner to intermediate
290 pages
7h 34m
English
As the next step toward running on Apex, you can also run your pipeline on a local Apex cluster, which gives a testing scenario slightly closer to production:
mvn compile exec:java \
  -Papex-runner \
  -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Dexec.args="--inputFile=gs://apache-beam-samples/shakespeare/* --output=/tmp/output-apex/ --runner=ApexRunner --embeddedExecution=true"
Again, you should find output files in /tmp/output-apex. The number of files may differ, but their overall contents will be the same. Unless you request particular sharding, it is up to the Beam runner to decide the parallelism of the write step.
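Since the number of output files can vary, comparing individual files between runs is unreliable. One way to check that the overall contents match is to merge all shards and sort them before comparing. The following is a minimal sketch, not part of the Beam tooling: the helper name `merged_output` is hypothetical, and it assumes the output directory contains only the result files.

```shell
# Hypothetical helper: print every shard in a directory as one sorted stream,
# so the result does not depend on how many files the runner wrote.
merged_output() {
  cat "$1"/* | LC_ALL=C sort
}

# Show the first few merged lines of the Apex run.
# merged_output /tmp/output-apex | head

# Compare against another run (bash only; /tmp/output-other is an assumed path).
# diff <(merged_output /tmp/output-apex) <(merged_output /tmp/output-other)
```

If the two merged streams are identical, the runs produced the same results regardless of how each runner sharded the write step.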
Now, we should run this on a real YARN cluster; if you ...