Cascading provides a wide range of built-in operations for use in workflows. For many apps, the Cascading API is more than sufficient. However, you may run into cases where a slightly different transformation is needed. Each of the Cascading operations can be extended by subclassing in Java. Let’s extend the Cascading app from “Example 2: The Ubiquitous Word Count” to show how to customize an operation.
Modifying a conceptual flow diagram is a good way to add new requirements for a Cascading app.
Figure 2-1 shows how this iteration of Word Count can be modified to clean up the token stream.
A new class for this example will go right after the Tokenize operation so that it can scrub each tuple.
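To make the placement concrete, here is a minimal plain-Java sketch of a tokenize step followed by a scrub step. It models only the data flow and does not use the Cascading API. The class and method names are hypothetical, and the scrub rule (lowercase each token, strip leading and trailing characters that are not letters or digits, and drop tokens that end up empty) is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java model of a Tokenize step followed by a scrub step.
// It mirrors the data flow only; it does not use the Cascading API.
public class ScrubSketch {

    // Assumed tokenizer: split a line of text on runs of whitespace.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return tokens; // "".split(...) would otherwise yield one empty token
        }
        for (String t : trimmed.split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }

    // Assumed scrub rule: lowercase the token and strip leading/trailing
    // characters that are neither letters nor digits.
    static String scrubText(String token) {
        return token.toLowerCase(Locale.ROOT)
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
    }

    // Scrub every token and drop the ones that end up empty, so each
    // input token yields zero or one output token.
    static List<String> scrubTokens(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String s = scrubText(t);
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The  rain, in Spain -- ");
        System.out.println(tokens);              // [The, rain,, in, Spain, --]
        System.out.println(scrubTokens(tokens)); // [the, rain, in, spain]
    }
}
```

Note that the scrub step may emit nothing for an input token (the "--" above), which is why it sits after tokenization rather than being folded into the counting step.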
In terms of Cascading patterns, this operation needs to be used in an Each operator, so we must implement it as a
Starting from the source code directory that you cloned in Git, change into the part3 subdirectory.
We’ll define a new class called ScrubFunction as our custom operation, which subclasses from while implementing the
Next, we need to define a constructor, which specifies how this function consumes from the tuple stream:
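As a rough illustration of what such a constructor declares, the plain-Java model below records how many argument fields the operation consumes from each input tuple and which fields it emits. It is a hypothetical stand-in, not the Cascading API; in Cascading 2.x this information is typically handed to the superclass constructor as an argument count plus a Fields declaration. The two-argument shape and the field names doc_id and token are assumptions for this sketch, as is the trim-and-lowercase scrub rule.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical plain-Java model of what a custom operation's constructor
// declares. This is NOT the Cascading API.
public class ScrubFunctionModel {

    // Number of argument fields each input tuple must supply.
    private final int numArgs;
    // Names of the fields this operation emits.
    private final List<String> fieldDeclaration;

    public ScrubFunctionModel(List<String> fieldDeclaration) {
        // Assumption: the function consumes two arguments, (doc_id, token).
        this.numArgs = 2;
        this.fieldDeclaration = List.copyOf(fieldDeclaration);
    }

    public int getNumArgs() {
        return numArgs;
    }

    public List<String> getFieldDeclaration() {
        return fieldDeclaration;
    }

    // Consume one input tuple and emit either a scrubbed tuple or nothing.
    public Optional<List<String>> operate(List<String> arguments) {
        if (arguments.size() != numArgs) {
            throw new IllegalArgumentException(
                "expected " + numArgs + " arguments, got " + arguments.size());
        }
        String docId = arguments.get(0);
        // Assumed scrub rule for illustration: trim and lowercase.
        String token = arguments.get(1).trim().toLowerCase(Locale.ROOT);
        if (token.isEmpty()) {
            return Optional.empty(); // drop tokens that scrub down to nothing
        }
        return Optional.of(List.of(docId, token));
    }

    public static void main(String[] args) {
        ScrubFunctionModel scrub = new ScrubFunctionModel(List.of("doc_id", "token"));
        System.out.println(scrub.getNumArgs());                        // 2
        System.out.println(scrub.operate(List.of("doc01", " Rain "))); // Optional[[doc01, rain]]
        System.out.println(scrub.operate(List.of("doc01", "   ")));    // Optional.empty
    }
}
```

Fixing the argument count in the constructor means a tuple of the wrong width is rejected as soon as it reaches the operation, rather than failing somewhere downstream.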