Chapter 2. Extending Pipe Assemblies
Example 3: Customized Operations
Cascading provides a wide range of built-in operations to perform on workflows. For many apps, the Cascading API is more than sufficient. However, you may run into cases where a slightly different transformation is needed. Each of the Cascading operations can be extended by subclassing in Java. Let’s extend the Cascading app from Example 2: The Ubiquitous Word Count to show how to customize an operation.
Modifying a conceptual flow diagram is a good way to add new requirements for a Cascading app. Figure 2-1 shows how this iteration of Word Count can be modified to clean up the token stream. A new class for this example will go right after the Tokenize operation so that it can scrub each tuple. In terms of Cascading patterns, this operation needs to be used in an Each operator, so we must implement it as a Function.
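To make the idea of scrubbing each tuple concrete, here is a minimal plain-Java sketch of the kind of per-token cleanup such an operation might perform. It deliberately does not use the Cascading API. The class name TokenScrubSketch, the method names, and the cleanup rule itself (trim whitespace, lowercase, drop empty tokens) are assumptions made for illustration, not the implementation used in this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for the scrubbing step; it does not use the Cascading API.
final class TokenScrubSketch {

    // Hypothetical cleanup rule: trim surrounding whitespace and lowercase the token.
    static String scrubText(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    // Apply the rule to every token, dropping any token that ends up empty.
    static List<String> scrubAll(List<String> tokens) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            String scrubbed = scrubText(token);
            if (!scrubbed.isEmpty()) {
                cleaned.add(scrubbed);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Tokens as they might arrive from a tokenizer, with stray case and whitespace.
        List<String> raw = Arrays.asList(" Rain", "SHADOW ", "  ", "rain");
        System.out.println(scrubAll(raw));
    }
}
```

In a Cascading app, logic like scrubText would sit inside the custom Function, which emits one cleaned tuple for each input tuple it accepts. Keeping the rule in a small static method also makes it easy to unit test apart from the workflow.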
Figure 2-1. Conceptual flow diagram for Example 3: Customized Operations
Starting from the source code directory that you cloned with Git, change into the part3 subdirectory.
We’ll define a new class called ScrubFunction as our custom operation, which subclasses BaseOperation while implementing the Function interface:
public class ScrubFunction extends BaseOperation implements Function {
  ...
}
Next, we need to define a constructor, which specifies how this function consumes from the tuple stream:
public ...