Chapter 9. Writing Evaluation and Filter Functions

It is time to turn our attention to how you can extend Pig. So far we have focused on the operators and functions Pig provides. But Pig also makes it easy for you to add your own processing logic via user-defined functions (UDFs). These are written in Java and scripting languages. This chapter will walk through how you can build evaluation functions, or UDFs that operate on single elements of data or collections of data. It will also cover how to write filter functions, which are UDFs that can be used as part of filter statements.

UDFs are powerful tools, and thus the interfaces are somewhat complex. In designing Pig, a central goal was to make easy things easy and hard things possible. So, the simplest UDFs can be implemented in a single method, but you will have to implement a few more methods to take advantage of more advanced features. We will cover both cases in this chapter.

Throughout this chapter we will use several running examples of UDFs. Some of these are built-in Pig UDFs, which can be found in your Pig distribution at src/org/apache/pig/builtin/. The others can be found on GitHub with the other example UDFs, in the directory udfs.

Writing an Evaluation Function in Java

Pig and Hadoop are implemented in Java, and so it is natural to implement UDFs in Java. This allows UDFs access to the Hadoop APIs and to many of Pig’s facilities.

Before diving into the details, it is worth considering names. Pig locates a UDF by looking ...

Get Programming Pig, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.