Chapter 11. Writing Drill User-Defined Functions
In the previous chapters, you learned about Drill’s powerful analytic capabilities. Even so, you will encounter situations in which you want to perform a transformation on some data and Drill does not have the capability readily at hand. In those cases, you can extend Drill by writing your own user-defined functions (UDFs).
Drill supports two types of UDFs: simple and aggregate. A simple UDF accepts a column or expression as input and returns a single result for each row. The result can be a complex type such as an array or map. An aggregate UDF instead accepts all the values for a group, as defined in a GROUP BY or similar clause, and returns a single result. The SUM() function is a good example: it accepts a column or expression, adds up all the values, and returns a single result. When you use an aggregate UDF with a GROUP BY clause, it performs the aggregation separately on each group of rows.
Use Case: Finding and Filtering Valid Credit Card Numbers
Suppose you are conducting security research and you find a large list of what appear to be credit card numbers. You want to determine whether these are valid credit card numbers and, if so, notify the appropriate banks.
A credit card number is not simply a random sequence of digits. These numbers follow a specific structure and can be validated by an algorithm known as the Luhn algorithm. Although Drill ...
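To make the Luhn check concrete, here is a minimal sketch in plain Java. It is not a Drill UDF and does not use any Drill API; the class name LuhnCheck and the method isValid are hypothetical names chosen for illustration. The sketch assumes the input contains only digits, with no spaces or dashes.

```java
// Hypothetical helper for illustration only; not part of Drill.
final class LuhnCheck {
    private LuhnCheck() {}

    // Returns true if the digit string passes the Luhn checksum.
    static boolean isValid(String number) {
        if (number == null || number.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        // Walk the digits from right to left, doubling every second one.
        for (int i = number.length() - 1; i >= 0; i--) {
            char c = number.charAt(i);
            if (c < '0' || c > '9') {
                return false; // assumption: reject any non-digit character
            }
            int digit = c - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9; // same as adding the two digits of the product
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        // The number is valid when the total is a multiple of 10.
        return sum % 10 == 0;
    }
}
```

For example, LuhnCheck.isValid("4111111111111111") returns true for this widely used test number, whereas changing the last digit to 2 makes the check fail.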