Appendix A. Built-in User Defined Functions and PiggyBank

This appendix covers UDFs that come as part of the Pig distribution, including built-in UDFs and user-contributed UDFs in PiggyBank.

Built-in UDFs

Pig comes prepackaged with many UDFs that can be used directly in a Pig Latin script, with no register or define statement required. These include load, store, evaluation, and filter functions.
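
As a minimal sketch of what this looks like in practice, the script below calls a built-in load function (PigStorage), two built-in evaluation functions (UPPER and COUNT), and a built-in store function (PigStorage again) with no register or define statement. The input file name, output directory, field names, and schema are hypothetical and chosen only for illustration.

```pig
-- Hypothetical input: a comma-separated file 'input.csv' with two fields.
-- PigStorage, UPPER, and COUNT ship with Pig, so nothing is registered or defined.
records = load 'input.csv' using PigStorage(',') as (name:chararray, score:int);

-- Built-in evaluation function applied per record.
shouted = foreach records generate UPPER(name) as name, score;

-- Built-in aggregate evaluation function applied per group.
grouped = group shouted by name;
counts  = foreach grouped generate group as name, COUNT(shouted) as n;

-- Built-in store function; 'output' is a hypothetical output directory.
store counts into 'output' using PigStorage('\t');
```

A UDF from PiggyBank or your own JAR, by contrast, would need a register statement before it could be called.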

Built-in Load and Store Functions

Pig’s built-in load functions are listed in Table A-1; Table A-2 lists the store functions.

Table A-1. Load functions
| Function | Location string indicates | Constructor arguments | Description |
|---|---|---|---|
| AccumuloStorage | Accumulo table | The first argument is a string describing the column family and column to Pig field mapping. The second is an option string (optional). | Load data from Accumulo. |
| AvroStorage | HDFS file (Avro files) | The first argument is the input schema or record name (optional). The second is an option string (optional). | Load data from Avro files on HDFS. |
| HBaseStorage | HBase table | The first argument is a string describing the column family and column to Pig field mapping. The second is an option string (optional). | Load data from HBase (see “HBase”). |
| JsonLoader | HDFS file (JSON files) | The first argument is the input schema (optional). | Load data from JSON files on HDFS. |
| OrcStorage | HDFS file (ORC files) | None. | Load data from ORC files on HDFS. |
| ParquetLoader | HDFS file (Parquet files) | The first argument is a subset schema to load (optional). | Load data from Parquet files on HDFS. |
| PigStorage | HDFS file | The first argument is a field separator ... | |
