Chapter 4. Pig’s Data Model
Before we take a look at the operators that Pig Latin provides, we first need to understand Pig’s data model. This includes Pig’s data types, how it handles concepts such as missing data, and how you can describe your data to Pig.
Types
Pig’s data types can be divided into two categories: scalar types, which contain a single value, and complex types, which contain other types.
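To make the distinction concrete, here is a minimal Java sketch. It uses plain java.lang values for scalars and java.util collections as stand-ins for complex types. The class name, the isScalar helper, and the choice of List and Map are assumptions made for illustration only; they are not Pig's own container classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScalarVsComplex {
    // Hypothetical helper: treats anything that is not a List or Map
    // (our stand-ins for complex types) as a scalar.
    static boolean isScalar(Object value) {
        return !(value instanceof List) && !(value instanceof Map);
    }

    public static void main(String[] args) {
        // Scalars: each object holds exactly one value.
        Object count = Integer.valueOf(42);
        Object total = Long.valueOf(5000000000L);

        // Complex stand-ins: each object contains other values.
        Object pair = Arrays.asList("alice", 42);
        Map<String, Object> lookup = new HashMap<>();
        lookup.put("age", 42);

        System.out.println(isScalar(count));  // prints true
        System.out.println(isScalar(total));  // prints true
        System.out.println(isScalar(pair));   // prints false
        System.out.println(isScalar(lookup)); // prints false
    }
}
```

The point of the sketch is only the shape of the data: a scalar is one value, while a complex value is a container whose elements are themselves typed values.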
Scalar Types
Pig’s scalar types are simple types that appear in most programming languages. With the exception of bytearray, they are all represented in Pig interfaces by java.lang classes, making them easy to work with in UDFs:
- int
An integer. Ints are represented in interfaces by java.lang.Integer. They store a four-byte signed integer. Constant integers are expressed as integer numbers, for example, 42.
- long
A long integer. Longs are represented in interfaces by java.lang.Long. They store an eight-byte signed integer. Constant longs are expressed as integer numbers with an L appended, for example, 5000000000L.
- float
A floating-point number. Floats are represented in interfaces by java.lang.Float and use four bytes to store their value. You can find the range of values representable by Java’s Float type at http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.3. Note that because this is a floating-point number, in some calculations it will lose precision. For calculations that require no loss of precision, you should use an int or long instead. Constant floats are ...