Index

A note on the digital index

A link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking on any link will take you directly to the place in the text where the marker appears.

Symbols

!= inequality operator, Filter
# dereference operator for maps, Map
$ macro parameter, Macros
$ parameter substitution target, Parameter Substitution
% modulo operator, Expressions in foreach
() tuple parentheses, Dump
* all fields, Expressions in foreach
* multiplication operator, Expressions in foreach
* zero or more characters glob, Load
+ addition operator, Expressions in foreach
- subtraction operator, Expressions in foreach
- unary negative operator, Expressions in foreach
-- single line comment operator, Comments
.. range of fields, Expressions in foreach
/ division operator, Expressions in foreach
/* */ multiline comment operator, Comments
< inequality operator, Filter
<= inequality operator, Filter
== equality operator, Filter
> inequality operator, Filter
>= inequality operator, Filter
? any character glob, Load
? bincond operator, Expressions in foreach
[] map brackets, Dump
\ escape character, Load
{} bag braces, Dump
{} macro operator, Macros

A

ABS function, Built-in math UDFs
accumulator interface, Accumulator Interface
ACID, NoSQL Databases
ACOS function, Built-in math UDFs
AddForEach optimization, Debugging Tips
algebraic calculations, Group, Algebraic Interface
algebraic interface, Algebraic Interface–Algebraic Interface
aliases, Preliminary Matters, define and UDFs
Amazon Elastic MapReduce (EMR), Pig’s History, Running Pig in the Cloud
Apache HBase, HBase–HBase
Apache HCatalog, Metadata in Hadoop
Apache Hive, Pig and Hive
Apache open source, What Is Pig?, Downloading the Pig Package from Apache
arithmetic operators, Expressions in foreach
as clause (load function), Load, Naming fields in foreach
as clause (stream command), stream
ASIN function, Built-in math UDFs
ATAN function, Built-in math UDFs
AVG functions, Built-in aggregate UDFs

B

bad records, handling, Bad Record Handling
bag data type, Bag, Schemas, Interacting with Pig values, Memory Issues in Eval Funcs, Python UDFs
bag DIFF function, Built-in complex type UDFs
bag projection, Expressions in foreach
bag TOBAG function, Built-in complex type UDFs
bag TOP function, Built-in complex type UDFs
BagFactory class, Interacting with Pig values
baseball examples, Code Examples in This Book, Schemas, Expressions in foreach, Registering Python UDFs, flatten, Nonlinear Data Flows
base on balls and IBBs, Schemas
batting average, Expressions in foreach
data set, Code Examples in This Book, flatten
players by position and team, Nonlinear Data Flows
slugging percentage, Registering Python UDFs
behavior prediction models, What Is Pig Useful For?
binary condition operator, Expressions in foreach
bind call, Bind
bindings, multiple, Binding Multiple Sets of Variables, Running Multiple Bindings
boolean IsEmpty functions, Built-in filter functions
Boolean operators, Filter
bottlenecks, Making Pig Fly
built-in aggregate UDFs, Built-in aggregate UDFs–Built-in aggregate UDFs
built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs–Built-in chararray and bytearray UDFs
built-in complex type UDFs, Built-in complex type UDFs–Built-in complex type UDFs
built-in filter functions, Built-in filter functions
built-in load and store functions, Built-in Load and Store Functions
built-in math UDFs, Built-in math UDFs
bytearray CONCAT functions, Built-in chararray and bytearray UDFs
bytearray type, Scalar Types, Schemas, Choose the Right Data Type, Python UDFs, Casting bytearrays

C

cache clause (define statement), stream
caching option (HBase), HBase
Cascading, Cascading
case sensitivity, Case Sensitivity, User Defined Functions, Writing an Evaluation Function in Java
Pig Latin, Case Sensitivity
UDF names, User Defined Functions, Writing an Evaluation Function in Java
Cassandra, Apache, Cassandra
Cassandra: The Definitive Guide (Hewitt), Cassandra
caster option (HBase), HBase
casts, Casts–Casts, Getting the casting functions, Casting bytearrays
cat command, HDFS Commands in Grunt, Order by
CBRT function, Built-in math UDFs
CEIL function, Built-in math UDFs
chararray functions, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs
CONCAT, Built-in chararray and bytearray UDFs
LCFIRST, Built-in chararray and bytearray UDFs
LOWER, Built-in chararray and bytearray UDFs
MAX, Built-in aggregate UDFs
MIN, Built-in aggregate UDFs
REGEX_EXTRACT, Built-in chararray and bytearray UDFs
REGEX_EXTRACT_ALL, Built-in chararray and bytearray UDFs
REPLACE, Built-in chararray and bytearray UDFs
STRSPLIT, Built-in chararray and bytearray UDFs
SUBSTRING, Built-in chararray and bytearray UDFs
TOKENIZE, Built-in chararray and bytearray UDFs
TRIM, Built-in chararray and bytearray UDFs
UCFIRST, Built-in chararray and bytearray UDFs
UPPER, Built-in chararray and bytearray UDFs
chararray type, Scalar Types, Schemas, Filter, Python UDFs
checking syntax, Syntax Highlighting and Checking
cloud computing, Running Pig in the Cloud
Cloudera, downloading Pig from, Downloading Pig from Cloudera
cluster, Running Pig on Your Hadoop Cluster, Using Compression in Intermediate Results
running Pig on your, Running Pig on Your Hadoop Cluster
setting up LZO on your, Using Compression in Intermediate Results
cogroup operator, Parallel, cogroup, Nonlinear Data Flows, Setting the Partitioner, explain, explain, Filter Early and Often
columnMapKeyPrune optimization, Debugging Tips
combiner phase, Group, Algebraic Interface, Combiner Phase
combiner, turning off, Debugging Tips
command tab completion, Grunt
command-line options, Command-Line and Configuration Options
comment operators (Pig Latin), Comments
compile method, Compile
complex data types, Complex Types–Nulls, Evaluation Function Basics, Input and Output Schemas, Built-in Evaluation and Filter Functions, Built-in complex type UDFs
compression, using in intermediate results, Using Compression in Intermediate Results
CONCAT functions, Built-in chararray and bytearray UDFs
constructors, Constructors and Passing Data from Frontend to Backend–UDFContext
controlling execution, Controlling Execution
copyFromLocal command, HDFS Commands in Grunt
copyToLocal command, HDFS Commands in Grunt
COR function, Built-in complex type UDFs
corrupted data, handling, Bad Record Handling
COS function, Built-in math UDFs
COSH function, Built-in math UDFs
COUNT function, Evaluation Function Basics, Algebraic Interface, Algebraic Interface, Accumulator Interface, Built-in aggregate UDFs
COUNT_STAR function, Built-in aggregate UDFs
COV function, Built-in complex type UDFs
cross operator, Parallel, cross–cross, Nonlinear Data Flows, Setting the Partitioner, Filter Early and Often

D

-D passing properties, Command-Line and Configuration Options
DAG (directed acyclic graph), Pig Latin, a Parallel Dataflow Language, Nonlinear Data Flows
data, What Is Pig Useful For?, Types–Nulls, Debugging Tips, Choose the Right Data Type, Data Layout Optimization, Constructors and Passing Data from Frontend to Backend, Writing Data–Writing records, Pig and Hive, Metadata in Hadoop
layout optimization, Data Layout Optimization
passing, Constructors and Passing Data from Frontend to Backend
pipelines, What Is Pig Useful For?, Debugging Tips, Pig and Hive, Metadata in Hadoop
types, Types–Nulls, Choose the Right Data Type
writing, Writing Data–Writing records
data sets, example, Code Examples in This Book
dataflow languages, Pig Latin, a Parallel Dataflow Language, Embedding Pig Latin in Python
DataNodes, Loading the distributed cache, Distributed Cache, Hadoop Distributed File System
debugging, Debugging Tips
%declare, Parameter Substitution
declaring, Schemas, Nonlinear Data Flows, Macros, Choose the Right Data Type, Input and Output Schemas, Constructors and Passing Data from Frontend to Backend
a filename, Constructors and Passing Data from Frontend to Backend
a macro, Macros
a schema, Schemas, Input and Output Schemas
a type, Nonlinear Data Flows, Choose the Right Data Type
%default, Parameter Substitution
define statement, Registering UDFs, define and UDFs, stream, Macros, Constructors and Passing Data from Frontend to Backend
define utility method, Utility Methods
describe operator, describe
development tools, Development Tools–Debugging Tips
DeWitt, David J., Joining skewed data
DIFF function, Built-in complex type UDFs
directed acyclic graph (DAG), Pig Latin, a Parallel Dataflow Language, Nonlinear Data Flows
distinct operator, Distinct, Parallel, Nested foreach, Nested foreach, Setting the Partitioner, Filter Early and Often
distributed cache, Joining small to large data, stream, Loading the distributed cache, Distributed Cache
distributive calculations, Group, Algebraic Interface
double functions, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Miscellaneous built-in UDF
ABS, Built-in math UDFs
ACOS, Built-in math UDFs
ASIN, Built-in math UDFs
ATAN, Built-in math UDFs
AVG, Built-in aggregate UDFs
CBRT, Built-in math UDFs
CEIL, Built-in math UDFs
COS, Built-in math UDFs
COSH, Built-in math UDFs
EXP, Built-in math UDFs
FLOOR, Built-in math UDFs
LOG, Built-in math UDFs
LOG10, Built-in math UDFs
MAX, Built-in aggregate UDFs, Built-in aggregate UDFs
MIN, Built-in aggregate UDFs
RANDOM, Miscellaneous built-in UDF
SIN, Built-in math UDFs
SINH, Built-in math UDFs
SQRT, Built-in math UDFs
SUM, Built-in aggregate UDFs
TAN, Built-in math UDFs
TANH, Built-in math UDFs
double type, Scalar Types, Schemas, Python UDFs
-dryrun command line option, Macros, Syntax Highlighting and Checking
dump statement, Dump

E

Eclipse syntax highlighting, Syntax Highlighting and Checking
Elastic MapReduce (EMR), Running Pig in the Cloud
Emacs syntax highlighting, Syntax Highlighting and Checking
embedding Pig Latin in Python, Embedding Pig Latin in Python–Utility Methods
EMR (Elastic MapReduce), Amazon, Running Pig in the Cloud
equality operators, Filter
errors, How Pig differs from MapReduce, Entering Pig Latin Scripts in Grunt, Schemas, Schemas, Order by, union, explain, Run, Input and Output Schemas, Error Handling and Progress Reporting, Reading records, Failure Cleanup, Handling Failure
checking in Grunt, Entering Pig Latin Scripts in Grunt
debugging with explain, explain
in evaluation functions, Error Handling and Progress Reporting
failure cleanup, Failure Cleanup, Handling Failure
getErrorMessage function, Run
parse, Reading records
in Pig Latin scripts, How Pig differs from MapReduce
runtime exceptions, Input and Output Schemas
schema, Schemas, Schemas, union
sorting by maps, tuples, bags, Order by
escape characters (Unix shell command line), Load
ETL (extract transform load) data pipelines, What Is Pig Useful For?
evaluation functions, UDFs in foreach, Writing an Evaluation Function in Java, Where Your UDF Will Run, Evaluation Function Basics, Input and Output Schemas–Input and Output Schemas, Error Handling and Progress Reporting, Memory Issues in Eval Funcs, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
basics, UDFs in foreach, Evaluation Function Basics
built-in, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
error handling and progress reporting, Error Handling and Progress Reporting
input and output schemas, Input and Output Schemas–Input and Output Schemas
memory issues in, Memory Issues in Eval Funcs
where your UDF will run, Where Your UDF Will Run
writing in Java, Writing an Evaluation Function in Java
examples, Code Examples in This Book, MapReduce’s hello world, MapReduce’s hello world, MapReduce’s hello world, MapReduce’s hello world, Comparing query and dataflow languages, How Pig differs from MapReduce, Running Pig Locally on Your Machine, Running Pig on Your Hadoop Cluster, Expressions in foreach, Joining small to large data, Joining skewed data, cross, cross, stream–mapreduce, stream–mapreduce, Embedding Pig Latin in Python–Utility Methods, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions, Writing Load and Store Functions, Store Functions–Store Functions and UDFContext, Storing Metadata, HBase
(see also baseball examples)
(see also NYSE examples)
blacklisting URLs, stream–mapreduce
calculating page rank from web crawl, Code Examples in This Book, stream–mapreduce, Embedding Pig Latin in Python–Utility Methods
determining metropolitan area, cross
finding the top five URLs, How Pig differs from MapReduce
group then join in SQL and Pig Latin, Comparing query and dataflow languages
HBase table, HBase
“hello world”, MapReduce’s hello world
JsonLoader, Writing Load and Store Functions
JsonStorage, Writing Load and Store Functions
MetroResolver, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache
running Pig in local mode, Running Pig Locally on Your Machine
running Pig on your cluster, Running Pig on Your Hadoop Cluster
store function, Store Functions–Store Functions and UDFContext, Storing Metadata
user distribution by city, Joining skewed data, cross
word count, MapReduce’s hello world
ZIP code lookup, Joining small to large data
exec command, Controlling Pig from Grunt
-execute (-e) command-line option, Command-Line and Configuration Options
EXP function, Built-in math UDFs
explain operator, explain–explain
explicit splits, Nonlinear Data Flows

F

failure cleanup, Failure Cleanup, Handling Failure
fields, Preliminary Matters
FileOutputFormat, Setting the output location
filesystem operations, Utility Methods
filter functions, Filter, define and UDFs, Writing Evaluation and Filter Functions, Writing Filter Functions, Built-in filter functions
filter operator, How Pig differs from MapReduce, Filter–Filter, Nested foreach, Writing Evaluation and Filter Functions, Writing Filter Functions, Using partitions, Metadata in Hadoop
filters, Debugging Tips, Debugging Tips, Debugging Tips, Filter Early and Often
MergeFilter optimization, Debugging Tips
pushing, Filter Early and Often
PushUpFilter optimization, Debugging Tips
SplitFilter optimization, Debugging Tips
Finding the Top Five URLs example, How Pig differs from MapReduce
flatten statement, flatten–flatten
float functions, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs
AVG, Built-in aggregate UDFs
MAX, Built-in aggregate UDFs
MIN, Built-in aggregate UDFs
float type, Scalar Types, Schemas, Python UDFs
FLOOR function, Built-in math UDFs
foreach operator, foreach, UDFs in foreach, Advanced Features of foreach–Nested foreach, explain, Filter Early and Often
fragment-replicate join, Joining small to large data
frontend planning functions, Frontend Planning Functions–Passing Information from the Frontend to the Backend, Store Function Frontend Planning–Store Functions and UDFContext
frontend/backend invocation, Constructors and Passing Data from Frontend to Backend–UDFContext
fs keyword, HDFS Commands in Grunt
fuzzy joins, cross

G

gateway machine, Running Pig on Your Hadoop Cluster
Gaussian distribution, Group
getAllErrorMessages method, Run
getBytesWritten method, Run
getDuration method, Run
getErrorMessage method, Run
getNumberBytes method, Run
getNumberJobs method, Run
getNumberRecords method, Run
getOutputFormat method, Determining OutputFormat
getOutputLocations, getOutputNames methods, Run
getRecordWritten method, Run
getReturnCode method, Run
getUDFContext method, UDFContext
Global Rearrange operator, explain
globs, Load
GNU Public License (GPL) for LZO, Using Compression in Intermediate Results
group by clause, Group–Group
group by operator, How Pig differs from MapReduce
group operator, Group–Group, Parallel, Nonlinear Data Flows, Setting the Partitioner, Filter Early and Often, Evaluation Function Basics
“Group then join in SQL and Pig Latin” example, Comparing query and dataflow languages
Grunt, Grunt, Entering Pig Latin Scripts in Grunt, HDFS Commands in Grunt, Controlling Pig from Grunt, explain
controlling Pig from, Controlling Pig from Grunt
entering Pig Latin scripts in, Entering Pig Latin Scripts in Grunt
explain Pig Latin script in, explain
HDFS commands in, HDFS Commands in Grunt
gt option (HBase), HBase
gte option (HBase), HBase
gzip compression type, Using Compression in Intermediate Results

H

-h properties command-line option, Command-Line and Configuration Options
Hadoop, Pig on Hadoop, Running Pig on Your Hadoop Cluster, Command-Line and Configuration Options, HDFS Commands in Grunt, HDFS Commands in Grunt, Tune Pig and Hadoop for Your Job, Using Compression in Intermediate Results, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions–Determining the location, Metadata in Hadoop, Overview of Hadoop–Hadoop Distributed File System, Hadoop Distributed File System
fs shell commands, HDFS Commands in Grunt
HDFS (Hadoop Distributed File System), Pig on Hadoop, HDFS Commands in Grunt, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions–Determining the location, Hadoop Distributed File System
Java properties used, Command-Line and Configuration Options
metadata in, Metadata in Hadoop
overview, Overview of Hadoop–Hadoop Distributed File System
running Pig on your cluster, Running Pig on Your Hadoop Cluster
tarball, Using Compression in Intermediate Results
tuning, Tune Pig and Hadoop for Your Job
hadoop-site.xml file, Running Pig on Your Hadoop Cluster
Hadoop: The Definitive Guide (White), Tune Pig and Hadoop for Your Job, Overview of Hadoop
handling failure, Handling Failure
hashCode function, Shuffle Phase
HashPartitioner, Shuffle Phase
HBase, Apache, HBase–HBase
HBaseStorage function, Getting the casting functions, HBase–HBase, Built-in Load and Store Functions, Built-in Load and Store Functions
HCatalog, Apache, Metadata in Hadoop
HCatLoader, Using partitions, Pushing down projections
heap size, Joining skewed data, Tune Pig and Hadoop for Your Job, Memory Issues in Eval Funcs
hello world example, MapReduce’s hello world
-help (-h) command-line option, Command-Line and Configuration Options
Hewitt, Eben, Cassandra
highlighting syntax, Syntax Highlighting and Checking
Hive, Apache, Pig and Hive

J

Jackson JSON library, Writing Load and Store Functions
JAR files, Downloading Pig Artifacts from Maven, Registering UDFs, Registering Python UDFs, Testing Your Scripts with PigUnit, Utility Methods, Python UDFs, Writing Load and Store Functions, Piggybank
downloading, Downloading Pig Artifacts from Maven
Jackson, Writing Load and Store Functions
Jython, Registering Python UDFs
Piggybank, Registering UDFs, Piggybank
pigunit, Testing Your Scripts with PigUnit
registering, Utility Methods, Python UDFs
Java, Pig Philosophy, Downloading the Pig Package from Apache, Downloading the Pig Package from Apache, Command-Line and Configuration Options, Scalar Types–Nulls, Bag, Filter, User Defined Functions, define and UDFs, Calling Static Java Functions, Calling Static Java Functions, Joining small to large data, mapreduce, set, Setting the Partitioner, Testing Your Scripts with PigUnit, Embedding Pig Latin in Python, Writing an Evaluation Function in Java–Memory Issues in Eval Funcs, Interacting with Pig values, Input and Output Schemas, Input and Output Schemas, Input and Output Schemas, Input and Output Schemas, Loading the distributed cache, Overloading UDFs, Python UDFs, Casting bytearrays, Store Functions, Cascading, HBase, Built-in Evaluation and Filter Functions, Map Phase
and Cascading data flows, Cascading
casting and HBase, HBase
compared with Python, Python UDFs
data types used by Pig, Scalar Types–Nulls, Input and Output Schemas
embedding interface, Embedding Pig Latin in Python
evaluation functions in, Writing an Evaluation Function in Java–Memory Issues in Eval Funcs, Built-in Evaluation and Filter Functions
integration with Pig, Pig Philosophy, Downloading the Pig Package from Apache
Iterable, Interacting with Pig values
JUnit, Testing Your Scripts with PigUnit
and MapReduce, Map Phase
memory requirements of, Bag, Joining small to large data
multiple inheritance workaround, Casting bytearrays, Store Functions
passing arguments to, mapreduce
properties used by Pig and Hadoop, Command-Line and Configuration Options, set
reflection, Calling Static Java Functions, Input and Output Schemas, Input and Output Schemas
regular expressions, Filter
setting JAVA_HOME, Downloading the Pig Package from Apache
setting the Partitioner, Setting the Partitioner
static functions, Calling Static Java Functions
UDFs and, User Defined Functions, define and UDFs, Input and Output Schemas, Loading the distributed cache, Overloading UDFs
JobTracker, Running Pig on Your Hadoop Cluster, MapReduce Job Status, Error Handling and Progress Reporting, MapReduce
join operator, Parallel
joining small to large data, Joining small to large data, Distributed Cache
joining sorted data, Joining sorted data
joins, Comparing query and dataflow languages, How Pig differs from MapReduce, What Is Pig Useful For?, Join–Join, Join, Join, Parallel, Using Different Join Implementations–cross, Joining small to large data, Joining sorted data, Joining sorted data, Nonlinear Data Flows, Setting the Partitioner, illustrate, Filter Early and Often, Set Up Your Joins Properly, Determining the location
default behavior, Join–Join
and filter pushing, Filter Early and Often
how to update every five minutes, What Is Pig Useful For?
inner, Join, Joining sorted data
input path overwritten, Determining the location
no multiquery for, Nonlinear Data Flows
other implementations, Using Different Join Implementations–cross, Set Up Your Joins Properly
outer, Join, Joining small to large data
parallel clause and, Parallel
partition clause and, Setting the Partitioner
in Pig Latin versus MapReduce, How Pig differs from MapReduce
in Pig Latin versus SQL, Comparing query and dataflow languages
and sample records, illustrate
sort-merge, Joining sorted data
JSON, Schemas, Interacting with Pig values, Writing Load and Store Functions–Loading metadata, Determining OutputFormat–Storing Metadata
JsonLoader example, Interacting with Pig values, Writing Load and Store Functions–Loading metadata
JsonStorage example, Determining OutputFormat–Storing Metadata
JUnit, Testing Your Scripts with PigUnit
Jython, User Defined Functions, Registering Python UDFs, Python UDFs

L

LAST_INDEX_OF function, Built-in chararray and bytearray UDFs
LCFIRST function, Built-in chararray and bytearray UDFs
Le Dem, Julien, Embedding Pig Latin in Python
licensing, What Is Pig?, Using Compression in Intermediate Results
limit operator, Limit, Parallel, Nested foreach
limit option (HBase), HBase
LimitOptimizer optimization, Debugging Tips
linear data flows, Nonlinear Data Flows
load clause (mapreduce statement), mapreduce
load function (PigStorage), Choose the Right Data Type
load functions (Pig), Load Functions–Pushing down projections, Frontend Planning Functions–Passing Information from the Frontend to the Backend, Passing Information from the Frontend to the Backend, Backend Data Reading–Reading records, Additional Load Function Interfaces–Pushing down projections, Loading metadata, Built-in Load and Store Functions
additional interfaces, Additional Load Function Interfaces–Pushing down projections
backend data reading, Backend Data Reading–Reading records
built-in, Built-in Load and Store Functions
frontend planning functions, Frontend Planning Functions–Passing Information from the Frontend to the Backend
loading metadata, Loading metadata
passing info frontend to backend, Passing Information from the Frontend to the Backend
load operator, Load, explain, Filter Early and Often
loadKey option (HBase), HBase
local mode, Running Pig Locally on Your Machine
Local Rearrange operator, explain
LOG function, Built-in math UDFs
LOG10 function, Built-in math UDFs
logical optimizer, Debugging Tips
logical plan, explain, Debugging Tips
LogicalExpressionsSimplifier optimization, Debugging Tips
logs, MapReduce Job Status, Error Handling and Progress Reporting
long AVG function, Built-in aggregate UDFs
long functions, Built-in math UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
COUNT, Built-in aggregate UDFs
COUNT_STAR, Built-in aggregate UDFs
MAX, Built-in aggregate UDFs
MIN, Built-in aggregate UDFs
ROUND, Built-in math UDFs
SIZE, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
SUM, Built-in aggregate UDFs
long type, Scalar Types, Schemas, Python UDFs
lookup table, constructing, Constructors and Passing Data from Frontend to Backend
LOWER function, Built-in chararray and bytearray UDFs
lt option (HBase), HBase
lte option (HBase), HBase
LZO compression type, Using Compression in Intermediate Results

M

macros, Macros
map data type, Map, Schemas, Python UDFs
map only jobs, Reduce Phase
map parallelism, Parallel
map phase, Pig on Hadoop, Map Phase
map projection operator (#), Expressions in foreach
map TOMAP function, Built-in complex type UDFs
MapReduce, Pig on Hadoop, How Pig differs from MapReduce–How Pig differs from MapReduce, mapreduce, MapReduce Job Status, Tune Pig and Hadoop for Your Job, MapReduce
how Pig differs from, How Pig differs from MapReduce–How Pig differs from MapReduce
integrating with Pig, mapreduce
job status, MapReduce Job Status
performance tuning properties, Tune Pig and Hadoop for Your Job
mapreduce operator, mapreduce, Filter Early and Often
“Mary had a Little Lamb” example, MapReduce’s hello world
Maven, downloading Pig from, Downloading Pig Artifacts from Maven
MAX functions, Built-in aggregate UDFs
memory, Bag, Making Pig Fly, Tune Pig and Hadoop for Your Job
buffer size, Tune Pig and Hadoop for Your Job
requirements for Pig data types, Bag
size, Making Pig Fly
merge join, Joining sorted data, Set Up Your Joins Properly
MergeFilter optimization, Debugging Tips
MergeForEach optimization, Debugging Tips
metadata, Loading metadata, Storing Metadata, Metadata in Hadoop
in Hadoop, Metadata in Hadoop
loading, Loading metadata
storing, Storing Metadata
metropolitan name example, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache
MIN functions, Overloading UDFs, Built-in aggregate UDFs
multiple bindings, running, Running Multiple Bindings
multiple joins, Join
multiple keys, grouping on, Group
multiquery, Nonlinear Data Flows, Use Multiquery When Possible
multiway joins, Joining skewed data

N

NameNode, Running Pig on Your Hadoop Cluster, Joining small to large data, Data Layout Optimization, Loading the distributed cache, Distributed Cache, Hadoop Distributed File System
namespaces, Registering Python UDFs
nested foreach, Nested foreach–Nested foreach
noise words, Join
nonlinear data flows, Nonlinear Data Flows–Nonlinear Data Flows
NoSQL databases, NoSQL Databases
null, Nulls, Expressions in foreach, Filter, Join, Error Handling and Progress Reporting
NYSE examples, Code Examples in This Book, Running Pig Locally on Your Machine, Casts, Distinct, Join, Nested foreach, Nested foreach, Nested foreach, Joining sorted data, stream, Macros, UDFContext
average dividends, Running Pig Locally on Your Machine
buy/sell analyzer, UDFContext
daily sorted dividends, Joining sorted data
data set, Code Examples in This Book
dividends increased between two dates, Join
filter out low-dividend stocks, stream
find list of ticker symbols, Distinct
number of unique stock symbols, Nested foreach
stock-price changes on dividend days, Macros
top three dividends, Nested foreach
total trade estimate, Casts
tracking a stock over time, Nested foreach

O

Olston, Christopher, Pig’s History
optimizations, turning off, Debugging Tips, Debugging Tips
optimizing scripts, Making Pig Fly–Bad Record Handling
order by operator, How Pig differs from MapReduce, Order by
order operator, Order by, Order by, Parallel, Nested foreach, Setting the Partitioner
outer joins, Join, Joining small to large data
output clause (define command), stream
output location, Setting the output location
output phase, Output Phase
output schemas, Input and Output Schemas
output size, Making Pig Fly
OutputFormat, Store Functions, Output Phase
overloading, Calling Static Java Functions, Overloading UDFs

P

Package operator, explain
page rank, calculating from web crawl, Embedding Pig Latin in Python–Utility Methods
parallel clause, Parallel
parallel dataflow language, Pig Latin, a Parallel Dataflow Language
parallelism, Select the Right Level of Parallelism, Where Your UDF Will Run, Writing Load and Store Functions
parameter substitution, Parameter Substitution–Parameter Substitution
partition clause, Setting the Partitioner
Partitioner class, Setting the Partitioner, Shuffle Phase
partitions, using, Using partitions
performance tuning properties (MapReduce), Tune Pig and Hadoop for Your Job
philosophy of Pig, Pig Philosophy
physical plan, explain
Pig, Pig Philosophy, Pig’s History, Downloading and Installing Pig–Downloading the Source, Downloading the Pig Package from Apache, Downloading the Pig Package from Apache, Downloading the Source, Downloading the Source, Running Pig–Command-Line and Configuration Options, Casts, Integrating Pig with Legacy Code and MapReduce–mapreduce, Tune Pig and Hadoop for Your Job, Utility Methods, Python UDFs
downloading and installing, Downloading and Installing Pig–Downloading the Source
fs method, Utility Methods
history, Pig’s History
integrating with legacy code and MapReduce, Integrating Pig with Legacy Code and MapReduce–mapreduce
issue-tracking system, Downloading the Source
performance tuning, Tune Pig and Hadoop for Your Job
philosophy, Pig Philosophy
portability, Downloading the Pig Package from Apache
release page, Downloading the Pig Package from Apache
running, Running Pig–Command-Line and Configuration Options
strength of typing, Casts
translation to Python types, Python UDFs
version control page, Downloading the Source
“Pig counts Mary and her lamb” example, MapReduce’s hello world
Pig Latin, What Is Pig?, What Is Pig Useful For?, Preliminary Matters, Preliminary Matters, Case Sensitivity, Comments, Input and Output–Dump, Relational Operations–Parallel, Pig Latin Preprocessor–Including Other Pig Latin Scripts, Developing and Testing Pig Latin Scripts–Testing Your Scripts with PigUnit, Syntax Highlighting and Checking, Embedding Pig Latin in Python–Utility Methods
best use cases for, What Is Pig Useful For?
case sensitivity, Case Sensitivity
comment operators, Comments
developing and testing scripts, Developing and Testing Pig Latin Scripts–Testing Your Scripts with PigUnit
embedding in Python, Embedding Pig Latin in Python–Utility Methods
fields, Preliminary Matters
input and output, Input and Output–Dump
preprocessor, Pig Latin Preprocessor–Including Other Pig Latin Scripts
relational operations, Relational Operations–Parallel
relations, Preliminary Matters
syntax highlighting packages, Syntax Highlighting and Checking
“Pig Latin: A Not-So-Foreign Language for Data Processing” (Olston), Pig’s History
Piggybank, User Defined Functions, Piggybank
PigStats methods, Run
PigStorage function, Store, Getting the casting functions, Built-in Load and Store Functions, Built-in Load and Store Functions
PigUnit, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit
pipelines, data, What Is Pig Useful For?, Debugging Tips, Pig and Hive, Metadata in Hadoop
POSIX, Pig on Hadoop, Hadoop Distributed File System
power law distribution, Group
“Practical Skew Handling in Parallel Joins” (DeWitt et al.), Joining skewed data
prepareToRead, Getting ready to read
prepareToWrite method, Preparing to write
prereduce merge, Combiner Phase
projections, pushing down, Pushing down projections
-propertyFile (-P) command-line option, Command-Line and Configuration Options
PushDownForeachFlatten feature, Debugging Tips
PushUpFilter optimization, Debugging Tips
Pygmalion project, Cassandra
Python, User Defined Functions, Registering Python UDFs, Embedding Pig Latin in Python–Utility Methods, Python UDFs–Python UDFs
  embedding Pig Latin in, Embedding Pig Latin in Python–Utility Methods
  UDFs, User Defined Functions, Registering Python UDFs, Python UDFs–Python UDFs

R

RANDOM functions, Miscellaneous built-in UDF
raw data, What Is Pig Useful For?, Pig and Hive
RDBMS versus Hadoop environments, Comparing query and dataflow languages, Using Different Join Implementations
RecordWriter class, Preparing to write, Output Phase
reduce phase, Pig on Hadoop, Reduce Phase
reducers, How Pig differs from MapReduce, Group, Order by, Joining skewed data, Select the Right Level of Parallelism, Combiner Phase
reflection, Calling Static Java Functions, Input and Output Schemas, Input and Output Schemas
REGEX_EXTRACT function, Built-in chararray and bytearray UDFs
REGEX_EXTRACT_ALL function, Built-in chararray and bytearray UDFs
register command, Registering UDFs
registerJar utility method, Utility Methods
registerUDF utility method, Utility Methods
regular expressions, Filter
relational operations, Relational Operations–Parallel, Advanced Features of foreach–cross
relations, Preliminary Matters
REPLACE function, Built-in chararray and bytearray UDFs
result method, Run
return codes, Return Codes, Run
returns clause (define statement), Macros
rmr command, HDFS Commands in Grunt
ROUND function, Built-in math UDFs
run command, Controlling Pig from Grunt
running multiple bindings, Running Multiple Bindings
“Running Pig in Local Mode” example, Running Pig Locally on Your Machine
“Running Pig On Your Cluster” example, Running Pig on Your Hadoop Cluster
runSingle command, Run
runtime declaration (schemas), Schemas
runtime exceptions, Input and Output Schemas

S

sampling, Sample, illustrate
  illustrate tool, illustrate
  sample operator, Sample
scalar types, Scalar Types
schemas, Schemas–Casts, Input and Output Schemas–Input and Output Schemas, Python UDFs, Loading metadata, Checking the schema
scripts, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit, Making Pig Fly–Bad Record Handling
  optimizing, Making Pig Fly–Bad Record Handling
  testing with PigUnit, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit
self joins, Join
semi-join, cogroup
set command, set
set utility method, Utility Methods
setLocation, Determining the location
setOutputPath utility function, Setting the output location
setStoreLocation function, Setting the output location
setting the Partitioner, Setting the Partitioner
ship clause, stream
shuffle phase, Pig on Hadoop, Shuffle Phase
shuffle size, Making Pig Fly
SIN function, Built-in math UDFs
SINH function, Built-in math UDFs
SIZE functions, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
skew joins, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Tune Pig and Hadoop for Your Job
skew, handling of, How Pig differs from MapReduce, Group, Group, Order by, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Select the Right Level of Parallelism, Tune Pig and Hadoop for Your Job, Algebraic Interface, Combiner Phase
  Hadoop combiner, Group, Algebraic Interface, Combiner Phase
  order by operator, Order by
  skew joins, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Tune Pig and Hadoop for Your Job
sort command, Filter Early and Often
sort-merge join, Joining sorted data
source code, Downloading the Source
speculative execution, Select the Right Level of Parallelism, Handling Failure
spill files, number of, Tune Pig and Hadoop for Your Job
spilling to disk, Memory Issues in Eval Funcs
split operator, Nonlinear Data Flows, Filter Early and Often
SplitCombination optimization, Debugging Tips
SplitFilter optimization, Debugging Tips
SQL compared/contrasted with Pig, Comparing query and dataflow languages–Comparing query and dataflow languages, Tuple, Bag, Filter, Filter, Group, Distinct, Join, Join, Using Different Join Implementations, union, Pig and Hive, Built-in aggregate UDFs
  Apache Hive, Pig and Hive
  constraints on data, Bag
  dataflow and query languages, Comparing query and dataflow languages–Comparing query and dataflow languages
  group operator, Group
  long COUNT, Built-in aggregate UDFs
  noise words, Join
  nulls, Filter, Join
  optimizers, Using Different Join Implementations
  trinary logic, Filter
  tuples, Tuple
  union, union
  use of distinct statement, Distinct
SQL layer (Apache Hive), Pig and Hive
SQRT function, Built-in math UDFs
static Java functions, Calling Static Java Functions
statistics summary, Pig Statistics
stats command, Pig Statistics
stock analyzer example, UDFContext
store clause (mapreduce statement), mapreduce
store functions, Writing Load and Store Functions, Store Functions–Storing Metadata, Built-in Load and Store Functions
  built-in, Built-in Load and Store Functions
  writing, Writing Load and Store Functions, Store Functions–Storing Metadata
store operator, Store, explain, Filter Early and Often
StoreFunc class, Store Functions
storing metadata, Storing Metadata
stream operator, stream, Filter Early and Often
streams, number of, Tune Pig and Hadoop for Your Job
STRSPLIT functions, Built-in chararray and bytearray UDFs
subqueries, Pig alternative to, Comparing query and dataflow languages
SUBSTRING functions, Built-in chararray and bytearray UDFs
SUM functions, Algebraic Interface, Built-in aggregate UDFs, Built-in aggregate UDFs
svn version control, Downloading the Source
syntax highlighting and checking, Syntax Highlighting and Checking
synthetic join, cross

V

variables, binding multiple sets of, Binding Multiple Sets of Variables
-version command-line option, Command-Line and Configuration Options
version control with git, Downloading the Source
version differences in Hadoop, Running Pig on Your Hadoop Cluster, Load
  file locations, Running Pig on Your Hadoop Cluster
  globs, Load
version differences in Pig, Downloading the Pig Package from Apache, Running Pig Locally on Your Machine, Running Pig on Your Hadoop Cluster, Command-Line and Configuration Options, HDFS Commands in Grunt, HDFS Commands in Grunt, Map, Schemas, Schemas, Dump, Expressions in foreach, Parallel, User Defined Functions, User Defined Functions, Registering UDFs, Registering UDFs, Registering Python UDFs, Calling Static Java Functions, flatten, Joining skewed data, Joining sorted data, cross, mapreduce, Setting the Partitioner, Pig Latin Preprocessor, Macros, Including Other Pig Latin Scripts, illustrate, Pig Statistics, Debugging Tips, Testing Your Scripts with PigUnit, Project Early and Often, Data Layout Optimization, Embedding Pig Latin in Python, Writing Evaluation and Filter Functions, Writing Evaluation and Filter Functions, Input and Output Schemas, Loading the distributed cache, UDFContext, Python UDFs, Writing Load and Store Functions, Casting bytearrays, HBase, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  .. field range, Expressions in foreach
  built-in eval and filter functions, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  bytesToMap methods, Casting bytearrays
  column families, HBase
  data layout optimization, Data Layout Optimization
  dependencies inside Python scripts, Registering Python UDFs
  dump output, Dump
  EvalFunc, Loading the distributed cache
  flatten schema bug, flatten
  globs accepted by register, Registering UDFs
  Grunt command sh, HDFS Commands in Grunt
  hadoop fs shell commands, Running Pig on Your Hadoop Cluster, HDFS Commands in Grunt
  Hadoop requirements, Downloading the Pig Package from Apache
  handling of Java properties, Command-Line and Configuration Options
  HDFS paths for register, Registering UDFs
  illustrate, illustrate
  invoker methods, Calling Static Java Functions
  Java eval funcs, Writing Evaluation and Filter Functions
  joins, Joining skewed data, Joining sorted data
  load and store functions, Writing Load and Store Functions
  local mode execution, Running Pig Locally on Your Machine
  logical optimizer and plan, Debugging Tips, Project Early and Often
  macros, Macros
  map declared values, Map
  map schemas, Input and Output Schemas
  mapreduce command, mapreduce
  non-Java UDFs, User Defined Functions
  number of output records in a bag, cross
  parallel level, Parallel
  PigUnit, Testing Your Scripts with PigUnit
  preprocessor actions, Pig Latin Preprocessor, Including Other Pig Latin Scripts
  Python, Embedding Pig Latin in Python, Writing Evaluation and Filter Functions, Python UDFs
  runtime adaption code, Schemas
  setting the Partitioner, Setting the Partitioner
  summary statistics, Pig Statistics
  truncation and null padding, Schemas
  UDFContext class, UDFContext
  UDFs languages, User Defined Functions
  Vim syntax highlighting, Syntax Highlighting and Checking
