A note on the digital index
Each link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking on any link will take you directly to the place in the text where the marker appears.
Symbols
- != inequality operator, Filter
- # dereference operator for maps, Map
- $ macro parameter, Macros
- $ parameter substitution target, Parameter Substitution
- % modulo operator, Expressions in foreach
- () tuple parentheses, Dump
- * all fields, Expressions in foreach
- * multiplication operator, Expressions in foreach
- * zero or more characters glob, Load
- + addition operator, Expressions in foreach
- - subtraction operator, Expressions in foreach
- - unary negative operator, Expressions in foreach
- -- single line comment operator, Comments
- .. range of fields, Expressions in foreach
- / division operator, Expressions in foreach
- /* */ multiline comment operator, Comments
- < inequality operator, Filter
- <= inequality operator, Filter
- == equality operator, Filter
- > inequality operator, Filter
- >= inequality operator, Filter
- ? any character glob, Load
- ? bincond operator, Expressions in foreach
- [] map brackets, Dump
- \ escape character, Load
- {} bag braces, Dump
- {} macro operator, Macros
A
- ABS function, Built-in math UDFs
- accumulator interface, Accumulator Interface
- ACID, NoSQL Databases
- ACOS function, Built-in math UDFs
- AddForEach optimization, Debugging Tips
- algebraic calculations, Group, Algebraic Interface
- algebraic interface, Algebraic Interface–Algebraic Interface
- aliases, Preliminary Matters, define and UDFs
- Amazon Elastic MapReduce (EMR), Pig’s History, Running Pig in the Cloud
- Apache HBase, HBase–HBase
- Apache HCatalog, Metadata in Hadoop
- Apache Hive, Pig and Hive
- Apache open source, What Is Pig?, Downloading the Pig Package from Apache
- arithmetic operators, Expressions in foreach
- as clause (load function), Load, Naming fields in foreach
- as clause (stream command), stream
- ASIN function, Built-in math UDFs
- ATAN function, Built-in math UDFs
- AVG functions, Built-in aggregate UDFs
B
- bad records, handling, Bad Record Handling
- bag data type, Bag, Schemas, Interacting with Pig values, Memory Issues in Eval Funcs, Python UDFs
- bag DIFF function, Built-in complex type UDFs
- bag projection, Expressions in foreach
- bag TOBAG function, Built-in complex type UDFs
- bag TOP function, Built-in complex type UDFs
- BagFactory class, Interacting with Pig values
- baseball examples, Code Examples in This Book, Schemas, Expressions in foreach, Registering Python UDFs, flatten, Nonlinear Data Flows
  - base on balls and IBBs, Schemas
  - batting average, Expressions in foreach
  - data set, Code Examples in This Book, flatten
  - players by position and team, Nonlinear Data Flows
  - slugging percentage, Registering Python UDFs
- behavior prediction models, What Is Pig Useful For?
- binary condition operator, Expressions in foreach
- bind call, Bind
- bindings, multiple, Binding Multiple Sets of Variables, Running Multiple Bindings
- boolean IsEmpty functions, Built-in filter functions
- Boolean operators, Filter
- bottlenecks, Making Pig Fly
- built-in aggregate UDFs, Built-in aggregate UDFs–Built-in aggregate UDFs
- built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs–Built-in chararray and bytearray UDFs
- built-in complex type UDFs, Built-in complex type UDFs–Built-in complex type UDFs
- built-in filter functions, Built-in filter functions
- built-in load and store functions, Built-in Load and Store Functions
- built-in math UDFs, Built-in math UDFs
- bytearray CONCAT functions, Built-in chararray and bytearray UDFs
- bytearray type, Scalar Types, Schemas, Choose the Right Data Type, Python UDFs, Casting bytearrays
C
- cache clause (define statement), stream
- caching option (HBase), HBase
- Cascading, Cascading
- case sensitivity, Case Sensitivity, User Defined Functions, Writing an Evaluation Function in Java
  - Pig Latin, Case Sensitivity
  - UDF names, User Defined Functions, Writing an Evaluation Function in Java
- Cassandra, Apache, Cassandra
- Cassandra: The Definitive Guide (Hewitt), Cassandra
- caster option (HBase), HBase
- casts, Casts–Casts, Getting the casting functions, Casting bytearrays
- cat command, HDFS Commands in Grunt, Order by
- CBRT function, Built-in math UDFs
- CEIL function, Built-in math UDFs
- chararray functions, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs
  - CONCAT, Built-in chararray and bytearray UDFs
  - LCFIRST, Built-in chararray and bytearray UDFs
  - LOWER, Built-in chararray and bytearray UDFs
  - MAX, Built-in aggregate UDFs
  - MIN, Built-in aggregate UDFs
  - REGEX_EXTRACT, Built-in chararray and bytearray UDFs
  - REGEX_EXTRACT_ALL, Built-in chararray and bytearray UDFs
  - REPLACE, Built-in chararray and bytearray UDFs
  - STRSPLIT, Built-in chararray and bytearray UDFs
  - SUBSTRING, Built-in chararray and bytearray UDFs
  - TOKENIZE, Built-in chararray and bytearray UDFs
  - TRIM, Built-in chararray and bytearray UDFs
  - UCFIRST, Built-in chararray and bytearray UDFs
  - UPPER, Built-in chararray and bytearray UDFs
- chararray type, Scalar Types, Schemas, Filter, Python UDFs
- checking syntax, Syntax Highlighting and Checking
- Cloud computing, Running Pig in the Cloud
- Cloudera, downloading Pig from, Downloading Pig from Cloudera
- cluster, Running Pig on Your Hadoop Cluster, Using Compression in Intermediate Results
  - running Pig on your, Running Pig on Your Hadoop Cluster
  - setting up LZO on your, Using Compression in Intermediate Results
- cogroup operator, Parallel, cogroup, Nonlinear Data Flows, Setting the Partitioner, explain, explain, Filter Early and Often
- columnMapKeyPrune optimization, Debugging Tips
- combiner phase, Group, Algebraic Interface, Combiner Phase
- combiner, turning off, Debugging Tips
- command tab completion, Grunt
- command-line options, Command-Line and Configuration Options
- comment operators (Pig Latin), Comments
- compile method, Compile
- complex data types, Complex Types–Nulls, Evaluation Function Basics, Input and Output Schemas, Built-in Evaluation and Filter Functions, Built-in complex type UDFs
- compression, using in intermediate results, Using Compression in Intermediate Results
- CONCAT functions, Built-in chararray and bytearray UDFs
- constructors, Constructors and Passing Data from Frontend to Backend–UDFContext
- controlling execution, Controlling Execution
- copyFromLocal command, HDFS Commands in Grunt
- copyToLocal command, HDFS Commands in Grunt
- COR function, Built-in complex type UDFs
- corrupted data, handling, Bad Record Handling
- COS function, Built-in math UDFs
- COSH function, Built-in math UDFs
- COUNT function, Evaluation Function Basics, Algebraic Interface, Algebraic Interface, Accumulator Interface, Built-in aggregate UDFs
- COUNT_STAR function, Built-in aggregate UDFs
- COV function, Built-in complex type UDFs
- cross operator, Parallel, cross–cross, Nonlinear Data Flows, Setting the Partitioner, Filter Early and Often
D
- -D passing properties, Command-Line and Configuration Options
- DAG (directed acyclic graph), Pig Latin, a Parallel Dataflow Language, Nonlinear Data Flows
- data, What Is Pig Useful For?, Types–Nulls, Debugging Tips, Choose the Right Data Type, Data Layout Optimization, Constructors and Passing Data from Frontend to Backend, Writing Data–Writing records, Pig and Hive, Metadata in Hadoop
  - layout optimization, Data Layout Optimization
  - passing, Constructors and Passing Data from Frontend to Backend
  - pipelines, What Is Pig Useful For?, Debugging Tips, Pig and Hive, Metadata in Hadoop
  - types, Types–Nulls, Choose the Right Data Type
  - writing, Writing Data–Writing records
- data sets, example, Code Examples in This Book
- dataflow languages, Pig Latin, a Parallel Dataflow Language, Embedding Pig Latin in Python
- DataNodes, Loading the distributed cache, Distributed Cache, Hadoop Distributed File System
- debugging, Debugging Tips
- %declare, Parameter Substitution
- declaring, Schemas, Nonlinear Data Flows, Macros, Choose the Right Data Type, Input and Output Schemas, Constructors and Passing Data from Frontend to Backend
  - a filename, Constructors and Passing Data from Frontend to Backend
  - a macro, Macros
  - a schema, Schemas, Input and Output Schemas
  - a type, Nonlinear Data Flows, Choose the Right Data Type
- %default, Parameter Substitution
- define statement, Registering UDFs, define and UDFs, stream, Macros, Constructors and Passing Data from Frontend to Backend
- define utility method, Utility Methods
- describe operator, describe
- development tools, Development Tools–Debugging Tips
- DeWitt, David J., Joining skewed data
- DIFF function, Built-in complex type UDFs
- directed acyclic graph (DAG), Pig Latin, a Parallel Dataflow Language, Nonlinear Data Flows
- distinct operator, Distinct, Parallel, Nested foreach, Nested foreach, Setting the Partitioner, Filter Early and Often
- distributed cache, Joining small to large data, stream, Loading the distributed cache, Distributed Cache
- distributive calculations, Group, Algebraic Interface
- double functions, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in math UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Miscellaneous built-in UDF
  - ABS, Built-in math UDFs
  - ACOS, Built-in math UDFs
  - ASIN, Built-in math UDFs
  - ATAN, Built-in math UDFs
  - AVG, Built-in aggregate UDFs
  - CBRT, Built-in math UDFs
  - CEIL, Built-in math UDFs
  - COS, Built-in math UDFs
  - COSH, Built-in math UDFs
  - EXP, Built-in math UDFs
  - FLOOR, Built-in math UDFs
  - LOG, Built-in math UDFs
  - LOG10, Built-in math UDFs
  - MAX, Built-in aggregate UDFs, Built-in aggregate UDFs
  - MIN, Built-in aggregate UDFs
  - RANDOM, Miscellaneous built-in UDF
  - SIN, Built-in math UDFs
  - SINH, Built-in math UDFs
  - SQRT, Built-in math UDFs
  - SUM, Built-in aggregate UDFs
  - TAN, Built-in math UDFs
  - TANH, Built-in math UDFs
- double type, Scalar Types, Schemas, Python UDFs
- -dryrun command line option, Macros, Syntax Highlighting and Checking
- dump statement, Dump
E
- Eclipse syntax highlighting, Syntax Highlighting and Checking
- Elastic MapReduce (EMR), Running Pig in the Cloud
- Emacs syntax highlighting, Syntax Highlighting and Checking
- embedding Pig Latin in Python, Embedding Pig Latin in Python–Utility Methods
- EMR (Elastic MapReduce), Amazon, Running Pig in the Cloud
- equality operators, Filter
- errors, How Pig differs from MapReduce, Entering Pig Latin Scripts in Grunt, Schemas, Schemas, Order by, union, explain, Run, Input and Output Schemas, Error Handling and Progress Reporting, Reading records, Failure Cleanup, Handling Failure
  - checking in Grunt, Entering Pig Latin Scripts in Grunt
  - debugging with explain, explain
  - in evaluation functions, Error Handling and Progress Reporting
  - failure cleanup, Failure Cleanup, Handling Failure
  - getErrorMessage function, Run
  - parse, Reading records
  - in Pig Latin scripts, How Pig differs from MapReduce
  - runtime exceptions, Input and Output Schemas
  - schema, Schemas, Schemas, union
  - sorting by maps, tuples, bags, Order by
- escape characters (Unix shell command line), Load
- ETL (extract transform load) data pipelines, What Is Pig Useful For?
- evaluation functions, UDFs in foreach, Writing an Evaluation Function in Java, Where Your UDF Will Run, Evaluation Function Basics, Input and Output Schemas–Input and Output Schemas, Error Handling and Progress Reporting, Memory Issues in Eval Funcs, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  - basics, UDFs in foreach, Evaluation Function Basics
  - built-in, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  - error handling and progress reporting, Error Handling and Progress Reporting
  - input and output schemas, Input and Output Schemas–Input and Output Schemas
  - memory issues in, Memory Issues in Eval Funcs
  - where your UDF will run, Where Your UDF Will Run
  - writing in Java, Writing an Evaluation Function in Java
- examples, Code Examples in This Book, MapReduce’s hello world, MapReduce’s hello world, MapReduce’s hello world, MapReduce’s hello world, Comparing query and dataflow languages, How Pig differs from MapReduce, Running Pig Locally on Your Machine, Running Pig on Your Hadoop Cluster, Expressions in foreach, Joining small to large data, Joining skewed data, cross, cross, stream–mapreduce, stream–mapreduce, Embedding Pig Latin in Python–Utility Methods, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions, Writing Load and Store Functions, Store Functions–Store Functions and UDFContext, Storing Metadata, HBase
  - (see also baseball examples)
  - (see also NYSE examples)
  - blacklisting URLs, stream–mapreduce
  - calculating page rank from web crawl, Code Examples in This Book, stream–mapreduce, Embedding Pig Latin in Python–Utility Methods
  - determining metropolitan area, cross
  - finding the top five URLs, How Pig differs from MapReduce
  - group then join in SQL and Pig Latin, Comparing query and dataflow languages
  - HBase table, HBase
  - “hello world”, MapReduce’s hello world
  - JsonLoader, Writing Load and Store Functions
  - JsonStorage, Writing Load and Store Functions
  - MetroResolver, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache
  - running Pig in local mode, Running Pig Locally on Your Machine
  - running Pig on your cluster, Running Pig on Your Hadoop Cluster
  - store function, Store Functions–Store Functions and UDFContext, Storing Metadata
  - user distribution by city, Joining skewed data, cross
  - word count, MapReduce’s hello world
  - ZIP code lookup, Joining small to large data
- exec command, Controlling Pig from Grunt
- -execute (-e) command-line option, Command-Line and Configuration Options
- EXP function, Built-in math UDFs
- explain operator, explain–explain
- explicit splits, Nonlinear Data Flows
F
- failure cleanup, Failure Cleanup, Handling Failure
- fields, Preliminary Matters
- FileOutputFormat, Setting the output location
- filesystem operations, Utility Methods
- filter functions, Filter, define and UDFs, Writing Evaluation and Filter Functions, Writing Filter Functions, Built-in filter functions
- filter operator, How Pig differs from MapReduce, Filter–Filter, Nested foreach, Writing Evaluation and Filter Functions, Writing Filter Functions, Using partitions, Metadata in Hadoop
- filters, Debugging Tips, Debugging Tips, Debugging Tips, Filter Early and Often
  - MergeFilter optimization, Debugging Tips
  - pushing, Filter Early and Often
  - PushUpFilter optimization, Debugging Tips
  - SplitFilter optimization, Debugging Tips
- Finding the Top Five URLs example, How Pig differs from MapReduce
- flatten statement, flatten–flatten
- float functions, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs
- float type, Scalar Types, Schemas, Python UDFs
- FLOOR function, Built-in math UDFs
- foreach operator, foreach, UDFs in foreach, Advanced Features of foreach–Nested foreach, explain, Filter Early and Often
- fragment-replicate join, Joining small to large data
- frontend planning functions, Frontend Planning Functions–Passing Information from the Frontend to the Backend, Store Function Frontend Planning–Store Functions and UDFContext
- frontend/backend invocation, Constructors and Passing Data from Frontend to Backend–UDFContext
- fs keyword, HDFS Commands in Grunt
- fuzzy joins, cross
G
- gateway machine, Running Pig on Your Hadoop Cluster
- Gaussian distribution, Group
- getAllErrorMessages method, Run
- getBytesWritten method, Run
- getDuration method, Run
- getErrorMessage method, Run
- getNumberBytes method, Run
- getNumberJobs method, Run
- getNumberRecords method, Run
- getOutputFormat method, Determining OutputFormat
- getOutputLocations, getOutputNames methods, Run
- getRecordWritten method, Run
- getReturnCode method, Run
- getUDFContext method, UDFContext
- Global Rearrange operator, explain
- globs, Load
- GNU General Public License (GPL) for LZO, Using Compression in Intermediate Results
- group by clause, Group–Group
- group by operator, How Pig differs from MapReduce
- group operator, Group–Group, Parallel, Nonlinear Data Flows, Setting the Partitioner, Filter Early and Often, Evaluation Function Basics
- “Group then join in SQL and Pig Latin” example, Comparing query and dataflow languages
- Grunt, Grunt, Entering Pig Latin Scripts in Grunt, HDFS Commands in Grunt, Controlling Pig from Grunt, explain
  - controlling Pig from, Controlling Pig from Grunt
  - entering Pig Latin scripts in, Entering Pig Latin Scripts in Grunt
  - explain Pig Latin script in, explain
  - HDFS commands in, HDFS Commands in Grunt
- gt option (HBase), HBase
- gte option (HBase), HBase
- gzip compression type, Using Compression in Intermediate Results
H
- -h properties command-line option, Command-Line and Configuration Options
- Hadoop, Pig on Hadoop, Running Pig on Your Hadoop Cluster, Command-Line and Configuration Options, HDFS Commands in Grunt, HDFS Commands in Grunt, Tune Pig and Hadoop for Your Job, Using Compression in Intermediate Results, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions–Determining the location, Metadata in Hadoop, Overview of Hadoop–Hadoop Distributed File System, Hadoop Distributed File System
  - fs shell commands, HDFS Commands in Grunt
  - HDFS (Hadoop Distributed File System), Pig on Hadoop, HDFS Commands in Grunt, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache, Writing Load and Store Functions–Determining the location, Hadoop Distributed File System
  - Java properties used, Command-Line and Configuration Options
  - metadata in, Metadata in Hadoop
  - overview, Overview of Hadoop–Hadoop Distributed File System
  - running Pig on your cluster, Running Pig on Your Hadoop Cluster
  - tarball, Using Compression in Intermediate Results
  - tuning, Tune Pig and Hadoop for Your Job
- hadoop-site.xml file, Running Pig on Your Hadoop Cluster
- Hadoop: The Definitive Guide (White), Tune Pig and Hadoop for Your Job, Overview of Hadoop
- handling failure, Handling Failure
- hashCode function, Shuffle Phase
- HashPartitioner, Shuffle Phase
- HBase, Apache, HBase–HBase
- HBaseStorage function, Getting the casting functions, HBase–HBase, Built-in Load and Store Functions, Built-in Load and Store Functions
- HCatalog, Apache, Metadata in Hadoop
- HCatLoader, Using partitions, Pushing down projections
- heap size, Joining skewed data, Tune Pig and Hadoop for Your Job, Memory Issues in Eval Funcs
- hello world example, MapReduce’s hello world
- -help (-h) command-line option, Command-Line and Configuration Options
- Hewitt, Eben, Cassandra
- highlighting syntax, Syntax Highlighting and Checking
- Hive, Apache, Pig and Hive
I
- illustrate operator, illustrate
- implicit splits, Nonlinear Data Flows
- import command, Including Other Pig Latin Scripts
- including other Pig Latin scripts, Including Other Pig Latin Scripts
- INDEXOF function, Built-in chararray and bytearray UDFs
- inner joins, Join, Joining sorted data
- input clause (define command), stream
- input schemas, Input and Output Schemas
- input size, Making Pig Fly
- InputFormat, determining, Determining InputFormat
- int AVG function, Built-in aggregate UDFs
- int functions, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in chararray and bytearray UDFs, Built-in chararray and bytearray UDFs
  - INDEXOF, Built-in chararray and bytearray UDFs
  - LAST_INDEX_OF, Built-in chararray and bytearray UDFs
  - MAX, Built-in aggregate UDFs
  - MIN, Built-in aggregate UDFs
- int type, Scalar Types, Schemas, Python UDFs
- intermediate results size, Making Pig Fly
- invoker methods, Calling Static Java Functions
- isSuccessful method, Run
- iterative processing, What Is Pig Useful For?, Embedding Pig Latin in Python, Binding Multiple Sets of Variables
J
- Jackson JSON library, Writing Load and Store Functions
- JAR files, Downloading Pig Artifacts from Maven, Registering UDFs, Registering Python UDFs, Testing Your Scripts with PigUnit, Utility Methods, Python UDFs, Writing Load and Store Functions, Piggybank
  - downloading, Downloading Pig Artifacts from Maven
  - Jackson, Writing Load and Store Functions
  - Jython, Registering Python UDFs
  - Piggybank, Registering UDFs, Piggybank
  - pigunit, Testing Your Scripts with PigUnit
  - registering, Utility Methods, Python UDFs
- Java, Pig Philosophy, Downloading the Pig Package from Apache, Downloading the Pig Package from Apache, Command-Line and Configuration Options, Scalar Types–Nulls, Bag, Filter, User Defined Functions, define and UDFs, Calling Static Java Functions, Calling Static Java Functions, Joining small to large data, mapreduce, set, Setting the Partitioner, Testing Your Scripts with PigUnit, Embedding Pig Latin in Python, Writing an Evaluation Function in Java–Memory Issues in Eval Funcs, Interacting with Pig values, Input and Output Schemas, Input and Output Schemas, Input and Output Schemas, Input and Output Schemas, Loading the distributed cache, Overloading UDFs, Python UDFs, Casting bytearrays, Store Functions, Cascading, HBase, Built-in Evaluation and Filter Functions, Map Phase
  - and Cascading data flows, Cascading
  - casting and HBase, HBase
  - compared with Python, Python UDFs
  - data types used by Pig, Scalar Types–Nulls, Input and Output Schemas
  - embedding interface, Embedding Pig Latin in Python
  - evaluation functions in, Writing an Evaluation Function in Java–Memory Issues in Eval Funcs, Built-in Evaluation and Filter Functions
  - integration with Pig, Pig Philosophy, Downloading the Pig Package from Apache
  - Iterable, Interacting with Pig values
  - JUnit, Testing Your Scripts with PigUnit
  - and MapReduce, Map Phase
  - memory requirements of, Bag, Joining small to large data
  - multiple inheritance workaround, Casting bytearrays, Store Functions
  - passing arguments to, mapreduce
  - properties used by Pig and Hadoop, Command-Line and Configuration Options, set
  - reflection, Calling Static Java Functions, Input and Output Schemas, Input and Output Schemas
  - regular expressions, Filter
  - setting JAVA_HOME, Downloading the Pig Package from Apache
  - setting the Partitioner, Setting the Partitioner
  - static functions, Calling Static Java Functions
  - UDFs and, User Defined Functions, define and UDFs, Input and Output Schemas, Loading the distributed cache, Overloading UDFs
- JobTracker, Running Pig on Your Hadoop Cluster, MapReduce Job Status, Error Handling and Progress Reporting, MapReduce
- join operator, Parallel
- joining small to large data, Joining small to large data, Distributed Cache
- joining sorted data, Joining sorted data
- joins, Comparing query and dataflow languages, How Pig differs from MapReduce, What Is Pig Useful For?, Join–Join, Join, Join, Parallel, Using Different Join Implementations–cross, Joining small to large data, Joining sorted data, Joining sorted data, Nonlinear Data Flows, Setting the Partitioner, illustrate, Filter Early and Often, Set Up Your Joins Properly, Determining the location
  - default behavior, Join–Join
  - and filter pushing, Filter Early and Often
  - how to update every five minutes, What Is Pig Useful For?
  - inner, Join, Joining sorted data
  - input path overwritten, Determining the location
  - no multiquery for, Nonlinear Data Flows
  - other implementations, Using Different Join Implementations–cross, Set Up Your Joins Properly
  - outer, Join, Joining small to large data
  - parallel clause and, Parallel
  - partition clause and, Setting the Partitioner
  - in Pig Latin versus MapReduce, How Pig differs from MapReduce
  - in Pig Latin versus SQL, Comparing query and dataflow languages
  - and sample records, illustrate
  - sort-merge, Joining sorted data
- JSON, Schemas, Interacting with Pig values, Writing Load and Store Functions–Loading metadata, Determining OutputFormat–Storing Metadata
  - JsonLoader example, Interacting with Pig values, Writing Load and Store Functions–Loading metadata
  - JsonStorage example, Determining OutputFormat–Storing Metadata
- JUnit, Testing Your Scripts with PigUnit
- Jython, User Defined Functions, Registering Python UDFs, Python UDFs
K
- keys, Pig on Hadoop, How Pig differs from MapReduce
- kill command, Controlling Pig from Grunt
L
- LAST_INDEX_OF function, Built-in chararray and bytearray UDFs
- LCFIRST function, Built-in chararray and bytearray UDFs
- Le Dem, Julien, Embedding Pig Latin in Python
- licensing, What Is Pig?, Using Compression in Intermediate Results
- limit operator, Limit, Parallel, Nested foreach
- limit option (HBase), HBase
- LimitOptimizer optimization, Debugging Tips
- linear data flows, Nonlinear Data Flows
- load clause (mapreduce statement), mapreduce
- load function (PigStorage), Choose the Right Data Type
- load functions (Pig), Load Functions–Pushing down projections, Frontend Planning Functions–Passing Information from the Frontend to the Backend, Passing Information from the Frontend to the Backend, Backend Data Reading–Reading records, Additional Load Function Interfaces–Pushing down projections, Loading metadata, Built-in Load and Store Functions
  - additional interfaces, Additional Load Function Interfaces–Pushing down projections
  - backend data reading, Backend Data Reading–Reading records
  - built-in, Built-in Load and Store Functions
  - frontend planning functions, Frontend Planning Functions–Passing Information from the Frontend to the Backend
  - loading metadata, Loading metadata
  - passing info frontend to backend, Passing Information from the Frontend to the Backend
- load operator, Load, explain, Filter Early and Often
- loadKey option (HBase), HBase
- local mode, Running Pig Locally on Your Machine
- Local Rearrange operator, explain
- LOG function, Built-in math UDFs
- LOG10 function, Built-in math UDFs
- logical optimizer, Debugging Tips
- logical plan, explain, Debugging Tips
- LogicalExpressionsSimplifier optimization, Debugging Tips
- logs, MapReduce Job Status, Error Handling and Progress Reporting
- long AVG function, Built-in aggregate UDFs
- long functions, Built-in math UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in aggregate UDFs, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
  - COUNT, Built-in aggregate UDFs
  - COUNT_STAR, Built-in aggregate UDFs
  - MAX, Built-in aggregate UDFs
  - MIN, Built-in aggregate UDFs
  - ROUND, Built-in math UDFs
  - SIZE, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
  - SUM, Built-in aggregate UDFs
- long type, Scalar Types, Schemas, Python UDFs
- lookup table, constructing, Constructors and Passing Data from Frontend to Backend
- LOWER function, Built-in chararray and bytearray UDFs
- lt option (HBase), HBase
- lte option (HBase), HBase
- LZO compression type, Using Compression in Intermediate Results
M
- macros, Macros
- map data type, Map, Schemas, Python UDFs
- map only jobs, Reduce Phase
- map parallelism, Parallel
- map phase, Pig on Hadoop, Map Phase
- map projection operator (#), Expressions in foreach
- map TOMAP function, Built-in complex type UDFs
- MapReduce, Pig on Hadoop, How Pig differs from MapReduce–How Pig differs from MapReduce, mapreduce, MapReduce Job Status, Tune Pig and Hadoop for Your Job, MapReduce
  - how Pig differs from, How Pig differs from MapReduce–How Pig differs from MapReduce
  - integrating with Pig, mapreduce
  - job status, MapReduce Job Status
  - performance tuning properties, Tune Pig and Hadoop for Your Job
- mapreduce operator, mapreduce, Filter Early and Often
- “Mary had a Little Lamb” example, MapReduce’s hello world
- Maven, downloading Pig from, Downloading Pig Artifacts from Maven
- MAX functions, Built-in aggregate UDFs
- memory, Bag, Making Pig Fly, Tune Pig and Hadoop for Your Job
  - buffer size, Tune Pig and Hadoop for Your Job
  - requirements for Pig data types, Bag
  - size, Making Pig Fly
- merge join, Joining sorted data, Set Up Your Joins Properly
- MergeFilter optimization, Debugging Tips
- MergeForEach optimization, Debugging Tips
- metadata, Loading metadata, Storing Metadata, Metadata in Hadoop
  - in Hadoop, Metadata in Hadoop
  - loading, Loading metadata
  - storing, Storing Metadata
- metropolitan name example, Constructors and Passing Data from Frontend to Backend–Loading the distributed cache
- MIN functions, Overloading UDFs, Built-in aggregate UDFs
- multiple bindings, running, Running Multiple Bindings
- multiple joins, Join
- multiple keys, grouping on, Group
- multiquery, Nonlinear Data Flows, Use Multiquery When Possible
- multiway joins, Joining skewed data
N
- NameNode, Running Pig on Your Hadoop Cluster, Joining small to large data, Data Layout Optimization, Loading the distributed cache, Distributed Cache, Hadoop Distributed File System
- namespaces, Registering Python UDFs
- nested foreach, Nested foreach–Nested foreach
- noise words, Join
- nonlinear data flows, Nonlinear Data Flows–Nonlinear Data Flows
- NoSQL databases, NoSQL Databases
- null, Nulls, Expressions in foreach, Filter, Join, Error Handling and Progress Reporting
- NYSE examples, Code Examples in This Book, Running Pig Locally on Your Machine, Casts, Distinct, Join, Nested foreach, Nested foreach, Nested foreach, Joining sorted data, stream, Macros, UDFContext
  - average dividends, Running Pig Locally on Your Machine
  - buy/sell analyzer, UDFContext
  - daily sorted dividends, Joining sorted data
  - data set, Code Examples in This Book
  - dividends increased between two dates, Join
  - filter out low-dividend stocks, stream
  - find list of ticker symbols, Distinct
  - number of unique stock symbols, Nested foreach
  - stock-price changes on dividend days, Macros
  - top three dividends, Nested foreach
  - total trade estimate, Casts
  - tracking a stock over time, Nested foreach
O
- Olston, Christopher, Pig’s History
- optimizations, turning off, Debugging Tips, Debugging Tips
- optimizing scripts, Making Pig Fly–Bad Record Handling
- order by operator, How Pig differs from MapReduce, Order by
- order operator, Order by, Order by, Parallel, Nested foreach, Setting the Partitioner
- outer joins, Join, Joining small to large data
- output clause (define command), stream
- output location, Setting the output location
- output phase, Output Phase
- output schemas, Input and Output Schemas
- output size, Making Pig Fly
- OutputFormat, Store Functions, Output Phase
- overloading, Calling Static Java Functions, Overloading UDFs
P
- Package operator, explain
- page rank, calculating from web crawl, Embedding Pig Latin in Python–Utility Methods
- parallel clause, Parallel
- parallel dataflow language, Pig Latin, a Parallel Dataflow Language
- parallelism, Select the Right Level of Parallelism, Where Your UDF Will Run, Writing Load and Store Functions
- parameter substitution, Parameter Substitution–Parameter Substitution
- partition clause, Setting the Partitioner
- Partitioner class, Setting the Partitioner, Shuffle Phase
- partitions, using, Using partitions
- performance tuning properties (MapReduce), Tune Pig and Hadoop for Your Job
- philosophy of Pig, Pig Philosophy
- physical plan, explain
- Pig, Pig Philosophy, Pig’s History, Downloading and Installing Pig–Downloading the Source, Downloading the Pig Package from Apache, Downloading the Pig Package from Apache, Downloading the Source, Downloading the Source, Running Pig–Command-Line and Configuration Options, Casts, Integrating Pig with Legacy Code and MapReduce–mapreduce, Tune Pig and Hadoop for Your Job, Utility Methods, Python UDFs
  - downloading and installing, Downloading and Installing Pig–Downloading the Source
  - fs method, Utility Methods
  - history, Pig’s History
  - integrating with legacy code and MapReduce, Integrating Pig with Legacy Code and MapReduce–mapreduce
  - issue-tracking system, Downloading the Source
  - performance tuning, Tune Pig and Hadoop for Your Job
  - philosophy, Pig Philosophy
  - portability, Downloading the Pig Package from Apache
  - release page, Downloading the Pig Package from Apache
  - running, Running Pig–Command-Line and Configuration Options
  - strength of typing, Casts
  - translation to Python types, Python UDFs
  - version control page, Downloading the Source
- “Pig counts Mary and her lamb” example, MapReduce’s hello world
- Pig Latin, What Is Pig?, What Is Pig Useful For?, Preliminary Matters, Preliminary Matters, Case Sensitivity, Comments, Input and Output–Dump, Relational Operations–Parallel, Pig Latin Preprocessor–Including Other Pig Latin Scripts, Developing and Testing Pig Latin Scripts–Testing Your Scripts with PigUnit, Syntax Highlighting and Checking, Embedding Pig Latin in Python–Utility Methods
  - best use cases for, What Is Pig Useful For?
  - case sensitivity, Case Sensitivity
  - comment operators, Comments
  - developing and testing scripts, Developing and Testing Pig Latin Scripts–Testing Your Scripts with PigUnit
  - embedding in Python, Embedding Pig Latin in Python–Utility Methods
  - fields, Preliminary Matters
  - input and output, Input and Output–Dump
  - preprocessor, Pig Latin Preprocessor–Including Other Pig Latin Scripts
  - relational operations, Relational Operations–Parallel
  - relations, Preliminary Matters
  - syntax highlighting packages, Syntax Highlighting and Checking
- “Pig Latin: A Not-So-Foreign Language for Data Processing” (Olston), Pig’s History
- Piggybank, User Defined Functions, Piggybank
- PigStats methods, Run
- PigStorage function, Store, Getting the casting functions, Built-in Load and Store Functions, Built-in Load and Store Functions
- PigUnit, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit
- pipelines, data, What Is Pig Useful For?, Debugging Tips, Pig and Hive, Metadata in Hadoop
- POSIX, Pig on Hadoop, Hadoop Distributed File System
- power law distribution, Group
- “Practical Skew Handling in Parallel Joins” (DeWitt et al.), Joining skewed data
- prepareToRead, Getting ready to read
- prepareToWrite method, Preparing to write
- prereduce merge, Combiner Phase
- projections, pushing down, Pushing down projections
- -propertyFile (-P) command-line option, Command-Line and Configuration Options
- PushDownForeachFlatten feature, Debugging Tips
- PushUpFilter optimization, Debugging Tips
- Pygmalion project, Cassandra
- Python, User Defined Functions, Registering Python UDFs, Embedding Pig Latin in Python–Utility Methods, Python UDFs–Python UDFs
  - embedding Pig Latin in, Embedding Pig Latin in Python–Utility Methods
  - UDFs, User Defined Functions, Registering Python UDFs, Python UDFs–Python UDFs
Q
- query languages, Comparing query and dataflow languages
R
- RANDOM functions, Miscellaneous built-in UDF
- raw data, What Is Pig Useful For?, Pig and Hive
- RDBMS versus Hadoop environments, Comparing query and dataflow languages, Using Different Join Implementations
- RecordWriter class, Preparing to write, Output Phase
- reduce phase, Pig on Hadoop, Reduce Phase
- reducers, How Pig differs from MapReduce, Group, Order by, Joining skewed data, Select the Right Level of Parallelism, Combiner Phase
- reflection, Calling Static Java Functions, Input and Output Schemas, Input and Output Schemas
- REGEX_EXTRACT function, Built-in chararray and bytearray UDFs
- REGEX_EXTRACT_ALL function, Built-in chararray and bytearray UDFs
- register command, Registering UDFs
- registerJar utility method, Utility Methods
- registerUDF utility method, Utility Methods
- regular expressions, Filter
- relational operations, Relational Operations–Parallel, Advanced Features of foreach–cross
- relations, Preliminary Matters
- REPLACE function, Built-in chararray and bytearray UDFs
- result method, Run
- return codes, Return Codes, Run
- returns clause (define statement), Macros
- rmr command, HDFS Commands in Grunt
- ROUND function, Built-in math UDFs
- run command, Controlling Pig from Grunt
- running multiple bindings, Running Multiple Bindings
- “Running Pig in Local Mode” example, Running Pig Locally on Your Machine
- “Running Pig On Your Cluster” example, Running Pig on Your Hadoop Cluster
- runSingle command, Run
- runtime declaration (schemas), Schemas
- runtime exceptions, Input and Output Schemas
S
- sampling, Sample, illustrate
  - illustrate tool, illustrate
  - sample operator, Sample
- scalar types, Scalar Types
- schemas, Schemas–Casts, Input and Output Schemas–Input and Output Schemas, Python UDFs, Loading metadata, Checking the schema
- scripts, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit, Making Pig Fly–Bad Record Handling
  - optimizing, Making Pig Fly–Bad Record Handling
  - testing with PigUnit, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit
- self joins, Join
- semi-join, cogroup
- set command, set
- set utility method, Utility Methods
- setLocation, Determining the location
- setOutputPath utility function, Setting the output location
- setStoreLocation function, Setting the output location
- setting the Partitioner, Setting the Partitioner
- ship clause, stream
- shuffle phase, Pig on Hadoop, Shuffle Phase
- shuffle size, Making Pig Fly
- SIN function, Built-in math UDFs
- SINH function, Built-in math UDFs
- SIZE functions, Built-in chararray and bytearray UDFs, Built-in complex type UDFs
- skew joins, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Tune Pig and Hadoop for Your Job
- skew, handling of, How Pig differs from MapReduce, Group, Group, Order by, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Select the Right Level of Parallelism, Tune Pig and Hadoop for Your Job, Algebraic Interface, Combiner Phase
  - Hadoop combiner, Group, Algebraic Interface, Combiner Phase
  - order by operator, Order by
  - skew joins, Joining skewed data, Setting the Partitioner, Set Up Your Joins Properly, Tune Pig and Hadoop for Your Job
- sort command, Filter Early and Often
- sort-merge join, Joining sorted data
- source code, Downloading the Source
- speculative execution, Select the Right Level of Parallelism, Handling Failure
- spill files, number of, Tune Pig and Hadoop for Your Job
- spilling to disk, Memory Issues in Eval Funcs
- split operator, Nonlinear Data Flows, Filter Early and Often
- SplitCombination optimization, Debugging Tips
- SplitFilter optimization, Debugging Tips
- SQL compared/contrasted with Pig, Comparing query and dataflow languages–Comparing query and dataflow languages, Tuple, Bag, Filter, Filter, Group, Distinct, Join, Join, Using Different Join Implementations, union, Pig and Hive, Built-in aggregate UDFs
  - Apache Hive, Pig and Hive
  - constraints on data, Bag
  - dataflow and query languages, Comparing query and dataflow languages–Comparing query and dataflow languages
  - group operator, Group
  - long COUNT, Built-in aggregate UDFs
  - noise words, Join
  - nulls, Filter, Join
  - optimizers, Using Different Join Implementations
  - trinary logic, Filter
  - tuples, Tuple
  - union, union
  - use of distinct statement, Distinct
- SQL layer (Apache Hive), Pig and Hive
- SQRT function, Built-in math UDFs
- static Java functions, Calling Static Java Functions
- statistics summary, Pig Statistics
- stats command, Pig Statistics
- stock analyzer example, UDFContext
- store clause (mapreduce statement), mapreduce
- store functions, Writing Load and Store Functions, Store Functions–Storing Metadata, Built-in Load and Store Functions
- store operator, Store, explain, Filter Early and Often
- StoreFunc class, Store Functions
- storing metadata, Storing Metadata
- stream operator, stream, Filter Early and Often
- streams, number of, Tune Pig and Hadoop for Your Job
- STRSPLIT functions, Built-in chararray and bytearray UDFs
- subqueries, Pig alternative to, Comparing query and dataflow languages
- SUBSTRING functions, Built-in chararray and bytearray UDFs
- SUM functions, Algebraic Interface, Built-in aggregate UDFs, Built-in aggregate UDFs
- svn version control, Downloading the Source
- syntax highlighting and checking, Syntax Highlighting and Checking
- synthetic join, cross
T
- tab delimited files, Choose the Right Data Type
- TAN function, Built-in math UDFs
- TANH function, Built-in math UDFs
- tarball, Hadoop, Downloading the Pig Package from Apache, Using Compression in Intermediate Results
- TaskTracker, MapReduce, Hadoop Distributed File System
- testing scripts with PigUnit, Testing Your Scripts with PigUnit–Testing Your Scripts with PigUnit
- TextLoader function, Built-in Load and Store Functions
- TextMate syntax highlighting, Syntax Highlighting and Checking
- theta joins, cross
- threshold usage, Tune Pig and Hadoop for Your Job
- TOBAG function, Built-in complex type UDFs
- TOKENIZE function, Built-in chararray and bytearray UDFs
- TOMAP function, Built-in complex type UDFs
- TOP function, Built-in complex type UDFs
- TOTUPLE function, Built-in complex type UDFs
- TRIM function, Built-in chararray and bytearray UDFs
- trinary logic, Filter
- tuning Pig and Hadoop, Tune Pig and Hadoop for Your Job
- tuple data type, Tuple, Schemas, Interacting with Pig values, Python UDFs
- tuple projection operator (.), Expressions in foreach
- tuple TOTUPLE function, Built-in complex type UDFs
- TupleFactory class, Interacting with Pig values
- Turing Complete Pig, Embedding Pig Latin in Python
- turning off features, Debugging Tips
- typechecking, Input and Output Schemas, Overloading UDFs
- types, data, Types–Nulls, Python UDFs
U
- UCFIRST function, Built-in chararray and bytearray UDFs
- UDFContext class, UDFContext, Store Functions and UDFContext
- UDFs (User Defined Functions), Code Examples in This Book, UDFs in foreach, User Defined Functions, Registering UDFs–Registering Python UDFs, define and UDFs, Writing Your UDF to Perform, Writing an Evaluation Function in Java, Where Your UDF Will Run, Error Handling and Progress Reporting, Overloading UDFs, Built-in UDFs–Miscellaneous built-in UDF
  - built-in, Built-in UDFs–Miscellaneous built-in UDF
  - define and, define and UDFs
  - error handling, Error Handling and Progress Reporting
  - in foreach, UDFs in foreach
  - naming, Writing an Evaluation Function in Java
  - optimizing, Writing Your UDF to Perform
  - overloading, Overloading UDFs
  - registering, Registering UDFs–Registering Python UDFs
  - where your UDF will run, Where Your UDF Will Run
- union operator, How Pig differs from MapReduce, union, Nonlinear Data Flows, Filter Early and Often, Determining the location
- UPPER function, Built-in chararray and bytearray UDFs
- User Defined Functions, UDFs in foreach (see UDFs)
- using clause (load function), Load
- using clause (store function), Store
- Utf8StorageConverter, Casting bytearrays
- utility methods, Utility Methods
V
- variables, binding multiple sets of, Binding Multiple Sets of Variables
- -version command-line option, Command-Line and Configuration Options
- version control with git, Downloading the Source
- version differences in Hadoop, Running Pig on Your Hadoop Cluster, Load
  - file locations, Running Pig on Your Hadoop Cluster
  - globs, Load
- version differences in Pig, Downloading the Pig Package from Apache, Running Pig Locally on Your Machine, Running Pig on Your Hadoop Cluster, Command-Line and Configuration Options, HDFS Commands in Grunt, HDFS Commands in Grunt, Map, Schemas, Schemas, Dump, Expressions in foreach, Parallel, User Defined Functions, User Defined Functions, Registering UDFs, Registering UDFs, Registering Python UDFs, Calling Static Java Functions, flatten, Joining skewed data, Joining sorted data, cross, mapreduce, Setting the Partitioner, Pig Latin Preprocessor, Macros, Including Other Pig Latin Scripts, illustrate, Pig Statistics, Debugging Tips, Testing Your Scripts with PigUnit, Project Early and Often, Data Layout Optimization, Embedding Pig Latin in Python, Writing Evaluation and Filter Functions, Writing Evaluation and Filter Functions, Input and Output Schemas, Loading the distributed cache, UDFContext, Python UDFs, Writing Load and Store Functions, Casting bytearrays, HBase, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  - .. field range, Expressions in foreach
  - built-in eval and filter functions, Built-in Evaluation and Filter Functions–Miscellaneous built-in UDF
  - bytesToMap methods, Casting bytearrays
  - column families, HBase
  - data layout optimization, Data Layout Optimization
  - dependencies inside Python scripts, Registering Python UDFs
  - dump output, Dump
  - EvalFunc, Loading the distributed cache
  - flatten schema bug, flatten
  - globs accepted by register, Registering UDFs
  - Grunt command sh, HDFS Commands in Grunt
  - hadoop fs shell commands, Running Pig on Your Hadoop Cluster, HDFS Commands in Grunt
  - Hadoop requirements, Downloading the Pig Package from Apache
  - handling of Java properties, Command-Line and Configuration Options
  - HDFS paths for register, Registering UDFs
  - illustrate, illustrate
  - invoker methods, Calling Static Java Functions
  - Java eval funcs, Writing Evaluation and Filter Functions
  - joins, Joining skewed data, Joining sorted data
  - load and store functions, Writing Load and Store Functions
  - local mode execution, Running Pig Locally on Your Machine
  - logical optimizer and plan, Debugging Tips, Project Early and Often
  - macros, Macros
  - map declared values, Map
  - map schemas, Input and Output Schemas
  - mapreduce command, mapreduce
  - non-Java UDFs, User Defined Functions
  - number of output records in a bag, cross
  - parallel level, Parallel
  - PigUnit, Testing Your Scripts with PigUnit
  - preprocessor actions, Pig Latin Preprocessor, Including Other Pig Latin Scripts
  - Python, Embedding Pig Latin in Python, Writing Evaluation and Filter Functions, Python UDFs
  - runtime adaption code, Schemas
  - setting the Partitioner, Setting the Partitioner
  - summary statistics, Pig Statistics
  - truncation and null padding, Schemas
  - UDFContext class, UDFContext
  - UDFs languages, User Defined Functions
- Vim syntax highlighting, Syntax Highlighting and Checking
W
- warn method, Error Handling and Progress Reporting
- web crawl, Embedding Pig Latin in Python–Utility Methods, Embedding Pig Latin in Python–Utility Methods
  - calculating page rank from, Embedding Pig Latin in Python–Utility Methods
  - data set, Embedding Pig Latin in Python–Utility Methods
- White, Tom, Tune Pig and Hadoop for Your Job, Overview of Hadoop
- word count example, MapReduce’s hello world
- writing MapReduce in Java, compared to Pig Latin, How Pig differs from MapReduce
- writing records, Writing records–Writing records
Y
- Yahoo!, Pig’s History