Glossary
- accessible
-
In the context of a computing cluster, a node is accessible if it is reachable through the network. In other contexts, a tool or library is accessible if it is easily accessed and understood by particular groups.
- accumulator
-
A shared variable to which only associative and commutative operations, such as addition, may be applied. The term is used particularly in Spark; the analogous construct in MapReduce is called a counter. Because such operations are order independent, accumulators stay consistent in a distributed environment regardless of the order in which updates are applied.
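The order independence described above can be illustrated with a minimal sketch in plain Python (no Spark required). The `updates` list and the `overwrite` helper are hypothetical and stand in for values contributed by different workers; this is not Spark's accumulator implementation.

```python
from functools import reduce
from itertools import permutations
import operator

# Hypothetical contributions arriving from different workers.
updates = [3, 1, 4, 1, 5]

# Addition is associative and commutative, so every arrival order
# produces the same total.
totals = {reduce(operator.add, order, 0) for order in permutations(updates)}

# Overwriting the shared value ("last write wins") is not commutative,
# so the final value depends on which update happens to arrive last.
def overwrite(current, new):
    return new

last_values = {reduce(overwrite, order, 0) for order in permutations(updates)}

print(totals)       # a single value: every ordering agrees
print(last_values)  # several values: the ordering matters
```

Only the additive version is safe to use as an accumulator, because its result does not depend on how tasks are scheduled.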
- actions and transformations
- agent
-
A service, usually a background process, that runs routinely on behalf of a user and performs tasks independently. Flume agents are the building blocks of data flows, which ingest and wrangle data, moving it from a source through a channel and eventually to a sink.
- anonymous functions
-
A function that is not bound to an identifier (variable name). Such functions are typically constructed at runtime and passed as arguments to higher-order functions. They can also be used to create closures easily. Anonymous functions are passed to Spark operations to define their behavior. See also closure and lambda function.
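As a brief illustration in plain Python, the sketch below passes an anonymous function to a higher-order function and then uses one to build a closure. The name `make_scaler` is hypothetical and chosen only for this example.

```python
# An anonymous function passed directly to the higher-order function map().
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

def make_scaler(factor):
    # The returned lambda captures `factor` from the enclosing scope,
    # which makes it a closure.
    return lambda x: x * factor

triple = make_scaler(3)
scaled = [triple(v) for v in [1, 2, 3]]

print(squares)  # [1, 4, 9, 16]
print(scaled)   # [3, 6, 9]
```

In PySpark the same style of lambda would typically be handed to an operation such as `rdd.map(lambda x: x * x)`, assuming an existing RDD named `rdd`.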
- application programming interface (API)
-
A collection of routines, protocols, or interfaces that specify how software components should interact. The MapReduce API specifies interfaces for constructing Mapper, Reducer, and Job subclasses that define MapReduce behavior. Similarly, ...