550 Large Scale and Big Data
the large text data set. The first word of each line in both types of file serves as the
join key. The map program emits the lines of both the large and small input files. Each
line of the small file is labeled so that it can be distinguished in the map output.
In the reduce phase, the lines are checked to find those with matching keys. If lines from
both files are found to match, a Cartesian product is applied between the two
sets of lines with the same key to generate the output. Depending on the key distribution,
the size of the output data may vary. In the reduce program, assume there are λ lines
from the large file and μ lines from the small file for a given key. The result of the Cartesian product
is λμ lines. Since μ ≤ 50 is very sma