Parallel Processing


The first approach to managing a growing workload is to get a bigger computer. At some point, however, the cost and size of a bigger computer become prohibitive, and the workload must instead be spread across multiple processors that run in parallel. One such approach is massively parallel processing (MPP). Note that parallel processing reduces the elapsed time of processing, not the total amount of processing that occurs. In Big Data, data must be parsed before it can be used. Parsing repetitive data is usually simple and straightforward, whereas parsing nonrepetitive data is anything but.
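The distinction between elapsed time and total processing can be made concrete with a small arithmetic model. The sketch below is illustrative only: the task costs, the round-robin split, and the function names are assumptions introduced here, not anything prescribed by the MPP approach itself. It treats each task as a fixed number of work units and compares one processor against three.

```python
def split_round_robin(task_costs, n_processors):
    """Deal tasks out to processors in turn; return one list of costs per processor."""
    partitions = [[] for _ in range(n_processors)]
    for i, cost in enumerate(task_costs):
        partitions[i % n_processors].append(cost)
    return partitions


def total_work(partitions):
    """Total processing performed, summed over every processor."""
    return sum(sum(p) for p in partitions)


def elapsed_time(partitions):
    """Wall-clock time: processors run side by side, so the busiest one sets the pace."""
    return max(sum(p) for p in partitions)


# Hypothetical workload: six tasks, each costing some number of work units.
task_costs = [4, 2, 6, 3, 5, 4]

single = split_round_robin(task_costs, 1)
parallel = split_round_robin(task_costs, 3)

print("1 processor :", total_work(single), "units of work,", elapsed_time(single), "units elapsed")
print("3 processors:", total_work(parallel), "units of work,", elapsed_time(parallel), "units elapsed")
```

In this model the total work is 24 units in both cases, while the elapsed time drops from 24 units to 10. It does not drop to 8 because the simple round-robin split leaves one processor with more work than the others, a reminder that the speedup depends on how evenly the workload is spread.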


repetitive unstructured data
nonrepetitive unstructured ...
