The first response to a growing workload is to get a bigger computer. Eventually the cost and size of a bigger computer become prohibitive, and the workload must instead be spread across multiple processors that run in parallel. One approach to parallel processing is massively parallel processing (MPP). Note that parallel processing reduces the elapsed time of processing, not the total amount of processing that occurs. In Big Data, data must be parsed before it can be used. Parsing repetitive data is usually simple and straightforward, whereas parsing nonrepetitive data is anything but.
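The point that parallel processing shortens elapsed time without reducing total processing can be illustrated with a small arithmetic sketch. The function names, the round-robin split, and the twelve one-unit work items below are all assumptions made for illustration; they are not taken from the book, and the model ignores the coordination overhead a real parallel system would add.

```python
def partition(costs, n_processors):
    """Deal work items out round-robin, one share per processor (hypothetical scheme)."""
    return [costs[i::n_processors] for i in range(n_processors)]


def total_work(shares):
    """Total processing performed: the sum of every processor's work."""
    return sum(sum(share) for share in shares)


def elapsed_time(shares):
    """Processors run side by side, so the run ends when the busiest one finishes."""
    return max(sum(share) for share in shares)


# Twelve hypothetical work items, each costing one time unit.
costs = [1] * 12

for n in (1, 2, 4):
    shares = partition(costs, n)
    print(f"processors={n} total_work={total_work(shares)} elapsed={elapsed_time(shares)}")
```

Under these assumptions, the total work stays at 12 units for every processor count, while the elapsed time drops from 12 to 6 to 3 as processors are added.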