July 2017
Beginner to intermediate
418 pages
9h 46m
English
Big data comprises a huge amount of data distributed across a cluster of thousands (if not more) of machines. Building graphs on data at this scale poses its own challenges. Because the data is too large to fit into the memory of a single machine, the graph is not a single-node graph: its vertices and edges are spread across different machines in the cluster. Consider your friends list on Facebook; some of your friends' data in your Facebook friend list graph might lie on different machines, ...
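To make the idea of a graph spread across machines concrete, here is a minimal Python sketch. It is an illustration only, not the book's method: the cluster size, the `machine_for` and `partition_graph` helpers, and the sample friendships are all hypothetical, and hashing vertex IDs is just one simple way to place vertices on machines.

```python
import zlib

NUM_MACHINES = 3  # hypothetical cluster size, chosen only for illustration


def machine_for(vertex, num_machines=NUM_MACHINES):
    """Assign a vertex to a machine by hashing its ID (deterministic across runs)."""
    return zlib.crc32(vertex.encode("utf-8")) % num_machines


def partition_graph(edges, num_machines=NUM_MACHINES):
    """Split a graph's vertices and edges across machines.

    Returns the vertices held by each machine, the edges whose two endpoints
    live on the same machine, and the edges that cross machine boundaries.
    """
    vertices = {m: set() for m in range(num_machines)}
    local_edges = {m: [] for m in range(num_machines)}
    cut_edges = []
    for src, dst in edges:
        src_m = machine_for(src, num_machines)
        dst_m = machine_for(dst, num_machines)
        vertices[src_m].add(src)
        vertices[dst_m].add(dst)
        if src_m == dst_m:
            local_edges[src_m].append((src, dst))
        else:
            # Following this edge would require talking to another machine.
            cut_edges.append((src, dst))
    return vertices, local_edges, cut_edges


# Hypothetical friend list graph: each pair is a friendship edge.
friendships = [("you", "alice"), ("you", "bob"), ("alice", "carol"), ("bob", "dave")]
vertices, local_edges, cut_edges = partition_graph(friendships)

for machine, held in sorted(vertices.items()):
    print(f"machine {machine} holds vertices: {sorted(held)}")
print(f"edges crossing machines: {cut_edges}")
```

Every vertex lands on exactly one machine, and any friendship whose two endpoints hash to different machines ends up in `cut_edges`. Those crossing edges are why an operation as simple as listing a friend's details may need data from more than one machine.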