Part 3. Over the arc
Part 3 covers the missing pieces and documentation. In chapter 8, you’ll see algorithms you might expect to be part of the GraphX API but that, as of Spark 1.6, aren’t. From reading graph data in standard RDF format to merging graphs, the algorithms in chapter 8 plug some of those holes.
Chapter 8 also covers how to use IndexedRDD, which is like the HashMap of RDDs. We go through an example showing how it can improve performance.
Finally, you’ll see an example of identifying likely missing data from Wikipedia using ideas from graph isomorphisms, that is, finding pieces of graphs that are similar to each other.
Chapter 9 is all about putting GraphX into production, debugging, and performance tuning. It steps you through tools ...