This chapter covers
- Key performance factors and bottlenecks in Giraph
- Optimal setups of the Hadoop cluster hosting Giraph
- Designing and implementing ad hoc data structures optimized for your algorithms
- Spilling excess data to disk when necessary
- The different Giraph parameters and knobs
So far, you have seen how to use Giraph to run graph analytics on large graphs across hundreds of machines. You have been introduced to Giraph’s architecture and to the programming model that allows you to write programs that scale. It is neat to write programs with the vertex-centric programming model, without worrying about ...