Summary
In this chapter, we have seen how to put large-scale graph analytics into practice using Spark GraphX. Modeling entity relationships as graphs with vertices and edges is a powerful paradigm for addressing many interesting problems.
In GraphX, graphs are finite, directed property graphs that may contain multiple edges and loops. GraphX performs graph analytics on highly optimized versions of vertex and edge RDDs, which lets you combine data-parallel and graph-parallel computation. We have seen how such graphs can be created either by loading them with edgeListFile or by constructing them from other RDDs. On top of that, we have seen how easy it is to create both random and deterministic graph data for quick experiments. Using just the rich built-in functionality of the Graph model, we have shown how to investigate a graph for core properties. To visualize more complex graphs, we introduced Gephi and an interface to it, which allows one to gain intuition about the graph structure at...