Now that we have imported our data into Neo4j and have a better idea of the graph structure, we can start thinking about the graph-based features we can create to improve our classification model. In this section, we will create them through the Neo4j Browser. We will then see how to automate this step with the Neo4j Python driver.
As the previous figure shows, the graph seems to have a clear community structure, and it is reasonable to assume that users contributing to the same repositories are more strongly connected to each other. It follows that using the output of a community detection algorithm as a feature for our classifier may improve classification performance.
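To make the idea concrete, here is a minimal pure-Python sketch, independent of Neo4j, of turning community membership into a classifier feature. It assumes a hypothetical list of (user, repository) contribution pairs, projects it onto a user-user graph in which two users are linked when they contributed to a common repository, and then runs label propagation, a simple community detection algorithm swapped in here for illustration. Inside Neo4j, the Graph Data Science library provides algorithms such as Louvain and label propagation for this step. The functions `project_users`, `label_propagation`, and `community_ids` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_users(contributions):
    """Build an undirected user-user graph (dict: user -> set of users).

    Two users are linked when they contributed to at least one common
    repository. `contributions` is a hypothetical list of (user, repo) pairs.
    """
    repo_to_users = defaultdict(set)
    graph = {}
    for user, repo in contributions:
        repo_to_users[repo].add(user)
        graph.setdefault(user, set())  # keep users with no co-contributors
    for users in repo_to_users.values():
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def label_propagation(graph, max_iter=100):
    """Asynchronous label propagation with a deterministic tie-break.

    Every node starts with its own label, then repeatedly adopts the label
    most common among its neighbours (smallest label wins ties) until no
    label changes or `max_iter` sweeps have been done.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            neighbours = graph[node]
            if not neighbours:
                continue  # isolated users keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def community_ids(labels):
    """Map arbitrary community labels to consecutive integers for a classifier."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels.values())))}
    return {node: mapping[label] for node, label in labels.items()}


if __name__ == "__main__":
    # Hypothetical contribution data: two groups of users, no shared repository.
    data = [("alice", "neo4j"), ("bob", "neo4j"), ("carol", "neo4j"),
            ("dave", "pandas"), ("erin", "pandas"), ("frank", "pandas")]
    print(community_ids(label_propagation(project_users(data))))
```

The integer community id is categorical, so in practice it would usually be one-hot encoded (or replaced by per-community statistics) before being fed to most classifiers.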
Another piece of information we can extract from the graph is node importance. Since our graph of users is very Neo4j-centric, it would not be a bold assumption to expect Neo4j contributors to be the most important nodes, as measured by PageRank, for instance.
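Node importance can be sketched the same way. The following minimal power-iteration PageRank is written in plain Python for illustration only; inside Neo4j the scores would normally come from the Graph Data Science library's PageRank algorithm instead. The `pagerank` function and the small example graph are hypothetical, and the graph is assumed to be an undirected user-user adjacency dict in which every neighbour is also a key.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an adjacency dict (node -> set of neighbours).

    For an undirected graph, list each edge in both directions. Rank held by
    nodes without neighbours (dangling nodes) is spread uniformly, so the
    scores always sum to 1.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node shares its current rank equally among its neighbours.
        for u in nodes:
            if graph[u]:
                share = damping * rank[u] / len(graph[u])
                for v in graph[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical graph: one central contributor linked to most other users.
    users = {
        "hub": {"a", "b", "c", "d"},
        "a": {"hub", "b"},
        "b": {"hub", "a"},
        "c": {"hub"},
        "d": {"hub"},
    }
    scores = pagerank(users)
    print(max(scores, key=scores.get), scores)
```

Each node's score can then be stored as an extra numeric column for the classifier. The nested loop is fine for a sketch but not for large graphs, which is one reason to compute such features inside the database.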
So...