Following the same steps as in the previous section, but this time using the largest Louvain communities and the PageRank score as features, we obtain the following final results for the decision tree classifier:
              precision    recall  f1-score   support

       False       0.91      0.99      0.95       128
        True       0.97      0.75      0.84        51

    accuracy                           0.92       179
   macro avg       0.94      0.87      0.90       179
weighted avg       0.93      0.92      0.92       179
The confusion matrix is reproduced here:
Our overall accuracy has jumped from 66% to 92%. Even more importantly, the algorithm is now able to correctly identify 38 users as having contributed to Neo4j, compared to only 9 with the non-graph features and 29 when using only the WCC information.
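To see how these figures fit together, the metrics for the True class can be recomputed from confusion-matrix counts. The sketch below is only an illustration: the 38 true positives and the supports (128 and 51) come from the text, while the remaining counts (127, 1, and 13) are an assumption, inferred as the only integers consistent with the rounded precision and recall values in the report.

```python
# Confusion-matrix counts. Only tp = 38 and the class supports (128, 51)
# are stated in the text; tn, fp and fn are inferred from the rounded report.
tn, fp = 127, 1    # actual non-contributors (support 128)
fn, tp = 13, 38    # actual contributors (support 51)

total = tn + fp + fn + tp

# Fraction of all users classified correctly.
accuracy = (tp + tn) / total

# Precision: of the users predicted as contributors, how many really are.
precision_true = tp / (tp + fp)

# Recall: of the real contributors, how many the model found.
recall_true = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1_true = 2 * precision_true * recall_true / (precision_true + recall_true)

print(f"accuracy  {accuracy:.2f}")
print(f"precision {precision_true:.2f}")
print(f"recall    {recall_true:.2f}")
print(f"f1-score  {f1_true:.2f}")
```

The recall of 0.75 is the figure that corresponds directly to the 38 correctly identified contributors out of 51.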
A feature importance study shows us that the most impactful feature in this model is the PageRank score, as shown in the following bar chart:
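A ranking of this kind can be read from the `feature_importances_` attribute of a fitted scikit-learn `DecisionTreeClassifier`. The following is a minimal sketch under stated assumptions: the data is synthetic, the feature names (`pagerank`, `louvain_community`, `followers`) are hypothetical stand-ins rather than the columns used in this chapter, and the label is deliberately built from the PageRank column alone so that the outcome is easy to interpret.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n_users = 200

# Synthetic stand-ins for the real features (hypothetical names and values).
pagerank = rng.random(n_users)
louvain_community = rng.integers(0, 5, size=n_users)
followers = rng.integers(0, 100, size=n_users)

feature_names = ["pagerank", "louvain_community", "followers"]
X = np.column_stack([pagerank, louvain_community, followers])

# Toy label that depends only on the PageRank column.
y = pagerank > 0.6

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort them in descending order.
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```

On real data the importances would be spread across several features. Plotting the `ranked` pairs as a bar chart gives the kind of figure referred to above.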
This means that our assumption about Neo4j contributors forming communities is not really reproduced by our graph. However, these users are clearly the most...