In Chapter 8, Using Graph-Based Features in Machine Learning, we studied a dataset with the following columns:
If you followed Chapter 2, The Cypher Query Language, you have probably noticed the similarity between this dataset and the graph we studied in that chapter. Built from the GitHub public API, that graph contains data related to the Neo4j organization on GitHub:
- Its contributors
- The repositories those contributors contributed to
- The contributors to those new repositories
The graph schema is as follows:
Language and Document are special nodes added by an NLP-based analysis of each repository's README. Here, we will focus on the User and Repository labels.
From this graph, we can build the data used in the preceding chapter using the following Cypher query:
MATCH (u:User)
OPTIONAL MATCH (u)-[:CONTRIBUTED_TO]->(r:Repository)<-[:OWNS]-(:User {login: "neo4j"})
WITH u, COLLECT(r) AS rs
RETURN...
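To make the behavior of OPTIONAL MATCH followed by COLLECT more concrete, here is a minimal Python sketch that mimics the first three lines of the query on in-memory data. It is only an illustration under stated assumptions: the logins and repository names are hypothetical toy values, not taken from the real dataset, and the helper collect_owned_repos is a name introduced here, not part of any library. The key point it shows is that every user keeps a row, and users with no matching repository end up with an empty list, because COLLECT skips null values.

```python
# Hypothetical toy data standing in for the graph (not the real dataset).
users = ["neo4j", "alice", "bob"]

# (owner login, repository) pairs, standing in for (:User)-[:OWNS]->(:Repository)
owns = {
    ("neo4j", "neo4j/repo-a"),
    ("neo4j", "neo4j/repo-b"),
    ("bob", "bob/repo-c"),
}

# (contributor login, repository) pairs, standing in for
# (:User)-[:CONTRIBUTED_TO]->(:Repository)
contributed_to = {
    ("alice", "neo4j/repo-a"),
    ("alice", "neo4j/repo-b"),
    ("alice", "bob/repo-c"),
    ("bob", "bob/repo-c"),
}


def collect_owned_repos(users, contributed_to, owns, owner_login="neo4j"):
    """Mimic MATCH (u:User) / OPTIONAL MATCH ... / WITH u, COLLECT(r) AS rs."""
    # Repositories owned by the given login: <-[:OWNS]-(:User {login: "neo4j"})
    owned = {repo for owner, repo in owns if owner == owner_login}

    rows = {}
    for u in users:  # MATCH (u:User): one row per user
        # OPTIONAL MATCH keeps the user even when nothing matches;
        # COLLECT ignores nulls, so such users get an empty list.
        rows[u] = sorted(
            repo
            for contributor, repo in contributed_to
            if contributor == u and repo in owned
        )
    return rows


rows = collect_owned_repos(users, contributed_to, owns)
for login, rs in rows.items():
    print(login, rs)
```

With a plain MATCH instead of OPTIONAL MATCH, users such as bob in this toy example, who never contributed to a repository owned by neo4j, would be dropped from the result altogether.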