We will use the Neo4j community GitHub graph. It contains GitHub users, each identified by a login; the repositories owned by Neo4j; and CONTRIBUTED_TO relationships linking each user to the repositories they contributed to. If you have not yet built this graph in previous parts of this book, the data and loading instructions are available in this book's GitHub repository.
The first step in using similarity algorithms is to build the data associated with each user, here the list of repositories they contributed to:
MATCH (user:User)-[:CONTRIBUTED_TO]->(repo:Repository)
WITH {item: user.login, categories: collect(repo.name)} AS userData
RETURN userData
For a given user with login j, userData contains:
{
"item": "j",
"categories": [
"cypher-shell",
"neo4j-ogm",
"docker-neo4j",
"doctools"
]
}
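To make the aggregation concrete, here is a minimal Python sketch of what the query computes, assuming the CONTRIBUTED_TO relationships are available as (login, repository name) pairs. The helper `build_user_data` and the sample pairs (including the extra user k) are hypothetical; user j's rows simply reproduce the output shown above. As in the Cypher query, the login is the grouping key and repository names are collected into a list, as collect(repo.name) does.

```python
from collections import defaultdict


def build_user_data(contributions):
    """Group (login, repo_name) pairs into one {item, categories} dict per user.

    Hypothetical helper mirroring the Cypher query: each distinct login is the
    grouping key, and repository names are gathered into a list, like
    collect(repo.name). Duplicate pairs are kept, as collect() keeps them.
    """
    repos_by_user = defaultdict(list)
    for login, repo_name in contributions:
        repos_by_user[login].append(repo_name)
    # One map per user, in the order logins were first seen.
    return [
        {"item": login, "categories": repos}
        for login, repos in repos_by_user.items()
    ]


# Hypothetical edge list: user j's rows reproduce the example above.
contributions = [
    ("j", "cypher-shell"),
    ("j", "neo4j-ogm"),
    ("j", "docker-neo4j"),
    ("j", "doctools"),
    ("k", "neo4j-ogm"),
]

for user_data in build_user_data(contributions):
    print(user_data)
```

One difference to keep in mind: this sketch preserves input order, whereas the order of the names returned by collect() depends on the order in which Cypher produces the rows, so it may vary between runs or database versions.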
In this case, computing the similarity between two users means comparing the repositories...