The Jaccard similarity is defined much like the overlap similarity, except that the denominator is the size of the union of sets A and B:
J(A, B) = |A ∩ B| / |A ∪ B|
Using the union of both sets in the denominator is useful for handling the case where a user has contributed to a single repository that has many other contributors. With the overlap similarity, this user would have a similarity of 1 to every other contributor of that repository. With the Jaccard formula, the similarity depends on the number of repositories each of the other users contributed to, and it is equal to 1 only for contributors who have also contributed to that single repository and to no other.
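To make the difference concrete, here is a minimal Python sketch that compares the two measures on plain sets. The user and repository names are hypothetical and are not taken from the graph used in this chapter. The overlap similarity is assumed to be the size of the intersection divided by the size of the smaller set.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def overlap(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0  # convention chosen here when one set is empty
    return len(a & b) / smaller


# Hypothetical contribution sets: user name -> repositories contributed to
contributions = {
    "single_repo_user": {"repo_a"},
    "busy_user": {"repo_a", "repo_b", "repo_c", "repo_d"},
    "other_single_repo_user": {"repo_a"},
}

single = contributions["single_repo_user"]
busy = contributions["busy_user"]
other_single = contributions["other_single_repo_user"]

print(overlap(single, busy))          # 1.0: the smaller set is fully contained
print(jaccard(single, busy))          # 0.25: 1 shared repository out of 4 in the union
print(jaccard(single, other_single))  # 1.0: both users contributed only to repo_a
```

The overlap score cannot tell the busy contributor apart from the other single-repository user, whereas the Jaccard score penalizes the three repositories that are not shared.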
Running the Jaccard similarity algorithm on this projected graph is as simple as this:
MATCH (u:User)
MATCH (v:User)
RETURN u, v, gds.alpha.similarity.jaccard(u, v) as score
You can check the similarity between systay and the other users in this graph and notice that, now, the similarity...