To create a dataset for a link prediction task, we need to do the following:
- Compute the link prediction score (here, the Adamic-Adar score) for each pair of nodes in the graph, using the KNOWS_T1 relationships only.
- Discard the pairs of nodes already linked to each other at t1.
- Extract the label for each of the remaining pairs; the label is True if a relationship between the two nodes exists at time t2, otherwise False.
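Before turning to Cypher, the three steps above can be sketched in plain Python on a toy graph. This is only an illustration under stated assumptions: the node IDs, the edge lists, and the helper names `adamic_adar` and `build_dataset` are hypothetical, and the score uses the standard Adamic-Adar definition (sum of 1 / ln(degree) over common neighbors), which may differ in detail from the library implementation.

```python
import math
from itertools import combinations


def adamic_adar(neighbors, u, v):
    """Standard Adamic-Adar score: sum of 1 / ln(degree) over common neighbors."""
    common = neighbors[u] & neighbors[v]
    # A common neighbor is linked to both u and v, so its degree is >= 2
    # and the logarithm is strictly positive.
    return sum(1.0 / math.log(len(neighbors[z])) for z in common)


def build_dataset(nodes, edges_t1, edges_t2):
    """Return (u, v, score, label) rows for pairs that are not linked at t1."""
    t1 = {frozenset(edge) for edge in edges_t1}
    t2 = {frozenset(edge) for edge in edges_t2}

    # Undirected adjacency built from the t1 edges only.
    neighbors = {n: set() for n in nodes}
    for a, b in edges_t1:
        neighbors[a].add(b)
        neighbors[b].add(a)

    rows = []
    # combinations() yields each unordered pair once, with u < v.
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        if pair in t1:
            continue  # discard pairs already linked at t1
        score = adamic_adar(neighbors, u, v)
        label = pair in t2  # True if the link exists at t2
        rows.append((u, v, score, label))
    return rows


# Hypothetical toy data: five nodes, five edges at t1, one new edge at t2.
nodes = [1, 2, 3, 4, 5]
edges_t1 = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
edges_t2 = edges_t1 + [(2, 3)]

rows = build_dataset(nodes, edges_t1, edges_t2)
for row in rows:
    print(row)
```

Each row has the same shape as the dataset described above: the two node IDs, the score computed from the t1 graph alone, and the label taken from the t2 graph.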
The following query performs these three operations:
MATCH (u)
MATCH (v)
// keep each unordered pair only once;
// the strict inequality also excludes u = v
WHERE u.id < v.id
// exclude edges that were already there at T1:
AND NOT ( (u)-[:KNOWS_T1]-(v) )
// compute score
WITH u, v, gds.alpha.linkprediction.adamicAdar(
    u, v, {
        relationshipQuery: "KNOWS_T1",
        direction: "BOTH"
    }
) as score
RETURN u.id as u_id,
       v.id as v_id,
       score,
       // get the label: does the edge exist...