Computing the shortest distance between nodes is a very basic approach and it does not take into account possible connections between nodes. An elaborated version, known as the Katz index, consists of summing all paths between two nodes, with increasing lengths. It is computed using the following formula:
score(u, v) = ∑l βl p(u, v; l)
Let's look at each of the components in detail:
- l is the path length, from 1 to ∞.
- p(u, v; l) is the number of paths of length l between nodes u and v.
- β is a weight parameter whose value is chosen between 0 and 1 in order to give more weights to the shortest distances.
In practice, the adjacency matrix is used to compute this score, since it can be shown that the number of paths of length l between u and v is equal to the uv element of the adjacency matrix to the power l:
score(u, v) = ∑l βl A(u, v)l
The Katz index has been shown to be among the best-performing, path-based link prediction...