This measure's mathematical definition is as follows:
O(A, B) = | A ∩ B | / min(|A|, |B|)
A ∩ B is the intersection between sets A and B (common elements) and |A| denotes the size of set A.
The GDS contains, in its alpha tier, a function that will allow us to test and understand this similarity measure. We use it in the following way:
RETURN gds.alpha.similarity.overlap(<set1>, <set2>) AS similarity
The following statement returns the result of O([1, 2, 3], [1, 2, 3, 4]):
RETURN gds.alpha.similarity.overlap([1,2,3], [1,2,3,4]) AS similarity
The intersection between sets [1, 2, 3] and [1, 2, 3, 4] contains the elements that are in both sets: 1, 2, and 3. It contains three elements, so its size is | A ∩ B | = 3. On the denominator, we need to find the size of the smallest set. In our case, the smallest set is [1, 2, 3] containing three elements. So the expected value for the overlapping similarity is 1, which is the value returned by the...