FP-Growth algorithm
We will apply the FP-Growth algorithm to find frequently recommended movies.
The FP-Growth algorithm has been described in the paper by Han et al., Mining frequent patterns without candidate generation available at: http://dx.doi.org/10.1145/335191.335372, where FP stands for the frequent pattern. For given a dataset of transactions, the first step of FP-Growth is to calculate item frequencies and identify frequent items. The second step of FP-Growth algorithm implementation uses a suffix tree (FP-tree) structure to encode transactions; this is done without generating candidate sets explicitly, which are usually expensive to generate for large datasets.
FP-Growth Basic Sample
Let's start with a very simple dataset of random numbers:
val transactions = Seq( "r z h k p", "z y x w v u t s", "s x o n r", "x z y m t s q e", "z", "x z y r q t p") .map(_.split(" "))
We will find out the most frequent items (character in this case)...