Community features, either wcc or lv, are categorical features. Let's imagine that node A belongs to community 1, node B to community 2, and node C to community 35. We cannot assume that nodes A and B are more similar than nodes A and C because their community number is closer. We just know that nodes A and B do not belong to the same community, exactly like A and C or B and C.
One way of handling categorical features in machine learning is to transform them through a one-hot encoder. Its role is to transform a vector feature with N categories into N vectors, having values of either 0 or 1:
[ [
1, [1 0 0]
2, [0 1 0]
3, [0 0 1]
1, = [1 0 0]
1, [1 0 0]
3 [0 0 1]
] ]
However, since our wcc feature contains 129 unique values, this would add 129 features to our model. This is too many, especially considering that we only have a few hundred observations! To avoid trouble with dimensionality, we will only consider communities with at least two observations...