Using classification models
We now have four models trained on our input labels and features. Next, we will see how to use these models to make predictions on our dataset. For now, we will use the same training data to illustrate the predict method of each model.
Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset
We will use our logistic regression model as an example (the other models are used in the same way):
val dataPoint = data.first
val prediction = lrModel.predict(dataPoint.features)
The following is the output:
prediction: Double = 1.0
We can see that, for the first data point in our training dataset, the model predicted a label of 1 (that is, evergreen). Let's examine the true label for this data point:
val trueLabel = dataPoint.label
You can see the following output:
trueLabel: Double = 0.0
So, in this case, our model got it wrong!
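To make the comparison between a prediction and its true label more concrete, the following is a minimal plain-Scala sketch of how a logistic regression model could turn a feature vector into a 0/1 label: it computes the dot product of the weights and the features, adds the intercept, applies the logistic function, and thresholds the result at 0.5. The object name, the weights, the intercept, and the feature values are all hypothetical and chosen only for illustration; this is not the trained lrModel, and it is not intended as MLlib's actual implementation.

```scala
// Illustrative sketch only: the weights, intercept, and features below are
// hypothetical values, not those of the trained lrModel.
object LogisticPredictSketch {
  // Logistic (sigmoid) function: maps a real-valued margin into (0, 1).
  def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  // Predict a 0.0/1.0 label by thresholding the estimated probability.
  def predict(weights: Array[Double], intercept: Double,
              features: Array[Double], threshold: Double = 0.5): Double = {
    require(weights.length == features.length, "dimension mismatch")
    // Margin = weights . features + intercept
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    if (sigmoid(margin) > threshold) 1.0 else 0.0
  }

  def main(args: Array[String]): Unit = {
    val weights = Array(0.8, -0.4)   // hypothetical model weights
    val features = Array(1.0, 0.5)   // hypothetical feature vector
    val prediction = predict(weights, 0.0, features)
    val trueLabel = 0.0              // hypothetical true label
    println(s"prediction = $prediction, correct = ${prediction == trueLabel}")
  }
}
```

With these made-up values, the margin is positive, so the estimated probability exceeds 0.5 and the sketch predicts 1.0 even though the true label is 0.0, which mirrors the misclassification we just observed.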
We can also make predictions in bulk by passing in an RDD[Vector] as input:
val predictions = lrModel.predict(data.map(lp => lp...