Binary classification model evaluation using Spark 2.0
In this recipe, we demonstrate the use of the BinaryClassificationMetrics facility in Spark 2.0 and its application to evaluating a model that has a binary outcome (for example, a logistic regression).
The purpose here is not to showcase the regression itself, but to show how to evaluate it using common metrics such as the receiver operating characteristic (ROC) curve, the area under the ROC curve, thresholds, and so on.
We recommend that you concentrate on step 8, since we cover regression in detail in Chapter 5, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I, and Chapter 6, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II.
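Before walking through the steps, it may help to see the evaluation facility in isolation. The following is a minimal sketch, not the recipe's own code: it assumes a local Spark session and feeds BinaryClassificationMetrics a handful of hand-made (score, label) pairs instead of the output of a trained model. The object name BinaryMetricsSketch, the helper metricsFor, and the sample values are all hypothetical and exist only for illustration.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

object BinaryMetricsSketch {

  // Hypothetical (score, label) pairs, made up for this sketch.
  // The score is the model's predicted probability of the positive class;
  // the label is the true class, 1.0 (positive) or 0.0 (negative).
  val sample: Seq[(Double, Double)] = Seq(
    (0.9, 1.0), (0.8, 1.0), (0.7, 0.0),
    (0.4, 1.0), (0.3, 0.0), (0.1, 0.0)
  )

  // Wrap the pairs in an RDD and hand them to the metrics facility.
  def metricsFor(spark: SparkSession,
                 pairs: Seq[(Double, Double)]): BinaryClassificationMetrics = {
    val scoreAndLabels = spark.sparkContext.parallelize(pairs)
    new BinaryClassificationMetrics(scoreAndLabels)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is acceptable for experimentation.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("BinaryMetricsSketch")
      .getOrCreate()

    val metrics = metricsFor(spark, sample)

    // Single-number summaries of the ROC and precision-recall curves.
    println(s"Area under ROC = ${metrics.areaUnderROC()}")
    println(s"Area under PR  = ${metrics.areaUnderPR()}")

    // Points on the ROC curve as (false positive rate, true positive rate).
    metrics.roc().collect().foreach { case (fpr, tpr) =>
      println(f"FPR=$fpr%.3f TPR=$tpr%.3f")
    }

    // Precision obtained at each distinct score used as a threshold.
    metrics.precisionByThreshold().collect().foreach { case (t, p) =>
      println(f"threshold=$t%.2f precision=$p%.3f")
    }

    spark.stop()
  }
}
```

In a real evaluation, the (score, label) pairs would come from applying a trained model to held-out data rather than from a hard-coded sequence; the calls on the metrics object stay the same.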
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages...