Implementation steps
A good way to start is to download the skeleton SBT project archive from the ModernScalaProjects_Code folder.
The step-by-step instructions are as follows:
- Perform exploratory data analysis (EDA) on the testing (cross-validation) dataset.
- Calculate the probability densities.
- Generate a fraud detection model.
- Generate scores that measure the accuracy of the model:
  - Compute the best F1 score.
  - Compute the best error term.
- Calculate outliers by having the model repeatedly generate predictions, once for each value of the error term in a range.
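The steps above can be sketched in plain Scala, without Spark, to show how the pieces fit together. This is only an illustrative sketch under stated assumptions, not the pipeline built in this chapter: it assumes each feature follows an independent Gaussian distribution, it treats the error term as a density threshold (epsilon) below which a row is flagged as fraud, and it sweeps that threshold in 1,000 equal steps between the smallest and largest density. The object name FraudSketch and all of its method names are hypothetical.

```scala
object FraudSketch {

  // Per-feature mean and (population) variance of the training rows.
  def meanAndVariance(rows: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
    val n = rows.size.toDouble
    val dims = rows.head.length
    val mean = Array.tabulate(dims)(j => rows.map(_(j)).sum / n)
    val variance =
      Array.tabulate(dims)(j => rows.map(r => math.pow(r(j) - mean(j), 2)).sum / n)
    (mean, variance)
  }

  // Probability density of one row: the product of univariate Gaussian
  // densities, one per feature (assumes the features are independent).
  def density(row: Array[Double], mean: Array[Double], variance: Array[Double]): Double =
    row.indices.map { j =>
      math.exp(-math.pow(row(j) - mean(j), 2) / (2 * variance(j))) /
        math.sqrt(2 * math.Pi * variance(j))
    }.product

  // F1 score when every row whose density is below epsilon is predicted as fraud.
  def f1(densities: Seq[Double], isFraud: Seq[Boolean], epsilon: Double): Double = {
    val pairs = densities.map(_ < epsilon).zip(isFraud)
    val tp = pairs.count { case (p, a) => p && a }.toDouble
    val fp = pairs.count { case (p, a) => p && !a }.toDouble
    val fn = pairs.count { case (p, a) => !p && a }.toDouble
    if (tp == 0) 0.0
    else {
      val precision = tp / (tp + fp)
      val recall = tp / (tp + fn)
      2 * precision * recall / (precision + recall)
    }
  }

  // Sweep epsilon over the density range and return (best epsilon, best F1).
  // The default of 1000 steps is an assumption made for this sketch.
  def bestEpsilon(
      densities: Seq[Double],
      isFraud: Seq[Boolean],
      steps: Int = 1000): (Double, Double) = {
    val lo = densities.min
    val stepSize = (densities.max - lo) / steps
    (1 to steps)
      .map(i => lo + i * stepSize)
      .map(eps => (eps, f1(densities, isFraud, eps)))
      .maxBy(_._2)
  }
}
```

In use, the mean and variance would be estimated from the training rows, the densities computed for the labeled cross-validation rows, and the epsilon returned by bestEpsilon applied as `density < epsilon` to flag outliers.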
We will now create a FraudDetection trait.
Create the FraudDetection trait
In an empty FraudDetectionPipeline.scala file, add the following imports. We need them for logging, feature vector creation, and DataFrame and SparkSession, respectively:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}
This is an all-important trait, holding a method for SparkSession creation and other code....