Exploring ML pipelines and DataFrames using logistic regression in Spark 2.0
We have gone out of our way to present the code in detail and as simply as possible so you can get started without the additional syntactic sugar that Scala uses.
Getting ready
In this recipe, we combine ML pipelines and logistic regression to demonstrate how various steps can be chained in a single pipeline that operates on DataFrames as they are transformed and travel through the pipe. To keep the program short, we skip some steps, such as splitting the data and model evaluation, and reserve them for later chapters, but we still provide a full treatment of pipelines, DataFrames, estimators, and transformers in a single recipe.
This recipe explores the details of the pipeline and of the DataFrames as they travel through it and are operated on.
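To give a rough idea of the shape of such a program before walking through the steps, here is a minimal sketch (not the recipe's actual listing) of a two-stage pipeline: a `VectorAssembler` transformer followed by a `LogisticRegression` estimator. The object name `PipelineSketch`, the helper `buildPipeline`, the column names `f1` and `f2`, the toy data, and the hyperparameter values are all assumptions made for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Hypothetical helper: chains a transformer and an estimator into one Pipeline.
  def buildPipeline(): Pipeline = {
    // Transformer: packs the raw numeric columns into a single vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    // Estimator: learns a model from the "features" and "label" columns
    // (these are the default column names for LogisticRegression).
    val lr = new LogisticRegression()
      .setMaxIter(10)      // illustrative value, not tuned
      .setRegParam(0.01)   // illustrative value, not tuned

    new Pipeline().setStages(Array(assembler, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up toy data: (label, f1, f2).
    val training = Seq(
      (0.0, 1.1, 0.1),
      (1.0, 2.0, 1.0),
      (0.0, 1.3, -0.5),
      (1.0, 2.2, 1.4)
    ).toDF("label", "f1", "f2")

    // fit() runs each stage in order and returns a PipelineModel,
    // which is itself a transformer.
    val model: PipelineModel = buildPipeline().fit(training)

    // transform() appends the assembled features and the prediction columns.
    model.transform(training)
      .select("label", "features", "probability", "prediction")
      .show(false)

    spark.stop()
  }
}
```

Calling `fit` on the pipeline yields a `PipelineModel`, so the same sequence of stages can later be applied to new DataFrames with a single `transform` call. The training data is reused for prediction here only to keep the sketch short.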
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside...