Applying the logistic regression model
The stage is now set to apply the model to the dataframe.
Getting ready
This section focuses on applying logistic regression, a widely used classification model, which requires the following imports from Spark:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.classification import LogisticRegression
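For orientation, logistic regression scores a row by passing a weighted sum of its feature values through the sigmoid function, which yields a probability between 0 and 1. The following is a minimal plain-Python sketch of that calculation, not Spark's implementation. The helper names, weights, intercept, and feature values are all hypothetical and chosen only for illustration; in practice the weights are learned from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, intercept):
    """Return the positive-class probability for one row of feature values."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical values for illustration only; a fitted model learns these.
weights = [0.8, -0.5]
intercept = 0.1
row = [1.0, 2.0]

probability = predict_probability(row, weights, intercept)

# A common convention is to predict the positive class at 0.5 or above.
label = 1 if probability >= 0.5 else 0
```

The 0.5 cutoff used here is a convention rather than a requirement, and a different threshold may suit a given problem better.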
How to do it...
This section will walk through the steps of applying our model and evaluating the results.
- Execute the following script to collect the feature variables in the dataframe (every column except the first) into a list called features:
features = df.columns[1:]
- Execute the following to import VectorAssembler and configure the fields that will be assigned to the feature vector by assigning inputCols and outputCol:
from pyspark.ml.feature import VectorAssembler

feature_vectors = VectorAssembler(
    inputCols=features,
    outputCol="features")
- Execute the following script to apply...