Preparing feature variables for the logistic regression model
In the previous section, we identified the target variable, which is the outcome that our logistic regression model will predict for fire calls. This section focuses on identifying the features that will best help the model predict that target. This process is known as feature selection.
Getting ready
This section requires importing StringIndexer from pyspark.ml.feature. To ensure proper feature selection, we need to map string columns to columns of indices. This generates a distinct numeric value for each category of a categorical variable, which makes it easier for the machine learning model to ingest the independent variables used to predict the target outcome.
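To make the idea of mapping strings to indices concrete, here is a minimal plain-Python sketch of the kind of mapping StringIndexer produces under its default ordering, where the most frequent label receives index 0.0. The function name index_labels and the sample values are hypothetical and serve only as an illustration; breaking ties alphabetically is an assumption made here to keep the result deterministic. In the recipe itself, this work is done by StringIndexer on a Spark dataframe.

```python
from collections import Counter


def index_labels(values):
    """Map each distinct string to a numeric index, most frequent first.

    Illustrative only: orders labels by descending frequency and breaks
    ties alphabetically (an assumption made here for determinism).
    """
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically for ties.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Hypothetical values of a categorical string column.
zones = ["north", "south", "north", "east", "south", "north"]

mapping = index_labels(zones)
indexed = [mapping[z] for z in zones]

print(mapping)
print(indexed)
```

Each distinct string ends up with its own numeric value, so a column of strings becomes a column of numbers. Note that the indices carry no meaning beyond identifying the category.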
How to do it...
This section will walk through the steps to prepare the feature variables for our model.
- Execute the following script to update the dataframe, df, by selecting only the fields that are independent of any fire indicators...