Identifying the target variable of the logistic regression model
A logistic regression model operates as a classification algorithm aiming to predict a binary outcome. In this section, we will specify the best column within the dataset to predict whether an incoming call to the operator is related to fire or non-fire incidents.
Getting ready
We will visualize many of the data points in this section, which will require the following:
- Ensuring that
matplotlib
is installed by executingpip install matplotlib
at the command line. - Running
import matplotlib.pyplot as plt
as well as ensuring graphs are viewed within cells by running%matplotlib inline
.
Additionally, there will be some manipulation of functions within pyspark.sql
that requires importing functions as F
.
How to do it...
This section will walk through visualizing the data from the San Francisco Fire Department.
- Execute the following script to get a cursory identification of the unique values in the
Call Type Group
column:
df.select('Call Type...