Let's build a classifier using logistic regression:
- Load the dataset into a Dask DataFrame, as follows:
# Read CSV file using Dask
import dask.dataframe as dd
# Read Human Resource Data
ddf = dd.read_csv("HR_comma_sep.csv")
# Let's see top 5 records
ddf.head()
This results in the following output:
In the preceding code, we read the human resource CSV file into a Dask DataFrame using the read_csv() function. The preceding output shows only some of the available columns. However, if you run the notebook yourself, you will be able to see all the columns in the dataset. Now, let's scale the last_evaluation column (the last evaluated performance score).
- Next, select the required columns for classification and divide them into independent variables (features) and the dependent variable (label):
# Independent variables (features)
data = ddf[['satisfaction_level', 'last_evaluation']].to_dask_array(lengths=True)
# Dependent variable (label)
label = ddf['left'].to_dask_array(lengths=True)
- Now, let's create a LogisticRegression...