Chapter 5: Mastering Structured Data
Activity 14: Training and Predicting the Income of a Person
Solution:
- Import the libraries and load the income dataset using pandas. First, import pandas, and then read the data using read_csv:
import pandas as pd
import xgboost as xgb
import numpy as np
from sklearn.metrics import accuracy_score

data = pd.read_csv("../data/adult-data.csv",
                   names=['age', 'workclass', 'education-num', 'occupation',
                          'capital-gain', 'capital-loss', 'hours-per-week', 'income'])
We pass the column names because the data file has no header row of its own. Naming the columns here makes them easy to refer to in the later steps.
- Use LabelEncoder from sklearn to encode strings. First, import LabelEncoder. Then, encode all string categorical columns one by one:
from sklearn.preprocessing import LabelEncoder
data['workclass'] = LabelEncoder().fit_transform(data['workclass'...