One-hot encoding transforms a categorical column into a set of binary columns, one per category. Each value is replaced by a 1 in the column for its own category and a 0 in every other column. For example, suppose the color variable has three categories: red, green, and blue. These three categories are encoded into three binary columns, as shown in the following diagram:
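The color example can also be sketched in plain Python. This is only a minimal illustration of the idea under stated assumptions: the one_hot helper and the sample values are hypothetical and do not come from any library or from the dataset used later.

```python
# Hypothetical helper: one-hot encode a list of category labels by hand.
def one_hot(values):
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    # One row per value: 1 in the column matching its label, 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]  # sample data, assumed for illustration
columns, encoded = one_hot(colors)

print(columns)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from. In practice, a library function would normally be used instead of a hand-written helper like this.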
One-hot encoding can also be performed using the pandas get_dummies() function. Let's apply it to the gender column of an employee dataset:
# Import pandas
import pandas as pd
# Read the data
data = pd.read_csv('employee.csv')
# Dummy encoding; dtype=int keeps the values as 0/1 (newer pandas versions default to True/False)
encoded_data = pd.get_dummies(data['gender'], dtype=int)
# Join encoded_data with the original dataframe
data = data.join(encoded_data)
# Check the top-5 records of the dataframe
data.head()
This results in the following output:
Here, we can see two extra columns, F and M. Both are dummy columns added by get_dummies(): each row is marked in the column that matches its gender value. We can also perform the same task with OneHotEncoder...