Managing categorical data
In many classification problems, the target dataset is made up of categorical labels that cannot immediately be processed by every algorithm. An encoding is needed, and scikit-learn offers at least two valid options. Let's consider a very small dataset made of 10 categorical samples with 2 features each:
import numpy as np

X = np.random.uniform(0.0, 1.0, size=(10, 2))
Y = np.random.choice(('Male', 'Female'), size=(10))

X[0]
array([ 0.8236887 ,  0.11975305])

Y[0]
'Male'
The first option is to use the LabelEncoder class, which adopts a dictionary-oriented approach, associating a progressive integer with each category label; each integer is an index into an instance array called classes_:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
yt = le.fit_transform(Y)

print(yt)
[0 0 0 1 0 1 1 0 0 1]

le.classes_
array(['Female', 'Male'], dtype='|S6')
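Because each encoded integer is just an index into classes_, indexing that array with the encoded values recovers the original labels. A minimal sketch (using a small fixed label array rather than the random Y above, so the result is reproducible):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit_transform sorts the unique labels, so 'Female' -> 0 and 'Male' -> 1
yt = le.fit_transform(np.array(['Male', 'Female', 'Female', 'Male']))

print(yt)               # [1 0 0 1]
print(le.classes_[yt])  # ['Male' 'Female' 'Female' 'Male']
```

Note that the mapping is determined by sorted order of the unique labels, not by order of first appearance.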
The inverse transformation can be obtained in this simple way:
output = [1, 0, 1, 1, 0, 0]
decoded_output...
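The decoding step above is truncated; a plausible completion uses LabelEncoder's inverse_transform method, which maps each integer back to its label (a self-contained sketch, fitting the encoder on the two labels first):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(np.array(['Male', 'Female']))  # classes_ becomes ['Female', 'Male']

output = [1, 0, 1, 1, 0, 0]
# inverse_transform maps integers back through classes_: 0 -> 'Female', 1 -> 'Male'
decoded_output = le.inverse_transform(output)

print(decoded_output)  # ['Male' 'Female' 'Male' 'Male' 'Female' 'Female']
```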