You're reading from scikit-learn Cookbook, Second Edition: Over 80 recipes for machine learning in Python with scikit-learn

Product type Paperback

Published in Nov 2017

Publisher Packt

ISBN-13 9781787286382

Length 374 pages

Edition 2nd Edition

Languages

Python

Tools

Scikit-learn

Concepts

Machine Learning

Author (1):

Trent Hauck

Loading the iris dataset

To perform machine learning with scikit-learn, we need some data to start with. We will load the iris dataset, one of several small datasets that ship with scikit-learn.

Getting ready

A scikit-learn program typically begins with several imports. Within Python, preferably in a Jupyter Notebook, import the numpy, pandas, and pyplot libraries:

import numpy as np                # Load the numpy library for fast array computations
import pandas as pd               # Load the pandas data-analysis library
import matplotlib.pyplot as plt   # Load the pyplot visualization library

If you are working in a Jupyter Notebook, run the following so that plots are displayed inline, directly below the cell that produces them:

%matplotlib inline

How to do it...

From the scikit-learn datasets module, access the iris dataset:

from sklearn import datasets
iris = datasets.load_iris()
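
As a quick check that the load worked, the returned object can be inspected. The following is a minimal sketch, assuming a standard scikit-learn and pandas installation; the names X, y, and df are illustrative and not part of the recipe:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# The loader returns a dictionary-like object; list what it contains
print(iris.keys())

X = iris.data      # feature matrix: one row per flower, one column per measurement
y = iris.target    # integer class label for each flower

print(X.shape)              # (150, 4)
print(iris.feature_names)   # names of the four measurements
print(iris.target_names)    # names of the three species

# Optional: wrap the features in a DataFrame for easier viewing (df is an illustrative name)
df = pd.DataFrame(X, columns=iris.feature_names)
df["target"] = y
print(df.head())
```

The DataFrame step is only a convenience for browsing the data; scikit-learn estimators work directly with the NumPy arrays in iris.data and iris.target.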

How it works...

Similarly, you could have loaded the diabetes dataset as follows:

from sklearn import datasets  # Import the datasets module from scikit-learn
diabetes = datasets.load_diabetes()

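There! You've loaded the diabetes dataset using the load_diabetes() function of the datasets module. To check which datasets are available, use the wildcard search of IPython or a Jupyter Notebook (this syntax does not work in a plain Python interpreter):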

datasets.load_*?
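
Outside IPython, a similar listing can be produced with ordinary Python. This is a small sketch that filters the module's attribute names by prefix; the variable name loaders is illustrative, and the exact list depends on the installed scikit-learn version:

```python
from sklearn import datasets

# Collect every attribute of the datasets module whose name starts with "load_"
loaders = sorted(name for name in dir(datasets) if name.startswith("load_"))

for name in loaders:
    print(name)
```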

Once you try that, you will see that the list includes a loader named datasets.load_digits. To access that dataset, call the load_digits() function, just like the other loading functions: