Loading the iris dataset
To perform machine learning with scikit-learn, we need some data to start with. We will load the iris dataset, one of several datasets bundled with scikit-learn.
Getting ready
A scikit-learn program begins with several imports. Within Python, preferably in a Jupyter Notebook, load the numpy, pandas, and pyplot libraries:
import numpy as np  # Load the numpy library for fast array computations
import pandas as pd  # Load the pandas data-analysis library
import matplotlib.pyplot as plt  # Load the pyplot visualization library
If you are working in a Jupyter Notebook, type the following so that plots appear directly below the cell that produces them:
%matplotlib inline
How to do it...
- From the scikit-learn datasets module, access the iris dataset:
from sklearn import datasets
iris = datasets.load_iris()
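The object returned by load_iris() is a scikit-learn Bunch, a dictionary-like container whose attributes hold the feature matrix, the labels, and their names. The following is a minimal sketch for inspecting it and wrapping the features in a pandas DataFrame, assuming a recent scikit-learn release; the variable name iris_df and the extra species column are illustrative choices, not part of the recipe:

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)        # (150, 4)
# Integer class labels 0, 1, 2, one per row of iris.data
print(iris.target.shape)      # (150,)
print(iris.feature_names)     # names of the four measurement columns
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica']

# Hypothetical DataFrame view: features as columns plus a readable species label
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]
print(iris_df.head())

# Count samples per class
print(np.bincount(iris.target))  # [50 50 50]
```

Because pandas indexes the columns by name, this view is convenient for quick summaries such as iris_df.describe() before any modeling.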
How it works...
Similarly, you could have imported the diabetes dataset as follows:
from sklearn import datasets  # Import the datasets module from scikit-learn
diabetes = datasets.load_diabetes()
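The diabetes Bunch has the same layout as iris, but its target is a continuous measure of disease progression rather than a class label, so it suits regression. A short sketch, assuming a recent scikit-learn release, that inspects it and also shows the return_X_y=True option, which returns the features and target as a tuple instead of a Bunch:

```python
from sklearn import datasets

diabetes = datasets.load_diabetes()
print(diabetes.data.shape)     # (442, 10)
print(diabetes.target.shape)   # (442,)
print(diabetes.feature_names)  # ten feature names such as 'age', 'bmi', 'bp'

# return_X_y=True skips the Bunch and returns (features, target) directly
X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape, y.shape)        # (442, 10) (442,)
```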
There! You've loaded the diabetes dataset using the load_diabetes() function of the datasets module. To check which datasets are available, type:
datasets.load_*?
Once you try that, you might observe that there is a function named datasets.load_digits. To access the digits dataset, call the load_digits() function, analogous to the other loading functions:
digits = datasets.load_digits()
To view information about the dataset, type digits.DESCR (wrap it in print() to render its line breaks).
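The trailing ? in datasets.load_*? is IPython/Jupyter wildcard search and does not work in a plain Python script. As a sketch under that assumption, the first part below lists the same loader names by filtering dir(datasets); note that it also picks up generic helpers such as load_files, which are not bundled datasets. The second part inspects the digits Bunch, whose samples are 8x8 grayscale images stored both as images and as flattened 64-feature rows in data:

```python
from sklearn import datasets

# Plain-Python alternative to the IPython wildcard search datasets.load_*?
loaders = sorted(name for name in dir(datasets) if name.startswith("load_"))
print(loaders)

digits = datasets.load_digits()
# Each sample is an 8x8 image of a handwritten digit
print(digits.images.shape)   # (1797, 8, 8)
# The same pixels flattened into 64 features per sample
print(digits.data.shape)     # (1797, 64)
print(digits.target_names)   # [0 1 2 3 4 5 6 7 8 9]

# print() renders the line breaks stored in the DESCR string
print(digits.DESCR[:300])
```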