Creating sample data for toy analysis
If possible, use some of your own data as you work through this book; if you cannot, we'll learn how to use scikit-learn to create toy data. scikit-learn's artificial, theoretically constructed data is also interesting in its own right.
Getting ready
Much like the functions for loading built-in datasets (load_*) and fetching new datasets (fetch_*), the functions for creating sample datasets follow the naming convention make_*. Just to be clear, this data is purely artificial:
from sklearn import datasets
datasets.make_*?
datasets.make_biclusters
datasets.make_blobs
datasets.make_checkerboard
datasets.make_circles
datasets.make_classification
...
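The ? wildcard search above only works inside an IPython or Jupyter session. As a rough equivalent in plain Python, the generator functions can be listed by filtering the module's public names. The variable name make_functions is only an illustration, and the exact list printed depends on the installed scikit-learn version:

```python
from sklearn import datasets

# Collect every public name in sklearn.datasets that follows the make_* convention.
# make_functions is an illustrative name, not something the library defines.
make_functions = sorted(name for name in dir(datasets) if name.startswith("make_"))

for name in make_functions:
    print(f"datasets.{name}")
```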
To save typing, import the datasets module as d, and numpy as np:
import sklearn.datasets as d
import numpy as np
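As a quick check that both aliases work, the sketch below generates a small set of blobs and inspects it with NumPy. The parameter values (150 samples, 2 features, 3 centers, and a seed of 0) are arbitrary choices for illustration, not values taken from this recipe:

```python
import sklearn.datasets as d
import numpy as np

# Generate a small artificial clustering dataset; all parameter values are arbitrary.
X, y = d.make_blobs(n_samples=150, n_features=2, centers=3, random_state=0)

print(X.shape)       # shape of the feature matrix: (samples, features)
print(y.shape)       # one integer cluster label per sample
print(np.unique(y))  # the distinct cluster labels that were generated
```

Fixing random_state makes the generated data reproducible from run to run, which is convenient when comparing algorithms on the same toy data.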
How to do it...
This section will walk you through the creation of several datasets. In addition to the sample datasets, these will be used throughout the book to create data with the necessary characteristics for the algorithms...