
You're reading from scikit-learn Cookbook, Second Edition: Over 80 recipes for machine learning in Python with scikit-learn

Product type: Paperback

Published: Nov 2017

Publisher: Packt

ISBN-13: 9781787286382

Length: 374 pages

Edition: 2nd Edition

Languages: Python

Tools: scikit-learn

Concepts: Machine Learning

Author: Trent Hauck



Regression metrics

Cross-validation with a regression metric is straightforward in scikit-learn. Either import a score function from sklearn.metrics and wrap it with make_scorer, or create a custom scorer for a particular data science problem.
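A minimal sketch of both approaches follows. It assumes synthetic data from make_regression and a LinearRegression model as stand-ins, and the custom scorer max_abs_error_score is a hypothetical example, not a scikit-learn function. Because scikit-learn scorers follow a "higher is better" convention, an error metric is wrapped with greater_is_better=False, which makes the scorer return the negated error. A custom scorer can also be any callable with the signature (estimator, X, y).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption; not the dataset used in the recipe)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Option 1: wrap an existing metric from sklearn.metrics.
# greater_is_better=False marks it as a loss, so the scorer returns -MAE.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

# Option 2: a custom scorer with the (estimator, X, y) signature.
# Hypothetical metric: the worst absolute error, negated so higher is better.
def max_abs_error_score(estimator, X, y):
    y_pred = estimator.predict(X)
    return -np.max(np.abs(y - y_pred))

lr = LinearRegression()
mae_scores = cross_val_score(lr, X, y, cv=5, scoring=mae_scorer)
max_err_scores = cross_val_score(lr, X, y, cv=5, scoring=max_abs_error_score)

# Negate again to report the errors as positive numbers
print("Mean MAE:", -mae_scores.mean())
print("Mean max abs error:", -max_err_scores.mean())
```

Both scorers are passed to cross_val_score through the same scoring parameter, so switching metrics does not require changing the cross-validation code.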

Getting ready

Load a dataset with a continuous target, so that a regression metric applies. We will load the Boston housing dataset and split it into training and test sets:

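```python
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this recipe requires an older scikit-learn release
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split, cross_val_score

boston = load_boston()
X = boston.data    # feature matrix
y = boston.target  # median home value in $1000s (continuous target)

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
```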
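With the data split, cross_val_score can score a model on the training set with a regression metric. The sketch below is self-contained, so it assumes synthetic data from make_regression (13 features, matching the Boston feature count) in place of the housing data, and a LinearRegression baseline; both are illustrative choices. It passes built-in scorer names as strings. Error metrics carry a neg_ prefix because scikit-learn scorers are maximized.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the housing data (an assumption)
X, y = make_regression(n_samples=500, n_features=13, noise=5.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

lr = LinearRegression()

# Built-in scorer names; losses are negated, so values closer to 0 are better
mse_scores = cross_val_score(lr, X_train, y_train, cv=5, scoring='neg_mean_squared_error')
r2_scores = cross_val_score(lr, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated MSE:", -mse_scores.mean())
print("Cross-validated R^2:", r2_scores.mean())
```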

We do not know much about the dataset, so we can try a quick randomized search over the number of neighbors using a high-variance algorithm, k-nearest neighbors regression:

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import RandomizedSearchCV

knn_reg = KNeighborsRegressor()
# Candidate values for n_neighbors: 3 through 19
param_dist = {'n_neighbors': list(range(3, 20))}
rs = RandomizedSearchCV(knn_reg,param_dist...
```
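A randomized search can also be told which regression metric to optimize through its scoring parameter. The sketch below is not a reconstruction of the call above. It is a self-contained example under our own assumptions: synthetic data from make_regression, n_iter=10, 5-fold cross-validation, and scoring='neg_mean_absolute_error'. Because the scorer is negated, best_score_ is negated back to report the mean absolute error.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption); use the recipe's training split in practice
X, y = make_regression(n_samples=500, n_features=13, noise=5.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

knn_reg = KNeighborsRegressor()
param_dist = {'n_neighbors': list(range(3, 20))}

# n_iter, cv and scoring are illustrative choices, not the book's settings
rs = RandomizedSearchCV(knn_reg, param_dist, n_iter=10, cv=5,
                        scoring='neg_mean_absolute_error', random_state=7)
rs.fit(X_train, y_train)

print("Best n_neighbors:", rs.best_params_['n_neighbors'])
print("Best CV MAE:", -rs.best_score_)

# refit=True (the default) retrains the best model on all training data,
# so rs.predict can be scored on the held-out test set with the same metric
test_mae = mean_absolute_error(y_test, rs.predict(X_test))
print("Test MAE:", test_mae)
```

Using the same metric for the search and for the final test-set check keeps the model selection and the reported error consistent.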