Introduction to scikit-learn
Entire books have been written on scikit-learn (http://scikit-learn.org/stable/). The scikit-learn library has numerous submodules, only a few of which will be used in this book (in Chapter 7, Making Predictive Models in Healthcare); these include the sklearn.linear_model and sklearn.ensemble submodules. Here we will give an overview of some of the more commonly used submodules. For convenience, we have grouped the relevant modules according to the segments of the data science pipeline discussed in Chapter 1, Introduction to Healthcare Analytics.
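As a rough illustration of how the submodules are organized, the following minimal sketch imports one estimator from each of the two submodules named above and fits both to the same data. The data here is synthetic and generated only for this example (it is not from the book), and the choice of LogisticRegression and RandomForestClassifier is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration only):
# 200 rows, 5 numeric features, and a binary label that
# depends on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Estimators from different submodules share the same fit/score interface.
linear_clf = LogisticRegression().fit(X, y)
forest_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Accuracy on the training data itself (not a measure of generalization).
linear_acc = linear_clf.score(X, y)
forest_acc = forest_clf.score(X, y)
print("linear_model accuracy:", linear_acc)
print("ensemble accuracy:", forest_acc)
```

Both estimators are trained and scored through the same fit and score methods, which is why the choice of submodule mostly comes down to which family of model is wanted.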
Sample data
scikit-learn includes several sample datasets in the sklearn.datasets submodule. At least two of these datasets, sklearn.datasets.load_breast_cancer and sklearn.datasets.load_diabetes, are healthcare-related. These datasets have already been preprocessed and are small, each with at most a few dozen features and a few hundred patients. The data we will use in Chapter 7, Making Predictive...
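A minimal sketch of loading the two healthcare-related sample datasets mentioned above is shown here. The variable names are chosen for illustration only; the loader functions themselves ship with scikit-learn and need no download.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes

# Each loader returns a Bunch object with .data (features),
# .target (labels or outcome values), and descriptive metadata.
cancer = load_breast_cancer()
diabetes = load_diabetes()

# Shapes are (number of patients, number of features).
print("breast cancer:", cancer.data.shape)   # (569, 30)
print("diabetes:", diabetes.data.shape)      # (442, 10)

# The breast cancer task is classification; the diabetes task is regression.
print("breast cancer classes:", list(cancer.target_names))
print("diabetes feature names:", diabetes.feature_names)
```

The printed shapes bear out the small size described above: a few hundred patients and no more than a few dozen features in either dataset.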