Taking a more fundamental approach to regularization with LARS
To borrow from Gilbert Strang's evaluation of Gaussian elimination, LARS is an idea you probably would have considered eventually had it not already been discovered by Efron, Hastie, Johnstone, and Tibshirani in their work [1].
Getting ready
LARS (least angle regression) is a regression technique that is well suited to high-dimensional problems, that is, p >> n, where p denotes the number of columns, or features, and n is the number of samples. It works by moving the coefficient estimates in the direction equiangular to the predictors most correlated with the current residual, adding one predictor to the active set at a time, which makes it natural to stop after a fixed number of nonzero coefficients.
How to do it...
- First, import the necessary objects. The data we use will have 200 data points and 500 features. We'll also choose low noise and a small number of informative features:
```python
from sklearn.datasets import make_regression

reg_data, reg_target = make_regression(n_samples=200, n_features=500,
                                       n_informative=10, noise=2)
```
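To confirm that this really is a p >> n problem, a quick sanity check on the generated arrays (assuming the `reg_data` and `reg_target` names from the previous snippet) is:

```python
# 500 features but only 200 samples, so p >> n
print(reg_data.shape)    # (200, 500)
print(reg_target.shape)  # (200,)
```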
- Since we used 10 informative features, let's also specify that we want 10 nonzero coefficients in LARS. We will probably not know the exact number of informative features beforehand, but it's useful for learning purposes; a minimal fit is sketched below.
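A minimal sketch of this step, using scikit-learn's `Lars` estimator and the `reg_data` and `reg_target` arrays generated above:

```python
import numpy as np
from sklearn.linear_model import Lars

# Cap the active set at 10 nonzero coefficients, matching n_informative above
lars = Lars(n_nonzero_coefs=10)
lars.fit(reg_data, reg_target)

# Count how many coefficients LARS actually kept nonzero
print(np.sum(lars.coef_ != 0))
```

Because `n_nonzero_coefs` caps the size of the active set, the fitted `lars.coef_` vector will have at most 10 nonzero entries, with the remaining coefficients left at exactly zero.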