A bidimensional example
Let's consider a small dataset built by adding some normally distributed noise to the points belonging to a segment with x bounded between -5 and 5. The original equation is y = x + 2 + η, where η is a noise term.
In the following graph, there's a plot with a candidate regression function:

A simple bidimensional dataset with a candidate regression line
The dataset is defined as follows:
import numpy as np

nb_samples = 200

X = np.arange(-5, 5, 0.05)

Y = X + 2
Y += np.random.normal(0.0, 0.5, size=nb_samples)
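For repeatable experiments, the same dataset can be sketched with a seeded generator. The seed value and the use of `np.random.default_rng` are assumptions made for illustration and are not part of the original listing; sizing the noise from `X.shape` simply avoids relying on `nb_samples` matching the length returned by `np.arange`.

```python
import numpy as np

# Assumption: a fixed, arbitrary seed so repeated runs give the same sample
rng = np.random.default_rng(1000)

# Same grid as in the listing above: start at -5, step 0.05, stop before 5
X = np.arange(-5, 5, 0.05)

# Zero-mean normal noise with standard deviation 0.5, one value per point
noise = rng.normal(0.0, 0.5, size=X.shape)
Y = X + 2 + noise

print(X.shape, Y.shape)
print(X.min(), X.max())
```

Printing the shapes and the range of `X` is a quick sanity check that the noise vector and the grid line up before any model is fitted.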
As we're working on a plane, the regressor we're looking for is a function of only two parameters (the intercept, written here as α, and the single multiplicative coefficient, written here as β), with an additive normal noise term associated with every data point x_i (formally, all η_i are independent and identically distributed (i.i.d.) variables):

$$y_i = \alpha + \beta x_i + \eta_i$$
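The two-parameter regressor described above can be sketched as a small function. The names `linear_model`, `alpha`, and `beta`, as well as the seed, are hypothetical choices for this illustration. The residual check only shows what the noise term represents when the parameters happen to match the generating line y = x + 2.

```python
import numpy as np


def linear_model(x, alpha, beta):
    """Deterministic part of the regressor: intercept plus coefficient times x."""
    return alpha + beta * x


# Assumption: arbitrary seed, used only to make the sketch repeatable
rng = np.random.default_rng(0)

x = np.arange(-5, 5, 0.05)
y = x + 2 + rng.normal(0.0, 0.5, size=x.shape)

# With the generating parameters (alpha=2, beta=1), the residuals
# are exactly the sampled noise values
residuals = y - linear_model(x, alpha=2.0, beta=1.0)

print(residuals.mean(), residuals.std())
```

With any other choice of `alpha` and `beta`, the residuals would mix the noise with a systematic error, which is what the fitting procedure tries to remove.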
To fit our model, we must find the best parameters, and we start with an ordinary least squares (OLS) approach based on the known data points (x_i, y_i). The cost function...