Learning visibility
Many data science algorithms can be used to solve problems in different domains, but what makes the learning process visible (that is, feasible) is having enough data. How much data is enough for learning to be visible and worth doing? A common rule of thumb among researchers and machine learning practitioners is that the number of data samples should be at least 10 times the number of degrees of freedom in your model.
For example, in a linear model, the degrees of freedom correspond to the number of features in your dataset. If your data has 50 explanatory features, then you need at least 500 data samples (observations).
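The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The function names and the `factor` parameter below are hypothetical, introduced only for illustration; the sketch assumes a linear model, where the degrees of freedom are taken to be the number of features.

```python
def min_samples_rule_of_thumb(degrees_of_freedom: int, factor: int = 10) -> int:
    """Smallest sample count suggested by the rule of thumb.

    `factor` defaults to 10, the multiplier quoted in the text.
    """
    if degrees_of_freedom < 0 or factor < 1:
        raise ValueError("degrees_of_freedom must be >= 0 and factor >= 1")
    return factor * degrees_of_freedom


def satisfies_rule_of_thumb(n_samples: int, n_features: int, factor: int = 10) -> bool:
    """Check whether a dataset meets the rule for a linear model,
    treating the number of features as the degrees of freedom."""
    return n_samples >= min_samples_rule_of_thumb(n_features, factor)


# The example from the text: 50 explanatory features.
print(min_samples_rule_of_thumb(50))      # 500
print(satisfies_rule_of_thumb(500, 50))   # True
print(satisfies_rule_of_thumb(300, 50))   # False
```

Treat the result as a rough lower bound for planning data collection, not as a guarantee that learning will succeed.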
Breaking the rule of thumb
In practice, you can get away with breaking this rule and learn from fewer than 10 times as many samples as there are features in your data; this mostly works when your model is simple and you are using a technique called regularization (addressed in the next chapter).
Jake...