Chapter 10 – Best Practices for Machine Learning and Feature Engineering
- What is the difference between feature engineering and feature selection?
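Feature selection is one part of feature engineering. Feature engineering covers the whole process of creating, transforming, and cleaning features, while feature selection only decides which of the available features to keep.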
- What is the difference between principal component analysis (PCA) and feature selection?
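Feature selection takes the dataset and keeps the best subset of the original features, so the retained features stay unchanged. PCA is a dimensionality reduction method that builds new features (the principal components) as linear combinations of the original ones.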
- How can we encode features like dates and hours?
One technique is to add the (sine, cosine) transformation of the time-of-day variable, so that values that are close on the clock, such as 23:00 and 00:00, are also close in feature space.
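The (sine, cosine) transformation can be sketched as follows. This is a minimal illustration rather than code from the chapter; the helper names `encode_cyclical` and `distance` are assumptions made for this example, and only the Python standard library is used.

```python
import math

def encode_cyclical(value, period):
    """Map a periodic value (e.g. an hour in [0, 24)) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Encode three hours of the day with a 24-hour period.
midnight = encode_cyclical(0, 24)
noon = encode_cyclical(12, 24)
late_evening = encode_cyclical(23, 24)

# 23:00 lies much closer to 00:00 than 12:00 does, which a raw hour value (23 vs 0) would hide.
print(distance(late_evening, midnight))
print(distance(noon, midnight))
```

Both components are needed: the sine alone maps two different hours to the same value, while the (sine, cosine) pair identifies each hour uniquely.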
- Why is it useful to print out training and testing accuracy?
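Comparing the two metrics helps to detect overfitting: a training accuracy that is much higher than the testing accuracy suggests that the model has memorized the training data instead of learning a pattern that generalizes.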
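A small sketch can make the comparison concrete. It assumes a toy one-dimensional dataset with 30% label noise and a 1-nearest-neighbor classifier, chosen here only because it memorizes its training set; the functions `accuracy`, `predict_1nn`, and `make_data` are hypothetical helpers written for this illustration.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Label is 1 when x > 0.5, then flipped with probability 0.3 (noise)."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [int((x > 0.5) != (rng.random() < 0.3)) for x in xs]
    return xs, ys

train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

train_acc = accuracy(train_y, [predict_1nn(train_x, train_y, x) for x in train_x])
test_acc = accuracy(test_y, [predict_1nn(train_x, train_y, x) for x in test_x])

# A large gap between the two numbers is the overfitting signal.
print(f"training accuracy: {train_acc:.2f}")
print(f"testing accuracy:  {test_acc:.2f}")
```

The classifier reproduces every noisy training label, so the training accuracy alone looks perfect; only the testing accuracy reveals how poorly the memorized noise generalizes.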
- How can we deploy a machine learning model and use it in a product?
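There are many ways to take a machine learning model to production, such as exposing it as a web service or packaging it in a container. The right choice depends on your model: whether it makes predictions online or offline, and whether it is a deep learning model, an SVM, Naive Bayes, and so on.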
- Why does feature engineering take much more time than other steps?
Because analyzing, cleaning, and processing features takes more time than building the model.
- What is the role of a dummy variable?
A dummy variable is a numerical variable, usually taking the value 0 or 1, used in regression analysis to represent subgroups of the sample in your study. In research design, a dummy variable is often used to distinguish between different treatment groups.
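As a rough illustration, the sketch below turns a categorical treatment-group column into 0/1 dummy variables using plain Python. The function `dummy_encode`, its `drop_first` parameter, and the group names are assumptions made for this example; dropping the first category to serve as the baseline is one common convention, not the only one.

```python
def dummy_encode(values, drop_first=True):
    """Return (column names, rows of 0/1 indicators) for a categorical column."""
    categories = sorted(set(values))
    # Dropping one category avoids redundant columns: it becomes the baseline group.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == category else 0 for category in kept] for value in values]
    return kept, rows

groups = ["control", "treatment_a", "treatment_b", "control"]
columns, rows = dummy_encode(groups)

print(columns)  # ['treatment_a', 'treatment_b']
for group, row in zip(groups, rows):
    print(group, row)  # "control" rows are all zeros (the baseline)
```

In a regression, the coefficient of each dummy column is then read as the difference between that group and the baseline group.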