Workflow of a machine learning project
In this section, we will formalize a solution framework that can be used to solve any machine learning problem by bringing together the problem statement, evaluation, feature engineering, and avoidance of overfitting.
Problem definition and dataset creation
To define the problem, we need two important things: the input data and the type of problem.
What will be our input data and target labels? For example, say we want to classify restaurants by their speciality (say, Italian, Mexican, Chinese, or Indian food) from the reviews given by customers. To start working on this kind of problem, we need to hand-annotate each training example with one of the possible categories before we can train the algorithm on it. Data availability is often a challenging factor at this stage.
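As a rough illustration of what the annotated data for the restaurant example might look like, the following minimal sketch pairs each review with a cuisine label and maps the labels to integer class indices. The review texts and the names LABELED_REVIEWS, CLASSES, encode_dataset, and label_counts are hypothetical and chosen only for illustration; a real dataset would be far larger.

```python
from collections import Counter

# Hypothetical hand-annotated examples: (review text, cuisine label).
# The reviews are made up for illustration only.
LABELED_REVIEWS = [
    ("The wood-fired margherita pizza was perfect.", "Italian"),
    ("Great tacos and fresh guacamole.", "Mexican"),
    ("The dim sum and hot and sour soup were excellent.", "Chinese"),
    ("Loved the butter chicken and garlic naan.", "Indian"),
    ("Best carbonara I have had in years.", "Italian"),
]

# The four target categories from the example in the text.
CLASSES = ["Italian", "Mexican", "Chinese", "Indian"]
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}


def encode_dataset(examples):
    """Split (text, label) pairs into inputs and integer targets."""
    texts, targets = [], []
    for text, label in examples:
        if label not in CLASS_TO_INDEX:
            raise ValueError(f"Unknown label: {label!r}")
        texts.append(text)
        targets.append(CLASS_TO_INDEX[label])
    return texts, targets


def label_counts(examples):
    """Count examples per label to spot missing or rare categories."""
    return Counter(label for _, label in examples)


texts, targets = encode_dataset(LABELED_REVIEWS)
print(targets)
print(label_counts(LABELED_REVIEWS))
```

Counting the examples per label is a quick way to see whether some category has too little data, which is where the data availability problem usually shows up first.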
Identifying the type of problem means deciding whether it is binary classification, multi-class classification, scalar regression (house pricing), or vector regression...