Data transformation
Let's assume we are working on an ML model whose task is to predict employee attrition. Based on our business understanding, we might include the variables that are relevant to building a good model. On the other hand, we might choose to discard some features, such as EmployeeID, which carry no relevant information.
Note
Identifying the ID columns is known as identifier detection. Identifier columns don't add any information that helps a model detect patterns or make predictions. Identifier column detection can therefore be part of the AutoML package, and we apply it depending on the algorithm or task.
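As a rough illustration of identifier detection, the sketch below flags columns whose non-null values are all distinct. The function name `find_identifier_columns`, the `min_unique_ratio` threshold, the rule that skips float columns, and the sample table are all assumptions made for this example; they are not taken from any particular AutoML package.

```python
from typing import Dict, List, Sequence


def find_identifier_columns(
    table: Dict[str, Sequence], min_unique_ratio: float = 1.0
) -> List[str]:
    """Return names of columns that look like identifiers (hypothetical heuristic).

    A column is flagged when the share of distinct non-null values reaches
    min_unique_ratio. The default of 1.0 requires every value to be unique.
    """
    candidates = []
    for name, values in table.items():
        non_null = [v for v in values if v is not None]
        if not non_null:
            continue
        # Assumption: float columns are skipped, because continuous
        # measurements are often unique without being identifiers.
        if any(isinstance(v, float) for v in non_null):
            continue
        unique_ratio = len(set(non_null)) / len(non_null)
        if unique_ratio >= min_unique_ratio:
            candidates.append(name)
    return candidates


# Illustrative data only; column names other than EmployeeID are made up.
employees = {
    "EmployeeID": [1001, 1002, 1003, 1004],
    "Department": ["Sales", "R&D", "Sales", "HR"],
    "MonthlyIncome": [5200.0, 6100.5, 4800.0, 7300.25],
}
id_columns = find_identifier_columns(employees)
```

A uniqueness check like this is only a heuristic. On small tables, genuinely informative columns can also be fully unique, so the flagged columns are best reviewed before they are dropped.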
Once we have decided on the fields to use, we may explore the data and transform certain features to aid the learning process. Transformation makes information that is only implicit in the raw values explicit, which benefits ML models. For example, an employee start date of 11-02-2018 provides little information as a raw value. However, if we transform this feature to four attributes: date, day...
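The start-date example can be sketched in a few lines of Python. The source text is cut off before it names all four derived attributes, so the fields chosen here (day, month, year, and day of the week) are an illustrative assumption, as are the function name `expand_start_date` and the reading of 11-02-2018 as day-month-year.

```python
from datetime import datetime
from typing import Dict


def expand_start_date(raw: str, fmt: str = "%d-%m-%Y") -> Dict[str, int]:
    """Split a raw date string into separate numeric features.

    Assumption: the date is written day-month-year. Pass a different
    fmt (for example "%m-%d-%Y") if the data uses another convention.
    """
    parsed = datetime.strptime(raw, fmt)
    return {
        "day": parsed.day,
        "month": parsed.month,
        "year": parsed.year,
        "weekday": parsed.weekday(),  # Monday is 0, Sunday is 6
    }


features = expand_start_date("11-02-2018")
```

Each derived field can then be used as its own model input, which lets a learner pick up effects such as seasonality or tenure that a single opaque date string hides.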