Massaging your data
The minimum requirements for successfully applying deep learning vary from problem to problem. Unlike benchmark datasets such as MNIST or CIFAR-10, real-world data is messy and evolving. Yet data is the foundation of every machine learning application: with higher-quality data or features, even fairly simple models may give better and faster results. The same holds for deep learning. In this section, we introduce some common good practices for preparing your data.
Data cleaning
Before jumping into training, clean the data by removing corrupted or uninformative samples. For example, we can remove very short texts, highly distorted images, spurious output labels, features with many null values, and so on.
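As a rough illustration of these cleaning steps, the following sketch filters a small text dataset and drops sparse feature columns using only the Python standard library. The function names (`clean_samples`, `drop_sparse_features`), the thresholds, and the toy data are hypothetical and chosen for illustration; sensible cut-offs depend on the dataset at hand.

```python
def clean_samples(samples, min_text_len=10, valid_labels=None):
    """Keep only (text, label) pairs with long enough text and a known label."""
    cleaned = []
    for text, label in samples:
        if text is None or len(text.strip()) < min_text_len:
            continue  # drop missing or very short texts
        if valid_labels is not None and label not in valid_labels:
            continue  # drop samples with spurious labels
        cleaned.append((text.strip(), label))
    return cleaned


def drop_sparse_features(rows, max_null_fraction=0.5):
    """Remove feature columns whose share of None values exceeds the threshold."""
    if not rows:
        return []
    keep = [
        key for key in rows[0].keys()
        if sum(row.get(key) is None for row in rows) / len(rows) <= max_null_fraction
    ]
    return [{key: row.get(key) for key in keep} for row in rows]


# Toy data (assumed for illustration only).
samples = [
    ("great movie, would watch again", "pos"),
    ("ok", "neg"),                              # too short
    ("terrible plot and wooden acting", "???"), # unknown label
    (None, "pos"),                              # missing text
]
cleaned = clean_samples(samples, min_text_len=10, valid_labels={"pos", "neg"})

rows = [
    {"age": 31, "income": None},
    {"age": 45, "income": None},
    {"age": None, "income": 52000},
]
dense_rows = drop_sparse_features(rows, max_null_fraction=0.5)
```

Here only the first text sample survives, and the `income` column is dropped because two of its three values are missing. In practice, it is worth logging how many samples each rule removes, so that an overly aggressive threshold does not silently discard most of the data.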
Data augmentation
Deep learning needs a large corpus of training data to learn effectively, but collecting that much data can be prohibitively expensive or simply impractical. One way to help is data augmentation...
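To make the idea concrete, here is a minimal sketch of augmentation for image data. It assumes two commonly used label-preserving transformations, a random horizontal flip and a random crop; the text above does not name specific techniques, so these are illustrative choices. Images are represented as plain nested lists, and the helper names are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    top = rng.randint(0, len(image) - crop_h)      # randint is inclusive on both ends
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng):
    """Return a randomly flipped and cropped variant of the image."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)


# A 4x4 toy "image" (assumed for illustration only).
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
rng = random.Random(0)  # seeded so that runs are repeatable
augmented = [augment(image, 3, 3, rng) for _ in range(4)]
```

Each call produces a slightly different 3x3 view of the same source image, so one labeled sample yields several training examples. Whether a transformation is safe depends on the task: flipping is harmless for most natural photos but would change the meaning of digits or text.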