Building an RNN model for speech recognition
We will use the Free Spoken Digit Dataset, available from https://github.com/Jakobovski/free-spoken-digit-dataset/tree/master/recordings, for our basic model. Download the data to any directory on your system. In the example code, replace the path referring to the .wav file with the path you copied the data to.
Note
We have split the data into a training set of 1,470 files and a test set of 30 files.
Before we get into the details of the model itself, we will look at how to prepare the data for training. The most common preprocessing step in practice is to transform the raw audio into its frequency spectrum. The frequency spectrum, or power spectrum, acts like a fingerprint of the data: the raw audio is broken down into its constituent frequencies. This representation helps identify which frequencies (high or low pitch) dominate the signal in power or energy. We will now look at how...
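To make the idea of a power spectrum concrete, here is a minimal sketch using only NumPy. It is an illustration, not the preprocessing code used later in the chapter. The function name power_spectrum, the Hann window, the synthetic two-tone signal, and the 8 kHz sample rate are all assumptions made for this example. A real recording would be loaded from a .wav file instead of being generated.

```python
import numpy as np


def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, power) for a 1-D real-valued signal.

    Hypothetical helper for illustration only.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Taper the edges with a Hann window to reduce spectral leakage.
    windowed = signal * np.hanning(len(signal))
    # rfft keeps only the non-negative frequencies of a real signal.
    spectrum = np.fft.rfft(windowed)
    # Squared magnitude gives the power carried by each frequency bin.
    power = np.abs(spectrum) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, power


# Synthetic stand-in for a recording: one second containing a 440 Hz tone
# and a weaker 1,000 Hz tone, sampled at an assumed rate of 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

freqs, power = power_spectrum(signal, sample_rate)
dominant_freq = freqs[np.argmax(power)]
print(f"Dominant frequency: {dominant_freq:.1f} Hz")
```

Because the 440 Hz tone has the larger amplitude, its bin should carry the most power. This is the sense in which the spectrum shows which frequencies dominate the signal.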