LSTM
RNNs are quite popular for building real-world applications such as language translation, text classification, and many other sequential problems, but in practice we rarely use the vanilla version of RNN that we saw in the previous section. The vanilla RNN suffers from problems such as vanishing and exploding gradients when dealing with long sequences. In most real-world problems, variants of RNN such as LSTM or GRU are used instead, as they address the limitations of the plain RNN and handle sequential data better. We will try to understand what happens inside an LSTM, and then build an LSTM-based network to solve a text classification problem on the IMDB dataset.
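Before looking at the internals of the LSTM, the following is a minimal sketch (assuming PyTorch as the framework, with illustrative placeholder hyperparameters such as vocab_size=10000 and hidden_dim=128) of what such an LSTM-based text classifier can look like; the actual IMDB network built later may differ in its details.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    # Hypothetical classifier: embed tokens, run them through an LSTM,
    # and classify from the final hidden state.
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len) of token indices
        embedded = self.embedding(x)          # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)  # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])            # logits: (batch, num_classes)

# Example usage with a random batch of 4 sequences, each 20 tokens long
model = LSTMClassifier()
tokens = torch.randint(0, 10000, (4, 20))
logits = model(tokens)
print(logits.shape)  # torch.Size([4, 2])
```

The design choice to classify from the last hidden state is a common baseline; pooling over all time steps or using a bidirectional LSTM are frequent refinements.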
Long-term dependency
RNNs, in theory, should learn all the dependencies required from the historical data to build a context of what happens next. Say, for example, we are trying to predict the last word in the sentence: the clouds are in the sky. An RNN would be able to predict it, as the information (clouds) is...