Motivation of word vectors
Similar to the work we did in the previous chapter, traditional NLP approaches rely on converting individual words--which we created via tokenization--into a format that a computer algorithm can learn (that is, predicting the movie sentiment). Doing this required us to convert a single review of N tokens into a fixed representation by creating a TF-IDF matrix. In doing so, we did two important things behind the scenes:
- Individual words were assigned an integer ID (for example, a hash). For example, the word friend might be assigned to 39,584, while the word bestie might be assigned to 99,928,472. Cognitively, we know that friend is very similar to bestie; however, any notion of similarity is lost by converting these tokens into integer IDs.
- By converting each token into an integer ID, we consequently lose the context with which the token was used. This is important because in order to understand the cognitive meaning of words, and thereby train a computer to learn...