Vectorizing text
Machine learning models, including deep neural networks, take numeric information in and produce numeric output. The challenge with natural language processing then becomes, naturally, converting words to numbers.
There are a variety of ways that we can convert words to numbers. All of these methods serve the same goal: converting some sequence of words into a numeric vector. Some methods work better than others because the conversion can lose some of the meaning of the original text.
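To make the idea concrete, here is a minimal sketch of one possible conversion: counting how often each word appears. The function names `build_vocabulary` and `vectorize` are hypothetical, chosen only for this illustration, and splitting on whitespace is a simplifying assumption rather than a recommended tokenizer.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercased word to an index, in order of first appearance."""
    vocabulary = {}
    for document in documents:
        for word in document.lower().split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def vectorize(document, vocabulary):
    """Return a list of word counts, one slot per vocabulary word."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocabulary]


documents = ["the cat sat", "the dog sat on the cat"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(document, vocabulary) for document in documents]

print(vocabulary)  # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'on': 4}
print(vectors)     # [[1, 1, 1, 0, 0], [2, 1, 1, 1, 1]]
```

A count vector like this ignores word order, so "dog bites man" and "man bites dog" map to the same vector. That is one simple example of meaning being lost in the conversion.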
NLP terminology
Let's start by defining a few common terms, so that we remove any ambiguity their use might cause. I know that, since you can read, you likely have some understanding of these terms. I apologize if this seems pedantic, but I do promise it will immediately relate to the models we talk about next:
- Words: The atomic element of most of the systems we will be using. While some character-level models do exist, we won't be talking about them today.
- Sentence:...