Word Representation in Vector Space
This section will cover different architectures for computing continuous vector representations of words from a corpus. These representations capture the similarity of words in terms of meaning. It will also introduce a new Python library, Gensim, for performing this task.
Word Embeddings
Word embeddings are a collection of techniques and methods for mapping words and sentences from a corpus to vectors of real numbers. Word embeddings represent each word in terms of the context in which that word appears. The main task of word embeddings is dimensionality reduction: moving from a sparse space with one dimension per word to a continuous vector space of much lower dimension.
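As a preview of how this looks in practice, here is a minimal sketch using Gensim's Word2Vec model (assuming Gensim 4.x; the toy corpus and parameter values are illustrative, not from the original text):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (illustrative only)
sentences = [
    ["i", "am", "good"],
    ["i", "am", "great"],
    ["gensim", "builds", "word", "embeddings"],
]

# Train a small Word2Vec model; vector_size is the embedding dimension
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, seed=42)

# Each word now maps to a dense, low-dimensional vector
print(model.wv["good"])                       # a 10-dimensional NumPy array
print(model.wv.similarity("good", "great"))   # cosine similarity between words
```

On a real corpus, words that appear in similar contexts end up with nearby vectors, which is exactly the property the rest of this section builds on.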
To better understand what this dimension reduction means, let's look at an example. Imagine we have two similar sentences:
- I am good.
- I am great.
Now, encoding each word in these sentences as a one-hot vector over the four-word vocabulary, we have something like this:
- I [1,0,0,0]
- am [0,1,0,0]
- good [0,0,1,0]
- great [0,0,0,1]
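A short sketch of how these one-hot vectors can be built in plain Python (the vocabulary ordering follows first occurrence; the variable and function names are illustrative):

```python
# Build a vocabulary from the two sentences, in order of first occurrence
sentences = ["I am good", "I am great"]
vocab = []
for sentence in sentences:
    for word in sentence.split():
        if word not in vocab:
            vocab.append(word)

# One-hot vector: 1 at the word's vocabulary index, 0 elsewhere
def one_hot(word):
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

for word in vocab:
    print(word, one_hot(word))
# I [1, 0, 0, 0]
# am [0, 1, 0, 0]
# good [0, 0, 1, 0]
# great [0, 0, 0, 1]
```

With only four words these vectors are short, but for a realistic vocabulary of tens of thousands of words they become long and sparse, and they carry no notion of similarity: "good" and "great" are exactly as far apart as "good" and "am". This is the problem that the dense, continuous representations above are designed to solve.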