Understanding the logic of the word2vec model
We will start by decomposing the word2vec model to understand its logic. word2vec is a piece of software that combines several algorithms. Refer to Figure 6.7:

Figure 6.7: Word2vec building block (Image credit: Xin Rong)
As you can see in Figure 6.7, there are three main building blocks. We will examine each of them in detail:
- Vocabulary builder
- Context builder
- Neural network with two layers
Vocabulary builder
The vocabulary builder is the first building block of the word2vec model. It takes raw text data, mostly in the form of sentences, collects all the unique words from your corpus, and builds the vocabulary from them.
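The idea of collecting unique words from a corpus can be sketched in plain Python. This is only a minimal illustration and not how gensim implements it: the function name `build_vocabulary`, its `min_count` argument, and the two-sentence corpus are all assumptions made for this example, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter


def build_vocabulary(sentences, min_count=1):
    """Map each unique word to an integer index (hypothetical helper).

    sentences: iterable of token lists.
    min_count: words seen fewer times than this are left out.
    """
    # Count how often every word occurs across all sentences
    counts = Counter(word for sentence in sentences for word in sentence)
    # Order by descending frequency, then alphabetically, so indices are stable
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = [word for word, count in ordered if count >= min_count]
    word_to_index = {word: index for index, word in enumerate(kept)}
    return word_to_index, counts


# Toy corpus, invented for illustration only
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
# Naive tokenization: lowercase and split on whitespace
sentences = [line.lower().split() for line in corpus]

vocab, counts = build_vocabulary(sentences)
frequent_vocab, _ = build_vocabulary(sentences, min_count=2)

print(len(vocab), "unique words")
print(sorted(frequent_vocab))
```

Each unique word receives one index, and raising `min_count` shrinks the vocabulary by dropping rare words. A real vocabulary builder would also need proper tokenization and handling of punctuation.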
In Python, there is a library called gensim. We will use gensim to generate word2vec for our corpus. gensim provides some parameters that we can use to build the vocabulary from our corpus according to the needs of our application. The parameter list...