Visualize the word embeddings using t-SNE
Let's visualize the word embeddings that we generated in the previous section. t-SNE is one of the most popular methods for displaying high-dimensional data in a two-dimensional space. We use the implementation from the scikit-learn library and reuse the code given in the TensorFlow documentation to plot the word embeddings we just learned.
Note
The original code from the TensorFlow documentation is available at the following link: https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/examples/tutorials/word2vec/word2vec_basic.py.
Here is how we implement the procedure:
- Create the tsne model:
from sklearn.manifold import TSNE
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000, method='exact')
- Limit the number of embeddings to display to 500; otherwise, the graph becomes unreadable:
n_embeddings = 500
- Create the low-dimensional representation by calling the fit_transform() method on the tsne model and passing the first n_embeddings of final_embeddings
...
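The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the book's exact code: final_embeddings is replaced here by a random stand-in array (the real one comes from the model trained in the previous section), the name low_dim_embeddings and the random_state argument are introduced only for this example, and the iteration count is left at the library default because the keyword for it has been renamed in recent scikit-learn releases.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the learned embeddings (assumption for this sketch):
# 1000 "words", each represented by a 64-dimensional vector.
rng = np.random.default_rng(0)
final_embeddings = rng.normal(size=(1000, 64)).astype(np.float32)

# Only the first 500 embeddings are projected, to keep a plot readable.
n_embeddings = 500

# Same settings as in the text, except that the iteration count is left
# at the default and a fixed seed is added for repeatability.
tsne = TSNE(perplexity=30, n_components=2, init='pca',
            method='exact', random_state=0)

# Project the selected rows down to two dimensions.
low_dim_embeddings = tsne.fit_transform(final_embeddings[:n_embeddings, :])

print(low_dim_embeddings.shape)  # (500, 2)
```

Each row of the result is the 2-D position of one word, so the rows can be passed directly to a scatter plot and annotated with the corresponding word labels.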