Working with bag-of-words embeddings
In this section, we will show you how to work with a bag-of-words embedding in TensorFlow. This is the mapping we described in the introduction. Here, we will use this type of embedding for spam prediction.
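As a reminder of what the mapping does, here is a minimal sketch in plain Python (not TensorFlow). It assumes a simple count-based bag-of-words: each text becomes a vector with one slot per vocabulary word, holding how often that word occurs. The helper names (`tokenize`, `build_vocab`, `bag_of_words`) and the two sample sentences are hypothetical and are used only for illustration.

```python
import re

def tokenize(text):
    # Lowercase and keep only alphabetic runs; a deliberately simple normalizer.
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    # Assign each distinct word an index in order of first appearance.
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def bag_of_words(text, vocab):
    # Count occurrences of each known word; words outside the vocabulary are ignored.
    vec = [0] * len(vocab)
    for word in tokenize(text):
        idx = vocab.get(word)
        if idx is not None:
            vec[idx] += 1
    return vec

# Hypothetical sample texts, not taken from the dataset.
texts = ["Win a free prize now", "Are you free for lunch now"]
vocab = build_vocab(texts)
vectors = [bag_of_words(t, vocab) for t in texts]

for text, vec in zip(texts, vectors):
    print(vec, text)
```

Note that word order is discarded: two texts containing the same words in a different order map to the same vector.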
Getting ready
To illustrate how to use bag-of-words with a text dataset, we will use a spam-ham SMS text message dataset from the UCI machine learning data repository (https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection). This is a collection of phone text messages, each labeled as spam or not-spam (ham). We will download this data, store it for future use, and then use the bag-of-words method to predict whether a text is spam or not. The model that operates on the bag-of-words features will be a logistic regression model with no hidden layers. We will use stochastic training, with a batch size of 1, and compute the accuracy on a held-out test set at the end.
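The training setup described above can be sketched without TensorFlow to make the moving parts visible. The following plain-Python example is an illustration under stated assumptions, not the recipe's implementation: it assumes a logistic regression model (a weight per vocabulary word plus a bias, no hidden layers), updates the parameters after every single example (batch size of 1), and reports accuracy on a small held-out set. The four-word vocabulary, the toy feature vectors, the learning rate, and the epoch count are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    # Probability that the bag-of-words vector x is spam.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_sgd(features, labels, epochs=50, lr=0.5, seed=0):
    # Stochastic gradient descent with a batch size of 1.
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # For cross-entropy loss, the gradient w.r.t. the logit is (p - y).
            error = predict_proba(weights, bias, features[i]) - labels[i]
            weights = [w - lr * error * xi for w, xi in zip(weights, features[i])]
            bias -= lr * error
    return weights, bias

def accuracy(weights, bias, features, labels):
    # Fraction of examples whose thresholded prediction matches the label.
    correct = sum(
        int((predict_proba(weights, bias, x) >= 0.5) == bool(y))
        for x, y in zip(features, labels)
    )
    return correct / len(labels)

# Hypothetical toy data: word counts over the vocabulary
# ["free", "prize", "lunch", "meeting"]; label 1 = spam, 0 = ham.
train_x = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
           [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
train_y = [1, 1, 1, 0, 0, 0]
test_x = [[2, 1, 0, 0], [0, 0, 2, 1]]  # held out, never seen in training
test_y = [1, 0]

weights, bias = train_sgd(train_x, train_y)
test_accuracy = accuracy(weights, bias, test_x, test_y)
print("held-out accuracy:", test_accuracy)
```

With a batch size of 1, each update is noisy but cheap, which is why the example order is reshuffled every epoch. The accuracy is computed only once, after training, on examples the model has never seen.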
How to do it...
For this example, we will start by getting the data, normalizing...