In this subsection, we will perform sentiment analysis and text classification using BoW features. Here, the bag of words is generated using the scikit-learn library. Let's walk through sentiment analysis with BoW features in the following steps:
- Load the dataset:
The first step in building a machine learning model is to load the dataset. Let's read the data using the pandas read_csv() function:
# Import libraries
import pandas as pd
# Read the tab-separated dataset
df = pd.read_csv('amazon_alexa.tsv', sep='\t')
# Show the top 5 records
df.head()
This results in the following output:
In the preceding output, we can see that the Alexa review dataset has five columns: rating, date, variation, verified_reviews, and feedback.
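As a quick check on the loading step, the following minimal sketch parses a tiny in-memory, tab-separated sample instead of the real file. Only the five column names come from the text above; the two sample rows and the names sample_tsv and sample_df are invented for illustration and are not taken from amazon_alexa.tsv.

```python
import io

import pandas as pd

# Hypothetical two-row sample with the same five columns as the Alexa dataset.
# The values below are made up for illustration only.
sample_tsv = (
    "rating\tdate\tvariation\tverified_reviews\tfeedback\n"
    "5\t31-Jul-18\tCharcoal Fabric\tLove my Echo!\t1\n"
    "1\t30-Jul-18\tBlack\tStopped working after a week.\t0\n"
)

# sep="\t" tells read_csv that fields are separated by tabs, not commas
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Inspect the column names and the (rows, columns) shape
print(sample_df.columns.tolist())
print(sample_df.shape)
```

Running the same two inspection calls on df is one way to confirm that the real file was parsed into the expected five columns rather than a single column, which is the usual symptom of a wrong separator.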
- Explore the dataset:
Let's plot the feedback column count to see how many positive and negative reviews the dataset has:
# Import seaborn and matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
# Count...