In this subsection, we will perform sentiment analysis and text classification based on TF-IDF features, which are generated with the scikit-learn library. The steps are as follows:
- Load the dataset:
The first step in building a machine learning model is to load the dataset.
Let's read the data using the pandas read_csv() function:
# Import libraries
import pandas as pd
# Read the tab-separated dataset
df = pd.read_csv('amazon_alexa.tsv', sep='\t')
# Show the first five records
df.head()
This results in the following output:
The preceding output shows that the Alexa review dataset has five columns: rating, date, variation, verified_reviews, and feedback.
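A quick way to confirm the column layout is to list the column names and count the rows per label. The following is a minimal sketch on a two-row, in-memory sample rather than the real file: the rows are made up for illustration, and the assumption that feedback holds a 0/1 label is ours, not something stated above.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same tab-separated layout.
# The values are made up for illustration, not taken from the real file.
sample_tsv = (
    "rating\tdate\tvariation\tverified_reviews\tfeedback\n"
    "5\t31-Jul-18\tCharcoal Fabric\tLove my Echo!\t1\n"
    "1\t30-Jul-18\tWhite\tStopped working after a week.\t0\n"
)

# read_csv accepts a file-like object as well as a path
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# List the column names and count the rows per feedback label
columns = sample_df.columns.tolist()
label_counts = sample_df["feedback"].value_counts().to_dict()
print(columns)
print(label_counts)
```

The same two calls, `df.columns.tolist()` and `df["feedback"].value_counts()`, can be run on the dataframe loaded from amazon_alexa.tsv.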
- Feature generation using TfidfVectorizer:
Let's generate a TF-IDF matrix for the customer reviews using scikit-learn's TfidfVectorizer:
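First, as a minimal sketch of what the vectorizer returns, the snippet below fits TfidfVectorizer with its default settings on three made-up reviews (they are not taken from the Alexa dataset). It is only meant to show the shape of the result: one row per document and one column per vocabulary term.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up reviews, for illustration only
reviews = [
    "love the sound quality",
    "sound is bad",
    "love it",
]

# Default settings: lowercasing, tokens of two or more word characters,
# no stop-word removal, L2-normalized rows
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(reviews)

# Vocabulary learned from the toy reviews, sorted alphabetically
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(tfidf_matrix.shape)  # (number of documents, number of terms)
```

The result is a sparse matrix, so it stays compact even when the vocabulary is large. The same fit_transform call applies to the review text column of the Alexa data.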
# Import TfidfVectorizer and RegexTokenizer
from...