Evaluating Yelp reviews
We read in the processed Yelp reviews using this script and print out some statistics of the data:
reviews <- read.csv("c:/Users/Dan/yelp_academic_dataset_review.csv")
I usually take a look at some of the data once loaded to visually check that things are working as expected. We can do this with a head() function call:
head(reviews)
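For a large file, a quick structural check on just a few rows can complement head(). The following is a minimal sketch, not the script used here: it writes a tiny made-up CSV to a temporary file so that it runs on its own, and the column names (review_id, stars, useful, funny, cool) are assumptions about the file layout rather than something confirmed above.

```r
# Hypothetical miniature of the reviews file; all values are invented.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "review_id,stars,useful,funny,cool",
  "r1,5,0,0,0",
  "r2,4,3,1,2",
  "r3,1,12,7,0"
), tmp)

# Read only the first two data rows for a quick look
preview <- read.csv(tmp, nrows = 2)

# Column names and types, one line per column
str(preview)

# The class of each column, as a named vector
sapply(preview, class)
```

Reading with nrows keeps the check fast, and str() shows at a glance whether a column that should be numeric came in as text.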
Summary data
All of the columns appear to load correctly. Now, we can look at summary statistics for the data:
summary(reviews)
There are several points in the summary worth noting:
- Some of the data points I had assumed would be just TRUE/FALSE, 0/1 have ranges instead; for example, funny has a max value over 600, useful has a max of 1100, and cool has 500.
- All of the IDs (users, businesses) have been mangled. We could use the user file and the business file to come up with exact references.
- Star ratings are 1-5, as expected. However, the mean and median are about 4, which I take to mean that many people only take the time to write good reviews.
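The observations about the vote columns and the star ratings can also be checked directly instead of being read off the summary() output. This is a minimal sketch under stated assumptions: reviews_demo is a hypothetical stand-in with invented values, and the column name stars is assumed (funny, useful, and cool are the names mentioned above). With the real data, the same calls would be applied to reviews.

```r
# Hypothetical stand-in for the reviews data frame; all values are invented.
reviews_demo <- data.frame(
  stars  = c(5, 4, 4, 1, 5, 3),
  useful = c(0, 3, 12, 1, 0, 2),
  funny  = c(0, 1, 7, 0, 0, 0),
  cool   = c(0, 2, 0, 0, 1, 0)
)

vote_cols <- c("useful", "funny", "cool")

# Minimum and maximum of each vote column, as a 2 x 3 matrix
vote_ranges <- sapply(reviews_demo[vote_cols], range)
rownames(vote_ranges) <- c("min", "max")
print(vote_ranges)

# A maximum above 1 means the column holds counts, not a 0/1 flag
is_count <- vote_ranges["max", ] > 1
print(is_count)

# Centre of the star ratings
star_mean <- mean(reviews_demo$stars)
star_median <- median(reviews_demo$stars)
print(c(mean = star_mean, median = star_median))
```

A mean and median near the top of the 1-5 scale would be consistent with the reading that reviewers mostly write when they are satisfied, although the numbers alone do not establish the reason.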