Chapter 8: Market Basket Analysis
Activity 18: Loading and Preparing Full Online Retail Data
Solution:
- Load the online retail dataset file:
import matplotlib.pyplot as plt
import mlxtend.frequent_patterns
import mlxtend.preprocessing
import numpy
import pandas

# Read the "Online Retail" sheet; the first row holds the column names
online = pandas.read_excel(
    io="Online Retail.xlsx",
    sheet_name="Online Retail",
    header=0
)
- Clean and prepare the data for modeling, including converting the cleaned data into a list of lists (one list of items per invoice):
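# Flag cancelled invoices: their InvoiceNo contains the letter 'C'
online['IsCPresent'] = (
    online['InvoiceNo']
    .astype(str)
    .apply(lambda x: 1 if x.find('C') != -1 else 0)
)

# Keep positive quantities from non-cancelled invoices, then drop rows with missing values
online1 = (
    online
    .loc[online["Quantity"] > 0]
    .loc[online['IsCPresent'] != 1]
    .loc[:, ["InvoiceNo", "Description"]]
    .dropna()
)

# Build one list of item descriptions per invoice
invoice_item_list = []
for num in list(set(online1.InvoiceNo.tolist())):
    tmp_df = online1.loc[online1['InvoiceNo'] == num]
    tmp_items = tmp_df...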
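The cleaning rules and the list-of-lists construction can be tried out on a small in-memory DataFrame, without the Excel file. This is a minimal sketch, not the activity's exact code: the sample rows are made up for illustration, and `build_invoice_item_list` is a hypothetical helper that swaps the per-invoice loop for a single `groupby`.

```python
import pandas

# Hypothetical sample rows mimicking the Online Retail columns (illustration only)
sample = pandas.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536380", "536381"],
    "Description": ["WHITE MUG", "RED BAG", "WHITE MUG", "RED BAG", None, "BLUE CUP"],
    "Quantity": [6, 2, -1, 3, 1, 0],
})


def build_invoice_item_list(df):
    """Hypothetical helper: drop cancellations, non-positive quantities and
    missing descriptions, then return one list of descriptions per invoice."""
    # Same rule as the activity: an InvoiceNo containing 'C' marks a cancellation
    is_cancelled = df["InvoiceNo"].astype(str).str.contains("C", regex=False)
    cleaned = (
        df.loc[(df["Quantity"] > 0) & ~is_cancelled, ["InvoiceNo", "Description"]]
        .dropna()
    )
    # One pass over the data; groupby sorts invoice numbers by default and
    # keeps the original row order inside each invoice
    return cleaned.groupby("InvoiceNo")["Description"].apply(list).tolist()


invoice_item_list = build_invoice_item_list(sample)
print(invoice_item_list)  # [['WHITE MUG', 'RED BAG'], ['RED BAG']]
```

Grouping once avoids re-filtering the whole frame for every invoice number, which the loop above does; unlike iterating over a `set`, it also returns invoices in a deterministic (sorted) order.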