Simulating the sales data
Enough concepts; let's start programming. To get a clear idea of where we're heading, we start by initializing the sales data frame we will be using, with zero observations for now. We do so by defining the available categories for each factor variable and by defining empty values with the data type we need for each variable. As the following code shows, the data frame has the identifiers SALE_ID and CLIENT_ID, which will allow us to link these data with the data in clients and client_messages:
# Available categories for each factor variable
status_levels <- c("PENDING", "DELIVERED", "RETURNED", "CANCELLED")
protein_source_levels <- c("BEEF", "FISH", "CHICKEN", "VEGETARIAN")
continent_levels <- c("AMERICA", "EUROPE", "ASIA")
delivery_levels <- c("IN STORE", "TO LOCATION")
paid_levels <- c("YES", "NO")

# Zero-row data frame: each column gets its data type but no values yet
sales <- data.frame(
    SALE_ID = character(),
    CLIENT_ID = character(),
    DATE = as.Date(character()),
    QUANTITY = integer(),
    COST = numeric(),
    ...
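Because the definition above is cut short, here is a minimal, self-contained sketch of the same idea using only a few of its columns. It shows that a zero-row data frame already carries each column's type and factor levels, that a row appended with rbind() keeps them, and how a shared CLIENT_ID lets merge() link sales to client data. The names empty_sales, new_sale, sales_sketch, and clients_sketch, as well as the row values, are hypothetical and only for illustration; this is not the full sales definition.

```r
# Hypothetical, reduced version of the sales frame (only some of its columns)
status_levels <- c("PENDING", "DELIVERED", "RETURNED", "CANCELLED")
paid_levels <- c("YES", "NO")

empty_sales <- data.frame(
    SALE_ID = character(),
    CLIENT_ID = character(),
    DATE = as.Date(character()),
    QUANTITY = integer(),
    STATUS = factor(levels = status_levels),
    PAID = factor(levels = paid_levels),
    stringsAsFactors = FALSE
)

# Zero rows, but every column already has its type and factor levels
nrow(empty_sales)              # 0
sapply(empty_sales, class)
levels(empty_sales$STATUS)

# A hypothetical single sale, appended with rbind()
new_sale <- data.frame(
    SALE_ID = "S1",
    CLIENT_ID = "C1",
    DATE = as.Date("2017-01-15"),
    QUANTITY = 2L,
    STATUS = factor("PENDING", levels = status_levels),
    PAID = factor("NO", levels = paid_levels),
    stringsAsFactors = FALSE
)
sales_sketch <- rbind(empty_sales, new_sale)

# Hypothetical client data sharing the CLIENT_ID key
clients_sketch <- data.frame(
    CLIENT_ID = "C1",
    NAME = "Alice",
    stringsAsFactors = FALSE
)

# Link each sale to its client through the shared identifier
linked <- merge(sales_sketch, clients_sketch, by = "CLIENT_ID")
linked
```

Declaring the factor levels up front, rather than letting R infer them from the data, means that categories which never appear in a given sample (for example, CANCELLED) are still recognized as valid values.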