Comparing computation time with data frame and XDF
Computation time is an important consideration in big data analytics. The efficiency of an algorithm is assessed by its computation time, along with other parameters. The reason for using an XDF file instead of the default R data frame is to achieve faster computation. In this recipe, you will compare the computation time of the same task on the default data frame and on the XDF file.
Getting ready
Suppose you have a dataset stored in two different formats. The first is a CSV file containing nine variables, and the other is an XDF file containing the same variables. The following are the variable names:
YEAR
QUARTER
MONTH
DAY_OF_MONTH
DAY_OF_WEEK
ORIGIN
DEST
DEP_DELAY
ARR_DELAY
The objective is to calculate the mean departure delay for each combination of origin and destination airports. The required library for this recipe is RevoScaleR.
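To make the objective concrete, here is a minimal sketch of the data frame side of the comparison, using only base R. The `flights` data frame and its values are made up for illustration and are not the recipe's dataset. Only three of the nine variables are included, and the RevoScaleR/XDF counterpart is not shown here because it requires that library to be installed.

```r
# Toy stand-in for the flight data; these values are invented for illustration.
flights <- data.frame(
  ORIGIN    = c("JFK", "JFK", "LAX", "JFK", "JFK"),
  DEST      = c("LAX", "LAX", "JFK", "ORD", "ORD"),
  DEP_DELAY = c(10, 20, 5, 7, 9),
  stringsAsFactors = FALSE
)

# Compute the mean departure delay for each ORIGIN/DEST pair,
# and wrap the call in system.time() to measure how long it takes.
timing <- system.time(
  mean_delay <- aggregate(DEP_DELAY ~ ORIGIN + DEST, data = flights, FUN = mean)
)

print(mean_delay)           # one row per ORIGIN/DEST combination
print(timing[["elapsed"]])  # wall-clock time in seconds
```

On a toy data frame like this, the elapsed time will be close to zero. The comparison only becomes meaningful on a dataset large enough for the data frame approach to take a measurable amount of time.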
How to do it…
The following are the steps to calculate the processing time...