Processing data as a chunk
An XDF file makes processing tasks easier because operations are applied chunk by chunk, so you do not need to load the entire dataset into memory. Working chunk by chunk also helps speed up an algorithm or any other processing task. In this recipe, you will see the chunk-by-chunk operation in action.
Getting ready
The XDF file you created in the previous recipe in this chapter, Creating an XDF file from CSV input, contains the following nine variables:
YEAR
QUARTER
MONTH
DAY_OF_MONTH
DAY_OF_WEEK
ORIGIN
DEST
DEP_DELAY
ARR_DELAY
The objective of this recipe is to create a new binary indicator variable, binDelay: it takes the value 1 if the departure delay is positive, and 0 otherwise. Since you are going to use the XDF file for this operation, the task will be automatically split into chunks. You will need to load the RevoScaleR library for this operation.
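To make the idea of chunk-by-chunk processing concrete, the following is a minimal sketch in base R only; it does not use RevoScaleR or an XDF file. It reads a small CSV file a few rows at a time and computes the binDelay indicator for each chunk. The temporary file, the sample values, the chunk size, and the names csv_path, chunk_size, and bin_delay are all assumptions made up for illustration.

```r
# Base R sketch of chunk-by-chunk processing; this is NOT the RevoScaleR API.
# The file, sample values, and chunk size are hypothetical.

# Write a tiny CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(
  ORIGIN    = c("JFK", "LAX", "ORD", "ATL", "SFO"),
  DEP_DELAY = c(12, -3, 0, 45, -7)
)
write.csv(sample_data, csv_path, row.names = FALSE, quote = FALSE)

chunk_size <- 2L                          # rows held in memory at a time
con <- file(csv_path, open = "r")

# The first line holds the column names; the remaining lines are data rows.
col_names <- strsplit(readLines(con, n = 1L), ",")[[1]]

bin_delay <- integer(0)
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0L) break          # no rows left to read
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  # Indicator: 1 when the departure delay is positive, 0 otherwise.
  chunk$binDelay <- ifelse(chunk$DEP_DELAY > 0, 1L, 0L)
  bin_delay <- c(bin_delay, chunk$binDelay)
}
close(con)

print(bin_delay)
```

Only chunk_size rows are parsed at any one time, which is the property that chunked processing relies on. Note that ifelse() returns NA for rows where DEP_DELAY is missing, so missing delays would need separate handling if they occur in the data.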
How to do it…
The following are...