Linear regression with larger data (rxFastLinear)
Linear regression is one of the most popular algorithms for predicting a numeric outcome from observed features. The default implementation in R is the lm() function. On a large dataset with many variables, lm() can take a very long time to run. The rxFastLinear() function, which ships in the MicrosoftML package and works alongside RevoScaleR, offers a much faster implementation of linear regression for large datasets with many variables. In this recipe, you will build a linear regression model to predict arrival delay time as a function of the origin and destination airports, the departure delay, and the day of the week.
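For comparison, the model described above can be sketched with the base lm() function on a small data frame. This is only an illustration: the data frame name flights, the airport codes, and all values are made up here, and only the column names come from the recipe.

```r
# Synthetic stand-in for the flight data; every value here is invented
# purely for illustration and does not come from the recipe's XDF file.
set.seed(42)
n <- 200
flights <- data.frame(
  ORIGIN      = factor(sample(c("JFK", "LAX", "ORD"), n, replace = TRUE)),
  DEST        = factor(sample(c("ATL", "DFW", "SFO"), n, replace = TRUE)),
  DAY_OF_WEEK = factor(sample(1:7, n, replace = TRUE)),
  DEP_DELAY   = rnorm(n, mean = 10, sd = 20)
)

# Toy assumption: arrival delay tracks departure delay plus random noise.
flights$ARR_DELAY <- 0.9 * flights$DEP_DELAY + rnorm(n, sd = 5)

# Same model form as the recipe: arrival delay as a function of origin,
# destination, departure delay, and day of the week.
fit <- lm(ARR_DELAY ~ ORIGIN + DEST + DEP_DELAY + DAY_OF_WEEK, data = flights)

# Estimate, standard error, t value, and p-value for the departure delay term.
summary(fit)$coefficients["DEP_DELAY", ]
```

lm() holds the entire data frame and the expanded design matrix in memory, which is why it struggles once the number of rows and factor levels grows.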
Getting ready
To build a linear regression model that predicts the arrival delay time, you will need the RevoScaleR library. The dataset for this recipe is an XDF file containing the following variables:
YEAR
QUARTER
MONTH
DAY_OF_MONTH
DAY_OF_WEEK
ORIGIN
DEST
DEP_DELAY
ARR_DELAY
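Before modeling, it can help to confirm that the XDF file actually contains the variables listed above. The following is a minimal sketch, assuming RevoScaleR is installed; the file name flights.xdf is a hypothetical placeholder, not the recipe's actual path.

```r
# Hypothetical path to the XDF file; replace it with your own location.
xdf_path <- "flights.xdf"

# Variables the recipe expects to find in the file.
expected_vars <- c("YEAR", "QUARTER", "MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK",
                   "ORIGIN", "DEST", "DEP_DELAY", "ARR_DELAY")

# Run the check only when RevoScaleR and the file are both available.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(xdf_path)) {
  flights_xdf <- RevoScaleR::RxXdfData(xdf_path)

  # List the variable names stored in the XDF file.
  found_vars <- RevoScaleR::rxGetVarNames(flights_xdf)

  # Report any expected variable that is absent.
  missing_vars <- setdiff(expected_vars, found_vars)
  if (length(missing_vars) > 0) {
    message("Missing variables: ", paste(missing_vars, collapse = ", "))
  } else {
    message("All expected variables are present.")
  }
} else {
  message("RevoScaleR or the XDF file is not available; skipping the check.")
}
```

The guard keeps the snippet from failing on a machine without RevoScaleR, so it can be pasted into a script without stopping it.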
The objective is to build a linear...