Extracting the Pima Indians diabetes dataset
After running the following code, we will have the PimaIndiansDiabetes R data frame loaded, and we will run the usual str() and summary() functions on it. Note that we first need to install the mlbench package, because the dataset is contained within that package.
At this point, no Spark directives are being introduced. Even though we are running in a Databricks environment, the code is pure R, and you can replicate it in your regular R environment as well.
# install mlbench from its GitHub mirror, then load the library
devtools::install_github("cran/mlbench")
library(mlbench)

# load the dataset and take a first look at it
data(PimaIndiansDiabetes)
str(PimaIndiansDiabetes)
summary(PimaIndiansDiabetes)
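If devtools is not available in your regular R environment, mlbench can also be installed from CRAN. The following is only a sketch of one way to do that: ensure_package() is a hypothetical helper introduced here for illustration, and the repository URL is an assumption about which CRAN mirror you can reach.

```r
# Hypothetical helper (not part of the original listing): install a package
# from CRAN only when it is missing from the current library paths.
# Assumption: the CRAN mirror given in `repos` is reachable from this session.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package can be loaded now, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}
```

Calling ensure_package("mlbench") before library(mlbench) should then have the same effect as the install step above, while skipping the download when the package is already installed.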
Examining the output
As usual, the str() and summary() functions will give you your first insights into the data. The outputs will appear in the console pane, which is typically right below the coding pane.
Note: not all output is shown.
Output from the str() function
- The str() function tells us that there are 768 observations and 9 variables...
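The shape reported by str() can also be checked directly. This is a minimal sketch: describe_shape() is a hypothetical helper introduced here, and the block only touches the Pima data if mlbench is already installed in the session.

```r
# Hypothetical helper: report the number of rows and columns of a data frame.
describe_shape <- function(df) {
  stopifnot(is.data.frame(df))
  c(observations = nrow(df), variables = ncol(df))
}

# Run against the Pima data only when mlbench is installed in this session.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes", package = "mlbench")
  print(describe_shape(PimaIndiansDiabetes))

  # Count how many columns there are of each class (numeric, factor, ...)
  print(table(sapply(PimaIndiansDiabetes, function(col) class(col)[1])))
}
```

The two counts printed by describe_shape() should agree with the observations and variables that str() lists in its first line.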