You're reading from Practical Predictive Analytics: Analyse current and historical data to predict future trends using R, Spark, and more

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781785886188

Length 576 pages

Edition 1st Edition

Author (1):

Winters

12. Spark Models – Rule-Based Learning

Extracting the Pima Indians diabetes dataset

Running the following code loads the PimaIndiansDiabetes R data frame and then applies the usual str() and summary() functions to it. The data ships with the mlbench package, so that package must be installed before the data can be retrieved.

No Spark directives are introduced at this point. Even though we are running in a Databricks environment, the code is pure R, so you can replicate it in your regular R environment as well.

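# install mlbench from the CRAN mirror on GitHub (requires the devtools package)
devtools::install_github("cran/mlbench")
# load the library and the dataset it contains
library(mlbench)
data(PimaIndiansDiabetes)
# inspect the structure and the summary statistics
str(PimaIndiansDiabetes)
summary(PimaIndiansDiabetes)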
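
If you replicate this in a regular R environment, you may prefer to install the package only when it is missing. The following is a minimal sketch of that idea, not the book's method: the helper name ensure_package and the choice of the cloud.r-project.org CRAN mirror are assumptions made for illustration.

```r
# Hypothetical helper (not from the book): install a package from CRAN
# only when it is not already available, then report whether it can be loaded.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumption: the package is published on CRAN under this name.
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package namespace can now be loaded, FALSE otherwise.
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling ensure_package("mlbench") before library(mlbench) would then stand in for the install step, and repeated runs of the same notebook or script would skip the installation once the package is present.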

Examining the output

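As usual, the str() and summary() functions give you your first insights into the data: str() reports the number of observations and the type of each variable, while summary() reports the distribution of each one. In a regular R environment, the output appears in the console pane, which is typically right below the coding pane; in a Databricks notebook, it appears beneath the cell that was run.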
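
Some of these first insights can also be collected programmatically rather than read off the printed output. The sketch below is an illustration under stated assumptions, not code from the book: first_look is a hypothetical helper, and the fallback to the built-in iris data exists only so the example runs when mlbench is not installed.

```r
# Hypothetical helper (not from the book): gathers a few of the facts that
# str() and summary() print, returned as a list that code can inspect.
first_look <- function(df) {
  stopifnot(is.data.frame(df))
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # first class of each column, e.g. "numeric" or "factor"
    col_classes = vapply(df, function(x) class(x)[1], character(1)),
    # number of missing (NA) values per column
    n_missing = colSums(is.na(df))
  )
}

# Use the dataset from the text when mlbench is installed; otherwise fall
# back to the built-in iris data so the sketch still runs.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes", package = "mlbench")
  demo_df <- PimaIndiansDiabetes
} else {
  demo_df <- iris
}

print(first_look(demo_df))
```

Returning a list instead of printing makes it easy to assert on the result later, for example to check that the expected number of rows was loaded.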

Note: not all output is shown.