Hands-On Ensemble Learning with R: A beginner's guide to combining the power of machine learning algorithms using ensemble techniques

Product type Paperback

Published in Jul 2018

Publisher Packt

ISBN-13 9781788624145

Length 376 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Tattar

Table of Contents (17 chapters)

Hands-On Ensemble Learning with R

Contributors

Preface

1. Introduction to Ensemble Techniques

2. Bootstrapping

3. Bagging

4. Random Forests

5. The Bare Bones Boosting Algorithms

6. Boosting Refinements

7. The General Ensemble Technique

8. Ensemble Diagnostics

9. Ensembling Regression Models

10. Ensembling Survival Models

11. Ensembling Time Series Models

12. What's Next?

Bibliography

Index

A

adabag packages
- using / Using the adabag and gbm packages
adaptive boosting / Adaptive boosting
Adaptive boosting algorithm
- about / Why does boosting work?
- working / Why does boosting work?
additive effect / Exponential smoothing state space model
advantages, extreme gradient boosting implementation
- parallel computing / The xgboost package
- regularization / The xgboost package
- cross-validation / The xgboost package
- pruning / The xgboost package
- missing values / The xgboost package
- saving and reloading / The xgboost package
- cross platform / The xgboost package
amyotrophic lateral sclerosis (ALS) / Squared-error loss function
area under curve (AUC) / Complementary statistical tests
auto-correlation function (ACF) / Core concepts and metrics
Auto-regressive Integrated Moving Average (ARIMA) models / Auto-regressive Integrated Moving Average (ARIMA) models

B

bagging
- comparing, with random forests / Comparing bagging, random forests, and boosting
- comparing, with boosting / Comparing bagging, random forests, and boosting
- for regression data / Bagging and Random Forests
bagging technique
- describing / Bagging and time series
board stiffness dataset / Board Stiffness
Bootstrap AGGregatING (bagging) / Bagging
boot package / The boot package
Bootstrap
- about / Bootstrap – a statistical method
- standard error of correlation coefficient / The standard error of correlation coefficient
- parametric bootstrap / The parametric bootstrap
- eigen values / Eigen values
- rule of thumb / Rule of thumb
bootstrap hypothesis testing problems / Bootstrap and testing hypotheses

C

Chi-square Automatic Interaction Detector (CHAID) / Random Forests
chi-square test / Chi-square and McNemar test
Classification and Regression Trees (CART) / Random Forests
- advantages / Random Forests
- drawbacks / Random Forests
classification trees / Classification trees and pruning
class prediction / Class prediction
Cohen's statistic / Cohen's statistic
complementary statistical tests
- about / Complementary statistical tests
- permutation test / Permutation test
- chi-square test / Chi-square and McNemar test
- McNemar test / Chi-square and McNemar test
- ROC test / ROC test
complexity parameter (Cp) / Classification trees and pruning
contingency table
- about / Pairwise measure
correlation coefficient measure / Correlation coefficient measure
Cox proportional hazards models / Regression models – parametric and Cox proportional hazards models

D

data
- pre-processing / Pre-processing the housing data
- housing / Pre-processing the housing data
datasets
- about / Datasets
- hypothyroid datasets / Hypothyroid
- waveform datasets / Waveform
- German Credit / German Credit
- Iris / Iris
- Pima Indians Diabetes / Pima Indians Diabetes
- US Crime / US Crime
- Overseas Visitors / Overseas visitors
- Primary Biliary Cirrhosis / Primary Biliary Cirrhosis
- multishapes / Multishapes
- board stiffness dataset / Board Stiffness
- selecting / The right model dilemma!
decision tree
- about / Decision tree
- for hypothyroid classification / Decision tree for hypothyroid classification
disagreement measure / Disagreement measure
- for ensemble / Disagreement measure for ensemble
double-fault measure / Double-fault measure

E

ensemble
- need for / An ensemble purview
- disagreement measure / Disagreement measure for ensemble
ensemble diagnostics
- about / What is ensemble diagnostics?
ensemble diversity
- about / Ensemble diversity
- numeric prediction / Numeric prediction
- class prediction / Class prediction
ensemble survival models / Ensemble survival models
ensembling
- working / Why does ensembling work?
- by voting / Ensembling by voting
- by averaging / Ensembling by averaging
ensembling, by averaging
- about / Ensembling by averaging
- simple averaging / Simple averaging
- weight averaging / Weight averaging
ensembling, by voting
- majority voting / Majority voting
- weighted voting / Weighted voting
entropy measure / Entropy measure
Exponential Distribution / Core concepts of survival analysis
exponential models
- reference / Exponential smoothing state space model
exponential smoothing state space model / Exponential smoothing state space model

F

functional-delta theorem / Nonparametric inference

G

Gamma Distribution / Core concepts of survival analysis
gbm package
- about / The gbm package
- reference / The gbm package
- boosting, for count data / Boosting for count data
- boosting, for survival data / Boosting for survival data
gbm packages
- using / Using the adabag and gbm packages
general boosting algorithm / The general boosting algorithm
German Credit
- about / German Credit
- reference / German Credit
German credit dataset / Classification trees and pruning
gradient boosting algorithm
- about / Gradient boosting
- building, from scratch / Building it from scratch
- squared-error loss function / Squared-error loss function

H

h2o package
- about / The h2o package
- reference / The h2o package
hazards regression model / Regression models – parametric and Cox proportional hazards models
hypothyroid dataset
- about / Hypothyroid
- reference / Hypothyroid

I

interrater agreement
- about / Interrating agreement
- entropy measure / Entropy measure
- Kohavi-Wolpert measure / Kohavi-Wolpert measure
- measurement / Measurement of interrater agreement
Iris dataset
- about / Iris
iterative reweighted least squares (IRLS) algorithm / Logistic regression model

J

jackknife technique
- about / The jackknife technique
- for mean and variance / The jackknife method for mean and variance
- pseudovalues method for survival data / Pseudovalues method for survival data

K

k-NN bagging / k-NN bagging
k-NN classifier / k-NN classifier, Analyzing waveform data
Kaplan-Meier estimator / Nonparametric inference
Kohavi-Wolpert measure / Kohavi-Wolpert measure

L

linear regression model / Linear regression model
logistic regression model
- about / Logistic regression model
- for hypothyroid classification / Logistic regression for hypothyroid classification

M

McNemar test / Chi-square and McNemar test
memoryless property / Core concepts of survival analysis
metrics / Core concepts and metrics
missForest function
- reference / Missing data imputation
missing data
- handling, random forests used / Missing data imputation
modeling dilemma / The right model dilemma!
multishapes dataset / Multishapes
multivariate statistics / Visualization and variable reduction

N

Naïve Bayes classifier
- about / Naïve Bayes classifier
- for hypothyroid classification / Naïve Bayes for hypothyroid classification
Nelson-Aalen estimator / Nonparametric inference
neural networks
- about / Neural networks
- for hypothyroid classification / Neural network for hypothyroid classification
nonparametric inference / Nonparametric inference
numeric prediction / Numeric prediction

O

Overseas Visitors dataset
- about / Overseas visitors
- reference / Overseas visitors

P

pairwise measure
- about / Pairwise measure
- disagreement measure / Disagreement measure
- Yule's coefficient / Yule's or Q-statistic
- Q-statistic / Yule's or Q-statistic
- correlation coefficient measure / Correlation coefficient measure
- Cohen's statistic / Cohen's statistic
- double-fault measure / Double-fault measure
partial auto-correlation function (PACF)
- about / Core concepts and metrics
- reference / Core concepts and metrics
partial likelihood function / Regression models – parametric and Cox proportional hazards models
permutation test / Permutation test
Pima Indians Diabetes dataset / Pima Indians Diabetes
Primary Biliary Cirrhosis dataset
- about / Primary Biliary Cirrhosis
Principal Component Analysis (PCA) / Visualization and variable reduction
proximity plots
- using / Proximity plots
pruning / Classification trees and pruning

Q

Q-statistic / Yule's or Q-statistic

R

random forest
- used, for clustering / Clustering with Random Forest
Random Forest algorithm
- about / Random Forests
random forest nuances / Random Forest nuances
random forests
- comparing, with bagging / Comparisons with bagging
- used, for handling missing data / Missing data imputation
- used, for clustering / Clustering with Random Forest
Random Forests
- for regression data / Bagging and Random Forests
raters
- about / Pairwise measure
regression models
- bootstrapping / Bootstrapping regression models
- about / Regression models, Regression models – parametric and Cox proportional hazards models
- linear regression model / Linear regression model
- neural networks / Neural networks
- regression tree / Regression tree
- prediction / Prediction for regression models
- boosting / Boosting regression models
- stacking methods / Stacking methods for regression models
- Cox proportional hazards models / Regression models – parametric and Cox proportional hazards models
- hazards regression model / Regression models – parametric and Cox proportional hazards models
regression tree / Regression tree
residual bootstrapping method / Bootstrapping regression models
ROC test / ROC test

S

split function / Bagging and Random Forests
stack ensembling / Stack ensembling
stacking methods
- for regression models / Stacking methods for regression models
statistical/machine learning models
- about / Statistical/machine learning models
- logistic regression model / Logistic regression model
- neural networks / Neural networks
- Naïve Bayes classifier / Naïve Bayes classifier
- decision tree / Decision tree
- support vector machines / Support vector machines
support vector machines
- about / Support vector machines
- for hypothyroid classification / SVM for hypothyroid classification
survival analysis
- about / Core concepts of survival analysis
Survival Models
- bootstrapping / Bootstrapping survival models*
survival tree
- about / Survival tree

T

time series datasets
- about / Time series datasets
- AirPassengers dataset / AirPassengers
- co2 time series data / co2
- uspop / uspop
- gas time series data / gas
- car sales data / Car Sales
- austres time series dataset / austres
- WWWusage time series dataset / WWWusage
time series models
- bootstrapping / Bootstrapping time series models*
- about / Essential time series models
- Naïve forecasting / Naïve forecasting
- seasonal / Seasonal, trend, and loess fitting
- trend / Seasonal, trend, and loess fitting
- loess fitting / Seasonal, trend, and loess fitting
- exponential smoothing state space model / Exponential smoothing state space model
- Auto-regressive Integrated Moving Average (ARIMA) models / Auto-regressive Integrated Moving Average (ARIMA) models
- auto-regressive neural networks / Auto-regressive neural networks
- linear model (LM) / Messing it all up
- messing up / Messing it all up
- ensembling / Ensemble time series models
time series visualization / Time series visualization

U

US Crime dataset / US Crime

V

variable clustering / Variable clustering
variable importance
- for decision trees and random forests / Variable importance
variable reduction
- about / Visualization and variable reduction
- techniques / Visualization and variable reduction
visualization / Visualization and variable reduction