Machine Learning with R

Expert techniques for predictive modeling to solve all your data analysis problems

Product type: Paperback
Published: Jul 2015
Publisher: Packt
ISBN-13: 9781784393908
Length: 452 pages
Edition: 2nd Edition
Author: Brett Lantz
Table of Contents

Machine Learning with R Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Introducing Machine Learning
2. Managing and Understanding Data
3. Lazy Learning – Classification Using Nearest Neighbors
4. Probabilistic Learning – Classification Using Naive Bayes
5. Divide and Conquer – Classification Using Decision Trees and Rules
6. Forecasting Numeric Data – Regression Methods
7. Black Box Methods – Neural Networks and Support Vector Machines
8. Finding Patterns – Market Basket Analysis Using Association Rules
9. Finding Groups of Data – Clustering with k-means
10. Evaluating Model Performance
11. Improving Model Performance
12. Specialized Machine Learning Topics
Index

Index

A

  • abstraction / Abstraction
  • activation function / From biological to artificial neurons
    • about / Activation functions
    • threshold activation function / Activation functions
    • unit step activation function / Activation functions
    • sigmoid activation function / Activation functions
  • AdaBoost
    • about / Boosting
  • AdaBoost.M1 algorithm / Boosting
  • adaptive boosting
    • about / Boosting the accuracy of decision trees, Boosting
  • allocation function / Understanding ensembles
  • Apache Hadoop
    • about / Parallel cloud computing with MapReduce and Hadoop
  • Application Programming Interfaces (APIs)
    • about / Parsing JSON from web APIs
  • Apriori
    • property / The Apriori algorithm for association rule learning
  • Apriori algorithm
    • for association rule learning / The Apriori algorithm for association rule learning
    • strengths / The Apriori algorithm for association rule learning
  • Apriori principle
    • used, for building set of rules / Building a set of rules with the Apriori principle
  • Artificial Neural Network (ANN)
    • about / Understanding neural networks
  • association rules
    • about / Understanding association rules
    • potential applications / Understanding association rules
    • rule interest, measuring / Measuring rule interest – support and confidence
    • set of rules, building with Apriori principle / Building a set of rules with the Apriori principle
    • frequently purchased groceries, identifying with / Example – identifying frequently purchased groceries with association rules
  • automated parameter tuning
    • caret package used for / Using caret for automated parameter tuning
    • requisites / Using caret for automated parameter tuning
  • axon
    • about / From biological to artificial neurons

B

  • backpropagation
    • neural networks, training with / Training neural networks with backpropagation
    • about / Training neural networks with backpropagation
  • bag-of-words / Step 2 – exploring and preparing the data
  • bagging
    • about / Bagging
  • bank loans example, with C5.0 decision trees
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • random training datasets, creating / Data preparation – creating random training and test datasets
    • test datasets, creating / Data preparation – creating random training and test datasets
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • Bayesian methods
    • basic concepts / Basic concepts of Bayesian methods
  • Bayesian methods, basic concepts
    • probability / Understanding probability
    • joint probability / Understanding joint probability
    • conditional probability / Computing conditional probability with Bayes' theorem
  • Beowulf cluster
    • about / Working in parallel with multicore and snow
  • betweenness centrality
    • about / Analyzing and visualizing network data
  • bias / The case of linearly separable data
  • bias-variance tradeoff / Choosing an appropriate k
  • biglm package
    • regression models, building / Building bigger regression models with biglm
  • bigmemory package
    • massive matrices, using with / Using massive matrices with bigmemory
    • URL / Using massive matrices with bigmemory
  • bigrf package
    • random forests, building / Growing bigger and faster random forests with bigrf
    • URL / Growing bigger and faster random forests with bigrf
  • bimodal / Measuring the central tendency – the mode
  • binning
    • about / Using numeric features with Naive Bayes
  • bins
    • about / Using numeric features with Naive Bayes
  • Bioconductor
    • about / Analyzing bioinformatics data
    • URL / Analyzing bioinformatics data
  • bioinformatics
    • about / Analyzing bioinformatics data
  • bioinformatics data
    • analyzing / Analyzing bioinformatics data
  • bivariate relationships
    • about / Exploring relationships between variables
  • blind tasting experience example / The k-NN algorithm
  • blowby / Simple linear regression
  • body mass index (BMI) / Step 1 – collecting data
  • boosting
    • about / Boosting
  • bootstrap aggregating
    • about / Bagging
  • bootstrap sampling / Bootstrap sampling
  • box-and-whiskers plot / Visualizing numeric variables – boxplots
  • branches
    • about / Understanding decision trees
  • breast cancer
    • diagnosing, with k-NN algorithm / Example – diagnosing breast cancer with the k-NN algorithm
  • breast cancer example
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance

C

  • C5.0 algorithm
    • about / The C5.0 decision tree algorithm
    • split, selecting / Choosing the best split
    • decision tree, pruning / Pruning the decision tree
  • caret package
    • using, for automated parameter tuning / Using caret for automated parameter tuning
    • URL / Using caret for automated parameter tuning, Training and evaluating models in parallel with caret
    • used, for evaluating models in parallel / Training and evaluating models in parallel with caret
  • categorical / Types of input data
  • categorical variables
    • about / Exploring categorical variables
    • central tendency, measuring / Measuring the central tendency – the mode
  • cell body / From biological to artificial neurons
  • centroid / Using distance to assign and update clusters
  • characteristics, neural networks
    • activation function / From biological to artificial neurons
    • network topology / From biological to artificial neurons
    • training algorithm / From biological to artificial neurons
  • classification / Types of machine learning algorithms
  • classification and regression training (caret package) / Beyond accuracy – other measures of performance
  • Classification and Regression Tree (CART) algorithm / Understanding regression trees and model trees
  • classification performance
    • measuring / Measuring performance for classification
  • classification prediction data
    • working with / Working with classification prediction data in R
  • classification rules
    • about / Understanding classification rules
    • separate and conquer / Separate and conquer
    • 1R algorithm / The 1R algorithm
    • RIPPER algorithm / The RIPPER algorithm
    • obtaining, from decision trees / Rules from decision trees
  • class imbalance problem / Measuring performance for classification
  • clustering / Types of machine learning algorithms
    • about / Understanding clustering
    • as machine learning task / Clustering as a machine learning task
  • clustering, k-means clustering algorithm
    • about / The k-means clustering algorithm
    • distance, used for assigning cluster / Using distance to assign and update clusters
    • distance, used for updating cluster / Using distance to assign and update clusters
    • appropriate number of clusters, selecting / Choosing the appropriate number of clusters
  • column-major order / Matrixes and arrays
  • combination function / Understanding ensembles
  • Compute Unified Device Architecture (CUDA)
    • about / GPU computing
  • Comprehensive R Archive Network (CRAN)
    • about / Machine learning with R
    • URL / Machine learning with R
  • concrete strength, modeling with ANNs
    • about / Example – Modeling the strength of concrete with ANNs
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • conditional probability
    • about / Computing conditional probability with Bayes' theorem
  • confusion matrix
    • about / A closer look at confusion matrices
    • used, for measuring performance / Using confusion matrices to measure performance
  • control object / Customizing the tuning process
  • convex hull / The case of linearly separable data
  • corpus / Data preparation – cleaning and standardizing text data
  • correlation
    • about / Correlations
  • CRAN
    • about / Improving the performance of R
    • URL / Improving the performance of R
  • CRAN task view
    • URL / Analyzing bioinformatics data
  • CRAN Web Technologies
    • URL / Working with online data and services
  • cross-validation / Cross-validation
  • CSV (Comma-Separated Values) file
    • about / Importing and saving data from CSV files
  • CSV files
    • data, importing from / Importing and saving data from CSV files
  • curl utility
    • about / Downloading the complete text of web pages
  • cut points
    • about / Using numeric features with Naive Bayes

D

  • data
    • managing, with R / Managing data with R
    • importing, from CSV files / Importing and saving data from CSV files
  • data.table package
    • using / Making data frames faster with data.table
    • URL / Making data frames faster with data.table
  • Database Management Systems (DBMSs)
    • about / Querying data in SQL databases
  • databases
    • about / Working with proprietary files and databases
    • data, querying in SQL databases / Querying data in SQL databases
  • data dictionary
    • about / Exploring the structure of data
  • data exploration
    • about / Exploring and understanding data
  • data frame
    • about / Data frames
  • data mining
    • about / The origins of machine learning
  • data munging
    • about / Working with proprietary files and databases
  • data preparation, breast cancer example
    • training datasets, creating / Data preparation – creating training and test datasets
    • test datasets, creating / Data preparation – creating training and test datasets
  • Data Source Name (DSN)
    • about / Querying data in SQL databases
  • data storage / Data storage
  • data structures, R
    • about / R data structures
    • vector / Vectors
    • factor / Factors
    • lists / Lists
    • data frame / Data frames
    • matrix / Matrixes and arrays
    • array / Matrixes and arrays
    • saving / Saving, loading, and removing R data structures
    • loading / Saving, loading, and removing R data structures
    • removing / Saving, loading, and removing R data structures
    • exploring / Exploring the structure of data
  • data table
    • about / Making data frames faster with data.table
  • data wrangling
    • about / Working with proprietary files and databases
  • decision nodes
    • about / Understanding decision trees
  • decision tree
    • potential uses / Understanding decision trees
    • about / Understanding decision trees, Example – identifying risky bank loans using C5.0 decision trees
    • divide and conquer / Divide and conquer
    • pruning / Pruning the decision tree
    • used, for identifying risky bank loans / Example – identifying risky bank loans using C5.0 decision trees
    • accuracy, boosting / Boosting the accuracy of decision trees
  • decision tree forests
    • about / Random forests
  • decision trees
    • classification rules, obtaining from / Rules from decision trees
  • deep learning
    • about / The direction of information travel
  • Deep Neural Network (DNN)
    • about / The direction of information travel
  • delimiter
    • about / Importing and saving data from CSV files
  • dendrites
    • about / From biological to artificial neurons
  • dependent events / Understanding joint probability
  • dependent variable
    • about / Understanding regression
  • descriptive model / Types of machine learning algorithms
  • disk-based data frames
    • creating, with ff package / Creating disk-based data frames with ff
  • divide and conquer
    • about / Divide and conquer
  • domain-specific data
    • working with / Working with domain-specific data
    • bioinformatics data, analyzing / Analyzing bioinformatics data
    • network data, analyzing / Analyzing and visualizing network data
    • network data, visualizing / Analyzing and visualizing network data
  • doParallel package
    • using / Taking advantage of parallel with foreach and doParallel
  • dplyr package
    • used, for generalizing tabular data structures / Generalizing tabular data structures with dplyr
    • URL / Generalizing tabular data structures with dplyr
  • dummy coding / Preparing data for use with k-NN, Step 3 – training a model on the data
  • dummy variable / Examining relationships – two-way cross-tabulations, Step 3 – training a model on the data

E

  • early stopping
    • about / Pruning the decision tree
  • edgelist
    • about / Analyzing and visualizing network data
  • elements
    • about / Vectors
  • embarrassingly parallel problems
    • about / Learning faster with parallel computing
  • ensemble methods
    • bagging / Bagging
    • boosting / Boosting
    • random forests / Random forests
  • ensembles
    • about / Understanding ensembles
    • advantages / Understanding ensembles
  • entropy
    • about / Choosing the best split
  • epoch
    • about / Training neural networks with backpropagation
    • forward phase / Training neural networks with backpropagation
    • backward phase / Training neural networks with backpropagation
  • erosion / Simple linear regression
  • Euclidean norm / The case of linearly separable data
  • evaluation / Evaluation

F

  • 10-fold cross-validation (10-fold CV) / Cross-validation
  • F-measure / The F-measure
  • F-score / The F-measure
  • F1 score / The F-measure
  • factor
    • about / Factors
  • feedforward networks
    • about / The direction of information travel
  • ffbase project
    • URL / Creating disk-based data frames with ff
  • ff package
    • used, for creating disk-based data frames / Creating disk-based data frames with ff
    • URL / Creating disk-based data frames with ff
  • five-number summary / Measuring spread – quartiles and the five-number summary
  • foreach package
    • using / Taking advantage of parallel with foreach and doParallel
  • frequently purchased groceries
    • identifying, with association rules / Example – identifying frequently purchased groceries with association rules
  • future performance
    • estimating / Estimating future performance
  • future performance estimation
    • holdout method / The holdout method
    • cross-validation / Cross-validation
    • bootstrap sampling / Bootstrap sampling

G

  • Gaussian RBF kernel / Using kernels for non-linear spaces
  • generalization / Generalization
  • Generalized Linear Models (GLM) / Understanding regression
  • glyph / Step 1 – collecting data
  • GPU
    • about / GPU computing
    • computing / GPU computing
    • URL / GPU computing
  • gradient descent / Training neural networks with backpropagation
  • Graph Modeling Language (GML)
    • about / Analyzing and visualizing network data
  • greedy learners / What makes trees and rules greedy?
  • grid
    • about / Learning faster with parallel computing

H

  • Hadoop
    • using / Parallel cloud computing with MapReduce and Hadoop
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • harmonic mean / The F-measure
  • header line
    • about / Importing and saving data from CSV files
  • histograms / Visualizing numeric variables – histograms
  • holdout method / The holdout method, Cross-validation
  • httr package
    • URL / Downloading the complete text of web pages
  • hyperplane / Understanding Support Vector Machines
  • Hypertext Markup Language (HTML)
    • about / Downloading the complete text of web pages

I

  • igraph package
    • about / Analyzing and visualizing network data
    • URL / Analyzing and visualizing network data
  • imputation / Data preparation – imputing the missing values
  • Incremental Reduced Error Pruning (IREP) algorithm / The RIPPER algorithm
  • independent events / Understanding joint probability
  • independent variables
    • about / Understanding regression
  • information gain / Choosing the best split
  • input data
    • types / Types of input data
    • matching, to algorithms / Matching input data to algorithms
  • input nodes / The number of layers
  • instance-based learning
    • about / Why is the k-NN algorithm lazy?
  • intercept
    • about / Understanding regression
  • Interquartile Range (IQR) / Measuring spread – quartiles and the five-number summary
  • itemset
    • about / Understanding association rules
  • Iterative Dichotomiser 3 (ID3) / The C5.0 decision tree algorithm

J

  • joint probability / Understanding joint probability
  • JSON
    • parsing, from web APIs / Parsing JSON from web APIs
    • about / Parsing JSON from web APIs
    • URL / Parsing JSON from web APIs
  • jsonlite package
    • URL / Parsing JSON from web APIs

K

  • k-fold cross-validation (or k-fold CV) / Cross-validation
  • k-means++ / Using distance to assign and update clusters
  • k-means clustering algorithm
    • about / The k-means clustering algorithm
  • k-NN algorithm
    • about / The k-NN algorithm
    • weaknesses / The k-NN algorithm
    • similarity, measuring with distance / Measuring similarity with distance
    • appropriate k, selecting / Choosing an appropriate k
    • data, preparing / Preparing data for use with k-NN
    • lazy learning algorithm / Why is the k-NN algorithm lazy?
    • used, for diagnosing breast cancer / Example – diagnosing breast cancer with the k-NN algorithm
  • kernels
    • using, for non-linear spaces / Using kernels for non-linear spaces
  • kernel trick / Using kernels for non-linear spaces
  • kernlab
    • reference / Step 3 – training a model on the data

L

  • Laplace estimator
    • about / The Laplace estimator
  • large datasets
    • managing / Managing very large datasets
    • tabular data structures, generalizing with dplyr / Generalizing tabular data structures with dplyr
    • data.table package, using / Making data frames faster with data.table
    • disk-based data frames, creating with ff package / Creating disk-based data frames with ff
    • massive matrices, using with bigmemory package / Using massive matrices with bigmemory
  • latitude / Using kernels for non-linear spaces
  • layers
    • about / The number of layers
  • lazy learning algorithms / Why is the k-NN algorithm lazy?
  • leaf nodes
    • about / Understanding decision trees
  • learning rate / Training neural networks with backpropagation
  • leave-one-out method / Cross-validation
  • left-hand side (LHS) / Understanding association rules
  • levels / Types of machine learning algorithms
  • LIBSVM
    • URL / Step 3 – training a model on the data
  • likelihood
    • about / Computing conditional probability with Bayes' theorem
  • linear kernel / Using kernels for non-linear spaces
  • link function / Understanding regression
  • lists / Lists
  • loess curve / Visualizing relationships among features – the scatterplot matrix
  • logistic regression
    • about / Understanding regression
  • longitude / Using kernels for non-linear spaces

M

  • machine learning
    • origins / The origins of machine learning
    • about / The origins of machine learning
    • abuses / Uses and abuses of machine learning
    • uses / Uses and abuses of machine learning
    • successes / Machine learning successes
    • limitations / The limits of machine learning
    • ethics / Machine learning ethics
    • process / How machines learn
    • with R / Machine learning with R
    • R packages, installing / Installing R packages
    • R packages, loading / Loading and unloading R packages
    • R packages, unloading / Loading and unloading R packages
  • machine learning, in practice
    • about / Machine learning in practice
    • data collection / Machine learning in practice
    • data exploration and preparation / Machine learning in practice
    • model training / Machine learning in practice
    • model evaluation / Machine learning in practice
    • model improvement / Machine learning in practice
    • input data, types / Types of input data
    • algorithms, types / Types of machine learning algorithms
    • input data, matching to algorithms / Matching input data to algorithms
  • machine learning, process
    • about / How machines learn
    • data storage / How machines learn, Data storage
    • abstraction / How machines learn, Abstraction
    • generalization / How machines learn, Generalization
    • evaluation / How machines learn, Evaluation
  • machine learning algorithms
    • types / Types of machine learning algorithms
  • magrittr package
    • about / Scraping data from web pages
    • URL / Scraping data from web pages
  • MapReduce
    • about / Parallel cloud computing with MapReduce and Hadoop
    • map step / Parallel cloud computing with MapReduce and Hadoop
    • reduce step / Parallel cloud computing with MapReduce and Hadoop
  • marginal likelihood
    • about / Computing conditional probability with Bayes' theorem
  • market basket analysis example
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
    • item support, visualizing / Visualizing item support – item frequency plots
    • transaction data, visualizing / Visualizing the transaction data – plotting the sparse matrix
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
    • set of association rules, sorting / Sorting the set of association rules
    • subset of association rules, sorting / Taking subsets of association rules
    • association rules, saving to file / Saving association rules to a file or data frame
    • association rules, saving to data frame / Saving association rules to a file or data frame
  • matrix
    • about / Matrixes and arrays
  • matrix notation / Multiple linear regression
  • maximum margin hyperplane (MMH) / Classification with hyperplanes
  • mean / Measuring the central tendency – mean and median
  • mean absolute error (MAE) / Measuring performance with the mean absolute error
  • medical expenses, predicting with linear regression
    • about / Example – predicting medical expenses using linear regression
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • correlation matrix / Exploring relationships among features – the correlation matrix
    • relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
    • scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
    • model, training on data / Step 3 – training a model on the data
    • model performance, training / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance, Model specification – adding non-linear relationships, Transformation – converting a numeric variable to a binary indicator, Model specification – adding interaction effects, Putting it all together – an improved regression model
  • Message Passing Interface (MPI)
    • about / Working in parallel with multicore and snow
  • meta-learners / Types of machine learning algorithms
  • meta-learning methods
    • used, for improving model performance / Improving model performance with meta-learning
    • about / Improving model performance with meta-learning
  • min-max normalization / Preparing data for use with k-NN
  • mobile phone spam
    • filtering, with Naive Bayes algorithm / Example – filtering mobile phone spam with the Naive Bayes algorithm
  • mobile phone spam example
    • data, collecting / Step 1 – collecting data
    • data collection, URL / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • text data, cleaning / Data preparation – cleaning and standardizing text data
    • text data, standardizing / Data preparation – cleaning and standardizing text data
    • text documents, splitting into words / Data preparation – splitting text documents into words
    • training datasets, creating / Data preparation – creating training and test datasets
    • test datasets, creating / Data preparation – creating training and test datasets
    • text data, visualizing / Visualizing text data – word clouds
    • indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • model performance
    • improving, with meta-learning / Improving model performance with meta-learning
  • model performance, breast cancer example
    • z-score standardization / Transformation – z-score standardization
    • alternatives values, testing of k / Testing alternative values of k
  • model trees / Understanding regression trees and model trees
  • multicore package
    • using / Working in parallel with multicore and snow
  • multilayer network
    • about / The number of layers
  • Multilayer Perceptron (MLP)
    • about / The direction of information travel
  • multimodal / Measuring the central tendency – the mode
  • multinomial logistic regression / Understanding regression
  • multiple linear regression / Understanding regression
    • about / Multiple linear regression
    • weaknesses / Multiple linear regression
  • multiple R-squared value (coefficient of determination) / Step 4 – evaluating model performance
  • multivariate relationships
    • about / Exploring relationships between variables

N

  • Naive Bayes algorithm
    • about / Understanding Naive Bayes, The Naive Bayes algorithm
    • classification / Classification with Naive Bayes
    • Laplace estimator / The Laplace estimator
    • numeric features, using with / Using numeric features with Naive Bayes
    • used, for filtering mobile phone spam / Example – filtering mobile phone spam with the Naive Bayes algorithm
  • nearest neighbor classification
    • about / Understanding nearest neighbor classification
  • network analysis
    • about / Analyzing and visualizing network data
  • network data
    • analyzing / Analyzing and visualizing network data
    • visualizing / Analyzing and visualizing network data
  • network topology
    • about / Network topology
    • layers / The number of layers
    • direction of information travel / The direction of information travel
    • number of nodes in each layer / The number of nodes in each layer
  • neural networks
    • about / Understanding neural networks
    • biological, to artificial neurons / From biological to artificial neurons
    • characteristics / From biological to artificial neurons
    • training, with backpropagation / Training neural networks with backpropagation
  • neurons
    • about / Understanding neural networks
  • nodes / Understanding neural networks
  • nominal / Types of input data
  • nominal variables
    • about / Factors
  • non-linear spaces
    • kernels, using for / Using kernels for non-linear spaces
  • normal distribution / Understanding numeric data – uniform and normal distributions
  • numeric / Types of input data
  • numeric data
    • about / Understanding numeric data – uniform and normal distributions
    • normalizing / Transformation – normalizing numeric data
  • numeric features
    • using, with Naive Bayes / Using numeric features with Naive Bayes
  • numeric prediction / Types of machine learning algorithms
  • numeric variables
    • about / Exploring numeric variables
    • central tendency, measuring / Measuring the central tendency – mean and median
    • spread, measuring / Measuring spread – quartiles and the five-number summary, Measuring spread – variance and standard deviation
    • visualizing / Visualizing numeric variables – boxplots, Visualizing numeric variables – histograms

O

  • OCR, performing with SVMs
    • about / Example – performing OCR with SVMs
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • one-way table / Exploring categorical variables
  • online data
    • working with / Working with online data and services
    • parsing / Working with online data and services
    • complete text of web pages, downloading / Downloading the complete text of web pages
    • parsing, within web pages / Scraping data from web pages
  • online services
    • working with / Working with online data and services
  • Open Database Connectivity (ODBC)
    • about / Querying data in SQL databases
  • optimized learning algorithms
    • deploying / Deploying optimized learning algorithms
    • regression models, building with biglm package / Building bigger regression models with biglm
    • random forests, building with bigrf package / Growing bigger and faster random forests with bigrf
    • models in parallel, evaluating with caret package / Training and evaluating models in parallel with caret
  • ordinal / Types of input data
  • ordinary least squares estimation
    • about / Ordinary least squares estimation
  • out-of-bag error rate / Training random forests
  • overfitting / Evaluation

P

  • parallel cloud computing
    • with MapReduce / Parallel cloud computing with MapReduce and Hadoop
    • with Hadoop / Parallel cloud computing with MapReduce and Hadoop
  • parallel computing
    • about / Learning faster with parallel computing
    • execution time, measuring / Measuring execution time
    • with multicore package / Working in parallel with multicore and snow
    • with snow package / Working in parallel with multicore and snow
    • with foreach package / Taking advantage of parallel with foreach and doParallel
    • with doParallel package / Taking advantage of parallel with foreach and doParallel
  • parameter tuning
    • about / Tuning stock models for better performance
  • pattern discovery / Types of machine learning algorithms
  • Pearson's correlation coefficient / Correlations
  • performance
    • measuring, confusion matrices used / Using confusion matrices to measure performance
  • performance measures
    • about / Beyond accuracy – other measures of performance
    • kappa statistic / The kappa statistic
    • sensitivity / Sensitivity and specificity
    • specificity / Sensitivity and specificity
    • precision / Precision and recall
  • performance tradeoffs
    • visualizing / Visualizing performance trade-offs
  • poisonous mushrooms
    • identifying, with rule learners / Example – identifying poisonous mushrooms with rule learners
  • poisonous mushrooms example, with rule learners
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • Poisson regression
    • about / Understanding regression
  • polynomial kernel / Using kernels for non-linear spaces
  • positive predictive value / Precision and recall
  • posterior probability
    • about / Computing conditional probability with Bayes' theorem
  • postpruning
    • about / Pruning the decision tree
  • pre-pruning
    • about / Pruning the decision tree
  • precision / Precision and recall
  • predictive model / Types of machine learning algorithms
  • prior probability
    • about / Computing conditional probability with Bayes' theorem
  • probability
    • about / Understanding probability
  • proprietary files
    • about / Working with proprietary files and databases
    • Microsoft Excel files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Microsoft Excel files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SAS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SAS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SPSS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SPSS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Stata files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Stata files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
  • proprietary microarray
    • using / Analyzing bioinformatics data
  • pure / Choosing the best split
  • purity / Choosing the best split

Q

  • quadratic optimization / The case of linearly separable data
  • quantiles / Measuring spread – quartiles and the five-number summary

R

  • 1R algorithm / The 1R algorithm
  • R
    • about / Machine learning with R
    • packages, installing / Installing R packages
    • packages, loading / Loading and unloading R packages
    • packages, unloading / Loading and unloading R packages
    • data structures / R data structures
    • used, for managing data / Managing data with R
    • working with classification prediction data / Working with classification prediction data in R
  • R, performance improvement
    • about / Improving the performance of R
    • large datasets, managing / Managing very large datasets
    • parallel computing / Learning faster with parallel computing
    • GPU, computing / GPU computing
    • optimized learning algorithms, deploying / Deploying optimized learning algorithms
  • R-squared value / Step 4 – evaluating model performance
  • Radial Basis Function (RBF) network
    • about / Activation functions
  • random forests
    • about / Random forests
    • URL / Random forests
    • strengths / Random forests
    • training / Training random forests
    • performance, evaluating / Evaluating random forest performance
    • building, with bigrf package / Growing bigger and faster random forests with bigrf
  • RCurl
    • URL / Downloading the complete text of web pages
  • area under the ROC curve (AUC) / ROC curves
  • Receiver Operating Characteristic (ROC) curve
    • about / ROC curves
    • creating / ROC curves
  • recurrent network
    • about / The direction of information travel
  • recursive partitioning
    • about / Divide and conquer
  • regression
    • about / Understanding regression
    • simple linear regression / Simple linear regression
    • ordinary least squares estimation / Ordinary least squares estimation
    • correlation / Correlations
    • multiple linear regression / Multiple linear regression
    • adding, to trees / Adding regression to trees
  • regression analysis
    • use cases / Understanding regression
  • regression equations
    • about / Understanding regression
  • regression models
    • building, with biglm package / Building bigger regression models with biglm
  • regression trees
    • about / Understanding regression trees and model trees
  • relationships
    • exploring, between variables / Exploring relationships between variables
    • visualizing / Visualizing relationships – scatterplots
    • examining / Examining relationships – two-way cross-tabulations
  • Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm / The RIPPER algorithm
  • residuals / Ordinary least squares estimation
  • resubstitution error / Estimating future performance
  • Revolution Analytics
    • URL / Taking advantage of parallel with foreach and doParallel
  • RHadoop
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • RHIPE package
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • rio package
    • URL / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • about / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
  • RIPPER algorithm
    • about / The RIPPER algorithm
  • risky bank loans
    • identifying, C5.0 decision trees used / Example – identifying risky bank loans using C5.0 decision trees
  • rote learning
    • about / Why is the k-NN algorithm lazy?
  • rpart.plot
    • URL / Visualizing decision trees
  • rudimentary ANNs / Understanding neural networks
  • rvest package
    • about / Scraping data from web pages

S

  • scatterplot
    • about / Visualizing relationships – scatterplots
  • scatterplot matrix (SPLOM) / Visualizing relationships among features – the scatterplot matrix
  • Scoville scale / Preparing data for use with k-NN
  • segmentation analysis / Types of machine learning algorithms
  • semi-supervised learning / Clustering as a machine learning task
  • separate and conquer
    • about / Separate and conquer
  • sigmoid kernel / Using kernels for non-linear spaces
  • simple linear regression / Understanding regression
    • about / Simple linear regression
  • simple tuned model
    • creating / Creating a simple tuned model
  • slack variable / The case of nonlinearly separable data
  • slope
    • about / Understanding regression
  • slope-intercept form
    • about / Understanding regression
  • SMS Spam Collection
    • URL / Step 1 – collecting data
  • snowball
    • URL / Data preparation – cleaning and standardizing text data
  • snow package
    • using / Working in parallel with multicore and snow
    • URL / Working in parallel with multicore and snow
  • social networking service (SNS) / Example – finding teen market segments using k-means clustering
  • sparse matrix / Data preparation – splitting text documents into words, Data preparation – creating a sparse matrix for transaction data
  • SQL databases
    • data, querying in / Querying data in SQL databases
  • squashing functions / Activation functions
  • stacking
    • about / Understanding ensembles
  • standard deviation
    • about / Measuring spread – variance and standard deviation
  • standard deviation reduction (SDR) / Adding regression to trees
  • statistical hypothesis testing / Understanding regression
  • stock models
    • tuning, for better performance / Tuning stock models for better performance
  • Structured Query Language (SQL)
    • about / Querying data in SQL databases
  • subtree raising / Pruning the decision tree
  • subtree replacement / Pruning the decision tree
  • summary statistics / Exploring numeric variables
  • supervised learning / Types of machine learning algorithms
  • Support Vector Machine (SVM)
    • about / Understanding Support Vector Machines
    • applications / Understanding Support Vector Machines
    • classifications, with hyperplanes / Classification with hyperplanes
    • case of linearly separable data / The case of linearly separable data
    • case of nonlinearly separable data / The case of nonlinearly separable data
    • OCR, performing with / Example – performing OCR with SVMs
    / Bagging
  • support vectors / Classification with hyperplanes
  • SVMlight
    • about / Step 3 – training a model on the data
    • URL / Step 3 – training a model on the data
  • synapse
    • about / From biological to artificial neurons

T

  • Tab-Separated Value (TSV)
    • about / Importing and saving data from CSV files
  • tabular
    • about / Importing and saving data from CSV files
  • tabular data structures
    • generalizing, with dplyr package / Generalizing tabular data structures with dplyr
  • teen market segments search, with k-means clustering
    • about / Example – finding teen market segments using k-means clustering
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data, Data preparation – dummy coding missing values, Data preparation – imputing the missing values
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • terminal nodes / Understanding decision trees
  • threshold activation function / Activation functions
  • training / Abstraction
  • trees
    • regression, adding to / Adding regression to trees
  • tree structure
    • about / Understanding decision trees
  • tuning process
    • customizing / Customizing the tuning process
  • two-way cross-tabulation
    • about / Examining relationships – two-way cross-tabulations

U

  • UCI Machine Learning Data Repository
    • URL / Step 1 – collecting data, Step 1 – collecting data, Step 1 – collecting data
    • about / Step 1 – collecting data
  • unimodal / Measuring the central tendency – the mode
  • unit of analysis / Types of input data
  • unit of observation / Types of input data
  • unit step activation function / Activation functions
  • univariate statistics
    • about / Exploring relationships between variables
  • universal function approximator / The number of nodes in each layer
  • unsupervised learning / Types of machine learning algorithms

V

  • vector
    • about / Vectors
    • types / Vectors
  • Voronoi diagram / Using distance to assign and update clusters

W

  • web pages
    • complete text, downloading / Downloading the complete text of web pages
    • data, parsing / Scraping data from web pages
    • XML documents, parsing / Parsing XML documents
    • JSON, parsing from web APIs / Parsing JSON from web APIs
  • web scraping
    • about / Scraping data from web pages
  • wine quality estimation, with regression trees
    • about / Example – estimating the quality of wines with regression trees and model trees
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • decision trees, visualizing / Visualizing decision trees
    • model performance, evaluating / Step 4 – evaluating model performance
    • performance, measuring with mean absolute error / Measuring performance with the mean absolute error
    • model performance, improving / Step 5 – improving model performance
  • word cloud
    • about / Visualizing text data – word clouds
  • wordcloud package
    • URL / Visualizing text data – word clouds

X

  • xml2 GitHub
    • URL / Parsing XML documents
  • XML documents
    • parsing / Parsing XML documents
  • XML package
    • about / Parsing XML documents
    • URL / Parsing XML documents

Z

  • z-score / Preparing data for use with k-NN
  • z-score standardization / Preparing data for use with k-NN, Transformation – z-score standardization
  • ZeroR / The 1R algorithm