Machine Learning with R

Expert techniques for predictive modeling to solve all your data analysis problems

Product type: Paperback
Published: Jul 2015
Publisher: Packt
ISBN-13: 9781784393908
Length: 452 pages
Edition: 2nd Edition
Author: Brett Lantz
Table of Contents

Machine Learning with R Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Introducing Machine Learning
2. Managing and Understanding Data
3. Lazy Learning – Classification Using Nearest Neighbors
4. Probabilistic Learning – Classification Using Naive Bayes
5. Divide and Conquer – Classification Using Decision Trees and Rules
6. Forecasting Numeric Data – Regression Methods
7. Black Box Methods – Neural Networks and Support Vector Machines
8. Finding Patterns – Market Basket Analysis Using Association Rules
9. Finding Groups of Data – Clustering with k-means
10. Evaluating Model Performance
11. Improving Model Performance
12. Specialized Machine Learning Topics
Index

Index

A

  • abstraction / Abstraction
  • activation function / From biological to artificial neurons
    • about / Activation functions
    • threshold activation function / Activation functions
    • unit step activation function / Activation functions
    • sigmoid activation function / Activation functions
  • AdaBoost
    • about / Boosting
  • AdaBoost.M1 algorithm / Boosting
  • adaptive boosting
    • about / Boosting the accuracy of decision trees, Boosting
  • allocation function / Understanding ensembles
  • Apache Hadoop
    • about / Parallel cloud computing with MapReduce and Hadoop
  • Application Programming Interfaces (APIs)
    • about / Parsing JSON from web APIs
  • Apriori
    • property / The Apriori algorithm for association rule learning
  • Apriori algorithm
    • for association rule learning / The Apriori algorithm for association rule learning
    • strengths / The Apriori algorithm for association rule learning
  • Apriori principle
    • used, for building set of rules / Building a set of rules with the Apriori principle
  • Artificial Neural Network (ANN)
    • about / Understanding neural networks
  • association rules
    • about / Understanding association rules
    • potential applications / Understanding association rules
    • rule interest, measuring / Measuring rule interest – support and confidence
    • set of rules, building with Apriori principle / Building a set of rules with the Apriori principle
    • frequently purchased groceries, identifying with / Example – identifying frequently purchased groceries with association rules
  • automated parameter tuning
    • caret package used for / Using caret for automated parameter tuning
    • requisites / Using caret for automated parameter tuning
  • axon
    • about / From biological to artificial neurons

B

  • backpropagation
    • neural networks, training with / Training neural networks with backpropagation
    • about / Training neural networks with backpropagation
  • bag-of-words / Step 2 – exploring and preparing the data
  • bagging
    • about / Bagging
  • bank loans example, with C5.0 decision trees
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • random training datasets, creating / Data preparation – creating random training and test datasets
    • test datasets, creating / Data preparation – creating random training and test datasets
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • Bayesian methods
    • basic concepts / Basic concepts of Bayesian methods
  • Bayesian methods, basic concepts
    • probability / Understanding probability
    • joint probability / Understanding joint probability
    • conditional probability / Computing conditional probability with Bayes' theorem
  • Beowulf cluster
    • about / Working in parallel with multicore and snow
  • betweenness centrality
    • about / Analyzing and visualizing network data
  • bias / The case of linearly separable data
  • bias-variance tradeoff / Choosing an appropriate k
  • biglm package
    • regression models, building / Building bigger regression models with biglm
  • bigmemory package
    • massive matrices, using with / Using massive matrices with bigmemory
    • URL / Using massive matrices with bigmemory
  • bigrf package
    • random forests, building / Growing bigger and faster random forests with bigrf
    • URL / Growing bigger and faster random forests with bigrf
  • bimodal / Measuring the central tendency – the mode
  • binning
    • about / Using numeric features with Naive Bayes
  • bins
    • about / Using numeric features with Naive Bayes
  • Bioconductor
    • about / Analyzing bioinformatics data
    • URL / Analyzing bioinformatics data
  • bioinformatics
    • about / Analyzing bioinformatics data
  • bioinformatics data
    • analyzing / Analyzing bioinformatics data
  • bivariate relationships
    • about / Exploring relationships between variables
  • blind tasting experience example / The k-NN algorithm
  • blowby / Simple linear regression
  • body mass index (BMI) / Step 1 – collecting data
  • boosting
    • about / Boosting
  • bootstrap aggregating
    • about / Bagging
  • bootstrap sampling / Bootstrap sampling
  • box-and-whiskers plot / Visualizing numeric variables – boxplots
  • branches
    • about / Understanding decision trees
  • breast cancer
    • diagnosing, with k-NN algorithm / Example – diagnosing breast cancer with the k-NN algorithm
  • breast cancer example
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance

C

  • C5.0 algorithm
    • about / The C5.0 decision tree algorithm
    • split, selecting / Choosing the best split
    • decision tree, pruning / Pruning the decision tree
  • caret package
    • using, for automated parameter tuning / Using caret for automated parameter tuning
    • URL / Using caret for automated parameter tuning, Training and evaluating models in parallel with caret
    • used, for evaluating models in parallel / Training and evaluating models in parallel with caret
  • categorical / Types of input data
  • categorical variables
    • about / Exploring categorical variables
    • central tendency, measuring / Measuring the central tendency – the mode
  • cell body / From biological to artificial neurons
  • centroid / Using distance to assign and update clusters
  • characteristics, neural networks
    • activation function / From biological to artificial neurons
    • network topology / From biological to artificial neurons
    • training algorithm / From biological to artificial neurons
  • classification / Types of machine learning algorithms
  • classification and regression training (caret package) / Beyond accuracy – other measures of performance
  • Classification and Regression Tree (CART) algorithm / Understanding regression trees and model trees
  • classification performance
    • measuring / Measuring performance for classification
  • classification prediction data
    • working with / Working with classification prediction data in R
  • classification rules
    • about / Understanding classification rules
    • separate and conquer / Separate and conquer
    • 1R algorithm / The 1R algorithm
    • RIPPER algorithm / The RIPPER algorithm
    • obtaining, from decision trees / Rules from decision trees
  • class imbalance problem / Measuring performance for classification
  • clustering / Types of machine learning algorithms
    • about / Understanding clustering
    • as machine learning task / Clustering as a machine learning task
  • clustering, k-means clustering algorithm
    • about / The k-means clustering algorithm
    • distance, used for assigning cluster / Using distance to assign and update clusters
    • distance, used for updating cluster / Using distance to assign and update clusters
    • appropriate number of clusters, selecting / Choosing the appropriate number of clusters
  • column-major order / Matrixes and arrays
  • combination function / Understanding ensembles
  • Compute Unified Device Architecture (CUDA)
    • about / GPU computing
  • Comprehensive R Archive Network (CRAN)
    • about / Machine learning with R
    • URL / Machine learning with R
  • concrete strength, modeling with ANNs
    • about / Example – Modeling the strength of concrete with ANNs
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • conditional probability
    • about / Computing conditional probability with Bayes' theorem
  • confusion matrix
    • about / A closer look at confusion matrices
    • used, for measuring performance / Using confusion matrices to measure performance
  • control object / Customizing the tuning process
  • convex hull / The case of linearly separable data
  • corpus / Data preparation – cleaning and standardizing text data
  • correlation
    • about / Correlations
  • CRAN
    • about / Improving the performance of R
    • URL / Improving the performance of R
  • CRAN task view
    • URL / Analyzing bioinformatics data
  • CRAN Web Technologies
    • URL / Working with online data and services
  • cross-validation / Cross-validation
  • CSV (Comma-Separated Values) file
    • about / Importing and saving data from CSV files
  • CSV files
    • data, importing from / Importing and saving data from CSV files
  • curl utility
    • about / Downloading the complete text of web pages
  • cut points
    • about / Using numeric features with Naive Bayes

D

  • data
    • managing, with R / Managing data with R
    • importing, from CSV files / Importing and saving data from CSV files
  • data.table package
    • using / Making data frames faster with data.table
    • URL / Making data frames faster with data.table
  • Database Management Systems (DBMSs)
    • about / Querying data in SQL databases
  • databases
    • about / Working with proprietary files and databases
    • data, querying in SQL databases / Querying data in SQL databases
  • data dictionary
    • about / Exploring the structure of data
  • data exploration
    • about / Exploring and understanding data
  • data frame
    • about / Data frames
  • data mining
    • about / The origins of machine learning
  • data munging
    • about / Working with proprietary files and databases
  • data preparation, breast cancer example
    • training datasets, creating / Data preparation – creating training and test datasets
    • test datasets, creating / Data preparation – creating training and test datasets
  • Data Source Name (DSN)
    • about / Querying data in SQL databases
  • data storage / Data storage
  • data structures, R
    • about / R data structures
    • vector / Vectors
    • factor / Factors
    • lists / Lists
    • data frame / Data frames
    • matrix / Matrixes and arrays
    • array / Matrixes and arrays
    • saving / Saving, loading, and removing R data structures
    • loading / Saving, loading, and removing R data structures
    • removing / Saving, loading, and removing R data structures
    • exploring / Exploring the structure of data
  • data table
    • about / Making data frames faster with data.table
  • data wrangling
    • about / Working with proprietary files and databases
  • decision nodes
    • about / Understanding decision trees
  • decision tree
    • potential uses / Understanding decision trees
    • about / Understanding decision trees, Example – identifying risky bank loans using C5.0 decision trees
    • divide and conquer / Divide and conquer
    • pruning / Pruning the decision tree
    • used, for identifying risky bank loans / Example – identifying risky bank loans using C5.0 decision trees
    • accuracy, boosting / Boosting the accuracy of decision trees
  • decision tree forests
    • about / Random forests
  • decision trees
    • classification rules, obtaining from / Rules from decision trees
  • deep learning
    • about / The direction of information travel
  • Deep Neural Network (DNN)
    • about / The direction of information travel
  • delimiter
    • about / Importing and saving data from CSV files
  • dendrites
    • about / From biological to artificial neurons
  • dependent events / Understanding joint probability
  • dependent variable
    • about / Understanding regression
  • descriptive model / Types of machine learning algorithms
  • disk-based data frames
    • creating, with ff package / Creating disk-based data frames with ff
  • divide and conquer
    • about / Divide and conquer
  • domain-specific data
    • working with / Working with domain-specific data
    • bioinformatics data, analyzing / Analyzing bioinformatics data
    • network data, analyzing / Analyzing and visualizing network data
    • network data, visualizing / Analyzing and visualizing network data
  • doParallel package
    • using / Taking advantage of parallel with foreach and doParallel
  • dplyr package
    • used, for generalizing tabular data structures / Generalizing tabular data structures with dplyr
    • URL / Generalizing tabular data structures with dplyr
  • dummy coding / Preparing data for use with k-NN, Step 3 – training a model on the data
  • dummy variable / Examining relationships – two-way cross-tabulations, Step 3 – training a model on the data

E

  • early stopping
    • about / Pruning the decision tree
  • edgelist
    • about / Analyzing and visualizing network data
  • elements
    • about / Vectors
  • embarrassingly parallel problems
    • about / Learning faster with parallel computing
  • ensemble methods
    • bagging / Bagging
    • boosting / Boosting
    • random forests / Random forests
  • ensembles
    • about / Understanding ensembles
    • advantages / Understanding ensembles
  • entropy
    • about / Choosing the best split
  • epoch
    • about / Training neural networks with backpropagation
    • forward phase / Training neural networks with backpropagation
    • backward phase / Training neural networks with backpropagation
  • erosion / Simple linear regression
  • Euclidean norm / The case of linearly separable data
  • evaluation / Evaluation

F

  • 10-fold cross-validation (10-fold CV) / Cross-validation
  • F-measure / The F-measure
  • F-score / The F-measure
  • F1 score / The F-measure
  • factor
    • about / Factors
  • feedforward networks
    • about / The direction of information travel
  • ffbase project
    • URL / Creating disk-based data frames with ff
  • ff package
    • used, for creating disk-based data frames / Creating disk-based data frames with ff
    • URL / Creating disk-based data frames with ff
  • five-number summary / Measuring spread – quartiles and the five-number summary
  • foreach package
    • using / Taking advantage of parallel with foreach and doParallel
  • frequently purchased groceries
    • identifying, with association rules / Example – identifying frequently purchased groceries with association rules
  • future performance
    • estimating / Estimating future performance
  • future performance estimation
    • holdout method / The holdout method
    • cross-validation / Cross-validation
    • bootstrap sampling / Bootstrap sampling

G

  • Gaussian RBF kernel / Using kernels for non-linear spaces
  • generalization / Generalization
  • Generalized Linear Models (GLM) / Understanding regression
  • glyph / Step 1 – collecting data
  • GPU
    • about / GPU computing
    • computing / GPU computing
    • URL / GPU computing
  • gradient descent / Training neural networks with backpropagation
  • Graph Modeling Language (GML)
    • about / Analyzing and visualizing network data
  • greedy learners / What makes trees and rules greedy?
  • grid
    • about / Learning faster with parallel computing

H

  • Hadoop
    • using / Parallel cloud computing with MapReduce and Hadoop
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • harmonic mean / The F-measure
  • header line
    • about / Importing and saving data from CSV files
  • histograms / Visualizing numeric variables – histograms
  • holdout method / The holdout method, Cross-validation
  • httr package
    • URL / Downloading the complete text of web pages
  • hyperplane / Understanding Support Vector Machines
  • Hypertext Markup Language (HTML)
    • about / Downloading the complete text of web pages

I

  • igraph package
    • about / Analyzing and visualizing network data
    • URL / Analyzing and visualizing network data
  • imputation / Data preparation – imputing the missing values
  • Incremental Reduced Error Pruning (IREP) algorithm / The RIPPER algorithm
  • independent events / Understanding joint probability
  • independent variables
    • about / Understanding regression
  • information gain / Choosing the best split
  • input data
    • types / Types of input data
    • matching, to algorithms / Matching input data to algorithms
  • input nodes / The number of layers
  • instance-based learning
    • about / Why is the k-NN algorithm lazy?
  • intercept
    • about / Understanding regression
  • Interquartile Range (IQR) / Measuring spread – quartiles and the five-number summary
  • itemset
    • about / Understanding association rules
  • Iterative Dichotomiser 3 (ID3) / The C5.0 decision tree algorithm

J

  • joint probability / Understanding joint probability
  • JSON
    • parsing, from web APIs / Parsing JSON from web APIs
    • about / Parsing JSON from web APIs
    • URL / Parsing JSON from web APIs
  • jsonlite package
    • URL / Parsing JSON from web APIs

K

  • k-fold cross-validation (or k-fold CV) / Cross-validation
  • k-means++ / Using distance to assign and update clusters
  • k-means clustering algorithm
    • about / The k-means clustering algorithm
  • k-NN algorithm
    • about / The k-NN algorithm
    • weaknesses / The k-NN algorithm
    • similarity, measuring with distance / Measuring similarity with distance
    • appropriate k, selecting / Choosing an appropriate k
    • data, preparing / Preparing data for use with k-NN
    • lazy learning algorithm / Why is the k-NN algorithm lazy?
    • used, for diagnosing breast cancer / Example – diagnosing breast cancer with the k-NN algorithm
  • kernels
    • using, for non-linear spaces / Using kernels for non-linear spaces
  • kernel trick / Using kernels for non-linear spaces
  • kernlab
    • reference / Step 3 – training a model on the data

L

  • Laplace estimator
    • about / The Laplace estimator
  • large datasets
    • managing / Managing very large datasets
    • tabular data structures, generalizing with dplyr / Generalizing tabular data structures with dplyr
    • data.table package, using / Making data frames faster with data.table
    • disk-based data frames, creating with ff package / Creating disk-based data frames with ff
    • massive matrices, using with bigmemory package / Using massive matrices with bigmemory
  • latitude / Using kernels for non-linear spaces
  • layers
    • about / The number of layers
  • lazy learning algorithms / Why is the k-NN algorithm lazy?
  • leaf nodes
    • about / Understanding decision trees
  • learning rate / Training neural networks with backpropagation
  • leave-one-out method / Cross-validation
  • left-hand side (LHS) / Understanding association rules
  • levels / Types of machine learning algorithms
  • LIBSVM
    • URL / Step 3 – training a model on the data
  • likelihood
    • about / Computing conditional probability with Bayes' theorem
  • linear kernel / Using kernels for non-linear spaces
  • link function / Understanding regression
  • lists / Lists
  • loess curve / Visualizing relationships among features – the scatterplot matrix
  • logistic regression
    • about / Understanding regression
  • longitude / Using kernels for non-linear spaces

M

  • machine learning
    • origins / The origins of machine learning
    • about / The origins of machine learning
    • abuses / Uses and abuses of machine learning
    • uses / Uses and abuses of machine learning
    • successes / Machine learning successes
    • limitations / The limits of machine learning
    • ethics / Machine learning ethics
    • process / How machines learn
    • with R / Machine learning with R
    • R packages, installing / Installing R packages
    • R packages, loading / Loading and unloading R packages
    • R packages, unloading / Loading and unloading R packages
  • machine learning, in practice
    • about / Machine learning in practice
    • data collection / Machine learning in practice
    • data exploration and preparation / Machine learning in practice
    • model training / Machine learning in practice
    • model evaluation / Machine learning in practice
    • model improvement / Machine learning in practice
    • input data, types / Types of input data
    • algorithms, types / Types of machine learning algorithms
    • input data, matching to algorithms / Matching input data to algorithms
  • machine learning, process
    • about / How machines learn
    • data storage / How machines learn, Data storage
    • abstraction / How machines learn, Abstraction
    • generalization / How machines learn, Generalization
    • evaluation / How machines learn, Evaluation
  • machine learning algorithms
    • types / Types of machine learning algorithms
  • magrittr package
    • about / Scraping data from web pages
    • URL / Scraping data from web pages
  • MapReduce
    • about / Parallel cloud computing with MapReduce and Hadoop
    • map step / Parallel cloud computing with MapReduce and Hadoop
    • reduce step / Parallel cloud computing with MapReduce and Hadoop
  • marginal likelihood
    • about / Computing conditional probability with Bayes' theorem
  • market basket analysis example
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
    • item support, visualizing / Visualizing item support – item frequency plots
    • transaction data, visualizing / Visualizing the transaction data – plotting the sparse matrix
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
    • set of association rules, sorting / Sorting the set of association rules
    • subset of association rules, sorting / Taking subsets of association rules
    • association rules, saving to file / Saving association rules to a file or data frame
    • association rules, saving to data frame / Saving association rules to a file or data frame
  • matrix
    • about / Matrixes and arrays
  • matrix notation / Multiple linear regression
  • maximum margin hyperplane (MMH) / Classification with hyperplanes
  • mean / Measuring the central tendency – mean and median
  • mean absolute error (MAE) / Measuring performance with the mean absolute error
  • medical expenses, predicting with linear regression
    • about / Example – predicting medical expenses using linear regression
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • correlation matrix / Exploring relationships among features – the correlation matrix
    • relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
    • scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
    • model, training on data / Step 3 – training a model on the data
    • model performance, training / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance, Model specification – adding non-linear relationships, Transformation – converting a numeric variable to a binary indicator, Model specification – adding interaction effects, Putting it all together – an improved regression model
  • Message Passing Interface (MPI)
    • about / Working in parallel with multicore and snow
  • meta-learners / Types of machine learning algorithms
  • meta-learning methods
    • used, for improving model performance / Improving model performance with meta-learning
    • about / Improving model performance with meta-learning
  • min-max normalization / Preparing data for use with k-NN
  • mobile phone spam
    • filtering, with Naive Bayes algorithm / Example – filtering mobile phone spam with the Naive Bayes algorithm
  • mobile phone spam example
    • data, collecting / Step 1 – collecting data
    • data collection, URL / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • text data, cleaning / Data preparation – cleaning and standardizing text data
    • text data, standardizing / Data preparation – cleaning and standardizing text data
    • text documents, splitting into words / Data preparation – splitting text documents into words
    • training datasets, creating / Data preparation – creating training and test datasets
    • test datasets, creating / Data preparation – creating training and test datasets
    • text data, visualizing / Visualizing text data – word clouds
    • indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • model performance
    • improving, with meta-learning / Improving model performance with meta-learning
  • model performance, breast cancer example
    • z-score standardization / Transformation – z-score standardization
    • alternatives values, testing of k / Testing alternative values of k
  • model trees / Understanding regression trees and model trees
  • multicore package
    • using / Working in parallel with multicore and snow
  • multilayer network
    • about / The number of layers
  • Multilayer Perceptron (MLP)
    • about / The direction of information travel
  • multimodal / Measuring the central tendency – the mode
  • multinomial logistic regression / Understanding regression
  • multiple linear regression / Understanding regression
    • about / Multiple linear regression
    • weaknesses / Multiple linear regression
  • multiple R-squared value (coefficient of determination) / Step 4 – evaluating model performance
  • multivariate relationships
    • about / Exploring relationships between variables

N

  • Naive Bayes algorithm
    • about / Understanding Naive Bayes, The Naive Bayes algorithm
    • classification / Classification with Naive Bayes
    • Laplace estimator / The Laplace estimator
    • numeric features, using with / Using numeric features with Naive Bayes
    • used, for filtering mobile phone spam / Example – filtering mobile phone spam with the Naive Bayes algorithm
  • nearest neighbor classification
    • about / Understanding nearest neighbor classification
  • network analysis
    • about / Analyzing and visualizing network data
  • network data
    • analyzing / Analyzing and visualizing network data
    • visualizing / Analyzing and visualizing network data
  • network topology
    • about / Network topology
    • layers / The number of layers
    • direction of information travel / The direction of information travel
    • number of nodes in each layer / The number of nodes in each layer
  • neural networks
    • about / Understanding neural networks
    • biological, to artificial neurons / From biological to artificial neurons
    • characteristics / From biological to artificial neurons
    • training, with backpropagation / Training neural networks with backpropagation
  • neurons
    • about / Understanding neural networks
  • nodes / Understanding neural networks
  • nominal / Types of input data
  • nominal variables
    • about / Factors
  • non-linear spaces
    • kernels, using for / Using kernels for non-linear spaces
  • normal distribution / Understanding numeric data – uniform and normal distributions
  • numeric / Types of input data
  • numeric data
    • about / Understanding numeric data – uniform and normal distributions
    • normalizing / Transformation – normalizing numeric data
  • numeric features
    • using, with Naive Bayes / Using numeric features with Naive Bayes
  • numeric prediction / Types of machine learning algorithms
  • numeric variables
    • about / Exploring numeric variables
    • central tendency, measuring / Measuring the central tendency – mean and median
    • spread, measuring / Measuring spread – quartiles and the five-number summary, Measuring spread – variance and standard deviation
    • visualizing / Visualizing numeric variables – boxplots, Visualizing numeric variables – histograms

O

  • OCR, performing with SVMs
    • about / Example – performing OCR with SVMs
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • one-way table / Exploring categorical variables
  • online data
    • working with / Working with online data and services
    • parsing / Working with online data and services
    • complete text of web pages, downloading / Downloading the complete text of web pages
    • parsing, within web pages / Scraping data from web pages
  • online services
    • working with / Working with online data and services
  • Open Database Connectivity (ODBC)
    • about / Querying data in SQL databases
  • optimized learning algorithms
    • deploying / Deploying optimized learning algorithms
    • regression models, building with biglm package / Building bigger regression models with biglm
    • random forests, building with bigrf package / Growing bigger and faster random forests with bigrf
    • models in parallel, evaluating with caret package / Training and evaluating models in parallel with caret
  • ordinal / Types of input data
  • ordinary least squares estimation
    • about / Ordinary least squares estimation
  • out-of-bag error rate / Training random forests
  • overfitting / Evaluation

P

  • parallel cloud computing
    • with MapReduce / Parallel cloud computing with MapReduce and Hadoop
    • with Hadoop / Parallel cloud computing with MapReduce and Hadoop
  • parallel computing
    • about / Learning faster with parallel computing
    • execution time, measuring / Measuring execution time
    • with multicore package / Working in parallel with multicore and snow
    • with snow package / Working in parallel with multicore and snow
    • with foreach package / Taking advantage of parallel with foreach and doParallel
    • with doParallel package / Taking advantage of parallel with foreach and doParallel
  • parameter tuning
    • about / Tuning stock models for better performance
  • pattern discovery / Types of machine learning algorithms
  • Pearson's correlation coefficient / Correlations
  • performance
    • measuring, confusion matrices used / Using confusion matrices to measure performance
  • performance measures
    • about / Beyond accuracy – other measures of performance
    • kappa statistic / The kappa statistic
    • sensitivity / Sensitivity and specificity
    • specificity / Sensitivity and specificity
    • precision / Precision and recall
  • performance tradeoffs
    • visualizing / Visualizing performance trade-offs
  • poisonous mushrooms
    • identifying, with rule learners / Example – identifying poisonous mushrooms with rule learners
  • poisonous mushrooms example, with rule learners
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • Poisson regression
    • about / Understanding regression
  • polynomial kernel / Using kernels for non-linear spaces
  • positive predictive value / Precision and recall
  • posterior probability
    • about / Computing conditional probability with Bayes' theorem
  • postpruning
    • about / Pruning the decision tree
  • pre-pruning
    • about / Pruning the decision tree
  • precision / Precision and recall
  • predictive model / Types of machine learning algorithms
  • prior probability
    • about / Computing conditional probability with Bayes' theorem
  • probability
    • about / Understanding probability
  • proprietary files
    • about / Working with proprietary files and databases
    • Microsoft Excel files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Microsoft Excel files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SAS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SAS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SPSS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • SPSS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Stata files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • Stata files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
  • proprietary microarray
    • using / Analyzing bioinformatics data
  • pure / Choosing the best split
  • purity / Choosing the best split

Q

  • quadratic optimization / The case of linearly separable data
  • quantiles / Measuring spread – quartiles and the five-number summary

R

  • 1R algorithm / The 1R algorithm
  • R
    • about / Machine learning with R
    • packages, installing / Installing R packages
    • packages, loading / Loading and unloading R packages
    • packages, unloading / Loading and unloading R packages
    • data structures / R data structures
    • used, for managing data / Managing data with R
    • working with classification prediction data / Working with classification prediction data in R
  • R, performance improvement
    • about / Improving the performance of R
    • large datasets, managing / Managing very large datasets
    • parallel computing / Learning faster with parallel computing
    • GPU, computing / GPU computing
    • optimized learning algorithms, deploying / Deploying optimized learning algorithms
  • R-squared value / Step 4 – evaluating model performance
  • Radial Basis Function (RBF) network
    • about / Activation functions
  • random forests
    • about / Random forests
    • URL / Random forests
    • strengths / Random forests
    • training / Training random forests
    • performance, evaluating / Evaluating random forest performance
    • building, with bigrf package / Growing bigger and faster random forests with bigrf
  • RCurl
    • URL / Downloading the complete text of web pages
  • area under the ROC curve (AUC) / ROC curves
  • Receiver Operating Characteristic (ROC) curve
    • about / ROC curves
    • creating / ROC curves
  • recurrent network
    • about / The direction of information travel
  • recursive partitioning
    • about / Divide and conquer
  • regression
    • about / Understanding regression
    • simple linear regression / Simple linear regression
    • ordinary least squares estimation / Ordinary least squares estimation
    • correlation / Correlations
    • multiple linear regression / Multiple linear regression
    • adding, to trees / Adding regression to trees
  • regression analysis
    • use cases / Understanding regression
  • regression equations
    • about / Understanding regression
  • regression models
    • building, with biglm package / Building bigger regression models with biglm
  • regression trees
    • about / Understanding regression trees and model trees
  • relationships
    • exploring, between variables / Exploring relationships between variables
    • visualizing / Visualizing relationships – scatterplots
    • examining / Examining relationships – two-way cross-tabulations
  • Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm / The RIPPER algorithm
  • residuals / Ordinary least squares estimation
  • resubstitution error / Estimating future performance
  • Revolution Analytics
    • URL / Taking advantage of parallel with foreach and doParallel
  • RHadoop
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • RHIPE package
    • URL / Parallel cloud computing with MapReduce and Hadoop
  • rio package
    • URL / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    • about / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
  • RIPPER algorithm
    • about / The RIPPER algorithm
  • risky bank loans
    • identifying, C5.0 decision trees used / Example – identifying risky bank loans using C5.0 decision trees
  • rote learning
    • about / Why is the k-NN algorithm lazy?
  • rpart.plot
    • URL / Visualizing decision trees
  • rudimentary ANNs / Understanding neural networks
  • rvest package
    • about / Scraping data from web pages

S

  • scatterplot
    • about / Visualizing relationships – scatterplots
  • scatterplot matrix (SPLOM) / Visualizing relationships among features – the scatterplot matrix
  • Scoville scale / Preparing data for use with k-NN
  • segmentation analysis / Types of machine learning algorithms
  • semi-supervised learning / Clustering as a machine learning task
  • separate and conquer
    • about / Separate and conquer
  • sigmoid kernel / Using kernels for non-linear spaces
  • simple linear regression / Understanding regression
    • about / Simple linear regression
  • simple tuned model
    • creating / Creating a simple tuned model
  • slack variable / The case of nonlinearly separable data
  • slope
    • about / Understanding regression
  • slope-intercept form
    • about / Understanding regression
  • SMS Spam Collection
    • URL / Step 1 – collecting data
  • snowball
    • URL / Data preparation – cleaning and standardizing text data
  • snow package
    • using / Working in parallel with multicore and snow
    • URL / Working in parallel with multicore and snow
  • social networking service (SNS) / Example – finding teen market segments using k-means clustering
  • sparse matrix / Data preparation – splitting text documents into words, Data preparation – creating a sparse matrix for transaction data
  • SQL databases
    • data, querying in / Querying data in SQL databases
  • squashing functions / Activation functions
  • stacking
    • about / Understanding ensembles
  • standard deviation
    • about / Measuring spread – variance and standard deviation
  • standard deviation reduction (SDR) / Adding regression to trees
  • statistical hypothesis testing / Understanding regression
  • stock models
    • tuning, for better performance / Tuning stock models for better performance
  • Structured Query Language (SQL)
    • about / Querying data in SQL databases
  • subtree raising / Pruning the decision tree
  • subtree replacement / Pruning the decision tree
  • summary statistics / Exploring numeric variables
  • supervised learning / Types of machine learning algorithms
  • Support Vector Machine (SVM)
    • about / Understanding Support Vector Machines
    • applications / Understanding Support Vector Machines
    • classifications, with hyperplanes / Classification with hyperplanes
    • case of linearly separable data / The case of linearly separable data
    • case of nonlinearly separable data / The case of nonlinearly separable data
    • OCR, performing with / Example – performing OCR with SVMs
    / Bagging
  • support vectors / Classification with hyperplanes
  • SVMlight
    • about / Step 3 – training a model on the data
    • URL / Step 3 – training a model on the data
  • synapse
    • about / From biological to artificial neurons

T

  • Tab-Separated Value (TSV)
    • about / Importing and saving data from CSV files
  • tabular
    • about / Importing and saving data from CSV files
  • tabular data structures
    • generalizing, with dplyr package / Generalizing tabular data structures with dplyr
  • teen market segments search, with k-means clustering
    • about / Example – finding teen market segments using k-means clustering
    • data, collecting / Step 1 – collecting data
    • data, exploring / Step 2 – exploring and preparing the data
    • data, preparing / Step 2 – exploring and preparing the data, Data preparation – dummy coding missing values, Data preparation – imputing the missing values
    • model, training on data / Step 3 – training a model on the data
    • model performance, evaluating / Step 4 – evaluating model performance
    • model performance, improving / Step 5 – improving model performance
  • terminal nodes / Understanding decision trees
  • threshold activation function / Activation functions
  • training / Abstraction
  • trees
    • regression, adding to / Adding regression to trees
  • tree structure
    • about / Understanding decision trees
  • tuning process
    • customizing / Customizing the tuning process
  • two-way cross-tabulation
    • about / Examining relationships – two-way cross-tabulations

U

  • UCI Machine Learning Data Repository
    • URL / Step 1 – collecting data, Step 1 – collecting data, Step 1 – collecting data
    • about / Step 1 – collecting data
  • unimodal / Measuring the central tendency – the mode
  • unit of analysis / Types of input data
  • unit of observation / Types of input data
  • unit step activation function / Activation functions
  • univariate statistics
    • about / Exploring relationships between variables
  • universal function approximator / The number of nodes in each layer
  • unsupervised learning / Types of machine learning algorithms

V

  • vector
    • about / Vectors
    • types / Vectors
  • Voronoi diagram / Using distance to assign and update clusters

W

  • web pages
    • complete text, downloading / Downloading the complete text of web pages
    • data, parsing / Scraping data from web pages
    • XML documents, parsing / Parsing XML documents
    • JSON, parsing from web APIs / Parsing JSON from web APIs
  • web scraping
    • about / Scraping data from web pages
  • wine quality estimation, with regression trees
    • about / Example – estimating the quality of wines with regression trees and model trees
    • data, collecting / Step 1 – collecting data
    • data, preparing / Step 2 – exploring and preparing the data
    • data, exploring / Step 2 – exploring and preparing the data
    • model, training on data / Step 3 – training a model on the data
    • decision trees, visualizing / Visualizing decision trees
    • model performance, evaluating / Step 4 – evaluating model performance
    • performance, measuring with mean absolute error / Measuring performance with the mean absolute error
    • model performance, improving / Step 5 – improving model performance
  • word cloud
    • about / Visualizing text data – word clouds
  • wordcloud package
    • URL / Visualizing text data – word clouds

X

  • xml2 GitHub
    • URL / Parsing XML documents
  • XML documents
    • parsing / Parsing XML documents
  • XML package
    • about / Parsing XML documents
    • URL / Parsing XML documents

Z

  • z-score / Preparing data for use with k-NN
  • z-score standardization / Preparing data for use with k-NN, Transformation – z-score standardization
  • ZeroR / The 1R algorithm