Scala for Machine Learning

You're reading from Scala for Machine Learning: Leverage Scala and Machine Learning to construct and study systems that can learn from data

Product type: Paperback
Published in: Dec 2014
ISBN-13: 9781783558742
Length: 624 pages
Edition: 1st Edition
Author (1): R. Nicolas
Table of Contents (20 Chapters)

Scala for Machine Learning
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Getting Started
2. Hello World!
3. Data Preprocessing
4. Unsupervised Learning
5. Naïve Bayes Classifiers
6. Regression and Regularization
7. Sequential Data Models
8. Kernel Models and Support Vector Machines
9. Artificial Neural Networks
10. Genetic Algorithms
11. Reinforcement Learning
12. Scalable Frameworks
Basic Concepts
Index

Index

A

  • abstraction, Scala
    • about / Abstraction
    • higher-kind projection / Higher-kind projection
    • covariant functors for vectors / Covariant functors for vectors
    • contravariant functors for co-vectors / Contravariant functors for co-vectors
    • monads / Monads
  • Actor model
    • about / The Actor model
    • components / The Actor model
  • actors
    • about / Scalability
  • adaptive modeling / Model categorization
  • Akka.io
    • about / An overview
  • Akka framework
    • about / An overview, Akka
    • URL / Akka
    • master-workers / Master-workers
    • futures / Futures
  • Algebird
    • about / Abstraction
  • algebraic libraries
    • about / Algebraic and numerical libraries
    • jBlas 1.2.3 / Algebraic and numerical libraries
    • Colt 1.2.0 / Algebraic and numerical libraries
    • Algebird 2.10 / Algebraic and numerical libraries
    • Breeze 0.8 / Algebraic and numerical libraries
  • alternative preprocessing techniques
    • autoregressive models / Alternative preprocessing techniques
    • curve-fitting algorithms / Alternative preprocessing techniques
    • nonlinear dynamic systems / Alternative preprocessing techniques
    • Hidden Markov models / Alternative preprocessing techniques
  • annual dividend yield
    • about / Fundamental analysis
  • Apache Commons Math
    • URL / Don't reinvent the wheel!
    • about / Apache Commons Math
    • description / Description
    • licensing / Licensing
    • installation / Installation
    • installation, for Mac OS X / Installation
    • installation, for Windows / Installation
  • Apache Spark
    • about / Apache Spark
    • features / Why Spark?
    • design principles / Design principles
    • deployment modes / Deploying Spark
    • performance evaluation / Performance evaluation
    • pros / Pros and cons
    • cons / Pros and cons
  • Apache Spark (Akka)
    • about / Scalability
  • artificial neural networks
    • feed-forward neural networks / Feed-forward neural networks
    • advantages / Benefits and limitations
    • disadvantages / Benefits and limitations
  • autonomous systems / The problem
  • Autoregressive Integrated Moving Average (ARIMA) / Alternative preprocessing techniques
  • Autoregressive Moving Average (ARMA) / Alternative preprocessing techniques

B

  • batch gradient descent algorithm / Selecting an optimizer
  • batch training / Online training versus batch training
  • Baum-Welch estimator
    • about / The Baum-Welch estimator (EM)
  • Bayesian network
    • about / Probabilistic graphical models
  • Berkeley Data Analytics Stack (BDAS)
    • reference / Apache Spark
  • Bernoulli mixture model
    • about / Model
  • Bernoulli model
    • about / The Multivariate Bernoulli classification
  • bias-variance decomposition
    • about / Bias-variance decomposition
  • bias input / Mathematical background
  • binary SVC
    • about / The binary SVC
    • LIBSVM / LIBSVM
    • design / Design
    • configuration parameters / Configuration parameters
    • interface to LIBSVM / Interface to LIBSVM
    • training / Training
    • classification / Classification
    • c-penalty and margin / C-penalty and margin
    • kernel evaluation / Kernel evaluation
    • applications in risk analysis / Applications in risk analysis
  • Breeze Scala libraries / Abstraction
  • Broyden-Fletcher-Goldfarb-Shanno (BFGS) / BFGS

C

  • C-Epsilon SVM formulation / The nonseparable case – the soft margin
  • cake pattern / Step 3 – instantiation
    • about / Configurability
  • case classes
    • versus companion objects / Companion objects versus case classes
    • versus enumerations / Enumerations versus case classes
    • advantages / Enumerations versus case classes
  • cash per share
    • about / Fundamental analysis
  • categories, NP problems
    • about / NP problems
    • P-problems / NP problems
    • NP problems / NP problems
    • NP-complete problems / NP problems
    • NP-hard problems / NP problems
  • centroid / K-means clustering
  • Cholesky decomposition
    • about / Cholesky factorization
  • Cholesky factorization
    • about / Cholesky factorization
  • chromosomes / Evolutionary computing
  • class constructor template
    • about / Class constructor template
  • classification model, evaluation factors
    • accuracy / Key quality metrics
    • precision / Key quality metrics
    • recall / Key quality metrics
    • F-measure or F-score / Key quality metrics
    • G-measure / Key quality metrics
  • classification model, terminology
    • true positives (TP) / Key quality metrics
    • true negatives (TN) / Key quality metrics
    • false positives (FP) / Key quality metrics
    • false negatives (FN) / Key quality metrics
  • class prior
    • about / Formalism
  • class prior probability
    • about / Formalism
  • cluster assignment, K-means clustering
    • about / Step 2 – cluster assignment
  • cluster configuration, K-means clustering
    • about / Step 1 – cluster configuration
    • clusters, defining / Defining clusters
    • clusters, initializing / Initializing clusters
  • clustering
    • about / Clustering
    • expectation-maximization algorithm / The expectation-maximization algorithm
  • clustering algorithms
    • K-means clustering / Clustering, K-means clustering
    • EM / Clustering
  • co-vector
    • about / Higher-kind projection
  • code snippets
    • format / Code snippets format
  • common discriminative kernels
    • about / Common discriminative kernels
  • companion objects
    • versus case classes / Companion objects versus case classes
  • complex adaptive systems / Introduction to LCS
  • components, XCS
    • about / XCS components
    • application to portfolio management / Application to portfolio management, The XCS core data
    • XCS rules / XCS rules
    • covering / Covering
    • implementation example / An implementation example
  • computational workflow
    • overview / An overview of computational workflows
  • conditional dependency / Training
  • conditional independence / A model by any other name
    • about / Probabilistic graphical models
  • conditional random field (CRF)
    • about / Conditional random fields, Introduction to CRF
    • linear chain CRF / Linear chain CRF
    • potential functions / Linear chain CRF
    • identity potential functions / Linear chain CRF
    • transition feature functions / Linear chain CRF
    • state feature functions / Linear chain CRF
    • text analytics / Regularized CRFs and text analytics
    • versus HMM / Comparing CRF and HMM
  • configurability
    • about / Configurability
  • configuration parameters, SVM
    • SVM formulation / The SVM formulation
    • SVM kernel function / The SVM kernel function
    • SVM execution / The SVM execution
  • confusion matrix / F-score for multinomial classification
  • conjugate directions
    • about / Conjugate gradient
  • conjugate gradient
    • about / Conjugate gradient
  • connectionism
    • about / The biological background
  • constructive tuning strategy / Regularization
  • Consumer Price Index (CPI)
    • about / Fundamental analysis
  • consumer price index (CPI)
    • about / Introducing the multinomial Naïve Bayes
  • continuation-passing style (CPS) / Beyond actors – reactive programming
  • control learning / A solution – Q-learning
  • convolution neural networks
    • about / Convolution neural networks
    • local receptive fields / Local receptive fields
    • weights, sharing / Sharing of weights
    • convolution layers / Convolution layers
    • subsampling layers / Subsampling layers
    • fully connected hidden layer and output layer / Putting it all together
  • core parking
    • about / Performance evaluation
  • Counter class
    • about / Counter
  • covariant functor
    • about / Covariant functors for vectors
  • cross-validation, model
    • about / Cross-validation
    • one-fold cross validation / One-fold cross validation
    • K-fold cross validation / K-fold cross validation
  • crossover operator, genetic algorithm implementation
    • about / Crossover
    • population / Population
    • chromosomes / Chromosomes
    • genes / Genes
  • curve fitting
    • about / Supervised learning

D

  • Darwinian process / The origin
  • data, profiling
    • about / Profiling data
    • immutable statistics / Immutable statistics
    • Z-score / Z-Score and Gauss
  • data chunks / 0xdata Sparkling Water
  • data clustering
    • about / Clustering
  • data elements / 0xdata Sparkling Water
  • data extraction
    • about / Data extraction
  • data frames / 0xdata Sparkling Water
  • data partitioning
    • about / Clustering
  • data segmentation
    • about / Clustering
  • DataSourceConfig class
    • pathName parameter / Data extraction
    • normalize parameter / Data extraction
    • reverseOrder parameter / Data extraction
    • headerLines parameter / Data extraction
  • DBpedia / Basics of information retrieval
  • decision-making agent / Concepts
  • decision boundary / Plotting data
  • decoding, hidden Markov model (HMM)
    • about / Decoding – CF-3
    • Viterbi algorithm / The Viterbi algorithm
  • def
    • about / Understanding the problem
  • dependency injection
    • about / Configurability
  • deployment modes, Spark
    • standalone / Deploying Spark
    • local / Deploying Spark
    • YARN cluster manager / Deploying Spark
    • Apache Mesos resource manager / Deploying Spark
  • descriptive models / Model categorization
  • designing
    • about / Model versus design
  • design principles, Spark
    • about / Design principles
    • in-memory persistency / In-memory persistency
    • laziness / Laziness
    • transforms / Transforms and actions
    • actions / Transforms and actions
    • shared variables / Shared variables
  • design template, for classifiers
    • about / Design template for immutable classifiers
  • destructive tuning strategy / Regularization
  • DFT-based filtering
    • about / DFT-based filtering
  • dimension reduction
    • about / Dimension reduction, Dimension reduction
    • principal components analysis / Principal components analysis
    • non-linear models / Non-linear models
  • directed graphical models
    • about / Probabilistic graphical models
  • discrete Fourier transform (DFT) / PCA
    • about / Discrete Fourier transform
  • discrete Kalman filter
    • about / The discrete Kalman filter
    • recursive algorithm / The discrete Kalman filter, The recursive algorithm
    • optimal estimator / The discrete Kalman filter
    • state space estimation / The state space estimation
    • benefits / Benefits and drawbacks
    • drawbacks / Benefits and drawbacks
    • alternative preprocessing techniques / Alternative preprocessing techniques
  • discretization / Value encoding
  • dividend coverage ratio
    • about / Fundamental analysis
  • DMatrix class
    • about / DMatrix class
  • DNA / Evolutionary computing
  • Domain Specific Languages (DSL)
    • about / Maintainability
  • dynamic programming
    • about / Overview of dynamic programming

E

  • earnings per share (EPS)
    • about / Fundamental analysis
  • Eigenvalue decomposition
    • about / Eigenvalue decomposition
  • encapsulation
    • about / Encapsulation
    • package scope / Encapsulation
    • class or object scope / Encapsulation
  • encoding scheme, genetic encoding
    • about / The encoding scheme
    • flat encoding / Flat encoding
    • hierarchical encoding / Hierarchical encoding
  • enumerations
    • versus case classes / Enumerations versus case classes
    • advantages / Enumerations versus case classes
  • epoch / The training epoch
  • Erlang programming language / The Actor model
  • error backpropagation, training epoch
    • about / Step 2 – error backpropagation
    • weights' adjustment / Weights' adjustment
    • error propagation / The error propagation
    • computational model / The computational model
  • error handling, monadic data transformation
    • about / Error handling
    • input value / Error handling
    • output value / Error handling
  • error insensitive zone
    • about / An overview
  • evaluation
    • about / Evaluation
    • execution profile / The execution profile
    • impact of learning rate / Impact of the learning rate
    • impact of momentum factor / The impact of the momentum factor
    • impact of number of hidden layers / The impact of the number of hidden layers
    • test case / Test case
  • evaluation, hidden Markov model (HMM)
    • about / Evaluation – CF-1
    • alpha algorithm / Alpha – the forward pass
    • beta algorithm / Beta – the backward pass
  • evidence
    • about / Formalism
  • evolution
    • about / Evolution
    • origin / The origin
    • NP problems / NP problems
    • evolutionary computing / Evolutionary computing
  • exchange-traded funds (ETFs) / Test case
  • ExecutionContextTaskSupport
    • about / Processing a parallel collection
  • expectation-maximization (EM)
    • about / Training – CF-2
  • expectation-maximization algorithm
    • about / The expectation-maximization algorithm
    • Gaussian mixture models / Gaussian mixture models
    • overview / Overview of EM
    • implementation / Implementation
    • classification / Classification
    • testing / Testing
    • online EM algorithm / The online EM algorithm
  • experimenting, with Spark
    • about / Experimenting with Spark
    • Spark, deploying / Deploying Spark
    • Spark shell, using / Using Spark shell
    • MLlib / MLlib
    • RDD generation / RDD generation
    • K-means, using Spark / K-means using Spark
  • exponential moving average
    • about / The exponential moving average
  • exponential normalization / Softmax
  • extended Kalman filter (EKF) / Benefits and drawbacks
  • Extended Kalman Filters (EKF) / The discrete Kalman filter
  • extended learning classifier systems
    • about / Extended learning classifier systems
    • exploration phase / Extended learning classifier systems
    • exploitation phase / Extended learning classifier systems
    • components / XCS components

F

  • K-fold cross validation / K-fold cross validation
  • F-score for binomial classification
    • about / F-score for binomial classification
  • F-score for multinomial classification
    • about / F-score for multinomial classification
    • macro method / F-score for multinomial classification
    • micro method / F-score for multinomial classification
  • Fast Fourier Transform (FFT)
    • about / Discrete Fourier transform
  • features extraction
    • about / Extracting features
  • features maps / Sharing of weights
  • features selection
    • about / Selecting features
  • Federal Fund rate
    • about / Fundamental analysis
  • Federal fund rate (FDF)
    • about / Introducing the multinomial Naïve Bayes
  • feed-forward neural network (FFNN) / The biological background
  • feed-forward neural networks
    • about / Feed-forward neural networks
    • biological background / The biological background
    • mathematical background / Mathematical background
  • FFNN without a hidden layer / The multilayer perceptron
  • finances 101
    • about / Finances 101
    • fundamental analysis / Fundamental analysis
    • technical analysis / Technical analysis
    • options trading / Options trading
    • financial data sources / Financial data sources
  • first order predicate logic
    • about / First order predicate logic
  • fitness functions, genetic algorithms
    • about / The fitness score
    • fixed fitness function / The fitness score
    • evolutionary fitness function / The fitness score
    • approximate fitness function / The fitness score
  • fixed lag smoothing / Fixed lag smoothing
  • fork-join pool
    • about / Processing a parallel collection
  • ForkJoinTaskSupport
    • about / Processing a parallel collection
  • Fourier analysis
    • about / Fourier analysis
    • discrete Fourier transform (DFT) / Discrete Fourier transform
    • DFT-based filtering / DFT-based filtering
    • market cycles, detecting / Detection of market cycles
  • Fourier transform
    • about / Fourier analysis
  • frameworks
    • about / Tools and frameworks
  • frequency domain
    • about / Discrete Fourier transform
  • fully connected neural network / The network topology
  • function approximation / Quantization
    • about / Supervised learning
  • functors
    • about / Abstraction
  • fundamental analysis
    • about / Fundamental analysis
  • futures, Akka framework
    • about / Futures
    • Actor life cycle / The Actor life cycle
    • blocking on / Blocking on futures
    • future callbacks, handling / Handling future callbacks

G

  • Gauss-Newton technique
    • about / Gauss-Newton
  • generalized autoregressive conditional heteroscedasticity (GARCH) / Alternative preprocessing techniques
  • generic Lp-norm
    • about / Ln roughness penalty
  • genes / Evolutionary computing
  • genetic algorithms
    • about / Genetic algorithms and machine learning
    • discrete model parameters / Genetic algorithms and machine learning
    • reinforcement learning / Genetic algorithms and machine learning
    • neural network architecture / Genetic algorithms and machine learning
    • ensemble learning / Genetic algorithms and machine learning
    • components / Genetic algorithm components
    • fitness score / The fitness score
    • implementation / Implementation
    • tests / Tests
    • advantages / Advantages and risks of genetic algorithms
    • disadvantages / Advantages and risks of genetic algorithms
  • genetic algorithms, for trading strategies
    • about / GA for trading strategies
    • trading strategies, defining / Definition of trading strategies
    • test case / A test case
  • genetic encoding
    • about / Genetic algorithm components, Encoding
    • value encoding / Value encoding
    • predicate encoding / Predicate encoding
    • solution encoding / Solution encoding
    • encoding scheme / The encoding scheme
  • genetic fitness functions
    • about / Genetic algorithm components
  • genetic operators
    • about / Genetic algorithm components, Genetic operators
    • selection / Genetic operators, Selection
    • crossover / Genetic operators, Crossover
    • mutation / Genetic operators, Mutation
    • transposition operator / Genetic operators
  • GNU Lesser General Public License (LGPL) / Licensing
  • GoogleFinancials / Data sources
  • gradient descent / Ordinary least squares regression
  • gradient descent methods
    • about / Steepest descent
    • steepest descent / Steepest descent
    • conjugate gradient / Conjugate gradient
    • stochastic gradient descent / Stochastic gradient descent
  • graph-structured CRF / Introduction to CRF
  • graphical models / Probabilistic graphical models
  • gross domestic product (GDP)
    • about / Introducing the multinomial Naïve Bayes
  • Gross Domestic Product (GDP)
    • about / Fundamental analysis

H

  • Hadoop Distributed File System (HDFS) / Step 2 – loading data
  • Hadoop distributed file system (HDFS) / Apache Spark
  • hard margin / The separable case – the hard margin
  • Hessian matrix
    • about / Jacobian and Hessian matrices
  • hidden layers / The multilayer perceptron
  • hidden Markov model (HMM)
    • about / The hidden Markov model
    • components / The hidden Markov model
    • canonical forms / The hidden Markov model
    • notations / Notations
    • lambda model / The lambda model
    • design / Design
    • evaluation / Evaluation – CF-1
    • training / Training – CF-2
    • decoding / Decoding – CF-3
    • canonical forms, implementing / Putting it all together
    • training, test case 1 / Test case 1 – training
    • evaluation, test case 2 / Test case 2 – evaluation
    • as filtering technique / HMM as a filtering technique
    • performance consideration / Performance consideration
  • Hidden Naïve Bayes (HNB) / Training
  • hinge loss / The nonseparable case – the soft margin
  • HMM constructor
    • config / Putting it all together
    • xt / Putting it all together
    • form / Putting it all together
    • quantize / Putting it all together
    • f / Putting it all together
  • hyperplane / Binomial classification

I

  • implementation, genetic algorithms
    • about / Implementation
    • software design / Software design
    • key components / Key components
    • selection operator / Selection
    • population growth, controlling / Controlling the population growth
    • GA configuration / The GA configuration
    • crossover operator / Crossover
    • mutation operator / Mutation
    • reproduction / Reproduction
    • solver / Solver
  • implementation, Q-learning
    • about / Implementation
    • software design / Software design
    • states and actions / The states and actions
    • search space / The search space, The policy and action-value
    • Q-learning components / The Q-learning components
    • Q-learning training / The Q-learning training
    • tail recursion to the rescue / Tail recursion to the rescue
    • validation / The validation
    • prediction / The prediction
  • information retrieval and text mining
    • about / Basics of information retrieval
  • input forward propagation, training epoch
    • about / Step 1 – input forward propagation
    • computational flow / The computational flow
    • error functions / Error functions
    • operating modes / Operating modes
    • softmax / Softmax
  • insensitive error
    • about / An overview

J

  • Jacobian matrix
    • about / Jacobian and Hessian matrices
  • Java
    • about / Java
  • JBlas/Linpack
    • URL / Don't reinvent the wheel!
  • JFreeChart
    • about / JFreeChart
    • description / Description
    • licensing / Licensing
    • installation / Installation
    • installation, for Mac OS X / Installation
    • installation, for Windows / Installation
  • JFreeChart library
    • about / Bias-variance decomposition

K

  • K-fold cross-validation scheme / Assessing a model
  • K-means clustering
    • about / K-means clustering
    • similarity, measuring / Measuring similarity
    • algorithm, defining / Defining the algorithm
    • cluster configuration / Step 1 – cluster configuration
    • cluster assignment / Step 2 – cluster assignment
    • reconstruction/error minimization / Step 3 – reconstruction/error minimization
    • classification / Step 4 – classification
    • curse of dimensionality / The curse of dimensionality
    • evaluation, setting up / Setting up the evaluation
    • results, evaluating / Evaluating the results
    • number of clusters, tuning / Tuning the number of clusters
    • validation / Validation
  • Kalman smoothing
    • about / Kalman smoothing
  • kernel functions
    • about / Kernel functions, An overview
    • common discriminative kernels / Common discriminative kernels
    • linear kernel (dot product) / Common discriminative kernels
    • polynomial kernel / Common discriminative kernels
    • radial basis function (RBF) / Common discriminative kernels
    • sigmoid kernel / Common discriminative kernels
    • Laplacian kernel / Common discriminative kernels
    • log kernel / Common discriminative kernels
    • kernel monadic composition / Kernel monadic composition
  • kernel trick
    • about / The kernel trick
  • key components, genetic algorithm implementation
    • population / Population
    • chromosomes / Chromosomes
    • genes / Genes
  • key quality metrics
    • about / Key quality metrics

L

  • L1 regularization / Ln roughness penalty
  • L2 regularization / Ln roughness penalty
  • Lagrange multipliers
    • about / Lagrange multipliers
  • Laplace / The zero-frequency problem
  • lasso regularization
    • about / Ln roughness penalty
  • Latent Dirichlet allocation (LDA)
    • about / Probabilistic graphical models
  • lazy methods
    • about / Computation on demand
  • LDL decomposition / LDL decomposition
  • learning classifier systems (LCS)
    • about / Learning classifier systems, Introduction to LCS
    • components / Introduction to LCS
    • features / Why LCS?
    • terminology / Terminology
    • benefits / Benefits and limitations of learning classifier systems
    • limitations / Benefits and limitations of learning classifier systems
  • learning vector quantization / Clustering
  • least squares problem / Numerical optimization
  • lemmatization / Basics of information retrieval
  • Levenberg-Marquardt
    • about / Levenberg-Marquardt
  • Levenshtein distance / Basics of information retrieval
  • libraries
    • about / Other libraries and frameworks
  • libraries directory
    • about / List of libraries and tools
  • LIBSVM
    • about / LIBSVM
    • URL, for downloading / LIBSVM
    • URL, for documentation / LIBSVM
    • benefits / LIBSVM
  • LIBSVM, Java classes
    • svm_model / LIBSVM
    • svm_node / LIBSVM
    • svm_parameters / LIBSVM
    • svm_problem / LIBSVM
    • svm / LIBSVM
  • Lidstone / The zero-frequency problem
  • likelihood
    • about / Formalism
  • Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) / L-BFGS
  • linear algebra
    • about / Linear algebra
    • QR decomposition / QR decomposition
    • LU factorization / LU factorization
    • LDL decomposition / LDL decomposition
    • Cholesky factorization / Cholesky factorization
    • singular value decomposition (SVD) / Singular Value Decomposition
    • Eigenvalue decomposition / Eigenvalue decomposition
    • algebraic libraries / Algebraic and numerical libraries
    • numerical libraries / Algebraic and numerical libraries
  • linear chain CRF / Introduction to CRF
  • linear chain structured graph CRF / Introduction to CRF
  • linear regression
    • about / Linear regression
    • one-variate linear regression / One-variate linear regression
    • ordinary least squares regression / Ordinary least squares regression
    • versus SVR / SVR versus linear regression
  • linear SVM
    • about / The linear SVM
    • separable case (hard margin) / The separable case – the hard margin
    • nonseparable case (soft margin) / The nonseparable case – the soft margin
  • LogBinRegression constructor
    • obsSet / Step 5 – implementing the classifier
    • expected / Step 5 – implementing the classifier
    • maxIters / Step 5 – implementing the classifier
    • eta / Step 5 – implementing the classifier
    • eps / Step 5 – implementing the classifier
  • logistic regression
    • about / Logistic regression
    • logistic function / Logistic function
    • binomial classification / Binomial classification
    • design / Design
    • training workflow / The training workflow
    • classification / Classification
  • low-band filter
    • about / The exponential moving average
  • LU factorization
    • about / LU factorization
    • basic LU factorization / LU factorization
    • with pivot / LU factorization

M

  • machine learning
    • features / Why machine learning?
  • machine learning algorithms
    • taxonomy / Taxonomy of machine learning algorithms
  • machine learning problems
    • classification / Classification
    • prediction / Prediction
    • optimization / Optimization
    • regression / Regression
  • maintainability
    • about / Maintainability
  • Markov decision processes
    • about / Markov decision processes
    • Markov property / Markov decision processes, The Markov property
    • first order discrete Markov chain / The first order discrete Markov chain
  • master-workers, Akka
    • about / Master-workers
    • exchange of messages / Exchange of messages
    • worker actors / Worker actors
    • workflow controller / The workflow controller
    • master actor / The master actor
    • master with routing / Master with routing
    • discrete Fourier transform (DFT) / Distributed discrete Fourier transform
    • limitations / Limitations
  • mathematical abstractions
    • about / Supporting mathematical abstractions
    • variable declaration / Step 1 – variable declaration
    • model definition / Step 2 – model definition
    • instantiation / Step 3 – instantiation
  • mathematical concepts
    • about / Mathematics
    • linear algebra / Linear algebra
    • first order predicate logic / First order predicate logic
    • Jacobian matrix / Jacobian and Hessian matrices
    • Hessian matrix / Jacobian and Hessian matrices
    • optimization techniques / Summary of optimization techniques
    • dynamic programming / Overview of dynamic programming
  • mathematical notation / Mathematical notation for the curious
  • maximum margin classifiers
    • kernel trick / Max-margin classification
  • mean squared error (MSE) / One-variate linear regression
  • measurement noise covariance / The measurement equation
  • message-passing mechanisms
    • fire-and-forget or tell / The Actor model
    • send-and-receive or ask / The Actor model
  • metaphor for graphical models / Probabilistic graphical models
  • methodology
    • defining / Defining a methodology
  • Michigan approach / Why LCS?
  • mixins
    • about / Composing mixins to build a workflow
  • mixins, composing for building workflow
    • about / Composing mixins to build a workflow
    • problem, understanding / Understanding the problem
    • modules, defining / Defining modules
    • workflow, instantiating / Instantiating the workflow
  • model
    • about / A model by any other name
    • features / A model by any other name
    • attributes / A model by any other name
    • variables / A model by any other name
    • parametric / A model by any other name
    • differential / A model by any other name
    • probabilistic / A model by any other name
    • graphical / A model by any other name
    • directed graphs / A model by any other name
    • numerical method / A model by any other name
    • chemistry / A model by any other name
    • taxonomy / A model by any other name
    • grammar and lexicon / A model by any other name
    • inference logic / A model by any other name
    • versus design / Model versus design
    • features, selecting / Selecting features
    • features, extracting / Extracting features
  • model, assessing
    • about / Assessing a model
    • validation / Validation
    • cross-validation / Cross-validation
    • bias-variance decomposition / Bias-variance decomposition
    • overfitting / Overfitting
  • model categorization
    • about / Model categorization
    • predictive models / Model categorization
    • descriptive models / Model categorization
    • adaptive modeling / Model categorization
  • modeling
    • about / Modeling, Model versus design
  • monadic composition
    • about / Monads
  • monadic data transformation
    • about / Monadic data transformation
    • explicit model / Monadic data transformation, Explicit models
    • implicit model / Monadic data transformation, Implicit models
    • error handling / Error handling
  • monads
    • about / Abstraction, Monads
  • Monitor class
    • about / Monitor
  • morphism / Error handling
  • moving averages
    • about / Moving averages
    • simple moving average / The simple moving average
    • weighted moving average / The weighted moving average
    • exponential moving average / The exponential moving average
  • multilayer perceptron
    • about / The multilayer perceptron
    • activation function / The activation function
    • network topology / The network topology
    • design / Design
    • UML class diagram / Design
    • configuration / Configuration
    • network components / Network components
    • model / The model
    • problem types (modes) / Problem types (modes)
    • online training, versus batch training / Online training versus batch training
    • training epoch / The training epoch
    • training and classification / Training and classification
  • multinomial Naïve Bayes model
    • about / Introducing the multinomial Naïve Bayes
    • formalism / Formalism
    • frequentist perspective / The frequentist perspective
    • predictive model / The predictive model
    • zero-frequency problem / The zero-frequency problem
  • Multivariate Bernoulli classification
    • about / The Multivariate Bernoulli classification
    • model / Model
    • implementation / Implementation
  • mutation operator, genetic algorithm implementation
    • about / Mutation
    • population / Population
    • chromosomes / Chromosomes
    • genes / Genes

N

  • n-grams / Basics of information retrieval
  • natural language processing (NLP) / The feature functions model
  • Naïve Bayes
    • applying, to text mining / Naïve Bayes and text mining
  • Naïve Bayes algorithm
    • pros / Pros and cons
    • cons / Pros and cons
  • Naïve Bayes classifiers
    • about / Naïve Bayes classifiers
    • multinomial Naïve Bayes / Introducing the multinomial Naïve Bayes
  • Naïve Bayes classifiers implementation
    • about / Implementation
    • design / Design
    • training / Training
    • classification / Classification
    • F1 validation / F1 validation
    • feature extraction / Feature extraction
    • testing / Testing
  • Naïve Bayes models
    • about / Probabilistic graphical models
    • mathematical notation / Formalism
  • net profit margin
    • about / Fundamental analysis
  • net sales
    • about / Fundamental analysis
  • network components, multilayer perceptron
    • about / Network components
    • network topology / The network topology
    • input and hidden layers / Input and hidden layers
    • output layer / The output layer
    • synapses / Synapses
    • connections / Connections
    • initialization weights / The initialization weights
  • non-linear models, dimension reduction
    • about / Non-linear models
    • kernel PCA / Kernel PCA
    • manifolds / Manifolds
  • nonlinear least squares minimization
    • about / Nonlinear least squares minimization
    • Gauss-Newton / Gauss-Newton
    • Levenberg-Marquardt / Levenberg-Marquardt
  • nonlinear SVM
    • about / The nonlinear SVM
    • max-margin classification / Max-margin classification
    • kernel trick / The kernel trick
  • NP problems
    • categories / NP problems
    • about / NP problems
  • Nu-SVM / The nonseparable case – the soft margin
  • numerical optimization
    • about / Numerical optimization
    • Newton / Numerical optimization
    • Quasi-Newton / Numerical optimization

O

  • observation
    • about / Extracting features
  • one-class SVC
    • used, for anomaly detection / Anomaly detection with one-class SVC
  • one-variate linear regression
    • about / One-variate linear regression
    • implementation / Implementation
    • test case / Test case
  • online training / Online training versus batch training
  • operating income
    • about / Fundamental analysis
  • operating profit margin
    • about / Fundamental analysis
  • optimal substructures
    • about / Overview of dynamic programming
  • optimization techniques
    • about / Summary of optimization techniques
    • gradient descent methods / Steepest descent
    • Quasi-Newton algorithms / Quasi-Newton algorithms
    • nonlinear least squares minimization / Nonlinear least squares minimization
    • Lagrange multipliers / Lagrange multipliers
  • OptionModel class / The OptionModel class
  • OptionProperty class / The OptionProperty class
  • options trading
    • about / Options trading
  • option trading, with Q-learning
    • about / Option trading using Q-learning
    • OptionProperty class / The OptionProperty class, The OptionModel class
    • quantization / Quantization
  • ordinary least squares regression
    • about / Ordinary least squares regression
    • design / Design
    • implementation / Implementation
    • trending, test case 1 / Test case 1 – trending
    • feature selection, test case 2 / Test case 2 – feature selection
  • overfitting
    • about / Overfitting, The frequentist perspective
  • overlapping substructures
    • about / Overview of dynamic programming
  • overload operators
    • about / Overloading
    • += / Overloading
    • + / Overloading

P

  • padding / Value encoding
  • parallel collections, Scala
    • about / Processing a parallel collection
    • benchmark framework / The benchmark framework
    • performance evaluation / Performance evaluation
  • Parallel Colt
    • URL / Don't reinvent the wheel!
  • Partial Least Squares Regression (PLSR) / Evaluation
  • partially connected neural networks / The network topology
  • pay-out ratio
    • about / Fundamental analysis
  • penalized least squares regression / Ln roughness penalty
  • performance considerations
    • about / Performance considerations
    • K-means / K-means
    • EM / EM
    • PCA / PCA
  • performance evaluation, Spark
    • about / Performance evaluation
    • parameters, tuning / Tuning parameters
    • tests / Tests
    • performance considerations / Performance considerations
  • Pittsburgh approach / Why LCS?
  • Pool
    • about / Key components
  • posterior probability
    • about / Formalism
  • Predicted Residual Error Sum of Squares (PRESS) / Evaluation
  • predictive model
    • about / The predictive model
  • predictive models / Model categorization
  • price/book value ratio (PB)
    • about / Fundamental analysis
  • price/earnings ratio (PE)
    • about / Fundamental analysis
  • price/sales ratio (PS)
    • about / Fundamental analysis
  • price patterns
    • about / Price patterns
  • Price to Earnings/Growth (PEG)
    • about / Fundamental analysis
  • primal problem / The nonseparable case – the soft margin
  • principal components analysis, dimension reduction
    • about / Principal components analysis
    • algorithm / Algorithm
    • implementation / Implementation
    • test case / Test case
    • evaluation / Evaluation
  • probabilistic graphical models
    • about / Probabilistic graphical models
  • probabilistic kernels
    • about / Common discriminative kernels
  • probabilistic reasoning
    • about / Probabilistic graphical models
  • propositional logic
    • about / First order predicate logic
  • protein sequence annotation
    • about / An overview

Q

  • Q-learning
    • about / A solution – Q-learning
    • Bellman optimality equations / The Bellman optimality equations
    • temporal difference, for model-free learning / Temporal difference for model-free learning
    • action-value iterative update / Action-value iterative update
    • implementation / Implementation
    • for option trading / Option trading using Q-learning
    • implementing / Putting it all together
    • evaluation / Evaluation
  • QR decomposition / Ordinary least squares regression
  • QStar class / The Viterbi algorithm
  • quantization / Value encoding
  • Quasi-Newton algorithms
    • about / Quasi-Newton algorithms
    • Broyden-Fletcher-Goldfarb-Shanno (BFGS) / BFGS
    • Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) / L-BFGS

R

  • real-world Bayesian network
    • example / Probabilistic graphical models
  • recombination
    • about / Evolutionary computing
  • reconstruction/error minimization, K-means clustering
    • about / Step 3 – reconstruction/error minimization
    • K-means components, creating / Creating K-means components
    • tail recursive implementation / Tail recursive implementation
    • iterative implementation / Iterative implementation
  • recursive algorithm, discrete Kalman filter
    • about / The recursive algorithm
    • prediction phase / Prediction
    • correction / Correction
    • Kalman smoothing / Kalman smoothing
    • fixed lag smoothing / Fixed lag smoothing
    • experimentation / Experimentation
  • regression model / Design
  • regression weights
    • about / One-variate linear regression
  • regularization
    • about / Regularization, Ln roughness penalty
    • Ln roughness penalty / Ln roughness penalty
    • ridge regression / Ridge regression
  • reinforcement learning
    • about / Model categorization, Reinforcement learning
    • problem / The problem
    • Q-learning / A solution – Q-learning
    • terminologies / Terminology
    • value of a policy / Value of a policy
    • pros / Pros and cons of reinforcement learning
    • cons / Pros and cons of reinforcement learning
  • reinforcement learning agent
    • overview architecture / Concepts
  • reproducing kernel Hilbert spaces
    • about / Common discriminative kernels
  • residuals mean square (RMS) / Step 5 – minimizing the sum of square errors
  • resilient distributed dataset (RDD) / Apache Spark
    • transformation / Apache Spark
    • action / Apache Spark
  • Resilient Distributed Datasets (RDD)
    • about / Computation on demand
  • ridge regression
    • about / Ln roughness penalty, Ridge regression
    • design / Design
    • implementation / Implementation
    • test case / Test case
  • Riemann metric
    • about / Kernel monadic composition

S

  • Scala
    • about / Why Scala?, Scala, Scala
    • features / Why Scala?
    • abstraction / Abstraction
    • scalability / Scalability
    • configurability / Configurability
    • maintainability / Maintainability
    • computation / Computation on demand
    • time series / Time series in Scala
    • object creation / Object creation
    • streams / Streams
    • parallel collections / Parallel collections
  • scalability
    • about / Scalability
  • scalability, with Actors
    • about / Scalability with Actors
    • Actor model / The Actor model
    • partitioning / Partitioning
    • reactive programming / Beyond actors – reactive programming
  • Scalable frameworks
    • about / An overview
  • Scala plugin for Eclipse
    • reference / Scala
  • Scala plugin for IntelliJ IDEA
    • reference / Scala
  • Scala programming
    • about / Scala programming
    • libraries directory / List of libraries and tools
    • code snippets format / Code snippets format
    • encapsulation / Encapsulation
    • class constructor template / Class constructor template
    • companion objects, versus case classes / Companion objects versus case classes
    • enumerations, versus case classes / Enumerations versus case classes
    • overload operators / Overloading
    • design template, for classifiers / Design template for immutable classifiers
    • data extraction / Data extraction
    • financial data sources / Data sources
    • document extraction / Extraction of documents
    • DMatrix class / DMatrix class
    • Counter class / Counter
    • Monitor class / Monitor
  • Scalaz
    • about / Abstraction
  • semi-supervised learning
    • about / Semi-supervised learning
  • Sequential Minimal Optimization (SMO) / The nonseparable case – the soft margin
    • about / LIBSVM
  • short interest
    • about / Fundamental analysis
  • short interest ratio
    • about / Fundamental analysis
  • shrinkage
    • about / Ln roughness penalty
  • Simple Build Tool (SBT)
    • about / Scala
  • simple build tool (sbt) / Deploying Spark
  • simple moving average
    • about / The simple moving average
  • simple workflow
    • writing / Writing a simple workflow
    • problem, scoping / Step 1 – scoping the problem
    • data loading / Step 2 – loading data
    • data, preprocessing / Step 3 – preprocessing the data
    • immutable normalization / Immutable normalization
    • patterns, discovering / Step 4 – discovering patterns
    • data, analyzing / Analyzing data
    • data, plotting / Plotting data
    • classifier, implementing / Step 5 – implementing the classifier
    • optimizer, selecting / Selecting an optimizer
    • model, training / Training the model
    • observations, classifying / Classifying observations
    • model, evaluating / Step 6 – evaluating the model
  • singular value decomposition / Ordinary least squares regression
  • singular value decomposition (SVD) / PCA
    • about / Singular Value Decomposition
  • smoothing factor for counters
    • about / The zero-frequency problem
  • smoothing kernels
    • about / Common discriminative kernels
  • soft margin / The nonseparable case – the soft margin
  • source code
    • about / Source code
    • context, versus view bounds / Context versus view bounds
    • presentation / Presentation
    • primitive types / Primitive types
    • type conversions / Type conversions
    • implicit conversion / Type conversions
    • immutability / Immutability
    • Scala iterators, performance / Performance of Scala iterators
  • Spark ecosystem
    • about / Apache Spark
  • Sparkling Water
    • about / 0xdata Sparkling Water
  • spectral density estimation
    • purpose / Fourier analysis
  • stackable trait injection / Composing mixins to build a workflow
  • state space estimation, discrete Kalman filter
    • about / The state space estimation
    • transition equation / The transition equation
    • measurement equation / The measurement equation
  • steepest descent
    • about / Steepest descent
  • stemming / Basics of information retrieval
  • stimuli / The biological background
  • stochastic gradient descent / Ordinary least squares regression
    • about / Stochastic gradient descent
  • substructures
    • about / Overview of dynamic programming
  • sum of squared errors (SSE) / One-variate linear regression
  • supervised learning
    • about / Supervised learning
  • supervised machine learning algorithms
    • about / Supervised learning
    • generative models / Generative models
    • discriminative models / Discriminative models
  • support vector machines (SVMs)
    • about / Support vector machines
    • linear SVM / The linear SVM
    • nonlinear SVM / The nonlinear SVM
  • SVC
    • about / Support vector classifiers – SVC
    • binary SVC / The binary SVC
    • one-class SVC / Anomaly detection with one-class SVC
  • SVM
    • components / Design
    • configuration parameters / Configuration parameters
    • performance considerations / Performance considerations
  • SVM dual problem
    • kernel trick / Max-margin classification
  • SVMLight
    • about / LIBSVM
  • SVR
    • about / Support vector regression
    • overview / An overview
    • versus linear regression / SVR versus linear regression

T

  • tagging model / Basics of information retrieval
  • TaskSupport
    • about / Processing a parallel collection
  • taxonomy, machine learning algorithms
    • about / Taxonomy of machine learning algorithms
    • unsupervised learning / Unsupervised learning
    • supervised learning / Supervised learning
    • semi-supervised learning / Semi-supervised learning
    • reinforcement learning / Reinforcement learning
  • technical analysis
    • about / Technical analysis
    • trading data / Trading data
    • trading signal and strategy / Trading signals and strategy
    • price patterns / Price patterns
  • technical analysis, terminology
    • bearish or bearish position / Terminology
    • bullish or bullish position / Terminology
    • long position / Terminology
    • neutral position / Terminology
    • oscillator / Terminology
    • overbought / Terminology
    • oversold / Terminology
    • relative strength index (RSI) / Terminology
    • resistance / Terminology
    • short position / Terminology
    • support / Terminology
    • technical indicator / Terminology
    • trading range / Terminology
    • trading signal / Terminology
    • volatility / Terminology
  • temporal difference
    • about / Temporal difference for model-free learning
  • terminology, LCS
    • environment / Terminology
    • agent / Terminology
    • predicate / Terminology
    • compound predicate / Terminology
    • action / Terminology
    • rule / Terminology
    • classifier / Terminology
    • rule fitness or score / Terminology
    • sensors / Terminology
    • input data stream / Terminology
    • rule matching / Terminology
    • covering / Terminology
    • predictor / Terminology
  • terminology, reinforcement learning
    • environment / Terminology
    • agent / Terminology
    • state / Terminology
    • goal / Terminology
    • absorbing state / Terminology
    • terminal state / Terminology
    • action / Terminology
    • policy / Terminology
    • best policy / Terminology
    • reward / Terminology
    • episode / Terminology
    • horizon / Terminology
  • test case, evaluation
    • about / Test case
    • implementation / Implementation
    • evaluation of models / Evaluation of models
    • impact of the hidden layers' architecture / Impact of the hidden layers' architecture
  • test case, trading strategy
    • about / A test case
    • trading strategies, creating / Creating trading strategies
    • optimizer, configuring / Configuring the optimizer
    • best trading strategy, finding / Finding the best trading strategy
  • testing, Naïve Bayes
    • about / Testing
    • textual information, retrieving / Retrieving the textual information
    • text mining classifier, evaluating / Evaluating the text mining classifier
  • tests, genetic algorithms
    • about / Tests
    • weighted score / The weighted score
    • unweighted score / The unweighted score
  • text analytics, conditional random field (CRF)
    • about / Regularized CRFs and text analytics
    • feature functions model / The feature functions model
    • design / Design
    • implementation / Implementation
    • CRF classifier, configuring / Configuring the CRF classifier
    • CRF model, training / Training the CRF model
    • CRF model, applying / Applying the CRF model
    • tests / Tests
    • training convergence profile / The training convergence profile
    • impact, of size of training set / Impact of the size of the training set
    • impact, of L2 regularization factor / Impact of the L2 regularization factor
  • text mining
    • about / Naïve Bayes and text mining
    • Naïve Bayes, applying to / Naïve Bayes and text mining
  • text mining methodology
    • implementing / Implementation
    • documents, analyzing / Analyzing documents
    • frequency of relative terms, extracting / Extracting the frequency of relative terms
    • features, generating / Generating the features
  • ThreadPoolTaskSupport
    • about / Processing a parallel collection
  • time series, in Scala
    • about / Time series in Scala
    • types and operations / Types and operations
    • magnet pattern / The magnet pattern
    • transpose operator / The transpose operator
    • differential operator / The differential operator
    • lazy views / Lazy views
  • tools
    • about / Tools and frameworks
  • trading signal / Trading signals and strategy
  • trading strategies
    • about / Definition of trading strategies
    • trading operators / Trading operators
    • cost function / The cost function
    • trading signals / Trading signals
    • trading strategies / Trading strategies
    • trading signal encoding / Trading signal encoding
  • training, hidden Markov model (HMM)
    • about / Training – CF-2
    • Baum-Welch estimator / The Baum-Welch estimator (EM)
  • training, Naïve Bayes classifiers implementation
    • about / Training
    • class likelihood / Class likelihood
    • binomial model / Binomial model
    • multinomial model / The multinomial model
    • classifier components / Classifier components
  • training and classification, multilayer perceptron
    • about / Training and classification
    • regularization / Regularization
    • model generation / The model generation
    • Fast Fisher-Yates shuffle / The Fast Fisher-Yates shuffle
    • prediction / Prediction
    • model fitness / Model fitness
  • training epoch, multilayer perceptron
    • about / The training epoch
    • input forward propagation / Step 1 – input forward propagation
    • error backpropagation / Step 2 – error backpropagation
    • exit condition / Step 3 – exit condition
    • implementing / Putting it all together
  • training workflow, logistic regression
    • about / The training workflow
    • optimizer, configuring / Step 1 – configuring the optimizer
    • Jacobian matrix, computing / Step 2 – computing the Jacobian matrix
    • convergence of optimizer, managing / Step 3 – managing the convergence of the optimizer
    • least squares problem, defining / Step 4 – defining the least squares problem
    • sum of square errors, minimizing / Step 5 – minimizing the sum of square errors
    • binomial multivariate logistic regression, testing / Test
  • trending / Test case 1 – trending
  • two-step lag smoothing algorithm / Experimentation
  • Typesafe Activator
    • URL / Akka

U

  • unsupervised learning
    • about / Unsupervised learning
    • data clustering / Clustering
    • dimension reduction / Dimension reduction

V

  • validation, model
    • about / Validation
    • key quality metrics / Key quality metrics
    • F-score for binomial classification / F-score for binomial classification
    • F-score for multinomial classification / F-score for multinomial classification
  • variance-bias trade-off
    • about / Bias-variance decomposition
  • vector quantization
    • about / Clustering
  • view bounds / Context versus view bounds
  • Viterbi algorithm
    • about / The Viterbi algorithm
    • psi / The Viterbi algorithm
    • qStar / The Viterbi algorithm
    • delta / The Viterbi algorithm
  • ViterbiPath class / Putting it all together
  • ViterbiPath object / Putting it all together

W

  • weighted moving average
    • about / The weighted moving average
  • WordNet / Basics of information retrieval
  • workflow computational model
    • about / A workflow computational model
    • mathematical abstractions, supporting / Supporting mathematical abstractions
    • mixins, combining to build workflow / Composing mixins to build a workflow
    • modularization / Modularization

X

  • 0xdata H2O / 0xdata Sparkling Water
  • 0xdata Sparkling Water
    • about / 0xdata Sparkling Water

Y

  • 1-year Treasury bill (1yTB)
    • about / Introducing the multinomial Naïve Bayes
  • Yahoo Finance / Step 1 – scoping the problem
  • YahooFinancials / Data sources

Z

  • zero-frequency problem
    • about / The zero-frequency problem