Scala for Data Science

Leverage the power of Scala with different tools to build scalable, robust data science applications

Product type: Paperback
Published: Jan 2016
ISBN-13: 9781785281372
Length: 416 pages
Edition: 1st Edition
Author: Bugnion
Table of Contents (22 chapters)

Scala for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Scala and Data Science
2. Manipulating Data with Breeze
3. Plotting with breeze-viz
4. Parallel Collections and Futures
5. Scala and SQL through JDBC
6. Slick – A Functional Interface for SQL
7. Web APIs
8. Scala and MongoDB
9. Concurrency with Akka
10. Distributed Batch Processing with Spark
11. Spark SQL and DataFrames
12. Distributed Machine Learning with MLlib
13. Web APIs with Play
14. Visualization with D3 and the Play Framework
Pattern Matching and Extractors
Index

Index

A

  • actions / Actions
    • asynchronous actions / Asynchronous actions
  • actors
    • as people / Actors as people
    • constructing / Actor construction, Anatomy of an actor, Follower network crawler
    • fetcher / Fetcher actors
  • aggregate functions
    • URL / Aggregation operations
  • aggregation operations
    • about / Aggregation operations
  • aggregations
    • with Group by / Aggregations with "Group by"
  • Akka documentation / What we have not talked about
  • Akka library / Futures example – stock price fetcher
  • Amazon Web Services (AWS)
    • URL / Running Spark applications on EC2
  • Apache Parquet
    • about / Parquet files
  • APIs
    • creating, with Play / Creating APIs with Play: a summary
  • application
    • building / Building an application
  • applications
    • bootstrapping / Bootstrapping the applications
  • Arrays / A whirlwind tour of JSON
  • arrays
    • about / Complex data types – arrays, maps, and structs, Arrays
  • authentication
    • HTTP headers, adding / Authentication – adding HTTP headers

B

  • backend
    • need for / Do I need a backend?
  • BinaryClassificationMetrics instance
    • URL / Evaluation
  • BLAS library / Basic Breeze data types
  • Body Mass Index (BMI) / DataFrames – a whirlwind introduction
  • BooleanColumnExtensionMethods class
    • URL / Operations on columns
  • Bootstrap layouts
    • URL / Towards a web application: HTML templates
  • Breeze
    • code, examples / Code examples
    • installing / Installing Breeze
    • help, getting / Getting help on Breeze
    • Wiki page, on GitHub / Getting help on Breeze
    • data types / Basic Breeze data types
    • alternatives / Alternatives to Breeze
    • URL / References
    • API documents, URL / References
    • diving into / Diving into Breeze
  • Breeze-viz
    • about / Managing without documentation
    • URL / Managing without documentation
    • reference / Breeze-viz reference

C

  • Casbah
    • URL / Casbah query DSL, References
    • about / Beyond Casbah
  • Casbah query DSL
    • about / Casbah query DSL
  • case classes
    • used, for pattern matching / JSON in Scala – an exercise in pattern matching
    • used, for extraction / Extraction using case classes
    • as messages / Case classes as messages
  • client-server applications
    • about / Client-server applications
  • client-side program
    • architecture / Client-side program architecture
    • model, designing / Designing the model
    • event bus / The event bus
    • AJAX calls, through JQuery / AJAX calls through JQuery
    • response views / Response views
  • collision / Transformers
  • complex queries / Complex queries
  • configuration options
    • URL / Reducing logging output and Spark configuration
  • Connection class
    • API documentation, URL / References
  • context bound / Coding against type classes
  • cross-validation
    • and model selection / Cross-validation and model selection
  • custom supervisor strategies / Custom supervisor strategies
  • custom type serialization
    • about / Custom type serialization

D

  • data access layer
    • about / Creating a data access layer
  • database metadata
    • accessing / Accessing database metadata
  • DataFrames
    • about / DataFrames – a whirlwind introduction
    • joining, together / Joining DataFrames together
    • custom functions / Custom functions on DataFrames
    • immutability / DataFrame immutability and persistence
    • persistence / DataFrame immutability and persistence
    • SQL statements / SQL statements on DataFrames
  • data mapper pattern
    • URL / References
  • data science
    • about / Data science
    • programming in / Programming in data science
  • dataset
    • URL / Data preprocessing and feature engineering
  • data shuffling
    • about / Data shuffling and partitions
  • data sources
    • interacting with / Interacting with data sources
    • JSON files / JSON files
    • Parquet files / Parquet files
  • data types
    • about / Complex data types – arrays, maps, and structs
  • data types, Breeze
    • about / Basic Breeze data types
    • vectors / Vectors
    • matrices / Matrices
    • vectors, building / Building vectors and matrices
    • matrices, building / Building vectors and matrices
    • indexing / Advanced indexing and slicing
    • slicing / Advanced indexing and slicing
    • vectors, mutating / Mutating vectors and matrices
    • matrices, mutating / Mutating vectors and matrices
    • matrix multiplication / Matrix multiplication, transposition, and the orientation of vectors
    • matrix transposition / Matrix multiplication, transposition, and the orientation of vectors
    • vectors, orientation / Matrix multiplication, transposition, and the orientation of vectors
    • data preprocessing / Data preprocessing and feature engineering
    • feature engineering / Data preprocessing and feature engineering
    • function optimization / Breeze – function optimization
    • numerical derivatives / Numerical derivatives
    • regularization / Regularization
  • DenseVector or DenseMatrix
    • URL / Vectors
  • directed acyclic graph (DAG) / Lifting the hood
  • documents
    • inserting / Inserting documents
  • drivers
    • URL / Importing Slick
  • dynamic routing
    • about / Dynamic routing

E

  • element-wise operators
    • pitfalls / Vectors
  • estimators
    • about / Estimators
  • evaluation
    • about / Evaluation
  • event bus / The event bus
  • example data
    • acquiring / Acquiring the example data
  • execution contexts
    • parallel execution, controlling with / Controlling parallel execution with execution contexts
  • extraction
    • using case classes / Extraction using case classes

F

  • Federal Election Commission (FEC)
    • about / FEC data
    • URL / FEC data
  • Federal Election Commission (FEC) data
    • about / FEC data
    • URL / FEC data
    • Slick, importing / Importing Slick
    • schema, defining / Defining the schema
    • database, connecting to / Connecting to the database
    • tables, creating / Creating tables
    • inserting / Inserting data
    • querying / Querying data
  • floating point format
    • URL / Defining the schema
  • follower network crawler / Follower network crawler, Fault tolerance
  • function optimization / Breeze – function optimization
  • futures
    • about / Futures
    • URL / Futures, References
    • result, using / Future composition – using a future's result
    • blocking until completion / Blocking until completion
    • parallel execution, controlling with execution contexts / Controlling parallel execution with execution contexts
    • stock price fetchers example / Futures example – stock price fetcher
    • concurrency and exception handling / Concurrency and exception handling with futures

G

  • GitHub
    • follower's graph / GitHub follower graph
    • URL / JavaScript dependencies through web-jars
  • GitHub API
    • URL / References
  • GitHub servers
    • URL / Client-server applications
  • GitHub user data
    • about / GitHub user data
    • URL / GitHub user data
  • Group by
    • aggregations with / Aggregations with "Group by"

H

  • HashingTF / Transformers
  • headers
    • adding, to HTTP requests in Scala / Adding headers to HTTP requests in Scala
  • Hello world
    • with Akka / Hello world with Akka
  • HTML templates
    • about / Towards a web application: HTML templates
  • HTTP
    • about / HTTP – a whirlwind overview
  • HTTP headers
    • adding / Authentication – adding HTTP headers

I

  • indexing / Advanced indexing and slicing
  • invokers
    • about / Invokers

J

  • java.sql.Types package
    • API documentation, URL / JDBC summary
  • JavaScript dependencies
    • through web-jars / JavaScript dependencies through web-jars
  • JDBC
    • about / Interacting with JDBC
    • first steps / First steps with JDBC
    • database server, connecting to / Connecting to a database server
    • tables, creating / Creating tables
    • data, inserting / Inserting data
    • data, reading / Reading data
    • summary / JDBC summary
    • functional wrappers / Functional wrappers for JDBC
    • connections, with loan pattern / Safer JDBC connections with the loan pattern
    • statements enriching, with pimp my library pattern / Enriching JDBC statements with the "pimp my library" pattern
    • result sets in stream, wrapping / Wrapping result sets in a stream
    • API documentation, URL / References
    • versus Slick / Slick versus JDBC
  • JFreeChart documentation
    • URL / Customizing plots
  • JSON
    • about / A whirlwind tour of JSON
    • interacting with / Interacting with JSON
    • external APIs, querying / Querying external APIs and consuming JSON
    • consuming / Querying external APIs and consuming JSON
    • parsing / Parsing JSON
  • JSON4S types / JSON4S types
  • JSON files
    • about / JSON files
  • JSON in Scala
    • about / JSON in Scala – an exercise in pattern matching
    • JSON4S types / JSON4S types
    • fields extracting, XPath used / Extracting fields using XPath

K

  • k-fold cross-validation / Cross-validation and model selection

L

  • L-BFGS method / Breeze – function optimization
  • LAPACK library / Basic Breeze data types
  • lazy computation
    • about / Towards re-usable code
  • LET IT CRASH blog
    • URL / References
  • life-cycle hooks
    • about / Life-cycle hooks
  • line type
    • customizing / Customizing the line type
  • Ling-Spam dataset
    • URL / Reference, Introducing MLlib – Spam classification
  • Ling-Spam email dataset
    • URL / Acquiring the example data, Spam filtering
  • loan pattern / Reading data
    • JDBC connections with / Safer JDBC connections with the loan pattern
  • logistic regression
    • about / An example – logistic regression, Beyond logistic regression
    • regularization / Regularization in logistic regression
  • looser coupling
    • with type classes / Looser coupling with type classes
    • type classes / Type classes
    • coding, against type classes / Coding against type classes
    • type classes, using / When to use type classes
    • type classes, benefits / Benefits of type classes

M

  • Machine Learning course
    • URL / References
  • maps
    • about / Maps
  • matrices
    • about / Matrices
    • building / Building vectors and matrices
    • mutating / Mutating vectors and matrices
  • message
    • passing, between actors / Message passing between actors
  • message sender
    • accessing / Accessing the sender of a message
  • MLlib / Breeze – function optimization
    • spam classification / Introducing MLlib – Spam classification
  • Model-View-Controller (MVC)
    • architecture / Model-View-Controller architecture
  • modular JavaScript
    • through RequireJS / Modular JavaScript through RequireJS
  • MongoDB
    • about / MongoDB
    • manual installation, URL / MongoDB
    • connecting, with Casbah / Connecting to MongoDB with Casbah
    • authentication, connecting with / Connecting with authentication
    • reference documentation, URL / Complex queries
  • MTable instances
    • URL / Accessing database metadata
  • Mutual Information (MI) / Spam filtering

N

  • NumericColumnExtensionMethods class
    • URL / Operations on columns
  • NVD3
    • used, for drawing plots / Drawing plots with NVD3
    • URL / Drawing plots with NVD3

O

  • object-oriented design patterns
    • URL / References
  • objects
    • extracting, from database / Extracting objects from the database
  • Objects / A whirlwind tour of JSON
  • operations
    • on columns / Operations on columns
  • Ordering
    • URL / Transformations and actions on RDDs

P

  • package.scala source file
    • URL / Breeze-viz reference
  • PaintScale.scala source file
    • URL / More advanced scatter plots
  • parallel collections
    • about / Parallel collections
    • limitations / Limitations of parallel collections
    • error handling / Error handling
    • parallelism level, setting / Setting the parallelism level
    • cross-validation with / An example – cross-validation with parallel collections
  • parallel execution
    • controlling, with execution contexts / Controlling parallel execution with execution contexts
  • Parquet files
    • URL / References
  • parsers
    • URL / Understanding and parsing the request
  • pattern matching
    • case classes used / JSON in Scala – an exercise in pattern matching
  • Pattern matching
    • for comprehensions / Pattern matching in for comprehensions
    • internals / Pattern matching internals
    • URL / Reference
  • permanence spectrum / Programming in data science
  • persistence level
    • URL / Persisting RDDs
  • Pimp my Library pattern
    • URL / References
  • pimp my library pattern
    • URL / Enriching JDBC statements with the "pimp my library" pattern
    • used, for enriching JDBC statements / Enriching JDBC statements with the "pimp my library" pattern
  • pipeline
    • about / Pipeline components
    • transformers / Transformers
    • estimators / Estimators
  • pipeline API
    • URL / References
  • Play framework / Futures example – stock price fetcher
    • about / The Play framework
    • URL / Dynamic routing
  • plots
    • customizing / Customizing plots
    • drawing, with NVD3 / Drawing plots with NVD3
  • PreparedStatement API documentation
    • URL / Inserting data
  • PreparedStatement class
    • API documentation, URL / References

Q

  • queue control
    • and pull pattern / Queue control and the pull pattern

R

  • receiver operating characteristic (ROC) curve / Evaluation
  • regularization / Regularization
    • in logistic regression / Regularization in logistic regression
  • request
    • parsing / Understanding and parsing the request
  • RequireJS
    • modular JavaScript through / Modular JavaScript through RequireJS
  • resilient applications
    • building / Futures
  • Resilient distributed datasets (RDD)
    • about / Resilient distributed datasets
    • immutability / RDDs are immutable
    • operations, executing / RDDs are lazy
    • constructing / RDDs know their lineage
    • resiliency / RDDs are resilient
    • distribution / RDDs are distributed
    • transformations / Transformations and actions on RDDs
    • actions / Transformations and actions on RDDs
    • operations, URL / Transformations and actions on RDDs
    • persisting / Persisting RDDs
    • Key-value / Key-value RDDs
    • double / Double RDDs
  • response
    • composing / Composing the response
  • response views / Response views
  • Rest APIs
    • about / Rest APIs: best practice
  • results
    • URL / Composing the response
  • ResultSet interface
    • API documentation, URL / References
  • routing
    • about / Routing

S

  • Scala
    • and data science / Data science
    • uses / Why Scala?, Scala encourages immutability, Easier parallelism
    • static typing and type inference / Static typing and type inference
    • and functional programs / Scala and functional programs
    • null pointer uncertainty / Null pointer uncertainty
    • interoperability, with Java / Interoperability with Java
    • drawbacks / When not to use Scala
    • references / References
    • URL / References
  • Scala constructs
    • URL / Reference
  • scatter plot matrix plots
    • about / Multi-plot example – scatterplot matrix plots
  • scatter plots
    • about / More advanced scatter plots
  • schema
    • defining / Defining the schema
  • semantic URLs / Dynamic routing, References
  • sequences
    • extracting / Extracting sequences
  • shuffling / Data shuffling and partitions
  • single page applications
    • about / Single page applications
  • slicing / Advanced indexing and slicing
  • Slick
    • importing / Importing Slick
    • arguments, URL / Defining the schema
    • joins, URL / Invokers
    • versus JDBC / Slick versus JDBC
    • URL / References
  • spam filtering
    • about / Spam filtering
  • Spark
    • installing / Installing Spark
    • URL / Installing Spark, SQL statements on DataFrames
    • on EC2, URL / Running Spark applications on EC2
    • data shuffling / Data shuffling and partitions
    • Web UI, URL / Reference
    • internals, URL / Reference
  • Spark applications
    • running, locally / Running Spark applications locally
    • URL / Running Spark applications locally
    • running, on EC2 / Running Spark applications on EC2
  • Spark notebooks
    • URL / Data visualization beyond breeze-viz
  • SQL statements
    • on DataFrames / SQL statements on DataFrames
  • stand-alone programs
    • building / Building and running standalone programs
  • standalone programs
    • about / Standalone programs
  • Stanford NLP toolkit
    • URL / Spam filtering
  • stateful actors / Stateful actors
  • StringColumnExtensionMethods class
    • URL / Operations on columns
  • structs
    • about / Structs

T

  • tokenization
    • about / Transformers
  • tokens
    • URL / Authentication – adding HTTP headers
  • transformations
    • URL / Key-value RDDs
  • transformers
    • about / Transformers
    • URL / References
  • try/catch statements
    • versus Try type / Error handling
  • Try type
    • versus try/catch statements / Error handling
    • URL / References
  • tuning memory usage
    • URL / Persisting RDDs
  • type classes
    • loose coupling with / Looser coupling with type classes
    • about / Type classes
    • coding against / Coding against type classes
    • usage / When to use type classes
    • benefits / Benefits of type classes
    • URL / References
  • Typesafe activators
    • about / The Play framework
    • URL / The Play framework

U

  • URL design / Dynamic routing
  • user-defined function (UDF) / Custom functions on DataFrames
  • user-defined functions (UDFs) / Custom functions on DataFrames

V

  • vectors
    • about / Vectors
    • dense / Dense and sparse vectors and the vector trait
    • sparse / Dense and sparse vectors and the vector trait
    • trait / Dense and sparse vectors and the vector trait
    • building / Building vectors and matrices
    • mutating / Mutating vectors and matrices

W

  • web-jars
    • JavaScript dependencies through / JavaScript dependencies through web-jars
  • web APIs
    • querying / Querying web APIs
  • web application
    • about / Towards a web application: HTML templates
  • web frameworks
    • about / Introduction to web frameworks
  • web services
    • external web services, calling / Calling external web services

X

  • XPath
    • used, for extracting fields / Extracting fields using XPath
  • XPath DSL / Extracting fields using XPath