Big Data Analytics with Java: Data analysis, visualization & machine learning techniques

Product type: Paperback
Published: Jul 2017
Publisher: Packt
ISBN-13: 9781787288980
Length: 418 pages
Edition: 1st Edition
Author: Rajat Mehta

Table of Contents

Big Data Analytics with Java
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
1. Big Data Analytics with Java
2. First Steps in Data Analysis
3. Data Visualization
4. Basics of Machine Learning
5. Regression on Big Data
6. Naive Bayes and Sentiment Analysis
7. Decision Trees
8. Ensembling on Big Data
9. Recommendation Systems
10. Clustering and Customer Segmentation on Big Data
11. Massive Graphs on Big Data
12. Real-Time Analytics on Big Data
13. Deep Learning Using Big Data
Index

Index

A

  • Activation Function / Perceptron
  • advanced visualization technique
    • about / Advanced visualization technique
    • prefuse / Prefuse
    • IVTK Graph toolkit / IVTK Graph toolkit
  • Alternating Least Squares (ALS) / Alternating least squares – collaborative filtering
  • Apache Kafka
    • about / Apache Kafka
    • IoT sensors, integration / Apache Kafka
    • social media real-time analytics / Apache Kafka
    • healthcare analytics / Apache Kafka
    • log analytics / Apache Kafka
    • risk aggregation, in finance / Apache Kafka
  • Apache Spark
    • about / Apache Spark
    • concepts / Concepts
    • transformations / Transformations
    • actions / Actions
    • Spark Java API / Spark Java API
    • samples, Java 8 used / Spark samples using Java 8
    • data, loading / Loading data
    • data operations / Data operations – cleansing and munging
    • data, analyzing / Analyzing data – count, projection, grouping, aggregation, and max/min
    • common transformations, on Spark RDDs / Analyzing data – count, projection, grouping, aggregation, and max/min
    • actions, on RDDs / Actions on RDDs
    • paired RDDs / Paired RDDs
    • data, saving / Saving data
    • results, collecting / Collecting and printing results
    • results, printing / Collecting and printing results
    • programs, executing on Hadoop / Executing Spark programs on Hadoop
    • subprojects / Apache Spark sub-projects
    • machine learning modules / Spark machine learning modules
    • Apache Mahout / Mahout – a popular Java ML library
    • Deeplearning4j / Deeplearning4j – a deep learning library
    • Apriori algorithm, implementation / Implementation of the Apriori algorithm in Apache Spark
    • FP-Growth algorithm, executing / Running FP-Growth on Apache Spark
  • Apache Spark, machine learning modules
    • MLlib Java API / MLlib Java API
    • machine learning libraries / Other machine learning libraries
  • Apache Spark machine learning API
    • about / The new Spark ML API
    • machine learning algorithms / The new Spark ML API
    • features handling tools / The new Spark ML API
    • model selection / The new Spark ML API
    • tuning tools / The new Spark ML API
    • utility methods / The new Spark ML API
  • Apriori algorithm
    • implementation, in Apache Spark / Implementation of the Apriori algorithm in Apache Spark
    • using / Implementation of the Apriori algorithm in Apache Spark
    • disadvantages / Implementation of the Apriori algorithm in Apache Spark
  • artificial neural network / Introduction to neural networks

B

  • bagging / Bagging
  • bag of words / Bag of words
  • bar chart
    • about / Bar charts
    • dataset, creating / Bar charts
  • base project setup / Base project setup
    • default Kafka configurations, used / Base project setup
    • Maven Java project, for Spark Streaming / Base project setup
  • Bayes' theorem / Bayes' theorem
  • bidirected graph / Refresher on graphs
  • big data
    • analytical products / Basics of Hadoop – a Java sub-project
    • batch products / Basics of Hadoop – a Java sub-project
    • streaming / Basics of Hadoop – a Java sub-project
    • machine learning libraries / Basics of Hadoop – a Java sub-project
    • NoSQL / Basics of Hadoop – a Java sub-project
    • search / Basics of Hadoop – a Java sub-project
    • data analytics on / Why data analytics on big data?
    • for data analytics / Big data for analytics
    • bigger pay package, for Java developers / Big data – a bigger pay package for Java developers
    • Hadoop, basics / Basics of Hadoop – a Java sub-project
  • big data stack
    • HDFS / Basics of Hadoop – a Java sub-project
    • Spark / Basics of Hadoop – a Java sub-project
    • Impala / Basics of Hadoop – a Java sub-project
    • MapReduce / Basics of Hadoop – a Java sub-project
    • Sqoop / Basics of Hadoop – a Java sub-project
    • Oozie / Basics of Hadoop – a Java sub-project
    • Flume / Basics of Hadoop – a Java sub-project
    • Kafka / Basics of Hadoop – a Java sub-project
    • Yarn / Basics of Hadoop – a Java sub-project
  • binary classification dataset / What are the feature types that can be extracted from the datasets?
  • boosting / Boosting
  • bootstrapping / Bagging
  • box plots / Box plots

C

  • charts
    • used, in big data analytics / Using charts in big data analytics
    • for initial data exploration / Using charts in big data analytics
    • for data visualization and reporting / Using charts in big data analytics
  • clustering
    • about / Clustering
    • customer segmentation / Clustering
    • search engines / Clustering
    • data exploration / Clustering
    • epidemic breakout zones, finding / Clustering
    • biology / Clustering
    • news categorization / Clustering
    • news, summarization / Clustering
    • types / Types of clustering
    • hierarchical clustering / Hierarchical clustering
    • K-means clustering / K-means clustering
    • k-means clustering, bisecting / Bisecting k-means clustering
    • for customer segmentation / Clustering for customer segmentation
  • clustering algorithm
    • changing / Changing the clustering algorithm
  • code
    • diving / Diving into the code
  • cold start problem / Content-based recommendation systems
  • collaborative recommendation systems
    • about / Collaborative recommendation systems
    • advantages / Advantages
    • disadvantages / Disadvantages
    • collaborative filtering / Alternating least squares – collaborative filtering
  • common transformations, on Spark RDDs
    • Filter / Analyzing data – count, projection, grouping, aggregation, and max/min
    • Map / Analyzing data – count, projection, grouping, aggregation, and max/min
    • FlatMap / Analyzing data – count, projection, grouping, aggregation, and max/min
    • other transformations / Analyzing data – count, projection, grouping, aggregation, and max/min
  • conditional FP-tree / Efficient market basket analysis using FP-Growth algorithm
  • conditional pattern / Efficient market basket analysis using FP-Growth algorithm
  • conditional pattern base / Efficient market basket analysis using FP-Growth algorithm
  • conditional probability / Conditional probability
  • content-based recommendation systems
    • about / Content-based recommendation systems
    • Euclidean Distance / Content-based recommendation systems
    • Pearson Correlation / Content-based recommendation systems
    • dataset / Dataset
    • content-based recommender, on MovieLens dataset / Content-based recommender on MovieLens dataset
    • collaborative recommendation systems / Collaborative recommendation systems
  • content-based recommender
    • on MovieLens dataset / Content-based recommender on MovieLens dataset
  • context
    • building / Building SparkConf and context
  • customer segmentation / Customer segmentation
    • clustering / Clustering for customer segmentation

D

  • data
    • cleaning / Data cleaning and munging, Cleaning and munging the data
    • munging / Data cleaning and munging, Cleaning and munging the data
    • unwanted data, filtering / Data cleaning and munging
    • missing data, handling / Data cleaning and munging
    • incomplete data, handling / Data cleaning and munging
    • discarding / Data cleaning and munging
    • constant value, filling / Data cleaning and munging
    • average value, populating / Data cleaning and munging
    • nearest neighbor approach / Data cleaning and munging
    • converting, to proper format / Data cleaning and munging
    • basic analysis, with Spark SQL / Basic analysis of data with Spark SQL
    • parsing / Load and parse data
    • loading / Load and parse data
    • Spark-SQL way / Analyzing data – the Spark-SQL way
    • Spark SQL, for data exploration and analytics / Spark SQL for data exploration and analytics
    • Apriori algorithm / Market basket analysis – Apriori algorithm
    • Full Apriori algorithm / Full Apriori algorithm
    • preparing / Preparing the data
    • formatting / Formatting the data
    • storing / Storing the data
  • data analytics
    • on big data / Why data analytics on big data?
    • distributed computing, on Hadoop / Distributed computing on Hadoop
    • HDFS concepts / HDFS concepts
    • Apache Spark / Apache Spark
  • data exploration / Data exploration, Data exploration
    • of text data / Data exploration of text data
  • dataframe / Dataframe and datasets
  • DataNode / Main components of HDFS
  • dataset / Dataset, Dataset
    • URL, for downloading / All India seasonal and annual average temperature series dataset
    • fields / All India seasonal and annual average temperature series dataset
    • data / All India seasonal and annual average temperature series dataset
    • reference link / Predicting house prices using linear regression
    • data, munging / Data cleaning and munging
    • full batch approach / Accuracy of multi-layer perceptrons
    • partial batch approach / Accuracy of multi-layer perceptrons
  • dataset, linear regression
    • data, cleaning / Data cleaning and munging
    • exploring / Exploring the dataset
    • number of rows / Exploring the dataset
    • average price per zipcode, sorting by highest on top / Exploring the dataset
    • linear regression model, executing / Running and testing the linear regression model
    • linear regression model, testing / Running and testing the linear regression model
  • dataset, logistic regression
    • data, cleaning / Data cleaning and munging
    • data, munging / Data cleaning and munging
    • data, missing / Data cleaning and munging
    • categorical data / Data cleaning and munging
    • data exploration / Data exploration
    • executing / Running and testing the logistic regression model
    • testing / Running and testing the logistic regression model
  • dataset object / Training and testing the model
  • datasets / Datasets, Dataframe and datasets
    • airports dataset / Datasets
    • routes dataset / Datasets
    • airlines dataset / Datasets
  • datasets splitting
    • features selected / Choosing the best features for splitting the datasets
    • Gini Impurity / Choosing the best features for splitting the datasets
  • data transfer techniques
    • Flume / Getting and preparing data in Hadoop
    • FTP / Getting and preparing data in Hadoop
    • Kafka / Getting and preparing data in Hadoop
    • HBase / Getting and preparing data in Hadoop
    • Hive / Getting and preparing data in Hadoop
    • Impala / Getting and preparing data in Hadoop
  • data visualization
    • with Java JFreeChart / Data visualization with Java JFreeChart
    • charts, used in big data analytics / Using charts in big data analytics
  • decision tree
    • about / What is a decision tree?
    • for classification / What is a decision tree?
    • for regression / What is a decision tree?
    • building / Building a decision tree
    • datasets splitting, features selected / Choosing the best features for splitting the datasets
    • advantages / Advantages of using decision trees
    • disadvantages / Disadvantages of using decision trees
    • dataset / Dataset
    • data exploration / Data exploration
    • data, cleaning / Cleaning and munging the data
    • data, munging / Cleaning and munging the data
    • model, training / Training and testing the model
    • model, testing / Training and testing the model
  • deep learning
    • about / Deep learning
    • advantages / Advantages and use cases of deep learning
    • use cases / Advantages and use cases of deep learning
    • no feature engineering required / Advantages and use cases of deep learning
    • accuracy / Advantages and use cases of deep learning
    • information / More information on deep learning
  • Deeplearning4j
    • about / Deeplearning4j, Deeplearning4j – a deep learning library
    • references / Deeplearning4j
    • data, compressing / Compressing data
    • Avro / Avro and Parquet
    • Parquet / Avro and Parquet
  • distributed computing
    • on Hadoop / Distributed computing on Hadoop

E

  • edges / Refresher on graphs
  • efficient market basket analysis
    • FP-Growth algorithm, used / Efficient market basket analysis using FP-Growth algorithm
  • ensembling
    • about / Ensembling
    • voting / Ensembling
    • averaging / Ensembling
    • machine learning algorithm, used / Ensembling
    • types / Types of ensembling
    • bagging / Bagging
    • boosting / Boosting
    • advantages / Advantages and disadvantages of ensembling
    • disadvantages / Advantages and disadvantages of ensembling
    • random forest / Random forests
    • Gradient boosted trees (GBTs) / Gradient boosted trees (GBTs)

F

  • feature selection
    • filter methods / How do you select the best features to train your models?
    • Pearson correlation / How do you select the best features to train your models?
    • chi-square / How do you select the best features to train your models?
    • wrapper method / How do you select the best features to train your models?
    • forward selection / How do you select the best features to train your models?
    • backward elimination / How do you select the best features to train your models?
    • embedded method / How do you select the best features to train your models?
  • FP-Growth algorithm
    • used, for efficient market basket analysis / Efficient market basket analysis using FP-Growth algorithm
    • transaction dataset / Efficient market basket analysis using FP-Growth algorithm
    • frequency of items, calculating / Efficient market basket analysis using FP-Growth algorithm
    • priority, assigning to items / Efficient market basket analysis using FP-Growth algorithm
    • items, arranging by priority / Efficient market basket analysis using FP-Growth algorithm
    • FP-Tree, building / Efficient market basket analysis using FP-Growth algorithm
    • frequent patterns, identifying from FP-Tree / Efficient market basket analysis using FP-Growth algorithm
    • conditional patterns, mining / Efficient market basket analysis using FP-Growth algorithm
    • conditional patterns, from leaf node Diapers / Efficient market basket analysis using FP-Growth algorithm
    • executing, on Apache Spark / Running FP-Growth on Apache Spark
  • frequent itemsets / Efficient market basket analysis using FP-Growth algorithm
  • Frequent Pattern Mining
    • reference link / Running FP-Growth on Apache Spark
  • Full Apriori algorithm
    • about / Full Apriori algorithm
    • dataset / Full Apriori algorithm
    • apriori implementation / Full Apriori algorithm

G

  • Gradient boosted trees (GBTs)
    • about / Advantages and disadvantages of ensembling, Gradient boosted trees (GBTs)
    • dataset, used / Classification problem and dataset used
    • issues, classifying / Classification problem and dataset used
    • data exploration / Data exploration
    • random forest model, training / Training and testing our random forest model
    • random forest model, testing / Training and testing our random forest model
    • gradient boosted tree model, testing / Training and testing our gradient boosted tree model
    • gradient boosted tree model, training / Training and testing our gradient boosted tree model
  • graph analytics
    • about / Graph analytics
    • path analytics / Graph analytics
    • connectivity analytics / Graph analytics
    • community analytics / Graph analytics
    • centrality analytics / Graph analytics
    • GraphFrames / GraphFrames
    • GraphFrames, used for building a graph / Building a graph using GraphFrames
    • on airports / Graph analytics on airports and their flights
    • on flights / Graph analytics on airports and their flights
    • datasets / Datasets
    • on flights data / Graph analytics on flights data
  • graphs
    • refresher / Refresher on graphs
    • representing / Representing graphs
    • adjacency matrix / Representing graphs
    • adjacency list / Representing graphs
    • common terminology / Common terminology on graphs
    • common algorithms / Common algorithms on graphs
    • plotting / Plotting graphs
  • graphs, common algorithms
    • breadth-first search / Common algorithms on graphs
    • depth-first search / Common algorithms on graphs
    • Dijkstra's shortest path / Common algorithms on graphs
    • PageRank algorithm / Common algorithms on graphs
  • graphs, common terminology
    • vertices / Common terminology on graphs
    • edges / Common terminology on graphs
    • degrees / Common terminology on graphs
    • indegrees / Common terminology on graphs
    • outdegrees / Common terminology on graphs
  • GraphStream library
    • reference link / Plotting graphs

H

  • Hadoop
    • basics / Basics of Hadoop – a Java sub-project
    • features / Basics of Hadoop – a Java sub-project
    • distributed computing on / Distributed computing on Hadoop
    • core / Distributed computing on Hadoop
    • HDFS / Distributed computing on Hadoop
  • Hadoop Distributed File System (HDFS) / Real-time SQL queries using Impala
    • about / Distributed computing on Hadoop
    • open source / Design and architecture of HDFS
    • immense scalability, for amount of data / Design and architecture of HDFS
    • failover support / Design and architecture of HDFS
    • fault tolerance / Design and architecture of HDFS
    • data locality / Design and architecture of HDFS
    • NameNode / Main components of HDFS
    • DataNode / Main components of HDFS
  • handwritten digit recognition
    • using CNN / Handwritten digit recognition using CNN
  • HBase / Real-time data processing
  • HDFS concepts
    • about / HDFS concepts
    • architecture / Design and architecture of HDFS
    • design / Design and architecture of HDFS
    • components / Main components of HDFS
    • simple commands / HDFS simple commands
  • hierarchical clustering / Hierarchical clustering
  • histogram
    • about / Histograms
    • using / When would you use a histogram?
    • creating, JFreeChart used / How to make histograms using JFreeChart?
  • human neuron
    • dendrite / Introduction to neural networks
    • cell body / Introduction to neural networks
    • axon terminal / Introduction to neural networks
  • hyperplane / Scatter plots, What is simple linear regression?

I

  • Impala
    • used, for real-time SQL queries / Real-time SQL queries using Impala
    • advantages / Real-time SQL queries using Impala
    • flight delay analysis / Flight delay analysis using Impala
    • Apache Kafka / Apache Kafka
    • Spark Streaming / Spark Streaming, Typical uses of Spark Streaming
    • trending videos / Trending videos
  • Iris dataset
    • reference link / Flower species classification using multi-layer perceptrons
  • IVTK Graph toolkit
    • about / IVTK Graph toolkit
    • other libraries / Other libraries

J

  • JFreeChart API
    • dataset loading, Apache Spark used / Simple single Time Series chart
    • chart object, creating / Simple single Time Series chart
    • dataset object, filling / Bar charts
    • chart component, creating / Bar charts

K

  • k-means clustering / K-means clustering
    • bisecting / Bisecting k-means clustering

L

  • linear regression
    • about / Linear regression
    • using / Where is linear regression used?
    • used, for predicting house prices / Predicting house prices using linear regression
    • dataset / Dataset
  • line charts / Line charts
  • logistic regression
    • about / Logistic regression
    • mathematical functions, used / Which mathematical functions does logistic regression use?
    • gradient ascent or descent / Which mathematical functions does logistic regression use?
    • stochastic gradient descent / Which mathematical functions does logistic regression use?
    • used for / Where is logistic regression used?
    • heart disease, predicting / Where is logistic regression used?
    • dataset / Dataset

M

  • machine learning
    • about / What is machine learning?
    • example / Real-life examples of machine learning
    • at Netflix / Real-life examples of machine learning
    • spam filter / Real-life examples of machine learning
    • handwriting detection, on cheques submitted via ATMs / Real-life examples of machine learning
    • type / Type of machine learning
    • supervised learning / Type of machine learning
    • unsupervised learning / Type of machine learning
    • semi-supervised learning / Type of machine learning
    • supervised learning, case study / A small sample case study of supervised and unsupervised learning
    • unsupervised learning, case study / A small sample case study of supervised and unsupervised learning
    • problems, steps for / Steps for machine learning problems
    • model, selecting / Choosing the machine learning model
    • training/test set / Choosing the machine learning model
    • cross validation / Choosing the machine learning model
    • features extracted from datasets / What are the feature types that can be extracted from the datasets?
    • categorical features / What are the feature types that can be extracted from the datasets?
    • numerical features / What are the feature types that can be extracted from the datasets?
    • text features / What are the feature types that can be extracted from the datasets?
    • features, selecting to train models / How do you select the best features to train your models?
    • analytics, executing on big data / How do you run machine learning analytics on big data?
    • data, preparing in Hadoop / Getting and preparing data in Hadoop
    • data, obtaining in Hadoop / Getting and preparing data in Hadoop
    • models, storing on big data / Training and storing models on big data
    • models, training on big data / Training and storing models on big data
    • Apache Spark machine learning API / Apache Spark machine learning API
  • massive graphs
    • on big data / Massive graphs on big data
    • graph analytics / Graph analytics
    • graph analytics, on airports / Graph analytics on airports and their flights
  • maths stats
    • min / Box plots
    • max / Box plots
    • mean / Box plots
    • median / Box plots
    • lower quartile / Box plots
    • upper quartile / Box plots
    • outliers / Box plots
  • mean squared error (MSE) / Bisecting k-means clustering
  • median value / Box plots
  • MNIST database
    • reference link / Handwritten digit recognition using CNN
  • model
    • selecting / Training and storing models on big data
    • training / Training and storing models on big data, Training and testing the model
    • storing / Training and storing models on big data
    • testing / Training and testing the model
  • multi-layer perceptron
    • about / Multi-layer perceptrons
    • accuracy / Accuracy of multi-layer perceptrons
    • used, for flower species classification / Flower species classification using multi-layer perceptrons
  • multiple linear regression / What is simple linear regression?

N

  • N-grams
    • about / N-grams
    • examples / N-grams
  • Naive Bayes algorithm
    • about / Naive Bayes algorithm
    • advantages / Advantages of Naive Bayes
    • disadvantages / Disadvantages of Naive Bayes
  • NameNode / Main components of HDFS
  • Natural Language Processing (NLP) / What are the feature types that can be extracted from the datasets?, Concepts for sentiment analysis
  • neural networks / Introduction to neural networks

O

  • OpenFlights airports database
    • reference link / Datasets

P

  • paired RDDs
    • about / Paired RDDs
    • transformations / Transformations on paired RDDs
  • perceptron
    • about / Perceptron
    • issues / Problems with perceptrons
    • Logical AND / Problems with perceptrons
    • Logical OR / Problems with perceptrons
    • sigmoid neuron / Sigmoid neuron
    • multi-layer perceptron / Multi-layer perceptrons
  • PFP (parallel FP-Growth) / Running FP-Growth on Apache Spark
  • prefuse
    • about / Prefuse
    • reference link / Prefuse

R

  • random forest / Random forests
  • real-time analytics
    • about / Real-time analytics
    • fraud analytics / Real-time analytics
    • sensor data analysis (Internet of Things) / Real-time analytics
    • recommendations, giving to users / Real-time analytics
    • in healthcare / Real-time analytics
    • ad-processing / Real-time analytics
    • big data stack / Big data stack for real-time analytics
  • real-time data ingestion / Real-time data ingestion and storage
    • Apache Kafka / Real-time data ingestion and storage
    • Apache Flume / Real-time data ingestion and storage
    • HBase / Real-time data ingestion and storage
    • Cassandra / Real-time data ingestion and storage
  • real-time data processing / Real-time data processing
    • Spark Streaming / Real-time data processing
    • Storm / Real-time data processing
  • real-time SQL queries
    • on big data / Real-time SQL queries on big data
    • Impala / Real-time SQL queries on big data
    • Apache Drill / Real-time SQL queries on big data
    • Impala, used / Real-time SQL queries using Impala
  • real-time storage / Real-time data ingestion and storage
  • Recency, Frequency, and Monetary (RFM) / Customer segmentation
  • recommendation system
    • about / Recommendation systems and their types
    • types / Recommendation systems and their types
    • content-based recommendation systems / Content-based recommendation systems
  • Resilient Distributed Dataset (RDD) / Concepts, Dataframe and datasets

S

  • scatter plots / Scatter plots
  • sentiment analysis
    • about / Sentiment analysis
    • concepts / Concepts for sentiment analysis
    • tokenization / Tokenization
    • stemming / Stemming
    • N-grams / N-grams
    • term presence / Term presence and Term Frequency
    • term frequency / Term presence and Term Frequency
    • Term Frequency and Inverse Document Frequency (TF-IDF) / TF-IDF
    • bag of words / Bag of words
    • dataset / Dataset
    • text data, data exploration / Data exploration of text data
    • on dataset / Sentiment analysis on this dataset
  • sigmoid neuron / Sigmoid neuron
  • simple linear regression / Linear regression, What is simple linear regression?
  • smoothing factor / Disadvantages of Naive Bayes
  • Solr / Real-time data processing
  • spam detector model / Type of machine learning
  • SparkConf
    • building / Building SparkConf and context
  • Spark ML / Apache Spark machine learning API
  • Spark SQL
    • used, for basic analysis on data / Basic analysis of data with Spark SQL
    • SparkConf, building / Building SparkConf and context
    • context, building / Building SparkConf and context
    • dataframe / Dataframe and datasets
    • datasets / Dataframe and datasets
    • data, loading / Load and parse data
    • data, parsing / Load and parse data
  • Spark Streaming
    • about / Spark Streaming, Typical uses of Spark Streaming
    • use cases / Typical uses of Spark Streaming
    • data collection, in real time / Typical uses of Spark Streaming
    • storage, in real time / Typical uses of Spark Streaming
    • predictive analytics, in real time / Typical uses of Spark Streaming
    • windowed calculations / Typical uses of Spark Streaming
    • cumulative calculations / Typical uses of Spark Streaming
    • base project setup / Base project setup
  • stemming / Stemming
  • stop words removal / Stop words removal
  • Storm / Spark Streaming
  • sum of mean squared errors (SMEs) / Bisecting k-means clustering
  • supervised learning
    • about / Type of machine learning
    • classification / Type of machine learning
    • regression / Type of machine learning
  • Support Vector Machine (SVM) / SVM or Support Vector Machine

T

  • tendency / Content-based recommendation systems
  • term frequency
    • about / Term presence and Term Frequency
    • example / Term presence and Term Frequency
  • Term Frequency and Inverse Document Frequency (TF-IDF) / TF-IDF
    • about / TF-IDF
    • term frequency / TF-IDF
    • inverse document frequency / TF-IDF
  • time series chart
    • about / Time Series chart
    • All India seasonal and annual average temperature series dataset / All India seasonal and annual average temperature series dataset
    • simple single time series chart / Simple single Time Series chart
    • multiple time series, on single chart window / Multiple Time Series on a single chart window
  • tokenization
    • about / Tokenization
    • regular expression, used / Tokenization
    • pre-trained model, used / Tokenization
    • stop words removal / Stop words removal
  • trending videos
    • about / Trending videos
    • sentiment analysis, at real time / Sentiment analysis in real time

V

  • vertices / Refresher on graphs
  • Visualization ToolKit (VTK)
    • about / IVTK Graph toolkit
    • URL / IVTK Graph toolkit

W

  • windowed calculations / Trending videos