Packt+ | Advance your knowledge in tech

0

All Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Free Learning

Big Data Analytics with Java

You're reading from Big Data Analytics with Java Data analysis, visualization & machine learning techniques

Product type Paperback

Published in Jul 2017

Publisher Packt

ISBN-13 9781787288980

Length 418 pages

Edition 1st Edition

Languages

Java

Tools

Apache Spark

Concepts

Big Data

Author (1):

RAJAT MEHTA

View More author details

Table of Contents (21) Chapters

Big Data Analytics with Java

Credits

About the Author

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

1. Big Data Analytics with Java

2. First Steps in Data Analysis FREE CHAPTER

3. Data Visualization

4. Basics of Machine Learning

5. Regression on Big Data

6. Naive Bayes and Sentiment Analysis

7. Decision Trees

8. Ensembling on Big Data

9. Recommendation Systems

10. Clustering and Customer Segmentation on Big Data

11. Massive Graphs on Big Data

12. Real-Time Analytics on Big Data

13. Deep Learning Using Big Data

Index

Chapter 10. Clustering and Customer Segmentation on Big Data

Up until now we have only used and worked on data that was prelabeled that is, supervised. Based on that prelabeled data, we trained our machine learning models and predicted our results. But what if the data is not labeled at all and we just get plain data? In that case, can we carry out any useful analysis of the data at all? Figuring out details from an unlabeled dataset is an example of unsupervised learning, where the machine learning algorithm makes deductions or predictions from raw unlabeled data. One of the most popular approaches to analyzing this unlabeled data is to find groups of similar items within a dataset. This grouping of data has several advantages and use cases, as we will see in this chapter.

In this chapter, we will cover the following topics:

The concepts of clustering and types of clustering, including k-means and bisecting k-means clustering
Advantages and use cases of clustering
Customer segmentation and...

The rest of the chapter is locked

Register for a free Packt account to unlock a world of extra content!

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (1)

RAJAT MEHTA

RAJAT MEHTA

The author is a VP (Technical Architect) in technology in JP Morgan Chase in New York. The author is a sun certified java developer and has worked on java related technologies for more than 16 years. Current role for the past few years heavily involves the usage of bid data stack and running analytics on it. Author is also a contributor in various open source projects that are available on his GitHub repository and is also a frequent write on dev magazines.

See other products by RAJAT MEHTA

Other recommended products

Related to this chapter

Machine Learning with scikit-learn Quick Start Guide

Machine Learning with scikit-learn Quick Start Guide

Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book is the easiest way to learn how to deploy, optimize and evaluate all the important machine learning algorithms that scikit-learn provides.

Oct 2018 5h 44m

Apache Spark 2.x for Java Developers

Apache Spark 2.x for Java Developers

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.

Jul 2017 11h 40m

Apache Spark Quick Start Guide

Apache Spark Quick Start Guide

Apache Spark is a ?exible in-memory framework that allows processing of both batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to quickly get started with Apache Spark 2.0 and write efficient big data applications for a variety of use cases.

Mastering Machine Learning with Spark 2.x

Mastering Machine Learning with Spark 2.x

The purpose of machine learning is to build systems that learn from data. With the meteoric rise of machine learning, developers are now keen on finding out how can they make their Spark applications smarter. The book commences by defining machine learning primitives by the MLlib and H2O libraries. You will learn how to use Binary classification to detect the Higgs Boson particle in the huge amount of data produced by CERN particle collider and classify daily health activities using ensemble Methods for Multi-Class Classification. Finally, you will build different pattern mining models using MLlib, perform complex manipulation of DataFrames using Spark and Spark SQL, and deploy your app in a Spark streaming environment.

Aug 2017 11h 20m

Apache Spark 2.x Machine Learning Cookbook

Apache Spark 2.x Machine Learning Cookbook

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of code examples that will be covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and the implementation of ML algorithms with developing classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we'll focus on building high-end applications and explain various unsupervised methodologies and challenges to tackle when implementing with big data ML systems.

Sep 2017 22h 12m

Learning Spark SQL

Learning Spark SQL

In the past year, Apache Spark has been increasingly adopted for development of distributed applications. Spark SQL APIs provides an optimized interface that helps developers build such applications quickly and easily. However, designing web-scale production applications using Spark SQL APIs can be a complex task. Understanding the design and implementation best practices for Spark SQL API based applications before you start your project will help you avoid these problems and ensure that your project is a success. Learning Spark SQL gives an insight into the engineering practices used to design and build real-world Spark-based applications. The hands-on examples will give you the required confidence to work on any future projects you encounter in Spark SQL.

Sep 2017 15h 4m

Mastering Apache Spark 2.x

Mastering Apache Spark 2.x

Apache Spark is an in-memory cluster-based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing and more. This book will familiarize you with the newest features in Apache Spark 2.x, and take you through an exciting journey of complex Big Data processing, analytics, streaming analytics as well as advanced machine learning with Apache Spark. During the course of the book, you will leverage different functionalities and modules of Apache Spark such as Spark SQL, Spark MLlib, Spark Streaming, SparkML and more, to build efficient data processing solutions. By the end of this book, you will have all the necessary knowledge to use Apache Spark effectively in your day to day tasks.

Jul 2017 11h 48m

Hands-On Deep Learning with Apache Spark

Hands-On Deep Learning with Apache Spark

Deep Learning is a subset of Machine Learning where data sets with several layers of complexity can be processed. This book teaches you the different techniques using which deep learning solutions can be implemented at scale, on Apache Spark. This will help you gain experience of implementing your deep learning models in many real-world use cases.

Jan 2019 10h 44m

Machine Learning with Spark

Machine Learning with Spark

Spark ML is the machine learning module of Spark. It uses in-memory RDDs to process machine learning models faster for clustering, classification, and regression.

Apr 2017 17h 44m

Machine Learning with Scala Quick Start Guide

Machine Learning with Scala Quick Start Guide

Scala as a programming language is a highly scalable integration of object-oriented and functional programming, which makes it easy to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to train effective machine learning models using this popular language.

Apr 2019 7h 20m

Learning PySpark

Learning PySpark

This book will get you to grips with the Spark Python API. You'll explore how Python can be used with Spark to build scalable and reliable data-intensive applications.

Apache Spark 2.x Cookbook

Apache Spark 2.x Cookbook

Apache Spark has become the hottest platform and sought after skill set when it comes to the fields of Big Data, Analytics and Data Science. Apache Spark 2.x comes with series of new improvements in the areas of performance, scalability, operational and production readiness for structured processing of massive datasets. This book brings in a systematic way of getting a practical hands on to using its improved programming APIs, expanded SQL functionalities and implement distributed machine learning applications with Spark ML. Through the course of chapters, you will have explored the power of Spark DataFrames/Datasets, harness MLLib for Data mining, analyze complex problems with iterative or multi-stage Spark scripts and other associated toolsets such as Spark SQL, Spark Streaming and GraphX .

May 2017 9h 48m

Personalised recommendations for you

Based on your interests and search pattern

Mathematics of Machine Learning

Mathematics of Machine Learning

Deepen your theoretical knowledge and enhance your ability to solve complex machine learning problems with structured guidance. Gain the confidence to engage with advanced ML literature and tailor algorithms to meet your project requirements.

May 2025 24h 20m

Generative AI with Python and PyTorch

Generative AI with Python and PyTorch

Learn how to create images and text using VAEs, GANs, LSTMs, and transformers. Implement applications in natural language processing and computer vision through practical tutorials.

Mar 2025 15h 0m

Practical Generative AI with ChatGPT

Practical Generative AI with ChatGPT

This book helps you unlock ChatGPT's potential to make your working life better. From prompt engineering to creating custom GPTs, you'll enhance your productivity, creativity, and efficiency with practical insights and advanced techniques.

Apr 2025 12h 52m

Generative AI with LangChain

Generative AI with LangChain

Gain a solid foundation in LangChain, agentic AI, and LangGraph, and learn to build production-ready systems with multi-agent architectures, advanced RAG pipelines, Tree of Thought reasoning, agent handoffs, and fine-grained error handling.

May 2025 15h 52m

Architecting Power BI Solutions in Microsoft Fabric

Architecting Power BI Solutions in Microsoft Fabric

Power BI provides several options to solve common data problems, and designing the correct solution for each scenario can be a daunting task. This book makes it easier by guiding you through designing optimal solutions using Power BI.

Apr 2025 14h 16m

Microsoft Identity and Access Administrator SC-300 Exam Guide

Microsoft Identity and Access Administrator SC-300 Exam Guide

This comprehensive guide covers key topics such as Microsoft Entra ID implementation, authentication and access management, external user management, and hybrid identity solutions, providing practical insights and techniques for SC-300 exam success.

Mar 2025 19h 48m

LLM Design Patterns

LLM Design Patterns

This book helps you gain practical skills to develop and deploy LLMs. You'll learn data prep, training, pruning, quantization, and evaluation, as well as explore RAG, advanced prompting, and optimization to build robust, scalable language models.

May 2025 17h 48m

Tableau Cookbook for Experienced Professionals

Tableau Cookbook for Experienced Professionals

Advance your Tableau knowledge beyond the basics, streamline dashboard performance, tackle advanced geospatial challenges, and unlock API potential while fortifying your corporate data infrastructure with proven best practices.

Apr 2025 12h 24m

Time Series Analysis with Spark

Time Series Analysis with Spark

This book offers a complete guide to time series analysis with Apache Spark and Databricks, covering essential concepts and advanced techniques including Generative AI to equip readers with skills for real-world challenges across industries.

Mar 2025 9h 56m

Hands-On Artificial Intelligence for IoT

Hands-On Artificial Intelligence for IoT

Transform IoT systems with the power of artificial intelligence using this hands-on guide. Dive into practical techniques and expert insights to innovate and optimize your IoT devices, making them smarter and more efficient.

May 2025 15h 44m