
Practical Big Data Analytics: Hands-on techniques to implement enterprise analytics and machine learning using Hadoop, Spark, NoSQL and R

Product type: Paperback

Published: Jan 2018

Publisher: Packt

ISBN-13: 9781783554393

Length: 412 pages

Edition: 1st Edition

Languages: Java

Tools: Apache Spark

Concepts: Big Data

Author: Nataraj Dasgupta


Summary

In this chapter, we learned about some of the core features of Spark, one of the most prominent technologies in the Big Data landscape today. Spark has matured rapidly since its 1.0 release in 2014, when it emerged as a Big Data solution that alleviated many of the shortcomings of Hadoop, such as I/O contention.

Today, Spark has several components, including dedicated libraries for streaming analytics (Spark Streaming) and machine learning (MLlib), and it remains under active development. Databricks is the leading provider of a commercially supported version of Spark. It also hosts a convenient cloud-based Spark environment that any user can access at no charge, albeit with limited resources. This has dramatically lowered the barrier to entry, as users do not need to install a complete Spark environment to learn and use the platform.

In the next chapter, we will begin our discussion of machine learning. So far, most of the text has focused on the management of large-scale data. Making use of the data...


Author

Nataraj Dasgupta is the vice president of advanced analytics at RxDataScience Inc. Nataraj has been in the IT industry for more than 19 years, and has worked in the technical and analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. At Purdue Pharma, Nataraj led the data science division, where he developed the company's award-winning big data and machine learning platform. Prior to Purdue, at UBS, he held the role of Associate Director, working with high-frequency and algorithmic trading technologies in the foreign exchange trading division of the bank.


Other recommended products

Related to this chapter

Web Application Development with R Using Shiny

Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R. This guide takes a fresh approach to developing scalable web applications. It will enable you to create responsive, interactive web applications using the complete R Shiny suite.

Sep 2018 7h 56m

Apache Hadoop 3 Quick Start Guide

Apache Hadoop is a widely used distributed data platform. Instead of relying on one large computer to store and process data, it enables large datasets to be processed efficiently across a cluster of machines. This book will get you started with the Hadoop ecosystem and introduce you to its main technical topics, such as MapReduce, YARN, and HDFS.

Oct 2018 7h 20m

Hands-On Big Data Modeling

Big data is very challenging to model using traditional database modeling and management systems. This book will teach you how to model big data using the latest, more efficient tools, such as ERWIN, ANACONDA (Python), and WEKA.

Nov 2018 10h 12m

Apache Spark Quick Start Guide

Apache Spark is a flexible in-memory framework that allows processing of both batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to quickly get started with Apache Spark 2.0 and write efficient big data applications for a variety of use cases.

Mastering Hadoop 3

This is a comprehensive guide to understanding advanced concepts of the Hadoop ecosystem. You will learn how Hadoop works internally and build solutions to some real-world use cases. Finally, you will have a solid understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable Big Data pipeline.

Feb 2019 18h 8m

Hands-on DevOps

Data Lake for Enterprises

'Data Lake' has recently emerged as a prominent term in the big data industry. Data scientists can use data lakes to derive meaningful insights that businesses can act on to redefine or transform the way they operate. Lambda architecture is also emerging as one of the most prominent patterns in the big data landscape, as it derives useful information not only from historical data but also from correlated real-time data, enabling businesses to make critical decisions. This book brings these two important aspects, data lakes and lambda architecture, together.

May 2017 19h 52m

Hands-On Data Science with R

Hands-On Data Science with R explores popular R packages for performing various data science tasks, including core statistical concepts and a wide array of use cases. This practical book covers the entire data science ecosystem for aspiring data scientists, including machine learning, NLP, and neural networks.

Nov 2018 14h 0m

Learning Apache Spark 2

Apache Spark is one of the most popular Big Data processing frameworks today, delivering speed, accuracy and real-time results – all in one solution. With this book, you will delve into the world of Apache Spark and learn about the new features introduced in Spark 2, along with the architecture and the associated concepts. A comprehensive guide to Apache Spark 2 for beginners, this book covers everything you need to know to get up and running with Big Data processing, machine learning and stream processing with Apache Spark, and allows you to easily understand each of these concepts through real-world examples.

Mar 2017 11h 52m

Artificial Intelligence for Big Data

Create smart systems to extract intelligent insights for decision making. You will learn about widely used Artificial Intelligence techniques for carrying out solutions in a production-ready environment. You'll explore advanced topics such as clustering, symbolic and sub-symbolic information representation, and many more.

May 2018 12h 48m

Personalised recommendations for you

Mathematics of Machine Learning

Deepen your theoretical knowledge and enhance your ability to solve complex machine learning problems with structured guidance. Gain the confidence to engage with advanced ML literature and tailor algorithms to meet your project requirements.

May 2025 24h 20m

Generative AI with Python and PyTorch

Learn how to create images and text using VAEs, GANs, LSTMs, and transformers. Implement applications in natural language processing and computer vision through practical tutorials.

Mar 2025 15h 0m

Practical Generative AI with ChatGPT

This book helps you unlock ChatGPT's potential to make your working life better. From prompt engineering to creating custom GPTs, you'll enhance your productivity, creativity, and efficiency with practical insights and advanced techniques.

Apr 2025 12h 52m

Generative AI with LangChain

Gain a solid foundation in LangChain, agentic AI, and LangGraph, and learn to build production-ready systems with multi-agent architectures, advanced RAG pipelines, Tree of Thought reasoning, agent handoffs, and fine-grained error handling.

May 2025 15h 52m

Architecting Power BI Solutions in Microsoft Fabric

Power BI provides several options to solve common data problems, and designing the correct solution for each scenario can be a daunting task. This book makes it easier by guiding you through designing optimal solutions using Power BI.

Apr 2025 14h 16m

Microsoft Identity and Access Administrator SC-300 Exam Guide

This comprehensive guide covers key topics such as Microsoft Entra ID implementation, authentication and access management, external user management, and hybrid identity solutions, providing practical insights and techniques for SC-300 exam success.

Mar 2025 19h 48m

LLM Design Patterns

This book helps you gain practical skills to develop and deploy LLMs. You'll learn data prep, training, pruning, quantization, and evaluation, as well as explore RAG, advanced prompting, and optimization to build robust, scalable language models.

May 2025 17h 48m

Tableau Cookbook for Experienced Professionals

Advance your Tableau knowledge beyond the basics, streamline dashboard performance, tackle advanced geospatial challenges, and unlock API potential while fortifying your corporate data infrastructure with proven best practices.

Apr 2025 12h 24m

Time Series Analysis with Spark

This book offers a complete guide to time series analysis with Apache Spark and Databricks, covering essential concepts and advanced techniques including Generative AI to equip readers with skills for real-world challenges across industries.

Mar 2025 9h 56m

Hands-On Artificial Intelligence for IoT

Transform IoT systems with the power of artificial intelligence using this hands-on guide. Dive into practical techniques and expert insights to innovate and optimize your IoT devices, making them smarter and more efficient.

May 2025 15h 44m