Mastering Apache Storm

You're reading from Mastering Apache Storm: Real-time big data streaming using Kafka, Hbase and Redis

Product type: Paperback
Published: Aug 2017
Publisher: Packt
ISBN-13: 9781787125636
Length: 284 pages
Edition: 1st
Author (1): Jain
Table of Contents (19 chapters)

Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
1. Real-Time Processing and Storm Introduction
2. Storm Deployment, Topology Development, and Topology Options
3. Storm Parallelism and Data Partitioning
4. Trident Introduction
5. Trident Topology and Uses
6. Storm Scheduler
7. Monitoring of Storm Cluster
8. Integration of Storm and Kafka
9. Storm and Hadoop Integration
10. Storm Integration with Redis, Elasticsearch, and HBase
11. Apache Log Processing with Storm
12. Twitter Tweet Collection and Machine Learning

Introduction to Hadoop


Apache Hadoop is an open source platform for developing and deploying big data applications. It was initially developed at Yahoo! based on the MapReduce and Google File System papers published by Google. Over the past few years, Hadoop has become the flagship big data platform.

In this section, we will discuss the key components of a Hadoop cluster.

Hadoop Common

This is the base library on which the other Hadoop modules build. It abstracts OS and filesystem operations so that Hadoop can be deployed on a variety of platforms.
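To make the idea of such an abstraction layer concrete, here is a minimal, purely illustrative sketch (this is not Hadoop's actual API, and the class and method names are invented for the example): client code is written against an abstract filesystem interface, so the same logic can run on any storage backend that implements it.

```python
# Illustrative sketch of a filesystem abstraction layer, in the spirit of
# Hadoop Common. Names (FileSystem, InMemoryFileSystem, copy) are hypothetical.
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Abstract storage interface; concrete backends implement read/write."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryFileSystem(FileSystem):
    """Toy backend; a real deployment would plug in local disk, HDFS, etc."""

    def __init__(self):
        self._store = {}

    def write(self, path: str, data: bytes) -> None:
        self._store[path] = data

    def read(self, path: str) -> bytes:
        return self._store[path]


def copy(src_fs: FileSystem, dst_fs: FileSystem, path: str) -> None:
    """Client code targets the abstraction, so it works with any backend."""
    dst_fs.write(path, src_fs.read(path))
```

Because `copy` depends only on the `FileSystem` interface, swapping the backend requires no change to the client logic, which is the portability property the paragraph above describes.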

Hadoop Distributed File System

Commonly known as HDFS, the Hadoop Distributed File System is a scalable, distributed, fault-tolerant filesystem. HDFS acts as the storage layer of the Hadoop ecosystem. It allows the sharing and storage of data and application code among the various nodes in a Hadoop cluster.
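HDFS achieves its scalability and fault tolerance by splitting each file into fixed-size blocks and replicating every block across several nodes. The following is a toy sketch of that idea (it is not the real HDFS implementation; real HDFS is rack-aware and defaults to 128 MB blocks and a replication factor of 3, which the constants below only mimic at byte scale):

```python
# Toy sketch of HDFS-style block splitting and replica placement.
# Purely illustrative; not Hadoop code.

BLOCK_SIZE = 128   # real HDFS defaults to 128 MB blocks; 128 bytes for demo
REPLICATION = 3    # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte payload into fixed-size blocks, as HDFS does to files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin sketch).

    Real HDFS placement is rack-aware; this toy version only guarantees
    that a block's replicas land on distinct nodes.
    """
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


data = b"x" * 300                 # a 300-byte stand-in for a file
blocks = split_into_blocks(data)  # three blocks: 128 + 128 + 44 bytes
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes)
```

Because every block lives on three distinct nodes, the loss of any single node leaves at least two copies of each block available, which is the fault-tolerance property described above.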

The following are the key assumptions made in the design of HDFS:

  • It should be deployable on a cluster of commodity hardware.
  • Hardware...