Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is, as the name states, a file system, comparable to those used by Windows (FAT or NTFS) and Linux (ext3, ext4, and so on), but it is specifically designed for distributed systems that need to process large datasets with high availability. It is optimized to work with multiple nodes and to spread data across them in a clustered computing environment, ensuring maximum availability of the data. The main difference between HDFS and other distributed file systems is that it is designed to run on low-cost commodity hardware and is highly fault-tolerant. The basic architecture of HDFS is shown in the following diagram:

Figure 3.1.1: The basic architecture of HDFS
HDFS has two main components that work on a master/slave concept: the NameNode, which is the master node, and the DataNode, which is a slave node. A minimal client-side sketch of how this split shows up in practice follows below.
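To make the master/slave split concrete, the following is a minimal sketch (not taken from this chapter) that uses Hadoop's Java FileSystem API. Metadata calls such as listing a directory are answered by the NameNode, while the bytes returned by open() are streamed from DataNodes. The NameNode address hdfs://namenode-host:9000 and the paths under /user/data are assumptions used purely for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in a real cluster this usually
        // comes from core-site.xml rather than being set in code
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

        FileSystem fs = FileSystem.get(conf);

        // Metadata request: served by the NameNode (the master)
        for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
            System.out.println(status.getPath() + " : " + status.getLen() + " bytes");
        }

        // Reading file contents: the NameNode supplies the block locations,
        // and the actual bytes are streamed from the DataNodes (the slaves)
        try (FSDataInputStream in = fs.open(new Path("/user/data/sample.txt"))) {
            byte[] buffer = new byte[4096];
            int read = in.read(buffer);
            System.out.println("Read " + read + " bytes");
        }

        fs.close();
    }
}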
The NameNode is the key to the whole file system. It is responsible for spreading the data across the different DataNodes based on the...