Hadoop file formats
Hadoop supports many file formats, and a user can select one based on the use case. Each format has its own characteristics in terms of storage and performance. Let's discuss each file format in detail.
Text/CSV file
Text and CSV files are very common in Hadoop data processing. Each line in the file is treated as a new record and typically ends with the newline (\n) character. These files do not carry column headers as metadata, so while processing, an extra line of code is usually required to skip the header row. CSV files are typically compressed using the GZIP codec; because they do not support block-level compression, this adds to the processing cost. Needless to say, they do not support schema evolution.
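As a minimal sketch of that extra step, the following Spark (Scala) snippet reads a CSV file from HDFS and drops the header line before parsing the records; the file path and application name are hypothetical, not taken from the text:

// Sketch: skip the header row of a CSV file before processing (Spark/Scala).
import org.apache.spark.sql.SparkSession

object CsvHeaderExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-header").getOrCreate()
    val lines = spark.sparkContext.textFile("hdfs:///data/sales.csv")

    // Each line is one record; remove the first (header) line before splitting fields.
    val header  = lines.first()
    val records = lines
      .filter(line => line != header)
      .map(line => line.split(","))

    records.take(5).foreach(cols => println(cols.mkString(" | ")))
    spark.stop()
  }
}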
JSON
The JSON format is becoming very popular in all modern programming languages. These files are collections of name/value pairs. The JSON format is typically used in data exchange applications, and each record is treated as an object, record, struct, or an array...
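As a brief sketch of how such name/value records might be consumed on Hadoop, the following Spark (Scala) snippet reads newline-delimited JSON from HDFS and lets Spark infer the schema from the name/value pairs; the path and application name are illustrative assumptions:

// Sketch: read newline-delimited JSON records with schema inference (Spark/Scala).
import org.apache.spark.sql.SparkSession

object JsonReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-read").getOrCreate()

    // Each line is one JSON object; Spark derives column names from the keys.
    val df = spark.read.json("hdfs:///data/events.json")
    df.printSchema()
    df.show(5)

    spark.stop()
  }
}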