Chapter 1. Obtaining and Cleaning Data
In this chapter, we will cover the following recipes:
Retrieving all file names from hierarchical directories using Java
Retrieving all file names from hierarchical directories using Apache Commons IO
Reading contents from text files all at once using Java 8
Reading contents from text files all at once using Apache Commons IO
Extracting PDF text using Apache Tika
Cleaning ASCII text files using Regular Expressions
Parsing Comma-Separated Value (CSV) files using Univocity
Parsing Tab-Separated Value (TSV) files using Univocity
Parsing XML files using JDOM
Writing JSON files using JSON.simple
Reading JSON files using JSON.simple
Extracting web data from a URL using JSoup
Extracting web data from a website using Selenium WebDriver
Reading table data from a MySQL database