Chapter 9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists
From a data wrangling perspective, Datasets are the most important feature of Spark 2.0.0. In this chapter, we will first look at Datasets from a stack perspective, including layering, optimizations, and so forth. Then we will delve more deeply into the actual Dataset APIs and cover the various operations: reading data in various formats, creating Datasets, and finally the rich capabilities for queries, aggregations, and scientific operations. We will use the car and orders Datasets for our examples.