Chapter 8. Working with Spark SQL
This chapter introduces Spark SQL and related concepts such as DataFrames and Datasets. It discusses schemas and advanced SQL functions from the Apache Spark perspective, and it also touches on writing custom user-defined functions (UDFs) and working with various data sources.
This chapter uses the Java APIs to create a SQLContext/SparkSession and to build DataFrames/Datasets from a Java RDD, both for raw data such as CSV and for structured data such as JSON.