Introduction
In the previous chapter, we talked about how to use Mahout and R to solve machine learning problems. In this chapter, we are going to talk about Apache Spark, the latest sensation in the Big Data industry. By now, nearly everyone is aware of, and has acknowledged, the power of Apache Spark. It is a fast, general-purpose engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R. Spark can perform batch processing as well as stream processing. In this chapter, we are going to explore certain important topics related to Apache Spark, such as batch processing, Spark SQL, stream processing, machine learning with MLlib, and graph processing using Spark's GraphX library. So, let's get started.