Apache Spark installation
Spark is a cross-platform framework that can be deployed on Linux, Windows, and macOS, as long as Java is installed on the machine. In this section, we will look at how to install Apache Spark.
Note
Apache Spark can be downloaded from http://spark.apache.org/downloads.html
First, let's look at the prerequisites that must be available on the machine (a quick way to verify them is shown after the list):
- Java 8+ (mandatory as all Spark software runs as JVM processes)
- Python 3.4+ (optional and used only when you want to use PySpark)
- R 3.1+ (optional and used only when you want to use SparkR)
- Scala 2.11+ (optional and used only when you want to write Spark applications in Scala)
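If you are unsure which of these are already installed, a minimal sketch of the version checks from a terminal might look like the following (it assumes the tools are on your PATH; each command prints the installed version or fails if the tool is missing):

```
# Verify the prerequisites from a shell
java -version        # mandatory: should report 1.8 or later
python3 --version    # optional: only needed for PySpark
R --version          # optional: only needed for SparkR
scala -version       # optional: only needed to write Spark applications in Scala
```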
Spark supports three primary deployment modes, which we will look at:
- Spark standalone
- Spark on YARN
- Spark on Mesos
Spark standalone
Spark standalone uses a built-in scheduler, without relying on any external scheduler such as YARN or Mesos. To install Spark in standalone mode, you have to copy the Spark binary installation package onto all the machines in the cluster.
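As a rough sketch, getting the binary package onto the machines might look like the following; the Spark version, download mirror, install path, and host names here are only examples, so substitute the release and machines you actually use:

```
# Download a Spark binary release (version and mirror are examples; pick the
# release you need from http://spark.apache.org/downloads.html)
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz

# Unpack it into a convenient location and create a stable symlink
tar -xzf spark-2.4.8-bin-hadoop2.7.tgz -C /opt
ln -s /opt/spark-2.4.8-bin-hadoop2.7 /opt/spark

# Copy the same package to every other machine in the cluster
# (worker1 and worker2 are placeholder host names)
for host in worker1 worker2; do
  scp spark-2.4.8-bin-hadoop2.7.tgz "$host":/tmp/
done
```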
In standalone...