Cloud Dataproc is Google’s managed Apache Spark and Apache Hadoop service. Both Spark and Hadoop are designed for “big data” applications. Spark supports interactive analysis and machine learning, while Hadoop is well suited to batch processing of large datasets. As a Cloud Engineer, you should be familiar with creating a Dataproc cluster and submitting jobs to run in that cluster.
To create a cluster, navigate to the Dataproc part of Cloud Console (see Figure 12.31).
FIGURE 12.31 Dataproc console page
Create a Dataproc cluster by filling in the Create Cluster form. You will need to specify the name of the cluster as well as a region and zone. You’ll also need to specify the cluster mode, which can be single node, standard, or high availability. Single node is useful for development. Standard has only one master node, so if that node fails, the cluster becomes inaccessible. The high availability mode uses three master nodes.
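The same cluster can be created from the command line with the `gcloud dataproc` command group. The sketch below shows one possible invocation for each cluster mode and a sample Spark job submission; the cluster name, region, and zone (`example-cluster`, `us-central1`, `us-central1-a`) are placeholder values, not values from the text.

```shell
# Standard mode (one master node) -- the default when no mode flag is given.
# Cluster name, region, and zone here are illustrative placeholders.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --zone=us-central1-a

# Single node mode, useful for development.
gcloud dataproc clusters create example-dev-cluster \
    --region=us-central1 \
    --zone=us-central1-a \
    --single-node

# High availability mode uses three master nodes.
gcloud dataproc clusters create example-ha-cluster \
    --region=us-central1 \
    --zone=us-central1-a \
    --num-masters=3

# Submit a Spark job to the cluster, e.g. the SparkPi example
# that ships with the Dataproc Spark installation.
gcloud dataproc jobs submit spark \
    --cluster=example-cluster \
    --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```

Everything after the bare `--` in the job submission is passed as arguments to the Spark application itself rather than to `gcloud`.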
You will also need to...