YARN label-based scheduling
In this recipe, we will configure YARN label-based scheduling. A cluster can contain a mixture of nodes with different configurations, some with more memory and CPU than others.
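The first decision in such a mixed cluster is which nodes count as "high configuration". The sketch below is one hypothetical way to make that decision explicit. The hostnames, the `highmem` label name, and the memory and vcore thresholds are all assumptions chosen for illustration, not values from this recipe. It builds a space-separated string of `host=label` pairs, which is the shape of argument that `yarn rmadmin -replaceLabelsOnNode` accepts once the label has been registered with `yarn rmadmin -addToClusterNodeLabels`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: a node is treated as "high memory" only if it meets both.
MIN_MEMORY_GB = 64
MIN_VCORES = 16


@dataclass
class Node:
    host: str        # hypothetical hostname
    memory_gb: int   # physical memory available to YARN on this node
    vcores: int      # virtual cores available to YARN on this node


def pick_label(node: Node, label: str = "highmem") -> Optional[str]:
    """Return the label for a node, or None to leave it in the default partition."""
    if node.memory_gb >= MIN_MEMORY_GB and node.vcores >= MIN_VCORES:
        return label
    return None


def label_mapping(nodes: List[Node]) -> str:
    """Build a space-separated 'host=label' string covering only the labeled nodes."""
    pairs = []
    for node in nodes:
        label = pick_label(node)
        if label is not None:
            pairs.append(f"{node.host}={label}")
    return " ".join(pairs)


if __name__ == "__main__":
    # Hypothetical inventory of a three-node cluster.
    cluster = [
        Node("dn1.cyrus.com", 128, 32),
        Node("dn2.cyrus.com", 32, 8),
        Node("dn3.cyrus.com", 96, 24),
    ]
    print(label_mapping(cluster))
```

With this inventory, dn1 and dn3 receive the `highmem` label and dn2 stays unlabeled, so it remains in the default partition.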
If we want to control which set of nodes a job executes on, we need to assign labels to those nodes. A typical case is a Spark Streaming job that should run only on high-memory nodes. In such a situation, we configure a queue and associate a set of labeled nodes with it. Any job submitted to that queue then executes on the nodes with more memory and cores.
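The placement rule described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the CapacityScheduler's actual logic. It assumes exclusive labels (the YARN default when a label is added). It also assumes that each queue has a single default label expression and that jobs without a label run only on unlabeled nodes. Capacity percentages, non-exclusive sharing, and per-application label expressions are ignored. The queue names, hostnames, and `highmem` label are hypothetical. In a real cluster, the queue-to-label binding is set in `capacity-scheduler.xml` through properties such as `yarn.scheduler.capacity.root.<queue>.accessible-node-labels` and `yarn.scheduler.capacity.root.<queue>.default-node-label-expression`.

```python
from typing import Dict, List, Optional

# Hypothetical cluster: host -> node label (None means the default, unlabeled partition).
NODE_LABELS: Dict[str, Optional[str]] = {
    "dn1.cyrus.com": "highmem",
    "dn2.cyrus.com": None,
    "dn3.cyrus.com": "highmem",
}

# Hypothetical queues: queue -> default node-label expression (None means no label).
QUEUE_DEFAULT_LABEL: Dict[str, Optional[str]] = {
    "streaming": "highmem",
    "default": None,
}


def eligible_nodes(queue: str) -> List[str]:
    """Return the hosts on which a job submitted to `queue` may run.

    Simplified exclusive-label model: a job lands only on nodes whose label
    equals its queue's default label; an unlabeled queue uses unlabeled nodes.
    """
    wanted = QUEUE_DEFAULT_LABEL[queue]
    return sorted(host for host, label in NODE_LABELS.items() if label == wanted)


if __name__ == "__main__":
    for q in QUEUE_DEFAULT_LABEL:
        print(q, eligible_nodes(q))
```

Under this model, a Spark Streaming job submitted to the hypothetical `streaming` queue can only be placed on dn1 and dn3. Jobs in `default` stay on dn2.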
Getting ready
Make sure that the user has a running cluster with at least two DataNodes and a working YARN setup. Users are expected to have basic knowledge of queues in Hadoop, which is covered in the previous few recipes in this chapter.
How to do it...
Connect to the master1.cyrus.com master node and switch to the hadoop user...