The next one is Yarn, because after you see one resource manager, all of them start looking similar in terms of their capabilities. So we'll look at Hadoop Yarn. As I just said, Yarn is an acronym; it stands for Yet Another Resource Negotiator. So all of these start looking fairly similar to one another. The goals of Yarn are very similar to Mesos: it wants to do resource sharing of the cluster for multiple frameworks, recognizing that different types of applications are running on the cluster. But there are a couple of major differences.

One major difference is that Mesos is offer based, whereas Yarn is request based. In other words, the applications say what they need, and then the resource manager makes the decision. The reason they went to this approach is that the application may be in a better position to know what kind of resources it needs than to receive an offer that may not match its needs. So they took the approach of a request-based system as opposed to an offer-based system.

The other important distinction is that in Mesos, the framework is in control. If you think about a framework, maybe it's a Hadoop framework, and there are a whole bunch of Hadoop applications running; the framework is making the decision, and the decision of that framework may be the same for all of the different applications that belong to that category. In Yarn, they wanted to recognize that even though applications may all be of one class, they may be based on different iterations or versions of Hadoop, and therefore the policies may be slightly different. Those variations can be easily accommodated if the system is request based rather than offer based, because in an offer-based system you are leaving it up to a particular framework to make the decision for a whole collection of applications. Here, individual applications can make their own decisions about what requests to make to Yarn. So let's look at the details of this.

Before I talk about Yarn itself, I should give you a little bit of background on the traditional Hadoop open-source implementation. Many of these resource managers started out assuming, not incorrectly, that most applications would be Map-reduce applications. So the idea is that clients submit their jobs, and there is a job tracker that keeps track of how many clients have submitted jobs and all the machines on which it can run the tasks that belong to these different jobs. What comes back from these machines: there is an entity called the task tracker that runs on each one of these machines, and this task tracker is the one reporting what's happening to the tasks running on that machine. Have they completed? Have they died? That sort of information is funneled back to the job tracker so that it can keep a scoreboard of what's happening to a particular application. So the Map-reduce status is being sent through the solid arrows that you see, and the job submission is the dotted arrows that you see here. This is the traditional Hadoop implementation of the resource manager for Map-reduce applications. So there is this job tracker/task tracker organization, and what that results in is poor cluster utilization, and they noticed this at Yahoo in some bizarre situations.
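To make that reporting loop concrete, here is a toy sketch of task trackers funneling task status back to the job tracker's scoreboard. The class and method names are invented for illustration; this is not Hadoop's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Invented names for illustration; not Hadoop's actual classes.
enum TaskState { RUNNING, COMPLETED, DIED }

class JobTracker {
    // Scoreboard: task id -> last reported state, across all machines.
    private final Map<String, TaskState> scoreboard = new ConcurrentHashMap<>();

    // Task trackers funnel status back through this call (the solid arrows).
    void reportStatus(String taskId, TaskState state) {
        scoreboard.put(taskId, state);
    }

    // A job is done when every one of its tasks has reported COMPLETED.
    boolean jobDone(List<String> taskIds) {
        return taskIds.stream().allMatch(t -> scoreboard.get(t) == TaskState.COMPLETED);
    }
}

class TaskTracker {
    private final JobTracker jobTracker;
    TaskTracker(JobTracker jt) { this.jobTracker = jt; }

    // Runs on each machine: observe the local tasks and report upward.
    void heartbeat(String taskId, TaskState observed) {
        jobTracker.reportStatus(taskId, observed);
    }
}
```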
First of all, as I said, most of the applications tend to be Map-reduce applications, and typically the task tracker framework has distinct map and reduce slots created. We mentioned when we talked about different programming frameworks that Map-reduce is powerful, but it is not good for all kinds of applications. Before other frameworks came into being, though, Map-reduce was the only game in town. So if I have an application and I have to launch it on the Cloud, I have to use the Map-reduce framework. What application developers were doing was abusing the Map-reduce paradigm. For instance, if I just wanted to create a pool of worker threads for a particular service, I would pretend as though it's a map-only application. There is no reduce phase; it's just a bunch of worker threads being spawned. If you do that, and the framework thinks these are all Map-reduce applications, then the slots that were created for reduce may all be wasted. So these are the problems that they encountered, and that is the reason they decided to build a new resource manager, which is Yarn.

So in Yarn, if you go from the previous picture to this picture, the structure doesn't look very different. There's this box here and this box here, but what is inside each of these boxes is what is different. There is a resource manager, and jobs are submitted by the clients to the resource manager. Anytime a client submits an application to the resource manager, the resource manager will launch an application master. So this is the application master that you see here, started up on one of the nodes, and this application master will register with the RM, the Resource Manager, on boot up, and it will request what are called ACs. An AC is nothing but an application container, meaning: how many containers do I need to run my application? So that is the resource request that comes back from the Application Master, saying how many application containers it needs.

When that happens, the Resource Manager is going to allocate these application containers in concert with the Node Managers, and each Node Manager gives status information saying how many resources are available. So the resource manager can load balance across these different Node Managers: when a particular instance of an application container has to be launched, it can make a decision, "Should I launch it here? Should I launch it here, or should I launch it here?" Once it launches that, you can see that here is an Application Master, and its application container is running on a different node; and here is another application master, with one container running here and one container running here. So it doesn't really matter where resources are allocated, and that is something the resource manager does in terms of load balancing across all of these Node Managers to launch the application containers. Once the application completes, the Application Master shuts down and indicates this to the Resource Manager, so that the resource manager can free up the resources.

There was an earlier question as to what scheduling policy is used. Yarn can implement different scheduling policies. It can use first-come-first-served if it wants to, or it can use a variant of fair-share scheduling or capacity scheduling, meaning: what are the capacities available in each one of the nodes? What are the requirements? Based on that, it can decide.
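To make that lifecycle concrete, here is a minimal sketch of the Application Master's side of the protocol using Hadoop Yarn's Java client API (AMRMClient). It is a sketch under simplifying assumptions: the container count and sizes are made up, actually launching work in the granted containers (normally done through NMClient) is elided, and error handling and security setup are omitted.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSketch {
    public static void main(String[] args) throws Exception {
        // The Application Master registers with the Resource Manager on boot up.
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();
        rm.registerApplicationMaster("", 0, "");

        // Request the application containers (ACs) this application needs:
        // here, four containers of 1 GB / 1 vcore each (made-up numbers).
        // Nodes and racks are left null, so placement is up to the RM.
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 4; i++) {
            rm.addContainerRequest(new ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat until the RM has load-balanced the containers onto
        // Node Managers; real code would launch work in each one via NMClient.
        int granted = 0;
        while (granted < 4) {
            List<Container> allocated = rm.allocate(0.0f).getAllocatedContainers();
            granted += allocated.size();
            Thread.sleep(1000);
        }

        // On completion, unregister so the RM can free up the resources.
        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
    }
}
```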
So all of these are policies that can be ingrained in the Resource Manager, which gives you flexibility in this setup. If you look at it from a 10,000-foot distance, Mesos and Yarn are very similar, in that they both want to support multiple frameworks, because every framework has different needs, and they want to provide resources at as fine a grain of allocation as possible. The only difference being that in Yarn it is done request based, while in Mesos it is done based on availability, with the resource manager making offers to the frameworks.
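To summarize that contrast in code, here is a hypothetical sketch of the two interaction styles. These interfaces and record types are invented for illustration; they are not the actual Mesos or Yarn APIs.

```java
import java.util.List;

// Invented stub types for illustration only.
record Offer(String nodeId, int memMB, int vcores) {}
record Allocation(String appId, String nodeId, int containers) {}

// Offer based (Mesos style): the resource manager pushes available
// resources, and the framework decides, on behalf of all of its
// applications, which offers to accept or decline.
interface OfferBasedFramework {
    void resourceOffers(List<Offer> offers);
}

// Request based (Yarn style): each individual application states exactly
// what it needs, and the resource manager decides where to place it.
interface RequestBasedResourceManager {
    Allocation requestContainers(String appId, int numContainers, int memMB, int vcores);
}
```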