One thing that should be apparent from looking at the map-reduce framework is that it aims for simplicity, and that simplicity works for a large class of applications. But it comes at the expense of generality and performance. In particular, you cannot have arbitrary computations, right? You have to somehow fit every computation that you want into the map-reduce framework, and that can be pretty artificial depending on the kinds of applications that you want to write. Also, you're forced to use files for communication among the application components, and it is a strict two-level graph: there's a mapper and a reducer, and there's a single input, single output channel from a mapper to the reducer. So these are the restrictions in the map-reduce framework. Having said that, I should mention that the map-reduce framework is extremely popular because a large class of applications fit into that mold. But it is the case that generality may allow you to use the data center resources for more diverse kinds of applications.
Dryad was proposed as another programming framework, one in which a general acyclic graph represents the application, as opposed to a two-level graph. The vertices of this graph are the application components, and each vertex may consume and generate an arbitrary set of inputs and outputs. The edges are the application-specified communication channels. And now, as opposed to the map-reduce framework, where you're forced to use files as the communication channel, your communication channel can be shared memory, a TCP socket, or files. That generality is there so that you can specialize how each communication channel is instantiated and get a performance advantage out of the system. So that's the general design principle. And in terms of primitives, what the application developer writes are subroutines.
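To make the contrast concrete, here is a minimal Python sketch of an application graph whose edges carry a channel type. This is not Dryad's actual C++ API; the names `Channel`, `Edge`, and `is_two_level_file_graph` are hypothetical and chosen only for illustration. It shows that a map-reduce job is the special case where every edge is a file from a mapper to a reducer, while a Dryad-style graph can have more than two levels and a different channel kind on each edge.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    """Transport kinds named in the lecture; the enum itself is illustrative."""
    FILE = "file"
    TCP = "tcp"
    SHARED_MEMORY = "shared_memory"


@dataclass(frozen=True)
class Edge:
    src: str          # producing vertex (an application subroutine)
    dst: str          # consuming vertex
    channel: Channel  # how the data travels along this edge


def is_two_level_file_graph(edges):
    """True if the graph has the map-reduce shape: every vertex is either a
    pure producer or a pure consumer, and every edge is a file."""
    sources = {e.src for e in edges}
    sinks = {e.dst for e in edges}
    two_level = sources.isdisjoint(sinks)
    all_files = all(e.channel is Channel.FILE for e in edges)
    return two_level and all_files


# Map-reduce shape: mappers feed reducers through files.
map_reduce = [Edge(f"map{i}", f"reduce{j}", Channel.FILE)
              for i in range(2) for j in range(2)]

# Dryad-style shape: more than two levels, with a channel kind chosen per edge.
dryad_style = [
    Edge("read", "parse", Channel.SHARED_MEMORY),
    Edge("parse", "aggregate", Channel.TCP),
    Edge("aggregate", "write", Channel.FILE),
]

print(is_two_level_file_graph(map_reduce))
print(is_two_level_file_graph(dryad_style))
```

The per-edge `channel` field is the point of the sketch: the transport is a property of each edge that the application picks, rather than something the framework fixes for the whole job.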
Okay, this is similar to the map-reduce framework, where they're writing just a map and a reduce function. But here it is an arbitrary data flow graph that you want. So you write the subroutines for that, and then you put together the data flow graph that you want. In order to put together the data flow graph, Dryad provides composition primitives. By the way, Dryad is a system that was developed by Microsoft; map-reduce is a system that was developed originally at Google. The composition primitives that Dryad provides are available as a C++ library that you use to actually build the application.
So let's look at the primitives. Here is a vertex that is taking in inputs and generating output. As I said, you can have any number of inputs and any number of outputs. And you can clone a particular vertex, so here is the cloning primitive. All of these are expressed textually because that is easier to do programmatically, right? That's why they are provided in the C++ library, so that you can programmatically say, I want to clone this A n times. There's a cloning operator, and it creates n instances of this particular subroutine.
And then you have a composition primitive. The composition primitive is essentially saying, well, I've created n instances of A, and I've created n instances of B, and now I want to compose these two into a graph. If you use this composition primitive, what it gives you is a one-to-one mapping from the n instances of A to the n instances of B. You also have another composition primitive, which generates a complete bipartite graph. That is, all the A outputs are connected to all the B inputs in this form of composition. And you can also have merging of graphs. So for instance, say I create one subgraph, A connected to B.
And say I create another subgraph, A connected to C. Then I can merge the two into a composite like this, which says that the output of A goes to both B and C. So that is the merge operator that's available. You can also build a fork-join structure. You use the composition primitives to create A connected to B connected to D, and A connected to C connected to D, and then you use the merge operator to get this fork-join structure for the graph. So essentially, you write the subroutines, and then you put together the graph using the composition primitives that are available in Dryad. You can also encapsulate, meaning you can create a new vertex out of a subgraph. Let's say I've created a subgraph and I want to use it as a vertex: I can use the encapsulation primitive to aggregate those vertices into one new mega vertex, if you will, in the graph. And then there is a set of primitives providing the transport for the edges, whether it is shared memory, files, a TCP/IP connection, and so on. So those are the primitives that are available in Dryad.
Once you develop this application, the way you execute it, once again, is that the scheduling, the distribution, all of that is handled by the runtime system. You as the application developer wrote the subroutines and constructed the graph, which semantically says how the subroutines talk to one another and what the data flow is. That is your input, and of course the dataset is also your input. The job manager of the Dryad system takes your graph specification and consults a name server to find out which nodes are available at any point in time. Then it launches a portion of the application graph on the available compute nodes.
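The job manager's role just described can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `lookup_available_nodes` is a hypothetical stand-in for the name server query, the round-robin placement is not Dryad's actual policy, and the sketch plans the whole graph at once rather than launching it a portion at a time. What it does show is the control-plane idea: take the graph specification, find out which nodes exist, and decide where each vertex is launched, with producers ordered before their consumers.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+
from itertools import cycle


def lookup_available_nodes():
    """Hypothetical stand-in for asking the name server which nodes are up."""
    return ["node-1", "node-2"]


def plan_launches(edges, nodes):
    """Return (vertex, node) pairs in an order where producers come first.

    Round-robin placement is an assumption made for this sketch only.
    """
    preds = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    order = TopologicalSorter(preds).static_order()
    return list(zip(order, cycle(nodes)))


# Fork-join graph from the lecture: A feeds B and C, which both feed D.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
plan = plan_launches(edges, lookup_available_nodes())
for vertex, node in plan:
    print(f"launch {vertex} on {node}")
```

Note that the plan only says where each subroutine runs. None of the data flowing along the edges passes through this code.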
And so the only interaction the job manager has with the computation is in the control plane, launching the application components of the subgraph onto the compute nodes. All of the actual communication between subroutines, the data flow communication, happens entirely in the data plane, over files, TCP/IP connections, shared memory, and so on. So that's the Dryad system. It gives you flexibility over the map-reduce framework: you have the ability to construct arbitrary acyclic graphs, and you also have the ability to specify what kind of transport you want to use for communication among the members of the data flow graph. The complexity of the data center management is entirely in the job manager. It is not something that the application developer has to worry about.
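As a recap of the graph-building side, here is a minimal Python sketch of the composition primitives walked through above: cloning, one-to-one composition, complete bipartite composition, and merge. Dryad itself exposes these through a C++ library; the Python names `Graph`, `clone`, `pointwise`, `bipartite`, and `merge` are hypothetical, and the treatment of inputs and outputs in `merge` is a simplification assumed for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    vertices: frozenset  # vertex names
    edges: frozenset     # (src, dst) pairs
    inputs: tuple        # vertices that accept data from outside the graph
    outputs: tuple       # vertices that emit data out of the graph


def clone(name, n):
    """n independent instances of one subroutine, with no edges between them."""
    names = tuple(f"{name}{i}" for i in range(n))
    return Graph(frozenset(names), frozenset(), names, names)


def _compose(a, b, new_edges):
    # Shared helper: keep both graphs' edges and add the connecting ones.
    return Graph(a.vertices | b.vertices,
                 a.edges | b.edges | frozenset(new_edges),
                 a.inputs, b.outputs)


def pointwise(a, b):
    """One-to-one: the i-th output of a feeds the i-th input of b."""
    if len(a.outputs) != len(b.inputs):
        raise ValueError("pointwise composition needs equal counts")
    return _compose(a, b, zip(a.outputs, b.inputs))


def bipartite(a, b):
    """Complete bipartite: every output of a feeds every input of b."""
    return _compose(a, b, ((s, d) for s in a.outputs for d in b.inputs))


def merge(g, h):
    """Union of two graphs; vertices with the same name are identified."""
    def ends(xs, ys):
        return tuple(dict.fromkeys(xs + ys))  # de-duplicate, keep order
    return Graph(g.vertices | h.vertices, g.edges | h.edges,
                 ends(g.inputs, h.inputs), ends(g.outputs, h.outputs))


# Fork-join from the lecture: A -> B -> D merged with A -> C -> D.
a, b, c, d = (clone(x, 1) for x in "ABCD")
fork_join = merge(pointwise(pointwise(a, b), d),
                  pointwise(pointwise(a, c), d))
print(sorted(fork_join.edges))
```

With three clones on each side, `pointwise(clone("A", 3), clone("B", 3))` yields three edges, while `bipartite(clone("A", 2), clone("B", 3))` yields six, which is the difference between the two composition forms.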