Parallel computation is at the heart of big data computing. However, to specify what becomes parallel, we usually think of a conceptual model of parallel computation, often called a programming model. A parallel programming model is a way to specify, abstractly, how a parallel program will run. Naturally, for a program to be parallel, there must be a number of concurrently operating processes. But how do these processes communicate and exchange data? How do they decide when to communicate with each other? Further, what exactly is done in parallel? Consider the first question. Two processes can communicate data by sharing memory. Indeed, there are architectures in which all the memory in multiple machines can be made to look, virtually, like one large addressable memory space. However, two processes may also communicate by passing messages to one another, either directly from one process to another, or through a common message-carrying pipe, often called a message bus. The second question can also have multiple answers. Two of the most common ways of achieving parallelism are task parallelism and data parallelism. In task parallelism, a large task is decomposed into multiple subtasks, each of which can run concurrently. In data parallelism, the data is partitioned into many smaller fragments, and an operation runs on each partition independently of the others. Typically, these partial operations are then synchronized, and the partially processed data is combined to produce a full answer. Many parallel data management systems operate in this partitioned, data-parallel manner. We need to remember that task parallelism is somewhat independent of data parallelism, and it is possible to have both forms of parallelism in a programming model. It is important to emphasize that the issue of a programming model should not be confused with the issue of a programming language. A programming language is independent of the programming model.
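To make the data-parallel pattern concrete, here is a minimal Python sketch. The function names are ours, and we use threads within one process purely for illustration; a real system would spread the partitions across worker processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    # Each worker runs the same operation on its own fragment,
    # independently of the others.
    return sum(partition)

def data_parallel_sum(data, n_partitions=4):
    # Partition the data into smaller fragments.
    size = (len(data) + n_partitions - 1) // n_partitions
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # Run the partial operations concurrently (threads stand in for
    # the separate processes or machines a real system would use)...
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # ...then combine the partially processed data into a full answer.
    return sum(partials)

print(data_parallel_sum(list(range(100))))  # 4950
```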
And therefore, a programming model can be implemented in several different languages. As I mentioned, the programming model we are going to consider is BSP. BSP wasn't initially created for graph computation. It was conceived as a parallel computing model that would bridge the gap between software models of parallel computation and hardware capabilities for supporting parallelism. The basic idea of BSP is as follows. There are a number of processors. Each processor can perform local computation using its own local memory. There is a router, which serves to pass a message from any processor to any other. While one pair of nodes is exchanging messages, a third node can still perform computation. There is a facility that can synchronize the state of all, or a subset of, the processes. This synchronization may happen periodically, at intervals of L time units, or there may be another way of specifying when it is going to happen. But when it does, all processors affected by it come to a consistent state. Once synchronization is performed, a fresh round of computation can start. We call this synchronization point a barrier, and the mechanism barrier synchronization, because all executing processes must reach this barrier point before the next step of processing can continue. A BSP program is broken up into a sequence of supersteps. In each superstep, each processor gets data if needed, performs computation if needed, and then exchanges data with the right partner. Once all the nodes are done, the system synchronizes. Then the next round starts. Each processor can determine whether it needs to compute or exchange data; if not, it makes itself inactive. If required later, a processor can be woken up to become active again. When all processors are inactive, the computation stops. In applying the BSP model to graphs, we make a few assumptions. We assume that a processor is synonymous with a vertex.
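The superstep structure just described can be sketched in a few lines of Python. All names here (`bsp_run`, `compute`, `MaxNode`) are illustrative, not part of any real BSP system; `MaxNode` is a toy processor that spreads the global maximum value, included only to exercise the driver.

```python
def bsp_run(processors, max_supersteps=50):
    """Minimal BSP driver sketch: local computation, message routing,
    and a barrier between supersteps."""
    inbox = {p.pid: [] for p in processors}
    for superstep in range(max_supersteps):
        outbox = {p.pid: [] for p in processors}
        for p in processors:
            # An inactive processor is woken up only if it received messages.
            if p.active or inbox[p.pid]:
                p.active = True
                for dest, msg in p.compute(superstep, inbox[p.pid]):
                    outbox[dest].append(msg)  # routed, not yet delivered
        # Barrier synchronization: all processors reach this point before
        # the next superstep; messages sent now become visible only then.
        inbox = outbox
        # Stop when every processor is inactive and no messages are pending.
        if not any(p.active for p in processors) and not any(inbox.values()):
            break

class MaxNode:
    """Toy processor: converges every node's value to the global maximum."""
    def __init__(self, pid, value, peers):
        self.pid, self.value, self.peers = pid, value, peers
        self.active = True
    def compute(self, superstep, messages):
        new_max = max([self.value] + messages)
        if superstep == 0 or new_max > self.value:
            self.value = new_max
            return [(q, self.value) for q in self.peers]
        self.active = False  # vote to halt
        return []

nodes = [MaxNode(0, 3, [1, 2]), MaxNode(1, 7, [0, 2]), MaxNode(2, 5, [0, 1])]
bsp_run(nodes)
print([n.value for n in nodes])  # [7, 7, 7]
```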
So a processor can only send messages to, or receive messages from, its neighboring processors. We also assume that a vertex has an ID and possibly a complex value, and that an edge may also have an ID and a value. Each vertex knows which edges it is connected to. We can now think of a computation as a vertex-centered task. We shall see what this means. In a now-famous paper from Google, this programming model was called "think like a vertex." Well, to think like a vertex, we need to know what a vertex can do. Here is a list of actions a vertex can take. The first one is easy: a vertex can find its own identifier. The second operation is to get or set the value of the node. This operation may be a little involved if the value is a complex data object; for our purposes, we'll assume the value is a scalar. Next, a node can ask for its own edges; the result of this operation is a set of edge objects. A node may also count its edges. Since we are referring to outgoing edges throughout, this is the out-degree of the vertex. Recognize that this means a vertex does not natively have access to its incident nodes. However, it does have control of its outgoing edges, so it can get and set the edge values. There may be two different ways of identifying an edge. The edge may have an ID that the node can get. Alternatively, and more commonly, an edge is identified by the vertex it targets. So in our diagram, V1 asks for the edge targeting V4. In a situation like that of v3 and v5, where there are multiple edges between v3 and v5, the source v3 can ask for the values of all edges going to v5. The next operation can add or remove an edge of a vertex. Finally, since the vertices are processes, they can start or stop computing. Typically, a node wakes up if it receives a message from another node. Now, in comparison to a vertex, an edge can do far less. It can get its own ID if the system allows edge IDs, it can set and retrieve its own value, and it can get the ID of the node it is pointing to. That's it.
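Here is a minimal sketch of this vertex and edge interface, with illustrative Python method names modeled on the operations just listed; it is not the actual API of any particular system.

```python
class Edge:
    """An edge can do far less than a vertex: ID, value, and target only."""
    def __init__(self, edge_id, target_id, value):
        self._id, self._target, self._value = edge_id, target_id, value
    def get_id(self): return self._id                 # if the system allows edge IDs
    def get_value(self): return self._value
    def set_value(self, v): self._value = v
    def get_target_vertex_id(self): return self._target

class Vertex:
    """Sketch of the vertex operations listed above."""
    def __init__(self, vertex_id, value, edges):
        self._id, self._value, self._edges = vertex_id, value, list(edges)
        self._halted = False
    def get_id(self): return self._id                 # find its own identifier
    def get_value(self): return self._value           # get or set the node value
    def set_value(self, v): self._value = v
    def get_edges(self): return list(self._edges)     # outgoing edge objects
    def get_num_edges(self): return len(self._edges)  # the out-degree
    def add_edge(self, e): self._edges.append(e)
    def remove_edges(self, target_id):
        self._edges = [e for e in self._edges
                       if e.get_target_vertex_id() != target_id]
    def vote_to_halt(self): self._halted = True       # stop computing until a message arrives
```

For example, a vertex V1 with an edge to V4 can report an out-degree of 1 and read that edge's target ID, but it has no native handle on the edges pointing into it.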
Now, we still have not defined how to think like a vertex. That's what we'll do next, using an example that we have seen several times before: Dijkstra's single-source shortest path, SSSP, algorithm. We have seen this algorithm before, but now we'll show how to compute it in a parallel setting using BSP. Here, the edge value is the weight of the edge; this is known before the algorithm starts. Each vertex runs the exact same routine concurrently. Each vertex asks: one, is it superstep zero? Two, if yes, then if this is the source vertex, it sets its value to zero; else, it sets its value to infinity, which in practice is a very large number. The source vertex propagates along its edges to the nodes at the other end. Just the source vertex; all other vertices are quiet. The propagation process works like this. A vertex gets its own value, which for the source vertex is zero. It gets its edges, and for each edge, it gets the value of the edge, adds it to its own value, and sends the result to the endpoint of the edge. The blue numbers indicate the messages that are sent. All vertices vote to halt; that is, they have now hit a synchronization barrier. Notice that the receiving nodes do not look at the messages yet. It is the system's job to ensure that the messages are available to these nodes at the next superstep. All nodes that have received messages wake up and read them. If a node receives multiple messages, it picks the minimum. In our case, the two active nodes have received only one value each. For the sake of convenience, we have colored the processed edges in yellow; this is just for visualization purposes in this example. Now, each node compares this minimum value to its own value, and if its own value is greater, it replaces its own value with the minimum. In our case, both nodes set their value to that of the incoming message. The same propagation routine runs again. So, each node computes the new distance.
And sends a message along each edge to the other endpoint, then halts. The same step is repeated in the next superstep. At this point, the nodes have updated their values. The node with the value 6 has received a message along just one of the three edges incident on it. Continuing, at the end of superstep 2, all nodes are ready to receive messages from all their incident edges. The node with the value 6 has received a value which is lower than its current value. Eventually, the active nodes have no more messages to send, so each vertex votes to halt. The vertex ID and the value are read out from each vertex, and then the process stops. If these nodes are on different machines of a cluster, the system relies on the underlying platform, like YARN, or Spark's underlying infrastructure, to ensure that the edges going across machines can send and receive messages effectively. This should give you a sense of how this process would play out for a large-scale network.
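The whole SSSP walkthrough can be condensed into a vertex-centric sketch. The class and function names below are illustrative, the tiny three-vertex graph is our own (not the one in the lecture's diagram), and the small driver stands in for the system's roles of message delivery, barriers, and waking vertices.

```python
import math

class Edge:
    def __init__(self, target, weight):
        self.target, self.weight = target, weight

class Vertex:
    def __init__(self, vid, edges, is_source=False):
        self.vid, self.edges, self.is_source = vid, edges, is_source
        self.value = math.inf  # distance estimate: infinity until improved

def sssp_compute(vertex, superstep, messages, send):
    """What one vertex does in one superstep."""
    # Minimum over the incoming candidate distances (zero for the source).
    min_dist = 0 if vertex.is_source else math.inf
    min_dist = min([min_dist] + messages)
    if min_dist < vertex.value:
        vertex.value = min_dist
        for edge in vertex.edges:
            # Candidate distance for the neighbor: my distance + edge weight.
            send(edge.target, min_dist + edge.weight)
    # Implicitly vote to halt; the system wakes the vertex on a new message.

def run_sssp(vertices):
    inbox = {v.vid: [] for v in vertices}
    superstep = 0
    # Superstep 0 runs everywhere; afterwards only message recipients wake up.
    while superstep == 0 or any(inbox.values()):
        outbox = {v.vid: [] for v in vertices}
        for v in vertices:
            if superstep == 0 or inbox[v.vid]:
                sssp_compute(v, superstep, inbox[v.vid],
                             lambda t, d: outbox[t].append(d))
        inbox = outbox  # barrier: messages are delivered in the next superstep
        superstep += 1
    return {v.vid: v.value for v in vertices}

graph = [
    Vertex("s", [Edge("a", 1), Edge("b", 4)], is_source=True),
    Vertex("a", [Edge("b", 2)]),
    Vertex("b", []),
]
print(run_sssp(graph))  # {'s': 0, 'a': 1, 'b': 3}
```

Note how "b" first accepts the direct distance 4, then improves it to 3 in a later superstep when the message via "a" arrives, exactly mirroring the node-with-value-6 step above.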