When we talked about the network fabric, we said that as you go up in the hierarchy from the leaf nodes where the servers are, the bandwidth keeps shrinking. One thing that happens, then, is constriction at the root. Similarly, when you go across the bisection of the network, meaning from one part on the left side to another part on the right side, you're going through the network fabric and crossing the bisection bandwidth in order to get there. So the question that comes up is: in terms of the applications that run on these data center networks, how do they behave? What traffic do they generate? Is there constriction at the root? Is the bisection bandwidth sufficient? All of these are interesting questions that you have to worry about in designing the next generation of networks. There are lots of studies that have been done on this, and what I'm going to do is give you one particular study that is interesting. Then I'm going to tell you something about the classification of data-center traffic, looking at Yahoo as a concrete instance of a Cloud provider. We're also going to look at the structure of the Google WAN from a 10,000-foot distance; even though that's not the central topic of this course, which is more on Cloud computing, it is still interesting to look at the Google WAN because of the deployment of SDN technology in the WAN. This network traffic study was done by Benson and others, and they set out to ask these questions, given that the structure of your data center may look like this: you have a hierarchy of switches that you are going through, where at the lowest level you have the edge switches, then the aggregation switches, and then the core switches at the top that connect all of these elements together. So the first question is: are the links over-subscribed as you go up?
You've seen in the VL2 design also that there is over-subscription as you go higher in the tree. How does the link over-subscription that is there in the physical design affect application performance? That is one question they set out to answer. The second question is: is there sufficient bisection bandwidth when you have a structure like this? The third question is about centralization: if you have an SDN controller that is setting up the switches, is that a feasible way of thinking about data-center networks? So these are the three questions that they set out to answer. The setup for this particular study, as I said, is the classic model of the data center network: core switches, which operate at layer 3 of the network stack; aggregation switches, which operate at layer 2 when traffic goes across and layer 3 when it goes up to the core level; and edge switches, which are top-of-the-rack switches using layer-2 technology. They looked at three classes of data centers. One is a university data center; another is a private enterprise, like a private Cloud in which only proprietary traffic is going on; and finally, an open Cloud in which anybody can be running applications. Based on this classification of data centers, the user community could be entirely internal, which is the case for a university Cloud or a private Cloud, or it could be external, meaning it is an open Cloud and therefore the user community is not confined to any particular organization. The methodology that they used for the study was to analyze running applications using packet traces; what they did was quantify the network traffic from the applications. So that's the setup. I'm not presenting the details of how they did the study, but it's interesting to look at the significant results that they obtained. One of the things that they found is that a significant amount of small-packet traffic exists in all of these data centers.
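To make the first question concrete, here is a minimal sketch of how the over-subscription ratio of a switch is computed. The port counts and link speeds below are hypothetical, chosen for illustration; they are not numbers from the study.

```python
# Hypothetical top-of-rack switch (illustrative numbers, not from the study):
# 48 servers at 10 Gbps each facing down, 4 uplinks at 40 Gbps each facing up.
downlink_gbps = 48 * 10   # aggregate bandwidth toward the servers
uplink_gbps = 4 * 40      # aggregate bandwidth toward the aggregation layer

# Over-subscription ratio: how much the fabric constricts as you go up.
# 1.0 means full bisection bandwidth; anything above 1 means constriction.
oversub = downlink_gbps / uplink_gbps
print(f"over-subscription ratio: {oversub}:1")  # 3.0:1 for these numbers
```

A ratio of 3:1 here means that if all 48 servers transmitted at full rate toward the core at once, only a third of that traffic could get through; whether that matters depends on how much traffic actually leaves the rack, which is exactly what the study measures.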
For example, 50 percent of the traffic is in packets of less than 200 bytes. The reason is that there are a lot of control messages going back and forth, like TCP ACKs and "I am alive" messages that are exchanged between the hosts; those are all very short messages. If you take the CDF, the cumulative distribution of all of the packets exchanged in the entire network, and look at that distribution, they find that 50 percent of the traffic is actually very small messages. There are going to be big messages also, but most of them tend to be small. That's number one. Second, connection persistence is key. This is because a lot of applications use TCP/IP as the mode of communication, and we know that in TCP/IP you set up a connection with the expectation that you're going to use it for a significant amount of time, so the persistence of the connection is important to make sure that the application runs well. In terms of the distribution of traffic, they found that most of the traffic in this study, 75 percent for the general Cloud data centers, is within a rack. Remember that a data center is organized as racks of machines, and the switches we're talking about connect them. So most of the communication tends to happen within a rack, and this indicates that Cloud providers do a pretty good job of co-locating application components in order to take advantage of locality of communication for the applications that are being placed in the Cloud. On the other hand, if you look at the university or the private Cloud, there was quite a bit of inter-rack communication, around 50 percent, which indicates that there is un-optimized placement of applications in those environments. The other thing that is worth noting is the link utilization.
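The packet-size CDF analysis described above can be sketched in a few lines. The trace here is a made-up list of packet lengths for illustration, not data from the study:

```python
# Hypothetical packet trace: each entry is a packet length in bytes.
# Short control packets (ACKs, keep-alives) dominate; a few full-size
# 1500-byte data packets are mixed in.
trace = [64, 66, 80, 120, 150, 180, 1460, 1500, 90, 70, 1500, 100]

# Empirical CDF: fraction of packets at or below a given size.
def cdf_at(sizes, threshold):
    return sum(1 for s in sizes if s <= threshold) / len(sizes)

small_fraction = cdf_at(trace, 200)
print(f"{small_fraction:.0%} of packets are <= 200 bytes")  # 75% here
```

Running the same computation over every threshold value gives the full CDF curve; the study's observation is that this curve already reaches roughly 50 percent by the 200-byte mark.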
We would expect link utilization to be greatest at the edge level, then the aggregation level, and then the core level. The other question is bisection bandwidth, and it is interesting to note that only about 30 percent of the bisection bandwidth was actually used across the applications in the study. Now, this is one set of data points, so you have to take it with a pinch of salt because it's one particular study, but they did cover a number of different kinds of data centers and different kinds of applications to draw their conclusions. So with that caveat, here are the insights you can take away from this study in terms of the questions they set out to answer. Are the links over-subscribed? The answer is no, because 75 percent of the traffic is within a rack. We said that as you go up the hierarchy the fabric gets constricted towards the root, but that's not affecting the data-center applications; in particular, the core link utilization was less than 25 percent. So the fact that there's not much bandwidth available near the core doesn't affect most of these applications. But at least for the other kinds of data centers, the university ones or the private Cloud, you need better load balancing, VM placement, and VM migration. Because if most of the traffic is within a rack, it also means that if I have a bunch of racks, maybe all of the applications are running on a few of them and under-utilizing the resources that are out there. So you need better VM placement and VM migration in order to make sure that you have full utilization of not just the networking resources, but the computational resources in the data centers as well. The other question they set out to answer is: is there sufficient bisection bandwidth? The answer is yes, because the utilization of the bisection bandwidth is less than 50 percent.
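The intra-rack locality measurement behind these conclusions can be sketched as follows, assuming flow records tagged with source and destination rack IDs. The records below are invented for illustration; they are not the study's data.

```python
# Hypothetical flow records: (src_rack, dst_rack, bytes_transferred).
flows = [
    (1, 1, 400), (2, 2, 350),   # intra-rack flows (src rack == dst rack)
    (1, 2, 150), (2, 3, 100),   # inter-rack flows that cross the fabric
]

# Share of bytes that never leave the rack, i.e. never stress the
# aggregation/core links or the bisection bandwidth.
total_bytes = sum(b for _, _, b in flows)
intra_bytes = sum(b for s, d, b in flows if s == d)
print(f"intra-rack share of traffic: {intra_bytes / total_bytes:.0%}")  # 75%
```

A high intra-rack share is exactly why the over-subscribed upper levels of the fabric are not a bottleneck for these applications: only the inter-rack remainder ever competes for core bandwidth.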
A lot of that comes from the fact that we're dealing with a large number of small packets, and given that, the bisection bandwidth doesn't come back to bite you as much as you might think. Finally, centralization is perfectly feasible. We know that because most of these applications tend to use TCP/IP as the transport protocol, so the flow state that gets set up in the switches is actually used for quite a bit of time, and therefore centralized control using SDN is a good way to go in building these data center networks.