So the next lesson that we're going to discuss is an exemplar for user-space packet processing. I mentioned there are several different alternatives, and all of them have one thing in common, which is bypassing the kernel. One exemplar of that is DPDK, or the Data Plane Development Kit, which was originally proposed by Intel and has since been taken over by the Linux Foundation. We will look at this particular exemplar in detail. So DPDK was developed by Intel in 2010, and, like I said, it's an open-source project at the Linux Foundation; you can find details of that at this link. And by the way, one thing I forgot to mention even in the previous lecture is that I have lots of resources at the end of every one of these slide decks. It's important for you all to visit some of those resources and get more out of this lecture than what I can deliver in this short form. You'll also be tested, when we have these assessments, on what you learned in the lectures, and therefore it's important that you visit the resources that I have at the end of this lecture. So, coming back to DPDK. It's a set of libraries to accelerate packet processing, and it targets a wide variety of CPU architectures, so that portability is not an issue. Basically, what it provides is user-level packet processing that avoids or mitigates the Linux kernel overheads. The first feature is that the buffers for storing incoming and outgoing packets are in user-space memory. In other words, here is your operating system kernel, and this is user space; the transmit and receive buffers are allocated in user space, and they are directly accessible to the NIC. Okay, that's number one. The second thing is NIC configuration registers. There are registers in the NIC that you have to manipulate; normally that is something you do as part of the device driver, but those configuration registers are also mapped into user space.
And the simple technique for this is memory mapping. That's the way the user-space transmit buffers and user-space registers get mapped to the physical registers that the NIC has. This is accomplished through enhancements to the NIC hardware in the PCIe spec, and that way the user-space application can directly access these registers and modify them. Okay, so basically what this allows us to do is effectively bypass the kernel for interacting with the NIC. The application is running here, and it can directly interact with the NIC by virtue of the fact that the transmit buffers and the receive buffers are in the address space of the application. Similarly, the configuration registers are also mapped into the address space of the application, and therefore normal reads and writes to these memory locations actually affect the configuration registers. And the NIC can DMA directly in and out of these receive buffers, since they are mapped into the user space of the application. That's the key feature of DPDK. Like I said, it's a user-space library, but of course there is a small component of the DPDK library that has to sit in the kernel. That part is essentially for initialization of the user-space packet processing: it is needed to initialize the NIC to DMA to the appropriate memory locations in user space. So that setup has to be done in kernel space, but once that setup is done, it is transparent to user space, and everything flows directly from the NIC to user space and back. So the setup being done by this quote-unquote device driver, a small-footprint device driver that sits in kernel space, is essentially to set up the PCIe configuration space so that the registers can then be updated from user space.
So that's the only thing being done by the device driver sitting in kernel space; everything else happens in the application space. Now, with the kernel bypassed, what we do in the network function is use polling as the way to access the transmit and receive buffers. As I said earlier, interrupts are expensive, and interrupts also have to flow through the kernel, and we want to get the kernel out of the way. So instead, from the network function running in the application space, we're going to poll the transmit and receive buffers, and this allows accessing the receive and transmit queues exposed by the NIC without having to field an interrupt. Essentially, packet-arrival interrupts are disabled, and the NIC DMAs directly into the receive queues. And the registers, which as I mentioned are mapped into the address space of the application, can be sampled to see whether any packets have been received, and if so, you take the appropriate action. So the very rudimentary code snippet that I'm showing you illustrates what would happen in a network function. What you do is a bulk receive on a particular port for packets, and then look up the packet header. Remember, in the context of a load balancer, you have to look at the header to know the five-tuple. We will deal with failures somehow; I'm not going to show you that here. Then you place the packets that you want to send out into the output buffers, and for each output port you do a bulk transmit of all its packets.
So the key thing is doing bulk receive and bulk transmit. That's the way we can avoid the per-packet overheads, both in terms of receiving and sending packets and in terms of fielding interrupts, because we're using this poll-mode device driver. What is happening is that we're now using the CPU in a busy mode, always polling for packets even if there are no packets to receive. This might seem like an expensive thing to do, but the point is that the network function is a specific thing that you're doing, and you can dedicate a core of the processor to just deal with the network function, so that it is continuously executing the packet-processing loop. And what we accomplish by that is that the receives and transmits happen in batches, and that increases efficiency.