Hello, and welcome to this course in which we're talking about using Python for active defense at the network level. In this video, we're going to be talking about protocol decoding. As we've mentioned previously, malware authors commonly use encoding or encryption to help protect their C2 traffic, data exfiltration, and so on from discovery and analysis by security experts. For example, they might embed Base64-encoded traffic in another protocol, or use AES to encrypt any data traveling from the client to the server and back again. The goal of this encoding and encryption is to make it harder to read and analyze the traffic being sent over the C2 channel. If the traffic is encoded or encrypted, it's not possible to perform string matching on the plaintext without first decoding or decrypting it.

We're going to be focusing on encoded traffic in this video. The reason is that we're working at the network level and looking at what we can do pretty easily in Python. If you've got encrypted traffic, you need access to the encryption key to decrypt it. If the malware is using symmetric encryption, there's a good chance that the key is embedded in the malware somewhere, but it'll take a fair amount of analysis to find it, extract it, and make use of it. In contrast, encodings like Base64 have no secret key. They're public, with a known structure and known encoding and decoding algorithms. In fact, Python has a built-in library for performing Base64 encoding and decoding.

So what we'll be looking at in this video is Python code that's designed to identify Base64-encoded data in network traffic and then decode it. That allows the plaintext to be presented to a security analyst, or fed into tools that perform string matching or other techniques to detect malicious or suspicious content in the traffic. So let's start out at the bottom of this program here.
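As a quick illustration of the point that Base64 has no secret key, here's a minimal sketch using Python's standard `base64` module. The sample string "C2 data" is the one used later in the demonstration, and the variable names are just for illustration.

```python
import base64

# Anyone can encode and decode Base64; there is no key involved.
plaintext = b"C2 data"
encoded = base64.b64encode(plaintext)   # bytes in, bytes out
decoded = base64.b64decode(encoded)

print(encoded)   # b'QzIgZGF0YQ=='
print(decoded)   # b'C2 data'
```

Note that both functions work on bytes, which is why the later steps convert between bytes and strings.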
We're going to be performing analysis at the network level, and that means we need a way to gain access to network traffic. Scapy, a Python library for working with packets, is a very useful tool for this. With its sniff function, we can gain access to network interfaces and take a look at the various types of traffic traveling over the network. Here, when we call the sniff function, we pass in the prn argument set to our analyze packets function, which says: for every packet that you see, call the analyze packets function on it. We could also apply filters to cut down on the amount of data that we're analyzing, but we want a fairly wide variety here.

Our analyze packets function, on the other hand, is going to start trimming down the packets from the entire capture to the ones that we think have a high probability of containing encoded traffic. First we check to see if we have an HTTP packet, so we check haslayer for HTTPRequest or HTTPResponse, and then we call extractHTTP on the packet itself. We also check to see if the packet has a Raw layer. A Raw layer is the payload of the packet, so this might be data that a particular packet just happens to be carrying. In either case, whether we have an HTTP packet or something just carrying a payload of data, we call a function to perform further analysis of it.

This certainly isn't an exhaustive list of the places where we could find Base64-encoded data hidden inside a protocol. For example, we could have DNS traffic that uses the subdomain to carry Base64-encoded data. That's not something we're covering here, but we could easily extend this code to cover that and other potential cases. Of these two cases, the Raw one is definitely the simpler of the two. It just says that we're going to take the Raw layer of the packet and extract the payload from that, which is the data the packet's carrying.
The raw payload goes straight to the extractData function to see if there's any encoded data contained within it. If we've got an HTTP packet, on the other hand, we need to perform a little more analysis to look for the places where we could reasonably expect to find data within HTTP. What we can take advantage of is the fact that Scapy's HTTP functionality lets us extract all of the fields from the packet by querying the fields attribute of the packet's HTTP layer. We need to break this out into request versus response because of how Scapy parses it. But if we take the HTTPRequest layer of the packet and read .fields, or the HTTPResponse layer of the packet and read .fields, we get access to all of the fields contained within the packet. This includes things like the path, queries, and header data like a cookie that might contain Base64-encoded data.

By pulling out the fields in this way, we can then check whether or not they contain Base64-encoded data. We do so by iterating over the fields, extracting the data associated with each field, and then calling our extractData function based on the type of object that data is. In some cases, we're just going to get a string, so we can call isinstance(data, str) to check. Think of something like cookie equals and then the cookie value: we just get the cookie value, which is a string, so we can extract data from it immediately. If it's a dictionary, then we iterate over the keys in the dictionary and test all of its values. If it's a list or a tuple, we iterate over each value in that list or tuple and test each of them for Base64-encoded data. So this is sort of a brute-force search through an HTTP request or response packet for anything that might be of interest. Our extractData function is up here at the top of our file.
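To make the type-based walk through the fields concrete, here's a minimal sketch. It is not the course's code: the names `walk_fields` and `analyze_packet` and the `handler` parameter are hypothetical, and the bytes branch is an added assumption, since Scapy typically hands back field values as bytes. Scapy is imported lazily inside `analyze_packet` so the walker itself can be tried without Scapy installed.

```python
def walk_fields(data, handler):
    """Brute-force walk over a field value, passing every string to handler."""
    if isinstance(data, bytes):
        # Assumption: treat bytes as text; latin-1 maps every byte to a character.
        handler(data.decode("latin-1"))
    elif isinstance(data, str):
        handler(data)
    elif isinstance(data, dict):
        for value in data.values():
            walk_fields(value, handler)
    elif isinstance(data, (list, tuple)):
        for item in data:
            walk_fields(item, handler)
    # Any other type (numbers, None, ...) is ignored.


def analyze_packet(packet, handler=print):
    """Callback for Scapy's sniff(); dispatches HTTP fields and raw payloads."""
    # Imported here so walk_fields can be used without Scapy installed.
    from scapy.all import Raw
    from scapy.layers.http import HTTPRequest, HTTPResponse

    for layer in (HTTPRequest, HTTPResponse):
        if packet.haslayer(layer):
            walk_fields(packet[layer].fields, handler)
    if packet.haslayer(Raw):
        walk_fields(packet[Raw].load, handler)


# A plain dict standing in for the .fields of an HTTP request layer.
seen = []
walk_fields(
    {"Cookie": b"QzIgZGF0YQ==", "Path": "/", "Extra": ["a", ("b",)], "n": 5},
    seen.append,
)
print(seen)   # ['QzIgZGF0YQ==', '/', 'a', 'b']
```

With Scapy installed and sufficient privileges, this callback could be wired up with something like `sniff(prn=analyze_packet)`, where `handler` would be the extraction function described next.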
The first thing we do is strip away any trailing whitespace on our data. The reason for this is that things like HTTP are designed to be printable protocols, so there's a good chance that a particular value has carriage returns or line feeds at the end. By removing those, we might go from something that is not Base64 decodable to something that is.

We're then going to take advantage of Python's re library, where re stands for regular expressions. This library allows us to extract anything in the data that matches a regular expression, and we've defined a regular expression here that describes what we're looking for. As we mentioned earlier, Base64-encoded data has a particular alphabet. Capital A to Z, lowercase a to z, 0 to 9, the plus sign, and the forward slash are the 64 characters in Base64. We also have the potential for a padding character, which is an equals sign. So re.findall takes this description, which says we want one or more of these characters, along with the data, and pulls out any sections of the data that match that description.

We're going to have a lot of false positives here. However, this should allow us to find Base64-encoded data that isn't alone in the string that we're looking at. If there's something in the data that's not part of the Base64-encoded content, this will only look at the encoded part. Then we iterate over each of our matches. We need to make sure that we actually have a match, so if the length of the match is zero, we just continue. If not, we attempt to decode the data using Base64. Before we can do that, we need to consider the structure of Base64. Base64-encoded data has a length that's always a multiple of 4, and the reason for this is how Base64 is set up.
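The candidate-matching step can be sketched as follows. The pattern here is an assumption rather than the exact expression from the course: it matches a run of Base64 alphabet characters followed by up to two padding characters. The function name `find_candidates` is hypothetical.

```python
import re

# Assumed pattern: one or more Base64 alphabet characters, then 0-2 '=' pads.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def find_candidates(data: str) -> list[str]:
    """Return every substring of data that could be Base64."""
    return B64_CANDIDATE.findall(data.strip())


matches = find_candidates("session=QzIgZGF0YQ==; path=/\r\n")
print(matches)   # ['session=', 'QzIgZGF0YQ==', 'path=', '/']
```

Only one of the four matches is real encoded data, which shows why the later decoding step has to weed out so many false positives.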
Base64 encodes each group of three input bytes as a chunk of four characters. If the input isn't a proper multiple of three, it appends padding at the end so that the final chunk is still four characters long. This multiple-of-4 property is significant, because there's a chance that someone will strip the padding characters off the Base64-encoded data before they send it. The reason is that the equals signs at the end of the data make it very easy to identify as Base64: you'll have the Base64 character set and then potentially one or two equals signs at the end. By removing those equals signs, it's easier to conceal what type of data this is. And we know how many equals signs to put back, assuming that we have Base64-encoded data, because we know that the length is always going to be a multiple of 4.

So we test whether the length of the match is not a multiple of four, which is length of match mod four equals zero, negated. If so, we calculate how many padding characters we need. Then we can use b"=" times padnum to create a string of bytes containing the desired number of equals signs, and append that to the value that we're matching on. Once we have that, we can use b64decode on the match to attempt to decode the data, and then call .decode("utf-8") to convert from bytes to a string.

Finally, we have a couple of checks here just to weed out some of the false positives. We're going to say that we only care about the data if its length is greater than 5. It's entirely possible that someone is sending less data than that Base64 encoded in this way, but we probably don't care about it. We can always remove this test if we have good reason to, accepting that we'll get a massive number of additional false positives. And then in this case, we're only looking for printable characters. This is something you'll probably want to remove if you use this code elsewhere.
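The padding and decoding steps just described can be sketched like this. It is a minimal sketch rather than the course's exact function: the name `try_decode` and the `min_length` parameter are hypothetical, and it assumes the match contains only ASCII characters, which holds for anything a Base64 regular expression returns.

```python
import base64
import binascii


def try_decode(match: str, min_length: int = 5):
    """Re-pad and decode a Base64 candidate; return text or None."""
    raw = match.encode("ascii")
    if len(raw) % 4 != 0:
        # Put back the '=' characters a sender may have stripped.
        padnum = 4 - len(raw) % 4
        raw += b"=" * padnum
    try:
        text = base64.b64decode(raw).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # Not valid Base64, or not valid UTF-8 text: treat as a false positive.
        return None
    # Keep only results longer than min_length that are fully printable.
    if len(text) > min_length and text.isprintable():
        return text
    return None


print(try_decode("QzIgZGF0YQ=="))   # C2 data
print(try_decode("QzIgZGF0YQ"))     # C2 data (padding was stripped)
print(try_decode("aGk="))           # None ("hi" is too short)
```

Dropping the `isprintable()` condition would let decoded binary content through, at the cost of many more false positives.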
The reason you'd remove that printable check is that a common use of Base64 is to turn unprintable content into printable content. With this isprintable test, we're ignoring anything that might be, say, a downloaded piece of malware that's a binary executable, which includes a lot of unprintable bytes. However, this way, if we've got just text data that's been encoded to conceal it, we'll be able to look at the end result. If the data passes those two tests, we print the data that we've decoded. And finally, notice that all of this is wrapped in a try/except. If Base64 decoding fails, it'll throw an exception, so we just catch that exception and continue on to the next match. This code should allow us to identify and decode Base64-encoded data contained within network traffic.

So now we need some network traffic that contains Base64-encoded data. In this case, we're going to go a little off script and take advantage of some code that we have from the MITRE ATT&CK part of this learning path. One topic that was discussed there is protocol tunneling, where we hide a C2 channel or C2 traffic within a different type of protocol. In this case, what we did was take the data that we wanted to send, Base64 encode it, and stick it in the cookie header of an HTTP request. Our responses were then Base64-encoded data in the contents of the corresponding response. So we're going to see if our code is able to identify and extract the Base64-encoded data that we've stored in the cookies here. We're looking for the string C2 data in the request. And on the server side, if we receive C2 data, we send back received, again Base64 encoded.

Minimizing the code here, we've got a few terminal windows to do this. We'll start our protocol decoder here, which is the code to extract our Base64-encoded data. We'll then start the server for our encoded C2.
And then finally, the client that will call out to the server. Here we see that we got some results, so I'm just going to kill these processes. The client sent C2 data, Base64 encoded, to the server, which indicated that it received that data and responded with an HTTP 200 containing the phrase received. If we look over here at our protocol decoder, we see that the Base64-encoded data was identified, decoded, and printed out.

This demonstrates how we can use Python to implement a simple protocol decoder for active defense. By identifying this Base64-encoded command and control traffic, we can decode it and then apply additional analysis to it. That can help us identify whether a particular system is compromised, for example by the malware that we're running here in this terminal, and allows us to analyze its command and control traffic to understand what's going on there. For example, if we were performing data exfiltration with this, we'd be able to see what data was stolen and sent out over the channel. The server's response could then include something more like instructions: collect this data, or attempt to move laterally. That sort of intelligence about what the malware is doing can be invaluable for identifying the scope of the infection, assessing the impact, and remediating. Thank you.