How To Avoid Malware Rabbitholes

An Analyst's Journey

I have reverse engineered malware used to shut down power grids and spy on journalists. I have looked at samples written by a soldier deep in the DPRK and samples written overnight by a teenager on the outskirts of England. I have analyzed original works of creativity that will never reach the public eye and code that was an exact copy of a public repository with thousands of stars. Across all these samples, I learned that if you can determine what type of malware you are looking at, you can usually repeat the same analysis steps to get all the Indicators of Compromise (IoCs) necessary to understand a sample in less than 24 hours.

I have been a professional malware reverse engineer for a couple of years now, and the hard part when starting out was avoiding rabbit holes. Too many times I have come across samples that I wished I had another week to explore. There were times I would stumble upon things like the latest GTA IV hack and want to better understand how system drivers were compromised to make it all work. At other times I would come across samples with hidden messages that gave clues on how to decrypt other messages in a different sample. As far as my job is concerned, video game hacks and secret Russian messages are not analytically relevant. As far as my personal curiosity is concerned, I struggled A TON.

This post is meant to help others who are at the beginning of their malware analysis journey and would like to extract the important details from a sample in less time. It should also be useful to anyone doing this as a hobby, or anyone who is curious about how malware analysis is done at an incident response level. Let’s dive in!


The Starting Point (1-2 hours)


It always helps to get as familiar with a sample as possible before digging into anything too technical. I usually start off by seeing if there is any internal or external information written up about the sample itself. This may be analysis from another company, or it may be a brief writeup of the host-based indicators detected by someone working in the SOC. The objective in reading these reports is to better understand what to expect and potentially identify the malware category of the sample. Binary similarity is great here since variants of the sample may have reports written about them with information that also applies to the sample you’re currently working on.

The next step is to start determining how difficult analysis may be. This means looking at the file header to determine if it is a PE or an ELF file, looking at the sections and their sizes to determine if it is packed, and looking at the strings to see if there are multiple PE headers (likely indicating an embedded payload) or some hard-coded indicators like an IP or a user-agent. This is also a good time to find quick wins like a valid file signature or any hard-coded host-based or network-based indicators.
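The checks above can be sketched in a few lines of Python. This is only a rough triage helper, not a replacement for a proper PE or ELF parser. The function names are made up for illustration, and whole-file Shannon entropy is swapped in here as a crude stand-in for inspecting individual sections: values close to 8 bits per byte often suggest packed or encrypted content.

```python
import math
import re
from collections import Counter

# Magic bytes for the formats discussed in this post.
MAGICS = {
    b"MZ": "PE",
    b"\x7fELF": "ELF",
    b"\xcf\xfa\xed\xfe": "Mach-O (64-bit)",
    b"\xce\xfa\xed\xfe": "Mach-O (32-bit)",
}

# The DOS stub message appears once per typical PE image, so more than
# one hit hints at an embedded payload.
DOS_STUB = b"This program cannot be run in DOS mode"

# Dotted-quad candidates; octet ranges are validated separately below.
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def file_type(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    for magic, name in MAGICS.items():
        if data.startswith(magic):
            return name
    return "unknown"


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the buffer in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def ipv4_strings(data: bytes) -> list[str]:
    """Collect ASCII strings that look like valid IPv4 addresses."""
    found = []
    for match in IPV4_RE.finditer(data):
        text = match.group().decode("ascii")
        if all(int(part) <= 255 for part in text.split(".")):
            found.append(text)
    return found


def triage(data: bytes) -> dict:
    """Bundle the quick checks into one summary."""
    return {
        "type": file_type(data),
        "entropy": round(shannon_entropy(data), 2),
        "dos_stub_count": data.count(DOS_STUB),
        "ipv4": ipv4_strings(data),
    }
```

To use it, read the sample with `open(path, "rb").read()` and pass the bytes to `triage`. Expect false positives from the IPv4 pattern, since version strings such as 1.2.3.4 look identical to addresses.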

The last step I like to perform at this stage is running the sample and just seeing what happens. This means running it manually in a VM and also submitting it into an isolated sandbox. You can quickly start weeding out false positives by seeing if anything on the system was manipulated both locally in the VM and remotely in the sandbox.

The main takeaway for this step is having an idea of the malware category you’ll be looking at. It rarely happens that we encounter a sample that triggers malicious flags without having a clue as to what category it falls under. Usually, whether it be through automated detections or some prior analyst, you’ll at least have a hunch as to what the sample could be.


Analysis (4-8 hours)


Now comes the analysis part. I prefer to start with static analysis in IDA if a sample is a PE, ELF, or Mach-O binary. Once the sample is loaded in IDA, I’ll usually open the pseudocode window, stack it next to the disassembly, and run the CAPA explorer.

With the CAPA explorer you can easily navigate through various starting points that will lead you to documenting some concrete IoCs. When you know a sample category, you’ll already have a set of questions you want answered to consider your analysis successful. For example, if you’re looking at something like a backdoor, one of the first things you will want to know is the network behavior. In IDA, the CAPA explorer will compile a series of addresses in the code where network related libraries are being invoked. This gives you immediate starting points for analyzing the backdoor since you’ll be immediately placed in the code block that is performing the behavior you’re after.

Below is a table you can use for reference with questions you will want to answer depending on the malware category. These questions will be the guide you use to determine what is a good starting point when looking at a sample with a tool like the CAPA explorer.

Category      Questions to Answer

Ransomware    - What is the encryption scheme?
              - Is there data exfiltration, and if so, what data is it?
              - How is the data exfiltrated?
              - What are the target drives and extensions?

Backdoor      - What are the supported commands?
              - What is the C2 server?
              - How does it communicate with the C2 server?

Stealer       - What are the targeted credentials?
              - What is the remote server?
              - How are the credentials stored and exfiltrated?

Dropper       - What is the payload destination?
              - How is it executed?
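If you like keeping notes in code, the questions above can also live in a small checklist structure that always tells you which one to answer next. This is just one possible sketch. The `Checklist` class and its method names are hypothetical, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass, field

# Questions per malware category, in the order they should be answered.
QUESTIONS = {
    "ransomware": [
        "What is the encryption scheme?",
        "Is there data exfiltration, and if so, what data is it?",
        "How is the data exfiltrated?",
        "What are the target drives and extensions?",
    ],
    "backdoor": [
        "What are the supported commands?",
        "What is the C2 server?",
        "How does it communicate with the C2 server?",
    ],
    "stealer": [
        "What are the targeted credentials?",
        "What is the remote server?",
        "How are the credentials stored and exfiltrated?",
    ],
    "dropper": [
        "What is the payload destination?",
        "How is it executed?",
    ],
}


@dataclass
class Checklist:
    """Tracks which core questions have been answered for one sample."""

    category: str
    answers: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.category not in QUESTIONS:
            raise ValueError(f"unknown category: {self.category}")

    def answer(self, question: str, note: str) -> None:
        """Record the analyst's note for one of the category's questions."""
        if question not in QUESTIONS[self.category]:
            raise KeyError(question)
        self.answers[question] = note

    def remaining(self) -> list:
        """Unanswered questions, still in priority order."""
        return [q for q in QUESTIONS[self.category] if q not in self.answers]

    def next_question(self):
        """The highest-priority open question, or None when done."""
        rest = self.remaining()
        return rest[0] if rest else None
```

For example, `Checklist("backdoor").next_question()` returns the supported-commands question first, and an empty `remaining()` is a reasonable signal that it is time to move on to persistence, mutexes, and the other secondary details.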

I tend to answer the questions in the order they are presented as well. With ransomware, it may be difficult to believe that understanding the encryption scheme is more important than understanding what data is lost, but in my experience the encryption scheme can be the most complex component of the ransomware, and I want to dedicate the most time to making sure I get that analysis right. The reason is that customers often want to know if their data can be decrypted, something that could be possible if the ransomware author made an error and you have enough time to carefully analyze the code and find it.

Through experience you’ll also learn that the later questions in the table are answered more quickly. For the ransomware example, the target drives and extensions are usually a one-to-one string or hash comparison for the file extensions themselves. Only after I have prioritized the details in the table above do I revisit the sample and analyze the other things like persistence, additional payloads, anti-analysis behavior, mutexes, and more.
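When the comparison is done on hashes rather than plain strings, one common way to recover the extension list is to hash a set of candidate extensions yourself and match them against the constants found in the sample. The sketch below assumes the sample uses CRC32 over the lowercase extension, which is purely an illustrative choice. Real samples often use a custom hash that you would first need to reimplement from the disassembly, and the candidate list here is a made-up starting point.

```python
import zlib


def ext_hash(ext: str) -> int:
    # Assumption for illustration: CRC32 of the lowercase extension.
    # Swap this out for the hash routine recovered from the sample.
    return zlib.crc32(ext.lower().encode("ascii"))


# Hypothetical candidate list; extend it with whatever extensions you expect.
CANDIDATES = [
    ".doc", ".docx", ".xls", ".xlsx", ".pdf",
    ".jpg", ".png", ".sql", ".zip", ".txt",
]


def recover_extensions(hashes, candidates=CANDIDATES):
    """Map hash constants from the sample back to known extensions.

    Returns (recovered, unknown): a dict of hash -> extension for every
    match, and a list of hashes that no candidate produced.
    """
    lookup = {ext_hash(ext): ext for ext in candidates}
    recovered = {h: lookup[h] for h in hashes if h in lookup}
    unknown = [h for h in hashes if h not in lookup]
    return recovered, unknown
```

Anything left in the unknown list means the candidate list needs to grow, or the hash function was reimplemented incorrectly.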

Quick IDA Tips

Something else that helped me become a more efficient reverse engineer was changing the way I approached IDA. I started treating it as a two-step process: first clean up the initial pseudocode mess IDA shows me, then work out exactly what the code does. You can do this by creating structs and making sure everything is labeled and annotated to your liking. Here are some quick random tips that helped me:

  • The M hotkey changes a constant value to an enum which is useful when looking at Windows API calls.
  • The = hotkey is used to map one variable to another so you don’t have repeating variables.
  • The \ hotkey hides the casting when looking at the pseudocode. I stopped using it as much but it’s helpful for lowering the overwhelming noise you might sometimes see at the start of analysis.
  • If an area in the .text section is highlighted in red and can’t display in graphical mode then you can go to the start of the function in red and press P to make it a procedure. This usually works if the code is called indirectly through something like a vtable. IDA won’t automatically assume it’s a procedure so you will have to mark it manually.
  • It always helps to remember that pseudocode is just an interpretation of the disassembly. It is normal for something like a char array or a pointer to a struct to be interpreted incorrectly.


Wrapping Up (1-2 hours)


Now that you’ve had time to answer the core questions for the malware sample and gather the remaining information as well, you should immediately write a draft report from your raw notes. I’ll usually formalize my notes into sentences at this point and check for any missing information to make sure I’m correctly representing what I saw. Once you’re feeling confident about what you’ve got on paper, it’s time to ship off the report and do any remaining closeout procedure before moving on to the next one.

When it comes to writing reports, I recommend always performing your own analysis. Even if there is some report about a variant that is a 90% match, it’s always worth double-checking the findings and coming to your own conclusions. It’s also ok to reference OSINT articles, especially if they save you analysis time and the client is happy with it, but if there’s a chance any work you do goes public, make sure you reproduce it yourself and strictly use your own work to avoid any potential conflicts.

Getting your work recognized externally is an incredibly fulfilling feeling since you see the direct impact you have in keeping others secure!



If you liked this post, feel free to connect with me through any of the links at the top of my website! I am starting a course around this topic through KOSEC. Any feedback is welcome!