Teasing the Secrets From Threat Actors: Malware Configuration Parsing at Scale

A pictorial representation of malware configuration data such as that used by IcedID

Executive Summary

Configuration data that changes across each instance of deployed malware can be a gold mine of information about what the bad guys are up to. The problem is that, by design, configuration data in malware is usually difficult to parse statically from the file. Malware authors know its intelligence value, because it provides the directives for how the malware should behave.

Malware is like most complex software systems in that there are many advantages to code reuse and abstraction. Therefore, it is not surprising that the concept of software configuration is pervasive across the various malware families we analyze. After all, it’s pretty hard to imagine a stereotypical cybercriminal wanting to bother with recompiling their code to change an IP address or whatever else when going after different targets.

But the good news is that statically armored configuration data can often easily be found and parsed directly from memory. We will cover a nice example of an IcedID (information stealer) configuration, how it was obfuscated and how we’ve extracted it.

Palo Alto Networks customers receive improved detection for the evasions discussed in this blog through Advanced WildFire. As we continue to parse and extract this information from malware families at scale, we hope to build out a pool of threat intelligence that will help us better understand the campaigns and tactics of the threat actors targeting organizations.

What Are Malware Configurations?
IcedID Analysis
Unpacking IcedID Stage One
Locating the Encrypted Configuration Data Blob
Extracting the Encryption Key
Decrypting the Configuration Data Blob With the Encryption Key
Unpacking the IcedID Stage Two Binary
Locating the Encrypted Configuration Data Blob
Extracting the Encryption Key
Decrypting the Configuration Data Blob With the Encryption Key
Scaling Up
Conclusion
Indicators of Compromise
Additional Resources

What Are Malware Configurations?

So what exactly do we mean by the term “configuration” when talking about malware? Outside the context of malware, we think of configuration in terms of defining how systems should behave. For example, we would consider the rules that define which network routes a firewall allows, or the font size your web browser uses while you read this, as configurable information.

For malware, this is no different. Malware configurations are just collections of elements that define how a piece of malware operates, such as the following:

  • Command-and-control (C2) network addresses
  • Passwords for remote administrators
  • File paths in which to drop persistent payloads

The way these elements are embedded in malware components tends to be specific to each malware family. Also, they might evolve over time as malware undergoes development, or when malware authors change their build process.

Generally speaking, malware configuration elements tend to be the properties of malware that the authors want to make easily editable between campaigns and deployments without requiring manual code edits for each one. Malware configuration elements can also expose latent behaviors and malware infrastructure that are not typically observable under routine dynamic analysis.

Malware configurations have intelligence value for security practitioners because they provide insights into campaigns over time. In some cases, defenders could use them as actionable artifacts for network detection, or for identifying infected hosts. The successful extraction and validation of a malware configuration can also be used to reinforce our confidence when identifying a file as malicious.

Because malware configurations have value to security systems and defenders alike, it is standard practice for modern malware authors to protect their configuration elements using different techniques. These protections often include a blend of encryption, obfuscation and compression. They might also be layered with evasive techniques.

This protection poses a significant challenge for malware configuration extractors that operate solely by using static analysis, because all of these protections must be detected and bypassed before extraction can be performed. Using an advanced dynamic analysis sandbox combined with intelligent runtime memory analysis makes it possible to bypass many of these protections and pinpoint the best opportunities to perform extraction.

Representing and storing these configurations using standardized schemas enables us to extract maximum value through automation, machine learning and interactive analysis. The DC3-MWCP library defines a schema for many of the most common configuration element types, and it provides a simple library for serialization to JSON.
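As a rough illustration of what a normalized record could look like, here is a minimal sketch that uses only the Python standard library. It does not use the DC3-MWCP API, and the class and field names (`MalwareConfig`, `c2_addresses`, `campaign_id`) are hypothetical names chosen for this example, not the DC3-MWCP schema. The values are the ones reported later in this post.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MalwareConfig:
    """Hypothetical normalized configuration record (not the DC3-MWCP schema)."""
    family: str
    sample_sha256: str
    c2_addresses: List[str] = field(default_factory=list)
    campaign_id: Optional[int] = None


def to_json(config: MalwareConfig) -> str:
    """Serialize a record to JSON with a stable key order for indexing and diffing."""
    return json.dumps(asdict(config), sort_keys=True)


record = MalwareConfig(
    family="IcedID",
    sample_sha256="05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51",
    c2_addresses=["bayernbadabum[.]com"],  # kept defanged on purpose
    campaign_id=1139942657,
)
print(to_json(record))
```

Keeping every family's output in one shape like this is what makes it practical to index and correlate configurations across samples.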

The MITRE MAEC and STIX projects also provide us with a more general vocabulary for representing malware configuration elements. This also allows us to correlate the elements with observable objects collected during dynamic analysis.

IcedID Analysis

Let’s look at one IcedID binary and how its configurations are encrypted.

Hash: 05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51

This particular attack chain, shown in Figure 1, was discovered in early November 2022. It delivered IcedID, an information stealer also known as Bokbot, as the final payload. This threat is well-known malware that has been attacking people since 2017.

The following diagram shows the infection chain.

Image 1 is a graphic showing the IcedID infection chain. It starts with malicious spam executed by the user, and it ends with a scheduled task created to persist IcedID stage two.
Figure 1. IcedID infection chain.

Authors of IcedID took pains to hide their configurations. Recent samples of IcedID stage two would only be downloaded if the victim’s machine matched the requirements of the threat actor.

The configurations of IcedID consisted of C2 URLs and their campaign IDs. The C2 URLs included some that might not be revealed during the execution of the IcedID binaries. The campaign ID links IcedID samples back to specific threat actors.

We will go through the following steps to extract the configurations found in the IcedID stage one and two binaries:

  1. Unpack the IcedID binary
  2. Locate the encrypted configuration data blob
  3. Extract the encryption key
  4. Decrypt the configuration data blob with the encryption key

Unpacking IcedID Stage One

IcedID stage one unpacks itself by first allocating memory using the VirtualAlloc function. It then clears the allocated memory using the memset function, as shown in Figure 2. Finally, it copies the unpacked data to the allocated memory using the memmove function.

To dump the unpacked data, we set a breakpoint at memmove. The second argument of memmove (the source pointer) contains the address of the unpacked data. Figure 2 also shows the DOS MZ header of the unpacked IcedID stage one in the right-hand side of the hex dump.

Image 2 is a screenshot of IcedID stage one being unpacked. This is done by using the VirtualAlloc function.
Figure 2. Unpacking IcedID stage one.
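Once a buffer has been dumped at the memmove breakpoint, a quick sanity check is to confirm that it really begins with a PE image. The sketch below is one way to do that under standard PE layout rules: the DOS header starts with `MZ`, and the 32-bit little-endian field at offset 0x3C (`e_lfanew`) points to the `PE\0\0` signature. The function name `looks_like_pe` and the synthetic buffer are illustrative only.

```python
import struct


def looks_like_pe(buf: bytes) -> bool:
    """Return True if buf starts with a DOS MZ header that points at a PE signature."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Synthetic buffer for illustration: MZ header whose e_lfanew is 0x80.
demo = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x80) + bytes(0x40) + b"PE\x00\x00"
print(looks_like_pe(demo))
```

A check like this does not prove the dump is the unpacked payload, but it cheaply rejects dumps taken at the wrong moment.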

Locating the Encrypted Configuration Data Blob

Next, we located the encrypted configuration data blob using the unpacked stage one IcedID. While debugging the unpacked IcedID stage one file, we set a breakpoint at the address that called WinHttpConnect, as shown in Figure 3. The address pointed to by register RDI contains the string of the C2 URL.

Image 3 is a screenshot of IcedID stage one being debugged.
Figure 3. Debugging IcedID stage one.

By backtracing the code, we located a function that used the decrypted configuration as shown in Figure 4.

Image 4 is a screenshot of IcedID stage one code being backtraced.
Figure 4. Tracing code in IcedID stage one.

Tracing the code flow back, we found the loop that decrypted the configuration, as shown in Figure 5.

Image 5 is a screenshot of the configuration decryption loop for IcedID stage one.
Figure 5. Configuration decryption loop for IcedID stage one.

The instruction at 0x7FEF33339CD loaded the address of the encrypted configuration data blob (Encrypted_Config) into register RDX.

Extracting the Encryption Key

The instruction at 0x7FEF33339D4 read the encryption key, which is located 0x40 bytes past the address of Encrypted_Config. We also learned that the configuration is 0x20 bytes long and that an XOR loop was used to decrypt it.

Decrypting the Configuration Data Blob With the Encryption Key

After gathering the encryption key, the encrypted data blob and the decryption routine, we can now decrypt the configuration using the script shown in Figure 6.

Image 6 is a screenshot of the configuration decryption script for IcedID stage one.
Figure 6. Configuration decryption script for IcedID stage one.
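For readers who cannot see the screenshot, the following is a minimal sketch of an XOR decryption consistent with the facts stated above: a 0x20-byte configuration at the start of the blob and a key located 0x40 bytes past it. It assumes the key is applied byte for byte without wrapping, which is our reading rather than something the text confirms, so Figure 5 remains the authority for the real loop. The function name and the synthetic test data are illustrative.

```python
KEY_OFFSET = 0x40  # key starts 0x40 bytes after the start of Encrypted_Config
CONFIG_LEN = 0x20  # length of the encrypted configuration


def decrypt_stage_one(blob: bytes) -> bytes:
    """XOR each configuration byte with the key byte at the same index.

    Assumption: the key is at least CONFIG_LEN bytes long and does not wrap.
    """
    if len(blob) < KEY_OFFSET + CONFIG_LEN:
        raise ValueError("blob too short to contain configuration and key")
    encrypted = blob[:CONFIG_LEN]
    key = blob[KEY_OFFSET:KEY_OFFSET + CONFIG_LEN]
    return bytes(c ^ k for c, k in zip(encrypted, key))


# Round trip on synthetic data (not taken from the sample).
plaintext = b"example-config".ljust(CONFIG_LEN, b"\x00")
demo_key = bytes(range(1, CONFIG_LEN + 1))
demo_blob = bytes(p ^ k for p, k in zip(plaintext, demo_key)) + bytes(0x20) + demo_key
print(decrypt_stage_one(demo_blob) == plaintext)
```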

The decrypted IcedID stage one configuration has the format shown in Figure 7.

Image 7 is a screenshot of the IcedID stage one configuration format.
Figure 7. IcedID stage one configuration format.
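To show how a decrypted buffer might be turned into fields, here is a small parsing sketch. The layout is an assumption on our part for illustration: a 4-byte little-endian campaign ID followed by a null-terminated ASCII C2 string. The exact layout is defined by Figure 7, and the function name and sample bytes below are hypothetical.

```python
import struct


def parse_stage_one_config(decrypted: bytes) -> dict:
    """Parse a decrypted buffer under the ASSUMED layout: u32 campaign ID, then C2 string."""
    if len(decrypted) < 5:
        raise ValueError("buffer too short for a campaign ID and a C2 string")
    (campaign_id,) = struct.unpack_from("<I", decrypted, 0)
    # Everything after the ID, up to the first null byte, is treated as the C2 string.
    c2 = decrypted[4:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return {"campaign_id": campaign_id, "c2": c2}


# Synthetic buffer for illustration only.
sample = (struct.pack("<I", 0x01020304) + b"c2.example").ljust(0x20, b"\x00")
print(parse_stage_one_config(sample))
```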

From the decrypted configuration, we can extract the following IoCs:

C2 URL: bayernbadabum[.]com
Campaign ID: 1139942657

Now, we will decrypt the configuration for the IcedID stage two binary.

Unpacking the IcedID Stage Two Binary

As the IcedID stage two binary uses the same packer as stage one, we will not repeat the unpacking steps here.

Locating the Encrypted Configuration Data Blob

We set a breakpoint at the address that calls WinHttpConnect, as shown in Figure 8.

Image 8 is a screenshot of IcedID stage two being debugged. Highlighted is row 14, where the breakpoint is set.
Figure 8. Debugging IcedID stage two.

After tracing the code, we located the function that used the decrypted configuration, as shown in Figure 9.

Image 9 is a screenshot of the traced code of IcedID stage two, showing the location of the function that used the decrypted configuration.
Figure 9. Tracing code in IcedID stage two.

Extracting the Encryption Key

Tracing the code flow even further back, we found the function that decrypts the configuration. The first few instructions located the encrypted configuration blob. The encrypted blob is 0x25c bytes long. The encryption key is the last 0x10 bytes of the encrypted configuration blob, as shown in Figure 10.

Image 10 is a screenshot of the encryption key for IcedID stage two. Highlighted is the address of the encryption key, which appears below the line labeled “offset to encryption key.”
Figure 10. Loading the encryption key for IcedID stage two.
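The sizes given above are enough to sketch how the key can be separated from the rest of the blob. This example assumes that the final 0x10 key bytes are not part of the data to be decrypted, which the text does not state explicitly. It also deliberately leaves out the decryption loop itself, since that logic is only shown in the screenshot. The function name and synthetic blob are illustrative.

```python
from typing import Tuple

BLOB_LEN = 0x25C  # total size of the encrypted stage-two blob
KEY_LEN = 0x10    # the key occupies the final 0x10 bytes


def split_stage_two_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (encrypted_body, key) from a stage-two configuration blob."""
    if len(blob) != BLOB_LEN:
        raise ValueError(f"expected {BLOB_LEN:#x} bytes, got {len(blob):#x}")
    return blob[:-KEY_LEN], blob[-KEY_LEN:]


# Synthetic blob for illustration only; real data comes from the unpacked binary.
demo_blob = bytes(i % 256 for i in range(BLOB_LEN))
body, key = split_stage_two_blob(demo_blob)
print(hex(len(body)), hex(len(key)))
```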

After retrieving the encryption key, the next step is the loop to decrypt the encrypted blob, as shown in Figure 11.

Image 11 is a screenshot of the configuration decryption loop for IcedID stage two. This is the next step after retrieving the encryption key.
Figure 11. Configuration decryption loop for IcedID stage two.

Decrypting the Configuration Data Blob With the Encryption Key

We replicated the instructions in the decryption loop using Python. After gathering the encryption key, the encrypted data blob and the decryption routine, we can now decrypt the configuration using the script shown in Figure 12.

Image 12 is a screenshot of the configuration decryption script for IcedID stage two.
Figure 12. Configuration decryption script for IcedID stage two. Note: Jquinn147 and myrtus0x0 published a similar configuration decryption script for IcedID in May 2021, called IcedDecrypt (GitHub).

The decrypted IcedID stage two configuration has the following format, shown in Figure 13.

Image 13 is a screenshot that shows the configuration format for the decrypted IcedID stage two.
Figure 13. Configuration format for IcedID stage two.

From the decrypted configuration, we can extract the following indicators of compromise (IoCs):

C2 URLs:
  • newscommercde[.]com
  • spkdeutshnewsupp[.]com
  • germanysupportspk[.]com
  • nrwmarkettoys[.]com
C2 URI: news
Campaign ID: 1139942657

We have manually decrypted the configuration for both the IcedID stage one and two binaries.

Scaling Up

Now that we’ve discussed the work of figuring out how to target the configuration data in memory, the next challenge is to figure out how to perform this at scale. The massive scale of most malware processing systems means that practitioners looking to build out a configuration extraction system will need to be careful about adding overhead. This means that we will need a mechanism to intelligently identify only the samples of interest for each parser, so we’re not unnecessarily running dozens of parsers across millions of samples.

We think a reasonable approach to this problem involves using intelligent runtime memory analysis, as it provides us with excellent visibility into the secrets malware authors want to protect. A typical workflow for our malware configuration extractors includes the following activities:

  • Scanning memory and/or other dynamic analysis artifacts
  • Applying a noise filter on the results to identify the best candidates for extraction
  • Performing extraction using the best fitting module and storing the results for reporting and indexing
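The three activities above can be sketched as a small pipeline. This is an illustrative outline, not our production framework: the `Extractor` class, the function names, the hit-count threshold and the toy `CFG:` signature are all hypothetical, and a real noise filter would use richer signals than a simple count.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Extractor:
    """Hypothetical extractor module: a byte signature plus a parsing function."""
    family: str
    signature: re.Pattern
    parse: Callable[[bytes], Optional[dict]]


def scan(dump: bytes, extractors: List[Extractor]) -> Dict[str, int]:
    """Step 1: count signature hits per family in a memory dump."""
    return {e.family: len(e.signature.findall(dump)) for e in extractors}


def filter_noise(hits: Dict[str, int], min_hits: int = 1) -> List[str]:
    """Step 2: drop families below the threshold, best candidates first."""
    kept = [family for family, count in hits.items() if count >= min_hits]
    return sorted(kept, key=lambda family: hits[family], reverse=True)


def extract(dump: bytes, extractors: List[Extractor]) -> Optional[dict]:
    """Step 3: run the best-fitting module that actually yields a configuration."""
    by_family = {e.family: e for e in extractors}
    for family in filter_noise(scan(dump, extractors)):
        config = by_family[family].parse(dump)
        if config is not None:
            return {"family": family, "config": config}
    return None


def parse_toy(dump: bytes) -> Optional[dict]:
    """Toy parser: read the 10 bytes that follow the made-up CFG: marker."""
    return {"c2": dump.split(b"CFG:", 1)[1][:10].decode("ascii")}


toy = Extractor("toy_family", re.compile(b"CFG:"), parse_toy)
other = Extractor("other_family", re.compile(b"ZZZZ"), lambda dump: None)
dump = bytes(16) + b"CFG:c2.example" + bytes(16)
print(extract(dump, [toy, other]))
```

Because only modules whose signatures survive the filter are ever run, the expensive parsing step is skipped for the large majority of samples.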

Generalizing this common workflow presented us with the opportunity to make the following improvements:

  • Optimizing the search phase by only scanning analysis data once in most cases
  • Applying abstractions and reusable code for many common tasks
  • Limiting the impact of modules with problematic inputs or other bugs
  • Giving our security researchers visibility into the performance of their modules
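Two of these improvements, containing faulty modules and measuring module performance, can be illustrated with a short sketch. The `run_modules` function and the three toy modules are hypothetical. Note that catching exceptions in-process, as done here, contains ordinary Python errors but not hangs or hard crashes, which would need process-level isolation and timeouts.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Module = Callable[[bytes], Optional[dict]]


def run_modules(data: bytes, modules: Dict[str, Module]) -> Tuple[Dict[str, dict], List[dict]]:
    """Run every module once, containing failures and recording per-module timing."""
    results: Dict[str, dict] = {}
    metrics: List[dict] = []
    for name, module in modules.items():
        start = time.perf_counter()
        status = "ok"
        try:
            output = module(data)
            if output is None:
                status = "no_config"
            else:
                results[name] = output
        except Exception as exc:  # one buggy module must not stop the others
            status = f"error: {type(exc).__name__}"
        metrics.append(
            {"module": name, "status": status, "seconds": time.perf_counter() - start}
        )
    return results, metrics


def broken(data: bytes) -> Optional[dict]:
    """Toy module that always fails, standing in for a parser bug."""
    raise ValueError("malformed input")


modules = {
    "good": lambda data: {"length": len(data)},
    "broken": broken,
    "quiet": lambda data: None,
}
results, metrics = run_modules(b"\x00" * 8, modules)
print(results)
print([(m["module"], m["status"]) for m in metrics])
```

The metrics list is what gives researchers visibility: a module that is slow or that fails often shows up directly in the recorded status and timing.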

The following example shows some of the IoCs produced by a recent IcedID extractor after it was deployed at scale. Having a nice framework for deploying configuration extractors means that once you are finished crafting a configuration extraction script, it’s time to kick your feet up and relax while hundreds of configurations flow into your malware configuration database.

Image 14 is a screenshot of IoCs from IcedID samples. It starts at line 34 and continues to line 59.
Figure 14. IoCs from IcedID samples.

Conclusion

Thank you for joining us in this overview of malware configurations and why we are working hard to parse this information at scale in Advanced WildFire. Reverse engineering variants of each malware family allows us to build out parsers to extract meaningful and relevant data for all of them at scale.

There is a staggering amount of diversity among payloads in the malware landscape, which makes the task of supporting them all more or less impossible. Where possible, we use metrics-based approaches to prioritize focus on the malware families and variants most relevant to our customers. In this ongoing area of research, our team will continue to expand support for new malware families and variants.

Palo Alto Networks customers receive protections from threats such as those discussed in this post with Advanced WildFire.

Indicators of Compromise

05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51

Additional Resources

Updated May 17, 2023, at 6:00 a.m. PT.

Source: https://unit42.paloaltonetworks.com/teasing-secrets-malware-configuration-parsing/