Automating Pikabot’s String Deobfuscation

Introduction

Pikabot is a malware loader that first emerged in early 2023. One of its prominent features was the code obfuscation it used to evade detection and hinder technical analysis. Pikabot used this obfuscation method to encrypt the strings in its binary, including the addresses of its command-and-control (C2) servers.

In this article, we briefly describe the obfuscation method used by Pikabot and we present an IDA plugin (with source code) that we developed to assist in our binary analysis.

As mentioned in our previous article, the obfuscation method was removed when Pikabot reemerged with a new version in early 2024. As of April 2024, this obfuscation method has not been used again in any Pikabot samples.

Key Takeaways

  • Pikabot is a malware loader that was first observed in early 2023 and became very active following the takedown of Qakbot in August 2023.
  • Previous versions of Pikabot used advanced string encryption techniques that combined the AES-CBC and RC4 algorithms. These have since been replaced with simpler algorithms.
  • The string obfuscation’s implementation is similar to ADVobfuscator.
  • In this article, we describe the binary strings’ obfuscation algorithm and our approach to decrypt the binary strings using IDA’s microcode.
  • Zscaler ThreatLabz developed an IDA plugin to automatically decrypt Pikabot’s obfuscated strings and is releasing the source code.

Technical Analysis

String obfuscation

The steps for decrypting a Pikabot string are relatively simple. Each string is decrypted only when required (in other words, Pikabot does not decrypt all strings at once). Pikabot follows the steps below to decrypt a string:

  1. Pushes the encrypted string array onto the stack.
  2. Initializes the RC4 algorithm and decrypts the array. The RC4 key is different for each string (with very few exceptions).
  3. Takes the RC4-decrypted output, replaces all instances of the character ‘_’ (underscore) with ‘=’ (equal), decodes the result using Base64, and decrypts it using the AES-CBC algorithm. The AES key and initialization vector (IV) are the same for all strings.

ANALYST NOTE: Some strings are encrypted only with the RC4 algorithm.
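
The first two layers of the steps above can be sketched in Python as follows. This is a minimal illustration, not the plugin's code: the function names are hypothetical, and the final AES-CBC step is left out because it needs a third-party library and the sample-specific key and IV.

```python
import base64


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def rc4_and_base64_layers(rc4_key: bytes, encrypted: bytes) -> bytes:
    """Hypothetical helper: undo the RC4 and Base64 layers of one string."""
    stage1 = rc4(rc4_key, encrypted)
    # Pikabot stores '_' where standard Base64 expects '=' padding.
    return base64.b64decode(stage1.replace(b"_", b"="))
```

The returned bytes would then be the input to AES-CBC decryption with the sample-wide key and IV. For strings that are protected only with RC4, the output of `rc4` is already the plaintext.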

Figure 1 shows the code used to decrypt the string Kernel32.dll.

Figure 1: Example Pikabot string decryption for Kernel32.dll.

Figure 2 shows the function that first decrypts the AES key and IV. The RC4 decrypted string passed to the function is then Base64 decoded, and is finally decrypted using AES.

Figure 2: Pikabot Base64 decoding and AES decryption function.

Decrypting Pikabot strings

The following information is required to decrypt a Pikabot string:

  • The AES key and IV of a binary sample.
  • The RC4 encrypted array of each string.
  • The RC4 key of each encrypted string.
  • The string’s size.

Our approach relies on IDA’s microcode. This decision helped us with several problems such as:

  • Detecting the RC4 key. IDA’s microcode converts the assignment/copy of the RC4 key into a strcpy call. At the assembly level, the same copy could be either multiple mov instructions or a rep instruction, which would make detection and extraction more difficult.
  • Extracting the RC4 encrypted array. Since IDA reconstructs the stack, it is much easier to search for and extract the encrypted array.

IDA’s microcode brings its own limitations (for example, decompilation failure for a function), but we did not encounter such issues in the parts of the code we wanted to analyze.

In the sections below, we describe how each component was extracted.

Extracting the AES key/IV

To extract the AES key and IV, we iterate over all analyzed functions and discard any function whose size is not between 600 and 1,600 bytes.

Next, we scan the functions for the following patterns:

  • Existence of RC4 encryption. This is the same heuristic we use for detecting encrypted RC4 strings.
  • Existence of values 0x3D and 0x5F (used before Base64 decoding the string) that are used with microcode opcodes m_stx and m_jnz respectively.

Lastly, if all of the patterns above match, then the handler for decrypting a Pikabot string is invoked. For the classification of the key and the IV, we apply the following checks:

  • The number of decrypted strings from the identified function must be two. Otherwise, the identified function is incorrect.
  • The longer string is marked as the AES key (by taking its first 32 bytes) and the other decrypted string as the IV (by taking its first 16 bytes).
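
The size filter and the key/IV classification described above can be sketched as follows. The function names are hypothetical, and treating the 600 to 1,600 byte range as inclusive is an assumption.

```python
from typing import Sequence, Tuple


def is_candidate_size(function_size: int) -> bool:
    """Keep only functions whose size is in the expected range (assumed inclusive)."""
    return 600 <= function_size <= 1600


def classify_key_iv(decrypted: Sequence[bytes]) -> Tuple[bytes, bytes]:
    """Return (aes_key, aes_iv) from the two strings decrypted in a candidate function."""
    if len(decrypted) != 2:
        # Anything other than exactly two strings means the wrong function was picked.
        raise ValueError("expected exactly two decrypted strings")
    longer, shorter = sorted(decrypted, key=len, reverse=True)
    # The longer string supplies the 32-byte key, the other one the 16-byte IV.
    return longer[:32], shorter[:16]
```

Raising an exception for the wrong string count mirrors the rule that such a function is discarded, so a caller can simply move on to the next candidate.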

Extracting the RC4 encrypted array

Pikabot constructs the RC4 encrypted array by pushing it onto the stack and then decrypting it. Our approach involves the following steps for detecting each part of the array:

  • Use the detected RC4 encryption block address as a starting point.
  • Search for the microcode opcode m_add in the decryption instruction. The detected microcode holds the starting stack offset of the encrypted array.
  • Iterate backwards and search for the microcode opcodes m_mov and m_call. The second opcode covers the case where the data is copied via a strcpy or memcpy call. If the stack offset matches, then we save the data and update the stack offset. This process is repeated until the reconstructed encrypted array has the expected size.
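
The backward search can be illustrated with a simplified model in which the microcode has already been reduced to a list of stack writes. Everything here is an assumption made for illustration: the `(stack_offset, data)` representation, the function name, and the idea that the array grows toward higher stack offsets. The plugin itself works on IDA's microcode instructions rather than on such a list.

```python
from typing import Sequence, Tuple


def rebuild_encrypted_array(
    writes: Sequence[Tuple[int, bytes]], start_offset: int, expected_size: int
) -> bytes:
    """Reassemble the encrypted array from stack writes listed in program order.

    `writes` is assumed to end just before the RC4 decryption block.
    """
    out = bytearray()
    offset = start_offset
    while len(out) < expected_size:
        # Walk backwards from the RC4 block to find the write at the current offset.
        for write_offset, data in reversed(writes):
            if write_offset == offset and data:
                out += data
                offset += len(data)  # the next chunk starts right after this one
                break
        else:
            raise ValueError(f"no write found for stack offset {offset:#x}")
    return bytes(out[:expected_size])
```

Failing loudly when a chunk is missing keeps a partially reconstructed array from being passed on to decryption.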

Extracting the RC4 encrypted array size

The length of the encrypted array is extracted in a similar way to the encrypted array itself. The detection pattern is:

  • Use the detected RC4 encryption block address as a starting point.
  • Search for the microcode opcodes m_jb, m_jae, and m_setb, and use the immediate constant number in the instruction as a size.

Extracting the RC4 key

Extracting the RC4 key of each string proved to be the most challenging part of creating the plugin. In our first attempt, we extracted the RC4 key after detecting the initialization of the RC4 algorithm. However, this approach had the following issues:

  • Incorrect extraction of the RC4 key: In many cases, an invalid/junk string was placed in-between the correct RC4 key and the RC4 algorithm initialization.
  • Incorrect detection of RC4 initialization code block: For example, if the size of the encrypted array was 256 bytes then an incorrect RC4 key would be detected.

Instead of trying to detect the RC4 key by detecting the initialization of the RC4 algorithm, we decided to extract all strings from each targeted function. Then, we decrypted the RC4 encrypted array with each extracted RC4 key and validated the decrypted output by applying the following checks:

  • The output matches the expected string size.
  • All characters of the output are readable.

ANALYST NOTE: After successful decryption, the RC4 key is marked and not reused in order to limit false positives, for example when a wrong key happens to produce a decrypted string without any junk characters.
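
The key search described above can be sketched as follows. The function names are hypothetical, and the text does not spell out how the size check is applied, so this sketch assumes it compares the expected size with the length of the output up to the first NUL byte.

```python
from typing import Iterable, Optional, Set, Tuple


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4 (key scheduling followed by keystream XOR)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def find_rc4_key(
    encrypted: bytes, expected_size: int, candidates: Iterable[bytes], used: Set[bytes]
) -> Optional[Tuple[bytes, bytes]]:
    """Try every string extracted from the function as an RC4 key."""
    for key in candidates:
        if not key or key in used:
            continue  # skip empty strings and keys that already decrypted a string
        text = rc4(key, encrypted).split(b"\x00", 1)[0]
        # Assumed checks: expected length and printable ASCII only.
        if len(text) == expected_size and all(0x20 <= c <= 0x7E for c in text):
            used.add(key)  # mark the key so it is not reused
            return key, text
    return None
```

Sharing one `used` set across all strings of a function models the note about marking keys: once a key has produced a valid string, it is no longer offered to the remaining encrypted arrays.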

IDA Plugin

We tested our Pikabot plugin with IDA versions 8 and newer. To use the plugin, compile the source code using IDA’s SDK and copy the generated DLL into the IDA plugins folder. After a Pikabot sample is loaded, the user can decompile a function, right-click in the decompiled output, and choose to decrypt strings either in the current function or in all functions (Figure 3).

Figure 3: IDA Pikabot plugin options.

For each decrypted string, the plugin sets a comment in the decompiled output. Figure 4 shows a function with the obfuscated strings before the plugin is invoked.

Figure 4: Before running the Pikabot string decryption plugin.

Figure 5 shows the output after our Pikabot IDA plugin is executed.

Figure 5: Output after running the Pikabot string decryption plugin.

Source Code

The source code for our IDA plugin to deobfuscate Pikabot strings can be found at this GitHub repository.

Conclusion

Older Pikabot variants include a string obfuscation implementation that can make automated analysis a complicated task. By using IDA’s microcode and developing our own plugin, we were able to significantly speed up our analysis in most cases. Since this technique is no longer used by Pikabot, we decided to open source our IDA plugin to assist the research community with defeating current and future stack-based obfuscation techniques.

Zscaler Coverage

Zscaler sandbox coverage

In addition to sandbox detections, Zscaler’s multilayered cloud security platform detects indicators related to Pikabot at various levels with the following threat names:

Indicators Of Compromise (IOCs)

The following samples were used for testing the plugin.

SHA256 DESCRIPTION
aebff5134e07a1586b911271a49702c8623b8ac8da2c135d4d3b0145a826f507 Pikabot Sample
4c53383c1088c069573f918c0f99fe30fa2dc9e28e800d33c4d212a5e4d36839 Pikabot Sample
15e4de42f49ea4041e4063b991ddfc6523184310f03e645c17710b370ee75347 Pikabot Sample
e97fd71f076a7724e665873752c68d7a12b1b0c796bc7b9d9924ec3d49561272 Pikabot Sample
a9f0c978cc851959773b90d90921527dbf48977b9354b8baf024d16fc72eae01 Pikabot Sample
1c125a10c33d862e6179b6827131e1aac587d23f1b7be0dbcb32571d70e34de4 Pikabot Sample
62f2adbc73cbdde282ae3749aa63c2bc9c5ded8888f23160801db2db851cde8f Pikabot Sample
b178620d56a927672654ce2df9ec82522a2eeb81dd3cde7e1003123e794b7116 Pikabot Sample
72f1a5476a845ea02344c9b7edecfe399f64b52409229edaf856fcb9535e3242 Pikabot Sample

Acknowledgments

The following projects were the initial inspiration for developing our plugin. In addition, they assisted with the usage of IDA’s SDK:
