We're all used to the regular CyberChef operations like "From Base64", "From Decimal" and the occasional Magic decode or XOR. But what happens when we need to do something more advanced?
CyberChef contains many advanced operations that are often ignored in favour of Python scripting, and few are aware of how much it is capable of. These include Flow Control, Registers and various Regular Expression capabilities.
In this post, we will break down some of the more advanced CyberChef operations and show how they can be applied to develop a configuration extractor for a multi-stage malware loader.
Examples of Advanced Operations in CyberChef
Before we dive in, let's look at a quick summary of the operations we will demonstrate.
- Registers
- Regular Expressions and Capture Groups
- Flow Control Via Forking and Merging
- Merging
- Subtraction
- AES Decryption
After demonstrating these individually to show the concepts, we will combine them all to develop a configuration extractor for a multi-stage malware sample.
Obtaining the Sample
The sample demonstrated can be found on Malware Bazaar with
SHA256:befc7ebbea2d04c14e45bd52b1db9427afce022d7e2df331779dae3dfe85bfab
Advanced Operation 1 – Registers
Registers allow us to create variables within the CyberChef session and later reference them when needed.
Registers are defined via a regular expression capture group and allow us to create a variable with an unknown value that fits a known pattern within the code.
How To Use Registers in CyberChef
Below we have a PowerShell script utilising AES decryption.
Traditionally, this is easy to decode using CyberChef by manually copying out the key value and pasting it into an "AES Decrypt" operation.
We can see the key copied into an AES Decrypt operation.
Manually copying out the key works, but it means the key is "hardcoded" and the recipe will not apply to similar samples using the same technique.
If another sample utilises a different key, then this new key will need to be manually updated for the CyberChef recipe to work.
Registers Example 1
By utilising a "Register" operation, we can develop a regular expression to match the structure of the AES key and later access this via a register variable like $R0.
The AES key, in this case, is a 44-character base64 string, so we can use a base64 regular expression matching 44-46 characters to extract it.
We can later access this via the $R0 variable inside of the AES Decrypt operation.
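The register idea above can be sketched outside of CyberChef in Python. This is only an illustration: the PowerShell line, the placeholder key and the helper name `extract_register` are assumptions made for the example, not values taken from the sample.

```python
import base64
import re

# Placeholder 32-byte key (44 base64 characters); NOT the key from the sample.
placeholder_key = base64.b64encode(bytes(range(32))).decode()

# Hypothetical PowerShell line holding the key inside single quotes.
script = f"$aes.Key = [System.Convert]::FromBase64String('{placeholder_key}')"

# The capture group plays the role of the register contents:
# 44-46 base64 characters inside single quotes.
KEY_PATTERN = re.compile(r"'([A-Za-z0-9+/=]{44,46})'")


def extract_register(text):
    """Return the first capture group, mimicking a CyberChef register."""
    match = KEY_PATTERN.search(text)
    return match.group(1) if match else None


r0 = extract_register(script)  # plays the role of $R0
print(r0)
```

Because the pattern describes the shape of the key rather than its value, the same expression should keep working if another sample swaps in a different key of the same format.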
Registers Example 2
In a previous stage of the same sample, the malware utilises a basic subtract operation to create ASCII char codes from an array of large integers.
Traditionally, this would be decoded by manually copying out the 787 value and applying this to a subtract operation.
However, again, this causes issues if another sample utilises the same technique but with a different value.
A better method is to create another register with a regular expression that matches the 787 value.
Here we can see an example of this, where a Register has been used to locate and store the 787 value inside of $R0. This can later be referenced in a subtract operation by referencing $R0.
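As a rough Python equivalent of this register-plus-subtract step, the sketch below captures the subtraction value with a regex and applies it to every integer. The `stage` line is a made-up stand-in for the real script, and the two patterns are assumptions chosen to fit it.

```python
import re

# Hypothetical first-stage line: each integer is an ASCII code plus 787.
stage = "$s = (859,888,895,895,898) | % { [char]($_ - 787) }"

# "Register": capture the integer that follows the minus sign.
r0 = int(re.search(r"-\s*(\d+)", stage).group(1))

# Pull out the comma-separated array, then subtract the register from each value.
array = re.search(r"\(([\d,]+)\)", stage).group(1)
decoded = "".join(chr(int(value) - r0) for value in array.split(","))

print(decoded)
```

Nothing in the decoding logic mentions 787 directly, so a sample using a different offset would decode without changes.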
Regular Expressions
Regular expressions are frustrating, tedious and difficult to learn. But they are extremely powerful, and you should absolutely learn them to improve your CyberChef and malware analysis capability.
In the development of this configuration extractor, regular expressions are applied in 10 separate operations.
Regular Expressions – Use Case 1 (Registers)
The first use of regular expressions is inside of the initial register operation.
Here, we have applied a regex to extract a key value used later as part of the deobfuscation process.
The key use of regex here is to generically capture keys related to the decoding process, avoiding the need to hardcode values and allowing the recipe to work across multiple samples.
How To Use Regular Expressions to Isolate Text
The second use of regular expressions in this recipe is to isolate the main array of integers containing the second stage of the malware.
The second stage is stored inside a large array of decimal values separated by commas and contained in round brackets.
By specifying this inside of a regex, we can extract and isolate the large array and effectively ignore the rest of the code. This is in contrast to manually copying out the array and starting a new recipe.
A key benefit here is the ability to isolate portions of the code without needing to copy and paste. This enables you to continue working inside of the same recipe.
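A minimal Python sketch of this isolation step is shown here. The `script` string is hypothetical, and the minimum of six integers is an arbitrary threshold picked so that short bracketed values are skipped.

```python
import re

# Hypothetical script: a short call, then the large array of interest.
script = "Start-Sleep(5); $data = (859,888,895,895,898,820); Invoke-Stage($data)"

# Round brackets containing at least six comma-separated integers.
# The minimum count is an arbitrary threshold chosen for this example.
ARRAY_PATTERN = re.compile(r"\(((?:\d+,){5,}\d+)\)")

isolated = ARRAY_PATTERN.search(script).group(1)
print(isolated)
```

Only the contents of the large array survive, which is the same effect as outputting a capture group from a regular expression operation and carrying on with the recipe.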
Regular Expressions – Use Case 3 (Appending Values)
Occasionally you will need to append values to individual lines of output.
In these cases, a regular expression can be utilised to capture an entire line with (.*) and then replace it with the same value (via the capture group referenced as $1) followed by another value (our initial register).
The key use case is the ability to easily capture and append data, which is essential for operations like Subtract, which is used later in this recipe.
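The append-then-subtract pattern might look like this in Python. Note that Python writes the capture group as `\1` where CyberChef uses `$1`, and the three input lines and the register value are illustrative assumptions.

```python
import re

register = "787"         # value previously captured into a register
lines = "859\n888\n895"  # one integer per line

# Capture each whole line and re-emit it followed by the register value.
appended = re.sub(r"^(.+)$", rf"\1 {register}", lines, flags=re.MULTILINE)

# Each line is now "value key", so the subtraction can run line by line.
decoded = "".join(
    chr(int(value) - int(key))
    for value, key in (line.split() for line in appended.splitlines())
)

print(appended)
print(decoded)
```

The sketch uses `(.+)` with line anchors rather than a bare `(.*)`, since in Python an unanchored `(.*)` also matches the empty string at the end of each line and would append the value twice.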
Regular Expressions – Use Case 4 (Extracting Encryption Keys)
We can utilise regular expressions inside of register operations to extract encryption keys and store these inside of variables.
Here, we can see the 44-character AES key stored inside of the $R1 register.
This is effective as the key is stored in a particular format across samples. Leveraging regex allows us to capture this format (44 char base64 inside single quotes) without needing to worry about the exact value.
Using Regular Expressions To Extract Base64 Text
Regular expressions can be used to isolate base64 text containing content of interest.
This particular sample stores the final malware stage inside of a large AES Encrypted and Base64 encoded blob.
Since we have already extracted the AES key via registers, we can apply the regex to isolate the primary base64 blob and later perform the AES Decryption.
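One way to picture this step in Python is to match only long runs of base64 characters, so the short key is skipped and the large blob is kept. Both values below are placeholders, and the 100-character threshold is an assumption that simply needs to exceed the key length.

```python
import base64
import re

# Placeholder data; neither value comes from the real sample.
key = base64.b64encode(bytes(range(32))).decode()    # 44 characters
blob = base64.b64encode(bytes(range(120))).decode()  # 160 characters
script = f"$key = '{key}'; $payload = '{blob}'"

# Runs of 100+ base64 characters, with optional trailing padding.
BLOB_PATTERN = re.compile(r"[A-Za-z0-9+/]{100,}={0,2}")

isolated = BLOB_PATTERN.search(script).group(0)
print(len(isolated))
```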
Regular Expressions – Use Case 6 (Extracting Initial Characters)
This sample utilises the first 16 bytes of the base64 decoded content to create an IV for the AES decryption.
We can leverage regular expressions and registers to extract the first 16 bytes of the decoded content using .{16}.
This enables us to capture the IV and later reference it via a register to perform the AES Decryption.
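A small Python sketch of the IV capture is given below, using placeholder bytes rather than the real blob. The DOTALL flag is an addition for this example so that "." also matches newline bytes in binary data; how the remaining bytes are treated depends on the sample and is not shown.

```python
import base64
import re

# Placeholder blob: 48 bytes 0x00..0x2f, base64 encoded (not sample data).
blob = base64.b64encode(bytes(range(48)))
decoded = base64.b64decode(blob)

# .{16} anchored at the start; DOTALL lets "." match newline bytes too.
iv = re.match(rb".{16}", decoded, flags=re.DOTALL).group(0)

# Hex is a convenient form for pasting into an IV field.
print(iv.hex())
```

For raw bytes in Python, the slice `decoded[:16]` gives the same result; the regex form is shown because it mirrors how a register captures the value.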
Using Regular Expressions To Remove Trailing Null-Bytes
Regular expressions can be used to remove trailing null bytes from the end of data.
This is particularly useful when we only want to remove null bytes at the "end" of data, whereas a traditional "Remove null bytes" operation will remove null bytes everywhere in the code.
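The difference between the two approaches can be sketched in Python with made-up bytes that contain one embedded null and three trailing ones.

```python
import re

data = b"con\x00fig\x00\x00\x00"  # made-up bytes with an embedded null

# Anchored pattern: strips only the run of nulls at the very end.
trailing_only = re.sub(rb"\x00+\Z", b"", data)

# Blanket removal: also destroys the embedded null.
everywhere = data.replace(b"\x00", b"")

print(trailing_only)
print(everywhere)
```

The anchored version leaves the embedded null byte in place, which matters when nulls inside the data are meaningful.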
In the sample here, there are trailing null bytes that are breaking a portion of the decryption process.
By applying a null byte search