Written by:
Bernardo Quintero, Founder of VirusTotal and Security Director, Google Cloud Security
Alex Berry, Security Manager of the Mandiant FLARE Team, Google Cloud Security
Ilfak Guilfanov, author of IDA Pro and CTO, Hex-Rays
Vijay Bolina, Chief Information Security Officer & Head of Cybersecurity Research, Google DeepMind
Executive Summary
- Following up on our Gemini 1.5 Pro for malware analysis post, this time we tested whether our lightweight Gemini 1.5 Flash model is capable of large-scale malware dissection.
- The Gemini 1.5 Flash model was created to optimize efficiency and speed while maintaining performance, which allows us to utilize Gemini 1.5 Flash to process up to 1,000 requests per minute and 4 million tokens per minute.
- To evaluate the real-world performance of our malware analysis pipeline, we analyzed 1,000 Windows executables and DLLs randomly selected from VirusTotal’s incoming stream. The system effectively resolved cases of false positives, samples with obfuscated code, and malware with zero detections on VirusTotal.
- On average, Gemini 1.5 Flash processed each file in 12.72 seconds (excluding the unpacking and decompilation stages), providing accurate summary reports in human-readable language.
Introduction
In our previous post, we explored how Gemini 1.5 Pro could be used to automate the reverse engineering and code analysis of malware binaries. Now, we’re focusing on Gemini 1.5 Flash, Google’s new lightweight and cost-effective model, to transition that analysis from the lab to a production-ready system capable of large-scale malware dissection. With the ability to handle 1 million tokens, Gemini 1.5 Flash offers impressive speed and can manage large workloads. To support this, we’ve built an infrastructure on Google Compute Engine, incorporating a multi-stage workflow that includes scaled unpacking and decompilation stages. While promising, this is just the first step on a long journey to overcome accuracy challenges and unlock AI’s full potential in malware analysis.
VirusTotal analyzes an average of 1.2 million unique new files each day, ones that have never been seen before on the platform. Nearly half of these are binary files (PE_EXE, PE_DLL, ELF, MACH_O, APK, etc.) that could benefit from reverse engineering and code analysis. Traditional, manual methods simply cannot keep pace with this volume of new threats. Building a system to automatically unpack, decompile, and analyze this quantity of code in a timely and efficient manner is a significant challenge, one that Gemini 1.5 Flash is designed to help address.
Building on the extensive capabilities of Gemini 1.5 Pro, the Gemini 1.5 Flash model was created to optimize efficiency and speed while maintaining performance. Both models share the same robust, multimodal capabilities and are capable of handling a context window of over 1 million tokens; however, Gemini 1.5 Flash is particularly designed for rapid inference and cost-effective deployment. This is achieved through parallel computation of attention and feedforward components, as well as the use of online distillation techniques. The latter enables Flash to learn directly from the larger and more complex Pro model during training. These architectural optimizations allow us to utilize Gemini 1.5 Flash to process up to 1,000 requests per minute and 4 million tokens per minute.
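To make those two quotas concrete, the following is a minimal sketch of a client-side limiter that tracks both requests and tokens within a one-minute window. The `MinuteBudget` class, its fixed-window strategy, and the caller-supplied token counts are illustrative assumptions, not a description of how the pipeline or Vertex AI enforces limits.

```python
from dataclasses import dataclass


@dataclass
class MinuteBudget:
    """Fixed-window limiter for requests and tokens per minute (hypothetical helper)."""

    max_requests: int = 1_000      # requests per minute, figure quoted in the post
    max_tokens: int = 4_000_000    # tokens per minute, figure quoted in the post
    window_start: float = 0.0      # timestamp (seconds) at which the current window began
    requests_used: int = 0
    tokens_used: int = 0

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Record the request and return True if it fits in the current window."""
        if now - self.window_start >= 60.0:
            # A new one-minute window begins: reset both counters.
            self.window_start = now
            self.requests_used = 0
            self.tokens_used = 0
        if self.requests_used + 1 > self.max_requests:
            return False  # request quota exhausted for this window
        if self.tokens_used + tokens > self.max_tokens:
            return False  # token quota exhausted for this window
        self.requests_used += 1
        self.tokens_used += tokens
        return True
```

A caller would estimate the token count of each decompiled file, call `try_acquire` with a monotonic timestamp, and delay or queue the request whenever it returns `False`.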
To illustrate how this pipeline works, we’ll first showcase examples of Gemini 1.5 Flash analyzing decompiled binaries. Then we’ll briefly outline the preceding steps of unpacking and decompilation at scale.
Analysis Speed and Examples
To evaluate the real-world performance of our malware analysis pipeline, we analyzed 1,000 Windows executables and DLLs randomly selected from VirusTotal’s incoming stream. This selection ensured a diverse range of samples, encompassing both legitimate software and various types of malware. The first thing that struck us was the speed of Gemini 1.5 Flash. This aligns with the performance benchmarks highlighted in the Google Gemini team’s paper, where Gemini 1.5 Flash consistently outperformed other large language models in terms of text generation speed across multiple languages.
The fastest processing time we observed was a mere 1.51 seconds, while the slowest was 59.60 seconds. On average, Gemini 1.5 Flash processed each file in 12.72 seconds. It’s important to note that these times exclude the unpacking and decompilation stages, which we’ll explore further later in this blog post.
These processing times are influenced by factors such as the size and complexity of the input code and the length of the resulting analysis. Importantly, these measurements cover the entire round trip of the model call: from sending the decompiled code to the Gemini 1.5 Flash API on Vertex AI, through the model’s analysis, to receiving the complete response back on our Google Compute Engine instance. This perspective highlights the low latency achievable with Gemini 1.5 Flash in real-world production scenarios.
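As a back-of-the-envelope illustration, the average latency and the request quota together suggest how many requests would need to be in flight to saturate that quota. The sketch below applies Little's law (average concurrency = arrival rate × average time in system). It assumes latency stays at the measured average under load, which the measurements above do not establish, and the function names are hypothetical.

```python
def in_flight_requests(requests_per_minute: float, avg_latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate (per second) * average latency."""
    return requests_per_minute / 60.0 * avg_latency_s


def sequential_hours(n_files: int, avg_latency_s: float) -> float:
    """Wall-clock hours needed if files were analyzed strictly one at a time."""
    return n_files * avg_latency_s / 3600.0


# Figures taken from the post: 1,000 requests per minute, 12.72 s average per file.
concurrency = in_flight_requests(1_000, 12.72)   # about 212 concurrent requests
one_by_one = sequential_hours(1_000, 12.72)      # about 3.5 hours for 1,000 files
```

Under these assumptions, roughly 212 concurrent requests would be needed to use the full request quota, while analyzing the 1,000 test files one at a time would take about three and a half hours.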
Example 1: Dispelling a False Positive in 1.51 Seconds
Out of the 1,000 binaries we analyzed, this one was processed the fastest, highlighting the remarkable speed of Gemini 1.5 Flash. The file goopdate.dll (103.52 KB) triggered a single anti-virus detection on VirusTotal, a common occurrence that often requires time-consuming manual review.
Imagine this file triggered an alert in your SIEM system and you need answers fast. Gemini 1.5 Flash delivers, analyzing the decompiled code in just 1.51 seconds and providing a clear explanation: the file is a simple executable launcher for the “BraveUpdate.exe” application, likely a web browser component. This rapid, code-level insight allows analysts to confidently dismiss the alert as a false positive, preventing unnecessary escalation and saving valuable time and resources.
Example 2: Resolving Another False Positive
In another example, the file BootstrapPackagedGame-Win64-Shipping.exe (302.50 KB) was flagged by two anti-virus engines on VirusTotal, again requiring further scrutiny.
Gemini 1.5 Flash analyzes the decompiled code in just 4.01 seconds, revealing that the file is a game launcher. Gemini details the sample’s functionality, which includes checking for prerequisites like Microsoft Visual C++ Runtime and DirectX, locating and executing redistributable installers, and ultimately launching the main game executable. This level of understanding allows analysts to confidently categorize the file as legitimate, avoiding unnecessary time and effort spent on a potential false positive.
Example 3: Longest Processing with Obfuscated Code
The file svrwsc.exe (5.91 MB) stood out during our analysis for requiring the longest processing time: 59.60 seconds. Factors such as the size of the decompiled code and the presence of obfuscation techniques like XOR encryption likely contributed to the longer analysis time. Nevertheless, Gemini 1.5 Flash completed its analysis in less than a minute. This is a notable achievement, considering that manually reverse engineering such a binary could take a human analyst several hours.
Gemini correctly determined the sample to be malicious and pinpointed its backdoor functionality, which is designed to exfiltrate data and connect to command-and-control (C2) servers located on Russian domains. The analysis delivers a wealth of IOCs such as potential C2 server URLs, mutexes used for process synchronization, altered registry keys, and suspicious file names. This information enables security teams to swiftly investigate and respond to the threat.
Example 4: Cryptominer
This example shows Gemini 1.5 Flash analyzing the decompiled code of a cryptominer named colto.exe. It’s important to note that the model only receives the decompiled code as input, with no additional metadata or context from VirusTotal. In just 12.95 seconds, Gemini 1.5 Flash delivered a comprehensive analysis, identifying the malware as a cryptominer, highlighting obfuscation techniques, and extracting key IOCs, such as the download URL, file path, mining pool, and wallet address.
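Reports like this are written in natural language, so a downstream consumer may want to pull network indicators out of the text. The following is a minimal sketch of such post-processing, assuming the report is available as a plain string. The regular expressions, the `extract_iocs` and `defang` helpers, and the sample text in the usage note are all illustrative and not part of the pipeline described here.

```python
import re

# Rough patterns for illustration only; production IOC extraction needs more care.
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>)]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_iocs(report: str) -> dict[str, list[str]]:
    """Collect unique URLs and IPv4 addresses mentioned in a report."""
    # Strip trailing sentence punctuation that the URL pattern may swallow.
    urls = sorted({u.rstrip(".,;") for u in URL_RE.findall(report)})
    ips = sorted(set(IPV4_RE.findall(report)))
    return {"urls": urls, "ipv4": ips}


def defang(ioc: str) -> str:
    """Make an indicator non-clickable before sharing it (hxxp, [.])."""
    return ioc.replace("http", "hxxp", 1).replace(".", "[.]")
```

For a made-up report sentence such as "Downloads from http://example.test/payload.bin, then connects to 203.0.113.7", `extract_iocs` returns one URL and one IPv4 address, and `defang` turns the URL into `hxxp://example[.]test/payload[.]bin`.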
Example 5: Understanding Legitimate Software with Agnostic Approach
This example showcases Gemini 1.5 Flash analyzing a legitimate 3D viewer application named 3DViewer2009.exe in 16.72 seconds. Even with goodware, understanding a program’s functionality can be valuable for security purposes. It’s important to highlight that, just like in the previous examples, the model only receives the decompiled code for analysis without any additional metadata from VirusTotal, such as whether the binary is digitally signed by a trusted entity. This information is often taken into account by traditional malware detection systems, but we are adopting a code-centric approach.
Gemini 1.5 Flash successfully identifies the core purpose of the application (loading and displaying 3D models) and even recognizes the specific type of 3D data it handles (DTM). The analysis highlights the use of OpenGL for rendering, configuration file loading, and custom file classes for data management. This level of understanding could help security teams differentiate between legitimate software and malware that might attempt to mimic its behavior.
This agnostic approach to code analysis, which focuses solely on functionality, could be particularly valuable for scrutinizing digitally signed binaries, which might not always receive the same level of security analysis as unsigned files. This opens up new possibilities for identifying potentially malicious behavior, even within supposedly trusted software.
Example 6: Unmasking a Zero-Hour Keylogger
This example showcases the true power of analyzing code for malicious behavior: detecting threats that traditional security solutions miss. The executable AdvProdTool.exe (87 KB) was submitted to VirusTotal, where it evaded all anti-virus engines, sandboxes, and detection systems at the time of its initial upload and analysis. However, Gemini 1.5 Flash uncovers its true nature. In just 4.7 seconds, the model analyzes the decompiled code, identifies it as a keylogger, and even reveals the IP address and port where it exfiltrates stolen data.
The analysis highlights the code’s use of OpenSSL to establish a secure TLS connection to the IP address on port 443. Crucially, Gemini points out the suspicious use of keyboard input capture functions (GetAsyncKeyState, GetKeyState) and their connection to data transmission over the secure channel (SSL_write).
This example underscores the potential of code analysis to identify zero-hour threats that are still in early stages of development, as this keylogger appears to be. It also highlights a critical advantage of Gemini 1.5 Flash: analyzing the raw functionality of code can reveal malicious intent, even when disguised by metadata or detection evasion techniques.
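The signal described in this example, keyboard capture functions appearing alongside an encrypted send, can be illustrated with a deliberately naive static check over pseudo-C. This is not how Gemini reasons about code, and it is a swapped-in heuristic rather than anything used in the pipeline: it only tests whether both kinds of call appear anywhere in the text, not whether captured keys actually flow into `SSL_write`. The function names and the API sets below are assumptions chosen to mirror the example.

```python
import re

# API names taken from the example above; the grouping into sets is illustrative.
KEY_CAPTURE_APIS = {"GetAsyncKeyState", "GetKeyState"}
NETWORK_SEND_APIS = {"SSL_write"}

# Matches an identifier immediately followed by "(" (also catches keywords like "if").
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def called_functions(pseudo_c: str) -> set[str]:
    """Return the set of identifiers that appear in call position."""
    return set(CALL_RE.findall(pseudo_c))


def looks_like_keylogger(pseudo_c: str) -> bool:
    """True if the code calls both a key-capture API and a network-send API."""
    calls = called_functions(pseudo_c)
    return bool(calls & KEY_CAPTURE_APIS) and bool(calls & NETWORK_SEND_APIS)
```

A co-occurrence check like this would flag many benign programs and miss anything that resolves its imports dynamically, which is part of why a model that can follow the data flow between the two calls adds value.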
Workflow Overview
Our malware analysis pipeline consists of three key stages: unpacking, decompilation, and code analysis with Gemini 1.5 Flash. The first two stages are automated to run at scale. We leverage Mandiant Backscatter, our internal cloud-based malware analysis service, to dynamically unpack incoming binaries. The unpacked binaries are then processed by a cluster of Hex-Rays Decompilers running on Google Compute Engine. While Gemini is capable of analyzing both disassembled and decompiled code, we’ve opted for decompilation in our pipeline. The determining factor was that decompiled code is 5–10 times more concise than disassembled code, making it a more efficient choice given the context window limitations of large language models. This decompiled code is ultimately fed to Gemini 1.5 Flash for analysis.
By orchestrating this workflow on Google Cloud, we can process a massive number of binaries, including the entire daily influx of over 500,000 new binaries submitted to VirusTotal.
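The sequential shape of this workflow can be sketched as a chain of stages in which each stage consumes the previous stage's output. The sketch below also records per-stage timings, since the processing times reported earlier cover only the final analysis stage. The `run_pipeline` function and the stand-in stages in the usage note are hypothetical placeholders, not the Backscatter, Hex-Rays, or Gemini interfaces.

```python
import time
from typing import Any, Callable


def run_pipeline(
    sample: bytes, stages: list[tuple[str, Callable[[Any], Any]]]
) -> tuple[Any, dict[str, float]]:
    """Run the stages in order and return the final output plus per-stage seconds."""
    timings: dict[str, float] = {}
    data: Any = sample
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)  # output of one stage is the input of the next
        timings[name] = time.perf_counter() - start
    return data, timings


# Stand-in stages for illustration only; real stages would call external services.
demo_stages = [
    ("unpack", lambda blob: blob[4:]),                      # pretend to strip a packer stub
    ("decompile", lambda blob: blob.decode("ascii")),       # pretend to emit pseudo-C
    ("analyze", lambda code: f"report: {len(code)} chars"), # pretend to summarize
]
report, timings = run_pipeline(b"PACKint main(){}", demo_stages)
```

Because each stage only sees what the previous one produced, a failure early in the chain shapes everything downstream.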
Mandiant Backscatter
Our internal Mandiant Malware Analysis Backscatter Service, hosted on Google Compute Engine, provides scalable malware configuration extraction. As part of extracting configurations, Backscatter also performs malware deobfuscation, decryption, and unpacking inline with our VirusTotal pipeline to decompose the malware into artifacts. From these artifacts, configurations are extracted and the resulting IOCs are used to identify and track malware threats and actors across hundreds of malware families in our Google Threat Intelligence platform. The artifacts, including unpacked binaries, are also resubmitted back into the pipeline, allowing tools such as Gemini 1.5 Flash to perform additional processing to extend our knowledge of what operations the malware is performing with the IOCs identified in previous stages.
Hex-Rays Decompiler
Our cluster of Hex-Rays IDA Pro Decompilers, hosted on Google Compute Engine, provides the scalable decompilation power necessary for this pipeline. We leverage the new IDA LIB, a headless version of IDA Pro designed for automated workflows, which is scheduled for release in Q3 2024. The cluster seamlessly integrates with our pipeline, reading unpacked binaries from a Google Cloud Pub/Sub queue fed by Mandiant Backscatter. The resulting decompiled pseudo-C code is then stored in a Google Cloud Storage bucket, ready for analysis by Gemini 1.5 Flash. Currently, each node in the cluster can decompile more than 3,000 files per hour, ensuring we can keep pace with the high volume of incoming binaries.
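As a rough capacity check, the per-node decompilation rate can be set against the daily volume of new binaries mentioned earlier. The sketch below assumes files arrive evenly across the day, that each node sustains exactly 3,000 files per hour, and that each binary results in one model request. The post states "more than 3,000" and "over 500,000", so the outputs are estimates rather than figures from the post, and the helper names are hypothetical.

```python
import math


def nodes_needed(files_per_day: int, files_per_node_hour: int) -> int:
    """Minimum number of decompiler nodes to keep up with a uniform daily load."""
    return math.ceil(files_per_day / (files_per_node_hour * 24))


def requests_per_minute(files_per_day: int) -> float:
    """Average model requests per minute if each file yields one request."""
    return files_per_day / (24 * 60)


nodes = nodes_needed(500_000, 3_000)   # 500,000 / 72,000 per node-day, rounded up to 7
rpm = requests_per_minute(500_000)     # about 347 requests per minute
```

Under these assumptions, about seven nodes would cover the decompilation load, and the resulting average of roughly 347 requests per minute sits below the 1,000 requests per minute quoted for Gemini 1.5 Flash. Real traffic is bursty, so actual provisioning would need headroom beyond these averages.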
Challenges and Ongoing Development
As expected, our tests highlighted a crucial aspect of this pipeline: the performance of Gemini 1.5 Flash is heavily dependent on the quality of the preceding unpacking and decompilation stages. For instance, if the unpacking phase fails to fully unpack a new or unknown packer, the decompiler will only be able to extract the code of the packer itself, not the original program logic hidden within. In such cases, Gemini correctly reports that it’s analyzing a program performing unpacking, decryption, or deobfuscation operations, and that it won’t be able to analyze the true purpose of the code concealed by the packer.
Similarly, the quality of the decompiled code directly impacts Gemini’s ability to understand and analyze the program’s behavior. The decompiled code is the raw material for Gemini’s analysis, so any errors or inconsistencies in this code will propagate to the final report. Moreover, Gemini must also contend with various code-level obfuscation methods, including new approaches employed by attackers, requiring it to continuously adapt and improve its analysis capabilities in this evolving landscape.
This interdependence underscores the importance of continuously improving all three stages of the pipeline. A weakness in any part of this sequential workflow will directly impact the performance of the subsequent phases. Improved outputs from these stages directly translate to more successful analysis by Gemini. Therefore, our ongoing development efforts focus not only on enhancing Gemini’s analytical capabilities but also on refining the unpacking and decompilation stages to ensure they deliver the highest quality output for analysis.
On the decompilation side, we are working closely with Hex-Rays to enhance their decompiler, focusing on three key areas:
- Improved Language-Specific Structure Recognition: We aim to enhance the decompiler’s ability to recognize structures unique to specific programming languages. This includes elements like try-catch statements or class member definitions within C++, Rust, and Golang code. By adding a new semantic layer to the decompiler, we can enable it to interpret the underlying code more effectively. This leads to more accurate and readable output, ultimately benefiting Gemini’s analysis.
- More Meaningful Function and Variable Naming: Clear and descriptive names for functions and variables within the decompiled code significantly aid Gemini’s analysis. We’re exploring techniques to generate such names during the decompilation process, including the possibility of integrating Gemini for this purpose.
- Richer Contextual Information: Beyond improved decompiled code, we’re investigating methods to provide the model with richer contextual data. This might include visual representations like data flow diagrams and control flow graphs, or even a complete export of IDA Pro’s IDB. This additional information can provide valuable insights into the program’s overall structure and logic, enabling a more thorough and accurate analysis.
Google Threat Intelligence: The Next Evolution
This is just the beginning of our exploration into leveraging AI for large-scale threat analysis. We are excited to announce that these types of code analysis reports will soon be integrated into VirusTotal’s Code Insight section. This integration will provide the VirusTotal community with valuable insights into the behavior of binary files, powered by the speed and scalability of Gemini 1.5 Flash.
For an even more powerful analysis experience, we are developing an advanced version of this pipeline within Google Threat Intelligence. This implementation will leverage the capabilities of Gemini 1.5 Pro enhanced by AI agents that can use specialized malware analysis tools and correlate threat information from across Google, Mandiant, and VirusTotal. This advanced analysis will be available within our Private Scanning service, ensuring the confidentiality of the content processed. Watch our recent webinar for more on Gemini in Google Threat Intelligence.
We will continue to share our progress and new advancements in AI-driven threat analysis as we strive to make the digital world a safer place. Here at GSEC Malaga, we are dedicated to pushing the boundaries of what’s possible in cybersecurity and exploring new ways to apply AI to protect users from evolving threats.
Sample Details
The following table contains details on the binary samples discussed in this post.