Introduction
In late 2023 and early 2024, the NCC Group Hardware and Embedded Systems practice undertook an engagement to reverse engineer baseband firmware on several smartphones. This included MediaTek 5G baseband firmware based on the nanoMIPS architecture. While we were aware of some nanoMIPS modules for Ghidra having been developed in private, there was no publicly available reliable option for us to use at the time, which led us to develop our own nanoMIPS disassembler and decompiler module for Ghidra.
In the interest of time, we focused on implementing the features and instructions that we encountered on actual baseband firmware, and left complex P-Code instruction emulation unimplemented where it was not yet needed. Though the module is a work in progress, it still decompiles the majority of the baseband firmware we’ve analyzed. Combined with debug symbol information included with some MediaTek firmware, it has been very helpful in the reverse engineering process.
Here we will demonstrate how to load a MediaTek baseband firmware into Ghidra for analysis with our nanoMIPS ISA module.
Target firmware
For an example firmware to analyze, we looked up phones likely to include a MediaTek SoC with 5G support. Some relatively recent Motorola models were good candidates. (These devices were not part of our client engagement.)
We found many Android firmware images on https://mirrors.lolinet.com/firmware/lenomola/, including an image for the Motorola Moto Edge 2022, codename Tesla: https://mirrors.lolinet.com/firmware/lenomola/tesla/official/. This model is based on a MediaTek Dimensity 1050 (MT6879) SoC.
There are some carrier-specific variations of the firmware. We’ll randomly choose XT2205-1_TESLA_TMO_12_S2ST32.71-118-4-2-6_subsidy-TMO_UNI_RSU_QCOM_regulatory-DEFAULT_cid50_R1_CFC.xml.zip.
Extracting nanoMIPS firmware
The actual nanoMIPS firmware is in the md1img.img file from the Zip package.
To extract the contents of the md1img file, we wrote some Kaitai Struct definitions with simple Python wrapper scripts that run the structure parsing and write each section to an individual file. The ksy Kaitai definitions can also be used to interactively explore these files with the Kaitai IDE.
Running md1_extract.py with an --outdir option will extract the files contained within md1img.img:
$ ./md1_extract.py ../XT2205-1_TESLA_TMO_12_S2STS32.71-118-4-2-6-3_subsidy-TMO_UNI_RSU_QCOM_regulatory-DEFAULT_cid50_CFC/md1img.img --outdir ./md1img_out/
extracting files to: ./md1img_out
md1rom: addr=0x00000000, size=43084864
  extracted to 000_md1rom
cert1md: addr=0x12345678, size=1781
  extracted to 001_cert1md
cert2: addr=0x12345678, size=988
  extracted to 002_cert2
md1drdi: addr=0x00000000, size=12289536
  extracted to 003_md1drdi
cert1md: addr=0x12345678, size=1781
  extracted to 004_cert1md
cert2: addr=0x12345678, size=988
  extracted to 005_cert2
md1dsp: addr=0x00000000, size=6776460
  extracted to 006_md1dsp
cert1md: addr=0x12345678, size=1781
  extracted to 007_cert1md
cert2: addr=0x12345678, size=988
  extracted to 008_cert2
md1_filter: addr=0xffffffff, size=300
  extracted to 009_md1_filter
md1_filter_PLS_PS_ONLY: addr=0xffffffff, size=300
  extracted to 010_md1_filter_PLS_PS_ONLY
md1_filter_1_Moderate: addr=0xffffffff, size=300
  extracted to 011_md1_filter_1_Moderate
md1_filter_2_Standard: addr=0xffffffff, size=300
  extracted to 012_md1_filter_2_Standard
md1_filter_3_Slim: addr=0xffffffff, size=300
  extracted to 013_md1_filter_3_Slim
md1_filter_4_UltraSlim: addr=0xffffffff, size=300
  extracted to 014_md1_filter_4_UltraSlim
md1_filter_LowPowerMonitor: addr=0xffffffff, size=300
  extracted to 015_md1_filter_LowPowerMonitor
md1_emfilter: addr=0xffffffff, size=2252
  extracted to 016_md1_emfilter
md1_dbginfodsp: addr=0xffffffff, size=1635062
  extracted to 017_md1_dbginfodsp
md1_dbginfo: addr=0xffffffff, size=1332720
  extracted to 018_md1_dbginfo
md1_mddbmeta: addr=0xffffffff, size=899538
  extracted to 019_md1_mddbmeta
md1_mddbmetaodb: addr=0xffffffff, size=562654
  extracted to 020_md1_mddbmetaodb
md1_mddb: addr=0xffffffff, size=12280622
  extracted to 021_md1_mddb
md1_mdmlayout: addr=0xffffffff, size=8341403
  extracted to 022_md1_mdmlayout
md1_file_map: addr=0xffffffff, size=889
  extracted to 023_md1_file_map
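For readers who want to see roughly what such an extractor does, here is a minimal Python sketch. It is not the md1_extract.py implementation (which is built on Kaitai definitions). The section layout it walks is an assumption that the article does not state: a 512-byte header holding a magic value, data size, 32-byte name and load address, followed by data padded to 16 bytes. The constants and the helper names iter_sections and make_section are hypothetical; check them against the ksy definitions before relying on them. The demo runs on synthetic data only.

```python
import struct
from typing import Iterator, NamedTuple

# ASSUMED layout (not confirmed by the article): 512-byte header, then data,
# then padding up to the next 16-byte boundary.
HEADER_SIZE = 0x200
MAGIC = 0x58881688
ALIGN = 16


class Section(NamedTuple):
    name: str
    addr: int
    data: bytes


def iter_sections(image: bytes) -> Iterator[Section]:
    """Yield sections until the magic stops matching or the image ends."""
    offset = 0
    while offset + HEADER_SIZE <= len(image):
        magic, size = struct.unpack_from("<II", image, offset)
        if magic != MAGIC:
            break
        raw_name = image[offset + 8:offset + 40]
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", "replace")
        (addr,) = struct.unpack_from("<I", image, offset + 40)
        start = offset + HEADER_SIZE
        yield Section(name, addr, image[start:start + size])
        # Skip the data and round up to the next aligned header.
        offset = (start + size + ALIGN - 1) & ~(ALIGN - 1)


def make_section(name: bytes, addr: int, data: bytes) -> bytes:
    """Build a synthetic section in the assumed layout (demo only)."""
    header = struct.pack("<II32sI", MAGIC, len(data), name, addr)
    blob = header.ljust(HEADER_SIZE, b"\x00") + data
    return blob.ljust((len(blob) + ALIGN - 1) & ~(ALIGN - 1), b"\x00")


demo = (make_section(b"md1rom", 0, b"\xaa" * 20)
        + make_section(b"cert2", 0x12345678, b"\xbb" * 5))
sections = list(iter_sections(demo))
for index, sec in enumerate(sections):
    print("%s: addr=0x%08x, size=%d -> %03d_%s"
          % (sec.name, sec.addr, len(sec.data), index, sec.name))
```

On a real image, each yielded section's data would be written to a numbered file, as in the output listing above.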
The most relevant files are:
- md1rom is the nanoMIPS firmware image
- md1_file_map provides slightly more context on the md1_dbginfo file: its original filename is DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
- md1_dbginfo is an XZ compressed binary file containing debug information for md1rom, including symbols
Extracting debug symbols
md1_dbginfo is another binary file format containing symbols and filenames with associated addresses. We’ll rename it and decompress it based on the filename from md1_file_map:
$ cp 018_md1_dbginfo DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
$ unxz DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
$ hexdump -C DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31 | head
00000000  43 41 54 49 43 54 4e 52  01 00 00 00 98 34 56 00  |CATICTNR.....4V.|
00000010  43 41 54 49 01 00 00 00  00 00 00 00 4e 52 31 36  |CATI........NR16|
00000020  2e 52 32 2e 4d 54 36 38  37 39 2e 54 43 32 2e 50  |.R2.MT6879.TC2.P|
00000030  52 31 2e 53 50 00 4d 54  36 38 37 39 5f 53 30 30  |R1.SP.MT6879_S00|
00000040  00 4d 54 36 38 37 39 5f  4e 52 31 36 2e 54 43 32  |.MT6879_NR16.TC2|
00000050  2e 50 52 31 2e 53 50 2e  56 31 37 2e 50 33 38 2e  |.PR1.SP.V17.P38.|
00000060  30 33 2e 32 34 2e 30 33  52 00 32 30 32 33 2f 30  |03.24.03R.2023/0|
00000070  35 2f 31 39 20 32 32 3a  33 31 00 73 00 00 00 2b  |5/19 22:31.s...+|
00000080  ed 53 00 49 4e 54 5f 56  65 63 74 6f 72 73 00 4c  |.S.INT_Vectors.L|
00000090  08 00 00 54 08 00 00 62  72 6f 6d 5f 65 78 74 5f  |...T...brom_ext_|
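The start of this file can be read with a few lines of Python. The sketch below is based only on what the hexdump shows: an 8-byte CATICTNR magic, and four NUL-terminated strings beginning at offset 0x1c. It does not interpret the integer fields in between, whose meaning is unknown here. The function names (read_cstrings, parse_header, load_dbginfo) are hypothetical and this is not the mtk_dbg_extract.py implementation.

```python
import lzma
import struct

MAGIC = b"CATICTNR"
STRINGS_OFFSET = 0x1C  # where the first string starts in the dump above


def read_cstrings(buf: bytes, offset: int, count: int):
    """Read `count` NUL-terminated ASCII strings; return them and the end offset."""
    out = []
    for _ in range(count):
        end = buf.index(b"\x00", offset)
        out.append(buf[offset:end].decode("ascii"))
        offset = end + 1
    return out, offset


def parse_header(buf: bytes):
    """Check the magic and return the four header strings seen in the dump."""
    if buf[:8] != MAGIC:
        raise ValueError("not a CATICTNR debug-info file")
    return read_cstrings(buf, STRINGS_OFFSET, 4)


def load_dbginfo(path: str) -> bytes:
    """Decompress the XZ container in memory (same effect as unxz)."""
    with lzma.open(path, "rb") as handle:
        return handle.read()


# First 0x7b bytes of the hexdump, rebuilt by hand for a self-contained demo.
sample = (
    b"CATICTNR" + struct.pack("<II", 1, 0x00563498)
    + b"CATI" + struct.pack("<II", 1, 0)
    + b"NR16.R2.MT6879.TC2.PR1.SP\x00"
    + b"MT6879_S00\x00"
    + b"MT6879_NR16.TC2.PR1.SP.V17.P38.03.24.03R\x00"
    + b"2023/05/19 22:31\x00"
)
header_strings, strings_end = parse_header(sample)
print(header_strings, hex(strings_end))
```

With a real file, `parse_header(load_dbginfo("018_md1_dbginfo"))` would skip the separate cp and unxz steps. The count of four strings is inferred from this one sample and may differ in other firmware builds.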
To extract information from the debug info file, we made another Kaitai definition and wrapper script that extracts symbols and outputs them in a text format compatible with Ghidra’s ImportSymbolsScript.py script:
$ ./mtk_dbg_extract.py md1img_out/DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31 | tee dbg_symbols.txt
INT_Vectors 0x0000084c l
brom_ext_main 0x00000860 l
INT_SetPLL_Gen98 0x00000866 l
PLL_Set_CLK_To_26M 0x000009a2 l
PLL_MD_Pll_Init 0x000009da l
INT_SetPLL 0x000009dc l
INT_Initialize_Phase1 0x027b5c80 l
INT_Initialize_Phase2 0x027b617c l
init_cm 0x027b6384 l
init_cm_wt 0x027b641e l
...
(Currently the script is set to only output label definitions rather than function definitions, as it was unknown if all of the symbols were for functions.)
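As a rough illustration of the conversion, the sketch below turns symbol entries into lines in the "name address type" form shown above. The entry layout is an assumption inferred from the hexdump (a NUL-terminated name followed by two little-endian 32-bit values, the first matching the printed address). The meaning of the second value and the offset where the table starts are not established here, so the offset is left as a parameter. The names iter_symbols and to_import_line are hypothetical.

```python
import struct


def iter_symbols(buf: bytes, offset: int, limit: int):
    """Yield (name, first, second) for up to `limit` entries from `offset`.

    ASSUMED entry layout: NUL-terminated ASCII name, then two
    little-endian 32-bit values.
    """
    for _ in range(limit):
        end_of_name = buf.find(b"\x00", offset)
        if end_of_name < 0 or end_of_name + 9 > len(buf):
            return
        name = buf[offset:end_of_name].decode("ascii", "replace")
        first, second = struct.unpack_from("<II", buf, end_of_name + 1)
        yield name, first, second
        offset = end_of_name + 9


def to_import_line(name: str, addr: int, kind: str = "l") -> str:
    """Format one '<name> <address> <l|f>' line ('l' label, 'f' function)."""
    return "%s 0x%08x %s" % (name, addr, kind)


# First entry copied from the hexdump; the second entry's trailing value
# (0x866) is a made-up placeholder for the demo.
sample = (
    b"INT_Vectors\x00" + struct.pack("<II", 0x84C, 0x854)
    + b"brom_ext_main\x00" + struct.pack("<II", 0x860, 0x866)
)
lines = [to_import_line(name, first) for name, first, _ in iter_symbols(sample, 0, 10)]
print("\n".join(lines))
```

Passing `kind="f"` would emit function definitions instead of labels, for symbols known to be functions.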
Loading nanoMIPS firmware into Ghidra
Install the extension
First, we’ll have to install the nanoMIPS module for Ghidra. In the main Ghidra window, go to “File > Install Extensions”, click the “Add Extension” plus button, and select the module Zip file (e.g., ghidra_11.0.3_PUBLIC_20240424_nanomips.zip). Then restart Ghidra.
Initial loading
Load md1rom as a raw binary image. Select 000_md1rom from the md1img.img extract directory and keep “Raw Binary” as the format. For Language, click the “Browse” ellipsis and find the little endian 32-bit nanoMIPS option (nanomips:LE:32:default) using the filter, then click OK.
We’ll load the image at offset 0 so no further options are necessary. Click OK again to load the raw binary.
When Ghidra asks if you want to do an initial auto-analysis, select No. We have to set up a mirrored memory address space at 0x90000000 first.
Memory mapping
Open the “Memory Map” window and click plus for “Add Memory Block”.
We’ll name the new block “mirror”, set the starting address to ram:90000000, the length to match the length of the base image “ram” block (0x2916c40), permissions to read and execute, and the “Block Type” to “Byte Mapped” with a source address of 0 and mapping ratio of 1:1.
Also change the permissions for the original “ram” block to just read and execute. Save the memory map changes and close the “Memory Map” window.
Note that this memory map is incomplete; it’s just the minimal setup required to get disassembly working.
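The same memory map change could also be scripted. The following is a sketch of a Ghidra Python script, assuming the image was loaded at offset 0 in the default address space as described above. The helper name add_mirror_block is hypothetical. It relies on the Ghidra Memory and MemoryBlock methods getBlock, createByteMappedBlock and setPermissions, which should be checked against the API documentation of the installed Ghidra version.

```python
# Sketch of a Ghidra script. `currentProgram` is injected by Ghidra's script
# environment; outside Ghidra this file only defines the helper.

MIRROR_BASE = 0x90000000


def add_mirror_block(program, name="mirror", base=MIRROR_BASE):
    """Create a 1:1 byte-mapped, read/execute view of the block at address 0."""
    memory = program.getMemory()
    space = program.getAddressFactory().getDefaultAddressSpace()
    source = memory.getBlock(space.getAddress(0))  # the raw "ram" block
    length = source.getSize()
    mirror = memory.createByteMappedBlock(
        name, space.getAddress(base), source.getStart(), length, False)
    mirror.setPermissions(True, False, True)  # read, no write, execute
    source.setPermissions(True, False, True)
    return mirror


try:
    program = currentProgram  # defined only when run from the Script Manager
except NameError:
    program = None

if program is not None:
    add_mirror_block(program)
```

Taking the length from the existing block avoids hard-coding 0x2916c40, which is specific to this firmware image.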
Debug symbols
Next, we’ll load up the debug symbols. Open the Script Manager window and search for ImportSymbolsScript.py. Run the script and select the text file generated by mtk_dbg_extract.py earlier (dbg_symbols.txt). This will create a bunch of labels, most of them in the mirrored address space.
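Before importing, it can be useful to check where the symbols land relative to the memory map. This is a small sketch, assuming the 0x90000000 mirror base and the 0x2916c40 block length from the steps above. The function names are hypothetical, and the last two sample lines are invented for illustration rather than taken from the firmware.

```python
MIRROR_BASE = 0x90000000
IMAGE_SIZE = 0x2916C40  # length of the "ram" block from the Memory Map step


def parse_symbol_line(line):
    """Split '<name> <address> <kind>' into (name, address, kind)."""
    name, addr, kind = line.split()
    return name, int(addr, 16), kind


def image_offset(addr):
    """Map a direct or mirrored address to an offset in md1rom, else None."""
    if 0 <= addr < IMAGE_SIZE:
        return addr
    if MIRROR_BASE <= addr < MIRROR_BASE + IMAGE_SIZE:
        return addr - MIRROR_BASE
    return None


def summarize(lines):
    """Count symbols per region: 'ram', 'mirror', or 'unmapped'."""
    counts = {"ram": 0, "mirror": 0, "unmapped": 0}
    for line in lines:
        if not line.strip():
            continue
        _, addr, _ = parse_symbol_line(line)
        if 0 <= addr < IMAGE_SIZE:
            counts["ram"] += 1
        elif image_offset(addr) is not None:
            counts["mirror"] += 1
        else:
            counts["unmapped"] += 1
    return counts


sample_lines = [
    "INT_Vectors 0x0000084c l",            # from the extractor output
    "INT_Initialize_Phase1 0x027b5c80 l",  # from the extractor output
    "example_mirrored 0x90000860 l",       # invented for illustration
    "example_unmapped 0xa0000000 l",       # invented for illustration
]
counts = summarize(sample_lines)
print(counts)
```

A large "unmapped" count on a real symbol file would suggest that more memory blocks are needed than the minimal map described here.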
Disassembly
Now we can begin disassembly. There is a jump instruction at address 0 that will get us started, so just select the byte at address 0 and press “d” or right-click and choose “Disassemble”. Thanks to the debug symbols, you may notice this instruction jumps to the INT_Initialize_Phase1 function.
Flow-based disassembly will now start to discover a bunch of code. The initial disassembly can take several minutes to complete.
Then we can run the normal auto-analysis with “Analysis > Auto Analyze…”. This should also discover more code and spend several minutes in disassembly and decompilation. We’ve found that the “Non-Returning Functions” analyzer creates many false positives with the default configuration in these firmware images, which disrupts the code flow, so we recommend disabling it for initial analysis.
The one-shot “Decompiler Parameter ID” analyzer is a good option to run next for better detection of function input types.
Conclusion
Although the module is still a work in progress, the results are already quite usable for analysis and allowed us to reverse engineer some critical features in baseband processors.
The nanoMIPS Ghidra module and MediaTek binary file unpackers can be found on our GitHub at:
Source: Original Post
MITRE TTP
Software Discovery (T1518):
- The process involves discovering and identifying firmware from MediaTek that runs on nanoMIPS architecture. This includes analyzing various carrier-specific firmware versions for the MediaTek SoC.
Reverse Engineering (T1587):
- Fundamental to this project is the reverse engineering of baseband firmware. This includes disassembling and decompiling the firmware to understand its functionality and structure, which is crucial for identifying potential vulnerabilities and understanding the baseband processor’s operation.
Exploitation for Evasion (T1620):
- While not described as a malicious action in this write-up, the techniques developed and used could potentially be applied in scenarios where evasion of security measures in embedded systems is required. This would typically involve understanding how firmware interacts with hardware to circumvent or disable security features.
Develop Capabilities (T1588):
- The development of a new nanoMIPS disassembler and decompiler module for Ghidra to handle the specific firmware structures encountered is a key capability developed during this engagement.
Supply Chain Compromise (T1195):
- Analyzing and potentially modifying firmware can lead to supply chain risks, where compromised firmware could be injected into the distribution channels, affecting the integrity of the devices.
Hardware Reverse Engineering (T0202):
- This is a specific technique under the MITRE Mobile ATT&CK matrix, focusing on the reverse engineering of hardware components and firmware. It is especially relevant when dealing with the baseband processors and understanding their interaction with mobile network functionalities.
Process Injection (T1055):
- This may be applicable if the reverse engineering findings were used to modify the firmware to inject malicious code into legitimate processes to hide or facilitate further malicious actions.
Software Modification (T1576):
- Modifying the baseband firmware, whether for legitimate or malicious purposes, falls under this technique. In security research, this might be aimed at improving security or developing patches, while in an adversarial context, it might be used to implant backdoors or disable security features.