This post is about exploiting CVE-2022-24834 against a Redis
container running on Alpine
Linux. CVE-2022-24834 is a vulnerability affecting the Lua cjson
module in Redis servers <=7.0.11. The bug is an integer overflow that
leads to a large copy of data, approximately 350MiB.
A colleague from NCC Group wanted to exploit this bug but found that
the public exploits didn’t work. This was ultimately due to those
exploits being written to target Ubuntu or similar distros, which use
the GNU libc library.
The target in our case was Alpine 13.8, which uses musl libc 1.2.4. The important
distinction here is that GNU libc uses the ptmalloc2 heap allocator, and
musl 1.2.4 uses its own custom allocator called mallocng. This resulted
in some interesting differences during exploitation, which I figured I
would document since there’s not a lot of public information about
targeting the musl heap.
I highly recommend reading Ricerca Security’s original writeup,
which goes into depth about the vulnerability and how they approached
exploitation on ptmalloc2. Conviso Lab’s has a README.md that
describes some improvements that they made, which is also worth a look.
There are quite a few differences between exploitation on ptmalloc2 and
mallocng, which I’ll explain as I go. I’ll try not to repeat the details
that previous research has already provided but rather focus on the
parts that differed for mallocng.
Finally, I want to note that I am not attacking the musl mallocng
allocator by corrupting its metadata, but rather I’m doing Lua-specific
exploitation on the mallocng heap, mimicking the strategy done by the
original exploit.
Lua 5.1
As the previous articles covered Lua internals in detail, I won’t
repeat that information here. Redis uses Lua 5.1, so it’s important to
refer to the specific version when reading, as Lua has undergone
significant changes across different releases. These changes include
structure layouts and the garbage collection algorithm utilized.
I would like to highlight that Lua utilizes Tagged Values to
represent various internal types such as numbers and tables. The
structure is defined as follows:
/* ** Tagged Values */ #define TValuefields Value value; int tt typedef struct lua_TValue { TValuefields; } TValue;
In this structure, tt
denotes the type, andvalue
can either be an inline value or a pointer depending
on the associated type. In Lua, a Table
serves as the
primary storage type, akin to a dictionary or list in Python. It
contains an array of TValue
structures. For simple types
like integers, value
is used directly. However, for more
complex types like nested tables, value
acts as a pointer.
For further implementation details, please refer to Lua’slobject.h
file or the aforementioned articles.
During debugging, I discovered the need to inspect Lua 5.1 objects.
The Alpine redis-server
target did not include symbols for
the static Lua library. To address this, I compiled my own version of
Lua and filtered out all function symbols to only access the structure
definitions easily. This was achieved by identifying and stripping out
all FUNC
symbols using readelf -Ws
andobjcopy --strip-symbol
.
Additionally, I came across the GdbLuaExtension,
which offers pretty printers and other functionalities for analyzing Lua
objects, albeit supporting version 5.3 only. I made some minor
modifications to enable its compatibility with Lua 5.1. These
changes enabled features like pretty printers for tables, although I
didn’t conduct exhaustive testing on the required functionalities.
This method provides a clearer analysis of objects like aTable
, presenting information in a more readable format
compared to a hexdump.
(gdb) p/x *(Table *) 0x7ffff7a05100 $2 = <lua_table> = { [1] = (TValue *) 0x7fffaf9ef620 <lua_table^> 0x7ffff4a76322, [2] = (TValue *) 0x7fffaf9ef630 <lua_table^> 0x7ffff7a051a0, [3] = (TValue *) 0x7fffaf9ef640 <lua_table^> 0x7ffff7a051f0, [4] = (TValue *) 0x7fffaf9ef650 <lua_table^> 0x7ffff7a05290, [5] = (TValue *) 0x7fffaf9ef660 <lua_table^> 0x7ffff7a052e0,
The Table
we printed shows an array ofTValue
structures, and we can see that eachTValue
in our table is referencing another table.
Musl’s Next
Generation Allocator – aka mallocng
On August 4, 2020,
musl 1.2.1 shipped a new heap algorithm called “mallocng”. This
allocator has received some good quality research in the past,
predominantly focused on CTF challenge exploitation. I didn’t find any
real-world exploitation examples, but if someone knows of some, please
let me know and I’ll update the article.
The mallocng allocator is slab-based and organizes fixed-sized
allocations (called slots) on multi-page slabs (called
groups). In general, groups are mmap()
-backed.
However, groups containing small slots may actually be less than a size
of a page, in which case the group is actually just a larger fixed-sized
slot on a larger group. The allocator not using brk()
is an
important detail as we will see later. The fixed size for a given group
is referred to as the group’s stride.
The mallocng allocator seems to be designed with security in mind,
mixing a combination of in-band metadata that contains some cookies,
with predominantly out-of-band metadata which is stored in slots on
dedicated group mappings that are prefixed with guard pages to prevent
corruption from linear overflows.
As I’m not actually going to be exploiting the allocator internals
itself, I won’t go into too much detail about the data structures. I
advise you to read pre-existing articles, which you can find in the
resource section.
There’s a useful gdb plugin called muslheap developed by
xf1les, which I made a lot of use of. xf1les also has an associated blog
post which is worth reading. At the time of writing, I have a PR open to add
this functionality to pwndbg, and hopefully will have time add some more
functionality to it afterwards.
There is one particularly interesting aspect of the allocator that I
want to go over, which is that it can adjust the starting offset of
slots inside a group across subsequent allocations, using a value it
calls the cycling offset. It only does so if the overhead of a given
slot inside the fixed size has a large enough remainder such that the
offset can be adjusted. Interestingly, in this case, because the slot we
are working in is the 0x50-stride group, and the Table
structure is 0x48 bytes, this cycling offset doesn’t apply. Since I
narrowly avoided having to deal with this, and originally thought I
would have to, I’ll still take a moment to explain what the mitigation
actually is for and what it looks like in practice.
mallocng Cycling Offset
The cycling offset is a technique used to mitigate double frees,
although it can have a negative effect on other exploitation scenarios
as well. It works by adjusting the offset of the user data part of an
allocation each time a chunk is used, wrapping back to the beginning
once the offset is larger than the slack space. The offset starts at 1
and increments each time the chunk is reused.
The idea behind mitigating a double free is that if a chunk is used
and then freed, and then re-used, the offset used for the second
allocation will not be the same as the first time, due to cycling. Then,
when it is double freed, that free will detect some in-band metadata
anomaly and fail.
The allocator goes about this offset cycling by abusing the fact that
groups have fixed-sized slots, and often the user data being allocated
will not fill up the entire space of the slot, resulting in some slack
space. If the remaining slack space in the slot is large enough, which
is calculated by subtracting both the size of the user data and the
required in-line metadata, then there are actually two in-line metadata
blocks used inside a slot. One contains an offset used to indicate the
actual start of the user data, and that user data will still have some
metadata prefixed before it.
The offset calculation is done in the enframe()
function in mallocng. Basically, each time a slot is allocated, the
offset is increased, and will wrap back around when it exceeds the size
of the slack.
To demonstrate what the cycling offset looks like in practice, I will
focus on larger-than-Table
stride groups, that have enough
slack such that the cycling offset will be used. If we review what the
stride sizes are, we see:
sizeclass | stride | sizeclass | stride | sizeclass | stride | sizeclass | stride |
---|---|---|---|---|---|---|---|
1 | 0x20 | 13 | 0x140 | 25 | 0xaa0 | 37 | 0x5540 |
2 | 0x30 | 14 | 0x190 | 26 | 0xcc0 | 38 | 0x6650 |
3 | 0x40 | 15 | 0x1f0 | 27 | 0xff0 | 39 | 0x7ff0 |
4 | 0x50 | 16 | 0x240 | 28 | 0x1240 | 40 | 0x9240 |
5 | 0x60 | 17 | 0x2a0 | 29 | 0x1540 | 41 | 0xaaa0 |
6 | 0x70 | 18 | 0x320 | 30 | 0x1990 | 42 | 0xccc0 |
7 | 0x80 | 19 | 0x3f0 | 31 | 0x1ff0 | 43 | 0xfff0 |
8 | 0x90 | 20 | 0x480 | 32 | 0x2480 | 44 | 0x12480 |
9 | 0xa0 | 21 | 0x540 | 33 | 0x2aa0 | 45 | 0x15540 |
10 | 0xc0 | 22 | 0x660 | 34 | 0x3320 | 46 | 0x19980 |
11 | 0xf0 | 23 | 0x7f0 | 35 | 0x3ff0 | 47 | 0x1fff0 |
Using a cycling offset requires an additional 4-byte in-band header
and also increases by UNIT
-sized (16-byte) increments. As
such, I think it’s unlikely for strides <= 0xf0 to have the cycling
offset applied (though I haven’t tested each). There might be some
exceptions, like if sometimes smaller allocations are placed into larger
strides rather than always allocating a new group, but I’m not sure if
that’s possible as I haven’t spent enough time studying the allocator
yet.
In light of this understanding, for the sake of demonstrating when
cycling offsets are used, we’ll look at the 0x140 stride. I allocate a
few tables, fill their arrays such that the resulting sizes are ~0x100
bytes.
I use Lua to leak the address of an outer table. Then in gdb I
analyze the array of all the tables it references, which should be of
increasing size. Let’s look at the first inner table’s array first:
pwndbg> p/x *(Table *) 0x7ffff7a945b0 $2 = <lua_table> = { [1] = (TValue *) 0x7ffff7a99880 <lua_table^> 0x7ffff7a94740, [2] = (TValue *) 0x7ffff7a99890 <lua_table^> 0x7ffff7a93d80, [3] = (TValue *) 0x7ffff7a998a0 <lua_table^> 0x7ffff7a93e70, [4] = (TValue *) 0x7ffff7a998b0 <lua_table^> 0x7ffff7a95040, [5] = (TValue *) 0x7ffff7a998c0 <lua_table^> 0x7ffff7a950e0, ... pwndbg> p/x ((Table *) 0x7ffff7a94740)->array $4 = 0x7ffff7a94e40 pwndbg> mchunkinfo 0x7ffff7a94e40 ============== IN-BAND META ============== INDEX : 2 RESERVED : 5 (Use reserved in slot end) OVERFLOW : 0 OFFSET_16 : 0x29 (group --> 0x7ffff7a94ba0) ================= GROUP ================== (at 0x7ffff7a94ba0) meta : 0x555555a69040 active_idx : 2 ================== META ================== (at 0x555555a69040) prev : 0x0 next : 0x0 mem : 0x7ffff7a94ba0 last_idx : 2 avail_mask : 0x0 (0b0) freed_mask : 0x0 (0b0) area->check : 0x8bbd98bb29552bcc sizeclass : 13 (stride: 0x140) maplen : 0 freeable : 1 Group allocation method : another groups slot Slot status map: [U]UU (from slot 2 to slot 0) (U: Inuse / A: Available / F: Freed) Result of nontrivial_free() : queue (active[13]) ================== SLOT ================== (at 0x7ffff7a94e30) cycling offset : 0x1 (userdata --> 0x7ffff7a94e40) nominal size : 0x100 reserved size : 0x2c OVERFLOW (user data) : 0 OVERFLOW (reserved) : 0 OVERFLOW (next slot) : 0
The first chunk we see under the == SLOT ==
head has a
cycling offset of 1. We can see that the slot itself starts at
0x7ffff7a94e30, but the user data does not start at the same address,
but rather 0x10-bytes further. This is due to the cycling offset *UNIT
adjustment. If we quickly look at a Table
(stride 0x50) slot, which is of a size that doesn’t allow enough slack
to use a cycling offset, we can see the difference:
pwndbg> mchunkinfo 0x7ffff7a94740 ============== IN-BAND META ============== INDEX : 11 RESERVED : 4 OVERFLOW : 0 OFFSET_16 : 0x37 (group --> 0x7ffff7a943c0) ================= GROUP ================== (at 0x7ffff7a943c0) meta : 0x555555a68ea0 active_idx : 11 ================== META ================== (at 0x555555a68ea0) prev : 0x555555a686f8 next : 0x555555a68d38 mem : 0x7ffff7a943c0 last_idx : avail_mask : 0x0 (0b00000000000) freed_mask : 0x5ac (0b10110101100) area->check : 0x8bbd98bb29552bcc sizeclass : 4 (stride: 0x50) maplen : 0 freeable : 1 Group allocation method : another groups slot Slot status map: [U]FUFFUFUFFUU (from slot 11 to slot 0) (U: Inuse / A: Available / F: Freed) Result of nontrivial_free() : Do nothing ================== SLOT ================== (at 0x7ffff7a94740) cycling offset : 0x0 (userdata --> 0x7ffff7a94740) nominal size : 0x48 reserved size : 0x4 OVERFLOW (user data) : 0 OVERFLOW (next slot) : 0
Above, we see the SLOT
section indicates a cycling
offset of 0. This will hold true for all Table
allocations
in a stride 0x50 group. In this case, the user data starts at the same
location as the slot.
So now let’s look at the second stride 0x140 group’s slot that we
allocated earlier:
pwndbg> p/x ((Table *) 0x7ffff7a93d80)->array $4 = 0x7ffff7a96ca0 pwndbg> mchunkinfo 0x7ffff7a96ca0 ============== IN-BAND META ============== INDEX : 1 RESERVED : 5 (Use reserved in slot end) OVERFLOW : 0 OFFSET_16 : 0x17 (group --> 0x7ffff7a96b20) ================= GROUP ================== (at 0x7ffff7a96b20) meta : 0x555555a690e0 active_idx : 2 ================== META ================== (at 0x555555a690e0) prev : 0x0 next : 0x0 mem : 0x7ffff7a96b20 last_idx : 2 avail_mask : 0x0 (0b0) freed_mask : 0x0 (0b0) area->check : 0x8bbd98bb29552bcc sizeclass : 13 (stride: 0x140) maplen : 0 freeable : 1 Group allocation method : another groups slot Slot status map: U[U]U (from slot 2 to slot 0) (U: Inuse / A: Available / F: Freed) Result of nontrivial_free() : queue (active[13]) ================== SLOT ================== (at 0x7ffff7a96c70) cycling offset : 0x3 (userdata --> 0x7ffff7a96ca0) nominal size : 0x100 reserved size : 0xc OVERFLOW (user data) : 0 OVERFLOW (reserved) : 0 OVERFLOW (next slot) : 0
This second array has a cycling offset of 3, so it starts 0x30 bytes
further than the start of the slot. Clearly, this slot has been used a
few times already.
The main takeaways here are:
- For certain allocation sizes, the exact offset of an overflow may be
unreliable unless you know exactly how many times the slot has been
allocated. - For a scenario like overwriting the LSB of a pointer inside of such
a group, you could be unable to predict where the resulting pointer will
point inside of another slot, depending on whether you know how many
times each slot has been used.
Considering all this in the context of the exploit this article
describes, I think that because we have fine-grained control over all
the allocations performed for our overflow, this mitigation wouldn’t
have stopped us. Even if the structures had been on a ‘stride’ group
that uses the cycling offsets, because we can easily control the number
of times the slots are actually used prior to overflow. That said, since
I originally thought it might be a problem and wanted to understand it,
hopefully the explanation was still interesting.
With that out of the way, let’s look into how to exploit
CVE-2022-24834 on the musl heap.
Exploiting
CVE-2022-24834 on the mallocng heap
To quickly recap the vulnerability, it’s an integer overflow when
calculating the size of a buffer to allocate while doing cjson encoding.
By triggering the overflow, we end up with an undersized buffer that we
can write 0x15555555 bytes to (341 MiB), which may be large enough to
qualify as a “wild copy,” although on a 64-bit target and the amount of
memory on modern systems, it’s not too hard to deal with. Exploitation
requires that the target buffer that we want to corrupt must be adjacent
to the overflown buffer with no unmapped gaps in between, so at a
minimum around 350 MiB.
While exploiting ptmalloc2, Ricerca Security solved this problem by
extending the heap, which is brk()
-based, to ensure that
enough space exists. Once the extension occurs, it won’t be shrunk
backward. This makes it easy to ensure no unmapped memory regions exist,
and that the 0x15555555-byte copy won’t hit any invalid memory.
This adjacent memory requirement poses some different problems on the
mallocng heap, which I’ll explain shortly.
After achieving the desired layout, the goal is to overwrite some
target chunk (or slot in our case) with the 0x22 value corresponding to
the ending double quote. In the Ricerca Security write-up, their
diagrams indicated they overwrote the LSB pointer of aTable->array
pointer; however, I believe their exploit
actually overwrites the LSB of a TValue->value
pointer,
which exists in a chunk that is pointed to by theTable->array
. I may misunderstand their exploit, but at
any rate, the latter is the approach I used.
To summarize, the goal of the heap shaping is ultimately to ensure
that the allocation associated with a table’s array, which is pointed to
by Table->array
, is adjacent to the buffer we overflow
so that we corrupt the TValue
.
mallocng Heap Shaping
mallocng requires a different strategy than ptmalloc2, as it does not
use brk()
. Rather, it will use mmap()
to
allocate groups (below I will assume that the group itself is not a slot
of another group) and populate those groups with various fixed-size
slots. Freeing the group, which may occur if all of the slots in a group
are no longer used, results in memory backing the group to be unmapped
using munmap()
.
This means we must leverage feng shui to have valid in-use
allocations adjacent to each other at the time of the overflow. While
doing this, in order to analyze gaps in the memory space, I wrote a
small gdb utility which I’ll use to show the layout that we are working
with. A slightly modified version of this utility has also now been
added to pwndbg.
First, let’s look at what happens if we trigger the bug and allow the
copy to happen, without first shaping the heap. Note this first example
is showing the entire memory space to give an idea of what it looks
like, but in future output, I will limit what’s shown to more relevant
mappings.
The annotations added to to the mapping output are as follows:
^-- ADJ: <num>
indicates a series of adjacent
memory regions, where<num>
is the accumulated
size!!! GUARD PAGE
indicates a series of pages with no
permissions, which writing to would trigger a fault[00....0] -- GAP: <num>
indicates an unmapped
page between mapped regions of memory, where<num>
is
the size of the gap
0: 0x555555554000 - 0x5555555bf000 0x6b000 r--p 2: 0x5555555bf000 - 0x555555751000 0x192000 r-xp 3: 0x555555751000 - 0x5555557d3000 0x82000 r--p 4: 0x5555557d3000 - 0x5555557da000 0x7000 r--p 5: 0x5555557da000 - 0x555555833000 0x59000 rw-p 6: 0x555555833000 - 0x555555a66000 0x233000 rw-p ^-- ADJ: 0x512000 7: 0x555555a66000 - 0x555555a67000 0x1000 ---p !!! GUARD PAGE 7: 0x555555a67000 - 0x555555af7000 0x90000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa2ed09000 9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000 10: 0x7fff99e00000 - 0x7fffa48c3000 0xaac3000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000 11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x24348000 12: 0x7ffff3749000 - 0x7ffff470a000 0xfc1000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000 13: 0x7ffff4717000 - 0x7ffff4c01000 0x4ea000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x1000 14: 0x7ffff4c02000 - 0x7ffff4e00000 0x1fe000 rw-p 15: 0x7ffff4e00000 - 0x7ffff5201000 0x401000 rw-p 16: 0x7ffff5201000 - 0x7ffff5c00000 0x9ff000 rw-p 17: 0x7ffff5c00000 - 0x7ffff5e01000 0x201000 rw-p 18: 0x7ffff5e01000 - 0x7ffff6000000 0x1ff000 rw-p ^-- ADJ: 0x13fe000 19: 0x7ffff6000000 - 0x7ffff6002000 0x2000 ---p !!! GUARD PAGE 19: 0x7ffff6002000 - 0x7ffff6404000 0x402000 rw-p 21: 0x7ffff6404000 - 0x7ffff6600000 0x1fc000 rw-p ^-- ADJ: 0x5fe000 22: 0x7ffff6600000 - 0x7ffff6602000 0x2000 ---p !!! GUARD PAGE 22: 0x7ffff6602000 - 0x7ffff6a04000 0x402000 rw-p 24: 0x7ffff6a04000 - 0x7ffff6a6e000 0x6a000 rw-p 25: 0x7ffff6a6e000 - 0x7ffff6c00000 0x192000 rw-p ^-- ADJ: 0x5fe000 26: 0x7ffff6c00000 - 0x7ffff6c02000 0x2000 ---p !!! GUARD PAGE 26: 0x7ffff6c02000 - 0x7ffff7004000 0x402000 rw-p 28: 0x7ffff7004000 - 0x7ffff7062000 0x5e000 rw-p 29: 0x7ffff7062000 - 0x7ffff715c000 0xfa000 rw-p 30: 0x7ffff715c000 - 0x7ffff71ce000 0x72000 rw-p 31: 0x7ffff71ce000 - 0x7ffff7200000 0x32000 rw-p 32: 0x7ffff7200000 - 0x7ffff7a00000 0x800000 rw-p 33: 0x7ffff7a00000 - 0x7ffff7a6f000 0x6f000 rw-p ^-- ADJ: 0xe6d000 34: 0x7ffff7a6f000 - 0x7ffff7a71000 0x2000 ---p !!! GUARD PAGE 34: 0x7ffff7a71000 - 0x7ffff7ac5000 0x54000 rw-p 36: 0x7ffff7ac5000 - 0x7ffff7b0e000 0x49000 r--p 37: 0x7ffff7b0e000 - 0x7ffff7dab000 0x29d000 r-xp 38: 0x7ffff7dab000 - 0x7ffff7e79000 0xce000 r--p 39: 0x7ffff7e79000 - 0x7ffff7ed2000 0x59000 r--p 40: 0x7ffff7ed2000 - 0x7ffff7ed5000 0x3000 rw-p 41: 0x7ffff7ed5000 - 0x7ffff7ed8000 0x3000 rw-p 42: 0x7ffff7ed8000 - 0x7ffff7ee9000 0x11000 r--p 43: 0x7ffff7ee9000 - 0x7ffff7f33000 0x4a000 r-xp 44: 0x7ffff7f33000 - 0x7ffff7f50000 0x1d000 r--p 45: 0x7ffff7f50000 - 0x7ffff7f5a000 0xa000 r--p 46: 0x7ffff7f5a000 - 0x7ffff7f5e000 0x4000 rw-p 47: 0x7ffff7f5e000 - 0x7ffff7f62000 0x4000 r--p 48: 0x7ffff7f62000 - 0x7ffff7f64000 0x2000 r-xp 49: 0x7ffff7f64000 - 0x7ffff7f78000 0x14000 r--p 50: 0x7ffff7f78000 - 0x7ffff7fc4000 0x4c000 r-xp 51: 0x7ffff7fc4000 - 0x7ffff7ffa000 0x36000 r--p 52: 0x7ffff7ffa000 - 0x7ffff7ffb000 0x1000 r--p 53: 0x7ffff7ffb000 - 0x7ffff7ffc000 0x1000 rw-p 54: 0x7ffff7ffc000 - 0x7ffff7fff000 0x3000 rw-p ^-- ADJ: 0x58e000 [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7fdf000 55: 0x7ffffffde000 - 0x7ffffffff000 0x21000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0xffff7fffff601000 56: 0xffffffffff600000 - 0xffffffffff601000 0x1000 --xp
When we crash we see:
Thread 1 "redis-server" received signal SIGSEGV, Segmentation fault. 0x00005555556cd676 in json_append_string () (gdb) x/i $pc => 0x5555556cd676 <json_append_string+166>: mov %al,(%rcx,%rdx,1) (gdb) info registers rcx rdx rcx 0x7ffff3749010 140737277890576 rdx 0x14b7ff0 21725168 (gdb) x/x $rcx+$rdx 0x7ffff4c01000: Cannot access memory at address 0x7ffff4c01000
Our destination buffer (the buffer being copied to) was allocated at0x7ffff3749010
(index 12), and after 0xfc1000 bytes, it
quickly writes into unmapped memory, which correlates to what we just
saw in the gap listing:
12: 0x7ffff3749000 - 0x7ffff470a000 0xfc1000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000
In this particular case, even if this gap didn’t exist, because we
didn’t shape the heap, we will inevitably run into a guard page and fail
anyway.
Similarly to the original exploit, shaping the heap to fill these
gaps is quite easy by just allocating lots of tables that point to
unique strings or large arrays of floating-point values. During this
process, it’s also useful to pre-allocate lots of other tables that are
used for different purposes, as well as anything else that may otherwise
create unwanted side effects on our well-groomed heap.
Ensuring Correct
Target Table->Array Distance
After solving the previous issue, the next problem is that even if we
fill the gaps, we have to be careful where our target buffer (the one we
want to corrupt) ends up being allocated. We need to take into account
that the large allocations for the source buffer (the one we copy our
controlled data from) might also be mapped at lower addresses in memory
than the target buffer, which might not be ideal. From the large gap map
listing above, we can see some large allocations at index 9 and 11,
which are related to generating a string large enough for the source
buffer to actually trigger the integer overflow.
9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000 10: 0x7fff99e00000 - 0x7fffa48c3000 0xaac3000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000 11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p
Both the 9 and 11 mappings are roughly as big or larger than the
amount of memory that will actually be writing during our overflow, so
if our cjson buffer ends up being mapped before one of these maps, the
overflow will finish inside of the large string map and thus be useless.
Although in the case above our destination buffer (index 12) was
allocated later in memory than 9 and 11 and so won’t overflow into them,
in practice after doing heap shaping to fill all the gaps, this won’t
necessarily be the case.
This is an example of what that non-ideal scenario might look
like:
To resolve this, we must first shape the heap so that the target slot
we want to corrupt is actually mapped with an address lower than the
large mappings used for the source string. In this way, we can ensure
that our destination buffer ends up being directly before the target,
with only the exact amount of distance we need in between. To ensure
that our target slot gets allocated where we want, it needs to be large
enough to be in a single-slot group.
In order to ensure that our target buffer slot’s group gets allocated
after the aforementioned large strings, we can abuse the fact that we
can leak table addresses using Lua. By knowing the approximate size of
the large maps, we can predict when our target buffer would be mapped at
a lower address in memory and avoid it. By continuously allocating large
tables and leaking table addresses, we can work through relatively
adjacent mappings and eventually get an address that suddenly skips a
significantly sized gap, correlating to the large string allocations we
want to avoid. After this point, we can safely allocate the target
buffer we want to corrupt, followed by approximately 0x15556000 bytes of
filler memory, and then finally the destination buffer of the vulnerable
copy that we will overflow. Just a reminder, this order is in reverse of
what you might normally expect because each group is mmap()’ed at lower
addresses, but we overflow towards larger addresses.
The filler memory must still be adjacently mapped so that the copy
from the vulnerable cjson buffer to the target slot won’t encounter any
gaps. mallocng uses specific size thresholds for allocations that
determine the group they fit in. Each stride up to a maximum threshold
has an associated ‘sizeclass’. There are 48 sizeclasses. Anything above
the MMAP_THRESHOLD
(0x1FFEC) will fall into a ‘special’
sizeclass 63. In these cases, it will map a single-slot group just for
that single allocation only. We can utilize this to trigger large
allocations that we know will be of a fixed size, with fixed contents,
and won’t be used by any other code. I chose to use mappings of size
0x101000, as I found they were consistently mapped adjacent to each
other by mmap()
, as sizes too large or too small seemed to
occasionally create unwanted gaps.
To actually trigger the large allocations, I create a Lua table of
floating pointer numbers. The array contains TValue
structures with inline numeric values. Therefore, we just need to create
a table with an array big enough to cause the 0x101000 map (keeping in
mind the in-band metadata, which will add overhead). I do something like
this:
-- pre-allocate tables for i = 1, math.floor(0x15560000 / 0x101000) + 1 do spray_pages[i] = {} end ... -- trigger the 0x101000-byte mappings for i = 1, #spray_pages do for j = 1, 0xD000 do spray_pages[i][j] = 0x41414141 end end
I used the gap mapping script to confirm this behavior while
debugging and eventually ended up with something like this, where each
new table allocation ends up with a new array mapping like this:
7: 0x555555a67000 - 0x5555564a1000 0xa3a000 rw-p [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa4439c000 9: 0x7fff9a83d000 - 0x7fff9a93e000 0x101000 rw-p 10: 0x7fff9a93e000 - 0x7fff9aa3f000 0x101000 rw-p 11: 0x7fff9aa3f000 - 0x7fff9ab40000 0x101000 rw-p 12: 0x7fff9ab40000 - 0x7fff9ac41000 0x101000 rw-p 13: 0x7fff9ac41000 - 0x7fff9ad42000 0x101000 rw-p ... 350: 0x7fffafe92000 - 0x7fffb0093000 0x201000 rw-p 351: 0x7fffb0093000 - 0x7fffd00a4000 0x20011000 rw-p 352: 0x7fffd00a4000 - 0x7fffd80a5000 0x8001000 rw-p ^-- ADJ: 0x3d868000 [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2000 ...
So the layout will ultimately look something like:
In the diagram above, the “source string slot” is the buffer from
which we copy our controlled data. The “cjson overflow slot” is the
vulnerable destination buffer that we overflow due to the integer
overflow, and the “target slot” is the victim buffer that we will
corrupt with our 0x22 byte.
There is one more thing which is that the exact offset of the
overflow may change by a small amount if the Lua script changes, or if
there are other side effects on the heap. This seems due to allocations
being made on the index 350 mapping above, before our actual target
buffer. I didn’t investigate this a lot, but it is likely solvable to
get rid of the indeterminism entirely. I chose to work around it by
using a slightly smaller offset, and repeatedly triggering the overflow
and increasing the length. The main caveat of multiple attempts is that
due to corruption of legitimate chunks we have to avoid the garbage
collector firing. Also, Lua has read-only strings, so each string being
allocated needs to be unique, so for each attempt that we make, it will
consume a few hundred MB of memory. In the event that our offset is too
far away, we may well exhaust the memory of the target before we
succeed. In practice, this isn’t a big issue, as once the exploit is
stable and the code isn’t changing, this offset won’t change.
Successful brute force applied to the previous example looks
something like this:
Lua Table Confusion
With that out of the way, we can get to the more interesting part. As
noted, we corrupt the LSB of a TValue
structure such thatTValue->value
points outside its original slot
boundaries. This leads to a sort of type confusion, where we can point
it into a different slot with data we control.
The corrupted array is like so:
While targeting ptmalloc2, the Ricera Security researchers showed
that it’s possible to modify a TValue
that originally
pointed to a Table
, and change its pointer such that it
points to a controlled part of a TString
chunk, which
contains a fake Table
structure. This can then be used to
kick off a read/write primitive. We can do something similar on
mallocng; however, we have much more strict limitations because the
group holding the Table
structure referenced by our
corrupted TValue
only contains other fixed-size slots, so
we will only be able to adjust the offset to point to these. Let’s take
a look at these constraints.
Because of the fixed-size slots, our “confused” table will overlap
with two 0x50-byte slots. Depending on the TValue
address
being corrupted, it may still partially overlap with itself (as this
graphic shows):
A Lua string is made up of a structure called TString
,
which is 0x18 bytes. It is immediately followed by the actual
user-controlled string data. This means that if we want to place a Lua
string into a group holding a Table
, we will be limited by
how many bytes we actually control.
(gdb) ptype /ox TString type = struct TString { /* 0x0000 | 0x0008 */ GCObject *next; /* 0x0008 | 0x0001 */ lu_byte tt; /* 0x0009 | 0x0001 */ lu_byte marked; /* 0x000a | 0x0001 */ lu_byte reserved; /* XXX 1-byte hole */ /* 0x000c | 0x0004 */ unsigned int hash; /* 0x0010 | 0x0008 */ size_t len; /* total size (bytes): 0x18 */ }
A Table
is 0x48 bytes and is placed on a 0x50-stride
group. This means that only the last 0x30 bytes of a string can be used
to fully control the Table
contents, assuming a direct
overlap.
(gdb) ptype /ox Table type = struct Table { /* 0x0000 | 0x0008 */ GCObject *next; /* 0x0008 | 0x0001 */ lu_byte tt; /* 0x0009 | 0x0001 */ lu_byte marked; /* 0x000a | 0x0001 */ lu_byte flags; /* XXX 1-byte hole */ /* 0x000c | 0x0004 */ int readonly; /* 0x0010 | 0x0001 */ lu_byte lsizenode; /* XXX 7-byte hole */ /* 0x0018 | 0x0008 */ struct Table *metatable; /* 0x0020 | 0x0008 */ TValue *array; /* 0x0028 | 0x0008 */ Node *node; /* 0x0030 | 0x0008 */ Node *lastfree; /* 0x0038 | 0x0008 */ GCObject *gclist; /* 0x0040 | 0x0004 */ int sizearray; /* XXX 4-byte padding */ /* total size (bytes): 0x48 */ }
In practice, because we are dealing with a misaligned overlap, we can
still leverage all of the user-controlled TString
data. As
previously mentioned, we don’t control the exact offset into theTString
we end up using. We are restricted by the fact that
the value written is 0x22. As it turns out, it’s still possible to make
it work, but it’s a little bit finicky.
To solve this problem, we need to figure out what the ideal
overlapping offset into a TString
would be, such that we
fully control Table->array
in our confused table. Even
if we control this array
member though, we still need to
see what side effects exist and how they affect the otherTable
fields. If some uncontrolled data pollutes a field in
a particular way, it could mean we can’t actually abuse thearray
field.
Let’s look at the offsets of our slots inside the fixed-sized group.
If we know the address of a table from which we can start:
(gdb) p/x *(Table *) 0x7ffff7a5fa30 $2 = <lua_table> = { [1] = (TValue *) 0x7fffafe92650 <lua_table^> 0x7ffff497cac0, [2] = (TValue *) 0x7fffafe92660 <lua_table^> 0x7ffff7a5fad0, [3] = (TValue *) 0x7fffafe92670 <lua_table^> 0x7ffff7a5fb20, ...
Here we have a table at 0x7ffff7a5fa30
, whosearray
value contains a bunch of other tables. We want to,
however, analyze the 0x50-stride group that this table is on, as well as
the other slots in this group.
We can use mchunkinfo
from the muslheap library to take a
look at the associated slot group.
(gdb) mchunkinfo 0x7ffff7a5fa30 ============== IN-BAND META ============== INDEX : 8 RESERVED : 4 OVERFLOW : 0 OFFSET_16 : 0x28 (group --> 0x7ffff7a5f7a0) ================= GROUP ================== (at 0x7ffff7a5f7a0) meta : 0x555555aefc48 active_idx : 24 ================== META ================== (at 0x555555aefc48) prev : 0x0 next : 0x0 mem : 0x7ffff7a5f7a0 last_idx : 24 avail_mask : 0x0 (0b0) freed_mask : 0x0 (0b0) area->check : 0x232d7200e6a00d1e sizeclass : 4 (stride: 0x50) maplen : 0 freeable : 1 Group allocation method : another groups slot Slot status map: UUUUUUUUUUUUUUUU[U]UUUUUUUU (from slot 24 to slot 0) (U: Inuse / A: Available / F: Freed) Result of nontrivial_free() : queue (active[4]) ================== SLOT ================== (at 0x7ffff7a5fa30) cycling offset : 0x0 (userdata --> 0x7ffff7a5fa30) nominal size : 0x48 reserved size : 0x4 OVERFLOW (user data) : 0 OVERFLOW (next slot) : 0
We can confirm that the stride is 0x50, and the slot size is 0x48.
The Slot status map
shows that this group is full, and our
slot is at index 8 (designated by [U]
and indexed in
reverse order). Also, the cycling offset
is 0, which means
that the userdata associated with the slot actually starts at the
beginning of the slot. As we saw earlier, this will be very useful to
us, as we will rely on predictable relative offsets between slots in the
group.
What we are most interested in is how overwriting the LSB of a slot
at a specific offset in this group will influence what we control during
the type confusion. I’ll use an example to make it clearer. Let’s print
out all the offsets of all the slots in this group:
0: 0x7ffff7a5f7a0 1: 0x7ffff7a5f7f0 2: 0x7ffff7a5f840 3: 0x7ffff7a5f890 4: 0x7ffff7a5f8e0 5: 0x7ffff7a5f930 6: 0x7ffff7a5f980 7: 0x7ffff7a5f9d0 8: 0x7ffff7a5fa20 (B2) 9: 0x7ffff7a5fa70 10: 0x7ffff7a5fac0 (B) 11: 0x7ffff7a5fb10 (A), (A2) 12: 0x7ffff7a5fb60 13: 0x7ffff7a5fbb0 14: 0x7ffff7a5fc00 15: 0x7ffff7a5fc50 16: 0x7ffff7a5fca0 17: 0x7ffff7a5fcf0 18: 0x7ffff7a5fd40 19: 0x7ffff7a5fd90 20: 0x7ffff7a5fde0 21: 0x7ffff7a5fe30 22: 0x7ffff7a5fe80 23: 0x7ffff7a5fed0 24: 0x7ffff7a5ff20
Before going further, I want to note that other than theTable
being targeted by the overwrite, these stride 0x50
slots can be TString
values that we control, so below if I
say target index N, it means the slot at index N is aTable
, but you can assume that slots adjacent (N-1 and N-2)
to it are controlled TString
structures.
Let’s start from the lowest LSB in the list and go until the pattern
repeats. We see at 2, the LSB is 0x40, then the pattern repeats at
offset 18. That means we only need to analyze candidate tables between 2
and 17 to cover all cases. We want to see what will happen if we
overwrite any of these entries with 0x22. Where does it fall within an
earlier slot, and how might that influence what we control? Since when
we trigger this confusion, due to the uncontrolled value 0x22, we are
guaranteed to overlap two different 0x50-byte slots, so we may want to
control them both.
A quick refresh in case you’ve forgotten, remember that we are
corrupting the LSB of a TValue
in some table’sTable->array
buffer, and that TValue
will
point to one of the slots in a group as we are analyzing.
I’ll choose a bad example of a table to target first. Assume we
decide to corrupt the LSB of index 11 (marked with (A)
above), which is at 0x7ffff7a5fb10
. If we corrupt its LSB
with 22
, we get a confused table at0x7ffff7a5fb22
so we end starting the confused table inside
of the associated Table
. I’ve indicated this above with(A2)
to show they are roughly at the same location. In this
scenario we don’t control the contents of the (A)
table at
all, and thus most of (A2)
is not controlled. Only the0x12
bytes of the slot at index 12, which follows the
confused Table
will actually be controlled, so probably not
ideal.
Okay, now we should find a better candidate… something that if we
corrupt it, we can jump back some large distance and overlap at least
one TString
structure. I’ll be biased and choose the one
that works, but in practice, some trial and error was required. Let’s
target index 10 (marked with (B)
), which is at address0x7ffff7a5fac0
. If we corrupt this, we will point to0x7ffff7a5fa22
(marked with (B2)
). Here(B)
will overlaps with both index 8 and the first two bytes
of 9. In this scenario, index 8 could be a TString
, which
we control.
Assuming we have a controlled TString
, we can check what
our confused Table
will look like. First, this is what theTString
looks like (no misaligned access):
(gdb) p/rx *(TString *) 0x7ffff7a5fa20 $7 = { tsv = { next = 0x7ffff3fa2460, tt = 0x4, marked = 0x1, reserved = 0x0, hash = 0xb94dc111, len = 0x32 (gdb) x/50b 0x7ffff7a5fa20+0x18 0x7ffff7a5fa38: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffff7a5fa40: 0x00 0x00 0x41 0x41 0x41 0x41 0x41 0x41 0x7ffff7a5fa48: 0x41 0x41 0x30 0x30 0x30 0x30 0x30 0x30 0x7ffff7a5fa50: 0x30 0x31 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffff7a5fa58: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffff7a5fa60: 0x00 0x00 0xff 0xff 0xff 0x7f 0x00 0x00 0x7ffff7a5fa68: 0x00 0x00
We see the TString
header values, and then 0x32-bytes of
controlled data. This data I’ve already populated at the right offsets
to demonstrate what values in a confused Table
we can
control.
Now let’s look at the confused Table
at the misaligned
offset:
(gdb) p/rx *(Table *) 0x7ffff7a5fa22 $5 = { next = 0x10400007ffff3fa, tt = 0x0, marked = 0x0, flags = 0x11, readonly = 0x32b94d, lsizenode = 0x0, metatable = 0x0, array = 0x4141414141414141, node = 0x3130303030303030, lastfree = 0x0, gclist = 0x0, sizearray = 0x7fffffff }
As would be expected, the uncontrolled parts of TString
are clobbering the fields next
throughreadonly
. But we can easily control the array
and the sizearray
fields.
One problem is that the readonly
flag is non-zero, which
means even if we get Lua to use this table, we’re not going to be able
to use it for a write primitive. So we will have to work around this
(more on how shortly).
It may also look like we are in trouble because the tt
member is clobbered and no longer is of type LUA_TTABLE
.
Fortunately, this isn’t a problem because when accessing numbered index
members inside of a table’s array, Lua will use the type specified by
the TValue
pointing at the object to determine its type. It
won’t ever reference the type information inside the object. The type
information inside the object is used specifically by the garbage
collector, which we won’t plan on running. Similarly, thenext
pointer is only used by the garbage collector, so it
being invalid is no problem.
We can look at luaH_get()
to confirm:
/* ** main search function */ const TValue *luaH_get (Table *t, const TValue *key) { switch (ttype(key)) { case LUA_TNIL: return luaO_nilobject; case LUA_TSTRING: return luaH_getstr(t, rawtsvalue(key)); case LUA_TNUMBER: { int k; lua_Number n = nvalue(key); lua_number2int(k, n); if (luai_numeq(cast_num(k), nvalue(key))) /* index is int? */ return luaH_getnum(t, k); /* use specialized version */ /* else go through */ } ...
When looking up a table by index, if the index value is a number, we
encounter the LUA_TNUMBER
case. This triggers a call toluaH_getnum()
, which is:
const TValue *luaH_getnum (Table *t, int key) { /* (1 <= key key <= t->sizearray) */ if (cast(unsigned int, key-1) < cast(unsigned int, t->sizearray)) return t->array[key-1]; else { ...
This function will return the TValue
from theTable->array
value. The TValue
contains its
own tt
member, as mentioned earlier. ThisTValue
may be utilized later by some Lua code to access it
as a Table
, which is handled byluaV_gettable
.
void luaV_gettable (lua_State *L, const TValue *t, TValue *key, StkId val) { int loop; for (loop = 0; loop < MAXTAGLOOP; loop++) { const TValue *tm; if (ttistable(t)) { /* `t' is a table? */ Table *h = hvalue(t); const TValue *res = luaH_get(h, key); /* do a primitive get */ if (!ttisnil(res) || /* result is no nil? */ (tm = fasttm(L, h->metatable, TM_INDEX)) == NULL) { /* or no TM? */ setobj2s(L, val, res); return; } /* else will try the tag method */ } ...
We can see above that the parameter t
of typeTValue
is being passed and used as a Table
.
The code uses ttistable(t)
to ensure that theTValue
indicates that it is a table:
#define ttistable(o) (ttype(o) == LUA_TTABLE)
If it is a table, it calls into the luaH_get()
to
reference whatever index is being requested. We know thatluaH_get()
itself doesn’t check theTable->tt
value. So we see that if we corrupt aTValue
to point to a confused table, and then access the
associated Table
structure to fetch objects, we can do it
without the corrupted Table->tt
value ever being
validated, meaning we can use the read-only Table
to read
other, possibly more controlled objects.
So, we’ve now got a spoofed read-only table that we can use, which
can be visualized as:
Let’s use our read-only Table
to try to read a
controlled writable Table
object. The first question is,
where do we point our read-only Table->array
member? The
leak primitive that Lua gives us only will leak addresses of tables, so
we’re still only limited to values on a similarly fixed-size slot.
However, in this case, we aren’t limited to only overwriting an LSB with
0x22, so what do we do? First, we need to pointTable->array
to a fake TValue
that itself
points to yet another fake Table
object.
Because we are able to control other fields inside our read-onlyTable
that don’t need to be valid, and because I already
leaked its address, I chose Table->array
to be inside
the Table
itself. By re-using theTable->lastfree
and Table->gclist
members, we can plant a new TValue
of typeLUA_TTABLE
, and we can point TValue->value
to some other offset inside the 0x50-stride group. So where should we
point it this time?
Experimentation showed that by pointing to an offset of 0x5 into aTString
, we can create a confused Table
whereTable->readonly
is NULL
, and we are still
able to control the Table->array
pointer with controlled
string contents.
What we end up with looks like this:
Since this table is writable, we will point itsTable->array
to yet another table’s Table->array
address. This final Table
becomes our actual almost-arbitrary read/write (AARW) primitive. Using
insertions onto our writable confused table allows us to control the
address the r/w table will point to. At this point we are finally back
to where the original Ricera Security exploit expects to be.
This ultimately looks like so:
This AARW is a bit cumbersome, so the conviso exploit sets up aTString
object on the heap and modifies its length, to
allow for larger swaths of memory to be read in one go.
redis-server/libc
ASLR Bypass and Code Execution
The conviso labs exploit also used a trick originally documented by
saelo
that abuses the fact that a CCoroutine
that usesyield()
will end up using setjmp()
. This means
while executing Lua code inside the coroutine, it’s possible to use the
AARW primitive to leak the address of the stored setjmp buffer, which
leaks the stack address. From there, it’s possible to leak a GNU libc
address, which is enough to know where to kick off a ROP chain.
I still ran into some more quirks here, like the offset for the musl
libc leak was different. Also, unlike the conviso exploit, we can’t
easily brute force it due to the heap addresses and musl libc addresses
being too similar. This differs from when using brk()
in
the original ptmalloc2 example. This led to me having to use a static
offset on the stack to find the musl libc offset.
While poking around with this, I realized there’s maybe another way
to get musl libc addresses, without relying on theCCoroutine
setjmp technique. In Lua, there is a global
table that defines what types of functions are available. This can be
referenced using the symbol _G
. By looking inside of_G
, we can see a whole bunch of the function entries, which
point to other CCoroutine
structures on the heap. By
leaking the contents of the structure, we can read their function
address. These will all point into redis-server
.text
section. We could then parse the redis-server
ELF to find a
musl libc GOT entry. Or so I thought… there is another quirk about the
read primitive used, which is that a string object is constructed on the
heap and its length is modified to allow arbitrary (positive) indexing,
which makes it easier to read larger chunks of memory all in one go.
Since the string is on the heap, the leaked redis-server
addresses mentioned above might not be accessible depending on where
they are mapped. For instance, if you are testing with ASLR disabled or
redis-server is not complied PIE, redis-server will almost certainly be
inaccessible. As we saw earlier, the TString
data is stored
inline, and not referenced using a pointer, so we can’t just point it
into redis-server
.
I chose not to further pursue this and just rely on the static musl
libc offset I found on the stack, as I only needed to target a single
redis version. However, this is possibly an interesting exercise for the
reader.
Conclusion
This is a pretty interesting bug, and hopefully this article serves
to show that revisiting old exploits can be quite fun. Even if a bug is
proven exploitable on one environment, there may still be a lot of work
to be done elsewhere, so don’t necessarily skip over it thinking
everything’s already been explored.
I’d also like to give a big shout out to Ricerca and Conviso for the
impressive and interesting exploits!
Lastly, as I always mention lately, I started using voice coding
around 3-4 years ago for all my research/writing, and so want to thank
the Talon Voice community for building tooling to help people with RSI.
This is your friendly reminder to stand up, stretch, stop hunching, give
your arms a rest, etc. If you want to try voice coding, I suggest
checking out Talon and Cursorless.
Resources
The following is a list of papers mentioned in the article above.
Tools
- muslheap: A gdb
plugin designed for analyzing the mallocng heap structures.
Source: Original Post