TRAVERTINE (CVE-2025-24118)
An absolutely wild race condition in the macOS kernel.
I guess rep movsb isn't so atomic after all.
This is the craziest kernel bug I have ever reported.
It involves a combination of several cutting-edge features in the macOS kernel (XNU): Safe Memory Reclamation (SMR), read-only page mappings, per-thread credentials, memcpy implementation details, and of course, a race condition tying everything together.
This bug allows for corruption of a thread's kauth_cred_t credential pointer.
Specifically, the SMR-protected p_ucred field of a process's read-only struct can be corrupted to point to invalid memory, or potentially to a different (maybe even more privileged) credential.
In the process of discovering and reporting this bug, I learned a ton about some of the cutting edge features of XNU based on the latest in OS research. I will share what I learned in the process of triggering this bug (as well as how the bug itself works, of course).
You can find a proof of concept here.
Table of Contents
- Safe Memory Reclamation: What it is and how it works.
- Read-Only Pages in XNU: How XNU protects its most important data from data-only attacks via read-only mappings.
- Per-Thread Credentials: How credentials work in XNU, and how XNU allows concurrent threads to manage different credentials.
- The Race Condition: How these topics come together to result in a race condition leading to CVE-2025-24118.
- Conclusion: What happens when we win the race?
Buckle in, it's going to be quite a ride.
1. Safe Memory Reclamation
In a sentence, Safe Memory Reclamation (SMR) describes any algorithm for reclaiming memory without using locks while also making use-after-frees impossible. SMR is all about how to access and reclaim memory in a concurrent, lock-free context.
SMR has only recently been added to XNU in a few places, one of them being the process credential structures (the subject of this bug). At some point in the future, I may do a full writeup on SMR internals, but here I will just cover the basics required to understand CVE-2025-24118.
For today's bug, we only care about how SMR allows for concurrent readers and writers to a given data structure (and not how SMR frees old memory).
Locking in Concurrent Programming
Normally in concurrent programming (eg. the kernel), data that may be accessed by multiple consumers requires a lock to serialize accesses. At a high level, the basic idea is to avoid any situation where one thread is reading some data while another thread is updating it. Locks allow for this by providing a mechanism for obtaining exclusive access to a given resource.
When you want to access a shared resource, you must first "acquire" the lock for that resource. You can think of a lock as a special memory address that is set to 1 when held, and 0 if not. Special CPU magic is used to ensure only one thread can hold it (set it to 1) at a time. Holding a lock signals to other threads that you have taken ownership of that resource and they should not read or write it. Once the lock is acquired, you are free to read or write the object. Finally, when you're done, you "release" the lock (set it back to 0) to allow other threads their turn to access the resource.
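As a rough illustration of the "special memory address" idea, here is a minimal spinlock sketch using C11 atomics. The toy_lock_* names are made up for this example and have nothing to do with XNU's real lock types; the compare-and-swap plays the role of the "special CPU magic".

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy lock: held == 1 while owned, 0 while free. */
typedef struct {
    atomic_int held;
} toy_lock_t;

/* Try to flip 0 -> 1 in one indivisible step. Returns 1 on success. */
static int toy_lock_try(toy_lock_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* "Acquire": spin until we are the thread that set the word to 1. */
static void toy_lock_acquire(toy_lock_t *l) {
    while (!toy_lock_try(l)) {
        /* someone else holds the lock; keep retrying */
    }
}

/* "Release": set the word back to 0 so another thread can take its turn. */
static void toy_lock_release(toy_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Only one caller can ever observe the 0 and replace it with 1, which is what gives the holder exclusive access until it stores 0 again.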
Assuming everyone plays nicely and respects the lock, all reads and writes of the shared resource will appear atomic with respect to one another. That is, there shouldn't be a situation where one reader can observe a partially written / intermediate state of an object. We call that situation a race condition, and it is a very bad thing(TM) that can lead to memory corruption bugs (this will come back later ;D).
For situations where readers are quite common and writers are uncommon, locking the entire structure on every read is expensive and unnecessary.
In Linux, you can use a reader-writer lock that allows multiple readers to hold access to a resource at once.
That way, several readers can read the object at the same time.
Lockless Concurrency
Operating systems often use alternative mechanisms for synchronizing data between threads rather than locks.
The classic example from Linux is RCU (Read, Copy, Update), which allows readers and at most one writer to run concurrently.
RCU can be thought of as a "publish-subscribe" algorithm, where various versions of an object are published over time, and older versions are kept around in memory until it is guaranteed that no readers are using them.
Concurrent updates publish a newer version of the object, and readers grab the latest published version when they want to reference the object.
Safe Memory Reclamation in XNU
XNU's safe memory reclamation approach is epoch-based and is influenced by the FreeBSD Global Unbounded Sequences / the UMA allocator SMR mechanism. The basic idea with epoch-based SMR strategies is that the system maintains a global counter called an epoch which increments over time. Threads track which epoch they have observed for a given object, and when all threads have sufficiently progressed, copies of the object from older epochs can be freed (as no threads can possibly be referencing them anymore). You can think of the epoch as a kind of "version number" for an object.
Let's take a closer look at SMR in XNU. SMR is documented here.
For the case of CVE-2025-24118, we actually don't care about how memory is reclaimed. The bug only involves the interaction between concurrent readers and writers under SMR. Specifically, we need to understand the mechanism by which writers publish new values.
The basic idea is that SMR-protected fields use a pointer that is always updated atomically by at most one writer. Readers can read this pointer asynchronously at any point and retrieve a correct state for the object. While this state may not be the very latest value of the object published (when a reader and writer access the pointer at the exact same time), the pointer will always point to some correct published version of the object.
Much like RCU, XNU's SMR implementation makes use of read-side critical sections.
That is, readers must call smr_enter() before reading any SMR-protected fields, and call smr_leave() when done.
This marks the reader as reading from a particular epoch.
You can safely dereference an SMR-protected pointer without locking as long as you are in an SMR critical section.
On the writer side, a lock is used to serialize writers, ensuring there can only be one writer at a time. Writers use atomic updates to publish new versions of the data structure. This way, readers will either read a pointer to the old value of the structure, or the new value. Therefore, it is extremely important that writers use atomic CPU instructions to update memory, as otherwise readers may read intermediate values for the SMR pointer (which would be a serious problem). Hint: this is foreshadowing.
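The publish/read split can be sketched in user space with C11 atomics. This is only an illustration of the idea, not XNU's SMR API: struct version, publish and read_current are hypothetical names, the writer is assumed to already hold the writer lock, and the reader is assumed to be inside its critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object standing in for an SMR-protected structure. */
struct version {
    int value;
};

/* The shared pointer. It is only ever touched with atomic operations. */
static _Atomic(struct version *) published;

/* Writer side (assumed: caller holds the writer-serializing lock). */
static void publish(struct version *next) {
    /* A single release store: the whole pointer changes at once. */
    atomic_store_explicit(&published, next, memory_order_release);
}

/* Reader side (assumed: caller is inside a read-side critical section). */
static const struct version *read_current(void) {
    /* Sees the old pointer or the new pointer, never a mix of both. */
    return atomic_load_explicit(&published, memory_order_acquire);
}
```

The property readers rely on is that the store in publish is indivisible. Replace it with a byte-by-byte copy and that guarantee is gone.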
To summarize, there are three ways to access SMR-protected memory (described here), two of which we care about:
- "Entered" (reader): We are between smr_enter and smr_leave and are free to read without locks.
- "Serialized" (writer): We are holding the writer-serializing lock and are free to write, so long as we do it atomically (in case there are concurrent readers).
- "Unserialized" (reclaimer): Guaranteed that no readers can see the memory, so we can free it. (This is irrelevant to the bug).
Writers (in the Serialized state) serialize their atomic writes using a lock to ensure only one writer is updating an SMR pointer at a time.
In theory, this means a correct pointer will always be present in memory, allowing readers (in the Entered state) to read without locking.
This only works if all writers are atomic.
If any writer is non-atomic, readers may observe intermediate / partially written values into the SMR protected pointer, which can cause the pointer to point to invalid memory.
2. Read-Only Pages in XNU
When performing data-only attacks, the attacker's goal is usually to change their user ID to root. This makes the credential structure a very enticing object for attackers, as that's where the user ID is stored. For that reason, XNU places the credentials in a read-only page along with other sensitive process information. In fact, XNU provides a kernel API for allocating and managing read-only objects. Let's take a look at the APIs for manipulating these read-only objects.
The read-only API is documented in doc/allocators/read-only.md.
Read-only objects can be allocated with zalloc_ro, a special version of the zone allocator designed for read-only mappings.
zalloc_ro_mut and its related methods are the only way to modify data in a read-only zone.
It takes as an argument the object to modify and what to write into it, kind of like a special version of memcpy but for read-only objects.
zalloc_ro_mut internally uses pmap_ro_zone_memcpy, which depending on the architecture may take a trip through the page protection layer (PPL) to unlock the page.
On X86_64, pmap_ro_zone_memcpy is a wrapper around memcpy to a mapping where all of physical memory is mapped virtually (called the "physical aperture").
Let's take a look at memcpy's implementation on X86_64:
ENTRY(memcpy)
    movq    %rdi, %rax  /* return destination */
    movq    %rdx, %rcx
    cld                 /* copy forwards */
    rep movsb
    ret
On X86_64, rep movsb is not atomic: it copies byte by byte.
Since X86_64 is strongly ordered, the order of the byte-by-byte writes will be observed by other threads, but no guarantee is made about the granularity at which these writes become visible.
If 8 bytes are being written using rep movsb (say, for example, when a pointer is being updated), a concurrent reader may see anywhere between 0 and 8 bytes written depending on exactly when the concurrent read occurs.
If a concurrent reader observes a partially updated pointer, it will attempt to dereference an invalid address that is formed by concatenating parts of two valid values (the old value and new value being written by the writer thread). This invalid address could be some other object near the two valid ones (likely in the same allocator zone), and could potentially point exactly to a valid third object never referenced by any writer to this field.
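To make the "concatenating parts of two valid values" point concrete, here is a small user-space sketch that models a copy interrupted after a given number of bytes. torn_value is a hypothetical helper written for this illustration, and the addresses in the note below are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Value a reader could observe once the first `copied` bytes (0..8) of
 * new_val have overwritten old_val in memory, lowest address first.
 */
static uint64_t torn_value(uint64_t old_val, uint64_t new_val, size_t copied) {
    unsigned char mem[sizeof(uint64_t)];
    uint64_t seen;

    if (copied > sizeof(mem)) {
        copied = sizeof(mem);        /* a pointer is at most 8 bytes */
    }
    memcpy(mem, &old_val, sizeof(mem));  /* memory starts out holding old */
    memcpy(mem, &new_val, copied);       /* partially completed byte copy */
    memcpy(&seen, mem, sizeof(seen));    /* what a concurrent load returns */
    return seen;
}
```

On a little-endian machine such as X86_64, torn_value(0xffffff86c79e8300, 0xffffff86c79f9480, 1) gives 0xffffff86c79e8380: the low byte comes from the new pointer and the rest from the old one, an address that no writer ever stored.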
Looking back at the call tree when updating a read-only object, we see the following order of function calls:
zalloc_ro_mut -> pmap_ro_zone_memcpy -> memcpy -> rep movsb
Clearly, as zalloc_ro_mut eventually uses rep movsb, it is not atomic, and should not be used in places where atomic writes are required (remember SMR's requirement for writers?).
The XNU authors seem to have accounted for the non-atomicity of zalloc_ro_mut by providing the private zalloc_ro_mut_atomic API, which takes an argument for which kind of atomic operation should be performed.
So, if we can find a place where zalloc_ro_mut was used where zalloc_ro_mut_atomic (or one of its variants) should have been, there's a good chance we have a race condition bug. (Hint: this is foreshadowing).
Usage of Read-Only Objects in XNU
A general pattern I have observed with these read-only objects is that they are usually "paired" with a complementary read-write struct.
For example, the struct proc that represents a process has a matching read-only struct proc_ro.
proc.p_proc_ro points to a given proc's proc_ro, and proc_ro.pr_proc points back to its matching proc.
The really important bits of the structure are stored in the read-only object, and the relatively unimportant stuff is stored in the read-write version.
This diagram shows how procs and ucreds are related.
   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
   │ proc_task_zone  │    │  proc_ro zone   │    │ kauth_cred zone │    │  ucred_rw_zone  │
   └────────┬────────┘    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
            ▼                      ▼                      ▼                      ▼
     ┌─────────────┐        ┌─────────────┐        ┌─────────────┐        ┌─────────────┐
┌───▶│ struct proc │   ┌───▶│   struct    │   ┌───▶│struct ucred │   ┌───▶│   struct    │
│    │    (BSD)    │   │    │   proc_ro   │   │    │             │   │    │  ucred_rw   │
│    │┌───────────┐│   │    │┌───────────┐│   │    │┌───────────┐│   │    │             │
│    ││ p_proc_ro │├───┤    ││  p_ucred  │├───┘    ││  cred_rw  │├───┘    │             │
│    │└───────────┘│   │    │└───────────┘│ *SMR*  │└───────────┘│        │             │
│    ├─────────────┤   │    │┌───────────┐│        │┌───────────┐│        └─────────────┘
│ ┌─▶│ struct task │   │ ┌──┤│  pr_task  ││        ││  struct   ││
│ │  │   (Mach)    │   │ │  │└───────────┘│        ││posix_cred ││
│ │  │┌───────────┐│   │ │  │┌───────────┐│        │└───────────┘│
│ │  ││bsd_info_ro│├───┘ │ ┌┤│  pr_proc  ││        └─────────────┘
│ │  │└───────────┘│     │ ││└───────────┘│
│ │  └─────────────┘     │ │└─────────────┘
│ └──────────────────────┘ │
└──────────────────────────┘
Every process has a struct proc and struct task (allocated right next to each other).
The BSD world uses the struct proc and the Mach world uses the struct task.
Both of these structures point to a matching struct proc_ro, which points back to both of them.
The proc_ro is a read-only object allocated from the proc_ro zone.
It is used for storing the sensitive data for this process, for example the SMR-protected pointer to a struct ucred.
The ucred also has a matching struct ucred_rw structure it points to.
The important takeaway here is that every proc has a proc_ro, which is a read-only object that holds an SMR pointer to its credential.
This means that if a process's credential needs to be changed, the change needs to be performed using the zalloc_ro API.
I sure hope there isn't a place where they use zalloc_ro_mut to write to p_ucred instead of an atomic API, because p_ucred is an SMR pointer and therefore must be written to atomically (hint: foreshadowing).
3. Per-Thread Credentials
A credential (struct ucred) in XNU is a data structure that tracks a number of security related fields, such as the thread's user ID.
Here is the definition of a ucred:
struct ucred {
    struct ucred_rw *cr_rw;
    void *cr_unused;
    u_long cr_ref; /* reference count */
    struct posix_cred {
        /*
         * The credential hash depends on everything from this point on
         * (see kauth_cred_get_hashkey)
         */
        uid_t cr_uid; /* effective user id */
        uid_t cr_ruid; /* real user id */
        uid_t cr_svuid; /* saved user id */
        u_short cr_ngroups; /* number of groups in advisory list */
        u_short __cr_padding;
        gid_t cr_groups[NGROUPS];/* advisory group list */
        gid_t cr_rgid; /* real group id */
        gid_t cr_svgid; /* saved group id */
        uid_t cr_gmuid; /* UID for group membership purposes */
        int cr_flags; /* flags on credential */
    } cr_posix;
    ...
};
The posix_cred part of a credential is used for tracking the privileges of the current thread.
Most threads in the system will have identical permissions: whatever permissions the current user has.
Storing a copy of these identical credentials for every thread would cost quite a bit of memory.
Instead, XNU makes use of an SMR hash table to hash these cred structs to allow threads to share the same credential object.
Credential objects use a reference count (cr_ref) to track when they can be freed.
The hash is calculated using the second half of the cred (i.e. cr_posix and onward).
This allows threads with identical permissions to share the same credential object, saving memory.
This will be important later.
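The sharing scheme can be pictured as interning: look up the identity fields, reuse an existing object if one matches, and otherwise create one. The sketch below is a single-threaded toy with made-up types (toy_posix, toy_cred) and a fixed-size table. It is not XNU's SMR hash table and it ignores concurrency and freeing entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for posix_cred and ucred. */
struct toy_posix {
    unsigned uid;
    unsigned gid;
};

struct toy_cred {
    unsigned long ref;          /* reference count, not part of the hash */
    struct toy_posix posix;     /* identity fields, the only hashed part */
};

#define TOY_TABLE_SIZE 16
static struct toy_cred toy_table[TOY_TABLE_SIZE];

/* Hash only the identity fields, mirroring "cr_posix and onward". */
static size_t toy_hash(const struct toy_posix *p) {
    return (p->uid * 31u + p->gid) % TOY_TABLE_SIZE;
}

/* Return the shared credential for `want`, taking a reference on it. */
static struct toy_cred *toy_cred_intern(struct toy_posix want) {
    size_t start = toy_hash(&want);

    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        struct toy_cred *c = &toy_table[(start + i) % TOY_TABLE_SIZE];

        if (c->ref > 0 && c->posix.uid == want.uid &&
            c->posix.gid == want.gid) {
            c->ref++;           /* same identity: share the object */
            return c;
        }
        if (c->ref == 0) {      /* empty slot: create a new entry */
            c->posix = want;
            c->ref = 1;
            return c;
        }
    }
    return NULL;                /* table full */
}
```

Two lookups with the same identity return the same pointer, which is why comparing credential pointers is enough to tell whether two credentials are equivalent.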
Managing Credentials
In XNU, threads belonging to the same process can have different credentials.
This is managed using current_cached_proc_cred_update, which is called during every syscall.
During every syscall entry, the kernel retrieves the current process credential pointer and compares it to the per-thread credential to see if any changes need to be made.
void
current_cached_proc_cred_update(void)
{
    thread_ro_t tro = current_thread_ro();
    proc_t proc = tro->tro_proc;

    if (__improbable(tro->tro_task != kernel_task &&
        tro->tro_realcred != proc_ucred_unsafe(proc))) {
        kauth_cred_thread_update_slow(tro, proc);
    }
}
This method compares the p_ucred pointer from the proc_ro with the tro_realcred pointer from the current thread_ro.
If the credentials are equivalent (e.g. they have the same cr_posix values), these pointers will point to the same cred in the hash table.
If the pointers don't match, kauth_cred_thread_update_slow is called to update things, which eventually enters an SMR region and dereferences p_ucred.
Note that in current_cached_proc_cred_update, the usage of proc_ucred_unsafe (reading p_ucred without being in an SMR region) is ok since we only read the credential pointer's value and do not dereference it yet.
The important thing here is that every syscall can cause the kernel to enter an SMR region and dereference p_ucred, the SMR-protected pointer to the process's credential structure.
This happens whenever the thread's credentials change even slightly.
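The fast-path/slow-path split boils down to a pointer comparison on every syscall entry. Here is a toy model of that shape. toy_thread and toy_syscall_entry are hypothetical names, and the real slow path does far more than update a cached pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state; cached_cred plays the role of tro_realcred. */
struct toy_thread {
    const void *cached_cred;
    unsigned slow_path_hits;    /* how often the pointers disagreed */
};

/* Called on every (simulated) syscall entry with the process's cred pointer. */
static void toy_syscall_entry(struct toy_thread *t, const void *proc_cred) {
    if (t->cached_cred != proc_cred) {
        /* Slow path: this is where the kernel would dereference p_ucred. */
        t->cached_cred = proc_cred;
        t->slow_path_hits++;
    }
}
```

As long as the process credential never changes, the slow path is never taken. Flip the credential on every call from another thread and nearly every syscall ends up there.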
4. The Race Condition
Now that we have all the pieces, let's put them together.
To recap:
- proc_ro is a read-only object used for managing a process's sensitive data (such as its credentials), and can only be modified via the zalloc_ro_mut* family of functions.
- proc_ro.p_ucred is an SMR-protected pointer to a process's credential structure.
- Since p_ucred is an SMR pointer, writers must synchronize with one another via a lock (specifically this one), and when writing must use atomic operations to change p_ucred.
- zalloc_ro_mut, a function used for modifying read-only objects, is non-atomic and is therefore unsuitable for modifying p_ucred.
You might be able to see where this is going.
The bug is that there is a spot in the code that updates proc_ro.p_ucred non-atomically using zalloc_ro_mut.
I found a way to race this update call against an SMR dereference of p_ucred, which will load without locking.
If you do this enough times, eventually you will observe a partially written value of p_ucred that points to a different credential!
The Buggy Function
The bug lives in kauth_cred_proc_update, which is the function responsible for updating a proc_ro's p_ucred pointer.
The buggy call is marked with a comment.
bool
kauth_cred_proc_update(
    proc_t p,
    proc_settoken_t action,
    kauth_cred_derive_t derive_fn)
{
    kauth_cred_t cur_cred, free_cred, new_cred;
    cur_cred = kauth_cred_proc_ref(p);
    for (;;) {
        new_cred = kauth_cred_derive(cur_cred, derive_fn);
        if (new_cred == cur_cred) {
            ...
            kauth_cred_unref(&new_cred);
            kauth_cred_unref(&cur_cred);
            return false;
        }
        proc_ucred_lock(p);
        if (__probable(proc_ucred_locked(p) == cur_cred)) {
            kauth_cred_ref(new_cred);
            kauth_cred_hold(new_cred);
            // This is the bug:
            zalloc_ro_mut(ZONE_ID_PROC_RO, proc_get_ro(p),
                offsetof(struct proc_ro, p_ucred),
                &new_cred, sizeof(struct ucred *));
            kauth_cred_drop(cur_cred);
            ucred_rw_unref_live(cur_cred->cr_rw);
            proc_update_creds_onproc(p, new_cred);
            proc_ucred_unlock(p);
            ...
            kauth_cred_unref(&new_cred);
            kauth_cred_unref(&cur_cred);
            return true;
        }
        ...
    }
}
To trigger the bug, we need to be able to cause frequent credential updates.
I tried several ways to do this and found the best way is to use setgid to adjust the group ID over and over again.
Each time the group ID changes, kauth_cred_proc_update will need to be called to adjust p_ucred to point to the correct credential object in the hash table.
Let's take a closer look at how p_ucred is read and written.
Reading p_ucred
Readers of p_ucred use proc_ucred_smr to fetch the p_proc_ro->p_ucred field from a given proc_t.
kauth_cred_t
proc_ucred_unsafe(proc_t p)
{
    kauth_cred_t cred = smr_serialized_load(&proc_get_ro(p)->p_ucred);

    return kauth_cred_require(cred);
}

kauth_cred_t
proc_ucred_smr(proc_t p)
{
    assert(smr_entered(&smr_proc_task));
    return proc_ucred_unsafe(p);
}
After asserting that we are in an SMR region, smr_serialized_load just returns the value of p_ucred from memory without locking.
Whatever is currently in memory is what we get, even if it is an in-progress write from a non-atomic writer thread.
Writing p_ucred
The XNU SMR API requires writers to be serialized by an external mechanism. In this case, it's the p_ucred_mlock (via the proc_ucred_locked API).
This lock serializes writers so that, in theory, a correct pointer is always present in memory, allowing readers to read without locking.
However, as we've seen, even though kauth_cred_proc_update correctly acquires the writer lock, it violates the SMR requirements due to the use of the non-atomic zalloc_ro_mut API.
Triggering the Bug
Every time kauth_cred_proc_update changes p_ucred, the bug is triggered.
However, most of the time, this will not cause problems, because normal workflows only update their credentials rarely, if at all.
To hit the bug we need to read p_ucred while a write is occurring.
We don't need to trigger any allocations or frees; all we need is to cause p_ucred to be changed via zalloc_ro_mut.
Specifically, this happens when a kauth_cred_derive_t closure returns true.
Many paths in the kernel can cause this (e.g. setuid, setgid, setgroups, etc.)
To hit the bug we need two threads: one to trigger frequent p_ucred changes, and one to read p_ucred.
Writer Thread
To allow for an unprivileged local attacker to cause credential changes, I use a binary with the setgid bit.
This allows us to switch the effective group ID back and forth between the saved and real group ID for the caller without requiring root.
Each time the effective group ID changes, p_ucred will need to be updated as well.
Specifically, two credentials will be allocated in the hash table (one for each possible GID), and kauth_cred_proc_update will switch between them.
Here is what that thread looks like:
while(true) {
    setgid(rg); // real gid
    setgid(eg); // eff. gid
}
Each time we call setgid in this manner, setgid will use kauth_cred_proc_update to update the credential pointer in our proc's p_ucred.
Unprivileged users are allowed to swap between the saved GID and real GID without root privileges, so this is a practical way to trigger many credential changes.
Each time p_ucred is changed with zalloc_ro_mut, there is a chance that a concurrent reader will observe an intermediate value.
Reader Thread
unix_syscall64 takes a reference to the current proc cred during every syscall to support maintaining different credentials across threads.
As we have seen, current_cached_proc_cred_update will attempt to verify and dereference p_ucred on credential changes to read the cr_rw field.
Any syscall running concurrently with group ID changes will trigger this read.
My reader thread just calls getgid() in a loop.
volatile gid_t tmp;
while(true)
    tmp = getgid();
At some point, one of these reads will observe a p_ucred value that is halfway written, which will cause a crash if you are lucky, or maybe silent corruption of the credential if you are unlucky!
Running the Proof-of-Concept
The binary needs to be a setgid binary run as a different group than the real GID of the current user so that we have a different group to switch to.
The default group on macOS is staff, so I use everyone as this second group.
This just gives us a convenient way of getting kauth_cred_proc_update to switch credentials without needing root.
Other ways of triggering this are also possible.
chgrp everyone poc
chmod g+s poc
./poc
After running the proof of concept for a while, eventually your process's credential pointer will become corrupted.
This could cause a kernel panic, or maybe it could cause your credentials to silently be changed to point to some other credential object in the kauth_cred zone.
You can find a proof of concept that uses two threads to race kauth_cred_proc_update against current_cached_proc_cred_update here.
5. Conclusion
First off, I should note that this race is quite hard to win. Since the two credentials we are switching between are in the same zone, many of the bytes in their addresses will be identical. This means that even when this race is triggered it may not cause visible issues.
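To get a feel for how narrow the window is, the sketch below counts how many of the seven intermediate states of an 8-byte, byte-by-byte copy differ from both the old and the new value. partial_copy and visible_torn_states are hypothetical helpers for this illustration, and the addresses in the note are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Memory contents after the first `copied` bytes of new_val have landed. */
static uint64_t partial_copy(uint64_t old_val, uint64_t new_val, size_t copied) {
    unsigned char mem[sizeof(uint64_t)];
    uint64_t seen;

    memcpy(mem, &old_val, sizeof(mem));
    memcpy(mem, &new_val, copied);      /* copied is always 1..7 here */
    memcpy(&seen, mem, sizeof(seen));
    return seen;
}

/* Number of intermediate states that match neither valid pointer. */
static int visible_torn_states(uint64_t old_val, uint64_t new_val) {
    int count = 0;

    for (size_t copied = 1; copied < sizeof(uint64_t); copied++) {
        uint64_t seen = partial_copy(old_val, new_val, copied);

        if (seen != old_val && seen != new_val) {
            count++;
        }
    }
    return count;
}
```

For two pointers that differ only in their low two bytes, say 0xffffff86c79e8300 and 0xffffff86c79e9480, just one of the seven intermediate states is distinguishable from a valid pointer. Every other partial read returns either the old or the new credential, which is harmless.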
When investigating this bug I would commonly set up a 2013 Mac Pro running OCLP (I got one for a few hundred bucks from eBay), turn on my proof of concept code, and leave it running overnight with a debugger attached, hoping that it would be stopped in a panic condition when I woke up the next day.
Second, I have only observed this issue affect Intel systems.
I currently believe this to be due to the fact that the version of memcpy used in ARM64 is optimized to copy larger blocks of bytes at a time, which gives some degree of atomicity in practice.
While the code is still not strictly correct on ARM systems (because zalloc_ro_mut does not guarantee atomicity), I was not able to cause any kauth_cred_t corruption there.
Maybe someone reading this can get it to work on an ARM system? If you do, let me know on X @0xjprx.
Suggested Fix
When I reported this bug to Apple, I provided the following suggested fix.
@@ -3947,9 +3947,9 @@ kauth_cred_proc_update(
             kauth_cred_ref(new_cred);
             kauth_cred_hold(new_cred);
-            zalloc_ro_mut(ZONE_ID_PROC_RO, proc_get_ro(p),
+            zalloc_ro_mut_atomic(ZONE_ID_PROC_RO, proc_get_ro(p),
                 offsetof(struct proc_ro, p_ucred),
-                &new_cred, sizeof(struct ucred *));
+                ZRO_ATOMIC_XCHG_LONG, (uint64_t)new_cred);
             kauth_cred_drop(cur_cred);
             ucred_rw_unref_live(cur_cred->cr_rw);
Running the kernel with this patch applied completely eliminated the bug from my setup.
I used zalloc_ro_mut_atomic with ZRO_ATOMIC_XCHG_LONG to atomically swap the old credential pointer for the new one.
A better function to use here is probably something like zalloc_ro_update_field_atomic, but I found there were non-trivial incompatibilities between the implicit structs declared via the SMR pointer macro and the macros used by update_field_atomic, so I just called zalloc_ro_mut_atomic directly.
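For readers less familiar with atomic exchanges, here is a user-space analogue of the idea using C11 atomics. It is not the kernel code path: toy_slot and toy_slot_xchg are hypothetical names, and the real fix also has to write through the read-only mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte p_ucred slot inside proc_ro. */
static _Atomic uint64_t toy_slot;

/*
 * Replace the slot contents in one indivisible step and hand back the
 * previous value, so a concurrent load sees all 8 old bytes or all 8 new.
 */
static uint64_t toy_slot_xchg(uint64_t new_value) {
    return atomic_exchange(&toy_slot, new_value);
}
```

Unlike a byte-by-byte copy, there is no point in time at which the slot holds a mix of the two values.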
Winning the Race
When you win the race, if the invalid pointer is not properly aligned for an element of the zone, you'll get a panic like this:
panic: zone_require_ro failed: element improperly aligned (addr: 0xffffff86c79e8350) @zalloc.c:7376
Panicked task 0xffffff952d31db88: 3 threads: pid 1110: poc
Backtrace (CPU 8), panicked thread: 0xffffff90610770c8, Frame : Return Address
0xfffffff4078abac0 : 0xffffff8007becc41 mach_kernel : _handle_debugger_trap + 0x4c1
0xfffffff4078abb10 : 0xffffff8007d598ec mach_kernel : _kdp_i386_trap + 0x11c
0xfffffff4078abb50 : 0xffffff8007d48f6b mach_kernel : _kernel_trap + 0x48b
0xfffffff4078abc10 : 0xffffff8007b82971 mach_kernel : _return_from_trap + 0xc1
0xfffffff4078abc30 : 0xffffff8007becf37 mach_kernel : _DebuggerTrapWithState + 0x67
0xfffffff4078abd30 : 0xffffff8007bec5d2 mach_kernel : _panic_trap_to_debugger + 0x1e2
0xfffffff4078abda0 : 0xffffff80083d4938 mach_kernel : _panic + 0x81
0xfffffff4078abe90 : 0xffffff80083dab9f mach_kernel : ___smr_stail_invalid + 0x2ce9
0xfffffff4078abed0 : 0xffffff80080c6757 mach_kernel : _kauth_cred_proc_ref + 0x167
0xfffffff4078abf00 : 0xffffff80080c64c8 mach_kernel : _kauth_cred_ref + 0xc8
0xfffffff4078abf40 : 0xffffff800823b4eb mach_kernel : _unix_syscall64 + 0x39b
0xfffffff4078abfa0 : 0xffffff8007b82db6 mach_kernel : _hndl_unix_scall64 + 0x16
Process name corresponding to current thread: poc
Mac OS version: 24A335
Kernel version: Darwin Kernel Version 24.0.0: Mon Aug 12 20:54:30 PDT 2024; root:xnu-11215.1.10~2/RELEASE_X86_64
Kernel UUID: 5DD51D41-0315-3DDD-BD5D-50E782643BDB
roots installed: 0
KernelCache slide: 0x0000000007800000
KernelCache base: 0xffffff8007a00000
Kernel slide: 0x00000000078e4000
Kernel text base: 0xffffff8007ae4000
__HIB text base: 0xffffff8007900000
It might also be possible that the creds align just right such that combining them will give you a pointer to a correctly aligned credential, effectively changing your process's credentials.
Without some other kind of mechanism for deterministically controlling where in the kernel your credential objects are allocated, there isn't a lot of control over how the invalid pointer gets formed, so this may be hard to achieve in practice. Maybe you could try to get them lined up via spraying creds in a particular pattern? I'll leave that as an "exercise for the reader."
Even so, I consider this bug to be extraordinarily fascinating and a great learning example for some really cool features of XNU. What do you think? Feel free to reach out on X @0xjprx.
This bug was fixed in macOS 15.3, released on January 27, 2025.
References
A few cool links about concurrency and lock-free data structures.
[1] Paul E. McKenney. Is Parallel Programming Hard, And, If So, What Can You Do About It? This whole book is amazing. Chapter 9 on Deferred Processing is of particular relevance.
[2] Keir Fraser. Practical Lock-Freedom.
[3] Locks in the Linux Kernel.
[4] SMR Discussion in the FreeBSD Mailing List.
-ravi
January 30, 2025