egberts1 2 days ago

In QEMU's amd64/x86 emulation (not KVM) mode, emulation of the IMUL opcode still fails after an XOR opcode (within the same TLB-cached page) modifies the neighboring IMUL's operand(s).

The TLB got double-invalidated, yet some say it was never invalidated. The crux is a glitch within a single, entire TLB invalidation operation, which negates the XOR opcode's ability to self-modify the neighboring IMUL operand. (Double ROT13, anyone?) I assert double-invalidation because within the same TLB invalidation stroke, the XOR operation got performed ... twice, as opposed to retrieving and restoring the original IMUL operand value after such invalidation, thereby negating the XOR-computed result EITHER WAY.

A failure of self-modifying code within QEMU amd64/x86 emulation mode could be a useful test to determine whether one is running under QEMU emulation, provided, of course, that the page allows read-write-execute, as is often found in JavaScript, Java, Python and Dalvik (Android) bytecode memory regions.

Fabrice Bellard, author of QEMU, acknowledged the basics of the above failed amd64/x86 IMUL/XOR self-modifying premise in QEMU's emulation (not KVM) mode.

https://github.com/unicorn-engine/unicorn/issues/364

  • dealbreaker 2 days ago

    It's fixed there, though. I am also in that thread, having replied several times when I was working on Unicorn.

    • egberts1 2 days ago

      What a deal breaker #headduck.

      No, really, thank you for the update.

      I will give it another spin using the latest Unicorn-patched QEMU. We have a patched/updated QEMU, ya know, just for Unicorn, do we?

loeg 2 days ago

I'd be interested to see this written for reading (as opposed to presenting). Inferring the speaker's meaning from raw slides is a little challenging. I've had to interact with the minutiae of the x86 TLB in a past life, but happily have forgotten most of that.

nonrandomstring 2 days ago

Elaborate memory management (paging) systems need caching of lookups for high performance. But they can go wrong. The post was made in a security/safety context, but did I miss something? It didn't seem to make clear what the dangers are.

  • caspper69 2 days ago

    I only know x86/64, but I assume most page table caching would be somewhat similar.

    Basically, if you don't handle the TLB properly, the CPU will not know that page mappings and/or page permissions have changed. So if you had a page mapped RW, and then changed the mapping to a RO page (such as setting up COW), but failed to flush the TLB (or at least call INVLPG to flush the entry), the CPU might use those stale permissions and grant write access on that page when it shouldn't. The same could happen for changing a region of the VA space to use a different physical page, where the next bit of code would hit the old page (and who knows what state it might be in or what it could be being used for).

    The TLB is not super-complicated, but it has some quirks (it's been so long since I've done anything with it, the PCID handling rules were new to me; didn't even support it back when).
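The stale-permission scenario described above can be sketched with a toy model. This is only an illustration, not how any real CPU or kernel is implemented: `ToyMMU`, `PageEntry`, `set_mapping`, and `write_allowed` are hypothetical names, and `invlpg` here is loosely modelled on the x86 INVLPG instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    frame: int       # physical frame number
    writable: bool   # write-permission bit


class ToyMMU:
    """Hypothetical toy model: a page table plus a TLB that caches lookups."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> PageEntry (authoritative)
        self.tlb = {}         # virtual page number -> PageEntry (cached copy)

    def set_mapping(self, vpn, frame, writable):
        # Updates only the page table; deliberately does NOT touch the TLB,
        # just as changing a page table entry in memory does not by itself
        # update what the CPU has cached.
        self.page_table[vpn] = PageEntry(frame, writable)

    def invlpg(self, vpn):
        # Drop one cached translation (assumption: single-entry flush only).
        self.tlb.pop(vpn, None)

    def translate(self, vpn):
        # On a TLB miss, walk the page table and cache the result.
        if vpn not in self.tlb:
            self.tlb[vpn] = self.page_table[vpn]
        # On a TLB hit, the page table is never consulted.
        return self.tlb[vpn]

    def write_allowed(self, vpn):
        return self.translate(vpn).writable


def demo():
    mmu = ToyMMU()
    mmu.set_mapping(0x10, frame=7, writable=True)
    before = mmu.write_allowed(0x10)                 # fills the TLB
    mmu.set_mapping(0x10, frame=7, writable=False)   # e.g. setting up COW
    stale = mmu.write_allowed(0x10)                  # answered from the stale TLB entry
    mmu.invlpg(0x10)                                 # the flush that was forgotten
    after = mmu.write_allowed(0x10)                  # now re-read from the page table
    return before, stale, after


if __name__ == "__main__":
    print(demo())
```

The middle value returned by `demo()` is the bug: the page table already says read-only, but the cached entry still grants the write until `invlpg` is called. The same pattern applies to a changed `frame`, where the stale entry would point at the old physical page.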

  • adrian_b 21 hours ago

    The article (towards its end) discusses a serious bug in the INVLPG instruction of the Intel Gracemont processor cores (which are the E-cores used in Alder Lake, Raptor Lake, Raptor Lake Refresh, Alder Lake N, Amston Lake, Twin Lake), which fails to invalidate all the entries that it should invalidate, in certain circumstances.

  • rybosworld 2 days ago

    I'm no expert on TLB invalidation bugs but generally they allow for an attacker to read/write arbitrary memory.

    https://googleprojectzero.blogspot.com/2019/01/taking-page-f...

    • caspper69 2 days ago

      I don't mean to be a pedant, so someone please correct me if I'm wrong, but I don't think TLB mishandling would result in arbitrary memory access (I suppose in the strictest sense arbitrary can just mean random, but generally I have understood it to imply that the address can be attacker controlled, which a stale TLB wouldn't allow).

      Unless you're like Microsoft (from your link) and accidentally leave the page tables writable from userspace for 2 months. But that's not really a TLB error, that's just L-O-L, wow!

      • jcalvinowens 21 hours ago

        Random access is arbitrary access, given enough time. You can try over and over again until you get lucky.

        Imagine I'm a user with local shell access trying to read a secret owned by root. Maybe I can't read the secret, but I can do something which makes another program read the secret. If I can make that program swap (perhaps by wasting a bunch of RAM to create memory pressure), and swapping has some probability of triggering a TLB invalidation bug that lets me see the old page, I win, although it might take a while.
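The "try over and over until you get lucky" argument is just the arithmetic of repeated independent trials. A minimal sketch, assuming each attempt triggers the bug independently with the same probability `p` (an idealization; real attempts may not be independent, and the example probability below is made up for illustration):

```python
import math


def success_probability(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p, target=0.99):
    """Smallest number of tries whose overall success chance reaches `target`."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.001  # hypothetical per-attempt chance of hitting the stale page
    n = attempts_needed(p, target=0.99)
    print(n, success_probability(p, n))
```

Even a tiny per-attempt probability becomes near-certainty once the attacker can afford a few thousand automated retries, which is why "random" access is treated as "arbitrary" here.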

    • egberts1 2 days ago

      Read-Write-eXecute TLB memory region can be found in JavaScript, Java, Dalvik (Android), and Python.

      • Retr0id 2 days ago

        Modern JavaScript engines (namely V8) avoid RWX, although last time I checked there had been a backslide as part of the WASM implementation.

        CPython also no longer appears to create RWX mappings even for ctypes, although you can of course still mmap them manually.

        • egberts1 2 days ago

          Wow. So V8 is actually back to optimizing the entire bytecode region in one fell swoop?

          I had thought that such V8 optimizations were still occurring (as of Chrome Blink81/SparkPlug) during JavaScript execution of untouched bytecode as a form of overhead reduction of its startup.

          https://egbert.net/blog/articles/javascript-jit-engines-time...

          • Retr0id a day ago

            I don't know what V8 actually does, but one possible strategy is to have a pair of RW and RX mappings to the same physical pages (or just keep flipping permissions between the two)

  • fn-mote 2 days ago

    It looks like the last 20 or so pages of the PDF contain two case studies. I read the first one, which led to (nondeterministic) kernel errors.

    Perhaps “hacker” should be “crazy bug debugger”, but anybody who is working with TLB issues is a hacker in my book.

    There is no “CVE” vulnerability in the slides, for sure.

  • immibis 2 days ago

    I conclude that the title is wrong. Not every developer needs to know these things - only kernel developers need to know about TLB invalidation.

    • bell-cot 2 days ago

      Every developer needs to know that cache invalidation is one of the two hard things in computer science - and that people further down in your stack occasionally get it wrong.