Microsoft Security Blog

Everything Old Is New Again: Hardening the Trust Boundary of VBS Enclaves

Microsoft

Mar 03, 2025

Virtualization-Based Security (VBS) enclaves use the hypervisor’s virtual trust levels (VTLs) to isolate regions of memory and code execution within a user-mode process. This provides a powerful basis for trusted execution environments (TEEs) that protect sensitive data, such as encryption keys, even from malicious administrators. However, it also introduces a new trust boundary: the one between the VTL1 enclave and its VTL0 host. This complicates things!

When evaluating whether data is untrusted, one of the foundational questions is whether that data crosses a trust boundary. Common examples of crossing a trust boundary include a higher-privileged process ingesting data from a lower-privileged process, a network service receiving packets from the internet, and a word processor opening a file from a USB drive you found in the parking lot. A key difference between those trust boundaries and the one separating an enclave from its host process is that in each of those cases, the higher-privileged entity is external to the lower-privileged one: a kernel driver vs. a user-mode process, a network server vs. an internet client, a word processor vs. a file on a USB drive you found in the parking lot. An enclave, however, exists within its host process, so this new trust boundary is internal to that process. This requires a shift in perspective for the developer: the enclave cannot trust anything that originates from the host process, even though both live in the same process.

MORSE has partnered closely with teams across Microsoft that are building VBS enclaves and has collected lessons learned from this shift in perspective. Since support for third-party enclaves was announced last year, it is important to highlight this new threat model and its design patterns for the broader developer community. In this blog post, we present recommendations you can follow to help harden your enclave against common vulnerabilities.

Never trust VTL0

The most important thing to remember is this: the host process cannot read or write the enclave’s memory region, but the converse does not hold. An enclave can read and write the memory of its VTL0 host process. This can create tricky situations when the enclave operates on pointers passed to it by the host process. Always follow two guidelines when operating on VTL0 data. First, validate that pointers actually fall outside the address range of the VTL1 enclave. Second, copy the parameters’ data structures into VTL1 before validating their fields.

Validate pointers are in VTL0

An exported enclave function called with the CallEnclave API has a similar type definition to a function invoked by the CreateThread API:

```c
typedef LPVOID (WINAPI *PENCLAVE_ROUTINE)(
    LPVOID lpThreadParameter
);
```

If the host process wants to pass a structure to the enclave routine, this parameter should be a pointer to data in the VTL0 host process’s memory, but nothing enforces that by default. At the analogous trust boundary between kernel mode and user mode, the kernel has primitives for checking this (ProbeForRead and ProbeForWrite), but no such primitives exist in the enclave runtime.

Consider an example scenario in which an enclave holds a secret buffer and the host process can query the state of that buffer:

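```cpp
#include <windows.h>
#include <cstddef>
#include <cstdint>
#include <new>

enum State {
    Unallocated,
    Allocated,
    Initialized
};

State g_State = State::Unallocated;
uint8_t *g_SecretData = nullptr;
size_t g_SecretLength = 0;

// VULNERABLE (intentionally): lpParam is never checked against the
// enclave's own address range before it is written through.
LPVOID GetState(LPVOID lpParam) {
    State* state = (State*)lpParam;

    if (state == nullptr) {
        return (LPVOID)E_INVALIDARG;
    }

    *state = g_State;

    return (LPVOID)S_OK;
}

LPVOID AllocateBuffer(LPVOID lpParam) {
    size_t size = (size_t)lpParam;

    // Plain new[] throws on failure instead of returning nullptr;
    // nothrow new makes the null check below meaningful.
    g_SecretData = new (std::nothrow) uint8_t[size];
    if (g_SecretData == nullptr) {
        return (LPVOID)E_OUTOFMEMORY;
    }

    g_SecretLength = size;
    g_State = State::Allocated;

    return (LPVOID)S_OK;
}

// Generate secret data in other functions (not displayed)
LPVOID GenerateSecret(LPVOID lpParam) {
    /* ... */
}
```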

A legitimate host will pass a VTL0 pointer to retrieve the state value, but what would happen if you called AllocateBuffer, then called GetState and passed in an address inside the enclave? Let's see what that might look like.

The declarations of g_State, g_SecretData, and g_SecretLength in the code snippet above initialize all three to zero/nullptr. Within the VTL0 address space, the malicious host process allocates a buffer at address 0x00000100`00000000.

When the host calls the AllocateBuffer routine with a parameter of 0x20, that function allocates a buffer of length 0x20 and sets g_SecretData to point to it. It then sets g_SecretLength to the value passed in, 0x20. Finally, it updates g_State to State::Allocated (1).

Next, the host process calls GetState. Instead of passing the address of a variable in its own VTL0 address space, it passes an address at a chosen offset from the g_SecretData variable itself. The host can easily calculate this address from the addresses of the enclave’s exported functions. Since the enclave does not validate the parameter, it happily writes the value of g_State to that address, overwriting part of the g_SecretData pointer.

Since g_State is only four bytes wide, it takes a few overlapped writes to fully clear the original value of the g_SecretData pointer.

The third function call completes the overwrite, and g_SecretData now points to 0x00000100`00000000. However, that write also spilled past the end of g_SecretData into the low byte of g_SecretLength, which is now 0. That is a problem.

One final write sets g_SecretLength to a usable value again, in this case 0x100, so that any operations relying on the length don’t immediately fail because it’s been zeroed out.

Now that both values have been changed, g_SecretData points outside the enclave into VTL0 memory the host controls. The host therefore has full control of the secret buffer: it can read whatever the enclave stores there, or even modify it!
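To make the overlapped writes concrete, here is one hypothetical reconstruction of the write sequence that is consistent with the walkthrough. It models the three globals as 24 bytes of memory laid out as g_State, four bytes of padding, g_SecretData, then g_SecretLength. Each malicious GetState call is treated as a four-byte little-endian store of State::Allocated at a host-chosen address. The layout, the offsets, and the starting heap pointer are all assumptions; a real compiler and linker may place these globals differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout of the three globals (offsets in bytes):
//   [0, 4)   g_State        [4, 8)   padding
//   [8, 16)  g_SecretData   [16, 24) g_SecretLength
constexpr std::size_t kStateOff = 0;
constexpr std::size_t kDataOff = 8;
constexpr std::size_t kLengthOff = 16;

struct Globals {
    uint8_t bytes[24] = {};
};

// Little-endian helpers, so the result does not depend on the host's byte order.
uint64_t ReadLE64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

void WriteLE64(uint8_t* p, uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        p[i] = static_cast<uint8_t>(v >> (8 * i));
    }
}

// State right after AllocateBuffer(0x20). The heap address is made up.
Globals AfterAllocateBuffer() {
    Globals g;
    g.bytes[kStateOff] = 1;  // g_State = State::Allocated, stored as 01 00 00 00
    WriteLE64(&g.bytes[kDataOff], 0x0000029A12345670ull);
    WriteLE64(&g.bytes[kLengthOff], 0x20);
    return g;
}

// Models the unchecked `*state = g_State;` in GetState: copy the four bytes
// of g_State to whatever offset the host passed in.
void GetStateWriteAt(Globals& g, std::size_t target) {
    if (target + 4 > sizeof(g.bytes)) {
        return;  // stay inside the simulated region
    }
    std::memmove(&g.bytes[target], &g.bytes[kStateOff], 4);
}

// One sequence of host-chosen offsets consistent with the walkthrough.
void RunAttack(Globals& g) {
    GetStateWriteAt(g, kDataOff + 1);    // pointer bytes 1..4 = 01 00 00 00
    GetStateWriteAt(g, kDataOff - 1);    // padding byte = 01; pointer bytes 0..2 = 00
    GetStateWriteAt(g, kDataOff + 5);    // pointer = 0x00000100`00000000; length byte 0 = 00
    GetStateWriteAt(g, kLengthOff + 1);  // g_SecretLength = 0x100
}
```

Under these assumptions, the host passes byte addresses &g_SecretData + 1, &g_SecretData - 1, &g_SecretData + 5, and finally &g_SecretLength + 1. The third store finishes the pointer and zeroes the low byte of g_SecretLength. The fourth store sets the length to 0x100.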

To prevent this type of pattern, use the EnclaveGetEnclaveInformation API during your enclave’s initialization to figure out what the bounds of your enclave are, then confirm every pointer passed from the VTL0 host process is outside of those bounds before copying data to/from that pointer.
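Here is a minimal sketch of that bounds check. It assumes the enclave’s base address and size were captured once at initialization; in a real enclave they would come from the BaseAddress and Size fields that EnclaveGetEnclaveInformation fills in. EnclaveBounds and IsBufferOutsideEnclave are hypothetical names. The bounds are plain parameters here so the logic can be exercised outside an enclave. The check rejects null pointers and ranges whose end would wrap around the address space. It then requires the whole range [ptr, ptr + length) to sit entirely below or entirely above the enclave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Enclave address range, captured once during initialization. In a real
// enclave these values would come from EnclaveGetEnclaveInformation; here
// they are plain parameters so the check can run anywhere.
struct EnclaveBounds {
    uintptr_t base;
    std::size_t size;
};

// Returns true only if [ptr, ptr + length) is non-null, does not wrap around
// the address space, and lies entirely below or entirely above the enclave.
// Zero-length ranges are rejected to keep the rule simple.
bool IsBufferOutsideEnclave(const EnclaveBounds& enclave, uintptr_t ptr, std::size_t length) {
    if (ptr == 0 || length == 0) {
        return false;
    }
    if (ptr > UINTPTR_MAX - length) {
        return false;  // ptr + length would overflow
    }
    const uintptr_t end = ptr + length;  // one past the last byte
    const uintptr_t enclaveEnd = enclave.base + enclave.size;
    return end <= enclave.base || ptr >= enclaveEnd;
}
```

In GetState, for example, this check would run before the write. The call would be rejected unless IsBufferOutsideEnclave(bounds, (uintptr_t)lpParam, sizeof(State)) returns true.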

Capture VTL0 structures in VTL1 before checks

Validating that the function parameter lives in VTL0 is only a piece of the puzzle. If the parameter is a structure, you also need to recursively confirm that every pointer in that structure is in the VTL0 address space, every pointer in those structures is in the VTL0 address space, ad infinitum. This is where developers make the second most common misstep: they do not “capture” the structure in VTL1, which is just a fancy way of saying “copy it”. Once you have checked the value of a structure in VTL0, that value is still sitting in VTL0. If the host process is fast enough, it can win the race and change a value with a second thread after the enclave checks a pointer (or other value like a buffer size!) and before it uses the pointer. This is known as a “time of check, time of use” bug, or TOCTOU, and is shown in the following diagram.

To avoid this class of bugs, you should first validate the parameter’s address as previously described, then create a local copy of the parameter in VTL1. Care must be taken to ensure any pointers within the captured structures are also validated and captured as well. After you have done this, you can freely check and use the fields without worrying that they might be changed out from under you!
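The following sketch shows the capture step for a hypothetical request structure that carries a pointer and a length. The structure’s address is checked, and the structure is copied into enclave memory with a single read. Only the copy is validated and used. The buffer that the copy references is then validated and captured as well. HostRequest, CaptureRequest, kMaxRequestLength, and the IsOutside helper are illustrative names, not part of any enclave API. The enclave bounds are passed in rather than queried.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct EnclaveBounds {
    uintptr_t base;
    std::size_t size;
};

// True if [p, p + n) is non-null, non-wrapping, and entirely outside the enclave.
bool IsOutside(const EnclaveBounds& e, uintptr_t p, std::size_t n) {
    if (p == 0 || n == 0 || p > UINTPTR_MAX - n) {
        return false;
    }
    return p + n <= e.base || p >= e.base + e.size;
}

// Hypothetical request layout the host passes by pointer.
struct HostRequest {
    const uint8_t* data;  // must point into VTL0
    std::size_t length;
};

constexpr std::size_t kMaxRequestLength = 4096;  // illustrative limit

bool CaptureRequest(const EnclaveBounds& enclave, const HostRequest* hostReq,
                    std::vector<uint8_t>& capturedData) {
    // 1. The structure itself must lie outside the enclave.
    if (!IsOutside(enclave, reinterpret_cast<uintptr_t>(hostReq), sizeof(HostRequest))) {
        return false;
    }
    // 2. Capture it: read the VTL0 copy exactly once.
    HostRequest local;
    std::memcpy(&local, hostReq, sizeof(local));

    // 3. Validate only the captured fields, including the embedded pointer.
    if (local.length > kMaxRequestLength ||
        !IsOutside(enclave, reinterpret_cast<uintptr_t>(local.data), local.length)) {
        return false;
    }
    // 4. Capture the referenced buffer; later code reads only capturedData.
    capturedData.assign(local.data, local.data + local.length);
    return true;
}
```

After CaptureRequest succeeds, later code should touch only capturedData. If the host rewrites length or data from another thread, the enclave never observes the change.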

Avoid reentrancy if possible

The host process can call the enclave’s exported functions using the CallEnclave API, but the enclave can also use the CallEnclave API to call a function in VTL0. A classic use case for this is when the host calls a function in the enclave that will generate an unknown amount of data; the enclave can invoke a callback in VTL0 that allocates the required space in VTL0, returning that address to VTL1 so that the enclave can copy the data out.
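The address the host returns from such an allocation callback is just as untrusted as any other VTL0 input. The minimal sketch below validates the returned range before copying anything out. It assumes a hypothetical HostAllocCallback signature; a real enclave would reach the host routine through CallEnclave rather than a plain function pointer. The IsOutside helper mirrors the bounds check described earlier and is likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct EnclaveBounds {
    uintptr_t base;
    std::size_t size;
};

// True if [p, p + n) is non-null, non-wrapping, and entirely outside the enclave.
bool IsOutside(const EnclaveBounds& e, uintptr_t p, std::size_t n) {
    if (p == 0 || n == 0 || p > UINTPTR_MAX - n) {
        return false;
    }
    return p + n <= e.base || p >= e.base + e.size;
}

// Hypothetical host allocation callback. In a real enclave the host routine
// would be invoked through CallEnclave; here it is a plain function pointer.
using HostAllocCallback = void* (*)(std::size_t size);

// Asks the host for a buffer, then treats the returned address as untrusted
// input: it is validated before the enclave writes anything to it.
bool CopyOutViaHost(const EnclaveBounds& enclave, const std::vector<uint8_t>& data,
                    HostAllocCallback hostAlloc, void** hostBufferOut) {
    void* hostBuffer = hostAlloc(data.size());
    if (!IsOutside(enclave, reinterpret_cast<uintptr_t>(hostBuffer), data.size())) {
        return false;  // a malicious host could return an enclave address
    }
    std::memcpy(hostBuffer, data.data(), data.size());
    *hostBufferOut = hostBuffer;
    return true;
}
```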

However, nothing prevents that VTL0 callback from calling back into VTL1. This is known as “reentrancy,” and the pattern can often be abused to create use-after-free conditions and other types of bugs. The obvious solution is to use synchronization primitives (enclaves support the commonly used CRITICAL_SECTION locks), but a subtle problem arises.

When the host calls an entry point in the enclave, the Secure Kernel (SK) assigns execution to a VTL1 thread. If, while executing that call, the enclave calls back out to VTL0, execution continues on the original VTL0 thread. If that host thread calls another entry point in the enclave before the first enclave call returns, the SK assigns execution to the same VTL1 thread again... and there’s the rub. Let’s take a look at a scenario that is possible with CRITICAL_SECTION locks:

CRITICAL_SECTION locks allow recursive locking: if a thread that already owns the lock calls EnterCriticalSection again before calling LeaveCriticalSection, the call succeeds. This can lead to situations where one enclave routine operates on data it believes is locked while a second enclave routine deletes or changes that data out from under it. A developer writing an enclave that uses CRITICAL_SECTION locks must therefore take great care to either avoid reentrancy altogether or establish checks for reentrancy and respond accordingly.
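One way to detect reentrancy with a recursive lock is a flag that is only read and written while the lock is held. The sketch below uses std::recursive_mutex as a portable stand-in for CRITICAL_SECTION, since both let the owning thread acquire them again. Other threads block on the lock, so the only caller that can see the flag set is a reentrant call on the same thread. The routine names, the HostCallback type, and the status codes are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

constexpr int kOk = 0;
constexpr int kReentrancyRejected = 1;

// std::recursive_mutex stands in for CRITICAL_SECTION: the owning thread can
// acquire it again, so the lock by itself does not stop a reentrant call.
std::recursive_mutex g_lock;
std::vector<int> g_items{1, 2, 3};

// True while an entry point is calling out to the host. Only read or written
// with g_lock held, so the only code that can see it set is a reentrant call
// on the thread that is already inside the enclave.
bool g_inCallout = false;

// Hypothetical host callback; a real enclave would use CallEnclave.
using HostCallback = void (*)(int item, void* context);

// RAII helper so the flag is cleared however ProcessItems exits.
struct CalloutScope {
    CalloutScope() { g_inCallout = true; }
    ~CalloutScope() { g_inCallout = false; }
};

int ClearItems() {
    std::lock_guard<std::recursive_mutex> guard(g_lock);  // succeeds on reentry
    if (g_inCallout) {
        return kReentrancyRejected;  // another frame is iterating g_items
    }
    g_items.clear();
    return kOk;
}

int ProcessItems(HostCallback callback, void* context) {
    std::lock_guard<std::recursive_mutex> guard(g_lock);
    if (g_inCallout) {
        return kReentrancyRejected;
    }
    CalloutScope scope;  // declared after guard, so it is reset before unlock
    for (int item : g_items) {
        callback(item, context);  // host code runs here and may re-enter
    }
    return kOk;
}
```

Without the flag, a host callback that calls ClearItems would succeed in acquiring the lock and clear g_items while ProcessItems is still iterating over it.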

If you absolutely must call back into VTL0 during an enclave routine, the best choice is to use only the other lock primitive that enclaves support: SRW locks. SRW locks cannot be acquired recursively, which prevents a second enclave routine from modifying data out from under the first (assuming you have correctly protected your data with the locks, but that is true of all multithreaded applications).
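A non-recursive lock turns the same reentrant call into a failed acquisition instead of a silent success. The sketch below uses a tiny try-lock built on std::atomic<bool> so that it runs anywhere. In an enclave, the analogous Win32 calls would presumably be TryAcquireSRWLockExclusive and ReleaseSRWLockExclusive, assuming they are available in your enclave runtime. A try-acquire also matters because a blocking exclusive acquire from the thread that already owns the lock would never succeed. The lock class, routines, and status codes are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

constexpr int kOk = 0;
constexpr int kBusy = 1;

// Non-recursive try-lock: while it is held, every acquire attempt fails,
// including one from the owning thread.
class NonRecursiveLock {
public:
    bool TryAcquire() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true, std::memory_order_acquire);
    }
    void Release() { held_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> held_{false};
};

NonRecursiveLock g_itemsLock;
std::vector<int> g_items{1, 2, 3};

// Hypothetical host callback; a real enclave would use CallEnclave.
using HostCallback = void (*)(int item, void* context);

int ClearItems() {
    if (!g_itemsLock.TryAcquire()) {
        return kBusy;  // also what a reentrant caller gets
    }
    g_items.clear();
    g_itemsLock.Release();
    return kOk;
}

int ProcessItems(HostCallback callback, void* context) {
    if (!g_itemsLock.TryAcquire()) {
        return kBusy;
    }
    for (int item : g_items) {
        callback(item, context);  // a reentrant ClearItems() fails to acquire
    }
    g_itemsLock.Release();  // sketch only: no unwinding if callback throws
    return kOk;
}
```

This sketch fails fast on any contention, including ordinary contention from another thread, which is a simplification. A production version would wait for other threads and reserve the immediate failure for the reentrant case.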

Keep secrets in the enclave

The whole purpose of a trusted execution environment such as an enclave is to prevent anyone outside it from ever accessing secrets like encryption keys. Creating this sensitive information in the clear within the untrusted host process and then feeding it to the enclave introduces a point of failure. That point of failure can be exploited to reveal the secret to untrusted actors on the system. Anything you want your enclave to protect should be generated within the confines of the enclave. It should never leave the enclave unless it is securely communicated to a trusted entity. A good example is the SQL Always Encrypted enclave: encryption keys are generated in the enclave, and sensitive database contents are passed to and from the trusted client only through an encrypted channel that the enclave has securely negotiated with that client.

It is important to remember that any process can load your enclave, and your enclave cannot tell what process loaded it. Imagine a scenario in which you have an enclave that persists an encryption key in an encrypted file on disk, and your host process needs to provision the encryption key when it does not exist. If the host process handles generating that key and passing it to the enclave, what happens if an attacker gets there first? When an attacker’s untrusted process loads the enclave and passes it an encryption key they control to encrypt and write to disk, your legitimate host process (and its loaded enclave) will not be able to tell the difference. All data encrypted by the enclave with that key will then be fully compromised by the attacker, because they already know the secret key that is protecting it.

If the enclave itself handles generating all secret data, then it matters much less if an attacker tries to provision this data first, because the attacker cannot know the secrets.

Similarly, your enclave should not release secrets outside of a secure channel negotiated with trusted parties; if it does, you might have created the conditions for an oracle attack. If your enclave does need to release sensitive data to a remote trusted party, take care not to use a negotiation protocol that is vulnerable to interception. For example, a plain Diffie-Hellman key exchange is not authenticated, so an attacker can insert themselves into the middle of the exchange.
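To see why an unauthenticated exchange is not enough, here is a toy Diffie-Hellman run with the textbook parameters p = 23 and g = 5, which are far too small for real use. The attacker swaps each public value in transit for its own. Each side completes the exchange normally, but with the attacker rather than with the other side. All names are illustrative, and nothing here is a usable cryptographic implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters, for illustration only.
constexpr uint64_t kP = 23;
constexpr uint64_t kG = 5;

// Modular exponentiation by squaring.
uint64_t ModPow(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % mod;
        }
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

uint64_t PublicValue(uint64_t secret) { return ModPow(kG, secret, kP); }
uint64_t SharedKey(uint64_t peerPublic, uint64_t secret) { return ModPow(peerPublic, secret, kP); }

struct MitmOutcome {
    uint64_t enclaveKey;          // what the enclave thinks it shares with the trusted party
    uint64_t partyKey;            // what the trusted party thinks it shares with the enclave
    uint64_t attackerEnclaveKey;  // attacker's key toward the enclave
    uint64_t attackerPartyKey;    // attacker's key toward the trusted party
};

// The attacker replaces each public value in transit with its own. Nothing in
// plain DH lets either side notice, because no public value is authenticated.
MitmOutcome SimulateMitm(uint64_t enclaveSecret, uint64_t partySecret, uint64_t attackerSecret) {
    const uint64_t attackerPublic = PublicValue(attackerSecret);
    return {
        SharedKey(attackerPublic, enclaveSecret),
        SharedKey(attackerPublic, partySecret),
        SharedKey(PublicValue(enclaveSecret), attackerSecret),
        SharedKey(PublicValue(partySecret), attackerSecret),
    };
}
```

Take enclave secret 4, trusted-party secret 3, and attacker secret 6. The enclave derives key 2 and the trusted party derives key 6, and the attacker can compute both and relay traffic between them. Binding the enclave’s public key to something the trusted party can verify is what closes this gap.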

Instead, consider using enclave attestation reports together with Azure Attestation or Host Guardian Service, which can verify that the enclave’s system is healthy and can be trusted. Through the attestation report and this trusted service, the enclave can provide a public key to the trusted party. The trusted party uses that key to encrypt a session key, which then provisions the enclave for future secure communication.

Don’t reinvent the wheel

The runtime in enclaves is quite limited compared to a standard user-mode process or even a legacy VTL1 trustlet, so C is the only official language supported for developing enclaves. This can cause some friction, because if a developer wants to use safe coding patterns, like something as simple as a bounds-checking array, they need to “roll their own” ... and rolling your own anything can be error-prone and dangerous.

Options here are limited; official C++ build support may happen at some point but doesn’t yet exist. However, that does not have to stop you! With a little bit of effort and configuration, some C++ standard library features can still be compiled in the enclave environment. Additionally, some of the Windows Implementation Library RAII wrappers can be used once any linking errors are solved through stubbing. If you limit your modifications to only fixing linker errors, you can take advantage of safer containers in C++, rather than reimplementing them yourself.

But, if we’re already delving into the murky waters of “unsupported languages,” we might suggest implementing your enclave in Rust. During a recent MORSE hackathon, we built a simple proof-of-concept enclave in Rust. If you tightly constrain any unsafe behavior to a limited amount of glue code, the Rust language brings to bear the borrow checker for added memory safety.

Conclusion

VBS Enclaves are a great way to protect your sensitive data from even highly privileged actors, but as we’ve seen here, there are a lot of ways to step on rakes. Not only are there common errors that resemble those found in traditional trust boundaries, but there are also some new ones that can be subtle. By following the recommendations we’ve outlined in this article, you can harden your enclave against common vulnerability patterns that we have seen in our reviews!

Updated Mar 05, 2025

Version 2.0

josh-watson

Microsoft
