Overview of ARM64EC ABI conventions
ARM64EC is an application binary interface (ABI) that enables ARM64 binaries to run natively and interoperably with x64 code. Specifically, the ARM64EC ABI follows x64 software conventions including calling convention, stack usage, and data alignment, making ARM64EC and x64 code interoperable. The operating system emulates the x64 portion of the binary. (The EC in ARM64EC stands for emulation compatible.)
For more information on the x64 and ARM64 ABIs, see Overview of x64 ABI conventions and Overview of ARM64 ABI conventions.
ARM64EC doesn't solve memory model differences between x64 and ARM based architectures. For more information, see Common Visual C++ ARM migration issues.
Definitions
- ARM64 - The code stream for ARM64 processes that contains traditional ARM64 code.
- ARM64EC - The code stream that utilizes a subset of the ARM64 register set to provide interoperability with x64 code.
Register mapping
x64 processes may have threads running ARM64EC code. So it's always possible to retrieve an x64 register context, ARM64EC uses a subset of the ARM64 core registers that map 1:1 to emulated x64 registers. Importantly, ARM64EC never uses registers outside of this subset, except to read the Thread Environment Block (TEB) address from x18
.
Native ARM64 processes shouldn't regress in performance when some or many functions are recompiled as ARM64EC. To maintain performance, the ABI follows these principles:
The ARM64EC register subset includes all registers that are part of the ARM64 function calling convention.
The ARM64EC calling convention directly maps to the ARM64 calling convention.
Special helper routines like __chkstk_arm64ec
use custom calling conventions and registers. These registers are also included in the ARM64EC subset of registers.
Register mapping for integer registers
ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
---|---|---|---|---|
x0 |
rcx |
volatile | volatile | volatile |
x1 |
rdx |
volatile | volatile | volatile |
x2 |
r8 |
volatile | volatile | volatile |
x3 |
r9 |
volatile | volatile | volatile |
x4 |
r10 |
volatile | volatile | volatile |
x5 |
r11 |
volatile | volatile | volatile |
x6 |
mm1 (low 64 bits of x87 R1 register) |
volatile | volatile | volatile |
x7 |
mm2 (low 64 bits of x87 R2 register) |
volatile | volatile | volatile |
x8 |
rax |
volatile | volatile | volatile |
x9 |
mm3 (low 64 bits of x87 R3 register) |
volatile | volatile | volatile |
x10 |
mm4 (low 64 bits of x87 R4 register) |
volatile | volatile | volatile |
x11 |
mm5 (low 64 bits of x87 R5 register) |
volatile | volatile | volatile |
x12 |
mm6 (low 64 bits of x87 R6 register) |
volatile | volatile | volatile |
x13 |
N/A | disallowed | volatile | N/A |
x14 |
N/A | disallowed | volatile | N/A |
x15 |
mm7 (low 64 bits of x87 R7 register) |
volatile | volatile | volatile |
x16 |
High 16 bits of each of the x87 R0 -R3 registers |
volatile(xip0 ) |
volatile(xip0 ) |
volatile |
x17 |
High 16 bits of each of the x87 R4 -R7 registers |
volatile(xip1 ) |
volatile(xip1 ) |
volatile |
x18 |
GS.base | fixed(TEB) | fixed(TEB) | fixed(TEB) |
x19 |
r12 |
non-volatile | non-volatile | non-volatile |
x20 |
r13 |
non-volatile | non-volatile | non-volatile |
x21 |
r14 |
non-volatile | non-volatile | non-volatile |
x22 |
r15 |
non-volatile | non-volatile | non-volatile |
x23 |
N/A | disallowed | non-volatile | N/A |
x24 |
N/A | disallowed | non-volatile | N/A |
x25 |
rsi |
non-volatile | non-volatile | non-volatile |
x26 |
rdi |
non-volatile | non-volatile | non-volatile |
x27 |
rbx |
non-volatile | non-volatile | non-volatile |
x28 |
N/A | disallowed | disallowed | N/A |
fp |
rbp |
non-volatile | non-volatile | non-volatile |
lr |
mm0 (low 64 bits of x87 R0 register) |
both | both | both |
sp |
rsp |
non-volatile | non-volatile | non-volatile |
pc |
rip |
instruction pointer | instruction pointer | instruction pointer |
PSTATE subset: N /Z /C /V /SS 1, 2 |
RFLAGS subset: SF /ZF /CF /OF /TF |
volatile | volatile | volatile |
N/A | RFLAGS subset: PF /AF |
N/A | N/A | volatile |
N/A | RFLAGS subset: DF |
N/A | N/A | non-volatile |
1 Avoid directly reading, writing or computing mappings between PSTATE
and RFLAGS
. These bits may be used in the future and are subject to change.
2 The ARM64EC carry flag C
is the inverse of the x64 carry flag CF
for subtraction operations. There's no special handling, because the flag is volatile and is therefore trashed when transitioning between (ARM64EC and x64) functions.
Register mapping for vector registers
ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
---|---|---|---|---|
v0 -v5 |
xmm0 -xmm5 |
volatile | volatile | volatile |
v6 -v7 |
xmm6 -xmm7 |
volatile | volatile | non-volatile |
v8 -v15 |
xmm8 -xmm15 |
volatile 1 | volatile 1 | non-volatile |
v16 -v31 |
xmm16 -xmm31 |
disallowed | volatile | disallowed (x64 emulator doesn't support AVX-512) |
FPCR 2 |
MXCSR[15:6] |
non-volatile | non-volatile | non-volatile |
FPSR 2 |
MXCSR[5:0] |
volatile | volatile | volatile |
1 These ARM64 registers are special in that the lower 64 bits are non-volatile but the upper 64 bits are volatile. From the point of view of an x64 caller, they're effectively volatile because the callee would trash data.
2 Avoid directly reading, writing, or computing mappings of FPCR
and FPSR
. These bits may be used in the future and are subject to change.
Struct packing
ARM64EC follows the same struct packing rules used for x64 to ensure interoperability between ARM64EC code and x64 code. For more information and examples of x64 struct packing, see Overview of x64 ABI conventions.
Emulation helper ABI routines
ARM64EC code and thunks use emulation helper routines to transition between x64 and ARM64EC functions.
The following table describes each special ABI routine and the registers the ABI uses. The routines don't modify the listed preserved registers under the ABI column. No assumptions should be made about unlisted registers. On-disk, the ABI routine pointers are null. At load time, the loader updates the pointers to point to the x64 emulator routines.
Name | Description | ABI |
---|---|---|
__os_arm64x_dispatch_call_no_redirect |
Called by an exit thunk to call an x64 target (either an x64 function or an x64 fast-forward sequence). The routine pushes the ARM64EC return address (in the LR register) followed by the address of the instruction that succeeds a blr x16 instruction that invokes the x64 emulator. It then runs the blr x16 instruction |
return value in x8 (rax ) |
__os_arm64x_dispatch_ret |
Called by an entry thunk to return to its x64 caller. It pops the x64 return address from the stack and invokes the x64 emulator to jump to it | N/A |
__os_arm64x_check_call |
Called by ARM64EC code with a pointer to an exit thunk and the indirect ARM64EC target address to execute. The ARM64EC target is considered patchable, and execution always returns to the caller with either the same data it was called with, or with modified data | Arguments:x9 : The target addressx10 : The exit thunk addressx11 : The fast forward sequence addressOut: x9 : If the target function was detoured, it contains the address of the fast forward sequencex10 : The exit thunk addressx11 : If the function was detoured, it contains the exit thunk address. Otherwise, the target address jumped toPreserved registers: x0 -x8 , x15 (chkstk ). and q0 -q7 |
__os_arm64x_check_icall |
Called by ARM64EC code, with a pointer to an exit thunk, to handle a jump to either a target address that is either x64 or ARM64EC. If the target is x64 and the x64 code hasn't been patched, the routine sets the target address register. It points to the ARM64EC version of the function if one exists. Otherwise, it sets the register to point to the exit thunk that transitions to the x64 target. Then, it returns to the calling ARM64EC code, which then jumps to the address in the register. This routine is a non-optimized version of __os_arm64x_check_call , where the target address isn't known at compile timeUsed at a call-site of an indirect call |
Arguments:x9 : The target addressx10 : The exit thunk addressx11 : The fast forward sequence addressOut: x9 : If the target function was detoured, it contains the address of the fast forward sequencex10 : The exit thunk addressx11 : If the function was detoured, it contains the exit thunk address. Otherwise, the target address jumped toPreserved registers: x0 -x8 , x15 (chkstk ), and q0 -q7 |
__os_arm64x_check_icall_cfg |
Same as __os_arm64x_check_icall but also checks that the specified address is a valid Control Flow Graph indirect call target |
Arguments:x10 : The address of the exit thunkx11 : The address of the target functionOut: x9 : If the target is x64, the address to the function. Otherwise, undefinedx10 : The address of the exit thunkx11 : If the target is x64, it contains the address of the exit thunk. Otherwise, the address of the functionPreserved registers: x0 -x8 , x15 (chkstk ), and q0 -q7 |
__os_arm64x_get_x64_information |
Gets the requested part of the live x64 register context | _Function_class_(ARM64X_GET_X64_INFORMATION) NTSTATUS LdrpGetX64Information(_In_ ULONG Type, _Out_ PVOID Output, _In_ PVOID ExtraInfo) |
__os_arm64x_set_x64_information |
Sets the requested part of the live x64 register context | _Function_class_(ARM64X_SET_X64_INFORMATION) NTSTATUS LdrpSetX64Information(_In_ ULONG Type,_In_ PVOID Input, _In_ PVOID ExtraInfo) |
__os_arm64x_x64_jump |
Used in signature-less adjustor and other thunks that directly forward (jmp ) a call to another function that can have any signature, deferring the potential application of the right thunk to the real target |
Arguments:x9 : target to jump toAll parameter registers preserved (forwarded) |
Thunks
Thunks are the low-level mechanisms to support ARM64EC and x64 functions calling each other. There are two types: entry thunks for entering ARM64EC functions and exit thunks for calling x64 functions.
Entry thunk and intrinsic entry thunks: x64 to ARM64EC function call
To support x64 callers when a C/C++ function is compiled as ARM64EC, the toolchain generates a single entry thunk consisting of ARM64EC machine code. Intrinsics have an entry thunk of their own. All other functions share an entry thunk with all functions that have a matching calling convention, parameters, and return type. The content of the thunk depends on the calling convention of the C/C++ function.
In addition to handling parameters and the return address, the thunk bridges the differences in volatility between ARM64EC and x64 vector registers caused by ARM64EC vector register mapping:
ARM64EC register | x64 register | ARM64EC calling convention | ARM64 calling convention | x64 calling convention |
---|---|---|---|---|
v6 -v15 |
xmm6 -xmm15 |
volatile, but saved/restored in the entry thunk (x64 to ARM64EC) | volatile or partially volatile upper 64 bits | non-volatile |
The entry thunk performs the following actions:
Parameter number | Stack usage |
---|---|
0-4 | Stores ARM64EC v6 and v7 into the caller-allocated home spaceSince the callee is ARM64EC, which doesn't have the notion of a home space, the stored values aren't clobbered. Allocates an extra 128 bytes on the stack and store ARM64EC v8 through v15 . |
5-8 | x4 = 5th parameter from the stackx5 = 6th parameter from the stackx6 = 7th parameter from the stackx7 = 8th parameter from the stackIf the parameter is SIMD, the v4 -v7 registers are used instead |
9+ | Allocates AlignUp(NumParams - 8 , 2) * 8 bytes on the stack. *Copies the 9th and remaining parameters to this area |
* Aligning the value to an even number guarantees that the stack remains aligned to 16 bytes
If the function accepts a 32-bit integer parameter, the thunk is permitted to only push 32 bits instead of the full 64 bits of the parent register.
Next, the thunk uses an ARM64 bl
instruction to call the ARM64EC function. After the function returns, the thunk:
- Undoes any stack allocations
- Calls the
__os_arm64x_dispatch_ret
emulator helper to pop the x64 return address and resume x64 emulation.
Exit thunk: ARM64EC to x64 function call
For every call that an ARM64EC C/C++ function makes to potential x64 code, the MSVC toolchain generates an exit thunk. The content of the thunk depends on the parameters of the x64 callee and whether the callee is using the standard calling convention or __vectorcall
. The compiler obtains this information from a function declaration for the callee.
First, The thunk pushes the return address that's in the ARM64EC lr
register and a dummy 8-byte value to guarantee that the stack is aligned to 16 bytes. Second, the thunk handles the parameters:
Parameter number | Stack usage |
---|---|
0-4 | Allocates 32 bytes of home space on the stack |
5-8 | Allocates AlignUp(NumParams - 4, 2) * 8 more bytes higher up on the stack. * Copies the 5th and any subsequent parameters from ARM64EC's x4 -x7 to this extra space |
9+ | Copies the 9th and remaining parameters to the extra space |
* Aligning the value to an even number guarantees that the stack remains aligned to 16 bytes.
Third, the thunk calls the __os_arm64x_dispatch_call_no_redirect
emulator helper to invoke the x64 emulator to run the x64 function. The call must be a blr x16
instruction (conveniently, x16
is a volatile register). A blr x16
instruction is required because the x64 emulator parses this instruction as a hint.
The x64 function usually attempts to return to the emulator helper by using an x64 ret
instruction. At this point, the x64 emulator detects that it's in ARM64EC code. It then reads the preceding 4-byte hint that happens to be the ARM64 blr x16
instruction. Since this hint indicates that the return address is in this helper, the emulator jumps directly to this address.
The x64 function is permitted to return to the emulator helper using any branch instruction, including x64 jmp
and call
. The emulator handles these scenarios as well.
When the helper then returns to the thunk, the thunk:
- Undoes any stack allocation
- Pops the ARM64EC
lr
register - Executes an ARM64
ret lr
instruction.
ARM64EC function name decoration
An ARM64EC function name has a secondary decoration applied after any language-specific decoration. For functions with C linkage (whether compiled as C or by using extern "C"
), a #
is prepended to the name. For C++ decorated functions, a $$h
tag is inserted into the name.
foo => #foo
?foo@@YAHXZ => ?foo@@$$hYAHXZ
__vectorcall
The ARM64EC toolchain currently doesn't support __vectorcall
. The compiler emits an error when it detects __vectorcall
usage with ARM64EC.
See also
Understanding ARM64EC ABI and assembly code
Common Visual C++ ARM migration issues
Decorated names