arch/xtensa: Add an arch-internal README on register windows

Back when I started work on this stuff, I had a set of notes on
register windows that slowly evolved into something that looks like
formal documentation. There really isn't any overview-style
documentation of this stuff on the public internet, so it couldn't
hurt to commit it here for posterity.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>

parent a230fafde5
commit d0c538e9a2
108
arch/xtensa/core/README-WINDOWS.rst
Normal file
@@ -0,0 +1,108 @@
# How Xtensa register windows work

There is a paucity of introductory material on this subject, and
Zephyr plays some tricks here that require understanding the base
layer.

## Hardware

When register windows are configured in the CPU, there are either 32
or 64 "real" registers in hardware, with 16 visible at one time.
Registers are grouped and rotated in units of 4, so there are 8 or 16
such "quads" (my term, not Tensilica's) in hardware, of which 4 are
visible as A0-A15.

The first quad (A0-A3) is pointed to by a special register called
WINDOWBASE. The register file is cyclic, so for example if NREGS==64
(64 hardware registers, i.e. 16 quads) and WINDOWBASE is 15, quads
15, 0, 1, and 2 will be visible as (respectively) A0-A3, A4-A7,
A8-A11, and A12-A15.
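
The cyclic mapping can be sketched in a few lines of Python. This is
only an illustrative model, not anything taken from Zephyr or the
hardware documentation; the function names `visible_quads` and
`physical_register` are made up for this example.

```python
def visible_quads(windowbase, nregs=64):
    """Return the four hardware quads visible as A0-A3, A4-A7, A8-A11, A12-A15."""
    nquads = nregs // 4
    return [(windowbase + i) % nquads for i in range(4)]


def physical_register(logical, windowbase, nregs=64):
    """Map logical register A<logical> to an index in the hardware register file."""
    if not 0 <= logical < 16:
        raise ValueError("only A0-A15 are visible")
    # WINDOWBASE counts quads, so scale by 4 and wrap around the file.
    return (windowbase * 4 + logical) % nregs


print(visible_quads(15))         # the example from the text: [15, 0, 1, 2]
print(physical_register(4, 15))  # A4 wraps around to hardware register 0
```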

There is a ROTW instruction that can be used to manually rotate the
window by an immediate number of quads that is added to WINDOWBASE.
Positive rotations "move" high registers into low registers
(i.e. after "ROTW 1" the register that used to be called A4 is now
A0).
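
As a rough model of the rotation (assuming the 4-bit signed immediate
range of -8..7, and with `rotw` and `phys` being hypothetical helper
names used only here):

```python
def rotw(windowbase, imm, nregs=64):
    """Model of ROTW: add a signed quad count to WINDOWBASE, modulo the file size."""
    if not -8 <= imm <= 7:
        raise ValueError("immediate assumed to be a 4-bit signed value")
    return (windowbase + imm) % (nregs // 4)


def phys(logical, windowbase, nregs=64):
    """Hardware index of logical register A<logical> for a given WINDOWBASE."""
    return (windowbase * 4 + logical) % nregs


old_wb = 3
new_wb = rotw(old_wb, 1)
# The register that used to be called A4 is now A0.
print(phys(4, old_wb) == phys(0, new_wb))
```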

There are CALL4/CALL8/CALL12 instructions to effect rotated calls
which rotate registers upward (i.e. "hiding" low registers from the
callee) by 1, 2 or 3 quads. These do not rotate the window
themselves. Instead they place the rotation amount in two places
(yes, two; see below): the 2-bit CALLINC field of the PS register, and
the top two bits of the return address, which is written to the
register that will become the callee's A0 (the caller's A4, A8 or
A12).
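
The two places a CALLn leaves the rotation amount can be illustrated
like this. The function `calln` and its return tuple are inventions
for this sketch; only the bit layout follows the description above.

```python
def calln(n, return_pc):
    """Model what CALL4/CALL8/CALL12 record, without rotating anything.

    Returns (callinc, caller_register, tagged_return_address).
    """
    if n not in (4, 8, 12):
        raise ValueError("n must be 4, 8 or 12")
    callinc = n // 4  # 1, 2 or 3 quads; this also goes into PS.CALLINC
    # The top two bits of the 32-bit return address carry the same value.
    tagged = (callinc << 30) | (return_pc & 0x3FFFFFFF)
    # The caller writes it to A4/A8/A12, which the callee will see as A0.
    return callinc, n, tagged


callinc, reg, value = calln(8, 0x40001234)
print(callinc, "A%d" % reg, hex(value))
```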

There is an ENTRY instruction that does the rotation. It adds CALLINC
to WINDOWBASE, at the same time copying the old (now hidden) stack
pointer in A1 into the "new" A1 in the rotated frame, subtracting an
immediate offset from it to make space for the new frame.

There is a RETW instruction that undoes the rotation. It reads the
top two bits from the return address in A0 and subtracts that value
from WINDOWBASE before returning. This is why the CALLINC bits went
in two places. They have to be stored on the stack across potentially
many calls, so they need to be GPR data that lives in registers and
can be spilled. But ENTRY isn't specified to assume a particular
return address format, and it uses the value immediately, so it makes
more sense for it to use processor state instead.
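
Putting CALLn, ENTRY and RETW together, a toy model of one call and
return might look like the following. The class `Windows` and its
method names are hypothetical; it ignores WINDOWSTART and exceptions
entirely and only tracks WINDOWBASE, CALLINC and the register file.

```python
class Windows:
    """Toy register-window model: no WINDOWSTART, no spills or fills."""

    def __init__(self, nregs=64):
        self.regs = [0] * nregs
        self.nquads = nregs // 4
        self.windowbase = 0
        self.callinc = 0  # stands in for PS.CALLINC

    def _index(self, n):
        return (self.windowbase * 4 + n) % len(self.regs)

    def get(self, n):
        return self.regs[self._index(n)]

    def set(self, n, value):
        self.regs[self._index(n)] = value & 0xFFFFFFFF

    def calln(self, n, return_pc):
        # Record the rotation in CALLINC and in the return address.
        self.callinc = n // 4
        self.set(n, (self.callinc << 30) | (return_pc & 0x3FFFFFFF))

    def entry(self, frame_size):
        # Rotate, then derive the new A1 from the caller's A1.
        caller_sp = self.get(1)
        self.windowbase = (self.windowbase + self.callinc) % self.nquads
        self.set(1, caller_sp - frame_size)

    def retw(self):
        # Undo the rotation using the top two bits of A0.
        a0 = self.get(0)
        self.windowbase = (self.windowbase - (a0 >> 30)) % self.nquads
        return a0 & 0x3FFFFFFF  # low 30 bits of the return address


w = Windows()
w.windowbase = 15
w.set(1, 0x1000)      # caller's stack pointer
w.calln(8, 0x2000)    # CALL8
w.entry(32)           # callee prologue
print(w.windowbase, hex(w.get(1)))
print(hex(w.retw()), w.windowbase, hex(w.get(1)))
```

Note that RETW never consults CALLINC here: by the time a function
returns, CALLINC may have been overwritten by any number of nested
calls, which is the reason the text gives for keeping a second copy
in A0.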

Note that we still don't know how to detect when the register file has
wrapped around and needs to be spilled or filled. To do this there is
a WINDOWSTART register used to detect which register quads are in use.
The name "start" is somewhat confusing, because this is not a pointer.
WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8
or 16 bits wide). The bit in WINDOWSTART corresponding to WINDOWBASE
will be set by the ENTRY instruction, and remain set after rotations
until cleared by a function return (by RETW, see below). Other bits
stay zero. So there is one set bit in WINDOWSTART corresponding to
each call frame that is live in hardware registers, and it will be
followed by 0, 1 or 2 zero bits that tell you how "big" (how many
quads of registers) that frame is.
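
To make the bitmask concrete, here is a small decoder. It is a sketch
under two assumptions: the bit for WINDOWBASE is set, and frames are
listed oldest first. The name `live_frames` is made up for this
example.

```python
def live_frames(windowstart, windowbase, nquads=16):
    """Decode WINDOWSTART into (start_quad, size_in_quads) pairs.

    Returns (frames, current), where `frames` covers every live frame
    except the current one, oldest first, and `current` is the quad of
    the newest frame (None if no bit is set).
    """
    # Walk the cyclic file so that WINDOWBASE comes last.
    order = [(windowbase + 1 + i) % nquads for i in range(nquads)]
    starts = [q for q in order if windowstart & (1 << q)]
    # A frame's size is the distance to the next set bit.
    frames = [(cur, (nxt - cur) % nquads) for cur, nxt in zip(starts, starts[1:])]
    return frames, (starts[-1] if starts else None)


# A frame at quad 14 made a CALL4, the frame at quad 15 made a CALL8,
# and the current frame sits at quad 1 (the file wrapped around).
mask = (1 << 14) | (1 << 15) | (1 << 1)
print(live_frames(mask, windowbase=1))
```

The size of the current frame cannot be read from the mask, since no
newer frame's bit follows it yet.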

So the CPU executing RETW checks to make sure that the register quad
being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit
indicating it's valid. If it does not, the registers must have been
spilled and the CPU traps to an exception handler to fill them.

Likewise, the processor can tell if a high register is "owned" by
another call by seeing if there is a one in WINDOWSTART between that
register's quad and WINDOWBASE. If there is, the CPU traps to a spill
handler to spill one frame. Note that a frame might be only four
registers, but it's possible to hit registers 12 out from WINDOWBASE,
so it's actually possible to trap again when the instruction restarts
to spill a second frame, and even a third time at maximum.
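
The repeated trapping can be modelled as a loop. This is a simplified
sketch, assuming each trap spills the nearest owning frame and clears
its WINDOWSTART bit before the instruction restarts; `overflow_traps`
is a hypothetical name.

```python
def overflow_traps(windowstart, windowbase, reg, nquads=16):
    """Count spill traps taken before A<reg> can be used.

    Returns (traps, windowstart_after).
    """
    quad = reg // 4  # 0 for A0-A3, up to 3 for A12-A15
    traps = 0
    while True:
        # Quads between WINDOWBASE and the accessed register that
        # still belong to another (older) frame.
        owned = [
            (windowbase + i) % nquads
            for i in range(1, quad + 1)
            if windowstart & (1 << ((windowbase + i) % nquads))
        ]
        if not owned:
            return traps, windowstart
        windowstart &= ~(1 << owned[0])  # spill one frame, clear its bit
        traps += 1


# Three one-quad frames sit directly above the current window, so
# touching A12 needs three spills; touching A3 needs none.
print(overflow_traps(0b1111, 0, 12))
print(overflow_traps(0b1111, 0, 3))
```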

Finally: note that hardware checks the two bits of WINDOWSTART after
the frame bit to detect how many quads are represented by the one
frame. So there are six separate exception handlers to spill/fill
1/2/3 quads of registers.
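
For the spill side, the selection described above could be sketched as
follows. The handler labels are illustrative strings chosen for this
example, and the fill side (three analogous handlers) is not modelled.

```python
def frame_quads(windowstart, frame_quad, nquads=16):
    """Size of the frame starting at `frame_quad`, from the next two bits."""
    for size in (1, 2):
        if windowstart & (1 << ((frame_quad + size) % nquads)):
            return size
    return 3  # neither following bit is set


def spill_handler(windowstart, frame_quad, nquads=16):
    """Pick one of three spill handlers by the number of registers to save."""
    return "overflow%d" % (4 * frame_quads(windowstart, frame_quad, nquads))


mask = (1 << 14) | (1 << 15) | (1 << 1)
print(spill_handler(mask, 14))  # one quad to save
print(spill_handler(mask, 15))  # two quads to save
```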

## Software & ABI

The advantage of the scheme above is that it allows the registers to
be spilled naturally into the stack by using the stack pointers
embedded in the register file. But the hardware design assumes and to
some extent enforces a fairly complicated stack layout to make that
work:

The spill area for a single frame's A0-A3 registers is not in its own
stack frame. It lies in the 16 bytes below its CALLEE's stack
pointer. This is so that the callee (and exception handlers invoked
on its behalf) can see its caller's potentially-spilled stack pointer
register (A1) on the stack and be able to walk back up on return.
Other architectures do this too by e.g. pushing the incoming stack
pointer onto the stack as a standard "frame pointer" defined in the
platform ABI. Xtensa wraps this together with the natural spill area
for register windows.

By convention spill regions always store the lowest numbered register
at the lowest address.

The spill area for a frame's A4-A11 registers may or may not exist
depending on whether the call was made with CALL8/CALL12. It is legal
to write a function using only A0-A3 and CALL4 calls and ignore higher
registers. But if those 0-2 register quads are in use, they appear at
the top of the stack frame, immediately below the A0-A3 spill area
that sits under the parent call's stack pointer.

There is no spill area for A12-A15. Those registers are always
caller-save. When using CALLn, you always need to overlap 4 registers
to provide arguments and take a return value.
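
The layout described in this section can be summarized as address
arithmetic. This is a sketch of my reading of the text, not an
authoritative statement of the ABI; `spill_layout` and its parameter
names are hypothetical.

```python
def spill_layout(sp_parent, sp_callee, quads):
    """Addresses at which one frame's registers would be spilled.

    sp_parent: stack pointer of the function that called this frame
    sp_callee: stack pointer of the function this frame called
    quads:     1, 2 or 3, i.e. this frame called out with CALL4/8/12
    """
    if quads not in (1, 2, 3):
        raise ValueError("quads must be 1, 2 or 3")
    # A0-A3 live in the 16 bytes below the callee's stack pointer,
    # lowest numbered register at the lowest address.
    layout = {"A%d" % i: sp_callee - 16 + 4 * i for i in range(4)}
    # A4-A7 or A4-A11 sit at the top of this frame, just below the
    # 16-byte A0-A3 area under the parent's stack pointer.
    extra = 4 * (quads - 1)
    base = sp_parent - 16 - 4 * extra
    for i in range(extra):
        layout["A%d" % (4 + i)] = base + 4 * i
    return layout  # A12-A15 never appear: they have no spill area


for name, addr in spill_layout(0x2000, 0x1F00, 2).items():
    print(name, hex(addr))
```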