Commit graph

136 commits

Author SHA1 Message Date
Daniel Leung b69d2486fe kernel: rename Z_KERNEL_STACK_BUFFER to K_KERNEL_STACK_BUFFER
Simple rename to align the kernel naming scheme. This is being
used throughout the tree, especially in the architecture code.
As this is not a private API internal to kernel, prefix it
appropriately with K_.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2024-03-27 19:27:10 -04:00
Daniel Leung 3664ed64c3 arch: move arch_interface.h under zephyr/arch
arch_interface.h is for architecture and should not be
under sys/. So move it under include/zephyr/arch/.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2024-03-25 09:58:35 +00:00
Daniel Leung 57d591700b xtensa: mpu: enable userspace support
This extends the Xtensa MPU to support userspace.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2024-03-19 22:17:34 -04:00
Daniel Leung df350c7469 xtensa: add MPU support for kernel mode
This enables support for MPU on Xtensa. Currently this is
for kernel mode only.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2024-03-19 22:17:34 -04:00
Kai Vehmanen be881d4cf2 arch: xtensa: add isync to interrupt vector
On Intel ADSP platforms, an additional "isync" is needed in the
interrupt vector to synchronize the icache when a core is woken up
from a deeper sleep state by an interrupt. This is only needed if DSP
clock gating is enabled.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2024-03-15 21:45:57 -04:00
Peter Mitsis b0e527340e arch: xtensa: save/restore HiFi AudioEngine regs
Adds the necessary code required to unconditionally save/restore the
HiFi AE registers. The macros xchal_cp1_load and xchal_cp1_store
are defined in the Xtensa HAL.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-03-05 10:57:33 +01:00
Peter Mitsis 520c8c2283 arch: xtensa: Add space for HiFi registers
Updates the xtensa_irq_base_save_area structure to include space
for saving/restoring the HiFi AudioEngine registers used by CP1.

The starting address of these HiFi AE registers also needs to be
referenced from assembly, so it is added to the set of symbols
for which we need an offset to be auto-generated.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-03-05 10:57:33 +01:00
Maciej Kusio 352b50bfc9 xtensa: add support for cores without NMI
Some Xtensa cores do not support NMI, so XCHAL_HAVE_NMI=0 and
XCHAL_NMILEVEL won't be defined at all, causing
arch/xtensa/include/xtensa-asm2-s.h to fail with a compilation error.

Fixes: #67855

Signed-off-by: Maciej Kusio <maciejkusio@meta.com>
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2024-02-28 17:35:54 +00:00
Flavio Ceolin 2590ea280c xtensa: mmu: Optimize autorefill invalidation
There is no need to sync after every xTLB invalidation. Sync only
once, after all TLB autorefill ways have been invalidated.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2024-01-19 13:50:02 +01:00
Peter Mitsis 5c18a00d37 arch: xtensa: Use wsr.lowercase over wsr.UPPERCASE
wsr.UPPERCASE can lead to compiler errors when UPPERCASE matches
a macro defined in the special register header file.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-01-17 09:55:57 +01:00
Peter Mitsis 2075a1b770 arch: xtensa: Use rsr.lowercase over rsr.UPPERCASE
rsr.UPPERCASE can lead to compiler errors when UPPERCASE matches
a macro defined in the special register header file.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-01-17 09:55:57 +01:00
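
The collision described in commits 5c18a00d37 and 2075a1b770 above is
ordinary preprocessor macro expansion. A minimal host-side sketch follows,
assuming a hypothetical register header that defines EPC1 as a number. The
helper macros STR/XSTR and the value 177 are illustrative and are not taken
from the Zephyr tree:

```c
#include <string.h>

/* Hypothetical stand-in for a special register header: the register
 * name is also a macro that expands to a number (value is illustrative).
 */
#define EPC1 177

/* Two-level stringification so the argument is macro-expanded first. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* "wsr.EPC1" is three preprocessing tokens: wsr, ".", and EPC1.
 * EPC1 matches the macro above, so it is replaced by the number and
 * the resulting mnemonic is no longer valid.
 */
static const char *const upper_mnemonic = XSTR(wsr.EPC1);

/* The lowercase spelling matches no macro and survives unchanged. */
static const char *const lower_mnemonic = XSTR(wsr.epc1);
```

The same replacement happens in any preprocessed assembly source, which is
why the lowercase spelling is the safer choice.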
Daniel Leung a819bfb2d5 xtensa: rename z_xtensa to simply xtensa
Rename the remaining z_xtensa stuff as these are (mostly)
under arch/xtensa.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Daniel Leung 8bf20ee975 xtensa: mmu: rename prefix z_xtensa to xtensa_mmu
This follows the idea to remove any z_ prefix. Since MMU has
a large number of these, separate out these changes into one
commit to ease review effort.

Since these no longer have the z_ prefix, they need proper doxygen
documentation. So add that too.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Daniel Leung 004e68ccea xtensa: move exception handling func to arch internal header
z_xtensa_dump_stack() and z_xtensa_exccause() are both arch
internal functions that should not be exposed in public API.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Daniel Leung 43b0b48de7 xtensa: move files under core/include/ into include/
Header files under arch/xtensa/include are considered internal
to architecture. There is really no need for two places to
house architecture internal header files.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Daniel Leung 106061b307 xtensa: rename files with hyphens to underscores
Simply to provide some consistency in file naming under
arch/xtensa.

These are all internally used files and are not public.
So there is no need to provide a deprecation path for
them.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Daniel Leung 43990d1c0e xtensa: remove xtensa-asm2.h
xtensa-asm2.h only contains the function declaration of
xtensa_init_stack() which is only used in one file. So
make the actual implementation a static function in that
file. Also there is really no need to expose stack init
function as arch public API. So remove xtensa-asm2.h.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-12-13 09:41:24 +01:00
Flavio Ceolin c47880af0d arch/xtensa: Add new MMU layer
Andy Ross's re-implementation of the MMU layer with some subtle changes,
such as re-using existing macros and fixing the page table cache property
when direct mapping it in the TLB.

From Andy's original commit message:

This is a reworked MMU layer, sitting cleanly below the page table
handling in the OS.  Notable differences from the original work:

+ Significantly smaller code and simpler API (just three functions to
  be called from the OS/userspace/ptable layer).

+ Big README-MMU document containing my learnings over the process, so
  hopefully fewer people need to go through this in the future.

+ No TLB flushing needed.  Clean separation of ASIDs, just requires
  that the upper levels match the ASID to the L1 page table page
  consistently.

+ Vector mapping is done with a 4k page and not a 4M page, leading to
  much more flexibility with hardware memory layout.  The original
  scheme required that the 4M region containing vecbase be mapped
  virtually to a location other than the hardware address, which makes
  confusing linkage with call0 and difficult initialization
  constraints where the exception vectors run at different addresses
  before and after MMU setup (effectively forcing them to be PIC
  code).

+ More provably correct initialization, all MMU changes happen in a
  single asm block with no memory accesses which would generate a
  refill.

Signed-off-by: Andy Ross <andyross@google.com>
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2023-11-21 15:49:48 +01:00
Daniel Leung 0e7def1977 xtensa: selectively init interrupt stack at boot
During arch_kernel_init(), the interrupt stack is being
initialized. However, if the current in-use stack is
the interrupt stack, it would wipe all the data up to
that point in the stack, and might result in a crash. So skip
initializing the interrupt stack if the current stack
pointer is within the boundary of the interrupt stack.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-11-21 15:49:48 +01:00
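
The check described in commit 0e7def1977 above can be sketched as follows.
This is a host-runnable illustration, not the code in arch_kernel_init():
the function names, the fill pattern 0xaa, and passing the stack pointer as
a parameter are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true when sp lies inside [stack, stack + size). */
static bool sp_within_stack(uintptr_t sp, const uint8_t *stack, size_t size)
{
	uintptr_t start = (uintptr_t)stack;

	return sp >= start && (sp - start) < size;
}

/* Fill the interrupt stack with a known pattern unless the caller is
 * currently running on it. Returns true if the stack was initialized.
 * All names here are hypothetical.
 */
static bool init_irq_stack_if_unused(uintptr_t current_sp, uint8_t *stack,
				     size_t size)
{
	if (sp_within_stack(current_sp, stack, size)) {
		/* Wiping the stack now would destroy live frames. */
		return false;
	}

	memset(stack, 0xaa, size);
	return true;
}
```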
Daniel Leung 7a5d2a2d81 xtensa: userspace: swap page tables at context restore
Swap page tables at exit of exception handler if we are going to
be restored to another thread context. Or else we would be using
the outgoing thread's page tables which is not going to work
correctly due to mapping and permissions.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-11-21 15:49:48 +01:00
Daniel Leung bc0656a92e xtensa: mmu: allocate scratch registers for MMU
When the MMU is enabled, we need some scratch registers to preload
page table entries. So update gen_zsr.py to do that.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2023-11-21 15:49:48 +01:00
Flavio Ceolin a651862b30 xtensa: Enable userspace
Userspace support for Xtensa architecture using Xtensa MMU.

Some considerations:

- Syscalls are not inline functions like in other architectures because
  of some compiler issues when using multiple registers to pass parameters
  to the syscall. So here we have a function call so we can use
  registers as we need.
- TLS is not supported by xcc in xtensa and reading the PS register is
  a privileged instruction. So, we have to use threadptr to know if a
  thread is a user mode thread.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-11-21 15:49:48 +01:00
Flavio Ceolin fff91cb542 xtensa: mmu: Simplify initialization
Simplify the logic around xtensa_mmu_init.

- Do not have a different path to init part of kernel
- Call xtensa_mmu_init from C

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2023-11-21 15:49:48 +01:00
Daniel Leung fcf22e59b8 xtensa: mark arch_switch ALWAYS_INLINE
arch_switch() is basically an alias to xtensa_switch() so
we can mark arch_switch() as ALWAYS_INLINE to avoid another
function call, especially when no optimization is used when
debugging.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-09-26 08:37:29 +02:00
Daniel Leung e444cc9fb9 xtensa: mmu: always map data TLB for VECBASE
This adds code to always map data TLB for VECBASE so that
we would be dealing with fewer data TLB misses during
exception handling. With VECBASE always mapped, there is
no need to pre-load anymore.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-05-23 08:54:29 +02:00
Daniel Leung c3d1fa2138 xtensa: mmu: handle TLB misses in C exception handler
This moves the TLB miss handling to the C exception handler.
This also allows us to handle page faults (for example,
unmapped pages) during this time as any more exceptions
handled in the C handler will not trigger the double
exception handler but the same C handler.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-05-23 08:54:29 +02:00
Flavio Ceolin 020df54ba4 xtensa: mmu: Initial implementation
Initial support for Xtensa MMU version 3. It uses a two-level page
table based on the fact that the page table is in the virtual space.  Only
the top level (page directory) is wired mapped in the TLB to avoid
second-level page misses.

The mapped memory is completely fragmented in multiple sections; maybe
we will find a better way in the future.

The exception handler is where we effectively map the memory. The way it
works is:

1) SW tries to access some memory address
2) The address is not mapped, so the MMU will try the auto-refill,
   looking up the page table
3) The page table contents are not mapped (remember, just the top-level page
   is mapped)
4) An exception will be triggered; in the exception handler we try to read
   the portion of the page table that maps the original address
5) The address is not mapped, so the MMU will try the auto-refill again.
   This time though, the address is mapped by the top-level page, which is
   properly mapped. (The top-level page maps the page table itself).

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2023-05-23 08:54:29 +02:00
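
The self-mapping property that steps 3 to 5 of commit 020df54ba4 above rely
on can be checked with a little address arithmetic. The sketch below assumes
4 KiB pages, 4-byte page table entries, and a page table that occupies a
4 MiB-aligned window of virtual space. The base address 0x20000000 and the
function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12U          /* assumed 4 KiB pages */
#define PTE_SIZE    4U           /* assumed 32-bit page table entries */
#define PTABLE_BASE 0x20000000U  /* hypothetical, 4 MiB aligned */

/* Virtual address of the page table entry that maps vaddr. */
static uint32_t pte_vaddr(uint32_t vaddr)
{
	return PTABLE_BASE + (vaddr >> PAGE_SHIFT) * PTE_SIZE;
}

/* The page that holds the entries mapping the page table window itself.
 * This is the "top-level page" that has to stay mapped at all times.
 */
static uint32_t top_level_page(void)
{
	return pte_vaddr(PTABLE_BASE);
}
```

Applying pte_vaddr() twice to any address always lands inside
top_level_page(), which is why the second refill in step 5 can succeed
once that single page is mapped.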
Andy Ross e31ae60058 arch/xtensa: Fix nested interrupt entry
The "cross stack call" mechanism has intermediate states where the
stack frames are not valid for our own interrupt entry code, which
causes corruption if an interrupt races at exactly the right time.
Leave interrupts masked until just before the call.

The fix is mildly complicated by the fact that we RELY on nested window
exception frames to spill registers from the interruptee, so have to
do the masking with PS.INTLEVEL, which requires a register to save its
contents, which we don't have since everything needs to happen in one
4-register window.  But thankfully our Zephyr-reserved EPS register is
guaranteed to be available through this process.

Fixes #57009

Signed-off-by: Andy Ross <andyross@google.com>
2023-05-08 16:56:17 -04:00
Anas Nashif 6388f5f106 xtensa: use sys_cache API instead of custom interfaces
Use sys_cache instead of custom and internal APIs.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2023-04-26 07:31:22 -04:00
Daniel Leung 1e9d4602ab xtensa: add some structs for interrupt stack frames
This adds some structs for interrupt stack frames to make it
easier to access individual elements, and ultimately getting
rid of magic array element numbers in the code. Hopefully,
this will aid in debugging, as you can view the whole
struct in a debugger.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2023-04-20 04:45:52 -04:00
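
The motivation in commit 1e9d4602ab above (named fields instead of magic
array indexes) can be illustrated with a small sketch. The struct below is
hypothetical: its field names and order are chosen for the example and do
not reproduce the real Xtensa frame layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout; field names and order are illustrative. */
struct irq_frame_sketch {
	uint32_t a0;
	uint32_t a1;
	uint32_t pc;
	uint32_t ps;
};

/* Before: callers index a raw word array with a magic number. */
static uint32_t frame_pc_by_index(const uint32_t *frame)
{
	return frame[2];
}

/* After: the same word is reached through a named field, which a
 * debugger can also display without any extra knowledge.
 */
static uint32_t frame_pc_by_name(const struct irq_frame_sketch *frame)
{
	return frame->pc;
}
```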
Lucas Tamborrino 9e289c1b20 arch: xtensa: save FPU register in context switching
Save FP user register and FP register file during context switch.

This change enables shared FP registers mode using CONFIG_FPU_SHARING.

Since there is no lazy stacking, the FPU registers will be saved regardless
of whether floating point calculations are performed in the threads when
CONFIG_FPU_SHARING is enabled. This requires 72 additional bytes of
stack memory.

Signed-off-by: Lucas Tamborrino <lucas.tamborrino@espressif.com>
2022-12-27 13:23:17 +01:00
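
One way to account for the 72 bytes mentioned in commit 9e289c1b20 above is
sketched below. The breakdown is an assumption, since the commit only states
the total: sixteen 32-bit FP registers plus two 32-bit FP user registers
(named fcr and fsr here). The struct name is hypothetical.

```c
#include <stdint.h>

#define FPU_REG_COUNT 16 /* assumed: f0..f15, 32 bits each */

/* Hypothetical save area for the FP state of one thread. */
struct fpu_save_area_sketch {
	uint32_t fcr;              /* assumed FP control user register */
	uint32_t fsr;              /* assumed FP status user register */
	uint32_t f[FPU_REG_COUNT]; /* FP register file */
};

/* 2 * 4 bytes + 16 * 4 bytes = 72 bytes. */
_Static_assert(sizeof(struct fpu_save_area_sketch) == 72,
	       "FP save area should occupy 72 bytes");
```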
Kumar Gala c778eb2a56 smp: Move arrays to use CONFIG_MP_MAX_NUM_CPUS
Move to use CONFIG_MP_MAX_NUM_CPUS for array size declarations instead
of CONFIG_MP_NUM_CPUS.

Signed-off-by: Kumar Gala <kumar.gala@intel.com>
2022-10-17 14:40:12 +09:00
Andy Ross b141551cba arch/xtensa: Properly namespace special register API
The Xtensa arch has historically had state/user register accessor
macros with bare three-byte symbol names.  I think this might have
been in the original Cadence-contributed arch integration, but I'm not
sure.  In any case they also exist in the same names in vendor
HAL/toolchain code and are causing collisions.  We never should have
had these symbols exposed in our header.

Put them under an XTENSA_ prefix to decollide.

Signed-off-by: Andy Ross <andyross@google.com>
2022-09-07 20:28:06 -04:00
Andy Ross 910c96b7d8 intel_adsp: meteorlake: Initialize stack flush pointer SR
The simulator seems to drop garbage addresses (somewhere in the ROM it
looks like) into this SR at arbitrary times.  I don't know if this is
a hardware exception handler that we can't turn off, or a simulator
bug, or what.  But our code that assumes it will be cleared to zero or
valid is breaking.  Set it every time in every context switch for now
pending someone figuring out what's going wrong.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2022-07-25 16:00:22 -04:00
Stephanos Ioannidis 33f87408c4 global: Correct extern K_KERNEL_STACK_ARRAY_DEFINE usage
This commit corrects all `extern K_KERNEL_STACK_ARRAY_DEFINE` macro
usages to use the `K_KERNEL_STACK_ARRAY_DECLARE` macro instead.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
2022-06-20 10:25:52 +02:00
Gerard Marull-Paretas 16811660ee arch: migrate includes to <zephyr/...>
In order to bring consistency in-tree, migrate all arch code to the new
prefix <zephyr/...>. Note that the conversion has been scripted, refer
to zephyrproject-rtos#45388 for more details.

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-05-06 19:57:22 +02:00
Flavio Ceolin f5a0d4cd26 arch: xtensa: Optimize cache management for pinned threads
When building with CONFIG_SCHED_CPU_MASK_PIN_ONLY we can assume that a
thread will always be executed on the same CPU and consequently skip the
cache invalidation.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2022-05-04 13:46:48 -04:00
Andy Ross 64a3159dee arch/xtensa: Optimize cache management on context switch
Making context switch cache-coherent in SMP is hard.  The
KERNEL_COHERENCE handling was conservatively invalidating the stack
region of a thread that was being switched in.  This was because it
might have (1) run on this CPU in the past, but (2) run most recently
on a different CPU.  In that case we might have stale data still in
our local dcache!

But this has performance impact in the (very common!) case of a thread
being switched out briefly and then back in (e.g. k_sleep() for a
small duration).  It will come back having lost all of its cached
stack context, and will have to fetch all that information back from
shared SRAM!

Treat this by tracking a "last_cpu" for each thread in the arch part
of the thread struct.  If we're coming back to the same CPU we left,
we know we can skip the invalidate.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-04-27 18:54:10 -04:00
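
The "last_cpu" tracking described in commit 64a3159dee above can be sketched
as follows. Only the last_cpu name comes from the commit message. The struct
and function names, and the use of -1 for a thread that has never run, are
assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical arch-specific part of a thread struct. */
struct thread_arch_sketch {
	int last_cpu; /* CPU the thread last ran on, -1 if it never ran */
};

/* Record where the thread was running when it is switched out. */
static void note_switch_out(struct thread_arch_sketch *t, int this_cpu)
{
	t->last_cpu = this_cpu;
}

/* Decide whether the incoming thread's stack must be invalidated.
 * If the thread last ran on this CPU, the local dcache cannot hold
 * stale lines for its stack, so the invalidate can be skipped.
 */
static bool needs_stack_invalidate(const struct thread_arch_sketch *t,
				   int this_cpu)
{
	return t->last_cpu != this_cpu;
}
```

A thread that sleeps briefly and resumes on the same CPU therefore keeps its
cached stack contents, which is the common case the commit targets.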
Nazar Kazakov 9713f0d47c everywhere: fix typos
Fix a lot of typos

Signed-off-by: Nazar Kazakov <nazar.kazakov.work@gmail.com>
2022-03-14 20:22:24 -04:00
Andy Ross 642fc7ad54 arch/xtensa: Use ZSR assignments for stack flush markers
The kernel coherence cache flush code was using a scratch register to
mark the top of the stack.  Likewise a good candidate for ZSR use.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-01-20 12:58:00 -05:00
Andy Ross ca7024e1d6 arch/xtensa: Use ZSR assignments for the CPU pointer
Use the zsr.h assignments for the special register containing the
current CPU pointer.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-01-20 12:58:00 -05:00
Andy Ross 97ada8bc04 arch/xtensa: Promote adsp RPO/cache utilities to an arch API
This trick (mapping RAM twice so you can use alternate Region
Protection Option addresses to control cacheability) is something any
Xtensa hardware designer might productively choose to do.  And as it
works really well, we should encourage that by making this a generic
architecture feature for Zephyr.

Now everything works by setting two kconfig values at the soc level
defining the cached and uncached regions.  As long as these are
correct, you can then use the new arch_xtensa_un/cached_ptr() APIs to
convert between them and an ARCH_XTENSA_SET_RPO_TLB() macro that
provides much smaller initialization code (in C!) than the HAL
assembly macros.  The conversion routines have been generalized to
support conversion between any two regions.

Note that full KERNEL_COHERENCE still requires support from the
platform linker script, that can't be made generic given the way
Zephyr does linkage.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-01-11 11:53:53 +01:00
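
The pointer conversion described in commit 97ada8bc04 above amounts to
swapping the region bits of an address. The sketch below assumes eight
512 MiB regions selected by the top three address bits. The region numbers
4 and 5 and the function names are hypothetical stand-ins for the two
kconfig values, and plain integers are used instead of pointers so that the
example runs on a host.

```c
#include <stdint.h>

#define REGION_SHIFT    29U                  /* assumed 512 MiB regions */
#define REGION_MASK     (7U << REGION_SHIFT) /* top three address bits */
#define CACHED_REGION   5U                   /* hypothetical kconfig value */
#define UNCACHED_REGION 4U                   /* hypothetical kconfig value */

/* Move addr into the given region, keeping the offset within it. */
static uint32_t addr_to_region(uint32_t addr, uint32_t region)
{
	return (addr & ~REGION_MASK) | (region << REGION_SHIFT);
}

/* Alias of addr in the cached mapping. */
static uint32_t cached_addr(uint32_t addr)
{
	return addr_to_region(addr, CACHED_REGION);
}

/* Alias of addr in the uncached mapping. */
static uint32_t uncached_addr(uint32_t addr)
{
	return addr_to_region(addr, UNCACHED_REGION);
}
```

Both aliases refer to the same physical RAM, so converting back and forth
is lossless.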
Iuliana Prodan a6364da1a3 arch: xtensa: add workaround for small vector table entries
For some platforms, like NXP's IMX8 or Mediatek's MT8195,
the size of an interrupt vector table entry is 0x1C bytes,
less than usual (0x30 for Intel's platforms).
So, the interrupt handlers don't fit in the vector table
entries.

I've added a small indirection to bypass this size
constraint and moved the default handlers to the end
of vector table, renaming them to
_Level\LVL\()VectorHelper.
For this, I've added a generic configuration -
XTENSA_SMALL_VECTOR_TABLE_ENTRY.

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
2021-09-10 10:59:44 -04:00
Andy Ross b76bc6c80d arch/xtensa: Fix outgoing stack flush for dummy threads
On CPU startup, when we reach the cache flush code in arch_switch(),
the outgoing thread is a dummy.  The behavior of the existing code was
to leave the existing value in the SR unchanged (probably NULL at
startup).  Then the context switch would walk from that address up to
the top of the outgoing stack, flushing everything in between.  That's
wrong, because the outgoing stack is a real pointer (generally the
interrupt stack of the current CPU), and we're flushing everything in
memory underneath it.

This also reverts commit 29abc8adc0 ("xtensa: fix booting secondary
cores on the dummy thread"), which appears to have been an early
attempt to address this issue.  It worked (modulo all the extra and
potentially incorrect flushing) on cavs v1.5/1.8 because of the way
the entry code worked there.  But on 2.5 we now hit the first context
switch in a case where those extra lines are in address space already
marked unwritable by the CPU, so the flush explodes.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-09-03 07:19:34 -04:00
Iuliana Prodan f9810ccbe1 arch: xtensa: modify asm for interrupt sections
For IMX, the wrong interrupt handler was executed for the timer
interrupt because the handlers were not at the expected address.
For IMX, the size constraint of an interrupt vector table entry
is 0x1C bytes of code, less than usual.

I've added a small indirection to bypass this size
constraint and moved the default handlers to the end
of vector table, renaming them to
_Level\LVL\()VectorHelper.

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
2021-08-28 23:27:02 -04:00
Guennadi Liakhovetski 29abc8adc0 xtensa: fix booting secondary cores on the dummy thread
When secondary cores are booted, they use the dummy thread and
the IRQ stack until they switch over to a real thread. Therefore
dummy threads shouldn't be skipped when cohering outgoing thread
stack, only threads with zero stack size should be skipped.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
2021-05-03 17:13:01 -04:00
Andy Ross ae4f7a1a06 arch/xtensa: Remember to spill windows in arch_cohere_stacks()
When we reach this code in interrupt context, our upper GPRs contain a
cross-stack call that may still include some registers from the
interrupted thread.  Those need to go out to memory before we can do
our cache coherence dance here.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-03-08 11:14:27 -05:00
Andy Ross b28da4a3b7 arch/xtensa: Invalidate bottom of outbound stacks
Both new thread creation and context switch had the same mistake in
cache management: the bottom of the stack (the "unused" region between
the lower memory bound and the live stack pointer) needs to be
invalidated before we switch, because otherwise any dirty lines we
might have left over can get flushed out on top of the same thread on
another CPU that is putting live data there.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-03-08 11:14:27 -05:00
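
The "unused" region named in commit b28da4a3b7 above is the span between the
stack's lowest address and the live stack pointer. A sketch of computing it
follows. The 64-byte line size, the names, and the decision to round the
stack pointer down to a line boundary (so that a line holding live data is
never invalidated) are choices made for this example, not details stated in
the commit.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 64U /* assumed data cache line size in bytes */

/* Hypothetical description of a range to invalidate. */
struct inval_range {
	uintptr_t start;
	size_t size;
};

/* Range covering the unused bottom of a stack: from its lowest
 * address up to the cache line that contains the live stack pointer.
 */
static struct inval_range unused_stack_range(uintptr_t stack_low, uintptr_t sp)
{
	struct inval_range r = { stack_low, 0 };
	uintptr_t end = sp & ~(uintptr_t)(DCACHE_LINE - 1U);

	if (end > stack_low) {
		r.size = (size_t)(end - stack_low);
	}

	return r;
}
```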
Andy Ross 64cf33952d arch/xtensa: Add non-HAL caching primitives
The Xtensa L1 cache layer has straightforward semantics accessible via
single instructions that operate on cache lines via physical
addresses.  These are very amenable to inlining.

Unfortunately the Xtensa HAL layer requires function calls to do this,
leading to significant code waste at the calling site, an extra frame
on the stack and needless runtime instructions for situations where
the call is over a constant region that could elide the loop.  This is
made even worse because the HAL library is not built with
-ffunction-sections, so pulling in even one of these tiny cache
functions has the effect of importing a 1500-byte object file into the
link!

Add our own tiny cache layer to include/arch/xtensa/cache.h and use
that instead.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-03-08 11:14:27 -05:00
Andy Ross eb1ef50b6b arch/xtensa: General cleanup, remove dead code
There was a bunch of dead historical cruft floating around in the
arch/xtensa tree, left over from older code versions.  It's time to do
a cleanup pass.  This is entirely refactoring and size optimization,
no behavior changes on any in-tree devices should be present.

Among the more notable changes:

+ xtensa_context.h offered an elaborate API to deal with a stack frame
  and context layout that we no longer use.

+ xtensa_rtos.h was entirely dead code

+ xtensa_timer.h was a parallel abstraction layer implementing in the
  architecture layer what we're already doing in our timer driver.

+ The architecture thread structs (_callee_saved and _thread_arch)
  aren't used by current code, and had dead fields that were removed.
  Unfortunately for standards compliance and C++ compatibility it's
  not possible to leave an empty struct here, so they have a single
  byte field.

+ xtensa_api.h was really just some interrupt management inlines used
  by irq.h, so fold that code into the outer header.

+ Remove the stale assembly offsets.  This architecture doesn't use
  that facility.

All told, more than a thousand lines have been removed.  Not bad.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-03-08 11:14:27 -05:00