The original design intent with arch_sched_ipi() was that
interprocessor interrupts were fast and easily sent, so to reduce
latency the scheduler should notify other CPUs synchronously when
scheduler state changes.
This tends to result in "storms" of IPIs in some use cases, though.
For example, SOF will enumerate over all cores doing a k_sem_give() to
notify a worker thread pinned to each, each call causing a separate
IPI. Add to that the fact that unlike x86's IO-APIC, the intel_adsp
architecture has targeted/non-broadcast IPIs that need to be repeated
for each core, and suddenly we have an O(N^2) scaling problem in the
number of CPUs.
Instead, batch the "pending" IPIs and send them only at known
scheduling points (end-of-interrupt and swap). This semantically
matches the locations where application code will "expect" to see
other threads run, so arguably is a better choice anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>