doc: provide error handling documentation

We don't really have docs on how fatal errors are induced or handled.
Provide some documentation that covers:

- Assertions (runtime and build)
- Kernel panic and oops conditions
- Stack overflows
- Other exceptions
- Exception handling policy

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
parent c311aa4675
commit 4ce988ab43

@ -115,3 +115,4 @@ These pages cover other kernel services.
   other/ring_buffers.rst
   other/cxx_support.rst
   other/version.rst
   other/fatal.rst

263 doc/reference/kernel/other/fatal.rst (Normal file)
@ -0,0 +1,263 @@
.. _fatal:

Fatal Errors
############

Software Errors Triggered in Source Code
****************************************

Zephyr provides several methods for inducing fatal error conditions through
build-time checks, conditionally compiled assertions, or deliberately invoked
panic or oops conditions.

Runtime Assertions
==================

Zephyr provides some macros to perform runtime assertions which may be
conditionally compiled. Their definitions may be found in
:zephyr_file:`include/sys/__assert.h`.

Assertions are enabled by setting the ``__ASSERT_ON`` preprocessor symbol to a
non-zero value. There are two ways to do this:

- Use the :option:`CONFIG_ASSERT` and :option:`CONFIG_ASSERT_LEVEL` kconfig
  options.
- Add ``-D__ASSERT_ON=<level>`` to the project's CFLAGS, either on the
  build command line or in a CMakeLists.txt.

Defining ``__ASSERT_ON`` through CFLAGS takes precedence over the kconfig
options if both are used.

Specifying an assertion level of 1 causes the compiler to issue warnings that
the kernel contains debug-type ``__ASSERT()`` statements; this reminder is
issued because assertion code is not normally present in a final product.
Specifying assertion level 2 suppresses these warnings.

Assertions are enabled by default when running Zephyr test cases, as
configured by the :option:`CONFIG_TEST` option.

The policy for what to do when an assertion fails is controlled by the
implementation of :c:func:`assert_post_action`. Zephyr provides a default
implementation with weak linkage which invokes a kernel oops if the thread
that failed the assertion was running in user mode, and a kernel panic
otherwise.

__ASSERT()
----------

The ``__ASSERT()`` macro can be used inside kernel and application code to
perform optional runtime checks which will induce a fatal error if the
check does not pass. In addition to the expression to test, the macro takes a
message, which may be a :c:func:`printf()`-like format string followed by
arguments, that is printed to provide context for the failure. The kernel also
prints the text of the evaluated expression and the file and line number where
the assertion is located.

For example:

.. code-block:: c

   __ASSERT(foo == 0xF0CACC1A, "Invalid value of foo, got 0x%x", foo);

If at runtime ``foo`` has some unexpected value, the error produced may
look like the following:

.. code-block:: none

   ASSERTION FAIL [foo == 0xF0CACC1A] @ ZEPHYR_BASE/tests/kernel/fatal/src/main.c:367
   Invalid value of foo, got 0xdeadbeef
   [00:00:00.000,000] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000016f r2/a3: 0x00000000
   [00:00:00.000,000] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x00000a6d
   [00:00:00.000,000] <err> os: xpsr: 0x61000000
   [00:00:00.000,000] <err> os: Faulting instruction address (r15/pc): 0x00009fe4
   [00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
   [00:00:00.000,000] <err> os: Current thread: 0x20000414 (main)
   [00:00:00.000,000] <err> os: Halting system
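
The following host-side sketch shows how an assertion macro can produce a
report in the format above: the ``#`` preprocessor operator turns the tested
expression into the bracketed text, and ``__FILE__`` and ``__LINE__`` supply
the location. All names in it (``DEMO_ASSERT()``, ``demo_assert_report()``,
``demo_post_action()``, ``demo_check()``) are hypothetical and are not the
definitions in :zephyr_file:`include/sys/__assert.h`. The stand-in
post-action hook only counts failures, whereas Zephyr's
:c:func:`assert_post_action` triggers a fatal error.

.. code-block:: c

   #include <stdarg.h>
   #include <stdio.h>

   /* Hypothetical failure counter used by the stand-in post-action hook. */
   static int demo_failures;

   /* Stand-in for the post-assertion policy hook: it only records the
    * failure so that the sketch can run on a host. */
   static void demo_post_action(const char *file, unsigned int line)
   {
       (void)file;
       (void)line;
       demo_failures++;
   }

   /* Prints the bracketed expression and location, then the caller's
    * printf()-style message, then runs the post-action hook. */
   static void demo_assert_report(const char *expr, const char *file,
                                  unsigned int line, const char *fmt, ...)
   {
       va_list ap;

       printf("ASSERTION FAIL [%s] @ %s:%u\n", expr, file, line);
       va_start(ap, fmt);
       vprintf(fmt, ap);
       va_end(ap);
       printf("\n");
       demo_post_action(file, line);
   }

   /* #test stringifies the expression as written in the source. */
   #define DEMO_ASSERT(test, ...)                                  \
       do {                                                        \
           if (!(test)) {                                          \
               demo_assert_report(#test, __FILE__, __LINE__,       \
                                  __VA_ARGS__);                    \
           }                                                       \
       } while (0)

   static void demo_check(unsigned int foo)
   {
       DEMO_ASSERT(foo == 0xF0CACC1A, "Invalid value of foo, got 0x%x", foo);
   }

Calling ``demo_check(0xdeadbeef)`` prints a line beginning with
``ASSERTION FAIL [foo == 0xF0CACC1A]``, followed by
``Invalid value of foo, got 0xdeadbeef``, and increments the failure count;
calling it with ``0xF0CACC1A`` prints nothing.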

__ASSERT_EVAL()
---------------

The ``__ASSERT_EVAL()`` macro can also be used inside kernel and application
code, with special semantics for the evaluation of its arguments.

It makes use of the ``__ASSERT()`` macro, but has some extra flexibility: it
allows the developer to specify different actions depending on whether the
``__ASSERT()`` macro is enabled or not. This is particularly useful for
preventing compiler diagnostics (errors, warnings, or remarks) about variables
that are assigned a value only for use in ``__ASSERT()``, and are therefore
unused when the ``__ASSERT()`` macro is disabled.

Consider the following example:

.. code-block:: c

   int x;
   x = foo();
   __ASSERT(x != 0, "foo() returned zero!");

If ``__ASSERT()`` is disabled, then ``x`` is assigned a value, but never used.
This type of situation can be resolved using the ``__ASSERT_EVAL()`` macro:

.. code-block:: c

   __ASSERT_EVAL((void) foo(),
                 int x = foo(),
                 x != 0,
                 "foo() returned zero!");

The first parameter tells ``__ASSERT_EVAL()`` what to do if ``__ASSERT()`` is
disabled. The second parameter tells ``__ASSERT_EVAL()`` what to do if
``__ASSERT()`` is enabled. The remaining parameters (the test expression and
the message) are passed to ``__ASSERT()``.
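
A host-side model of these semantics follows, assuming an expansion in which
the enabled form evaluates the second parameter and then runs the check, while
the disabled form keeps only the first parameter. Both variants are written out
side by side under hypothetical names (``DEMO_ASSERT_EVAL_ENABLED()``,
``DEMO_ASSERT_EVAL_DISABLED()``, ``DEMO_ASSERT()``) so they can be compared in
one file; a real build selects just one of them.

.. code-block:: c

   #include <stdio.h>

   static int foo_calls;

   /* Hypothetical function whose result is only needed by the check. */
   static int foo(void)
   {
       foo_calls++;
       return 7;
   }

   /* Hypothetical stand-in for __ASSERT(). */
   #define DEMO_ASSERT(test, msg)              \
       do {                                    \
           if (!(test)) {                      \
               printf("%s\n", (msg));          \
           }                                   \
       } while (0)

   /* Disabled form: only expr1 survives, so no variable is declared. */
   #define DEMO_ASSERT_EVAL_DISABLED(expr1, expr2, test, msg) expr1

   /* Enabled form: run expr2 (here a declaration), then check. */
   #define DEMO_ASSERT_EVAL_ENABLED(expr1, expr2, test, msg)   \
       do {                                                    \
           expr2;                                              \
           DEMO_ASSERT(test, msg);                             \
       } while (0)

   static void checks_disabled(void)
   {
       /* Expands to: (void) foo(); */
       DEMO_ASSERT_EVAL_DISABLED((void) foo(),
                                 int x = foo(),
                                 x != 0,
                                 "foo() returned zero!");
   }

   static void checks_enabled(void)
   {
       /* Expands to a block declaring int x = foo(); and checking x. */
       DEMO_ASSERT_EVAL_ENABLED((void) foo(),
                                int x = foo(),
                                x != 0,
                                "foo() returned zero!");
   }

In both functions ``foo()`` is called exactly once, so its side effects are
preserved whether or not checking is enabled, and the disabled form never
mentions ``x`` at all.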

__ASSERT_NO_MSG()
-----------------

The ``__ASSERT_NO_MSG()`` macro can be used to perform an assertion that
reports the failed test and its location, but provides no additional message
to help the user diagnose the problem; its use is discouraged.

Build Assertions
================

Zephyr provides two macros for performing build-time assertion checks.
These are evaluated completely at compile time, and are always checked.

BUILD_ASSERT_MSG()
------------------

This has the same semantics as C's ``_Static_assert`` or C++'s
``static_assert``. If the evaluation fails, the compiler generates a build
error. If the compiler supports it, the provided message is also printed as
further context.

Unlike ``__ASSERT()``, the message must be a static string, without
:c:func:`printf()`-like format codes or extra arguments.

For example, suppose this check fails:

.. code-block:: c

   BUILD_ASSERT_MSG(FOO == 2000,
                    "Invalid value of FOO");

With GCC, the output resembles:

.. code-block:: none

   tests/kernel/fatal/src/main.c: In function 'test_main':
   include/toolchain/gcc.h:28:37: error: static assertion failed: "Invalid value of FOO"
    #define BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)
                                        ^~~~~~~~~~~~~~
   tests/kernel/fatal/src/main.c:370:2: note: in expansion of macro 'BUILD_ASSERT_MSG'
     BUILD_ASSERT_MSG(FOO == 2000,
     ^~~~~~~~~~~~~~~~
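
The GCC diagnostic above shows ``BUILD_ASSERT_MSG()`` expanding to
``_Static_assert(EXPR, MSG)``. The following standalone C11 sketch uses a local
stand-in macro with that same definition; its name,
``DEMO_BUILD_ASSERT_MSG()``, and the value of ``FOO`` are hypothetical.

.. code-block:: c

   #include <limits.h>
   #include <stdint.h>

   /* Same definition as in the diagnostic above, under a local name. */
   #define DEMO_BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)

   /* Hypothetical configuration value. */
   #define FOO 2000

   /* Both checks are evaluated by the compiler; no code is generated. */
   DEMO_BUILD_ASSERT_MSG(FOO == 2000, "Invalid value of FOO");
   DEMO_BUILD_ASSERT_MSG(sizeof(uint32_t) * CHAR_BIT == 32,
                         "uint32_t must be exactly 32 bits wide");

   /* Changing FOO to any other value stops this file from compiling. */
   static const int demo_foo = FOO;

Because the message is passed straight to ``_Static_assert``, it must be a
string literal, which is why format codes and extra arguments are not
accepted.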

BUILD_ASSERT()
--------------

This works just like ``BUILD_ASSERT_MSG()`` except that no supplemental
message is provided; like ``__ASSERT_NO_MSG()``, its use is discouraged.

Kernel Oops
===========

A kernel oops is a software-triggered fatal error invoked by
:c:func:`k_oops()`. This should be used to indicate an unrecoverable condition
in application logic.

The fatal error reason code generated will be ``K_ERR_KERNEL_OOPS``.

Kernel Panic
============

A kernel panic is a software-triggered fatal error invoked by
:c:func:`k_panic()`. This should be used to indicate that the Zephyr kernel is
in an unrecoverable state. Implementations of
:c:func:`k_sys_fatal_error_handler()` should not return if the kernel
encounters a panic condition, as the entire system needs to be reset.

Threads running in user mode are not permitted to invoke :c:func:`k_panic()`,
and doing so will generate a kernel oops instead. Otherwise, the fatal error
reason code generated will be ``K_ERR_KERNEL_PANIC``.
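
The rule that a user-mode request to panic is downgraded to an oops (the same
outcome the default :c:func:`assert_post_action` produces) can be summarized as
a small decision function. This is a host-side sketch with hypothetical names
(``demo_reported_reason()`` and the ``DEMO_`` enumerators), not kernel code.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical stand-ins for the two software-triggered reasons. */
   enum demo_reason {
       DEMO_ERR_KERNEL_OOPS,  /* models K_ERR_KERNEL_OOPS */
       DEMO_ERR_KERNEL_PANIC, /* models K_ERR_KERNEL_PANIC */
   };

   /* What the calling thread asked for. */
   enum demo_request {
       DEMO_REQ_OOPS,  /* caller invoked k_oops() */
       DEMO_REQ_PANIC, /* caller invoked k_panic() */
   };

   /* Only supervisor-mode callers may panic the kernel; every other
    * request is reported as an oops. */
   static enum demo_reason demo_reported_reason(enum demo_request req,
                                                bool caller_is_user)
   {
       if (req == DEMO_REQ_PANIC && !caller_is_user) {
           return DEMO_ERR_KERNEL_PANIC;
       }
       return DEMO_ERR_KERNEL_OOPS;
   }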

Exceptions
**********

Spurious Interrupts
===================

If the CPU receives a hardware interrupt on an interrupt line that has no
handler installed with ``IRQ_CONNECT()`` or :c:func:`irq_connect_dynamic()`,
the kernel will generate a fatal error with the reason code
``K_ERR_SPURIOUS_IRQ``.
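
One way to get this behavior, assumed here for illustration, is a software ISR
table whose unused slots all point at a default handler that raises the fatal
error. The host-side sketch below models that idea; every name in it is
hypothetical, and ``demo_connect()`` only stands in for what ``IRQ_CONNECT()``
or :c:func:`irq_connect_dynamic()` accomplish.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical number of interrupt lines and reason code value. */
   #define DEMO_NUM_IRQS 8
   #define DEMO_ERR_SPURIOUS_IRQ 1

   typedef void (*demo_isr_t)(const void *arg);

   struct demo_isr_entry {
       demo_isr_t isr;
       const void *arg;
   };

   static struct demo_isr_entry demo_isr_table[DEMO_NUM_IRQS];

   /* Last fatal reason raised; -1 means none. A kernel would not return. */
   static int demo_last_fatal = -1;

   /* Default handler for every line that nobody has connected. */
   static void demo_spurious_isr(const void *arg)
   {
       (void)arg;
       demo_last_fatal = DEMO_ERR_SPURIOUS_IRQ;
   }

   static void demo_table_init(void)
   {
       for (size_t i = 0; i < DEMO_NUM_IRQS; i++) {
           demo_isr_table[i].isr = demo_spurious_isr;
           demo_isr_table[i].arg = NULL;
       }
   }

   /* Installs a real handler, replacing the spurious default. */
   static void demo_connect(unsigned int irq, demo_isr_t isr, const void *arg)
   {
       demo_isr_table[irq].isr = isr;
       demo_isr_table[irq].arg = arg;
   }

   /* Called on interrupt entry with the number of the line that fired. */
   static void demo_dispatch(unsigned int irq)
   {
       demo_isr_table[irq].isr(demo_isr_table[irq].arg);
   }

   static int demo_timer_ticks;

   static void demo_timer_isr(const void *arg)
   {
       (void)arg;
       demo_timer_ticks++;
   }

With this layout no per-interrupt check is needed at dispatch time: an
unconnected line simply runs the default handler.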

Stack Overflows
===============

In the event that a thread pushes more data onto its execution stack than its
stack buffer provides, the kernel may be able to detect this situation and
generate a fatal error with a reason code of ``K_ERR_STACK_CHK_FAIL``.

If a thread is running in user mode, then stack overflows are always caught,
as the thread will simply not have permission to write to adjacent memory
addresses outside of the stack buffer. Because this is enforced by the
memory protection hardware, there is no risk of data corruption to memory
that the thread would not otherwise be able to write to.

If a thread is running in supervisor mode, or if :option:`CONFIG_USERSPACE` is
not enabled, stack overflows may or may not be caught, depending on the
configuration. :option:`CONFIG_HW_STACK_PROTECTION` is supported on some
architectures and will catch stack overflows in supervisor mode, including
when handling a system call on behalf of a user thread. Typically this is
implemented via dedicated CPU features, or read-only MMU/MPU guard regions
placed immediately adjacent to the stack buffer. This mechanism detects the
overflow but cannot guarantee against data corruption, so an overflow caught
this way should be treated as a very serious condition impacting the health of
the entire system.

If a platform lacks memory management hardware,
:option:`CONFIG_STACK_SENTINEL` provides software-only stack overflow
detection: it periodically checks whether a sentinel value at the end of the
stack buffer has been corrupted. It provides no protection against data
corruption, and since the checks are typically done at interrupt exit, the
overflow may be detected a nontrivial amount of time after the stack actually
overflowed.
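
A host-side sketch of the sentinel idea follows, assuming a stack that grows
downward, so the sentinel occupies the lowest word of the buffer. The sentinel
value, buffer size, and function names are all hypothetical rather than
Zephyr's.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical sentinel value and stack size. */
   #define DEMO_SENTINEL 0xF0F0F0F0u
   #define DEMO_STACK_WORDS 16

   static uint32_t demo_stack[DEMO_STACK_WORDS];

   /* The stack grows down from the top of the buffer, so the sentinel goes
    * in word 0, which is only reached once the whole buffer is in use. */
   static void demo_stack_init(void)
   {
       memset(demo_stack, 0, sizeof(demo_stack));
       demo_stack[0] = DEMO_SENTINEL;
   }

   /* The periodic check (for example, on interrupt exit). */
   static bool demo_stack_ok(void)
   {
       return demo_stack[0] == DEMO_SENTINEL;
   }

   /* Simulates a thread pushing 'words' values, starting from the top. */
   static void demo_push_words(size_t words)
   {
       for (size_t i = 0; i < words && i < DEMO_STACK_WORDS; i++) {
           demo_stack[DEMO_STACK_WORDS - 1 - i] = 0xAAAAAAAAu;
       }
   }

The check only reports corruption after the fact: by the time
``demo_stack_ok()`` returns false, the sentinel word has already been
overwritten, which matches the limitation described above.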

Finally, Zephyr supports GCC compiler stack canaries via
:option:`CONFIG_STACK_CANARIES`. If enabled, the compiler inserts a canary
value, randomly generated at boot, into function stack frames and checks at
function exit that the canary has not been overwritten. If the check fails,
the instrumented code calls :c:func:`__stack_chk_fail()`, whose Zephyr
implementation raises a fatal stack overflow error. An error in this case does
not indicate that the entire stack buffer has overflowed, but rather that the
current function's stack frame has been corrupted. See the compiler
documentation for more details.

Other Exceptions
================

Any other type of unhandled CPU exception will generate a fatal error with the
reason code ``K_ERR_CPU_EXCEPTION``.

Fatal Error Handling
********************

The policy for what to do when encountering a fatal error is determined by the
implementation of the :c:func:`k_sys_fatal_error_handler()` function. This
function has a default implementation with weak linkage that calls
``LOG_PANIC()`` to dump all pending logging messages and then unconditionally
halts the system with :c:func:`k_fatal_halt()`.

Applications are free to implement their own error handling policy by
overriding the implementation of :c:func:`k_sys_fatal_error_handler()`.
If the implementation returns, the faulting thread will be aborted and
the system will otherwise continue to function. See the documentation for
this function for additional details and constraints.
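
In Zephyr the override is a function with the signature
``void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf)``,
which would call :c:func:`k_fatal_halt()` on the paths that must not return.
The host-side sketch below models only the policy decision inside such a
handler; the enumerators, ``demo_fatal_policy()``, and the policy itself
(continue after an oops, halt on everything else) are hypothetical examples.

.. code-block:: c

   /* Hypothetical stand-ins for the reason codes named in this document. */
   enum demo_reason {
       DEMO_ERR_CPU_EXCEPTION,
       DEMO_ERR_SPURIOUS_IRQ,
       DEMO_ERR_STACK_CHK_FAIL,
       DEMO_ERR_KERNEL_OOPS,
       DEMO_ERR_KERNEL_PANIC,
   };

   enum demo_action {
       DEMO_ACTION_ABORT_THREAD, /* handler returns; faulting thread aborted */
       DEMO_ACTION_HALT,         /* handler must not return; system halted */
   };

   /* Tolerate application-level oopses; halt on anything that may leave
    * the kernel itself unhealthy. A panic must always halt. */
   static enum demo_action demo_fatal_policy(enum demo_reason reason)
   {
       switch (reason) {
       case DEMO_ERR_KERNEL_OOPS:
           return DEMO_ACTION_ABORT_THREAD;
       case DEMO_ERR_KERNEL_PANIC:
       case DEMO_ERR_STACK_CHK_FAIL:
       case DEMO_ERR_SPURIOUS_IRQ:
       case DEMO_ERR_CPU_EXCEPTION:
       default:
           return DEMO_ACTION_HALT;
       }
   }

Returning ``DEMO_ACTION_ABORT_THREAD`` corresponds to the real handler simply
returning, after which the kernel aborts the faulting thread.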

API Reference
*************

.. doxygengroup:: fatal_apis
   :project: Zephyr

@ -4,12 +4,22 @@
 * SPDX-License-Identifier: Apache-2.0
 */

/** @file
 * @brief Fatal error functions
 */

#ifndef ZEPHYR_INCLUDE_FATAL_H
#define ZEPHYR_INCLUDE_FATAL_H

#include <arch/cpu.h>
#include <toolchain.h>

/**
 * @defgroup fatal_apis Fatal error APIs
 * @ingroup kernel_apis
 * @{
 */

enum k_fatal_error_reason {
	/** Generic CPU exception, not covered by other codes */
	K_ERR_CPU_EXCEPTION,

@ -88,4 +98,6 @@ void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf);
 */
void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf);

/** @} */

#endif /* ZEPHYR_INCLUDE_FATAL_H */

@ -4,60 +4,6 @@
  * SPDX-License-Identifier: Apache-2.0
  */
 
-/**
- * @file
- * @brief Debug aid
- *
- *
- * The __ASSERT() macro can be used inside kernel code.
- *
- * Assertions are enabled by setting the __ASSERT_ON symbol to a non-zero value.
- * There are two ways to do this:
- * a) Use the ASSERT and ASSERT_LEVEL kconfig options
- * b) Add "CFLAGS += -D__ASSERT_ON=<level>" at the end of a project's Makefile
- * The Makefile method takes precedence over the kconfig option if both are
- * used.
- *
- * Specifying an assertion level of 1 causes the compiler to issue warnings that
- * the kernel contains debug-type __ASSERT() statements; this reminder is issued
- * since assertion code is not normally present in a final product. Specifying
- * assertion level 2 suppresses these warnings.
- *
- * The __ASSERT_EVAL() macro can also be used inside kernel code.
- *
- * It makes use of the __ASSERT() macro, but has some extra flexibility. It
- * allows the developer to specify different actions depending whether the
- * __ASSERT() macro is enabled or not. This can be particularly useful to
- * prevent the compiler from generating comments (errors, warnings or remarks)
- * about variables that are only used with __ASSERT() being assigned a value,
- * but otherwise unused when the __ASSERT() macro is disabled.
- *
- * Consider the following example:
- *
- * int x;
- *
- * x = foo ();
- * __ASSERT (x != 0, "foo() returned zero!");
- *
- * If __ASSERT() is disabled, then 'x' is assigned a value, but never used.
- * This type of situation can be resolved using the __ASSERT_EVAL() macro.
- *
- * __ASSERT_EVAL ((void) foo(),
- * int x = foo(),
- * x != 0,
- * "foo() returned zero!");
- *
- * The first parameter tells __ASSERT_EVAL() what to do if __ASSERT() is
- * disabled. The second parameter tells __ASSERT_EVAL() what to do if
- * __ASSERT() is enabled. The third and fourth parameters are the parameters
- * it passes to __ASSERT().
- *
- * The __ASSERT_NO_MSG() macro can be used to perform an assertion that reports
- * the failed test and its location, but lacks additional debugging information
- * provided to assist the user in diagnosing the problem; its use is
- * discouraged.
- */
-
 #ifndef ZEPHYR_INCLUDE_SYS___ASSERT_H_
 #define ZEPHYR_INCLUDE_SYS___ASSERT_H_
 

@ -117,9 +117,12 @@ config ASSERT
 	default y if TEST
 	help
 	  This enables the __ASSERT() macro in the kernel code. If an assertion
-	  fails, the calling thread is put on an infinite tight loop. Since
-	  enabling this adds a significant footprint, it should only be enabled
-	  in a non-production system.
+	  fails, the policy for what to do is controlled by the implementation
+	  of the assert_post_action() function, which by default will trigger
+	  a fatal error.
+
+	  Disabling this option will cause assertions to compile to nothing,
+	  improving performance and system footprint.
 
 config ASSERT_LEVEL
 	int "__ASSERT() level"