doc: provide error handling documentation

We don't really have docs on how fatal errors are induced
or handled. Provide some documentation that covers:

- Assertions (runtime and build)
- Kernel panic and oops conditions
- Stack overflows
- Other exceptions
- Exception handling policy

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Andrew Boie 2019-09-14 23:40:03 -07:00 committed by Ioannis Glaropoulos
parent c311aa4675
commit 4ce988ab43
5 changed files with 282 additions and 57 deletions


@ -115,3 +115,4 @@ These pages cover other kernel services.
other/ring_buffers.rst
other/cxx_support.rst
other/version.rst
other/fatal.rst


@ -0,0 +1,263 @@
.. _fatal:
Fatal Errors
############
Software Errors Triggered in Source Code
****************************************
Zephyr provides several methods for inducing fatal error conditions:
build-time checks, conditionally compiled assertions, and deliberately
invoked panic or oops conditions.
Runtime Assertions
==================
Zephyr provides some macros to perform runtime assertions which may be
conditionally compiled. Their definitions may be found in
:zephyr_file:`include/sys/__assert.h`.
Assertions are enabled by setting the ``__ASSERT_ON`` preprocessor symbol to a
non-zero value. There are two ways to do this:

- Use the :option:`CONFIG_ASSERT` and :option:`CONFIG_ASSERT_LEVEL` Kconfig
  options.
- Add ``-D__ASSERT_ON=<level>`` to the project's CFLAGS, either on the
  build command line or in a CMakeLists.txt.

The ``__ASSERT_ON`` method takes precedence over the Kconfig options if both
are used.
Specifying an assertion level of 1 causes the compiler to issue warnings that
the kernel contains debug-type ``__ASSERT()`` statements; this reminder is
issued since assertion code is not normally present in a final product.
Specifying assertion level 2 suppresses these warnings.
Assertions are enabled by default when running Zephyr test cases, as
configured by the :option:`CONFIG_TEST` option.
The policy for what to do when encountering a failed assertion is controlled
by the implementation of :c:func:`assert_post_action`. Zephyr provides
a default implementation with weak linkage which invokes a kernel oops if
the thread that failed the assertion was running in user mode, and a kernel
panic otherwise.
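
The default policy described above can be sketched as a small standalone
model. The names ``model_assert_post_action()`` and ``enum model_fatal``, as
well as the ``user_mode`` flag, are hypothetical stand-ins chosen so that the
snippet compiles outside of Zephyr; a real application would instead override
the weak :c:func:`assert_post_action`, and the kernel itself would determine
whether the failing thread runs in user mode.

.. code-block:: c

   #include <stdbool.h>

   /* Stand-ins for the two outcomes named in the text (hypothetical names). */
   enum model_fatal {
       MODEL_KERNEL_OOPS,
       MODEL_KERNEL_PANIC,
   };

   /*
    * Model of the default policy: a failed assertion in a user mode thread
    * leads to an oops, a failed assertion anywhere else leads to a panic.
    */
   enum model_fatal model_assert_post_action(bool user_mode)
   {
       return user_mode ? MODEL_KERNEL_OOPS : MODEL_KERNEL_PANIC;
   }

An application that prefers a different policy, for instance logging the
failure and resetting, would replace the body of its own override rather than
modify the kernel.
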
__ASSERT()
----------
The ``__ASSERT()`` macro can be used inside kernel and application code to
perform optional runtime checks which will induce a fatal error if the
check does not pass. The macro takes a string message which will be printed
to provide context to the assertion. In addition, the kernel will print
a text representation of the expression code that was evaluated, and the
file and line number where the assertion can be found.
For example:

.. code-block:: c

   __ASSERT(foo == 0xF0CACC1A, "Invalid value of foo, got 0x%x", foo);

If at runtime ``foo`` had some unexpected value, the error produced may
look like the following:

.. code-block:: none

   ASSERTION FAIL [foo == 0xF0CACC1A] @ ZEPHYR_BASE/tests/kernel/fatal/src/main.c:367
   Invalid value of foo, got 0xdeadbeef
   [00:00:00.000,000] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000016f r2/a3: 0x00000000
   [00:00:00.000,000] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x00000a6d
   [00:00:00.000,000] <err> os: xpsr: 0x61000000
   [00:00:00.000,000] <err> os: Faulting instruction address (r15/pc): 0x00009fe4
   [00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
   [00:00:00.000,000] <err> os: Current thread: 0x20000414 (main)
   [00:00:00.000,000] <err> os: Halting system

__ASSERT_EVAL()
---------------
The ``__ASSERT_EVAL()`` macro can also be used inside kernel and application
code, with special semantics for the evaluation of its arguments.
It makes use of the ``__ASSERT()`` macro, but has some extra flexibility. It
allows the developer to specify different actions depending on whether the
``__ASSERT()`` macro is enabled or not. This can be particularly useful to
prevent the compiler from generating diagnostics (errors, warnings or remarks)
about variables that are assigned a value and used only within ``__ASSERT()``,
and are therefore unused when the ``__ASSERT()`` macro is disabled.
Consider the following example:

.. code-block:: c

   int x;

   x = foo();
   __ASSERT(x != 0, "foo() returned zero!");

If ``__ASSERT()`` is disabled, then ``x`` is assigned a value, but never used.
This type of situation can be resolved using the ``__ASSERT_EVAL()`` macro.

.. code-block:: c

   __ASSERT_EVAL((void) foo(),
                 int x = foo(),
                 x != 0,
                 "foo() returned zero!");

The first parameter tells ``__ASSERT_EVAL()`` what to do if ``__ASSERT()`` is
disabled. The second parameter tells ``__ASSERT_EVAL()`` what to do if
``__ASSERT()`` is enabled. The third and fourth parameters are the parameters
it passes to ``__ASSERT()``.
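
The role of the four parameters can be illustrated with a standalone model.
This is a sketch of the semantics described above, not the Zephyr definition:
``MODEL_ASSERT_ON``, ``MODEL_ASSERT()``, ``MODEL_ASSERT_EVAL()`` and the two
counters are hypothetical names, and the model records a failed check in a
counter instead of raising a fatal error.

.. code-block:: c

   #include <stdbool.h>

   /* Set to 0 to model a build with assertions compiled out. */
   #ifndef MODEL_ASSERT_ON
   #define MODEL_ASSERT_ON 1
   #endif

   int model_foo_calls;       /* how often model_foo() has run */
   int model_assert_failures; /* failed checks recorded by the model */

   /* Hypothetical function whose return value should be checked. */
   int model_foo(void)
   {
       model_foo_calls++;
       return 0; /* deliberately the value the assertion rejects */
   }

   #if MODEL_ASSERT_ON
   /* Record the failure instead of raising a fatal error. */
   #define MODEL_ASSERT(test, msg)          \
       do {                                 \
           if (!(test)) {                   \
               model_assert_failures++;     \
           }                                \
       } while (false)

   /* Assertions enabled: run the second parameter, then check the test. */
   #define MODEL_ASSERT_EVAL(expr1, expr2, test, msg) \
       do {                                           \
           expr2;                                     \
           MODEL_ASSERT(test, msg);                   \
       } while (false)
   #else
   /* Assertions disabled: only the first parameter is evaluated. */
   #define MODEL_ASSERT_EVAL(expr1, expr2, test, msg) expr1
   #endif

   void model_caller(void)
   {
       MODEL_ASSERT_EVAL((void)model_foo(),
                         int x = model_foo(),
                         x != 0,
                         "foo() returned zero!");
   }

In both configurations ``model_foo()`` runs exactly once per call to
``model_caller()``, and the variable ``x`` only exists when it is actually
checked, so the compiler has no unused variable to complain about.
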
__ASSERT_NO_MSG()
-----------------
The ``__ASSERT_NO_MSG()`` macro performs an assertion that reports the failed
test and its location, but omits the additional message that would help the
user diagnose the problem; its use is discouraged.
Build Assertions
================
Zephyr provides two macros for performing build-time assertion checks.
These are evaluated completely at compile-time, and are always checked.
BUILD_ASSERT_MSG()
------------------
This has the same semantics as C's ``_Static_assert`` or C++'s
``static_assert``. If the evaluation fails, a build error will be generated by
the compiler. If the compiler supports it, the provided message will be printed
to provide further context.
Unlike ``__ASSERT()``, the message must be a static string, without
:c:func:`printf()`-like format codes or extra arguments.
For example, suppose this check fails:

.. code-block:: c

   BUILD_ASSERT_MSG(FOO == 2000,
                    "Invalid value of FOO");

With GCC, the output resembles:

.. code-block:: none

   tests/kernel/fatal/src/main.c: In function 'test_main':
   include/toolchain/gcc.h:28:37: error: static assertion failed: "Invalid value of FOO"
    #define BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)
                                        ^~~~~~~~~~~~~~
   tests/kernel/fatal/src/main.c:370:2: note: in expansion of macro 'BUILD_ASSERT_MSG'
     BUILD_ASSERT_MSG(FOO == 2000,
     ^~~~~~~~~~~~~~~~

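
A common use of build assertions is guarding a data layout. The sketch below
uses plain C11 ``_Static_assert`` so that it compiles without Zephyr headers;
since ``BUILD_ASSERT_MSG()`` is described as having the same semantics, the
same expression and message could presumably be passed to it unchanged. The
``struct model_header`` type is hypothetical and exists only for this example.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical wire format header, used only for illustration. */
   struct model_header {
       uint8_t type;
       uint8_t flags;
       uint16_t length;
       uint32_t sequence;
   };

   /* The build fails if padding or a changed field alters the size. */
   _Static_assert(sizeof(struct model_header) == 8,
                  "model_header must be exactly 8 bytes");

Because the check is evaluated entirely by the compiler, it adds no code or
data to the final image.
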
BUILD_ASSERT()
--------------
This works just like ``BUILD_ASSERT_MSG()`` except there is no supplemental
message provided, and like ``__ASSERT_NO_MSG()`` its use is discouraged.
Kernel Oops
===========
A kernel oops is a software-triggered fatal error invoked by
:c:func:`k_oops()`. This should be used to indicate an unrecoverable condition
in application logic.
The fatal error reason code generated will be ``K_ERR_KERNEL_OOPS``.
Kernel Panic
============
A kernel panic is a software-triggered fatal error invoked by
:c:func:`k_panic()`. This should be used to indicate that the Zephyr kernel is
in an unrecoverable state. Implementations of
:c:func:`k_sys_fatal_error_handler()` should not return if the kernel
encounters a panic condition, as the entire system needs to be reset.
Threads running in user mode are not permitted to invoke :c:func:`k_panic()`,
and doing so will generate a kernel oops instead. Otherwise, the fatal error
reason code generated will be ``K_ERR_KERNEL_PANIC``.
Exceptions
**********
Spurious Interrupts
===================
If the CPU receives a hardware interrupt on an interrupt line that has not had
a handler installed with ``IRQ_CONNECT()`` or :c:func:`irq_connect_dynamic()`,
then the kernel will generate a fatal error with the reason code
``K_ERR_SPURIOUS_IRQ``.
Stack Overflows
===============
In the event that a thread pushes more data onto its execution stack than its
stack buffer provides, the kernel may be able to detect this situation and
generate a fatal error with a reason code of ``K_ERR_STACK_CHK_FAIL``.
If a thread is running in user mode, then stack overflows are always caught,
as the thread will simply not have permission to write to adjacent memory
addresses outside of the stack buffer. Because this is enforced by the
memory protection hardware, there is no risk of data corruption to memory
that the thread would not otherwise be able to write to.
If a thread is running in supervisor mode, or if :option:`CONFIG_USERSPACE` is
not enabled, stack overflows may or may not be caught, depending on
configuration. :option:`CONFIG_HW_STACK_PROTECTION` is supported on some
architectures and will catch stack overflows in supervisor mode, including
when handling a system call on behalf of a user thread. Typically this is
implemented via dedicated CPU features, or read-only MMU/MPU guard regions
placed immediately adjacent to the stack buffer. This mechanism detects the
overflow but cannot guarantee against data corruption, so an overflow caught
this way should be treated as a very serious condition impacting the health
of the entire system.
If a platform lacks memory management hardware support,
:option:`CONFIG_STACK_SENTINEL` is a software-only stack overflow detection
feature which periodically checks if a sentinel value at the end of the stack
buffer has been corrupted. It does not require hardware support, but provides
no protection against data corruption. Since the checks are typically done at
interrupt exit, the overflow may be detected a nontrivial amount of time after
the stack actually overflowed.
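
The idea behind a software sentinel can be shown with a standalone model.
Everything in it is an assumption made for illustration: the names, the
marker value, the buffer size, and the choice of a downward-growing stack
whose end is therefore the lowest address of the buffer. It is not the kernel
implementation.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define MODEL_SENTINEL   0xF0F0F0F0u /* arbitrary marker for the model */
   #define MODEL_STACK_SIZE 64u

   /* A fake stack buffer; the stack grows down, so index 0 is its end. */
   struct model_stack {
       uint8_t buf[MODEL_STACK_SIZE];
   };

   /* Place the sentinel at the lowest addresses of the buffer. */
   void model_stack_init(struct model_stack *s)
   {
       uint32_t sentinel = MODEL_SENTINEL;

       memset(s->buf, 0, sizeof(s->buf));
       memcpy(s->buf, &sentinel, sizeof(sentinel));
   }

   /* Simulate a thread using `depth` bytes, counted from the top. */
   void model_stack_use(struct model_stack *s, size_t depth)
   {
       if (depth > sizeof(s->buf)) {
           depth = sizeof(s->buf); /* keep the model itself in bounds */
       }
       memset(s->buf + sizeof(s->buf) - depth, 0xAA, depth);
   }

   /* The periodic check: is the sentinel still in place? */
   bool model_stack_sentinel_intact(const struct model_stack *s)
   {
       uint32_t value;

       memcpy(&value, s->buf, sizeof(value));
       return value == MODEL_SENTINEL;
   }

The check only reports that the marker has already been overwritten. Any
damage was done before the check ran, which is why this approach detects
overflows but does not prevent data corruption.
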
Finally, Zephyr supports GCC compiler stack canaries via
:option:`CONFIG_STACK_CANARIES`. If enabled, the compiler will insert a canary
value randomly generated at boot into function stack frames, checking that the
canary has not been overwritten at function exit. If the check fails, the
compiler invokes :c:func:`__stack_chk_fail()`, whose Zephyr implementation
invokes a fatal stack overflow error. An error in this case does not indicate
that the entire stack buffer has overflowed, but instead that the current
function stack frame has been corrupted. See the compiler documentation for
more details.
Other Exceptions
================
Any other type of unhandled CPU exception will generate an error code of
``K_ERR_CPU_EXCEPTION``.
Fatal Error Handling
********************
The policy for what to do when encountering a fatal error is determined by the
implementation of the :c:func:`k_sys_fatal_error_handler()` function. This
function has a default implementation with weak linkage that calls
``LOG_PANIC()`` to dump all pending logging messages and then unconditionally
halts the system with :c:func:`k_fatal_halt()`.
Applications are free to implement their own error handling policy by
overriding the implementation of :c:func:`k_sys_fatal_error_handler()`.
If the implementation returns, the faulting thread will be aborted and
the system will otherwise continue to function. See the documentation for
this function for additional details and constraints.
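
One possible application policy is sketched here as a standalone decision
function. The enumeration is a stand-in that mirrors the reason codes named
in this document, and ``model_must_halt()`` is a hypothetical helper; neither
is part of the Zephyr API. The choice to let the system continue after an
oops is an assumption made for this example, not a recommendation.

.. code-block:: c

   #include <stdbool.h>

   /*
    * Stand-in for the reason codes named in this document. The real
    * enumeration is provided by the kernel's fatal error header.
    */
   enum model_reason {
       MODEL_ERR_CPU_EXCEPTION,
       MODEL_ERR_SPURIOUS_IRQ,
       MODEL_ERR_STACK_CHK_FAIL,
       MODEL_ERR_KERNEL_OOPS,
       MODEL_ERR_KERNEL_PANIC,
   };

   /*
    * Hypothetical policy. Returns true if the handler must not return
    * (halt or reset the system), and false if it may return so that only
    * the faulting thread is aborted.
    */
   bool model_must_halt(enum model_reason reason)
   {
       switch (reason) {
       case MODEL_ERR_KERNEL_OOPS:
           /* Failure in application logic: abort just that thread. */
           return false;
       case MODEL_ERR_KERNEL_PANIC:
       case MODEL_ERR_STACK_CHK_FAIL:
       case MODEL_ERR_SPURIOUS_IRQ:
       case MODEL_ERR_CPU_EXCEPTION:
       default:
           /* Kernel state may be compromised: do not continue. */
           return true;
       }
   }

An override of :c:func:`k_sys_fatal_error_handler()` could consult a helper
of this kind and invoke :c:func:`k_fatal_halt()` whenever it reports that the
system must not continue.
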
API Reference
*************
.. doxygengroup:: fatal_apis
   :project: Zephyr


@ -4,12 +4,22 @@
* SPDX-License-Identifier: Apache-2.0
*/
/** @file
* @brief Fatal error functions
*/
#ifndef ZEPHYR_INCLUDE_FATAL_H
#define ZEPHYR_INCLUDE_FATAL_H
#include <arch/cpu.h>
#include <toolchain.h>
/**
* @defgroup fatal_apis Fatal error APIs
* @ingroup kernel_apis
* @{
*/
enum k_fatal_error_reason {
/** Generic CPU exception, not covered by other codes */
K_ERR_CPU_EXCEPTION,
@ -88,4 +98,6 @@ void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf);
*/
void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf);
/** @} */
#endif /* ZEPHYR_INCLUDE_FATAL_H */


@ -4,60 +4,6 @@
* SPDX-License-Identifier: Apache-2.0
*/
/**
* @file
* @brief Debug aid
*
*
* The __ASSERT() macro can be used inside kernel code.
*
* Assertions are enabled by setting the __ASSERT_ON symbol to a non-zero value.
* There are two ways to do this:
* a) Use the ASSERT and ASSERT_LEVEL kconfig options
* b) Add "CFLAGS += -D__ASSERT_ON=<level>" at the end of a project's Makefile
* The Makefile method takes precedence over the kconfig option if both are
* used.
*
* Specifying an assertion level of 1 causes the compiler to issue warnings that
* the kernel contains debug-type __ASSERT() statements; this reminder is issued
* since assertion code is not normally present in a final product. Specifying
* assertion level 2 suppresses these warnings.
*
* The __ASSERT_EVAL() macro can also be used inside kernel code.
*
* It makes use of the __ASSERT() macro, but has some extra flexibility. It
* allows the developer to specify different actions depending whether the
* __ASSERT() macro is enabled or not. This can be particularly useful to
* prevent the compiler from generating comments (errors, warnings or remarks)
* about variables that are only used with __ASSERT() being assigned a value,
* but otherwise unused when the __ASSERT() macro is disabled.
*
* Consider the following example:
*
* int x;
*
* x = foo ();
* __ASSERT (x != 0, "foo() returned zero!");
*
* If __ASSERT() is disabled, then 'x' is assigned a value, but never used.
* This type of situation can be resolved using the __ASSERT_EVAL() macro.
*
* __ASSERT_EVAL ((void) foo(),
* int x = foo(),
* x != 0,
* "foo() returned zero!");
*
* The first parameter tells __ASSERT_EVAL() what to do if __ASSERT() is
* disabled. The second parameter tells __ASSERT_EVAL() what to do if
* __ASSERT() is enabled. The third and fourth parameters are the parameters
* it passes to __ASSERT().
*
* The __ASSERT_NO_MSG() macro can be used to perform an assertion that reports
* the failed test and its location, but lacks additional debugging information
* provided to assist the user in diagnosing the problem; its use is
* discouraged.
*/
#ifndef ZEPHYR_INCLUDE_SYS___ASSERT_H_
#define ZEPHYR_INCLUDE_SYS___ASSERT_H_


@ -117,9 +117,12 @@ config ASSERT
default y if TEST
help
This enables the __ASSERT() macro in the kernel code. If an assertion
-fails, the calling thread is put on an infinite tight loop. Since
-enabling this adds a significant footprint, it should only be enabled
-in a non-production system.
+fails, the policy for what to do is controlled by the implementation
+of the assert_post_action() function, which by default will trigger
+a fatal error.
+Disabling this option will cause assertions to compile to nothing,
+improving performance and system footprint.
config ASSERT_LEVEL
int "__ASSERT() level"