From 4ce988ab433d70c9e85ff6f3bc448239ce18c7bb Mon Sep 17 00:00:00 2001
From: Andrew Boie
Date: Sat, 14 Sep 2019 23:40:03 -0700
Subject: [PATCH] doc: provide error handling documentation

We don't really have docs on how fatal errors are induced or handled.
Provide some documentation that covers:

- Assertions (runtime and build)
- Kernel panic and oops conditions
- Stack overflows
- Other exceptions
- Exception handling policy

Signed-off-by: Andrew Boie
---
 doc/reference/kernel/index.rst       |   1 +
 doc/reference/kernel/other/fatal.rst | 263 +++++++++++++++++++++++++++
 include/fatal.h                      |  12 ++
 include/sys/__assert.h               |  54 ------
 subsys/debug/Kconfig                 |   9 +-
 5 files changed, 282 insertions(+), 57 deletions(-)
 create mode 100644 doc/reference/kernel/other/fatal.rst

diff --git a/doc/reference/kernel/index.rst b/doc/reference/kernel/index.rst
index 87e92402c2..24cb9481ff 100644
--- a/doc/reference/kernel/index.rst
+++ b/doc/reference/kernel/index.rst
@@ -115,3 +115,4 @@ These pages cover other kernel services.
    other/ring_buffers.rst
    other/cxx_support.rst
    other/version.rst
+   other/fatal.rst
diff --git a/doc/reference/kernel/other/fatal.rst b/doc/reference/kernel/other/fatal.rst
new file mode 100644
index 0000000000..2f2ae05450
--- /dev/null
+++ b/doc/reference/kernel/other/fatal.rst
@@ -0,0 +1,263 @@
+.. _fatal:
+
+Fatal Errors
+############
+
+Software Errors Triggered in Source Code
+****************************************
+
+Zephyr provides several methods for inducing fatal error conditions through
+either build-time checks, conditionally compiled assertions, or deliberately
+invoked panic or oops conditions.
+
+Runtime Assertions
+==================
+
+Zephyr provides some macros to perform runtime assertions which may be
+conditionally compiled. Their definitions may be found in
+:zephyr_file:`include/sys/__assert.h`.
+
+Assertions are enabled by setting the ``__ASSERT_ON`` preprocessor symbol to a
+non-zero value. There are two ways to do this:
+
+- Use the :option:`CONFIG_ASSERT` and :option:`CONFIG_ASSERT_LEVEL` kconfig
+  options.
+- Add ``-D__ASSERT_ON=`` to the project's CFLAGS, either on the
+  build command line or in a CMakeLists.txt.
+
+The ``__ASSERT_ON`` method takes precedence over the kconfig option if both are
+used.
+
+Specifying an assertion level of 1 causes the compiler to issue warnings that
+the kernel contains debug-type ``__ASSERT()`` statements; this reminder is
+issued since assertion code is not normally present in a final product.
+Specifying assertion level 2 suppresses these warnings.
+
+Assertions are enabled by default when running Zephyr test cases, as
+configured by the :option:`CONFIG_TEST` option.
+
+The policy for what to do when encountering a failed assertion is controlled
+by the implementation of :c:func:`assert_post_action`. Zephyr provides
+a default implementation with weak linkage which invokes a kernel oops if
+the thread that failed the assertion was running in user mode, and a kernel
+panic otherwise.
+
+__ASSERT()
+----------
+
+The ``__ASSERT()`` macro can be used inside kernel and application code to
+perform optional runtime checks which will induce a fatal error if the
+check does not pass. The macro takes a string message which will be printed
+to provide context to the assertion. In addition, the kernel will print
+a text representation of the expression code that was evaluated, and the
+file and line number where the assertion can be found.
+
+For example:
+
+.. code-block:: c
+
+   __ASSERT(foo == 0xF0CACC1A, "Invalid value of foo, got 0x%x", foo);
+
+If at runtime ``foo`` had some unexpected value, the error produced may
+look like the following:
+
+.. code-block:: none
+
+   ASSERTION FAIL [foo == 0xF0CACC1A] @ ZEPHYR_BASE/tests/kernel/fatal/src/main.c:367
+   Invalid value of foo, got 0xdeadbeef
+   [00:00:00.000,000] os: r0/a1: 0x00000004 r1/a2: 0x0000016f r2/a3: 0x00000000
+   [00:00:00.000,000] os: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x00000a6d
+   [00:00:00.000,000] os: xpsr: 0x61000000
+   [00:00:00.000,000] os: Faulting instruction address (r15/pc): 0x00009fe4
+   [00:00:00.000,000] os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
+   [00:00:00.000,000] os: Current thread: 0x20000414 (main)
+   [00:00:00.000,000] os: Halting system
+
+__ASSERT_EVAL()
+---------------
+
+The ``__ASSERT_EVAL()`` macro can also be used inside kernel and application
+code, with special semantics for the evaluation of its arguments.
+
+It makes use of the ``__ASSERT()`` macro, but has some extra flexibility. It
+allows the developer to specify different actions depending on whether the
+``__ASSERT()`` macro is enabled or not. This can be particularly useful to
+prevent the compiler from generating comments (errors, warnings or remarks)
+about variables that are only used with ``__ASSERT()`` being assigned a value,
+but otherwise unused when the ``__ASSERT()`` macro is disabled.
+
+Consider the following example:
+
+.. code-block:: c
+
+   int x;
+   x = foo();
+   __ASSERT(x != 0, "foo() returned zero!");
+
+If ``__ASSERT()`` is disabled, then 'x' is assigned a value, but never used.
+This type of situation can be resolved using the ``__ASSERT_EVAL()`` macro.
+
+.. code-block:: c
+
+   __ASSERT_EVAL ((void) foo(),
+                  int x = foo(),
+                  x != 0,
+                  "foo() returned zero!");
+
+The first parameter tells ``__ASSERT_EVAL()`` what to do if ``__ASSERT()`` is
+disabled. The second parameter tells ``__ASSERT_EVAL()`` what to do if
+``__ASSERT()`` is enabled. The third and fourth parameters are the parameters
+it passes to ``__ASSERT()``.
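As a rough host-side illustration of the two-way expansion described above, the sketch below uses hypothetical ``DEMO_ASSERT()`` and ``DEMO_ASSERT_EVAL()`` macros and a ``DEMO_ASSERT_ON`` switch. These names are assumptions made for illustration and are not the definitions in ``include/sys/__assert.h``; a failed check here only prints and counts instead of raising a fatal error.

```c
#include <stdio.h>

/* Hypothetical switch standing in for __ASSERT_ON (assumption). */
#ifndef DEMO_ASSERT_ON
#define DEMO_ASSERT_ON 1
#endif

static int demo_failures; /* number of failed checks */
static int demo_calls;    /* number of times demo_foo() ran */

#if DEMO_ASSERT_ON
/* Enabled: report the expression text, location and message. */
#define DEMO_ASSERT(test, msg)                                        \
	do {                                                          \
		if (!(test)) {                                        \
			printf("ASSERTION FAIL [%s] @ %s:%d\n\t%s\n", \
			       #test, __FILE__, __LINE__, (msg));     \
			demo_failures++;                              \
		}                                                     \
	} while (0)
/* Enabled: run expr2, then check the condition. */
#define DEMO_ASSERT_EVAL(expr1, expr2, test, msg) \
	do {                                      \
		expr2;                            \
		DEMO_ASSERT(test, msg);           \
	} while (0)
#else
/* Disabled: the check vanishes and only expr1 remains. */
#define DEMO_ASSERT(test, msg) do { } while (0)
#define DEMO_ASSERT_EVAL(expr1, expr2, test, msg) expr1
#endif

/* Hypothetical function whose return value is only needed by the check. */
static int demo_foo(void)
{
	demo_calls++;
	return 42;
}

/* Returns how many times demo_foo() was called. */
int demo_run(void)
{
	DEMO_ASSERT_EVAL((void)demo_foo(),
			 int x = demo_foo(),
			 x != 0,
			 "foo() returned zero!");
	return demo_calls;
}
```

In either configuration ``demo_foo()`` runs exactly once, and no unused variable is left behind when the check is compiled out.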
+
+__ASSERT_NO_MSG()
+-----------------
+
+The ``__ASSERT_NO_MSG()`` macro can be used to perform an assertion that
+reports the failed test and its location, but lacks additional debugging
+information provided to assist the user in diagnosing the problem; its use is
+discouraged.
+
+Build Assertions
+================
+
+Zephyr provides two macros for performing build-time assertion checks.
+These are evaluated completely at compile-time, and are always checked.
+
+BUILD_ASSERT_MSG()
+------------------
+
+This has the same semantics as C's ``_Static_assert`` or C++'s
+``static_assert``. If the evaluation fails, a build error will be generated by
+the compiler. If the compiler supports it, the provided message will be printed
+to provide further context.
+
+Unlike ``__ASSERT()``, the message must be a static string, without
+:c:func:`printf()`-like format codes or extra arguments.
+
+For example, suppose this check fails:
+
+.. code-block:: c
+
+   BUILD_ASSERT_MSG(FOO == 2000,
+                    "Invalid value of FOO");
+
+With GCC, the output resembles:
+
+.. code-block:: none
+
+   tests/kernel/fatal/src/main.c: In function 'test_main':
+   include/toolchain/gcc.h:28:37: error: static assertion failed: "Invalid value of FOO"
+    #define BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)
+                                        ^~~~~~~~~~~~~~
+   tests/kernel/fatal/src/main.c:370:2: note: in expansion of macro 'BUILD_ASSERT_MSG'
+     BUILD_ASSERT_MSG(FOO == 2000,
+     ^~~~~~~~~~~~~~~~
+
+BUILD_ASSERT()
+--------------
+
+This works just like ``BUILD_ASSERT_MSG()`` except there is no supplemental
+message provided, and like ``__ASSERT_NO_MSG()`` its use is discouraged.
+
+Kernel Oops
+===========
+
+A kernel oops is a software-triggered fatal error invoked by
+:c:func:`k_oops()`. This should be used to indicate an unrecoverable condition
+in application logic.
+
+The fatal error reason code generated will be ``K_ERR_KERNEL_OOPS``.
+
+Kernel Panic
+============
+
+A kernel panic is a software-triggered fatal error invoked by
+:c:func:`k_panic()`. This should be used to indicate that the Zephyr kernel is
+in an unrecoverable state. Implementations of
+:c:func:`k_sys_fatal_error_handler()` should not return if the kernel
+encounters a panic condition, as the entire system needs to be reset.
+
+Threads running in user mode are not permitted to invoke :c:func:`k_panic()`,
+and doing so will generate a kernel oops instead. Otherwise, the fatal error
+reason code generated will be ``K_ERR_KERNEL_PANIC``.
+
+Exceptions
+**********
+
+Spurious Interrupts
+===================
+
+If the CPU receives a hardware interrupt on an interrupt line that has not had
+a handler installed with ``IRQ_CONNECT()`` or :c:func:`irq_connect_dynamic()`,
+then the kernel will generate a fatal error with the reason code
+``K_ERR_SPURIOUS_IRQ``.
+
+Stack Overflows
+===============
+
+In the event that a thread pushes more data onto its execution stack than its
+stack buffer provides, the kernel may be able to detect this situation and
+generate a fatal error with a reason code of ``K_ERR_STACK_CHK_FAIL``.
+
+If a thread is running in user mode, then stack overflows are always caught,
+as the thread will simply not have permission to write to adjacent memory
+addresses outside of the stack buffer. Because this is enforced by the
+memory protection hardware, there is no risk of data corruption to memory
+that the thread would not otherwise be able to write to.
+
+If a thread is running in supervisor mode, or if :option:`CONFIG_USERSPACE` is
+not enabled, depending on configuration stack overflows may or may not be
+caught. :option:`CONFIG_HW_STACK_PROTECTION` is supported on some
+architectures and will catch stack overflows in supervisor mode, including
+when handling a system call on behalf of a user thread. Typically this is
+implemented via dedicated CPU features, or read-only MMU/MPU guard regions
+placed immediately adjacent to the stack buffer. Stack overflows caught in this
+way can detect the overflow, but cannot guarantee against data corruption and
+should be treated as a very serious condition impacting the health of the
+entire system.
+
+If a platform lacks memory management hardware support,
+:option:`CONFIG_STACK_SENTINEL` is a software-only stack overflow detection
+feature which periodically checks if a sentinel value at the end of the stack
+buffer has been corrupted. It does not require hardware support, but provides
+no protection against data corruption. Since the checks are typically done at
+interrupt exit, the overflow may be detected a nontrivial amount of time after
+the stack actually overflowed.
+
+Finally, Zephyr supports GCC compiler stack canaries via
+:option:`CONFIG_STACK_CANARIES`. If enabled, the compiler will insert a canary
+value randomly generated at boot into function stack frames, checking that the
+canary has not been overwritten at function exit. If the check fails, the
+compiler invokes :c:func:`__stack_chk_fail()`, whose Zephyr implementation
+invokes a fatal stack overflow error. An error in this case does not indicate
+that the entire stack buffer has overflowed, but instead that the current
+function stack frame has been corrupted. See the compiler documentation for
+more details.
+
+Other Exceptions
+================
+
+Any other type of unhandled CPU exception will generate an error code of
+``K_ERR_CPU_EXCEPTION``.
+
+Fatal Error Handling
+********************
+
+The policy for what to do when encountering a fatal error is determined by the
+implementation of the :c:func:`k_sys_fatal_error_handler()` function. This
+function has a default implementation with weak linkage that calls
+``LOG_PANIC()`` to dump all pending logging messages and then unconditionally
+halts the system with :c:func:`k_fatal_halt()`.
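A policy other than "always halt" has to decide, per reason code, whether halting is still required. The host-side sketch below shows one possible decision function. The ``demo_`` names, the enum values and the ``from_user_thread`` parameter are assumptions made for illustration and are not part of the Zephyr API; only the rules for panics and for supervisor-mode stack overflows follow the text above, and the rest is a deliberately conservative guess.

```c
#include <stdbool.h>

/* Hypothetical reason codes mirroring the names used in this document. */
enum demo_fatal_reason {
	DEMO_ERR_CPU_EXCEPTION,
	DEMO_ERR_SPURIOUS_IRQ,
	DEMO_ERR_STACK_CHK_FAIL,
	DEMO_ERR_KERNEL_OOPS,
	DEMO_ERR_KERNEL_PANIC,
};

/*
 * Returns true if a custom handler could reasonably return, so that only
 * the faulting thread is aborted, and false if it should halt instead.
 */
bool demo_may_return(unsigned int reason, bool from_user_thread)
{
	switch (reason) {
	case DEMO_ERR_KERNEL_PANIC:
		/* The kernel is in an unrecoverable state: never return. */
		return false;
	case DEMO_ERR_KERNEL_OOPS:
		/* Application-level error: aborting the thread may suffice. */
		return true;
	case DEMO_ERR_STACK_CHK_FAIL:
		/*
		 * User-mode overflows are contained by the memory protection
		 * hardware; supervisor-mode overflows may have corrupted data.
		 */
		return from_user_thread;
	default:
		/* Assumption: treat anything else conservatively. */
		return false;
	}
}
```

A handler built on such a function would halt or reset the system whenever it yields ``false`` and simply return otherwise.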
+
+Applications are free to implement their own error handling policy by
+overriding the implementation of :c:func:`k_sys_fatal_error_handler()`.
+If the implementation returns, the faulting thread will be aborted and
+the system will otherwise continue to function. See the documentation for
+this function for additional details and constraints.
+
+API Reference
+*************
+
+.. doxygengroup:: fatal_apis
+   :project: Zephyr
+
diff --git a/include/fatal.h b/include/fatal.h
index bf3ddfd9fa..fb900d7c95 100644
--- a/include/fatal.h
+++ b/include/fatal.h
@@ -4,12 +4,22 @@
  * SPDX-License-Identifier: Apache-2.0
  */
 
+/** @file
+ * @brief Fatal error functions
+ */
+
 #ifndef ZEPHYR_INCLUDE_FATAL_H
 #define ZEPHYR_INCLUDE_FATAL_H
 
 #include
 #include
 
+/**
+ * @defgroup fatal_apis Fatal error APIs
+ * @ingroup kernel_apis
+ * @{
+ */
+
 enum k_fatal_error_reason {
 	/** Generic CPU exception, not covered by other codes */
 	K_ERR_CPU_EXCEPTION,
@@ -88,4 +98,6 @@ void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf);
  */
 void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf);
 
+/** @} */
+
 #endif /* ZEPHYR_INCLUDE_FATAL_H */
diff --git a/include/sys/__assert.h b/include/sys/__assert.h
index a4f1824ff0..ce591bb8a7 100644
--- a/include/sys/__assert.h
+++ b/include/sys/__assert.h
@@ -4,60 +4,6 @@
  * SPDX-License-Identifier: Apache-2.0
  */
 
-/**
- * @file
- * @brief Debug aid
- *
- *
- * The __ASSERT() macro can be used inside kernel code.
- *
- * Assertions are enabled by setting the __ASSERT_ON symbol to a non-zero value.
- * There are two ways to do this:
- *   a) Use the ASSERT and ASSERT_LEVEL kconfig options
- *   b) Add "CFLAGS += -D__ASSERT_ON=" at the end of a project's Makefile
- * The Makefile method takes precedence over the kconfig option if both are
- * used.
- *
- * Specifying an assertion level of 1 causes the compiler to issue warnings that
- * the kernel contains debug-type __ASSERT() statements; this reminder is issued
- * since assertion code is not normally present in a final product. Specifying
- * assertion level 2 suppresses these warnings.
- *
- * The __ASSERT_EVAL() macro can also be used inside kernel code.
- *
- * It makes use of the __ASSERT() macro, but has some extra flexibility. It
- * allows the developer to specify different actions depending whether the
- * __ASSERT() macro is enabled or not. This can be particularly useful to
- * prevent the compiler from generating comments (errors, warnings or remarks)
- * about variables that are only used with __ASSERT() being assigned a value,
- * but otherwise unused when the __ASSERT() macro is disabled.
- *
- * Consider the following example:
- *
- * int x;
- *
- * x = foo ();
- * __ASSERT (x != 0, "foo() returned zero!");
- *
- * If __ASSERT() is disabled, then 'x' is assigned a value, but never used.
- * This type of situation can be resolved using the __ASSERT_EVAL() macro.
- *
- * __ASSERT_EVAL ((void) foo(),
- *                int x = foo(),
- *                x != 0,
- *                "foo() returned zero!");
- *
- * The first parameter tells __ASSERT_EVAL() what to do if __ASSERT() is
- * disabled. The second parameter tells __ASSERT_EVAL() what to do if
- * __ASSERT() is enabled. The third and fourth parameters are the parameters
- * it passes to __ASSERT().
- *
- * The __ASSERT_NO_MSG() macro can be used to perform an assertion that reports
- * the failed test and its location, but lacks additional debugging information
- * provided to assist the user in diagnosing the problem; its use is
- * discouraged.
- */
-
 #ifndef ZEPHYR_INCLUDE_SYS___ASSERT_H_
 #define ZEPHYR_INCLUDE_SYS___ASSERT_H_
 
diff --git a/subsys/debug/Kconfig b/subsys/debug/Kconfig
index ada718f38b..27ce9fa957 100644
--- a/subsys/debug/Kconfig
+++ b/subsys/debug/Kconfig
@@ -117,9 +117,12 @@ config ASSERT
 	default y if TEST
 	help
 	  This enables the __ASSERT() macro in the kernel code. If an assertion
-	  fails, the calling thread is put on an infinite tight loop. Since
-	  enabling this adds a significant footprint, it should only be enabled
-	  in a non-production system.
+	  fails, the policy for what to do is controlled by the implementation
+	  of the assert_post_action() function, which by default will trigger
+	  a fatal error.
+
+	  Disabling this option will cause assertions to compile to nothing,
+	  improving performance and system footprint.
 
 config ASSERT_LEVEL
 	int "__ASSERT() level"