doc: provide error handling documentation

We don't really have docs on how fatal errors are induced or handled.
Provide some documentation that covers:

- Assertions (runtime and build)
- Kernel panic and oops conditions
- Stack overflows
- Other exceptions
- Exception handling policy

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-09-14 23:40:03 -07:00 · 2019-09-14 23:40:03 -07:00 · 4ce988ab43
parent c311aa4675
commit 4ce988ab43
5 changed files with 282 additions and 57 deletions
--- a/doc/reference/kernel/index.rst
+++ b/doc/reference/kernel/index.rst
@@ -115,3 +115,4 @@ These pages cover other kernel services.
   other/ring_buffers.rst
   other/cxx_support.rst
   other/version.rst
+   other/fatal.rst
--- a/doc/reference/kernel/other/fatal.rst
+++ b/doc/reference/kernel/other/fatal.rst
@@ -0,0 +1,263 @@
+.. _fatal:
+
+Fatal Errors
+############
+
+Software Errors Triggered in Source Code
+****************************************
+
+Zephyr provides several methods for inducing fatal error conditions:
+build-time checks, conditionally compiled assertions, and deliberately
+invoked panic or oops conditions.
+
+Runtime Assertions
+==================
+
+Zephyr provides some macros to perform runtime assertions which may be
+conditionally compiled. Their definitions may be found in
+:zephyr_file:`include/sys/__assert.h`.
+
+Assertions are enabled by setting the ``__ASSERT_ON`` preprocessor symbol to a
+non-zero value. There are two ways to do this:
+
+- Use the :option:`CONFIG_ASSERT` and :option:`CONFIG_ASSERT_LEVEL` kconfig
+  options.
+- Add ``-D__ASSERT_ON=<level>`` to the project's CFLAGS, either on the
+  build command line or in a CMakeLists.txt.
+
+The ``__ASSERT_ON`` method takes precedence over the kconfig option if both are
+used.
+
+Specifying an assertion level of 1 causes the compiler to issue warnings that
+the kernel contains debug-type ``__ASSERT()`` statements; this reminder is
+issued since assertion code is not normally present in a final product.
+Specifying assertion level 2 suppresses these warnings.
+
+Assertions are enabled by default when running Zephyr test cases, as
+configured by the :option:`CONFIG_TEST` option.
+
+The policy for what to do when encountering a failed assertion is controlled
+by the implementation of :c:func:`assert_post_action`. Zephyr provides
+a default implementation with weak linkage which invokes a kernel oops if
+the thread that failed the assertion was running in user mode, and a kernel
+panic otherwise.
+
+__ASSERT()
+----------
+
+The ``__ASSERT()`` macro can be used inside kernel and application code to
+perform optional runtime checks which will induce a fatal error if the
+check does not pass. The macro takes a string message which will be printed
+to provide context to the assertion. In addition, the kernel will print
+a text representation of the expression code that was evaluated, and the
+file and line number where the assertion can be found.
+
+For example:
+
+.. code-block:: c
+
+  __ASSERT(foo == 0xF0CACC1A, "Invalid value of foo, got 0x%x", foo);
+
+If at runtime ``foo`` had some unexpected value, the error produced may
+look like the following:
+
+.. code-block:: none
+
+	ASSERTION FAIL [foo == 0xF0CACC1A] @ ZEPHYR_BASE/tests/kernel/fatal/src/main.c:367
+		Invalid value of foo, got 0xdeadbeef
+	[00:00:00.000,000] <err> os: r0/a1:  0x00000004  r1/a2:  0x0000016f  r2/a3:  0x00000000
+	[00:00:00.000,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x00000a6d
+	[00:00:00.000,000] <err> os:  xpsr:  0x61000000
+	[00:00:00.000,000] <err> os: Faulting instruction address (r15/pc): 0x00009fe4
+	[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
+	[00:00:00.000,000] <err> os: Current thread: 0x20000414 (main)
+	[00:00:00.000,000] <err> os: Halting system
+
+__ASSERT_EVAL()
+---------------
+
+The ``__ASSERT_EVAL()`` macro can also be used inside kernel and application
+code, with special semantics for the evaluation of its arguments.
+
+It makes use of the ``__ASSERT()`` macro, but has some extra flexibility. It
+allows the developer to specify different actions depending on whether the
+``__ASSERT()`` macro is enabled or not.  This can be particularly useful to
+prevent the compiler from emitting diagnostics (errors, warnings or remarks)
+about variables that are assigned a value and used only by ``__ASSERT()``,
+and are therefore otherwise unused when the ``__ASSERT()`` macro is disabled.
+
+Consider the following example:
+
+.. code-block:: c
+
+  int x;
+  x = foo();
+  __ASSERT(x != 0, "foo() returned zero!");
+
+If ``__ASSERT()`` is disabled, then ``x`` is assigned a value, but never used.
+This type of situation can be resolved using the ``__ASSERT_EVAL()`` macro.
+
+.. code-block:: c
+
+  __ASSERT_EVAL((void) foo(),
+                int x = foo(),
+                x != 0,
+                "foo() returned zero!");
+
+The first parameter tells ``__ASSERT_EVAL()`` what to do if ``__ASSERT()`` is
+disabled.  The second parameter tells ``__ASSERT_EVAL()`` what to do if
+``__ASSERT()`` is enabled.  The third and fourth parameters are the parameters
+it passes to ``__ASSERT()``.
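
The four-parameter behavior described above can be sketched outside Zephyr with stand-in macros. Everything prefixed `DEMO_` or `demo_` is hypothetical and exists only for illustration; it is not Zephyr's actual definition, and the real macros live in `include/sys/__assert.h`. The sketch assumes the "enabled" action and the test share one scope, so a variable declared by the second parameter is visible to the third.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch standing in for __ASSERT_ON; build with
 * -DDEMO_ASSERT_ON=0 to see the "disabled" expansion.
 */
#ifndef DEMO_ASSERT_ON
#define DEMO_ASSERT_ON 1
#endif

#if DEMO_ASSERT_ON
/* Enabled: report the expression, location and message, then stop. */
#define DEMO_ASSERT(test, msg)                                          \
	do {                                                            \
		if (!(test)) {                                          \
			fprintf(stderr, "ASSERTION FAIL [%s] @ %s:%d\n\t%s\n", \
				#test, __FILE__, __LINE__, msg);        \
			abort();                                        \
		}                                                       \
	} while (0)
/* Enabled: run the "enabled" action, then assert in the same scope. */
#define DEMO_ASSERT_EVAL(expr_off, expr_on, test, msg)                  \
	do {                                                            \
		expr_on;                                                \
		DEMO_ASSERT(test, msg);                                 \
	} while (0)
#else
/* Disabled: the assertion vanishes; only the "disabled" action runs. */
#define DEMO_ASSERT(test, msg) do { } while (0)
#define DEMO_ASSERT_EVAL(expr_off, expr_on, test, msg) expr_off
#endif

static int demo_calls; /* counts how often demo_foo() ran */

static int demo_foo(void)
{
	demo_calls++;
	return 42;
}

/* Calls demo_foo() exactly once in either configuration and never
 * leaves an assigned-but-unused variable behind.
 */
int demo_run_checked(void)
{
	DEMO_ASSERT_EVAL((void)demo_foo(),
			 int x = demo_foo(),
			 x != 0,
			 "demo_foo() returned zero!");
	return demo_calls;
}
```

In both configurations the function under test is called once, which is the point of supplying a separate "disabled" action: side effects are preserved while the variable used only by the check disappears.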
+
+__ASSERT_NO_MSG()
+-----------------
+
+The ``__ASSERT_NO_MSG()`` macro can be used to perform an assertion that
+reports the failed test and its location, but provides no additional
+message to assist the user in diagnosing the problem; its use is
+discouraged.
+
+Build Assertions
+================
+
+Zephyr provides two macros for performing build-time assertion checks.
+These are evaluated completely at compile-time, and are always checked.
+
+BUILD_ASSERT_MSG()
+------------------
+
+This has the same semantics as C's ``_Static_assert`` or C++'s
+``static_assert``. If the evaluation fails, a build error will be generated by
+the compiler. If the compiler supports it, the provided message will be printed
+to provide further context.
+
+Unlike ``__ASSERT()``, the message must be a static string, without
+:c:func:`printf()`-like format codes or extra arguments.
+
+For example, suppose this check fails:
+
+.. code-block:: c
+
+	BUILD_ASSERT_MSG(FOO == 2000,
+			 "Invalid value of FOO");
+
+With GCC, the output resembles:
+
+.. code-block:: none
+
+	tests/kernel/fatal/src/main.c: In function 'test_main':
+	include/toolchain/gcc.h:28:37: error: static assertion failed: "Invalid value of FOO"
+	 #define BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)
+					     ^~~~~~~~~~~~~~
+	tests/kernel/fatal/src/main.c:370:2: note: in expansion of macro 'BUILD_ASSERT_MSG'
+	  BUILD_ASSERT_MSG(FOO == 2000,
+	  ^~~~~~~~~~~~~~~~
+
+BUILD_ASSERT()
+--------------
+
+This works just like ``BUILD_ASSERT_MSG()`` except there is no supplemental
+message provided, and like ``__ASSERT_NO_MSG()`` its use is discouraged.
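
As a rough illustration of where a build assertion is useful, the sketch below pins the size of a structure at compile time. `DEMO_BUILD_ASSERT_MSG`, `struct demo_header` and `demo_header_size()` are hypothetical names; the macro body mirrors the `_Static_assert` mapping visible in the GCC output above, and the sketch assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BUILD_ASSERT_MSG(), assuming it maps onto
 * C11 _Static_assert as the GCC diagnostic above suggests.
 */
#define DEMO_BUILD_ASSERT_MSG(EXPR, MSG) _Static_assert(EXPR, MSG)

/* Illustrative header whose layout other code is assumed to rely on. */
struct demo_header {
	uint16_t type;
	uint16_t length;
	uint32_t sequence;
};

/* Checked on every build; costs nothing at run time. The message must
 * be a plain string literal, without format codes or extra arguments.
 */
DEMO_BUILD_ASSERT_MSG(sizeof(struct demo_header) == 8,
		      "demo_header must be exactly 8 bytes");

size_t demo_header_size(void)
{
	return sizeof(struct demo_header);
}
```

If a later change added a field to the structure, the build would stop with the supplied message instead of producing an image with a silently changed layout.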
+
+Kernel Oops
+===========
+
+A kernel oops is a software-triggered fatal error invoked by
+:c:func:`k_oops()`.  This should be used to indicate an unrecoverable condition
+in application logic.
+
+The fatal error reason code generated will be ``K_ERR_KERNEL_OOPS``.
+
+Kernel Panic
+============
+
+A kernel panic is a software-triggered fatal error invoked by
+:c:func:`k_panic()`.  This should be used to indicate that the Zephyr kernel is
+in an unrecoverable state. Implementations of
+:c:func:`k_sys_fatal_error_handler()` should not return if the kernel
+encounters a panic condition, as the entire system needs to be reset.
+
+Threads running in user mode are not permitted to invoke :c:func:`k_panic()`,
+and doing so will generate a kernel oops instead. Otherwise, the fatal error
+reason code generated will be ``K_ERR_KERNEL_PANIC``.
+
+Exceptions
+**********
+
+Spurious Interrupts
+===================
+
+If the CPU receives a hardware interrupt on an interrupt line that has not had
+a handler installed with ``IRQ_CONNECT()`` or :c:func:`irq_connect_dynamic()`,
+then the kernel will generate a fatal error with the reason code
+``K_ERR_SPURIOUS_IRQ``.
+
+Stack Overflows
+===============
+
+In the event that a thread pushes more data onto its execution stack than its
+stack buffer provides, the kernel may be able to detect this situation and
+generate a fatal error with a reason code of ``K_ERR_STACK_CHK_FAIL``.
+
+If a thread is running in user mode, then stack overflows are always caught,
+as the thread will simply not have permission to write to adjacent memory
+addresses outside of the stack buffer. Because this is enforced by the
+memory protection hardware, there is no risk of data corruption to memory
+that the thread would not otherwise be able to write to.
+
+If a thread is running in supervisor mode, or if :option:`CONFIG_USERSPACE` is
+not enabled, stack overflows may or may not be caught, depending on
+configuration.  :option:`CONFIG_HW_STACK_PROTECTION` is supported on some
+architectures and will catch stack overflows in supervisor mode, including
+when handling a system call on behalf of a user thread. Typically this is
+implemented via dedicated CPU features, or read-only MMU/MPU guard regions
+placed immediately adjacent to the stack buffer. This mechanism detects the
+overflow, but cannot guarantee against data corruption, so such an overflow
+should be treated as a very serious condition impacting the health of the
+entire system.
+
+On platforms that lack memory management hardware,
+:option:`CONFIG_STACK_SENTINEL` provides a software-only stack overflow
+detection feature which periodically checks whether a sentinel value at the
+end of the stack buffer has been corrupted. It provides no protection against
+data corruption. Since the checks are typically done at interrupt exit, the
+overflow may be detected a nontrivial amount of time after the stack actually
+overflowed.
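
The sentinel idea can be sketched in plain C. This is a simplified model, not Zephyr's implementation: the sentinel value, the buffer size and every `demo_` name are assumptions, and the stack is assumed to grow downward so that the "end" of the buffer is its lowest word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SENTINEL    0xF0F0F0F0u /* hypothetical marker value */
#define DEMO_STACK_WORDS 64          /* hypothetical buffer size */

/* Model of a descending stack: usage starts at the highest index and
 * moves toward index 0, where the sentinel is stored.
 */
struct demo_stack {
	uint32_t words[DEMO_STACK_WORDS];
};

void demo_stack_init(struct demo_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
	s->words[0] = DEMO_SENTINEL;
}

/* The periodic check: true while the sentinel word is still intact. */
bool demo_stack_sentinel_intact(const struct demo_stack *s)
{
	return s->words[0] == DEMO_SENTINEL;
}

/* Simulate a thread using n words of stack, clamped to the buffer so
 * the model itself never writes out of bounds.
 */
void demo_push_words(struct demo_stack *s, size_t n)
{
	if (n > DEMO_STACK_WORDS) {
		n = DEMO_STACK_WORDS;
	}
	for (size_t i = 0; i < n; i++) {
		s->words[DEMO_STACK_WORDS - 1 - i] = 0xAAAAAAAAu;
	}
}
```

The model also shows the limitation noted above: the check only reports that the sentinel was overwritten at some earlier point, and anything below the buffer could already have been corrupted by then.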
+
+Finally, Zephyr supports GCC compiler stack canaries via
+:option:`CONFIG_STACK_CANARIES`.  If enabled, the compiler will insert a canary
+value randomly generated at boot into function stack frames, checking that the
+canary has not been overwritten at function exit. If the check fails, the
+compiler invokes :c:func:`__stack_chk_fail()`, whose Zephyr implementation
+invokes a fatal stack overflow error. An error in this case does not indicate
+that the entire stack buffer has overflowed, but instead that the current
+function stack frame has been corrupted. See the compiler documentation for
+more details.
+
+Other Exceptions
+================
+
+Any other type of unhandled CPU exception will generate a fatal error with the
+reason code ``K_ERR_CPU_EXCEPTION``.
+
+Fatal Error Handling
+********************
+
+The policy for what to do when encountering a fatal error is determined by the
+implementation of the :c:func:`k_sys_fatal_error_handler()` function.  This
+function has a default implementation with weak linkage that calls
+``LOG_PANIC()`` to dump all pending logging messages and then unconditionally
+halts the system with :c:func:`k_fatal_halt()`.
+
+Applications are free to implement their own error handling policy by
+overriding the implementation of :c:func:`k_sys_fatal_error_handler()`.
+If the implementation returns, the faulting thread will be aborted and
+the system will otherwise continue to function. See the documentation for
+this function for additional details and constraints.
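
One possible application policy is sketched below so that it compiles outside Zephyr. The `DEMO_`/`demo_` names are stand-ins: in a real application the handler would be `k_sys_fatal_error_handler()` with the signature declared in `include/fatal.h`, the reason codes would be the `K_ERR_*` values, and the halt would be `k_fatal_halt()`. The policy itself (only an oops lets the system continue) is an assumption chosen for illustration, not a recommendation from the Zephyr project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the K_ERR_* reason codes; the names and ordering here
 * are illustrative only.
 */
enum demo_fatal_reason {
	DEMO_ERR_CPU_EXCEPTION,
	DEMO_ERR_SPURIOUS_IRQ,
	DEMO_ERR_STACK_CHK_FAIL,
	DEMO_ERR_KERNEL_OOPS,
	DEMO_ERR_KERNEL_PANIC,
};

/* Opaque stand-in for the architecture exception stack frame type. */
typedef struct demo_esf demo_esf_t;

static bool demo_halted;

/* Stand-in for the system halt. The real halt does not return; this
 * model only records that it was requested.
 */
static void demo_halt(unsigned int reason)
{
	(void)reason;
	demo_halted = true;
}

/* Assumed policy: an oops reflects an application-level failure, so
 * only the faulting thread is given up; everything else halts.
 */
bool demo_reason_is_recoverable(unsigned int reason)
{
	return reason == DEMO_ERR_KERNEL_OOPS;
}

void demo_fatal_error_handler(unsigned int reason, const demo_esf_t *esf)
{
	(void)esf;

	if (!demo_reason_is_recoverable(reason)) {
		demo_halt(reason);
		return; /* reached only in this model */
	}
	/* Returning here corresponds to letting the kernel abort the
	 * faulting thread while the rest of the system keeps running.
	 */
}

bool demo_was_halted(void)
{
	return demo_halted;
}
```

Whatever policy is chosen, the constraint stated earlier still applies: a handler should not return for a kernel panic, since the kernel is then in an unrecoverable state.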
+
+API Reference
+*************
+
+.. doxygengroup:: fatal_apis
+   :project: Zephyr
+
--- a/include/fatal.h
+++ b/include/fatal.h
@@ -4,12 +4,22 @@
 * SPDX-License-Identifier: Apache-2.0
 */

+/** @file
+ *  @brief Fatal error functions
+ */
+
 #ifndef ZEPHYR_INCLUDE_FATAL_H
 #define ZEPHYR_INCLUDE_FATAL_H

 #include <arch/cpu.h>
 #include <toolchain.h>

+/**
+ * @defgroup fatal_apis Fatal error APIs
+ * @ingroup kernel_apis
+ * @{
+ */
+
 enum k_fatal_error_reason {
 	/** Generic CPU exception, not covered by other codes */
 	K_ERR_CPU_EXCEPTION,
@@ -88,4 +98,6 @@ void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf);
 */
 void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf);

+/** @} */
+
 #endif /* ZEPHYR_INCLUDE_FATAL_H */
--- a/include/sys/__assert.h
+++ b/include/sys/__assert.h
@@ -4,60 +4,6 @@
 * SPDX-License-Identifier: Apache-2.0
 */

-/**
- * @file
- * @brief Debug aid
- *
- *
- * The __ASSERT() macro can be used inside kernel code.
- *
- * Assertions are enabled by setting the __ASSERT_ON symbol to a non-zero value.
- * There are two ways to do this:
- *   a) Use the ASSERT and ASSERT_LEVEL kconfig options
- *   b) Add "CFLAGS += -D__ASSERT_ON=<level>" at the end of a project's Makefile
- * The Makefile method takes precedence over the kconfig option if both are
- * used.
- *
- * Specifying an assertion level of 1 causes the compiler to issue warnings that
- * the kernel contains debug-type __ASSERT() statements; this reminder is issued
- * since assertion code is not normally present in a final product. Specifying
- * assertion level 2 suppresses these warnings.
- *
- * The __ASSERT_EVAL() macro can also be used inside kernel code.
- *
- * It makes use of the __ASSERT() macro, but has some extra flexibility.  It
- * allows the developer to specify different actions depending whether the
- * __ASSERT() macro is enabled or not.  This can be particularly useful to
- * prevent the compiler from generating comments (errors, warnings or remarks)
- * about variables that are only used with __ASSERT() being assigned a value,
- * but otherwise unused when the __ASSERT() macro is disabled.
- *
- * Consider the following example:
- *
- * int  x;
- *
- * x = foo ();
- * __ASSERT (x != 0, "foo() returned zero!");
- *
- * If __ASSERT() is disabled, then 'x' is assigned a value, but never used.
- * This type of situation can be resolved using the __ASSERT_EVAL() macro.
- *
- * __ASSERT_EVAL ((void) foo(),
- *		  int x = foo(),
- *                x != 0,
- *                "foo() returned zero!");
- *
- * The first parameter tells __ASSERT_EVAL() what to do if __ASSERT() is
- * disabled.  The second parameter tells __ASSERT_EVAL() what to do if
- * __ASSERT() is enabled.  The third and fourth parameters are the parameters
- * it passes to __ASSERT().
- *
- * The __ASSERT_NO_MSG() macro can be used to perform an assertion that reports
- * the failed test and its location, but lacks additional debugging information
- * provided to assist the user in diagnosing the problem; its use is
- * discouraged.
- */
-
 #ifndef ZEPHYR_INCLUDE_SYS___ASSERT_H_
 #define ZEPHYR_INCLUDE_SYS___ASSERT_H_

--- a/subsys/debug/Kconfig
+++ b/subsys/debug/Kconfig
@@ -117,9 +117,12 @@ config ASSERT
 	default y if TEST
 	help
 	  This enables the __ASSERT() macro in the kernel code. If an assertion
-	  fails, the calling thread is put on an infinite tight loop. Since
-	  enabling this adds a significant footprint, it should only be enabled
-	  in a non-production system.
+	  fails, the policy for what to do is controlled by the implementation
+	  of the assert_post_action() function, which by default will trigger
+	  a fatal error.
+
+	  Disabling this option will cause assertions to compile to nothing,
+	  improving performance and system footprint.

 config ASSERT_LEVEL
 	int "__ASSERT() level"