Nothing but Embedded systems

RAMOOPS – Keeping Linux kernel oops/trace logs persistent between reboots

Although the Linux kernel is one of the most stable kernels, we are not immune to a possible crash, whether due to a proprietary or client driver whose robustness has not been tested or an inherent bug to the core not yet fixed. The importance of logs/traces in the resolution of such bugs being paramount, we experimented a in-kernel feature  for one of our customers, which consists of making persistent (or rather semi-persistent) logs resulting from said bugs.

The idea is to capture kernel logs in its dying moments and make them available even after a reboot, so they can be analyzed later. This feature is called pstore, which stands for Persistent storage. In this blog post, we will explore the capabilities this feature offers.

Configuring the Linux kernel for pstore

Pstore was meant for platforms that need a mechanism to preserve kernel-related logs across system reboots. The kernel version on which this feature has been implemented for our customer is v5.4.Y. In this version, the logs are only RAM backed, thus it is sometime referred to as Ramoops. This means that the expectation is that from time to time, saved log files are removed from the persistent store quite often and saved elsewhere, to free up space ready for the next crashes. You might have a service doing that upon reboots.

Back to the configuration, the followings are kernel options that need to be enabled to support this feature:

CONFIG_PSTORE_RAM=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_PSTORE_FTRACE=y
Ramoops kernel configuration

The followings are the explanations of each options that is enables in the previous configuration excerpt:

  • CONFIG_PSTORE: This is the main option that enables the pstore feature
  • CONFIG_PSTORE_RAM: This enables panic and oops messages to be logged to a circular buffer in RAM where it can be read back at some later point

It might be interesting to have the kernel trace logs along with the OOPs, which could tell for example the last function that was executed by the kernel before it gave its last breath. To do that, we have to enable the following options:

  • CONFIG_FTRACE: enables ftrace, the Linux kernel tracer, which will allow to select the next option.
  • CONFIG_FUNCTION_TRACER: This will add the support for kernel function tracing and will make CONFIG_PSTORE_FTRACE accessible
  • CONFIG_PSTORE_FTRACE: With this option, the Linux kernel traces function calls into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. Upon reboot, captured trace will be available in /sys/fs/pstore/ftrace-ramoops file.

We could have extended our configuration with the following options:

  • CONFIG_PSTORE_CONSOLE: When the option is enabled, pstore will log all kernel messages, even if no oops or panic occurred.
  • CONFIG_PSTORE_PMSG: When the option is enabled, pstore will export a character interface /dev/pmsg0 to log user space messages. Writes to /dev/pmsg0 are appended to the buffer, and on reboot, the persistent contents are available in /sys/fs/pstore/pmsg-ramoops-[ID], where [ID] is the record number of dumps of this same kind.

You should note that in the kernel version we have tested this feature with, the only storage backend was the RAM. In recent kernel versions (starting from 5.8), it becomes possible to select a block device as backing store. To do so, you need to play with the following options (more on that here: https://www.kernel.org/doc/html/latest/admin-guide/pstore-blk.html):

  • CONFIG_PSTORE_BLK: enables pstore support for block devices.
  • CONFIG_PSTORE_BLK_BLKDEV: specifies the block device that is to be used to store the dumps. i.e  /dev/<disk_name>p<decimal> or EFI/GPT UUID partition
  • CONFIG_PSTORE_BLK_KMSG_SIZE: This just sets size of kernel message dump (oops, panic, etc) log for pstore/blk. The size is in KB and must be a multiple of 4.
  • CONFIG_PSTORE_BLK_MAX_REASON: The maximum reason for kernel message dumps to store.

Now that we are done with the kernel configuration that is related to pstore, we still miss some information about the memory regions where the logs will be stored.

Adapting the device tree

Ramoops needs to be provided with a reserved memory area to store the dump into. Though this can be done in different manners, the method we have chosen for this blog post is the device tree. From the device tree then, this memory area is described by a start address and a size using the following variables:

  • mem_address: the address where the memory region starts
  • mem_size: for the size or the memory region. The memory size will be rounded down to a power of two.
  • record-size, console-size, and pmsg-size, depending on the options that we have enabled

The following is the complete device tree node that describes our pstore memory regions and its different record sizes:

reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	ramoops@880000000 {
		#address-cells = <2>;
		#size-cells = <2>;
		compatible = "ramoops";
		reg = <0x00000008 0x80000000 0x0 0x100000>; /* 1MB from 0x880000000 */
		record-size = <0x8000>;  /* 32kB per record */
		console-size = <0x8000>; /* 32kB for console output */
		pmsg-size = <0x1000>;     /* 4kB for user messages */
	};
};
Ramoops devicetree node

In the previous snippet, we have added some properties that are related to both the console and pmsg (user message) pstore. The following are more details on each property:

  • record-size: Instructs ramoops to use up to 32kB per record to store kernel log messages when a panic occurs. That is, each file storing such dump will be up to 32kB sized.
  • console-size: this is the maxim size per console output record, 32kB in our case. Do note that when enabled, console output is stored even if a panic did not occur.
  • pmsg-size: this is the maximum record size per user message. In our case, it is up to 4kB. It means when this size is reached, another file will be created in the pstore filesystem to store the next incoming messages, and so on.

It is also possible to specify the ramoops settings via module parameter. To do that, you can look at /sys/module/pstore/parameters/ and /sys/module/ramoops/parameters/ to see what possibilities you have.

Now that we have described the memory regions to the pstore infrastructure, the time has come for us to do some testings.

Pstore live tests

To be able to get the best out of the pstore mechanism, we need to mount the pstore filesystem as the following:

root@imx8mmevk-labcsmart:~# mount -t pstore pstore /sys/fs/pstore
Monting pstore filesystem

According to the above excerpt, the dump data will be read from /sys/fs/pstore, mounted as the pstore filesystem. For any oops/panic that occurs, the format of the file is dmesg-ramoops-N, where N is the record number in memory. These files will be created and will contain the captured oops/panic, provided CONFIG_PSTORE_FTRACE option is enabled.

To be able to live test the Pstore mechanism, we need to manually trigger kernel panics. Note that on the system this feature has been tested on, the reboot timeout after a panic was set to 5s. Now we trigger the panic with the following command:

root@imx8mmevk-labcsmart:~# echo c > /proc/sysrq-trigger

After we have triggered the panic, our system will reboot 5s later. If your system is not configured with reboot on panic option, you can just push the reset button of the board.

Now, after remounting the pstore filesystem has we have seen in the beginning of this section (provided it is not automatically mounted), we can check its content again,  where we will see a new dmesg-ramoops-0 file, which corresponds to the dump that has been captured during the previous kernel panic. If we show the content of that captured oops message, we will have something like the following:

root@imx8mmevk-labcsmart:~# cat /sys/fs/pstore/dmesg-ramoops-0 
Panic#1 Part1
<6>[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
<5>[    0.000000] Linux version 5.4.154-5.5.0-devel+git.9f04e0d3d560 (oe-user@oe-host) (gcc version 9.3.0 (GCC)) #1 SMP PREEMPT Fri Nov 26 15:33:03 UTC 2021
[...]

<6>[   33.763199] REG_3P3V: disabling
<6>[   33.763208] REG_MIPI_PHY: disabling
<6>[   33.763215] V3.3_SD: disabling
<6>[  238.779510] sysrq: Trigger a crash
<0>[  238.783033] Kernel panic - not syncing: sysrq triggered crash
<4>[  238.788803] CPU: 0 PID: 1602 Comm: sh Not tainted 5.4.154-5.5.0-devel+git.9f04e0d3d560 #1
<4>[  238.796994] Hardware name: NXP i.MX8MMini LPDDR4 EVK board
<4>[  238.805095] Call trace:
<4>[  238.807556]  dump_backtrace+0x0/0x148
<4>[  238.811228]  show_stack+0x24/0x30
<4>[  238.814555]  dump_stack+0xb4/0x110
<4>[  238.817969]  panic+0x160/0x368
<4>[  238.821037]  sysrq_handle_reboot+0x0/0x28
<4>[  238.825054]  __handle_sysrq+0x90/0x188
<4>[  238.828809]  write_sysrq_trigger+0x98/0xb8
<4>[  238.832912]  proc_reg_write+0x8c/0xe0
<4>[  238.836582]  __vfs_write+0x48/0x90
<4>[  238.839990]  vfs_write+0xe4/0x1c8
<4>[  238.843310]  ksys_write+0x78/0x100
<4>[  238.846720]  __arm64_sys_write+0x24/0x30
<4>[  238.850657]  el0_svc_common.constprop.0+0x74/0x180
<4>[  238.855460]  el0_svc_handler+0x34/0xa0
<4>[  238.859223]  el0_svc+0x8/0x208
<2>[  238.862294] SMP: stopping secondary CPUs
<0>[  238.866231] Kernel Offset: disabled
<0>[  238.869727] CPU features: 0x0002,2000200c
<0>[  238.873742] Memory Limit: none
root@imx8mmevk-labcsmart:~# 

In the previous excerpt, -0 in the the file name correspond to the number of the record. By triggering another kernel panic, then we will have two files and the new entry will be named dmesg-ramoops-1, and so on. To delete a stored record from RAM, you can rm the respective pstore file. Removing the file will notify to the underlying persistent storage backend that it can reclaim the space for later re-use.

User message log test

While configuring pstore, we have seen that enabling CONFIG_PSTORE_PMSG will result in creating a /dev/pmsg0 character device, which allows to store (by simply echoing in to it) user messages that will be persistent between reboots. The following is an example:

root@imx8mmevk-labcsmart:~# echo "Hello world" > /dev/pmsg0
root@imx8mmevk-labcsmart:~# reboot

After the reboot we can read what we have stored before:

root@imx8mmevk-labcsmart:~# mount -t pstore pstore /sys/fs/pstore
root@imx8mmevk-labcsmart:~# cat /sys/fs/pstore/pmsg-ramoops-0
Hello world

In the previous output, we can see how the content that has been echo’ed in /dev/pmesg0 has been appended to the corresponding persistent pstore buffer, which is pmesg-ramoops-0.  At this point we are familiar with the most basic pstore mechanisms. We can go further by also grabbing ftrace function call traces to make the debugging easier, as we will see in the next section.

Persistent function tracing

Persistent function tracing might be useful to ease the debugging software or hardware related hangs. The functions call chain log is stored in a ftrace-ramoops file. To enable ftrace records, you have to run the following command;

root@imx8mmevk-labcsmart:~# echo 1 > /sys/kernel/debug/pstore/record_ftrace
Enable ftrace record

In the previous command, the debug file system is assumed to be mounted under /sys/kernel/debug/. This path must be modified according to your system. If it is not the case,  you can mount it with the following command:

root@imx8mmevk-labcsmart:~# mount -t debugfs debugfs /sys/kernel/debug/

After triggering a reboot, we can check again the content of our pstore filesystem:

root@imx8mmevk-labcsmart:~# ls /sys/fs/pstore/
dmesg-ramoops-0 dmesg-ramoops-1 ftrace-ramoops pmesg-ramoops-0

In the previous logs, we can see the files resulting from the previous panics, as well as a new entry, ftrace-ramoops. This file has no ID suffix, so its content will be appended and rotate when the specified record size is reached. Let’s now show its content:

root@imx8mmevk-labcsmart:~# cat /sys/fs/pstore/ftrace-ramoops
0 c08bf830  c08bfbf0  do_page_fault.part.8 <- do_page_fault+0x3c/0xa8
0 c001b770  c08bfb48  fixup_exception <- do_page_fault.part.8+0x32c/0x398
0 c0045bb0  c001b780  search_exception_tables <- fixup_exception+0x20/0x38
0 c008914c  c0045bd8  search_module_extables <- search_exception_tables+0x38/0x44
0 c08bff5c  c008915c  add_preempt_count <- search_module_extables+0x24/0xc0
0 c08bfe78  c00891cc  sub_preempt_count <- search_module_extables+0x94/0xc0
0 c08b2e28  c08bfb64  __do_kernel_fault.part.7 <- do_page_fault.part.8+0x348/0x398
ftrace-ramoops dump

In our case, the above function call chain correspond to the path of the panic function. For a real bug, we would see exactly what functions where running during the kernel ‘s last breath.

Conclusion

In this blog post, we have learnt how to keep kernel logs/traces persistent in RAM, between reboots. Though we could have used other storage backends (block device, nand), not all systems have the luxury of being equipped with such devices. There are constrained system were the RAM is the only storage backend available. That said, we also discussed the case where a block device would be used. And you, what do you do to capture the Linux kernel logs/traces before it gives its last breath? Please post in comments.

 

Your email address will not be published.