Although the Linux kernel is one of the most stable kernels around, it is not immune to crashes, whether due to a proprietary or customer driver whose robustness has not been tested, or a bug inherent to the core that has not been fixed yet. Since logs/traces are paramount in the resolution of such bugs, we experimented with an in-kernel feature for one of our customers, which consists of making the logs resulting from said bugs persistent (or rather semi-persistent).
The idea is to capture the kernel logs in its dying moments and make them available even after a reboot, so that they can be analyzed later. This feature is called pstore, which stands for Persistent storage. In this blog post, we will explore the capabilities this feature offers.
Configuring the Linux kernel for pstore
Pstore is meant for platforms that need a mechanism to preserve kernel-related logs across system reboots. The kernel version on which this feature has been implemented for our customer is v5.4.Y. In this version, the logs are only RAM backed, thus the feature is sometimes referred to as Ramoops. The expectation is that, from time to time, saved log files are removed from the persistent store and saved elsewhere, to free up space for the next crashes. You might have a service doing that upon reboots.
Back to the configuration: the following kernel options need to be enabled to support this feature:
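```
CONFIG_PSTORE=y
CONFIG_PSTORE_RAM=y
```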
Here are explanations for each option enabled in the previous configuration excerpt:
- CONFIG_PSTORE: this is the main option that enables the pstore feature.
- CONFIG_PSTORE_RAM: this enables panic and oops messages to be logged to a circular buffer in RAM, where they can be read back at some later point.
It might be interesting to have the kernel trace logs along with the oops, which could tell us, for example, the last function executed by the kernel before it gave its last breath. To do that, we have to enable the following options (see the configuration fragment after this list):
- CONFIG_FTRACE: enables ftrace, the Linux kernel tracer, which allows selecting the next option.
- CONFIG_FUNCTION_TRACER: adds support for kernel function tracing and makes CONFIG_PSTORE_FTRACE accessible.
- CONFIG_PSTORE_FTRACE: with this option, the Linux kernel traces function calls into a persistent RAM buffer that can be decoded and dumped after reboot through the pstore filesystem. Upon reboot, the captured trace will be available in the /sys/fs/pstore/ftrace-ramoops file.
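The corresponding configuration fragment:

```
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_PSTORE_FTRACE=y
```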
We could have extended our configuration with the following options:
- CONFIG_PSTORE_CONSOLE: when this option is enabled, pstore will log all kernel console messages, even if no oops or panic occurred.
- CONFIG_PSTORE_PMSG: when this option is enabled, pstore will export a character interface, /dev/pmsg0, to log user space messages. Writes to /dev/pmsg0 are appended to the buffer and, on reboot, the persistent contents are available in /sys/fs/pstore/pmsg-ramoops-[ID], where [ID] is the record number of dumps of this same kind.
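In configuration terms, this would simply add:

```
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
```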
You should note that in the kernel version we have tested this feature with, the only storage backend was RAM. In recent kernel versions (starting from 5.8), it became possible to select a block device as the backing store. To do so, you need to play with the following options (more on that here: https://www.kernel.org/doc/html/latest/admin-guide/pstore-blk.html):
- CONFIG_PSTORE_BLK: enables pstore support for block devices.
- CONFIG_PSTORE_BLK_BLKDEV: specifies the block device to be used to store the dumps, i.e. /dev/<disk_name>p<decimal> or an EFI/GPT partition UUID.
- CONFIG_PSTORE_BLK_KMSG_SIZE: sets the size of the kernel message dump (oops, panic, etc.) log for pstore/blk. The size is in KB and must be a multiple of 4.
- CONFIG_PSTORE_BLK_MAX_REASON: the maximum reason for kernel message dumps to store.
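As a sketch, with an illustrative block device and the default sizes:

```
CONFIG_PSTORE_BLK=y
CONFIG_PSTORE_BLK_BLKDEV="/dev/mmcblk0p7"
CONFIG_PSTORE_BLK_KMSG_SIZE=64
CONFIG_PSTORE_BLK_MAX_REASON=2
```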
Now that we are done with the pstore-related kernel configuration, we still need to describe the memory region where the logs will be stored.
Adapting the device tree
Ramoops needs to be provided with a reserved memory area to store the dumps into. Though this can be done in different ways, the method we have chosen for this blog post is the device tree. In the device tree, this memory area is described by a start address and a size, using the following properties:
- mem_address: the address where the memory region starts.
- mem_size: the size of the memory region. The memory size will be rounded down to a power of two.
- record-size, console-size, and pmsg-size, depending on the options that we have enabled.
The following is a complete device tree node that describes a pstore memory region and its different record sizes:
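(A sketch: the base address 0x9ff00000 and the 1 MiB region size are illustrative and must be adapted to a RAM range that is valid on your platform.)

```
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	ramoops@9ff00000 {
		compatible = "ramoops";
		reg = <0x9ff00000 0x100000>;	/* 1 MiB reserved for pstore */
		record-size = <0x8000>;		/* 32 kB per oops/panic record */
		console-size = <0x8000>;	/* 32 kB for console output */
		pmsg-size = <0x1000>;		/* 4 kB per user space message record */
		ftrace-size = <0x2000>;		/* 8 kB for persistent function traces */
	};
};
```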
In the previous snippet, we have added some properties that are related to both the console and pmsg (user message) pstore. Here are more details on each property:
- record-size: instructs ramoops to use up to 32 kB per record to store kernel log messages when a panic occurs. That is, each file storing such a dump will be up to 32 kB in size.
- console-size: this is the maximum size per console output record, 32 kB in our case. Do note that when enabled, console output is stored even if a panic did not occur.
- pmsg-size: this is the maximum record size per user message, up to 4 kB in our case. It means that when this size is reached, another file will be created in the pstore filesystem to store the next incoming messages, and so on.
It is also possible to specify the ramoops settings via module parameters. To do that, you can look at /sys/module/pstore/parameters/ and /sys/module/ramoops/parameters/ to see what possibilities you have.
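For instance, the same region could be described on the kernel command line instead of in the device tree (the address is illustrative, as above):

```
ramoops.mem_address=0x9ff00000 ramoops.mem_size=0x100000 ramoops.record_size=0x8000 ramoops.console_size=0x8000 ramoops.pmsg_size=0x1000
```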
Now that we have described the memory regions to the pstore infrastructure, the time has come for us to do some testing.
Pstore live tests
To be able to get the best out of the pstore mechanism, we need to mount the pstore filesystem as follows:
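```
# mount -t pstore pstore /sys/fs/pstore
```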
According to the above excerpt, the dump data will be read from /sys/fs/pstore, mounted as the pstore filesystem. For any oops/panic that occurs, a file named dmesg-ramoops-N will be created, where N is the record number in memory. These files will contain the captured oops/panic, provided the CONFIG_PSTORE_RAM option is enabled.
To be able to live test the pstore mechanism, we need to manually trigger kernel panics. Note that on the system this feature has been tested on, the reboot timeout after a panic was set to 5s (this can be done with the panic=5 kernel command line parameter). Now we trigger the panic with the following command:
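```
# echo c > /proc/sysrq-trigger
```

This uses the magic SysRq interface to force a crash, which assumes CONFIG_MAGIC_SYSRQ is enabled in your kernel.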
After we have triggered the panic, our system will reboot 5s later. If your system is not configured to reboot on panic, you can just push the reset button of the board.
Now, after remounting the pstore filesystem as we have seen at the beginning of this section (provided it is not automatically mounted), we can check its content again: we will see a new dmesg-ramoops-0 file, which corresponds to the dump captured during the previous kernel panic. If we show the content of that captured oops message, we will have something like the following:
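```
# cat /sys/fs/pstore/dmesg-ramoops-0
```

The record begins with a header line such as Panic#1 Part1, followed by the tail of the kernel log buffer captured at the time of the crash (the panic backtrace, register dump, and so on).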
In the previous excerpt, -0 in the file name corresponds to the number of the record. By triggering another kernel panic, we will have two files, and the new entry will be named dmesg-ramoops-1, and so on. To delete a stored record from RAM, you can rm the respective pstore file. Removing the file will notify the underlying persistent storage backend that it can reclaim the space for later reuse.
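For example:

```
# rm /sys/fs/pstore/dmesg-ramoops-0
```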
User message log test
While configuring pstore, we have seen that enabling CONFIG_PSTORE_PMSG results in the creation of a /dev/pmsg0 character device, which allows storing user messages (by simply echoing into it) that will be persistent across reboots. The following is an example:
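```
# echo "A user space message that should survive the reboot" > /dev/pmsg0
# reboot
```

The message string here is, of course, just an illustration.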
After the reboot, we can read back what we stored before:
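```
# cat /sys/fs/pstore/pmsg-ramoops-0
A user space message that should survive the reboot
```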
In the previous output, we can see how the content that has been echo'ed into /dev/pmsg0 has been appended to the corresponding persistent pstore buffer, which is pmsg-ramoops-0. At this point we are familiar with the most basic pstore mechanisms. We can go further by also grabbing ftrace function call traces to make debugging easier, as we will see in the next section.
Persistent function tracing
Persistent function tracing might be useful to ease the debugging of software or hardware related hangs. The function call chain log is stored in a ftrace-ramoops file. To enable ftrace records, you have to run the following command:
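```
# echo 1 > /sys/kernel/debug/pstore/record_ftrace
```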
In the previous command, the debug filesystem is assumed to be mounted under /sys/kernel/debug/; this path must be adapted to your system. If debugfs is not mounted yet, you can mount it with the following command:
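```
# mount -t debugfs debugfs /sys/kernel/debug
```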
After triggering a reboot, we can check the content of our pstore filesystem again:
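An illustrative listing, matching the records created so far in this article:

```
# ls /sys/fs/pstore
dmesg-ramoops-0  dmesg-ramoops-1  ftrace-ramoops  pmsg-ramoops-0
```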
In the previous listing, we can see the files resulting from the previous panics, as well as a new entry, ftrace-ramoops. This file has no ID suffix, so its content will be appended to, rotating when the specified record size is reached. Let's now show its content:
In our case, the above function call chain corresponds to the code path leading to the panic function. For a real bug, we would see exactly which functions were running during the kernel's last breath.
Conclusion
In this blog post, we have learnt how to keep kernel logs/traces persistent in RAM between reboots. Though we could have used other storage backends (block device, NAND), not all systems have the luxury of being equipped with such devices. There are constrained systems where RAM is the only storage backend available. That said, we also discussed the case where a block device would be used. And you, what do you do to capture Linux kernel logs/traces before the kernel gives its last breath? Please post in comments.