May 6, 2012

Facing up to a kernel panic (III)

Let's learn how to debug a kernel panic. This is the continuation of the two previous articles about facing up to a kernel panic (I, II). This method can be really useful when, after looking into a kernel panic, you have not been able to find out anything about it.

First of all, you need to enable the debug repository in order to install the kernel-debuginfo package, which provides debug information for the kernel.

Aside from this package, you also have to install kexec-tools, which contains the kexec binary. This application lets you load and boot into another kernel from the currently running kernel, performing the function of the boot loader from within the kernel itself. That is to say, the first kernel reserves a small region of memory that the second kernel uses afterwards. In this way, the kernel panic is captured from the context of a freshly booted second kernel and not from the context of the crashed kernel.

And finally, crash is a tool used to analyze the state of the system while it is running or, as in our example, after a kernel crash has occurred and a core dump has been generated, in this case by kdump, a utility also provided by the kexec-tools package.

[root@localhost ~]# cat /etc/yum.repos.d/CentOS-Debuginfo.repo
...
enabled=1

[root@localhost ~]# yum install crash kexec-tools kernel-debuginfo

Now you just need to edit the grub.conf file and add the crashkernel parameter to the line of your current kernel. This option reserves memory for the second kernel, for example 128 MB. If you do not want to modify this file, another choice is to add this parameter at boot time, when the grub menu appears, by editing the corresponding stanza of your kernel. In addition, you will have to enable the automatic startup of kdump.

[root@localhost ~]# cat /etc/grub.conf
...
title CentOS (2.6.32-220.13.1.el6.i686)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-220.13.1.el6.i686 ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8  KEYBOARDTYPE=pc KEYTABLE=es rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 rhgb crashkernel=128M quiet rd_LVM_LV=VolGroup/lv_root rd_NO_DM
...

[root@localhost ~]# chkconfig kdump on

Now you have to reboot your system so that the memory for the second kernel is reserved and kdump can load it. If you take a look at the messages log file, you will see that the memory reservation has been carried out.

[root@localhost ~]# grep crash /var/log/messages
Apr 22 21:54:40 localhost kernel: Reserving 128MB of memory at 16MB for crashkernel (System RAM: 512MB)
Apr 22 21:54:40 localhost kernel: Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8  KEYBOARDTYPE=pc KEYTABLE=es rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 rhgb crashkernel=128M quiet rd_LVM_LV=VolGroup/lv_root rd_NO_DM
Apr 22 21:54:40 localhost kernel: crash memory driver: version 1.1
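
If you would rather not dig through the log, the same check can be scripted. The following is only a minimal sketch: crashkernel_value is a name made up for this example, and it simply extracts the crashkernel value from whatever kernel command line you hand it.

```shell
#!/bin/bash
# crashkernel_value CMDLINE
# Hypothetical helper: print the value of the crashkernel= parameter found
# in a kernel command line, or return non-zero if the parameter is absent.
crashkernel_value() {
    # Pad with spaces so that only a whole parameter can match.
    case " $1 " in
        *" crashkernel="*) ;;
        *) return 1 ;;
    esac
    value=" $1"
    value=${value#* crashkernel=}   # drop everything up to the parameter
    value=${value%% *}              # drop everything after its value
    printf '%s\n' "$value"
}

# Example on a live system:
#   crashkernel_value "$(cat /proc/cmdline)"
```

On a running system you can also read /sys/kernel/kexec_crash_loaded, which contains 1 once kdump has loaded the second kernel into the reserved memory.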

From now on, if you run into a crash, a core dump will be generated. To try out this functionality, let's simulate a kernel panic by sending the 'c' SysRq key to the kernel. A SysRq request goes straight to the kernel, which responds immediately regardless of whatever else it is doing at that moment. This key crashes the system without first trying to unmount file systems or sync the disks attached to the system.

[root@localhost ~]# echo c > /proc/sysrq-trigger
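
Since the 'c' key brings the machine down at once, it may be worth making sure that the second kernel is really loaded before pulling the trigger. Here is a small sketch of that idea. The function name trigger_test_panic and its two arguments are assumptions made for this example; passing the file paths in makes it possible to rehearse the function against ordinary files first.

```shell
#!/bin/bash
# trigger_test_panic LOADED_FILE TRIGGER_FILE
# Hypothetical helper: write 'c' to TRIGGER_FILE only when LOADED_FILE
# contains "1", which is what /sys/kernel/kexec_crash_loaded reports
# once the capture kernel has been loaded.
trigger_test_panic() {
    loaded_file=$1
    trigger_file=$2
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "capture kernel not loaded; refusing to trigger a panic" >&2
        return 1
    fi
    sync    # flush pending writes first, since the 'c' key does not sync disks
    echo c > "$trigger_file"
}

# On a real system (this crashes the machine immediately):
#   trigger_test_panic /sys/kernel/kexec_crash_loaded /proc/sysrq-trigger
```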

After the core dump is captured, the system will be rebooted. Be patient because this operation can take a long time, depending on the amount of memory used by the system. Bear in mind that the state of the system has to be saved to disk. When the system has started up again, a core file will have been dumped into the /var/crash directory.

[root@localhost ~]# file /var/crash/127.0.0.1-2012-04-22-22\:45\:26/vmcore 
/var/crash/127.0.0.1-2012-04-22-22:45:26/vmcore: data
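
Every crash gets its own time-stamped subdirectory, so after a few tests it can be handy to locate the newest dump automatically. The following sketch assumes the default layout shown above (one vmcore per subdirectory of /var/crash); latest_vmcore is a name chosen for this example only.

```shell
#!/bin/bash
# latest_vmcore [CRASH_DIR]
# Hypothetical helper: print the path of the most recently modified vmcore
# below CRASH_DIR (default /var/crash), or return non-zero if none exists.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    newest=
    for core in "$crash_dir"/*/vmcore; do
        [ -f "$core" ] || continue      # also skips an unmatched glob
        if [ -z "$newest" ] || [ "$core" -nt "$newest" ]; then
            newest=$core
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example:
#   file "$(latest_vmcore)"
```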

Now we are ready to examine the core dump by means of crash. This application, similar to gdb, bundles common kernel core analysis tools such as kernel stack back traces of all processes, source code disassembly, formatted kernel structure and variable displays, virtual memory data, dumps of linked lists, and so on.

To use this tool, you have to pass three parameters on the command line. First of all, the System.map file (the symbol table used by the kernel) of the original kernel which was running when the system crashed. Secondly, an uncompressed kernel image which has been compiled with the '-g' option. And finally, the kernel core dump, created in this case by kdump. The System.map file matters here because, as the command below shows, the debug kernel comes from a slightly different build (2.6.32-220.7.1.el6.centos.plus) than the kernel that crashed.

[root@localhost ~]# uname -r
2.6.32-220.13.1.el6.i686

[root@localhost ~]# crash /boot/System.map-2.6.32-220.13.1.el6.i686 /usr/lib/debug/lib/modules/2.6.32-220.7.1.el6.centos.plus.i686/vmlinux /var/crash/127.0.0.1-2012-04-22-22\:45\:26/vmcore
...
  SYSTEM MAP: /boot/System.map-2.6.32-220.13.1.el6.i686                
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.7.1.el6.centos.plus.i686/vmlinux (2.6.32-220.7.1.el6.centos.plus.i686)
    DUMPFILE: /var/crash/127.0.0.1-2012-04-22-22:45:26/vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Sun Apr 22 22:45:16 2012
      UPTIME: 00:02:31
LOAD AVERAGE: 0.14, 0.16, 0.07
       TASKS: 85
    NODENAME: localhost.localdomain
     RELEASE: 2.6.32-220.13.1.el6.i686
     VERSION: #1 SMP Tue Apr 17 22:09:08 BST 2012
     MACHINE: i686  (1396 Mhz)
      MEMORY: 511.5 MB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 2065
     COMMAND: "bash"
        TASK: dfaed030  [THREAD_INFO: df8d0000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash>
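
Once you are at the prompt, commands such as sys, log, bt and ps are a reasonable starting point. If you want to collect their output without typing them by hand, crash can read commands from a file through its -i option. The sketch below only prepares the pieces: debug_vmlinux and write_crash_commands are names made up for this example, and the selection of commands is just a suggestion.

```shell
#!/bin/bash
# debug_vmlinux RELEASE
# Hypothetical helper: print where kernel-debuginfo installs the
# uncompressed debug kernel for the given kernel release.
debug_vmlinux() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# write_crash_commands FILE
# Hypothetical helper: write a short list of crash commands to FILE.
#   sys  - system summary       log - kernel message buffer
#   bt   - back trace of the panic task
#   ps   - process list         exit - leave crash
write_crash_commands() {
    cat > "$1" <<'EOF'
sys
log
bt
ps
exit
EOF
}

# Example (requires crash and a suitable kernel-debuginfo package):
#   write_crash_commands /tmp/crash.cmds
#   crash -i /tmp/crash.cmds "$(debug_vmlinux "$(uname -r)")" /path/to/vmcore > report.txt
```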

Rather than dumping the core into the /var/crash directory, you can configure kdump to copy the core to a remote server, by mounting a partition through NFS or even copying the file directly with scp. These options can be set in the kdump.conf file. This configuration file also includes other helpful directives which let you run a series of tasks when a kernel crash has happened and the kdump kernel has been loaded.
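
As an illustration, a kdump.conf along these lines would send the dump to an NFS export. Take it as a sketch only: the server name and export path are placeholders, and the directive names are those I would expect on the CentOS 6 generation used in this article, so check the comments in your own kdump.conf before copying anything.

```
# /etc/kdump.conf (illustrative values; nfs.example.com is a placeholder)

# Save the dump on an NFS export instead of the local disk
net nfs.example.com:/export/crash

# ...or copy it over SSH instead (keep only one net line)
#net kdump@nfs.example.com

# Directory, relative to the target, where the dump is stored
path /var/crash

# Compress the dump and leave out pages that are rarely needed
core_collector makedumpfile -c --message-level 1 -d 31

# What to do if saving the dump fails
default reboot
```

After changing this file, restart the kdump service so that the new settings are picked up.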

