Apr 15, 2012

Facing up to a kernel panic (I)

Is there any Linux Engineer who has not run into a kernel panic throughout his career?

Let's get started by explaining what a kernel panic is. A kernel panic is the action the kernel takes when it detects a fatal internal error from which it cannot safely recover, and it generally brings the whole system down. A kernel panic can show up when the operating system is not able to recover from a preceding error, when it attempts to access invalid memory addresses, or when it runs into damaged, incompatible or unsupported software or hardware, etc.

How must we face up to a kernel panic? First of all, by keeping calm and applying common sense.

I am going to simulate a typical kernel panic by corrupting the content of the /sbin/init file (for this article, I will use a CentOS 6.2 distribution). If you try to do it while the operating system is running in any of its runlevels, you will get the following error.

[root@localhost ~]# >/sbin/init
-bash: /sbin/init: Text file busy

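The above message is normal: the kernel refuses to open for writing an executable file that is currently being run. Likewise, there is no way for init to die or be killed while it is running, since the kernel discards any signal sent to PID 1 for which init has not installed a handler, and SIGKILL cannot be handled.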

[root@localhost ~]# pgrep init ; kill -SIGKILL 1 ; pgrep init
1
1

In order to damage the /sbin/init file, you can boot the system in rescue mode and truncate it from there.

bash-4.1# chroot /mnt/sysimage/

sh-4.1# >/sbin/init ; exit

bash-4.1# sync ; reboot

Now if you restart the system, you will come across the following sequence of messages.

dracut: Mounted root filesystem /dev/mapper/VolGroup-lv_root
dracut: Loading SELinux policy
dracut: Switching root
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: switch_root Not tainted 2.6.32-220.7.1.el6.i686 #1
...

In the previous kernel panic, the "not syncing" part of the message means that the kernel has not flushed the pending data to the hard drive before going down. The second part of the message is the text string passed to the panic function. When a kernel panic is raised, Linux calls the panic routine defined in kernel/panic.c; in this case the call is made from kernel/exit.c, when the process with PID 1 exits.
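The panic line is made up of a fixed prefix followed by the string passed to the panic function. As a minimal sketch, assuming the console output has been saved as text, the helper below (panic_reason is a name made up for this example) pulls out that string with sed.

```shell
# Hypothetical helper (not a standard tool): read console output on stdin
# and print only the reason string of each kernel panic line.
panic_reason() {
    # -n suppresses normal output; the trailing "p" prints only the lines
    # where the "Kernel panic - not syncing: " prefix was found and removed.
    sed -n 's/^.*Kernel panic - not syncing: //p'
}

# Example with two lines taken from the boot sequence shown above.
printf '%s\n' \
    'dracut: Switching root' \
    'Kernel panic - not syncing: Attempted to kill init!' | panic_reason
```

The reason string on its own is usually the most useful text to search for.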

In this situation, the quickest action is to take the error line, go to Google and look up related cases in order to find out more about the kernel panic. If this does not work, then we will have to examine the system by starting it in rescue mode, so as to go over the different log files and try to obtain more information.
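Going over the log files from rescue mode can be sketched as follows. The function name find_panics, the list of log files and the search patterns are assumptions made for this example; /mnt/sysimage is the mount point used by the rescue mode shown earlier.

```shell
# Hypothetical helper: print panic-related lines found in the usual log
# files under a given root directory.
# Usage: find_panics [ROOT]   (ROOT defaults to /mnt/sysimage)
find_panics() {
    local root="${1:-/mnt/sysimage}"
    local f
    # Assumed log locations for a CentOS 6 system.
    for f in "$root/var/log/messages" "$root/var/log/dmesg" "$root/var/log/boot.log"; do
        # Skip files that are missing or unreadable.
        [ -r "$f" ] || continue
        # -H prefixes every match with the name of the file it came from.
        grep -H -e 'Kernel panic' -e 'Oops' "$f"
    done
    # Finding nothing is not an error for this helper.
    return 0
}
```

From the rescue shell, running find_panics without arguments would search the installed system mounted under /mnt/sysimage.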

Specifically in this case, booting in rescue mode will not be useful, since the filesystem could not be mounted in read/write mode and, consequently, the logs could not be written to disk. Perhaps this is the worst case, when the init process itself fails. Otherwise, you would have been able to investigate files such as messages, dmesg, boot, etc.
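Even without logs, the init binary itself can be inspected from rescue mode. The following is only a rough sketch: check_init is a hypothetical helper, and the checks (the file exists, is not empty and is executable) are simply the ones that would catch the truncation performed in this article, not a complete integrity test.

```shell
# Hypothetical helper: report whether an init binary looks usable.
# Usage: check_init [PATH]   (PATH defaults to /mnt/sysimage/sbin/init)
check_init() {
    local init="${1:-/mnt/sysimage/sbin/init}"
    if [ ! -e "$init" ]; then
        echo "missing: $init"
        return 1
    elif [ ! -s "$init" ]; then
        # -s is true only for files with a size greater than zero.
        echo "empty: $init"
        return 1
    elif [ ! -x "$init" ]; then
        echo "not executable: $init"
        return 1
    fi
    echo "looks sane: $init"
}
```

For the file truncated earlier with >/sbin/init, this check would report it as empty.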

