
RHEL 5: iLO Unexpectedly Initiated ASR

I was asked to determine the root cause of a server outage. According to the report, the server suddenly hung. It runs RHEL 5.2 on an HP ProLiant blade.

Operating System Info

[root@localhost ~]# uname -a
Linux localhost 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

Hardware Info

hpasmcli> show server
System        : ProLiant BL460c G1
Serial No.    : [deleted]
ROM version   : I15 02/29/2008
iLo present   : Yes
Embedded NICs : 2
        NIC1 MAC: 00:21:5a:48:7b:cc
        NIC2 MAC: 00:21:5a:48:7b:ca

Processor: 0
        Name         : Intel Xeon
        Stepping     : 11
        Speed        : 3000 MHz
        Bus          : 1333 MHz
        Core         : 4
        Thread       : 4
        Socket       : 1
        Level2 Cache : 8192 KBytes
        Status       : Ok

Processor total  : 1

Memory installed : 16384 MBytes
ECC supported    : Yes

Error messages from syslog

May 17 07:31:06 localhost kernel: ipmi_si(SI_CHECK_BMC): Failed to get Global Enables 0xc6.
May 17 07:31:16 localhost hpasmxld[12656]: OsKcsExecCmd:  IPMI NetFN  0x36   CMD: 0x2 has timed out!
May 17 07:31:46 localhost last message repeated 3 times
May 17 07:31:46 localhost hpasmxld[12656]: iLO 2 Communications Error - Attempting synchronization!
May 17 07:32:31 localhost hpasmxld[12656]: iLO 2 has responded to reset request . . .
May 17 07:32:31 localhost hpasmxld[12656]: Stopping the Watchdog Timer . . .
May 17 07:32:31 localhost hpasmxld[12656]: Resetting Internal Data structures . . .
May 17 07:32:31 localhost hpasmxld[12656]: Initializing Internal Data structures from iLO 2. . .
May 17 07:32:31 localhost hpasmxld[12656]: The iLO 2 reset / synchronization has completed successfully
May 17 07:32:31 localhost kernel: hpasmxld[12656]: segfault at 0000000000000031 rip 0000000000000031 rsp 00007fff20cc7808 error 4
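To pull this kind of sequence out of a busy log, a small filter helps. The following is only a minimal sketch, assuming the traditional syslog line layout seen above; the function name find_ilo_events and the keyword list are my own choices for illustration, not part of any HP tool.

```python
import re

# Traditional syslog layout: "May 17 07:31:06 host tag[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)

# Hypothetical keyword list, taken from the messages quoted in this post.
KEYWORDS = ("hpasmxld", "ipmi_si", "iLO")


def find_ilo_events(lines):
    """Return (timestamp, tag, message) for lines that mention the
    HP health agent, the IPMI system interface driver, or iLO."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            # Lines without a "tag:" part, such as
            # "last message repeated 3 times", are skipped.
            continue
        text = match.group("tag") + " " + match.group("msg")
        if any(keyword in text for keyword in KEYWORDS):
            events.append(
                (match.group("ts"), match.group("tag"), match.group("msg"))
            )
    return events
```

The function accepts any iterable of lines, so an open file such as /var/log/messages can be passed directly. Checking whether any returned message contains "segfault" is then a quick way to spot the hpasmxld crash shown in the last log line.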

It seems iLO unexpectedly initiated Automatic Server Recovery (ASR) on the server; this behavior is acknowledged in an HP support document. Nevertheless, I was really disappointed, because I could not find any ASR event in the log (checked with hplog -v) for the time the error occurred. Hmm…

According to the solution given in that support document, HP advises customers to uninstall the hp-OpenIPMI package.

About hp-OpenIPMI

[root@localhost ~]# rpm -qi hp-OpenIPMI
Name        : hp-OpenIPMI                  Relocations: (not relocatable)
Version     : 8.0.0                             Vendor: Hewlett-Packard Company
Release     : 113.rhel5                     Build Date: Sat 24 Nov 2007 02:07:02 AM SGT
Install Date: Fri 13 Jun 2008 01:31:16 PM SGT      Build Host: rhel5e
Group       : System Environment/Kernel     Source RPM: hp-OpenIPMI-8.0.0-113.rhel5.src.rpm
Size        : 6860802                          License: GNU Public License
Signature   : (none)
Packager    : Hewlett-Packard Company
URL         : http://www.hp.com/linux
Summary     : OpenIPMI +HP
Description :
This is an upgraded version of the Open IPMI device driver that is shipped as part of the standard Linux kernel. This release is for Linux 2.6.18+ kernels. This provides support for PCI Based Base Management Controllers that are truly interrupt driven. This package will NOT activate on it's own.  The drivers for this release are place in the /opt/hp/hp-OpenIPMI/bin with a script that can be used to launch the IPMI drivers.  This has been done as the changes made to the IPMI drivers are expected to be included in future Linux kernels.

The hp-OpenIPMI driver can be built for any kernel like any other GPL Open Source application.  You need to load the appropriate kernel-devel (for Red Hat releases) package to do this.
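Before removing the package, it may be worth checking which IPMI modules are currently loaded. The following is a rough sketch that reads text in the /proc/modules format; the function name loaded_ipmi_modules is hypothetical, and it assumes the relevant modules carry the usual ipmi_ prefix (as ipmi_si does in the syslog excerpt above). It cannot tell the HP build apart from the driver shipped with the kernel.

```python
def loaded_ipmi_modules(proc_modules_text):
    """Return the sorted names of loaded modules starting with 'ipmi_'.

    Each line of /proc/modules begins with the module name, followed by
    its size, use count, dependants, state and load address.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("ipmi_"):
            names.append(fields[0])
    return sorted(names)
```

On the server itself, the input would be the contents of /proc/modules, read as a string.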

Should I disable ASR as well? According to a discussion on ITRC, disabling ASR can also prevent this problem from recurring.

{ 5 } Comments

  1. Miljan | May 27, 2009 at 9:25 pm | Permalink

    I had the same problem recently on a couple of servers. It seems to have been a known problem for quite some time. It is disappointing to see that even new machines are shipped with this bug. Now one of the first things I do when installing HP ProLiant servers is to disable ASR. I sleep much better knowing it is turned off. :)

  2. Irwan | May 28, 2009 at 3:54 am | Permalink

    Thanks for sharing your experience. I think this ASR-thingy is a serious problem. It’s supposed to help us, but now it’s giving me a headache. Not too long ago, I also experienced an unexpected ASR on VMware ESX running on a ProLiant server.

  3. marjan | May 28, 2009 at 11:45 pm | Permalink

    Or switch to Dell PowerEdge… cheaper and less hassle ;)

  4. Irwan | May 29, 2009 at 7:46 pm | Permalink

    Dell? Ewl!

  5. piju | May 30, 2009 at 5:18 pm | Permalink

    Mac OSX server!
