I have a stability problem with QLogic QLE2562 FC-8 HBAs on RHEL 5 U3,
on about 6 Sun Fire X4270 (x64) servers. The strange thing is that 2
other identical servers are working well. After a few minutes or hours, I
get the trace shown below.
I tried the qla2xxx drivers from qlogic.com (with embedded 4.04 and 4.06
firmware) without success: the problem is the same. I also tried several
PCI slots, again without success. Any ideas or suggestions?
Regards,
Stephane
qla2xxx_eh_abort(

: aborting sp ffff81037d86ebc0 from RISC. pid=952
sp->state=7 q->q_flag=2
qla2xxx 0000:2f:00.1: Mailbox command timeout occurred. Issuing ISP abort.
NMI Watchdog detected LOCKUP on CPU 13
CPU 13
Modules linked in: autofs4 sunrpc ipv6 xfrm_nalgo crypto_api
cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath
scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi
acpi_memhotplug ac parport_pc lp parport joydev qla2xxx(U)
qla2xxx_conf(U) igb i2c_i801 intermodule(U) i2c_core sg pcspkr dm_raid45
dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata
shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3
jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2982, comm: scsi_eh_8 Tainted: G 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff8000c6f2>] [<ffffffff8000c6f2>] __delay+0x8/0x10
RSP: 0018:ffff81067dc7db88 EFLAGS: 00000097
RAX: 00000000ecd06b41 RBX: 000000000018c42b RCX: 00000000ecd05808
RDX: 0000000000000324 RSI: 0000000000000046 RDI: 0000000000003689
RBP: ffffc20000034000 R08: 0000000000000002 R09: ffff81067dc7db54
R10: 0000000000000001 R11: ffffffff80213fbd R12: ffff81037e84c4f8
R13: 0000000000000246 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff81067fc46140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb424 CR3: 000000067d035000 CR4: 00000000000006e0
Process scsi_eh_8 (pid: 2982, threadinfo ffff81067dc7c000, task ffff81010c6ec040)
Stack: ffffffff8827f743 ffff81037e84c4f8 ffff81067dc7dc90
ffff81060000dc20
ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90 0000000000000100
ffffffff88285488 ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90
Call Trace:
[<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 29 c8 48 39 f8 72 f5 c3 41 54 83 3d ad d8 3c 00 00 49 89 f4
Kernel panic - not syncing: nmi watchdog
BUG: warning at kernel/panic.c:137/panic() (Tainted: G )
Call Trace:
<NMI> [<ffffffff8008efff>] panic+0x1da/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa015>] i8042_panic_blink+0x112/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa0fe>] i8042_panic_blink+0x1fb/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa17b>] i8042_panic_blink+0x278/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11