Analyzing ESXi Log Files to Identify Storage and Multipathing Problems

by admin

When troubleshooting storage and multipathing issues, it is often necessary to check log files to understand where the problem lies. The log to check is most commonly the vmkernel log, which can be viewed either from the DCUI or from the ESXi Shell. In the DCUI, select ‘View System Logs’, then option 2 for Vmkernel:

[Screenshot: view-system-logs]

If using the shell, the vmkernel log, like the other log files, can be found in /var/log:

[Screenshot: esxi-storage-log-files]

I’ve written previously about the different ways to examine log files here. My preference is to use the ESXi Shell, as shell tools such as grep, tail and cat can be used to find and display log entries. For example, to search for all instances of the word ‘failed’ in the vmkernel log file, you could run:

# grep -i failed /var/log/vmkernel.log | less

The ‘-i’ option makes the search case-insensitive; otherwise ‘failed’ and ‘Failed’ would be treated differently and log entries could easily be overlooked. The output when running this on my host is as follows:
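When the same failure repeats many times, as it often does, it can help to count the matching entries per path rather than page through them. The following is a minimal sketch, assuming the log lines contain `on path "..."` in the form used by NMP messages; `count_failed_paths` is a hypothetical helper name, not an ESXi command.

```shell
# Hypothetical helper: count "failed" log entries per runtime path.
# $1 is the log file to read (for example /var/log/vmkernel.log on a host).
count_failed_paths() {
    grep -i 'failed' "$1" |
        # Keep only the text between the quotes that follow: on path
        sed -n 's/.*on path "\([^"]*\)".*/\1/p' |
        # Count identical paths and list the most frequent first
        sort | uniq -c | sort -rn
}

# Example usage on a host:
#   count_failed_paths /var/log/vmkernel.log
```

Each output line is a count followed by a runtime path, so a path that is failing repeatedly stands out at the top of the list.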

2012-12-26T12:32:02.773Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:02.848Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:02.923Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:02.994Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:03.069Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:03.142Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:03.214Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$
2012-12-26T12:32:03.288Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$

The entries above show that commands sent down a path to a storage device have failed. The adapter is ‘vmhba33’ and the device is ‘t10.F405E46494C4540013C625565687D2A6A75633D293877753’. The runtime path is vmhba33:C1:T0:L0. Looking in the vSphere Client, I can see that a path to this device is indeed unavailable:
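The adapter, device and runtime path can also be pulled out of the log lines with standard shell tools. This is a rough sketch, assuming the entries follow the `to dev "..." on path "..."` layout shown above; `parse_nmp_line` is a hypothetical helper name.

```shell
# Hypothetical helper: read vmkernel log lines on stdin and print the
# adapter, device and runtime path for each NMP entry that names them.
parse_nmp_line() {
    # Splitting on double quotes puts the device in $2 and the path in $4
    awk -F'"' '/to dev ".*" on path "/ {
        split($4, parts, ":")   # vmhba33:C1:T0:L0 -> adapter is the first part
        print "adapter=" parts[1], "device=" $2, "path=" $4
    }'
}

# Example usage on a host:
#   grep -i failed /var/log/vmkernel.log | parse_nmp_line | sort -u
```

On a host, the device identifier found this way could then be passed to `esxcli storage core path list -d <device>` to check the state of each path from the command line.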

[Screenshot: iscsi-dead-path]

Looking more closely at the vmkernel log extract, take notice of the SCSI sense codes, which are the H:, D: and P: values and the ‘Possible sense data’ that follows them:

2012-12-26T12:32:03.214Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124007b2640, 0) to dev "t10.F405E46494C4540013C625565687D2A6A75633D293877753" on path "vmhba33:C1:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:$

In the example above, the H:0x2 code suggests that there is an issue with the host’s connectivity to the device. VMware covers how it interprets SCSI codes in detail here. As stated in that article, the parts of the code relate to the following:

  • H = Host Status
  • D = Device Status
  • P = Plugin Status

The sense code that follows will give further information on the nature of the issue. More useful information from VMware on SCSI sense codes can be found in this KB article.
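To make the status codes easier to scan across many entries, they can be extracted and labelled in the shell. The sketch below is illustrative only: `host_status_name` and `scsi_status` are hypothetical helper names, and the host status labels are paraphrased from memory of VMware's reference material and cover only a few values, so check them against the VMware article before relying on them.

```shell
# Hypothetical helper: map a host status code to a short label.
# The labels are assumptions paraphrased from VMware's reference material.
host_status_name() {
    case "$1" in
        0x0) echo "OK" ;;
        0x1) echo "no connect" ;;
        0x2) echo "bus busy" ;;
        0x3) echo "timeout" ;;
        0x5) echo "abort" ;;
        0x7) echo "error" ;;
        0x8) echo "reset" ;;
        *)   echo "unknown" ;;
    esac
}

# Hypothetical helper: read log lines on stdin and print the H/D/P codes,
# with a label for the host status.
scsi_status() {
    sed -n 's/.*H:\(0x[0-9a-fA-F]*\) D:\(0x[0-9a-fA-F]*\) P:\(0x[0-9a-fA-F]*\).*/\1 \2 \3/p' |
    while read -r h d p; do
        echo "host=$h ($(host_status_name "$h")) device=$d plugin=$p"
    done
}

# Example usage on a host:
#   grep -i failed /var/log/vmkernel.log | scsi_status | sort | uniq -c
```

Only the host status is labelled here; the device status, plugin status and sense data are printed as raw codes and still need to be looked up in VMware's articles.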

