How to use the new Network Health Check in VMware ESXi 5.1 to verify VLAN configuration on physical switches.
A common problem is a mismatch between the VLANs configured on the port groups of virtual switches and the VLANs allowed on the physical switches. For example, if a virtual switch has a port group with VLAN ID 200 and that VLAN is not allowed on the physical switch port, all frames tagged with this VLAN ID will be dropped by the physical switch.
Troubleshooting is made harder by the fact that you often have several ESXi hosts, each typically with many vmnics (server NIC ports) connecting to several physical switches. If a single switch port is incorrectly configured, for example missing a certain VLAN, a VM can seem to work well but then, through features like vMotion or DRS, suddenly get moved to a port where this VLAN is not allowed. Finding the culprit switch port is not always easy.
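The mismatch described above boils down to a set difference. A minimal Python sketch, assuming hypothetical VLAN lists; the function name and the allowed VLANs are illustrative only and do not come from any VMware or switch API:

```python
def dropped_vlans(portgroup_vlans, allowed_on_switch_port):
    """Return VLAN IDs tagged by the virtual switch but not allowed on the physical port."""
    return sorted(set(portgroup_vlans) - set(allowed_on_switch_port))


# Hypothetical example: port groups use VLAN 100 and 200,
# but the physical switch port only allows VLAN 100.
print(dropped_vlans([100, 200], [100]))  # prints [200]
```

Any VLAN ID returned here is one whose tagged frames the physical switch would drop on that port.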
In ESXi 5.1 you can enable Network Health Check. The feature is available on Distributed vSwitches version 5.1. VLAN checking must first be enabled; the setting is found on the Manage tab of the Distributed vSwitch.
The ESXi host will now send certain broadcast frames on all vmnics attached to this Distributed vSwitch. These frames will be tagged with all VLAN IDs defined on the virtual port groups. In this example we have port groups with VLAN 100 and 200.
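The article does not describe the exact probe frames, so the following is only a sketch of what tagging a broadcast frame with each VLAN ID means at the 802.1Q level. The function names, the source MAC address, the EtherType 0x88B5 (an IEEE local experimental value) and the payload are all assumptions for illustration, not what ESXi actually sends:

```python
import struct

BROADCAST_MAC = b"\xff" * 6
TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def dot1q_tag(vlan_id, priority=0):
    """Build the 4-byte 802.1Q tag: TPID, then PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    tci = (priority << 13) | vlan_id  # DEI bit is left at 0
    return struct.pack("!HH", TPID_8021Q, tci)


def probe_frames(src_mac, vlan_ids, ethertype=0x88B5, payload=b"healthcheck"):
    """Return one tagged broadcast frame per VLAN ID (the payload is a placeholder)."""
    return {
        vid: BROADCAST_MAC + src_mac + dot1q_tag(vid)
        + struct.pack("!H", ethertype) + payload
        for vid in vlan_ids
    }


# Hypothetical source MAC; VLAN 100 and 200 as in the example above.
frames = probe_frames(b"\x00\x50\x56\x00\x00\x01", [100, 200])
for vid, frame in frames.items():
    print(vid, frame[12:16].hex())  # the 4 tag bytes follow the two MAC addresses
```

A physical switch port that does not allow a given VLAN ID drops the corresponding frame, which is how a missing VLAN becomes visible per vmnic.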
On the new Monitor tab we can see whether any configuration issues have been detected, here related to the VLAN setup on a certain ESXi 5.1 host.
In the details pane we are informed that there seems to be a problem with VLAN 200 on vmnic2 and vmnic3. To make this problem easier to fix, it is highly recommended to enable LLDP on the virtual switch.
On the physical switch we can now use LLDP to quickly locate which physical switch ports are connected to this ESXi host. In this example the local ports are called A13 and A14. Knowing for certain which local switch ports are connected to which ESXi vmnic ports is crucial for any configuration change on physical switches.
As the ESXi 5.1 host reported problems with VLAN 200, we should now verify the VLAN configuration using the command “show VLAN 200”. It seems that only A13 allows tagged frames for VLAN 200 and that port A14 is missing. The network engineer has likely forgotten to add this port at some point.
With the command “VLAN 200 tag A14” (HP networking syntax) we add this port to the VLAN and then use the show command again to verify. Both ports attached to the ESXi host (A13 and A14) are now members of VLAN 200 and allow incoming tagged frames with that VLAN ID.
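The verification and fix above follow a simple pattern: compare the ports facing the ESXi host (found via LLDP) with the tagged members of the VLAN, then add whatever is missing. A minimal Python sketch, assuming the port lists have already been collected by hand; the function names are hypothetical, and the generated command simply mirrors the HP syntax quoted above, written in lowercase:

```python
def missing_ports(host_ports, vlan_tagged_ports):
    """Return host-facing switch ports that are not tagged members of the VLAN."""
    tagged = set(vlan_tagged_ports)
    return [port for port in host_ports if port not in tagged]


def fix_commands(vlan_id, ports):
    """Build one switch command per missing port, in the form quoted in the article."""
    return [f"vlan {vlan_id} tag {port}" for port in ports]


host_ports = ["A13", "A14"]   # ports facing the ESXi host, as located via LLDP
tagged_in_vlan_200 = ["A13"]  # tagged members reported for VLAN 200

for command in fix_commands(200, missing_ports(host_ports, tagged_in_vlan_200)):
    print(command)  # prints: vlan 200 tag A14
```

The sketch only produces the command text; entering it on the switch and re-running the show command to verify remains a manual step.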
Let us now return to the vSphere Web Client for ESXi 5.1 and see if the status has changed: it now reports that all VLANs are supported over all available vmnics. These health checks run every minute, which helps detect both issues present at the moment and incorrect configuration introduced later.
Without doubt, the new Network Health Check for VLAN tagging is a very useful addition to vSphere networking in ESXi 5.1 and will help with network configuration troubleshooting in many cases.
Great post, and a very overdue but welcome feature! This would save me tons of time going back and forth with our network team. I’ve seen scripts to do similar things but this looks much easier.
Thanks Ed, and I agree that this should be a real time saver in getting the configuration between virtual and physical switches correct. It would have been very nice to get the same feature for the Standard vSwitch too.
Thanks Rickard, this feature will save me a lot of effort.