VLAN tagging options with VMware vSwitches: how 802.1Q tagging works for internal and external VLAN traffic in vSphere standard vSwitches, and what “VLAN trunking / tagging” means.
There are several ways to configure 802.1Q VLAN tagging in VMware vSphere ESXi. The VLAN settings on ESXi vSwitches must be configured correctly for the network to be both working and secure.
For more information about the 802.1Q tag and how it actually modifies the frames see this article.
The most common and simplest way is shown above: in this example, two portgroups with the VLAN IDs set to 100 and 200. The virtual machines do not need to know which VLAN they are members of, and the vSwitch expects only untagged, default-sized frames from the VMs.
Internal traffic is untagged
If a virtual machine sends a frame destined for another VM on the same VLAN and on the same vSwitch, the frame is delivered untagged and unmodified. No tagging is needed by either the VM or the vmkernel.
To keep the traffic internal to the ESXi host, the destination VM must be on the same vSwitch, but it may be on a different portgroup from the sender VM, as long as both portgroups have the same VLAN number configured. If, however, two VMs are on two different vSwitches on the same ESXi host, the traffic must always go out to the physical switches and return, even if the same VLAN ID is configured on both vSwitches.
If two VMs are on the same vSwitch but on different portgroups with different VLANs, the traffic must always be routed somewhere. The vmkernel can never move a frame from one VLAN to another, because frames from one Layer Two broadcast domain must be processed by a Layer Three router before entering a new VLAN. The router can be either physical or virtual, but the vSwitch itself has no L3 capabilities.
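The three cases above can be summarized in a small Python sketch. This is only a model of the rules as described in this article, not ESXi code; the class `VmPort`, the function `traffic_path` and the portgroup names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmPort:
    """Where a VM's virtual NIC is attached (hypothetical model)."""
    vswitch: str
    portgroup: str
    vlan_id: int


def traffic_path(src: VmPort, dst: VmPort) -> str:
    """Classify how a unicast frame travels between two VMs on one ESXi host."""
    if src.vlan_id != dst.vlan_id:
        # Different VLANs are different broadcast domains: a router is required.
        return "routed by a Layer 3 device"
    if src.vswitch != dst.vswitch:
        # Same VLAN but separate vSwitches: out on one uplink, back in on another.
        return "via the physical switch"
    # Same vSwitch and same VLAN: the portgroup itself does not matter.
    return "internal to the vSwitch"


# Example attachments (names are assumptions, not taken from the figure).
web = VmPort("vSwitch0", "Web", 100)
app = VmPort("vSwitch0", "App", 100)
db = VmPort("vSwitch0", "DB", 200)
backup = VmPort("vSwitch1", "Backup", 100)

print(traffic_path(web, app))
print(traffic_path(web, backup))
print(traffic_path(web, db))
```

Note that the portgroup is never compared: only the vSwitch and the VLAN ID decide whether the frame can stay inside the host.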
External traffic will be tagged
If a virtual machine (on a portgroup with a VLAN ID) sends a frame that should be delivered to something outside the virtual switch, the vmkernel modifies the frame and adds the 802.1Q tag before sending it to the physical network through the vmnic1 uplink. The VM is not involved in the tagging and does not even know that it takes place.
(The tagging of outgoing frames is often offloaded by ESXi to the physical network adapter, so the performance overhead is minimal. The 802.1Q tag does add 4 extra bytes to each frame, but that is also negligible.)
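To make the 4 extra bytes concrete, here is a minimal Python sketch of what adding an 802.1Q tag to an Ethernet frame looks like at the byte level. It is an illustration of the frame format only, not of how the vmkernel or the network adapter implements it; the function name `add_vlan_tag` and the sample MAC addresses are assumptions.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MAC."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in the range 1-4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in the range 0-7")
    # Tag Control Information: 3 bits priority, 1 bit DEI (left 0), 12 bits VLAN id.
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # Bytes 0-11 are the two MAC addresses; the original EtherType follows the tag.
    return frame[:12] + tag + frame[12:]


# A made-up untagged frame: destination MAC, source MAC, EtherType IPv4, payload.
dst_mac = bytes.fromhex("005056000002")
src_mac = bytes.fromhex("005056000001")
untagged = dst_mac + src_mac + struct.pack("!H", 0x0800) + b"payload"

tagged = add_vlan_tag(untagged, vlan_id=100)
print(len(tagged) - len(untagged))  # the tag adds exactly 4 bytes
print(tagged[12:16].hex())          # 81000064 = TPID 0x8100, VLAN 100
```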
When the ESXi host sends an 802.1Q tagged frame to the network, the physical switch port must be correctly configured. If it is not, the frame will be dropped at the switch. On Cisco devices a port that allows tagged frames is called a “trunk port”. HP ProCurve switches use the word “tagged”. In the next part of the VLAN vSwitch articles we will look in detail at the physical switch configuration.
When a tagged response arrives from the physical network, the vSwitch (in vmkernel memory) has to untag the frame before sending it to the virtual machine. If the vSwitch were to send an 802.1Q tagged frame into a VM with a default configuration, the frame would be dropped by the VM network card driver.
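The untagging of an incoming frame can be sketched in Python in the same byte-level way. Again this only illustrates the frame format; the function name `strip_vlan_tag` and the sample addresses are made up, and the real work happens inside the vmkernel.

```python
import struct
from typing import Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def strip_vlan_tag(frame: bytes) -> Tuple[int, bytes]:
    """Return (vlan_id, untagged_frame); vlan_id is 0 if there was no tag."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q:
        return 0, frame
    (tci,) = struct.unpack("!H", frame[14:16])
    vlan_id = tci & 0x0FFF  # the lower 12 bits carry the VLAN id
    # Remove bytes 12-15 so the original EtherType follows the MAC addresses again.
    return vlan_id, frame[:12] + frame[16:]


# A made-up frame arriving on the uplink, tagged with VLAN 200 (0x00c8).
macs = bytes.fromhex("005056000001") + bytes.fromhex("005056000002")
incoming = macs + bytes.fromhex("810000c8") + struct.pack("!H", 0x0800) + b"reply"

vlan_id, for_vm = strip_vlan_tag(incoming)
print(vlan_id)                      # 200: decides which portgroup may receive it
print(len(incoming) - len(for_vm))  # 4 bytes removed before delivery to the VM
```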
Broadcast frames are both tagged and untagged
One special case is when a virtual machine sends a broadcast frame (destination MAC FF-FF-FF-FF-FF-FF). The broadcast frame must be delivered to all other stations on the Layer Two LAN (the “broadcast domain”): one untagged copy is sent to each virtual machine on the local vSwitch, and one tagged copy of the frame is sent on the uplink.
If the vSwitch uses NIC teaming with two or more physical NIC ports (vmnics) connected, still only one copy of the tagged broadcast frame is sent from the vSwitch. This avoids MAC flapping at the physical switches.
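As a rough Python model of the broadcast case: the local copies are untagged and exactly one tagged copy leaves on an uplink, no matter how many vmnics are teamed. The function `broadcast_copies` and the VM names are hypothetical, and picking the first uplink in the list is only a placeholder, since which vmnic is actually used depends on the teaming policy.

```python
from typing import Dict, List, Tuple


def broadcast_copies(sender: str,
                     vm_vlans: Dict[str, int],
                     uplinks: List[str]) -> List[Tuple[str, str]]:
    """List (receiver, frame form) pairs for one broadcast frame from a VM."""
    vlan = vm_vlans[sender]
    # One untagged copy to every other local VM in the same VLAN.
    copies = [(vm, "untagged")
              for vm, vm_vlan in vm_vlans.items()
              if vm_vlan == vlan and vm != sender]
    # Exactly one tagged copy on one uplink, even with several teamed vmnics.
    if uplinks:
        copies.append((uplinks[0], f"tagged VLAN {vlan}"))
    return copies


vm_vlans = {"vm-a": 100, "vm-b": 100, "vm-c": 200}
for receiver, form in broadcast_copies("vm-a", vm_vlans, ["vmnic0", "vmnic1"]):
    print(receiver, form)
```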
Tagged frames from VMs are dropped
The vSwitch requires that virtual machines send only untagged frames (with one exception, discussed in part 3 of this article).
If a virtual machine sends tagged frames, even with the correct VLAN ID, the frames will be dropped. Typically there is no reason for a VM to tag frames at all, and the ESXi behavior of dropping unexpected tagged frames protects against VLAN hopping attacks.
In the example above, if the VM uses the VMware VMXNET3 network card and sets an 802.1Q tag with VLAN ID 100, which is the same VLAN the VM is a member of, the frame would still be dropped.
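The drop rule can be expressed as a short Python check on the two EtherType bytes of the frame. This is a sketch of the behavior described here, not ESXi code; the function name `vswitch_accepts` is made up, and the exception is assumed to be a portgroup set to VLAN 4095, as mentioned in the discussion below the article.

```python
TPID_8021Q = b"\x81\x00"   # EtherType bytes that mark an 802.1Q tag
GUEST_TAGGING_VLAN = 4095  # assumed exception: the VM is allowed to tag


def vswitch_accepts(frame: bytes, portgroup_vlan: int) -> bool:
    """Decide whether a frame coming from a VM is accepted by the vSwitch."""
    is_tagged = frame[12:14] == TPID_8021Q
    if portgroup_vlan == GUEST_TAGGING_VLAN:
        return True
    # On any other portgroup, a tagged frame is dropped even if the VLAN id matches.
    return not is_tagged


macs = bytes.fromhex("005056000002") + bytes.fromhex("005056000001")
untagged = macs + b"\x08\x00" + b"data"
tagged_100 = macs + TPID_8021Q + b"\x00\x64" + b"\x08\x00" + b"data"

print(vswitch_accepts(untagged, 100))     # True
print(vswitch_accepts(tagged_100, 100))   # False, although the VLAN id matches
print(vswitch_accepts(tagged_100, 4095))  # True
```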
The next part of the vSwitch and VLAN tagging articles will discuss the setup at physical switches from HP Procurve and Cisco.
Hi Rickard,
this is a very nice article. Usually you also need to manage traffic between VLANs.
So you can add a port group with VLAN 4095 as a trunk to a virtual machine (for example a Linux router/firewall).
My scenario:
VM1 – linux (VPN endpoint, VLANs inside VPN).
NIC1 – vSwitch1 – just to connect to Internet
NIC2 – vSwitch2 – TRUNK (vlan 4095)
(inside VM1 is VPN endpoint network interface TAP bridged with NIC2)
VM2 – virtual server
NIC1 – vSwitch2 – portgroup VLAN120
I am expecting that VM2 generates an Ethernet frame and sends it to the network through portgroup VLAN120, that this frame is tagged as a VLAN 120 member, and that it goes through portgroup TRUNK to VM1.
But it does not work! The frame arrives at VM1 untagged!
BTW:
VM1 is connected to vSwitch1 through a TRUNK too, because it manages a lot of other VLANs. This works perfectly.
Maybe the cause is here:
vSwitch2 has no physical NIC attached
vSwitch2 uses the same VLAN numbers as vSwitch1 (but they correspond to different networks, and the switches have no direct connection).
Do you have any idea why this is not working?
It seems to be an ESXi failure.
Thanks
T.
Hello Tomas,
and thank you for a long comment.
I am planning to include internal VLAN tagging / trunking with VLAN ID 4095 in the third part of this series. However, your case is an interesting one, and at a quick look it does seem that ESXi does not do an “on-tag” (as would be expected) for frames from untagged VMs when they need to go into VLAN 4095.
As a workaround, would it be possible to do Guest Tagging of VLAN 120 on VM2? That way the correct tags might already be on the frames before they are sent into the “uplink trunk” to VM1, but it would also make the setup a bit less clean, with VLAN 4095 on the VM2 portgroup as well.
I will do some testing with this and see what the behavior really is in these cases.
Best regards,
Rickard
Excellent article Rickard. Looking forward to the next parts 🙂
Great article, looking forward to your next post.
Quick question Rick, if the VLAN ID is set to 4095 on a portgroup on a switch (vSS or vDS), does Promiscuous Mode need to be set to “Accept” for VM’s on that port group to be able to receive tagged traffic? I understand that the trunking driver needs to be installed and the physical switch port needs to be a trunk port.
Hello Many,
and thank you for your comment. A quick response is that I do not think you will have to enable Promiscuous Mode, but you would, as you say, need a VLAN-capable driver inside the VM with the correct configuration. However, I would like to verify the Promiscuous setting and will get back to you.
Thanks Rick, eagerly awaiting a reply.
So what do you think Rick?
Hello Manny,
I tried to email you some days ago, but the mail address given in your comment was not valid?
I have unfortunately not had the opportunity to test this setting, but hopefully soon.
Best regards, Rickard
Rickard;
I thoroughly enjoyed reading your article, thank you!
I have an ESX server with only one network interface, but wish to add two hosts which should reside on a DMZ. (I also have another ESX server with two NIC, those are teamed).
The two VMs I need to add are a network proxy and a firewall VM.
In my case, would I be able to trunk my ports (yes, my switch supports 802.1Q) and keep all traffic to/from my two DMZ VMs separate?
Or, do I have to add another NIC, which I am not capable of doing right now?
Would you recommend against doing this, and if so, why?
Thank you,
-vp
Hello Vadim,
and thank you for your question.
It should work without problem to use several VLANs on the same physical NIC. Just make sure the physical switch is correctly configured with the VLANs you will use and set the same VLAN ids on the portgroups for your firewall VM.
The only “problem” with using only one physical NIC is that you will have no redundancy (on NIC, cable and switch), but if that is acceptable to you then there should be no technical problems.
Best regards,
Rickard
I have a strange problem.
I have two vSphere hosts. On both I have set up two “Virtual Machine Port Groups” on the same NIC: one untagged (VLAN 0) and one with VLAN 2.
The strange thing is that I can access the untagged one from the other host and vice versa.
Shouldn’t a VM with VLAN 2 only access the VM with VLAN 2 on the other side?
Hello joblack,
and thank you for your question.
The machines should be able to communicate, but only if the physical switch connecting them is correctly configured. The ports must allow tagged frames with VLAN ID 2, or else they will just silently drop those frames. The actual commands depend on your physical switch brand: on Cisco a “switchport mode trunk” on the specific ports, and on HP, for example, “VLAN 2 tagged …. PORTS”.
Hi Rickard,
This article is very interesting, can you please point me to the other parts of this talk?
Hello Sirisha,
and thank you for your comment. The blog has not been updated for a while, but I am planning to complete the second and third parts of the VLAN series in the near future. Hope to see you back then.
Best regards,
Rickard
Hi guys, I have read this article. Just wanted to confirm my understanding. If someone can plz confirm.
Basically, as I understand, when there is inter VLAN ( like from VLAN 100 to 200) traffic involved (regardless of same or different port group, ESXi host or vSwitch) layer 3 routing will be required. If a VM in vSwitch0 needs to communicate with VM in vSwitch1 on same host, uplink (depending on load balancing mechanism), pSwitch and layer 3 routing (assuming both VM’s are in separate VLANs) will be involved. If a VM in portgroup A needs to communicate with portgroup B with totally 2 different hosts, uplinks, pswitch will be involved. The only time when uplink and pSwitch is not involved is when VM A needs to talk with VM B in same port group on same vSwitch and same host. Also important to remember is all traffic sent to VM and received from VM must be untagged when using VLAN ID’s 1-4094 otherwise it will be dropped by vSwitch. VM’s are simply not aware that their traffic is being tagged and VLAN’s are used. Is this correct? Am I missing something?
Hello Gary,
and thank you for your comment. I think you have most things totally correct, I have only some minor details to include.
“The only time when uplink and pSwitch is not involved is when VM A needs to talk with VM B in same port group on same vSwitch and same host.”
Most of the time this is true. However, it does not actually need to be the same portgroup if there are several portgroups with the same VLAN ID on the same vSwitch. (I think it is common to think of a portgroup as equivalent to a “VLAN”, but it is more like a template for a number of virtual switch ports with the same configuration. So several portgroups could be in the same VLAN but still need, for example, different security settings or different active/standby physical adapters.)
So cross portgroup communication is possible on the same vSwitch inside the host if the portgroups use the same (or no) VLAN tag.
“Also important to remember is all traffic sent to VM and received from VM must be untagged when using VLAN ID’s 1-4094 otherwise it will be dropped by vSwitch.”
This is true, and it actually also applies when using VLAN 0, i.e. when all frames from the VMs should leave the vSwitch for the physical switch untagged. Any tagged frames from any VM will be dropped.
“If a VM in vSwitch0 needs to communicate with VM in vSwitch1 on same host, uplink (depending on load balancing mechanism), pSwitch and layer 3 routing (assuming both VM’s are in separate VLANs) will be involved.”
This is also true most of the time. However, if there is a VM running as a virtual router on the host, with one vNIC on each of the VLAN portgroups, the traffic could stay internal to the host and still be Layer 3 routed. This is of course not the usual case.
Best regards,
Rickard
Hi Rickard,
Nice post. I have a little problem here.
I want several VLAN on same NIC.
I configured 2 ports group:
1 – VLAN 100
2 – VLAN 200
On my physical switch I set the physical port to TRUNK MODE, and in the VLAN membership I allowed both VLANs.
So, the VM in the port group with tag VLAN 100 works perfectly. But the second VLAN is not working.
I cannot ping between 2 VMs in the same port group VLAN 200. I think the packets are dropped directly in the vSwitch.
Is there any config I can do in the vSwitch?
Many thanks if you can help me!
🙂
PY
Work now.
Sorry, mistake from me.
🙂
Thanks !
Hi Rickard,
Could you post the part 2 and 3 of this subject please ?
I need the part where you talk about the exception to tagged frames from VMs being dropped.
Actually I have a VM that must tag frames, and I do not know how to let them out of the vSwitch without setting the vSwitch to 4095.
Sorry for my english.
Best regards,
Jerome
Hello Jerome,
and thank you for your comment.
Does your virtual machine need to have multiple tagged VLANs? Do you have an ordinary Standard vSwitch or the Distributed vSwitch?
Hi Rickard,
I really enjoyed reading your article. Thank you!
I have a problem in my environment,
where I have one Standard vSwitch (two uplinks) with two port groups:
1. VLAN-100
2. VLAN-200
I have this configuration on two Vsphere hosts. On one host this is working perfectly fine. However, when I migrate my VM to another host it does not communicate to network. On physical switch configurations are same for both the hosts.
Can you please help me in troubleshooting this?
Many Thanks in advance!
Hello Gaurav ,
and thanks for your question.
On your second host, do none of the VMs work? Or does it work “sometimes”? Does it affect both VLANs?
Do you use the same NIC Teaming Policy on both vSwitches?
And are you totally sure that the physical switch configuration is exactly the same for all ports connected to your ESXi hosts? That includes both VLAN tagging/trunking and Link Aggregation settings, depending on how the vSwitches are configured.
Regards, Rickard
Thanks for your response Rickard!
The issue has been resolved. There was an issue on the physical switch. The network team did not check the second core.
Thanks again!
The only way I could get hosts to communicate properly through a Windows machine running on ESXi was to enable Monitor Mode on Windows so that it does not strip the VLAN tag. (I could see tagged traffic from ESXi, but return traffic was showing up untagged, without a VLAN.)
Here is how it’s done. One entry in the registry and it’s all working.
http://www.intel.com/support/network/sb/CS-005897.htm
Nice article. One question: Is it possible to accomplish the following:
1. A VM sends untagged traffic, and the host tags the traffic with a set VLAN id before it hits the physical network.
2. The same VM ALSO sends tagged traffic on a number of other VLANs (such as would be the case with a unix-based router servicing multiple VLANs).
3. Multiple VMs doing the same thing over the same physical interface (ie VM1 in portgroup A, untagged traffic is placed into VLAN 100, and it accepts traffic with vlan tags 101-199. VM2 in portgroup B, untagged traffic placed into VLAN 200, and accepts tagged traffic on VLANs 201-299.)
In an all-physical environment, this would be accomplished by setting the switchport (e.g. using a Cisco SG300):
switchport mode trunk
switchport trunk native vlan 100
switchport trunk allowed vlan add 101-199
But is it possible to configure a portgroup in a similar way? What about with a distributed vswitch with an enterprise+ license?
Also, will parts 2-3 ever be published?
Thanks!
Hello Lannar,
and thank you for your reply.
The use case you present is unfortunately not possible in the way you describe. It is possible to allow a VM to tag the frames it sends; this is done on a Standard vSwitch by setting the VLAN option to 4095. This is roughly equivalent to “switchport mode trunk” on a physical Cisco switch.
Note that this will allow any tagged VLANs to enter the VM, there is no “switchport trunk allowed vlan 100, 200, 300”, but only all VLANs. On the Distributed vSwitch you could however do this by setting the VLAN option to “VLAN trunk” and set the specific allowed VLANs.
One thing that is not possible on either the Standard or the Distributed vSwitch, however, is any handling of the “native” VLAN, in the sense that you cannot make ESXi do anything with the untagged frames: they will just be sent to the physical switch without any alteration.
This means it will not be possible to have two portgroups on a vSwitch with different “native” vlans because they will both end up untagged on the physical switch port and be consumed into the switch native VLAN.
Are you forced to have an untagged VLAN on your VMs? If this VLAN could also be tagged inside the VM, it should work.
As for the next part of the article – thank you for reminding me! 🙂
Best regards,
Rickard
Hello Rickard,
Beautiful explanation. Can you share the link to part 2 and 3. I don’t seem to find them.
Thank you for your comment Harry. As for the next parts they are unfortunately not ready yet, but I will work to get them ready for publishing.
Regards, Rickard
Hi,
I have a question regarding traffic over multiple NICs…
If you have a server (ESXi) with 4 network ports, all connected to the same switch on the same VLAN and all set as Active in VMware, how does it then balance the traffic over the ports? Will there be 25% load on each port?
Hello Tobbe,
(reply in English) Each VM gets a connection to a certain physical vmnic port, which is determined at VM startup or at an incoming vMotion. These “connections” are not visible in the graphical interface, but you can see them in the network view of the esxtop tool through SSH.
Over time there is no guarantee that the VM connection load will be spread completely evenly among the physical vmnics, but it will typically be spread more or less fairly.
Regards, Rickard
Hi Rickard,
We have an IBM PureFlex with 5 compute nodes.
Each node has two pNICs. Each NIC is connected to a Brocade switch; the Brocade switches are behind the chassis.
Both Brocade switches are connected to a Cisco core switch.
The Cisco core switch ports are configured as trunks for native VLAN 1 and VLAN 4.
The Brocade switches are configured with tagged ports for VLAN 5 and PVID 1.
All my ESXi host VMkernel networks are configured in the VLAN 4 network.
I have configured VST on the vSwitch, and both pNICs are working in NIC teaming with the default options.
As of now I am able to communicate with all my VMs which are on native VLAN 1 and VLAN 4, but I am not able to communicate with my ESXi hosts which are on VLAN 4. Please help me to sort it out.
Good morning, I’m working in a school and I have this scenario: I need to prevent some labs from going on the web.
I have five labs, and I have several VMs on ESXi.
One VM is a ZeroShell instance that I use for routing and firewall purposes, and it is possible to define 5 VLANs on one NIC and use them to block traffic.
So I configure my switches to tag packets from lab1 with VLAN 100, lab2 with VLAN 200 and so on.
One VM is a DC and its vNIC is connected to the same vSwitch as the firewall.
I supposed that my DC would catch all packets, VLANs being in layer two, but I supposed wrong … every tagged packet is dropped.
So, if I set my switch to send untagged packets to VMNIC0, my DC works well but my firewall cannot know which lab the packets are coming from; if I set my switch to send tagged packets to VMNIC0, my firewall will (probably) work but my DC doesn’t.
I can’t use another VMNIC because all four are in use, and I prefer not to make five subnets because I already have a lot … I don’t like having a DC that is authoritative for 5 subnets and a firewall with 11 vNICs … if possible.
Thanks for any ideas
(I know … I’m probably making a lot of mistakes … but I’m doing my best)
It is not completely clear to me BUT I suspect based on the article and the discussion the answer is most likely NO.
If I have a virtual Machine that requires a defined interface with VLAN tagging enabled say ,VLAN 100, there is no WAY to have that traffic from the virtual machine pass through an ESXi virtual switch (in any configuration VLAN = 0, VLAN=100 or VLAN=4095) to a real ethernet switch with the original packet tagging in place?
Hi, it’s a nice article.
But I am having trouble dealing with traffic with tag 4095.
To be more specific, the situation is that I installed an OS inside ESXi 6.0 to receive traffic from a physical switch.
My switch is an H3C S5800-32F, which adds tag 4095 to outbound traffic. By capturing traffic, I found that ESXi 6.0 does not receive the 4095 traffic. And using the esxtop command, the dropped statistics show nothing, which means ESXi 6.0 is not able to recognise the 4095 traffic.
How can I solve the problem?
Hello Rickard,
How can two different VLANs (different port groups) communicate on a Standard vSwitch?
Regards,
Hari.
It is only possible by having the traffic routed through a Layer Three device, either an external physical router or a VM with routing capabilities.
Up to this great post !!!
Just to be sure :
I have 2 vSwitches on an ESX host, vSwitch0 and vSwitch1, each with a portgroup in VLAN 110, named vSwitch0PG110 and vSwitch1PG110 respectively, and VM1 (on vSwitch0PG110) cannot communicate with VM2 (on vSwitch1PG110). Is that normal? I understand that there is no communication between vSwitches on the same host, so the traffic (e.g. an ARP request) should go through the physical LAN, so VM1 should be able to contact VM2 via the physical LAN, shouldn’t it?
Moreover, if I vMotion VM2 to another ESX host, it works.
Very strange from my point of view 🙁
Please, can anyone help me?
I have VMware Workstation (12.5.7) running on Windows 7, and a Debian 8.8 guest running in VMware.
I have created multiple VLANs inside the guest (Debian 8.8) using the settings below in /etc/network/interfaces:
auto eth10g
iface eth10g inet static
auto eth1g
iface eth1g inet static
address 172.17.3.2
netmask 255.255.0.0
post-up ip rule add from 172.17.3.2 lookup rt1g priority 100
post-up ip route add 172.17.0.0/16 dev eth1g table rt1g
auto vlan10
iface vlan10 inet manual
address 172.17.3.160
netmask 255.255.0.0
vlan-raw-device eth10g
auto vlan20
iface vlan20 inet manual
address 172.29.55.160
netmask 255.255.255.0
vlan-raw-device eth10g
auto vlan21
iface vlan21 inet manual
address 172.16.32.1
netmask 255.255.255.0
vlan-raw-device eth10g
auto br-sec
iface br-sec inet static
address 172.17.3.1
netmask 255.255.0.0
bridge_ports eth10g vlan10
bridge_stp off
bridge_maxwait 0
bridge_fd 0
auto br-gx0
iface br-gx0 inet static
address 172.29.55.160
netmask 255.255.255.0
bridge_ports vlan20
bridge_stp off
bridge_maxwait 0
bridge_fd 0
auto br-gx1
iface br-gx1 inet static
address 172.16.32.1
netmask 255.255.255.0
bridge_ports vlan21
bridge_stp off
bridge_maxwait 0
bridge_fd 0
I am unable to ping vlan20 and vlan21 from Windows. Can anyone help me with this?