“Jumbo Frames” is the common name for Ethernet frames larger than the default size. The only maximum frame size accepted by all Ethernet network cards, switches, routers and operating systems is 1518 bytes. (Devices that support 802.1Q VLAN tagging extend this by 4 bytes, to 1522.)
So how large is a Jumbo Frame then? Surprisingly, there is no definite answer. Since “Jumbo Frames” is not yet a standard, there is no defined maximum size for a so-called Jumbo Frame; however, a commonly accepted MTU (Maximum Transmission Unit) value is 9000 bytes. The MTU is the amount of data that fits inside a frame after the Ethernet overhead. An MTU of 9000 is six times the default Ethernet MTU of 1500 bytes. To be able to use Jumbo Frames, all involved devices must support them; for example, in a VMware environment: the VMkernel port, the vSwitch, the physical switches and the remote side, e.g. an iSCSI target SAN. This is critical, and note that many physical switches that support Jumbo Frames still have them disabled by default. See this post on how to verify end-to-end Jumbo Frames support.
(Even on VMware ESX/ESXi it must be activated. See this post on how to enable Jumbo Frames on ESXi 4.1.)
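The relationship between MTU and frame size described above can be sketched with a little arithmetic. This is only an illustration: the function names are made up for this example, and the ping payload calculation assumes plain IPv4 without options (20-byte IP header, 8-byte ICMP header), which is the payload size usually used for a "don't fragment" ping when checking a path end to end.

```python
# Illustrative arithmetic only; function names are hypothetical.
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte frame check sequence
VLAN_TAG = 4            # extra bytes added by an 802.1Q tag
IPV4_HEADER = 20        # assumes no IP options
ICMP_HEADER = 8


def frame_size(mtu, vlan_tagged=False):
    """Size of the Ethernet frame needed to carry a full MTU of data."""
    return mtu + ETHERNET_OVERHEAD + (VLAN_TAG if vlan_tagged else 0)


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(mtu, frame_size(mtu), frame_size(mtu, True), max_ping_payload(mtu))
```

For an MTU of 9000 this gives a frame size of 9018 bytes (9022 with a VLAN tag) and a largest unfragmented ping payload of 8972 bytes.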
In a later article I will cover Jumbo Frames in more detail, but here I will look at one concern I sometimes see: what happens if some end devices on a VLAN support Jumbo Frames and others don’t? A common misunderstanding is that this will not work at all.
For example: assume we have an ESXi host which uses iSCSI to access two iSCSI targets on the same VLAN, and only one of them supports Jumbo Frames. See the example in the picture at the top. The physical switch and ESXi have been configured to accept Jumbo Frames on the iSCSI VLAN, but how does ESXi know when to send the larger frames and when not to?
The answer lies in the setup of the TCP session. During the classic three-way TCP handshake, both sides inform each other of the largest MSS (Maximum Segment Size) they can accept. The MSS is how much data can be carried inside each frame after the overhead for Ethernet (18 bytes), IP (20 bytes) and TCP (20 bytes). Typically this gives an MSS of 1460 bytes for standard Ethernet and 8960 bytes for Jumbo Frames.
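As a quick check of those numbers, here is a minimal sketch of the MSS arithmetic. The function names are hypothetical, and it assumes IPv4 and TCP headers without options (20 bytes each), as in the text.

```python
# Minimal sketch of the MSS arithmetic; names are hypothetical.
ETHERNET_OVERHEAD = 18  # Ethernet header + frame check sequence
IP_HEADER = 20          # IPv4 header without options
TCP_HEADER = 20         # TCP header without options


def mss_from_mtu(mtu):
    """MSS is what remains of the MTU after the IP and TCP headers."""
    return mtu - IP_HEADER - TCP_HEADER


def mss_from_frame(frame_size):
    """Same value, starting from the full Ethernet frame size."""
    return mss_from_mtu(frame_size - ETHERNET_OVERHEAD)


print(mss_from_mtu(1500))    # standard Ethernet
print(mss_from_mtu(9000))    # Jumbo Frames with a 9000-byte MTU
print(mss_from_frame(1518))  # starting from the 1518-byte frame
```

The three calls print 1460, 8960 and 1460, matching the values above.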
If the two sides report different MSS values, the smaller one is automatically used. This makes the ESXi host aware, per TCP session, of when to use Jumbo Frames and when to use default Ethernet frame sizes, without any need for separate VLANs or other configuration.
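The per-session selection can be modelled in a few lines. This is a simplified sketch, not how any particular TCP stack is implemented: the target names and the `effective_mss` helper are invented for this example, and the MSS values are the typical ones mentioned earlier.

```python
# Simplified model of per-session MSS selection; all names are hypothetical.
def effective_mss(local_mss, peer_mss):
    """Each side limits its segments to the smaller of the two advertised values."""
    return min(local_mss, peer_mss)


esxi_mss = 8960  # ESXi VMkernel port configured for Jumbo Frames

# MSS advertised by each iSCSI target in its SYN/ACK (assumed values)
targets = {
    "target-jumbo": 8960,     # supports Jumbo Frames
    "target-standard": 1460,  # default Ethernet only
}

sessions = {name: effective_mss(esxi_mss, mss) for name, mss in targets.items()}

for name, mss in sessions.items():
    print(name, mss)
```

In this model the session to the Jumbo-capable target uses 8960 bytes per segment, while the session to the other target falls back to 1460, even though both targets sit on the same VLAN.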
Above is a default TCP handshake on ordinary Ethernet, displayed with Wireshark. Note that both sides report the same 1460-byte MSS.
And finally, an example of two sides reporting different MSS values. Without even having to confirm this to each other, both TCP stacks know they have to use 1460 bytes as the Maximum Segment Size. One of many great features of TCP.
However, depending on the exact workings of your switches, the traffic stream with the normal frame size could get a lot less bandwidth, so it is of course still advantageous if all devices use the same frame size. Various RFCs for IP also strongly recommend keeping the MTU equal across the same network, but a mixed setup would still work thanks to the TCP MSS negotiation.