Identifying MTU issues with ping


Recently I have been dealing with MTU problems (again): assuming that jumbo frames are enabled on every infrastructure and software element, and then discovering that the assumption is wrong and some of them still use the standard Ethernet MTU (1500 bytes).

This post explains how to use ping, with precise control over the size of the packets sent, to find the element or elements that are not configured for jumbo frames.

Short introduction

MTU stands for Maximum Transmission Unit: the maximum number of bytes that can be sent in a single «packet». For Ethernet, the standard MTU is 1500 bytes (the number of application data bytes is lower, because the higher layer headers, such as the IP header and the TCP header, also have to fit in those 1500 bytes).

For layer 2 communication to occur correctly, all the elements involved (switches, network interfaces, software bridges, etc.) must use exactly the same MTU. There is no mechanism for discovering or negotiating it, so the same MTU must be set on all elements.
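
Since the MTU has to be checked element by element, a quick way to review a Linux host is to list the MTU of all its interfaces at once. The following is a minimal sketch, assuming a Linux system that exposes interfaces under /sys/class/net; the function name list_mtus is made up for this example:

```shell
#!/bin/sh
# list_mtus [DIR]
# Print "<interface> <mtu>" for every interface found under a sysfs-style
# directory. DIR defaults to /sys/class/net (Linux only, an assumption).
list_mtus() {
    lm_dir="${1:-/sys/class/net}"
    for lm_path in "$lm_dir"/*; do
        # Skip entries that do not expose a readable mtu file.
        [ -r "$lm_path/mtu" ] || continue
        printf '%s %s\n' "$(basename "$lm_path")" "$(cat "$lm_path/mtu")"
    done
}
```

Calling list_mtus without arguments on each host (including hypervisors running software bridges) makes a stray 1500 easy to spot. Physical switches have to be checked with their own management tools.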

Jumbo frames is the name used for Ethernet frames with an MTU above 1500 bytes. There is no standard size for jumbo frames, but 9000 bytes is commonly used. Because each frame carries more data, fewer frames and fewer headers are needed for the same transfer, which reduces per-packet overhead and usually improves throughput, so jumbo frames are a common configuration in data centers and on-premises infrastructure.
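
To put a rough number on that efficiency gain, the sketch below counts how many full-size packets are needed to move a given amount of data. It assumes TCP over IPv4 with 20-byte IP and 20-byte TCP headers and no options, which is a simplification; frames_needed is a name invented for this example:

```shell
#!/bin/sh
# frames_needed BYTES MTU
# Number of packets needed to carry BYTES of application data over TCP/IPv4,
# assuming 40 bytes of headers per packet (20 IP + 20 TCP, no options).
frames_needed() {
    fn_bytes=$1
    fn_mss=$(( $2 - 40 ))                       # data bytes per packet
    echo $(( (fn_bytes + fn_mss - 1) / fn_mss ))  # integer division, rounded up
}

frames_needed 1073741824 1500   # 1 GiB with the standard MTU: 735440
frames_needed 1073741824 9000   # 1 GiB with jumbo frames: 119838
```

Under these assumptions, jumbo frames need roughly six times fewer packets for the same transfer.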

It should be noted that a mismatched MTU is mainly a problem for layer 2 communication. Systems with different MTUs can normally communicate if they are in different networks, because communication between them then takes place at layer 3, where routers can fragment packets or report the smaller MTU back to the sender.

MTU issue

When only one of the L2 elements of a network is configured with the standard MTU of 1500 bytes and the rest are configured to use jumbo frames (9000 bytes) the following behavior is observed:

root@test1:~# ip a s dev eth0
2: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:b1:20:18 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.100.2/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:feb1:2018/64 scope link 
       valid_lft forever preferred_lft forever

root@test2:~# ip a s dev eth0
2: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:df:65:65 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.100.3/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fedf:6565/64 scope link 
       valid_lft forever preferred_lft forever

----

root@test1:~# ping -c2 192.168.100.3
PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
64 bytes from 192.168.100.3: icmp_seq=1 ttl=64 time=0.220 ms
64 bytes from 192.168.100.3: icmp_seq=2 ttl=64 time=0.070 ms

--- 192.168.100.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.070/0.145/0.220/0.075 ms

root@test1:~# ssh 192.168.100.3
(connection not established)

What happened? Ping shows that test2 is reachable from test1, but an SSH connection (or, in general, any TCP client/server connection) doesn’t work. This is the typical MTU issue, caused by the different MTUs set on test1 and test2. A default ping isn’t affected by the MTU because it sends a small «packet» of 84 bytes (including the IP and ICMP headers), but other protocols fail because they try to fill the whole MTU with data, and then the connection hangs.

Debugging the network MTU with ping

In a typical client/server connection, many packets of different sizes are sent, so it’s not the best way to debug the issue or to find the MTU of the network. Ping is better suited for this purpose because it lets you set the exact size of the packets sent and disable fragmentation. To send exactly 1500 bytes of payload in an Ethernet frame without fragmentation, the following command can be used:

root@test1:~# ping -c1 -M do -s 1472 192.168.100.3
PING 192.168.100.3 (192.168.100.3) 1472(1500) bytes of data.
1480 bytes from 192.168.100.3: icmp_seq=1 ttl=64 time=0.267 ms

--- 192.168.100.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.267/0.267/0.267/0.000 ms
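
Where:

  • -M do: prohibit fragmentation
  • -s 1472: 1472 bytes of ICMP data
  • ICMP header: 8 bytes
  • IP header: 20 bytes (usually; it can be larger if IP options are present)
  • 1480 bytes (1472 + 8) is the IP payload, and 1480 + 20 = 1500 bytes is the whole IP packet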
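
The arithmetic in the list above generalizes to any MTU: subtract the IP header and the 8-byte ICMP header to get the value for -s. A small sketch follows; the helper name ping_payload_for_mtu is invented here, and the default 20-byte IP header assumes IPv4 without options:

```shell
#!/bin/sh
# ping_payload_for_mtu MTU [IP_HEADER_BYTES]
# Value to pass to "ping -s" so that the resulting IP packet is exactly MTU bytes.
# IP_HEADER_BYTES defaults to 20 (IPv4 without options, an assumption).
ping_payload_for_mtu() {
    pp_ip_header=${2:-20}
    echo $(( $1 - pp_ip_header - 8 ))   # 8 bytes of ICMP header
}

ping_payload_for_mtu 1500      # 1472
ping_payload_for_mtu 9000      # 8972
ping_payload_for_mtu 1500 40   # 1452 (40-byte IPv6 header)
```

So testing a 9000-byte jumbo frame path means running ping with -M do -s 8972.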

Increasing the size of the packet by just one byte makes the ping fail:

root@test1:~# ping -c1 -M do -s 1473 192.168.100.3
PING 192.168.100.3 (192.168.100.3) 1473(1501) bytes of data.

--- 192.168.100.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
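
Instead of trying sizes by hand, the two probes above can be automated with a binary search between a size known to work and the largest size of interest. This is only a sketch: it assumes the Linux iputils ping (the -M do and -W options), IPv4 with a 20-byte IP header, and the names probe and find_path_mtu are invented for this example:

```shell
#!/bin/sh
# probe SIZE HOST
# Succeed if one unfragmented echo request with SIZE data bytes gets a reply.
probe() {
    ping -c1 -W1 -M do -s "$1" "$2" >/dev/null 2>&1
}

# find_path_mtu HOST [LOW_MTU] [HIGH_MTU]
# Binary search for the largest MTU between LOW_MTU and HIGH_MTU that gets a
# reply. Prints the MTU found, or returns 1 if even LOW_MTU does not work.
find_path_mtu() {
    fm_host=$1
    fm_low=${2:-576}
    fm_high=${3:-9000}
    # 28 = 20 bytes of IPv4 header + 8 bytes of ICMP header.
    probe $(( fm_low - 28 )) "$fm_host" || return 1
    # Invariant: fm_low is known to work, nothing above fm_high works.
    while [ "$fm_low" -lt "$fm_high" ]; do
        fm_mid=$(( (fm_low + fm_high + 1) / 2 ))
        if probe $(( fm_mid - 28 )) "$fm_host"; then
            fm_low=$fm_mid
        else
            fm_high=$(( fm_mid - 1 ))
        fi
    done
    echo "$fm_low"
}
```

In the scenario of this post, find_path_mtu 192.168.100.3 run from test1 would be expected to print 1500. The result is the smallest MTU along the path, not the element responsible for it, so the search still has to be repeated against each element to locate the misconfigured one.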

We can conclude that ping is a useful tool for troubleshooting an MTU issue: repeating the previous test against every element in the network reveals the MTU each one is using.
