OpenVPN MTU detection bug

tedm
OpenVpn Newbie
Posts: 7
Joined: Sun May 16, 2021 4:30 pm

OpenVPN MTU detection bug

Post by tedm » Sun May 16, 2021 5:37 pm

I am running the following:
# openvpn --version
OpenVPN 2.5.2 arm-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on May 13 2021
library versions: OpenSSL 1.1.1k 25 Mar 2021, LZO 2.09
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <sales@openvpn.net>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto_ofb_cfb=yes enable_debug=no enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_eurephia=no enable_fast_install=yes enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_password_save=yes enable_pedantic=no enable_pf=yes enable_pkcs11=no enable_plugin_auth_pam=no enable_plugin_down_root=no enable_plugins=no enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=no enable_werror=no enable_win32_dll=yes enable_x509_alt_username=no with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
#

on Linux kernel 4.4 patchlevel 268

This is on two Netgear AC1450 routers (dd-wrt project)

We (users and developers of dd-wrt) have noticed that, ever since Linux kernel 2.6 and ever since the project upgraded to OpenVPN 2.5.2, OpenVPN seems unable to detect the MTU of the Ethernet interfaces. This causes it to use ridiculously large MTUs like 1550 and higher, with the result that UDP packets get trashed going through the VPN. TCP packets seem unaffected - likely due to advanced Path MTU Discovery - but UDP is.

Setting tun-mtu will cause OpenVPN to reduce the MTU it thinks the interfaces use. But this is a trial-and-error guessing process because different ciphers need different tun-mtu values.

Setting link-mtu to 28 below 1500 also causes OpenVPN to reduce the MTU but for whatever reason it seems to still trash UDP packets.

I've been doing the investigating on this mainly because I have a setup that is particularly sensitive to it. My setup is dd-wrt OpenVPN client and server routers, a VPN connecting one private network I control to another private network I control, and a FreePBX VoIP server on one of the private networks and 7 VoIP phone sets on the other network, 2 Polycoms and 2 Cisco phones.

With tun-mtu and link-mtu unset, allowing OpenVPN to autodiscover MTU, my phones will not stay registered to the PBX and OpenVPN will set up a link-mtu in excess of the physical MTU of the interfaces (gigabit Ethernet). With tun-mtu set to 1350 and AES-128-GCM/SHA256 configured, link-mtu is computed as 1472 and the phones then WILL stay registered. Clearly what is happening is that with the smaller tun-mtu, OpenVPN is properly fragmenting/reassembling larger registration keepalives or whatever traffic the PBX and the phones are using to stay registered.
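
The numbers above can be turned into a rough sizing rule. The sketch below is only an illustration, not OpenVPN's own calculation: it assumes the gap between link-mtu and tun-mtu is a fixed overhead for a given cipher/auth/protocol combination (122 bytes for the 1350/1472 pair reported above), and that the target is the 1472 bytes left on a 1500-byte link after a 20-byte IPv4 header and an 8-byte UDP header. The function names are made up for this example.

```python
# Rough sizing sketch. Assumption (not taken from the OpenVPN source):
# link-mtu = tun-mtu + a constant overhead for a fixed configuration.

IPV4_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8     # bytes


def overhead_from_observation(tun_mtu: int, link_mtu: int) -> int:
    """Per-packet overhead implied by one observed tun-mtu/link-mtu pair."""
    return link_mtu - tun_mtu


def max_tun_mtu(path_mtu: int, overhead: int) -> int:
    """Largest tun-mtu whose link-mtu still fits in one UDP datagram."""
    budget = path_mtu - IPV4_HEADER - UDP_HEADER
    return budget - overhead


# Values reported in this post: tun-mtu 1350 gave link-mtu 1472.
overhead = overhead_from_observation(tun_mtu=1350, link_mtu=1472)
print("overhead:", overhead)
print("largest tun-mtu for a 1500-byte path:", max_tun_mtu(1500, overhead))
print("link-mtu implied by tun-mtu 1400:", 1400 + overhead)
```

If the constant-overhead assumption holds, a different cipher would need its own observed pair before the same arithmetic applies.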

In addition to trashing UDP packets we have also found that "black holes" exist for certain packet sizes - attempts to send ICMP echo requests through the VPN for specific sized packets with the Don't Fragment bit set cause the packets to completely disappear - no responses back at all, not even the Packet Too Big ICMP messages that are supposed to come back. Setting tun-mtu corrects this issue.
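
The black-hole behavior described above can be looked for systematically by sweeping payload sizes with DF set and noting sizes that get no answer while larger ones still do. This is a rough sketch, assuming Linux iputils ping (its -c, -W, -M do and -s options); the function names are hypothetical and nothing here comes from the dd-wrt or OpenVPN projects.

```python
import subprocess
from typing import Dict, Iterable, List


def ping_command(host: str, payload: int) -> List[str]:
    """One echo request, DF set (-M do), 2 second reply timeout (-W 2)."""
    return ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host]


def probe(host: str, payload: int) -> bool:
    """True if ping exits 0, i.e. a reply came back for this payload size."""
    result = subprocess.run(
        ping_command(host, payload),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_gaps(results: Dict[int, bool]) -> List[int]:
    """Sizes that failed although some larger size succeeded."""
    largest_ok = max((size for size, ok in results.items() if ok), default=-1)
    return sorted(size for size, ok in results.items()
                  if not ok and size < largest_ok)


def sweep(host: str, sizes: Iterable[int]) -> List[int]:
    """Probe every size and return the ones that look like black holes."""
    return find_gaps({size: probe(host, size) for size in sizes})
```

Calling sweep() with the far-side address and something like range(1300, 1473, 4) would list the suspect sizes. A failure at every size above some threshold is an ordinary MTU limit rather than a black hole, which is why only failures below a larger working size are reported.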

I have no issues sending 1500 byte ICMP echo request packets with DF set from the public IPs on one site to the public IPs on the other. Both of my sites have subnets of static public IPs. One site uses a different provider than the other and traceroutes show a half dozen hops through public routers.

It took a LOT of time to figure out what was going on. There is a plethora of contradictory OpenVPN discussion and documentation on the Internet about this issue that is either flat-out baloney or applicable only to old versions of OpenVPN, which puts up lots of blind alleys to get stuck in. There are also no good real-world testing tools (other than using actual hardware phones and such).

Users are also upset that tun-mtu has to be set so low and are resistant to doing it. The vast majority of dd-wrt users are using OpenVPN in one direction only - as a client to go to a commercial VPN provider - and the providers themselves (apparently) don't understand the issue either; some trash large UDP packets themselves. Many also run older OpenVPN versions, and test results on those are all over the map.

And some OpenVPN documentation says to set tun-mtu to 1500, which is clearly wrong, since that would create a packet that is much larger than an MTU of 1500, and what would happen to it is undefined. It's also not understood whether OpenVPN would even pay attention to such a setting if it knew the actual interface MTU was also 1500.

Broadcom also has not written drivers for newer Linux kernel versions for much of their hardware, so moving to a newer Linux kernel IS NOT an option for most router platforms. Some platforms are still stuck on the 2.4 kernel, some on the 2.6 kernel and some on the 3 kernel. Many users run hardware that has restricted flash and nvram and so will not upgrade to the latest dd-wrt builds; they will use old builds with insecure versions of OpenVPN on them because they "work".

What I am seeking is some clarification on this issue:

1) Do the OpenVPN developers not really give a fig about this issue, or are they actually interested in us submitting bugs on it? The focus of the OpenVPN project does seem rather x86/Wintel these days.

2) Why is it that ifconfig shows the correct physical MTU and OpenVPN's autodiscovery seems unable to figure it out? Isn't the Linux ifconfig getting the MTU from the network driver? Since all programs in this environment run at the "root" level it seems there is no access security blocking the OpenVPN program from reading these values.

3) Is there some inherent issue with ARM gear that makes MTU discovery difficult to impossible? Or is it some inherent issue with older Linux kernels and newer OpenVPN versions? Or is the problem due to Broadcom and other makers of ARM gear who are uninterested in supporting newer Linux kernels on their older chips that are still out in the wild being used?

4) What is the "best" way to fix incorrect MTU detection of OpenVPN in the config file? Is it setting tun-mtu, or link-mtu or mssfix or something else?
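
On question 2, it may help to note that on Linux the interface MTU is also exposed as a plain file under /sys/class/net/<interface>/mtu, so any process can read it without special privileges. A minimal sketch, assuming sysfs is mounted in the usual place; the helper name and the interface name in the usage note are placeholders, and this says nothing about how OpenVPN itself derives its MTU values.

```python
from pathlib import Path


def interface_mtu(name: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the MTU the kernel reports for a network interface."""
    # The file holds a single decimal number followed by a newline.
    return int((Path(sysfs_root) / name / "mtu").read_text().strip())
```

For example, interface_mtu("eth0") should match the MTU that ifconfig prints for eth0, which gives a quick cross-check against the values in the OpenVPN log.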

Any discussion other than "switch over to an old PC for running OpenVPN" or other such Wintel-centric "advice" will be most welcome. Please remember that the entire world does not have ready access to a pile of free old PCs that won't satisfactorily run Windpig10.

Thanks!

TinCanTech
OpenVPN Protagonist
Posts: 11139
Joined: Fri Jun 03, 2016 1:17 pm

Re: OpenVPN MTU detection bug

Post by TinCanTech » Sun May 16, 2021 6:02 pm

OpenVPN has a long, long history of weird problems with MTU.

I cannot give you any definitive answers but the current preferred setting appears to be --tun-mtu 1400.
Which conflicts with the manual and other defaults, most notably --mssfix!

I don't think there is an easy answer ..

tedm
OpenVpn Newbie
Posts: 7
Joined: Sun May 16, 2021 4:30 pm

Re: OpenVPN MTU detection bug

Post by tedm » Sun May 16, 2021 6:33 pm

tun-mtu of 1400 with AES-128-GCM produces a link-mtu that exceeds 1472. When using that my phones will not stay registered so clearly it's too large for that cipher. It might be OK for CBC ciphers or Blowfish or ChaCha or something like that; I have not tested. The ARM CPU contains go-fast instructions for AES that clearly OpenVPN is using, which is why I'm sticking with that cipher.

I'll be happy to put much more time into testing IF there's any interest from the devs in fixing the problem. Otherwise it's just better to publicize the issue as much as possible to try to help users out who are getting bitten in the ass by weird problems caused by this. Currently in the dd-wrt project there's a lot of push from experienced users to get users running private VPNs to run WireGuard because the experienced users are fed up with chasing ghosts in OpenVPN. But a LOT of users are undoubtedly operating from networks in countries that do not believe in the right of privacy and OpenVPN is their only way to get safe access out. Don't forget there's entities in the world quite happy to see OpenVPN _not_ nail down MTU as it keeps it unreliable, thus encouraging people to run antique, easily-crackable ciphers and old OpenVPN versions.

TinCanTech
OpenVPN Protagonist
Posts: 11139
Joined: Fri Jun 03, 2016 1:17 pm

Re: OpenVPN MTU detection bug

Post by TinCanTech » Sun May 16, 2021 6:39 pm

tedm wrote (Sun May 16, 2021 6:33 pm):
"tun-mtu of 1400 with AES-128-GCM produces a link-mtu that exceeds 1472. When using that my phones will not stay registered so clearly it's too large for that cipher."

Then go lower.

This is why the options are there, so that you can use them to get OpenVPN working.

tedm wrote (Sun May 16, 2021 6:33 pm):
"Don't forget there's entities in the world quite happy to see OpenVPN _not_ nail down MTU as it keeps it unreliable, thus encouraging people to run antique, easily-crackable ciphers and old OpenVPN versions."

No doubt. But bear in mind that it's not OpenVPN's job to tackle that!

It is what it is and it is free; people are also free to choose an alternative.

tedm
OpenVpn Newbie
Posts: 7
Joined: Sun May 16, 2021 4:30 pm

Re: OpenVPN MTU detection bug

Post by tedm » Sat Nov 30, 2024 1:08 pm

I apologize for reviving an old thread - but - it appears in recent versions of OpenVPN that some of the MTU issues are fixed. I wrote:

"In addition to trashing UDP packets we have also found that "black holes" exist for certain packet sizes - attempts to send ICMP echo replies through the VPN for specific sized packets with the Don't Fragment bit set cause the packets to completely disappear - no responses back at all not even the Packet Too Big icmp messages that are supposed to come back. Setting tun-mtu corrects this issue."

It no longer appears to be the case that "black holes" exist.

But there is ONE issue that I think may be a bug, and I was wondering if anyone can explain it:

I have an OpenVPN VPN that's a gateway-to-gateway VPN running over TCP without tun-mtu set.

From a host on one side I can do the following:

tedm@beachserver:~$ ping -s 1472 -c1 -M do 172.16.1.16
PING 172.16.1.16 (172.16.1.16) 1472(1500) bytes of data.
1480 bytes from 172.16.1.16: icmp_seq=1 ttl=61 time=32.2 ms

--- 172.16.1.16 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 32.190/32.190/32.190/0.000 ms
tedm@beachserver:~$

As you can see, everything is working fine.

However, if I change this to:

tedm@beachserver:~$ ping -s 1473 -c1 -M do 172.16.1.16
PING 172.16.1.16 (172.16.1.16) 1473(1501) bytes of data.
ping: local error: message too long, mtu=1500

--- 172.16.1.16 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

tedm@beachserver:~$

The error message that comes back says that MTU is set to 1500. However, it's NOT. The packet is getting rejected at 1473, because 1500 - 20 (OpenVPN TCP header size) - 8 (ICMP header size) = 1472.

Why is the ICMP error message that's being returned to the sender saying MTU is 1500 when MTU is actually 1472?
(note that not all versions of ping return the second ICMP error message, for example the ping under Windows merely outputs:

C:\Users\tedm>ping -f -l 1473 172.16.1.16

Pinging 172.16.1.16 with 1473 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Ping statistics for 172.16.1.16:
Packets: Sent = 2, Received = 0, Lost = 2 (100% loss),
Control-C
^C
C:\Users\tedm> )
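
One way to read the two Linux ping runs above: ping's -s value is the ICMP payload size, and the number in parentheses on its first output line (1500, then 1501) is the resulting IPv4 packet size once the 8-byte ICMP header and a 20-byte IPv4 header are added. The "local error" line is printed by the sending host itself rather than returned from the far end. The sketch below only reproduces that arithmetic; the function name is made up, and it does not model anything OpenVPN adds inside the tunnel.

```python
ICMP_HEADER = 8    # bytes, ICMP echo header
IPV4_HEADER = 20   # bytes, IPv4 header without options


def ip_packet_size(ping_payload: int) -> int:
    """Size of the IPv4 packet ping builds for a given -s payload."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


for payload in (1472, 1473):
    size = ip_packet_size(payload)
    verdict = "fits within" if size <= 1500 else "exceeds"
    print(f"-s {payload}: {size}-byte IPv4 packet {verdict} a 1500-byte MTU")
```

Under that reading, 1472 is the largest payload that fits a 1500-byte MTU on the sending interface, so the mtu=1500 in the error and the cutoff at 1473 would be consistent with each other.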
