OpenVPN MTU detection bug

This forum is for admins who are looking to build or expand their OpenVPN setup.

Moderators: TinCanTech

tedm
OpenVpn Newbie
Posts: 6
Joined: Sun May 16, 2021 4:30 pm

OpenVPN MTU detection bug

Post by tedm » Sun May 16, 2021 5:37 pm

I am running the following:
# openvpn --version
OpenVPN 2.5.2 arm-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on May 13 2021
library versions: OpenSSL 1.1.1k 25 Mar 2021, LZO 2.09
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <sales@openvpn.net>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto_ofb_cfb=yes enable_debug=no enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_eurephia=no enable_fast_install=yes enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_password_save=yes enable_pedantic=no enable_pf=yes enable_pkcs11=no enable_plugin_auth_pam=no enable_plugin_down_root=no enable_plugins=no enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=no enable_werror=no enable_win32_dll=yes enable_x509_alt_username=no with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
#

on Linux kernel 4.4 patchlevel 268

This is on two Netgear AC1450 routers (dd-wrt project)

We (users and developers of dd-wrt) have noticed that ever since Linux kernel 2.6, and ever since the project upgraded to OpenVPN 2.5.2, OpenVPN seems unable to detect the MTU of the Ethernet interfaces. This causes it to use ridiculously large MTUs like 1550 and higher, with the result that UDP packets get trashed going through the VPN. TCP packets seem unaffected, likely due to advanced Path MTU Discovery, but UDP packets are not.

Setting tun-mtu will cause OpenVPN to reduce the MTU it thinks the interfaces use. But this is a trial-and-error guessing process, because different ciphers need different tun-mtu values.

Setting link-mtu to 1472 (28 below 1500) also causes OpenVPN to reduce the MTU, but for whatever reason it still seems to trash UDP packets.
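
For what it's worth, the 28 bytes here match the usual IPv4-over-UDP header arithmetic: a 20-byte IPv4 header (assuming no IP options) plus the 8-byte UDP header. A minimal shell sketch of that calculation follows. The function name `udp_payload_budget` is made up for illustration and is not an OpenVPN or dd-wrt tool.

```shell
#!/bin/sh
# Largest UDP payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no IP options) and the 8-byte UDP header.
udp_payload_budget() {
    # $1 = interface or path MTU in bytes
    echo $(( $1 - 20 - 8 ))
}

udp_payload_budget 1500   # prints 1472
```

With IPv6 as the transport the fixed header is 40 bytes rather than 20, so the budget would be smaller still.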

I've been doing the investigating on this mainly because I have a setup that is particularly sensitive to it. My setup is dd-wrt OpenVPN client and server routers, with a VPN connecting one private network I control to another private network I control. There is a FreePBX VoIP server on one of the private networks and 7 VoIP phone sets on the other network, 2 Polycoms and 2 Cisco phones.

With tun-mtu and link-mtu unset, allowing OpenVPN to autodiscover the MTU, my phones will not stay registered to the PBX, and OpenVPN will set up a link-mtu in excess of the physical MTU of the interfaces (gigabit Ethernet). With tun-mtu set to 1350 and AES-128-GCM/SHA256 configured, link-mtu is computed as 1472 and the phones then WILL stay registered. Clearly what is happening is that with the smaller tun-mtu, OpenVPN is properly fragmenting/reassembling larger registration keepalives or whatever traffic the PBX and the phones are using to stay registered.

In addition to trashing UDP packets, we have also found that "black holes" exist for certain packet sizes. Attempts to send ICMP echo requests of specific sizes through the VPN with the Don't Fragment bit set cause the packets to disappear completely: no responses come back at all, not even the Packet Too Big ICMP messages that are supposed to come back. Setting tun-mtu corrects this issue.
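
A sweep like the one described above can be sketched as a small shell script. This is only a sketch under assumptions: it assumes the iputils `ping` (whose `-M do` option sets Don't Fragment; the BusyBox ping shipped on many router builds may not have it), IPv4 with no IP options, and a placeholder peer address of 192.0.2.1. The function names are hypothetical. It prints the commands rather than running them.

```shell
#!/bin/sh
# ICMP echo payload size that yields an IPv4 packet of exactly $1 bytes
# (20-byte IPv4 header without options + 8-byte ICMP header).
icmp_payload_for() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) one DF-set ping per packet size in a range.
# $1 = host, $2 = first packet size, $3 = last packet size, $4 = step
df_sweep_commands() {
    size=$2
    while [ "$size" -le "$3" ]; do
        echo "ping -M do -c 1 -W 2 -s $(icmp_payload_for "$size") $1"
        size=$(( size + $4 ))
    done
}

# 192.0.2.1 is a documentation placeholder, not a real peer.
df_sweep_commands 192.0.2.1 1300 1500 100
```

Piping the output to `sh` would actually run the probes. A size that gets neither a reply nor an ICMP error back is a candidate black hole.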

I have no issues sending 1500-byte ICMP echo request packets with DF set from the public IPs on one site to the public IPs on the other. Both of my sites have subnets of static public IPs. One site uses a different provider than the other, and traceroutes show a half dozen hops through public routers.

It took a LOT of time to figure out what was going on. There is a plethora of contradictory OpenVPN discussion and documentation on the Internet on this issue that is either flat-out baloney or applicable only to old versions of OpenVPN, and it puts up lots of blind alleys to get stuck in. There are also no good real-world testing tools (other than using actual hardware phones and such). Users are also upset that tun-mtu has to be set so low and are resistant to doing it. The vast majority of dd-wrt users are using OpenVPN in one direction only, as a client to a commercial VPN provider, and the providers themselves also (apparently) don't understand the issue; some trash large UDP packets themselves. Many also run older OpenVPN versions, and test results and behavior are all over the map in those instances. And some OpenVPN documentation says to set tun-mtu to 1500, which is clearly wrong, since that would create a packet that is much larger than an MTU of 1500, and what would happen to it is undefined. It's also not understood whether OpenVPN would even pay attention to such a setting if it knew the actual interface MTU was also 1500.

Broadcom also has not written drivers for newer Linux kernel versions for much of their hardware, so moving to a newer Linux kernel IS NOT an option for most router platforms. Some platforms are still stuck on the 2.4 kernel, some on the 2.6 kernel, and some on the 3.x kernel. Many users run hardware that has restricted flash and nvram and so will not upgrade to the latest dd-wrt builds; they will use old builds with insecure versions of OpenVPN on them because they "work".

What I am seeking is some clarification on this issue:

1) Do the OpenVPN developers not really give a fig about this issue, or are they actually interested in us submitting bugs on it? The focus of the OpenVPN project does seem rather x86/Wintel these days.

2) Why is it that ifconfig shows the correct physical MTU and OpenVPN's autodiscovery seems unable to figure it out? Isn't the Linux ifconfig getting the MTU from the network driver? Since all programs in this environment run at the "root" level it seems there is no access security blocking the OpenVPN program from reading these values.

3) Is there some inherent issue with ARM gear that makes MTU discovery difficult to impossible? Or is it some inherent issue with older Linux kernels and newer OpenVPN versions? Or is the problem due to Broadcom and other makers of ARM gear who are uninterested in supporting newer Linux kernels on their older chips that are still out in the wild being used?

4) What is the "best" way to fix incorrect MTU detection of OpenVPN in the config file? Is it setting tun-mtu, or link-mtu or mssfix or something else?

Any discussion other than "switch over to an old PC for running OpenVPN" or other such Wintel-centric "advice" will be most welcome. Please remember that the entire world does not have ready access to a pile of free old PCs that won't satisfactorily run Windpig10.

Thanks!

TinCanTech
OpenVPN Protagonist
Posts: 11137
Joined: Fri Jun 03, 2016 1:17 pm

Re: OpenVPN MTU detection bug

Post by TinCanTech » Sun May 16, 2021 6:02 pm

OpenVPN has a long, long history of weird problems with MTU.

I cannot give you any definitive answers, but the current preferred setting appears to be --tun-mtu 1400,
which conflicts with the manual and other defaults, most notably --mssfix!

I don't think there is an easy answer ..

tedm
OpenVpn Newbie
Posts: 6
Joined: Sun May 16, 2021 4:30 pm

Re: OpenVPN MTU detection bug

Post by tedm » Sun May 16, 2021 6:33 pm

tun-mtu of 1400 with AES-128-GCM produces a link-mtu that exceeds 1472. When using that my phones will not stay registered so clearly it's too large for that cipher. It might be OK for CBC ciphers or Blowfish or ChaCha or something like that; I have not tested. The ARM CPU contains go-fast instructions for AES that OpenVPN is clearly using, which is why I'm sticking with that cipher.
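
The relationship being tested here can be written down as simple arithmetic, assuming link-mtu is tun-mtu plus a fixed per-packet OpenVPN overhead and that the link must fit in 1500 minus 28 bytes of IPv4/UDP headers. The overhead value of 80 below is a made-up placeholder, not a measured figure for AES-128-GCM, and the function names are hypothetical too.

```shell
#!/bin/sh
# Largest tun-mtu whose encapsulated packet still fits in one IPv4/UDP packet.
# $1 = path MTU, $2 = per-packet OpenVPN overhead in bytes (an assumed input:
# it depends on cipher, auth, and compression settings).
max_tun_mtu() {
    echo $(( $1 - 20 - 8 - $2 ))
}

# Succeeds (exit 0) when tun-mtu $1 plus overhead $2 fits within path MTU $3.
fits_path() {
    [ $(( $1 + $2 )) -le $(( $3 - 20 - 8 )) ]
}

# 80 is a placeholder overhead for illustration only.
max_tun_mtu 1500 80        # prints 1392
fits_path 1400 80 1500 && echo "1400 fits" || echo "1400 does not fit"
```

Substituting the overhead that a given build actually reports for its cipher would turn the trial-and-error search for tun-mtu into a single subtraction.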

I'll be happy to put much more time into testing IF there's any interest from the devs in fixing the problem. Otherwise it's just better to publicize the issue as much as possible to try to help users out who are getting bitten in the ass by weird problems caused by this. Currently in the dd-wrt project there's a lot of push from experienced users to get users running private VPNs to run WireGuard, because the experienced users are fed up with chasing ghosts in OpenVPN. But a LOT of users are undoubtedly operating from networks in countries that do not believe in the right of privacy, and OpenVPN is their only way to get safe access out. Don't forget there's entities in the world quite happy to see OpenVPN _not_ nail down MTU as it keeps it unreliable, thus encouraging people to run antique, easily-crackable ciphers and old OpenVPN versions.

TinCanTech
OpenVPN Protagonist
Posts: 11137
Joined: Fri Jun 03, 2016 1:17 pm

Re: OpenVPN MTU detection bug

Post by TinCanTech » Sun May 16, 2021 6:39 pm

tedm wrote (Sun May 16, 2021 6:33 pm):
"tun-mtu of 1400 with AES-128-GCM produces a link-mtu that exceeds 1472. When using that my phones will not stay registered so clearly it's too large for that cipher."

Then go lower.

This is why the options are there, so that you can use them to get OpenVPN working.

tedm wrote (Sun May 16, 2021 6:33 pm):
"Don't forget there's entities in the world quite happy to see OpenVPN _not_ nail down MTU as it keeps it unreliable, thus encouraging people to run antique, easily-crackable ciphers and old OpenVPN versions."

No doubt. But bear in mind that it's not OpenVPN's job to tackle that!

It is what it is and it is free; people are also free to choose an alternative.
