Network Stalls

voiceover · Post by **voiceover** » Sat Oct 30, 2010 1:45 pm

For years, I have had about 30 OpenVPN connections running successfully at any given moment, except for connections with one particular ISP, which experience periodic network stalls over the VPN while non-VPN traffic flows normally.

In the following setup, the VPN end-point running in server mode is located at a hosting provider and is using an official IP address. The VPN end-point running in client mode is located on a DSL router (linux host) which is using an official IP address.

If I do a:

Code:

# ping -s 1472 -M do 172.18.18.6

from the server to the DSL connection then the pings come back. Using:

Code:

# ping -s 1473 -M do 172.18.18.6

produces messages:

Code:

From 172.18.18.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)

These pings are going over the VPN from the server to the DSL connection. Pinging the DSL connection from the server *without* going over the VPN connection produces the same behaviour (|ping|==1472 -> success; |ping|==1473 -> fail).
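The 1472/1473 boundary is what a 1500-byte MTU predicts: the value given to `ping -s` is the ICMP payload size, and the packet on the wire adds an 8-byte ICMP header and a 20-byte IPv4 header. A minimal sketch of that arithmetic, assuming IPv4 without IP options (the function names are made up for illustration):

```python
# Standard header sizes, assuming IPv4 without options.
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmentation(payload: int, mtu: int) -> bool:
    """True if an ICMP echo with this payload fits in the MTU when DF is set."""
    return payload + ICMP_HEADER + IPV4_HEADER <= mtu


print(max_ping_payload(1500))                  # 1472
print(fits_without_fragmentation(1472, 1500))  # True
print(fits_without_fragmentation(1473, 1500))  # False
```

So both results above are consistent with a plain 1500-byte MTU on the path being tested, inside and outside the tunnel.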

If I set up a regular (|icmp|==64) ping going from the server to the DSL connection, all pings are replied to in less than 15ms. At the same time, I run a ping (|icmp|==64) from the server to the DSL connection going over the VPN connection. Most pings are replied to in 17ms, but there are periods when several pings are lost or stalled significantly while, at the same time, the 'regular' ping (which is not going over the VPN connection) simply continues without any stalls or lost replies.

This is the ping which is not going over the VPN from the server to the DSL connection:

Code:

64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=75 ttl=61 time=12.6 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=76 ttl=61 time=12.5 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=77 ttl=61 time=12.5 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=78 ttl=61 time=12.6 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=79 ttl=61 time=12.5 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=80 ttl=61 time=12.6 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=81 ttl=61 time=12.4 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=82 ttl=61 time=12.4 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=83 ttl=61 time=12.6 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=84 ttl=61 time=12.3 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=85 ttl=61 time=12.8 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=86 ttl=61 time=12.3 ms
64 bytes from dsl.hostname (dsl.ipa.ddr.ess): icmp_seq=87 ttl=61 time=13.0 ms

and the beat goes on...

This is the ping from the same server to the same DSL connection going over the VPN connection:

Code:

64 bytes from 172.18.18.6: icmp_seq=76 ttl=64 time=14.4 ms
64 bytes from 172.18.18.6: icmp_seq=77 ttl=64 time=14.5 ms
64 bytes from 172.18.18.6: icmp_seq=78 ttl=64 time=14.5 ms
64 bytes from 172.18.18.6: icmp_seq=79 ttl=64 time=14.6 ms
64 bytes from 172.18.18.6: icmp_seq=82 ttl=64 time=1013 ms
64 bytes from 172.18.18.6: icmp_seq=80 ttl=64 time=5017 ms
64 bytes from 172.18.18.6: icmp_seq=87 ttl=64 time=1015 ms
64 bytes from 172.18.18.6: icmp_seq=86 ttl=64 time=3016 ms
64 bytes from 172.18.18.6: icmp_seq=83 ttl=64 time=9020 ms
64 bytes from 172.18.18.6: icmp_seq=92 ttl=64 time=1016 ms
64 bytes from 172.18.18.6: icmp_seq=90 ttl=64 time=4018 ms
64 bytes from 172.18.18.6: icmp_seq=96 ttl=64 time=1014 ms

and the dropping or stalling continues for anywhere between 10s and several minutes...

Both pings were started at the same time.
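To put numbers on a stall like the one in the VPN ping output, the replies can be checked for gaps, reordering and long round-trip times. The following is only a rough sketch: the `analyse` helper and the 1000 ms stall threshold are assumptions for illustration, and "missing" only means "not seen within this excerpt", since a delayed reply may still arrive later.

```python
import re

# Excerpt of the VPN ping output quoted above.
VPN_PING = """\
64 bytes from 172.18.18.6: icmp_seq=76 ttl=64 time=14.4 ms
64 bytes from 172.18.18.6: icmp_seq=77 ttl=64 time=14.5 ms
64 bytes from 172.18.18.6: icmp_seq=78 ttl=64 time=14.5 ms
64 bytes from 172.18.18.6: icmp_seq=79 ttl=64 time=14.6 ms
64 bytes from 172.18.18.6: icmp_seq=82 ttl=64 time=1013 ms
64 bytes from 172.18.18.6: icmp_seq=80 ttl=64 time=5017 ms
64 bytes from 172.18.18.6: icmp_seq=87 ttl=64 time=1015 ms
64 bytes from 172.18.18.6: icmp_seq=86 ttl=64 time=3016 ms
64 bytes from 172.18.18.6: icmp_seq=83 ttl=64 time=9020 ms
64 bytes from 172.18.18.6: icmp_seq=92 ttl=64 time=1016 ms
64 bytes from 172.18.18.6: icmp_seq=90 ttl=64 time=4018 ms
64 bytes from 172.18.18.6: icmp_seq=96 ttl=64 time=1014 ms
"""

# Matches the sequence number and round-trip time of one reply line.
REPLY = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def analyse(ping_output, stall_ms=1000.0):
    """Return (missing, late, stalled) for a block of ping reply lines.

    missing: sequence numbers between the lowest and highest seen with no reply
    late:    replies printed after a reply with a higher sequence number
    stalled: (seq, rtt_ms) pairs whose round-trip time is at least stall_ms
    """
    replies = [(int(m.group(1)), float(m.group(2)))
               for m in REPLY.finditer(ping_output)]
    if not replies:
        return [], [], []
    seqs = [seq for seq, _ in replies]
    seen = set(seqs)
    missing = [s for s in range(min(seqs), max(seqs) + 1) if s not in seen]
    late, highest = [], -1
    for seq in seqs:
        if seq < highest:
            late.append(seq)
        else:
            highest = seq
    stalled = [(seq, rtt) for seq, rtt in replies if rtt >= stall_ms]
    return missing, late, stalled


missing, late, stalled = analyse(VPN_PING)
print(missing)       # [81, 84, 85, 88, 89, 91, 93, 94, 95]
print(late)          # [80, 86, 83, 90]
print(len(stalled))  # 8
```

Running the same function over the non-VPN ping output should give three empty lists, which makes the two captures easy to compare side by side.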

This is the server config:

Code:

cd /etc/openvpn/config_dir
local off.ici.al.ip
port 29302
proto udp
dev tun7
hand-window 240
ca ca.crt
cert HS.crt
key HS.key
dh dh2048.pem
; tun
server 172.18.18.0 255.255.255.0
route 172.18.17.0 255.255.255.0
push "route 172.18.21.0 255.255.255.0"
push "dhcp-option DNS 172.18.21.252"
client-config-dir ccd
keepalive 10 180
cipher AES-256-CBC
comp-lzo no
max-clients 128
persist-key
persist-tun
ifconfig-pool-persist /var/run/openvpn/HS-ipp.txt
status /var/run/openvpn/HS-openvpn-status.log
verb 3
mute 13

and this is the client config:

Code:

cd /etc/openvpn/client_conf
client
dev tun5
proto udp
remote off.ici.al.ip 29302
resolv-retry infinite
nobind
user nobody
group nobody
persist-key
persist-tun
ca ca.crt
cert DSL_1.crt
key DSL_1.key
cipher AES-256-CBC
comp-lzo
verb 3

Note that there are several clients connecting to this server from several different ISPs. All of them have run for years without any issues whatsoever, while one client on one particular ISP experiences these stalls.

I don't see any special messages logged. If I use "verb 6", then on the DSL connection I see the UDP packets come in, get transferred/unpacked to ICMP, receive an ICMP response which is transferred/packed to UDP and sent back over the DSL link. IIRC, the UDP packets are 133 octets in size.

I've played with mssfix, tun-mtu and fragment but to no avail. I've run the same VPN over (non-preferable) TCP but the behaviour is more or less the same.

Should I accuse my ISP of traffic shaping of some sort?

tommyj27 · Post by **tommyj27** » Mon Nov 08, 2010 7:06 pm

The symptoms you are describing do not indicate any sort of MTU issue. My initial thought while reading your post was also poorly designed QoS at the ISP or modem level. The major difference between your two pings is that the ISP sees them as different protocols: the "outside" ping is ICMP and the "inside" ping is UDP. It may be that some QoS somewhere is giving higher priority to ICMP than to UDP, so when some burst of traffic happens, queues fill and response time degrades.

Another way you might troubleshoot this is by using mtr. If you've never used it before, mtr is a great little traceroute program that continually runs a traceroute and updates response times. The big thing that may be useful to you is that it can send ICMP or UDP packets. I would try running two "outside" mtr sessions in parallel, one using each traffic type. Watch for latency spikes on the UDP test and see if there is a corresponding spike for ICMP. For fun, you might even try running a third concurrent mtr test on the "inside" of the tunnel.
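One way to set up the two parallel "outside" runs is sketched below. It assumes an mtr build that supports `--udp`, uses mtr's report mode rather than the interactive display, and takes `dsl.hostname` as the placeholder host from the ping output earlier in the thread; `build_mtr_commands` and `run_parallel` are names invented for this example.

```python
import subprocess


def build_mtr_commands(host, cycles=60):
    """Build one ICMP and one UDP mtr invocation against the same host."""
    base = ["mtr", "--report", "--report-cycles", str(cycles), "--no-dns"]
    return {
        "icmp": base + [host],           # mtr probes with ICMP echo by default
        "udp": base + ["--udp", host],   # --udp switches the probes to UDP
    }


def run_parallel(commands):
    """Start every mtr run at once and return the reports keyed by name."""
    procs = {name: subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
             for name, argv in commands.items()}
    return {name: proc.communicate()[0] for name, proc in procs.items()}


# Show the command lines without running them.
for name, argv in build_mtr_commands("dsl.hostname").items():
    print(name, " ".join(argv))
```

Calling `run_parallel(build_mtr_commands("dsl.hostname"))` on the server would then produce two reports covering the same time window. If the UDP report shows much higher worst-case latency or loss than the ICMP one, that would point towards the two protocols being treated differently somewhere on the path.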

voiceover · Post by **voiceover** » Tue Nov 09, 2010 10:09 am

tommyj27 wrote: The symptoms you are describing do not indicate any sort of MTU issue. My initial thought while reading your post was also poorly designed QoS at the ISP or modem level. The major difference between your two pings is that the ISP sees them as different protocols: the "outside" ping is ICMP and the "inside" ping is UDP. It may be that some QoS somewhere is giving higher priority to ICMP than to UDP, so when some burst of traffic happens, queues fill and response time degrades.

Traffic shaping at the ISP level is exactly what I'm inferring from this behaviour. The (DSL) modem is running in bridge mode and there's a (linux) router behind the DSL modem which is not doing any form of QoS or otherwise.

I've asked the ISP whether they do traffic shaping and they've admitted that they prioritize voip over all other traffic. This by itself doesn't explain the behaviour of losing udp while icmp is completely lossless. I suspect that the ISP's QoS tables involve more than partitioning voip from non-voip traffic.

Moreover, the ISP uses a network provided by a separate network provider. The ISP will not confirm that the underlying network applies no QoS of its own.

I've switched ISPs in the meantime. Regrettably, the new ISP uses the same underlying network as the old ISP and the behaviour has been the same.

tommyj27 wrote: Another way you might troubleshoot this is by using mtr. If you've never used it before, mtr is a great little traceroute program that continually runs a traceroute and updates response times. The big thing that may be useful to you is that it can send ICMP or UDP packets. I would try running two "outside" mtr sessions in parallel, one using each traffic type. Watch for latency spikes on the UDP test and see if there is a corresponding spike for ICMP. For fun, you might even try running a third concurrent mtr test on the "inside" of the tunnel.

Thanks. I'm going to try this out.

tommyj27 · Post by **tommyj27** » Tue Nov 09, 2010 6:04 pm

The DSL modem could conceivably be doing QoS even in bridge mode. IIRC, linux is capable of that (I assume the modem is running some sort of embedded linux). Is it possible to try a different DSL modem, or are there any firmware updates available for the one you have in place?

If you don't mind my asking, what ISPs are you dealing with? Usually the big guys get QoS right, but I've seen smaller ones get it completely arse-backwards.

Assuming they are indeed giving high priority to voice traffic, perhaps (as a last resort sort of band-aid) you could make your openvpn tunnel look like voice traffic. Unless they're doing some sort of deep packet inspection, I'm guessing that they are classifying voice traffic by UDP ports. Port 5060 is used for SIP signalling, and RTP generally uses random, high-numbered UDP ports. You might be able to fool their QoS by using an odd port number for the server, or do some iptables trickery on the openvpn server to redirect the port.

OpenVPN Support Forum
