Max connection resource problem
Moderators: TinCanTech
Forum rules
Please use the [oconf] BB tag for openvpn Configurations. See viewtopic.php?f=30&t=21589 for an example.
-
- OpenVpn Newbie
- Posts: 3
- Joined: Tue May 10, 2022 4:59 pm
Max connection resource problem
We have a large number of IoT devices (running Ubuntu) spread out in random places. They sit behind miscellaneous networks that we can't control, so to be able to manage them we have them open a VPN connection to two VPN servers (two for redundancy). To do maintenance, authorized users connect to a VPN server and from there on to the IoT device.
This works well most of the time, but as we grow we seem to be hitting some limit.
There are currently >5500 connections to each server, and sometimes it seems to just collapse with a ton of "TLS Error: TLS key negotiation failed to occur within 60 seconds". The connection count dives to a few hundred connections, then climbs back up as the clients retry.
The VPN servers are in the cloud on dedicated CPUs: quad-core with 8 GB of memory. Network traffic when idle seems to be <2MB/sec.
Whenever it has issues I of course check CPU, memory and I/O, and it seems like CPU might be the problem, because the (single-threaded, I know) openvpn process gets close to using a full CPU (18-23%). OK, so it's CPU then. But why is the CPU utilization more like 5-10% when it is working?
What can cause it to suddenly start dropping connections and then, without us doing anything, recover all the connections and go back to normal?
As we grow, what is a better solution for accessing each of these devices? I'm thinking of more load-balanced VPN servers with a "backbone" VPN network and some smarts to keep track of which VPN server each device is connected to.
What other, smarter solutions exist?
/ps
-
- OpenVPN Protagonist
- Posts: 11138
- Joined: Fri Jun 03, 2016 1:17 pm
-
- OpenVpn Newbie
- Posts: 3
- Joined: Tue May 10, 2022 4:59 pm
Re: Max connection resource problem
What I wonder most is what I can look at to get some stability in the short term.
Long term we are setting up redundant servers, but that will take a while to design, configure and implement, with a single backend admin network segment accessing multiple kiosk network segments.
Here is some info around the setup:
OS
> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
OpenVPN version
> openvpn --version
OpenVPN 2.4.7 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Mar 22 2022
library versions: OpenSSL 1.1.1f 31 Mar 2020, LZO 2.10
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <sales@openvpn.net>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto=yes enable_crypto_ofb_cfb=yes enable_debug=yes enable_def_auth=yes enable_dependency_tracking=no enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=yes enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_maintainer_mode=no enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_pedantic=no enable_pf=yes enable_pkcs11=yes enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_silent_rules=no enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=yes enable_werror=no enable_win32_dll=yes enable_x509_alt_username=yes with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
Server Config
port 443
proto tcp
dev tun
ca ca.crt
cert serverkiosk.crt
key serverkiosk.key
dh dh.pem
topology subnet
server 10.<subnet>.0.0 255.255.0.0
ifconfig-pool-persist /var/log/openvpn/ipp-kiosk.txt
push "route 172.<subnet>.0.0 255.255.255.0"
duplicate-cn
keepalive 10 120
tls-auth ta-kiosk.key 0
key-direction 0
cipher AES-256-CBC
auth SHA256
max-clients 16000
user nobody
group nogroup
persist-key
persist-tun
status /var/log/openvpn/openvpn-status-kiosk.log
verb 3
script-security 2
client-connect client_connect.sh
client-disconnect client_disconnect.sh
route-up route-up.sh
route-pre-down route-pre-down.sh
up up.sh
down down.sh
management localhost 6601
Client Config
client
dev tun
proto tcp
remote <remoteip>
resolv-retry infinite
nobind
persist-key
persist-tun
remote <remoteip>
cipher AES-256-CBC
auth SHA256
key-direction 1
verb 3
<ca>
...
</ca>
...
<cert>
...
</cert>
<tls-auth>
...
</tls-auth>
It's hard to get a good log when 5000+ BTMs are trying to connect at around the same time, many from the same IP. I extracted one retry here by grepping for the <client_ip>:<port> of one failure. This "TLS key negotiation failed" is a very frequent error, but left alone for a while it tends to recover, so it's not like it's a config error. Once recovered, it can work fine for some hours or weeks, only to collapse again at some point.
server log
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: TCP connection established with [AF_INET]<client_ip>:26422
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 MULTI: multi_create_instance called
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 Re-using SSL/TLS context
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 Control Channel MTU parms [ L:1623 D:1170 EF:80 EB:0 ET:0 EL:3 ]
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 Local Options String (VER=V4): 'V4,dev-type tun,link-mtu 1571,tun-mtu 1500,proto TCPv4_SERVER,keydir 0,cipher AES-256-CBC,auth SHA256,keysize 256,tls-auth,key-method 2,tls-server'
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TCP connection established with [AF_INET]174.252.81.85:8487
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TCPv4_SERVER link local: (not bound)
May 18 12:55:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:08 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 10 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:08 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TLS: Initial packet from [AF_INET]<client_ip>:26422, sid=47d26d81 56d5d97e
May 18 12:55:08 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TCPv4_SERVER WRITE [66] to [AF_INET]<client_ip> P_CONTROL_HARD_RESET_SERVER_V2 kid=0 pid=[ #1 ] [ 0 ] pid=0 DATA len=0
May 18 12:55:08 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 VERIFY OK: depth=1, CN=BA vpn-eu CA
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:55:26 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TCPv4_SERVER WRITE [117] to [AF_INET]<client_ip> P_CONTROL_V1 kid=0 pid=[ #10 ] [ 3 ] pid=4 DATA len=51
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TCPv4_SERVER READ [1093] from [AF_INET]<client_ip> P_CONTROL_V1 kid=0 pid=[ #11 ] [ ] pid=3 DATA len=1039
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 NOTE: --mute triggered...
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 1 variation(s) on previous 1 message(s) suppressed by --mute
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 Fatal TLS error (check_tls_errors_co), restarting
May 18 12:56:03 <vpn-server1> ovpn-serverkiosk[355274]: <client_ip>:26422 SIGUSR1[soft,tls-error] received, client-instance restarting
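The 60 seconds in the error above matches the default of OpenVPN's `hand-window` option, the time within which the TLS handshake must complete. As a short-term sketch only, assuming the handshakes fail because the single-threaded server is too busy to answer in time rather than because of packet loss, the window could be widened on both ends. The value 120 is an arbitrary illustration, not a tested recommendation:

```
# Server and client configs (each side enforces its own window).
# hand-window defaults to 60 seconds; 120 is an assumed example value.
hand-window 120
```

This only gives a loaded server more time per handshake; it does not reduce the work done when thousands of clients reconnect at once.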
-
- OpenVPN Protagonist
- Posts: 11138
- Joined: Fri Jun 03, 2016 1:17 pm
Re: Max connection resource problem
Why do you use TCP?
-
- OpenVpn Newbie
- Posts: 3
- Joined: Tue May 10, 2022 4:59 pm
Re: Max connection resource problem
It's hard enough to get out from all the random places with random firewalls that may block everything except tcp/80 and tcp/443.
-
- OpenVPN Protagonist
- Posts: 11138
- Joined: Fri Jun 03, 2016 1:17 pm
Re: Max connection resource problem
Do you know why the default for --auth is SHA1?
-
- OpenVPN Expert
- Posts: 685
- Joined: Tue May 01, 2012 9:30 pm
Re: Max connection resource problem
You could consider running multiple instances of the OpenVPN server so that it can handle more connected clients. On a cloud machine, think about running around 10 OpenVPN instances at the same time, so that each instance serves about 500 clients.
Running multiple instances reduces the load on each OpenVPN server process, which otherwise has to work very hard to service many clients at the same time.
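A minimal sketch of what one additional instance could look like, based on the server config posted earlier in the thread. The file name, port, address pool placeholder, management port and client limit are all assumptions for illustration. Only the directives that would have to differ between instances are shown; the certificate, cipher and script lines would be repeated from the original config:

```
# Hypothetical /etc/openvpn/server/kiosk2.conf (name assumed)
# Each instance needs its own listening port.
port 8443
proto tcp
dev tun
# Separate, non-overlapping address pool per instance (placeholder, as in the original).
server 10.<subnet2>.0.0 255.255.0.0
# Per-instance state files and management port so instances do not overwrite each other.
ifconfig-pool-persist /var/log/openvpn/ipp-kiosk2.txt
status /var/log/openvpn/openvpn-status-kiosk2.log
management localhost 6602
# Assumed per-instance cap.
max-clients 1000
```

Because the clients in this thread can only reach tcp/80 and tcp/443, something in front of the instances (a TCP load balancer or port redirection, not described in the thread) would have to spread connections on port 443 across the instance ports. Each instance is still a single-threaded process, so the benefit comes from spreading them over the available cores.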
-
- OpenVPN Expert
- Posts: 685
- Joined: Tue May 01, 2012 9:30 pm
Re: Max connection resource problem
Everything has its limits, and they show up in real-life situations like this one. Try the multi-instance approach, or buy more hardware to support the growing number of clients.