This morning I woke up to the sound of my monitoring system’s notifications.
We have a pretty simple setup:
- web servers connect to the vpn server
- the vpn server connects to the internal servers
- web and vpn are hosted for our customer by a cloud provider
- internal servers are hosted at our customer’s datacenter
This morning then, around 7:30AM, a group of technicians decided to make a major change in the network configuration of all machines linked to our customer’s subscription at the cloud provider.
This change was supposed to be unrelated to our innocent boxes, but it turns out that after this re-configuration, we were in a funny situation:
- web could no longer connect to vpn
- vpn could no longer connect to web either
- but web and vpn could connect / be connected from anywhere else on the Internet.
If you can’t take the direct road …
As expected, our customer was not happy about the situation, and we were given until the end of the day to make it work again, no matter what.
Recap of the situation:
- we have about 20 web servers that are on a 10.10.0.0/16 network
- the vpn machine is physically wired to both the Internet and the internal customer network, so we have to use it.
- the openvpn network running on vpn has the 10.30.0.0/16 range
- the internal network has a lot of subnets of various sizes (most are /16)
We decided to spin up another server to go from the web network to the newvpn network, then from newvpn to vpn, and from there we’re back in business.
It’s quite a stretch, but we had an almost-working situation that just needed a little nudge.
For connecting newvpn to vpn, we chose WireGuard instead of using openvpn again, since there was only going to be one peer in the network. Its website says it is not yet production ready, but it is currently being reviewed for integration directly into the Linux kernel, and past experience proved it resilient enough for our usage.
WireGuard under Centos7 caveats
WireGuard comes with an abundance of packages to install from, and this was a treat since vpn runs Ubuntu while the rest of our infrastructure runs Centos7.
[root@newvpn]# curl -Lo /etc/yum.repos.d/wireguard.repo https://copr.fedorainfracloud.org/coprs/jdoss/wireguard/repo/epel-7/jdoss-wireguard-epel-7.repo
[root@newvpn]# yum install epel-release
[root@newvpn]# yum install kernel kernel-headers dkms
[root@newvpn]# yum install wireguard-dkms wireguard-tools
Speaking of which, I encountered an annoying issue right after installing it:
[root@newvpn]# ip link add dev wg0 type wireguard
RTNETLINK answers: Operation not supported
What did I miss? Was there some Centos-specific incantation that I overlooked? No, I did all the steps as described: I had installed the kernel, the kernel headers, dkms, etc.
But still, the wireguard kernel module was not being found, as modprobe would confirm:
[root@newvpn]# modprobe wireguard
modprobe: FATAL: Module wireguard not found.
Then an old proverb came to mind!
[root@newvpn]# reboot now
...
[me@newvpn]$ modprobe wireguard
modprobe: ERROR: could not insert 'wireguard': Operation not permitted
# oops! but encouraging!
[root@newvpn]# modprobe wireguard
[root@newvpn]#
Et voilà!
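A plausible explanation for why the reboot helped, though I did not verify it, is that the module had only been built for the freshly installed kernel and not for the one that was still running. The sketch below checks whether a wireguard module file exists for a given kernel release. The function name has_wireguard_module and its second parameter are made up for this example; /lib/modules is the usual location of kernel modules.

```shell
# Succeeds (exit status 0) if a wireguard module file exists for a kernel release.
# $1: kernel release, defaults to the running kernel (uname -r)
# $2: modules root, defaults to /lib/modules (hypothetical parameter, eases testing)
has_wireguard_module() (
  release="${1:-$(uname -r)}"
  root="${2:-/lib/modules}"
  # Modules may be stored compressed, hence the trailing wildcard.
  found=$(find "$root/$release" -name 'wireguard.ko*' 2>/dev/null | head -n 1)
  [ -n "$found" ]
)

# Example: list every installed kernel and whether it has the module.
# for dir in /lib/modules/*; do
#   rel=$(basename "$dir")
#   if has_wireguard_module "$rel"; then
#     echo "$rel: module present"
#   else
#     echo "$rel: no module"
#   fi
# done
```

If the running kernel reported by uname -r has no module while a newer installed kernel does, a reboot into the newer kernel is the simplest way out.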
Setting up WireGuard
The rest of the setup went pretty much like what is described in the quickstart, so I’ll just post edited versions of my configurations here:
On vpn server
The first step was to stop the openvpn instance that was running on vpn, to avoid further confusion, and to install WireGuard.
[root@vpn]# service openvpn stop
In /etc/wireguard/wg0.conf
[Interface]
Address = 192.168.1.1/32
ListenPort = 3000
PrivateKey = ABCDEFG
[Peer]
PublicKey = KLMNOP
Endpoint = x.x.x.x:7000
AllowedIPs = 192.168.1.2/32
This is pretty clear: we declare a new interface called wg0 (thanks to the name of the file) that will be serving WireGuard over UDP port 3000.
This end of the connection will be associated with the IP 192.168.1.1/32.
iptables rules
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE
This means “whatever comes out of WireGuard and its weird-looking IP range, rewrite its source address to vpn’s own and send it where it wants to go”.
in /etc/sysctl.conf
net.ipv4.conf.all.proxy_arp=1
net.ipv4.ip_forward=1
On newvpn server
I am not going to describe how to set up openvpn here, since there is already an abundance of documentation and blog posts on the subject. I will just share the bits of configuration that are relevant for this example.
in /etc/wireguard/wg0.conf
[Interface]
Address = 192.168.1.2/24
ListenPort = 7000
PrivateKey = WXYZ
[Peer]
PublicKey = QRSTUV
Endpoint = y.y.y.y:3000
AllowedIPs = 192.168.1.1/32, 10.11.0.0/16, 10.45.0.0/16, 171.13.0.0/16, 10.32.0.0/16, 20.115.18.214/32, 20.70.0.0/16, 10.66.68.0/24
This is the important part: WireGuard can take care of setting up all your routes for you, as long as you declare which ranges are reachable through which peer in the interface configuration file.
You can list any range that can be routed from your exit node (here vpn) and it will happily set up your clients to send the traffic to the right place, even if the IPs you mention are completely alien to the IP range you are using for your private WireGuard network (I used 192.168.1.x here, and on the internal network I’m only using 10.x.x.x, 20.x.x.x or 171.x.x.x addresses).
The utility that manages all of this for you is called wg-quick, and you can just invoke it this way once you have written your configuration file: wg-quick up wg0.
in /etc/openvpn/servers.conf
...
server 10.30.0.0 255.255.0.0
...
# Service 1
push "route 10.11.0.0 255.255.0.0"
# Service 2
push "route 10.45.0.0 255.255.0.0"
# Service 3
push "route 171.13.0.0 255.255.0.0"
push "route 10.32.0.0 255.255.0.0"
# Service 4
push "route 20.115.18.214 255.255.255.255"
# Service 5
push "route 20.70.0.0 255.255.0.0"
# Service 6
push "route 10.66.68.0 255.255.255.0"
...
These routes are pushed to each VPN client, saying “if you are looking for this range of IPs, then ask me”. “me” is in this case newvpn.
Since WireGuard is routing the exact same ranges through its own interface, newvpn is just transparently passing packets from web to vpn.
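Keeping the openvpn routes and the WireGuard AllowedIPs in sync by hand is error-prone. As a sketch (not something I used for this setup), the shell functions below derive an AllowedIPs line from the push "route ..." lines of an openvpn config read on stdin. The names mask_to_prefix and routes_to_allowed_ips are made up for this example, and the peer’s own tunnel address (192.168.1.1/32 above) still has to be added by hand.

```shell
# Convert a dotted netmask (e.g. 255.255.0.0) into a prefix length (16).
# Each octet is mapped on its own; contiguity across octets is not checked.
mask_to_prefix() (
  bits=0
  IFS=.
  for octet in $1; do
    case "$octet" in
      255) bits=$((bits + 8)) ;;
      254) bits=$((bits + 7)) ;;
      252) bits=$((bits + 6)) ;;
      248) bits=$((bits + 5)) ;;
      240) bits=$((bits + 4)) ;;
      224) bits=$((bits + 3)) ;;
      192) bits=$((bits + 2)) ;;
      128) bits=$((bits + 1)) ;;
      0) ;;
      *) echo "invalid netmask: $1" >&2; exit 1 ;;
    esac
  done
  echo "$bits"
)

# Read an openvpn server config on stdin and print the matching AllowedIPs line.
routes_to_allowed_ips() {
  # Keep only the network and netmask of each push "route NET MASK" line.
  sed -n 's/^[[:space:]]*push "route \([0-9.]*\) \([0-9.]*\)".*$/\1 \2/p' | (
    list=""
    while read -r net mask; do
      prefix=$(mask_to_prefix "$mask") || exit 1
      list="${list:+$list, }$net/$prefix"
    done
    echo "AllowedIPs = $list"
  )
}

# Example:
# routes_to_allowed_ips < /etc/openvpn/servers.conf
```

Fed the push lines shown above, this should produce the same ranges as the AllowedIPs line in wg0.conf, minus 192.168.1.1/32.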
in /etc/openvpn/jail/ccd/web-5.conf
ifconfig-push 10.30.0.5 255.255.0.0
This assigns a static IP in the openvpn network to the web server it is named after; there is one such file per web server.
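With about 20 web servers, writing these files by hand gets tedious. Here is a minimal sketch that generates them, assuming (and this is only an assumption, the post shows web-5 alone) that web-N always gets 10.30.0.N. The function name write_ccd_files and its arguments are hypothetical.

```shell
# Write one client-config file per web server.
# $1: ccd directory; $2: number of web servers (must stay below 255)
write_ccd_files() (
  dir="$1"
  count="$2"
  mkdir -p "$dir" || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    # Assumed naming scheme: web-N.conf pins the client to 10.30.0.N.
    printf 'ifconfig-push 10.30.0.%d 255.255.0.0\n' "$i" > "$dir/web-$i.conf"
    i=$((i + 1))
  done
)

# Example:
# write_ccd_files /etc/openvpn/jail/ccd 20
```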
in /etc/sysctl.conf
net.ipv4.conf.all.proxy_arp=1
net.ipv4.ip_forward=1
iptables rules
iptables -A FORWARD -s 10.10.0.0/16 -j ACCEPT
iptables -A FORWARD -d 10.10.0.0/16 -j ACCEPT
iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
The FORWARD rules accept all traffic coming from or going to the web network, and the MASQUERADE rule rewrites the source address of whatever leaves through tun0, the openvpn interface (we could probably have matched on -s 10.10.0.0/16 instead of -o tun0).
Don’t ask me why we need to specify those FORWARD rules here and not on the WireGuard server side, I don’t know. But I know openvpn routing does not work without them.
One last challenge awaits
After running wg-quick on each server and starting openvpn on newvpn, I could ping internal services from my web boxes!
me@web-1:~$ ping 10.45.13.62
PING 10.45.13.62 (10.45.13.62) 56(84) bytes of data.
64 bytes from 10.45.13.62: icmp_seq=1 ttl=104 time=57.9 ms
64 bytes from 10.45.13.62: icmp_seq=2 ttl=104 time=56.0 ms
^C
--- 10.45.13.62 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 56.081/56.996/57.911/0.915 ms
This goes through tun0, exits in newvpn, gets forwarded by WireGuard to vpn, which is physically connected to the 10.45.0.0/16 network, and back!
But as I am ready to call it a day, a co-worker tells me that he cannot reach the service in question, even though ping is indeed doing its job.
me@web-1:~$ curl -v http://10.45.13.62
* Rebuilt URL to: http://10.45.13.62/
* Trying 10.45.13.62...
* connect to 10.45.13.62 port 80 failed: No route to host
* Failed to connect to 10.45.13.62 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 10.45.13.62 port 80: No route to host
So you’re telling me that ICMP traffic can reach the host, but TCP traffic cannot?
This is definitely not a networking issue, but sounds an awful lot like some firewall issue.
Let’s check the iptables rules just one more time …
[root@newvpn]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
INPUT_direct all -- anywhere anywhere
INPUT_ZONES_SOURCE all -- anywhere anywhere
INPUT_ZONES all -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
FORWARD_direct all -- anywhere anywhere
FORWARD_IN_ZONES_SOURCE all -- anywhere anywhere
FORWARD_IN_ZONES all -- anywhere anywhere
FORWARD_OUT_ZONES_SOURCE all -- anywhere anywhere
FORWARD_OUT_ZONES all -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
ACCEPT all -- anywhere 10.10.0.0/16
ACCEPT all -- 10.10.0.0/16 anywhere
....
I can see the rules that I added for forwarding traffic, and I don’t know what I’m doing wrong here.
Except maybe …
See those lines:
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
ACCEPT all -- anywhere 10.10.0.0/16
ACCEPT all -- 10.10.0.0/16 anywhere
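iptables walks a chain from top to bottom and stops at the first rule that matches. The REJECT rule matches everything that reaches it and answers with an ICMP “host prohibited” error (that is what reject-with icmp-host-prohibited means, and it is why curl reports “No route to host”), so my two ACCEPT rules below it are never evaluated.
ping presumably got through because one of the zone chains higher up accepts ICMP. Either way, this kills the TCP traffic before doing the relay, while it would make more sense to accept the traffic to be relayed first, then reject the remainder.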
Turns out it’s pretty annoying to edit iptables by hand using the command line, so I just ran a quick iptables-save > /etc/sysconfig/iptables.
Then I swapped the order of the rules so it reads:
ACCEPT all -- anywhere 10.10.0.0/16
ACCEPT all -- 10.10.0.0/16 anywhere
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Then finally ran iptables-restore < /etc/sysconfig/iptables.
All services re-connected properly, traffic was reaching the internal network again.
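An alternative to editing the saved file by hand is to insert the ACCEPT rules in front of the REJECT rule with iptables -I, which takes a 1-based rule number. The sketch below finds that number from the output of iptables -S; the helper name first_reject_index is made up for this example, and I did not run this on the servers in question.

```shell
# Read `iptables -S CHAIN` output on stdin and print the 1-based rule number
# of the first REJECT rule; exit status 1 if the chain has none.
first_reject_index() {
  awk '
    $1 == "-A" { n++ }    # only -A lines are numbered rules, -P is the policy
    $1 == "-A" && /-j REJECT/ { print n; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Example (as root): insert both ACCEPT rules just before the REJECT rule.
# n=$(iptables -S FORWARD | first_reject_index) || exit 1
# iptables -I FORWARD "$n" -s 10.10.0.0/16 -j ACCEPT
# iptables -I FORWARD "$n" -d 10.10.0.0/16 -j ACCEPT
```

The rules that were appended after the REJECT rule earlier stay where they are; they are harmless but can be removed with iptables -D.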
Not too bad for an afternoon of work. I ran into an impressive number of quirks (setting aside the initial cataclysm that triggered this whole operation), and was surprised to see they were not more thoroughly documented.
I hope this may help you if you are also dealing with routing issues on Centos7 and using WireGuard and OpenVPN in conjunction. You can do it!