Advanced? network debugging

Markus Ongyerth

I’m writing this a couple weeks after some business travel to the US. We had our global offsite event, including a stay at a hotel in Monterey.

This hotel provided free WiFi (thank god), though the way it was set up led to a minor issue with my VPN.

Precondition

My main firewall (the router for the home network) acts as a wireguard server and allows me to access resources I host at home from anywhere. I’m generally connected to it from my phone, and for travel to the US I enable access from the work laptop.

The networks involved here are:

  • 10.0.8.0/24 - the wireguard virtual network
  • 192.168.128.0/24 - the network at home

I’ve configured the metric on the VPN route to the home network quite high (i.e. low priority) so it doesn’t interfere with local networks by default.
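For reference, a manual equivalent of that route would look something like this (a sketch; in my setup the route is added by the VPN configuration, and you’ll see it again in the route listing further down):

ip route add 192.168.128.0/24 via 10.0.8.1 dev wg-home metric 2000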

I don’t usually run into issues with either of those CIDRs, since they are uncommon enough. Until very recently.

This hotel uses 192.168.128.0/19 for the guest wifi.

Enabling the VPN

So when I inspect the routes on this laptop with the VPN enabled, we see:

% ip route
default via 192.168.128.1 dev wlp0s20f3 proto dhcp src 192.168.129.148 metric 600
10.0.8.0/24 dev wg-home proto kernel scope link src 10.0.8.14 metric 50
192.168.128.0/24 via 10.0.8.1 dev wg-home proto static metric 2000
192.168.128.0/19 dev wlp0s20f3 proto kernel scope link src 192.168.129.148 metric 600

None of these are surprising, but we do see that there is some overlap beyond just the default route: 192.168.128.0 shows up twice, once as a /19 and once as a /24.

Now the rules of Linux routing dictate that the longest prefix wins. I.e. any connection going towards 192.168.128.0/24 will route via wireguard. Any connection going to 192.168.128.0/19 but outside the previous /24 will route locally.
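This is easy to check with ip route get, using two arbitrary example addresses from those ranges (output abbreviated):

% ip route get 192.168.128.10
192.168.128.10 via 10.0.8.1 dev wg-home src 10.0.8.14

% ip route get 192.168.130.10
192.168.130.10 dev wlp0s20f3 src 192.168.129.148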

This works for a moment, but then the connection breaks entirely, because (I suspect) traffic to the default gateway is sent through the VPN: the gateway address 192.168.128.1 now falls inside the /24 routed over wireguard. While the default route names an interface in ip route, that does not seem to be respected by the in-kernel wireguard :(

Fixing the default route

Easy fix: ip route add 192.168.128.1 dev wlp0s20f3 metric 50.

Now we have another layer in the matryoshka, and anything going towards the default gateway is routed correctly again. A /32 is clearly longer than a /24.
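The relevant part of the main table now looks roughly like this, with the new /32 on top (reconstructed, not a verbatim paste):

192.168.128.1 dev wlp0s20f3 scope link metric 50
192.168.128.0/24 via 10.0.8.1 dev wg-home proto static metric 2000
192.168.128.0/19 dev wlp0s20f3 proto kernel scope link src 192.168.129.148 metric 600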

With this issue solved, we’ve created a new one that stumped me for a bit.

The new problem

This new problem really only exists because 192.168.128.1 also hosts the DNS server for the home network. I.e. if I want to use domain names (which I need to do), the laptop needs to resolve them against this address over the VPN. But as we’ve just discussed, we need to route this very address to the local gateway, otherwise things break.

Enter policy-based routing

Luckily, Linux is mighty. The kernel contains more than a single routing table. While most users only really interact with the main one, there are multiple set up by default.

I haven’t found a good way to just list all tables. But you can list the rules pointing at those tables.

% ip rule
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

So if we want to do some magic, we need to add a new routing entry but only point the kernel at it for the intended traffic.

ip route add table 53 192.168.128.1/32 via 10.0.8.1 dev wg-home metric 5 adds a new route into a new table. ip rule add ipproto udp dport 53 lookup 53 priority 0 points the kernel at that table. But only for traffic that we want to route over the VPN. In this case, DNS.
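For completeness, tearing this down again is just the mirror image (assuming nothing else uses table 53):

ip rule del ipproto udp dport 53 lookup 53 priority 0
ip route flush table 53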

We can validate this:

ip rule
0:    from all ipproto udp dport 53 lookup 53
1:    from all lookup local
32766:    from all lookup main
32767:    from all lookup default

And

ip route list table 53
192.168.128.1 via 10.0.8.1 dev wg-home metric 5

Nice! At this point, we can send DNS queries to the server.
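The capture below comes from tcpdump; the invocation was along these lines (reconstructed from the output format, -i any is what adds the interface name and direction to each line):

tcpdump -i any -nv 'udp and port 53'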

06:45:29.614938 wg-home Out IP (tos 0x0, ttl 64, id 7794, offset 0, flags [none], proto UDP (17), length 94)
    10.0.8.14.36562 > 192.168.128.1.53: 61546+ [1au] A? host.in.my.home. (66)
06:45:29.880723 wg-home In  IP (tos 0x0, ttl 64, id 60450, offset 0, flags [none], proto UDP (17), length 98)
    192.168.128.1.53 > 10.0.8.14.36562: 61546 1/0/1 host.in.my.home. A 192.168.128.??? (70)

The traffic captured above was generated by running dig @192.168.128.1 host.in.my.home. And it looks good. But dig never returns :( So there is some issue we haven’t yet found.

The confusing problem

At this point, I was at a loss. The response comes in, so the issue must be local.

I found dropwatch. And after some messing around with it, I found out that the issue must be in netfilter/iptables.
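dropwatch is interactive; the session looked roughly like this (from memory; -l kas resolves drop locations to kernel symbols):

dropwatch -l kas
dropwatch> start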

drop at: nft_do_chain+0x40e/0x630 [nf_tables] (0xffffffffc24ed4fe)
origin: software
input port ifindex: 85
timestamp: Tue Sep 16 06:46:38 2025 725500501 nsec
protocol: 0x800
length: 98
original length: 98
drop reason: NETFILTER_DROP

Everything on the network seems to work. So we’ll have to take a closer look at the firewall.

Naturally, I use iptables -nvL to figure out what’s going on. But I don’t see anything (I’m not posting the Google corp laptop firewall here). The most relevant-seeming thing was the usual conntrack rules:

 pkts bytes target     prot opt in     out     source               destination        
13044 5623K ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0          
98309   86M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID

So we should check out whether conntrack works as expected.

$ conntrack -E -p udp --dst 192.168.128.1
[NEW] udp      17 30 src=10.0.8.14 dst=192.168.128.1 sport=39451 dport=53 [UNREPLIED] src=192.168.128.1 dst=10.0.8.14 sport=53 dport=39451

It does, at least for the outgoing half, but the connection never sees a reply. So conntrack is set up correctly, but the response gets dropped before it even reaches conntrack.

There can only be … many

At this point I remembered that iptables also has multiple tables. Let’s take a look at all of them.

$ iptables-save 
# Generated by iptables-save v1.8.11 (nf_tables) on Tue Sep 16 07:32:47 2025
*raw
:PREROUTING ACCEPT [255988:178111373]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -m rpfilter --invert -j DROP
-A PREROUTING -m rpfilter --invert -j DROP
COMMIT

Oh, what do we have here? Something I hadn’t seen earlier, because it’s in another table. iptables -t raw -D PREROUTING -m rpfilter --invert -j DROP to the rescue! Running it twice removes both of the rules from the raw table. And tada, things work!
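In hindsight, asking for the raw table directly would have pointed at the culprit much sooner, since iptables -nvL only ever shows a single table (filter by default):

iptables -t raw -nvL PREROUTING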

Additional thoughts

The way rpfilter works is by checking whether the incoming traffic would be routed out of the interface it came in from. I suspect that these rules could be kept around by adding a second policy rule that also points at the custom table but uses a source IP/port pair instead of the destination one. But since this was only a short trip, and I debugged this partially because I was jetlagged and awake at 2am, I never got around to testing it.
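For the record, the untested idea would be something along these lines (purely hypothetical; I have not verified how rpfilter’s reverse lookup interacts with the port selectors):

ip rule add from 192.168.128.1 ipproto udp sport 53 lookup 53 priority 2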