The Setup
Yea and verily, ponder ye the diagram that followeth below, and I shall spin ye a tale o’ sorrow. Was that Olde English or pirate? I really have no idea. Maybe an Olde English pirate sailing the seas of Ethernet. Seriously though, this diagram helps explain something I broke the other day, so take a peek and refer to it as you like.
I have been working on a project to migrate our remote office connectivity into a private WAN. Today, many of those sites are connected via a manual mesh of site-to-site IPSEC VPN tunnels. In the process of this conversion, I have been re-working the WAN cloud itself to leverage the vendor’s ability to peer with me via BGP. Using BGP will allow for dynamic routing and a full WAN mesh. Sure, I could have used mGRE with NHRP or DMVPN, but I don’t like to use tunnels if I don’t have to. With no encryption requirement, my interest in tunnels drops off quickly. Now that said, almost none of this matters, other than providing a little bit of context for what I broke the other day.
The Scenario
Site 1 and site 2 need to communicate. They used to talk through an IPSEC tunnel over the Internet, using border firewalls as tunnel endpoints. With the turn-up of some new circuits, site 1 and site 2 can now talk via a private WAN. With the WAN circuits up, communication between 10.1.1.0/24 and 10.2.2.0/24 works correctly. At each site, the IGP redistributes local routes into the WAN’s BGP, and then BGP redistributes routes back out of BGP into the remote sites’ IGP.
Since I want to consolidate all inter-site traffic onto the WAN circuits, I added a static route for the local site DMZ into the site 1 and site 2 core routers, so that they would be redistributed into the WAN BGP in accordance with my carefully crafted route-map. This too worked without issue. Examining the routing tables of my core routers, I could see the /24 DMZ networks pointed towards the WAN cloud, just like I wanted.
Only…this isn’t what I wanted (yet). The problem that I ran into was in the form of a call from the IT manager at site 2, complaining that his users were unable to access resources in the DMZ of site 1. I checked routing tables, and found nothing unexpected. Forward and return routes all looked fine, to and from the WAN as expected. What then was the problem?
The problem is with the rather confused firewalls at each site. Let’s consider communication from site 2’s 10.2.2.0/24 core network to site 1’s 192.168.1.0/24 DMZ network in the new WAN scenario. 10.2.2.0/24 traffic leaves the site 2 core, heads across the WAN, and arrives at site 1’s core. Site 1 knows the DMZ network to live off the site 1 firewall, and forwards appropriately. The site 1 firewall gets the traffic and goes into an anti-spoofing hissy fit.
What’s anti-spoofing (sometimes called unicast RPF)? It’s a security check routers and firewalls can do to make sure that a source IP address should be arriving on the interface it’s arriving on. If a firewall sees traffic showing up on one interface, when he believes that traffic should be showing up on another interface, he assumes the traffic to be “spoofed”, meaning it carries a bogus source address presumably crafted by someone with bad intentions.
The site 1 firewall believes that traffic sourced from 10.2.2.0/24 should be arriving via the external interface / IPSEC site-to-site tunnel. When the site 1 firewall sees 10.2.2.0/24 showing up on the internal interface instead, he assumes the traffic to be spoofed, and throws it away.
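To make the check concrete, here is a minimal Python sketch of the decision the site 1 firewall is making. It is only an illustration of the idea, not how any particular firewall implements it. The interface names (`inside`, `dmz`, `outside`) and the table itself are assumptions made up for this example; only the three subnets come from the scenario above.

```python
import ipaddress

# Hypothetical view of the site 1 firewall: the interface on which it
# expects to see each source network. Interface names are assumptions.
EXPECTED_INGRESS = {
    ipaddress.ip_network("10.1.1.0/24"): "inside",     # site 1 core
    ipaddress.ip_network("192.168.1.0/24"): "dmz",     # site 1 DMZ
    ipaddress.ip_network("10.2.2.0/24"): "outside",    # site 2 core, via the IPSEC tunnel
    ipaddress.ip_network("0.0.0.0/0"): "outside",      # everything else
}


def expected_interface(src_ip):
    """Return the interface the firewall expects this source on (longest match wins)."""
    addr = ipaddress.ip_address(src_ip)
    matches = [net for net in EXPECTED_INGRESS if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return EXPECTED_INGRESS[best]


def passes_antispoofing(src_ip, arrival_interface):
    """Strict check: accept only if the packet arrived where the source is expected."""
    return expected_interface(src_ip) == arrival_interface


# Old path: site 2 traffic arrives through the tunnel on the outside interface.
print(passes_antispoofing("10.2.2.50", "outside"))
# New path: the same source arrives from the site 1 core on the inside interface.
print(passes_antispoofing("10.2.2.50", "inside"))
```

The second call is the failure from the story: the packet is perfectly legitimate, but it shows up on an interface the firewall does not associate with that source, so it is dropped.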
The Solution
There are a few ways I could fix this. One thing to keep in mind is that I would like to keep the site-to-site IPSEC VPN tunnels in place to act as a backup for a WAN failure, but let’s consider a variety of ways to resolve this.
- Remove the IPSEC site-to-site tunnel from the firewalls, and update the firewall’s anti-spoofing tables so that it would accept the other site’s traffic on his internal interface. Again, not my first choice, as I want to keep the site-to-site tunnel in place as a failover.
- Remove the core static routes pointing to the DMZ networks that are being redistributed into the WAN cloud and remote sites. For now, this is what I have done. I’m simply not advertising the DMZ network into the IGP. That way, DMZ-destined traffic is default routed to the local site firewall, where it matches a VPN map and gets sent to the other site via the IPSEC tunnel. Core-to-core traffic goes through the WAN, and core-to-DMZ traffic goes through the VPN tunnel. Not “right” or elegant, but the quick fix to restore connectivity to the remote DMZ for the impacted users.
- Other options require more research.
- I need to research my firewall configuration options specifically to see if there’s a way to allow this traffic to arrive (and be returned) via the firewall’s internal interface, while still keeping the VPN tunnel configuration straight.
- Another possibility is to introduce another device pair to terminate the site-to-site VPN tunnels, while keeping the DMZ on its own firewall. This is what you’d do in a larger enterprise shop, but can be cost-prohibitive in a smaller one.
- Nested tunnels. By this, I mean nail up a GRE tunnel between the 2 core routers, and hide all inter-site traffic inside of this GRE tunnel. The IPSEC site-to-site tunnel would carry nothing but GRE packets. The traffic flow from site 2 core to site 1 DMZ would work, though. Traffic headed for the site 1 DMZ arrives at the site 2 core router, which pops it into the GRE tunnel and forwards it to the site 2 firewall. The site 2 firewall pops the GRE traffic into its IPSEC tunnel, which terminates on the site 1 firewall. The site 1 firewall forwards the GRE packet to the site 1 core router, which decapsulates it and sends the original packet back to the site 1 firewall to be routed to the DMZ. Solutions like this one are usually more fun to talk about than to actually do. The fancier the design, the more vulnerable that design is to a hardware failure or other sort of topology change. When the network breaks, fancy is no fun anymore.
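The quick fix in the second option above comes down to longest-prefix matching on the core router. Here is a rough Python sketch of the site 2 core's forwarding decision with and without the /24 DMZ route. The next-hop labels (`wan`, `firewall`) and the two route tables are simplified assumptions for illustration, not a dump of a real routing table.

```python
import ipaddress


def next_hop(routes, dst_ip):
    """Pick the next hop for dst_ip using longest-prefix match."""
    addr = ipaddress.ip_address(dst_ip)
    matching = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None  # no route at all
    # The most specific (longest) prefix wins.
    return max(matching, key=lambda item: item[0].prefixlen)[1]


# Hypothetical site 2 core table while the site 1 DMZ /24 was learned from the WAN.
with_dmz_route = {
    "10.1.1.0/24": "wan",        # site 1 core, learned via BGP redistribution
    "192.168.1.0/24": "wan",     # site 1 DMZ, learned via BGP redistribution
    "0.0.0.0/0": "firewall",     # default route to the local firewall
}

# Same table after the DMZ static route is removed at site 1.
without_dmz_route = {
    "10.1.1.0/24": "wan",
    "0.0.0.0/0": "firewall",
}

print(next_hop(with_dmz_route, "192.168.1.10"))     # wan
print(next_hop(without_dmz_route, "192.168.1.10"))  # firewall
print(next_hop(without_dmz_route, "10.1.1.10"))     # wan
```

With the /24 gone, DMZ-bound traffic falls through to the default route and lands on the local firewall, which is where the VPN map picks it up, while core-to-core traffic keeps using the WAN.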