Alright.. pick yourself up off the floor. YES, IPv6 and NAT+PAT (many folks call it just NAT or NAPT, but I'm calling it NAT+PAT to be as precise as possible) are in the title. Keep reading, don't be frightened.
NAT+PAT is a hack introduced in the mid-1990s to help combat IPv4 address exhaustion. It became wildly popular in residences and large enterprises that needed to attach multiple computers to an Internet connection with only a single public (PA) IPv4 address. Unfortunately it breaks end-to-end connectivity, which impacts many protocols, and provides some network administrators with a false sense of security. Bottom line is that it sucks, and it needs to go away.
Enter IPv6, the new protocol with an almost limitless address space. NAT+PAT won't be needed anymore with IPv6; it'll be a thing of the past! Well, almost. One accepted scenario where NAT+PAT will still be used is load balancing. Some may not even view load balancing as actual NAT+PAT, but rather as a simple TCP proxy. Call it what you will, but the real servers in a load-balanced environment (assuming no DSR) won't typically see the source address of the client hitting the VIP. They'll see requests coming from address(es) on the load balancer itself.
So, other than load balancing, IPv6 NAT+PAT isn't needed, right? Well, read on, because I'll present a somewhat common IPv6 dual-stack deployment that might be able to benefit from such a beast.
Enter a company called Foo and Associates. This company doesn't exist, and I'm making all this stuff up. However, the design may look strangely similar to other networks out there. Here's a diagram to start you off:
The above diagram shows Foo and Associates' network, AS64499. They've got chunks of IPv4 and IPv6 PI space and use 10.10/16 internally. There are two data centers and a few branch offices with private (MPLS, leased lines, whatever) WAN connectivity back to the DCs. The DCs have some users and servers, and the branch offices just have users. Right now, the branch offices access the IPv4 Internet through either of the two DCs and get translated (NAT+PAT) to the external address(es) of the Internet firewalls depicted in site 1 and site 2.
There is a "dirty" segment where the firewalls live, along with the Internet routers, which maintain connections to a few transit providers. The two Internet routers have a private WAN link between them for redundancy in case the transit providers fail at one particular DC. The following diagram shows both the IPv4 and IPv6 advertisements:
Now let's talk about how IPv4 works.
The DC servers just talk to the Internet without NAT or PAT or any translation. The Internet routers statically route the appropriate subnets to the DC firewalls, and everything works properly. The IPv4 advertisements are tweaked (as you can see in the diagram) so each /23 worth of DC server space is routed symmetrically in a hot-potato fashion.
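To make that concrete, here's a rough IOS-style sketch of what Internet router A might be doing. Every prefix, mask, and next hop below is invented, since the real ones only exist in the diagrams:

```
! Site 1's /23 of DC server space points at the DC firewall
ip route 192.0.2.0 255.255.254.0 203.0.113.1
! Anchor the PI aggregate so BGP can originate it
ip route 192.0.2.0 255.255.252.0 Null0

router bgp 64499
 neighbor 198.51.100.1 remote-as 64500
 ! The aggregate is announced from both DCs...
 network 192.0.2.0 mask 255.255.252.0
 ! ...but each router only announces its own local /23 specific,
 ! which keeps the DC server flows symmetric (hot potato)
 network 192.0.2.0 mask 255.255.254.0
```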
Internet access from the RFC 1918'ed campus networks goes through the Internet firewalls at each of the DCs. An IPv4 default route is followed based on WAN link costs, and traffic is translated via NAT+PAT out the Internet firewalls. If one DC dies, the branch offices that are sending Internet traffic to it just move over to the Internet firewall at the other DC. Here's a diagram of the outbound IPv4 Internet access flow:
And here's one showing the return traffic:
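The translation piece itself, in a hypothetical IOS-flavored rendition (the firewall vendor is never named, so this is purely illustrative):

```
interface GigabitEthernet0/0
 description Inside, toward the WAN and campus networks
 ip nat inside
interface GigabitEthernet0/1
 description Outside, toward the dirty segment
 ip nat outside

! Overload the campus RFC 1918 space onto the firewall's
! outside address; the port rewriting is the "PAT" half
ip access-list standard CAMPUS
 permit 10.10.0.0 0.0.255.255
ip nat inside source list CAMPUS interface GigabitEthernet0/1 overload
```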
Let's also say some of the WAN links have higher latency than others, either due to fiber routes or the geographic locations of Foo and Associates' sites. The internal routing always picks the lowest-latency path to the Internet. Sure, it could be EIGRP, or it could be IS-IS with some mechanism to tune link costs based on latency. The details aren't really important here.
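If it were EIGRP, for instance, the tuning knob might be per-interface delay, which feeds straight into the composite metric. The values below are invented; IOS configures delay in tens of microseconds:

```
interface Serial0/0
 description Long-haul WAN link, roughly 20 ms
 delay 2000
interface Serial0/1
 description Short-haul WAN link, roughly 5 ms
 delay 500
```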
Now, let's talk about what happens when Foo and Associates deploys IPv6.
After getting a /40 assignment from their local RIR, Foo and Associates deploys IPv6 right on top of their current infrastructure, essentially dual-stacking everything. The first diagram shows all the IPv6 addressing, so you might want to look at that again.
However, instead of deaggregating and sending longer prefixes to the DFZ, they Go Green and only announce their /40 equally out of each data center. The DC servers don't have a problem with this, as there are IPv6 static routes (and advertisements in IBGP, just like there were for the IPv4 specifics in the previous section) pointing from the Internet routers at each DC to the /48s of public space. Egress traffic from the servers will exit locally, and ingress traffic can be received either locally or from the other DC, where it will be sent over the WAN link toward the correct site. The DC firewall sees both traffic flows, and everything works as expected.
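In IOS-style terms, Internet router A might look like this, with documentation prefixes standing in for the real /40 and /48s:

```
! Originate only the /40 aggregate toward the transit providers
ipv6 route 2001:db8:4400::/40 Null0
! Site 1's DC server /48 points at the local DC firewall;
! it's also carried in IBGP so Internet router B knows the way
ipv6 route 2001:db8:4401::/48 2001:db8:44FF::2

router bgp 64499
 address-family ipv6
  network 2001:db8:4400::/40
```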
What doesn't work correctly is Internet access from the branch offices. Since any part of the company could access the Internet via either of the Internet firewalls, the Internet routers have the whole /40 routed to the Internet firewalls for return traffic. Now, keep in mind that Foo and Associates has stateful firewall policies allowing internally-initiated outbound Internet traffic, but not Internet-initiated inbound traffic. Here's the outbound IPv6 Internet access:
And illustrating the specific case where things break, here's the return traffic:
Now you see the problem. If Google decided to send traffic back to Foo and Associates via transit provider B, it would get lost. The Internet firewall at site 2 doesn't have a record of the outgoing connection (SYN packet to [2001:4860:8008::68]:80), so it denies it, thinking it's invalid traffic. Meanwhile, the Internet firewall at site 1 never sees any inbound traffic for the session it opened to [2001:4860:8008::68]:80. Eventually, the user's browser at site 3 times out and falls back to IPv4.
There are a few potential fixes to the above problem. Unfortunately, every one of them has some drawback. Let's go over them individually.
1: Deaggregate
Foo and Associates could deaggregate and advertise a whole bunch of /48s in addition to (or in place of) its /40. The /48s of the sites closest to site 1 would be advertised out of Internet router A, and the /48s of the sites closest to site 2 would be advertised out of Internet router B. This would allow the firewalls at each DC to see both the outbound and inbound traffic flows. Sounds like it might work!
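On Internet router A, that could look something like this (the /48s are hypothetical, and Internet router B would originate the other sites' specifics):

```
router bgp 64499
 address-family ipv6
  ! Keep the /40 as a safety net...
  network 2001:db8:4400::/40
  ! ...and add the /48s of the sites that normally exit here
  network 2001:db8:4401::/48
  network 2001:db8:4403::/48
```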
However, even if the ISPs accepted the longer prefixes (and their upstreams did as well), what if the latencies of the WAN links change one day, and sites that normally have their best path out of site 1 flip over to site 2? Latencies of leased lines might not move around too much, but latencies of MPLS-provisioned L2VPNs might fluctuate if the provider is building out a new ring or making network changes. So, again, some sites might encounter the original problem and end up with asymmetric flows getting nuked by the firewalls.
2: Move the firewalls
An expensive way of solving this problem is to remove the Internet firewalls in the DCs and move them closer to the users. This means that the campus networks at each of the DCs would have their own firewalls, and each of the branch offices would have theirs, as well. Since there's only one way in and out from each of the campus networks, the problem goes away.
Unfortunately, even if firewalls grew on trees (or were free), there are some security implications with this design. Essentially, Foo and Associates' private WAN becomes a public one. Rogue Internet traffic can make its way into the WAN, even if it's blocked before it gets to the users. A DDoS that might normally not cause the Internet transit links to blink could easily consume one of the WAN links, disrupting operations. This option isn't too bad, but it's not going to fly if Foo and Associates happens to be a financial company. Granted, Foo and Associates could just ditch the WAN and move to Internet-based VPNs, but then QoS goes out the window for anything VoIP-related, MTU becomes an issue, and bleh.
3: Link the firewalls
If the Internet firewalls in each data center support high availability and state table sharing, it might be possible to link them together. Some vendors say that this type of setup will work up to 100 ms of latency, but it typically needs two links, jumbo (>= 9100 bytes) frame support, and no VLAN tag manipulation. So, Foo and Associates could buy two dedicated MPLS L2VPNs with jumbo frame support and a latency SLA below 100 ms. This way, the firewalls could maintain a common state table and know about an outgoing session even if it was built through the other firewall.
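To make this concrete with one (presumptuous) example: if the Internet firewalls were OpenBSD boxes running PF, which comes up again in option 6, the state-sharing piece would be pfsync across the dedicated link. Interface and peer address are invented:

```
# Site 1 firewall: replicate PF state table changes to the
# site 2 firewall across the dedicated L2VPN on em2.
# syncpeer makes pfsync talk unicast instead of multicast.
ifconfig pfsync0 syncdev em2 syncpeer 192.0.2.2 up
```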
This might work, actually. But is it a good idea? Maybe, maybe not. Session set-up time may be longer if the session table has to be updated synchronously. What happens if the provider breaks their SLA and the latency exceeds 100 ms? Session table disaster is the probable outcome.
4: Only use one site for IPv6 Internet
Another simple yet unattractive option is to tweak routing so only one DC is used for IPv6 Internet access. If site 1 was chosen, the /40 advertisement out of site 2 would be prepended, and the BGP local preference attribute would be used to influence the exit path. If site 1 drops off the map, site 2 could still be used as backup (since there would be no other routing announcements to choose from). Note that this wouldn't affect the DC servers.
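On Internet router B, those tweaks might look like this (neighbor address and AS invented):

```
! Make site 2's /40 announcement less attractive to the world
route-map V6-BACKUP permit 10
 set as-path prepend 64499 64499 64499

router bgp 64499
 ! Routes learned here become less preferred inside the AS
 ! (the default local preference is 100)
 bgp default local-preference 90
 neighbor 2001:db8:FFFF::1 remote-as 64510
 address-family ipv6
  neighbor 2001:db8:FFFF::1 activate
  neighbor 2001:db8:FFFF::1 route-map V6-BACKUP out
```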
The problem with this plan is obvious: latency. Sites that have the shortest path to site 2 for Internet access would go through site 1, instead. If site 4 is on the west coast and site 1 is on the east coast of the United States, this could get annoying. What if site 4 was in San Francisco and site 1 was in New York? Should the packets for UCLA's website over IPv6 really have to cross the country and back? Yeesh.
5: Forget security
The title says it all. Throw out those Internet firewalls. Let's go back to the old days when there weren't any firewalls. Seriously, if Foo and Associates is a college or university, this may be the way to go. No firewalls, no session tracking, and no problems!
Although this may be a bit unorthodox in this day and age, a compromise might be to turn off SYN packet checking on the firewalls and allow traffic on ports above 1024 to generate sessions even if the 3-way handshake was never observed. Or, replace the firewalls with routers that do stateless filtering to achieve the same goal.
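The stateless-router version might be nothing more than an ACL in the spirit of the pre-firewall Internet. An IOS-style sketch, with a hypothetical /40:

```
ipv6 access-list INTERNET-IN
 ! No sessions, no state: anything to a port above 1024
 ! gets in, 3-way handshake or not
 permit tcp any 2001:db8:4400::/40 gt 1024
 permit udp any 2001:db8:4400::/40 gt 1024
 ! ICMPv6 so PMTUD and friends keep working
 permit icmp any 2001:db8:4400::/40
 deny ipv6 any any log

interface GigabitEthernet0/1
 description Toward the transit providers
 ipv6 traffic-filter INTERNET-IN in
```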
6: IPv6 NAT+PAT
Here it is, folks. Add IPv6 NAT+PAT to the Internet access policy on the Internet firewalls in both DCs, and you're set. IPv6 Internet access is then optimized for latency (well, as well as it was for IPv4) and the problem with return traffic disappears. Here's a diagram:
And the return path:
Problems.. there are tons of them. This breaks end-to-end connectivity, AGAIN. It feels like the mid-1990s all over again. IPv6 NAT+PAT isn't implemented in many devices yet, but it is implemented in OpenBSD's PF. The simplicity of this solution is hard to deny, even with the problems (philosophical and otherwise) it presents.
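For the curious, here's roughly what that looks like in pf.conf (OpenBSD 4.7-and-later syntax, with invented addresses):

```
ext_if   = "em0"
ext_addr = "2001:db8:44ff::1"     # firewall's dirty-segment address
inside   = "2001:db8:4400::/40"

# Translate outbound v6 traffic to the firewall's own address,
# overloading on source ports: NAT+PAT, mid-1990s style,
# just with bigger addresses
match out on $ext_if inet6 from $inside to any nat-to $ext_addr
pass out on $ext_if inet6 from $inside to any keep state
```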
And there you have it. Are there solutions 7, 8, and 9? I sure hope so. Feedback, please. I'd love to hear it.
Thank you for the article, but this is just the beginning of the IPv6 nightmare. What if you have 100+ global locations connected directly, cheaply, and quickly to the Internet for Facebook, YouTube, etc., and at the same time connected to an expensive and slow corporate MPLS+VPN network that also provides centralized access to some external partners and services? Simple with IPv4 NAT and BGP, but multiply the challenge you so nicely presented by n^2.
Are there any solutions to this?
IMO solutions 2, 3, and 4 are far better than a NAT+PAT hack to get the traffic patterns desired. In the cases where I've seen the problem (albeit on a smaller scale) of "two ingress points break stateful stuff", the answer has been "push your state-tracking further back", which is what #2 accomplishes. 3 and 4 solve the problem in what are arguably more elegant ways, while retaining the benefits of centralized firewalling as shown in this exercise.