Eric Stewart: Running Off At The Mouth

When Ethernet Doing What Ethernet Does Is Inconvenient: Layer 2 Load Balancing

by Eric Stewart on Apr.05, 2018, under Networking, Technology

It’s been a while since I’ve posted.  If I’m not busy at work, I’m avoiding anything related to work. One issue I’ve been working on for a long time has involved a load balancer migration from one vendor to another. I ran into an issue brought about by the new vendor claiming our configuration was supported … only to find later on that it was not. This is not wholly their fault – we’ve been doing load balancing in a possibly unusual way for quite some time. In this post I’ll go into the hows and the whys, and why you occasionally have to watch out for Ethernet doing exactly what Ethernet does.  I’ll do what I can to avoid naming any vendors – we’re stuck with one for a while, and the product we’re moving away from isn’t “bad”, it’s just that they didn’t win out when we did the bake-off.  Which, as you’ll see eventually, I did not perform as thoroughly as I probably should have.

So Layer 3 load balancing is the way to go … if you don’t have the restrictions we have with our load balancing:

  • The gateway device is a firewall (for both member servers and VIPs), so it can’t be the load balancer.
  • We load balance more than web (DNS, TFTP, RADIUS), and need client information retained – which means the initial source packet must have the client address in it.

Layer 2 load balancing historically has required some method of ensuring that all traffic to member servers passes through the load balancer.  This used to mean that you had an “in” and an “out” interface and you had the rest of the world on one of them and the member servers on the other.  This is horribly inefficient, and should be solvable using networking magic.

It can blow a newer networker’s mind to think that a single VLAN can carry traffic for multiple subnets (this, while evil and wasteful from a bandwidth usage view, wasn’t actually all that uncommon when VLANs didn’t exist and a single wire was where the broadcast domains lived).  I’m going to take things a little further and suggest that the reverse is true as well: You can have one subnet, multiple VLANs.

So … how to do translations between VLANs?  Well, our previous setup utilized an in and an out interface on the load balancers, but one of those interfaces was configured with “switchport vlan mapping <VLAN1> <VLAN2>”.  This command would translate any dot1q packet crossing it from VLAN1 to VLAN2 and vice versa.  Then you just set your “allowed vlan”s for the inside interface connection to be only the “inside” VLANs, and the outside interface connection to be the “outside” ones.
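
To make the old setup concrete, here’s a rough sketch (Python, just for illustration – the real work happens in the switch hardware) of what that mapping command effectively does to a tagged frame crossing the port.  The VLAN numbers and MACs are made up.

    # Made-up VLAN pair; the translation works in both directions.
    VLAN_MAP = {10: 20, 20: 10}

    def translate(frame):
        """Return a copy of the frame with its dot1q tag swapped per the map."""
        out = dict(frame)
        out["vlan"] = VLAN_MAP.get(frame["vlan"], frame["vlan"])
        return out

    frame = {"src": "aa:aa:aa:aa:aa:01", "dst": "ff:ff:ff:ff:ff:ff", "vlan": 10}
    print(translate(frame)["vlan"])   # 20 - same frame, other VLAN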

This works, but is inefficient (I’ll try to elaborate why later) and relies on switches having this functionality (which can be ASIC limited).  What would be nice is if you could just trunk both the inside and outside VLANs into the load balancer (preferably using some kind of port-channel to get some additional bandwidth) and have the load balancer do all the VLAN translation for you.  Note that what you’d end up with is a single connection into the load balancer, and just VLANs trunked in, pairs of which might be bridged together.

Turns out, three vendors said “We can do that.”  What all of them did (more or less) is provide a “bridging” function, where packets from one VLAN would be thrown onto another VLAN per the configuration.

Now, you have to keep a few things in mind when you do this kind of thing:

  • If you have two devices doing this bridging (say, in an H/A configuration) and the standby device continues to bridge, it causes a loop.  For those who aren’t aware, loops on Ethernet networks (if there isn’t some kind of thing like spanning-tree preventing it) are very very bad.  This kills the network.  So the standby devices should not perform the bridging.  Only the active should.
  • What packets are bridged occasionally also needs to be configured.  It’s unlikely that the bridging device will properly participate in spanning tree like you’d like it to, so you have two options (a sketch of both caveats follows this list).  You can limit what it bridges to “IP traffic only”, which keeps things like dot1q tagged spanning tree packets from being bridged (when they are, the switch usually err-disables the port, since it sees spanning tree packets from one VLAN showing up on another and assumes something’s not right) – though, if the vendor hasn’t corrected for it yet, an “IP only” filter won’t work for v6, since v6 uses multicast and that’s not always considered “IP traffic”.  Or you can configure the connecting port to ignore the traffic that will cause issues if it doesn’t, which breaks loop prevention.
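
Here’s the promised sketch of those two caveats, purely illustrative and in Python rather than any vendor’s configuration language: the standby never bridges, and even the active only bridges the traffic it’s been told to.

    ETHERTYPE_IPV4 = 0x0800
    ETHERTYPE_IPV6 = 0x86DD   # per the caveat above, not every product treats v6 as "IP traffic"

    def should_bridge(frame, role, allowed=(ETHERTYPE_IPV4,)):
        """Decide whether this device should copy the frame onto the paired VLAN."""
        if role != "active":              # a bridging standby completes a loop
            return False
        return frame["ethertype"] in allowed

    print(should_bridge({"ethertype": ETHERTYPE_IPV4}, "active"))    # True
    print(should_bridge({"ethertype": ETHERTYPE_IPV4}, "standby"))   # False - standby stays quiet
    print(should_bridge({"ethertype": ETHERTYPE_IPV6}, "active"))    # False under an "IPv4 only" filter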

That’s not an exhaustive list – it’s only what’s occurring to me right this second.

Anyway, with much grinding of teeth and working with sales engineers to resolve issues, at the end of the bake-off, I would constantly say two things about all three vendors involved:

  • I liked their sales engineers.  They knew their stuff and would go the extra mile to resolve issues.
  • I hated every one of the products tested.

But … I would also have to admit that after the bake-off, I had no preference from a functionality point of view: I could get all three vendors’ products to do what we wanted them to do.

Okay so I’ve gone off on a tangent – let’s get back on track with what we ran into during implementation.

Any load balancer worth implementing has some kind of monitoring of the member servers of a VIP.  If a member server doesn’t respond on the port(s) required, it is removed from the VIP and not used for load balancing.  The product we chose did this, but we were running into an issue where the failover score (kind of like an HSRP priority) for the standby wouldn’t always stay where it should.  It was having trouble with some of its tracking checks – it wouldn’t always be able to successfully ping a gateway.  Further into the implementation process, we also noted that the standby device would have trouble verifying the status of a member server for a VIP; the active device would say everything was fine, but the standby was saying the devices were down.

This issue would come to a head during software upgrades.  The recommended process was to make the standby (#2) the active, upgrade the now-standby (previously active, #1), and reboot it; allow it to reclaim the active role, then do the same to the standby (#2).  Thing is, this requires #1, when it comes back up (and it comes up assuming it is the standby – it doesn’t automatically go active), to have a failover score higher than #2’s (think preempt).  If #1, while it is in standby, isn’t able to get its score back to where it should be (and higher than #2’s), it stays in standby.

It took some packet captures to figure out what was going wrong.

The virtual Ethernet interface (VE) for a bridge group (two VLANs, one subnet) essentially exists on both VLANs (let’s call them 10 and 20).  And both the active and standby devices are operating in both VLANs, but only the active is bridging the packets.  For any member servers of a VIP on that same subnet, the VE is used as the source address for pings or port checks.

Let’s talk about what a device does when it’s trying to figure out where things are on the network, and try to put this into context for this case.  If a load balancer is going to ping a member server on a “clear” network, the first thing it’s going to do is send out an ARP broadcast, asking for the MAC of the target IP.  In this case, if the standby is sending this broadcast out, and there’s nothing indicating which of the two VLANs the device might be on, it will send it out both VLANs.

Dutifully so, the active device, getting the broadcast of its H/A neighbor in one VLAN (say 10), does exactly what you’ve asked it to do and pops it over to the other VLAN (20) and sends it back out.  And of course, since the standby sent the same broadcast out on VLAN 20, the active dutifully duplicates it back onto VLAN 10.
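
A toy model of that reflection (MACs, ports, and VLAN numbers invented): the standby ARPs on both VLANs of the bridge group, and the active, doing exactly what it’s configured to do, copies each broadcast onto the other VLAN.

    BRIDGE_PAIR = {10: 20, 20: 10}
    BROADCAST = "ff:ff:ff:ff:ff:ff"
    STANDBY_MAC = "00:00:5e:00:00:02"   # invented MAC for the standby's VE

    def active_bridges(frames):
        """The active re-emits every broadcast it hears onto the paired VLAN."""
        return [dict(f, vlan=BRIDGE_PAIR[f["vlan"]]) for f in frames if f["dst"] == BROADCAST]

    standby_arps = [{"src": STANDBY_MAC, "dst": BROADCAST, "vlan": 10},
                    {"src": STANDBY_MAC, "dst": BROADCAST, "vlan": 20}]

    for f in active_bridges(standby_arps):
        print(f["vlan"], f["src"])   # the standby's MAC now also shows up on VLANs 20 and 10
                                     # from behind the *active's* port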

So … what do you think this does to the CAM table on the switch?

Our particular setup (and possibly a contributor to the issue) is the fact that both load balancers were using a VPC connection.  So, when looking at this issue, you’d only have to look at the traffic from one of the two VPC-providing switches.  What you’d see from traffic captures is broadcasts from the standby going into and coming back out of the active on two VLANs.  The CAM table would get really confused and, more often than not, indicate that the port the standby’s MAC was on was actually the active’s.  And the active (with a CAM table of its own) would see unicast traffic (ARP responses) coming in on the interface it was supposed to send them out on … and would logically just eat those frames.  Since the standby never got the ARP response (at least, not unless the timing worked out perfectly, and eventually it would break again at some point), it would just ARP again.
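
And here’s the CAM side of it, again as a toy model with invented port names: a switch re-learns a source MAC on whatever port the frame arrived from, so the reflected copies keep yanking the standby’s MAC over to the active’s port, and replies meant for the standby get delivered to the active instead.

    cam = {}   # (vlan, mac) -> port, the way a switch learns source addresses

    def learn(vlan, src_mac, port):
        cam[(vlan, src_mac)] = port

    STANDBY_MAC = "00:00:5e:00:00:02"   # invented

    learn(10, STANDBY_MAC, "Po2")   # standby's own ARP broadcast arrives on its port-channel
    learn(10, STANDBY_MAC, "Po1")   # the copy the active bridged back arrives from the active's

    print(cam[(10, STANDBY_MAC)])   # Po1 - the ARP reply now goes to the active, which eats it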

While this isn’t a full loop, it’s essentially what I call two half-a-loops … and it sure acts like a full loop.

It took forever to get TAC to understand the issue.  And due to time and support pressures, we were already midway through the implementation process.  We were given four potential solutions: one was “have your switch do the translation”, which we were trying to move away from; one was “change your config completely”, and we weren’t sure that would work for us, since at least the active was working as desired; one was “have an inside and an outside interface”; and I can’t 100% remember what the fourth one was.

We opted for solution #5: static CAM entries for the MACs/VLANs/interfaces involved.  Ideal?  No.  More palatable, though, than having the switch do the translation.  It was my idea, and my boss, whose brain is bigger than mine, considered it a good and viable solution.  Yeah … I was surprised too.
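
In terms of the toy CAM model above, the workaround looks like this (port names and MACs still invented; on a real switch it’s a static MAC address-table entry, and the exact command varies by platform):

    STANDBY_MAC = "00:00:5e:00:00:02"

    static_cam = {(10, STANDBY_MAC): "Po2",    # pinned: the standby really lives behind Po2
                  (20, STANDBY_MAC): "Po2"}
    dynamic_cam = {(10, STANDBY_MAC): "Po1"}   # what the reflected broadcasts "learned"

    def lookup(vlan, mac):
        """Static entries win; dynamic learning can no longer move the MAC."""
        return static_cam.get((vlan, mac)) or dynamic_cam.get((vlan, mac))

    print(lookup(10, STANDBY_MAC))   # Po2, no matter what got learned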

Our own fault for using a load balancing method that many consider dated?  Maybe.  But honestly, if the vendor supports H/A and bridging, the behavior shouldn’t have been a surprise, and we suspect other vendors have solved it one way or the other through code.  The active knows who the standby is.  The standby can inform the active as to what MACs are his, and the active can then just not bridge their broadcasts.  The active could just look at its own CAM table – if it sees a broadcast from a MAC address on one VLAN that it sees in its CAM table for the bridged VLAN, it should just eat that broadcast, feeling safe to assume that if the load balancer knows where the MAC goes, the rest of the network should, too.
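For what it’s worth, here’s a rough sketch of that suggestion – nothing any vendor is known to ship, just the logic spelled out: before bridging a broadcast, the active checks whether the source MAC belongs to its H/A peer, or is already in its own CAM table on the paired VLAN, and if so it eats the frame instead of reflecting it.

    PEER_MACS = {"00:00:5e:00:00:02"}              # MACs the standby has reported to the active
    own_cam = {(20, "aa:bb:cc:dd:ee:10"): "Po1"}   # what the active itself has learned (invented)

    def bridge_broadcast(src_mac, from_vlan, to_vlan):
        """Would the active copy this broadcast onto the paired VLAN?"""
        if src_mac in PEER_MACS:                   # never reflect the peer's own traffic
            return False
        if (to_vlan, src_mac) in own_cam:          # the network already knows where this MAC lives
            return False
        return True

    print(bridge_broadcast("00:00:5e:00:00:02", 10, 20))   # False - the loop never forms
    print(bridge_broadcast("aa:bb:cc:dd:ee:99", 10, 20))   # True - unknown source, bridge as usual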

I don’t know for sure – I’m not necessarily $DEITY’s gift to networking (I know too many people who are better at it than I am).  But there should be some way to fix this in code.  And honestly, I’ll admit that this is something I should have found during the all-too-short bake-off.


