My apologies – it’s been a few days since we ran into the problem and fixed it (and then moved on to the next fire), so this post may not be as complete and as well written (haha) as some of my others.
Reviewing a little bit of part 1 of this issue: limited TCAM space (or, more accurately, the way TCAM space is utilized) left us unable to install ACLs on some VLAN interfaces that we were planning to put on our new (actually, year-old) Cisco Nexus 7710s. The F3-series blades have four TCAM banks; by default, each bank is assigned to a specific purpose, and you're only allowed to use 50% of each bank (in order to support atomic updates).
The first issue we ran into was arranging our QoS as desired (resolved by a more capable engineer than I). Next, it became apparent that we would be unable to use incoming ACLs ("incoming" from the vantage point of the VLAN: packets coming from the VLAN out to the rest of the world). This was no big deal, as we usually don't block traffic leaving the VLAN; it was the outside traffic coming in that we were more worried about.
So early one Saturday morning I transfer the first VLANs from the old Cisco 6500s to the 7710s – mainly, "our" VLAN, as well as our learning-lab VLAN and a little-used setup VLAN. I go through the entire weekend walking on eggshells, expecting to hear that the relatively minor change I implemented broke something. I was even delivered a minor heart attack in the form of my notification to my coworkers about the work being forwarded back to the notification list (with no additional text) by the on-call engineer. Turns out, that was just a minor mistake on his part.
But the following Monday morning, it becomes apparent that there's an issue when someone reboots their workstation and it fails to get a DHCP address. Checking the DHCP servers indicates that unicast lease renewals are working fine, but none of the packets that start out as broadcasts are making their way to the DHCP servers.
See, one thing we use pretty much everywhere is “ip dhcp relay address [ip address]”, to support DHCP servers not directly on the VLAN. Once you realize how DHCP relay works (at least on the Nexus), then you’ll understand why it wouldn’t have worked with Bank Chaining … But there’s a little more to the story.
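For reference, a minimal sketch of what that looks like on an NX-OS SVI (the VLAN number and addresses here are made up, not ours):

```
feature dhcp
!
interface Vlan100
  ip address 10.1.100.1/24
  ip dhcp relay address 10.9.9.10
  ip dhcp relay address 10.9.9.11
  no shutdown
```

Each relay address gets its own line, and the relay sends a copy of the request to every server listed.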
The commands were in the VLAN config to start with, so everything looked like it should have worked. It wasn’t until I was clued into the command:
show system internal access-list input config
that it became apparent how DHCP relay works, and why it wasn't actually working.
What the DHCP relay commands do is install an "internal incoming ACL" that catches broadcast traffic to the DHCP ports and hands it to the DHCP relay feature, which then converts it to a unicast request (with giaddr set to the SVI's address, so the server knows which scope to allocate from) to the address(es) of the DHCP servers so configured.
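To make the broadcast-to-unicast conversion concrete, here's a hedged sketch in Python of the one transformation that matters for this story: the relay stamps its own interface address into the BOOTP giaddr field before unicasting the request on to the configured servers. (Illustrative only: a real relay also handles options, hop counts, and the return path, and `relay_to_unicast` is my name for it, not any real API.)

```python
import socket

def relay_to_unicast(dhcp_payload: bytes, relay_ip: str) -> bytes:
    """Set giaddr (bytes 24-27 of the BOOTP header) to the relay's
    SVI address so the DHCP server knows which scope to allocate
    from; the relay would then unicast this payload to each
    configured 'ip dhcp relay address'."""
    giaddr = socket.inet_aton(relay_ip)
    # BOOTP header layout: op(1) htype(1) hlen(1) hops(1) xid(4)
    # secs(2) flags(2) ciaddr(4) yiaddr(4) siaddr(4) giaddr(4) ...
    return dhcp_payload[:24] + giaddr + dhcp_payload[28:]
```

The point being: if the internal ACL that punts the broadcast to the relay never gets programmed, none of this ever happens, no matter what the VLAN config says.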
Turns out, the ACL didn't exist. And in case you haven't figured it out yet: because Bank Chaining broke incoming ACLs, the relay's internal ACL couldn't be installed, and so the feature silently didn't work.
Removing the relay lines and trying to put them back in resulted in:
ERROR: Hardware programming failed. Reason: Resource-pooling is not supported with certain feature combinations and with Bank Class Mapping.
So, $BOSSBIGBRAIN went through the process of turning off atomic updates (and, unfortunately, setting the default action during updates to "permit"), removing all of our ACLs, turning Bank Chaining off, putting all the ACLs back in, running out of space even with atomic updates off, scrubbing the ACLs for inefficiencies and cruft, and putting them back in again. We're still running above 90% utilization on the one TCAM bank we're allowed to use, and we expect to run into issues before too long.
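For the curious, the knobs involved look roughly like this on NX-OS (hedged: command names are from memory and vary by platform and release, and the module number here is made up; check the docs for your release before touching TCAM settings):

```
! Stop reserving 50% of each bank for hitless (atomic) updates:
no hardware access-list update atomic
! With atomic updates off, choose what happens to traffic while an
! ACL is being reprogrammed (this is the unfortunate "permit"):
hardware access-list update default-result permit
! Turn Bank Chaining (resource pooling) off for a module:
no hardware access-list resource pooling module 1
! And keep an eye on utilization afterward:
show hardware access-list resource utilization module 1
```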