Eric Stewart: Running Off At The Mouth

Nexus 7700 Part V: TCAM Woes and Solutions

by Eric Stewart on Jan.21, 2015, under Networking, Technology

The two 7710s going into our data center (yeah, not moving as fast on that as I would like) threw me another curve today.

We have quite a few VLANs originating from our data center that do not run through the firewall and, as such, might have their own (often somewhat lengthy) Access Control Lists (ACLs).  Without going into horrible detail about ACLs, they’re just rules that dictate what traffic is and is not allowed to enter/exit the VLAN.

I’ll try not to get into too much detail about hardware design on a Cisco device, but something we need to touch on is the TCAM (ternary content-addressable memory): hardware memory that holds ACL rules so that ACLs can be consulted at forwarding speed.  Thing is, there’s often a limit to the size of the TCAM.  Not being the principal architect of our network, I didn’t know that TCAM size limits might be something I would have to deal with.

In addition, on the F3 series blades we’re using, there are four TCAM banks (two banks per TCAM, identified as T0B0, T0B1, T1B0, and T1B1), and by default they are designated for specific purposes.  The command:

show hardware access-list output vlan feature-combo <feature>

will show you (for a given “feature”) which bank will be used.  As an example:

svc-7700# show hardware access-list output vlan feature-combo VACL

Feature combination supported: Yes
______________________________________________________________________________
Feature                     Rslt Type      T0B0      T0B1      T1B0      T1B1
______________________________________________________________________________
VACL                            Acl                             X          
______________________________________________________________________________

shows you that the VLAN ACLs will go into TCAM 1, Bank 0.

Converting the VLANs that currently exist on the 6500s to the 7700s involved, for me, pasting the 6500 VLAN entries into a text file and editing them (including a nice little “shutdown” at the beginning of the definition, so that the 7700s stayed in L2 for the time being).  This isn’t a fast process – there’s a lot of Catalyst-to-Nexus translation that needs to be done, as well as some adjustments to the ACLs that accompany the VLAN interfaces.  I got a few interfaces/ACL rulesets into the process, and decided to paste the configuration for a few of the VLANs in shutdown mode into the 7700s, having already pulled the ACL rulesets in earlier.

Eight VLAN interfaces into the process, I get:

dc-7700(config-if-hsrp)# interface Vlan90
dc-7700(config-if)#  shutdown
dc-7700(config-if)#  description [REDACTED]
...
dc-7700(config-if)#  ip access-group ACLXX-Out out
ERROR: Module 1, 2, 10 returned status: Tcam will be over used, please enable bank
 chaining and/or turn off atomic update.If bank-chaining is enabled on other modules
 and this is a new linecard insertion,please enable bank-chaining prior to reloading
 this module.

...

Now, that error is a little annoying, because it makes the situation sound more dire than it is:

dc-7700# show hardware access-list resource entries module 10


INSTANCE 0x0
-------------

Tcam 0 Bank 0: 10 valid entries   4076 free entries
Tcam 0 Bank 1: 9 valid entries   4077 free entries
Tcam 1 Bank 0: 2030 valid entries   2056 free entries
Tcam 1 Bank 1: 308 valid entries   3778 free entries

Index  Protocol  Encoding  ref_count
------------------------------------
6 protocol cam entries are in use
7 mac protocol cam entries are in use

         ACL Hardware Resource Utilization (Mod 10)
         -------------------------------------------- 
                          Used    Free    Percent 
                                          Utilization
----------------------------------------------------- 
Tcam 0, Bank 0           20      4076    0.49
Tcam 0, Bank 1           19      4077    0.46
Tcam 1, Bank 0           2040    2056    49.80
Tcam 1, Bank 1           318     3778    7.76
...

I did a little Googling and it would appear that the error occurs because the new ACL would cause the “Percent Utilization” to break 50% for T1B0.  Atomic update is the reason we can’t go beyond 50% usage: while an ACL is being updated, it holds the old copy and the new copy of the ACL in the TCAM at the same time in order to prevent any service interruption (the old one is removed once the new one is complete and active).  We can turn that off, but then there might be a service impact any time someone asks us to adjust an ACL (which, for some VLANs, can be frequent).  With atomic update off, there is an option to make the default action a “permit” during ACL updates rather than the default “deny”, but I don’t much like that.
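To make the 50% arithmetic concrete, here is a minimal Python sketch of that rule as I understand it. It assumes each bank holds 4096 entries (Used plus Free in the utilization table above) and that atomic update effectively caps a bank at half of that; the real NX-OS accounting may well be more subtle, and the function names are made up for illustration.

```python
# Simplified model of the atomic-update limit described above.
# Assumption: a bank has 4096 entries (Used + Free in the "show" output)
# and atomic update needs room for the old and new ACL copies at once,
# so a bank cannot be filled past 50%.
BANK_SIZE = 4096


def headroom(used, bank_size=BANK_SIZE):
    """Entries that can still be added before the bank would pass 50%."""
    return max(bank_size // 2 - used, 0)


def fits_with_atomic_update(used, new_entries, bank_size=BANK_SIZE):
    """True if adding new_entries keeps the bank at or below 50%."""
    return used + new_entries <= bank_size // 2


# T1B0 had 2040 entries in use when the error appeared.
print(headroom(2040))
print(fits_with_atomic_update(2040, 9))
```

With 2040 entries already in T1B0, the model leaves room for only a handful more, so just about any non-trivial ACL would trip the error.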

Bank chaining may just be the way to go.  It allows us to use all banks for TCAM entries, but with the loss of some functionality: certain feature combinations can no longer be used together, because chaining gives up the isolation between banks that the default method provides.

Changing the configuration to use bank chaining requires the command to be run for every (non-supervisor) module, and existing TCAM entries do not get rebalanced.  The command in question is:

hardware access-list resource pooling mod <module#>
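Since the command has to be repeated per module, a tiny Python sketch can generate the lines to paste. The module list here (1, 2, 10) is simply the set named in the error message above; whether those are all the non-supervisor modules in a given chassis is an assumption, and the helper name is hypothetical.

```python
# Generate one bank-chaining command per module, using the command
# form shown above. MODULES is taken from the error message
# ("Module 1, 2, 10"); adjust it for the chassis in question.
MODULES = [1, 2, 10]


def pooling_commands(modules):
    """Return the config line for each module number given."""
    return [f"hardware access-list resource pooling mod {m}" for m in modules]


for line in pooling_commands(MODULES):
    print(line)
```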

Delete the eight interfaces (to remove the ACLs from the TCAM tables), re-add them with the ACLs, and then look at the resource entries:

dc-7700# show hardware access-list resource entries module 10


INSTANCE 0x0
-------------

Tcam 0 Bank 0: 1024 valid entries   3062 free entries
Tcam 0 Bank 1: 9 valid entries   4077 free entries
Tcam 1 Bank 0: 1026 valid entries   3060 free entries
Tcam 1 Bank 1: 308 valid entries   3778 free entries

Index  Protocol  Encoding  ref_count
------------------------------------
6 protocol cam entries are in use
7 mac protocol cam entries are in use

         ACL Hardware Resource Utilization (Mod 10)
         -------------------------------------------- 
                          Used    Free    Percent 
                                          Utilization
----------------------------------------------------- 
Tcam 0, Bank 0           1034    3062    25.24
Tcam 0, Bank 1           19      4077    0.46
Tcam 1, Bank 0           1036    3060    25.29
Tcam 1, Bank 1           318     3778    7.76
...

A little more balanced, and consistent with the documentation’s note that it first balances across Bank 0 before using all four banks.
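If you want to keep an eye on this without squinting at the table, the utilization rows are easy to parse. The following Python sketch assumes the row layout shown in the output above; the 45% warning threshold and the function names are my own inventions, not anything NX-OS defines.

```python
import re

# Matches rows such as "Tcam 1, Bank 0           1036    3060    25.29".
ROW = re.compile(r"^Tcam (\d), Bank (\d)\s+(\d+)\s+(\d+)\s+([\d.]+)$")


def parse_utilization(text):
    """Map bank names like 'T1B0' to (used, free, percent) tuples."""
    banks = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            tcam, bank, used, free, pct = m.groups()
            banks[f"T{tcam}B{bank}"] = (int(used), int(free), float(pct))
    return banks


def banks_over(banks, threshold=45.0):
    """Names of banks at or above the (hypothetical) warning threshold."""
    return [name for name, (_, _, pct) in banks.items() if pct >= threshold]


# Rows copied from the post-chaining output above.
SAMPLE = """\
Tcam 0, Bank 0           1034    3062    25.24
Tcam 0, Bank 1           19      4077    0.46
Tcam 1, Bank 0           1036    3060    25.29
Tcam 1, Bank 1           318     3778    7.76
"""

banks = parse_utilization(SAMPLE)
for name, (used, free, pct) in banks.items():
    print(name, used, free, pct)
print("Near the limit:", banks_over(banks))
```

Fed the pre-chaining output instead, the same check would flag T1B0 at 49.80%.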

So I haven’t had a chance to talk to $BOSSBIGBRAIN yet to see what they say, but this is what I’m going with so far.

Don’t be surprised if this article is edited later.

EDIT: And from what I understand we have a TAC case open for the issue, so there may be more to this than I know.
