Most of the time, we (networking people, anyway) are more concerned with traffic entering a network than with traffic leaving it. You can see this in how often something along the lines of “allow all established” shows up in our iptables rules or our Access Control Lists (ACLs). For ACLs specifically, it’s why we tend to focus on the “ip access-group <ACL> out” form of the command and forget that there’s an “ip access-group <ACL> in” form as well. But we do use “in” ACLs occasionally, and I recently ran into a couple of issues with them. I’m hoping this post helps someone get past them faster than I did. The post may ignore the “out” ACL cases at certain points, since those don’t apply to the issues I’m covering.
So at $JOB we have a certain subset of users that we have absolutely no trust in, but need to provide services to. Through some back-end machinations, we put them on VLANs where DHCP treats them differently (using a main subnet, and then two secondary subnets) depending on how they’re registered. Yes, there are probably better ways to do it, but for us, putting three subnets on a single VLAN was the easiest way to attempt to get this subset of users to let us know who they were and classify them appropriately. Since we don’t wholly trust these folks, and one of the subnets on the VLAN is heavily restricted on where it can go, the “in” ACL ends in
deny ip any any
and because of this, there are a few things that can break.
First, we need to figure out what “in” and “out” really mean, since it’s often counter-intuitive. Since the platforms I use most often are Cisco Catalyst 6500s and Cisco Nexus 7000/7700, I’ll be referring to them in terms of SVI/VLAN interfaces.
“in” usually means something along the lines of “in to the router from the VLAN,” or perhaps “inside the VLAN.” Most of the time, an “in” ACL only takes effect when a packet arrives at the router’s VLAN interface from the VLAN itself, that is, when the packet is being routed off the VLAN or is addressed to the router. If the packet is L2 switched, the ACL doesn’t take effect, even if the packet passes through the same device that would route it out if it were destined to leave the network.
“out” is best viewed as “packets going out to the VLAN” or “packets coming from the outside.” This is typically where we do our filtering. Even if we have something along the lines of “deny ip any any” at the end, we usually have “permit tcp any any established” before that line. The established keyword matches TCP segments that have the ACK or RST bit set, so this essentially lets through the return traffic for TCP connections that machines on the VLAN initiated.
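To make the “established” behavior concrete, here is a toy Python model (not Cisco code) of an “out” ACL that ends in “deny ip any any”. It ignores addresses entirely and assumes only what the keyword documents: a TCP segment matches if ACK or RST is set. The Packet class and function names are hypothetical.

```python
from dataclasses import dataclass

# TCP header flag bits
SYN, RST, ACK = 0x02, 0x04, 0x10


@dataclass
class Packet:
    protocol: str       # "tcp", "udp", ...
    tcp_flags: int = 0  # only meaningful for TCP


def is_established(pkt: Packet) -> bool:
    """Mimic the 'established' keyword: a TCP segment with ACK or RST set."""
    return pkt.protocol == "tcp" and bool(pkt.tcp_flags & (ACK | RST))


def out_acl(pkt: Packet) -> str:
    """Toy 'out' ACL (addresses ignored):
         permit tcp any any established
         deny ip any any
    """
    if is_established(pkt):
        return "permit"
    return "deny"


print(out_acl(Packet("tcp", SYN)))        # deny: new connection from outside
print(out_acl(Packet("tcp", SYN | ACK)))  # permit: reply to a VLAN host's SYN
print(out_acl(Packet("udp")))             # deny: 'established' is TCP-only
```

The last line is the part that tends to surprise people: UDP replies (DNS, for example) are not covered by “established” and need their own permit lines.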
Keep In Mind Source/Destination
So an ACL line is usually something along the lines of:
[permit/deny] <protocol> <source> <destination>
Depending on the protocol, you can include certain keywords to specify ports, established traffic, and so on. Generally, when you have an “out” ACL, addresses on the VLAN are the <destination>. Conversely, an “in” ACL would (in almost all cases) have the addresses on the VLAN as the <source>. If they were destinations, the traffic wouldn’t be “crossing” the router. It would just be L2 traffic that never hits the ACL.
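The source/destination logic can be sketched as a small Python model of extended-ACL matching. It assumes IOS-style wildcard masks (1 bits are “don’t care”), first-match evaluation, and an implicit deny when nothing matches. The VLAN 10.0.0.0/24, the outside address 192.0.2.10, and all function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """True if addr matches base under an IOS-style wildcard mask
    (1 bits in the wildcard are "don't care")."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


@dataclass
class Ace:
    action: str    # "permit" or "deny"
    protocol: str  # "ip" matches any protocol
    src: tuple     # (base, wildcard)
    dst: tuple     # (base, wildcard)


ANY = ("0.0.0.0", "255.255.255.255")


def evaluate(acl, protocol, src, dst):
    """First matching entry wins; no match means the implicit deny."""
    for ace in acl:
        if ace.protocol not in ("ip", protocol):
            continue
        if wildcard_match(src, *ace.src) and wildcard_match(dst, *ace.dst):
            return ace.action
    return "deny"


# "in" ACL for a hypothetical VLAN 10.0.0.0/24: VLAN hosts are the <source>.
in_acl = [
    Ace("permit", "ip", ("10.0.0.0", "0.0.0.255"), ANY),
    Ace("deny", "ip", ANY, ANY),
]

print(evaluate(in_acl, "tcp", "10.0.0.50", "192.0.2.10"))  # permit
print(evaluate(in_acl, "tcp", "192.0.2.10", "10.0.0.50"))  # deny
```

The second case, an outside <source> showing up on the VLAN side, is exactly the kind of packet that causes trouble later on.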
HSRP
Right off the bat, I need to give a bit of credit to Aaron, a buddy I met at Cisco Live. Google popped me onto one of his blog articles when I was stumbling through this early one morning. For HSRP, remember that there are two versions (HSRPv1 and HSRPv2), that both use multicast, and that each uses a different multicast address. For the life of me, I couldn’t figure out why my initial Googled solution wasn’t working, until it turned out that it was written for HSRPv1:
- HSRPv1 packet destination: 224.0.0.2
- HSRPv2 packet destination: 224.0.0.102
So, if you don’t need to be more specific on your source addresses (for HSRPv2):
permit ip any host 224.0.0.102
should be fairly high up in the “in” ACL, or the routers you’re attempting to do HSRP between won’t be able to see one another and will both go into an “active” state as far as HSRP is concerned.
This, if you haven’t realized by now, is not a good thing.
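Here is a quick Python sanity check of why the HSRPv1 solution silently fails on HSRPv2. It is a toy model, not router code. It assumes an “in” ACL whose only relevant lines are “permit ip any host <X>” entries followed by “deny ip any any”. The helper name is hypothetical. The addresses are the well-known HSRP hello destinations, and both versions use UDP port 1985.

```python
import ipaddress

# Destination addresses for HSRP hellos (both versions use UDP port 1985).
HSRP_GROUPS = {
    1: ipaddress.IPv4Address("224.0.0.2"),    # HSRPv1 ("all routers" group)
    2: ipaddress.IPv4Address("224.0.0.102"),  # HSRPv2
}


def hello_permitted(version: int, permit_hosts: list) -> bool:
    """True if one of the 'permit ip any host <X>' lines covers this
    version's hello; anything else falls through to 'deny ip any any'."""
    dst = HSRP_GROUPS[version]
    return any(ipaddress.IPv4Address(h) == dst for h in permit_hosts)


# The "in" ACL carries only the HSRPv2 line from above.
acl_hosts = ["224.0.0.102"]

for version in (1, 2):
    if hello_permitted(version, acl_hosts):
        print(f"HSRPv{version}: hellos permitted")
    else:
        print(f"HSRPv{version}: hellos dropped, both routers go Active")
```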
DHCP
There are a couple of things you need for DHCP if you’re relaying it via “ip helper-address” (or “ip dhcp relay” on NX-OS).
Even without HSRP involved, you need to ensure that the clients (addresses on the VLAN) can reach the DHCP server directly. This matters for a variety of reasons involving how DHCP works at different stages. For example, a client renewing its lease unicasts its DHCPREQUEST straight to the server instead of broadcasting it. You also need to ensure that broadcasts from clients that don’t have an address yet are allowed to reach the router that will be relaying the DHCP request. Assuming no need for additional security:
permit udp any host <DHCP server address IP> eq bootps
permit ip host 0.0.0.0 host 255.255.255.255
is a minimum requirement in your “in” ACL.
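A toy Python model of what those two lines let through is shown below. It is a sketch, not router behavior in full. It assumes the “in” ACL is just these two lines plus the final “deny ip any any”, and that 192.0.2.10 stands in for your DHCP server (a hypothetical address). “bootps” is UDP port 67.

```python
DHCP_SERVER = "192.0.2.10"  # hypothetical server from "ip helper-address"
BOOTPS = 67                 # the "bootps" keyword is UDP port 67


def in_acl(protocol: str, src: str, dst: str, dport: int) -> str:
    """Toy 'in' ACL with the two minimum DHCP lines:
         permit udp any host <DHCP server> eq bootps
         permit ip host 0.0.0.0 host 255.255.255.255
         deny ip any any
    """
    if protocol == "udp" and dst == DHCP_SERVER and dport == BOOTPS:
        return "permit"
    if src == "0.0.0.0" and dst == "255.255.255.255":
        return "permit"
    return "deny"


# DHCPDISCOVER (and the first DHCPREQUEST) come from a client with no address.
print(in_acl("udp", "0.0.0.0", "255.255.255.255", 67))  # permit
# A lease renewal is unicast straight to the server.
print(in_acl("udp", "10.0.0.50", DHCP_SERVER, 67))      # permit
# Anything else from the VLAN falls through to the final deny.
print(in_acl("tcp", "10.0.0.50", "198.51.100.7", 443))  # deny
```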
But, There’s More: HSRP and DHCP
Now, assuming proper “router” definitions, your two routers should advertise the subnets they share HSRP duties for to the rest of your network, whether or not they are the Active router for that VLAN. As a result, if a router that participates in routing with the two HSRP routers has equal-cost paths to both, it could send packets destined for the VLAN in question to either one.
Say you have a network, 10.0.0.0/24, on a VLAN with two routers that have the addresses 10.0.0.252 and 10.0.0.253, and they share the duties (it doesn’t matter who is Active and who is Standby) for 10.0.0.254. When each router detects a DHCP broadcast on the VLAN in question, and both are identically configured with one or more DHCP helper statements, they will both convert the broadcast into a unicast request to the server(s) listed in those statements. Each router uses the “primary” non-HSRP address on the interface (even with “secondary” addresses configured) as the source of the packet(s) it sends to the DHCP helper(s), and as the relay agent address (giaddr) inside the packet, which is where the server sends its reply. On a given DHCP server, you should see the DHCPDISCOVER messages for a given DHCP client on the VLAN coming from both routers.
Now, equal-cost load balancing can do something unexpected: it can send the response destined for 10.0.0.252 to the router that has the IP 10.0.0.253 (and vice versa). The “out” ACL is evaluated on 10.0.0.253, and provided you’re allowing the DHCP server as a source somewhere in it, 10.0.0.253 should just turn around and pass the packet on to 10.0.0.252 across the VLAN.
The thing is, the packet is now “inside” the VLAN. When 10.0.0.252 receives it from 10.0.0.253, it sees a <source> outside the network arriving from inside the network, and it drops the packet (assuming the ACL is configured per our assumptions). What’s worse, if the reverse is happening as well, the DHCP server’s response never makes it back to the client through either router, and the client never gets an address assigned. In the DHCP server logs, you’ll see a bunch of DHCPDISCOVERs followed by DHCPOFFERs, but no DHCPREQUESTs and therefore no DHCPACKs.
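The failure can be traced with a small Python simulation. It is a deliberately simplified model under these assumptions: each router relays with its own primary address as giaddr, the server replies to that giaddr, upstream equal-cost routing picks one of the two routers per reply, and the “in” ACL permits nothing with an outside <source> that is sent to a router address. The server address 192.0.2.10 and all function names are hypothetical.

```python
import ipaddress

VLAN = ipaddress.ip_network("10.0.0.0/24")
ROUTERS = ["10.0.0.252", "10.0.0.253"]  # primary (non-HSRP) addresses
DHCP_SERVER = "192.0.2.10"              # hypothetical


def in_acl(src: str, dst: str) -> str:
    """Stand-in for the 'in' ACL: only VLAN hosts are allowed as <source>.
    (The HSRP/DHCP lines don't match a server reply, so they're omitted.)"""
    return "permit" if ipaddress.ip_address(src) in VLAN else "deny"


def deliver_offer(giaddr: str, ecmp_choice: str) -> bool:
    """Follow one DHCPOFFER from the server to the relay at giaddr.

    ecmp_choice is whichever router the upstream equal-cost route
    happened to pick for this packet.
    """
    if ecmp_choice == giaddr:
        # Arrives on the uplink; the VLAN's "in" ACL is never consulted.
        return True
    # The wrong router routes it onto the VLAN (its "out" ACL is assumed to
    # allow the server as a source), and the right router then receives it
    # from the VLAN, so its "in" ACL sees an outside <source>.
    return in_acl(DHCP_SERVER, giaddr) == "permit"


# Both routers relay the DISCOVER, so the server answers both giaddrs.
# Worst case: equal-cost routing crosses both replies over.
crossed = {"10.0.0.252": "10.0.0.253", "10.0.0.253": "10.0.0.252"}
results = {g: deliver_offer(g, crossed[g]) for g in ROUTERS}
print(results)                # {'10.0.0.252': False, '10.0.0.253': False}
print(any(results.values()))  # False: the client never sees an offer
```

If only one reply crosses over, the other router still delivers its copy, which is why the problem can look intermittent.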
You will see the exact same behavior if you have:
ip verify unicast source reachable-via rx
configured. This is a great command in concept, but it breaks several things once you start doing any kind of first-hop redundancy or otherwise routing the same subnets with multiple routers. The issue also goes away if only one of the two routers has its VLAN interface up. Put one in “shutdown”, and all routing and requests go through the other router, so the problem is avoided.
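The uRPF case can be shown with a minimal Python sketch of a strict-mode check. It is simplified: real uRPF also accounts for multiple equal-cost paths back to the source. It assumes a hypothetical two-entry routing table on 10.0.0.252 with interface names “Vlan10” and “Uplink”, and it reuses 192.0.2.10 as a stand-in DHCP server.

```python
import ipaddress

# Hypothetical routing table on 10.0.0.252: prefix -> interface.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/24"): "Vlan10",
    ipaddress.ip_network("0.0.0.0/0"): "Uplink",
}


def best_interface(addr: str) -> str:
    """Longest-prefix match against ROUTES."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ROUTES if ip in net]
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


def urpf_strict_ok(src: str, rx_interface: str) -> bool:
    """Strict uRPF: the best route back to the source must point out of
    the interface the packet arrived on."""
    return best_interface(src) == rx_interface


# A client's packet arriving from the VLAN passes...
print(urpf_strict_ok("10.0.0.50", "Vlan10"))   # True
# ...but the DHCP server's reply, handed over by the other router on the
# VLAN, arrives on Vlan10 while the route to 192.0.2.10 points upstream.
print(urpf_strict_ok("192.0.2.10", "Vlan10"))  # False
```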
How I Fixed It
permit ip any 10.0.0.252 0.0.0.3
Note that the “local” addresses are now specified as the <destination> instead of the <source> in this “in” ACL. Yes, I could get more specific for this case. A wildcard of “0.0.0.1” would suffice, since it matches just 10.0.0.252 and 10.0.0.253. But “0.0.0.3” also covers the shared HSRP address (10.0.0.254) and the broadcast address (10.0.0.255), as well as the main addresses of the router interfaces. Because it covers both routers, you can use the same line on both instead of maintaining two different ACLs, which gets messy. Another option would be to change the line so the DHCP server addresses are specified as <source> addresses in the ACL. But the line I use above allows several other things that might not work otherwise, like being able to ping the router if you haven’t permitted that through some other statement.
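To double-check what each wildcard actually covers, here is a small Python sketch that expands an IOS-style base/wildcard pair into the addresses it matches. It then applies the fixed line to the crossed-over offer from earlier. The helper names are hypothetical, 192.0.2.10 is a stand-in DHCP server, and the rest of the “in” ACL is omitted.

```python
import ipaddress


def wildcard_hosts(base: str, wildcard: str) -> list:
    """Expand an IOS-style base/wildcard pair into the addresses it matches.
    (Only sensible for small wildcards like the ones below.)"""
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    care = ~w & 0xFFFFFFFF
    # Walk every subset of the wildcard's "don't care" bits, OR-ed onto the base.
    hosts = []
    sub = w
    while True:
        hosts.append(str(ipaddress.IPv4Address((b & care) | sub)))
        if sub == 0:
            break
        sub = (sub - 1) & w
    return sorted(hosts, key=ipaddress.IPv4Address)


def fixed_in_acl(src: str, dst: str) -> str:
    """Sketch of the 'in' ACL after the fix (other lines omitted):
         permit ip any 10.0.0.252 0.0.0.3
         deny ip any any
    """
    if dst in wildcard_hosts("10.0.0.252", "0.0.0.3"):
        return "permit"
    return "deny"


print(wildcard_hosts("10.0.0.252", "0.0.0.1"))
# ['10.0.0.252', '10.0.0.253']
print(wildcard_hosts("10.0.0.252", "0.0.0.3"))
# ['10.0.0.252', '10.0.0.253', '10.0.0.254', '10.0.0.255']
print(fixed_in_acl("192.0.2.10", "10.0.0.252"))  # permit: crossed-over DHCPOFFER
print(fixed_in_acl("10.0.0.50", "10.0.0.254"))   # permit: ping to the HSRP address
```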
In any event, there are several ways to fix the issue, but if this helps anyone solve a similar conundrum more quickly than it took me, then the time I spent writing this was well spent.