Okay, so we’re going by memory here. I was doing this at 4am on a production network, so I have no way to lab it and confirm the behaviors. And since I was more interested in fixing the issue and going back to bed than in planning a blog post, I didn’t save any output. If someone labs this up and disproves it, let me know where I’m wrong.
So this morning I woke up to turn on routing on our 7700s. For us, since we’re unfortunately using a Layer 2 core (we really want it to be Layer 3, but we’re still having issues clearing up all of our cross-core VLANs), there’s a “core routing” VLAN, a Loopback, and then the routing instances (OSPF and OSPFv3) that needed to be “no shut”.
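For anyone trying to picture the pieces, here is a rough sketch of what that "no shut" work might look like on NX-OS. I don't have the real config in front of me, so every identifier (VLAN 999, loopback0, process tag 1, area 0.0.0.0) is made up for illustration, and addressing is left out entirely.

```
! Hypothetical IDs throughout: VLAN 999, loopback0, OSPF/OSPFv3 tag 1, area 0.0.0.0
! Assumes feature ospf, feature ospfv3 and feature interface-vlan are already enabled
interface loopback0
  ip router ospf 1 area 0.0.0.0
  ipv6 router ospfv3 1 area 0.0.0.0

! SVI for the "core routing" VLAN
interface Vlan999
  description core routing VLAN
  ip router ospf 1 area 0.0.0.0
  ipv6 router ospfv3 1 area 0.0.0.0
  no shutdown

! Bring up the routing instances themselves
router ospf 1
  no shutdown

router ospfv3 1
  no shutdown
```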
One of the 7710s came up and formed its neighbor adjacencies fine. The other … not so much. It stayed stuck in EXSTART, and before you start Googling: no, the MTUs were not different between the errant 7710 and the DR.
Remember, we configured our vPC Peer link to carry all VLANs. This included the core routing VLAN. I can’t remember the specifics right now but I know I read something about spanning-tree not working quite right over the vPC Peer link. That may have been the core of the issue …
When I finally got to running some debug commands (mainly, debug ip ospf adj), it became apparent that the DR was receiving duplicate DBD packets from the errant 7710. Looking at spanning tree, on the errant 7710, the uplink to the “old” DC 6500 was in blocking state for the routing VLAN and the vPC Peer link was the root port.
An additional note: While bringing up the SVIs, while one was up and the other was down, vPC was in an inconsistent state.
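If you want to retrace the checks described above (adjacency state, MTU, spanning-tree port roles, vPC consistency), these are roughly the commands involved. Again, VLAN 999 is a placeholder for the routing VLAN, and since I didn't capture any output at the time, none is shown here.

```
! VLAN 999 is a hypothetical stand-in for the core routing VLAN
show ip ospf neighbors
show ip ospf interface vlan 999
! Confirm the SVI MTU matches the neighbor's
show interface vlan 999
! Check which port is root and which is blocking for the routing VLAN
show spanning-tree vlan 999
show vpc
show vpc consistency-parameters global
! Watch the DBD exchange, then turn debugging back off
debug ip ospf adjacency
no debug all
```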
I took the routing VLAN off the vPC Peer link, and the duplicates cleared up and the adjacencies formed properly. I then brought up a secondary (non-vPC) link between the two 7700s and added the routing VLAN to that trunk. Spanning tree put that link into blocking at some point, but that doesn’t seem to have caused any issues; it’s perfectly fine as long as nothing unusual is going on and the root ports remain the uplinks to the existing DC switches.
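The change itself is small. A minimal sketch, assuming the peer link is port-channel1, the new inter-switch link is Ethernet1/10, and the routing VLAN is 999 (all three are invented for this example, not our actual values):

```
! Hypothetical names: port-channel1 (vPC peer link), Ethernet1/10, VLAN 999
! Stop carrying the routing VLAN across the vPC peer link
interface port-channel1
  switchport trunk allowed vlan remove 999

! Separate non-vPC trunk between the two 7700s for the routing VLAN only
interface Ethernet1/10
  description non-vPC link to the other 7700
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 999
  no shutdown
```

The same change would need to be made on both 7700s so the allowed VLAN lists on each end of the peer link stay consistent.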
I’m hoping the behavior is because the routing VLAN, while existing on both devices, isn’t an HSRP link. Or perhaps it’s because, now that I think about it, I may have missed some crucial configuration option that indicates that the links to the existing DC switches (the 6500s) are the “uplinks”. In any event, things are okay now, but I need to look closer at my vPC documentation Monday in order to avoid this issue in the future.
But, I’d like to enjoy the rest of my weekend so for the moment, as long as things are working, I’m not going to worry about it.
EDIT: It turns out this is a known issue, and had I read the vPC documentation fully, I would have known about it. In our case, both 7710s will have connectivity to our backbone routers, so they wouldn’t (and probably shouldn’t) need to pass the routing VLAN traffic directly between them.