The other weekend I connected an L2 circuit between two sites. At both ends were Cisco Catalyst 6500 switches running VSS. The interfaces facing the circuit were configured as L3 and EIGRP ran between the two sites to share routes. But as soon as the circuit was connected, the EIGRP neighbors started flapping.
Troubleshooting started and, as always, you start at the lowest OSI layer and work up. Bingo! The issue was at Layer 2: the ARP entries for the neighbor addresses were incomplete on both sides. Checking the MAC address of the interface connected to the L2 circuit at site A and of its counterpart at site B showed the same MAC. How could this happen?
As mentioned in the first paragraph, both ends had Cisco Catalyst 6500 switches running VSS. One of the first things you do when configuring VSS is set the switch virtual domain ID. Cisco recommend that you enable virtual MAC addresses (mac-address use-virtual) under the switch virtual domain. I’ll explain why Cisco recommend this option. When the first switch comes up, VSS takes the MAC address pool from that member and uses it across all L3 interfaces. VSS keeps this pool when one (and only one) switch is reloaded. But if the entire VSS is reloaded and the other switch happens to come up first, the MAC address pool will change. This shouldn’t be a huge deal, but any devices out there that ignore gratuitous ARP will need manual intervention to get them working again, which causes further service disruption.
Hence Cisco recommend using mac-address use-virtual under the switch virtual domain ID. This ensures the same MAC address pool is used at all times. No exceptions. But the switch virtual domain ID is significant in determining the virtual MAC address pool. It’s used in the formula to calculate this pool. As per the Cisco documentation:
The MAC address range reserved for the VSS is derived from a reserved pool of addresses with the domain ID encoded in the leading 6 bits of the last octet and trailing 2 bits of the previous octet of the mac-address. The last two bits of the first octet is allocated for protocol mac-address which is derived by adding the protocol ID (0 to 3) to the router MAC address.
When I checked both switches I found they both had a switch virtual domain ID of 10. Therefore the virtual MAC addresses on the L3 interfaces were both 0008.e3ff.fc28. We can use the formula to check this:
6th octet (28) to binary: 00101000
Remove trailing 2 bits (the protocol ID): 001010
001010 (bin) to decimal: 10 (the trailing 2 bits of the 5th octet, fc, are 00, so they add nothing)
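The same check can be sketched in a few lines of Python. This is only an illustration of the bit layout quoted from the Cisco documentation above; the function name vss_domain_id is made up for this example and is not part of any Cisco tool.

```python
def vss_domain_id(mac: str) -> int:
    """Extract the VSS domain ID from a virtual MAC such as '0008.e3ff.fc28'."""
    # Strip Cisco-style dots (and tolerate ':' or '-') to get 12 hex digits.
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    value = int(digits, 16)
    # Drop the 2 trailing protocol-ID bits, then keep the next 8 bits:
    # 2 bits from the 5th octet plus 6 bits from the 6th octet.
    return (value >> 2) & 0xFF


print(vss_domain_id("0008.e3ff.fc28"))  # 10
```

Unlike the hand calculation, this also picks up the two bits borrowed from the 5th octet, so it works for domain IDs above 63 as well.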
But what are the options for fixing the problem where the MAC addresses are the same on both sides?
- On one side, under the L3 interface use mac-address H.H.H
- Change the switch virtual domain ID on one VSS – Possible to do but requires a complete outage as a VSS reload is required.
- Remove mac-address use-virtual from the switch virtual domain ID – Not recommended as discussed previously.
Option 1, setting a MAC address manually, seems like the most viable option, but how do you guarantee the MAC address you assign is unique? Will there be issues in the future? If we pick an arbitrary number between 1 and 255 (the valid range for a switch virtual domain ID), we can use the formula to calculate a “safe” MAC address. It stays safe as long as no one later connects a VSS that uses this number as its switch virtual domain ID. I decided to choose 99.
Virtual MAC address with domain ID 10: 0008.e3ff.fc28
Last two octets (fc28) hex to binary: 1111110000101000
99 (dec) to binary: 01100011
Replace the middle 8 bits (after the leading 6 bits and before the trailing 2 bits) with 01100011: 1111110110001100
1111110110001100 (bin) to hex: fd8c
Hence if a switch virtual domain ID of 99 was used, the virtual MAC address assigned to an L3 interface would be 0008.e3ff.fd8c.
interface gi2/2/2
 mac-address 0008.e3ff.fd8c
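The arithmetic used to arrive at that address can be sketched in Python. Treat this as a rough aid, not Cisco's implementation: the base value 0008.e3ff.fc00 is an assumption inferred from the single address seen in this post (0008.e3ff.fc28 with the domain ID bits zeroed), and the function name vss_virtual_mac is made up for this example.

```python
# Assumption: reserved base inferred from 0008.e3ff.fc28 with domain bits cleared.
ASSUMED_BASE = 0x0008E3FFFC00


def vss_virtual_mac(domain_id: int, protocol_id: int = 0) -> str:
    """Build the virtual MAC a given switch virtual domain ID would produce."""
    if not 1 <= domain_id <= 255:
        raise ValueError("domain ID must be between 1 and 255")
    if not 0 <= protocol_id <= 3:
        raise ValueError("protocol ID must be between 0 and 3")
    # The domain ID sits just above the 2 trailing protocol-ID bits.
    value = ASSUMED_BASE | (domain_id << 2) | protocol_id
    digits = f"{value:012x}"
    # Format in Cisco dotted notation: three groups of four hex digits.
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


print(vss_virtual_mac(10))  # 0008.e3ff.fc28
print(vss_virtual_mac(99))  # 0008.e3ff.fd8c
```

Running it for domain ID 10 reproduces the conflicting address, and 99 gives the manually assigned one, which matches the hand calculation.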
Problem solved. It was just unfortunate that the switch virtual domain ID happened to be the same at both sites. No one ever foresaw the two sites needing to be connected this way. If you’re deploying VSS in your organisation, work smart and use unique switch virtual domain IDs everywhere. If you happen to connect to a 3rd party, first check if they’re using VSS and, if they are, check what they have as their switch virtual domain ID. If there’s a conflict, someone will need to manually set a MAC address on an interface.
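If you keep a list of your VSS pairs and their domain IDs, a quick duplicate check is easy to script. The sketch below assumes a simple name-to-ID mapping; the site names and IDs are hypothetical and the function name find_domain_id_conflicts is made up for this example.

```python
from collections import defaultdict


def find_domain_id_conflicts(inventory: dict[str, int]) -> dict[int, list[str]]:
    """Return each domain ID used by more than one VSS, with the VSS names."""
    by_id = defaultdict(list)
    for name, domain_id in inventory.items():
        by_id[domain_id].append(name)
    # Keep only IDs that are shared by two or more VSS pairs.
    return {d: sorted(names) for d, names in by_id.items() if len(names) > 1}


# Hypothetical inventory, for illustration only.
inventory = {"site-a-vss": 10, "site-b-vss": 10, "site-c-vss": 20}
print(find_domain_id_conflicts(inventory))  # {10: ['site-a-vss', 'site-b-vss']}
```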
I’ve been through this problem for the last day and I couldn’t find any configuration issue.
Then I saw your post, and since I changed the virtual domain ID everything works fine.
Thanks a lot!
We are also facing the same problem and will work on Option 1 provided above, as taking downtime is not possible.
This post is very helpful to us.
I tried to implement Option 1 – “On one side, under the L3 interface use mac-address H.H.H” – but this did not solve my problem. Can you please help me with this?