The Case for Migrating CMU's Distribution Layer to VSS

Gabriel Somlo, June 2009

This document describes why and how CMU's distribution routing layer should be switched from HSRP-based redundancy to VSS. To make that case, I begin by covering the current design, along with the behavior of several technologies of interest to us (URPF, SLB, and FWSM). Next, I revisit the behavior of URPF, SLB, and FWSM under the new VSS model. Finally, I review the steps required to convert a distribution-layer router pair (Pod) from HSRP redundancy to a VSS stack, and how the resulting stack can be managed and monitored. Please feel free to append to this document as more experience and information are gathered on the "quirks" of VSS. This information should then be used as we consider a migration of our production distribution routers to VSS.

1. Our Current State: HSRP-Redundant Distribution Layer

Here's a snippet that illustrates CMU's current distribution layer design:
             --------
             | core |                                 L-3 Core
             -+----+-
         1/4 /      \ 2/4
            /        \
           /          \
          / v903       \ v901
         /              \
        /                \
       /                  \
      / 2/1                \ 2/1
------+------        ------+------
|           |5/4  5/4|           |
| pod-t-a84 +--------+ pod-t-233 |      L-3 Distribution Layer
|           |  v973  |           |
------+------        ------+------
   .2 | 2/8    .1       .3 | 2/8
      |                    |
      |                    |
      | v4                 | v4
      |                    |                  L-2 Access Layer
      |  ----------------  |
      |  |              |  |
      +--+  233-bag-a   +--+
   4/0/1 |              | 4/0/2
         ---+-------+----
         v4 |    v4 |
           ...     ...
Without loss of generality, we only show one core router and one access-layer switch, with one access-layer vlan/subnet (Vlan4). The two "halves" of our distribution router pair are each uplinked to each core using a dedicated point-to-point vlan/subnet, over which they speak OSPF. They each have an address (.2 and .3, respectively) on the subnet associated with the user-facing vlan (vlan4) and use HSRP to share the advertised default gateway IP, .1.

To avoid spanning-tree loops and the network outages typically associated with recalculating STP-based layer-2 paths, vlan4 is not trunked on the interlink between the two "pod halves". Instead, the interlink is a point-to-point Layer-3 routed connection, and the two halves speak OSPF to each other over this link as well.

During normal operation, one "half" of pod-t is HSRP-active, and thus "owns" the .1 default gateway IP for the subnet on vlan4. All edge devices on vlan 4 will thus send it their outbound network traffic. For return traffic (traffic sent from the core and headed toward the edge devices on vlan4), the load is shared 50-50 by the two pod-t halves, since both advertise equal OSPF routes for vlan4's subnet to the core. Traffic received by any device on vlan4 has a 50% chance of being relayed by either pod-t-a84 or pod-t-233.

This design can survive several different failure scenarios related to one member of a distribution router pair (e.g., pod-t-a84). The two "halves" are geographically separate, so the chance of both going down simultaneously is small enough to simply ignore.

2. Interaction of HSRP Redundancy with Other Technologies

Our HSRP-based distribution-layer redundancy scheme interacts with several other aspects of our network design, including the topology of the L-2 access layer and technologies we rely on such as URPF, SLB, and FWSM blades. We describe each of these interactions in this section, with the expectation that a migration to VSS would mitigate or eliminate their negative effects.

2.1. HSRP vs. Access Layer Topology

To avoid STP loops, we have so far avoided making available an access Vlan (such as vlan4) on more than one dually-uplinked access switch. In the figure above, vlan4 exists on the 233-bag-a switch and its uplinks to each "half" of pod-t, but not on the interlink between the two pod-t halves. If we made vlan4 available on another access switch, an STP loop would now exist, and the spanning tree would need to be recalculated in the event of certain links failing:
         ----------------
         |              |      
      +--+  233-bag-b   +--+
      |  |              |  |
      |  ----------------  |
      | v4                 | v4
      |                    |
------+------        ------+------
|           |        |           |
| pod-t-a84 +--------+ pod-t-233 |
|           |  v973  |           |
------+------        ------+------
      |                    |
      | v4                 | v4
      |  ----------------  |
      |  |              |  |
      +--+  233-bag-a   +--+
         |              |      
         ----------------
We are therefore reluctant to deploy any edge/access vlan to more than one dually uplinked aggregator access switch connected to a pair of distribution routers.

2.2. HSRP vs. URPF

URPF (Unicast Reverse Path Forwarding) checks are supported on routed interfaces mainly as a source-address anti-spoofing mechanism. When URPF checks are enabled on an interface, incoming packets are allowed only if the receiving interface is a valid way to route traffic back to the IP address from which the incoming packets originated. The straightforward application is to prevent users on a subnet served by our router from sending out packets with spoofed source IP addresses outside the range dictated by the subnet.

In our HSRP-based redundancy design, URPF interferes with external traffic to the router interfaces (.1, .2, and .3). This mainly impacts monitoring software that attempts to ping the router interfaces and alert in case one or more of them become unreachable.

Assume pod-t-a84 is currently HSRP-active for the subnet on vlan4, and thus the owner of both the .1 address and its own .2 one. Assume a monitoring station sends a ping to the .1 address from somewhere beyond the core. At the core, the ping has a 50% chance of being sent to either pod-t-a84 or pod-t-233, as both advertise the entire subnet to the core via OSPF with equal metrics. If the core sends the packet directly to pod-t-a84, it will be received and answered without incident. If, on the other hand, the packet is sent to pod-t-233, it will be forwarded out via the .3 interface into the L-2 access layer, and reach the .1 interface on pod-t-a84 from within the subnet. When this happens, URPF will discard the packet because, based on its off-subnet source IP address, it was received over the wrong interface.

In effect, roughly half the time, external packets to our .1, .2, and .3 interfaces will not reach their destination. When URPF is enabled on an interface, we have the option to add an ACL exempting given source IPs from the checks. We currently have such a list containing the known addresses of our monitoring machines, and other trusted IPs from which we need to be able to reliably contact our router interfaces. The downside is that such access lists must constantly be managed and updated to facilitate continued monitoring of the network.
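For illustration, the per-interface URPF statement and its exception ACL look roughly like the following sketch (the monitoring-station address is a hypothetical placeholder; the real ACL 195 contains our actual monitoring and trusted source IPs):
	interface Vlan4
	 ip verify unicast source reachable-via rx allow-self-ping 195

	! packets that fail the uRPF check are still forwarded if permitted by ACL 195
	access-list 195 permit ip host 192.0.2.50 any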

2.3. HSRP vs. SLB

We currently use SLB (Server Load Balancing) to make available well-known caching DNS server IP addresses from several different locations (at least one physical server from each distribution router pair). Without delving into too much SLB-specific detail, we configure a "server farm" containing the real IP addresses of one or more DNS servers, and then we create "vservers" which advertise the anycast virtual server IPs into the OSPF cloud.

SLB interacts with our HSRP design in two ways. First, SLB itself is HSRP-aware, and can be configured to only advertise the virtual IPs from the HSRP-active side of a redundant pair. Should HSRP fail over to the other member, SLB virtual IP advertisements would then be originated from that member. In addition, SLB allows its internal connection database to be replicated to another router. We use both measures to ensure that, in the event of a Pod member failing, the other member will take over not just the job of advertising and dispatching connections to the virtual server IPs, but also the management of existing connections.

Two extra configuration lines per SLB "vserver" are necessary: one of them ties the "inservice" status of the vserver to the "active" HSRP status of the router, and the other one establishes a connection to the "standby" HSRP router over which current connection status information is replicated.
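A minimal sketch of such a configuration follows; the server-farm name, real-server and virtual IPs, replication endpoints, and replication port are hypothetical placeholders, while the HSRP group name (SIXNET) is the one configured on Vlan4 (see the interface configuration in Section 4.2.4):
	ip slb serverfarm DNS-FARM
	 real 192.0.2.10
	  inservice

	ip slb vserver DNS-ANYCAST
	 virtual 192.0.2.53 udp 53
	 serverfarm DNS-FARM
	 ! replicate the connection database to the HSRP-standby peer
	 replicate casa 128.2.6.2 128.2.6.3 4231
	 ! advertise and accept connections only while this router is HSRP-active for group SIXNET
	 inservice standby SIXNET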

2.4. HSRP and FWSM blades

FWSMs (FireWall Service Modules) provide VPN/Crypto/Firewall handling acceleration to a 6500-series Cisco router. They are functionally similar to ASA5500-series security appliances, but instead of having their own physical connections, they access the 6500's Layer-2 infrastructure via the chassis backplane.

Layer-2 connections between the 6500 chassis and the FWSM cards are established by issuing the following commands on the chassis:
	firewall multiple-vlan-interfaces
	firewall vlan-group G V1,V2,V3-Vn
	firewall module M vlan-group G
Then, on the FWSM itself, Vlans V1, V2, etc. are available as Layer-3 configurable interfaces. The 6500 chassis itself may or may not have a layer-3 interface on these vlans. We typically configure an IP on the vlan serving as the FWSM's OUTSIDE interface, and simply switch the other firewall vlans at layer-2, allowing the FWSM alone to act as the default gateway on the subnets associated with those vlans.
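On the FWSM side, those vlans then receive ASA-style layer-3 interface configurations. A rough sketch, assuming hypothetical vlan numbers and addresses (exact syntax depends on the FWSM software version):
	! OUTSIDE: the 6500 chassis typically also keeps an SVI on this vlan
	interface Vlan100
	 nameif outside
	 security-level 0
	 ip address 192.0.2.2 255.255.255.0

	! a firewalled subnet: switched at layer 2 only on the chassis; the FWSM is its default gateway
	interface Vlan101
	 nameif inside
	 security-level 100
	 ip address 198.51.100.1 255.255.255.0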

Just like ASA5500's, two FWSM cards can be configured as an active-standby pair. In our HSRP-based distribution router design, we host an FWSM card in each Pod "half". For the FWSM's own active-standby redundancy, both cards need shared access to the same set of VLANs, which introduces issues similar to those affecting the topology of access/edge vlans hosted by the 6500 routers, described in Subsection 2.1 above. We'd have to trunk at least the OUTSIDE and dedicated failover Vlans across the pod interlink, but any production vlans serviced by the FWSMs would be subject to the same topology limitations w.r.t. STP loops as Vlan4 in Subsection 2.1. Such vlans could only be made available to one access-layer switch before STP loops would be created on them.

3. Using VSS at the Distribution Layer

In essence, VSS is a mechanism that allows two 6500 routers to be "stacked" similarly to how other Cisco L2 switches can be stacked. Another way to think about it is "backplane over etherchannel" across two 6500 chassis. Rather than using a dedicated short-distance stacking connector, VSS can occur over long distances, allowing the same geographic redundancy and survivability as our current HSRP based design.

VSS requires that the two 10Gig ports available on the SUP cards (Te5/4 and 5/5 in our case) be used for the VSL (virtual switch link). Extra ten gig links from additional 10gig line cards may optionally be used for additional "backplane" capacity. Interfaces now have three numbers (the old slot/port numbering scheme is now prefixed by the switch number, which can be either 1 or 2). Hence, Te2/1 on pod-t-a84 in the HSRP design with standalone routers now becomes Te1/2/1. Similarly, Te2/1 on pod-t-233 becomes Te2/2/1 in the new unified VSS switch.
        ----------
        |  core  |                                 L-3 Core
        -+------+-
     1/4 |      | 2/4
         |      |
         | v903 | etherchannel
         |      |
   1/2/1 |      | 2/2/1
---------+------+---------
| pod-t sw1 || pod-t sw2 |
|           ||           |                    L-3 Distribution Layer
|    a84    VSL   233    |
|          */5/*         |
---------+------+---------
   1/2/8 |      | 2/2/8
         |      |
         |  v4  | etherchannel
         |      |
   4/0/1 |      | 4/0/2
     ----+------+----
     |              |
     |   233-bag-a  |				L-2 Access Layer
     |              |
     ----+------+----
      v4 |   v4 |
        ...    ...
Independent links between the two HSRP members and each core router or access switch now become etherchannels. This reduces VLAN utilization (we no longer require a dedicated Vlan901 for pod-t-233's point-to-point core uplink) and IP address utilization (pod-t-233 no longer requires its own loopback IP or dedicated .3 IPs on each access subnet, and pod-t-a84 no longer requires its own .2 addresses on each access subnet). The resulting unified VSS system only requires one address per subnet serviced.

A switch to VSS eliminates the topology and STP related issues described in the previous section, and greatly simplifies the operation of URPF and SLB.

3.1. Topology and FWSM improvements

The access layer switch uplinks to the two physical chassis with a single (multi-chassis) etherchannel rather than with two separate layer-2 connections. As such, we can trunk any given access vlan (e.g. vlan4) to as many access switches as we desire, without the potential for introducing STP loops into the topology.

The same holds true for vlans serviced by FWSM blades. The layer-2 link between the VSS switch and the two FWSM cards (one per physical chassis) is accomplished with only a slight difference from the standalone configuration:
	firewall multiple-vlan-interfaces
	firewall vlan-group G V1,V2,V3-Vn
	firewall switch 1 module M vlan-group G
	firewall switch 2 module M vlan-group G
The vlan-group is made available to both FWSM cards. From this point on, the two cards share access to all these vlans over the backplane (and when I say "backplane" I include the VSL). The VSS switch only needs an IP address on the vlan supporting the FWSM OUTSIDE interface. The rest of the vlans are handed out via layer-2 etherchannels just like regular access vlans, without the potential for creating STP loops.

3.2. URPF improvements

The potential for an externally sourced packet to hit the router from within an access vlan/subnet is eliminated. URPF can be enabled on any access subnet without the requirement to manage an exception ACL containing the source IPs of network monitoring gear.

3.3. SLB simplification

We can operate SLB without the need to explicitly consider and configure replication and failover. Replication/failover for SLB is built into the underlying IOS when running in VSS mode.
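Reusing the hypothetical names and addresses from the sketch in Section 2.3, the same vserver under VSS reduces to:
	ip slb vserver DNS-ANYCAST
	 virtual 192.0.2.53 udp 53
	 serverfarm DNS-FARM
	 inservice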

4. Converting an HSRP Pair to VSS

This section describes how an HSRP-redundant pair of distribution routers (or "Pod" in CMU-speak) can be converted into a VSS stack without causing any user-visible outage. The Cisco document used to devise this migration plan may be found online here.

4.1. Preparing the two Pod halves for VSS

First, make sure both routers (and FWSM blades) are running the appropriate software versions. If FWSM support is required, that means SXI on the IOS side and 4.0.X on the FWSM side. At the time of this writing, we're running the "s72033-advipservicesk9_wan-mz.122-33.SXI1.bin" image on the routers, and "c6svc-fwm-k9.4-0-5.bin" on the FWSM cards.

Next, ensure that the two ten-gig interfaces on the sup cards (Te5/4 and Te5/5) are available for configuring the VSL. If they are currently in use, migrate their existing connections to other interfaces.

Make backup copies of the running configs on each router, just in case.

Plan to keep around the loopback IP of the HSRP active router. The loopback(s) configured on the other box (as well as dedicated point-to-point vlans and subnets) will cease to be in use after the conversion is complete. As a convention, the active HSRP router will become "switch 1" in the VSS stack, and the other one will become "switch 2". We need to pick a number to reflect our "switch virtual domain", and, by local convention, we use the last octet of the router loopback IP. So, if pod-t-a84's IP address is 128.2.1.132, we'll end up using "switch virtual domain 132".

We begin the preparations by configuring SSO (Stateful SwitchOver) and NSF (NonStop Forwarding) on both routers. Note that SSO may already be enabled by default:
	redundancy
	 mode sso

	router ospf 1
	 nsf
Next, we assign the switch number within the virtual domain. On pod-t-a84, we enter:
	switch virtual domain 132
	 switch 1
Similarly, on pod-t-233 we enter:
	switch virtual domain 132
	 switch 2
Next, we configure the VSL portchannels on (at least) the sup ten-gig interfaces (in our case, Te5/4 and Te5/5). We pick portchannel numbers 10 for switch1 and 20 for switch2. Note that both port channel numbers must be available on both switches before this configuration step is performed. On pod-t-a84, we enter:
	interface port-channel 10
	 switch virtual link 1
	 no shut

	interface range Te5/4-5
	 channel-group 10 mode on
	 no shut
Similarly, on pod-t-233:
	interface port-channel 20
	 switch virtual link 2
	 no shut

	interface range Te5/4-5
	 channel-group 20 mode on
	 no shut
Next, we must ensure that both switches have the same PFC mode, and that it is set to "PFC3c":
	platform hardware vsl pfc mode pfc3c
We are now ready to convert the two routers to a unified stack. To do this seamlessly, we perform the next step on pod-t-a84 first. This will cause the chassis to reload, and come back in VSS mode. During the reload, pod-t-233 will still perform its HSRP standby duties and continue passing user traffic without interruptions in service:
	switch convert mode virtual
After pod-t-a84 reloads, it will reclaim HSRP primary status from pod-t-233. The only difference is that now all its interfaces are prefixed by the switch ID (1 in this case): interfaces such as Ten2/1 are now numbered Ten1/2/1, to reflect that they're part of "switch 1" in the stack. At this moment, pod-t-a84 is a VSS stack with only one member. We can now issue the conversion command on pod-t-233, again without causing interruption in user traffic, since pod-t-a84 is now up and running. Note: run 'term mon' on the already converted pod-t-a84, to monitor for messages that let us know when the second stack member becomes available. On pod-t-233, issue the conversion command:
	switch convert mode virtual
After this step, pod-t-233 no longer officially exists. The chassis will reload, and once the reload is complete, we'll notice that pod-t-a84 has extra ports and modules available. There will be new interfaces such as Te2/2/1, reflecting the availability of a second stack member. Watch the logging messages on pod-t-a84, and specifically wait for something like "VSL_UP: Ready for control traffic" and "perform exec command switch accept mode virtual". Once the latter message is logged, we can issue the final conversion command on the new stack:
	switch accept mode virtual
This last command merges the configs across the two chassis, and finalizes the "stacking" process.

Ports on "switch 2" of the stack will become available in unconfigured, shutdown mode (except for interfaces Ten2/5/4 and Ten2/5/5, which are part of the VSL).

4.2. Configuring a freshly converted VSS stack

4.2.1. Optional stack member priority and preemption
We may optionally wish to configure switch priority and preemption, to ensure that, e.g., switch 1 is always active when available (and switch 2 is always standby when switch 1 is available). I have not configured this on my test setup, and would like a better reason to configure it than simple aesthetics. If both switches have equal priority, a failover switches the active status to the *other* switch, where it will remain until another failover. In either event, the following commands *would* make switch 1 the preferred-active, higher-priority member of the stack:
	switch virtual domain 132
	 switch 1 priority 105
	 switch 1 preempt
4.2.2. Converting access-layer downlinks to etherchannels
At this point, the new VSS stack uses only connections on switch 1, along with the vlan numbers, subnets, and IP addresses inherited from pod-t-a84, and is still configured to act as the HSRP primary (but with an HSRP standby which no longer exists).

The fact that all non-VSL ports on switch 2 are in shutdown mode presents us with an opportunity to convert our existing uplinks to multi-chassis etherchannels in a seamless manner, without causing user-visible outages. As an example, our 233-bag-a access switch is now connected to the stack via its Ten4/0/1 interface, wired to Ten1/2/8. Its other port, Ten4/0/2, physically wired to Ten2/2/8 on the stack, is currently down (due to Ten2/2/8 itself being shut down). We begin by configuring an etherchannel link between the access switch and the stack using this latter physical connection. On both the access switch and the stack, we pick an available portchannel number X, and configure it to use the available link that is currently down. On the access switch, we enter:
	interface Port-channelX
	 switchport
	 switchport trunk encapsulation dot1q
	 switchport trunk native vlan 4
	 switchport trunk allowed vlan 4,A,B,C
	 switchport mode trunk
	 no shut
 
	interface Ten4/0/2
	 switchport
	 switchport trunk encapsulation dot1q
	 switchport trunk native vlan 4
	 switchport trunk allowed vlan 4,A,B,C
	 switchport mode trunk
	 channel-group X mode on
	 no shut
We now have two links between the access switch and the stack: the original link, which stayed up after the conversion (Te4/0/1 -> Te1/2/8), and the new etherchannel (containing Te4/0/2 -> Te2/2/8). One of these links will be pruned by STP for a brief period, until we proceed to shut down the former. This may result in a few seconds of production traffic interruption (in case the STP-pruned link was the etherchannel, which would now have to go through the listening/learning/forwarding cycle). Once things stabilize, the newly available ports (Te4/0/1 and Te1/2/8) may be added to the port-channel interface on the access switch and stack, respectively. The resulting two-link etherchannel is now an STP-free, physically redundant uplink between the access switch and the distribution-layer stacked router.
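A minimal sketch of that final step on the access-switch side, assuming Te4/0/1's existing trunk settings already match those of Port-channelX (the stack-side Te1/2/8 is folded into its port-channel the same way):
	interface Ten4/0/1
	 shut
	 channel-group X mode on
	 no shut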
4.2.3. Dual-Active detection, PAGP, and Etherchannel-ing the core uplinks
Before converting the single uplink to (each) core into an etherchannel, a discussion of Dual-Active detection is in order. Each VSS stack member monitors the VSL and, should the VSL fail in its entirety (i.e., all etherchanneled ten-gig interfaces in the VSL), assumes that it is now the "sole survivor" and becomes active. In very rare cases (e.g., a backhoe cuts an entire non-physically-diverse VSL), both stack members might end up thinking their mate is down and attempt to become active at the same time, with all the undesirable side-effects that entails.

Several options exist to implement a dual-active detection mechanism which would allow the stack members to realize what just happened, and allow one of them to shut itself down and wait for the VSL to come back up. All but one of these options require a separate dedicated link between the two chassis, which feels clunky and requires extra hardware; moreover, unless we use ports on modules *other* than the ones already supporting the VSL, *and* unless we ensure this dedicated link is geographically diverse from the VSL, it is only of marginal value, as it might go down along with the rest of the VSL member links. For that reason, I've picked the one remaining method, which does not rely on a special link dedicated to dual-active detection.

The Port AGgregation Protocol (PAgP) is a Cisco-proprietary Etherchannel management protocol which allows a VSS stack to perform dual-active detection with the help of one or more adjacent Cisco devices which also have PAgP enabled on the etherchannel connecting them to the VSS stack. Since we only have two core routers, and they are also Cisco 6500 devices, I decided to use the etherchannels connecting our VSS stack to them for dual-active detection. This allows us to stay away from requiring proprietary protocols to be operated on our access-layer switches, which have a much higher chance of being non-Cisco devices and thus not even supporting PAgP. We convert the (currently down) link between the stack's Ten2/2/1 and the core's Ten2/4 into a PAgP-enabled etherchannel on vlan 903 (same as the existing active uplink between the stack and our core). We discard vlan901 as it is no longer necessary. Note also that we use 'channel-group X mode desirable' instead of 'mode on', which enables use of PAgP.
	interface Port-channelX
	 switchport
	 switchport access vlan 903
	 switchport mode access
	 no shut
 
	interface Ten2/2/1
	 switchport
	 switchport access vlan 903
	 switchport mode access
	 channel-group X mode desirable
	 no shut
We are now free to shut down the old active link (between Te1/2/1 on the stack and Te1/4 on the core), and add it to the portchannel.
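On the stack side, that step would look roughly as follows, mirroring the Ten2/2/1 configuration above (the core's Te1/4 end is reconfigured analogously):
	interface Ten1/2/1
	 shut
	 switchport
	 switchport access vlan 903
	 switchport mode access
	 channel-group X mode desirable
	 no shut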

Once PAgP-capable etherchannels have been established to (all of) the core router(s), we may enable PAgP-based dual-active detection on the stack:
	int poX
	 shut
	switch virtual domain 132
	 dual-active detection pagp trust channel-group X
	int poX
	 no shut
where X iterates over the port-channel number connecting us to each core router. Note that the port-channel interface must be in shutdown mode while it is being added to the dual-active detection setup. Note also that this should not cause an outage when multiple cores are available and the process is conducted one port-channel at a time.
4.2.4. Cleaning up the stack config
At this point, the conversion to VSS is complete. We might consider renaming the resulting stack to some more location-neutral name such as "pod-t".
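The rename itself would be a single configuration command on the stack:
	hostname pod-t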

We should also remove all the HSRP-related configuration entries from each interface, as well as the ACL containing the URPF check exceptions. Before cleanup, the configuration of an interface might look something like this (note that all non-relevant entries, which stay the same across the conversion, have been removed for simplicity):
	interface Vlan4
	 ip address 128.2.6.2 255.255.255.0
	 ip verify unicast source reachable-via rx allow-self-ping 195
	 standby 1 ip 128.2.6.1
	 standby 1 priority 105
	 standby 1 preempt
	 standby 1 authentication md5 key-string 7 XXXXXXXXXXXXXXXXXXXX
	 standby 1 name SIXNET
	 standby 1 track Loopback2 30
	 standby 1 track Vlan903 20
	 standby 1 track Vlan904 20
We may begin by replacing the URPF statement with simply:
	interface Vlan4
	 ip verify unicast source reachable-via rx allow-self-ping
Note the absence of an ACL reference. We can remove ACL 195 from the stack as soon as all interfaces have been cleaned up, with the long-term benefit of one less place for bitrot to accumulate over time.
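Once no interface references it any longer, removing the ACL is a one-liner:
	no access-list 195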

Next, we may remove most of the 'standby' hsrp config statements:
        interface Vlan4
         no standby 1 priority 105
         no standby 1 preempt
         no standby 1 authentication md5 key-string 7 XXXXXXXXXXXXXXXXXXXX
         no standby 1 name SIXNET
         no standby 1 track Loopback2 30
         no standby 1 track Vlan903 20
         no standby 1 track Vlan904 20
The next step might cause a brief (seconds or less) interruption in traffic over the interface being cleaned up. Also, it is very important to be logged into the stack from *somewhere other* than the subnet serviced by the interface being cleaned up. In rapid sequence, we type:
	interface Vlan4
	 no standby 1 ip 128.2.6.1
	 ip address 128.2.6.1 255.255.255.0
This finalizes the removal of HSRP configuration on our routed interfaces.

5. Managing and Monitoring the VSS Stack

Previously, each HSRP Pod "half" could be independently monitored via pings to its respective loopback IP. With VSS, this is no longer an option.

5.1. SNMP Monitoring

To monitor the health of a VSS stack we may use SNMP. VSS-specific SNMP traps may be sent to a listener in the event of a VSL failover, by adding the following configuration to the stack:
	snmp-server enable traps vswitch vsl
During failover, the following traps are sent to snmptrapd:
        SNMPv2-SMI::enterprises.9.9.388 Enterprise Specific Trap (1) Uptime: 0:07:26.86
        SNMPv2-SMI::enterprises.9.9.388.1.3.1.1.3.168 = INTEGER: 2
The INTEGER value is 2 for failure, and 1 for recovery. The MIB used to translate these into human-readable form is available here.
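The trap destination itself is defined with the standard snmp-server host statement; a minimal sketch, assuming a hypothetical trap receiver at 192.0.2.50 and the community string "public":
	snmp-server host 192.0.2.50 version 2c public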

In addition, we may configure our monitoring station to use SNMP polling to monitor specific interfaces (such as Te1/2/1 and Te2/2/1) to allow detection of a failed stack member.

5.2. IOS commands for VSL troubleshooting and management

Troubleshooting commands:
        sho switch virt link [detail]
Displays information on the VSL
        sho switch virt link port-channel
Displays port-channel specific information on the VSL
        sho switch virt link port
Displays port-specific information on VSL member ports
        show switch virtual role
Lists role and state information on each stack member
        show module switch { 1 | 2 | all }
Lists modules present in each stack member
        show switch virtual dual-active pagp
Shows operational status information on the dual-active detection setup.

Administrative commands:
        redundancy reload peer
Causes the peer to reload. Useful during software upgrade.
        redundancy force-switchover
Causes a switchover of the active role to the other stack member. Also useful during software upgrade. Currently, ssh login sessions are lost during a switchover.
        redundancy config-sync {ignore | validate} mismatched-commands
Synchronizes configurations across the two stack members. Can be useful in tracking down mismatches.

6. Conclusion

VSS greatly simplifies management of distribution-layer routing, and reduces the opportunity for outages to occur either due to configuration mismatches across two separately managed HSRP-redundant routers, or due to the almost unavoidable spanning-tree loops that are introduced in the absence of VSS.

The conversion to VSS can be performed without any significant downtime, as outlined throughout this document. CMU's Pod-T (our distribution router dedicated to testing) has been switched over with great results, and will be used for long-term stress testing, before a migration to VSS is considered for our other (production) distribution-layer routers.