How to Replace a Faulty Cisco Switch from a Stack and Add a New Switch

Introduction

Cisco StackWise technology allows multiple physical switches to operate as a single logical unit, sharing a common configuration, management IP, and forwarding table. In a production environment, a switch within a stack can sometimes fail due to hardware faults, power issues, or software corruption.

This article provides a complete, step-by-step guide to safely identifying and removing a faulty Cisco switch from an existing stack, and then adding a new replacement switch — all while minimising network downtime. The procedure applies primarily to Cisco Catalyst series switches supporting StackWise, StackWise Plus, and StackWise-480 (e.g., Catalyst 3750, 3850, 9300 series).

This guide assumes you have console or SSH access to the stack master (active switch) before beginning. Always maintain a backup of your running configuration before making any hardware changes.

Prerequisites

  • Physical access to the switch stack in the rack
  • Console cable or SSH access to the stack master
  • A replacement Cisco switch of the same model and series
  • StackWise stacking cables (same type as existing stack)
  • The correct IOS/IOS-XE software version for the replacement switch
  • A copy of the current running configuration (exported to TFTP or local flash)
  • Basic understanding of Cisco IOS CLI commands

Removing the active stack master will trigger a re-election and may cause a brief traffic disruption. Identify the master before proceeding and plan maintenance accordingly.

Understanding Cisco Stack Roles

Before replacing any switch, it is important to understand the roles each member plays in the stack. Each switch in a Cisco stack is assigned a role based on priority and election criteria.

Role | Description | Impact if removed
Active (master) | Controls the entire stack; holds the running configuration and routing table | Triggers master re-election; brief traffic interruption possible
Standby | Ready to take over as master if the active switch fails; keeps its state synchronised | Minimal impact; another member becomes standby
Member | Forwards traffic under the master's control | Only the ports on that switch go down; the stack continues operating

Best practice: always ensure the faulty switch is a member (not master) before physical removal. If it is the master, reload it first so another switch takes over the master role.

Step 1 — Verify the Stack and Identify the Faulty Switch

Log in to the stack master via console or SSH and run the following commands to view the current stack topology and identify which member is faulty.

Check all stack members and their status:

show switch
  

Sample output showing a faulty member:

Switch/Stack Mac Address : 0011.2233.4455
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
--------------------------------------------------------------------
*1       Active  0011.2233.4401  15       V04      Ready
 2       Standby 0011.2233.4402  10       V04      Ready
 3       Member  0011.2233.4403  1        V04      Provisioned
 4       Member  0011.2233.4404  1        V04      Removed
  

In the above output, Switch 4 shows a state of Removed, indicating that it has dropped out of the stack, either because it is faulty or because it has lost stack connectivity. Switch 3 shows Provisioned, which means a configuration exists for member number 3 but no physical switch is currently detected in that position.
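If you check several stacks, or want to repeat the same check before and after the change, the State column can be read programmatically. The following Python sketch parses show switch output laid out like the sample above and lists members that are not Ready. It assumes exactly this column layout (which can differ between platforms and releases) and that you capture the output yourself, for example over SSH; parse_show_switch and not_ready are hypothetical helper names, not part of any Cisco tool.

```python
import re
from dataclasses import dataclass
from typing import List

# One member row of `show switch`, e.g.
# "*1       Active  0011.2233.4401  15       V04      Ready".
# Column layout is assumed from the sample output; adjust for your release.
ROW = re.compile(
    r"^\s*(?P<active>\*?)(?P<num>\d+)\s+(?P<role>\S+)\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+"
    r"(?P<priority>\d+)\s+(?P<hw>\S+)\s+(?P<state>\S.*?)\s*$"
)


@dataclass
class Member:
    number: int
    role: str
    mac: str
    priority: int
    state: str
    is_active: bool  # row marked with "*" in the output


def parse_show_switch(output: str) -> List[Member]:
    """Return one Member per line that matches the assumed row layout."""
    members = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            members.append(Member(
                number=int(m.group("num")),
                role=m.group("role"),
                mac=m.group("mac").lower(),
                priority=int(m.group("priority")),
                state=m.group("state"),
                is_active=m.group("active") == "*",
            ))
    return members


def not_ready(members: List[Member]) -> List[Member]:
    """Members whose current state is anything other than Ready."""
    return [m for m in members if m.state != "Ready"]


SAMPLE = """\
Switch/Stack Mac Address : 0011.2233.4455
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
--------------------------------------------------------------------
*1       Active  0011.2233.4401  15       V04      Ready
 2       Standby 0011.2233.4402  10       V04      Ready
 3       Member  0011.2233.4403  1        V04      Provisioned
 4       Member  0011.2233.4404  1        V04      Removed
"""

if __name__ == "__main__":
    for m in not_ready(parse_show_switch(SAMPLE)):
        print(f"Switch {m.number} ({m.role}) is {m.state}")
```

Run against the sample, it reports Switches 3 and 4, matching the reading given above.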

For detailed hardware and power status of a specific member:

show switch 4 detail
  
show platform
  

Note down the Switch Number of the faulty switch — you will need it during the removal and renumbering steps.

Step 2 — Back Up the Running Configuration

Before making any changes, always export the current running configuration. This ensures you can restore the stack state if anything goes wrong.

Save the running config to the startup config:

copy running-config startup-config
  

Export to a TFTP server for an off-device backup:

copy running-config tftp://192.168.1.100/stack-backup.cfg
  

Alternatively, save to flash:

copy running-config flash:stack-backup.cfg
  

Keep a printed or text copy of critical interface configurations, VLANs, and IP addressing in case the replacement switch needs manual configuration.
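Because Step 8 may require re-applying the faulty member's port configuration by hand, it can help to pull those sections out of the backup now. The Python sketch below reads an IOS-style configuration saved as text (for example, the stack-backup.cfg file from this step) and collects every interface block whose name carries a given member number. It assumes the usual layout, with interface sub-commands indented by one space and each block ended by ! or the next top-level line; interface_blocks_for_member is a hypothetical helper name.

```python
import re
from typing import Dict, List


def interface_blocks_for_member(config: str, member: int) -> Dict[str, List[str]]:
    """Map interface name -> its sub-commands, for interfaces on one stack member.

    Stack interface names follow <type><member>/<module>/<port>,
    e.g. GigabitEthernet4/0/1 belongs to member 4.
    """
    header = re.compile(rf"^interface\s+([A-Za-z-]+{member}/\d+/\d+)\s*$")
    blocks: Dict[str, List[str]] = {}
    current = None
    for line in config.splitlines():
        match = header.match(line)
        if match:
            current = match.group(1)
            blocks[current] = []
        elif current is not None and line.startswith(" "):
            blocks[current].append(line.strip())
        else:
            # "!" or any other top-level line ends the current interface block
            current = None
    return blocks
```

Writing the returned blocks to a text file gives you the "text copy of critical interface configurations" recommended above in a form that is easy to paste back later.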

Step 3 — Gracefully Remove the Faulty Switch from Stack

If the faulty switch is the Active Master, you must first transfer the master role to another member before physically removing it to reduce traffic impact.

If the faulty switch is a regular member (not master or standby), you can safely power it off and disconnect the stacking cables without issuing any CLI commands first. The stack will automatically remove it from the topology.

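Optionally, remove the provisioned slot from the configuration. Be aware that this also deletes the interface configuration stored for that member number, so skip it if the replacement switch will reuse number 4 and should inherit the existing port settings:

configure terminal
no switch 4 provision
end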
        

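If the faulty switch is the active master, first raise the priority of the member you want to take over and lower the faulty master's priority, so that it does not regain the master role in a later election, then reload the faulty master:

switch 2 priority 15
switch 1 priority 1
reload slot 1

After the reload, verify with show switch that Switch 2 is now the active master. (On stacks that have a standby, such as the 3850 and 9300, the standby takes over when the active reloads regardless of priority.) The old master (Switch 1) is now just a member and can be physically removed safely.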

If the faulty switch is completely unresponsive (no console, no ping, no stack heartbeat), proceed directly to physical removal. Power off the faulty unit from the power strip or pull the power cable, then disconnect both stacking cables. Reconnect remaining stack members with a stacking cable to maintain the ring topology.

Breaking the stack ring without reconnecting the remaining members will degrade stack performance. Always reconnect cables promptly.

Step 4 — Physical Removal of the Faulty Switch

Once the CLI steps are complete, proceed with the physical removal from the rack. Follow these steps carefully:

  1. Power off the faulty switch (press the power button or disconnect the power cable)
  2. Disconnect both StackWise stacking cables from the faulty switch's stack ports
  3. Label and disconnect all network patch cables from the faulty switch's ports
  4. Slide the switch out of the rack and set it aside
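  5. Close the ring between the remaining members: unplug one of the two stacking cables that served the faulty switch from its remaining neighbour, then plug the free end of the other cable into that neighbour's now-empty stack port (use a longer cable if the two neighbours are not adjacent in the rack)
  6. Verify the ring is intact: show switch should list all remaining members as Ready, and show switch stack-ring speed should report a full ring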

Never leave the stack in a broken ring (open chain) state for longer than necessary. A broken ring means the stack is operating at half bandwidth, and one further cable or switch failure would split it into two separate stacks.
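The difference between a closed ring, an open chain, and a split stack comes down to how the members' stack ports are cabled. As an illustration, the Python sketch below classifies a stack from a table of each member's Port 1 and Port 2 neighbours, similar in spirit to what show switch neighbors reports. The input format and the ring_status name are assumptions for this example; it does not parse real command output.

```python
from typing import Dict, Optional, Tuple

# member number -> (neighbour on Stack Port 1, neighbour on Stack Port 2);
# None means that stack port has no cable / no neighbour.
Neighbours = Dict[int, Tuple[Optional[int], Optional[int]]]


def ring_status(neighbours: Neighbours) -> str:
    """Classify stack cabling as 'closed ring', 'open chain', 'split' or 'standalone'."""
    if len(neighbours) < 2:
        return "standalone"
    start = next(iter(neighbours))
    seen = {start}
    to_visit = [start]
    # Follow stack cables from one member to find every member reachable from it.
    while to_visit:
        current = to_visit.pop()
        for nb in neighbours[current]:
            if nb is not None and nb in neighbours and nb not in seen:
                seen.add(nb)
                to_visit.append(nb)
    if seen != set(neighbours):
        return "split"  # some members cannot reach the others over stack cables
    if any(nb is None for ports in neighbours.values() for nb in ports):
        return "open chain"  # connected, but at least one stack port is unused
    # Every member uses both ports and all are connected: a single ring.
    return "closed ring"
```

Removing switch 4 from a four-member ring without recabling gives the open-chain case, for example {1: (None, 2), 2: (1, 3), 3: (2, None)}; connecting switches 3 and 1 directly, as in Step 4, turns it back into a closed ring.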

Step 5 — Prepare the New Replacement Switch

Before inserting the new switch into the stack, you must ensure it is running the correct IOS/IOS-XE version and is configured with the correct stack member number.

Boot the new switch standalone (not connected to the stack) and check its IOS version:

show version
  

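Compare the IOS version with the running stack members. All stack members must run the same software version; a switch whose version differs is placed in a version-mismatch state and does not operate as a normal stack member until its software is brought into line.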
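To make that comparison repeatable, the version string can be pulled out of the captured show version output of the stack and of the standalone switch. This Python sketch assumes the first "Version <x>" token in the output is the running software version, as in the first line of typical IOS and IOS-XE output; extract_version and same_software are hypothetical names. It compares strings exactly, so make sure both outputs come from the same command.

```python
import re

# First "Version <x>" token in `show version` output, e.g.
# "Cisco IOS XE Software, Version 16.12.05" -> "16.12.05".
# Assumption: the running version is the first such token in the output.
VERSION = re.compile(r"Version\s+([^\s,]+)")


def extract_version(show_version_output: str) -> str:
    """Return the first version string found, or raise if none is present."""
    match = VERSION.search(show_version_output)
    if match is None:
        raise ValueError("no 'Version' string found in show version output")
    return match.group(1)


def same_software(stack_output: str, new_switch_output: str) -> bool:
    """True when the new switch reports exactly the stack's software version."""
    return extract_version(stack_output) == extract_version(new_switch_output)
```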

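If the software does not match, either upgrade the new switch manually (see Troubleshooting Common Issues) or rely on Cisco's Auto-Upgrade feature, which lets the stack master copy its software to a joining member. On IOS-XE platforms such as the 3850 and 9300 it is controlled by the software auto-upgrade enable global configuration command; confirm its state on your release rather than assuming a default:

configure terminal
software auto-upgrade enable
end

Assign the new switch a stack member number (e.g., number 4) before connecting it. The syntax is switch stack-member-number renumber new-stack-member-number; a standalone switch is normally member 1:

switch 1 renumber 4
reload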

The renumber command takes effect after a reload. If you skip this step, the stack master will automatically assign the lowest available number to the new member when it joins.
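The numbering rule described above is simple enough to model, which can help when planning which number a replacement will end up with. This Python sketch assumes a maximum of eight members (set max_members to your platform's documented limit) and treats only the numbers of members currently in the stack as in use; it does not model how provisioned-but-absent numbers are handled. Both function names are hypothetical.

```python
from typing import Iterable


def lowest_available(in_use: Iterable[int], max_members: int = 8) -> int:
    """Lowest member number in 1..max_members not already used in the stack."""
    used = set(in_use)
    for number in range(1, max_members + 1):
        if number not in used:
            return number
    raise ValueError("no free stack member number")


def number_after_join(requested: int, in_use: Iterable[int], max_members: int = 8) -> int:
    """Keep the requested number if it is free, otherwise take the lowest available one."""
    used = set(in_use)
    if requested not in used:
        return requested
    return lowest_available(used, max_members)
```

With members 1 to 3 present, a new switch left at its default number 1 would come up as member 4, the same result the explicit renumber gives.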

Step 6 — Add the New Switch into the Stack

Insert the new switch into the ring at the position the faulty unit occupied. If you closed the ring in Step 4, reopen it at that point so that two free stacking cable ends are available for the new switch.

  1. Slide the new switch into the rack in the position vacated by the faulty unit
  2. Connect the first free stacking cable end into Stack Port 1 of the new switch
  3. Connect the second free stacking cable end into Stack Port 2 of the new switch
  4. Connect the power cable to the new switch
  5. Power on the new switch

The new switch will boot, detect the stack, and begin the join process. The stack master will push the IOS software to the new member if Auto-Upgrade is enabled and versions differ. This process can take 5–15 minutes depending on IOS image size.

Do not power cycle the stack master or otherwise interrupt the stack while the software upgrade is in progress. Interrupting the image copy can leave the new member in an unbootable state.
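Rather than intervening while the upgrade runs, it is safer to poll and wait. The Python sketch below is a read-only polling loop that uses the 15-minute upper bound quoted above as its default timeout. fetch_state is a hypothetical callable you would supply (for example, one that captures show switch over SSH and returns the new member's State column), and the 30-second interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_state: Callable[[], str],
    timeout_s: float = 15 * 60,
    interval_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_state() until it returns "Ready" or timeout_s elapses.

    Returns True if the member reached Ready, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if fetch_state() == "Ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)  # wait before checking the state again
```

A False result is a cue to investigate with the troubleshooting commands below, not to power cycle the stack.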

Step 7 — Verify the New Switch Has Joined the Stack

Once the new switch has completed its boot cycle and software upgrade (if applicable), verify it has successfully joined the stack and is in a Ready state.

Check all stack members:

show switch
  

Expected output after successful replacement:

Switch/Stack Mac Address : 0011.2233.4455
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
--------------------------------------------------------------------
*1       Active  0011.2233.4401  15       V04      Ready
 2       Standby 0011.2233.4402  10       V04      Ready
 3       Member  0011.2233.4403  1        V04      Ready
 4       Member  0011.2233.4405  1        V04      Ready
  

Verify the stacking ring is intact and bandwidth is full:

show switch stack-ring speed
  
show switch neighbors
  

Verify the new member's interfaces are visible:

show interfaces status | include Gi4
  

All members showing Ready and the ring showing full speed confirm a successful stack replacement. If the new switch took over the faulty switch's member number, it automatically inherits the interface-level configuration (access VLAN, trunk, and port-channel settings) stored for that number; VLANs themselves are defined stack-wide.
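If you script the verification, the ring check can be automated alongside the member-state check. This Python sketch assumes show switch stack-ring speed prints "Key : value" lines including one named Stack Ring Configuration whose value is Full for an intact ring; confirm the exact field names on your own switch before relying on it. parse_key_values and ring_is_full are hypothetical names.

```python
from typing import Dict


def parse_key_values(output: str) -> Dict[str, str]:
    """Split "Key : value" lines into a dict, stripping keys and values."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ring_is_full(stack_ring_speed_output: str) -> bool:
    """True if the (assumed) Stack Ring Configuration field reads Full."""
    fields = parse_key_values(stack_ring_speed_output)
    return fields.get("Stack Ring Configuration", "").lower() == "full"
```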

Step 8 — Restore Interface Configurations

If the replacement switch was assigned the same stack member number as the faulty one, the interface configuration the stack retained for that number (access VLAN, trunk, port-channel) is applied automatically, provided the provisioned configuration was not removed in Step 3. Verify critical interfaces are up:

show interfaces GigabitEthernet4/0/1 status
  

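If the new switch received a different member number, its ports appear under that number (for example, GigabitEthernet5/0/x if it joined as member 5), and the configuration saved for the old number is not applied to them. Either renumber the switch to the old number and reload it, or re-apply the configuration manually. First view the relevant section of your backup configuration (for example, with more flash:stack-backup.cfg), then apply it to the new port names:

configure terminal
interface GigabitEthernet5/0/1
 description SERVER-01
 switchport mode access
 switchport access vlan 100
 spanning-tree portfast
 no shutdown
end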

Save the updated configuration:

copy running-config startup-config
  

Reconnect all patch cables to their original ports on the new switch. Update your network documentation to reflect the new switch's MAC address and serial number.
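When the replacement joined under a different member number, re-applying the configuration in this step mostly means rewriting port names from the old member number to the new one. The Python sketch below performs that textual substitution on a saved configuration excerpt. renumber_interfaces is a hypothetical helper; it only rewrites names of the form type, member, module, port (such as Gi4/0/1), so review the result before pasting it into the switch.

```python
import re


def renumber_interfaces(config: str, old_member: int, new_member: int) -> str:
    """Rewrite <type><old>/<module>/<port> interface names to use new_member.

    Example: GigabitEthernet4/0/1 -> GigabitEthernet5/0/1 when old=4, new=5.
    Names without the member/module/port form (Vlan4, Port-channel1) are untouched.
    """
    pattern = re.compile(rf"\b([A-Za-z-]+){old_member}/(\d+)/(\d+)\b")
    return pattern.sub(rf"\g<1>{new_member}/\g<2>/\g<3>", config)
```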

Troubleshooting Common Issues

If the new switch boots but does not appear in show switch, check the stacking cables are firmly seated in the correct stack ports. Try swapping the cable connections between Port 1 and Port 2. Also verify the switch model is compatible with the existing stack (same Catalyst series).

show switch detail
show log | include STACKMGR
        

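If Auto-Upgrade fails, manually copy the correct IOS image to the new switch's flash before connecting it to the stack. Boot the switch standalone, then run the following (the image filename is only an example; use the exact image for your platform and release):

copy tftp://192.168.1.100/cat3k_caa-universalk9.bin flash:
configure terminal
boot system flash:cat3k_caa-universalk9.bin
end
copy running-config startup-config
reload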

After the switch is on the correct version, connect it to the stack.

A broken ring causes the stack to operate as a chain. Confirm the ring topology by checking neighbours. If a port shows as not connected, reseat or replace the stacking cable between those two members.

show switch neighbors
show switch stack-ports summary
        

If the new switch's member number is already in use, the stack master automatically assigns it the lowest available member number. To control the number yourself, renumber the new switch while it is still standalone, then reconnect it to the stack:

switch 1 renumber 5
reload
        

Quick Reference — Key Commands

Command | Purpose
show switch | List all stack members, roles, and states
show switch detail | Detailed hardware and uptime per member
show switch neighbors | Display stack ring topology and cable connections
show switch stack-ring speed | Confirm the ring is operating at full bandwidth
switch X priority Y | Set master election priority for a member
reload slot X | Reload a specific stack member only
switch X renumber Y | Assign a new member number to a switch
no switch X provision | Remove a provisioned (absent) member slot
show version | Check IOS version and uptime per switch
copy running-config startup-config | Save configuration to NVRAM

Conclusion

Replacing a faulty Cisco switch in a stack is a structured process that, when followed carefully, can be completed with minimal or zero impact to the rest of the network. The key steps are: identify the faulty member, back up your configuration, gracefully transfer roles if needed, physically swap the hardware, and verify the new member has joined with a Ready state.

Cisco StackWise technology is designed to make this kind of maintenance straightforward — the stack master automatically provisions the new member with the correct software and configuration for its slot, provided Auto-Upgrade is enabled and the stack member number is correctly assigned.

After a successful replacement, update your network inventory records with the new switch's serial number, MAC address, and installation date. This helps with future maintenance planning and RMA tracking.

For large enterprise environments managing multiple stacks, consider using Cisco DNA Center or Cisco Prime Infrastructure to automate switch provisioning and reduce manual steps during replacements.