A Caribbean island government operated its primary data center out of a third-party service provider facility. The core network infrastructure, the switching backbone connecting every server, storage array, and virtual environment, was built on Cisco Catalyst 6500 chassis switches that had been in place for years.
The original deployment called for six Catalyst 6500 units. Two were positioned as the active core pair, two were earmarked for a replication site that never materialized, and the remaining two sat unused. That left the entire government's hosted infrastructure running through a single pair of aging chassis switches with Supervisor 720-3BXL modules. This was hardware that Cisco had already flagged for end-of-sale back in 2013, with software maintenance winding down by early 2016.
The problems went beyond lifecycle risk. The inter-switch fabric between the two core chassis ran at 1 Gbps, limited to a single fiber uplink carrying all redundancy traffic. Server farm and SAN uplinks were similarly constrained at 1G. An ASA firewall module that was part of the original solution had never been brought online; nobody had ever managed to get it working. The only security inspection happening was at the perimeter, through a pair of Check Point firewalls, which meant the vast majority of internal traffic between servers, VMs, and storage moved completely uninspected.
For a government relying on this infrastructure to serve every department and remote office across the island, the situation was untenable.
What Made It Urgent
Three pressures converged to force the issue.
- Hardware lifecycle: The Supervisor 720-3BXL was already past end-of-sale and approaching end-of-support. Running core government infrastructure on hardware that could no longer receive security patches or replacement parts under contract was a risk that couldn't be papered over indefinitely.
- Performance ceiling: WAN circuits to government remote sites ranged from 20 Mbps to 100 Mbps. On paper, those links were the throughput bottleneck. In practice, the data center core was making things worse. At 80% WAN utilization, users experienced compounded degradation because the core switching fabric was adding its own constraints on top.
- No path forward: The government had ambitions for unified communications, collaboration platforms, and digital services expansion. None of it was feasible on a core network that couldn't reliably handle existing workloads.
My Role
I was brought in to design the replacement architecture and carry it through from concept to completion. This wasn't a situation where I handed off a design document and walked away. I owned it end-to-end.
- Assessed the existing environment and documented every constraint, gap, and risk
- Designed a modernized core network architecture from scratch, built around current needs and a five- to ten-year horizon
- Worked directly with Cisco to validate the design, including platform choices, port allocations, vPC topology, and uplink capacities
- Negotiated pricing with Cisco to bring the project within budget
- Built the internal business case that went to the CFO for sign-off before procurement could proceed
- Managed the procurement process and coordinated delivery of all hardware to the service provider facility
- Project managed the implementation, coordinating with the service provider, server teams, and storage teams
- Was part of the team on-site during the cutover, which took place during an extended maintenance window to minimize disruption
The Design
The modernized architecture replaced every component of the legacy core with a purpose-built data center switching fabric.
Core Layer: Cisco Nexus 9504. Two modular chassis switches running as an active-active pair using Cisco vPC (Virtual Port Channel). The Nexus 9504 is a 4-slot platform capable of up to 12.8 Tbps of switching capacity, roughly 17 times what the old Catalyst 6500 with Sup720 could deliver. The two cores were interconnected with 80G vPC, eliminating the 1G fiber link that had been the single biggest internal bottleneck. vPC also removed the dependency on Spanning Tree Protocol for loop prevention, meaning every link in the fabric could forward traffic simultaneously.
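For illustration, the pairing between the two cores follows the standard NX-OS vPC pattern. A minimal sketch of one side is below; the domain ID, keepalive addresses, and interface numbers are hypothetical placeholders, not the production values:

```
! One side of the vPC pair on a Nexus 9504; the peer mirrors this.
! Domain ID, keepalive IPs, and port numbers are placeholders.
feature vpc
feature lacp

vpc domain 100
  peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf management
  peer-switch
  peer-gateway

! Two 40G members bundled into the 80G peer link
interface Ethernet1/1-2
  channel-group 1 mode active
  no shutdown

interface port-channel1
  description vPC peer link to the second core
  switchport mode trunk
  vpc peer-link
```

In this topology, peer-switch has the pair present a single spanning-tree bridge ID downstream, and peer-gateway lets either chassis forward traffic addressed to its peer's gateway MAC; both are common hardening steps for an active-active core.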
Distribution Layer: Cisco Nexus 93180YC-EX. A pair of distribution switches, each offering 48 ports of 10/25G SFP28 and 6 ports of 40/100G QSFP28 uplinks, connected to the core via 80G vPC. These handled segmentation between zones and provided high-speed downstream connectivity to the firewall tier.
Internal Segmentation Firewalls: Fortinet FortiGate 1500D. This was the piece that didn't exist at all in the legacy design. Two FortiGate 1500D appliances deployed in high availability, positioned between the distribution and core layers to inspect east-west traffic, specifically the lateral server-to-server, VM-to-VM communication that the perimeter Check Point firewalls never saw. The 1500D delivers 80 Gbps of stateful firewall throughput with hardware-accelerated packet processing.
Server Farm Aggregation: Cisco Nexus 9332PQ. A pair of top-of-rack switches providing 32 ports of 40G QSFP+ each, aggregating connectivity from the HP BladeSystem c7000 enclosures, physical servers, and SAN storage. These connected upstream to the core via 80G vPC, with 160G vPC between them.
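The uplink side of that aggregation tier is conceptually simple: two 40G members on each switch bundled into one vPC port channel toward the core pair. A sketch, with interface, port-channel, and vPC numbers chosen purely for illustration:

```
! On each Nexus 9332PQ: two 40G links toward the core pair, one vPC.
! Interface, port-channel, and vPC numbers are illustrative.
interface Ethernet1/25-26
  channel-group 20 mode active
  no shutdown

interface port-channel20
  description 80G vPC uplink to the Nexus 9504 core pair
  switchport mode trunk
  vpc 20
```

Because both ends of the link are vPC pairs, this is a back-to-back (double-sided) vPC, and all four physical links forward at once.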
Rack-Level Access: Cisco Nexus 2348TQ FEX. Fabric Extenders deployed in server racks, each providing 48 ports of 10GBase-T for server connectivity. The FEX operates as a remote line card of its parent Nexus switch, requiring no separate management plane or additional CLI to maintain.
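Bringing a FEX online is a small amount of configuration on the parent, which is the point: once associated, its ports show up on the parent as ordinary interfaces. A sketch with placeholder numbering:

```
! Associate a Nexus 2348TQ with its parent switch.
! FEX ID, fabric port, and VLAN are placeholders.
feature fex

interface Ethernet1/30
  switchport mode fex-fabric
  fex associate 101
  no shutdown

! Host ports on FEX 101 then appear on the parent as Ethernet101/1/x
interface Ethernet101/1/1
  description rack server, 10GBase-T
  switchport access vlan 10
```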
Every inter-switch link in the new design ran at 40G or 80G using vPC. Every tier had redundant paths. Every connection was active-active.
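One operational benefit of that uniformity is that the health of the whole fabric can be spot-checked from the parent switches with a few standard NX-OS show commands (generic commands, not specific to this deployment):

```
! Peer and keepalive state, member consistency, bundle and FEX status
show vpc brief
show vpc consistency-parameters global
show port-channel summary
show fex
```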
The Results
The improvement was felt almost immediately after cutover.
- Application performance: Server and application teams reported noticeable improvement in application responsiveness across the board. East-west traffic between virtual infrastructure components became markedly faster. Latency across internal systems dropped.
- WAN utilization made sense again: Before the upgrade, a remote site on a 100 Mbps WAN circuit running at 80% utilization performed worse than that number suggested, because the data center core was adding its own congestion on top. After the upgrade, 80% utilization on a 100 Mbps link behaved exactly as expected.
- Troubleshooting got simpler: With the core no longer a variable, the server team and network team could isolate issues cleanly. Performance complaints started and ended at the WAN link, rather than becoming a three-way finger-pointing exercise.
- Stability under load: The infrastructure held steady under peak demand. Government employees across departments had a consistently better experience.
What It Enabled
The real payoff came in the years that followed. This project was never just about replacing old switches. It was about building a foundation that could carry what came next.
Two to three years after the core modernization, the government successfully deployed a full Cisco collaboration suite: Cisco Unified Communications Manager, WebEx, enterprise voice and video. That deployment depended entirely on having a high-speed, redundant, low-latency core switching fabric. On the old 1G STP-based core, a rollout of that scale would have been technically constrained at best and operationally risky at worst.
Every subsequent initiative the government pursued ran on top of the core network we put in place, from application centralization and digital services for citizens to expanded virtualization. The architecture was designed with headroom. The Nexus 9504 chassis had open line card slots. The 93180YC-EX switches supported 25G downlinks for future NIC upgrades. The vPC topology could absorb additional leaf switches without redesigning the fabric.
Lessons From This Project
- Core infrastructure is a multiplier, not a commodity. When the core is healthy, everything on top of it performs to its potential. When it's constrained, every other system suffers in ways that are hard to diagnose and impossible to fix without addressing the root cause.
- Bottlenecks compound. A WAN link running at 80% utilization is fine. A WAN link running at 80% utilization behind a core network that's also constrained is a different problem entirely. Removing the core bottleneck didn't add WAN bandwidth. Instead, it unlocked the bandwidth that was already being paid for.
- East-west traffic is the blind spot. In this environment, the only security inspection was at the perimeter. Everything moving laterally between servers was completely uninspected. The FortiGate 1500D deployment gave the government visibility into traffic patterns they'd never seen before.
- You have to design for what's next, not just what's broken. By designing for 40G and 80G aggregation, active-active forwarding, and modular expansion, the architecture absorbed a full collaboration deployment and multiple digital transformation initiatives without a second forklift upgrade.
- Procurement and business case matter as much as the design. A technically excellent architecture that doesn't get funded doesn't get built. Working with Cisco to validate the design gave the business case credibility. Negotiating competitive pricing made the numbers work.
This project was delivered end-to-end: from initial assessment and architecture design through vendor validation, pricing negotiation, CFO business case approval, procurement, hardware delivery coordination, project management, and on-site cutover support during the maintenance window.