Assembly Required: A Basic Spanning-Tree Design for a Two-Tier Data Center


In taking over a network that someone else built, I have had the chance to review the topology through my own peculiar lens, considering what I will do differently.  Some discoveries have warranted needful and immediate change, so as to retire the vision of four horsemen galloping through my mind.  Some issues have merely brought confusion, but not panic.  Other finds have earned no more than a facepalm and a line item added to a growing list of trivial horrors to be rectified.

In these recent months, I’ve seen things such as a dual core (per documentation) deconstructed to be a single core (per reality), firewalls far past their sell-by date still in production, HSRP mis- / partially / not-at-all configured, and WAN providers asked to static route my traffic through their core when dynamic routing is an option.  And oh so very much more, to the point that I no longer ask, “Why?” or “What were they thinking?”  There’s no point in assuming there was a plan that made sense when clearly there wasn’t.  My job now is to bring order to the chaos.

An important element in beating back network chaos is a well-ordered spanning-tree.  Spanning-tree was mostly ignored and/or disabled (!) by my predecessors.  Much unloved, spanning-tree is one of those protocols that networking folks are prone to turn their backs on, looking at it from a distance with a jaundiced eye.  “If I leave it alone, it can’t hurt me,” seems to be the mantra, right up there with, “Don’t ask, don’t tell,” and “Let sleeping dogs lie.”  Network nerds describing their worst day ever will frequently invoke the mythical “spanning-tree loop” to gain the rapt attention of their empathetic audience.

It’s Simple, Really — The Point of Spanning-Tree

Understanding what spanning-tree is really for is key in understanding why you should get control of it.  For a moment, step away from what you know about spanning-tree.  Forget about root bridges, convergence times, and BPDUs.  What’s the point of spanning-tree?  The point is loop avoidance, or to put it another way, spanning-tree gives you the ability to plumb redundant links between switches without taking the network down due to a topology loop. Or put yet another way, spanning-tree prevents a loop in your network from taking your business offline.

Let’s say you have three switches.  C1 and C2 are redundant “core” switches; they provide inter-VLAN routing services, and act as backups for one another, using VRRP or HSRP to make sure default gateways are always available for hosts.  C1 and C2 are connected to each other.  A1 is an “access” switch, and is where hosts are connected to the network.  You might prefer to call A1 a “top of rack” or “end of row” switch.

Okay.  We’ve established that C1 and C2 are connected, and that hosts are connected to A1.  To provide connectivity to its uplinked hosts, A1 must in turn connect to a core switch. Let’s say we connect A1 to C1.  Job done.  A1 connects to C1.  C1 connects to C2.  Hosts can see their default gateways, and we have a network.

That is to say, we have a network right until C1 goes down, or the link between A1 and C1 is somehow interrupted.  Now A1 is an orphan, and the hosts connected to A1 are cut off from the rest of the network.  We have an outage.  The network is down.  Sysadmins can’t see their hosts, users are calling the helpdesk, and you hear variations of “did something just happen to the network” bounce around the office space.  A small electronic device you carry in your pocket or holster is urgently trying to get your attention, and you’re reminded that you really, really need to change that particular ringtone.

After restoring the link between A1 and C1, you pick up the smashed bits of your small electronic device, apologize to your co-workers and the vendor support guy for the unspeakable things you said during the crisis, and ponder how to prevent A1 from being orphaned in the future.  You come up with the idea of uplinking A1 to C2, forming a triangle.  A1 will now have dual-uplinks — one to C1 and one to C2, C1 and C2 already being connected to one another.  Brilliant, with one caveat: a topology loop has now been introduced into the network.

Genuine topology loops bring on a network apocalypse, because an Ethernet switch’s job is to forward frames to a destination very, very quickly, and an Ethernet frame carries no hop count or time-to-live that would ever retire it.  If a frame with no specific destination is placed onto the wire, say a frame containing a broadcast packet that must be flooded to all ports, you will quickly bury your network with thousands upon thousands of frames that get sent around in a circle.  Very often, those multiplied thousands of frames must be handled by the control planes of various network-connected devices.  “Control plane” implies CPU utilization, i.e. something not handled by ASICs; the longer the loop exists, the more traffic must be handled by sundry control planes (your servers, your routers, etc.) until, after just a few seconds, the network is done for.  Core router CPU climbs and plateaus at 100%, servers might respond sluggishly, and network functionality is certainly lost.  (Note that there are black arts devoted to maintaining switch and router functionality during topology loops and other DoS attacks that are beyond the scope of this article.)

You didn’t think that plugging A1 into both C1 and C2 could be so risky, did you?  And of course, such a task isn’t usually a risky thing at all, thanks to spanning-tree.  When you connect that second link, A1 to C2, spanning-tree will determine that a topology loop has been introduced, and will choose one of the three links of our triangle to be “blocking” instead of “forwarding”, thus eliminating the loop while still providing redundancy.  Yes, spanning-tree blocks, but spanning-tree will also turn a blocked link back into a forwarding link should our loop be eliminated, say in the event that one of our redundant links is disconnected, or one of the switches in the triangle crashes or suffers a power failure.
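
To make the difference concrete, here is a toy sketch in Python of one broadcast flooded through our triangle, first with all three links forwarding and then with the A1-C2 link blocked.  It is a deliberately crude model, not how any real switch is built: the function name and the "one hop per round" timing are my own assumptions, and hosts, MAC learning, and link speeds are ignored.

```python
# The triangle from the example: each link is an unordered pair of switches.
LINKS = [("A1", "C1"), ("A1", "C2"), ("C1", "C2")]


def frames_in_flight(links, source, hops):
    """Flood one broadcast from `source` and report how many copies are
    still travelling between switches at each hop.

    Every switch floods a received broadcast out all of its links except
    the one the frame arrived on. Nothing ever ages the frame out.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Each in-flight copy is tracked as (sending switch, receiving switch).
    in_flight = [(source, n) for n in neighbors.get(source, [])]
    history = []
    for _ in range(hops):
        history.append(len(in_flight))
        next_round = []
        for sender, receiver in in_flight:
            for n in neighbors[receiver]:
                if n != sender:  # never flood back out the arrival link
                    next_round.append((receiver, n))
        in_flight = next_round
    return history


# All three links forwarding: the copies never die.
looped = frames_in_flight(LINKS, "A1", 6)

# Spanning-tree has blocked A1-C2: the broadcast runs out of places to go.
blocked = frames_in_flight([("A1", "C1"), ("C1", "C2")], "A1", 6)
```

In this model the looped triangle keeps two copies of that single broadcast circling forever, one in each direction, and every later broadcast adds two more that never leave.  With one link blocked, the copy count drops to zero after two hops.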

Now, all of this loop prevention works automatically without you having to do a thing.  Assuming the switch you’re installing speaks spanning-tree and spanning-tree is enabled by default (almost always the case with any enterprise-class switch), spanning-tree will block certain links, prevent topology loops from forming, and you don’t have to notice.

Only…you should really be paying better attention.  You should be controlling which redundant links are blocked and which ones are forwarding.  Don’t let spanning-tree make those decisions for you based on protocol defaults.

A Simple Design

To discuss this simple spanning-tree design, we’re going to keep our reference model intact.  C1 and C2 form our core switches, and they are connected to one another.  A1 is our access / top-of-rack / end-of-row switch, and it is uplinked to both C1 and C2, forming a triangle.

We’ve established that spanning-tree prevents loops by placing one link in a loop into a blocking state.  The next bit to understand is how spanning-tree learns that a loop has been created.  The explanation is that spanning-tree isn’t a detective “looking for a loop” as such.  If you could interview spanning-tree and ask it what its purpose in life is, it would tell you that it wants to find the fastest path to the root bridge, blocking all others.  It won’t tell you that it wants to prevent loops.

Ah, the “root bridge”.  What is the root bridge?  Remember first that a switch is fundamentally a multiport bridge; when you see the term “bridge” in the context of spanning-tree, translate it as “switch” if that’s terminology that you’re more comfortable with.  In spanning-tree nomenclature, the root bridge is the one switch that has won the root bridge election.  Every spanning-tree capable device in a given network takes part in this election, comparing bridge IDs until exactly one switch, the one with the lowest bridge ID, emerges as the winner.
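
The election itself boils down to "lowest bridge ID wins", where the bridge ID is a configurable priority followed by the switch's MAC address.  Here is a minimal sketch of that comparison.  The MAC addresses are made up for illustration, 32768 is the usual default priority, and the per-VLAN system ID that many platforms fold into the priority is left out.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configurable; 32768 is the common default
    mac: str       # hypothetical addresses, for illustration only


def elect_root(bridges):
    """Return the bridge with the lowest (priority, MAC) pair."""
    return min(
        bridges,
        key=lambda b: (b.priority, int(b.mac.replace(":", ""), 16)),
    )


# Everything left at defaults: the lowest MAC wins, wherever it happens to live.
defaults = [
    Bridge("C1", 32768, "00:1b:54:aa:00:01"),
    Bridge("C2", 32768, "00:1b:54:aa:00:02"),
    Bridge("A1", 32768, "00:0d:28:11:22:33"),
]

# Same switches, but C1 has been given a lower priority on purpose.
tuned = [defaults[0]._replace(priority=4096)] + defaults[1:]

print(elect_root(defaults).name)
print(elect_root(tuned).name)
```

With every switch at the default priority, the tie is broken on MAC address alone, so in this made-up example the access switch A1 becomes root.  Lowering C1's priority settles the matter before MAC addresses are ever compared.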

Let’s say that we made C1 to be the root bridge of our spanning-tree.  A1 has learned this, and now has to determine which is the fastest path to root — C1.  On the assumption that all of our links are of equal speed (and therefore of equal spanning-tree cost), A1 will determine that his fastest path to the root bridge is via the direct link to C1.  The redundant path to the root bridge, the link via C2, will be placed into a blocking state.
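
Here is a rough Python sketch of that decision for our triangle.  It is a simplification under stated assumptions: links are point-to-point, every link is given a cost of 4 (the classic 802.1D figure for 1Gbps), bridge IDs are reduced to plain integers that I invented, and parallel links and port-ID tie-breakers are ignored.  The function name is mine, not any vendor's.

```python
import heapq


def spanning_tree(links, root, bridge_id):
    """links: {(switch_a, switch_b): cost}. Assumes a connected topology.

    Returns (root_port, blocked): root_port maps each non-root switch to
    the neighbor its root port faces; blocked lists the links that are
    no switch's root-port link.
    """
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, {})[b] = cost
        adj.setdefault(b, {})[a] = cost

    # Dijkstra: lowest total path cost from every switch back to the root.
    cost_to_root = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, sw = heapq.heappop(heap)
        if cost > cost_to_root.get(sw, float("inf")):
            continue
        for nbr, link_cost in adj[sw].items():
            new_cost = cost + link_cost
            if new_cost < cost_to_root.get(nbr, float("inf")):
                cost_to_root[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))

    # Each non-root switch picks the neighbor offering the cheapest path,
    # breaking ties on the neighbor's bridge ID.
    root_port = {}
    for sw in adj:
        if sw == root:
            continue
        root_port[sw] = min(
            adj[sw],
            key=lambda n: (cost_to_root[n] + adj[sw][n], bridge_id[n]),
        )

    forwarding = {frozenset((sw, nbr)) for sw, nbr in root_port.items()}
    blocked = [link for link in links if frozenset(link) not in forwarding]
    return root_port, blocked


links = {("A1", "C1"): 4, ("A1", "C2"): 4, ("C1", "C2"): 4}
bridge_id = {"C1": 4096, "C2": 8192, "A1": 32768}  # hypothetical values

root_port, blocked = spanning_tree(links, "C1", bridge_id)
print(root_port)
print(blocked)
```

For root C1, both A1 and C2 pick their direct link to C1, which leaves A1-C2 as the odd link out.  The sketch only reports which link blocks.  On real gear the blocking port sits on one end of that link, normally the end with the higher bridge ID when path costs tie, which here would be A1.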

Note that in most networks, you’ll find that there’s a unique spanning-tree instance for every VLAN in the network, meaning that a link that is blocking for one VLAN might be forwarding for another VLAN.  So when I said above, “made C1 to be the root bridge of our spanning-tree”, I was really saying the root bridge for a particular VLAN.  This is an important concept to grasp, because the next bit of this simple design takes advantage of the fact that a link can both block and forward simultaneously, depending on what VLAN we’re talking about.

Let’s say that we were talking about VLAN 1 in the previous 2 paragraphs.  C1 is the root bridge for VLAN 1.  For all VLAN 1 traffic, A1 forwards across his A1-C1 direct link, while the A1-C2 link is in a blocked state.  The A1-C2 link is therefore unused, a quiet waste of 1Gbps, or however much bandwidth that link happens to be.  Being the savvy network architects that we are, we’d like to take advantage of that blocked link, and use it for something other than pure failover.  To do this, let’s add a new VLAN to the mix: VLAN 2.  We will make switch C2 the root bridge of VLAN 2.  A1 connects to hosts that live in both VLAN 1 (I know, you should never have hosts that live in VLAN 1 in real life, this is just an example) and VLAN 2.  Spanning-tree on A1 computes the fastest path to VLAN 2 root bridge as via his A1-C2 link, and blocks the A1-C1 link.

In this topology, VLAN 1 hosts uplinked to A1 will ride the A1-C1 link towards the core for routing services.  VLAN 2 hosts will ride the A1-C2 link.  In this way, we’ve made use of both uplinks.  You can scale this design up, where odd-numbered VLANs have a root bridge of C1, and even-numbered VLANs have a root bridge of C2.  Your A1 uplinks to the core benefit from this rudimentary form of load-balancing.  True, there’s still blocking happening on a per-VLAN basis; in that sense, there are still wasted uplinks.  Cisco’s VSS, NX-OS virtual port channels (vPC), FabricPath, and TRILL are all technologies aimed at reducing the lost bandwidth endemic in large spanning-tree designs due to blocked ports, and they are the reason marketing hype about “the elimination of spanning-tree” keeps appearing in conjunction with new data center products.
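
If you have more than a handful of VLANs, the odd/even split is easy to generate rather than type.  The sketch below assumes Cisco IOS-style per-VLAN spanning-tree syntax and the priorities 4096 and 8192, which are my choices for illustration (priorities are set in multiples of 4096, and lower wins).  Check the command syntax against your own platform before pasting anything.

```python
def root_priority_plan(vlans, primary=4096, secondary=8192):
    """Build per-switch config lines for an odd/even root bridge split.

    Odd VLANs get C1 as root with C2 as backup; even VLANs get the
    reverse. The command syntax is assumed, not verified per platform.
    """
    plan = {"C1": [], "C2": []}
    for vlan in sorted(vlans):
        odd = vlan % 2 == 1
        c1_priority = primary if odd else secondary
        c2_priority = secondary if odd else primary
        plan["C1"].append(f"spanning-tree vlan {vlan} priority {c1_priority}")
        plan["C2"].append(f"spanning-tree vlan {vlan} priority {c2_priority}")
    return plan


plan = root_priority_plan([1, 2])
for switch, lines in plan.items():
    print(switch)
    for line in lines:
        print("  " + line)
```

Note that each core switch gets a line for every VLAN, not just the ones it is root for.  The higher of the two values is what makes it the designated understudy for the other half.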

In the real world where TRILL isn’t yet widely available and not all of us can afford to re-architect our data centers around NX-OS, spanning-tree still matters.  Arguably, the largest risk of an unconfigured spanning-tree is a root bridge being elected at the edge of your network rather than the core.  This can result in traffic taking paths you would neither expect nor desire, overloading switches and links and creating bottlenecks.  Another significant risk is that of convergence times relating to topology changes, i.e. a switch disappearing or link going down.  I didn’t discuss it above, but which switch will become the new root bridge of VLAN 1 if C1 goes down?  If you configured it, C2 will, which is what you want.  If you did not configure a backup root bridge, it will be up to the election process.
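
A quick sketch shows what "up to the election process" can mean when C1 dies.  As before, lowest bridge ID (priority, then MAC) wins.  The MAC values are invented, with A1 given the lowest one on purpose, and the function is a toy of my own rather than anything a switch exposes.

```python
def elect_root(bridges, down=()):
    """bridges: {name: (priority, mac_as_int)}.

    The lowest (priority, MAC) pair among switches that are still up wins.
    """
    alive = {name: bid for name, bid in bridges.items() if name not in down}
    return min(alive, key=alive.get)


# Only C1 was configured; C2 and A1 are at the default priority, and the
# access switch happens to have the lowest MAC (hypothetical values).
no_backup = {
    "C1": (4096, 0x00AA00000C01),
    "C2": (32768, 0x00AA00000C02),
    "A1": (32768, 0x00AA00000A01),
}

# Same network, but C2 has been configured as the backup root.
with_backup = dict(no_backup, C2=(8192, 0x00AA00000C02))

print(elect_root(no_backup, down={"C1"}))
print(elect_root(with_backup, down={"C1"}))
```

Without a configured backup, losing C1 hands the root role to A1 in this example, which is exactly the root-at-the-edge situation described above.  With C2's priority lowered ahead of time, C2 takes over.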

Your network is not a democracy; make sure you rig the elections.

There is a great deal more that could be discussed in the context of spanning-tree design fundamentals, including:

  • Portfast: configuring a port that connects to a host to move rapidly to the forwarding state
  • Rootguard: blocking a port that hears an unexpected root bridge advertisement (on Cisco switches the port goes into a root-inconsistent state and recovers once the superior BPDUs stop, rather than being shut down)
  • BPDUguard: shutting down a port that receives a BPDU, i.e. one that sees a spanning-tree device on the other end
  • Rapid spanning-tree vs. legacy spanning-tree: it’s more than just timers
  • Pruning VLANs from trunks to keep STP port counts down in VLANs
  • Unidirectional Link Detection (UDLD): a protocol that detects a one-way link, usually a fiber pair where one of the two is dead.  While STP doesn’t know about a one-way link and can therefore move a port to forwarding that should be blocking, UDLD does know about a one-way link, and takes the link down before STP does something silly.
  • Storm-control: an algorithm that can help maintain network stability during a topology loop or other iteration of a broadcast storm.
  • Control-plane policing: using QoS concepts to prevent a router or switch CPU from being overloaded with punted traffic.
  • and more.

But we’ll stop there for now.


Tell Me More

I’d love to hear your design preferences, challenges, war stories, and victories relating to spanning-tree.  Are you preparing to make spanning-tree go away by migrating to a new data center switching paradigm (VSS, vPC, TRILL, etc.)?  What about the day the data center went down — how did you find and fix the loop?

Please leave a comment below or send me an e-mail at [email protected].  We’ll talk about the good stuff on the Packet Pushers podcast.

About the author

Ethan Banks

Ethan Banks, CCIE #20655, has been managing networks for higher ed, government, financials and high tech since 1995. Ethan co-hosts the Packet Pushers Podcast, which has seen over 1M downloads and reaches over 10K listeners. With whatever time is left, Ethan writes for fun & profit, studies for certifications, and enjoys science fiction.
