The Problem TRILL Aims to Solve
TRILL (TRansparent Interconnection of Lots of Links) is proposed, without technical implementation details, in RFC5556. TRILL’s proposal can be encapsulated thusly: shove the logic of a layer 3 routing protocol down into layer 2. Why? So that switches can bridge traffic via the most efficient path while still avoiding topology loops. TRILL is therefore an interoperable alternative to spanning-tree. Spanning-tree creates a single path for traffic to flow by computing the lowest-cost path back to the root bridge and blocking alternate paths to avoid loops. TRILL does away with the root-bridge concept and instead proposes to compute the most efficient path to any destination within a VLAN, then forward along that path, with no ports held in a blocking state.
Why Now? Do We Really Need TRILL?
You might ask, “Why do we need this now?” Spanning-tree has done very well for a number of years, so what technology is driving a radical new layer 2 path discovery mechanism? In fairness, TRILL is not all that “radical”: current TRILL drafts propose running IS-IS in switches that are being termed “RBridges”. The radical bit is computing those paths to remote MAC addresses across a layer 2 topology instead of to a remote layer 3 IP subnet (you know, routing). But is TRILL solving a problem we really have?
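I want to answer that question by way of an example. Consider the diagram below, showing two access switches and two core switches: each access switch links to both core switches, the core switches link to each other (C1-C2), and the access switches link to each other (A1-A2). For our purposes, the topology displayed represents one VLAN. The link colors are largely insignificant.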
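Assuming Core1 is the spanning-tree root bridge of this topology and that all links are equal-cost, which ports will become spanning-tree blocked ports? Each non-root switch keeps its link toward Core1 forwarding: Core2 uses C1-C2, Access1 uses A1-C1, and Access2 uses A2-C1. Each remaining link (A1-C2, A2-C2, and A1-A2) gets a blocked port at one end (which end depends on the bridge IDs), so no traffic crosses it.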
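To make the blocking outcome concrete, here is a minimal sketch of how spanning-tree arrives at that active topology. It is simplified to root path cost plus bridge ID (no port IDs, timers, or BPDU exchange), it assumes each access switch links to both cores, the cores link to each other, and A1-A2 joins the access switches, and the bridge IDs are hypothetical values chosen so Core1 wins the root election.

```python
from collections import deque

# Assumed topology: both access switches link to both cores, the cores link
# to each other, and A1-A2 joins the access switches. All links cost 1.
LINKS = [
    ("Core1", "Core2"), ("Access1", "Core1"), ("Access1", "Core2"),
    ("Access2", "Core1"), ("Access2", "Core2"), ("Access1", "Access2"),
]
# Hypothetical bridge IDs: lowest wins, so Core1 becomes the root bridge.
BRIDGE_ID = {"Core1": 1, "Access1": 2, "Access2": 3, "Core2": 4}


def neighbors(bridge, links):
    return [b if a == bridge else a for a, b in links if bridge in (a, b)]


def spanning_tree(links, bridge_id):
    """Simplified STP: returns (root, forwarding links, {blocked link: blocking bridge})."""
    root = min(bridge_id, key=bridge_id.get)

    # Root path cost: with equal-cost links this is the hop count from the root.
    cost = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in neighbors(node, links):
            if nbr not in cost:
                cost[nbr] = cost[node] + 1
                queue.append(nbr)

    def priority(bridge):
        # Lower is better: root path cost first, then bridge ID as tie-breaker.
        return (cost[bridge], bridge_id[bridge])

    # Each non-root bridge's root port faces its best neighbor; that link forwards.
    forwarding = set()
    for bridge in bridge_id:
        if bridge != root:
            upstream = min(neighbors(bridge, links), key=priority)
            forwarding.add(frozenset((bridge, upstream)))

    # On every other link, the end with the worse priority blocks its port.
    blocked = {}
    for a, b in links:
        if frozenset((a, b)) not in forwarding:
            blocked[(a, b)] = max((a, b), key=priority)
    return root, forwarding, blocked


root, forwarding, blocked = spanning_tree(LINKS, BRIDGE_ID)
print("root:", root)
for link in sorted(tuple(sorted(l)) for l in forwarding):
    print("forwarding:", " - ".join(link))
for link, bridge in blocked.items():
    print("blocked:", " - ".join(link), "(port on " + bridge + ")")
```

With these IDs, Core2 blocks both of its access-facing ports and Access2 blocks its end of A1-A2. Reshuffling the other IDs moves some blocked ports around, but as long as Core1 keeps the lowest ID the forwarding tree stays C1-C2, A1-C1, A2-C1.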
Let’s say we have a host A attached to Access1 and a host B attached to Access2. What path will traffic take when hosts A and B communicate (east-west traffic)? Access1 <=A1-C1=> Core1 <=A2-C1=> Access2. Core1 is an extra hop: the most efficient path would be Access1 <=A1-A2=> Access2, leaving Core1 out of the forwarding path.
Assuming TRILL were running on these four switches instead of spanning-tree, each switch would compute the most efficient path to each remote layer 2 destination (MAC address) and use it. No more blocked ports; no more unused links. In a world where my hypothetical hosts A and B represent VMware clusters moving large volumes of vMotion traffic, and link A1-A2 could represent expensive 10G or an EtherChannel of multiple 10G ports, using that A1-A2 link becomes highly desirable.
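The path comparison above can be sketched with Dijkstra’s algorithm, the shortest-path-first computation that link-state protocols such as IS-IS run. This is a toy under stated assumptions: unit link costs, switches as graph nodes, and a hard-coded link table standing in for IS-IS link-state data. It runs the same search twice, once over only the links spanning-tree leaves forwarding (C1-C2, A1-C1, A2-C1) and once over every link.

```python
import heapq

# Example topology with unit costs (assumed from the text).
LINKS = {
    ("Core1", "Core2"): 1, ("Access1", "Core1"): 1, ("Access1", "Core2"): 1,
    ("Access2", "Core1"): 1, ("Access2", "Core2"): 1, ("Access1", "Access2"): 1,
}
# Links spanning-tree leaves forwarding when Core1 is the root bridge.
STP_FORWARDING = {("Core1", "Core2"), ("Access1", "Core1"), ("Access2", "Core1")}


def shortest_path(links, src, dst):
    """Dijkstra's shortest-path-first search; assumes dst is reachable."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


tree_only = {link: c for link, c in LINKS.items() if link in STP_FORWARDING}
print("spanning-tree path:", " -> ".join(shortest_path(tree_only, "Access1", "Access2")))
print("all-links path:   ", " -> ".join(shortest_path(LINKS, "Access1", "Access2")))
```

With every link eligible, the search takes A1-A2 directly; confined to the spanning tree, it has no choice but to go through Core1.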
In a well-architected data center design leveraging TRILL, traffic will be distributed more evenly across the available interswitch links, maximizing throughput, minimizing latency, and letting capacity scale as links are added (i.e., when you add a new expensive link, it actually gets used, so your overall data center traffic capacity grows).
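Spreading traffic evenly usually relies on equal-cost multipath (ECMP) forwarding: find every equal-cost shortest path, then hash a flow identifier to pick one. The sketch below is a generic ECMP illustration, not a TRILL-specified algorithm. Its topology is a hypothetical variant of the example with the A1-A2 link removed, so Access1 has two equal-cost routes to Access2, and the MAC addresses are made up.

```python
from collections import deque
import zlib

# Hypothetical variant of the example topology with the A1-A2 link removed,
# so Access1 reaches Access2 over two equal-cost paths. All links cost 1.
LINKS = [
    ("Core1", "Core2"), ("Access1", "Core1"), ("Access1", "Core2"),
    ("Access2", "Core1"), ("Access2", "Core2"),
]


def equal_cost_paths(links, src, dst):
    """Return every minimum-hop path from src to dst."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    # Breadth-first search gives each switch's hop distance from src.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    # A path to dst is shortest exactly when every hop moves one step farther from src.
    paths = []

    def extend(path):
        node = path[-1]
        if node == dst:
            paths.append(tuple(path))
            return
        for nbr in sorted(graph[node]):
            if dist[nbr] == dist[node] + 1 and dist[nbr] <= dist[dst]:
                extend(path + [nbr])

    extend([src])
    return paths


def pick_path(paths, src_mac, dst_mac):
    """Hash the flow identifier so every frame of one flow takes the same path."""
    key = f"{src_mac}->{dst_mac}".encode()
    return paths[zlib.crc32(key) % len(paths)]


paths = equal_cost_paths(LINKS, "Access1", "Access2")
for i in range(4):
    src_mac = f"02:00:00:00:00:{i:02x}"  # made-up, locally administered MACs
    path = pick_path(paths, src_mac, "02:00:00:00:00:ff")
    print(src_mac, "->", " -> ".join(path))
```

Hashing per flow rather than per frame keeps each conversation’s frames in order; the trade-off is that a few very large flows can still load one path more heavily than the other.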
Highlights, Excerpts, and Interesting Bits from RFC5556
- From the abstract: “Routing tends not to take full advantage of alternate paths, or even non-overlapping pairwise paths (in the case of spanning trees). This document addresses these concerns and suggests applying modern network-layer routing protocols at the link layer.”
- From 1. Introduction: “With spanning trees, the bandwidth across the subnet is limited because traffic flows over a subset of links forming a single tree…It is thus useful to consider a new approach that combines the features of these two existing solutions, hopefully retaining the desirable properties of each. Such an approach would develop a new kind of bridge system that was capable of using network-style routing, while still providing Ethernet service. It allows reuse of well-understood network routing protocols to benefit the link layer.”
- From 2. The TRILL Problem: “The spanning tree often results in inefficient use of the link topology; traffic is concentrated on the spanning tree path, and all traffic follows that path even when other more direct paths are available. The addition in IEEE 802.1Q of support for multiple spanning trees helps a little, but the use of multiple spanning trees requires additional configuration, the number of trees is limited, and these defects apply within each tree regardless.”
- From 2.2 Multipath Forwarding: “Using spanning trees reduces aggregate bandwidth by forcing all such paths onto one tree, while modern routing causes such paths to be selected based on a cost metric. However, extensions to modern routing protocols enable even greater aggregate bandwidth by permitting traffic flowing from one endpoint to another to be sent over multiple, typically equal-cost, paths.”
- From 2.5 IEEE 802.1 Bridging Protocols: “There have been a variety of IEEE protocols beyond the initial shared-media Ethernet variant, including 802.1D, 802.1w, 802.1Q, 802.1v, and 802.1s. This document presumes the above variants are supported on the Ethernet subnet, i.e., that a TRILL solution would not interfere with (i.e., would not affect) any of the above.”
- From 3.3 Forwarding Loop Mitigation: “Solutions to TRILL are intended to use adapted network-layer routing protocols that may introduce transient loops during routing convergence. A TRILL solution thus needs to provide support for mitigating the effect of such routing loops…These types of mechanisms limit the impact of loops or detect them explicitly. Mechanisms with similar effect should be included in TRILL solutions.”
- From 3.4 Spanning Tree Management: “In order to address convergence under reconfiguration and robustness to link interruption (Section 2.2), participation in the spanning tree (STP) must be carefully managed. The goal is to provide the desired stability of the TRILL solution and of the entire Ethernet link subnet, which may include bridges using STP. This may involve a TRILL solution participating in the STP…”
- From 3.8 Optimizations: “There are a number of optimizations that may be applied to TRILL solutions. These must be applied in a way that does not affect functionality as a tradeoff for increased performance…For example, in many bridged LANs, there are topologies such that central (“core”) bridges which have both a greater volume of traffic flowing through them as well as traffic to and from a larger variety of end station than do non-core bridges. This means that such core bridges need to learn a large number of end station addresses and need to do lookups based on such addresses very rapidly. This might require large high speed content addressable memory making implementation of such core bridges difficult. Although a TRILL solution need not provide such optimizations, it may reduce the need for such large, high speed content addressable memories or provide other similar optimizations.”
- From 3.9 Internet Architecture Issues: “TRILL solutions are intended to have no impact on the Internet network layer architecture. In particular, the Internet and higher layer headers should remain intact when traversing a deployed TRILL solution, just as they do when traversing any other link subnet technologies.”
- From 4. Applicability: “TRILL solutions are not intended to span separate Ethernet link subnets interconnected by network-layer (e.g., router) devices, except via link-layer tunnels, where such tunnels render the distinct subnet undetectably equivalent from a single Ethernet link subnet.”
- From 5. Security Considerations: “TRILL solutions are not intended to be a solution to Ethernet link subnet vulnerabilities, including spoofing, flooding, snooping, and attacks on the link control plane (STP, flooding the learning cache) and link-network control plane (ARP). Although TRILL solutions are intended to provide more stable routing than STP, this stability is limited to performance, and the subsequent robustness is intended to address non-malicious events.”
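The Section 3.3 excerpt asks TRILL solutions to limit the damage of transient loops during routing convergence. The TRILL base protocol (RFC 6325) does this with a hop count carried in the TRILL header, much like the IP TTL. The sketch below is a toy model of that idea, not the RFC 6325 processing rules: two hypothetical core switches caught mid-convergence point at each other for the same destination, and the hop count is what eventually ends the loop.

```python
from dataclasses import dataclass

MAX_HOP_COUNT = 63  # largest value a 6-bit hop count field can hold


@dataclass
class Frame:
    dst: str
    hop_count: int


# Hypothetical forwarding state caught mid-convergence: Core1 still sends
# Access2-bound frames to Core2, while Core2 already sends them back to Core1.
NEXT_HOP = {
    "Core1": {"Access2": "Core2"},
    "Core2": {"Access2": "Core1"},
}


def forward(frame, start, max_steps=1000):
    """Walk next hops, decrementing the hop count at each switch.

    max_steps only stops this simulation; it stands in for "forever".
    Returns (outcome, hops taken).
    """
    node = start
    hops = 0
    while hops < max_steps:
        if node == frame.dst:
            return "delivered", hops
        if frame.hop_count == 0:
            return "discarded", hops
        frame.hop_count -= 1
        node = NEXT_HOP[node][frame.dst]
        hops += 1
    return "still looping", hops


print(forward(Frame("Access2", MAX_HOP_COUNT), "Core1"))  # hop count ends the loop
print(forward(Frame("Access2", 10**9), "Core1"))  # effectively no limit
```

The loop itself still wastes bandwidth until convergence finishes; the hop count only guarantees that no single frame circulates indefinitely, which is the “limit the impact of loops” behavior the RFC text describes.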