Layer 3 Technologies

ccde-written

Layer 3 control plane design options for enterprise networks, covering routing protocols, BGP, and routing design recommendations.

Published

April 17, 2026

Overview

In network design, it is common that a certain design goal can be achieved “technically” using different approaches. While from a technical deployment point of view this can be seen as an advantage, from a network design perspective, the question is which design option should be selected and why? To answer this as a network designer, you must be aware of the different design options and protocols as well as the advantages and limitations of each. Therefore, this chapter will concentrate specifically on highlighting, analyzing, and comparing the various design options, principles, and considerations with regard to Layer 3 control plane protocols from different design aspects, focusing on enterprise-grade networks.

This chapter covers the following topics:

Enterprise Layer 3 Routing: This section reviews Layer 3 routing concepts and covers link-state routing protocols, EIGRP, route summarization, traffic engineering and path selection options, and the corresponding network design elements within and between each of these items.
BGP Routing: This section covers the basics of BGP routing, BGP as the core routing protocol, BGP scalability options, route redistribution, and the corresponding network design elements within each of these items.
Enterprise Routing Design Recommendations: This section covers the enterprise routing design recommendations and the corresponding network design elements that go with them, such as how to select the proper routing protocol in the first place.

Enterprise Layer 3 Routing

This section covers the various routing design considerations and optimization concepts that pertain to enterprise-grade routed networks.

IP Routing and Forwarding Concept Review

The main goal of routing protocols is to serve as a delivery mechanism that routes packets to their intended destination. The end-to-end routing of packets across the network is facilitated and driven by the concept of distributed databases. This concept is typically based on having a database of IP destinations (typically hosts and networks) on each Layer 3 node in the packet’s path, along with the next-hop IP addresses of the Layer 3 nodes that can be used to reach each destination. This database is known as the routing information base (RIB). In contrast, the forwarding information base (FIB), also known as the forwarding table, contains the destination addresses and the interfaces required to reach those destinations. In general, routing protocols are classified as link-state, path-vector, or distance-vector protocols. This classification is based on how the routing protocol constructs and updates its routing table, and how it computes and selects the desired path to reach the intended IP destination.

The typical basic forwarding decision in a router is based on three processes:

Routing protocols
Routing table
Forwarding decision (switches packets)
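The relationship between the RIB and the FIB described above can be sketched in a few lines of Python. This is only an illustrative model, not how any router implements it: the prefixes, next hops, interface names, and the `build_fib` helper are all hypothetical, and the sketch assumes every next hop sits on a directly connected network.

```python
import ipaddress

# Hypothetical RIB: destination prefix -> next-hop IP address.
rib = {
    "10.2.1.0/24": "192.0.2.2",
    "10.3.0.0/16": "198.51.100.2",
}

# Hypothetical directly connected networks: network -> local interface name.
connected = {
    "192.0.2.0/30": "Gi0/0",
    "198.51.100.0/30": "Gi0/1",
}


def build_fib(rib, connected):
    """Resolve each RIB next hop to the outgoing interface that reaches it."""
    fib = {}
    for prefix, next_hop in rib.items():
        hop = ipaddress.ip_address(next_hop)
        for network, interface in connected.items():
            if hop in ipaddress.ip_network(network):
                fib[prefix] = (next_hop, interface)
                break
    return fib


fib = build_fib(rib, connected)
for prefix, (next_hop, interface) in fib.items():
    print(f"{prefix} via {next_hop} out {interface}")
```

The RIB answers "which next hop reaches this destination?", while the derived FIB entry adds the interface that the forwarding process actually uses to switch the packet.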

Link-State Routing

Link-state routing protocols use Dijkstra’s shortest path algorithm to calculate the best path. Open Shortest Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS) are link-state routing protocols that share, to some extent, the same conceptual approach to building, exchanging, and processing Layer 3 routing information. A link-state advertisement (LSA) is a message that is used to communicate network information such as router links, interfaces, link states, and costs within a link-state routing protocol. Figure 1 illustrates the stages of link-state protocol operation through the final calculation of the shortest path tree.

Figure 1: Stages of link-state protocol operation through the final calculation of the shortest path tree.
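The final stage, the shortest path tree calculation, can be illustrated with a minimal Dijkstra sketch in Python. The four-router topology and its link costs are invented for illustration, and the dictionary used here is only a stand-in for a real link-state database.

```python
import heapq


def shortest_path_tree(lsdb, root):
    """Run Dijkstra's algorithm over an LSDB-like adjacency map.

    lsdb maps each router to {neighbor: link_cost}. Returns (cost, parent),
    where parent describes the shortest path tree rooted at `root`.
    """
    cost = {root: 0}
    parent = {root: None}
    queue = [(0, root)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > cost[node]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in lsdb[node].items():
            candidate = dist + link_cost
            if candidate < cost.get(neighbor, float("inf")):
                cost[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return cost, parent


# Hypothetical topology: the direct A-B link is expensive, so A reaches B via C.
lsdb = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "C": 1, "D": 1},
    "C": {"A": 1, "B": 1},
    "D": {"B": 1},
}
cost, parent = shortest_path_tree(lsdb, "A")
print(cost)    # total cost from A to every router
print(parent)  # each router's upstream node in the tree
```

Every router in the flooding domain runs the same calculation with itself as the root, which is why a consistent database across routers matters for loop-free forwarding.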

It is important to remember that although OSPF and IS-IS as link-state routing protocols are highly similar in the way they build the link-state database (LSDB) and operate, they are not identical. This section discusses the implications of applying link-state routing protocols (OSPF and IS-IS) on different network topologies, along with different design considerations and recommendations.

Link-State over Hub-and-Spoke Topology

In general, some implications should be considered when link-state routing protocols are applied on a hub-and-spoke topology, including the following:

Scaling to a large number of spokes is a concern because each spoke typically receives the link-state information of every other spoke, and there is no effective means of controlling the distribution of routing information among the spokes.
Special consideration must be taken to avoid suboptimal routing, in which traffic can use remote sites (spokes) as a transit site to reach the hub or other spokes.

For instance, summarization between flooding domains in a multi-area (multi-flooding-domain) design with multiple border routers requires specific routing information to be exchanged between the border routers (area border routers [ABRs] in OSPF or L1/L2 routers in IS-IS) over a non-summarized link, to avoid using spoke sites as a transit path, as illustrated in Figure 2. An OSPF ABR is a router that is connected to more than one OSPF area.

Figure 2: Multi-Area Link State: Hub and Spoke

So, for each hub-and-spoke flooding domain to be added to the hub routers, you need to consider an additional link between the hub routers in that domain. This is a typical use case scenario to avoid suboptimal routing with link-state routing protocols. However, when the number of flooding domains (for example, OSPF areas) increases, the number of VLANs, subinterfaces, or physical interfaces between the border routers will grow as well, which will result in scalability and complexity concerns. One of the possible solutions is to have a single link with adjacencies in multiple areas (RFC 5185). For instance, in the scenario illustrated in Figure 3, there is a hub-and-spoke topology that uses OSPF multi-area design.

Figure 3: Multi-Area OSPF: Hub and Spoke

If the link between router D and router F (part of OSPF area 1) fails, any traffic from router B that is destined to the LAN connected to router F and is drawn toward the summary route advertised by router D will then follow the more specific route over the path G, E, and then F.

To optimize this design during this failure scenario, there are multiple possible solutions, and here network designers must decide which solution is the most suitable one with regard to other design requirements such as application requirements where the delay could affect critical business applications:

Place the inter-ABR link (D to E) in area 1 (simple and provides “north to south” optimal routing in this topology).
Place each spoke in its own area with LSA type 3 filtering. (May lead to complex operations and limited scalability, depending on the network size.)
Disable route summarization at the ABRs; for example, advertise more specific routes from ABR router E. (May not always be desirable because this means reduced scalability and the loss of some of the value of the OSPF multi-area design.)

Note

The link between the two hub nodes (for example, ABRs) will introduce the potential of a single point of failure to the design. Therefore, link redundancy (availability) between the ABRs may need to be considered.

If IS-IS is applied to the topology in Figure 3 instead, using a similar setup where IS-IS L2 is to be used instead of the area 0 and IS-IS L1 is to be used by the spokes, the simplest way to optimize this architecture is to put the links between the border routers in IS-IS L1-L2 (overlapping levels capability), where we can extend L1 to overlap with L2 on the border router (ABR in OSPF), as illustrated in Figure 4. This will result in a topology that can support summarization with more optimal routing with regard to the failure scenario discussed previously.

Figure 4: Multilevel IS-IS: Hub and Spoke

Note

OSPF is a more widely deployed and proven link-state routing protocol in enterprise networks compared to IS-IS, especially with regard to hub-and-spoke topologies. IS-IS has limitations when it operates over nonbroadcast multiaccess (NBMA) multipoint networks.

Table 1 summarizes the different possible types of OSPF interfaces in a hub-and-spoke topology over NBMA transport (typically either Frame Relay or ATM), along with the associated design advantages and implications of each.

Table 1: OSPF Interface Types Comparison: Hub and Spoke

Network Type	Design Advantages	Design Limitations
NBMA / Broadcast	Simplified IP addressing; smaller routing table size (smaller link-state database).	Manual configuration of each spoke with the right OSPF priority. Either no reachability between spokes or labor-intensive Layer 2 (Frame Relay DLCI) configuration (high operational complexity).
Point-to-Multipoint	Simplified IP addressing; supports small to medium networks (smaller link-state database than P2P); simplified operations.	Additional host routes inserted in the routing table, which may limit its scalability (depends on the number of prefixes and hardware resources).
Point-to-Point	Ability to maintain end-to-end link state (signaling the down state).	Large IP addressing spaces. Larger routing table/larger link-state database. Overhead of subinterfaces operations. Limited scalability compared to other options.

Link-State over Full-Mesh Topology

Fully meshed networks can offer a high level of redundancy and the shortest paths. However, the substantial amount of routing information flooded across a fully meshed network is a significant concern. This concern stems from the fact that each router will receive at least one copy of every new piece of information from each neighbor on the full mesh. For example, in an \(n\)-node fully meshed network, each router has \((n-1)\) adjacencies. When a router’s link connected to the LAN side fails, it must flood its LSA (or, in IS-IS, its link-state packet [LSP]) to each of the \((n-1)\) neighbors. Each neighbor will then flood this LSA/LSP again to its own neighbors. Because of the full-mesh connectivity and reflooding, the result resembles a broadcast of the update across the entire mesh.
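The growth in flooding can be made concrete with a small Python calculation. This is a simplified worst-case model, stated as an assumption rather than protocol behavior: the originating router sends one copy to each of its \((n-1)\) neighbors, and every neighbor refloods on all of its links except the one the update arrived on. Real implementations suppress some of these duplicates, so the figure is an upper bound.

```python
def full_mesh_flooding(n):
    """Return adjacency and worst-case flooding counts for an n-node full mesh."""
    if n < 2:
        raise ValueError("a mesh needs at least two routers")
    first_wave = n - 1                 # originator floods to every neighbor
    reflood = (n - 1) * (n - 2)        # each neighbor refloods on its other links
    return {
        "adjacencies_per_router": n - 1,
        "total_adjacencies": n * (n - 1) // 2,
        "worst_case_copies": first_wave + reflood,
    }


for size in (5, 10, 50):
    print(size, full_mesh_flooding(size))
```

Under this model the number of copies grows with the square of the mesh size, which is the motivation for the flooding reduction techniques discussed next.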

With link-state routing protocols, you can use the mesh group technique to reduce link-state information flooding in a fully meshed topology. However, in failure scenarios over a meshed topology, some routers may learn about the failure before others within the mesh. This will typically lead to a temporarily inconsistent LSDB across the nodes within the network, which can result in transient forwarding loops. Even though the concept of a loop-free alternate (LFA) route can be considered to overcome situations like this, using LFA over a mesh topology will add complexity to the control plane.

Note

Later in this chapter, more details are provided about flooding domain and route summarization design considerations for link-state routing protocols, which can reduce the level of control plane complexity and optimize link-state information flooding and performance.

Note

Other mechanisms help to optimize and reduce link-state LSA/LSP flooding by reducing the transmission of subsequent LSAs/LSPs, such as OSPF flooding reduction (described in RFC 4136). This is done by eliminating the periodic refresh of unchanged LSAs, which can be useful in fully meshed topologies.

OSPF Area Types

Table 2 provides an overview of the different types of OSPF area types.

Table 2: Summary of OSPF Area Types

Area Type	Advertised Route
Stubby	All routes except external routing information (type 5 LSAs)
Totally stubby	Internal area routes + default route (both type 3 and 5 LSAs are suppressed)
Not so stubby area (NSSA)	All routes with the ability to inject/originate external routing information (type 7 LSA)
Totally NSSA	Internal area routes + default route, with the ability to inject/originate external routing information (type 7 LSAs)

Each of the OSPF areas allows certain types of LSAs to be flooded, which can be used to optimize and control route propagation across the OSPF routed domain. However, if OSPF areas are not properly designed and aligned with other requirements, such as application requirements, it can lead to serious issues because of the traffic black-holing and suboptimal routing that can appear as a result of this type of design.

Table 3 provides a quick reference for how LSA Types 3, 5, and 7 behave in each OSPF area type, including whether they are allowed into the area, generated within it, or blocked at the ABR boundary.

Table 3: OSPF LSA Behavior by Area Type

Area Type	Type-3 Summary (O IA)	Type-5 External (O E1/E2)	Type-7 NSSA External (O N1/N2)	Default Route Injection
Normal	Allowed in/out	Allowed in/out	Not applicable	None (learned normally)
Stub	Allowed in (all prefixes)	Not flooded into area (see Default Route Injection column)	Not applicable	ABR auto-generates a new Type-3 `0.0.0.0/0`
Totally Stubby	Only a single `0.0.0.0/0` from ABR (all others suppressed)	Not flooded into area (see Default Route Injection column)	Not applicable	ABR auto-generates a new Type-3 `0.0.0.0/0`
NSSA	Allowed in (all prefixes)	Not flooded in (see Default Route Injection column). ABR translates internal Type-7 to Type-5 for Area 0 (outbound)	Generated by internal ASBR, stays within the area. ABR translates it to Type-5 for Area 0	Not automatic. Requires `default-information-originate` on ABR (injects Type-7 default)
Totally NSSA	Only a single `0.0.0.0/0` from ABR (all others suppressed)	Not flooded in (see Default Route Injection column). ABR translates internal Type-7 to Type-5 for Area 0 (outbound)	Generated by internal ASBR, stays within the area. ABR translates it to Type-5 for Area 0	ABR auto-generates a new Type-3 `0.0.0.0/0`
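The table above can also be expressed as a small lookup structure, which is sometimes handy when reasoning about which LSAs a router inside a given area will see. The following Python sketch encodes only what Table 3 states; the dictionary layout and the `lsa_allowed` helper are hypothetical names chosen for illustration.

```python
# Per-area rules taken from Table 3. "type3" is "all" or "default_only";
# "type5"/"type7" say whether that LSA type can exist inside the area;
# "auto_default" says whether the ABR injects a Type-3 default automatically.
AREA_RULES = {
    "normal":         {"type3": "all",          "type5": True,  "type7": False, "auto_default": False},
    "stub":           {"type3": "all",          "type5": False, "type7": False, "auto_default": True},
    "totally_stubby": {"type3": "default_only", "type5": False, "type7": False, "auto_default": True},
    "nssa":           {"type3": "all",          "type5": False, "type7": True,  "auto_default": False},
    "totally_nssa":   {"type3": "default_only", "type5": False, "type7": True,  "auto_default": True},
}


def lsa_allowed(area_type, lsa_type, is_default=False):
    """Return True if an LSA of this type can appear inside the given area type."""
    rules = AREA_RULES[area_type]
    if lsa_type == 3:
        return rules["type3"] == "all" or is_default
    if lsa_type == 5:
        return rules["type5"]
    if lsa_type == 7:
        return rules["type7"]
    raise ValueError("this sketch only models LSA types 3, 5, and 7")


print(lsa_allowed("totally_stubby", 3))                   # False: specific summaries are suppressed
print(lsa_allowed("totally_stubby", 3, is_default=True))  # True: only the default is allowed in
print(lsa_allowed("nssa", 7))                             # True: internal ASBR can originate type 7
```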

Figure 5 shows a conceptual high-level view of the route propagation, along with the different OSPF LSAs, in an OSPF multi-area design with different area types.

Figure 5: OSPF Route Propagation in Multi-Area Design

The typical design question is, “Where can these areas be used and why?” The basic standard answer is, “It depends on the requirements and topology.”

For instance, if no requirement specifies which path a route must take to reach external networks such as an extranet or the Internet, you can use the “totally NSSA” area type to simplify the design. In this design, the ABR will translate the LSA type 7 to type 5 and propagate it to the backbone area. The ABR will not propagate any LSA type 3 into the NSSA area other than the default route, which means that the NSSA area will not have any specific inter-area routes. This design can be useful in scenarios where there is no requirement for optimal routing to external networks, and it can help to reduce the size of the routing table and improve convergence time for routers within the NSSA area.

Note

RFC 3101 introduced the ability to have multiple ABRs perform the translation from LSA type 7 to type 5. However, an unnecessarily large number of type 7 to type 5 translators may significantly increase the size of the OSPF LSDB. This can affect the overall OSPF performance and convergence time in large-scale networks with a large number of prefixes.

Similarly, in the scenario depicted on the left in Figure 6, a data center in London hosts two networks (10.1.1.0/24 and 10.2.1.0/24). Both WAN/MAN links to this data center have the same bandwidth and cost. Based on this setup, the traffic coming from the Sydney branch toward network 10.2.1.0/24 can take any path. If this is not compromising any requirement (in other words, suboptimal routing is not an issue), the OSPF area 10 can be deployed as a “totally stubby area” to enhance the performance and stability of remote site routers.

In contrast, the scenario on the right side of Figure 6 has a slightly different setup. The data centers are located in different geographic locations with a data center interconnect (DCI) link. In a scenario like this, the optimal path to reach the destination network can be critical, and using a totally stubby area can break the optimal path requirement. To overcome this limitation, there are two simple alternatives: deploy area 10 as either a normal OSPF area or a stubby area. Either choice ensures that the more specific route (LSA type 3) is propagated to the Sydney branch router, so that it selects the direct optimal path rather than crossing the international DCI.

Figure 6: OSPF Totally Stubby Area Versus Stubby Area Design
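The difference between the two designs comes down to longest-prefix matching at the branch router, which the following Python sketch illustrates. The ABR names and the route tables are hypothetical, and the sketch assumes the branch router has already chosen the lowest-cost ABR for each specific prefix; it only shows which entry wins the lookup.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return (prefix, next_hop) for the most specific route covering destination."""
    dest = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in routes if dest in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), routes[str(best)]


# Totally stubby area 10: the branch only holds the default route.
totally_stubby_routes = {
    "0.0.0.0/0": "either ABR (equal cost)",
}

# Stubby area 10: type 3 summaries are also present (hypothetical ABR names).
stubby_routes = {
    "0.0.0.0/0": "either ABR (equal cost)",
    "10.1.1.0/24": "ABR-1",
    "10.2.1.0/24": "ABR-2",
}

print(longest_prefix_match(totally_stubby_routes, "10.2.1.10"))
print(longest_prefix_match(stubby_routes, "10.2.1.10"))
```

With only a default route, the branch cannot distinguish between the two exits; with the type 3 routes present, the /24 entry overrides the default and steers traffic to the ABR closest to the destination.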

In summary, the goal of these types of different OSPF areas is to add more optimization to the OSPF multi-area design by reducing the size of the routing table and lowering the overall control plane complexity by reducing the size of the fault domains (link-state flooding domains). This size reduction can help to reduce the overhead of the routers’ resources, such as CPU and memory. Furthermore, the reduction of the flooding domains’ size will help accelerate the overall network recovery time in the event of a link or node failure. However, in some scenarios where an optimal path is important, take care when choosing between these various area types.

OSPF External Route Types and Traffic Engineering

OSPF external routes come in two types, E1/N1 and E2/N2, and these types serve as traffic engineering tools for controlling how traffic exits the network. Understanding the distinction between these route types is essential for network designers because the choice directly influences path selection behavior and quality of service (QoS) control across autonomous system boundaries.

Hot Potato Routing (E1/N1 Routes)

E1 routes use a type 1 metric that combines the external metric (set by the ASBR) with the internal cost to reach the ASBR. This means a router will prefer the nearest exit point because the internal cost is factored into the total metric. The traffic leaves the network as quickly as possible, following the “hot potato” principle of getting rid of it at the first opportunity. The downside of this approach is that you lose control of path selection and QoS once traffic exits your network, since forwarding decisions beyond that point are made by external domains.

Cold Potato Routing (E2/N2 Routes)

E2 routes use a type 2 metric that includes only the external cost. This means a router prefers the exit point that is closest to the final destination, regardless of internal cost. The traffic stays on your network longer, exiting at the point nearest to the destination, following the “cold potato” principle of keeping it as long as possible. You maintain control of forwarding decisions and QoS handling for a longer portion of the path. An important detail is that routers also track the internal cost to reach the ASBR as a separate “forward metric,” which is used as a tiebreaker when two E2 routes have the same external metric.

Table 4 compares the key attributes of hot potato and cold potato routing in OSPF.

Table 4: OSPF Hot Potato vs. Cold Potato Routing

Attribute	Hot Potato (E1/N1)	Cold Potato (E2/N2)
Metric calculation	External + internal cost	External cost only
Path preference	Nearest exit point	Exit point closest to destination
Traffic behavior	Exits network ASAP	Stays on network as long as possible
Control after exit	Lost	Maintained longer
Tiebreaker	Lowest composite metric	Forward metric (internal cost)
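The comparison in Table 4 can be checked with a short Python sketch. The ASBR names and metric values are made up for illustration, and the `best_exit` helper is a hypothetical name; the selection logic follows the rules described above (E1 compares external plus internal cost, E2 compares external cost and falls back to the forward metric on a tie).

```python
from collections import namedtuple

# external: metric set by the ASBR; internal: cost from this router to the ASBR.
Exit = namedtuple("Exit", ["asbr", "external", "internal"])


def best_exit(exits, metric_type):
    """Pick the preferred ASBR for an external prefix under E1 or E2 rules."""
    if metric_type == "E1":
        return min(exits, key=lambda e: e.external + e.internal).asbr
    if metric_type == "E2":
        # External metric first; forward metric (internal cost) breaks ties.
        return min(exits, key=lambda e: (e.external, e.internal)).asbr
    raise ValueError("metric_type must be 'E1' or 'E2'")


exits = [
    Exit("ASBR-near", external=20, internal=5),
    Exit("ASBR-far", external=10, internal=40),
]
print(best_exit(exits, "E1"))  # ASBR-near: 20 + 5 = 25 beats 10 + 40 = 50
print(best_exit(exits, "E2"))  # ASBR-far: external 10 beats external 20

tied = [
    Exit("ASBR-near", external=20, internal=5),
    Exit("ASBR-far", external=20, internal=40),
]
print(best_exit(tied, "E2"))   # ASBR-near: equal external metric, lower forward metric
```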

Review Question

Company ABC runs OSPF in their network. A design engineer decides to implement hot-potato routing architecture. How can this implementation be achieved?

a. Redistribute the external prefixes into OSPF and ensure that the total metric calculation includes external and internal values.
b. Enable iBGP and apply prepending to ensure all prefixes have the same AS path attribute length.
c. Enable OSPF load balancing over unequal-cost paths.
d. Redistribute the external prefixes into OSPF and ensure the total metric calculation includes only the external value and the value is the same on all ASBRs.

Answer

a. Hot-potato routing in OSPF is achieved using E1 (type 1 metric) external routes, where the total metric includes both the external cost and the internal cost to reach the ASBR. This causes routers to prefer the nearest exit point because the internal cost is part of the path selection. Option d describes E2 (cold potato) behavior. Options b and c are unrelated to OSPF external route type selection.

OSPF Versus IS-IS

OSPF and IS-IS, as link-state routing protocols, are similar and can achieve (to a large extent) the same result for enterprises in terms of design, performance, and limitations. However, OSPF is more commonly used by enterprises as the interior gateway protocol (IGP), for the following reasons:

OSPF can offer a more structured and organized routing design for modular enterprise networks.
OSPF is more flexible over a hub-and-spoke topology with multipoint interfaces at the hub.
OSPF naturally runs over IP, which makes it a suitable option over IP tunneling technologies such as Generic Routing Encapsulation (GRE), Multipoint GRE (mGRE), and Cisco Dynamic Multipoint Virtual Private Network (DMVPN, which relies on mGRE and Next Hop Resolution Protocol [NHRP]). IS-IS, in contrast, is encapsulated directly at Layer 2 rather than in IP, so running it over these tunnels is generally not a supported design.
In terms of staff knowledge and experience, OSPF is more widely deployed on enterprise-grade networks. Therefore, compared to IS-IS, more people have OSPF knowledge and expertise.

However, if there is no technical barrier, both OSPF and IS-IS are valid options to consider.

EIGRP Routing

Enhanced Interior Gateway Routing Protocol (EIGRP) is an enhanced distance-vector routing protocol that relies on the Diffusing Update Algorithm (DUAL) to calculate the shortest path to a network. A classic distance-vector protocol advertises its entire routing table to its neighbors. EIGRP, a Cisco innovation, became highly valued for its ease of deployment, flexibility, and fast convergence. For these reasons, many large enterprises consider EIGRP the preferred IGP. EIGRP maintains the advantages of distance-vector protocols while avoiding the associated disadvantages. For instance, EIGRP does not transmit the entire contents of the routing table following an update event; instead, it transmits only the “delta,” that is, the routing information that has changed since the last topology update. EIGRP is deployed in many enterprises as the routing protocol for the following reasons:

Easy to design, deploy, and support
Easier to learn
Flexible design options
Lower operational complexities
Fast convergence (subsecond)
Can be simple for small networks while at the same time scalable for large networks
Supports flexible and scalable multi-tier campus and hub-and-spoke WAN design models

Unlike link-state routing protocols, such as OSPF, EIGRP has no hard edges. This is a key design advantage because hierarchy in EIGRP is created through route summarization or route filtering rather than relying on a protocol-defined boundary, such as OSPF areas. The depth of hierarchy depends on where the summarization or filtering boundary is applied. This makes EIGRP flexible in networks structured as a multitier architecture.

EIGRP: Hub and Spoke

As discussed earlier, link-state routing protocols have some scaling limitations when applied to a hub-and-spoke topology. In contrast, EIGRP offers more flexible and scalable capabilities for the hub-and-spoke types of topologies. One of the main concerns in a hub-and-spoke topology is the possibility of a spoke or remote site being used as a transit path due to a configuration error or a link failure. With link-state routing protocols, several techniques to mitigate this type of issue were highlighted. However, there are still scalability limitations associated with it.

EIGRP, in contrast, offers the capability to mark the remote site (spoke) as a stub. This differs from an OSPF stub area, in which all routers in the same area still exchange routes and propagate failure and update information. With EIGRP, when the spokes are configured as stubs, they signal to the hub router that the paths through the spokes should not be used as transit paths. As a result, there will be significant optimization to the design. This optimization results from the decrease in EIGRP query scope and the reduction of the unnecessary overhead associated with responding to queries by the spoke routers (for example, EIGRP stuck-in-active [SIA] queries).

In Figure 7, router B will see it has only one path to the LAN connected to router A, rather than four paths.

Figure 7: EIGRP Stub

Consequently, enabling the EIGRP stub feature over a hub-and-spoke topology helps to reduce the overall control plane complexity and increases the scalability of the design to support a large number of spokes without affecting its performance.
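The effect of the stub designation on query scope can be illustrated with a deliberately simplified Python model. It assumes only one rule, that a router does not send queries to neighbors that have announced themselves as stubs; the neighbor names and the `query_targets` helper are hypothetical, and real DUAL behavior involves more state than this.

```python
def query_targets(neighbors):
    """Return the neighbors a hub would query, skipping those marked as stubs.

    neighbors maps a neighbor name to True if it advertised itself as a stub.
    """
    return [name for name, is_stub in neighbors.items() if not is_stub]


# Hypothetical hub with one peer hub and 100 spokes.
spokes = [f"spoke-{i}" for i in range(1, 101)]

without_stub = {"hub-2": False, **{name: False for name in spokes}}
with_stub = {"hub-2": False, **{name: True for name in spokes}}

print(len(query_targets(without_stub)))  # 101 neighbors must be queried and must reply
print(query_targets(with_stub))          # ['hub-2'] only
```

Because the hub waits for replies to every query it sends, shrinking the target list from every spoke to only the non-stub neighbors is what keeps convergence predictable as the number of spokes grows.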

EIGRP: Stub Route Leaking

You might encounter some scenarios like the one depicted in Figure 8, which is an extension to the EIGRP stub design with a backdoor link between two remote sites. In this scenario, the HQ site is connected to the two remote sites over an L2 WAN. These remote sites are also interconnected directly via a backdoor link. Remote sites are configured as EIGRP stubs to optimize the remote sites’ EIGRP performance over the WAN.

Figure 8: EIGRP Stub Leaking

The issue with the design in this scenario is that if the link between router B and router D fails, the following will result as a consequence of this single failure:

Router A cannot reach network 192.168.10.0/24 because router D is configured as a stub. Also, router C is a stub, which will not advertise this network to router A anyway.
Router D will not be able to receive the default route from router A because router C is a stub as well.

This means that the remote site connected to router D will be completely isolated, without taking any advantage of the backdoor link. To overcome this issue, EIGRP offers a useful feature called stub leaking, where both routers D and C in this scenario can advertise routes to each other selectively, even if they are configured as a stub. Route filtering might need to be incorporated in scenarios like this when an EIGRP leak map is introduced into the design to avoid any potential suboptimal routing that might happen as a consequence of route leaking.

EIGRP: Ring Topology

Unlike link-state routing protocols, EIGRP has limitations with a ring topology. As depicted in Figure 9, the greater the number of nodes in the ring, the greater the number of queries to be sent during a link failure. As a general recommendation with EIGRP, always try to design in triangles where possible, rather than rings.

Figure 9: EIGRP Queries on a Ring Topology

EIGRP: Full-Mesh Topology

EIGRP in a full-mesh topology (see Figure 10) is less desirable in comparison with link-state protocols. For example, with link-state protocols such as OSPF, network designers can designate one router to flood into the mesh and block flooding on the other routers, which can improve the topology. In contrast, with EIGRP, this capability is not available. The only way to mitigate the information flooding in an EIGRP mesh topology is by relying on route summarization and filtering techniques. To optimize EIGRP in a mesh topology, the summarization must be into and out of the meshed network.

Figure 10: EIGRP on a Mesh Topology

Note

As discussed earlier, a link-state routing protocol can lead to transient forwarding loops in ring and mesh topologies after a network component failure event. Therefore, both EIGRP and link-state routing protocols have limitations on these topologies, with different symptoms (a fast-growing, large number of EIGRP queries versus link-state transient loops).

EIGRP Route Propagation Considerations

EIGRP offers a high level of flexibility to network designers because it can fit different types of designs and topologies. However, like any other protocol, some limitations apply to EIGRP (especially with regard to route propagation) and may influence the design choices. Therefore, network designers must consider the following factors to avoid impacting the propagation of routing information, which can result in an unstable design:

EIGRP bandwidth: By default, EIGRP is designed to use up to 50 percent of the main interface bandwidth for EIGRP packets; however, this value is configurable. The limitation of this concept occurs when there is a dialer or point-to-multipoint physical interface with several peers over one multipoint interface. In this scenario, EIGRP considers the bandwidth value on the main interface divided by the number of EIGRP peers on that interface to calculate the amount of bandwidth per peer. Consequently, when more peers are added over this multipoint interface, EIGRP will reach a point where it will not have enough bandwidth to operate over that dialer or multipoint interface appropriately. In addition, one of the common mistakes with regard to EIGRP and interface bandwidth is that sometimes network operators try to “influence” the best path selection decision in EIGRP DUAL by only tuning the bandwidth over an interface where the interface with the lowest bandwidth will be the least preferred. However, this approach can impact the EIGRP control plane peering functionality and scalability if it is tuned to a low value without proper planning. Therefore, the network designer must take this point into consideration and adopt alternatives, such as point-to-point subinterfaces under the multipoint interface. In addition, with overlay multipoint tunnel interfaces such as DMVPN, the bandwidth may be required to be defined manually at the tunnel interface when there is a large number of remote spokes.
Zero successor routes: When EIGRP tries to install a route in the RIB and the route is rejected, it is called a zero successor route. Such a route is not propagated to other EIGRP neighbors in the network. This behavior typically happens for one of the following two reasons:
- The RIB already contains the same route with a better administrative distance (AD). (Administrative distance is a rating of the trustworthiness of a routing information source. A lower number is preferred.)
- Multiple EIGRP autonomous systems (AS) are defined on the same router. When the same route is learned via both EIGRP autonomous systems with the same AD, the router typically installs the route from one EIGRP AS and rejects the other. Consequently, the rejected route is not propagated within its own EIGRP AS.
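The EIGRP bandwidth consideration above is easy to quantify. The following Python sketch applies the rule as described in the text (the configured interface bandwidth, times the EIGRP bandwidth percentage, divided by the number of peers on the multipoint interface). The function name and the example link speed of 1544 kbps are illustrative assumptions.

```python
def eigrp_bandwidth_per_peer(interface_kbps, peers, percent=50):
    """Estimate the bandwidth (kbps) EIGRP may use per peer on a multipoint interface."""
    if peers < 1:
        raise ValueError("at least one peer is required")
    return interface_kbps * percent / 100 / peers


# Hypothetical multipoint interface configured with a bandwidth of 1544 kbps.
for peer_count in (1, 10, 50, 100):
    share = eigrp_bandwidth_per_peer(1544, peer_count)
    print(f"{peer_count:>3} peers -> {share:.2f} kbps per peer")
```

The per-peer share shrinks linearly as spokes are added, which is why lowering the interface bandwidth value purely to influence path selection can starve EIGRP control traffic on interfaces with many peers.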

Hiding Topology and Reachability Information

Technically, both topology and reachability information hiding can help to improve routing convergence time during a link or node failure. Topology and reachability information hiding also reduces control plane complexity and enhances network stability to a large extent. For example, if there is a link flapping in a remote site, this might cause all other remote sites to receive and process the update information every time this link flaps, which leads to instability and increased CPU processing.

However, to produce a successful design, the design must first align with the business goals and requirements (and not just be based on the technical drivers). Therefore, before deciding how to structure IGP flooding domains, network architects or designers must first identify the business’s goals, priorities, and drivers. Consider, for example, an organization that plans to merge with one of its business partners but with no budget allocated to upgrade any of the existing network nodes. When these two networks merge, the size of the network may increase significantly in a short period of time. As a result, the number of prefixes and network topology information will increase significantly, which will require more hardware resources such as memory or CPUs.

Given that this business has no budget allocated for any network upgrade, in this case introducing topology and reachability information hiding in this network can optimize the overall network performance, stability, and convergence time. This will ultimately enable the business to meet its goal without adding any additional cost. In other words, the restructuring of IGP flooding domain design in this particular scenario is a strategic business-enabler solution.

However, in some situations, hiding topology and reachability information may lead to undesirable behaviors, such as suboptimal routing. Therefore, network designers must identify and measure the benefits and consequences by following the top-down approach. The following are some of the common questions that need to be thought about during the planning phase of the IGP flooding domain design:

What are the business goals, priorities, and directions?
How many Layer 3 nodes are in the network?
What is the number of prefixes?
Are there any hardware limitations (memory, CPU)?
Is optimal routing a requirement?
Is low convergence time a requirement?
What IGP is used, and what underlying topology is used?

Furthermore, it is important that network designers understand how each protocol interacts with topology information and how each calculates its path, so as to be able to identify design limitations and provide valid optimization recommendations.

Link-state routing protocols take the full topology of the link-state routed network into account when calculating a path. A path-vector routing protocol (Border Gateway Protocol [BGP]) can achieve topology hiding by simply using either route summarization or filtering, and distance-vector protocols, by nature, do not propagate topology information. Moreover, with route summarization, network designers can achieve “reachability information hiding” for all the different routing protocols.

Note

A link-state routing protocol can offer built-in information hiding capabilities (route suppression) by using different types of flooding domains, such as L1/L2 in IS-IS and stubby types of areas in OSPF.

Note

Although route filtering can be considered as an option for hiding reachability information, it is often somewhat complicated with link-state protocols.

IGP Flooding Domains

As discussed earlier, modularity can add significant benefits to the overall network architecture. By applying this concept to the design of logical routing architectural domains, we can have a more manageable, scalable, and flexible design. To achieve this, we need to break a flat routing design into one that is more hierarchical and has modularity in its overall architecture. In this scenario, we may have to ask the following questions: How many layers should we consider in our design? How many modules or domains are good practice?

The simple answer to these questions depends on several factors, including the following:

Design goal (simplicity versus scalability versus stability)
Network topology
Network size (nodes, routes)
Routing protocol
Network type (for example, enterprise versus service provider)

The following sections cover the various design considerations for IGP flooding domains, starting with a review of the structure of link-state and EIGRP domains.

Link-State Flooding Domain Structure

As link-state routing protocols, both OSPF and IS-IS can divide the network into multiple flooding domains, as discussed earlier. Dividing a network into multiple flooding domains, however, requires an understanding of the principles each protocol uses to build and maintain communication between the different flooding domains. In a multiple flooding domain design with OSPF, a backbone area is required to maintain end-to-end communication between all other areas (regardless of their type). In other words, area 0 in OSPF is like the glue that interconnects all other areas within an OSPF domain. In fact, non-backbone OSPF areas and area 0 (the backbone area) interconnect and communicate in a hub-and-spoke fashion.

Similarly, with IS-IS, the chain of levels (IS-IS flooding domains) must not be disjointed (L2 to L1/L2 to L1 and vice versa) for IS-IS to maintain end-to-end communication; level 2 can be seen as analogous to area 0 in OSPF.

The natural communication behavior of link-state protocols across multiple flooding domains requires at least one router to be dually connected to the core flooding domain (backbone area) and the other area or areas. This router stores an LSDB for each area and runs a separate shortest path first (SPF) calculation for each area. In OSPF terminology, this router is called the area border router (ABR). In IS-IS, the L1/L2 router is analogous to the OSPF ABR. Moreover, the communication between link-state flooding domains (between border routers) behaves like a distance-vector protocol.
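The per-area LSDB and per-area SPF run on a border router can be illustrated with a minimal Python sketch. The router names, area numbers, and link costs below are hypothetical, and the code is a plain Dijkstra calculation rather than a model of any vendor's implementation.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra over one area's LSDB, given as {router: {neighbor: cost}}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Hypothetical ABR attached to area 0 and area 10; each area has its own LSDB.
lsdb_per_area = {
    0: {"ABR": {"Core1": 10}, "Core1": {"ABR": 10, "Core2": 10}, "Core2": {"Core1": 10}},
    10: {"ABR": {"R1": 5}, "R1": {"ABR": 5, "R2": 5}, "R2": {"R1": 5}},
}

# One independent SPF run per area, as the border router would perform.
spf_per_area = {area: spf(lsdb, "ABR") for area, lsdb in lsdb_per_area.items()}
print(spf_per_area)
```

Because each SPF run sees only its own area's LSDB, a topology change inside area 10 in this sketch would not trigger a recalculation of the area 0 tree.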

In general, OSPF and IS-IS are two-layer hierarchy protocols; however, this does not mean that they cannot operate well in networks with more hierarchies (as discussed later in this section).

In addition, although both OSPF and IS-IS are suitable for two-layer hierarchy network architecture, there are some differences in the way that their logical layout (flooding domains such as areas and levels) can be designed. For example, OSPF has a hard edge at the flooding domain borders. Typically, this is where routing policies are applied, such as route summarization and filtering, as shown in Figure 11.

Figure 11: OSPF Flooding Domain Borders

By contrast, IS-IS routing information of the different levels (L1 and L2) is (technically) carried over different packets. This helps IS-IS have a softer edge at its flooding domain borders. This makes IS-IS more flexible than OSPF because the L2 routing domain can overlap with the L1 domains, as shown in Figure 12.

Figure 12: IS-IS Flooding Domain Borders

Consequently, IS-IS can perform better when optimal routing is required with multiple border routers, whereas OSPF requires special consideration with regard to the inter-ABR links (for example, which area to be part of, or in which direction is optimal routing more important).

Note

With both OSPF and IS-IS, the design must always reflect that the backbone cannot be partitioned in case of a link or node failure. Although an OSPF virtual link can help to fix partitioned backbone area issues, it is not a recommended approach. Instead, redesign of the logical or physical architecture is highly desirable in this case. Nevertheless, an OSPF virtual link may be used as an interim solution (see the following example).

The scenario shown in Figure 13 illustrates poorly designed OSPF areas. It is considered a poor design because the OSPF backbone area has the potential to be partitioned if the direct interconnect link between the regional data centers (London and Sydney) fails. This will result in communication isolation between the London and Sydney data centers. However, let’s assume that this organization needs to use its regional HQs (Melbourne, Amsterdam, and Singapore), which are interconnected in a hub-and-spoke fashion, as a backup transit path when the link between the London and Sydney sites is down.

Figure 13: OSPF Poor Area Design

Based on the current OSPF area design, area 6 cannot serve as a transit path if the data center interconnect between London and Sydney fails, resulting in a discontiguous area 0.

The ideal fix to this issue is to add redundant links from the London data center to WAN backbone router Y and/or from the Sydney data center to WAN backbone router X or to add a link between WAN backbone routers X and Y in area 0.
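The partition risk and the effect of the proposed fix can be checked by treating the backbone area as a graph and testing whether it stays connected after a failure. The following is a minimal Python sketch; the node names and the link list are assumptions loosely modeled on the scenario, not a reproduction of the figure.

```python
from collections import defaultdict

def is_connected(nodes, links):
    """Return True if every node is reachable from an arbitrary start node."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(nodes)

# Hypothetical area 0 membership and links (assumed for illustration only).
area0_nodes = {"London-DC", "Sydney-DC", "WAN-X", "WAN-Y"}
area0_links = [
    ("London-DC", "Sydney-DC"),  # direct data center interconnect
    ("London-DC", "WAN-X"),
    ("Sydney-DC", "WAN-Y"),
]

def survives_failure(failed_link, extra_links=()):
    """Is area 0 still contiguous after one link fails, given optional extra links?"""
    remaining = [l for l in area0_links if l != failed_link] + list(extra_links)
    return is_connected(area0_nodes, remaining)

dci = ("London-DC", "Sydney-DC")
partitioned = not survives_failure(dci)
fixed_by_xy = survives_failure(dci, extra_links=[("WAN-X", "WAN-Y")])
print(partitioned, fixed_by_xy)
```

Running the same check for every single-link failure is a simple way to confirm that a backbone design has no partition points before it is deployed.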

However, let’s assume that the provisioning of the links takes a while and this organization requires a quick fix to this issue. As shown in Figure 14, if you deploy an OSPF virtual link between WAN backbone routers X and Y in Amsterdam and Melbourne, respectively (across the hub site in Singapore), OSPF will consider this link as a point-to-point link. Both WAN backbone routers (ABRs) X and Y will form a virtual adjacency across this virtual link. As a result, this path can be used as an alternate path to maintain the communication between London and Sydney data centers when the direct link between them is down.

Figure 14: OSPF Virtual Link

Note

The solution presented in this scenario is based on the assumption that traffic flowing over multiple international links is acceptable from the perspective of business and application requirements.

You can use a GRE tunnel as an alternative method to the OSPF virtual link to fix issues like the one just described; however, there are some differences between using a GRE tunnel versus an OSPF virtual link, as summarized in Table 5.

Table 5: OSPF Virtual Link Versus GRE Tunnel

GRE Tunnel	Virtual Link
May add tunnel overhead as all traffic is tunneled and encapsulated by the tunnel endpoints.	The routing updates are tunneled, but the data traffic is sent natively without tunnel overhead.
May add operational overhead; for example, IP addressing needs to be configured if not deployed as “unnumbered” and the tunnel interface/IP needs to be assigned manually to OSPF area 0.	Simplified operation; for example, no IP addressing needs to be configured manually, and it’s under OSPF area 0 by default.
OSPF stub area can be used as a transit area for the tunnel.	The transit area cannot be an OSPF stub area.

Link-State Flooding Domains

One of the most common questions when designing OSPF or IS-IS is, “What is the maximum number of routers that can be placed within a single area?”

The common rule of thumb specifies between 50 and 100 routers per area or IS-IS level. However, in reality, it is hard to generalize the recommended maximum number of routers per area because the maximum number of routers can be influenced by several variables, such as the following:

Hardware resources (such as memory, CPU)
Number of prefixes (can be influenced by the route summarization design)
Number of adjacencies per shared segment

Note

The amount of available bandwidth with regard to the control plane traffic such as link-state LSAs/LSPs is sometimes a limiting factor. For instance, the most common quality of service (QoS) standard models followed by many organizations allocate one of the following percentages of the interface’s available bandwidth for control (routing) traffic: 4-class model, 7 percent; 8-class model, 5 percent; and 12-class model, 2 percent. This is more of a concern when the interconnection is a low-speed link such as a legacy WAN link (time-division multiplexing [TDM] based, Frame Relay, or ATM) with limited bandwidth. Therefore, other alternatives are sometimes considered with these types of interfaces, such as a passive interface or static routing.
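To make the percentages in this note concrete, the following Python sketch computes how much bandwidth each QoS model would leave for control traffic on a given link. The percentages are the ones quoted in the note; the two link speeds are assumed examples.

```python
# Control (routing) class share per QoS model, as quoted in the note.
CONTROL_SHARE_PERCENT = {"4-class": 7, "8-class": 5, "12-class": 2}

def control_bandwidth_kbps(link_kbps, model):
    """Bandwidth in kbps reserved for control traffic on a link under a QoS model."""
    return link_kbps * CONTROL_SHARE_PERCENT[model] / 100

for link_kbps in (2048, 1_000_000):  # assumed example link speeds
    for model in CONTROL_SHARE_PERCENT:
        share = control_bandwidth_kbps(link_kbps, model)
        print(f"{link_kbps} kbps link, {model} model: {share:.2f} kbps for control traffic")
```

On the slower link the control class is limited to tens of kbps, which is why flooding volume matters far more on legacy WAN links than on high-speed interconnects.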

For instance, many service providers run thousands of routers within one IS-IS level. Although this may introduce other design limitations with regard to modern architectures, in practice it is proven as a doable design. In addition, today’s router capabilities, in terms of hardware resources, are much stronger and faster than routers that were used five to seven years ago. This can have a major influence on the design, as well, because these routers can handle a high number of routes and volume of processing without any noticeable performance degradation.

In addition, the number of areas per border router is also one of the primary considerations in designing link-state routing protocols, in particular OSPF. Traditionally, the main constraint on the number of areas per ABR has been hardware resources. With the next generation of routers, which offer significant hardware improvements, ABRs can hold a greater number of areas. However, network designers must understand that each additional area added per ABR correlates with potentially lower performance (because the router stores a separate LSDB per area).

In other words, the hardware capabilities of the ABR are the primary determining factor for the number of areas that can be allocated per ABR, considering the number of prefixes per area as well. Traditionally, the rule of thumb is to consider two to three areas (including the backbone area) per ABR. This is a baseline that can be expanded if the design requires more areas per ABR, with the assumption that the hardware resources of the ABR can handle this increase.

In addition to these facts and variables, network designers should consider the nature of the network and the concept of fault isolation and design modularity for large networks that can be designed with multiple functional fault domains (modules). For example, large-scale routed networks are commonly divided based on the geographic location of global networks or based on an administrative domain structure if they are managed by different entities.

EIGRP Flooding Domain Structure

As discussed earlier, EIGRP has no protocol-specific flooding domains or structures. However, EIGRP with route summarization or filtering techniques can break the flooding domains into multiple hierarchies of routing domains, which can reduce the EIGRP query scope. This concept is a vital contributor to the optimization of the overall EIGRP design in terms of scalability, simplicity, and convergence time.

In addition, EIGRP offers a higher degree of flexibility and scalability in networks with three or more levels in their hierarchies as compared to link-state routing protocols.

Routing Domain Logical Separation

The two main drivers for breaking a routed network into multiple logical domains (fault domains) are the following: to improve the performance of the networks and routers (fault isolation), and to modularize the design (to make it simpler, more stable, and scalable). These two drivers enhance network convergence and increase the overall routing architecture scalability. Furthermore, breaking the routed topology into multiple logical domains will facilitate topology aggregation and information hiding. It is critical to decide where a routing domain can be divided into two or multiple logical domains. In fact, several variables influence the location where the routing domains are broken or divided. The considerations discussed in the sections that follow are the primary influencers that help to determine the correct location of the logical routing boundaries. Network designers need to consider these when designing or restructuring a routed network.

Underlying Physical Topology

As discussed in Chapter 1, “Network Design,” the physical network layout is like the foundation of a building. As such, it is the main influencer when designing the logical structure of a routing domain (for example, a hub-and-spoke versus ring topology). For instance, the level of the hierarchy held by a given network can impact the logical routing design if its structure includes two, three, or more tiers.

Moreover, the points in the network where the interconnections or devices meet (also known as chokepoints) at any given tier within the network are good potential border locations of a fault domain boundary, such as ABR in OSPF. For instance, in Figure 15, the network is constructed of three-level hierarchies. Routers A and B and routers C and D are good potential points for breaking the routing domain (physical aggregation points). Also, these boundaries can be feasible places to perform route summarizations.

Figure 15: Physical Aggregation Points

The other important factor with regard to the physical network layout is to break areas that have a high density of interconnections into separate logical fault domains where possible. As a result, devices in each fault domain will have smaller reachability databases (for example, LSDB) and will only compute paths within their fault domain, as illustrated in Figure 16. This will ultimately reduce the overall control plane design complexity. This concept promotes a design that supports other design principles, including simplicity, modularity, scalability, and topology and reachability information hiding.

Figure 16: Potential Routing Domain Boundaries

The network illustrated in Figure 16 has four different functional areas:

The primary data center
The regional data center
The international WAN
The hub-and-spoke network for some of the remote sites

From the perspective of logical separation, you should place each one of the large parts of the network into its own logical domain. The logical topology can be broken using OSPF areas, IS-IS levels, or EIGRP route summarization. The question you might be asking is, “Why has the domain boundary been placed at routers G and H rather than router D?” Technically, both are valid places to break the network into multiple logical domains. However, if we place the domain boundary at router D, both the primary data center network and regional data center will be under the same logical fault domain. This means the network may be less scalable and associated with lower control plane stability because routers E and F will have a full view of the topology of the regional data center network connected to routers G and H. In addition, routers G and H most probably will face the same limitations as routers E and F. As a result, if there is any link flap or routing change in the regional data center network connected to router G or H, it will be propagated across to routers E and F (unnecessary extra load and processing).

Traffic Pattern and Volume

By understanding traffic patterns (for example, north–south versus east–west) and traffic volume trends, network designers can better understand the impact if a logical topology were to be divided into multiple domains at certain points. For example, OSPF always prefers the path over the same area regardless of the link cost over other areas. In some situations, this could lead to suboptimal routing, where a high volume of traffic travels across low-capacity links or across expensive links with strict billing that are not intended to carry every type of communication; this results from a poor OSPF area design that did not consider bandwidth or cost requirements.

Similarly, if the traffic pattern is mostly north–south, such as in a hub-and-spoke topology where no communication between the spokes is required, this can help network designers to avoid placing the logical routing domain boundary at points likely to use spoke sites as transit sites (suboptimal routing). For instance, the scenario depicted in Figure 17 demonstrates how the application of the logical area boundaries on a network can influence path selection. Traffic sourced from router B going to the regional data center behind router G should (optimally) go through router D, then across one of the core routers E or F, and finally to router C to reach the data center over one of the core high-speed links. However, the traffic is currently traversing the low-speed link via router A. This path (B-D-A-C-G) is within the same area (area 10), as shown in Figure 17.

Figure 17: OSPF Suboptimal Routing

No route filtering or any type of summarization is applied to this network. This suboptimal routing results entirely from the poor design of OSPF areas. If you apply the concepts discussed in this section, you can optimize this design and fix the issue of suboptimal routing, as follows:

First, the physical network is a three-tier hierarchy. Routers C and D are the points where the access, data center, and core links meet, which makes them a good potential location to be the area border (which is already in place).
Second, if you divide this topology into functional domains, you can, for example, have three parts (core, remote sites, and data center), with each placed in its own area. This can simplify summarization and introduce modularity to the overall logical architecture.
The third point here is the traffic pattern. It is obvious that there will be traffic from the remote sites to the regional data center, which needs to go over the high-speed links rather than going over the low-speed links by using other remote sites as a transit path.

Based on this analysis, the simple solution to this design is to either place the data center in its own area or to make the data center part of area 0, as illustrated in Figure 18, with area 0 extended to include the regional data center.

Note

Although both options are valid solutions, on the CCDE exam the correct choice will be based on the information and requirements provided. For instance, if one of the requirements is to achieve a more stable and modular design, a separate OSPF area for the regional data center will be the more feasible option in this case.

Figure 18: OSPF Optimal Routing

Similarly, if IS-IS is used in this scenario, as illustrated in Figure 19, router B will always use router A as a transit path to reach the regional data center prefix. Over this path (B-D-A-C-G), the regional data center prefix behind router G will be seen as IS-IS level 1, and based on IS-IS route selection rules, this path will be preferred compared to the one over the core, in which it will be announced as an IS-IS level 2 route. Figure 19 suggests a simple possible solution to optimize IS-IS flooding domain design (levels): including the regional data center as part of IS-IS level 2. This ensures that traffic from the spokes (router B in this example) destined to the regional data center will always traverse the core network rather than transiting any other spoke’s network.

Figure 19: IS-IS Levels and Optimal Routing

Security Control and Policy Compliance

This pertains more to what areas of a certain network have to be logically separated from other parts of the network. For example, an enterprise might have a research and development (R&D) lab where different types of unified communications applications are installed, including routers and switches. Furthermore, the enterprise security policy may dictate that this part of the network must be logically contained and only specific reachability information needs to be leaked between this R&D lab environment and the production network. Technically, this will lead to increased network stability and policy control.

Route Summarization (Logical Separation Factor)

The other major factor when deciding where to divide the logical topology of a routed network is where summarization or reachability information hiding can take place. The important point here is that the physical layout of the topology must be considered. In other words, you cannot decide where to place the reachability information hiding boundary (summarization) without referring to what the physical architecture looks like and where the points are that can enhance the overall routing design if summarization is enabled.

Route Summarization

By aligning a well-structured IP addressing scheme with the physical layout and hiding reachability information through route summarization, as shown in Figure 20, network designers can achieve an optimized level of network design simplicity, scalability, and stability.

Figure 20: Structured IP Addressing and Physical Connectivity

For example, based on the route summarization structure illustrated in Figure 20, a link flap in a remote site in region 2 will not force the remote site routers of region 1 to process the change or update their topology database (which in some situations might cause unnecessary path recalculation and processing, which in turn may lead to service interruption). Usually, route summarization reduces the size of the RIB by reducing the number of routes. This means less memory, lower CPU utilization, and faster convergence time during a network change or following any failure event. In other words, the boundary of the route summarization almost always overlaps with the boundary of the fault domain.
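The effect of a summary on what other regions see can be sketched with Python's standard `ipaddress` module. The region 2 prefixes below are hypothetical, and the `advertised_beyond_boundary` helper is an illustrative simplification of a manually configured summary, not a routing protocol implementation.

```python
import ipaddress

# Hypothetical region 2 remote-site prefixes (assumed contiguous addressing).
region2_sites = [ipaddress.ip_network(f"10.2.{i}.0/24") for i in range(8)]

# Contiguous, aligned subnets collapse into a single summary.
summary = list(ipaddress.collapse_addresses(region2_sites))
print(summary)

def advertised_beyond_boundary(active_sites, configured_summary):
    """The summarizing router advertises the summary while any component prefix is up."""
    if any(site.subnet_of(configured_summary) for site in active_sites):
        return [configured_summary]
    return []

before = advertised_beyond_boundary(region2_sites, summary[0])
after_flap = advertised_beyond_boundary(region2_sites[1:], summary[0])  # 10.2.0.0/24 is down
print(before == after_flap)
```

Eight routes become one, and the advertisement seen outside the region is identical before and after the flap, which is the fault isolation property described above.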

However, not every network has structured network IP addressing like the one shown in Figure 20. Therefore, network designers must consider alternatives to overcome this issue. In some situations, the solution is “not to summarize.” For instance, Figure 21 illustrates a network with unstructured IP addressing, and the business may not be able to afford to change its IP scheme in the near future.

Figure 21: Network with Unstructured IP Addressing

Moreover, in some scenarios, the unstructured physical connectivity can introduce challenges with route summarization. For example, in Figure 22, summarization can lead to forcing all the traffic from the hub site to always prefer the high-cost and low-bandwidth link to reach the 172.2.0.0/24 network (more specific route over the high-cost non-summarized link), which may lead to an undesirable outcome from the business point of view (for example, slow applications’ response time over this link).

Figure 22: Unstructured Physical Connectivity

As a general rule of thumb (not always), summarization should be considered at the logical routing domain boundaries. The reason why summarization might not always be considered at the logical domain boundary is that in some designs it can lead to suboptimal routing or traffic black-holing (also known as summary black holes).

Summary Black Holes

The principle of route summarization is based on hiding specific reachability information. This principle can optimize many network designs, as discussed earlier; however, it can lead to traffic black-holing in some scenarios because of the specific hidden routing information. In the scenario illustrated in Figure 23, routers A and B send the summary route only (172.1.0.0/21) with the same metric toward router C. Based on this design, in case of link failure between routers D and E, the routing table of router C will remain intact because it is receiving only the summary. Consequently, there is potential for traffic black-holing. For instance, traffic sourced from router C destined to network 172.1.1.0/24 landing at router B will be dropped because of this summarization black-holing. Moreover, the situation can become even worse if router C is performing per-packet load balancing across routers A and B. In this case, 50 percent of the traffic is expected to be dropped. Similarly, if router C is load balancing on a per-session basis, hypothetically some of the sessions will reach their destinations and others may fail. As a result, route summarization in this scenario can lead to serious connectivity issues in some failure situations.
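The black-holing behavior with per-packet load balancing can be reproduced in a small Python simulation. The summary and the destination subnet come from the scenario; the assumption that only router A keeps a working path to 172.1.1.0/24 after the failure is made here for illustration, as is the strict alternation between next hops.

```python
import ipaddress
from itertools import cycle

SUMMARY = ipaddress.ip_network("172.1.0.0/21")

# Router C only knows the summary, with two equal-cost next hops.
router_c_rib = {SUMMARY: ["A", "B"]}

# Assumed state after the D-E link failure: only router A still has a
# working path to 172.1.1.0/24; router B has lost it.
reachable_behind = {
    "A": [ipaddress.ip_network("172.1.1.0/24")],
    "B": [],
}

def deliver(dst, next_hop):
    """True if the chosen summarizing router can still reach the destination."""
    return any(dst in net for net in reachable_behind[next_hop])

def per_packet_loss(dst, packets):
    """Fraction of packets dropped when router C alternates next hops per packet."""
    hops = cycle(router_c_rib[SUMMARY])
    dropped = sum(0 if deliver(dst, next(hops)) else 1 for _ in range(packets))
    return dropped / packets

dst = ipaddress.ip_address("172.1.1.10")
assert dst in SUMMARY  # router C still believes the destination is reachable
loss = per_packet_loss(dst, 1000)

# Modeling a non-summarized A-B link: B can now reach the subnet through A.
reachable_behind["B"] = reachable_behind["A"]
loss_after_fix = per_packet_loss(dst, 1000)
print(loss, loss_after_fix)
```

Router C's routing table is the same in both runs; only the state behind the summarizing routers changes, which is why the failure is invisible to router C.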

Figure 23: Summary Black Hole

To mitigate this issue and enhance the design in Figure 23, summarization either should be avoided (this option might not always be desirable because it can reduce the stability and scalability in large networks) or at least one non-summarized link must be added between the summarizing routers (in this scenario, between routers A and B, as illustrated in Figure 24). The non-summarized link can be used as an alternate path to overcome the route summarization black-holing issue described previously.

Figure 24: Summary Black-Hole Optimization

Suboptimal Routing

Although hiding reachability information with route summarization can help to reduce control plane complexity, it can lead to suboptimal routing in some scenarios. This suboptimal routing, in turn, may lead traffic to use a lower-bandwidth link or an expensive link, over which the enterprise might not want to send every type of traffic. For example, take the same scenario discussed earlier for OSPF areas, apply summarization at the data center edge routers of London and Milan, and assume that the link between Sydney and Milan is an expensive link that typically has a lower routing metric, as depicted in Figure 25.

Note

The example in Figure 25 is “routing protocol” neutral; it can apply to all routing protocols in general.

As illustrated in Figure 25, the link between the Sydney branch and the Milan data center is 10 Mbps, and the link to London is 5 Mbps. In addition, the data center interconnect between the Milan and London data centers is only 2 Mbps. In this particular scenario, summarization toward the Sydney branch from both data centers will typically hide the more specific route. Therefore, the Sydney branch will send traffic destined to any of the data centers over the high-bandwidth link (with a lower routing metric); in this case, the Sydney–Milan path will be preferred (almost always, higher bandwidth = lower path metric). This behavior will cause suboptimal routing for traffic destined to the London data center network. This suboptimal routing, in turn, can lead to an undesirable experience, because rather than having 5 Mbps between the Sydney branch and the London data center, the maximum bandwidth will be limited to the data center interconnect link capacity, which is 2 Mbps in this scenario. This is in addition to the extra cost and delay incurred because the traffic has to traverse multiple international links.
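The bandwidth penalty in this scenario, and the effect of advertising a more specific route alongside the summary, can be sketched in Python. The link speeds are taken from the scenario; the prefixes, the metrics, and the helper names are hypothetical and chosen only to illustrate longest-prefix matching.

```python
import ipaddress

# Link bandwidths in Mbps, taken from the scenario text.
LINK_MBPS = {
    ("Sydney", "Milan"): 10,
    ("Sydney", "London"): 5,
    ("Milan", "London"): 2,
}

def bandwidth(a, b):
    """Look up a link in either direction."""
    return LINK_MBPS.get((a, b)) or LINK_MBPS[(b, a)]

def path_bottleneck(path):
    """The end-to-end throughput ceiling is the slowest link on the path."""
    return min(bandwidth(a, b) for a, b in zip(path, path[1:]))

via_milan = path_bottleneck(["Sydney", "Milan", "London"])
direct = path_bottleneck(["Sydney", "London"])
print(via_milan, direct)

def best_next_hop(dst, rib):
    """Longest-prefix match first, then lowest metric among equal-length prefixes."""
    matches = [(net.prefixlen, -metric, nh) for net, nh, metric in rib if dst in net]
    return max(matches)[2]

# Hypothetical addressing: both data centers advertise a /23 summary, and
# London can additionally advertise its own more specific /24.
summary = ipaddress.ip_network("10.0.0.0/23")
london_lan = ipaddress.ip_network("10.0.1.0/24")
dst = ipaddress.ip_address("10.0.1.20")

summary_only = [(summary, "Milan", 10), (summary, "London", 20)]  # assumed metrics
with_specific = summary_only + [(london_lan, "London", 20)]
print(best_next_hop(dst, summary_only), best_next_hop(dst, with_specific))
```

With only the summaries, the lower metric sends London-bound traffic to Milan and across the 2 Mbps interconnect; once the more specific prefix is present, longest-prefix matching overrides the metric and the direct 5 Mbps link is used.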

Figure 25: Summary Route and Suboptimal Routing

Even so, this design limitation can be resolved via different techniques based on the use of the routing protocol, as summarized in Table 6.

Table 6: Suboptimal Routing Optimization Techniques

OSPF	IS-IS	EIGRP	BGP
Using a normal OSPF area combined with LSA type 3 filtering at the ABRs to send /23 summary route from Milan side and the more specific route along with /23 summary route from London ABR where the optimal path needs to take place.	Using route leaking from L2 to L1 (RFC 5302); in this scenario, leaking the more specific route from London L1/L2 router.	Send two summary routes containing the more- and less-specific routes from the router that needs to be used for the more specific routes (London). Route leaking with the summary to send more specific routes from the desired router for optimal path (London router).	By sending the summary along with the more specific route (for example, using unsuppress-map with the BGP summary at the London router).

Figure 26 illustrates how link-state areas/levels apply to the discussed scenario and the suggested solutions, because different area/level designs can have a large influence on the overall traffic engineering and path selection.

Figure 26: Link-State Flooding Domain Applications and Optimal Routing

Note

With IS-IS, L1-L2 (ABR) may send the default route toward the L1 domain, and the route leaking at the London ABR will leak/send the more specific local prefix for optimal routing.

Based on these design considerations and scenarios, we can conclude that although route summarization can optimize the network design for several reasons (discussed earlier in this chapter), in some scenarios, summarization from the core networks toward the edge or remote sites can lead to suboptimal routing. In addition, summarization from the remote sites or edge routers toward the core network may lead to traffic black holes in some failure scenarios. Therefore, to provide a robust and resilient design, network designers must pay attention to the different failure scenarios when considering route summarization.

IGP Traffic Engineering and Path Selection

By understanding the variables that influence a routing protocol decision to select a certain path, network designers can gain more control to influence route preference over a given path based on a design goal. This process is also known as traffic engineering.

In general, routing protocols perform what is known as destination traffic engineering, where the path selection is always based on the targeted prefix and the attributes of the path to reach this prefix. However, each of the three IGPs discussed in this chapter has its own metrics, algorithm, and default preferences to select routes. From a routing point of view, they can be altered to control which path is preferred or selected over others, as summarized in the sections that follow.

OSPF

If multiple routes cover the same network with different types of routes, such as inter-area (LSA type 3) or external (LSA type 5), OSPF considers the following list in order to select the preferred path (from highest preference to the lowest):

Intra-area routes
Inter-area routes
External type 1 routes
External type 2 routes

Let’s take a scenario where there are multiple routes covering the same network with the same route type as well; for instance, both are inter-area routes (LSA type 3). In this case, the OSPF metric (cost) that is driven by the links’ bandwidth is used as a tiebreaker to select the preferred path. Typically, the route with the lowest cost is chosen as the preferred path. If multiple paths cover the same network with the same route type and cost, OSPF will typically select all the available paths to be installed in the routing table. Here, OSPF performs what is known as equal-cost multipath (ECMP) routing across multiple paths.
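The selection order described above (route type first, then cost, then ECMP on a full tie) can be expressed as a short Python sketch. This is a simplified model with a single cost value per route; it does not model external type 2 tie-breaking, and the candidate routes are hypothetical.

```python
# Lower rank means more preferred, following the order listed above.
ROUTE_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}

def select_ospf_paths(candidates):
    """candidates: list of (route_type, cost, next_hop) for one prefix.
    Returns every next hop that ties on the best route type and lowest cost."""
    best_key = min((ROUTE_TYPE_RANK[rtype], cost) for rtype, cost, _ in candidates)
    return sorted(
        nh for rtype, cost, nh in candidates
        if (ROUTE_TYPE_RANK[rtype], cost) == best_key
    )

# An intra-area path wins even when its cost is far higher.
print(select_ospf_paths([("intra-area", 1000, "A"), ("inter-area", 20, "E")]))

# Same type and same cost: both paths are installed (ECMP).
print(select_ospf_paths([("inter-area", 30, "E"), ("inter-area", 30, "F")]))
```

The first call mirrors the suboptimal routing risk of a poor area design: route type is compared before cost, so no amount of cost tuning makes an inter-area path beat an intra-area one.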

An OSPF router that injects external LSAs into the OSPF database is called an autonomous system boundary router (ASBR). For external routes with multiple ASBRs, OSPF relies on LSA type 4 to describe the path’s cost to each ASBR that advertises the external routes. For instance, in the case of multiple ASBRs advertising the same external OSPF E2 prefixes carrying the same redistributed metric value, the ASBR with the lowest reported forwarding metric (cost) will win as the preferred exit point.

IS-IS

Typically, with IS-IS, if multiple routes cover the same network (same exact subnet) with different route types, IS-IS follows the sequence here in order to select the preferred path:

Level 1
Level 2
Level 2 external with internal metric type
Level 1 external with external metric type
Level 2 external with external metric type

Like OSPF, if there are multiple paths to a network with the same exact subnet, route type, and cost, IS-IS selects all the available paths to be installed in the routing table (ECMP).

EIGRP

EIGRP has a set of variables that can individually or collectively influence which path a router selects. For stability and simplicity, bandwidth and delay are commonly used for this purpose. Nonetheless, it is simpler and safer to alter delay for EIGRP path selection, because tuning bandwidth for traffic engineering purposes has side effects (discussed earlier in this chapter) that require careful planning.

Like other IGPs, EIGRP supports ECMP. In addition, it supports unequal-cost load balancing with proportional load sharing.
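To make the effect of bandwidth and delay concrete, the following Python sketch computes the classic EIGRP composite metric under the default K values (only bandwidth and delay contribute) and then splits traffic in inverse proportion to the metrics. The function names are hypothetical, and the proportional split is an illustrative model only; real implementations apply additional conditions before using unequal-cost paths, which are not modeled here.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic composite metric with default K values (K1 = K3 = 1, others 0)."""
    scaled_bandwidth = 10**7 // min_bandwidth_kbps  # based on the slowest link
    scaled_delay = total_delay_usec // 10           # delay in tens of microseconds
    return 256 * (scaled_bandwidth + scaled_delay)


def proportional_shares(metrics):
    """Split traffic inversely to each path's metric (illustrative model)."""
    weights = [1 / m for m in metrics]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical paths with the same minimum bandwidth; only delay differs.
path_a = eigrp_classic_metric(min_bandwidth_kbps=100_000, total_delay_usec=200)
path_b = eigrp_classic_metric(min_bandwidth_kbps=100_000, total_delay_usec=400)
shares = proportional_shares([path_a, path_b])
```

Because delay is cumulative along the path while bandwidth reflects only the slowest link, raising the delay on one interface changes the metric predictably, which is one way to see why delay is the easier variable to tune.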

Summary of IGP Characteristics

As discussed in this chapter, each routing protocol behaves and handles routing differently on each topology. Table 7 summarizes the characteristics of the IGPs, taking into account the topology that is used.

Table 7: IGP Characteristics Summary

	Link State	EIGRP
Hub-and-spoke scalability	Moderate scaling capability. Care must be taken with summary black holes. Consider stub areas with filtering to prevent transiting traffic via remote sites and large RIB tables.	Excellent scaling capability. Consider routes to address summary black holes. Consider stub remote routers with filtering and summarization to prevent transiting traffic through remote sites.
Full-mesh scalability	Acceptable scaling capability, to a certain extent.	Acceptable scaling capability, to a certain extent.
Full-mesh considerations	Manually designate flooding points and increase scaling through a full mesh. Potential of a temporary routing loop following a network failure event.	Summarize into and out of the full mesh. Increased number of EIGRP queries following a network failure event.
Summarization	Only at border routers (ABRs).	At any place.
Filtering	Only at border routers (ABRs).	At any place.
Load balancing	Equal-cost load balancing.	Equal- and unequal-cost load balancing.

Note

In the table above, link-state ABR refers to either OSPF ABR, ASBR, or IS-IS L1-L2 router.

Note

None of the IGPs in the preceding table scales excellently in a full mesh. This is because a full-mesh topology is inherently not very scalable. (The larger the mesh becomes, the more complicated the control plane will be.)

Route Redistribution Design Considerations

Route redistribution refers to the process of exchanging or injecting routing information (typically routing prefixes) between two different routing domains or protocols. However, route redistribution between routing domains does not always mean redistribution between two different routing protocols. For example, redistribution between two OSPF routing domains, where the border router runs two different OSPF instances (processes), is redistribution between two routing domains that use the same routing protocol.

Route redistribution is one of the most advanced routing design mechanisms commonly relied on by network designers to achieve certain design requirements, such as the following:

In merger and acquisition scenarios, route redistribution can sometimes facilitate routing integration between different organizations.
In large-scale networks, such as global organizations, where BGP might be used across the WAN core and different IGP islands connect to the BGP core, full or selective route redistribution can facilitate route injection between these protocols and routing domains in some scenarios.
Route redistributions can also be used as an interim solution during the migration from one routing protocol to another.

Note

None of the preceding points can be considered as an absolute use case for route redistribution because the use of route redistribution has no fixed rule or standard design. Therefore, network designers need to rely on experience when evaluating whether route redistribution needs to be used to meet the desired goal or whether other routing design mechanisms can be used instead, such as static routes.

Route redistribution can sometimes be as simple as adding a one-line command. However, its impact sometimes leads to major network outages because of routing loops or the black-holing of traffic, which can be introduced to the network if the redistribution was not planned and designed properly. That is why network designers must have a good understanding of the characteristics of the participating routing protocols and the exact aim of route redistribution. In general, route redistribution can be classified into two primary models, based on the number of redistribution boundary points:

Single redistribution boundary point
Multiple redistribution boundary points

Single Redistribution Boundary Point

This design model is the simplest and most basic route redistribution design model; it has minimal complexities, if any. Typically, the edge router between the routing domains can perform either one- or two-way route redistribution based on the desired goal without any concern, as depicted in Figure 27. This is based on the assumption that there is no other redistribution point between the same routing domains anywhere else across the entire network.

Figure 27: Single Redistribution Boundary Point

However, if the redistributing border router belongs to three routing domains, the route that is sourced from another routing protocol cannot be redistributed into a third routing protocol on the same router. For instance, in Figure 28, the route redistributed from EIGRP into OSPF cannot be redistributed again from OSPF into RIP.
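One way to reason about this nontransitive behavior is the following Python sketch. It assumes a simplified model in which redistribution draws only on routes that the local routing table attributes to the source protocol; the `redistribute` function and the prefixes are hypothetical.

```python
def redistribute(rib, source_protocol):
    """Return the prefixes the local RIB attributes to source_protocol."""
    return sorted(
        prefix for prefix, owner in rib.items() if owner == source_protocol
    )


# Local RIB of a border router attached to EIGRP, OSPF, and RIP domains.
# Each prefix is owned by the protocol that installed it on this router.
border_rib = {
    "10.1.0.0/16": "eigrp",  # learned from the EIGRP domain
    "10.2.0.0/16": "ospf",   # learned from the OSPF domain
}

into_ospf = redistribute(border_rib, "eigrp")  # EIGRP routes injected into OSPF
into_rip = redistribute(border_rib, "ospf")    # only OSPF-owned routes reach RIP
```

Although the router advertises 10.1.0.0/16 into OSPF, its own table still lists that prefix as an EIGRP route, so the OSPF-to-RIP redistribution on the same router does not pick it up.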

Figure 28: Nontransitive Attribute of Route Redistribution

Multiple Redistribution Boundary Points

Networks with two or more redistribution boundary points between routing domains require careful planning and design prior to applying the redistribution into the production network, because it can lead to a complete or partial network outage. The primary issues that can be introduced by this design are as follows:

Routing loop
Suboptimal routing
Slower network convergence time

To optimize a network design that has two or more redistribution boundary points, network designers must consider the following aspects, how each may impact the network, and the possible methods to address it based on the network architecture and the design requirements (for example, optimal versus suboptimal routing):

Metric transformation
Administrative distance

Metric Transformation

Typically, each routing protocol has its own characteristic and algorithm to calculate network paths to determine the best path to use based on certain variables known as metrics. Because of the different metrics (measures) used by each protocol, the exchange of routing information between different routing protocols will lead to metric conversion so that the receiving routing protocol can understand this route, as well as be able to propagate this route throughout its routed domain. Therefore, specifying the metric at the redistribution point is important, so that the injected route can be understood and considered.

For instance, a common simple example is a redistribution from RIP into OSPF. RIP relies on hop counts to determine the best path, whereas OSPF considers link cost that is driven by the link bandwidth. Therefore, redistributing RIP into OSPF with a metric of 5 (five RIP hops) has no meaning to OSPF. Hence, OSPF assigns a default metric value to the redistributed external route. Furthermore, metric transformation can lead to routing loops if not planned and designed correctly when there are multiple redistribution points. For example, Figure 29 illustrates a scenario of mutual redistribution between RIP and OSPF over two border routers. Router A receives the RIP route from the RIP domain with a metric of 5, which means five hops. Router B will redistribute this route into the OSPF domain with the default redistribution metrics or any manually assigned metric. The issue in this scenario is that when the same route is redistributed back into the RIP domain with a lower metric (for example, 2), router A will see the same route with a better metric from the second border router. As a result, a routing loop will be formed based on this design (because of metric transformation).

Figure 29: Multipoint Routing Redistribution

Hypothetically, this metric issue can be fixed by redistributing the same route back into the RIP domain with a higher metric value (for example, 7). However, this will not guarantee the prevention of routing loops because there is another influencing factor in this scenario, which is the administrative distance (discussed next in more detail). Therefore, by using route filtering or a combination of route filtering and tagging to prevent the route from being reinjected into the same domain, network designers can avoid route looping issues in this type of scenario.
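The metric feedback described in this scenario can be illustrated with a short Python sketch. It models only the RIP hop-count comparison on router A; the `rip_best` function and the source labels are hypothetical, and AD is deliberately left out.

```python
def rip_best(candidates):
    """RIP prefers the lowest hop count; candidates are (learned_from, hops)."""
    return min(candidates, key=lambda candidate: candidate[1])


native = ("rip-domain", 5)

# Route reinjected from OSPF with a lower seed metric: router A prefers it.
looping_choice = rip_best([native, ("reinjected-by-other-border", 2)])

# Route reinjected with a higher seed metric: router A keeps the native path.
safe_choice = rip_best([native, ("reinjected-by-other-border", 7)])
```

The second case shows why a higher seed metric helps, but as noted above it does not remove the need for filtering, because AD can still override the metric comparison.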

Administrative Distance

Some routing protocols assign a different administrative distance (AD) value to redistributed routes by default (typically higher than the AD of internally learned routes), so that the internal route is preferred over the external (redistributed) route. However, this value can be changed, which enables network designers and engineers to alter the default behavior with regard to route and path selection. From the route redistribution design point of view, AD can be a concern that requires special design considerations, especially when there are multiple points of redistribution with mutual route redistribution.

To resolve this issue, either route filtering or route tagging jointly with route filtering can be used to avoid reinjecting the redistributed (external) route back into the same originating routing domain. You can tune AD values to control the preferred route. However, this solution does not always provide the optimal path when there are multiple redistribution border routers performing mutual redistribution. If for any reason AD tuning is used, the network designer must be careful when considering this option, to ensure that routing protocols prefer internally learned prefixes over external ones (to avoid unexpected loops or suboptimal routing behavior).
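The role of AD in this decision can be sketched as follows. The values in `DEFAULT_AD` are the commonly documented Cisco IOS defaults and should be treated as assumptions on other platforms; the `install_in_rib` function is a hypothetical name for the comparison a router performs when several sources offer the same prefix.

```python
# Commonly documented Cisco IOS default AD values (an assumption elsewhere).
DEFAULT_AD = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp-internal": 90,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "eigrp-external": 170,
    "ibgp": 200,
}


def install_in_rib(sources, ad_table=DEFAULT_AD):
    """Among sources offering the same prefix, the lowest AD wins."""
    return min(sources, key=lambda source: ad_table[source])


# A boundary router hears the same prefix as EIGRP external and via IS-IS.
winner = install_in_rib(["eigrp-external", "isis"])

# Hypothetical tuning: raise the AD applied to IS-IS above EIGRP external.
tuned = dict(DEFAULT_AD, isis=175)
tuned_winner = install_in_rib(["eigrp-external", "isis"], tuned)
```

With default values the IS-IS copy wins over the EIGRP external route, which is the condition that allows a redistributed copy to displace the original. The tuned table reverses that choice, at the cost of changing the preference for every route the adjustment applies to.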

Route Filtering Versus Route Tagging with Filtering

Route filtering and route tagging combined with route filtering are common and powerful routing policy mechanisms that you can use in many routing scenarios to control route propagation and advertisement and to prevent routing loops where multiple redistribution boundary points exist with mutual route redistribution between routing domains. However, these mechanisms have some differences that network designers must be aware of, as summarized in Table 8.

Table 8: Route Filtering Techniques Comparison

Design Consideration	Route Filtering	Route Tagging with Filtering
Scalability	Low	High
Manageability	Complex	Simple
Multipoint redistribution loop prevention	Yes (complex)	Yes (simple)
Multipoint redistribution optimal routing	Yes (complex)	Yes (simple)
Flexibility	Limited	Flexible

Based on the simple comparison in the table above, it is obvious that route filtering is more suitable for small and simple filtering and loop-prevention tasks. In contrast, route filtering associated with route tagging can support large-scale and dynamic networks to achieve more scalable and flexible routing policies across routing domains.

For example, in the scenario illustrated in Figure 30, there are two boundary redistribution points with mutual redistribution between EIGRP and IS-IS in both directions deployed at R1 and R2. In addition, R10 is injecting an external EIGRP route for an organization to communicate with its business partner; this route will typically have by default an AD value of 170.

After this external route was injected into the EIGRP domain, internal users connected to the IS-IS domain started complaining that they could not reach any of the intended destinations located at their business partner network.

Figure 30: Multipoint Route Redistribution: Routing Loop

This design has the following technical concerns:

Two redistribution boundary points
Mutual redistribution at each boundary point from a high AD domain (external EIGRP in this case) to a lower AD domain
Possibility of metric transformation (applicable to the external EIGRP route when redistributed back from IS-IS with better metrics)

As a result, a routing loop will form for the external EIGRP route (between R1 and R2). With route filtering combined with tagging, as illustrated in Figure 31, both R1 and R2 can stop the redistributed external EIGRP route from being reinjected from IS-IS back into EIGRP.

Figure 31: Route Filtering with Route Tagging

This is achieved by assigning a tag value to the EIGRP route when it is redistributed into IS-IS (at both R1 and R2). At the other redistribution boundary point (again R1 and R2), routes can be stopped from being redistributed into EIGRP again based on the assigned tag value. After you apply this filtering, the loop will be avoided, and path selection can be something like that depicted in Figure 32. With route tagging as in this example, network operators do not need to worry about managing and updating complicated access control lists (ACLs) to filter prefixes, because they can match the route tag at any node in the network and take action against it. Therefore, this offers simplified manageability and more flexible control.
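The tag-and-filter behavior described above can be modeled with a minimal Python sketch. The tag value, the `Route` class, and both function names are hypothetical; the point is only that the filter matches a tag rather than a list of prefixes.

```python
from dataclasses import dataclass
from typing import Optional

EIGRP_ORIGIN_TAG = 100  # hypothetical tag value agreed on by R1 and R2


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: Optional[int] = None


def eigrp_to_isis(routes):
    """Redistribute EIGRP routes into IS-IS, stamping each with the origin tag."""
    return [Route(r.prefix, EIGRP_ORIGIN_TAG) for r in routes]


def isis_to_eigrp(routes):
    """Redistribute IS-IS routes into EIGRP, dropping any that carry the tag."""
    return [r for r in routes if r.tag != EIGRP_ORIGIN_TAG]


# One boundary router injects the partner prefix into IS-IS with the tag.
partner = Route("203.0.113.0/24")
isis_routes = eigrp_to_isis([partner]) + [Route("10.20.0.0/16")]

# The other boundary router refuses to send tagged routes back into EIGRP.
back_into_eigrp = isis_to_eigrp(isis_routes)
```

Only the native IS-IS prefix is returned to EIGRP. Adding a new partner prefix requires no change to either function, which reflects the manageability advantage over prefix-based ACLs.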

Figure 32: Multipoint Route Redistribution: Routing After Filtering

The optimal path, however, is not guaranteed in this case unless an additional local filter prevents the EIGRP route from being installed in the local IS-IS routing table of the boundary routers. Apply this filter only if the optimal path is a priority requirement, because it can reduce path redundancy. For instance, if R1 in Figure 32 filters the EIGRP external routes redistributed by R2 from being installed into its local IS-IS routing table (based on the route tag assigned by R2), the optimal path can be achieved. However, if a LAN or hosts are connected directly to R1 and R1 loses its connection to the EIGRP domain, any device or network that uses R1 as its gateway will not be able to reach the EIGRP external routes (unless there is a default route or a floating static route with a higher AD that points to R2 within the IS-IS domain). In other words, to achieve the optimal path, a second filtering layer is required at the ASBRs (R1 and R2 in this example) so that each ASBR, based on the route tag, does not install the external EIGRP routes redistributed by the other IS-IS ASBR into its local IS-IS routing table. Also, each ASBR should use a default route (ideally a static route pointing to the other ASBR) to maintain redundancy to external prefixes in case of an ASBR link failure toward the EIGRP domain, as illustrated in Figure 33.

Figure 33: Multipoint Route Redistribution with Optimal Path: Failure Scenario

From a design point of view, achieving optimal network design does not mean an optimal path must always be considered. For example, as a network designer, you must look at the bigger picture using the “holistic approach” to evaluate and decide what are the possible options to achieve the design requirements optimally, and what are the possible implications of each design option.

For instance, in the scenario discussed previously, if the IS-IS domain is receiving a default route from an internal node such as an Internet edge router, injecting a default route from the ASBRs (R1 and R2) most likely will break Internet reachability for the IS-IS domain or any network directly connected to R1 and R2. Therefore, if both paths (over R1 and R2, with or without asymmetrical routing) technically satisfy the requirements for the communication between this organization and its partner network, then from a network design perspective, “optimal path” is not a requirement to achieve “optimal design.” In this particular scenario, pursuing the optimal path adds design and operational complexity and may also break Internet reachability.

Note

On some platforms, route tagging works properly only when the IS-IS “wide metric” feature is enabled. In this case, migrating the IS-IS routed domain from narrow metrics to wide metrics must be considered.

Note

If asymmetrical routing has a negative impact on the communication between the EIGRP and IS-IS domains in the previous scenario, it can be avoided by tuning EIGRP metrics such as delay when the IS-IS route is redistributed into EIGRP. This controls path selection from the EIGRP domain’s point of view and aligns it with the path selected from IS-IS (so that ingress and egress traffic flows follow the same path).

Review Questions

1. What is used by OSPF to share routing and topology information?

a. Link-state PDU
b. Distance-vector update
c. Link-state advertisement
d. Link-state hellos

Answer

c. Link-state advertisements (LSAs) are used by OSPF routers to exchange routing and topology information. When neighbors decide to exchange routes, they send a list of all LSAs in their respective topology database. Each router then checks its topology database and sends a Link State Request message requesting all LSAs that were not found in its topology table. Other routers respond with the Link State Update that contains all LSAs requested by the neighbor.

2. What type of link-state advertisement is generated by area border routers, which advertise networks from one area to another?

a. Summary LSA (type 3)
b. Network LSA (type 2)
c. ASBR summary LSA (type 4)
d. Not-so-stubby area LSA (type 7)

Answer

a. Type 3 LSAs are generated by area border routers (ABRs) to advertise networks from one area to the rest of the areas in an autonomous system.

3. Which OSPF network type does not elect a DR and BDR but still supports multi-access networks?

a. Broadcast
b. Point-to-multipoint
c. Point-to-point
d. Nonbroadcast

Answer

b. Point-to-multipoint indicates a topology where one interface can connect to multiple destinations. Each connection between a source and destination is treated as a point-to-point link. An example would be a Point-to-Multipoint Cisco Dynamic Multipoint VPN (DMVPN) topology. OSPF will not elect DRs and BDRs and all OSPF traffic is multicast to 224.0.0.5.

4. Which EIGRP packet types are sent using multicast? (Choose all that apply.)

a. Hello
b. Reply
c. Acknowledgement
d. Update
e. Query

Answer

a, d, and e. Neighborship is discovered and maintained using hello packets. These packets are sent using multicast. Update messages are used to send routing information to neighbors. These packets are sent to either one neighbor via unicast or to multiple neighbors via multicast. They are sent using Reliable Transport Protocol. EIGRP uses query packets when a router loses a path to a network. The router sends a query packet to its neighbors, asking if they have information on that network. These packets are sent via multicast and using Reliable Transport Protocol.

5. True or false: An EIGRP stub router will respond to queries and is a transit router.

a. True
b. False

Answer

b. An EIGRP stub router will inform neighbors via the hello packet that it’s a stub; by doing so, neighbors will not send queries to the router. EIGRP stubs are typically used at spoke locations, as stubs cannot be used as transit routers.

6. Which EIGRP packets use the Reliable Transport Protocol? (Choose all that apply.)

a. Hello
b. Reply
c. Acknowledgement
d. Update
e. Query

Answer

b, d, and e. Update messages are used to send routing information to neighbors and are sent using Reliable Transport Protocol. EIGRP uses query packets when a router loses a path to a network; these are sent via multicast using Reliable Transport Protocol. Reply packets are used by routers that received the query packet to respond to the query; these are sent unicast to the router that sent the query using Reliable Transport Protocol.

7. What does IS-IS use to make sure there is efficient flooding of LSPs in multiaccess networks?

a. Designated Intermediate System (DIS)
b. Designated router (DR)
c. LDP synchronization
d. Overload bit

Answer

a. IS-IS uses a role similar to the designated router in OSPF, referred to as the Designated Intermediate System (DIS). A DIS is elected to represent the multiaccess segment as a pseudonode. Without a DIS in a multiaccess environment, every router would flood all LSPs to every other router on the segment.

8. In IS-IS, a level 2 (L2) area router is considered the same as what router in OSPF?

a. Area border router
b. Backbone router
c. Autonomous system border router
d. Internal router

Answer

b. An IS-IS level 2 router has the link-state information for the intra-area as well as inter-area routing. The L2 router sends only L2 hellos. IS-IS level 2 area is similar and often compared to OSPF backbone area 0.

9. What is the benefit of setting the overload bit in IS-IS?

a. Increase scaling
b. Force DIS election
c. Black hole avoidance
d. Enable pseudo-node LSP

Answer

c. The overload bit in IS-IS is used to improve convergence behavior and prevent black-holing of traffic. When the overload bit is set on a device, other routers steer transit traffic around it, making it a non-transit router. By leveraging the overload bit, traffic is not sent through routers where other processes (such as BGP) have not yet converged and would therefore drop the traffic.

BGP Routing

Border Gateway Protocol (BGP) is an Internet Engineering Task Force (IETF) protocol and the most scalable of all routing protocols. As such, BGP is considered the routing protocol of the global Internet, as well as for service provider–grade networks. In addition, BGP is the desirable routing protocol of today’s large-scale enterprise networks because of its flexible and powerful attributes and capabilities. Unlike IGPs, BGP is used mainly to exchange network layer reachability information (NLRI) between routing domains. (The routing domain in BGP terms is referred to as an autonomous system [AS]; typically, it is a logical entity with its own routing and policies and is usually under the same administrative control.) Therefore, BGP is almost always the preferred inter-AS routing protocol. A typical example is the global Internet, which is formed by numerous interconnected BGP autonomous systems.

There are two primary forms of BGP peering:

Interior BGP (iBGP): The peering between BGP neighbors that is contained within one AS
Exterior BGP (eBGP): The peering between BGP neighbors that occurs between the boundaries of different autonomous systems (interdomain)

Interdomain Routing

Typically, eBGP is used to determine paths and route traffic between different autonomous systems; this function is known as interdomain routing. Unlike an IGP (where routing is usually performed based on protocol metrics to determine the desired path within an AS), eBGP relies more on policies to route or interconnect two or more autonomous systems. The powerful policies of eBGP allow it to ignore several attributes of routing information that an IGP typically takes into consideration. Therefore, eBGP can offer simpler and more flexible solutions to interconnect various autonomous systems based on predefined routing policies.

Table 9 summarizes common AS terminology with regard to the interdomain routing concept.

Table 9: Interdomain Routing Terminology

Term	Description
Stub AS	An AS that has one connection to one upstream AS
Stub multihomed AS	An AS that has connections to more than one AS, and typically should not offer a transit path
Transit AS	An AS that connects two or more autonomous systems to provide a transit path for traffic sourced from one AS and destined to another AS

Furthermore, normally, each AS has its own characteristic in terms of administrative boundaries, geographic restrictions, QoS scheme, cost, and legal constraints. Therefore, for the routing policy control to deliver its value to the business with regard to these variables, there must be a high degree of flexibility in how and where the policy control can be imposed. Typically, there are three standard levels where interdomain routing control can be considered (inbound, transit, and outbound):

Inbound interdomain routing policy to influence which path egress traffic should use to reach other domains
Outbound interdomain routing policy to influence which path ingress traffic sourced from other domains should use to reach the intended destination prefixes within the local domain
Transit interdomain routing policy to influence how traffic is routed across the transit domain as well as which prefixes and policy attributes from one domain are announced or passed to other neighboring domains, along with how these prefixes and policy attributes are announced (for example, summarized or non-summarized prefixes)

As a path-vector routing protocol, BGP has the most flexible and reliable attributes to match the various requirements of interdomain routing and control. Accordingly, BGP is considered the de facto routing protocol for the global Internet and large-scale networks, which require complex and interdomain routing control capabilities and policies.

BGP Attributes and Path Selection

BGP attributes, also known as path attributes, are sets of information attached to BGP updates. BGP primarily relies on these attributes to influence the process of best-path selection. These attributes are critical and effective when designing BGP routing architectures. A good understanding of these attributes and their behavior is a prerequisite to producing a successful BGP design. There are four primary types of BGP attributes, as summarized in Table 10.

Table 10: BGP Attributes

BGP Path Attribute	Characteristic
Well-known mandatory	Must appear in every update and must be supported by all speakers (for example, ORIGIN).
Well-known discretionary	May or may not be included in the update message but must be supported by all BGP speakers (for example, LOCAL_PREF).
Optional transitive	May be supported by BGP speakers, and they should be maintained and passed to other BGP AS peers whether or not they are supported (for example, COMMUNITY).
Optional nontransitive	May or may not be supported by BGP speakers. If an update is received that includes an optional nontransitive attribute, it is not required that the router pass it on (for example, MULTI_EXIT_DISC).
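On the wire, these categories map to flag bits carried with each path attribute. The following Python sketch classifies an attribute from its flags octet and type code, based on the flag layout and type codes defined in the BGP-4 specification. The `classify_attribute` function is a hypothetical helper, and the mandatory set shown covers only the three attributes required in every update that carries reachability information.

```python
OPTIONAL_BIT = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE_BIT = 0x40  # second high-order bit

# Type codes of the well-known mandatory attributes.
WELL_KNOWN_MANDATORY = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP"}


def classify_attribute(flags, type_code):
    """Map an attribute's flags octet and type code to one of four categories."""
    optional = bool(flags & OPTIONAL_BIT)
    transitive = bool(flags & TRANSITIVE_BIT)
    if not optional:
        # The flag bits alone cannot separate mandatory from discretionary;
        # that distinction depends on the attribute type.
        if type_code in WELL_KNOWN_MANDATORY:
            return "well-known mandatory"
        return "well-known discretionary"
    return "optional transitive" if transitive else "optional nontransitive"


origin = classify_attribute(0x40, 1)       # ORIGIN
local_pref = classify_attribute(0x40, 5)   # LOCAL_PREF
community = classify_attribute(0xC0, 8)    # COMMUNITIES
med = classify_attribute(0x80, 4)          # MULTI_EXIT_DISC
```

The four calls reproduce the example attribute named in each row of Table 10.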

The following list highlights the typical BGP route selection (from the highest to the lowest preference):

Prefer highest weight (Cisco proprietary, local to router)
Prefer highest local preference (global within AS)
Prefer route originated by the local router
Prefer shortest AS path
Prefer lowest origin code (IGP < EGP < incomplete)
Prefer lowest MED (from other AS)
Prefer eBGP path over iBGP path
Prefer the path through the closest IGP neighbor
Prefer oldest route for eBGP paths
Prefer the path with the lowest neighbor BGP router ID
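The ordered comparison above can be expressed as a sort key, as in the following Python sketch. It is a simplified model with hypothetical field and function names: it compares MED and route age unconditionally, whereas real implementations restrict those steps (for example, the age step applies to eBGP paths, as the list notes), and it omits steps that are not in the list.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class BgpPath:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = False
    igp_metric: int = 0
    age_seconds: int = 0
    router_id: str = "0.0.0.0"


def preference_key(path):
    """Build a tuple where a smaller value means a more preferred path."""
    rid = tuple(int(octet) for octet in path.router_id.split("."))
    return (
        -path.weight,                 # highest weight
        -path.local_pref,             # highest local preference
        not path.locally_originated,  # locally originated first
        path.as_path_len,             # shortest AS path
        ORIGIN_RANK[path.origin],     # lowest origin code
        path.med,                     # lowest MED
        not path.is_ebgp,             # eBGP over iBGP
        path.igp_metric,              # closest IGP neighbor
        -path.age_seconds,            # oldest route
        rid,                          # lowest router ID, compared numerically
    )


def best_path(paths):
    return min(paths, key=preference_key)


# Local preference is evaluated before AS path length.
via_a = BgpPath("via-isp-a", local_pref=200, as_path_len=4)
via_b = BgpPath("via-isp-b", local_pref=100, as_path_len=1)
chosen = best_path([via_a, via_b])
```

In this example the path with the longer AS path still wins because it carries the higher local preference, which is evaluated earlier in the sequence.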

Note

For more information about BGP path selection, refer to the document “BGP Best Path Selection Algorithm” on Cisco’s website.

BGP as the Enterprise Core Routing Protocol

Most enterprises prefer IGPs such as OSPF as the core routing protocol to provide end-to-end enterprise IP reachability. However, in some scenarios, network designers may prefer a protocol that can provide more flexible and robust routing policies and can cover single- and multi-routing domains with the ability to facilitate a diversified administrative control approach.

For example, an enterprise may have a large core network that connects different regions or large department networks, each with its own administrative control. To achieve that, we need a protocol that can provide interconnects between all the places in the network (PINs) and at the same time enable each group or region to maintain the ability to control its network without introducing any added complexity when connecting the PINs. Obviously, a typical IGP implementation in the core cannot achieve that, and even if it is possible, it will be very complex to scale and manage.

In other words, when the IGP of a large-scale global enterprise’s network reaches the borderline of its scalability limits within the routed network, which usually contains a high number of routing prefixes, a high level of flexibility is required to support “splitting routed networks” into multiple failure domains with distributed network administration. In this scenario, BGP is the ideal candidate protocol as the enterprise core routing protocol.

BGP in the enterprise core can offer the following benefits to the overall routing architecture:

A high degree of responsiveness to new business requirements, such as business expansion, business decline, innovation (IPv6 over IPv4 core), and security policies like end-to-end path separation (for example, MP-BGP + MPLS in the core)
Design simplicity (separating complex functional areas, each into its own routed region within the enterprise)
Flexible domain control by supporting administrative control per routing domains (per region)
More flexible and manageable routing policies that support intra- and interdomain routing requirements
Improved scalability because it can significantly reduce the number of prefixes that regional routing domains need to hold and process
Optimized network stability by stressing fault isolation domain boundaries (for example, at IGP island edges), where any control plane instability in one IGP/BGP domain will not impact other routing domains (topology and reachability information hiding principle)

However, network designers need to consider some limitations or concerns that BGP might introduce to the enterprise routing architecture when used as the core routing protocol:

Convergence time: In general, BGP convergence time during a change or following a failure event is slower than IGP. However, this can be mitigated to a good extent when advanced BGP fast convergence techniques are well-tuned, such as BGP Prefix Independent Convergence (BGP-PIC).
Staff knowledge and operational complexity: BGP in the enterprise core can simplify the routing design. However, additional knowledge and experience for the operation staff are required because the network will be more complex to troubleshoot, especially if multiple control policies in different directions are applied for control and traffic engineering purposes.
Hardware and software constraints: Some legacy or low-end network devices either do not support BGP or may require a software upgrade to support it. In both cases, there is a cost and the possibility of a maintenance outage for the upgrade. This might not always be an acceptable or supported practice by the business.

Enterprise Core Routing Design Models with BGP

This section highlights and compares the primary and most common design models that network designers and architects can consider for large-scale enterprise networks with BGP as the core routing protocol (as illustrated in Figure 34 through Figure 37). These design models are based on the design principle of dividing the enterprise network into a two-tiered hierarchy. This hierarchy includes a transit core network to which a number of access or regional networks are attached. Typically, the transit core network runs BGP and glues the different geographic areas (network islands) of the enterprise regional networks. In addition, no direct link should interconnect the regional networks. Ideally, traffic from one regional network to another must traverse the BGP core.

However, each network has unique and different requirements. Therefore, all the design models discussed in this section support the existence or addition of backdoor links between the different regions; remember to always consider the added complexity to the design with this approach:

Design model 1: This design model (see Figure 34) has the following characteristics:
- iBGP is used across the core only.
- Regional networks use IGP only.
- Border routers between each regional network and the core run IGP and iBGP.
- IGP in the core is mainly used to provide next-hop (NHP) reachability for iBGP speakers.

Figure 34: BGP Core Design Model 1

Design model 2: This design model (see Figure 35) has the following characteristics:
- BGP is used across the core and regional networks.
- Each regional network has its own BGP AS number (ASN) (no direct BGP session between the regional networks).
- Reachability information is exchanged between each regional network and the core over eBGP (no direct BGP session between regional networks).
- IGP in the core as well as at the regional networks is mainly used to provide NHP reachability for iBGP speakers in each domain.

Figure 35: BGP Core Design Model 2

Design model 3: This design model (see Figure 36) has the following characteristics:
- MP-BGP is used across the core (MPLS L3VPN design model).
- MPLS is enabled across the core.
- Regional networks can run static routing, an IGP, or BGP.
- IGP in the core is mainly used to provide NHP reachability for MP-BGP speakers.

Figure 36: BGP Core Design Model 3

Design model 4: This design model (see Figure 37) has the following characteristics:
- BGP is used across the regional networks.
- Each regional network has its own BGP ASN.
- Reachability information is exchanged between the regional networks directly over direct eBGP sessions.
- IGP can be used at the regional networks to provide local reachability within each region and may be required to provide NHP reachability for BGP speakers in each domain (BGP AS).

Figure 37: BGP Core Design Model 4

These designs are all valid and proven design models; however, each has its own strengths and weaknesses in certain areas, as summarized in Table 11. During the planning phase of network design or design optimization, network designers or architects must select the most suitable design model as driven by other design requirements, such as business and application requirements (which ideally must align with the current business needs and provide support for business directions such as business expansion).

Table 11: Comparing BGP Core Design Models

Design Model	Core	Branches/Region	Design Model Attributes
Design Model 1	iBGP	IGP only	Offers the least administrative domain control as compared to other models. Can be suitable in large-scale environments to overcome IGP complexities in the core, and offers more control between regions compared to IGP-based only. Moderate operational complexity.
Design Model 2	iBGP	iBGP + IGP	Offers moderate administrative domain control between routing regions. Can be suitable for environments under multiple admin domains and large-scale (global) enterprise WANs. Low operational complexity.
Design Model 3	MP-iBGP + MPLS	iBGP/eBGP/IGP or eiBGP + IGP	Offers the highest administrative domain control between routing regions, combined with the ability to control multiple routing islands in different places with end-to-end path isolation. Can be suitable for environments under multiple admin domains and large-scale (global) enterprise WANs. Offers the highest flexibility and simplicity to introduce new capabilities across the entire enterprise or for specific regions only (for example, IPv6, multicast, and end-to-end traffic separation). High operational complexity.
Design Model 4	eBGP	iBGP + IGP	Offers high administrative domain control between routing regions. Can be suitable for merging networks scenarios and environments under multiple admin domains (global organizations). Moderate operational complexity.

Note

IGP or control plane complexity referred to in the table above is in comparison to the end-to-end IGP-based design model, specifically across the core.

BGP Shortest Path over the Enterprise Core

As a path-vector control plane protocol, BGP normally prefers the path that traverses the fewest autonomous systems when other attributes, such as local preference, are equal (the classical interdomain routing scenario). Typically, in interdomain routing scenarios, the different routed domains have their own policies, which do not always need to be exposed to other routing domains. However, when BGP is used in the enterprise core and a router selects a path based on the BGP AS-PATH attribute, the “edge eBGP” nodes cannot determine which path within the selected transit core AS is the shortest (hypothetically, the optimal path). For instance, the scenario in Figure 38 depicts design model 2 of the BGP enterprise core. The question is, “How can router A decide which path is the shortest (optimal) within the enterprise core (AS 65000)?”

Figure 38: BGP AIGP

Accumulated IGP Cost for BGP (AIGP) is an optional nontransitive BGP path attribute, designed to enhance shortest path selection in scenarios like the one in the example, where a large-scale network is part of a single enterprise with multiple administrative domains using multiple contiguous BGP networks (BGP core routing design model 2, discussed earlier in this section). In such networks, it is almost always more desirable that BGP select the shortest path, meaning the one with the lowest metric, across the transit BGP core. In fact, AIGP replicates the behavior of link-state routing protocols in computing the distance associated with a path that has routes within a single flooding domain. Although the BGP MED attribute can carry IGP metric values, MED comes after several BGP attributes in the path selection process.

In contrast, AIGP is considered before the AS-PATH attribute when enabled in the BGP path selection process, which makes it more influential in this type of scenario:

Prefer the highest weight (Cisco proprietary, local to router)
Prefer highest local preference (global within single AS)
Prefer route originated by the local router
Prefer lowest AIGP cost
Prefer shortest AS path
Prefer lowest origin code (IGP < EGP < incomplete)
Prefer lowest MED (from other AS)
Prefer eBGP path over iBGP path
Prefer the path through the closest IGP neighbor
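
The ordering above can be read as a multi-field sort key. The following Python is a minimal sketch for illustration only, not a vendor implementation: the `Path` fields and the `best_path` helper are hypothetical names, only the tie-breakers listed above are modeled, and MED is compared unconditionally for simplicity (implementations normally compare MED only between paths received from the same neighboring AS).

```python
from dataclasses import dataclass

# Origin codes ranked as in the list above: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    """Hypothetical container for the attributes used in the comparison."""
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    aigp: int = 0                    # accumulated IGP cost; lower is better
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = False
    igp_metric_to_next_hop: int = 0


def preference_key(p: Path) -> tuple:
    """Smaller tuple wins; 'higher is better' fields are negated."""
    return (
        -p.weight,
        -p.local_pref,
        0 if p.locally_originated else 1,
        p.aigp,
        p.as_path_len,
        ORIGIN_RANK[p.origin],
        p.med,
        0 if p.is_ebgp else 1,
        p.igp_metric_to_next_hop,
    )


def best_path(paths: list) -> Path:
    return min(paths, key=preference_key)


# Two paths across the transit core: the one with the longer AS path has
# the lower accumulated IGP cost, so it wins because AIGP is compared first.
via_north = Path("via-north", aigp=30, as_path_len=3)
via_south = Path("via-south", aigp=80, as_path_len=2)
print(best_path([via_north, via_south]).name)  # via-north
```

If both paths carried the same AIGP value, the comparison would fall through to the AS path length and `via-south` would be selected instead.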

It is obvious that AIGP can be a powerful feature to optimize the BGP path selection process across a transit AS. However, network designers must be careful when enabling this feature because when AIGP is enabled, any alteration to the IGP routing can lead to a direct impact on BGP routing (optimal path versus routing stability).

BGP Scalability Design Options and Considerations

This section discusses the primary design options to scale BGP in general at an enterprise network grade. The natural behavior of BGP can be challenging when the size of the network grows to a large number of BGP peers, because it will introduce a high number of route advertisements, along with scalability and manageability complexities and limitations. According to the default behavior of BGP, any iBGP-learned route will not be advertised to any iBGP peer (the typical BGP loop-prevention mechanism, also known as the iBGP split-horizon rule). This means that a full mesh of iBGP peering sessions is required to maintain full reachability across the network. On this basis, if a network has 15 BGP routers within an AS, a full mesh of iBGP peering will require (15(15 – 1) / 2) = 105 iBGP sessions to manage within an AS. Consequently, it will be a network that has a large amount of configuration associated with a high probability of configuration errors, is complex to troubleshoot, and has very limited scalability. However, BGP has two main proven techniques that you can use to reduce or eliminate these limitations and complexities of the BGP control plane:

Route reflection (described in RFC 4456)
Confederation (described in RFC 5065, which obsoletes RFC 3065)
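
To make the full-mesh arithmetic above concrete, the short Python sketch below compares the number of iBGP sessions in a full mesh with the number needed when a single router reflects routes for all the others. The function names are hypothetical, and the single-RR case is a deliberate simplification: a redundant design with two or more RRs needs more sessions than shown here.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions in a full mesh of n routers: n(n - 1) / 2."""
    return routers * (routers - 1) // 2


def single_rr_sessions(routers: int) -> int:
    """Sessions when one router is the RR and every other router is its client."""
    return routers - 1


for n in (5, 15, 50):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
# 5 10 4
# 15 105 14
# 50 1225 49
```

The full-mesh count grows quadratically with the number of routers, whereas the route reflection count grows linearly.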

BGP Route Reflection

Route reflection is a BGP route advertisement mechanism based on relaying the iBGP-learned routes from other iBGP peers. This process involves a special BGP peer or set of peers called route reflectors (RRs). These RRs can alter the classical iBGP split-horizon rule by re-advertising the BGP route that was received from iBGP peers to other iBGP peers, also known as route reflector clients, which can significantly reduce the total number of iBGP sessions. Moreover, RRs reflect routes to nonclient iBGP peers as well, in certain cases.

Figure 39 summarizes RR route advertisement rules based on three primary route sources and receivers in terms of the BGP session type (eBGP, iBGP RR client, and iBGP non-RR client).

Figure 39: RR Route Advertisement Rules

It is obvious from the figure that the route(s) sourced from an iBGP non-RR client peer(s) will not be re-advertised by the RR to another iBGP non-RR client peer(s).
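
The advertisement rules described above reduce to a single exception, which the following Python sketch encodes. It is a simplified illustration with hypothetical names (`Peer`, `rr_advertises`); it models only the three session types and ignores details such as not sending a route back to the peer it was learned from.

```python
from enum import Enum


class Peer(Enum):
    EBGP = "eBGP peer"
    CLIENT = "iBGP RR client"
    NON_CLIENT = "iBGP non-RR client"


def rr_advertises(learned_from: Peer, send_to: Peer) -> bool:
    """Return True if an RR re-advertises a route between these session types."""
    # Only the nonclient-to-nonclient case keeps the classical
    # iBGP split-horizon behavior; every other combination is advertised.
    return not (learned_from is Peer.NON_CLIENT and send_to is Peer.NON_CLIENT)


for src in Peer:
    for dst in Peer:
        print(f"{src.value:>20} -> {dst.value:<20} {rr_advertises(src, dst)}")
```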

As a result, the concept of RR can help network designers avoid the complexities and limitations associated with iBGP full-mesh sessions, where more scalable and manageable designs can be produced. However, BGP RR can introduce new challenges that network designers should take into account, such as redundancy, optimal path selection, and network convergence.

Route Reflector Redundancy

In BGP environments, RRs can introduce a single point of failure to the design if no redundancy mechanism is considered. RR clustering is designed to provide redundancy, where typically two (or more) RRs can be grouped to serve one or more iBGP clients. With RR clustering, BGP uses a special 4-byte value called CLUSTER_ID. A route exchanged between RRs in the same cluster is ignored and not installed in the BGP table of the receiving RR if that route already carries the CLUSTER_ID that the receiving RR itself uses. However, in some situations, it is recommended that two redundant RRs be configured with different CLUSTER_IDs for an increased level of BGP routing redundancy.

For instance, the RR client in Figure 40 is multihomed to two RRs leveraging the link addresses (not loopback addresses) for the corresponding iBGP neighborships. If each RR is deployed with a different CLUSTER_ID, the RR client will continue to be able to reach prefix X, even after the link with RR 1 fails.

Figure 40: RR Clustering

In contrast, if RR 1 and RR 2 were deployed with the same CLUSTER_ID, after this failure event the RR client in Figure 40 would not be able to reach prefix X. This is because the CLUSTER_ID attribute mechanism will stop the propagation of a route from RR 1 to RR 2 with the same CLUSTER_ID.

Furthermore, two BGP attributes, ORIGINATOR_ID and CLUSTER_LIST, were created specifically to optimize redundant RR behavior, especially with regard to avoiding routing information loops (for example, duplicate routing information) when the redundant RRs are deployed in different clusters.
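
The loop-prevention role of these two attributes can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Route`, `reflect`, and `accepts` names and all router and cluster IDs are hypothetical, and the RR is assumed to learn the route directly from the router that injected it into iBGP.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route: Route, rr_cluster_id: str, learned_from_router_id: str) -> Route:
    """Attributes an RR attaches before reflecting an iBGP-learned route."""
    return Route(
        prefix=route.prefix,
        # ORIGINATOR_ID is set once and then preserved by later reflectors.
        originator_id=route.originator_id or learned_from_router_id,
        # Each reflecting RR prepends its own CLUSTER_ID.
        cluster_list=[rr_cluster_id, *route.cluster_list],
    )


def accepts(route: Route, my_router_id: str, my_cluster_id: Optional[str] = None) -> bool:
    """Receiver-side checks; my_cluster_id is only set on route reflectors."""
    if route.originator_id == my_router_id:
        return False  # the route has come back to its originator
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False  # the route has already passed through this cluster
    return True


x = Route("prefix-X")
from_rr1 = reflect(x, rr_cluster_id="1.1.1.1", learned_from_router_id="10.0.0.9")

print(accepts(from_rr1, my_router_id="2.2.2.2", my_cluster_id="1.1.1.1"))  # False
print(accepts(from_rr1, my_router_id="2.2.2.2", my_cluster_id="2.2.2.2"))  # True
print(accepts(from_rr1, my_router_id="10.0.0.9"))                          # False
```

The first two calls mirror the comparison above: a second RR that shares the CLUSTER_ID of RR 1 discards the reflected route, whereas an RR in a different cluster keeps it as a backup path.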

RR Logical and Physical Topology Alignment

As discussed in Chapter 1, the physical topology forms the foundation of many design scenarios, including BGP RRs. In fact, with BGP RRs, the logical and physical topologies must be given special consideration. They should be as congruent as possible to avoid any undesirable behaviors, such as suboptimal routing and routing loops. For example, the scenario depicted in Figure 41 is based on an enterprise network that uses BGP as the core routing protocol (based on design model 1, discussed earlier in this chapter). In this scenario, the data center is located miles away from the campus core and is connected over two dark fiber links. The enterprise campus core routers C and D are configured as BGP RR (same RR cluster) to aggregate iBGP sessions of the campus buildings and data center routers. Data center aggregation router E is the iBGP client of core RR D, and data center aggregation router F is the iBGP client of core RR C.

Figure 41: BGP RR Physical and Logical Topology Congruence

If the prefix 200.1.1.1 is advertised by both Internet edge routers (A and B), typically router A will advertise it to core router C, and router B will advertise it to core router D over eBGP sessions. Then, each RR will advertise this prefix to its clients. (RR C will advertise it to data center aggregation router F, and RR D will advertise it to data center aggregation router E.) Up to this stage, there is no issue. However, when routers E and F try to reach prefix 200.1.1.1, a loop will be formed, as follows:

Note

For simplicity, this scenario assumes that both campus cores (RR) advertise the next-hop IPs of the Internet edge routers to all the campus blocks.

Based on the design in Figure 41, data center aggregation router E will have the next hop to prefix 200.1.1.1 as Internet edge router B.
Data center aggregation router F will have the next hop to prefix 200.1.1.1 as Internet edge router A.
Data center aggregation router E will forward the packets destined to prefix 200.1.1.1 to data center aggregation router F. (Based on physical connectivity and IGP, the Internet edge router B is reachable via data center aggregation router F from the data center aggregation router E point of view.)
Because data center aggregation router F has prefix 200.1.1.1, which is reachable through A, it will then send the packet back to data center aggregation router E, as illustrated in Figure 42.

Figure 42: BGP RR and Physical Topology Congruence: Routing Loop
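
The loop described in the steps above can be reproduced with a small hop-by-hop trace. The Python sketch below is illustrative only: the two dictionaries restate the BGP next hops and the IGP first hops given in the steps, and the `trace` helper is a hypothetical name.

```python
# Per-router view for destination 200.1.1.1, taken from the steps above:
# the BGP next hop, and the IGP neighbor used to reach that next hop.
bgp_next_hop = {"E": "B", "F": "A"}
igp_first_hop = {("E", "B"): "F", ("F", "A"): "E"}


def trace(start: str, max_hops: int = 6) -> list:
    """Follow the forwarding decisions until a router is visited twice."""
    path = [start]
    node = start
    for _ in range(max_hops):
        if node not in bgp_next_hop:
            break
        nxt = igp_first_hop[(node, bgp_next_hop[node])]
        path.append(nxt)
        if path.count(nxt) > 1:
            break  # revisited a router: forwarding loop
        node = nxt
    return path


print(" -> ".join(trace("E")))  # E -> F -> E
```

Each router makes a locally valid decision, yet the packet never leaves the data center because the BGP next hop chosen by one aggregation router is reached through the other.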

This loop was obviously formed because there is no alignment (congruence) between iBGP-RR topology and the physical topology. The following are three simple possible ways to overcome this design issue and continue using RRs in this network:

Add a physical link directly between E and D and between F and C, along with an iBGP session over each link to the respective core router. (It might take a long time to provision a fiber link, or it might be an expensive solution from the business point of view.)
Align the iBGP-RR peering with physical topology by making E the iBGP client to RR C and F the iBGP client to RR D (the simplest solution), as illustrated in Figure 43.
Add a direct link between core RRs and place each RR in a different RR cluster along with a direct iBGP session between them. (This might add control plane complexity in this particular scenario to align IGP and BGP paths without alignment between the physical topology and iBGP client to RR sessions.)

Figure 43: BGP RR Alignment with the Physical Topology

Note

One of the common limitations of the route reflection concept in large BGP environments is the possibility of suboptimal routing. This point is covered in more detail later in this book.

Update Grouping

Update grouping helps to optimize BGP processing overhead by providing a mechanism that groups BGP peers that have the same outbound policy in one update group, and updates are then generated once per group. By integrating this function with BGP route reflection, each RR update message can be generated once per update group and then replicated for all the RR clients that are part of the relevant group. This can significantly reduce the number of BGP updates that need to be processed by the RR and its clients, especially in large BGP networks with many peers and frequent updates. In addition, update grouping can help to optimize the BGP control plane performance by reducing the CPU utilization on the RR and its clients, which can lead to faster convergence times and improved overall network stability.

Technically, update grouping can be achieved by using peer group or peer template features, which can enhance BGP RR functionality and simplify the overall network operations in large BGP networks by:

Making the configuration easier, less error-prone, and more readable
Lowering CPU utilization
Speeding up iBGP client provisioning (because they can be configured and added quickly)
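
The grouping idea can be illustrated with a short Python sketch. The peer names and outbound policy names are hypothetical, and real implementations consider more than a single policy name when deciding group membership; the point is only that an update is built once per group rather than once per peer.

```python
from collections import defaultdict

# Hypothetical peers and outbound policy names, for illustration only.
peer_policy = {
    "client-1": "RR-CLIENTS-OUT",
    "client-2": "RR-CLIENTS-OUT",
    "client-3": "RR-CLIENTS-OUT",
    "core-peer-1": "NONCLIENT-OUT",
    "core-peer-2": "NONCLIENT-OUT",
}


def build_update_groups(policies: dict) -> dict:
    """Group peers that share the same outbound policy."""
    groups = defaultdict(list)
    for peer, policy in policies.items():
        groups[policy].append(peer)
    return dict(groups)


groups = build_update_groups(peer_policy)
print(f"{len(peer_policy)} peers -> {len(groups)} update groups")
# 5 peers -> 2 update groups
```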

BGP Confederation

The other option to solve iBGP scalability limitations in large-scale networks is through the use of confederations. The concept of a BGP confederation is based on splitting a large iBGP domain into multiple (smaller) BGP domains (also known as sub-autonomous systems). The BGP communication between these sub-autonomous systems is formed over eBGP sessions (a special type of eBGP session referred to as an intra-confederation eBGP session). Consequently, the BGP network can scale and support a larger number of BGP peers because there is no need to maintain a full mesh among the sub-autonomous systems; however, within each sub-AS, an iBGP full mesh is still required.

Note

The intra-confederation eBGP session has a mixture of both iBGP and eBGP characteristics. For example, the NEXT_HOP, MED, and LOCAL_PREF attributes are kept between sub-autonomous systems. However, the AS_PATH is updated as routes cross sub-AS boundaries.

Note

A confederation appears as a single AS to external BGP autonomous systems. Because the sub-AS topology is invisible to external peering BGP autonomous systems, the sub-AS numbers are removed from the AS_PATH of any update sent to an external eBGP peer.
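
The AS_PATH handling described in the two notes above can be sketched as follows. This Python is a simplified illustration, not a protocol implementation: the function names are hypothetical, the AS_PATH is modeled as a list of (segment type, AS numbers) tuples, and the AS numbers (confederation ID 64496, sub-autonomous systems 65001 and 65002) are arbitrary example values.

```python
def to_confed_ebgp_peer(path: list, member_as: int) -> list:
    """Advertise to another sub-AS: record the local sub-AS in a confed segment."""
    if path and path[0][0] == "AS_CONFED_SEQUENCE":
        return [("AS_CONFED_SEQUENCE", [member_as, *path[0][1]]), *path[1:]]
    return [("AS_CONFED_SEQUENCE", [member_as]), *path]


def to_external_peer(path: list, confed_id: int) -> list:
    """Advertise outside the confederation: hide the sub-AS topology."""
    # Drop every confederation segment, then prepend the confederation ID.
    public = [seg for seg in path if not seg[0].startswith("AS_CONFED")]
    if public and public[0][0] == "AS_SEQUENCE":
        return [("AS_SEQUENCE", [confed_id, *public[0][1]]), *public[1:]]
    return [("AS_SEQUENCE", [confed_id]), *public]


# A route originated in sub-AS 65001 crosses sub-AS 65002 and then
# leaves the confederation toward an external eBGP peer.
inside = to_confed_ebgp_peer([], member_as=65001)
inside = to_confed_ebgp_peer(inside, member_as=65002)
outside = to_external_peer(inside, confed_id=64496)

print(inside)   # [('AS_CONFED_SEQUENCE', [65002, 65001])]
print(outside)  # [('AS_SEQUENCE', [64496])]
```

The external peer sees only the confederation ID, which is why the confederation looks like a single AS from the outside.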

In large iBGP environments like a global enterprise (or Internet service provider [ISP] type of network), you can use both RR and confederation jointly to maximize the flexibility and scalability of the design. As illustrated in Figure 44, the confederation can help to split the BGP AS into sub-autonomous systems, where each sub-AS can be managed and controlled by a different team or business unit. At the same time, within each sub-AS, the RR concept is used to reduce iBGP full-mesh session complexity. In addition, network designers must make sure that IGP metrics within any given sub-AS are lower than those between sub-autonomous systems to avoid any possibility of suboptimal routing issues within the confederation AS.

Note

To avoid BGP route oscillation, which is associated with RRs or confederations in some scenarios, network designers must consider deploying higher IGP metrics between sub-autonomous systems or RR clusters than those within the sub-AS or cluster.

Figure 44: BGP Confederation and RR

Note

Although BGP route reflection combined with confederation can maximize the overall BGP flexibility and scalability, it may add complexity to the design if the combination of both is not required. For instance, when merging two networks with a large number of iBGP peers in each domain, confederation with RR might be a feasible joint approach to optimize and migrate these two networks if it does not compromise any other requirements. However, with a large network with a large number of iBGP peers in one AS that cannot afford major outages and configuration changes within the network, it is more desirable to optimize using RR only rather than combined with confederation.

Confederation Versus Route Reflection

The most common dilemma is whether to use route reflection or confederation to optimize iBGP scalability. The typical solution to this dilemma, from a design point of view, is “it depends.” Like any other design decision, deciding what technology or feature to use to enhance BGP design and scalability depends on different factors. Table 12 highlights the different factors that can help you narrow down the design decision with regard to BGP confederation versus route reflection.

Table 12: Confederation Versus RR

	Route Reflection (RR)	Confederation (Conf)	Conf + RR
IGP architecture	Ideally one IGP domain	Supports multiple IGP domains	Supports multiple IGP domains
Hierarchical topology	More flexible	Less flexible	Flexible within the sub-AS
Policy control	Less control	More control between domains	More control between domains
Control plane complexity	Moderate	The larger the sub-AS, the higher the control plane complexity	Low (optimized)
Optimal routing	May be affected	Maintained within and between sub-autonomous systems	May be affected
Integration with MPLS-TE	Simple	Simple within the same sub-AS, complex between sub-autonomous systems	Simple within the same sub-AS, complex between sub-autonomous systems

Again, there is no 100 percent definite answer. As a designer, you can decide which way to go based on the information and architecture you have and the goals that need to be achieved, taking the factors highlighted in the table above into consideration.

Review Questions

1. In BGP, what nontransitive BGP attribute that is also standards-based is commonly leveraged on ingress to influence egress traffic flows?

a. AS Path Prepend
b. Weight
c. Route-Map
d. Local Preference

Answer

d. To manipulate traffic inside your own AS, local preference can be used. Local preference is carried inside an AS (iBGP) so you can manipulate traffic at one node and the attribute is carried inside your AS.

2. How would you influence traffic inbound to your AS? (Choose all that apply.)

a. AS Path Prepend
b. Weight
c. Multi-Exit Discriminator
d. Local Preference

Answer

a and c. AS Path prepend is a very common way to influence traffic into your AS. If you want to prefer one router over another, then on the less preferred router, prepend additional copies of your own AS number to the path to make the route “look not as good.” Another option, although less common, is the use of MED. MED may be valid when you have multiple connections to the same neighboring AS, as opposed to connections to different autonomous systems.

Enterprise Routing Design Recommendations

This chapter discussed several concepts and approaches pertaining to Layer 3 control plane routing design. Table 13 summarizes the main Layer 3 routing design considerations and recommendations in a simplified way that you can use as a foundation to optimize the overall routing design.

Table 13: Layer 3 Routing Design Considerations and Recommendations

Design Consideration	Design Recommendations
Scalability	Modular routing design (contain and optimize fault domains design). Reduce the number of prefixes (for example, suppress the advertisement of transport link IPs, routes summarization).
Resiliency and fast convergence	Reduce the number of prefixes (for example, suppress the advertisement of transport link IPs, routes summarization). Modular routing design. Fast detection, processing, and reaction to the failure. LFA can be used with link state in some scenarios. When possible, design in triangles rather than squares between the routed layers.
Control and security	Enable routing authentication. Suppress peering with end-host VLANs (passive interface). Route filtering and tagging between routing domains.

In large-scale enterprise networks with different modules and many remote sites, selecting a routing protocol can be a real challenge. Therefore, network designers need to consider the answers to the following questions as a foundation for routing protocol selection:

What is the underlying topology, and which protocol can scale to a larger number of prefixes and peers?
Which routing protocol can be more flexible, considering the topology and future plans (for example, integrating with other routing domains)?
Is fast convergence a requirement? If yes, which protocol can converge faster and at the same time offer stability enhancement mechanisms?
Which protocol can utilize fewer hardware resources?
Is the routing internal or external (different routing domains)?
Which protocol can provide less operational complexity (for instance, easy to configure and troubleshoot)?

Although these questions are not the only ones, they cover the most important functional requirements that can be delivered by a routing protocol. Furthermore, there are some factors that you need to consider when selecting an IGP:

Size of the network (for example, the number of L3 hops and expected future growth)
Security requirements and the supported authentication type
IT staff knowledge and experience
Protocol’s flexibility in the modular network such as support of flexible route summarization techniques

Generally speaking, EIGRP tends to be simpler and more scalable in hub-and-spoke topology and over networks with three or more hierarchical layers, whereas link-state routing protocols can perform better over flat networks when flooding domains and other factors discussed earlier in this book are tuned properly. In contrast, BGP is the preferred protocol to communicate between different routing domains (external), as summarized in Figure 45.

Figure 45: Routing Protocol Selection Decision Tree

Moreover, the decision tree depicted in Figure 46 highlights the routing protocol selection decision to migrate from one routing protocol to another based on the topology used. This tree is based on the assumption that you have the choice to select the preferred protocol.

Figure 46: Routing Protocol Migration Decision Tree

Summary

For network designers and architects to provide a valid and feasible network design (including both Layer 2 and Layer 3), they must understand the characteristics of the nominated or used control protocols and how each behaves over the targeted physical network topology. This understanding will enable them to align the chosen protocol behavior with the business, functional, and application requirements, to achieve a successful business-driven network design. Also, any Layer 2 or Layer 3 design optimization technique, such as route summarization, may introduce new design concerns (during normal or failure scenarios), such as suboptimal routing. Therefore, the impact of any design optimization must be taken into consideration and analyzed, to ensure the selected optimization technique will not introduce new issues or complexities to the network that could impact its primary business functions. Ideally, the requirements of the business-critical applications and business priorities should drive design decisions.