Network Virtualization

ccde-written
Published

April 21, 2026

Overview

In modern networks, virtualization has become a fundamental design pillar that enables service providers and enterprises to deliver multiple logical network services over a shared physical infrastructure. Network virtualization decouples the logical network topology from the underlying physical topology, enabling greater flexibility, scalability, and operational efficiency. This chapter covers the different design options and considerations of network virtualization, including the forwarding and control plane mechanisms of MPLS, MP-BGP, and software-defined networking.

This chapter covers the following topics:

  • MPLS: This section covers critical MPLS topics and network design elements for MPLS.
  • Software-Defined Networks: This section covers SD-WAN and SD-LAN from a vendor-agnostic perspective and describes the network design elements associated with the inherent capabilities of these solutions.

MPLS

Over the years, Multiprotocol Label Switching (MPLS) has become more popular within the large enterprise space. The benefits and overall design goals are as follows:

  • Support a large number of customer groups and a large number of sites per group (scalable).
  • Provide customers with value-added services that can create new revenue-generation sources such as service differentiation–capable transport with end-to-end quality of service (QoS) (flexible and reliable).
  • Support various services for all enterprise customers over one unified infrastructure. Having a single consolidated network reduces capital expenditure (CAPEX) and can offer a significant return on investment (ROI) to the large enterprise business on its hardware equipment (with the virtualized or overlaid architectures).
  • Provide flexible service provisioning over various media access methods.

In addition, the MPLS peer model has proven its flexibility and reliability in fulfilling these goals for many large enterprise customers by offering:

  • A single infrastructure that can serve all VPN customers (as shown in Figure 1).
  • The optimization of OPEX. For instance, adding a new customer or a new site for an existing customer will require simple changes to the relevant edge nodes (provider edge [PE] nodes) only, as the core control plane intelligence is pushed to the provider cloud.
  • The opening of new revenue-generation sources to the business by offering differentiated services for its customers, such as prioritization and expedited forwarding for voice.
  • The optimization of time to market to introduce new services to the organization’s L3VPN customers, such as IPv6 and multicast support.
  • A high degree of flexibility by offering various media access methods for the organization’s customers, such as legacy equipment, Ethernet over copper or fiber, and Long Term Evolution (LTE) or 5G.
[Figure 1 diagram: the overlay model (Frame Relay, TDM/ISDN, ATM) uses individually overlaid private IP networks per customer (customers' engineered); the peer model (MPLS L3VPN) uses a single transport network and control plane supporting multiple customers' IP VPNs. The diagram also marks where the core control plane intelligence resides.]
Figure 1: MPLS Peer Model vs. Overlay Model

MPLS Architecture Components

In a typical MPLS environment, the architecture is constructed of the following components:

  • Customer edge (CE)
  • Provider edge (PE)
  • Provider nodes (P)

In the typical MPLS architecture, the provider edge nodes (PEs) carry the customer routing information. Each PE injects the routes learned from its directly connected customer edge nodes (CEs) into the relevant Multiprotocol Border Gateway Protocol (MP-BGP) VPNv4/v6 address family and handles the relevant VPN and transport MPLS labels, which is why a PE is also known as a label edge router (LER). This achieves optimal routing of the traffic that pertains to each customer within each VPN routing domain. In contrast, provider routers (Ps) at the core of the network are mainly responsible for switching MPLS-labeled packets. Therefore, they are also known as label switching routers (LSRs). Figure 2 illustrates the primary components of an MPLS architecture.

[Figure 2 diagram: a provider network (MPLS VPN) with P routers in the core and PE-1, PE-2, and PE-3 at the edge, connecting the customer networks that contain the Cx-A and Cx-B sites.]
Figure 2: Primary Elements of an MPLS Architecture

On top of the architectural components, the different control plane protocols are overlaid to construct the control and forwarding planes for each VPN network (per customer). Figure 3 shows the relationship between the different control and forwarding plane components in an MPLS architecture.

[Figure 3 diagram: three layers. Customer routes: IGP, BGP, static. MP-BGP: VPNv4, VPNv6, VPN label. Core transport: IGP, transport label.]
Figure 3: MPLS Control Plane Components

The actual communication in a typical MPLS environment is driven by the following three primary elements:

  • Routing information isolation between different VPNs (for example, Virtual Routing and Forwarding [VRF] instances)
  • Controlled sharing of routing information to sites within a VPN or between different VPNs (for example, route distinguisher [RD] + route target [RT] + MP-BGP)
  • MPLS traffic forwarding of packets across the MPLS core (for example, VPN and transport labels)

MPLS Control Plane Components

This section covers the primary control plane elements of an MPLS environment.

Virtual Routing and Forwarding

Virtual Routing and Forwarding (VRF) is one of the primary mechanisms used in today’s modern networks to maintain routing isolation on a Layer 3 device level. In MPLS architecture, each PE holds a separate routing and forwarding instance per VRF per customer, as shown in Figure 4. Typically, each customer’s VPN is associated with at least one VRF. Maintaining multiple VRFs on the same PE is similar to maintaining multiple dedicated routers for customers connecting to the provider network. In addition, maintaining multiple forwarding tables at the PE is essential to support overlapping address spaces. Normally, the routing information of each customer (VPN) is installed at the relevant VRF routing tables of a PE, either from directly connected CEs (using a VRF-aware interior gateway protocol [IGP], BGP, or static route) or routes of other CEs learned via remote PEs over MP-BGP VPNv4/v6.

[Figure 4 diagram: PE-1 with VRF 1 and VRF 2, attached to Cx-A-1 and Cx-B-1 (static routes or an IGP) on one side and to the MPLS core (MP-BGP) on the other.]
Figure 4: MPLS L3VPN: Virtual Routing and Forwarding
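The routing isolation described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not how any router implements VRFs: the `PE` class, the VRF names, and the prefix are all hypothetical, and the lookup is a plain longest-prefix match restricted to one VRF's table.

```python
import ipaddress


class PE:
    """Hypothetical PE model: one independent routing table per VRF."""

    def __init__(self):
        self.vrfs = {}  # VRF name -> {network: next hop}

    def add_route(self, vrf, prefix, next_hop):
        table = self.vrfs.setdefault(vrf, {})
        table[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, vrf, destination):
        """Longest-prefix match restricted to the table of one VRF."""
        addr = ipaddress.ip_address(destination)
        table = self.vrfs.get(vrf, {})
        matches = [net for net in table if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return table[best]


pe1 = PE()
# Both customers use 10.1.1.0/24; separate VRF tables keep the routes apart.
pe1.add_route("VRF-1", "10.1.1.0/24", "Cx-A-1")
pe1.add_route("VRF-2", "10.1.1.0/24", "Cx-B-1")

print(pe1.lookup("VRF-1", "10.1.1.10"))  # Cx-A-1
print(pe1.lookup("VRF-2", "10.1.1.10"))  # Cx-B-1
```

Because every lookup is scoped to a VRF, the same destination address resolves to a different CE per customer, which is the property that makes overlapping address spaces workable.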

Route Distinguisher

For an MPLS L3VPN to support multiple customer VPNs with overlapping addresses while maintaining control plane separation, the PE router must be able to keep identical prefixes from different customers' VPNs distinct. This is accomplished by using a route distinguisher (RD) per VPN or per VRF instance. The RD is prepended to each customer prefix to form a unique MP-BGP VPNv4/v6 prefix. As a result, the MPLS core can seamlessly transport customers' routes (overlapped and non-overlapped) over one common infrastructure and control plane protocol. The RD value can be allocated using different approaches, each with its own strengths and weaknesses, as summarized in Table 1.

Table 1: MPLS L3VPN RD Allocation Models

RD Model | Strength | Weakness | Suitable Scenario
Unique RD per VPN | Simple to design and manage. Lower hardware resource consumption. | Lacks load-balancing capability when VPN route reflection (RR) is used and customers have multihomed CE routes. | Very-large-scale MPLS VPN without load-balancing or load-sharing requirements toward multihomed sites.
Unique RD per VPN per PE | Offers the ability to load balance traffic toward multihomed sites connected to different PEs. | Requires more hardware resources. Higher design and operational complexity. | Large-scale MPLS VPN with load-balancing or load-sharing requirements toward multihomed sites.
Unique RD per VPN per interface of each PE | Simplifies identifying sites within a VPN. | Highest hardware resource utilization. High design and operational complexity. | MPLS VPN of an enterprise with a small number of VPNs requiring simplified identification of route origin per site.

Note

Depending on the RD allocation model, a single VPN may include multiple RDs with different VRFs. However, the attributes of the VPN (per customer) do not change, and the VPN is still considered an intra-VPN because route propagation is controlled by the import and export of RT values, not by the RD.

Figure 5 illustrates the different RD allocation models discussed in Table 1.

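[Figure 5 diagram: three panels, each showing an MPLS L3VPN with three PEs. Unique RD per VPN: customers A and B, with RD values 100 and 200 repeated across the PEs. Unique RD per VPN per PE: customer A with RD values 100, 101, and 102. Unique RD per VPN per PE per interface: customers A and B with RD values 100, 101, 200, 201, 202, and 203.]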
Figure 5: MPLS L3VPN RD Allocation Models
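To illustrate why the RD lets one MP-BGP table carry overlapping customer prefixes, the following sketch builds VPNv4-style keys by prepending an RD to an IPv4 prefix. The helper `vpnv4`, the RD values, and the PE names are hypothetical and chosen only for illustration.

```python
def vpnv4(rd, prefix):
    """Return the VPNv4 form of a route: the RD prepended to the IPv4 prefix."""
    return f"{rd}:{prefix}"


# Two customers advertise the same IPv4 prefix from different VRFs.
customer_a = vpnv4("100:1", "10.1.1.0/24")
customer_b = vpnv4("200:1", "10.1.1.0/24")

# A single MP-BGP table keyed by VPNv4 prefix can hold both routes,
# because the RD makes the two keys different.
bgp_table = {customer_a: "PE-1", customer_b: "PE-2"}

print(sorted(bgp_table))
# ['100:1:10.1.1.0/24', '200:1:10.1.1.0/24']
```

Without the RD, both routes would collapse into a single `10.1.1.0/24` entry and one customer's route would overwrite the other's.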

Route Targets

Route targets (RTs) are an additional identifier and are considered part of the primary control plane elements of a typical MPLS L3VPN architecture because they facilitate the identification of which VRF instance can install which VPN routes. In fact, RTs represent the policies that govern the connectivity between customer sites. This is achieved via controlling the import and export RTs. Technically, in an MPLS VPN environment, the export RT identifies a VPN membership with regard to the existing VRFs on other PEs, whereas the import RT is associated with each PE local VRF. The import RT recognizes and maps the VPN routes (received from remote PEs or leaked on the local PE from other VRF instances) to be imported into the relevant VRF instance of any given customer. In other words, RTs offer network designers a powerful capability to control what MP-BGP VPN route is to be installed in any given VRF/customer routing instance. In addition, they provide flexibility to create various logical L3VPN (WAN) topologies for the enterprise customer, such as any-to-any, hub-and-spoke, and partially meshed, to meet different connectivity requirements.
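The import and export behavior can be sketched as a simple set intersection: a VRF installs a VPN route only if the route carries at least one RT that the VRF imports. The classes and names below (`VpnRoute`, `Vrf`, the VRF labels, and the RT 202:1) are hypothetical; only RT 101:1 mirrors a value used in this chapter's figures.

```python
from dataclasses import dataclass, field


@dataclass
class VpnRoute:
    prefix: str
    route_targets: frozenset  # RTs attached when the route was exported


@dataclass
class Vrf:
    name: str
    import_rts: set
    export_rts: set
    routes: list = field(default_factory=list)

    def export(self, prefix):
        """Attach this VRF's export RTs to a local prefix."""
        return VpnRoute(prefix, frozenset(self.export_rts))

    def maybe_import(self, route):
        """Install the route only if it carries at least one imported RT."""
        if self.import_rts & route.route_targets:
            self.routes.append(route.prefix)
            return True
        return False


cust_a_pe1 = Vrf("CX-A@PE-1", import_rts={"101:1"}, export_rts={"101:1"})
cust_a_pe2 = Vrf("CX-A@PE-2", import_rts={"101:1"}, export_rts={"101:1"})
cust_b_pe2 = Vrf("CX-B@PE-2", import_rts={"202:1"}, export_rts={"202:1"})

route = cust_a_pe1.export("10.1.1.0/24")
print(cust_a_pe2.maybe_import(route))  # True
print(cust_b_pe2.maybe_import(route))  # False
```

Changing which RTs each VRF imports and exports is all it takes to move between any-to-any, hub-and-spoke, and partially meshed layouts in this model.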

L3VPN Forwarding Plane

In addition to the control plane discussed earlier, the data or forwarding plane forms the other major component of typical MPLS L3VPN building blocks. In MPLS VPN environments, the data plane forwards packets based on labels (transport labels, VPN labels, MPLS Traffic Engineering [MPLS-TE] labels, and so on). This section focuses on the VPN label.

Typically, the egress PE (LER) assigns the VPN label, and the remote ingress PEs (LERs) use that label when sending VPN traffic toward it. The egress PE then demultiplexes the traffic to the correct VPN customer egress interface based on the VPN label. In other words, the egress PE router generates and assigns a VPN label for every VPN route and advertises it to the ingress PE routers in an MP-BGP update. The label is therefore understood only by the egress PE node, which performs the demultiplexing to forward traffic to the respective VPN customer egress interface or CE. This is true for all local labels, regardless of which protocol allocates and binds them.

The VPN labels in MPLS VPN architectures can be allocated by the PE nodes using different models for MP-BGP L3VPN routes, based on the scenario and the design requirements. The VPN labels also offer network designers the flexibility to achieve a level of trade-offs between network performance and scalability when possible. The following are the common MPLS VPN label-allocation models:

  • Per prefix: In this model, a VPN label is assigned for each VPN prefix. Although this model can generate a large number of labels, it is required in scenarios where the VPN packets sent between the PE and CE are label switched, such as in Carrier supporting Carrier (CsC) designs.
  • Per VRF: In this model, a single label is allocated to all local VPN routes of any given PE in a given VRF. This model offers an efficient label space and BGP advertisements. In addition, some vendor platforms support the same per-VRF label for both IPv4 and IPv6 prefixes.
  • Per CE: The PE router allocates one label for every immediate next hop; in most cases, this would be a CE router. This label is directly mapped to the next hop, so there is no VRF route lookup performed during data forwarding. However, the number of labels allocated is one for each CE rather than one for each VRF. Because BGP knows all the next hops, it assigns a label for each next hop (not for each PE-CE interface). When the outgoing interface is a multiaccess interface and the MAC address of the neighbor is not known, ARP is triggered during packet forwarding.
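The scalability trade-off among the three allocation models can be made concrete by counting labels. The sketch below assumes a hypothetical PE with two VRFs, three CEs, and 1,000 prefixes behind each CE; the function name and the route tuples are illustrative only.

```python
def labels_needed(routes, mode):
    """Count the VPN labels a PE allocates for its local routes.

    routes: list of (vrf, prefix, next_hop_ce) tuples.
    mode: "per-prefix", "per-vrf", or "per-ce".
    """
    if mode == "per-prefix":
        return len({(vrf, prefix) for vrf, prefix, _ in routes})
    if mode == "per-vrf":
        return len({vrf for vrf, _, _ in routes})
    if mode == "per-ce":
        return len({(vrf, ce) for vrf, _, ce in routes})
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical PE: VRF A has two CEs, VRF B has one, 1,000 prefixes per CE.
routes = []
for vrf, ce in [("A", "CE-A1"), ("A", "CE-A2"), ("B", "CE-B1")]:
    for i in range(1000):
        routes.append((vrf, f"{ce}/prefix-{i}", ce))

for mode in ("per-prefix", "per-vrf", "per-ce"):
    print(mode, labels_needed(routes, mode))
# per-prefix 3000
# per-vrf 2
# per-ce 3
```

The per-prefix count grows with the routing table, while the per-VRF and per-CE counts grow only with the number of VRFs or attached CEs.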

Network designers must be careful if they plan to change the default label allocation behavior, because any inconsistency or simple error can lead to a broken forwarding plane that can easily bring down the entire network or a portion of the network. In a service provider network, a PE that goes down this way may result in several customer sites (usually single-homed ones) being out of service, which can impact the business significantly, especially if there is a strict service-level agreement (SLA) with its customers.

Figure 6 shows a summary of the end-to-end forwarding and control planes of an MPLS L3VPN architecture.

[Figure 6 diagram: Site 1 (10.1.1.0/24) connects through CE-1 to PE-1 (Lo0: 1.1.1.1; VRF CX-A, RD = 10:1, RT export 101:1, RT import 101:1). Site 2 (20.1.1.0/24) connects through CE-2 to PE-2 (Lo0: 2.2.2.2; VRF CX-A, RD = 20:1, RT export 101:1, RT import 101:1). P-1 and P-2 form the core, and PE-CE routing uses IGP, BGP, or static routes. Control plane: the MP-iBGP update carries prefix 10:1:10.1.1.0, RT 101:1, MP-iBGP next hop 1.1.1.1/32, and VPN label V1. Forwarding plane: packets to 10.1.1.0/24 carry VPN label V1 beneath transport label L2 and then L1; P-1 pops the top label before sending the packet to PE-1.]
Figure 6: Forwarding and Control Planes of MPLS L3VPN Architecture
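The forwarding plane in Figure 6 can be traced hop by hop. The sketch below is a simplified walk-through that assumes the label order suggested by the figure (PE-2 imposes L2, P-2 swaps it to L1, and P-1 pops it); the function name and the label-to-VRF mapping are hypothetical.

```python
def trace_label_stack():
    """Trace the label stack (top label first) of one packet to 10.1.1.0/24."""
    vpn_label_to_vrf = {"V1": "CX-A"}  # egress PE-1: VPN label -> VRF
    trace = []

    stack = ["L2", "V1"]  # ingress PE-2 imposes the VPN label, then the transport label
    trace.append(("PE-2 push", list(stack)))

    stack[0] = "L1"  # P-2 swaps the transport label
    trace.append(("P-2 swap", list(stack)))

    stack.pop(0)  # P-1 pops the transport label before the egress PE
    trace.append(("P-1 pop", list(stack)))

    vrf = vpn_label_to_vrf[stack.pop(0)]  # PE-1 demultiplexes on the VPN label
    trace.append((f"PE-1 forward in VRF {vrf}", list(stack)))
    return trace


for step, labels in trace_label_stack():
    print(f"{step}: {labels}")
```

Only the egress PE ever interprets V1; the P routers act solely on the top (transport) label.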

L3VPN Design Considerations

This section discusses the primary design considerations common in an MPLS L3VPN environment, along with the possible design options of each. One of the common connectivity models of enterprise customers to the MPLS L3VPN-based WAN is multihoming. Enterprise customers with this connectivity model often need to load balance traffic across both WAN links (this model includes either one CE with two links or one site with two CEs, each with its own link), as shown in Figure 7.

As shown in Figure 7, an MP-BGP route reflector (RR) is part of the MP-BGP control plane architecture. Normally, the RR advertises to its clients (other PEs) only the route that is best from the RR's point of view, which usually breaks the load-balancing or load-sharing requirement of multihomed enterprise customers. One simple solution is to remove the RR and use a full mesh of MP-iBGP sessions. However, this might not be an ideal solution for many carrier networks because a full mesh limits MP-BGP scalability. The other common and simple solution is to configure the VRFs of the multihomed sites with a different RD on each PE, so that each route appears as a unique VPN route to the RR. Consequently, the RR sends all of these VPN routes to the other remote PEs (PE-1 in Figure 7, with PE-2 and PE-3 as the MP-BGP next hops).

[Figure 7 diagram: two panels, each an MPLS L3VPN with an RR that has iBGP sessions to PE-1, PE-2, and PE-3, serving sites 10.1.1.0/24, 10.2.1.0/24, and 10.3.1.0/24. Unique RD per VPN: every VRF uses RD 1:100, and PE-1 receives 1:100:10.2.1.0/24 and 1:100:10.1.1.0/24 with next hop PE-2 only. Unique RD per VPN per PE: RDs 1:100 and 1:200 are used, and PE-1 receives 1:100:10.2.1.0/24 and 1:100:10.1.1.0/24 with next hop PE-2, plus 1:200:10.2.1.0/24 and 1:200:10.1.1.0/24 with next hop PE-3.]
Figure 7: Multihoming in MPLS L3VPN Environment
Note

The BGP multipathing feature must be enabled within the relevant BGP VRF address family at the remote PE routers. Similarly, enabling BGP multipathing is required in a single CE dual-attached use case to enable load balancing/sharing from the CE end as well when BGP is used between the CE and PE.
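The effect of the RD choice on what the RR reflects can be sketched as follows. The model is deliberately simplified: "best path" is just the first path received per VPNv4 prefix, standing in for the real BGP best-path algorithm, and the function names are hypothetical. The RD values and next hops mirror those in Figure 7.

```python
def reflect(paths):
    """Simplified RR: keep one best path per VPNv4 prefix (RD + IPv4 prefix).

    paths: list of (rd, prefix, next_hop) tuples. The first path received
    wins, as a stand-in for the real BGP best-path selection.
    """
    best = {}
    for rd, prefix, next_hop in paths:
        best.setdefault((rd, prefix), next_hop)
    return best


def next_hops_for(reflected, prefix):
    """Next hops a remote PE learns for an IPv4 prefix, across all RDs."""
    return sorted(nh for (_, p), nh in reflected.items() if p == prefix)


same_rd = [("1:100", "10.2.1.0/24", "PE-2"), ("1:100", "10.2.1.0/24", "PE-3")]
per_pe_rd = [("1:100", "10.2.1.0/24", "PE-2"), ("1:200", "10.2.1.0/24", "PE-3")]

print(next_hops_for(reflect(same_rd), "10.2.1.0/24"))    # ['PE-2']
print(next_hops_for(reflect(per_pe_rd), "10.2.1.0/24"))  # ['PE-2', 'PE-3']
```

With a shared RD, the two paths compete for a single VPNv4 entry and the remote PE sees one next hop; with a unique RD per PE, both entries survive reflection and multipathing has two candidates to work with.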

MPLS L3VPN Topologies

As covered earlier in this section, RT values enable you to control the import and export of VRF routes, which can control VPN membership per customer. This facilitates the creation of different L3VPN overlaid topologies based on customer requirements. The following are the most common L3VPN WAN topologies, which are controlled by RTs.

Full Mesh

The full-mesh topology shown in Figure 8 is the simplest and most common topology, and it represents the typical MPLS L3VPN layout. To provide any-to-any communication between sites that normally belong to the same customer (under a single VPN or multiple VPNs), the relevant PEs must use the same import and export RT values for those sites.

[Figure 8 diagram: Site-1, Site-2, Site-3, and Site-4 of VPN-1 with full-mesh VPN connectivity.]
Figure 8: MPLS L3VPN Full-Mesh Topology

This design model logically can be shown as one large router with all other locations connected directly to it, as shown in Figure 9, where the MPLS L3VPN cloud acts as a central hub and all other sites are directly attached to it in a star topology.

[Figure 9 diagram: Site-1 through Site-8, each attached directly to a central MPLS L3VPN cloud.]
Figure 9: MPLS L3VPN Conceptual View

Hub-and-Spoke L3VPN Service

In some situations, MPLS L3VPN customers require that the communication between remote sites has to go through the main or hub site — for example, to align with the enterprise security policy requirements. From the L3VPN service provider point of view, a hub-and-spoke topology can be provisioned for this type of requirement by controlling MP-BGP VPN route propagation (using RT import and export), as shown in Figure 10.

[Figure 10 diagram: Site-1, Site-2, and Site-3 of VPN-1 connected through the VPN-1 hub site.]
Figure 10: MPLS L3VPN Hub-and-Spoke Topology

The most common and proven way to achieve a hub-and-spoke topology over an L3VPN network is to deploy two links with one VRF per link between the PE and the directly connected CE hub, as shown in Figure 11.

[Figure 11 diagram: Site 1 and Site 2 connect to HQ across the service provider MPLS VPN. Each spoke VRF uses route-target export 1:1 and route-target import 2:2. The hub-side VRFs (VRF-In and VRF-Out) use route-target import 1:1 and route-target export 2:2.]
Figure 11: MPLS L3VPN Hub-and-Spoke Design
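The RT policy in Figure 11 can be checked with a short sketch. It assumes that VRF-In is the hub VRF that imports 1:1 and VRF-Out is the one that exports 2:2 (the figure labels suggest this, but the mapping is an assumption here); the dictionary layout and function name are hypothetical.

```python
# RT policy per VRF; the VRF-In/VRF-Out role assignment is an assumption.
vrfs = {
    "Site-1": {"import": {"2:2"}, "export": {"1:1"}},
    "Site-2": {"import": {"2:2"}, "export": {"1:1"}},
    "VRF-In": {"import": {"1:1"}, "export": set()},
    "VRF-Out": {"import": set(), "export": {"2:2"}},
}


def importers(exporting_vrf):
    """Return the VRFs that install routes exported by exporting_vrf."""
    exported = vrfs[exporting_vrf]["export"]
    return sorted(
        name
        for name, policy in vrfs.items()
        if name != exporting_vrf and policy["import"] & exported
    )


print(importers("Site-1"))   # ['VRF-In']
print(importers("VRF-Out"))  # ['Site-1', 'Site-2']
```

Spoke routes reach only the hub, and spokes learn routes only from the hub, so spoke-to-spoke traffic has no path other than through the hub site.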

Achieving a hub-and-spoke topology in an MPLS L3VPN environment is as simple as controlling the import and export of RT values. However, network designers must be aware of the following design considerations to avoid breaking communications across the overlaid hub-and-spoke topology:

  • If BGP is used as the PE-CE routing protocol across the hub-and-spoke topology over L3VPN and each site uses the same BGP autonomous system number (ASN), BGP AS override should be used by the PEs connected to the hub and spoke sites. This avoids blocking communication among the sites as a result of BGP loop-prevention behavior. Although the BGP allowas-in feature can be used for the same purpose from the CE side, it must be planned carefully to avoid any unexpected BGP AS_PATH looping.
  • If more than one spoke is connected to the same PE, a separate VRF per spoke is required to avoid traffic bypassing the hub site.
  • If the hub site has two edge CE routers connected to the MPLS L3VPN cloud, each CE must (ideally) be assigned the role of handling routing/traffic in one direction; one hub CE is connected to the receiving link, and the other hub CE is connected to the sending link.
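The first consideration above, BGP loop prevention and AS override, can be sketched as follows. The ASNs are hypothetical, and the functions model only the AS_PATH check and the substitution, not a full BGP implementation.

```python
def ce_accepts(as_path, local_as):
    """eBGP loop prevention: reject a route whose AS_PATH contains our own ASN."""
    return local_as not in as_path


def as_override(as_path, ce_as, provider_as):
    """Replace every occurrence of the CE's ASN with the provider's ASN."""
    return [provider_as if asn == ce_as else asn for asn in as_path]


CUSTOMER_AS, PROVIDER_AS = 65001, 64500  # hypothetical ASNs

# A spoke route as the PE would advertise it to another site in the same AS.
path = [PROVIDER_AS, CUSTOMER_AS]

print(ce_accepts(path, CUSTOMER_AS))  # False
print(ce_accepts(as_override(path, CUSTOMER_AS, PROVIDER_AS), CUSTOMER_AS))  # True
```

Without the override, the receiving CE sees its own ASN in the path and drops the route; with it, the path contains only the provider ASN and is accepted.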

Multilevel Hub and Spoke

In this model, large enterprise customers (usually with distributed sites in multiple geographic areas) can take advantage of a multitiered hub-and-spoke topology, as shown in Figure 12. For instance, remote sites distributed across different regions can be aggregated into first-level hub sites per region, while the first-level hub sites connect to each other in a hub-and-spoke topology and to a second-level hub site, such as a centralized data center.

In Figure 12, each group of hub and spoke is allocated its own MPLS VPN, which can represent the grouping of sites based on geographic location. With an architecture like this, global enterprises can achieve a more structured network design and traffic flow between different sites and geographic regions. At the same time, service providers can easily provision this type of architecture by controlling routing information propagation among the different VPNs by controlling the import and export of RT values between the VPNs.

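[Figure 12 diagram: first-level hubs for VPN-1, VPN-2, and VPN-3 aggregate their sites (Site-1, Site-2, and Site-3 in VPN-1; Site-4 in VPN-2; Site-5 in VPN-3) and connect to a data center in VPN-X.]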
Figure 12: MPLS Multilevel Hub and Spoke

Extranet and Shared Services

In this particular design model, communication between one or more different VPN networks and a centralized VPN network is required. This is achieved in the same manner as the previous models: by controlling the import and export RT values. In this particular model, the central VPN must have import RT values matching the different export RT values of the VPNs that require access, as shown in Figure 13.

[Figure 13 diagram: sites of VPN-1 and VPN-2 with access to Site-X in VPN-X.]
Figure 13: MPLS L3VPN Extranet Topology

Figure 14 illustrates a detailed view of how the import and export of routes happen in a shared services VPN architecture.

[Figure 14 diagram: Site-X VPN-1 uses RD 600:1, RT import 600:1001 and 600:6006, and RT export 600:6000 and 600:1001. Site-X VPN-2 uses RD 600:2, RT import 600:2002 and 600:6006, and RT export 600:6000 and 600:2002. The shared services VPN-X uses RD 600:6, RT import 600:6000, and RT export 600:6006. No route leaking occurs between different VPNs.]
Figure 14: MPLS L3VPN Shared Services Connectivity Model

With this design, the shared services VPN will be accessed by both VPN-1 and VPN-2 without compromising routing and reachability separation requirements between these VPNs (no communication between VPN-1 and VPN-2). However, because the shared services VPN will have visibility on both VPNs’ routes, the IP prefixes of VPN-1 and VPN-2 must be unique; otherwise, VRF-aware Network Address Translation (NAT) should be deployed.
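Both properties described above, isolation between VPN-1 and VPN-2 and the need for unique prefixes, can be checked with a short sketch. The RT values are taken from Figure 14; the function names and the example prefixes are hypothetical.

```python
import ipaddress
from itertools import combinations

# RT policy per VPN, using the values shown in Figure 14.
vpns = {
    "VPN-1": {"import": {"600:1001", "600:6006"}, "export": {"600:6000", "600:1001"}},
    "VPN-2": {"import": {"600:2002", "600:6006"}, "export": {"600:6000", "600:2002"}},
    "Shared": {"import": {"600:6000"}, "export": {"600:6006"}},
}


def can_import(receiver, sender):
    """True if receiver imports at least one RT that sender exports."""
    return bool(vpns[receiver]["import"] & vpns[sender]["export"])


def overlapping(prefixes_by_vpn):
    """Return pairs of prefixes from different VPNs that overlap."""
    clashes = []
    for (vpn_a, nets_a), (vpn_b, nets_b) in combinations(prefixes_by_vpn.items(), 2):
        for a in nets_a:
            for b in nets_b:
                if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                    clashes.append((vpn_a, a, vpn_b, b))
    return clashes


print(can_import("Shared", "VPN-1"), can_import("VPN-1", "Shared"))  # True True
print(can_import("VPN-1", "VPN-2"), can_import("VPN-2", "VPN-1"))    # False False

# Hypothetical customer prefixes that the shared services VPN would import.
customer_prefixes = {
    "VPN-1": ["10.1.0.0/16", "172.16.1.0/24"],
    "VPN-2": ["10.1.5.0/24", "192.168.1.0/24"],
}
print(overlapping(customer_prefixes))
# [('VPN-1', '10.1.0.0/16', 'VPN-2', '10.1.5.0/24')]
```

A non-empty result from `overlapping` is the situation in which the text calls for VRF-aware NAT, because the shared services VPN cannot tell the clashing routes apart.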

The following are the most common scenarios used with this design model:

  • Management: Network operations center (NOC) management access. For example, the service provider can offer managed CE services to its clients. Accordingly, the NOC or management VPN of the service provider requires access to the relevant customer’s VPN to manage or monitor its CE routers.
  • Shared services: In general, shared service access refers to several services that have to be accessible from different MPLS L3VPN networks, such as file services, Voice over IP (VoIP) gateways, and hosted applications (Software as a Service [SaaS]) or Internet connection.
  • Extranet or business-to-business (B2B) communication: This scenario is common in modern enterprises where vendors and partners can share limited reachability between their networks to facilitate different types of communications, such as business-to-business telepresence.
  • Community communication: In this model, a centralized entity provides central services access via a common MPLS VPN cloud. An example of this is an educational system, where a centralized entity provides shared services access to different schools across different locations while maintaining traffic separation between the schools.
Note

Depending on the network environment, the number of prefixes to be exported and imported can be limited in a controlled manner. For example, in the case of managed services, only the loopback IP of each CE is exported from the customer VPN to the NOC/management VPN for monitoring and remote-access purposes. At the same time, controlling the number of exported and imported prefixes prevents the leaking of extra prefixes, which can lead to other issues such as exposing customer internal routes and unnecessary extra overhead on the PE nodes.

MPLS VPN Internet Access Design Options

The flexibility provided by MPLS-based infrastructures, specifically MPLS VPN, allows operators to offer more than basic private IP WAN connectivity to their customers. One of the primary services that today’s MPLS VPN architectures offer is Internet access as a value-added service, taking advantage of the flexible MPLS VPN architecture that uses the same infrastructure to provide various connectivity models and services.

Internet access for MPLS VPN customers can be provided using multiple design options, categorized as follows:

  • Non-MP-BGP VPN Internet routing
  • MP-BGP VPN-based Internet routing

Non-MP-BGP VPN Internet Routing

The primary concept of this design model is that Internet routes are carried across the carrier network using the global routing table.

Option 1: VRF-specific default route — This design option uses a static default route to redirect traffic from a customer VRF to the Internet gateway (Autonomous System Boundary Router [ASBR]) using the PE’s global routing table, as shown in Figure 15. For traffic in the other direction (Internet to VPN customers), a static route moves Internet traffic from the global routing table to the corresponding VRF/VPN per customer.

[Figure 15 diagram: CE to PE (VRF) to ASBR to Internet. A default route in the global routing table points toward the ASBR. NAT can be done at the CE side.]
Figure 15: MPLS L3VPN Internet Access Option 1

Option 2: Separate PE-CE subinterface — This design option uses the conceptual model of Option 1, but separates Internet traffic and IP VPN traffic over two physical links or subinterfaces (such as 802.1Q-based, or interface/DLCI-based in the case of FR CE-PE physical connectivity). Either a static route or BGP is used over the Internet link (the non-VRF interface) to propagate Internet routes (or default only) between a PE and the directly attached CE, as shown in Figure 16.

[Figure 16 diagram: the CE connects to the PE over subinterfaces (Dot1Q) or separate physical interfaces, one for the MPLS L3VPN interface (VRF) and one for Internet traffic. A default route or BGP in the global routing table reaches the ASBR and the Internet. NAT can be done at the CE side.]
Figure 16: MPLS L3VPN Internet Access Option 2

MP-BGP VPN Internet Routing

This design model relies primarily on MP-BGP VPN to carry Internet routes across the carrier network.

Option 3: Extranet with Internet-VRF — This design option is based on the extranet/shared VPN principle, where the Internet gateway (ASBR) installs Internet routes in its own VPN. This VPN has bidirectional communication with the customer’s VPN for customers requiring Internet access over the same link/VPN. Operators control the propagation of Internet routes or a default route into a customer’s VRF routing table via the import and export RT values, as shown in Figure 17. If full Internet routes are injected into the customer VPN, it is highly recommended to change the MPLS VPN label allocation mode to either per CE or per VRF to avoid the performance inefficiencies of the default per-prefix allocation model.

[Figure 17 diagram: CE to PE (VRF) to ASBR to Internet over MP-BGP VPN. A 0/0 route or full Internet routes are carried inside MP-BGP VPN. NAT can be done at the CE side.]
Figure 17: MPLS L3VPN Internet Access Option 3

Option 4: VRF-aware NAT — This design option also relies on MP-BGP VPN for customer Internet traffic. However, the Internet gateway (ASBR) is not placed in a shared VPN. Instead, a VRF/VPN is created at the ASBR per customer, each injected with a default route propagated to the corresponding VPN across the MPLS VPN network. VRF-aware NATing is performed on the outside (Internet-facing) interface for both ingress and egress Internet traffic. As a result, customers may retain their private IP addressing even if there is overlap with other customers’ IP addressing ranges, as shown in Figure 18.

[Figure 18 diagram: CE to PE (VRF) to ASBR to Internet over MP-BGP VPN. A 0/0 route is carried inside MP-BGP VPN. VRF-aware NAT is performed toward the Internet.]
Figure 18: MPLS L3VPN Internet Access Option 4
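The reason VRF-aware NAT tolerates overlapping customer addresses is that the translation table is keyed by the VRF as well as the inside address. The sketch below is a toy model under stated assumptions: the `VrfAwareNat` class, the VRF names, and the public pool (drawn from the 203.0.113.0/24 documentation range) are hypothetical, and real NAT also tracks ports and sessions.

```python
class VrfAwareNat:
    """Toy source NAT whose bindings are keyed by (VRF, inside address)."""

    def __init__(self, public_pool):
        self._pool = iter(public_pool)
        self._out = {}  # (vrf, inside_ip) -> public_ip
        self._in = {}   # public_ip -> (vrf, inside_ip)

    def translate_out(self, vrf, inside_ip):
        """Return the public address for a customer host, allocating on first use."""
        key = (vrf, inside_ip)
        if key not in self._out:
            public_ip = next(self._pool)
            self._out[key] = public_ip
            self._in[public_ip] = key
        return self._out[key]

    def translate_in(self, public_ip):
        """Map return traffic back to the owning VRF and inside address."""
        return self._in.get(public_ip)


nat = VrfAwareNat(["203.0.113.1", "203.0.113.2"])
a = nat.translate_out("CX-A", "10.1.1.10")
b = nat.translate_out("CX-B", "10.1.1.10")  # same inside address, different VRF

print(a, b)                 # 203.0.113.1 203.0.113.2
print(nat.translate_in(b))  # ('CX-B', '10.1.1.10')
```

Because the VRF is part of the key, return traffic is steered back into the correct customer VPN even though both customers use 10.1.1.10.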

Table 2 compares these four design options across key design dimensions.

Table 2: MPLS L3VPN Internet Access Design Options Comparison

Design Consideration | Option 1 | Option 2 | Option 3 | Option 4
PE manageability | Moderate | Simple | Simple | Simple
ASBR manageability | Simple | Simple | Moderate | Complex
CE simplicity | Simple | Simple | Moderate | Moderate
Scalability | High | High | Moderate* | High**
Supports overlapping customer IPs | No (requires public PI/PA) | No (requires public PI/PA) | No | Yes
Full Internet routes at the PE | Yes | May | May | No
Design concerns | Static routing may lead to configuration errors and operational complexity. | May add routing/BGP configurations to the CE side. | Several VRFs on the same PE, each with full Internet routes, can add significant load on the PE. | VPN customers do not receive full Internet routes.

* If the full Internet routing table is installed per VRF, there can be scalability limitations at the PE level.

** There might be limitations at the ASBR level (VRF routes or NATing entries/sessions).

The comparison in Table 2 might make it seem like one option is better than others for certain design requirements, but usually the network environment and the situation drive the design choices. Always consider business priorities, design constraints, and the targeted environment, in addition to the technical aspects, before making any design decision.

PE-CE L3VPN Routing Design

Designing routing between the PE and CE in an MPLS L3VPN environment can sometimes be challenging, whether in a service provider or an enterprise with a self-deployed MPLS L3VPN core. Although the CE and PE sides are most commonly managed by different teams, the routing design should be coordinated and aligned to achieve a successful design, because of dependencies in some scenarios that can impact the overall design and end-to-end communication.

PE-CE Routing Design Considerations

Network designers must follow a top-down approach during the planning phase to identify the goals and direction of the design. The following questions form the foundation for making a suitable design choice:

  • Business requirements:
    • Is it to reduce the cost of existing expensive links?
    • Is it to reduce the time to expand the business (for example, to add new remote sites quickly)?
    • Is it to increase the reliability of business applications with minimal cost (backup path)?
    • Is it to optimize ROI by using multiple links in a load-balancing/sharing manner?
  • PE-CE functional and application requirements:
    • Provide primary or backup WAN connectivity?
    • Provide WAN and Internet access?
    • Provide a primary path for applications requiring high bandwidth or strict end-to-end QoS?
    • Provide efficient bandwidth utilization over multiple paths (multihoming)?
    • Provide connectivity to remote areas with limited optical coverage (for example, over 4G/5G)?
  • Technical requirements (connectivity characteristics):
    • Is the CE single-homed or multihomed to the MPLS L3VPN network (PE)?
    • Is there any backdoor link from the CE side?
    • Is there any limitation to running a specific routing protocol (such as lack of staff knowledge, software limitations, or lack of SP support)?

PE-CE Routing Protocol Selection

Routing protocol selection is one of the critical parts of PE-CE design because each protocol has its strengths and weaknesses. As shown in Figure 19, a route received from one CE via a routing protocol is converted into an MP-BGP route, along with protocol-related information transformed into BGP extended community values. These values can be used at the remote egress PE when the route is reconverted into the original routing protocol (such as OSPF or EIGRP). Each routing protocol behaves differently in these scenarios.

[Diagram: at the ingress PE, the IGP route is converted into MP-BGP, and some IGP information is carried across the MPLS L3VPN backbone by MP-BGP in new extended communities; at the egress PE, MP-BGP is reconverted into the IGP along with the relevant carried IGP attributes.]
Figure 19: PE-CE Routing Principle
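
The round trip in Figure 19 can be sketched in a few lines. This is a minimal illustration rather than a router implementation: the `IgpRoute` fields and the dictionary keys that stand in for extended communities are hypothetical names, and real encodings differ per protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IgpRoute:
    """IGP route as seen on a PE-CE link (field names are illustrative)."""
    prefix: str
    protocol: str       # for example "ospf" or "eigrp"
    metric: int
    attributes: tuple   # protocol-specific (name, value) pairs, sorted by name


def to_mp_bgp(route: IgpRoute) -> dict:
    """Ingress PE: convert the IGP route into an MP-BGP route.

    Protocol-related information travels in extended communities,
    modelled here as a plain dict.
    """
    return {
        "prefix": route.prefix,
        "ext_communities": {
            "source_protocol": route.protocol,
            "metric": route.metric,
            **dict(route.attributes),
        },
    }


def from_mp_bgp(bgp_route: dict) -> IgpRoute:
    """Egress PE: rebuild the IGP route from the carried communities."""
    ext = dict(bgp_route["ext_communities"])
    protocol = ext.pop("source_protocol")
    metric = ext.pop("metric")
    return IgpRoute(bgp_route["prefix"], protocol, metric, tuple(sorted(ext.items())))


original = IgpRoute(
    "10.1.0.0/16", "ospf", 20,
    (("area", "0.0.0.2"), ("route_type", "intra-area")),
)
restored = from_mp_bgp(to_mp_bgp(original))
```

Because the attributes survive the trip, the egress PE can make protocol-specific decisions (for example, which OSPF LSA type to originate) instead of treating every VPN route as a plain external route.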

PE-CE Design Options and Recommendations

The following discussion uses the reference architecture in Figure 20 as the foundation for design considerations per routing protocol.

[Diagram: MPLS L3VPN core of P routers with PE-1, PE-2, and PE-3; customer sites are Remote Site-1 (CE-1), Data Center (CE-2, CE-3), and HQ (CE-4).]
Figure 20: PE-CE Connectivity Model Reference Architecture

Static Route PE-CE

Applying a static route to the reference architecture can lead to design limitations and operational complexities. For example, CE-2, CE-3, and CE-4 have multiple links (direct PE-CE and backdoor links), which can lead to limited scalability and flexibility, and managing these multiple edge devices with multiple links is associated with a high possibility of human (configuration) errors. However, a static route between CE-1 and PE-1 is a feasible design option because it is a single-homed site; even a single default route at CE-1 can be used, assuming Internet access is through the same link.

Link State: OSPF as a PE-CE Routing Protocol

OSPF is not commonly used for PE-CE routing by service providers, though it is applicable to self-deployed enterprise MPLS L3VPN environments. The design can be more complex with OSPF, especially when there are sites with backdoor links or multihomed sites. Figure 21 shows the implications of different OSPF area designs applied to the reference architecture.

[Diagram: two panels on the reference architecture. Scenario 1: areas labeled Area 2, Area 0, and Area 1. Scenario 2: Area 0 throughout.]
Figure 21: PE-CE Connectivity Model OSPF
Note

If you are unfamiliar with OSPF terms such as the DN bit, OSPF domain identifier, and OSPF sham link, refer to IETF RFC 4577 before reading this section.

In Scenario 1 (multi-area OSPF), each site or branch is deployed with its own OSPF area, and the service provider side is part of the super backbone (area 0). If the ingress and egress PEs run the customer-facing OSPF instance under different process IDs (and the OSPF domain identifier is derived from the process ID, as is common), the egress PE treats the routes as belonging to a different OSPF domain and advertises them to the CE as type 5 external LSAs. Because OSPF always prefers a type 3 (inter-area) route over a type 5 route, traffic between the data center and HQ will always prefer the backdoor link, as shown in Figure 22. This can be avoided by using a single matching OSPF process ID on the PEs, though this can add operational complexity for the service provider. Alternatively, the PE side can deploy the same OSPF domain identifier on both ingress and egress PEs along with OSPF cost metric tuning.

[Diagram: PE-2, PE-3, Data Center (CE-2, CE-3), and HQ (CE-4), with areas labeled Area 2 and Area 0. Labels: OSPF process ID mismatch (process ID 1 versus process ID 2); OSPF prefixes advertised as LSA Type 1 and 2; redistribute OSPF into MP-BGP; advertise via MP-BGP VPNv4/6; redistribute MP-BGP into OSPF; OSPF prefixes advertised as LSA Type-5; LSA Type-3 is always preferred over LSA Type-5.]
Figure 22: PE-CE Connectivity Model OSPF with Backdoor Link
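
The effect of a domain identifier mismatch can be sketched as a small decision function. This is a simplified reading of the behavior described in RFC 4577, not a complete implementation: NSSA handling (type 7) is ignored, and the function name and the string-valued domain identifiers are illustrative assumptions.

```python
def egress_lsa_type(route_domain_id: str, local_domain_id: str,
                    source_route_type: str) -> int:
    """Return the LSA type an egress PE would originate toward its CE.

    route_domain_id   -- domain identifier carried with the VPN route
    local_domain_id   -- domain identifier of the egress PE OSPF instance
    source_route_type -- "intra-area", "inter-area", or "external"
    """
    internal = source_route_type in {"intra-area", "inter-area"}
    if internal and route_domain_id == local_domain_id:
        return 3  # inter-area summary LSA
    return 5      # AS-external LSA


matched = egress_lsa_type("0.0.0.1", "0.0.0.1", "intra-area")
mismatched = egress_lsa_type("0.0.0.1", "0.0.0.2", "intra-area")
```

With matching identifiers the remote prefixes arrive as type 3 LSAs and can compete with the backdoor path on cost; with a mismatch they arrive as type 5 LSAs and lose to any inter-area route regardless of cost.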

OSPF incorporates multiple loop-prevention mechanisms, including the DN bit, the route tag, and the OSPF domain identifier. For instance, when PE-1 sets the DN bit on prefixes of Remote Site-1 (CE-1) and advertises them to CE-2, it prevents PE-2 from re-advertising the same prefix back into the MPLS VPN super backbone when sourced from CE-2 or CE-3. When any PE receives a type 3, 5, or 7 LSA from any CE with the DN bit set, the routing information from that LSA will not be considered in OSPF route computation and will not be converted into a BGP route, as shown in Figure 23.

[Diagram: Remote Site-1 (CE-1), Data Center (CE-2, CE-3), PE-1, PE-2, and P routers, with areas labeled Area 0 and Area 1. One PE redistributes the route into OSPF and sets the DN bit; because the DN bit is set, the route will not be re-injected into the super backbone again.]
Figure 23: PE-CE Connectivity Model OSPF Loop Prevention
Note

In hub-and-spoke topologies over MPLS L3VPN where OSPF is used as the PE-CE routing protocol, when two remote spokes communicate, LSAs from each spoke reach the central/hub PE and then the hub CE, looping back into a different VRF. Because these LSAs (type 3, 5, or 7) carry the DN bit, they will not be considered by the hub PE. The DN bit ignore feature (also known as capability VRF-lite) is required at the hub PE to disable DN bit checking, allowing the route to be considered when it loops back into a different VRF. A careful analysis is required before enabling this feature to avoid routing information loops.
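
The DN bit check can be sketched as follows. The `Lsa` class and function names are hypothetical, and the `ignore_dn_bit` flag is only a stand-in for disabling DN bit checking on a PE; it does not correspond to any specific CLI syntax.

```python
from dataclasses import dataclass

# LSA types on which a PE checks the DN bit, as described above.
DN_CHECKED_TYPES = {3, 5, 7}


@dataclass(frozen=True)
class Lsa:
    lsa_type: int
    prefix: str
    dn_bit: bool = False


def pe_redistribute_to_ce(prefix: str, lsa_type: int = 3) -> Lsa:
    """PE redistributes a VPN route into OSPF and sets the DN bit."""
    return Lsa(lsa_type, prefix, dn_bit=True)


def pe_accepts_from_ce(lsa: Lsa, ignore_dn_bit: bool = False) -> bool:
    """Decide whether a PE uses an LSA received from a CE.

    An LSA of a checked type with the DN bit set is skipped, so it is not
    used in route computation and is not converted into a BGP route.
    """
    if lsa.lsa_type in DN_CHECKED_TYPES and lsa.dn_bit and not ignore_dn_bit:
        return False
    return True


advertised = pe_redistribute_to_ce("10.1.0.0/16")
looped_back = pe_accepts_from_ce(advertised)
hub_override = pe_accepts_from_ce(advertised, ignore_dn_bit=True)
```

The second call models the hub PE case from the note: with checking disabled the looped-back LSA is accepted, which is exactly why that option needs careful analysis before it is enabled.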

However, the DN bit and route tag attributes will be stripped if the route is redistributed into another routing domain (such as EIGRP or RIP) and then redistributed back into OSPF. Multiple redistributions can lead to a routing loop. To avoid this, the service provider must ensure that routes redistributed from OSPF into other routing protocols are not redistributed back into OSPF. This can be achieved by using route maps to filter routes based on attributes such as route tags or by using route policies to prevent redistribution of certain routes.

In contrast, in Scenario 2 (all sites in OSPF area 0), all routes between the data center and HQ are seen as OSPF intra-area routes. Regardless of the WAN link cost metric, the backdoor will always be the preferred path because the route from the MPLS L3VPN will be seen as an inter-area or external route. To resolve this, the service provider must set up an OSPF sham link between the relevant PEs (PE-2 and PE-3 in this scenario) to create a logical intra-area link, as shown in Figure 24. After the sham link adjacency is established, the OSPF cost metric on the relevant interface can be manipulated to make the MPLS L3VPN link the preferred path.
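
The reason cost tuning alone does not help in Scenario 2 is that OSPF compares route types before it compares cost. A minimal sketch, assuming made-up cost values and a simplified three-level ranking (external type 1 and type 2 are not distinguished):

```python
# Lower rank wins; cost is only compared between routes of the same type.
ROUTE_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}


def best_path(paths):
    """Pick the preferred path from (name, route_type, cost) tuples."""
    return min(paths, key=lambda p: (ROUTE_TYPE_RANK[p[1]], p[2]))[0]


# Without a sham link, the MPLS path is inter-area and loses despite its cost.
without_sham = best_path([
    ("backdoor", "intra-area", 1000),
    ("mpls", "inter-area", 20),
])

# With a sham link, both paths are intra-area and cost decides.
with_sham = best_path([
    ("backdoor", "intra-area", 1000),
    ("mpls", "intra-area", 20),
])
```

Once the sham link makes both paths intra-area, the usual interface cost adjustments become effective again.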

Note

For the enterprise (CE side) to avoid reliance on the SP to set up a sham link, the OSPF area design can be migrated to use a unique OSPF area per site (for sites connected with a backdoor link), if this option is available.

Using OSPF as the PE-CE routing protocol can be challenging without close coordination between the PE and CE sides. Even with good coordination, OSPF imposes some design limitations:

  • If any CE needs to send a summary route to other CEs, this must be deployed by the provider/PE side.
  • OSPF offers very limited control to achieve detailed load sharing over multiple paths in data center multihoming scenarios. There is also no mechanism to influence the service provider’s route selection from the CE, unlike BGP AS-PATH prepending.
Note

IS-IS behaves similarly to OSPF when there is a backdoor link: prefixes redistributed from MP-BGP into IS-IS are seen as external routes, while the same prefixes received over the backdoor link are seen as internal routes, so the backdoor path is always preferred. BGP supports carrying critical IS-IS information as BGP extended communities, which can be converted back into an IS-IS LSP at the other PE. For more details, refer to IETF draft-sheng-isis-bgp-mpls-vpn.

EIGRP as a PE-CE Routing Protocol

Using EIGRP as a PE-CE routing protocol may offer a simpler design compared to OSPF if designed properly. MP-BGP carries specific EIGRP information in new BGP extended communities set by the ingress PE, including EIGRP ASNs and attributes such as delay, bandwidth, hop count, and reliability, which helps the receiving PE EIGRP instance to have relevant and usable routing information.

However, PE-CE EIGRP design requires special considerations in some scenarios to avoid undesirable behaviors. Figure 25 illustrates different EIGRP application scenarios using the same reference network architecture.

[Diagram: two panels on the reference architecture. Scenario 1: EIGRP ASNs labeled AS 2, AS 1, and AS 3. Scenario 2: AS 1 at all sites.]
Figure 25: PE-CE Connectivity Model EIGRP

In Scenario 1 in Figure 25, EIGRP is applied with different ASNs per site. Each site receives the EIGRP route of the other site as an external route because the ASN is carried in the BGP extended community. When redistributed from MP-BGP into EIGRP at the egress PE with an ASN mismatch, EIGRP installs the route as an external route. The concern arises where a backdoor link exists between sites because, unlike OSPF, EIGRP has no built-in loop-prevention mechanism for this case. This can lead to EIGRP information circulating in the looped topology and to unpredictable behavior due to a race condition, in which the accepted route depends on the timing of EIGRP and BGP updates.

Route racing in Scenario 1 can be mitigated to some extent by limiting the maximum EIGRP hop count, but this approach is difficult to tune and still allows undesirable looping behavior until the maximum hop count is reached. A more deterministic solution is EIGRP Site of Origin (SoO), a BGP extended community associated with the route. When redistributed from EIGRP into MP-BGP, any route assigned an SoO value will not be re-advertised over interfaces deployed with the same SoO value, helping to avoid routing loops in topologies with both MPLS VPN and backdoor links, as shown in Figure 26.

[Diagram: Data Center (CE-2, CE-3) and HQ (CE-4), labeled AS 2 and AS 1, with PE-1, PE-2, and PE-3; interfaces are marked SoO 1:3 or SoO 1:4 (HQ LAN). A route sourced from PE-3 with SoO 1:4 is carried in MP-BGP, and a route with SoO value 1:4 is stopped at an interface marked SoO 1:4. Callout: applying these SoO values to the highlighted links can offer optimized stability and loop prevention; however, it might break the redundancy over the backdoor link during some failure scenarios.]
Figure 26: PE-CE Connectivity Model EIGRP SoO

Table 3 summarizes the behavior of EIGRP SoO in different situations.

Table 3: EIGRP SoO Actions
Received Route SoO Details | Action
SoO value matches the SoO value on the sending or receiving interface. | The route will be filtered out.
CE deployed with an SoO value that does not match. | The route is added to the EIGRP topology table so that it can be redistributed into BGP and the SoO value preserved.
Does not contain an SoO value. | The route is accepted into the EIGRP topology table, and the SoO value from the interface used to reach the next-hop CE router is appended to the route before redistribution into BGP.
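
The three cases in Table 3 map directly onto a small decision function. This is a sketch only; the function name, the return shape, and the SoO strings are illustrative.

```python
from typing import Optional, Tuple


def eigrp_soo_action(route_soo: Optional[str],
                     interface_soo: str) -> Tuple[str, Optional[str]]:
    """Return (action, SoO value carried into BGP) for a received route."""
    if route_soo is None:
        # No SoO on the route: accept and attach the interface SoO.
        return ("accept", interface_soo)
    if route_soo == interface_soo:
        # Same site of origin: filter to break the loop.
        return ("filter", None)
    # Different SoO: accept and preserve the existing value.
    return ("accept", route_soo)


same_site = eigrp_soo_action("1:4", "1:4")
other_site = eigrp_soo_action("1:3", "1:4")
untagged = eigrp_soo_action(None, "1:4")
```

Only the matching case drops the route, which is why the choice of which links share an SoO value determines both where loops are broken and where redundancy can be lost.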

Although SoO can help mitigate route looping and racing issues, it may sometimes reduce redundancy. For example, if SoO values are applied on the backdoor link and the PE-3–CE-4 link goes down, traffic with SoO value 1:3 or 1:4 destined for the HQ will be isolated due to SoO filtering at the backdoor link, even though the backdoor link is available. Network designers must understand the design goals and priorities: redundancy with possible suboptimal routing versus stability with optimal routing. If the time required for EIGRP to stabilize following a failure is acceptable, a simple SoO design is sufficient, as shown in Figure 27, where SoO stops the information feedback loop faster than relying on hop count.

[Diagram: Data Center (CE-2, CE-3) and HQ (CE-4), labeled AS 2 and AS 1, with PE-1, PE-2, and PE-3 and interfaces marked SoO 1:3 or SoO 1:4. Numbered steps 1 to 4 follow a route carrying SoO 1:3; SoO filtering stops the feedback loop following the failure of CE-2.]
Figure 27: PE-CE Connectivity Model EIGRP Loop Prevention with SoO: Failure

In Scenario 2 in Figure 25, all sites share the same EIGRP ASN, so all routes learned over MPLS L3VPN and backdoor links are internal EIGRP routes. A common design concern is when the backdoor link is intended only as a backup path. For instance, the HQ LAN prefix is advertised in EIGRP to the MPLS VPN PE-3 and to CE-2/CE-3 over the backdoor link. PE-1 then has two BGP paths for the HQ LAN: the iBGP path via PE-2/PE-3, and the locally redistributed BGP route from EIGRP via CE-2. From PE-1’s perspective, the locally originated route is preferred by BGP best-path selection, causing traffic from CE-1 destined for the HQ to use the backdoor link as the primary path, as shown in Figure 28.

Figure 28: PE-CE Connectivity Model EIGRP Suboptimal Routing

A common solution is the BGP cost community, which influences BGP path selection. EIGRP routes injected into MP-BGP with a point of insertion (POI) value of 128, along with the cost set to the EIGRP composite metric, are considered before the typical BGP path selection algorithm (such as local preference and AS-PATH). The BGP cost community value ranges from 0 to 4,294,967,295, with a default of 2,147,483,647. When PEs redistribute EIGRP routes into BGP, the BGP cost community is populated with the accumulated EIGRP metrics, giving each PE visibility of the EIGRP path cost. As a result, PE-1 prefers the HQ LAN route advertised by PE-3 because the EIGRP metric via CE-2/CE-3 includes the accumulated backdoor link cost, while the iBGP path from PE-3 has a lower cost community value, as shown in Figure 29.

Figure 29: PE-CE Connectivity Model EIGRP Optimal Routing

The BGP cost community also helps optimize routing in scenarios where a route from a directly connected site should always be preferred. If the route from CE-1 is assigned a BGP cost community value lower than the default (2,147,483,647) at PE-1, it will always be preferred by PE-2 and PE-3 over any BGP advertisement without a cost community, regardless of other BGP attributes, as shown in Figure 30.
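
The cost community comparison described above can be sketched as a pre-bestpath step. The path records and metric values below are made up for illustration; only the value range and the default come from the text. Ties would fall through to the normal BGP best-path algorithm, which is not modelled here.

```python
DEFAULT_COST = 2_147_483_647   # assumed when a path carries no cost community
MAX_COST = 4_294_967_295


def effective_cost(path: dict) -> int:
    """Cost community value of a path, or the default when absent."""
    cost = path.get("cost_community", DEFAULT_COST)
    if not 0 <= cost <= MAX_COST:
        raise ValueError(f"cost community {cost} out of range")
    return cost


def pre_bestpath_winner(paths: list) -> str:
    """Prefer the lowest cost community before other BGP attributes."""
    return min(paths, key=effective_cost)["via"]


# PE-1 view of the HQ LAN prefix (metrics are illustrative).
paths_at_pe1 = [
    {"via": "CE-2 (local redistribution)", "cost_community": 28416},
    {"via": "PE-3 (iBGP)", "cost_community": 3072},
]
winner = pre_bestpath_winner(paths_at_pe1)

# A path with a cost below the default beats a path with no cost community.
mixed = pre_bestpath_winner([
    {"via": "no community"},
    {"via": "CE-1 route", "cost_community": 100},
])
```

Because this step runs before local preference and AS-PATH are evaluated, the locally originated route no longer wins automatically at PE-1.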

[Diagram: two panels with Remote Site-1 (CE-1), Data Center (CE-2, CE-3), PE-1, PE-2, and P routers, labeled AS 2 and AS 1. First panel: possible suboptimal routing without assigning a BGP cost community to CE-1 routes at PE-1. Second panel: optimal routing after assigning a BGP cost community to CE-1 routes at PE-1.]
Figure 30: PE-CE Connectivity Model EIGRP Suboptimal Routing-2
Note

The BGP cost community may cause BGP to behave in a way it was not designed for (like an IGP), which may lead to undesirable behaviors in some scenarios. Additionally, the BGP cost community does not support propagation over eBGP sessions, so it is not a valid solution when inter-AS communication is required to extend MPLS L3VPN reachability to a new remote site where the current service provider has no presence.

Consequently, using EIGRP as a PE-CE routing protocol may add simplicity for enterprises that already use EIGRP internally. However, when there are multihomed sites or sites with backdoor links, the design may prove too complicated, and overall flexibility and stability may be reduced. EIGRP Over the Top (OTP) can offer a more flexible PE-CE design that is independent of the service provider routing control.

BGP as a PE-CE Routing Protocol

BGP as a PE-CE routing protocol can achieve optimal routing and traffic control over the most complex connectivity layouts because it has multiple powerful attributes that can influence inbound and outbound path selection. The BGP SoO attribute uses the same concept as EIGRP SoO to control route propagation when there is a backdoor link between customer sites. When possible, it is always desirable to consider BGP as the PE-CE routing protocol for multihomed sites to single- or multiple-provider networks, as it facilitates advanced traffic engineering with flexible BGP policies and optimized ROI of available paths.

However, the BGP cost community does not propagate over eBGP sessions, making it invalid when inter-AS communication is required to extend MPLS L3VPN reachability, as shown in Figure 31.

[Diagram: the reference architecture with all sites in EIGRP AS 1 attached to PE-1, PE-2, and PE-3 in AS 100; an eBGP session for inter-AS connectivity links AS 100 to AS 300, which serves an additional remote site (CE-5, EIGRP AS 1).]
Figure 31: PE-CE Connectivity Model EIGRP: BGP Cost Community Limitation

MPLS VPN providers typically consider two BGP ASN allocation models when BGP is used as a PE-CE routing protocol, as shown in Figure 32.

  • Same ASN per site: The MPLS provider allocates the same ASN to all customer sites. The main advantage is reduced BGP ASN collisions, but it introduces design concerns for multihomed sites.
  • Unique ASN per site: The MPLS provider allocates each customer site a separate BGP ASN. This allows easy identification of prefix sources via the AS-PATH attribute but may introduce scalability limitations with regard to available ASNs.
[Diagram: two panels, each with an MPLS L3VPN provider in AS 500 and three PEs serving customers A and B. Same BGP ASN per site: customer A uses AS 64520 and customer B uses AS 65001 at every site. Unique BGP ASN per site: customer A uses AS 64520, 64521, and 64522; customer B uses AS 65001, 65002, and 65003.]
Figure 32: BGP ASN Allocation Models as an MPLS VPN PE-CE Routing Protocol

With the same ASN per site model, the service provider must rewrite the customer ASN (AS override) to overcome the default BGP loop-prevention mechanism and allow all sites with the same ASN to communicate. However, AS-PATH rewriting means CE routers cannot detect BGP looping. As shown in Figure 33, after Prefix X is received from CE-2 and AS override is applied, the AS-PATH becomes (300 300). When CE-1 receives this prefix, it accepts it because the original AS was removed from the AS-PATH, forming a route loop.

To overcome this loop, the BGP SoO extended community attribute is attached to BGP prefixes so that PEs can identify the actual prefix source. If the SoO value of a prefix equals the deployed SoO for a BGP peer, the prefix is stopped from being advertised. By applying this to the scenario in Figure 33, both PEs facing CE-1 and CE-2 configure the same SoO for their direct BGP peering. Route looping is stopped without impacting communication with any other remote site belonging to the same customer AS.

[Diagram: two panels with CE-1 and CE-2 in customer AS 65001 and three PEs in provider AS 300. Without SoO: Prefix P1 is advertised over eBGP with AS_PATH 65001; the PEs rewrite the AS path (as-override), the prefix is advertised onward with AS_PATH 300 300, and a loop forms. With SoO: SoO 65001 is assigned to the prefix; where the configured SoO value equals the SoO value attached to the prefix, the advertisement is stopped.]
Figure 33: BGP PE-CE Routing Protocol Loop Prevention
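
The interaction between AS override and SoO can be sketched as three small functions. The ASNs follow Figure 33 (customer AS 65001, provider AS 300); the function names and the SoO strings such as `65001:1` are hypothetical.

```python
from typing import List, Optional


def ce_accepts(as_path: List[int], local_asn: int) -> bool:
    """Default eBGP loop prevention: reject a path containing the local ASN."""
    return local_asn not in as_path


def pe_advertise(as_path: List[int], provider_asn: int, neighbor_asn: int,
                 as_override: bool = False) -> List[int]:
    """AS_PATH a PE sends to a CE, optionally applying AS override."""
    path = list(as_path)
    if as_override:
        # Replace the neighbor ASN with the provider ASN.
        path = [provider_asn if asn == neighbor_asn else asn for asn in path]
    return [provider_asn] + path   # the PE prepends its own ASN on eBGP


def pe_soo_allows(route_soo: Optional[str], peer_soo: str) -> bool:
    """Stop the advertisement when the route SoO equals the peer SoO."""
    return route_soo is None or route_soo != peer_soo


plain = pe_advertise([65001], 300, 65001)
overridden = pe_advertise([65001], 300, 65001, as_override=True)

accepted_plain = ce_accepts(plain, 65001)            # blocked by AS_PATH check
accepted_overridden = ce_accepts(overridden, 65001)  # AS_PATH check bypassed
sent_to_same_site = pe_soo_allows("65001:1", "65001:1")
sent_to_other_site = pe_soo_allows("65001:1", "65001:2")
```

AS override removes the information the CE relied on to detect the loop, and SoO restores an equivalent check on the PE, scoped to the site rather than to the whole customer AS.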

From a design perspective, rewriting the ASN along with SoO considerations adds complexity to the same ASN per site model. Additionally, BGP is not always a practical option: design constraints include BGP not being supported as a PE-CE protocol by the service provider, BGP not being supported by the CE node software, or a lack of BGP knowledge among enterprise IT staff.

Table 4 summarizes the strengths and weaknesses of each PE-CE routing protocol.

Table 4: Comparison of PE-CE Routing Protocols
Routing Protocol | Strengths | Weaknesses
Static | Simple and reliable when combined with IP SLA. Low operational complexity in small environments with a small number of prefixes. | Nonscalable. High operational complexity in large environments. Limited flexibility in multihomed scenarios with automatic failover limitations.
Link state | Reliable to a certain extent (supports built-in loop prevention). Supports multiple connectivity and flooding domain design scenarios. | High design and operational complexity. Limited flexibility in large environments with backdoor links and multihoming scenarios.
EIGRP | Reliable to a certain extent (topology dependent). EIGRP OTP can simplify and optimize CE-PE designs to a large extent. | High design and operational complexity and limited flexibility in large environments with multihoming scenarios. EIGRP SoO may lead to inefficient use of available paths or lack of redundancy in multihoming scenarios with backdoor links.
BGP | Most powerful and flexible protocol that can support all types of connectivity. | For multihomed sites with complex policies, requires advanced operational staff expertise. May not be supported by some low-end routers for very small remote sites.

Review Questions

3. What does EIGRP SoO specifically mitigate from a routing perspective?

  a. Route looping and racing issues when OSPF is used as a PE-CE routing protocol at sites that contain just an MPLS VPN link
  b. Route looping and racing issues when EIGRP is used as a PE-CE routing protocol at sites that contain both MPLS VPN and backdoor links
  c. Route looping and racing issues when OSPF is used as a PE-CE routing protocol at sites that contain both MPLS VPN and backdoor links
  d. Route looping and racing issues when EIGRP is used as a PE-CE routing protocol at sites that contain just an MPLS VPN link

b. EIGRP site of origin (SoO) is specifically used to help avoid or mitigate the impact of routing loops, and racing issues, in complex topologies leveraging EIGRP as a PE-CE routing protocol that contain both MPLS VPN and backdoor links.


4. What is used in OSPF for loop prevention in the super backbone when redistributing into OSPF as a PE-CE routing protocol?

  a. BGP SoO
  b. OSPF domain ID
  c. OSPF DN bit
  d. EIGRP SoO

c. The OSPF DN bit is set when routes are redistributed from BGP into OSPF. This bit is then checked when OSPF redistributes routes into BGP at another device. If the OSPF DN bit is set, those routes will not be redistributed back into the super backbone. BGP and EIGRP SoO are unrelated here. The OSPF domain ID is a mechanism to manually adjust the OSPF process ID and is normally used by the PE devices to ensure the OSPF neighbors are in the proper OSPF process.


Software-Defined Networks

With the advent of software-defined solutions, network designers now have more capabilities to leverage in an overarching network design to meet multiple business requirements. Inherently, a software-defined solution is more complex, but this complexity is obfuscated by the additional capabilities it provides, assuming it is designed, deployed, and functioning properly. The following sections cover SD-WAN and SD-LAN from a vendor-agnostic perspective, along with the corresponding design decisions and options around each solution.

SD-WAN

From a vendor-agnostic perspective, software-defined wide-area networking (SD-WAN) is composed of separate orchestration, management, control, and data planes:

  • Orchestration plane: Assists in the automatic onboarding of edge (spoke) routers into the SD-WAN overlay.
  • Management plane: Responsible for central configuration and monitoring.
  • Control plane: Builds and maintains the network topology and makes decisions regarding where traffic flows.
  • Data plane: Responsible for forwarding packets based on decisions from the control plane.

Figure 34 shows the different SD-WAN planes and how they interact with one another.

[Diagram: four planes. Orchestration plane; management plane (multitenant or dedicated) with an API; control plane; data plane (physical or virtual) with DC, office, campus, branch, and SOHO sites over Internet, MPLS, and cellular/5G transports. A secure control channel is also labeled.]
Figure 34: SD-WAN Solution Planes

SD-WAN Components

The primary components of the SD-WAN consist of a network manager, the controller, the orchestrator, and the edge router:

  • Network manager: A centralized network management system providing a GUI to monitor, configure, and maintain all SD-WAN devices and links in the underlay and overlay network.
  • Controller: A software-based component responsible for the centralized control plane of the SD-WAN fabric. It establishes a secure connection to each edge router, distributes routes and policy information, and orchestrates secure data plane connectivity between edge routers by distributing crypto key information, enabling a scalable, IKE-less architecture.
  • Orchestrator: A software-based component that performs initial authentication of edge devices and orchestrates controller and edge device connectivity. It also enables communication of devices behind NAT.
  • Edge routers: Devices at a physical site or in the cloud that provide secure data plane connectivity among sites over one or more WAN transports. They are responsible for traffic forwarding, security, encryption, QoS, and routing protocols such as BGP and OSPF.
Note

Depending on the specific vendor implementation of SD-WAN, these components and capabilities can be integrated into the same system or into dedicated individual systems. As a network designer, you will need to know when to leverage a solution like SD-WAN to solve the underlying business requirements.

Figure 35 depicts the different SD-WAN components and capabilities.

[Diagram: controller, manager, and orchestrator, with three edge routers and two transports labeled ISP-A and ISP-B.]
Figure 35: SD-WAN Components and Capabilities

SD-WAN Management Protocol

The management protocol manages the SD-WAN overlay network, running between the controllers and edge routers over a secure DTLS or TLS connection. Control plane information such as route prefixes, next-hop routes, crypto keys, and policy information is exchanged over this connection. The controller acts like a BGP route reflector: it receives routes from edge routers, processes and applies policy, then advertises routes to other edge routers. The default behavior with no policy defined is a full-mesh topology.
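
The route-reflector-like role of the controller can be sketched with a toy model. Everything here is hypothetical (class name, policy signature, site names); it only illustrates that routes are reflected to all other edges by default and that a policy can narrow that down.

```python
from typing import Callable, Dict, List, Optional

Route = Dict[str, str]
# Hypothetical policy signature: (receiving edge, route) -> route, or None to drop.
Policy = Callable[[str, Route], Optional[Route]]


class Controller:
    """Toy controller that reflects routes between edge routers."""

    def __init__(self, policy: Optional[Policy] = None) -> None:
        self.policy = policy
        self.routes: List[Route] = []

    def receive(self, edge: str, prefix: str) -> None:
        """Store a route advertised by an edge router."""
        self.routes.append({"origin": edge, "prefix": prefix})

    def advertise_to(self, edge: str) -> List[str]:
        """Return the prefixes that would be advertised to one edge."""
        prefixes = []
        for route in self.routes:
            if route["origin"] == edge:
                continue  # do not reflect a route back to its originator
            result = route if self.policy is None else self.policy(edge, route)
            if result is not None:
                prefixes.append(result["prefix"])
        return prefixes


def hub_and_spoke(edge: str, route: Route) -> Optional[Route]:
    """Example policy: spokes exchange routes only with the hub."""
    return route if "hub" in (edge, route["origin"]) else None


def build(policy: Optional[Policy] = None) -> Controller:
    controller = Controller(policy)
    controller.receive("hub", "10.0.0.0/24")
    controller.receive("spoke1", "10.1.0.0/24")
    controller.receive("spoke2", "10.2.0.0/24")
    return controller


full_mesh = build().advertise_to("spoke1")
restricted = build(hub_and_spoke).advertise_to("spoke1")
```

With no policy, `spoke1` learns both remote prefixes and can reach every site directly; with the example policy it learns only the hub prefix.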

The management protocol advertises three types of routes:

  • Management protocol routes: Prefixes learned from the local site (service side) of an edge router, originated as static, connected, OSPF, or BGP routes and redistributed into the management protocol. They advertise attributes such as TLOC information (similar to a BGP next-hop), origin, originator, preference, site ID, tag, and VPN. A management protocol route is only installed in the forwarding table if the TLOC to which it points is active.
  • TLOC routes: The logical tunnel termination points on edge routers connecting to a transport network. A TLOC route is uniquely identified by a three-tuple: system IP address, link color, and encapsulation. TLOC routes carry attributes such as TLOC private and public IP addresses, carrier, preference, site ID, tag, and weight. For a TLOC to be considered active on a particular edge router, an active BFD session must be associated with that TLOC.
  • Service routes: Represent services (firewall, IPS, application optimization, etc.) connected to the edge device local-site network and available for other sites via service insertion. These routes also include VPN labels sent to controllers to indicate what VPNs are serviced at a remote site.
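
The relationship between a route, its TLOC, and BFD state can be sketched as follows. The `Tloc` tuple mirrors the three-tuple described above; the function name, addresses, and colors are illustrative.

```python
from typing import NamedTuple, Set


class Tloc(NamedTuple):
    """A TLOC is identified by system IP, link color, and encapsulation."""
    system_ip: str
    color: str
    encapsulation: str


def installable(route_tloc: Tloc, tlocs_with_bfd_up: Set[Tloc]) -> bool:
    """A route is installed only if its TLOC is active (BFD session up)."""
    return route_tloc in tlocs_with_bfd_up


mpls_tloc = Tloc("10.255.0.1", "mpls", "ipsec")
inet_tloc = Tloc("10.255.0.1", "biz-internet", "ipsec")

# Same edge router, two transports; only the MPLS tunnel has BFD up.
active = {mpls_tloc}

via_mpls = installable(mpls_tloc, active)
via_inet = installable(inet_tloc, active)
```

Two TLOCs on the same edge router differ only in color here, which is enough to make them distinct next hops with independent liveness.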

Virtual Networks

In the SD-WAN overlay, virtual networks (VNs) provide segmentation, much like VRF instances. Each VN is isolated from other VNs and has its own forwarding table. An interface or subinterface is explicitly configured under a single VN and cannot be part of more than one VN. Labels in the management protocol route attributes and packet encapsulation identify the VN a packet belongs to. The VN number is a 4-byte integer with a value from 0 to 65530.
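
The segmentation rules above can be sketched with per-VN forwarding tables. The class and method names are hypothetical, lookups are exact-match only (no longest-prefix matching), and the interface and next-hop names are made up.

```python
VN_MIN, VN_MAX = 0, 65530


class EdgeRouter:
    """Toy edge router with one forwarding table per virtual network."""

    def __init__(self) -> None:
        self.interface_vn = {}   # interface name -> VN number
        self.tables = {}         # VN number -> {prefix: next hop}

    def bind(self, interface: str, vn: int) -> None:
        """Place an interface in exactly one VN."""
        if not VN_MIN <= vn <= VN_MAX:
            raise ValueError(f"VN {vn} outside {VN_MIN}-{VN_MAX}")
        if self.interface_vn.get(interface, vn) != vn:
            raise ValueError(f"{interface} already belongs to another VN")
        self.interface_vn[interface] = vn
        self.tables.setdefault(vn, {})

    def add_route(self, vn: int, prefix: str, next_hop: str) -> None:
        self.tables[vn][prefix] = next_hop

    def lookup(self, interface: str, prefix: str):
        """Look up a prefix in the table of the ingress interface VN."""
        return self.tables[self.interface_vn[interface]].get(prefix)


edge = EdgeRouter()
edge.bind("ge0/1", 10)
edge.bind("ge0/2", 20)
edge.add_route(10, "10.1.1.0/24", "tloc-a")
edge.add_route(20, "10.1.1.0/24", "tloc-b")   # overlapping prefix, other VN

try:
    edge.bind("ge0/1", 20)    # an interface cannot join a second VN
    rebind_rejected = False
except ValueError:
    rebind_rejected = True
```

The same prefix resolves to different next hops depending on the ingress interface, which is the isolation property that VNs share with VRFs.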

TLOC Extension

A common network setup at a site with two edge routers is for each edge router to be connected to just one transport, with links between the edge routers allowing each to access the opposite transport through a TLOC extension interface on the neighboring edge router. TLOC extensions can be separate physical interfaces or subinterfaces.

SD-WAN Policies

Policies are an important part of the SD-WAN architecture and are used to influence the flow of data traffic among edge routers in the overlay network. Policies apply to either control plane or data plane traffic and are configured centrally on the controllers or locally on the edge routers.

  • Centralized control policies: Operate on routing and TLOC information to customize routing decisions and determine routing paths through the overlay. Used for traffic engineering, path affinity, service insertion, and VPN topologies (full-mesh, hub-and-spoke, regional mesh, etc.). Application-aware routing is a centralized control policy that selects the optimal path based on real-time path performance characteristics for different traffic types.
  • Localized control policies: Enable routing policy at a local site, specifically through OSPF or BGP.
  • Centralized data policies: Influence the flow of data traffic based on IP packet header fields and VPN membership. Used for application firewalls, service chaining, traffic engineering, and QoS. Some centralized data policies affect handling on the edge device itself (such as application route policies or QoS classification); in these cases, configuration is downloaded to the controllers and relevant policy information is communicated to edge routers through the established secure connection.
  • Localized data policies: Allow data traffic to be handled at a specific site, such as ACLs, QoS, mirroring, and policing.
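
Application-aware routing, mentioned in the list above, can be sketched as a path choice driven by measured performance. This is an assumption-heavy illustration: the metric names, the SLA thresholds, and the tie-break on latency are all hypothetical and not taken from any specific product.

```python
from typing import Dict, List


def pick_path(paths: List[Dict], sla: Dict[str, float]) -> str:
    """Choose a transport for a traffic class.

    Paths that meet every SLA threshold are preferred; if none comply,
    all paths remain candidates. The lowest-latency candidate wins.
    """
    def meets_sla(path: Dict) -> bool:
        return all(path[metric] <= limit for metric, limit in sla.items())

    compliant = [p for p in paths if meets_sla(p)]
    candidates = compliant if compliant else paths
    return min(candidates, key=lambda p: p["latency_ms"])["name"]


# Measured path characteristics (made-up values).
paths = [
    {"name": "mpls", "loss_pct": 0.1, "latency_ms": 40, "jitter_ms": 2},
    {"name": "internet", "loss_pct": 2.5, "latency_ms": 25, "jitter_ms": 12},
]

voice_sla = {"loss_pct": 1.0, "latency_ms": 150, "jitter_ms": 5}
bulk_sla = {"loss_pct": 5.0, "latency_ms": 300, "jitter_ms": 50}

voice_path = pick_path(paths, voice_sla)
bulk_path = pick_path(paths, bulk_sla)
```

The same pair of transports yields different choices per traffic type, which is the point of evaluating paths against per-application requirements rather than a single routing metric.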

Review Questions

9. Which of the following SDN mechanisms is defined by the physical switches and routers that are part of the local-area network?

  a. Underlay
  b. SDN controller
  c. Overlay
  d. VN

a. The underlay network is defined specifically by the physical switches and routers in the LAN.


10. Which of the following is an SD-WAN overlay mechanism that provides segmentation much like a VRF?

  a. Underlay
  b. SDN controller
  c. Overlay
  d. VN

d. In the SD-WAN overlay, virtual networks (VNs) provide segmentation just like VRFs.


SD-LAN

Software-defined local area network (SD-LAN) is an evolution of existing campus LAN designs that introduces programmable overlays enabling easy-to-deploy network virtualization across the LAN, capable of supporting multiple enclaves. In addition to network virtualization, SD-LAN allows for software-defined segmentation and policy enforcement based on user identity, device, method of connectivity, and group membership. These capabilities provide a significant reduction in operational expenses and an increased ability to drive business assurance and outcomes quickly with minimal risk, at the cost of increased complexity and staff expertise.

SD-LAN Terminology

Similarly to SD-WAN, the SD-LAN architecture enables the use of virtual networks (overlay networks) running on a physical network (underlay network) to create alternative topologies to connect devices. The key terms used in SD-LAN are:

  • Underlay network
  • Overlay network
  • SD-LAN data plane
  • SD-LAN control plane

Figure 36 depicts a conceptual view of the underlay and overlay network concepts.

[Diagram: edge nodes attached to an underlay fabric with its own underlay control plane; encapsulation between edge nodes and an overlay control plane form the overlay.]
Figure 36: Conceptual Underlay and Overlay Networks

Underlay network — Defined by the physical switches and routers that are part of the LAN. All network elements must establish IP connectivity via a routing protocol. A well-designed Layer 3 foundation to the LAN edge is highly recommended to ensure performance, scalability, and high availability. End-user subnets are not part of the underlay but instead are part of the overlay. The underlay is typically a Layer 3 fabric without any Layer 2; all Layer 2 requirements can be achieved in the overlay.

Overlay network — Runs over the underlay to create a virtual network. Virtual networks isolate both data plane traffic and control plane behavior among the physical networks of the underlay. Virtualization is achieved by encapsulating user traffic over IP tunnels sourced and terminated at the boundaries of SD-LAN. Network virtualization extending outside of the SD-LAN is preserved using traditional technologies such as VRF-Lite, MPLS VPN, or SD-WAN. Multiple overlay networks can run across the same underlay to support multitenancy.
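The encapsulation step can be made concrete with a short sketch. It assumes VXLAN (RFC 7348) as the tunnel encapsulation, which is the encapsulation the fabric edge discussion later in this chapter refers to. The function names are illustrative only and are not taken from any product. The sketch packs the 8-byte VXLAN header, in which the 24-bit VXLAN network identifier (VNI) is what keeps one overlay's traffic separate from another's on the shared underlay.

```python
import struct

# "I" flag from RFC 7348: set when the VNI field carries a valid identifier.
VXLAN_FLAG_VNI_VALID = 0x08


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a 24-bit virtual network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Recover the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header; the outer UDP/IP headers are omitted here."""
    return build_vxlan_header(vni) + inner_frame


# Two tenants can send identical inner frames over the same underlay;
# only the VNI in the header tells the receiving edge which overlay they belong to.
frame = b"example-inner-ethernet-frame"
tenant_a = encapsulate(5000, frame)
tenant_b = encapsulate(5001, frame)
```

In a real fabric the result would additionally be wrapped in outer UDP and IP headers addressed between the tunnel endpoints; those layers are left out to keep the focus on the segmentation identifier.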

SD-LAN Components

Note

To properly illustrate the different capabilities and components of an SD-LAN solution, specific aspects of the Cisco SD-A solution are referenced. The goal is not to teach all aspects of Cisco SD-A but rather to highlight the capabilities that SD-LAN solutions should have and how a network designer can leverage them to meet business requirements. Cisco SD-A is a unique set of technologies, automation, and central control that doesn’t necessarily fit perfectly into the SD-LAN category; SD-LAN represents a subset of Cisco SD-A.

Figure 37 shows the different components of an SD-LAN (specifically, Cisco SD-A) solution.

[Figure labels: Border A, Border B, Border C, Edge nodes, External Network, Fabric SDN Controller, Fabric Policy and Identity Repository, Fabric Control-Plane Node, Fabric Intermediate Nodes (Underlay), Fabric Border Node]
Figure 37: SD-LAN Components: Cisco SD-A

Fabric Control-Plane Node

When leveraging LISP, the fabric control-plane node is based on the LISP Map-Server (MS) and Map-Resolver (MR) functionality combined on the same node, which can be instantiated on the fabric border node or a dedicated node. It enables:

  • Host tracking database (HTDB): A control repository of endpoint identifiers to fabric edge node bindings.
  • Map-Server (MS): Populates the HTDB from registration messages from fabric edge devices.
  • Map-Resolver (MR): Responds to map queries from fabric edge devices looking to determine the RLOC mapping information for a destination endpoint identifier.
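The three LISP functions above can be sketched as a toy in-memory model. This is a minimal illustration under stated assumptions, not the LISP protocol itself: the class and method names are hypothetical, message formats, timers, and authentication are ignored, and the database is keyed by a (instance ID, endpoint identifier) pair so that each virtual network has its own namespace.

```python
from typing import Dict, Optional, Tuple


class ControlPlaneNode:
    """Toy model of a combined Map-Server / Map-Resolver (names are hypothetical)."""

    def __init__(self) -> None:
        # Host tracking database: (instance ID, EID) -> RLOC of the fabric edge node.
        self._htdb: Dict[Tuple[int, str], str] = {}

    def map_register(self, instance_id: int, eid: str, rloc: str) -> None:
        """Map-Server role: record or refresh a binding sent by a fabric edge node."""
        self._htdb[(instance_id, eid)] = rloc

    def map_request(self, instance_id: int, eid: str) -> Optional[str]:
        """Map-Resolver role: return the RLOC for an EID, or None if unknown."""
        return self._htdb.get((instance_id, eid))


cp = ControlPlaneNode()
cp.map_register(4099, "10.1.1.10", "192.0.2.1")   # endpoint attaches behind edge 1
cp.map_register(4099, "10.1.1.10", "192.0.2.2")   # endpoint roams to edge 2
```

Because a later registration simply overwrites the earlier binding, the sketch also shows why a central mapping database lends itself to endpoint mobility: only one entry has to change when an endpoint moves between edge nodes.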

When leveraging MP-BGP EVPN as the fabric control plane, MP-BGP peering relationships are established between fabric edge nodes and BGP route reflectors (instantiated on the fabric border node or a dedicated node). The MP-BGP EVPN fabric control-plane node enables:

  • Learning of endpoint identifier Layer 2 and Layer 3 reachability information between fabric edge nodes, keeping bindings current across VXLAN overlays.
  • Minimizing network flooding through protocol-based host MAC/IP route distribution and ARP suppression on the local VXLAN tunnel endpoint (VTEP).
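The flooding reduction described above can be illustrated with a simplified model of ARP suppression. This is a sketch under assumptions: the class name, method names, and return values are hypothetical, and real VTEPs also handle aging, duplicate detection, and gratuitous ARP. The idea shown is only that a VTEP that has already learned a host's MAC/IP binding from the control plane can answer a local ARP request itself rather than flooding it across the overlay.

```python
from typing import Dict, Optional, Tuple


class Vtep:
    """Toy VTEP that answers ARP locally when the binding is already known."""

    def __init__(self) -> None:
        # (VNI, IP address) -> MAC address, populated from control-plane host routes.
        self._bindings: Dict[Tuple[int, str], str] = {}

    def learn_host_route(self, vni: int, ip: str, mac: str) -> None:
        """Install a MAC/IP binding received through the EVPN control plane."""
        self._bindings[(vni, ip)] = mac

    def handle_arp_request(self, vni: int, target_ip: str) -> Tuple[str, Optional[str]]:
        """Return ("reply", mac) if the VTEP can answer, else ("flood", None)."""
        mac = self._bindings.get((vni, target_ip))
        if mac is not None:
            return "reply", mac
        return "flood", None


vtep = Vtep()
vtep.learn_host_route(5000, "10.1.1.20", "00:00:5e:00:53:01")
known = vtep.handle_arp_request(5000, "10.1.1.20")
unknown = vtep.handle_arp_request(5000, "10.1.1.21")
```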

Fabric Edge Node

Fabric edge nodes are the equivalent of an access layer switch in a traditional campus design, implementing a Layer 3 access design with the following fabric functions:

  • Endpoint registration: After an endpoint is detected, it is added to a local HTDB and a LISP map-register message is issued to inform the control-plane node.
  • Mapping of the user to virtual network: Endpoints are placed into virtual networks by assigning them to a VLAN mapped to a LISP instance, done statically or dynamically using 802.1X. Additional policies can be assigned for segmentation and policy enforcement.
  • Anycast Layer 3 gateway: A common gateway (IP and MAC addresses) used at every node sharing a common subnet to provide optimal forwarding and mobility across different edge nodes.
  • LISP forwarding (LISP control plane): Fabric edge nodes query the Map-Resolver on the fabric control-plane node to determine the RLOC for the destination IP and encapsulate traffic in VXLAN. If the destination RLOC cannot be resolved, traffic is sent to the fabric border where the global routing table is used.
  • Routing-based forwarding (MP-BGP EVPN control plane): Fabric edge nodes participate in the VXLAN control plane as MP-iBGP EVPN peers to route reflectors, advertising end hosts behind all VTEPs.
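For the LISP case, the forwarding decision made at a fabric edge node can be sketched as follows. The function name, the local map-cache, and the reason strings are assumptions made for illustration; the text above only states that the edge resolves the destination RLOC and falls back to the fabric border when no mapping exists.

```python
from typing import Callable, Dict, Optional, Tuple


def choose_tunnel_destination(
    dest_ip: str,
    map_cache: Dict[str, str],
    map_request: Callable[[str], Optional[str]],
    border_rloc: str,
) -> Tuple[str, str]:
    """Return (RLOC to tunnel toward, reason) for a destination IP."""
    # 1. Reuse a previously resolved mapping if one is cached locally.
    if dest_ip in map_cache:
        return map_cache[dest_ip], "map-cache hit"
    # 2. Otherwise ask the control-plane node for the mapping.
    rloc = map_request(dest_ip)
    if rloc is not None:
        map_cache[dest_ip] = rloc
        return rloc, "resolved by control-plane node"
    # 3. No mapping: hand the traffic to the border, the fabric's exit point.
    return border_rloc, "unresolved, sent to border"


# A plain dict stands in for the control-plane node's database in this sketch.
htdb = {"10.1.1.10": "192.0.2.1"}
cache: Dict[str, str] = {}
first = choose_tunnel_destination("10.1.1.10", cache, htdb.get, "192.0.2.254")
second = choose_tunnel_destination("10.1.1.10", cache, htdb.get, "192.0.2.254")
external = choose_tunnel_destination("198.51.100.7", cache, htdb.get, "192.0.2.254")
```

The design point is that the edge node never needs a full routing table for the fabric: it holds only the mappings it has actually used, plus a default toward the border.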

Fabric Intermediate Node

Fabric intermediate nodes are part of the Layer 3 network interconnecting edge nodes to border nodes. In a three-tier campus design, they are the equivalent of a distribution switch. Fabric intermediate nodes only route IP traffic inside the fabric; no VXLAN encapsulation/de-encapsulation or LISP control-plane messages are required.

Fabric Border Node

Fabric border nodes serve as the gateway between the fabric domain and the network outside of the fabric, responsible for network virtualization interworking from the campus fabric to the rest of the network. They implement the following functions:

  • Advertisement of IP subnets: Runs an IGP or BGP to advertise IP prefixes outside of the fabric. These IP prefixes appear only on the routing tables at the border; throughout the rest of the fabric, IP information is accessed using the fabric control-plane node.
  • Fabric domain exit point: The gateway of last resort for fabric edge nodes.
  • Mapping of LISP instance to VRF: Extends network virtualization from inside the campus fabric to outside by using external VRF instances to preserve virtualization.

Fabric Policy and Identity Repository

The fabric policy and identity services are leveraged for dynamic user/endpoint-to-group mappings and policy definition. These policy types include:

  • Access policy: Determines how the user or endpoint is authenticated and authorized onto the campus fabric. Active Directory is typically used as the identity repository, with 802.1X and network access control solutions as the authentication and authorization mechanism. Based on these policies, endpoints can be placed into virtual networks and/or security groups (SGs), with group context propagated using security group tags (SGTs), as described in the draft-smith-kandula-sxp Internet-Draft (an IETF draft rather than a ratified standard).
  • Network segmentation policy: Determines to which network overlay a user or endpoint should be assigned, dynamically or statically, based on virtual networks and security group information. Leveraging both virtual networks and security groups allows for two levels of segmentation for role-based access control (RBAC), with the security group segmentation layer referred to as micro-segmentation.
  • Access control policy: Rules and policies governing who can access what. Role-based access control policies are enforced using security group access control lists (SGACLs) for segmentation within VNs and dynamic VLAN assignment for mapping endpoints into VNs at the fabric edge node. Group-based policies simplify access control rules and end-to-end security policy enforcement by decoupling user identity from the network design.
  • Application policy: Policies enforced based on traffic treatment such as QoS for applications and path optimization.
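The two levels of segmentation described above can be sketched as a small policy evaluator. Everything named here is hypothetical: the class, the group names, and in particular the default-deny behavior, which is an assumption of this sketch rather than a statement about any product's default. The first check models virtual network separation, and the second models a group-to-group matrix in the spirit of SGACLs.

```python
from typing import Set, Tuple


class SegmentationPolicy:
    """Toy two-level policy: virtual network first, then security groups."""

    def __init__(self) -> None:
        # Permitted (source group, destination group) pairs within a VN.
        self._permitted: Set[Tuple[str, str]] = set()

    def permit(self, src_group: str, dst_group: str) -> None:
        """Add a one-way permit rule from one security group to another."""
        self._permitted.add((src_group, dst_group))

    def evaluate(self, src_vn: str, src_group: str, dst_vn: str, dst_group: str) -> str:
        # Level 1: endpoints in different virtual networks are isolated.
        if src_vn != dst_vn:
            return "deny"
        # Level 2: within a VN, consult the group-to-group matrix.
        # Default deny is an assumption made for this sketch.
        return "permit" if (src_group, dst_group) in self._permitted else "deny"


policy = SegmentationPolicy()
policy.permit("employees", "printers")
same_vn_allowed = policy.evaluate("corp", "employees", "corp", "printers")
same_vn_blocked = policy.evaluate("corp", "printers", "corp", "employees")
cross_vn = policy.evaluate("corp", "employees", "guest", "printers")
```

Note that the rules refer only to group names, never to IP addresses or VLANs, which is the sense in which group-based policy decouples user identity from the network design.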

Fabric SDN Controller

The fabric SDN controller oversees the configurations and operations of its network elements, including the configuration of fabric elements and policies associated with users, devices, and endpoints as they connect to the network. The controller offers a network abstraction layer to arbitrate the specifics of various network elements toward the orchestration and analytics engines, and exposes northbound REST-based APIs that abstract out the network functionality and services available at a network level.

Review Questions


Summary

This chapter covered the different design options and considerations of the forwarding and control plane mechanisms of MPLS, MP-BGP, and software-defined networking. These services have become primary business enablers for enterprises by meeting customer connectivity requirements, whether at Layer 3 or Layer 2. Automation within the software-defined solutions discussed in this chapter brings new capabilities to businesses, allowing staff to focus on business initiatives rather than day-to-day operations and maintenance of the infrastructure.

In addition to the design models and considerations, this chapter covered the different design approaches that offer a scalable design to support enterprise businesses in this modern software-defined world with a very large number of nodes and prefixes. The design decision of selecting a certain design approach or protocol must be based on a holistic approach, to avoid designing in isolation of other parts of the network, regardless of whether it is for a physical, virtual, underlay, or overlay entity.
