Enterprise WAN Architecture Design
Overview
The enterprise WAN comprises the modules and components that provide efficient and secure communication between enterprise locations (campuses, data centers, and remote sites). This chapter covers design options and considerations for the WAN modules.
Three main topics are covered:
- Enterprise WAN Module: Critical WAN connectivity topics and network design elements
- WAN Virtualization and Overlays Design Considerations and Techniques: How the enterprise can extend network virtualization across the WAN
- Enterprise WAN Migration to MPLS VPN Considerations: Example migration steps for WAN migration to MPLS VPNs
Enterprise WAN Module
The WAN module is the gateway of the enterprise network to remote and regional sites. As part of the modular enterprise network architecture, it aggregates and houses all WAN/MAN edge devices that extend the enterprise network to remote sites using different transport media types and technologies.
Enterprises require a WAN design that gives remote sites a consistent resource access experience, with sufficient performance and reliability. As organizations move into multinational or global markets, they require:
- Flexible network design that reduces the time needed to add new remote sites
- Support for emerging business applications and communications
- Consistent quality of experience (QoE) for users whether at HQ or a remote site
- Ability to scale bandwidth or add new sites and resilient links without major architectural changes
The primary WAN module can be co-resident physically within the data center block or at the primary enterprise campus. In both cases, the WAN module architecture itself is the same. The aggregation layer connects to the core layer of either the enterprise campus or data center.
WAN Transports: Overview
Enterprise IT leaders are primarily concerned with managing costs and maintaining reliable WAN infrastructures. The most common WAN transport considerations are:
- Interconnectivity: Connect geographically dispersed enterprise locations and remote sites
- Security: Protect enterprise traffic over the WAN to offer the desired end-to-end level of protection and privacy
- Cost-effectiveness and reliability: Flexible and reliable transport that meets business objectives, supports critical application requirements, and converges voice, data, and video
- Business evolution support: Agility and scalability to meet current and projected growth of remote sites with flexible bandwidth rates
These factors are generic and common concerns for most enterprises but can vary from business to business. For instance, many businesses have no concern about unsecured IP communications over their private WAN.
Today’s ISPs offer dramatically enhanced Internet bandwidth and price performance with improved service reliability. The Internet with secure VPN overlay is now adopted by many businesses as either their primary or redundant WAN transport.
There are multiple WAN topologies and transport models an enterprise can choose from, such as point to point, hub and spoke, and any to any. Traffic over each topology can be carried over Layer 2 or Layer 3 WAN transports, either over a private WAN network such as MPLS or overlaid over the Internet. Each model has its own strengths and weaknesses, and network designers must understand all aspects of each WAN transport to select the right solution for the business, application, and functional requirements.
Modern WAN Transports (Layer 2 Versus Layer 3)
The decision to select L2 or L3 for the enterprise WAN transport is a business-driven design decision. Some factors that drive the decision include business, functional, and application requirements, along with the enterprise WAN layout and design constraints. In most cases, Layer 3 may be a better starting place, with the decision verified or changed to Layer 2 based on identified requirements.
- Layer 3 MPLS VPN: Enables true any-to-any connectivity between any number of sites without a full mesh of circuits or routing adjacencies. The SP exchanges routing information with enterprise WAN edge routers and forwards packets based on Layer 3 (IP). Each enterprise is assigned its own VPN within the SP MPLS network.
- Layer 2 VPN: The provider has no participation in the enterprise Layer 3 WAN control plane. The provider forwards traffic based on Layer 2 information such as Ethernet MAC addresses.
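The scaling difference between the two models can be sketched with simple counting. This is a minimal illustration, not a sizing tool: the function names are hypothetical, and it assumes every site has one edge router and that all sites on a multipoint Layer 2 service share a single routed segment.

```python
def full_mesh_circuits(sites: int) -> int:
    """Point-to-point circuits needed to fully mesh `sites` locations."""
    return sites * (sites - 1) // 2


def shared_segment_neighbors(sites: int) -> int:
    """Routing neighbors each edge router sees when all sites share
    one multipoint Layer 2 segment (assumption: one router per site)."""
    return max(sites - 1, 0)


def l3vpn_sessions_per_router(links_to_provider: int = 1) -> int:
    """With Layer 3 MPLS VPN, an edge router peers only with the PE
    on each of its provider links."""
    return links_to_provider


for n in (10, 100, 500):
    print(
        f"{n} sites: full mesh = {full_mesh_circuits(n)} circuits, "
        f"shared L2 segment = {shared_segment_neighbors(n)} neighbors per router, "
        f"L3VPN = {l3vpn_sessions_per_router()} session per link"
    )
```

The quadratic growth of the full mesh and the linear growth of neighbors on a shared segment are the reasons Layer 3 MPLS VPN is usually the easier starting point for large site counts.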
Layer 2 MPLS-Based WAN
Metro Ethernet (ME) services are one of the most common Layer 2 WANs used by large enterprises. ME-based L2 WAN offers two primary connectivity models:
- E-Line (EVPL): Point-to-point service over Ethernet (Fast Ethernet or Gigabit Ethernet), with multiple virtual circuits over one physical link using VLANs as a service identifier
- E-LAN (VPLS): Multipoint or any-to-any connectivity with high flexibility for the enterprise WAN
E-Tree is another ME connectivity model. It is a variation of E-LAN providing hub-and-spoke connectivity.
From an enterprise perspective, ME services appear either like a LAN switch (multipoint L2VPN) or a simple passthrough link (point-to-point L2VPN). Figure 1 illustrates the Layer 2 WAN MPLS-based architecture, and Table 1 summarizes its advantages and limitations.
| Advantages | Limitations |
|---|---|
| Bandwidth scalability: Scales from 1 Mbps to 100 Gbps | Limited access coverage: ME service may not be available in all locations |
| Performance and QoS: Low-latency, low-jitter transport suitable for converged networks; supports end-to-end QoS CoS/DSCP classes per SLA | Scalability concerns: Large-scale networks with many remote sites over a common E-LAN face control plane limitations (e.g., large number of routing adjacencies) |
| Routing control: Full enterprise control over WAN routing design, implementation, and operations; freedom to choose routing protocol and deploy WAN network virtualization | Routing topology limitations: Routing protocol limitations on certain topologies (e.g., full-mesh VPLS with OSPF requires special care) |
| Service offering availability: SPs globally are moving toward ME services | Staff knowledge: Enterprise staff may lack expertise to design and operate large-scale WAN routing (EIGRP, BGP) |
| Topology flexibility: Supports point-to-point, point-to-multipoint, and multipoint-to-multipoint layouts | |
| Service flexibility: Can run any advanced IP/non-IP services (IPv6, multicast) without SP dependency | |
| Cost-effective: Replicates the Ethernet cost model to the WAN | |
Layer 3 MPLS-Based WAN
MPLS L3VPN enables enterprise customers to route traffic across the SP cloud as a transit L3 WAN network with a simplified “one-hop” single routing session per link between the enterprise WAN edge router and the PE router. The SP offloads all enterprise WAN control plane complexities, resulting in significant OPEX savings and faster time to add new remote sites, especially important for enterprises with hundreds or thousands of sites. Figure 2 shows the Layer 3 MPLS-based WAN architecture, and Table 2 summarizes its advantages and limitations.
| Advantages | Limitations |
|---|---|
| Bandwidth scalability: Flexible bandwidth capacity with smooth upgrades | Cost of bandwidth: L2VPN (ME) offers higher bandwidth scale (e.g., 10-Gbps wire rates) at lower cost |
| Performance and QoS: Consistent end-to-end DSCP-driven QoS, especially for real-time traffic such as VoIP | QoS complexity: QoS re-marking may be required to comply with different SP policies |
| Cost-effective: Reduces OPEX by eliminating the need to operate a large routed WAN core | IP addressing: Migration from legacy L2 WAN (e.g., Frame Relay) requires re-addressing of WAN interfaces |
| Service offering availability: Most SPs globally offer MPLS L3VPN | SP dependencies: Deploying new services (IPv6, multicast) requires SP support |
| Topology flexibility: Supports point-to-point, point-to-multipoint, and any-to-any | Flexible topology cost: Non-standard layouts (e.g., hub and spoke) may require additional VPNs provisioned by the SP at extra cost |
| Access flexibility: Can be provisioned with any access media type (Ethernet, WiMAX, VPN over Internet, 4G/5G) | LAN extension: L3 WAN does not natively allow Layer 2 extensions; overlay technologies (e.g., L2VPN over GRE over L3VPN) add complexity and potential fragmentation/serialization delay |
| Routing simplicity: Only one routing peer/session per link to maintain | |
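The QoS re-marking limitation listed in Table 2 can be illustrated with a small mapping. This is a sketch under stated assumptions: the enterprise class names, the four-class SP model, and the class-to-class mapping are all hypothetical. Only the DSCP code points themselves (EF = 46, AF41 = 34, AF31 = 26, CS3 = 24, AF21 = 18, AF11 = 10, CS1 = 8) are standard values.

```python
# Hypothetical enterprise QoS classes and the DSCP values they use.
ENTERPRISE_DSCP = {
    "voice": 46,               # EF
    "interactive-video": 34,   # AF41
    "call-signaling": 24,      # CS3
    "transactional-data": 18,  # AF21
    "bulk-data": 10,           # AF11
    "scavenger": 8,            # CS1
    "best-effort": 0,
}

# Hypothetical four-class SP offering and the DSCP each class expects.
SP_CLASS_DSCP = {"realtime": 46, "critical": 26, "business": 18, "default": 0}

# Assumed mapping of enterprise classes onto the SP classes.
ENTERPRISE_TO_SP = {
    "voice": "realtime",
    "interactive-video": "critical",
    "call-signaling": "critical",
    "transactional-data": "business",
    "bulk-data": "default",
    "scavenger": "default",
    "best-effort": "default",
}

# Precompute DSCP-in -> DSCP-out for the WAN edge egress policy.
REMARK_TABLE = {
    ENTERPRISE_DSCP[cls]: SP_CLASS_DSCP[sp_cls]
    for cls, sp_cls in ENTERPRISE_TO_SP.items()
}


def remark_for_sp(dscp: int) -> int:
    """Return the DSCP to send toward the SP; unknown values fall to default."""
    return REMARK_TABLE.get(dscp, SP_CLASS_DSCP["default"])


print(remark_for_sp(34))  # interactive video is re-marked to the SP "critical" class
```

With two providers the table is simply duplicated per provider, which is where the extra design and operational effort comes from.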
Internet as WAN Transport
Despite being a best-effort transport that lacks end-to-end QoS support, the modern Internet offers relatively high reliability and high-speed connectivity at low cost. Many enterprises are increasingly hosting services in the cloud and embracing SaaS applications such as Cisco WebEx and Microsoft Office 365, shifting traffic patterns toward the Internet.
The Internet can be a reasonable and cost-effective choice for remote sites as a primary transport when other WAN options are not feasible or when WAN access coverage is limited. This design relies on VPN tunneling (overlay) techniques to connect remote sites to the hub site over the Internet. The preferred approach is hub-and-spoke connectivity using DMVPN, which also offers any-to-any (direct spoke-to-spoke) flexibility. Point-to-point tunneling mechanisms such as classical IPsec and GRE are still viable options, though with typical P2P scalability limitations. Figure 3 depicts this model, and Table 3 summarizes its advantages and limitations.
The connectivity to the Internet can be either directly via the enterprise WAN module or through the Internet edge module. This decision is usually driven by the enterprise security policy. For instance, there might be a dedicated DMZ for VPN tunnel termination at the Internet edge with a backdoor link to the WAN distribution block.
| Advantages | Limitations |
|---|---|
| Low cost: High bandwidth at low cost | Reliability: Cannot satisfy the strict service reliability required by some businesses; business-grade SLAs are available but at higher cost |
| Split tunneling: Offload traffic destined to the Internet (e.g., public cloud SaaS) via direct Internet access, reducing load on private WAN paths | Consistent QoS: Best-effort IP transport; enterprises cannot maintain true end-to-end service differentiation and consistent QoS |
| Ubiquitous connectivity: Can be provisioned over various media types (wireless, LTE, 4G/5G, DSL, Ethernet) | Operations complexity: Overlay mechanisms (GRE, DMVPN, IPsec) add troubleshooting and provisioning complexity, especially with advanced services like IP multicast |
| Faster time to install: Internet provisioning is quicker than other WAN services, enabling large enterprises with many small remote sites to accelerate time to market | |
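The split tunneling advantage in Table 3 amounts to a per-destination forwarding decision at the remote site. The following is a minimal sketch of that decision, assuming the enterprise uses only RFC 1918 private prefixes internally. The prefix list, function name, and path labels are hypothetical.

```python
import ipaddress

# Assumption: everything inside these prefixes is reached over the VPN overlay.
ENTERPRISE_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def egress_path(destination: str) -> str:
    """Choose the tunnel for enterprise destinations, direct Internet otherwise."""
    address = ipaddress.ip_address(destination)
    if any(address in network for network in ENTERPRISE_PREFIXES):
        return "vpn-tunnel"
    return "direct-internet"


print(egress_path("10.20.30.40"))   # internal application
print(egress_path("203.0.113.10"))  # public destination, e.g. a SaaS service
```

Whether direct Internet access is allowed at all is a security policy decision. Without split tunneling, every destination would return the tunnel path.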
WAN Transport Models Comparison
| | MPLS L2VPN WAN | MPLS L3VPN WAN | Internet as WAN |
|---|---|---|---|
| Bandwidth | Very flexible (1 Mbps to 100 Gbps) | Flexible (less than L2 MPLS) | Flexible with limitations depending on site location and connectivity type (DSL vs 4G vs 5G) |
| WAN core routing control | Enterprise managed | SP managed | Enterprise managed |
| Cost | Moderate | Usually more expensive than L2, especially at high bandwidth | Cheap |
| CoS | Depends on SP; can support CoS based on L2 marking and DSCP | End-to-end Layer 3 CoS (DSCP-based) | End-to-end QoS guarantee not supported (only at network edge) |
| Staff experience | Requires experienced staff to design and manage core WAN routing | High-level routing expertise not required for the WAN | Requires experienced staff for WAN routing and overlay VPN setup |
| Remote site scalability | Routing/adjacency issues with large number of sites | Can support very large scale of remote sites | Scalable to some extent (limited by WAN router hardware capability, e.g., supported VPN sessions) |
| Site physical connectivity | Limited options (e.g., legacy Frame Relay, ME) | Very flexible, supporting any access type (legacy, Ethernet, VPN over Internet to SP MPLS) | Flexible (DSL, Ethernet, 4G/5G) |
Table 4 provides a side-by-side comparison of the three WAN transport models. Network designers should consider the following questions during the WAN transport selection planning phase:
- Who is responsible for the core WAN routing management?
- Who manages the customer edge (CE) WAN devices?
- How critical is WAN connectivity to the business? What is the impact of an outage in terms of cost and functions?
- What is the number of remote sites and the projected growth percentage?
- Are there any budget constraints?
- What WAN capabilities are required to transport business applications with the desired experience (QoS, IP multicast, IPv6)?
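Some of these questions map directly onto rows of Table 4, so a first-pass shortlist can be produced mechanically. The sketch below encodes three rows of the table. The attribute names and the simplified values are assumptions made for illustration, and a real selection would weigh cost, staff experience, and site connectivity as well.

```python
# Simplified encoding of three Table 4 rows (values are paraphrased).
TRANSPORT_MODELS = {
    "MPLS L2VPN WAN": {
        "core_routing": "enterprise",
        "end_to_end_cos": "sp-dependent",
        "very_large_site_scale": False,
    },
    "MPLS L3VPN WAN": {
        "core_routing": "sp",
        "end_to_end_cos": "yes",
        "very_large_site_scale": True,
    },
    "Internet as WAN": {
        "core_routing": "enterprise",
        "end_to_end_cos": "no",
        "very_large_site_scale": False,
    },
}


def shortlist(**required):
    """Return the transport models whose attributes match every requirement."""
    return [
        name
        for name, attributes in TRANSPORT_MODELS.items()
        if all(attributes.get(key) == value for key, value in required.items())
    ]


print(shortlist(core_routing="enterprise"))
print(shortlist(very_large_site_scale=True))
```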
Review Questions
1. Which WAN transport model would allow customers to roll out new services such as IPv6 or multicast most rapidly?
- a. MPLS L2VPN WAN
- b. MPLS L3VPN WAN
- c. Internet as WAN
- d. Internet as transport
Answer: a. An MPLS L2VPN WAN allows a customer to roll out new transport technologies and services like IPv6 or multicast rapidly without having to wait on the provider to make any changes.
2. Which of the following WAN transport models should be chosen if the design decision calls for controlling the number of routing neighborships deterministically?
- a. MPLS L2VPN WAN
- b. MPLS L3VPN WAN
- c. Internet as WAN
- d. Internet as transport
Answer: b. An MPLS L3VPN WAN controls the number of routing neighborships deterministically (one or two BGP neighbors), while an MPLS L2VPN WAN can have hundreds of IGP neighbors across the same link.
3. Which of the following WAN transport models should be chosen if the design decision calls for the cheapest WAN solution?
- a. MPLS L2VPN WAN
- b. MPLS L3VPN WAN
- c. Internet as WAN
- d. Internet as transport
Answer: c. Of the options provided, leveraging the Internet as a WAN is the cheapest solution.
4. Which of the following WAN transport models should be chosen if the design decision requires remote site scalability for a very large number of remote sites?
- a. MPLS L2VPN WAN
- b. MPLS L3VPN WAN
- c. Internet as WAN
- d. Internet as transport
Answer: b. The best WAN transport model for a very large number of remote sites is MPLS L3VPN. MPLS L2VPN introduces routing and adjacency issues with large numbers of sites. Internet as WAN is limited by the VPN hardware’s supported number of concurrent VPN sessions.
WAN Module Design Options and Considerations
Design Hierarchy of the Enterprise WAN Module
Applying the hierarchical design principle to the WAN module maximizes flexibility and scalability, simplifies adding/removing/integrating network nodes and services (WAN routers, firewalls, WAN acceleration appliances), and enables each layer to perform specific functions in a structured manner. Figure 4 illustrates this hierarchical approach.
WAN Module Access to Aggregation Layer Design Options
The aggregation layer of the WAN module aggregates traffic and connectivity of access layer nodes (WAN edge routers, firewalls, WAN acceleration appliances). There are three common design options to interconnect WAN edge routers to the aggregation layer, as shown in Figure 5: a shared LAN (Option 1), Layer 3 ECMP links (Option 2), and mLAG (Option 3).
Table 5 compares the ECMP and mLAG options.
| | Option 2: ECMP | Option 3: mLAG |
|---|---|---|
| Link redundancy | Redundant L3 links (more routing peers) | Redundant L2 mLAG links (fewer routing peers) |
| Failure reconvergence | Routing reconvergence required when one uplink fails | No routing reconvergence required if one mLAG member link fails |
| Convergence time | Relies on routing protocol design and timers | Relies on mLAG member-link failover (no routing reconvergence for a single member-link failure) |
| Scalability | Supports both scale out and scale up; more than two aggregation layer nodes supported | Scale up only (maximum two aggregation layer nodes per mLAG) |
| Load balancing | ECMP flow-based; larger routing database as links are added | L3/L4 hashing across mLAG member links; each flow typically utilizes one member link, limited to that link’s capacity (unless flowlet concept is used); larger ARP table as links are added |
Option 1 (shared LAN) has several design limitations: without careful IGP tuning it can lead to slow convergence, and it has potential instability and scalability issues as nodes and routing adjacencies grow. It is the least resilient and scalable option, but can still meet requirements where tight convergence and scalability are not needed (e.g., a regional HQ WAN model with only a pair of WAN edge nodes and no growth plans).
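Both ECMP and mLAG in Table 5 distribute traffic per flow rather than per packet. The sketch below shows the general idea of hashing a flow's 5-tuple onto one of several links. It is illustrative only: real platforms use vendor-specific hardware hash functions, and SHA-256 is a stand-in chosen here for determinism. The function name and the sample flow are hypothetical.

```python
import hashlib


def pick_link(flow: tuple, link_count: int) -> int:
    """Map a flow (src IP, dst IP, protocol, src port, dst port) to a link index.

    Every packet of the same flow hashes to the same link, which preserves
    packet ordering but caps a single flow at one link's capacity.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % link_count


flow = ("10.1.1.10", "10.2.2.20", 6, 49152, 443)
print(pick_link(flow, 2))  # always the same index for this flow
```

Because the choice depends only on the flow identifiers, adding links spreads many flows more evenly but does not make one large flow any faster.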
WAN Edge Connectivity Design Options
The most common factors driving WAN edge connectivity design decisions are:
- Site type (small branch vs data center vs regional office)
- Level of criticality (cost of downtime, business impact)
- Traffic load
- Cost
Figure 6 shows the various WAN edge connectivity options.
Table 6 details each model’s characteristics.
| Connectivity Model | Redundancy | Reliability | Cost | QoS Model | Suitability |
|---|---|---|---|---|---|
| Single-homed to WAN | None | Moderate | Moderate | Consistent end to end | Small to medium branch with high traffic volume |
| Single-homed to Internet | None | Low | Low | Internet edge only | Small branch with low fault-tolerance requirements |
| Dual-homed WAN (single router) | Link only | Moderate | High | Consistent end to end | Medium to large, critical, or regional remote sites |
| Dual-homed WAN + Internet (single router) | Link only | Moderate (lower than MPLS) | Moderate | Consistent end to end over MPLS path | Medium to large or regional remote sites |
| Dual-homed Internet (single router) | Link only | Moderate (lower than MPLS + Internet) | Moderate | Internet edge only | Small to medium remote site |
| Dual-homed WAN (dual routers) | Link and device | Very high (varies with single vs dual providers) | Very high | Consistent end to end | Hub, HQ, DC, or large regional site |
| Dual-homed WAN + Internet (dual routers) | Link and device | High | Moderate to high | Consistent end to end over MPLS path | Hub, large remote or regional sites |
Single WAN Provider Versus Dual Providers
The previous section discussed the various WAN edge design options and the characteristics of each from a design point of view. This section focuses on dual WAN edge connectivity and takes it a step further to compare the impact of connecting a multihomed site to a single SP versus two different SPs. Table 7 summarizes the key differences.
| | Single Service Provider | Dual (Different) Service Providers |
|---|---|---|
| Design simplicity | Simple, consistent (SLA, QoS design) | Can be inconsistent and more complex (different SLAs, QoS models, routing protocols) |
| Availability | SP outage can lead to a WAN blackout | Higher degree of WAN reliability and availability |
| Cost | Fixed | May lead to better competitive pricing |
| Operational complexity | Simpler (consistent) | More complex (different SLAs, possibly different routing protocols) |
Large enterprises with wide geographic distribution can mix single-provider and dual-provider connectivity, based on the criticality of each site and business needs. For instance, regional hub sites and data centers can be dual-homed to two providers while smaller remote sites remain single-homed to a single provider. This mixed connectivity design approach can offer a transit path during a link failure, as depicted in Figure 7. Ideally, the transit site should be located within the same geographic area or country (in the case of global organizations) to limit latency and cost by reducing the number of international paths that traffic has to traverse. The second provider can also be an Internet-based transport such as DMVPN over the Internet.
Remote Site (Branch) WAN Design Considerations
The WAN edge design options of a remote site can be based on any of the design options described in the previous section (see Table 6), where single or dual WAN edge routers can be used based on the requirements of each particular site. Most commonly, in large enterprises, remote sites are categorized based on different criteria such as size, criticality, and location, and typically all the sites under the same categorization follow the same design standards.
The edge node is usually either a CE node (for MPLS L3 or L2 WAN) or a VPN spoke node. In some cases, a single WAN edge router can perform both roles.
The required level of availability depends on variables such as the criticality of the remote site. As a rule of thumb, the network should ideally tolerate single failure conditions: the failure of any single WAN link, or the failure of any single network device at the hub/HQ WAN site (by considering control plane or overlay failover techniques). However, business drivers, constraints, and site criticality ultimately drive the level of availability of any given remote site, so redundancy is not a requirement at every remote site. In general:
- Single router, dual links: Must tolerate the loss of either WAN link
- Dual router, dual links: Can tolerate the loss of either a WAN edge router or a WAN link (multiple-failure scenarios)
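The difference between the two redundancy models can be approximated with basic series/parallel availability arithmetic. This is a rough sketch, assuming independent failures, one WAN link per router in the dual-router case, and perfect failover. The router and link availability figures are placeholder assumptions, not measured values.

```python
import math


def series(*availabilities: float) -> float:
    """All components must be up: multiply the availabilities."""
    return math.prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one component must be up: 1 minus the product of the unavailabilities."""
    return 1 - math.prod(1 - a for a in availabilities)


ROUTER = 0.999  # assumed availability of one WAN edge router
LINK = 0.995    # assumed availability of one WAN link

# Single router, dual links: the router is in series with two parallel links.
single_router_dual_links = series(ROUTER, parallel(LINK, LINK))

# Dual routers, dual links: two (router + link) paths in parallel.
dual_router_dual_links = parallel(series(ROUTER, LINK), series(ROUTER, LINK))

print(f"single router, dual links: {single_router_dual_links:.6f}")
print(f"dual routers, dual links:  {dual_router_dual_links:.6f}")
```

In the single-router model the result can never exceed the router's own availability, which is why the router remains the limiting component no matter how many links are added.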
In addition, from a design perspective, the selected WAN connectivity option has a significant influence on the LAN design of a remote site, as does the size of the site (in terms of the number of users and endpoints connected to the network). In general, the design models of a remote site fit into two primary models, shown in Figure 8 and Figure 9. Table 8 compares these two design models.
| | Single-Tier Design Model | Multitier Design Model |
|---|---|---|
| LAN scalability | Very limited | More scalable |
| First-hop Layer 3 service | WAN edge router | Distribution layer switches |
| LAN flexibility | Very limited | More flexible |
| LAN to WAN edge connectivity | FHRP (dual edge router scenario) | IGP over ECMP, FHRP, mLAG with FHRP, mLAG with IGP |
| Supported endpoints | Small | Small to medium-sized |
| Supported WAN connectivity | Single WAN single router; dual WAN single router; dual WAN dual routers | Single WAN single router; dual WAN single router; dual WAN dual routers |
The WAN connectivity options in Table 8 apply to both a private enterprise WAN and a WAN overlaid over the Internet transport.
DMVPN-Based Remote Site WAN
Using the Internet as a WAN transport with DMVPN as the overlay offers several benefits for remote site WAN connectivity, as illustrated in Figure 10:
- Cost-effective and reliable WAN connectivity over the Internet
- Reduces time to add new remote sites over various media types (LTE, 4G/5G, DSL), combined with zero-touch configuration of hub routers for new spokes
- Automatic full-mesh connectivity with simple hub-and-spoke configuration
- Any-to-any spoke-to-spoke direct connectivity
- Supports dynamically addressed spokes
- Supports provisioning behind NAT devices
- Supports automatic IPsec triggering for building IPsec tunnels
- Multiple flexible hub-and-spoke design options for different goals, scales, and requirements
Routing over GRE tunnels with large routing tables may require adjustments (normally lowering) to the maximum transmission unit (MTU) value of the tunnel interface.
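The MTU adjustment follows from the encapsulation overhead. The sketch below computes a tunnel IP MTU and a matching TCP MSS for an IPv4 underlay. The GRE figures (20-byte outer IPv4 header, 4-byte base GRE header, optional 4-byte key) are standard. The encryption overhead is left as a caller-supplied assumption because it depends on the IPsec mode and ciphers in use, and the function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, without options
GRE_HEADER = 4    # base GRE header
GRE_KEY = 4       # optional key field (e.g. when a tunnel key is configured)
TCP_HEADER = 20   # bytes, without options


def tunnel_ip_mtu(physical_mtu: int = 1500, gre_key: bool = False,
                  encryption_overhead: int = 0) -> int:
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    overhead = IPV4_HEADER + GRE_HEADER + (GRE_KEY if gre_key else 0)
    return physical_mtu - overhead - encryption_overhead


def tcp_mss(ip_mtu: int) -> int:
    """TCP payload size that fits in one packet of the given IP MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


print(tunnel_ip_mtu())              # plain GRE over a 1500-byte path
print(tunnel_ip_mtu(gre_key=True))  # GRE with a key field
print(tcp_mss(1400))                # MSS for a conservative 1400-byte tunnel MTU
```

Many designs round down to a conservative tunnel MTU such as 1400 bytes rather than computing the exact IPsec overhead, trading a little efficiency for a safety margin.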
Three primary connectivity options for DMVPN-based remote site WAN are shown in Figure 11:
- Single router, single link: Simplest option; WAN edge router is a single point of failure
- Single router, dual links: Redundant Internet links but edge router remains a single point of failure
- Dual routers, dual links: Eliminates any single point of failure at the WAN/Internet edge
The best option is selected based on design requirements and constraints. For example, a small retail site where users can save transactions locally during a WAN outage may not justify the added cost of redundancy. Adding it would be overengineering with no significant business value.
Enterprise WAN Module Design Options
This section highlights common proven design models as foundational reference architecture for the enterprise WAN module, based on the scale of remote sites.
These design models can be scaled based on requirements. For example, Option 1 can be migrated to Option 2 as the number of remote sites increases. The number of edge access nodes can also be scaled out. For instance, Option 1 can be extended with an additional redundant edge router connected to a second MPLS WAN, while the Internet edge router serves as a third level of redundancy via tunneling.
The number of remote sites in the following categorization is a rough estimation only (based on the current Cisco Validated Design at the time of writing). Typically, this number varies based on several variables such as hardware limitations and routing design in terms of number of routes.
Option 1: Small to Medium
Figure 12 illustrates this design model. Characteristics:
- Dual redundant edge routers
- Single WAN connectivity (primary path)
- Single Internet connectivity (backup path over VPN tunnel)
- Each WAN and Internet router is dual-homed to the WAN module aggregation clustered switches using Layer 3 over mLAG (or Layer 3 ECMP without switch clustering)
- Supports up to ~100 remote sites (subject to hardware limitations)
Option 2: Medium to Large
Figure 13 illustrates this design model. Table 9 compares the supported remote site WAN connectivity options for both design models. Characteristics:
- Dual WAN connectivity and dual WAN routers
- Dual Internet connectivity and dual Internet routers (backup path over VPN tunnel, and primary for VPN-only remote sites)
- Each WAN and Internet router is dual-homed to the WAN module aggregation clustered switches using Layer 3 over mLAG (or Layer 3 ECMP without switch clustering)
- Supports up to a few thousand remote sites depending on WAN router and VPN termination hardware capabilities
| | WAN Module Design Option 1 | WAN Module Design Option 2 |
|---|---|---|
| Scalability (remote sites) | Small to medium | High |
| Single WAN link (MPLS only) | Yes | Yes |
| Single WAN link (Internet only) | Yes (single hub, single point of failure) | Yes |
| Dual WAN links (MPLS only) | No | Yes |
| Dual WAN links (MPLS + Internet) | Yes | Yes |
| Dual WAN links (Internet only) | Yes (single hub, single point of failure, which eliminates the benefit of redundant Internet links at remote site) | Yes |
Option 3: Large to Very Large
This architecture targets very large-scale routed WAN deployments encompassing branch, metro connectivity, and global core backbones. It consists of five primary modules, as shown in Figure 14:
- Regional WAN: Connects branch offices and aggregates remote locations
- Regional MAN: Connects remote offices and data centers across metro area transports
- WAN core: Interconnects regional networks and data centers within a country, across a theater (region), or globally
- Enterprise edge: Connects the enterprise network to external networks and services (Internet, mobile)
- Enterprise interconnect: Aggregation and interconnection point for all modules (regional WANs, MANs, data centers, enterprise edge, and campus networks)
This hierarchical structure offers flexibility to separate the design into different element tiers suitable for different environments. A global footprint requires all elements; a single-theater footprint does not require the global core but can add it when expansion is needed.
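As a rough first pass, the three options can be mapped to the remote site count. The sketch below is only a starting point: the 100-site boundary comes from the Option 1 estimate above, while the 3000-site boundary is a hypothetical stand-in for "a few thousand". Both should be replaced with figures validated against the actual hardware and routing design.

```python
def suggest_wan_module_option(remote_sites: int,
                              option1_max: int = 100,
                              option2_max: int = 3000) -> str:
    """Suggest a WAN module design option from the remote site count.

    option1_max follows the ~100-site estimate for Option 1;
    option2_max is an assumed value for "a few thousand" sites.
    """
    if remote_sites < 0:
        raise ValueError("remote_sites must be non-negative")
    if remote_sites <= option1_max:
        return "Option 1: Small to Medium"
    if remote_sites <= option2_max:
        return "Option 2: Medium to Large"
    return "Option 3: Large to Very Large"


for sites in (40, 800, 12000):
    print(sites, "->", suggest_wan_module_option(sites))
```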
WAN Virtualization and Overlays Design Considerations and Techniques
Overlay (self-deployed VPN) technologies are among the primary foundational technologies used by enterprises to facilitate WAN virtualization. Network designers need a solid understanding of the different overlay (VPN) options in terms of supported design models, strengths, limitations, and suitable use cases. Table 10 compares the main overlay technologies. Enterprises adopt overlay technologies for the following purposes:
- Build a cost-effective Internet-as-WAN transport model (primary or backup)
- Maintain end-to-end path isolation for logical groups across the enterprise WAN
- Secure IP communications over private enterprise WAN or Internet
- Provide controlled remote access for mobile users or third-party entities
- Provide overlaid transport for services not supported by the underlay (e.g., multicast or IPv6 over unicast-only IPv4)
| | Remote Access VPN | DMVPN | IPsec | GRE |
|---|---|---|---|---|
| Targeted transport | Public Internet | Private WAN and public Internet | Private WAN and public Internet | Private WAN and public Internet |
| Supported topology | Hub-spoke (client to server) | Hub-spoke, spoke to spoke | Point-to-point | Point-to-point |
| Routing technique | Reverse-route injection | Static and dynamic routing | Reverse-route injection | Static and dynamic routing |
| Encryption | Peer-to-peer | Peer-to-peer (with IPsec) | Peer-to-peer | Peer-to-peer (with IPsec) |
| IP multicast | Replication at hub (if supported by VPN client) | Replication at hub | Point-to-point (with GRE) | Point-to-point |
| Scalability | Moderate | High | Low | Low |
| Design flexibility | Limited (client/server only) | High | Low | Moderate |
| Operational complexity | Moderate | Moderate | High | High |
| Network virtualization | Limited (VRF-aware remote access) | Flexible (end-to-end) | Flexible (VRF-aware IPsec) | Flexible (end-to-end) |
The primary scalability limiting factor of any VPN solution is the supported number of sessions by the hardware platform that is used.
GETVPN is an encryption mechanism that preserves IP header information and supports true any-to-any encrypted IP connectivity. It is commonly used over private transport networks. However, if existing WAN platforms do not support GETVPN and no hardware upgrade is planned, other options such as IPsec with GRE or mGRE must be considered.
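The topology and scalability rows of Table 10 can be encoded as data and queried, which is a convenient way to check a requirement against the options. This is a minimal sketch. The dictionary keys and function name are hypothetical, and only two rows of the table are represented.

```python
# Two rows of Table 10: supported topology and scalability.
OVERLAYS = {
    "Remote access VPN": {"topologies": {"hub-spoke"}, "scalability": "moderate"},
    "DMVPN": {"topologies": {"hub-spoke", "spoke-to-spoke"}, "scalability": "high"},
    "IPsec": {"topologies": {"point-to-point"}, "scalability": "low"},
    "GRE": {"topologies": {"point-to-point"}, "scalability": "low"},
}


def overlays_supporting(topology: str) -> list:
    """Return the overlay technologies that list the given topology."""
    return [
        name
        for name, attributes in OVERLAYS.items()
        if topology in attributes["topologies"]
    ]


print(overlays_supporting("spoke-to-spoke"))
print(overlays_supporting("point-to-point"))
```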
Review Questions
5. Which of the following network overlay technologies should be chosen if the design decision requires the highest level of scalability, spoke registration combined with optional security, and dynamic spoke-to-spoke communication?
- a. Remote access VPN
- b. DMVPN
- c. IPsec
- d. GRE
Answer: b. DMVPN is the most scalable option that also provides spoke registration and dynamic spoke-to-spoke communication.
6. Which of the following network overlay technologies provides support for a point-to-point network topology? (Choose two.)
- a. Remote access VPN
- b. DMVPN
- c. IPsec
- d. GRE
Answer: c and d. IPsec and GRE both provide point-to-point network topologies, while remote access VPN and DMVPN provide hub-spoke network topologies.
7. Which of the following network overlay technologies allows for a spoke-to-spoke traffic pattern?
- a. Remote access VPN
- b. DMVPN
- c. IPsec
- d. GRE
Answer: b. DMVPN is the only option that inherently allows spoke sites to send traffic directly between them without going to a hub or data center location.
WAN Virtualization
Introducing virtualization and path isolation over the WAN transport is commonly driven by the adoption of the network virtualization concept by the enterprise within the campus LAN, branches, or data center network. Therefore, to maintain end-to-end path isolation, network virtualization must be extended over the WAN transport in a manner that does not compromise path-isolation requirements. From a WAN design point of view, two primary WAN connectivity models drive WAN virtualization design choices, as shown in Figure 15:
- Customer/enterprise-controlled WAN (self-deployed): This model provides full control for the enterprise to use the desired core routing design and the type of virtualization techniques that meet their requirements, such as MPLS in the core or tunneling with multiple VRFs. Typically, this model is based on the fact that the enterprise controls the WAN core infrastructure or transport, as shown in Figure 15. If the WAN SP provides L2 WAN transport, it can also be categorized under this model, because the enterprise will have the control and freedom to deploy the desired end-to-end WAN virtualization techniques, as illustrated in Figure 16.
- SP-controlled WAN: This model provides the least control for the enterprise when it comes to routing and end-to-end network virtualization over an SP-controlled WAN transport (e.g., MPLS L3VPN), as depicted in Figure 17. Enterprises must either extend virtualization to the SP (PE node) or build an overlay between CE nodes. This approach is commonly referred to as over the top.
Over-the-Top WAN Virtualization: SP Coordinated
These design options require coordination with the SP to extend enterprise network virtualization over the MPLS L3VPN provider network. The two approaches are:
- Back-to-back VRFs to provider PE (Multi-VRF CE): Uses the multi-VRF CE concept to extend L3 path virtualization without exchanging labels. A routing instance per VRF is required at each CE and PE node, and the SP may charge per additional VRF. Figure 18 illustrates this approach.
- Enable MPLS (LDP) with provider PE (CSC model): Based on the Carrier Supporting Carrier model (RFC 8277). The CE node sends MPLS-labeled packets to the provider PE, which lets the enterprise run its own MP-BGP (VPNv4/6) peering across the SP MPLS backbone. Because labels are exchanged with the provider PE, no PE-CE routing instance per VRF is required, and new VRFs can be added without SP coordination. Figure 19 illustrates this approach.
Table 11 compares these two design approaches from different design angles.
| | CSC Model | Back-to-Back VRFs Model |
|---|---|---|
| Scalability | High | Low |
| Coordination with SP | Moderate | High |
| Design complexity | Moderate | The larger, the more complex |
| SP dependencies (e.g., multicast) | High | High |
| Extra cost | No | Yes (SP may charge per additional VRF) |
| Adding new VRF requires SP coordination | No | Yes |
| Requires label exchange with provider PE | Yes | No |
| Requires PE-CE routing instance per VRF | No | Yes |
| Control plane complexity | Moderate | High |
| Operational complexity | Moderate | High |
| Security and edge policy control | Moderate | High |
| QoS granularity | Moderate | High |
Over-the-Top WAN Virtualization: SP Independent
These design options use various overlay approaches to extend enterprise network virtualization over an unmanaged L3 SP WAN. Unlike the SP-coordinated approaches, these options are controlled and deployed end to end by the enterprise, with no coordination with or dependency on the WAN SP, because every method relies on a tunneling mechanism that encapsulates the traffic and hides the virtualization setup from the underlying SP transport. The following figures illustrate the various options:
- P2P GRE tunnel per VRF: Simple private VN extension in which each GRE tunnel is assigned to a specific VRF. This is the least scalable option. For example, a hub-and-spoke design with 60 sites (one hub and 59 spokes) and 3 VRFs requires 59 × 3 = 177 tunnels. It is suitable only for 2–3 sites with 2–3 VRFs. When tunnels share the same source and destination, a unique tunnel key is required as a demultiplexer. Figure 20 illustrates this option.
For these design options, it is hard to generalize and provide a specific recommended number of remote sites or VRFs, because the decision has to be made based on these two variables when measuring the scalability of the design option. For example, evaluating a design for a network that requires path isolation between three sites with ten different virtual networks is different from three sites with two virtual networks. In both cases, the number of sites is small; however, the number of VRFs (virtual networks) becomes the tiebreaker.
- DMVPN per VRF: One DMVPN cloud per VRF. This option supports direct spoke-to-spoke traffic forwarding (bypassing the hub) per VRF, and it scales to larger deployments than point-to-point GRE tunnels. It is suitable for a large number of sites with very few VRFs (ideally 2), or a small number of sites with few VRFs (ideally ≤3). Figure 21 illustrates this option.
- MPLS over P2P GRE tunnel: This option encapsulates MPLS labels in a GRE tunnel, as described in RFC 4023. It overcomes some limitations of the point-to-point GRE tunnel per VRF option by running MPLS with an MP-BGP VPNv4/6 session over one GRE tunnel (RFC 4364 MP-BGP control plane style), as depicted in Figure 22. Consequently, only one GRE tunnel is required to carry LDP, IGP, and MP-BGP (VPNv4/6), and there is typically no need for a separate GRE tunnel per VRF. However, the number of remote sites remains a limiting factor when many sites must be connected, whether in a full mesh or in a hub-and-spoke overlay topology.
Furthermore, this design option can help simplify the interconnection of disjoint MPLS-enabled infrastructures over a native IP backbone. As illustrated in Figure 23, MPLS over GRE is used to extend the reachability between two MPLS-enabled islands over a non-MPLS backbone (native IP).
- MPLS over DMVPN (2547oDMVPN): MPLS VPN over DMVPN framework using NHRP for dynamic endpoint discovery. One DMVPN cloud carries all VRFs (LDP, IGP, MP-BGP VPNv4). Very scalable for large hub-and-spoke deployments with multiple VRFs. Supports direct spoke-to-spoke; multicast must traverse the hub. Figure 24 illustrates this option.
- MPLS over mGRE (BGP autodiscovery): Dynamic tunnel endpoint discovery using BGP as the control plane. No manual GRE tunnel configuration or LDP/RSVP on interfaces required. mGRE encapsulation is automatically generated. Supports any-to-any unicast (IPv4, IPv6 6VPE) and multicast (MDT-based) MPLS VPN. Figure 25 illustrates this option.
- EIGRP OTP (Over the Top): Forms EIGRP adjacencies across unmanaged WAN transport using unicast packets without injecting routes into the provider’s MP-BGP table. Uses LISP encapsulation for dynamic multipoint encapsulation. EIGRP OTP relies on EIGRP routing tables rather than on the LISP mapping system to populate IP routing information. Supports multiple EIGRP instances for multiple routing instances. This approach offers significant design flexibility because WAN networks will be seen as a virtual extension of the network, and enterprise customers can transparently extend their infrastructure reachability over the provider network using one unified control plane protocol. Figure 26 illustrates this option.
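To make the scaling differences between the tunnel-based options above more concrete, the following Python sketch counts the overlay tunnels (or DMVPN clouds) each approach needs for a given number of sites and VRFs. The function names are illustrative only, and the counts assume a single hub with one tunnel per spoke (or a full mesh where requested), so redundant hubs and backup tunnels are ignored.

```python
def p2p_gre_per_vrf(sites: int, vrfs: int, full_mesh: bool = False) -> int:
    """Tunnels needed when every VRF gets its own point-to-point GRE tunnel."""
    links = sites * (sites - 1) // 2 if full_mesh else sites - 1
    return links * vrfs


def mpls_over_p2p_gre(sites: int, full_mesh: bool = False) -> int:
    """One GRE tunnel per link carries all VRFs, so the VRF count drops out."""
    return sites * (sites - 1) // 2 if full_mesh else sites - 1


def dmvpn_per_vrf(vrfs: int) -> int:
    """DMVPN clouds needed when each VRF gets its own cloud."""
    return vrfs


def mpls_over_dmvpn() -> int:
    """A single DMVPN cloud carries every VRF."""
    return 1


if __name__ == "__main__":
    sites, vrfs = 60, 3
    print("P2P GRE per VRF, hub-and-spoke:", p2p_gre_per_vrf(sites, vrfs))
    print("P2P GRE per VRF, full mesh:", p2p_gre_per_vrf(sites, vrfs, full_mesh=True))
    print("MPLS over P2P GRE, hub-and-spoke:", mpls_over_p2p_gre(sites))
    print("DMVPN clouds, per-VRF design:", dmvpn_per_vrf(vrfs))
    print("DMVPN clouds, MPLS over DMVPN:", mpls_over_dmvpn())
```

With 60 sites and 3 VRFs, the hub-and-spoke per-VRF GRE count matches the 177 tunnels quoted earlier, while the MPLS-based options stay independent of the number of VRFs.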
Comparison of Enterprise WAN Transport Virtualization Techniques
Table 12 provides a comprehensive comparison of all enterprise WAN transport virtualization techniques. Figure 27 presents a decision tree to guide the selection of the appropriate WAN virtualization technique.
| | SP Dependent | Control Plane | VRF Scalability | Remote Site Scalability | Direct CE-to-CE | Multicast | IPv6 | Encryption |
|---|---|---|---|---|---|---|---|---|
| VRF Lite | Yes | IGP/BGP | Limited | Very limited | Yes | SP dependent | SP dependent | N/A |
| CSC model | Yes | IGP + BGP/MP-BGP | Scalable | Scalable | Yes | SP dependent | SP dependent | N/A |
| GETVPN | No | IGP/BGP | Limited | Very limited | Yes (full-mesh P2P) | Yes | Yes | GETVPN |
| P2P GRE per VRF | No | IGP/BGP | Limited | Very limited | Yes (full-mesh P2P) | Yes | Yes | IPsec |
| P2P GRE + MPLS | No | IGP + MP-BGP | Scalable | Very limited | Yes (full-mesh P2P) | Yes | Yes | IPsec |
| DMVPN per VRF | No | IGP/BGP | Limited | Scalable | Yes | Yes (hub-and-spoke data path only) | Yes | IPsec |
| DMVPN + MPLS | No | IGP/BGP + MP-BGP | Scalable | Scalable | Yes | Yes (hub-and-spoke data path only) | Yes | IPsec |
| BGP mGRE + MPLS | No | MP-BGP | Scalable | Scalable | Yes | Yes | Yes | GETVPN |
| EIGRP OTP | No | EIGRP | Limited | Scalable | Yes (with third-party next hop) | No | Yes | GETVPN |
Operational complexity rises as the network grows, and it rises fastest when the WAN virtualization technique in use has limited scalability. Conversely, a technique that scales well keeps operational complexity manageable as sites and VRFs are added.
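As a rough aid for reading Table 12, the Python sketch below encodes a few of its columns and filters the techniques by stated requirements. It is not a reproduction of the decision tree in Figure 27. The `Technique` class and `shortlist` function are hypothetical names, and collapsing "Limited" and "Very limited" into a single `False` value is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    sp_dependent: bool
    vrf_scalable: bool    # True only where Table 12 says "Scalable"
    site_scalable: bool   # True only where Table 12 says "Scalable"
    multicast: str        # "yes", "no", or "sp" (SP dependent)


# Values transcribed from Table 12.
TABLE_12 = [
    Technique("VRF Lite", True, False, False, "sp"),
    Technique("CSC model", True, True, True, "sp"),
    Technique("GETVPN", False, False, False, "yes"),
    Technique("P2P GRE per VRF", False, False, False, "yes"),
    Technique("P2P GRE + MPLS", False, True, False, "yes"),
    Technique("DMVPN per VRF", False, False, True, "yes"),
    Technique("DMVPN + MPLS", False, True, True, "yes"),
    Technique("BGP mGRE + MPLS", False, True, True, "yes"),
    Technique("EIGRP OTP", False, False, True, "no"),
]


def shortlist(sp_independent=False, many_vrfs=False, many_sites=False, multicast=False):
    """Return the names of techniques that satisfy every requested requirement."""
    names = []
    for t in TABLE_12:
        if sp_independent and t.sp_dependent:
            continue
        if many_vrfs and not t.vrf_scalable:
            continue
        if many_sites and not t.site_scalable:
            continue
        if multicast and t.multicast == "no":
            continue
        names.append(t.name)
    return names


if __name__ == "__main__":
    print(shortlist(sp_independent=True, many_vrfs=True, many_sites=True))
```

A filter like this only narrows the candidates. Factors such as encryption method, cost, and operational skills still need to be weighed by the designer.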
Figure 27: WAN Virtualization Design Options Decision Tree
Enterprise WAN Migration to MPLS VPN Considerations
Network migration is one of the most challenging and critical projects of network design. Network designers sometimes spend a large amount of time focusing on the “end state” of their network designs. However, a good and successful design must consider a migration strategy to move the network from its current state to the new state. Similarly, a good migration plan should address how network control plane protocols and applications will interact when the network is partially migrated.
No single set of definitive guidelines fits every network migration, because many variables drive the migration plan and its strategy, such as business and application requirements, network size, and the technologies used. However, the following approaches and rules of thumb, based on proven WAN migration experience, should be considered when generating strategic or tactical migration plans:
- The phased migration approach is always recommended for large-scale networks with many remote sites and multiple links or external exit points.
- Logical and physical site architecture must be analyzed during the planning phase (backdoor links, OSPF areas, BGP AS numbering).
- In L3VPN, review the selected PE-CE routing protocol and how it integrates with the existing routing setup without service interruption. BGP is the most common choice and often the only option a carrier offers.
- Identify where the default route is generated and whether remote sites will use a locally generated default route or one from the MPLS cloud.
- Review routing and identify summarization points. These may lead to suboptimal routing or route summarization black holes.
- In L2VPN, carefully review the topology and the effect on traffic paths after migration, especially when changing topology (e.g., hub-and-spoke Frame Relay to full-mesh VPLS).
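The summarization black hole mentioned in the list above can be illustrated with a short Python sketch. It uses the standard `ipaddress` module; the prefixes and the `black_holed` helper are made-up examples, not values from any real deployment. A destination is black-holed when it falls inside an advertised summary but inside none of the subnets that actually exist behind the advertising site.

```python
import ipaddress


def black_holed(summary: str, reachable: list[str], destination: str) -> bool:
    """True if the destination matches the summary but no real subnet behind it."""
    dest = ipaddress.ip_address(destination)
    in_summary = dest in ipaddress.ip_network(summary)
    in_subnet = any(dest in ipaddress.ip_network(net) for net in reachable)
    return in_summary and not in_subnet


if __name__ == "__main__":
    # Hypothetical site advertising a /16 summary while hosting only two /24s.
    summary = "10.1.0.0/16"
    site_subnets = ["10.1.1.0/24", "10.1.2.0/24"]
    for dest in ("10.1.2.10", "10.1.3.10", "10.2.0.1"):
        print(dest, "black-holed:", black_holed(summary, site_subnets, dest))
```

During a migration, a check like this is most useful for subnets that have already moved to another site while the old summary is still being advertised.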
Phased Migration Strategy
Regardless of the source and target transports, a phased migration is the safe default for large WAN changes. The guiding principle is to maintain service continuity while progressively cutting over sites, and to avoid routing loops or black holes at the boundary between the old and new environments.
A typical approach starts by designating one or more transit sites (often the hub or regional data center) that are dual-connected to both the legacy WAN and the new MPLS L3VPN (or SD-WAN overlay). The transit sites carry inter-site traffic between migrated and non-migrated spokes until the cutover is complete. During this period, any redistribution point between the old and new clouds needs careful control. Advertising only summaries across backdoor links, applying route filters, and keeping the preferred path deterministic (for example by relying on the lower administrative distance of eBGP, or by tuning metrics) are the usual tools. Where the legacy IGP and the new PE-CE protocol share a common area (such as OSPF area 0 spanning both hubs over a Layer 2 interconnect), summarization does not apply, so path selection must be driven by metric or cost tuning.
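The effect of administrative distance on path preference can be sketched in Python as follows. The distance values are the Cisco IOS defaults and are an assumption here, since other platforms use different preference schemes. The route dictionaries and the `preferred_route` helper are purely illustrative. The comparison applies only to routes for the same prefix, because longest-prefix match is evaluated first.

```python
# Default administrative distances on Cisco IOS (assumed platform).
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp": 90,
    "ospf": 110,
    "ibgp": 200,
}


def preferred_route(candidates):
    """Pick the candidate whose source protocol has the lowest administrative distance."""
    return min(candidates, key=lambda route: ADMIN_DISTANCE[route["protocol"]])


if __name__ == "__main__":
    # The same prefix learned over the legacy WAN (OSPF) and the new L3VPN (eBGP).
    candidates = [
        {"prefix": "10.20.0.0/16", "protocol": "ospf", "via": "legacy WAN"},
        {"prefix": "10.20.0.0/16", "protocol": "ebgp", "via": "MPLS L3VPN"},
    ]
    print(preferred_route(candidates)["via"])
```

Under these assumptions the eBGP-learned path wins as soon as it appears, which is why a migrated site's traffic shifts to the new WAN without metric changes on the legacy side.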
Once the transit sites are stable, spokes are migrated one at a time: establish the new PE-CE session (typically eBGP), advertise the local LAN subnets via a BGP network statement rather than redistribution, verify traffic shifts onto the new path, then decommission the legacy circuit. Repeat until all sites are moved, then retire the transit role and the backdoor interconnect. Modern SD-WAN deployments follow the same pattern, with the overlay fabric acting as the new WAN and the traditional MPLS or Internet underlay carrying the tunnels.
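The role of the transit site during a partial migration can be modeled with a small Python sketch. It is a deliberately simplified view: the site names and the "legacy WAN" and "new WAN" labels are hypothetical, and the model assumes a single dual-connected transit site and ignores how the legacy WAN routes internally.

```python
def wan_of(site, migrated):
    """Return which WAN a single-homed spoke is currently attached to."""
    return "new WAN" if site in migrated else "legacy WAN"


def inter_site_path(src, dst, migrated, transit):
    """Return the hops between two sites while the migration is in progress."""
    if src == dst:
        raise ValueError("source and destination must differ")
    # The transit site is attached to both WANs, so it reaches any spoke directly.
    if transit in (src, dst):
        other = dst if src == transit else src
        return [src, wan_of(other, migrated), dst]
    src_wan, dst_wan = wan_of(src, migrated), wan_of(dst, migrated)
    if src_wan == dst_wan:
        return [src, src_wan, dst]
    # One spoke is migrated and the other is not: traffic crosses the transit site.
    return [src, src_wan, transit, dst_wan, dst]


if __name__ == "__main__":
    migrated = {"branch1"}
    print(inter_site_path("branch1", "branch2", migrated, "hub"))
    migrated.add("branch2")  # branch2 is cut over next
    print(inter_site_path("branch1", "branch2", migrated, "hub"))
```

Running the model before and after each cutover shows which flows depend on the transit site at that stage, which helps size its links and decide when the transit role can be retired.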
Review Questions
8. Which of the following should be taken into consideration when migrating from one WAN architecture to another? (Choose two.)
a. Remote access VPN options
b. Where the default route is generated
c. The application specifications
d. The logical and physical site architectures
b and d. The best options are identifying where the default route is being generated (and how it will be propagated in the new WAN architecture) and knowing the logical and physical site architectures (such as OSPF areas and area types, and whether there are backdoor links).
Summary
Today’s enterprises are geographically dispersed and rely on the WAN module to interconnect their locations. This makes the WAN module one of the most critical modules in the modern enterprise network architecture. Network designers must consider designs that provide a common resource access experience to remote sites and users, whether over the WAN or the Internet, without compromising enterprise security requirements such as end-to-end path separation between certain user groups. Overlay integration at the WAN edge can offer enterprises flexible and cost-effective WAN and remote-access connectivity, even allowing for the additional control plane layer and design complexity that it may introduce into the overall enterprise architecture.