Network Services and Management
Overview
This chapter covers multiple networking and IP service design concepts that are core topics for the CCDE exam. The topics discussed may appear as applications or services used to achieve a business need — for example, a business-critical application might require QoS to be enabled across the network to function properly. The chapter focuses on design drivers, considerations, and approaches without covering deep technical implementation details.
The three main topics covered are:
- IPv6 Design Considerations: Critical IPv6 topics and network design elements, focusing on integration and coexistence with IPv4
- Quality of Service Design Considerations: QoS models, concepts, migrations, and corresponding network design elements
- Network Management: Network management concepts, protocols, and corresponding network design elements
IPv6-specific design considerations are covered in this chapter, but there are no specific CCDE blueprint line items for IPv6. This is because IPv4 and IPv6 are inherently included throughout every CCDE blueprint domain and topic.
IPv6 Design Considerations
This section focuses on one of the most critical IPv6 topics: the integration and coexistence of IPv4 and IPv6, covering the different design and technical options and how to follow a business-driven life-cycle design approach.
IPv6 Business and Technical Drivers
Despite IPv6 being proposed over two decades ago, serious adoption by enterprises and service providers only began in recent years. The drivers include the explosion of smartphones, mobile devices, IoT, smart connected cities, and public cloud services — all of which demand a huge number of IP addresses that IPv4 can no longer accommodate.
Organizations face several challenges from IPv4 exhaustion:
- Exhaustion and constraints of IPv4 addresses, adding complexity to managing and provisioning new services
- Added complexity in merger and acquisition scenarios, where NAT with its limitations becomes the primary option for overlapping address space
- New market trends — mobility, IoT, smart cities — requiring large numbers of IPs for IP-enabled endpoints
An organization should consider migrating to IPv6 if it encounters any of the following situations:
- Unable to expand to other global regions due to exhaustion of public IPv4 addresses
- Deploying IoT environments with large numbers of connected sensors for smart communities
- A service provider or enterprise needs seamless connectivity across fixed and mobile users where NAT is no longer viable
- Enabling or connecting to IPv6-based 4G/5G/LTE mobile networks
- Working as a supplier or partner with public sector or government entities where IPv6 is becoming the standard
- IPv6 requirements driven by end-user operating systems and applications such as Windows 11, Windows Server, macOS, system virtualization, and large-scale multitenancy
IPv6 Address Types
The following table summarizes the key technical similarities and differences between IPv4 and IPv6.
| Feature | IPv4 | IPv6 |
|---|---|---|
| Address length | 32 bits | 128 bits, multiple scopes |
| IP allocation | Manual, DHCP | Manual, SLAAC, DHCP |
| QoS | Differentiated services, integrated services | Differentiated services, integrated services, flow label |
| Multicast | IGMP, PIM, MP-BGP | MLD, PIM, MP-BGP |
| Security | IPsec optional (not part of the base protocol) | IPsec built in |
Although IPv6 supports built-in IPsec, it is a misconception that IPv6 is inherently more secure than IPv4. If IPsec is implemented, it provides confidentiality and integrity between two hosts, but it does not address link operation vulnerabilities, attacks, or most denial-of-service (DoS) attacks.
IPv6 has three types of unicast addresses:
- Link local (fe80::/10; in practice fe80::/64): Nonroutable, exists only within a single Layer 2 domain. Required on every IPv6-enabled interface even when routable addresses are also assigned (RFC 4291).
- Unique local address (ULA) (fc00::/7): Routable within the administrative domain of a given network. Conceptually similar to IPv4 private address ranges (RFC 1918).
- Global (2000::/3): Routable across the Internet. Conceptually similar to IPv4 public address ranges.
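Because each of the three scopes maps to a reserved prefix, an address can be classified by prefix membership alone. The following is a minimal Python sketch using the standard-library `ipaddress` module. `classify_unicast` is a hypothetical helper. It checks the RFC 4291 reserved link-local block fe80::/10, and the sample addresses, including the documentation prefix 2001:db8::/32, are illustrative only.

```python
import ipaddress

# Reserved prefixes for the three unicast scopes described above.
LINK_LOCAL = ipaddress.IPv6Network("fe80::/10")      # RFC 4291 reserved link-local block
ULA = ipaddress.IPv6Network("fc00::/7")              # unique local addresses
GLOBAL_UNICAST = ipaddress.IPv6Network("2000::/3")   # global unicast


def classify_unicast(address: str) -> str:
    """Return the unicast scope of an IPv6 address (hypothetical helper)."""
    ip = ipaddress.IPv6Address(address)
    if ip in LINK_LOCAL:
        return "link-local"
    if ip in ULA:
        return "unique-local"
    if ip in GLOBAL_UNICAST:
        return "global"
    return "other"  # e.g., loopback, multicast, unspecified


for addr in ("fe80::1", "fd12:3456:789a::10", "2001:db8::1", "::1"):
    print(addr, "->", classify_unicast(addr))
```

Python's own `IPv6Address.is_private` also returns `True` for fc00::/7. Checking explicit prefixes keeps the mapping to the three scopes visible.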
Migration and Integration of IPv4 and IPv6
For network architects and designers to achieve a successful IPv6 migration or integration, they must follow a structured approach based on the top-down design methodology: network discovery, assessment, planning, design, deployment, monitoring, and optimization.
Discovery Phase
At this phase, network architects focus on understanding and identifying the business goals and drivers toward IPv6 enablement, along with other influencing factors such as project timeframe, government compliance, and geographic distribution of sites with regard to IP addressing availability. It is also critical to identify at a high level whether the existing network infrastructure (LAN, WAN, security nodes, services, and applications) supports IPv6 and whether the business is willing to invest in upgrading nodes that do not.
Solution Assessment and Planning
After the discovery phase, network designers analyze each identified influencing factor and generate a migration or integration plan. The following considerations drive the detailed design of the transition strategy:
- Goal: Understanding the main purpose of the migration — for example, accessing services in the data center, regulatory compliance, or enabling IPv6 at the Internet edge due to lack of public IPv4 pools
- Infrastructure support: Whether the entire infrastructure supports IPv6 or only the network edges, and whether the business is willing to upgrade non-IPv6 devices. This drives the selection of a technology solution to overcome any IPv6 support constraints.
- Existing services and applications: Many applications still do not support IPv6, especially those developed in-house. Failing to consider how IPv6 networks will reach IPv4-only applications can break communication and seriously impact the business.
For example, if an enterprise needs end-to-end IPv6 but the core does not support it and no budget exists for upgrades, while access to new IPv6 applications in the data center is urgent, the designer may suggest IPv6-over-IPv4 tunneling, DNS-based translation, or NAT64 as interim solutions.
Transition Approaches for Enterprise Networks
| Design Goal | Priorities | Timeframe | Design Approach | Design Considerations |
|---|---|---|---|---|
| Migrate to pure IPv6 or dual stack | No service interruption | Flexible | Migrate core to dual stack first, then gradually migrate other modules to IPv6-only or dual stack | Increased hardware resource utilization; increased control plane complexity; core must support IPv6 |
| Migrate fully or partially to IPv6-only or dual stack | Quickly migrate certain modules first (e.g., data center) | Limited | Migrate certain enterprise modules first; DNS translation or tunneling (e.g., ISATAP) required to maintain IPv4/IPv6 communication | Suitable when core does not support IPv6; increases design, control plane, and operational complexity |
| Migrate data center to support IPv6 hosts | Support virtualized and non-virtualized IPv6 hosts | Flexible | Dual stack, VXLAN overlay, or MPLS-based 6PE/6VPE depending on DC architecture | Dual stack increases hardware resource utilization and operational complexity |
| Provide IPv6 access at the Internet edge | Support translation between IPv4 and IPv6 | Flexible | Translation via load balancer, pure DNS, or NAT64 | Increases operational complexity; requires additional IPv6 security considerations |
Review Questions
1. What is the first step an enterprise should take when migrating to an IPv6-only network if it wants to ensure that no service interruption occurs?
a. Leverage an overlay like VXLAN to support IPv6 hosts within the data center
b. Leverage a translation mechanism based on a load balancer, pure DNS, or classical NAT64
c. Migrate certain modules of the enterprise network first while leveraging a translation/tunneling mechanism to maintain communication between IPv6 and IPv4 islands
d. Migrate the core to be in dual-stack mode first, then migrate the other modules as time allows
Answer: d. Migrate the core to be in dual-stack mode first, and then other enterprise modules can be gradually migrated to IPv6-only or dual stack, depending on the goals and requirements of the business. Migrating to IPv6 this way ensures there is no service interruption.
2. Which of the following IPv6 design approaches would allow a network to provide IPv6 access inbound or outbound at the enterprise Internet edge?
a. Leveraging an overlay like VXLAN to support IPv6 hosts within the data center
b. Leveraging a translation mechanism based on a load balancer, pure DNS, or classical NAT64
c. Migrating certain modules of the enterprise network first while leveraging a translation/tunneling mechanism to maintain communication between IPv6 and IPv4 islands
d. Migrating the core to be in dual-stack mode first, then migrating the other modules as time allows
Answer: b. To provide IPv6 access either inbound or outbound at the enterprise Internet edge, a translation mechanism is required, based on a load balancer, pure DNS, or classical NAT64.
Transition Approaches for Service Provider Networks
Enabling IPv6 in an SP network differs from enterprise networks. SPs typically enable IPv6 either to provide a transit path for other SPs or to offer IPv6 connectivity to customers. The mechanism used is mainly driven by the goal and whether the transport is native IPv4 or MPLS-based.
Some transition approaches for SP networks (6PE, 6VPE, 6rd) are out of scope for the CCDE v3 exam at the time of this writing but are included in the table below for completeness.
| Goal | Transport | Possible Approaches |
|---|---|---|
| Provide IPv6 Internet transit | Native IPv4 | Dual stack, tunneling (manual RFC 2893, GRE, L2TPv3) |
| Provide IPv6-based services and Internet access to residential clients | Native IPv4 | Dual stack, 6rd, tunneling such as IPv6 over L2TP |
| Provide IPv6 Internet access/transit | MPLS | 6PE, IPv6 over pseudowires |
| Provide IPv6 connectivity for MPLS L3VPN customers | MPLS | 6VPE |
| Provide IPv6 Internet access for MPLS L3VPN customers | MPLS | 6VPE |
SP networks are transport networks with no directly connected endpoints, making the transition more flexible and less interruptive than in enterprise networks. When MPLS is enabled, IPv6 integration is simpler using MP-BGP overlay capabilities (6PE, 6VPE). Operators can also take a phased approach, enabling only the PE nodes that need to provide IPv6 transit first, without changing core (P) routers.
Today’s SPs also offer hosted services, SaaS, cloud-based data centers, and content services such as IPTV — meaning coexistence of IPv4 and IPv6 is inevitable. One common approach is to enable IPv6 at the services level first, requiring customers to be IPv6-enabled or to use translation (such as NAT v4-to-v6 at the enterprise Internet edge or DNS-based translation offered by the SP).
One of the primary considerations when migrating to IPv6 is to ensure that IPv6 is secured in the same manner as IPv4. For instance, since Windows Server 2008, IPv6 has been native to Windows and supports transition technologies such as ISATAP. If a server is compromised and security rules do not account for IPv6, malicious traffic can ride an IPv6 tunnel without being blocked by security devices.
Detailed Design
After selecting the suitable approach, network designers put together the details: integration mechanism selection, tunnel termination, IP addressing, routing design, network security, and network virtualization considerations. The outcome of the design phase is used by implementation engineers during deployment. If anything proves impractical, it is reported back to the designer for revision.
IPv6/IPv4 integration mechanisms can be classified into four categories:
- Dual stack
- Tunneling based
- Translation based
- MPLS environment solutions
The mechanisms in the table below are not prescriptive best practices. They represent commonly considered technology solutions for certain scenarios. Network designers must always assess the different influencing factors before suggesting any approach.
| Mechanism | Scenario | Targeted Environment | Design Concern |
|---|---|---|---|
| Dual stack | End-to-end IPv6 + IPv4 | Any environment ultimately moving to end-to-end IPv6 | IPv6 support required on all L3 platforms; increased control plane complexity; potential scalability weaknesses depending on hardware resources |
| Tunneling: P2P (L2TPv3, GRE RFC 2784) | Transit IPv6 over IPv4-only network | Small number of IPv6 islands interconnecting over IPv4 | Scalability and encapsulation overhead; increased control plane complexity |
| Tunneling: ISATAP (RFC 5214) | Host-sourced tunnels terminating at IPv6-enabled modules | Trial IPv6 services or partial IPv6 enablement (e.g., DC only); mostly enterprise | Affects overall network architecture; QoS, multicast, and NAT issues; adds control plane and operational complexity |
| Tunneling: mGRE | Interconnect IPv6 over IPv4 in hub-and-spoke topology | Hub-and-spoke IPv6 islands over IPv4 WAN | Multicast traffic must go via hub; adds control plane and operational complexity |
| Tunneling: 6rd (RFC 5969) | Extend IPv6 deployment to customer/residential sites with limited impact on existing IPv4 | SP networks offering IPv6 over IPv4 to residential customers | Simple, stateless, automatic encap/decap; depends on equipment support; adds control plane complexity |
| Tunneling: IPv6 over L2TP | IPv6 access for residential gateways | DSL/residential SPs with limited investment | Stateful architecture on LNS; dual-stack IPv4/IPv6 on residential gateway LAN side; increases operational complexity |
| Translation: NAT64/SLB | IPv6 endpoints accessing IPv4 Internet or services (LTE/4G/5G) | Green-field IPv6 SPs or enterprises interconnecting to legacy IPv4 | Does not support every application/protocol; performance may not match dual-stack depending on traffic load |
| Translation: DNS64 | Access applications/services by name; synthesizes AAAA records from IPv4 A records so IPv6-only hosts can reach IPv4 services | Services and applications reachable by name | Limited to name-based access; NAT64 is usually required alongside DNS64 to translate the actual traffic |
| Tunneling: LISP | Facilitate IPv6 communication over IPv4 transport via LISP encapsulation | Enterprise edge, DC, or WAN with mixed IPv4/IPv6 | High operational complexity; increased control plane complexity; devices must support LISP |
| MPLS: 6PE | Enable IPv6 over existing MPLS/MP-BGP IPv4 network | Large enterprises and SPs providing IPv6 over IPv4 infrastructure | No traffic separation between customers; increases control plane complexity |
| MPLS: 6VPE | Enable IPv6 over existing MPLS/MP-BGP IPv4 network for VPN customers | MPLS VPN providers or enterprises with MPLS VPN networks | Increases control plane complexity; may introduce scalability limitations due to separate RIB/FIB per customer |
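6rd (RFC 5969) is stateless. Each customer's delegated IPv6 prefix is derived from the SP's 6rd prefix and the CE router's IPv4 address, so no per-customer tunnel state is needed. The Python sketch below shows that derivation under two assumptions. First, all 32 IPv4 bits are embedded, so no common IPv4 prefix bits are omitted. Second, the documentation prefix 2001:db8::/32 serves as a hypothetical SP 6rd prefix. The function name is illustrative.

```python
import ipaddress


def sixrd_delegated_prefix(sp_prefix: str, ce_ipv4: str) -> ipaddress.IPv6Network:
    """Append all 32 bits of the CE IPv4 address to the SP 6rd prefix (IPv4MaskLen = 0 assumed)."""
    net = ipaddress.IPv6Network(sp_prefix)
    v4 = int(ipaddress.IPv4Address(ce_ipv4))
    new_len = net.prefixlen + 32
    if new_len > 128:
        raise ValueError("6rd prefix too long to embed a full IPv4 address")
    # Shift the IPv4 bits so they sit immediately after the SP's 6rd prefix.
    value = int(net.network_address) | (v4 << (128 - new_len))
    return ipaddress.IPv6Network((value, new_len))


print(sixrd_delegated_prefix("2001:db8::/32", "192.0.2.1"))  # 2001:db8:c000:201::/64
```

Because the mapping is purely arithmetic, the border relay can reverse it to find the CE's IPv4 tunnel endpoint for any destination inside the 6rd prefix.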
Adding any overlay or tunneling mechanism to the network will almost always increase operational complexity. The level varies based on network size, routing design, staff knowledge, and the nature of the selected technology.
Review Questions
3. Which of the following IPv6 mechanisms would allow for end-to-end IPv4 and IPv6 functionality?
a. Dual stack
b. ISATAP
c. GRE
d. mGRE
Answer: a. Dual stack is when a device runs both IPv4 and IPv6 protocol stacks. When all devices in the network run like this, it is called end-to-end dual stack.
4. Which of the following IPv6 mechanisms would allow network-to-network connectivity to transit IPv6 over IPv4-only devices without any additional control plane protocols?
a. Dual stack
b. ISATAP
c. GRE
d. mGRE
Answer: c. Generic Routing Encapsulation (GRE) encapsulates packets of one protocol inside packets of another protocol. A point-to-point GRE tunnel sets up a direct connection across the network. In this case, IPv6 runs through the GRE tunnel that traverses the IPv4 network, and no additional control plane protocol is needed to build the tunnel.
Deployment, Monitoring, and Optimization
These phases cover implementation of the design followed by continuous monitoring to ensure the network delivers the promised value. The implementation should follow a plan that specifies which services and features need to be enabled at each step, along with any potential risks associated with each change. For example, enabling IPv6 at the routing protocol level may reset existing IPv4 peering sessions, depending on the routing protocol, hardware platform, and software in use.
Transition to IPv6 Scenario
ABC Corp. is an international real-estate company headquartered in Singapore, with 116 remote sites across Asia, Australia, and Europe, as per Figure 1.
The CIO has decided to migrate the entire IP network and applications to be primarily IPv6-based to support long-term business innovation, while maintaining business continuity:
- Retain the ability for internal users to access legacy IPv4-only applications and the IPv4 Internet
- Provide external users the ability to access ABC Corp.’s new IPv6 web-based services over the IPv4 Internet
- IPv4 Internet websites accessed by internal IPv6 users must appear as IPv6 addresses (DNS64 synthesis of A records into AAAA records)
- Go-live within six weeks
The primary design constraints are:
- Quick transition solution required
- Internet and DC services are centralized at the HQ/hub site
- The current MPLS VPN WAN provider does not support IPv6
The transition approach is illustrated in Figure 2.
Phase 1 — Fast IPv6 Enablement:
- Enable IPv6 (dual stack) on all network nodes, starting from the DC then WAN routers
- Enable IPv6 routing on DC, WAN routers (hub and spokes), and Internet edge
- Enable stateful NAT64 at the IPv4 Internet edge to provide IPv4 Internet access for internal IPv6 devices
- Introduce DNS64 to synthesize IPv4 DNS A records into AAAA records, making IPv4 Internet services appear as IPv6 to internal users
- Enable static NAT64 at DC edge nodes for internal IPv6 users to access legacy IPv4-only applications
- Enable static NAT64 at the IPv4 Internet gateway for external users to access ABC Corp.’s IPv6 web services over the IPv4 Internet
- Interconnect IPv6 network islands (spokes/remote sites) with HQ using IPv6 over mGRE/DMVPN tunneling over the IPv4 MPLS VPN WAN
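The DNS64 and NAT64 steps in Phase 1 rely on the same address mapping. DNS64 embeds an IPv4 A record into an IPv6 prefix to synthesize a AAAA record, and stateful NAT64 later extracts the IPv4 destination from that address. A minimal Python sketch follows. It assumes the RFC 6052 well-known prefix 64:ff9b::/96, because the scenario does not state which NAT64 prefix ABC Corp. uses; a network-specific /96 works the same way. The helper names are hypothetical.

```python
import ipaddress

# Assumed NAT64 prefix: the RFC 6052 well-known prefix.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(a_record: str, prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv6Address:
    """DNS64 side: place the IPv4 address in the low 32 bits of a /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 NAT64 prefixes")
    v4 = ipaddress.IPv4Address(a_record)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(aaaa: ipaddress.IPv6Address, prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv4Address:
    """NAT64 side: recover the IPv4 destination from a synthesized address."""
    if aaaa not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(aaaa) & 0xFFFFFFFF)


synth = synthesize_aaaa("192.0.2.33")
print(synth)                # 64:ff9b::c000:221
print(extract_ipv4(synth))  # 192.0.2.33
```

RFC 6052 restricts the well-known prefix to global IPv4 addresses. Internal legacy applications with private IPv4 addresses, such as those behind the DC edge static NAT64, would therefore need a network-specific prefix.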
Phase 2 — Design Optimization:
- Migrate the WAN to a provider supporting IPv6 MPLS L3VPN (6VPE), replacing the IPv6 DMVPN overlay
- Disable IPv4 routing in network areas where no IPv4 clients/hosts exist (e.g., remote sites), reducing load from holding separate RIB/FIB tables per IP version
Quality of Service Design Considerations
In today’s converged networks, there is an extremely high reliance on IT services and applications. Converged IP networks carry various traffic types — voice (IP telephony, HD audio, VoIP), video (video-on-demand, interactive video, telepresence, IP surveillance, digital signage), and an unlimited variety of data applications — each with different network requirements.
To deliver the desired quality of experience, network designers need a mechanism that can selectively prioritize traffic by providing dedicated bandwidth, controlled jitter and latency, and improved loss characteristics, while ensuring that prioritizing one flow does not cause other flows to fail. This mechanism is quality of service (QoS).
QoS High-Level Design: Business-Driven Approach
To design and deploy QoS successfully, network designers must follow the top-down approach: first understand the critical applications from the business point of view, then assess the optimal QoS design strategies to meet business and application requirements.
The goal is to align QoS design with business priorities and expectations. For example, a financial application sensitive to packet loss must be treated as high priority because any loss of connectivity can cost the business significantly. Similarly, an SP with a strict SLA to deliver voice traffic with no more than 1% end-to-end packet loss must apply the right QoS design to meet that SLA — otherwise it faces tangible penalties and intangible reputation damage.
| Strategic Goal | Approach | Design Considerations |
|---|---|---|
| Understand business requirements | Understand business priorities and goals | Identify primary business drivers; highlight constraints such as budget |
| Identify the scope | Understand the scope of the QoS design (campus, WAN, VPN, SP edge, or end to end) | Is the application used within the campus, across the WAN, or over VPN? Is there any network in the path not directly controlled, such as a WAN? |
| Identify mission-critical applications | Identify which applications need to be treated differently; identify non-business applications | Identify mission-critical applications or services (e.g., SAP, FCoE, VoIP, TelePresence) |
| Understand application requirements | Identify the characteristics of each application | What network delivery is required: TCP, UDP, unicast, multicast? Application sensitivity to packet loss, jitter, and delay |
| Select a design strategy and identify technical constraints | Clarify the end-to-end design strategy: number of QoS classes, QoS toolset, etc. | What traffic classification strategy is used within the LAN (e.g., 8 or 12 classes)? What MPLS DiffServ tunneling mode is used? Is the core/WAN native IP or MPLS? What CoS are supported over the WAN? Can the targeted network node support the required number of queues or priority queuing? |
QoS Architecture
There are two fundamental QoS architecture models:
- Integrated Services (IntServ) (RFC 1633): Offers end-to-end QoS based on application transport requirements (usually per flow) by explicitly controlling network resources and reserving the required bandwidth end to end along the path per network node for each traffic flow. Resource reservation protocols such as RSVP and admission control mechanisms form the foundation of this process.
- Differentiated Services (DiffServ) (RFC 2475): Offers QoS based on classifying traffic into multiple subclasses where packet flows are assigned different markings to receive different forwarding treatment (per-hop behavior, PHB) per network node along the path within each differentiated services domain (DS domain).
Both QoS architectural models apply to IPv4 and IPv6, because both headers carry the same 8-bit field (IPv4: Type of Service; IPv6: Traffic Class). However, the IPv6 fixed header is 40 bytes, versus 20 bytes for an IPv4 header without options, and this larger header must be considered when calculating the aggregate bandwidth of traffic flows.
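To show how much the larger header matters, the following Python sketch computes the Layer 3 bandwidth of one voice stream over IPv4 and IPv6. The codec parameters are an assumption for illustration: G.711 with 20-ms packetization, which gives a 160-byte payload at 50 packets per second. Layer 2 overhead and IPv6 extension headers are ignored.

```python
# Fixed header sizes in bytes (no IPv4 options, no IPv6 extension headers).
RTP_BYTES, UDP_BYTES = 12, 8
IP_HEADER_BYTES = {"ipv4": 20, "ipv6": 40}


def voice_bandwidth_bps(payload_bytes: int, packets_per_second: int, ip_version: str) -> int:
    """Layer 3 bandwidth of one RTP voice stream, excluding Layer 2 framing."""
    packet_bytes = payload_bytes + RTP_BYTES + UDP_BYTES + IP_HEADER_BYTES[ip_version]
    return packet_bytes * 8 * packets_per_second


# Assumed G.711 stream: 20-ms packetization -> 160-byte payload at 50 packets/s.
for version in ("ipv4", "ipv6"):
    print(version, voice_bandwidth_bps(160, 50, version))
# ipv4 80000
# ipv6 88000
```

Under these assumptions each call grows from 80 kbps to 88 kbps, a 10% increase. A priority queue sized for 100 concurrent IPv4 calls (8 Mbps) would need about 8.8 Mbps for the same calls over IPv6.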
QoS DiffServ Architecture and Toolset
A true and effective QoS design must cover traffic flows end to end. Because each traffic flow may traverse multiple networks with different QoS philosophies, the design must be divided into differentiated services domains (DS domains) as described in RFC 2475. Each DS domain consists of multiple interconnected network nodes operating under a common service provisioning policy, with a set of PHB groups enabled on each node.
Each DS domain has two primary types of nodes:
- Internal nodes: Nodes belonging to a single DS domain, sharing the same QoS provisioning policy
- DS boundary nodes: Nodes facing other DS or non-DS-capable domains, responsible for applying traffic policies (QoS policies) on traffic flows in both directions (ingress and egress) based on a predefined or agreed model between domains
Figure 3 illustrates the relationship between DS domains and boundary nodes.
DS domains can take different forms:
- An enterprise domain with an SP domain in the middle (WAN transport)
- Within an enterprise: multiple DS domains such as campus LAN, WAN, DC, and DMVPN over Internet
In the second scenario, multiple DS domains belonging to a single administrative authority can be combined under one global DS region. Each DS domain in that region can have its own QoS provisioning standards, offering a more structured and tiered design for large-scale networks (Figure 4).
Traffic conditioning and QoS policies are enforced at multiple points across each domain using the following primary QoS toolset:
- Traffic classification and marking
- Traffic profiling and congestion management
- Congestion avoidance (active queue management)
- Admission control
Traffic Classification and Marking
Traffic classification selects frames or packets in a traffic stream based on the content of some portion of the frame or packet header, to which different policies can then be applied. Traffic marking writes a value into the packet header to be identified by QoS policies and placed in the desired class with the desired treatment at different stages during the end-to-end packet trip.
Classification does not always require marking. In some scenarios, traffic only needs to be selected based on IP header fields (source/destination address, source/destination port, incoming interface) and associated with a QoS policy action such as placing it in a predefined queue. Classification should almost always be performed at the point of network access (as close to the source as possible), then associated with the appropriate marking value (usually ToS header bits) so that QoS policies can be applied at any node across the network.
Marking also establishes trust boundaries at the edge of the network — the point where markings such as CoS or DSCP begin to be accepted as set by the connected endpoint. Trust boundaries are classified into three primary models (Figure 5):
- Trusted model: Used with endpoints that can mark their own traffic and are approved from a security standpoint — such as IP phones, voice gateways, wireless access points, videoconferencing, and video surveillance endpoints. Ideally these are fixed (non-mobile) endpoints.
- Untrusted model: Uses manual traffic classification and marking. Common candidates are PCs and servers, which are subject to attack and infection. Malicious traffic marked with high-priority CoS/DSCP values can cause a true DoS situation. Network designers selectively classify each application’s traffic flows and mark them with the desired CoS/DSCP value, along with a policy that either limits each class to a predefined maximum bandwidth or marks down out-of-profile traffic to a lower-priority value.
- Conditional trust model: Extends the trust boundary to a connected device such as an IP phone (detected via CDP in Cisco solutions). The IP phone sends its traffic in a trusted manner while overriding PC traffic (connected to the back of the phone) to DSCP 0. Offers a simple method for large IP telephony deployments. If PCs run applications requiring specific DSCP values (e.g., softphone), manual classification and marking at the access switch edge port are required.
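The untrusted model can be sketched as a classifier that ignores the endpoint's own marking, assigns a DSCP from header fields, and marks down traffic that exceeds a per-class rate. The Python below is a minimal illustration under stated assumptions. The port ranges, the 128-kbps policing rate, and the choice of CS1 (DSCP 8) as the markdown value are hypothetical. A single-rate token bucket stands in for whatever policer a real platform provides.

```python
from dataclasses import dataclass, field

# Hypothetical classification rules for an untrusted access port:
# (protocol, destination port range, DSCP to mark).
RULES = [
    ("udp", range(16384, 32768), 46),  # assumed RTP voice port range -> EF
    ("tcp", range(1433, 1434), 18),    # assumed business database port -> AF21
]
DEFAULT_DSCP = 0    # unclassified traffic is sent best effort
MARKDOWN_DSCP = 8   # CS1, assumed markdown value for out-of-profile traffic


@dataclass
class TokenBucket:
    """Single-rate token bucket that decides whether a packet is in profile."""
    rate_bps: float
    burst_bytes: float
    tokens: float = field(init=False)
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full bucket

    def conforms(self, now: float, size_bytes: int) -> bool:
        # Refill for the elapsed time (rate in bits/s, bucket in bytes), capped at the depth.
        self.tokens = min(self.burst_bytes, self.tokens + (now - self.last) * self.rate_bps / 8)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False


def classify_and_mark(proto: str, dst_port: int, size_bytes: int, now: float,
                      policers: dict[int, TokenBucket]) -> int:
    """Ignore the endpoint's marking: classify on header fields, then police the class."""
    dscp = DEFAULT_DSCP
    for rule_proto, ports, rule_dscp in RULES:
        if proto == rule_proto and dst_port in ports:
            dscp = rule_dscp
            break
    policer = policers.get(dscp)
    if policer is not None and not policer.conforms(now, size_bytes):
        return MARKDOWN_DSCP  # out-of-profile traffic is marked down, not dropped
    return dscp


policers = {46: TokenBucket(rate_bps=128_000, burst_bytes=400)}  # EF policed to 128 kbps
print([classify_and_mark("udp", 20000, 200, 0.0, policers) for _ in range(3)])  # [46, 46, 8]
```

Marking down instead of dropping follows the text's second option: excess traffic still gets through when capacity allows, but it can no longer claim high-priority treatment.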
Marking values can also be re-marked at any location within or between DS domains. Between domains, re-marking handles mismatches between ToS values. Within a domain, re-marking moves out-of-profile traffic into a lower-priority class as a protective countermeasure.
| OSI Layer | Classification | Marking |
|---|---|---|
| Physical | Input interface | N/A |
| Layer 2 | VLAN ID, MAC, IEEE 802.1Q/p CoS | IEEE 802.1Q/p CoS |
| Layer 2.5 | MPLS label, MPLS EXP | MPLS EXP |
| Layer 3 | IP DSCP, IP source/destination | IPP, DSCP |
| Layer 4 | Source/destination port | IPP, DSCP, EXP |
| Layers 5–7 | Application signature (e.g., NBAR) | IPP, DSCP, EXP |
DSCP marking is more commonly used than IP Precedence (IPP) because of its higher flexibility and scalability. However, a mix of both may be required in migration or integration scenarios between different domains (such as M&A or WAN MPLS VPN providers offering CoS based on IPP). In this case, class selector PHB provides backward compatibility with ToS-based IP Precedence (RFC 4594, 2474).
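Class selector codepoints reuse the three high-order bits of the DS field, so the mapping between DSCP and IP Precedence is simple bit arithmetic. The Python sketch below shows that mapping. It also shows the AFxy layout (RFC 2597) that later determines WRED drop precedence. The function names are hypothetical.

```python
def dscp_to_ipp(dscp: int) -> int:
    """IP Precedence is the three high-order bits of the 6-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp >> 3


def class_selector(ipp: int) -> int:
    """Class selector CSn (RFC 2474): precedence n in the top three bits, low three bits zero."""
    if not 0 <= ipp <= 7:
        raise ValueError("IP Precedence must be 0-7")
    return ipp << 3


def af_codepoint(af_class: int, drop_precedence: int) -> int:
    """AFxy (RFC 2597): class x in the top three bits, drop precedence y in the next two."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return (af_class << 3) | (drop_precedence << 1)


print(dscp_to_ipp(46))      # EF -> IPP 5
print(class_selector(5))    # CS5 -> DSCP 40
print(af_codepoint(4, 1))   # AF41 -> DSCP 34
print(dscp_to_ipp(34))      # AF41 -> IPP 4
```

The DSCP-to-IPP mapping keeps only the top three bits, so AF41, AF42, and AF43 all collapse to IP Precedence 4. Drop-precedence information can therefore be lost when traffic crosses a domain that honors only IPP.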
After traffic flows are classified and marked, they are grouped into DS classes. Application flows sharing similar traffic characteristics and network requirements (delay, jitter, packet loss) are placed under the same DS class, enabling network operators to assign the desired treatment per class at different locations across the DS domain — such as assigning different queuing models per class to control traffic during congestion.
Traffic Profiling and Congestion Management
During normal operation where traffic is at or below the maximum available bandwidth, packets are sent out of the interface as soon as they arrive. During congestion, packets arrive faster than the outgoing interface can handle them, leading to undesirable outcomes for business-critical applications and user quality of experience.
If the network is overprovisioned with bandwidth, QoS adds minimal value. However, it is common practice to enable QoS with a minimum number of classes to cater to critical applications in case of unpredicted congestion — such as a node failure causing overutilization of a secondary path if capacity planning did not account for failure scenarios.
Congestion management allows nodes to queue accumulating packets at the outbound interface until the interface (Tx-Ring) is free. Transmission of queued packets is scheduled based on assigned priority and a queuing mechanism configured per traffic flow aggregate (predefined traffic profiling).
| Queuing Mechanism | Characteristics |
|---|---|
| Weighted Fair Queuing (WFQ) | Dynamic distribution among all traffic flows based on predefined values such as DSCP |
| Priority Queuing (PQ) | Typically supports four queues with different priority levels; higher-priority queues are always serviced first |
| Class-Based WFQ (CBWFQ) | Provides class-based queuing (user-defined classes) with a minimum bandwidth guarantee; supports flow-based WFQ for undefined classes (class-default); supports Low-Latency Queuing (LLQ) |
Other techniques such as WRR and custom queuing exist but are less commonly used. FIFO is the default when no other queuing is configured — suitable for large links with low delay and minimal congestion, but with no priority or traffic classes.
WFQ offers simplified, automated, fair flow distribution but can impact certain applications. For example, a telepresence endpoint requiring 5 Mbps over a 10-Mbps WAN link shared by ten flows would receive only 1 Mbps under WFQ fairness, degrading video quality. With CBWFQ, network designers can place telepresence RTP streams in their own class with a minimum bandwidth guarantee of 5 Mbps during congestion. Interactive video traffic can be assigned to the LLQ to be prioritized and serviced first.
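The arithmetic in this example can be checked with a simple fluid model. The Python sketch below uses equal-weight max-min fairness as an idealized stand-in for WFQ; real WFQ weights flows, for example by IP Precedence. The CBWFQ case is modeled as a 5-Mbps guarantee for the telepresence class, with the remainder shared fairly among the other flows. The demand of 2 Mbps per data flow is an assumption.

```python
def max_min_fair(capacity: float, demands: list[float]) -> list[float]:
    """Equal-weight max-min fair shares: the idealized fluid behavior WFQ approximates."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Flows that need no more than an equal share are fully satisfied this round.
        satisfied = [i for i in active if demands[i] - alloc[i] <= share]
        if not satisfied:
            for i in active:
                alloc[i] += share
            break
        for i in satisfied:
            remaining -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        active = [i for i in active if i not in satisfied]
    return alloc


def cbwfq_guarantee(capacity: float, guaranteed_demand: float, guarantee: float,
                    other_demands: list[float]) -> tuple[float, list[float]]:
    """One class gets a minimum guarantee; the rest share the remainder fairly.

    Simplification: the guaranteed class does not borrow bandwidth beyond its guarantee.
    """
    protected = min(guaranteed_demand, guarantee)
    return protected, max_min_fair(capacity - protected, other_demands)


data_flows = [2.0] * 9  # assumed demand of the nine other flows, in Mbps
print(max_min_fair(10.0, [5.0] + data_flows)[0])      # telepresence under WFQ: 1.0
print(cbwfq_guarantee(10.0, 5.0, 5.0, data_flows)[0])  # telepresence under CBWFQ: 5.0
```

In the CBWFQ case the nine data flows share the remaining 5 Mbps, about 0.56 Mbps each. That is the trade-off the designer accepts to protect the business-critical video.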
CBWFQ supports two LLQ models: single LLQ and multi-LLQ. With multi-LLQ, multiple sub-LLQs can be enabled inside a single aggregate strict priority queue, allowing multiple traffic types (e.g., VoIP and video) to be assigned to the LLQ. However, service within the LLQ itself is FIFO-based, so admission control is required to protect one LLQ from another (e.g., protecting a voice LLQ from a video LLQ).
Cisco IOS includes a built-in implicit policer with the LLQ that limits the available bandwidth of the strict-priority queue to the allocated amount, preventing bandwidth starvation of non-real-time flows serviced by the CBWFQ scheduler. This applies only during periods of interface congestion (full Tx-Ring). A similar implicit policer applies per sub-LLQ in the multi-LLQ model.
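The effect of the implicit policer can be expressed as a short calculation. The Python sketch below is a fluid approximation, not a model of any specific IOS scheduler. The rates are hypothetical, in Mbps, and all non-priority traffic is treated as one CBWFQ aggregate.

```python
def llq_bandwidth_split(link: float, pq_offered: float, pq_allocated: float,
                        cbwfq_offered: float) -> tuple[float, float, float]:
    """Return (priority sent, priority dropped, CBWFQ sent), all in the same rate unit."""
    if pq_offered + cbwfq_offered <= link:
        # No congestion: the implicit policer is not enforced, so the LLQ may exceed its allocation.
        return pq_offered, 0, cbwfq_offered
    # Congestion: the implicit policer caps the LLQ at its allocated bandwidth.
    pq_sent = min(pq_offered, pq_allocated)
    pq_dropped = pq_offered - pq_sent
    # Bandwidth the LLQ cannot use is left for the CBWFQ scheduler.
    cbwfq_sent = min(cbwfq_offered, link - pq_sent)
    return pq_sent, pq_dropped, cbwfq_sent


# Hypothetical 10-Mbps link with a 3-Mbps LLQ allocation.
print(llq_bandwidth_split(10, 8, 3, 12))  # (3, 5, 7): congested, excess priority traffic dropped
print(llq_bandwidth_split(10, 4, 3, 5))   # (4, 0, 5): not congested, policer not enforced
```

Without the policer (modeled as `pq_allocated=10`), the same congested case returns `(8, 0, 2)`. The priority queue would take 8 Mbps and leave only 2 Mbps for all non-real-time classes.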
Hierarchical QoS
At the enterprise edge, links are commonly provisioned at sub-line rate — for example, a physical 1-Gbps Ethernet link provisioned at 10 Mbps or 50 Mbps. In this setup, QoS policies such as CBWFQ provide no value because QoS only activates when the interface detects congestion. Since the physical line rate is higher than the provisioned bandwidth, no congestion is detected even when the actual provisioned rate is saturated.
Hierarchical QoS (HQoS) solves this by using a shaper at the parent policy to simulate backpressure, informing the router that congestion has occurred at the provisioned rate so that child QoS policies can take effect (Figure 6).
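The following Python sketch illustrates why the parent shaper matters. It models the shaper simply as serializing packets at the shaped rate, ignoring Bc/Tc burst behavior, and uses strict priority as a hypothetical child policy. At the 1-Gbps line rate no queue forms, so the child policy never has a choice to make. At a 50-Mbps shaped rate, packets back up behind the shaper and the voice packet is moved ahead of the queued data.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    arrival_us: float
    size_bytes: int
    priority: bool
    name: str


def transmit_order(packets: list[Packet], rate_bits_per_us: float) -> list[str]:
    """Serialize packets at the given rate; the child policy serves priority packets first."""
    pending = sorted(packets, key=lambda p: p.arrival_us)
    waiting: list[Packet] = []
    order: list[str] = []
    clock = 0.0
    i = 0
    while i < len(pending) or waiting:
        if not waiting:
            # Interface idle: jump ahead to the next arrival.
            clock = max(clock, pending[i].arrival_us)
        # Enqueue everything that has arrived by now.
        while i < len(pending) and pending[i].arrival_us <= clock:
            waiting.append(pending[i])
            i += 1
        # Child policy: strict priority first, FIFO otherwise.
        pick = next((p for p in waiting if p.priority), waiting[0])
        waiting.remove(pick)
        clock += pick.size_bytes * 8 / rate_bits_per_us
        order.append(pick.name)
    return order


# Four 1250-byte data packets arriving at 1-Gbps spacing, then one 250-byte voice packet.
burst = [Packet(t, 1250, False, f"data{n}") for n, t in enumerate((0, 10, 20, 30))]
burst.append(Packet(40, 250, True, "voice"))

print(transmit_order(burst, 1000))  # 1 Gbps:  ['data0', 'data1', 'data2', 'data3', 'voice']
print(transmit_order(burst, 50))    # 50 Mbps: ['data0', 'voice', 'data1', 'data2', 'data3']
```

In the unshaped case the provider would still drop or delay traffic beyond the 50-Mbps contract, but the router would never notice. The shaper makes that congestion visible where the child policy can act on it.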
Congestion Avoidance (Active Queue Management)
Congestion management techniques manage the front of the queue — which packets are sent first. Congestion avoidance algorithms manage the tail of the queue — which packets are dropped first when queuing buffers are full.
Weighted Random Early Detection (WRED) is the most commonly used technique. Packets are dropped based on their ToS markings:
- IP Precedence-based: Packets with lower IPP values are dropped more aggressively than those with higher IPP values
- DSCP-based: Packets with higher AF drop precedence values are dropped more aggressively
When WRED selectively drops packets, it triggers TCP windowing mechanisms to adjust flow rates to manageable levels, optimizing TCP-based applications. WRED is a member of the broader Active Queue Management (AQM) family of technologies.
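WRED's per-packet decision can be sketched in Python using the classic RED drop profile, keyed by DSCP:

- Below a minimum threshold, no packets are dropped.
- Between the minimum and maximum thresholds, the drop probability rises linearly up to a maximum.
- Above the maximum threshold, all packets are dropped (tail drop).

The thresholds below are illustrative assumptions, not Cisco defaults. Real implementations also compare against an exponentially weighted average queue depth rather than the instantaneous depth.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DropProfile:
    min_th: int     # average queue depth (packets) where random drops start
    max_th: int     # depth where the drop probability reaches max_p
    max_p: float    # drop probability at max_th


# Hypothetical profiles: a higher AF drop precedence starts dropping earlier.
PROFILES = {
    "AF11": DropProfile(min_th=32, max_th=40, max_p=0.1),
    "AF12": DropProfile(min_th=28, max_th=40, max_p=0.1),
    "AF13": DropProfile(min_th=24, max_th=40, max_p=0.1),
}


def drop_probability(avg_depth: float, p: DropProfile) -> float:
    if avg_depth < p.min_th:
        return 0.0
    if avg_depth >= p.max_th:
        return 1.0  # tail drop beyond the maximum threshold
    return p.max_p * (avg_depth - p.min_th) / (p.max_th - p.min_th)


def wred_drop(dscp_name: str, avg_depth: float, rng: random.Random) -> bool:
    """Randomly decide whether to drop a packet with the given marking."""
    return rng.random() < drop_probability(avg_depth, PROFILES[dscp_name])


for name, profile in PROFILES.items():
    print(name, round(drop_probability(34, profile), 4))
```

At the same queue depth, AF13 packets are dropped more often than AF12, and AF12 more often than AF11. This matches the DSCP-based behavior described above.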
Review Questions
5. Which of the following QoS queuing characteristics is an example of Weighted Fair Queuing?
- Supports real-time queuing and minimum bandwidth guarantee
- Offers a dynamic distribution based on DSCP values
- Four queues with associated levels of importance, with the most important being serviced first
- Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic
b. The WFQ algorithm offers a dynamic distribution among all traffic flows based on weights such as DSCP values.
6. Which of the following QoS queuing characteristics is an example of FIFO?
- Supports real-time queuing and minimum bandwidth guarantee
- Offers a dynamic distribution based on DSCP values
- Typically four to six queues with associated levels of importance, with the most important being serviced first
- Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic
d. FIFO queuing is the default when no other queuing is used. Although suitable for large links with low delay and minimal congestion, it has no priority or classes of traffic.
7. Which of the following QoS queuing characteristics is an example of Priority Queuing?
- Supports real-time queuing and minimum bandwidth guarantee
- Offers a dynamic distribution based on DSCP values
- Typically four to six queues with associated levels of importance, with the most important being serviced first
- Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic
c. Priority Queuing typically has four to six queues with different priority levels, and the higher-priority queues are always serviced first.
8. Which of the following QoS queuing characteristics is an example of LLQ?
- Supports real-time queuing and minimum bandwidth guarantee
- Offers a dynamic distribution based on DSCP values
- Typically four to six queues with associated levels of importance, with the most important being serviced first
- Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic
a. LLQ supports real-time queuing and minimum bandwidth guarantee.
Admission Control
Admission control keeps traffic flows in compliance with DS domain traffic conditioning standards — such as an SLA specifying the maximum allowed traffic rate per class and per link, where excess packets are discarded to keep flows within the agreed traffic profile. There are two primary ways to perform admission control:
- Traffic policing: When traffic reaches the predefined maximum contracted rate, excess traffic is either dropped or re-marked (marked down)
- Traffic shaping: Excess packets are buffered and delayed, then scheduled for later transmission over increments of time, smoothing the output rate and preventing unnecessary drops
The difference between policing and shaping is illustrated in Figure 7 and Figure 8.
Buffering excess packets in traffic shaping may introduce delay, especially with deep queues. For real-time traffic, it is sometimes preferable to police and drop excess packets rather than delay them, to avoid degraded quality of experience.
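The difference between the two admission control methods can be sketched with a token bucket in Python. The model makes three simplifying assumptions: packets are fixed-size, arrival times are known in advance, and the bucket refills at the contracted rate. The function names are hypothetical. The policer drops non-conforming packets, while the shaper delays them until enough tokens accumulate.

```python
def police(arrivals: list[float], size_bits: int, rate_bps: float, burst_bits: int) -> list[bool]:
    """Return True for each packet that conforms (is sent), False if dropped."""
    tokens, last, verdicts = float(burst_bits), 0.0, []
    for t in arrivals:
        tokens = min(burst_bits, tokens + (t - last) * rate_bps)  # refill
        last = t
        if tokens >= size_bits:
            tokens -= size_bits
            verdicts.append(True)
        else:
            verdicts.append(False)  # exceed action: drop (or re-mark)
    return verdicts


def shape(arrivals: list[float], size_bits: int, rate_bps: float, burst_bits: int) -> list[float]:
    """Return each packet's departure time; excess packets wait for tokens."""
    tokens, last, departures = float(burst_bits), 0.0, []
    for t in arrivals:
        t = max(t, departures[-1]) if departures else t  # FIFO: no overtaking
        tokens = min(burst_bits, tokens + (t - last) * rate_bps)
        last = t
        if tokens < size_bits:               # buffer the packet until it conforms
            wait = (size_bits - tokens) / rate_bps
            t += wait
            tokens += wait * rate_bps
            last = t
        tokens -= size_bits
        departures.append(t)
    return departures


# Five 1,000-bit packets arrive back to back; contract is 1,000 bps with a 2,000-bit burst.
burst = [0.0] * 5
print(police(burst, 1000, 1000.0, 2000))  # first two conform, the rest are dropped
print(shape(burst, 1000, 1000.0, 2000))   # all are sent, later ones are delayed
```

The shaper never drops in this model because its buffer is unbounded. The growing departure times show the added delay that makes policing preferable for some real-time flows.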
QoS Design Strategy
Effective QoS design must be measured end to end across the packet’s trip from source to destination. Network designers must consider a consistent and unified QoS design model based on available bandwidth, traffic characteristics, and network scope (campus only, WAN only, or end to end across the entire enterprise — single DS domain versus multiple DS domains).
Table 8 provides a generic 12-class QoS baseline model based on both the Cisco QoS Baseline and informational RFC 4594, offering common and unified traffic marking and profiling characteristics across single and multiple DS domains.
| Application Class | Per-Hop Behavior | IETF RFC | Queuing & Dropping | Application Examples |
|---|---|---|---|---|
| VoIP Telephony | EF | 3246 | Priority Queue (PQ) | IP Telephony (IPT) |
| Broadcast Video | CS5 | 2474 | (Optional) PQ | IP Video Surveillance/IPTV |
| Real-time Interactive VC | CS4 | 2474 | (Optional) PQ | Telepresence |
| Multimedia Conferencing | AF4 | 2597 | BW Queue + DSCP WRED | IPT Video |
| Multimedia Streaming | AF3 | 2597 | BW Queue + DSCP WRED | Video on Demand (VoD), E-learning |
| Network Control | CS6 | 2474 | BW Queue | EIGRP, OSPF, BGP, HSRP, IKE |
| Call-Signaling | CS3 | 2474 | BW Queue | SCCP, SIP, H.323 |
| Mgmt (OAM) | CS2 | 2474 | BW Queue | SNMP, SSH, Syslog |
| Low-Latency Data | AF2 | 2597 | BW Queue + DSCP WRED | ERP Apps, CRM Apps, Database Apps |
| High-Throughput Data | AF1 | 2597 | BW Queue + DSCP WRED | E-mail, FTP, Backup Apps, Content Distribution |
| Best Effort | DF | — | Default Queue + RED | Default Class |
| Low-Priority Data | CS1 | 3662 | Min BW Queue (Deferential) | YouTube, iTunes, BitTorrent, Xbox Live |
Table 8 follows Cisco's adaptation of RFC 4594, which swaps two markings relative to the RFC. Cisco marks Broadcast Video as CS5 and Call-Signaling as CS3, whereas RFC 4594 recommends CS3 for Broadcast Video and CS5 for Signaling. Cisco kept Call-Signaling at CS3 because it had already migrated call signaling from AF31 to CS3 under its original QoS Baseline (2002).
The IETF DiffServ RFCs provide consistent PHBs for applications marked to specific DSCP values but do not specify which application should be marked with which DSCP value. RFC 4594 (informational, August 2006) puts forward 12 application classes matched to RFC-defined PHBs. Its most significant difference from the original Cisco QoS Baseline (2002) is the recommendation to mark call signaling as CS5 rather than CS3. RFC 4594 is an informational RFC: an industry best practice, not a standard.
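Each PHB name in Table 8 maps to a fixed DSCP codepoint. The Python sketch below derives these values from the RFC definitions:

- A Class Selector CSx is x × 8 (RFC 2474).
- An Assured Forwarding codepoint AFxy is 8x + 2y (RFC 2597).
- EF is 46 (RFC 3246).
- DF is 0.

The helper name is illustrative.

```python
import re


def dscp_value(phb: str) -> int:
    """Return the decimal DSCP for a PHB name such as 'EF', 'CS3', 'AF41', or 'DF'."""
    phb = phb.upper()
    if phb == "EF":
        return 46                        # Expedited Forwarding, RFC 3246
    if phb in ("DF", "BE"):
        return 0                         # Default Forwarding / best effort
    if m := re.fullmatch(r"CS([0-7])", phb):
        return int(m.group(1)) * 8       # Class Selector: precedence bits only
    if m := re.fullmatch(r"AF([1-4])([1-3])", phb):
        af_class, drop = int(m.group(1)), int(m.group(2))
        return 8 * af_class + 2 * drop   # AF class plus drop precedence
    raise ValueError(f"unknown PHB {phb!r}")


for phb in ["EF", "CS5", "CS4", "AF41", "AF31", "CS6", "CS3", "CS2", "AF21", "AF11", "DF", "CS1"]:
    print(f"{phb:>4} = DSCP {dscp_value(phb):2d} = {dscp_value(phb):06b}")
```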
The 12-class model is comprehensive and flexible but is not always viable:
- Not all enterprises or SPs need such a wide QoS design model
- Most WAN providers offer only 4- or 6-class QoS models, making mapping complex
Both 4- and 6-class models provision only a single class for real-time traffic (usually voice). If video is added, either a higher class model (such as 8-class) is required, or voice and video must share a single class — which may not be desirable for large deployments with many IP telephony and video endpoints. Figure 9 shows how classes are mapped between models with different numbers of classes.
As a general rule, network designers should use a phased approach: start with a simple QoS model (such as 4-class) as a baseline, then add classes as requirements mandate. This minimizes initial design and operational complexity.
Although standard best practice guides recommend bandwidth allocation percentages per class, such as no more than 33% of available bandwidth for real-time traffic (LLQ) and at least 25% for best effort, these are generic baselines, not fixed rules. Consider the following example:
- A campus network with 20 access switches, each with 30 IP phones using G.711 codec (~80 kbps per call), and a WAN designed for a maximum of 20 simultaneous calls
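- Allocating 33% LLQ on 10-Gbps campus uplinks reserves 3.3 Gbps of strict-priority bandwidth, although each access switch uplink carries at most 30 calls (about 2.4 Mbps). The unused priority capacity is a potential security risk, because malicious traffic marked DSCP EF could consume it ahead of other classes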
- On the 10-Mbps WAN link, 20 simultaneous calls require only 1.6 Mbps; allocating 33% (3.3 Mbps) wastes bandwidth
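The sizing argument in this example reduces to a few lines of arithmetic. The Python sketch below uses the example's figures: about 80 kbps per G.711 call, 30 phones per access switch, and a maximum of 20 WAN calls. It compares the actual real-time requirement with a flat 33% LLQ allocation. The function name is illustrative.

```python
G711_KBPS = 80           # approximate per-call bandwidth at Layer 3
PHONES_PER_SWITCH = 30
WAN_MAX_CALLS = 20
LLQ_SHARE = 0.33         # generic "no more than 33%" guideline


def compare(link_kbps: float, max_calls: int) -> tuple[float, float]:
    """Return (kbps actually needed for voice, kbps reserved by a 33% LLQ)."""
    return max_calls * G711_KBPS, link_kbps * LLQ_SHARE


uplink_need, uplink_llq = compare(10_000_000, PHONES_PER_SWITCH)  # 10-Gbps access uplink
wan_need, wan_llq = compare(10_000, WAN_MAX_CALLS)                # 10-Mbps WAN link

print(f"Uplink: need {uplink_need / 1000:.1f} Mbps, 33% LLQ reserves {uplink_llq / 1000:.0f} Mbps")
print(f"WAN:    need {wan_need / 1000:.1f} Mbps, 33% LLQ reserves {wan_llq / 1000:.1f} Mbps")
```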
Network designers must adjust bandwidth allocation based on actual traffic flow requirements, security concerns, available bandwidth, and whether the link is LAN, WAN, or data center. The overall QoS design framework is summarized in Figure 10.
The QoS design framework follows this top-down flow:
- Business Requirements: Business drivers, goals, requirements, and design scope (WAN, campus, or end to end)
- Functional and Application Requirements: Identify high-priority business applications and understand their network requirements and attributes
- Classification and Marking: Classify and mark application flows as close to the traffic source as technically possible, combined with admission control
- Congestion Management and Avoidance: Profile traffic flows and aggregates into DS classes, then assign each class an appropriate queuing mechanism combined with a queue management technique
- Monitor and Optimize
Enterprise QoS Design Considerations
Enterprise Campus
Today’s campus networks are provisioned with Gigabit/10 Gigabit bandwidth, where queuing needs are minimal compared to the WAN and Internet edge. However, QoS in the campus is not limited to queuing functions. Unified marking and accurate traffic classification (as close to the source as possible) also enable policing across the campus LAN. This gives network operators flexibility to manage traffic based on ToS values and provides a protective mechanism against DoS attacks.
It is recommended that QoS be enabled across the campus LAN to maintain a seamless DS domain design where classification and marking policies establish trust boundaries, and policers protect against undesired flows at the access edge.
Enterprise Edge
The enterprise edge (WAN, extranet, or Internet) is where traffic flow aggregation occurs — many flows from the high-bandwidth LAN side must exit through a lower-capacity edge link. QoS is always a primary function at the enterprise edge for bandwidth optimization, especially for converged voice, video, and data traffic.
The enterprise edge represents the DS domain boundary where traffic must be mapped and profiled to align with the adjacent DS domain. For example, a 12-class enterprise model must be mapped to a 4-class SP model at the WAN edge router (CE) toward the SP edge (PE), as shown in Figure 11.
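A CE-side mapping like the one in Figure 11 can be represented as a lookup table. The Python sketch below is hypothetical. The four SP class names (`REALTIME`, `CRITICAL`, `BUSINESS`, `DEFAULT`) and the choice of which enterprise classes collapse into each are assumptions for illustration, not any particular provider's model.

```python
# Hypothetical 4-class SP model; real providers publish their own class names and DSCPs.
ENTERPRISE_TO_SP = {
    "VoIP Telephony": "REALTIME",
    "Broadcast Video": "CRITICAL",
    "Real-time Interactive VC": "CRITICAL",
    "Multimedia Conferencing": "CRITICAL",
    "Network Control": "CRITICAL",
    "Call-Signaling": "CRITICAL",
    "Multimedia Streaming": "BUSINESS",
    "Low-Latency Data": "BUSINESS",
    "Mgmt (OAM)": "BUSINESS",
    "High-Throughput Data": "BUSINESS",
    "Best Effort": "DEFAULT",
    "Low-Priority Data": "DEFAULT",
}


def sp_class(enterprise_class: str) -> str:
    """Map an enterprise class to the SP class at CE egress; unknown traffic is best effort."""
    return ENTERPRISE_TO_SP.get(enterprise_class, "DEFAULT")


# Sanity check: all 12 enterprise classes collapse into exactly four SP classes.
print(len(ENTERPRISE_TO_SP), sorted(set(ENTERPRISE_TO_SP.values())))
```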
It is common for SPs to offer CoS based on IP Precedence only. A DSCP value such as AF41 (binary 100010) converts to IPP 4 (binary 100), which comes back as DSCP 32 (binary 100000) at the remote site. Re-marking is required at the receiving side in the ingress direction to maintain unified end-to-end QoS marking.
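The bit arithmetic behind this marking loss is easy to check. IP Precedence is the top three bits of the six-bit DSCP field. An SP that honors only IPP therefore discards the lower three bits, and a remote CE that rebuilds DSCP from IPP gets the Class Selector value. The Python function names below are illustrative.

```python
def dscp_to_ipp(dscp: int) -> int:
    """IP Precedence is the three most significant bits of the 6-bit DSCP."""
    return dscp >> 3


def ipp_to_dscp(ipp: int) -> int:
    """Rebuilding DSCP from IPP alone yields the Class Selector codepoint (CSx)."""
    return ipp << 3


AF41 = 0b100010  # 34
ipp = dscp_to_ipp(AF41)
back = ipp_to_dscp(ipp)
print(f"AF41 {AF41:06b} ({AF41}) -> IPP {ipp:03b} ({ipp}) -> DSCP {back:06b} ({back})")
```

The rebuilt value, 32, is also the CS4 codepoint that Table 8 uses for Real-time Interactive traffic. The receiving CE therefore generally needs to classify on more than DSCP alone (for example, addresses or ports) before re-marking traffic back to AF41.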
IP Tunneling QoS Design Considerations
VPN solutions add extra IP and ESP headers to each packet, increasing overhead and bandwidth consumption. For example, a G.711 VoIP RTP stream requiring 80 kbps at Layer 3 increases to ~112 kbps when sent over GRE encrypted with IPsec, an increase of about 40% to transport the same call. Bandwidth consumption is therefore an essential VPN design consideration, because it usually reduces the number of simultaneous voice calls or application sessions that the available bandwidth can support.
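The bandwidth impact can be checked with per-packet arithmetic. The Python sketch below makes the following assumptions:

- G.711 uses a 20-ms packetization interval, giving 50 packets per second.
- Each packet carries a 160-byte voice payload plus 40 bytes of IP/UDP/RTP headers, which yields the 80 kbps figure.
- The per-packet GRE/IPsec overhead is back-calculated from the ~112 kbps quoted above. Real overhead varies with the cipher, padding, and tunnel mode.
- The 2-Mbps LLQ is a hypothetical value.

```python
PPS = 50                   # 20-ms packetization -> 50 packets per second (assumption)
VOICE_PACKET_BYTES = 200   # 160-byte G.711 payload + 20 IP + 8 UDP + 12 RTP
VPN_OVERHEAD_BYTES = 80    # back-calculated from ~112 kbps; varies with cipher and padding


def call_kbps(packet_bytes: int, pps: int = PPS) -> float:
    """Per-call bandwidth in kbps for a given on-the-wire packet size."""
    return packet_bytes * 8 * pps / 1000


clear = call_kbps(VOICE_PACKET_BYTES)
encrypted = call_kbps(VOICE_PACKET_BYTES + VPN_OVERHEAD_BYTES)
print(f"{clear:.0f} kbps -> {encrypted:.0f} kbps (+{(encrypted / clear - 1) * 100:.0f}%)")

LLQ_KBPS = 2000            # hypothetical priority-queue allocation
print(int(LLQ_KBPS // clear), "calls unencrypted vs", int(LLQ_KBPS // encrypted), "calls over GRE/IPsec")
```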
Additional QoS design factors to consider with VPN solutions:
- Additional packet delay from encryption and decryption
- Pre-crypto queuing
- QoS packet reordering (prioritization) may cause IPsec anti-replay to drop legitimate out-of-sequence packets
- ToS value preservation
- MTU issues
- Original flow information is hidden for outbound QoS policies by default (Cisco QoS pre-classify feature helps here)
- VPN logical topology vs underlying IP transport topology (e.g., DMVPN Phase 1 over MPLS L3VPN transforms any-to-any to hub-and-spoke)
The following QoS toolset forms the foundation of QoS design with VPN solutions:
- Hierarchical QoS: Helps when QoS policies must be applied on a sub-line rate interface or per-tunnel
- CBWFQ: Required when different traffic treatment is needed per class
- Admission control: Provides per-child QoS class admission control
- WRED: Optimizes TCP-based applications
With point-to-point VPN (classic IPsec or GRE), traffic flow is deterministic but introduces high operational complexity and scalability limitations in large deployments — for example, maintaining manual QoS policies per tunnel at the hub site.
DMVPN introduces additional QoS challenges because it supports hub-and-spoke topology with direct spoke-to-spoke traffic, and the hub’s higher-bandwidth pipe can easily congest the spokes. Adding a QoS policy per spoke at the hub is non-scalable.
The per-tunnel QoS feature with DMVPN promotes a zero-touch hub design using NHRP groups to dynamically provision QoS policies on a per-spoke basis at the hub mGRE tunnel during spoke registration. Remote sites are profiled based on their provisioned WAN/Internet bandwidth and automatically assigned a QoS policy that shapes traffic from the hub toward each spoke to match the spoke’s maximum download rate.
At the hub site, network operators still need to define a policy to shape the interface to the actual provisioned sub-line rate. Spokes should follow typical HQoS deployment where the WAN/Internet link is shaped to the maximum upload capacity, controlling direct spoke-to-spoke traffic streams.
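The zero-touch idea can be modeled in a few lines of Python. It only illustrates the logic the hub applies. Each spoke announces a group during NHRP registration, and the hub looks up and attaches the matching per-tunnel shaping policy without any per-spoke configuration. The NHRP group names, shaping rates, and the `register_spoke` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical NHRP group -> hub-to-spoke shaper (kbps), sized to each profile's download rate.
GROUP_POLICIES = {
    "SPOKE-10M": 10_000,
    "SPOKE-20M": 20_000,
    "SPOKE-50M": 50_000,
}


@dataclass
class TunnelPolicy:
    spoke_nbma: str   # spoke's transport (NBMA) address
    shape_kbps: int   # hub-to-spoke shaping rate


hub_policies: dict[str, TunnelPolicy] = {}


def register_spoke(tunnel_ip: str, nbma_ip: str, nhrp_group: str) -> TunnelPolicy:
    """Simulate the hub reacting to an NHRP registration that carries a group name."""
    rate = GROUP_POLICIES[nhrp_group]   # unknown group raises KeyError: no policy to apply
    policy = TunnelPolicy(spoke_nbma=nbma_ip, shape_kbps=rate)
    hub_policies[tunnel_ip] = policy
    return policy


register_spoke("10.0.0.11", "198.51.100.11", "SPOKE-10M")
register_spoke("10.0.0.12", "203.0.113.12", "SPOKE-50M")
for tunnel_ip, policy in hub_policies.items():
    print(tunnel_ip, policy)
```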
When calculating bandwidth for QoS shaping and queuing with DMVPN, GRE, IPsec, and L2 overhead must all be included because queuing and shaping are executed at the outbound physical interface of the mGRE tunnel. GETVPN, in contrast, preserves the entire original IP packet header (source/destination IPs, TCP/UDP ports, ToS byte, DF bit) because no tunnels are used — making the standard WAN QoS design directly applicable, with the exception that increased packet size must be factored into bandwidth calculations.
Network Management
Today’s modern networks carry multiple business-critical applications over one unified infrastructure. Traffic requirements in terms of pattern and volume can change over time due to organic growth, mergers and acquisitions, and new applications, meaning the network may end up handling traffic it was not designed for. Most network downtime is caused by human error, making controlled and tracked change management essential.
A network management solution must provide real-time and historical information about every activity across the network, enabling the IT team to act proactively rather than reactively, keeping mean time to repair (MTTR) as short as possible. Actions must be performed in a controlled and structured manner, tracked and recorded, and combined with automation to reduce human errors. The solution must address operation, administration, maintenance, and provisioning.
FCAPS (Fault, Configuration, Accounting, Performance, and Security)
FCAPS is a network management framework defined by the ISO that classifies network management objectives into five distinct categories:
- Fault management: Minimize network outages by detecting and isolating network issues, with corrective actions to overcome current issues and prevent recurrence. Examples: alarms, fault isolation, testing, troubleshooting.
- Configuration management: Maintain a current inventory of network equipment and configurations for planning, installation, and provisioning of new services and equipment.
- Accounting management: Ensure each user or entity is billed or allocated an appropriate cost reference based on activities and utilization. Examples: usage management, pricing, auditing, profitability analysis.
- Performance management: Monitor and track performance issues such as network bottlenecks by continuously collecting and analyzing statistical information. Examples: quality assurance, performance analysis, monitoring, capacity planning.
- Security management: Focus on the security of the management solution itself (access control, data confidentiality, integrity) and on monitoring the network for security aspects such as unauthorized access, traffic spikes (DoS attacks), and targeted application attacks.
Network Management High-Level Design Considerations
The following questions form the foundation of the network management solution:
- What is the targeted environment (enterprise, MPLS VPN SP, application SP, cloud-hosting SP)?
- Is there an existing network management solution? Does it follow a standard framework such as FCAPS?
- Is the solution being added to overcome an existing challenge or for enhancement?
- Are there any business-related constraints such as budget?
- What is the goal of the solution (monitoring and fault management, capacity planning, billing, security monitoring, or a combination)?
- Are there any security constraints, such as out-of-band management only, or can secure in-band management protocols be used?
After answering these questions, the detailed design should address:
- What information or events need to be collected or monitored?
- Where is the best place to gather the intended information or report relevant events?
- Where should collected information or events be sent?
- What degree of detail is required — full or partial data collection?
- Is the underlying transport network secure (internal) or untrusted (public Internet)?
- How are the confidentiality and integrity of polled or exported information maintained?
- What protocols and versions are supported by the elements to be monitored (e.g., SNMP, NetFlow)?
Multitier Network Management Design
Integrating and structuring multiple management systems in a hierarchical manner offers a more flexible and efficient network management solution. This layered approach reduces the number of alerts seen by operations staff, presenting only filtered and relevant information. The multitier approach offers the following benefits:
- Proactively identifies and corrects potential network issues before they become problems
- Optimizes IT productivity by reducing network connectivity loss to a minimum
- Focuses on the solution instead of the problem, reducing downtime duration (MTTR)
This approach is based on bottom-up communication between management systems using protocols including NetFlow, syslog, and SNMP. In large networks, a failure in one area can impact multiple devices, each independently alerting the NMS and creating duplicate instances of the same problem. The multitiered architecture is shown in Figure 12.
The architecture has three tiers:
- Event Management Tier: Collects input from network elements via SNMP, IPFIX, IP SLA, and syslog
- Network Management Tier (NMT): Performs root-cause analysis by correlating information from multiple sources, deduplicating events, and presenting only the most relevant events to operations personnel. Covers fault, accounting, performance, security, and configuration management.
- Service Management Tier: Adds intelligence and automation to filtered NMT events for further optimization, enabling operators to move from element-by-element (box-by-box) management to managing network events and identified problems
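The deduplication and root-cause role of the network management tier can be sketched in Python. The sketch relies on simplifying assumptions: a simple event format, a small topology map, and one rule. The rule suppresses a downstream event when its upstream device has already reported `link-down`. Real NMS correlation engines use far richer topology and timing models.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str
    kind: str     # e.g. "link-down", "unreachable"
    source: str   # SNMP trap, syslog, IP SLA, ...


# Hypothetical topology: each device's upstream (parent) device.
UPSTREAM = {"access-1": "dist-1", "access-2": "dist-1", "dist-1": "core-1"}


def correlate(events: list[Event]) -> list[str]:
    """Deduplicate events per device and suppress symptoms of an upstream failure."""
    kinds_by_device: dict[str, set[str]] = defaultdict(set)
    for e in events:                      # duplicates from SNMP and syslog collapse here
        kinds_by_device[e.device].add(e.kind)
    down = {d for d, kinds in kinds_by_device.items() if "link-down" in kinds}
    root_causes = []
    for device, kinds in sorted(kinds_by_device.items()):
        if UPSTREAM.get(device) in down:
            continue                      # symptom of the parent's failure, not a new problem
        root_causes.extend(f"{device}: {k}" for k in sorted(kinds))
    return root_causes


raw = [
    Event("dist-1", "link-down", "snmp-trap"),
    Event("dist-1", "link-down", "syslog"),
    Event("access-1", "unreachable", "ip-sla"),
    Event("access-2", "unreachable", "ip-sla"),
]
print(correlate(raw))  # four raw events reduce to a single root cause on dist-1
```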
Model-Driven Network Management
Automation capabilities have introduced new options for network management. This section covers YANG, NETCONF, RESTCONF, and gNMI to give network designers a basic understanding for making proper design decisions around automated network management.
YANG
Yet Another Next Generation (YANG) is an IETF standard data modeling language (YANG 1.0 in RFC 6020, updated as YANG 1.1 in RFC 7950). It describes data for network configuration protocols such as NETCONF and RESTCONF. YANG organizes data hierarchically within data models, making models easy to read and reuse. It is extensible through augmentation and serves as a full, formal contract language with rich syntax and semantics. Listing 1 shows a simple YANG data model.
```yang
module my-interface {
  namespace "urn:example:my-interface";  // must be a URI
  prefix myif;                            // mandatory in every module

  container interfaces {
    list interface {
      key "name";
      leaf name { type string; }
      leaf admin-status {
        type enumeration {
          enum up;
          enum down;
        }
      }
    }
  }

  rpc flap-interface {
    input {
      leaf name { type string; }
    }
    output {
      leaf result { type boolean; }
    }
  }
}
```
NETCONF
Network Configuration Protocol (NETCONF) is a network management protocol defined by the IETF in RFC 6241. NETCONF provides rich functionality for managing configuration and state data. Protocol operations are defined as remote procedure calls (RPCs) for requests and replies in XML-based representation. NETCONF supports running, candidate, and startup configuration datastores, and transaction support is a key feature. NETCONF is a connection-oriented client/server protocol running over TCP: a NETCONF manager is the client, and a NETCONF device is the server. Messages are encoded in XML and carried over SSH, the mandatory transport (RFC 6242), which encrypts them. Capabilities are exchanged during session initiation, where the initial contents of the <hello> message define the NETCONF capabilities that each side supports. The YANG modules a device supports are advertised as capabilities, and other standards bodies and proprietary specifications define additional capabilities. Figure 13 highlights the different NETCONF operations and datastore capabilities.
Key characteristics:
- Protocol operations defined as RPCs in XML-based representation
- Supports running, candidate, and startup configuration datastores
- Client/server protocol, connection-oriented over TCP
- All messages encrypted with SSH and encoded in XML
- Transaction support is a key feature
- Capabilities exchanged during session initiation via the <hello> message
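To make the RPC model concrete, the Python sketch below uses the standard library's `xml.etree.ElementTree` to build two requests. The first is an `<edit-config>` targeting the candidate datastore. The second is the `<commit>` that activates it. The base namespace `urn:ietf:params:xml:ns:netconf:base:1.0` and the operation names come from RFC 6241. The `urn:example:my-interface` payload namespace, the interface name, and the helper names are hypothetical. The sketch only builds the XML; it does not open an SSH session.

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"   # NETCONF base namespace (RFC 6241)
MYIF = "urn:example:my-interface"                # hypothetical YANG module namespace
ET.register_namespace("", NC)
ET.register_namespace("myif", MYIF)


def rpc(message_id: int) -> ET.Element:
    """Every NETCONF request is wrapped in <rpc> with a message-id."""
    return ET.Element(f"{{{NC}}}rpc", {"message-id": str(message_id)})


def edit_candidate(message_id: int, ifname: str, status: str) -> bytes:
    """Build <edit-config> that sets admin-status on the candidate datastore."""
    root = rpc(message_id)
    edit = ET.SubElement(root, f"{{{NC}}}edit-config")
    target = ET.SubElement(edit, f"{{{NC}}}target")
    ET.SubElement(target, f"{{{NC}}}candidate")
    config = ET.SubElement(edit, f"{{{NC}}}config")
    interfaces = ET.SubElement(config, f"{{{MYIF}}}interfaces")
    interface = ET.SubElement(interfaces, f"{{{MYIF}}}interface")
    ET.SubElement(interface, f"{{{MYIF}}}name").text = ifname
    ET.SubElement(interface, f"{{{MYIF}}}admin-status").text = status
    return ET.tostring(root)


def commit(message_id: int) -> bytes:
    """Build <commit>, which makes the candidate configuration the running one."""
    root = rpc(message_id)
    ET.SubElement(root, f"{{{NC}}}commit")
    return ET.tostring(root)


print(edit_candidate(101, "GigabitEthernet1", "down").decode())
print(commit(102).decode())
```

Separating the edit from the commit is what the candidate datastore enables. Changes can be staged and validated before they affect the running configuration.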
RESTCONF
RESTCONF, defined in RFC 8040, is an HTTP-based protocol that provides a programmatic interface for accessing YANG-modeled data. RESTCONF uses HTTP operations to provide create, retrieve, update, and delete (CRUD) operations on a NETCONF datastore containing YANG data. RESTCONF is tightly coupled to the YANG data model definitions. It supports HTTP-based tools and programming libraries. RESTCONF can be encoded in either XML or JSON.
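The following Python sketch uses the standard library's `urllib.request.Request` to build, but not send, a RESTCONF PUT. Several details come from RFC 8040: the `/restconf/data` root, the `module:node` path syntax, `list=key` for addressing list entries, and the `application/yang-data+json` media type. The host name, module name, and interface are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

HOST = "https://router.example.net"   # hypothetical device
MODULE = "my-interface"               # hypothetical YANG module name


def interface_url(name: str) -> str:
    # List entries are addressed as <list>=<key>; the key value must be percent-encoded.
    return f"{HOST}/restconf/data/{MODULE}:interfaces/interface={quote(name, safe='')}"


def put_interface(name: str, admin_status: str) -> Request:
    """Build a PUT that creates or replaces one interface entry (JSON encoding)."""
    body = {f"{MODULE}:interface": [{"name": name, "admin-status": admin_status}]}
    return Request(
        interface_url(name),
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/yang-data+json",
                 "Accept": "application/yang-data+json"},
    )


req = put_interface("GigabitEthernet0/1", "down")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

The request is a plain HTTP call. Any HTTP library or tool can send it, which is why RESTCONF suits controller northbound interfaces.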
When comparing RESTCONF with NETCONF, RESTCONF has:
- No notion of transaction
- No notion of lock
- No notion of candidate config and commit
- No notion of two-phase commit
- No <copy-config> operation
- XML or JSON encoding, while NETCONF is XML only
In most design situations, it will be best to leverage NETCONF for routers and switches and RESTCONF for controller northbound communication. Table 9 represents the different layers, highlighting NETCONF and RESTCONF characteristics at each layer.
| Layer | NETCONF | RESTCONF |
|---|---|---|
| Content | Configuration data, notification data | Configuration data, notification data |
| Operations | <get>, <get-config> | GET, POST, PATCH |
| Messages | <rpc>, <notification> | HTTP payload, W3C Server-Sent Events |
| Secure Transport | SSH | HTTPS |
The operations listed in Table 9 are not an all-inclusive list of operations for both NETCONF and RESTCONF. Listing 2 maps the YANG model to corresponding RESTCONF HTTP operations.
```yang
module my-interface {
  namespace "urn:example:my-interface";
  prefix myif;

  // GET: Retrieves a resource
  //   GET /restconf/data/my-interface:interfaces
  //   GET /restconf/data/my-interface:interfaces/interface=<some-name>
  // PUT: Replaces a resource
  //   PUT /restconf/data/my-interface:interfaces/interface=<some-name> + Data
  // DELETE: Removes a resource
  //   DELETE /restconf/data/my-interface:interfaces/interface=<some-name>
  container interfaces {
    list interface {
      key "name";
      leaf name { type string; }
      leaf admin-status { type enumeration { enum up; enum down; } }
    }
  }

  // POST: Creates a resource or invokes an operation
  //   POST /restconf/operations/my-interface:flap-interface + Data
  rpc flap-interface {
    input { leaf name { type string; } }
    output { leaf result { type boolean; } }
  }
}
```
gNMI
gRPC Network Management Interface (gNMI), developed by Google, provides the mechanism to install, manipulate, and delete the configuration of network devices and to view operational data. The content provided through gNMI can be modeled using YANG. gRPC is a remote procedure call framework developed by Google for low-latency, scalable distributed systems, such as mobile clients communicating with a cloud server. gRPC carries gNMI and provides the means to formulate and transmit data and operation requests. On Cisco IOS XE, when a gNMI service failure occurs, the gNMI broker (GNMIB) indicates an operational state change from up to down. All RPCs then return a service unavailable message until the database is up and running again. Upon recovery, the GNMIB indicates a state change from down to up and resumes normal handling of RPCs.
Review Questions
9. Which network management protocol would be used if you wanted XML encoding and messages encrypted with SSH?
- YAML
- NETCONF
- RESTCONF
- gNMI
b. NETCONF is a network management protocol defined by the IETF in RFC 6241. All NETCONF messages are encrypted with SSH and encoded with XML.
10. Which network management protocol would be used for northbound controller communication leveraging JSON encoding while also leveraging HTTP for transport?
- YAML
- NETCONF
- RESTCONF
- gNMI
c. RESTCONF (RFC 8040) is an HTTP-based protocol that provides a programmatic interface for accessing YANG-modeled data. RESTCONF can be encoded in either XML or JSON. In most design situations, NETCONF is best for routers and switches and RESTCONF for controller northbound communication.
Fully Automated Network Management
Today, IT systems create thousands of events per second. Having humans monitor all these events would be prohibitively expensive, and yet they still would not react quickly enough. This has paved the way for automated solutions to help solve this problem. At the time of writing, there are three prominent concepts in this area: artificial intelligence (AI) for IT operations (AIOps), closed-loop automation, and full-stack observability (FSO).
Artificial Intelligence for IT Operations
AIOps helps separate events with business impact from the noise and resolve them autonomously. Managing IT operations without AIOps will become increasingly challenging because of the rapid growth in data volumes and the rate of change. A modern AIOps platform detects anomalies and suppresses noise through correlation and deduplication. It also helps triage issues and perform root cause analysis (RCA), and it suggests or applies fixes.
Closed-Loop Automation
Closed-loop automation (CLA) is a continuous process that monitors, measures, and assesses real-time network traffic and then automatically acts to optimize end-user quality of experience.
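Closed-loop automation is essentially a control loop: observe, compare against an intent or threshold, act, and observe again. The Python sketch below is a hypothetical illustration of that loop. A link's measured latency is compared to an SLA target, and traffic is shifted to a backup path after several consecutive breaches. The metric source, threshold, and `shift_traffic` action are stand-ins, not a real product API.

```python
from typing import Callable

SLA_LATENCY_MS = 150.0   # hypothetical intent: keep latency under 150 ms


def closed_loop(samples: list[float], shift_traffic: Callable[[str], None],
                threshold: float = SLA_LATENCY_MS, patience: int = 3) -> list[str]:
    """Monitor -> assess -> act. Act only after `patience` consecutive breaches
    to avoid flapping on a single noisy measurement."""
    path, breaches, log = "primary", 0, []
    for latency in samples:                                    # monitor / measure
        breaches = breaches + 1 if latency > threshold else 0  # assess
        if path == "primary" and breaches >= patience:
            path = "backup"
            shift_traffic(path)                                # act
            log.append(f"latency {latency} ms: moved to backup")
    return log


actions: list[str] = []
print(closed_loop([90, 160, 170, 180, 100], actions.append))
print(actions)
```

For brevity, the sketch never fails back to the primary path. A production loop would also define when and how to revert.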
Full-Stack Observability
Full-stack observability (FSO) is defined by metrics, events, logs, and traces. Modern applications span multiple environments. A typical mobile application today comprises hundreds of services communicating with each other across a zero-trust, multi-cloud landscape, all of which have to work flawlessly. These applications are far more complex than those of decades past. They can no longer be managed or optimized effectively with traditional tools, because there is too much data with too little context and correlation. Traditional monitoring gives visibility only at the domain level, whether the network, infrastructure, cloud, or database, yet a combined, full-picture view is becoming critical for the best user experience. This is where FSO comes to the forefront. Organizations need complete visibility and insight to take the relevant action at the right time. To achieve this, they must be able to measure the inner state of these applications from the data the applications generate, such as logs, metrics, and traces. This capability is known as observability.
Cloud Services Design Considerations
As organizations increasingly leverage Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), network designers must understand how cloud-based traffic management concepts map to traditional network design principles. Several cloud-specific mechanisms exist to handle capacity, traffic flow, and service continuity.
Cloud Bursting
Cloud bursting is a deployment model in which an application runs in a private cloud or on-premises data center and “bursts” into a public cloud when demand for computing capacity spikes. When consumers that leverage IaaS reach 100% resource capacity, cloud bursting redirects the overflow of traffic to the public cloud so there is no disruption to service.
This approach allows organizations to:
- Maintain baseline workloads on private infrastructure for cost efficiency and data sovereignty
- Absorb traffic spikes without over-provisioning local resources
- Ensure service continuity during peak demand periods (seasonal traffic, marketing campaigns, unexpected load)
- Pay only for additional public cloud resources when they are actually needed
From a network design perspective, cloud bursting requires:
- Low-latency, high-bandwidth connectivity between the private environment and the public cloud (typically via direct interconnects or dedicated WAN links)
- Consistent security policies across both environments
- Application architectures that support horizontal scaling across cloud boundaries
- Automated orchestration to detect capacity thresholds and trigger the burst
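The burst decision itself is simple to express in Python. The sketch below is a hypothetical model. Private capacity is counted in request slots, and once utilization reaches a configurable threshold (100% in the definition above), new requests go to the public cloud instead of being rejected. Real orchestration would act on metrics such as CPU or connection counts and provision public-cloud instances ahead of demand.

```python
from dataclasses import dataclass, field


@dataclass
class BurstBalancer:
    private_capacity: int               # concurrent requests the private cloud can serve
    burst_threshold: float = 1.0        # 1.0 = burst only at 100% utilization
    active_private: int = 0
    placements: list[str] = field(default_factory=list)

    def place(self) -> str:
        """Place one new request, bursting to the public cloud when private capacity is exhausted."""
        if self.active_private < self.private_capacity * self.burst_threshold:
            self.active_private += 1
            target = "private"
        else:
            target = "public"           # overflow instead of rejecting the request
        self.placements.append(target)
        return target

    def release_private(self) -> None:
        """Free one private slot when a request completes."""
        self.active_private = max(0, self.active_private - 1)


lb = BurstBalancer(private_capacity=3)
for _ in range(5):
    lb.place()
print(lb.placements)
```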
Cloud bursting is distinct from other cloud traffic management concepts. Cloud policing and cloud shaping apply rate-limiting principles (similar to QoS policing and shaping) to cloud-bound traffic. Cloud bursting specifically addresses capacity overflow by extending workloads into the public cloud.
Review Questions
11. When consumers that leverage IaaS reach 100% resource capacity, what can be used to redirect the overflow of traffic to the public cloud, so there is no disruption to service?
- Cloud policing
- Cloud bursting
- Cloud shaping
- Cloud spill
b. Cloud bursting is a deployment model where an application runs in a private cloud or on-premises environment and bursts into the public cloud when demand exceeds local capacity, ensuring no service disruption.
Summary
This chapter covered advanced IP topics and services that are part of any network design. To avoid design defects, network designers must incorporate these services in an integrated, holistic approach rather than designing in isolation. The top-down design approach is a fundamental requirement for achieving a successful business-driven design, for example, ensuring the design complies with the organization’s security policy standards. Business priorities and design constraints must always be considered, adopting a “first things first” approach that accounts for existing limitations including staff knowledge, budget, and supported features and technologies.