Network Services and Management

Published: April 28, 2026

Overview

This chapter covers multiple networking and IP service design concepts that are core topics for the CCDE exam. The topics discussed may appear as applications or services used to achieve a business need — for example, a business-critical application might require QoS to be enabled across the network to function properly. The chapter focuses on design drivers, considerations, and approaches without covering deep technical implementation details.

The three main topics covered are:

  • IPv6 Design Considerations: Critical IPv6 topics and network design elements, focusing on integration and coexistence with IPv4
  • Quality of Service Design Considerations: QoS models, concepts, migrations, and corresponding network design elements
  • Network Management: Network management concepts, protocols, and corresponding network design elements
Note

IPv6-specific design considerations are covered in this chapter, but there are no specific CCDE blueprint line items for IPv6. This is because IPv4 and IPv6 are inherently included throughout every CCDE blueprint domain and topic.

IPv6 Design Considerations

This section focuses on one of the most critical IPv6 topics: the integration and coexistence of IPv4 and IPv6, covering the different design and technical options and how to follow a business-driven life-cycle design approach.

IPv6 Business and Technical Drivers

Despite IPv6 being proposed over two decades ago, serious adoption by enterprises and service providers only began in recent years. The drivers include the explosion of smartphones, mobile devices, IoT, smart connected cities, and public cloud services — all of which demand a huge number of IP addresses that IPv4 can no longer accommodate.

Organizations face several challenges from IPv4 exhaustion:

  • Exhaustion and constraints of IPv4 addresses, adding complexity to managing and provisioning new services
  • Added complexity in merger and acquisition scenarios, where NAT with its limitations becomes the primary option for overlapping address space
  • New market trends — mobility, IoT, smart cities — requiring large numbers of IPs for IP-enabled endpoints

An organization should consider migrating to IPv6 if it encounters any of the following situations:

  • Unable to expand to other global regions due to exhaustion of public IPv4 addresses
  • Deploying IoT environments with large numbers of connected sensors for smart communities
  • A service provider or enterprise needs seamless connectivity across fixed and mobile users where NAT is no longer viable
  • Enabling or connecting to IPv6-based 4G/5G/LTE mobile networks
  • Working as a supplier or partner with public sector or government entities where IPv6 is becoming the standard
  • IPv6 requirements driven by end-user operating systems and applications such as Windows 11, Windows Server, macOS, system virtualization, and large-scale multitenancy

IPv6 Address Types

The following table summarizes the key technical similarities and differences between IPv4 and IPv6.

Table 1: Summary of IPv4 Versus IPv6
Feature | IPv4 | IPv6
Address scope | 32 bit | 128 bit, multiple scopes
IP allocation | Manual, DHCP | Manual, SLAAC, DHCP
QoS | Differentiated services, integrated services | Differentiated services, integrated services, flow label
Multicast | IGMP, PIM, MP-BGP | MLD, PIM, MP-BGP
Security | No built-in support | IPsec built in

Note

Although IPv6 supports built-in IPsec, it is a misconception that IPv6 is inherently more secure than IPv4. If IPsec is implemented, it provides confidentiality and integrity between two hosts, but it does not address link operation vulnerabilities, attacks, or most denial-of-service (DoS) attacks.

IPv6 has three types of unicast addresses:

  • Link local (fe80::/10): Nonroutable, exists only within a single Layer 2 domain. Required on every IPv6-enabled interface even when routable addresses are also assigned (RFC 4291). In practice, link-local addresses are formed within fe80::/64.
  • Unique local address (ULA) (fc00::/7): Routable within the administrative domain of a given network. Conceptually similar to IPv4 private address ranges (RFC 1918).
  • Global (2000::/3): Routable across the Internet. Conceptually similar to IPv4 public address ranges.
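
As a minimal sketch of the three unicast scopes above, the following Python snippet classifies an address by prefix using the standard `ipaddress` module. The function name `unicast_type` is hypothetical, and the check is prefix-based only: it does not account for special-purpose blocks carved out of 2000::/3.

```python
import ipaddress

# Prefixes from RFC 4291 (link local, global unicast) and RFC 4193 (ULA).
LINK_LOCAL = ipaddress.ip_network("fe80::/10")
UNIQUE_LOCAL = ipaddress.ip_network("fc00::/7")
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def unicast_type(address: str) -> str:
    """Return the unicast scope of an IPv6 address by prefix, or 'other'."""
    addr = ipaddress.IPv6Address(address)
    if addr in LINK_LOCAL:
        return "link-local"
    if addr in UNIQUE_LOCAL:
        return "unique-local"
    if addr in GLOBAL_UNICAST:
        return "global"
    return "other"  # for example multicast (ff00::/8) or loopback (::1)


if __name__ == "__main__":
    # 2001:db8::/32 is reserved for documentation, but by prefix it still
    # falls inside the 2000::/3 global unicast range.
    for sample in ("fe80::1", "fd12:3456:789a::1", "2001:db8::1", "ff02::1"):
        print(sample, "->", unicast_type(sample))
```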

Migration and Integration of IPv4 and IPv6

For network architects and designers to achieve a successful IPv6 migration or integration, they must follow a structured approach based on the top-down design methodology: network discovery, assessment, planning, design, deployment, monitoring, and optimization.

Discovery Phase

In this phase, network architects identify the business goals and drivers for IPv6 enablement, along with other influencing factors such as project timeframe, government compliance, and the geographic distribution of sites with regard to IP address availability. It is also critical to identify, at a high level, whether the existing network infrastructure (LAN, WAN, security nodes, services, and applications) supports IPv6 and whether the business is willing to invest in upgrading nodes that do not.

Solution Assessment and Planning

After the discovery phase, network designers analyze each identified influencing factor and generate a migration or integration plan. The following considerations drive the detailed design of the transition strategy:

  • Goal: Understanding the main purpose of the migration — for example, accessing services in the data center, regulatory compliance, or enabling IPv6 at the Internet edge due to lack of public IPv4 pools
  • Infrastructure support: Whether the entire infrastructure supports IPv6 or only the network edges, and whether the business is willing to upgrade non-IPv6 devices. This drives the selection of a technology solution to overcome any IPv6 support constraints.
  • Existing services and applications: Many applications still do not support IPv6, especially those developed in-house. Failing to consider how IPv6 networks will reach IPv4-only applications can break communication and seriously impact the business.

For example, if an enterprise needs end-to-end IPv6 but the core does not support it and no budget exists for upgrades, while access to new IPv6 applications in the data center is urgent, the designer may suggest IPv6-over-IPv4 tunneling, DNS-based translation, or NAT64 as interim solutions.

Transition Approaches for Enterprise Networks

Table 2: Approaches to Enable/Transition to IPv6 for the Enterprise
Design Goal | Priorities | Timeframe | Design Approach | Design Considerations
Migrate to pure IPv6 or dual stack | No service interruption | Flexible | Migrate core to dual stack first, then gradually migrate other modules to IPv6-only or dual stack | Increased hardware resource utilization; increased control plane complexity; core must support IPv6
Migrate fully or partially to IPv6-only or dual stack | Quickly migrate certain modules first (e.g., data center) | Limited | Migrate certain enterprise modules first; DNS translation or tunneling (e.g., ISATAP) required to maintain IPv4/IPv6 communication | Suitable when core does not support IPv6; increases design, control plane, and operational complexity
Migrate data center to support IPv6 hosts | Support virtualized and non-virtualized IPv6 hosts | Flexible | Dual stack, VXLAN overlay, or MPLS-based 6PE/6VPE depending on DC architecture | Dual stack increases hardware resource utilization and operational complexity
Provide IPv6 access at the Internet edge | Support translation between IPv4 and IPv6 | Flexible | Translation via load balancer, pure DNS, or NAT64 | Increases operational complexity; requires additional IPv6 security considerations

Review Questions

1. What is the first step an enterprise should take when migrating to an IPv6-only network if it wants to ensure that no service interruption occurs?

  a. Leverage an overlay like VXLAN to support IPv6 hosts within the data center
  b. Leverage a translation mechanism based on a load balancer, pure DNS, or classical NAT64
  c. Migrate certain modules of the enterprise network first while leveraging a translation/tunneling mechanism to maintain communication between IPv6 and IPv4 islands
  d. Migrate the core to be in dual-stack mode first, then migrate the other modules as time allows

d. Migrate the core to be in dual-stack mode first, and then other enterprise modules can be gradually migrated to IPv6-only or dual stack, depending on the goals and requirements of the business. Migrating to IPv6 this way ensures there is no service interruption.


2. Which of the following IPv6 design approaches would allow a network to provide IPv6 access inbound or outbound at the enterprise Internet edge?

  a. Leveraging an overlay like VXLAN to support IPv6 hosts within the data center
  b. Leveraging a translation mechanism based on a load balancer, pure DNS, or classical NAT64
  c. Migrating certain modules of the enterprise network first while leveraging a translation/tunneling mechanism to maintain communication between IPv6 and IPv4 islands
  d. Migrating the core to be in dual-stack mode first, then migrating the other modules as time allows

b. To provide IPv6 access either inbound or outbound at the enterprise Internet edge, a translation mechanism based on a load balancer, pure DNS, or classical NAT64 is required.

Transition Approaches for Service Provider Networks

Enabling IPv6 in an SP network differs from enterprise networks. SPs typically enable IPv6 either to provide a transit path for other SPs or to offer IPv6 connectivity to customers. The mechanism used is mainly driven by the goal and whether the transport is native IPv4 or MPLS-based.

Note

Some transition approaches for SP networks (6PE, 6VPE, 6rd) are out of scope for the CCDE v3 exam at the time of this writing but are included in the table below for completeness.

Table 3: Approaches to Enable/Transition to IPv6 for the Service Provider
Goal | Transport | Possible Approaches
Provide IPv6 Internet transit | Native IPv4 | Dual stack, tunneling (manual RFC 2893, GRE, L2TPv3)
Provide IPv6-based services and Internet access to residential clients | Native IPv4 | Dual stack, 6rd, tunneling such as IPv6 over L2TP
Provide IPv6 Internet access/transit | MPLS | 6PE, IPv6 over pseudowires
Provide IPv6 connectivity for MPLS L3VPN customers | MPLS | 6VPE
Provide IPv6 Internet access for MPLS L3VPN customers | MPLS | 6VPE

SP networks are transport networks with no directly connected endpoints, which makes the transition more flexible and less disruptive than in enterprise networks. When MPLS is enabled, IPv6 integration is simpler because of the MP-BGP overlay capabilities (6PE, 6VPE). Operators can also take a phased approach, first enabling IPv6 only on the PE nodes that need to provide IPv6 transit, without changing the core (P) routers.

Today’s SPs also offer hosted services, SaaS, cloud-based data centers, and content services such as IPTV — meaning coexistence of IPv4 and IPv6 is inevitable. One common approach is to enable IPv6 at the services level first, requiring customers to be IPv6-enabled or to use translation (such as NAT v4-to-v6 at the enterprise Internet edge or DNS-based translation offered by the SP).

Note

One of the primary considerations when migrating to IPv6 is to ensure that IPv6 is secured in the same manner as IPv4. For instance, since Windows Server 2008, IPv6 has been native to Windows and supports transition technologies such as ISATAP. If a server is compromised and security rules do not account for IPv6, malicious traffic can ride an IPv6 tunnel without being blocked by security devices.

Detailed Design

After selecting the suitable approach, network designers put together the details: integration mechanism selection, tunnel termination, IP addressing, routing design, network security, and network virtualization considerations. The outcome of the design phase is used by implementation engineers during deployment. If anything proves impractical, it is reported back to the designer for revision.

IPv6/IPv4 integration mechanisms can be classified into four categories:

  • Dual stack
  • Tunneling based
  • Translation based
  • MPLS environment solutions
Note

The mechanisms in the table below are not prescriptive best practices. They represent commonly considered technology solutions for certain scenarios. Network designers must always assess the different influencing factors before suggesting any approach.

Table 4: Mechanisms to Support Coexistence of IPv4 and IPv6
Mechanism | Scenario | Targeted Environment | Design Concern
Dual stack | End-to-end IPv6 + IPv4 | Any environment ultimately moving to end-to-end IPv6 | IPv6 support required on all L3 platforms; increased control plane complexity; potential scalability weaknesses depending on hardware resources
Tunneling: P2P (L2TPv3, GRE RFC 2784) | Transit IPv6 over IPv4-only network | Small number of IPv6 islands interconnecting over IPv4 | Scalability and encapsulation overhead; increased control plane complexity
Tunneling: ISATAP (RFC 5214) | Host-sourced tunnels terminating at IPv6-enabled modules | Trial IPv6 services or partial IPv6 enablement (e.g., DC only); mostly enterprise | Affects overall network architecture; QoS, multicast, and NAT issues; adds control plane and operational complexity
Tunneling: mGRE | Interconnect IPv6 over IPv4 in hub-and-spoke topology | Hub-and-spoke IPv6 islands over IPv4 WAN | Multicast traffic must go via hub; adds control plane and operational complexity
Tunneling: 6rd (RFC 5969) | Extend IPv6 deployment to customer/residential sites with limited impact on existing IPv4 | SP networks offering IPv6 over IPv4 to residential customers | Simple, stateless, automatic encap/decap; depends on equipment support; adds control plane complexity
Tunneling: IPv6 over L2TP | IPv6 access for residential gateways | DSL/residential SPs with limited investment | Stateful architecture on LNS; dual-stack IPv4/IPv6 on residential gateway LAN side; increases operational complexity
Translation: NAT64/SLB | IPv6 endpoints accessing IPv4 Internet or services (LTE/4G/5G) | Green-field IPv6 SPs or enterprises interconnecting to legacy IPv4 | Does not support every application/protocol; performance may not match dual-stack depending on traffic load
Translation: DNS64 | Access applications/services by name; translates between v4 and v6 based on source and target host | Services and applications reachable by name | Limited to name-based access; NAT64 usually required alongside DNS64
Translation: LISP | Facilitate IPv6 communication over IPv4 transport via LISP encapsulation | Enterprise edge, DC, or WAN with mixed IPv4/IPv6 | High operational complexity; increased control plane complexity; devices must support LISP
MPLS: 6PE | Enable IPv6 over existing MPLS/MP-BGP IPv4 network | Large enterprises and SPs providing IPv6 over IPv4 infrastructure | No traffic separation between customers; increases control plane complexity
MPLS: 6VPE | Enable IPv6 over existing MPLS/MP-BGP IPv4 network for VPN customers | MPLS VPN providers or enterprises with MPLS VPN networks | Increases control plane complexity; may introduce scalability limitations due to separate RIB/FIB per customer

Note

Adding any overlay or tunneling mechanism to the network will almost always increase operational complexity. The level varies based on network size, routing design, staff knowledge, and the nature of the selected technology.

Review Questions

3. Which of the following IPv6 mechanisms would allow for end-to-end IPv4 and IPv6 functionality?

  a. Dual stack
  b. ISATAP
  c. GRE
  d. mGRE

a. Dual stack is when a device runs both IPv4 and IPv6 protocol stacks. When all devices in the network run like this, it is called end-to-end dual stack.


4. Which of the following IPv6 mechanisms would allow network-to-network connectivity to transit IPv6 over IPv4-only devices without any additional control plane protocols?

  a. Dual stack
  b. ISATAP
  c. GRE
  d. mGRE

c. Generic Routing Encapsulation (GRE) is a tunneling protocol that encapsulates packets of one network-layer protocol inside packets of another. GRE sets up a direct point-to-point connection across a network; in this case, IPv6 packets travel through a GRE tunnel that traverses the IPv4 network.
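
The encapsulation overhead of such a tunnel can be estimated with simple arithmetic. The sketch below assumes an outer IPv4 header without options (20 bytes) and a basic GRE header without key or sequence fields (4 bytes); the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # outer IPv4 header, no options (assumption)
GRE_HEADER_BYTES = 4    # basic GRE header, no key or sequence fields (assumption)
IPV6_HEADER_BYTES = 40  # fixed header of the passenger IPv6 packet


def gre_tunnel_mtu(link_mtu: int) -> int:
    """Largest IPv6 packet that fits in one IPv4/GRE packet on the link."""
    return link_mtu - IPV4_HEADER_BYTES - GRE_HEADER_BYTES


def max_ipv6_payload(link_mtu: int) -> int:
    """Bytes left for upper-layer data after the passenger IPv6 header."""
    return gre_tunnel_mtu(link_mtu) - IPV6_HEADER_BYTES


if __name__ == "__main__":
    print(gre_tunnel_mtu(1500))    # 1476
    print(max_ipv6_payload(1500))  # 1436
```

On a standard 1500-byte link, every tunneled IPv6 packet gives up 24 bytes to the outer headers, which is one reason tunnel MTU has to be planned alongside the tunnel itself.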

Deployment, Monitoring, and Optimization

These phases cover implementation of the design followed by continuous monitoring to ensure the network delivers the promised value. The implementation should follow a plan that specifies which services and features need to be enabled at each step, along with any potential risks associated with each change. For example, enabling IPv6 at the routing protocol level may reset existing IPv4 peering sessions, depending on the routing protocol, hardware platform, and software in use.

Transition to IPv6 Scenario

ABC Corp. is an international real-estate company headquartered in Singapore, with 116 remote sites across Asia, Australia, and Europe, as per Figure 1.

[Diagram: Singapore HQ with DC, WAN, and IPv4 Internet; IPv4 MPLS L3VPN provider; Barcelona, Hong Kong, London, and Sydney branches]
Figure 1: ABC Corp. Transition Approach to IPv6: Phases 1 and 2

The CIO has decided to migrate the entire IP network and applications to be primarily IPv6-based to support long-term business innovation, while maintaining business continuity:

  • Retain the ability for internal users to access legacy IPv4-only applications and the IPv4 Internet
  • Provide external users the ability to access ABC Corp.’s new IPv6 web-based services over the IPv4 Internet
  • IPv4 Internet websites accessed by internal IPv6 users must appear as IPv6 addresses (DNS64 synthesis of A records into AAAA records)
  • Go-live within six weeks

The primary design constraints are:

  • Quick transition solution required
  • Internet and DC services are centralized at the HQ/hub site
  • The current MPLS VPN WAN provider does not support IPv6

The transition approach is illustrated in Figure 2.

[Diagram, Phase 1: Singapore HQ with DC, WAN, DNS64, and stateful NAT64; IPv4 Internet and IPv6 Internet; dual stack; IPv6 over mGRE tunnel overlay; IPv4 MPLS L3VPN provider; London and Sydney branches]
[Diagram, Phase 2: Singapore HQ with DC, WAN, DNS64, and stateful NAT64; IPv4 Internet and IPv6 Internet; dual stack; IPv6 only; 6VPE; IPv4 MPLS L3VPN provider; London and Sydney branches]
Figure 2: ABC Corp. Transition Approach to IPv6: Phases 1 and 2

Phase 1 — Fast IPv6 Enablement:

  • Enable IPv6 (dual stack) on all network nodes, starting with the DC and then the WAN routers
  • Enable IPv6 routing on DC, WAN routers (hub and spokes), and Internet edge
  • Enable stateful NAT64 at the IPv4 Internet edge to provide IPv4 Internet access for internal IPv6 devices
  • Introduce DNS64 to synthesize IPv4 DNS A records into AAAA records, making IPv4 Internet services appear as IPv6 to internal users
  • Enable static NAT64 at DC edge nodes for internal IPv6 users to access legacy IPv4-only applications
  • Enable static NAT64 at the IPv4 Internet gateway for external users to access ABC Corp.’s IPv6 web services over the IPv4 Internet
  • Interconnect IPv6 network islands (spokes/remote sites) with HQ using IPv6 over mGRE/DMVPN tunneling over the IPv4 MPLS VPN WAN
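
The DNS64 step in Phase 1 can be illustrated with a short sketch of how an IPv4 address from an A record is embedded into a synthesized AAAA answer. It assumes the RFC 6052 well-known prefix 64:ff9b::/96; the scenario does not state which prefix ABC Corp. would use, and a network-specific prefix is equally possible. The function names are hypothetical.

```python
import ipaddress

# RFC 6052 well-known prefix. This is an assumption for illustration only;
# a deployment may use its own network-specific /96 prefix instead.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def synthesize_aaaa(ipv4_a_record: str,
                    prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> str:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    ipv4 = ipaddress.IPv4Address(ipv4_a_record)
    return str(ipaddress.IPv6Address(int(prefix.network_address) | int(ipv4)))


def extract_ipv4(synthesized: str) -> str:
    """Recover the embedded IPv4 address, as a NAT64 gateway would."""
    low32 = int(ipaddress.IPv6Address(synthesized)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(low32))


if __name__ == "__main__":
    aaaa = synthesize_aaaa("192.0.2.33")
    print(aaaa)                # 64:ff9b::c000:221
    print(extract_ipv4(aaaa))  # 192.0.2.33
```

The internal IPv6 host sends traffic to the synthesized address, and the stateful NAT64 device recovers the original IPv4 destination from the low 32 bits.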

Phase 2 — Design Optimization:

  • Migrate the WAN to a provider supporting IPv6 MPLS L3VPN (6VPE), replacing the IPv6 DMVPN overlay
  • Disable IPv4 routing in network areas where no IPv4 clients/hosts exist (e.g., remote sites), reducing load from holding separate RIB/FIB tables per IP version

Quality of Service Design Considerations

Today’s converged networks rely heavily on IT services and applications. Converged IP networks carry several traffic types, each with different network requirements: voice (IP telephony, HD audio, VoIP), video (video-on-demand, interactive video, telepresence, IP surveillance, digital signage), and a wide variety of data applications.

To deliver the desired quality of experience, network designers need a mechanism that can selectively prioritize traffic by providing dedicated bandwidth, controlled jitter and latency, and improved loss characteristics, while ensuring that prioritizing one flow does not cause other flows to fail. This mechanism is quality of service (QoS).

QoS High-Level Design: Business-Driven Approach

To design and deploy QoS successfully, network designers must follow the top-down approach: first understand the critical applications from the business point of view, then assess the optimal QoS design strategies to meet business and application requirements.

The goal is to align QoS design with business priorities and expectations. For example, a financial application sensitive to packet loss must be treated as high priority because any loss of connectivity can cost the business significantly. Similarly, an SP with a strict SLA to deliver voice traffic with no more than 1% end-to-end packet loss must apply the right QoS design to meet that SLA — otherwise it faces tangible penalties and intangible reputation damage.

Table 5: Top-Down QoS Design Approach Summary
Strategic Goal | Approach | Design Considerations
Understand business requirements | Understand business priorities and goals | Identify primary business drivers; highlight constraints such as budget
Identify the scope | Understand the scope of the QoS design (campus, WAN, VPN, SP edge, or end to end) | Is the application used within the campus, across the WAN, or over VPN? Is there any network in the path not directly controlled, such as a WAN?
Identify mission-critical applications | Identify which applications need to be treated differently; identify non-business applications | Identify mission-critical applications or services (e.g., SAP, FCoE, VoIP, TelePresence)
Understand application requirements | Identify the characteristics of each application | What network delivery is required: TCP, UDP, unicast, multicast? Application sensitivity to packet loss, jitter, and delay
Select a design strategy and identify technical constraints | Clarify the end-to-end design strategy: number of QoS classes, QoS toolset, etc. | What traffic classification strategy is used within the LAN (e.g., 8 or 12 classes)? What MPLS DiffServ tunneling mode is used? Is the core/WAN native IP or MPLS? What CoS are supported over the WAN? Can the targeted network node support the required number of queues or priority queuing?

QoS Architecture

There are two fundamental QoS architecture models:

  • Integrated Services (IntServ) (RFC 1633): Offers end-to-end QoS based on application transport requirements (usually per flow) by explicitly controlling network resources and reserving the required bandwidth end to end along the path per network node for each traffic flow. Resource reservation protocols such as RSVP and admission control mechanisms form the foundation of this process.
  • Differentiated Services (DiffServ) (RFC 2475): Offers QoS based on classifying traffic into multiple subclasses where packet flows are assigned different markings to receive different forwarding treatment (per-hop behavior, PHB) per network node along the path within each differentiated services domain (DS domain).
Note

Both QoS architectural models are applicable for IPv4 and IPv6, as both include the same 8-bit field in their headers (IPv4: Type of Service; IPv6: Traffic Class). The larger IPv6 packet header must be considered when calculating aggregate bandwidth of traffic flows.
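
To make the header-size effect concrete, the sketch below compares the Layer 3 bandwidth of a single voice stream over IPv4 and IPv6. It assumes a G.711 call with 20-ms packetization (160-byte payload, 50 packets per second) carried over RTP/UDP and ignores Layer 2 overhead; the function name is hypothetical.

```python
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
IPV4_HEADER_BYTES = 20  # no options
IPV6_HEADER_BYTES = 40  # fixed header, no extension headers


def l3_bandwidth_bps(payload_bytes: int, packets_per_second: int,
                     ip_header_bytes: int) -> int:
    """Layer 3 bandwidth of one RTP stream in bits per second."""
    packet_bytes = (payload_bytes + RTP_HEADER_BYTES
                    + UDP_HEADER_BYTES + ip_header_bytes)
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    # G.711 with 20-ms packetization: 160-byte payload, 50 pps (assumption).
    v4 = l3_bandwidth_bps(160, 50, IPV4_HEADER_BYTES)
    v6 = l3_bandwidth_bps(160, 50, IPV6_HEADER_BYTES)
    print(v4, v6)  # 80000 88000
```

Under these assumptions, the same call needs 10 percent more Layer 3 bandwidth over IPv6, and the relative penalty grows as the payload gets smaller.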

QoS DiffServ Architecture and Toolset

A true and effective QoS design must cover traffic flows end to end. Because each traffic flow may traverse multiple networks with different QoS philosophies, the design must be divided into differentiated services domains (DS domains) as described in RFC 2475. Each DS domain consists of multiple interconnected network nodes operating under a common service provisioning policy, with a set of PHB groups enabled on each node.

Each DS domain has two primary types of nodes:

  • Internal nodes: Nodes belonging to a single DS domain, sharing the same QoS provisioning policy
  • DS boundary nodes: Nodes facing other DS or non-DS-capable domains, responsible for applying traffic policies (QoS policies) on traffic flows in both directions (ingress and egress) based on a predefined or agreed model between domains

Figure 3 illustrates the relationship between DS domains and boundary nodes.

[Diagram: DS Domain-1, DS Domain-2, and DS Domain-3]
Figure 3: QoS DS Domains

DS domains can take different forms:

  • An enterprise domain with an SP domain in the middle (WAN transport)
  • Within an enterprise: multiple DS domains such as campus LAN, WAN, DC, and DMVPN over Internet

In the second scenario, multiple DS domains belonging to a single administrative authority can be combined under one global DS region. Each DS domain in that region can have its own QoS provisioning standards, offering a more structured and tiered design for large-scale networks (Figure 4).

[Diagram: DS Domain-1, DS Domain-2, DS Domain-3, and DS Domain-X with boundary QoS policies, inside a global DS region]
Figure 4: Multitier DS Domains

Traffic conditioning and QoS policies are enforced at multiple points across each domain using the following primary QoS toolset:

  • Traffic classification and marking
  • Traffic profiling and congestion management
  • Congestion avoidance (active queue management)
  • Admission control

Traffic Classification and Marking

Traffic classification selects frames or packets in a traffic stream based on the content of some portion of the frame or packet header, so that different policies can then be applied to them. Traffic marking writes a value into the packet header so that QoS policies at later hops can identify the packet and place it in the desired class, with the desired treatment, along its end-to-end path.

Classification does not always require marking. In some scenarios, traffic only needs to be selected based on IP header fields (source/destination address, source/destination port, incoming interface) and associated with a QoS policy action such as placing it in a predefined queue. Classification should almost always be performed at the point of network access (as close to the source as possible), then associated with the appropriate marking value (usually ToS header bits) so that QoS policies can be applied at any node across the network.

Marking also establishes trust boundaries at the edge of the network — the point where markings such as CoS or DSCP begin to be accepted as set by the connected endpoint. Trust boundaries are classified into three primary models (Figure 5):

[Diagram: access switches with a trusted and secure endpoint, an untrusted/unsecure endpoint, and conditionally trusted endpoints (IP phone + PC); trust boundary and extended trust boundary]
Figure 5: QoS Trust Boundaries
  • Trusted model: Used with endpoints that can mark their own traffic and are approved from a security standpoint — such as IP phones, voice gateways, wireless access points, videoconferencing, and video surveillance endpoints. Ideally these are fixed (non-mobile) endpoints.
  • Untrusted model: Uses manual traffic classification and marking. Common candidates are PCs and servers, which are subject to attack and infection. Malicious traffic marked with high-priority CoS/DSCP values can cause a true DoS situation. Network designers selectively classify each application’s traffic flows and mark them with the desired CoS/DSCP value, along with a policy that either limits each class to a predefined maximum bandwidth or marks down out-of-profile traffic to a lower-priority value.
  • Conditional trust model: Extends the trust boundary to a connected device such as an IP phone (detected via CDP in Cisco solutions). The IP phone sends its traffic in a trusted manner while overriding PC traffic (connected to the back of the phone) to DSCP 0. Offers a simple method for large IP telephony deployments. If PCs run applications requiring specific DSCP values (e.g., softphone), manual classification and marking at the access switch edge port are required.
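
The three trust models can be summarized as a decision on which DSCP value a packet carries after it crosses the access port. The following is a simplified sketch, not a vendor implementation: the names `TrustModel` and `ingress_dscp` are hypothetical, and `policy_dscp` stands for whatever value the administrator's manual classification policy assigns.

```python
from enum import Enum


class TrustModel(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    CONDITIONAL = "conditional"


def ingress_dscp(model: TrustModel, received_dscp: int,
                 from_phone: bool = False, policy_dscp: int = 0) -> int:
    """Return the DSCP value kept on a packet at the access port."""
    if model is TrustModel.TRUSTED:
        # Endpoint markings are accepted as received.
        return received_dscp
    if model is TrustModel.CONDITIONAL:
        # Phone traffic is trusted; traffic from the PC behind the phone
        # is overridden to DSCP 0.
        return received_dscp if from_phone else 0
    # Untrusted: the endpoint marking is ignored and the value comes from
    # the manually configured classification policy.
    return policy_dscp


if __name__ == "__main__":
    print(ingress_dscp(TrustModel.TRUSTED, 46))                       # 46
    print(ingress_dscp(TrustModel.CONDITIONAL, 46, from_phone=True))  # 46
    print(ingress_dscp(TrustModel.CONDITIONAL, 46))                   # 0
    print(ingress_dscp(TrustModel.UNTRUSTED, 46, policy_dscp=18))     # 18
```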

Marking values can also be re-marked at any location within or between DS domains. Between domains, re-marking handles mismatches between ToS values. Within a domain, re-marking moves out-of-profile traffic into a lower-priority class as a protective countermeasure.

Table 6: Summary of QoS Classification and Marking Options
OSI Layer | Classification | Marking
Physical | Input interface | N/A
Layer 2 | VLAN ID, MAC, IEEE 802.1Q/p CoS | IEEE 802.1Q/p CoS
Layer 2.5 | MPLS label, MPLS EXP | MPLS EXP
Layer 3 | IP DSCP, IP source/destination | IPP, DSCP
Layer 4 | Source/destination port | IPP, DSCP, EXP
Layers 5–7 | Application signature (e.g., NBAR) | IPP, DSCP, EXP

Note

DSCP marking is more commonly used than IP Precedence (IPP) because of its higher flexibility and scalability. However, a mix of both may be required in migration or integration scenarios between different domains (such as M&A or WAN MPLS VPN providers offering CoS based on IPP). In this case, class selector PHB provides backward compatibility with ToS-based IP Precedence (RFC 4594, 2474).
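
The backward compatibility works because the three IP Precedence bits are the three high-order bits of the six-bit DSCP field, so a class selector codepoint is simply the precedence value followed by three zero bits. A minimal sketch of that bit arithmetic follows; the function names are hypothetical.

```python
def class_selector_dscp(ip_precedence: int) -> int:
    """DSCP value of the class selector (CS) codepoint for an IPP value."""
    if not 0 <= ip_precedence <= 7:
        raise ValueError("IP Precedence is a 3-bit value (0-7)")
    return ip_precedence << 3


def dscp_to_ip_precedence(dscp: int) -> int:
    """IP Precedence seen by a node that only reads the top three bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp >> 3


def tos_byte(dscp: int) -> int:
    """Full 8-bit ToS/Traffic Class value with the two ECN bits set to 0."""
    return dscp << 2


if __name__ == "__main__":
    print(class_selector_dscp(5))     # 40 (CS5)
    print(dscp_to_ip_precedence(46))  # 5  (EF maps to IPP 5)
    print(dscp_to_ip_precedence(34))  # 4  (AF41 maps to IPP 4)
    print(tos_byte(46))               # 184 (0xB8)
```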

After traffic flows are classified and marked, they are grouped into DS classes. Application flows sharing similar traffic characteristics and network requirements (delay, jitter, packet loss) are placed under the same DS class, enabling network operators to assign the desired treatment per class at different locations across the DS domain — such as assigning different queuing models per class to control traffic during congestion.

Traffic Profiling and Congestion Management

During normal operation where traffic is at or below the maximum available bandwidth, packets are sent out of the interface as soon as they arrive. During congestion, packets arrive faster than the outgoing interface can handle them, leading to undesirable outcomes for business-critical applications and user quality of experience.

Note

If the network is overprovisioned with bandwidth, QoS adds minimal value. However, it is common practice to enable QoS with a minimum number of classes to cater to critical applications in case of unpredicted congestion — such as a node failure causing overutilization of a secondary path if capacity planning did not account for failure scenarios.

Congestion management allows nodes to queue accumulating packets at the outbound interface until the interface (Tx-Ring) is free. Transmission of queued packets is scheduled based on assigned priority and a queuing mechanism configured per traffic flow aggregate (predefined traffic profiling).

Table 7: Common QoS Queuing Mechanisms
Queuing Mechanism | Characteristics
Weighted Fair Queuing (WFQ) | Dynamic distribution among all traffic flows based on predefined values such as DSCP
Priority Queuing (PQ) | Typically supports four queues with different priority levels; higher-priority queues are always serviced first
Class-Based WFQ (CBWFQ) | Provides class-based queuing (user-defined classes) with a minimum bandwidth guarantee; supports flow-based WFQ for undefined classes (class-default); supports Low-Latency Queuing (LLQ)
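
The strict behavior of priority queuing can be sketched in a few lines. This is an illustrative model only: the class name `PriorityQueuing` and the four level names are assumptions chosen for the example, not a description of any platform's scheduler.

```python
from collections import deque


class PriorityQueuing:
    """Strict priority scheduler: a lower queue is served only when every
    higher-priority queue is empty."""

    LEVELS = ("high", "medium", "normal", "low")  # illustrative names

    def __init__(self) -> None:
        self.queues = {level: deque() for level in self.LEVELS}

    def enqueue(self, level: str, packet: str) -> None:
        self.queues[level].append(packet)

    def dequeue(self):
        """Return the next packet to transmit, or None if all are empty."""
        for level in self.LEVELS:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


if __name__ == "__main__":
    pq = PriorityQueuing()
    pq.enqueue("low", "bulk-1")
    pq.enqueue("high", "voice-1")
    pq.enqueue("low", "bulk-2")
    pq.enqueue("high", "voice-2")
    print([pq.dequeue() for _ in range(4)])
    # ['voice-1', 'voice-2', 'bulk-1', 'bulk-2']
```

The same model shows the known weakness of PQ: as long as the high queue keeps receiving packets, the lower queues are never served.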

Other techniques such as WRR and custom queuing exist but are less commonly used. FIFO is the default when no other queuing is configured — suitable for large links with low delay and minimal congestion, but with no priority or traffic classes.

WFQ offers simplified, automated, fair flow distribution but can impact certain applications. For example, a telepresence endpoint requiring 5 Mbps over a 10-Mbps WAN link shared by ten flows would receive only 1 Mbps under WFQ fairness, degrading video quality. With CBWFQ, network designers can place telepresence RTP streams in their own class with a minimum bandwidth guarantee of 5 Mbps during congestion. Interactive video traffic can be assigned to the LLQ to be prioritized and serviced first.
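The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative, and the model assumes that all ten flows carry equal weight, which simplifies how WFQ actually weights flows.

```python
def wfq_equal_share_mbps(link_mbps: float, flow_count: int) -> float:
    """Per-flow share when every flow carries the same weight (simplified WFQ)."""
    return link_mbps / flow_count


def cbwfq_remaining_share_mbps(link_mbps: float, guaranteed_mbps: float,
                               other_flows: int) -> float:
    """Share left for each remaining flow once one class receives its guarantee."""
    return (link_mbps - guaranteed_mbps) / other_flows


if __name__ == "__main__":
    required = 5.0  # telepresence requirement from the example, in Mbps
    share = wfq_equal_share_mbps(10.0, 10)
    print(f"WFQ: {share} Mbps per flow, shortfall {required - share} Mbps")
    rest = cbwfq_remaining_share_mbps(10.0, required, 9)
    print(f"CBWFQ: telepresence {required} Mbps, other flows {rest:.2f} Mbps each")
```

The guarantee only matters during congestion; when the link is idle, either scheme lets the telepresence stream use what it needs.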

CBWFQ supports two LLQ models: single LLQ and multi-LLQ. With multi-LLQ, multiple sub-LLQs can be enabled inside a single aggregate strict priority queue, allowing multiple traffic types (e.g., VoIP and video) to be assigned to the LLQ. However, service within the LLQ itself is FIFO-based, so admission control is required to protect one LLQ from another (e.g., protecting a voice LLQ from a video LLQ).

Note

Cisco IOS includes a built-in implicit policer with the LLQ that limits the available bandwidth of the strict-priority queue to the allocated amount, preventing bandwidth starvation of non-real-time flows serviced by the CBWFQ scheduler. This applies only during periods of interface congestion (full Tx-Ring). A similar implicit policer applies per sub-LLQ in the multi-LLQ model.
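The conditional behavior of the implicit policer can be illustrated with a minimal Python sketch. It is a simplified model, not the IOS implementation: the function name is hypothetical, and burst allowances are ignored.

```python
def llq_forwarded_kbps(offered_kbps: int, allocated_kbps: int, congested: bool) -> int:
    """Return how much strict-priority traffic is forwarded.

    Simplified model: the implicit policer acts only while the interface
    is congested; otherwise priority traffic may exceed its allocation.
    """
    if not congested:
        return offered_kbps
    return min(offered_kbps, allocated_kbps)


if __name__ == "__main__":
    # 4000 kbps offered to an LLQ allocated 3300 kbps
    print(llq_forwarded_kbps(4000, 3300, congested=False))
    print(llq_forwarded_kbps(4000, 3300, congested=True))
```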

Hierarchical QoS

At the enterprise edge, links are commonly provisioned at sub-line rate — for example, a physical 1-Gbps Ethernet link provisioned at 10 Mbps or 50 Mbps. In this setup, QoS policies such as CBWFQ provide no value because QoS only activates when the interface detects congestion. Since the physical line rate is higher than the provisioned bandwidth, no congestion is detected even when the actual provisioned rate is saturated.

Hierarchical QoS (HQoS) solves this by using a shaper at the parent policy to simulate backpressure, informing the router that congestion has occurred at the provisioned rate so that child QoS policies can take effect (Figure 6).

[Figure 6 diagram: a parent policy with a shaper limits a 100-Mbps line-rate link to a 50-Mbps sub-line rate; the child policy divides the shaped rate among traffic classes: Real-Time 33%, Control 7%, Data 35%, Best Effort 25%.]
Figure 6: HQoS
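As a rough illustration of why the parent shaper matters, the following Python sketch computes the child-class bandwidth from the shaped rate rather than the physical line rate. It uses the 50-Mbps sub-line rate and the class percentages shown in Figure 6; the function and variable names are hypothetical.

```python
# Child-policy percentages as shown in Figure 6
CHILD_CLASS_PERCENT = {"Real-Time": 33, "Control": 7, "Data": 35, "Best Effort": 25}


def child_bandwidth_mbps(shaped_rate_mbps: float,
                         class_percent: dict = CHILD_CLASS_PERCENT) -> dict:
    """Split the parent shaper rate among the child classes."""
    if sum(class_percent.values()) != 100:
        raise ValueError("child class percentages must add up to 100")
    return {name: shaped_rate_mbps * pct / 100 for name, pct in class_percent.items()}


if __name__ == "__main__":
    # Percentages apply to the 50-Mbps shaped rate, not the 100-Mbps line rate
    for name, mbps in child_bandwidth_mbps(50).items():
        print(f"{name}: {mbps} Mbps")
```

Without the parent shaper, the same percentages would be evaluated against the line rate, and the child queues would never engage at the provisioned rate.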

Congestion Avoidance (Active Queue Management)

Congestion management techniques manage the front of the queue — which packets are sent first. Congestion avoidance algorithms manage the tail of the queue — which packets are dropped first when queuing buffers are full.

Weighted Random Early Detection (WRED) is the most commonly used technique. Packets are dropped based on their ToS markings:

  • IP Precedence-based: Packets with lower IPP values are dropped more aggressively than those with higher IPP values
  • DSCP-based: Packets with higher AF drop precedence values are dropped more aggressively

When WRED selectively drops packets, it triggers TCP windowing mechanisms to adjust flow rates to manageable levels, optimizing TCP-based applications. WRED is a member of the broader Active Queue Management (AQM) family of technologies.

Review Questions

5. Which of the following QoS queuing characteristics is an example of Weighted Fair Queuing?

  1. Supports real-time queuing and minimum bandwidth guarantee
  2. Offers a dynamic distribution based on DSCP values
  3. Four queues with associated levels of importance, with the most important being serviced first
  4. Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic

b. The WFQ algorithm offers a dynamic distribution among all traffic flows based on weights such as DSCP values.


6. Which of the following QoS queuing characteristics is an example of FIFO?

  1. Supports real-time queuing and minimum bandwidth guarantee
  2. Offers a dynamic distribution based on DSCP values
  3. Typically four to six queues with associated levels of importance, with the most important being serviced first
  4. Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic

d. FIFO queuing is the default when no other queuing is used. Although suitable for large links with low delay and minimal congestion, it has no priority or classes of traffic.


7. Which of the following QoS queuing characteristics is an example of Priority Queuing?

  1. Supports real-time queuing and minimum bandwidth guarantee
  2. Offers a dynamic distribution based on DSCP values
  3. Typically four to six queues with associated levels of importance, with the most important being serviced first
  4. Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic

c. Priority Queuing typically has four to six queues with different priority levels, and the higher-priority queues are always serviced first.


8. Which of the following QoS queuing characteristics is an example of LLQ?

  1. Supports real-time queuing and minimum bandwidth guarantee
  2. Offers a dynamic distribution based on DSCP values
  3. Typically four to six queues with associated levels of importance, with the most important being serviced first
  4. Suitable for large links that have low delay with very minimal congestion, but has no requirement for priority or classes of traffic

a. LLQ supports real-time queuing and minimum bandwidth guarantee.

Admission Control

Admission control keeps traffic flows in compliance with DS domain traffic conditioning standards — such as an SLA specifying the maximum allowed traffic rate per class and per link, where excess packets are discarded to keep flows within the agreed traffic profile. There are two primary ways to perform admission control:

  • Traffic policing: When traffic reaches the predefined maximum contracted rate, excess traffic is either dropped or re-marked (marked down)
  • Traffic shaping: Excess packets are buffered and delayed, then scheduled for later transmission over increments of time, smoothing the output rate and preventing unnecessary drops

The difference between policing and shaping is illustrated in Figure 7 and Figure 8.

[Plot: traffic rate versus time, before and after policing]
Figure 7: Traffic Policing

[Plot: traffic rate versus time, before and after shaping]
Figure 8: Traffic Shaping

Note

Buffering excess packets in traffic shaping may introduce delay, especially with deep queues. For real-time traffic, it is sometimes preferable to police and drop excess packets rather than delay them, to avoid degraded quality of experience.
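The contrast between the two approaches can be sketched in a few lines of Python. This is a deliberately simplified model: it uses a fixed budget per time interval with no burst allowance, and the traffic values are made up for illustration.

```python
def police(arrivals: list, rate: int):
    """Forward up to `rate` units per interval and drop the excess."""
    sent = [min(a, rate) for a in arrivals]
    dropped = sum(a - s for a, s in zip(arrivals, sent))
    return sent, dropped


def shape(arrivals: list, rate: int) -> list:
    """Forward up to `rate` units per interval and buffer the excess for later."""
    sent, backlog = [], 0
    for a in arrivals:
        backlog += a
        out = min(backlog, rate)
        sent.append(out)
        backlog -= out
    while backlog > 0:  # drain whatever is still buffered
        out = min(backlog, rate)
        sent.append(out)
        backlog -= out
    return sent


if __name__ == "__main__":
    offered = [2, 8, 1, 9, 0, 0]  # units offered per interval (illustrative)
    print("policed:", police(offered, rate=4))
    print("shaped: ", shape(offered, rate=4))
```

The policer output loses the excess, while the shaper delivers the same total volume spread over more intervals, which is the added delay the note above refers to.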

QoS Design Strategy

Effective QoS design must be measured end to end across the packet’s trip from source to destination. Network designers must consider a consistent and unified QoS design model based on available bandwidth, traffic characteristics, and network scope (campus only, WAN only, or end to end across the entire enterprise — single DS domain versus multiple DS domains).

Table 8 provides a generic 12-class QoS baseline model based on both the Cisco QoS Baseline and informational RFC 4594, offering common and unified traffic marking and profiling characteristics across single and multiple DS domains.

Table 8: Twelve-Class QoS Baseline Model Based on Cisco and RFC 4594 Baselines
Application Class | Per-Hop Behavior | IETF RFC | Queuing & Dropping | Application Examples
VoIP Telephony | EF | 3246 | Priority Queue (PQ) | IP Telephony (IPT)
Broadcast Video | CS5 | 2474 | (Optional) PQ | IP Video Surveillance/IPTV
Real-time Interactive VC | CS4 | 2474 | (Optional) PQ | Telepresence
Multimedia Conferencing | AF4 | 2597 | BW Queue + DSCP WRED | IPT Video
Multimedia Streaming | AF3 | 2597 | BW Queue + DSCP WRED | Video on Demand (VoD), E-learning
Network Control | CS6 | 2474 | BW Queue | EIGRP, OSPF, BGP, HSRP, IKE
Call-Signaling | CS3 | 2474 | BW Queue | SCCP, SIP, H.323
Mgmt (OAM) | CS2 | 2474 | BW Queue | SNMP, SSH, Syslog
Low-Latency Data | AF2 | 2597 | BW Queue + DSCP WRED | ERP Apps, CRM Apps, Database Apps
High-Throughput Data | AF1 | 2597 | BW Queue + DSCP WRED | E-mail, FTP, Backup Apps, Content Distribution
Best Effort | DF | 2474 | Default Queue + RED | Default Class
Low-Priority Data | CS1 | 3662 | Min BW Queue (Deferential) | YouTube, iTunes, BitTorrent, Xbox Live

The key changes in this RFC 4594-based model compared to the original Cisco QoS Baseline (2002) appear in two rows: Broadcast Video is assigned CS5, and Call-Signaling is marked CS3 rather than AF31.

Note

The IETF DiffServ RFCs provide consistent PHBs for applications marked to specific DSCP values but do not specify which application should be marked with which DSCP value. RFC 4594 (informational, August 2006) puts forward 12 application classes matched to RFC-defined PHBs. The most significant difference between RFC 4594 and the Cisco model is call signaling: RFC 4594 recommends CS5, whereas Cisco, having already migrated call signaling from AF31 to CS3, retains CS3 and uses CS5 for broadcast video. Because RFC 4594 is informational, it is an industry best practice, not a standard.

The 12-class model is comprehensive and flexible but is not always viable:

  • Not all enterprises or SPs need such a wide QoS design model
  • Most WAN providers offer only 4- or 6-class QoS models, making mapping complex

[Figure 9 diagram: class mapping between three models. 4-Class Model: Real-time, Signaling/Control, Critical Data, Best Effort. 8-Class Model: Voice, Interactive Video, Streaming Video, Call Signaling, Network Control, Critical Data, Best Effort, Low-Priority Data. 12-Class Model: Voice, Real-time Interactive, Multimedia Conferencing, Broadcast Video, Multimedia Streaming, Call Signaling, Network Control, Network Management, Low-Latency Data, High-Throughput Data, Best Effort, Low-Priority Data.]
Figure 9: Mapping Between QoS Models with Different Classes

Both 4- and 6-class models provision only a single class for real-time traffic (usually voice). If video is added, either a higher class model (such as 8-class) is required, or voice and video must share a single class — which may not be desirable for large deployments with many IP telephony and video endpoints. Figure 9 shows how classes are mapped between models with different numbers of classes.

As a general rule, network designers should use a phased approach: start with a simple QoS model (such as 4-class) as a baseline, then add classes as requirements mandate. This minimizes initial design and operational complexity.

Although standard best practice guides recommend bandwidth allocation percentages per class — such as no more than 33% of available bandwidth for real-time traffic (LLQ) and 25% for best effort — these are generic baselines, not fixed rules. Consider the following example:

  • A campus network with 20 access switches, each with 30 IP phones using G.711 codec (~80 kbps per call), and a WAN designed for a maximum of 20 simultaneous calls
  • Allocating 33% LLQ on 10-Gbps campus uplinks reserves 3.3 Gbps — a potential security risk if malicious traffic is marked DSCP EF
  • On the 10-Mbps WAN link, 20 simultaneous calls require only 1.6 Mbps; allocating 33% (3.3 Mbps) wastes bandwidth

Network designers must adjust bandwidth allocation based on actual traffic flow requirements, security concerns, available bandwidth, and whether the link is LAN, WAN, or data center. The overall QoS design framework is summarized in Figure 10.

Figure 10: QoS Design Framework

The QoS design framework follows this top-down flow:

  1. Business Requirements: Business drivers, goals, requirements, and design scope (WAN, campus, or end to end)
  2. Functional and Application Requirements: Identify high-priority business applications and understand their network requirements and attributes
  3. Classification and Marking: Classify and mark application flows as close to the traffic source as technically possible, combined with admission control
  4. Congestion Management and Avoidance: Profile traffic flows and aggregates into DS classes, then assign each class an appropriate queuing mechanism combined with a queue management technique
  5. Monitor and Optimize

Enterprise QoS Design Considerations

Enterprise Campus

Today’s campus networks are provisioned with Gigabit/10 Gigabit bandwidth, where queuing needs are minimal compared to the WAN and Internet edge. However, QoS in the campus is not limited to queuing functions. Unified marking and accurate traffic classification (as close to the source as possible) also enable policing across the campus LAN, giving network operators flexibility to manage traffic based on ToS values and providing a protective mechanism against DoS attacks.

It is recommended that QoS be enabled across the campus LAN to maintain a seamless DS domain design where classification and marking policies establish trust boundaries, and policers protect against undesired flows at the access edge.

Enterprise Edge

The enterprise edge (WAN, extranet, or Internet) is where traffic flow aggregation occurs — many flows from the high-bandwidth LAN side must exit through a lower-capacity edge link. QoS is always a primary function at the enterprise edge for bandwidth optimization, especially for converged voice, video, and data traffic.

The enterprise edge represents the DS domain boundary where traffic must be mapped and profiled to align with the adjacent DS domain. For example, a 12-class enterprise model must be mapped to a 4-class SP model at the WAN edge router (CE) toward the SP edge (PE), as shown in Figure 11.

[Figure 11 diagram: the enterprise 12-class applications and their DSCP values (Network Control CS6, VoIP Telephony EF, Broadcast Video CS5, Multimedia Conferencing AF4, Real-time Interactive CS4, Multimedia Streaming AF3, Signaling CS3, Transactional Data AF2, OAM CS2, Low-Priority Data CS1, Best Effort DF, High-Throughput Data AF1) are mapped at the WAN edge onto a 4-class SP model across an MPLS VPN: SP-Real-time (RTP/UDP) 30%, SP-Critical Class 1 20%, SP-Critical Class 2 20%, SP-Best Effort 30%. Re-markings shown: AF4 → AF2, CS4 → CS5, AF3 → AF2, AF2 → AF3. The SP classes map to MPLS EXP5, EXP4, EXP3, and EXP0; for example, IPP5/DSCP EF is carried as top-label MPLS EXP 5 and restored as IPP5/DSCP EF.]
Figure 11: QoS Mapping: Enterprise WAN Edge
Note

It is common for SPs to offer CoS based on IP Precedence only. A DSCP value such as AF41 (binary 100010) converts to IPP 4 (binary 100), which comes back as DSCP 32 (binary 100000) at the remote site. Re-marking is required at the receiving side in the ingress direction to maintain unified end-to-end QoS marking.
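The bit arithmetic in this note can be verified with a small Python sketch. IP Precedence is the three most significant bits of the six-bit DSCP field; the function names are illustrative.

```python
def dscp_to_ipp(dscp: int) -> int:
    """Keep only the three most significant bits of the 6-bit DSCP."""
    return dscp >> 3


def ipp_to_dscp(ipp: int) -> int:
    """Pad the 3-bit IP Precedence back to a 6-bit DSCP (a class selector value)."""
    return ipp << 3


if __name__ == "__main__":
    af41 = 0b100010  # DSCP 34
    ipp = dscp_to_ipp(af41)
    restored = ipp_to_dscp(ipp)
    print(f"AF41 {af41:06b} -> IPP {ipp} ({ipp:03b}) -> DSCP {restored} ({restored:06b})")
```

The drop-precedence bits are lost in the round trip, which is why the receiving side has to re-mark the traffic.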

IP Tunneling QoS Design Considerations

VPN solutions add additional IP and ESP headers to each packet, increasing overhead and bandwidth consumption. For example, a G.711 VoIP RTP stream requiring 80 kbps at Layer 3 increases to ~112 kbps when sent over GRE encrypted with IPsec, a 40% bandwidth increase per encrypted call. Network designers must therefore account for VPN overhead, because it reduces the number of simultaneous voice calls or application sessions that the available bandwidth can support.
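A quick Python sketch makes the impact on call capacity concrete. The 80-kbps and 112-kbps figures come from the example above; the 50 packets-per-second rate (20-ms packetization) and the 1600-kbps voice budget are assumptions added for illustration.

```python
CLEAR_CALL_KBPS = 80       # G.711 call at Layer 3
TUNNELED_CALL_KBPS = 112   # same call over GRE encrypted with IPsec
PACKETS_PER_SECOND = 50    # assumption: 20-ms packetization


def overhead_percent(clear_kbps: int, tunneled_kbps: int) -> float:
    """Relative bandwidth increase caused by tunneling and encryption."""
    return (tunneled_kbps - clear_kbps) * 100 / clear_kbps


def bytes_per_packet(kbps: int, pps: int = PACKETS_PER_SECOND) -> int:
    """Average packet size implied by a bit rate and a packet rate."""
    return kbps * 1000 // 8 // pps


def max_calls(budget_kbps: int, per_call_kbps: int) -> int:
    """Whole calls that fit into a bandwidth budget."""
    return budget_kbps // per_call_kbps


if __name__ == "__main__":
    print("overhead:", overhead_percent(CLEAR_CALL_KBPS, TUNNELED_CALL_KBPS), "%")
    print("packet size:", bytes_per_packet(CLEAR_CALL_KBPS), "->",
          bytes_per_packet(TUNNELED_CALL_KBPS), "bytes")
    budget = 1600  # kbps, sized for 20 clear-text calls (hypothetical budget)
    print("calls:", max_calls(budget, CLEAR_CALL_KBPS), "->",
          max_calls(budget, TUNNELED_CALL_KBPS))
```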

Additional QoS design factors to consider with VPN solutions:

  • Additional packet delay from encryption and decryption
  • Pre-crypto queuing
  • QoS packet reordering (prioritization) may cause IPsec anti-replay to drop legitimate out-of-sequence packets
  • ToS value preservation
  • MTU issues
  • Original flow information is hidden for outbound QoS policies by default (Cisco QoS pre-classify feature helps here)
  • VPN logical topology vs underlying IP transport topology (e.g., DMVPN Phase 1 over MPLS L3VPN transforms any-to-any to hub-and-spoke)

The following QoS toolset forms the foundation of QoS design with VPN solutions:

  • Hierarchical QoS: Helps when QoS policies must be applied on a sub-line rate interface or per-tunnel
  • CBWFQ: Required when different traffic treatment is needed per class
  • Admission control: Provides per-child QoS class admission control
  • WRED: Optimizes TCP-based applications

With point-to-point VPN (classic IPsec or GRE), traffic flow is deterministic but introduces high operational complexity and scalability limitations in large deployments — for example, maintaining manual QoS policies per tunnel at the hub site.

DMVPN introduces additional QoS challenges because it supports hub-and-spoke topology with direct spoke-to-spoke traffic, and the hub’s higher-bandwidth pipe can easily congest the spokes. Adding a QoS policy per spoke at the hub is non-scalable.

The per-tunnel QoS feature with DMVPN promotes a zero-touch hub design using NHRP groups to dynamically provision QoS policies on a per-spoke basis at the hub mGRE tunnel during spoke registration. Remote sites are profiled based on their provisioned WAN/Internet bandwidth and automatically assigned a QoS policy that shapes traffic from the hub toward each spoke to match the spoke’s maximum download rate.

Note

At the hub site, network operators still need to define a policy to shape the interface to the actual provisioned sub-line rate. Spokes should follow typical HQoS deployment where the WAN/Internet link is shaped to the maximum upload capacity, controlling direct spoke-to-spoke traffic streams.

When calculating bandwidth for QoS shaping and queuing with DMVPN, GRE, IPsec, and L2 overhead must all be included because queuing and shaping are executed at the outbound physical interface of the mGRE tunnel. GETVPN, in contrast, preserves the entire original IP packet header (source/destination IPs, TCP/UDP ports, ToS byte, DF bit) because no tunnels are used — making the standard WAN QoS design directly applicable, with the exception that increased packet size must be factored into bandwidth calculations.

Network Management

Modern networks carry multiple business-critical applications over one unified infrastructure. Traffic requirements in terms of pattern and volume can change over time due to organic growth, mergers and acquisitions, and new applications, meaning the network may end up handling traffic it was not designed for. Most network downtime is caused by human error, making controlled and tracked change management essential.

A network management solution must provide real-time and historical information about every activity across the network, enabling the IT team to act proactively rather than reactively, keeping mean time to repair (MTTR) as short as possible. Actions must be performed in a controlled and structured manner, tracked and recorded, and combined with automation to reduce human errors. The solution must address operation, administration, maintenance, and provisioning.

FCAPS (Fault, Configuration, Accounting, Performance, and Security)

FCAPS is a network management framework defined by the ISO that classifies network management objectives into five distinct categories:

  • Fault management: Minimize network outages by detecting and isolating network issues, with corrective actions to overcome current issues and prevent recurrence. Examples: alarms, fault isolation, testing, troubleshooting.
  • Configuration management: Maintain a current inventory of network equipment and configurations for planning, installation, and provisioning of new services and equipment.
  • Accounting management: Ensure each user or entity is billed or allocated an appropriate cost reference based on activities and utilization. Examples: usage management, pricing, auditing, profitability analysis.
  • Performance management: Monitor and track performance issues such as network bottlenecks by continuously collecting and analyzing statistical information. Examples: quality assurance, performance analysis, monitoring, capacity planning.
  • Security management: Focus on the security of the management solution itself (access control, data confidentiality, integrity) and on monitoring the network for security aspects such as unauthorized access, traffic spikes (DoS attacks), and targeted application attacks.

Network Management High-Level Design Considerations

The following questions form the foundation of the network management solution:

  • What is the targeted environment (enterprise, MPLS VPN SP, application SP, cloud-hosting SP)?
  • Is there an existing network management solution? Does it follow a standard framework such as FCAPS?
  • Is the solution being added to overcome an existing challenge or for enhancement?
  • Are there any business-related constraints such as budget?
  • What is the goal of the solution (monitoring and fault management, capacity planning, billing, security monitoring, or a combination)?
  • Are there any security constraints, such as out-of-band management only, or can secure in-band management protocols be used?

After answering these questions, the detailed design should address:

  • What information or events need to be collected or monitored?
  • Where is the best place to gather the intended information or report relevant events?
  • Where should collected information or events be sent?
  • What degree of detail is required — full or partial data collection?
  • Is the underlying transport network secure (internal) or untrusted (public Internet)?
  • How is confidentiality and integrity of polled or exported information maintained?
  • What protocols and versions are supported by the elements to be monitored (e.g., SNMP, NetFlow)?

Multitier Network Management Design

Integrating and structuring multiple management systems in a hierarchical manner offers a more flexible and efficient network management solution. This layered approach reduces the number of alerts seen by operations staff, presenting only filtered and relevant information. The multitier approach offers the following benefits:

  • Proactively identifies and corrects potential network issues before they become problems
  • Optimizes IT productivity by reducing network connectivity loss to a minimum
  • Focuses on the solution instead of the problem, reducing downtime duration (MTTR)

This approach is based on bottom-up communication between management systems using protocols including NetFlow, syslog, and SNMP. In large networks, a failure in one area can impact multiple devices, each independently alerting the NMS and creating duplicate instances of the same problem. The multitiered architecture is shown in Figure 12.

[Figure 12 diagram: SNMP, IPFIX, IP SLA, and syslog inputs feed management functions (Fault, Performance, Security, Config, and Inventory Management), Events Correlation, Events Automation, Change Management, and the NOC, arranged across the Event Management Tier, the Network Management Tier, and the Service Management Tier.]
Figure 12: Multitiered Network Management Solution

The architecture has three tiers:

  • Event Management Tier: Collects input from network elements via SNMP, IPFIX, IP SLA, and syslog
  • Network Management Tier (NMT): Performs root-cause analysis by correlating information from multiple sources, deduplicating events, and presenting only the most relevant events to operations personnel. Covers fault, accounting, performance, security, and configuration management.
  • Service Management Tier: Adds intelligence and automation to filtered NMT events for further optimization, enabling operators to move from element-by-element (box-by-box) management to managing network events and identified problems
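The deduplication step in the Network Management Tier can be sketched as follows. This is a minimal Python illustration, not a description of any particular NMS: the event fields, device names, and the idea of correlating on a shared affected object are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class RawEvent(NamedTuple):
    reporter: str         # device that raised the alert
    affected_object: str  # hypothetical correlation key, e.g. a failed link
    message: str


def correlate(events: list) -> list:
    """Collapse raw events that point at the same object into one entry."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.affected_object].append(event)
    return [
        {
            "affected_object": obj,
            "reporters": sorted({e.reporter for e in group}),
            "raw_event_count": len(group),
        }
        for obj, group in grouped.items()
    ]


if __name__ == "__main__":
    raw = [
        RawEvent("core-1", "link-A", "interface down"),
        RawEvent("dist-3", "link-A", "OSPF neighbor lost"),
        RawEvent("dist-4", "link-A", "BFD session down"),
        RawEvent("access-9", "fan-2", "fan failure"),
    ]
    for consolidated in correlate(raw):
        print(consolidated)
```

Four raw alerts become two entries for the operator, one per underlying problem.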

Model-Driven Network Management

Automation capabilities have introduced new options for network management. This section covers YANG, NETCONF, RESTCONF, and gNMI to give network designers a basic understanding for making proper design decisions around automated network management.

YANG

Yet Another Next Generation (YANG) is an IETF standard (RFC 6020) data modeling language used to describe data for network configuration protocols such as NETCONF and RESTCONF. YANG has a hierarchical configuration structure within data models, making it easy to read and reuse. It is extensible through augmentation and serves as a full, formal contract language with rich syntax and semantics. Listing 1 shows a simple YANG data model.

Listing 1: YANG Data Model Example
module my-interface {
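  namespace "com.my-interface";
  prefix my-if;  // a prefix statement is mandatory in a YANG module; "my-if" is an illustrative choice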

  container interface {
    list interface {
      key name;
      leaf name { type string; }
      leaf admin-status { type enumeration { enum up; enum down; } }
    }
  }

  rpc flap-interface {
    input {
      leaf name { type string; }
    }
    output {
      leaf result { type boolean; }
    }
  }
}

NETCONF

Network Configuration Protocol (NETCONF) is a network management protocol defined by the IETF in RFC 6241 that provides rich functionality for managing configuration and state data. A NETCONF manager is the client, and a NETCONF device is the server. The <hello> message that each side sends at session initiation lists the NETCONF capabilities it supports. YANG data models define capabilities for the supported devices, and other standards bodies and proprietary specifications define additional ones. Figure 13 highlights the different NETCONF operations and datastore capabilities.

[Figure 13 diagram: three datastores and the operations between them. Running: complete and active running config and the respective data. Candidate (:candidate capability): working copy of config that can be modified without impacting the running config. Startup (:startup capability): config loaded at device startup. Operations shown: <get>, <get-config>, <edit-config>, <copy>, <commit>.]
Figure 13: NETCONF Operations and Datastore Capabilities

Key characteristics:

  • Protocol operations defined as RPCs in XML-based representation
  • Supports running, candidate, and startup configuration datastores
  • Client/server protocol, connection-oriented over TCP
  • All messages encrypted with SSH and encoded in XML
  • Transaction support is a key feature
  • Capabilities exchanged during session initiation via the <hello> message
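To make the XML-encoded RPC format concrete, the following Python sketch builds a <get-config> request using only the standard library. It constructs the message but does not open an SSH session; the function name and the message-id value are illustrative, while the namespace is the NETCONF base namespace from RFC 6241.

```python
import xml.etree.ElementTree as ET

NETCONF_BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
DATASTORES = ("running", "candidate", "startup")


def build_get_config(message_id: str, datastore: str = "running") -> str:
    """Return a NETCONF <get-config> RPC for the chosen datastore as XML text."""
    if datastore not in DATASTORES:
        raise ValueError(f"unknown datastore: {datastore}")
    ET.register_namespace("", NETCONF_BASE_NS)  # serialize with a default namespace
    rpc = ET.Element(f"{{{NETCONF_BASE_NS}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_BASE_NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NETCONF_BASE_NS}}}source")
    ET.SubElement(source, f"{{{NETCONF_BASE_NS}}}{datastore}")
    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    print(build_get_config("101", "candidate"))
```

In practice a NETCONF client library would handle session setup, the <hello> exchange, and message framing; this sketch covers only the message body.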

RESTCONF

RESTCONF, defined in RFC 8040, is an HTTP-based protocol that provides a programmatic interface for accessing YANG-modeled data. RESTCONF uses HTTP operations to provide create, retrieve, update, and delete (CRUD) operations on a NETCONF datastore containing YANG data. RESTCONF is tightly coupled to the YANG data model definitions. It supports HTTP-based tools and programming libraries. RESTCONF can be encoded in either XML or JSON.

When comparing RESTCONF with NETCONF, RESTCONF has:

  • No notion of transaction
  • No notion of lock
  • No notion of candidate config and commit
  • No notion of two-phase commit
  • No <copy-config>
  • XML or JSON, while NETCONF is only XML

In most design situations, it will be best to leverage NETCONF for routers and switches and RESTCONF for controller northbound communication. Table 9 represents the different layers, highlighting NETCONF and RESTCONF characteristics at each layer.

Table 9: Layering Model for NETCONF vs RESTCONF
Layer | NETCONF | RESTCONF
Content | Configuration data, notification data | Configuration data, notification data
Operations | <get>, <get-config> | GET, POST, PATCH
Messages | <rpc>, <notification> | HTTP payload, W3C Server-Sent Events
Secure Transport | SSH | HTTPS

The operations listed in Table 9 are not an all-inclusive list of operations for both NETCONF and RESTCONF. Listing 2 maps the YANG model to corresponding RESTCONF HTTP operations.

Listing 2: RESTCONF HTTP Operations Mapped to YANG Model
module my-interface {
  namespace "com.my-interface";          // GET: Gets a resource
  prefix my-if;                          // GET /restconf/data/my-interface:interface
  container interface {                  // GET /restconf/data/my-interface:interface/interface=<some-name>
    list interface {
      key name;                          // POST: Creates a resource or invokes an operation
      leaf name { type string; }         // POST /restconf/operations/my-interface:flap-interface + Data
      leaf admin-status { type enumeration { enum up; enum down; } }
    }
  }
  rpc flap-interface {                   // PUT: Replaces a resource
    input {                              // PUT /restconf/data/my-interface:interface/interface=<some-name> + Data
      leaf name { type string; }
    }                                    // DELETE: Removes a resource
    output {                             // DELETE /restconf/data/my-interface:interface/interface=<some-name>
      leaf result { type boolean; }
    }
  }
}
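The mapping from operations to HTTP requests can also be sketched in Python. The example below only builds request descriptions and sends nothing. It uses the module and container names from Listing 1 and the list-key syntax (list=key) from RFC 8040; the device address and function names are hypothetical.

```python
import json
from urllib.parse import quote

BASE_URL = "https://device.example.net/restconf"  # hypothetical device address
MEDIA_TYPE = "application/yang-data+json"          # JSON media type from RFC 8040
METHODS = {"retrieve": "GET", "replace": "PUT", "delete": "DELETE"}


def interface_resource(name: str) -> str:
    """URL of one entry in the interface list, with the key percent-encoded."""
    return f"{BASE_URL}/data/my-interface:interface/interface={quote(name, safe='')}"


def build_request(operation: str, name: str, admin_status: str = "up") -> dict:
    """Describe the HTTP request for an operation on one list entry."""
    method = METHODS[operation]
    request = {"method": method, "url": interface_resource(name),
               "headers": {"Accept": MEDIA_TYPE}}
    if method == "PUT":
        # PUT replaces the whole list entry, so the body carries the full entry
        request["headers"]["Content-Type"] = MEDIA_TYPE
        request["body"] = json.dumps(
            {"my-interface:interface": [{"name": name, "admin-status": admin_status}]})
    return request


if __name__ == "__main__":
    for op in METHODS:
        print(build_request(op, "GigabitEthernet0/1", "down"))
```

Each request is independent, which reflects the absence of locks, candidate configuration, and commit noted in the comparison with NETCONF.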

gNMI

gRPC Network Management Interface (gNMI), developed by Google, provides the mechanism to install, manipulate, and delete the configuration of network devices and to view operational data. The content provided through gNMI can be modeled using YANG. gRPC is a remote procedure call (RPC) framework developed by Google for low-latency, scalable distributed systems, such as mobile clients communicating with a cloud server. gRPC carries gNMI and provides the means to formulate and transmit data and operation requests. When a gNMI service failure occurs, the gNMI broker (GNMIB) indicates an operational state change from up to down, and all RPCs return a service-unavailable message until the database is up and running again. Upon recovery, the GNMIB indicates a state change from down to up and resumes normal handling of RPCs.

Review Questions

9. Which network management protocol would be used if you wanted XML encoding and messages encrypted with SSH?

  a. YAML
  b. NETCONF
  c. RESTCONF
  d. gNMI

b. NETCONF is a network management protocol defined by the IETF in RFC 6241. NETCONF messages are encoded in XML and carried over a secure transport, with SSH being the mandatory-to-implement option.


10. Which network management protocol would be used for northbound controller communication leveraging JSON encoding while also leveraging HTTP for transport?

  a. YAML
  b. NETCONF
  c. RESTCONF
  d. gNMI

c. RESTCONF (RFC 8040) is an HTTP-based protocol that provides a programmatic interface for accessing YANG-modeled data. RESTCONF payloads can be encoded in either XML or JSON. In most design situations, NETCONF is the better fit for managing routers and switches, and RESTCONF for controller northbound communication.

Fully Automated Network Management

Today, IT systems generate thousands of events per second. Having humans monitor all of these events would be prohibitively expensive, and even then they could not react quickly enough. This gap has paved the way for automated solutions. At the time of writing, there are three prominent concepts in this area: artificial intelligence (AI) for IT operations (AIOps), closed-loop automation, and full-stack observability (FSO).

Artificial Intelligence for IT Operations

AIOps helps separate business-impacting events from the noise and resolve them autonomously. Managing IT operations without AIOps will become increasingly challenging because of the rapid growth in data volumes and the rate of change. A modern AIOps platform detects anomalies, suppresses noise through correlation and de-duplication, assists with triage and root cause analysis (RCA), and suggests or applies fixes.
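To make the noise-suppression step concrete, here is a minimal Python sketch of de-duplication only; it does not attempt anomaly detection or root cause analysis. The event fields (`ts`, `source`, `type`), the function name `deduplicate`, and the 60-second window are assumptions chosen for illustration, not features of any particular AIOps product.

```python
def deduplicate(events, window_seconds=60):
    """Collapse repeats of the same (source, type) seen within a time window.

    Each event is a dict with 'ts' (seconds), 'source', and 'type' keys.
    Returns one summary per burst, with a count of the events it absorbed.
    """
    summaries = []
    open_bursts = {}  # (source, type) -> summary currently being extended
    for event in sorted(events, key=lambda e: e["ts"]):
        key = (event["source"], event["type"])
        burst = open_bursts.get(key)
        if burst is not None and event["ts"] - burst["last_ts"] <= window_seconds:
            # Same condition reported again soon after: fold it into the burst.
            burst["count"] += 1
            burst["last_ts"] = event["ts"]
        else:
            # First sighting, or the previous burst has gone quiet: start anew.
            burst = {"source": key[0], "type": key[1],
                     "first_ts": event["ts"], "last_ts": event["ts"], "count": 1}
            open_bursts[key] = burst
            summaries.append(burst)
    return summaries


events = [
    {"ts": 0, "source": "sw1", "type": "link-down"},
    {"ts": 5, "source": "sw1", "type": "link-down"},
    {"ts": 20, "source": "sw1", "type": "link-down"},
    {"ts": 30, "source": "rtr2", "type": "bgp-flap"},
    {"ts": 300, "source": "sw1", "type": "link-down"},
]
for summary in deduplicate(events):
    print(summary)
```

With the sample data, five raw events reduce to three summaries, which is the kind of reduction that lets operators or downstream automation focus on distinct incidents.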

Closed-Loop Automation

Closed-loop automation (CLA) is a continuous process that monitors, measures, and assesses real-time network traffic and then automatically acts to optimize end-user quality of experience.
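The monitor, measure, assess, and act cycle can be sketched in a few lines of Python. Everything named here is hypothetical: the latency metric, the 100 ms target, the `reroute` action, and the simulated network in which each action lowers latency by 40 ms. A production loop would run continuously rather than for a fixed number of cycles.

```python
def closed_loop(read_latency_ms, apply_action, target_ms=100.0, cycles=5):
    """Run a fixed number of monitor -> assess -> act cycles.

    read_latency_ms: callable returning the current measurement.
    apply_action:    callable invoked with an action name when the target is missed.
    Returns the list of (measurement, action) pairs for auditing.
    """
    history = []
    for _ in range(cycles):
        measured = read_latency_ms()                           # monitor and measure
        action = "reroute" if measured > target_ms else None   # assess
        if action is not None:
            apply_action(action)                               # act
        history.append((measured, action))
    return history


# Simulated network: each corrective action lowers latency by 40 ms.
network = {"latency_ms": 180.0}


def read_latency_ms():
    return network["latency_ms"]


def apply_action(action):
    network["latency_ms"] -= 40.0


history = closed_loop(read_latency_ms, apply_action)
print(history)
```

The loop stops acting once the measurement is back within target, which is what distinguishes closed-loop automation from a one-shot scripted change.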

Full-Stack Observability

Full-stack observability (FSO) is built on four types of telemetry: metrics, events, logs, and traces. Modern applications span multiple environments. Today, a typical mobile application comprises hundreds of services communicating with each other over a zero-trust, multi-cloud landscape, all of which have to work flawlessly. These applications are far more complex than those of past decades, and they can no longer be managed or optimized with traditional approaches because they produce too much data with too little context and correlation.

Traditional monitoring provides visibility only at the domain level, whether that is the network, infrastructure, cloud, or database. A combined, full-picture view is becoming more critical for delivering the best user experience, and this is where FSO comes to the forefront. Organizations require complete visibility and insight to take the relevant action at the right time. Achieving this requires the ability to measure the internal state of these applications from the data they generate, such as logs, metrics, and traces; this ability is known as observability.
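One way to picture the move from domain-level monitoring to a combined view is to group telemetry from different domains by a shared transaction identifier. The Python sketch below is a simplification under stated assumptions: every record carries a hypothetical `trace_id`, `kind`, and `domain` field, only logs, metrics, and traces are modeled, and the sample values are invented.

```python
from collections import defaultdict


def correlate(signals):
    """Group telemetry records from different domains by shared trace ID."""
    by_trace = defaultdict(lambda: {"logs": [], "metrics": [], "traces": []})
    for signal in signals:
        by_trace[signal["trace_id"]][signal["kind"]].append(signal)
    return dict(by_trace)


def domains_involved(records):
    """Return the sorted list of domains that contributed to one transaction."""
    return sorted({s["domain"] for group in records.values() for s in group})


signals = [
    {"trace_id": "t1", "kind": "traces", "domain": "application",
     "span": "checkout", "ms": 950},
    {"trace_id": "t1", "kind": "logs", "domain": "database",
     "msg": "slow query"},
    {"trace_id": "t1", "kind": "metrics", "domain": "network",
     "name": "rtt_ms", "value": 12},
    {"trace_id": "t2", "kind": "traces", "domain": "application",
     "span": "login", "ms": 80},
]

view = correlate(signals)
print(domains_involved(view["t1"]))
```

For transaction `t1`, the slow application span, the database log entry, and the network metric end up side by side, which is the cross-domain context that per-domain tools do not provide on their own.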

Cloud Services Design Considerations

As organizations increasingly leverage Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), network designers must understand how cloud-based traffic management concepts map to traditional network design principles. Several cloud-specific mechanisms exist to handle capacity, traffic flow, and service continuity.

Cloud Bursting

Cloud bursting is a deployment model in which an application runs in a private cloud or on-premises data center and “bursts” into a public cloud when demand for computing capacity spikes. When consumers that leverage IaaS reach 100% resource capacity, cloud bursting redirects the overflow of traffic to the public cloud so there is no disruption to service.

This approach allows organizations to:

  • Maintain baseline workloads on private infrastructure for cost efficiency and data sovereignty
  • Absorb traffic spikes without over-provisioning local resources
  • Ensure service continuity during peak demand periods (seasonal traffic, marketing campaigns, unexpected load)
  • Pay only for additional public cloud resources when they are actually needed

From a network design perspective, cloud bursting requires:

  • Low-latency, high-bandwidth connectivity between the private environment and the public cloud (typically via direct interconnects or dedicated WAN links)
  • Consistent security policies across both environments
  • Application architectures that support horizontal scaling across cloud boundaries
  • Automated orchestration to detect capacity thresholds and trigger the burst

Note

Cloud bursting is distinct from other cloud traffic management concepts. Cloud policing and cloud shaping apply rate-limiting principles (similar to QoS policing and shaping) to cloud-bound traffic. Cloud bursting specifically addresses capacity overflow by extending workloads into the public cloud.
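The overflow behavior at the heart of cloud bursting can be expressed as a simple placement rule. This Python sketch assumes demand and capacity are measured in the same unit (requests per second here) and that the burst triggers exactly at 100% of private capacity, as described above; the function name `place_requests` and the sample numbers are hypothetical, and real orchestration would also have to account for provisioning time and connectivity.

```python
def place_requests(demand, private_capacity):
    """Split demand between private capacity and public-cloud overflow."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private = min(demand, private_capacity)   # fill private resources first
    return {"private": private, "public": demand - private}


# Hypothetical demand levels against a private capacity of 1000 requests/s.
for demand in (600, 1000, 1400):
    print(demand, place_requests(demand, private_capacity=1000))
```

Only the last demand level exceeds private capacity, so only it sends traffic to the public cloud, which matches the pay-only-when-needed property listed earlier.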

Review Questions

11. When consumers that leverage IaaS reach 100% resource capacity, what can be used to redirect the overflow of traffic to the public cloud, so there is no disruption to service?

  a. Cloud policing
  b. Cloud bursting
  c. Cloud shaping
  d. Cloud spill

b. Cloud bursting is a deployment model where an application runs in a private cloud or on-premises environment and bursts into the public cloud when demand exceeds local capacity, ensuring no service disruption.

Summary

This chapter covered advanced IP topics and services that are part of any network design. To avoid design defects, network designers must incorporate these services through an integrated, holistic approach rather than designing each in isolation. A top-down design approach is fundamental to a successful business-driven design; for example, it helps ensure that the design complies with the organization’s security policy standards. Business priorities and design constraints must always be considered, adopting a “first things first” approach that accounts for existing limitations, including staff knowledge, budget, and supported features and technologies.
