The digital world is built on connectivity. From streaming your favorite shows to the intricate dance of IoT sensors and the demanding workloads in the cloud, the network is the invisible, persistent presence that powers everything. But as networks grow in complexity and scale, particularly with the rise of AI-driven applications and distributed architectures combined with low-latency and high-throughput requirements, how do we get a clear picture of network health and optimize performance?
Evolving network performance demands require extensive visibility
For decades, network operators have relied on traditional probing methods like Bidirectional Forwarding Detection (BFD), Y.1731, and Internet Protocol Service Level Agreement (IP SLA). These active probing techniques have been instrumental in understanding service performance and measuring service level agreements (SLAs). However, much like the Internet Protocol (IP) itself, these solutions, while effective for certain use cases, are increasingly revealing their limitations in modern, hyperscale environments:
Scalability limits: Traditional probing methods struggle to keep pace, handling only a few thousand probes per second. This falls drastically short of the millions needed to cover all Equal Cost Multi-Path (ECMP) paths and often results in less than 1% path coverage, which is insufficient for today’s AI-scale data centers, where workloads require per-path visibility.
Suboptimal latency metrics: Relying solely on minimum, maximum, or average values can be misleading. A single problematic path among many can have a sizeable impact on a segment of users, yet its effect is often masked by the overall average.
Path asymmetry challenges: Issues like loss and liveness can differ significantly between upstream and downstream paths. Two-way methods struggle to localize the problem, leaving operators without clarity on where the issue truly lies.
Lack of underlay visibility: The core transport network often remains a “black box,” offering minimal insight into how traffic truly flows. This makes accurate SLA validation and effective troubleshooting an ongoing challenge.
These limitations underscore the need for a solution that can discover and monitor all ECMP paths, deliver expanded probe rates, report accurately across these paths, provide continuous routing monitoring, and unleash powerful insights by correlating measurement and routing data.
The need for scale and per-path visibility becomes even more important in emerging environments such as large-scale AI data centers. AI workloads are highly sensitive to latency variation and congestion and often rely on deterministic path selection across massive ECMP fabrics. In these environments, understanding performance per individual path—not just per aggregate—is key.
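To make the masking effect concrete, here is a minimal Python sketch of how an aggregate average can hide one degraded path. The path count, latency values, and threshold are illustrative assumptions, not measurements from a real network.

```python
from statistics import mean

# Hypothetical one-way latencies (ms) for 16 ECMP paths between two nodes:
# 15 healthy paths at 10 ms and one degraded path at 50 ms.
path_latency_ms = {f"path-{i}": 10.0 for i in range(16)}
path_latency_ms["path-7"] = 50.0

threshold_ms = 20.0  # assumed acceptable latency, for illustration only

# The aggregate view: a single number for all paths.
aggregate = mean(path_latency_ms.values())

# The per-path view: which individual paths exceed the threshold.
bad_paths = sorted(p for p, v in path_latency_ms.items() if v > threshold_ms)

# Share of flows affected, assuming flows hash uniformly across paths.
affected_share = len(bad_paths) / len(path_latency_ms)

print(f"aggregate mean: {aggregate:.1f} ms")
print(f"paths over {threshold_ms} ms: {bad_paths}")
print(f"share of flows affected: {affected_share:.2%}")
```

With these assumed values the aggregate mean stays under the threshold, while the per-path view still flags the one path that exceeds it.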
Measure what matters with Integrated Performance Measurement (IPM)
Cisco, recognizing these evolving demands, has pioneered Integrated Performance Measurement (IPM). This innovative approach embeds performance measurement directly into the network hardware fabric, empowering a new era of scale, richness, and cost-efficiency in network performance monitoring.
IPM directly addresses the deep visibility requirements of large AI data centers by making it possible to measure every path, one by one, at scale. Importantly, IPM can be deployed in existing networks to dramatically improve visibility compared to legacy probing approaches. Paired with Segment Routing over IPv6 (SRv6), IPM becomes even more powerful: SRv6 provides deterministic traffic steering, while IPM provides deterministic, per-path measurement aligned with that intent.
This combination showcases why deterministic networking and per-path measurement are foundational in some of the world’s largest AI data center designs today—and why scale is no longer optional when it comes to performance measurement.
Optimize network performance connecting AI data centers
IPM is changing the game for AI data centers with:
Hardware-driven scale: IPM is built on a foundation of Cisco hardware innovation, which enables 14 million probes per second (14 Mpps) in each direction, transmit and receive. This allows for granular, continuous measurement (one measurement every millisecond) across even the most complex network segments. Imagine monitoring 500 edge nodes with 16 ECMP paths each: at one probe per millisecond per path, that is 500 × 16 × 1,000 = 8 million probes per second, well within the hardware budget.
Accurate one-way measurement: Building on the One-Way Active Measurement Protocol (OWAMP) and Simple Two-Way Active Measurement Protocol (STAMP, RFC 8762 and RFC 8972) standards, IPM performs one-way probing. This eliminates exposure to the return path, so latency and loss measurements reflect the true performance of the path being measured.
Comprehensive ECMP path coverage: IPM helps make sure every ECMP path is measured. By using random flow labels for each probe packet, it reports the experience across all paths, not just a sample, providing a complete view of network behavior.
Rich and actionable metrics: Moving beyond basic averages, IPM delivers:
Latency histograms: A 28-bin histogram digitizes the latency curve, reporting the experience of the entire population and pinpointing issues that averages would hide (for example, one bad path out of 16 affecting 6.25% of clients).
Absolute loss: Using alternate marking (RFC 9341), IPM provides precise, absolute loss figures, eliminating approximations.
Liveness detection: IPM offers continuous and accurate detection of path liveness.
Standards-based and flexible probing: IPM adheres to STAMP standards and offers extensive configuration flexibility, including configurable source/destination addresses, virtual routing and forwarding (VRF) instances, Differentiated Services Code Point (DSCP) values, ECMP modes (spray or dedicated flow label (FL)), explicit session IDs, and smooth integration with SRv6 micro-segment (uSID) policies.
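Two of the figures in the list above reduce to simple arithmetic, sketched here in Python. The probe budget uses the numbers quoted in the text, read as one probe per millisecond on each of 16 paths toward 500 edge nodes. The loss calculation follows the general idea of alternate marking, in which both ends count packets per marking block and compare. The counter values and function names are hypothetical and do not represent the IPM implementation or any Cisco API.

```python
# Figures taken from the text above.
EDGE_NODES = 500
ECMP_PATHS_PER_NODE = 16
PROBES_PER_PATH_PER_SEC = 1_000       # one measurement every millisecond
HW_CAPACITY_PPS = 14_000_000          # 14 million probes per second


def required_probe_rate(nodes, paths_per_node, per_path_pps):
    """Total probes per second needed to probe every path at the given rate."""
    return nodes * paths_per_node * per_path_pps


def per_block_loss(tx_counts, rx_counts):
    """Alternate-marking style loss: for each marking block, packets counted
    at the sender minus packets counted at the receiver."""
    if len(tx_counts) != len(rx_counts):
        raise ValueError("sender and receiver must report the same blocks")
    return [tx - rx for tx, rx in zip(tx_counts, rx_counts)]


required = required_probe_rate(
    EDGE_NODES, ECMP_PATHS_PER_NODE, PROBES_PER_PATH_PER_SEC
)
headroom = HW_CAPACITY_PPS - required
print(f"required: {required:,} pps, headroom: {headroom:,} pps")

# Hypothetical counters for three consecutive marking blocks on one path.
tx_blocks = [1000, 1000, 1000]
rx_blocks = [1000, 997, 1000]
loss = per_block_loss(tx_blocks, rx_blocks)
print(f"loss per block: {loss}, total lost: {sum(loss)}")
```

Because both ends count the same marked block, the difference is an exact packet count for that block rather than an estimate extrapolated from a sample.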
Maximize your results with the full IPM ecosystem: Assurance and routing analytics

Figure 1: Measure transport service performance across all ECMP paths for any given network path for comprehensive visibility
IPM is not a standalone feature; it’s a foundational element within a powerful ecosystem designed for holistic network assurance and automation:
Cisco Provider Connectivity Assurance (PCA): This serves as the robust data collection infrastructure, handling measurement, path analytics, and maintaining a comprehensive network status history within a time series database. PCA sensors and smart Small Form-Factor Pluggables (SFPs) are integral to IPM probing.
Cisco Crosswork Network Controller (CNC) with Routing Analytics: CNC integrates IPM-based insights with real-time routing data. Routing Analytics, a critical component of CNC Essentials, takes network visibility to the next level by providing real-time insights into the underlying routing infrastructure. It’s not enough to know what the performance is; you also need to know why and what’s expected.
Routing Analytics defines the baseline for performance measurements. It answers the fundamental question: “Is the measured latency good or bad?” by reporting the expected end-to-end propagation delay for each ECMP path. For example, if the measured delay is 13 ms and the current routing delay shows a +1 ms deviation from the baseline, network teams can quickly tell how much of that measurement is attributable to routing.
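The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Routing Analytics or Crosswork API: the `PathDelay` fields, the `classify` function, the 0.5 ms tolerance, and the 12 ms baseline are all hypothetical, chosen so that the first case matches the 13 ms, +1 ms example in the text.

```python
from dataclasses import dataclass


@dataclass
class PathDelay:
    path_id: str
    baseline_ms: float   # expected propagation delay on the usual route
    routing_ms: float    # expected delay given the current routing state
    measured_ms: float   # one-way delay reported by probes


def classify(p: PathDelay, tolerance_ms: float = 0.5) -> str:
    """Label a measurement by comparing it with routing expectations."""
    routing_shift = p.routing_ms - p.baseline_ms
    unexplained = p.measured_ms - p.routing_ms
    if abs(unexplained) > tolerance_ms:
        return "unexplained deviation"
    if abs(routing_shift) > tolerance_ms:
        return "explained by routing change"
    return "as expected"


paths = [
    PathDelay("ecmp-1", baseline_ms=12.0, routing_ms=12.0, measured_ms=12.2),
    PathDelay("ecmp-3", baseline_ms=12.0, routing_ms=13.0, measured_ms=13.0),
    PathDelay("ecmp-4", baseline_ms=12.0, routing_ms=12.0, measured_ms=15.0),
]
results = {p.path_id: classify(p) for p in paths}
for path_id, verdict in results.items():
    print(f"{path_id}: {verdict}")
```

Separating the routing-induced shift from the remaining gap is the point of the design: a measurement that matches the current routing expectation points to a path change, while one that does not points to something else on the path, such as congestion.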
The rich path information provided by Routing Analytics is invaluable for a breadth of use cases, including:
Service troubleshooting: Quickly pinpoint routing issues impacting service performance.
Traffic engineering policy design: Inform the design and optimization of traffic engineering policies by understanding path characteristics and delays.
Network optimization: Utilize path data to optimize routing decisions for latency-sensitive applications.
By providing a clear, real-time understanding of the routing underlay and its expected performance characteristics, Routing Analytics empowers operators to interpret IPM measurements with precision, allowing for proactive management and more effective troubleshooting.
Prepare for what’s ahead with network innovation
Cisco’s commitment to embedding performance measurement directly into hardware and network fabric, combined with powerful routing analytics and assurance, signifies a major leap forward in network operations. This integrated approach empowers network operators with deep visibility and control, helping ensure that as network demands continue to escalate, especially with the explosion of AI workloads, they have the tools to optimize performance and deliver superior user experiences.
Source:
blogs.cisco.com



