Microburst Detection

Overview

This feature requires deploying the WhiteOwl probe.

Microbursts are brief, intense spikes in network traffic that can cause packet loss, increased latency, and buffer overflows—even when average utilization appears normal. Traditional flow monitoring with 1-minute or 5-minute aggregation windows completely misses these sub-second events.

WhiteOwl adds microburst detection by analyzing traffic patterns at 10ms granularity within each flow, identifying peak throughput windows that may indicate problematic burst behavior.

How It Works

Probe-Side Collection

The WhiteOwl probe tracks traffic intensity within consecutive 10ms time windows for each flow:

Flow Duration: 5 seconds
Traditional View: 500 KB total, 100 KB/s average ✓ Looks fine

Microburst View:
  Window 1 (0-10ms):    2 KB
  Window 2 (10-20ms):   3 KB
  Window 3 (20-30ms):   1 KB
  ...
  Window 47 (460-470ms): 150 KB  ← BURST! 15 MB/s instantaneous
  ...
  Window 500 (4990-5000ms): 1 KB
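
The 15 MB/s figure in the example comes from scaling the byte count of one 10ms window up to a full second. A minimal sketch of that arithmetic, assuming decimal units (1 KB = 1,000 bytes); the function name is illustrative and not part of WhiteOwl:

```python
WINDOW_MS = 10  # window length used throughout this document


def window_bytes_to_rate(bytes_in_window: int, window_ms: int = WINDOW_MS) -> float:
    """Return the equivalent throughput in bytes per second for one window."""
    return bytes_in_window * 1000 / window_ms


# Window 47 from the example: 150 KB observed within 10 ms.
burst_rate = window_bytes_to_rate(150_000)
# Flow average from the example: 500 KB over 5 seconds.
average_rate = 500_000 / 5
print(f"burst: {burst_rate / 1e6:.0f} MB/s, average: {average_rate / 1e3:.0f} KB/s")
```

The same flow therefore looks like 100 KB/s on average and 15 MB/s at its peak, which is the gap that per-window tracking is meant to expose.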

The probe records:

max_bytes_per_window — The highest byte count seen in any single 10ms window
max_packets_per_window — The highest packet count seen in any single 10ms window
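
The peak tracking described here could be sketched roughly as follows. This is not the probe's actual implementation: the class name, the nanosecond timestamps, and the use of fixed, non-overlapping windows are assumptions made for illustration. Only the two field names come from this document.

```python
from dataclasses import dataclass

WINDOW_NS = 10_000_000  # 10 ms expressed in nanoseconds


@dataclass
class FlowBurstTracker:
    """Per-flow peak tracking over consecutive 10 ms windows (illustrative only)."""

    max_bytes_per_window: int = 0
    max_packets_per_window: int = 0
    _window_index: int = -1
    _bytes: int = 0
    _packets: int = 0

    def observe(self, timestamp_ns: int, packet_len: int) -> None:
        """Account for one packet; timestamps must be non-decreasing."""
        index = timestamp_ns // WINDOW_NS
        if index != self._window_index:
            self._close_window()
            self._window_index = index
        self._bytes += packet_len
        self._packets += 1

    def _close_window(self) -> None:
        """Fold the finished window into the running peaks and reset counters."""
        self.max_bytes_per_window = max(self.max_bytes_per_window, self._bytes)
        self.max_packets_per_window = max(self.max_packets_per_window, self._packets)
        self._bytes = 0
        self._packets = 0

    def export(self) -> tuple[int, int]:
        """Return (max_bytes, max_packets), including the still-open window."""
        self._close_window()
        return self.max_bytes_per_window, self.max_packets_per_window


tracker = FlowBurstTracker()
# Three packets in the first window, then five packets in the window covering 460-470 ms.
packets = [(1, 500), (4, 1500), (9, 200)] + [(461 + i, 1500) for i in range(5)]
for t_ms, size in packets:
    tracker.observe(t_ms * 1_000_000, size)
print(tracker.export())  # (7500, 5)
```

Only two counters and two running maxima are kept per flow, so memory stays constant no matter how many windows a flow spans.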

Why 10ms Windows?

Switch buffer timescales — Most switch buffers fill/drain in 1-50ms
TCP behavior — RTT-scale bursts affect congestion control
Practical detection — Catches bursts that cause real problems without excessive overhead
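
The buffer timescale argument can be checked with simple arithmetic: the time to drain a full buffer is its size divided by the egress link rate. A rough sketch; the buffer sizes and link speeds below are hypothetical values chosen for illustration, not measurements from any particular switch:

```python
def drain_time_ms(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time in milliseconds to drain a full buffer at the given egress rate."""
    return buffer_bytes * 8 / link_bits_per_second * 1000


# Hypothetical buffer allocations paired with common link speeds.
for buffer_bytes, link_bps in [(1_000_000, 1e9), (12_000_000, 10e9), (1_000_000, 10e9)]:
    print(
        f"{buffer_bytes / 1e6:.0f} MB at {link_bps / 1e9:.0f} Gbps: "
        f"{drain_time_ms(buffer_bytes, link_bps):.1f} ms"
    )
```

With these inputs the drain times fall between roughly 1 and 10 ms, the same order of magnitude as the 10ms measurement window.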

How Microbursts Are Reported in Flow Records

The WhiteOwl probe continuously monitors traffic intensity by dividing each flow into 10ms measurement windows. For every active flow, it counts the bytes and packets in each window and compares them against the current peak. When the flow record is exported (typically every 30 seconds), only the peak values are included in the IPFIX record, as max_bytes_per_window and max_packets_per_window.

For example, a 30-second export interval contains 3,000 individual 10ms windows. The probe evaluates all of them but reports only the worst-case peak. This design is intentional: microburst detection is about identifying the moment of greatest stress on switch buffers and link capacity, not the average.

The burst ratio, calculated in ClickHouse by comparing the peak window against the flow's average throughput, measures how "spiky" a flow is relative to its sustained rate.
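
The burst ratio described above can be sketched as the peak window divided by the average bytes per 10ms window over the flow's duration. The exact ClickHouse expression is not shown in this document, so the formula, the function name, and the parameters below are assumptions for illustration:

```python
WINDOW_MS = 10


def burst_ratio(max_bytes_per_window: int, total_bytes: int, duration_ms: int) -> float:
    """Peak 10 ms window relative to the flow's average 10 ms window."""
    if total_bytes <= 0 or duration_ms <= 0:
        return 0.0
    # A flow shorter than one window still occupies at least one window.
    windows = max(duration_ms / WINDOW_MS, 1.0)
    average_bytes_per_window = total_bytes / windows
    return max_bytes_per_window / average_bytes_per_window


# The 5-second example flow: 500 KB total, 150 KB in its busiest window.
print(burst_ratio(150_000, 500_000, 5_000))  # 150.0
# A perfectly paced flow has a ratio of 1.
print(burst_ratio(1_000, 500_000, 5_000))    # 1.0
```

Under this definition the ratio can never drop below 1 for a valid record, because the peak window always carries at least the average.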

Dashboard Usage

Creating a Microburst Widget

Add Widget → Select visualization type (Bar, Time Series, Table)
Data Source → probe_metrics
Metric → Choose one of:
- Max Burst (bytes) — for worst-case analysis
- Max Burst (packets) — for packet-based analysis
- Avg Burst (bytes) — for trend analysis
- Burst Ratio — for relative burstiness
Group By → Recommended dimensions:
- src_addr / dst_addr — Find bursty hosts
- dst_port / appid — Find bursty applications
- src_as / dst_as — Find bursty networks

Example Widget Configurations

Top Bursty Sources (Bar Chart)

Type: Bar Chart
Metric: Max Burst (bytes)
Group By: src_addr
Shows which source IPs generate the largest bursts

Burst Trends Over Time (Time Series)

Type: Time Series
Metric: Max Burst (bytes)
Group By: dst_port
Shows how burst patterns change over time by service

Burst Analysis Table

Type: Table
Metric: Max Burst (bytes)
Group By: src_addr, dst_addr, dst_port
Shows detailed burst data per conversation

Interpreting Results

What's a "Bad" Burst?

Context matters, but general guidelines:

Max Burst (bytes per 10ms window)	Interpretation
Less than 10 KB	Normal, small transfers
10-100 KB	Moderate bursts, usually fine
100 KB - 1 MB	Significant bursts, check if causing issues
Greater than 1 MB	Large bursts, likely causing buffer pressure
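
For scripted triage, the bands in the table above could be applied as follows. This is a sketch under two assumptions that the table does not state: decimal units (1 KB = 1,000 bytes) and boundary values of 10 KB and 100 KB falling into the higher band. The function name is illustrative.

```python
def classify_max_burst(max_bytes_per_window: int) -> str:
    """Map a peak 10 ms byte count to the guideline bands in the table."""
    if max_bytes_per_window < 10_000:
        return "Normal, small transfers"
    if max_bytes_per_window < 100_000:
        return "Moderate bursts, usually fine"
    if max_bytes_per_window <= 1_000_000:
        return "Significant bursts, check if causing issues"
    return "Large bursts, likely causing buffer pressure"


# The 150 KB window from the earlier example.
print(classify_max_burst(150_000))
```

Because context matters, the returned label is best treated as a starting point for investigation rather than a verdict.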

Burst Ratio Guidelines

Ratio	Interpretation
1-2	Smooth, well-paced traffic
2-5	Mildly bursty, typical for web traffic
5-20	Bursty, may cause issues on congested links
20+	Very bursty, investigate application behavior
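
The ratio bands can be encoded the same way. In this sketch each boundary value (2, 5, 20) is assigned to the higher band, which is an assumption because the ranges in the table share their endpoints. The function name is illustrative.

```python
def classify_burst_ratio(ratio: float) -> str:
    """Map a burst ratio to the guideline bands in the table."""
    if ratio < 2:
        return "Smooth, well-paced traffic"
    if ratio < 5:
        return "Mildly bursty, typical for web traffic"
    if ratio < 20:
        return "Bursty, may cause issues on congested links"
    return "Very bursty, investigate application behavior"


for ratio in (1.3, 4.0, 12.0, 150.0):
    print(ratio, "->", classify_burst_ratio(ratio))
```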

Common Causes of Microbursts

TCP Slow Start — New connections ramp up aggressively
Backup Jobs — Large file transfers with no rate limiting
Incast — Multiple servers responding simultaneously (common in distributed storage)
Video Streaming — Chunk-based delivery creates periodic bursts
Application Bugs — Poorly implemented sending loops

Summary

The microburst feature provides visibility into sub-second traffic patterns that traditional flow monitoring misses. By tracking peak throughput within 10ms windows, WhiteOwl can now identify:

Which hosts/applications generate problematic bursts
When bursts occur (time series analysis)
How "bursty" traffic is relative to its average (burst ratio)

This enables proactive identification of traffic patterns that may cause packet loss and latency issues before they impact users.
