Microburst Detection
Overview
This feature requires deploying the WhiteOwl probe.
Microbursts are brief, intense spikes in network traffic that can cause packet loss, increased latency, and buffer overflows—even when average utilization appears normal. Traditional flow monitoring with 1-minute or 5-minute aggregation windows completely misses these sub-second events.
WhiteOwl adds microburst detection by analyzing traffic patterns at 10ms granularity within each flow, identifying peak throughput windows that may indicate problematic burst behavior.
How It Works
Probe-Side Collection
The WhiteOwl probe tracks traffic intensity in consecutive 10ms windows for each flow. For example:
Flow Duration: 5 seconds
Traditional View: 500 KB total, 100 KB/s average ✓ Looks fine
Microburst View:
Window 1 (0-10ms): 2 KB
Window 2 (10-20ms): 3 KB
Window 3 (20-30ms): 1 KB
...
Window 47 (460-470ms): 150 KB ← BURST! 15 MB/s instantaneous
...
Window 500: 1 KB
The probe records:
- max_bytes_per_window — The highest byte count seen in any single 10ms window
- max_packets_per_window — The highest packet count seen in any single 10ms window
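The bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not the probe's actual implementation: the `FlowBurstTracker` class, its field names other than the two recorded metrics, and the assumption that packets arrive in timestamp order are all hypothetical.

```python
from dataclasses import dataclass

WINDOW_MS = 10  # window length taken from the text above


@dataclass
class FlowBurstTracker:
    """Hypothetical per-flow state; not the probe's real data structure."""
    current_window: int = -1       # index of the window currently being filled
    window_bytes: int = 0
    window_packets: int = 0
    max_bytes_per_window: int = 0
    max_packets_per_window: int = 0

    def _close_window(self) -> None:
        # Fold the current window's counters into the running peaks.
        self.max_bytes_per_window = max(self.max_bytes_per_window, self.window_bytes)
        self.max_packets_per_window = max(self.max_packets_per_window, self.window_packets)

    def observe(self, timestamp_ms: int, packet_bytes: int) -> None:
        """Account for one packet; timestamp_ms is relative to flow start."""
        window = timestamp_ms // WINDOW_MS
        if window != self.current_window:
            # A new window has started: finish the old one and reset counters.
            self._close_window()
            self.current_window = window
            self.window_bytes = 0
            self.window_packets = 0
        self.window_bytes += packet_bytes
        self.window_packets += 1

    def export(self) -> dict:
        """Return the peaks, including the still-open window."""
        self._close_window()
        return {
            "max_bytes_per_window": self.max_bytes_per_window,
            "max_packets_per_window": self.max_packets_per_window,
        }


tracker = FlowBurstTracker()
for ts, size in [(2, 1000), (7, 1000), (463, 1500), (465, 1500), (468, 1500), (4995, 1000)]:
    tracker.observe(ts, size)
print(tracker.export())
```

With this sample input, the three packets in the 460-470ms window form the peak, so the export reports 4,500 bytes and 3 packets. Note that in this sketch the byte peak and the packet peak are tracked independently and may come from different windows.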
Why 10ms Windows?
- Switch buffer timescales — Most switch buffers fill/drain in 1-50ms
- TCP behavior — RTT-scale bursts affect congestion control
- Practical detection — Catches bursts that cause real problems without excessive overhead
How Microbursts Are Reported in Flow Records
The WhiteOwl probe continuously monitors traffic intensity by dividing each flow into 10ms
measurement windows. For every active flow, the probe tracks the byte and packet count within
each window, comparing it against the current peak. When the flow record is
exported (typically every 30 seconds), only the single highest 10ms window is included in the
IPFIX record as max_bytes_per_window and max_packets_per_window. For example, a 30-second export
interval contains roughly 3,000 individual 10ms windows — the probe evaluates all of them but only
reports the worst-case peak. This design is intentional: microburst detection is about identifying
the moment of greatest stress on switch buffers and link capacity, not the average. The burst ratio,
calculated in ClickHouse by comparing the peak window against the flow's average throughput,
provides a measure of how "spiky" a given flow is relative to its sustained rate.
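The arithmetic behind the window count and the burst ratio can be worked through in Python. The exact ClickHouse expression is not shown in this document, so the `burst_ratio` function below is one plausible reading (peak-window rate divided by average rate), and its name and parameters are assumptions.

```python
WINDOW_MS = 10            # 10ms window, per the text above
EXPORT_INTERVAL_MS = 30_000  # typical 30-second export interval


def burst_ratio(max_bytes_per_window: float, total_bytes: float, duration_seconds: float) -> float:
    """Peak-window throughput divided by the flow's average throughput."""
    if total_bytes <= 0 or duration_seconds <= 0:
        return 0.0  # assumption: treat empty or zero-length flows as not bursty
    peak_rate = max_bytes_per_window * 1000 / WINDOW_MS  # bytes/s in the worst window
    average_rate = total_bytes / duration_seconds        # bytes/s over the whole flow
    return peak_rate / average_rate


# Number of 10ms windows evaluated per export interval.
windows_per_export = EXPORT_INTERVAL_MS // WINDOW_MS
print(windows_per_export)

# The 5-second flow from the earlier example: 500 KB total, 150 KB peak window.
print(burst_ratio(150_000, 500_000, 5.0))
```

For the example flow, the peak window runs at 15 MB/s against a 100 KB/s average, giving a ratio of 150.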
Dashboard Usage
Creating a Microburst Widget
- Add Widget → Select visualization type (Bar, Time Series, Table)
- Data Source → probe_metrics
- Metric → Choose one of:
  - Max Burst (bytes): for worst-case analysis
  - Max Burst (packets): for packet-based analysis
  - Avg Burst (bytes): for trend analysis
  - Burst Ratio: for relative burstiness
- Group By → Recommended dimensions:
  - src_addr/dst_addr: Find bursty hosts
  - dst_port/appid: Find bursty applications
  - src_as/dst_as: Find bursty networks
Example Widget Configurations
Top Bursty Sources (Bar Chart)
- Type: Bar Chart
- Metric: Max Burst (bytes)
- Group By: src_addr
- Shows which source IPs generate the largest bursts
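Conceptually, this widget groups flow records by src_addr and keeps the largest peak per group. The Python sketch below shows that aggregation on in-memory records. It assumes that Max Burst (bytes) corresponds to the maximum of max_bytes_per_window within each group, which this document does not state explicitly. The `top_bursty_sources` helper and the sample records are hypothetical.

```python
from collections import defaultdict


def top_bursty_sources(records, limit=10):
    """Return (src_addr, peak_bytes) pairs, largest peak first."""
    peaks = defaultdict(int)
    for rec in records:
        src = rec["src_addr"]
        # Keep the largest 10ms-window byte count seen for this source.
        peaks[src] = max(peaks[src], rec["max_bytes_per_window"])
    return sorted(peaks.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up flow records using documentation-range addresses.
sample_records = [
    {"src_addr": "192.0.2.10", "max_bytes_per_window": 150_000},
    {"src_addr": "192.0.2.20", "max_bytes_per_window": 8_000},
    {"src_addr": "192.0.2.10", "max_bytes_per_window": 40_000},
    {"src_addr": "192.0.2.30", "max_bytes_per_window": 1_200_000},
]

for src, peak in top_bursty_sources(sample_records, limit=2):
    print(src, peak)
```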
Burst Trends Over Time (Time Series)
- Type: Time Series
- Metric: Max Burst (bytes)
- Group By: dst_port
- Shows how burst patterns change over time by service
Burst Analysis Table
- Type: Table
- Metric: Max Burst (bytes)
- Group By: src_addr, dst_addr, dst_port
- Shows detailed burst data per conversation
Interpreting Results
What's a "Bad" Burst?
Context matters, but general guidelines:
| Max Burst | Interpretation |
|---|---|
| Less than 10 KB | Normal, small transfers |
| 10-100 KB | Moderate bursts, usually fine |
| 100 KB - 1 MB | Significant bursts, check if causing issues |
| Greater than 1 MB | Large bursts, likely causing buffer pressure |
Burst Ratio Guidelines
| Ratio | Interpretation |
|---|---|
| 1-2 | Smooth, well-paced traffic |
| 2-5 | Mildly bursty, typical for web traffic |
| 5-20 | Bursty, may cause issues on congested links |
| 20+ | Very bursty, investigate application behavior |
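The two guideline tables above can be applied programmatically when triaging many flows. The following Python sketch encodes them as threshold lookups. It assumes decimal units (1 KB = 1,000 bytes) and assigns a value that sits exactly on a boundary to the higher band. The tables leave both points open, and the `classify` helper is hypothetical.

```python
from bisect import bisect_right

# Assumption: decimal units (1 KB = 1,000 bytes); the tables do not specify.
MAX_BURST_BOUNDS = [10_000, 100_000, 1_000_000]
MAX_BURST_LABELS = [
    "Normal, small transfers",
    "Moderate bursts, usually fine",
    "Significant bursts, check if causing issues",
    "Large bursts, likely causing buffer pressure",
]

RATIO_BOUNDS = [2, 5, 20]
RATIO_LABELS = [
    "Smooth, well-paced traffic",
    "Mildly bursty, typical for web traffic",
    "Bursty, may cause issues on congested links",
    "Very bursty, investigate application behavior",
]


def classify(value, bounds, labels):
    """Map a value to its band; boundary values go to the higher band."""
    return labels[bisect_right(bounds, value)]


# The example flow from earlier: 150 KB peak window, burst ratio of 150.
print(classify(150_000, MAX_BURST_BOUNDS, MAX_BURST_LABELS))
print(classify(150, RATIO_BOUNDS, RATIO_LABELS))
```

As the guidelines note, context matters: these bands are a starting point for triage, not fixed alert thresholds.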
Common Causes of Microbursts
- TCP Slow Start — New connections ramp up aggressively
- Backup Jobs — Large file transfers with no rate limiting
- Incast — Multiple servers responding simultaneously (common in distributed storage)
- Video Streaming — Chunk-based delivery creates periodic bursts
- Application Bugs — Poorly implemented sending loops
Summary
The microburst feature provides visibility into sub-second traffic patterns that traditional flow monitoring misses. By tracking peak throughput within 10ms windows, WhiteOwl can now identify:
- Which hosts/applications generate problematic bursts
- When bursts occur (time series analysis)
- How "bursty" traffic is relative to its average (burst ratio)
This enables proactive identification of traffic patterns that may cause packet loss and latency issues before they impact users.