Benchmarks

Performance measurement suite covering scheduler, memory, IRQ, IPC, and stress benchmarks, with statistical analysis.

Suite Architecture

The benchmark suite (helix-benchmarks, ~500 lines) provides a framework for measuring kernel performance across all subsystems.

Dependencies

[dependencies]
helix-hal = { path = "../hal" }
helix-core = { path = "../core" }
helix-execution = { path = "../subsystems/execution" }
helix-memory = { path = "../subsystems/memory" }
helix-dis = { path = "../subsystems/dis" }
helix-modules = { path = "../modules" }

Benchmark Runner

benchmarks/src/lib.rs (Rust)

pub struct BenchmarkDef {
    pub id: BenchmarkId,
    pub name: String,
    pub description: String,
    pub category: BenchmarkCategory,
    pub setup: Option<fn() -> bool>,
    pub run: fn() -> u64, // Returns cycles
    pub teardown: Option<fn()>,
    pub baseline_cycles: Option<u64>,
}

pub struct Statistics {
    pub min: u64,
    pub max: u64,
    pub mean: u64,
    pub p50: u64,
    pub p95: u64,
    pub p99: u64,
    pub std_dev: u64,
    pub jitter: u64,
}

pub enum BenchmarkCategory {
    Scheduler, Memory, Irq, Ipc, Stress, Custom,
}
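
To make the relationship between `BenchmarkDef` and `Statistics` concrete, here is a minimal host-side sketch of a runner loop. It is not the suite's actual implementation: the trimmed-down `BenchmarkDef`, the `run_benchmark` and `percentile` helpers, the nearest-rank percentile rule, and the definition of `jitter` as `max - min` are all assumptions made for illustration, and the sketch uses `std` rather than a `no_std` kernel environment.

```rust
/// Trimmed-down mirror of the fields a runner needs (assumption: the real
/// struct carries more fields, as shown in the listing above).
pub struct BenchmarkDef {
    pub setup: Option<fn() -> bool>,
    pub run: fn() -> u64, // returns elapsed cycles for one iteration
    pub teardown: Option<fn()>,
}

#[derive(Debug, PartialEq)]
pub struct Statistics {
    pub min: u64,
    pub max: u64,
    pub mean: u64,
    pub p50: u64,
    pub p95: u64,
    pub p99: u64,
    pub std_dev: u64,
    pub jitter: u64,
}

/// Nearest-rank percentile over an ascending, non-empty slice.
pub fn percentile(sorted: &[u64], pct: usize) -> u64 {
    let rank = (pct * sorted.len() + 99) / 100; // ceil(pct * n / 100)
    sorted[rank.max(1) - 1]
}

/// Hypothetical runner: setup, warmup, measured iterations, teardown, summary.
/// Returns None if there is nothing to measure or setup reports failure.
pub fn run_benchmark(def: &BenchmarkDef, warmup: usize, iters: usize) -> Option<Statistics> {
    if iters == 0 {
        return None;
    }
    if let Some(setup) = def.setup {
        if !setup() {
            return None;
        }
    }
    // Warmup results are discarded; they only prime caches and branch predictors.
    for _ in 0..warmup {
        let _ = (def.run)();
    }
    let mut samples: Vec<u64> = (0..iters).map(|_| (def.run)()).collect();
    if let Some(teardown) = def.teardown {
        teardown();
    }

    samples.sort_unstable();
    let n = samples.len() as u128;
    let sum: u128 = samples.iter().map(|&c| c as u128).sum();
    let mean = (sum / n) as u64; // truncated integer mean
    // Population variance around the integer mean.
    let variance = samples
        .iter()
        .map(|&c| {
            let d = c.abs_diff(mean) as u128;
            d * d
        })
        .sum::<u128>()
        / n;
    let (min, max) = (samples[0], samples[samples.len() - 1]);

    Some(Statistics {
        min,
        max,
        mean,
        p50: percentile(&samples, 50),
        p95: percentile(&samples, 95),
        p99: percentile(&samples, 99),
        std_dev: (variance as f64).sqrt() as u64,
        jitter: max - min, // assumption: jitter is the observed range
    })
}
```

Because `setup`, `run`, and `teardown` are plain function pointers, a non-capturing closure such as `|| 100` can stand in for a real workload when exercising the runner on the host.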

Feature Flags

[features]
verbose = []        # Print per-iteration timing
extended = []       # Run additional stress benchmarks
stress = []         # High-iteration stress tests

Configuration

Benchmark Parameters

| Parameter | Default | Extended | Stress |
|---|---|---|---|
| Warmup iterations | 100 | 1,000 | 10,000 |
| Measured iterations | 1,000 | 10,000 | 100,000 |
| Statistical samples | 10 | 100 | 1,000 |
| Outlier detection | MAD | MAD | MAD |
| Output format | Summary | Detailed | Full |
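
One way the feature flags could select these parameters is sketched below. The `Profile` and `BenchParams` types and both functions are hypothetical names, not part of the documented API, and giving `stress` precedence over `extended` when both features are enabled is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Profile {
    Default,
    Extended,
    Stress,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BenchParams {
    pub warmup_iters: u32,
    pub measured_iters: u32,
    pub samples: u32,
}

/// Iteration counts taken from the Benchmark Parameters table.
pub const fn params_for(profile: Profile) -> BenchParams {
    match profile {
        Profile::Default => BenchParams { warmup_iters: 100, measured_iters: 1_000, samples: 10 },
        Profile::Extended => BenchParams { warmup_iters: 1_000, measured_iters: 10_000, samples: 100 },
        Profile::Stress => BenchParams { warmup_iters: 10_000, measured_iters: 100_000, samples: 1_000 },
    }
}

/// Picks a profile from the enabled Cargo features.
/// Assumption: `stress` wins if both `stress` and `extended` are enabled.
pub fn active_profile() -> Profile {
    if cfg!(feature = "stress") {
        Profile::Stress
    } else if cfg!(feature = "extended") {
        Profile::Extended
    } else {
        Profile::Default
    }
}
```

Resolving the profile at compile time with `cfg!` keeps the choice out of the measured path, at the cost of a rebuild when switching profiles.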

Running Benchmarks

# Via kernel shell
> bench

# Via make (runs in QEMU)
make bench

# Specific benchmark
> bench context_switch
> bench memory_alloc
> bench syscall_latency

Scheduler Benchmarks

Context Switch Latency

Measures the time to switch between two threads:

Figure: Context Switch Measurement (3 nodes, 2 edges). Thread A (running thread) yields to the Scheduler (context switch decis…), which switches to Thread B (resumed thread); the switch edge is the timed step.

| Metric | Target | Description |
|---|---|---|
| Mean | < 1 us | Average context switch time |
| P99 | < 5 us | 99th percentile (tail latency) |
| Min | ~500 ns | Best case (hot cache) |
| Max | < 20 us | Worst case (cold cache, TLB flush) |
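
Benchmark functions return cycle counts, while the targets above are stated in time, so a conversion is needed before comparing the two. The sketch below assumes a constant, known counter frequency; `cycles_to_ns`, `meets_target`, and the `tsc_hz` parameter are hypothetical names used only for illustration.

```rust
/// Converts a cycle count to nanoseconds for a counter ticking at `tsc_hz`.
/// Returns None for a zero frequency. The intermediate product is widened
/// to u128 so large cycle counts cannot overflow.
pub fn cycles_to_ns(cycles: u64, tsc_hz: u64) -> Option<u64> {
    if tsc_hz == 0 {
        return None;
    }
    Some(((cycles as u128 * 1_000_000_000) / tsc_hz as u128) as u64)
}

/// True when a measurement is strictly below its "< target" bound.
pub fn meets_target(measured_ns: u64, target_ns: u64) -> bool {
    measured_ns < target_ns
}
```

For example, on a 3 GHz counter a context switch of 2,670 cycles converts to 890 ns, which is under the 1 us mean target.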

Thread Creation

Measures time to create a new thread (allocate stack, initialize context, add to scheduler):

| Metric | Target |
|---|---|
| Mean | < 10 us |
| P99 | < 50 us |

Scheduler Throughput

Measures scheduling decisions per second with varying thread counts:

| Threads | Target Decisions/sec |
|---|---|
| 10 | > 1,000,000 |
| 100 | > 500,000 |
| 1,000 | > 100,000 |
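
A throughput target can also be read as a per-decision time budget: more than 1,000,000 decisions/sec leaves under 1,000 ns per decision. The arithmetic is sketched below; both function names are hypothetical.

```rust
/// Decisions per second given a decision count and the elapsed time in ns.
pub fn decisions_per_sec(decisions: u64, elapsed_ns: u64) -> Option<u64> {
    if elapsed_ns == 0 {
        return None;
    }
    Some(((decisions as u128 * 1_000_000_000) / elapsed_ns as u128) as u64)
}

/// Average time budget per decision (in ns) implied by a throughput target.
pub fn ns_budget_per_decision(target_per_sec: u64) -> Option<u64> {
    if target_per_sec == 0 {
        return None;
    }
    Some(1_000_000_000 / target_per_sec)
}
```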

DIS Intent Scheduling

Measures overhead of intent-based scheduling vs. simple priority scheduling:

| Operation | Target |
|---|---|
| Intent classification | < 100 ns |
| Policy evaluation | < 500 ns |
| Queue selection | < 50 ns |
| Full DIS dispatch | < 2 us |

Memory Benchmarks

Page Allocation

| Allocator | Operation | Target |
|---|---|---|
| Bump | Single page | < 50 ns |
| Bitmap | Single page | < 200 ns |
| Bitmap | Contiguous 16 pages | < 1 us |
| Buddy | Single page | < 100 ns |
| Buddy | 1 MB (256 pages) | < 500 ns |
| Buddy | 8 MB (2048 pages) | < 1 us |

Slab Allocation

| Size Class | Alloc Target | Free Target |
|---|---|---|
| 16 bytes | < 30 ns | < 20 ns |
| 64 bytes | < 30 ns | < 20 ns |
| 256 bytes | < 40 ns | < 25 ns |
| 1024 bytes | < 50 ns | < 30 ns |
| 2048 bytes | < 60 ns | < 35 ns |

Virtual Memory

| Operation | Target |
|---|---|
| Page map | < 200 ns |
| Page unmap | < 150 ns |
| TLB flush (single) | < 100 ns |
| TLB flush (full) | < 500 ns |
| mmap_anonymous (4 KB) | < 1 us |
| mmap_anonymous (1 MB) | < 10 us |

Syscall Benchmarks

| Syscall | Target Latency |
|---|---|
| getpid (trivial) | < 100 ns |
| read (0 bytes) | < 300 ns |
| write (serial, 1 byte) | < 500 ns |
| mmap (anonymous, 4 KB) | < 2 us |
| fork | < 50 us |

IPC Benchmarks

| Operation | Target |
|---|---|
| Channel send (64B) | < 100 ns |
| Channel recv (64B) | < 100 ns |
| OneShot round-trip | < 300 ns |
| EventBus publish (1 subscriber) | < 200 ns |
| EventBus publish (10 subscribers) | < 1 us |
| MessageRouter send | < 200 ns |

Throughput

| Scenario | Target |
|---|---|
| Channel: 1 producer, 1 consumer | > 5M msg/sec |
| Channel: 4 producers, 1 consumer | > 2M msg/sec |
| EventBus: 1 topic, 10 subscribers | > 1M msg/sec |

Statistical Analysis

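The benchmark framework summarizes each run with robust statistics: MAD-based outlier detection, confidence intervals and percentiles, and a significance test when two runs are compared.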

Outlier Detection

The Median Absolute Deviation (MAD) method identifies outliers:

  • Compute the median of all measurements
  • Compute the MAD = median(|xi - median|)
  • Flag values where |xi - median| > 3 * MAD as outliers
  • Report with and without outliers
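
The steps above can be sketched as follows. This is an illustrative implementation rather than the framework's own code: `MadReport`, `median`, and `mad_outliers` are hypothetical names, and integer medians of even-length inputs are rounded down.

```rust
/// Result of MAD-based outlier detection.
#[derive(Debug, PartialEq)]
pub struct MadReport {
    pub median: u64,
    pub mad: u64,
    /// `is_outlier[i]` is true when `samples[i]` lies beyond 3 * MAD.
    pub is_outlier: Vec<bool>,
}

/// Median of an ascending, non-empty slice. For an even length, returns the
/// midpoint of the two central values, rounded down.
fn median(sorted: &[u64]) -> u64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        let (lo, hi) = (sorted[n / 2 - 1], sorted[n / 2]);
        lo + (hi - lo) / 2 // overflow-safe midpoint
    }
}

pub fn mad_outliers(samples: &[u64]) -> Option<MadReport> {
    if samples.is_empty() {
        return None;
    }
    // Step 1: median of all measurements.
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let med = median(&sorted);

    // Step 2: MAD = median(|x_i - median|).
    let deviations: Vec<u64> = samples.iter().map(|&x| x.abs_diff(med)).collect();
    let mut sorted_devs = deviations.clone();
    sorted_devs.sort_unstable();
    let mad = median(&sorted_devs);

    // Step 3: flag values whose deviation exceeds 3 * MAD.
    let threshold = mad.saturating_mul(3);
    let is_outlier = deviations.iter().map(|&d| d > threshold).collect();

    Some(MadReport { median: med, mad, is_outlier })
}
```

Note that when more than half of the samples are identical the MAD is 0, so every sample that differs from the median is flagged; a production implementation would likely want a floor on the threshold.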

Confidence Intervals

For each metric, the framework reports:

  • Mean with 95% confidence interval
  • Median (robust to outliers)
  • Standard deviation
  • Min / Max (absolute bounds)
  • P50, P90, P95, P99 (percentile distribution)
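
The document does not state how the 95% confidence interval is computed. A common choice, sketched here as an assumption, is the normal approximation: mean ± 1.96 · s / √n, where s is the sample standard deviation. `mean_ci95` is a hypothetical helper name.

```rust
/// Returns (mean, lower bound, upper bound) of a 95% confidence interval
/// using the normal approximation. Needs at least two samples so the sample
/// standard deviation (n - 1 denominator) is defined.
pub fn mean_ci95(samples: &[u64]) -> Option<(f64, f64, f64)> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let nf = n as f64;
    let mean = samples.iter().map(|&x| x as f64).sum::<f64>() / nf;
    let variance = samples
        .iter()
        .map(|&x| (x as f64 - mean).powi(2))
        .sum::<f64>()
        / (nf - 1.0);
    // 1.96 is the two-sided 95% quantile of the standard normal distribution.
    let half_width = 1.96 * (variance / nf).sqrt();
    Some((mean, mean - half_width, mean + half_width))
}
```

With small sample counts (for example the default of 10 statistical samples), a Student's t quantile would give a wider and more honest interval than 1.96.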

Comparison

When comparing two benchmark runs:

Benchmark: context_switch
  Before: mean=890ns, p99=4.2us
  After:  mean=850ns, p99=3.8us
  Change: -4.5% mean, -9.5% p99
  Significant: yes (p < 0.05, Mann-Whitney U test)
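
A comparison like the one above needs two pieces: the relative change, and a Mann-Whitney U test on the raw samples. The sketch below is an assumption about how this could be done, not the framework's code. It uses the large-sample normal approximation without tie correction and treats |z| > 1.96 as p < 0.05 (two-sided); all function names are hypothetical.

```rust
/// Relative change from `before` to `after`, in percent.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Mann-Whitney U statistic for sample `a`, plus its z-score under the
/// normal approximation (no tie correction). Returns None for empty input.
pub fn mann_whitney(a: &[f64], b: &[f64]) -> Option<(f64, f64)> {
    if a.is_empty() || b.is_empty() {
        return None;
    }
    // Pool both samples, remembering which group each value came from.
    let mut pooled: Vec<(f64, bool)> = a
        .iter()
        .map(|&x| (x, true))
        .chain(b.iter().map(|&x| (x, false)))
        .collect();
    pooled.sort_by(|l, r| l.0.total_cmp(&r.0));

    // Sum the ranks of group `a`, giving tied values their average rank.
    let mut rank_sum_a = 0.0;
    let mut i = 0;
    while i < pooled.len() {
        let mut j = i;
        while j + 1 < pooled.len() && pooled[j + 1].0 == pooled[i].0 {
            j += 1;
        }
        let avg_rank = (i + j) as f64 / 2.0 + 1.0; // ranks are 1-based
        for entry in &pooled[i..=j] {
            if entry.1 {
                rank_sum_a += avg_rank;
            }
        }
        i = j + 1;
    }

    let (n1, n2) = (a.len() as f64, b.len() as f64);
    let u = rank_sum_a - n1 * (n1 + 1.0) / 2.0;
    let mean_u = n1 * n2 / 2.0;
    let sd_u = (n1 * n2 * (n1 + n2 + 1.0) / 12.0).sqrt();
    Some((u, (u - mean_u) / sd_u))
}

/// Two-sided significance at the 5% level via the normal approximation.
pub fn is_significant(a: &[f64], b: &[f64]) -> bool {
    matches!(mann_whitney(a, b), Some((_, z)) if z.abs() > 1.96)
}
```

The normal approximation is usually considered reasonable once each sample has roughly 20 or more values; for smaller samples an exact U table is the safer choice.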

Running Benchmarks

From the Kernel Shell

helix> bench
=== Helix Benchmark Suite ===

[1/8] context_switch .............. 890ns mean (1000 iters)
[2/8] thread_create ............... 8.2us mean (1000 iters)
[3/8] page_alloc_bitmap ........... 180ns mean (10000 iters)
[4/8] page_alloc_buddy ............ 95ns  mean (10000 iters)
[5/8] slab_alloc_64 ............... 28ns  mean (10000 iters)
[6/8] syscall_getpid .............. 85ns  mean (10000 iters)
[7/8] ipc_channel_roundtrip ....... 210ns mean (10000 iters)
[8/8] dis_intent_classify ......... 75ns  mean (10000 iters)

All benchmarks passed target thresholds.

From Host

# Run all benchmarks in QEMU
make bench

# Run with extended iterations
make bench FEATURES=extended

# Run stress tests
make bench FEATURES=stress

Continuous Integration

Benchmarks run on every PR to detect performance regressions:

  1. Build the kernel with profile.bench
  2. Boot in QEMU with benchmark arguments
  3. Parse serial output for results
  4. Compare against baseline (stored in repo)
  5. Fail the build if any metric regresses > 10%
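
Steps 3 to 5 could look roughly like the sketch below. It assumes the serial output uses the line format shown under "From the Kernel Shell" and that the baseline stores mean times in nanoseconds; the actual baseline file format is not described here, and `parse_result_line` and `regressed` are hypothetical helpers.

```rust
/// Parses a result line such as
/// "[1/8] context_switch .............. 890ns mean (1000 iters)"
/// into (benchmark name, mean in nanoseconds). Returns None for other lines.
pub fn parse_result_line(line: &str) -> Option<(String, f64)> {
    let mut tokens = line.split_whitespace();
    let index = tokens.next()?;
    if !index.starts_with('[') || !index.ends_with(']') {
        return None;
    }
    let name = tokens.next()?.to_string();
    // Skip the dotted leader; the value is the first token starting with a digit.
    let value = tokens.find(|t| t.starts_with(|c: char| c.is_ascii_digit()))?;
    let (number, scale) = if let Some(v) = value.strip_suffix("ns") {
        (v, 1.0)
    } else if let Some(v) = value.strip_suffix("us") {
        (v, 1_000.0)
    } else if let Some(v) = value.strip_suffix("ms") {
        (v, 1_000_000.0)
    } else {
        return None;
    };
    Some((name, number.parse::<f64>().ok()? * scale))
}

/// True when the current value is more than 10% worse (slower) than baseline.
pub fn regressed(baseline_ns: f64, current_ns: f64) -> bool {
    current_ns > baseline_ns * 1.10
}
```

A CI job would call `parse_result_line` on each serial line, look up the matching baseline entry by name, and fail the build if `regressed` returns true for any benchmark.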