Benchmarks

A performance measurement suite covering scheduler, memory, IRQ, IPC, and stress benchmarks, with statistical analysis of the results.


Suite Architecture

The benchmark suite (helix-benchmarks, ~500 lines) provides a framework for measuring kernel performance across all subsystems.

Dependencies

[dependencies]
helix-hal = { path = "../hal" }
helix-core = { path = "../core" }
helix-execution = { path = "../subsystems/execution" }
helix-memory = { path = "../subsystems/memory" }
helix-dis = { path = "../subsystems/dis" }
helix-modules = { path = "../modules" }

Benchmark Runner

benchmarks/src/lib.rs

pub struct BenchmarkDef {
    pub id: BenchmarkId,
    pub name: String,
    pub description: String,
    pub category: BenchmarkCategory,
    pub setup: Option<fn() -> bool>,
    pub run: fn() -> u64,          // Returns cycles
    pub teardown: Option<fn()>,
    pub baseline_cycles: Option<u64>,
}

pub struct Statistics {
    pub min: u64,
    pub max: u64,
    pub mean: u64,
    pub p50: u64,
    pub p95: u64,
    pub p99: u64,
    pub std_dev: u64,
    pub jitter: u64,
}

pub enum BenchmarkCategory {
    Scheduler, Memory, Irq, Ipc, Stress, Custom,
}
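To illustrate how the `setup`, `run`, and `teardown` hooks might fit together, here is a minimal sketch of a run loop. It is not the suite's actual runner: the trimmed-down struct, the `run_benchmark` function, and the reading of a `false` return from `setup` as "skip this benchmark" are all assumptions, and it uses `Vec` from the standard library for brevity where a kernel build would need its own allocator.

```rust
/// Trimmed-down stand-in for the `BenchmarkDef` fields the loop needs.
/// (Hypothetical: the real struct carries more fields.)
pub struct BenchmarkDef {
    pub name: &'static str,
    pub setup: Option<fn() -> bool>,
    pub run: fn() -> u64, // Returns cycles
    pub teardown: Option<fn()>,
}

/// Runs one benchmark and returns the measured cycle counts.
/// Returns `None` if `setup` reports failure (assumed meaning of `false`).
pub fn run_benchmark(def: &BenchmarkDef, warmup: usize, measured: usize) -> Option<Vec<u64>> {
    if let Some(setup) = def.setup {
        if !setup() {
            return None;
        }
    }

    // Warmup results are discarded; these iterations only bring caches
    // and branch predictors into a steady state.
    for _ in 0..warmup {
        let _ = (def.run)();
    }

    // Each measured iteration contributes one cycle count.
    let samples: Vec<u64> = (0..measured).map(|_| (def.run)()).collect();

    if let Some(teardown) = def.teardown {
        teardown();
    }
    Some(samples)
}

// Hypothetical benchmark bodies, used only to exercise the loop.
pub fn fake_run() -> u64 {
    890
}

pub fn failing_setup() -> bool {
    false
}
```

Calling `run_benchmark(&def, 100, 1_000)` on a definition with no `setup` yields 1,000 samples, and the warmup results never reach the returned vector.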

Feature Flags

[features]
verbose = []        # Print per-iteration timing
extended = []       # Run additional stress benchmarks
stress = []         # High-iteration stress tests

Configuration

Benchmark Parameters

Parameter	Default	Extended	Stress
Warmup iterations	100	1,000	10,000
Measured iterations	1,000	10,000	100,000
Statistical samples	10	100	1,000
Outlier detection	MAD	MAD	MAD
Output format	Summary	Detailed	Full
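
The numeric rows of the table can be written down as data. The sketch below is illustrative only: `Profile`, `BenchParams`, and `params_for` are hypothetical names, and how the suite actually maps the `extended` and `stress` feature flags to these values is not shown in this document.

```rust
/// The three columns of the parameter table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Default,
    Extended,
    Stress,
}

/// The numeric rows of the parameter table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BenchParams {
    pub warmup_iterations: u32,
    pub measured_iterations: u32,
    pub statistical_samples: u32,
}

/// Returns the table values for a given profile.
pub const fn params_for(profile: Profile) -> BenchParams {
    match profile {
        Profile::Default => BenchParams {
            warmup_iterations: 100,
            measured_iterations: 1_000,
            statistical_samples: 10,
        },
        Profile::Extended => BenchParams {
            warmup_iterations: 1_000,
            measured_iterations: 10_000,
            statistical_samples: 100,
        },
        Profile::Stress => BenchParams {
            warmup_iterations: 10_000,
            measured_iterations: 100_000,
            statistical_samples: 1_000,
        },
    }
}
```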

Running Benchmarks

# Via kernel shell
> bench

# Via make (runs in QEMU)
make bench

# Specific benchmark
> bench context_switch
> bench memory_alloc
> bench syscall_latency

Scheduler Benchmarks

Context Switch Latency

Measures the time to switch between two threads:

[Diagram: Context Switch Measurement (3 nodes, 2 edges)]

Metric	Target	Description
Mean	< 1 us	Average context switch time
P99	< 5 us	99th percentile (tail latency)
Min	~500 ns	Best case (hot cache)
Max	< 20 us	Worst case (cold cache, TLB flush)
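
The targets above are expressed in time units, while a benchmark's `run` function returns cycles, so a conversion is needed before the two can be compared. A minimal sketch, assuming a counter that ticks at a constant, known frequency (the function names and the 3 GHz figure in the usage note are hypothetical):

```rust
/// Converts a cycle count to nanoseconds for a counter ticking at `freq_hz`.
/// The multiplication is done in u128 so it cannot overflow.
/// Panics if `freq_hz` is zero.
pub fn cycles_to_ns(cycles: u64, freq_hz: u64) -> u64 {
    ((cycles as u128 * 1_000_000_000u128) / freq_hz as u128) as u64
}

/// Returns true if the measured cycle count is strictly under the target.
pub fn meets_target(cycles: u64, freq_hz: u64, target_ns: u64) -> bool {
    cycles_to_ns(cycles, freq_hz) < target_ns
}
```

At an assumed 3 GHz, 2,670 cycles correspond to 890 ns, which is under the 1 us mean target.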

Thread Creation

Measures time to create a new thread (allocate stack, initialize context, add to scheduler):

Metric	Target
Mean	< 10 us
P99	< 50 us

Scheduler Throughput

Measures scheduling decisions per second with varying thread counts:

Threads	Target Decisions/sec
10	> 1,000,000
100	> 500,000
1,000	> 100,000
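
Each throughput target implies a time budget per decision. Assuming decisions are made one after another on a single core, the budget is one second divided by the target rate. The helper names below are hypothetical:

```rust
/// Largest average time per decision (in ns) that still meets the target rate.
/// Panics if `decisions_per_sec` is zero.
pub fn max_ns_per_decision(decisions_per_sec: u64) -> u64 {
    1_000_000_000 / decisions_per_sec
}

/// Observed rate, given a decision count and the elapsed time in ns.
/// Panics if `elapsed_ns` is zero.
pub fn decisions_per_sec(decisions: u64, elapsed_ns: u64) -> u64 {
    ((decisions as u128 * 1_000_000_000u128) / elapsed_ns as u128) as u64
}
```

Under that assumption the three rows allow 1 us, 2 us, and 10 us per decision respectively.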

DIS Intent Scheduling

Measures overhead of intent-based scheduling vs. simple priority scheduling:

Operation	Target
Intent classification	< 100 ns
Policy evaluation	< 500 ns
Queue selection	< 50 ns
Full DIS dispatch	< 2 us

Memory Benchmarks

Page Allocation

Allocator	Operation	Target
Bump	Single page	< 50 ns
Bitmap	Single page	< 200 ns
Bitmap	Contiguous 16 pages	< 1 us
Buddy	Single page	< 100 ns
Buddy	1 MB (256 pages)	< 500 ns
Buddy	8 MB (2048 pages)	< 1 us

Slab Allocation

Size Class	Alloc Target	Free Target
16 bytes	< 30 ns	< 20 ns
64 bytes	< 30 ns	< 20 ns
256 bytes	< 40 ns	< 25 ns
1024 bytes	< 50 ns	< 30 ns
2048 bytes	< 60 ns	< 35 ns

Virtual Memory

Operation	Target
Page map	< 200 ns
Page unmap	< 150 ns
TLB flush (single)	< 100 ns
TLB flush (full)	< 500 ns
mmap_anonymous (4 KB)	< 1 us
mmap_anonymous (1 MB)	< 10 us

Syscall Benchmarks

Syscall	Target Latency
`getpid` (trivial)	< 100 ns
`read` (0 bytes)	< 300 ns
`write` (serial, 1 byte)	< 500 ns
`mmap` (anonymous, 4 KB)	< 2 us
`fork`	< 50 us

IPC Benchmarks

Operation	Target
Channel send (64B)	< 100 ns
Channel recv (64B)	< 100 ns
OneShot round-trip	< 300 ns
EventBus publish (1 subscriber)	< 200 ns
EventBus publish (10 subscribers)	< 1 us
MessageRouter send	< 200 ns

Throughput

Scenario	Target
Channel: 1 producer, 1 consumer	> 5M msg/sec
Channel: 4 producers, 1 consumer	> 2M msg/sec
EventBus: 1 topic, 10 subscribers	> 1M msg/sec

Statistical Analysis

The benchmark framework filters outliers, reports confidence intervals and percentiles, and tests differences between runs for statistical significance:

Outlier Detection

The Median Absolute Deviation (MAD) method identifies outliers:

1. Compute the median of all measurements
2. Compute the MAD = median(|xi - median|)
3. Flag values where |xi - median| > 3 * MAD as outliers
4. Report with and without outliers
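
The steps above can be sketched as follows. This is an illustration rather than the framework's code: `median` and `split_outliers` are hypothetical names, and the sketch works in `f64` and uses `Vec` from the standard library.

```rust
/// Median of a list of values. For an even count, this is the mean of the
/// two middle values. Panics on an empty list.
fn median(mut values: Vec<f64>) -> f64 {
    values.sort_by(|a, b| a.total_cmp(b));
    let n = values.len();
    if n % 2 == 1 {
        values[n / 2]
    } else {
        (values[n / 2 - 1] + values[n / 2]) / 2.0
    }
}

/// Splits samples into (kept, outliers) using the 3 * MAD rule.
pub fn split_outliers(samples: &[u64]) -> (Vec<u64>, Vec<u64>) {
    if samples.is_empty() {
        return (Vec::new(), Vec::new());
    }
    let as_f64: Vec<f64> = samples.iter().map(|&x| x as f64).collect();

    // Step 1: median of all measurements.
    let med = median(as_f64.clone());
    // Step 2: MAD = median(|xi - median|).
    let mad = median(as_f64.iter().map(|x| (x - med).abs()).collect());
    // Step 3: a value is an outlier when |xi - median| > 3 * MAD.
    samples
        .iter()
        .copied()
        .partition(|&x| (x as f64 - med).abs() <= 3.0 * mad)
}
```

For the samples 100, 102, 98, 101, 99, 500 the median is 100.5 and the MAD is 1.5, so only 500 exceeds the 4.5 cutoff. When more than half of the samples are identical the MAD is zero and every differing value is flagged, which a production implementation may want to handle separately.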

Confidence Intervals

For each metric, the framework reports:

Mean with 95% confidence interval
Median (robust to outliers)
Standard deviation
Min / Max (absolute bounds)
P50, P90, P95, P99 (percentile distribution)
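
A sketch of how these figures could be computed from a set of samples is shown below. The document does not say which formulas the framework uses, so the choices here are assumptions: the confidence interval uses the normal approximation (mean ± 1.96 · s / √n), the standard deviation is the sample standard deviation (n − 1 denominator), and percentiles use the nearest-rank method, under which P50 serves as the median. `Summary` and `summarize` are hypothetical names.

```rust
#[derive(Debug)]
pub struct Summary {
    pub mean: f64,
    /// Half-width of the 95% confidence interval around the mean.
    pub ci95_half_width: f64,
    pub std_dev: f64,
    pub min: u64,
    pub max: u64,
    pub p50: u64,
    pub p90: u64,
    pub p95: u64,
    pub p99: u64,
}

/// Nearest-rank percentile on a sorted, non-empty slice.
fn percentile(sorted: &[u64], p: usize) -> u64 {
    let rank = (sorted.len() * p + 99) / 100; // ceil(n * p / 100)
    sorted[rank.max(1) - 1]
}

/// Returns `None` for fewer than two samples, where the sample
/// standard deviation is undefined.
pub fn summarize(samples: &[u64]) -> Option<Summary> {
    if samples.len() < 2 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();

    let n = sorted.len() as f64;
    let mean = sorted.iter().map(|&x| x as f64).sum::<f64>() / n;
    let variance = sorted
        .iter()
        .map(|&x| (x as f64 - mean).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    let std_dev = variance.sqrt();

    Some(Summary {
        mean,
        ci95_half_width: 1.96 * std_dev / n.sqrt(),
        std_dev,
        min: sorted[0],
        max: sorted[sorted.len() - 1],
        p50: percentile(&sorted, 50),
        p90: percentile(&sorted, 90),
        p95: percentile(&sorted, 95),
        p99: percentile(&sorted, 99),
    })
}
```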

Comparison

When comparing two benchmark runs:

Benchmark: context_switch
  Before: mean=890ns, p99=4.2us
  After:  mean=850ns, p99=3.8us
  Change: -4.5% mean, -9.5% p99
  Significant: yes (p < 0.05, Mann-Whitney U test)
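
The percentage change and the significance check can be sketched as below. This is a simplified illustration, not the framework's implementation: it computes the Mann-Whitney U statistic with midranks for ties and then uses the normal approximation without tie or continuity corrections, treating |z| > 1.96 as significant at the two-sided 0.05 level. That approximation is usually considered reasonable only for roughly 20 or more samples per run. All function names are hypothetical.

```rust
/// Relative change from `before` to `after`, in percent.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// z-score of the Mann-Whitney U statistic for sample `a` against sample `b`.
/// Returns NaN if either sample is empty.
pub fn mann_whitney_z(a: &[u64], b: &[u64]) -> f64 {
    let (n1, n2) = (a.len() as f64, b.len() as f64);

    // Pool both samples, remembering which run each value came from.
    let mut pooled: Vec<(u64, bool)> = a
        .iter()
        .map(|&v| (v, true))
        .chain(b.iter().map(|&v| (v, false)))
        .collect();
    pooled.sort_unstable_by_key(|&(v, _)| v);

    // Sum the ranks of `a`; tied values share the average of their ranks.
    let mut rank_sum_a = 0.0;
    let mut i = 0;
    while i < pooled.len() {
        let mut j = i;
        while j < pooled.len() && pooled[j].0 == pooled[i].0 {
            j += 1;
        }
        // The tie group occupies 1-based ranks i + 1 ..= j.
        let midrank = (i + 1 + j) as f64 / 2.0;
        let count_a = pooled[i..j].iter().filter(|&&(_, from_a)| from_a).count();
        rank_sum_a += midrank * count_a as f64;
        i = j;
    }

    let u = rank_sum_a - n1 * (n1 + 1.0) / 2.0;
    let mean_u = n1 * n2 / 2.0;
    let sd_u = (n1 * n2 * (n1 + n2 + 1.0) / 12.0).sqrt();
    (u - mean_u) / sd_u
}

/// Two-sided test at the 0.05 level under the normal approximation.
pub fn is_significant(a: &[u64], b: &[u64]) -> bool {
    mann_whitney_z(a, b).abs() > 1.96
}
```

With the figures from the example, `percent_change(890.0, 850.0)` is about -4.5 and `percent_change(4.2, 3.8)` is about -9.5.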

Running Benchmarks

From the Kernel Shell

helix> bench
=== Helix Benchmark Suite ===

[1/8] context_switch .............. 890ns mean (1000 iters)
[2/8] thread_create ............... 8.2us mean (1000 iters)
[3/8] page_alloc_bitmap ........... 180ns mean (10000 iters)
[4/8] page_alloc_buddy ............ 95ns  mean (10000 iters)
[5/8] slab_alloc_64 ............... 28ns  mean (10000 iters)
[6/8] syscall_getpid .............. 85ns  mean (10000 iters)
[7/8] ipc_channel_roundtrip ....... 210ns mean (10000 iters)
[8/8] dis_intent_classify ......... 75ns  mean (10000 iters)

All benchmarks passed target thresholds.
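
For tooling that consumes this output, a result line can be parsed with a few string operations. The sketch below assumes the exact layout shown above (index, name, dot leader, value with an `ns`, `us`, or `ms` suffix, the word `mean`, and an iteration count in parentheses); `ResultLine` and `parse_result_line` are hypothetical names, and the `ms` case is an assumption since no such value appears in the sample.

```rust
#[derive(Debug, PartialEq)]
pub struct ResultLine {
    pub name: String,
    pub mean_ns: f64,
    pub iterations: u64,
}

/// Parses one line such as
/// "[2/8] thread_create ............... 8.2us mean (1000 iters)".
/// Returns `None` for banner lines, blank lines, or anything malformed.
pub fn parse_result_line(line: &str) -> Option<ResultLine> {
    let mut tokens = line.split_whitespace();

    let index = tokens.next()?; // "[2/8]"
    if !(index.starts_with('[') && index.ends_with(']')) {
        return None;
    }
    let name = tokens.next()?.to_string();
    let _dot_leader = tokens.next()?;
    let value = tokens.next()?; // "8.2us"
    if tokens.next()? != "mean" {
        return None;
    }
    let iterations = tokens.next()?.strip_prefix('(')?.parse().ok()?; // "(1000"

    // Split the value into its number and unit, normalising to nanoseconds.
    let (number, scale) = if let Some(n) = value.strip_suffix("ns") {
        (n, 1.0)
    } else if let Some(n) = value.strip_suffix("us") {
        (n, 1_000.0)
    } else if let Some(n) = value.strip_suffix("ms") {
        (n, 1_000_000.0)
    } else {
        return None;
    };
    let mean_ns = number.parse::<f64>().ok()? * scale;

    Some(ResultLine { name, mean_ns, iterations })
}
```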

From Host

# Run all benchmarks in QEMU
make bench

# Run with extended iterations
make bench FEATURES=extended

# Run stress tests
make bench FEATURES=stress

Continuous Integration

Benchmarks run on every PR to detect performance regressions:

1. Build the kernel with profile.bench
2. Boot in QEMU with benchmark arguments
3. Parse serial output for results
4. Compare against baseline (stored in repo)
5. Fail the build if any metric regresses > 10%
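
The last two steps amount to a per-metric comparison against the stored baseline. A minimal sketch, assuming every metric is a latency where lower is better (throughput metrics would need the comparison inverted) and that results are keyed by benchmark name; `find_regressions` and the map layout are hypothetical, and the baseline file format is not specified in this document:

```rust
use std::collections::BTreeMap;

/// Returns (name, percent change) for every metric that got worse by more
/// than `threshold_pct`. Metrics missing from the baseline are skipped.
pub fn find_regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    threshold_pct: f64,
) -> Vec<(String, f64)> {
    let mut regressions = Vec::new();
    for (name, &now) in current {
        if let Some(&base) = baseline.get(name) {
            // Positive change means the metric got slower.
            let change_pct = (now - base) / base * 100.0;
            if change_pct > threshold_pct {
                regressions.push((name.clone(), change_pct));
            }
        }
    }
    regressions
}
```

A CI job would fail the build when the returned list is non-empty for a threshold of 10.0.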
