Scheduler

Key Takeaways

After reading this guide, you will understand:

  • How the Scheduler orchestrates node execution through init(), tick(), and shutdown() phases
  • Scheduler::new() as the single entry point, with builder methods for global settings
  • Per-node execution classes: .compute(), .on(topic), .async_io() — plus auto-RT from .budget() / .rate()
  • The fluent NodeBuilder API for adding nodes with .add(node).order(0).build()
  • Per-node configuration: .order(), .rate(), .budget(), .deadline(), .on_miss()
  • Composable scheduler builders: .watchdog(Duration), .blackbox(n), .require_rt(), .max_deadline_misses(n)
  • Priority-based execution where lower numbers run first (0 = highest priority)
  • Graceful shutdown via Ctrl+C signal handling

The Scheduler is the execution orchestrator in HORUS. It manages the node lifecycle, coordinates priority-based execution, and handles graceful shutdown.

What is the Scheduler?

The Scheduler is responsible for:

Node Registration: Adding nodes with the fluent builder API

Lifecycle Management: Calling init(), tick(), and shutdown() at the right times

Priority-Based Execution: Running nodes in priority order every tick

Signal Handling: Graceful shutdown on Ctrl+C

Performance Monitoring: Tracking execution metrics for all nodes

Failure Policies: Per-node failure handling (fatal, restart, skip, ignore)

Creating a Scheduler

Scheduler::new() — Lightweight, No Syscalls

new() detects runtime capabilities (~30-100us) but does not apply any OS-level features. Use builder methods to opt in to features.

use horus::prelude::*;

// Minimal — just capability detection, no syscalls
let mut scheduler = Scheduler::new();
scheduler.add(my_node).order(0).build()?;
scheduler.run()?;

Builder Methods — Global Settings

For production deployments, combine builder methods for the right settings:

use horus::prelude::*;

// Production robot — watchdog, blackbox, 1 kHz control loop
let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())     // frozen node detection
    .blackbox(64)               // 64 MB flight recorder
    .tick_rate(1000_u64.hz());  // 1 kHz control loop

scheduler.add(motor_ctrl).order(0).rate(1000_u64.hz()).build()?;
scheduler.run()?;

All RT features that cannot be applied at runtime are recorded as degradations, not errors (see Real-Time Features below).

Composable Builders

Instead of presets, combine builder methods for your deployment needs:

Builder MethodWhat it enablesWhen to use
.watchdog(Duration)Frozen node detection — auto-creates safety monitorProduction robots
.blackbox(size_mb)BlackBox flight recorder with n MB bufferPost-mortem debugging
.max_deadline_misses(n)Emergency stop after n deadline misses (default: 100)Safety-critical
.require_rt()All RT features + memory locking + RT scheduling class. Panics without RT capabilitiesHard real-time systems
.prefer_rt()Request RT features (degrades gracefully if unavailable)Best-effort RT
.verbose(bool)Enable/disable non-emergency logging (default: true)Quieter production logs
.with_recording()Enable record/replaySession recording

Adding Nodes

Use the fluent builder API to add nodes with configuration:

let mut scheduler = Scheduler::new();

// Basic: just set execution order
scheduler.add(sensor_node).order(0).build()?;

// With per-node tick rate — auto-derives budget (80%) and deadline (95%), auto-marks as RT
scheduler.add(fast_sensor).order(0).rate(1000_u64.hz()).build()?;

// Real-time node with explicit budget and deadline (auto-marks as RT)
scheduler.add(motor_ctrl)
    .order(0)
    .budget(500_u64.us())     // 500μs max execution time → auto RT
    .deadline(1_u64.ms())     // 1ms deadline
    .on_miss(Miss::Skip)      // Skip tick on deadline miss
    .build()?;

// Chain multiple nodes
scheduler.add(safety_node).order(0).build()?;
scheduler.add(controller).order(10).build()?;
scheduler.add(sensor).order(50).build()?;
scheduler.add(logger).order(200).build()?;

NodeBuilder Methods

Execution classes (mutually exclusive — pick one per node):

MethodDescription
.compute()Mark as CPU-heavy compute node (may be scheduled on worker threads)
.on(topic)Event-driven — node ticks only when the given topic receives a message
.async_io()Async I/O node (non-blocking, suitable for network / file operations)

Note: RT is auto-detected when you set .budget(), .deadline(), or .rate(Frequency). There is no .rt() method — just set a budget or rate and the node is automatically marked as real-time. The default execution class is BestEffort.

Per-node configuration:

MethodDescription
.order(n)Set execution order (lower = runs first)
.rate(Frequency)Set tick rate (e.g. 1000_u64.hz()) — auto-derives budget (80% period) and deadline (95% period), auto-marks RT
.budget(Duration)Set tick budget (e.g. 200_u64.us()) — auto-marks as RT
.deadline(Duration)Set deadline (e.g. 1_u64.ms()) — auto-marks as RT
.on_miss(Miss)Set deadline miss policy (Warn, Skip, SafeMode, Stop)
.failure_policy(policy)Override failure handling policy
.build() / .build()Finalize and register the node (returns HorusResult)

Priority-Based Execution

Priorities are u32 values where lower numbers = higher priority:

RangeLevelUse Case
0-9CriticalSafety monitors, emergency stops, watchdogs
10-49HighControl loops, actuators, time-sensitive operations
50-99NormalSensors, filters, state estimation
100-199LowLogging, diagnostics, non-critical computation
200+BackgroundTelemetry, data recording

Nodes execute in priority order every tick:

let mut scheduler = Scheduler::new();

// Execution order: Safety -> Controller -> Sensor -> Logger
scheduler.add(safety_monitor).order(0).build()?;    // Runs 1st
scheduler.add(controller).order(10).build()?;        // Runs 2nd
scheduler.add(sensor).order(50).build()?;            // Runs 3rd
scheduler.add(logger).order(200).build()?;           // Runs 4th

Running the Scheduler

Continuous Mode

Run until Ctrl+C:

scheduler.run()?;

Duration-Limited

Run for a specific duration, then shutdown:

use std::time::Duration;

scheduler.run_for(Duration::from_secs(30))?;

Node-Specific Execution

Execute only specific nodes by name:

// Run only these nodes continuously
scheduler.tick(&["SensorNode", "MotorNode"])?;

// Run specific nodes for a duration
scheduler.tick_for(&["SensorNode"], Duration::from_secs(10))?;

Builder Methods

Tick Rate

// Set global tick rate (default: 100 Hz)
let scheduler = Scheduler::new()
    .tick_rate(1000_u64.hz()); // 1kHz control loop

Real-Time Features

RT is configured through composable builder methods. Nodes are automatically marked as RT when you set .budget(), .deadline(), or .rate(Frequency):

// Hard real-time — panics without RT capabilities
let mut scheduler = Scheduler::new()
    .require_rt()
    .watchdog(500_u64.ms())
    .tick_rate(1000_u64.hz());

// Per-node: rate auto-derives budget + deadline, auto-marks as RT
scheduler.add(motor_ctrl)
    .order(0)
    .rate(1000_u64.hz())      // budget=800us, deadline=950us → auto RT
    .on_miss(Miss::SafeMode)  // Enter safe mode on miss
    .build()?;

scheduler.add(sensor)
    .order(50)
    .compute()              // CPU-heavy, non-RT
    .build()?;

When using .require_rt(), the scheduler applies (in order):

  1. RT priority (SCHED_FIFO) — if available
  2. Memory locking (mlockall) — if permitted
  3. CPU affinity — only to isolated CPUs if they exist

Features that fail are recorded as degradations, not errors. Use .prefer_rt() instead if you want graceful degradation.

BlackBox Flight Recorder

Enable BlackBox recording through the builder API:

let mut scheduler = Scheduler::new()
    .blackbox(16);  // 16 MB flight recorder buffer

// All node ticks are now recorded to the flight recorder

Safety Monitor

.watchdog(Duration) enables frozen node detection and the safety monitor. Budget enforcement is implicit when nodes have .rate() set:

let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())      // frozen node detection
    .blackbox(64)                // 64 MB flight recorder
    .tick_rate(1000_u64.hz());

// RT node with auto-derived budget and deadline from rate
scheduler.add(motor_ctrl)
    .order(0)
    .rate(1000_u64.hz())         // budget=800us, deadline=950us
    .on_miss(Miss::SafeMode)     // Enter safe mode on miss
    .build()?;

Per-Node Rate Control

Individual nodes can run at different frequencies:

let mut scheduler = Scheduler::new();
scheduler.add(fast_sensor).order(0).rate(1000_u64.hz()).build()?;  // 1kHz
scheduler.add(slow_logger).order(200).rate(10_u64.hz()).build()?;  // 10Hz

// Or set rates after adding
scheduler.set_node_rate("FastSensor", 100_u64.hz());
scheduler.set_node_rate("SlowLogger", 10_u64.hz());

Ergonomic Timing with DurationExt

HORUS provides extension methods on numeric types for creating Duration and Frequency values:

use horus::prelude::*;

// Duration helpers — works on u64
let budget = 200_u64.us();   // Duration::from_micros(200)
let deadline = 1_u64.ms();   // Duration::from_millis(1)

// Frequency — auto-derives budget (80% period) and deadline (95% period)
let freq = 500_u64.hz();
scheduler.add(motor_ctrl)
    .order(0)
    .rate(freq)           // auto-marks as RT, sets budget + deadline
    .on_miss(Miss::Skip)
    .build()?;

See DurationExt and Frequency for the full API reference.

The scheduler automatically adjusts its internal tick period to be fast enough for the highest-frequency node.

Lifecycle Management

Initialization Phase

When you call run(), the scheduler initializes all nodes by calling init():

  • All nodes initialize before the main loop starts
  • If init() fails, the node enters Error state and won't tick
  • Other nodes continue normally

Main Execution Loop

1. Initialize all nodes (call init())
2. Sort nodes by priority
3. Main loop:
   a. For each node (in priority order):
      - Check if node should tick (rate limiting)
      - Start tick timing
      - Call node.tick()
      - Record metrics, check budget/deadlines
   b. Sleep to maintain target tick rate
4. On shutdown signal:
   a. Set running = false
   b. Call shutdown() on all nodes

Graceful Shutdown

Ctrl+C is automatically caught:

^C
Ctrl+C received! Shutting down HORUS scheduler...
[Nodes shutting down gracefully...]
Scheduler shutdown complete
  • Main loop exits
  • Each node's shutdown() is called
  • Errors during shutdown are logged but don't prevent other nodes from cleaning up
  • Shared memory cleaned up

Recording and Replay

Recording Sessions

Enable recording through the builder API:

let mut scheduler = Scheduler::new()
    .blackbox(16);  // 16 MB flight recorder buffer

scheduler.add(my_node).order(0).build()?;

// Run normally — all node ticks are recorded
scheduler.run()?;

Replaying

let mut replay_scheduler = Scheduler::replay_from(
    "~/.horus/recordings/my_session".into()
)?;
replay_scheduler.run()?;

Performance Monitoring

Node Metrics

let metrics = scheduler.metrics();

for m in &metrics {
    println!("Node: {} (order: {})", m.name(), m.order());
    println!("  Ticks: {} total, {} ok, {} failed",
             m.total_ticks(), m.successful_ticks(), m.failed_ticks());
    println!("  Duration: avg={:.2}ms, min={:.2}ms, max={:.2}ms",
             m.avg_tick_duration_ms(), m.min_tick_duration_ms(), m.max_tick_duration_ms());
}

Other Monitoring Methods

MethodDescription
metrics()Get Vec<NodeMetrics> for all nodes
node_list()Get list of registered node names
safety_stats()Get budget overruns, deadline misses, watchdog expirations
is_running()Check if scheduler is running

Error Handling

Node Initialization Failure

If init() fails, the node enters Error state and won't tick. Other nodes continue:

fn init(&mut self) -> Result<()> {
    Err(Error::node("MySensor", "Sensor not connected")) // Node won't run, others unaffected
}

Runtime Errors

Handle errors gracefully in tick() — don't panic:

// GOOD: Handle errors
fn tick(&mut self) {
    match self.operation() {
        Ok(_) => {}
        Err(e) => hlog!(error, "Error: {}", e),
    }
}

// BAD: Panic crashes the scheduler
fn tick(&mut self) {
    self.operation().unwrap(); // Will crash!
}

Common Patterns

Layered Architecture

// Layer 1: Safety (Critical)
scheduler.add(collision_detector).order(0).build()?;
scheduler.add(emergency_stop).order(0).build()?;

// Layer 2: Control (High)
scheduler.add(pid_controller).order(10).build()?;
scheduler.add(motor_driver).order(10).build()?;

// Layer 3: Sensing (Normal)
scheduler.add(lidar_node).order(50).build()?;
scheduler.add(camera_node).order(50).build()?;

// Layer 4: Processing (Low)
scheduler.add(path_planner).order(100).build()?;

// Layer 5: Monitoring (Background)
scheduler.add(logger).order(200).build()?;
scheduler.add(diagnostics).order(200).build()?;

Production Deployment

let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())
    .blackbox(64)
    .tick_rate(1000_u64.hz());

scheduler.add(safety_monitor).order(0).rate(1000_u64.hz()).on_miss(Miss::Stop).build()?;
scheduler.add(motor_ctrl).order(5).rate(1000_u64.hz()).on_miss(Miss::SafeMode).build()?;
scheduler.add(sensor).order(50).compute().build()?;
scheduler.add(logger).order(200).async_io().build()?;

scheduler.run()?;

Best Practices

Initialize heavy resources in init() — not in the constructor:

fn init(&mut self) -> Result<()> {
    self.buffer = vec![0.0; 10000];
    self.connection = connect_to_hardware()?;
    Ok(())
}

Keep tick() fast — each tick should complete within the tick period:

fn tick(&mut self) {
    let data = self.read_sensor();
    self.pub_topic.send(data);  // Fast!
}

Use appropriate priorities — don't make everything order 0:

scheduler.add(emergency_stop).order(0).build()?;   // Critical
scheduler.add(controller).order(10).build()?;       // High
scheduler.add(sensor).order(50).build()?;           // Normal
scheduler.add(logger).order(200).build()?;          // Background

Use composable builders for production — enable watchdog, blackbox, and safety monitoring:

// Use composable builders for production
let mut scheduler = Scheduler::new()
    .watchdog(500_u64.ms())
    .blackbox(16)
    .tick_rate(500_u64.hz());

Miss — Deadline Miss Policy

When a real-time node exceeds its deadline, the Miss policy determines what happens:

use horus::prelude::*;

scheduler.add(motor_ctrl)
    .order(0)
    .budget(500.us())
    .deadline(1.ms())
    .on_miss(Miss::SafeMode)  // Enter safe mode on miss
    .build()?;
PolicyBehaviorUse For
Miss::WarnLog a warning and continue normallySoft real-time nodes (logging, UI)
Miss::SkipSkip the node for this tickFirm real-time nodes (video encoding)
Miss::SafeModeCall enter_safe_state() on the nodeMotor controllers, safety nodes
Miss::StopStop the entire schedulerHard real-time safety-critical nodes

The default is Miss::Warn.

Method Reference

Constructor & Configuration

MethodReturnsDescription
Scheduler::new()SchedulerCreate scheduler with auto-detected capabilities
.tick_rate(freq)SelfSet global tick rate (default: 100 Hz)
.watchdog(Duration)SelfFrozen node detection — auto-creates safety monitor
.blackbox(size_mb)SelfBlackBox flight recorder with n MB buffer
.max_deadline_misses(n)SelfMax deadline misses before emergency stop (default: 100)
.prefer_rt()SelfTry RT features, degrade gracefully if unavailable
.require_rt()SelfEnable RT features, panic if unavailable
.verbose(bool)SelfEnable/disable non-emergency logging (default: true)
.with_recording()SelfEnable record/replay

Node Registration

MethodReturnsDescription
.add(node)NodeBuilderAdd node via fluent builder (recommended)

Execution Control

MethodReturnsDescription
.run()HorusResult<()>Main loop — run until Ctrl+C
.run_for(duration)HorusResult<()>Run for specific duration, then shutdown
.tick_once()HorusResult<()>Execute exactly one tick cycle (no loop, no sleep)
.tick_once_nodes(&[names])HorusResult<()>One tick cycle for specific nodes only
.stop()()Stop the scheduler
.is_running()boolCheck if scheduler is currently running

Monitoring & Statistics

MethodReturnsDescription
.metrics()Vec<NodeMetrics>Performance metrics for all nodes
.safety_stats()Option<SafetyStats>Budget overruns, deadline misses, watchdog expirations
.node_list()Vec<String>List of registered node names
.status()StringFormatted status report (platform, RT features, safety)

Runtime Configuration

MethodReturnsDescription
.set_node_rate(name, freq)&mut SelfChange per-node rate at runtime

Next Steps