Record & Replay

You need to capture a robot's execution and replay it later for debugging, regression testing, or what-if analysis. HORUS record/replay captures every node's inputs, outputs, and timing, and replays them with tick-perfect determinism.

When To Use This

  • You need to reproduce a bug that only occurs in specific conditions (field test, customer site)
  • You want to regression-test a new planner/controller against recorded sensor data
  • You need to compare two algorithm versions on the same input data
  • You are debugging a crash and need to step through the event timeline

Use BlackBox instead if you only need lightweight event logging for crash forensics. Record/Replay captures every node's inputs and outputs and is storage-heavy, whereas BlackBox captures only event metadata and is always on.

Prerequisites

  • Familiarity with Scheduler Configuration -- especially .with_recording() and .deterministic(true)
  • Understanding of Deterministic Mode, which replay relies on to produce identical results
  • Understanding of Topics for how recorded data is injected into shared memory

Overview

The record/replay system supports:

  • Full recording: Capture entire system execution
  • Tick-perfect replay: Reproduce exact behavior deterministically
  • Time travel: Jump to any recorded tick
  • Mixed replay: Combine recorded nodes with live execution
  • Playback control: Speed adjustment, tick ranges

Enabling Recording

Via Builder API

Enable recording through builder methods:

// simplified
use horus::prelude::*;

// Enable recording via builder API
let mut scheduler = Scheduler::new()
    .with_recording();

Via CLI

# Record during a run
horus run --record my_session my_project

When recording is enabled, the scheduler automatically captures each node's inputs, outputs, and timing.
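
To make concrete what capturing "inputs, outputs, and timing" buys you, here is a toy model in plain Python. It is not the HORUS recorder or its file format: TickRecord, record_run, and replay_run are hypothetical names used only for illustration. The idea it sketches is that if a node's output depends only on its recorded inputs, feeding those inputs back in reproduces the outputs tick for tick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickRecord:
    """One recorded tick of one node (illustrative layout, not the .horus format)."""
    tick: int          # scheduler tick index
    timestamp_ns: int  # simulated time of the tick
    inputs: tuple      # values the node read this tick
    outputs: tuple     # values the node published this tick


def controller(inputs):
    """A deterministic node: the output depends only on the inputs."""
    (distance,) = inputs
    return (max(0.0, min(1.0, (distance - 0.5) * 2.0)),)


def record_run(node, input_stream, period_ns=10_000_000):
    """Run the node once per tick and log inputs, outputs, and timing."""
    return [
        TickRecord(tick, tick * period_ns, inputs, node(inputs))
        for tick, inputs in enumerate(input_stream)
    ]


def replay_run(node, recording):
    """Feed recorded inputs back in and return the ticks whose output differs."""
    return [r.tick for r in recording if node(r.inputs) != r.outputs]


sensor_data = [(0.2,), (0.75,), (1.5,)]
recording = record_run(controller, sensor_data)
mismatches = replay_run(controller, recording)
print(mismatches)  # an empty list means the replay matched on every tick
```

Passing a modified node to replay_run would return the tick numbers where its behavior diverges from the recording, which is the basis of the regression-testing workflow described later.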

Replaying Recordings

Full Replay

Replay an entire recorded session:

use horus::prelude::*;
use std::path::PathBuf;

// Note: std::path does not expand `~` itself; pass an absolute path if in doubt.
let mut scheduler = Scheduler::replay_from(
    PathBuf::from("~/.local/share/horus/recordings/crash/scheduler@abc123.horus")
)?;
scheduler.run()?;

Time Travel

Jump to specific tick ranges during replay:

// Start at a specific tick
let mut scheduler = Scheduler::replay_from(path)?
    .start_at_tick(1500);

// Stop at a specific tick
let mut scheduler = Scheduler::replay_from(path)?
    .stop_at_tick(2000);

// Adjust playback speed (0.01 to 100.0)
let mut scheduler = Scheduler::replay_from(path)?
    .with_replay_speed(0.5);  // Half speed
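
As a rough guide to how long a replay will take, the sketch below treats the replay speed as a plain multiplier on recorded time. That is an assumption about the pacing model, and replay_duration_s is a hypothetical helper, not a HORUS function. The 0.01 to 100.0 bounds come from the comment above; whether HORUS rejects or clamps out-of-range values is not stated here, so the sketch simply raises.

```python
def replay_duration_s(num_ticks, tick_rate_hz, speed):
    """Wall-clock seconds needed to replay num_ticks at the given speed.

    Assumes speed scales recorded time linearly (0.5 = half speed).
    """
    if not 0.01 <= speed <= 100.0:
        raise ValueError("speed must be between 0.01 and 100.0")
    return num_ticks / (tick_rate_hz * speed)


# Ticks 1500..2000 of a 100 Hz recording cover 500 ticks, or 5 s of robot time.
half_speed = replay_duration_s(500, tick_rate_hz=100, speed=0.5)
double_speed = replay_duration_s(500, tick_rate_hz=100, speed=2.0)
print(half_speed, double_speed)  # 10.0 2.5
```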

Mixed Replay

Combine recorded nodes with live execution for what-if testing:

use horus::prelude::*;
use std::path::PathBuf;

let mut scheduler = Scheduler::new();

// Add replay nodes from recordings
scheduler.add_replay(
    PathBuf::from("recordings/Lidar@001.horus"),
    0,  // priority
)?;

// Add live nodes alongside (live_controller is your own node, defined elsewhere)
scheduler.add(live_controller).order(1).build()?;

scheduler.run()?;
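
The data flow in mixed replay can be pictured with a small stand-alone model. Everything in it (ReplayNode, LiveController, the dict used as a topic bus, and the run loop) is hypothetical and only illustrates the idea: a replay node republishes recorded messages on a topic, and a live node consumes them as if they came from hardware. It also shows why topic names must match exactly.

```python
class ReplayNode:
    """Publishes recorded messages, one per tick, instead of reading hardware."""

    def __init__(self, topic, messages):
        self.topic = topic
        self.messages = messages

    def tick(self, bus, tick):
        if tick < len(self.messages):
            bus[self.topic] = self.messages[tick]


class LiveController:
    """Live node under test: reads a topic and decides whether the path is clear."""

    def __init__(self, topic):
        self.topic = topic
        self.commands = []

    def tick(self, bus, tick):
        scan = bus.get(self.topic)  # None if nothing was published on this topic
        self.commands.append(None if scan is None else min(scan) > 0.5)


def run(nodes, ticks):
    bus = {}  # stand-in for shared-memory topics
    for tick in range(ticks):
        for node in nodes:  # list order plays the role of priority/order
            node.tick(bus, tick)


scans = [[1.2, 0.9], [0.4, 0.8]]  # recorded range readings, one list per tick

good = LiveController("lidar")
run([ReplayNode("lidar", scans), good], ticks=2)

bad = LiveController("Lidar")  # case mismatch: this node never sees any data
run([ReplayNode("lidar", scans), bad], ticks=2)

print(good.commands)  # [True, False]
print(bad.commands)   # [None, None]
```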

Output Overrides

Override specific outputs during replay:

let mut scheduler = Scheduler::replay_from(path)?
    .with_override("sensor_node", "temperature", 25.0f32.to_le_bytes().to_vec());
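
The override value above is a raw byte vector: 25.0f32.to_le_bytes() yields the four bytes of a little-endian IEEE 754 float32. If you build such payloads from Python tooling, the standard struct module produces the same bytes. The sketch below covers only the encoding; how the bytes are handed to HORUS from Python is not shown, since that signature is not documented in this section.

```python
import struct

# Little-endian float32, matching Rust's 25.0f32.to_le_bytes()
payload = struct.pack("<f", 25.0)
decoded = struct.unpack("<f", payload)[0]

print(list(payload))  # [0, 0, 200, 65]
print(decoded)        # 25.0

# Pitfall: a float64 encoding is 8 bytes, so a reader expecting f32 would misparse it.
wrong_width = struct.pack("<d", 25.0)
print(len(payload), len(wrong_width))  # 4 8
```

The payload width and byte order have to match the type the subscribing node expects, because an override carries no type information of its own in this form.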

CLI Commands

Record and replay from the command line:

# Start recording during a run
horus run --record my_session my_project

# List recording sessions
horus record list
horus record list --long  # Show file sizes and tick counts

# Show details of a session
horus record info my_session

# Replay a recording
horus record replay my_session
horus record replay my_session --start-tick 1000 --stop-tick 2000
horus record replay my_session --speed 0.5

# Compare two recording sessions
horus record diff session1 session2
horus record diff session1 session2 --limit 50

# Export to JSON or CSV
horus record export my_session --output data.json --format json
horus record export my_session --output data.csv --format csv

# Inject recorded nodes into a new run
horus record inject my_session --nodes camera_node,lidar_node
horus record inject my_session --all --loop

# Delete a recording session
horus record delete my_session
horus record delete my_session --force
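
Conceptually, comparing two sessions means walking both recordings tick by tick and reporting the topics whose values differ. The sketch below illustrates that on in-memory dictionaries. It is not how horus record diff is implemented, diff_sessions is a hypothetical helper, and reading --limit as "report at most N differences" is an assumption.

```python
from itertools import islice


def diff_sessions(a, b, limit=None):
    """Return (tick, topic, value_a, value_b) tuples where the sessions differ.

    a and b map tick -> {topic: value}. limit caps the number of results.
    """
    def differences():
        for tick in sorted(set(a) | set(b)):
            topics_a, topics_b = a.get(tick, {}), b.get(tick, {})
            for topic in sorted(set(topics_a) | set(topics_b)):
                if topics_a.get(topic) != topics_b.get(topic):
                    yield (tick, topic, topics_a.get(topic), topics_b.get(topic))

    return list(islice(differences(), limit))


session1 = {0: {"cmd_vel": 0.0}, 1: {"cmd_vel": 0.5}, 2: {"cmd_vel": 1.0}}
session2 = {0: {"cmd_vel": 0.0}, 1: {"cmd_vel": 0.4}, 2: {"cmd_vel": 0.9}}

print(diff_sessions(session1, session2))
# [(1, 'cmd_vel', 0.5, 0.4), (2, 'cmd_vel', 1.0, 0.9)]
print(diff_sessions(session1, session2, limit=1))
# [(1, 'cmd_vel', 0.5, 0.4)]
print(diff_sessions(session1, session1))
# []
```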

Managing Recordings

// simplified
// List all recording sessions
let sessions = Scheduler::list_recordings()?;

// Delete a recording session
Scheduler::delete_recording("old_session")?;

replay_from vs add_replay

Method | Use Case | Clock
Scheduler::replay_from(path) | Full replay: all nodes from one recording | ReplayClock (recorded timestamps)
scheduler.add_replay(path, priority) | Mixed: replay some nodes, run others live | ReplayClock for replay nodes

When to use which:

  • Use replay_from() for post-mortem debugging: replay an entire session exactly as recorded
  • Use add_replay() for regression testing: replay recorded sensor data while running a new version of your planner/controller live

// Post-mortem: "what happened in production?"
let mut scheduler = Scheduler::replay_from("crash_session.horus".into())?;
scheduler.run()?;

// Regression test: "does the new planner work with the same sensor data?"
let mut scheduler = Scheduler::new();
scheduler.add_replay("sensor_data.horus".into(), 0)?;  // recorded LiDAR + IMU
scheduler.add(NewPlannerV2::new()).order(1).build()?;    // live planner under test
scheduler.run()?;

Python Complete Recording Workflow

import horus

def sensor_tick(node):
    node.send("imu", horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81))

sensor = horus.Node(name="imu", pubs=[horus.Imu], tick=sensor_tick, rate=100)

# Step 1: Record a session
sched = horus.Scheduler(tick_rate=100, recording=True)
sched.add(sensor)
sched.run(duration=5.0)

# Step 2: Get recording files
files = sched.stop_recording()
print(f"Recorded to: {files}")

# Step 3: List and manage
for rec in sched.list_recordings():
    print(f"  Available: {rec}")

# Step 4: Full replay
sched2 = horus.Scheduler.replay_from(files[0])
sched2.run()

# Step 5: Time travel replay
sched3 = horus.Scheduler.replay_from(files[0])
sched3.start_at_tick(100)
sched3.stop_at_tick(400)
sched3.set_replay_speed(0.5)
sched3.run()

# Step 6: Mixed replay (recorded sensor + new controller)
def new_controller(node):
    pass  # placeholder for the controller logic under test

sched4 = horus.Scheduler(tick_rate=100)
sched4.add_replay("recordings/imu@001.horus", priority=0)
sched4.add(horus.Node(tick=new_controller, rate=100, order=1))
sched4.run()

Note: Python supports the full replay API: Scheduler.replay_from(), add_replay(), start_at_tick(), stop_at_tick(), set_replay_speed(), and set_replay_override(). The workflow above uses each of these except set_replay_override().


Design Decisions

Why record at the topic level instead of the node level?

Recording topic data (inputs/outputs) rather than internal node state means recordings are portable across code versions. You can replay recorded sensor data against a new planner without recompiling the sensor driver. This is the same approach used by ROS2's rosbag.

Why mixed replay instead of full-system-only replay?

The most common debugging workflow is: "replay the recorded sensors, but run my new controller live." Mixed replay enables this without re-recording. You swap out the node under test while keeping all other data identical.

Why .horus format instead of standard formats?

The .horus recording format preserves tick-level timing, shared memory layout, and type metadata. Standard formats (CSV, JSON) lose timing precision and type safety. Export to JSON/CSV is available via horus record export for analysis tools.

Trade-offs

Gain | Cost
Tick-perfect deterministic replay | Recordings grow with session length (not bounded like BlackBox)
Mixed replay enables what-if testing | Replaying with different code may produce different results (expected)
Time travel to any tick | Random access requires indexing, which adds to recording size
CLI tools for comparison and export | Custom .horus format requires HORUS tools to read

Common Errors

Symptom | Cause | Fix
Recording is empty | .with_recording() not set on the scheduler | Add .with_recording() to the scheduler builder
Replay produces different results | Code changed between recording and replay | Use the same binary version, or use mixed replay for the changed node
replay_from() fails with file not found | Incorrect recording path | Use horus record list to find available recordings
Mixed replay node does not receive data | Topic names do not match between recorded and live nodes | Verify topic names are identical (case-sensitive)
Replay runs instantly (no pacing) | Replay uses virtual time by default | Use .with_replay_speed(1.0) for real-time pacing
Large recording files filling disk | Long sessions with many topics | Use horus record delete to clean up, or record only specific sessions
horus record diff shows no differences | Sessions are identical | This confirms both runs produced the same output

See Also