Record & Replay
You need to capture a robot's execution and replay it later for debugging, regression testing, or what-if analysis. HORUS record/replay captures each node's inputs, outputs, and timing, and replays them with tick-perfect determinism.
When To Use This
- You need to reproduce a bug that only occurs in specific conditions (field test, customer site)
- You want to regression-test a new planner/controller against recorded sensor data
- You need to compare two algorithm versions on the same input data
- You are debugging a crash and need to step through the event timeline
Use BlackBox instead if you only need lightweight event logging for crash forensics. Record/Replay captures every node's inputs and outputs and is storage-heavy; BlackBox captures only event metadata and is always on.
Prerequisites
- Familiarity with Scheduler Configuration, especially .with_recording() and .deterministic(true)
- Understanding of Deterministic Mode for replay to produce identical results
- Understanding of Topics for how recorded data is injected into shared memory
Overview
The record/replay system supports:
- Full recording: Capture entire system execution
- Tick-perfect replay: Reproduce exact behavior deterministically
- Time travel: Jump to any recorded tick
- Mixed replay: Combine recorded nodes with live execution
- Playback control: Speed adjustment, tick ranges
Enabling Recording
Via Builder API
Enable recording through builder methods:
// simplified
use horus::prelude::*;
// Enable recording via builder API
let mut scheduler = Scheduler::new()
.with_recording();
Via CLI
# Record during a run
horus run --record my_session my_project
When recording is enabled, the scheduler automatically captures each node's inputs, outputs, and timing.
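To make "inputs, outputs, and timing" concrete, here is a minimal standard-library sketch of a per-tick recorder. It does not use the HORUS API: `TickRecord`, `NodeRecorder`, and the toy `double_speed` node are hypothetical names invented for illustration, and the real recorder's data layout may differ.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TickRecord:
    """One recorded tick of a single node (hypothetical layout)."""
    tick: int
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]
    duration_ns: int


@dataclass
class NodeRecorder:
    """Wraps a node's tick function and logs what goes in and out."""
    name: str
    tick_fn: Callable[[Dict[str, Any]], Dict[str, Any]]
    records: List[TickRecord] = field(default_factory=list)

    def tick(self, tick: int, inputs: Dict[str, Any]) -> Dict[str, Any]:
        start = time.perf_counter_ns()
        outputs = self.tick_fn(inputs)
        elapsed = time.perf_counter_ns() - start
        # Copy the dicts so later mutation cannot alter the recording.
        self.records.append(TickRecord(tick, dict(inputs), dict(outputs), elapsed))
        return outputs


def double_speed(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Toy node: publishes twice the commanded speed.
    return {"cmd_vel": inputs["speed"] * 2.0}


recorder = NodeRecorder("controller", double_speed)
for t in range(3):
    recorder.tick(t, {"speed": float(t)})

print([(r.tick, r.outputs["cmd_vel"]) for r in recorder.records])
```

Because every record is keyed by tick number, a replay can feed the stored inputs back in the same order and compare the outputs it gets against the stored ones.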
Replaying Recordings
Full Replay
Replay an entire recorded session:
use horus::prelude::*;
use std::path::PathBuf;
let mut scheduler = Scheduler::replay_from(
    PathBuf::from("~/.local/share/horus/recordings/crash/scheduler@abc123.horus")
)?;
scheduler.run()?;
Time Travel
Jump to specific tick ranges during replay:
// Start at a specific tick
let mut scheduler = Scheduler::replay_from(path)?
.start_at_tick(1500);
// Stop at a specific tick
let mut scheduler = Scheduler::replay_from(path)?
.stop_at_tick(2000);
// Adjust playback speed (0.01 to 100.0)
let mut scheduler = Scheduler::replay_from(path)?
.with_replay_speed(0.5); // Half speed
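The arithmetic behind a tick window and a speed factor can be sketched in a few lines of plain Python. This is an illustration, not HORUS code: `select_ticks` and `tick_delay_s` are hypothetical helpers, the 100 Hz tick rate is borrowed from the Python example on this page, and two behaviors are assumptions (both window bounds are inclusive, and speeds outside 0.01 to 100.0 are rejected rather than clamped).

```python
from typing import List, Optional

MIN_SPEED, MAX_SPEED = 0.01, 100.0  # range stated for replay speed


def tick_delay_s(tick_rate_hz: float, speed: float) -> float:
    """Seconds of wall-clock time between replayed ticks."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        # Assumption: this sketch rejects out-of-range speeds.
        raise ValueError(f"speed {speed} outside [{MIN_SPEED}, {MAX_SPEED}]")
    return 1.0 / (tick_rate_hz * speed)


def select_ticks(recorded: List[int],
                 start: Optional[int] = None,
                 stop: Optional[int] = None) -> List[int]:
    """Keep ticks in [start, stop]; both bounds inclusive in this sketch."""
    return [t for t in recorded
            if (start is None or t >= start) and (stop is None or t <= stop)]


recorded_ticks = list(range(3000))
window = select_ticks(recorded_ticks, start=1500, stop=2000)
print(len(window), window[0], window[-1])   # 501 1500 2000
print(tick_delay_s(100.0, 0.5))             # 0.02
```

At half speed a 100 Hz recording advances one tick every 20 ms instead of every 10 ms, so a 501-tick window takes about ten seconds of wall-clock time.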
Mixed Replay
Combine recorded nodes with live execution for what-if testing:
use horus::prelude::*;
use std::path::PathBuf;
let mut scheduler = Scheduler::new();
// Add replay nodes from recordings
scheduler.add_replay(
    PathBuf::from("recordings/Lidar@001.horus"),
    0, // priority
)?;
// Add live nodes alongside
scheduler.add(live_controller).order(1).build()?;
scheduler.run()?;
Output Overrides
Override specific outputs during replay:
let mut scheduler = Scheduler::replay_from(path)?
    .with_override("sensor_node", "temperature", 25.0f32.to_le_bytes().to_vec());
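The override value is passed as raw bytes. In Rust, `25.0f32.to_le_bytes().to_vec()` yields the four bytes of the IEEE 754 single-precision encoding in little-endian order. The following Python snippet builds the same payload with the standard `struct` module so you can inspect it; whether a given topic expects exactly this encoding depends on its message type, which this page does not specify.

```python
import struct

# "<f" = little-endian 32-bit float, matching Rust's f32::to_le_bytes().
payload = struct.pack("<f", 25.0)

print(payload.hex())    # 0000c841
print(list(payload))    # [0, 0, 200, 65]

# Decoding the bytes recovers the original value exactly,
# because 25.0 is representable in single precision.
(value,) = struct.unpack("<f", payload)
print(value)            # 25.0
```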
CLI Commands
Record and replay from the command line:
# Start recording during a run
horus run --record my_session my_project
# List recording sessions
horus record list
horus record list --long # Show file sizes and tick counts
# Show details of a session
horus record info my_session
# Replay a recording
horus record replay my_session
horus record replay my_session --start-tick 1000 --stop-tick 2000
horus record replay my_session --speed 0.5
# Compare two recording sessions
horus record diff session1 session2
horus record diff session1 session2 --limit 50
# Export to JSON or CSV
horus record export my_session --output data.json --format json
horus record export my_session --output data.csv --format csv
# Inject recorded nodes into a new run
horus record inject my_session --nodes camera_node,lidar_node
horus record inject my_session --all --loop
# Delete a recording session
horus record delete my_session
horus record delete my_session --force
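As a mental model for what a session comparison reports, the sketch below diffs two toy recordings tick by tick. It is not the implementation of horus record diff: the `Recording` shape, the `diff_recordings` helper, and the reading of `limit` as a cap on the number of reported differences are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory shape: tick -> {topic name: value}
Recording = Dict[int, Dict[str, Any]]
Difference = Tuple[int, str, Any, Any]


def diff_recordings(a: Recording, b: Recording,
                    limit: Optional[int] = None) -> List[Difference]:
    """Return (tick, topic, value_a, value_b) for every mismatch, in tick order.

    A topic missing on one side is reported with None for that side.
    """
    diffs: List[Difference] = []
    for tick in sorted(set(a) | set(b)):
        topics_a = a.get(tick, {})
        topics_b = b.get(tick, {})
        for topic in sorted(set(topics_a) | set(topics_b)):
            va, vb = topics_a.get(topic), topics_b.get(topic)
            if va != vb:
                diffs.append((tick, topic, va, vb))
                if limit is not None and len(diffs) >= limit:
                    return diffs
    return diffs


session1 = {0: {"cmd_vel": 0.0}, 1: {"cmd_vel": 0.5}, 2: {"cmd_vel": 1.0}}
session2 = {0: {"cmd_vel": 0.0}, 1: {"cmd_vel": 0.6}, 2: {"cmd_vel": 1.2}}

print(diff_recordings(session1, session2))
print(diff_recordings(session1, session2, limit=1))
print(diff_recordings(session1, session1))  # identical sessions give []
```

An empty result means the two runs produced the same output on every tick, which is the outcome you want when checking that a deterministic replay matches its recording.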
Managing Recordings
// simplified
// List all recording sessions
let sessions = Scheduler::list_recordings()?;
// Delete a recording session
Scheduler::delete_recording("old_session")?;
replay_from vs add_replay
| Method | Use Case | Clock |
|---|---|---|
| Scheduler::replay_from(path) | Full replay: all nodes from one recording | ReplayClock (recorded timestamps) |
| scheduler.add_replay(path, priority) | Mixed: replay some nodes, run others live | ReplayClock for replay nodes |
When to use which:
- Use replay_from() for post-mortem debugging: replay an entire session exactly as recorded
- Use add_replay() for regression testing: replay recorded sensor data while running a new version of your planner/controller live
// Post-mortem: "what happened in production?"
let mut scheduler = Scheduler::replay_from(std::path::PathBuf::from("crash_session.horus"))?;
scheduler.run()?;
// Regression test: "does the new planner work with the same sensor data?"
let mut scheduler = Scheduler::new();
scheduler.add_replay("sensor_data.horus".into(), 0)?; // recorded LiDAR + IMU
scheduler.add(NewPlannerV2::new()).order(1).build()?; // live planner under test
scheduler.run()?;
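The regression-testing idea can be tried without any framework: hold the recorded input fixed and run two versions of the logic against it. The sketch below uses made-up distance readings and two hypothetical planners (`planner_v1`, `planner_v2`); none of these names come from HORUS.

```python
from typing import Callable, List

# Made-up "recorded" sensor data: minimum obstacle distance per tick, in metres.
recorded_scan: List[float] = [2.0, 1.5, 0.4, 0.3, 1.8]


def planner_v1(distance: float) -> float:
    # Old behavior: full speed unless an obstacle is closer than 0.5 m.
    return 0.0 if distance < 0.5 else 1.0


def planner_v2(distance: float) -> float:
    # New behavior: same stop rule, but slow down as obstacles approach.
    return 0.0 if distance < 0.5 else min(1.0, distance / 2.0)


def run_against(recorded: List[float],
                planner: Callable[[float], float]) -> List[float]:
    """Feed the same recorded inputs to a planner, one tick at a time."""
    return [planner(d) for d in recorded]


old = run_against(recorded_scan, planner_v1)
new = run_against(recorded_scan, planner_v2)
changed = [tick for tick, (o, n) in enumerate(zip(old, new)) if o != n]

print(old)      # [1.0, 1.0, 0.0, 0.0, 1.0]
print(new)      # [1.0, 0.75, 0.0, 0.0, 0.9]
print(changed)  # [1, 4]
```

Because the input is identical in both runs, every changed tick is attributable to the planner change alone, which is what mixed replay gives you at full-system scale.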
Python Complete Recording Workflow
import horus
def sensor_tick(node):
    node.send("imu", horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81))
sensor = horus.Node(name="imu", pubs=[horus.Imu], tick=sensor_tick, rate=100)
# Step 1: Record a session
sched = horus.Scheduler(tick_rate=100, recording=True)
sched.add(sensor)
sched.run(duration=5.0)
# Step 2: Get recording files
files = sched.stop_recording()
print(f"Recorded to: {files}")
# Step 3: List and manage
for rec in sched.list_recordings():
    print(f" Available: {rec}")
# Step 4: Full replay
sched2 = horus.Scheduler.replay_from(files[0])
sched2.run()
# Step 5: Time travel replay
sched3 = horus.Scheduler.replay_from(files[0])
sched3.start_at_tick(100)
sched3.stop_at_tick(400)
sched3.set_replay_speed(0.5)
sched3.run()
# Step 6: Mixed replay (recorded sensor + new controller)
def new_controller(node):
    pass  # placeholder: put the live controller logic under test here
sched4 = horus.Scheduler(tick_rate=100)
sched4.add_replay("recordings/imu@001.horus", priority=0)
sched4.add(horus.Node(tick=new_controller, rate=100, order=1))
sched4.run()
Note: Python supports the full replay API: Scheduler.replay_from(), add_replay(), start_at_tick(), stop_at_tick(), set_replay_speed(), and set_replay_override(). The workflow above shows all of these except set_replay_override().
Design Decisions
Why record at the topic level instead of the node level?
Recording topic data (inputs/outputs) rather than internal node state means recordings are portable across code versions. You can replay recorded sensor data against a new planner without recompiling the sensor driver. This is the same approach used by ROS2's rosbag.
Why mixed replay instead of full-system-only replay?
The most common debugging workflow is: "replay the recorded sensors, but run my new controller live." Mixed replay enables this without re-recording. You swap out the node under test while keeping all other data identical.
Why .horus format instead of standard formats?
The .horus recording format preserves tick-level timing, shared memory layout, and type metadata. Standard formats (CSV, JSON) lose timing precision and type safety. Export to JSON/CSV is available via horus record export for analysis tools.
Trade-offs
| Gain | Cost |
|---|---|
| Tick-perfect deterministic replay | Recordings grow with session length (not bounded like BlackBox) |
| Mixed replay enables what-if testing | Replaying with different code may produce different results (expected) |
| Time travel to any tick | Random access requires indexing, which adds to recording size |
| CLI tools for comparison and export | Custom .horus format requires HORUS tools to read |
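The indexing trade-off in the table can be illustrated with a toy file layout. This is not the .horus format: the length-prefixed framing and the `write_recording` and `read_tick` helpers are assumptions chosen to show why random access needs an index and why that index costs extra space.

```python
import io
import struct
from typing import List, Tuple


def write_recording(payloads: List[bytes]) -> Tuple[bytes, List[int]]:
    """Serialize one payload per tick and remember where each one starts."""
    buf = io.BytesIO()
    index: List[int] = []
    for payload in payloads:
        index.append(buf.tell())                    # byte offset of this tick
        buf.write(struct.pack("<I", len(payload)))  # 4-byte length prefix
        buf.write(payload)
    return buf.getvalue(), index


def read_tick(data: bytes, index: List[int], tick: int) -> bytes:
    """Jump straight to one tick without scanning the ticks before it."""
    offset = index[tick]
    (length,) = struct.unpack_from("<I", data, offset)
    return data[offset + 4: offset + 4 + length]


data, index = write_recording([b"a", b"bcd", b"ef"])
print(index)                      # [0, 5, 12]
print(read_tick(data, index, 1))  # b'bcd'
print(len(data))                  # 18
```

Without the index, reaching tick N means reading every length prefix before it. With the index, the lookup is a single seek, at the cost of one stored offset per recorded tick.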
Common Errors
| Symptom | Cause | Fix |
|---|---|---|
| Recording is empty | .with_recording() not set on the scheduler | Add .with_recording() to the scheduler builder |
| Replay produces different results | Code changed between recording and replay | Use the same binary version, or use mixed replay for the changed node |
| replay_from() fails with file not found | Incorrect recording path | Use horus record list to find available recordings |
| Mixed replay node does not receive data | Topic names do not match between recorded and live nodes | Verify topic names are identical (case-sensitive) |
| Replay runs instantly (no pacing) | Replay uses virtual time by default | Use .with_replay_speed(1.0) for real-time pacing |
| Large recording files filling disk | Long sessions with many topics | Use horus record delete to clean up, or record only specific sessions |
| horus record diff shows no differences | Sessions are identical | This confirms both runs produced the same output |
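Topic-name mismatches are easy to miss when the names differ only by case. The helper below is a hypothetical standard-library check, not part of HORUS, and the topic names are made up; it compares the topics a live node subscribes to against the topics present in a recording and suggests a likely match when only the case differs.

```python
from typing import Dict, Iterable, Optional


def find_topic_mismatches(recorded: Iterable[str],
                          live: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each unmatched live topic to a recorded name that differs only by case.

    The value is None when no recorded topic is even a case-insensitive match.
    """
    recorded_list = list(recorded)
    recorded_set = set(recorded_list)
    by_lower: Dict[str, str] = {}
    for name in recorded_list:
        by_lower.setdefault(name.lower(), name)

    report: Dict[str, Optional[str]] = {}
    for name in live:
        if name in recorded_set:
            continue  # exact, case-sensitive match: nothing to report
        report[name] = by_lower.get(name.lower())
    return report


recorded_topics = ["scan", "imu", "cmd_vel"]
live_subscriptions = ["Scan", "imu", "odom"]

print(find_topic_mismatches(recorded_topics, live_subscriptions))
# {'Scan': 'scan', 'odom': None}
```

Here "Scan" would never receive data from the recorded "scan" topic, while "odom" is simply absent from the recording.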
See Also
- BlackBox Flight Recorder: Lightweight event recording for crash forensics
- Deterministic Mode: Required for bit-identical replay
- Scheduler Configuration: .with_recording() and .deterministic(true) builder methods
- Time API: ReplayClock and time backends