Architecture Overview

HORUS is built on four foundational concepts that work together to create a high-performance robotics runtime:

[Diagram: The Four Pillars of HORUS]

The Node Model

Everything in HORUS is a Node. A node is an independent unit of computation with a well-defined lifecycle.

Why Nodes?

Robotics systems are inherently modular. A robot has sensors, actuators, planners, and controllers - each with different timing requirements and failure modes. By making each component a node, HORUS provides:

  • Isolation - A failing camera driver doesn't crash your motion controller
  • Composability - Mix and match nodes to build different robots
  • Testability - Test each node independently before integration
  • Reusability - Share nodes across projects via the package registry

Node Lifecycle

Every node follows the same lifecycle, ensuring predictable behavior:

[Diagram: Node State Machine]
State         | What Happens
Uninitialized | Node exists but hasn't started
Initializing  | Setting up resources, connecting to hardware
Running       | Actively processing; tick() is called each cycle
Paused        | Temporarily suspended, can resume instantly
Stopping      | Cleaning up, releasing resources
Stopped       | Fully shut down
Error         | Something went wrong, but recoverable
Crashed       | Unrecoverable failure
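The lifecycle above can be modeled as a small state machine. The following sketch is illustrative only — the enum variants mirror the table, but the `NodeState` type and `can_transition_to` helper are hypothetical, not the actual HORUS API:

```rust
// Hypothetical model of the node lifecycle as a state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState {
    Uninitialized,
    Initializing,
    Running,
    Paused,
    Stopping,
    Stopped,
    Error,
    Crashed,
}

impl NodeState {
    /// Returns true if moving from `self` to `next` is a legal transition.
    fn can_transition_to(self, next: NodeState) -> bool {
        use NodeState::*;
        matches!(
            (self, next),
            (Uninitialized, Initializing)
                | (Initializing, Running)
                | (Initializing, Error)
                | (Running, Paused)
                | (Running, Stopping)
                | (Running, Error)
                | (Paused, Running)
                | (Paused, Stopping)
                | (Stopping, Stopped)
                | (Error, Initializing) // recoverable: re-initialize
                | (Error, Crashed)      // unrecoverable failure
        )
    }
}

fn main() {
    let s = NodeState::Uninitialized;
    assert!(s.can_transition_to(NodeState::Initializing));
    assert!(!s.can_transition_to(NodeState::Running)); // must initialize first
    println!("lifecycle transitions check out");
}
```

Encoding the legal transitions explicitly is what makes the lifecycle predictable: a node can never skip initialization or resume from a crashed state.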

The Tick Model

Nodes don't run continuously - they tick. Each tick is a discrete unit of work:

fn tick(&mut self) {
    // Read inputs
    if let Some(sensor_data) = self.sensor_topic.recv() {
        // Process
        let command = self.compute_response(sensor_data);

        // Write outputs
        self.command_topic.send(command);
    }
}

This model enables:

  • Deterministic timing - Know exactly when each node runs
  • Profiling - Measure how long each tick takes
  • Scheduling intelligence - The scheduler can optimize execution order
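The tick model above can be sketched as a fixed-rate executor loop. This is a minimal illustration, assuming a `Node` trait with a `tick` method — the trait and `run_at_rate` function are hypothetical names, not the HORUS scheduler API:

```rust
use std::time::{Duration, Instant};

// Illustrative node trait; HORUS's actual trait may differ.
trait Node {
    fn tick(&mut self);
}

struct Counter {
    count: u64,
}

impl Node for Counter {
    fn tick(&mut self) {
        self.count += 1; // stand-in for real work
    }
}

/// Run `node` at a fixed rate for `cycles` ticks, tracking the worst tick time.
fn run_at_rate(node: &mut dyn Node, period: Duration, cycles: u32) -> Duration {
    let mut worst = Duration::ZERO;
    for _ in 0..cycles {
        let start = Instant::now();
        node.tick();
        let elapsed = start.elapsed();
        worst = worst.max(elapsed);
        if elapsed < period {
            // Sleep out the remainder of the cycle to hold the rate.
            std::thread::sleep(period - elapsed);
        }
    }
    worst
}

fn main() {
    let mut node = Counter { count: 0 };
    let worst = run_at_rate(&mut node, Duration::from_millis(1), 10);
    assert_eq!(node.count, 10);
    println!("worst tick: {:?}", worst);
}
```

Because each tick is bracketed by timestamps, the same loop that drives execution also yields profiling data for free — which is exactly what makes the tick model amenable to scheduling intelligence.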

Communication

Nodes need to exchange data. HORUS provides a single Topic API that automatically selects the optimal backend based on how many nodes are communicating and whether they're in the same process.

Topic: One API, Automatic Optimization

You always use the same Topic::new() call. HORUS automatically detects the topology and selects the fastest communication path — from ~3 ns (same-thread) to ~167 ns (cross-process, many-to-many):

// Same API for all communication patterns
let topic: Topic<Image> = Topic::new("camera.image")?;
topic.send(&frame);

// Another node subscribes — same API
let topic: Topic<Image> = Topic::new("camera.image")?;
if let Some(frame) = topic.recv() {
    // Process frame
}

No configuration needed — the backend is selected and upgraded transparently as participants join or leave.

Cross-Process Communication

Topics work transparently across process boundaries using shared memory:

[Diagram: Transparent Cross-Process Communication]

Data goes through shared memory with sub-microsecond latency. Simple fixed-size types get an even faster zero-copy path automatically.


The Scheduler

The scheduler is the brain of HORUS. It decides when and how nodes execute.

Why a Scheduler?

Without coordination, nodes would:

  • Fight for CPU resources
  • Miss real-time deadlines
  • Waste cycles waiting for data that hasn't arrived

The HORUS scheduler solves these problems with intelligent orchestration.

Execution Modes

Different applications need different scheduling strategies:

[Diagram: Execution Modes]
Mode       | Best For                   | Tick Overhead
Sequential | Safety-critical, debugging | Minimal
Parallel   | CPU-heavy workloads        | Varies by core count
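The key contract between the two modes is that independent nodes produce the same results either way — only the execution strategy changes. A minimal sketch of that idea, using plain threads to stand in for the parallel mode (the function names here are illustrative, not HORUS scheduler APIs):

```rust
use std::thread;

// Stand-in for an independent node's tick producing a result.
fn tick_workload(id: usize) -> usize {
    id * 2
}

/// Sequential mode: run each tick in order on one thread.
fn run_sequential(n: usize) -> Vec<usize> {
    (0..n).map(tick_workload).collect()
}

/// Parallel mode: run each tick on its own thread, then join.
fn run_parallel(n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || tick_workload(i)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Same outputs, different execution strategy.
    assert_eq!(run_sequential(4), run_parallel(4));
    println!("both modes produce identical outputs");
}
```

Sequential mode trades throughput for determinism and easy debugging; parallel mode trades per-tick coordination overhead for multi-core throughput.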

Profiling

The scheduler tracks node execution statistics for diagnostics and optimization:

  • Runtime Profiler - Tracks how long each node takes (mean, stddev, min/max)
  • Node Tiers - Annotate nodes with execution characteristics (UltraFast, Fast, Normal, etc.)
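The statistics the profiler reports (mean, stddev, min/max) can be maintained online without storing every sample, e.g. with Welford's algorithm. This sketch shows the idea; the `TickStats` type is hypothetical, not the actual HORUS profiler:

```rust
/// Online per-node tick statistics via Welford's algorithm.
/// Illustrative model of what a runtime profiler tracks.
#[derive(Default)]
struct TickStats {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
    min: f64,
    max: f64,
}

impl TickStats {
    fn record(&mut self, tick_ns: f64) {
        self.n += 1;
        if self.n == 1 {
            self.min = tick_ns;
            self.max = tick_ns;
        } else {
            self.min = self.min.min(tick_ns);
            self.max = self.max.max(tick_ns);
        }
        // Welford update: numerically stable running mean and variance.
        let delta = tick_ns - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (tick_ns - self.mean);
    }

    fn stddev(&self) -> f64 {
        if self.n < 2 {
            0.0
        } else {
            (self.m2 / (self.n - 1) as f64).sqrt()
        }
    }
}

fn main() {
    let mut stats = TickStats::default();
    for t in [100.0, 110.0, 90.0, 105.0] {
        stats.record(t);
    }
    assert_eq!(stats.min, 90.0);
    assert_eq!(stats.max, 110.0);
    println!("mean={:.2} ns, stddev={:.2} ns", stats.mean, stats.stddev());
}
```

An online formulation matters here because the profiler runs on every tick: it must cost O(1) time and memory per sample.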

Safety Systems

Real robots need safety guarantees. The scheduler provides:

Feature         | Purpose
WCET Monitoring | Detect nodes exceeding time budgets
Fault Tolerance | Isolate failing nodes automatically
Watchdog Timers | Detect hung nodes
Black Box       | Flight recorder for post-mortem analysis
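The essence of WCET monitoring is comparing each tick's measured duration against a declared budget. A minimal sketch, assuming a hypothetical `tick_with_budget` helper (not the HORUS API):

```rust
use std::time::{Duration, Instant};

/// Run one tick and report whether it stayed within its time budget.
/// Ok(elapsed) if within budget, Err(elapsed) if the budget was exceeded.
fn tick_with_budget<F: FnOnce()>(tick: F, budget: Duration) -> Result<Duration, Duration> {
    let start = Instant::now();
    tick();
    let elapsed = start.elapsed();
    if elapsed <= budget {
        Ok(elapsed)
    } else {
        Err(elapsed)
    }
}

fn main() {
    // A trivial tick easily fits a 10 ms budget.
    let ok = tick_with_budget(|| { let _ = 1 + 1; }, Duration::from_millis(10));
    assert!(ok.is_ok());

    // A deliberately slow tick blows a 1 ms budget.
    let slow = tick_with_budget(
        || std::thread::sleep(Duration::from_millis(5)),
        Duration::from_millis(1),
    );
    // On a budget violation, a scheduler could isolate or restart the node.
    assert!(slow.is_err());
    println!("budget checks complete");
}
```

A watchdog timer is the complementary check: instead of measuring a tick that finished late, it fires when a tick fails to finish at all.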

Memory System

Large data (images, point clouds, ML tensors) needs special handling. Copying a 4K image between nodes would destroy performance.

Zero-Copy Design

HORUS uses shared memory pools for large data:

[Diagram: Zero-Copy: Write Once, Read Many]

The image data is written once to shared memory. Each subscriber reads directly from the same memory location - no copying.

TensorPool

TensorPool manages shared memory allocation:

// Auto-managed pool via Topic<Tensor>
let topic: Topic<Tensor> = Topic::new("camera.rgb")?;
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;

// Write data (only done once)
let data = handle.data_slice_mut()?;
camera.capture_into(data);

// Send through Topic - only a lightweight descriptor is copied, not the image
topic.send_handle(&handle);

TensorPool characteristics:

  • Fast allocation (~100ns)
  • Automatic reference counting
  • Works across processes
  • Device-aware descriptors (CPU, future GPU support)
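The reference-counting behavior can be illustrated with standard Rust: one buffer is written once, and handing it to subscribers only copies a handle. Here `Arc` stands in for the pool's reference-counted, cross-process handle (a simplification — TensorPool uses shared memory, not the heap):

```rust
use std::sync::Arc;

fn main() {
    // "Write once": allocate and fill one 1080x1920x3 image buffer.
    let frame: Arc<Vec<u8>> = Arc::new(vec![0u8; 1080 * 1920 * 3]);

    // "Read many": cloning the handle copies a pointer, not the image.
    let subscriber_a = Arc::clone(&frame);
    let subscriber_b = Arc::clone(&frame);

    // Both subscribers see the exact same allocation.
    assert!(std::ptr::eq(subscriber_a.as_ptr(), subscriber_b.as_ptr()));

    // The count tracks live handles; the buffer frees when it hits zero.
    assert_eq!(Arc::strong_count(&frame), 3);
    println!("one allocation, three handles, zero copies");
}
```

The same bookkeeping, moved into shared memory with process-safe counters, is what lets the buffer outlive any single process while still being freed exactly once.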

Python Integration

Python nodes share the same memory pool:

import horus
import numpy as np

# Receive tensor from Rust node
tensor = topic.recv()

# Zero-copy numpy view - no data copied!
array = np.array(tensor, copy=False)

# Process with numpy/PyTorch
result = model.predict(array)

Data Flow Example

Here's how these concepts work together in a typical perception-to-action pipeline:

[Diagram: Perception-Planning-Control Pipeline]
Connection           | Mechanism  | Why
Camera → Detector    | TensorPool | Large image, zero-copy
Detector → Planner   | Topic      | Multiple planners might subscribe
Planner → Controller | Topic      | Monitoring tools can observe
Controller → Motors  | Topic      | Direct pipeline connection

Total pipeline latency: under 1 microsecond for message passing (same-process backends).


Performance Summary

Metric                  | Value
Same-thread topic       | ~3 ns
Same-process topic      | ~18-36 ns
Cross-process topic     | ~50-167 ns
Scheduler tick overhead | ~50-100 ns
TensorPool allocation   | ~100 ns

Design Philosophy

HORUS is built on these principles:

  1. Nodes are the unit of composition - Build robots by connecting nodes
  2. Communication is explicit - No hidden data flow, everything goes through Topic
  3. The scheduler is your friend - Let it optimize; don't fight it
  4. Zero-copy by default - Large data should never be copied unnecessarily
  5. Safety is not optional - Fault tolerance, watchdogs, and black boxes are built in

Next Steps