Multi-Process Architecture
HORUS topics work transparently across process boundaries. Two nodes in separate processes communicate with exactly the same code as two nodes in the same process; across processes, the transport is shared memory. No broker, no serialization layer, no configuration.
To orchestrate multiple processes with session discovery, control routing, and e-stop propagation, see Launch System.
// simplified
// Process 1: sensor.rs
let topic: Topic<Imu> = Topic::new("imu")?;
topic.send(imu_reading);
// Process 2: controller.rs
let topic: Topic<Imu> = Topic::new("imu")?; // same name = same topic
if let Some(reading) = topic.recv() {
// Got it — zero-config, sub-microsecond
}
How It Works
When you call Topic::new("imu"), HORUS creates (or opens) a shared memory region. Any process on the same machine that calls Topic::new("imu") with the same type connects to the same underlying ring buffer. The shared memory backend is managed by horus_sys — you never configure paths manually.
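The exact mapping is internal to horus_sys, but the idea can be illustrated with a toy example. The naming scheme below is a made-up assumption, not the real layout; the point is only that the mapping is deterministic, so independently started processes converge on the same region.

```rust
// Illustration only: a hypothetical name-to-path scheme, not horus_sys internals.
fn shm_path_for(topic: &str) -> String {
    // The same topic name always yields the same path, so any process that
    // opens "imu" lands on the same shared memory region.
    format!("/dev/shm/horus_{}", topic.replace('.', "_"))
}

fn main() {
    assert_eq!(shm_path_for("imu"), "/dev/shm/horus_imu");
    assert_eq!(shm_path_for("wheel.encoders"), "/dev/shm/horus_wheel_encoders");
}
```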
HORUS auto-detects whether a topic is same-process or cross-process and picks the fastest path:
| Scenario | Latency | How It Works |
|---|---|---|
| Same thread | ~3ns | Direct pointer handoff |
| Same process, 1:1 | ~18ns | Lock-free single-producer/single-consumer ring buffer |
| Same process, 1:N | ~24ns | Broadcast to multiple in-process subscribers |
| Same process, N:1 | ~26ns | Multiple in-process publishers, one subscriber |
| Same process, N:N | ~36ns | Full many-to-many in-process |
| Cross-process, POD type | ~50ns | Zero-copy shared memory (no serialization) |
| Cross-process, N:1 | ~65ns | Shared memory, multiple publishers |
| Cross-process, 1:N | ~70ns | Shared memory, multiple subscribers |
| Cross-process, 1:1 | ~85ns | Shared memory, serialized type |
| Cross-process, N:N | ~91ns | Shared memory, contention-free fan-out |
Cross-process adds ~30-130ns vs in-process — still sub-microsecond. You don't configure any of this. The backend is selected automatically based on topology and upgrades transparently as participants join or leave.
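The selection rule can be pictured as a function of topology. The sketch below is conceptual, with assumed backend names and decision logic; it is not the actual HORUS implementation.

```rust
// Conceptual sketch of topology-based backend selection. Backend names and
// the decision logic here are illustrative assumptions, not HORUS internals.
#[derive(Debug, PartialEq)]
enum Backend {
    SpscRing,     // same process, one publisher, one subscriber
    InProcFanout, // same process, any other topology
    SharedMemory, // publishers and subscribers in different processes
}

fn select_backend(same_process: bool, publishers: usize, subscribers: usize) -> Backend {
    match (same_process, publishers, subscribers) {
        (true, 1, 1) => Backend::SpscRing,
        (true, _, _) => Backend::InProcFanout,
        (false, _, _) => Backend::SharedMemory,
    }
}

fn main() {
    // Re-evaluated as participants join or leave, so the backend upgrades in place.
    assert_eq!(select_backend(true, 1, 1), Backend::SpscRing);
    assert_eq!(select_backend(true, 1, 3), Backend::InProcFanout);
    assert_eq!(select_backend(false, 2, 2), Backend::SharedMemory);
}
```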
Running Multiple Processes
Option 1: horus run with Multiple Files
# Builds and runs both files as separate processes
horus run sensor.rs controller.rs
# Mixed languages work too
horus run sensor.py controller.rs
# With release optimizations
horus run -r sensor.rs controller.rs
horus run compiles each file, then launches all processes and manages their lifecycle (SIGTERM on Ctrl+C, etc.).
Option 2: Separate Terminals
Run each node in its own terminal:
# Terminal 1
horus run sensor.rs
# Terminal 2
horus run controller.rs
Topics auto-discover via shared memory. No coordination needed.
Option 3: horus launch (YAML)
For production, declare your multi-process layout in a launch file:
# launch.yaml
nodes:
  - name: sensor
    cmd: horus run sensor.rs
  - name: controller
    cmd: horus run controller.rs
  - name: monitor
    cmd: horus run monitor.py
horus launch launch.yaml
Example: Two-Process Sensor Pipeline
Process 1 — sensor.rs:
// simplified
use horus::prelude::*;
message! {
WheelEncoder {
left_ticks: i64,
right_ticks: i64,
timestamp_ns: u64,
}
}
struct EncoderNode {
publisher: Topic<WheelEncoder>,
ticks: i64,
}
impl EncoderNode {
fn new() -> Result<Self> {
Ok(Self {
publisher: Topic::new("wheel.encoders")?,
ticks: 0,
})
}
}
impl Node for EncoderNode {
fn name(&self) -> &str { "Encoder" }
fn tick(&mut self) {
self.ticks += 10;
self.publisher.send(WheelEncoder {
left_ticks: self.ticks,
right_ticks: self.ticks + 2,
timestamp_ns: horus::now_ns(),
});
}
}
fn main() -> Result<()> {
let mut sched = Scheduler::new().tick_rate(100_u64.hz());
sched.add(EncoderNode::new()?).order(0).build()?;
sched.run()?;
Ok(())
}
Process 2 — odometry.rs:
// simplified
use horus::prelude::*;
message! {
WheelEncoder {
left_ticks: i64,
right_ticks: i64,
timestamp_ns: u64,
}
}
struct OdometryNode {
encoder_sub: Topic<WheelEncoder>,
odom_pub: Topic<Odometry>,
last_left: i64,
last_right: i64,
}
impl OdometryNode {
fn new() -> Result<Self> {
Ok(Self {
encoder_sub: Topic::new("wheel.encoders")?,
odom_pub: Topic::new("odom")?,
last_left: 0,
last_right: 0,
})
}
}
impl Node for OdometryNode {
fn name(&self) -> &str { "Odometry" }
fn tick(&mut self) {
if let Some(enc) = self.encoder_sub.recv() {
let dl = enc.left_ticks - self.last_left;
let dr = enc.right_ticks - self.last_right;
self.last_left = enc.left_ticks;
self.last_right = enc.right_ticks;
println!("[Odom] delta L={} R={}", dl, dr);
}
}
}
fn main() -> Result<()> {
let mut sched = Scheduler::new().tick_rate(100_u64.hz());
sched.add(OdometryNode::new()?).order(0).build()?;
sched.run()?;
Ok(())
}
Run them:
# Terminal 1
horus run sensor.rs
# Terminal 2
horus run odometry.rs
The WheelEncoder messages flow through shared memory at ~50ns latency, with zero configuration.
When to Use Multi-Process
| Factor | Single Process | Multi-Process |
|---|---|---|
| Latency | ~3-36ns (intra-process) | ~50-171ns (cross-process) |
| Determinism | Full control via scheduler ordering | Each process has its own scheduler |
| Isolation | A crash takes down everything | A crash is contained to one process |
| Languages | Single language per binary | Mix Rust + Python freely |
| Restart | Must restart everything | Restart one process independently |
| Debugging | Single debugger session | Attach debugger to one process |
| Deployment | One binary to deploy | Multiple binaries |
| Complexity | Simpler | More moving parts |
Use single-process when:
- All nodes are the same language
- You need deterministic ordering between nodes (e.g., sensor → controller → actuator; see the sketch after this list)
- Latency matters at the nanosecond level
- Simpler deployment is preferred
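For comparison, here is a minimal single-process sketch of that ordering guarantee. It reuses the Scheduler and Node API from the examples above; the three node types are placeholders.

```rust
// Minimal single-process sketch. SensorNode/ControllerNode/ActuatorNode are
// placeholder nodes; the Scheduler/Node API follows the examples above.
use horus::prelude::*;

struct SensorNode;
struct ControllerNode;
struct ActuatorNode;

impl Node for SensorNode {
    fn name(&self) -> &str { "Sensor" }
    fn tick(&mut self) { /* read hardware, publish */ }
}
impl Node for ControllerNode {
    fn name(&self) -> &str { "Controller" }
    fn tick(&mut self) { /* consume sensor data, publish commands */ }
}
impl Node for ActuatorNode {
    fn name(&self) -> &str { "Actuator" }
    fn tick(&mut self) { /* apply commands */ }
}

fn main() -> Result<()> {
    let mut sched = Scheduler::new().tick_rate(100_u64.hz());
    // order() pins per-tick execution: sensor -> controller -> actuator.
    sched.add(SensorNode).order(0).build()?;
    sched.add(ControllerNode).order(1).build()?;
    sched.add(ActuatorNode).order(2).build()?;
    sched.run()?;
    Ok(())
}
```

Each tick runs sensor, then controller, then actuator, in that order; splitting these into separate processes trades that guarantee for isolation.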
Use multi-process when:
- Mixing Rust and Python (e.g., Rust motor control + Python ML inference)
- Process isolation is needed (safety-critical separation)
- Independent restart required (update one node without stopping others)
- Different update rates or lifecycle requirements
Introspection
HORUS CLI tools work across processes automatically:
# See all topics (from any process)
horus topic list
# Monitor a topic published by another process
horus topic echo wheel.encoders
# See all running nodes across processes
horus node list
# Check bandwidth across processes
horus topic bw wheel.encoders
Cleaning Up
Shared memory files persist after processes exit. Clean them with:
horus clean --shm # Remove stale shared memory regions
In practice, you rarely need this — HORUS automatically cleans stale SHM on every horus CLI command and every Scheduler::new() call. The manual command is an escape hatch for debugging.
What Happens When a Process Crashes
When a process dies (even via SIGKILL or power loss):
- SHM files persist — the kernel closes the file descriptor and releases flock locks, but the mmap'd file stays on disk
- Other processes continue — subscribers see dropped_count() increase if the publisher was mid-write, but they don't crash
- Backend auto-migrates — when the crashed process restarts and reconnects, the topic detects the new participant and migrates the backend (e.g., from 1:1 to 1:N) within ~10μs
- Automatic cleanup — the next horus CLI command or Scheduler::new() call auto-cleans stale namespaces (<1ms). No manual intervention needed.
# Process 1 crashes
# Process 2 keeps running, reading stale data from the ring buffer
# Process 1 restarts
# Process 2 sees fresh data again — no reconfiguration needed
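A minimal sketch of how a subscriber can ride this out, assuming dropped_count() is a method on the subscriber's Topic handle (as described above) and reusing the WheelEncoder message from the pipeline example:

```rust
// Sketch only; assumes Topic::dropped_count() returns a monotonically
// increasing counter, as the behavior described above implies.
fn poll_encoder(sub: &mut Topic<WheelEncoder>, last_dropped: &mut u64) {
    if let Some(enc) = sub.recv() {
        // Fresh data: the publisher is alive, or has restarted and reconnected.
        println!("ticks L={} R={}", enc.left_ticks, enc.right_ticks);
    }
    let dropped = sub.dropped_count();
    if dropped > *last_dropped {
        // The publisher died mid-write or we fell behind; log it and keep running.
        eprintln!("gap detected: {} messages dropped", dropped - *last_dropped);
        *last_dropped = dropped;
    }
}
```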
Type mismatches: If a restarted process changes its message type (e.g., from CmdVel to Twist), the join fails with an error. Both processes must use the same message type for the same topic name.
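For the same reason, it is worth handling the Result from Topic::new instead of unwrapping it when multiple processes share topics. A minimal sketch (the topic name is illustrative; CmdVel is the message type referenced above):

```rust
// Sketch only: surface a join failure (such as a type mismatch) instead of panicking.
fn open_cmd_topic() -> Option<Topic<CmdVel>> {
    match Topic::new("cmd_vel") {
        Ok(topic) => Some(topic),
        Err(e) => {
            eprintln!("could not join topic 'cmd_vel' (type mismatch?): {e}");
            None
        }
    }
}
```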
Mixed-Language Multi-Process
The most common multi-process pattern: Rust for control loops, Python for ML inference.
Rust sensor node (sensor.rs):
// simplified
use horus::prelude::*;
struct CameraNode {
pub_img: Topic<Image>,
}
impl CameraNode {
fn new() -> Result<Self> {
Ok(Self { pub_img: Topic::new("camera.rgb")? })
}
}
impl Node for CameraNode {
fn name(&self) -> &str { "camera" }
fn tick(&mut self) {
let mut img = Image::new(640, 480, "rgb8");
// ... capture from hardware ...
self.pub_img.send(img);
}
}
fn main() -> Result<()> {
let mut sched = Scheduler::new().tick_rate(30_u64.hz());
sched.add(CameraNode::new()?).order(0).build()?;
sched.run()
}
Run both:
# Terminal 1 (Rust, 30 FPS camera)
horus run sensor.rs
# Terminal 2 (Python, ML inference)
horus run detector.py
The Image flows through a pool-backed shared-memory transport — the Python node gets a zero-copy view of the pixels the Rust node wrote. No serialization, no copying.
Debugging Multi-Process Systems
Identify which process owns what
horus topic list --verbose
# Shows publisher/subscriber PIDs per topic
horus node list
# Shows all running nodes across all processes with PID, rate, CPU, memory
Watch cross-process data flow
# Monitor messages from the Rust sensor in the Python process's terminal
horus topic echo camera.rgb
# Measure the actual publishing rate
horus topic hz camera.rgb
# Measure bandwidth
horus topic bw camera.rgb
Debug one process at a time
# Start the sensor normally
horus run sensor.rs
# Start the detector with verbose logging
RUST_LOG=debug horus run detector.py
Use the monitor for a system-wide view
horus monitor
# Web UI at http://localhost:3000 shows ALL nodes from ALL processes
# Topic graph view shows cross-process message flow
Common debugging workflow
1. horus topic list — verify both processes see the same topics
2. horus topic hz <topic> — verify the publisher is sending at the expected rate
3. horus topic echo <topic> — verify message content is correct
4. horus node list — verify both nodes are Running (not Error or Crashed)
5. horus bb --anomalies — check for deadline misses or errors
Common Errors
| Error | Cause | Fix |
|---|---|---|
| Topics not visible across processes | Different SHM namespaces | Set HORUS_NAMESPACE=shared in both terminals, or use horus launch |
| Type mismatch on topic join | Process A uses CmdVel, Process B uses a different type for the same name | Ensure both processes use the exact same message type |
| Stale data after crash | SHM files persist after process death | Usually auto-cleaned on next horus run. Manual: horus clean --shm |
| Topic not found in CLI | CLI uses a different namespace than the running app | Run CLI in same terminal or set matching HORUS_NAMESPACE |
| High dropped_count | Subscriber process is slower than publisher | Increase subscriber rate, reduce publisher rate, or increase topic capacity |
| Permission denied on SHM | Different users running processes | Run both as the same user, or check /dev/shm permissions |
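For the namespace rows above, a quick sketch of setting a matching namespace in both terminals (assuming HORUS_NAMESPACE is read from the environment, as the table implies):

```
# Terminal 1
HORUS_NAMESPACE=shared horus run sensor.rs
# Terminal 2
HORUS_NAMESPACE=shared horus run controller.rs
```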
Design Decisions
Why Auto-Discovery via Shared Memory Names
When you call Topic::new("imu") in two separate processes, both connect to the same shared memory region because the topic name deterministically maps to a shared memory path (managed by horus_sys). There is no registration step, no discovery protocol, and no configuration file listing topic endpoints. This works because shared memory is a kernel-level namespace — any process on the same machine that opens the same named region gets the same memory. Auto-discovery eliminates an entire class of misconfiguration bugs ("I forgot to register my topic") and means processes can start and stop in any order.
Why No Broker Process
Message brokers (like DDS in ROS2 or MQTT) add a routing hop between every publisher and subscriber. Even with optimizations, this hop adds latency and creates a single point of failure — if the broker crashes, all communication stops. HORUS uses direct shared memory: publishers write to a ring buffer, subscribers read from it, and no intermediary process routes messages. This gives sub-microsecond latency and means there is no central process that can fail. The cost is that HORUS topics only work on a single machine (cross-machine communication requires an explicit bridge).
Why Transparent Same-Process vs Cross-Process Selection
HORUS automatically detects whether a publisher and subscriber are in the same process or different processes and selects the fastest transport: direct pointer handoff (~3ns) for same-thread, lock-free ring buffer (~18ns) for same-process, or shared memory (~50ns) for cross-process. Users write the same Topic::new("name") call regardless. This means code that works in a single-process prototype deploys to a multi-process production system with zero changes. The transport upgrades and downgrades transparently as participants join and leave — splitting a monolith into separate processes does not require code changes.
Trade-offs
| Area | Benefit | Cost |
|---|---|---|
| Auto-discovery | Zero configuration; processes connect by topic name alone; start/stop in any order | No explicit topology — harder to audit which processes are connected without horus topic list |
| No broker | Sub-microsecond latency; no single point of failure; no extra process to deploy | Single-machine only — cross-machine communication requires an explicit network bridge |
| Transparent transport | Same code works in single-process and multi-process; zero migration cost | Users cannot force a specific transport backend; automatic selection may surprise during debugging |
| Process isolation | One crash does not take down the system; independent restart and upgrade | Higher baseline latency (~50ns cross-process vs ~3ns same-thread); shared memory files persist after exit and need cleanup |
| Shared memory persistence | Fast reconnection — no handshake needed when a process restarts | Stale files from crashes are auto-cleaned on next startup; horus clean --shm for manual override |
| Independent schedulers | Each process can run at its own tick rate with its own ordering | No cross-process deterministic ordering — sensor-to-actuator chains across processes depend on timing, not scheduler order |
See Also
- Shared Memory — SHM architecture, ring buffers, platform differences
- Topics (Full Reference) — topic API and backend details
- Message Performance — POD types and zero-copy transport
- Multi-Language — Rust + Python interop patterns
- CLI Reference — horus launch, horus topic list, horus clean