Communication Backends

You need to understand how HORUS routes messages between nodes and what latency to expect. HORUS automatically selects the optimal communication backend based on topology -- no configuration needed.

When To Use This

  • You want to understand the latency characteristics of node-to-node communication
  • You are debugging unexpected latency in your pub/sub pipeline
  • You are planning a multi-process architecture and need to know cross-process overhead
  • You want to know the roadmap for multi-machine communication

This is informational. You do not need to configure backends manually -- HORUS selects them automatically.

Prerequisites

⚠️ Network Transport Not Yet Available

HORUS currently operates as a local shared memory IPC framework only. Network transport (Zenoh, UDP, etc.) is planned but no network backend code exists yet. All topics created with Topic::new("name") use local shared memory or in-process channels. This page documents the existing backends and the planned network direction.

Automatic Backend Selection

When you call Topic::new("name"), HORUS automatically detects the optimal backend based on the number of publishers, subscribers, and whether they're in the same process:

use horus::prelude::*;

// Just create a topic — backend is auto-selected
let topic = Topic::<CmdVel>::new("motors.cmd_vel")?;
topic.send(CmdVel { linear: 1.0, angular: 0.0 });

No configuration needed. The backend upgrades and downgrades dynamically as participants join and leave.

Verifying the active backend

Use horus topic list --verbose to see which backend each topic is using and how many publishers/subscribers are connected:

horus topic list --verbose
TOPIC                  TYPE        BACKEND       PUBS  SUBS  LATENCY
motors.cmd_vel         CmdVel      in-process    1     1     ~18ns
sensors.imu            Imu         in-process    1     3     ~24ns
camera.rgb             Image       shm           1     1     ~85ns
diagnostics.status     Generic     shm           2     2     ~91ns

The BACKEND column tells you whether a topic is using in-process channels or shared memory. If you expect in-process but see shm, one of the participants is in a different process. If you expect shm but see in-process, all participants happen to be in the same process (which is faster -- no action needed).

You can also check a single topic:

horus topic info motors.cmd_vel

This shows the backend type, message size, publisher/subscriber count, and measured latency.

Communication Paths

HORUS selects the optimal path based on where your nodes are running and how many publishers/subscribers are involved:

Same-Process Communication

Scenario                                   Latency
Same thread, 1 publisher → 1 subscriber    ~3ns
1 publisher → 1 subscriber                 ~18ns
1 publisher → many subscribers             ~24ns
Many publishers → 1 subscriber             ~26ns
Many publishers → many subscribers         ~36ns

Same-process communication uses lock-free ring buffers. No system calls, no serialization, no copies. The subscriber reads directly from the publisher's buffer.

Cross-Process Communication (Shared Memory)

Scenario                                        Latency
1 publisher → 1 subscriber (simple types)       ~50ns
Many publishers → 1 subscriber                  ~65ns
1 publisher → many subscribers                  ~70ns
1 publisher → 1 subscriber (serialized types)   ~85ns
Many publishers → many subscribers              ~91ns

Cross-process communication uses POSIX shared memory (shm_open on Linux/macOS). The publisher writes to a shared memory segment; the subscriber reads from the same segment. For POD (plain old data) types, this is zero-copy -- the subscriber reads the bytes directly without deserialization.

How latency scales

Latency is determined by three factors:

Process locality. Same-thread is fastest (~14ns in benchmarks). Crossing a thread boundary adds ~68ns (14ns to 82ns). Crossing a process boundary adds another ~80ns (82ns to 162ns). These numbers are from actual benchmarks on a 4-core Linux system with performance CPU governor.

Topology. Each additional subscriber adds ~6-8ns of overhead (the publisher must write to each subscriber's slot). Each additional publisher adds synchronization cost (~10-15ns for the lock-free MPSC coordination).

Message size. For POD types, latency is nearly independent of message size up to a few KB (the shared memory segment is memory-mapped, so the OS handles paging). For serialized types (String, Vec, nested structs), serialization time scales linearly with payload size -- roughly 10ns per additional KB.

The path is selected based on:

  • Process locality: Same thread → same process → cross-process
  • Topology: Number of publishers and subscribers
  • Data type: Simple fixed-size types get the fastest cross-process path

Dynamic Migration

HORUS dynamically migrates between backends as topology changes:

Single publisher + single subscriber (same process)
  → ~18ns

Second subscriber joins (same process)
  → ~24ns

Subscriber in different process joins
  → ~70ns

All subscribers disconnect except one in-process
  → ~18ns

Migration is transparent -- send() and recv() calls are unaffected.

Performance Characteristics

Metric          In-Process        Shared Memory
Latency         3-36ns            50-171ns
Throughput      Millions msg/s    Millions msg/s
Zero-copy       Yes               Yes
Cross-machine   No                No

Debugging Latency

If your topic latency is higher than expected, follow these steps in order:

Step 1: Check the backend type

horus topic list --verbose

If a topic shows shm but you expected in-process, one of the publishers or subscribers is in a different process. This alone adds ~80ns. Check whether all nodes that use this topic are in the same horus run invocation.

Step 2: Check if you are cross-process

Cross-process communication adds ~80ns over same-process. If all your nodes are in one process, you should see in-process backend. If you launched multiple horus run commands that share a topic, the backend upgrades to shm automatically.

# See how many processes are connected to a topic
horus topic info sensors.imu

Step 3: Check message size

For serialized types (anything with String, Vec, or Option), larger payloads take longer to serialize. Measure with:

horus topic hz sensors.imu --latency

If latency grows with message size, consider:

  • Switching to a #[fixed] POD type (zero-copy, no serialization)
  • Reducing the payload (send only changed fields)
  • Using GenericMessage only for prototyping, not production

Step 4: Check CPU governor

The CPU frequency governor has a major impact on latency. powersave mode can double latency compared to performance mode:

# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Set to performance (requires root)
sudo cpufreq-set -g performance

On embedded systems (Raspberry Pi, Jetson), the default governor is often ondemand or powersave. For real-time control loops, always use performance.

Step 5: Measure with horus topic hz

# Measure publish rate and latency for a running topic
horus topic hz motors.cmd_vel

This shows the actual publish frequency and per-message latency. If the measured rate is lower than the node's configured rate, the node's tick() function is taking too long -- the bottleneck is in your code, not the transport.

Step 6: Check for contention

If many publishers write to the same topic, lock-free coordination adds overhead. The horus topic list --verbose output shows publisher and subscriber counts. For high-frequency topics, prefer a single-publisher architecture (one node publishes, many subscribe).


Planned: Network Transport (Zenoh)

Future versions of HORUS will add Zenoh-based network transport for:

  • Multi-robot communication across machines
  • Cloud connectivity for telemetry and remote monitoring
  • ROS2 interoperability via Zenoh DDS bridge

The planned architecture adds network backends alongside the existing local backends:

Transport         Latency    Use Case
In-process        3-36ns     Same-process nodes
Shared Memory     50-171ns   Same-machine IPC
Zenoh (planned)   ~100us+    Multi-robot, cloud, ROS2

When network transport is implemented, Topic::new() will continue to auto-select the optimal backend. Network transport will only be used when topics are explicitly configured for remote communication.

Future: Multi-Machine Transport

HORUS does not yet support multi-machine communication natively. The likely approach is Zenoh, a pub/sub protocol designed for robotics and IoT that provides:

  • Automatic discovery of peers on the local network (no broker, no configuration)
  • Protocol flexibility -- TCP, UDP, and shared memory, selected automatically
  • DDS bridge for ROS2 interoperability without running a full DDS stack
  • Low overhead -- Zenoh adds ~100-200us of latency for LAN communication, compared to milliseconds for DDS

The planned integration will work like this: topics marked as remote in the HORUS configuration will be bridged to Zenoh. Local topics (the default) will continue to use shared memory. No code changes will be required -- only configuration.

What to do today for multi-machine setups

If you need multi-machine communication right now, you have two options:

Option 1: horus deploy (recommended). Deploy your HORUS project to a remote machine and run it there. Each machine runs its own HORUS instance with local shared memory. Use an external coordinator (HTTP, MQTT, or a custom bridge node) to exchange data between machines. This is the most reliable approach today.

# Deploy and run on a remote machine
horus deploy --target pi@192.168.1.10 --run

Option 2: Custom UDP bridge node. Write a node that subscribes to local topics, serializes messages, and sends them over UDP to a node on the other machine that deserializes and publishes locally. This adds ~1-5ms of latency (UDP + serialization) but works with any network topology.

// simplified sketch -- assumes CmdVel implements serde::Serialize and the
// serde_json crate is available
use horus::prelude::*;
use std::net::UdpSocket;

struct UdpBridge {
    cmd_sub: Topic<CmdVel>,
    socket: UdpSocket,
    remote_addr: String,
}

impl Node for UdpBridge {
    fn name(&self) -> &str { "UdpBridge" }
    fn tick(&mut self) {
        if let Some(cmd) = self.cmd_sub.recv() {
            // Serialize and forward; drop the message on failure instead of panicking
            if let Ok(data) = serde_json::to_vec(&cmd) {
                self.socket.send_to(&data, &self.remote_addr).ok();
            }
        }
    }
}

Both approaches are interim solutions. When Zenoh support ships, migration will require only configuration changes, not code changes.


Design Decisions

Why shared memory instead of sockets?

Shared memory provides 50-171ns latency for cross-process communication. Unix domain sockets would add 5-15us. TCP sockets add 50-100us. For robotics control loops running at 1kHz (1ms budget), every microsecond matters. At 50ns, the transport is effectively invisible -- your control algorithm is always the bottleneck, not the communication layer.

Why automatic selection instead of manual configuration?

Developers should not need to know whether their nodes are in the same thread, same process, or different processes. Topic::new("name") always works, and HORUS picks the fastest available path. This also means topology changes (moving a node to a separate process) do not require code changes. A node that works in a single-process prototype continues to work when deployed as multiple processes -- the backend upgrades automatically.

Why dynamic migration between backends?

Nodes can start and stop at any time. When a cross-process subscriber joins, HORUS upgrades from in-process channels to shared memory. When it leaves, HORUS downgrades back. This happens transparently, with no message loss during migration. Without dynamic migration, you would need to pre-configure the backend at startup, which means topology changes require restarts.

Why not DDS?

DDS (Data Distribution Service) is the transport layer behind ROS2. It provides multi-machine communication, QoS policies, and automatic discovery. However:

  • Latency overhead. DDS adds 50-200us of latency even for same-machine communication. HORUS achieves 3-171ns.
  • Complexity. DDS requires configuring QoS profiles, domain IDs, and participant discovery. HORUS requires zero configuration.
  • Binary size. A DDS implementation (Fast-DDS, Cyclone DDS) adds 5-20MB to the binary. HORUS's shared memory backend adds <100KB.
  • Startup time. DDS participant discovery takes 1-5 seconds. HORUS topics are available immediately.

HORUS will support DDS interoperability via a Zenoh-DDS bridge for teams that need to integrate with ROS2 systems, without imposing DDS overhead on the core framework.

Why not raw TCP/UDP for cross-process?

Even on localhost, TCP adds ~50us and UDP adds ~20us per message due to kernel-to-userspace copies and system call overhead. Shared memory eliminates these copies entirely -- the publisher and subscriber read and write the same physical memory pages. The OS kernel is not involved in the data path at all.


Trade-offs

Gain                                                        Cost
Zero-configuration backend selection                        Less explicit control over transport
Sub-microsecond latency for all local paths                 No cross-machine communication (yet)
Dynamic migration handles topology changes transparently    Brief latency spike during migration (~1-2 messages)
Shared memory provides zero-copy cross-process IPC          Shared memory segments require cleanup on crash (horus clean --shm)
No DDS overhead or configuration                            No built-in QoS policies (reliability, durability, history depth)
Immediate topic availability (no discovery phase)           Topics must use dots not slashes for macOS compatibility
Backend auto-upgrade when cross-process subscribers join    Latency increases from ~18ns to ~70ns when upgrading to shm

Common Errors

Symptom: Cross-process topic not receiving messages
Cause:   Topic names do not match exactly (case-sensitive)
Fix:     Verify topic names are identical in both processes

Symptom: Stale shared memory after crash
Cause:   Process was killed without cleanup
Fix:     Run horus clean --shm to clear shared memory segments

Symptom: Higher latency than expected
Cause:   Nodes in different processes when they could be in the same process
Fix:     Move nodes into the same process if latency is critical

Symptom: Topic names with / fail on macOS
Cause:   macOS shm_open does not support slashes
Fix:     Use dots instead of slashes: "sensors.lidar" not "sensors/lidar"

Symptom: Latency doubles after system idle
Cause:   CPU governor switched to powersave
Fix:     Set governor to performance: sudo cpufreq-set -g performance

Symptom: horus topic list shows no topics
Cause:   No HORUS process is running
Fix:     Start your application first, then inspect topics

Symptom: Subscriber gets stale data on startup
Cause:   Shared memory retains last message from previous run
Fix:     Run horus clean --shm before starting, or handle stale data in your node

See Also