# HORUS Documentation (Full) Generated: 2026-04-23T06:28:55.924Z Pages: 295 Source: https://docs.horusrobotics.dev Condensed overview: https://docs.horusrobotics.dev/llms.txt This file contains the complete HORUS documentation in plain text. AI agents can use this to understand the full framework. ## Table of Contents - Getting Started (17 pages) - Learn (4 pages) - Core Concepts (25 pages) - Tutorials (28 pages) - Rust (37 pages) - Python (66 pages) - Development (14 pages) - Advanced Topics (9 pages) - Standard Library (16 pages) - Plugins (3 pages) - Package Management (5 pages) - Performance (4 pages) - Operations (2 pages) - Reference (7 pages) - Recipes (33 pages) - cpp (25 pages) ======================================== # SECTION: Getting Started ======================================== --- ## Getting Started with C++ Path: /getting-started/cpp Description: Build robotics applications in C++ with HORUS — zero-copy IPC, real-time scheduling, and 70+ message types # Getting Started with C++ HORUS provides idiomatic C++ bindings with zero-copy shared memory IPC, real-time scheduling, and the full message type system. The C++ API matches the Rust API's method names and patterns — if you know one, you know both. 
## Prerequisites - CMake 3.20+ - GCC 12+ or Clang 15+ - HORUS installed (`curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash`) ## Quick Start Create a new C++ project: ```bash horus new my-robot --cpp cd my-robot horus run ``` This generates a working controller that publishes velocity commands: ```cpp #include using namespace horus::literals; int main() { auto sched = horus::Scheduler() .tick_rate(100_hz) .name("my-robot"); auto cmd_pub = sched.advertise("cmd_vel"); sched.add("controller") .rate(50_hz) .budget(5_ms) .on_miss(horus::Miss::Skip) .tick([&] { auto sample = cmd_pub.loan(); sample->linear = 0.3f; sample->angular = 0.0f; cmd_pub.publish(std::move(sample)); }) .build(); sched.spin(); } ``` ## Core Concepts ### Scheduler The scheduler creates, configures, and runs nodes: ```cpp auto sched = horus::Scheduler() .tick_rate(100_hz) // 100 Hz main loop .prefer_rt() // use RT scheduling if available .deterministic(true) // reproducible execution .watchdog(5_s); // 5 second watchdog ``` ### Nodes — Three Ways **1. Lambda (quick scripts):** ```cpp sched.add("motor_ctrl") .rate(1000_hz) .budget(800_us) .on_miss(horus::Miss::Skip) .order(0) .tick([&] { /* control logic */ }) .build(); ``` **2. Struct-based (stateful, lifecycle hooks — like Rust `impl Node`):** ```cpp class Controller : public horus::Node { public: Controller() : Node("controller") { scan_ = subscribe("lidar.scan"); cmd_ = advertise("motor.cmd"); } void tick() override { auto scan = scan_->recv(); if (!scan) return; msg::CmdVel cmd{}; cmd.linear = scan->get()->linear > 0.5f ? 0.3f : 0.0f; cmd_->send(cmd); } void init() override { /* called once */ } void enter_safe_state() override { /* stop motors */ } private: Subscriber* scan_; Publisher* cmd_; }; Controller ctrl; sched.add(ctrl).rate(100_hz).order(10).build(); ``` **3. 
LambdaNode (declarative — like Python `horus.Node()`):** ```cpp auto node = horus::LambdaNode("controller") .sub("lidar.scan") .pub("motor.cmd") .on_tick([](horus::LambdaNode& self) { auto scan = self.recv("lidar.scan"); if (!scan) return; self.send("motor.cmd", msg::CmdVel{0, 0.3f, 0.0f}); }); sched.add(node).order(10).build(); ``` ### Zero-Copy Publishing (Loan Pattern) The loan pattern gives you a direct pointer to shared memory — no copies: ```cpp horus::Publisher pub("cmd_vel"); auto sample = pub.loan(); // ~3ns — get SHM buffer sample->linear = 0.5f; // 0ns — direct write sample->angular = 0.1f; // 0ns — direct write pub.publish(std::move(sample)); // ~3ns — make visible // Total overhead: ~6ns constant, regardless of message size ``` ### Subscribing ```cpp horus::Subscriber sub("lidar.scan"); auto scan = sub.recv(); // returns std::optional if (!scan) return; // no message yet float range = scan->get()->linear; // read from SHM ``` ### Duration Literals ```cpp using namespace horus::literals; auto freq = 100_hz; // Frequency(100.0) auto budget = 5_ms; // 5000 microseconds auto jitter = 200_us; // 200 microseconds auto timeout = 3_s; // 3 seconds ``` ## Message Types HORUS provides 70+ pre-defined message types in `horus::msg::`: | Category | Types | Header | |----------|-------|--------| | Geometry | `Twist`, `Pose2D`, `Pose3D`, `Quaternion`, `Vector3`, `Point3` | `msg/geometry.hpp` | | Sensor | `LaserScan`, `Imu`, `Odometry`, `JointState`, `NavSatFix` | `msg/sensor.hpp` | | Control | `CmdVel`, `MotorCommand`, `ServoCommand`, `PidConfig` | `msg/control.hpp` | | Vision | `CameraInfo`, `RegionOfInterest`, `StereoInfo` | `msg/vision.hpp` | | Navigation | `NavGoal`, `Waypoint`, `GoalResult` | `msg/navigation.hpp` | | Detection | `Detection`, `Detection3D`, `BoundingBox2D` | `msg/detection.hpp` | | Diagnostics | `Heartbeat`, `EmergencyStop`, `SafetyStatus` | `msg/diagnostics.hpp` | All types are `#[repr(C)]` compatible — identical memory layout in Rust and C++ 
for zero-copy SHM transfer. ## Single Include ```cpp #include // Everything: Scheduler, Topics, Messages, Literals ``` ## Building ### With HORUS CLI (recommended) ```bash horus build # builds using horus.toml configuration horus run # builds and runs horus test # builds and runs tests via ctest ``` ### With CMake directly ```cmake cmake_minimum_required(VERSION 3.20) project(my_robot LANGUAGES CXX) set(CMAKE_CXX_STANDARD 17) find_package(horus REQUIRED) add_executable(my_robot src/main.cpp) target_link_libraries(my_robot PRIVATE horus::horus) ``` ## Performance The C++ FFI boundary adds **15-17ns per call** — comparable to a virtual function call. A scheduler tick with one C++ node completes in **250ns** median. Throughput: **2.84 million ticks/second**. For full benchmark results with percentiles, scalability analysis, and ASAN validation, see the [Benchmarks page](/docs/performance/benchmarks#c-binding-performance). ## What's Next - [Obstacle Avoidance Tutorial](/docs/tutorials/cpp-obstacle-avoidance) — build a complete reactive controller - [Multi-Node Pipelines](/docs/concepts/nodes) — sensor → processor → actuator patterns - [C++ vs Rust vs Python](/docs/getting-started/choosing-language) — when to use each language - [API Reference](/docs/reference/cpp-api) — complete class documentation --- ## Getting Started with Python Path: /getting-started/python Description: Your path to building robotics applications with HORUS in Python # Getting Started with Python Python is the fast path to building with HORUS. You get the same shared-memory IPC, the same message types, and the same scheduler — with Python's ecosystem of ML frameworks, data tools, and rapid prototyping. ## Your Learning Path ### Step 1: Install HORUS ```bash curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash ``` The installer sets up both the CLI and the Python bindings. See [Installation](/getting-started/installation) for details. 
### Step 2: Build Your First App Follow the [Quick Start (Python)](/getting-started/quick-start-python) — build a publisher and subscriber in 10 minutes. ### Step 3: Learn the Concepts - [Nodes: The Building Blocks](/concepts/core-concepts-nodes) — how nodes work - [Topics: How Nodes Talk](/concepts/topics-beginner) — pub/sub communication - [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) — execution model - [Choosing Your Configuration](/concepts/choosing-configuration) — progressive complexity guide ### Step 4: Tutorials - [Tutorial 1: IMU Sensor Node](/tutorials/python-imu) — read sensor data and publish it - [Tutorial 2: Motor Controller](/tutorials/python-motor) — subscribe to commands, simulate physics - [Tutorial 3: Full Robot System](/tutorials/python-robot) — combine sensors, controller, and estimator - [Tutorial 4: Custom Messages](/tutorials/python-custom-messages) — send typed data between nodes - [Tutorial 5: Hardware & Real-Time](/tutorials/python-hardware) — drivers and scheduling ### Step 5: Go Deeper - [Python API Reference](/python/api) — complete Node, Scheduler, Topic API - [Python Message Library](/python/library/python-message-library) — 55+ standard robotics types - [Recipes](/recipes) — copy-paste patterns for common tasks ## Why Python for HORUS? 
- **ML/AI integration** — use PyTorch, ONNX, TensorFlow, OpenCV directly in your nodes - **Rapid prototyping** — iterate on behavior logic without compile cycles - **Same IPC** — Python nodes share the same zero-copy topics as Rust nodes - **NumPy interop** — Image, PointCloud, and Tensor types expose data as NumPy arrays - **One-liner setup** — `horus.Node(name, tick, rate)` + `horus.run(node)` and you're running ## Python-Specific Guides - [Python Bindings Architecture](/python/api/python-bindings) — how the two-layer Python API works - [ML Integration](/python/library/ml-guide) — PyTorch, ONNX, HuggingFace in HORUS nodes - [Python CV Node Recipe](/recipes/python-cv-node) — OpenCV computer vision pipeline - [Async Nodes](/python/api/async-nodes) — async/await support for non-blocking I/O ## Need Help? - [Troubleshooting](/getting-started/troubleshooting) — fix installation and runtime errors - [Choosing a Language](/getting-started/choosing-language) — comparing Rust and Python for your project --- ## Getting Started with Rust Path: /getting-started/rust Description: Your path to building robotics applications with HORUS in Rust # Getting Started with Rust Rust is the native language for HORUS. You get direct access to the full API — zero-copy shared memory, real-time scheduling, compile-time safety, and maximum performance. ## Your Learning Path ### Step 1: Install HORUS ```bash curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash ``` See [Installation](/getting-started/installation) for platform-specific instructions and troubleshooting. ### Step 2: Build Your First App Follow the [Quick Start](/getting-started/quick-start) — build a publisher and subscriber in 10 minutes. ### Step 3: Build Something Real [Building Your Second Application](/getting-started/second-application) — a 3-node sensor pipeline with filtering and display. 
### Step 4: Learn the Concepts - [Nodes: The Building Blocks](/concepts/core-concepts-nodes) — how nodes work - [Topics: How Nodes Talk](/concepts/topics-beginner) — pub/sub communication - [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) — execution model - [Execution Classes](/concepts/execution-classes) — Rt, Compute, Event, AsyncIo, BestEffort ### Step 5: Go Deeper - [Tutorials](/tutorials) — step-by-step guides (IMU sensor, motor controller, full robot integration) - [Recipes](/recipes) — copy-paste patterns for common tasks - [Rust API Reference](/rust/api) — complete method reference ## Why Rust for HORUS? - **Zero-copy IPC** — shared memory topics with no serialization overhead - **Compile-time safety** — the borrow checker catches data races before they reach your robot - **Real-time capable** — dedicated threads with SCHED_FIFO, budget enforcement, deadline monitoring - **Full API access** — every HORUS feature is available in Rust first - **`node!` macro** — write nodes in 5 lines instead of 30 ## Common Mistakes New to HORUS? Check [Common Mistakes](/getting-started/common-mistakes) before you get stuck. ## Need Help? - [Troubleshooting](/getting-started/troubleshooting) — fix installation and runtime errors - [Choosing a Language](/getting-started/choosing-language) — not sure if Rust is right for your project? --- ## Installation Path: /getting-started/installation Description: Install HORUS on Linux, macOS, or Windows # Installation ## Prerequisites - **Operating system**: Linux (Ubuntu 20.04+, Debian 11+, Fedora 36+, Arch), macOS, or Windows via WSL 2 - **Internet connection**: Required to download Rust and HORUS - **System dependencies**: Build tools and development libraries (installed in Step 1) ## What You'll Set Up A working HORUS installation with the `horus` CLI tool, core libraries, and optionally Python bindings. After completing this guide, you'll be able to create, build, and run HORUS projects. 
**Time estimate**: ~10–15 minutes ## Platform Support | Platform | Status | Notes | |----------|--------|-------| | **Ubuntu 20.04+** | Supported | Recommended for production | | **Debian 11+** | Supported | Tested and working | | **Fedora 36+** | Supported | Use dnf for packages | | **Arch Linux** | Supported | Community maintained | | **Raspberry Pi** | Supported | ARM64 tested on Ubuntu | | **macOS** | Supported | Full support (dev + testing) | | **Windows** | Supported | Full support (dev + testing); production deploy via WSL 2 | | **WSL 2** | Supported | Full Linux environment — identical to native Linux | For detailed platform differences (RT scheduling, shared memory backends, timer precision), see [Platform Support](/reference/platform-support). ## Step 1: Install System Dependencies Install build tools and development libraries for your platform. You should see all packages install without errors. If a package is not found, verify your system's package repositories are up to date. ## Step 2: Install Rust HORUS requires Rust 1.92 or later. ```bash curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh ``` Follow the prompts — press Enter to accept the defaults. Then load the Rust environment: ```bash source $HOME/.cargo/env ``` Verify the installation: ```bash rustc --version cargo --version ``` You should see version numbers like `rustc 1.92.0` or higher. If `rustc` is not found, restart your terminal and try again. ## Step 3: Install HORUS **Option A: One-line installer (recommended)** ```bash curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash ``` This downloads and runs the installer, which builds the `horus` CLI and installs it to `~/.cargo/bin/`. **Option B: Clone and build** ```bash git clone https://github.com/softmata/horus.git cd horus ./install.sh ``` The installer takes approximately 5 minutes. You should see green checkmarks for each step as it completes. 
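The Rust version requirement from Step 2 can also be checked from a script. The sketch below parses the text printed by `rustc --version` and compares it with the 1.92 minimum stated above. The function names are illustrative only and are not part of the HORUS installer.

```python
import re

MIN_RUST = (1, 92)  # minimum (major, minor) stated in Step 2


def parse_rustc_version(output):
    """Extract (major, minor, patch) from the text printed by `rustc --version`."""
    match = re.search(r"rustc (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized rustc output: {output!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(output, minimum=MIN_RUST):
    """Return True if the reported version is at least `minimum` (major, minor)."""
    return parse_rustc_version(output)[:2] >= minimum
```

To use it against a real toolchain, pass in the captured output, for example `subprocess.run(["rustc", "--version"], capture_output=True, text=True).stdout`.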
## Step 4: Verify Installation ```bash horus --help ``` You should see a list of available commands including `new`, `run`, `monitor`, `install`, `search`, and more. Run the system health check: ```bash horus doctor ``` You should see a summary of your HORUS installation with all checks passing. If any checks fail, the output includes remediation steps. ## Step 5: Install Python Bindings (Optional) If Python 3.9+ was detected during Step 3, the bindings are already installed. Verify: ```bash python3 -c "import horus; print('Python bindings installed')" ``` You should see `Python bindings installed`. If you see `ModuleNotFoundError`, install manually: ```bash # Install maturin (Rust-Python build tool) cargo install maturin # Build and install from the horus_py directory cd horus_py maturin develop --release ``` You should see `Successfully installed horus-robotics`. Verify again with the `python3 -c "import horus"` command above. ## Updating HORUS If you used Option B (clone and build): ```bash cd horus git pull ./install.sh ``` To preview changes before updating: ```bash git fetch git log HEAD..@{u} git pull ./install.sh ``` ## Uninstalling HORUS Run the uninstaller from the HORUS directory: ```bash cd horus ./uninstall.sh ``` This removes the `horus` CLI binary from `~/.cargo/bin/`, cached libraries from `~/.horus/cache/`, and cleans up shared memory files. Project-local `.horus/` directories are left untouched. ## Docker Installation For CI/CD pipelines and containerized deployments: ```dockerfile # Multi-stage build FROM rust:1.92-bookworm AS builder # System dependencies RUN apt-get update && apt-get install -y \ build-essential pkg-config libssl-dev cmake \ python3-dev python3-pip && rm -rf /var/lib/apt/lists/* # Install HORUS RUN curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash # Build your project WORKDIR /app COPY . . 
RUN horus build --release # Runtime stage FROM debian:bookworm-slim RUN apt-get update && apt-get install -y libssl3 && rm -rf /var/lib/apt/lists/* COPY --from=builder /root/.cargo/bin/horus /usr/local/bin/ COPY --from=builder /app/.horus/target/release/my_robot /usr/local/bin/ # CRITICAL: --ipc=host for shared memory access across containers # docker run --ipc=host my-robot CMD ["my_robot"] ``` **Important**: Use `--ipc=host` when running the container to enable shared memory access between containers and the host: ```bash docker run --ipc=host my-robot-image ``` Without `--ipc=host`, topics cannot be shared across container boundaries. ## CI/CD Setup ### GitHub Actions ```yaml name: HORUS CI on: [push, pull_request] jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Install Rust uses: dtolnay/rust-toolchain@stable - name: Install system deps run: sudo apt-get update && sudo apt-get install -y build-essential pkg-config libssl-dev cmake - name: Install HORUS run: curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash - name: Validate project run: horus check - name: Build run: horus build --release - name: Test (simulation mode, no hardware) run: horus test --sim --release - name: Lint run: horus lint - name: Format check run: horus fmt --check ``` **Key CI flags**: - `horus test --sim` — run without hardware (uses simulated sensors) - `horus fmt --check` — fail if code is unformatted (exit code 1) - `horus lint` — run clippy + ruff ## Offline Installation For air-gapped robots without internet access: ```bash # On a machine WITH internet: # 1. Download the installer and source curl -fsSL https://github.com/softmata/horus/raw/release/install.sh -o install.sh git clone https://github.com/softmata/horus.git --branch release --depth 1 # 2. Transfer to the robot via USB scp -r install.sh horus/ robot@192.168.1.100:~/ # On the air-gapped robot: # 3. 
Install from local source cd ~/horus ./install.sh --local ``` ## Understanding `horus doctor` `horus doctor` runs a comprehensive health check. Here's what each check means: ```bash horus doctor --verbose ``` | Check | What it tests | OK means | FAIL means | |-------|--------------|----------|-----------| | **Toolchains** | cargo, rustc, python3, ruff, cmake | Tools are installed and in PATH | Install missing tools (see Step 1) | | **Manifest** | `horus.toml` validity | Config is syntactically correct | Fix TOML syntax errors | | **Shared Memory** | `/dev/shm` accessible | SHM directory exists and is writable | Check filesystem permissions | | **Plugins** | Global + local plugin count | Plugins are installed | Reinstall with `horus install --plugin` | | **Disk** | `.horus/` cache size | Cache exists | Run `horus build` to populate | | **Languages** | Detected from build files | At least one language detected | Create `src/main.rs` or `src/main.py` | | **Dependencies** | Source validation | All deps resolve | `horus add` with correct `--source` | | **Drivers** | Serial/I2C/network reachability | Hardware is connected | Check cables, permissions, device paths | | **System Deps** | Python version, C++ compiler, libs | All system packages found | `horus doctor --fix` to auto-install | **Auto-fix**: `horus doctor --fix` installs missing toolchains and system packages automatically. It pins installed versions to `horus.lock`. 
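To make the **Shared Memory** row of the table concrete, here is a rough sketch of an "exists and is writable" test for `/dev/shm`. It is an assumption about what such a check might look like, not the actual `horus doctor` implementation, and `shm_dir_ok` is a hypothetical name.

```python
import os
import tempfile


def shm_dir_ok(path="/dev/shm"):
    """Return True if `path` is an existing directory in which we can create files."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating (and automatically deleting) a temporary file is a stronger
        # test than os.access(), which can be misleading on some filesystems.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False
```

If this returns `False` on your machine, the "Check filesystem permissions" remediation from the table is the place to start.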
## Troubleshooting | Symptom | Cause | Fix | |---------|-------|-----| | `rustc: command not found` | Rust not in PATH | Run `source $HOME/.cargo/env` or restart your terminal | | `horus: command not found` | `~/.cargo/bin` not in PATH | Add `export PATH="$HOME/.cargo/bin:$PATH"` to your shell profile | | Build fails with missing headers | System dependencies not installed | Run the install commands for your platform in Step 1 | | `ModuleNotFoundError: horus` | Python bindings not installed | Follow Step 5 to install manually | | Raspberry Pi build is slow | Limited RAM, debug mode | Use `--release` flag and ensure 64-bit OS | See the [Troubleshooting Guide](/getting-started/troubleshooting) for more issues and detailed solutions. ## Key Takeaways - HORUS installs via a one-line command or by cloning and running `./install.sh` - The `horus` CLI is installed to `~/.cargo/bin/` and core libraries are cached in `~/.horus/cache/` - Python bindings are auto-installed when Python 3.9+ is detected - `horus doctor` verifies your installation is healthy - No hardware is required — HORUS works on any Linux, macOS, or WSL 2 machine ## Next Steps - [Quick Start](/getting-started/quick-start) — Build your first HORUS application in Rust - [Quick Start (Python)](/getting-started/quick-start-python) — Build your first HORUS application in Python ## See Also - [Choosing a Language](/getting-started/choosing-language) — Rust vs Python comparison - [CLI Reference](/development/cli-reference) — Complete `horus` command documentation - [Common Mistakes](/getting-started/common-mistakes) — Avoid frequent beginner pitfalls - [Troubleshooting](/getting-started/troubleshooting) — Solutions for installation and runtime issues --- ## Quick Start Path: /getting-started/quick-start Description: Build your first HORUS application in 10 minutes # Quick Start ## Prerequisites - [HORUS installed](/getting-started/installation) with `horus --help` working - A terminal and text editor ## What 
You'll Build

A temperature monitoring system with two nodes:

1. **Sensor**: generates temperature readings and publishes them
2. **Monitor**: subscribes to the readings and displays them

The nodes communicate through HORUS's shared-memory Topics: no sockets, no serialization, no configuration.

**Time estimate**: ~10 minutes

## Step 1: Create a New Project

```bash
horus new temperature-monitor
```

Select options in the interactive prompt:

- **Language**: Rust
- **Use macros**: No (we'll learn the basics first)

```bash
cd temperature-monitor
```

You should see three items in the project directory:

- `src/main.rs`: your application code
- `horus.toml`: project configuration (name, version, dependencies)
- `.horus/`: generated build files (managed automatically)

## Step 2: Write the Sensor Node

Replace the contents of `src/main.rs` with the following. The file contains both nodes and the `main` function that wires them together:

```rust
use horus::prelude::*;
use std::time::Duration;

// ── Sensor Node ─────────────────────────────────────────────
// Publishes a simulated temperature reading every second.
struct TemperatureSensor {
    publisher: Topic<f32>,
    temperature: f32,
}

impl TemperatureSensor {
    fn new() -> Result<Self> {
        Ok(Self {
            publisher: Topic::new("temperature")?,
            temperature: 20.0,
        })
    }
}

impl Node for TemperatureSensor {
    fn name(&self) -> &str { "TemperatureSensor" }

    fn tick(&mut self) {
        self.temperature += 0.1;
        self.publisher.send(self.temperature);
        // WARNING: sleep() in tick() blocks the scheduler thread.
        // In production, use .rate(1_u64.hz()) on the node builder instead.
        std::thread::sleep(Duration::from_secs(1));
    }

    fn shutdown(&mut self) -> Result<()> {
        eprintln!("Sensor shutting down. Last reading: {:.1}°C", self.temperature);
        Ok(())
    }
}

// ── Monitor Node ────────────────────────────────────────────
// Subscribes to temperature readings and prints them.
struct TemperatureMonitor {
    subscriber: Topic<f32>,
}

impl TemperatureMonitor {
    fn new() -> Result<Self> {
        Ok(Self {
            subscriber: Topic::new("temperature")?,
        })
    }
}

impl Node for TemperatureMonitor {
    fn name(&self) -> &str { "TemperatureMonitor" }

    fn tick(&mut self) {
        if let Some(temp) = self.subscriber.recv() {
            println!("Temperature: {:.1}°C", temp);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        eprintln!("Monitor shutting down.");
        Ok(())
    }
}

// ── Main ────────────────────────────────────────────────────
fn main() -> Result<()> {
    eprintln!("Starting temperature monitoring system...\n");

    let mut scheduler = Scheduler::new();

    // The scheduler auto-detects order from topics:
    // sensor publishes "temperature", monitor subscribes → sensor runs first
    scheduler.add(TemperatureSensor::new()?).build()?;
    scheduler.add(TemperatureMonitor::new()?).build()?;

    // Run until Ctrl+C
    scheduler.run()?;
    Ok(())
}
```

## Step 3: Run It

```bash
horus run --release
```

You should see output like:

```
Starting temperature monitoring system...

Temperature: 20.1°C
Temperature: 20.2°C
Temperature: 20.3°C
Temperature: 20.4°C
...
```

Press **Ctrl+C** to stop. You should see the shutdown messages from both nodes.

> **Debug vs Release**: `horus run` without `--release` uses debug mode (~60-200μs per tick). With `--release`, tick times drop to ~1-3μs. If HORUS seems slow, you are probably running in debug mode. Always use `--release` for performance testing.

## Step 3.5: Introspect While Running

While your app is running, open a **second terminal** and peek inside:

```bash
# See all active topics
horus topic list
# Output:
#   temperature (f32) — 1 publisher, 1 subscriber

# Watch messages in real-time
horus topic echo temperature
# Output:
#   20.1
#   20.2
#   20.3
#   ...

# Check the publishing rate
horus topic hz temperature
# Output: average rate: 1.0 Hz

# See running nodes
horus node list
# Output:
#   NAME                RATE  STATUS
#   TemperatureSensor   1Hz   Running
#   TemperatureMonitor  1Hz   Running
```

This works because HORUS topics live in shared memory: any process on the machine can inspect them, including the CLI tools. This is your primary debugging tool.

## Step 3.6: What Just Happened

When you ran `horus run --release`, HORUS:

1. **Parsed** `horus.toml` and compiled your Rust code via Cargo
2. **Created shared memory**: a ring buffer at `/dev/shm/horus_/topics/horus_temperature` for the "temperature" topic
3. **Initialized nodes**: called `new()` on both structs, which opened the same SHM region via `Topic::new("temperature")`
4. **Started the tick loop**: the scheduler calls `TemperatureSensor::tick()` then `TemperatureMonitor::tick()` in order, every cycle
5. **On Ctrl+C**: received SIGINT (the signal Ctrl+C delivers), called `shutdown()` on both nodes, and cleaned up SHM

The data flow: `TemperatureSensor::tick()` writes an `f32` directly into the ring buffer (zero-copy, ~3ns). `TemperatureMonitor::tick()` reads it out. No serialization, no network, no kernel involvement.

## Step 4: Understand the Key Patterns

You just used three core HORUS concepts:

### Topic — Communication Channel

```rust
// Both nodes create a Topic with the same name (simplified)
publisher: Topic::new("temperature")?   // returns HorusResult<Topic<f32>>
subscriber: Topic::new("temperature")?  // ...
```

`Topic::new()` returns a `HorusResult`; the `?` propagates the error if shared memory allocation fails. Any number of nodes can publish or subscribe to the same topic name.

### Node Trait — Component Lifecycle

Every HORUS component implements the `Node` trait.
Only `tick()` is required — all other methods have defaults: ```rust // simplified impl Node for MyNode { fn name(&self) -> &str { "MyNode" } // identity fn tick(&mut self) { /* runs every cycle */ } // required fn shutdown(&mut self) -> Result<()> { Ok(()) } // cleanup on Ctrl+C } ``` ### Scheduler — Orchestration The scheduler runs nodes in priority order each tick cycle: ```rust // simplified let mut scheduler = Scheduler::new(); // Scheduler auto-detects order from topic dependencies scheduler.add(sensor).build()?; scheduler.add(monitor).build()?; scheduler.run()?; // loop until Ctrl+C ``` ## Step 5: Try the node! Macro (Optional) The same two nodes can be written with the `node!` macro, which eliminates boilerplate: ```rust use horus::prelude::*; node! { TemperatureSensor { pub { publisher: f32 -> "temperature" } data { temperature: f32 = 20.0 } tick { self.temperature += 0.1; self.publisher.send(self.temperature); std::thread::sleep(std::time::Duration::from_secs(1)); } } } node! { TemperatureMonitor { sub { subscriber: f32 -> "temperature" } tick { if let Some(temp) = self.subscriber.recv() { println!("Temperature: {:.1}°C", temp); } } } } ``` The macro generates the struct, constructor, and `Node` trait implementation. Both approaches produce identical runtime behavior. See the [node! Macro Guide](/concepts/node-macro) for the full syntax. 
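The ordering rule used in this Quick Start (a node that publishes a topic runs before the nodes that subscribe to it) can be illustrated with a small topological sort. This is a sketch of the idea only, using Python's standard `graphlib` module (Python 3.9+); it is not the HORUS scheduler's actual algorithm, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def execution_order(nodes):
    """Map {name: {"pubs": [...], "subs": [...]}} to a list of names, publishers first."""
    # Index which nodes publish each topic.
    publishers = {}
    for name, io in nodes.items():
        for topic in io.get("pubs", []):
            publishers.setdefault(topic, []).append(name)

    sorter = TopologicalSorter()
    for name, io in nodes.items():
        upstream = [
            pub
            for topic in io.get("subs", [])
            for pub in publishers.get(topic, [])
            if pub != name
        ]
        sorter.add(name, *upstream)  # each node runs after its upstream publishers
    return list(sorter.static_order())


order = execution_order({
    "monitor": {"subs": ["temperature"]},
    "sensor": {"pubs": ["temperature"]},
})
print(order)  # ['sensor', 'monitor']
```

Registration order does not matter here: the monitor is listed first but still runs second. A publish/subscribe cycle between nodes would make `static_order()` raise `graphlib.CycleError`, which is one reason a real scheduler may also offer an explicit ordering option.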
## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| `Failed to create Topic` | Stale shared memory from a previous run | Usually auto-cleaned; if it persists, run `horus clean --shm` |
| Nothing prints | Monitor added but sensor missing | Ensure both nodes are added to the scheduler |
| `horus run` fails to build | Missing system dependencies | See [Installation](/getting-started/installation) Step 1 |
| Output looks slow | Running in debug mode | Use `horus run --release` |
| Topic not in `topic list` | CLI is in a different SHM namespace | Run CLI in the same terminal, or set matching `HORUS_NAMESPACE` |
| `Permission denied` on shared memory | SHM directory permissions | Run `horus doctor` to diagnose |
| `Address already in use` | Another HORUS process still running | Check for running processes; if none, `horus clean --shm` |
| `.build()` returns error | Duplicate node name or invalid config | Check that each node has a unique name |

## Python Equivalent

The same system in Python:

```python
import horus

temperature = 20.0

def sensor_tick(node):
    global temperature
    temperature += 0.1
    node.send("temperature", temperature)

def monitor_tick(node):
    temp = node.recv("temperature")
    if temp is not None:
        print(f"Temperature: {temp:.1f}°C")

sensor = horus.Node(name="sensor", pubs=["temperature"], tick=sensor_tick, rate=1, order=0)
monitor = horus.Node(name="monitor", subs=["temperature"], tick=monitor_tick, rate=1, order=1)
horus.run(sensor, monitor)
```

See [Quick Start (Python)](/getting-started/quick-start-python) for a full walkthrough.

## Graceful Shutdown

When you press **Ctrl+C**, HORUS:

1. Catches the SIGINT signal that Ctrl+C delivers
2. Calls `shutdown()` on each node in registration order
3. Cleans up shared memory regions owned by this process
4. Prints a timing report (total ticks, average tick duration)

Always zero actuators in `shutdown()`: if your node controls motors, send a stop command before exiting.
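The shutdown sequence described above can be mimicked in a few lines of plain Python. This is a toy model under stated assumptions: `MotorNode`, `TickLoop`, and `install_sigint_handler` are stand-ins invented for illustration and are not the `horus` API. It shows only the two behaviors that matter for safety, namely that every node's `shutdown()` runs in registration order once a stop is requested, and that the motor's last command is zero.

```python
import signal


class MotorNode:
    """Stand-in for a motor-driving node; not the real horus.Node API."""

    def __init__(self):
        self.commands = []  # everything "sent" to the motor

    def tick(self):
        self.commands.append(0.3)  # keep driving forward

    def shutdown(self):
        self.commands.append(0.0)  # zero the actuator before exiting


class TickLoop:
    """Toy loop: tick every node until a stop is requested, then shut down."""

    def __init__(self):
        self.nodes = []
        self.stop_requested = False

    def add(self, node):
        self.nodes.append(node)

    def request_stop(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects from a handler.
        self.stop_requested = True

    def run(self, max_ticks=None):
        ticks = 0
        try:
            while not self.stop_requested and (max_ticks is None or ticks < max_ticks):
                for node in self.nodes:
                    node.tick()
                ticks += 1
        finally:
            for node in self.nodes:  # registration order
                node.shutdown()
        return ticks


def install_sigint_handler(loop):
    """Route Ctrl+C (SIGINT) to the loop instead of raising KeyboardInterrupt."""
    return signal.signal(signal.SIGINT, loop.request_stop)
```

In an interactive run you would call `install_sigint_handler(loop)` from the main thread before `loop.run()`. Placing the shutdown calls in a `finally` block means the stop command is still sent if a `tick()` raises.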
## Key Takeaways - **Nodes** implement the `Node` trait — `tick()` runs every scheduler cycle - **Topics** are named shared-memory channels — `send()` to publish, `recv()` to subscribe - **Scheduler** orchestrates nodes in priority order with `.order(n)` - `Topic::new("name")?` returns a `HorusResult` — always handle the error - Multi-process communication works automatically: same topic name = same channel ## Next Steps - [Quick Start (Python)](/getting-started/quick-start-python) — Build the same system in Python - [Second Application](/getting-started/second-application) — Build a 3-node pipeline with filtering, monitoring, and watchdog - [Common Mistakes](/getting-started/common-mistakes) — Avoid the pitfalls that trip up every beginner ## Beyond the Basics You've seen Nodes, Topics, and the Scheduler. HORUS has much more — here's what to explore next: | Feature | What it does | Guide | |---------|-------------|-------| | **Execution classes** | Run nodes as RT, compute-bound, event-driven, or async I/O | [Execution Classes](/concepts/execution-classes) | | **Watchdog & safety** | Detect frozen nodes, enforce deadlines, graduated degradation | [Safety Monitor](/advanced/safety-monitor) | | **BlackBox** | Flight recorder for post-mortem crash analysis | [BlackBox](/advanced/blackbox) | | **Deterministic mode** | Reproducible execution for simulation and CI | [Deterministic Mode](/advanced/deterministic-mode) | | **Record & Replay** | Tick-perfect replay for reproducing field bugs | [Record & Replay](/advanced/record-replay) | | **Fault tolerance** | Per-node failure policies (restart, skip, fatal) | [Circuit Breaker](/advanced/circuit-breaker) | | **Framework clock** | `horus::now()`, `dt()`, `budget_remaining()` | [Real-Time Systems](/concepts/real-time) | | **Progressive config** | From prototype to production in 5 levels | [Choosing Your Configuration](/concepts/choosing-configuration) | ## See Also - [Nodes (Concept)](/concepts/core-concepts-nodes) — How 
nodes work under the hood - [Topic (Concept)](/concepts/core-concepts-topic) — Shared memory architecture - [Scheduler (Concept)](/concepts/core-concepts-scheduler) — Execution model and priority - [Examples](/rust/examples/basic-examples) — More sample applications --- ## Quick Start: C++ Path: /getting-started/quick-start-cpp Description: Build your first HORUS C++ project in 5 minutes — a temperature monitoring system # Quick Start: C++ > If you prefer Rust, see the [Rust Quick Start](/docs/getting-started/quick-start). For Python, see the [Python Quick Start](/docs/getting-started/quick-start-python). ## Prerequisites - **HORUS CLI** installed ([Installation Guide](/docs/getting-started/installation)) - **CMake 3.20+** and **GCC 12+** (or Clang 15+) ## What You'll Build A temperature monitoring system with two nodes communicating over shared memory: 1. **Sensor node** — publishes temperature readings at 10 Hz 2. **Monitor node** — subscribes, detects overheating, publishes alerts Both nodes run in the same scheduler with deterministic ordering. 
## Step 1: Create the Project ```bash horus new temp_monitor --lang cpp cd temp_monitor ``` This creates: ``` temp_monitor/ ├── horus.toml # Project config (single source of truth) ├── src/ │ └── main.cpp # Your code goes here └── include/ ``` ## Step 2: Write the Sensor Node Replace `src/main.cpp`: ```cpp #include #include #include using namespace horus::literals; // Simulated temperature sensor static float read_temperature() { static int tick = 0; return 22.0f + 5.0f * std::sin(tick++ * 0.1f); // oscillates 17-27°C } int main() { // Create scheduler at 10 Hz horus::Scheduler sched; sched.tick_rate(10_hz); sched.name("temp_monitor"); // Create pub/sub — both share the same SHM topic horus::Publisher sensor_pub("temp.reading"); horus::Subscriber monitor_sub("temp.reading"); // Sensor node: reads temperature, publishes it sched.add("sensor") .order(0) // runs first .tick([&] { auto sample = sensor_pub.loan(); // zero-copy from SHM sample->linear = read_temperature(); // store temp in linear field sample->timestamp_ns = 0; sensor_pub.publish(std::move(sample)); }) .build(); // Monitor node: reads temperature, checks threshold sched.add("monitor") .order(10) // runs after sensor .tick([&] { auto msg = monitor_sub.recv(); if (!msg) return; float temp = msg->get()->linear; if (temp > 25.0f) { std::printf("[ALERT] Temperature %.1f°C exceeds threshold!\n", temp); } else { std::printf("[OK] Temperature %.1f°C\n", temp); } }) .build(); std::printf("Starting temperature monitor (Ctrl+C to stop)...\n"); sched.spin(); // blocks until stopped } ``` ## Step 3: Build and Run ```bash horus build horus run ``` Output: ``` Starting temperature monitor (Ctrl+C to stop)... [OK] Temperature 22.0°C [OK] Temperature 22.5°C [OK] Temperature 23.5°C [OK] Temperature 24.8°C [ALERT] Temperature 25.9°C exceeds threshold! [ALERT] Temperature 26.7°C exceeds threshold! [ALERT] Temperature 26.9°C exceeds threshold! ... 
``` ## Step 4: Monitor with CLI In a second terminal: ```bash horus topic list # see active topics horus topic echo temp.reading # watch raw messages horus node list # see running nodes ``` ## What Just Happened? - **Zero-copy IPC** — `sensor_pub.loan()` returns a pointer directly into shared memory. The monitor reads from the same memory. No serialization, no copying. - **Deterministic ordering** — `order(0)` guarantees sensor runs before `order(10)` monitor, every tick. - **Single scheduler** — both nodes share one tick loop. No threads, no race conditions. ## Next Steps - [Getting Started with C++](/docs/getting-started/cpp) — full API overview with all message types - [C++ API Reference](/docs/reference/cpp-api) — complete class/method reference - [Migrating from ROS2 C++](/docs/tutorials/migrating-from-ros2-cpp) — side-by-side comparison - [Multi-Language Guide](/docs/concepts/multi-language) — mix C++, Rust, and Python in one system - [Tutorials](/docs/tutorials) — LiDAR robot, motor controllers, sensor fusion --- ## Quick Start (Python) Path: /getting-started/quick-start-python Description: Build your first HORUS application in 10 minutes — Python edition # Quick Start (Python) ## Prerequisites - [HORUS installed](/getting-started/installation) with `horus --help` working - Python 3.9+ with HORUS bindings (`python3 -c "import horus"` works) - A terminal and text editor ## Installing the Python Bindings Install the `horus` Python package from PyPI: ```bash pip install horus-robotics ``` If you are building from source (e.g., developing HORUS itself or need unreleased features), use [maturin](https://www.maturin.rs/) instead: ```bash cd horus_py maturin develop --release ``` Verify the installation: ```bash python3 -c "import horus; print('horus OK')" ``` ## What You'll Build A temperature monitoring system with two nodes: 1. **Sensor** — generates temperature readings and publishes them 2. 
**Monitor** — subscribes to the readings and displays them The nodes communicate through HORUS's shared-memory Topics — the same zero-copy IPC as Rust, accessible from Python. **Time estimate**: ~10 minutes ## Step 1: Create a New Project ```bash horus new temperature-monitor -p cd temperature-monitor ``` You should see three items in the project directory: - `src/main.py` — your application code - `horus.toml` — project configuration - `.horus/` — generated build files (managed automatically) ## Step 2: Write the Code Replace the contents of `src/main.py` with the following: ```python import horus # ── Sensor Node ────────────────────────────────────────────── # Publishes a simulated temperature reading every second. def make_sensor(): temp = [20.0] # mutable state via closure def tick(node): temp[0] += 0.1 node.send("temperature", temp[0]) return horus.Node(name="TemperatureSensor", tick=tick, rate=1, order=0, pubs=["temperature"]) # ── Monitor Node ───────────────────────────────────────────── # Subscribes to temperature readings and prints them. def monitor_tick(node): temp = node.recv("temperature") if temp is not None: print(f"Temperature: {temp:.1f}°C") monitor = horus.Node(name="TemperatureMonitor", tick=monitor_tick, rate=1, order=1, subs=["temperature"]) # ── Run ────────────────────────────────────────────────────── print("Starting temperature monitoring system...\n") horus.run(make_sensor(), monitor) ``` ## Step 3: Run It ```bash horus run ``` You should see output like: ``` Starting temperature monitoring system... Temperature: 20.1°C Temperature: 20.2°C Temperature: 20.3°C Temperature: 20.4°C ... ``` Press **Ctrl+C** to stop. 
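To see why the first printed value is 20.1 rather than 20.0, the closure-based sensor tick can be exercised without a scheduler. This is a minimal sketch: `StubNode` is a hypothetical stand-in that only records what `send()` receives, and it is not part of HORUS.

```python
class StubNode:
    """Hypothetical stand-in for the node object; records sent messages."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


def make_tick():
    temp = [20.0]  # one-element list so the inner function can mutate it

    def tick(node):
        temp[0] += 0.1  # increment happens before the send
        node.send("temperature", temp[0])

    return tick


tick = make_tick()
node = StubNode()
for _ in range(3):
    tick(node)

readings = [f"{value:.1f}" for _, value in node.sent]
print(readings)  # ['20.1', '20.2', '20.3']
```

Because the increment runs before `send()`, the initial 20.0 is never published. The list wrapper is one way to keep mutable state in a closure; `nonlocal` or a small class would work as well.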
## Step 3.5: Inspect Your System While your system is running (restart it with `horus run` in one terminal), open a **second terminal** in the same project directory and try these debugging tools: ```bash # List all active topics horus topic list ``` You should see `temperature` in the list — this is the shared memory topic your sensor is publishing to. ```bash # Watch messages on a topic in real time horus topic echo temperature ``` This prints every message as it's published. You'll see the temperature values streaming. Press Ctrl+C to stop. ```bash # Measure the actual publish rate horus topic hz temperature ``` This shows the real frequency of messages. Since your sensor runs at `rate=1`, you should see ~1.0 Hz. ## Step 4: Understand the Key Patterns You just used three core HORUS concepts: ### Node — Component Definition Each component is a `horus.Node` with a tick function: ```python def tick(node): node.send("temperature", value) sensor = horus.Node( name="TemperatureSensor", tick=tick, rate=1, # tick frequency in Hz order=0, # execution priority (lower = runs first) pubs=["temperature"], # declare published topics ) ``` The `tick` function receives the `node` instance and is called every cycle. State lives in the closure or as class attributes. ### Topics — Communication ```python # Send data to a topic node.send("temperature", value) # Receive data from a topic (returns None if no messages) temp = node.recv("temperature") ``` Both use the same topic name. HORUS manages shared memory automatically — same zero-copy IPC as Rust. ### horus.run() — One-Liner Execution ```python horus.run(sensor, monitor) ``` Creates a scheduler, adds all nodes, and runs until Ctrl+C. For more control, use `horus.Scheduler` directly. ## Troubleshooting | Symptom | Cause | Fix | |---------|-------|-----| | `ModuleNotFoundError: horus` | Python bindings not installed | Run `pip install horus-robotics`. 
For source builds: `cd horus_py && maturin develop --release` | | `Failed to create Topic` | Stale shared memory from a previous run | Run `horus clean --shm` | | Nothing prints | Monitor added but sensor missing | Ensure both nodes are passed to `horus.run()` | | Topics not visible to other processes | Running `python src/main.py` directly | Always use `horus run`; it sets up the SHM namespace | | `Permission denied` on SHM files | Shared memory permissions mismatch | Run `horus doctor` to diagnose. On Linux, check `/dev/shm/` permissions | | Output looks slow or laggy | Running in debug mode | Use `horus run --release` for optimized builds. Debug builds are 10-50x slower for compute-heavy code | | `horus: command not found` | HORUS not in PATH | Re-run the [installer](/getting-started/installation) or add `~/.horus/bin` to your `PATH` | | `TypeError` in tick function | Wrong tick function signature | Tick functions must accept exactly one argument: `def tick(node)`. 
The scheduler passes the node instance automatically | ## Key Takeaways - **Nodes** are created with `horus.Node(name, tick, rate, order, pubs, subs)` - **`node.send(topic, data)`** publishes data, **`node.recv(topic)`** subscribes (returns `None` if empty) - **`horus.run(*nodes)`** is the one-liner to run everything - State in tick functions lives in closures (lists for mutability) or class instances - The Python API uses the same shared-memory Topics as Rust — no performance penalty for IPC ## Next Steps - [Quick Start (Rust)](/getting-started/quick-start) — Build the same system in Rust - [Second Application (Python)](/getting-started/second-application-python) — Build a 3-node pipeline with filtering and monitoring - [Choosing a Language](/getting-started/choosing-language) — When to use Python vs Rust - [Common Mistakes (Python)](/getting-started/common-mistakes-python) — Avoid pitfalls specific to Python nodes ## Beyond the Basics You've seen Nodes, Topics, and `horus.run()`. HORUS has much more — here's what to explore next: | Feature | What it does | Guide | |---------|-------------|-------| | **Execution classes** | Run nodes as RT, compute-bound, event-driven, or async I/O | [Execution Classes](/concepts/execution-classes) | | **Watchdog & safety** | Detect frozen nodes, enforce deadlines, graduated degradation | [Safety Policies (Python)](/python/safety-policies) | | **BlackBox** | Flight recorder for post-mortem crash analysis | [BlackBox](/advanced/blackbox) | | **Deterministic mode** | Reproducible execution for simulation and CI | [Deterministic Mode](/advanced/deterministic-mode) | | **Record & Replay** | Tick-perfect replay for reproducing field bugs | [Record & Replay](/advanced/record-replay) | | **Fault tolerance** | Per-node failure policies (restart, skip, fatal) | [Circuit Breaker](/advanced/circuit-breaker) | | **Framework clock** | `horus.now()`, `horus.dt()`, `horus.budget_remaining()` | [Real-Time (Python)](/python/real-time) | | 
**Progressive config** | From prototype to production in 5 levels | [Choosing Your Configuration](/concepts/choosing-configuration) | ## See Also - [Python API Reference](/python/api/python-bindings) — Full `horus.Node`, `horus.run()`, `horus.Scheduler` docs - [Nodes (Concept)](/concepts/core-concepts-nodes) — How nodes work under the hood - [Topic (Concept)](/concepts/core-concepts-topic) — Shared memory architecture - [Python Examples](/python/examples) — More sample applications --- ## Choosing a Language Path: /getting-started/choosing-language Description: Should you use Rust, Python, or C++ with HORUS? # Choosing a Language HORUS supports **Rust**, **Python**, and **C++**. All three share the same shared-memory IPC — a Rust node can publish to a C++ subscriber with zero overhead. This guide helps you choose. ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) --- ## Quick Decision **Use Python if:** - You're prototyping or experimenting - You're new to robotics programming - You want to integrate with ML/AI libraries (TensorFlow, PyTorch) - Development speed matters more than runtime performance **Use Rust if:** - You need maximum performance - You're building production systems - You want compile-time safety guarantees - You're comfortable with Rust (or want to learn) **Use C++ if:** - You have existing C++ codebases or libraries to integrate - Your team already knows C++ (common in robotics, automotive, gaming) - You're migrating from ROS2 C++ (rclcpp) - You need direct hardware access with familiar tooling (CMake, GDB) - You want near-Rust performance without learning Rust --- ## Side-by-Side Comparison ### Hello World: Temperature Sensor **Python:** ```python import horus def sensor_tick(node): temp = 25.0 # Read sensor node.send("temperature", temp) sensor = horus.Node(name="TempSensor", tick=sensor_tick, order=5, pubs=["temperature"]) horus.run(sensor) ``` **Rust:** ```rust use horus::prelude::*; struct TempSensor { 
pub_topic: Topic, } impl TempSensor { fn new() -> Result { Ok(Self { pub_topic: Topic::new("temperature")? }) } } impl Node for TempSensor { fn name(&self) -> &str { "TempSensor" } fn tick(&mut self) { let temp = 25.0; // Read sensor self.pub_topic.send(temp); } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(TempSensor::new()?).order(5).build()?; scheduler.run() } ``` **Or with the Rust node! macro:** ```rust use horus::prelude::*; node! { TempSensor { pub { temperature: f32 -> "temperature" } tick { let temp = 25.0; self.temperature.send(temp); } } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(TempSensor::new()).order(5).build()?; scheduler.run() } ``` **C++:** ```cpp #include int main() { horus::Scheduler sched; horus::Publisher pub("temperature"); sched.add("TempSensor") .order(5) .tick([&] { auto sample = pub.loan(); // zero-copy SHM sample->linear = 25.0f; // read sensor pub.publish(std::move(sample)); }) .build(); sched.spin(); } ``` --- ## Detailed Comparison | Aspect | Python | Rust | C++ | |--------|--------|------|-----| | **Learning curve** | Easy | Steeper | Moderate | | **Setup time** | 5 minutes | 10 minutes | 10 minutes | | **Compile time** | None | A few seconds | A few seconds | | **Runtime performance** | Good | Excellent | Excellent | | **Memory safety** | Runtime checks | Compile-time guarantees | Manual (RAII helps) | | **ML/AI integration** | Excellent (numpy, torch) | Limited | Good (OpenCV, TensorRT) | | **Debugging** | Simple print | More tooling | GDB, Valgrind, ASAN | | **Production readiness** | Good for prototypes | Production-grade | Production-grade | | **ROS2 migration** | Easy | Moderate | Easiest (similar API) | | **IPC latency** | ~80ns send | ~11ns send | ~15ns send (FFI overhead) | --- ## Performance Comparison | Operation | Python | Rust | Difference | |-----------|--------|------|------------| | Node tick latency | ~10μs | ~1μs | 10x faster | | Message send | 
~2μs | ~400ns | 5x faster | | Control loop (1kHz) | Achievable | Easy | - | | Control loop (10kHz) | Difficult | Achievable | - | **Bottom line:** For most robotics applications, both are fast enough. Rust matters when you need very high-frequency control (>1kHz), hard real-time guarantees, or minimal memory footprint. ### Scheduling Features Both Rust and Python support the full scheduling API: | Feature | Rust | Python | Notes | |---------|------|--------|-------| | `.rate()` | Yes | Yes | Tick rate in Hz | | `.order()` | Yes | Yes | Execution priority | | `.budget()` | Yes | Yes | Tick time budget | | `.deadline()` | Yes | Yes | Hard deadline | | `.on_miss()` | Yes | Yes | Deadline miss policy | | `.priority()` | Yes | Yes | OS thread priority (SCHED_FIFO) | | `.core()` | Yes | Yes | CPU core pinning | | `.watchdog()` | Yes | Yes | Per-node watchdog timeout | | `.compute()` | Yes | Yes | CPU-bound thread pool | | `.async_io()` | Yes | Yes | I/O-bound async pool | | `.on(topic)` | Yes | Yes | Event-triggered execution | | `enter_safe_state()` | Yes | No | Requires implementing Node trait | **Performance difference:** While both languages expose the same API, Rust nodes achieve lower tick jitter and more predictable timing due to the absence of the GIL. For control loops above 1kHz with hard deadlines, Rust is recommended. **Best practice:** Use Rust for RT-critical driver nodes (motors, safety). Use Python for application logic (planning, ML inference, behavior trees). They communicate via shared memory topics — zero overhead across the language boundary. 
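One way to read the tick-latency row of the performance table is as a share of the tick period. The sketch below does that arithmetic using the approximate figures quoted in the table (~10μs for Python, ~1μs for Rust); `tick_overhead_fraction` is a name made up for this example, not a HORUS function.

```python
def tick_overhead_fraction(rate_hz, tick_latency_us):
    """Fraction of each tick period consumed by per-tick overhead."""
    period_us = 1_000_000 / rate_hz  # tick period in microseconds
    return tick_latency_us / period_us


for rate_hz in (1_000, 10_000):
    py = tick_overhead_fraction(rate_hz, 10)  # ~10 us, Python (from the table)
    rs = tick_overhead_fraction(rate_hz, 1)   # ~1 us, Rust (from the table)
    print(f"{rate_hz} Hz: Python {py:.1%}, Rust {rs:.1%}")
```

Under these figures the overhead is about 1% of the period for Python at 1 kHz and about 10% at 10 kHz, which is consistent with the table calling 10 kHz "Difficult" in Python. Real numbers depend on hardware and workload.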
--- ## When to Choose Python ### Rapid Prototyping ```python # Quick experiment - try different approaches fast import horus def controller_tick(node): input_val = node.recv("sensor") or 0.0 strategy = "aggressive" if strategy == "aggressive": output = input_val * 2.0 else: output = input_val * 0.5 node.send("output", output) ctrl = horus.Node(name="ExperimentalController", tick=controller_tick, subs=["sensor"], pubs=["output"]) ``` ### Machine Learning Integration ```python import torch import horus model = torch.load("my_model.pt") def ml_tick(node): sensor_data = node.recv("sensor_data") if sensor_data is not None: with torch.no_grad(): output = model(torch.tensor(sensor_data)) node.send("control_output", output.item()) ml_node = horus.Node(name="MLController", tick=ml_tick, subs=["sensor_data"], pubs=["control_output"]) ``` ### Education and Learning Python's readable syntax makes it easier to understand robotics concepts without fighting the language. --- ## When to Choose Rust ### Production Deployments ```rust // simplified // Rust catches bugs at compile time impl Node for SafetyMonitor { fn tick(&mut self) { // Compiler ensures we handle all cases match self.check_safety() { SafetyStatus::OK => self.continue_operation(), SafetyStatus::Warning(msg) => self.log_warning(&msg), SafetyStatus::Critical(msg) => self.emergency_stop(&msg), } } } ``` ### High-Frequency Control ```rust // simplified // Rust can sustain 10kHz+ control loops impl Node for MotorController { fn tick(&mut self) { // Microsecond-level timing is reliable let error = self.target - self.position; let output = self.pid.compute(error); self.motor.send(output); } } ``` ### Resource-Constrained Environments ```rust // Rust has minimal runtime overhead // Perfect for embedded systems and single-board computers ``` --- ## Mixed Language Projects You can use both languages in the same project! HORUS nodes communicate via shared memory, which works across languages. 
**Example:** Python for AI, Rust for control

[Diagram: Python node (ML inference, 10 Hz) → Rust node (controller, 100 Hz) → Rust node (motor driver, 1000 Hz). Caption: Mixed language project: Python for ML, Rust for control - connected via shared memory]

**Python ML node:**

```python
import horus

def detector_tick(node):
    camera_image = node.recv("camera")
    if camera_image is not None:
        detections = model.detect(camera_image)
        node.send("detections", detections)

detector = horus.Node(name="ObjectDetector", tick=detector_tick, subs=["camera"], pubs=["detections"])
horus.run(detector)
```

**Rust control node:**

```rust
// simplified
impl Node for NavigationController {
    fn tick(&mut self) {
        if let Some(detections) = self.detection_sub.recv() {
            // React to Python node's output
            self.plan_path(&detections);
        }
    }
}
```

---

## Recommendation by Use Case

| Use Case | Recommended Language |
|----------|---------------------|
| Learning HORUS | Python |
| University project | Python |
| Hobby robot | Either |
| Machine learning robot | Python + Rust |
| Industrial automation | Rust |
| Drone/UAV | Rust |
| Research prototype | Python |
| Competition robot | Rust |
| Product development | Rust |

---

## Getting Started

**Ready to start with Python?**

- [Quick Start (Python)](/getting-started/quick-start-python)
- [Python API Reference](/python/api/python-bindings)

**Ready to start with Rust?**

- [Quick Start](/getting-started/quick-start) (uses Rust)
- [node! Macro Guide](/concepts/node-macro)
- [Rust API Reference](/rust/api)

---

## Still Unsure?

**Start with Python.** It's faster to get something working, and you can always port critical parts to Rust later. HORUS makes it easy to mix languages.
--- ## See Also - [Quick Start (Rust)](/getting-started/quick-start) — First Rust app - [Quick Start (Python)](/getting-started/quick-start-python) — First Python app - [Multi-Language Support](/concepts/multi-language) — How Rust and Python interoperate --- ## Common Mistakes Path: /getting-started/common-mistakes Description: Avoid these common beginner pitfalls when using HORUS # Common Mistakes New to HORUS? Here are the most common mistakes beginners make and how to fix them. ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - [Quick Start](/getting-started/quick-start) completed --- ## 1. Using Slashes in Topic Names **The Problem:** ```rust // simplified // Works on Linux only — fails on macOS! let topic: Topic = Topic::new("sensors/lidar")?; ``` **Why:** On Linux, slashes create subdirectories in the shared memory filesystem which works fine. On macOS, `shm_open()` does not support slashes in names, so this will fail. **The Fix:** ```rust // simplified // CORRECT — Use dots for cross-platform compatibility let topic: Topic = Topic::new("sensors.lidar")?; ``` Use dot-separated names (`"sensors.lidar"`, `"camera.rgb"`) for portable topic names that work on all platforms. --- ## 2. Forgetting to Call recv() Every Tick **The Problem:** ```rust // simplified fn tick(&mut self) { // Only check for messages sometimes if self.counter % 10 == 0 { if let Some(data) = self.sensor_sub.recv() { self.process(data); } } self.counter += 1; } ``` **Why:** Messages can be missed if you don't check every tick. Topic uses a ring buffer (16-1024 slots by default), and old messages are overwritten when the buffer fills up. 
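The overwrite behaviour can be modelled with a bounded deque. This is only an analogy in plain Python: the 4-slot size is an arbitrary choice for the demo (the text above gives 16-1024 slots as the real range), and the actual shared-memory ring buffer may differ in detail.

```python
from collections import deque

SLOTS = 4  # deliberately tiny so the overwrite is visible
ring = deque(maxlen=SLOTS)  # oldest entry is dropped when full

# The publisher sends ten messages while the subscriber is not reading.
for seq in range(10):
    ring.append(seq)

survivors = list(ring)
lost = 10 - len(survivors)
print(f"still readable: {survivors}, overwritten: {lost}")
# still readable: [6, 7, 8, 9], overwritten: 6
```

A subscriber that only reads every tenth tick would see at most the last `SLOTS` messages here; everything older has already been replaced.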
**The Fix:** ```rust // simplified fn tick(&mut self) { // ALWAYS check for new messages if let Some(data) = self.sensor_sub.recv() { self.last_data = Some(data); } // Use cached data for processing if self.counter % 10 == 0 { if let Some(ref data) = self.last_data { self.process(data); } } self.counter += 1; } ``` --- ## 3. Blocking in tick() **The Problem:** ```rust // simplified fn tick(&mut self) { // WRONG - This blocks the entire scheduler! let data = std::fs::read_to_string("large_file.txt").unwrap(); std::thread::sleep(Duration::from_millis(100)); } ``` **Why:** All nodes run in a single tick cycle. Blocking one node blocks them all. **The Fix:** ```rust // simplified fn init(&mut self) -> Result<()> { // Do slow initialization in init(), not tick() self.data = std::fs::read_to_string("large_file.txt")?; Ok(()) } fn tick(&mut self) { // Keep tick() fast - ideally under 1ms self.process(&self.data); } ``` --- ## 4. Wrong Priority Order **The Problem:** ```rust // simplified // WRONG - Logger runs before sensor! scheduler.add(logger).order(0).build()?; // Order 0 (runs first) scheduler.add(sensor).order(10).build()?; // Order 10 scheduler.add(controller).order(5).build()?; // Order 5 ``` **Why:** Lower order number = runs first. Safety-critical code should be order 0. **The Fix:** ```rust // simplified // CORRECT - Proper ordering scheduler.add(safety_monitor).order(0).build()?; // Safety first! scheduler.add(sensor).order(5).build()?; // Then sensors scheduler.add(controller).order(10).build()?; // Then control scheduler.add(logger).order(100).build()?; // Logging last ``` --- ## 5. Not Implementing shutdown() for Motors **The Problem:** ```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "motor" } fn tick(&mut self) { self.motor.set_velocity(self.velocity); } // No shutdown() implemented! } ``` **Why:** When you press Ctrl+C, the motor keeps running at its last velocity! 
**The Fix:** ```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "motor" } fn tick(&mut self) { self.motor.set_velocity(self.velocity); } fn shutdown(&mut self) -> Result<()> { // CRITICAL: Stop motor on shutdown! hlog!(info, "Stopping motor for safe shutdown"); self.motor.set_velocity(0.0); Ok(()) } } ``` --- ## 6. Not Deriving Required Traits for Custom Messages **The Problem:** ```rust // simplified struct MyMessage { x: f32, y: f32, } // Error: the trait bound `MyMessage: Clone` is not satisfied let topic: Topic = Topic::new("data")?; ``` **Why:** Topic requires types to implement `Clone + Send + Sync + Serialize + Deserialize`. Most Rust types are `Send + Sync` automatically, so you usually only need to derive the other three. **The Fix:** ```rust // simplified use serde::{Serialize, Deserialize}; #[derive(Clone, Serialize, Deserialize)] struct MyMessage { x: f32, y: f32, name: String, // Strings work fine! data: Vec, // Vecs work too! } let topic: Topic = Topic::new("data")?; ``` Or use the standard message types which already have the required traits: ```rust // simplified use horus::prelude::*; let topic: Topic = Topic::new("cmd_vel")?; let topic: Topic = Topic::new("odom")?; ``` --- ## 7. Thinking send() Returns a Result **The Problem:** ```rust // simplified fn tick(&mut self) { // WRONG - send() is infallible, this won't compile if let Err(e) = self.pub_topic.send(data) { hlog!(warn, "Failed to publish: {:?}", e); } } ``` **Why:** `send()` returns `()`, not `Result`. It uses ring buffer "keep last" semantics — when the buffer is full, the oldest message is overwritten. This means `send()` always succeeds. **The Fix:** ```rust // simplified fn tick(&mut self) { // CORRECT - send() is infallible, just call it self.pub_topic.send(data); } ``` --- ## 8. Creating Topic Inside tick() **The Problem:** ```rust // simplified fn tick(&mut self) { // WRONG - Creates new Topic every tick! 
let topic: Topic = Topic::new("data").unwrap(); topic.send(42.0); } ``` **Why:** Creating a Topic is expensive (opens shared memory). Doing it every tick wastes resources. **The Fix:** ```rust // simplified struct MyNode { topic: Topic, // Store Topic in struct } impl MyNode { fn new() -> Result { Ok(Self { topic: Topic::new("data")?, // Create once }) } } fn tick(&mut self) { self.topic.send(42.0); // Reuse existing Topic } ``` --- ## 9. Mismatched Topic Types **The Problem:** ```rust // simplified // Publisher sends f32 let pub_topic: Topic = Topic::new("data")?; pub_topic.send(42.0); // Subscriber expects i32 let sub_topic: Topic = Topic::new("data")?; // WRONG TYPE! let value = sub_topic.recv(); // Will get garbage data ``` **Why:** HORUS doesn't check types at runtime. Mismatched types cause data corruption. **The Fix:** ```rust // simplified // Use the SAME type for publisher and subscriber let pub_topic: Topic = Topic::new("data")?; let sub_topic: Topic = Topic::new("data")?; // Same type! ``` **Pro tip:** Use named message types to avoid confusion: ```rust // simplified type SensorReading = f32; let pub_topic: Topic = Topic::new("sensor")?; let sub_topic: Topic = Topic::new("sensor")?; ``` --- ## 10. Using Raw Node Trait When node! Macro Would Be Simpler **The Problem:** ```rust // simplified // Manual implementation - lots of boilerplate struct MySensor { pub_topic: Topic, } impl MySensor { fn new() -> Result { Ok(Self { pub_topic: Topic::new("sensor.data")?, }) } } impl Node for MySensor { fn name(&self) -> &str { "MySensor" } fn tick(&mut self) { let data = 42.0; // Read sensor self.pub_topic.send(data); } } ``` **The Fix:** ```rust // simplified // Use node! macro - 75% less code! node! { MySensor { pub { sensor_data: f32 -> "sensor.data" } tick { let data = 42.0; // Read sensor self.sensor_data.send(data); } } } ``` See [node! Macro](/concepts/node-macro) for more examples. --- ## 11. 
Not Enabling Watchdog for Safety-Critical Nodes **The Problem:** ```rust // simplified fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(MotorController::new()?).order(0).rate(100_u64.hz()).build()?; scheduler.run() } ``` **Why:** Without a watchdog, if the motor controller node hangs (deadlock, hardware stall, infinite loop), nothing detects it. The robot keeps running on its last commanded velocity. **The Fix:** ```rust // simplified fn main() -> Result<()> { let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()); // global watchdog for all nodes scheduler.add(MotorController::new()?) .order(0) .rate(100_u64.hz()) .watchdog(50_u64.ms()) // tighter timeout for motor control .on_miss(Miss::SafeMode) // enter safe state if tick overruns .build()?; scheduler.run() } ``` Also implement `enter_safe_state()` on any node that controls actuators: ```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "motor" } fn tick(&mut self) { self.motor.set_velocity(self.velocity); } fn enter_safe_state(&mut self) { self.motor.set_velocity(0.0); self.motor.engage_brake(); } } ``` **Rule of thumb:** Set the watchdog to 5x–10x the tick period (e.g., 100 Hz node → 50–100 ms watchdog). See [Safety Monitor](/advanced/safety-monitor) for graduated degradation details. 
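The rule of thumb is simple arithmetic on the tick period. A small sketch, with `watchdog_range_ms` as a hypothetical helper name rather than anything HORUS provides:

```python
def watchdog_range_ms(rate_hz, low=5, high=10):
    """Return (min, max) watchdog timeout in ms as 5x-10x the tick period."""
    period_ms = 1000 / rate_hz
    return (low * period_ms, high * period_ms)


print(watchdog_range_ms(100))   # (50.0, 100.0)
print(watchdog_range_ms(1000))  # (5.0, 10.0)
```

For the 100 Hz motor controller in the example above, the 50 ms per-node watchdog sits at the tight end of this range.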
--- ## Quick Reference | Mistake | Fix | |---------|-----| | Slashes in topic names | Use dots: `sensors.lidar` | | Not checking recv() every tick | Always call recv(), cache last value | | Blocking in tick() | Keep tick() under 1ms, do I/O in init() | | Wrong priority order | Lower number = higher priority | | No shutdown() for motors | Always stop actuators in shutdown() | | Missing derives on messages | Add Clone, Serialize, Deserialize | | Treating send() as fallible | `send()` is infallible — just call it directly | | Creating Topic in tick() | Create Topic once in new() | | Mismatched topic types | Use same type for pub and sub | | Too much boilerplate | Use the `node!` macro | | No watchdog on safety-critical nodes | Enable `.watchdog()` + `.on_miss()` + `enter_safe_state()` | --- ## Still Having Issues? - Check [Troubleshooting](/getting-started/troubleshooting) for error messages - See [Examples](/rust/examples/basic-examples) for working code - Run `horus monitor` to see what your nodes are doing --- ## See Also - [Troubleshooting](/getting-started/troubleshooting) — Detailed error resolution - [Testing](/development/testing) — Catch mistakes early - [Debugging](/development/debugging) — Diagnose runtime issues - [Safety Monitor](/advanced/safety-monitor) — Full watchdog and deadline enforcement reference - [Choosing Your Configuration](/concepts/choosing-configuration) — Progressive complexity guide (Level 0→4) - [Execution Classes](/concepts/execution-classes) — When to use `.compute()`, `.on()`, `.async_io()` --- ## Common Mistakes (Python) Path: /getting-started/common-mistakes-python Description: Avoid these common Python-specific pitfalls when using HORUS # Common Mistakes (Python) New to HORUS with Python? Here are the most common mistakes beginners make and how to fix them. ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - [Quick Start (Python)](/getting-started/quick-start-python) completed --- ## 1. 
Using Slashes in Topic Names

**The Problem:**

```python
node.send("sensors/lidar", scan_data)
```

**Why:** On Linux, slashes create subdirectories in the shared memory filesystem, which works fine. On macOS, `shm_open()` does not support slashes in names, so this will fail with an OS error.

**The Fix:**

```python
# CORRECT: use dots for cross-platform compatibility
node.send("sensors.lidar", scan_data)
```

Use dot-separated names (`"sensors.lidar"`, `"camera.rgb"`) for portable topic names that work on all platforms.

---

## 2. Blocking the tick() Function

**The Problem:**

```python
import time
import requests

def tick(node):
    # WRONG: blocks the entire scheduler!
    time.sleep(0.1)
    response = requests.get("http://api.example.com/data")
    node.send("data", response.json())
```

**Why:** All nodes run in a single tick cycle, so a blocking call in one node delays every other node. `time.sleep()` and synchronous HTTP calls do not return until they finish, which holds up the entire tick cycle for that long. If you have a motor controller at 1kHz, a 100ms sleep in any node will cause it to miss about 100 deadlines.

**The Fix:**

```python
import horus

# Option 1: Use an async tick for I/O-bound work
import aiohttp

async def tick(node):
    async with aiohttp.ClientSession() as session:
        async with session.get("http://api.example.com/data") as resp:
            data = await resp.json()
    node.send("data", data)

# async def is auto-detected; the node runs on the async executor
fetcher = horus.Node(name="fetcher", pubs=["data"], tick=tick, rate=1)
```

```python
# Option 2: For CPU-bound work, use compute=True
def tick(node):
    if node.has_msg("image"):
        frame = node.recv("image")
        result = run_heavy_inference(frame)  # Runs on thread pool
        node.send("detections", result)

detector = horus.Node(name="detector", pubs=["detections"], subs=["image"], tick=tick, rate=30, compute=True)
```

**Rule of thumb:** Keep synchronous tick functions under 1ms. Use `async def` for network I/O, `compute=True` for CPU-heavy work.

---

## 3.
Using Dicts When You Need Typed Topics **The Problem:** ```python def controller_tick(node): # Works, but slow — ~5-50μs per message via JSON serialization node.send("cmd_vel", {"linear": 0.5, "angular": 0.1}) controller = horus.Node(name="nav", pubs=["cmd_vel"], tick=controller_tick, rate=1000) ``` **Why:** String topics (`pubs=["cmd_vel"]`) use `GenericMessage` with MessagePack serialization, which costs 5-50μs per message. At 1kHz, that is 5-50ms per second spent just on serialization. Typed topics use zero-copy POD transport at ~1.5μs. Additionally, dict topics cannot cross to Rust nodes — a Rust subscriber using `Topic` will never see Python dicts. **The Fix:** ```python import horus def controller_tick(node): # Zero-copy POD transport — ~1.5μs, visible to Rust nodes cmd = horus.CmdVel(linear=0.5, angular=0.1) node.send("cmd_vel", cmd) controller = horus.Node(name="nav", pubs=[horus.CmdVel], tick=controller_tick, rate=1000) ``` **When to use each:** | Use case | Topic type | Latency | |----------|-----------|---------| | Control loops (motor commands, sensor fusion) | Typed (`horus.CmdVel`) | ~1.5μs | | Debug/logging/telemetry | String (dicts) | ~5-50μs | | Cross-language (Python to Rust) | Typed (required) | ~1.5μs | | Rapid prototyping | String (dicts) | ~5-50μs | --- ## 4. Forgetting to Handle None from recv() **The Problem:** ```python def tick(node): # WRONG — crashes with AttributeError when no message is available! temp = node.recv("temperature") print(f"Temperature: {temp['value']:.1f}°C") ``` **Why:** `node.recv()` returns `None` when no message is available. On the first tick, or if the publisher is slower than the subscriber, there will be no message. Accessing attributes or keys on `None` raises `AttributeError` or `TypeError`. 
**The Fix:**

```python
# Option 1: Guard with None check
def tick(node):
    temp = node.recv("temperature")
    if temp is not None:
        print(f"Temperature: {temp['value']:.1f}°C")
```

```python
# Option 2: Check first with has_msg()
def tick(node):
    if node.has_msg("temperature"):
        temp = node.recv("temperature")
        print(f"Temperature: {temp['value']:.1f}°C")
```

```python
# Option 3: Cache last known value
last_temp = [None]

def tick(node):
    temp = node.recv("temperature")
    if temp is not None:
        last_temp[0] = temp
    if last_temp[0] is not None:
        print(f"Temperature: {last_temp[0]['value']:.1f}°C")
```

---

## 5. Not Calling horus.run()

**The Problem:**

```python
import horus

def sensor_tick(node):
    node.send("data", 42.0)

sensor = horus.Node(name="sensor", pubs=["data"], tick=sensor_tick, rate=10)
# Script ends here, so nothing happens!
```

**Why:** Creating a `horus.Node` only defines the node. It does not start the scheduler or begin ticking. Without `horus.run()` or a `Scheduler`, nothing executes.

**The Fix:**

```python
import horus

def sensor_tick(node):
    node.send("data", 42.0)

sensor = horus.Node(name="sensor", pubs=["data"], tick=sensor_tick, rate=10)

# MUST call run() to start the scheduler
horus.run(sensor)
```

For more control, use `horus.Scheduler` directly:

```python
sched = horus.Scheduler(tick_rate=100, watchdog_ms=500)
sched.add(sensor)
sched.run()
```

---

## 6. Import Errors: _horus Module Not Found

**The Problem:**

```
$ python3 -c "import horus"
ModuleNotFoundError: No module named '_horus'
```

**Why:** The `horus` Python package has two layers: `_horus` (compiled Rust PyO3 bindings) and `horus` (Python wrapper). If `_horus` is missing, the Rust native extension was not installed or was installed for a different Python version.
**The Fix:**

```bash
# Install from PyPI (recommended)
pip install horus-robotics

# Verify it works
python3 -c "import horus; print('OK')"
```

If you are building from source:

```bash
cd horus_py
maturin develop --release
```

**Common causes:**

| Symptom | Cause | Fix |
|---------|-------|-----|
| `No module named '_horus'` | Native extension not installed | `pip install horus-robotics` |
| `No module named 'horus'` | Package not in current environment | Check `which python3` and `pip list` |
| Works in terminal, fails in script | Different Python environment | Use `horus run` instead of `python3 script.py` |
| `ImportError: ... ABI` | Python version mismatch | Reinstall with the correct Python version |

---

## 7. Topic Name Typos

**The Problem:**

```python
def sensor_tick(node):
    node.send("temperture", 22.5)  # Typo: "temperture"

def monitor_tick(node):
    temp = node.recv("temperature")  # Correct name, but never receives!
    if temp is not None:
        print(f"Temp: {temp}")

sensor = horus.Node(name="sensor", pubs=["temperture"], tick=sensor_tick, rate=1)
monitor = horus.Node(name="monitor", subs=["temperature"], tick=monitor_tick, rate=1)
```

**Why:** Python has no compile-time checking for topic names. Mismatched names silently create two separate topics: the publisher writes to one, the subscriber reads from another. The subscriber's `recv()` always returns `None`, with no error.
**The Fix:**

```python
import horus

# Define topic names as constants
TOPIC_TEMPERATURE = "sensor.temperature"

def sensor_tick(node):
    node.send(TOPIC_TEMPERATURE, 22.5)

def monitor_tick(node):
    temp = node.recv(TOPIC_TEMPERATURE)
    if temp is not None:
        print(f"Temp: {temp}")

sensor = horus.Node(name="sensor", pubs=[TOPIC_TEMPERATURE], tick=sensor_tick, rate=1)
monitor = horus.Node(name="monitor", subs=[TOPIC_TEMPERATURE], tick=monitor_tick, rate=1)
```

**Better yet, use typed topics**, which enforce the topic name automatically:

```python
# Typed topics include their canonical name, so there is no string to mistype
sensor = horus.Node(name="sensor", pubs=[horus.Imu], tick=sensor_tick, rate=100)
monitor = horus.Node(name="monitor", subs=[horus.Imu], tick=monitor_tick, rate=100)
```

**Debugging tip:** Use `horus topic list` in another terminal to see all active topics and spot mismatches.

---

## 8. Using Global State Instead of Closure/Class State

**The Problem:**

```python
import horus

# WRONG: global mutable state shared between ALL nodes
counter = 0

def tick_a(node):
    global counter
    counter += 1
    node.send("count_a", counter)

def tick_b(node):
    global counter
    counter += 1  # Both nodes mutate the same variable!
    node.send("count_b", counter)

node_a = horus.Node(name="a", pubs=["count_a"], tick=tick_a, rate=10)
node_b = horus.Node(name="b", pubs=["count_b"], tick=tick_b, rate=10)
horus.run(node_a, node_b)
```

**Why:** Global variables are shared across all nodes. When two nodes mutate the same global, the values interleave unpredictably. This also breaks deterministic mode, where execution order must produce reproducible results.
**The Fix:**

```python
import horus

# Option 1: Closure state (simple, recommended for small state)
def make_counter_node(name, topic):
    counter = [0]  # Mutable via list, so each node gets its own

    def tick(node):
        counter[0] += 1
        node.send(topic, counter[0])

    return horus.Node(name=name, pubs=[topic], tick=tick, rate=10)

node_a = make_counter_node("a", "count_a")
node_b = make_counter_node("b", "count_b")
horus.run(node_a, node_b)
```

```python
# Option 2: Class state (better for complex state)
class CounterNode:
    def __init__(self):
        self.counter = 0

    def tick(self, node):
        self.counter += 1
        node.send("count", self.counter)

state = CounterNode()
node = horus.Node(name="counter", pubs=["count"], tick=state.tick, rate=10)
horus.run(node)
```

---

## 9. Not Checking for Slow Subscribers

**The Problem:**

```python
def monitor_tick(node):
    data = node.recv("fast.sensor")
    if data is not None:
        # Process only the latest message, silently losing all others
        expensive_analysis(data)
```

**Why:** If the publisher runs at 1kHz and your subscriber ticks at 10Hz, `recv()` only returns the latest message. The other ~99 messages per tick are silently overwritten in the ring buffer. For many applications (motor commands, camera frames) this is fine, because you want the latest data. But for event streams (button presses, error reports, waypoints), losing messages is a bug.
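The loss is easy to reproduce outside HORUS. The sketch below is a plain-Python simulation, not the HORUS ring buffer: it assumes a bounded buffer that overwrites its oldest entry when full, modelled with `collections.deque(maxlen=...)`. The helper name `simulate` is hypothetical.

```python
from collections import deque


def simulate(pub_hz, sub_hz, capacity, seconds=1):
    """Count messages a slow subscriber receives from a fast publisher.

    Assumption: the topic behaves like a bounded buffer that overwrites the
    oldest entry when full (modelled here with deque(maxlen=capacity)).
    """
    buffer = deque(maxlen=capacity)
    msgs_per_sub_tick = pub_hz // sub_hz
    published = received = 0

    for _ in range(sub_hz * seconds):
        # The publisher runs several times between two subscriber ticks.
        for _ in range(msgs_per_sub_tick):
            buffer.append(published)
            published += 1
        # The subscriber drains whatever survived in the buffer.
        received += len(buffer)
        buffer.clear()

    return published, received


if __name__ == "__main__":
    for capacity in (1, 16, 128):
        sent, got = simulate(pub_hz=1000, sub_hz=10, capacity=capacity)
        print(f"capacity={capacity:>3}: sent={sent}, received={got}, lost={sent - got}")
```

With a capacity of 1 the subscriber sees 10 of 1000 messages. Once the capacity reaches the 100 messages published per subscriber tick, nothing is lost, provided the subscriber drains the whole buffer on every tick.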
**The Fix:**

```python
# Use recv_all() to drain all queued messages
def monitor_tick(node):
    messages = node.recv_all("events")
    for msg in messages:
        process_event(msg)
```

```python
# Or increase the ring buffer capacity for bursty topics
event_node = horus.Node(
    name="event_handler",
    subs={"events": {"type": "events", "capacity": 4096}},
    tick=monitor_tick,
    rate=10,
)
```

**Use `node.info` to monitor health during development:**

```python
def tick(node):
    data = node.recv("sensor")
    if data is not None:
        process(data)

    # Periodically check metrics
    if node.info and node.info.tick_count() % 1000 == 0:
        metrics = node.info.get_metrics()
        node.log_info(f"Avg tick: {node.info.avg_tick_duration_ms():.2f}ms")
```

---

## 10. Mixing Python async and HORUS tick

**The Problem:**

```python
import asyncio
import horus

async def tick(node):
    # WRONG: calling asyncio.run() or loop.run_until_complete() inside a tick
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(fetch_data())
    node.send("data", result)
```

```python
# Also WRONG: using asyncio.sleep() in a synchronous tick
import asyncio

def tick(node):
    asyncio.run(asyncio.sleep(0.01))  # Creates a new event loop every tick!
    node.send("heartbeat", True)
```

**Why:** HORUS has its own async executor for `async def` tick functions. Manually creating or running `asyncio` event loops inside a tick conflicts with the scheduler's execution model, can deadlock, and creates a new event loop on every tick (which is expensive).
**The Fix:**

```python
import aiohttp
import horus

# CORRECT: just make tick async and await directly
async def tick(node):
    async with aiohttp.ClientSession() as session:
        async with session.get("http://api.example.com/data") as resp:
            data = await resp.json()
            node.send("data", data)

# HORUS auto-detects async def and runs it on its async executor
fetcher = horus.Node(name="fetcher", pubs=["data"], tick=tick, rate=1)
horus.run(fetcher)
```

**Rules for async nodes:**

- Declare `tick` (and optionally `init`/`shutdown`) as `async def`; HORUS detects this automatically
- Use `await` for all I/O operations inside the tick
- Do not use `asyncio.run()`, `asyncio.get_event_loop()`, or `loop.run_until_complete()` inside tick functions
- Async nodes cannot use `compute=True` or `on="topic"`; these are mutually exclusive execution modes

---

## Quick Reference

| Mistake | Fix |
|---------|-----|
| Slashes in topic names | Use dots: `sensors.lidar` |
| Blocking tick() with sleep/HTTP | Use `async def` tick or `compute=True` |
| Dicts in control loops | Use typed topics: `pubs=[horus.CmdVel]` |
| Not handling `None` from `recv()` | Check `if temp is not None` or use `has_msg()` |
| Forgetting `horus.run()` | Always call `horus.run(*nodes)` to start execution |
| `_horus` import error | `pip install horus-robotics` or `maturin develop --release` |
| Topic name typos | Use string constants or typed topics |
| Global mutable state | Use closure state (list) or class instances |
| Missing messages from fast publishers | Use `recv_all()` or increase `capacity` |
| Manual asyncio inside tick | Use `async def tick(node)`; HORUS runs the executor |

---

## Still Having Issues?

- Check [Troubleshooting](/getting-started/troubleshooting) for error messages
- See [Python Examples](/python/examples) for working code
- Run `horus topic list` to see active topics and debug connectivity
- Run `horus monitor` to watch node tick rates and health in real time

---

## See Also

- [Common Mistakes (Rust)](/getting-started/common-mistakes): Rust-specific pitfalls
- [Python API Reference](/python/api): Full Node, Scheduler, Topic reference
- [Python Bindings](/python/api/python-bindings): Deep dive into the PyO3 layer
- [Async Nodes](/python/api/async-nodes): Async I/O patterns and best practices
- [Troubleshooting](/getting-started/troubleshooting): Detailed error resolution

---

## Second Application: C++

Path: /getting-started/second-application-cpp
Description: Build a multi-sensor robot with RT scheduling, transforms, parameters, and services

# Second Application: C++

You've built a [Quick Start](/docs/getting-started/quick-start-cpp) app. Now build a real robot: a mobile base with lidar, IMU, motor control, coordinate transforms, runtime parameters, and a service.

## What You'll Build

```
┌──────────┐    ┌────────────┐    ┌───────────┐
│  LiDAR   │───→│ Controller │───→│  Motors   │
│ (10 Hz)  │    │  (50 Hz)   │    │  (50 Hz)  │
└──────────┘    └────────────┘    └───────────┘
                     ↑
┌──────────┐         │
│   IMU    │─────────┘
│ (100 Hz) │
└──────────┘
```

Features exercised:

- Multi-rate nodes (10 Hz sensor, 50 Hz controller, 100 Hz IMU)
- Coordinate transforms (lidar → base_link)
- Runtime parameters (max_speed, safety_distance)
- Service (emergency stop)
- Budget/deadline enforcement

## The Code

```cpp
#include <horus/horus.hpp>  // ASSUMPTION: header name was lost in extraction
#include <atomic>
#include <cmath>
#include <cstdint>
#include <cstdio>

using namespace horus::literals;

// ASSUMPTION: the message type was lost in extraction; CmdVel matches the
// fields used below (linear, angular, timestamp_ns).
using Msg = horus::msg::CmdVel;

// ── Simulated sensor data ───────────────────────────────────────────────
static int lidar_seq = 0;

static float sim_distance() {
    return 1.0f + 0.5f * std::sin(lidar_seq++ * 0.05f);  // oscillates 0.5-1.5m
}

// ── Shared state ────────────────────────────────────────────────────────
static std::atomic<bool> estop_active{false};

int main() {
    // ── Scheduler ───────────────────────────────────────────────────
    horus::Scheduler sched;
    sched.tick_rate(100_hz);
    sched.name("mobile_base");

    // ── Coordinate transforms ───────────────────────────────────────
    horus::TransformFrame tf;
    tf.register_frame("world");
    tf.register_frame("base_link", "world");
    tf.register_frame("lidar", "base_link");

    // Lidar is 20cm forward, 30cm up from base
    tf.update("lidar", {0.2, 0.0, 0.3}, {0, 0, 0, 1}, 0);

    // ── Runtime parameters ──────────────────────────────────────────
    horus::Params params;
    params.set("max_speed", 0.5);        // m/s
    params.set("safety_distance", 0.3);  // meters
    params.set("controller_gain", 0.8);

    // ── Topics ──────────────────────────────────────────────────────
    horus::Publisher<Msg> lidar_pub("lidar.scan");
    horus::Subscriber<Msg> lidar_sub("lidar.scan");
    horus::Publisher<Msg> imu_pub("imu.data");
    horus::Subscriber<Msg> imu_sub("imu.data");
    horus::Publisher<Msg> cmd_pub("motor.cmd");
    horus::Subscriber<Msg> cmd_sub("motor.cmd");

    // ── Nodes ───────────────────────────────────────────────────────

    // LiDAR driver: 10 Hz, publishes range data
    static int lidar_tick_count = 0;
    sched.add("lidar_driver")
        .order(0)
        .tick([&] {
            // Only publish every 10th tick (10 Hz at 100 Hz scheduler)
            if (++lidar_tick_count % 10 != 0) return;
            auto msg = lidar_pub.loan();
            msg->linear = sim_distance();  // range in meters
            // uint64_t is an assumed field type; the cast target was lost
            msg->timestamp_ns = static_cast<uint64_t>(lidar_tick_count);
            lidar_pub.publish(std::move(msg));
        })
        .build();

    // IMU: every tick (100 Hz), publishes orientation
    sched.add("imu_driver")
        .order(1)
        .tick([&] {
            auto msg = imu_pub.loan();
            msg->angular = 0.01f;  // simulated yaw rate
            imu_pub.publish(std::move(msg));
        })
        .build();

    // Controller: 50 Hz, reads lidar + IMU, publishes motor commands
    static int ctrl_tick = 0;
    sched.add("controller")
        .order(10)
        .budget(5_ms)
        .on_miss(horus::Miss::Skip)
        .tick([&] {
            if (++ctrl_tick % 2 != 0) return;  // 50 Hz from 100 Hz
            if (estop_active.load()) return;

            // Read parameters
            double max_speed = params.get("max_speed", 0.5);
            double safety_dist = params.get("safety_distance", 0.3);
            double gain = params.get("controller_gain", 0.8);

            // Read lidar
            float range = 999.0f;
            auto scan = lidar_sub.recv();
            if (scan) range = scan->get()->linear;

            // Read IMU
            float yaw_rate = 0.0f;
            auto imu = imu_sub.recv();
            if (imu) yaw_rate = imu->get()->angular;

            // Simple obstacle avoidance
            auto cmd = cmd_pub.loan();
            if (range < static_cast<float>(safety_dist)) {
                cmd->linear = 0.0f;
                cmd->angular = 0.5f;  // turn away
            } else {
                cmd->linear = static_cast<float>(max_speed * gain);
                cmd->angular = -yaw_rate;  // compensate drift
            }
            cmd_pub.publish(std::move(cmd));
        })
        .build();

    // Motor driver: reads commands, "applies" them
    static int motor_tick = 0;
    sched.add("motor_driver")
        .order(20)
        .tick([&] {
            if (++motor_tick % 2 != 0) return;  // 50 Hz
            auto cmd = cmd_sub.recv();
            if (!cmd) return;
            if (motor_tick % 100 == 0) {  // print every 1 second
                std::printf("[motor] linear=%.2f angular=%.2f\n",
                            cmd->get()->linear, cmd->get()->angular);
            }
        })
        .build();

    // ── Verify setup ────────────────────────────────────────────────
    auto nodes = sched.node_list();
    std::printf("Robot ready: %zu nodes\n", nodes.size());
    for (auto& n : nodes) std::printf("  - %s\n", n.c_str());

    // Verify transforms
    if (tf.can_transform("lidar", "world")) {
        auto t = tf.lookup("lidar", "world");
        if (t) {
            std::printf("Lidar position: (%.1f, %.1f, %.1f) in world frame\n",
                        t->translation[0], t->translation[1], t->translation[2]);
        }
    }

    // Run for 3 seconds
    for (int i = 0; i < 300; i++) {
        sched.tick_once();

        // Simulate parameter change at 1.5 seconds
        if (i == 150) {
            params.set("max_speed", 0.3);
            std::printf("[param] max_speed reduced to 0.3 m/s\n");
        }
    }

    std::printf("Done: 300 ticks completed\n");
    return 0;
}
```

## Build and Run

```bash
horus build && horus run
```

## Key Patterns Demonstrated

### Multi-Rate Nodes

All nodes share one 100 Hz scheduler. Each node divides internally:

- LiDAR: ticks every 10th iteration = 10 Hz
- Controller + Motor: every 2nd iteration = 50 Hz
- IMU: every iteration = 100 Hz

### Coordinate Transforms

```cpp
horus::TransformFrame tf;
tf.register_frame("world");
tf.register_frame("base_link", "world");
tf.register_frame("lidar", "base_link");
tf.update("lidar", {0.2, 0.0, 0.3}, {0, 0, 0, 1}, timestamp);
auto t = tf.lookup("lidar", "world");  // chain: lidar → base → world
```

### Runtime Parameters

```cpp
horus::Params params;
params.set("max_speed", 0.5);
double speed = params.get("max_speed", 0.0);  // with default
params.set("max_speed", 0.3);                 // change at runtime
```

### Budget Enforcement

```cpp
sched.add("controller")
    .budget(5_ms)                // must complete within 5ms
    .on_miss(horus::Miss::Skip)  // skip tick if over budget
    .tick([&] { /* ... */ })
    .build();
```

## Next Steps

- [C++ API Reference](/docs/reference/cpp-api): full class/method documentation
- [Multi-Language Guide](/docs/concepts/multi-language): combine C++, Rust, and Python
- [Execution Classes](/docs/concepts/execution-classes): RT, Compute, Event, AsyncIo
- [Safety Monitor](/docs/advanced/safety-monitor): watchdog, safe state, BlackBox

---

## Migrating to horus.toml

Path: /getting-started/migrating-to-horus-toml
Description: Convert Cargo.toml, pyproject.toml, or CMakeLists.txt to horus.toml, with side-by-side examples for every language

# Migrating to horus.toml

If you already have a Rust, Python, or C++ project and want to move it to HORUS, this page shows you exactly how your existing config maps to `horus.toml`, with side-by-side translations instead of guesswork.

---

## Quick Path: Automatic Migration

HORUS can read your existing build files and generate `horus.toml` for you:

```bash
# Step 1: Initialize a horus.toml in your project
cd my-existing-project
horus init

# Step 2: Auto-import dependencies from existing configs
horus migrate --dry-run  # Preview what will change
horus migrate            # Do it
```

`horus migrate` reads `Cargo.toml`, `pyproject.toml`, and `CMakeLists.txt` from your project root, extracts all dependencies, and writes them into `horus.toml`. It backs up old files to `.horus/backup/` so nothing is lost.

**What it handles:**

- Cargo.toml `[dependencies]` and `[dev-dependencies]` with versions, features, git sources, and path deps
- pyproject.toml PEP 621 `[project] dependencies` with version specifiers
- CMakeLists.txt `find_package()`, `pkg_check_modules()`, `FetchContent_Declare()`, and `ExternalProject_Add()`

**What you'll still do manually:** `[hardware]`, `[scripts]`, `[hooks]`, `[robot]`. These are HORUS-specific and have no equivalent in native configs.

If you prefer to understand the mapping and do it yourself, read on.

---

## Coming from Rust (Cargo.toml)

If you've used Cargo, `horus.toml` will feel familiar.
The structure is intentionally similar.

### Package Metadata

**Key differences:**

- `edition` becomes `rust_edition` (it's language-specific, since horus.toml is multi-language)
- `type = "lib"` or `type = "bin"` works the same way
- No `[lib]`, `[[bin]]`, or `[profile]` sections: HORUS handles these in `.horus/Cargo.toml`

### Dependencies

This is where the biggest shift happens. In Cargo, source is implicit: everything comes from crates.io unless you specify `path` or `git`. In horus.toml, you declare the `source` explicitly because deps can come from six different places.

**Before: Cargo.toml**

```toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
my-lib = { path = "../my-lib" }
my-driver = { git = "https://github.com/team/driver.git", branch = "main" }

[dev-dependencies]
criterion = "0.5"
```

**After: horus.toml**

```toml
[dependencies]
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
tokio = { version = "1", source = "crates.io", features = ["full"] }
my-lib = { path = "../my-lib" }
my-driver = { git = "https://github.com/team/driver.git", branch = "main" }

[dev-dependencies]
criterion = { version = "0.5", source = "crates.io" }
```

**What changed:**

- Added `source = "crates.io"` to crates.io deps. Without it, HORUS defaults to the HORUS registry
- `path` and `git` deps don't need `source`; HORUS infers it automatically
- Everything else is identical

### Field-by-Field Mapping

| Cargo.toml | horus.toml | Notes |
|-----------|-----------|-------|
| `version = "1.0"` | `version = "1.0"` | Same syntax |
| `features = ["derive"]` | `features = ["derive"]` | Same syntax |
| `optional = true` | `optional = true` | Same syntax |
| `path = "../lib"` | `path = "../lib"` | Source auto-inferred |
| `git = "https://..."` | `git = "https://..."` | Source auto-inferred |
| `branch = "main"` | `branch = "main"` | Same syntax |
| `tag = "v1.0"` | `tag = "v1.0"` | Same syntax |
| `rev = "abc123"` | `rev = "abc123"` | Same syntax |
| (implicit crates.io) | `source = "crates.io"` | **Must be explicit** |
| `default-features = false` | Not yet supported | Use `features = []` |

### Cargo Features

Cargo `[features]` don't have a direct equivalent in horus.toml. Feature flags for your own crate are defined in the generated `.horus/Cargo.toml`. For now, if your project defines custom features, you have two options:

1. Use the `enable` field for capability flags that HORUS understands (`cuda`, `headless`, etc.)
2. For custom Cargo features, use `horus cargo build --features my-feature`; the native tool proxy passes flags through

### Workspaces

**Before: Cargo.toml (root)**

```toml
[workspace]
members = ["crates/*"]

[workspace.dependencies]
serde = { version = "1.0", features = ["derive"] }
```

**After: horus.toml (root)**

```toml
[workspace]
members = ["crates/*"]

[workspace.dependencies]
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
```

Each workspace member gets its own `horus.toml`:

```toml
# crates/my-node/horus.toml
[package]
name = "my-node"
version = "0.1.0"

[dependencies]
serde = { workspace = true }  # Inherits from root
```

This works exactly like Cargo workspaces: `workspace = true` pulls version, features, and source from the root.
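
As a mental model for that inheritance, the sketch below expands `{ workspace = true }` entries against a root table using plain dicts. It is an illustration only, not HORUS's resolver: the function name `resolve_member_deps` is hypothetical, and it assumes a member entry takes every field from the root entry of the same name.

```python
def resolve_member_deps(root_deps, member_deps):
    """Expand `{ workspace = true }` entries using the root's table.

    Assumption: a member entry with workspace=True takes every field from
    the root entry of the same name; other entries are used as written.
    """
    resolved = {}
    for name, spec in member_deps.items():
        if isinstance(spec, dict) and spec.get("workspace") is True:
            if name not in root_deps:
                raise KeyError(f"{name!r} is not declared in [workspace.dependencies]")
            resolved[name] = dict(root_deps[name])  # copy, so the root stays untouched
        else:
            resolved[name] = spec
    return resolved


# Mirrors the root and member manifests shown above.
root = {"serde": {"version": "1.0", "source": "crates.io", "features": ["derive"]}}
member = {"serde": {"workspace": True}, "my-lib": {"path": "../my-lib"}}
resolved = resolve_member_deps(root, member)
print(resolved["serde"])
```

A member that asks for a dependency the root never declared raises `KeyError` in this sketch, which is a reasonable place to surface a typo in the dependency name.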

---

## Coming from Python (pyproject.toml)

Python's packaging landscape has `pyproject.toml`, `setup.py`, `requirements.txt`, and `setup.cfg`. HORUS replaces all of them with one section in `horus.toml`.

### Package Metadata

**Key differences:**

- `[project]` becomes `[package]`, with the same fields and simpler syntax
- Authors are strings, not tables
- License is a string, not a table
- No `requires-python`: HORUS manages the Python environment

### Dependencies

**Before: pyproject.toml**

```toml
[project]
dependencies = [
    "numpy>=1.24",
    "opencv-python>=4.8",
    "torch>=2.0",
]

[project.optional-dependencies]
dev = ["pytest>=7.0", "ruff>=0.1"]
```

**After: horus.toml**

```toml
[dependencies]
numpy = { version = ">=1.24", source = "pypi" }
opencv-python = { version = ">=4.8", source = "pypi" }
torch = { version = ">=2.0", source = "pypi" }

[dev-dependencies]
pytest = { version = ">=7.0", source = "pypi" }
ruff = { version = ">=0.1", source = "pypi" }
```

**What changed:**

- PEP 508 inline strings become TOML tables with `version` and `source` fields
- `source = "pypi"` is required; without it, HORUS looks in its own registry
- Optional dependency groups become `[dev-dependencies]`

**If you're using requirements.txt instead:**

```
# requirements.txt
numpy>=1.24
opencv-python>=4.8
torch>=2.0
```

The mapping is the same: each line becomes an entry in `[dependencies]` with `source = "pypi"`.
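
For a manual migration, that line-by-line mapping can be scripted. The sketch below is not how `horus migrate` works; it is a small standalone converter with a hypothetical function name, `requirement_to_toml`, and it only handles plain `name[extras]specifier` lines (no environment markers, URLs, or pip options).

```python
import re

# Name, optional [extras], then an optional version specifier.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[([^\]]*)\])?\s*(.*)$")


def requirement_to_toml(line):
    """Convert one simple requirement line to a horus.toml dependency entry.

    Returns None for blank lines, comments, and lines that are not a plain
    `name[extras]specifier` requirement (for example `-r other.txt`).
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    match = _REQ.match(line)
    if match is None:
        return None
    name, extras, spec = match.groups()
    fields = []
    spec = spec.replace(" ", "")
    if spec:
        fields.append(f'version = "{spec}"')
    fields.append('source = "pypi"')
    if extras:
        items = ", ".join(f'"{e.strip()}"' for e in extras.split(",") if e.strip())
        fields.append(f"features = [{items}]")
    body = ", ".join(fields)
    return f"{name} = {{ {body} }}"


requirements = """\
# requirements.txt
numpy>=1.24
opencv-python>=4.8
torch>=2.0
"""

print("[dependencies]")
for raw in requirements.splitlines():
    entry = requirement_to_toml(raw)
    if entry is not None:
        print(entry)
```

Extras are emitted as `features`, following the field mapping this page uses for `"pkg[extra1,extra2]"`. Review the output before pasting it into `horus.toml`, since anything the regex does not recognise is skipped silently.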

### Field-by-Field Mapping

| pyproject.toml / requirements.txt | horus.toml | Notes |
|----------------------------------|-----------|-------|
| `"numpy>=1.24"` | `version = ">=1.24", source = "pypi"` | Version syntax is the same |
| `"torch==2.0.1"` | `version = "==2.0.1", source = "pypi"` | Exact pinning works |
| `"pkg[extra1,extra2]"` | `features = ["extra1", "extra2"]` | Extras become features |
| `[project.optional-dependencies]` | `[dev-dependencies]` | Dev-only scope |
| `[tool.pytest.ini_options]` | Not in horus.toml | Keep in `pyproject.toml` or use `[scripts]` |
| `[tool.ruff]` | Not in horus.toml | Keep in `pyproject.toml` or `ruff.toml` |

### Version Specifiers

Python version specifiers work directly in horus.toml:

| Python style | horus.toml | Meaning |
|-------------|-----------|---------|
| `>=1.24` | `version = ">=1.24"` | At least 1.24 |
| `>=1.24,<2.0` | `version = ">=1.24,<2.0"` | Range |
| `==1.24.0` | `version = "==1.24.0"` | Exact |
| `~=1.24` | `version = "~=1.24"` | Compatible release |

---

## Coming from C++ (CMakeLists.txt)

C++ migration is more involved because CMake configs are imperative scripts, not declarative manifests. HORUS extracts what it can and puts it in `[dependencies]` and `[cpp]`.
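
To make the "extracts what it can" caveat concrete, the sketch below pulls literal `find_package()` calls out of a CMake script with a regular expression. It is a rough illustration, not the `horus migrate` implementation, and the function name `find_packages` is hypothetical. Calls whose package name comes from a variable are skipped, which is exactly the kind of information a static reader cannot recover from an imperative script.

```python
import re

# find_package(<Name> ...) with a literal package name.
_FIND_PACKAGE = re.compile(r"find_package\s*\(\s*([A-Za-z0-9_]+)([^)]*)\)", re.IGNORECASE)

# Keywords that end a COMPONENTS list.
_KEYWORDS = {"REQUIRED", "QUIET", "OPTIONAL_COMPONENTS", "EXACT", "CONFIG", "MODULE"}


def find_packages(cmake_text):
    """Return [(package, [components])] for literal find_package() calls.

    Line comments are stripped first. Calls built from variables such as
    find_package(${PKG}) do not match and are ignored.
    """
    text = re.sub(r"#[^\n]*", "", cmake_text)
    found = []
    for name, rest in _FIND_PACKAGE.findall(text):
        words = rest.split()
        components = []
        if "COMPONENTS" in words:
            for word in words[words.index("COMPONENTS") + 1:]:
                if word in _KEYWORDS:
                    break
                components.append(word)
        found.append((name, components))
    return found


sample = """
find_package(Eigen3 REQUIRED)
find_package(OpenCV REQUIRED)
find_package(Boost REQUIRED COMPONENTS filesystem system)
# find_package(Ignored)
find_package(${DYNAMIC_PKG})
"""

packages = find_packages(sample)
for name, components in packages:
    print(name, components)
```

Each recovered name still needs a human decision about the matching `apt` package, which is why the C++ entries in `horus.toml` carry both `cmake_package` and `apt` fields.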

### Package Metadata

**Before: CMakeLists.txt**

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_robot VERSION 0.1.0 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
```

**After: horus.toml**

```toml
[package]
name = "my-robot"
version = "0.1.0"
standard = "c++20"

[cpp]
compiler = "clang++"                          # Optional
cmake_args = ["-DCMAKE_BUILD_TYPE=Release"]   # Optional
```

### Dependencies

**Before: CMakeLists.txt**

```cmake
find_package(Eigen3 REQUIRED)
find_package(OpenCV REQUIRED)
find_package(Boost REQUIRED COMPONENTS filesystem system)

include(FetchContent)
FetchContent_Declare(
    fmt
    GIT_REPOSITORY https://github.com/fmtlib/fmt.git
    GIT_TAG 10.1.1
)
FetchContent_MakeAvailable(fmt)
```

**After: horus.toml**

```toml
[dependencies]
eigen3 = { source = "system", apt = "libeigen3-dev", cmake_package = "Eigen3" }
opencv = { source = "system", apt = "libopencv-dev", cmake_package = "OpenCV" }
boost = { source = "system", apt = "libboost-all-dev", cmake_package = "Boost" }
fmt = { git = "https://github.com/fmtlib/fmt.git", tag = "10.1.1" }
```

### Field-by-Field Mapping

| CMake pattern | horus.toml | Notes |
|-------------|-----------|-------|
| `find_package(Pkg)` | `source = "system"` + `cmake_package` + `apt` | System library |
| `FetchContent_Declare(... GIT_REPOSITORY ...)` | `git = "..."` + `tag` or `branch` | Git dependency |
| `ExternalProject_Add(...)` | `git = "..."` | External project |
| `pkg_check_modules(...)` | `source = "system"` + `apt` | pkg-config library |
| `set(CMAKE_CXX_STANDARD 20)` | `standard = "c++20"` in `[package]` | Language standard |
| `target_compile_options(...)` | `cmake_args = [...]` in `[cpp]` | Build flags |

---

## Mixed-Language Projects

This is where horus.toml pays off. A project using Rust for control and Python for ML traditionally needs separate config files.
With horus.toml, it's one file:

**Before: two files**

```toml
# Cargo.toml
[package]
name = "my-robot"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
horus_library = "0.1.9"
```

```toml
# pyproject.toml
[project]
name = "my-robot"
version = "0.1.0"
dependencies = ["numpy>=1.24", "torch>=2.0"]
```

**After: one file**

```toml
# horus.toml
[package]
name = "my-robot"
version = "0.1.0"

[dependencies]
# Rust
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
horus_library = "0.1.9"

# Python
numpy = { version = ">=1.24", source = "pypi" }
torch = { version = ">=2.0", source = "pypi" }
```

When you run `horus build`, HORUS generates both `.horus/Cargo.toml` (with serde and horus_library) and `.horus/pyproject.toml` (with numpy and torch). Each build tool only sees its own deps.

---

## Using Native Tools After Migration

You don't have to abandon `cargo` or `pip` after switching to horus.toml. HORUS provides transparent proxying:

```bash
# Set up the proxy (add to your shell profile)
eval "$(horus env --init)"

# Now these commands auto-sync with horus.toml
cargo add rand      # Adds to horus.toml with source = "crates.io"
pip install scipy   # Adds to horus.toml with source = "pypi"
cargo build         # Builds from .horus/Cargo.toml (generated from horus.toml)
```

The proxy intercepts `cargo` and `pip` commands, updates horus.toml, regenerates the native build files, and runs the real tool. Your muscle memory still works.

To bypass the proxy for a one-off command:

```bash
HORUS_NO_PROXY=1 cargo build  # Uses Cargo.toml directly, skips horus.toml
```

See [Native Tool Integration](/development/native-tools) for details on how the proxy works.

---

## Step-by-Step: Manual Migration

If `horus migrate` doesn't cover your case, or you want to understand every step:

### Step 1: Initialize

```bash
cd my-existing-project
horus init
```

This creates a minimal `horus.toml` and `.horus/` directory.
It does **not** touch your existing files.

### Step 2: Move Package Metadata

Copy `name`, `version`, `description`, `authors`, `license` from your existing config into `[package]`.

### Step 3: Convert Dependencies

For each dependency in your existing config:

1. **Is it from crates.io?** Add `source = "crates.io"`
2. **Is it from PyPI?** Add `source = "pypi"`
3. **Is it a system library?** Add `source = "system"` with `apt` and `cmake_package`
4. **Is it a local path?** Use `path = "..."` (source auto-inferred)
5. **Is it from git?** Use `git = "..."` (source auto-inferred)
6. **Is it from the HORUS registry?** Just use `name = "version"` (default source)

### Step 4: Add HORUS-Specific Config

These sections have no equivalent in native build files, so add them as needed:

```toml
[robot]
name = "turtlebot"
description = "robot.urdf"
simulator = "sim3d"

[hardware]
lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true }
imu = { use = "bno055", bus = "i2c-1", sim = true }

[scripts]
sim = "horus sim start --world warehouse"
deploy = "horus deploy pi@robot --release"

[hooks]
pre_run = ["fmt", "lint"]
```

### Step 5: Verify

```bash
horus check   # Validates horus.toml syntax and fields
horus build   # Generates .horus/ and builds
horus test    # Runs tests through the new config
```

### Step 6: Clean Up

Once everything works, you can remove the old config files. `horus migrate` backs them up to `.horus/backup/` automatically. If you migrated manually:

```bash
# Only after confirming horus build and horus test pass
mkdir -p .horus/backup
mv Cargo.toml .horus/backup/       # If Rust
mv pyproject.toml .horus/backup/   # If Python
mv CMakeLists.txt .horus/backup/   # If C++
```

---

## Troubleshooting

### "Package not found" after migration

Most likely a missing `source` field. Check that every crates.io dep has `source = "crates.io"` and every PyPI dep has `source = "pypi"`. Without an explicit source, HORUS looks in its own registry.
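
When the manifest is long, that check is tedious by eye. The sketch below lists every dependency that would fall back to the default registry, so you can confirm each one is intentional. It is a hypothetical helper, not a HORUS command: the function name is made up, and it assumes a dependency counts as explicit when it carries `source`, `path`, `git`, or `workspace`.

```python
def deps_without_explicit_source(manifest):
    """List dependencies that fall back to the default registry.

    `manifest` is a parsed horus.toml (a plain dict). Assumption: an entry
    is explicit when it has `source`, `path`, `git`, or `workspace`.
    """
    explicit_keys = ("source", "path", "git", "workspace")
    missing = []
    for section in ("dependencies", "dev-dependencies"):
        for name, spec in manifest.get(section, {}).items():
            if isinstance(spec, str):
                # Bare `name = "version"` form: always the default source.
                missing.append(f"{section}.{name}")
            elif not any(key in spec for key in explicit_keys):
                missing.append(f"{section}.{name}")
    return missing


# A parsed manifest with one forgotten `source` (serde) and one entry that
# is meant to come from the HORUS registry (horus_library).
manifest = {
    "dependencies": {
        "serde": {"version": "1.0", "features": ["derive"]},
        "horus_library": "0.1.9",
        "numpy": {"version": ">=1.24", "source": "pypi"},
        "my-lib": {"path": "../my-lib"},
    },
    "dev-dependencies": {"criterion": "0.5"},
}

flagged = deps_without_explicit_source(manifest)
for entry in flagged:
    print(f"{entry}: no explicit source")
```

On Python 3.11 or later, the standard-library `tomllib.load()` (called on a file opened in binary mode) produces a dict of this shape from a real `horus.toml`. The helper cannot tell a forgotten `source` from a deliberate HORUS registry dependency, so treat its output as a review list rather than a list of errors.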
### "Invalid version format" horus.toml requires full semver for your project version: `"0.1.0"`, not `"0.1"`. Dependency versions can use ranges (`">=1.24"`, `"^1.0"`). ### Build fails but old config worked Compare the generated `.horus/Cargo.toml` or `.horus/pyproject.toml` against your original. If a feature flag or version constraint is missing, update `horus.toml` and run `horus build` again. ```bash diff Cargo.toml.bak .horus/Cargo.toml # Spot the difference ``` ### Cargo features I defined are missing Custom `[features]` sections aren't in horus.toml yet. Use the native tool proxy: ```bash horus cargo build --features my-custom-feature ``` --- ## Cheat Sheet A quick reference for the most common translations: | What you want | Cargo.toml | pyproject.toml | horus.toml | |-------------|-----------|---------------|-----------| | Add a Rust crate | `serde = "1.0"` | N/A | `serde = { version = "1.0", source = "crates.io" }` | | Add a Python package | N/A | `"numpy>=1.24"` | `numpy = { version = ">=1.24", source = "pypi" }` | | Add with features | `features = ["derive"]` | `"pkg[extra]"` | `features = ["derive"]` | | Dev-only dep | `[dev-dependencies]` | `[project.optional-dependencies]` | `[dev-dependencies]` | | Local path dep | `path = "../lib"` | N/A | `path = "../lib"` | | Git dep | `git = "https://..."` | N/A | `git = "https://..."` | | System library | N/A | N/A | `source = "system", apt = "libfoo-dev"` | | Add via CLI | `cargo add serde` | `pip install numpy` | `horus add serde --source crates.io` | --- ## Next Steps - **[horus.toml Reference](/concepts/horus-toml)** — Why it's the single source of truth - **[Configuration Reference](/package-management/configuration)** — Every field documented - **[Native Tool Integration](/development/native-tools)** — Keep using `cargo` and `pip` - **[Multi-Crate Workspaces](/development/workspaces)** — Workspace migration - **[CLI Reference](/development/cli-reference)** — All `horus` commands --- ## Building Your 
Second Application

Path: /getting-started/second-application
Description: Build a 3-node sensor pipeline with filtering and display

# Building Your Second Application

Now that you've built your first HORUS application, let's create something more practical: a **3-node sensor pipeline** that reads temperature data, filters out noise, and displays the results.

## Prerequisites

- HORUS installed ([Installation Guide](/getting-started/installation))
- [Quick Start](/getting-started/quick-start) completed (you know how to create a project and run it)

**Time:** ~20 minutes

## What You'll Build

A real-time temperature monitoring system with:

1. **SensorNode**: Publishes simulated temperature readings every second
2. **FilterNode**: Subscribes to raw temperatures, filters noise, republishes clean data
3. **DisplayNode**: Subscribes to filtered data, displays to console

This demonstrates:

- Multi-node communication patterns
- Data pipeline processing
- Real-time filtering
- Live monitoring with `horus monitor`

## Architecture

Temperature pipeline: SensorNode (publishes e.g. 25.3°C) → `raw_temp` → FilterNode (removes noise) → `filtered_temp` → DisplayNode (shows e.g. "Temp: 25.0°C")

## Step 1: Create the Project

```bash
horus new temperature_pipeline
cd temperature_pipeline
```

## Step 2: Write the Code

Replace `src/main.rs` with this complete, runnable code:

```rust
use horus::prelude::*;
use std::time::{Duration, Instant};

// ============================================================================
// Node 1: SensorNode - Publishes temperature readings
// ============================================================================

struct SensorNode {
    temp_pub: Topic<f32>,
    last_publish: Instant,
    reading: f32,
}

impl SensorNode {
    fn new() -> Result<Self> {
        Ok(Self {
            temp_pub: Topic::new("raw_temp")?,
            last_publish: Instant::now(),
            reading: 20.0,
        })
    }
}

impl Node for SensorNode {
    fn name(&self) -> &str {
        "SensorNode"
    }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Temperature sensor initialized");
        Ok(())
    }

    fn tick(&mut self) {
        // Publish every 1 second
        if self.last_publish.elapsed() >= Duration::from_secs(1) {
            // Simulate realistic temperature with noise
            // Base temperature oscillates between 20-30°C
            let base_temp = 25.0 + (self.reading * 0.1).sin() * 5.0;

            // Add deterministic pseudo-noise (+/- 2°C)
            let noise = (self.reading * 0.7).sin() * 2.0;
            let temperature = base_temp + noise;

            // Publish raw temperature
            self.temp_pub.send(temperature);
            hlog!(info, "Published raw temp: {:.2}°C", temperature);

            self.reading += 1.0;
            self.last_publish = Instant::now();
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Sensor shutdown complete");
        Ok(())
    }
}

// ============================================================================
// Node 2: FilterNode - Removes noise with exponential moving average
// ============================================================================

struct FilterNode {
    raw_sub: Topic<f32>,
    filtered_pub: Topic<f32>,
    filtered_value: Option<f32>,
    alpha: f32, // Smoothing factor (0.0 - 1.0)
}

impl FilterNode {
    fn new() -> Result<Self> {
        Ok(Self {
            raw_sub: Topic::new("raw_temp")?,
            filtered_pub: Topic::new("filtered_temp")?,
            filtered_value: None,
            alpha: 0.3, // 30% new data, 70% previous (smooth but responsive)
        })
    }
}

impl Node for FilterNode {
    fn name(&self) -> &str {
        "FilterNode"
    }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Filter initialized (alpha = {:.2})", self.alpha);
        Ok(())
    }

    fn tick(&mut self) {
        // Check for new temperature reading
        if let Some(raw_temp) = self.raw_sub.recv() {
            // Apply exponential moving average filter
            let filtered = match self.filtered_value {
                Some(prev) => self.alpha * raw_temp + (1.0 - self.alpha) * prev,
                None => raw_temp, // First reading, no previous value
            };

            self.filtered_value = Some(filtered);

            // Publish filtered temperature
            self.filtered_pub.send(filtered);

            hlog!(info, "Filtered: {:.2}°C -> {:.2}°C (removed {:.2}°C noise)",
                raw_temp, filtered, raw_temp - filtered);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Filter shutdown complete");
        Ok(())
    }
}

// ============================================================================
// Node 3: DisplayNode - Shows filtered temperature on console
// ============================================================================

struct DisplayNode {
    filtered_sub: Topic<f32>,
    display_counter: u32,
}

impl DisplayNode {
    fn new() -> Result<Self> {
        Ok(Self {
            filtered_sub: Topic::new("filtered_temp")?,
            display_counter: 0,
        })
    }
}

impl Node for DisplayNode {
    fn name(&self) -> &str {
        "DisplayNode"
    }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Display initialized");
        println!("\n========================================");
        println!(" Temperature Monitor - Press Ctrl+C to stop");
        println!("========================================\n");
        Ok(())
    }

    fn tick(&mut self) {
        if let Some(temp) = self.filtered_sub.recv() {
            self.display_counter += 1;

            // Display temperature with visual indicator
            let status = if temp < 22.0 {
                "COLD"
            } else if temp > 28.0 {
                "HOT"
            } else {
                "NORMAL"
            };

            println!(
                "[Reading #{}] Temperature: {:.1}°C - Status: {}",
                self.display_counter, temp, status
            );

            hlog!(debug, "Displayed reading #{}", self.display_counter);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        println!("\n========================================");
        println!(" Total readings displayed: {}", self.display_counter);
        println!("========================================\n");
        hlog!(info, "Display shutdown complete");
        Ok(())
    }
}

// ============================================================================
// Main Application - Configure and run the scheduler
// ============================================================================

fn main() -> Result<()> {
    println!("Starting Temperature Pipeline...\n");

    let mut scheduler = Scheduler::new();

    // Add nodes in priority order:
    // 1. SensorNode (order 0) - Runs first to generate data
    scheduler.add(SensorNode::new()?).order(0).build()?;

    // 2. FilterNode (order 1) - Runs second to process data
    scheduler.add(FilterNode::new()?).order(1).build()?;

    // 3. DisplayNode (order 2) - Runs last to display results
    scheduler.add(DisplayNode::new()?).order(2).build()?;

    // Optional: Enable watchdog for production use.
    // If any node hangs for more than 5 seconds, the scheduler
    // logs a warning (1x), skips the node (2x), then isolates it (3x).
    // See the Safety Monitor guide for details.

    println!("All nodes initialized. Running...\n");

    // Run the scheduler (blocks until Ctrl+C)
    scheduler.run()?;

    Ok(())
}
```

## Step 3: Run the Application

```bash
horus run
```

**Expected Output** (temperature values are illustrative):

```
Starting Temperature Pipeline...

All nodes initialized. Running...
========================================
 Temperature Monitor - Press Ctrl+C to stop
========================================

[Reading #1] Temperature: 23.4°C - Status: NORMAL
[Reading #2] Temperature: 24.1°C - Status: NORMAL
[Reading #3] Temperature: 25.8°C - Status: NORMAL
[Reading #4] Temperature: 27.2°C - Status: NORMAL
[Reading #5] Temperature: 28.6°C - Status: HOT
[Reading #6] Temperature: 27.9°C - Status: NORMAL
[Reading #7] Temperature: 26.3°C - Status: NORMAL
```

Press **Ctrl+C** to stop:

```
^C
Ctrl+C received! Shutting down HORUS scheduler...

========================================
 Total readings displayed: 7
========================================
```

## Step 4: Inspect with `horus monitor`

Open a **second terminal** and run:

```bash
horus monitor
```

The monitor will show:

### Nodes Tab

- **SensorNode**: Publishing to `raw_temp` every ~1 second
- **FilterNode**: Subscribing to `raw_temp`, publishing to `filtered_temp`
- **DisplayNode**: Subscribing to `filtered_temp`

### Topics Tab

- **raw_temp** (f32): Noisy temperature readings
- **filtered_temp** (f32): Smooth temperature readings

### Metrics Tab

- **IPC Latency**: ~85ns-437ns (sub-microsecond!)
- **Tick Duration**: How long each node takes to execute - **Message Counts**: Total messages sent/received ## Understanding the Code ### SensorNode ```rust // simplified // Publish every 1 second if self.last_publish.elapsed() >= Duration::from_secs(1) { let temperature = 25.0 + noise; self.temp_pub.send(temperature); } ``` **Key Points:** - Uses `Instant` to track time between publishes - Simulates realistic sensor data with noise - Publishes to `"raw_temp"` topic ### FilterNode ```rust // simplified // Exponential moving average filter let filtered = self.alpha * raw_temp + (1.0 - self.alpha) * prev; self.filtered_pub.send(filtered); ``` **Key Points:** - Subscribes to `"raw_temp"`, publishes to `"filtered_temp"` - Implements exponential moving average (EMA) filter - `alpha = 0.3` balances responsiveness vs smoothness **Filter Behavior:** - **High alpha (0.8)**: Fast response, less smoothing - **Low alpha (0.2)**: Slow response, more smoothing ### DisplayNode ```rust // simplified if let Some(temp) = self.filtered_sub.recv() { println!("[Reading #{}] Temperature: {:.1}°C", count, temp); } ``` **Key Points:** - Subscribes to `"filtered_temp"` - Only receives when new data is available - `recv()` returns `None` when no message (not an error!) ## Common Issues & Fixes ### Issue 1: No Output Displayed **Symptom:** ``` Starting Temperature Pipeline... All nodes initialized. Running... 
======================================== Temperature Monitor - Press Ctrl+C to stop ======================================== [Nothing appears] ``` **Cause:** Topics not connecting (typo in topic names) **Fix:** - Check topic names match exactly: `"raw_temp"` and `"filtered_temp"` - Verify with monitor: `horus monitor` -> Topics tab - Ensure all nodes are running in same scheduler ### Issue 2: Too Much/Too Little Smoothing **Symptom:** Temperature changes too fast or too slow **Fix:** Adjust the `alpha` value in `FilterNode`: ```rust // simplified alpha: 0.3, // Current: moderate smoothing // Try these alternatives: alpha: 0.7, // More responsive, less smooth alpha: 0.1, // Very smooth, slower response ``` ### Issue 3: Monitor Shows No Nodes **Symptom:** Monitor is empty **Cause:** Application not running or monitor started before app **Fix:** 1. Start the application first: `horus run` 2. Then start monitor in separate terminal: `horus monitor` 3. Monitor auto-discovers running nodes ### Issue 4: Build Errors **Symptom:** ``` error[E0433]: failed to resolve: use of undeclared type `Topic` ``` **Fix:** - Ensure HORUS is installed: `horus --help` - Check import: `use horus::prelude::*;` - Run from project directory (where `horus.toml` is) ## Experiments to Try ### 1. Change Update Rate Make the sensor publish faster: ```rust // simplified // In SensorNode::tick() if self.last_publish.elapsed() >= Duration::from_millis(500) { // 2 Hz instead of 1 Hz ``` ### 2. Add Temperature Alerts Add to `DisplayNode`: ```rust // simplified if temp > 30.0 { println!(" WARNING: High temperature detected!"); } ``` ### 3. Log Data to File Add to `DisplayNode::tick()`: ```rust // simplified use std::fs::OpenOptions; use std::io::Write; let mut file = OpenOptions::new() .create(true) .append(true) .open("temperature_log.txt") .unwrap(); writeln!(file, "{:.1}", temp).ok(); ``` ### 4. Use Rate-Limited Logging In a real robot running at 1kHz, you don't want every tick flooding the log. 
Use `hlog_once!` for one-time events and `hlog_every!` for periodic status:

```rust
// simplified
fn tick(&mut self) {
    // Log once when sensor first connects
    hlog_once!(info, "Sensor online, streaming data");

    // Log status every 5 seconds (not every tick)
    hlog_every!(5000, info, "Filter running — last value: {:.1}°C",
        self.filtered_value.unwrap_or(0.0));

    // Log warnings at most once per second
    if let Some(temp) = self.raw_sub.recv() {
        if temp > 35.0 {
            hlog_every!(1000, warn, "High temperature: {:.1}°C", temp);
        }
    }
}
```

### 5. Add Multiple Sensors

Create a second sensor node:

```rust
// simplified
// In main()
scheduler.add(SensorNode::new()?).order(0).build()?; // Sensor 1
scheduler.add(SensorNode::new()?).order(0).build()?; // Sensor 2
```

Both will publish to the same topic, and FilterNode will process both!

## Key Concepts Demonstrated

- **Pipeline Pattern**: Data flows through stages (Sensor -> Filter -> Display)
- **Pub/Sub Decoupling**: Nodes don't know about each other, only topics
- **Real-Time Processing**: Filtering happens as data arrives
- **Shared Memory IPC**: Sub-microsecond communication between nodes
- **Priority Scheduling**: Sensor runs before filter, filter before display

## Shorter Version: node! Macro

The entire pipeline above can be written more concisely with the `node!` macro. Here's the same 3-node system in roughly half the code:

```rust
use horus::prelude::*;
use std::time::{Duration, Instant};

node! {
    SensorNode {
        pub { temp_pub: f32 -> "raw_temp" }
        data {
            last_publish: Instant = Instant::now(),
            reading: f32 = 20.0,
        }
        init {
            hlog!(info, "Temperature sensor initialized");
            Ok(())
        }
        tick {
            if self.last_publish.elapsed() >= Duration::from_secs(1) {
                let base_temp = 25.0 + (self.reading * 0.1).sin() * 5.0;
                let noise = (self.reading * 0.7).sin() * 2.0;
                self.temp_pub.send(base_temp + noise);
                self.reading += 1.0;
                self.last_publish = Instant::now();
            }
        }
    }
}

node! {
    FilterNode {
        sub { raw_sub: f32 -> "raw_temp" }
        pub { filtered_pub: f32 -> "filtered_temp" }
        data {
            filtered_value: Option<f32> = None,
            alpha: f32 = 0.3,
        }
        tick {
            if let Some(raw) = self.raw_sub.recv() {
                let filtered = match self.filtered_value {
                    Some(prev) => self.alpha * raw + (1.0 - self.alpha) * prev,
                    None => raw,
                };
                self.filtered_value = Some(filtered);
                self.filtered_pub.send(filtered);
            }
        }
    }
}

node! {
    DisplayNode {
        sub { filtered_sub: f32 -> "filtered_temp" }
        data { count: u32 = 0 }
        tick {
            if let Some(temp) = self.filtered_sub.recv() {
                self.count += 1;
                let status = if temp < 22.0 { "COLD" }
                    else if temp > 28.0 { "HOT" }
                    else { "NORMAL" };
                println!("[#{}] {:.1}°C - {}", self.count, temp, status);
            }
        }
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();
    scheduler.add(SensorNode::new()).order(0).build()?;
    scheduler.add(FilterNode::new()).order(1).build()?;
    scheduler.add(DisplayNode::new()).order(2).build()?;
    scheduler.run()?;
    Ok(())
}
```

The `node!` macro generates the struct, constructor, and `Node` trait implementation. Both versions produce identical runtime behavior. See the [node! Macro Guide](/concepts/node-macro) for the full syntax.

## Step 5: Add a Watchdog (Production)

The pipeline above works great for learning, but a production robot needs safety monitoring. What if the `FilterNode` hangs — a bug causes an infinite loop, or a hardware sensor blocks? Without a watchdog, the scheduler keeps calling the broken node forever.
Add a watchdog to detect and respond to frozen nodes:

```rust
// simplified
fn main() -> Result<()> {
    let mut scheduler = Scheduler::new()
        .watchdog(5000_u64.ms()); // global: 5-second timeout

    scheduler.add(SensorNode::new()?).order(0).build()?;
    scheduler.add(FilterNode::new()?).order(1).build()?;
    scheduler.add(DisplayNode::new()?).order(2).build()?;

    scheduler.run()
}
```

With this change, if any node stops completing its `tick()` within 5 seconds:

- **At 5 seconds**: Warning logged — "FilterNode watchdog warning"
- **At 10 seconds**: Node marked Unhealthy, skipped. Sensor and Display keep running
- **At 15 seconds**: Node Isolated, `enter_safe_state()` called (if implemented)

For safety-critical nodes (motors, actuators), you'd use a much tighter timeout and implement `enter_safe_state()`. See [Safety Monitor](/advanced/safety-monitor) for the full guide.

## Key Takeaways

- **Pipeline pattern**: Data flows through stages (Sensor -> Filter -> Display), each a separate node
- **Pub/sub decoupling**: Nodes only know about topic names, not each other
- **Execution order**: `.order()` controls which node runs first in each tick cycle
- **recv() is non-blocking**: Returns `None` when no message is available -- not an error
- **Watchdog**: `.watchdog()` detects frozen nodes and isolates them — critical for production
- **node! macro**: Cuts the code roughly in half while producing identical runtime behavior

## Next Steps

Now that you've built a 3-node pipeline, try:

1. **[Choosing Your Configuration](/concepts/choosing-configuration)** — Progressive guide from prototype to production (Level 0→4)
2. **[Execution Classes](/concepts/execution-classes)** — When to use `.compute()`, `.on("topic")`, or `.async_io()` for different node types
3. **[Testing](/development/testing)** — Unit test your nodes with `tick_once()`
4. **[Custom Messages](/tutorials/04-custom-messages)** — Define your own message types
5. **[node! Macro](/concepts/node-macro)** — Reduce boilerplate with macros

### Production Features

When you're ready to move beyond prototyping:

- **[Safety Monitor](/advanced/safety-monitor)** — Watchdog, deadline enforcement, graduated degradation
- **[BlackBox](/advanced/blackbox)** — Flight recorder for post-mortem crash analysis
- **[Deterministic Mode](/advanced/deterministic-mode)** — Reproducible execution for simulation and CI testing
- **[Record & Replay](/advanced/record-replay)** — Tick-perfect replay for reproducing field bugs
- **[Fault Tolerance](/advanced/circuit-breaker)** — Per-node failure policies (restart, skip, fatal)
- **[Emergency Stop](/recipes/emergency-stop)** — Event-driven e-stop pattern for actuator safety

## Full Code

The complete code in Step 2 is runnable as-is; add the Step 5 watchdog before deploying it on a real robot. To save it:

1. Copy the entire code block from Step 2
2. Replace `src/main.rs` in your project
3. Run with `horus run`

For additional examples, see [Basic Examples](/rust/examples/basic-examples).

---

## See Also

- [Tutorials](/tutorials) — Continue learning with guided tutorials
- [Recipes](/recipes) — Production-ready patterns (PID, sensor fusion, hardware, telemetry)
- [Core Concepts](/concepts) — Understand how HORUS works
- [Real-Time Systems](/concepts/real-time) — What real-time means and when you need it
- [Framework Clock](/concepts/real-time#framework-clock) — `horus::now()`, `dt()`, `elapsed()`, `budget_remaining()`

---

## Building Your Second Application (Python)

Path: /getting-started/second-application-python
Description: Build a 3-node sensor pipeline with filtering and display in Python

# Building Your Second Application (Python)

Now that you have built your first HORUS application in Python, let's create something more practical: a **3-node sensor pipeline** that reads temperature data, filters out noise, and displays the results.
## Prerequisites

- HORUS installed ([Installation Guide](/getting-started/installation))
- [Quick Start (Python)](/getting-started/quick-start-python) completed (you know how to create a Python project and run it)
- Python 3.9+ with HORUS bindings (`python3 -c "import horus"` works)

**Time:** ~20 minutes

## What You'll Build

A real-time temperature monitoring system with three nodes:

1. **SensorNode**: Publishes simulated temperature readings every second
2. **FilterNode**: Subscribes to raw temperatures, filters noise, republishes clean data
3. **DisplayNode**: Subscribes to filtered data, displays to console

This demonstrates:

- Multi-node communication patterns
- Data pipeline processing
- Real-time filtering with state
- CLI introspection tools

## Architecture

Temperature pipeline: SensorNode (publishes e.g. 25.3°C) → `raw_temp` → FilterNode (removes noise) → `filtered_temp` → DisplayNode (shows e.g. "Temp: 25.0°C")

## Step 1: Create the Project

```bash
horus new temperature_pipeline -p
cd temperature_pipeline
```

The `-p` flag creates a Python project with `src/main.py`, `horus.toml`, and `.horus/`.

## Step 2: Write the Code

Replace `src/main.py` with this complete, runnable code:

```python
import math
import horus

# ============================================================================
# Node 1: SensorNode — Publishes temperature readings
# ============================================================================

def make_sensor():
    state = {"reading": 0.0}

    def init(node):
        node.log_info("Temperature sensor initialized")

    def tick(node):
        # Simulate realistic temperature with noise
        # Base temperature oscillates between 20-30°C
        base_temp = 25.0 + math.sin(state["reading"] * 0.1) * 5.0

        # Add deterministic pseudo-noise (+/- 2°C)
        noise = math.sin(state["reading"] * 0.7) * 2.0
        temperature = base_temp + noise

        # Publish raw temperature
        node.send("raw_temp", temperature)
        node.log_info(f"Published raw temp: {temperature:.2f}°C")

        state["reading"] += 1.0

    def shutdown(node):
        node.log_info("Sensor shutdown complete")

    return horus.Node(
        name="SensorNode",
        tick=tick,
        init=init,
        shutdown=shutdown,
        rate=1,      # 1 Hz — one reading per second
        order=0,     # Runs first in each tick cycle
        pubs=["raw_temp"],
    )


# ============================================================================
# Node 2: FilterNode — Removes noise with exponential moving average
# ============================================================================

def make_filter():
    state = {
        "filtered_value": None,
        "alpha": 0.3,  # Smoothing factor: 30% new data, 70% previous
    }

    def init(node):
        node.log_info(f"Filter initialized (alpha = {state['alpha']:.2f})")

    def tick(node):
        raw_temp = node.recv("raw_temp")
        if raw_temp is None:
            return

        alpha = state["alpha"]

        # Apply exponential moving average filter
        if state["filtered_value"] is None:
            filtered = raw_temp  # First reading, no previous value
        else:
            prev = state["filtered_value"]
            filtered = alpha * raw_temp + (1.0 - alpha) * prev

        state["filtered_value"] = filtered

        # Publish filtered temperature
        node.send("filtered_temp", filtered)

        noise_removed = raw_temp - filtered
        node.log_info(
            f"Filtered: {raw_temp:.2f}°C -> {filtered:.2f}°C "
            f"(removed {noise_removed:.2f}°C noise)"
        )

    def shutdown(node):
        node.log_info("Filter shutdown complete")

    return horus.Node(
        name="FilterNode",
        tick=tick,
        init=init,
        shutdown=shutdown,
        rate=1,
        order=1,     # Runs after SensorNode
        pubs=["filtered_temp"],
        subs=["raw_temp"],
    )


# ============================================================================
# Node 3: DisplayNode — Shows filtered temperature on console
# ============================================================================

def make_display():
    state = {"count": 0}

    def init(node):
        node.log_info("Display initialized")
        print("\n========================================")
        print(" Temperature Monitor — Press Ctrl+C to stop")
        print("========================================\n")

    def tick(node):
        temp = node.recv("filtered_temp")
        if temp is None:
            return

        state["count"] += 1

        # Display temperature with status indicator
        if temp < 22.0:
            status = "COLD"
        elif temp > 28.0:
            status = "HOT"
        else:
            status = "NORMAL"

        print(f"[Reading #{state['count']}] Temperature: {temp:.1f}°C — Status: {status}")
        node.log_debug(f"Displayed reading #{state['count']}")

    def shutdown(node):
        print("\n========================================")
        print(f" Total readings displayed: {state['count']}")
        print("========================================\n")
        node.log_info("Display shutdown complete")

    return horus.Node(
        name="DisplayNode",
        tick=tick,
        init=init,
        shutdown=shutdown,
        rate=1,
        order=2,     # Runs last — after FilterNode
        subs=["filtered_temp"],
    )


# ============================================================================
# Main — Configure and run the pipeline
# ============================================================================

if __name__ == "__main__":
    print("Starting Temperature Pipeline...\n")
    horus.run(make_sensor(), make_filter(), make_display())
```

## Step 3: Run the Application

```bash
horus run
```

**Expected output** (temperature values are illustrative):

```
Starting Temperature Pipeline...

========================================
 Temperature Monitor — Press Ctrl+C to stop
========================================

[Reading #1] Temperature: 23.4°C — Status: NORMAL
[Reading #2] Temperature: 24.1°C — Status: NORMAL
[Reading #3] Temperature: 25.8°C — Status: NORMAL
[Reading #4] Temperature: 27.2°C — Status: NORMAL
[Reading #5] Temperature: 28.6°C — Status: HOT
[Reading #6] Temperature: 27.9°C — Status: NORMAL
[Reading #7] Temperature: 26.3°C — Status: NORMAL
```

Press **Ctrl+C** to stop:

```
^C
Ctrl+C received! Shutting down HORUS scheduler...

========================================
 Total readings displayed: 7
========================================
```

## Step 4: Inspect with CLI Tools

While the application is running, open a **second terminal** to inspect the live data.

### List Active Topics

```bash
horus topic list
```

You should see both topics:

```
raw_temp        (active, 1 publisher)
filtered_temp   (active, 1 publisher)
```

### Echo Raw Sensor Data

Watch the noisy sensor readings in real time (the values shown here are illustrative):

```bash
horus topic echo raw_temp
```

```
[1] 23.42
[2] 25.87
[3] 27.14
[4] 24.63
...
```

Press **Ctrl+C** to stop echoing.

### Echo Filtered Data

Compare with the smoothed output (again illustrative; exact values depend on when you start echoing):

```bash
horus topic echo filtered_temp
```

```
[1] 23.42
[2] 25.14
[3] 25.74
[4] 25.41
...
```

Notice how the filtered values change more gradually than the raw values.
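To see the smoothing effect without running the pipeline, the sensor formula and the EMA filter from Step 2 can be replayed offline. This is a standalone sketch using only the standard library. The `simulate` and `max_step` helpers are hypothetical, and the sketch assumes the first reading is taken at `reading = 0.0` with `alpha = 0.3`, as in `make_sensor()` and `make_filter()`.

```python
import math


def simulate(n, alpha=0.3):
    """Replay n sensor readings and their EMA-filtered values."""
    raw, filtered = [], []
    prev = None
    for i in range(n):
        reading = float(i)
        # Same formula as SensorNode: slow base oscillation plus faster pseudo-noise.
        temp = 25.0 + math.sin(reading * 0.1) * 5.0 + math.sin(reading * 0.7) * 2.0
        # Same update as FilterNode: the first sample seeds the filter.
        prev = temp if prev is None else alpha * temp + (1.0 - alpha) * prev
        raw.append(temp)
        filtered.append(prev)
    return raw, filtered


def max_step(values):
    """Largest absolute change between consecutive values."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


raw, filtered = simulate(20)
for i, (r, f) in enumerate(zip(raw, filtered), start=1):
    print(f"[{i}] raw={r:.2f}  filtered={f:.2f}")

print(f"largest raw step: {max_step(raw):.2f}")
print(f"largest filtered step: {max_step(filtered):.2f}")
```

The filtered series should show a noticeably smaller largest step than the raw series, which is the same effect you see when comparing the two `horus topic echo` streams.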
### Measure Publishing Rate Verify each topic publishes at the expected rate: ```bash horus topic hz raw_temp ``` ``` average rate: 1.00 Hz min: 0.998s, max: 1.002s, std dev: 0.001s ``` ```bash horus topic hz filtered_temp ``` The filtered topic should also show ~1 Hz since the FilterNode republishes every time it receives a new reading. ### Monitor Dashboard For a full overview of nodes, topics, and metrics: ```bash horus monitor ``` The monitor will show: - **Nodes tab**: SensorNode, FilterNode, DisplayNode with their rates and states - **Topics tab**: `raw_temp` and `filtered_temp` with message counts - **Metrics tab**: IPC latency, tick duration, and message throughput ## Understanding the Code ### SensorNode ```python def tick(node): base_temp = 25.0 + math.sin(state["reading"] * 0.1) * 5.0 noise = math.sin(state["reading"] * 0.7) * 2.0 temperature = base_temp + noise node.send("raw_temp", temperature) ``` **Key points:** - Publishes to `"raw_temp"` topic at 1 Hz (set by `rate=1`) - State lives in a dictionary captured by the closure - `node.send(topic, data)` publishes any serializable value ### FilterNode ```python def tick(node): raw_temp = node.recv("raw_temp") if raw_temp is None: return filtered = alpha * raw_temp + (1.0 - alpha) * prev node.send("filtered_temp", filtered) ``` **Key points:** - Subscribes to `"raw_temp"`, publishes to `"filtered_temp"` - Implements an exponential moving average (EMA) filter - `alpha = 0.3` balances responsiveness vs smoothness - `node.recv()` returns `None` when no message is available (not an error) **Filter behavior:** - **High alpha (0.8)**: Fast response, less smoothing - **Low alpha (0.2)**: Slow response, more smoothing ### DisplayNode ```python def tick(node): temp = node.recv("filtered_temp") if temp is None: return print(f"[Reading #{state['count']}] Temperature: {temp:.1f}°C — Status: {status}") ``` **Key points:** - Subscribes to `"filtered_temp"` only - Only prints when new data is available - Uses `init` and 
`shutdown` callbacks for banner display ### State Management Python nodes manage state through closures. Each `make_*()` factory function creates a `state` dictionary that the `tick`, `init`, and `shutdown` functions capture: ```python def make_sensor(): state = {"reading": 0.0} # Mutable state def tick(node): state["reading"] += 1.0 # Mutate via dict node.send("raw_temp", state["reading"]) return horus.Node(name="SensorNode", tick=tick, ...) ``` Dictionaries work because `tick` closes over the dict reference, and dict values are mutable in place. You can also use a list (`[0.0]`) for single values or a class instance for complex state. ### Execution Order The `order` parameter controls when each node runs within a tick cycle: ```python make_sensor() # order=0 — runs first, produces data make_filter() # order=1 — runs second, consumes and re-publishes make_display() # order=2 — runs third, consumes final output ``` Lower order values run first. This ensures data flows through the pipeline in a single tick cycle without a one-tick delay between stages. ## Common Issues and Fixes ### Issue: No Output Displayed **Symptom:** ``` Starting Temperature Pipeline... ======================================== Temperature Monitor — Press Ctrl+C to stop ======================================== [Nothing appears] ``` **Cause:** Topic names don't match between publisher and subscriber. **Fix:** - Check topic names match exactly: `"raw_temp"` and `"filtered_temp"` - Verify with `horus topic list` in a second terminal - Ensure all three nodes are passed to `horus.run()` ### Issue: Too Much or Too Little Smoothing **Symptom:** Temperature changes too fast or too slow. 
**Fix:** Adjust the `alpha` value in `make_filter()`: ```python state = { "alpha": 0.3, # Current: moderate smoothing # Try these alternatives: # "alpha": 0.7, # More responsive, less smooth # "alpha": 0.1, # Very smooth, slower response } ``` ### Issue: `ModuleNotFoundError: horus` **Cause:** Python bindings not installed. **Fix:** ```bash pip install horus-robotics ``` For source builds: ```bash cd horus_py && maturin develop --release ``` ### Issue: `horus topic echo` Shows Nothing **Cause:** Application not running, or running with `python src/main.py` instead of `horus run`. **Fix:** 1. Start the application with `horus run` (not `python src/main.py`) 2. In a separate terminal, run `horus topic echo raw_temp` 3. If still empty, run `horus topic list` to check which topics exist ### Issue: Stale Shared Memory **Symptom:** `Failed to create Topic` or topics from a previous run interfere. **Fix:** ```bash horus clean --shm ``` ## Experiments to Try ### Change the Update Rate Make the sensor publish faster (2 Hz instead of 1 Hz): ```python return horus.Node( name="SensorNode", tick=tick, rate=2, # 2 Hz instead of 1 Hz ... ) ``` Remember to update FilterNode and DisplayNode rates to match, or they will only process every other reading. 
### Add Temperature Alerts Add a warning when the temperature exceeds a threshold: ```python def tick(node): temp = node.recv("filtered_temp") if temp is None: return state["count"] += 1 print(f"[Reading #{state['count']}] Temperature: {temp:.1f}°C") if temp > 30.0: print(" WARNING: High temperature detected!") node.log_warning(f"High temp alert: {temp:.1f}°C") ``` ### Log Data to a File Add file logging to the DisplayNode: ```python def make_display(): state = {"count": 0, "logfile": None} def init(node): state["logfile"] = open("temperature_log.csv", "w") state["logfile"].write("reading,temperature\n") def tick(node): temp = node.recv("filtered_temp") if temp is None: return state["count"] += 1 state["logfile"].write(f"{state['count']},{temp:.2f}\n") state["logfile"].flush() print(f"[#{state['count']}] {temp:.1f}°C (logged)") def shutdown(node): if state["logfile"]: state["logfile"].close() node.log_info("Log file closed") return horus.Node(name="DisplayNode", tick=tick, init=init, shutdown=shutdown, rate=1, order=2, subs=["filtered_temp"]) ``` ### Use a Class for State If you prefer classes over closures: ```python class SensorState: def __init__(self): self.reading = 0.0 sensor = SensorState() def sensor_tick(node): sensor.reading += 1.0 base_temp = 25.0 + math.sin(sensor.reading * 0.1) * 5.0 noise = math.sin(sensor.reading * 0.7) * 2.0 node.send("raw_temp", base_temp + noise) sensor_node = horus.Node( name="SensorNode", tick=sensor_tick, rate=1, order=0, pubs=["raw_temp"], ) ``` Both approaches are valid. Closures keep state co-located with the factory function; classes work better when state is complex or shared. 
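One practical benefit of the class approach is that the filtering logic can be exercised without a running scheduler. The sketch below is an illustration under assumptions, not HORUS API: `EmaFilter` is a hypothetical helper class, and only its `update()` method would be called from a node's `tick` function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmaFilter:
    """Exponential moving average state, independent of any HORUS types."""
    alpha: float = 0.3
    value: Optional[float] = None

    def update(self, raw: float) -> float:
        # The first sample seeds the filter; later samples are blended in.
        if self.value is None:
            self.value = raw
        else:
            self.value = self.alpha * raw + (1.0 - self.alpha) * self.value
        return self.value


ema = EmaFilter(alpha=0.3)
outputs = [ema.update(x) for x in (20.0, 30.0, 30.0)]
print(outputs)
```

Inside `make_filter()`, `tick` would then reduce to reading `raw_temp`, calling `ema.update(raw_temp)`, and sending the result to `filtered_temp`.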
### Use the Scheduler Directly For more control over execution, use `horus.Scheduler` instead of `horus.run()`: ```python sched = horus.Scheduler(tick_rate=100, watchdog_ms=500) sched.add(make_sensor()) sched.add(make_filter()) sched.add(make_display()) # Run for 30 seconds, then stop automatically sched.run(duration=30.0) ``` This gives you access to runtime mutation, safety configuration, and introspection methods. See the [Python API Reference](/python/api) for the full Scheduler API. ## Key Takeaways - **Pipeline pattern**: Data flows through stages (Sensor -> Filter -> Display), each a separate node - **Pub/sub decoupling**: Nodes only know about topic names, not each other - **Execution order**: `order` controls which node runs first in each tick cycle - **`recv()` is non-blocking**: Returns `None` when no message is available -- not an error - **State in closures**: Use dictionaries or class instances captured by tick functions - **CLI introspection**: `horus topic echo`, `horus topic hz`, and `horus monitor` work on live data from any terminal ## Next Steps Now that you have built a 3-node pipeline, try: 1. **[Choosing Your Configuration](/concepts/choosing-configuration)** -- Progressive guide from prototype to production (Level 0→4) 2. **[Execution Classes](/concepts/execution-classes)** -- When to use `compute=True`, `on="topic"`, or async I/O 3. **[Testing](/development/testing)** -- Unit test your nodes with `tick_once()` 4. **[Custom Messages](/tutorials/04-custom-messages-python)** -- Define your own message types 5. 
**[Choosing a Language](/getting-started/choosing-language)** -- When to use Python vs Rust ### Production Features When you're ready to move beyond prototyping: - **[Safety Policies (Python)](/python/safety-policies)** -- Watchdog, deadline enforcement, graduated degradation - **[BlackBox](/advanced/blackbox)** -- Flight recorder for post-mortem crash analysis - **[Deterministic Mode](/advanced/deterministic-mode)** -- Reproducible execution for simulation and CI testing - **[Record & Replay](/advanced/record-replay)** -- Tick-perfect replay for reproducing field bugs - **[Fault Tolerance](/advanced/circuit-breaker)** -- Per-node failure policies (restart, skip, fatal) - **[Emergency Stop (Python)](/recipes/emergency-stop-python)** -- Event-driven e-stop pattern for actuator safety ## Full Code The complete code above is production-ready. To save it: 1. Copy the entire code block from Step 2 2. Replace `src/main.py` in your project 3. Run with `horus run` For additional examples, see [Python Examples](/python/examples). --- ## See Also - [Quick Start (Python)](/getting-started/quick-start-python) -- Your first Python application - [Building Your Second Application (Rust)](/getting-started/second-application) -- The same pipeline in Rust - [Tutorials](/tutorials) -- Continue learning with guided tutorials - [Recipes](/recipes) -- Production-ready patterns (PID, sensor fusion, hardware, telemetry) - [Core Concepts](/concepts) -- Understand how HORUS works - [Real-Time (Python)](/python/real-time) -- Budget, deadline, and `horus.budget_remaining()` --- ## Simulation Path: /getting-started/simulation Description: Run your robot in simulation with horus-sim3d before deploying to hardware # Simulation HORUS separates your robot logic from the hardware it runs on. The same node code that reads from a physical LiDAR or IMU works identically with simulated sensors, because both publish to the same shared-memory topics using the same message types. 
## Overview **horus-sim3d** is a 3D physics simulator distributed as a HORUS CLI plugin. It is not bundled with HORUS -- you install it separately: ```bash horus install horus-sim3d ``` Once installed, one flag swaps between simulation and real hardware: ```bash horus run # real hardware horus run --sim # simulated sensors via horus-sim3d ``` Your robot code does not change. Nodes subscribe to topics like `turtlebot.front_lidar.scan` regardless of whether the data comes from a physical RPLidar or a simulated laser scanner. ## Quick Start ### 1. Create a Project ```bash horus new my_robot cd my_robot ``` ### 2. Add a Robot Description Edit `horus.toml` to point at your URDF file: ```toml [robot] name = "turtlebot" description = "robot.urdf" ``` Place your URDF file in the project root (or a subdirectory -- the path is relative to the project root). ### 3. Define Hardware Add a `[hardware]` section listing every device. Mark devices that should run in simulation with `sim = true`: ```toml [hardware] lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true, noise = 0.01 } imu = { use = "bno055", bus = "/dev/i2c-1", sim = true } ``` ### 4. Run in Simulation ```bash horus run --sim ``` This launches horus-sim3d, loads your URDF, and starts publishing simulated sensor data on the same topics your nodes already subscribe to. Only devices with `sim = true` are replaced by the simulator -- the rest connect to real hardware as usual. ### 5. Run on Real Hardware When you are ready for the real robot: ```bash horus run ``` No code changes. The scheduler loads the `[hardware]` section and connects to physical hardware. The `sim` field is ignored when running without `--sim`. 
## Configuration ### The `[robot]` Section The `[robot]` table in `horus.toml` tells HORUS about your robot model: ```toml [robot] name = "turtlebot" # used in topic naming description = "robot.urdf" # URDF file path (passed to the simulator) simulator = "sim3d" # which simulator plugin (optional, default: sim3d) ``` | Key | Required | Description | |-----|----------|-------------| | `name` | Yes | Robot name. Used as the first segment in all topic names (e.g., `turtlebot.front_lidar.scan`) | | `description` | No | Path to URDF, Xacro, SDF, or MJCF file relative to project root. Passed to the simulator for model loading | | `simulator` | No | Simulator plugin name. Defaults to `"sim3d"`. HORUS launches `horus-{simulator}` when you run `--sim` | ### The `[hardware]` Section `[hardware]` defines all your devices in one place. Each device can optionally include `sim = true` to indicate it should be replaced by the simulator when running in `--sim` mode. When you run `horus run --sim`, devices with `sim = true` are handed off to the simulator. Devices **without** `sim = true` (or with `sim = false`) stay connected to real hardware. This lets you mix real and simulated hardware in a single configuration block. ```toml [hardware] lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true, noise = 0.01 } imu = { use = "bno055", bus = "/dev/i2c-1", sim = true } camera = { use = "realsense" } # camera has no sim = true -- stays real even in --sim mode ``` ### Mixed Mode You can selectively simulate individual devices while keeping others connected to real hardware: ```bash horus run --sim lidar # only lidar is simulated, IMU + camera stay real horus run --sim lidar camera # lidar + camera simulated, IMU stays real horus run --sim # all devices with sim = true are simulated ``` This is useful for testing a new perception algorithm against simulated LiDAR while the robot's IMU and motors are connected to real hardware. 
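The selection rules above can be sketched as a small function. This is an illustration, not the CLI's implementation: `simulated_devices` is a hypothetical helper, the `hardware` dict mirrors the TOML table by hand, and it assumes that naming a device after `--sim` selects it regardless of its `sim` flag (the `--sim lidar camera` command suggests this, but the page does not state it).

```python
def simulated_devices(hardware, sim_args):
    """Return the device names that would be handed to the simulator.

    sim_args is None  -> `horus run` (no --sim): nothing is simulated
    sim_args == []    -> `horus run --sim`: every device with sim = true
    sim_args == [...] -> `horus run --sim a b`: exactly the named devices
                         (assumption: the sim flag is not consulted here)
    """
    if sim_args is None:
        return set()
    if not sim_args:
        return {name for name, cfg in hardware.items() if cfg.get("sim", False)}
    unknown = set(sim_args) - set(hardware)
    if unknown:
        raise ValueError(f"unknown devices: {sorted(unknown)}")
    return set(sim_args)


# Hand-written mirror of the [hardware] table shown above
hardware = {
    "lidar": {"use": "rplidar", "port": "/dev/ttyUSB0", "sim": True, "noise": 0.01},
    "imu": {"use": "bno055", "bus": "/dev/i2c-1", "sim": True},
    "camera": {"use": "realsense"},
}

print(sorted(simulated_devices(hardware, [])))
print(sorted(simulated_devices(hardware, ["lidar"])))
```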
## Hardware Types Each entry in `[hardware]` specifies a device source: | Type | TOML Syntax | What It Does | |------|-------------|-------------| | **Terra** | `use = "rplidar"` | Uses a pre-built Terra hardware driver | | **Package** | `package = "horus-driver-ati-netft"` | Installs and runs a registry driver package | | **Node** | `node = "ConveyorDriver"` | Runs a local node registered via `register_driver!` | | **Crate** | `crate = "rplidar-driver"` | Adds a Rust crate from crates.io to `.horus/Cargo.toml` | | **PyPI** | `pip = "adafruit-bno055"` | Adds a Python package from PyPI to `.horus/pyproject.toml` | | **Exec** | `exec = "./sensor_bridge.py"` | Launches as a subprocess with health monitoring | Any device can include `sim = true` alongside its source to mark it as simulatable. When `--sim` is active for that device, the simulator takes over instead of loading the real driver. Devices can include additional parameters as key-value pairs: ```toml [hardware.arm] use = "dynamixel" port = "/dev/ttyUSB0" baudrate = 1000000 servo_ids = [1, 2, 3, 4, 5, 6] sim = true ``` ## Topic Naming Convention HORUS uses a shared topic naming convention that both real hardware and the simulator follow. This is why your code works in both modes without changes. **Format**: `{robot_name}.{sensor_name}.{data_type}` Topics use dots as separators (not slashes). This is required because HORUS topics are backed by shared memory files, and `shm_open()` on macOS does not allow slashes in names. 
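A minimal sketch of the naming format, useful for building topic names in one place instead of scattering string literals. `topic_name` is a hypothetical helper, not part of the HORUS API. It rejects slashes for the portability reason given above, and it also rejects dots inside a segment so the segments stay unambiguous (that second check is this sketch's own choice). The sensor segment is optional because robot-level topics such as odometry omit it.

```python
def topic_name(robot, data_type, sensor=None):
    """Build a `{robot_name}.{sensor_name}.{data_type}` topic name."""
    parts = [robot, sensor, data_type] if sensor else [robot, data_type]
    for part in parts:
        if not part:
            raise ValueError("topic segments must be non-empty")
        if "/" in part or "." in part:
            raise ValueError(f"invalid character in topic segment: {part!r}")
    return ".".join(parts)


print(topic_name("turtlebot", "scan", sensor="front_lidar"))
print(topic_name("turtlebot", "cmd_vel"))
```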
### Standard Data Type Suffixes | Suffix | Sensor / Use | Message Type | |--------|-------------|-------------| | `scan` | 2D/3D LiDAR | `LaserScan` | | `imu` | IMU (accel + gyro) | `Imu` | | `gps` | GPS receiver | `Gps` | | `image` | RGB camera | `Image` | | `depth` | Depth camera | `Image` | | `camera_info` | Camera intrinsics | `CameraInfo` | | `odom` | Wheel odometry | `Odometry` | | `cmd_vel` | Velocity commands | `CmdVel` | | `joint_state` | Joint positions/velocities | `JointState` | | `joint_cmd` | Joint commands | `JointCommand` | | `wrench` | Force/torque sensor | `Wrench` | | `sonar` | Ultrasonic sonar | `Sonar` | | `encoder` | Rotary encoder | `Encoder` | | `radar` | Radar with Doppler | `Radar` | | `segmentation` | Semantic segmentation | `Image` | | `thermal` | Thermal/IR camera | `Image` | | `event_camera` | Dynamic vision sensor | `EventCamera` | | `pointcloud` | 3D point cloud | `PointCloud` | ### Examples ```text turtlebot.front_lidar.scan # 2D LiDAR scan data turtlebot.imu_sensor.imu # IMU readings turtlebot.rgb_camera.image # RGB camera image turtlebot.rgb_camera.depth # Depth from an RGB-D camera turtlebot.rgb_camera.camera_info # Camera intrinsics turtlebot.odom # Robot-level odometry (no sensor_name) turtlebot.cmd_vel # Velocity commands (no sensor_name) turtlebot.joint_state # Joint positions (no sensor_name) ``` Robot-level topics (odometry, velocity commands, joint state) omit the sensor name segment -- they apply to the whole robot, not a specific sensor. ## Sim Control Services When horus-sim3d is running, your nodes can control the simulation at runtime through service topics. Each service has a `.request` topic (you write to) and a `.response` topic (you read from). 
| Service | Topic Name | Request Type | What It Does |
|---------|-----------|-------------|-------------|
| **Spawn** | `sim.spawn` | `SpawnRequest` | Spawn a model (URDF, SDF) or primitive (box, sphere, cylinder) |
| **Despawn** | `sim.despawn` | `DespawnRequest` | Remove an entity by ID |
| **Teleport** | `sim.teleport` | `TeleportRequest` | Instantly move an entity to a new pose |
| **Pause** | `sim.pause` | `PauseRequest` | Pause physics and sensor updates |
| **Resume** | `sim.resume` | `ResumeRequest` | Resume a paused simulation |
| **Raycast** | `sim.raycast` | `RaycastRequest` | Cast a ray and get the hit point, normal, and entity ID |
| **Get State** | `sim.state.get` | `GetStateRequest` | Query sim time, entity count, paused status, physics dt |
| **Set Param** | `sim.param.set` | `SetParamRequest` | Change gravity or physics timestep at runtime |

These service names are defined in `horus_library::topic_convention::service` and are **simulator-agnostic** -- any simulator plugin that implements them will work with `horus run --sim`.

### Example: Spawning an Obstacle

Rust:

```rust
use horus::prelude::*;
use horus_library::messages::simulation::*;

// Create request/response topics
let req: Topic<SpawnRequest> = Topic::new("sim.spawn.request")?;
let resp: Topic<SpawnResponse> = Topic::new("sim.spawn.response")?;

// Spawn a box at position (2, 0.5, 0) with size 1x1x1
req.send(SpawnRequest {
    model: "box".into(),
    name: "obstacle_1".into(),
    position: [2.0, 0.5, 0.0],
    scale: [1.0, 1.0, 1.0],
    ..Default::default()
});

// Wait for response
std::thread::sleep(std::time::Duration::from_millis(50));
if let Some(response) = resp.recv() {
    if response.success {
        println!("Spawned entity {} with ID {}", response.name, response.entity_id);
    }
}
```

Python:

```python
# simplified
# NOTE: SpawnRequest/SpawnResponse are not yet available in Python bindings.
# Use dict-based topics for simulation control from Python: import horus spawn_req = horus.Topic("sim.spawn.request") spawn_req.send({ "model": "box", "name": "obstacle_1", "position": [2.0, 0.5, 0.0], "scale": [1.0, 1.0, 1.0], }) ``` ## sim.toml (Optional) `sim.toml` is an optional configuration file for horus-sim3d. It provides fine-grained control over physics, sensors, actuators, world setup, and rendering. Place it in your project root and pass it with `--config`: ```bash horus sim3d --config sim.toml --robot robot.urdf ``` ### Physics Configuration ```toml [physics] dt = 0.001 # timestep in seconds (default: 1/240) solver = "newton" # "newton", "lcp", or "smooth" integrator = "semi_implicit_euler" # "semi_implicit_euler", "rk4", "velocity_verlet", "implicit_euler" max_iterations = 100 substeps = 1 [physics.contact] damping = 0.5 stiffness = 100000.0 ``` ### World Configuration ```toml [world] ground = true # enable ground plane gravity = [0.0, -9.81, 0.0] # gravity vector [x, y, z] in m/s^2 scene = "warehouse.yaml" # optional scene file [[world.objects]] name = "wall_1" model = "box" position = [5.0, 1.0, 0.0] scale = [0.2, 2.0, 10.0] is_static = true [world.terrain] type = "heightfield" source = "terrain.png" size = [100.0, 100.0] height_scale = 5.0 ``` ### Sensor Overrides ```toml [sensors.front_lidar] type = "lidar" link = "lidar_link" # URDF link to attach to rate_hz = 10 # publish rate (overrides URDF value) rays = 360 range = [0.1, 30.0] fov = 360.0 noise = { type = "gaussian", std = 0.01 } [sensors.imu_sensor] type = "imu" link = "imu_link" rate_hz = 100 noise = { type = "gaussian", std = 0.005 } ``` ### Actuator Configuration ```toml [actuators.drive] type = "differential_drive" topic = "cmd_vel" wheel_radius = 0.033 wheel_separation = 0.16 max_speed = 1.0 max_torque = 2.0 [actuators.drive.motor_model] type = "dc" # DC motor model for sim-to-real fidelity [actuators.drive.latency] command_ms = 5 # simulated command latency sensor_ms = 2 # simulated 
sensor feedback latency ``` ### Full sim.toml Structure | Section | Purpose | |---------|---------| | `[robot]` | Robot name and URDF path | | `[physics]` | Timestep, solver, integrator, contact parameters | | `[world]` | Ground plane, gravity, terrain, static/dynamic objects | | `[sensors.*]` | Per-sensor configuration: type, rate, noise, link attachment | | `[actuators.*]` | Per-actuator configuration: type, motor model, latency | | `[controllers.*]` | Controller configuration (PID, etc.) | | `[materials.*]` | Physics material definitions (friction, restitution) | | `[topics]` | Topic naming and remapping | | `[visual]` | Rendering settings and camera modes | | `[recording]` | Trajectory and sensor data recording | | `[rl]` | Reinforcement learning training configuration | ## URDF Sensors horus-sim3d reads sensor definitions from your URDF file's `` elements. This is the primary source for sensor placement and configuration: ```xml 360 -3.14159 3.14159 0.1 30.0 10 ``` The `[sensors]` section in sim.toml **overlays on top** of what is in the URDF. You do not need to duplicate sensor definitions -- only specify what you want to override: ```toml # Override just the rate and add noise -- geometry comes from the URDF [sensors.front_lidar] type = "lidar" rate_hz = 20 # override URDF's 10 Hz to 20 Hz noise = { type = "gaussian", std = 0.02 } ``` If a sensor name in sim.toml matches a URDF sensor name, the sim.toml values are merged onto the URDF values. If the name does not match any URDF sensor, sim3d spawns it as an additional sensor. 
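The overlay rule can be sketched as a dictionary merge. This is an illustration rather than sim3d's code: `overlay_sensors` and the `rear_sonar` entry are hypothetical, and the sketch assumes a shallow merge in which a nested table such as `noise` is replaced as a whole. The source does not say whether nested tables are merged key by key.

```python
def overlay_sensors(urdf_sensors, toml_sensors):
    """Overlay sim.toml sensor settings on top of URDF-derived ones.

    Matching names are merged (sim.toml wins per key); names that only
    appear in sim.toml become additional sensors. Inputs are not mutated.
    """
    merged = {name: dict(cfg) for name, cfg in urdf_sensors.items()}
    for name, override in toml_sensors.items():
        merged.setdefault(name, {}).update(override)
    return merged


# Values taken from the URDF and sim.toml snippets above
urdf = {
    "front_lidar": {"type": "lidar", "rays": 360, "rate_hz": 10, "range": [0.1, 30.0]},
}
toml = {
    "front_lidar": {"type": "lidar", "rate_hz": 20, "noise": {"type": "gaussian", "std": 0.02}},
    "rear_sonar": {"type": "sonar", "rate_hz": 5},  # hypothetical extra sensor
}

merged = overlay_sensors(urdf, toml)
print(merged["front_lidar"]["rate_hz"], merged["front_lidar"]["rays"])
```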
### Supported Sensor Types horus-sim3d supports 16 sensor types: | Type | sim.toml `type` | Data Suffix | Description | |------|----------------|-------------|-------------| | 2D/3D LiDAR | `lidar` | `scan` | Ray-based laser scanner | | IMU | `imu` | `imu` | Accelerometer + gyroscope | | RGB Camera | `rgb_camera` | `image` | Color camera with render pipeline | | Depth Camera | `depth_camera` | `depth` | Depth-only camera | | Stereo Camera | `stereo_camera` | `image` | Stereo camera pair | | GPS | `gps` | `gps` | GNSS receiver | | Force/Torque | `force_torque` | `wrench` | 6-axis force/torque on a joint | | Contact | `contact` | `contact` | Binary contact detection | | Encoder | `encoder` | `encoder` | Rotary or absolute encoder on a joint | | Sonar | `sonar` | `sonar` | Ultrasonic distance sensor | | Radar | `radar` | `radar` | Doppler radar | | Thermal Camera | `thermal_camera` | `thermal` | Infrared camera | | Event Camera | `event_camera` | `event_camera` | Dynamic vision sensor (neuromorphic) | | Barometer | `barometer` | `barometer` | Atmospheric pressure sensor | | Magnetometer | `magnetometer` | `magnetometer` | 3-axis compass | | Altimeter | `altimeter` | `altimeter` | Altitude sensor | ## Complete Example Here is a complete project setup for a differential-drive robot with LiDAR, IMU, and a camera. 
**horus.toml**:

```toml
[package]
name = "my-robot"
version = "0.1.0"

[robot]
name = "turtlebot"
description = "robot.urdf"

[hardware]
lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true, noise = 0.01 }
imu = { use = "bno055", bus = "/dev/i2c-1", sim = true }
camera = { use = "realsense", sim = true }
```

**src/main.rs** (works in both sim and real):

```rust
use horus::prelude::*;

fn main() -> anyhow::Result<()> {
    let scan: Topic<LaserScan> = Topic::new("turtlebot.front_lidar.scan")?;
    let imu_data: Topic<Imu> = Topic::new("turtlebot.imu_sensor.imu")?;
    let cmd: Topic<CmdVel> = Topic::new("turtlebot.cmd_vel")?;

    Scheduler::new()
        .add("navigator", |ctx| {
            // Read latest sensor data
            if let Some(scan) = scan.recv() {
                let min_range = scan.ranges.iter().copied().fold(f32::MAX, f32::min);
                if min_range < 0.5 {
                    // Obstacle close -- turn
                    cmd.send(CmdVel { linear_x: 0.0, angular_z: 0.5, ..Default::default() });
                } else {
                    // Clear path -- drive forward
                    cmd.send(CmdVel { linear_x: 0.3, angular_z: 0.0, ..Default::default() });
                }
            }
        })
        .rate(10.hz())
        .build()?;

    Ok(())
}
```

Run it:

```bash
# Test in simulation first
horus run --sim

# Deploy to real hardware
horus run
```

## sim3d Deep Dive

The horus-sim3d plugin has extensive documentation covering every aspect of the simulator. These guides go deeper than this overview page.
### References | Guide | What it covers | |-------|---------------| | [CLI Reference](https://github.com/softmata/horus-sim3d/blob/main/docs/cli-reference.md) | All sim3d flags: `--mode`, `--world`, `--speed`, `--no-gui`, `--namespace`, `--driver-mode` | | [sim.toml Configuration](https://github.com/softmata/horus-sim3d/blob/main/docs/configuration.md) | Complete reference for physics, world, sensors, actuators, materials, visual, recording, RL (1300+ lines) | | [Scene Format](https://github.com/softmata/horus-sim3d/blob/main/docs/scene-format.md) | YAML world file schema — static objects, terrain, lighting, spawn points | | [Robot Loading](https://github.com/softmata/horus-sim3d/blob/main/docs/robot-loading.md) | URDF, MJCF, SDF parsing — how robot models are loaded and configured | | [Sensor Reference](https://github.com/softmata/horus-sim3d/blob/main/docs/sensors.md) | All 16 sensor types — configuration, noise models, rate tuning, URDF overlay | | [Actuators Reference](https://github.com/softmata/horus-sim3d/blob/main/docs/actuators.md) | Motor models, latency simulation, traction, battery | | [Topic Wiring](https://github.com/softmata/horus-sim3d/blob/main/docs/topic-wiring.md) | How sim3d topics map to horus hardware topics — the sim/real swap mechanism | | [Physics Engine](https://github.com/softmata/horus-sim3d/blob/main/docs/physics.md) | Featherstone ABA/RNEA/CRBA, LCP contacts, solver selection, integrators | | [Performance Tuning](https://github.com/softmata/horus-sim3d/blob/main/docs/performance.md) | Substeps, dt, headless mode, GPU acceleration, profiling | | [Multi-Robot](https://github.com/softmata/horus-sim3d/blob/main/docs/multi-robot.md) | Namespace isolation, multi-sim coordination | | [Recording](https://github.com/softmata/horus-sim3d/blob/main/docs/recording.md) | Trajectory capture, sensor data recording, video export | | [RL Training](https://github.com/softmata/horus-sim3d/blob/main/docs/reinforcement-learning.md) | GymVecEnv, domain 
randomization, curriculum, Python bindings | | [Cloud Deployment](https://github.com/softmata/horus-sim3d/blob/main/docs/deployment/cloud.md) | Running sim3d headless in cloud/CI environments | | [Editor](https://github.com/softmata/horus-sim3d/blob/main/docs/editor.md) | Entity inspector, visual debugging tools | ### Tutorials | Tutorial | What you build | |----------|---------------| | [1: Basic Simulation](https://github.com/softmata/horus-sim3d/blob/main/docs/tutorials/01_basic_simulation.md) | Load a world, spawn objects, control physics | | [2: Robot Simulation](https://github.com/softmata/horus-sim3d/blob/main/docs/tutorials/02_robot_simulation.md) | Load URDF, wire sensors, drive a robot | | [3: Sensors](https://github.com/softmata/horus-sim3d/blob/main/docs/tutorials/03_sensors.md) | Configure all sensor types, add noise, tune rates | | [4: Reinforcement Learning](https://github.com/softmata/horus-sim3d/blob/main/docs/tutorials/04_reinforcement_learning.md) | Set up RL training with vectorized environments | ### How `horus run --sim` Works When you run `horus run --sim`, the CLI: 1. Reads `[robot].name` and `[robot].description` from `horus.toml` 2. Sets `HORUS_SIM_MODE=1` so the hardware system identifies devices with `sim = true` 3. Launches sim3d as a background process: `horus-sim3d --driver-mode --robot robot.urdf --robot-name turtlebot` 4. The `--driver-mode` flag tells sim3d to use the shared topic naming convention (`{robot}.{sensor}.{type}`) without a namespace prefix -- so topics match what real hardware produces 5. sim3d parses the URDF, creates physics world, attaches sensors, and starts publishing to horus SHM topics 6. Your code runs normally -- nodes subscribe to the same topics regardless of sim or real 7. 
On exit, horus kills the sim3d process If sim3d is not installed, you get a helpful message: `Install with: horus install horus-sim3d` ## Next Steps - [Deterministic Mode](/advanced/deterministic-mode) -- reproducible execution with virtual time for simulation and testing - [Package Management](/package-management/package-management) -- install plugins and packages from the HORUS registry - [Standard Messages](/stdlib/messages/cmd-vel) -- message types used by topics --- ## Troubleshooting (Python) Path: /getting-started/troubleshooting-python Description: Fix installation, runtime, and performance issues in HORUS Python applications # Troubleshooting (Python) Fix installation failures, runtime errors, performance problems, and debug HORUS Python applications. Start with the quick diagnostic steps, then jump to the section matching your symptom. ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Python 3.9+ (`python3 --version`) - Access to a terminal in your project directory ## Common Errors At a Glance | Error | Jump To | Quick Fix | |-------|---------|-----------| | `ModuleNotFoundError: No module named 'horus'` | [Import Errors](#modulenotfounderror-no-module-named-horus) | `pip install horus-robotics` | | `ModuleNotFoundError: No module named '_horus'` | [Native Extension Missing](#modulenotfounderror-no-module-named-_horus) | Rebuild with `maturin develop --release` | | `Failed to create Topic` | [Topic Creation Errors](#topic-creation-errors) | `horus clean --shm` | | `Permission denied` on `/dev/shm` | [SHM Permission Denied](#permission-denied-on-devshm) | Fix `/dev/shm` permissions | | `recv()` always returns `None` | [Topic Not Found](#topic-not-found--recv-always-returns-none) | Check topic name match | | Topics not visible across processes | [Multi-Process Issues](#topics-not-visible-across-processes) | Use `horus run`, not `python` directly | | High `dropped_count` | [Messages 
Dropped](#high-dropped_count--messages-silently-lost) | Keep messages under slot size | | `tick()` taking too long | [GIL Blocking](#tick-taking-too-long-gil-blocking) | Release GIL with `compute=True` | | `TypeError` in tick function | [Wrong Tick Signature](#typeerror-in-tick-function) | Use `def tick(node):` | | Version mismatch (CLI vs Python) | [Version Mismatch](#version-mismatch-between-cli-and-python-bindings) | Reinstall both from same source | | `horus: command not found` | [CLI Not Found](#horus-command-not-found) | Add `~/.cargo/bin` to PATH | ## Quick Diagnostic Steps When your HORUS Python application is not working: 1. **Verify imports**: `python3 -c "import horus; print(horus.__version__)"` 2. **Check the Monitor**: Run `horus monitor` in a second terminal to see active nodes and topics 3. **Verify Topics**: Ensure publisher and subscriber use the exact same topic names 4. **Check shared memory**: Run `horus clean --shm` to remove stale regions from a previous crash 5. **Test individually**: Run nodes one at a time to isolate the problem 6. **Check CLI health**: Run `horus doctor` to diagnose system-level issues --- ## Installation Issues ### `ModuleNotFoundError: No module named 'horus'` **Symptom:** ``` >>> import horus ModuleNotFoundError: No module named 'horus' ``` **Cause:** The `horus` Python package is not installed in your active Python environment. **Fix:** ```bash # Install from PyPI pip install horus-robotics # Verify python3 -c "import horus; print('OK')" ``` If you are building from source: ```bash cd horus_py maturin develop --release ``` **Still failing?** Check which Python you are using: ```bash which python3 python3 -c "import sys; print(sys.executable)" pip show horus-robotics ``` If `pip show` says it is installed but `import horus` fails, you are likely running a different Python than the one pip installed into. See [Virtual Environment Issues](#virtual-environment-issues) below. 
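The same checks can be run from inside Python, which helps when the shell's `python3` and the interpreter that actually executes your code differ. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper name, and the only HORUS-specific detail is the module name `horus` from the error message above.

```python
import importlib.util
import sys


def describe_environment(module="horus"):
    """Report which interpreter is running and whether `module` is importable."""
    spec = importlib.util.find_spec(module)  # None when the module is not found
    return {
        "executable": sys.executable,
        "in_venv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `module_found` is `False` while `pip show horus-robotics` reports the package as installed, compare `executable` with the interpreter that pip belongs to (`pip --version` prints its location).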
--- ### `ModuleNotFoundError: No module named '_horus'` **Symptom:** ``` >>> import horus ModuleNotFoundError: No module named '_horus' ``` **Cause:** The Python wrapper (`horus/__init__.py`) loads correctly, but the native Rust extension (`_horus`) is missing or was compiled for a different Python version. **Fix:** ```bash # Rebuild the native extension cd horus_py maturin develop --release # Verify the .so file exists python3 -c "import _horus; print('Native extension OK')" ``` **Common reasons the native extension is missing:** - You installed `horus` from a wheel built for a different Python version (e.g., built for 3.11, running 3.12) - The `.so` file was compiled for a different platform - You reinstalled Python without rebuilding the extension **If maturin fails:** ```bash # Ensure Rust is installed rustup --version # Ensure maturin is installed pip install maturin # or: cargo install maturin # Rebuild cd horus_py maturin develop --release ``` --- ### `pip install horus-robotics` Fails **Symptom:** ``` error: can't find Rust compiler ``` or: ``` error: command 'rustc' failed ``` **Cause:** The `horus-robotics` package contains Rust code compiled via PyO3. If no pre-built wheel exists for your platform, pip tries to build from source and needs a Rust toolchain. **Fix:** ```bash # Install Rust curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh source ~/.cargo/env # Retry pip install horus-robotics ``` **On Ubuntu/Debian, you may also need build tools:** ```bash sudo apt update sudo apt install -y build-essential pkg-config libssl-dev pip install horus-robotics ``` --- ### Version Mismatch Between CLI and Python Bindings **Symptom:** Strange behavior, missing features, or errors like: ``` RuntimeError: Scheduler version mismatch ``` or topics created by Python nodes are not visible to Rust nodes (or vice versa). **Cause:** The `horus` CLI and the Python bindings were built from different source versions. 
They must share the same `horus_core` version to communicate via shared memory. **Fix:** ```bash # Check versions horus --version python3 -c "import horus; print(horus.__version__)" # If they differ, rebuild both from the same source cd /path/to/horus ./install.sh # Rebuilds CLI cd horus_py maturin develop --release # Rebuilds Python bindings ``` --- ## Runtime Issues ### `horus: command not found` **Symptom:** ```bash $ horus run bash: horus: command not found ``` **Fix:** ```bash # Add to PATH export PATH="$HOME/.cargo/bin:$PATH" # Make permanent echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc source ~/.bashrc # Verify horus --help ``` --- ### Topic Creation Errors **Symptom:** ``` RuntimeError: Failed to create Topic ``` **Common causes and fixes:** **Stale shared memory from a previous crash:** ```bash horus clean --shm ``` **Insufficient permissions on shared memory (Linux):** ```bash horus doctor # Diagnose permission issues horus clean --shm # Clean stale regions ``` **Conflicting topic names** (same name, different types): ```python # BAD: Same topic name used with different data shapes node_a = horus.Node(pubs=["data"], tick=lambda n: n.send("data", 42.0)) node_b = horus.Node(pubs=["data"], tick=lambda n: n.send("data", {"x": 1})) # GOOD: Distinct names for distinct data node_a = horus.Node(pubs=["sensor.raw"], tick=lambda n: n.send("sensor.raw", 42.0)) node_b = horus.Node(pubs=["sensor.processed"], tick=lambda n: n.send("sensor.processed", {"x": 1})) ``` --- ### "No such file or directory" When Creating Topic **Symptom:** ``` RuntimeError: Failed to create publisher 'camera': No such file or directory ``` **Cause:** You are using slashes (`/`) in a topic name. On macOS, shared memory uses `shm_open()` which does not support slashes. On Linux it works, but your code will not be portable. 
**Fix:** Use dots instead of slashes: ```python # WRONG - fails on macOS node = horus.Node(pubs=["sensors/camera"], tick=tick) # CORRECT - works on all platforms node = horus.Node(pubs=["sensors.camera"], tick=tick) ``` --- ### Permission Denied on `/dev/shm` **Symptom:** ``` PermissionError: [Errno 13] Permission denied: '/dev/shm/horus/...' ``` **Cause:** The shared memory directory has restrictive permissions, or a previous process created SHM files owned by a different user (e.g., root). **Fix:** ```bash # Diagnose horus doctor # Clean stale shared memory (safe — only removes HORUS regions) horus clean --shm # If /dev/shm itself is restricted (rare) ls -la /dev/shm/ # Should show drwxrwxrwt (world-writable with sticky bit) ``` --- ### Topic Not Found / `recv()` Always Returns `None` **Symptom:** Your subscriber node never receives messages even though the publisher is running. ```python def tick(node): msg = node.recv("sensor_data") if msg is not None: print(f"Got: {msg}") # Never prints ``` **Common causes:** **Topic name mismatch (the #1 cause):** ```python # Publisher pub = horus.Node(pubs=["sensor_data"], tick=pub_tick) # Subscriber (TYPO — missing underscore) sub = horus.Node(subs=["sensordata"], tick=sub_tick) # CORRECT: sub = horus.Node(subs=["sensor_data"], tick=sub_tick) ``` **Debug with CLI tools:** ```bash # List all active topics horus topic list # Watch messages on a specific topic horus topic echo sensor_data # Check topic publish rate horus topic hz sensor_data ``` **Publisher has not sent yet:** ```python # This is normal on the first few ticks — recv() returns None # until the publisher sends its first message def tick(node): msg = node.recv("sensor_data") if msg is not None: process(msg) # else: no message yet — just continue ``` **Wrong execution order:** ```python # Publisher should run first (lower order number) pub = horus.Node(pubs=["data"], tick=pub_tick, order=0) # Subscriber runs after sub = horus.Node(subs=["data"], tick=sub_tick, 
order=1) ``` **Topic not declared in subs/pubs:** ```python # BAD: "temperature" not declared in subs node = horus.Node(subs=["temp"], tick=tick) msg = node.recv("temperature") # Wrong name — not in subs # GOOD: declared topic matches recv() call node = horus.Node(subs=["temperature"], tick=tick) msg = node.recv("temperature") ``` --- ### `TypeError` in Tick Function **Symptom:** ``` TypeError: tick() takes 0 positional arguments but 1 was given ``` **Cause:** Your tick function does not accept the `node` parameter. The scheduler passes the node instance automatically. **Fix:** ```python # WRONG def tick(): print("hello") # CORRECT — tick must accept one argument (the node) def tick(node): print("hello") ``` This also applies to `init` and `shutdown` callbacks: ```python def my_init(node): node.log_info("Starting up") def my_shutdown(node): node.log_info("Shutting down") ``` --- ## Multi-Process Issues ### Topics Not Visible Across Processes **Symptom:** Two Python scripts both use HORUS topics, but they cannot see each other's messages. **Cause:** You are running `python src/main.py` directly instead of `horus run`. When you run Python directly, HORUS cannot set up the shared memory namespace, so each process gets its own isolated SHM region. 
**Fix:** Always use `horus run`: ```bash # WRONG — no SHM namespace, topics are isolated python src/main.py # CORRECT — horus sets up the namespace horus run ``` If you need multiple processes that share topics, launch them all through HORUS: ```bash # Terminal 1 cd my-project && horus run # Terminal 2 (same project directory — shares the SHM namespace) cd my-project && horus run src/second_process.py ``` **Verifying topic visibility:** ```bash # While your application is running, check for active topics horus topic list # If topics appear here, they are in the correct namespace # If empty, the publisher is not using `horus run` ``` --- ### Different SHM Namespaces **Symptom:** Two HORUS projects cannot communicate via topics. **Cause:** Each HORUS project gets its own SHM namespace (derived from the project directory). Topics from project A are invisible to project B by design. **Fix:** If you need cross-project communication, run both node sets from the same project directory. Alternatively, structure your system as a single HORUS project with multiple entry points. --- ## Virtual Environment Issues ### `horus` Installed System-Wide, Python in a Venv **Symptom:** `horus run` works, but `import horus` fails inside your virtual environment. **Cause:** The `horus-robotics` package was installed into the system Python, but your venv does not inherit system packages. **Fix:** ```bash # Option 1: Install into the venv source venv/bin/activate pip install horus-robotics # Option 2: If building from source source venv/bin/activate cd horus_py maturin develop --release ``` **Verify the fix:** ```bash source venv/bin/activate python3 -c "import horus; print('OK')" ``` --- ### `horus` Installed in Venv, CLI Calls System Python **Symptom:** `python3 -c "import horus"` works inside your venv, but `horus run` uses the wrong Python and fails with `ModuleNotFoundError`. **Cause:** The `horus` CLI resolves Python from `PATH`. 
If the venv is not activated when you run `horus run`, it picks up the system Python, which does not have the package.

**Fix:**

```bash
# Always activate the venv before running
source venv/bin/activate
horus run
```

**Check which Python `horus` is using:**

```python
# Inside your tick function, add temporarily:
import sys
print(sys.executable)
```

If this prints a path outside your venv (e.g., `/usr/bin/python3`), activate the venv and try again.

---

## Performance Issues

### `tick()` Taking Too Long (GIL Blocking)

**Symptom:** Your node has a deadline or budget set, and you see warnings like:

```
[WARN] [MyNode] tick exceeded budget: 52ms (budget: 33ms)
```

Or `horus monitor` shows tick durations consistently over budget.

**Cause:** Python's Global Interpreter Lock (GIL) means only one thread runs Python code at a time. If your `tick()` does heavy computation in pure Python, it blocks the scheduler.

**Fix:**

**Option 1: Use `compute=True` for C-extension work (NumPy, PyTorch, OpenCV):**

```python
import numpy as np
import horus

def heavy_tick(node):
    data = node.recv("sensor")
    if data is not None:
        # NumPy releases the GIL internally
        result = np.fft.fft(np.array(data))
        node.send("processed", result.tolist())

node = horus.Node(
    subs=["sensor"],
    pubs=["processed"],
    tick=heavy_tick,
    rate=30,
    compute=True  # Runs on thread pool, GIL released during C calls
)
```

**Option 2: Reduce work per tick:**

```python
CHUNK_SIZE = 100
# Split the dataset once, outside tick()
chunks = [huge_list[i:i + CHUNK_SIZE] for i in range(0, len(huge_list), CHUNK_SIZE)]

def tick_bad(node):
    # BAD: Processing entire dataset every tick
    for item in huge_list:
        process(item)

def tick(node):
    # GOOD: Process one chunk per tick, cycling through the dataset
    chunk = chunks[node.tick_count % len(chunks)]
    for item in chunk:
        process(item)
```

**Option 3: Lower the rate for heavy nodes:**

```python
# ML inference at 10 Hz, not 100 Hz
node = horus.Node(tick=run_model, rate=10, budget=0.08)  # 80ms budget
```

---

### High `dropped_count` / Messages Silently Lost

**Symptom:** `horus monitor` shows a nonzero dropped count on a topic, or you know messages are being sent but the subscriber never sees them.
**Cause:** For generic (dict-based) messages, HORUS serializes data with MessagePack. If the serialized size exceeds the slot size (default 8KB), the message is silently dropped. **Fix:** **Check message size:** ```python import msgpack data = {"image": [0] * 10000} size = len(msgpack.packb(data)) print(f"Serialized size: {size} bytes") # If > 8192, the message will be dropped ``` **Keep messages small:** ```python # BAD: Sending large data through generic topics node.send("image", {"pixels": huge_list}) # GOOD: Send metadata, store large data elsewhere node.send("image.meta", {"width": 640, "height": 480, "frame_id": 42}) # BETTER: Use typed messages for fixed-size data (zero-copy, no size limit) from horus import Image node.send("camera", Image(...)) ``` **Use typed messages for large fixed-size data:** ```python from horus import CmdVel, LaserScan # Typed messages use zero-copy Pod transfer (~2.7us) # No serialization, no size limit concerns node = horus.Node(pubs=[CmdVel], tick=tick) ``` **Monitor drops in real time:** ```bash horus monitor # Check the "Drops" column in the Topics tab ``` --- ### Slow Dict Topics vs Typed Topics **Symptom:** IPC latency is around 10us when sending dicts, but you expected the sub-microsecond latency advertised by HORUS. **Cause:** String-named topics use MessagePack serialization (~10us round-trip). Typed topics use zero-copy Pod transfer (~2.7us round-trip). The sub-microsecond figures apply to Rust Pod messages. **Fix:** Use typed messages for performance-critical paths: ```python from horus import CmdVel, LaserScan, Imu # SLOW (~10μs) — dict serialized with MessagePack node = horus.Node(pubs=["cmd_vel"], tick=lambda n: n.send("cmd_vel", {"linear": 1.0})) # FAST (~2.7μs) — typed Pod, zero-copy node = horus.Node(pubs=[CmdVel], tick=lambda n: n.send("cmd_vel", CmdVel(1.0, 0.0))) ``` For nodes where latency does not matter (loggers, displays, config), dict topics are perfectly fine. 
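To see how much of the latency in your own setup comes from serialization rather than transport, it can help to time the serialization step in isolation. The sketch below is a generic standard-library harness, not part of the HORUS API: `measure_us` is a hypothetical helper, and `json.dumps` only stands in for the serializer (swap in `msgpack.packb` if the `msgpack` package is installed).

```python
import json
import statistics
import time

def measure_us(fn, repeats=1000):
    """Return the median wall-clock time of fn() in microseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return statistics.median(samples)

# Hypothetical dict payload; json.dumps is a stand-in for the real serializer
payload = {"linear": 1.0, "angular": 0.0}
serialize_us = measure_us(lambda: json.dumps(payload).encode())
baseline_us = measure_us(lambda: None)  # cost of the harness itself

print(f"serialize: {serialize_us:.2f} us, baseline: {baseline_us:.2f} us")
```

The median is used instead of the mean so that occasional scheduler or garbage-collection pauses do not skew the result. Numbers from a harness like this are only a rough guide; `horus monitor` remains the place to check end-to-end IPC latency.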
--- ## Debugging Tools ### `horus topic` Commands Use these to inspect live topic data from the command line: ```bash # List all active topics with publisher/subscriber counts horus topic list # Print messages as they arrive on a topic horus topic echo sensor_data # Measure the publish rate of a topic horus topic hz sensor_data ``` These work regardless of whether the publisher is Python or Rust. --- ### `horus monitor` The real-time TUI monitor is the single best debugging tool: ```bash # Terminal 1: Run your application horus run # Terminal 2: Open the monitor horus monitor ``` **What to check:** | Tab | Look For | Meaning | |-----|----------|---------| | Nodes | State = "Running" for all nodes | If "Error" or missing, the node crashed | | Nodes | Tick duration | If consistently over budget, tick is too slow | | Topics | Publisher count = 0 | Nobody is sending to this topic | | Topics | Subscriber count = 0 | Nobody is listening to this topic | | Topics | Drops > 0 | Messages are being lost (see [Messages Dropped](#high-dropped_count--messages-silently-lost)) | | Metrics | IPC latency | Should be <10us for dict topics, <3us for typed | | Graph | Disconnected nodes | Topic name mismatch between publisher and subscriber | **Debug workflow:** ``` 1. Check Nodes tab -> All nodes Running? If not, check terminal for exceptions 2. Check Topics tab -> Topic exists? If not, topic name typo or publisher not started -> Publishers > 0? If not, publisher node is not working -> Subscribers > 0? If not, subscriber has wrong topic name 3. Check Metrics tab -> Messages sent > 0? If not, publisher tick is not calling send() -> Messages received > 0? If not, subscriber tick is not calling recv() 4. Check Graph tab -> All nodes connected? 
If not, topic name mismatch ``` --- ### Checking Node Health Use logging inside your nodes to trace execution: ```python def tick(node): node.log_debug("tick started") msg = node.recv("input") if msg is not None: node.log_info(f"Processing message: {msg}") result = process(msg) node.send("output", result) node.log_debug("tick completed with message") else: node.log_debug("tick completed, no message") ``` If you see "tick started" in the logs but never "tick completed", the hang is inside your processing code. --- ## Common Error Messages ### `RuntimeError: Topic 'X' not in publishers list` **Cause:** You called `node.send("X", data)` but `"X"` was not declared in the node's `pubs` parameter. **Fix:** ```python # WRONG node = horus.Node(pubs=["output"], tick=tick) # then in tick: node.send("result", data) # "result" not in pubs! # CORRECT node = horus.Node(pubs=["result"], tick=tick) # then in tick: node.send("result", data) ``` --- ### `RuntimeError: Topic 'X' not in subscribers list` **Cause:** You called `node.recv("X")` but `"X"` was not declared in the node's `subs` parameter. **Fix:** Add the topic to `subs`: ```python node = horus.Node(subs=["sensor_data"], tick=tick) # now node.recv("sensor_data") works ``` --- ### `RuntimeWarning: Logging outside scheduler context` **Cause:** You called `node.log_info()` (or another logging method) outside of `init()`, `tick()`, or `shutdown()`. Logging only works during scheduler callbacks. **Fix:** ```python # WRONG — logging before scheduler starts node = horus.Node(tick=tick) node.log_info("Ready") # RuntimeWarning — dropped # CORRECT — log inside callbacks def init(node): node.log_info("Ready") # Works def tick(node): node.log_info("Ticking") # Works ``` --- ### `RuntimeError: Scheduler is already running` **Cause:** You called `horus.run()` twice, or called it while a scheduler is already active in the same process. **Fix:** Only call `horus.run()` once. 
If you need to restart, let the first call finish (Ctrl+C or `duration` timeout), then call again.

---

### `OSError: [Errno 28] No space left on device`

**Cause:** The `/dev/shm` tmpfs filesystem is full. HORUS stores shared memory topics here, and stale regions from crashed processes can accumulate.

**Fix:**

```bash
# Clean HORUS shared memory regions
horus clean --shm

# Check available space
df -h /dev/shm

# If still full, check what's using space
horus doctor
```

---

## Patterns and Anti-Patterns

### DO: Check `recv()` for `None`

```python
def tick(node):
    msg = node.recv("input")
    if msg is not None:
        process(msg)
    # No message? That's OK — continue
```

### DON'T: Assume `recv()` Always Returns Data

```python
def tick(node):
    msg = node.recv("input")
    print(msg["value"])  # TypeError if msg is None ('NoneType' object is not subscriptable)
```

### DO: Keep `tick()` Fast

```python
def tick(node):
    data = node.recv("sensor")
    if data is not None:
        node.send("output", quick_transform(data))
```

### DON'T: Block in `tick()`

```python
def tick(node):
    import time
    time.sleep(1)  # Blocks the entire scheduler!

    import requests
    resp = requests.get("https://api.example.com")  # Network I/O blocks!
```

If you need async I/O, use `async` tick functions:

```python
async def tick(node):
    # Async ticks are auto-detected and run on the async executor
    data = await fetch_data()
    node.send("output", data)
```

### DO: Declare All Topics

```python
node = horus.Node(
    pubs=["status", "cmd_vel"],
    subs=["sensor", "emergency_stop"],
    tick=tick
)
```

### DON'T: Use Undeclared Topics

```python
node = horus.Node(pubs=["status"], tick=tick)
# In tick:
node.send("cmd_vel", data)  # RuntimeError — not in pubs!
```

### DO: Use `horus run`

```bash
horus run
```

### DON'T: Run Python Directly

```bash
python src/main.py  # Topics won't connect across processes
```

---

## Getting Help

If you are still stuck:

1. **Run diagnostics:**

   ```bash
   horus doctor
   horus clean --shm
   ```

2.
**Add debug logging:** ```python def tick(node): node.log_debug(f"State: {my_state}") ``` 3. **Test with a minimal example:** Strip your code down to the simplest possible case. Add complexity back one piece at a time until the error appears. 4. **Check the monitor:** ```bash horus monitor ``` 5. **Report the issue:** - GitHub: https://github.com/softmata/horus/issues - Include: full error message + traceback, minimal code example, output of `horus --version` and `python3 -c "import horus; print(horus.__version__)"`, OS and Python version --- ## Next Steps - [Quick Start (Python)](/getting-started/quick-start-python) — Build your first application - [Python API Reference](/python/api/python-bindings) — Full API documentation - [Choosing a Language](/getting-started/choosing-language) — When to use Python vs Rust - [Performance](/performance/performance) — Optimization tips - [Common Mistakes](/getting-started/common-mistakes) — Beginner pitfalls --- ## See Also - [Troubleshooting (Rust)](/getting-started/troubleshooting) — Rust-specific issues - [Debugging](/development/debugging) — Runtime diagnostics - [Monitor](/development/monitor) — Live system observation - [CLI Reference](/development/cli-reference) — `horus doctor`, `horus topic`, and `horus clean` --- ## Troubleshooting Path: /getting-started/troubleshooting Description: Fix installation issues, runtime errors, and debug HORUS applications # Troubleshooting Fix installation issues, runtime errors, and debug HORUS applications. Start with the quick diagnostic steps below, then jump to the specific section matching your symptom. 
## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Access to a terminal in your project directory ## Quick Reference | Script | Use When | What It Does | |--------|----------|--------------| | `./install.sh` | Install or update | Full installation from source | | `./uninstall.sh` | Remove HORUS | Complete removal | ## Common Errors At a Glance | Error | Jump To | Quick Fix | |-------|---------|-----------| | `horus: command not found` | [Runtime Issues](#horus-command-not-found) | `export PATH="$HOME/.cargo/bin:$PATH"` | | `Rust is not installed` | [Installation Issues](#installation-issues) | Install Rust via `rustup` | | `Failed to create Topic` | [Topic Creation Errors](#topic-creation-errors) | `horus clean --shm` | | `No such file or directory` (topic) | [Slashes in Topic Names](#no-such-file-or-directory-when-creating-topic) | Use dots, not slashes | | Subscriber never receives messages | [Topic Not Found](#topic-not-found--no-messages-received) | Check topic name match in `horus monitor` | | Application freezes | [Application Hangs](#application-hangs--deadlock) | Check for blocking calls in `tick()` | | Messages silently dropped | [Messages Dropped](#messages-silently-dropped) | Keep messages under slot size | | Version mismatch | [Version Mismatch](#version-mismatch-errors) | `horus run --clean` | | `HORUS source directory not found` | [Source Not Found](#horus-source-directory-not-found-rust-projects) | Set `$HORUS_SOURCE` | ## Quick Diagnostic Steps When your HORUS application isn't working: 1. **Check the Monitor**: Run `horus monitor` to see active nodes, topics, and message flow 2. **Examine Logs**: Look for error messages in your terminal output 3. **Verify Topics**: Ensure publisher and subscriber use exact same topic names 4. **Check Shared Memory**: Run `horus clean --shm` to remove stale shared memory regions 5. 
**Test Individually**: Run nodes one at a time to isolate the problem --- ## Updating HORUS To update to the latest version: ```bash cd /path/to/horus git pull ./install.sh ``` **To preview changes before updating:** ```bash git fetch git log HEAD..@{u} # See what's new git pull ./install.sh ``` **If you have uncommitted changes:** ```bash git stash git pull ./install.sh git stash pop # Restore your changes ``` --- ## Manual Recovery **Use when:** Build errors, corrupted cache, installation broken ### Quick Steps ```bash # Navigate to HORUS source directory cd /path/to/horus # 1. Clean build artifacts cargo clean # 2. Remove cached libraries rm -rf ~/.horus/cache # 3. Fresh install ./install.sh ``` ### When to Use Recovery **Symptoms requiring recovery:** 1. **Build fails:** ``` error: could not compile `horus_core` ``` 2. **Corrupted cache:** ``` error: failed to load source for dependency `horus_core` ``` 3. **Binary doesn't work:** ```bash $ horus --help Segmentation fault ``` 4. **Version mismatches:** ``` error: the package `horus` depends on `horus_core 0.1.0`, but `horus_core 0.1.3` is installed ``` 5. 
**Broken after system updates:** - Rust updated - System libraries changed - GCC/Clang updated ### What Gets Removed **By `cargo clean`:** - `target/` directory (build artifacts) **By `rm -rf ~/.horus/cache`:** - Installed libraries - Cached dependencies **Never removed (safe):** - `~/.horus/config` (user settings) - `~/.horus/credentials` (registry auth) - Project-local `.horus/` directories - Your source code ### Full Reset (Nuclear Option) If the quick steps don't work, do a complete reset: ```bash # Remove everything HORUS-related cargo clean rm -rf ~/.horus rm -f ~/.cargo/bin/horus # Fresh install ./install.sh ``` --- ## Installation Issues **Problem: "Rust not installed"** ```bash $ ./install.sh Error: Rust is not installed ``` **Solution:** ```bash # Install Rust curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Then try again ./install.sh ``` --- **Problem: "C compiler not found"** **Solution:** ```bash # Ubuntu/Debian/Raspberry Pi OS - Install ALL required packages sudo apt update sudo apt install -y build-essential pkg-config \ libssl-dev libudev-dev libasound2-dev \ libx11-dev libxrandr-dev libxi-dev libxcursor-dev libxinerama-dev \ libwayland-dev wayland-protocols libxkbcommon-dev \ libvulkan-dev libfontconfig-dev libfreetype-dev \ libv4l-dev # Fedora/RHEL sudo dnf groupinstall "Development Tools" sudo dnf install -y pkg-config openssl-devel systemd-devel alsa-lib-devel \ libX11-devel libXrandr-devel libXi-devel libXcursor-devel libXinerama-devel \ wayland-devel wayland-protocols-devel libxkbcommon-devel \ vulkan-devel fontconfig-devel freetype-devel \ libv4l-devel ``` --- **Problem: Build fails with linker errors** ``` error: linking with `cc` failed: exit status: 1 error: could not find native static library `X11`, perhaps an -L flag is missing? 
``` **Solution:** ```bash # Install ALL missing system libraries (most common cause) # Ubuntu/Debian/Raspberry Pi OS sudo apt update sudo apt install -y build-essential pkg-config \ libssl-dev libudev-dev libasound2-dev \ libx11-dev libxrandr-dev libxi-dev libxcursor-dev libxinerama-dev \ libwayland-dev wayland-protocols libxkbcommon-dev \ libvulkan-dev libfontconfig-dev libfreetype-dev \ libv4l-dev # Or run manual recovery (see Manual Recovery section) cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` --- ## Update Issues **Problem: "Build failed" during update** **Solution:** ```bash # Try manual recovery cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` --- **Problem: "Already up to date" but binary broken** **Solution:** ```bash # Force rebuild ./install.sh ``` --- ## Runtime Issues ### "horus: command not found" **Solution:** ```bash # Add to PATH (add to ~/.bashrc or ~/.zshrc) export PATH="$HOME/.cargo/bin:$PATH" # Then reload shell source ~/.bashrc # or restart terminal # Verify which horus horus --help ``` --- ### Binary exists but doesn't run ```bash $ horus --help Segmentation fault ``` **Solution:** ```bash # Full recovery cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` --- ### Version mismatch errors ``` error: the package `horus` depends on `horus_core 0.1.0`, but `horus_core 0.1.3` is installed ``` **Why this happens:** - You updated the `horus` CLI to a new version - Your project's `.horus/` directory still has cached dependencies from the old version - The cached `Cargo.lock` references incompatible library versions **Solution (Recommended - Fast & Easy):** ```bash # Clean cached build artifacts and dependencies horus run --clean # This removes .horus/target/ and forces a fresh build # with the new version ``` **Alternative Solutions:** **Option 2: Manual cleanup** ```bash # Remove the entire .horus directory rm -rf .horus/ # Next run will rebuild from scratch horus run ``` **Option 3: Manual recovery (for persistent issues)** 
```bash # Only needed if --clean doesn't work # This reinstalls HORUS libraries globally cd /path/to/horus cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` **For multiple projects:** ```bash # Clean all projects in your workspace find ~/your-projects -type d -name ".horus" -exec rm -rf {}/target/ \; ``` --- ### "HORUS source directory not found" (Rust projects) ``` Error: HORUS source directory not found. Please set HORUS_SOURCE environment variable. ``` **Solution:** ```bash # Option 1: Set HORUS_SOURCE (recommended for non-standard installations) export HORUS_SOURCE=/path/to/horus echo 'export HORUS_SOURCE=/path/to/horus' >> ~/.bashrc # Option 2: Install HORUS to a standard location # The CLI checks these paths automatically: # - ~/softmata/horus # - /horus # - /opt/horus # - /usr/local/horus # Verify HORUS source is found horus build ``` **Why this happens:** - `horus run` needs to find HORUS core libraries for Rust compilation - It auto-detects standard installation paths - For custom installations, set `$HORUS_SOURCE` --- ## Topic Creation Errors **Symptom**: Application crashes on startup with: ``` Error: Failed to create `Topic` thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value' ``` **Common Causes:** 1. **Stale Shared Memory from Previous Run** - If your app crashes, shared memory regions can persist **Fix**: Clean shared memory: ```bash horus clean --shm ``` 2. **Insufficient Permissions on Shared Memory** (Linux) **Fix**: Ensure the shared memory directory has correct permissions, then clean stale regions: ```bash horus doctor # Diagnoses permission issues horus clean --shm # Clean stale regions ``` 3. **Insufficient Shared Memory Space** **Fix**: Run the health check to see available space: ```bash horus doctor ``` Consult your OS documentation to increase tmpfs size if needed. 4. 
**Conflicting Topic Names** - Two Topics with the same name but different types

**Fix**: Use unique topic names:

```rust
// BAD: Same name, different types (f32 vs. f64 used here as example types)
let topic1: Topic<f32> = Topic::new("data")?;
let topic2: Topic<f64> = Topic::new("data")?;  // CONFLICT!

// GOOD: Different names
let topic1: Topic<f32> = Topic::new("sensor_data")?;
let topic2: Topic<f64> = Topic::new("status_data")?;
```

**General Code Fix:**

```rust
// Topic names become file paths on the underlying shared memory system.
// Use simple, descriptive names with dots (not slashes):
let topic = Topic::new("sensor_data")?;
let topic = Topic::new("camera.front.raw")?;
```

---

### "No such file or directory" when creating Topic

**Symptom**: Application crashes with:

```
thread 'main' panicked at 'Failed to create publisher 'camera': No such file or directory'
```

**Cause**: You're using **slashes (`/`)** in your topic name. On Linux, topic names map to shared memory files, so slashes work (parent directories are created automatically). On **macOS**, topic names map to `shm_open()` kernel objects, which don't support embedded slashes:

```
Topic: "sensors.camera" → works on all platforms (cross-platform)
Topic: "sensors/camera" → works on Linux only, fails on macOS
```

**Fix**: Use dots instead of slashes for cross-platform compatibility:

```rust
// WRONG - slashes fail on macOS
let topic = Topic::new("sensors/camera")?;
let topic = Topic::new("robot/cmd_vel")?;

// CORRECT - dots work on all platforms
let topic = Topic::new("sensors.camera")?;
let topic = Topic::new("robot.cmd_vel")?;
```

**Coming from ROS?** ROS uses slashes (`/sensor/lidar`) because it uses network-based naming. HORUS uses dots because topic names map directly to shared memory file names. See [Topic Naming](/concepts/core-concepts-topic#use-dots-not-slashes) for details.
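When porting a list of ROS-style names, the slash-to-dot conversion described above can be done mechanically before the names ever reach HORUS. The following is a minimal sketch under stated assumptions: `normalize_topic_name` is a hypothetical helper (not a HORUS function), and the allowed character set (letters, digits, underscores, dot separators) is an assumption chosen for illustration, not a rule taken from the HORUS documentation.

```python
import re

# Assumed naming rule for this sketch: dot-separated segments of [A-Za-z0-9_]
_VALID_TOPIC = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*$")

def normalize_topic_name(name: str) -> str:
    """Convert a slash-separated name such as '/sensor/lidar' to 'sensor.lidar'."""
    # Drop empty segments produced by leading, trailing, or doubled slashes
    parts = [part for part in name.split("/") if part]
    normalized = ".".join(parts)
    if not _VALID_TOPIC.match(normalized):
        raise ValueError(f"invalid topic name: {name!r}")
    return normalized

for ros_name in ["/sensor/lidar", "robot/cmd_vel", "camera.front.raw"]:
    print(ros_name, "->", normalize_topic_name(ros_name))
```

Names that already use dots pass through unchanged, so the helper can be applied to every topic string in a project without first sorting out which ones were migrated.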
---

## Topic Not Found / No Messages Received

**Symptom**: Subscriber node never receives messages even though the publisher is sending.

```rust
// recv() always returns None
if let Some(data) = self.data_sub.recv() {
    println!("Got data");  // Never prints
}
```

**Common Causes:**

1. **Topic Name Mismatch (Typo)**
   - This is the #1 cause

   **Fix**: Verify exact topic names:

   ```rust
   // Publisher
   let pub_topic = Topic::new("sensor_data")?;  // Note: sensor_data

   // Subscriber (TYPO! Missing underscore)
   let sub_topic = Topic::new("sensordata")?;

   // CORRECT:
   let sub_topic = Topic::new("sensor_data")?;  // Exact match
   ```

   **Debug with Monitor:**

   ```bash
   horus monitor
   ```

   Check the "Topics" section to see active topic names.

2. **Type Mismatch**
   - Publisher and subscriber use different message types

   **Fix**: Ensure both use the same type:

   ```rust
   // Publisher
   let pub_topic: Topic<f32> = Topic::new("data")?;
   pub_topic.send(3.14);

   // Subscriber (WRONG TYPE)
   let sub_topic: Topic<f64> = Topic::new("data")?;  // f64 != f32

   // CORRECT:
   let sub_topic: Topic<f32> = Topic::new("data")?;  // Same type
   ```

3. **Publisher Hasn't Sent Yet**
   - Subscriber starts before the publisher sends its first message
   - This is normal! The first `recv()` will return `None`

   **Fix**: Check multiple ticks:

   ```rust
   impl Node for SubscriberNode {
       fn tick(&mut self) {
           if let Some(msg) = self.topic.recv() {
               // Process message
           } else {
               // No message yet - this is OK on first few ticks
           }
       }
   }
   ```

4. **Wrong Priority Order**
   - Subscriber runs before publisher in the same tick

   **Fix**: Set priorities correctly:

   ```rust
   // Publisher should run first (lower order number)
   scheduler.add(PublisherNode::new()?).order(0).build()?;

   // Subscriber runs after (higher order number)
   scheduler.add(SubscriberNode::new()?).order(1).build()?;
   ```

---

## Application Hangs / Deadlock

**Symptom**: Your app starts but freezes with no error messages.

```
Starting application...
[Nodes initialized]
[Application freezes - no output]
```

**Common Causes:**

1.
**Infinite Loop in `tick()`** ```rust // BAD: Never returns! fn tick(&mut self) { loop { // Process data } } // GOOD: Tick returns after work fn tick(&mut self) { self.process_data(); // Return naturally — scheduler calls tick() again next frame } ``` 2. **Blocking Operations in `tick()`** ```rust // BAD: Blocks scheduler fn tick(&mut self) { std::thread::sleep(Duration::from_secs(10)); // Blocks everything! } // GOOD: Use tick counter for delays fn tick(&mut self) { self.tick_count += 1; // Execute every 10 ticks (~167ms at 60 FPS) if self.tick_count % 10 == 0 { self.slow_operation(); } } ``` 3. **Waiting Forever for Messages** ```rust // BAD: Blocking wait fn tick(&mut self) { while self.data_sub.recv().is_none() { // Infinite loop if no messages! } } // GOOD: Non-blocking receive fn tick(&mut self) { if let Some(data) = self.data_sub.recv() { // Process data } // Continue even if no message } ``` 4. **Circular Priority Dependencies** - Node A waits for Node B, Node B waits for Node A **Fix**: Ensure data flows one direction: ```rust // BAD: Circular dependency // Node A (priority 0) subscribes to "data_b" // Node B (priority 1) subscribes to "data_a" // Both wait for each other! // GOOD: Unidirectional flow // Node A (priority 0) publishes to "data_a" // Node B (priority 1) subscribes to "data_a", publishes to "data_b" // Node C (priority 2) subscribes to "data_b" ``` 5. **Debug with Logging** ```rust fn tick(&mut self) { hlog!(debug, "Tick started"); // Your code here hlog!(debug, "Tick completed"); } ``` If you see "Tick started" but never "Tick completed", the hang is in your code. --- ## Messages Silently Dropped **Symptom**: Publisher sends messages but subscriber never receives them, and no error is reported. **Cause**: For non-POD (serialized) messages, the serialized data exceeds the slot size (default 8KB). The `send()` method is lossy — it retries briefly then drops the message, incrementing an internal failure counter. **Fix**: **1. 
Check Message Size:**

```rust
use std::mem::size_of;

// POD messages always fit (slot = size_of::<T>())
// Non-POD messages must serialize within the slot size (default 8KB)
// Note: size_of reports the in-memory size only; for non-POD types it does
// not include heap data (e.g. the contents of a Vec), so it is not the
// serialized size.
println!("Message size: {} bytes", size_of::<MyMessage>());
```

**2. Keep Messages Reasonably Sized:**

```rust
// BAD: Variable size — may exceed limit
#[derive(Clone, Serialize, Deserialize)]
pub struct LargeMessage {
    pub data: Vec<u8>,
}

// GOOD: Fixed size
// Note: with plain serde, the derives only cover arrays of up to 32 elements;
// larger arrays typically need a helper such as the `serde_big_array` crate.
#[derive(Clone, Serialize, Deserialize)]
pub struct LargeMessage {
    pub data: [u8; 4096],  // Fixed 4KB
}

// BETTER: Split into multiple messages
#[derive(Clone, Serialize, Deserialize)]
pub struct MessageChunk {
    pub chunk_id: u32,
    pub total_chunks: u32,
    pub data: [u8; 1024],
}
```

**3. Use Monitor to Check:**

```bash
# The monitor shows send failure counts per topic
horus monitor
```

---

## Build and Compilation Issues

### "unresolved import" or "cannot find type in this scope"

**Symptom**: Code won't compile because types or functions are missing.

**Fix**: Ensure HORUS is in your `Cargo.toml` dependencies:

```toml
[dependencies]
horus = { path = "..." }
horus_library = { path = "..." }  # For standard messages (CmdVel, Twist, etc.)
```

Import the prelude:

```rust
use horus::prelude::*;  // Provides Twist, LaserScan, CmdVel, etc.
```

### "trait bound ... is not satisfied"

**Symptom**: Compiler says your message doesn't implement required traits.
**Fix**: Add required derives:

```rust
// Add these derives to all messages
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MyMessage {
    pub field: f32,
}
```

---

## Performance Issues

**Problem: Slow builds**

**Solution:**

```bash
# Use release mode (optimized)
horus run --release
```

---

**Problem: Large disk usage**

**Solution:**

```bash
# Remove this project's build artifacts
cargo clean

# Trim the global Cargo cache (~/.cargo)
cargo install cargo-cache
cargo cache --autoclean
```

---

**Problem: Large `.horus/target/` directory (Rust projects)**

**Why this happens:**
- Cargo stores build artifacts in `.horus/target/`
- Debug builds are unoptimized and larger
- Incremental compilation caches intermediate files

**Solution:**

```bash
# Clean build artifacts in current project
rm -rf .horus/target/

# Or use horus clean flag (next build will be slower)
horus run --clean

# Regular cleanup (if working on multiple projects)
find . -type d -name ".horus" -exec rm -rf {}/target/ \;

# Add to .gitignore (already included in horus new templates)
echo ".horus/target/" >> .gitignore
```

**Typical disk usage:**
- `.horus/target/debug/`: ~10-100 MB (incremental builds)
- `.horus/target/release/`: ~5-50 MB (optimized, no debug symbols)

**Best practices:**
- `.horus/` is in `.gitignore` by default
- Clean periodically if disk space is limited

---

## Using the Monitor to Debug

The monitor is your best debugging tool for runtime issues.

**Starting the Monitor:**

```bash
# Terminal 1: Run your application
horus run

# Terminal 2: Start monitor
horus monitor
```

**Monitor Features:**

**1. Nodes Tab:**
- Shows all running nodes
- Displays node state (Running, Error, Stopped)
- Shows tick count and timing
- Highlights nodes that aren't ticking (stuck)

**2. Topics Tab:**
- Lists all active topics
- Shows message types
- Displays publisher/subscriber counts
- **0 publishers** = no one is sending
- **0 subscribers** = no one is listening

**3.
Metrics Tab:** - **IPC Latency**: Communication time (should be <1µs) - **Tick Duration**: How long each node takes - **Message Counts**: Total sent/received - If sent > 0 but received = 0, subscriber issue - If sent = 0, publisher issue **4. Graph Tab:** - Visual node graph - Shows message flow between nodes - Disconnected nodes = topic mismatch **Debug Workflow:** ``` 1. Check Nodes tab -> All nodes Running? (If Error, check logs) 2. Check Topics tab -> Topics exist? (If no, topic name typo) -> Publishers > 0? (If no, publisher not working) -> Subscribers > 0? (If no, subscriber not created) 3. Check Metrics tab -> Messages sent > 0? (If no, publisher not sending) -> Messages received > 0? (If no, subscriber not receiving) -> IPC latency sane? (If >1ms, system issue) 4. Check Graph tab -> Nodes connected? (If no, topic name mismatch) ``` **Example Debug Session:** ```bash # Problem: Subscriber not receiving messages # Monitor shows: # Nodes: SensorNode (Running), DisplayNode (Running) # Topics: "sensor_data" (1 pub, 0 sub) <-- AHA! # Issue: No subscribers! # Fix: Check DisplayNode - likely wrong topic name ``` --- ## Reading Log Output ### Log Levels HORUS nodes can log at different severity levels: ```rust fn tick(&mut self) { hlog!(debug, "Detailed info for debugging"); hlog!(info, "Normal informational message"); hlog!(warn, "Something unusual happened"); hlog!(error, "Something went wrong!"); } ``` ### Log Format Console output uses ANSI-colored formatting: ``` [INFO] [SensorNode] Sensor initialized │ │ │ │ │ └─ Message │ └─ Node name └─ Log level (INFO, WARN, ERROR, DEBUG) ``` Timestamps are included in the shared memory log buffer (visible in the monitor), formatted as `HH:MM:SS.mmm`. --- ## Common Patterns and Anti-Patterns ### [OK] DO: Check recv() for None ```rust fn tick(&mut self) { if let Some(msg) = self.topic.recv() { // Process message } // No message? 
That's OK, just continue } ``` ### [FAIL] DON'T: Unwrap recv() ```rust fn tick(&mut self) { let msg = self.topic.recv().unwrap(); // PANIC if no message! } ``` ### [OK] DO: Use Result for errors ```rust impl Node for MyNode { fn init(&mut self) -> Result<()> { if self.sensor.is_broken() { return Err(Error::node("MyNode", "Sensor initialization failed")); } Ok(()) } } ``` ### [FAIL] DON'T: panic!() in nodes ```rust fn init(&mut self) -> Result<()> { if self.sensor.is_broken() { panic!("Sensor broken"); // DON'T DO THIS } Ok(()) } ``` ### [OK] DO: Keep tick() fast ```rust fn tick(&mut self) { // Quick operations only let data = self.sensor.read_cached(); self.topic.send(data); } ``` ### [FAIL] DON'T: Block in tick() ```rust fn tick(&mut self) { thread::sleep(Duration::from_millis(100)); // Blocks everything! let data = self.network.fetch(); // Network I/O blocks! } ``` --- ## Best Practices ### Regular Maintenance **Weekly (active development):** ```bash git pull && ./install.sh # Pulls latest and rebuilds ``` **After system updates:** ```bash # If Rust/GCC updated, run manual recovery cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` ### CI/CD Integration ```bash # In CI pipeline ./install.sh || (cargo clean && rm -rf ~/.horus/cache && ./install.sh) ``` ### Debugging Workflow 1. **First: Check horus works** ```bash horus --help ``` 2. **If issues: Update** ```bash git pull && ./install.sh ``` 3. **If errors: Manual recovery** ```bash cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` --- ## Getting Help If you're still having issues: 1. **Try manual recovery:** ```bash cargo clean && rm -rf ~/.horus/cache && ./install.sh ``` 2. **Add Debug Logging:** ```rust // Add hlog!(debug, ...) in your nodes to trace execution hlog!(debug, "Node state: {:?}", self.state); ``` 3. **Test with Minimal Example:** - Strip down to simplest possible code - Add complexity back one piece at a time - Identify what causes the error 4. 
**Check System Resources:** ```bash # Check system health (includes shared memory) horus doctor # Clean stale shared memory if needed horus clean --shm ``` 5. **Report the issue:** - GitHub: https://github.com/softmata/horus/issues - Include: full error message, minimal code example, OS and platform --- ## Next Steps - **[Installation](/getting-started/installation)** - First-time installation guide - **[CLI Reference](/development/cli-reference)** - All horus commands - **[Examples](/rust/examples/basic-examples)** - Working code examples - **[Performance](/performance/performance)** - Optimization tips - **[Testing](/development/testing)** - Test your nodes to prevent runtime errors --- ## See Also - [Common Mistakes](/getting-started/common-mistakes) — Beginner pitfalls - [Debugging](/development/debugging) — Runtime diagnostics - [Monitor](/development/monitor) — Live system observation - [CLI Reference](/development/cli-reference) — `horus doctor` and `horus logs` ======================================== # SECTION: Learn ======================================== --- ## HORUS vs ROS2 Path: /learn/vs-ros2 Description: Detailed comparison of HORUS and ROS2 — performance, architecture, developer experience, and when to use each # HORUS vs ROS2 ROS2 is the industry standard for robotics middleware. It has a massive ecosystem, multi-machine networking, and 15 years of community packages. But its architecture — built around DDS, a distributed data service designed for enterprise networking — adds overhead that many single-machine robots don't need. HORUS makes the opposite trade-off: it strips out the network layer entirely and uses shared memory for all communication. The result is 100–500x faster IPC, deterministic execution, and a simpler development experience — at the cost of a smaller ecosystem and no built-in multi-machine networking. This page helps you decide which framework fits your project, or whether to use both. > Already using ROS2? 
See [Coming from ROS2](/learn/coming-from-ros2) for a migration guide with side-by-side code examples. ## Quick Summary | Aspect | HORUS | ROS2 | |--------|-------|------| | **IPC Latency** | ~85 ns (shared memory) | ~50–100 µs (DDS) | | **Architecture** | Tick-based, deterministic | Callback-based, event-driven | | **Real-Time** | Auto-detected from `.rate()` / `.budget()` | Manual DDS QoS configuration | | **Config Files** | 1 file (`horus.toml`) | 3+ files (package.xml, CMakeLists.txt, launch) | | **Languages** | Rust, Python | C++, Python | | **Ecosystem** | Growing (core + package registry) | Massive (thousands of packages) | | **Multi-Machine** | Not yet (single-machine) | Native (DDS network transport) | | **Visualization** | `horus monitor` (web + TUI) | RViz2, Foxglove, rqt | ## Performance ### Message Latency HORUS uses shared memory directly — no serialization, no kernel transitions, no DDS middleware layer. | Message Type | Size | HORUS | ROS2 (FastDDS) | Speedup | |-------------|------|-------|----------------|---------| | CmdVel | 16 B | ~85 ns | ~50 µs | **588x** | | IMU | 304 B | ~400 ns | ~55 µs | **138x** | | Odometry | 736 B | ~600 ns | ~60 µs | **100x** | | LaserScan | 1.5 KB | ~900 ns | ~70 µs | **78x** | | PointCloud (1K pts) | 12 KB | ~12 µs | ~150 µs | **13x** | | PointCloud (10K pts) | 120 KB | ~360 µs | ~800 µs | **2.2x** | The speedup is most dramatic for small, frequent messages (CmdVel, IMU) — exactly the messages that matter for tight control loops above 100 Hz. 
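The speedup column is the ratio of the two latencies, and the same figures show how much of a control cycle a single message consumes. The sketch below reproduces that arithmetic from the approximate numbers in the table; `Row`, `speedup`, and `cycle_percent` are illustrative names for this example, not part of the HORUS API.

```rust
/// One row of the latency table (approximate figures as published above).
struct Row {
    name: &'static str,
    horus_ns: f64,
    ros2_ns: f64,
}

/// Ratio of ROS2 latency to HORUS latency.
fn speedup(row: &Row) -> f64 {
    row.ros2_ns / row.horus_ns
}

/// Percentage of one control cycle at `rate_hz` consumed by a single message.
fn cycle_percent(latency_ns: f64, rate_hz: f64) -> f64 {
    let period_ns = 1e9 / rate_hz;
    100.0 * latency_ns / period_ns
}

fn main() {
    let rows = [
        Row { name: "CmdVel", horus_ns: 85.0, ros2_ns: 50_000.0 },
        Row { name: "IMU", horus_ns: 400.0, ros2_ns: 55_000.0 },
        Row { name: "Odometry", horus_ns: 600.0, ros2_ns: 60_000.0 },
        Row { name: "LaserScan", horus_ns: 900.0, ros2_ns: 70_000.0 },
    ];
    for row in &rows {
        // Share of a 1 kHz (1 ms) cycle used by one message on each side.
        println!(
            "{:<10} {:>6.1}x   HORUS {:.4}% vs ROS2 {:.2}% of a 1 kHz cycle",
            row.name,
            speedup(row),
            cycle_percent(row.horus_ns, 1000.0),
            cycle_percent(row.ros2_ns, 1000.0),
        );
    }
}
```

At 1 kHz the cycle is 1 ms, so a ~50 µs message takes about 5% of it, while a sub-microsecond message stays below 0.1%.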
### Throughput | Metric | HORUS | ROS2 | |--------|-------|------| | Small messages (16 B) | 2.7M msg/s | ~20K msg/s | | IMU messages (304 B) | 1.8M msg/s | ~18K msg/s | ### Real-Time Metrics | Metric | HORUS | ROS2 | |--------|-------|------| | Timing jitter | ±10 µs | ±100–500 µs | | Per-node overhead | <5 µs | ~50–200 µs per callback | | Deadline enforcement | Built-in (`.budget()`, `.deadline()`) | Manual (rmw QoS) | | Emergency stop response | <100 µs (Event node) | Application-dependent | ## Architecture ### HORUS: Dependency-Driven, Auto-Parallel The scheduler builds a dependency graph from topic `send()`/`recv()` metadata. Dependent nodes execute in causal order. Independent nodes run in parallel — automatically: ``` Scheduler tick (ready-dispatch): t=0ms: SafetyMonitor, SensorReader start (independent → parallel) t=0.1ms: SafetyMonitor done t=2ms: SensorReader done → Controller starts (depends on SensorReader) t=5ms: Controller done → Actuator starts (depends on Controller) ``` Controller always sees SensorReader's latest data. The dependency graph guarantees causal ordering. Independent nodes like SafetyMonitor and SensorReader run simultaneously — no wasted time waiting for each other. Two runs of the same code produce the same causal execution order. ### ROS2: Callback-Based, Event-Driven Callbacks fire when events arrive. Execution order depends on timing, message arrival, and executor implementation: ``` Executor spin: → timer callback fires (sensor) ← order depends on event timing → subscription callback fires (ctrl) ← may fire before or after sensor → timer callback fires (actuator) ← under load, may be delayed ``` Under load, callbacks can be delayed or reordered. Two runs of the same code may execute callbacks in different orders. For a motor controller that reads IMU data, this means the IMU reading might arrive before or after the control computation. 
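The ready-dispatch idea described under "HORUS: Dependency-Driven, Auto-Parallel" can be illustrated with a small topological grouping. This is a sketch of the general technique, not the HORUS scheduler's implementation: `NodeSpec`, `waves`, and the topic names are hypothetical, and each "wave" holds nodes whose producers have all finished, so nodes inside one wave could run in parallel.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical node description: the topics it sends to and receives from.
struct NodeSpec {
    name: &'static str,
    sends: &'static [&'static str],
    recvs: &'static [&'static str],
}

/// Group nodes into waves. A node joins a wave once every node that
/// produces one of its input topics has been placed in an earlier wave.
fn waves(nodes: &[NodeSpec]) -> Vec<Vec<&'static str>> {
    // topic -> nodes that publish it
    let mut producers: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for n in nodes {
        for &t in n.sends {
            producers.entry(t).or_default().push(n.name);
        }
    }
    // node -> nodes it depends on
    let mut deps: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for n in nodes {
        let entry = deps.entry(n.name).or_default();
        for t in n.recvs {
            if let Some(ps) = producers.get(t) {
                entry.extend(ps.iter().copied().filter(|p| *p != n.name));
            }
        }
    }
    let mut result = Vec::new();
    let mut done: BTreeSet<&'static str> = BTreeSet::new();
    while done.len() < deps.len() {
        let ready: Vec<&'static str> = deps
            .iter()
            .filter(|(name, d)| !done.contains(*name) && d.is_subset(&done))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            break; // dependency cycle: the remaining nodes cannot be ordered
        }
        done.extend(ready.iter().copied());
        result.push(ready);
    }
    result
}

/// Example graph mirroring the timeline above (topic names are made up).
fn example_nodes() -> Vec<NodeSpec> {
    vec![
        NodeSpec { name: "SafetyMonitor", sends: &["safety.status"], recvs: &[] },
        NodeSpec { name: "SensorReader", sends: &["sensor.data"], recvs: &[] },
        NodeSpec { name: "Controller", sends: &["motor.cmd"], recvs: &["sensor.data"] },
        NodeSpec { name: "Actuator", sends: &[], recvs: &["motor.cmd"] },
    ]
}

fn main() {
    for (i, wave) in waves(&example_nodes()).iter().enumerate() {
        println!("wave {i}: {wave:?}");
    }
}
```

Because the grouping depends only on the declared topics, two runs over the same graph give the same waves, which is the property the section contrasts with event-driven callbacks.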
## Developer Experience ### Project Setup ### Node Definition HORUS: 10 lines, no shared pointers, no bind, no QoS depth parameter. ### CLI Comparison | Task | HORUS | ROS2 | |------|-------|------| | Create project | `horus new my_robot` | `ros2 pkg create my_robot` + edit CMakeLists.txt | | Build | `horus build` | `colcon build` | | Run | `horus run` | `ros2 run my_robot my_node` | | List topics | `horus topic list` | `ros2 topic list` | | Echo topic | `horus topic echo velocity` | `ros2 topic echo /velocity` | | Monitor | `horus monitor` | `rqt` (separate install) | | Add dependency | `horus add serde` | Edit package.xml + CMakeLists.txt + rosdep install | ## Feature Comparison | Feature | HORUS | ROS2 | |---------|-------|------| | **Pub/Sub Topics** | Shared memory, 10 auto-selected backends | DDS middleware | | **Services (RPC)** | Beta | Yes | | **Actions (long-running)** | Beta | Yes | | **Transform Frames** | TransformFrame (built-in) | tf2 | | **Recording/Replay** | Built-in Record/Replay | rosbag2 | | **Monitoring** | Web + TUI (built-in) | rqt, Foxglove (separate) | | **Launch System** | YAML launch files | Python/XML/YAML launch | | **Package Manager** | horus registry | rosdep, bloom | | **[Deterministic Mode](/advanced/deterministic-mode)** | SimClock + dependency graph | Partial (use_sim_time) | | **[Safety Monitor](/advanced/safety-monitor)** | Built-in (watchdog, graduated degradation) | Application-level | | **[Deadline Enforcement](/concepts/real-time)** | Built-in (`.budget()`, `.deadline()`) | Manual (rmw QoS) | | **Multi-Machine** | Not yet | DDS discovery | | **3D Visualization** | Not yet | RViz2 | | **Simulation** | horus-sim3d (Bevy + native solver) | Gazebo, Isaac Sim | | **Message IDL** | Rust structs with derives | .msg/.srv/.action files | ## Safety & Failure Handling In ROS2, if a node hangs or misses its deadline, nothing happens — the robot keeps running on its last command. 
DDS has `LIVELINESS` and `DEADLINE` QoS policies, but they only notify you; they don't take corrective action. Building equivalent safety requires writing a custom lifecycle manager and health monitoring node — hundreds of lines that every team writes differently (or skips entirely). HORUS has this built into the scheduler: a [graduated watchdog](/advanced/safety-monitor) detects frozen nodes (warn → skip → isolate → safe state), [deadline miss policies](/advanced/safety-monitor) control what happens when a node overruns its budget (`Miss::Warn`, `Skip`, `SafeMode`, `Stop`), and [fault tolerance](/advanced/circuit-breaker) handles node crashes with automatic restart, skip, or fatal policies. See [Safety Monitor](/advanced/safety-monitor) for the full reference with configuration and code examples. ## When to Use Each ### Choose HORUS when: - Sub-microsecond IPC latency matters (control loops > 100 Hz) - Deterministic execution order is required (safety-critical systems) - You want a single-file project config - Your robot runs on a single machine - You prefer Rust's safety guarantees - You're starting a new project (no ROS2 migration debt) ### Choose ROS2 when: - You need multi-machine communication (distributed robots, fleet management) - You need RViz2 for 3D visualization - You depend on specific ROS2 packages (MoveIt2, Nav2, SLAM Toolbox) - Your team already has ROS2 expertise - You need the larger ecosystem (drivers, integrations, community support) ### Use both: - HORUS for real-time control on the robot (sensors, actuators, safety) - ROS2 for high-level planning, visualization, fleet management on separate machines - A bridge node translates between DDS topics and HORUS topics at the boundary ## Migration Path HORUS and ROS2 can coexist. Common strategies: 1. **Start new subsystems in HORUS** — Keep existing ROS2 for high-level planning, add HORUS for real-time control 2. 
**Bridge approach** — Run a bridge node that translates between DDS topics and HORUS topics
3. **Full migration** — Replace ROS2 nodes one-by-one with HORUS equivalents

See [Coming from ROS2](/learn/coming-from-ros2) for detailed migration guidance with side-by-side code examples.

## See Also

- [Why HORUS?](/learn/why-horus) — Motivation and design philosophy
- [Coming from ROS2](/learn/coming-from-ros2) — Migration guide with concept mapping
- [Benchmarks](/performance/benchmarks) — Full performance data
- [Installation](/getting-started/installation) — Get started in 5 minutes

---

## Why HORUS?

Path: /learn/why-horus
Description: Why robotics teams choose HORUS over traditional middleware — shared-memory IPC, deterministic scheduling, auto-detected real-time, and single-file config

# Why HORUS?

You're building a robot. Maybe it's a warehouse AGV, a surgical arm, a drone, or a research platform. You need software that reads sensors, computes control signals, and drives actuators — all on a single computer, all in real time.

The conventional answer is ROS2. But within a week, you're debugging DDS discovery, tuning QoS profiles, writing three config files per package, and wondering why your 1 kHz motor controller has 100 µs jitter. You're spending more time fighting the framework than building the robot.

HORUS exists because most robots don't need a distributed middleware stack. They need fast, deterministic communication between components on the same machine — and a framework that gets out of the way.

## What HORUS Does Differently

### Shared Memory IPC — Up to 588x Faster

Traditional robotics middleware (ROS2/DDS) serializes messages, pushes them through a network stack, and deserializes on the other end — even when sender and receiver are on the same machine.

HORUS skips all of that. Topics use shared memory: the publisher writes data once, the subscriber reads from the same address. No copies, no serialization, no kernel transitions.
| Message | HORUS | ROS2 (DDS) | Speedup |
|---------|-------|------------|---------|
| Motor command (16 B) | ~85 ns | ~50 µs | **588x** |
| IMU reading (304 B) | ~400 ns | ~55 µs | **138x** |
| LiDAR scan (1.5 KB) | ~900 ns | ~70 µs | **78x** |
| Point cloud (12 KB) | ~12 µs | ~150 µs | **13x** |

Measured on Intel i9-14900K. See [Benchmarks](/performance/benchmarks) for full methodology.

The speedup matters most for small, frequent messages — exactly the CmdVel and IMU messages that drive tight control loops. At 1 kHz the cycle is 1 ms, so a 50 µs DDS message eats 5% of every cycle. An 85 ns HORUS message takes under 0.01%.

### Deterministic Execution — No Race Conditions

HORUS runs nodes in a guaranteed order every tick:

```rust
// simplified
scheduler.add(SafetyMonitor::new()?).order(0).build()?; // Always first
scheduler.add(SensorReader::new()?).order(1).build()?;  // Always second
scheduler.add(Controller::new()?).order(2).build()?;    // Always third
scheduler.add(Actuator::new()?).order(3).build()?;      // Always last
```

No callback scheduling surprises. No mutex deadlocks. The safety monitor always runs before the actuator — every tick, guaranteed. Two runs of the same code produce the same execution order. For safety-critical systems, this is not optional.

In ROS2, callbacks fire when events arrive. Execution order depends on timing, message arrival, and executor implementation. Under load, callbacks can be delayed or reordered. Two runs of the same code may execute callbacks in different orders.

### Auto-Detected Real-Time

Set a rate or budget, and HORUS automatically enables RT features — dedicated thread, budget enforcement, deadline monitoring:

```rust
// simplified
scheduler.add(MotorNode::new()?)
    .order(0)
    .rate(1000_u64.hz())      // 1 kHz → auto-enables RT, derives budget + deadline
    .on_miss(Miss::SafeMode)  // Enter safe state if tick takes too long
    .build()?;
```

No DDS QoS tuning. No rmw configuration files. No manual thread priority management.
Declare your timing requirements and HORUS handles the rest. ### Single-File Config One `horus.toml` replaces CMakeLists.txt, package.xml, and launch files: ```toml [package] name = "warehouse-robot" version = "1.0.0" [dependencies] nalgebra = "0.32" [scripts] start = "horus run --release" test = "horus test --parallel" ``` ### Built-in Safety Safety features are part of the scheduler — not bolted on after the fact: - **[Watchdog](/advanced/safety-monitor)**: Detects frozen nodes with graduated degradation (warn → skip → isolate) - **[Deadline enforcement](/concepts/real-time)**: `.budget()` and `.deadline()` are first-class scheduler features - **[Safe state](/advanced/safety-monitor#enter-safe-state)**: Every node can implement `enter_safe_state()` — stop motors, close valves - **[Emergency stop](/recipes/emergency-stop)**: Event-driven nodes react in microseconds via `.on("emergency.stop")` - **[BlackBox](/advanced/blackbox)**: Flight recorder for post-mortem crash analysis - **[Fault tolerance](/advanced/circuit-breaker)**: Per-node failure policies — restart, skip, or fatal - **[Record & Replay](/advanced/record-replay)**: Tick-perfect replay for reproducing field bugs ### Rust + Python — Your Choice Same framework, same topics, same scheduler. Mix Rust and Python nodes in the same application. Use Python for prototyping and ML, Rust for production control — or both simultaneously. 
## Who Uses HORUS - **Research labs** prototyping new robot behaviors (Python for quick iteration) - **Startups** building production robots (Rust safety + performance) - **Control engineers** who need deterministic timing (auto-RT, deadline enforcement) - **Teams migrating from ROS2** who want simpler tooling without sacrificing capability - **Solo developers** who want to build a robot without weeks of framework setup ## What HORUS Is NOT (Yet) Being honest about current limitations: | Limitation | Impact | Workaround | |-----------|--------|------------| | **Single-machine only** | No distributed multi-robot communication | Use ROS2 for fleet management, HORUS for on-robot control | | **No RViz equivalent** | No 3D visualization of robot state | `horus monitor` shows nodes/topics/metrics; use Foxglove for 3D | | **Smaller ecosystem** | Fewer ready-made packages than ROS2's 15-year library | Growing registry; HORUS packages + Rust crate ecosystem | | **No ROS2 bag compatibility** | Can't replay existing rosbag2 files | HORUS has its own recording/replay system | These are active development areas. The core framework is production-ready; the ecosystem is growing. 
## Get Started ```bash curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash horus new my-robot cd my-robot && horus run ``` ## See Also - [HORUS vs ROS2](/learn/vs-ros2) — Detailed technical comparison - [Coming from ROS2](/learn/coming-from-ros2) — Migration guide with concept mapping - [Installation](/getting-started/installation) — Install in 5 minutes - [Quick Start](/getting-started/quick-start) — Build your first robot - [Architecture](/concepts/architecture) — System design overview - [Benchmarks](/performance/benchmarks) — Full performance data --- ## Coming from ROS2 Path: /learn/coming-from-ros2 Description: A migration guide for ROS2 developers — concept mapping, architecture differences, and side-by-side code comparisons # Coming from ROS2 If you have experience with ROS2, you already know most of the concepts in HORUS. This guide maps what you know to how HORUS does it, highlights the architectural differences, and shows code side-by-side. ## Concept Mapping | ROS2 | HORUS | Notes | |------|-------|-------| | Node | `Node` trait | Same concept. 
Implement `tick()` instead of callbacks |
| Publisher / Subscriber | `Topic` (send/recv) | Named channels, zero-copy via SHM |
| Service | `Service` | Request/response, same pattern |
| Action | `Action` | Long-running tasks with feedback |
| tf2 | `TransformFrame` | `tf` / `tf_static` topics, tree lookups |
| Parameter Server | `RuntimeParams` | Per-node typed parameters |
| Launch file | `Scheduler` | Single process, all nodes in one scheduler |
| rqt / Foxglove | `Monitor` | Built-in web dashboard + TUI |
| rosbag | `Record` / `Replay` | Topic recording and playback |
| QoS profiles | — | Not yet available |
| Lifecycle node | `Node` trait | `init()` / `shutdown()` methods on every node |
| DDS middleware | SHM IPC | No middleware layer, sub-microsecond latency |
| `colcon build` | `horus build` | Single manifest (`horus.toml`), no CMake |
| `ros2 topic echo` | `horus topic echo` | Same idea, different CLI |

## Architecture Differences

### ROS2: Multi-Process, Callback-Based

In ROS2, each node is typically its own OS process. Nodes communicate over DDS (a network middleware), and you write callbacks that fire when messages arrive. Launch files coordinate which processes to start.

*Figure: ROS2: Each node is a separate process, communicating over DDS middleware. Node A (pid 1), Node B (pid 2), and Node C (pid 3) are linked in a chain over DDS.*

### HORUS: Single-Process, Tick-Based

In HORUS, all nodes live in one process. The scheduler calls each node's `tick()` in a deterministic order every cycle. Nodes communicate through shared-memory topics with zero-copy reads.

*Figure: HORUS: All nodes in one process, deterministic tick order, zero-copy SHM. Node A `tick()` runs before Node B `tick()`, which runs before Node C `tick()`; all three attach to Shared Memory Topics.*

### Why Tick-Based Matters for Real-Time

| Property | ROS2 Callbacks | HORUS Ticks |
|----------|---------------|-------------|
| Execution order | Non-deterministic | Deterministic (`.order()`) |
| Timing jitter | Depends on DDS, OS scheduling | Bounded by scheduler budget |
| Deadline enforcement | Manual (timers) | Built-in (`.deadline()`, `.on_miss()`) |
| Thread safety | You manage mutexes | Single-threaded tick, no locks needed |
| Latency | Microseconds to milliseconds (DDS) | Sub-microsecond (SHM) |

### Cross-Process Communication

HORUS nodes can still talk across processes. SHM topics are visible to any process on the same machine. You simply run two schedulers that share the same topic names — no DDS required.

## Code Comparison

Here is the same motor controller node in ROS2 C++ and HORUS (Rust and Python).

### ROS2 C++

```cpp
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/imu.hpp>
#include <geometry_msgs/msg/twist.hpp>

using namespace std::chrono_literals;  // enables the 10ms literal below

class MotorNode : public rclcpp::Node {
public:
  MotorNode() : Node("motor") {
    // Cache the latest IMU sample whenever one arrives.
    sub_ = create_subscription<sensor_msgs::msg::Imu>(
        "imu", 10,
        [this](sensor_msgs::msg::Imu::SharedPtr msg) { last_imu_ = *msg; });
    pub_ = create_publisher<geometry_msgs::msg::Twist>("cmd_vel", 10);
    // Run the control step every 10 ms (100 Hz).
    timer_ = create_wall_timer(10ms, [this]() { tick(); });
  }

private:
  void tick() {
    geometry_msgs::msg::Twist cmd;
    cmd.linear.x = compute_speed(last_imu_);  // compute_speed: application-defined helper, not shown
    pub_->publish(cmd);
  }

  rclcpp::Subscription<sensor_msgs::msg::Imu>::SharedPtr sub_;
  rclcpp::Publisher<geometry_msgs::msg::Twist>::SharedPtr pub_;
  rclcpp::TimerBase::SharedPtr timer_;
  sensor_msgs::msg::Imu last_imu_;
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MotorNode>());
}
```

### HORUS

**Key differences from ROS2:**

- No callback boilerplate -- `tick()` reads and writes directly
- Rate is set on the scheduler, not via a timer
- No `SharedPtr`, no mutex -- the scheduler guarantees single-threaded access
- `Scheduler::run()` / `horus.run()` replaces `rclcpp::spin()`
- Python uses
`node.recv("topic")` and `node.send("topic", data)` instead of typed subscribers ## Message Type Mapping | ROS2 Message | HORUS Type | Module | |-------------|-----------|--------| | `sensor_msgs/Imu` | `Imu` | `horus::prelude` | | `sensor_msgs/LaserScan` | `LaserScan` | `horus::prelude` | | `sensor_msgs/Image` | `Image` | `horus::memory` | | `sensor_msgs/JointState` | `JointState` | `horus::prelude` | | `sensor_msgs/PointCloud2` | `PointCloud` | `horus::memory` | | `geometry_msgs/Twist` | `Twist` | `horus::prelude` | | `geometry_msgs/Pose` | `Pose3D` | `horus::prelude` | | `geometry_msgs/Transform` | `TFMessage` | `horus::transform_frame` | | `nav_msgs/Odometry` | `Odometry` | `horus::prelude` | | `std_msgs/String` | `String` | Rust stdlib | | `std_msgs/Bool` | `bool` | Rust stdlib | | `std_msgs/Float64` | `f64` | Rust stdlib | ## What Happens When a Node Misbehaves In ROS2, if a node hangs, nothing detects it — the robot keeps running on its last commanded velocity. There is no built-in watchdog or deadline enforcement. In HORUS, the scheduler monitors every node with a [graduated watchdog](/advanced/safety-monitor). If a node stops completing ticks on time, the system automatically warns, skips the node, and eventually calls `enter_safe_state()` to stop motors and apply brakes — all without any other node being affected. See [Safety Monitor](/advanced/safety-monitor) for the full reference with configuration, timeout guidelines, and code examples. ## What HORUS Adds Over ROS2 **Zero-copy SHM.** Topics use shared memory by default. Readers get a direct pointer to the data — no serialization, no copy. This gives sub-microsecond publish-to-read latency. **Deterministic mode.** The scheduler can run in lockstep with a simulation clock. Every tick produces identical results given the same inputs. This is critical for sim-to-real transfer. **Built-in safety monitor.** Every node has a watchdog. 
If a node exceeds its deadline, the scheduler can warn, skip the node, reduce its rate, or trigger a safe-state shutdown — all configured per-node via `.on_miss()`. **Auto-RT detection.** Set `.rate()` or `.budget()` on a node and HORUS automatically classifies it as real-time. No need to manually configure thread priorities or scheduling policies. **Single-file configuration.** One `horus.toml` replaces `package.xml`, `CMakeLists.txt`, `setup.py`, and launch files. Dependencies, scripts, and node configuration all live in one place. ## What HORUS Doesn't Have Yet **Multi-machine networking.** HORUS currently runs on a single machine. SHM topics do not cross network boundaries. For multi-machine setups, you would need a custom bridge. **Visualization (rviz equivalent).** There is no 3D visualization tool like rviz. The Monitor provides metrics dashboards but not scene rendering. **Bag file format.** Record/Replay works but uses an internal format. There is no equivalent to the rosbag2 format or interoperability with ROS2 bags. **QoS profiles.** There is no quality-of-service configuration for topics (reliability, durability, history depth). Topics are currently best-effort with configurable buffer sizes. **Ecosystem breadth.** ROS2 has thousands of community packages. HORUS is younger and has a smaller library of pre-built drivers and algorithms. Check the [HORUS Registry](https://registry.horusrobotics.dev) for available packages. ## Migration Checklist If you are porting a ROS2 project to HORUS: 1. **Map your nodes.** Each ROS2 node becomes a struct implementing the `Node` trait 2. **Replace callbacks with `tick()`.** Read all inputs at the top of `tick()`, compute, then publish outputs 3. **Convert message types.** Use the mapping table above. Custom messages become Rust structs 4. **Replace launch files.** Build your scheduler in `main()` with `.add()` calls 5. **Replace `package.xml` + `CMakeLists.txt`.** Write one `horus.toml` 6. 
**Replace tf2 with TransformFrame.** Same tree semantics, publish to `tf` / `tf_static` topics 7. **Test with `tick_once()`.** HORUS supports single-tick execution for deterministic unit tests ## Design Decisions **Why tick() instead of callbacks?** ROS2 callbacks fire when messages arrive — the order depends on timing, executor implementation, and system load. In HORUS, the scheduler calls `tick()` on every node in a fixed order every cycle. This means you always know that the sensor node ran before the controller, and the controller ran before the actuator. For safety-critical systems, deterministic ordering eliminates race conditions that only appear under load. **Why single-process instead of multi-process?** ROS2's multi-process model uses DDS for inter-process communication, adding serialization and kernel transitions (~50 µs per message). HORUS's single-process model uses in-process ring buffers (~3–36 ns). For robots where all software runs on one computer, the multi-process overhead is pure waste. When you do need process isolation (fault boundaries), HORUS supports cross-process topics transparently — same API, same code, just separate processes. **Why `horus.toml` instead of `package.xml` + `CMakeLists.txt`?** ROS2 inherits its build system from catkin/ament, which requires separate files for package metadata (package.xml), build instructions (CMakeLists.txt), Python setup (setup.py), and node launch (launch.py). HORUS collapses all of this into one TOML file. Adding a dependency is one line, not three edits across three files. 
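The "read inputs, compute, publish" shape from the migration checklist, and the single-tick testing it enables, can be sketched without any framework. The example below is an assumption-laden illustration: it uses `std::sync::mpsc` channels as stand-ins for topics, and `Imu`, `CmdVel`, and `MotorNode` are local placeholder types rather than the HORUS ones.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Placeholder message types for this sketch (not the HORUS definitions).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Imu {
    pitch_rad: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct CmdVel {
    linear: f32,
    angular: f32,
}

/// A node written in the tick style: no callbacks, all work happens in `tick()`.
struct MotorNode {
    imu_in: Receiver<Imu>,   // stand-in for a subscribed topic
    cmd_out: Sender<CmdVel>, // stand-in for a published topic
    last_imu: Option<Imu>,
}

impl MotorNode {
    fn tick(&mut self) {
        // 1. Read all inputs first, keeping only the newest sample.
        while let Ok(imu) = self.imu_in.try_recv() {
            self.last_imu = Some(imu);
        }
        // 2. Compute: drive forward only when a sample exists and the pitch is small.
        let linear = match self.last_imu {
            Some(imu) if imu.pitch_rad.abs() < 0.2 => 0.3,
            _ => 0.0,
        };
        // 3. Publish outputs last; a closed channel is ignored in this sketch.
        let _ = self.cmd_out.send(CmdVel { linear, angular: 0.0 });
    }
}

fn main() {
    let (imu_tx, imu_rx) = channel();
    let (cmd_tx, cmd_rx) = channel();
    let mut node = MotorNode { imu_in: imu_rx, cmd_out: cmd_tx, last_imu: None };

    node.tick(); // no IMU data yet
    imu_tx.send(Imu { pitch_rad: 0.05 }).unwrap();
    node.tick(); // one sample available

    for cmd in cmd_rx.try_iter() {
        println!("{cmd:?}");
    }
}
```

Because each call to `tick()` is an ordinary function call with explicit inputs and outputs, a unit test can feed one sample, tick once, and assert on the published command without timers or threads.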
## See Also - [HORUS vs ROS2](/learn/vs-ros2) — Detailed feature comparison with benchmarks - [Why HORUS?](/learn/why-horus) — Motivation and design philosophy - [Quick Start](/getting-started/quick-start) — Build your first HORUS application - [Architecture](/concepts/architecture) — System design overview - [CLI Reference](/development/cli-reference) — HORUS CLI commands (mapped from ros2 CLI) --- ## Learn HORUS Path: /learn Description: Understand the core concepts behind HORUS — from nodes and topics to real-time control and safety # Learn HORUS This section explains **what HORUS is and why it works the way it does**. Read these pages to build a mental model before writing code. ## Start Here | If you're... | Read this first | |-------------|----------------| | New to robotics | [What is HORUS?](/concepts/what-is-horus) → [Architecture](/concepts/architecture) | | New to real-time systems | [Real-Time for Robotics](/concepts/real-time) | | Coming from ROS2 | [Coming from ROS2](/learn/coming-from-ros2) | | Ready to build | Jump to [Tutorials](/tutorials) | ## Concepts 1. **[What is HORUS?](/concepts/what-is-horus)** — The problem HORUS solves and how it's different from ROS2 2. **[Architecture](/concepts/architecture)** — Nodes, topics, scheduler — how the pieces fit together 3. **[Real-Time for Robotics](/concepts/real-time)** — What "real-time" means, why robots need it, and when you don't 4. **[Communication Patterns](/concepts/communication-overview)** — Topics vs services vs actions — when to use which 5. **[Safety & Fault Tolerance](/advanced/circuit-breaker)** — Watchdogs, failure policies, and safe states 6. **[Coordinate Transforms](/concepts/transform-frame)** — Frame chains, interpolation, and why robots need transforms 7. 
**[Coming from ROS2](/learn/coming-from-ros2)** — Concept mapping, architecture differences, and migration guide --- ## See Also - [Getting Started](/getting-started/installation) — Install HORUS - [Tutorials](/tutorials) — Step-by-step learning path - [Core Concepts](/concepts) — In-depth concept explanations ======================================== # SECTION: Core Concepts ======================================== --- ## Nodes: The Building Blocks Path: /concepts/nodes-beginner Description: What HORUS nodes are and how they work — a 5-minute introduction # Nodes: The Building Blocks A robot arm picks parts off a conveyor belt. One piece of software reads the camera, another detects parts, another plans the arm's trajectory, and another sends motor commands. If any of these components shares memory or a call stack with the others, a bug in the camera driver can crash the motor controller — and the arm drops whatever it's holding. HORUS solves this with **nodes**: isolated components that each do one job. A camera node reads frames. A detection node finds parts. A planner node computes trajectories. A motor node sends commands. They communicate through shared-memory channels, but they don't share state. If one crashes, the others keep running — and the safety monitor stops the arm cleanly. > For the complete Node trait reference with all methods, see [Nodes — Full Reference](/concepts/core-concepts-nodes). ## How It Works ### What is a Node? A **node** is one component doing one job: - A **SensorNode** reads the camera or IMU - A **ControlNode** moves the motors - A **SafetyNode** prevents collisions - A **PlannerNode** decides where to go Every node implements the `Node` trait. The only required method is `tick()` — your main logic that runs every cycle: The scheduler calls `tick()` repeatedly — you don't manage loops, threads, or timing. ### How Nodes Communicate Nodes don't call each other directly. 
They send data through **Topics** — named channels:

*Figure: Nodes communicate through Topics, not direct calls. A publishing node sends to the `temperature` topic and `MonitorNode` receives from it.*

The sensor doesn't know the monitor exists. It just publishes data. Any number of subscribers can listen — zero coupling between components.

In Python, topics are declared via constructor kwargs:

```python
# Python: topics declared via constructor kwargs
node = horus.Node(
    pubs=[horus.CmdVel, "status"],  # typed + generic
    subs=[horus.LaserScan],         # typed
    tick=my_tick,
    rate=50
)
```

### Node Lifecycle

Every node has three phases:

*Figure: Node lifecycle: init once, tick repeatedly, shutdown once. The scheduler start enters `init`; when `init()` returns Ok the node moves to `tick`, which repeats every cycle; Ctrl+C or an error moves it to `shutdown`.*

| Phase | Method | When | Use for |
|-------|--------|------|---------|
| **Startup** | `init()` | Once, before first tick | Open files, connect to hardware |
| **Running** | `tick()` | Every scheduler cycle | Read sensors, compute, send commands |
| **Cleanup** | `shutdown()` | Once, on exit | Stop motors, close connections |

### Running Nodes

Nodes run inside a **Scheduler**. Add nodes, set their execution order, and run:

```rust
// simplified
use horus::prelude::*;

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    scheduler.add(SensorNode::new()?)
        .order(0) // runs first
        .build()?;

    scheduler.add(ControlNode::new()?)
        .order(1) // runs second
        .build()?;

    scheduler.run() // runs until Ctrl+C
}
```

`.order()` controls execution sequence — lower numbers run first. This ensures the sensor publishes data before the controller consumes it.

## Design Decisions

**Why isolated nodes instead of a single program?** A monolithic program shares one call stack. A panic in the camera driver kills everything — including the motor controller, which may leave the robot arm in a dangerous position.
Nodes provide fault boundaries: the scheduler can isolate a crashing node while the rest of the system continues and the safety monitor stops actuators cleanly. **Why tick() instead of run()?** A `run()` method gives the node full control — it can loop forever, block on I/O, or forget to check for shutdown signals. A `tick()` method gives the scheduler full control: it decides when to call each node, how long to allow, and when to force shutdown. This enables deterministic execution, deadline monitoring, and coordinated shutdown across all nodes. **Why communicate through Topics instead of direct calls?** Direct calls create tight coupling — the sensor must know the controller's API, and adding a logger means modifying the sensor. Topics decouple: the sensor publishes to `"temperature"` and doesn't know who reads it. Adding a logger is zero changes to existing code. ## Trade-offs | Gain | Cost | |------|------| | Fault isolation — one crash doesn't kill the system | Communication through Topics is indirect (nanoseconds, not zero) | | Testable in isolation — tick a node once and assert | More boilerplate than a function call | | Composable — mix and match nodes across projects | Nodes must agree on topic names and message types | | Deterministic execution order via scheduler | No direct function calls between nodes | ## See Also - [Nodes — Full Reference](/concepts/core-concepts-nodes) — All lifecycle methods, execution classes, safety features - [Topics: How Nodes Talk](/concepts/topics-beginner) — The pub/sub communication system - [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) — Execution and timing - [Scheduler API](/rust/api/scheduler) — Node configuration reference - [Sensor Node Tutorial](/tutorials/01-sensor-node) — Build your first node step by step --- ## What is HORUS? Path: /concepts/what-is-horus Description: A high-performance framework for building distributed systems with sub-microsecond IPC # What is HORUS? 
A surgical robot performs 200 corrections per second. Each correction reads force sensors, computes inverse kinematics, and commands six motors — all within 5 milliseconds. A single late message means the scalpel moves too far. A single dropped message means it doesn't move at all.

Traditional robotics middleware serializes every message, copies it through a kernel socket, and deserializes it on the other side. That round trip costs 50–100 microseconds — eating 1–2% of the budget for every message. In a system with 20 inter-node messages per cycle, middleware alone consumes 20–40% of the time budget before any application code runs.

HORUS eliminates this overhead. Publisher and subscriber read from the same memory region — no serialization, no copies, no kernel transitions. A `CmdVel` message reaches the motor controller in ~85 nanoseconds, leaving 99.98% of the time budget for actual computation. The framework is written in Rust, so the safety guarantees that protect against data races and use-after-free bugs come from the compiler, not from runtime checks.

## How It Works

HORUS applications are built from three primitives:

*[Diagram: node S sends to topic T1; node P receives from T1 and sends to topic T2; node M receives from T2. The Scheduler runs S at order 0, P at order 1, and M at order 2. Caption: Nodes communicate through shared-memory Topics, orchestrated by the Scheduler.]*

### Nodes

A **Node** is a component with a single responsibility. It implements the `Node` trait — only `tick()` is required:

```rust
// simplified
use horus::prelude::*;

struct MotorController {
    cmd_sub: Topic<CmdVel>,
}

impl Node for MotorController {
    fn name(&self) -> &str { "MotorController" }

    fn tick(&mut self) {
        if let Some(cmd) = self.cmd_sub.recv() {
            // Drive motors based on velocity command
        }
    }
}
```

The scheduler calls `init()` once at startup, `tick()` every cycle, and `shutdown()` on Ctrl+C. Nodes are isolated — a crash in one doesn't bring down the system.

### Topics

A **Topic** is a named shared-memory channel.
Multiple publishers can send, multiple subscribers can receive:

```rust
// simplified
let publisher: Topic<f32> = Topic::new("sensor.temperature")?;
publisher.send(25.0);

let subscriber: Topic<f32> = Topic::new("sensor.temperature")?;
if let Some(temp) = subscriber.recv() {
    println!("Got: {temp}");
}
```

Topic names use dots for hierarchy: `"sensor.imu.accel"`, `"motor.left_wheel"`. The type parameter (`<f32>`) ensures compile-time type safety.

### Scheduler

The **Scheduler** runs nodes in priority order each tick cycle. Order controls data flow: sensors publish before processors consume, processors publish before actuators consume. The scheduler also provides deadline monitoring, graduated watchdog, and coordinated shutdown.

Python and Rust nodes communicate through the same shared-memory Topics — mix languages freely within the same application.

## Key Concepts

| Concept | What it does | Learn more |
|---------|-------------|------------|
| **Node** | Self-contained component with `tick()` lifecycle | [Nodes](/concepts/core-concepts-nodes) |
| **Topic** | Named shared-memory pub/sub channel | [Topic](/concepts/core-concepts-topic) |
| **Scheduler** | Priority-based execution with deadline monitoring | [Scheduler](/concepts/core-concepts-scheduler) |
| **Execution Classes** | 5 executor types (RT, Compute, Event, AsyncIo, BestEffort) | [Execution Classes](/concepts/execution-classes) |
| **node! Macro** | Eliminates boilerplate for common node patterns | [node! Macro](/concepts/node-macro) |
| **Messages** | Typed structs (CmdVel, Imu, LaserScan, etc.) for zero-copy IPC | [Message Types](/concepts/message-types) |

## When to Use HORUS

**Good fit:**

- Multi-component systems where components communicate at high frequency
- Real-time control loops (motor control, flight control, haptics)
- Single-machine distributed systems (Raspberry Pi, Jetson, edge devices)
- Mixed-language applications (Rust performance + Python prototyping)

**Not a good fit:**

- Simple single-file scripts (HORUS adds unnecessary structure)
- Internet-scale distributed systems (use gRPC, Kafka, or message queues)
- CRUD web applications (use Axum, Actix, Django, Flask)
- Bare-metal embedded without an OS (use RTIC or Embassy)

## Performance

| Metric | Value |
|--------|-------|
| Same-thread latency | ~3 ns |
| Same-process latency | ~18–36 ns |
| Cross-process latency | ~50–167 ns |
| Small message throughput (<1 KB) | 2M+ msgs/sec |
| Framework memory overhead | ~2 MB |

For context, ROS2 DDS achieves ~50–100 microseconds for intra-machine messages — HORUS is 300–30,000x faster because data never leaves RAM.

## Design Decisions

**Why shared memory instead of network IPC?** Shared memory eliminates serialization, copying, and kernel transitions. A network-based framework like ROS2 pays ~50–100 us per message for DDS serialization + UDP. HORUS pays ~3–167 ns because the data never leaves RAM — publisher writes, subscriber reads from the same memory region.

**Why Rust?** Robotics code controls physical actuators. A null pointer dereference in a motor controller can damage hardware or injure people. Rust's ownership system prevents entire categories of bugs (use-after-free, data races, null pointers) at compile time. Python is supported via bindings for ease of use, but the runtime is Rust.

**Why a scheduler instead of independent processes?** A scheduler enables deterministic execution order (safety node always runs before motor node), deadline monitoring (detect if a node is too slow), and coordinated shutdown (all actuators go to safe state).
Independent processes cannot guarantee these properties without complex external coordination.

## Trade-offs

| Gain | Cost |
|------|------|
| Sub-microsecond IPC — no serialization overhead | Single-machine only — no built-in cross-network communication |
| Compile-time memory safety from Rust | Steeper learning curve for developers new to Rust |
| Deterministic execution order via scheduler | All nodes must run in the same scheduler (or use multi-process with shared topics) |
| Zero-copy communication for large payloads (images, point clouds) | Shared memory requires careful lifecycle management (handled by the framework) |
| Same API across Rust and Python | Python nodes pay a small overhead for the Rust FFI bridge |

## See Also

- [Quick Start](/getting-started/quick-start) — Build your first HORUS application
- [Why HORUS](/learn/why-horus) — Detailed motivation and design goals
- [vs ROS2](/learn/vs-ros2) — Side-by-side comparison with ROS2
- [Architecture](/concepts/architecture) — System architecture and internals

---

## Topics: How Nodes Talk

Path: /concepts/topics-beginner
Description: Understanding HORUS topics and pub/sub communication — a beginner's guide from first principles

# Topics: How Nodes Talk

Imagine a factory floor. A quality camera spots a defective part on the conveyor belt. It needs to tell the robot arm to reject the part, the logging system to record the defect, and the operator dashboard to flash a warning.

In a traditional program, the camera code would call the robot arm code directly — but then adding the logger means editing the camera. Adding the dashboard means editing it again. And if the robot arm code crashes, it takes the camera down with it.

This is the **coupling problem**: when components talk directly, every new connection requires changing existing code, and one failure cascades through the entire system.

HORUS solves this with **topics**: named channels that carry typed data. The camera publishes to `"quality.defect"`.
The robot arm, the logger, and the dashboard all subscribe to `"quality.defect"`. None of them know the others exist. Adding a new subscriber requires zero changes to existing code. If the dashboard crashes, the camera and robot arm keep running.

This pattern is called **publish/subscribe** (or **pub/sub**). Think of a radio station: the broadcaster (publisher) sends a signal, and any number of receivers (subscribers) tune in. The broadcaster doesn't know who's listening. Listeners don't know about each other. Anyone can start or stop listening at any time.

> For capacity tuning, performance optimization, communication patterns, and multi-process details, see [Topics — Full Reference](/concepts/core-concepts-topic).

## How It Works

### The Big Picture

*[Diagram: a publisher sends to the shared-memory topic `quality.defect`; `RobotArmNode`, `LoggerNode`, and `DashboardNode` each receive from it. Caption: A topic is a named channel — one publisher, any number of subscribers.]*

A **topic** is a named, typed communication channel. Under the hood it's backed by shared memory with a ring buffer — but you never interact with memory directly. You just call `send()` and `recv()`.

Here's what those terms mean if you've never encountered them:

- **Named**: Every topic has a string name like `"sensor.temperature"`. Two nodes that create a topic with the same name are automatically connected — no configuration, no broker, no server.
- **Typed**: A topic carries one type of data. A `Topic<f32>` carries floating-point numbers. A `Topic<CmdVel>` carries velocity commands. The compiler enforces this — you can't accidentally subscribe with the wrong type.
- **Shared memory**: Normally, sending data between two programs requires the operating system's kernel to copy bytes from one program's memory to the other. For a 6 MB camera image, that copy takes milliseconds. Shared memory is a region of RAM that both programs can read and write directly. The publisher writes the image once; the subscriber reads from the same address. No copies, no kernel involvement — just nanoseconds.
- **Ring buffer**: The shared memory is organized as a fixed-size circular buffer (imagine slots arranged in a circle). When all slots are full and a new message arrives, the oldest unread message is overwritten. This guarantees the system never runs out of memory and the publisher never blocks — but slow subscribers may miss messages.

### Your First Topic

Both nodes create a topic with the name `"sensor.temperature"`. That's all it takes — HORUS detects they should be connected and wires them together. No configuration files, no broker process, no network setup.

### Topic Naming

Topic names are simple strings.
Use **dots** for hierarchy:

- `"sensor.temperature"` — sensor readings
- `"camera.rgb"` — camera frames
- `"motor.cmd_vel"` — velocity commands
- `"robot.sensors.imu.raw"` — deeply nested namespaces

## Sending Data

HORUS provides three ways to send data, each for a different situation:

### `send()` — Fire and Forget

```rust
// simplified
self.publisher.send(CmdVel::new(1.0, 0.0));
```

`send()` **always succeeds**. It never blocks. It never returns an error. If the ring buffer is full (the subscriber hasn't read fast enough), the oldest unread message is silently overwritten.

**Why this is the default for robotics**: A motor controller runs at 1000 Hz — it expects a new velocity command every millisecond. If it falls 3 messages behind, you don't want it processing stale commands from 3 ms ago. You want it to skip ahead to the latest command. Overwriting old data is the correct behavior.

### `try_send()` — Send With Feedback

```rust
// simplified
match self.publisher.try_send(expensive_result) {
    Ok(()) => { /* message delivered to buffer */ }
    Err(returned_msg) => {
        // Buffer full — the message is returned to you (not consumed)
        hlog!(warn, "Buffer full, discarding result");
    }
}
```

`try_send()` attempts to publish. If the buffer is full, instead of overwriting the oldest message, it gives the message **back** to you. This lets you count drops, log warnings, or retry later.

**When to use**: When the message is expensive to create (e.g., a computed path plan) and you want to know if it was accepted.

### `send_blocking()` — Wait for Space

```rust
// simplified
use horus::prelude::*;

match self.publisher.send_blocking(critical_command, 10_u64.ms()) {
    Ok(()) => { /* delivered */ }
    Err(SendBlockingError::Timeout) => {
        hlog!(error, "Could not deliver command within 10ms!");
    }
}
```

`send_blocking()` waits until buffer space opens up, or until your timeout expires. It uses a graduated wait strategy to minimize latency: first it spins (sub-microsecond), then yields the thread (microseconds), then sleeps in 100 µs increments.

**When to use**: Logging pipelines, recording, non-real-time data transfer — anywhere brief blocking is acceptable and message loss is not.

### Choosing a Send Method

| Method | Blocks? | On buffer full | Best for |
|--------|---------|----------------|----------|
| `send()` | Never | Overwrites oldest | Control loops, sensor data, telemetry |
| `try_send()` | Never | Returns message to caller | Expensive messages, drop detection |
| `send_blocking()` | Up to timeout | Waits for space | Logging, recording, non-RT pipelines |

## Receiving Data

### `recv()` — Take the Next Message

```rust
// simplified
if let Some(msg) = self.subscriber.recv() {
    self.process(msg);
}
```

`recv()` returns the oldest unread message and **removes it from the buffer**. If nothing is available, it returns `None` — immediately, without blocking or waiting.

**Key detail**: Each call to `recv()` consumes one message. If a publisher sends 5 messages between your `tick()` calls, you need 5 `recv()` calls to get all of them. In practice, drain the buffer every tick:

```rust
// simplified
fn tick(&mut self) {
    // IMPORTANT: drain all pending messages every tick
    // If you only call recv() once, messages pile up and you process stale data
    while let Some(msg) = self.commands.recv() {
        self.latest_command = Some(msg);
    }

    // Now act on only the latest
    if let Some(cmd) = &self.latest_command {
        self.execute(cmd);
    }
}
```

### `read_latest()` — Peek at the Newest

```rust
// simplified
if let Some(pose) = self.position_sub.read_latest() {
    self.current_position = pose;
}
```

`read_latest()` skips all older messages and gives you only the newest one. Unlike `recv()`, it does **not** remove the message — calling it again returns the same value until a new message arrives.
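
The send and receive semantics described so far can be modelled with a small, std-only ring buffer. This is a minimal sketch, not the HORUS implementation: `SketchTopic` and all of its methods are hypothetical names for a single-process, single-subscriber model, and it assumes `read_latest()` returns nothing once the buffer has been fully drained, which the text above does not specify.

```rust
use std::collections::VecDeque;

/// Hypothetical in-process model of an overwrite-on-full topic buffer.
/// It only mimics the documented semantics; it is not shared memory.
struct SketchTopic<T> {
    slots: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T: Clone> SketchTopic<T> {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least one slot");
        Self { slots: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Modelled on `send()`: never fails; overwrites the oldest unread message when full.
    fn send(&mut self, msg: T) {
        if self.slots.len() == self.capacity {
            self.slots.pop_front();
            self.dropped += 1;
        }
        self.slots.push_back(msg);
    }

    /// Modelled on `try_send()`: hands the message back instead of overwriting.
    fn try_send(&mut self, msg: T) -> Result<(), T> {
        if self.slots.len() == self.capacity {
            Err(msg)
        } else {
            self.slots.push_back(msg);
            Ok(())
        }
    }

    /// Modelled on `recv()`: consumes the oldest unread message (FIFO).
    fn recv(&mut self) -> Option<T> {
        self.slots.pop_front()
    }

    /// Modelled on `read_latest()`: copies the newest message without consuming it.
    fn read_latest(&self) -> Option<T> {
        self.slots.back().cloned()
    }

    fn dropped_count(&self) -> u64 {
        self.dropped
    }
}

fn main() {
    let mut topic = SketchTopic::with_capacity(4);
    for i in 1..=6 {
        topic.send(i); // the last two sends overwrite messages 1 and 2
    }
    println!("dropped: {}", topic.dropped_count());
    println!("latest: {:?}", topic.read_latest());
    println!("try_send on a full buffer: {:?}", topic.try_send(7));
    while let Some(msg) = topic.recv() {
        println!("recv: {msg}");
    }
}
```

With four slots and six sends, the sketch records two drops, `read_latest()` keeps returning the newest value, and `recv()` starts from the oldest surviving message.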
**When to use**: State-like data where you only care about the current value — robot position, transform frames, configuration parameters.

### Choosing a Receive Method

| Method | Consumes? | Order | Best for |
|--------|-----------|-------|----------|
| `recv()` | Yes — removes from buffer | FIFO (oldest first) | Commands, events, data streams |
| `read_latest()` | No — stays in buffer | Latest only | State, poses, configuration |

## What Happens at the Edges

### No Subscribers?

`send()` still succeeds. The data goes into the ring buffer and waits. If a subscriber connects later, it reads whatever is still in the buffer. If the buffer fills up before anyone subscribes, older messages are overwritten — but nothing crashes or blocks.

### No Data Yet?

`recv()` returns `None`. It never blocks and never panics. Your node continues to the next tick:

```rust
// simplified
fn tick(&mut self) {
    match self.subscriber.recv() {
        Some(data) => self.process(data),
        None => {} // Nothing published yet — normal at startup
    }
}
```

This means your nodes are **startup-order independent**. It doesn't matter whether the publisher or subscriber starts first.

### Subscriber Too Slow?

If a publisher sends faster than a subscriber reads, the ring buffer fills up. The next `send()` overwrites the oldest unread message. You can detect this:

```rust
// simplified
fn tick(&mut self) {
    if self.subscriber.dropped_count() > 0 {
        hlog!(warn, "{} messages dropped on '{}' — subscriber too slow",
            self.subscriber.dropped_count(), self.subscriber.name());
    }
}
```

**Solutions**:

- **Increase buffer capacity**: `Topic::with_capacity("name", 16, None)` instead of the default 4 slots
- **Read faster**: Use `read_latest()` instead of `recv()` if you only need the newest value
- **Accept the drops**: For telemetry or logging, some dropped messages are often fine

### Monitoring Topics at Runtime

Use `horus monitor` to see all active topics, their publishers, subscribers, and message rates in real time:

```bash
horus monitor --tui  # Interactive terminal dashboard
```

## How Fast Is This?

Numbers without context are meaningless. Here's what topic latency means for a real robot:

A motor controller running at **1000 Hz** has a **1 millisecond** (1,000,000 nanoseconds) budget per cycle. The time spent delivering a velocity command via a topic:

| Scenario | Latency | % of 1 ms budget |
|----------|---------|-------------------|
| Same thread | ~3 ns | 0.0003% |
| Same process, 1 publisher + 1 subscriber | ~18 ns | 0.002% |
| Same process, many-to-many | ~36 ns | 0.004% |
| Cross-process (shared memory) | ~50–167 ns | 0.005–0.017% |

For comparison, a traditional message-passing system using sockets or pipes adds **50,000–100,000 ns** per message — eating 5–10% of the motor controller's budget. Topics use shared memory to eliminate serialization and kernel transitions, keeping overhead negligible even at kilohertz control rates.

## A Complete Example

A temperature monitor with publisher, subscriber, and scheduler — copy-paste and run:

## Design Decisions

**Why named channels instead of direct function calls?** If the camera node calls `robot_arm.reject_part()` directly, the camera must know about the robot arm's API. Adding a logger means editing the camera code. Adding a dashboard means editing it again.
With topics, the camera publishes to `"quality.defect"` and knows nothing about who's listening. Adding the logger, dashboard, or a dozen more subscribers requires zero changes to the camera. This is the same pattern used by ROS, MQTT, and Kafka — proven at scale across millions of deployments. **Why shared memory instead of sockets or pipes?** Sockets and pipes require the operating system's kernel to copy data between processes. Each message needs a `write()` system call on one side and a `read()` on the other, with the kernel copying bytes in between. This adds 50,000–100,000 nanoseconds per message. Shared memory puts data in a region of RAM that both processes can access directly — the publisher writes once and the subscriber reads from the same address. No copies, no kernel involvement. For a 6 MB camera image, this is the difference between waiting milliseconds (sockets) and waiting nanoseconds (shared memory). **Why a ring buffer that overwrites old data?** A robot's control loop cares about what's happening *now*, not what happened 10 milliseconds ago. If the motor controller falls behind, you want it to skip ahead to the latest velocity command, not process 10 stale commands in sequence. A ring buffer with overwrite semantics guarantees this: the latest data is always available, memory usage is bounded (no unbounded queues that eat all your RAM), and the publisher never blocks waiting for a slow subscriber. If you need guaranteed delivery (no overwrites), use `try_send()` to detect drops or `send_blocking()` to wait. **Why automatic backend selection?** During development, all your nodes typically run in one process — the fastest communication path. In production, you might split the camera and controller into separate processes for fault isolation. If you had to manually choose the communication backend (in-process ring buffer vs cross-process shared memory), you'd need to reconfigure every time you changed the deployment topology. 
HORUS detects whether publisher and subscriber are in the same thread, same process, or different processes, and automatically selects the fastest available path. Your code stays identical across all deployment configurations.

## Trade-offs

| Gain | Cost |
|------|------|
| **Zero coupling** — publishers don't know subscribers exist | Indirect communication — harder to trace data flow (use `horus monitor` to visualize) |
| **Zero-copy for large data** — images and point clouds transfer in nanoseconds | Fixed-size ring buffers — must choose capacity at creation (default: 4 slots) |
| **Automatic backend selection** — fastest path chosen for you | Cannot force a specific backend (the auto-selection is always optimal) |
| **Multi-subscriber fan-out** — one publisher serves any number of subscribers | Late subscribers miss messages published before they connected |
| **Never blocks** — `send()` always returns immediately | Slow subscribers lose old messages (detectable via `dropped_count()`) |

## See Also

- [Topics — Full Reference](/concepts/core-concepts-topic) — Capacity tuning, communication patterns, backend details, type-safe topic descriptors
- [Topic API](/rust/api/topic) — Complete API reference with every method signature
- [Python Topic API](/python/api/python-bindings) — Topic API from Python
- [Nodes: The Building Blocks](/concepts/nodes-beginner) — How nodes use topics
- [Sensor Node Tutorial](/tutorials/01-sensor-node) — Build a complete working example with topics
- [Communication Overview](/concepts/communication-overview) — When to use topics vs services vs actions

---

## horus.toml: Single Source of Truth

Path: /concepts/horus-toml
Description: Why horus.toml replaces Cargo.toml and pyproject.toml — one config file for all languages

# horus.toml: Single Source of Truth

Every HORUS project has one config file: `horus.toml`. It replaces the need for separate `Cargo.toml` and `pyproject.toml` files.
You declare your dependencies once, and HORUS generates everything else.

---

## The Problem

Traditional robotics projects need different config files for each language:

| Language | Config Files |
|----------|-------------|
| Rust | `Cargo.toml` + `Cargo.lock` |
| Python | `pyproject.toml` + `requirements.txt` |
| Mixed | All of the above |

A robot using Python for ML and Rust for control needs **3+ config files**. Adding a dependency means knowing which file to edit.

## The HORUS Solution

One file. All languages. All dependencies.

```toml
# horus.toml — the only config file you edit
[package]
name = "my-robot"
version = "0.1.0"

[robot]
name = "turtlebot"
description = "robot.urdf"  # Path to URDF file
simulator = "sim3d"         # Simulator plugin (default)

[dependencies]
# Rust (auto-detected as crates.io)
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
# Python (auto-detected as PyPI)
numpy = { version = ">=1.24", source = "pypi" }

[hardware]
lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true }
imu = { use = "bno055", bus = 1, sim = true }

[scripts]
sim = "horus sim start --world warehouse"
deploy = "horus deploy pi@robot --release"

[hooks]
pre_run = ["fmt", "lint"]  # Auto-format and lint before every run
```

When you run `horus build`, HORUS reads `horus.toml` and generates native build files:

- **Rust deps** → `.horus/Cargo.toml`
- **Python deps** → `.horus/pyproject.toml`

You never see or edit these generated files.

---

## The `.horus/` Directory

Every HORUS project has a `.horus/` directory containing generated files and build artifacts. It's gitignored and fully managed by horus.
```
my_project/
├── horus.toml           ← You edit this
├── src/
│   ├── main.rs
│   └── main.py
└── .horus/              ← Generated (don't touch)
    ├── Cargo.toml       ← From horus.toml Rust deps
    ├── pyproject.toml   ← From horus.toml Python deps
    ├── target/          ← Rust build artifacts
    └── packages/        ← Cached registry packages
```

If `.horus/` gets corrupted, just delete it:

```bash
horus clean --all  # Remove everything, regenerated on next build
```

---

## How It Works

*[Diagram: `horus.toml` feeds `cargo_gen` and `pyproject_gen`, which generate `.horus/Cargo.toml` and `.horus/pyproject.toml` respectively. Caption: horus.toml is the single source — native build files are generated into .horus/]*

| Command | What happens |
|---------|-------------|
| `horus add serde` | Detects crates.io → adds to horus.toml → regenerates .horus/Cargo.toml |
| `horus add numpy` | Detects PyPI → adds to horus.toml → regenerates .horus/pyproject.toml |
| `horus build` | Reads horus.toml → generates all build files → builds all languages |
| `horus test` | Runs cargo test + pytest (all from horus.toml) |
| `horus remove X` | Removes from horus.toml → regenerates affected build files |

---

## Comparison

| | Traditional | HORUS |
|---|---|---|
| **Config files** | 3+ per language | 1 (`horus.toml`) |
| **Add a dep** | Edit the right file, know the syntax | `horus add NAME` |
| **Build** | `cargo build` + `pip install` + ... | `horus build` |
| **Test** | `cargo test` + `pytest` | `horus test` |
| **Deploy** | Manual cross-compile scripts | `horus deploy pi@robot` |
| **Onboarding** | Learn 3 build systems | Learn 1 CLI |

---

## Workspace Projects

For projects that contain multiple crates (e.g., a driver library and a binary that uses it), `horus.toml` supports a `[workspace]` section. This lets you manage several crates under a single project root while keeping one unified config file.
```toml
# horus.toml — workspace with multiple crates
[package]
name = "my-robot"
version = "0.1.0"
type = "lib"  # "lib" for libraries, omit for binaries

[workspace]
members = [
    "crates/driver",
    "crates/controller",
    "crates/my-robot-bin",
]

[dependencies]
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
```

Each workspace member directory contains its own `horus.toml` with a `[package]` section. The root `horus.toml` defines shared dependencies and the member list. When you run `horus build`, all members are compiled together, and you can target a specific member with `horus build --package controller`.

The `type = "lib"` field in `[package]` marks a crate as a library (no binary entry point). This is useful for shared code that other workspace members depend on but that is not run directly.

For a complete guide on setting up and working with multi-crate workspaces, see **[Multi-Crate Workspaces](/development/workspaces)**.

---

## Next Steps

- **[Configuration Reference](/package-management/configuration)** — Full field reference for horus.toml
- **[Quick Start](/getting-started/quick-start)** — Build your first project
- **[CLI Reference](/development/cli-reference)** — All horus commands

---

## All Sections

| Section | Purpose |
|---------|---------|
| `[package]` | Project name, version, metadata |
| `[robot]` | Robot name, URDF path, simulator selection |
| `[dependencies]` | Project deps from any source (crates.io, PyPI, system, registry) |
| `[dev-dependencies]` | Test/development-only dependencies |
| `[hardware]` | Hardware node configuration (`use` field + params, `sim` for simulation swap) |
| `[scripts]` | Custom project commands |
| `[hooks]` | Pre/post action hooks (pre_run, pre_build, pre_test, post_test) |
| `[ignore]` | File/directory exclusion patterns |
| `enable` | Capability flags (cuda, gpio, etc.) |
| `[cpp]` | C++ build configuration |
| `[workspace]` | Multi-crate workspace |

See [Configuration Reference](/package-management/configuration) for complete field-level documentation.

---

## Design Decisions

### Why TOML Instead of YAML

YAML is common in robotics (ROS2 launch files, parameter files), but it has well-documented pitfalls for configuration: implicit type coercion (`yes` becomes `true`, `3.10` becomes `3.1`), significant whitespace that causes silent errors, and multiple ways to express the same thing. TOML has an unambiguous grammar — every value has an explicit type, indentation is not significant, and there is one canonical way to write each structure. HORUS still uses YAML for launch configs and parameter files where its flexibility is useful, but the project manifest uses TOML because dependency versions and package metadata must be parsed unambiguously.

### Why One File Instead of Per-Language Configs

A robot project using Rust for control and Python for ML traditionally needs `Cargo.toml`, `pyproject.toml`, and possibly `requirements.txt` — each with different syntax, different dependency resolution, and different tooling. When a team member adds a dependency, they need to know which file to edit. `horus.toml` is the single source of truth: all dependencies are declared once with an explicit `source` field (`crates.io`, `pypi`, `system`, `git`, `path`). HORUS generates the native build files (`Cargo.toml`, `pyproject.toml`) into `.horus/` automatically. One file to learn, one file to review in PRs, one file to validate in CI.

### Why Generated Build Files in .horus/ Directory

Native build tools (cargo, pip) need their own config files to function. Rather than forcing users to maintain both `horus.toml` and `Cargo.toml` in sync, HORUS generates the native files into a `.horus/` directory that is gitignored and fully managed. Users never edit these files.
This means `cargo build` and `pip install` still work under the hood with their standard tooling — HORUS does not replace build systems, it generates their input. If `.horus/` gets corrupted, `horus clean --all` deletes it and the next build regenerates everything from `horus.toml`.

## Trade-offs

| Area | Benefit | Cost |
|------|---------|------|
| **TOML format** | Unambiguous parsing; no implicit type coercion; explicit types for dependency versions | Less familiar to teams coming from YAML-heavy ROS2 workflows |
| **Single manifest** | One file to learn, edit, and review; `horus add` works for any language | Must specify `source` for non-default package registries (e.g., `source = "pypi"` for Python deps) |
| **Generated build files** | Native tooling (cargo, pip) works unchanged; no custom build system to learn | Cannot use `Cargo.toml` features not exposed by `horus.toml` without escape hatches; generated files must not be edited |
| **`.horus/` directory** | Clean project root; generated artifacts are gitignored and disposable | Extra directory to understand; new contributors may be confused by the absence of `Cargo.toml` in the project root |
| **`horus add/remove` CLI** | No need to manually edit TOML or know per-language syntax | Dependency resolution depends on HORUS tooling — `cargo add` and `pip install` do not update `horus.toml` |
| **Workspace support** | Multi-crate projects with shared dependencies and a single root config | Workspace members still need their own `horus.toml` with `[package]` — not fully flat |

## See Also

- [Package Management](/package-management/package-management) — Dependencies and lockfile
- [CLI Reference](/development/cli-reference) — `horus build`, `horus add`, `horus run`
- [Choosing a Language](/getting-started/choosing-language) — Rust vs Python project setup

---

## Scheduler: Running Your Nodes

Path: /concepts/scheduler-beginner
Description: How the HORUS scheduler executes your nodes — a beginner's guide from first principles

# Scheduler: Running Your Nodes

A robot has a camera reading frames, a controller computing motor commands, a safety system watching for collisions, and a logger recording everything. Each of these is a separate node. But who decides which one runs first? Who makes sure the safety check happens before the motor command? Who handles it when the camera takes too long? And who stops all the motors cleanly when you press Ctrl+C?

You could write all of this coordination yourself — loops, threads, timers, signal handlers — but you'd be writing a scheduler. Every robotics team eventually builds one, and most get it wrong. Race conditions, priority inversions, missed deadlines, and unclean shutdowns are the norm.

HORUS gives you a **scheduler** that handles all of this. You tell it which nodes to run, in what order, and at what speed. It handles the rest: timing, ordering, monitoring, and shutdown.

> For the full reference with real-time configuration, watchdog, deadline monitoring, composable builders, and deterministic mode, see [Scheduler — Full Reference](/concepts/core-concepts-scheduler).

## How It Works

### What Is the Scheduler?

The **scheduler** is the engine that runs your nodes. It does three things:

1. **Calls `init()` on every node** — once, at startup. This is where nodes connect to hardware, open files, or set up state.
2. **Calls `tick()` on every node** — repeatedly, in order, at a configurable speed. This is your main logic.
3. **Calls `shutdown()` on every node** — once, when the program exits. This is where nodes stop motors, close connections, and clean up.

You don't write loops or manage threads. You add nodes, configure their order and timing, and let the scheduler handle everything.
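
The three scheduler phases listed above can be sketched in plain Rust without any HORUS types. This is an illustrative model only: `SketchNode`, `Named`, and `run_lifecycle` are hypothetical names, the loop runs for a fixed number of cycles instead of waiting for Ctrl+C, and timing, deadlines, and error handling are left out.

```rust
/// Hypothetical stand-in for a node. This is not the HORUS `Node` trait.
trait SketchNode {
    fn name(&self) -> &str;

    /// Called once at startup.
    fn init(&mut self, log: &mut Vec<String>) {
        log.push(format!("init {}", self.name()));
    }

    /// Called once per cycle.
    fn tick(&mut self, log: &mut Vec<String>);

    /// Called once on exit.
    fn shutdown(&mut self, log: &mut Vec<String>) {
        log.push(format!("shutdown {}", self.name()));
    }
}

/// A trivial node that only records when it was ticked.
struct Named(&'static str);

impl SketchNode for Named {
    fn name(&self) -> &str {
        self.0
    }

    fn tick(&mut self, log: &mut Vec<String>) {
        log.push(format!("tick {}", self.0));
    }
}

/// Runs init once, then `cycles` rounds of tick in slice order, then shutdown once.
fn run_lifecycle(nodes: &mut [Box<dyn SketchNode>], cycles: usize) -> Vec<String> {
    let mut log = Vec::new();
    for node in nodes.iter_mut() {
        node.init(&mut log);
    }
    for _ in 0..cycles {
        for node in nodes.iter_mut() {
            node.tick(&mut log);
        }
    }
    for node in nodes.iter_mut() {
        node.shutdown(&mut log);
    }
    log
}

fn main() {
    let mut nodes: Vec<Box<dyn SketchNode>> =
        vec![Box::new(Named("sensor")), Box::new(Named("control"))];
    for line in run_lifecycle(&mut nodes, 2) {
        println!("{line}");
    }
}
```

Because the loop owns the call sequence, every node sees `init` before any `tick`, and `shutdown` runs for all nodes even though none of them contains a loop of its own.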
[Figure: state diagram. init → tick (once all nodes are initialized); tick → tick (repeats at the configured rate); tick → shutdown (on Ctrl+C or error). Caption: "The scheduler lifecycle: initialize once, tick repeatedly, shut down once"]

### Basic Usage

### Execution Order

The scheduler figures out the right execution order **automatically** from your topics. When a node calls `send()` on a topic and another calls `recv()` on the same topic, the scheduler knows the sender must run first.

[Figure: dependency graph. SensorNode (publishes `scan`) → ControlNode (subscribes `scan`) → LoggerNode (subscribes `cmd`), looping back to SensorNode on the next tick. Caption: "The scheduler infers order from topic dependencies, no manual .order() needed"]

**The scheduler builds a dependency graph from your topics.** If ControlNode subscribes to a topic that SensorNode publishes, ControlNode always runs after SensorNode, automatically. You don't need to set `.order()` values.

| Node | Topics | Scheduler Action |
|------|--------|------------------|
| SensorNode | publishes `scan` | Runs first (no dependencies) |
| ControlNode | subscribes `scan`, publishes `cmd` | Runs after SensorNode |
| LoggerNode | subscribes `cmd` | Runs after ControlNode |

**Independent nodes run in parallel.** If you have a Camera node and a LiDAR node that publish to different topics with no shared dependencies, the scheduler runs them simultaneously, with no configuration needed.

You can still use `.order()` as a **tiebreaker** for nodes with no topic relationship, or when you want explicit control over independent nodes. The scheduler provides order-range guidelines:

| Range | Category | Examples |
|-------|----------|----------|
| 0–9 | Critical | Emergency stop, safety monitor |
| 10–49 | High priority | Sensor readers, actuator controllers |
| 50–99 | Normal | Processing, planning, fusion |
| 100–199 | Low priority | Logging, telemetry, visualization |
| 200+ | Background | Diagnostics, statistics |

### Setting the Tick Rate

The **tick rate** is how many times per second the scheduler runs through all nodes. It's measured in **Hertz (Hz)**, a unit that means "times per second." The default tick rate is **100 Hz** (100 times per second).
Change it with `.tick_rate()`: ```rust // simplified use horus::prelude::*; let mut scheduler = Scheduler::new() .tick_rate(100_u64.hz()); // 100 times per second ``` The `.hz()` syntax is HORUS's `DurationExt` trait — it converts a number into a frequency. Similarly, `.ms()` creates milliseconds and `.us()` creates microseconds: ```rust // simplified 100_u64.hz() // 100 Hz (frequency) 5_u64.ms() // 5 milliseconds (duration) 200_u64.us() // 200 microseconds (duration) 1_u64.secs() // 1 second (duration) ``` ### Per-Node Rates Not every node needs to run at the same speed. A fast sensor might need 1000 Hz while a logger only needs 10 Hz. Set per-node rates: ```rust // simplified scheduler.add(FastSensor::new()?) .order(0) .rate(1000_u64.hz()) // This node ticks at 1 kHz .build()?; scheduler.add(SlowLogger::new()?) .order(1) .rate(10_u64.hz()) // This node ticks at 10 Hz .build()?; ``` The scheduler automatically skips ticks for slower nodes — `SlowLogger` only has its `tick()` called every 100th cycle (at 1 kHz global rate, 10 Hz node rate = called every 100 ticks). ### Graceful Shutdown When you press **Ctrl+C**, the scheduler doesn't just kill everything. It: 1. Stops calling `tick()` on all nodes 2. Calls `shutdown()` on every node **in reverse order** — the last-added node shuts down first 3. Exits cleanly This reverse order is critical for safety. Consider: your motor controller (order 1) depends on sensor data from the sensor node (order 0). During shutdown, you want the motor controller to stop the motors *before* the sensor node disconnects — otherwise the motor controller loses its data source while motors are still spinning. 
```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "Motor" } fn tick(&mut self) { if let Some(cmd) = self.commands.recv() { self.motor.set_velocity(cmd.linear); } } // SAFETY: always stop motors in shutdown — a spinning motor // with no controller is a safety hazard fn shutdown(&mut self) -> Result<()> { self.motor.set_velocity(0.0); println!("Motor safely stopped"); Ok(()) } } ``` ## Timing: Budgets, Deadlines, and Misses Robots operate in the real world. A motor controller that takes 5 ms instead of 1 ms doesn't just "slow down" — it causes the robot to overshoot its target, potentially colliding with objects or people. The scheduler monitors timing and takes action when nodes run too long. ### What Is a Budget? A **budget** is the maximum time a node's `tick()` should take. If you set a budget of 800 µs (microseconds), the scheduler expects `tick()` to finish within 800 µs. If it takes longer, that's a **deadline miss**. ```rust // simplified scheduler.add(MotorController::new()?) .order(1) .rate(1000_u64.hz()) // 1 kHz → 1 ms per tick .budget(800_u64.us()) // tick() should finish in 800 µs .build()?; ``` ### What Is a Deadline? A **deadline** is the absolute latest a `tick()` can finish before the scheduler considers it a critical problem. The budget is a soft target; the deadline is a hard wall. ```rust // simplified scheduler.add(MotorController::new()?) .order(1) .rate(1000_u64.hz()) .budget(800_u64.us()) // Soft target: finish in 800 µs .deadline(950_u64.us()) // Hard wall: must finish by 950 µs .build()?; ``` If you set `.rate()` without an explicit `.deadline()`, HORUS auto-derives it as **95% of the period**. ### What Happens When a Node Takes Too Long? 
When a node exceeds its deadline, the scheduler reacts according to the **miss policy** you set with `.on_miss()`:

| Policy | What happens | Use when |
|--------|-------------|----------|
| `Miss::Warn` | Log a warning, continue normally | Non-critical nodes (logging, display) |
| `Miss::Skip` | Skip this node's next tick to recover | High-frequency nodes that can afford to skip one cycle |
| `Miss::SafeMode` | Call `enter_safe_state()` on the node | Safety-critical nodes (motors slow to safe speed) |
| `Miss::Stop` | Stop the entire scheduler | Last resort: the whole system must halt |

```rust
// simplified
scheduler.add(MotorController::new()?)
    .order(1)
    .rate(1000_u64.hz())
    .budget(800_u64.us())
    .on_miss(Miss::SafeMode) // If tick takes too long, enter safe state
    .build()?;
```

When `Miss::SafeMode` triggers, the scheduler calls `enter_safe_state()` on your node, a method you implement to bring the node to a known-safe condition:

```rust
// simplified
impl Node for MotorController {
    fn name(&self) -> &str { "Motor" }

    fn tick(&mut self) {
        if let Some(cmd) = self.commands.recv() {
            self.motor.set_velocity(cmd.linear);
        }
    }

    // Called by scheduler when a deadline miss triggers SafeMode
    fn enter_safe_state(&mut self) {
        // SAFETY: command zero velocity through the controller rather than
        // cutting power abruptly, which could cause a sudden jerk
        self.motor.set_velocity(0.0);
    }

    fn is_safe_state(&self) -> bool {
        // Tell the scheduler whether we've reached safe state
        self.motor.velocity().abs() < 0.01
    }
}
```

## Execution Classes (How Nodes Run)

Not all nodes have the same workload. A motor controller needs microsecond-precise timing. A path planner needs heavy CPU computation. A cloud uploader needs network I/O. Running them all the same way wastes resources and creates bottlenecks.

The scheduler assigns each node an **execution class** based on how you configure it.
You describe *what you need*; the scheduler figures out *how to run it*:

| What you configure | What the scheduler does | Example use case |
|--------------------|------------------------|-----------------|
| Nothing special | Runs sequentially in main loop | Logging, telemetry |
| `.rate(1000_u64.hz())` | Gives a dedicated real-time thread | Motor control, sensor fusion |
| `.compute()` | Offloads to a CPU thread pool | Path planning, SLAM |
| `.on("topic")` | Only wakes when that topic has new data | Emergency stop handler |
| `.async_io()` | Runs on an async (Tokio) executor | Cloud upload, HTTP, database |

```rust
// simplified
// Just order: runs in the main loop (simplest)
scheduler.add(logger).order(2).build()?;

// Add rate: scheduler auto-creates a dedicated RT thread
scheduler.add(motor).order(1).rate(1000_u64.hz()).build()?;

// Heavy CPU work: scheduler sends it to the thread pool
scheduler.add(planner).order(1).compute().build()?;

// Only run when emergency data arrives
scheduler.add(estop).order(0).on("emergency.stop").build()?;
```

The key insight: **`.rate()` is the trigger for real-time behavior.** When you say "this node needs to run at 1000 Hz," the scheduler knows it needs a dedicated thread with timing guarantees; you don't have to request that explicitly.

For deeper coverage of all five classes, see [Execution Classes](/concepts/execution-classes).

## Common Pitfalls

## A Complete Example

A temperature monitor with sensor, threshold checker, and logger, all coordinated by the scheduler:

## Design Decisions

**Why a scheduler instead of writing your own loop?** A `while` loop that calls each node is simple, until you need timing, ordering, monitoring, and shutdown. A bare loop doesn't enforce execution order, doesn't measure how long each node takes, doesn't recover from deadline misses, and doesn't guarantee motors stop when the program exits. Every robotics team eventually builds these features.
The scheduler gives them to you out of the box and has been tested across thousands of configurations. **Why tick() instead of run()?** A `run()` method gives each node full control — it can loop forever, block on I/O, or ignore shutdown signals. A `tick()` method gives the scheduler full control: it decides when to call each node, how long to allow, and when to force shutdown. This enables deterministic execution (same order every cycle), deadline monitoring (detect when a node takes too long), and coordinated shutdown (all nodes stop together, in the right order). **Why automatic execution class detection?** Most developers don't think in terms of "execution classes." They think "this node needs to run at 1 kHz" or "this node does heavy computation." The scheduler infers the right class from `.rate()`, `.compute()`, `.on()`, and `.async_io()`, mapping developer intent to the right executor. If you set `.rate(1000_u64.hz())`, the scheduler knows you need a dedicated real-time thread — you don't have to explicitly request one. **Why reverse-order shutdown?** Nodes are typically added in dependency order: sensors before controllers before loggers. Shutting down in reverse means controllers stop motors before sensors disconnect, and loggers record the shutdown events before they themselves stop. This prevents the dangerous situation where a sensor disconnects while a motor controller is still running (the controller would have no data and might hold the last velocity forever). 
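The reverse-order shutdown rule can be illustrated with a small, self-contained sketch. It assumes a hypothetical `Node` trait, a `Named` node type, and a `shutdown_all` helper, all defined here for the example; none of them are the HORUS API, and the real scheduler does considerably more (signal handling, error reporting).

```rust
// Hypothetical stand-in for a node trait; this is NOT the real HORUS `Node` trait.
trait Node {
    fn name(&self) -> &str;
    fn shutdown(&mut self);
}

// A node that only remembers whether it has been shut down.
struct Named {
    name: &'static str,
    stopped: bool,
}

impl Named {
    fn new(name: &'static str) -> Self {
        Named { name, stopped: false }
    }
}

impl Node for Named {
    fn name(&self) -> &str {
        self.name
    }
    fn shutdown(&mut self) {
        self.stopped = true;
    }
}

// Shuts nodes down last-added-first and returns the order that was used.
fn shutdown_all<N: Node>(nodes: &mut [N]) -> Vec<String> {
    let mut order = Vec::new();
    for node in nodes.iter_mut().rev() {
        node.shutdown();
        order.push(node.name().to_string());
    }
    order
}

fn main() {
    // Added in dependency order: the sensor first, then the motor controller.
    let mut nodes = vec![Named::new("sensor"), Named::new("motor")];
    let order = shutdown_all(&mut nodes);
    println!("shutdown order: {:?}", order);
    println!("all stopped: {}", nodes.iter().all(|n| n.stopped));
}
```

Because the motor controller was added after the sensor, it is the first to shut down, so it can stop the motors while its data source is still alive.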
## Trade-offs

| Gain | Cost |
|------|------|
| **Deterministic ordering**: nodes always run in the same sequence | Nodes with no topic relationship still need `.order()` when their relative order matters |
| **Automatic timing enforcement**: budget/deadline/miss monitoring | Adds roughly a microsecond of overhead per tick per monitored node |
| **Coordinated shutdown**: all nodes stop cleanly on Ctrl+C | Nodes must implement `shutdown()` for hardware cleanup |
| **Auto-detected execution classes**: right executor for each workload | Less explicit control (use `.compute()` or `.on()` to override) |
| **tick_rate + per-node rates**: flexible frequency management | Nodes must finish `tick()` within their budget |

## See Also

- [Scheduler — Full Reference](/concepts/core-concepts-scheduler): Watchdog, deterministic mode, blackbox, composable builders
- [Execution Classes](/concepts/execution-classes): Deep dive into the 5 execution classes
- [Scheduler API](/rust/api/scheduler): Complete API reference with every method signature
- [Nodes: The Building Blocks](/concepts/nodes-beginner): The components the scheduler runs
- [Topics: How Nodes Talk](/concepts/topics-beginner): How nodes communicate through the scheduler
- [Python Scheduler API](/python/api/python-bindings): Scheduler from Python

---

## Choosing Your Configuration

Path: /concepts/choosing-configuration
Description: A progressive guide to HORUS scheduler and node settings — from prototyping to production

# Choosing Your Configuration

HORUS has dozens of configuration options: tick rates, budgets, deadlines, execution classes, miss policies, watchdogs, CPU pinning, flight recorders. If you try to learn them all at once, you'll drown in complexity before building anything.

The good news: you don't need any of them to get started. Defaults work. This guide adds configuration in layers, so you only move to the next level when your project needs it.

## Level 0: Just Getting Started

You need **nothing**.
Defaults work: This gives you: 100Hz tick rate, best-effort scheduling, no RT, no safety monitor. Good enough for learning and prototyping. ## Level 1: Setting Tick Rates When you know how fast your nodes should run: **When to add `.order()`**: When execution order matters. Sensor before controller. Controller before motor driver. Lower number = runs first. **When to add `.rate()`**: When your node needs a specific frequency. A camera at 30Hz. A motor controller at 1kHz. Without `.rate()`, the node ticks at the scheduler's global rate. ## Level 2: Separating Workloads When you have both fast control and slow computation: **When to add `.compute()`**: For CPU-heavy work (SLAM, planning, ML inference) that takes more than a few milliseconds. Keeps it off the RT thread. **When to add `.async_io()`**: For I/O-bound work (network, files, database). Never blocks anything. **When to add `.on("topic")`**: When a node should only run when new data arrives on a topic, not on a fixed schedule. ## Level 3: Safety for Real Robots When deploying on physical hardware: **When to add `.prefer_rt()`**: When running on a real robot. Enables OS-level RT features for better timing. **When to add `.watchdog()`**: When you need to detect nodes that hang or deadlock. **When to add `.on_miss()`**: For safety-critical nodes. What happens when a motor controller misses its deadline? | Policy | Use When | |--------|----------| | `Miss::Warn` | Default. Non-critical nodes. | | `Miss::Skip` | Video encoding, logging. Drop a frame, keep going. | | `Miss::SafeMode` | Motor controllers. Reduce speed or hold position. | | `Miss::Stop` | Emergency systems. Stop everything. | **When to add `.failure_policy()`**: To control what happens when `tick()` panics or errors. | Policy | Use When | |--------|----------| | `Fatal` | Motor control, safety. Stop if broken. | | `Restart(n, backoff)` | Sensor drivers. Reconnect after crash. | | `Skip(n, cooldown)` | Logging, telemetry. Tolerate failures. 
| | `Ignore` | Debug output. Partial results are fine. | ## Level 4: Production Deployment For robots running unattended: **When to add `.blackbox()`**: For crash forensics. "Why did the robot stop at 3 AM?" **When to add `.priority()`**: When you need OS-level thread priority (SCHED_FIFO). Only for RT nodes. **When to add `.core()`**: When you need CPU pinning. Prevents cache thrashing on multi-core systems. **When to add per-node `.watchdog()`**: When different nodes need different timeout tolerances. A safety monitor needs 5ms. A logger can tolerate 5 seconds. ## Level 5: Simulation and Testing For deterministic, reproducible behavior: **When to add `.deterministic(true)`**: Simulation, testing, CI pipelines. NOT for real robots (virtual time can't drive hardware). **When to add `.with_recording()`**: To capture a session for later replay and debugging. ## Decision Flowchart ``` Are you learning / prototyping? → Level 0: Scheduler::new(), no options Do you have multiple nodes at different rates? → Level 1: add .order() and .rate() Do you have slow compute (SLAM, planning) alongside fast control? → Level 2: add .compute() for heavy work Are you running on a physical robot? → Level 3: add .prefer_rt(), .watchdog(), .on_miss(), .failure_policy() Is the robot running unattended / in production? → Level 4: add .blackbox(), .priority(), .core() Do you need reproducible tests or simulation? 
→ Level 5: add .deterministic(true) ``` ## Quick Reference: All Options ### Scheduler | Method | Level | What It Does | |--------|-------|-------------| | `.tick_rate(freq)` | 1 | Global tick rate | | `.prefer_rt()` | 3 | Enable OS RT features | | `.require_rt()` | 4 | Panic if RT unavailable | | `.watchdog(Duration)` | 3 | Detect frozen nodes | | `.blackbox(size_mb)` | 4 | Crash recorder | | `.max_deadline_misses(n)` | 4 | Emergency stop threshold | | `.cores(&[usize])` | 4 | CPU core pinning | | `.deterministic(bool)` | 5 | Reproducible execution | | `.with_recording()` | 5 | Record for replay | | `.verbose(bool)` | 3 | Control log output | | `.telemetry(endpoint)` | 4 | Export metrics | ### Node Builder | Method | Level | What It Does | |--------|-------|-------------| | `.order(u32)` | 1 | Execution priority | | `.rate(Frequency)` | 1 | Tick rate (auto-RT) | | `.compute()` | 2 | CPU-bound thread pool | | `.async_io()` | 2 | I/O-bound async pool | | `.on("topic")` | 2 | Event-triggered | | `.on_miss(Miss)` | 3 | Deadline miss response (warned if no deadline) | | `.failure_policy(Policy)` | 3 | Crash recovery | | `.budget(Duration)` | 3 | Override tick budget (RT only — rejected on non-RT) | | `.deadline(Duration)` | 3 | Override deadline (RT only — rejected on non-RT) | | `.priority(i32)` | 4 | OS thread priority (RT only — warned if non-RT) | | `.core(usize)` | 4 | CPU core pinning (RT only — warned if non-RT) | | `.watchdog(Duration)` | 4 | Per-node watchdog | | `.build()` | 0 | Validate & finalize (always needed) | > **Tip:** `.build()` catches common mistakes — see [Validation & Conflicts](/concepts/execution-classes#validation--conflicts) for what's rejected vs warned. ## Design Decisions **Why progressive levels instead of a single "best practices" page?** A beginner reading about CPU pinning and SCHED_FIFO priorities before they've written their first node will be overwhelmed and conclude HORUS is complex. 
The progressive structure matches complexity to experience: Level 0 is a single function call, Level 4 is production deployment. Each level adds only what that stage of development needs. **Why are defaults deliberately simple?** `Scheduler::new()` creates a 100 Hz best-effort scheduler with no RT, no watchdog, no safety monitor. This is intentional — a beginner should be able to run their first robot with zero configuration. Adding `.prefer_rt()` or `.watchdog()` is a conscious decision made when the developer understands why they need it. ## See Also - [Builder Composition Guide](/concepts/builder-composition) — How builder methods interact, override, and compose - [Execution Classes](/concepts/execution-classes) — The 5 classes and when to use each - [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — Deep dive into all scheduler features - [Scheduler API](/rust/api/scheduler) — Builder method reference - [Real-Time Systems](/concepts/real-time) — Understanding timing requirements - [Scheduler Configuration](/advanced/scheduler-configuration) — Advanced tuning --- ## Nodes — Full Reference Path: /concepts/core-concepts-nodes Description: Deep dive into HORUS nodes: the Node trait, lifecycle, communication patterns, safety, and design rationale # Nodes — Full Reference In every robotics system, software components fight over shared resources. The camera driver writes to a buffer while the vision system reads from it. The motor controller modifies velocity state while the safety monitor checks it. Traditional multithreaded programs manage this with locks — but locks introduce deadlocks, priority inversions, and subtle race conditions that only manifest when the robot is moving at full speed through a warehouse. HORUS takes a different approach. Each component is a **node**: an isolated unit with its own state, running in a scheduler-controlled tick loop. Nodes don't share memory directly. 
They communicate through typed channels (topics) backed by lock-free shared memory. The scheduler controls when each node runs, how long it's allowed to take, and what happens when it misbehaves. The result is a system where you can reason about each node independently: test it in isolation, monitor its timing, and replace it without touching anything else.

This page covers the complete Node trait, lifecycle, communication patterns, and safety mechanisms. For a gentler introduction, start with [Nodes: The Building Blocks](/concepts/nodes-beginner).

## The Node Trait

Every HORUS node implements the `Node` trait. The only required method is `tick()`; everything else has sensible defaults:

```rust
// simplified
pub trait Node: Send {
    // Required: your main logic, called repeatedly by the scheduler
    fn tick(&mut self);

    // Identity: defaults to the struct's type name
    fn name(&self) -> &str { std::any::type_name::<Self>() }

    // Lifecycle: called once at startup and shutdown
    fn init(&mut self) -> Result<()> { Ok(()) }
    fn shutdown(&mut self) -> Result<()> { Ok(()) }

    // Safety: used by the safety monitor during deadline misses
    fn is_safe_state(&self) -> bool { true }
    fn enter_safe_state(&mut self) {}

    // Error recovery: called when tick() panics or errors
    fn on_error(&mut self, error: &str) { /* logs error */ }

    // Metadata: auto-generated by node! macro, rarely implemented manually
    fn publishers(&self) -> Vec<TopicMetadata> { Vec::new() }
    fn subscribers(&self) -> Vec<TopicMetadata> { Vec::new() }
}
```

### `tick()` — Your Main Logic

`tick()` is the only required method. The scheduler calls it repeatedly, once per cycle at the configured rate:

```rust
// simplified
impl Node for MotorController {
    fn tick(&mut self) {
        if let Some(cmd) = self.commands.recv() {
            self.motor.set_velocity(cmd.linear);
        }
    }
}
```

**Critical rules for tick()**:

- **Keep it fast.** The scheduler monitors how long `tick()` takes. If it exceeds the node's budget, that's a deadline miss, which can trigger safety responses.
For a 1 kHz node, `tick()` should complete in under 800 µs. - **Never block on I/O.** Reading a file, making an HTTP request, or waiting on a socket blocks the entire tick cycle. Use `.async_io()` or `.compute()` execution classes for blocking work. - **Never allocate in the hot path.** `Vec::new()`, `String::from()`, and `Box::new()` call the allocator, which can take microseconds and introduces jitter. Pre-allocate in `init()`. - **Always drain your topics.** Call `recv()` on every subscribed topic every tick, even if you don't need the data. Unread messages pile up and consume buffer slots. ### `init()` — One-Time Setup Called once at startup, before the first `tick()`. Use it for hardware connections, file handles, buffer allocation, and calibration: ```rust // simplified fn init(&mut self) -> Result<()> { // Open hardware self.serial = serialport::new("/dev/ttyUSB0", 115200) .open() .horus_context("opening motor serial port")?; // IMPORTANT: pre-allocate buffers here — allocation in tick() causes jitter self.buffer = vec![0u8; 256]; // SAFETY: start with actuators in known-safe state self.velocity = 0.0; hlog!(info, "MotorController initialized"); Ok(()) } ``` If `init()` returns `Err`, the node is marked as failed and its `FailurePolicy` is applied (default: node is skipped, error logged). The scheduler continues running other nodes. ### `shutdown()` — Graceful Cleanup Called once when the scheduler stops (Ctrl+C, SIGINT, SIGTERM, or `.stop()`). Nodes shut down in **reverse order** — the last-added node shuts down first: ```rust // simplified // SAFETY: always stop actuators before releasing hardware connections fn shutdown(&mut self) -> Result<()> { // 1. Stop actuators FIRST self.velocity = 0.0; self.send_stop_command(); // 2. Disable hardware outputs self.disable_motor_driver(); // 3. 
Close connections self.serial = None; hlog!(info, "MotorController shut down safely"); Ok(()) } ``` **Rules for shutdown()**: - **Stop actuators before closing connections** — send zero velocity before dropping the serial port - **Never panic** — if one cleanup step fails, log the error and continue with the rest - **Don't assume tick() ran** — init() may have succeeded but the system may be shutting down before the first tick ```rust // simplified // SAFETY: never panic in shutdown — always attempt all cleanup steps fn shutdown(&mut self) -> Result<()> { if let Err(e) = self.stop_motors() { hlog!(error, "Failed to stop motors: {}", e); // Continue cleanup — don't return early } if let Err(e) = self.close_connection() { hlog!(warn, "Failed to close connection: {}", e); } Ok(()) } ``` ### `is_safe_state()` and `enter_safe_state()` — Safety Monitor Integration When a node with `Miss::SafeMode` exceeds its deadline, the scheduler calls `enter_safe_state()` to bring the node to a known-safe condition. It then polls `is_safe_state()` each tick to check for recovery: ```rust // simplified impl Node for MotorController { fn enter_safe_state(&mut self) { // SAFETY: reduce to zero velocity — don't cut power abruptly, // as that can cause mechanical shock self.target_velocity = 0.0; self.motor.set_velocity(0.0); hlog!(warn, "Motor entering safe state"); } fn is_safe_state(&self) -> bool { // Report whether the node has reached a safe condition self.motor.velocity().abs() < 0.01 } } ``` The default `is_safe_state()` returns `true` (the node claims to always be safe). The default `enter_safe_state()` does nothing. Override both for any node that controls actuators. ### `on_error()` — Error Recovery Called when `tick()` panics (caught by the scheduler) or encounters an error. The default implementation logs the error. 
Override for custom recovery:

```rust
// simplified
fn on_error(&mut self, error: &str) {
    hlog!(error, "Motor controller error: {}", error);
    self.consecutive_errors += 1;
    if self.consecutive_errors > 5 {
        self.enter_safe_state();
    }
}
```

### `publishers()` and `subscribers()` — Topic Metadata

Return metadata about which topics this node uses. Used by `horus monitor`, graph visualization, and introspection CLI commands. The `node!` macro generates these automatically. When implementing `Node` manually, override them for accurate monitoring:

```rust
// simplified
fn publishers(&self) -> Vec<TopicMetadata> {
    vec![TopicMetadata {
        topic_name: "motor.status".to_string(),
        type_name: std::any::type_name::().to_string(),
    }]
}
```

## Node Lifecycle

Every node transitions through well-defined states:

[Figure: state diagram. Initializing → Running → Stopping → Stopped; Running → Error → Running; Running → Crashed; Error → Crashed. Caption: "Node state transitions: Error is recoverable, Crashed is not"]

| State | Description | Transitions to |
|-------|-------------|----------------|
| **Uninitialized** | Created but `init()` not yet called | Initializing |
| **Initializing** | `init()` is running | Running, Error |
| **Running** | Actively ticking | Stopping, Error, Crashed |
| **Stopping** | `shutdown()` is running | Stopped |
| **Stopped** | Clean shutdown complete | (terminal) |
| **Error** | Recoverable error, `on_error()` called | Running (recovery), Crashed |
| **Crashed** | Unrecoverable, node is removed from tick loop | (terminal) |

The scheduler monitors health via `NodeHealthState`, tracked per-node with lock-free atomics:

| Health | Meaning | Scheduler response |
|--------|---------|-------------------|
| `Healthy` | Operating within budget | Normal ticking |
| `Warning` | 1× timeout elapsed | Log warning |
| `Unhealthy` | 2× timeout, tick skipped | Skip tick, log error |
| `Isolated` | 3× timeout on critical node | Isolate from scheduler, 
call `enter_safe_state()` | ## Communication Patterns Nodes communicate through [Topics](/concepts/topics-beginner) — named, typed, shared-memory channels. Here are the standard patterns: ### Publisher A node that produces data for others to consume: ```rust // simplified struct SensorNode { data_pub: Topic, } impl Node for SensorNode { fn name(&self) -> &str { "Sensor" } fn tick(&mut self) { let reading = self.read_hardware(); self.data_pub.send(reading); } } ``` ### Subscriber A node that consumes data from others: ```rust // simplified struct DisplayNode { data_sub: Topic, } impl Node for DisplayNode { fn name(&self) -> &str { "Display" } fn tick(&mut self) { // IMPORTANT: always call recv() — even if you don't need data this tick if let Some(value) = self.data_sub.recv() { println!("Value: {:.1}", value); } } } ``` ### Pipeline (Subscribe → Transform → Publish) A node that transforms data between topics: ```rust // simplified struct Filter { input: Topic, output: Topic, alpha: f32, smoothed: f32, } impl Node for Filter { fn name(&self) -> &str { "Filter" } fn tick(&mut self) { if let Some(raw) = self.input.recv() { // Exponential moving average self.smoothed = self.alpha * raw + (1.0 - self.alpha) * self.smoothed; self.output.send(self.smoothed); } } } ``` ### Multi-Topic Synchronization When you need data from multiple topics, cache the latest from each and process when all are available: ```rust // simplified struct Fusion { imu_sub: Topic, odom_sub: Topic, pose_pub: Topic, last_imu: Option, last_odom: Option, } impl Node for Fusion { fn name(&self) -> &str { "Fusion" } fn tick(&mut self) { // IMPORTANT: drain ALL topics every tick — never skip a recv() conditionally if let Some(imu) = self.imu_sub.recv() { self.last_imu = Some(imu); } if let Some(odom) = self.odom_sub.recv() { self.last_odom = Some(odom); } if let (Some(imu), Some(odom)) = (&self.last_imu, &self.last_odom) { let fused = self.fuse(imu, odom); self.pose_pub.send(fused); } } } ``` ## The 
`node!` Macro The `node!` macro eliminates boilerplate by generating the struct, `Node` implementation, constructor, and topic metadata: ```rust // simplified use horus::prelude::*; node! { SensorNode { pub { sensor_data: f32 -> "sensor.data", } tick { let data = 42.0; self.sensor_data.send(data); } } } ``` This generates: - A `SensorNode` struct with a `Topic` field named `sensor_data` - A `SensorNode::new()` constructor that creates the topic - A `Node` implementation with `tick()`, `name()`, and `publishers()` - Topic metadata for monitoring and introspection The macro also supports `sub {}` (subscribers), `data {}` (internal state), `init {}`, `shutdown {}`, and `impl {}` blocks. See [The node! Macro Guide](/concepts/node-macro) for the full syntax. ## Logging Use the `hlog!` macro for structured logging inside nodes: ```rust // simplified fn tick(&mut self) { hlog!(info, "Processing frame {}", self.frame_count); hlog!(warn, "Battery at {}%", self.battery_level); hlog!(error, "Sensor disconnected: {}", self.sensor_name); hlog!(debug, "Position: ({:.2}, {:.2})", self.x, self.y); } ``` For topic-level monitoring without code changes, use CLI tools: ```bash horus topic echo sensor.data # Print messages on a topic horus topic hz sensor.data # Show publish rate horus monitor --tui # Interactive dashboard ``` ## Python Nodes Python nodes use a callback-based API that mirrors the Rust pattern: ```python import horus def motor_tick(node): cmd = node.recv("cmd_vel") if cmd is not None: set_motor_velocity(cmd) def motor_shutdown(node): set_motor_velocity(0.0) print("Motor stopped safely") motor = horus.Node( name="Motor", tick=motor_tick, shutdown=motor_shutdown, subs=["cmd_vel"], order=10, ) horus.run(motor) ``` Key differences from Rust: - `tick`, `init`, and `shutdown` are callback functions, not trait methods - Topics are declared via `pubs=["topic"]` and `subs=["topic"]` in the constructor - `node.send("topic", data)` and `node.recv("topic")` instead of 
`self.topic.send(data)` / `self.topic.recv()` See [Python Bindings](/python/api/python-bindings) for the complete Python Node API. ## Design Decisions **Why tick() instead of run()?** A `run()` method gives each node full control — it can loop forever, block on I/O, or ignore shutdown signals. This makes it impossible for the scheduler to enforce execution order, monitor timing, or coordinate shutdown. A `tick()` method inverts the control: the scheduler decides when to call each node, measures how long it takes, and can force shutdown at any time. This enables deterministic execution, deadline monitoring, and safety enforcement. **Why a trait instead of callbacks or closures?** A trait lets each node hold its own state as struct fields — typed, owned, and scoped to the node's lifetime. Closures would require capturing state via `Arc>`, reintroducing the shared-state problems nodes are designed to eliminate. A trait also enables the compiler to verify that all required methods are implemented and that the node is `Send`. **Why Send but not Sync?** `Send` is required because the scheduler may move a node to a dedicated thread (e.g., for RT execution). `Sync` is not required because the scheduler never allows two threads to access the same node simultaneously — it owns each node exclusively and calls methods sequentially. This means your nodes can use `Cell`, `RefCell`, and other non-`Sync` types without restriction. **Why no context parameter in tick()?** Earlier versions of the HORUS API passed a `NodeContext` or `NodeInfo` to `tick()`. This was removed because it coupled every node to the scheduler's internal types, made nodes harder to test in isolation, and added overhead to every tick call. Instead, nodes use `hlog!` for logging (a global macro) and own their topics directly. Testing a node is now: create an instance, call `tick()`, check the output topics. **Why lazy initialization?** `init()` is called when the scheduler starts, not when the node is added. 
This means you can configure all your nodes, set up the scheduler, and only open hardware connections when the system is actually ready to run. It also means `init()` can depend on the scheduler's configuration (e.g., clock source) being finalized. ## Trade-offs | Gain | Cost | |------|------| | **Isolation** — each node has its own state, no shared memory | Communication through topics adds nanoseconds of latency | | **Testability** — tick a node once and assert on output topics | More boilerplate than a bare function call | | **Deterministic ordering** — scheduler controls execution sequence | Nodes can't call each other directly | | **Safety monitoring** — budget/deadline/miss enforcement | Must implement `shutdown()` and `enter_safe_state()` for hardware | | **Hot-swappable** — replace a node without touching others | Nodes must agree on topic names and message types | ## See Also - [Nodes: The Building Blocks](/concepts/nodes-beginner) — Beginner introduction - [Topics: How Nodes Talk](/concepts/topics-beginner) — How nodes communicate - [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) — Execution and timing - [Scheduler API](/rust/api/scheduler) — Node builder methods (`.order()`, `.rate()`, `.budget()`, etc.) - [node! Macro](/concepts/node-macro) — Code generation for nodes - [Python Bindings](/python/api/python-bindings) — Python Node API - [Execution Classes](/concepts/execution-classes) — How different workloads run --- ## Topics — Full Reference Path: /concepts/core-concepts-topic Description: Deep dive into HORUS topics: automatic backend selection, communication patterns, memory model, and performance # Topics — Full Reference A self-driving forklift has a LiDAR scanning at 10 Hz, cameras streaming 30 FPS, wheel encoders firing at 1 kHz, and a safety system that must react within microseconds. All of this data needs to flow between nodes — but the forklift can't afford the 50–100 µs per message that sockets and pipes add. 
At 1 kHz, that overhead alone eats 5–10% of every control cycle. And when the vision team splits their processing into a separate process for fault isolation, you can't afford to rewrite every communication path.

HORUS topics solve this with **automatic shared-memory IPC**. You call `send()` and `recv()`. HORUS detects whether publisher and subscriber are on the same thread, same process, or different processes, counts the number of each, and selects the fastest lock-free backend from 10 possible paths — from ~3 ns (same thread, direct channel) to ~167 ns (cross-process, many-to-many). When the topology changes (a new subscriber joins, a process splits), the backend **live-migrates** without dropping messages. Your code never changes.

For a gentler introduction, start with [Topics: How Nodes Talk](/concepts/topics-beginner).

## Automatic Backend Selection

HORUS doesn't have one communication backend — it has ten, each optimized for a specific topology. The system selects the fastest path automatically based on two factors:

1. **Where** are the publisher and subscriber? (same thread, same process, different processes)
2. **How many** publishers and subscribers exist?

Figure (backend selection flowchart): same thread selects DirectChannel; same process selects an intra-process backend by publisher × subscriber count; across processes, POD types use POD SHM and all other types use an SHM backend chosen by publisher × subscriber count. Caption: "HORUS selects from 10 backend paths based on topology — you just call send() and recv()"

### Backend Reference

| Backend | Latency | Topology | Implementation |
|---------|---------|----------|----------------|
| DirectChannel | ~3 ns | Same thread, same `Topic` instance | Inlined ring write, no atomics |
| SPSC Intra | ~18 ns | Same process, 1 publisher, 1 subscriber | Lock-free ring, Release/Acquire ordering |
| SPMC Intra | ~24 ns | Same process, 1 publisher, N subscribers | CAS with AcqRel on tail |
| MPSC Intra | ~26 ns | Same process, N publishers, 1 subscriber | Lamport sequence numbers on head |
| MPMC Intra | ~36 ns | Same process, N publishers, N subscribers | CAS on both head and tail |
| POD SHM | ~50 ns | Cross-process, plain-old-data type | Direct memory-mapped read/write |
| MPSC SHM | ~65 ns | Cross-process, N publishers, 1 subscriber | SHM + Lamport sequences |
| SPMC SHM | ~70 ns | Cross-process, 1 publisher, N subscribers | SHM + CAS tail |
| SPSC SHM | ~85 ns | Cross-process, 1 publisher, 1 subscriber | SHM ring buffer |
| MPMC SHM | ~167 ns | Cross-process, N publishers, N subscribers | SHM + CAS head and tail |

### Live Migration

When topology changes — a second subscriber joins, a process splits — HORUS transparently migrates to the optimal backend without dropping messages. The migration uses epoch-based notification: producers and consumers periodically check a shared epoch counter and re-detect the optimal path. 
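The epoch check described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the HORUS implementation: `TopologyEpoch`, `Endpoint`, `bump`, and `needs_redetect` are hypothetical names, and the real migration logic is not shown in this document.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Shared counter that is bumped whenever a participant joins or leaves.
/// (Hypothetical type, not part of the HORUS API.)
pub struct TopologyEpoch(AtomicU64);

impl TopologyEpoch {
    pub fn new() -> Self {
        TopologyEpoch(AtomicU64::new(0))
    }

    /// Signal a topology change to every endpoint.
    pub fn bump(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }

    pub fn current(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }
}

/// A publisher or subscriber that remembers the last epoch it observed.
pub struct Endpoint {
    shared: Arc<TopologyEpoch>,
    seen: u64,
}

impl Endpoint {
    pub fn new(shared: Arc<TopologyEpoch>) -> Self {
        let seen = shared.current();
        Endpoint { shared, seen }
    }

    /// Returns true once per topology change. A real endpoint would
    /// re-run backend selection at this point, before sending or receiving.
    pub fn needs_redetect(&mut self) -> bool {
        let now = self.shared.current();
        if now != self.seen {
            self.seen = now;
            true
        } else {
            false
        }
    }
}
```

The steady-state cost in this sketch is a single atomic load per check, which is why a periodic epoch comparison is cheap enough to sit near a hot path.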
You never need to configure or trigger migration. It happens automatically on the next `send()` or `recv()` call after the topology change is detected. ## Communication Patterns ### One-to-One (~18 ns same-process, ~85 ns cross-process) The simplest pattern: one publisher, one subscriber. Auto-selects SPSC (single-producer, single-consumer) — the fastest multi-entity path. ### One-to-Many / Broadcast (~24 ns same-process, ~70 ns cross-process) One publisher, multiple subscribers. All subscribers independently receive every message. Auto-selects SPMC (single-producer, multiple-consumer). This is the standard pattern for sensor data: one camera node publishes, and the vision system, logger, and display all subscribe independently. ### Many-to-One / Aggregation (~26 ns same-process, ~65 ns cross-process) Multiple publishers, one subscriber. Auto-selects MPSC (multiple-producer, single-consumer). Messages are interleaved in arrival order. ### Many-to-Many (~36 ns same-process, ~167 ns cross-process) Multiple publishers and subscribers. Auto-selects MPMC — the most general (and slowest) path. Use only when you genuinely need N:N communication; prefer narrower patterns when possible. ## Topic Naming ### Use Dots, Not Slashes HORUS uses **dots (`.`)** for topic name hierarchy: ### Naming Conventions | Pattern | Example | Use for | |---------|---------|---------| | `subsystem.data` | `sensor.temperature` | Most topics | | `subsystem.device.data` | `camera.front.rgb` | Multi-device systems | | `robot_id.subsystem.data` | `robot1.motor.cmd_vel` | Multi-robot fleets | **Avoid**: names starting with `_` (reserved for internal use), names containing special characters (`!@#$%^&*()`), and names containing `/`. ### Type-Safe Topic Descriptors The `topics!` macro defines compile-time topic descriptors that prevent name typos and type mismatches: This catches topic name typos and type mismatches at compile time instead of runtime. ## Message Types ### What Types Work? 
Any type that implements `Clone + Send + Sync + Serialize + Deserialize + 'static` works with topics. In practice, this means almost everything:

### The TopicMessage Trait

Under the hood, all types go through the `TopicMessage` trait, which defines a **wire format** for transmission:

For most types, the blanket implementation makes `Wire = T` — a direct pass-through with zero overhead. Pool-backed types (Image, PointCloud, DepthImage, Tensor) use a small descriptor as the wire format and transfer the actual data through shared memory pools.

You never need to implement this trait manually — the blanket impl covers all standard types.

### Pool-Backed Types (Zero-Copy for Large Data)

For large data types, copying through the ring buffer would be too slow. A 1920×1080 RGB image is 6 MB — copying it at 30 FPS would consume 180 MB/s of memory bandwidth just for one topic.

HORUS solves this with **pool-backed types**. The actual data lives in a shared memory pool. The topic transfers only a small descriptor (~200–336 bytes) that points to the pool slot. The receiver reads the pixel data directly from the same memory — zero copies regardless of image size.

| Type | Descriptor size | Actual data | Example size |
|------|----------------|-------------|--------------|
| `Image` | ~288 bytes | Pixel buffer in SHM pool | 6 MB (1080p RGB) |
| `PointCloud` | ~200 bytes | Point array in SHM pool | 1.2 MB (30K points) |
| `DepthImage` | ~200 bytes | Depth buffer in SHM pool | ~8 MB (1080p f32: 1920 × 1080 × 4 bytes) |
| `Tensor` | ~336 bytes | DLPack tensor in SHM pool | Varies |

The API is identical — `send()`, `recv()`, `try_send()` all work the same:

## Memory and Capacity

### Ring Buffer Model

Each topic is backed by a ring buffer with a fixed number of **slots**. When a publisher calls `send()` and all slots are occupied, the oldest unread message is overwritten. 
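The overwrite-on-full rule can be modelled in a few lines. This is a single-threaded sketch for illustration only: `LatestRing`, `push`, `pop`, and `dropped` are hypothetical names, and the real HORUS backends are lock-free structures shared across threads or processes.

```rust
use std::collections::VecDeque;

/// Bounded buffer that keeps the newest `capacity` messages.
/// (Illustrative model, not the HORUS ring buffer.)
pub struct LatestRing<T> {
    slots: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> LatestRing<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        LatestRing {
            slots: VecDeque::with_capacity(capacity),
            capacity,
            dropped: 0,
        }
    }

    /// Never blocks: when every slot is occupied, the oldest unread
    /// message is discarded to make room for the new one.
    pub fn push(&mut self, msg: T) {
        if self.slots.len() == self.capacity {
            self.slots.pop_front();
            self.dropped += 1;
        }
        self.slots.push_back(msg);
    }

    /// Returns the oldest message that has not been read yet.
    pub fn pop(&mut self) -> Option<T> {
        self.slots.pop_front()
    }

    /// Number of messages that were overwritten before being read.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

With a capacity of 4, pushing six messages without reading leaves the last four in the buffer and counts two as dropped, which is the behaviour a slow subscriber would observe.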
```
Ring buffer (capacity = 4):
┌───┬───┬───┬───┐
│ 3 │ 4 │ 5 │ _ │  ← slot 0 was overwritten by message 4
└───┴───┴───┴───┘
  ↑           ↑
 read       write
cursor      cursor
```

The default capacity is **4 slots**. This is intentionally small — robotics control loops care about the latest data, not history. Increase capacity when you need more buffering:

### Memory Usage

| Message Type | Size | Default Capacity | Approximate Memory |
|-------------|------|------------------|-------------------|
| `f32` | 4 B | 4 slots | ~1 KB |
| `CmdVel` | 16 B | 4 slots | ~1 KB |
| `Imu` | ~300 B | 4 slots | ~2 KB |
| `LaserScan` | ~1.5 KB | 4 slots | ~7 KB |

For most robotics applications, total topic memory is well under 1 MB.

### Cleaning Up Shared Memory

Shared memory files persist after processes exit — by design, so new processes can join existing topics. Clean up between sessions:

```bash
horus clean --shm             # Clean shared memory
horus clean --shm --dry-run   # Preview what would be cleaned
horus clean --all             # Clean everything (SHM + build cache)
```

HORUS also performs automatic stale-topic cleanup — files with no active process are removed when new topics are created.

## Runtime Debugging

### CLI Tools

```bash
horus topic list               # List all active topics
horus topic echo sensor.data   # Print messages on a topic
horus topic hz sensor.data     # Show publish rate
horus monitor --tui            # Interactive dashboard
```

### TUI Debug Logging

Debug logging is toggled at runtime from the TUI monitor — no code changes or recompilation needed. Select a topic in the Topics tab and press **Enter** to start logging; press **Esc** to stop.

When active, every `send()` and `recv()` records timing and message summaries. When disabled, there is zero overhead — introspection is fully separated from the hot path. 
### Programmatic Monitoring ## Design Decisions **Why 10 backends instead of just "shared memory"?** A single shared-memory implementation that handles all topologies must use the most conservative synchronization — CAS operations on both head and tail, which costs ~167 ns. But the most common case in robotics is one publisher and one subscriber in the same process, where a simple Release/Acquire ring buffer costs only ~18 ns. The 10-backend architecture lets each topology use the minimal synchronization required, with automatic selection so you never think about it. **Why ring buffers instead of unbounded queues?** Unbounded queues grow without limit. A slow subscriber on a fast publisher creates a memory leak that eventually crashes the process. Ring buffers have fixed, predictable memory usage. When full, they overwrite the oldest data — which is exactly what a control loop wants (latest data, not history). If you need guaranteed delivery, use `try_send()` or `send_blocking()`. **Why automatic backend selection instead of manual configuration?** During development, everything runs in one process. In production, you split across processes for isolation. In simulation, you might run everything on one thread for determinism. If you had to manually configure the backend for every topology, you'd need different configurations for development, testing, and production. Automatic selection means the same code runs optimally everywhere. **Why live migration?** A new subscriber joining shouldn't require restarting the publisher. A process splitting in two shouldn't require code changes. Live migration means the system adapts to topology changes at runtime — backends upgrade when new participants join and downgrade when they leave, without dropping messages or requiring coordination. **Why dots instead of slashes for naming?** POSIX `shm_open` on macOS interprets slashes as directory separators. 
A topic named `"sensor/lidar"` creates a file at `/dev/shm/sensor/lidar` instead of `/dev/shm/sensor.lidar`, which fails if the `sensor/` directory doesn't exist. Dots work identically on Linux and macOS. This was a pragmatic cross-platform decision, not a stylistic one. ## Trade-offs | Gain | Cost | |------|------| | **10 optimized backends** — each topology gets minimal-overhead sync | More complex internal implementation (transparent to users) | | **Automatic selection** — same code works across all topologies | Cannot force a specific backend (auto-selection is always optimal) | | **Ring buffer** — bounded memory, never blocks, always has latest data | Slow subscribers lose messages (detectable via `dropped_count()`) | | **Pool-backed zero-copy** — 6 MB images transfer in nanoseconds | Pool slots are finite; stale `recv()` wastes slots | | **Live migration** — backends upgrade/downgrade transparently | Brief (~µs) migration pause on topology change | | **Type safety** — compiler enforces message types | All serializable types need `Clone + Serialize + Deserialize` derives | ## See Also - [Topics: How Nodes Talk](/concepts/topics-beginner) — Beginner introduction - [Topic API](/rust/api/topic) — Complete API reference with every method signature - [Python Topic API](/python/api/python-bindings) — Topic API from Python - [Communication Overview](/concepts/communication-overview) — When to use topics vs services vs actions - [Image API](/rust/api/image) — Pool-backed camera images - [PointCloud API](/rust/api/pointcloud) — Pool-backed 3D point clouds - [Message Types](/concepts/message-types) — Standard robotics message types --- ## Scheduler — Full Reference Path: /concepts/core-concepts-scheduler Description: Deep dive into the HORUS scheduler: execution model, real-time enforcement, safety monitoring, and deterministic mode # Scheduler — Full Reference A warehouse robot runs 15 nodes: LiDAR, cameras, motor controllers, safety monitors, path planners, and 
loggers. The motor controller must run every millisecond — miss a deadline and the robot overshoots into a shelf. The path planner needs 50 ms of CPU time — if it runs on the same thread as the motor controller, it blocks 50 ticks. The safety monitor must run *before* the motor controller every cycle — if it runs after, a collision is detected one cycle too late. And when someone presses the emergency stop, all 15 nodes must shut down in the right order: motors stop before sensors disconnect.

Writing this coordination by hand is the single largest source of bugs in robotics software. Thread pools, priority queues, signal handlers, deadline monitors, watchdog timers — each one is a weekend of debugging, and they all interact in non-obvious ways.

HORUS's scheduler handles all of this. You add nodes, configure their order and timing, and call `run()`. The scheduler manages execution order, tick rates, deadline enforcement, safety monitoring, graceful shutdown, and recording — across five different execution classes, each optimized for a different workload type.

For a gentler introduction, start with [Scheduler: Running Your Nodes](/concepts/scheduler-beginner). For multi-process orchestration (mixed languages, process isolation), see [Launch System](/concepts/launch-system).

## The Execution Model

### How a Tick Works

Every tick cycle, the scheduler does the following:

1. For each node in order, check the rate limiter: should this node tick? If it is not due yet, move on to the next node.
2. If it is due, start the timer, call `node.tick()`, and stop the timer.
3. If the tick exceeded its budget, apply the `Miss` policy.
4. Record metrics and continue with the next node.
5. When all nodes are done, sleep to maintain the tick rate, then start the next cycle.

Figure caption: "Each tick: rate-check every node, time it, enforce budgets, sleep to maintain rate"

**Key details**:

- **BestEffort nodes auto-parallelize**: the scheduler builds a dependency graph from topic `send()`/`recv()` metadata. Independent nodes (no shared topics) run in parallel via the **ready-dispatch executor**. Dependent nodes (one subscribes to another's topic) execute in causal order — the publisher always finishes before the subscriber starts. RT, Compute, Event, and AsyncIo nodes run on their own dedicated threads.
- Rate limiting is per-node: a 10 Hz node inside a 1 kHz scheduler only has `tick()` called every 100th cycle.
- Budget enforcement happens *after* `tick()` returns — the scheduler doesn't preempt mid-tick. It records the overrun and applies the `Miss` policy.
- `.order()` is **optional** — when nodes have topic dependencies, the graph determines execution order automatically. `.order()` serves as a tiebreaker for independent nodes with no topic relationship.

### Initialization

When you call `scheduler.run()`, initialization happens **lazily** — not when nodes are added:

1. All pending node configurations are finalized (execution class inference, budget/deadline auto-derivation)
2. `init()` is called on every node — topic `send()`/`recv()` calls during `init()` register with the TopicNodeRegistry
3. The **dependency graph** is built from the registry: edges from publishers to subscribers, cycle detection, topological sort
4. If a node's `init()` returns `Err`, that node enters Error state and is excluded from ticking. Other nodes continue.
5. The main tick loop begins
6. 
After the first tick, the graph is rebuilt once if new topic registrations were detected (for topics created lazily during `tick()`) This lazy initialization means you can configure nodes in any order and the scheduler resolves dependencies at startup. ### Shutdown When the scheduler receives a stop signal (Ctrl+C, SIGINT, SIGTERM, or `.stop()`): 1. The main loop exits 2. `shutdown()` is called on every node in **reverse order** — last-added first 3. RT threads are given 3 seconds to exit cleanly; stalled threads are detached 4. Errors during shutdown are logged but don't prevent other nodes from cleaning up 5. Shared memory is cleaned up Reverse-order shutdown ensures dependent nodes stop before their dependencies. The motor controller (order 1) shuts down before the sensor (order 0) that feeds it. ## Configuring the Scheduler ### Builder Methods `Scheduler::new()` creates a minimal scheduler with capability detection (~30–100 µs) but no OS-level features applied. Use builder methods to opt in: | Builder | What it enables | When to use | |---------|----------------|-------------| | `.tick_rate(freq)` | Global tick rate (default: 100 Hz) | Always — match your fastest node | | `.watchdog(duration)` | Frozen node detection with graduated degradation | Production robots | | `.blackbox(size_mb)` | Flight recorder for post-mortem debugging | Production, testing | | `.max_deadline_misses(n)` | Emergency stop threshold (default: 100) | Safety-critical systems | | `.prefer_rt()` | Request RT features, degrade gracefully if unavailable | Most production deployments | | `.require_rt()` | Require RT features — **panics** if unavailable | Hard real-time systems | | `.deterministic(true)` | Dependency-driven execution order for reproducible results | Simulation, testing | | `.verbose(false)` | Suppress non-emergency logging | Quieter production | | `.with_recording()` | Enable session record/replay | Data collection | | `.cores(&[0, 1])` | Pin scheduler to specific CPU cores | 
Dedicated-core deployments |

### Adding Nodes

The fluent builder API configures each node:

### Per-Node Configuration Reference

| Method | Description |
|--------|-------------|
| `.order(n)` | Execution order — lower runs first. 0–9 critical, 10–49 high, 50–99 normal, 100–199 low, 200+ background |
| `.rate(freq)` | Per-node tick rate. Auto-derives budget (80% of period) and deadline (95%). Auto-marks as RT |
| `.budget(duration)` | Max expected tick duration. Auto-marks as RT |
| `.deadline(duration)` | Hard deadline — `Miss` policy fires when exceeded. Auto-marks as RT |
| `.on_miss(miss)` | Deadline miss policy: `Warn`, `Skip`, `SafeMode`, `Stop` (default: `Warn`) |
| `.compute()` | Runs on thread pool — for CPU-heavy work |
| `.on("topic")` | Event-driven — ticks only when named topic receives a message |
| `.async_io()` | Runs on tokio runtime — for network/file I/O |
| `.priority(n)` | OS thread priority (SCHED_FIFO 1–99) — requires RT capabilities |
| `.core(cpu_id)` | Pin to specific CPU core |
| `.watchdog(duration)` | Per-node watchdog timeout (overrides global) |
| `.failure_policy(policy)` | What to do when `tick()` panics: `Fatal`, `Restart`, `Skip`, `Ignore` |
| `.build()` | Finalize and register — returns `Result` |

## Execution Classes

The scheduler assigns each node an **execution class** based on its configuration. Each class uses a different executor optimized for its workload:

Figure caption: "Five execution classes — auto-detected from node configuration"

| Class | Thread model | Timing | Best for |
|-------|-------------|--------|----------|
| **BestEffort** | Sequential in main loop | No guarantees | Logging, telemetry, display |
| **Rt** | Dedicated thread per node | Budget + deadline enforced | Motor control, safety, sensor fusion |
| **Compute** | Shared thread pool | None — CPU time only | Path planning, SLAM, image processing |
| **Event** | Sleeps until topic message | None — latency from publish to tick | Emergency stop, command handlers |
| **AsyncIo** | Tokio runtime | None — I/O bound | Cloud upload, HTTP, database |

### Deferred Finalization

Execution class inference happens at `.build()` time, not when individual methods are called. This means `.rate(100_u64.hz()).compute()` and `.compute().rate(100_u64.hz())` behave identically: the order of the calls does not matter, and a fixed precedence decides between mutually exclusive setters. Specifically:

- `.rate()` / `.budget()` / `.deadline()` → Rt
- `.compute()` → Compute (overrides Rt)
- `.on(topic)` → Event (overrides Rt and Compute)
- `.async_io()` → AsyncIo (overrides everything)

If none are set, the default is BestEffort.

## Real-Time Enforcement

### Budget and Deadline Auto-Derivation

When you set `.rate(freq)`, the scheduler auto-derives:

- **Budget** = 80% of the period (e.g., 1000 Hz → 800 µs budget)
- **Deadline** = 95% of the period (e.g., 1000 Hz → 950 µs deadline)

You can override either with explicit `.budget()` or `.deadline()`. 
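The derivation above is plain arithmetic on the tick period. The sketch below reproduces it with `std::time::Duration` so the numbers can be checked; `period`, `default_budget`, and `default_deadline` are hypothetical helper names chosen for this example and are not the HORUS `Frequency` API.

```rust
use std::time::Duration;

/// Length of one tick for a rate given in whole hertz.
/// (Hypothetical helper; integer math keeps the result exact.)
pub fn period(rate_hz: u64) -> Duration {
    assert!(rate_hz > 0, "rate must be non-zero");
    Duration::from_nanos(1_000_000_000 / rate_hz)
}

/// Budget = 80% of the period.
pub fn default_budget(rate_hz: u64) -> Duration {
    period(rate_hz) * 80 / 100
}

/// Deadline = 95% of the period.
pub fn default_deadline(rate_hz: u64) -> Duration {
    period(rate_hz) * 95 / 100
}
```

For a 1000 Hz node the period is 1 ms, so these helpers give an 800 µs budget and a 950 µs deadline, matching the examples in the list above.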
### The Miss Enum When a node exceeds its deadline, the `Miss` policy fires: | Policy | What happens | Use when | |--------|-------------|----------| | `Miss::Warn` | Log warning, continue normally | Non-critical nodes (default) | | `Miss::Skip` | Skip this node's next tick to recover | High-frequency nodes that can afford one skipped cycle | | `Miss::SafeMode` | Call `enter_safe_state()` on the node | Motor controllers, actuators | | `Miss::Stop` | Stop the entire scheduler | Safety monitor, last resort | ### Graduated Watchdog When `.watchdog(duration)` is set, the scheduler monitors every node with a graduated response: | Timeout | Health State | Response | |---------|-------------|----------| | 1× watchdog | Warning | Log warning | | 2× watchdog | Unhealthy | Skip tick, log error | | 3× watchdog on critical node | Isolated | Remove from tick loop, call `enter_safe_state()` | This prevents a single frozen node from stalling the entire system. ### `.require_rt()` vs `.prefer_rt()` | Method | RT scheduling | Memory locking | CPU affinity | On failure | |--------|--------------|----------------|--------------|------------| | `.prefer_rt()` | Tries SCHED_FIFO | Tries mlockall | Tries isolated CPUs | Logs degradation, continues | | `.require_rt()` | Requires SCHED_FIFO | Requires mlockall | Requires isolated CPUs | **Panics** | Use `.prefer_rt()` for most deployments — it applies what it can and logs what it can't. Use `.require_rt()` only for hard real-time systems where degraded mode is unacceptable. ## Running the Scheduler ### Continuous Mode ### Single-Tick Mode (Testing/Simulation) Execute exactly one tick cycle, then return: This is invaluable for testing — create a scheduler, add nodes, call `tick_once()`, and assert on topic output. No threads, no timing, fully deterministic. 
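The shape of a single-tick test can be sketched without the HORUS crate. Everything below is a stand-in written for illustration: the `Node` trait here has only `tick()`, and `MiniScheduler`, `Recorder`, and the shared log are hypothetical substitutes for the real scheduler and topics. It shows the pattern only: add nodes, step once, assert on what they produced.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a node: one unit of work per call. (Not the HORUS trait.)
pub trait Node {
    fn tick(&mut self);
}

/// Stand-in scheduler: runs nodes sequentially, lowest order first.
pub struct MiniScheduler {
    nodes: Vec<(u32, Box<dyn Node>)>,
}

impl MiniScheduler {
    pub fn new() -> Self {
        MiniScheduler { nodes: Vec::new() }
    }

    pub fn add(&mut self, order: u32, node: Box<dyn Node>) {
        self.nodes.push((order, node));
        // Stable sort: nodes with equal order keep insertion order.
        self.nodes.sort_by_key(|entry| entry.0);
    }

    /// Execute exactly one cycle, then return. No threads, no sleeping.
    pub fn tick_once(&mut self) {
        for (_, node) in self.nodes.iter_mut() {
            node.tick();
        }
    }
}

/// Test node that appends its name to a shared log on every tick.
pub struct Recorder {
    name: String,
    log: Arc<Mutex<Vec<String>>>,
}

impl Recorder {
    pub fn new(name: &str, log: Arc<Mutex<Vec<String>>>) -> Self {
        Recorder { name: name.to_string(), log }
    }
}

impl Node for Recorder {
    fn tick(&mut self) {
        self.log.lock().unwrap().push(self.name.clone());
    }
}
```

A test built on this stand-in adds a `Recorder` named "motor" at order 10 and one named "sensor" at order 0, calls `tick_once()`, and asserts that the log reads sensor first, then motor.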
### Deterministic Mode

Enable dependency-driven execution order for reproducible results:

In deterministic mode, the scheduler uses a dependency graph (inferred from topic connections) to determine execution order instead of relying on OS thread scheduling. This guarantees identical results across runs — essential for simulation and regression testing.

### Duration-Limited

## DurationExt and Frequency

HORUS provides ergonomic extension methods for creating `Duration` and `Frequency` values:

`Frequency` validates at construction — `0_u64.hz()`, `f64::NAN.hz()`, `f64::INFINITY.hz()`, and negative values all panic immediately rather than producing silent bugs later.

A `Frequency` auto-derives timing parameters:

- `.period()` → `Duration` (1/freq)
- `.budget_default()` → 80% of period
- `.deadline_default()` → 95% of period

## Recording and Replay

Enable recording through the builder API:

Replay a recorded session:

## Performance Monitoring

### Programmatic

### CLI Tools

```bash
horus monitor --tui         # Interactive dashboard with per-node timing
horus node list             # List all registered nodes
horus node info MotorCtrl   # Detailed stats for a specific node
```

## Design Decisions

**Why a single Scheduler instead of per-node threads?** Per-node threads give each node independence, but they lose deterministic ordering. If the sensor and motor controller run on separate threads, you can't guarantee the sensor publishes before the motor controller reads. The scheduler gives you the best of both: sequential ordering for BestEffort nodes (deterministic), dedicated threads for RT nodes (timing guarantees), and thread pools for Compute nodes (parallelism) — all coordinated.

**Why auto-detect RT instead of requiring explicit configuration?** Developers think "this node needs to run at 1 kHz," not "this node needs SCHED_FIFO priority 80 on an isolated CPU core." Auto-detection from `.rate()`, `.budget()`, and `.deadline()` maps intent to implementation. 
The developer describes *what* the node needs; the scheduler figures out *how* to deliver it. **Why graduated watchdog instead of instant kill?** A single late tick might be a transient spike (GC in a Python node, page fault). Killing the node immediately is too aggressive. Graduated response (warn → skip → isolate) gives transient problems time to resolve while still catching truly frozen nodes. Only after 3× the watchdog timeout on a critical node does the scheduler isolate it. **Why reverse-order shutdown?** Nodes are typically added in dependency order: sensors → controllers → loggers. Reverse-order shutdown means controllers stop motors before sensors disconnect, and loggers record the shutdown sequence before they themselves stop. This prevents the dangerous scenario where a sensor disconnects while a motor controller is still running with the last received command. **Why lazy initialization?** `init()` runs when `scheduler.run()` is called, not when `scheduler.add()` is called. This means you can configure all nodes, set global scheduler settings, and defer hardware initialization until the system is truly ready to start. It also means the scheduler's clock, RT configuration, and recording state are all finalized before any node initializes. 
## Trade-offs

| Gain | Cost |
|------|------|
| **Deterministic ordering** — nodes always execute in the configured sequence | Nodes with no topic relationship need `.order()` to fix their sequence |
| **Five execution classes** — each workload gets the right executor | More complex scheduler internals (transparent to users) |
| **Automatic RT detection** — `.rate()` implies dedicated thread + timing | Less explicit control (override with `.compute()` or `.on()`) |
| **Budget/deadline enforcement** — catches timing regressions automatically | ~µs measurement overhead per monitored node |
| **Graduated watchdog** — frozen nodes are isolated, not killed instantly | 3× timeout before isolation on critical nodes |
| **Graceful shutdown** — guaranteed `shutdown()` call on all nodes | Nodes must implement `shutdown()` for hardware cleanup |
| **tick_once()** — fully deterministic single-step execution for tests | Not suitable for production (no rate control) |

## See Also

- [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) — Beginner introduction
- [Execution Classes](/concepts/execution-classes) — Deep dive into the 5 classes
- [Scheduler API](/rust/api/scheduler) — Complete API reference with every method signature
- [Nodes — Full Reference](/concepts/core-concepts-nodes) — The components the scheduler runs
- [Safety Monitor](/advanced/safety-monitor) — Watchdog, budget enforcement, graduated degradation
- [Scheduler Configuration](/advanced/scheduler-configuration) — Advanced tuning
- [Python Scheduler API](/python/api/python-bindings) — Scheduler from Python

---

## Execution Classes

Path: /concepts/execution-classes
Description: The 5 execution classes — Rt, Compute, Event, AsyncIo, BestEffort — and when to use each

# Execution Classes

A motor controller that misses its deadline by even a millisecond can cause a robot arm to overshoot and collide with a person. 
A path planner that takes 50 ms of CPU time is completely normal — but if it runs on the same thread as the motor controller, it blocks 50 ticks. A logging node that takes an extra 10 ms is harmless. A cloud uploader that blocks on a network request shouldn't hold up anything.

These are fundamentally different workloads. Running them all the same way — in a single sequential loop — forces every node to compromise. The fast ones wait for the slow ones. The critical ones share a thread with the optional ones. A single slow node can cascade timing failures across the entire system.

HORUS solves this with **execution classes**: five different executors, each optimized for a specific workload type. The scheduler automatically selects the right class based on how you configure the node — you describe what your node needs, and the scheduler figures out how to run it.

## The Five Classes

The scheduler selects the execution class from your node configuration:

- default (no special config) → **BestEffort**: main loop, sequential
- `.rate()` or `.budget()` or `.deadline()` → **Rt**: dedicated thread, timing enforced
- `.compute()` → **Compute**: thread pool, parallel
- `.on(topic)` → **Event**: sleeps until message
- `.async_io()` → **AsyncIo**: Tokio runtime, non-blocking

### BestEffort (Default)

The default class. The scheduler **automatically parallelizes** independent BestEffort nodes using a dependency graph built from topic `send()`/`recv()` metadata. Nodes that share no topics run simultaneously. Nodes with topic dependencies execute in causal order (publisher before subscriber). When no topic metadata exists, nodes fall back to `.order()` tiers.

**How it works**: At startup, the scheduler builds a dependency graph from topic metadata. Independent nodes are dispatched to a thread pool via the **ready-dispatch executor** — each node starts the instant its last dependency finishes, with no barriers. Dependent nodes execute in causal order. The graph is built after `init()` and rebuilt once after the first tick to capture any lazily-registered topics.

**Use for**: Sensors, controllers, planners, logging, telemetry — most nodes. The scheduler automatically determines what can run in parallel.

**Characteristics**:

- Runs at the scheduler's global `tick_rate()`
- Independent nodes execute in parallel (automatic — no configuration needed)
- Dependent nodes execute in causal order (publisher before subscriber)
- `.order()` is optional — used as tiebreaker for nodes with no topic relationship
- Falls back to sequential `.order()` tiers when no topic metadata is available

### Rt (Real-Time)

Each RT node gets a **dedicated thread** with optional OS-level priority scheduling. The scheduler enforces timing budgets and deadlines, and takes action when a node runs too long.

**How it works**: Each RT node runs on its own dedicated thread.
If `.require_rt()` or `.prefer_rt()` is set on the scheduler and the OS supports it, the thread gets `SCHED_FIFO` real-time priority. The scheduler measures every `tick()` call and applies the `Miss` policy when budget or deadline is exceeded. **Additional RT configuration**: **Use for**: Motor control, safety monitoring, sensor fusion — anything where missing a deadline has physical consequences. ### Compute For CPU-heavy work that benefits from parallelism. Multiple Compute nodes run simultaneously on a shared thread pool. **How it works**: Compute nodes are dispatched to a thread pool (similar to `rayon`). Multiple compute nodes can run in parallel on different CPU cores. They don't block the main tick loop or RT threads. **Use for**: Path planning, SLAM, point cloud processing, ML inference on CPU, image processing — any CPU-bound work that takes more than ~1 ms per tick. ### Event Nodes that sleep until a specific topic receives new data. Zero CPU usage when idle. **How it works**: The node's thread sleeps. When any publisher calls `send()` on the named topic, the Event node wakes and `tick()` is called. If multiple messages arrive between wakes, the node ticks once — call `recv()` in a loop inside `tick()` to drain all pending messages. **Use for**: Emergency stop handlers, command receivers, sparse event processors — anything where the node should be completely idle until something specific happens. **Characteristics**: - Zero CPU when no messages arrive - Wake latency: ~microseconds from `send()` to `tick()` - `.order()` still applies: if two event nodes wake simultaneously, lower order runs first - The topic name in `.on("name")` must match a `Topic::new("name")` in another node ### AsyncIo For network or file I/O operations that would block a real-time thread. Runs on a tokio runtime. **How it works**: The node's `tick()` runs via `tokio::task::spawn_blocking` on a tokio-managed thread pool. 
The node can safely block on network requests, file I/O, or database queries without affecting any other node. **Use for**: HTTP/REST API calls, database writes, file logging, cloud telemetry. ### Gpu For nodes that launch CUDA kernels. The scheduler manages a dedicated GPU thread with one CUDA stream per node. Kernels launched in `tick()` execute asynchronously on the GPU; the executor synchronizes after all GPU nodes have launched. **How it works**: The GpuExecutor runs on a dedicated thread with a CUDA context. Each GPU node gets its own CUDA stream. Per tick cycle: (1) all ready nodes call `tick()` which launches GPU kernels non-blocking, (2) all streams synchronize, (3) GPU-side timing is recorded via CUDA events. Inside `tick()`, access the scheduler-managed stream via `horus::gpu_stream()`. **Use for**: Image preprocessing (resize, normalize, color convert), neural network inference, point cloud filtering, any CUDA kernel work. `.gpu()` takes precedence over `.rate()` for execution class -- a GPU node with `.rate(30.hz()).gpu()` runs on the GPU executor at 30Hz, not the RT executor. **Graceful degradation**: If CUDA is not available, the GpuExecutor logs a warning and GPU nodes fall back to BestEffort execution on the main thread. 
## How Classes Are Selected The scheduler selects the execution class based on **which builder methods you call**: | Configuration | Resulting Class | |--------------|-----------------| | (nothing special) | BestEffort | | `.rate()` | Rt (auto-derived budget/deadline) | | `.budget()` | Rt | | `.deadline()` | Rt | | `.rate()` + `.budget()` + `.deadline()` | Rt (explicit overrides) | | `.compute()` | Compute | | `.compute().rate()` | Compute (rate-limited, not RT) | | `.on("topic")` | Event | | `.async_io()` | AsyncIo | | `.async_io().rate()` | AsyncIo (rate-limited, not RT) | | `.gpu()` | Gpu | | `.gpu().rate()` | Gpu (rate-limited, CUDA stream managed) | **Key rule**: `.rate()` only auto-enables RT when no explicit execution class (`.compute()`, `.on()`, `.async_io()`, `.gpu()`) is set. When combined with an explicit class, `.rate()` just limits tick frequency. ### Deferred Finalization Class selection happens at `.build()` time, not when individual methods are called. This means: - `.rate(100_u64.hz()).compute()` → **Compute** (`.compute()` overrides auto-RT) - `.compute().rate(100_u64.hz())` → **Compute** (same result regardless of order) - `.compute().async_io()` → **AsyncIo** (last explicit class wins, warning logged) ## Decision Guide | Your node does... 
| Use | Builder | |-------------------|-----|---------| | Motor control at 500+ Hz | **Rt** | `.rate(500_u64.hz())` | | Safety monitoring with deadlines | **Rt** | `.rate().budget().deadline().on_miss()` | | Path planning (takes 10-50 ms) | **Compute** | `.compute()` | | ML inference on CPU | **Compute** | `.compute()` | | React to emergency stop | **Event** | `.on("emergency.stop")` | | Process commands as they arrive | **Event** | `.on("command")` | | Upload telemetry to cloud | **AsyncIo** | `.async_io()` | | Write logs to disk | **AsyncIo** | `.async_io()` | | Display dashboard updates | **BestEffort** | (default) | | Simple sensor reading | **BestEffort** or **Rt** | `.rate()` if timing matters | ## Validation and Common Mistakes `.build()` validates your configuration and catches mistakes: ### What's Rejected | Configuration | Error | |--------------|-------| | `.compute().budget()` | Budget only meaningful for RT nodes | | `.on("topic").deadline()` | Deadline only meaningful for RT nodes | | `.async_io().budget()` | Budget only meaningful for RT nodes | | `.budget(Duration::ZERO)` | Budget must be > 0 | | `.on("")` | Empty topic — node can never trigger | ### What's Warned | Configuration | Warning | |--------------|---------| | `.compute().async_io()` | Last class wins (AsyncIo), first silently overridden | | `.compute().priority(99)` | Priority ignored on non-RT nodes | | `.on_miss(Miss::Stop)` without deadline | No deadline to miss — policy has no effect | ### Common Mistakes **1. Thinking `.priority()` works on Compute nodes** **2. Setting `.on_miss()` without a deadline** **3. Chaining multiple execution classes** ## Complete Example: Mixed Execution Classes ## Design Decisions **Why 5 classes instead of just RT and non-RT?** A thread-per-node model (RT) is wasteful for logging nodes — dedicating OS threads and SCHED_FIFO slots to telemetry is overkill. 
A single-threaded model (BestEffort) can't handle 50 ms path planning without stalling the control loop. A two-class split (RT vs non-RT) doesn't distinguish between CPU-bound work (Compute), event-driven reactions (Event), and I/O-bound operations (AsyncIo) — each of which has a fundamentally different optimal executor. The five-class model matches the five common robotics workload patterns. **Why auto-detection instead of explicit class selection?** Most developers don't think in terms of "execution classes" — they think "this node needs to run at 1 kHz" or "this node does heavy computation." Auto-detection from `.rate()`, `.compute()`, `.on()`, and `.async_io()` maps intent to the right executor without requiring framework knowledge. If you set `.rate(1000_u64.hz())`, the scheduler knows you need a dedicated real-time thread. You don't have to explicitly request one. **Why does `.rate()` + `.compute()` not become RT?** Because rate-limiting and real-time are different things. A path planner at 10 Hz means "tick at most 10 times per second" — not "this node has a 100 ms deadline that must be enforced." Mixing the two concepts would force compute nodes to pay RT overhead (dedicated threads, timing measurement) for no benefit. The rule is clear: `.rate()` only triggers RT when no explicit class is set. 
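
The selection rule described above can be sketched as a small standalone model: an explicit class always wins, otherwise any timing call implies Rt, otherwise the node is BestEffort. This is only an illustration under stated assumptions. `ExecClass`, `NodeConfig`, and every method on them are hypothetical stand-ins written for this sketch, not the HORUS builder API, and the validation errors and warnings from `.build()` are left out.

```rust
// Standalone model of the class-selection rule; NOT the HORUS API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecClass {
    BestEffort,
    Rt,
    Compute,
    Event,
    AsyncIo,
    Gpu,
}

/// Hypothetical builder state: each call only records intent.
#[derive(Debug, Default, Clone, Copy)]
struct NodeConfig {
    explicit: Option<ExecClass>, // last explicit class call wins
    wants_timing: bool,          // set by rate(), budget(), or deadline()
}

impl NodeConfig {
    fn rate(mut self) -> Self { self.wants_timing = true; self }
    fn budget(mut self) -> Self { self.wants_timing = true; self }
    fn deadline(mut self) -> Self { self.wants_timing = true; self }
    fn compute(mut self) -> Self { self.explicit = Some(ExecClass::Compute); self }
    fn on_topic(mut self) -> Self { self.explicit = Some(ExecClass::Event); self }
    fn async_io(mut self) -> Self { self.explicit = Some(ExecClass::AsyncIo); self }
    fn gpu(mut self) -> Self { self.explicit = Some(ExecClass::Gpu); self }

    /// Deferred finalization: the class is decided once, here,
    /// so the order of the earlier calls does not matter.
    fn build(self) -> ExecClass {
        match (self.explicit, self.wants_timing) {
            (Some(class), _) => class,              // explicit class overrides auto-RT
            (None, true) => ExecClass::Rt,          // timing config alone implies Rt
            (None, false) => ExecClass::BestEffort, // nothing special configured
        }
    }
}

fn main() {
    assert_eq!(NodeConfig::default().build(), ExecClass::BestEffort);
    assert_eq!(NodeConfig::default().rate().build(), ExecClass::Rt);
    assert_eq!(NodeConfig::default().budget().deadline().build(), ExecClass::Rt);
    // Same result regardless of call order.
    assert_eq!(NodeConfig::default().rate().compute().build(), ExecClass::Compute);
    assert_eq!(NodeConfig::default().compute().rate().build(), ExecClass::Compute);
    // Last explicit class wins.
    assert_eq!(NodeConfig::default().compute().async_io().build(), ExecClass::AsyncIo);
    assert_eq!(NodeConfig::default().on_topic().build(), ExecClass::Event);
    assert_eq!(NodeConfig::default().gpu().rate().build(), ExecClass::Gpu);
    println!("all selection rules hold");
}
```

Because the decision is made in one place, the rows of the selection table reduce to a single `match`, which is why `.rate().compute()` and `.compute().rate()` cannot disagree.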
## Trade-offs | Gain | Cost | |------|------| | **Right executor per workload** — each node runs optimally | Must understand which class fits your node | | **Auto-detection** — `.rate()` infers RT without explicit configuration | Less explicit — must know the `.rate()` + `.compute()` interaction | | **RT isolation** — a slow Compute node can't block an RT motor controller | RT nodes consume one OS thread each | | **Event nodes** — zero CPU when idle | Must match `.on("topic")` name exactly to a `Topic::new("topic")` | | **AsyncIo** — I/O never blocks the tick loop | tokio runtime overhead for simple file writes | ## See Also - [Builder Composition Guide](/concepts/builder-composition) — How builder methods interact, override, and compose - [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — How the scheduler manages all execution classes - [Scheduler API](/rust/api/scheduler) — Complete builder method reference - [Scheduler Configuration](/advanced/scheduler-configuration) — Advanced tuning and RT setup - [Choosing Configuration](/concepts/choosing-configuration) — Progressive complexity guide - [Real-Time Control Tutorial](/tutorials/realtime-control) — Hands-on RT tutorial --- ## Message Types Path: /concepts/message-types Description: Standard HORUS message types for robotics applications # Message Types HORUS provides 70+ standard message types for robotics. Pick the right ones for your robot: | I'm building a... 
| Start with these messages | |---|---| | **Mobile robot** | `CmdVel`, `Odometry`, `LaserScan`, `Imu`, `BatteryState` | | **Robot arm** | `JointState`, `JointCommand`, `WrenchStamped`, `TrajectoryPoint` | | **Drone** | `Imu`, `NavSatFix`, `MotorCommand`, `BatteryState`, `Pose3D` | | **Vision system** | `Image`, `Detection`, `PointCloud`, `DepthImage`, `CameraInfo` | | **Multi-robot** | `Pose2D`, `Heartbeat`, `DiagnosticStatus`, `TransformStamped` | | **Teleoperation** | `JoystickInput`, `CmdVel`, `EmergencyStop` | | **Industrial** | `JointState`, `MotorCommand`, `ForceCommand`, `DiagnosticReport` | **Need a custom type?** Use the `message!` macro — it handles serialization and optimization automatically: ```rust // simplified use horus::prelude::*; message! { pub struct MotorFeedback { pub motor_id: u32, pub velocity: f32, pub current_amps: f32, pub temperature_c: f32, } } let topic: Topic = Topic::new("motor.feedback")?; ``` All built-in messages use fixed-size structures with zero-copy shared memory transport (~50ns latency). For zero-copy performance, make your type POD — see [POD Types](/concepts/core-concepts-podtopic) for details. --- ## Typed Messages vs Generic Messages ### Typed Messages (Recommended) Strongly-typed Rust structs — all available via `use horus::prelude::*`: **Benefits:** - **Ultra-fast**: ~50-167ns IPC latency (zero-copy shared memory for POD types) - **Type safety**: Compile-time checks prevent type mismatches - **IDE support**: Autocomplete, type hints, inline documentation - **Cross-language**: Rust and Python see the same typed data ### Generic Messages (Prototyping) Dynamic data for arbitrary structures using `GenericMessage`: `GenericMessage` uses MessagePack serialization with a 4KB maximum payload. It has an inline buffer for small messages (≤256 bytes) and an overflow buffer for larger ones. 
**Tradeoffs:**

- Flexible — any data structure, evolving schemas
- Slower IPC — serialization overhead vs zero-copy POD types
- No compile-time type safety

**Use generic messages** for quick prototypes, external JSON integrations, or truly dynamic schemas. **Default to typed messages** for production code.

### Performance Comparison

| Feature | Typed Messages | Generic Messages |
|---------|---------------|------------------|
| **IPC Latency** | ~50-167ns (POD) | Higher (serialization) |
| **Type Safety** | Compile-time | Runtime only |
| **IDE Support** | Full autocomplete | None |
| **Best For** | Production | Prototyping |

---

## LogSummary Trait

The `LogSummary` trait provides human-readable summaries for logging. It is **not required** for basic `Topic::new()` usage, but it is required if you want logging through the topic's verbose flag (enabled via the TUI monitor).

```rust
// simplified
pub trait LogSummary {
    fn log_summary(&self) -> String;
}
```

### When is LogSummary Used?

- `Topic::new("name")?` — no `LogSummary` required, no logging overhead
- `Topic::new("name")?` with verbose logging enabled — requires `T: LogSummary`, enables logging on send/recv

When logging is active, `log_summary()` is called once per send/recv. Logs appear in the console and in `horus monitor`.
### Deriving LogSummary For most types, derive the trait to get `Debug` formatting automatically: ```rust // simplified use horus::prelude::*; #[derive(Debug, Clone, Serialize, Deserialize, LogSummary)] pub struct RobotState { pub position: [f64; 3], pub velocity: f64, pub battery_level: f32, } // log_summary() outputs: RobotState { position: [1.0, 2.0, 0.0], velocity: 1.5, battery_level: 0.85 } ``` ### Custom Implementation for Large Types For types where `Debug` output would be too large (images, point clouds, scans), implement `LogSummary` manually: ```rust // simplified use horus::prelude::*; impl LogSummary for MyLargeMessage { fn log_summary(&self) -> String { format!("MyMsg({} items, {:.2}MB)", self.count, self.size_mb()) } } ``` **Guidelines:** - Keep summaries concise — they appear inline in logs - Include units (meters, rad/s, %) to make values unambiguous - Log metadata about the message, not the full content ### Built-in LogSummary Implementations LogSummary is implemented for: - Primitive types: `f32`, `f64`, `i32`, `i64`, `u32`, `u64`, `usize`, `bool`, `String` - Messages: `CmdVel`, `CompressedImage`, `CameraInfo`, `RegionOfInterest`, `StereoInfo`, `NavSatFix`, `GenericMessage` - Descriptors: `ImageDescriptor`, `PointCloudDescriptor`, `DepthImageDescriptor`, `Tensor` - Any type that derives `#[derive(LogSummary)]` (uses `Debug` formatting) --- ## Geometry Messages Spatial primitives for position, orientation, and motion. All are POD types. 
### CmdVel Basic 2D velocity command: ```rust // simplified use horus::prelude::*; let cmd = CmdVel::new(1.0, 0.5); // 1.0 m/s forward, 0.5 rad/s rotation let stop = CmdVel::zero(); let cmd = CmdVel::with_timestamp(1.0, 0.5, 123456789); ``` | Field | Type | Description | |-------|------|-------------| | `stamp_nanos` | `u64` | Timestamp in nanoseconds | | `linear` | `f32` | Forward velocity in m/s | | `angular` | `f32` | Rotation velocity in rad/s | ### Twist 3D velocity with linear and angular components: ```rust // simplified use horus::prelude::*; // 2D twist (common for mobile robots) let cmd = Twist::new_2d(1.0, 0.5); // 1.0 m/s forward, 0.5 rad/s rotation // 3D twist let cmd_3d = Twist::new( [1.0, 0.5, 0.0], // Linear velocity [x, y, z] m/s [0.0, 0.0, 0.5] // Angular velocity [roll, pitch, yaw] rad/s ); let stop = Twist::stop(); assert!(cmd.is_valid()); ``` | Field | Type | Description | |-------|------|-------------| | `linear` | `[f64; 3]` | Linear velocity in m/s | | `angular` | `[f64; 3]` | Angular velocity in rad/s | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### Pose2D 2D position and orientation for planar robots: ```rust // simplified use horus::prelude::*; let pose = Pose2D::new(1.0, 2.0, 0.5); // x=1m, y=2m, theta=0.5rad let origin = Pose2D::origin(); let distance = pose.distance_to(&origin); // Normalize angle to [-π, π] let mut pose = Pose2D::new(1.0, 2.0, 3.5); pose.normalize_angle(); ``` | Field | Type | Description | |-------|------|-------------| | `x` | `f64` | X position in meters | | `y` | `f64` | Y position in meters | | `theta` | `f64` | Orientation in radians | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### TransformStamped 3D transformation with translation and rotation: ```rust // simplified use horus::prelude::*; let identity = TransformStamped::identity(); let tf = TransformStamped::new( [1.0, 2.0, 3.0], // Translation [x, y, z] [0.0, 0.0, 0.0, 1.0] // Rotation quaternion [x, y, z, w] ); // From 2D pose let pose2d 
= Pose2D::new(1.0, 2.0, 0.5); let tf = TransformStamped::from_pose_2d(&pose2d); // Normalize quaternion let mut tf = tf; tf.normalize_rotation(); ``` | Field | Type | Description | |-------|------|-------------| | `translation` | `[f64; 3]` | Position in meters | | `rotation` | `[f64; 4]` | Quaternion [x, y, z, w] | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### Point3, Vector3, Quaternion 3D points, vectors, and rotations: ```rust // simplified use horus::prelude::*; // Point let point = Point3::new(1.0, 2.0, 3.0); let distance = point.distance_to(&Point3::origin()); // Vector with operations let vec = Vector3::new(1.0, 0.0, 0.0); let magnitude = vec.magnitude(); let dot = vec.dot(&Vector3::new(0.0, 1.0, 0.0)); let cross = vec.cross(&Vector3::new(0.0, 1.0, 0.0)); // Quaternion let q = Quaternion::identity(); let q = Quaternion::from_euler(0.0, 0.0, std::f64::consts::PI / 2.0); ``` --- ## Sensor Messages Standard sensor data formats. All are POD types. ### LaserScan 2D lidar scan data (up to 360 points): ```rust // simplified use horus::prelude::*; let mut scan = LaserScan::new(); scan.ranges[0] = 5.2; scan.angle_min = -std::f32::consts::PI; scan.angle_max = std::f32::consts::PI; scan.range_min = 0.1; scan.range_max = 30.0; scan.angle_increment = std::f32::consts::PI / 180.0; let angle = scan.angle_at(45); if scan.is_range_valid(0) { println!("Range: {}m", scan.ranges[0]); } let valid = scan.valid_count(); if let Some(min) = scan.min_range() { println!("Closest: {}m", min); } ``` | Field | Type | Description | |-------|------|-------------| | `ranges` | `[f32; 360]` | Range readings in meters (0 = invalid) | | `angle_min` / `angle_max` | `f32` | Scan angle range in radians | | `range_min` / `range_max` | `f32` | Valid range limits in meters | | `angle_increment` | `f32` | Angular resolution in radians | | `time_increment` | `f32` | Time between measurements | | `scan_time` | `f32` | Time to complete scan in seconds | | `timestamp_ns` | `u64` | Nanoseconds 
since epoch | ### Imu Inertial Measurement Unit data: ```rust // simplified use horus::prelude::*; let mut imu = Imu::new(); imu.set_orientation_from_euler(0.1, 0.2, 1.5); // roll, pitch, yaw imu.angular_velocity = [0.1, 0.2, 0.3]; // rad/s imu.linear_acceleration = [0.0, 0.0, 9.81]; // m/s² if imu.has_orientation() { let quat = imu.orientation; } assert!(imu.is_valid()); ``` | Field | Type | Description | |-------|------|-------------| | `orientation` | `[f64; 4]` | Quaternion [x, y, z, w] | | `orientation_covariance` | `[f64; 9]` | 3x3 covariance matrix | | `angular_velocity` | `[f64; 3]` | Gyroscope data in rad/s | | `angular_velocity_covariance` | `[f64; 9]` | 3x3 covariance matrix | | `linear_acceleration` | `[f64; 3]` | Accelerometer data in m/s² | | `linear_acceleration_covariance` | `[f64; 9]` | 3x3 covariance matrix | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### Odometry Combined pose and velocity from wheel encoders or visual odometry: ```rust // simplified use horus::prelude::*; let mut odom = Odometry::new(); odom.set_frames("odom", "base_link"); let pose = Pose2D::new(1.0, 2.0, 0.5); let twist = Twist::new_2d(0.5, 0.2); odom.update(pose, twist); ``` | Field | Type | Description | |-------|------|-------------| | `pose` | `Pose2D` | Current position and orientation | | `twist` | `Twist` | Current velocity | | `pose_covariance` | `[f64; 36]` | 6x6 covariance matrix | | `twist_covariance` | `[f64; 36]` | 6x6 covariance matrix | | `frame_id` | `[u8; 32]` | Reference frame (e.g., "odom") | | `child_frame_id` | `[u8; 32]` | Child frame (e.g., "base_link") | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### Range Single-point distance sensor (ultrasonic, infrared): ```rust // simplified use horus::prelude::*; let range = Range::new(Range::ULTRASONIC, 1.5); let mut range = Range::new(Range::INFRARED, 0.8); range.min_range = 0.02; range.max_range = 4.0; range.field_of_view = 0.1; ``` | Field | Type | Description | 
|-------|------|-------------| | `sensor_type` | `u8` | `Range::ULTRASONIC` (0) or `Range::INFRARED` (1) | | `field_of_view` | `f32` | Sensor FOV in radians | | `min_range` / `max_range` | `f32` | Valid range limits in meters | | `range` | `f32` | Distance reading in meters | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### BatteryState Battery status and charge information: ```rust // simplified use horus::prelude::*; let mut battery = BatteryState::new(12.6, 75.0); // 12.6V, 75% charge battery.current = -2.5; battery.temperature = 28.5; battery.power_supply_status = BatteryState::STATUS_DISCHARGING; if battery.is_low(20.0) { println!("Battery low!"); } if battery.is_critical() { // Below 10% println!("Battery critical!"); } if let Some(time_left) = battery.time_remaining() { println!("Time remaining: {}s", time_left); } ``` | Field | Type | Description | |-------|------|-------------| | `voltage` | `f32` | Battery voltage in volts | | `current` | `f32` | Current in amperes (negative = discharging) | | `charge` | `f32` | Remaining charge in Ah | | `capacity` | `f32` | Total capacity in Ah | | `percentage` | `f32` | Charge percentage (0-100) | | `power_supply_status` | `u8` | `STATUS_UNKNOWN` (0), `STATUS_CHARGING` (1), `STATUS_DISCHARGING` (2), `STATUS_FULL` (3) | | `temperature` | `f32` | Temperature in °C | | `cell_voltages` | `[f32; 16]` | Individual cell voltages | | `cell_count` | `u8` | Number of cells | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### NavSatFix GPS position data: | Field | Type | Description | |-------|------|-------------| | `latitude` / `longitude` / `altitude` | `f64` | WGS84 coordinates | | `position_covariance` | `[f64; 9]` | 3x3 covariance matrix | | `status` | `u8` | Fix status | | `satellites_visible` | `u16` | Number of satellites | | `hdop` / `vdop` | `f32` | Dilution of precision | | `speed` / `heading` | `f32` | Speed (m/s) and heading (rad) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | --- ## Control 
Messages Actuator commands and control parameters. All are POD types. ### MotorCommand Direct motor control: | Field | Type | Description | |-------|------|-------------| | `motor_id` | `u32` | Motor identifier | | `mode` | `f32` | Control mode | | `target` | `f32` | Target value | | `max_velocity` / `max_acceleration` | `f32` | Limits | | `feed_forward` | `f32` | Feed-forward term | | `enable` | `u8` | Enable flag | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### DifferentialDriveCommand Differential drive control (left/right wheels): | Field | Type | Description | |-------|------|-------------| | `left_velocity` / `right_velocity` | `f32` | Wheel velocities in m/s | | `max_acceleration` | `f32` | Acceleration limit | | `enable` | `u8` | Enable flag | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### ServoCommand Servo position/velocity control: | Field | Type | Description | |-------|------|-------------| | `servo_id` | `u32` | Servo identifier | | `position` / `speed` | `f32` | Target position and speed | | `enable` | `u8` | Enable flag | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### JointCommand Multi-joint position/velocity/effort (up to 16 joints): | Field | Type | Description | |-------|------|-------------| | `joint_names` | `[[u8; 32]; 16]` | Joint names | | `joint_count` | `u8` | Number of active joints | | `positions` / `velocities` / `efforts` | `[f64; 16]` | Joint commands | | `modes` | `[u8; 16]` | Control mode per joint | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### PidConfig PID controller parameters: | Field | Type | Description | |-------|------|-------------| | `controller_id` | `u32` | Controller identifier | | `kp` / `ki` / `kd` | `f64` | PID gains | | `integral_limit` / `output_limit` | `f64` | Limits | | `anti_windup` | `u8` | Anti-windup flag | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### TrajectoryPoint Single point in a trajectory: | Field | Type | Description | 
|-------|------|-------------| | `position` / `velocity` / `acceleration` | `[f64; 3]` | 3D motion | | `orientation` | `[f64; 4]` | Quaternion [x, y, z, w] | | `angular_velocity` | `[f64; 3]` | Angular velocity | | `time_from_start` | `f64` | Time offset in seconds | --- ## Vision Messages Image and camera data types. ### Image RAII image type with zero-copy shared memory backing. Pixel data lives in shared memory — only a lightweight descriptor is transmitted through topics. ```rust // simplified use horus::prelude::*; // Note: args are (height, width, encoding) let mut img = Image::new(480, 640, ImageEncoding::Rgb8)?; img.set_pixel(100, 200, &[255, 0, 0]); // Set pixel at (x=100, y=200) let pixels: &[u8] = img.data(); // Zero-copy access ``` **Accessor methods:** | Method | Returns | Description | |--------|---------|-------------| | `width()` | `u32` | Image width in pixels | | `height()` | `u32` | Image height in pixels | | `encoding()` | `ImageEncoding` | Pixel format | | `channels()` | `u32` | Number of channels | | `step()` | `u32` | Row stride in bytes | | `data()` / `data_mut()` | `&[u8]` / `&mut [u8]` | Zero-copy pixel data | | `pixel(x, y)` | `Option<&[u8]>` | Get a single pixel | | `set_pixel(x, y, val)` | `&mut Self` | Set a single pixel | | `copy_from(buf)` | `&mut Self` | Copy pixel data from buffer | | `fill(val)` | `&mut Self` | Fill entire image | | `roi(x, y, w, h)` | `Option>` | Extract region of interest | | `frame_id()` | `&str` | Camera frame | | `timestamp_ns()` | `u64` | Nanoseconds since epoch | **Supported encodings:** Mono8, Mono16, Rgb8, Bgr8, Rgba8, Bgra8, Yuv422, Mono32F, Rgb32F, BayerRggb8, Depth16 ### CompressedImage JPEG/PNG compressed images (variable-size, not POD): | Field | Type | Description | |-------|------|-------------| | `format` | `[u8; 8]` | Compression format string | | `data` | `Vec` | Compressed image data | | `width` / `height` | `u32` | Image dimensions | | `frame_id` | `[u8; 32]` | Camera frame | | `timestamp_ns` 
| `u64` | Nanoseconds since epoch |

### CameraInfo

Camera calibration parameters (POD type):

| Field | Type | Description |
|-------|------|-------------|
| `width` / `height` | `u32` | Image dimensions |
| `distortion_model` | `[u8; 16]` | Distortion model name |
| `distortion_coefficients` | `[f64; 8]` | Distortion coefficients |
| `camera_matrix` | `[f64; 9]` | 3x3 intrinsic matrix |
| `rectification_matrix` | `[f64; 9]` | 3x3 rectification matrix |
| `projection_matrix` | `[f64; 12]` | 3x4 projection matrix |
| `frame_id` | `[u8; 32]` | Camera frame |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

### StereoInfo

Stereo camera parameters (POD type):

| Field | Type | Description |
|-------|------|-------------|
| `left_camera` / `right_camera` | `CameraInfo` | Per-camera calibration |
| `baseline` | `f64` | Baseline distance in meters |
| `depth_scale` | `f64` | Depth scaling factor |

Methods: `depth_from_disparity()`, `disparity_from_depth()`

---

## Large Data (Zero-Copy)

For most use cases, use the high-level domain types (`Image`, `PointCloud`, `DepthImage`) — they use zero-copy shared memory transport automatically and provide domain-specific convenience methods like `pixel()`, `point_at()`, and `get_depth()`.

```rust
// simplified
use horus::prelude::*;

// Create an image (backed by shared memory automatically)
let mut img = Image::new(1080, 1920, ImageEncoding::Rgb8)?;
// ... fill pixels via img.data_mut() or img.set_pixel() ...

// The topic is typed on the message it carries
let topic: Topic<Image> = Topic::new("camera.rgb")?;
topic.send(&img);

// Receiver gets zero-copy access
if let Some(img) = topic.recv() {
    println!("{}x{} {:?}", img.width(), img.height(), img.encoding());
}
```

Only a lightweight descriptor is transmitted through topics while the actual data stays in shared memory.

For low-level tensor transport (ML inference, custom pipelines), `Topic` provides direct access to the same zero-copy shared memory path with raw shape/dtype control.

---

## Detection Messages

Object detection results.
All are POD types.

### BoundingBox2D / BoundingBox3D

2D and 3D bounding boxes:

| Type | Fields |
|------|--------|
| `BoundingBox2D` | `x`, `y`, `width`, `height` (all `f32`) |
| `BoundingBox3D` | `cx`, `cy`, `cz` (center), `length`, `width`, `height`, `roll`, `pitch`, `yaw` (all `f32`) |

### Detection / Detection3D

2D and 3D object detections:

| Field | Type | Description |
|-------|------|-------------|
| `bbox` | `BoundingBox2D` or `BoundingBox3D` | Bounding box |
| `confidence` | `f32` | Detection confidence |
| `class_id` | `u32` | Class identifier |
| `class_name` | `[u8; 32]` | Class name string |
| `instance_id` | `u32` | Instance identifier |

`Detection3D` also includes `velocity: [f32; 3]`.

---

## Perception Messages

3D perception data types.

### PointCloud

RAII point cloud type with zero-copy shared memory backing. Point data lives in shared memory — only a lightweight descriptor is transmitted through topics.

```rust
// simplified
use horus::prelude::*;

// `points` is the caller's point buffer (its definition is not shown here)
let cloud = PointCloud::from_xyz(&points)?; // 1000 XYZ points
let cloud = PointCloud::from_xyz(&points)?; // 1000 XYZRGB points (6 fields per point)
```

**Accessor methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `point_count()` | `u64` | Number of points |
| `fields_per_point()` | `u32` | Floats per point (3=XYZ, 4=XYZI, 6=XYZRGB) |
| `dtype()` | `TensorDtype` | Data type of components |
| `is_xyz()` | `bool` | Whether this is a plain XYZ cloud |
| `has_intensity()` | `bool` | Whether cloud has intensity |
| `has_color()` | `bool` | Whether cloud has color |
| `data()` / `data_mut()` | `&[u8]` / `&mut [u8]` | Zero-copy point data |
| `point_at(idx)` | `Option<&[u8]>` | Get the i-th point as bytes |
| `extract_xyz()` | `Option>` | Extract all XYZ coordinates |
| `copy_from(buf)` | `&mut Self` | Copy point data from buffer |
| `frame_id()` | `&str` | Reference frame |
| `timestamp_ns()` | `u64` | Nanoseconds since epoch |

### Fixed-Size Point Types (POD)

For zero-copy point cloud processing:

| Type | Fields | Description |
|------|--------|-------------|
| `PointXYZ` | `x`, `y`, `z` (`f32`) | 3D point |
| `PointXYZRGB` | `x`, `y`, `z` (`f32`), `r`, `g`, `b`, `a` (`u8`) | Colored point |
| `PointXYZI` | `x`, `y`, `z`, `intensity` (`f32`) | Point with intensity |

### DepthImage

RAII depth image type with zero-copy shared memory backing. Supports both F32 (meters) and U16 (millimeters) formats.
```rust
// simplified
use horus::prelude::*;

let mut depth = DepthImage::meters(480, 640)?; // F32 meters
depth.set_depth(100, 200, 1.5); // Set depth at (x=100, y=200)
```

**Accessor methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `width()` / `height()` | `u32` | Image dimensions |
| `dtype()` | `TensorDtype` | Data type (F32 or U16) |
| `is_meters()` | `bool` | Whether F32 depth in meters |
| `is_millimeters()` | `bool` | Whether U16 depth in mm |
| `depth_scale()` | `f32` | Depth scale factor |
| `data()` / `data_mut()` | `&[u8]` / `&mut [u8]` | Zero-copy depth data |
| `get_depth(x, y)` | `Option<f32>` | Get depth at pixel (meters) |
| `set_depth(x, y, val)` | `&mut Self` | Set depth at pixel |
| `get_depth_u16(x, y)` | `Option<u16>` | Get raw U16 depth |
| `depth_statistics()` | `(f32, f32, f32)` | (min, max, mean) in meters |
| `frame_id()` | `&str` | Reference frame |
| `timestamp_ns()` | `u64` | Nanoseconds since epoch |

---

## Navigation Messages

Path planning and navigation types.
### Goal / GoalResult

Navigation goal and result (both POD):

```rust
// simplified
use horus::prelude::*;

let goal = Goal::new(Pose2D::new(5.0, 3.0, 0.0), 0.1, 0.05); // pose, position_tol, angle_tol
let goal = goal.with_timeout(30.0).with_priority(1);

if goal.is_reached(&current_pose) {
    println!("Goal reached!");
}
```

**Goal fields:** `target_pose` (Pose2D), `tolerance_position`, `tolerance_angle`, `timeout_seconds` (f64), `priority` (u8), `goal_id` (u32), `timestamp_ns` (u64)

**GoalResult fields:** `goal_id` (u32), `status` (u8), `distance_to_goal` (f64), `eta_seconds` (f64), `progress` (f32), `error_message` ([u8; 64]), `timestamp_ns` (u64)

**GoalStatus values:** Pending, Active, Succeeded, Aborted, Cancelled, Preempted, TimedOut

### Waypoint / Path

Waypoints and paths (both POD):

```rust
// simplified
use horus::prelude::*;

let wp = Waypoint::new(Pose2D::new(1.0, 2.0, 0.0));
let wp = wp.with_velocity(Twist::new_2d(0.5, 0.0)).with_stop();

let mut path = Path::new();
path.add_waypoint(wp);
```

**Path** holds up to 256 waypoints with fields for `total_length`, `duration_seconds`, `frame_id`, `algorithm`, and `timestamp_ns`.

### PathPlan

Compact planned path (POD, fixed-size):

| Field | Type | Description |
|-------|------|-------------|
| `waypoint_data` | `[f32; 768]` | 256 waypoints × 3 floats (x, y, theta) |
| `goal_pose` | `[f32; 3]` | Target pose |
| `waypoint_count` | `u16` | Number of waypoints |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

### OccupancyGrid

2D occupancy map (variable-size, not POD — uses serialization):

```rust
// simplified
use horus::prelude::*;

let mut grid = OccupancyGrid::new(200, 200, 0.05, Pose2D::origin()); // 200x200, 5cm resolution

if let Some((gx, gy)) = grid.world_to_grid(1.0, 2.0) {
    grid.set_occupancy(gx, gy, 100); // Mark occupied (grid coords)
}

// is_free/is_occupied take world coordinates directly
if grid.is_free(1.0, 2.0) { /* navigable */ }
if grid.is_occupied(1.0, 2.0) { /* blocked */ }
```

Values: -1 = unknown, 0 = free, 100 = occupied.
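
The split between grid coordinates (`set_occupancy`) and world coordinates (`is_free`, `is_occupied`) can be illustrated without the HORUS crate. The following is a minimal, self-contained model and not the HORUS implementation: `SketchGrid`, its fields, and the assumption that the origin is the unrotated lower-left corner of cell (0, 0) are hypothetical. It only shows how a world position could map to a cell index at a given resolution, using the -1 / 0 / 100 value convention described above.

```rust
/// Hypothetical stand-in for an occupancy grid; not the HORUS type.
struct SketchGrid {
    width: usize,
    height: usize,
    resolution: f64,    // meters per cell
    origin: (f64, f64), // world position of cell (0, 0)'s corner (assumed, no rotation)
    cells: Vec<i8>,     // -1 = unknown, 0 = free, 100 = occupied
}

impl SketchGrid {
    fn new(width: usize, height: usize, resolution: f64, origin: (f64, f64)) -> Self {
        // Every cell starts as unknown (-1).
        Self { width, height, resolution, origin, cells: vec![-1; width * height] }
    }

    /// Map a world position (meters) to a cell index, or None if outside the map.
    fn world_to_grid(&self, x: f64, y: f64) -> Option<(usize, usize)> {
        let gx = ((x - self.origin.0) / self.resolution).floor();
        let gy = ((y - self.origin.1) / self.resolution).floor();
        // Reject NaN and anything left of or below the origin.
        if !(gx >= 0.0 && gy >= 0.0) {
            return None;
        }
        let (gx, gy) = (gx as usize, gy as usize);
        (gx < self.width && gy < self.height).then_some((gx, gy))
    }

    /// Write a cell value using grid coordinates; out-of-range writes are ignored.
    fn set_occupancy(&mut self, gx: usize, gy: usize, value: i8) {
        if gx < self.width && gy < self.height {
            self.cells[gy * self.width + gx] = value;
        }
    }

    /// Read a cell value using world coordinates.
    fn value_at(&self, x: f64, y: f64) -> Option<i8> {
        self.world_to_grid(x, y).map(|(gx, gy)| self.cells[gy * self.width + gx])
    }

    fn is_free(&self, x: f64, y: f64) -> bool {
        self.value_at(x, y) == Some(0)
    }

    fn is_occupied(&self, x: f64, y: f64) -> bool {
        self.value_at(x, y) == Some(100)
    }
}

fn main() {
    let mut grid = SketchGrid::new(200, 200, 0.05, (0.0, 0.0));
    if let Some((gx, gy)) = grid.world_to_grid(1.02, 2.03) {
        grid.set_occupancy(gx, gy, 100);
        println!("marked cell ({gx}, {gy}) occupied");
    }
    println!("occupied at (1.02, 2.03): {}", grid.is_occupied(1.02, 2.03));
    println!("free at (0.01, 0.01): {}", grid.is_free(0.01, 0.01));
}
```

In this model an unknown cell is neither free nor occupied, so a planner that treats "not occupied" as "navigable" would behave differently from one that requires `is_free`. Whether HORUS makes the same distinction is not stated here.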
### CostMap Cost map for path planning (variable-size, not POD): ```rust // simplified let costmap = CostMap::from_occupancy_grid(grid, 0.55); // 55cm inflation radius let cost = costmap.cost(1.0, 2.0); // Get cost at world coordinates ``` ### VelocityObstacle / VelocityObstacles For velocity obstacle-based collision avoidance (both POD). `VelocityObstacles` holds up to 32 velocity obstacles. --- ## Diagnostics Messages Health monitoring and safety. All are POD types. ### Heartbeat Node liveness signal: | Field | Type | Description | |-------|------|-------------| | `node_name` | `[u8; 32]` | Node name | | `node_id` | `u32` | Node identifier | | `sequence` | `u64` | Sequence number | | `alive` | `u8` | Alive flag | | `uptime` | `f64` | Uptime in seconds | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### NodeHeartbeat Detailed per-node health status: | Field | Type | Description | |-------|------|-------------| | `state` / `health` | `u8` | Node state and health | | `tick_count` | `u32` | Total ticks executed | | `target_rate` / `actual_rate` | `f32` | Expected vs actual tick rate | | `error_count` | `u32` | Error counter | | `last_tick_timestamp` / `heartbeat_timestamp` | `u64` | Timestamps | ### Status General status report: | Field | Type | Description | |-------|------|-------------| | `level` | `u8` | Severity level | | `code` | `u32` | Status code | | `message` | `[u8; 128]` | Status message | | `component` | `[u8; 32]` | Component name | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### EmergencyStop Emergency stop signal: | Field | Type | Description | |-------|------|-------------| | `engaged` | `u8` | E-stop engaged flag | | `reason` | `[u8; 64]` | Reason string | | `source` | `[u8; 32]` | Source of e-stop | | `auto_reset` | `u8` | Auto-reset flag | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### SafetyStatus Safety system state: | Field | Type | Description | |-------|------|-------------| | `enabled`, `estop_engaged`, 
`watchdog_ok`, `limits_ok`, `comms_ok` | `u8` | Status flags | | `mode` | `u8` | Safety mode | | `fault_code` | `u32` | Fault code | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### ResourceUsage CPU/memory monitoring: | Field | Type | Description | |-------|------|-------------| | `cpu_percent` / `memory_percent` / `disk_percent` | `f32` | Usage percentages | | `memory_bytes` / `disk_bytes` | `u64` | Usage in bytes | | `network_tx_bytes` / `network_rx_bytes` | `u64` | Network traffic | | `temperature` | `f32` | Temperature in °C | | `thread_count` | `u32` | Thread count | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### DiagnosticValue / DiagnosticReport Key-value diagnostics: `DiagnosticValue` holds a single key-value pair. `DiagnosticReport` groups up to 16 `DiagnosticValue` entries with a component name and severity level. --- ## Force/Haptics Messages Force sensing and haptic feedback. All are POD types. | Message | Description | |---------|-------------| | `WrenchStamped` | Force/torque measurement with point of application | | `ImpedanceParameters` | Impedance control parameters (stiffness, damping, inertia) | | `ForceCommand` | Force control command with target force/torque | | `ContactInfo` | Contact detection state, force, normal, and point | | `HapticFeedback` | Haptic output command (vibration, force feedback) | --- ## Input Messages Human input devices. All are POD types. | Message | Description | |---------|-------------| | `JoystickInput` | Gamepad/joystick state (buttons, axes, hats) | | `KeyboardInput` | Keyboard key events with modifier flags | --- ## Segmentation, Landmark, and Tracking Messages Computer vision pipeline types. All are POD types. 
| Message | Description |
|---------|-------------|
| `SegmentationMask` | Semantic/instance/panoptic segmentation mask descriptor |
| `Landmark` / `Landmark3D` | 2D/3D keypoints with visibility |
| `LandmarkArray` | Set of landmarks (supports COCO, MediaPipe Pose/Hand/Face presets) |
| `TrackedObject` | Tracked object with bbox, velocity, age, and state |
| `TrackingHeader` | Tracking frame header with active track count |

---

## Custom Messages

### Basic Custom Message

```rust
// simplified
use serde::{Serialize, Deserialize};
use horus::prelude::*;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RobotStatus {
    pub battery_level: f32,
    pub temperature: f32,
    pub error_code: u32,
    pub timestamp_ns: u64,
}

let topic: Topic<RobotStatus> = Topic::new("robot_status")?;
topic.send(RobotStatus {
    battery_level: 75.0,
    temperature: 42.0,
    error_code: 0,
    timestamp_ns: timestamp_now(),
});
```

### POD Custom Message (Zero-Copy)

For maximum performance, make your type POD-compatible:

```rust
// simplified
use horus::prelude::*;

message! {
    MotorFeedback {
        timestamp_ns: u64,
        motor_id: u32,
        velocity: f32,
        current_amps: f32,
        temperature_c: f32,
    }
}

// Ready to use with Topic<MotorFeedback> — zero-copy automatically
let topic: Topic<MotorFeedback> = Topic::new("motor.feedback")?;
```

See [POD Types](/concepts/core-concepts-podtopic) for full requirements.

### Adding LogSummary

To enable message logging when a topic's verbose flag is set (via the TUI monitor), implement `LogSummary`:

```rust
// simplified
use serde::{Serialize, Deserialize};
use horus::prelude::*;

// Option 1: Derive (uses Debug formatting)
#[derive(Debug, Clone, Serialize, Deserialize, LogSummary)]
pub struct SmallMessage { /* ... */ }

// Option 2: Manual (for large types)
impl LogSummary for LargeMessage {
    fn log_summary(&self) -> String {
        format!("LargeMsg({} items)", self.count)
    }
}
```

---

## Working with Messages in Nodes

### Publishing

```rust
// simplified
use horus::prelude::*;

struct LidarNode {
    scan_pub: Topic<LaserScan>,
}

impl Node for LidarNode {
    fn name(&self) -> &str { "LidarNode" }

    fn tick(&mut self) {
        let mut scan = LaserScan::new();
        scan.ranges[0] = 5.2;
        self.scan_pub.send(scan);
    }
}
```

### Subscribing

```rust
// simplified
use horus::prelude::*;

struct ObstacleDetector {
    scan_sub: Topic<LaserScan>,
}

impl Node for ObstacleDetector {
    fn name(&self) -> &str { "ObstacleDetector" }

    fn tick(&mut self) {
        if let Some(scan) = self.scan_sub.recv() {
            if let Some(min_range) = scan.min_range() {
                if min_range < 0.5 {
                    // Obstacle too close!
                }
            }
        }
    }
}
```

---

## GenericMessage

Dynamic message type for cross-language communication (Rust <-> Python). Uses MessagePack serialization internally. Maximum payload: 4KB.

```rust
// simplified
use horus::prelude::*;
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct SensorReading { temperature: f64, humidity: f64 }

// Create from any serializable type
let reading = SensorReading { temperature: 22.5, humidity: 60.0 };
let msg = GenericMessage::from_value(&reading)?;

// Send through a topic
let topic: Topic<GenericMessage> = Topic::new("sensor.generic")?;
topic.send(msg);

// Receive and deserialize
if let Some(msg) = topic.recv() {
    let reading: SensorReading = msg.to_value()?;
    println!("Temperature: {}", reading.temperature);
}
```

### Key Methods

| Method | Description |
|--------|-------------|
| `GenericMessage::new(bytes)` | Create from raw bytes |
| `GenericMessage::from_value(v)` | Serialize any `Serialize` type |
| `GenericMessage::with_metadata(bytes, meta)` | Create with metadata string |
| `.data()` | Get raw byte payload |
| `.metadata()` | Get metadata string (if set) |
| `.to_value::<T>()` | Deserialize to any `Deserialize` type |

### Performance

- Small messages (≤256 bytes): ~4.0 us (inline fast path)
- Large messages (>256 bytes): ~4.4 us (overflow buffer)
- Maximum payload: 4096 bytes
- Uses zero-copy IPC for transport

### When to Use

- **Cross-language communication** — Python and Rust nodes sharing untyped data
- **Prototyping** — Quick iteration before defining typed messages
- **ML pipelines** — Flexible model outputs with varying schemas
- **Metadata tagging** — Attach routing or context info via the metadata field

For production code with known schemas, prefer typed messages for compile-time safety and ~50x lower serialization overhead.

---

## Clock & Time Messages

| Message | Description | API Reference |
|---------|-------------|---------------|
| `Clock` | Simulation/replay time broadcast (clock_ns, sim_speed, paused) | [Clock API](/rust/api/clock-messages) |
| `TimeReference` | External time sync from GPS/NTP/PTP (time_ref_ns, source, offset) | [Clock API](/rust/api/clock-messages) |

## Audio Messages

| Message | Description | API Reference |
|---------|-------------|---------------|
| `AudioFrame` | Microphone audio data (up to 4800 samples, configurable sample rate and channels) | [AudioFrame](/stdlib/messages/audio-frame) |
| `AudioEncoding` | Encoding format enum: `F32` or `I16` | [AudioFrame](/stdlib/messages/audio-frame) |

---

## Design Decisions

### Why Standard Message Types Instead of User-Defined Only

Robotics has well-established data formats — IMU readings, velocity commands, laser scans, odometry — that every project needs. Requiring users to define these from scratch creates fragmentation: two teams building lidar drivers would produce incompatible `LaserScan` types with different field names, units, and layouts.

HORUS ships 70+ standard messages so that any driver, algorithm, or tool can interoperate out of the box. A motor controller from one team works with a path planner from another because both use `CmdVel` with the same field layout and units.
### Why Rust Structs Instead of IDL Files ROS2 uses `.msg` and `.srv` Interface Definition Language files that require a code generation step (`rosidl`) before compilation. This adds build complexity, creates generated code that is hard to debug, and forces a separate toolchain dependency. HORUS defines messages as plain Rust structs with derive macros. They are normal Rust code — debuggable, IDE-navigable, and compiled with the rest of the project. The `message!` macro and `#[derive(Serialize, Deserialize)]` handle serialization without a separate code generation pipeline. Python bindings are generated via PyO3, not from IDL. ### Why Fixed-Size Types Get Automatic Zero-Copy Fixed-size (POD) types have a known memory layout at compile time — every `LaserScan` is exactly the same number of bytes regardless of content. This means they can be written directly into shared memory and read by another process without any serialization or deserialization. HORUS detects POD types at compile time and routes them through the zero-copy path automatically (~50ns latency). Variable-size types (like `OccupancyGrid` with its `Vec` data) use MessagePack serialization because their size is not known until runtime. Users get the fastest possible transport for each type without manual optimization. 
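
Why a fixed layout makes this possible can be shown in plain Rust. The sketch below is not HORUS code: `ScanSketch`, `GridSketch`, `write_slot`, and `read_slot` are hypothetical names, and a byte array stands in for a shared-memory slot. The field-by-field copy is a safe stand-in for the raw memory copy that a fixed layout permits. The point is that a `#[repr(C)]` struct made only of fixed-size fields has a size known at compile time, whereas a `Vec`-backed type stores only a pointer, length, and capacity inline, so its payload size is a runtime quantity.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical fixed-size message: every instance has the same byte size.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ScanSketch {
    timestamp_ns: u64,
    ranges: [f32; 4],
}

/// Known at compile time, so a slot of exactly this size can be reserved up front.
const SLOT_SIZE: usize = size_of::<ScanSketch>();

/// Hypothetical variable-size message: the payload lives on the heap.
#[allow(dead_code)]
struct GridSketch {
    width: u32,
    data: Vec<i8>,
}

/// Copy a message into a fixed-size slot (stand-in for a shared-memory write).
fn write_slot(msg: &ScanSketch) -> [u8; SLOT_SIZE] {
    let mut slot = [0u8; SLOT_SIZE];
    slot[..8].copy_from_slice(&msg.timestamp_ns.to_le_bytes());
    for (i, r) in msg.ranges.iter().enumerate() {
        slot[8 + i * 4..12 + i * 4].copy_from_slice(&r.to_le_bytes());
    }
    slot
}

/// Rebuild the message from the slot; no length prefix or schema is needed.
fn read_slot(slot: &[u8; SLOT_SIZE]) -> ScanSketch {
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&slot[..8]);
    let mut ranges = [0f32; 4];
    for (i, r) in ranges.iter_mut().enumerate() {
        let mut b = [0u8; 4];
        b.copy_from_slice(&slot[8 + i * 4..12 + i * 4]);
        *r = f32::from_le_bytes(b);
    }
    ScanSketch { timestamp_ns: u64::from_le_bytes(ts), ranges }
}

fn main() {
    let scan = ScanSketch { timestamp_ns: 42, ranges: [0.5, 1.0, 1.5, 2.0] };
    let slot = write_slot(&scan);
    assert_eq!(read_slot(&slot), scan);
    println!("ScanSketch always occupies {SLOT_SIZE} bytes");

    let small = GridSketch { width: 2, data: vec![0; 4] };
    let big = GridSketch { width: 200, data: vec![0; 40_000] };
    println!(
        "inline sizes: {} vs {}, payload lengths: {} vs {}",
        size_of_val(&small),
        size_of_val(&big),
        small.data.len(),
        big.data.len()
    );
}
```

The two `GridSketch` values report the same inline size even though their payloads differ by four orders of magnitude, which is why a type like this cannot be placed in a fixed slot without first being serialized.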
## Trade-offs

| Area | Benefit | Cost |
|------|---------|------|
| **Standard library** | Instant interoperability between all HORUS packages and drivers | Users must learn the standard types; custom types may duplicate existing ones |
| **Rust-native definitions** | No code generation step, full IDE support, normal debugging | Message definitions are tied to Rust — Python types are mirrored via PyO3, not generated from a shared IDL |
| **Zero-copy POD** | ~50ns latency for fixed-size types with no serialization overhead | POD types have fixed-size arrays (e.g., `[f32; 360]` for LaserScan) — wastes memory when actual data is smaller |
| **`message!` macro** | One-line POD message definition with automatic zero-copy transport | Custom messages must be POD-compatible (no `Vec`, `String`, `Option`) to get zero-copy; otherwise they fall back to serialization |
| **GenericMessage** | Flexible prototyping with any data shape, cross-language support | 4KB payload limit, serialization overhead, no compile-time type safety |
| **No IDL** | Simpler build, fewer dependencies, no generated code to maintain | No automatic multi-language type generation from a single source — Python types must be manually kept in sync |

## See Also

- **[POD Types](/concepts/core-concepts-podtopic)** — Zero-serialization for maximum performance
- **[Tensor Messages](/rust/api/tensor-messages)** — Tensor, Device, TensorDtype, and tensor domain types
- **[Image API](/rust/api/image)** — Pool-backed camera images
- **[Topic](/concepts/core-concepts-topic)** — The unified communication API
- **[Basic Examples](/rust/examples/basic-examples)** — Working examples with messages
- **[Architecture](/concepts/architecture)** — How messages fit into HORUS

---

## Builder Composition Guide

Path: /concepts/builder-composition
Description: How node builder methods interact, compose, and override each other — the complete reference for combining .rate(), .compute(), .budget(), and friends

# Builder Composition Guide

You know what `.rate()` does. You know what `.compute()` does. But what happens when you call both? Does `.rate()` make it RT, or does `.compute()` override that? What if you add `.budget()` on top? Does call order matter?

These are the questions that trip up every HORUS developer eventually. Each method's documentation explains what it does in isolation, but the real power — and the real confusion — comes from combining them. This page is the complete reference for how builder methods interact.

## The Core Rule: Deferred Finalization

Every builder method just **stores a value**. Nothing happens until you call `.build()`. At that point, the scheduler looks at *everything* you've set and resolves the configuration in one pass.

This means **method call order does not matter**. Chaining `.rate(100.hz())` and `.compute()` in either order, or with other builder calls in between, results in a **Compute** node that ticks at most 100 times per second. The scheduler sees both `.rate()` and `.compute()` at build time, and `.compute()` wins — regardless of when you called it.

This is different from how most builder patterns work. In a typical builder, the last call wins because it overwrites a field. In HORUS, the scheduler applies resolution rules that consider all fields together.

## The `.rate()` Dual Meaning

This is the single most important interaction to understand. `.rate()` changes its behavior based on what else is set.

**Why?** Because "run at 1,000 Hz" and "run at most 10 times per second" are different intents. A motor controller running at 1,000 Hz needs a dedicated thread, timing enforcement, and deadline monitoring. A path planner ticking at 10 Hz just needs a frequency cap — it's CPU-bound work that runs on the thread pool.

The resolution rule:

| `.rate()` combined with... | Resulting class | `.rate()` means... |
|---|---|---|
| Nothing else | **Rt** | "This node has real-time timing requirements" |
| `.compute()` | **Compute** | "Tick at most N times per second" (frequency cap) |
| `.async_io()` | **AsyncIo** | "Tick at most N times per second" (frequency cap) |
| `.on("topic")` | **Event** | Ignored — Event nodes trigger on messages, not time |
| `.budget()` or `.deadline()` only | **Rt** | Both rate and explicit timing — RT with overrides |

### What `.rate()` auto-derives (Rt only)

When `.rate()` results in the Rt class, it auto-derives timing parameters you didn't set:

| What you set | Budget | Deadline |
|---|---|---|
| `.rate(100.hz())` only | 8ms (80% of 10ms) | 9.5ms (95% of 10ms) |
| `.rate(100.hz()).budget(5.ms())` | 5ms (explicit) | 9.5ms (auto) |
| `.rate(100.hz()).deadline(8.ms())` | 8ms (auto 80%) | 8ms (explicit) |
| `.rate(100.hz()).budget(5.ms()).deadline(8.ms())` | 5ms | 8ms |
| `.budget(5.ms())` only (no rate) | 5ms | 5ms (deadline = budget) |
| `.deadline(8.ms())` only (no rate) | None | 8ms |

## Full Interaction Matrix

This table shows what happens when you combine any two builder methods. Read it as: "row method + column method → result."

### Execution Class Methods

Only one execution class can be active. If you call multiple, **the last one wins** (with a warning logged):

| First | + Second | Result | Notes |
|---|---|---|---|
| `.compute()` | `.async_io()` | AsyncIo | Warning: first overridden |
| `.compute()` | `.on("topic")` | Event | Warning: first overridden |
| `.async_io()` | `.compute()` | Compute | Warning: first overridden |
| `.async_io()` | `.on("topic")` | Event | Warning: first overridden |
| `.on("topic")` | `.compute()` | Compute | Warning: first overridden |
| `.on("topic")` | `.async_io()` | AsyncIo | Warning: first overridden |

### RT-Only Methods on Non-RT Nodes

Some methods only make sense for RT nodes.
Using them on the wrong class produces warnings or errors: | Method | On Rt node | On Compute node | On Event node | On AsyncIo node | On BestEffort node | |---|---|---|---|---|---| | `.budget()` | Sets budget | **Error** | **Error** | **Error** | Promotes to Rt | | `.deadline()` | Sets deadline | **Error** | **Error** | **Error** | Promotes to Rt | | `.on_miss()` | Sets policy | Warning (no effect) | Warning (no effect) | Warning (no effect) | Warning (no effect) | | `.priority()` | Sets OS priority | Warning (ignored) | Warning (ignored) | Warning (ignored) | Warning (ignored) | | `.core()` | Pins to CPU | Warning (ignored) | Warning (ignored) | Warning (ignored) | Warning (ignored) | | `.watchdog()` | Per-node watchdog | Works | Works | Works | Works | | `.rate()` | Sets rate | Frequency cap | Ignored | Frequency cap | Promotes to Rt | | `.order()` | Sets order | Sets order | Sets order | Sets order | Sets order | | `.failure_policy()` | Sets policy | Sets policy | Sets policy | Sets policy | Sets policy | ## Goal-Oriented Recipes Instead of "what does this method do?", here's "I need X — which methods do I chain?" ### "100 Hz sensor driver with deadline monitoring" **Why these methods**: `.rate()` → Rt class with auto-derived 8ms budget / 9.5ms deadline. `.on_miss(Miss::Skip)` → if the driver stalls waiting for hardware, skip one reading rather than accumulating delay. `.order(1)` → runs after safety-critical nodes. **What removing each method changes**: - Remove `.rate()` → BestEffort, no timing enforcement at all - Remove `.on_miss()` → defaults to `Miss::Warn` (logs but takes no action) - Remove `.order()` → defaults to 100 (normal priority) ### "Background logger that must not starve RT nodes" **Why `.compute()` and not just BestEffort**: A logger doing disk I/O in the main loop would block all BestEffort nodes behind it. `.compute()` moves it to the thread pool. `.rate(10.hz())` caps frequency (NOT RT — `.compute()` overrides that). 
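
The override just described follows from resolving everything at `.build()`. As a rough model only, the resolution can be sketched in plain Rust. `NodeSpec`, `Requested`, `ExecClass`, and `resolve` are hypothetical names, not the HORUS API; the 80% / 95% factors and the error cases are taken from the tables on this page. Because the spec merely stores values, the order in which its fields were set cannot affect the result.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecClass { BestEffort, Rt, Compute, AsyncIo, Event }

/// Explicit class request, as set by `.compute()`, `.async_io()`, or `.on(..)`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Requested { Compute, AsyncIo, Event }

/// Hypothetical stored builder state; nothing is resolved until `resolve` runs.
#[derive(Default)]
struct NodeSpec {
    rate_hz: Option<u32>,
    requested: Option<Requested>,
    budget: Option<Duration>,
    deadline: Option<Duration>,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    class: ExecClass,
    budget: Option<Duration>,
    deadline: Option<Duration>,
}

fn resolve(spec: &NodeSpec) -> Result<Resolved, String> {
    if spec.rate_hz == Some(0) {
        return Err("rate must be non-zero".into());
    }
    let has_timing = spec.budget.is_some() || spec.deadline.is_some();
    let class = match spec.requested {
        // Budget or deadline on a non-RT class is rejected at build time.
        Some(_) if has_timing => return Err("budget/deadline require an RT node".into()),
        Some(Requested::Compute) => ExecClass::Compute,
        Some(Requested::AsyncIo) => ExecClass::AsyncIo,
        Some(Requested::Event) => ExecClass::Event,
        // No explicit class: a rate or any timing value implies RT.
        None if spec.rate_hz.is_some() || has_timing => ExecClass::Rt,
        None => ExecClass::BestEffort,
    };
    if class != ExecClass::Rt {
        // For non-RT classes the rate is only a frequency cap (or ignored).
        return Ok(Resolved { class, budget: None, deadline: None });
    }
    let period = spec.rate_hz.map(|hz| Duration::from_secs(1) / hz);
    let budget = spec.budget.or(period.map(|p| p * 80 / 100));
    let deadline = spec.deadline.or(period.map(|p| p * 95 / 100)).or(spec.budget);
    if budget == Some(Duration::ZERO) {
        return Err("zero budget is meaningless".into());
    }
    if let (Some(b), Some(d)) = (budget, deadline) {
        if b > d {
            return Err("budget exceeds deadline".into());
        }
    }
    Ok(Resolved { class, budget, deadline })
}

fn main() {
    let sensor = NodeSpec { rate_hz: Some(100), ..Default::default() };
    println!("{:?}", resolve(&sensor));

    let logger = NodeSpec {
        rate_hz: Some(10),
        requested: Some(Requested::Compute),
        ..Default::default()
    };
    println!("{:?}", resolve(&logger));
}
```

The model leaves out warnings (for example `.on_miss()` on a Compute node) and the "last explicit class wins" rule, since those depend on details of the real builder that are not shown here.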
**What if you used `.async_io()` instead**: Also works. Use `.async_io()` if the logger does network I/O (cloud upload). Use `.compute()` if it does local file I/O with CPU-bound formatting. ### "Event-driven planner that reacts to new scans" **Why not `.rate()`**: The planner has nothing to do until a new scan arrives. Polling at a fixed rate wastes CPU. `.on("lidar.scan")` means zero CPU when idle, instant wake on new data. **Can you add `.budget()` to an Event node?** No — this is an error at `.build()`. Event nodes trigger on data arrival, not on a fixed schedule, so deadline enforcement doesn't apply. ### "1 kHz motor controller on production hardware" **Why explicit `.budget()` and `.deadline()`**: Auto-derived values (800µs budget, 950µs deadline at 1 kHz) are generous defaults. After profiling, you know the motor controller takes ~200µs. Setting `.budget(300.us())` with a `.deadline(900.us())` gives a tighter budget for monitoring while leaving headroom before the deadline fires. **Why `.priority(90)` and `.core(0)`**: On a multi-core robot computer, pinning the motor controller to an isolated CPU core eliminates jitter from OS scheduling and cache migration. `.priority(90)` ensures the kernel never preempts this thread for normal processes. ### "ML inference that takes 50-200ms" **Why no `.rate()`**: ML inference time varies (50-200ms depending on scene complexity). A fixed rate would either waste CPU (rate too low) or queue up work (rate too high). Let it run as fast as it can on the thread pool. **Why not `.async_io()`**: ML inference is CPU-bound, not I/O-bound. `.compute()` uses a CPU thread pool optimized for parallel work. `.async_io()` uses tokio, which is optimized for I/O waiting. ### "Safety monitor that must never miss" **Every method is load-bearing**: Remove any one and you lose a safety guarantee. This is the maximum-configuration pattern for the most critical node in your system. ## What Happens If I... 
Quick answers to common "what if" questions: **"...call `.rate()` and `.compute()`?"** Compute class. `.rate()` becomes a frequency cap, not RT. No budget, no deadline, no timing enforcement. **"...call `.budget()` without `.rate()`?"** RT class. `.budget()` alone implies "this node has timing requirements." Deadline auto-derived as `deadline = budget`. **"...call `.deadline()` without `.rate()` or `.budget()`?"** RT class. Budget is not set (no auto-derivation without `.rate()`). The scheduler monitors wall time against the deadline. **"...set `.budget()` larger than `.deadline()`?"** Error at `.build()`. Budget is "expected time," deadline is "maximum time." A budget larger than the deadline means you expect the work to take longer than the hard limit — that's a configuration mistake. **"...set `.budget(Duration::ZERO)`?"** Error at `.build()`. Zero budget is meaningless. **"...call `.on_miss(Miss::Stop)` on a Compute node?"** Warning: "has no effect without a deadline." Compute nodes have no deadline, so the miss policy can never trigger. The node builds successfully, but `.on_miss()` does nothing. **"...call `.priority(99)` on a Compute node?"** Warning: "only RT nodes get SCHED_FIFO threads." Priority is silently ignored. The node builds and runs fine — it just doesn't get OS-level priority. **"...call `.on("")` (empty topic)?"** Error at `.build()`. An Event node with an empty topic can never trigger. **"...call `.compute().async_io()`?"** AsyncIo class (last wins). Warning logged that `.compute()` was overridden. Pick one. **"...just call `.build()` with no methods at all?"** BestEffort class with order 100. Ticks in the main loop at the scheduler's global rate. This is the simplest valid configuration. ## Anti-Patterns ### Cargo-culting RT configuration This wastes a dedicated CPU thread and an entire CPU core on a logger. Use `.async_io()` or just default BestEffort. 
### Using `.compute()` for everything `.compute().rate()` gives you a frequency cap, not RT. The motor controller has no budget, no deadline, and no `Miss` policy. When the thread pool is busy with other Compute nodes, the motor controller waits. Use `.rate()` alone for nodes with timing requirements. ### Deadline without a response plan If you've set explicit budget and deadline, you've decided this node's timing matters. But the default `Miss::Warn` just logs a warning and continues — the robot keeps moving with a late motor command. Add `.on_miss(Miss::SafeMode)` or `.on_miss(Miss::Skip)` to define what should actually happen. ### Mixing intent across classes Event nodes trigger on messages, not time. A deadline ("must finish within 10ms of... what?") doesn't apply because there's no periodic schedule to miss. If you need deadline enforcement, use `.rate()` instead of `.on()` and poll the topic in `tick()`. ## Putting It All Together: Complete System Each node uses exactly the methods it needs — no more, no less. The safety monitor has every RT option enabled. The dashboard has none. The path planner and ML detector use `.compute()` to stay off the RT threads. The telemetry uploader uses `.async_io()` because it blocks on network I/O. ## Design Decisions **Why is method order irrelevant (deferred finalization)?** Early prototypes resolved execution class eagerly — `.rate()` immediately set the class to Rt, and `.compute()` overwrote it. This created subtle ordering bugs: `.rate().compute()` and `.compute().rate()` produced different nodes. Developers had to remember which method to call last. Deferred finalization eliminates this entire class of bugs by resolving everything at `.build()` time, where the scheduler can see the full picture. **Why does `.rate()` change meaning based on context?** The alternative was having two methods: `.rate()` for RT and `.frequency_cap()` for non-RT. 
But this forces developers to understand execution classes before they can set a tick rate. With the current design, the intent is clear from context: `.rate(1000.hz())` alone means "timing matters" (Rt), while `.rate(10.hz()).compute()` means "don't run too often" (frequency cap). The mental model is "describe what you need, the scheduler figures out how to run it." **Why errors instead of silent fixes for invalid combinations?** Setting `.budget()` on a Compute node is almost always a mistake — the developer thinks they're getting timing enforcement, but Compute nodes don't have it. Silently ignoring the budget would hide the bug. Erroring at `.build()` catches it immediately, before the robot moves. The principle: configuration mistakes should fail fast, not fail silently on the factory floor. ## Trade-offs | Gain | Cost | |------|------| | **Order-independent builders** — no "call this last" bugs | Must understand deferred finalization to predict results | | **`.rate()` dual meaning** — one method, context-dependent behavior | Must know that `.rate().compute()` is NOT RT | | **Strict validation** — catches mistakes at `.build()` | Learning curve: must understand which combinations are valid | | **RT auto-detection** — no explicit `.rt()` call needed | Less visible which nodes are RT (use `horus monitor` to check) | | **Warnings for ignored methods** — `.priority()` on Compute logs a warning | Warning fatigue if you're intentionally mixing configurations during prototyping | ## Advanced RT Builders These methods extend the RT chain for production deployments: ```rust scheduler.add(motor_ctrl) .rate(1000_u64.hz()) // RT class, budget=800us, deadline=950us .budget(500_u64.us()) // override budget .deadline_scheduler() // SCHED_DEADLINE (kernel EDF) instead of SCHED_FIFO .no_alloc() // panic if tick() allocates .core(3) // pin to core 3 + lock governor + move IRQs .priority(90) // SCHED_FIFO fallback priority (if DEADLINE unavailable) .build()?; ``` **Interaction 
rules:** - `.deadline_scheduler()` requires `.rate()` and `.budget()` — these provide the kernel parameters (period = 1/rate, runtime = budget) - `.deadline_scheduler()` + `.priority()` — priority is used as SCHED_FIFO fallback if SCHED_DEADLINE fails - `.no_alloc()` works with any execution class but is most useful for RT nodes - `.core()` now automatically locks CPU governor to `performance` and moves IRQs off that core - Without root/CAP_SYS_NICE, `.deadline_scheduler()` and `.priority()` degrade silently to normal scheduling --- ## See Also - [Execution Classes](/concepts/execution-classes) — The 5 classes and when to use each - [Choosing Configuration](/concepts/choosing-configuration) — Progressive complexity guide (Levels 0-5) - [Real-Time Systems](/concepts/real-time) — Budget, deadline, jitter, and what "real-time" means - [Scheduler API](/rust/api/scheduler) — Complete method reference with signatures - [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — Execution model and tick lifecycle --- ## Real-Time Systems Path: /concepts/real-time Description: What real-time means for robotics, why it matters, and how HORUS handles it # Real-Time Systems Imagine a robot arm in a factory, picking parts off a conveyor belt. A camera spots a part, the planner calculates where to move, and the motor controller sends commands to the arm's joints 1,000 times per second. Each command tells the joints "move this much in the next millisecond." The arm swings smoothly because each command arrives exactly on time, every time, a thousand times in a row. Now imagine the computer running that motor controller decides to do something else for 50 milliseconds — maybe it is updating a log file, or the operating system is shuffling memory around. For those 50 milliseconds, the arm keeps executing the *last* command it received. 
If that command said "rotate the elbow joint at 2 radians per second," the elbow keeps rotating — uncorrected — for 50 times longer than it should have. The arm overshoots, slams into the conveyor belt, and breaks a gripper that costs thousands of dollars. This is why robots need real-time systems. Not because they need to be fast (the arm was already moving plenty fast), but because they need to be **predictable**. Every command must arrive on time. Every sensor reading must be processed before the next one arrives. Every safety check must happen within a guaranteed window. This page explains what "real-time" actually means, why robots need it, and how HORUS gives it to you. ## What Is Real-Time? Real-time does **not** mean fast. It means **predictable**. A real-time system guarantees that work finishes within a bounded time window called a **deadline**. A 10 ms deadline means every computation must complete in 10 ms — not just on average, not 99% of the time, but *every single time*. To understand why this distinction matters, consider two systems: - **System A** finishes in 1 ms on average, but occasionally takes 500 ms when the garbage collector runs. - **System B** always finishes in 9 ms. Every time. No exceptions. System A is **faster** — its average is 9x better. But System B is **real-time** — you can depend on it. For a motor controller that needs a new command every 10 ms, System A will eventually cause a catastrophic overshoot. System B never will. Three concepts define real-time behavior: **Deadline**: the maximum allowed time for a computation to complete. If your motor controller runs at 1,000 Hz, each tick has a 1 ms period. The deadline is some fraction of that period — say 950 microseconds — leaving a small margin for scheduling overhead. **Budget**: the maximum expected computation time. This is how long the actual work *should* take. 
A 500-microsecond budget inside a 950-microsecond deadline means the system expects the computation to finish in 500 microseconds but tolerates up to 950 microseconds before declaring a miss.

**Jitter**: the variation in timing between consecutive ticks. If your controller ticks at 1,000 Hz, perfect timing means exactly 1,000 microseconds between each tick. In reality, one interval might last 1,002 microseconds and the next 998 microseconds. That 4-microsecond variation is jitter. Low jitter means consistent, smooth control. High jitter means the robot stutters, oscillates, or drifts.

```
Perfect timing (zero jitter):
|  1ms   |  1ms   |  1ms   |  1ms   |  1ms   |
tick     tick     tick     tick     tick     tick

Real-world timing (some jitter):
| 1.02ms |0.98ms| 1.01ms |0.99ms| 1.00ms |
tick     tick   tick     tick   tick     tick

Pathological timing (high jitter):
| 0.5ms |   2.3ms   | 0.2ms |  1.8ms  | 0.7ms |
tick    tick        tick    tick      tick
```

## Why Robots Need Real-Time

Three things break when timing is unpredictable:

**Motor control loops.** A controller sends velocity commands at 1,000 Hz. Each command says "move at this speed for the next millisecond." If one command arrives 50 ms late, the motor runs at the old speed for 50x too long. The result: the arm overshoots, oscillates, or collides with something. The higher the control frequency, the more damage a single late tick causes.

**Sensor fusion.** An IMU (inertial measurement unit) reports acceleration and rotation at 200 Hz. A fusion algorithm integrates these readings to estimate the robot's position and orientation. If one reading is processed late, the algorithm integrates stale data. At a robot speed of 1 m/s, being 10 ms late means 1 cm of position error — per missed sample. After a few missed samples, the robot thinks it is somewhere it is not.

**Safety systems.** A watchdog monitors all nodes and must detect a frozen node within a bounded time.
If the watchdog itself is delayed — by a garbage collection pause, a page fault, or the OS scheduling another process — the robot keeps moving when it should have stopped. Safety systems are the one place where "usually fast enough" is never acceptable.

## Hard vs Soft vs Firm

Not all deadlines are created equal. The consequences of missing a deadline determine which category of real-time you need:

| Type | If you miss a deadline... | Example |
|------|--------------------------|---------|
| **Hard real-time** | System failure. Physical damage. People get hurt. | Pacemaker, airbag controller, ABS brakes |
| **Firm real-time** | The result is worthless, but the system survives | Sensor fusion with stale data, dropped video frame in a pipeline |
| **Soft real-time** | Quality degrades gradually — the more you miss, the worse it gets | Video streaming, audio playback, game rendering |

## Where HORUS Fits

A typical robot's software stack has three layers. HORUS sits in the middle:

*Diagram (software stack, top to bottom): Application Layer (planning, behavior trees, mission logic; 1-10 Hz, best-effort) → HORUS (soft RT, Linux userspace; perception, control loops, sensor fusion; 50-1000 Hz, ms-level deadlines) → Firmware / RTOS (hard RT; PWM, current loops, IMU sampling; 1-50 kHz, us-level deadlines) → Hardware (motors, sensors, encoders, batteries). Caption: HORUS sits in the soft RT middle layer — fast enough for control loops, delegates us-level work to firmware.*

**Application layer** (top): High-level decision-making that does not need timing guarantees. Mission planning, behavior trees, user interfaces. Runs at 1-10 Hz, best-effort.

**HORUS layer** (middle): Perception pipelines, control loops, sensor fusion, safety monitoring. Runs at 50-1,000 Hz with millisecond-level deadlines. This is where HORUS operates. You get the full power of Linux (networking, file I/O, ML inference, Python interop) while still meeting timing constraints.

**Firmware layer** (bottom): Microsecond-level work that Linux cannot guarantee. PWM generation, motor current control, raw IMU sampling. Runs on dedicated microcontrollers at 1-50 kHz.

This division is deliberate. Trying to do everything in firmware is painful — firmware has no file system, no networking stack, no Python. Trying to do everything in Linux userspace is dangerous — Linux cannot guarantee microsecond deadlines. HORUS gives you the sweet spot: fast enough for the control loops that matter, with graceful degradation when the OS causes a timing hiccup.

## HORUS RT Features

HORUS provides six tools for real-time behavior. You can use as many or as few as your application needs. Most robots use two or three.

### Auto-derived timing from `.rate()`

The simplest way to get real-time behavior. Set a tick rate and HORUS calculates safe default budget and deadline values.

When you set `.rate()`, the scheduler automatically:

1. Calculates the period (1/frequency = 10 ms at 100 Hz)
2. Sets the **budget** to 80% of the period (8 ms) — this is how long your `tick()` should take
3. Sets the **deadline** to 95% of the period (9.5 ms) — this is the hard wall where the `Miss` policy fires
4. Assigns the **Rt execution class** — the node gets its own dedicated thread

You never need to call `.rt()` — there is no such method. Setting `.rate()`, `.budget()`, or `.deadline()` is enough. HORUS auto-detects that the node needs real-time scheduling. See [Scheduler](/concepts/core-concepts-scheduler) for the full execution model.

### Explicit `.budget()` and `.deadline()`

For fine-grained control, set budget and deadline directly instead of relying on auto-derivation.

If you set `.budget()` without `.deadline()`, the deadline equals the budget — your budget IS your hard deadline.

### `.on_miss()` — Deadline miss handling

When a node's `tick()` takes longer than its deadline, the `Miss` policy determines what happens:

| Policy | What happens | Best for |
|--------|-------------|----------|
| `Miss::Warn` | Logs a warning, continues normally | Default. Non-critical nodes. Use during development. |
| `Miss::Skip` | Skips this node's next tick to let it catch up | High-frequency nodes where one dropped cycle is acceptable |
| `Miss::SafeMode` | Calls `enter_safe_state()` on the node | Motor controllers, actuators — stops movement on overrun |
| `Miss::Stop` | Stops the entire scheduler immediately | Safety monitors — the last line of defense |

### `.prefer_rt()` / `.require_rt()`

These methods control OS-level real-time scheduling features:

| Method | RT scheduling | Memory locking | CPU affinity | On failure |
|--------|--------------|----------------|--------------|------------|
| `.prefer_rt()` | Tries `SCHED_FIFO` | Tries `mlockall` | Tries isolated CPUs | Logs degradation, continues |
| `.require_rt()` | Requires `SCHED_FIFO` | Requires `mlockall` | Requires isolated CPUs | **Panics** |

`SCHED_FIFO` tells the Linux kernel to give your process priority over all normal processes. `mlockall` prevents your memory from being swapped to disk (which would cause multi-millisecond page faults). `SCHED_FIFO` requires root privileges or `CAP_SYS_NICE`; `mlockall` requires root privileges, `CAP_IPC_LOCK`, or a sufficiently high `RLIMIT_MEMLOCK` limit.
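The timing rules earlier in this section (auto-derived values from `.rate()`, deadline equal to budget when only a budget is given, and deadline-miss detection) come down to plain arithmetic. The sketch below is not HORUS code: `Timing`, `derive_timing`, `TickVerdict`, and `classify` are hypothetical names used only to illustrate the documented defaults. The case where a deadline is given without a budget is not described above, so its handling here is an assumption.

```rust
use std::time::Duration;

/// Effective timing for one node (illustrative type, not part of the HORUS API).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Timing {
    period: Duration,
    budget: Duration,
    deadline: Duration,
}

/// Applies the documented defaults; explicit values win over derived ones.
fn derive_timing(rate_hz: u32, budget: Option<Duration>, deadline: Option<Duration>) -> Timing {
    let period = Duration::from_secs(1) / rate_hz;
    let (budget, deadline) = match (budget, deadline) {
        // Only a rate: budget = 80% of the period, deadline = 95% of the period.
        (None, None) => (period * 80 / 100, period * 95 / 100),
        // Budget without deadline: the budget is also the deadline.
        (Some(b), None) => (b, b),
        // Both given: use them as-is.
        (Some(b), Some(d)) => (b, d),
        // ASSUMPTION (not covered by the text): keep the derived budget, capped at the deadline.
        (None, Some(d)) => ((period * 80 / 100).min(d), d),
    };
    Timing { period, budget, deadline }
}

/// How one measured tick duration compares against the node's timing.
#[derive(Debug, PartialEq)]
enum TickVerdict {
    WithinBudget,
    OverBudget,   // slower than expected, but still before the deadline
    DeadlineMiss, // this is where a miss policy would fire
}

fn classify(timing: &Timing, elapsed: Duration) -> TickVerdict {
    if elapsed > timing.deadline {
        TickVerdict::DeadlineMiss
    } else if elapsed > timing.budget {
        TickVerdict::OverBudget
    } else {
        TickVerdict::WithinBudget
    }
}

fn main() {
    let auto = derive_timing(100, None, None);
    println!(
        "period={:?} budget={:?} deadline={:?}",
        auto.period, auto.budget, auto.deadline
    );
    println!("9 ms tick: {:?}", classify(&auto, Duration::from_millis(9)));
}
```

A tick that lands between budget and deadline is reported separately from a miss, which mirrors the distinction between "how long it should take" and "how long it can take".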
### CPU pinning

Pin a node to specific CPU cores to reduce jitter from cache thrashing and OS scheduling.

When a thread migrates between CPU cores (which the OS does frequently for load balancing), it loses its L1 and L2 cache contents. Rebuilding the cache takes microseconds — which shows up as jitter in your timing measurements. Pinning a real-time node to a dedicated core eliminates this source of jitter entirely.

This is most effective when combined with Linux CPU isolation (`isolcpus=2,3` on the kernel command line), which prevents the OS from scheduling *anything else* on those cores.

### Watchdog

The scheduler's watchdog detects nodes that stop responding and applies graduated degradation:

| Timeout | Health state | Response |
|---------|-------------|----------|
| 1x watchdog | Warning | Log warning |
| 2x watchdog | Unhealthy | Skip tick, log error |
| 3x watchdog (critical node) | Isolated | Remove from tick loop, call `enter_safe_state()` |

The graduated response prevents a single transient spike (garbage collection in a Python node, a page fault) from killing a node that would recover on its own. Only sustained unresponsiveness triggers isolation.

## When You Do NOT Need Real-Time

Using real-time features where you do not need them wastes CPU, adds complexity, and makes debugging harder. Not every node is a motor controller.

**Prototyping.** Just get it working first. Add timing constraints after you have proven the logic is correct. Premature RT configuration is the robotics equivalent of premature optimization.

**Simulation.** Simulated time advances in discrete steps — it does not care about wall-clock deadlines. Use `tick_once()` for deterministic single-step execution.

**Logging and recording.** A blackbox recorder can buffer writes and flush to disk at its own pace. It does not matter if a log entry is 10 ms late.

**Visualization.** Rendering a dashboard at 30 FPS does not need deadline enforcement. If a frame is 5 ms late, nobody notices.
**Planning and decision-making.** A path planner that runs at 1 Hz can take up to a second to compute. It is CPU-heavy but not time-critical.

For these workloads, skip the RT configuration entirely.

## Quick Reference

| Your node does... | Rust | Python | Why |
|---|---|---|---|
| Motor control at 100+ Hz | `.rate(100.hz())` | `rate=100` | Auto-derives budget and deadline, gets dedicated thread |
| Sensor fusion with strict timing | `.rate(200.hz()).budget(3.ms())` | `rate=200, budget=3*ms` | Explicit budget for tight loops |
| Safety-critical stop logic | `.rate(100.hz()).on_miss(Miss::SafeMode)` | `rate=100, on_miss="safe_mode"` | Degrades safely on overrun |
| ML inference (variable latency) | `.compute()` | `compute=True` | No deadline — just use available CPU |
| Emergency stop handler | `.on("emergency.stop")` | `on="emergency.stop"` | Runs only when the event fires, zero polling overhead |
| Background logging | default (no config) | `horus.Node(name, tick)` | BestEffort is fine |
| Visualization / UI | `.rate(30.hz())` or default | `rate=30` or default | Low rate, no deadline needed |

## Design Decisions

**Why soft RT in Linux userspace instead of hard RT on an RTOS?** Hard real-time requires a dedicated RTOS or bare-metal firmware, which gives up everything that makes modern software development productive: file systems, networking, Python, ML frameworks, debugging tools. Most robotics control loops run at 100-1,000 Hz (1-10 ms periods), where Linux userspace jitter (typically 10-100 microseconds with proper configuration) is well within budget. By staying in userspace, HORUS gives developers access to the entire Linux ecosystem while still meeting the timing requirements of most robots. The small percentage of work that truly needs microsecond guarantees (PWM, current loops) belongs on dedicated firmware anyway.
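The jitter range quoted above can be checked against your own measurements. The sketch below uses only the standard library and is not part of HORUS: `intervals_us` and `jitter_us` are hypothetical helpers, and jitter is taken here as the spread between the longest and shortest interval between consecutive ticks, which is one common definition and an assumption of this example.

```rust
/// Intervals between consecutive tick start times, in microseconds.
/// Assumes timestamps are non-decreasing; out-of-order pairs count as 0.
fn intervals_us(tick_starts_us: &[u64]) -> Vec<u64> {
    tick_starts_us
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]))
        .collect()
}

/// Peak-to-peak jitter: longest interval minus shortest interval.
/// Returns None when fewer than two intervals are available.
fn jitter_us(tick_starts_us: &[u64]) -> Option<u64> {
    let intervals = intervals_us(tick_starts_us);
    if intervals.len() < 2 {
        return None;
    }
    let longest = intervals.iter().max()?;
    let shortest = intervals.iter().min()?;
    Some(longest - shortest)
}

fn main() {
    // Tick start times for a nominal 1,000 Hz loop whose intervals are
    // 1.02, 0.98, 1.01, 0.99, and 1.00 ms.
    let ticks: [u64; 6] = [0, 1_020, 2_000, 3_010, 4_000, 5_000];
    match jitter_us(&ticks) {
        Some(j) => println!("peak-to-peak jitter: {j} us"),
        None => println!("not enough ticks to measure jitter"),
    }
}
```

In a real node the timestamps would come from a monotonic clock read at the start of each tick; comparing the result with the node's deadline margin shows whether the platform's jitter fits the budget.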
**Why auto-detect RT from `.rate()` instead of requiring explicit `.rt()` calls?** Developers think in terms of what their node needs: "this controller must run at 1,000 Hz." They do not think in terms of scheduling policies: "this needs SCHED_FIFO priority 80 on an isolated core with mlockall." Auto-detection from `.rate()`, `.budget()`, and `.deadline()` maps developer intent to the correct execution class without requiring framework knowledge. If you set a rate, HORUS assumes you care about timing and gives you a dedicated thread with budget enforcement. If you do not, it assumes you do not and uses the lightweight BestEffort executor.

**Why graduated watchdog instead of instant kill?** A single late tick can happen for legitimate reasons: a Python node's garbage collector ran, the OS handled a network interrupt, or a page fault pulled data from swap. Killing the node on the first miss would make the system brittle. Graduated response (warn, then skip, then isolate) gives transient problems time to resolve while still catching genuinely frozen nodes. The 3x timeout threshold for isolation was chosen empirically — in testing, transient spikes almost never lasted beyond 2x the watchdog period.

**Why `.prefer_rt()` as the recommended default instead of `.require_rt()`?** Most robots run on standard Linux without a fully configured RT kernel. Requiring `SCHED_FIFO` would make HORUS unusable during development (on laptops), in CI (Docker containers), and in simulation. `.prefer_rt()` applies RT features when available and degrades gracefully when they are not, logging exactly what was requested and what was achieved. This means the same code works on a developer laptop, in CI, and on the production robot — with progressively better timing as the platform improves.

**Why budget and deadline as separate concepts?** Budget is "how long should this take." Deadline is "how long CAN it take before we have a problem."
Separating them lets you express nuance: a sensor fusion node with a 3 ms budget and an 8 ms deadline means "it usually finishes in 3 ms, but I can tolerate up to 8 ms before the data is stale." This is different from a safety monitor with a 100-microsecond budget and a 100-microsecond deadline, where any overrun is unacceptable. If only one value existed, you would have to choose between being too strict (false alarms) or too lenient (missed problems).

## Trade-offs

| Gain | Cost |
|------|------|
| **Soft RT in userspace** — access to full Linux ecosystem (Python, ML, networking) | Cannot guarantee microsecond-level deadlines; kernel can always preempt |
| **Auto-detection from `.rate()`** — no explicit RT configuration needed | Less visible what execution class a node will get (check with `horus node info`) |
| **Budget + deadline separation** — express expected vs worst-case timing independently | Two parameters to understand instead of one |
| **Graduated watchdog** — transient spikes do not kill nodes | Genuinely frozen nodes take 3x watchdog timeout to isolate |
| **`.prefer_rt()` graceful degradation** — works everywhere, from laptops to production | May run without RT features and developer does not notice (check logs) |
| **Per-node CPU pinning** — eliminates cache-migration jitter | Dedicated cores are unavailable to other processes; wastes resources if node is idle |
| **`Miss::SafeMode` / `Miss::Stop`** — automatic safety response on overrun | Aggressive policies can shut down the system on transient spikes; tune thresholds carefully |

## Advanced RT Features

HORUS provides progressive RT levels. Each is opt-in — prototyping code works without any of these.

### SCHED_DEADLINE (Kernel-Guaranteed EDF)

Linux's Earliest Deadline First scheduler gives **hard CPU bandwidth guarantees**. Unlike SCHED_FIFO (priority-based, so a high-priority thread can starve lower-priority ones), EDF is provably optimal on a single core and is paired with admission control.
```rust
scheduler.add(motor_ctrl)
    .rate(1000_u64.hz())
    .budget(500_u64.us())
    .deadline_scheduler()  // opt-in to SCHED_DEADLINE
    .build()?;
```

What happens: the kernel guarantees this thread gets 500us of CPU every 1ms. If the system can't honor this (CPU overcommitted), `SCHED_DEADLINE` is rejected and HORUS falls back to `SCHED_FIFO` automatically.

Requires `CAP_SYS_NICE` or root. Falls back to `SCHED_FIFO` silently if unavailable.

### Allocation-Free Tick Enforcement

The #1 silent RT killer: a `format!()`, `Vec::push()`, or `String::from()` in `tick()` causes 100us+ latency spikes from heap allocation. `.no_alloc()` catches this instantly:

```rust
scheduler.add(motor_ctrl)
    .rate(1000_u64.hz())
    .no_alloc()  // panic if tick() allocates
    .build()?;
```

Any heap allocation during `tick()` panics with a message naming the offending node. Requires `RtAwareAllocator` as global allocator in your binary:

```rust
#[global_allocator]
static ALLOC: horus_core::memory::rt_allocator::RtAwareAllocator =
    horus_core::memory::rt_allocator::RtAwareAllocator;
```

Without this line, `.no_alloc()` is a no-op — safe for prototyping.

### Automatic CPU Governor and IRQ Management

When `.prefer_rt()` or `.core()` pins a thread to a CPU core, HORUS automatically:

1. **Locks the CPU governor** to `performance` (prevents frequency scaling jitter)
2. **Moves hardware interrupts** off the pinned core (prevents IRQ latency spikes)

Both require root. Both degrade gracefully with a log message if permissions are insufficient.

> **Python note:** `.deadline_scheduler()` and `.no_alloc()` are Rust-only. Python nodes already get RT enforcement (budget/deadline checking runs in Rust regardless of tick language), but the GIL prevents kernel-level scheduling guarantees and allocation-free execution. Use `.rate()`, `.budget()`, and `.deadline()` for Python nodes — these work identically in both languages.
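To make the SCHED_DEADLINE bandwidth guarantee concrete, the sketch below works through the arithmetic for the 1,000 Hz / 500 us example (period = 1/rate, runtime = budget). It is neither HORUS nor kernel code: `DeadlineParams`, `from_rate_and_budget`, and `admits` are hypothetical names, and the admission test is a simplified single-core utilization check. The kernel's real test is more involved (it also accounts for the number of CPUs and a configurable bandwidth cap), so treat the `limit` argument as an assumption.

```rust
/// Runtime and period for one SCHED_DEADLINE-style task (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct DeadlineParams {
    runtime_ns: u64,
    period_ns: u64,
}

impl DeadlineParams {
    /// period = 1/rate, runtime = budget.
    fn from_rate_and_budget(rate_hz: u64, budget_us: u64) -> Self {
        DeadlineParams {
            runtime_ns: budget_us * 1_000,
            period_ns: 1_000_000_000 / rate_hz,
        }
    }

    /// Fraction of one core this task reserves.
    fn utilization(&self) -> f64 {
        self.runtime_ns as f64 / self.period_ns as f64
    }
}

/// Simplified admission check: total reserved bandwidth must fit under `limit`.
fn admits(tasks: &[DeadlineParams], limit: f64) -> bool {
    tasks.iter().map(|t| t.utilization()).sum::<f64>() <= limit
}

fn main() {
    let motor = DeadlineParams::from_rate_and_budget(1_000, 500);
    println!("{motor:?} reserves {:.0}% of one core", motor.utilization() * 100.0);

    // One such task fits; two identical tasks would overcommit a single core.
    println!("one task admitted: {}", admits(&[motor], 0.95));
    println!("two tasks admitted: {}", admits(&[motor, motor], 0.95));
}
```

This is the situation described above as "CPU overcommitted": when the reservation cannot be admitted, the request is rejected instead of silently degrading every task's timing.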
---

## See Also

- [Builder Composition Guide](/concepts/builder-composition) — How `.rate()`, `.budget()`, `.compute()` interact and override each other
- [Execution Classes](/concepts/execution-classes) — The 5 execution classes and when to use each
- [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — Execution model, budget enforcement, deterministic mode
- [RT Setup](/advanced/rt-setup) — Linux real-time kernel configuration guide
- [Scheduler API](/rust/api/scheduler) — `.rate()`, `.budget()`, `.deadline()`, `.on_miss()` method reference
- [Choosing Configuration](/concepts/choosing-configuration) — Practical guide to picking the right node settings

---

## node! Macro Guide

Path: /concepts/node-macro
Description: Write less code with the node! macro

# The node! Macro

**The problem**: Writing HORUS nodes manually in Rust requires lots of boilerplate code.

**The solution**: The `node!` macro generates all the boilerplate for you.

## Why Use It?

## Basic Syntax

```rust
// simplified
node! {
    NodeName {
        name: "custom_name"  // Explicit node name (optional)
        pub { ... }          // Publishers (optional)
        sub { ... }          // Subscribers (optional)
        data { ... }         // Internal state (optional)
        tick { ... }         // Main loop (required)
        init { ... }         // Startup (optional)
        shutdown { ... }     // Cleanup (optional)
        impl { ... }         // Custom methods (optional)
    }
}
```

**Only the node name and `tick` are required!** Everything else is optional. Set tick rate via the scheduler builder: `.rate(100.hz())`.

## Sections Explained

### `name:` - Explicit Node Name (Optional)

Override the auto-generated node name with a custom identifier:

```rust
// simplified
node! {
    FlightControllerNode {
        name: "flight_controller",  // Custom name instead of "flight_controller_node"
        pub { status: String -> "fc.status" }
        tick { ... }
    }
}
```

**By default**, the node name is auto-generated from the struct name using snake_case:

- `SensorNode` → `"sensor_node"`
- `IMUProcessor` → `"i_m_u_processor"`
- `MyRobotController` → `"my_robot_controller"`

**With explicit naming**, you control the exact name used everywhere:

- Scheduler registration and execution logs
- Monitor TUI display
- Log messages (`[flight_controller] ...`)
- Diagnostics and metrics

**Use cases for explicit naming:**

- Multiple instances of the same node type: `name: "imu_front"`, `name: "imu_rear"`
- Cleaner names for acronyms: `IMUSensor` → `name: "imu_sensor"` (instead of `"i_m_u_sensor"`)
- Match existing naming conventions in your robot system
- Shorter names for logging readability

```rust
// simplified
// Example: Multiple IMU sensors with explicit names
node! {
    IMUSensor {
        name: "imu_front",
        pub { data: Imu -> "sensors.imu_front" }
        tick { ... }
    }
}

node! {
    IMUSensor {  // Same struct definition...
        name: "imu_rear",  // ...but different runtime identity
        pub { data: Imu -> "sensors.imu_rear" }
        tick { ... }
    }
}
```

### Tick Rate

Set the tick rate via the scheduler builder, not in the macro:

```rust
// simplified
// Set rate when adding to scheduler
scheduler.add(MyNode::new())
    .rate(100.hz())  // 100 Hz
    .build()?;
```

The `rate` keyword inside `node! {}` is **not supported** — it was removed in favor of the builder API.
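Returning to the auto-generated names in the `name:` section above: the listed examples are all consistent with a naive conversion that inserts an underscore before every uppercase letter except the first. The sketch below reproduces that behavior with the standard library only. `auto_node_name` is a hypothetical function written for illustration; it is an assumption that the macro's actual conversion works this way for inputs beyond the documented examples.

```rust
/// Naive CamelCase -> snake_case: an underscore before every uppercase
/// letter except the first, then lowercase everything.
fn auto_node_name(struct_name: &str) -> String {
    let mut out = String::with_capacity(struct_name.len() + 4);
    for (i, ch) in struct_name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    for name in ["SensorNode", "IMUProcessor", "MyRobotController"] {
        println!("{name} -> {}", auto_node_name(name));
    }
}
```

Because every capital starts a new word, acronyms are split letter by letter, which is why an explicit `name:` is the cleaner choice for types such as `IMUSensor`.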
### `pub` - Send Messages

Define what this node sends:

```rust
// simplified
pub {
    // Syntax: name: Type -> "topic"
    velocity: f32 -> "robot.velocity",
    status: String -> "robot.status"
}
```

This creates:

- A `Topic<f32>` field called `velocity`
- A `Topic<String>` field called `status`
- Both connected to their respective topics

### `sub` - Receive Messages

Define what this node receives:

```rust
// simplified
sub {
    // Syntax: name: Type -> "topic"
    commands: String -> "user.commands",
    sensors: f32 -> "sensors.temperature"
}
```

This creates:

- A `Topic<String>` field called `commands`
- A `Topic<f32>` field called `sensors`
- Both listening to their respective topics

### `data` - Internal State

Store data inside your node:

```rust
// simplified
data {
    counter: u32 = 0,
    buffer: Vec<f32> = Vec::new(),
    last_time: Instant = Instant::now()
}
```

Access these as `self.counter`, `self.buffer`, etc.

### `tick` - Main Loop

This runs repeatedly at the node's tick rate:

```rust
// simplified
tick {
    // Read inputs
    if let Some(cmd) = self.commands.recv() {
        // Process
        let result = process(cmd);

        // Send outputs
        self.status.send(result);
    }

    // Update state
    self.counter += 1;
}
```

**Keep this fast!** It runs on every tick.

### `init` - Startup (Optional)

Runs once when your node starts:

```rust
// simplified
init {
    hlog!(info, "Starting up");
    self.buffer.reserve(1000);  // Pre-allocate
    Ok(())
}
```

The init block must return `Ok(())` on success (it generates `fn init(&mut self) -> Result<()>`).

Use this for:

- Opening files/connections
- Pre-allocating memory
- One-time setup

### `shutdown` - Cleanup (Optional)

Runs once when your node stops:

```rust
// simplified
shutdown {
    hlog!(info, "Processed {} messages", self.counter);
    // Save state, close files, etc.
    Ok(())
}
```

### `impl` - Custom Methods (Optional)

Add helper functions:

```rust
// simplified
impl {
    fn calculate(&self, x: f32) -> f32 {
        x * 2.0 + self.counter as f32
    }

    fn reset(&mut self) {
        self.counter = 0;
    }
}
```

## Complete Examples

### High-Rate Motor Controller

### Simple Publisher

### Simple Subscriber

### Pipeline (Sub + Pub)

### With State

```rust
// simplified
node! {
    AverageNode {
        sub { input: f32 -> "values" }
        pub { output: f32 -> "average" }
        data {
            buffer: Vec<f32> = Vec::new(),
            max_size: usize = 10
        }

        tick {
            if let Some(value) = self.input.recv() {
                self.buffer.push(value);

                // Keep only the last `max_size` (10) values
                if self.buffer.len() > self.max_size {
                    self.buffer.remove(0);
                }

                // Calculate average
                let avg: f32 = self.buffer.iter().sum::<f32>() / self.buffer.len() as f32;
                self.output.send(avg);
            }
        }
    }
}
```

### With Lifecycle

```rust
// simplified
node! {
    FileLoggerNode {
        sub { data: String -> "logs" }
        data { file: Option<std::fs::File> = None }

        init {
            use std::fs::OpenOptions;
            self.file = OpenOptions::new()
                .create(true)
                .append(true)
                .open("log.txt")
                .ok();
            hlog!(info, "File opened");
            Ok(())
        }

        tick {
            if let Some(msg) = self.data.recv() {
                if let Some(file) = &mut self.file {
                    use std::io::Write;
                    writeln!(file, "{}", msg).ok();
                }
            }
        }

        shutdown {
            hlog!(info, "Closing file");
            self.file = None;  // Closes the file
            Ok(())
        }
    }
}
```

## Tips and Tricks

### Use Descriptive Names

```rust
// simplified
// Good
pub { motor_speed: f32 -> "motors.speed" }

// Bad
pub { x: f32 -> "data" }
```

### Keep tick Fast

```rust
// simplified
// Good - quick operation
tick {
    if let Some(x) = self.input.recv() {
        let y = x * 2.0;
        self.output.send(y);
    }
}

// Bad - slow operation
tick {
    std::thread::sleep(Duration::from_secs(1));  // Blocks everything!
}
```

### Pre-allocate in init()

```rust
// simplified
init {
    self.buffer.reserve(1000);  // Do this once
    Ok(())
}

tick {
    // Don't allocate in tick - do it in init!
}
```

## Common Questions

### Do I need to import anything?
Yes, import the prelude:

```rust
// simplified
use horus::prelude::*;

node! {
    MyNode { ... }
}
```

### Can I have multiple publishers?

Yes!

```rust
// simplified
pub {
    speed: f32 -> "speed",
    direction: f32 -> "direction",
    status: String -> "status"
}
```

### Can I skip sections I don't need?

Yes! Only `NodeName` and `tick` are required:

```rust
// simplified
node! {
    MinimalNode {
        tick {
            hlog!(info, "Hello!");
        }
    }
}
```

### How do I use the node?

Create it with `MyNode::new()` and add it to the scheduler with `scheduler.add(...)`, as shown in the Tick Rate section.

## Troubleshooting

### "Cannot find type in scope"

Import your message types:

```rust
// simplified
use horus::prelude::*;

node! {
    MyNode {
        pub { cmd: CmdVel -> "cmd_vel" }
        ...
    }
}
```

### "Expected `,`, found `{`"

Check your syntax:

```rust
// simplified
// Wrong
pub { cmd: f32 "topic" }

// Right
pub { cmd: f32 -> "topic" }
```

### Node name must be CamelCase

```rust
// simplified
// Wrong
node! { my_node { ... } }

// Right
node! { MyNode { ... } }
```

### How do I give my node a custom name?

Use the `name:` section:

```rust
// simplified
node! {
    MyNode {
        name: "robot1_controller",  // Custom runtime name
        tick { ... }
    }
}
// Now node.name() returns "robot1_controller" instead of "my_node"
```

This is useful for:

- Running multiple instances of the same node type
- Avoiding ugly auto-generated names (e.g., `IMU` → `"i_m_u"`)
- Matching external naming conventions

## What the Macro Generates

For reference, the `node!` macro expands to:

1. **`pub struct NodeName`** — with `Topic<T>` fields for publishers/subscribers and your data fields
2. **`impl NodeName { pub fn new() -> Self }`** — constructor that creates all topics and initializes data with defaults
3. **`impl Node for NodeName`** — with `name()`, `tick()`, optional `init()`, `shutdown()`, `publishers()`, `subscribers()`, and `rate()` methods
4. **`impl Default for NodeName`** — calls `Self::new()`
5. **`impl NodeName { ... }`** — any methods from the `impl` section

The struct name is converted to snake_case for the node name (e.g., `SensorNode` becomes `"sensor_node"`), unless overridden with `name:`.

## Design Decisions

### Why a Procedural Macro Instead of Derive Macros

Derive macros (e.g., `#[derive(Node)]`) operate on existing struct definitions — they can add trait implementations but cannot generate the struct itself, create constructors, or reorganize fields by role. The `node!` macro is a function-like procedural macro that takes a domain-specific syntax and generates the entire struct, constructor, trait impl, and Default impl from a single declaration.

This lets users declare publishers, subscribers, and state in separate labeled sections (`pub {}`, `sub {}`, `data {}`) rather than mixing `Topic<T>` fields with plain data fields in a flat struct. The macro assigns meaning to each section and generates the correct wiring code for each.

### Why Generate the Constructor Automatically

Topic creation requires calling `Topic::new("name")` for every publisher and subscriber, and each call can fail. In hand-written code, users must write a `new()` function that mirrors the struct field-by-field, creating topics and initializing defaults — pure boilerplate that adds no information.

The macro generates `new() -> Self` (panicking on topic creation failure) because topic names are string literals known at compile time. A panic during construction surfaces immediately at startup rather than hiding behind `Result` chains that obscure the actual error.

### Why Auto-Generate Publishers/Subscribers Metadata

The macro generates `publishers()` and `subscribers()` methods that return the list of topic names the node uses. This metadata powers the scheduler's topology awareness, the monitor TUI's wiring diagram, and `horus node info` introspection — all without requiring users to manually keep a metadata list in sync with their actual topic declarations.
Since the macro already knows every `pub` and `sub` entry, generating this metadata is free and always correct.

## Trade-offs

| Area | Benefit | Cost |
|------|---------|------|
| **Boilerplate reduction** | 13 lines vs 48 lines for the same node; no hand-written constructor, trait impl, or Default | Custom DSL syntax to learn — not standard Rust |
| **Topic wiring** | Publishers and subscribers are declared once and auto-connected | Topic creation panics at startup instead of returning `Result` — errors are loud but not recoverable |
| **Compile-time safety** | Type mismatches caught at compile time via generated `Topic<T>` fields | Macro error messages can be harder to read than normal Rust compiler errors |
| **Introspection** | Auto-generated `publishers()`/`subscribers()` metadata is always accurate | Users cannot override the generated metadata without escaping to manual `impl Node` |
| **Flexibility** | Covers the common case (pub/sub nodes with state and lifecycle) concisely | Complex nodes needing custom `Node` trait methods or non-standard constructors must use manual `impl Node` |
| **IDE support** | Code inside `tick {}`, `init {}`, `impl {}` blocks has full autocomplete | Some IDEs struggle with macro-generated fields until the project is built once |

## See Also

- **[Core Concepts: Nodes](/concepts/core-concepts-nodes)** — Full node model and lifecycle
- **[Basic Examples](/rust/examples/basic-examples)** — Real applications using the macro
- **[Message Types](/concepts/message-types)** — Available message types for pub/sub
- **[Topic](/concepts/core-concepts-topic)** — Communication API details

---

## Actions

Path: /concepts/actions
Description: Long-running tasks with feedback, cancellation, and preemption

# Actions

> **Beta**: The Actions API is functional in Rust but still maturing. Python bindings are not yet available. The API may change in future releases.

Your robot needs to navigate across a warehouse, pick up a package, and bring it back.
That takes 30 seconds. How does the operator know it's working? How do they cancel if something goes wrong? How does a new high-priority goal interrupt the current one?

**Topics** can't do this — they're fire-and-forget. **Services** can't either — they block until done, with no progress updates. **Actions** solve this:

*Sequence diagram (Action pattern: goal → progress feedback → cancel/complete): the operator sends the goal "Go to shelf B3"; the robot starts navigating and reports "12m remaining, 20%", "8m remaining, 45%", "3m remaining, 78%"; the operator sends "Cancel! New priority"; the robot stops safely and replies "Canceled at (4.2, 1.1)".*

**Use actions when:**

- The task takes more than one tick (navigation, arm motion, calibration)
- You need progress updates (distance remaining, percent complete)
- You need to cancel or preempt in-flight tasks
- You need to know if the task succeeded or failed

## Defining an Action

Use the `action!` macro to define Goal, Feedback, and Result types.

### Standard Action Templates

For common robotics patterns, use the `standard_action!` shortcut:

```rust
// simplified
standard_action!(navigate MyNavAction);    // Goal: target pose, Feedback: distance, Result: final pose
standard_action!(manipulate MyPickPlace);  // Goal: object + target, Feedback: phase, Result: success
standard_action!(wait MyWaitAction);       // Goal: duration, Feedback: elapsed, Result: completed
standard_action!(dock MyDockAction);       // Goal: dock ID, Feedback: alignment, Result: docked
```

For a simple action with single fields per section, you can still use the `action!` macro:

```rust
// simplified
action! {
    Spin {
        goal { angular_velocity: f64 }
        feedback { current_angle: f64 }
        result { total_rotations: u32 }
    }
}
```

---

## Action Server

The action server receives goals, executes them, and sends back feedback and results.
### Building a Server

```rust
// simplified
let server = ActionServerNode::<Navigate>::builder()
    // Validate incoming goals
    .on_goal(|goal| {
        if goal.max_speed <= 0.0 {
            GoalResponse::Reject("Speed must be positive".into())
        } else {
            GoalResponse::Accept
        }
    })
    // Handle cancellation requests
    .on_cancel(|goal_id| {
        hlog!(info, "Cancel requested for {:?}", goal_id);
        CancelResponse::Accept
    })
    // Execute the action
    .on_execute(|handle| {
        let goal = handle.goal();
        let mut distance = ((goal.target_x).powi(2) + (goal.target_y).powi(2)).sqrt();
        let total = distance;

        while distance > 0.1 {
            // Check for cancellation
            if handle.is_cancel_requested() {
                return handle.canceled(NavigateResult {
                    success: false,
                    final_x: goal.target_x - distance,
                    final_y: goal.target_y - distance,
                });
            }

            // Simulate movement (clamped so progress never exceeds 100%)
            distance = (distance - goal.max_speed * 0.1).max(0.0);

            // Publish feedback
            handle.publish_feedback(NavigateFeedback {
                distance_remaining: distance,
                percent_complete: ((total - distance) / total * 100.0) as f32,
            });

            std::thread::sleep(std::time::Duration::from_millis(100));
        }

        handle.succeed(NavigateResult {
            success: true,
            final_x: goal.target_x,
            final_y: goal.target_y,
        })
    })
    .build();
```

### Server Configuration

```rust
// simplified
let server = ActionServerNode::<Navigate>::builder()
    .on_goal(|_| GoalResponse::Accept)
    .on_execute(|handle| { /* ... */ handle.succeed(result) })
    .max_concurrent_goals(Some(1))                    // Only one goal at a time
    .feedback_rate(20.0)                              // 20 Hz feedback rate
    .goal_timeout(Duration::from_secs(30))            // Timeout after 30s
    .preemption_policy(PreemptionPolicy::PreemptOld)  // New goals preempt active
    .build();
```

### Preemption Policies

| Policy | Behavior |
|--------|----------|
| `PreemptOld` | New goals cancel the active goal (default) |
| `RejectNew` | Reject new goals while one is active |
| `Priority` | Higher-priority goals preempt lower-priority ones |
| `Queue { max_size }` | Queue goals in FIFO order |

### ServerGoalHandle

The `handle` passed to `on_execute` provides:

```rust
// simplified
handle.goal_id()              // Unique goal identifier
handle.goal()                 // The goal request (&A::Goal)
handle.priority()             // Goal priority level
handle.status()               // Current GoalStatus
handle.elapsed()              // Time since goal started
handle.is_cancel_requested()  // Client requested cancellation?
handle.is_preempt_requested() // Higher-priority goal arrived?
handle.should_abort()         // Timeout or other abort condition?
handle.publish_feedback(fb) // Send feedback to client // Terminal methods (consume the handle): handle.succeed(result) // -> GoalOutcome::Succeeded handle.abort(result) // -> GoalOutcome::Aborted handle.canceled(result) // -> GoalOutcome::Canceled handle.preempted(result) // -> GoalOutcome::Preempted ``` ### Server Metrics ```rust // simplified let metrics = server.metrics(); println!("Goals received: {}", metrics.goals_received); println!("Active: {}, Queued: {}", metrics.active_goals, metrics.queued_goals); println!("Succeeded: {}, Aborted: {}", metrics.goals_succeeded, metrics.goals_aborted); ``` --- ## Action Client ### Async Client (Node-Based) Use `ActionClientNode` when running inside a scheduler: ```rust // simplified let client = ActionClientNode::::builder() .on_feedback(|goal_id, feedback| { println!("Progress: {:.0}%", feedback.percent_complete); }) .on_result(|goal_id, status, result| { println!("Goal {:?} finished: {:?}", goal_id, status); }) .build(); // Send a goal let handle = client.send_goal(NavigateGoal { target_x: 5.0, target_y: 3.0, max_speed: 1.0, })?; // Or with priority let handle = client.send_goal_with_priority(goal, GoalPriority::HIGH)?; ``` ### ClientGoalHandle ```rust // simplified handle.goal_id() // Unique goal ID handle.status() // Current GoalStatus handle.is_active() // Pending or Active? handle.is_done() // In terminal state? handle.is_success() // Succeeded? 
handle.elapsed() // Time since sent handle.last_feedback() // Most recent feedback (Option) handle.result() // Final result if done (Option) handle.cancel() // Request cancellation // Blocking wait let result = handle.await_result(Duration::from_secs(10)); // Wait with feedback callback let result = handle.await_result_with_feedback( Duration::from_secs(10), |feedback| println!("Distance: {:.1}m", feedback.distance_remaining), )?; ``` ### Sync Client (Standalone) Use `SyncActionClient` for simple scripts without a scheduler: ```rust // simplified let client = SyncActionClient::::new()?; // Blocking call let result = client.send_goal_and_wait( NavigateGoal { target_x: 5.0, target_y: 3.0, max_speed: 1.0 }, Duration::from_secs(30), )?; // With feedback let result = client.send_goal_and_wait_with_feedback( goal, Duration::from_secs(30), |feedback| println!("{:.0}% complete", feedback.percent_complete), )?; ``` --- ## Goal Lifecycle >S: GoalRequest Note right of S: on_goal() Accept/Reject S-->>C: StatusUpdate (Pending → Active) S-->>C: Feedback (progress) S-->>C: Feedback (progress) S-->>C: Feedback (progress) C->>S: CancelRequest (optional) Note right of S: on_cancel() Accept/Reject S-->>C: Result (succeed / abort / canceled) `} caption="Action protocol: GoalRequest → StatusUpdate → Feedback... 
→ Result" /> ### GoalStatus | Status | Description | |--------|-------------| | `Pending` | Received but not yet executing | | `Active` | Currently executing | | `Succeeded` | Completed successfully | | `Aborted` | Failed during execution | | `Canceled` | Canceled by client request | | `Preempted` | Canceled by higher-priority goal | | `Rejected` | Rejected by `on_goal` validation | ### GoalPriority ```rust // simplified GoalPriority::HIGHEST // 0 - Critical tasks GoalPriority::HIGH // 64 GoalPriority::NORMAL // 128 (default) GoalPriority::LOW // 192 GoalPriority::LOWEST // 255 - Background tasks ``` --- ## Running Actions in a Scheduler --- ## Error Handling ```rust // simplified match client.send_goal(goal) { Ok(handle) => { /* goal accepted */ } Err(ActionError::GoalRejected(reason)) => { /* validation failed */ } Err(ActionError::ServerUnavailable) => { /* no server running */ } Err(ActionError::GoalTimeout) => { /* server didn't respond */ } Err(e) => { /* other error */ } } ``` | Error | Cause | |-------|-------| | `GoalRejected(reason)` | `on_goal` returned `Reject` | | `GoalCanceled` | Goal was canceled | | `GoalPreempted` | Goal was preempted by higher priority | | `GoalTimeout` | Execution exceeded timeout | | `ServerUnavailable` | No action server found | | `CommunicationError(msg)` | IPC failure | | `ExecutionError(msg)` | Error during execution | | `InvalidGoal(msg)` | Malformed goal data | | `GoalNotFound(id)` | Unknown goal ID | --- ## CLI Commands ```bash # List active actions horus action list # Get action details horus action info navigate ``` --- ## Prebuilt Action Patterns The `standard_action!` macro provides ready-to-use action definitions for common robotics tasks: ### Navigate ```rust // simplified use horus::prelude::*; standard_action!(navigate); // Creates: NavigateGoal, NavigateFeedback, NavigateResult, Navigate let goal = NavigateGoal { target_x: 5.0, target_y: 3.0, target_theta: Some(1.57), max_speed: 1.0, }; ``` **Goal**: `target_x`, 
`target_y`, `target_theta` (optional), `max_speed` (default 1.0)

**Feedback**: `distance_remaining`, `current_x`, `current_y`, `progress_percent`

**Result**: `success`, `final_x`, `final_y`, `final_theta`, `distance_traveled`, `time_elapsed`

### Manipulate

```rust
// simplified
standard_action!(manipulate);
// Creates: ManipulateGoal, ManipulateFeedback, ManipulateResult, Manipulate
```

**Goal**: `object_id`, `target_x/y/z`, `force_limit` (default 10.0)

**Feedback**: `phase`, `grip_force`, `ee_x/y/z`, `progress_percent`

**Result**: `success`, `error_message`, `object_x/y/z`

### Wait

```rust
// simplified
standard_action!(wait);
// Creates: WaitGoal, WaitFeedback, WaitResult, Wait
```

**Goal**: `duration_secs`

**Feedback**: `time_remaining`, `progress_percent`

**Result**: `completed`, `actual_duration`

### Dock

```rust
// simplified
standard_action!(dock);
// Creates: DockGoal, DockFeedback, DockResult, Dock
```

**Goal**: `dock_id`, `approach_speed` (default 0.1)

**Feedback**: `phase`, `distance_to_dock`, `alignment_error`, `dock_detected`

**Result**: `success`, `docked_id`, `contact_force`

## Design Decisions

**Why Goal / Feedback / Result instead of just request/response?** A navigation task takes 30 seconds. With request/response (services), the caller blocks for 30 seconds with no visibility into progress — it doesn't know if the robot is moving, stuck, or crashed. The Goal/Feedback/Result pattern separates the initial request (goal), continuous progress updates (feedback), and final outcome (result), giving the caller real-time visibility and the ability to cancel at any point.

**Why built on top of topics?** Actions use three internal topics (`{name}.goal`, `{name}.feedback`, `{name}.result`) for communication. This means actions inherit all of Topic's properties: shared-memory speed, automatic backend selection, cross-process transparency. It also means actions work anywhere topics work — no separate transport layer.
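The naming convention above is simple enough to write down directly. The helper below is a hypothetical illustration (`action_topic_names` is not a HORUS function); it only shows which topic names to look for when inspecting an action with the topic tools.

```rust
/// Internal topic names for an action, following the `{name}.goal`,
/// `{name}.feedback`, `{name}.result` convention described above.
/// Hypothetical helper for illustration only.
pub fn action_topic_names(action: &str) -> [String; 3] {
    [
        format!("{action}.goal"),
        format!("{action}.feedback"),
        format!("{action}.result"),
    ]
}
```

For an action named `navigate`, this yields `navigate.goal`, `navigate.feedback`, and `navigate.result`.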
**Why preemption policies instead of manual cancellation only?** In production robotics, new high-priority goals must be able to interrupt lower-priority ones automatically. A warehouse robot navigating to shelf B3 must immediately redirect if a higher-priority order arrives. Preemption policies (`PreemptOld`, `Priority`, `Queue`, `RejectNew`) encode this logic in the action server rather than requiring every client to manage cancellation manually.

**Why separate sync and async clients?** Inside a scheduler tick, blocking for 30 seconds is unacceptable — you need `ActionClientNode` with non-blocking goal submission and feedback callbacks. In a standalone script or test, blocking is fine and simpler — you want `SyncActionClient`. Both use the same underlying topics.

## Trade-offs

| Gain | Cost |
|------|------|
| **Progress visibility** — feedback at configurable rate | More complex than fire-and-forget topics |
| **Cancellation and preemption** — interrupt tasks safely | Server must check `is_cancel_requested()` in its execute loop |
| **Goal validation** — reject bad goals before execution | Extra `on_goal` handler to implement |
| **Priority-based preemption** — high-priority goals auto-cancel lower ones | Must choose preemption policy (default: `PreemptOld`) |
| **Built on topics** — inherits shared-memory speed and cross-process transparency | Three internal topics per action (goal, feedback, result) |

## See Also

- [Actions API](/rust/api/actions) — Full Rust API reference
- [Services](/concepts/services) — Synchronous request/response (for quick operations)
- [Communication Overview](/concepts/communication-overview) — When to use topics vs services vs actions
- [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — Running action nodes in the scheduler

---

## Services

Path: /concepts/services
Description: Synchronous request/response RPC between nodes

# Services

> **Beta**: The Services API is functional in Rust but still maturing.
Python bindings are not yet available. The API may change in future releases.

Your SLAM node needs a map region from the map server. It can't continue until it gets the data. Your arm planner needs to know joint limits before computing a trajectory. Your calibration routine needs to trigger sensor calibration and wait for the result.

These are **request/response** problems. Topics can't solve them — they're fire-and-forget with no reply. Actions are overkill — these operations finish in milliseconds.

**Use services when:**

- You need a response before continuing (parameter queries, map data, joint limits)
- The operation completes quickly (within milliseconds)
- You need guaranteed delivery with error reporting

**Use topics instead** for continuous data streams. **Use actions instead** for long-running tasks with feedback.

## How It Works

A service is a named RPC channel with a defined request type and response type. One server listens for requests and produces responses. One or more clients send requests and wait for responses.

Under the hood, every service creates two internal topics:

*[Flow diagram: the client sends a ServiceRequest to the `{name}.request` topic, which the server reads; the server sends a ServiceResponse to the `{name}.response` topic, which the client reads. Caption: Services use two topics internally — one for requests, one for responses. All shared-memory optimizations apply.]*

- Requests include a monotonically-increasing `request_id` for correlation
- Multiple clients can call the same server concurrently
- Each client filters responses by matching its `request_id`
- Communication uses shared memory (zero-copy IPC)

## Defining a Service

Use the `service!` macro to define Request and Response types:

---

## Service Server

The server listens for requests and returns responses. It runs in a background thread.

```rust
// simplified
let server = ServiceServerBuilder::<GetMapRegion>::new()
    .on_request(|req| {
        if req.resolution <= 0.0 {
            return Err("Resolution must be positive".into());
        }
        if req.x_max <= req.x_min || req.y_max <= req.y_min {
            return Err("Invalid region bounds".into());
        }

        let map = generate_map(req.x_min, req.y_min, req.x_max, req.y_max, req.resolution);
        Ok(GetMapRegionResponse {
            width: map.width,
            height: map.height,
            data: map.data,
            timestamp: horus::timestamp_now(),
        })
    })
    .poll_interval(Duration::from_millis(1)) // How often to check for requests (default: 5ms)
    .build()?;

// Server runs in a background thread — it's active until dropped
```

### Stopping a Server

The server stops when dropped, or explicitly:

```rust
// simplified
server.stop();
```

---

## Service Client

### Blocking Client

The simplest way to call a service:

```rust
// simplified
let mut client = ServiceClient::<GetMapRegion>::new()?;

let response = client.call(
    GetMapRegionRequest {
        x_min: 0.0,
        y_min: 0.0,
        x_max: 10.0,
        y_max: 10.0,
        resolution: 0.05,
    },
    Duration::from_secs(1),
)?;

println!("Map: {}x{} pixels", response.width, response.height);
```

#### Resilient Calls (Auto-Retry)

Use `call_resilient` for production code that needs automatic retries on transient failures:

```rust
// simplified
// Auto-retry with default settings (3 retries, exponential backoff from 10ms)
let response = client.call_resilient(request, Duration::from_secs(5))?;

// Custom retry configuration
use horus::prelude::RetryConfig;
let response = client.call_resilient_with(
    request,
    Duration::from_secs(5),
    RetryConfig::new(5, Duration::from_millis(20)), // 5 retries, 20ms initial backoff
)?;
```

`call_resilient` retries on `Timeout` and `Transport` errors. `ServiceFailed` and `NoServer` errors are not retried since they indicate permanent failures.

#### Optional Response

Use `call_optional` when the server may not be running:

```rust
// simplified
match client.call_optional(request, Duration::from_millis(100))? {
    Some(response) => println!("Map: {}x{}", response.width, response.height),
    None => println!("No server available"),
}
```

### Async Client

For non-blocking calls, use `AsyncServiceClient`:

```rust
// simplified
let mut client = AsyncServiceClient::<GetMapRegion>::new()?;

// Start the call (non-blocking)
let mut pending = client.call_async(
    GetMapRegionRequest { x_min: 0.0, y_min: 0.0, x_max: 5.0, y_max: 5.0, resolution: 0.1 },
    Duration::from_secs(1),
);

// Do other work...

// Check if response is ready (non-blocking)
match pending.check()? {
    Some(response) => println!("Map: {}x{}", response.width, response.height),
    None => println!("Still waiting..."),
}

// Check if the call has timed out
if pending.is_expired() {
    println!("Service call timed out");
}

// Or block until done
let response = pending.wait()?;
```

### Client Configuration

```rust
// simplified
// Custom poll interval for faster response detection
let mut client = ServiceClient::<GetMapRegion>::with_poll_interval(
    Duration::from_micros(500), // Default: 1ms
)?;
```

---

## Error Handling

```rust
// simplified
match client.call(request, Duration::from_secs(1)) {
    Ok(response) => { /* success */ }
    Err(ServiceError::Timeout) => { /* server didn't respond in time */ }
    Err(ServiceError::ServiceFailed(msg)) => { /* handler returned Err */ }
    Err(ServiceError::NoServer) => { /* no server found */ }
    Err(ServiceError::Transport(msg)) => { /* IPC error */ }
}
```

| Error | Cause | Retried by `call_resilient`? |
|-------|-------|------------------------------|
| `Timeout` | Server didn't respond within the timeout duration | Yes |
| `ServiceFailed(msg)` | Server handler returned `Err(msg)` | No (permanent) |
| `NoServer` | No service server is running | No (permanent) |
| `Transport(msg)` | IPC/shared memory communication failure | Yes |

---

## Complete Example

A service that looks up robot joint limits by name:

---

## CLI Commands

```bash
# List active services
horus service list
```

---

## Design Decisions

**Why build services on top of topics instead of a separate IPC mechanism?** Topics already solve all the hard shared-memory problems: cross-process communication, automatic backend selection based on topology, live migration when processes split or join, and zero-copy for large data. Building a separate RPC transport would mean reimplementing all of that infrastructure — and maintaining two code paths. By using topics internally (`{name}.request` and `{name}.response`), services inherit every topic optimization for free.
It also means `horus topic list` shows service traffic alongside regular topic traffic, giving you a single debugging tool for all communication.

**Why poll-based servers instead of interrupt-driven wakeups?** The server thread checks for new requests at a configurable interval (default: 5 ms). An alternative would be to use OS-level signaling (futex, eventfd) to wake the server immediately when a request arrives. HORUS uses polling because it produces predictable, bounded CPU usage — the server wakes at known intervals, does bounded work, and sleeps. Interrupt-driven wakeups create unpredictable timing spikes that interfere with real-time nodes sharing the same system. The 5 ms default means worst-case response latency is 5 ms plus handler execution time, which is fast enough for the configuration-query and data-lookup use cases services are designed for.

**Why both sync and async clients?** `ServiceClient` blocks the calling thread until a response arrives — simple but it stalls your node's `tick()` for the duration of the call. `AsyncServiceClient` returns immediately with a `PendingResponse` handle that you check later. Two clients because two usage patterns: inside a scheduler node, use async to avoid blocking the tick loop. In a standalone script or initialization code, use sync for simplicity. Providing only async would force unnecessary complexity on simple scripts. Providing only sync would make services unusable inside real-time scheduler nodes.

**Why `call_resilient` as a separate method instead of building retries into `call`?** Retrying a failed request is not always the right thing to do. If the caller has a tight deadline, spending time on retries could cause a deadline miss. If the request has side effects (triggering a calibration), retrying could trigger it twice. Making retries explicit via `call_resilient` means the developer consciously opts in, and the default `call` method has the simplest possible behavior: try once, succeed or fail.
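The retry rule described above (retry transient errors with growing delays, give up at once on permanent ones) can be sketched with the standard library alone. This is not the HORUS implementation: `CallError`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the doubling factor is an assumption, since the documentation only says the backoff is exponential.

```rust
use std::time::Duration;

/// Local stand-in for the error kinds in the table above (not the HORUS type).
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    Timeout,
    Transport(String),
    ServiceFailed(String),
    NoServer,
}

impl CallError {
    /// Transient errors are worth retrying; permanent ones are not.
    pub fn is_transient(&self) -> bool {
        matches!(self, CallError::Timeout | CallError::Transport(_))
    }
}

/// Delay before retry number `retry` (0-based), doubling each time.
/// The factor of 2 is an assumption made for this sketch.
pub fn backoff_delay(initial: Duration, retry: u32) -> Duration {
    initial.saturating_mul(2u32.saturating_pow(retry))
}

/// Call `op` once, then retry up to `max_retries` times on transient errors.
/// `sleep` is injected so the policy can be exercised without waiting.
pub fn call_with_retries<T>(
    max_retries: u32,
    initial: Duration,
    mut op: impl FnMut() -> Result<T, CallError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, CallError> {
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if e.is_transient() && retry < max_retries => {
                sleep(backoff_delay(initial, retry));
                retry += 1;
            }
            // Permanent error, or retries exhausted: report it to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

In real use `sleep` would be `std::thread::sleep`. Because every retry re-runs `op`, a request with side effects runs again each time, which is the reason the documentation gives for making retries opt-in.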
**Why `call_optional` for missing servers?** During system startup, nodes initialize in order but the server might not be ready when a client starts. `call_optional` returns `None` instead of an error when no server exists, making it trivial to write startup-tolerant code without `match` on error variants. This is a common enough pattern (especially in tests and gradual-startup deployments) that it deserved a dedicated method rather than requiring error-handling boilerplate.

## Trade-offs

| Gain | Cost |
|------|------|
| **Built on topics** — inherits all shared-memory optimizations and debugging tools | Service traffic mixes with topic traffic in `horus topic list` (distinguishable by `.request`/`.response` suffix) |
| **Poll-based server** — predictable timing, no interrupt spikes | Worst-case 5 ms added latency (configurable via `poll_interval`) |
| **Blocking `call()`** — simple, no callbacks, no futures | Stalls the caller's thread; do not use inside tight RT loops |
| **Async client** — non-blocking, works inside scheduler nodes | More complex API (`call_async` + `pending.check()` loop) |
| **Auto-retry via `call_resilient`** — handles transient failures automatically | Retries add latency; not suitable for side-effecting or time-critical calls |
| **Request ID correlation** — multiple clients can share one server | Small overhead per request (monotonic counter + ID matching on response) |
| **Rust-only** — full type safety, compile-time request/response checking | Python bindings not yet available |

## See Also

- [Services API](/rust/api/services) — Full Rust API reference with per-method documentation
- [Communication Overview](/concepts/communication-overview) — When to use topics vs services vs actions
- [Actions](/concepts/actions) — Long-running tasks with feedback and cancellation
- [Topics — Full Reference](/concepts/core-concepts-topic) — The pub/sub primitive that services are built on
- [Topic API](/rust/api/topic) — Topic method reference
(services use topics internally)

---

## Multi-Process Architecture

Path: /concepts/multi-process
Description: Run HORUS nodes across separate processes — shared memory auto-discovery, mixed languages, process isolation

# Multi-Process Architecture

HORUS topics work transparently across process boundaries. Two nodes in separate processes communicate the same way as two nodes in the same process — through shared memory. No broker, no serialization layer, no configuration.

To orchestrate multiple processes with session discovery, control routing, and e-stop propagation, see [Launch System](/concepts/launch-system).

```rust
// simplified
// Process 1: sensor.rs
let topic: Topic = Topic::new("imu")?;
topic.send(imu_reading);

// Process 2: controller.rs
let topic: Topic = Topic::new("imu")?; // same name = same topic
if let Some(reading) = topic.recv() {
    // Got it — zero-config, sub-microsecond
}
```

---

## How It Works

When you call `Topic::new("imu")`, HORUS creates (or opens) a shared memory region. Any process on the same machine that calls `Topic::new("imu")` with the same type connects to the same underlying ring buffer. The shared memory backend is managed by `horus_sys` — you never configure paths manually.
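The "create or open by name, with the same type" rule can be modeled in-process. The sketch below is only an analogy: `TopicRegistry` and `JoinOutcome` are hypothetical names, a `HashMap` stands in for the machine-wide shared memory namespace, and `std::any::type_name` stands in for whatever type check HORUS performs.

```rust
use std::any::type_name;
use std::collections::HashMap;

/// In-process stand-in for the machine-wide topic namespace.
/// Maps each topic name to the message type it was created with.
#[derive(Default)]
pub struct TopicRegistry {
    types: HashMap<String, &'static str>,
}

#[derive(Debug, PartialEq)]
pub enum JoinOutcome {
    /// First participant: the region is created.
    Created,
    /// Later participant with a matching type: the region is opened.
    Opened,
}

impl TopicRegistry {
    /// Create the topic if it is new, open it if it already exists with the
    /// same message type, and fail if the types differ.
    pub fn join<T>(&mut self, name: &str) -> Result<JoinOutcome, String> {
        // type_name is used here for readable errors; it is a modeling choice.
        let wanted = type_name::<T>();
        let existing = self.types.get(name).copied();
        match existing {
            None => {
                self.types.insert(name.to_string(), wanted);
                Ok(JoinOutcome::Created)
            }
            Some(found) if found == wanted => Ok(JoinOutcome::Opened),
            Some(found) => Err(format!(
                "type mismatch on '{name}': topic carries {found}, caller wants {wanted}"
            )),
        }
    }
}
```

Joining `"imu"` twice with the same type yields `Created` and then `Opened`, while a third join with a different type returns an error. No participant has to register first or start in a particular order.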
HORUS auto-detects whether a topic is same-process or cross-process and picks the fastest path:

| Scenario | Latency | How It Works |
|----------|---------|--------------|
| Same thread | ~3ns | Direct pointer handoff |
| Same process, 1:1 | ~18ns | Lock-free single-producer/single-consumer ring buffer |
| Same process, 1:N | ~24ns | Broadcast to multiple in-process subscribers |
| Same process, N:1 | ~26ns | Multiple in-process publishers, one subscriber |
| Same process, N:N | ~36ns | Full many-to-many in-process |
| Cross-process, POD type | ~50ns | Zero-copy shared memory (no serialization) |
| Cross-process, N:1 | ~65ns | Shared memory, multiple publishers |
| Cross-process, 1:N | ~70ns | Shared memory, multiple subscribers |
| Cross-process, 1:1 | ~85ns | Shared memory, serialized type |
| Cross-process, N:N | ~91ns | Shared memory, contention-free fan-out |

Cross-process adds ~30-130ns vs in-process — still sub-microsecond.

You don't configure any of this. The backend is selected automatically based on topology and upgrades transparently as participants join or leave.

---

## Running Multiple Processes

### Option 1: `horus run` with Multiple Files

```bash
# Builds and runs both files as separate processes
horus run sensor.rs controller.rs

# Mixed languages work too
horus run sensor.py controller.rs

# With release optimizations
horus run -r sensor.rs controller.rs
```

`horus run` compiles each file, then launches all processes and manages their lifecycle (SIGTERM on Ctrl+C, etc.).

### Option 2: Separate Terminals

Run each node in its own terminal:

```bash
# Terminal 1
horus run sensor.rs

# Terminal 2
horus run controller.rs
```

Topics auto-discover via shared memory. No coordination needed.
### Option 3: `horus launch` (YAML)

For production, declare your multi-process layout in a launch file:

```yaml
# launch.yaml
nodes:
  - name: sensor
    cmd: horus run sensor.rs
  - name: controller
    cmd: horus run controller.rs
  - name: monitor
    cmd: horus run monitor.py
```

```bash
horus launch launch.yaml
```

---

## Example: Two-Process Sensor Pipeline

**Process 1** — `sensor.rs`:

```rust
// simplified
use horus::prelude::*;

message! {
    WheelEncoder {
        left_ticks: i64,
        right_ticks: i64,
        timestamp_ns: u64,
    }
}

struct EncoderNode {
    publisher: Topic<WheelEncoder>,
    ticks: i64,
}

impl EncoderNode {
    fn new() -> Result<Self> {
        Ok(Self {
            publisher: Topic::new("wheel.encoders")?,
            ticks: 0,
        })
    }
}

impl Node for EncoderNode {
    fn name(&self) -> &str { "Encoder" }

    fn tick(&mut self) {
        self.ticks += 10;
        self.publisher.send(WheelEncoder {
            left_ticks: self.ticks,
            right_ticks: self.ticks + 2,
            timestamp_ns: horus::now_ns(),
        });
    }
}

fn main() -> Result<()> {
    let mut sched = Scheduler::new().tick_rate(100_u64.hz());
    sched.add(EncoderNode::new()?).order(0).build()?;
    sched.run()?;
    Ok(())
}
```

**Process 2** — `odometry.rs`:

```rust
// simplified
use horus::prelude::*;

message! {
    WheelEncoder {
        left_ticks: i64,
        right_ticks: i64,
        timestamp_ns: u64,
    }
}

struct OdometryNode {
    encoder_sub: Topic<WheelEncoder>,
    odom_pub: Topic,
    last_left: i64,
    last_right: i64,
}

impl OdometryNode {
    fn new() -> Result<Self> {
        Ok(Self {
            encoder_sub: Topic::new("wheel.encoders")?,
            odom_pub: Topic::new("odom")?,
            last_left: 0,
            last_right: 0,
        })
    }
}

impl Node for OdometryNode {
    fn name(&self) -> &str { "Odometry" }

    fn tick(&mut self) {
        if let Some(enc) = self.encoder_sub.recv() {
            let dl = enc.left_ticks - self.last_left;
            let dr = enc.right_ticks - self.last_right;
            self.last_left = enc.left_ticks;
            self.last_right = enc.right_ticks;
            println!("[Odom] delta L={} R={}", dl, dr);
        }
    }
}

fn main() -> Result<()> {
    let mut sched = Scheduler::new().tick_rate(100_u64.hz());
    sched.add(OdometryNode::new()?).order(0).build()?;
    sched.run()?;
    Ok(())
}
```

Run them:

```bash
# Terminal 1
horus run sensor.rs

# Terminal 2
horus run odometry.rs
```

The `WheelEncoder` messages flow through shared memory at ~50ns latency, with zero configuration.
---

## When to Use Multi-Process

| Factor | Single Process | Multi-Process |
|--------|----------------|---------------|
| **Latency** | ~3-36ns (intra-process) | ~50-171ns (cross-process) |
| **Determinism** | Full control via scheduler ordering | Each process has its own scheduler |
| **Isolation** | A crash takes down everything | A crash is contained to one process |
| **Languages** | Single language per binary | Mix Rust + Python freely |
| **Restart** | Must restart everything | Restart one process independently |
| **Debugging** | Single debugger session | Attach debugger to one process |
| **Deployment** | One binary to deploy | Multiple binaries |
| **Complexity** | Simpler | More moving parts |

**Use single-process when:**

- All nodes are the same language
- You need deterministic ordering between nodes (e.g., sensor → controller → actuator)
- Latency matters at the nanosecond level
- Simpler deployment is preferred

**Use multi-process when:**

- Mixing Rust and Python (e.g., Rust motor control + Python ML inference)
- Process isolation is needed (safety-critical separation)
- Independent restart required (update one node without stopping others)
- Different update rates or lifecycle requirements

---

## Introspection

HORUS CLI tools work across processes automatically:

```bash
# See all topics (from any process)
horus topic list

# Monitor a topic published by another process
horus topic echo wheel.encoders

# See all running nodes across processes
horus node list

# Check bandwidth across processes
horus topic bw wheel.encoders
```

---

## Cleaning Up

Shared memory files persist after processes exit. Clean them with:

```bash
horus clean --shm # Remove stale shared memory regions
```

In practice, you rarely need this — HORUS automatically cleans stale SHM on every `horus` CLI command and every `Scheduler::new()` call. The manual command is an escape hatch for debugging.
---

## What Happens When a Process Crashes

When a process dies (even via SIGKILL or power loss):

1. **SHM files persist** — the kernel closes the file descriptor and releases `flock` locks, but the mmap'd file stays on disk
2. **Other processes continue** — subscribers see `dropped_count()` increase if the publisher was mid-write, but they don't crash
3. **Backend auto-migrates** — when the crashed process restarts and reconnects, the topic detects the new participant and migrates the backend (e.g., from 1:1 to 1:N) within ~10μs
4. **Automatic cleanup** — the next `horus` CLI command or `Scheduler::new()` call auto-cleans stale namespaces (<1ms). No manual intervention needed.

```bash
# Process 1 crashes
# Process 2 keeps running, reading stale data from the ring buffer
# Process 1 restarts
# Process 2 sees fresh data again — no reconfiguration needed
```

**Type mismatches**: If a restarted process changes its message type (e.g., from `CmdVel` to `Twist`), the join fails with an error. Both processes must use the same message type for the same topic name.

---

## Mixed-Language Multi-Process

The most common multi-process pattern: Rust for control loops, Python for ML inference.

**Run both**:

```bash
# Terminal 1 (Rust, 30 FPS camera)
horus run sensor.rs

# Terminal 2 (Python, ML inference)
horus run detector.py
```

The `Image` flows through shared memory pool-backed transport — the Python node gets a zero-copy view of the pixels the Rust node wrote. No serialization, no copying.
---

## Debugging Multi-Process Systems

### Identify which process owns what

```bash
horus topic list --verbose
# Shows publisher/subscriber PIDs per topic

horus node list
# Shows all running nodes across all processes with PID, rate, CPU, memory
```

### Watch cross-process data flow

```bash
# Monitor messages from the Rust sensor in the Python process's terminal
horus topic echo camera.rgb

# Measure the actual publishing rate
horus topic hz camera.rgb

# Measure bandwidth
horus topic bw camera.rgb
```

### Debug one process at a time

```bash
# Start the sensor normally
horus run sensor.rs

# Start the detector with verbose logging
RUST_LOG=debug horus run detector.py
```

### Use the monitor for a system-wide view

```bash
horus monitor
# Web UI at http://localhost:3000 shows ALL nodes from ALL processes
# Topic graph view shows cross-process message flow
```

### Common debugging workflow

1. `horus topic list` — verify both processes see the same topics
2. `horus topic hz <topic>` — verify the publisher is sending at expected rate
3. `horus topic echo <topic>` — verify message content is correct
4. `horus node list` — verify both nodes are `Running` (not `Error` or `Crashed`)
5. `horus bb --anomalies` — check for deadline misses or errors

---

## Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| Topics not visible across processes | Different SHM namespaces | Set `HORUS_NAMESPACE=shared` in both terminals, or use `horus launch` |
| `Type mismatch` on topic join | Process A uses `CmdVel`, Process B uses different type for same name | Ensure both processes use the exact same message type |
| Stale data after crash | SHM files persist after process death | Usually auto-cleaned on next `horus run`. Manual: `horus clean --shm` |
| `Topic not found` in CLI | CLI uses a different namespace than the running app | Run CLI in same terminal or set matching `HORUS_NAMESPACE` |
| High `dropped_count` | Subscriber process is slower than publisher | Increase subscriber rate, reduce publisher rate, or increase topic capacity |
| Permission denied on SHM | Different users running processes | Run both as the same user, or check `/dev/shm` permissions |

---

## Design Decisions

### Why Auto-Discovery via Shared Memory Names

When you call `Topic::new("imu")` in two separate processes, both connect to the same shared memory region because the topic name deterministically maps to a shared memory path (managed by `horus_sys`). There is no registration step, no discovery protocol, and no configuration file listing topic endpoints. This works because shared memory is a kernel-level namespace — any process on the same machine that opens the same named region gets the same memory. Auto-discovery eliminates an entire class of misconfiguration bugs ("I forgot to register my topic") and means processes can start and stop in any order.

### Why No Broker Process

Message brokers (like DDS in ROS2 or MQTT) add a routing hop between every publisher and subscriber. Even with optimizations, this hop adds latency and creates a single point of failure — if the broker crashes, all communication stops. HORUS uses direct shared memory: publishers write to a ring buffer, subscribers read from it, and no intermediary process routes messages. This gives sub-microsecond latency and means there is no central process that can fail. The cost is that HORUS topics only work on a single machine (cross-machine communication requires an explicit bridge).
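To make "publishers write to a ring buffer, subscribers read from it" concrete, here is a small single-threaded model of a bounded ring with a drop counter. It is a sketch under stated assumptions, not the HORUS data structure: `Ring` is a hypothetical type, and overwriting the oldest unread message when full is an assumed policy chosen to show how a slow subscriber leads to a rising drop count.

```rust
/// Fixed-capacity ring buffer that overwrites the oldest unread message when
/// full and counts how many were lost. Single-threaded model only; a real
/// shared-memory ring would use atomics for its indices.
pub struct Ring<T> {
    slots: Vec<Option<T>>,
    head: usize,  // index of the oldest unread message
    len: usize,   // number of unread messages
    dropped: u64, // messages overwritten before being read
}

impl<T> Ring<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            head: 0,
            len: 0,
            dropped: 0,
        }
    }

    /// Publisher side: never blocks. If the ring is full, the oldest unread
    /// message is discarded to make room (assumed policy).
    pub fn send(&mut self, msg: T) {
        let cap = self.slots.len();
        if self.len == cap {
            self.head = (self.head + 1) % cap;
            self.len -= 1;
            self.dropped += 1;
        }
        let tail = (self.head + self.len) % cap;
        self.slots[tail] = Some(msg);
        self.len += 1;
    }

    /// Subscriber side: returns the oldest unread message, if any.
    pub fn recv(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        let msg = self.slots[self.head].take();
        self.head = (self.head + 1) % self.slots.len();
        self.len -= 1;
        msg
    }

    pub fn dropped_count(&self) -> u64 {
        self.dropped
    }
}
```

With a capacity of 2, sending three messages before any read leaves the two newest in the ring and a drop count of 1. In this model, a larger capacity or a faster reader reduces the count, which matches the fixes listed in the Common Errors table.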
### Why Transparent Same-Process vs Cross-Process Selection HORUS automatically detects whether a publisher and subscriber are in the same process or different processes and selects the fastest transport: direct pointer handoff (~3ns) for same-thread, lock-free ring buffer (~18ns) for same-process, or shared memory (~50ns) for cross-process. Users write the same `Topic::new("name")` call regardless. This means code that works in a single-process prototype deploys to a multi-process production system with zero changes. The transport upgrades and downgrades transparently as participants join and leave — splitting a monolith into separate processes does not require code changes. ## Trade-offs | Area | Benefit | Cost | |------|---------|------| | **Auto-discovery** | Zero configuration; processes connect by topic name alone; start/stop in any order | No explicit topology — harder to audit which processes are connected without `horus topic list` | | **No broker** | Sub-microsecond latency; no single point of failure; no extra process to deploy | Single-machine only — cross-machine communication requires an explicit network bridge | | **Transparent transport** | Same code works in single-process and multi-process; zero migration cost | Users cannot force a specific transport backend; automatic selection may surprise during debugging | | **Process isolation** | One crash does not take down the system; independent restart and upgrade | Higher baseline latency (~50ns cross-process vs ~3ns same-thread); shared memory files persist after exit and need cleanup | | **Shared memory persistence** | Fast reconnection — no handshake needed when a process restarts | Stale files from crashes are auto-cleaned on next startup; `horus clean --shm` for manual override | | **Independent schedulers** | Each process can run at its own tick rate with its own ordering | No cross-process deterministic ordering — sensor-to-actuator chains across processes depend on timing, not scheduler order | 
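The transport selection rule described above (same thread, same process, or cross-process) can be written as a small decision function. This is a sketch of the rule as stated in the text, not HORUS's implementation: `Endpoint`, `Transport`, and `select_transport` are hypothetical names, and the latency figures are copied from the paragraph rather than measured.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    DIRECT = "direct pointer handoff (~3ns)"
    RING = "lock-free ring buffer (~18ns)"
    SHM = "shared memory (~50ns)"


@dataclass(frozen=True)
class Endpoint:
    """Where a publisher or subscriber runs (hypothetical model)."""
    pid: int
    thread_id: int


def select_transport(pub: Endpoint, sub: Endpoint) -> Transport:
    # Different processes can only meet in shared memory.
    if pub.pid != sub.pid:
        return Transport.SHM
    # Same process, different threads: an in-process ring buffer is enough.
    if pub.thread_id != sub.thread_id:
        return Transport.RING
    # Same thread: hand the value over directly.
    return Transport.DIRECT


if __name__ == "__main__":
    sensor = Endpoint(pid=100, thread_id=1)
    planner = Endpoint(pid=200, thread_id=1)
    print(select_transport(sensor, planner).value)
```

Because the choice depends only on where the participants run, moving a node into its own process changes the result of this function without touching the node's code, which is the property the trade-offs table lists under "Transparent transport".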
## See Also - [Shared Memory](/concepts/shared-memory) — SHM architecture, ring buffers, platform differences - [Topics (Full Reference)](/concepts/core-concepts-topic) — topic API and backend details - [Message Performance](/concepts/core-concepts-podtopic) — POD types and zero-copy transport - [Multi-Language](/concepts/multi-language) — Rust + Python interop patterns - [CLI Reference](/development/cli-reference) — `horus launch`, `horus topic list`, `horus clean` --- ## Launch System Path: /concepts/launch-system Description: Multi-process orchestration with full observability — session discovery, control topics, e-stop propagation, and coordinated shutdown # Launch System The Scheduler runs multiple nodes inside a single process. The launch system runs multiple *processes* on the same machine — each process can contain its own Scheduler with its own nodes. ```yaml # formation.yaml — users only write this session: formation nodes: - name: controller package: control-pkg rate_hz: 100 - name: perception package: perception-pkg rate_hz: 30 depends_on: [controller] ``` ```bash horus launch formation.yaml ``` --- ## When to Use Scheduler vs Launch | Situation | Use | |-----------|-----| | All nodes in one language, one binary | **Scheduler only** | | Mixed Rust + Python nodes | **Launch** (separate processes) | | Crash isolation (camera crash shouldn't kill motor) | **Launch** (process boundaries) | | Multiple robots on one machine | **Launch** (namespaced sessions) | | Simulation testing, deterministic replay | **Scheduler** (`tick_once()`, `deterministic(true)`) | | Field deployment on embedded hardware | **Launch** (swap nodes by editing YAML, no recompile) | | Maximum performance (sub-microsecond IPC) | **Scheduler** (in-process backends: 3-36ns) | **Rule of thumb:** Start with Scheduler. Move to launch when you need process isolation, mixed languages, or runtime node composition without recompiling. 
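The `depends_on` field in the `formation.yaml` example implies a start order, which launch resolves with a topological sort. A minimal sketch of that step using Python's standard `graphlib` is shown here, assuming the YAML has already been parsed into dictionaries; `launch_order` is a hypothetical helper, and any tie-breaking by `priority` or `start_delay` is ignored.

```python
from graphlib import CycleError, TopologicalSorter


def launch_order(nodes: list) -> list:
    """Return node names so that every node comes after its dependencies."""
    # Map each node name to the set of names it must wait for.
    graph = {n["name"]: set(n.get("depends_on", [])) for n in nodes}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular depends_on: {err.args[1]}") from err


# Same structure as the formation.yaml example, already parsed.
FORMATION = [
    {"name": "perception", "package": "perception-pkg", "rate_hz": 30,
     "depends_on": ["controller"]},
    {"name": "controller", "package": "control-pkg", "rate_hz": 100},
]

if __name__ == "__main__":
    print(launch_order(FORMATION))
```

A dependency cycle has no valid start order, so the sketch rejects it up front instead of spawning a partial session.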
--- ## How Launch Integrates with HORUS Launch is not a separate system — it is wired into the same observability infrastructure as the Scheduler. When you run `horus launch`, this is what happens: ### 1. Process Spawning Launch parses the YAML, resolves dependencies via topological sort, and spawns each node as a separate OS process. Each process receives environment variables: - `HORUS_NODE_NAME` — the Scheduler reads this as its default name - `HORUS_NAMESPACE` — shared memory namespace isolation - `HORUS_PARAM_*` — parameters from the launch YAML, consumed by `RuntimeParams::new()` ### 2. Session Registry (Auto-Discovery) After spawning, launch writes a session manifest to shared memory: ``` /dev/shm/horus_{namespace}/launch/{session}.json ``` It then polls node presence files to discover which Schedulers appeared. Once a process creates a Scheduler, launch auto-detects it and records the Scheduler name, control topic, and node list — all without any configuration from the user. ```bash # See active sessions horus launch --status ``` ``` ACTIVE LAUNCH SESSIONS ● formation (3 processes, uptime: 5m 32s) File: formation.yaml NAME PID STATUS SCHEDULER RESTARTS controller 12346 running controller 0 perception 12347 running perception 0 drivers 12348 running drivers 0 ``` ### 3. Control Topics Launch creates a session-level control topic: ``` horus.launch.ctl.{session} ``` Commands sent to this topic are routed to the correct per-Scheduler control topic. This means `StopNode("controller")` goes to `horus.ctl.controller` automatically — CLI tools don't need to know which Scheduler owns which node. ### 4. E-Stop Propagation Launch subscribes to the `_horus.estop` topic. When any node triggers an emergency stop: 1. Launch sends `GracefulShutdown` to every discovered Scheduler control topic 2. Schedulers run their full shutdown sequence (node `shutdown()` callbacks, blackbox flush, presence cleanup) 3. If a process doesn't exit within 2 seconds, launch sends SIGTERM 4. 
If still alive after 3 more seconds, SIGKILL ### 5. Coordinated Shutdown Pressing Ctrl+C triggers the same three-phase shutdown: ``` Phase A: GracefulShutdown → all Scheduler control topics (2s grace) Phase B: SIGTERM → any survivors (3s grace) Phase C: SIGKILL → any still alive ``` This ensures nodes get their `shutdown()` callback called, hardware is released, files are flushed, and presence files are cleaned up — instead of raw SIGKILL losing everything. ### 6. Blackbox Events Launch records lifecycle events to a JSONL log: ``` /dev/shm/horus_{namespace}/launch/{session}.events.jsonl ``` Events: `SessionStart`, `NodeSpawned`, `NodeCrashed`, `NodeRestarted`, `SessionStop`. Each includes nanosecond timestamps and process details. View with: ```bash cat /dev/shm/horus_*/launch/*.events.jsonl | jq . ``` ### 7. CLI Integration Nodes spawned by launch appear in standard CLI tools with session grouping: ```bash horus node list # groups nodes under "Launch: session (N nodes)" horus topic echo # live data from any launched process horus topic list # shows all topics across all launched processes horus param get # reads params from launched processes horus param set # sets params at runtime on launched processes horus launch --status # shows all active sessions with per-process table horus launch --stop # stops a session from any terminal horus launch --list # dry-run: shows nodes and dependencies ``` Example output of `horus node list` with a running launch session: ``` Running Nodes: Launch: formation (3 nodes) NAME STATUS PRIORITY RATE TICKS SOURCE --------------------------------------------------------------------------- pid_loop Running 0 100 Hz 5032 12346 planner Running 0 50 Hz 2516 12346 detector Running 100 30 Hz 1510 12347 Total: 3 node(s) ``` ### 8. Node Kill Routing `horus node kill ` automatically detects if a node belongs to a launch session. 
If it does, the command routes through the launch control topic (`horus.launch.ctl.{session}`) so the launch monitor can track the stop, update the session manifest, and potentially restart the process. Non-launched nodes use the direct Scheduler control path as before. ### 9. Network Replication Launched processes can enable `horus_net` for LAN topic replication: ```yaml session: fleet env: HORUS_NET_ENABLED: "true" nodes: - name: robot_a command: ./target/release/controller - name: robot_b command: ./target/release/controller ``` Network replication is on by default. Use `.network(false)` to disable. The launch system discovers networked Schedulers the same way as local ones — via SHM presence files and the scheduler directory. --- ## Launch File Reference ```yaml # Session name (used for SHM manifest, control topic, event log) session: my_robot # Global namespace (all topics prefixed) namespace: robot_1 # Global environment variables (applied to all nodes) env: HORUS_LOG_LEVEL: info nodes: - name: controller # Required: unique name package: control-pkg # Run via `horus run control-pkg` # OR command: "python ctrl.py" # Run a custom command rate_hz: 100 # Target tick rate (hint — Scheduler controls actual rate) priority: 0 # Launch order (lower = earlier) namespace: /arm # Per-node namespace prefix params: # Injected as HORUS_PARAM_* env vars max_speed: 1.5 robot_id: 1 env: # Extra environment variables CUDA_VISIBLE_DEVICES: "0" depends_on: [sensor] # Wait for these nodes to start first start_delay: 0.5 # Seconds to wait before launching restart: on-failure # "never" (default), "always", "on-failure" args: [--verbose] # Extra command-line arguments ``` --- ## Parameters: Launch YAML to Node Code Parameters in the launch YAML reach your node code automatically: ```yaml # launch.yaml nodes: - name: controller package: control-pkg params: max_speed: 1.5 robot_id: 1 ``` ```rust // In your node's init() let params = RuntimeParams::new()?; let speed: f64 = 
params.get("max_speed").unwrap_or(1.0);
let id: i64 = params.get("robot_id").unwrap_or(0);
```

```python
# In your Python node
params = horus.RuntimeParams()
speed = params.get("max_speed", default=1.0)
```

The launch system sets `HORUS_PARAM_MAX_SPEED=1.5` and `HORUS_PARAM_ROBOT_ID=1` as environment variables. `RuntimeParams::new()` reads all `HORUS_PARAM_*` variables automatically, parsing them as the appropriate type (number, boolean, or string).

---

## Restart Policies

| Policy | Behavior |
|--------|----------|
| `never` (default) | Process exits, launch records the event, moves on |
| `on-failure` | Restart only if exit code is non-zero. Exponential backoff (100ms to 10s). Max 10 restarts. |
| `always` | Restart on any exit. Same backoff and max restarts. |

---

## Compared to ROS 2

| Feature | ROS 2 Launch | HORUS Launch |
|---------|-------------|--------------|
| Config format | Python, XML, or YAML | YAML only |
| Process management | `ros2 launch` | `horus launch` |
| Parameter injection | Launch parameters → node params | `HORUS_PARAM_*` → `RuntimeParams` |
| Session discovery | N/A | Auto-discovered from SHM |
| Control routing | N/A | `horus.launch.ctl.{session}` topic |
| E-stop propagation | Custom | Built-in, wired to `_horus.estop` |
| Coordinated shutdown | Signals only (SIGINT → SIGTERM → SIGKILL) | Control topic → SIGTERM → SIGKILL |
| Event logging | rosout | JSONL blackbox per session |
| Observability | `ros2 node list` (separate) | `horus node list` shows session grouping |

The key difference: ROS 2 launch is a process spawner that delegates everything to DDS. HORUS launch is wired into the SHM observability stack — it discovers Schedulers, routes control commands, propagates safety events, and records lifecycle data.
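The restart policies table gives the bounds of the backoff (100ms to 10s, at most 10 restarts) but not the growth factor. The sketch below shows one schedule consistent with those bounds, assuming the delay doubles after each restart; the doubling and the names `should_restart` and `backoff_ms` are assumptions for illustration, not the launch system's code.

```python
MIN_DELAY_MS = 100      # first restart delay, from the table
MAX_DELAY_MS = 10_000   # backoff ceiling, from the table
MAX_RESTARTS = 10       # restart limit, from the table


def should_restart(policy: str, exit_code: int, restarts_so_far: int) -> bool:
    """Apply the never / on-failure / always rules."""
    if restarts_so_far >= MAX_RESTARTS:
        return False
    if policy == "always":
        return True
    if policy == "on-failure":
        return exit_code != 0
    return False  # "never" (the default)


def backoff_ms(restarts_so_far: int) -> int:
    """Delay before the next restart; doubling is an assumed growth factor."""
    return min(MIN_DELAY_MS * 2 ** restarts_so_far, MAX_DELAY_MS)


if __name__ == "__main__":
    print([backoff_ms(n) for n in range(MAX_RESTARTS)])
```

Under the doubling assumption the delay reaches the 10s ceiling on the eighth restart and stays there until the restart limit is hit.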
--- ## Mixed Language Example The primary reason launch exists — running Rust and Python nodes together: ```yaml # mixed_robot.yaml session: mixed_robot nodes: - name: controller command: ./target/release/motor_controller params: max_speed: 1.5 pid_kp: 2.0 - name: perception command: python3 ml_detector.py depends_on: [controller] params: model: yolov8n confidence: 0.7 ``` Both processes communicate via SHM topics — the Rust controller publishes `cmd_vel`, the Python ML node subscribes to `camera.image` and publishes `detections`. All visible in `horus topic list`, all controllable via `horus launch --stop`. --- ## CLI Quick Reference ```bash # Launch horus launch robot.yaml # start all nodes horus launch robot.yaml --dry-run # show plan without starting horus launch robot.yaml --list # show nodes and dependencies horus launch robot.yaml --namespace r1 # override namespace # Monitor horus launch --status # show all active sessions horus node list # show nodes grouped by session horus topic list # show topics from all processes horus topic echo # stream live data # Control horus launch --stop # stop a session from any terminal horus node kill # stop a node (routes via launch if applicable) # Debug horus param get # read runtime params horus param set # set params at runtime cat /dev/shm/horus_*/launch/*.events.jsonl | jq . # view lifecycle events ``` --- ## Multi-Language Support Path: /concepts/multi-language Description: Use HORUS with Rust, Python, and C++ # Multi-Language Support HORUS supports **Rust**, **Python**, and **C++**. All three share the same shared-memory IPC — a C++ publisher writes directly to the same ring buffer a Python subscriber reads from. Zero serialization, zero copying. ## Supported Languages ### Rust (Native) **Best for:** High-performance nodes, control loops, real-time systems HORUS is written in Rust and provides the most complete API. 
```bash horus new my-project # Rust by default ``` **Learn more:** [Quick Start](/getting-started/quick-start) ### Python **Best for:** Rapid prototyping, AI/ML integration, data processing, visualization Python bindings (via PyO3) provide a Pythonic API with typed messages, per-node rate control, and multiprocess support. Integrates with NumPy, PyTorch, TensorFlow. ```bash horus new my-project --lang python ``` **Learn more:** [Python Quick Start](/getting-started/quick-start-python) ### C++ **Best for:** Existing C++ codebases, ROS2 migration, hardware drivers, performance-critical systems C++ bindings provide an idiomatic API with RAII, move semantics, zero-copy loan pattern, and template specializations. 15ns FFI overhead. Full API: Scheduler, Publisher/Subscriber, TensorPool, Image, PointCloud, Params, Services, Actions, TransformFrame. ```bash horus new my-project --lang cpp ``` **Learn more:** [C++ Quick Start](/getting-started/quick-start-cpp) | [Migrating from ROS2 C++](/tutorials/migrating-from-ros2-cpp) ## Cross-Language Communication All three languages communicate through HORUS's shared memory system. For cross-language communication, all sides **must use the same typed message types**. The SHM ring buffer layout is identical regardless of which language writes to it. 
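The requirement that all sides use the same typed message follows from how fixed-layout bytes are read. The sketch below uses Python's `struct` module with two hypothetical layouts (two and three little-endian `f32` fields); these are illustrative stand-ins, not the real HORUS layouts of `CmdVel` or `Pose2D`.

```python
import struct

# Hypothetical layouts, for illustration only.
CMD_VEL = struct.Struct("<ff")    # linear, angular
POSE_2D = struct.Struct("<fff")   # x, y, theta


def publish_cmd_vel(linear: float, angular: float) -> bytes:
    """Writer side: the message is just its fields laid out back to back."""
    return CMD_VEL.pack(linear, angular)


def read_as(layout: struct.Struct, payload: bytes) -> tuple:
    """Reader side: interpret raw bytes using the layout the reader expects."""
    return layout.unpack(payload)


if __name__ == "__main__":
    payload = publish_cmd_vel(1.5, 0.25)
    print(read_as(CMD_VEL, payload))      # matching layout: fields round-trip
    try:
        read_as(POSE_2D, payload)         # mismatched layout
    except struct.error as err:
        print(f"type mismatch: {err}")
```

The payload carries no field names or type tags, so nothing in the bytes tells a reader which layout was used. In this sketch the mismatch happens to be detectable because the sizes differ; two layouts of equal size would decode without error and yield wrong values, which is why the type has to be agreed on per topic.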
### Typed Topics for Cross-Language Communication Pass a message type class to the Python `Topic()` constructor to create a typed topic that Rust can read: ### Supported Typed Message Types These types work for Python-Rust cross-language communication: | Message Type | Python Constructor | Default Topic Name | Use Case | |-------------|-------------------|-------------------|----------| | `CmdVel` | `CmdVel(linear, angular)` | `cmd_vel` | Velocity commands | | `Pose2D` | `Pose2D(x, y, theta)` | `pose` | 2D position | | `Imu` | `Imu(accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z)` | `imu` | IMU sensor data | | `Odometry` | `Odometry(x, y, theta, linear_velocity, angular_velocity)` | `odom` | Odometry data | | `LaserScan` | `LaserScan(angle_min, angle_max, ..., ranges=[...])` | `scan` | LiDAR scans | All message types include an optional `timestamp_ns` field for nanosecond timestamps. **Usage examples:** ```python from horus import Topic, CmdVel, Imu, LaserScan # Velocity commands cmd_topic = Topic(CmdVel) cmd_topic.send(CmdVel(linear=1.5, angular=0.3)) # IMU data imu_topic = Topic(Imu) imu_topic.send(Imu( accel_x=0.0, accel_y=0.0, accel_z=9.81, gyro_x=0.0, gyro_y=0.0, gyro_z=0.1 )) # Receive (returns typed Python object or None) if cmd := cmd_topic.recv(): print(f"linear={cmd.linear}, angular={cmd.angular}") ``` ### Generic Topics (Python-to-Python Only) For Python-to-Python communication, pass a string name to create a generic topic that can send any Python type: ```python from horus import Topic # Generic topic - pass topic name as string topic = Topic("my_data") topic.send({"sensor": "lidar", "ranges": [1.0, 1.1, 1.2]}) topic.send([1, 2, 3]) topic.send("hello") # Receive if msg := topic.recv(): print(msg) # Python dict, list, string, etc. ``` Generic topics use JSON/MessagePack serialization internally. **Rust nodes cannot read generic topics** — use typed messages for cross-language communication. 
**When to use which:** - **Typed Topics** (`Topic(CmdVel)`, `Topic(Pose2D)`) — Cross-language Rust+Python or Python-only - **Generic Topics** (`Topic("topic_name")`) — Python-only systems with custom data ## Python Node API The Python `Node` class provides a simple callback-based API: ```python import horus from horus import CmdVel, Pose2D, Topic pose_sub = Topic(Pose2D) cmd_pub = Topic(CmdVel) def controller_tick(node): pose = pose_sub.recv() if pose is not None: # Compute velocity command from pose cmd = CmdVel(linear=1.0, angular=0.5) cmd_pub.send(cmd) controller = horus.Node(name="Controller", tick=controller_tick, order=0, rate=30, subs=["pose"], pubs=["cmd_vel"]) horus.run(controller) ``` **Key Topic methods:** - `topic.send(msg)` — Send data to a topic - `topic.recv()` — Get next message (returns `None` if no messages) ## Choosing a Language | Use Case | Recommended Language | |----------|---------------------| | **Control loops** | Rust (lowest latency) | | **AI/ML models** | Python (ecosystem) | | **Hardware drivers** | Rust | | **Data processing** | Python or Rust | | **Real-time systems** | Rust | | **Prototyping** | Python (fastest development) | ## Mixed-Language Systems You can build systems with nodes in different languages: **Example: Robot with mixed languages** - **Motor controller** (Rust) — 1kHz control loop - **Vision processing** (Python) — PyTorch object detection - **Hardware driver** (Rust) — Sensor integration - **Monitor** (Rust) — Real-time visualization All communicate through HORUS shared memory with **sub-microsecond latency**. ### Running Mixed-Language Systems The `horus run` command automatically handles compilation and execution of mixed-language systems: ```bash # Mix Python and Rust nodes horus run sensor.py controller.rs visualizer.py # Mix Rust and Python horus run lidar_driver.rs planner.py motor_control.rs ``` **What happens:** 1. 
**Rust files** (`.rs`) are automatically compiled with `cargo build` using HORUS dependencies 2. **Python files** (`.py`) are executed directly with Python 3 3. All processes communicate via shared memory (managed automatically by `horus_sys`) 4. `horus run` manages the lifecycle (start, monitor, stop all together) **Note:** For Rust files, `horus run` creates a temporary Cargo project in `.horus/` with proper dependencies, builds it with `cargo build`, and executes the resulting binary. ### Example: Complete Mixed System ```bash # Run both together horus run sensor.py planner.rs # Both processes communicate via shared memory ``` **Benefits:** - **No manual compilation** — `horus run` handles it - **Automatic dependency management** — HORUS libraries linked correctly - **Process isolation** — One crash doesn't kill the whole system - **True parallelism** — Each process can use separate CPU cores ## API Parity | Feature | Rust | Python | |---------|------|--------| | **Topic send/recv** | `topic.send(msg)` / `topic.recv()` | `topic.send(msg)` / `topic.recv()` | | **Typed messages** | `Topic` | `Topic(CmdVel)` | | **Generic messages** | `Topic` | `Topic("name")` | | **Node lifecycle** | `init()`, `tick()`, `shutdown()` | `init()`, `tick()`, `shutdown()` callbacks | | **Scheduler** | `Scheduler::new()` | `Scheduler()` | | **Node priority** | `.order(n)` | `order=n` | | **Rate control** | `Scheduler` rate | `rate=Hz` per node | | **Transport selection** | Automatic (topology-based) | Automatic (topology-based) | | **Message types** | Full horus_library | CmdVel, Pose2D, Imu, Odometry, LaserScan + horus.library | | **TransformFrame transforms** | `TransformFrame::new()` | `TransformFrame()` | | **Tensor system** | Native | `Image`, `PointCloud`, `DepthImage` (pool-backed) | | **Logging** | `hlog!(info, ...)` | `node.log_info(...)` | ## Next Steps **Choose your language:** - [Python Bindings](/python/api/python-bindings) — Full guide with examples - [Quick 
Start](/getting-started/quick-start) — Get started with Rust

**Build something:**

- [Examples](/rust/examples/basic-examples) — See multi-language systems in action
- [CLI Reference](/development/cli-reference) — `horus new` command options

---

## Design Decisions

### Why Rust + Python (Not C++)

ROS2's primary languages are C++ and Python — C++ for performance, Python for scripting. HORUS chose Rust over C++ for the core because Rust provides the same zero-cost abstractions and bare-metal performance while eliminating entire categories of bugs (use-after-free, data races, buffer overflows) at compile time. For robotics — where memory safety bugs can cause physical harm — this is not a style preference but a safety requirement. Python was kept because the ML/AI ecosystem (PyTorch, TensorFlow, NumPy) is overwhelmingly Python-based, and robotics increasingly depends on learned models.

### Why Same Topics Across Languages (Shared Memory)

Python and Rust nodes communicate through the same shared memory topics with identical layouts. A `Topic(CmdVel)` in Python writes to the same shared memory region as `Topic<CmdVel>` in Rust. This means cross-language communication has the same sub-microsecond latency as same-language communication — there is no serialization bridge, no socket layer, and no middleware translation. The tradeoff is that typed message layouts must match exactly across languages, which is enforced by generating the Python message classes from the same Rust struct definitions via PyO3.

### Why PyO3 Bindings with a Python Wrapper Layer

HORUS uses a two-layer Python architecture: `_horus` (Rust bindings via PyO3) and `horus` (Python wrapper in `__init__.py`). The Rust layer provides raw performance and shared memory access. The Python wrapper adds Pythonic ergonomics — keyword arguments, sensible defaults, `horus.run()` one-liner, and idiomatic naming.
This separation means the Rust layer can expose low-level primitives without worrying about Python API design, while the Python layer can evolve its interface without changing Rust code. Users import `horus`, not `_horus`. ## Trade-offs | Area | Benefit | Cost | |------|---------|------| | **Rust over C++** | Memory safety at compile time; no data races or use-after-free in production | Steeper learning curve for teams coming from C++; smaller ecosystem of robotics-specific Rust libraries | | **Shared memory across languages** | Sub-microsecond cross-language latency; no serialization bridge | Typed message layouts must match exactly — adding a field to a Rust message requires updating the Python binding | | **PyO3 bindings** | Direct access to Rust performance from Python; no FFI boilerplate | PyO3 builds require a Rust toolchain — pure-Python environments cannot use HORUS without compiling | | **Two-layer Python API** | Clean Pythonic interface independent of Rust internals | Two layers to maintain; changes to the Rust API may not automatically surface in the Python wrapper | | **Generic topics (Python-only)** | Send any Python type without defining a message struct | Not readable by Rust nodes; serialization overhead compared to typed topics | | **No C++ support** | Simpler codebase, fewer language bindings to maintain | Cannot integrate with existing C++ robotics code without an interop layer or rewrite | ## Cross-Language IPC: Rust and Python Sharing Topics Rust and Python nodes can communicate through the same shared memory topics. Both languages use the same binary wire format (POD zero-copy) when using **typed topics**. 
### Rust Publisher, Python Subscriber

**Rust side:**

```rust
use horus_library::messages::sensor::Imu;

// The type parameter must match the Python side (horus.Imu)
let topic: Topic<Imu> = Topic::new("sensor.imu")?;
let mut imu = Imu::new();
imu.linear_acceleration = [0.0, 0.0, 9.81];
topic.send(imu);
```

**Python side:**

```python
import horus

def tick(node):
    msg = node.recv("sensor.imu")
    if msg is not None:
        print(f"accel_z = {msg.linear_acceleration[2]}")

sub = horus.Node("py_sub", tick=tick, subs={"sensor.imu": horus.Imu}, rate=100)
horus.run(sub, duration=10.0)
```

The **dict key** `"sensor.imu"` becomes the SHM topic name. Both sides must use the same string. The **type** (`horus.Imu`) tells Python to use the POD zero-copy path — the same binary layout as Rust.

### Python Publisher, Rust Subscriber

**Python side:**

```python
import horus

def tick(node):
    cmd = horus.CmdVel(1.5, 0.5)  # linear, angular
    node.send("motor.cmd", cmd)

pub = horus.Node("py_pub", tick=tick, pubs={"motor.cmd": horus.CmdVel}, rate=100)
horus.run(pub, duration=10.0)
```

**Rust side:**

```rust
// The type parameter must match the Python side (horus.CmdVel)
let topic: Topic<CmdVel> = Topic::new("motor.cmd")?;
if let Some(cmd) = topic.recv() {
    println!("linear={}, angular={}", cmd.linear, cmd.angular);
}
```

### Requirements for Cross-Language IPC

1. **Same topic name** on both sides (the string in `Topic::new()` and the Python dict key)
2. **Typed topics in Python** — use `subs={"name": horus.Imu}`, not `subs="name"` (string topics use a different wire format)
3.
**Same message type** — `Topic<Imu>` in Rust matches `horus.Imu` in Python ### Supported Cross-Language Types All 30+ typed message types work across Rust and Python: `CmdVel`, `Imu`, `Odometry`, `LaserScan`, `JointState`, `BatteryState`, `Pose2D`, `Pose3D`, `Twist`, `Vector3`, `Point3`, `Quaternion`, `MotorCommand`, `ServoCommand`, `DifferentialDriveCommand`, `Heartbeat`, `DiagnosticStatus`, `EmergencyStop`, `MagneticField`, `Temperature`, `FluidPressure`, `Illuminance`, `RangeSensor`, `NavSatFix`, `NavGoal`, `WrenchStamped`, `Clock`, `PidConfig`, `TrajectoryPoint`, `JointCommand` ### What Does NOT Work Cross-Language - **Python generic topics** (`subs="my_topic"` without a type) — these use Python-specific serialization that Rust cannot read - **Custom Rust serde types** — Python can only read the pre-defined POD types listed above - **Python dict messages** (`node.send("topic", {"x": 1.0})`) — not readable by Rust ## See Also - [Python Overview](/python) — Python documentation hub - [Choosing a Language](/getting-started/choosing-language) — Rust vs Python comparison - [Python Bindings](/python/api/python-bindings) — Python API reference --- ## Communication Overview Path: /concepts/communication-overview Description: When to use Topics, Services, and Actions — the three HORUS communication primitives # Communication Overview Think about how people communicate at work. Sometimes you leave a note on a shared whiteboard for anyone who walks by to read — you do not care who reads it or when, you just put the information out there. Other times, you pick up the phone and ask a colleague a direct question: "What is the status of order 4217?" You need an answer before you can continue. And sometimes you delegate an entire task: "Ship this package to Berlin, and let me know when it arrives." You want updates along the way, and you want the ability to say "actually, cancel that" if plans change. Robots have exactly the same three communication needs. 
A LiDAR sensor continuously publishes scan data for anyone who wants it — the path planner, the obstacle detector, the logger. A calibration routine needs to ask the map server for a specific region of the map and wait for the response before it can proceed. A navigation system needs to send the robot to a destination, receive progress updates ("3 meters remaining"), and cancel the trip if a higher-priority task arrives. HORUS gives you one communication primitive for each of these patterns: **Topics** for continuous data streams, **Services** for quick request/response exchanges, and **Actions** for long-running tasks with feedback and cancellation. Every robotics communication problem maps to one of these three. The rest of this page helps you understand what each one does and when to pick which. ## The Three Primitives ### Topics — Continuous Data Streams A topic is like a radio station. The broadcaster transmits whether anyone is listening or not. Listeners tune in when they want and tune out when they are done. The broadcaster does not know or care how many listeners exist — zero, one, or fifty. In HORUS, a topic is a named channel that carries a specific type of data. Publishers call `send()` to put data on the channel. Subscribers call `recv()` to read the latest value. The publisher and subscriber do not need to know about each other. They just agree on a topic name and a data type. Topics are the workhorse of robotics communication. Sensor data, motor commands, camera frames, diagnostic status, emergency stops — all flow through topics. They are fire-and-forget: the publisher sends and moves on. There is no acknowledgment, no reply, no waiting. **Latency**: ~3 ns (same thread) to ~167 ns (cross-process), depending on topology. See [Topics — Full Reference](/concepts/core-concepts-topic) for the 10 automatic backend paths. 
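The fire-and-forget behaviour described above can be modelled with a bounded buffer in which the publisher never waits. This is a toy, single-process model for intuition only: `ToyTopic` is a hypothetical class, it has a single reader, and it does not reflect how HORUS ring buffers or per-subscriber cursors are actually implemented.

```python
from collections import deque
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ToyTopic(Generic[T]):
    """Bounded channel: send() never blocks, old unread data is overwritten."""

    def __init__(self, capacity: int) -> None:
        self._buf: deque = deque(maxlen=capacity)
        self.dropped_count = 0

    def send(self, msg: T) -> None:
        # Fire-and-forget: if the reader has fallen behind, the oldest
        # unread message is discarded rather than making the sender wait.
        if len(self._buf) == self._buf.maxlen:
            self.dropped_count += 1
        self._buf.append(msg)

    def recv(self) -> Optional[T]:
        # Non-blocking read: None means "nothing new right now".
        return self._buf.popleft() if self._buf else None


if __name__ == "__main__":
    scan = ToyTopic(capacity=2)
    for reading in (1.0, 1.1, 1.2):   # publisher runs ahead of the reader
        scan.send(reading)
    print(scan.recv(), scan.recv(), scan.recv(), scan.dropped_count)
```

A slow reader in this model loses the oldest readings and keeps the freshest ones, which matches the guidance that topics favour the latest value and speed over guaranteed delivery.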
**Use topics when**:

- Data is produced continuously (sensor readings, motor commands, video frames)
- Multiple consumers might want the same data (logger + planner + display)
- You care about the *latest* value, not every historical value
- Speed matters more than guaranteed delivery

### Services — Request/Response

A service is like a phone call. You dial, ask a question, wait for an answer, and hang up. It is synchronous — the caller blocks until the response arrives (or a timeout fires). There is exactly one server and one or more clients.

```rust
// simplified
use horus::prelude::*;

// Client asks for map data — blocks until response arrives
let mut client = ServiceClient::::new()?;
let response = client.call(
    GetMapRegionRequest { x_min: 0.0, y_min: 0.0, x_max: 10.0, y_max: 10.0, resolution: 0.05 },
    Duration::from_secs(1),
)?;
println!("Map: {}x{} pixels", response.width, response.height);
```

Services are for operations that complete quickly (milliseconds) and where the caller cannot continue without the result. Parameter queries, configuration lookups, one-shot commands, joint limit checks — these are all service calls.

**Use services when**:

- You need a response before you can continue
- The operation finishes quickly (milliseconds, not seconds)
- You need error reporting back to the caller
- There is a clear question-and-answer pattern

### Actions — Long-Running Tasks

An action is like delegating a task to a colleague. You say "navigate to shelf B3," and they go do it. While they work, they send you progress updates: "12 meters remaining... 8 meters remaining... arrived." If plans change, you can say "cancel, new priority" and they stop what they are doing.

*Figure: Action pattern: goal → progress feedback → cancel/complete. The operator sends "Go to shelf B3"; the robot reports 12m remaining (20%), 8m remaining (45%), 3m remaining (78%); the operator cancels for a new priority; the robot stops safely and reports "Canceled at (4.2, 1.1)".*

Actions follow the **Goal / Feedback / Result** pattern. The client sends a goal, the server sends periodic feedback, and eventually delivers a final result (succeeded, failed, or canceled). Actions support cancellation, preemption (a higher-priority goal interrupts the current one), and timeout.

**Use actions when**:

- The task takes more than one tick to complete (navigation, arm motion, calibration)
- You need progress updates while the task runs
- You need to cancel or preempt in-flight tasks
- You need to know whether the task succeeded, failed, or was interrupted

## Choosing the Right Primitive

When you are not sure which primitive to use, walk through this decision flowchart:

1. Is it continuous streaming data? Yes: use a **Topic** (sensor data, commands, video, diagnostics). No: go to 2.
2. Does it finish in milliseconds? Yes: go to 3. No: go to 4.
3. Do you need a response back? Yes: use a **Service** (parameter queries, config lookups, one-shot RPC). No: use a **Topic**.
4. Do you need progress updates or cancellation? Yes: use an **Action** (navigation, arm motion, calibration, docking). No: go to 5.
5. Do you need to know when it finishes? Yes: use an **Action**. No: use a **Topic**.

*Figure: Decision flowchart: continuous data goes to Topics, quick Q&A goes to Services, long tasks go to Actions.*

### Decision Table

| Scenario | Use | Why |
|----------|-----|-----|
| Sensor data streaming (IMU, LiDAR, encoders) | **Topic** | Continuous, latest value matters, multiple consumers |
| Motor velocity commands | **Topic** | High frequency, low latency, fire-and-forget |
| Camera frames at 30 FPS | **Topic** | Streaming, pool-backed zero-copy for large data |
| Emergency stop signal | **Topic** | Must be instantaneous, no handshake overhead |
| Diagnostics broadcast | **Topic** | Many publishers, flexible topology |
| Query joint limits before planning | **Service** | Need response before continuing, completes in ms |
| Fetch a map region for SLAM | **Service** | Need specific data, cannot proceed without it |
| Trigger a one-shot sensor calibration | **Service** | Quick operation, need success/failure result |
| Get current parameter values | **Service** | Direct question, direct answer |
| Navigate to a waypoint | **Action** | Takes seconds, needs progress updates, cancellable |
| Pick-and-place an object | **Action** | Multi-phase, needs feedback per phase |
| Calibration routine (full sequence) | **Action** | Takes time, reports progress, might fail |
| Dock at a charging station | **Action** | Long-running, needs alignment feedback |

### Quick Rules of Thumb

- **If it is a stream** of data flowing continuously, use a **Topic**.
- **If it is a question** you need answered before you can continue, use a **Service**.
- **If it is a task** you would start, monitor, and potentially cancel, use an **Action**.
- **When in doubt**, start with a Topic. It is the simplest primitive and covers the vast majority of robotics communication.
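
The decision flowchart and rules of thumb above can be written down as a small function. This is only an illustrative sketch: the `Primitive` enum, the `Traits` struct, and `choose_primitive` are hypothetical names invented for this example and are not part of the HORUS API.

```rust
/// The three communication primitives described on this page.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Primitive {
    Topic,
    Service,
    Action,
}

/// Answers to the flowchart questions (hypothetical struct, for illustration only).
#[derive(Debug, Clone, Copy, Default)]
struct Traits {
    continuous_stream: bool,
    finishes_in_milliseconds: bool,
    needs_response: bool,
    needs_progress_or_cancel: bool,
    needs_completion_notice: bool,
}

/// Walks the flowchart top to bottom and returns the suggested primitive.
fn choose_primitive(t: Traits) -> Primitive {
    // Q1: continuous streaming data always goes to a topic.
    if t.continuous_stream {
        return Primitive::Topic;
    }
    // Q2/Q3: quick operations are services only if a response is needed.
    if t.finishes_in_milliseconds {
        return if t.needs_response {
            Primitive::Service
        } else {
            Primitive::Topic
        };
    }
    // Q4/Q5: long tasks become actions when the caller tracks them.
    if t.needs_progress_or_cancel || t.needs_completion_notice {
        Primitive::Action
    } else {
        Primitive::Topic
    }
}

fn main() {
    let imu = Traits { continuous_stream: true, ..Default::default() };
    let joint_limits = Traits {
        finishes_in_milliseconds: true,
        needs_response: true,
        ..Default::default()
    };
    let navigate = Traits {
        needs_progress_or_cancel: true,
        needs_completion_notice: true,
        ..Default::default()
    };
    println!("IMU stream   -> {:?}", choose_primitive(imu));
    println!("Joint limits -> {:?}", choose_primitive(joint_limits));
    println!("Navigate     -> {:?}", choose_primitive(navigate));
}
```

The default branch returns `Topic`, which mirrors the "when in doubt" rule.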
## How They Work Together

Real robot systems use all three primitives together. A navigation system subscribes to sensor topics (LiDAR, odometry), exposes an action server for goal-based navigation, and calls a service to fetch map data on demand:

```rust
// simplified
struct NavigationSystem {
    // Topics for streaming sensor data (pub/sub)
    lidar_sub: Topic,
    odom_sub: Topic,

    // Topics for publishing commands (pub/sub)
    cmd_vel_pub: Topic,
    status_pub: Topic,

    // Action server for goal-based navigation (goal/feedback/result)
    nav_server: ActionServerNode,

    // Service client for on-demand map queries (request/response)
    map_client: ServiceClient,
}
```

The navigation action server subscribes to sensor topics internally, calls the map service when it needs terrain data, computes a path, publishes velocity commands via topics, and reports progress back to the action client. All three primitives use the same underlying shared-memory infrastructure, so mixing them has zero additional overhead.

*Figure: A typical navigation system combines Topics (sensor data, motor commands), Services (map queries), and Actions (goal-based navigation). The Navigate action subscribes to the LiDAR and Odometry topics, publishes to the CmdVel topic, calls the Map Server service, receives goals from the operator, and sends feedback back.*

## Design Decisions

**Why three primitives and not just one?** You could theoretically build everything with topics alone — request/response by publishing a request on one topic and listening for a response on another, long-running tasks by publishing goals and feedback on separate topics. ROS1 actually started with mostly topics and bolted on services and actions later. The problem is that reimplementing request correlation, timeout handling, cancellation, and progress tracking on top of raw pub/sub is error-prone and duplicated across every project. Three distinct primitives with clear semantics eliminate that boilerplate and make the developer's intent explicit in the code.

**Why not more primitives?** Some frameworks add parameter servers, shared blackboards, or distributed state stores as separate communication channels. HORUS keeps it to three because every additional primitive adds cognitive load — developers must learn when to use each one. Parameters can be modeled as services (`GetParameter` / `SetParameter`). Shared state can be modeled as topics with the latest-value semantic. Three covers the full space without redundancy.

**Why are Services and Actions built on Topics internally?** Services use two internal topics (`{name}.request` and `{name}.response`). Actions use topics for goal submission, feedback, and results. This is not an implementation shortcut — it is a deliberate design choice.
Topics already handle all the hard problems: cross-process shared memory, automatic backend selection, live migration, zero-copy for large data. Building services and actions on top means they inherit all of that infrastructure for free. It also means a single debugging tool (`horus topic list`) shows all communication in the system, including service and action traffic. **Why poll-based services instead of callback-based?** The service server polls for incoming requests on a configurable interval (default: 5 ms). An alternative would be to wake the server thread immediately when a request arrives (interrupt-driven). HORUS uses polling because it is simpler to reason about in a real-time context: the server thread wakes at predictable intervals, does bounded work, and sleeps. Interrupt-driven wakeups create unpredictable timing spikes. The 5 ms default poll interval means worst-case service latency is 5 ms — fast enough for the configuration-query use cases services are designed for. If you need sub-millisecond response, use a topic. 
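
To make the polling argument concrete, here is a minimal sketch of a poll-based request/response loop built on standard-library channels. It is not the HORUS service implementation: `Request`, `serve_polling`, and `call` are hypothetical names, and `std::sync::mpsc` stands in for the `{name}.request` and `{name}.response` topics. The server wakes at a fixed interval, does a bounded amount of work, and sleeps, so a request waits at most roughly one poll interval before it is picked up.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// A request carrying its own reply channel (hypothetical type for this sketch).
struct Request {
    value: u32,
    reply: Sender<u32>,
}

/// Polls for requests `wakes` times, handling at most `max_per_wake` per wakeup.
/// Returns how many requests were served.
fn serve_polling(rx: Receiver<Request>, poll: Duration, max_per_wake: usize, wakes: usize) -> usize {
    let mut handled = 0;
    for _ in 0..wakes {
        // Bounded work: never drain more than `max_per_wake` requests per wakeup.
        for _ in 0..max_per_wake {
            match rx.try_recv() {
                Ok(req) => {
                    // The "service" here just doubles the input.
                    let _ = req.reply.send(req.value * 2);
                    handled += 1;
                }
                Err(_) => break, // queue empty or all clients gone
            }
        }
        // Predictable sleep until the next poll.
        thread::sleep(poll);
    }
    handled
}

/// Blocking client call with a timeout, mirroring the synchronous service pattern.
fn call(tx: &Sender<Request>, value: u32, timeout: Duration) -> Option<u32> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request { value, reply: reply_tx }).ok()?;
    reply_rx.recv_timeout(timeout).ok()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // 5 ms poll interval, matching the default quoted above.
    let server = thread::spawn(move || serve_polling(rx, Duration::from_millis(5), 8, 40));

    let start = Instant::now();
    let answer = call(&tx, 21, Duration::from_secs(1));
    println!("answer = {:?}, observed latency = {:?}", answer, start.elapsed());

    drop(tx);
    println!("server handled {} request(s)", server.join().unwrap());
}
```

The observed latency depends on where in the poll cycle the request arrives, which is why the poll interval bounds the pickup delay.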
## Trade-offs | Gain | Cost | |------|------| | **Three clear primitives** — every communication pattern maps to exactly one | Developers must learn which primitive to use (this page helps) | | **Topics are fire-and-forget** — zero overhead, no waiting for consumers | No delivery guarantee; slow subscribers lose messages | | **Services give guaranteed responses** — caller blocks until answer arrives | Blocking means the caller's tick stalls if the server is slow or down | | **Actions support cancellation and feedback** — full task lifecycle management | More complex API surface than topics or services | | **All built on shared memory** — same zero-copy infrastructure everywhere | Services and actions inherit topic limitations (e.g., ring buffer overflow on extreme load) | | **Services are poll-based** — predictable timing, no interrupt spikes | Worst-case 5 ms latency on default poll interval (configurable) | ## See Also - [Topics — Full Reference](/concepts/core-concepts-topic) — Automatic backend selection, naming, memory model - [Topics: How Nodes Talk](/concepts/topics-beginner) — Beginner introduction to pub/sub - [Services](/concepts/services) — Request/response API, error handling, resilient calls - [Actions](/concepts/actions) — Goal/feedback/result pattern, preemption, lifecycle - [Multi-Process Communication](/concepts/multi-process) — Cross-process and cross-machine topics --- ## Architecture Overview Path: /concepts/architecture Description: Understanding HORUS — the node model, communication patterns, scheduling, and memory system # Architecture Overview A warehouse robot picks orders from shelves. Its camera captures 30 frames per second, each frame 6 megabytes. A vision model detects items. A planner computes a path around obstacles. A motor controller executes the path at 1 kHz. A safety monitor watches for collisions. 
These components have fundamentally different timing requirements — the safety monitor must never be late, the vision model needs GPU time, the planner needs CPU time, and the motor controller needs sub-millisecond predictability. They also share large data: the camera frame must reach the vision model without being copied, and the planner's velocity commands must reach the motor controller in nanoseconds, not milliseconds.

HORUS solves this with four primitives: **Nodes** (isolated components), **Topics** (zero-copy communication), a **Scheduler** (priority-based orchestration), and a **Memory System** (shared memory pools for large data).

## How It Works

*Figure: The four primitives of HORUS. Nodes (isolated components with tick lifecycle), Topics (zero-copy pub/sub channels), Scheduler (priority-based orchestration), and Memory (shared memory pools), connected in a cycle.*

### Nodes

Everything in HORUS is a **Node** — an independent component with a well-defined lifecycle. Each node implements `tick()`, which the scheduler calls repeatedly:

```rust
// simplified
fn tick(&mut self) {
    if let Some(sensor_data) = self.sensor_topic.recv() {
        let command = self.compute_response(sensor_data);
        self.command_topic.send(command);
    }
}
```

The tick model enables deterministic timing (know exactly when each node runs), profiling (measure how long each tick takes), and scheduling intelligence (the scheduler can optimize execution order).

*Figure: Node lifecycle. init() → tick() loop → shutdown() → Stopped; an error in the tick loop either returns to the loop or proceeds to shutdown().*

### Topics

Nodes communicate through **Topics** — named shared-memory channels. You always use the same `Topic::new()` call; HORUS automatically selects the fastest backend based on topology:

```rust
// simplified
let topic: Topic = Topic::new("camera.image")?;
topic.send(&frame);        // publisher
let frame = topic.recv();  // subscriber (another node)
```

| Backend | Latency | When selected |
|---------|---------|---------------|
| Same-thread | ~14 ns | Publisher and subscriber on same thread |
| Same-process | ~82–182 ns | Same process, different threads |
| Cross-process | ~162–165 ns | Different processes, shared memory |

No configuration needed — the backend upgrades transparently as participants join or leave.
*Figure: Transparent cross-process communication. One node publishes to a topic and a second node subscribes to it.*

### Scheduler

The scheduler orchestrates node execution with priority-based ordering, five execution classes, deadline monitoring, and graceful degradation:

| Feature | Purpose |
|---------|---------|
| **Execution order** | `.order(n)` — lower runs first each tick |
| **5 execution classes** | BestEffort, Rt, Compute, Event, AsyncIo — auto-detected from node config |
| **Deadline enforcement** | `.budget()`, `.deadline()`, `.on_miss()` — graduated response to overruns |
| **Watchdog** | Detects frozen nodes with graduated degradation (warn → reduce rate → isolate → safe state) |
| **BlackBox** | Flight recorder for post-crash forensics |

### Memory System

Large data (images, point clouds, tensors) uses shared memory pools for zero-copy transfer:

*Figure: Zero-copy: write once, read many. The camera writes a 1920x1080x3 frame once; Vision 1, Vision 2, and Vision 3 each read it.*

## Data Flow Example

A typical perception-to-action pipeline:

*Figure: Perception → Planning → Control pipeline. Camera (30 Hz) → Image (zero-copy) → Detector (compute()) → Detections (Topic) → Planner (10 Hz) → CmdVel (Topic) → Controller (1 kHz RT) → PWM (Topic) → Motors (Hardware).*

Total message-passing latency: under 1 microsecond (same-process backends).

## Design Decisions

**Why nodes instead of functions?** Robotics systems need isolation. A crashing camera driver shouldn't bring down the motor controller. Nodes provide fault boundaries — the scheduler can isolate a failing node while the rest of the system continues. Functions in a monolith share a call stack and a single point of failure.

**Why a single Topic API instead of separate in-process and cross-process APIs?** During development, you run everything in one process. In production, you split across processes for isolation. If the communication API changed between these modes, you'd have to rewrite code for deployment. The single `Topic::new()` API means the same code works in both modes — HORUS selects the optimal backend automatically.

**Why a tick model instead of threads?** Threads are hard to reason about: priority inversion, lock contention, non-deterministic scheduling. The tick model gives the scheduler full control over execution order. For nodes that genuinely need their own thread (RT, Compute, AsyncIo), the scheduler creates one — but it manages the lifecycle, not the application.

**Why shared memory instead of message passing?** A 4K RGB image is 24 MB. Serializing, copying, and deserializing it through a socket costs milliseconds. Shared memory costs nanoseconds — the subscriber reads directly from the publisher's memory. For small messages (`CmdVel`, `Imu`), the difference is less dramatic but still 300x faster than DDS.
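
The frame sizes quoted on this page follow from simple arithmetic. The sketch below reproduces it under stated assumptions: 8-bit RGB (three bytes per pixel), 1920x1080 for the 30 FPS camera, and 3840x2160 for "4K". The function names are hypothetical helpers for this example only.

```rust
/// Bytes in one uncompressed frame with `channels` one-byte channels per pixel.
fn frame_bytes(width: u64, height: u64, channels: u64) -> u64 {
    width * height * channels
}

/// Bytes per second moved if every frame is copied `copies` times.
fn copy_bandwidth(frame: u64, fps: u64, copies: u64) -> u64 {
    frame * fps * copies
}

fn main() {
    let full_hd = frame_bytes(1920, 1080, 3);
    let uhd_4k = frame_bytes(3840, 2160, 3);

    println!("1080p RGB frame: {} bytes (~{:.1} MB)", full_hd, full_hd as f64 / 1e6);
    println!("4K RGB frame:    {} bytes (~{:.1} MB)", uhd_4k, uhd_4k as f64 / 1e6);

    // One copy per frame at 30 FPS versus one copy per subscriber (three readers).
    println!("1 copy/frame:  {} bytes/s", copy_bandwidth(full_hd, 30, 1));
    println!("3 copies/frame: {} bytes/s", copy_bandwidth(full_hd, 30, 3));
}
```

A 1080p frame comes to about 6.2 MB and a 4K frame to about 24.9 MB (23.7 MiB), in line with the rounded figures used in the text. With copy-based transport the bandwidth grows with the number of readers, which is the cost the write-once, read-many layout avoids.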
## Trade-offs | Gain | Cost | |------|------| | Sub-microsecond IPC via shared memory | Single-machine only — no built-in cross-network transport | | Deterministic execution order via scheduler | All interacting nodes must be in the same scheduler (or use cross-process topics) | | Zero-copy large data (images, point clouds) | Fixed-size ring buffers — must choose capacity at topic creation | | Automatic backend selection (in-process vs SHM) | Can't force a specific backend — the system chooses | | Five execution classes for different workload types | More complex scheduler internals | | Node isolation with fault boundaries | Node crashes still lose that node's state — no automatic restart by default | ## Performance Summary | Metric | Value | |--------|-------| | Same-thread topic | ~14 ns | | Same-process topic | ~82–182 ns | | Cross-process topic | ~162–165 ns | | Scheduler tick overhead | ~50–100 ns | | Shared memory allocation | ~100 ns | | Framework memory overhead | ~2 MB | See [Benchmarks](/performance/benchmarks) for exact measured numbers. ## See Also - [What is HORUS?](/concepts/what-is-horus) — Overview and positioning - [Nodes (Concept)](/concepts/core-concepts-nodes) — Deep dive into the node model - [Topic (Concept)](/concepts/core-concepts-topic) — Communication architecture - [Scheduler (Concept)](/concepts/core-concepts-scheduler) — Tick loop and execution classes - [Execution Classes](/concepts/execution-classes) — The 5 classes and when to use each --- ## Custom Messages & Performance Path: /concepts/core-concepts-podtopic Description: Define your own message types and understand how HORUS optimizes transfer speed # Custom Messages & Performance Need a message type that doesn't exist in the standard library? Use the `message!` macro. HORUS handles all the optimization automatically — you never need to think about serialization or memory layout. 
For performance-sensitive applications: standard fixed-size types transfer at **~50ns** (zero-copy), variable-size types at **~167ns** (serialized). No configuration needed. ## Automatic Optimization When you create a `Topic`, HORUS inspects the message type at compile time and selects the fastest transfer strategy: - **Fixed-size types** (no heap allocations) use raw memory copy — no serialization overhead - **Variable-size types** (containing `String`, `Vec`, etc.) use fast bincode serialization You always use the same API. The optimization is invisible: ## Performance Comparison | Type Category | Cross-Process Latency | Examples | |---------------|----------------------|----------| | Fixed-size (zero-copy) | ~50-85ns | `CmdVel`, `Imu`, `Pose2D`, `Heartbeat` | | Variable-size (serialized) | ~167ns | `String`, `Vec`, `HashMap` | Same-process communication is fast for both categories since no shared memory serialization is needed. For most applications, the ~167ns serialized path is more than fast enough. Only high-frequency control loops running at 1kHz+ benefit noticeably from the zero-copy path. ## Custom Messages Use the `message!` macro to define your own message types. Add `#[fixed]` for zero-copy shared memory transport (~50ns) when all fields are primitives: For messages with `String`, `Vec`, or other dynamic data, omit `#[fixed]` — HORUS uses serialization transport automatically (~167ns): ## Built-in Message Types All standard HORUS message types are pre-optimized. Fixed-size types automatically use the zero-copy fast path. 
### Geometry | Message | Description | |---------|-------------| | `CmdVel` | 2D velocity command (linear + angular) | | `Pose2D` | 2D position and orientation | | `Twist` | 3D linear and angular velocity | | `TransformStamped` | 3D transformation with timestamp | | `Point3` | 3D point | | `Vector3` | 3D vector | | `Quaternion` | Rotation quaternion | ### Sensors | Message | Description | |---------|-------------| | `Imu` | Inertial measurement unit data | | `LaserScan` | 2D laser range data | | `Odometry` | Position/velocity estimate | | `RangeSensor` | Single distance measurement | | `BatteryState` | Battery level and status | | `NavSatFix` | GPS position | ### Control | Message | Description | |---------|-------------| | `MotorCommand` | Individual motor control | | `DifferentialDriveCommand` | Differential drive control | | `ServoCommand` | Servo position/velocity | | `JointCommand` | Joint-level control | | `TrajectoryPoint` | Trajectory waypoint | | `PidConfig` | PID controller parameters | ### Diagnostics | Message | Description | |---------|-------------| | `Heartbeat` | Liveness signal | | `NodeHeartbeat` | Per-node health status | | `DiagnosticStatus` | General status report | | `EmergencyStop` | Emergency stop signal | | `SafetyStatus` | Safety system state | | `ResourceUsage` | CPU/memory usage | | `DiagnosticValue` | Single diagnostic measurement | | `DiagnosticReport` | Full diagnostic report | ### Navigation | Message | Description | |---------|-------------| | `NavGoal` | Navigation goal | | `GoalResult` | Goal completion result | | `Waypoint` | Navigation waypoint | | `NavPath` | Sequence of waypoints | | `PathPlan` | Planned path | | `VelocityObstacle` | Velocity obstacle for avoidance | | `VelocityObstacles` | Set of velocity obstacles | ### Force/Haptics | Message | Description | |---------|-------------| | `WrenchStamped` | Force/torque measurement | | `ForceCommand` | Force control command | | `ImpedanceParameters` | Impedance control config | 
| `ContactInfo` | Contact detection data | | `HapticFeedback` | Haptic output command | ### Input | Message | Description | |---------|-------------| | `JoystickInput` | Gamepad/joystick state | | `KeyboardInput` | Keyboard key events | ### Tensor | Message | Description | |---------|-------------| | `Tensor` | Fixed-size tensor descriptor | ## Design Decisions **Why automatic optimization instead of manual annotation?** In ROS2, you choose between message types and serialization strategies manually. HORUS inspects the type at compile time — if all fields are fixed-size primitives (no `String`, `Vec`, or heap pointers), it uses raw memory copy. If any field is dynamic, it uses fast bincode serialization. You always use the same `Topic` API. The optimization is invisible and always correct. **Why the `message!` macro instead of raw derives?** The macro generates the correct trait implementations (`Clone`, `Serialize`, `Deserialize`, and optionally `Copy` for fixed-size types) in one step. It also generates `LogSummary` for TUI debug logging and validates that `#[fixed]` types actually have fixed-size fields at compile time — catching mistakes that would otherwise cause silent performance degradation. **Why separate fixed-size and variable-size paths?** Fixed-size types can be memcpy'd directly into shared memory — no serialization, no allocation, ~50 ns. Variable-size types must be serialized because the receiver doesn't know the exact layout (how long is the `String`?). The two paths are invisible to the user but provide a 2–3x latency difference that matters at 1 kHz+. 
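
The difference between the two paths can be sketched with the standard library alone. This is an illustration under assumptions, not the HORUS transport: the local `CmdVel` struct is a stand-in with two `f32` fields, and a hand-rolled length-prefixed encoding stands in for bincode. The fixed-size path is a plain byte copy of known length; the variable-size path has to record the length so the receiver can find the end of the data.

```rust
use std::mem::size_of;

/// Stand-in fixed-size message (hypothetical; mirrors the two fields used on this page).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct CmdVel {
    linear: f32,
    angular: f32,
}

/// Fixed-size path: copy the struct's bytes straight into the slot.
fn write_fixed(msg: &CmdVel, slot: &mut [u8]) -> usize {
    let n = size_of::<CmdVel>();
    assert!(slot.len() >= n);
    // SAFETY: CmdVel is repr(C), Copy, has no padding (two f32 fields),
    // and the slot was checked to hold at least `n` bytes.
    unsafe {
        std::ptr::copy_nonoverlapping(msg as *const CmdVel as *const u8, slot.as_mut_ptr(), n);
    }
    n
}

/// Fixed-size path: the reader knows the layout, so it reads the bytes back directly.
fn read_fixed(slot: &[u8]) -> CmdVel {
    assert!(slot.len() >= size_of::<CmdVel>());
    // SAFETY: every bit pattern is a valid pair of f32 values, and
    // read_unaligned does not require the slot to be aligned.
    unsafe { std::ptr::read_unaligned(slot.as_ptr() as *const CmdVel) }
}

/// Variable-size path: a 4-byte little-endian length prefix followed by the payload.
fn write_variable(text: &str, slot: &mut [u8]) -> Option<usize> {
    let len = text.len();
    let total = 4 + len;
    if total > slot.len() || len > u32::MAX as usize {
        return None; // does not fit inline
    }
    slot[..4].copy_from_slice(&(len as u32).to_le_bytes());
    slot[4..total].copy_from_slice(text.as_bytes());
    Some(total)
}

/// Variable-size path: read the length first, then allocate and decode the payload.
fn read_variable(slot: &[u8]) -> Option<String> {
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(slot.get(..4)?);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let payload = slot.get(4..4 + len)?;
    String::from_utf8(payload.to_vec()).ok()
}

fn main() {
    let mut slot = [0u8; 64];

    let cmd = CmdVel { linear: 0.3, angular: 0.0 };
    let n = write_fixed(&cmd, &mut slot);
    println!("fixed: wrote {} bytes, read back {:?}", n, read_fixed(&slot));

    let n = write_variable("battery low", &mut slot);
    println!("variable: wrote {:?} bytes, read back {:?}", n, read_variable(&slot));
}
```

The variable-size reader has to validate a length and allocate a new `String`, which is the extra work the fixed-size path avoids.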
## Trade-offs | Gain | Cost | |------|------| | **Automatic optimization** — fastest path selected at compile time | Must understand fixed vs variable-size to optimize manually | | **Zero-copy for fixed-size types** — ~50 ns cross-process | Fixed-size types can't contain `String`, `Vec`, or `Box` | | **Same API for all types** — `Topic` works identically | Variable-size types are ~3x slower (~167 ns) than fixed-size | | **`message!` macro** — one-line type definition with all derives | Macro syntax differs slightly from raw struct definition | ## See Also - [Topics — Full Reference](/concepts/core-concepts-topic) — The unified communication API - [Message Types](/concepts/message-types) — Full message type reference with all 70+ types - [Topic API](/rust/api/topic) — Complete Topic method documentation - [Architecture](/concepts/architecture) — How communication fits into the HORUS architecture --- ## Shared Memory Path: /concepts/shared-memory Description: How HORUS uses shared memory for zero-copy IPC — ring buffers, backend selection, cross-process discovery, and platform differences # Shared Memory Your robot's camera publishes 30 frames/sec at 2MB each. Copying through kernel pipes would consume 60MB/s of bandwidth and add milliseconds of latency. HORUS puts the data in shared memory — zero copies, sub-200ns latency, no kernel involvement. This page explains **how it works underneath**. You never interact with SHM directly — `Topic::new()` handles everything. But understanding the architecture helps you debug performance issues, choose the right message types, and work with multi-process systems. 
--- ## SHM Directory Structure When a HORUS application starts, `horus_sys` creates a namespace directory on the filesystem: ``` /dev/shm/horus_/ # Linux (tmpfs — backed by RAM) ├── topics/ # Ring buffer files, one per topic │ ├── horus_cmd_vel # SHM region for "cmd_vel" topic │ ├── horus_cmd_vel.meta # Discovery metadata (JSON) │ ├── horus_scan # SHM region for "scan" topic │ └── horus_scan.meta ├── nodes/ # Node presence files (JSON) │ ├── motor_controller.json │ └── lidar_driver.json ├── tensors/ # TensorPool regions (for Image, PointCloud) │ └── tensor_pool_a3f2c1d0 ├── scheduler/ # Scheduler state ├── network/ # Network transport state └── logs/ # Structured log files ``` **Namespace generation** (priority order): 1. `HORUS_NAMESPACE` env var — set by `horus launch` for multi-robot deployments 2. Auto-generated `sid{N}_uid{N}` from the session ID and user ID — isolates different users and terminal sessions This means two `horus run` commands in different terminals get different namespaces and don't interfere. Use `HORUS_NAMESPACE=shared` to explicitly share topics across terminals. --- ## Ring Buffer Architecture Every topic is backed by a lock-free ring buffer in a single mmap'd SHM region: ``` ┌─────────────────────────────────────────────────────────────────┐ │ SHM Region (mmap'd file) │ ├─────────────┬──────────────────┬────────────────────────────────┤ │ TopicHeader │ Sequence Array │ Data Slots │ │ 640 bytes │ capacity × 8B │ capacity × slot_size │ │ (10 cache │ (per-slot write │ (message data) │ │ lines) │ completion flag) │ │ └─────────────┴──────────────────┴────────────────────────────────┘ ``` **TopicHeader** is exactly 640 bytes (10 cache lines), `#[repr(C, align(64))]`. 
The critical optimization: **producer and consumer state live on separate cache lines** to prevent false sharing:

```
Cache Line 2 (bytes 64-127):   head, capacity, capacity_mask, slot_size
                               ↑ ONLY the producer writes here

Cache Line 3 (bytes 128-191):  tail
                               ↑ ONLY the consumer writes here
```

This separation is the single most important optimization for achieving sub-20ns intra-process latency. Without it, every `send()` would invalidate the consumer's cache line and vice versa.

The remaining cache lines hold: publisher/subscriber counts, participant tracking (up to 16 participants with PID, thread ID, role, and lease expiry), and the type name for runtime validation.

**Capacity** is always a power of two. Index calculation uses a bitmask (`seq & (capacity - 1)`) instead of modulo — one CPU cycle instead of ~20 cycles for division. Auto-sizing: `PAGE_SIZE / sizeof(T)`, clamped to `[16, 1024]`.

**Slot sizing** for POD types:

- Small types (size + 8 ≤ 64): **co-located layout** — sequence number and data share one 64-byte cache line. This puts the write-completion flag and data on the same cache line, so the reader needs only one memory load.
- Larger types: slot size equals `sizeof(T)`, with the sequence number in a separate array.

---

## Three Transport Paths

When you call `topic.send(msg)`, one of three paths executes depending on the message type:

### 1. POD Zero-Copy (~3-150ns)

Used for types with only primitive fields and no heap pointers (`CmdVel`, `Imu`, `Pose2D`, any `#[repr(C)]` struct without `String`, `Vec`, or `Box`).

**How it works**: Raw `memcpy` from your struct directly into the ring buffer slot. No serialization, no allocation. For buffers ≥4KB, HORUS uses AVX2 non-temporal streaming stores (runtime-detected) to bypass the CPU cache and write directly to RAM.

**POD auto-detection**: `!std::mem::needs_drop::<T>() && std::mem::size_of::<T>() > 1`. No user annotation needed — HORUS checks at compile time via monomorphization.

### 2. Serde Serialization (~167ns cross-process)

Used for types with heap-allocated fields (`String`, `Vec`, `HashMap`).

**How it works**: Bincode serialization into the ring buffer slot (up to `slot_size`, default 8KB). Deserialization on the reader side. Slower than POD but works with any `Serialize + Deserialize` type.

**Spill mechanism**: Messages with serialized size above 4KB are "spilled" to a `TensorPool` region instead of inline storage. A 40-byte `SpillDescriptor` with a magic sentinel is written to the ring slot. The receiver detects the sentinel and reads from the pool.

### 3. Pool-Backed Descriptors (~50ns for descriptor)

Used for large data types (`Image`, `PointCloud`, `DepthImage`).

**How it works**: The actual pixel/point data lives in a `TensorPool` SHM region (separate from the ring buffer). Only a small descriptor (288-336 bytes, POD) flows through the ring buffer. The pool handles reference counting via atomics.

**Performance comparison**:

| Transport | Latency (same process) | Latency (cross process) | Suitable for |
|-----------|----------------------|------------------------|-------------|
| POD zero-copy | ~3-36ns | ~50-167ns | Fixed-size sensor data, commands |
| Serde | ~50ns | ~167ns+ | Variable-length strings, logs |
| Pool-backed | ~50ns (descriptor) | ~50ns (descriptor) | Images, point clouds, depth maps |

---

## Backend Selection

HORUS automatically selects the optimal ring buffer backend based on the runtime topology. You never choose a backend — it's detected from how many publishers and subscribers exist.
| Backend | Latency | When Used | |---------|---------|-----------| | DirectChannel | ~3ns | Same thread, POD type | | SpscIntra | ~18ns | Same process, 1 pub, 1 sub | | SpmcIntra | ~24ns | Same process, 1 pub, N subs | | MpscIntra | ~26ns | Same process, N pubs, 1 sub | | FanoutIntra | ~36ns | Same process, N pubs, N subs | | FanoutShm | ~40ns | Cross-process, N pubs, N subs | | PodShm | ~50ns | Cross-process, POD type | | MpscShm | ~65ns | Cross-process, N pubs, 1 sub | | SpmcShm | ~70ns | Cross-process, 1 pub, N subs | | SpscShm | ~85ns | Cross-process, 1 pub, 1 sub, non-POD | **Intra-process backends** use heap memory (`Box<[UnsafeCell>]>`) — no mmap overhead. The SHM file still exists for the control plane (header for cross-process discovery) but the data plane is heap-backed. **Live migration**: When a second process joins a topic, the backend migrates from intra-process to cross-process automatically. The migration uses a CAS-based lock, drains in-flight messages, increments an epoch counter, and updates the backend mode atomically. Existing publishers and subscribers adapt within 32 messages (~1-10μs). See [Topics](/concepts/core-concepts-topic) for the full backend selection details and communication patterns. --- ## Cross-Process Discovery Two processes sharing a topic need no configuration — they discover each other via the SHM filesystem. **How it works**: 1. Process A calls `Topic::new("imu")` → creates `/dev/shm/horus_/topics/horus_imu` (owner) + `.meta` JSON file 2. Process B calls `Topic::new("imu")` → opens the existing file (non-owner) → reads header → validates type compatibility → registers as participant 3. Both processes mmap the same file — writes by A are immediately visible to B **Join protocol**: The owner writes the header and sets a magic number last (with a memory fence). Joiners wait for the magic with exponential backoff (1ms base, 2s deadline). This handles the race where the file exists but the header isn't initialized yet. 
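
The join protocol can be illustrated in-process with atomics. This is a sketch under assumptions rather than the HORUS code: the `Header` struct, the `MAGIC` value, and both function names are hypothetical, and an `Arc` shared between threads stands in for the mmap'd SHM header shared between processes. The owner publishes the magic number last with release ordering; the joiner polls for it with exponential backoff and gives up at the deadline.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical magic value; the real constant is not given in the text.
const MAGIC: u64 = 0x484F_5255_535F_5348;

/// Minimal stand-in for the topic header: one payload field plus the magic number.
struct Header {
    type_size: AtomicU64,
    magic: AtomicU64,
}

impl Header {
    fn new() -> Self {
        Header { type_size: AtomicU64::new(0), magic: AtomicU64::new(0) }
    }
}

/// Owner side: write the header fields first, then publish the magic number last.
fn owner_init(header: &Header, type_size: u64) {
    header.type_size.store(type_size, Ordering::Relaxed);
    // Release ordering makes the earlier store visible to anyone who observes MAGIC.
    header.magic.store(MAGIC, Ordering::Release);
}

/// Joiner side: wait for the magic number, doubling the sleep after each miss.
/// Returns the header's type size, or None if the deadline passes first.
fn joiner_wait(header: &Header, base: Duration, deadline: Duration) -> Option<u64> {
    let start = Instant::now();
    let mut delay = base;
    loop {
        if header.magic.load(Ordering::Acquire) == MAGIC {
            return Some(header.type_size.load(Ordering::Relaxed));
        }
        let elapsed = start.elapsed();
        if elapsed >= deadline {
            return None;
        }
        // Never sleep past the deadline.
        thread::sleep(delay.min(deadline - elapsed));
        delay *= 2;
    }
}

fn main() {
    let header = Arc::new(Header::new());

    // The owner finishes initialization a little after the joiner starts waiting.
    let owner = {
        let header = Arc::clone(&header);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            owner_init(&header, 8);
        })
    };

    // 1 ms base and 2 s deadline, matching the values quoted above.
    let joined = joiner_wait(&header, Duration::from_millis(1), Duration::from_secs(2));
    println!("joiner saw type_size = {:?}", joined);
    owner.join().unwrap();
}
```

Because the magic number is written last, a joiner that observes it can rely on the rest of the header being initialized.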
**Type validation**: When a joiner opens an existing topic, it checks `type_size` and `type_name` in the header against its own type. If they don't match (e.g., Process A uses `CmdVel` but Process B uses `Twist`), the join fails with an error. This prevents silent data corruption from type mismatches across processes. **Participant tracking**: The header tracks up to 16 concurrent participants (publishers and subscribers). Each entry records the PID, thread ID hash, role (publisher/subscriber), and a lease expiry timestamp. Stale participants are detected by expired leases and reclaimed automatically. **Meta files**: Each topic creates a `.meta` JSON file alongside the SHM file containing the topic name, size, creator PID, and creation timestamp. These are used by `horus topic list` and the discovery system on platforms where direct SHM file scanning isn't available (macOS, Windows). **Discovery tools**: - `horus topic list` scans the SHM directory and reads `.meta` files to show all active topics - `horus node list` reads node presence files from `nodes/` to show running nodes - Node presence files include PID, subscribed topics, published topics, and are validated via liveness checks --- ## Platform Differences | Feature | Linux | macOS | Windows | Fallback | |---------|-------|-------|---------|----------| | SHM mechanism | `/dev/shm` tmpfs + mmap | `shm_open()` (Mach VM) | `CreateFileMappingW` | `/tmp` file + mmap | | Base directory | `/dev/shm/horus_/` | `/tmp/horus_/` | `%TEMP%\horus_\` | `/tmp/horus_/` | | Stale detection | `flock(LOCK_EX\|LOCK_NB)` | PID check via `.meta` | PID check via `.meta` | `flock` | | Topic name rule | Any valid filename chars | **No slashes** — `shm_open` limitation | Any valid chars | Any valid filename chars | | Discovery | File scan + `.meta` | `.meta` files only | `.meta` files only | File scan + `.meta` | **macOS topic naming**: `shm_open()` does not allow `/` characters in SHM object names (beyond the leading `/`). 
Use **dots** instead of slashes: `sensor.imu` not `sensor/imu`. This is enforced across all platforms for portability. **macOS initialization race**: Between `shm_open(O_CREAT)` and `ftruncate()`, a non-owner process sees `st_size == 0`. HORUS handles this with `wait_for_shm_init()` — exponential backoff (1ms base, 10 retries). If the timeout expires (creator assumed dead), it calls `shm_unlink` and re-creates. **Windows SHM**: Uses named memory-mapped files backed by the system pagefile (`CreateFileMappingW`). Automatically released when all handles close — no manual cleanup needed. Discovery relies on `.meta` files in the `%TEMP%` directory. --- ## Cleanup ### Automatic Cleanup (you rarely need to do anything) HORUS has **three layers of automatic cleanup** — users almost never need manual intervention: **Layer 1: Drop-based (normal exit)** When a process exits normally (Ctrl+C, `.stop()`, scope exit), `Drop for ShmRegion` removes the SHM file and `.meta` if it's the owner. The `flock(LOCK_SH)` is automatically released by the kernel when the file descriptor closes — even on panic. **Layer 2: Startup namespace cleanup (every command)** Every `horus` CLI command and every `Scheduler::new()` call runs `cleanup_stale_namespaces()` automatically. This scans for `horus_sid*_uid*` directories from dead sessions and removes them. Cost: <1ms. You don't call this — it happens silently. **Layer 3: Pre-run stale topic cleanup** Before every `horus run`, stale topics (no live processes AND older than 5 minutes) are removed automatically. ### Stale SHM Detection When a process crashes (even via `SIGKILL`), the kernel closes its file descriptors and releases `flock` locks. 
HORUS detects stale SHM files by attempting an exclusive lock: ``` flock(fd, LOCK_EX | LOCK_NB) ├── EWOULDBLOCK → file is alive (another process holds a shared lock) └── success → file is stale (no shared locks held) ``` This is more reliable than PID-based detection — PIDs can be reused, and `kill(pid, 0)` races with process creation. ### Manual Cleanup (escape hatch) `horus clean --shm` is a manual escape hatch for edge cases. You should **almost never need it** because the automatic cleanup layers handle normal crashes. It exists for: - Debugging SHM issues when automatic cleanup hasn't triggered yet - Forcing a clean slate before benchmarking - After `kill -9` in rapid succession (before the next `horus` command auto-cleans) ```bash # Preview what would be cleaned horus clean --shm --dry-run # Manual cleanup (rarely needed) horus clean --shm # Nuclear option: clean everything (SHM + build cache) horus clean --all ``` --- ## Advanced: SIMD Optimization For messages ≥4KB, HORUS uses AVX2 non-temporal streaming stores to bypass the CPU cache: ``` Standard memcpy: src → L1 cache → L2 cache → RAM → L2 cache → L1 cache → dst Streaming stores: src → RAM → dst (bypasses cache hierarchy) ``` This matters for large messages (images, point clouds) where the data would evict useful cache entries. Runtime detection via `is_x86_feature_detected!("avx2")` with fallback to standard `memcpy` on non-x86 or older CPUs. The threshold is configurable but defaults to 4KB. --- ## Advanced: Dispatch Optimization The ring buffer hot path uses function-pointer dispatch (not enum matching) for send/recv. Each `Topic` caches a function pointer to the correct backend-specific send/recv implementation. This eliminates ~7ns per message compared to a match chain. Epoch checking is amortized: a process-local `AtomicU64` is checked every message (~1ns L1 read), but the SHM header epoch is only read every 32 messages (~20ns mmap read). 
This keeps the fast path fast while still detecting backend migrations promptly. --- ## Design Decisions **Why mmap, not pipes or sockets?** Pipes and sockets require kernel transitions for every message — `write()` + `read()` system calls add 1-5μs of overhead. Mmap'd shared memory is just regular memory access — the CPU reads/writes at RAM speed without entering the kernel. The tradeoff is complexity: ring buffer synchronization via atomics is harder to get right than `write()/read()`. **Why flock for stale detection, not PID files?** PID files require writing a file, reading it, and calling `kill(pid, 0)` — all of which race with process creation/destruction. `flock` is kernel-managed: the lock is automatically released when the process exits (even via SIGKILL), and the exclusive lock test is atomic. No race conditions. **Why cache-line separation for head/tail?** On x86, when two cores write to the same cache line, the MESI protocol bounces ownership between them (false sharing). Each bounce costs ~40-80ns. By placing the producer's head and consumer's tail on separate 64-byte cache lines, we eliminate this entirely. The header is 640 bytes (10 cache lines) specifically to accommodate this layout. **Why auto-detect POD instead of requiring user annotation?** `needs_drop::<T>()` is a compile-time check that's always correct — if a type has no destructor, it has no heap pointers, so raw `memcpy` is safe. Requiring users to annotate types is error-prone (they forget, or annotate incorrectly). Auto-detection means zero-copy "just works" for the right types. **Why auto-capacity as a power of two?** Ring buffer index calculation via bitmask (`seq & (capacity - 1)`) is a single AND instruction (~1 cycle). Integer modulo (`seq % capacity`) requires division (~20 cycles). For a hot path that runs millions of times per second, this 19-cycle saving is significant. 
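
The power-of-two argument above can be checked with a minimal Python sketch. The helper names `next_power_of_two` and `slot_index` are illustrative assumptions, not HORUS APIs. The sketch only shows that the bitmask and the modulo select the same slot once the capacity is a power of two; it does not measure cycle counts, which depend on the CPU.

```python
def next_power_of_two(n: int) -> int:
    """Round a requested capacity up to the nearest power of two (hypothetical helper)."""
    if n < 1:
        raise ValueError("capacity must be at least 1")
    return 1 << (n - 1).bit_length()


def slot_index(seq: int, capacity: int) -> int:
    """Map a monotonically increasing sequence number to a ring buffer slot.

    Only valid when capacity is a power of two: capacity - 1 is then an
    all-ones bitmask, so the AND keeps exactly the low bits of seq.
    """
    return seq & (capacity - 1)


capacity = next_power_of_two(1000)  # a requested capacity of 1000 is rounded up

# The bitmask and the modulo agree for every sequence number, including wrap-around.
masks_match = all(slot_index(seq, capacity) == seq % capacity for seq in range(5000))
```

With a capacity that is not a power of two (for example 1000), `seq & (capacity - 1)` would skip some slots entirely, which is why the capacity is rounded up before the mask is used.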
--- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | SHM ring buffers | Zero-copy, sub-200ns, no kernel | Complex synchronization, platform-specific code | | Automatic backend selection | User doesn't think about topology | Migration latency on topology change (~10μs) | | flock-based stale detection | Kernel-reliable, no race conditions | Linux/fallback only — macOS uses PID-based | | Cache-line-aligned header | Eliminates false sharing (~40-80ns saved) | 640 bytes per topic (mostly unused padding) | | AVX2 streaming stores | Bypass cache for large messages | x86-only, runtime detection overhead | | Auto POD detection | Zero-copy without user annotation | Types with `Drop` always use Serde, even if logically copyable | --- ## Common Issues ### "Topic not found" across processes Two processes must use the **same namespace** to see each other's topics. If you run `horus run` in two terminals, they get different auto-generated namespaces by default. Fix: ```bash # Terminal 1 HORUS_NAMESPACE=robot horus run src/sensor.rs # Terminal 2 HORUS_NAMESPACE=robot horus run src/controller.rs ``` Or use `horus launch` which sets the namespace automatically for all nodes. ### Stale topics after crash If a process crashes (SIGKILL, power loss), its SHM files persist. Other processes see stale data or fail to create topics with the same name. ```bash # Check for stale files horus clean --shm --dry-run # Remove them horus clean --shm ``` ### Type mismatch across processes If Process A publishes `CmdVel` on "cmd_vel" but Process B subscribes with a different struct (different field layout, different size), the header validation fails. Both processes must use the exact same message type definition. Use `horus msg hash CmdVel` to verify type compatibility. ### Topic names with slashes fail on macOS `shm_open()` on macOS does not allow `/` in SHM object names. Always use dots: `sensor.imu` not `sensor/imu`. This is enforced across all platforms for portability. 
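
If existing code or configuration still uses slash-separated names, the conversion to the dotted form can be done before the name reaches HORUS. The following is a minimal sketch; `to_portable_topic_name` is a hypothetical helper written for this page, not part of the HORUS API.

```python
def to_portable_topic_name(name: str) -> str:
    """Convert a slash-separated topic name to the dotted form (hypothetical helper).

    Empty segments from leading, trailing, or doubled slashes are dropped,
    so "/sensor//imu/" and "sensor/imu" both become "sensor.imu".
    """
    parts = [part for part in name.split("/") if part]
    if not parts:
        raise ValueError(f"topic name has no usable segments: {name!r}")
    return ".".join(parts)


converted = to_portable_topic_name("sensor/imu")
unchanged = to_portable_topic_name("sensor.imu")  # already dotted names pass through
```

Running the conversion in one place (for example where topic names are read from configuration) keeps the rest of the code free of platform-specific naming concerns.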
### Ring buffer overflow (dropped messages) If a subscriber is slower than a publisher, the ring buffer overwrites old messages. The subscriber can observe this via `dropped_count()`. This is by design — HORUS prioritizes freshness over completeness. If drops are unacceptable, increase the capacity or reduce the publisher rate so the subscriber can keep up. ### Memory usage is high Each topic allocates `capacity × slot_size` bytes. For a `Topic` with 1024 capacity and 2MB frames, that's 2GB of SHM. Use a smaller capacity for large types, or use pool-backed types (Image, PointCloud), which store data in a shared TensorPool with refcounting. --- ## Inspecting SHM at Runtime ```bash # List all active topics with backends and latency horus topic list --verbose # Watch a topic in real-time horus topic echo sensor.imu # Measure publishing rate horus topic hz sensor.imu # Measure bandwidth horus topic bw camera.rgb # See what nodes are running horus node list # On Linux, inspect SHM files directly ls -la /dev/shm/horus_*/topics/ ``` The `horus monitor` web UI and TUI also show real-time topic topology, message rates, and backend types for every topic. --- ## See Also - [Topics](/concepts/core-concepts-topic) — Backend selection, communication patterns, topic API - [Custom Messages & Performance](/concepts/core-concepts-podtopic) — POD types, `#[fixed]` attribute, zero-copy - [Multi-Process](/concepts/multi-process) — Running nodes across processes, cross-process topics - [Topic API](/rust/api/topic) — `Topic::new()`, `send()`, `recv()`, `try_recv()` - [Performance Optimization](/performance/performance) — Benchmarks, tuning, profiling - [CLI Reference](/development/cli-reference) — `horus clean --shm`, `horus topic list` --- ## TransformFrame Transform System Path: /concepts/transform-frame Description: High-performance coordinate frame management for real-time robotics # TransformFrame Transform System TransformFrame is HORUS's coordinate frame management system — a real-time-safe replacement for ROS2 TF2. 
It manages the spatial relationships between coordinate frames (e.g., "where is the camera relative to the robot base?") with lock-free lookups and sub-microsecond performance. ## Why TransformFrame? | Problem with TF2 | TransformFrame Solution | |------------------|-----------------| | Mutex-based locking | Lock-free seqlock protocol | | Unbounded latency spikes | Predictable sub-microsecond latency | | String-only frame lookup | Dual API: Integer IDs + Names | | No hard real-time support | Real-time safe, no allocations in hot path | ### Performance | Operation | TransformFrame | ROS2 TF2 | Speedup | |-----------|--------|----------|---------| | Lookup by ID | ~50ns | N/A | - | | Lookup by name | ~200ns | ~2us | 10x | | Chain resolution (depth 3) | ~150ns | ~5us | 33x | | Chain resolution (depth 10) | ~2.5us | ~15us | 6x | | Concurrent reads (4 threads) | ~800ns | ~8us | 10x | --- ## Basic Usage ## Transform Type ```rust // simplified pub struct Transform { pub translation: [f64; 3], // [x, y, z] in meters pub rotation: [f64; 4], // quaternion [x, y, z, w] (Hamilton convention) } ``` ### Creating Transforms ### Transform Operations --- ## Frame Registration ### Dynamic Frames Dynamic frames have transforms that change over time (e.g., robot joints, camera tracking): ### Static Frames Static frames have transforms that never change (e.g., sensor mounts, fixed offsets): Static frames use less memory since they don't maintain a history buffer. 
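
The memory difference between the two frame kinds can be pictured with a small Python model. This is a sketch under stated assumptions, not the TransformFrame implementation: the class names `StaticFrameSlot` and `DynamicFrameSlot` are hypothetical, and the history length of 32 is taken from the default mentioned elsewhere on this page.

```python
from collections import deque
from dataclasses import dataclass, field

HISTORY_LEN = 32  # assumed default history size per dynamic frame

# (translation [x, y, z], rotation quaternion [x, y, z, w])
IDENTITY = ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))


@dataclass
class StaticFrameSlot:
    """A static frame stores exactly one transform and no timestamps."""
    transform: tuple = IDENTITY


@dataclass
class DynamicFrameSlot:
    """A dynamic frame keeps a bounded history of timestamped transforms."""
    history: deque = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))

    def update(self, timestamp_ns: int, transform: tuple) -> None:
        # Once the deque is full, appending discards the oldest sample.
        self.history.append((timestamp_ns, transform))

    def latest(self) -> tuple:
        return self.history[-1][1]


camera_mount = StaticFrameSlot()
arm_joint = DynamicFrameSlot()
for t in range(100):
    arm_joint.update(t, IDENTITY)
```

After 100 updates the dynamic slot still holds only the 32 most recent samples, while the static slot never grows beyond a single transform.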
### Frame Queries --- ## Transform Lookups ### By Name ### By ID (Hot Path) For control loops running at 1kHz+, cache frame IDs and use the ID-based API: ### Time-Travel Queries TransformFrame maintains a history buffer of past transforms, enabling queries at past timestamps with interpolation: If the exact timestamp isn't available, TransformFrame interpolates between the two nearest samples: - **Translation**: Linear interpolation - **Rotation**: SLERP (Spherical Linear Interpolation) --- ## Configuration ### Presets | Preset | Frames | Static | History | Cache | Memory | |--------|--------|--------|---------|-------|--------| | `small()` | 256 | 128 | 32 | 64 | ~550KB | | `medium()` | 1024 | 512 | 32 | 128 | ~2.2MB | | `large()` | 4096 | 2048 | 32 | 256 | ~9MB | Additional presets for simulation: ### Custom Configuration --- ## CLI Tools HORUS provides CLI equivalents of ROS2 tf2 tools: ```bash # List all coordinate frames horus tf list horus tf list --json # JSON output # Echo transform between frames (like ros2 run tf2_ros tf2_echo) horus tf echo camera base_link horus tf echo camera world --rate 10 # 10 Hz # Show frame tree (like ros2 run tf2_tools view_frames) horus tf tree # Get detailed info about a specific frame horus tf info camera # Check if transform exists between two frames horus tf can camera world # Monitor transform update rates horus tf hz ``` --- ## Statistics and Inspection ### TransformFrameStats Get system-wide statistics via `tf.stats()`: | Field | Type | Description | |-------|------|-------------| | `total_frames` | `usize` | Total registered frames | | `static_frames` | `usize` | Frames that never change | | `dynamic_frames` | `usize` | Frames updated at runtime | | `max_frames` | `usize` | Maximum capacity | | `history_len` | `usize` | Transform history buffer size | | `tree_depth` | `usize` | Maximum depth of the frame tree | | `root_count` | `usize` | Number of root frames (no parent) | ### FrameInfo Get metadata for a specific frame 
via `tf.frame_info(name)`: | Field | Type | Description | |-------|------|-------------| | `name` | `String` | Frame name | | `id` | `FrameId` | Internal frame ID | | `parent` | `Option` | Parent frame name (`None` for root) | | `is_static` | `bool` | Whether this frame never changes | ### Diagnostics ```rust // simplified // Validate frame tree integrity hf.validate()?; ``` --- ## Design Decisions ### Why Lock-Free Seqlock Instead of Mutex Coordinate frame lookups happen inside control loops running at 1kHz or faster. A mutex-based design (as in ROS2 TF2) means a writer updating a transform can block readers — and in a real-time system, priority inversion from mutex contention causes unbounded latency spikes. TransformFrame uses a seqlock protocol: writers increment a sequence number before and after updating, and readers retry if they detect a concurrent write. This guarantees that readers never block, even under heavy write load. The worst case for a reader is a retry (~50ns extra), not an unbounded wait. ### Why Dual API (Integer IDs + String Names) String-based frame lookups require hashing and comparison on every call — acceptable for setup code but wasteful in a 1kHz control loop that looks up the same frames every tick. TransformFrame provides both: string-based registration (`register_frame("camera", ...)`) for clarity at startup, and integer ID-based lookups (`tf_by_id(camera_id, world_id)`) for the hot path. Users cache frame IDs once during initialization and use them in the loop body, cutting lookup time from ~200ns to ~50ns. The string API remains available for introspection, CLI tools, and non-performance-critical code. ### Why Built-In Instead of a Separate Package In ROS2, TF2 is a separate package that must be installed, configured, and launched independently. This creates a common failure mode: new users forget to run a transform broadcaster, and their system silently produces wrong results. 
TransformFrame is part of `horus_library` and available via `use horus::prelude::*` — no extra dependency, no separate process, no configuration. Every HORUS project has access to coordinate transforms by default. The CLI (`horus tf list`, `horus tf echo`) also works out of the box without installing additional tools. ## Trade-offs | Area | Benefit | Cost | |------|---------|------| | **Seqlock protocol** | Readers never block; predictable sub-microsecond latency under contention | Readers may retry during a concurrent write — rare but adds ~50ns when it happens | | **Integer ID API** | ~50ns lookups in hot paths — 4x faster than string-based | Users must cache IDs at startup; using IDs without caching defeats the purpose | | **History buffer** | Time-travel queries with interpolation for sensor fusion and replay | Fixed-size ring buffer per dynamic frame (default 32 entries) consumes memory even if time-travel is unused | | **Pre-allocated frame slots** | No heap allocation during runtime — real-time safe after initialization | Maximum frame count must be chosen at startup; exceeding it requires `enable_overflow` (which uses a HashMap and is not RT-safe) | | **Built-in to horus_library** | Zero-config availability, no forgotten dependencies | Adds to the base library size even for projects that do not use coordinate transforms | | **Static frame optimization** | Static frames skip the history buffer — less memory, faster lookups | Changing a static frame (e.g., recalibration) requires an explicit `set_static_transform` call instead of the normal `update_transform` path | ## See Also - **[Message Types](/concepts/message-types)** — TransformStamped and other spatial messages - **[Architecture](/concepts/architecture)** — How TransformFrame fits into HORUS - **[Quick Start](/getting-started/quick-start)** — Get started with HORUS --- ## Core Concepts Path: /concepts Description: Fundamental HORUS concepts — nodes, topics, scheduler, execution classes, and 
communication patterns # Core Concepts Everything you need to understand how HORUS works — from the basic building blocks to advanced communication patterns. ## Start Here - [Architecture](/concepts/architecture) — How the four primitives (Nodes, Topics, Scheduler, Memory) fit together - [Choosing Configuration](/concepts/choosing-configuration) — Progressive guide from prototyping to production ## Nodes - [Nodes (Beginner)](/concepts/nodes-beginner) — What nodes are and how they work - [Nodes — Full Reference](/concepts/core-concepts-nodes) — Node trait, lifecycle, communication patterns, safety - [The node! Macro](/concepts/node-macro) — Declarative node definition with less boilerplate ## Topics & Communication - [Topics (Beginner)](/concepts/topics-beginner) — How nodes talk through pub/sub channels - [Topics — Full Reference](/concepts/core-concepts-topic) — 10 backends, ring buffers, memory model, performance - [Communication Overview](/concepts/communication-overview) — Topics vs Services vs Actions — when to use each - [Services](/concepts/services) — Request/response RPC pattern (Beta) - [Actions](/concepts/actions) — Long-running tasks with feedback and cancellation (Beta) - [Multi-Process](/concepts/multi-process) — Cross-process shared memory communication ## Scheduler & Execution - [Scheduler (Beginner)](/concepts/scheduler-beginner) — How the scheduler runs your nodes - [Scheduler — Full Reference](/concepts/core-concepts-scheduler) — Tick model, execution classes, safety, deterministic mode - [Execution Classes](/concepts/execution-classes) — Rt, Compute, Event, AsyncIo, BestEffort - [Real-Time Systems](/concepts/real-time) — What RT means, why robots need it, how HORUS handles it ## Messages & Data - [Message Types](/concepts/message-types) — 70+ standard robotics messages (CmdVel, Imu, LaserScan, etc.) 
- [Custom Messages & Performance](/concepts/core-concepts-podtopic) — Define your own types, zero-copy optimization - [TransformFrame](/concepts/transform-frame) — Coordinate frame management (lock-free, sub-microsecond) ## Configuration & Multi-Language - [horus.toml](/concepts/horus-toml) — Project manifest and dependency management - [Multi-Language](/concepts/multi-language) — Rust + Python integration - [What is HORUS?](/concepts/what-is-horus) — Overview and positioning ## See Also - [Learn](/learn) — Why HORUS, comparisons, migration guides - [Tutorials](/tutorials) — Hands-on learning path - [Getting Started](/getting-started/installation) — Install and build your first robot - [Rust API Reference](/rust/api) — Complete Rust API documentation - [Python API Reference](/python/api) — Python bindings documentation ======================================== # SECTION: Tutorials ======================================== --- ## Tutorials Path: /tutorials Description: Step-by-step guides that teach robotics concepts by building real applications with HORUS # Tutorials Step-by-step guides that teach robotics concepts by building real, runnable applications with HORUS. Each tutorial builds a **complete working application**. Follow them in order — each one builds on concepts from the previous. 
## Rust Learning Path | # | Tutorial | What You'll Build | What You'll Learn | |---|---------|-------------------|-------------------| | 1 | [Quick Start](/getting-started/quick-start) | Temperature sensor + monitor | Node, Topic, Scheduler basics | | 2 | [Sensor Pipeline](/getting-started/second-application) | 3-node sensor → filter → display | Multi-node architecture, message flow | | 3 | [IMU Sensor Node](/tutorials/01-sensor-node) | IMU data publisher at 100Hz | Custom data types, `.hz()`, sensor patterns | | 4 | [Motor Controller](/tutorials/02-motor-controller) | Velocity → position controller | Subscribing, state management, safe shutdown | | 5 | [Full Robot Integration](/tutorials/03-full-robot) | 4-node robot system | Multi-rate scheduling, data fusion, monitoring | | 6 | [Custom Message Types](/tutorials/04-custom-messages) | Battery monitor with custom messages | message! macro, manual derives, GenericMessage | | 7 | [Hardware Drivers](/tutorials/05-hardware-drivers) | Servo bus + custom conveyor driver | horus.toml [hardware], NodeParams, register_driver! 
| | 8 | [Real-Time Control](/tutorials/realtime-control) | Hard-RT motor + lidar + planner | .rate(), .budget(), .on_miss(), .failure_policy(), .compute() | | 9 | [Debug with Record & Replay](/tutorials/record-replay-debugging) | Record a bug, replay, fix with mixed replay | .with_recording(), replay_from(), inject, diff | | 10 | [Choosing Configuration](/concepts/choosing-configuration) | Progressive config levels | Rate, budget, deadline, safety | | 11 | [Common Mistakes](/getting-started/common-mistakes) | — | 13 pitfalls and how to avoid them | ## Python Tutorials The same progression, using the Python API: | # | Tutorial | What You'll Build | |---|---------|-------------------| | 1 | [Quick Start (Python)](/getting-started/quick-start-python) | Temperature sensor + monitor | | 2 | [IMU Sensor Node (Python)](/tutorials/01-sensor-node-python) | IMU data publisher at 100Hz | | 3 | [Motor Controller (Python)](/tutorials/02-motor-controller-python) | Velocity → position controller with safe shutdown | | 4 | [Full Robot System (Python)](/tutorials/03-full-robot-python) | 4-node robot with multi-rate scheduling | | 5 | [Custom Messages (Python)](/tutorials/04-custom-messages-python) | Typed messages, dicts, dataclasses | | 6 | [Hardware & Real-Time (Python)](/tutorials/05-hardware-and-rt-python) | Drivers, budgets, deadlines, production config | ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic Rust or Python knowledge ([Choosing a Language](/getting-started/choosing-language)) --- ## See Also - [Getting Started](/getting-started/installation) — Install HORUS first - [Recipes](/recipes) — Production-ready copy-paste patterns - [Core Concepts](/concepts) — Conceptual foundations --- ## Tutorial: LiDAR Sensor Node (C++) Path: /tutorials/01-sensor-node-cpp Description: Build a LiDAR sensor node that publishes scan data and processes it for obstacle avoidance # Tutorial: LiDAR Sensor Node (C++) In this tutorial, you'll build 
a complete obstacle avoidance robot controller in C++. You'll learn: - Creating a scheduler with real-time settings - Publishing sensor data with the zero-copy loan pattern - Subscribing to data and processing it - Multi-node pipelines with execution ordering ## Step 1: Create the Project ```bash horus new lidar_robot --cpp cd lidar_robot ``` ## Step 2: Write the Controller Replace `src/main.cpp` with: ```cpp #include using namespace horus::literals; int main() { horus::Scheduler sched; sched.tick_rate(100_hz).prefer_rt(); // ── Topics ───────────────────────────────────────────── auto scan_pub = sched.advertise("lidar.scan"); auto scan_sub = sched.subscribe("lidar.scan"); auto cmd_pub = sched.advertise("cmd_vel"); // ── Node 1: Simulated LiDAR Driver ───────────────────── sched.add("lidar_driver") .rate(100_hz) .order(0) .tick([&] { auto scan = scan_pub.loan(); // Simulate: obstacles at 45° and 315° for (int i = 0; i < 360; ++i) { if (i > 40 && i < 50) { scan->ranges[i] = 0.3f; // close obstacle } else { scan->ranges[i] = 5.0f; // clear } } scan->angle_min = 0.0f; scan->angle_max = 6.28318f; scan_pub.publish(std::move(scan)); }) .build(); // ── Node 2: Obstacle Avoidance Controller ────────────── sched.add("controller") .rate(50_hz) .order(10) .budget(5_ms) .on_miss(horus::Miss::Skip) .tick([&] { auto scan = scan_sub.recv(); if (!scan) return; // Find minimum range in front 60° arc float min_front = 999.0f; for (int i = 150; i < 210; ++i) { if (scan->ranges[i] > 0.01f && scan->ranges[i] < min_front) { min_front = scan->ranges[i]; } } auto cmd = cmd_pub.loan(); if (min_front < 0.5f) { cmd->linear = 0.0f; cmd->angular = 0.5f; // turn } else { cmd->linear = 0.3f; cmd->angular = 0.0f; // go straight } cmd_pub.publish(std::move(cmd)); }) .build(); sched.spin(); } ``` ## Step 3: Build and Run ```bash horus build horus run ``` ## Step 4: Monitor In another terminal: ```bash horus topic list # see active topics horus topic hz cmd_vel # check publish rate horus node list # 
see running nodes horus topic echo cmd_vel # watch velocity commands ``` ## Key Takeaways 1. **Captured pub/sub** — create topics outside the tick lambda, capture by reference 2. **Loan pattern** — `pub.loan()` gives you a direct SHM pointer (0ns data access) 3. **Execution order** — `order(0)` runs before `order(10)`, ensuring sensor data is fresh 4. **Budget enforcement** — `budget(5_ms)` + `on_miss(Skip)` keeps the system responsive 5. **Zero configuration** — no XML launch files, no msg files, no codegen step ## Next Steps - Add a third node for motor control - Use `horus record` to capture and replay the session - Try `horus sim3d` for 3D simulation with physics --- ## Tutorial 1: IMU Sensor Node (Python) Path: /tutorials/01-sensor-node-python Description: Read IMU sensor data and publish it over a topic — Python edition # Tutorial 1: IMU Sensor Node (Python) Build a node that simulates IMU sensor readings (accelerometer + gyroscope) and publishes them over a topic. A second node subscribes and displays the data. 
## What You'll Learn - Creating a Python node with `horus.Node()` - Publishing data with `node.send()` - Subscribing with `node.recv()` - Running multiple nodes with `horus.run()` - Setting execution order and tick rate ## Step 1: Create the Project ```bash horus new imu-tutorial -p cd imu-tutorial ``` ## Step 2: Write the Code Replace `main.py`: ```python import horus import math # ── Sensor Node ────────────────────────────────────────────── def make_sensor(): """Simulates an IMU producing accelerometer and gyroscope readings.""" t = [0.0] def tick(node): t[0] += horus.dt() # Simulate accelerometer (gravity + vibration) accel_x = 0.1 * math.sin(t[0] * 2.0) accel_y = 0.05 * math.cos(t[0] * 3.0) accel_z = 9.81 + 0.02 * math.sin(t[0] * 10.0) # Simulate gyroscope (slow rotation) gyro_roll = 0.01 * math.sin(t[0]) gyro_pitch = 0.02 * math.cos(t[0] * 0.5) gyro_yaw = 0.05 reading = { "accel_x": accel_x, "accel_y": accel_y, "accel_z": accel_z, "gyro_roll": gyro_roll, "gyro_pitch": gyro_pitch, "gyro_yaw": gyro_yaw, "timestamp": horus.now(), } node.send("imu.data", reading) return horus.Node(name="ImuSensor", tick=tick, rate=100, order=0, pubs=["imu.data"]) # ── Display Node ───────────────────────────────────────────── def make_display(): """Prints every 100th IMU sample to avoid flooding the terminal.""" count = [0] def tick(node): msg = node.recv("imu.data") if msg is not None: count[0] += 1 if count[0] % 100 == 0: print(f"[#{count[0]}] accel=({msg['accel_x']:.3f}, {msg['accel_y']:.3f}, {msg['accel_z']:.2f})" f" gyro=({msg['gyro_roll']:.4f}, {msg['gyro_pitch']:.4f}, {msg['gyro_yaw']:.4f})") return horus.Node(name="ImuDisplay", tick=tick, rate=100, order=1, subs=["imu.data"]) # ── Main ───────────────────────────────────────────────────── print("Starting IMU tutorial...\n") horus.run(make_sensor(), make_display()) ``` ## Step 3: Run It ```bash horus run ``` You'll see output every 100 samples: ``` Starting IMU tutorial... 
[#100] accel=(0.091, -0.042, 9.81) gyro=(0.0084, 0.0193, 0.0500) [#200] accel=(-0.054, 0.048, 9.81) gyro=(0.0059, 0.0100, 0.0500) [#300] accel=(-0.099, -0.013, 9.81) gyro=(-0.0029, -0.0098, 0.0500) ``` Press **Ctrl+C** to stop. ## Understanding the Code ### State Management Python nodes use closures for state. Wrap mutable state in a list (`t = [0.0]`) since a closure cannot rebind an outer variable without a `nonlocal` declaration: ```python t = [0.0] # mutable via t[0] def tick(node): t[0] += horus.dt() # works ``` ### Timing with `horus.dt()` Use `horus.dt()` instead of `time.time()` — it returns the actual timestep and works correctly in deterministic mode: ```python t[0] += horus.dt() # correct — uses framework time ``` ### Execution Order `order=0` runs before `order=1`. The sensor publishes data before the display reads it: ```python sensor = horus.Node(..., order=0) # runs first display = horus.Node(..., order=1) # runs second ``` ## Using Typed Messages The dict-based version above works, but typed topics are significantly faster and enforce a fixed schema. 
Here is the same sensor + display tutorial rewritten with `horus.Imu`: ```python import horus import math # ── Sensor Node (typed) ──────────────────────────────────── def make_sensor(): """Publishes IMU readings using the typed horus.Imu message.""" t = [0.0] def tick(node): t[0] += horus.dt() reading = horus.Imu( accel_x=0.1 * math.sin(t[0] * 2.0), accel_y=0.05 * math.cos(t[0] * 3.0), accel_z=9.81 + 0.02 * math.sin(t[0] * 10.0), gyro_x=0.01 * math.sin(t[0]), gyro_y=0.02 * math.cos(t[0] * 0.5), gyro_z=0.05, ) node.send("imu", reading) return horus.Node(name="imu_sensor", tick=tick, rate=100, order=0, pubs=[horus.Imu]) # ── Display Node (typed) ─────────────────────────────────── def make_display(): """Receives typed Imu messages — fields are attributes, not dict keys.""" count = [0] def tick(node): imu = node.recv("imu") if imu: count[0] += 1 if count[0] % 100 == 0: print(f"[#{count[0]}] Accel Z: {imu.accel_z:.2f} m/s² " f"Gyro: ({imu.gyro_x:.4f}, {imu.gyro_y:.4f}, {imu.gyro_z:.4f})") return horus.Node(name="display", tick=tick, rate=10, order=1, subs=[horus.Imu]) # ── Main ──────────────────────────────────────────────────── print("Starting IMU tutorial (typed)...\n") horus.run(make_sensor(), make_display()) ``` ### Key Differences from the Dict Version | | Dict topics | Typed topics | |---|---|---| | **Publish** | `node.send("imu.data", {"accel_z": 9.81})` | `node.send("imu", horus.Imu(accel_z=9.81))` | | **Subscribe** | `msg["accel_z"]` | `imu.accel_z` | | **Declare** | `pubs=["imu.data"]` | `pubs=[horus.Imu]` | | **Transport** | GenericMessage serialization | Zero-copy Pod transport | > **Performance:** Typed topics (`horus.Imu`) use zero-copy Pod transport at ~1.7 us. Dict topics use GenericMessage serialization at ~6-12 us. Use typed topics for anything in a control loop. ## Experiments 1. **Change the rate** — try `rate=1000` for 1kHz sampling 2. **Add noise** — use `horus.rng_float()` for deterministic random noise 3. 
**Add a filter node** — subscribe to raw IMU, apply smoothing, publish filtered data ## Next Steps - [Tutorial 2: Motor Controller (Python)](/tutorials/02-motor-controller-python) — subscribe to commands and control a motor - [Tutorial 1 (Rust)](/tutorials/01-sensor-node) — same tutorial in Rust - [Python API Reference](/python/api/python-bindings) — full API docs --- ## See Also - [IMU Sensor (Rust)](/tutorials/01-sensor-node) — Rust version - [Python Bindings](/python/api/python-bindings) — Python API reference --- ## Tutorial 2: Motor Controller (Python) Path: /tutorials/02-motor-controller-python Description: Subscribe to velocity commands, simulate motor physics, publish state feedback — Python edition # Tutorial 2: Motor Controller (Python) Build a motor controller that subscribes to velocity commands, simulates motor physics, and publishes position/velocity feedback. Includes safe shutdown. ## What You'll Learn - Subscribing to command topics - Publishing state feedback - Managing state between ticks - Multiple topics per node (sub + pub) - Safe shutdown for actuators ## Step 1: Create the Project ```bash horus new motor-tutorial -p cd motor-tutorial ``` ## Step 2: Write the Code Replace `main.py`: ```python import horus import math # ── Commander Node ─────────────────────────────────────────── def make_commander(): """Generates sine-wave velocity commands.""" t = [0.0] def tick(node): t[0] += horus.dt() velocity = 2.0 * math.sin(t[0] * 0.5) # oscillate ±2 rad/s node.send("motor.cmd", {"velocity": velocity, "max_torque": 5.0}) return horus.Node(name="Commander", tick=tick, rate=100, order=0, pubs=["motor.cmd"]) # ── Motor Controller Node ─────────────────────────────────── def make_motor(): """Simulates a motor: integrates velocity → position.""" state = {"position": 0.0, "velocity": 0.0} def tick(node): cmd = node.recv("motor.cmd") if cmd is not None: state["velocity"] = cmd["velocity"] # Integrate velocity → position dt = horus.dt() state["position"] += 
state["velocity"] * dt # Publish state feedback node.send("motor.state", { "position": state["position"], "velocity": state["velocity"], "timestamp": horus.now(), }) def shutdown(node): # SAFETY: stop the motor before exiting state["velocity"] = 0.0 node.send("motor.cmd", {"velocity": 0.0, "max_torque": 0.0}) print("Motor stopped safely") return horus.Node(name="MotorController", tick=tick, shutdown=shutdown, rate=100, order=1, subs=["motor.cmd"], pubs=["motor.state"]) # ── Display Node ──────────────────────────────────────────── def make_display(): """Prints motor state every 50 samples.""" count = [0] def tick(node): msg = node.recv("motor.state") if msg is not None: count[0] += 1 if count[0] % 50 == 0: print(f"[#{count[0]}] pos={msg['position']:.2f} rad" f" vel={msg['velocity']:.2f} rad/s") return horus.Node(name="StateDisplay", tick=tick, rate=100, order=2, subs=["motor.state"]) # ── Main ──────────────────────────────────────────────────── print("Starting motor controller tutorial...\n") horus.run(make_commander(), make_motor(), make_display()) ``` ## Step 3: Run It ```bash horus run ``` ``` Starting motor controller tutorial... [#50] pos=0.12 rad vel=0.50 rad/s [#100] pos=0.95 rad vel=1.73 rad/s [#150] pos=2.84 rad vel=1.98 rad/s [#200] pos=4.76 rad vel=0.97 rad/s ``` Press **Ctrl+C** — you'll see "Motor stopped safely" confirming the shutdown callback ran. ## Understanding the Code ### Safe Shutdown The `shutdown` callback is critical for actuator nodes. When you press Ctrl+C, HORUS calls `shutdown()` on every node before exiting: ```python def shutdown(node): state["velocity"] = 0.0 node.send("motor.cmd", {"velocity": 0.0, "max_torque": 0.0}) print("Motor stopped safely") ``` Without this, a real motor would continue at its last commanded velocity. 
### Data Flow ``` Commander (order=0) → motor.cmd → MotorController (order=1) → motor.state → Display (order=2) ``` Lower order numbers run first each tick, ensuring the commander publishes before the controller reads. ### State Between Ticks The `state` dict persists across ticks via closure. The motor integrates velocity into position every tick: ```python state["position"] += state["velocity"] * dt ``` ## Using a Class for State For complex nodes, a class is cleaner than closures. Here is a complete motor controller rewritten as a class: ```python import horus class MotorController: def __init__(self): self.kp = 1.5 self.target_speed = 0.0 def tick(self, node): cmd = node.recv("cmd_vel") if cmd: self.target_speed = cmd.linear error = self.target_speed - self.current_speed() node.send("motor_cmd", {"rpm": error * self.kp}) def shutdown(self, node): node.send("motor_cmd", {"rpm": 0.0}) node.log_info("Motors zeroed") def current_speed(self): # In a real system, read from an encoder return 0.0 motor = MotorController() node = horus.Node( name="motor", pubs=["motor_cmd"], subs=["cmd_vel"], tick=motor.tick, shutdown=motor.shutdown, rate=100 ) horus.run(node) ``` Python automatically binds `self` when you pass `motor.tick` as the callback. The scheduler calls `tick(node)` which becomes `motor.tick(node)` with `self` bound. This means your tick function receives both the instance state (`self`) and the HORUS node handle (`node`) without any extra wiring. ### Lifecycle: Shutdown Guarantees The scheduler calls `shutdown(node)` on Ctrl+C, SIGTERM, or when the run duration expires. If `shutdown` raises an exception, it is caught and logged — other nodes still shut down normally. This means you can safely send final commands (like zeroing motors) without worrying about one node's failure preventing cleanup of the others. 
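The isolation guarantee described above can be pictured with a small pure-Python sketch. This is not the HORUS scheduler's code; `shutdown_all` and the three callbacks are hypothetical, and the sketch only assumes the behavior stated in the text: each shutdown hook runs in turn, and an exception in one is logged rather than propagated.

```python
import logging


def shutdown_all(nodes):
    """Call every node's shutdown hook; log failures instead of propagating."""
    failures = []
    for name, shutdown in nodes:
        try:
            shutdown()
        except Exception as exc:  # one failing node must not block the rest
            logging.error("shutdown of %s failed: %s", name, exc)
            failures.append(name)
    return failures


stopped = []


def stop_motor():
    stopped.append("motor")


def broken_sensor():
    raise RuntimeError("sensor bus already closed")


def stop_gripper():
    stopped.append("gripper")


failed = shutdown_all([
    ("motor", stop_motor),
    ("sensor", broken_sensor),
    ("gripper", stop_gripper),
])
print("stopped:", stopped)
print("failed:", failed)
```

The gripper hook still runs even though the sensor hook raised before it, which is why a motor-zeroing command placed in `shutdown` is not at the mercy of other nodes' cleanup code.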
## Next Steps - [Tutorial 3: Full Robot (Python)](/tutorials/03-full-robot-python) — combine sensors and motors into a complete system - [Tutorial 2 (Rust)](/tutorials/02-motor-controller) — same tutorial in Rust - [Common Mistakes](/getting-started/common-mistakes) — avoid shutdown pitfalls --- ## See Also - [Motor Controller (Rust)](/tutorials/02-motor-controller) — Rust version - [Python Bindings](/python/api/python-bindings) — Python API reference --- ## Tutorial 3: Full Robot System (Python) Path: /tutorials/03-full-robot-python Description: Combine sensors, controller, and state estimator into a complete multi-rate robot — Python edition # Tutorial 3: Full Robot System (Python) Combine an IMU sensor, velocity commander, motor controller, and state estimator into a complete robot system with multi-rate scheduling. ## What You'll Learn - Composing 4+ nodes into a working system - Multi-rate scheduling (different nodes at different frequencies) - Data fusion (subscribing to multiple topics) - Monitoring with `horus monitor` and `horus topic echo` ## Architecture ``` ImuSensor (100Hz) ──→ imu.data ──→ StateEstimator (100Hz) ──→ robot.pose Commander (10Hz) ──→ motor.cmd ──→ MotorController (50Hz) ──→ motor.state ──→ StateEstimator ``` ## The Code ```python import horus import math us = horus.us # microsecond constant for budget/deadline # ── IMU Sensor (100 Hz) ───────────────────────────────────── def make_imu(): t = [0.0] def tick(node): t[0] += horus.dt() node.send("imu.data", { "accel_x": 0.1 * math.sin(t[0] * 2.0), "accel_y": 0.05 * math.cos(t[0] * 3.0), "accel_z": 9.81, "gyro_yaw": 0.05, # constant rotation "timestamp": horus.now(), }) return horus.Node(name="ImuSensor", tick=tick, rate=100, order=0, pubs=["imu.data"]) # ── Commander (10 Hz) ──────────────────────────────────────── def make_commander(): t = [0.0] def tick(node): t[0] += horus.dt() node.send("motor.cmd", { "velocity": 1.0 + 0.5 * math.sin(t[0] * 0.3), }) return horus.Node(name="Commander", 
tick=tick, rate=10, order=1, pubs=["motor.cmd"]) # ── Motor Controller (50 Hz) ──────────────────────────────── def make_motor(): state = {"velocity": 0.0, "position": 0.0} def tick(node): cmd = node.recv("motor.cmd") if cmd is not None: state["velocity"] = cmd["velocity"] state["position"] += state["velocity"] * horus.dt() node.send("motor.state", { "position": state["position"], "velocity": state["velocity"], }) def shutdown(node): state["velocity"] = 0.0 print("Motor stopped safely") return horus.Node(name="MotorController", tick=tick, shutdown=shutdown, rate=50, order=2, subs=["motor.cmd"], pubs=["motor.state"]) # ── State Estimator (100 Hz) ──────────────────────────────── def make_estimator(): """Fuses IMU gyro + motor velocity into a robot pose estimate.""" pose = {"x": 0.0, "y": 0.0, "heading": 0.0} def tick(node): dt = horus.dt() # Fuse IMU data (heading from gyro) imu = node.recv("imu.data") if imu is not None: pose["heading"] += imu["gyro_yaw"] * dt # Fuse motor state (position from velocity) motor = node.recv("motor.state") if motor is not None: pose["x"] += motor["velocity"] * math.cos(pose["heading"]) * dt pose["y"] += motor["velocity"] * math.sin(pose["heading"]) * dt node.send("robot.pose", { "x": pose["x"], "y": pose["y"], "heading": pose["heading"], "timestamp": horus.now(), }) return horus.Node(name="StateEstimator", tick=tick, rate=100, order=3, subs=["imu.data", "motor.state"], pubs=["robot.pose"]) # ── Main ──────────────────────────────────────────────────── print("Starting full robot system...") print(" ImuSensor: 100 Hz (order 0)") print(" Commander: 10 Hz (order 1)") print(" MotorController: 50 Hz (order 2)") print(" StateEstimator: 100 Hz (order 3)") print() horus.run( make_imu(), make_commander(), make_motor(), make_estimator(), tick_rate=100, watchdog_ms=500, ) ``` ## Run and Monitor ```bash # Terminal 1: Run the robot horus run # Terminal 2: Monitor topics horus topic echo robot.pose # Terminal 3: Full monitoring dashboard horus 
monitor ``` ## Key Concepts ### Multi-Rate Scheduling Each node runs at its own rate. The scheduler handles the timing: ```python make_imu() # rate=100 — ticks 100 times per second make_commander() # rate=10 — ticks 10 times per second make_motor() # rate=50 — ticks 50 times per second make_estimator() # rate=100 — ticks 100 times per second ``` ### Data Fusion The state estimator subscribes to **two** topics and fuses them: ```python imu = node.recv("imu.data") # gyro → heading motor = node.recv("motor.state") # velocity → position ``` Always call `recv()` on every subscribed topic every tick — even if you don't need the data yet. This prevents stale data accumulation. ### Watchdog `watchdog_ms=500` detects frozen nodes. If any node's tick takes longer than 500ms, the safety monitor triggers. ## Using the Scheduler Directly For production configs with RT features: ```python scheduler = horus.Scheduler(tick_rate=100, watchdog_ms=500, rt=True) scheduler.add(make_imu()) scheduler.add(make_commander()) scheduler.add(make_motor()) scheduler.add(make_estimator()) scheduler.run() ``` ## Next Steps - [Tutorial 4: Custom Messages (Python)](/tutorials/04-custom-messages-python) — define your own message types - [Tutorial 3 (Rust)](/tutorials/03-full-robot) — same tutorial in Rust - [Scheduler API](/python/api/python-bindings#scheduler) — advanced scheduling --- ## See Also - [Full Robot (Rust)](/tutorials/03-full-robot) — Rust version - [Python Bindings](/python/api/python-bindings) — Python API reference --- ## Tutorial 1: Build an IMU Sensor Node Path: /tutorials/01-sensor-node Description: Create a node that publishes IMU sensor data — accelerometer and gyroscope readings # Tutorial 1: Build an IMU Sensor Node Inertial Measurement Units (IMUs) are one of the most common sensors in robotics. Every drone relies on an IMU for stabilization, self-driving cars fuse IMU data with GPS for localization, and warehouse AGVs use them to track heading during navigation. 
In this tutorial you will build a HORUS node that simulates an IMU — publishing accelerometer and gyroscope readings at 100 Hz — and a second node that subscribes to that data and displays it. This is the foundational publish/subscribe pattern you will use for every sensor in HORUS. ## Prerequisites - [Quick Start](/getting-started/quick-start) completed - HORUS installed and `horus --help` working ## What You'll Build An IMU sensor node that: 1. Generates accelerometer data (x, y, z in m/s²) 2. Generates gyroscope data (roll, pitch, yaw rates in rad/s) 3. Publishes both on a topic at 100 Hz 4. A display node that prints the data once per second **Time estimate**: ~15 minutes ## Step 1: Create the Project ```bash horus new imu-demo -r cd imu-demo ``` You should see `horus.toml`, `src/main.rs`, and `.horus/` in the project directory. ## Step 2: Define the IMU Data Type Any `Copy` type can be sent over a HORUS topic. Replace `src/main.rs` with: ```rust use horus::prelude::*; /// IMU sensor reading — accelerometer + gyroscope. /// #[repr(C)] + Copy makes this a POD type for zero-copy shared memory. 
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct ImuData { // Accelerometer (m/s²) accel_x: f32, accel_y: f32, accel_z: f32, // Gyroscope (rad/s) gyro_roll: f32, gyro_pitch: f32, gyro_yaw: f32, // Timestamp (seconds since start) timestamp: f64, } ``` ## Step 3: Build the Sensor Node Add the sensor node below the struct definition: ```rust // simplified struct ImuSensor { publisher: Topic<ImuData>, tick_count: u64, } impl ImuSensor { fn new() -> Result<Self> { Ok(Self { publisher: Topic::new("imu.data")?, tick_count: 0, }) } } impl Node for ImuSensor { fn name(&self) -> &str { "ImuSensor" } fn tick(&mut self) { let t = self.tick_count as f64 * 0.01; // 100 Hz → 0.01s per tick // Simulate a robot turning slowly let data = ImuData { accel_x: 0.0, accel_y: 0.0, accel_z: 9.81, // Gravity gyro_roll: 0.0, gyro_pitch: 0.0, gyro_yaw: (0.1 * (t * 0.5).sin()) as f32, // Gentle oscillation (f64 math, stored as f32) timestamp: t, }; self.publisher.send(data); self.tick_count += 1; } fn shutdown(&mut self) -> Result<()> { eprintln!("ImuSensor shutting down after {} ticks", self.tick_count); Ok(()) } } ``` `Topic::new("imu.data")` creates a typed topic that carries `ImuData` structs. The `tick()` function runs at whatever rate the scheduler sets (100 Hz in Step 5).
## Step 4: Build the Display Node ```rust // simplified struct ImuDisplay { subscriber: Topic<ImuData>, sample_count: u64, } impl ImuDisplay { fn new() -> Result<Self> { Ok(Self { subscriber: Topic::new("imu.data")?, sample_count: 0, }) } } impl Node for ImuDisplay { fn name(&self) -> &str { "ImuDisplay" } fn tick(&mut self) { if let Some(data) = self.subscriber.recv() { self.sample_count += 1; // Print every 100th sample (once per second at 100 Hz) if self.sample_count % 100 == 0 { println!( "[{:.1}s] accel=({:.2}, {:.2}, {:.2}) gyro=({:.3}, {:.3}, {:.3})", data.timestamp, data.accel_x, data.accel_y, data.accel_z, data.gyro_roll, data.gyro_pitch, data.gyro_yaw, ); } } } fn shutdown(&mut self) -> Result<()> { eprintln!("ImuDisplay shutting down after {} samples", self.sample_count); Ok(()) } } ``` ## Step 5: Wire It Together ```rust fn main() -> Result<()> { eprintln!("IMU demo starting...\n"); let mut scheduler = Scheduler::new(); // Sensor publishes (order 0) before display subscribes (order 1) scheduler.add(ImuSensor::new()?) .order(0) .rate(100_u64.hz()) // 100 Hz — typical IMU rate .build()?; scheduler.add(ImuDisplay::new()?) .order(1) .build()?; scheduler.run() } ``` `.rate(100_u64.hz())` tells the scheduler to tick this node at 100 Hz. The `.hz()` method comes from HORUS's `DurationExt` trait, which also provides `.ms()`, `.us()`, and `.khz()`. ## Step 6: Run It ```bash horus run ``` You should see one line per second: ``` IMU demo starting... [1.0s] accel=(0.00, 0.00, 9.81) gyro=(0.000, 0.000, 0.048) [2.0s] accel=(0.00, 0.00, 9.81) gyro=(0.000, 0.000, 0.084) [3.0s] accel=(0.00, 0.00, 9.81) gyro=(0.000, 0.000, 0.100) ``` Press **Ctrl+C** to stop. You should see shutdown messages from both nodes. ## Step 7: Introspect While Running While the demo is still running (restart it with `horus run` if you stopped it), open a **second terminal** and use the HORUS CLI to inspect live topics.
### List active topics ```bash horus topic list ``` You should see output like: ``` Active topics: imu.data ImuData 100 Hz 1 publisher, 1 subscriber ``` This tells you the topic name, the message type, the publish rate, and how many nodes are connected. ### Watch live messages ```bash horus topic echo imu.data ``` This streams every message on the `imu.data` topic to your terminal: ``` [0.01s] ImuData { accel_x: 0.00, accel_y: 0.00, accel_z: 9.81, gyro_roll: 0.000, gyro_pitch: 0.000, gyro_yaw: 0.001, timestamp: 0.01 } [0.02s] ImuData { accel_x: 0.00, accel_y: 0.00, accel_z: 9.81, gyro_roll: 0.000, gyro_pitch: 0.000, gyro_yaw: 0.001, timestamp: 0.02 } ... ``` Press **Ctrl+C** to stop echoing. The running demo is unaffected — `horus topic echo` is a read-only observer. ### What Happens When the Subscriber is Slower? HORUS topics use a **lock-free ring buffer** under the hood. If the publisher sends faster than the subscriber reads, the ring buffer silently overwrites the oldest unread messages. No crash, no blocking — the publisher never waits. This design is intentional: in real-time systems, dropping stale data is better than stalling the control loop. You can detect dropped messages by calling `dropped_count()` on the topic: ```rust // simplified fn tick(&mut self) { if let Some(data) = self.subscriber.recv() { self.sample_count += 1; // Check for dropped messages periodically if self.sample_count % 1000 == 0 { let dropped = self.subscriber.dropped_count(); if dropped > 0 { eprintln!( "WARNING: {} messages dropped on '{}'", dropped, self.subscriber.name() ); } } if self.sample_count % 100 == 0 { println!( "[{:.1}s] accel=({:.2}, {:.2}, {:.2}) gyro=({:.3}, {:.3}, {:.3})", data.timestamp, data.accel_x, data.accel_y, data.accel_z, data.gyro_roll, data.gyro_pitch, data.gyro_yaw, ); } } } ``` At 100 Hz with a matching subscriber, you should see zero drops. In the Challenges section below, you will push this to 1000 Hz and observe what happens. 
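The overwrite-oldest behavior is easier to reason about with a toy model. The sketch below is written in Python for brevity and is not HORUS's implementation: it is single-threaded rather than lock-free, and `OverwritingRing`, its capacity of 4, and the 10:1 publish-to-read ratio are all assumptions chosen for illustration. It only models the two properties the text describes: the writer never waits, and the reader can count what it missed.

```python
class OverwritingRing:
    """Fixed-capacity buffer that overwrites the oldest unread item when full."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.write_seq = 0   # total messages ever written
        self.read_seq = 0    # sequence number the reader expects next
        self.dropped = 0     # messages overwritten before being read

    def send(self, msg):
        # The writer never blocks: it always takes the next slot.
        self.slots[self.write_seq % self.capacity] = msg
        self.write_seq += 1

    def recv(self):
        if self.read_seq == self.write_seq:
            return None  # nothing new since the last read
        oldest_available = max(0, self.write_seq - self.capacity)
        if self.read_seq < oldest_available:
            # The writer lapped the reader: skip ahead and count the loss.
            self.dropped += oldest_available - self.read_seq
            self.read_seq = oldest_available
        msg = self.slots[self.read_seq % self.capacity]
        self.read_seq += 1
        return msg


ring = OverwritingRing(capacity=4)
received = []
for i in range(20):
    ring.send(i)                 # publisher: every step
    if i % 10 == 9:              # subscriber: once every 10 steps
        received.append(ring.recv())

print("received:", received)
print("dropped:", ring.dropped)
```

Each read returns the oldest message still in the buffer, and the drop counter grows by the number of messages that were overwritten in the meantime, which is the kind of number `dropped_count()` is described as reporting.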
## Key Takeaways - **Custom data types** can be sent over topics — any `#[repr(C)]` + `Copy` struct works for zero-copy IPC - **Per-node rates** are set with `.rate(100_u64.hz())` on the node builder - **Topic naming** uses dots for hierarchy (`"imu.data"`) - **Sampling** lets you display high-frequency data at a readable rate (print every Nth sample) ### Challenges Ready to go further? Try these extensions on your own: **(a) Add realistic noise to the sensor readings.** Real IMUs are noisy. Add the `rand` crate to your `horus.toml` dependencies and inject uniformly distributed noise into every reading (for Gaussian noise, sample from a normal distribution in the `rand_distr` crate instead): ```rust // simplified use rand::Rng; let mut rng = rand::thread_rng(); let data = ImuData { accel_x: rng.gen_range(-0.05..0.05), accel_y: rng.gen_range(-0.05..0.05), accel_z: 9.81 + rng.gen_range(-0.02..0.02), gyro_roll: rng.gen_range(-0.001..0.001), gyro_pitch: rng.gen_range(-0.001..0.001), gyro_yaw: (0.1 * (t * 0.5).sin()) as f32 + rng.gen_range(-0.005..0.005), timestamp: t, }; ``` **(b) Add a third node that computes a moving average.** Create an `ImuFilter` node that subscribes to `imu.data`, maintains a circular buffer of the last N readings, and publishes the smoothed result on `imu.filtered`. Then update `ImuDisplay` to subscribe to `imu.filtered` instead. This is how real sensor pipelines work: raw data goes through one or more filter stages before reaching the controller. **(c) Push the sensor rate to 1000 Hz and observe dropped messages.** Change the sensor rate to `1000_u64.hz()` and keep the display node at its default rate. Add the `dropped_count()` check from Step 7 and watch what happens. You should see drops increase over time as the subscriber cannot keep up. Try increasing the display node's rate (e.g., `.rate(1000_u64.hz())`) and confirm the drops go to zero.
## Complete File Here is the full `src/main.rs` in a single copy-pasteable block: ```rust use horus::prelude::*; // ── Message Type ────────────────────────────────────────────── /// IMU sensor reading — accelerometer + gyroscope. /// #[repr(C)] + Copy makes this a POD type for zero-copy shared memory. #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct ImuData { // Accelerometer (m/s²) accel_x: f32, accel_y: f32, accel_z: f32, // Gyroscope (rad/s) gyro_roll: f32, gyro_pitch: f32, gyro_yaw: f32, // Timestamp (seconds since start) timestamp: f64, } // ── Sensor Node ─────────────────────────────────────────────── struct ImuSensor { publisher: Topic<ImuData>, tick_count: u64, } impl ImuSensor { fn new() -> Result<Self> { Ok(Self { publisher: Topic::new("imu.data")?, tick_count: 0, }) } } impl Node for ImuSensor { fn name(&self) -> &str { "ImuSensor" } fn tick(&mut self) { let t = self.tick_count as f64 * 0.01; // 100 Hz → 0.01s per tick // Simulate a robot turning slowly let data = ImuData { accel_x: 0.0, accel_y: 0.0, accel_z: 9.81, // Gravity gyro_roll: 0.0, gyro_pitch: 0.0, gyro_yaw: (0.1 * (t * 0.5).sin()) as f32, // Gentle oscillation (f64 math, stored as f32) timestamp: t, }; self.publisher.send(data); self.tick_count += 1; } fn shutdown(&mut self) -> Result<()> { eprintln!("ImuSensor shutting down after {} ticks", self.tick_count); Ok(()) } } // ── Display Node ────────────────────────────────────────────── struct ImuDisplay { subscriber: Topic<ImuData>, sample_count: u64, } impl ImuDisplay { fn new() -> Result<Self> { Ok(Self { subscriber: Topic::new("imu.data")?, sample_count: 0, }) } } impl Node for ImuDisplay { fn name(&self) -> &str { "ImuDisplay" } fn tick(&mut self) { if let Some(data) = self.subscriber.recv() { self.sample_count += 1; // Print every 100th sample (once per second at 100 Hz) if self.sample_count % 100 == 0 { println!( "[{:.1}s] accel=({:.2}, {:.2}, {:.2}) gyro=({:.3}, {:.3}, {:.3})", data.timestamp, data.accel_x, data.accel_y, data.accel_z, data.gyro_roll, 
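For challenge (b), the core of the filter is a fixed-size window over recent readings. The following is a language-neutral sketch written in Python for brevity; `MovingAverage` is a hypothetical helper, not a HORUS type, and a Rust `ImuFilter` node would hold one such window per field and call it from `tick()`.

```python
from collections import deque


class MovingAverage:
    """Average of the last `window` samples (fewer while the buffer fills)."""

    def __init__(self, window):
        # A deque with maxlen evicts the oldest sample automatically,
        # which gives circular-buffer behavior without index arithmetic.
        self.samples = deque(maxlen=window)

    def update(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


gyro_filter = MovingAverage(window=3)
raw = [3.0, 6.0, 9.0, 12.0]
smoothed = [gyro_filter.update(v) for v in raw]
print(smoothed)
```

Recomputing the sum on every update costs O(N) per sample but avoids the slow floating-point drift that a running total can accumulate; for the small windows typical of IMU smoothing that trade-off is usually acceptable.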
data.gyro_pitch, data.gyro_yaw, ); } } } fn shutdown(&mut self) -> Result<()> { eprintln!("ImuDisplay shutting down after {} samples", self.sample_count); Ok(()) } } // ── Main ────────────────────────────────────────────────────── fn main() -> Result<()> { eprintln!("IMU demo starting...\n"); let mut scheduler = Scheduler::new(); // Sensor publishes (order 0) before display subscribes (order 1) scheduler.add(ImuSensor::new()?) .order(0) .rate(100_u64.hz()) // 100 Hz — typical IMU rate .build()?; scheduler.add(ImuDisplay::new()?) .order(1) .build()?; scheduler.run() } ``` ## Troubleshooting | Problem | Cause | Fix | |---------|-------|-----| | `error[E0277]: the trait bound Copy is not satisfied` | `ImuData` contains a non-`Copy` field (e.g., `String`, `Vec`) | All fields must be `Copy` types (`f32`, `f64`, `u64`, `bool`, fixed-size arrays). Use `[u8; N]` instead of `String` | | `horus run` prints nothing | Display node is not receiving messages — topic name mismatch | Verify both nodes use exactly `"imu.data"` (case-sensitive, dot-separated) | | `horus topic echo imu.data` shows no output | No publisher is running, or the topic name is wrong | Start the demo in another terminal first, then run `horus topic echo` | | `error: failed to create shared memory` | Shared memory region already exists from a crashed run | Run `horus clean --shm` to remove stale shared memory segments | | Timestamp stays at 0.0 | `tick_count` is not being incremented | Make sure `self.tick_count += 1;` is at the end of `tick()` | | Compile error on `.hz()` | Missing `DurationExt` trait import | The `horus::prelude::*` import brings it in automatically. 
If using selective imports, add `use horus_core::duration_ext::DurationExt;` | | Display prints every tick instead of once per second | The modulo check is wrong or `sample_count` is not incrementing | Verify `self.sample_count % 100 == 0` and that `sample_count` starts at `0` and increments by `1` per received message | ## Next Steps - [Tutorial 2: Motor Controller](/tutorials/02-motor-controller) — Subscribe to velocity commands and publish joint positions ## See Also - [Nodes (Concept)](/concepts/nodes-beginner) — How nodes work - [Topic API](/rust/api/topic) — Full Topic reference - [Imu Message](/stdlib/messages/imu) — The standard Imu message type - [IMU Reader Recipe](/recipes/imu-reader) — Production IMU integration pattern --- ## Tutorial 4: Custom Messages (Python) Path: /tutorials/04-custom-messages-python Description: Send typed messages, dicts, and dataclasses between Python nodes — Python edition # Tutorial 4: Custom Messages (Python) Learn four ways to send data between Python nodes: typed messages (fastest), dicts (most flexible), dataclasses (structured), and compiled messages (production). ## What You'll Learn - Using built-in typed messages (`CmdVel`, `Imu`, `Odometry`) - Sending Python dicts as generic messages - Using dataclasses for structured Python-only data - Compiling custom messages with `maturin` for production performance - Performance trade-offs between approaches - When to use which approach **Prerequisites:** [Tutorial 1 (Python)](/tutorials/01-sensor-node-python) completed. A working `horus` installation. **Time:** 25 minutes --- ## Approach 1: Built-in Typed Messages (Fastest) Use HORUS's built-in message types for maximum performance (~1.7us latency, zero-copy with Rust nodes). These types are defined in Rust and exposed to Python via PyO3 bindings, so they work seamlessly across both languages. ### When to use typed messages - Your data matches a standard robotics message (IMU, velocity, pose, etc.) 
- You need cross-language compatibility (Python publisher, Rust subscriber, or vice versa) - You want the fastest possible latency without a custom build step - You are building control loops, sensor pipelines, or motor drivers ### Complete example Create `typed_demo.py`: ```python import horus # --- Node tick functions --- def imu_sensor_tick(node): """Simulate an IMU sensor publishing orientation and acceleration.""" imu = horus.Imu() # Imu fields: orientation[4], angular_velocity[3], linear_acceleration[3] # All fields default to 0.0 — set what you need node.send("imu", imu) def motor_driver_tick(node): """Receive velocity commands and 'drive' motors.""" cmd = node.recv("cmd_vel") if cmd is not None: print(f"[Motor] linear={cmd.linear:.2f} angular={cmd.angular:.2f}") def controller_tick(node): """Read IMU data and publish velocity commands.""" imu = node.recv("imu") if imu is not None: # Simple proportional controller: slow down if tilting tilt = abs(imu.linear_acceleration[0]) speed = max(0.1, 1.0 - tilt * 0.5) cmd = horus.CmdVel(linear=speed, angular=0.0) node.send("cmd_vel", cmd) print(f"[Ctrl] tilt={tilt:.2f} -> speed={speed:.2f}") # --- Node definitions --- sensor = horus.Node( name="ImuSensor", tick=imu_sensor_tick, rate=100, order=0, pubs=[horus.Imu], # typed topic — HORUS knows the type at registration ) controller = horus.Node( name="Controller", tick=controller_tick, rate=100, order=1, subs=[horus.Imu], pubs=[horus.CmdVel], ) motor = horus.Node( name="MotorDriver", tick=motor_driver_tick, rate=100, order=2, subs=[horus.CmdVel], ) # --- Run --- horus.run(sensor, controller, motor, duration=5.0) ``` Create `horus.toml` in the same directory: ```toml [project] name = "typed-demo" version = "0.1.0" language = "python" entry = "typed_demo.py" ``` Run it: ```bash horus run ``` **Expected output:** ```text [Ctrl] tilt=0.00 -> speed=1.00 [Motor] linear=1.00 angular=0.00 [Ctrl] tilt=0.00 -> speed=1.00 [Motor] linear=1.00 angular=0.00 ... 
``` ### Key details - `pubs=[horus.Imu]` registers a typed topic. HORUS uses the type's built-in topic name (e.g., `horus.Imu` maps to `"imu"`). - `node.send("imu", imu)` sends to the topic by name. The name must match what subscribers use. - `node.recv("imu")` returns `None` if no message has arrived since the last read. - All 70+ built-in types are zero-copy across the Rust/Python boundary: the Python object wraps the same shared memory that Rust nodes read. **Available types:** `CmdVel`, `Imu`, `Odometry`, `LaserScan`, `Pose2D`, `Pose3D`, `Twist`, `BatteryState`, `JointState`, `MotorCommand`, `ServoCommand`, `EmergencyStop`, and 60+ more. See [Message Types](/concepts/message-types) for the full list. --- ## Approach 2: Python Dicts (Most Flexible) Send any Python dict as a message. HORUS serializes it with MessagePack under the hood (~6-50us latency depending on dict size). No type registration, no build step, no schema definition. ### When to use dicts - You are prototyping and the message schema is still evolving - Your data does not fit any built-in type (configuration blobs, experiment metadata, etc.) 
- You only need Python-to-Python communication - The topic runs at low frequency (1-10Hz) where microsecond overhead does not matter ### Complete example Create `dict_demo.py`: ```python import horus import time # --- Node tick functions --- def environment_sensor_tick(node): """Simulate an environment sensor with multiple readings.""" reading = { "temperature": 23.5 + (horus.rng_float() * 2.0 - 1.0), "humidity": 0.65 + (horus.rng_float() * 0.1 - 0.05), "pressure": 1013.25, "location": "lab_room_3", "tags": ["indoor", "calibrated"], "nested": { "sensor_id": 42, "firmware": "v2.1.0", }, } node.send("environment", reading) def logger_tick(node): """Log environment data to the console.""" data = node.recv("environment") if data is not None: temp = data["temperature"] humidity = data["humidity"] * 100 sensor_id = data["nested"]["sensor_id"] print(f"[Logger] sensor={sensor_id} temp={temp:.1f}C humidity={humidity:.0f}%") def alert_tick(node): """Check for out-of-range readings and publish alerts.""" data = node.recv("environment") if data is not None: if data["temperature"] > 24.0: alert = { "type": "temperature_high", "value": data["temperature"], "threshold": 24.0, "location": data["location"], } node.send("alerts", alert) print(f"[Alert] Temperature {data['temperature']:.1f}C exceeds 24.0C") # --- Node definitions --- sensor = horus.Node( name="EnvSensor", tick=environment_sensor_tick, rate=1, # 1 Hz — environment data changes slowly order=0, pubs=["environment"], # string topic name = dict-based message ) logger = horus.Node( name="Logger", tick=logger_tick, rate=1, order=1, subs=["environment"], ) alerter = horus.Node( name="Alerter", tick=alert_tick, rate=1, order=2, subs=["environment"], pubs=["alerts"], ) # --- Run --- horus.run(sensor, logger, alerter, duration=10.0) ``` Create `horus.toml`: ```toml [project] name = "dict-demo" version = "0.1.0" language = "python" entry = "dict_demo.py" ``` Run it: ```bash horus run ``` **Expected output:** ```text [Logger] 
sensor=42 temp=23.8C humidity=63% [Logger] sensor=42 temp=22.9C humidity=67% [Logger] sensor=42 temp=24.3C humidity=62% [Alert] Temperature 24.3C exceeds 24.0C ... ``` ### What dicts support Dicts can contain any MessagePack-compatible value: | Python type | Supported | Notes | |-------------|-----------|-------| | `str` | Yes | UTF-8 strings | | `int` | Yes | Arbitrary precision | | `float` | Yes | 64-bit float | | `bool` | Yes | `True` / `False` | | `list` | Yes | Heterogeneous allowed | | `dict` | Yes | Nested dicts work | | `None` | Yes | Serialized as nil | | `bytes` | Yes | Raw binary data | | Custom objects | No | Use `asdict()` or `__dict__` first | ### Limitations - **No cross-language support.** Rust nodes cannot subscribe to dict-based topics (they do not know the schema). If you need Rust interop, use typed messages (Approach 1) or compiled messages (Approach 4). - **No type checking.** A typo in a key name (`data["temperture"]`) is a runtime `KeyError`, not a compile-time error. - **Size overhead.** MessagePack includes key names in every message. A dict with long key names is larger than an equivalent struct. --- ## Approach 3: Dataclasses (Structured Python) For structured Python-only data, define `@dataclass` classes and serialize them to dicts with `asdict()`. This gives you IDE autocompletion, type hints, and constructor validation while still using the dict transport under the hood.
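The round trip through `asdict()` can be tried without HORUS at all, since it only involves the standard library. In this minimal sketch `Reading` and `Status` are hypothetical message classes, and the plain `payload` dict stands in for whatever arrives on the subscriber side.

```python
from dataclasses import dataclass, asdict


@dataclass
class Reading:
    voltage: float
    charge_percent: float


@dataclass
class Status:
    battery: Reading
    uptime_secs: float = 0.0


# Publisher side: convert the dataclass (and anything nested) to plain dicts.
status = Status(battery=Reading(voltage=12.1, charge_percent=80.0), uptime_secs=3.0)
payload = asdict(status)
print(payload)

# Subscriber side: nested values arrive as dicts, so rebuild them explicitly.
restored = Status(
    battery=Reading(**payload["battery"]),
    uptime_secs=payload["uptime_secs"],
)
print(restored == status)
```

Passing `**payload` straight into `Status(...)` would also construct an object, but its `battery` attribute would be a plain dict without the `Reading` methods, which is why the nested field is rebuilt by hand.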
### When to use dataclasses - You want Python type checking and IDE support - Multiple team members work on the same message schema - You need default values, validation, or computed fields - The data stays within Python (no Rust subscribers) ### Complete example Create `dataclass_demo.py`: ```python import horus from dataclasses import dataclass, asdict, field from typing import List # --- Message definitions --- @dataclass class BatteryReading: voltage: float current: float temperature: float charge_percent: float cell_count: int = 3 def is_critical(self) -> bool: """Business logic directly on the message.""" return self.voltage < 10.5 or self.temperature > 60.0 @dataclass class BatteryAlert: level: str # "info", "warning", "critical" message: str voltage: float charge_percent: float @dataclass class SystemStatus: battery: BatteryReading alerts: List[str] = field(default_factory=list) uptime_secs: float = 0.0 # --- Node tick functions --- tick_counter = {"value": 0} def battery_sensor_tick(node): """Simulate a draining battery.""" tick_counter["value"] += 1 t = tick_counter["value"] reading = BatteryReading( voltage=12.6 - (t * 0.15), current=2.5, temperature=35.0 + (t * 0.3), charge_percent=max(0.0, 100.0 - t * 5.0), ) # asdict() converts the dataclass to a dict for transport node.send("battery.raw", asdict(reading)) print(f"[Battery] {reading.charge_percent:.0f}% ({reading.voltage:.1f}V)") def monitor_tick(node): """Watch battery readings and publish alerts.""" data = node.recv("battery.raw") if data is not None: # Reconstruct the dataclass from the dict reading = BatteryReading(**data) if reading.is_critical(): alert = BatteryAlert( level="critical", message=f"Battery critical: {reading.voltage:.1f}V", voltage=reading.voltage, charge_percent=reading.charge_percent, ) node.send("battery.alert", asdict(alert)) print(f"[Monitor] CRITICAL: {alert.message}") elif reading.charge_percent < 30.0: alert = BatteryAlert( level="warning", message=f"Battery low: 
{reading.charge_percent:.0f}%", voltage=reading.voltage, charge_percent=reading.charge_percent, ) node.send("battery.alert", asdict(alert)) print(f"[Monitor] WARNING: {alert.message}") def dashboard_tick(node): """Aggregate battery data and alerts into a system status.""" battery_data = node.recv("battery.raw") alert_data = node.recv("battery.alert") if battery_data is not None: battery = BatteryReading(**battery_data) alerts = [] if alert_data is not None: alerts.append(alert_data["message"]) status = SystemStatus( battery=battery, alerts=alerts, uptime_secs=tick_counter["value"], ) # Nested dataclasses serialize correctly with asdict() node.send("system.status", asdict(status)) # --- Node definitions --- battery = horus.Node( name="Battery", tick=battery_sensor_tick, rate=1, order=0, pubs=["battery.raw"], ) monitor = horus.Node( name="Monitor", tick=monitor_tick, rate=1, order=1, subs=["battery.raw"], pubs=["battery.alert"], ) dashboard = horus.Node( name="Dashboard", tick=dashboard_tick, rate=1, order=2, subs=["battery.raw", "battery.alert"], pubs=["system.status"], ) # --- Run --- horus.run(battery, monitor, dashboard, duration=20.0) ``` Create `horus.toml`: ```toml [project] name = "dataclass-demo" version = "0.1.0" language = "python" entry = "dataclass_demo.py" ``` Run it: ```bash horus run ``` **Expected output:** ```text [Battery] 95% (12.4V) [Battery] 90% (12.3V) [Battery] 85% (12.2V) ... [Battery] 25% (10.3V) [Monitor] CRITICAL: Battery critical: 10.3V [Battery] 20% (10.2V) [Monitor] CRITICAL: Battery critical: 10.2V ... ``` With this drain rate the voltage falls below the 10.5 V threshold in `is_critical()` on the same tick that the charge first drops under 30%, so the `warning` branch never fires in this run. Lower the voltage threshold if you want to see warnings before the critical alert. ### Dataclass tips - **`asdict()` handles nesting.** If `SystemStatus` contains a `BatteryReading`, `asdict()` recursively converts both. - **Reconstruction with `**data`.** `BatteryReading(**data)` works because `asdict()` preserves field names. Nested dataclasses come back as plain dicts though, so you need to reconstruct them manually: `reading = BatteryReading(**data["battery"])`.
- **Validation in `__post_init__`.** Add a `__post_init__` method to validate on construction: ```python @dataclass class BatteryReading: voltage: float current: float temperature: float charge_percent: float def __post_init__(self): if not 0.0 <= self.charge_percent <= 100.0: raise ValueError(f"charge_percent must be 0-100, got {self.charge_percent}") ``` - **Methods survive transport.** After `BatteryReading(**data)`, you can call `reading.is_critical()` -- the methods are on the class, not on the dict. --- ## Approach 4: Compiled Messages with maturin (Production) For Python-only custom messages at high frequency, use `horus.msggen` to define a message schema, then compile it with `maturin` for near-native performance (~3-5us). This approach requires a build step but gives you a typed, binary-serialized message without writing any Rust yourself. ### When to use compiled messages - You have a custom message type that does not match any built-in type - The topic runs at high frequency (50Hz+) and dict overhead is too high - You want binary serialization without writing Rust - You are deploying to production and want predictable performance ### Complete example Create `compiled_demo.py`: ```python from horus.msggen import define_message import horus # --- Define a custom message type --- # This creates a Python class backed by fixed-layout binary serialization. # At runtime (no maturin), latency is ~20-40us. # After `maturin develop`, latency drops to ~3-5us. 
RobotStatus = define_message("RobotStatus", "robot.status", [ ("battery_level", "f32"), ("motor_rpm", "f32"), ("error_code", "i32"), ("is_active", "bool"), ("tick_count", "u64"), ]) WheelSpeed = define_message("WheelSpeed", "wheel.speed", [ ("left_rpm", "f32"), ("right_rpm", "f32"), ("timestamp_ns", "u64"), ]) # --- Node tick functions --- counter = {"value": 0} def status_publisher_tick(node): """Publish robot status at 50 Hz.""" counter["value"] += 1 status = RobotStatus( battery_level=85.0, motor_rpm=3200.0 + (counter["value"] % 100), error_code=0, is_active=True, tick_count=counter["value"], ) node.send("robot.status", status) def wheel_publisher_tick(node): """Publish wheel speeds at 50 Hz.""" speed = WheelSpeed( left_rpm=150.0, right_rpm=148.0, timestamp_ns=counter["value"] * 20_000_000, # 20ms per tick ) node.send("wheel.speed", speed) def dashboard_tick(node): """Read both topics and display.""" status = node.recv("robot.status") wheels = node.recv("wheel.speed") if status is not None and wheels is not None: print( f"[Dash] battery={status.battery_level:.0f}% " f"rpm={status.motor_rpm:.0f} " f"wheels=({wheels.left_rpm:.0f}, {wheels.right_rpm:.0f})" ) # --- Node definitions --- status_pub = horus.Node( name="StatusPub", tick=status_publisher_tick, rate=50, order=0, pubs=["robot.status"], ) wheel_pub = horus.Node( name="WheelPub", tick=wheel_publisher_tick, rate=50, order=1, pubs=["wheel.speed"], ) dash = horus.Node( name="Dashboard", tick=dashboard_tick, rate=50, order=2, subs=["robot.status", "wheel.speed"], ) # --- Run --- horus.run(status_pub, wheel_pub, dash, duration=5.0) ``` Create `horus.toml`: ```toml [project] name = "compiled-demo" version = "0.1.0" language = "python" entry = "compiled_demo.py" ``` This example runs without any build step (runtime mode, ~20-40us). 
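For intuition about what "fixed-layout binary serialization" means in runtime mode, the sketch below packs the five `RobotStatus` fields with Python's `struct` module. It is illustrative only: the format string `<ffi?Q` (little-endian, no padding) and the helper names `pack_status` and `unpack_status` are assumptions made for this example, not the layout or API that `horus.msggen` actually uses.

```python
import struct

# Assumed wire layout for RobotStatus, mirroring the schema above:
# f32 battery_level, f32 motor_rpm, i32 error_code, bool is_active, u64 tick_count.
# "<" selects little-endian with no padding; the real HORUS layout may differ.
ROBOT_STATUS = struct.Struct("<ffi?Q")


def pack_status(battery_level, motor_rpm, error_code, is_active, tick_count):
    """Serialize the five fields into a fixed-size byte string (hypothetical helper)."""
    return ROBOT_STATUS.pack(battery_level, motor_rpm, error_code, is_active, tick_count)


def unpack_status(payload):
    """Inverse of pack_status: return the fields as a tuple."""
    return ROBOT_STATUS.unpack(payload)


payload = pack_status(85.0, 3201.0, 0, True, 1)
fields = unpack_status(payload)
```

Under this assumed layout every message occupies the same 21 bytes whatever the field values, and no keys or type tags travel with the payload. That is the main difference from sending a dict.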
To get compiled performance (~3-5us), run:

```bash
# One-time setup: install maturin
pip install maturin

# Compile the message definitions
maturin develop
```

After compilation, the same code runs with binary serialization instead of Python's `struct` module.

**Expected output (both runtime and compiled):**

```text
[Dash] battery=85% rpm=3201 wheels=(150, 148)
[Dash] battery=85% rpm=3202 wheels=(150, 148)
[Dash] battery=85% rpm=3203 wheels=(150, 148)
...
```

---

## Performance Comparison

| Approach | Latency | Cross-language? | Build step? | Best for |
|----------|---------|-----------------|-------------|----------|
| **Typed** (`horus.CmdVel`) | ~1.7us | Yes (Rust + Python) | No | Control loops, typed sensor data |
| **Dict** (GenericMessage) | ~6-50us | No (Python only) | No | Prototyping, flexible schemas |
| **Dataclass** (runtime) | ~20-40us | No (Python only) | No | Structured Python-only data |
| **Compiled** (maturin) | ~3-5us | No (Python only) | Yes (`maturin develop`) | Production, high-frequency custom types |

Latency is measured end-to-end: serialize on the publisher, write to shared memory, read on the subscriber, deserialize. Dict latency varies with payload size (6us for a small dict, 50us for a dict with large nested structures).

### Choosing the right approach

Follow this decision process:

**Does your data match a built-in type?** Use typed messages (Approach 1). They are the fastest option with no build step, and they work across Rust and Python. If you are sending `CmdVel`, `Imu`, `Odometry`, or any of the 70+ standard types, there is no reason to use anything else.

**Are you prototyping?** Use dicts (Approach 2). You can change the schema by editing a Python dict literal. No class definitions, no registration, no build step. When the schema stabilizes, migrate to typed or compiled messages.

**Do you need structure but stay Python-only?** Use dataclasses (Approach 3). You get IDE autocompletion, type hints, and constructor validation.
The transport cost is the same as dicts (they serialize as dicts), but your code is cleaner and more maintainable.

**Do you need a custom type at high frequency?** Use compiled messages (Approach 4). Define the schema once with `define_message()`, compile with `maturin`, and get ~3-5us latency with binary serialization. This is the right choice for production Python nodes running at 50Hz or above with custom data.

**Do you need cross-language custom types?** Define the message in Rust with `message!` and expose it to Python via PyO3. See [Tutorial 4 (Rust)](/tutorials/04-custom-messages) for the Rust side and [Python API: Custom Messages](/python/api/custom-messages) for binding patterns.

---

## Common Mistakes

**Mixing typed and string topic names.** If you register `pubs=[horus.Imu]` but send with `node.send("imu", some_dict)`, the subscriber gets a dict, not an `Imu` object. Always match the registration type with the send type.

**Forgetting `asdict()`.** Sending a raw dataclass instance without `asdict()` will fail at serialization. Always wrap: `node.send("topic", asdict(my_dataclass))`.

**Nested dataclass reconstruction.** `BatteryReading(**data)` works for flat dataclasses, but if your dataclass contains another dataclass, the nested field comes back as a plain dict. Reconstruct it manually:

```python
@dataclass
class Outer:
    inner: Inner
    value: float

# After receiving:
data = node.recv("topic")
if data is not None:
    inner = Inner(**data["inner"])
    outer = Outer(inner=inner, value=data["value"])
```

**Dict key typos.** `data["temperture"]` (typo) raises `KeyError` at runtime. Dataclasses catch this at construction time: `BatteryReading(temperture=23.5)` raises `TypeError`. This is why Approach 3 is better for team projects.
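The nested-reconstruction and typo behaviours described in this section can be checked without HORUS installed. The sketch below uses only the standard library: a plain dict produced by `asdict()` stands in for what `node.recv()` would return, the `Inner` fields `x` and `y` are invented for the example, and `outer_from_dict` is a hypothetical helper name.

```python
from dataclasses import dataclass, asdict


@dataclass
class Inner:
    x: float
    y: float


@dataclass
class Outer:
    inner: Inner
    value: float


def outer_from_dict(data: dict) -> Outer:
    """Rebuild Outer, including its nested Inner, from an asdict()-style dict."""
    return Outer(inner=Inner(**data["inner"]), value=data["value"])


original = Outer(inner=Inner(x=1.0, y=2.0), value=3.5)

# asdict() recurses, so the nested dataclass becomes a nested dict.
payload = asdict(original)

# Naive reconstruction succeeds but leaves `inner` as a plain dict.
naive = Outer(**payload)
naive_inner_is_dict = isinstance(naive.inner, dict)

# Explicit reconstruction restores the nested type.
restored = outer_from_dict(payload)

# A misspelled field name fails as soon as the object is constructed.
try:
    Inner(x=1.0, z=2.0)
    typo_error = None
except TypeError as exc:
    typo_error = exc
```

Keeping one `*_from_dict` helper per nested message type puts the reconstruction logic in a single place, so subscriber tick functions do not each repeat it.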
---

## Next Steps

- [Tutorial 5: Hardware Drivers (Python)](/tutorials/05-hardware-and-rt-python) -- connect to real hardware
- [Tutorial 4 (Rust)](/tutorials/04-custom-messages) -- same tutorial in Rust with `message!` macro
- [Message Types](/concepts/message-types) -- full list of 70+ built-in types
- [Python API: Custom Messages](/python/api/custom-messages) -- `define_message()`, `define_numpy_message()`, and compiled workflows

---

## See Also

- [Custom Messages (Rust)](/tutorials/04-custom-messages) -- Rust version
- [Python Custom Messages](/python/api/custom-messages) -- Python API
- [Performance Optimization](/performance/performance) -- benchmarks and tuning

---

## Migrating from ROS2 C++

Path: /tutorials/migrating-from-ros2-cpp
Description: Side-by-side comparison of ROS2 rclcpp and HORUS C++ — move faster with less code

# Migrating from ROS2 C++

This guide shows ROS2 rclcpp patterns and their HORUS equivalents side by side.

## Node Definition

### ROS2 (35+ lines)

```cpp
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/laser_scan.hpp"
#include "geometry_msgs/msg/twist.hpp"

class Controller : public rclcpp::Node {
public:
    Controller() : Node("controller") {
        sub_ = create_subscription<sensor_msgs::msg::LaserScan>(
            "scan", 10,
            std::bind(&Controller::scan_cb, this, std::placeholders::_1));
        pub_ = create_publisher<geometry_msgs::msg::Twist>("cmd_vel", 10);
    }

private:
    void scan_cb(const sensor_msgs::msg::LaserScan::SharedPtr msg) {
        auto cmd = geometry_msgs::msg::Twist();
        cmd.linear.x = msg->ranges[0] > 0.5 ? 0.3 : 0.0;
        pub_->publish(cmd);
    }

    rclcpp::Subscription<sensor_msgs::msg::LaserScan>::SharedPtr sub_;
    rclcpp::Publisher<geometry_msgs::msg::Twist>::SharedPtr pub_;
};

int main(int argc, char** argv) {
    rclcpp::init(argc, argv);
    rclcpp::spin(std::make_shared<Controller>());
    rclcpp::shutdown();
}
```

### HORUS (15 lines)

```cpp
#include
using namespace horus::literals;

int main() {
    horus::Scheduler sched;
    sched.tick_rate(100_hz);

    auto scan_sub = sched.subscribe("scan");
    auto cmd_pub = sched.advertise("cmd_vel");

    sched.add("controller")
        .rate(50_hz)
        .tick([&] {
            auto scan = scan_sub.recv();
            if (!scan) return;
            auto cmd = cmd_pub.loan();
            cmd->linear = scan->ranges[0] > 0.5f ? 0.3f : 0.0f;
            cmd_pub.publish(std::move(cmd));
        })
        .build();

    sched.spin();
}
```

## Pattern Comparison

| Concept | ROS2 rclcpp | HORUS C++ |
|---------|-------------|-----------|
| Node | `class : public rclcpp::Node` | Lambda in `sched.add().tick([&]{})` |
| Publisher | `create_publisher(topic, qos)` | `sched.advertise(topic)` |
| Subscriber | `create_subscription(topic, qos, cb)` | `sched.subscribe(topic)` |
| Callback | `std::bind(&Class::method, this, _1)` | Captured lambda `[&]{ sub.recv(); }` |
| Message | `geometry_msgs::msg::Twist` | `horus::msg::CmdVel` |
| Pointer | `SharedPtr` everywhere | Value types + move semantics |
| Publish | `pub->publish(msg)` (copy) | `pub.publish(std::move(sample))` (zero-copy) |
| Receive | Callback-driven | Poll: `sub.recv()` → `std::optional` |
| Init | `rclcpp::init(argc, argv)` | Nothing needed |
| Run | `rclcpp::spin(node)` | `sched.spin()` |
| Rate | `rclcpp::Rate(100)` | `.rate(100_hz)` |
| Timer | `create_wall_timer(100ms, cb)` | `.rate(10_hz)` on node |
| QoS | `rclcpp::QoS(10).reliable()` | `.budget(5_ms).on_miss(Skip)` |

## Key Differences

### No Inheritance

Idiomatic ROS2 code subclasses `rclcpp::Node`. A HORUS node can be a plain lambda, so no class hierarchy is needed; subclassing `horus::Node` remains available for stateful nodes with lifecycle hooks.

### No SharedPtr

ROS2 uses `SharedPtr` for everything (publishers, subscribers, messages).
HORUS uses move semantics and RAII: `std::unique_ptr` for owned resources, `std::optional` for nullable results.

### No IDL / .msg Files

ROS2 requires `.msg` files + `rosidl` codegen. HORUS uses plain C++ structs whose memory layout matches the Rust `#[repr(C)]` definitions, so the same struct is shared between Rust and C++ with no codegen step.

### Zero-Copy IPC

ROS2 copies data through the DDS middleware (even with "zero-copy" DDS, there's middleware overhead). HORUS writes directly to shared memory via the loan pattern: the `operator->` call returns a raw pointer to SHM.

### Deterministic Scheduling

ROS2 uses callback queues with non-deterministic ordering. HORUS provides explicit `order()` and optional `deterministic(true)` mode for bit-exact reproducibility.

## Migration Checklist

1. Replace `rclcpp::Node` subclass with `sched.add(name).tick(lambda)`
2. Replace `create_publisher` with `sched.advertise`
3. Replace `create_subscription` with `sched.subscribe` (capture in lambda)
4. Replace `std::bind` callbacks with captured lambdas
5. Replace `SharedPtr` with value types
6. Replace `.msg` files with `horus::msg::` types
7. Replace `rclcpp::init/spin/shutdown` with `sched.spin()`
8. Replace `package.xml` + `CMakeLists.txt` with `horus.toml`
9. Replace `ros2 launch` with `horus launch`
10. Replace `ros2 topic echo` with `horus topic echo`

## Performance

HORUS C++ FFI adds **15-17ns** per call (vs ~1-5us DDS serialization in ROS2). Scheduler tick: **250ns** (vs ~10-50us in rclcpp). Throughput: **2.84M ticks/sec**. See [Benchmarks: C++ Binding Performance](/docs/performance/benchmarks#c-binding-performance) for full results.
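To put these overhead figures in proportion, a back-of-the-envelope calculation helps (Python is used here only for the arithmetic). It assumes, as a simplification, that per-tick framework overhead is the quoted scheduler tick cost plus a fixed cost per FFI call. The count of four FFI calls per tick is a made-up example rather than a measured figure, and `overhead_fraction` is a hypothetical helper.

```python
SCHEDULER_TICK_NS = 250.0  # scheduler tick cost quoted above
FFI_CALL_NS = 17.0         # upper end of the quoted 15-17 ns range


def overhead_fraction(rate_hz: float, ffi_calls_per_tick: int) -> float:
    """Fraction of one tick period spent on framework overhead (simplified model)."""
    period_ns = 1e9 / rate_hz
    overhead_ns = SCHEDULER_TICK_NS + ffi_calls_per_tick * FFI_CALL_NS
    return overhead_ns / period_ns


# Hypothetical 1 kHz control loop making four FFI calls per tick.
fraction = overhead_fraction(rate_hz=1000.0, ffi_calls_per_tick=4)
print(f"{fraction:.4%} of the 1 ms period")
```

Under these assumptions the overhead is 318 ns out of a 1 ms period, roughly 0.03%, so the time budget of such a loop is dominated by the node's own work, not by the framework.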
--- ## Tutorial 2: Motor Controller (C++) Path: /tutorials/02-motor-controller-cpp Description: Build a closed-loop motor controller with PID, safety limits, and runtime tuning # Tutorial 2: Motor Controller (C++) In this tutorial you'll build a motor controller that: - Reads velocity commands from a topic - Applies PID control to track the commanded velocity - Enforces torque limits and safety constraints - Publishes motor state for monitoring - Enters safe state (stops motors) on safety events ## What You'll Learn - Multi-topic Node with subscribers AND publishers - PID control loop with anti-windup - Runtime parameter tuning via `horus::Params` - Budget enforcement with `Miss::SafeMode` - `enter_safe_state()` for actuator safety ## Prerequisites - Completed [Tutorial 1: LiDAR Sensor Node](/docs/tutorials/01-sensor-node-cpp) - Understanding of PID control (see [PID Recipe](/docs/recipes/pid-controller-cpp)) ## Step 1: Create the Project ```bash horus new motor_ctrl --lang cpp cd motor_ctrl ``` ## Step 2: Write the Motor Controller Replace `src/main.cpp`: ```cpp #include #include #include #include using namespace horus::literals; // ── Motor Controller Node ─────────────────────────────────────────────── class MotorController : public horus::Node { public: MotorController() : Node("motor_controller") { // Subscribe to velocity commands from planner/teleop cmd_sub_ = subscribe("cmd_vel"); // Subscribe to encoder feedback enc_sub_ = subscribe("encoder.velocity"); // Publish motor commands (PWM duty cycle) motor_pub_ = advertise("motor.pwm"); // Publish motor state for monitoring state_pub_ = advertise("motor.state"); } void init() override { horus::log::info("motor", "Motor controller initialized"); horus::log::info("motor", "Waiting for encoder feedback..."); } void tick() override { // Read commanded velocity if (auto cmd = cmd_sub_->recv()) { target_linear_ = cmd->get()->linear; target_angular_ = cmd->get()->angular; } // Read actual velocity from encoders double 
actual_linear = 0, actual_angular = 0; if (auto enc = enc_sub_->recv()) { actual_linear = enc->get()->linear; actual_angular = enc->get()->angular; encoder_alive_ = true; } // Safety: if no encoder feedback for 50 ticks (500ms), stop if (!encoder_alive_) { ticks_without_encoder_++; if (ticks_without_encoder_ > 50) { if (!encoder_timeout_logged_) { horus::log::error("motor", "Encoder timeout — stopping motors"); horus::blackbox::record("motor", "Encoder timeout after 500ms"); encoder_timeout_logged_ = true; } send_zero(); return; } } else { ticks_without_encoder_ = 0; encoder_timeout_logged_ = false; } encoder_alive_ = false; // PID for linear velocity double lin_error = target_linear_ - actual_linear; lin_integral_ += lin_error * dt_; lin_integral_ = std::clamp(lin_integral_, -integral_max_, integral_max_); double lin_output = kp_ * lin_error + ki_ * lin_integral_ + kd_ * (lin_error - lin_prev_error_) / dt_; lin_prev_error_ = lin_error; // PID for angular velocity double ang_error = target_angular_ - actual_angular; ang_integral_ += ang_error * dt_; ang_integral_ = std::clamp(ang_integral_, -integral_max_, integral_max_); double ang_output = kp_ * ang_error + ki_ * ang_integral_ + kd_ * (ang_error - ang_prev_error_) / dt_; ang_prev_error_ = ang_error; // Clamp to actuator limits lin_output = std::clamp(lin_output, -max_pwm_, max_pwm_); ang_output = std::clamp(ang_output, -max_pwm_, max_pwm_); // Publish motor command horus::msg::CmdVel pwm{}; pwm.linear = static_cast(lin_output); pwm.angular = static_cast(ang_output); motor_pub_->send(pwm); // Publish state for monitoring (every 10th tick = 10 Hz) if (++tick_count_ % 10 == 0) { horus::msg::CmdVel state{}; state.linear = static_cast(actual_linear); state.angular = static_cast(lin_error); state_pub_->send(state); } } void enter_safe_state() override { send_zero(); lin_integral_ = 0; ang_integral_ = 0; horus::blackbox::record("motor", "Safe state — motors zeroed"); } private: void send_zero() { horus::msg::CmdVel 
stop{}; motor_pub_->send(stop); } horus::Subscriber* cmd_sub_; horus::Subscriber* enc_sub_; horus::Publisher* motor_pub_; horus::Publisher* state_pub_; double target_linear_ = 0, target_angular_ = 0; double lin_integral_ = 0, lin_prev_error_ = 0; double ang_integral_ = 0, ang_prev_error_ = 0; double kp_ = 2.0, ki_ = 0.5, kd_ = 0.1; double dt_ = 0.01; // 100 Hz double integral_max_ = 1.0; double max_pwm_ = 1.0; int tick_count_ = 0; int ticks_without_encoder_ = 0; bool encoder_alive_ = false; bool encoder_timeout_logged_ = false; }; // ── Simulated Encoder (for testing without hardware) ──────────────────── class SimEncoder : public horus::Node { public: SimEncoder() : Node("sim_encoder") { cmd_sub_ = subscribe("motor.pwm"); enc_pub_ = advertise("encoder.velocity"); } void tick() override { if (auto cmd = cmd_sub_->recv()) { // Simulate motor dynamics: velocity tracks PWM with lag velocity_ = 0.9 * velocity_ + 0.1 * cmd->get()->linear; } horus::msg::CmdVel enc{}; enc.linear = static_cast(velocity_); enc_pub_->send(enc); } private: horus::Subscriber* cmd_sub_; horus::Publisher* enc_pub_; double velocity_ = 0; }; // ── Main ──────────────────────────────────────────────────────────────── int main() { horus::Scheduler sched; sched.tick_rate(100_hz).name("motor_ctrl"); // Simulated encoder (order 0 — runs first, provides feedback) SimEncoder encoder; sched.add(encoder).order(0).build(); // Motor controller (order 10 — runs after encoder) MotorController motor; sched.add(motor) .order(10) .budget(5_ms) .on_miss(horus::Miss::SafeMode) // stop motors if tick overruns .build(); // Simulated command source auto cmd_pub = sched.advertise("cmd_vel"); sched.add("commander") .order(5) .tick([&] { static int t = 0; horus::msg::CmdVel cmd{}; cmd.linear = (t++ < 300) ? 
0.5f : 0.0f; // drive for 3s then stop cmd_pub.send(cmd); }) .build(); std::printf("Motor controller running at 100 Hz (Ctrl+C to stop)\n"); sched.spin(); } ``` ## Step 3: Build and Run ```bash horus build && horus run ``` ## Step 4: Monitor In another terminal: ```bash horus topic echo motor.state # watch motor state horus topic echo motor.pwm # watch PWM output horus node list # see all 3 nodes running horus log # see init/error messages ``` ## Key Concepts ### Execution Order Matters ``` order 0: SimEncoder → reads PWM, publishes encoder velocity order 5: Commander → publishes velocity command order 10: MotorController → reads command + encoder, publishes PWM ``` The encoder runs BEFORE the controller so the controller always has fresh feedback. This is deterministic — guaranteed every tick. ### Budget + SafeMode ```cpp sched.add(motor).budget(5_ms).on_miss(horus::Miss::SafeMode).build(); ``` If the motor controller takes longer than 5ms, the scheduler calls `enter_safe_state()` — motors stop immediately. This prevents a slow computation from leaving motors running with stale commands. ### Encoder Timeout The controller tracks `ticks_without_encoder_`. If no encoder data arrives for 50 ticks (500ms at 100Hz), it assumes the encoder is dead and stops motors. 
This catches: - Disconnected encoder cable - Crashed encoder driver process - SHM corruption ## Next Steps - [Tutorial 3: Full Robot System](/docs/tutorials/03-full-robot-cpp) — combine sensor, controller, and actuator - [Recipe: PID Controller](/docs/recipes/pid-controller-cpp) — deeper PID tuning guide - [Guide: Real-Time C++](/docs/cpp/realtime) — SCHED_FIFO, CPU pinning, watchdog --- ## Tutorial 2: Build a Motor Controller Path: /tutorials/02-motor-controller Description: Create a motor controller node that subscribes to velocity commands and publishes joint position # Tutorial 2: Build a Motor Controller In this tutorial, you'll build a motor controller — a node that receives velocity commands and tracks joint position. This is the actuator side of a robot, complementing the sensor from Tutorial 1. **Prerequisites:** [Tutorial 1: IMU Sensor Node](/tutorials/01-sensor-node) completed. **What you'll learn:** - Subscribing to command topics - Publishing state feedback - Managing state between ticks (integration) - Multiple topics per node **Time:** 15 minutes --- ## What We're Building A motor controller node that: 1. Subscribes to velocity commands on `"motor.command"` 2. Integrates velocity into position (simple physics) 3. Publishes current position on `"motor.state"` 4. A commander node that sends test commands ## Step 1: Create the Project ```bash horus new motor-demo -r cd motor-demo ``` ## Step 2: Define the Data Types ```rust use horus::prelude::*; /// Velocity command sent to the motor. #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct MotorCommand { velocity: f32, // Desired velocity (rad/s) max_torque: f32, // Torque limit (N*m) } /// Motor state feedback. 
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct MotorState {
    position: f32,   // Current position (radians)
    velocity: f32,   // Current velocity (rad/s)
    torque: f32,     // Applied torque (N*m)
    timestamp: f64,
}
```

## Step 3: Build the Motor Controller

This is the core of the tutorial: a node that subscribes AND publishes:

```rust
// simplified
struct MotorController {
    commands: Topic<MotorCommand>,  // Subscribe to commands
    state_pub: Topic<MotorState>,   // Publish state
    position: f32,                  // Accumulated position
    velocity: f32,                  // Current velocity
    dt: f32,                        // Time step (set from rate)
}

impl MotorController {
    fn new(rate_hz: f32) -> Result<Self> {
        Ok(Self {
            commands: Topic::new("motor.command")?,
            state_pub: Topic::new("motor.state")?,
            position: 0.0,
            velocity: 0.0,
            dt: 1.0 / rate_hz,
        })
    }
}

impl Node for MotorController {
    fn name(&self) -> &str { "MotorController" }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to consume the latest command.
        // Even if no command arrives, the motor continues integrating position.
        if let Some(cmd) = self.commands.recv() {
            // Simple velocity tracking with torque limit
            let error = cmd.velocity - self.velocity;
            let torque = error.clamp(-cmd.max_torque, cmd.max_torque);
            self.velocity += torque * self.dt;
        }

        // Integrate velocity → position
        self.position += self.velocity * self.dt;

        // Publish current state
        self.state_pub.send(MotorState {
            position: self.position,
            velocity: self.velocity,
            torque: 0.0,
            timestamp: 0.0, // Simplified for tutorial
        });
    }

    // SAFETY: shutdown() is CRITICAL for actuator nodes. Without it, the motor
    // could continue running at its last commanded velocity after the scheduler stops.
    // Always zero velocity and publish the stopped state before returning.
    fn shutdown(&mut self) -> Result<()> {
        // CRITICAL: zero velocity before publishing; order matters.
        self.velocity = 0.0;
        self.state_pub.send(MotorState {
            position: self.position,
            velocity: 0.0,
            torque: 0.0,
            timestamp: 0.0,
        });
        eprintln!("Motor stopped safely at position {:.2} rad", self.position);
        Ok(())
    }
}
```

Key points:

- **Two topics**: one for receiving commands, one for publishing state
- **State between ticks**: `position` and `velocity` persist across `tick()` calls
- **Simple physics**: velocity integrates into position (`position += velocity * dt`)
- **Safe shutdown**: motor velocity set to zero in `shutdown()`

## Step 4: Build the Commander

A test node that sends velocity commands:

```rust
// simplified
struct Commander {
    publisher: Topic<MotorCommand>,
    tick_count: u64,
}

impl Commander {
    fn new() -> Result<Self> {
        Ok(Self {
            publisher: Topic::new("motor.command")?,
            tick_count: 0,
        })
    }
}

impl Node for Commander {
    fn name(&self) -> &str { "Commander" }

    fn tick(&mut self) {
        let t = self.tick_count as f64 * 0.01; // At 100Hz

        // Send a sine wave velocity command
        let velocity = 1.0 * (t * 0.5).sin() as f32;
        self.publisher.send(MotorCommand {
            velocity,
            max_torque: 10.0,
        });
        self.tick_count += 1;
    }

    // SAFETY: shutdown() sends a zero-velocity command so the motor stops
    // even if the motor controller's own shutdown() hasn't run yet.
    fn shutdown(&mut self) -> Result<()> {
        self.publisher.send(MotorCommand { velocity: 0.0, max_torque: 0.0 });
        eprintln!("Commander shutting down, sent zero-velocity command.");
        Ok(())
    }
}
```

## Step 5: Add a State Display

```rust
// simplified
struct StateDisplay {
    subscriber: Topic<MotorState>,
    sample_count: u64,
}

impl StateDisplay {
    fn new() -> Result<Self> {
        Ok(Self {
            subscriber: Topic::new("motor.state")?,
            sample_count: 0,
        })
    }
}

impl Node for StateDisplay {
    fn name(&self) -> &str { "StateDisplay" }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to drain the topic buffer.
if let Some(state) = self.subscriber.recv() { self.sample_count += 1; if self.sample_count % 50 == 0 { println!( "pos={:.2} rad vel={:.2} rad/s", state.position, state.velocity, ); } } } // SAFETY: shutdown() logs final display state for diagnostics. fn shutdown(&mut self) -> Result<()> { eprintln!("StateDisplay shutting down after {} samples", self.sample_count); Ok(()) } } ``` ## Step 6: Wire Everything Together ```rust fn main() -> Result<()> { eprintln!("Motor controller demo starting...\n"); let rate = 100.0; // Hz let mut scheduler = Scheduler::new(); // Execution order: Commander (0) → MotorController (1) → StateDisplay (2). // Commander must publish before motor reads; motor must publish before display reads. scheduler.add(Commander::new()?) .order(0) // NOTE: .rate(100.hz()) triggers auto-RT detection at finalize(). .rate(100.hz()) .build()?; scheduler.add(MotorController::new(rate)?) .order(1) // NOTE: .rate(100.hz()) triggers auto-RT detection at finalize(). .rate(100.hz()) .build()?; scheduler.add(StateDisplay::new()?) .order(2) .build()?; scheduler.run() } ``` Notice the data flow: **Commander (0) → MotorController (1) → StateDisplay (2)**. Order matters — the commander must send before the controller reads. ## Step 7: Run It ```bash horus run ``` Expected output: ``` Motor controller demo starting... 
pos=0.02 rad vel=0.05 rad/s pos=0.26 rad vel=0.44 rad/s pos=0.86 rad vel=0.76 rad/s pos=1.58 rad vel=0.84 rad/s ``` Press Ctrl+C to stop — you'll see the shutdown message: ``` Motor stopped safely at position 3.14 rad ``` ## What You Learned - **Subscribing** to commands with `commands.recv()` - **Publishing** state feedback with `state_pub.send()` - **State management** between ticks (velocity integration) - **Safe shutdown** — always stop motors in `shutdown()` - **Data flow ordering** — commander before controller before display ## Next: [Tutorial 3: Full Robot Integration](/tutorials/03-full-robot) In the next tutorial, we'll combine the IMU sensor and motor controller into a complete robot system with coordinate frame tracking and monitoring. --- ## See Also - [Motor Controller (Python)](/tutorials/02-motor-controller-python) — Python version - [Differential Drive Recipe](/recipes/differential-drive) — Production pattern - [CmdVel](/stdlib/messages/cmd-vel) — Velocity command type --- ## Tutorial 5: Hardware & Real-Time (Python) Path: /tutorials/05-hardware-and-rt-python Description: Load hardware drivers and configure real-time scheduling — Python edition # Tutorial 5: Hardware & Real-Time (Python) Learn to load hardware drivers from `horus.toml` and configure real-time scheduling with budgets, deadlines, and miss policies. 
## What You'll Learn - Loading hardware nodes with `horus.hardware.load()` - Accessing hardware configuration parameters - Setting budgets and deadlines for timing guarantees - Handling deadline misses with `on_miss` - Using `compute=True` for CPU-heavy nodes - Production scheduler configuration ## Part 1: Hardware Drivers ### Configure Drivers in horus.toml ```toml # horus.toml [package] name = "my-robot" version = "0.1.0" [drivers.arm] terra = "dynamixel" port = "/dev/ttyUSB0" baudrate = 1000000 servo_ids = [1, 2, 3] [drivers.lidar] terra = "rplidar" port = "/dev/ttyUSB1" scan_mode = "standard" [drivers.imu] terra = "i2c" bus = 1 address = 104 ``` ### Load and Read Driver Config ```python import horus # Load hardware from horus.toml [hardware] section entries = horus.hardware.load() for name, obj in entries: if isinstance(obj, horus.NodeParams): port = obj.get_or("port", "/dev/ttyUSB0") print(f"{name}: port={port}") ``` ### Using Config in Nodes ```python def arm_tick(node): cmd = node.recv("arm.command") if cmd is not None: # Use hardware connection to send commands node.send("arm.state", {"position": [0.0, 0.0, 0.0]}) arm_node = horus.Node(name="ArmController", tick=arm_tick, rate=100, order=0, subs=["arm.command"], pubs=["arm.state"]) horus.run(arm_node) ``` ### Handling Missing Hardware If `horus.toml` has no `[hardware]` section, `hardware.load()` raises an exception. Wrap it for graceful fallback: ```python try: entries = horus.hardware.load() except Exception as e: print(f"No hardware config: {e}") print("Running in simulation mode") entries = [] ``` This lets the same codebase run on both real hardware and in simulation without changes. ## Part 2: Real-Time Scheduling ### Budget and Deadline Set timing constraints to detect overruns: ```python import horus us = horus.us # 1e-6 (microseconds → seconds) ms = horus.ms # 1e-3 (milliseconds → seconds) # horus.us = 1e-6 (microseconds), horus.ms = 1e-3 (milliseconds). 
# These are plain float constants for readable duration math: # budget=300 * horus.us means 300 microseconds. # Motor controller: 1kHz with 800μs budget, 950μs deadline motor = horus.Node( name="MotorCtrl", tick=motor_tick, rate=1000, order=0, budget=800 * us, # must finish tick within 800μs deadline=950 * us, # hard deadline at 950μs on_miss="safe_mode", # enter safe state if deadline missed core=0, # pin to CPU core 0 # Pins this node's thread to CPU core 0. Requires CAP_SYS_NICE or root # on Linux. If the core doesn't exist or permissions are insufficient, # HORUS logs a warning and continues with default scheduling. ) # LiDAR driver: 100Hz, skip missed ticks lidar = horus.Node( name="LidarDriver", tick=lidar_tick, rate=100, order=1, on_miss="skip", # skip tick on deadline miss ) # Path planner: CPU-heavy, runs on thread pool planner = horus.Node( name="PathPlanner", tick=planner_tick, rate=10, order=5, compute=True, # runs on worker thread pool, not main loop on_miss="warn", # just log a warning ) # Production scheduler with watchdog and RT horus.run( motor, lidar, planner, tick_rate=1000, rt=True, # request SCHED_FIFO (graceful fallback) watchdog_ms=500, # detect frozen nodes blackbox_mb=64, # flight recorder for debugging ) ``` ### Deadline Miss Policies | Policy | Behavior | Use For | |--------|----------|---------| | `"warn"` | Log warning, continue normally | Non-critical nodes (logging, telemetry) | | `"skip"` | Skip this tick, resume next cycle | Sensor drivers that can miss a frame | | `"safe_mode"` | Call `enter_safe_state()` equivalent, continue ticking | Motor controllers, actuators | | `"stop"` | Stop the entire scheduler | Safety-critical nodes | ### Adaptive Quality with Budget Use `horus.budget_remaining()` to do more work when time permits: ```python def planner_tick(node): # Always compute basic path path = compute_basic_path() # Only optimize if budget allows if horus.budget_remaining() > 2 * ms: path = optimize_path(path) # Only smooth if 
still have time if horus.budget_remaining() > 1 * ms: path = smooth_path(path) node.send("path", path) ``` ### Complete Production Example ```python import horus us, ms = horus.us, horus.ms def safety_tick(node): """Safety monitor — runs first every tick.""" if node.has_msg("emergency_stop"): estop = node.recv("emergency_stop") if estop.get("engaged"): node.log_error("EMERGENCY STOP ENGAGED") node.request_stop() def motor_tick(node): cmd = node.recv("cmd_vel") if cmd is not None: # Apply velocity command to motor node.send("motor.state", {"velocity": cmd.get("linear", 0.0)}) def motor_shutdown(node): node.send("cmd_vel", {"linear": 0.0, "angular": 0.0}) node.log_info("Motors stopped safely") safety = horus.Node(name="Safety", tick=safety_tick, rate=100, order=0, subs=["emergency_stop"]) motor = horus.Node(name="Motor", tick=motor_tick, shutdown=motor_shutdown, rate=100, order=1, budget=500 * us, on_miss="safe_mode", subs=["cmd_vel"], pubs=["motor.state"]) horus.run(safety, motor, tick_rate=100, rt=True, watchdog_ms=500) ``` ## Next Steps - [Real-Time Concepts](/concepts/real-time) — what real-time means for robotics - [Safety Monitor](/advanced/safety-monitor) — graduated degradation and watchdog - [Choosing a Language](/getting-started/choosing-language) — when to use Python vs Rust for RT - [Python API Reference](/python/api/python-bindings) — full scheduling kwargs --- ## See Also - [Hardware Drivers (Rust)](/tutorials/05-hardware-drivers) — Rust version - [Python Bindings](/python/api/python-bindings) — Python API reference --- ## Tutorial 3: Full Robot System (C++) Path: /tutorials/03-full-robot-cpp Description: Integrate sensors, controller, actuators, safety, transforms, and parameters into one system # Tutorial 3: Full Robot System (C++) Build a complete mobile robot with 6 nodes, coordinate transforms, runtime parameters, safety monitoring, and telemetry — all in one scheduler. 
## What You'll Build ``` ┌────────────┐ ┌────────────┐ ┌────────────┐ │ LiDAR │──→│ Controller │──→│ Motors │ │ (10 Hz) │ │ (100 Hz) │ │ (100 Hz) │ └────────────┘ └────────────┘ └────────────┘ ↑ ↑ ┌────────────┐ │ │ ┌────────────┐ │ IMU │────────┘ └──────│ Safety │ │ (200 Hz) │ │ (50 Hz) │ └────────────┘ └────────────┘ ┌────────────┐ │ Telemetry │ │ (1 Hz) │ └────────────┘ ``` ## What You'll Learn - 6-node pipeline with different rates and execution orders - Coordinate transforms (lidar → base_link → world) - Runtime parameters for live tuning - Safety node with emergency stop - Telemetry logging at 1 Hz - Multi-rate execution in a single scheduler ## Prerequisites - Completed [Tutorial 1](/docs/tutorials/01-sensor-node-cpp) and [Tutorial 2](/docs/tutorials/02-motor-controller-cpp) ## Step 1: Create the Project ```bash horus new full_robot --lang cpp cd full_robot ``` ## Step 2: Write the System Replace `src/main.cpp`: ```cpp #include #include #include #include #include using namespace horus::literals; // ── Shared safety flag ────────────────────────────────────────────────── static std::atomic estop_active{false}; // ── 1. LiDAR Driver (10 Hz) ──────────────────────────────────────────── class LidarDriver : public horus::Node { public: LidarDriver() : Node("lidar_driver") { scan_pub_ = advertise("lidar.scan"); } void tick() override { if (++tick_ % 20 != 0) return; // 10 Hz from 200 Hz scheduler auto scan = scan_pub_->loan(); // Simulate: wall at 1.5m with gap at 90° for (int i = 0; i < 360; i++) { if (i > 85 && i < 95) { scan->ranges[i] = 5.0f; // gap (doorway) } else { float dist = 1.5f + 0.2f * std::sin(i * 0.05f); scan->ranges[i] = dist; } } scan->angle_min = 0.0f; scan->angle_max = 6.28318f; scan_pub_->publish(std::move(scan)); } private: horus::Publisher* scan_pub_; int tick_ = 0; }; // ── 2. 
IMU Driver (200 Hz — every tick) ───────────────────────────────── class ImuDriver : public horus::Node { public: ImuDriver() : Node("imu_driver") { imu_pub_ = advertise("imu.data"); } void tick() override { horus::msg::Imu imu{}; imu.linear_acceleration[2] = 9.81; // gravity imu.angular_velocity[2] = 0.01; // slight yaw drift imu.orientation[3] = 1.0; // identity quaternion w imu_pub_->send(imu); } private: horus::Publisher* imu_pub_; }; // ── 3. Controller (100 Hz) ────────────────────────────────────────────── class Controller : public horus::Node { public: Controller(horus::Params& params) : Node("controller"), params_(params) { scan_sub_ = subscribe("lidar.scan"); imu_sub_ = subscribe("imu.data"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { if (++tick_ % 2 != 0) return; // 100 Hz from 200 Hz if (estop_active.load()) return; double max_speed = params_.get("max_speed", 0.3); double safe_dist = params_.get("safe_distance", 0.5); // Read lidar (latest scan) float min_range = 999.0f; int min_idx = 0; if (auto scan = scan_sub_->recv()) { for (int i = 0; i < 360; i++) { if (scan->get()->ranges[i] > 0.01f && scan->get()->ranges[i] < min_range) { min_range = scan->get()->ranges[i]; min_idx = i; } } } // Read IMU (compensate yaw drift) double yaw_rate = 0; if (auto imu = imu_sub_->recv()) { yaw_rate = imu->get()->angular_velocity[2]; } // Simple obstacle avoidance horus::msg::CmdVel cmd{}; if (min_range < static_cast(safe_dist)) { cmd.linear = 0.0f; cmd.angular = (min_idx < 180) ? -0.5f : 0.5f; } else { cmd.linear = static_cast(max_speed); cmd.angular = static_cast(-yaw_rate * 0.5); // drift compensation } cmd_pub_->send(cmd); } void enter_safe_state() override { horus::msg::CmdVel stop{}; cmd_pub_->send(stop); } private: horus::Subscriber* scan_sub_; horus::Subscriber* imu_sub_; horus::Publisher* cmd_pub_; horus::Params& params_; int tick_ = 0; }; // ── 4. 
Motor Driver (100 Hz) ──────────────────────────────────────────── class MotorDriver : public horus::Node { public: MotorDriver() : Node("motor_driver") { cmd_sub_ = subscribe("cmd_vel"); odom_pub_ = advertise("odom"); } void tick() override { if (++tick_ % 2 != 0) return; // 100 Hz auto cmd = cmd_sub_->recv(); if (!cmd) return; double v = cmd->get()->linear; double w = cmd->get()->angular; double dt = 0.01; x_ += v * std::cos(theta_) * dt; y_ += v * std::sin(theta_) * dt; theta_ += w * dt; horus::msg::Odometry odom{}; odom.pose.x = x_; odom.pose.y = y_; odom.pose.theta = theta_; odom_pub_->send(odom); } void enter_safe_state() override { horus::log::warn("motor", "Safe state — motors stopped"); } private: horus::Subscriber* cmd_sub_; horus::Publisher* odom_pub_; double x_ = 0, y_ = 0, theta_ = 0; int tick_ = 0; }; // ── 5. Safety Monitor (50 Hz) ─────────────────────────────────────────── class SafetyMonitor : public horus::Node { public: SafetyMonitor() : Node("safety_monitor") { scan_sub_ = subscribe("lidar.scan"); estop_pub_ = advertise("emergency.stop"); } void tick() override { if (++tick_ % 4 != 0) return; // 50 Hz auto scan = scan_sub_->recv(); if (!scan) return; bool danger = false; for (int i = 0; i < 360; i++) { if (scan->get()->ranges[i] > 0.01f && scan->get()->ranges[i] < 0.2f) { danger = true; break; } } if (danger && !estop_active.load()) { estop_active.store(true); horus::msg::EmergencyStop estop{}; estop.engaged = 1; estop_pub_->send(estop); horus::log::error("safety", "EMERGENCY STOP — object < 20cm"); horus::blackbox::record("safety", "E-stop: object < 20cm"); } if (!danger && estop_active.load()) { estop_active.store(false); horus::msg::EmergencyStop clear{}; clear.engaged = 0; estop_pub_->send(clear); horus::log::info("safety", "E-stop cleared"); } } private: horus::Subscriber* scan_sub_; horus::Publisher* estop_pub_; int tick_ = 0; }; // ── 6. 
Telemetry Logger (1 Hz) ────────────────────────────────────────── class Telemetry : public horus::Node { public: Telemetry() : Node("telemetry") { odom_sub_ = subscribe("odom"); cmd_sub_ = subscribe("cmd_vel"); } void tick() override { if (++tick_ % 200 != 0) return; // 1 Hz auto odom = odom_sub_->recv(); auto cmd = cmd_sub_->recv(); char buf[128]; std::snprintf(buf, sizeof(buf), "pos=(%.2f, %.2f) heading=%.1f° cmd=(%.2f, %.2f)", odom ? odom->get()->pose.x : 0.0, odom ? odom->get()->pose.y : 0.0, odom ? odom->get()->pose.theta * 57.3 : 0.0, cmd ? cmd->get()->linear : 0.0f, cmd ? cmd->get()->angular : 0.0f); horus::log::info("telemetry", buf); } private: horus::Subscriber* odom_sub_; horus::Subscriber* cmd_sub_; int tick_ = 0; }; // ── Main ──────────────────────────────────────────────────────────────── int main() { // Coordinate transforms horus::TransformFrame tf; tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.register_frame("lidar", "base_link"); tf.update("lidar", {0.2, 0.0, 0.3}, {0, 0, 0, 1}, 0); // Runtime parameters horus::Params params; params.set("max_speed", 0.3); params.set("safe_distance", 0.5); // Scheduler horus::Scheduler sched; sched.tick_rate(200_hz).name("full_robot"); // Add nodes in execution order ImuDriver imu; sched.add(imu).order(0).build(); // 200 Hz LidarDriver lidar; sched.add(lidar).order(1).build(); // 10 Hz SafetyMonitor safety; sched.add(safety).order(2).budget(2_ms).on_miss(horus::Miss::Stop).build(); // 50 Hz, critical Controller ctrl(params); sched.add(ctrl).order(10).budget(5_ms).on_miss(horus::Miss::Skip).build(); // 100 Hz MotorDriver motors; sched.add(motors).order(20).on_miss(horus::Miss::SafeMode).build(); // 100 Hz Telemetry telemetry; sched.add(telemetry).order(100).build(); // 1 Hz auto nodes = sched.node_list(); std::printf("Robot ready: %zu nodes\n", nodes.size()); for (auto& n : nodes) std::printf(" - %s\n", n.c_str()); sched.spin(); } ``` ## Step 3: Build and Run ```bash horus build && horus 
run ``` ## Step 4: Monitor Everything ```bash horus topic list # 6+ active topics horus node list # 6 running nodes horus topic echo odom # watch robot position horus topic echo cmd_vel # watch velocity commands horus log # see telemetry + safety events ``` ## Key Architecture Decisions | Node | Rate | Order | Miss Policy | Why | |------|------|-------|-------------|-----| | IMU | 200 Hz | 0 | default | Fastest sensor, runs every tick | | LiDAR | 10 Hz | 1 | default | Mechanical limit of sensor | | Safety | 50 Hz | 2 | **Stop** | If safety can't run, entire system must stop | | Controller | 100 Hz | 10 | Skip | Skipping one tick is better than lag | | Motors | 100 Hz | 20 | **SafeMode** | Overrun → stop motors immediately | | Telemetry | 1 Hz | 100 | default | Non-critical, best effort | ## Next Steps - [Cross-Language Interop](/docs/tutorials/cross-language-interop) — mix C++, Rust, Python in one system - [Guide: Hardware Integration](/docs/cpp/hardware) — connect real sensors - [Guide: Real-Time C++](/docs/cpp/realtime) — SCHED_FIFO, CPU pinning --- ## Tutorial 3: Full Robot Integration Path: /tutorials/03-full-robot Description: Combine sensor and motor nodes into a complete robot system with multi-rate scheduling and monitoring # Tutorial 3: Full Robot Integration ## Prerequisites - [Tutorial 1: IMU Sensor Node](/tutorials/01-sensor-node) completed - [Tutorial 2: Motor Controller](/tutorials/02-motor-controller) completed ## What You'll Build A robot system with 4 nodes running at different rates: ``` ImuSensor (100 Hz) ──→ "imu.data" ──→ StateEstimator (100 Hz) │ Commander (10 Hz) ──→ "motor.cmd" ──→ MotorController (50 Hz) │ "motor.state" ──→ StateEstimator │ "robot.pose" (output) ``` You'll learn to compose multiple node types, schedule them at different rates, and monitor the system with `horus monitor`. 
**Time estimate**: ~20 minutes

## Step 1: Create the Project

```bash
horus new robot-integration -r
cd robot-integration
```

You should see `horus.toml`, `src/main.rs`, and `.horus/` in the project directory.

## Step 2: Define Shared Types

Replace `src/main.rs` with the shared data types. We reuse the types from Tutorials 1 and 2, plus a robot pose:

```rust
use horus::prelude::*;

#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct ImuData {
    accel_x: f32,
    accel_y: f32,
    accel_z: f32,
    gyro_roll: f32,
    gyro_pitch: f32,
    gyro_yaw: f32,
    timestamp: f64,
}

#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct MotorCommand {
    velocity: f32,   // rad/s
    max_torque: f32, // N*m
}

#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct MotorState {
    position: f32, // radians
    velocity: f32, // rad/s
    torque: f32,   // N*m
    timestamp: f64,
}

#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct RobotPose {
    x: f32,       // meters
    y: f32,       // meters
    heading: f32, // radians
    timestamp: f64,
}
```

## Step 3: The IMU Sensor (from Tutorial 1)

```rust
// simplified
struct ImuSensor {
    publisher: Topic<ImuData>,
    tick_count: u64,
}

impl ImuSensor {
    fn new() -> Result<Self> {
        Ok(Self { publisher: Topic::new("imu.data")?, tick_count: 0 })
    }
}

impl Node for ImuSensor {
    fn name(&self) -> &str { "ImuSensor" }

    fn tick(&mut self) {
        let t = self.tick_count as f64 * 0.01;
        self.publisher.send(ImuData {
            accel_x: 0.0,
            accel_y: 0.0,
            accel_z: 9.81,
            gyro_roll: 0.0,
            gyro_pitch: 0.0,
            gyro_yaw: 0.1 * (t * 0.5).sin() as f32,
            timestamp: t,
        });
        self.tick_count += 1;
    }

    fn shutdown(&mut self) -> Result<()> {
        eprintln!("ImuSensor shutting down after {} ticks", self.tick_count);
        Ok(())
    }
}
```

## Step 4: The Motor Controller (from Tutorial 2)

```rust
// simplified
struct MotorController {
    commands: Topic<MotorCommand>,
    state_pub: Topic<MotorState>,
    position: f32,
    velocity: f32,
}

impl MotorController {
    fn new() ->
Result<Self> {
        Ok(Self {
            commands: Topic::new("motor.cmd")?,
            state_pub: Topic::new("motor.state")?,
            position: 0.0,
            velocity: 0.0,
        })
    }
}

impl Node for MotorController {
    fn name(&self) -> &str { "MotorController" }

    fn tick(&mut self) {
        let dt = 0.02; // 50 Hz
        if let Some(cmd) = self.commands.recv() {
            let error = cmd.velocity - self.velocity;
            self.velocity += error.clamp(-cmd.max_torque, cmd.max_torque) * dt;
        }
        self.position += self.velocity * dt;
        self.state_pub.send(MotorState {
            position: self.position,
            velocity: self.velocity,
            torque: 0.0,
            timestamp: 0.0,
        });
    }

    // SAFETY: always zero velocity on shutdown to prevent runaway
    fn shutdown(&mut self) -> Result<()> {
        self.velocity = 0.0;
        self.state_pub.send(MotorState {
            position: self.position,
            velocity: 0.0,
            torque: 0.0,
            timestamp: 0.0,
        });
        eprintln!("Motor stopped at position {:.2} rad", self.position);
        Ok(())
    }
}
```

## Step 5: The State Estimator (NEW)

This node fuses IMU and motor data to estimate the robot's 2D pose. It subscribes to TWO topics and publishes ONE, showing how nodes compose:

```rust
// simplified
struct StateEstimator {
    imu_sub: Topic<ImuData>,
    motor_sub: Topic<MotorState>,
    pose_pub: Topic<RobotPose>,
    pose: RobotPose,
}

impl StateEstimator {
    fn new() -> Result<Self> {
        Ok(Self {
            imu_sub: Topic::new("imu.data")?,
            motor_sub: Topic::new("motor.state")?,
            pose_pub: Topic::new("robot.pose")?,
            pose: RobotPose::default(),
        })
    }
}

impl Node for StateEstimator {
    fn name(&self) -> &str { "StateEstimator" }

    fn tick(&mut self) {
        let dt = 0.01; // 100 Hz

        // Read IMU for heading changes
        if let Some(imu) = self.imu_sub.recv() {
            self.pose.heading += imu.gyro_yaw * dt as f32;
            self.pose.timestamp = imu.timestamp;
        }

        // Read motor state for forward motion
        if let Some(motor) = self.motor_sub.recv() {
            let speed = motor.velocity * 0.1; // scale to m/s
            self.pose.x += speed * self.pose.heading.cos() * dt as f32;
            self.pose.y += speed * self.pose.heading.sin() * dt as f32;
        }

        self.pose_pub.send(self.pose);
    }

    fn shutdown(&mut self) -> Result<()> {
        eprintln!(
            "StateEstimator
final pose: ({:.2}, {:.2}) heading={:.2} rad",
            self.pose.x, self.pose.y, self.pose.heading
        );
        Ok(())
    }
}
```

## Step 6: The Commander

```rust
// simplified
struct Commander {
    publisher: Topic<MotorCommand>,
    tick_count: u64,
}

impl Commander {
    fn new() -> Result<Self> {
        Ok(Self { publisher: Topic::new("motor.cmd")?, tick_count: 0 })
    }
}

impl Node for Commander {
    fn name(&self) -> &str { "Commander" }

    fn tick(&mut self) {
        let t = self.tick_count as f64 * 0.1; // 10 Hz
        self.publisher.send(MotorCommand {
            velocity: 1.0 * (t * 0.3).sin() as f32,
            max_torque: 10.0,
        });
        self.tick_count += 1;
    }

    fn shutdown(&mut self) -> Result<()> {
        self.publisher.send(MotorCommand { velocity: 0.0, max_torque: 0.0 });
        eprintln!("Commander shutting down, sent zero-velocity command.");
        Ok(())
    }
}
```

## Step 7: Wire the Complete System

Multi-rate scheduling means each node runs at the rate appropriate for its responsibility:

```rust
fn main() -> Result<()> {
    eprintln!("Robot integration demo starting...\n");
    eprintln!("Open another terminal and run: horus monitor\n");

    let mut scheduler = Scheduler::new();

    // Order: sensors (0) → commander (1) → controller (2) → estimator (3)
    // Producers publish before consumers read.
    scheduler.add(ImuSensor::new()?)
        .order(0)
        .rate(100_u64.hz()) // sensor: high frequency for accuracy
        .build()?;

    scheduler.add(Commander::new()?)
        .order(1)
        .rate(10_u64.hz()) // commands: change slowly
        .build()?;

    scheduler.add(MotorController::new()?)
        .order(2)
        .rate(50_u64.hz()) // actuator: moderate update rate
        .build()?;

    scheduler.add(StateEstimator::new()?)
        .order(3)
        .rate(100_u64.hz()) // estimator: matches fastest sensor
        .build()?;

    scheduler.run()
}
```

## Step 8: Run and Monitor

```bash
horus run
```

You should see:

```
Robot integration demo starting...

Open another terminal and run: horus monitor
```

The system runs silently: the estimator publishes pose but nothing prints it.
Open a second terminal to see the live pose data: ```bash horus topic echo robot.pose ``` You should see pose updates streaming at 100 Hz. Open a third terminal for the monitor dashboard: ```bash horus monitor ``` You should see all 4 nodes with their tick rates, all topics with message counts, and system health metrics. Press **Ctrl+C** in the first terminal to stop — you should see shutdown messages from all nodes. ## Key Takeaways - **Multi-node composition** — 4 nodes working together through shared topics - **Multi-rate scheduling** — `.rate(100_u64.hz())`, `.rate(50_u64.hz())`, `.rate(10_u64.hz())` on different nodes - **Data fusion** — a single node subscribes to multiple topics and publishes a fused result - **Execution order** — sensors (0) before controllers (2) before estimators (3) - **System monitoring** — `horus monitor` and `horus topic echo` for live inspection ## Next Steps - [Real-Time Control Tutorial](/tutorials/realtime-control) — add `.budget()`, `.deadline()`, and `.on_miss()` for hard real-time - [TransformFrame](/concepts/transform-frame) — track coordinate frames between sensors and actuators - [Deployment](/operations/deploy-to-robot) — deploy to a real robot over SSH ## See Also - [Architecture](/concepts/architecture) — System design overview - [Scheduler API](/rust/api/scheduler) — Full node configuration reference - [Execution Classes](/concepts/execution-classes) — How multi-rate scheduling works - [Safety Monitor](/advanced/safety-monitor) — Add watchdog and graceful degradation --- ## Tutorial 4: Custom Messages (C++) Path: /tutorials/04-custom-messages-cpp Description: Define custom message types and use them with typed Publisher/Subscriber # Tutorial 4: Custom Messages (C++) HORUS provides 50+ built-in message types, but you'll often need your own. This tutorial shows how to define custom `#[repr(C)]` Pod messages usable across C++, Rust, and Python. 
## What You'll Learn - Defining `#[repr(C)]` structs that work as Pod messages - Using custom types with `Publisher` / `Subscriber` - Layout requirements for cross-language compatibility - When to use custom messages vs JsonWireMessage ## The Two Approaches ### Approach 1: Use JsonWireMessage (Quick, Flexible) For prototyping or when the message schema changes frequently, use JSON: ```cpp #include #include using namespace horus::literals; int main() { horus::Scheduler sched; // JSON wire message — sends arbitrary JSON through SHM horus::Publisher cmd_pub("cmd_vel"); // For custom data, use the JSON wire message type auto json_pub = sched.advertise("custom.data"); sched.add("sender") .tick([&] { // Pack custom data as JSON HorusJsonWireMsg msg{}; const char* json = R"({"temperature": 25.3, "humidity": 60, "location": "lab"})"; std::memcpy(msg.data, json, strlen(json)); msg.data_len = strlen(json); msg.msg_id = 1; json_pub.send(msg); }) .build(); sched.spin(); } ``` **Pros:** No Rust changes needed, any JSON schema, works immediately. **Cons:** 4KB max, serialization overhead, no compile-time type checking. ### Approach 2: Define a Pod Struct (Production, Zero-Copy) For production use, define a `#[repr(C)]` struct in Rust, add it to the FFI pipeline, and create a matching C++ struct. This gives you zero-copy SHM transfer. 
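Because the Rust and C++ definitions have to agree byte for byte, it can help to pin the layout down at compile time on the C++ side. The sketch below uses a hypothetical `ExampleReading` struct (not a HORUS type) together with standard `static_assert` and `offsetof` checks. The expected numbers assume a typical 64-bit target where `float` is 4 bytes and `std::uint64_t` is 8-byte aligned; other ABIs may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical message for illustration only. It mirrors a Rust
// #[repr(C)] struct with the same fields in the same order.
struct ExampleReading {
    float value;                 // offset 0
    float variance;              // offset 4
    float temperature;           // offset 8
    // 4 bytes of padding are inserted here so the next field is 8-byte aligned.
    std::uint64_t timestamp_ns;  // offset 16
};

// A type that is copied through shared memory must be plain data.
static_assert(std::is_trivially_copyable<ExampleReading>::value,
              "must be trivially copyable to be copied as raw bytes");
static_assert(std::is_standard_layout<ExampleReading>::value,
              "must be standard layout so field offsets are well defined");

// Layout expectations (assumes a typical 64-bit ABI).
static_assert(sizeof(ExampleReading) == 24, "unexpected struct size");
static_assert(alignof(ExampleReading) == 8, "unexpected struct alignment");
static_assert(offsetof(ExampleReading, timestamp_ns) == 16,
              "timestamp_ns is not where the other side expects it");

// Bytes the compiler inserted that carry no field data.
constexpr std::size_t padding_bytes() {
    return sizeof(ExampleReading) - (3 * sizeof(float) + sizeof(std::uint64_t));
}
```

If a field is added or reordered on one side only, one of these assertions is likely to fail at build time instead of producing silently corrupted messages at run time. The same numbers can be checked on the Rust side with `std::mem::size_of` and `std::mem::align_of`.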
#### Step 1: Define in Rust (`horus_library/messages/`) ```rust /// Custom sensor reading from a weather station #[repr(C)] #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize)] pub struct WeatherData { pub temperature: f32, // Celsius pub humidity: f32, // 0-100% pub pressure: f32, // hPa pub wind_speed: f32, // m/s pub wind_direction: f32, // degrees (0=N, 90=E) pub timestamp_ns: u64, } ``` #### Step 2: Add to FFI pipeline In `horus_cpp/src/topic_ffi.rs`: ```rust impl_topic_ffi!(weather_data, WeatherData, horus_library::WeatherData); ``` In `horus_cpp/src/c_api.rs`: ```rust impl_pod_topic_c_api!(weather_data, horus_library::WeatherData); ``` #### Step 3: Define matching C++ struct ```cpp // In your project or in horus_cpp/include/horus/msg/ namespace horus { namespace msg { struct WeatherData { float temperature; // Celsius float humidity; // 0-100% float pressure; // hPa float wind_speed; // m/s float wind_direction; // degrees uint64_t timestamp_ns; }; }} // namespace horus::msg ``` #### Step 4: Add C++ template specialization In `horus_cpp/include/horus/impl/topic_impl.hpp`: ```cpp HORUS_TOPIC_IMPL(msg::WeatherData, weather_data) ``` #### Step 5: Use it ```cpp #include using namespace horus::literals; class WeatherStation : public horus::Node { public: WeatherStation() : Node("weather_station") { pub_ = advertise("weather.data"); } void tick() override { horus::msg::WeatherData data{}; data.temperature = read_temperature(); data.humidity = read_humidity(); data.pressure = read_pressure(); data.timestamp_ns = 0; pub_->send(data); } private: horus::Publisher* pub_; float read_temperature() { return 22.5f; } float read_humidity() { return 55.0f; } float read_pressure() { return 1013.25f; } }; ``` ## Layout Rules for Custom Types | Rule | Why | |------|-----| | `#[repr(C)]` in Rust | Ensures C-compatible memory layout | | Only primitive types + fixed arrays | `Vec`, `String`, `Box` can't cross SHM | | Same field order in C++ and Rust | Memory layout must 
match exactly |
| Use `f32`/`f64`/`u8`/`u16`/`u32`/`u64`/`i32`/`i64` | Cross-language compatible primitives |
| Fixed-size arrays `[T; N]` only | Variable-length data needs JsonWireMessage |
| `timestamp_ns: u64` as last field | Convention for all HORUS messages |

## When To Use Each Approach

| Scenario | Approach |
|----------|----------|
| Prototyping, schema changing | JsonWireMessage |
| Production, performance-critical | Custom Pod struct |
| Cross-language (C++ + Python + Rust) | Custom Pod struct |
| One-off configuration messages | JsonWireMessage |
| High-frequency sensor data (>100 Hz) | Custom Pod struct (zero-copy) |

## Key Takeaways

- JsonWireMessage is the quickest way to get custom data flowing (no Rust changes, JSON payload, 4KB max), at the cost of serialization overhead
- Custom Pod structs give zero-copy SHM but require adding to the FFI pipeline (5 steps)
- Both approaches work cross-language (C++ ↔ Rust ↔ Python)
- Always put `timestamp_ns: u64` as the last field (convention)
- The `HORUS_TOPIC_IMPL` macro generates the entire Publisher/Subscriber specialization in one line

---

## Tutorial: Real-Time Control

Path: /tutorials/realtime-control
Description: Build a multi-rate robot controller with real-time scheduling, deadline enforcement, and safety policies

# Tutorial: Real-Time Control

## Prerequisites

- [Quick Start](/getting-started/quick-start) completed
- Basic familiarity with the `Node` trait and Scheduler

## What You'll Build

A robot controller with three nodes running at different rates and execution classes:

1. **Motor Controller** at 1 kHz: real-time, safety-critical
2. **LiDAR Driver** at 100 Hz: real-time sensor processing
3. **Path Planner** at 10 Hz: compute-heavy, runs on a thread pool

**Time estimate**: ~20 minutes

## Step 1: Define Three Nodes

Start with three basic nodes. All run as BestEffort on the main thread, with no real-time configuration yet.
```rust
// simplified
use horus::prelude::*;

struct MotorController {
    cmd_sub: Topic<CmdVel>,
}

impl MotorController {
    fn new() -> Result<Self> {
        Ok(Self { cmd_sub: Topic::new("cmd_vel")? })
    }
}

impl Node for MotorController {
    fn name(&self) -> &str { "motor_ctrl" }

    fn tick(&mut self) {
        if let Some(cmd) = self.cmd_sub.recv() {
            // Apply velocity command to motors
        }
    }

    fn enter_safe_state(&mut self) {
        // SAFETY: zero velocity and engage brakes
    }
}

struct LidarDriver {
    scan_pub: Topic<LaserScan>,
}

impl LidarDriver {
    fn new() -> Result<Self> {
        Ok(Self { scan_pub: Topic::new("lidar.scan")? })
    }
}

impl Node for LidarDriver {
    fn name(&self) -> &str { "lidar" }

    fn tick(&mut self) {
        let scan = LaserScan::new();
        self.scan_pub.send(scan);
    }
}

struct PathPlanner {
    scan_sub: Topic<LaserScan>,
    cmd_pub: Topic<CmdVel>,
}

impl PathPlanner {
    fn new() -> Result<Self> {
        Ok(Self {
            scan_sub: Topic::new("lidar.scan")?,
            cmd_pub: Topic::new("cmd_vel")?,
        })
    }
}

impl Node for PathPlanner {
    fn name(&self) -> &str { "planner" }

    fn tick(&mut self) {
        if let Some(scan) = self.scan_sub.recv() {
            let cmd = CmdVel::new(0.3, 0.0);
            self.cmd_pub.send(cmd);
        }
    }
}
```

You should be able to compile this with `horus run`. All three nodes tick on the main thread with no timing guarantees.

## Step 2: Add Rates

Adding `.rate()` to a BestEffort node does four things automatically:

1. Sets the tick frequency
2. Derives a default budget (80% of the period)
3. Derives a default deadline (95% of the period)
4. Promotes the node to the RT execution class

```rust
// simplified
use horus::prelude::*;

let mut scheduler = Scheduler::new();

scheduler.add(MotorController::new()?)
    .order(0)
    .rate(1000_u64.hz()) // 1 kHz → budget=800us, deadline=950us, auto-RT
    .build()?;

scheduler.add(LidarDriver::new()?)
    .order(10)
    .rate(100_u64.hz()) // 100 Hz → budget=8ms, deadline=9.5ms, auto-RT
    .build()?;

scheduler.add(PathPlanner::new()?)
    .order(50)
    .rate(10_u64.hz()) // 10 Hz → budget=80ms, deadline=95ms, auto-RT
    .build()?;
```

Notice all three nodes are now RT.
That is correct for the motor and LiDAR, but the path planner does not need hard real-time — we fix that in Step 4. ## Step 3: Add Safety Policies If the motor controller misses a deadline, the arm could overshoot and collide. The `Miss` enum controls what happens: | Variant | Behavior | Use case | |---------|----------|----------| | `Miss::Warn` | Log a warning, continue | Soft real-time — logging, UI | | `Miss::Skip` | Drop the late tick, run next on schedule | Firm real-time — sensors | | `Miss::SafeMode` | Call `enter_safe_state()` on the node | Motor controllers — must zero output | | `Miss::Stop` | Shut down the entire scheduler | Safety-critical — unacceptable to continue | Apply safety to the motor controller: ```rust // simplified use horus::prelude::*; scheduler.add(MotorController::new()?) .order(0) .rate(1000_u64.hz()) .budget(800_u64.us()) // override auto-derived budget .deadline(950_u64.us()) // override auto-derived deadline .on_miss(Miss::SafeMode) // zero velocity on deadline miss .build()?; ``` This means: if the motor controller exceeds its 950 us deadline, the scheduler calls `enter_safe_state()` (which zeros velocity and engages brakes). The LiDAR is less critical — a missed scan is suboptimal but not dangerous: ```rust // simplified use horus::prelude::*; scheduler.add(LidarDriver::new()?) .order(10) .rate(100_u64.hz()) .on_miss(Miss::Skip) // drop late ticks, planner uses previous scan .build()?; ``` ## Step 4: Move the Planner to Compute The path planner runs complex algorithms that benefit from parallel execution. Use `.compute()` to move it off the RT thread: ```rust // simplified use horus::prelude::*; scheduler.add(PathPlanner::new()?) .order(50) .compute() // runs on worker thread pool, not RT thread .rate(10_u64.hz()) // rate-limited but no RT enforcement .build()?; ``` With `.compute()`, the planner runs on a worker thread. This prevents a slow planning cycle from blocking the 1 kHz motor loop. 
Note that `.rate()` still limits frequency but does NOT enforce RT budget/deadline on Compute nodes. ## Step 5: Enable RT on the Scheduler For production deployments, enable real-time OS scheduling on the scheduler itself: ```rust // simplified use horus::prelude::*; let mut scheduler = Scheduler::new() .prefer_rt() // request SCHED_FIFO if available .watchdog(500_u64.ms()) // detect frozen nodes .max_deadline_misses(5) // isolate after 5 consecutive misses .tick_rate(1000_u64.hz()); // global tick rate ``` - `.prefer_rt()` requests real-time OS scheduling (SCHED_FIFO + mlockall on Linux). Falls back gracefully if permissions are unavailable. - `.watchdog()` enables frozen node detection with graduated degradation. - `.max_deadline_misses()` sets the threshold before a node is isolated. For per-node CPU pinning, use `.core()` on the node builder: ```rust // simplified use horus::prelude::*; scheduler.add(MotorController::new()?) .order(0) .rate(1000_u64.hz()) .budget(800_u64.us()) .deadline(950_u64.us()) .on_miss(Miss::SafeMode) .core(0) // pin to CPU core 0 .build()?; scheduler.add(LidarDriver::new()?) .order(10) .rate(100_u64.hz()) .on_miss(Miss::Skip) .core(1) // separate core from motor controller .build()?; ``` ## Step 6: Complete System Here is the full program with all nodes configured: ```rust // simplified use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new() .prefer_rt() .watchdog(500_u64.ms()) .max_deadline_misses(5) .tick_rate(1000_u64.hz()); // 1 kHz motor controller — safety-critical, RT, pinned to core 0 scheduler.add(MotorController::new()?) .order(0) .rate(1000_u64.hz()) .budget(800_u64.us()) .deadline(950_u64.us()) .on_miss(Miss::SafeMode) .core(0) .build()?; // 100 Hz LiDAR — RT, pinned to core 1 scheduler.add(LidarDriver::new()?) .order(10) .rate(100_u64.hz()) .on_miss(Miss::Skip) .core(1) .build()?; // 10 Hz path planner — compute thread pool scheduler.add(PathPlanner::new()?) 
.order(50) .compute() .rate(10_u64.hz()) .build()?; scheduler.run() } ``` You should see the system run. Press Ctrl+C to stop — the timing report shows budget/deadline statistics for each node. ## Key Takeaways - **`.rate()` implies RT** — auto-derives budget (80%), deadline (95%), and promotes to RT execution class. Override with `.budget()` / `.deadline()` as needed. - **Safety is explicit** — always set `.on_miss()` for safety-critical nodes. `Miss::SafeMode` calls `enter_safe_state()`. `Miss::Skip` drops late ticks. - **Separate by execution class** — keep fast control loops on RT threads, move heavy computation to `.compute()` - **`.prefer_rt()` over `.require_rt()`** — degrades gracefully during development, use `.require_rt()` only in production where running without RT is unacceptable - **CPU pinning** — `.core(n)` on the node builder prevents OS thread migration and cache thrashing ## Next Steps - [Deterministic Mode](/advanced/deterministic-mode) — reproducible simulations with fixed timesteps - [Safety Monitor](/advanced/safety-monitor) — graduated watchdog and health states - [RT Setup](/advanced/rt-setup) — Linux RT kernel configuration ## See Also - [Scheduler API](/rust/api/scheduler) — Full reference for all builder methods - [Execution Classes](/concepts/execution-classes) — How RT auto-detection works - [Real-Time Concepts](/concepts/real-time) — Why real-time matters for robotics - [Miss Enum](/rust/api/scheduler#miss-enum) — All deadline miss policies --- ## Tutorial 5: Hardware & Real-Time (C++) Path: /tutorials/05-hardware-rt-cpp Description: Connect real hardware with RT scheduling, CPU pinning, and watchdog protection # Tutorial 5: Hardware & Real-Time (C++) Connect real sensors and actuators to HORUS with proper real-time scheduling. This tutorial builds a motor controller that reads encoder feedback and drives a motor with SCHED_FIFO priority. 
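For orientation, the sketch below shows roughly what a SCHED_FIFO request looks like at the Linux level, and why a caller has to decide what to do when the OS refuses. It is not HORUS's implementation: `try_sched_fifo` and `current_policy_name` are hypothetical helpers built on the standard `sched_setscheduler` and `sched_getscheduler` calls, and the sketch assumes a Linux target.

```cpp
#include <sched.h>

#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not HORUS code): ask Linux to run the caller under
// SCHED_FIFO at `priority`. Returns true on success, false if refused.
inline bool try_sched_fifo(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    // pid 0 means "the caller". The call typically fails with EPERM when the
    // process lacks CAP_SYS_NICE or a sufficient rtprio limit, and with
    // EINVAL when the priority is outside the valid SCHED_FIFO range.
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        std::fprintf(stderr, "SCHED_FIFO(%d) refused: %s\n",
                     priority, std::strerror(errno));
        return false;
    }
    return true;
}

// Reports which policy the caller is currently running under.
inline const char* current_policy_name() {
    switch (sched_getscheduler(0)) {
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        case SCHED_OTHER: return "SCHED_OTHER";
        default:          return "other";
    }
}
```

A "prefer" style caller would log the refusal and keep running under the default policy, while a "require" style caller would treat a `false` return as a startup failure. Which of the two is appropriate depends on whether running without real-time priority is acceptable for the robot.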
## What You'll Learn - Opening serial ports from a Node - RT scheduling: `budget()`, `deadline()`, `pin_core()`, `priority()` - Watchdog for detecting frozen hardware - `enter_safe_state()` for actuator safety - CPU governor and kernel requirements ## Prerequisites - Completed [Tutorial 2: Motor Controller](/docs/tutorials/02-motor-controller-cpp) - A Linux machine (RT features require Linux) - Optional: USB serial device for testing ## The Hardware Pattern Every hardware driver follows the same pattern: ``` init(): open device, configure, verify connection tick(): read sensor OR write actuator (never both blocking) safe(): zero actuators, disable outputs shutdown(): close device, release resources ``` ## Complete Code: RT Motor Driver ```cpp #include #include #include #include #include #include using namespace horus::literals; class MotorDriver : public horus::Node { public: MotorDriver(const char* port, int baudrate) : Node("motor_driver"), port_(port), baudrate_(baudrate) { cmd_sub_ = subscribe("motor.cmd"); state_pub_ = advertise("motor.state"); } void init() override { fd_ = open(port_, O_RDWR | O_NOCTTY | O_NONBLOCK); if (fd_ < 0) { horus::log::error("motor", "Failed to open serial port"); horus::blackbox::record("motor", "Serial port open failed"); return; } // Configure serial port struct termios tty{}; tcgetattr(fd_, &tty); cfsetispeed(&tty, baudrate_); cfsetospeed(&tty, baudrate_); tty.c_cflag |= (CLOCAL | CREAD); tty.c_cflag &= ~PARENB; tty.c_cflag &= ~CSTOPB; tty.c_cflag &= ~CSIZE; tty.c_cflag |= CS8; tcsetattr(fd_, TCSANOW, &tty); horus::log::info("motor", "Serial port opened, motor ready"); } void tick() override { if (fd_ < 0) return; // Read command auto cmd = cmd_sub_->recv(); if (cmd) { float duty = cmd->get()->linear; last_cmd_ = duty; // Send PWM command to motor controller // Protocol: "M\n" where duty is -100 to 100 char buf[32]; int n = std::snprintf(buf, sizeof(buf), "M%.0f\n", duty * 100.0f); write(fd_, buf, n); } // Read encoder feedback 
(non-blocking) char rbuf[64]; int n = read(fd_, rbuf, sizeof(rbuf) - 1); if (n > 0) { rbuf[n] = '\0'; float rpm = 0; if (std::sscanf(rbuf, "E%f", &rpm) == 1) { horus::msg::CmdVel state{}; state.linear = rpm; state.angular = last_cmd_; state_pub_->send(state); watchdog_fed_ = true; } } // Watchdog: if no encoder response for 100 ticks (1s), alarm if (watchdog_fed_) { watchdog_counter_ = 0; watchdog_fed_ = false; } else { watchdog_counter_++; if (watchdog_counter_ > 100 && !watchdog_alarmed_) { horus::log::error("motor", "Encoder watchdog timeout"); horus::blackbox::record("motor", "Encoder timeout > 1s"); watchdog_alarmed_ = true; send_zero(); } } } void enter_safe_state() override { send_zero(); horus::blackbox::record("motor", "Safe state: motor zeroed"); } private: void send_zero() { if (fd_ >= 0) { write(fd_, "M0\n", 3); } horus::msg::CmdVel stop{}; state_pub_->send(stop); } const char* port_; int baudrate_; int fd_ = -1; float last_cmd_ = 0; int watchdog_counter_ = 0; bool watchdog_fed_ = false; bool watchdog_alarmed_ = false; horus::Subscriber* cmd_sub_; horus::Publisher* state_pub_; }; int main() { horus::Scheduler sched; sched.tick_rate(100_hz) .name("rt_motor") .prefer_rt(); // Use SCHED_FIFO if available MotorDriver motor("/dev/ttyUSB0", B115200); sched.add(motor) .order(0) // highest priority .budget(2_ms) // must complete in 2ms .deadline(5_ms) // absolute deadline 5ms .on_miss(horus::Miss::SafeMode) // stop motor if overrun .pin_core(2) // pin to CPU core 2 .priority(90) // SCHED_FIFO priority 90 .watchdog(1_s) // scheduler-level watchdog .build(); sched.spin(); } ``` ## RT Configuration Explained ```cpp sched.prefer_rt(); // Try SCHED_FIFO; degrade gracefully if unavailable // vs sched.require_rt(); // FAIL if SCHED_FIFO not available (production systems) ``` | Setting | Purpose | Typical Value | |---------|---------|---------------| | `budget(2_ms)` | Max time per tick | 50-80% of period | | `deadline(5_ms)` | Absolute tick deadline | 90-95% of period 
| | `pin_core(2)` | CPU affinity | Dedicated core, not core 0 | | `priority(90)` | SCHED_FIFO level | 80-99 for critical, 50-79 for normal | | `watchdog(1_s)` | Frozen node detection | 5-10x expected tick period | ## RT Kernel Setup For full RT guarantees: ```bash # Check current kernel uname -r # Look for "-rt" suffix # Set CPU governor to performance sudo cpupower frequency-set -g performance # Grant RT privileges without root sudo setcap cap_sys_nice+ep ./my_robot # Or run with elevated privileges sudo nice -n -20 ./my_robot ``` Without an RT kernel, HORUS still works — `prefer_rt()` logs warnings but continues with best-effort scheduling. ## Key Takeaways - `init()` opens hardware, `tick()` reads/writes, `enter_safe_state()` zeros actuators - Never block in `tick()` — use non-blocking I/O (`O_NONBLOCK`) - Watchdog detects frozen hardware (encoder cable disconnected, motor driver crash) - `pin_core()` prevents OS from migrating the thread — critical for latency - `budget()` + `SafeMode` = automatic motor shutdown on timing overrun - Test without RT kernel first, add RT for production deployment --- ## Tutorial: Real-Time Control (Python) Path: /tutorials/realtime-control-python Description: Build a multi-rate robot controller with real-time scheduling, deadline enforcement, and safety policies in Python # Tutorial: Real-Time Control (Python) ## Prerequisites - [Quick Start (Python)](/getting-started/quick-start-python) completed - Basic familiarity with `horus.Node()` and `horus.run()` ## What You'll Build A robot controller with four nodes running at different rates and execution classes: 1. **IMU Sensor** at 100 Hz --- real-time, reads accelerometer and gyroscope data 2. **PID Controller** at 50 Hz --- real-time, computes velocity commands from sensor feedback 3. **Path Planner** at 10 Hz --- compute class, runs on a thread pool 4. 
**Safety Monitor** at 100 Hz --- real-time, highest priority, stops the system on dangerous commands **Time estimate**: ~20 minutes ## Step 1: Four Nodes, No Real-Time Start with four basic nodes. All use the default configuration --- no budgets, no deadlines, no safety policies yet. ```python import horus import math # ---- IMU Sensor (100 Hz) ---- def make_imu(): t = [0.0] def tick(node): t[0] += horus.dt() imu = horus.Imu( accel_x=0.1 * math.sin(t[0] * 2.0), accel_y=0.05 * math.cos(t[0] * 3.0), accel_z=9.81, gyro_x=0.01, gyro_y=0.0, gyro_z=0.05, ) node.send("imu.data", imu) return horus.Node( name="imu_sensor", tick=tick, rate=100, order=0, pubs=[horus.Imu], ) # ---- PID Controller (50 Hz) ---- def make_controller(): integral = [0.0] target_speed = 0.5 def tick(node): imu = node.recv("imu.data") if imu is None: return error = target_speed - imu.accel_x integral[0] += error * horus.dt() command = 2.0 * error + 0.1 * integral[0] cmd = horus.CmdVel(linear=command, angular=0.0) node.send("cmd_vel", cmd) return horus.Node( name="pid_controller", tick=tick, rate=50, order=10, subs=[horus.Imu], pubs=[horus.CmdVel], ) # ---- Path Planner (10 Hz) ---- def make_planner(): waypoints = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)] idx = [0] def tick(node): odom = node.recv("odom") if odom is None: return # Pick the next waypoint wx, wy = waypoints[idx[0] % len(waypoints)] dx = wx - odom.x dy = wy - odom.y dist = math.sqrt(dx * dx + dy * dy) if dist < 0.2: idx[0] += 1 heading = math.atan2(dy, dx) node.send("plan.target", { "heading": heading, "distance": dist, "waypoint": idx[0] % len(waypoints), }) return horus.Node( name="path_planner", tick=tick, rate=10, order=50, subs=["odom"], pubs=["plan.target"], ) # ---- Safety Monitor (100 Hz) ---- def make_safety(): def tick(node): cmd = node.recv("cmd_vel") if cmd is None: return if abs(cmd.linear) > 2.0: node.log_warning(f"Unsafe velocity: {cmd.linear:.2f} m/s") node.request_stop() if abs(cmd.angular) > 1.5: node.log_warning(f"Unsafe 
angular velocity: {cmd.angular:.2f} rad/s") node.request_stop() return horus.Node( name="safety_monitor", tick=tick, rate=100, order=1, subs=[horus.CmdVel], ) # ---- Run ---- horus.run( make_imu(), make_controller(), make_planner(), make_safety(), tick_rate=100, ) ``` Run this with `horus run`. All four nodes tick on the scheduler with no timing enforcement. The IMU publishes sensor data, the PID controller computes velocity commands, the planner picks waypoints, and the safety monitor watches for dangerous velocities. It works, but nothing prevents a slow tick from cascading delays across the system. ## Step 2: Add Rates and Auto-Derived Timing Setting `rate=` on a node does three things automatically: 1. Sets the tick frequency 2. Derives a default budget (80% of the period) 3. Derives a default deadline (95% of the period) The node is also promoted to the Rt execution class --- it gets a dedicated thread with timing enforcement. Our nodes already have `rate=` set, so they are already RT nodes with auto-derived budgets and deadlines: | Node | Rate | Period | Auto Budget (80%) | Auto Deadline (95%) | |------|------|--------|-------------------|---------------------| | IMU Sensor | 100 Hz | 10 ms | 8 ms | 9.5 ms | | PID Controller | 50 Hz | 20 ms | 16 ms | 19 ms | | Path Planner | 10 Hz | 100 ms | 80 ms | 95 ms | | Safety Monitor | 100 Hz | 10 ms | 8 ms | 9.5 ms | These defaults are reasonable starting points. You do not need to specify budgets and deadlines manually unless profiling shows the auto-derived values are too loose or too tight. ## Step 3: Add Deadline Miss Policies If the PID controller misses a deadline, the motors receive stale commands and the robot drifts. If the safety monitor misses a deadline, dangerous commands go unchecked. 
Each node needs a policy for what happens on overrun: | Policy | String value | What happens | Best for | |--------|-------------|-------------|----------| | Warn | `"warn"` | Log a warning, continue normally | Development, non-critical nodes | | Skip | `"skip"` | Drop the late tick, run next on schedule | Sensors where one missed reading is acceptable | | SafeMode | `"safe_mode"` | Call `enter_safe_state()` on the node | Motor controllers, actuators | | Stop | `"stop"` | Shut down the entire scheduler | Safety monitors --- last line of defense | Update the node constructors with miss policies: ```python import horus us = horus.us # 1e-6 ms = horus.ms # 1e-3 # IMU: skip missed ticks --- one stale reading is fine imu = horus.Node( name="imu_sensor", tick=imu_tick, rate=100, order=0, on_miss="skip", pubs=[horus.Imu], ) # PID controller: enter safe state on deadline miss controller = horus.Node( name="pid_controller", tick=controller_tick, rate=50, order=10, budget=10 * ms, deadline=18 * ms, on_miss="safe_mode", subs=[horus.Imu], pubs=[horus.CmdVel], ) # Safety monitor: stop everything if it misses a deadline safety = horus.Node( name="safety_monitor", tick=safety_tick, rate=100, order=1, budget=2 * ms, deadline=5 * ms, on_miss="stop", subs=[horus.CmdVel], ) ``` The PID controller now has explicit `budget=10*ms` and `deadline=18*ms`, overriding the auto-derived values. The safety monitor has a tight `budget=2*ms` --- it does minimal work per tick and must finish fast. ## Step 4: Move the Planner to Compute The path planner runs algorithms that can take 10-50 ms. If it runs on the RT thread, a slow planning cycle blocks the PID controller. Use `compute=True` to move it to a thread pool: ```python planner = horus.Node( name="path_planner", tick=planner_tick, rate=10, order=50, compute=True, # runs on worker thread pool, not RT thread subs=["odom"], pubs=["plan.target"], ) ``` With `compute=True`, the planner runs on a separate thread pool. 
It cannot block the 50 Hz PID controller or the 100 Hz safety monitor. Note that `rate=10` on a Compute node is a frequency cap --- it limits how often the planner ticks, but there is no budget or deadline enforcement. This is the right choice: path planning takes variable time, and a 50 ms planning cycle is normal, not a failure. ## Step 5: Configure the Scheduler for Real-Time Enable OS-level real-time scheduling on the scheduler: ```python sched = horus.Scheduler( tick_rate=100, rt=True, # request SCHED_FIFO + mlockall watchdog_ms=500, # detect frozen nodes max_deadline_misses=5, # stop after 5 consecutive misses ) ``` - `rt=True` requests real-time OS scheduling. Falls back gracefully if permissions are unavailable --- the same code runs on a developer laptop (without RT) and a production robot (with RT). - `watchdog_ms=500` detects nodes that stop responding. Graduated response: warning at 500 ms, unhealthy at 1000 ms, isolated at 1500 ms. - `max_deadline_misses=5` stops the scheduler after 5 total deadline misses --- a system-wide safety net. 
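The graduated watchdog response described above can be sketched in a few lines. This is an illustration only, not the HORUS implementation: the `Health` enum and `watchdog_state` helper are hypothetical names, and the 1x / 2x / 3x thresholds are an assumption inferred from the 500 / 1000 / 1500 ms example.

```python
from enum import Enum


class Health(Enum):
    OK = "ok"
    WARNING = "warning"
    UNHEALTHY = "unhealthy"
    ISOLATED = "isolated"


def watchdog_state(silent_ms, watchdog_ms=500.0):
    """Classify a node by how long it has gone without ticking.

    Assumption: the thresholds sit at 1x, 2x and 3x watchdog_ms,
    matching the 500 / 1000 / 1500 ms example in the text.
    """
    if silent_ms >= 3 * watchdog_ms:
        return Health.ISOLATED
    if silent_ms >= 2 * watchdog_ms:
        return Health.UNHEALTHY
    if silent_ms >= watchdog_ms:
        return Health.WARNING
    return Health.OK


if __name__ == "__main__":
    for silent in (100, 500, 1000, 1500):
        print(f"{silent:5d} ms silent: {watchdog_state(silent).value}")
```

A model like this is handy when choosing `watchdog_ms`: pick a value where the warning level is far above normal tick-to-tick gaps, so only a genuinely frozen node escalates.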
## Step 6: Complete System Here is the full program with all nodes configured, real-time scheduling, and safety policies: ```python import horus import math import gc us = horus.us ms = horus.ms # ---- IMU Sensor (100 Hz) ---------------------------------------- def make_imu(): t = [0.0] def tick(node): t[0] += horus.dt() imu = horus.Imu( accel_x=0.1 * math.sin(t[0] * 2.0), accel_y=0.05 * math.cos(t[0] * 3.0), accel_z=9.81, gyro_x=0.01, gyro_y=0.0, gyro_z=0.05, ) node.send("imu.data", imu) return horus.Node( name="imu_sensor", tick=tick, rate=100, order=0, on_miss="skip", # one stale reading is acceptable pubs=[horus.Imu], ) # ---- PID Controller (50 Hz) ------------------------------------- def make_controller(): gc.disable() # no GC pauses in the control loop integral = [0.0] target_speed = 0.5 def tick(node): imu = node.recv("imu.data") if imu is None: return # PID compute --- no allocations in the fast path error = target_speed - imu.accel_x integral[0] += error * horus.dt() command = 2.0 * error + 0.1 * integral[0] cmd = horus.CmdVel(linear=command, angular=0.0) node.send("cmd_vel", cmd) # Collect GC only if budget allows if horus.budget_remaining() > 0.003: gc.collect(generation=0) def enter_safe_state(node): # Called when deadline is missed (on_miss="safe_mode") node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) node.log_warning("Controller entered safe state --- zero velocity") return horus.Node( name="pid_controller", tick=tick, shutdown=enter_safe_state, rate=50, order=10, budget=10 * ms, deadline=18 * ms, on_miss="safe_mode", # zero velocity on deadline miss subs=[horus.Imu], pubs=[horus.CmdVel], ) # ---- Path Planner (10 Hz, Compute) ------------------------------ def make_planner(): waypoints = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)] idx = [0] def tick(node): odom = node.recv("odom") if odom is None: return wx, wy = waypoints[idx[0] % len(waypoints)] dx = wx - odom.x dy = wy - odom.y dist = math.sqrt(dx * dx + dy * dy) if dist < 0.2: idx[0] += 1 
heading = math.atan2(dy, dx) node.send("plan.target", { "heading": heading, "distance": dist, "waypoint": idx[0] % len(waypoints), }) return horus.Node( name="path_planner", tick=tick, rate=10, order=50, compute=True, # thread pool --- does not block RT nodes subs=["odom"], pubs=["plan.target"], ) # ---- Safety Monitor (100 Hz) ------------------------------------ def make_safety(): def tick(node): cmd = node.recv("cmd_vel") if cmd is None: return if abs(cmd.linear) > 2.0: node.log_warning(f"Unsafe velocity: {cmd.linear:.2f} m/s") node.request_stop() if abs(cmd.angular) > 1.5: node.log_warning(f"Unsafe angular rate: {cmd.angular:.2f} rad/s") node.request_stop() return horus.Node( name="safety_monitor", tick=tick, rate=100, order=1, # runs right after IMU, before controller budget=2 * ms, deadline=5 * ms, on_miss="stop", # stop everything if safety check is late priority=95, # highest priority RT thread subs=[horus.CmdVel], ) # ---- Main ------------------------------------------------------- print("Starting multi-rate robot controller") print(" IMU Sensor: 100 Hz (Rt, on_miss=skip)") print(" PID Controller: 50 Hz (Rt, on_miss=safe_mode)") print(" Path Planner: 10 Hz (Compute)") print(" Safety Monitor: 100 Hz (Rt, on_miss=stop, priority=95)") print() sched = horus.Scheduler( tick_rate=100, rt=True, watchdog_ms=500, max_deadline_misses=5, ) sched.add(make_imu()) sched.add(make_controller()) sched.add(make_planner()) sched.add(make_safety()) sched.run() ``` Press Ctrl+C to stop. The timing report shows budget and deadline statistics for each node. 
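As a cross-check on that report, the effective budget and deadline of each Rt node can be worked out by hand from the rules in Step 2 (80% and 95% of the period unless overridden). The sketch below is plain Python with no HORUS dependency; `effective_timing`, `NODES`, and `timing_table` are hypothetical helper names used only for illustration.

```python
MS = 1e-3  # seconds per millisecond


def effective_timing(rate_hz, budget=None, deadline=None):
    """Return (period, budget, deadline) in seconds for an Rt node.

    Missing values fall back to the auto-derived defaults from Step 2.
    """
    period = 1.0 / rate_hz
    if budget is None:
        budget = 0.80 * period
    if deadline is None:
        deadline = 0.95 * period
    return period, budget, deadline


# Rt nodes from the complete system above. The Compute planner is left out
# because it has no budget or deadline enforcement.
NODES = {
    "imu_sensor": {"rate_hz": 100},
    "pid_controller": {"rate_hz": 50, "budget": 10 * MS, "deadline": 18 * MS},
    "safety_monitor": {"rate_hz": 100, "budget": 2 * MS, "deadline": 5 * MS},
}


def timing_table():
    """Return (name, period_ms, budget_ms, deadline_ms) rows, validating each."""
    rows = []
    for name, cfg in NODES.items():
        period, budget, deadline = effective_timing(**cfg)
        # A sane configuration keeps budget <= deadline <= period.
        assert 0 < budget <= deadline <= period, name
        rows.append((name, period / MS, budget / MS, deadline / MS))
    return rows


if __name__ == "__main__":
    for name, period, budget, deadline in timing_table():
        print(f"{name:15s} period={period:6.2f} ms  "
              f"budget={budget:6.2f} ms  deadline={deadline:6.2f} ms")
```

If a measured tick time in the report regularly approaches the budget computed here, that node is a candidate for a lower rate, `compute=True`, or a port to Rust.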
## Step 7: Verify Topic Rates After starting the system, open a second terminal and check that each node is publishing at the expected rate: ```bash # Check IMU is publishing at ~100 Hz horus topic hz imu.data # Expected: ~100.0 Hz # Check PID controller output at ~50 Hz horus topic hz cmd_vel # Expected: ~50.0 Hz # Check planner output at ~10 Hz horus topic hz plan.target # Expected: ~10.0 Hz ``` If a topic's measured rate is significantly lower than expected, a node is missing deadlines or being throttled. Check the scheduler logs for deadline miss warnings. You can also echo a topic to see live data: ```bash horus topic echo cmd_vel horus topic echo imu.data ``` Or use the full monitoring dashboard: ```bash horus monitor ``` ## Python Real-Time: What Works, What Doesn't Python runs on CPython, which has a Global Interpreter Lock (GIL) and a garbage collector. Both affect timing predictability. ### Practical Frequency Limits | Frequency | Period | Python viable? | Notes | |-----------|--------|---------------|-------| | 1-10 Hz | 100-1000 ms | Yes | Huge budget, GIL overhead is negligible | | 10-50 Hz | 20-100 ms | Yes | Plenty of time for Python + ML inference | | 50-100 Hz | 10-20 ms | Yes, with care | Budget is tight but achievable for simple logic | | 100-500 Hz | 2-10 ms | Marginal | GC pauses (1-5 ms) can blow the budget | | 500+ Hz | <2 ms | No | GIL + GC make consistent timing impossible | **The practical ceiling for Python RT is about 100 Hz.** At 100 Hz, your tick budget is 8 ms. A typical Python `tick()` doing sensor reads and simple math takes 0.1-2 ms, leaving plenty of margin. At 500 Hz, the budget drops to 1.6 ms, where a single garbage collection pause blows through the deadline. ### GC Mitigation The PID controller in Step 6 uses `gc.disable()` and manual `gc.collect(generation=0)` during budget headroom. This pattern eliminates GC pauses from the critical path: ```python gc.disable() # no automatic GC def tick(node): # ... 
fast-path control logic, no allocations ... if horus.budget_remaining() > 0.003: # 3 ms headroom gc.collect(generation=0) # minor collection only ``` Only use this for nodes where GC pauses are unacceptable. For most nodes at 10-50 Hz, the default garbage collector works fine. ### When to Use Rust Instead For this tutorial's architecture: - **IMU at 100 Hz** --- Python works, but you are near the ceiling. If timing jitter is a problem, move to Rust. - **PID at 50 Hz** --- Python is comfortable. 20 ms period gives plenty of budget. - **Planner at 10 Hz** --- Python is ideal. Complex algorithms, ML inference, data structures --- Python's strengths. - **Safety at 100 Hz** --- Python works for monitoring. For hard safety-critical nodes, consider Rust. For motor control at 1 kHz+, use the [Rust Real-Time Control](/tutorials/realtime-control) tutorial. Python cannot reliably sustain sub-millisecond tick budgets. ## Key Takeaways - **`rate=` implies RT** --- auto-derives budget (80%), deadline (95%), and promotes to Rt execution class. Override with explicit `budget=` and `deadline=` as needed. - **Safety is explicit** --- always set `on_miss=` for safety-critical nodes. `"safe_mode"` zeros outputs. `"stop"` halts the scheduler. - **Separate by execution class** --- keep control loops on RT threads, move heavy computation to `compute=True`. - **`rt=True` degrades gracefully** --- request RT on the Scheduler, check `sched.has_full_rt()` in production. - **Budget and deadline are in seconds** --- use `horus.us` (1e-6) and `horus.ms` (1e-3) for readability. `budget=300` means 300 seconds, not 300 microseconds. - **Python is fine for 10-100 Hz** --- sensor fusion, navigation, ML inference, safety monitoring. Use Rust for 500+ Hz motor loops. 
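The unit pitfall in the takeaways above can be caught before the scheduler starts with a small guard. This is a sketch under stated assumptions: `US` and `MS` mirror the documented values of `horus.us` and `horus.ms`, while `check_budget` is a hypothetical helper and not part of the HORUS API.

```python
US = 1e-6  # same value as horus.us
MS = 1e-3  # same value as horus.ms


def check_budget(rate_hz, budget_s):
    """Return budget_s unchanged, or raise if it cannot fit in one tick period."""
    period_s = 1.0 / rate_hz
    if budget_s <= 0:
        raise ValueError("budget must be positive")
    if budget_s > period_s:
        raise ValueError(
            f"budget of {budget_s} s exceeds the {period_s} s period at "
            f"{rate_hz} Hz; was a unit such as MS or US forgotten?"
        )
    return budget_s


if __name__ == "__main__":
    print(check_budget(1000, 300 * US))  # 300 microseconds fits in a 1 ms period
    try:
        check_budget(1000, 300)          # 300 seconds: almost certainly a typo
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Wrapping each `budget=` and `deadline=` value in a check like this turns a silent misconfiguration into an immediate error at startup.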
## Next Steps - [Deterministic Mode](/advanced/deterministic-mode) --- reproducible simulations with fixed timesteps - [Safety Monitor](/advanced/safety-monitor) --- graduated watchdog and health states - [Testing and Deterministic Mode (Python)](/python/testing) --- `tick_once()`, deterministic mode, and pytest patterns - [Real-Time Control (Rust)](/tutorials/realtime-control) --- same tutorial in Rust, with 1 kHz motor control ## See Also - [Real-Time Systems (Python)](/python/real-time) --- budget, deadline, and miss policy reference - [Execution Classes (Python)](/python/execution-classes) --- how RT auto-detection works - [Scheduler Deep-Dive (Python)](/python/scheduler-guide) --- full scheduler reference - [Python Bindings](/python/api/python-bindings) --- complete API surface --- ## Tutorial 6: Services & Actions (C++) Path: /tutorials/06-services-actions-cpp Description: Request/response RPC and long-running tasks with progress feedback # Tutorial 6: Services & Actions (C++) Topics are fire-and-forget. Sometimes you need a **response** (services) or **progress updates** (actions). This tutorial covers both. ## What You'll Learn - `horus::ServiceClient` / `horus::ServiceServer` for request/response - `horus::ActionClient` / `horus::ActionServer` for long-running tasks - JSON-based type erasure for flexible RPC - Cross-process service calls ## Services: Request/Response A service is like a function call across processes. Client sends a request, server returns a response. 
### Example: Add Two Numbers ```cpp #include #include #include using namespace horus::literals; int main() { // ── Server: listens for requests, computes response ───────── horus::ServiceServer server("add_two_ints"); // Handler receives raw bytes, writes response server.set_handler([](const uint8_t* req, size_t req_len, uint8_t* res, size_t* res_len) -> bool { // Parse request JSON (reject requests that would overflow the buffer) char json[4096] = {}; if (req_len >= sizeof(json)) return false; std::memcpy(json, req, req_len); // Simple parsing (production: use a JSON library). // Spaces in the format match any amount of whitespace, including none. int a = 0, b = 0; if (std::sscanf(json, R"({"a": %d, "b": %d})", &a, &b) != 2) return false; // Compute response int sum = a + b; *res_len = std::snprintf(reinterpret_cast<char*>(res), 4096, R"({"sum":%d})", sum); return true; }); // ── Client: sends request, waits for response ─────────────── horus::ServiceClient client("add_two_ints"); auto response = client.call(R"({"a": 3, "b": 4})", std::chrono::milliseconds(1000)); if (response) { std::printf("Response: %s\n", response->c_str()); // Output: Response: {"sum":7} } else { std::printf("Service call timed out\n"); } } ``` ### How Services Work ``` Client Server │ │ │─── Request JSON ──────────→ │ │ (via SHM topic) │ handler() called │ │ computes response │ ←── Response JSON ──────── │ │ (via SHM topic) │ ``` Under the hood, services use `JsonWireMessage` Pod transport over two SHM topics: - `{service_name}.request` — client publishes, server subscribes - `{service_name}.response.{client_pid}` — server publishes, client subscribes ## Actions: Long-Running Tasks with Progress Actions are for tasks that take time — navigating to a goal, calibrating a sensor, recording data. 
### Example: Navigate to Goal ```cpp #include #include using namespace horus::literals; int main() { // ── Client: send goal, monitor progress ───────────────────── horus::ActionClient client("navigate"); auto goal = client.send_goal(R"({"target_x": 5.0, "target_y": 3.0})"); if (!goal) { std::printf("Failed to send goal\n"); return 1; } std::printf("Goal sent (id=%lu), status: Pending\n", goal.id()); // Monitor status while (goal.is_active()) { // In a real system, poll feedback topic std::printf(" Status: %s\n", goal.status() == horus::GoalStatus::Pending ? "Pending" : goal.status() == horus::GoalStatus::Active ? "Active" : "?"); break; // demo — normally poll in tick loop } // Cancel if needed // goal.cancel(); std::printf("Final status: %d\n", static_cast<int>(goal.status())); } ``` ### Action Lifecycle ``` Client Server │ │ │── Goal JSON ──────────────→ │ accept_handler() → accept/reject │ │ │ ←── Feedback JSON ──────── │ (periodic progress updates) │ ←── Feedback JSON ──────── │ │ ←── Feedback JSON ──────── │ │ │ │ ←── Result JSON ────────── │ (final result) │ │ │── Cancel ──────────────────→ │ (optional, client-initiated) ``` ### Action Server ```cpp horus::ActionServer server("navigate"); // Accept handler: decide whether to accept the goal server.set_accept_handler([](const uint8_t* goal, size_t len) -> uint8_t { // 0 = accept, 1 = reject return 0; // accept all goals }); // Execute handler: called when goal is accepted server.set_execute_handler([](uint64_t goal_id, const uint8_t* goal, size_t len) { // Start navigation... // Publish feedback periodically // Publish result when done }); ``` ## Cross-Process Services Services work across processes — client and server can be separate binaries: ```bash # Terminal 1: server ./my_service_server # Terminal 2: client ./my_service_client ``` Both connect through SHM. The service name must match on both sides, because the request and response topic names are derived from it. 
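Because both binaries must agree on names, it can help to see how the two SHM topic names follow from the service name. The sketch below applies the `{service_name}.request` and `{service_name}.response.{client_pid}` convention described under How Services Work. It is written in Python for brevity, and `service_topics` is a hypothetical helper, not a HORUS function.

```python
import os


def service_topics(service_name, client_pid):
    """Return (request_topic, response_topic) for one client of a service."""
    request_topic = f"{service_name}.request"
    response_topic = f"{service_name}.response.{client_pid}"
    return request_topic, response_topic


if __name__ == "__main__":
    # Every client publishes to the same request topic, but each one listens
    # on its own response topic, keyed by its process ID.
    request, response = service_topics("add_two_ints", os.getpid())
    print("client publishes to:  ", request)
    print("client subscribes to: ", response)
```

Under this convention, a typo in the service name on either side yields two topics that never meet, which shows up on the client as a timeout rather than an error.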
## When To Use What | Pattern | Use Case | Latency | |---------|----------|---------| | **Topic** | Continuous data (sensor readings, commands) | ~15 ns | | **Service** | One-shot query (get parameter, check status) | ~5 us (JSON round-trip) | | **Action** | Long task with progress (navigate, calibrate) | ~5 us + task duration | ## Key Takeaways - Services = synchronous request/response (like function calls) - Actions = asynchronous goal/feedback/result (like background tasks) - Both use `JsonWireMessage` for type-erased communication - Both work same-process and cross-process via SHM - The client must pass a timeout to `call()`; services run over SHM rather than a network, so a timeout usually means the server is busy or not running - Actions support cancellation via `goal.cancel()` --- ## Tutorial 7: Parameters Deep Dive (C++) Path: /tutorials/07-params-deep-dive-cpp Description: Runtime configuration with typed parameters, live tuning, and persistence # Tutorial 7: Parameters Deep Dive (C++) Parameters let you change robot behavior at runtime without recompiling. Tune PID gains, adjust speed limits, enable/disable features — all while the robot is running. 
## What You'll Learn - `horus::Params` for typed key-value storage - `get()` with defaults for safe access - Live tuning from `horus param` CLI - Organizing parameters by subsystem ## Basic Usage ```cpp #include using namespace horus::literals; class TunableController : public horus::Node { public: TunableController(horus::Params& params) : Node("tunable_ctrl"), params_(params) { cmd_sub_ = subscribe("cmd_vel"); out_pub_ = advertise("motor.cmd"); } void tick() override { // Read parameters every tick — picks up live changes double max_speed = params_.get("max_speed", 0.5); double gain = params_.get("controller_gain", 0.8); bool enabled = params_.get("motor_enabled", true); if (!enabled) { horus::msg::CmdVel stop{}; out_pub_->send(stop); return; } auto cmd = cmd_sub_->recv(); if (!cmd) return; double scaled = cmd->get()->linear * gain; scaled = std::min(scaled, max_speed); horus::msg::CmdVel out{}; out.linear = static_cast(scaled); out_pub_->send(out); } private: horus::Params& params_; horus::Subscriber* cmd_sub_; horus::Publisher* out_pub_; }; int main() { horus::Params params; // Set defaults params.set("max_speed", 0.5); params.set("controller_gain", 0.8); params.set("motor_enabled", true); params.set("robot_name", "atlas"); horus::Scheduler sched; sched.tick_rate(100_hz); TunableController ctrl(params); sched.add(ctrl).order(10).build(); sched.spin(); } ``` ## Supported Types | Type | `set()` | `get()` | `get_*()` | |------|---------|------------|-----------| | `double` | `set("key", 1.5)` | `get("key", 0.0)` | `get_f64("key")` → `optional` | | `int64_t` | `set("key", int64_t(42))` | `get("key", 0)` | `get_i64("key")` → `optional` | | `bool` | `set("key", true)` | `get("key", false)` | `get_bool("key")` → `optional` | | `string` | `set("key", "value")` | `get("key", "")` | `get_string("key")` → `optional` | ## Organizing Parameters Group by subsystem: ```cpp // Locomotion params.set("loco.max_speed", 0.5); params.set("loco.max_angular", 1.0); 
params.set("loco.wheel_base", 0.3); // PID params.set("pid.kp", 2.0); params.set("pid.ki", 0.1); params.set("pid.kd", 0.05); // Safety params.set("safety.min_distance", 0.3); params.set("safety.estop_enabled", true); ``` ## Pattern: Parameter Validation ```cpp void init() override { double kp = params_.get("pid.kp", -1.0); if (kp < 0) { horus::log::error("pid", "pid.kp must be >= 0"); return; } double max_speed = params_.get("loco.max_speed", 0.5); if (max_speed > 2.0) { horus::log::warn("pid", "max_speed > 2.0 m/s — are you sure?"); } } ``` ## Key Takeaways - Read params every tick — captures live changes - Always provide defaults: `get("key", 0.0)` never fails - Use dotted naming (`pid.kp`, `loco.max_speed`) to organize - Validate in `init()` — log warnings for dangerous values - `horus::Params` is thread-safe for concurrent read/write --- ## Cross-Language Interop: C++ + Rust + Python Path: /tutorials/cross-language-interop Description: Run C++, Rust, and Python nodes in the same system, sharing topics over shared memory # Cross-Language Interop HORUS lets you mix C++, Rust, and Python in the same system. All three languages share the same shared-memory ring buffers — a C++ publisher and a Python subscriber read from the exact same memory. Zero serialization, zero copying for Pod types. ## How It Works ``` C++ Process Rust Process Python Process ┌──────────┐ ┌──────────┐ ┌──────────┐ │ Publisher │──SHM───→│Subscriber│ │ │ │ CmdVel │ │ │ CmdVel │──SHM───→│Subscriber│ └──────────┘ │ └──────────┘ │ CmdVel │ │ └──────────┘ └─ Same /dev/shm/horus_default/topics/cmd_vel file ``` All three open the same SHM file. The ring buffer header and data layout are identical regardless of language. ## Requirements For cross-language communication: 1. Use the **same topic name** (e.g., `"cmd_vel"`) 2. Use the **same message type** (e.g., `CmdVel` — a Pod struct with identical memory layout) 3. 
Processes must run on the **same machine** (SHM is local) ## Example: C++ Sensor + Python AI + Rust Controller ### C++ Sensor Driver (10 Hz) ```cpp #include int main() { horus::Scheduler sched; horus::Publisher pub("sensor.data"); sched.add("lidar_driver") .order(0) .tick([&] { auto msg = pub.loan(); msg->linear = 1.2f; // range reading msg->angular = 0.0f; pub.publish(std::move(msg)); }) .build(); sched.spin(); } ``` ### Python AI Processor ```python import horus def detector_tick(node): scan = node.recv("sensor.data") if scan is not None: # Run ML inference on sensor data obstacle = scan.linear < 0.5 node.send("ai.decision", horus.CmdVel( linear=0.0 if obstacle else 0.3, angular=0.5 if obstacle else 0.0 )) horus.run( horus.Node( name="ai_detector", subs=["sensor.data"], pubs=["ai.decision"], tick=detector_tick, rate=10 ) ) ``` ### Rust Motor Controller (100 Hz) ```rust use horus::prelude::*; node! { MotorCtrl { sub { cmd: CmdVel -> "ai.decision" } data { last_linear: f64 = 0.0 } tick { if let Some(c) = self.cmd.recv() { self.last_linear = c.linear as f64; // Apply to motors... } } } } fn main() -> Result<()> { let mut sched = Scheduler::new().tick_rate(100.hz()); sched.add(MotorCtrl::new()).build()?; sched.run() } ``` ### Run All Three ```bash # Terminal 1: C++ sensor horus run --project cpp_sensor/ # Terminal 2: Python AI horus run --project py_detector/ # Terminal 3: Rust controller horus run --project rust_ctrl/ ``` All three share topics over SHM. Use `horus topic list` to see all active topics from any terminal. 
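Sharing works because every language reads and writes the same bytes. The sketch below illustrates that idea with the Python standard library only. The two-float layout of `CmdVelPod` is an assumption made for illustration (the real `CmdVel` may carry more fields), and `encode` / `decode` are hypothetical helpers, not HORUS APIs.

```python
import ctypes
import struct


class CmdVelPod(ctypes.Structure):
    """Hypothetical stand-in for a Pod message: two 32-bit floats in C layout."""
    _fields_ = [("linear", ctypes.c_float), ("angular", ctypes.c_float)]


def encode(linear, angular):
    """Produce the raw bytes a C or C++ writer would place in shared memory."""
    return bytes(CmdVelPod(linear, angular))


def decode(raw):
    """Read the same bytes back without any schema or serialization step."""
    return struct.unpack("=ff", raw)


if __name__ == "__main__":
    raw = encode(0.3, 0.5)
    linear, angular = decode(raw)
    print(f"{len(raw)} bytes, linear={linear:.3f}, angular={angular:.3f}")
```

The struct view and the `struct.unpack` view agree because both describe a fixed, padding-free layout. That agreement is what requirement 2 (same message type) is protecting: if two processes disagree on the type, they reinterpret each other's bytes incorrectly.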
## Message Type Compatibility For cross-language topics, use **typed Pod messages** (not generic dicts): | C++ Type | Rust Type | Python Type | |----------|-----------|-------------| | `horus::msg::CmdVel` | `horus_library::CmdVel` | `horus.CmdVel` | | `horus::msg::Imu` | `horus_library::Imu` | `horus.Imu` | | `horus::msg::LaserScan` | `horus_library::LaserScan` | `horus.LaserScan` | | `horus::msg::Odometry` | `horus_library::Odometry` | `horus.Odometry` | | `horus::msg::JointState` | `horus_library::JointState` | `horus.JointState` | | `horus::msg::Pose2D` | `horus_library::Pose2D` | `horus.Pose2D` | | `horus::msg::Twist` | `horus_library::Twist` | `horus.Twist` | All share the same `#[repr(C)]` memory layout. A CmdVel written by C++ is byte-identical to one written by Rust or Python. ## CLI Works Across Languages No matter which language published the message: ```bash horus topic list # shows topics from C++, Rust, and Python nodes horus topic echo cmd_vel # receives messages regardless of publisher language horus node list # shows all nodes with PID and language ``` ## Performance | Path | Latency | |------|---------| | Rust to Rust (same process) | 11 ns | | C++ to C++ (same process) | 15 ns (11 + 4 FFI) | | C++ to Rust (cross-process) | 170 ns | | Rust to Python (cross-process) | ~300 ns | | C++ to Python (cross-process) | ~300 ns | Cross-process latency is dominated by SHM synchronization, not language boundaries. 
## When to Use Each Language | Component | Recommended Language | Why | |-----------|---------------------|-----| | Sensor drivers | C++ or Rust | Direct hardware access, low latency | | AI/ML inference | Python | PyTorch, TensorFlow, NumPy ecosystem | | Control loops | Rust or C++ | Determinism, real-time guarantees | | Monitoring/logging | Python | Rapid development, visualization | | Safety systems | Rust | Compile-time guarantees, no undefined behavior in safe code | | Legacy integration | C++ | Existing codebase compatibility | --- ## Tutorial: Real-Time Control (C++) Path: /tutorials/realtime-control-cpp Description: Build a 1kHz motor control loop with SCHED_FIFO, jitter measurement, and failsafe # Tutorial: Real-Time Control (C++) Build a 1 kHz motor control loop — the kind that runs real industrial servos. Measure jitter, enforce deadlines, and prove your system meets timing requirements. ## What You'll Build A control loop that: - Targets a 1000 Hz tick rate - Reads encoder + computes PID + writes motor in < 500us - Measures tick-to-tick jitter - Fails safe (stops motor) if any tick exceeds 900us ## Complete Code ```cpp #include #include #include #include #include using namespace horus::literals; class RtMotorLoop : public horus::Node { public: RtMotorLoop() : Node("rt_motor_1khz") { cmd_sub_ = subscribe("motor.target"); enc_sub_ = subscribe("encoder.rpm"); pwm_pub_ = advertise("motor.pwm"); } void init() override { last_tick_ = std::chrono::steady_clock::now(); horus::log::info("rt_motor", "1kHz RT loop initialized"); } void tick() override { auto now = std::chrono::steady_clock::now(); auto dt_us = std::chrono::duration_cast<std::chrono::microseconds>( now - last_tick_).count(); last_tick_ = now; // Track tick-to-tick interval statistics if (tick_count_ > 10) { // skip first 10 ticks (startup) if (dt_us > max_jitter_us_) max_jitter_us_ = dt_us; if (dt_us < min_jitter_us_) min_jitter_us_ = dt_us; jitter_sum_ += dt_us; } tick_count_++; // Read target velocity if (auto cmd = cmd_sub_->recv()) { target_rpm_ = 
cmd->get()->linear; } // Read actual RPM double actual_rpm = 0; if (auto enc = enc_sub_->recv()) { actual_rpm = enc->get()->linear; } // PID (tuned for motor dynamics) double error = target_rpm_ - actual_rpm; integral_ += error * 0.001; // dt = 1ms integral_ = std::clamp(integral_, -100.0, 100.0); double derivative = (error - prev_error_) * 1000.0; prev_error_ = error; double output = 0.5 * error + 0.01 * integral_ + 0.001 * derivative; output = std::clamp(output, -1.0, 1.0); horus::msg::CmdVel pwm{}; pwm.linear = static_cast<float>(output); pwm_pub_->send(pwm); // Report jitter every 1000 ticks (1 Hz) if (tick_count_ % 1000 == 0) { double avg = static_cast<double>(jitter_sum_) / (tick_count_ - 10); char buf[128]; std::snprintf(buf, sizeof(buf), "jitter: min=%ldus avg=%.0fus max=%ldus (target=1000us)", min_jitter_us_, avg, max_jitter_us_); horus::log::info("rt_motor", buf); } } void enter_safe_state() override { horus::msg::CmdVel stop{}; pwm_pub_->send(stop); integral_ = 0; char buf[128]; std::snprintf(buf, sizeof(buf), "SAFE STATE — max jitter was %ldus", max_jitter_us_); horus::blackbox::record("rt_motor", buf); } private: horus::Subscriber* cmd_sub_; horus::Subscriber* enc_sub_; horus::Publisher* pwm_pub_; double target_rpm_ = 0; double integral_ = 0, prev_error_ = 0; int tick_count_ = 0; long jitter_sum_ = 0; long max_jitter_us_ = 0; long min_jitter_us_ = 999999; std::chrono::steady_clock::time_point last_tick_; }; int main() { horus::Scheduler sched; sched.tick_rate(1000_hz) .name("rt_control") .prefer_rt(); RtMotorLoop motor; sched.add(motor) .order(0) .budget(500_us) // must finish in 500us (50% of 1ms period) .deadline(900_us) // absolute deadline 900us .on_miss(horus::Miss::SafeMode) .pin_core(3) // dedicated CPU core .priority(95) // near-max SCHED_FIFO priority .watchdog(100_ms) // if stuck for 100ms, trigger safety .build(); std::printf("1kHz RT motor loop starting...\n"); std::printf("Monitor: horus log (see jitter stats)\n"); sched.spin(); } ``` ## Understanding 
the Timing Budget

```
|←────────── 1000 us (1 kHz period) ──────────→|
|── budget ──|── slack ──|── deadline ──|
|   500 us   |           |    900 us    |
                                        ^ Miss::SafeMode triggers here
```

- **budget(500us)**: expected max computation time
- **deadline(900us)**: absolute latest the tick can finish
- **slack**: 400us buffer between budget and deadline for OS scheduling jitter
- If tick exceeds 900us → `enter_safe_state()` called → motor stops

## Jitter Measurement

The code measures tick-to-tick timing:

- **Ideal**: every tick exactly 1000us apart
- **Real (no RT kernel)**: 950-1200us typical, spikes to 5000us+
- **Real (PREEMPT_RT)**: 995-1005us, max spike ~50us

## Key Takeaways

- 1 kHz control is achievable on commodity Linux with HORUS
- `pin_core()` is critical — prevents OS migration mid-tick
- `budget()` + `deadline()` + `Miss::SafeMode` = automatic safe state on timing failure
- Measure jitter to prove your system meets requirements
- `prefer_rt()` for development, `require_rt()` for production
- BlackBox records the max jitter at safe-state for post-mortem analysis

---

## Tutorial 8: Multi-Process Systems (C++)

Path: /tutorials/08-multi-process-cpp
Description: Run multiple C++ processes sharing topics over shared memory

# Tutorial 8: Multi-Process Systems (C++)

Real robots run multiple processes — a sensor driver, a controller, a safety monitor, each as a separate binary. This tutorial shows how to build and run multi-process systems.
## What You'll Learn - Separate binaries sharing SHM topics - Process startup ordering (subscriber before publisher) - Cross-process service calls - Using `horus launch` for multi-process orchestration ## Architecture ``` Process 1: lidar_driver Process 2: controller Process 3: safety ┌────────────────────┐ ┌────────────────────┐ ┌────────────┐ │ Publisher │──SHM─→│ Subscriber │ │ Subscriber │ │ "lidar.scan" │ │ "lidar.scan" │ │ "cmd_vel" │ └────────────────────┘ │ Publisher │──SHM─→│ │ │ "cmd_vel" │ └────────────┘ └────────────────────┘ ``` All three processes open the **same SHM files** in `/dev/shm/horus_default/topics/`. No message broker, no serialization. ## Process 1: Sensor Driver ```cpp // sensor_driver.cpp #include using namespace horus::literals; class LidarSensor : public horus::Node { public: LidarSensor() : Node("lidar") { pub_ = advertise("lidar.scan"); } void tick() override { auto scan = pub_->loan(); for (int i = 0; i < 360; i++) scan->ranges[i] = 2.0f + 0.5f * std::sin(i * 0.1f); pub_->publish(std::move(scan)); } private: horus::Publisher* pub_; }; int main() { horus::Scheduler sched; sched.tick_rate(10_hz).name("lidar_proc"); LidarSensor lidar; sched.add(lidar).order(0).build(); sched.spin(); } ``` ## Process 2: Controller ```cpp // controller.cpp #include using namespace horus::literals; class Controller : public horus::Node { public: Controller() : Node("controller") { scan_sub_ = subscribe("lidar.scan"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; float min_range = 999.0f; for (int i = 0; i < 360; i++) if (scan->get()->ranges[i] < min_range) min_range = scan->get()->ranges[i]; horus::msg::CmdVel cmd{}; cmd.linear = min_range > 0.5f ? 
0.3f : 0.0f; cmd_pub_->send(cmd); } void enter_safe_state() override { horus::msg::CmdVel stop{}; cmd_pub_->send(stop); } private: horus::Subscriber* scan_sub_; horus::Publisher* cmd_pub_; }; int main() { horus::Scheduler sched; sched.tick_rate(50_hz).name("ctrl_proc"); Controller ctrl; sched.add(ctrl).order(0).build(); sched.spin(); } ``` ## Running Multi-Process ```bash # Build both g++ -std=c++17 -I horus_cpp/include -o sensor sensor_driver.cpp \ -L target/debug -lhorus_cpp -lpthread -ldl -lm g++ -std=c++17 -I horus_cpp/include -o controller controller.cpp \ -L target/debug -lhorus_cpp -lpthread -ldl -lm # Run (subscriber process first — creates SHM) LD_LIBRARY_PATH=target/debug ./controller & sleep 0.5 LD_LIBRARY_PATH=target/debug ./sensor & # Monitor from any terminal horus topic list # shows lidar.scan, cmd_vel horus node list # shows lidar, controller (separate PIDs) ``` ## Startup Order Rule **Subscriber must start before publisher** for the first message exchange. The first process to open a topic creates the SHM ring buffer. If the publisher creates it and exits, the data is gone before the subscriber connects. For production, use `horus launch` which manages startup ordering automatically. ## Key Takeaways - Each process has its own `Scheduler` — they don't share a scheduler - Topics are shared via SHM files (`/dev/shm/horus_default/topics/`) - Subscriber should start first to create the ring buffer - `horus topic list` shows topics from ALL processes - No message broker — direct SHM ring buffer, ~170ns cross-process latency - Each process can have different tick rates (10 Hz sensor, 50 Hz controller) --- ## Tutorial 9: Record & Replay (C++) Path: /tutorials/09-record-replay-cpp Description: Record topic data for debugging and replay it offline # Tutorial 9: Record & Replay (C++) Record all topic data while your robot runs, then replay it offline for debugging. This is the robotics equivalent of a flight recorder. 
## What You'll Learn - Using `horus record` CLI to capture topic data - Using `horus replay` to play back recorded data - Using BlackBox for crash analysis - Designing nodes that work with both live and recorded data ## Recording Live Data While your robot is running: ```bash # Record all topics to a file horus record --output session_001.horus # Record specific topics only horus record --topics lidar.scan,cmd_vel,odom --output drive_test.horus # Record for 60 seconds horus record --duration 60 --output minute_log.horus ``` The recording captures every message on every topic with nanosecond timestamps. ## Replaying Recorded Data Play back recorded data — your subscriber nodes see the same messages as during recording: ```bash # Replay at original speed horus replay session_001.horus # Replay at 2x speed (for faster analysis) horus replay session_001.horus --rate 2.0 # Replay specific topics only horus replay session_001.horus --topics lidar.scan,odom ``` ## Designing for Replay The key pattern: separate your **processing nodes** from your **hardware drivers**. Processing nodes subscribe to topics — they don't care if data comes from live hardware or a recording. ```cpp // This controller works identically with live OR recorded data class Controller : public horus::Node { public: Controller() : Node("controller") { scan_sub_ = subscribe("lidar.scan"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; // This code runs the same whether lidar.scan comes from // a real LiDAR or from horus replay float min_range = 999.0f; for (int i = 0; i < 360; i++) { float r = scan->get()->ranges[i]; if (r > 0.01f && r < min_range) min_range = r; } horus::msg::CmdVel cmd{}; cmd.linear = min_range > 0.5f ? 
0.3f : 0.0f; cmd_pub_->send(cmd); } private: horus::Subscriber* scan_sub_; horus::Publisher* cmd_pub_; }; ``` ## BlackBox for Crash Analysis The BlackBox records events even if the process crashes: ```cpp // Record important events during operation horus::blackbox::record("controller", "Obstacle detected at 0.3m"); horus::blackbox::record("safety", "E-stop triggered"); horus::blackbox::record("motor", "Current spike: 15A"); // After a crash, inspect with CLI: // horus blackbox --last 100 // horus log --since "5 minutes ago" ``` ## Workflow ``` 1. Run robot → horus record --output test.horus 2. Bug happens → stop recording 3. Fix controller → edit code, recompile 4. Test with replay → horus replay test.horus 5. Verify fix → no bug? ship it ``` No need to set up hardware again. No need to reproduce the exact scenario. The recording has everything. ## Key Takeaways - `horus record` captures all SHM topic data with timestamps - `horus replay` plays it back — subscribers see the same messages - Design nodes as pure topic processors — they work with live AND recorded data - BlackBox events survive crashes — use for safety-critical logging - Replay at different speeds for fast iteration --- ## Tutorial 10: Write a Reusable Driver (C++) Path: /tutorials/10-write-a-driver-cpp Description: Package a hardware driver as a reusable horus::Node with configuration, diagnostics, and safety # Tutorial 10: Write a Reusable Driver (C++) Turn a hardware interface into a reusable `horus::Node` that other projects can drop in. This tutorial builds a production-quality IMU driver. 
## What You'll Learn - Driver as a self-contained `horus::Node` subclass - Configuration via `horus::Params` - Health monitoring with `Heartbeat` - Diagnostic reporting with `DiagnosticStatus` - Safe shutdown on hardware failure ## The Driver Pattern ``` ┌─────────────────────────────────────┐ │ ImuDriver : horus::Node │ │ │ │ init() → open device, configure │ │ tick() → read data, publish │ │ safe() → zero outputs, close │ │ │ │ Publishes: │ │ "imu.data" (Imu) │ │ "imu.heartbeat"(Heartbeat) │ │ "imu.status" (DiagnosticStatus)│ │ │ │ Params: │ │ imu.port = "/dev/ttyUSB0" │ │ imu.baudrate = 115200 │ │ imu.calibrate = true │ └─────────────────────────────────────┘ ``` ## Complete Code ```cpp #include #include #include #include #include using namespace horus::literals; class ImuDriver : public horus::Node { public: ImuDriver(horus::Params& params) : Node("imu_driver"), params_(params) { imu_pub_ = advertise("imu.data"); hb_pub_ = advertise("imu.heartbeat"); status_pub_ = advertise("imu.status"); } void init() override { const char* port = "/dev/ttyUSB0"; // from params in production int baud = static_cast(params_.get("imu.baudrate", 115200)); fd_ = open(port, O_RDWR | O_NOCTTY | O_NONBLOCK); if (fd_ < 0) { horus::log::error("imu", "Failed to open " + std::string(port)); publish_status(2, "Failed to open serial port"); return; } // Configure serial port (abbreviated) // ... connected_ = true; horus::log::info("imu", "IMU connected on " + std::string(port)); publish_status(0, "Connected and calibrating"); if (params_.get("imu.calibrate", true)) { calibrate(); } } void tick() override { tick_count_++; if (!connected_) return; // Read raw data from IMU (non-blocking) uint8_t buf[64]; int n = fd_ >= 0 ? 
read(fd_, buf, sizeof(buf)) : -1; if (n > 0) { parse_and_publish(buf, n); consecutive_errors_ = 0; } else { consecutive_errors_++; if (consecutive_errors_ > 100) { // 1 second at 100 Hz horus::log::error("imu", "IMU read timeout — 100 consecutive failures"); horus::blackbox::record("imu", "Read timeout > 1s"); publish_status(2, "Read timeout"); connected_ = false; } } // Heartbeat at 1 Hz if (tick_count_ % 100 == 0) { horus::msg::Heartbeat hb{}; std::strncpy(reinterpret_cast(hb.node_id), "imu_driver", 31); hb.sequence = tick_count_ / 100; hb.alive = connected_; hb_pub_->send(hb); } // Status at 0.2 Hz if (tick_count_ % 500 == 0) { publish_status(connected_ ? 0 : 2, connected_ ? "OK" : "Disconnected"); } } void enter_safe_state() override { horus::log::warn("imu", "Safe state — closing device"); if (fd_ >= 0) { close(fd_); fd_ = -1; } connected_ = false; horus::blackbox::record("imu", "Safe state entered"); } private: void calibrate() { // Read 100 samples, compute bias horus::log::info("imu", "Calibrating (hold still)..."); // ... calibration logic ... 
horus::log::info("imu", "Calibration complete"); publish_status(0, "Ready"); } void parse_and_publish(const uint8_t* buf, int len) { // Parse device-specific protocol into Imu message horus::msg::Imu imu{}; imu.orientation[3] = 1.0; // identity quaternion imu.angular_velocity[2] = 0.01; // simulated yaw rate imu.linear_acceleration[2] = 9.81; // gravity imu.timestamp_ns = 0; // Apply calibration offset imu.angular_velocity[2] -= gyro_bias_z_; imu_pub_->send(imu); } void publish_status(uint8_t level, const char* msg) { horus::msg::DiagnosticStatus status{}; status.level = level; std::strncpy(reinterpret_cast(status.message), msg, 255); status.message_len = std::strlen(msg); status_pub_->send(status); } horus::Params& params_; horus::Publisher* imu_pub_; horus::Publisher* hb_pub_; horus::Publisher* status_pub_; int fd_ = -1; bool connected_ = false; int tick_count_ = 0; int consecutive_errors_ = 0; double gyro_bias_z_ = 0.0; }; int main() { horus::Params params; params.set("imu.baudrate", int64_t(115200)); params.set("imu.calibrate", true); horus::Scheduler sched; sched.tick_rate(100_hz).name("imu_node").prefer_rt(); ImuDriver imu(params); sched.add(imu) .order(0) .budget(2_ms) .on_miss(horus::Miss::Warn) .watchdog(5_s) .build(); sched.spin(); } ``` ## What Makes a Good Driver | Aspect | Implementation | |--------|---------------| | **Self-contained** | All hardware access in one Node class | | **Configurable** | Port, baudrate, calibration via Params | | **Monitored** | Heartbeat + DiagnosticStatus published | | **Safe** | `enter_safe_state()` closes device | | **Fault-tolerant** | Consecutive error counting, auto-disable | | **Logged** | `horus::log` for runtime, `blackbox::record` for crashes | | **Non-blocking** | `O_NONBLOCK` on file descriptor | ## Reusing the Driver Other projects add it as a node: ```cpp // In another project's main.cpp ImuDriver imu(params); sched.add(imu).order(0).budget(2_ms).build(); // Any node can subscribe to imu.data auto imu_sub = 
sched.subscribe("imu.data"); ``` ## Key Takeaways - Drivers are `horus::Node` subclasses — portable, reusable, testable - Publish 3 topics: data, heartbeat, status — lets monitoring work automatically - Non-blocking I/O in `tick()` — never block the scheduler - Count consecutive errors — disable after threshold (1 second typical) - `enter_safe_state()` closes hardware safely - Params for configuration — no hardcoded values --- ## Tutorial 4: Custom Message Types Path: /tutorials/04-custom-messages Description: Define custom messages for your robot — the message! macro, manual derives, and GenericMessage for dynamic data # Tutorial 4: Custom Message Types HORUS ships with 70+ standard message types, but every real project needs custom messages for its own hardware, protocols, or data formats. This tutorial covers three approaches. **Prerequisites:** [Tutorial 1](/tutorials/01-sensor-node) completed. **What you'll learn:** - Define POD messages with the `message!` macro (zero-copy IPC) - Define complex messages with manual derives (heap types like `String`, `Vec`) - Use `GenericMessage` for dynamic, cross-language data - Publish and subscribe to custom messages in a multi-node system **Time:** 20 minutes --- ## Approach 1: The `message!` Macro (Recommended) The `message!` macro generates all the boilerplate automatically. Add `#[fixed]` for zero-copy shared memory transport when all fields are primitive types: ```rust // simplified use horus::prelude::*; message! 
{
    #[fixed]
    /// Motor feedback — zero-copy (~50ns)
    MotorFeedback {
        motor_id: u32,
        rpm: f32,
        current_amps: f32,
        temperature_c: f32,
    }
}
```

**What `#[fixed]` generates:**

```rust
// simplified
#[repr(C)]
#[derive(Clone, Copy, Default, Debug, Serialize, Deserialize)]
pub struct MotorFeedback {
    pub motor_id: u32,
    pub rpm: f32,
    pub current_amps: f32,
    pub temperature_c: f32,
}

impl LogSummary for MotorFeedback { /* Debug-based formatting */ }
```

The resulting type is immediately usable with `Topic<MotorFeedback>` — no additional trait implementations needed. `#[fixed]` enables zero-copy shared memory transport (~50ns). For messages with `String`, `Vec`, or other dynamic fields, omit `#[fixed]` (uses serialization at ~167ns).

### Using it

```rust
// simplified
use horus::prelude::*;

message! {
    MotorFeedback {
        motor_id: u32,
        rpm: f32,
        current_amps: f32,
        temperature_c: f32,
    }
}

// Publish
let pub_topic: Topic<MotorFeedback> = Topic::new("motor.feedback")?;
pub_topic.send(MotorFeedback {
    motor_id: 1,
    rpm: 3200.0,
    current_amps: 1.2,
    temperature_c: 45.0,
});

// Subscribe
let sub_topic: Topic<MotorFeedback> = Topic::new("motor.feedback")?;
if let Some(msg) = sub_topic.recv() {
    println!("Motor {} at {} RPM", msg.motor_id, msg.rpm);
}
```

### Multiple messages in one block

You can define several messages in a single `message!` call:

```rust
// simplified
use horus::prelude::*;

message! {
    /// Wheel encoder ticks
    EncoderReading {
        left_ticks: i64,
        right_ticks: i64,
        timestamp_ns: u64,
    }

    /// PID controller output
    PidOutput {
        setpoint: f64,
        measured: f64,
        output: f64,
        error: f64,
    }
}
```

### What types can you use?

A `#[fixed]` message accepts any **fixed-size, Copy** field type:

| Allowed | Not Allowed |
|---------|-------------|
| `f32`, `f64` | `String` |
| `u8`, `u16`, `u32`, `u64` | `Vec<T>` |
| `i8`, `i16`, `i32`, `i64` | `HashMap<K, V>` |
| `bool` | `Option<T>` (heap types) |
| `[f32; 3]`, `[u8; 256]` | `Box<T>` |

For heap-allocated types, use Approach 2.
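The table rules out `String`, yet a short label can still fit in a fixed-size message as a byte array plus a length. The sketch below is plain Rust with no HORUS APIs. `MotorLabel`, `pack_label`, and `unpack_label` are hypothetical names chosen for illustration, and whether a struct like this can be declared through `message!` with `#[fixed]` exactly as written is an assumption to check against the macro's documentation.

```rust
/// Hypothetical fixed-size message: every field is `Copy` with a known size.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MotorLabel {
    pub motor_id: u32,
    pub name_len: u8,
    pub name: [u8; 32],
}

/// Copy at most 32 bytes of `s` into the fixed array,
/// cutting on a UTF-8 character boundary so the stored bytes stay valid.
pub fn pack_label(motor_id: u32, s: &str) -> MotorLabel {
    let mut end = s.len().min(32);
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    let mut name = [0u8; 32];
    name[..end].copy_from_slice(&s.as_bytes()[..end]);
    MotorLabel { motor_id, name_len: end as u8, name }
}

/// Borrow the label back as a `&str`; returns "" if the bytes are not valid UTF-8.
pub fn unpack_label(msg: &MotorLabel) -> &str {
    let len = (msg.name_len as usize).min(msg.name.len());
    std::str::from_utf8(&msg.name[..len]).unwrap_or("")
}
```

The trade-off is a hard length cap in exchange for a type that stays `Copy`. Labels longer than 32 bytes are truncated rather than rejected, which may or may not suit a given message.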
--- ## Approach 2: Manual Derives (Complex Types) When your message needs `String`, `Vec`, `Option`, or nested structs, derive the traits manually: ```rust // simplified use horus::prelude::*; #[derive(Clone, Debug, Serialize, Deserialize)] pub struct RobotConfig { pub name: String, pub joint_names: Vec, pub max_speeds: Vec, pub description: Option, } ``` This type works with `Topic<RobotConfig>` because it implements `Clone + Serialize + Deserialize`. It uses serialization-based transport instead of zero-copy, which adds ~100ns of overhead — still fast, but not as fast as POD types from `message!`. ### Adding LogSummary If you want debug logging on the topic (via `Topic::new("name")?`), implement `LogSummary`: ```rust // simplified use horus::prelude::*; #[derive(Clone, Debug, Serialize, Deserialize)] pub struct RobotConfig { pub name: String, pub joint_names: Vec, pub max_speeds: Vec, pub description: Option, } impl LogSummary for RobotConfig { fn log_summary(&self) -> String { format!("RobotConfig({}, {} joints)", self.name, self.joint_names.len()) } } ``` Or derive it for `Debug`-based formatting: ```rust // simplified #[derive(Clone, Debug, Serialize, Deserialize, LogSummary)] pub struct RobotConfig { pub name: String, pub joint_names: Vec, pub max_speeds: Vec, pub description: Option, } ``` ### Nested types You can nest custom types — just make sure all nested types also derive the required traits: ```rust // simplified use horus::prelude::*; #[derive(Clone, Debug, Serialize, Deserialize)] pub struct WaypointList { pub waypoints: Vec, pub loop_back: bool, } #[derive(Clone, Debug, Serialize, Deserialize)] pub struct Waypoint { pub x: f64, pub y: f64, pub speed: f64, pub label: String, } ``` --- ## Approach 3: GenericMessage (Dynamic Data) `GenericMessage` is a fixed-size buffer (4KB max) that carries MessagePack-serialized data. Use it when you don't know the schema at compile time, or for quick Rust-Python prototyping. 
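To make the "fixed-size buffer (4KB max)" idea concrete, here is a sketch of a fixed-capacity byte container in plain Rust. It is not the HORUS implementation: `FixedPayload` and `CAPACITY` are hypothetical names, the real `GenericMessage` layout may differ, and in practice the bytes would be a MessagePack encoding rather than raw text.

```rust
/// Assumed capacity, matching the 4KB limit quoted above.
const CAPACITY: usize = 4096;

/// Hypothetical fixed-capacity payload: a length plus an inline byte array.
#[derive(Clone, Copy)]
pub struct FixedPayload {
    len: u32,
    data: [u8; CAPACITY],
}

impl FixedPayload {
    /// Copy `bytes` into the inline buffer, rejecting anything over capacity.
    pub fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() > CAPACITY {
            return Err(format!(
                "payload is {} bytes, limit is {}",
                bytes.len(),
                CAPACITY
            ));
        }
        let mut data = [0u8; CAPACITY];
        data[..bytes.len()].copy_from_slice(bytes);
        Ok(Self { len: bytes.len() as u32, data })
    }

    /// View only the bytes that were actually written.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data[..self.len as usize]
    }
}
```

Because the buffer is inline, the type is `Copy` and has the same size regardless of content. That is also why an oversized payload has to fail at construction time instead of growing the buffer.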
### Sending structured data

```rust
// simplified
use horus::prelude::*;
use std::collections::HashMap;

let topic: Topic<GenericMessage> = Topic::new("experiment_data")?;

// from_value() serializes any Serde type into the buffer
let mut data = HashMap::new();
data.insert("trial", 42.0);
data.insert("accuracy", 0.95);

let msg = GenericMessage::from_value(&data)?;
topic.send(msg);
```

### Receiving and deserializing

```rust
// simplified
use horus::prelude::*;
use std::collections::HashMap;

let topic: Topic<GenericMessage> = Topic::new("experiment_data")?;

if let Some(msg) = topic.recv() {
    let data: HashMap<String, f64> = msg.to_value()?;
    println!("Trial {}: accuracy {:.1}%", data["trial"], data["accuracy"] * 100.0);
}
```

### Adding metadata

You can attach a string label (up to 255 bytes) to identify the message type at runtime:

```rust
// simplified
use horus::prelude::*;

let payload = GenericMessage::from_value(&sensor_data)?;

// Or with metadata tag:
let raw = rmp_serde::to_vec(&sensor_data)?;
let payload = GenericMessage::with_metadata(raw, "lidar_v2".to_string())?;

if let Some(msg) = topic.recv() {
    if let Some(tag) = msg.metadata() {
        println!("Got message type: {}", tag);
    }
}
```

### Performance notes

| Message Type | IPC Latency | Max Size |
|-------------|------------|----------|
| `message!` with `#[fixed]` | ~50ns (zero-copy) | Unlimited |
| `message!` (flexible) | ~167ns (serde) | Unlimited |
| `GenericMessage` | ~4.0-4.4μs | 4KB |

Use `#[fixed]` for high-frequency control loops. Use flexible messages for dynamic data. Use `GenericMessage` for prototyping only.

---

## Complete Example: Battery Monitor System

Let's build a 2-node system: a battery sensor publishes custom readings, and a monitor checks for low battery and publishes alerts.

```rust
// simplified
use horus::prelude::*;

// --- Custom Messages ---

message! {
    /// Raw battery sensor data
    BatteryReading {
        cell_count: u32,
        voltage: f32,
        current_amps: f32,
        temperature_c: f32,
        charge_percent: f32,
    }
}

message!
{ /// Alert when battery is low BatteryAlert { severity: u8, // 1=info, 2=warning, 3=critical charge_percent: f32, voltage: f32, } } // --- Battery Sensor Node --- struct BatterySensor { publisher: Topic, tick_count: u32, } impl BatterySensor { fn new() -> Result { Ok(Self { publisher: Topic::new("battery.raw")?, tick_count: 0, }) } } impl Node for BatterySensor { fn name(&self) -> &str { "BatterySensor" } fn tick(&mut self) { self.tick_count += 1; // Simulate draining battery let charge = 100.0 - (self.tick_count as f32 * 2.5); let voltage = 12.6 - (self.tick_count as f32 * 0.3); let reading = BatteryReading { cell_count: 3, voltage, current_amps: 2.1, temperature_c: 35.0 + (self.tick_count as f32 * 0.5), charge_percent: charge.max(0.0), }; println!("[Battery] {:.0}% ({:.1}V)", reading.charge_percent, reading.voltage); self.publisher.send(reading); } } // --- Battery Monitor Node --- struct BatteryMonitor { subscriber: Topic, alert_pub: Topic, } impl BatteryMonitor { fn new() -> Result { Ok(Self { subscriber: Topic::new("battery.raw")?, alert_pub: Topic::new("battery.alert")?, }) } } impl Node for BatteryMonitor { fn name(&self) -> &str { "BatteryMonitor" } fn tick(&mut self) { if let Some(reading) = self.subscriber.recv() { let severity = if reading.charge_percent < 10.0 { println!("[Monitor] CRITICAL: Battery at {:.0}%!", reading.charge_percent); 3 } else if reading.charge_percent < 30.0 { println!("[Monitor] WARNING: Battery at {:.0}%", reading.charge_percent); 2 } else { return; // No alert needed }; self.alert_pub.send(BatteryAlert { severity, charge_percent: reading.charge_percent, voltage: reading.voltage, }); } } } // --- Main --- fn main() -> Result<()> { println!("=== Battery Monitor System ===\n"); let mut scheduler = Scheduler::new().tick_rate(1_u64.hz()); scheduler.add(BatterySensor::new()?).order(0).build()?; scheduler.add(BatteryMonitor::new()?).order(1).build()?; scheduler.run_for(10_u64.secs())?; println!("\nDone!"); Ok(()) } ``` **Expected 
output:**

```text
=== Battery Monitor System ===

[Battery] 97% (12.3V)
[Battery] 95% (12.0V)
...
[Battery] 25% (5.1V)
[Monitor] WARNING: Battery at 25%
[Battery] 22% (4.8V)
[Monitor] WARNING: Battery at 22%
...
[Battery] 5% (3.0V)
[Monitor] CRITICAL: Battery at 5%!
```

The listing is illustrative: it shows the shape of the log, and its percentages and voltages are not computed from the exact drain constants in the code above.

---

## When to Use What

| Approach | Use When | Performance | Heap Types |
|----------|----------|-------------|------------|
| `message!` with `#[fixed]` | Primitive fields only (sensor data, motor commands) | ~50ns (zero-copy) | No |
| `message!` (flexible) | Any fields including `String`, `Vec`, nested structs | ~167ns (serde) | Yes |
| `GenericMessage` | Dynamic schemas, quick prototyping, cross-language | ~4μs | N/A (bytes) |

**Rules of thumb:**

- Start with `#[fixed]` messages — they cover most robotics sensor/actuator use cases
- Drop `#[fixed]` when you need `String`, `Vec`, or other dynamic fields
- Use `GenericMessage` only for prototyping or when the schema isn't known at compile time

---

## Key Takeaways

- **`message!` with `#[fixed]`** is the default choice for sensor and actuator data -- zero-copy at ~50ns
- **Drop `#[fixed]`** when you need `String`, `Vec`, or other heap types -- uses serialization at ~167ns
- **`GenericMessage`** is for prototyping or dynamic schemas only -- ~4μs and 4KB max
- **All three approaches** work with `Topic` and the scheduler -- no special wiring needed
- **`LogSummary`** enables human-readable debug output in the monitor

## Next Steps

- [Message Types Reference](/concepts/message-types) — all 70+ built-in messages
- [POD Topics](/concepts/core-concepts-podtopic) — how zero-copy IPC works
- [Python Custom Messages](/python/api/custom-messages) — using custom messages from Python

---

## See Also

- [Custom Messages (Python)](/tutorials/04-custom-messages-python) — Python version
- [Message Types](/concepts/message-types) — How messages work
- [GenericMessage API](/rust/api/generic-message) — Dynamic message type

---

## Tutorial 5: Hardware Drivers

Path: 
/tutorials/05-hardware-drivers Description: Connect your robot's hardware — configure devices in horus.toml, load them as nodes, and schedule them # Tutorial 5: Hardware Drivers Every robot needs to talk to hardware — servos, LiDARs, cameras, IMUs. In HORUS, **a driver is a node**. You declare what hardware you have in `horus.toml`, call `hardware::load()`, and get back ready-to-use nodes you can add straight to a scheduler. **Prerequisites:** [Tutorial 1](/tutorials/01-sensor-node) completed. **What you'll learn:** - Declare devices in `horus.toml [hardware]` - Load devices as nodes with `hardware::load()` - Read device parameters with `NodeParams` - Write custom drivers with `register_driver!` - Swap real hardware for simulation with `sim = true` **Time:** 15 minutes --- ## Core Idea: A Driver Is a Node There is one concept to remember: every hardware device becomes a `Node`. It publishes sensor data, subscribes to commands, and runs inside the scheduler like any other node. No special handle types, no wrapper layers — just nodes. ``` horus.toml [hardware] ──▶ hardware::load() ──▶ Vec<(String, Box)> ──▶ scheduler.add() ``` --- ## Step 1: Declare Devices in horus.toml Add a `[hardware]` section. Each device has a name and a `use` field that tells HORUS which driver to load: ```toml [package] name = "my-robot" version = "0.1.0" language = "rust" [hardware.arm] use = "dynamixel" port = "/dev/ttyUSB0" baudrate = 1000000 servo_ids = [1, 2, 3, 4, 5] [hardware.lidar] use = "rplidar" port = "/dev/ttyUSB1" baudrate = 256000 [hardware.conveyor] use = "ConveyorDriver" port = "/dev/ttyACM0" speed = 0.5 ``` The `use` field is the only required key. Everything else becomes a typed parameter accessible via `NodeParams`. --- ## Step 2: Load Devices as Nodes Call `hardware::load()` to parse the `[hardware]` section. 
It returns a list of `(name, node)` pairs: ```rust // simplified use horus::prelude::*; use horus::hardware; fn main() -> Result<()> { let devices = hardware::load()?; for (name, _node) in &devices { println!("Loaded device: {}", name); } // → Loaded device: arm // → Loaded device: conveyor // → Loaded device: lidar Ok(()) } ``` Each entry is a `(String, Box)` — add them directly to a scheduler: ```rust // simplified use horus::prelude::*; use horus::hardware; fn main() -> Result<()> { let devices = hardware::load()?; let mut scheduler = Scheduler::new().tick_rate(100_u64.hz()); for (name, node) in devices { println!("Adding {} to scheduler", name); scheduler.add(node).build()?; } scheduler.run() } ``` That is a complete hardware-loading program. Twelve lines. --- ## Step 3: Build a Custom Driver A driver is any struct that implements `Node`. Use `NodeParams` to read configuration from `horus.toml`: ```rust // simplified use horus::prelude::*; use horus::hardware::NodeParams; use horus::register_driver; struct ConveyorDriver { speed: f64, publisher: Topic, } impl ConveyorDriver { fn from_params(params: &NodeParams) -> Result { let speed: f64 = params.get_or("speed", 1.0); Ok(Self { speed, publisher: Topic::new("conveyor.velocity")?, }) } } impl Node for ConveyorDriver { fn name(&self) -> &str { "ConveyorDriver" } fn tick(&mut self) { self.publisher.send(CmdVel::new(self.speed as f32, 0.0)); } } // Register so [hardware.conveyor] use = "ConveyorDriver" works register_driver!(ConveyorDriver, ConveyorDriver::from_params); ``` The corresponding config: ```toml [hardware.conveyor] use = "ConveyorDriver" port = "/dev/ttyACM0" speed = 0.5 ``` When `hardware::load()` encounters `use = "ConveyorDriver"`, it finds the registered factory and calls `from_params()` with the config. The result is a `Box` ready for the scheduler. 
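The lookup described above, where a `use` string is mapped to a registered factory, can be pictured as a map from names to constructor functions. The sketch below is plain Rust and only illustrates the idea: `DriverRegistry`, `Factory`, the stand-in `Node` trait, and the `Params` alias are all hypothetical, and how `register_driver!` actually stores factories inside HORUS is not specified here.

```rust
use std::collections::HashMap;

/// Stand-in for the HORUS `Node` trait (assumption: only the parts needed here).
pub trait Node {
    fn name(&self) -> &str;
    fn tick(&mut self);
}

/// Stand-in for `NodeParams`: numeric settings keyed by name.
pub type Params = HashMap<String, f64>;

/// A factory builds a boxed node from its parameters.
pub type Factory = fn(&Params) -> Result<Box<dyn Node>, String>;

#[derive(Default)]
pub struct DriverRegistry {
    factories: HashMap<String, Factory>,
}

impl DriverRegistry {
    /// Associate a `use = "..."` key with a factory function.
    pub fn register(&mut self, key: &str, factory: Factory) {
        self.factories.insert(key.to_string(), factory);
    }

    /// Look up the factory for `key` and call it with the device's params.
    pub fn create(&self, key: &str, params: &Params) -> Result<Box<dyn Node>, String> {
        let factory = self
            .factories
            .get(key)
            .copied()
            .ok_or_else(|| format!("no driver registered for use = \"{}\"", key))?;
        factory(params)
    }
}

pub struct Conveyor {
    pub speed: f64,
    pub position: f64,
}

impl Node for Conveyor {
    fn name(&self) -> &str {
        "ConveyorDriver"
    }
    fn tick(&mut self) {
        // Integrate speed into position once per tick.
        self.position += self.speed;
    }
}

pub fn conveyor_from_params(params: &Params) -> Result<Box<dyn Node>, String> {
    // Fall back to 1.0 when `speed` is not configured.
    let speed = params.get("speed").copied().unwrap_or(1.0);
    Ok(Box::new(Conveyor { speed, position: 0.0 }))
}
```

With this shape, an unknown `use` value surfaces as an error at load time instead of a missing node at run time.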
--- ## Step 4: Add Configuration to a Driver For more complex drivers, read multiple typed params and wire up topics: ```rust // simplified use horus::prelude::*; use horus::hardware::NodeParams; use horus::register_driver; struct ArmController { servo_ids: Vec, port: String, state_pub: Topic, cmd_sub: Topic, } impl ArmController { fn from_params(params: &NodeParams) -> Result { let servo_ids: Vec = params.get("servo_ids")?; let port: String = params.get("port")?; println!("ArmController: {} servos on {}", servo_ids.len(), port); Ok(Self { servo_ids, port, state_pub: Topic::new("arm.state")?, cmd_sub: Topic::new("arm.command")?, }) } } impl Node for ArmController { fn name(&self) -> &str { "ArmController" } fn tick(&mut self) { if let Some(cmd) = self.cmd_sub.recv() { // Send command to hardware using self.port // Read back state and publish } } fn enter_safe_state(&mut self) { // Stop all servos } } register_driver!(ArmController, ArmController::from_params); ``` Then schedule it with real-time constraints: ```rust // simplified use horus::prelude::*; use horus::hardware; fn main() -> Result<()> { let devices = hardware::load()?; let mut scheduler = Scheduler::new().tick_rate(100_u64.hz()); for (name, node) in devices { match name.as_str() { "arm" => { scheduler.add(node) .order(0) .rate(500_u64.hz()) .on_miss(Miss::SafeMode) .build()?; } _ => { scheduler.add(node).build()?; } } } scheduler.run() } ``` ### NodeParams API Quick Reference | Method | Returns | Use | |--------|---------|-----| | `params.get::(key)` | `Result` | Required param (errors if missing) | | `params.get_or(key, default)` | `T` | Optional param with fallback | | `params.has(key)` | `bool` | Check if key exists | | `params.keys()` | `Iterator<&str>` | List all param names | | `params.raw(key)` | `Option<&toml::Value>` | Raw TOML value | Supported types for `get::()`: `String`, `bool`, `i32`, `i64`, `u8`, `u32`, `u64`, `f32`, `f64`, `Vec`. 
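The difference between a required `get` and an optional `get_or` can be sketched with a small typed-value map. This is an illustration in plain Rust, not the `NodeParams` source: `Value`, `FromValue`, and this `Params` wrapper are hypothetical, and the choice to fall back to the default on a type mismatch (not only on a missing key) is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical config value; real TOML values have more variants.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Str(String),
    Bool(bool),
}

/// Conversion from a stored value into a concrete Rust type.
pub trait FromValue: Sized {
    fn from_value(v: &Value) -> Option<Self>;
}

impl FromValue for i64 {
    fn from_value(v: &Value) -> Option<Self> {
        match v { Value::Int(i) => Some(*i), _ => None }
    }
}

impl FromValue for f64 {
    fn from_value(v: &Value) -> Option<Self> {
        match v {
            Value::Float(f) => Some(*f),
            Value::Int(i) => Some(*i as f64), // accept `speed = 1` as 1.0
            _ => None,
        }
    }
}

impl FromValue for String {
    fn from_value(v: &Value) -> Option<Self> {
        match v { Value::Str(s) => Some(s.clone()), _ => None }
    }
}

impl FromValue for bool {
    fn from_value(v: &Value) -> Option<Self> {
        match v { Value::Bool(b) => Some(*b), _ => None }
    }
}

pub struct Params(pub HashMap<String, Value>);

impl Params {
    /// Required parameter: errors when the key is missing or has the wrong type.
    pub fn get<T: FromValue>(&self, key: &str) -> Result<T, String> {
        let v = self.0.get(key).ok_or_else(|| format!("missing param `{}`", key))?;
        T::from_value(v).ok_or_else(|| format!("param `{}` has the wrong type", key))
    }

    /// Optional parameter: any failure yields `default`.
    pub fn get_or<T: FromValue>(&self, key: &str, default: T) -> T {
        self.get(key).unwrap_or(default)
    }

    pub fn has(&self, key: &str) -> bool {
        self.0.contains_key(key)
    }
}
```

Using `get` for values the driver cannot run without (such as a port) and `get_or` for tunables keeps configuration errors visible at startup.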
--- ## Step 5: Simulation Swap Add `sim = true` to any device entry. When running in simulation mode, HORUS loads the sim variant of that driver instead of the real one: ```toml [hardware.arm] use = "dynamixel" port = "/dev/ttyUSB0" baudrate = 1000000 servo_ids = [1, 2, 3, 4, 5] sim = true [hardware.lidar] use = "rplidar" port = "/dev/ttyUSB1" baudrate = 256000 sim = true [hardware.conveyor] use = "ConveyorDriver" speed = 0.5 ``` The conveyor has no `sim = true` — it uses the same driver in both modes. The arm and lidar swap to simulated versions automatically. Your application code does not change at all: ```rust // simplified — identical to Step 2, works with real or sim hardware let devices = hardware::load()?; let mut scheduler = Scheduler::new().tick_rate(100_u64.hz()); for (_name, node) in devices { scheduler.add(node).build()?; } scheduler.run() ``` --- ## Step 6: Testing Without Hardware Use `hardware::load_from()` to load from a test config file instead of the project's `horus.toml`: ```rust // simplified #[cfg(test)] mod tests { use super::*; use horus::hardware; #[test] fn arm_controller_from_config() { std::fs::write("test_hardware.toml", r#" [hardware.arm] use = "ArmController" port = "/dev/null" servo_ids = [1, 2, 3] "#).unwrap(); let devices = hardware::load_from("test_hardware.toml").unwrap(); assert_eq!(devices.len(), 1); assert_eq!(devices[0].0, "arm"); std::fs::remove_file("test_hardware.toml").ok(); } } ``` --- ## Key Takeaways - **A driver is a node.** No special types — `hardware::load()` returns `Vec<(String, Box)>` - **`horus.toml [hardware]`** separates hardware config from code — swap devices without recompiling - **`use = "..."`** is the one field that identifies a driver. 
HORUS resolves it automatically - **`register_driver!`** connects your custom struct to the `[hardware]` config system - **`sim = true`** swaps individual devices to simulated versions without changing code - **`NodeParams`** gives typed access to config values via `params.get::(key)` ## Next Steps - [Hardware API Reference](/rust/api/hardware) — full API documentation - [Real-Time Control](/tutorials/realtime-control) — scheduling drivers with RT guarantees - [Deployment](/operations) — deploying hardware configs to a robot --- ## See Also - [Hardware & RT (Python)](/tutorials/05-hardware-and-rt-python) — Python version - [Hardware API](/rust/api/hardware) — hardware loading reference --- ## Tutorial 6: Write a Reusable Driver Path: /tutorials/06-write-a-driver Description: Build a hardware driver package that others can install from the HORUS registry — from serial protocol to published package # Tutorial 6: Write a Reusable Driver Tutorial 5 showed how to use hardware. This tutorial shows how to **create a driver** — a reusable hardware driver package that anyone can install with `horus install` and configure in their `horus.toml`. **What you'll build:** A driver for a serial motor controller (generic protocol) that publishes `JointState` and subscribes to `JointCommand` topics. By the end, it's installable from the registry. **Prerequisites:** [Tutorial 5: Hardware Drivers](/tutorials/05-hardware-drivers) completed. **Time:** 30 minutes --- ## Architecture of a Driver Package A driver package is a HORUS project that: 1. Reads config from `NodeParams` (port, baudrate, etc.) 2. Communicates with hardware (serial, I2C, CAN, etc.) 3. Publishes sensor data to topics 4. Subscribes to command topics 5. 
Handles errors and safe states *Figure: Driver architecture: config → params → node → hardware + topics. `NodeParams` feeds the `MotorDriver` node, which talks to the hardware, publishes `joint_state`, and subscribes to `joint_cmd`.* --- ## Step 1: Create the Project ```bash horus new horus-driver-serial-motor cd horus-driver-serial-motor ``` Edit `horus.toml`: ```toml [package] name = "horus-driver-serial-motor" version = "0.1.0" description = "Generic serial motor controller driver for HORUS" type = "driver" license = "Apache-2.0" keywords = ["motor", "serial", "driver", "actuator"] [dependencies] serialport = { version = "4.5", source = "crates.io" } ``` The `type = "driver"` metadata tells the registry this is a driver package, which affects how it shows up in search results. --- ## Step 2: Define the Driver Struct --- ## Step 3: Implement the Node Trait --- ## Step 4: Wire Up main() `hardware::load()` reads your `horus.toml` `[hardware]` section and returns ready-to-use nodes — one `(name, Box<dyn Node>)` pair per device entry.
For a driver package, you typically construct the node directly from params and add it to the scheduler: --- ## Step 5: Test Without Hardware Create a test config and use `hardware::load_from()`: ```rust // simplified #[cfg(test)] mod tests { use super::*; fn test_config() -> &'static str { r#" [hardware.motor] use = "serial" port = "/dev/null" baudrate = 115200 motor_ids = [1, 2, 3] topic_prefix = "test_motor" "# } #[test] fn driver_parses_config() { std::fs::write("/tmp/test_motor_driver.toml", test_config()).unwrap(); let nodes = hardware::load_from("/tmp/test_motor_driver.toml").unwrap(); assert_eq!(nodes.len(), 1); assert_eq!(nodes[0].0, "motor"); std::fs::remove_file("/tmp/test_motor_driver.toml").ok(); } #[test] fn driver_has_safe_state() { // Verify the driver implements enter_safe_state // In a real test, you'd use a mock serial port } } ``` For more thorough testing with simulated serial ports, use the `serialport` crate's `TTYPort::pair()` for virtual serial ports on Linux, or test with `horus run --sim` where the simulator provides virtual motor feedback. 
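The placeholder `driver_has_safe_state` test above can be filled in without a physical port if the driver's I/O is written against `std::io::Read + Write` rather than a concrete serial type. The following is a minimal sketch under that assumption, using only the standard library. `MockPort`, `MotorLink`, and the three-byte frame layout are hypothetical names invented for illustration; they are not part of HORUS or the `serialport` crate.

```rust
use std::io::{self, Read, Write};

/// Hypothetical in-memory stand-in for a serial port.
#[derive(Default)]
struct MockPort {
    written: Vec<u8>, // every byte the driver sent
    to_read: Vec<u8>, // canned bytes the "hardware" replies with
}

impl Write for MockPort {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.written.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Read for MockPort {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.to_read.len());
        buf[..n].copy_from_slice(&self.to_read[..n]);
        self.to_read.drain(..n);
        Ok(n)
    }
}

/// Hypothetical driver core, generic over its transport.
struct MotorLink<P: Read + Write> {
    port: P,
    motor_ids: Vec<u8>,
}

impl<P: Read + Write> MotorLink<P> {
    /// Assumed frame layout: [0xAA, motor id, signed velocity].
    fn set_velocity(&mut self, id: u8, velocity: i8) -> io::Result<()> {
        self.port.write_all(&[0xAA, id, velocity as u8])
    }

    /// Reads one feedback byte; fails if the port has nothing to offer.
    fn read_position(&mut self) -> io::Result<u8> {
        let mut byte = [0u8; 1];
        self.port.read_exact(&mut byte)?;
        Ok(byte[0])
    }

    /// Safe state: command zero velocity on every configured motor.
    fn enter_safe_state(&mut self) -> io::Result<()> {
        for i in 0..self.motor_ids.len() {
            let id = self.motor_ids[i];
            self.set_velocity(id, 0)?;
        }
        Ok(())
    }
}

#[cfg(test)]
mod mock_port_tests {
    use super::*;

    #[test]
    fn safe_state_zeroes_every_motor() {
        let mut link = MotorLink { port: MockPort::default(), motor_ids: vec![1, 2] };
        link.set_velocity(1, 50).unwrap();
        link.enter_safe_state().unwrap();
        // One velocity frame, then one zero-velocity frame per motor.
        assert_eq!(link.port.written, vec![0xAA, 1, 50, 0xAA, 1, 0, 0xAA, 2, 0]);
    }
}
```

Keeping the transport generic means the same driver code can run against a real port in production and against the mock in CI, at the cost of one extra type parameter.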
--- ## Step 6: Publish to the Registry Once your driver works, publish it so others can install it: ```bash # Login (first time only) horus auth login # Verify package metadata horus check # Publish horus publish ``` Users can then install and use your driver: ```bash horus install horus-driver-serial-motor ``` ```toml # Their horus.toml [hardware.arm] use = "horus-driver-serial-motor" port = "/dev/ttyUSB0" baudrate = 115200 motor_ids = [1, 2, 3, 4, 5] topic_prefix = "arm" ``` --- ## Driver Design Checklist Before publishing, verify your driver follows these practices: ### Config - [ ] All hardware-specific values come from `NodeParams`, not hardcoded - [ ] Reasonable defaults for optional params (`get_or()`) - [ ] `topic_prefix` param so users can namespace topics per robot - [ ] Error messages include the port/device name for debugging ### Safety - [ ] `enter_safe_state()` stops all actuators immediately - [ ] `is_safe_state()` returns `true` when motors are stopped - [ ] `shutdown()` cleanly closes the hardware connection - [ ] `on_miss(Miss::SafeMode)` configured for actuator drivers ### Topics - [ ] Follow the [topic naming convention](/getting-started/simulation#topic-naming-convention): `{prefix}.{data_type}` - [ ] Use standard message types (`JointState`, `JointCommand`, `Imu`, `LaserScan`, etc.) 
- [ ] Publish at a consistent rate matching the hardware capability ### Testing - [ ] Config parsing tested with `hardware::load_from()` - [ ] Works with `horus run --sim` (simulator provides mock data) - [ ] Graceful behavior when hardware is disconnected (no panics) ### Package metadata - [ ] `type = "driver"` in `horus.toml [package]` - [ ] `description` explains what hardware it supports - [ ] `keywords` include the hardware name and interface type - [ ] `license` specified --- ## Example: Complete horus.toml for a Multi-Hardware Robot ```toml [package] name = "warehouse-robot" version = "0.1.0" [robot] name = "agv" description = "agv.urdf" [hardware.wheels] use = "horus-driver-serial-motor" port = "/dev/ttyUSB0" motor_ids = [1, 2] topic_prefix = "agv" [hardware.lidar] use = "rplidar" port = "/dev/ttyUSB1" [hardware.arm] use = "horus-driver-serial-motor" port = "/dev/ttyUSB2" motor_ids = [1, 2, 3, 4, 5, 6] topic_prefix = "arm" [hardware.imu] use = "bno055" bus = "/dev/i2c-1" [hardware.sim_wheels] use = "horus-driver-serial-motor" sim = true [hardware.sim_lidar] use = "rplidar" sim = true noise = 0.01 [hardware.sim_arm] use = "horus-driver-serial-motor" sim = true [hardware.sim_imu] use = "bno055" sim = true ``` Notice that `horus-driver-serial-motor` is used **multiple times** — for the wheels, the arm, and their simulated counterparts — each with different config values. The `sim = true` flag marks devices that should use simulated backends instead of real hardware. This is the power of config-driven drivers with the unified `use` field. 
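One way to picture how a single `use` value can back several devices is a name-to-factory lookup. The sketch below is an illustration only: it assumes a simplified `Node` trait, and `DeviceEntry`, `Factory`, `make_serial_motor`, and `load_entries` are hypothetical names. It does not describe how HORUS resolves drivers internally.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the framework's node trait (assumption).
trait Node {
    fn name(&self) -> &str;
    fn is_sim(&self) -> bool;
}

/// Hypothetical parsed form of one `[hardware.<name>]` table.
struct DeviceEntry {
    name: &'static str,
    driver: &'static str, // the `use = "..."` value
    sim: bool,
}

struct SerialMotor {
    name: String,
    sim: bool,
}

impl Node for SerialMotor {
    fn name(&self) -> &str {
        &self.name
    }
    fn is_sim(&self) -> bool {
        self.sim
    }
}

/// A factory builds one node from one config entry.
type Factory = fn(&DeviceEntry) -> Box<dyn Node>;

fn make_serial_motor(entry: &DeviceEntry) -> Box<dyn Node> {
    Box::new(SerialMotor { name: entry.name.to_string(), sim: entry.sim })
}

/// Builds one node per entry; the same factory may be used many times.
fn load_entries(
    entries: &[DeviceEntry],
    registry: &HashMap<&str, Factory>,
) -> Result<Vec<(String, Box<dyn Node>)>, String> {
    let mut nodes = Vec::new();
    for entry in entries {
        let factory = *registry.get(entry.driver).ok_or_else(|| {
            format!("device '{}': unknown driver '{}'", entry.name, entry.driver)
        })?;
        nodes.push((entry.name.to_string(), factory(entry)));
    }
    Ok(nodes)
}
```

Registering `make_serial_motor` once under `"horus-driver-serial-motor"` and passing entries for `wheels`, `arm`, and `sim_arm` would yield three independent nodes, each carrying its own name and `sim` flag.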
--- ## Next Steps - [Publishing & Registry](/package-management/publishing) — detailed publishing workflow - [Creating CLI Plugins](/plugins/creating-plugins) — extend the `horus` CLI - [Real Hardware](/recipes/real-hardware) — complete I2C and serial examples with real libraries --- ## Tutorial: Debug with Record & Replay Path: /tutorials/record-replay-debugging Description: Use built-in recording to capture a timing bug, replay it deterministically, and test a fix with mixed replay # Tutorial: Debug with Record & Replay Every robotics engineer has hit the bug that only happens in the field. The robot drifts, the arm overshoots, the planner freezes — but only under specific sensor conditions you cannot reproduce at your desk. In ROS, you would reach for `rosbag` — an external tool that records DDS messages to a file. HORUS takes a different approach: recording is **built into the scheduler**, capturing every node's inputs and outputs at tick granularity with zero-copy overhead. This tutorial walks you through a real debugging workflow: record a timing bug, replay it deterministically, isolate the cause, and verify your fix — all without touching the robot again. ## Prerequisites - [Quick Start](/getting-started/quick-start) completed - Familiarity with [Nodes and Topics](/concepts/core-concepts-topic) - Understanding of [Record & Replay](/advanced/record-replay) concepts ## What You'll Build A 3-node robot system with a deliberate timing bug, then: 1. Record the buggy execution 2. Replay the recording to reproduce the bug deterministically 3. Use `horus record diff` to compare runs 4. Fix the bug and verify the fix with mixed replay **Time estimate**: ~20 minutes ## Step 1: Create the Project ```bash horus new replay-debug -r cd replay-debug ``` ## Step 2: Build a Buggy Robot Replace `src/main.rs` with a 3-node system: a simulated sensor, a controller with a timing bug, and a monitor. 
The controller accumulates drift when the sensor value crosses a threshold — a realistic bug that only manifests under certain input patterns. ```rust use horus::prelude::*; // ── Messages ────────────────────────────────────────────────── #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct SensorReading { value: f32, timestamp_ns: u64, } #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct ControlOutput { command: f32, error: f32, integral: f32, } // ── Sensor Node ─────────────────────────────────────────────── struct Sensor { pub_reading: Topic<SensorReading>, tick_count: u64, } impl Sensor { fn new() -> Result<Self> { Ok(Self { pub_reading: Topic::new("sensor.reading")?, tick_count: 0, }) } } impl Node for Sensor { fn name(&self) -> &str { "sensor" } fn tick(&mut self) { let t = self.tick_count as f32 * 0.01; // Sine wave with occasional spikes (the trigger condition) let spike = if self.tick_count % 73 == 0 { 5.0 } else { 0.0 }; let value = (t * 2.0).sin() * 10.0 + spike; let _ = self.pub_reading.send(SensorReading { value, timestamp_ns: horus::now().as_nanos() as u64, }); self.tick_count += 1; } } // ── Controller Node (with bug) ──────────────────────────────── struct Controller { sub_reading: Topic<SensorReading>, pub_output: Topic<ControlOutput>, integral: f32, last_error: f32, setpoint: f32, } impl Controller { fn new() -> Result<Self> { Ok(Self { sub_reading: Topic::new("sensor.reading")?, pub_output: Topic::new("ctrl.output")?, integral: 0.0, last_error: 0.0, setpoint: 0.0, }) } } impl Node for Controller { fn name(&self) -> &str { "controller" } fn tick(&mut self) { let reading = match self.sub_reading.recv() { Some(r) => r, None => return, }; let error = self.setpoint - reading.value; let dt = horus::dt().as_secs_f32(); // BUG: No anti-windup on integral term. // When sensor spikes, the integral accumulates without bound, // causing the controller to drift after the spike passes.
self.integral += error * dt; let derivative = if dt > 0.0 { (error - self.last_error) / dt } else { 0.0 }; self.last_error = error; let command = 0.5 * error + 0.1 * self.integral + 0.01 * derivative; let _ = self.pub_output.send(ControlOutput { command, error, integral: self.integral, }); } } // ── Monitor Node ────────────────────────────────────────────── struct Monitor { sub_output: Topic<ControlOutput>, tick_count: u64, } impl Monitor { fn new() -> Result<Self> { Ok(Self { sub_output: Topic::new("ctrl.output")?, tick_count: 0, }) } } impl Node for Monitor { fn name(&self) -> &str { "monitor" } fn tick(&mut self) { if let Some(output) = self.sub_output.recv() { // Print every 100 ticks if self.tick_count % 100 == 0 { hlog!(info, "cmd={:.2} err={:.2} integral={:.2}", output.command, output.error, output.integral); } } self.tick_count += 1; } } // ── Main ────────────────────────────────────────────────────── fn main() -> Result<()> { let mut scheduler = Scheduler::new() .tick_rate(100_u64.hz()); scheduler.add(Sensor::new()?).order(0).build()?; scheduler.add(Controller::new()?).order(1).build()?; scheduler.add(Monitor::new()?).order(2).build()?; scheduler.run() } ``` Run it briefly and observe the integral drifting: ```bash horus run # Watch for a few seconds, then Ctrl+C # You'll see the integral climbing after spike events ``` ## Step 3: Record the Bug Now run the same system with recording enabled, which captures every node's inputs and outputs at every tick. You can enable it in code by adding `.with_recording()` to the scheduler (as in the Step 7 example), or record without changing code using the CLI: ```bash horus run --record buggy_run # Run for 10 seconds, then Ctrl+C ``` Check the recording: ```bash horus record list --long # Output: # buggy_run 3 nodes 1000 ticks 245 KB 2026-03-28 14:32 ``` Inspect what was captured: ```bash horus record info buggy_run # Shows per-node tick ranges, topics recorded, file sizes ``` ## Step 4: Replay and Reproduce Replay the recording.
The bug reproduces identically every time — same inputs, same timing, same outputs: ```bash horus record replay buggy_run ``` Slow it down to watch the spike events: ```bash horus record replay buggy_run --speed 0.25 ``` Jump directly to the problematic region (if you noticed the drift starting around tick 400): ```bash horus record replay buggy_run --start-tick 350 --stop-tick 500 ``` ## Step 5: Export and Analyze Export the recording for offline analysis: ```bash horus record export buggy_run --output buggy.json --format json horus record export buggy_run --output buggy.csv --format csv ``` The JSON export lets you script analysis — grep for the integral crossing a threshold, plot the spike correlation, etc. ## Step 6: Fix the Bug The fix is simple — add integral anti-windup clamping. Update the controller: ```rust impl Node for Controller { fn name(&self) -> &str { "controller" } fn tick(&mut self) { let reading = match self.sub_reading.recv() { Some(r) => r, None => return, }; let error = self.setpoint - reading.value; let dt = horus::dt().as_secs_f32(); // FIX: Clamp integral to prevent windup self.integral = (self.integral + error * dt).clamp(-10.0, 10.0); let derivative = if dt > 0.0 { (error - self.last_error) / dt } else { 0.0 }; self.last_error = error; let command = 0.5 * error + 0.1 * self.integral + 0.01 * derivative; let _ = self.pub_output.send(ControlOutput { command, error, integral: self.integral, }); } } ``` ## Step 7: Verify the Fix with Mixed Replay This is the key step that `rosbag` cannot do. 
Use **mixed replay** to feed the exact same sensor data from the buggy recording into your fixed controller: ```bash # Inject the recorded sensor node, run the fixed controller live horus record inject buggy_run --nodes sensor ``` Or programmatically in Rust: ```rust,ignore use horus::prelude::*; use std::path::PathBuf; fn main() -> Result<()> { let mut scheduler = Scheduler::new() .tick_rate(100_u64.hz()) .with_recording(); // Record the fixed run too // Replay the recorded sensor (same spikes, same timing) scheduler.add_replay( PathBuf::from("~/.local/share/horus/recordings/buggy_run/sensor@001.horus"), 0, )?; // Run the FIXED controller live against recorded sensor data scheduler.add(Controller::new()?).order(1).build()?; scheduler.add(Monitor::new()?).order(2).build()?; scheduler.run() } ``` Now record this fixed run and compare: ```bash horus run --record fixed_run # Let it run the same duration, Ctrl+C horus record diff buggy_run fixed_run # Shows tick-by-tick differences in ctrl.output # The integral no longer drifts past +/-10.0 ``` ## Step 8: Clean Up ```bash # Keep the fixed run, delete the buggy one horus record delete buggy_run # Or clean all recordings older than 7 days horus record clean --max-age-days 7 ``` ## Key Takeaways - **Recording is zero-overhead**: It reads directly from shared memory slots — no extra serialization, no external process - **Replay is deterministic**: Same inputs at the same tick produce identical outputs every time - **Mixed replay is the killer feature**: Replay recorded sensors while running new code live — impossible with external bag tools - **`horus record diff`** lets you prove your fix works by comparing buggy vs fixed runs on identical inputs - **No hardware needed**: Once recorded, you debug entirely at your desk ## Challenges **(a) Add a regression test**: Write a script that runs `horus record inject buggy_run --nodes sensor`, checks the integral stays within bounds, and returns exit code 0/1. Add it to your CI. 
**(b) Override a sensor value**: Use `--override sensor.reading ...` during replay to test what happens if the spike amplitude doubles. Does your fix still hold? **(c) Compare algorithm versions**: Record a session, then modify the PID gains. Use mixed replay + diff to find the gains that minimize overshoot on the recorded input data. ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `horus record list` shows nothing | Recording was not enabled | Add `.with_recording()` or use `--record` flag | | Replay produces different output | Code changed between record and replay | Expected for mixed replay; use `replay_from` for exact reproduction | | `inject` node not receiving data | Topic name mismatch | Topic names are case-sensitive and must match exactly | | Recording files are large | Long session or high-frequency large messages | Use `horus record clean` or set `max_snapshots` to bound size | ## See Also - [Record & Replay Reference](/advanced/record-replay) — Full API documentation - [BlackBox Flight Recorder](/advanced/blackbox) — Lightweight crash forensics (always-on, bounded storage) - [Deterministic Mode](/advanced/deterministic-mode) — Required for bit-identical replay - [PID Controller Recipe](/recipes/pid-controller) — Production PID with anti-windup ======================================== # SECTION: Rust ======================================== --- ## Standard Messages Path: /rust/api/messages Description: 60+ standard robotics message types — overview and navigation # Standard Messages HORUS provides 60+ typed message types covering every common robotics domain. All messages support zero-copy shared memory transport via `Topic`. ```rust // simplified use horus::prelude::*; ``` --- ## Which Message Do I Need? | I need to... 
| Use | Page | |-------------|-----|------| | Send velocity commands to motors | `CmdVel`, `Twist` | [Control](/rust/api/control-messages) | | Read IMU data (accel + gyro) | `Imu` | [Sensor](/rust/api/sensor-messages) | | Read LiDAR scans | `LaserScan` | [Sensor](/rust/api/sensor-messages) | | Publish robot position | `Odometry`, `Pose2D`, `Pose3D` | [Sensor](/rust/api/sensor-messages), [Geometry](/rust/api/geometry-messages) | | Send camera images | `Image` | [Image API](/rust/api/image) | | Send point clouds | `PointCloud` | [PointCloud API](/rust/api/pointcloud) | | Report detections (ML output) | `Detection`, `Detection3D` | [Perception](/rust/api/perception-messages) | | Send navigation goals | `NavGoal`, `NavPath` | [Navigation](/rust/api/navigation-messages) | | Report system health | `DiagnosticReport`, `NodeHeartbeat` | [Diagnostics](/rust/api/diagnostics-messages) | | Send force/torque data | `WrenchStamped`, `ForceCommand` | [Force](/rust/api/force-messages) | | Send joystick/keyboard input | `JoystickInput`, `KeyboardInput` | [Input](/rust/api/input-messages) | | Send audio data | `AudioFrame` | [Audio](/stdlib/messages/audio-frame) | | Send ML tensors | `Tensor` | [Tensor API](/rust/api/tensor) | | Send dynamic/untyped data | `GenericMessage` (4KB, MessagePack) | [GenericMessage](/rust/api/generic-message) | --- ## Message Categories | Category | Types | Page | |----------|-------|------| | **Geometry** | Twist, Pose2D, Pose3D, Point3, Vector3, Quaternion, TransformStamped, Accel | [Geometry Messages](/rust/api/geometry-messages) | | **Sensor** | Imu, LaserScan, Odometry, NavSatFix, BatteryState, JointState, Temperature, MagneticField | [Sensor Messages](/rust/api/sensor-messages) | | **Control** | CmdVel, MotorCommand, ServoCommand, PidConfig, DifferentialDriveCommand | [Control Messages](/rust/api/control-messages) | | **Navigation** | NavGoal, NavPath, Waypoint, OccupancyGrid, CostMap, VelocityObstacle | [Navigation 
Messages](/rust/api/navigation-messages) | | **Perception** | Detection, Detection3D, TrackedObject, SegmentationMask, LandmarkArray | [Perception Messages](/rust/api/perception-messages) | | **Vision** | CompressedImage, CameraInfo, RegionOfInterest, StereoInfo, BoundingBox | [Vision Messages](/rust/api/vision-messages) | | **Diagnostics** | DiagnosticReport, NodeHeartbeat, SafetyStatus, EmergencyStop | [Diagnostics Messages](/rust/api/diagnostics-messages) | | **Force & Haptics** | WrenchStamped, ForceCommand, ContactInfo, ImpedanceParameters | [Force Messages](/rust/api/force-messages) | | **Input** | JoystickInput, KeyboardInput | [Input Messages](/rust/api/input-messages) | | **Clock** | Clock-related messages | [Clock Messages](/rust/api/clock-messages) | | **ML** | ML inference messages | [ML Messages](/rust/api/ml-messages) | | **Tensor** | Tensor descriptors | [Tensor Messages](/rust/api/tensor-messages) | --- ## Import All standard messages are available via the prelude: ```rust // simplified use horus::prelude::*; // Now you can use: CmdVel, Imu, Odometry, LaserScan, Twist, Pose2D, etc. let cmd = CmdVel::new(1.0, 0.0); let imu = Imu::new(); ``` ## Custom Messages Need a type that doesn't exist? Use the `message!` macro: ```rust // simplified use horus::prelude::*; message! { MotorStatus { rpm: f32, current_amps: f32, temperature_c: f32, fault_code: u32, } } let topic = Topic::<MotorStatus>::new("motor.status")?; ``` See [Macros](/rust/api/macros) for the full `message!` syntax. ## Fixed-Size Types and Zero-Copy All standard messages are `#[repr(C)]`, `Copy`, and fixed-size — they transfer through shared memory without serialization. Variable-length data (images, point clouds, tensors) uses pool-backed allocation instead.
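To see why `#[repr(C)]`, `Copy`, and a fixed size matter, it helps to treat a message as a plain run of bytes. The sketch below uses only the standard library. The byte array stands in for a shared-memory slot, and `layout` and `roundtrip` are hypothetical helpers, so this illustrates the idea rather than HORUS's actual transport. The struct mirrors the `MotorStatus` fields from the `message!` example, written out by hand.

```rust
use std::mem::{align_of, size_of};

/// Hand-written equivalent of the `MotorStatus` fields shown earlier.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct MotorStatus {
    rpm: f32,
    current_amps: f32,
    temperature_c: f32,
    fault_code: u32,
}

/// Size and alignment are known at compile time, so a slot can be pre-sized.
fn layout() -> (usize, usize) {
    (size_of::<MotorStatus>(), align_of::<MotorStatus>())
}

/// Copies the message into a byte buffer and reads it back unchanged.
fn roundtrip(msg: MotorStatus) -> MotorStatus {
    let mut slot = [0u8; size_of::<MotorStatus>()];
    // SAFETY: `MotorStatus` is `repr(C)`, has no padding (four 4-byte fields),
    // and every bit pattern is valid for `f32` and `u32`. The buffer is exactly
    // `size_of::<MotorStatus>()` bytes, and the read tolerates any alignment.
    unsafe {
        std::ptr::copy_nonoverlapping(
            &msg as *const MotorStatus as *const u8,
            slot.as_mut_ptr(),
            slot.len(),
        );
        std::ptr::read_unaligned(slot.as_ptr() as *const MotorStatus)
    }
}
```

A type holding a `Vec` or `String` could not take this path, because its bytes contain a heap pointer that means nothing in another process. That is the reason variable-length payloads need a different mechanism.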
--- ## See Also - [Standard Library](/stdlib) — Per-message reference with usage examples - [Python Message Library](/python/library/python-message-library) — Python equivalents - [Custom Messages Tutorial](/tutorials/04-custom-messages) — Creating your own types - [GenericMessage](/rust/api/generic-message) — Dynamic 4KB message with MessagePack - [Message Types Concept](/concepts/message-types) — How messages work in HORUS --- ## Advanced Examples Path: /rust/examples/advanced-examples Description: Complex patterns, multi-process systems, and Python integration # Advanced Examples Advanced HORUS patterns for complex robotics systems. These examples demonstrate priority-based safety systems, multi-process architectures, and cross-language communication. **Prerequisites**: Complete [Basic Examples](/rust/examples/basic-examples) first. --- ## 1. State Machine Node Implement complex behavior using state machines - ideal for autonomous robots with multiple operating modes. **File: `state_machine.rs`** ```rust // simplified use horus::prelude::*; #[derive(Debug, Clone, Copy, PartialEq)] enum RobotState { Idle, Moving, ObstacleDetected, Rotating, Escaped, } struct StateMachineNode { state: RobotState, obstacle_sub: Topic<bool>, cmd_pub: Topic<CmdVel>, rotation_counter: u32, } impl StateMachineNode { fn new() -> Result<Self> { Ok(Self { state: RobotState::Idle, obstacle_sub: Topic::new("obstacle_detected")?, cmd_pub: Topic::new("cmd_vel")?, rotation_counter: 0, }) } } impl Node for StateMachineNode { fn name(&self) -> &str { "StateMachineNode" } fn init(&mut self) -> Result<()> { hlog!(info, "State machine initialized - starting in IDLE state"); Ok(()) } fn tick(&mut self) { // Check for obstacles let obstacle = self.obstacle_sub.recv().unwrap_or(false); // Store previous state for logging let prev_state = self.state; // State machine logic self.state = match self.state { RobotState::Idle => { if !obstacle { RobotState::Moving } else { RobotState::Idle } } RobotState::Moving => { if
obstacle { self.cmd_pub.send(CmdVel::zero()); // Stop RobotState::ObstacleDetected } else { self.cmd_pub.send(CmdVel::new(1.0, 0.0)); // Forward RobotState::Moving } } RobotState::ObstacleDetected => { self.rotation_counter = 0; RobotState::Rotating } RobotState::Rotating => { self.cmd_pub.send(CmdVel::new(0.0, 0.5)); // Rotate self.rotation_counter += 1; if self.rotation_counter > 50 { RobotState::Escaped } else { RobotState::Rotating } } RobotState::Escaped => { RobotState::Moving // Resume moving } }; // Log state transitions if self.state != prev_state { hlog!(info, "State transition: {:?} -> {:?}", prev_state, self.state); } } fn shutdown(&mut self) -> Result<()> { // Ensure robot is stopped self.cmd_pub.send(CmdVel::zero()); hlog!(info, "State machine shutdown"); Ok(()) } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(StateMachineNode::new()?).order(0).build()?; scheduler.run()?; Ok(()) } ``` **Run it**: ```bash horus run state_machine.rs ``` **Key Concepts**: - Enum for states: `Idle`, `Moving`, `ObstacleDetected`, `Rotating`, `Escaped` - Match expression handles state transitions - Each state defines behavior and next state - Log state transitions for debugging --- ## 2. Priority-Based Safety System Use node priorities to ensure safety-critical tasks always run first - essential for production robotics. 
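The idea behind order-based priorities can be sketched in a few lines: within each tick, nodes run in ascending `order`, so whatever the safety node decides is visible to later nodes in the same tick. This is a toy model with hypothetical names (`ToyNode`, `State`, `run_tick`) and plain shared state instead of topics. It is not the HORUS scheduler.

```rust
/// Shared state that stands in for topics in this toy model.
#[derive(Default)]
struct State {
    min_distance: f32,
    estop: bool,
    motor_cmd: f32,
    log: Vec<&'static str>, // names in the order they actually ran
}

/// Toy node: an order value plus a step function over the shared state.
struct ToyNode {
    name: &'static str,
    order: u32,
    step: fn(&mut State),
}

/// Safety check: trip the estop when an obstacle is closer than 20 cm.
fn safety(s: &mut State) {
    s.estop = s.min_distance < 0.2;
}

/// Motor control: refuse to move while the estop is active.
fn motor(s: &mut State) {
    s.motor_cmd = if s.estop { 0.0 } else { 1.0 };
}

/// Runs one tick: lower `order` first, regardless of registration order.
fn run_tick(nodes: &mut [ToyNode], state: &mut State) {
    nodes.sort_by_key(|n| n.order);
    for node in nodes.iter() {
        state.log.push(node.name);
        (node.step)(state);
    }
}
```

Even if the motor node is registered before the safety node, giving safety `order: 0` and motor `order: 1` means the motor step always sees the estop decision made earlier in the same tick.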
**File: `safety_system.rs`** ```rust // simplified use horus::prelude::*; // CRITICAL PRIORITY: Emergency stop struct EmergencyStopNode { battery_sub: Topic<BatteryState>, lidar_sub: Topic<f32>, // Min obstacle distance estop_pub: Topic<bool>, estop_active: bool, } impl EmergencyStopNode { fn new() -> Result<Self> { Ok(Self { battery_sub: Topic::new("battery_state")?, lidar_sub: Topic::new("min_distance")?, estop_pub: Topic::new("emergency_stop")?, estop_active: false, }) } } impl Node for EmergencyStopNode { fn name(&self) -> &str { "EmergencyStop" } fn init(&mut self) -> Result<()> { hlog!(info, "Emergency stop system online - CRITICAL priority"); Ok(()) } fn tick(&mut self) { let mut should_stop = false; // Check battery if let Some(battery) = self.battery_sub.recv() { if battery.is_critical() { // Below 10% should_stop = true; hlog!(error, "CRITICAL: Battery at {:.0}% - EMERGENCY STOP!", battery.percentage); } } // Check obstacle distance if let Some(min_dist) = self.lidar_sub.recv() { if min_dist < 0.2 { // 20cm should_stop = true; hlog!(error, "CRITICAL: Obstacle at {:.2}m - EMERGENCY STOP!", min_dist); } } // Publish estop state if should_stop != self.estop_active { self.estop_pub.send(should_stop); self.estop_active = should_stop; } } fn shutdown(&mut self) -> Result<()> { // Always activate estop on shutdown self.estop_pub.send(true); hlog!(warn, "Emergency stop system offline"); Ok(()) } } // HIGH PRIORITY: Motor controller struct MotorController { estop_sub: Topic<bool>, cmd_sub: Topic<CmdVel>, motor_pub: Topic<CmdVel>, estop_active: bool, } impl MotorController { fn new() -> Result<Self> { Ok(Self { estop_sub: Topic::new("emergency_stop")?, cmd_sub: Topic::new("cmd_vel_request")?, motor_pub: Topic::new("cmd_vel_actual")?, estop_active: false, }) } } impl Node for MotorController { fn name(&self) -> &str { "MotorController" } fn init(&mut self) -> Result<()> { hlog!(info, "Motor controller online - HIGH priority"); Ok(()) } fn tick(&mut self) { // Check emergency stop FIRST if let Some(estop) =
self.estop_sub.recv() { if estop != self.estop_active { self.estop_active = estop; if estop { hlog!(warn, "Motors DISABLED - emergency stop active"); } else { hlog!(info, "Motors ENABLED - emergency stop cleared"); } } } // Don't move if estop active if self.estop_active { self.motor_pub.send(CmdVel::zero()); return; } // Process normal commands if let Some(cmd) = self.cmd_sub.recv() { self.motor_pub.send(cmd); hlog!(debug, "Motors: linear={:.2}, angular={:.2}", cmd.linear, cmd.angular); } } fn shutdown(&mut self) -> Result<()> { // Stop motors self.motor_pub.send(CmdVel::zero()); hlog!(info, "Motor controller stopped"); Ok(()) } } // BACKGROUND PRIORITY: Data logging struct LoggerNode { cmd_sub: Topic<CmdVel>, battery_sub: Topic<BatteryState>, } impl LoggerNode { fn new() -> Result<Self> { Ok(Self { cmd_sub: Topic::new("cmd_vel_actual")?, battery_sub: Topic::new("battery_state")?, }) } } impl Node for LoggerNode { fn name(&self) -> &str { "Logger" } fn init(&mut self) -> Result<()> { hlog!(info, "Logger online - BACKGROUND priority"); Ok(()) } fn tick(&mut self) { // Log velocity commands if let Some(cmd) = self.cmd_sub.recv() { hlog!(debug, "LOG: cmd_vel({:.2}, {:.2})", cmd.linear, cmd.angular); } // Log battery state if let Some(battery) = self.battery_sub.recv() { hlog!(debug, "LOG: battery({:.1}V, {:.0}%)", battery.voltage, battery.percentage); } // In production: write to file, database, etc.
} fn shutdown(&mut self) -> Result<()> { hlog!(info, "Logger stopped"); Ok(()) } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Order 0 (Critical): Safety runs FIRST scheduler.add(EmergencyStopNode::new()?).order(0).build()?; // Order 1 (High): Control runs SECOND scheduler.add(MotorController::new()?).order(1).build()?; // Order 4 (Background): Logging runs LAST scheduler.add(LoggerNode::new()?).order(4).build()?; scheduler.run()?; Ok(()) } ``` **Run it**: ```bash horus run safety_system.rs ``` **Key Concepts**: - **Priority 0 (Critical)**: Emergency stop - runs first, always - **Priority 1 (High)**: Motor control - runs after safety checks - **Priority 4 (Background)**: Logging - runs last, non-critical - Lower number = higher priority - Safety systems should always check estop before acting --- ## 3. Python Multi-Process System Build a complete sensor monitoring system with Python nodes running as independent processes. ### Project Structure ```bash mkdir multi_node_system cd multi_node_system mkdir nodes ``` ### Sensor Node **nodes/sensor.py:** ```python #!/usr/bin/env python3 import horus import random def sensor_tick(node): # Generate realistic temperature with noise temperature = 20.0 + random.random() * 10.0 node.send("temperature", temperature) print(f"Sensor: {temperature:.1f}°C") sensor = horus.Node(name="SensorNode", tick=sensor_tick, order=0, rate=2, pubs=["temperature"]) horus.run(sensor) ``` ### Controller Node **nodes/controller.py:** ```python #!/usr/bin/env python3 import horus def controller_tick(node): temp = node.recv("temperature") if temp is not None: if temp > 25.0: fan_speed = min(100, int((temp - 20) * 10)) node.send("fan_control", fan_speed) print(f"Controller: Fan at {fan_speed}%") else: node.send("fan_control", 0) print(f"Controller: Temperature normal, fan off") controller = horus.Node(name="ControllerNode", tick=controller_tick, order=1, rate=2, subs=["temperature"], pubs=["fan_control"]) horus.run(controller) 
``` ### Logger Node **nodes/logger.py:** ```python #!/usr/bin/env python3 import horus import datetime def logger_tick(node): temp = node.recv("temperature") fan = node.recv("fan_control") if temp is not None and fan is not None: timestamp = datetime.datetime.now().strftime("%H:%M:%S") status = "COOLING" if fan > 0 else "NORMAL" print(f"Logger [{timestamp}]: {temp:.1f}°C | Fan {fan}% | {status}") logger = horus.Node(name="LoggerNode", tick=logger_tick, order=2, rate=1, subs=["temperature", "fan_control"]) horus.run(logger) ``` ### Run All Nodes Concurrently ```bash # Make scripts executable chmod +x nodes/*.py # Run all nodes as separate processes horus run "nodes/*.py" ``` **Output:** ```bash Executing 3 files concurrently: 1. nodes/controller.py (python) 2. nodes/logger.py (python) 3. nodes/sensor.py (python) Phase 1: Building all files... Phase 2: Starting all processes... Started [controller] Started [logger] Started [sensor] All processes running. Press Ctrl+C to stop. [sensor] Sensor: 23.4°C [controller] Controller: Fan at 34% [logger] Logger [15:30:45]: 23.4°C | Fan 34% | COOLING [sensor] Sensor: 26.8°C [controller] Controller: Fan at 68% [sensor] Sensor: 21.2°C [logger] Logger [15:30:46]: 21.2°C | Fan 12% | COOLING ``` **Key Features**: - **Independent Processes**: Each node runs in its own process - **Shared Memory IPC**: Nodes communicate via HORUS shared memory topics - **Color-Coded Output**: Each node has a unique color - **Graceful Shutdown**: Ctrl+C stops all processes cleanly - **Zero Configuration**: No launch files needed --- ## 4. Rust + Python Cross-Language System Mix Rust and Python nodes in the same application. 
### Rust Sensor Node **nodes/rust_sensor.rs:** ```rust // simplified use horus::prelude::*; pub struct TempSensor { temp_pub: Topic<f32>, counter: f32, } impl TempSensor { fn new() -> Result<Self> { Ok(Self { temp_pub: Topic::new("temperature")?, counter: 0.0, }) } } impl Node for TempSensor { fn name(&self) -> &str { "RustTempSensor" } fn init(&mut self) -> Result<()> { hlog!(info, "Rust sensor online - high performance mode"); Ok(()) } fn tick(&mut self) { // Fast sensor simulation let temp = 20.0 + (self.counter.sin() * 5.0); self.temp_pub.send(temp); hlog!(debug, "Rust: {:.2}°C", temp); self.counter += 0.1; } fn shutdown(&mut self) -> Result<()> { hlog!(info, "Rust sensor offline"); Ok(()) } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(TempSensor::new()?).order(0).build()?; scheduler.run() } ``` ### Python Controller Node **nodes/py_controller.py:** ```python #!/usr/bin/env python3 import horus def py_controller_tick(node): temp = node.recv("temperature") if temp is not None: status = "HOT" if temp > 22.0 else "NORMAL" print(f"Python controller: {temp:.2f}°C - {status}") command = 1.0 if temp > 22.0 else 0.0 node.send("actuator_cmd", command) controller = horus.Node(name="PyController", tick=py_controller_tick, order=0, rate=10, subs=["temperature"], pubs=["actuator_cmd"]) horus.run(controller) ``` ### Rust Actuator Node **nodes/rust_actuator.rs:** ```rust // simplified use horus::prelude::*; struct Actuator { cmd_sub: Topic<f32>, } impl Node for Actuator { fn name(&self) -> &str { "RustActuator" } fn tick(&mut self) { if let Some(cmd) = self.cmd_sub.recv() { let action = if cmd > 0.5 { "COOLING" } else { "IDLE" }; hlog!(info, "Actuator: {}", action); } } fn shutdown(&mut self) -> Result<()> { hlog!(info, "Actuator stopped"); Ok(()) } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(Actuator { cmd_sub: Topic::new("actuator_cmd")?, }).order(0).build()?; scheduler.run() } ``` ### Run Mixed System ```bash horus run
"nodes/*" ``` HORUS automatically detects file types and compiles/runs appropriately! **Key Concepts**: - Rust nodes: High performance, type safety - Python nodes: Rapid prototyping, easy scripting - Shared memory IPC works across languages - Zero-copy message passing - Sub-microsecond latency even across language boundaries --- ## 5. Advanced Python Features Per-node rates, timestamp checking, and staleness detection. **File: `advanced_python.py`** ```python #!/usr/bin/env python3 import horus from horus import Imu # 100Hz IMU sensor def imu_tick(node): imu = Imu( accel_x=1.0, accel_y=0.0, accel_z=9.8, gyro_x=0.0, gyro_y=0.0, gyro_z=0.0 ) node.send("imu", imu) # 50Hz controller def controller_tick(node): imu = node.recv("imu") if imu is not None: # Process IMU data accel_magnitude = ( imu.accel_x**2 + imu.accel_y**2 + imu.accel_z**2 ) ** 0.5 print(f"Control: IMU magnitude = {accel_magnitude:.2f} m/s²") # Send command cmd = {"linear": 1.0, "angular": 0.5} node.send("cmd_vel", cmd) # 10Hz logger def logger_tick(node): msg = node.recv("cmd_vel") if msg is not None: print(f"Logger: Received command: {msg}") # Create nodes with per-node rates and run sensor = horus.Node(name="HighFreqSensor", tick=imu_tick, order=0, rate=100, pubs=["imu"]) ctrl = horus.Node(name="Controller", tick=controller_tick, order=1, rate=50, subs=["imu"], pubs=["cmd_vel"]) logger = horus.Node(name="Logger", tick=logger_tick, order=2, rate=10, subs=["cmd_vel"]) horus.run(sensor, ctrl, logger) ``` **Run it**: ```bash horus run advanced_python.py ``` **Key Concepts**: - **Per-node rates**: Each node runs at different frequency via `.rate(Hz)` - **Typed messages**: Use `Topic(Imu)` for cross-language compatible messages - **Generic messages**: Use `Topic("name")` for Python-only dict/list data - **Priorities**: Lower order number = higher priority, runs first each tick --- ## When to Use Multi-Process vs Single-Process ### Multi-Process (Concurrent Execution) **Use when:** - Nodes need to run 
independently (like ROS) - Want fault isolation (one crash doesn't kill others) - Different nodes have vastly different rates - Testing distributed system architectures - Mixing languages (Rust + Python) **Command:** ```bash horus run "nodes/*.rs" horus run "nodes/*.py" horus run "nodes/*" # Mix Rust and Python! ``` ### Single-Process **Use when:** - All nodes in one file - Need maximum performance - Require deterministic execution order - Simpler deployment - Embedded systems with limited resources **Command:** ```bash horus run main.rs ``` --- ## Performance Notes ### Multi-Process IPC Performance - **Latency**: 300ns - 1μs (shared memory) - **Throughput**: Thousands of messages/second - **Scalability**: Tested with 50+ concurrent processes - **Memory**: ~2-5MB overhead per process ### Single-Process Performance - **Latency**: 50-200ns (in-memory) - **Throughput**: Millions of messages/second - **Scalability**: Hundreds of nodes in one process - **Memory**: Minimal overhead --- ## Testing Multi-Node Systems **Create test script: `test_system.py`** ```python #!/usr/bin/env python3 import horus # Shared test state fan_commands = [] def mock_sensor_tick(node): node.send("temperature", 30.0) # Hot! 
def test_controller_tick(node): temp = node.recv("temperature") if temp is not None and temp > 25.0: fan_speed = min(100, int((temp - 20) * 10)) fan_commands.append(fan_speed) node.send("fan_control", fan_speed) sensor = horus.Node(name="MockSensor", tick=mock_sensor_tick, pubs=["temperature"]) controller = horus.Node(name="TestController", tick=test_controller_tick, subs=["temperature"], pubs=["fan_control"]) # Use tick_once for testing (runs one tick cycle) scheduler = horus.Scheduler() scheduler.add(sensor) scheduler.add(controller) scheduler.tick_once() # Verify assert len(fan_commands) > 0, "Controller should have sent fan commands" assert fan_commands[0] == 100, f"Expected fan at 100%, got {fan_commands[0]}%" print("Test passed!") ``` --- ## Next Steps - [Performance Optimization](/performance/performance) - Tune for maximum throughput - [Python Bindings Reference](/python/api/python-bindings) - Complete Python API - [Package Management](/package-management/package-management) - Share and reuse nodes Need help? - [Troubleshooting](/getting-started/troubleshooting) - Debug common issues - [Monitor](/development/monitor) - Monitor your system in real-time --- ## See Also - [Basic Examples](/rust/examples/basic-examples) — Simple starter patterns - [Recipes](/recipes) — Production-ready copy-paste patterns - [Production Patterns](/rust/api/scheduler#production-patterns) — Warehouse AGV and drone examples --- ## horus_macros Path: /rust/api/macros Description: Procedural macros for reducing boilerplate # horus_macros Macros that eliminate boilerplate. Pick the right one: | I need to... 
| Use | Example | |---|---|---| | Define a node with pub/sub | `node!` | Sensor reader, motor controller | | Define a custom message type | `message!` | `MotorFeedback`, `WheelOdometry` | | Define a request/response service | `service!` | `CalibrateImu`, `GetMapRegion` | | Define a long-running task | `action!` | `NavigateTo`, `PickAndPlace` | | Use a standard action template | `standard_action!` | `navigate`, `manipulate`, `dock` | | Log from inside a node | `hlog!` | `hlog!(info, "Motor started")` | ```rust // simplified use horus::prelude::*; // Includes all macros ``` --- ## node! Declarative macro for creating HORUS nodes with minimal boilerplate. ### Syntax ```rust // simplified node! { NodeName { name: "custom_name", // Custom node name (optional) rate 100.0 // Tick rate in Hz (optional) pub { ... } // Publishers (optional) sub { ... } // Subscribers (optional) data { ... } // Internal state (optional) tick { ... } // Main loop (required) init { ... } // Initialization (optional) shutdown { ... } // Cleanup (optional) impl { ... } // Custom methods (optional) } } ``` Only the node name and `tick` are required. Everything else is optional. **What it generates** - A struct with the fields you define - `Topic` fields for each `sub:` and `pub:` declaration - `Node` trait implementation skeleton with `name()`, `tick()`, optional `init()` and `shutdown()` ### Sections #### `pub` - Publishers Define topics this node publishes to. ```rust // simplified pub { // Syntax: name: Type -> "topic_name" velocity: f32 -> "robot.velocity", status: String -> "robot.status", pose: Pose2D -> "robot.pose" } ``` **Generated code:** - `Topic` field for each publisher - Automatic initialization in `new()` #### `sub` - Subscribers Define topics this node subscribes to. 
```rust // simplified sub { // Syntax: name: Type -> "topic_name" commands: String -> "user.commands", sensors: f32 -> "sensors.temperature" } ``` **Generated code:** - `Topic` field for each subscriber - Automatic initialization in `new()` #### `data` - Internal State Define internal fields with default values. ```rust // simplified data { counter: u32 = 0, buffer: Vec = Vec::new(), last_time: Instant = Instant::now(), config: MyConfig = MyConfig::default() } ``` #### `tick` - Main Loop **Required.** Called every scheduler cycle (~100 Hz by default). ```rust // simplified tick { // Read from subscribers if let Some(cmd) = self.commands.recv() { // Process } // Write to publishers self.velocity.send(1.0); // Access internal state self.counter += 1; } ``` #### `init` - Initialization Called once before the first tick. The block must return `Ok(())` on success (it generates `fn init(&mut self) -> Result<()>`). ```rust // simplified init { hlog!(info, "Node starting"); self.buffer.reserve(1000); Ok(()) } ``` #### `shutdown` - Cleanup Called once when the scheduler stops. Must return `Ok(())` on success (generates `fn shutdown(&mut self) -> Result<()>`). ```rust // simplified shutdown { hlog!(info, "Node stopping"); // Close files, save state, etc. Ok(()) } ``` #### `impl` - Custom Methods Add helper methods to the node. ```rust // simplified impl { fn calculate(&self, x: f32) -> f32 { x * 2.0 + self.offset } fn reset(&mut self) { self.counter = 0; } } ``` ### Generated Code The macro generates: 1. **`pub struct NodeName`** with `Topic` fields for publishers/subscribers and your data fields 2. **`impl NodeName { pub fn new() -> Self }`** constructor that creates all Topics 3. **`impl Node for NodeName`** with `name()`, `tick()`, optional `init()` and `shutdown()` 4. **`impl Default for NodeName`** that calls `Self::new()` 5. **`impl NodeName { ... }`** for any methods from the `impl` section ```rust // simplified // This macro call: node! 
{ SensorNode { pub { data: f32 -> "sensor" } data { count: u32 = 0 } tick { self.count += 1; } } } // Generates approximately: pub struct SensorNode { data: Topic, count: u32, } impl SensorNode { pub fn new() -> Self { Self { data: Topic::new("sensor").expect("Failed to create publisher 'sensor'"), count: 0, } } } impl Node for SensorNode { fn name(&self) -> &str { "sensor_node" } // Auto snake_case fn tick(&mut self) { self.count += 1; } } impl Default for SensorNode { fn default() -> Self { Self::new() } } ``` The struct name is converted to snake_case for the node name (e.g., `SensorNode` becomes `"sensor_node"`), unless overridden with `name:`. ### Examples #### Minimal Node ```rust // simplified node! { MinimalNode { tick { // Called every tick } } } ``` #### Publisher Only ```rust // simplified node! { HeartbeatNode { pub { alive: bool -> "system.heartbeat" } data { count: u64 = 0 } tick { self.alive.send(true); self.count += 1; } } } ``` #### Subscriber Only ```rust // simplified node! { LoggerNode { sub { messages: String -> "logs" } tick { while let Some(msg) = self.messages.recv() { hlog!(info, "{}", msg); } } } } ``` #### Full Pipeline ```rust // simplified node! { ProcessorNode { sub { input: f32 -> "raw_data" } pub { output: f32 -> "processed_data" } data { scale: f32 = 2.0, offset: f32 = 10.0 } tick { if let Some(value) = self.input.recv() { let result = value * self.scale + self.offset; self.output.send(result); } } impl { fn set_scale(&mut self, scale: f32) { self.scale = scale; } } } } ``` #### With Lifecycle ```rust // simplified node! { StatefulNode { pub { status: String -> "status" } data { initialized: bool = false, tick_count: u64 = 0 } init { hlog!(info, "Initializing..."); self.initialized = true; Ok(()) } tick { self.tick_count += 1; let msg = format!("Tick {}", self.tick_count); self.status.send(msg); } shutdown { hlog!(info, "Total ticks: {}", self.tick_count); Ok(()) } } } ``` ### Usage ```rust // simplified use horus::prelude::*; node! 
{ MyNode { pub { output: f32 -> "data" } tick { self.output.send(42.0); } } } fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(MyNode::new()).order(0).build()?; scheduler.run() } ``` --- ## `#[derive(LogSummary)]` Derive macro for implementing the `LogSummary` trait with default `Debug` formatting. ### When to Use Use `#[derive(LogSummary)]` when you want the verbose flag (via the TUI monitor) to work on a topic that carries a custom message type. `LogSummary` is **not** required for basic `Topic::new()` — only for the opt-in introspection mode. The derive requires `Debug` on the type since it generates a `Debug`-based implementation. ```rust // simplified use horus::prelude::*; #[derive(Debug, Clone, Serialize, Deserialize, LogSummary)] pub struct MyStatus { pub temperature: f32, pub voltage: f32, } // The verbose flag (via TUI monitor) can now summarize MyStatus let topic: Topic<MyStatus> = Topic::new("status")?; ``` The derive generates: ```rust // simplified impl LogSummary for MyStatus { fn log_summary(&self) -> String { format!("{:?}", self) } } ``` ### Custom LogSummary For large types (images, point clouds) where `Debug` output would be too verbose, implement `LogSummary` manually instead of deriving: ```rust // simplified use horus::prelude::*; impl LogSummary for MyLargeData { fn log_summary(&self) -> String { format!("MyLargeData({}x{}, {} bytes)", self.width, self.height, self.data.len()) } } ``` --- ## Best Practices ### Keep tick Fast ```rust // simplified // Good - non-blocking tick { if let Some(x) = self.input.recv() { self.output.send(x * 2.0); } } // Bad - blocking operation tick { std::thread::sleep(Duration::from_secs(1)); // Blocks scheduler! 
} ``` ### Pre-allocate in init ```rust // simplified init { self.buffer.reserve(1000); // Do once Ok(()) } tick { // Don't allocate here - runs every tick } ``` ### Use Descriptive Names ```rust // simplified // Good pub { motor_velocity: f32 -> "motors.velocity" } // Bad pub { x: f32 -> "data" } ``` ### Handle Errors Gracefully ```rust // simplified tick { // send() is infallible — always succeeds self.status.send("ok".to_string()); // No error handling needed — ring buffer overwrites oldest on full self.critical.send(data); } ``` --- ## Troubleshooting ### "Cannot find type in scope" Import message types: ```rust // simplified use horus::prelude::*; node! { MyNode { pub { cmd: CmdVel -> "cmd_vel" } tick { } } } ``` ### "Expected `,`, found `{`" Check arrow syntax: ```rust // simplified // Wrong pub { cmd: f32 "topic" } // Correct pub { cmd: f32 -> "topic" } ``` ### "Node name must be CamelCase" ```rust // simplified // Wrong node! { my_node { ... } } // Correct node! { MyNode { ... } } ``` ### Use hlog! for logging ```rust // simplified tick { // Use hlog! macro for logging hlog!(info, "test"); hlog!(debug, "value = {}", some_value); hlog!(warn, "potential issue"); hlog!(error, "something went wrong"); } ``` --- ## Logging Macros ### hlog! Node-aware logging that publishes to the shared memory log buffer (visible in monitor) and emits to stderr with ANSI colors. ```rust // simplified hlog!(info, "Sensor initialized"); hlog!(debug, "Value: {}", some_value); hlog!(warn, "Battery low: {}%", battery_pct); hlog!(error, "Failed to read sensor: {}", err); ``` Levels: `trace`, `debug`, `info`, `warn`, `error` The scheduler automatically sets the current node context, so log messages include the node name: ``` [INFO] [SensorNode] Sensor initialized ``` ### hlog_once! Log a message **once per callsite**. Subsequent calls from the same source location are silently ignored. Uses a per-callsite `AtomicBool` — zero overhead after the first call. 
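The per-callsite `AtomicBool` idea can be pictured with a std-only sketch. This is not the HORUS implementation: `log_once!` is a hypothetical stand-in that prints with `println!` and returns whether it emitted, which the real `hlog_once!` is not documented to do.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for `hlog_once!`: prints at most once per callsite
/// and evaluates to `true` only on the call that actually printed.
macro_rules! log_once {
    ($($arg:tt)*) => {{
        // Every expansion of the macro declares its own static flag,
        // so the guard is scoped to the source location that invoked it.
        static FIRED: AtomicBool = AtomicBool::new(false);
        // `swap` returns the previous value; only the first caller sees `false`.
        if !FIRED.swap(true, Ordering::Relaxed) {
            println!($($arg)*);
            true
        } else {
            false
        }
    }};
}

/// One callsite, invoked many times (as a node's `tick()` would be).
fn report(value: f32) -> bool {
    log_once!("first reading: {:.2}", value)
}

fn main() {
    let emitted: Vec<bool> = (0..3).map(|i| report(i as f32)).collect();
    // Only the first pass through the callsite logs.
    assert_eq!(emitted, vec![true, false, false]);
}
```

After the first call the cost is a single atomic swap on a flag that is already set, which is why this pattern suits code that runs every tick.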
```rust // simplified fn tick(&mut self) { // Log when sensor first produces valid data hlog_once!(info, "Sensor online — first reading: {:.2}", value); // Warn about a condition the first time it's detected if self.error_count > 0 { hlog_once!(warn, "Sensor errors detected — check wiring"); } } ``` Common uses: first-connection notifications, one-time calibration messages, feature availability checks at startup. ### hlog_every! Throttled logging — emits at most once per `interval_ms` milliseconds. Uses a per-callsite `AtomicU64` timestamp — zero overhead when the interval hasn't elapsed. Essential for nodes running at high frequencies (100Hz+) where per-tick logging would flood the system. ```rust // simplified fn tick(&mut self) { // Status heartbeat every 5 seconds hlog_every!(5000, info, "Motor controller OK — speed: {:.1} rad/s", self.velocity); // Battery warnings every second (not every tick at 1kHz) if self.battery_pct < 20.0 { hlog_every!(1000, warn, "Battery low: {:.0}%", self.battery_pct); } // Periodic performance stats every 10 seconds hlog_every!(10_000, debug, "Avg latency: {:.1}us, ticks: {}", self.avg_latency_us, self.tick_count); } ``` --- ## message! Declarative macro for defining custom message types for use with `Topic`. Auto-derives all required traits so messages work with zero configuration. ### Flexible Messages (default) Any field types allowed — uses serialization transport (~167ns): ```rust // simplified use horus::prelude::*; message! { /// Robot log entry — contains a String (variable-size) LogEntry { text: String, level: u8, timestamp_ns: u64, } } ``` ### Fixed Messages (zero-copy) Add `#[fixed]` for zero-copy shared memory transport (~50ns). All fields must be `Copy` — no `String`, `Vec`, `Box`, etc.: ```rust // simplified use horus::prelude::*; message! 
{ #[fixed] /// Motor command — fixed-size, zero-copy via shared memory MotorCommand { velocity: f32, torque: f32, } } let topic: Topic = Topic::new("motor.cmd")?; topic.send(MotorCommand { velocity: 1.0, torque: 0.5 }); ``` ### Multiple Messages Define multiple types in a single block — `#[fixed]` and flexible can be mixed: ```rust // simplified message! { #[fixed] /// Zero-copy velocity command (~50ns) CmdVel { linear_x: f64, angular_z: f64, } /// Flexible status with dynamic fields (~167ns) StatusReport { message: String, node_name: String, error_count: u32, } } ``` ### Generated Code **Without `#[fixed]`** (flexible): ```rust,ignore #[derive(Clone, Debug, Serialize, Deserialize)] pub struct Foo { pub x: f32, pub y: f32 } impl LogSummary for Foo { ... } ``` **With `#[fixed]`** (zero-copy): ```rust,ignore #[repr(C)] #[derive(Clone, Copy, Default, Debug, Serialize, Deserialize)] pub struct Foo { pub x: f32, pub y: f32 } impl LogSummary for Foo { ... } ``` `#[fixed]` adds `Copy`, `Default`, and `#[repr(C)]` — enabling automatic zero-copy POD detection for shared memory transport. If you put a non-Copy field (like `String`) inside `#[fixed]`, the compiler tells you immediately. ### When to Use - **`#[fixed]`** — For high-frequency control loops (1kHz+) with primitive fields: sensor readings, motor commands, joint states - **Without `#[fixed]`** — For messages with strings, vectors, or other dynamic data: logs, configs, diagnostics - **Neither** — For [standard message types](/rust/api/messages) (Twist, Pose2D, Imu, etc.), use the prelude types instead --- ### action! and service! Macros See [Actions](/concepts/actions) and [Services](/concepts/services) for the `action!` and `service!` macros that generate typed communication patterns. --- ## topics! Define compile-time topic descriptors for type-safe, typo-proof topic names across your codebase. ### Syntax ```rust // simplified use horus::prelude::*; topics! 
{ pub CMD_VEL: CmdVel = "cmd_vel", pub SENSOR_DATA: Imu = "sensor.imu", pub MOTOR_STATUS: MotorCommand = "motor.cmd", } ``` Each entry creates a `TopicDescriptor` constant. Use it to create topics with guaranteed name and type consistency: ### Usage ```rust // simplified // Instead of string literals (typo-prone): let topic: Topic<CmdVel> = Topic::new("cmd_vel")?; // Use typed descriptors (compile-time checked): let topic = CMD_VEL.create()?; // Topic<CmdVel>, name = "cmd_vel" ``` ### Benefits - **No typos** — topic name is defined once, referenced everywhere - **Type safety** — `SENSOR_DATA.create()` always returns `Topic<Imu>`, never `Topic<CmdVel>` - **Discoverability** — grep for `CMD_VEL` to find all publishers and subscribers --- ## register_driver! Register a hardware driver so the `[hardware]` section of `horus.toml` can instantiate it by name. The macro uses ELF `.init_array` to register the factory at program startup — no manual initialization code needed. ### Syntax ```rust // simplified use horus::prelude::*; use horus::hardware::NodeParams; register_driver!(MyDriver, MyDriver::from_params); ``` The first argument is the driver type. The second is a factory function with signature `fn(&NodeParams) -> Result<Self>`. 
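To make the "type name plus factory function" pairing concrete, here is a std-only sketch of a name-to-factory registry. Everything in it is an assumption for illustration: `Registry`, `Params`, and the `Driver` trait are hypothetical stand-ins, and registration happens explicitly in `main()` rather than through `.init_array` as the macro does.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the config table handed to a factory.
type Params = HashMap<String, String>;

/// Minimal stand-in for a node; not the real HORUS `Node` trait.
trait Driver {
    fn describe(&self) -> String;
}

/// Factory signature: build a boxed driver from params, or fail with a message.
type Factory = fn(&Params) -> Result<Box<dyn Driver>, String>;

/// Maps a driver type name to the factory that constructs it.
#[derive(Default)]
struct Registry {
    factories: HashMap<&'static str, Factory>,
}

impl Registry {
    fn register(&mut self, type_name: &'static str, factory: Factory) {
        self.factories.insert(type_name, factory);
    }

    /// Looks up the factory by name and calls it with the given params.
    fn create(&self, type_name: &str, params: &Params) -> Result<Box<dyn Driver>, String> {
        let factory = self
            .factories
            .get(type_name)
            .ok_or_else(|| format!("no driver registered as '{}'", type_name))?;
        factory(params)
    }
}

struct LidarDriver {
    port: String,
}

impl LidarDriver {
    fn from_params(params: &Params) -> Result<Box<dyn Driver>, String> {
        // A required parameter: a missing key becomes an error, not a panic.
        let port = params
            .get("port")
            .ok_or_else(|| "missing required param 'port'".to_string())?
            .clone();
        Ok(Box::new(LidarDriver { port }))
    }
}

impl Driver for LidarDriver {
    fn describe(&self) -> String {
        format!("lidar on {}", self.port)
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register("LidarDriver", LidarDriver::from_params);

    let mut params = Params::new();
    params.insert("port".to_string(), "/dev/ttyUSB0".to_string());

    let driver = registry
        .create("LidarDriver", &params)
        .expect("factory should succeed");
    assert_eq!(driver.describe(), "lidar on /dev/ttyUSB0");
    assert!(registry.create("Unknown", &params).is_err());
}
```

Keeping the factory as a plain `fn` pointer means the registry stores no state of its own beyond the lookup table, so a config loader only needs the type name and the parameter table to build a driver.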
### Full Example ```rust // simplified use horus::prelude::*; use horus::hardware::NodeParams; struct LidarDriver { port: String, rpm: u32, } impl LidarDriver { fn from_params(params: &NodeParams) -> Result<Self> { Ok(Self { port: params.get::<String>("port")?, rpm: params.get_or("rpm", 600u32), }) } } impl Node for LidarDriver { fn name(&self) -> &str { "lidar" } fn tick(&mut self) { // Read from hardware, publish scan data } } register_driver!(LidarDriver, LidarDriver::from_params); ``` Then in `horus.toml`: ```toml [hardware.lidar] use = "LidarDriver" port = "/dev/ttyUSB0" rpm = 300 ``` And in `main.rs`: ```rust // simplified fn main() -> Result<()> { let nodes = hardware::load()?; let mut sched = Scheduler::new().tick_rate(100_u64.hz()); for node in nodes { sched.add(node).build()?; } sched.run()?; Ok(()) } ``` ### NodeParams API The factory function receives a `NodeParams` with typed access to the `[hardware.NAME]` config table: | Method | Description | |--------|-------------| | `params.get::<T>(key)?` | Required param — errors if missing or wrong type | | `params.get_or(key, default)` | Optional param with fallback value | | `params.has(key)` | Check if a param exists | | `params.keys()` | Iterate over all param names | Supported types: `String`, `bool`, `u8`–`u64`, `i8`–`i64`, `f32`, `f64`, `Vec<T>`. ### How It Works `register_driver!` places a constructor function in the ELF `.init_array` section. When the binary loads, the startup code runs it before `main()`, registering the factory in a global registry. `hardware::load()` reads `[hardware]` from `horus.toml`, looks up each `use` name in the registry, and calls the factory with the config params. --- ## See Also - [node! Macro Guide](/concepts/node-macro) - Detailed tutorial - [API Reference](/rust/api) - Core types reference - [Actions](/concepts/actions) - action! macro reference - [Services](/concepts/services) - service! 
macro reference - [Drivers](/concepts/drivers) - Hardware driver system --- ## Node API Path: /rust/api/node Description: Complete API reference for the HORUS Node trait — lifecycle, safety, error handling, and monitoring # Node API The `Node` trait is the most fundamental type in HORUS. Every component in your robotics system — sensors, controllers, planners, loggers — implements `Node`. The scheduler calls your trait methods in a defined lifecycle: `init()` once at startup, `tick()` repeatedly, and `shutdown()` once at exit. > **Python**: Available via `horus.Node(name, tick, rate, pubs, subs)`. See [Python API](/python/api). ```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Trait Methods | Method | Required | Default | Description | |--------|----------|---------|-------------| | `name(&self) -> &str` | No | Type name | Unique identifier for this node | | `init(&mut self) -> Result<()>` | No | `Ok(())` | One-time initialization at startup | | `tick(&mut self)` | **Yes** | — | Main execution loop, called every cycle | | `shutdown(&mut self) -> Result<()>` | No | `Ok(())` | Cleanup on exit | | `is_safe_state(&self) -> bool` | No | `true` | Safety monitor checks this for recovery | | `enter_safe_state(&mut self)` | No | no-op | Called by watchdog on critical timeout | | `on_error(&mut self, error: &str)` | No | logs error | Called when `tick()` panics | | `publishers(&self) -> Vec` | No | `[]` | Internal: topic discovery | | `subscribers(&self) -> Vec` | No | `[]` | Internal: topic discovery | ## Quick Reference — Supporting Types | Type | Purpose | |------|---------| | `NodeState` | Lifecycle state (Uninitialized, Running, Stopped, etc.) 
| | `NodeHealthState` | Watchdog health (Healthy, Warning, Unhealthy, Isolated) | | `NodeMetrics` | Performance counters (tick times, message counts, errors) | | `Miss` | Deadline miss policy (Warn, Skip, SafeMode, Stop) | | `FailurePolicy` | Error recovery strategy (Fatal, Restart, Skip, Ignore) | | `TopicMetadata` | Topic name + type name for monitoring | --- ## Lifecycle The scheduler manages Node lifecycle in a strict order: ``` Construction → Registration → Init → Tick Loop → Shutdown you scheduler once repeated once ``` 1. **Construction** — You create your node struct with topics, state, and configuration 2. **Registration** — `scheduler.add(node).build()` validates config and registers the node 3. **Initialization** — On first `run()` or `tick_once()`, the scheduler calls `init()` (lazy). Wrapped in `catch_unwind` for panic safety. Topic associations are auto-detected via `TopicNodeRegistry` when topics are used during tick 4. **Tick Loop** — Each cycle: check health state, feed watchdog, call `tick()`, measure timing, check budget/deadline. If `tick()` panics, `on_error()` is called 5. **Shutdown** — On Ctrl+C, SIGTERM, or `.stop()`: `shutdown()` is called on each node in order. Wrapped in `catch_unwind`. A timing report is printed --- ## Trait Methods ### name() ```rust // simplified fn name(&self) -> &str ``` Returns a unique identifier for this node within the scheduler. **Default**: Extracts the struct's type name from `std::any::type_name::<T>()`, stripping the module path. For `my_crate::sensors::ImuReader`, the default is `"ImuReader"`. **Override when**: You create multiple instances of the same struct, or want a human-readable name for monitoring. ```rust // simplified struct ImuReader { name: String, // ... } impl Node for ImuReader { fn name(&self) -> &str { &self.name } fn tick(&mut self) { /* ... 
*/ } } ``` **Rules**: - Must be unique within a scheduler — duplicate names cause a build error - Used in `horus node list`, `horus log`, `horus monitor`, and `horus blackbox` - Must be stable across restarts for recording/replay to match --- ### init() ```rust // simplified fn init(&mut self) -> Result<()> ``` Called once when the scheduler first starts (`run()` or `tick_once()`). Use for setup that may fail: opening hardware, connecting to networks, loading calibration files. **Default**: Returns `Ok(())`. **When called**: Lazily on first run, not at `scheduler.add()` time. Wrapped in `catch_unwind` — panics are caught and handled by the `FailurePolicy`. **On error**: The configured `FailurePolicy` determines behavior. `Fatal` (default) stops the scheduler. `Restart` retries with exponential backoff. ```rust // simplified impl Node for LidarDriver { fn name(&self) -> &str { "lidar" } fn init(&mut self) -> Result<()> { self.device = SerialPort::open(&self.port) .map_err(|e| Error::config(format!("Cannot open {}: {}", self.port, e)))?; hlog!(info, "Lidar connected on {}", self.port); Ok(()) } fn tick(&mut self) { /* read scans */ } } ``` **Rules**: - Do heavy setup here, not in the struct constructor — keeps `scheduler.add()` fast - If init fails, `tick()` and `shutdown()` are never called for this node - Topics created in the constructor (before init) are valid — SHM is allocated on `Topic::new()` --- ### tick() ```rust // simplified fn tick(&mut self) ``` **The only required method.** Called repeatedly by the scheduler at the configured rate. This is your main execution loop — read sensors, compute control, publish results. The scheduler wraps every tick in timing measurement, profiling, budget checking, and watchdog feeding. If `tick()` panics, `catch_unwind` catches it and routes to `on_error()`. 
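The panic routing described above can be pictured with a std-only sketch built on `std::panic::catch_unwind`. The `TickNode` trait, `guarded_tick`, and `Flaky` are hypothetical names for illustration; how the real scheduler extracts the panic message is an assumption here.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Minimal stand-in for a node; not the real HORUS `Node` trait.
trait TickNode {
    fn tick(&mut self);
    fn on_error(&mut self, error: &str);
}

/// Runs one tick. Returns `true` if it completed and `false` if it panicked,
/// in which case the panic message is handed to `on_error()`.
fn guarded_tick(node: &mut dyn TickNode) -> bool {
    // `AssertUnwindSafe` is needed because the closure captures `&mut` state.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| node.tick()));
    match outcome {
        Ok(()) => true,
        Err(payload) => {
            // Panic payloads are usually `&str` or `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            node.on_error(&message);
            false
        }
    }
}

/// A node that panics on its second tick only.
struct Flaky {
    ticks: u32,
    errors: Vec<String>,
}

impl TickNode for Flaky {
    fn tick(&mut self) {
        self.ticks += 1;
        if self.ticks == 2 {
            panic!("sensor read failed");
        }
    }

    fn on_error(&mut self, error: &str) {
        self.errors.push(error.to_string());
    }
}

fn main() {
    let mut node = Flaky { ticks: 0, errors: Vec::new() };
    let results: Vec<bool> = (0..3).map(|_| guarded_tick(&mut node)).collect();
    // The second tick panics, is caught, and the loop keeps going.
    assert_eq!(results, vec![true, false, true]);
    assert_eq!(node.errors, vec!["sensor read failed".to_string()]);
}
```

The default panic hook still prints the panic to stderr; the sketch only shows that the loop survives and the message reaches `on_error()`.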
```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "motor" } fn tick(&mut self) { if let Some(cmd) = self.cmd_sub.try_recv() { let left = cmd.linear - cmd.angular * self.wheel_base / 2.0; let right = cmd.linear + cmd.angular * self.wheel_base / 2.0; self.motor_pub.send(MotorCommand { left_rpm: left * self.rpm_scale, right_rpm: right * self.rpm_scale, }); } } } ``` **Critical rules for tick()**: - **Never allocate** — avoid `Vec::new()`, `String::from()`, `Box::new()` in the hot path. Pre-allocate in `init()` or the constructor - **Never block on I/O** — use `.try_recv()`, not `.recv_blocking()`. Move I/O to `async_io()` nodes - **Never panic on recoverable errors** — use `if let Some` / `match`, not `.unwrap()` - **Never call `std::thread::sleep()`** — the scheduler handles timing - **Keep it fast** — target <10μs for RT nodes. Use `--release` for accurate measurement --- ### shutdown() ```rust // simplified fn shutdown(&mut self) -> Result<()> ``` Called once when the scheduler stops (Ctrl+C, SIGTERM, `.stop()`, or scope exit). Use for cleanup: zeroing motors, closing files, releasing hardware. **Default**: Returns `Ok(())`. **When called**: Nodes shut down in registration order. Each call is wrapped in `catch_unwind` — a panicking shutdown does not prevent other nodes from shutting down. ```rust // simplified impl Node for MotorController { fn name(&self) -> &str { "motor" } fn tick(&mut self) { /* ... 
*/ } fn shutdown(&mut self) -> Result<()> { // CRITICAL: Zero motors before exit self.motor_pub.send(MotorCommand { left_rpm: 0.0, right_rpm: 0.0, }); hlog!(info, "Motors zeroed"); Ok(()) } } ``` **Rules**: - **Always zero actuators** — if your node controls motors, servos, or any physical output, send a zero/stop command in shutdown - **Never panic** — a panic in shutdown is caught but prevents clean resource release - **Don't assume ordering** — while nodes shut down in registration order, another node's shutdown may have already released a shared resource - **Keep it fast** — the scheduler has a 3-second timeout before detaching --- ### is_safe_state() ```rust // simplified fn is_safe_state(&self) -> bool ``` Called by the safety monitor each tick for nodes in `Isolated` health state. If this returns `true`, the node transitions back to `Healthy` and resumes normal ticking. **Default**: Returns `true` (node is always considered safe). **When to override**: Implement for nodes that need recovery verification — check that sensors are reading valid data, that actuators are in a known position, or that communication is restored. ```rust // simplified impl Node for ArmController { fn name(&self) -> &str { "arm" } fn tick(&mut self) { /* ... */ } fn is_safe_state(&self) -> bool { // Only resume if all joints are within limits self.joints.iter().all(|j| j.position.abs() < j.limit) } fn enter_safe_state(&mut self) { // Freeze all joints for joint in &mut self.joints { joint.target_velocity = 0.0; } } } ``` **Rules**: - Called every tick while the node is `Isolated` — keep it fast (no I/O) - Returning `false` indefinitely keeps the node isolated forever - The scheduler never calls `tick()` on an Isolated node until `is_safe_state()` returns `true` --- ### enter_safe_state() ```rust // simplified fn enter_safe_state(&mut self) ``` Called by the watchdog when a node reaches `Critical` severity (3x timeout elapsed) or when a `Miss::SafeMode` deadline miss fires. 
Transition to a safe configuration — stop motors, hold position, reduce rate. **Default**: No-op. **When to override**: Any node that controls physical actuators or makes safety-critical decisions. ```rust // simplified fn enter_safe_state(&mut self) { self.motor_pub.send(MotorCommand::zero()); self.in_safe_mode = true; hlog!(warn, "Entered safe state — motors zeroed"); } ``` **Rules**: - Must be fast — called from the watchdog path, not the normal tick path - After this is called, the node's health transitions to `Isolated` - The scheduler calls `is_safe_state()` each tick to check for recovery - If you set `on_miss(Miss::SafeMode)`, this is called on every deadline miss --- ### on_error() ```rust // simplified fn on_error(&mut self, error: &str) ``` Called when `tick()` panics. The panic is caught by `catch_unwind`, and the panic message is passed as `error`. Override to add custom recovery logic, telemetry, or graceful degradation. **Default**: Logs `"Node error: {error}"` via `hlog!(error, ...)`. ```rust // simplified fn on_error(&mut self, error: &str) { self.error_count += 1; hlog!(error, "Tick failed ({}x): {}", self.error_count, error); if self.error_count > 10 { self.motor_pub.send(MotorCommand::zero()); } } ``` **Rules**: - Called after `catch_unwind` — the panic has been absorbed, the scheduler continues - The `FailurePolicy` determines what happens next (retry, skip, fatal stop) - Do not panic in `on_error()` — it would cause a double-panic --- ## Supporting Types ### TopicMetadata ```rust // simplified pub struct TopicMetadata { pub topic_name: String, pub type_name: String, } ``` Describes a topic connection for monitoring. Topic-node associations are auto-detected when a node uses its topics during `tick()` — you don't need to declare them manually. ### NodeState Lifecycle state of a node, tracked by the scheduler. 
| Variant | Description | |---------|-------------| | `Uninitialized` | Registered but `init()` not yet called | | `Initializing` | `init()` is running | | `Running` | Normal operation — `tick()` being called | | `Stopping` | `shutdown()` is running | | `Stopped` | Shutdown complete | | `Error(String)` | `init()` or `tick()` returned an error | | `Crashed(String)` | `tick()` panicked | ### NodeHealthState Watchdog health, stored as `AtomicU8` for lock-free per-node tracking. | Variant | Value | Trigger | Behavior | |---------|-------|---------|----------| | `Healthy` | 0 | Normal operation | Node is ticked every cycle | | `Warning` | 1 | 1x watchdog timeout | Node still ticks, warning logged | | `Unhealthy` | 2 | 2x watchdog timeout | Node skipped in tick loop | | `Isolated` | 3 | 3x timeout on critical node | `enter_safe_state()` called, node skipped until `is_safe_state()` returns `true` | ### NodeMetrics Performance counters collected by the scheduler. Access via `scheduler.get_node_metrics("name")`. | Field | Type | Description | |-------|------|-------------| | `name` | `String` | Node name | | `order` | `u32` | Execution order | | `total_ticks` | `u64` | Total tick count | | `successful_ticks` | `u64` | Ticks completed without error | | `failed_ticks` | `u64` | Ticks that panicked or errored | | `avg_tick_duration_ms` | `f64` | Running average tick time | | `max_tick_duration_ms` | `f64` | Worst-case tick time | | `min_tick_duration_ms` | `f64` | Best-case tick time | | `last_tick_duration_ms` | `f64` | Most recent tick time | | `messages_sent` | `u64` | Total messages published | | `messages_received` | `u64` | Total messages received | | `errors_count` | `u64` | Total errors | | `warnings_count` | `u64` | Total warnings | | `uptime_seconds` | `f64` | Time since init | ### Miss Deadline miss policy, set via `.on_miss()` on the node builder. 
| Variant | Behavior | |---------|----------| | `Warn` (default) | Log a warning, continue normally | | `Skip` | Skip this tick, resume next cycle | | `SafeMode` | Call `enter_safe_state()`, check `is_safe_state()` for recovery | | `Stop` | Stop the entire scheduler (last resort) | ### FailurePolicy Error recovery strategy, set via `.failure_policy()` on the node builder. | Variant | Fields | Behavior | |---------|--------|----------| | `Fatal` | — | Node failure stops the scheduler immediately | | `Restart` | `max_restarts: u32`, `initial_backoff: Duration` | Re-initialize with exponential backoff. Escalates to fatal after max restarts | | `Skip` | `max_failures: u32`, `cooldown: Duration` | Tolerate failures with cooldown. After max consecutive failures, skip until cooldown expires | | `Ignore` | — | Log and continue — node keeps ticking regardless of errors | --- ## Production Patterns ### Standard Sensor Node A complete sensor node with topics, error handling, and clean shutdown: ```rust // simplified use horus::prelude::*; struct LidarNode { scan_pub: Topic, device: Option, port: String, } impl LidarNode { fn new(port: &str) -> Result { Ok(Self { scan_pub: Topic::new("scan")?, device: None, port: port.to_string(), }) } } impl Node for LidarNode { fn name(&self) -> &str { "lidar" } fn init(&mut self) -> Result<()> { self.device = Some(SerialPort::open(&self.port)?); hlog!(info, "Lidar connected on {}", self.port); Ok(()) } fn tick(&mut self) { if let Some(ref mut dev) = self.device { if let Ok(scan) = dev.read_scan() { self.scan_pub.send(scan); } } } fn shutdown(&mut self) -> Result<()> { if let Some(ref mut dev) = self.device { dev.stop_motor()?; } hlog!(info, "Lidar stopped"); Ok(()) } } fn main() -> Result<()> { let mut sched = Scheduler::new().tick_rate(10_u64.hz()); sched.add(LidarNode::new("/dev/ttyUSB0")?) 
.order(0) .failure_policy(FailurePolicy::Restart { max_restarts: 3, initial_backoff: 1_u64.ms().into(), }) .build()?; sched.run() } ``` ### Safety-Critical Node A motor controller with full safety integration: ```rust // simplified use horus::prelude::*; struct MotorController { cmd_sub: Topic<CmdVel>, motor_pub: Topic<MotorCommand>, safe_mode: bool, wheel_base: f32, } impl MotorController { fn new() -> Result<Self> { Ok(Self { cmd_sub: Topic::new("cmd_vel")?, motor_pub: Topic::new("motor.cmd")?, safe_mode: false, wheel_base: 0.3, }) } } impl Node for MotorController { fn name(&self) -> &str { "motor_controller" } fn tick(&mut self) { if self.safe_mode { return; } if let Some(cmd) = self.cmd_sub.try_recv() { self.motor_pub.send(MotorCommand { left_rpm: (cmd.linear - cmd.angular * self.wheel_base / 2.0) * 100.0, right_rpm: (cmd.linear + cmd.angular * self.wheel_base / 2.0) * 100.0, }); } } fn shutdown(&mut self) -> Result<()> { self.motor_pub.send(MotorCommand { left_rpm: 0.0, right_rpm: 0.0 }); Ok(()) } fn enter_safe_state(&mut self) { self.motor_pub.send(MotorCommand { left_rpm: 0.0, right_rpm: 0.0 }); self.safe_mode = true; hlog!(warn, "Safe state — motors zeroed"); } fn is_safe_state(&self) -> bool { // Simplified: always reports safe. A real implementation would check encoder feedback to confirm the motors have stopped. true } fn on_error(&mut self, error: &str) { hlog!(error, "Motor error: {}", error); self.enter_safe_state(); } } fn main() -> Result<()> { let mut sched = Scheduler::new() .tick_rate(100_u64.hz()) .watchdog(500_u64.ms().into()); sched.add(MotorController::new()?) .order(1) .rate(100_u64.hz()) .budget(200_u64.us().into()) .deadline(900_u64.us().into()) .on_miss(Miss::SafeMode) .build()?; sched.run() } ``` --- ## Design Decisions **Why `Send` but not `Sync`?** Nodes are moved to the scheduler, which may run them on dedicated RT threads. `Send` is required for this transfer.
But nodes are never accessed from multiple threads simultaneously — the scheduler owns exclusive access during tick — so `Sync` is unnecessary. This lets nodes contain `RefCell`, `UnsafeCell`, or any non-`Sync` state. **Why is `tick()` the only required method?** A minimal node just processes data each cycle. Everything else — init, shutdown, safety, monitoring — has sensible defaults. This keeps the "hello world" implementation to 4 lines while allowing full lifecycle control for production systems. **Why lazy initialization?** `init()` is called on first `run()`, not at `scheduler.add()` time. This lets you build the full node graph before any hardware is opened or resources are allocated. It also means `tick_once()` (used in testing) triggers init on first call, making tests self-contained. **Why `catch_unwind` on every lifecycle method?** A panicking node must not crash the entire robot. The scheduler catches panics in `init()`, `tick()`, and `shutdown()`, routing them through `FailurePolicy`. This is defense-in-depth — you should still avoid panics, but the system survives them. **How does monitoring know which topics a node uses?** Topic-node associations are auto-detected. When your node calls `Topic::new("motor.cmd")` during `tick()`, the `TopicNodeRegistry` automatically records the association. No manual declaration needed. 
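The panic-containment rationale above can be illustrated with a minimal, standard-library-only sketch. It assumes nothing about HORUS internals: `TickNode`, `FlakyNode`, and `guarded_tick` are hypothetical names invented for this example, and the real scheduler additionally routes failures through `FailurePolicy`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a node; this is not the HORUS `Node` trait.
trait TickNode {
    fn tick(&mut self);
    fn on_error(&mut self, error: &str);
}

/// Toy node that panics on its second tick.
struct FlakyNode {
    ticks: u32,
    errors: Vec<String>,
}

impl TickNode for FlakyNode {
    fn tick(&mut self) {
        self.ticks += 1;
        if self.ticks == 2 {
            panic!("sensor read failed");
        }
    }

    fn on_error(&mut self, error: &str) {
        self.errors.push(error.to_string());
    }
}

/// Runs one tick and absorbs a panic instead of unwinding into the caller.
/// Returns `true` if the tick completed normally.
fn guarded_tick(node: &mut dyn TickNode) -> bool {
    // AssertUnwindSafe: we accept that the node may be left in a
    // partially updated state after a panic.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| node.tick()));
    match outcome {
        Ok(()) => true,
        Err(payload) => {
            // Panic payloads are usually `&str` or `String`.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            node.on_error(&message);
            false
        }
    }
}
```

Calling `guarded_tick` three times on a fresh `FlakyNode` lets the first and third ticks succeed while the second is reported through `on_error`, which is the "system survives, node is notified" behavior the design note describes.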
--- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | Trait-based (not struct-based) | Full control over state, zero overhead | More boilerplate than the `node!` macro | | `&mut self` on tick (not a context param) | Direct field access, no borrowing gymnastics | Node must own its topics and state | | Default `is_safe_state() = true` | Nodes work without safety config | Unsafe nodes silently recover from isolation | | `catch_unwind` on every method | System survives panics | ~5ns overhead per tick, hides bugs if relied upon | | Single `tick()` entry point | Simple mental model | Complex nodes need internal state machines | --- ## See Also - [Node Concepts](/concepts/core-concepts-nodes) — Usage patterns, lifecycle diagram, communication patterns - [node! Macro](/concepts/node-macro) — Reduce boilerplate with declarative node definition - [Scheduler API](/rust/api/scheduler) — Node registration, execution classes, runtime management - [Safety Monitor](/advanced/safety-monitor) — Graduated watchdog, deadline enforcement, BlackBox - [Topic API](/rust/api/topic) — Publishing and subscribing to messages - [Macros Reference](/rust/api/macros) — `message!`, `node!`, `hlog!`, and more --- ## Scheduler API Path: /rust/api/scheduler Description: Complete API reference for the HORUS Scheduler — node orchestration, real-time scheduling, and runtime management # Scheduler API The Scheduler is the runtime that manages node execution in HORUS. It orchestrates tick loops, allocates real-time threads, enforces deadlines, and handles graceful shutdown. Every HORUS application creates exactly one Scheduler, adds nodes to it, and calls `.run()`. > **Python**: Available via `horus.Scheduler(tick_rate, rt, watchdog_ms)`. See [Python Bindings](/python/api/python-bindings). 
```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Scheduler Builder | Method | Default | Description | |--------|---------|-------------| | `.name(name)` | `"Scheduler"` | Sets the scheduler name for logging and diagnostics | | `.tick_rate(freq)` | 100 Hz | Sets the global tick loop frequency | | `.prefer_rt()` | — | Enables RT features with graceful degradation | | `.require_rt()` | — | Enables RT features, panics if unavailable | | `.deterministic(bool)` | `false` | Enables deterministic mode (SimClock, no parallelism) | | `.watchdog(duration)` | disabled | Enables frozen node detection with graduated degradation | | `.blackbox(size_mb)` | disabled | Enables flight recorder for crash forensics | | `.max_deadline_misses(n)` | 100 | Sets consecutive miss threshold before node isolation | | `.cores(cpu_ids)` | all cores | Pins scheduler threads to specific CPU cores | | `.verbose(bool)` | `true` | Enables or disables non-emergency logging | | `.with_recording()` | disabled | Enables session record/replay | ## Quick Reference — Node Builder | Method | Default | Description | |--------|---------|-------------| | `.order(n)` | 100 | Optional tiebreaker for independent nodes (lower = runs first). Ordering is automatic when nodes have topic dependencies. | | `.rate(freq)` | global | Sets per-node tick rate, auto-enables RT | | `.budget(duration)` | 80% of period | Sets maximum tick execution time | | `.deadline(duration)` | 95% of period | Sets hard deadline per tick | | `.on_miss(policy)` | `Warn` | Sets deadline miss policy | | `.compute()` | — | Runs on parallel thread pool | | `.on(topic)` | — | Triggers only when topic receives data | | `.async_io()` | — | Runs on tokio async runtime | | `.priority(n)` | — | Sets OS SCHED_FIFO priority (1-99) | | `.core(cpu_id)` | — | Pins node thread to a CPU core (also locks governor + moves IRQs) | | `.deadline_scheduler()` | — | Opt-in to SCHED_DEADLINE (kernel EDF). 
Falls back to SCHED_FIFO | | `.no_alloc()` | — | Panic if `tick()` allocates heap memory (requires `RtAwareAllocator`) | | `.failure_policy(p)` | — | Sets error recovery policy | | `.build()` | — | Validates and registers the node | ## Quick Reference — Execution | Method | Returns | Description | |--------|---------|-------------| | `.run()` | `Result<()>` | Starts the tick loop, blocks until Ctrl+C | | `.run_for(duration)` | `Result<()>` | Runs for a specific duration | | `.tick_once()` | `Result<()>` | Executes exactly one tick cycle | | `.stop()` | `()` | Signals graceful shutdown | --- ## Scheduler Builder Methods ### `new()` Creates a new Scheduler with default configuration and auto-detected platform capabilities. **Signature** ```rust // simplified pub fn new() -> Self ``` **Parameters** None. **Returns** `Scheduler` — with 100 Hz tick rate, no watchdog, no blackbox, no RT features enabled. No nodes registered. **Panics** Never. **Behavior** 1. Detects RT capabilities: `SCHED_FIFO` support, `mlockall` permission, CPU topology (~30-100us) 2. Cleans up stale SHM namespaces from previously crashed processes (<1ms) 3. Does NOT enable any RT features — use `.prefer_rt()` or `.require_rt()` to opt in **Example** ```rust // simplified use horus::prelude::*; let mut scheduler = Scheduler::new(); // Scheduler is ready — add nodes and call .run() ``` --- ### `tick_rate(freq)` Sets the global scheduler loop frequency. Individual nodes can override with `.rate()`. **Signature** ```rust // simplified pub fn tick_rate(self, freq: Frequency) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `freq` | `Frequency` | yes | Target tick rate. Create with `.hz()`: `100_u64.hz()`, `1000_u64.hz()`. | **Returns** `Self` — chainable. 
**Panics** Indirectly — `Frequency` validates at construction: - `0_u64.hz()` panics (zero frequency) - NaN, infinite, or negative values panic **Behavior** - Nodes without their own `.rate()` tick at this frequency - Nodes WITH `.rate()` tick at their own frequency, independent of the global rate - Higher rates = lower latency but more CPU usage **Example** ```rust // simplified use horus::prelude::*; // 100 Hz default control loop let s = Scheduler::new().tick_rate(100_u64.hz()); // 1 kHz high-frequency servo control let s = Scheduler::new().tick_rate(1000_u64.hz()); ``` **When to use** - Set to the fastest rate any BestEffort node needs - RT nodes (`.rate()`) are independent — the global rate doesn't limit them --- ### `prefer_rt()` Enables OS-level real-time features with graceful degradation. **Signature** ```rust // simplified pub fn prefer_rt(self) -> Self ``` **Parameters** None. **Returns** `Self` — chainable. **Panics** Never. Degrades gracefully. **Behavior** - Attempts to enable: `mlockall` (prevent page faults) + `SCHED_FIFO` (real-time scheduling) - If the system lacks RT capabilities, logs a warning and continues without them - Use `.degradations()` after construction to check which features were skipped > **Note**: `degradations()` is `#[doc(hidden)]`. It works but may change in future versions. **When to use** - Production robots where RT is desired but the deployment environment may vary - Development machines without RT kernels **When NOT to use** - Safety-critical systems where RT is mandatory — use `.require_rt()` instead **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new().prefer_rt(); // On RT kernel: mlockall + SCHED_FIFO enabled // On non-RT kernel: warning logged, runs normally ``` --- ### `require_rt()` Enables OS-level real-time features. Panics if unavailable. **Signature** ```rust // simplified pub fn require_rt(self) -> Self ``` **Parameters** None. **Returns** `Self` — chainable. 
**Panics** If the system has neither `SCHED_FIFO` nor `mlockall` support. Message: "RT capabilities required but not available". **When to use** - Safety-critical deployments where running without RT guarantees is unacceptable - Forces developers to fix the deployment environment rather than silently degrading **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new().require_rt(); // On non-RT kernel: PANICS ``` --- ### `deterministic(enabled)` Enables deterministic execution mode for reproducible runs. **Signature** ```rust // simplified pub fn deterministic(self, enabled: bool) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `enabled` | `bool` | yes | `true` to enable deterministic mode. `false` to disable (default). | **Returns** `Self` — chainable. **Behavior** When `true`: - Clock switches from `WallClock` to `SimClock` (virtual time, advances by exact `1/rate` per tick) - Execution is sequential — no parallelism, no thread pools - RNG is seeded deterministically - Two runs with identical inputs produce identical outputs regardless of CPU speed When `false`: - Normal wall-clock time, parallel execution, non-deterministic RNG **When to use** - Simulation and replay - Reproducible testing and CI pipelines - Debugging timing-dependent bugs **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new() .tick_rate(100_u64.hz()) .deterministic(true); // Time advances exactly 10ms per tick, regardless of actual CPU time ``` --- ### `watchdog(timeout)` Enables frozen node detection with graduated degradation. **Signature** ```rust // simplified pub fn watchdog(self, timeout: Duration) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `timeout` | `Duration` | yes | Maximum allowed tick duration before degradation triggers. Create with `.ms()`: `500_u64.ms()`. | **Returns** `Self` — chainable. 
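To make the graduated degradation concrete, here is a minimal sketch of how elapsed watchdog timeouts could map onto the health levels listed in the `NodeHealthState` table earlier in this reference. The `Health`, `classify`, and `HealthSlot` names are hypothetical and the escalation rule (one level per elapsed timeout) is an assumption drawn from that table, not the actual HORUS implementation.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::time::Duration;

/// Mirrors the numeric values in the NodeHealthState table; the type itself
/// is a hypothetical stand-in, not the HORUS definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Health {
    Healthy = 0,
    Warning = 1,
    Unhealthy = 2,
    Isolated = 3,
}

/// Maps the time since a node last completed a tick to a health level,
/// escalating one step per elapsed watchdog timeout.
fn classify(since_last_tick: Duration, timeout: Duration) -> Health {
    // Guard against a zero timeout to avoid dividing by zero.
    let timeouts_elapsed = since_last_tick.as_nanos() / timeout.as_nanos().max(1);
    match timeouts_elapsed {
        0 => Health::Healthy,
        1 => Health::Warning,
        2 => Health::Unhealthy,
        _ => Health::Isolated, // the table reserves this level for critical nodes
    }
}

/// Lock-free per-node health slot backed by an `AtomicU8`.
struct HealthSlot(AtomicU8);

impl HealthSlot {
    fn new() -> Self {
        HealthSlot(AtomicU8::new(Health::Healthy as u8))
    }

    fn store(&self, health: Health) {
        self.0.store(health as u8, Ordering::Release);
    }

    fn load(&self) -> Health {
        match self.0.load(Ordering::Acquire) {
            0 => Health::Healthy,
            1 => Health::Warning,
            2 => Health::Unhealthy,
            _ => Health::Isolated,
        }
    }
}
```

Storing the level in a single atomic byte is what allows a monitor thread to update it and the tick loop to read it without taking a lock.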
**Behavior** - Creates an internal safety monitor that checks every node's tick duration - Graduated response: Warning → reduce rate → isolate → enter safe state - Interacts with `.max_deadline_misses()` for the isolation threshold **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new().watchdog(500_u64.ms()); ``` --- ### `blackbox(size_mb)` Enables the flight recorder for post-crash analysis. **Signature** ```rust // simplified pub fn blackbox(self, size_mb: usize) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `size_mb` | `usize` | yes | Circular buffer size in megabytes. Typical: 32-128. | **Returns** `Self` — chainable. **Behavior** - Records node ticks, timing data, and events in a circular buffer - On crash, the buffer survives for post-mortem analysis via `horus blackbox` - Overwrites oldest data when full — like an airplane's black box **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new().blackbox(64); // 64MB flight recorder ``` --- ### `max_deadline_misses(n)` Sets the threshold for consecutive deadline misses before a node is isolated. **Signature** ```rust // simplified pub fn max_deadline_misses(self, n: u64) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `n` | `u64` | yes | Number of consecutive misses. Default: 100. | **Returns** `Self` — chainable. **Behavior** - Requires `.watchdog()` to be enabled — without it, misses aren't tracked - After `n` consecutive misses, the node is isolated (stopped from ticking) - Lower values = more aggressive isolation, higher values = more tolerance **Example** ```rust // simplified use horus::prelude::*; let s = Scheduler::new() .watchdog(500_u64.ms()) .max_deadline_misses(5); // isolate after 5 consecutive misses ``` --- ### `cores(cpu_ids)` Pins scheduler threads to specific CPU cores. 
**Signature** ```rust // simplified pub fn cores(self, cpu_ids: &[usize]) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `cpu_ids` | `&[usize]` | yes | CPU core indices. E.g., `&[2, 3]` for cores 2 and 3. | **Returns** `Self` — chainable. **When to use** - Production RT systems where you've isolated CPU cores for the robot - Prevents OS from migrating scheduler threads between cores --- ### `verbose(enabled)` Enables or disables non-emergency logging. **Signature** ```rust // simplified pub fn verbose(self, enabled: bool) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `enabled` | `bool` | yes | `true` for full logging (default). `false` for emergency-only. | **Returns** `Self` — chainable. --- ### `name(name)` Sets the scheduler name for logging and diagnostics. **Signature** ```rust // simplified pub fn name(self, name: &str) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `name` | `&str` | yes | Scheduler name. Default: `"Scheduler"`. | **Returns** `Self` — chainable. --- ### `with_recording()` Enables session recording for replay and debugging. **Signature** ```rust // simplified pub fn with_recording(self) -> Self ``` **Parameters** None. **Returns** `Self` — chainable. **Behavior** - Records all topic messages and timing data during the session - Use `horus record replay ` to replay --- ## Node Builder Methods Returned by `scheduler.add(node)`. Chain configuration methods, then call `.build()` to register. ```rust // simplified use horus::prelude::*; scheduler.add(my_node) .order(10) .rate(500_u64.hz()) .on_miss(Miss::Skip) .build()?; ``` ### `order(n)` Sets execution priority. Lower values run first within each tick cycle. 
**Signature** ```rust // simplified pub fn order(self, order: u32) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `order` | `u32` | no | Tiebreaker for independent nodes. Default: 100. Ordering is automatic when nodes have topic dependencies via `send()`/`recv()`. | **Returns** `NodeBuilder` — chainable. **Behavior** - Within a single tick, all nodes execute in order from lowest to highest - Applies to ALL execution classes — even Event and Compute nodes use order when multiple fire simultaneously **Guidelines** | Range | Use | |-------|-----| | 0-9 | Safety-critical (emergency stop, safety monitor) | | 10-49 | High priority (sensors, fast control loops) | | 50-99 | Normal (processing, planning) | | 100-199 | Low (logging, diagnostics) | | 200+ | Background (telemetry, analytics) | **Example** ```rust // simplified use horus::prelude::*; scheduler.add(safety_node).order(0).build()?; // runs first scheduler.add(motor_ctrl).order(10).build()?; // runs second scheduler.add(logger).order(200).build()?; // runs last ``` --- ### `rate(freq)` Sets a per-node tick rate. Automatically promotes BestEffort nodes to RT execution class. **Signature** ```rust // simplified pub fn rate(self, freq: Frequency) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `freq` | `Frequency` | yes | Node-specific tick rate. Create with `.hz()`: `1000_u64.hz()`. | **Returns** `NodeBuilder` — chainable. **Panics** Indirectly — `Frequency` validates at construction (zero, NaN, infinite panic). 
**Behavior** - If the node's execution class is still `BestEffort` (default), auto-promotes to `Rt` with: - `budget = period * 0.80` (e.g., 500 Hz → 2ms period → 1.6ms budget) - `deadline = period * 0.95` (e.g., 500 Hz → 2ms period → 1.9ms deadline) - If the node already has an execution class (`.compute()`, `.on()`, `.async_io()`), rate is informational — no RT promotion, no budget/deadline enforcement - Method call order doesn't matter — auto-derivation is deferred to `.build()` time **Constraints** | Combines with | Result | |--------------|--------| | `.budget()` | Explicit budget overrides the auto-derived 80% value | | `.deadline()` | Explicit deadline overrides the auto-derived 95% value | | `.compute()` | Stays Compute — rate limits frequency but no RT enforcement | | `.on(topic)` | Stays Event — rate is ignored | | `.async_io()` | Stays AsyncIo — rate limits frequency but no RT enforcement | **Example** ```rust // simplified use horus::prelude::*; // Auto-derived RT: budget=1.6ms, deadline=1.9ms scheduler.add(sensor).order(0).rate(500_u64.hz()).build()?; // Override budget: budget=0.5ms instead of auto-derived 0.8ms scheduler.add(motor) .order(1) .rate(1000_u64.hz()) .budget(500_u64.us()) .build()?; // Compute node with rate-limiting (no RT) scheduler.add(planner).order(50).compute().rate(10_u64.hz()).build()?; ``` --- ### `budget(duration)` Sets the maximum CPU time allowed per tick. Auto-enables RT scheduling. **Signature** ```rust // simplified pub fn budget(self, budget: Duration) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `budget` | `Duration` | yes | Maximum tick execution time. Create with `.us()` or `.ms()`: `300_u64.us()`. | **Returns** `NodeBuilder` — chainable. 
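The arithmetic behind the auto-derived values can be checked with a small standalone sketch. `effective_timing` is a hypothetical helper written for this illustration (it is not part of the HORUS API); it applies the documented 80% and 95% defaults and lets an explicit budget or deadline take precedence.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of the HORUS API): returns the
/// (budget, deadline) pair for a node, applying the documented defaults of
/// 80% and 95% of the period unless an explicit override is given.
fn effective_timing(
    rate_hz: u64,
    budget_override: Option<Duration>,
    deadline_override: Option<Duration>,
) -> (Duration, Duration) {
    assert!(rate_hz > 0, "rate must be non-zero");
    // Period of one tick in nanoseconds, e.g. 500 Hz gives 2_000_000 ns.
    let period_ns = 1_000_000_000u64 / rate_hz;
    let budget = budget_override.unwrap_or(Duration::from_nanos(period_ns * 80 / 100));
    let deadline = deadline_override.unwrap_or(Duration::from_nanos(period_ns * 95 / 100));
    (budget, deadline)
}
```

For example, `effective_timing(500, None, None)` yields 1.6 ms and 1.9 ms, and `effective_timing(1000, Some(Duration::from_micros(500)), None)` yields 500 µs and 950 µs, matching the figures quoted in the `rate()` examples.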
**Errors** `.build()` rejects if: - Budget is zero - Budget exceeds deadline - Budget set on a non-RT execution class (no `.rate()` and class is Compute/Event/AsyncIo) **Behavior** - Overrides the auto-derived budget from `.rate()` (which defaults to 80% of period) - If called without `.rate()`, implicitly enables RT scheduling - When exceeded, the safety monitor counts it as a budget violation **Example** ```rust // simplified use horus::prelude::*; scheduler.add(motor) .order(0) .rate(1000_u64.hz()) .budget(300_u64.us()) // override auto-derived 800us .deadline(900_u64.us()) // override auto-derived 950us .on_miss(Miss::Skip) .build()?; ``` --- ### `deadline(duration)` Sets the hard deadline per tick. When exceeded, the `Miss` policy fires. **Signature** ```rust // simplified pub fn deadline(self, deadline: Duration) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `deadline` | `Duration` | yes | Hard deadline. Create with `.us()` or `.ms()`: `900_u64.us()`. | **Returns** `NodeBuilder` — chainable. **Errors** `.build()` rejects if: - Deadline is zero - Deadline is less than budget - Deadline set on a non-RT execution class **Behavior** - Overrides the auto-derived deadline from `.rate()` (which defaults to 95% of period) - If called without `.rate()`, implicitly enables RT scheduling - When exceeded, the `Miss` policy fires (`.on_miss()`) --- ### `on_miss(policy)` Sets the policy for when a node exceeds its deadline. **Signature** ```rust // simplified pub fn on_miss(self, policy: Miss) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `policy` | `Miss` | yes | One of `Miss::Warn`, `Miss::Skip`, `Miss::SafeMode`, `Miss::Stop`. Default: `Miss::Warn`. | **Returns** `NodeBuilder` — chainable. **Errors** `.build()` warns (but allows) if `.on_miss()` is set on a node without a deadline. 
**Example** ```rust // simplified use horus::prelude::*; scheduler.add(motor) .order(0) .rate(1000_u64.hz()) .on_miss(Miss::SafeMode) // calls enter_safe_state() on deadline miss .build()?; ``` --- ### `compute()` Marks the node as CPU-bound. Runs on a parallel worker thread pool. **Signature** ```rust // simplified pub fn compute(self) -> Self ``` **Parameters** None. **Returns** `NodeBuilder` — chainable. Sets execution class to `Compute`. **Behavior** - The node runs in a thread pool, isolated from the main tick loop and RT threads - If `.rate()` was already set, the node is rate-limited but does NOT get RT guarantees - If another execution class was already set (`.on()`, `.async_io()`), this overrides it with a warning **Constraints** | Combines with | Result | |--------------|--------| | `.rate()` | Rate-limited Compute (no RT budget/deadline enforcement) | | `.on(topic)` | Conflict — `.compute()` overrides Event. Last setter wins. Warning logged. | | `.async_io()` | Conflict — `.compute()` overrides AsyncIo. Last setter wins. Warning logged. | | `.budget()` / `.deadline()` | Rejected by `.build()` — Compute nodes don't have timing guarantees | | `.order()` | Works — determines priority when multiple Compute nodes finish simultaneously | **When to use** - Path planning, SLAM, image processing, ML inference, inverse kinematics - Any CPU-heavy work that would block the main tick loop **When NOT to use** - Real-time control loops — use `.rate()` instead (needs guaranteed timing) - Network/file I/O — use `.async_io()` instead (blocking I/O wastes thread pool slots) - Event-driven processing — use `.on(topic)` instead (no need to poll) **Example** ```rust // simplified use horus::prelude::*; scheduler.add(PathPlanner::new()) .order(50) .compute() .rate(10_u64.hz()) // rate-limited to 10 Hz, no RT enforcement .build()?; ``` --- ### `on(topic)` Makes the node event-driven. Ticks only when the named topic receives a new message. 
**Signature** ```rust // simplified pub fn on(self, topic: &str) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `topic` | `&str` | yes | Topic name to listen on. Must match a `Topic::new(name)` call exactly (case-sensitive). Use `/`-delimited hierarchical names: `"sensors/emergency_stop"`. | **Returns** `NodeBuilder` — chainable. Sets execution class to `Event`. **Errors** | Error | Condition | |-------|-----------| | `.build()` returns `Err` | `topic` is an empty string `""` | | `.build()` returns `Err` | `.budget()` or `.deadline()` also set (Event nodes have no timing guarantees) | | `.build()` warns | `.on_miss()` also set (no deadline to miss on Event nodes) | **Panics** Never. Invalid configuration is caught by `.build()`. **Behavior** - The node does NOT tick on the scheduler's global clock — it has its own wake mechanism - When a message is published to the named topic (by any node, any process), this node wakes and `tick()` is called - If multiple messages arrive between ticks, the node ticks once — call `recv()` in a loop to drain all pending messages - `.order()` still applies: if two event-driven nodes wake on the same tick, lower order runs first **Constraints** | Combines with | Result | |--------------|--------| | `.rate()` | Conflict — `.on()` overrides and the rate is ignored. Warning logged. | | `.compute()` | Conflict — last setter wins. Warning logged. | | `.async_io()` | Conflict — last setter wins. Warning logged.
| | `.budget()` / `.deadline()` | Rejected by `.build()` — Event nodes have no timing guarantees | | `.order()` | Works — priority when multiple events fire simultaneously | | `.on_miss()` | Warned — no deadline to miss on Event nodes | **When to use** - Emergency stop handlers — react to safety events immediately - Command processors — act on commands as they arrive, not on a clock - Event-driven pipelines — process images only when a new frame arrives **When NOT to use** - Continuous control loops — use `.rate()` instead (motor control needs guaranteed frequency) - Periodic polling — use default BestEffort (ticks every scheduler cycle) - CPU-heavy work triggered by events — use `.on()` to detect, then dispatch to a `.compute()` node via a topic **Example** ```rust // simplified use horus::prelude::*; // E-stop handler — ticks only when someone publishes to "emergency_stop" scheduler.add(EmergencyHandler::new()) .order(0) .on("emergency_stop") .build()?; // In the handler node: impl Node for EmergencyHandler { fn tick(&mut self) { // IMPORTANT: drain all pending messages — multiple stops may have fired while let Some(stop_cmd) = self.estop_topic.recv() { self.execute_stop(stop_cmd); } } } ``` --- ### `async_io()` Runs the node on the tokio async runtime. For blocking I/O operations. **Signature** ```rust // simplified pub fn async_io(self) -> Self ``` **Parameters** None. **Returns** `NodeBuilder` — chainable. Sets execution class to `AsyncIo`. **Behavior** - Runs `tick()` via `tokio::task::spawn_blocking()` on a separate runtime - Blocking I/O in this node never affects RT jitter or Compute throughput - If `.rate()` was set, the node is rate-limited but without RT enforcement **Constraints** | Combines with | Result | |--------------|--------| | `.rate()` | Rate-limited AsyncIo (no RT enforcement) | | `.compute()` | Conflict — `.async_io()` overrides. Last setter wins. Warning logged. | | `.on(topic)` | Conflict — `.async_io()` overrides. Last setter wins. 
Warning logged. | | `.budget()` / `.deadline()` | Rejected by `.build()` — AsyncIo nodes have no timing guarantees | **When to use** - Network calls (HTTP APIs, cloud upload, telemetry export) - File I/O (logging to disk, configuration reloading) - Database queries **When NOT to use** - CPU-bound work — use `.compute()` instead - Real-time control — use `.rate()` instead **Example** ```rust // simplified use horus::prelude::*; scheduler.add(TelemetryUploader::new()) .order(200) .async_io() .rate(1_u64.hz()) // upload once per second .build()?; ``` --- ### `priority(prio)` Sets OS-level thread priority for SCHED_FIFO real-time scheduling. **Signature** ```rust // simplified pub fn priority(self, prio: i32) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `prio` | `i32` | yes | SCHED_FIFO priority, 1-99. Higher = more priority. Requires `CAP_SYS_NICE`. | **Returns** `NodeBuilder` — chainable. **Errors** `.build()` warns if set on a non-RT node (`.priority()` only meaningful with `.rate()`). --- ### `core(cpu_id)` Pins this node's thread to a specific CPU core. **Signature** ```rust // simplified pub fn core(self, cpu_id: usize) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `cpu_id` | `usize` | yes | CPU core index. E.g., `2` for core 2. | **Returns** `NodeBuilder` — chainable. **Errors** `.build()` warns if set on a non-RT node. **Example** ```rust // simplified use horus::prelude::*; scheduler.add(motor) .order(0) .rate(1000_u64.hz()) .core(2) // pin to isolated CPU core .build()?; ``` --- ### `deadline_scheduler()` Opt in to Linux's SCHED_DEADLINE (Earliest Deadline First) scheduling instead of SCHED_FIFO. The kernel guarantees this node's thread gets `.budget()` of CPU time within every `.rate()` period. 
Unlike SCHED_FIFO (priority-based, can starve), EDF is bandwidth-fair with admission control — the kernel rejects tasks that would overcommit CPU. **Signature** ```rust pub fn deadline_scheduler(self) -> Self ``` **Requires:** `.rate()` and `.budget()` must be set (they provide the kernel parameters). Requires `CAP_SYS_NICE` or root. Falls back to SCHED_FIFO silently if unavailable. **Example** ```rust scheduler.add(motor_ctrl) .rate(1000_u64.hz()) .budget(500_u64.us()) .deadline_scheduler() // kernel-guaranteed EDF .build()?; ``` --- ### `no_alloc()` Enforces zero heap allocations during `tick()`. Any `Vec::push()`, `String::from()`, `format!()`, or `Box::new()` inside the tick function will panic with a clear message naming the offending node. **Signature** ```rust pub fn no_alloc(self) -> Self ``` **Requires:** The binary must set `RtAwareAllocator` as the global allocator: ```rust #[global_allocator] static ALLOC: horus_core::memory::rt_allocator::RtAwareAllocator = horus_core::memory::rt_allocator::RtAwareAllocator; ``` Without this line, `.no_alloc()` is a no-op — safe for prototyping. **Example** ```rust scheduler.add(motor_ctrl) .rate(1000_u64.hz()) .no_alloc() // panic if tick() allocates .build()?; ``` --- ### `failure_policy(policy)` Sets the error recovery policy for this node. **Signature** ```rust // simplified pub fn failure_policy(self, policy: FailurePolicy) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `policy` | `FailurePolicy` | yes | One of `Fatal`, `Restart { max_restarts, initial_backoff }`, `Skip { max_failures, cooldown }`, `Ignore`. | **Returns** `NodeBuilder` — chainable. --- ### `build()` Validates the node configuration and registers it with the scheduler. **Signature** ```rust // simplified pub fn build(self) -> Result<&mut Scheduler> ``` **Parameters** None. **Returns** `Result<&mut Scheduler>` — the scheduler for chaining further `.add()` calls.
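The error conditions documented for `.build()` can be approximated with a standalone validator. This is a sketch under stated assumptions: `ExecClass`, `NodeConfig`, `ValidationError`, and `validate` are hypothetical types and names for this example, and the order in which the checks run is a guess rather than documented behavior.

```rust
use std::time::Duration;

/// Hypothetical model of the execution classes named in this reference.
#[derive(Debug, Clone, PartialEq)]
enum ExecClass {
    BestEffort,
    Rt,
    Compute,
    Event { topic: String },
    AsyncIo,
}

/// Hypothetical error variants, one per documented rejection reason.
#[derive(Debug, PartialEq)]
enum ValidationError {
    TimingOnNonRtClass,
    ZeroBudget,
    ZeroDeadline,
    BudgetExceedsDeadline,
    EmptyTopic,
}

struct NodeConfig {
    class: ExecClass,
    budget: Option<Duration>,
    deadline: Option<Duration>,
}

fn validate(cfg: &NodeConfig) -> Result<(), ValidationError> {
    // Event nodes must name a non-empty topic.
    if let ExecClass::Event { topic } = &cfg.class {
        if topic.is_empty() {
            return Err(ValidationError::EmptyTopic);
        }
    }
    // Budget/deadline only make sense for RT nodes. BestEffort is allowed
    // here because setting either value implicitly promotes it to RT.
    let has_timing = cfg.budget.is_some() || cfg.deadline.is_some();
    let rt_capable = matches!(cfg.class, ExecClass::BestEffort | ExecClass::Rt);
    if has_timing && !rt_capable {
        return Err(ValidationError::TimingOnNonRtClass);
    }
    if cfg.budget == Some(Duration::ZERO) {
        return Err(ValidationError::ZeroBudget);
    }
    if cfg.deadline == Some(Duration::ZERO) {
        return Err(ValidationError::ZeroDeadline);
    }
    if let (Some(budget), Some(deadline)) = (cfg.budget, cfg.deadline) {
        if budget > deadline {
            return Err(ValidationError::BudgetExceedsDeadline);
        }
    }
    Ok(())
}
```

Returning a `Result` rather than panicking keeps misconfiguration recoverable at startup, which is consistent with the documented "Panics: Never" contract of `.build()`.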
**Errors** | Error | Condition | |-------|-----------| | `ValidationError` | Budget or deadline set on non-RT execution class | | `ValidationError` | Budget is zero | | `ValidationError` | Deadline is zero | | `ValidationError` | Budget exceeds deadline | | `ValidationError` | Empty topic name in `.on("")` | **Behavior** 1. Runs `finalize()` — if `.rate()` was set on a BestEffort node, promotes to RT with auto-derived budget (80%) and deadline (95%) 2. Validates all constraints (see Errors above) 3. Logs warnings for non-fatal misconfigurations (`.priority()` on non-RT, `.on_miss()` without deadline) 4. Registers the node with the scheduler **Panics** Never. Returns `Err` on invalid configuration. **Example** ```rust // simplified use horus::prelude::*; // This succeeds: scheduler.add(my_node).rate(100_u64.hz()).build()?; // This fails (budget > deadline): scheduler.add(my_node) .budget(10_u64.ms()) .deadline(5_u64.ms()) .build()?; // Err: budget exceeds deadline ``` --- ## Execution Methods ### `run()` Starts the main scheduler loop. Blocks until `Ctrl+C` or `.stop()` is called. **Signature** ```rust // simplified pub fn run(&mut self) -> Result<()> ``` **Returns** `Result<()>` — `Ok` on graceful shutdown, `Err` on fatal error. **Behavior** 1. Installs `SIGINT`/`SIGTERM` signal handler 2. Calls `init()` on all nodes (lazy init on first run) 3. Loops: tick all nodes in order → sleep until next period → repeat 4. On shutdown: calls `shutdown()` on all nodes in reverse order, prints timing report **Example** ```rust // simplified use horus::prelude::*; scheduler.run()?; // blocks until Ctrl+C ``` --- ### `run_for(duration)` Runs the scheduler for a specific duration, then shuts down. **Signature** ```rust // simplified pub fn run_for(&mut self, duration: Duration) -> Result<()> ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `duration` | `Duration` | yes | How long to run. Create with `.secs()`: `30_u64.secs()`. 
| **Returns** `Result<()>` — `Ok` after the duration elapses. **Example** ```rust // simplified use horus::prelude::*; scheduler.run_for(30_u64.secs())?; // run 30 seconds then stop ``` --- ### `tick_once()` Executes exactly one tick cycle, then returns. No loop, no sleep. **Signature** ```rust // simplified pub fn tick_once(&mut self) -> Result<()> ``` **Returns** `Result<()>` **Behavior** - Calls `init()` on all nodes if not yet initialized (lazy init) - Ticks every registered node once in order - Returns immediately after all nodes have ticked **When to use** - Unit testing — tick, assert, tick, assert - Simulation stepping — manual time control - Integration tests **Example** ```rust // simplified use horus::prelude::*; let mut scheduler = Scheduler::new(); scheduler.add(my_node).build()?; // Step through 100 ticks manually for _ in 0..100 { scheduler.tick_once()?; } ``` --- ### `stop()` Signals graceful shutdown. **Signature** ```rust // simplified pub fn stop(&self) ``` **Behavior** - The current tick completes - `shutdown()` is called on all nodes in reverse order - Timing report is printed --- ## Types ### `Miss` Enum Deadline miss policy, set via `.on_miss()`. | Variant | Behavior | Use case | |---------|----------|----------| | `Miss::Warn` | Logs a warning, continues ticking | Soft real-time — logging, UI, telemetry | | `Miss::Skip` | Skips this tick entirely | Firm real-time — video encoding, non-critical sensors | | `Miss::SafeMode` | Calls `enter_safe_state()` on the node | Motor controllers, safety nodes — must go to safe output | | `Miss::Stop` | Stops the entire scheduler | Hard real-time safety-critical — unacceptable to continue | Default: `Miss::Warn`. ### `NodeMetrics` Per-node performance data, returned by `scheduler.metrics()`. 
| Method | Returns | Description |
|--------|---------|-------------|
| `.name()` | `&str` | Node name |
| `.order()` | `u32` | Execution order |
| `.total_ticks()` | `u64` | Total tick count |
| `.successful_ticks()` | `u64` | Ticks without errors |
| `.avg_tick_duration_ms()` | `f64` | Mean tick duration |
| `.max_tick_duration_ms()` | `f64` | Worst-case tick duration |
| `.min_tick_duration_ms()` | `f64` | Best-case tick duration |
| `.last_tick_duration_ms()` | `f64` | Most recent tick duration |
| `.messages_sent()` | `u64` | Total messages published |
| `.messages_received()` | `u64` | Total messages consumed |
| `.errors_count()` | `u64` | Error count |
| `.warnings_count()` | `u64` | Warning count |
| `.uptime_seconds()` | `f64` | Node uptime |

### `RtStats`

Real-time execution statistics for nodes with `.rate()` set.

| Method | Returns | Description |
|--------|---------|-------------|
| `.deadline_misses()` | `u64` | Total deadline misses |
| `.budget_violations()` | `u64` | Total budget violations |
| `.worst_execution()` | `Duration` | Worst-case tick duration |
| `.last_execution()` | `Duration` | Most recent tick duration |
| `.jitter_us()` | `f64` | Execution jitter in microseconds |
| `.avg_execution_us()` | `f64` | Average tick duration in microseconds |
| `.sampled_ticks()` | `u64` | Number of ticks sampled |
| `.summary()` | `String` | Formatted timing summary |

---

## Monitoring Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `metrics()` | `Vec<NodeMetrics>` | Per-node performance metrics |
| `rt_stats(name)` | `Option<&RtStats>` | RT timing stats for a specific node |
| `safety_stats()` | `Option` | Aggregate safety stats (budget overruns, watchdog expirations) |
| `node_list()` | `Vec` | Names of all registered nodes |
| `status()` | `String` | Formatted status report |

> **Advanced/Unstable**: `rt_stats()`, `safety_stats()`, and `node_list()` are `#[doc(hidden)]` in the Rust source and may change without notice.
```rust // simplified use horus::prelude::*; // Check RT performance after a test run if let Some(stats) = scheduler.rt_stats("motor_ctrl") { println!("Deadline misses: {}", stats.deadline_misses()); println!("Worst execution: {:?}", stats.worst_execution()); println!("Jitter: {:.1} us", stats.jitter_us()); } ``` --- ## Production Patterns ### Warehouse AGV Mobile robot with safety monitor, lidar SLAM, path planner, and motor control: ```rust // simplified use horus::prelude::*; let mut sched = Scheduler::new() .watchdog(500_u64.ms()) .blackbox(64) .tick_rate(100_u64.hz()); // Safety runs first every tick — never skip sched.add(EmergencyStopMonitor::new()?).order(0).rate(100_u64.hz()).on_miss(Miss::Stop).build()?; // Sensors at 50Hz sched.add(LidarDriver::new()?).order(10).rate(50_u64.hz()).build()?; sched.add(WheelOdometry::new()?).order(11).rate(100_u64.hz()).build()?; // SLAM is CPU-heavy — runs on thread pool sched.add(SlamNode::new()?).order(20).compute().build()?; // Planner at 10Hz — doesn't need to be fast sched.add(PathPlanner::new()?).order(30).rate(10_u64.hz()).build()?; // Motor control at 100Hz with strict deadline sched.add(MotorController::new()?).order(40).rate(100_u64.hz()).budget(5_u64.ms()).on_miss(Miss::Skip).build()?; // Logger at 1Hz — background priority sched.add(TelemetryLogger::new()?).order(200).rate(1_u64.hz()).build()?; sched.run()?; ``` ### Drone Flight Controller High-frequency IMU processing with tight deadlines: ```rust // simplified use horus::prelude::*; let mut sched = Scheduler::new() .require_rt() .watchdog(100_u64.ms()) .tick_rate(1000_u64.hz()); sched.add(ImuReader::new()?).order(0).rate(1000_u64.hz()).budget(200_u64.us()).build()?; sched.add(AttitudeController::new()?).order(1).rate(1000_u64.hz()).budget(300_u64.us()).on_miss(Miss::SafeMode).build()?; sched.add(PositionController::new()?).order(2).rate(200_u64.hz()).budget(1_u64.ms()).build()?; 
sched.add(MotorMixer::new()?).order(3).rate(1000_u64.hz()).budget(100_u64.us()).build()?; sched.run()?; ``` --- ## See Also - [Scheduler Concepts](/concepts/core-concepts-scheduler) — Conceptual overview and architecture - [Execution Classes](/concepts/execution-classes) — The 5 execution classes and when to use each - [Scheduler Configuration](/advanced/scheduler-configuration) — Advanced tuning and deployment patterns - [Safety Monitor](/advanced/safety-monitor) — Budget enforcement and graduated degradation - [BlackBox](/advanced/blackbox) — Flight recorder for post-mortem analysis - [Node API](/concepts/core-concepts-nodes) — The Node trait and lifecycle - [DurationExt](/rust/api/duration-ext) — `.hz()`, `.ms()`, `.us()` ergonomic helpers - [Python Bindings](/python/api/python-bindings) — Python Scheduler API - [Driver API](/rust/api/drivers) — Load hardware from config --- ## Hardware API Path: /rust/api/drivers Description: Configure hardware nodes in horus.toml, load them at runtime, register custom node factories # Hardware API Load hardware node configurations from `horus.toml` and add them to your scheduler. ```rust // simplified use horus::prelude::*; use horus::hardware; ``` ## Overview A driver is a node. The hardware API provides: - **Config-driven hardware declarations** in `horus.toml [hardware]` - **Typed parameter access** via `NodeParams` (port, baudrate, servo IDs, etc.) - **Node factory registration** via `register_driver!` — your own `Node` implementations loaded from config - **Simulation swap** — `sim = true` per device, replaced by stub when `horus run --sim` ## Loading Hardware ### `hardware::load()` Reads the `[hardware]` section from `horus.toml` (searches current directory and up to 10 parents). Returns a list of `(name, node)` pairs ready for the scheduler. 
```rust
// simplified
let nodes = hardware::load()?;
for (_name, node) in nodes {
    sched.add(node).build()?;
}
```

Each entry in `[hardware]` must have a `use` field naming a registered node type. The factory is called with a `NodeParams` containing all non-reserved config keys.

### `hardware::load_from(path)`

Load from a specific config file. Useful for testing or multi-robot setups.

```rust
// simplified
let nodes = hardware::load_from("tests/test_hardware.toml")?;
```

---

## Registering Node Types

Use `register_driver!` to register a factory so `[hardware]` config can instantiate your node.

### Step 1: Define your node

```rust
// simplified
use horus::prelude::*;
use horus::hardware::NodeParams;

struct ConveyorDriver {
    port: String,
    speed: f64,
    publisher: Topic<CmdVel>,
}

impl ConveyorDriver {
    fn from_params(params: &NodeParams) -> Result<Self> {
        let port: String = params.get("port")?;
        let speed: f64 = params.get_or("speed", 1.0);
        Ok(Self {
            port,
            speed,
            publisher: Topic::new("conveyor.velocity")?,
        })
    }
}

impl Node for ConveyorDriver {
    fn name(&self) -> &str { "ConveyorDriver" }

    fn tick(&mut self) {
        self.publisher.send(CmdVel::new(self.speed as f32, 0.0));
    }

    fn enter_safe_state(&mut self) {
        self.publisher.send(CmdVel::new(0.0, 0.0));
    }
}

// Register so [hardware.conveyor] use = "ConveyorDriver" works
register_driver!(ConveyorDriver, ConveyorDriver::from_params);
```

### Step 2: Configure in horus.toml

```toml
[hardware.conveyor]
use = "ConveyorDriver"
port = "/dev/ttyACM0"
speed = 0.5
```

### Step 3: Load and schedule

```rust
// simplified
fn main() -> Result<()> {
    let mut sched = Scheduler::new().tick_rate(50_u64.hz());

    // Load all [hardware] entries — creates nodes from registered factories
    let nodes = hardware::load()?;
    for (_name, node) in nodes {
        sched.add(node).rate(50_u64.hz()).build()?;
    }

    sched.run()
}
```

`hardware::load()` looks up `"ConveyorDriver"` in the registry and calls `from_params()` with the config values. Each node comes back as a `Box<dyn Node>`, ready for the scheduler.
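The lookup step described above can be pictured as a map from type names to factory functions. The sketch below is a std-only model of that idea and not the HORUS implementation: `Params`, `DemoNode`, `Registry`, and `Conveyor` are hypothetical stand-ins for `NodeParams`, `Node`, and whatever registry the loader keeps internally.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; the real `NodeParams`
// and `Node` types in HORUS are richer than this.
type Params = HashMap<String, f64>;

trait DemoNode {
    fn name(&self) -> &str;
    fn speed(&self) -> f64;
}

struct Conveyor {
    speed: f64,
}

impl DemoNode for Conveyor {
    fn name(&self) -> &str {
        "ConveyorDriver"
    }
    fn speed(&self) -> f64 {
        self.speed
    }
}

// A factory turns config values into a boxed node, or reports why it cannot.
type Factory = fn(&Params) -> Result<Box<dyn DemoNode>, String>;

fn conveyor_factory(params: &Params) -> Result<Box<dyn DemoNode>, String> {
    // `speed` is optional and defaults to 1.0, mirroring `get_or("speed", 1.0)`.
    let speed = params.get("speed").copied().unwrap_or(1.0);
    Ok(Box::new(Conveyor { speed }))
}

struct Registry {
    factories: HashMap<&'static str, Factory>,
}

impl Registry {
    fn new() -> Self {
        Registry { factories: HashMap::new() }
    }

    fn register(&mut self, type_name: &'static str, factory: Factory) {
        self.factories.insert(type_name, factory);
    }

    // Resolve the `use = "..."` value and call the matching factory.
    fn create(&self, type_name: &str, params: &Params) -> Result<Box<dyn DemoNode>, String> {
        match self.factories.get(type_name) {
            Some(factory) => factory(params),
            None => {
                let mut known: Vec<&str> = self.factories.keys().copied().collect();
                known.sort();
                Err(format!("unknown node type '{}'; registered: {:?}", type_name, known))
            }
        }
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.register("ConveyorDriver", conveyor_factory);

    let mut params = Params::new();
    params.insert("speed".to_string(), 0.5);

    match registry.create("ConveyorDriver", &params) {
        Ok(node) => println!("created {} (speed {})", node.name(), node.speed()),
        Err(e) => println!("error: {}", e),
    }
}
```

Listing the registered names in the error message follows the behavior the Error Handling table describes for an unknown `use` value; everything else here is a simplification.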
---

## `NodeParams`

Typed access to config values from a `[hardware.NAME]` table.

| Method | Description |
|--------|-------------|
| `params.get::<T>("key")?` | Required param — errors if missing or wrong type |
| `params.get_or("key", default)` | Optional param — returns default if missing or wrong type |
| `params.has("key")` | Whether a key exists |
| `params.keys()` | Iterator over param names |
| `params.len()` | Number of params |
| `params.is_empty()` | Whether there are no params |
| `params.raw("key")` | Raw `toml::Value` for a key |

### Supported Types

`get::<T>()` supports: `String`, `bool`, `i32`, `i64`, `u32`, `u64`, `u8`, `f32`, `f64`, `Vec<T>` (for any supported `T`).

**Type coercion:** TOML integers convert to floats (`1000` becomes `1000.0`). No other cross-type coercion — a string `"42"` will NOT parse as an integer. Use the correct TOML type in your config.

### Reserved Keys

These keys are consumed by the loader and NOT passed to `NodeParams`:

`use`, `sim`, `args`, `terra`, `package`, `node`, `crate`, `source`, `pip`, `exec`, `simulated`

---

## Simulation Override

Mark hardware entries with `sim = true` to swap them for stubs when running in simulation mode:

```toml
[hardware.lidar]
use = "rplidar"
port = "/dev/ttyUSB0"
sim = true

[hardware.imu]
use = "bno055"
bus = "/dev/i2c-1"
sim = true
```

```bash
horus run          # real hardware — creates rplidar and bno055 nodes
horus run --sim    # sim mode — creates stub nodes, simulator publishes to same topics
```

Your application code doesn't change. The simulator (sim3d, mujoco) publishes to the same topic names.

---

## External Process Drivers

Use the `exec:` prefix to wrap any binary as a node:

```toml
[hardware.camera]
use = "exec:./realsense_bridge"
args = ["--width", "640", "--height", "480"]
```

The binary runs as a subprocess. It should publish to horus SHM topics. The `ExecDriver` monitors health and restarts on crash.
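The restart-on-crash behavior can be approximated with a small supervision loop. This is a std-only sketch under stated assumptions, not the `ExecDriver` source: the `supervise` and `run_once` helpers, the restart limit, and the closure-based launcher are all hypothetical.

```rust
use std::process::Command;

// Launch a program once and report whether it exited successfully.
// A launch failure (binary missing, permission denied) counts as a crash.
fn run_once(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

// Hypothetical supervisor: call `launch` until it reports a clean exit or
// the restart budget is used up. Returns (clean_exit, restarts_performed).
fn supervise<F: FnMut() -> bool>(mut launch: F, max_restarts: u32) -> (bool, u32) {
    let mut restarts = 0;
    loop {
        if launch() {
            return (true, restarts);
        }
        if restarts == max_restarts {
            return (false, restarts);
        }
        restarts += 1;
    }
}

fn main() {
    // Simulated child that crashes twice, then exits cleanly.
    let mut attempts = 0;
    let (ok, restarts) = supervise(
        || {
            attempts += 1;
            attempts > 2
        },
        5,
    );
    println!("clean exit: {}, restarts: {}", ok, restarts);

    // With a real binary the launcher could be, for example:
    // supervise(|| run_once("./realsense_bridge", &["--width", "640"]), 5);
    let _ = run_once; // referenced so the helper is not reported as unused
}
```

Taking the launcher as a closure keeps the restart policy testable without spawning processes. A production supervisor would normally also wait between restarts (for example with a growing delay) so that a binary that fails immediately does not spin the CPU.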
---

## `register_driver!`

Register a node factory so `hardware::load()` can instantiate it from config.

```rust
// simplified
register_driver!(MyDriver, MyDriver::from_params);
```

The factory function signature: `fn(&NodeParams) -> Result<Self>`.

The macro uses `.init_array`, so registration runs automatically at program startup, before `main()`. No manual setup is needed.

---

## Complete Example

```toml
# horus.toml
[package]
name = "my-robot"
version = "0.1.0"

[hardware.arm]
use = "ArmDriver"
port = "/dev/ttyUSB0"
baudrate = 1000000
servo_ids = [1, 2, 3, 4, 5, 6]
sim = true

[hardware.conveyor]
use = "ConveyorDriver"
port = "/dev/ttyACM0"
speed = 0.5
```

```rust
// simplified
use horus::prelude::*;
use horus::hardware;

fn main() -> Result<()> {
    let mut sched = Scheduler::new()
        .tick_rate(100_u64.hz());

    // Load all hardware nodes from config
    let nodes = hardware::load()?;
    for (name, node) in nodes {
        hlog!(info, "Loaded hardware node: {}", name);
        sched.add(node).build()?;
    }

    sched.run()
}
```

---

## Testing with Mock Config

Use `hardware::load_from()` to load from a test config file:

```rust
// simplified
#[test]
fn conveyor_from_config() {
    std::fs::write("test_hw.toml", r#"
[hardware.conveyor]
use = "ConveyorDriver"
port = "/dev/null"
speed = 0.0
"#).unwrap();

    let nodes = hardware::load_from("test_hw.toml").unwrap();
    assert_eq!(nodes.len(), 1);
    assert_eq!(nodes[0].0, "conveyor");

    std::fs::remove_file("test_hw.toml").ok();
}
```

---

## Error Handling

All hardware methods return `Result`.
Common error conditions:

| Operation | Error | When |
|-----------|-------|------|
| `hardware::load()` | `ConfigError` | No `horus.toml` found (searches up to 10 levels) |
| `hardware::load()` | `ConfigError` | Unknown node type in `use` field — error lists registered types |
| `params.get::<T>("key")` | `ConfigError` | Key missing or type mismatch |

```rust
// simplified
let nodes = match hardware::load() {
    Ok(n) => n,
    Err(e) => {
        hlog!(error, "No hardware config: {}", e);
        hlog!(info, "Running without hardware");
        vec![]
    }
};
```

---

## Legacy Support

The `[drivers]` section name and legacy source keys (`terra`, `package`, `node`, `crate`, `pip`, `exec`) are still parsed for backward compatibility. Migrate to `[hardware]` with the `use` field:

```toml
# Old format
[drivers.arm]
terra = "dynamixel"
port = "/dev/ttyUSB0"

# New format
[hardware.arm]
use = "dynamixel"
port = "/dev/ttyUSB0"
sim = true
```

---

## See Also

- [Hardware Drivers Tutorial](/tutorials/05-hardware-drivers) — Step-by-step hardware integration
- [Scheduler API](/rust/api/scheduler) — Running nodes
- [Configuration Reference](/package-management/configuration) — `[hardware]` section in horus.toml

---

## Data Types & Encoding

Path: /rust/api/tensor-messages
Description: Image, PointCloud, and DepthImage types for robotics data

# Data Types & Encoding

High-level types for camera images, 3D point clouds, and depth maps. These types use zero-copy shared memory transport internally but expose ergonomic, domain-specific APIs.

```rust
// simplified
use horus::prelude::*;
// Provides Image, PointCloud, DepthImage, TensorDtype, Device
```

## Image

Represents a camera frame with pixel-level access and encoding metadata.
```rust
// simplified
// Create a 1080p RGB image
let image = Image::new(1920, 1080, ImageEncoding::Rgb8);

// Publish on a topic
let topic: Topic<Image> = Topic::new("camera.rgb")?;
topic.send(&image);

// Receive and access pixels
if let Some(img) = topic.recv() {
    println!("{}x{}, encoding: {:?}", img.width(), img.height(), img.encoding());
    let pixel = img.pixel(100, 200);
}
```

### ImageEncoding

| Encoding | Channels | Dtype | Description |
|----------|----------|-------|-------------|
| `Rgb8` | 3 | U8 | Standard RGB color |
| `Rgba8` | 4 | U8 | RGB with alpha |
| `Bgr8` | 3 | U8 | BGR (OpenCV default) |
| `Bgra8` | 4 | U8 | BGR with alpha |
| `Mono8` | 1 | U8 | Grayscale 8-bit |
| `Mono16` | 1 | U16 | Grayscale 16-bit |

## PointCloud

Represents a 3D point cloud with per-point field access.

```rust
// simplified
// Create a point cloud with XYZ + intensity fields
let cloud = PointCloud::new(10_000, &["x", "y", "z", "intensity"], TensorDtype::F32);

// Publish
let topic: Topic<PointCloud> = Topic::new("lidar.points")?;
topic.send(&cloud);

// Receive and iterate points
if let Some(pc) = topic.recv() {
    println!("{} points, {} fields", pc.num_points(), pc.fields().len());
    for i in 0..pc.num_points() {
        let x = pc.field_f32("x", i);
        let y = pc.field_f32("y", i);
        let z = pc.field_f32("z", i);
    }
}
```

## DepthImage

Represents a depth map, typically from an RGBD or stereo camera.

```rust
// simplified
// Create a 640x480 depth image with millimeter-precision U16 values
let depth = DepthImage::millimeters(640, 480);

// Publish
let topic: Topic<DepthImage> = Topic::new("camera.depth")?;
topic.send(&depth);

// Receive and query depth
if let Some(d) = topic.recv() {
    let depth_mm = d.depth_at(320, 240);
    println!("Center depth: {} mm", depth_mm);
}
```

## TensorDtype

Element data type used when constructing PointCloud, DepthImage, and other tensor-backed types.
| Dtype | Size (bytes) | Use Case |
|-------|--------------|----------|
| F32 | 4 | ML training/inference |
| F64 | 8 | High-precision computation |
| F16 | 2 | Memory-efficient inference |
| BF16 | 2 | Training on modern GPUs |
| U8 | 1 | Images |
| U16 | 2 | Depth sensors (mm) |
| U32 | 4 | Large indices |
| U64 | 8 | Counters, timestamps |
| I8 | 1 | Quantized inference |
| I16 | 2 | Audio, sensor data |
| I32 | 4 | General integer |
| I64 | 8 | Large signed values |
| Bool | 1 | Masks |

### TensorDtype Methods

```rust
// simplified
let dtype = TensorDtype::F32;
assert_eq!(dtype.element_size(), 4);
assert!(dtype.is_float());
assert!(!dtype.is_signed_int());
println!("{}", dtype); // "f32"

// DLPack interop
let dl = dtype.to_dlpack();
let back = TensorDtype::from_dlpack(dl.0, dl.1).unwrap();

// Parse from string
let parsed = TensorDtype::parse("float32").unwrap();
```

| Method | Returns | Description |
|--------|---------|-------------|
| `.element_size()` | `usize` | Bytes per element |
| `.is_float()` | `bool` | Whether this is a floating-point type (F16, BF16, F32, F64) |
| `.is_signed_int()` | `bool` | Whether this is a signed integer type (I8, I16, I32, I64) |
| `TensorDtype::parse(s)` | `Option<TensorDtype>` | Parse from string ("float32", "uint8", "int16", etc.) |
| `.to_dlpack()` | `(u8, u8)` | Convert to DLPack type code and bits |
| `TensorDtype::from_dlpack(code, bits)` | `Option<TensorDtype>` | Create from DLPack type code and bits |

## Device

The `Device` struct describes where tensor data physically resides. `Device::cuda(N)` creates a CUDA GPU device targeting GPU index N. When used with a GPU-backed tensor pool, the data is allocated in CUDA memory (managed, pinned, or device-only depending on platform).
```rust
// simplified
Device::cpu()    // CPU / shared memory (mmap-backed)
Device::cuda(0)  // CUDA GPU device 0 (managed or pinned memory)

// Parse from string
let cpu = Device::parse("cpu").unwrap();
let dev = Device::parse("cuda:0").unwrap();

// Check device type
assert!(Device::cpu().is_cpu());
assert!(Device::cuda(0).is_cuda());

// GPU detection
if horus::cuda_available() {
    let count = horus::cuda_device_count();
    println!("Found {} GPU(s)", count);
}
```

| Method | Returns | Description |
|--------|---------|-------------|
| `Device::cpu()` | `Device` | CPU device (mmap-backed shared memory) |
| `Device::cuda(index)` | `Device` | CUDA GPU device with the given index |
| `.is_cpu()` | `bool` | Whether this is a CPU device |
| `.is_cuda()` | `bool` | Whether this is a CUDA device |
| `Device::parse(s)` | `Option<Device>` | Parse from string ("cpu", "cuda:0", "gpu:1") |
| `horus::cuda_available()` | `bool` | Whether CUDA GPU support is available |
| `horus::cuda_device_count()` | `usize` | Number of CUDA-capable devices (0 if no GPU) |
| `horus::gpu_platform()` | `Option` | Detected GPU platform (Jetson or Discrete) |

## Python Interop

Image, PointCloud, and DepthImage are available in Python with NumPy zero-copy access.
```python import horus import numpy as np # Subscribe to camera images topic = horus.Topic("camera.rgb", horus.Image) img = topic.recv() if img is not None: print(f"{img.width}x{img.height}, encoding: {img.encoding}") arr = img.to_numpy() # Zero-copy NumPy view # Subscribe to point clouds pc_topic = horus.Topic("lidar.points", horus.PointCloud) pc = pc_topic.recv() if pc is not None: points = pc.to_numpy() # (N, fields) NumPy array print(f"{pc.num_points} points") # Subscribe to depth images depth_topic = horus.Topic("camera.depth", horus.DepthImage) depth = depth_topic.recv() if depth is not None: depth_arr = depth.to_numpy() # (H, W) NumPy array ``` ## See Also - [Tensor](/rust/api/tensor) — Low-level tensor descriptor for ML pipelines - [Message Types](/concepts/message-types) — All HORUS message types - [Python Image](/python/api/image), [Python PointCloud](/python/api/pointcloud), [Python DepthImage](/python/api/depth-image) --- ## Basic Examples Path: /rust/examples/basic-examples Description: Simple HORUS patterns for beginners # Basic Examples Learn HORUS fundamentals through simple, focused examples. Each example is complete and runnable with `horus run`. **Estimated time**: 30-45 minutes ## Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Completed [Quick Start](/getting-started/quick-start) - Basic Rust knowledge --- ## 1. Basic Publisher-Subscriber The foundational pattern in HORUS: one node publishes data, another subscribes. 
### Publisher Node

**File: `publisher.rs`**

```rust
// simplified
use horus::prelude::*;

// Define publisher node
struct SensorNode {
    data_pub: Topic<f32>,
    counter: f32,
}

impl SensorNode {
    fn new() -> Result<Self> {
        Ok(Self {
            data_pub: Topic::new("sensor_data")?,
            counter: 0.0,
        })
    }
}

impl Node for SensorNode {
    fn name(&self) -> &str { "SensorNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Sensor initialized");
        Ok(())
    }

    fn tick(&mut self) {
        // Simulate sensor reading
        let reading = self.counter.sin() * 10.0;

        // Publish data
        self.data_pub.send(reading);
        hlog!(debug, "Published: {:.2}", reading);

        self.counter += 0.1;
    }

    // SAFETY: no actuators — shutdown is optional for pure sensor nodes
    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Sensor shutdown");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();
    scheduler.add(SensorNode::new()?).order(0).build()?;
    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run publisher.rs
```

### Subscriber Node

**File: `subscriber.rs`**

```rust
// simplified
use horus::prelude::*;

struct ProcessorNode {
    data_sub: Topic<f32>,
}

impl ProcessorNode {
    fn new() -> Result<Self> {
        Ok(Self {
            data_sub: Topic::new("sensor_data")?,
        })
    }
}

impl Node for ProcessorNode {
    fn name(&self) -> &str { "ProcessorNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Processor initialized");
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to drain the buffer and avoid stale data
        if let Some(data) = self.data_sub.recv() {
            // Process received data
            let processed = data * 2.0;
            hlog!(debug, "Processed: {:.2}", processed);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Processor shutdown");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();
    scheduler.add(ProcessorNode::new()?).order(0).build()?;
    scheduler.run()?;
    Ok(())
}
```

**Run it**:

HORUS uses a **flat namespace** (like ROS), so processes automatically share topics:

```bash
# Terminal 1
horus run publisher.rs

# Terminal 2 (automatically connects!)
horus run subscriber.rs
```

Both use the same topic name (`"sensor_data"`) → communication works automatically!

### Combined Application

**File: `pubsub.rs`**

```rust
// simplified
use horus::prelude::*;

// Publisher
struct SensorNode {
    data_pub: Topic<f32>,
    counter: f32,
}

impl Node for SensorNode {
    fn name(&self) -> &str { "SensorNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Sensor online");
        Ok(())
    }

    fn tick(&mut self) {
        let reading = self.counter.sin() * 10.0;
        self.data_pub.send(reading);
        self.counter += 0.1;
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Sensor offline");
        Ok(())
    }
}

// Subscriber
struct ProcessorNode {
    data_sub: Topic<f32>,
}

impl Node for ProcessorNode {
    fn name(&self) -> &str { "ProcessorNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Processor online");
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to drain the buffer
        if let Some(data) = self.data_sub.recv() {
            let processed = data * 2.0;
            hlog!(info, "Received: {:.2} -> {:.2}", data, processed);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Processor offline");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    // Add both nodes
    scheduler.add(SensorNode {
        data_pub: Topic::new("sensor_data")?,
        counter: 0.0,
    }).order(0).build()?;

    scheduler.add(ProcessorNode {
        data_sub: Topic::new("sensor_data")?,
    }).order(1).build()?;

    // Run both nodes together
    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run pubsub.rs
```

**Key Concepts**:

- The publisher creates its topic with `Topic::new("sensor_data")`
- The subscriber uses the same topic name, `"sensor_data"`
- Order matters: the publisher (`.order(0)`) runs before the subscriber (`.order(1)`)
- `recv()` returns an `Option`, so handle `None` gracefully

---

## 2. Robot Velocity Controller

Control a robot using standard CmdVel messages.
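The motor driver in this example turns a `CmdVel` into left and right wheel speeds. A standalone sketch of that mixing step follows, with no HORUS dependencies. The `wheel_speeds` helper and its `track_width` parameter are assumptions added for illustration; the example's own `linear - angular` / `linear + angular` form corresponds to a half track width of 1.

```rust
// Wheel speeds for a differential-drive base.
// `linear` is forward speed in m/s, `angular` is yaw rate in rad/s
// (positive = counter-clockwise), `track_width` is the wheel separation in m.
// Returns (left, right) wheel speeds in m/s.
fn wheel_speeds(linear: f32, angular: f32, track_width: f32) -> (f32, f32) {
    let half = track_width / 2.0;
    (linear - angular * half, linear + angular * half)
}

fn main() {
    // Same command as the teleop node sends: 1.0 m/s forward, 0.5 rad/s yaw.
    let (left, right) = wheel_speeds(1.0, 0.5, 2.0);
    println!("left = {:.2} m/s, right = {:.2} m/s", left, right);
}
```

With a positive yaw rate the right wheel runs faster than the left, so the base curves to the left. On real hardware the track width comes from the robot's geometry rather than being fixed at 2 m.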
**File: `robot_controller.rs`**

```rust
// simplified
use horus::prelude::*;

// Keyboard input -> velocity commands
struct TeleopNode {
    cmd_pub: Topic<CmdVel>,
}

impl TeleopNode {
    fn new() -> Result<Self> {
        Ok(Self {
            cmd_pub: Topic::new("cmd_vel")?,
        })
    }
}

impl Node for TeleopNode {
    fn name(&self) -> &str { "TeleopNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Teleop ready - sending movement commands");
        Ok(())
    }

    fn tick(&mut self) {
        // Simulate keyboard input (w/a/s/d)
        // In real code, read from device::Input
        let cmd = CmdVel::new(1.0, 0.5); // Forward + turn left (right wheel ends up faster below)
        self.cmd_pub.send(cmd);
    }

    // SAFETY: send zero velocity on shutdown to stop the robot
    fn shutdown(&mut self) -> Result<()> {
        // CRITICAL: send stop command before exiting — prevents runaway
        let stop = CmdVel::zero();
        self.cmd_pub.send(stop);
        hlog!(info, "Teleop stopped");
        Ok(())
    }
}

// Velocity commands -> motor control
struct MotorDriverNode {
    cmd_sub: Topic<CmdVel>,
}

impl MotorDriverNode {
    fn new() -> Result<Self> {
        Ok(Self {
            cmd_sub: Topic::new("cmd_vel")?,
        })
    }
}

impl Node for MotorDriverNode {
    fn name(&self) -> &str { "MotorDriverNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Motor driver initialized");
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to drain the command buffer
        if let Some(cmd) = self.cmd_sub.recv() {
            // Convert to differential drive (left/right wheel speeds)
            let left_speed = cmd.linear - cmd.angular;
            let right_speed = cmd.linear + cmd.angular;

            // Send to motors
            hlog!(debug, "Motors: L={:.2} m/s, R={:.2} m/s", left_speed, right_speed);

            // In real code: send to hardware
            // motor_driver.set_speeds(left_speed, right_speed)?;
        }
    }

    // SAFETY: stop motors on shutdown — in real code, send zero to hardware here
    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Motors stopped");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();
    scheduler.add(TeleopNode::new()?).order(0).build()?;
    scheduler.add(MotorDriverNode::new()?).order(1).build()?;
    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run robot_controller.rs
```

**Key Concepts**:

- `CmdVel` is a standard robotics message type
- `CmdVel::new(linear, angular)` creates velocity commands
- Differential drive: `left = linear - angular`, `right = linear + angular`
- Use `shutdown()` to send safe stop commands

---

## 3. Lidar Obstacle Detection

Process laser scan data to detect obstacles and stop the robot.

**File: `obstacle_detector.rs`**

```rust
// simplified
use horus::prelude::*;

// Lidar -> scan data
struct LidarNode {
    scan_pub: Topic<LaserScan>,
    angle: f32,
}

impl LidarNode {
    fn new() -> Result<Self> {
        Ok(Self {
            scan_pub: Topic::new("scan")?,
            angle: 0.0,
        })
    }
}

impl Node for LidarNode {
    fn name(&self) -> &str { "LidarNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Lidar initialized");
        Ok(())
    }

    fn tick(&mut self) {
        // NOTE: in real code, read from hardware instead of simulating
        let mut scan = LaserScan::new();

        // Simulate lidar readings (sine wave for demo)
        for i in 0..360 {
            scan.ranges[i] = 5.0 + (self.angle + i as f32 * 0.01).sin() * 2.0;
        }

        // Add one close obstacle in front
        scan.ranges[0] = 0.3; // 30cm directly ahead!

        self.scan_pub.send(scan);
        self.angle += 0.1;
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Lidar offline");
        Ok(())
    }
}

// Scan data -> obstacle detection -> stop command
struct ObstacleDetector {
    scan_sub: Topic<LaserScan>,
    cmd_pub: Topic<CmdVel>,
    safety_distance: f32,
}

impl ObstacleDetector {
    fn new(safety_distance: f32) -> Result<Self> {
        Ok(Self {
            scan_sub: Topic::new("scan")?,
            cmd_pub: Topic::new("cmd_vel")?,
            safety_distance,
        })
    }
}

impl Node for ObstacleDetector {
    fn name(&self) -> &str { "ObstacleDetector" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Obstacle detector active - safety distance: {:.2}m", self.safety_distance);
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick — stale scans cause delayed obstacle detection
        if let Some(scan) = self.scan_sub.recv() {
            // Find minimum distance
            if let Some(min_dist) = scan.min_range() {
                if min_dist < self.safety_distance {
                    // SAFETY: emergency stop — send zero velocity immediately
                    let stop = CmdVel::zero();
                    self.cmd_pub.send(stop);
                    hlog!(warn, "[WARNING] Obstacle detected at {:.2}m - STOPPING!", min_dist);
                } else {
                    // Safe to move
                    hlog!(debug, "Safe - closest obstacle: {:.2}m", min_dist);
                }
            }
        }
    }

    // SAFETY: send stop command on shutdown as a safety precaution
    fn shutdown(&mut self) -> Result<()> {
        self.cmd_pub.send(CmdVel::zero());
        hlog!(info, "Obstacle detector offline");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    scheduler.add(LidarNode::new()?).order(0).build()?;

    // Obstacle detector runs right after the lidar (order 1) so it sees each fresh scan
    scheduler.add(ObstacleDetector::new(0.5)?).order(1).build()?;

    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run obstacle_detector.rs
```

**Key Concepts**:

- `LaserScan` has 360 range readings (one per degree)
- `scan.min_range()` finds the closest obstacle
- `scan.is_range_valid(index)` checks whether a reading is good
- Safety nodes should run early in each tick (a low `.order()` value)

---

## 4. PID Controller

Implement a PID controller for position tracking.
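As a standalone illustration of the control law, here is a minimal PID sketch with no HORUS dependencies. Unlike the node in this section, which accumulates error once per tick, it scales the integral and derivative terms by an explicit time step `dt`. The `Pid` type, the gains, and the toy plant in `simulate` are all illustrative assumptions.

```rust
// Minimal discrete PID with an explicit time step.
struct Pid {
    kp: f64,
    ki: f64,
    kd: f64,
    integral: f64,
    last_error: f64,
}

impl Pid {
    fn new(kp: f64, ki: f64, kd: f64) -> Self {
        Pid { kp, ki, kd, integral: 0.0, last_error: 0.0 }
    }

    // One control step; `dt` is the time since the previous call, in seconds.
    fn update(&mut self, setpoint: f64, measured: f64, dt: f64) -> f64 {
        let error = setpoint - measured;
        self.integral += error * dt; // accumulated error
        let derivative = (error - self.last_error) / dt; // rate of change
        self.last_error = error;
        self.kp * error + self.ki * self.integral + self.kd * derivative
    }
}

// Drive a toy plant whose position integrates the controller output,
// and return the position after `steps` control steps of 10 ms each.
fn simulate(steps: usize) -> f64 {
    let dt = 0.01;
    let mut pid = Pid::new(2.0, 0.5, 0.1);
    let mut position = 0.0;
    for _ in 0..steps {
        let output = pid.update(10.0, position, dt);
        position += output * dt;
    }
    position
}

fn main() {
    println!("position after 50 s: {:.3}", simulate(5000));
}
```

Scaling by `dt` keeps the tuned gains meaningful if the tick rate changes; with per-tick accumulation, doubling the rate also doubles the effective integral gain.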
**File: `pid_controller.rs`**

```rust
// simplified
use horus::prelude::*;

struct PIDController {
    setpoint_sub: Topic<f32>, // Desired position
    feedback_sub: Topic<f32>, // Current position
    output_pub: Topic<f32>,   // Control output

    kp: f32, // Proportional gain
    ki: f32, // Integral gain
    kd: f32, // Derivative gain

    integral: f32,
    last_error: f32,
}

impl PIDController {
    fn new(kp: f32, ki: f32, kd: f32) -> Result<Self> {
        Ok(Self {
            setpoint_sub: Topic::new("setpoint")?,
            feedback_sub: Topic::new("feedback")?,
            output_pub: Topic::new("control_output")?,
            kp, ki, kd,
            integral: 0.0,
            last_error: 0.0,
        })
    }
}

impl Node for PIDController {
    fn name(&self) -> &str { "PIDController" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "PID initialized - Kp: {}, Ki: {}, Kd: {}", self.kp, self.ki, self.kd);
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() on ALL subscribed topics every tick
        let setpoint = self.setpoint_sub.recv().unwrap_or(0.0);
        let feedback = self.feedback_sub.recv().unwrap_or(0.0);

        // Calculate error
        let error = setpoint - feedback;

        // Integral term (accumulated error)
        self.integral += error;

        // Derivative term (rate of change)
        let derivative = error - self.last_error;

        // PID output
        let output = self.kp * error + self.ki * self.integral + self.kd * derivative;

        // Publish control output
        self.output_pub.send(output);

        hlog!(debug, "PID: setpoint={:.2}, feedback={:.2}, error={:.2}, output={:.2}",
            setpoint, feedback, error, output);

        // Update state
        self.last_error = error;
    }

    // SAFETY: send zero output on shutdown to stop actuators downstream
    fn shutdown(&mut self) -> Result<()> {
        self.output_pub.send(0.0);
        hlog!(info, "PID controller stopped");
        Ok(())
    }
}

// Simple test system node
struct TestSystem {
    output_sub: Topic<f32>,
    feedback_pub: Topic<f32>,
    position: f32,
}

impl TestSystem {
    fn new() -> Result<Self> {
        Ok(Self {
            output_sub: Topic::new("control_output")?,
            feedback_pub: Topic::new("feedback")?,
            position: 0.0,
        })
    }
}

impl Node for TestSystem {
    fn name(&self) -> &str { "TestSystem" }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to consume control commands
        if let Some(output) = self.output_sub.recv() {
            self.position += output * 0.01; // Simple integration
        }

        // Publish current position
        self.feedback_pub.send(self.position);
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Test system stopped");
        Ok(())
    }
}

// Setpoint generator
struct SetpointNode {
    setpoint_pub: Topic<f32>,
}

impl Node for SetpointNode {
    fn name(&self) -> &str { "SetpointNode" }

    fn tick(&mut self) {
        // Target position
        let setpoint = 10.0;
        self.setpoint_pub.send(setpoint);
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Setpoint generator stopped");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    // Setpoint generator
    scheduler.add(SetpointNode {
        setpoint_pub: Topic::new("setpoint")?,
    }).order(0).build()?;

    // Test system (simulates plant)
    scheduler.add(TestSystem::new()?).order(1).build()?;

    // PID controller (Kp=0.5, Ki=0.01, Kd=0.1)
    scheduler.add(PIDController::new(0.5, 0.01, 0.1)?).order(2).build()?;

    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run pid_controller.rs
```

**Key Concepts**:

- PID = Proportional + Integral + Derivative
- Proportional: immediate response to error
- Integral: corrects accumulated error
- Derivative: dampens oscillations
- Tune gains (Kp, Ki, Kd) for your system

---

## 5. Multi-Node Pipeline

Chain multiple processing stages together.
**File: `pipeline.rs`**

```rust
// simplified
use horus::prelude::*;

// Stage 1: Data acquisition
struct SensorNode {
    raw_pub: Topic<f32>,
    counter: f32,
}

impl SensorNode {
    fn new() -> Result<Self> {
        Ok(Self {
            raw_pub: Topic::new("raw_data")?,
            counter: 0.0,
        })
    }
}

impl Node for SensorNode {
    fn name(&self) -> &str { "SensorNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Stage 1: Sensor online");
        Ok(())
    }

    fn tick(&mut self) {
        // Simulate noisy sensor
        let raw_data = 50.0 + (self.counter * 0.5).sin() * 20.0;
        self.raw_pub.send(raw_data);
        hlog!(debug, "Raw: {:.2}", raw_data);
        self.counter += 0.1;
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Sensor offline");
        Ok(())
    }
}

// Stage 2: Filtering
struct FilterNode {
    raw_sub: Topic<f32>,
    filtered_pub: Topic<f32>,
    alpha: f32, // Low-pass filter coefficient
    filtered_value: f32,
}

impl FilterNode {
    fn new(alpha: f32) -> Result<Self> {
        Ok(Self {
            raw_sub: Topic::new("raw_data")?,
            filtered_pub: Topic::new("filtered_data")?,
            alpha,
            filtered_value: 0.0,
        })
    }
}

// PATTERN: Pipeline stage — subscribe, transform, republish
impl Node for FilterNode {
    fn name(&self) -> &str { "FilterNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Stage 2: Filter online");
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to drain the raw data buffer
        if let Some(raw) = self.raw_sub.recv() {
            // Exponential moving average
            self.filtered_value = self.alpha * raw
                + (1.0 - self.alpha) * self.filtered_value;

            self.filtered_pub.send(self.filtered_value);
            hlog!(debug, "Filtered: {:.2}", self.filtered_value);
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Filter offline");
        Ok(())
    }
}

// Stage 3: Decision making
struct ControllerNode {
    filtered_sub: Topic<f32>,
    cmd_pub: Topic<f32>,
    threshold: f32,
}

impl ControllerNode {
    fn new(threshold: f32) -> Result<Self> {
        Ok(Self {
            filtered_sub: Topic::new("filtered_data")?,
            cmd_pub: Topic::new("commands")?,
            threshold,
        })
    }
}

impl Node for ControllerNode {
    fn name(&self) -> &str { "ControllerNode" }

    fn init(&mut self) -> Result<()> {
        hlog!(info, "Stage 3: Controller online");
        Ok(())
    }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick to consume filtered data
        if let Some(value) = self.filtered_sub.recv() {
            let command = if value > self.threshold { 1.0 } else { 0.0 };

            self.cmd_pub.send(command);
            hlog!(info, "Value: {:.2}, Threshold: {:.2}, Command: {:.0}",
                  value, self.threshold, command);
        }
    }

    // SAFETY: send zero command on shutdown to stop downstream actuators
    fn shutdown(&mut self) -> Result<()> {
        self.cmd_pub.send(0.0);
        hlog!(info, "Controller offline");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    // Add pipeline stages in priority order
    scheduler.add(SensorNode::new()?).order(0).build()?;
    scheduler.add(FilterNode::new(0.2)?).order(1).build()?;
    scheduler.add(ControllerNode::new(50.0)?).order(2).build()?;

    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run pipeline.rs
```

**Key Concepts**:

- Data flows: Sensor → Filter → Controller
- Each stage has a different priority (0, 1, 2)
- Exponential moving average: `filtered = α * new + (1-α) * old`
- Priorities ensure correct execution order

---

## 6. Camera Image Pipeline

Send and receive camera images using `Image` with zero-copy shared memory.

### Camera Sender

**File: `camera_sender.rs`**

```rust
// simplified
use horus::prelude::*;

struct CameraSender {
    topic: Topic<Image>,
}

impl Node for CameraSender {
    fn name(&self) -> &str { "CameraSender" }

    fn tick(&mut self) {
        // NOTE: Image::new() allocates from a global shared memory pool — zero-copy transport
        let mut img = Image::new(480, 640, ImageEncoding::Rgb8)
            .expect("failed to allocate image");

        // Fill with a solid color (blue)
        img.fill(&[0, 0, 255]);

        // Set a red pixel at (100, 200)
        img.set_pixel(100, 200, &[255, 0, 0]);

        // Send — only a lightweight descriptor is transmitted.
        // The pixel data stays in shared memory (zero-copy).
        self.topic.send(&img);

        hlog!(debug, "Sent {}x{} image", img.width(), img.height());
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Camera sender stopped");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    scheduler.add(CameraSender {
        topic: Topic::new("camera.rgb")?,
    }).order(0).build()?;

    scheduler.run()?;
    Ok(())
}
```

### Camera Receiver

**File: `camera_receiver.rs`**

```rust
// simplified
use horus::prelude::*;

struct CameraReceiver {
    topic: Topic<Image>,
}

impl Node for CameraReceiver {
    fn name(&self) -> &str { "CameraReceiver" }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick — stale images waste shared memory pool slots
        if let Some(img) = self.topic.recv() {
            // Read a pixel
            if let Some(px) = img.pixel(100, 200) {
                hlog!(info, "Pixel at (100,200): R={} G={} B={}", px[0], px[1], px[2]);
            }

            hlog!(debug, "Received {}x{} {:?} image ({} bytes)",
                  img.width(), img.height(), img.encoding(), img.data().len());
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "Camera receiver stopped");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    scheduler.add(CameraReceiver {
        topic: Topic::new("camera.rgb")?,
    }).order(0).build()?;

    scheduler.run()?;
    Ok(())
}
```

**Run it** (two terminals):

```bash
# Terminal 1
horus run camera_sender.rs

# Terminal 2
horus run camera_receiver.rs
```

**Key Concepts**:

- `Image::new(width, height, encoding)` allocates from a global shared memory pool
- `topic.send(&img)` sends only a lightweight descriptor; pixel data stays in shared memory
- `topic.recv()` returns `Option<Image>` — zero-copy access to the sender's data
- `pixel()` / `set_pixel()` / `fill()` / `roi()` for pixel-level access

---

## 7. Point Cloud Processing

Create, send, and process 3D point clouds.
**File: `pointcloud_demo.rs`**

```rust
// simplified
use horus::prelude::*;

struct PointCloudSender {
    topic: Topic<PointCloud>,
}

impl Node for PointCloudSender {
    fn name(&self) -> &str { "PointCloudSender" }

    fn tick(&mut self) {
        // ASSUMPTION: from_xyz() accepts a slice of [f32; 3] points; the original
        // snippet referenced `points` without defining it, so a zeroed buffer is used here.
        let points = vec![[0.0f32; 3]; 1000];

        // NOTE: the point cloud is allocated from the shared memory pool — zero-copy transport
        let mut pc = PointCloud::from_xyz(&points) // 1000 points, 3 fields
            .expect("failed to allocate point cloud");

        // Fill with sample data — points on a sphere of radius 1.0
        let floats = pc.data_mut_as::<f32>();
        for i in 0..1000 {
            let theta = (i as f32 / 1000.0) * std::f32::consts::TAU;
            let phi = (i as f32 / 1000.0) * std::f32::consts::PI;
            floats[i * 3] = phi.sin() * theta.cos();     // x
            floats[i * 3 + 1] = phi.sin() * theta.sin(); // y
            floats[i * 3 + 2] = phi.cos();               // z
        }

        pc.set_frame_id("lidar_frame");
        self.topic.send(&pc);

        hlog!(debug, "Sent {} points", pc.point_count());
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "PointCloud sender stopped");
        Ok(())
    }
}

struct PointCloudReceiver {
    topic: Topic<PointCloud>,
}

impl Node for PointCloudReceiver {
    fn name(&self) -> &str { "PointCloudReceiver" }

    fn tick(&mut self) {
        // IMPORTANT: call recv() every tick — stale point clouds waste shared memory pool slots
        if let Some(pc) = self.topic.recv() {
            // Extract all XYZ points as Vec<[f32; 3]>
            if let Some(points) = pc.extract_xyz() {
                // Component-wise sum; divided by n below to get the centroid
                let sum = points.iter().fold([0.0f32; 3], |acc, p| {
                    [acc[0] + p[0], acc[1] + p[1], acc[2] + p[2]]
                });
                let n = points.len() as f32;
                hlog!(info, "Received {} points, centroid: ({:.2}, {:.2}, {:.2})",
                      points.len(), sum[0] / n, sum[1] / n, sum[2] / n);
            }
        }
    }

    fn shutdown(&mut self) -> Result<()> {
        hlog!(info, "PointCloud receiver stopped");
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    scheduler.add(PointCloudSender {
        topic: Topic::new("lidar.points")?,
    }).order(0).build()?;

    scheduler.add(PointCloudReceiver {
        topic: Topic::new("lidar.points")?,
    }).order(1).build()?;

    scheduler.run()?;
    Ok(())
}
```

**Run it**:

```bash
horus run pointcloud_demo.rs
```

**Key Concepts**:

- `PointCloud::new(num_points, fields_per_point, dtype)` — 3=XYZ, 4=XYZI, 6=XYZRGB
- `extract_xyz()` returns `Option<Vec<[f32; 3]>>` for F32 clouds
- `point_count()`, `fields_per_point()`, `is_xyz()` for metadata
- Uses the same zero-copy shared memory transport as Image

---

## Next Steps

Now that you understand basic patterns, explore:

- [Second Application Tutorial](/getting-started/second-application) - Build a 3-node sensor pipeline
- [Advanced Examples](/rust/examples/advanced-examples) - Multi-process systems, Python integration
- [Testing](/development/testing) - Write tests for your nodes
- [Using Prebuilt Nodes](/package-management/using-prebuilt-nodes) - Leverage the standard library
- [Message Types](/concepts/message-types) - Complete message reference

Stuck? Check:

- [Troubleshooting](/getting-started/troubleshooting) - Fix common issues
- [Monitor](/development/monitor) - Debug with the web UI

---

## See Also

- [Advanced Examples](/rust/examples/advanced-examples) — Multi-node, RT, and production patterns
- [Tutorials](/tutorials) — Step-by-step learning path
- [API Reference](/rust/api) — Complete Rust API documentation

---

## Topic API

Path: /rust/api/topic
Description: Zero-copy pub/sub communication channels between HORUS nodes — the primary IPC primitive

# Topic API

Topics are how nodes communicate in HORUS. `Topic<T>` provides typed publish/subscribe channels backed by shared memory for zero-copy IPC. Two Topic instances with the same name and type connect automatically — one publishes, the other subscribes. Backend selection (in-process ring buffer vs cross-process SHM) is automatic based on whether publisher and subscriber live in the same process.

> **Python**: Available via `horus.Topic("name", Type)`. See [Python Bindings](/python/api/python-bindings).
>
> New to topics? Start with [Topics: How Nodes Talk](/concepts/topics-beginner) for a 5-minute introduction.
```rust
// simplified
use horus::prelude::*;
```

---

## Quick Reference — Constructors

| Constructor | Returns | Description |
|-------------|---------|-------------|
| `Topic::<T>::new(name)` | `HorusResult<Topic<T>>` | Creates with auto-computed capacity (16–1024 based on message size) and auto-sized slots |
| `Topic::<T>::with_capacity(name, capacity, slot_size)` | `HorusResult<Topic<T>>` | Creates with explicit buffer configuration |

## Quick Reference — Sending

| Method | Returns | Description |
|--------|---------|-------------|
| `send(msg)` | `()` | Publishes a message. Non-blocking. Drops oldest if full. |
| `try_send(msg)` | `Result<(), T>` | Publishes if space available. Returns message back on failure. |
| `send_blocking(msg, timeout)` | `Result<(), SendBlockingError>` | Blocks until space available or timeout. |

## Quick Reference — Receiving

| Method | Returns | Description |
|--------|---------|-------------|
| `recv()` | `Option<T>` | Takes the next unread message (FIFO). Consumes it. |
| `try_recv()` | `Option<T>` | Same as `recv()`. Provided for API symmetry with `try_send()`. |
| `read_latest()` | `Option<T>` | Reads the newest message, skipping older ones. Requires `T: Copy`. |

## Quick Reference — State & Metrics

| Method | Returns | Description |
|--------|---------|-------------|
| `name()` | `&str` | Topic name |
| `has_message()` | `bool` | Whether at least one unread message exists |
| `pending_count()` | `u64` | Number of unread messages in the buffer |
| `dropped_count()` | `u64` | Total messages dropped (buffer-full overwrites) |
| `pub_count()` | `u32` | Number of active publishers |
| `sub_count()` | `u32` | Number of active subscribers |
| `is_same_process()` | `bool` | Whether all pub/sub are in the same process (`#[cfg(test)]` only) |
| `is_same_thread()` | `bool` | Whether all pub/sub are on the same thread (`#[cfg(test)]` only) |
| `metrics()` | `TopicMetrics` | Aggregate send/receive statistics |

---

## Constructor Methods

### `new(name)`

Creates a topic with default capacity and auto-sized slots.

**Signature**

```rust
// simplified
pub fn new(name: impl Into<String>) -> HorusResult<Self>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `name` | `impl Into<String>` | yes | Topic identifier. Case-sensitive. Use dot-delimited hierarchical names: `"sensors.imu"`, `"cmd_vel"`. Two Topic instances with the same name and type `T` connect automatically. **Note:** Dots are preferred over slashes for macOS compatibility (`shm_open` does not allow `/` in names). |

**Returns**

`HorusResult<Topic<T>>` — connected to the named shared memory channel.

**Errors**

| Error | Condition |
|-------|-----------|
| `HorusError` | SHM region creation failed (permissions, disk full) |

**Behavior**

- Default capacity: auto-computed (16–1024 slots based on message size; smaller messages get more slots)
- Default slot size: `size_of::<T>()` rounded up to page alignment
- Backend auto-selection: if publisher and subscriber are in the same process, uses an in-process ring buffer (no SHM overhead). Cross-process uses SHM.
- Creating two `Topic` with the same name connects them — no broker, no configuration **Example** ```rust // simplified use horus::prelude::*; let pub_topic = Topic::::new("cmd_vel")?; let sub_topic = Topic::::new("cmd_vel")?; // connects to same channel pub_topic.send(CmdVel::new(1.0, 0.0)); let msg = sub_topic.recv(); // Some(CmdVel { linear: 1.0, angular: 0.0 }) ``` --- ### `with_capacity(name, capacity, slot_size)` Creates a topic with explicit buffer configuration. **Signature** ```rust // simplified pub fn with_capacity(name: &str, capacity: u32, slot_size: Option) -> HorusResult ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `name` | `&str` | yes | Topic identifier (same rules as `new`). | | `capacity` | `u32` | yes | Number of slots in the ring buffer. Must be a power of 2. Typical: 4, 8, 16. | | `slot_size` | `Option` | no | Bytes per slot. `None` = auto-sized from `size_of::()`. Set explicitly for variable-size messages. | **Returns** `HorusResult>` **When to use** - Large messages that need bigger slots (e.g., `LaserScan` with 360+ ranges) - High-frequency topics where you need more buffering to prevent drops - Small messages where you want to reduce memory footprint **When NOT to use** - Most cases — `Topic::new()` auto-sizes correctly - Pool-backed types (`Image`, `PointCloud`, `DepthImage`, `Tensor`) — they manage their own memory **Example** ```rust // simplified use horus::prelude::*; // 8 slots, 4KB each — for large LiDAR scans let scan = Topic::::with_capacity("lidar.scan", 8, Some(4096))?; // 16 slots, auto-sized — high-frequency IMU with extra buffering let imu = Topic::::with_capacity("imu", 16, None)?; ``` --- ## Sending Methods ### `send(msg)` Publishes a message. Non-blocking. Overwrites the oldest unread message if the buffer is full. 
**Signature**

```rust
// simplified
pub fn send(&self, msg: T)
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `msg` | `T` | yes | Message to publish. Moved into the ring buffer. |

**Returns**

Nothing (`()`). Always succeeds — never blocks, never returns an error.

**Behavior**

- If all slots are occupied, the oldest unread message is silently overwritten
- Use `dropped_count()` to detect overwrites
- Notifies any event-driven nodes (`.on("topic")`) that new data arrived
- Fast path: ~3ns for same-thread publisher+subscriber (inlined ring write, no pointer chase)
- Cross-process: serializes via SHM, ~150ns typical

**When to use**

- Default for real-time robotics — you always want the latest data
- Control loops where blocking is unacceptable
- Fire-and-forget telemetry

**When NOT to use**

- When you need to know whether the message was accepted into the buffer without overwriting older data — use `try_send()` instead (neither method confirms that a subscriber actually read the message)
- When dropping messages is unacceptable — use `send_blocking()` instead

**Example**

```rust
// simplified
use horus::prelude::*;

let topic = Topic::<CmdVel>::new("cmd_vel")?;

// Fire-and-forget (overwrites oldest if full)
topic.send(CmdVel::new(1.0, 0.0));

// Check if messages were dropped
if topic.dropped_count() > 0 {
    hlog!(warn, "{} messages dropped — subscriber too slow", topic.dropped_count());
}
```

---

### `try_send(msg)`

Attempts to publish without overwriting. Returns the message back if the buffer is full.

**Signature**

```rust
// simplified
pub fn try_send(&self, msg: T) -> Result<(), T>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `msg` | `T` | yes | Message to publish. Returned on failure. |

**Returns**

- `Ok(())` — message published successfully
- `Err(msg)` — buffer full, message returned to caller (not consumed)

**Behavior**

- Non-blocking. Checks buffer space and either publishes or returns immediately.
- The returned message can be retried or discarded — your choice.
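The difference between the overwrite-on-full and reject-on-full sending modes can be sketched with a toy bounded queue. This is an illustrative model built on `std::collections::VecDeque`, not the HORUS ring buffer; `BoundedQueue` and its fields are hypothetical names chosen for the example.

```rust
use std::collections::VecDeque;

/// Toy bounded FIFO used only to illustrate the two sending policies.
/// Hypothetical type: it does not mirror HORUS internals.
struct BoundedQueue<T> {
    slots: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> BoundedQueue<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { slots: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Overwrite policy: always accepts the message; when the queue is full,
    /// the oldest unread message is discarded and counted as dropped.
    fn send(&mut self, msg: T) {
        if self.slots.len() == self.capacity {
            self.slots.pop_front();
            self.dropped += 1;
        }
        self.slots.push_back(msg);
    }

    /// Reject policy: never discards queued data; when the queue is full,
    /// the message is handed back to the caller unchanged.
    fn try_send(&mut self, msg: T) -> Result<(), T> {
        if self.slots.len() == self.capacity {
            return Err(msg);
        }
        self.slots.push_back(msg);
        Ok(())
    }

    /// Takes the oldest unread message, if any.
    fn recv(&mut self) -> Option<T> {
        self.slots.pop_front()
    }
}
```

With a capacity of 2, three `send` calls leave the two newest messages queued and raise `dropped` to 1, while a following `try_send` returns the message in `Err` until a `recv` frees a slot.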
**When to use** - When you need to detect drops — implement backpressure, count losses, or log warnings - When the message is expensive to create and you don't want to waste it **When NOT to use** - Normal control loops — use `send()` (simpler, always succeeds) **Example** ```rust // simplified use horus::prelude::*; let topic = Topic::::new("cmd_vel")?; match topic.try_send(CmdVel::new(1.0, 0.0)) { Ok(()) => { /* sent */ } Err(_returned) => { hlog!(warn, "Buffer full — message not sent"); } } ``` --- ### `send_blocking(msg, timeout)` Blocks until buffer space is available or the timeout elapses. **Signature** ```rust // simplified pub fn send_blocking(&self, msg: T, timeout: Duration) -> Result<(), SendBlockingError> ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `msg` | `T` | yes | Message to publish. Consumed on success. | | `timeout` | `Duration` | yes | Maximum time to wait for space. Create with `.ms()`: `10_u64.ms()`. | **Returns** - `Ok(())` — message published - `Err(SendBlockingError::Timeout)` — buffer stayed full for the entire timeout **Behavior** Uses a graduated wait strategy for low latency: 1. Immediate `try_send` (0 latency) 2. Spin loop — 256 iterations (~sub-microsecond) 3. Yield — 8 thread yields (~microseconds) 4. 
Sleep — 100us increments until deadline **When to use** - When dropping messages is unacceptable and brief blocking is acceptable - Logging pipelines, recording, non-RT data transfer **When NOT to use** - Real-time control loops — blocking in `tick()` causes deadline misses - Any node with `.rate()` or `.budget()` — use `send()` instead **Example** ```rust // simplified use horus::prelude::*; let topic = Topic::::new("diagnostics")?; match topic.send_blocking(report, 10_u64.ms()) { Ok(()) => { /* sent */ } Err(SendBlockingError::Timeout) => { hlog!(warn, "Diagnostic buffer full for 10ms — dropping report"); } } ``` --- ## Receiving Methods ### `recv()` Takes the next unread message in FIFO order. Consumes it from the buffer. **Signature** ```rust // simplified pub fn recv(&self) -> Option ``` **Parameters** None. **Returns** - `Some(T)` — the oldest unread message (removed from buffer) - `None` — no unread messages available **Behavior** - Non-blocking. Returns immediately. - Each message is delivered to each subscriber exactly once. After `recv()` returns it, the message is consumed. - Fast path: ~3ns for same-thread (inlined ring read) **When to use** - Command streams where every message matters (velocity commands, goal updates) - Processing pipelines where order matters (image frames, sensor sequences) **When NOT to use** - State-like data where you only care about the latest value — use `read_latest()` instead **Example** ```rust // simplified use horus::prelude::*; let topic = Topic::::new("imu")?; // IMPORTANT: drain every tick to avoid stale data accumulation while let Some(msg) = topic.recv() { process(msg); } ``` --- ### `try_recv()` Non-blocking receive. Functionally identical to `recv()`. **Signature** ```rust // simplified pub fn try_recv(&self) -> Option ``` **Returns** Same as `recv()`. Provided for API symmetry with `try_send()`. --- ### `read_latest()` Reads the most recent message, skipping all older ones. 
**Signature** ```rust // simplified pub fn read_latest(&self) -> Option where T: Copy ``` **Parameters** None. **Returns** - `Some(T)` — copy of the newest message - `None` — no messages available **Behavior** - Reads without consuming — multiple calls return the same value until a new message arrives - Skips all older messages — only the latest matters - Requires `T: Copy` because it reads (copies) without removing from the buffer **When to use** - State-like data where you only care about the current value: poses, sensor readings, configuration - Slow subscribers that can't keep up with a fast publisher **When NOT to use** - Command streams where every message matters — use `recv()` instead - Messages that aren't `Copy` (e.g., types with `Vec`, `String`) **Example** ```rust // simplified use horus::prelude::*; let odom = Topic::::new("odom")?; // Good: pose is state — only the latest matters if let Some(pose) = odom.read_latest() { current_position = pose; } ``` --- ## State & Metrics Methods ### `name()` Returns the topic name. **Signature** ```rust // simplified pub fn name(&self) -> &str ``` **Returns** `&str` — the name passed to `new()` or `with_capacity()`. --- ### `has_message()` Checks whether at least one unread message exists. **Signature** ```rust // simplified pub fn has_message(&self) -> bool ``` **Returns** `true` if `recv()` would return `Some`. `false` if the buffer is empty. --- ### `pending_count()` Returns the number of unread messages in the buffer. **Signature** ```rust // simplified pub fn pending_count(&self) -> u64 ``` **Returns** `u64` — count of messages waiting to be consumed by `recv()`. --- ### `dropped_count()` Returns the total number of messages dropped due to buffer-full overwrites. **Signature** ```rust // simplified pub fn dropped_count(&self) -> u64 ``` **Returns** `u64` — cumulative count since topic creation. Incremented by `send()` when it overwrites the oldest slot. 
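Because the counter is cumulative, a plain `> 0` check keeps firing after the first drop. One way to report only new drops is to remember the previous reading and compare deltas. The sketch below is a standalone helper that takes the counter value as a plain `u64`; `DropMonitor` and `poll` are hypothetical names, not part of the HORUS API.

```rust
/// Tracks growth of a cumulative drop counter between polls.
/// Hypothetical helper for illustration only.
struct DropMonitor {
    last_seen: u64,
    threshold: u64,
}

impl DropMonitor {
    fn new(threshold: u64) -> Self {
        Self { last_seen: 0, threshold }
    }

    /// Feed the current cumulative count (for example the value returned by
    /// `dropped_count()`). Returns `Some(new_drops)` when more than
    /// `threshold` messages were dropped since the previous poll.
    fn poll(&mut self, cumulative: u64) -> Option<u64> {
        // saturating_sub guards against a counter that was reset to zero.
        let delta = cumulative.saturating_sub(self.last_seen);
        self.last_seen = cumulative;
        if delta > self.threshold {
            Some(delta)
        } else {
            None
        }
    }
}
```

Calling `poll` once per tick with the current count yields a warning only when the drop rate per tick exceeds the chosen threshold.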
**When to use** - Monitor for backpressure — a rising `dropped_count()` means the subscriber is too slow - Log warnings when drops exceed a threshold **Example** ```rust // simplified use horus::prelude::*; let topic = Topic::::new("scan")?; if topic.dropped_count() > 0 { hlog!(warn, "Dropped {} scans — subscriber can't keep up", topic.dropped_count()); } ``` --- ### `pub_count()` Returns the number of active publishers on this topic. > **Advanced**: These methods are `#[doc(hidden)]` in the Rust API — they may change without notice. For monitoring, prefer `horus topic list --verbose`. **Signature** ```rust // simplified pub fn pub_count(&self) -> u32 ``` --- ### `sub_count()` Returns the number of active subscribers on this topic. > **Advanced**: These methods are `#[doc(hidden)]` in the Rust API — they may change without notice. For monitoring, prefer `horus topic list --verbose`. **Signature** ```rust // simplified pub fn sub_count(&self) -> u32 ``` --- ### `is_same_process()` — `#[cfg(test)]` only Checks whether all publishers and subscribers are in the same process. Available only in test builds. **Signature** ```rust // simplified #[cfg(test)] pub fn is_same_process(&self) -> bool ``` **Returns** `true` if using in-process ring buffer (fastest path, ~3ns). `false` if cross-process SHM. > **Note:** This method is gated behind `#[cfg(test)]` and is not available in production builds. It exists for internal testing and assertions only. --- ### `is_same_thread()` — `#[cfg(test)]` only Checks whether all publishers and subscribers are on the same thread. Available only in test builds. **Signature** ```rust // simplified #[cfg(test)] pub fn is_same_thread(&self) -> bool ``` **Returns** `true` if using the ultra-fast same-thread path (inlined ring operations, no atomics). > **Note:** This method is gated behind `#[cfg(test)]` and is not available in production builds. It exists for internal testing and assertions only. 
--- ### `metrics()` Returns aggregate send/receive statistics. **Signature** ```rust // simplified pub fn metrics(&self) -> TopicMetrics ``` **Returns** `TopicMetrics` — see Types section below. **Example** ```rust // simplified use horus::prelude::*; let topic = Topic::::new("cmd_vel")?; let m = topic.metrics(); println!("Sent: {}, Received: {}, Failures: {}", m.messages_sent(), m.messages_received(), m.send_failures()); ``` --- ## Pool-Backed Types (Zero-Copy) For large data types (`Image`, `PointCloud`, `DepthImage`, `Tensor`), HORUS uses pool-backed allocation. The `Topic` API works the same — `send()`, `recv()`, `try_send()` — but the data is transferred via shared memory pools instead of copying through the ring buffer. ```rust // simplified use horus::prelude::*; let camera = Topic::::new("camera.rgb")?; // Send (moves the Image into the pool slot) camera.send(image); // IMPORTANT: call recv() every tick to drain — images are large and stale frames waste pool slots if let Some(img) = camera.recv() { println!("{}x{} image received", img.width(), img.height()); } ``` See [Image API](/rust/api/image), [PointCloud API](/rust/api/pointcloud), [DepthImage API](/rust/api/depth-image), [Tensor API](/rust/api/tensor) for type-specific methods. --- ## Types ### `TopicMetrics` Aggregate statistics returned by `topic.metrics()`. | Method | Returns | Description | |--------|---------|-------------| | `.messages_sent()` | `u64` | Total messages published on this topic | | `.messages_received()` | `u64` | Total messages consumed from this topic | | `.send_failures()` | `u64` | Failed send attempts (e.g., `try_send` on a full buffer) | | `.recv_failures()` | `u64` | Failed receive attempts (e.g., `recv` on empty buffer) | ### `SendBlockingError` Error returned by `send_blocking()`. 
| Variant | Description | |---------|-------------| | `Timeout` | The ring buffer stayed full for the entire timeout duration | --- ## Production Example Multi-sensor fusion node subscribing to two topics and publishing a fused pose: ```rust // simplified use horus::prelude::*; struct FusionNode { imu_sub: Topic, odom_sub: Topic, pose_pub: Topic, last_imu: Option, last_odom: Option, } impl Node for FusionNode { fn name(&self) -> &str { "Fusion" } fn tick(&mut self) { // IMPORTANT: always drain both topics every tick if let Some(imu) = self.imu_sub.recv() { self.last_imu = Some(imu); } if let Some(odom) = self.odom_sub.recv() { self.last_odom = Some(odom); } if let (Some(imu), Some(odom)) = (&self.last_imu, &self.last_odom) { let fused = self.fuse(imu, odom); self.pose_pub.send(fused); } } } ``` --- ## See Also - [Topics Concept](/concepts/core-concepts-topic) — Architecture, backends, and design patterns - [Topics for Beginners](/concepts/topics-beginner) — Gentle introduction - [Communication Overview](/concepts/communication-overview) — When to use topics vs services vs actions - [Image API](/rust/api/image) — Pool-backed camera images - [PointCloud API](/rust/api/pointcloud) — Pool-backed 3D point clouds - [DepthImage API](/rust/api/depth-image) — Pool-backed depth images - [Tensor API](/rust/api/tensor) — Tensor descriptor and DLPack - [Scheduler API](/rust/api/scheduler) — Running nodes that use topics - [Python Topic API](/python/api/python-bindings) — Python topic bindings --- ## Services API Path: /rust/api/services Description: Synchronous request/response RPC between HORUS nodes # Services API HORUS services provide synchronous request/response communication between nodes. Define a service with the `service!` macro, run a server with `ServiceServerBuilder`, and call it with `ServiceClient` or `AsyncServiceClient`. > **Python**: Services are Rust-only. Python bindings are not yet available. 
## Defining a Service

Use the `service!` macro to define request and response types:

```rust
// simplified
use horus::prelude::*;

service! {
    /// Look up a robot's current pose by name.
    GetRobotPose {
        request {
            robot_name: String,
        }
        response {
            x: f64,
            y: f64,
            theta: f64,
            timestamp_ns: u64,
        }
    }
}
```

---

## Service Trait

All services implement the `Service` trait:

| Method | Returns | Description |
|--------|---------|-------------|
| `name()` | `&'static str` | Service name (used as topic prefix) |
| `request_topic()` | `String` | Request channel name (`"{name}.request"`) |
| `response_topic()` | `String` | Response channel name (`"{name}.response"`) |
| `request_type_name()` | `&'static str` | Human-readable request type name |
| `response_type_name()` | `&'static str` | Human-readable response type name |

---

## ServiceClient (Blocking)

Synchronous client that blocks until a response arrives or the timeout elapses.

### Constructor

| Method | Returns | Description |
|--------|---------|-------------|
| `ServiceClient::<S>::new()` | `Result<Self>` | Create a client with default 1ms poll interval |
| `ServiceClient::<S>::with_poll_interval(interval)` | `Result<Self>` | Create a client with custom poll interval |

### Calling

| Method | Returns | Description |
|--------|---------|-------------|
| `call(request, timeout)` | `ServiceResult<S::Response>` | Block until response or timeout |
| `call_resilient(request, timeout)` | `ServiceResult<S::Response>` | Auto-retry on transient errors (3 retries, 10ms backoff, 2x multiplier) |
| `call_resilient_with(request, timeout, config)` | `ServiceResult<S::Response>` | Auto-retry with custom `RetryConfig` |
| `call_optional(request, timeout)` | `ServiceResult<Option<S::Response>>` | Returns `Ok(None)` on timeout instead of `Err` |

Only transient errors (`Timeout`, `Transport`) are retried. Permanent errors (`ServiceFailed`, `NoServer`) propagate immediately.
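The retry policy described above (3 retries, 10ms initial backoff, 2x multiplier, transient errors only) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the HORUS implementation: `ToyError`, `backoff_schedule`, and `call_with_retry` are hypothetical names, and the multiplier is an integer here to keep the arithmetic exact.

```rust
use std::time::Duration;

/// Stand-in for the error categories named in the text (hypothetical type).
#[derive(Debug, PartialEq)]
enum ToyError {
    Timeout,
    Transport,
    NoServer,
    ServiceFailed,
}

impl ToyError {
    /// Only timeouts and transport failures are worth retrying.
    fn is_transient(&self) -> bool {
        matches!(self, ToyError::Timeout | ToyError::Transport)
    }
}

/// Delays to wait before each retry: initial, initial * m, initial * m^2, ...
fn backoff_schedule(max_retries: u32, initial: Duration, multiplier: u32) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut next = initial;
    for _ in 0..max_retries {
        delays.push(next);
        next = next * multiplier;
    }
    delays
}

/// Runs `attempt` once, then retries after each delay while the error is transient.
/// Permanent errors and successes are returned immediately.
fn call_with_retry<T>(
    mut attempt: impl FnMut() -> Result<T, ToyError>,
    delays: &[Duration],
) -> Result<T, ToyError> {
    let mut result = attempt();
    for delay in delays {
        let transient = matches!(&result, Err(e) if e.is_transient());
        if !transient {
            break;
        }
        std::thread::sleep(*delay);
        result = attempt();
    }
    result
}
```

With the defaults quoted in the table, `backoff_schedule(3, Duration::from_millis(10), 2)` produces waits of 10ms, 20ms, and 40ms, so a call that keeps timing out makes four attempts in total before the error is returned.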
### Detailed Method Reference #### call ```rust // simplified pub fn call(&self, request: S::Request, timeout: Duration) -> ServiceResult ``` Send a request and block until a response arrives or the timeout elapses. **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `request` | `S::Request` | yes | The typed request message. `S` is the service type from `service!` macro. | | `timeout` | `Duration` | yes | Maximum time to wait for a response. Create with `.ms()` or `.secs()`: `100_u64.ms()`. | **Returns:** `ServiceResult` — `Ok(response)` or `Err(ServiceError)`. **Errors** | Error | Condition | |-------|-----------| | `ServiceError::Timeout` | No response within timeout | | `ServiceError::NoServer` | No server is registered for this service | | `ServiceError::ServiceFailed(msg)` | Server handler returned `Err(msg)` | | `ServiceError::Transport(msg)` | IPC communication failure | **When to use** - One-shot queries: "what is the robot's current pose?" - Parameter lookups from a configuration server - Any request that should complete in milliseconds **When NOT to use** - Long-running tasks — use [Actions](/rust/api/actions) instead - Continuous data streams — use [Topics](/rust/api/topic) instead - Non-critical lookups where timeout is acceptable — use `call_optional()` instead **Example:** ```rust // simplified let response = client.call(GetPose { frame: "base" }, 100.ms())?; println!("x={}, y={}", response.x, response.y); ``` #### call_resilient Sends a request with automatic retry on transient errors. **Signature** ```rust // simplified pub fn call_resilient(&mut self, request: S::Request, timeout: Duration) -> ServiceResult ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `request` | `S::Request` | yes | The typed request message. | | `timeout` | `Duration` | yes | Timeout per attempt. Total time may exceed this due to retries. 
| **Returns** `ServiceResult` — `Ok(response)` after successful attempt, `Err` if all retries exhausted or permanent error. **Behavior** - Default retry config: 3 retries, 10ms initial backoff, 2x multiplier - Only transient errors (`Timeout`, `Transport`) trigger retries - Permanent errors (`NoServer`, `ServiceFailed`) propagate immediately — no retry - Total wall time can exceed `timeout` because each retry gets a fresh timeout **When to use**: Network-adjacent calls where transient failures are expected (sensor servers, remote nodes). **When NOT to use**: Latency-critical paths where retry delay is unacceptable — use `call()` with your own retry logic. #### call_resilient_with Sends a request with automatic retry using a custom retry configuration. **Signature** ```rust // simplified pub fn call_resilient_with(&mut self, request: S::Request, timeout: Duration, config: RetryConfig) -> ServiceResult ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `request` | `S::Request` | yes | The typed request message. | | `timeout` | `Duration` | yes | Timeout per attempt. | | `config` | `RetryConfig` | yes | Custom retry settings: `RetryConfig::new(max_retries, initial_backoff)` with `.multiplier()`, `.max_backoff()`. | **Returns** `ServiceResult` — same as `call_resilient()`. **Example** ```rust // simplified use horus::prelude::*; let config = RetryConfig::new(5, 50_u64.ms()) .multiplier(3.0) .max_backoff(1_u64.secs()); let response = client.call_resilient_with(request, 2_u64.secs(), config)?; ``` #### call_optional Sends a request, returning `None` on timeout instead of an error. **Signature** ```rust // simplified pub fn call_optional(&mut self, request: S::Request, timeout: Duration) -> ServiceResult> ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `request` | `S::Request` | yes | The typed request message. | | `timeout` | `Duration` | yes | Maximum time to wait. 
| **Returns** - `Ok(Some(response))` — server responded successfully - `Ok(None)` — timeout elapsed, no response (not an error) - `Err(ServiceError)` — non-timeout error (NoServer, ServiceFailed, Transport) **When to use**: Non-critical lookups where missing data is acceptable — e.g., checking if a sensor is online before planning. ### Example ```rust // simplified use horus::prelude::*; // Create client for the GetRobotPose service let mut client = ServiceClient::::new()?; // Blocking call with 1-second timeout let response = client.call( GetRobotPoseRequest { robot_name: "arm_1".into() }, 1_u64.secs(), )?; println!("Robot at ({:.2}, {:.2})", response.x, response.y); // Resilient call — retries on transient failures let response = client.call_resilient( GetRobotPoseRequest { robot_name: "arm_1".into() }, 2_u64.secs(), )?; // Optional call — Ok(None) on timeout match client.call_optional( GetRobotPoseRequest { robot_name: "arm_1".into() }, 100_u64.ms(), )? { Some(res) => println!("Pose: ({:.2}, {:.2})", res.x, res.y), None => println!("Pose server not responding"), } ``` --- ## AsyncServiceClient (Non-Blocking) Non-blocking client that returns a `PendingServiceCall` handle. Check the handle each tick without blocking the scheduler. 
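The tick-by-tick polling idea behind a pending-call handle can be sketched with a standard-library channel and a deadline. This is a minimal illustration, not the HORUS implementation: `PendingCall` and `CallError` are hypothetical stand-ins, and an `mpsc` channel takes the place of the response topic.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::time::{Duration, Instant};

/// Hypothetical error type standing in for a service error.
#[derive(Debug, PartialEq)]
enum CallError {
    Timeout,
    Transport,
}

/// Hypothetical pending-call handle: a response channel plus a deadline.
struct PendingCall<T> {
    rx: Receiver<T>,
    deadline: Instant,
}

impl<T> PendingCall<T> {
    /// Non-blocking poll: a response, "still waiting", or an error.
    fn check(&mut self) -> Result<Option<T>, CallError> {
        match self.rx.try_recv() {
            Ok(response) => Ok(Some(response)),
            Err(TryRecvError::Empty) if Instant::now() >= self.deadline => Err(CallError::Timeout),
            Err(TryRecvError::Empty) => Ok(None),
            Err(TryRecvError::Disconnected) => Err(CallError::Transport),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut call = PendingCall { rx, deadline: Instant::now() + Duration::from_secs(1) };

    // First "tick": nothing has arrived yet, so the caller just moves on.
    println!("tick 1: {:?}", call.check());

    // The "server" answers between ticks.
    tx.send("pose ready").unwrap();
    println!("tick 2: {:?}", call.check());
}
```

The caller never sleeps or blocks: each tick costs one `try_recv` and one clock read, which is why this style suits nodes with tight budgets.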
### Constructor | Method | Returns | Description | |--------|---------|-------------| | `AsyncServiceClient::::new()` | `Result` | Create with default 1ms poll interval | | `AsyncServiceClient::::with_poll_interval(interval)` | `Result` | Create with custom poll interval | ### Calling | Method | Returns | Description | |--------|---------|-------------| | `call_async(request, timeout)` | `PendingServiceCall` | Send request, return pending handle immediately | ### PendingServiceCall | Method | Returns | Description | |--------|---------|-------------| | `check()` | `ServiceResult>` | Non-blocking check: `Ok(Some(res))` if ready, `Ok(None)` if waiting, `Err` on timeout/failure | | `wait()` | `ServiceResult` | Block until response arrives or timeout | | `is_expired()` | `bool` | Whether the deadline has passed | #### check Polls for a response without blocking. **Signature** ```rust // simplified pub fn check(&mut self) -> ServiceResult> ``` **Returns** - `Ok(Some(response))` — response arrived - `Ok(None)` — still waiting, call again next tick - `Err(ServiceError::Timeout)` — deadline passed - `Err(other)` — transport or server error **When to use**: In `tick()` — check every cycle without blocking the scheduler. #### wait Blocks until the response arrives or the deadline passes. **Signature** ```rust // simplified pub fn wait(self) -> ServiceResult ``` **Returns** `ServiceResult` — consumes the handle. Use `check()` for non-blocking. **When NOT to use**: Inside `tick()` of RT nodes — blocking causes deadline misses. #### is_expired Checks whether the deadline has passed. **Signature** ```rust // simplified pub fn is_expired(&self) -> bool ``` **Returns** `true` if the timeout has elapsed. After this, `check()` will return `Err(Timeout)`. ### Example ```rust // simplified use horus::prelude::*; service! 
{ GetRobotPose { request { robot_name: String } response { x: f64, y: f64, theta: f64, timestamp_ns: u64 } } } struct PlannerNode { client: AsyncServiceClient, pending: Option>, } impl Node for PlannerNode { fn name(&self) -> &str { "Planner" } fn tick(&mut self) { // Send request if none pending if self.pending.is_none() { self.pending = Some(self.client.call_async( GetRobotPoseRequest { robot_name: "arm_0".into() }, 500_u64.ms(), )); } // Check for response (non-blocking) if let Some(ref mut call) = self.pending { match call.check() { Ok(Some(pose)) => { hlog!(info, "Robot at ({:.2}, {:.2})", pose.x, pose.y); self.pending = None; } Ok(None) => {} // Still waiting Err(e) => { hlog!(warn, "Service call failed: {}", e); self.pending = None; } } } } } ``` --- ## ServiceServerBuilder Fluent builder for creating a service server. | Method | Returns | Description | |--------|---------|-------------| | `ServiceServerBuilder::::new()` | `Self` | Create a new builder | | `on_request(handler)` | `Self` | Register the request handler (`Fn(Req) -> Result`) | | `poll_interval(interval)` | `Self` | Override poll interval (default: 5ms) | | `build()` | `Result>` | Build and start the server (spawns background thread) | The handler receives the request payload and returns either a response (`Ok`) or an error message (`Err`). The handler's type is: ```rust // simplified type RequestHandler = Box Result + Send + Sync + 'static>; ``` ### ServiceServer | Method | Returns | Description | |--------|---------|-------------| | `stop()` | `()` | Stop the server (also happens automatically on drop) | The server runs in a background thread. Dropping the `ServiceServer` handle shuts it down. 
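The "background thread that stops when the handle is dropped" behavior is a common Rust pattern, sketched here with the standard library only. `BackgroundServer` is a hypothetical stand-in, not the HORUS `ServiceServer`; in particular, returning the poll count from `stop()` is purely for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a server handle that owns a worker thread.
struct BackgroundServer {
    running: Arc<AtomicBool>,
    worker: Option<JoinHandle<u64>>,
}

impl BackgroundServer {
    fn start(poll_interval: Duration) -> Self {
        let running = Arc::new(AtomicBool::new(true));
        let flag = Arc::clone(&running);
        let worker = thread::spawn(move || {
            let mut polls = 0u64;
            while flag.load(Ordering::Acquire) {
                polls += 1; // a real server would poll its request channel here
                thread::sleep(poll_interval);
            }
            polls
        });
        Self { running, worker: Some(worker) }
    }

    /// Signal the worker to exit and wait for it. Safe to call twice.
    fn stop(&mut self) -> Option<u64> {
        self.running.store(false, Ordering::Release);
        self.worker.take().map(|h| h.join().expect("worker thread panicked"))
    }
}

impl Drop for BackgroundServer {
    fn drop(&mut self) {
        // Dropping the handle shuts the worker down, mirroring an explicit stop().
        let _ = self.stop();
    }
}

fn main() {
    let mut server = BackgroundServer::start(Duration::from_millis(1));
    thread::sleep(Duration::from_millis(20));
    println!("worker polled {:?} times", server.stop());
}
```

A practical consequence of this pattern: binding the handle to `_` drops it immediately and stops the worker, so a server handle needs a named binding (such as `_server`) that lives as long as the service should run.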
### Example

```rust
// simplified
use horus::prelude::*;
use std::collections::HashMap;

// Build and start server
let poses: HashMap<String, (f64, f64, f64)> = HashMap::from([
    ("arm_1".into(), (1.5, 2.0, 0.0)),
    ("arm_2".into(), (3.0, 1.0, 1.57)),
]);

let server = ServiceServerBuilder::<GetRobotPose>::new()
    .on_request(move |req| {
        match poses.get(&req.robot_name) {
            Some(&(x, y, theta)) => Ok(GetRobotPoseResponse {
                x,
                y,
                theta,
                timestamp_ns: horus::timestamp_now(),
            }),
            None => Err(format!("Unknown robot: {}", req.robot_name)),
        }
    })
    .poll_interval(1_u64.ms())
    .build()?;
// Server runs in background thread until dropped
```

---

## ServiceRequest / ServiceResponse

Wrapper types that flow over the wire:

### ServiceRequest\<Req\>

| Field | Type | Description |
|-------|------|-------------|
| `request_id` | `u64` | Unique correlation ID (auto-assigned by client) |
| `payload` | `Req` | The actual request data |

### ServiceResponse\<Res\>

| Field | Type | Description |
|-------|------|-------------|
| `request_id` | `u64` | Echoes the request's correlation ID |
| `ok` | `bool` | `true` if handled successfully |
| `payload` | `Option<Res>` | Response data (`Some` when `ok == true`) |
| `error` | `Option<String>` | Error message (`Some` when `ok == false`) |

| Method | Returns | Description |
|--------|---------|-------------|
| `ServiceResponse::success(request_id, payload)` | `Self` | Create a successful response |
| `ServiceResponse::failure(request_id, error)` | `Self` | Create an error response |

---

## ServiceError

| Variant | Description | Transient?
| |---------|-------------|------------| | `Timeout` | Call timed out waiting for response | Yes | | `ServiceFailed(String)` | Server returned an error | No | | `NoServer` | No server registered for this service | No | | `Transport(String)` | Topic I/O error | Yes | | Method | Returns | Description | |--------|---------|-------------| | `is_transient()` | `bool` | Whether a retry may succeed (`Timeout` and `Transport` are transient) | --- ## ServiceInfo Metadata returned by `horus service list`: | Field | Type | Description | |-------|------|-------------| | `name` | `String` | Service name | | `request_type` | `String` | Rust type name of request | | `response_type` | `String` | Rust type name of response | | `servers` | `usize` | Active server count (typically 0 or 1) | | `clients` | `usize` | Known client count | --- ## Complete Example ```rust // simplified use horus::prelude::*; use std::collections::HashMap; use std::sync::{Arc, Mutex}; // Define a key-value store service service! { /// Get or set values in a shared store. 
KeyValueStore { request { key: String, value: Option, // None = get, Some = set } response { value: Option, found: bool, } } } fn main() -> Result<()> { let store = Arc::new(Mutex::new(HashMap::::new())); let store_clone = store.clone(); // Start server let _server = ServiceServerBuilder::::new() .on_request(move |req| { let mut map = store_clone.lock().unwrap(); match req.value { Some(val) => { map.insert(req.key, val.clone()); Ok(KeyValueStoreResponse { value: Some(val), found: true }) } None => { let val = map.get(&req.key).cloned(); let found = val.is_some(); Ok(KeyValueStoreResponse { value: val, found }) } } }) .build()?; // Client: set a value let mut client = ServiceClient::::new()?; client.call( KeyValueStoreRequest { key: "robot_id".into(), value: Some("arm_01".into()) }, 1_u64.secs(), )?; // Client: get it back let res = client.call( KeyValueStoreRequest { key: "robot_id".into(), value: None }, 1_u64.secs(), )?; assert_eq!(res.value, Some("arm_01".into())); assert!(res.found); Ok(()) } ``` --- ## See Also - [Services Concepts](/concepts/services) — Architecture and design patterns - [Actions API](/rust/api/actions) — Long-running tasks with feedback and cancellation - [Topic API](/rust/api/topic) — Streaming pub/sub communication - [Error Handling](/development/error-handling) — RetryConfig for resilient calls --- ## Time API Path: /rust/time-api Description: Framework clock, timestep, and RNG — horus::now(), horus::dt(), horus::rng(), and more # Time API The `horus::` time functions are THE standard way to get time, timestep, and random numbers in HORUS nodes — same pattern as `hlog!()` for logging. The scheduler sets the ambient context before each `tick()` call; these functions read from it. 
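The "ambient context" mechanism mentioned above can be pictured as a thread-local slot that the scheduler fills in before calling a node and clears afterwards. The sketch below is an assumption about the general pattern, not the HORUS internals: `TICK_CTX`, `run_tick`, and the free functions `dt()` and `tick()` are hypothetical names.

```rust
use std::cell::Cell;
use std::time::Duration;

thread_local! {
    // Hypothetical ambient context: (tick number, timestep) while a tick runs.
    static TICK_CTX: Cell<Option<(u64, Duration)>> = Cell::new(None);
}

/// Timestep of the current tick, or zero when called outside a tick.
fn dt() -> Duration {
    TICK_CTX.with(|ctx| ctx.get().map(|(_, dt)| dt).unwrap_or(Duration::ZERO))
}

/// Current tick number, or 0 when called outside a tick.
fn tick() -> u64 {
    TICK_CTX.with(|ctx| ctx.get().map(|(n, _)| n).unwrap_or(0))
}

/// What a scheduler would do around each node call: set, run, clear.
fn run_tick(number: u64, step: Duration, node: impl FnOnce()) {
    TICK_CTX.with(|ctx| ctx.set(Some((number, step))));
    node();
    TICK_CTX.with(|ctx| ctx.set(None));
}

fn main() {
    println!("outside tick: tick={} dt={:?}", tick(), dt());
    run_tick(7, Duration::from_millis(10), || {
        println!("inside tick:  tick={} dt={:?}", tick(), dt());
    });
}
```

The design choice this illustrates is that node code needs no clock or context parameter threaded through it: the same call reads real or virtual time depending on what the scheduler placed in the slot.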
## Quick Reference | Function | Returns | Normal Mode | Deterministic Mode | |----------|---------|-------------|-------------------| | `horus::now()` | `TimeStamp` | Wall clock | Virtual SimClock | | `horus::since(ts)` | `Duration` | Elapsed since `ts` | Elapsed (virtual) | | `horus::dt()` | `Duration` | Real elapsed since last tick | Fixed `1/rate` | | `horus::elapsed()` | `Duration` | Wall time since scheduler start | Accumulated virtual time | | `horus::tick()` | `u64` | Current tick number | Current tick number | | `horus::rng(f)` | `R` | System entropy | Tick-seeded (deterministic) | | `horus::budget_remaining()` | `Duration` | Time left in budget | Time left (SimClock) | All functions are safe to call outside `tick()` — they return sensible fallback values. ## `horus::now()` — Current Time Returns a `TimeStamp` representing the current framework time. **Returns** `TimeStamp` — nanosecond-precision timestamp. In normal mode: wall clock (`CLOCK_MONOTONIC`). In deterministic mode: virtual SimClock time. **Behavior** - Normal mode: reads `CLOCK_MONOTONIC` (~20ns) - Deterministic mode (`.deterministic(true)`): returns virtual time that advances by exactly `1/rate` per tick - Two calls to `now()` within the same tick return the same value in deterministic mode ```rust // simplified fn tick(&mut self) { let start = horus::now(); self.do_expensive_work(); let elapsed = horus::since(start); hlog!(debug, "work took {:?}", elapsed); } ``` ### `TimeStamp` Type `TimeStamp` is an opaque wrapper with nanosecond precision: ```rust // simplified let a = horus::now(); let b = horus::now(); // Subtraction produces Duration let diff: Duration = b - a; // Elapsed since a timestamp let elapsed: Duration = a.elapsed(); // Comparison assert!(b > a); // Display println!("{}", a); // "1.500000s" // Serialization let nanos: u64 = a.as_nanos(); let restored = TimeStamp::from_nanos(nanos); ``` ## `horus::dt()` — Timestep Returns the timestep for the current tick. 
Use this for physics integration: ```rust // simplified fn tick(&mut self) { let dt = horus::dt(); self.position += self.velocity * dt.as_secs_f64(); self.velocity += self.acceleration * dt.as_secs_f64(); } ``` In normal mode, `dt()` returns the actual elapsed time since the last tick. In deterministic mode, it returns a fixed value of `1/rate` (e.g., 10ms for 100Hz, 1ms for 1kHz). ## `horus::elapsed()` — Time Since Start Total time since the scheduler started: ```rust // simplified fn tick(&mut self) { if horus::elapsed() > Duration::from_secs(30) { hlog!(info, "Running for 30 seconds, switching to cruise mode"); self.mode = Mode::Cruise; } } ``` ## `horus::tick()` — Tick Number Zero-based tick counter: ```rust // simplified fn tick(&mut self) { if horus::tick() % 100 == 0 { hlog!(info, "Checkpoint at tick {}", horus::tick()); } } ``` ## `horus::rng()` — Deterministic Random Numbers Returns random values via a closure. In deterministic mode, the RNG is seeded from the tick number and node name — same sequence every run. ```rust // simplified fn tick(&mut self) { // Random float in range let noise: f64 = horus::rng(|r| { use rand::Rng; r.gen_range(-0.01..0.01) }); self.measurement += noise; // Random bool let should_explore: bool = horus::rng(|r| { use rand::Rng; r.gen_bool(0.1) // 10% chance }); // Random integer let index: usize = horus::rng(|r| { use rand::Rng; r.gen_range(0..self.candidates.len()) }); } ``` ## `horus::budget_remaining()` — Anytime Algorithms Returns the time remaining in the current tick's budget. Use this for anytime algorithms that improve their result until time runs out: ```rust // simplified fn tick(&mut self) { let mut plan = self.current_plan.clone(); loop { plan = self.improve_plan(plan); if horus::budget_remaining() < 50_u64.us() { break; // stop before budget runs out } } self.path_topic.send(plan); } ``` Returns `Duration::MAX` if no budget is configured or outside `tick()`. 
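The anytime pattern can be tried outside HORUS with a plain deadline. In this sketch, `budget_remaining(deadline)` is a hypothetical stand-in for `horus::budget_remaining()`, and Newton's iteration for a square root plays the role of "improve the plan". Only the standard library is used.

```rust
use std::time::{Duration, Instant};

/// Stand-in for `horus::budget_remaining()`: time left until `deadline`.
fn budget_remaining(deadline: Instant) -> Duration {
    deadline.saturating_duration_since(Instant::now())
}

/// Refine an estimate of sqrt(x) until it converges or the budget is nearly spent.
/// Returns the best estimate so far and the number of refinement steps taken.
fn anytime_sqrt(x: f64, budget: Duration, margin: Duration) -> (f64, u32) {
    let deadline = Instant::now() + budget;
    let mut estimate = x.max(1.0);
    let mut iterations = 0;
    loop {
        // One improvement step (Newton's method).
        let next = 0.5 * (estimate + x / estimate);
        iterations += 1;
        let converged = (next - estimate).abs() < 1e-12;
        estimate = next;
        // Stop early enough to leave `margin` for publishing the result.
        if converged || budget_remaining(deadline) < margin {
            break;
        }
    }
    (estimate, iterations)
}

fn main() {
    let (root, steps) = anytime_sqrt(2.0, Duration::from_millis(5), Duration::from_micros(50));
    println!("sqrt(2) ~ {root} after {steps} steps");
}
```

The safety margin matters: the check runs after each improvement step, so the margin should be at least as long as one step plus whatever work remains after the loop.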
## Fallback Behavior All functions are safe to call outside `tick()`: | Function | Outside `tick()` | |----------|-----------------| | `horus::now()` | Wall clock (fallback) | | `horus::dt()` | `Duration::ZERO` | | `horus::elapsed()` | `Duration::ZERO` | | `horus::tick()` | `0` | | `horus::rng(f)` | System entropy (non-deterministic) | | `horus::budget_remaining()` | `Duration::MAX` | ## Python API The time functions are available as module-level functions in Python: ```python import horus # In a node's tick(): now = horus.now() # float (seconds) dt = horus.dt() # float (seconds) elapsed = horus.elapsed() # float (seconds) tick = horus.tick() # int budget = horus.budget_remaining() # float (seconds, inf if no budget) rng_val = horus.rng_float() # float in [0, 1) ``` --- ## See Also - [DurationExt](/rust/api/duration-ext) — `.hz()`, `.ms()`, `.us()` helpers - [Deterministic Mode](/advanced/deterministic-mode) — SimClock for reproducible execution - [Scheduler API](/rust/api/scheduler) — Tick rate and timing configuration --- ## Actions API Path: /rust/api/actions Description: Long-running tasks with feedback, cancellation, and priority-based preemption # Actions API HORUS actions model long-running tasks with real-time feedback and cancellation. Define an action with the `action!` macro, build a server with `ActionServerBuilder`, and send goals with `ActionClientNode` or `SyncActionClient`. > **Python**: Actions are Rust-only. Python bindings are not yet available. 
**When to use Actions** - Long-running tasks that take seconds to minutes (navigation, arm motion, calibration) - Tasks that need progress feedback (distance remaining, percentage complete) - Tasks that need cancellation or preemption by higher-priority goals **When NOT to use Actions** - Quick lookups that complete in milliseconds — use [Services](/rust/api/services) instead - Continuous data streams — use [Topics](/rust/api/topic) instead - Fire-and-forget commands — use Topics instead ## Defining an Action ```rust // simplified use horus::prelude::*; action! { /// Navigate to a target pose. NavigateToPose { goal { x: f64, y: f64, theta: f64, } feedback { distance_remaining: f64, estimated_time_sec: f64, } result { success: bool, final_x: f64, final_y: f64, } } } ``` This generates four types: - `NavigateToPoseGoal` — goal struct - `NavigateToPoseFeedback` — feedback struct - `NavigateToPoseResult` — result struct - `NavigateToPose` — zero-sized marker implementing the `Action` trait --- ## Action Trait | Method | Returns | Description | |--------|---------|-------------| | `name()` | `&'static str` | Action name (used as topic prefix) | | `goal_topic()` | `String` | `"{name}.goal"` | | `cancel_topic()` | `String` | `"{name}.cancel"` | | `result_topic()` | `String` | `"{name}.result"` | | `feedback_topic()` | `String` | `"{name}.feedback"` | | `status_topic()` | `String` | `"{name}.status"` | --- ## GoalStatus Lifecycle of a goal: | Variant | Terminal? 
| Description | |---------|-----------|-------------| | `Pending` | No | Received but not yet processed | | `Active` | No | Currently executing | | `Succeeded` | Yes | Completed successfully | | `Aborted` | Yes | Failed due to server error | | `Canceled` | Yes | Canceled by client request | | `Preempted` | Yes | Preempted by higher-priority goal | | `Rejected` | Yes | Validation failed at acceptance | | Method | Returns | Description | |--------|---------|-------------| | `is_active()` | `bool` | `Pending` or `Active` | | `is_terminal()` | `bool` | Reached a final state | | `is_success()` | `bool` | `Succeeded` | | `is_failure()` | `bool` | `Aborted`, `Canceled`, `Preempted`, or `Rejected` | --- ## GoalPriority | Constant | Value | Description | |----------|-------|-------------| | `HIGHEST` | 0 | Critical goals | | `HIGH` | 64 | Above normal | | `NORMAL` | 128 | Default priority | | `LOW` | 192 | Below normal | | `LOWEST` | 255 | Background tasks | Lower value = higher priority. `is_higher_than(other)` compares priorities. 
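Since a lower number means a higher priority, comparisons are easy to get backwards. The sketch below mirrors the constants from the table in a hypothetical newtype. The `u8` representation is an assumption made for illustration, not a statement about the real `GoalPriority` layout.

```rust
/// Hypothetical stand-in mirroring the documented priority constants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GoalPriority(u8);

impl GoalPriority {
    const HIGHEST: Self = Self(0);
    const HIGH: Self = Self(64);
    const NORMAL: Self = Self(128);
    const LOW: Self = Self(192);
    const LOWEST: Self = Self(255);

    /// Lower numeric value means higher priority.
    fn is_higher_than(self, other: Self) -> bool {
        self.0 < other.0
    }
}

fn main() {
    let levels = [
        ("HIGHEST", GoalPriority::HIGHEST),
        ("HIGH", GoalPriority::HIGH),
        ("LOW", GoalPriority::LOW),
        ("LOWEST", GoalPriority::LOWEST),
    ];
    for (name, level) in levels {
        println!(
            "{name} outranks NORMAL: {}",
            level.is_higher_than(GoalPriority::NORMAL)
        );
    }
}
```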
--- ## PreemptionPolicy Controls what happens when a new goal arrives while one is active: | Variant | Description | |---------|-------------| | `RejectNew` | New goals rejected while one is active | | `PreemptOld` | New goals cancel active goals **(default)** | | `Priority` | Higher priority preempts lower priority | | `Queue { max_size }` | Queue goals up to `max_size` | --- ## GoalResponse / CancelResponse Server returns these from goal acceptance and cancel callbacks: | Type | Variants | Methods | |------|----------|---------| | `GoalResponse` | `Accept`, `Reject(String)` | `is_accepted()`, `is_rejected()`, `rejection_reason()` | | `CancelResponse` | `Accept`, `Reject(String)` | `is_accepted()`, `is_rejected()`, `rejection_reason()` | --- ## ActionServerConfig | Field | Type | Default | Description | |-------|------|---------|-------------| | `max_concurrent_goals` | `Option` | `Some(1)` | Max simultaneous goals (`None` = unlimited) | | `feedback_rate_hz` | `f64` | `10.0` | Rate limit for feedback publishing | | `goal_timeout` | `Option` | `None` | Auto-abort after timeout | | `preemption_policy` | `PreemptionPolicy` | `PreemptOld` | How to handle competing goals | | `result_history_size` | `usize` | `100` | How many results to keep in history | Builder methods: `new()`, `unlimited_goals()`, `max_goals(n)`, `feedback_rate(hz)`, `timeout(dur)`, `preemption(policy)`, `history_size(n)`. 
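The four preemption policies can be read as a small decision function over "is a goal active?", the two priorities, and the queue length. The sketch below is one plausible reading, not the server's actual logic: the `Decision` enum and `decide` function are hypothetical, and rejecting a non-higher-priority goal under `Priority` is an assumption the table does not spell out.

```rust
/// Hypothetical outcome of offering a new goal to a busy or idle server.
#[derive(Debug, PartialEq)]
enum Decision {
    Accept,
    PreemptActive,
    Enqueue,
    Reject,
}

/// Local mirror of the documented policy variants.
enum PreemptionPolicy {
    RejectNew,
    PreemptOld,
    Priority,
    Queue { max_size: usize },
}

/// `active` is the priority of the running goal, if any (lower value = higher priority).
fn decide(policy: &PreemptionPolicy, active: Option<u8>, new: u8, queued: usize) -> Decision {
    let active_priority = match active {
        None => return Decision::Accept, // idle server: nothing to preempt
        Some(p) => p,
    };
    match policy {
        PreemptionPolicy::RejectNew => Decision::Reject,
        PreemptionPolicy::PreemptOld => Decision::PreemptActive,
        // Assumption: a goal that is not strictly higher priority is rejected.
        PreemptionPolicy::Priority if new < active_priority => Decision::PreemptActive,
        PreemptionPolicy::Priority => Decision::Reject,
        PreemptionPolicy::Queue { max_size } if queued < *max_size => Decision::Enqueue,
        PreemptionPolicy::Queue { .. } => Decision::Reject,
    }
}

fn main() {
    let policies = [
        ("RejectNew", PreemptionPolicy::RejectNew),
        ("PreemptOld", PreemptionPolicy::PreemptOld),
        ("Priority", PreemptionPolicy::Priority),
        ("Queue(2)", PreemptionPolicy::Queue { max_size: 2 }),
    ];
    // A HIGH (64) goal arrives while a NORMAL (128) goal is running, one goal queued.
    for (name, policy) in &policies {
        println!("{name}: {:?}", decide(policy, Some(128), 64, 1));
    }
}
```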
--- ## ActionError | Variant | Description | |---------|-------------| | `GoalRejected(String)` | Server rejected the goal | | `GoalCanceled` | Goal was canceled | | `GoalPreempted` | Goal was preempted by higher priority | | `GoalTimeout` | Goal timed out | | `ServerUnavailable` | No action server running | | `CommunicationError(String)` | Topic I/O failure | | `ExecutionError(String)` | Server execution error | | `InvalidGoal(String)` | Goal validation failed | | `GoalNotFound(GoalId)` | Goal ID not recognized | --- ## ActionServerBuilder | Method | Returns | Description | |--------|---------|-------------| | `ActionServerBuilder::::new()` | `Self` | Create a new builder | | `on_goal(callback)` | `Self` | Goal acceptance callback (`Fn(&Goal) -> GoalResponse`) | | `on_cancel(callback)` | `Self` | Cancel request callback (`Fn(GoalId) -> CancelResponse`) | | `on_execute(callback)` | `Self` | Execution callback (`Fn(ServerGoalHandle) -> GoalOutcome`) | | `with_config(config)` | `Self` | Apply full `ActionServerConfig` | | `max_concurrent_goals(max)` | `Self` | Shorthand for max concurrent goals | | `feedback_rate(rate_hz)` | `Self` | Shorthand for feedback rate | | `goal_timeout(timeout)` | `Self` | Shorthand for goal timeout | | `preemption_policy(policy)` | `Self` | Shorthand for preemption policy | | `build()` | `ActionServerNode` | Build the server node | ### ActionServerNode Implements `Node` — add it to a `Scheduler` to process goals. 
| Method | Returns | Description | |--------|---------|-------------| | `builder()` | `ActionServerBuilder` | Create a new builder | | `metrics()` | `ActionServerMetrics` | Get server metrics snapshot | ### ActionServerMetrics | Field | Type | Description | |-------|------|-------------| | `goals_received` | `u64` | Total goals received | | `goals_accepted` | `u64` | Goals that passed acceptance | | `goals_rejected` | `u64` | Goals rejected by `on_goal` | | `goals_succeeded` | `u64` | Successfully completed | | `goals_aborted` | `u64` | Aborted by server | | `goals_canceled` | `u64` | Canceled by client | | `goals_preempted` | `u64` | Preempted by higher priority | | `active_goals` | `usize` | Currently executing | | `queued_goals` | `usize` | Waiting in queue | --- ## ServerGoalHandle Handle passed to the `on_execute` callback. Use it to publish feedback and finalize the goal. ### Query Methods | Method | Returns | Description | |--------|---------|-------------| | `goal_id()` | `GoalId` | Unique goal identifier | | `goal()` | `&A::Goal` | The goal data | | `priority()` | `GoalPriority` | Goal priority level | | `status()` | `GoalStatus` | Current status | | `elapsed()` | `Duration` | Time since execution started | | `is_cancel_requested()` | `bool` | Client requested cancellation | | `is_preempt_requested()` | `bool` | Higher-priority goal wants to preempt | | `should_abort()` | `bool` | `true` if canceled or preempted — check this in loops | ### Action Methods | Method | Returns | Description | |--------|---------|-------------| | `publish_feedback(feedback)` | `()` | Send feedback to client (rate-limited) | | `succeed(result)` | `GoalOutcome` | Complete successfully | | `abort(result)` | `GoalOutcome` | Abort with error | | `canceled(result)` | `GoalOutcome` | Acknowledge cancellation | | `preempted(result)` | `GoalOutcome` | Acknowledge preemption | #### publish_feedback ```rust // simplified pub fn publish_feedback(&self, feedback: A::Feedback) ``` Send 
progress feedback to the client. Rate-limited by `ActionServerConfig::feedback_rate_hz` (default: 10Hz). Extra calls within the rate window are silently dropped. **Example:** ```rust // simplified handle.publish_feedback(NavFeedback { distance_remaining: 3.2, progress: 0.65, }); ``` #### succeed ```rust // simplified pub fn succeed(self, result: A::Result) -> GoalOutcome ``` Complete the goal successfully. Consumes the handle — no further operations possible. **Example:** ```rust // simplified return handle.succeed(MoveArmResult { final_position: current_pos }); ``` #### should_abort ```rust // simplified pub fn should_abort(&self) -> bool ``` Returns `true` if the client has requested cancellation or a higher-priority goal wants to preempt. **Check this in your execute loop:** **Example:** ```rust // simplified loop { if handle.should_abort() { return handle.canceled(partial_result); } // ... do work ... handle.publish_feedback(progress); } ``` ### GoalOutcome | Variant | Description | |---------|-------------| | `Succeeded(A::Result)` | Goal completed successfully | | `Aborted(A::Result)` | Server aborted execution | | `Canceled(A::Result)` | Client canceled | | `Preempted(A::Result)` | Preempted by higher priority | Methods: `status()` returns `GoalStatus`, `into_result()` extracts the result. --- ## Server Example ```rust // simplified use horus::prelude::*; action! 
{ MoveArm { goal { target_x: f64, target_y: f64, target_z: f64 } feedback { progress: f64, current_x: f64, current_y: f64, current_z: f64 } result { success: bool, final_x: f64, final_y: f64, final_z: f64 } } } let server = ActionServerNode::::builder() .on_goal(|goal| { // Validate the goal if goal.target_z < 0.0 { GoalResponse::Reject("Z must be non-negative".into()) } else { GoalResponse::Accept } }) .on_cancel(|_goal_id| CancelResponse::Accept) .on_execute(|handle| { let goal = handle.goal(); let mut progress = 0.0; while progress < 1.0 { // Check for cancellation if handle.should_abort() { return handle.canceled(MoveArmResult { success: false, final_x: 0.0, final_y: 0.0, final_z: 0.0, }); } progress += 0.01; handle.publish_feedback(MoveArmFeedback { progress, current_x: goal.target_x * progress, current_y: goal.target_y * progress, current_z: goal.target_z * progress, }); std::thread::sleep(10_u64.ms()); } handle.succeed(MoveArmResult { success: true, final_x: goal.target_x, final_y: goal.target_y, final_z: goal.target_z, }) }) .preemption_policy(PreemptionPolicy::Priority) .goal_timeout(30_u64.secs()) .build(); let mut scheduler = Scheduler::new(); scheduler.add(server).order(0).build(); ``` --- ## ActionClientBuilder | Method | Returns | Description | |--------|---------|-------------| | `ActionClientBuilder::::new()` | `Self` | Create a new builder | | `on_feedback(callback)` | `Self` | Feedback callback (`Fn(GoalId, &Feedback)`) | | `on_result(callback)` | `Self` | Result callback (`Fn(GoalId, GoalStatus, &Result)`) | | `on_status(callback)` | `Self` | Status change callback (`Fn(GoalId, GoalStatus)`) | | `build()` | `ActionClientNode` | Build the client node | ### ActionClientNode Implements `Node` — add it to a `Scheduler` alongside the server. 
| Method | Returns | Description |
|--------|---------|-------------|
| `builder()` | `ActionClientBuilder` | Create a new builder |
| `send_goal(goal)` | `Result<ClientGoalHandle<A>, ActionError>` | Send with `NORMAL` priority |
| `send_goal_with_priority(goal, priority)` | `Result<ClientGoalHandle<A>, ActionError>` | Send with specific priority |
| `cancel_goal(goal_id)` | `()` | Request cancellation |

#### send_goal

Sends a goal to the action server with default (`NORMAL`) priority.

**Signature**

```rust
// simplified
pub fn send_goal(&self, goal: A::Goal) -> Result<ClientGoalHandle<A>, ActionError>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `goal` | `A::Goal` | yes | The goal data. Type defined by the `goal {}` block of the `action!` macro. |

**Returns** `Result<ClientGoalHandle<A>, ActionError>`: a handle to monitor progress, get results, or cancel.

**Errors**

| Error | Condition |
|-------|-----------|
| `ActionError::ServerUnavailable` | No action server registered |
| `ActionError::GoalRejected` | Server's `on_goal` callback rejected the goal |

**Example**

```rust
// simplified
use horus::prelude::*;

let handle = client.send_goal(NavigateToPoseGoal { x: 1.0, y: 2.0, theta: 0.0 })?;
// Monitor with handle.status(), handle.result(), handle.last_feedback()
```

#### send_goal_with_priority

Sends a goal with an explicit priority. Higher-priority goals can preempt lower-priority ones.

**Signature**

```rust
// simplified
pub fn send_goal_with_priority(&self, goal: A::Goal, priority: GoalPriority) -> Result<ClientGoalHandle<A>, ActionError>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `goal` | `A::Goal` | yes | The goal data. |
| `priority` | `GoalPriority` | yes | `GoalPriority::LOW`, `NORMAL`, `HIGH`, or `HIGHEST`. |

**Behavior** If the server's `PreemptionPolicy` allows it, a higher-priority goal preempts the currently executing goal. The preempted goal's `is_preempt_requested()` then returns `true`.
| `goal_status(goal_id)` | `Option` | Query goal status | | `active_goals()` | `Vec` | All active goal IDs | | `active_goal_count()` | `usize` | Number of active goals | | `metrics()` | `ActionClientMetrics` | Get client metrics | ### ActionClientMetrics | Field | Type | Description | |-------|------|-------------| | `goals_sent` | `u64` | Total goals sent | | `goals_succeeded` | `u64` | Successfully completed | | `goals_failed` | `u64` | Aborted, canceled, preempted, or rejected | | `cancels_sent` | `u64` | Cancel requests sent | | `active_goals` | `usize` | Currently active | --- ## ClientGoalHandle Handle returned by `send_goal()`. Use it to monitor progress and get results. ### Query Methods | Method | Returns | Description | |--------|---------|-------------| | `goal_id()` | `GoalId` | Unique goal identifier | | `priority()` | `GoalPriority` | Goal priority level | | `status()` | `GoalStatus` | Current status | | `is_active()` | `bool` | Still executing | | `is_done()` | `bool` | Reached terminal state | | `is_success()` | `bool` | Completed successfully | | `elapsed()` | `Duration` | Time since goal was sent | | `time_since_update()` | `Duration` | Time since last status change | | `result()` | `Option` | Get result if completed | | `last_feedback()` | `Option` | Most recent feedback | ### Blocking Methods | Method | Returns | Description | |--------|---------|-------------| | `await_result(timeout)` | `Option` | Block until result or timeout | | `await_result_with_feedback(timeout, callback)` | `Result` | Block with feedback callback | | `cancel()` | `()` | Send cancel request to server | --- ## SyncActionClient (Blocking) Standalone blocking client — does not need a `Scheduler`. 
| Method | Returns | Description |
|--------|---------|-------------|
| `SyncActionClient::<A>::new()` | `Result<Self>` | Create and initialize |
| `send_goal_and_wait(goal, timeout)` | `Result<A::Result, ActionError>` | Send goal, block until result |
| `send_goal_and_wait_with_feedback(goal, timeout, callback)` | `Result<A::Result, ActionError>` | Block with feedback |
| `cancel_goal(goal_id)` | `()` | Request cancellation |

Type alias: `ActionClient<A> = SyncActionClient<A>`

#### send_goal_and_wait

```rust
// simplified
pub fn send_goal_and_wait(&self, goal: A::Goal, timeout: Duration) -> Result<A::Result, ActionError>
```

Send a goal and block until the server completes it or the timeout elapses. This is the simplest way to execute an action.

**Parameters:**

- `goal: A::Goal`: the goal to send
- `timeout: Duration`: maximum wait time

**Returns:** `Ok(result)` on success, `Err(ActionError::GoalTimeout)` on timeout, `Err(ActionError::GoalRejected)` if the server rejects the goal.

**Example:**

```rust
// simplified
let result = client.send_goal_and_wait(
    MoveArmGoal { target_x: 0.5, target_y: 0.3, target_z: 0.1 },
    10_u64.secs(),
)?;
println!("Arm reached ({:.1}, {:.1}, {:.1})", result.final_x, result.final_y, result.final_z);
```

#### send_goal_and_wait_with_feedback

```rust
// simplified
pub fn send_goal_and_wait_with_feedback(
    &self,
    goal: A::Goal,
    timeout: Duration,
    feedback_cb: impl Fn(GoalId, &A::Feedback),
) -> Result<A::Result, ActionError>
```

Like `send_goal_and_wait`, but calls `feedback_cb` whenever the server publishes feedback.
**Example:**

```rust
// simplified
let result = client.send_goal_and_wait_with_feedback(
    nav_goal,
    30_u64.secs(),
    |_id, feedback| {
        println!("Progress: {:.0}%, distance: {:.1}m",
            feedback.progress * 100.0, feedback.distance_remaining);
    },
)?;
```

### Example

```rust
// simplified
use horus::prelude::*;

// Simple blocking usage (no scheduler needed)
let client = SyncActionClient::::new()?;
let result = client.send_goal_and_wait_with_feedback(
    MoveArmGoal { target_x: 1.0, target_y: 2.0, target_z: 0.5 },
    30_u64.secs(),
    // The callback receives the goal ID and a reference to the feedback
    |_id, feedback| {
        println!("Progress: {:.0}%", feedback.progress * 100.0);
    },
)?;

if result.success {
    println!("Arm reached ({:.1}, {:.1}, {:.1})",
        result.final_x, result.final_y, result.final_z);
}
```

---

## GoalId

Unique identifier for each goal (`Uuid`-backed).

| Method | Returns | Description |
|--------|---------|-------------|
| `GoalId::new()` | `Self` | Generate a new unique ID |
| `GoalId::from_uuid(uuid)` | `Self` | Create from existing UUID |
| `as_uuid()` | `&Uuid` | Get underlying UUID |

---

## Wire Types

> **Advanced:** These types carry action data through topics internally. Most users interact with `ServerGoalHandle` and `ClientGoalHandle` instead.

| Type | Fields | Description | |------|--------|-------------| | `GoalRequest` | `goal_id`, `goal`, `priority`, `timestamp` | Sent from client to server when a goal is submitted | | `CancelRequest` | `goal_id`, `timestamp` | Sent from client to server to cancel a goal | | `ActionResult` | `goal_id`, `status`, `result`, `timestamp` | Sent from server to client with the final result | | `ActionFeedback` | `goal_id`, `feedback`, `timestamp` | Sent from server to client during execution | | `GoalStatusUpdate` | `goal_id`, `status`, `timestamp` | Status change notification | `ActionResult` (the struct) has convenience constructors: ```rust // simplified ActionResult::succeeded(goal_id, result) ActionResult::aborted(goal_id, result) ActionResult::canceled(goal_id, result) ActionResult::preempted(goal_id, result) ``` ## Callback Types Type aliases for the callback signatures used by builders: ```rust // simplified // Server callbacks (passed to ActionServerBuilder) type GoalCallback = Box::Goal) -> GoalResponse + Send + Sync>; type CancelCallback = Box CancelResponse + Send + Sync>; type ExecuteCallback = Box) -> GoalOutcome + Send + Sync>; // Client callbacks (passed to ActionClientBuilder) type FeedbackCallback = Box::Feedback) + Send + Sync>; type ResultCallback = Box::Result) + Send + Sync>; type StatusCallback = Box; ``` You never construct these directly — they're inferred from the closures you pass to `.on_goal()`, `.on_execute()`, `.on_feedback()`, etc. 
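
To make the inference step concrete, here is a minimal, self-contained sketch of how a builder method can accept a plain closure and box it into an alias of this shape. `SketchGoal`, `SketchResponse`, `SketchGoalCallback`, and `SketchBuilder` are hypothetical stand-ins invented for illustration; they are not HORUS types, and the real builders may be structured differently.

```rust
// Hypothetical stand-ins for illustration only; these are not HORUS types.
#[derive(Debug)]
struct SketchGoal {
    distance_m: f64,
}

#[derive(Debug, PartialEq)]
enum SketchResponse {
    Accept,
    Reject,
}

// A boxed, thread-safe callback alias with the same general shape as the aliases above.
type SketchGoalCallback = Box<dyn Fn(&SketchGoal) -> SketchResponse + Send + Sync>;

#[derive(Default)]
struct SketchBuilder {
    on_goal: Option<SketchGoalCallback>,
}

impl SketchBuilder {
    // Accepts any matching closure and boxes it, so callers never name the alias.
    fn on_goal<F>(mut self, f: F) -> Self
    where
        F: Fn(&SketchGoal) -> SketchResponse + Send + Sync + 'static,
    {
        self.on_goal = Some(Box::new(f));
        self
    }

    // Runs the stored callback; accepts by default when none was registered.
    fn decide(&self, goal: &SketchGoal) -> SketchResponse {
        match &self.on_goal {
            Some(cb) => cb(goal),
            None => SketchResponse::Accept,
        }
    }
}

fn main() {
    let max_distance_m = 10.0;
    let builder = SketchBuilder::default().on_goal(move |goal| {
        if goal.distance_m <= max_distance_m {
            SketchResponse::Accept
        } else {
            SketchResponse::Reject
        }
    });

    println!("{:?}", builder.decide(&SketchGoal { distance_m: 3.0 }));
    println!("{:?}", builder.decide(&SketchGoal { distance_m: 42.0 }));
}
```

The `Send + Sync + 'static` bounds are what allow the boxed closure to be stored and invoked from another thread, which is why captured values are moved into the closure here.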
--- ## See Also - [Actions Concepts](/concepts/actions) — Architecture and lifecycle design - [Services API](/rust/api/services) — Synchronous request/response RPC - [Topic API](/rust/api/topic) — Streaming pub/sub communication - [Scheduler API](/rust/api/scheduler) — Node execution orchestrator --- ## Rate & Stopwatch Path: /rust/api/rate-stopwatch Description: Fixed-frequency rate limiting and elapsed time measurement for background threads and performance profiling # Rate & Stopwatch Two timing utilities exported from `horus::prelude::*` for use outside the scheduler's tick loop. ```rust // simplified use horus::prelude::*; ``` ## Rate — Fixed-Frequency Loop `Rate` is the HORUS equivalent of ROS2's `rclcpp::Rate`. Use it for standalone threads that need to run at a target frequency without a scheduler. ```rust // simplified use horus::prelude::*; // Hardware polling thread at 100 Hz std::thread::spawn(|| { let mut rate = Rate::new(100.0); loop { let reading = read_sensor(); process(reading); rate.sleep(); // Sleeps the remaining fraction of 10ms } }); ``` ### How It Works `rate.sleep()` calculates how much time remains in the current period and sleeps for that duration. If work took longer than the period, sleep is skipped and the next cycle catches up — no drift accumulation. ### API | Method | Returns | Description | |---|---|---| | `Rate::new(hz)` | `Rate` | Create a rate limiter at `hz` Hz. 
Panics if `hz <= 0` | | `.sleep()` | `()` | Sleep for the remainder of the current period | | `.actual_hz()` | `f64` | Exponentially smoothed actual frequency | | `.target_hz()` | `f64` | The target frequency in Hz | | `.period()` | `Duration` | The target period (1/hz) | | `.reset()` | `()` | Reset cycle start to now (use after a long pause) | | `.is_late()` | `bool` | Whether the current cycle exceeded the target period | ### Example: Hardware Driver Thread ```rust // simplified use horus::prelude::*; struct CanBusReader { topic: Topic, } impl CanBusReader { fn run(&mut self) { let mut rate = Rate::new(500.0); // 500 Hz CAN bus polling loop { if let Some(frame) = self.read_can_frame() { let cmd = MotorCommand::from_can(frame); self.topic.send(cmd); } if rate.is_late() { hlog!(warn, "CAN polling late — actual {:.0} Hz", rate.actual_hz()); } rate.sleep(); } } } ``` ### When to Use Rate vs Scheduler | Scenario | Use | |---|---| | Node with `tick()` in a scheduler | Scheduler handles timing — don't use Rate | | Background thread polling hardware | `Rate` | | Standalone process (no scheduler) | `Rate` | | One-off timed loop in a test | `Rate` | ## Stopwatch — Elapsed Time `Stopwatch` measures elapsed time with lap support. Useful for profiling operations inside nodes. 
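
As a rough mental model of lap timing, the sketch below builds a tiny stopwatch on `std::time::Instant`. `SketchStopwatch` is a hypothetical stand-in written for this illustration; it is an assumption about the general technique, not the HORUS implementation.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Hypothetical stand-in for illustration; not the HORUS `Stopwatch`.
struct SketchStopwatch {
    start: Instant,
}

impl SketchStopwatch {
    // Create the stopwatch and start timing immediately.
    fn start() -> Self {
        Self { start: Instant::now() }
    }

    // Time since start or since the last lap, without resetting.
    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }

    // Return the time since the last split and restart from now.
    fn lap(&mut self) -> Duration {
        let now = Instant::now();
        let split = now.duration_since(self.start);
        self.start = now;
        split
    }
}

fn main() {
    let mut sw = SketchStopwatch::start();

    sleep(Duration::from_millis(20));
    let first = sw.lap(); // covers the 20 ms sleep, then resets

    sleep(Duration::from_millis(5));
    let second = sw.elapsed(); // measured from the lap, not from the original start

    println!("first={:?} second={:?}", first, second);
}
```

Because a lap resets the start point, a later `elapsed()` covers only the work done since that lap; add the two durations to get a total. The HORUS `Stopwatch` is used the same way: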
```rust
// simplified
use horus::prelude::*;

let mut sw = Stopwatch::start();
expensive_computation();
hlog!(debug, "computation took {:.2} ms", sw.elapsed_ms());
```

### API

| Method | Returns | Description |
|---|---|---|
| `Stopwatch::start()` | `Stopwatch` | Create and start immediately |
| `.elapsed()` | `Duration` | Time since start (or last reset) |
| `.elapsed_us()` | `u64` | Elapsed microseconds |
| `.elapsed_ms()` | `f64` | Elapsed milliseconds (fractional) |
| `.lap()` | `Duration` | Return elapsed and reset (for split timing) |
| `.reset()` | `()` | Reset start time to now |

### Example: Profiling a Node

```rust
// simplified
use horus::prelude::*;

struct PlannerNode {
    scan_sub: Topic,
    path_pub: Topic,
}

impl Node for PlannerNode {
    fn name(&self) -> &str { "Planner" }

    fn tick(&mut self) {
        if let Some(scan) = self.scan_sub.recv() {
            let mut sw = Stopwatch::start();

            let path = self.compute_path(&scan);
            let plan_time = sw.lap(); // planning time; resets the stopwatch

            self.path_pub.send(path);
            let publish_time = sw.elapsed(); // time since the lap, i.e. the send only

            hlog!(debug, "plan={:.1}ms total={:.1}ms",
                plan_time.as_secs_f64() * 1000.0,
                (plan_time + publish_time).as_secs_f64() * 1000.0);
        }
    }
}
```

### Example: Multi-Lap Benchmarking

```rust
// simplified
let mut sw = Stopwatch::start();

let data = load_model();
let load_time = sw.lap();

let result = run_inference(&data);
let infer_time = sw.lap();

save_result(&result);
let save_time = sw.lap();

hlog!(info, "load={:.1}ms infer={:.1}ms save={:.1}ms",
    load_time.as_secs_f64() * 1000.0,
    infer_time.as_secs_f64() * 1000.0,
    save_time.as_secs_f64() * 1000.0);
```

## See Also

- **[Time API](/rust/time-api)**: `horus::now()`, `horus::dt()`, `horus::elapsed()` for scheduler-aware time
- **[DurationExt](/rust/api/duration-ext)**: `100_u64.hz()`, `200_u64.us()` ergonomic helpers
- **[Scheduler](/rust/api/scheduler)**: built-in rate control via `.rate()` and `.tick_rate()`

---

## Error Types

Path: /rust/api/error-types
Description: Structured error handling: HorusError variants, pattern
matching, severity, and retry utilities # Error Types Every error in HORUS is structured and pattern-matchable. Import the short aliases from the prelude: ```rust // simplified use horus::prelude::*; // Result = std::result::Result // Error = HorusError fn my_function() -> Result<()> { let topic: Topic = Topic::new("sensor")?; Ok(()) } ``` ## Quick Reference | Prelude Name | Full Name | What it is | |---|---|---| | `Result` | `HorusResult` | `std::result::Result` | | `Error` | `HorusError` | The umbrella error enum | **Always use `Result` and `Error`** — never write `HorusResult` or `HorusError` directly. ## HorusError Variants `HorusError` is a `#[non_exhaustive]` enum with 13 variants. Each wraps a domain-specific sub-error: | Variant | Sub-error Type | Domain | Common Source | |---|---|---|---| | `Io(…)` | `std::io::Error` | File/system I/O | `std::fs::read()` | | `Config(…)` | `ConfigError` | Configuration | `horus.toml` parsing | | `Communication(…)` | `CommunicationError` | IPC, topics | `Topic::new()` | | `Node(…)` | `NodeError` | Node lifecycle | `init()`, `tick()` panics | | `Memory(…)` | `MemoryError` | SHM, tensors | `Image::new()`, pool exhaustion | | `Serialization(…)` | `SerializationError` | JSON, YAML, TOML | Config/message parsing | | `NotFound(…)` | `NotFoundError` | Missing resources | Frame/topic/node lookup | | `Resource(…)` | `ResourceError` | Resource lifecycle | Duplicate names, permissions | | `InvalidInput(…)` | `ValidationError` | Input validation | Parameter bounds, format | | `Parse(…)` | `ParseError` | Type parsing | Integer, float, bool from strings | | `InvalidDescriptor(…)` | `String` | Tensor integrity | Cross-process tensor validation | | `Transform(…)` | `TransformError` | Coordinate frames | Extrapolation, stale data | | `Timeout(…)` | `TimeoutError` | Timeouts | Service calls, resource waits | ## Pattern Matching Match on specific error variants to handle them differently: ```rust // simplified use horus::prelude::*; fn 
handle_topic_error(err: Error) { match err { HorusError::Communication(CommunicationError::TopicFull { topic }) => { // Back-pressure: subscriber is slower than publisher hlog!(warn, "Topic '{}' full, dropping message", topic); } HorusError::Communication(CommunicationError::TopicNotFound { topic }) => { // Topic doesn't exist yet hlog!(error, "Topic '{}' not found — is the publisher running?", topic); } HorusError::Memory(MemoryError::PoolExhausted { reason }) => { // Image/PointCloud pool has no free slots hlog!(warn, "Pool exhausted: {} — waiting for consumers", reason); } HorusError::Transform(TransformError::Stale { frame, age, threshold }) => { // Transform data is too old hlog!(warn, "Frame '{}' stale: {:?} > {:?}", frame, age, threshold); } other => { hlog!(error, "Unexpected error: {}", other); // Check for actionable hints if let Some(hint) = other.help() { hlog!(info, " hint: {}", hint); } } } } ``` ## Error Hints Every `HorusError` has a `.help()` method that returns an actionable remediation hint: ```rust // simplified if let Err(e) = Topic::::new("sensor") { eprintln!("error: {}", e); if let Some(hint) = e.help() { eprintln!(" hint: {}", hint); } } ``` Example output: ``` error: Failed to create topic 'sensor': permission denied hint: Check shared memory permissions and available space. 
Run: horus clean --shm ``` ## Severity Every error has a `Severity` classification used by the scheduler for automatic recovery: | Severity | Meaning | Scheduler Action | |---|---|---| | `Transient` | May resolve on retry (back-pressure, timeout) | Retry | | `Permanent` | Won't succeed but system can continue | Skip, log warning | | `Fatal` | Data integrity compromised, unrecoverable | Stop node or scheduler | ```rust // simplified match err.severity() { Severity::Transient => { /* retry */ } Severity::Permanent => { hlog!(warn, "{}", err); } Severity::Fatal => { /* emergency stop */ } } ``` ## Adding Context Use the `HorusContext` trait to wrap foreign errors with descriptive context: ```rust // simplified use horus::prelude::*; fn load_sensor_config(path: &str) -> Result { // .horus_context() wraps std::io::Error with a message let data = std::fs::read_to_string(path) .horus_context(format!("reading sensor config '{}'", path))?; Ok(data) } fn load_calibration(path: &str) -> Result { // .horus_context_with() is lazy — closure only called on Err std::fs::read_to_string(path) .horus_context_with(|| format!("reading calibration '{}'", path)) } ``` The context is preserved in the error chain and displayed as: ``` reading sensor config 'sensors.yaml' Caused by: No such file or directory (os error 2) ``` ## Retry Utility `retry_transient` automatically retries operations that return transient errors: ```rust // simplified use horus::prelude::*; let config = RetryConfig { max_retries: 3, base_delay: 100_u64.ms(), ..Default::default() }; let result = retry_transient(&config, || { connect_to_sensor() }); ``` ## Common Error Scenarios | You're doing | Error you'll see | What to do | |---|---|---| | `Topic::new("name")` | `CommunicationError::TopicCreationFailed` | Check SHM permissions, disk space | | `scheduler.run()` | `NodeError::InitPanic` | Fix the node's `init()` method | | `Image::new(w, h, enc)` | `MemoryError::PoolExhausted` | Consumers aren't dropping images fast 
enough | | `tf.tf("a", "b")` | `TransformError::Extrapolation` | Increase history buffer or check timestamp | | `client.call(req, timeout)` | `TimeoutError` | Server not running or overloaded | | `params.set("key", val)` | `ValidationError::OutOfRange` | Value outside configured min/max | ## Sub-Error Details ### CommunicationError ```rust // simplified TopicFull { topic } // Ring buffer full TopicNotFound { topic } // No such topic TopicCreationFailed { topic, reason } // SHM setup failed NetworkFault { peer, reason } // Peer unreachable ActionFailed { reason } // Action system error ``` ### NotFoundError ```rust // simplified Frame { name } // TransformFrame lookup Topic { name } // Topic lookup Node { name } // Node lookup Service { name } // Service lookup Action { name } // Action lookup Parameter { name } // RuntimeParams lookup ``` ### ValidationError ```rust // simplified OutOfRange { field, min, max, actual } // Value outside bounds InvalidFormat { field, expected_format, actual } // Wrong format InvalidEnum { field, valid_options, actual } // Not an allowed value MissingRequired { field } // Required field absent ConstraintViolation { field, constraint } // Custom constraint Conflict { field_a, field_b, reason } // Two fields conflict ``` ### ConfigError ```rust // simplified ParseFailed { format, reason, source } // Failed to parse config file MissingField { field, context } // Required field missing ValidationFailed { field, expected, actual } // Value doesn't match constraint InvalidValue { key, reason } // Invalid configuration value Other(String) // General config error ``` ### SerializationError ```rust // simplified Json { source: serde_json::Error } // JSON parse/emit failure Yaml { source: serde_yaml::Error } // YAML parse/emit failure Toml { source: toml::ser::Error } // TOML parse/emit failure Other { format, reason } // Other format error ``` ### MemoryError ```rust // simplified PoolExhausted { reason } // Tensor pool out of slots 
AllocationFailed { reason } // Memory allocation failed ShmCreateFailed { path, reason } // Shared memory region creation failed MmapFailed { reason } // Memory mapping failed DLPackImportFailed { reason } // DLPack tensor import failed OffsetOverflow // Tensor offset exceeds region ``` ### NodeError ```rust // simplified InitPanic { node } // Node panicked during init() ReInitPanic { node } // Node panicked during re-init ShutdownPanic { node } // Node panicked during shutdown() InitFailed { node, reason } // init() returned Err TickFailed { node, reason } // Tick error Other { node, message } // General node error ``` ### ResourceError ```rust // simplified AlreadyExists { resource_type, name } // Resource already registered PermissionDenied { resource, required_permission } // Insufficient permissions Unsupported { feature, reason } // Feature not available on this platform ``` ### ParseError ```rust // simplified Int { input, source: ParseIntError } // Integer parsing failed Float { input, source: ParseFloatError } // Float parsing failed Bool { input, source: ParseBoolError } // Boolean parsing failed Custom { type_name, input, reason } // Custom type parse failed ``` ### TransformError ```rust // simplified Extrapolation { frame, requested_ns, oldest_ns, newest_ns } // Out of buffer range Stale { frame, age, threshold } // Transform too old ``` ### TimeoutError ```rust // simplified TimeoutError { resource, elapsed, deadline } // Operation timed out ``` ### Internal & Contextual Variants Two special variants for framework-internal errors: ```rust // simplified // Internal error with source location (use horus_internal!() macro) Internal { message: String, file: &'static str, line: u32 } // Error with preserved source chain (use .horus_context() on Results) Contextual { message: String, source: Box } ``` ```rust // simplified // Adding context to errors let device = Device::open("/dev/ttyUSB0") .horus_context("opening motor controller serial port")?; ``` ## 
See Also - **[Error Handling Guide](/development/error-handling)** — high-level patterns and best practices - **[Node Trait](/concepts/core-concepts-nodes)** — lifecycle methods that return `Result` - **[Services](/rust/api/services)** — `ServiceError` for RPC failures - **[Actions](/rust/api/actions)** — `ActionError` for long-running task failures --- ## TransformFrame API Path: /rust/api/transform-frame Description: Lock-free coordinate frame tree with sub-microsecond transform lookups # TransformFrame API `TransformFrame` manages coordinate frame trees for real-time robotics — the spatial relationships between your robot's base, sensors, and the world. All lookups are lock-free and wait-free. Register frames with `FrameBuilder`, query transforms with `TransformQuery`, and update dynamic frames at sensor rates. > New to coordinate frames? See [TransformFrame Concepts](/concepts/transform-frame) for an introduction. ### Quick Reference | Method | Returns | Description | |--------|---------|-------------| | `TransformFrame::new(config)` | `Result` | Create a new transform frame tree | | `.register_frame(name, parent)` | `Result<()>` | Add a frame to the tree | | `.update_transform(frame, transform, timestamp)` | `Result<()>` | Update a dynamic frame's transform | | `.tf(from, to)` | `Result` | Look up transform between two frames | | `.tf_at(from, to, time)` | `Result` | Look up transform at a specific time | | `.is_stale(frame, max_age)` | `bool` | Check if a frame's data is too old | | `.frame_exists(name)` | `bool` | Check if a frame is registered | | `.diagnostics()` | `FrameDiagnostics` | Get tree statistics and health | ## Creating a TransformFrame ```rust // simplified use horus::prelude::*; // Default: 256 frames, 128 static, 32-entry history let tf = TransformFrame::new(); // Presets let tf = TransformFrame::small(); // 256 frames (~550KB) let tf = TransformFrame::medium(); // 1024 frames (~2.2MB) let tf = TransformFrame::large(); // 4096 frames (~9MB) 
let tf = TransformFrame::massive(); // 16384 frames (~35MB) // Custom let tf = TransformFrame::with_config( TransformFrameConfig::custom() .max_frames(512) .history_len(64) .enable_overflow(false) // Hard RT: no heap fallback .build()? ); ``` | Constructor | Description | |-------------|-------------| | `TransformFrame::new()` | Default (256 frames, small preset) | | `TransformFrame::small()` | 256 frames, embedded/single robot | | `TransformFrame::medium()` | 1024 frames, complex/multi-robot | | `TransformFrame::large()` | 4096 frames, simulations | | `TransformFrame::massive()` | 16384 frames, 100+ robots | | `TransformFrame::with_config(config)` | Custom configuration | --- ## TransformFrameConfig | Field | Type | Default | Description | |-------|------|---------|-------------| | `max_frames` | `usize` | `256` | Maximum frames (16–65536) | | `max_static_frames` | `usize` | `max_frames/2` | Static frame slots | | `history_len` | `usize` | `32` | History buffer per dynamic frame (4–256) | | `enable_overflow` | `bool` | `true` | Fallback to HashMap if capacity exceeded | | `chain_cache_size` | `usize` | `64` | Cache for transform chain lookups | ### Presets | Preset | Capacity | Static | Memory | Use Case | |--------|----------|--------|--------|----------| | `small()` | 256 | 128 | ~550KB | Single robots, embedded | | `medium()` | 1024 | 512 | ~2.2MB | Complex robots, multi-robot | | `large()` | 4096 | 2048 | ~9MB | Multi-robot simulations | | `massive()` | 16384 | 8192 | ~35MB | Large-scale simulations (100+ robots) | | `unlimited()` | 4096 | 2048 | 35MB+ | Dynamic/unpredictable counts (no RT guarantee) | | `rt_fixed(max_frames)` | custom | max_frames/2 | variable | Hard real-time (no overflow) | ### TransformFrameConfigBuilder ```rust // simplified let config = TransformFrameConfig::custom() .max_frames(512) .max_static_frames(256) .history_len(64) .enable_overflow(false) .chain_cache_size(128) .build()?; println!("Memory: {}", config.memory_estimate()); // "1.1 
MB" ``` | Method | Returns | Description | |--------|---------|-------------| | `TransformFrameConfig::custom()` | `TransformFrameConfigBuilder` | Start custom config | | `.max_frames(n)` | `Self` | Set max frames | | `.max_static_frames(n)` | `Self` | Set static frame slots | | `.history_len(n)` | `Self` | Set history buffer size | | `.enable_overflow(bool)` | `Self` | Enable/disable heap fallback | | `.chain_cache_size(n)` | `Self` | Set chain cache size | | `.build()` | `Result` | Validate and build | | `config.validate()` | `Result<(), String>` | Check constraints | | `config.estimated_memory_bytes()` | `usize` | Memory estimate in bytes | | `config.memory_estimate()` | `String` | Human-readable memory (e.g., "2.2 MB") | --- ## Registering Frames ### FrameBuilder (Fluent API) The recommended way to register frames: ```rust // simplified // Root frame (no parent) tf.add_frame("world").build()?; // Dynamic child frame tf.add_frame("base_link").parent("world").build()?; // Static frame with fixed offset (e.g., sensor mount) tf.add_frame("camera") .parent("base_link") .static_transform(&Transform::xyz(0.1, 0.0, 0.5)) .build()?; ``` | Method | Returns | Description | |--------|---------|-------------| | `tf.add_frame(name)` | `FrameBuilder` | Start registering a frame | | `.parent(name)` | `Self` | Set parent frame | | `.static_transform(transform)` | `Self` | Mark as static with fixed transform (requires parent) | | `.build()` | `Result` | Register and return frame ID | ### Direct Registration ```rust // simplified // Dynamic frame let id = tf.register_frame("base_link", Some("world"))?; // Static frame with fixed transform let id = tf.register_static_frame("camera", Some("base_link"), &Transform::xyz(0.1, 0.0, 0.5))?; // Unregister (dynamic only) tf.unregister_frame("temp_frame")?; ``` ### Frame Lookup | Method | Returns | Description | |--------|---------|-------------| | `frame_id(name)` | `Option` | Get frame ID by name (cache this for hot paths) | | 
`frame_name(id)` | `Option` | Get name by ID | | `has_frame(name)` | `bool` | Check if frame exists | | `all_frames()` | `Vec` | All frame names | | `frame_count()` | `usize` | Total registered frames | | `parent(name)` | `Option` | Parent frame name | | `children(name)` | `Vec` | Direct child frames | --- ## Updating Transforms Update dynamic frames with new transform data at sensor rates: ```rust // simplified let now = timestamp_now(); tf.update_transform("base_link", &Transform::xyz(1.0, 2.0, 0.0), now)?; // By ID (fastest, lock-free, no name lookup) let id = tf.frame_id("base_link").unwrap(); tf.update_transform_by_id(id, &Transform::xyz(1.0, 2.0, 0.0), now)?; // Update static frame offset tf.set_static_transform("camera", &Transform::xyz(0.1, 0.0, 0.6))?; ``` | Method | Returns | Description | |--------|---------|-------------| | `update_transform(name, transform, timestamp_ns)` | `Result<()>` | Update by name (validates transform) | | `update_transform_by_id(id, transform, timestamp_ns)` | `Result<()>` | Update by ID (fastest, lock-free) | | `set_static_transform(name, transform)` | `Result<()>` | Update a static frame's offset | ### `update_transform(name, transform, timestamp_ns)` Updates a dynamic frame's transform at the current timestamp. **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `name` | `&str` | yes | Frame name (must be registered and dynamic, not static) | | `transform` | `&Transform` | yes | New transform relative to parent. NaN/Inf rejected. Quaternions auto-normalized. | | `timestamp_ns` | `u64` | yes | Timestamp in nanoseconds. Use `timestamp_now()` for current time. | **Returns** `Result<()>` — `Ok` on success, `Err` if frame doesn't exist or transform is invalid. 
**Behavior** - Validates transform: rejects NaN/Inf, auto-normalizes quaternions - Stores in history buffer for temporal interpolation (LERP + SLERP) - Lock-free write — safe to call from RT nodes at sensor rates **When to use**: Call every tick from sensor nodes that produce position data (odometry, SLAM, IMU integration). ### `update_transform_by_id(id, transform, timestamp_ns)` Same as `update_transform` but uses `FrameId` instead of name string. Skips name lookup — use this in hot loops. **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `id` | `FrameId` | yes | Frame ID from `frame_id(name)`. Cache this at init time. | | `transform` | `&Transform` | yes | New transform. Same validation as `update_transform`. | | `timestamp_ns` | `u64` | yes | Timestamp in nanoseconds. | Transforms are automatically validated — NaN/Inf values are rejected and quaternions are auto-normalized. --- ## Querying Transforms ### TransformQuery (Fluent API) The recommended way to look up transforms: ```rust // simplified // Latest transform from camera to world let t = tf.query("camera").to("world").lookup()?; // At a specific timestamp (with LERP + SLERP interpolation) let t = tf.query("camera").to("world").at(timestamp_ns)?; // Strict: error if extrapolation needed let t = tf.query("camera").to("world").at_strict(timestamp_ns)?; // With tolerance window let t = tf.query("camera").to("world").at_with_tolerance(timestamp_ns, 100_000_000)?; // 100ms // Transform a point let world_pt = tf.query("lidar").to("world").point([1.0, 0.0, 0.0])?; // Transform a vector (rotation only, no translation) let world_vec = tf.query("imu").to("world").vector([0.0, 0.0, 9.81])?; // Check availability if tf.query("sensor").to("world").can_at(timestamp_ns) { let t = tf.query("sensor").to("world").at(timestamp_ns)?; } // Get the frame chain let chain = tf.query("camera").to("world").chain()?; // ["camera", "base_link", "world"] ``` ### TransformQueryFrom | 
Method | Returns | Description | |--------|---------|-------------| | `tf.query(src)` | `TransformQueryFrom` | Start query from source frame | | `.to(dst)` | `TransformQuery` | Set destination frame | ### TransformQuery Methods | Method | Returns | Description | |--------|---------|-------------| | `lookup()` | `Result` | Latest transform | | `at(timestamp_ns)` | `Result` | At timestamp with interpolation | | `at_strict(timestamp_ns)` | `Result` | Error if extrapolation needed | | `at_with_tolerance(timestamp_ns, tolerance_ns)` | `Result` | Error if gap exceeds tolerance | | `point(xyz)` | `Result<[f64; 3]>` | Transform a 3D point | | `vector(xyz)` | `Result<[f64; 3]>` | Transform a vector (rotation only) | | `can_at(timestamp_ns)` | `bool` | Check if transform available at timestamp | | `can_at_with_tolerance(timestamp_ns, tolerance_ns)` | `bool` | Check within tolerance window | | `chain()` | `Result>` | Frame chain from source to destination | | `wait(timeout)` | `Result` | Block until available (feature `wait`) | | `wait_at(timestamp_ns, timeout)` | `Result` | Block until timestamp data available (feature `wait`) | | `wait_async(timeout)` | `Result` | Async wait (feature `async-wait`) | | `wait_at_async(timestamp_ns, timeout)` | `Result` | Async wait at timestamp (feature `async-wait`) | ### Key Method Details #### lookup ```rust // simplified pub fn lookup(&self) -> Result ``` Get the latest known transform between source and destination frames. Returns the most recent transform regardless of timestamp. **Returns:** `Ok(Transform)` with translation (x, y, z) and rotation (quaternion). 
**Errors:** - `TransformError::FrameNotFound` — Source or destination frame doesn't exist - `TransformError::NoPath` — No chain of frames connects source to destination - `TransformError::EmptyHistory` — Frame exists but no transforms have been published yet #### at ```rust // simplified pub fn at(&self, timestamp_ns: u64) -> Result ``` Get the transform at a specific timestamp, with automatic interpolation between stored samples. **Parameters:** - `timestamp_ns: u64` — Timestamp in nanoseconds since epoch. Use `horus::now().as_nanos()`. **Behavior:** If the requested time falls between two stored transforms, SLERP interpolation is used for rotation and linear interpolation for translation. If the time is outside stored history, extrapolation is used (may be inaccurate). **Errors:** Same as `lookup()` plus `TransformError::Stale` if the transform data is too old. #### wait ```rust // simplified pub fn wait(&self, timeout: Duration) -> Result ``` Block until the transform becomes available or timeout elapses. Use this when your node starts before the transform publisher. **Parameters:** - `timeout: Duration` — Maximum time to wait. Use `5_u64.secs()`. **Returns:** `Ok(Transform)` once available, `Err(TransformError::Timeout)` if timeout elapses. 
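
The interpolation described for `at` (linear for translation, SLERP for rotation) can be sketched with plain arrays. This is a minimal illustration under stated assumptions, not the HORUS implementation: `SketchTransform`, `lerp`, `slerp`, and `interpolate_at` are hypothetical names, the quaternions are assumed to be unit length in `[x, y, z, w]` order, and the sketch clamps to the two samples instead of extrapolating.

```rust
// Hypothetical sketch of timestamped interpolation; not the HORUS implementation.
#[derive(Clone, Copy, Debug)]
struct SketchTransform {
    translation: [f64; 3], // [x, y, z] in meters
    rotation: [f64; 4],    // [x, y, z, w] unit quaternion
}

// Linear interpolation of the translation.
fn lerp(a: [f64; 3], b: [f64; 3], t: f64) -> [f64; 3] {
    [
        a[0] + (b[0] - a[0]) * t,
        a[1] + (b[1] - a[1]) * t,
        a[2] + (b[2] - a[2]) * t,
    ]
}

// Spherical linear interpolation between two unit quaternions.
fn slerp(a: [f64; 4], mut b: [f64; 4], t: f64) -> [f64; 4] {
    let mut dot: f64 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    if dot < 0.0 {
        // q and -q encode the same rotation; flip to take the shorter arc.
        for v in b.iter_mut() {
            *v = -*v;
        }
        dot = -dot;
    }
    let (wa, wb) = if dot > 0.9995 {
        // Nearly identical rotations: a linear blend avoids dividing by ~0.
        (1.0 - t, t)
    } else {
        let theta = dot.acos();
        let s = theta.sin();
        (((1.0 - t) * theta).sin() / s, (t * theta).sin() / s)
    };
    let mut q = [0.0; 4];
    for i in 0..4 {
        q[i] = wa * a[i] + wb * b[i];
    }
    // Renormalize to guard against rounding drift.
    let norm = q.iter().map(|v| v * v).sum::<f64>().sqrt();
    for v in q.iter_mut() {
        *v /= norm;
    }
    q
}

// Interpolate between two timestamped samples, clamping outside their range.
fn interpolate_at(
    older: (u64, SketchTransform),
    newer: (u64, SketchTransform),
    timestamp_ns: u64,
) -> SketchTransform {
    let span = newer.0.saturating_sub(older.0) as f64;
    let t = if span == 0.0 {
        0.0
    } else {
        (timestamp_ns.saturating_sub(older.0) as f64 / span).clamp(0.0, 1.0)
    };
    SketchTransform {
        translation: lerp(older.1.translation, newer.1.translation, t),
        rotation: slerp(older.1.rotation, newer.1.rotation, t),
    }
}

fn main() {
    let half_yaw = std::f64::consts::FRAC_PI_4; // half-angle of a 90 degree yaw
    let older = SketchTransform { translation: [0.0; 3], rotation: [0.0, 0.0, 0.0, 1.0] };
    let newer = SketchTransform {
        translation: [1.0, 0.0, 0.0],
        rotation: [0.0, 0.0, half_yaw.sin(), half_yaw.cos()],
    };

    // Query halfway between samples taken at 0 ms and 100 ms.
    let mid = interpolate_at((0, older), (100_000_000, newer), 50_000_000);
    let yaw_deg = (2.0 * mid.rotation[2].atan2(mid.rotation[3])).to_degrees();
    println!("x = {:.2} m, yaw = {:.1} deg", mid.translation[0], yaw_deg);
}
```

Halfway between the two samples, the translation is the midpoint and the yaw is half of the 90 degree rotation. Interpolating the quaternion components linearly without renormalizing would not keep the rotation rate constant, which is the reason SLERP is used for the rotational part.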
### Direct Query Methods

For cases where you don't need the fluent builder:

| Method | Returns | Description |
|--------|---------|-------------|
| `tf(src, dst)` | `Result` | Latest transform src→dst |
| `tf_by_id(src_id, dst_id)` | `Option` | By ID (fastest, lock-free) |
| `tf_at(src, dst, timestamp_ns)` | `Result` | At timestamp with interpolation |
| `tf_at_strict(src, dst, timestamp_ns)` | `Result` | Error if extrapolation needed |
| `tf_at_with_tolerance(src, dst, ts, tol)` | `Result` | Error if gap exceeds tolerance |
| `can_transform(src, dst)` | `bool` | Check if path exists |
| `can_transform_at(src, dst, timestamp_ns)` | `bool` | Check at timestamp |
| `can_transform_at_with_tolerance(src, dst, ts, tol)` | `bool` | Check within tolerance |
| `transform_point(src, dst, point)` | `Result<[f64; 3]>` | One-shot point transform |
| `transform_vector(src, dst, vector)` | `Result<[f64; 3]>` | Rotation-only vector transform |

> **Argument order**: `tf.tf("camera", "world")` means "transform FROM camera TO world". This is the opposite of ROS2 TF2's `lookupTransform(target, source)`.

---

## Transform

A 3D rigid body transform (translation + quaternion rotation) in f64 precision.

### Fields ```rust // simplified pub struct Transform { pub translation: [f64; 3], // [x, y, z] in meters pub rotation: [f64; 4], // [x, y, z, w] Hamilton quaternion } ``` ### Constructors ```rust // simplified let t = Transform::identity(); let t = Transform::new([1.0, 2.0, 0.0], [0.0, 0.0, 0.0, 1.0]); let t = Transform::xyz(1.0, 2.0, 0.0); let t = Transform::x(1.0); // X-axis only let t = Transform::yaw(1.57); // Z-axis rotation (radians) let t = Transform::rpy(0.0, 0.0, 1.57); // Roll, pitch, yaw let t = Transform::from_euler([1.0, 2.0, 0.0], [0.0, 0.0, 1.57]); let t = Transform::from_translation([1.0, 2.0, 0.0]); let t = Transform::from_rotation([0.0, 0.0, 0.707, 0.707]); let t = Transform::from_matrix(matrix_4x4); // Chainable let t = Transform::xyz(1.0, 0.0, 0.0).with_yaw(1.57); let t = Transform::xyz(1.0, 0.0, 0.5).with_rpy(0.0, 0.1, 0.0); ``` | Constructor | Description | |-------------|-------------| | `identity()` | No translation or rotation | | `new(translation, rotation)` | Custom translation + quaternion (auto-normalized) | | `from_translation(xyz)` | Translation only | | `from_rotation(xyzw)` | Rotation only (auto-normalized) | | `from_euler(translation, rpy)` | Translation + Euler angles (radians) | | `xyz(x, y, z)` | Translation shorthand | | `x(v)` / `y(v)` / `z(v)` | Single-axis translation | | `yaw(angle)` / `pitch(angle)` / `roll(angle)` | Single-axis rotation (radians) | | `rpy(roll, pitch, yaw)` | Combined rotation (radians) | | `from_matrix(m)` | From 4x4 homogeneous matrix (row-major) | | `.with_yaw(angle)` | Compose yaw rotation (chainable) | | `.with_rpy(r, p, y)` | Compose RPY rotation (chainable) | ### Operations | Method | Returns | Description | |--------|---------|-------------| | `compose(other)` | `Transform` | Compose transforms (self * other) | | `inverse()` | `Transform` | Reverse direction | | `transform_point(xyz)` | `[f64; 3]` | Apply rotation + translation to point | | `transform_vector(xyz)` | `[f64; 3]` | Apply rotation 
only (no translation) | | `interpolate(other, t)` | `Transform` | LERP + SLERP interpolation (t in [0, 1]) | | `to_euler()` | `[f64; 3]` | Convert rotation to [roll, pitch, yaw] radians | | `to_matrix()` | `[[f64; 4]; 4]` | Convert to 4x4 homogeneous matrix | | `validate()` | `Result<(), String>` | Check for NaN/Inf and valid quaternion | | `validated()` | `Result` | Validate and auto-normalize | | `is_identity(epsilon)` | `bool` | Approximate identity test | | `translation_magnitude()` | `f64` | Translation vector length | | `rotation_angle()` | `f64` | Rotation angle in radians | ### `tf(src, dst)` — Shorthand Transform Lookup Looks up the latest transform between two named frames. **Signature** ```rust // simplified pub fn tf(&self, src: &str, dst: &str) -> HorusResult<Transform> ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `src` | `&str` | yes | Source frame name (e.g., `"camera"`) | | `dst` | `&str` | yes | Destination frame name (e.g., `"world"`) | **Returns** `HorusResult<Transform>` — the transform that converts coordinates from `src` frame to `dst` frame. **Errors** | Error | Condition | |-------|-----------| | `FrameNotFound` | Source or destination frame not registered | | `NoPathBetweenFrames` | Frames exist but have no common ancestor in the tree | **Behavior** - Returns the latest transform regardless of timestamp — use `tf_at()` for time-specific lookups - Equivalent to `query(src).to(dst).lookup()` but shorter - Lock-free read — safe to call from RT nodes at high frequency **Example** ```rust // simplified use horus::prelude::*; // Where is the camera relative to the world? let camera_to_world = tf.tf("camera", "world")?; println!("Camera at: ({:.2}, {:.2}, {:.2})", camera_to_world.translation[0], camera_to_world.translation[1], camera_to_world.translation[2]); ``` **When to use**: Simple one-off lookups.
For complex queries (time-specific, point transformation, chain inspection), use the fluent `query()` API instead. --- ## Staleness Detection Check if frames have gone stale (e.g., sensor disconnected): ```rust // simplified // Using wall-clock time if tf.is_stale_now("imu", 500_000_000) { // 500ms hlog!(warn, "IMU data is stale!"); } // Using custom time (for simulation) if tf.is_stale("imu", 500_000_000, sim_time_ns) { hlog!(warn, "IMU stale in sim time"); } // Time since last update if let Some(age_ns) = tf.time_since_last_update_now("imu") { println!("IMU age: {}ms", age_ns / 1_000_000); } ``` | Method | Returns | Description | |--------|---------|-------------| | `is_stale(name, max_age_ns, now_ns)` | `bool` | Stale relative to custom time | | `is_stale_now(name, max_age_ns)` | `bool` | Stale relative to wall-clock | | `time_since_last_update(name, now_ns)` | `Option<u64>` | Age in nanoseconds (custom time) | | `time_since_last_update_now(name)` | `Option<u64>` | Age in nanoseconds (wall-clock) | | `time_range(name)` | `Option<(u64, u64)>` | Buffer window (oldest_ns, newest_ns) | --- ## Diagnostics ### TransformFrameStats ```rust // simplified let stats = tf.stats(); println!("{}", stats); // Pretty-printed summary println!("Total: {}, Static: {}, Dynamic: {}", stats.total_frames, stats.static_frames, stats.dynamic_frames); println!("Tree depth: {}, Roots: {}", stats.tree_depth, stats.root_count); ``` | Field | Type | Description | |-------|------|-------------| | `total_frames` | `usize` | Total registered frames | | `static_frames` | `usize` | Static frame count | | `dynamic_frames` | `usize` | Dynamic frame count | | `max_frames` | `usize` | Configured max | | `history_len` | `usize` | History buffer size | | `tree_depth` | `usize` | Maximum tree depth | | `root_count` | `usize` | Frames with no parent | ### FrameInfo ```rust // simplified if let Some(info) = tf.frame_info("camera") { println!("Frame: {}, Parent: {:?}, Static: {}", info.name, info.parent,
info.is_static); println!("Depth: {}, Children: {}", info.depth, info.children_count); } // All frames let all = tf.frame_info_all(); ``` | Field | Type | Description | |-------|------|-------------| | `name` | `String` | Frame name | | `id` | `FrameId` | Numeric ID (for hot-path lookups) | | `parent` | `Option<String>` | Parent name (`None` = root) | | `is_static` | `bool` | Static vs dynamic | | `time_range` | `Option<(u64, u64)>` | (oldest_ns, newest_ns) | | `children_count` | `usize` | Direct children | | `depth` | `usize` | Depth in tree (root = 0) | ### Tree Export ```rust // simplified tf.print_tree(); // Print to stderr let text = tf.format_tree(); // Human-readable tree string let dot = tf.frames_as_dot(); // Graphviz DOT format let yaml = tf.frames_as_yaml(); // YAML (TF2-style) tf.validate()?; // Validate tree structure ``` --- ## timestamp_now() ```rust // simplified use horus::prelude::*; let now: u64 = timestamp_now(); // Current time in nanoseconds (UNIX epoch) ``` --- ## FrameId Type alias for `u32`.
Use for hot-path lookups to avoid string-based name resolution: ```rust // simplified let id: FrameId = tf.frame_id("base_link").unwrap(); // Fast by-ID operations (no string lookup) tf.update_transform_by_id(id, &transform, timestamp_now())?; let t = tf.tf_by_id(src_id, dst_id); ``` --- ## Complete Example ```rust // simplified use horus::prelude::*; fn main() -> Result<()> { let tf = TransformFrame::new(); // Build frame tree tf.add_frame("world").build()?; tf.add_frame("odom").parent("world").build()?; tf.add_frame("base_link").parent("odom").build()?; tf.add_frame("lidar") .parent("base_link") .static_transform(&Transform::xyz(0.2, 0.0, 0.3)) .build()?; tf.add_frame("camera") .parent("base_link") .static_transform(&Transform::xyz(0.1, 0.0, 0.5).with_rpy(0.0, 0.1, 0.0)) .build()?; // Update dynamic frames let now = timestamp_now(); tf.update_transform("odom", &Transform::xyz(1.0, 0.5, 0.0).with_yaw(0.3), now)?; tf.update_transform("base_link", &Transform::xyz(0.05, 0.02, 0.0), now)?; // Query transforms let lidar_to_world = tf.query("lidar").to("world").lookup()?; println!("Lidar position in world: {:?}", lidar_to_world.translation); // Transform a LiDAR point into world frame let world_pt = tf.query("lidar").to("world").point([5.0, 0.0, 0.0])?; println!("Obstacle at world: ({:.2}, {:.2}, {:.2})", world_pt[0], world_pt[1], world_pt[2]); // Check staleness if tf.is_stale_now("base_link", 100_000_000) { // 100ms println!("Odometry is stale!"); } // Print tree tf.print_tree(); Ok(()) } ``` --- ## Common Frame Configurations ### Mobile Robot with Camera and Lidar ``` world → odom → base_link → laser_link → camera_link → camera_optical ``` ```rust // simplified let tf = TransformFrame::new(); // Static frames (don't change) tf.register_frame("world", None)?; tf.register_frame("odom", Some("world"))?; tf.register_frame("base_link", Some("odom"))?; // Sensor mounts (static transforms from CAD/measurement) tf.register_frame("laser_link", Some("base_link"))?; 
tf.update_transform("laser_link", &Transform::new([0.15, 0.0, 0.3], [0.0, 0.0, 0.0, 1.0]), timestamp_now())?; tf.register_frame("camera_link", Some("base_link"))?; tf.update_transform("camera_link", &Transform::new([0.1, -0.05, 0.25], [0.0, 0.0, 0.0, 1.0]), timestamp_now())?; // Query: transform lidar point to camera frame let laser_to_cam = tf.tf("laser_link", "camera_link")?; let point_in_cam = laser_to_cam.transform_point([2.0, 0.0, 0.0]); ``` ### Robot Arm (6-DOF) ``` base → link1 → link2 → link3 → link4 → link5 → link6 → tool → gripper ``` ```rust // simplified let tf = TransformFrame::new(); tf.register_frame("base", None)?; for i in 1..=6 { let parent = if i == 1 { "base" } else { &format!("link{}", i - 1) }; tf.register_frame(&format!("link{}", i), Some(parent))?; } tf.register_frame("tool", Some("link6"))?; tf.register_frame("gripper", Some("tool"))?; // Update joint angles each tick (from forward kinematics) // tf.update_transform("link1", &fk_transform, timestamp_now())?; // Query: where is the gripper in base frame? let gripper_pose = tf.tf("gripper", "base")?; ``` --- ## See Also - [TransformFrame Concepts](/concepts/transform-frame) — Architecture and design patterns - [Python TransformFrame API](/python/api/transform-frame) — Python bindings - [Scheduler API](/rust/api/scheduler) — Running nodes that use transforms --- ## Geometry Messages Path: /rust/api/geometry-messages Description: 3D and 2D spatial primitives for position, orientation, and motion # Geometry Messages HORUS provides fundamental geometric primitives used throughout robotics applications for representing position, orientation, and motion. All geometry messages support ultra-fast zero-copy transfer (~50ns). ## Twist 3D velocity command with linear and angular components. Used for commanding robot motion in 3D space. For 2D robots, only x (forward) and yaw (rotation) are typically used. 
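To make the 2D case concrete, here is a small standalone sketch of how a planar command maps onto the two 3-element arrays: forward speed goes into `linear[0]` and turn rate into `angular[2]`, with everything else zero. `TwistSketch` is a hypothetical mirror of the documented field layout for illustration only, not the HORUS `Twist` type, and `is_finite` is an assumed equivalent of the documented "all values are finite" check.

```rust
/// Hypothetical mirror of the documented Twist fields (not the HORUS type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TwistSketch {
    /// Linear velocity [x, y, z] in m/s.
    linear: [f64; 3],
    /// Angular velocity [roll, pitch, yaw] in rad/s.
    angular: [f64; 3],
}

impl TwistSketch {
    /// Planar command: only forward speed and yaw rate are non-zero.
    fn new_2d(linear_x: f64, angular_z: f64) -> Self {
        Self {
            linear: [linear_x, 0.0, 0.0],
            angular: [0.0, 0.0, angular_z],
        }
    }

    /// True when no component is NaN or infinite.
    fn is_finite(&self) -> bool {
        self.linear
            .iter()
            .chain(self.angular.iter())
            .all(|v| v.is_finite())
    }
}
```

A finiteness check like this is cheap enough to run on every command before it reaches a motor driver, which is one reason to keep it separate from construction.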
```rust // simplified use horus::prelude::*; // Create 3D velocity command let twist = Twist::new( [1.0, 0.0, 0.0], // linear: [x, y, z] in m/s [0.0, 0.0, 0.5] // angular: [roll, pitch, yaw] in rad/s ); // For 2D robots (common case) let cmd = Twist::new_2d(0.5, 0.3); // 0.5 m/s forward, 0.3 rad/s rotation println!("Linear X: {}, Angular Z: {}", cmd.linear[0], cmd.angular[2]); // Stop command (all zeros) let stop = Twist::stop(); // Validate the message if twist.is_valid() { println!("Twist is valid"); } ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear` | `[f64; 3]` | m/s | Linear velocity [x, y, z]. X=forward, Y=left, Z=up | | `angular` | `[f64; 3]` | rad/s | Angular velocity [roll, pitch, yaw]. Positive yaw=counter-clockwise | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | > **Coordinate frame:** Right-hand, Z-up. X=forward, Y=left. Consistent across all geometry types. > > **ROS2 equivalent:** `geometry_msgs/msg/Twist` > > **Typical ranges:** Mobile robots: linear.x ±2.0 m/s, angular.z ±3.0 rad/s **Methods:** | Method | Description | |--------|-------------| | `new(linear, angular)` | Create full 3D velocity command | | `new_2d(linear_x, angular_z)` | Create 2D velocity (forward + rotation) | | `stop()` | Create stop command (all zeros) | | `is_valid()` | Check if all values are finite | ## Pose2D 2D pose representation (position and orientation). Commonly used for mobile robots operating in planar environments. 
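Two operations come up constantly with planar poses: wrapping a heading into a single turn and measuring straight-line distance. The sketch below shows one common way to do both with only the standard library. The function names are hypothetical, and the wrap used here maps into the half-open interval [-π, π), which may differ at the exact boundary from what HORUS's `normalize_angle()` does.

```rust
use std::f64::consts::PI;

/// Wrap an angle in radians into [-π, π).
/// `rem_euclid` yields a value in [0, 2π); shifting by π before and after
/// re-centres the interval on zero.
fn wrap_angle(theta: f64) -> f64 {
    (theta + PI).rem_euclid(2.0 * PI) - PI
}

/// Euclidean distance between two planar positions given as (x, y) in meters.
fn planar_distance(a: (f64, f64), b: (f64, f64)) -> f64 {
    (b.0 - a.0).hypot(b.1 - a.1)
}
```

For instance, a heading of 3π/2 wraps to -π/2, and the positions (5, 3) and (8, 7) are 5 m apart. Heading is deliberately ignored by the distance function.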
```rust // simplified use horus::prelude::*; // Create 2D pose let pose = Pose2D::new(5.0, 3.0, 1.57); // x, y, theta println!("Position: ({}, {}), Orientation: {} rad", pose.x, pose.y, pose.theta); // Create pose at origin let origin = Pose2D::origin(); // Calculate distance between poses let other = Pose2D::new(8.0, 7.0, 0.0); let distance = pose.distance_to(&other); println!("Distance: {:.2} m", distance); // Normalize angle to [-π, π] let mut pose_copy = pose; pose_copy.normalize_angle(); // Check validity assert!(pose.is_valid()); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f64` | X position in meters | | `y` | `f64` | Y position in meters | | `theta` | `f64` | Orientation angle in radians | | `timestamp_ns` | `u64` | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new(x, y, theta)` | Create a new 2D pose | | `origin()` | Create pose at origin (0, 0, 0) | | `distance_to(&other)` | Calculate euclidean distance to another pose | | `normalize_angle()` | Normalize theta to [-π, π] | | `is_valid()` | Check if all values are finite | ## TransformStamped 3D transformation (translation and rotation). Represents a full 3D transformation using translation vector and quaternion rotation. Used for coordinate frame transformations. 
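A quaternion only represents a pure rotation when it has unit length, so transforms built from raw numbers usually need a normalization step. The following is a minimal sketch of that step using plain arrays. The helper names and the tolerance parameter are assumptions made for this example and are not taken from the HORUS API.

```rust
/// Scale a quaternion [x, y, z, w] to unit length.
/// Returns `None` for a zero-length or non-finite input, which has no
/// meaningful direction to preserve.
fn normalize_quaternion(q: [f64; 4]) -> Option<[f64; 4]> {
    let norm = q.iter().map(|c| c * c).sum::<f64>().sqrt();
    if !norm.is_finite() || norm == 0.0 {
        return None;
    }
    Some([q[0] / norm, q[1] / norm, q[2] / norm, q[3] / norm])
}

/// Check whether a quaternion is unit length within `eps` (hypothetical tolerance).
fn is_unit_quaternion(q: [f64; 4], eps: f64) -> bool {
    let norm_sq: f64 = q.iter().map(|c| c * c).sum();
    norm_sq.is_finite() && (norm_sq - 1.0).abs() <= eps
}
```

The input `[1, 1, 1, 1]` has length 2, so it normalizes to `[0.5, 0.5, 0.5, 0.5]`. Returning `Option` rather than silently producing NaN is a design choice of this sketch.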
```rust // simplified use horus::prelude::*; // Create transform with translation and quaternion rotation let transform = TransformStamped::new( [1.0, 2.0, 0.5], // translation [x, y, z] [0.0, 0.0, 0.0, 1.0] // rotation [x, y, z, w] quaternion ); // Identity transform (no translation or rotation) let identity = TransformStamped::identity(); // Create from 2D pose (z=0, only yaw rotation) let pose = Pose2D::new(3.0, 4.0, 1.57); let tf_from_pose = TransformStamped::from_pose_2d(&pose); // Validate quaternion is normalized if transform.is_valid() { println!("Transform is valid (quaternion normalized)"); } // Normalize quaternion if needed let mut tf = TransformStamped::new([0.0; 3], [1.0, 1.0, 1.0, 1.0]); tf.normalize_rotation(); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `translation` | `[f64; 3]` | Translation [x, y, z] in meters | | `rotation` | `[f64; 4]` | Rotation as quaternion [x, y, z, w] | | `timestamp_ns` | `u64` | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new(translation, rotation)` | Create new transform | | `identity()` | Identity transform (no change) | | `from_pose_2d(&pose)` | Create from 2D pose | | `is_valid()` | Check quaternion normalized and values finite | | `normalize_rotation()` | Normalize the quaternion component | ## Point3 3D point representation. 
```rust // simplified use horus::prelude::*; // Create 3D point let point = Point3::new(1.0, 2.0, 3.0); println!("Point: ({}, {}, {})", point.x, point.y, point.z); // Create point at origin let origin = Point3::origin(); // Calculate distance between points let other = Point3::new(4.0, 6.0, 3.0); let distance = point.distance_to(&other); println!("Distance: {:.2} m", distance); // 5.0 m ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f64` | X coordinate in meters | | `y` | `f64` | Y coordinate in meters | | `z` | `f64` | Z coordinate in meters | **Methods:** | Method | Description | |--------|-------------| | `new(x, y, z)` | Create new point | | `origin()` | Create point at origin (0, 0, 0) | | `distance_to(&other)` | Calculate euclidean distance | ## Vector3 3D vector representation with mathematical operations. ```rust // simplified use horus::prelude::*; // Create 3D vector let v = Vector3::new(3.0, 4.0, 0.0); println!("Vector: ({}, {}, {})", v.x, v.y, v.z); // Zero vector let zero = Vector3::zero(); // Calculate magnitude let mag = v.magnitude(); println!("Magnitude: {:.2}", mag); // 5.0 // Normalize vector let mut unit = Vector3::new(3.0, 4.0, 0.0); unit.normalize(); println!("Unit vector: ({:.2}, {:.2}, {})", unit.x, unit.y, unit.z); // (0.6, 0.8, 0) // Dot product let v1 = Vector3::new(1.0, 2.0, 3.0); let v2 = Vector3::new(4.0, 5.0, 6.0); let dot = v1.dot(&v2); println!("Dot product: {}", dot); // 32.0 // Cross product let i = Vector3::new(1.0, 0.0, 0.0); let j = Vector3::new(0.0, 1.0, 0.0); let k = i.cross(&j); println!("i × j = ({}, {}, {})", k.x, k.y, k.z); // (0, 0, 1) ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f64` | X component | | `y` | `f64` | Y component | | `z` | `f64` | Z component | **Methods:** | Method | Description | |--------|-------------| | `new(x, y, z)` | Create new vector | | `zero()` | Create zero vector | | `magnitude()` | Calculate vector length | | 
`normalize()` | Normalize to unit vector | | `dot(&other)` | Dot product with another vector | | `cross(&other)` | Cross product with another vector | | `normalized()` | Return a new unit-length vector (does not mutate self) | ## Quaternion Quaternion for 3D rotation representation. Quaternions avoid gimbal lock and provide smooth interpolation for rotations. ```rust // simplified use horus::prelude::*; // Create quaternion directly let q = Quaternion::new(0.0, 0.0, 0.0, 1.0); // [x, y, z, w] // Identity quaternion (no rotation) let identity = Quaternion::identity(); assert_eq!(identity.w, 1.0); // Create from Euler angles (roll, pitch, yaw) let q_from_euler = Quaternion::from_euler( 0.0, // roll (rotation around X) 0.0, // pitch (rotation around Y) 1.57 // yaw (rotation around Z) - 90 degrees ); // Normalize quaternion let mut q_unnorm = Quaternion::new(1.0, 1.0, 1.0, 1.0); q_unnorm.normalize(); // Validate assert!(q.is_valid()); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f64` | X component (imaginary i) | | `y` | `f64` | Y component (imaginary j) | | `z` | `f64` | Z component (imaginary k) | | `w` | `f64` | W component (real/scalar) | **Methods:** | Method | Description | |--------|-------------| | `new(x, y, z, w)` | Create new quaternion | | `identity()` | Identity quaternion (no rotation) | | `from_euler(roll, pitch, yaw)` | Create from Euler angles | | `normalize()` | Normalize to unit quaternion | | `is_valid()` | Check all values are finite | ## Pose3D 6-DOF pose — position (Point3) + orientation (Quaternion). 
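Because the orientation is a quaternion, a planar heading has to be converted before it can be stored in a 3D pose. For a rotation about Z only, the quaternion is `[0, 0, sin(yaw/2), cos(yaw/2)]` in `[x, y, z, w]` order. The sketch below shows that conversion and its inverse with hypothetical function names. It is an illustration of the math, not the HORUS implementation.

```rust
/// Quaternion [x, y, z, w] for a rotation of `yaw` radians about the Z axis.
fn quaternion_from_yaw(yaw: f64) -> [f64; 4] {
    let half = yaw / 2.0;
    [0.0, 0.0, half.sin(), half.cos()]
}

/// Recover yaw (rotation about Z) from a unit quaternion [x, y, z, w],
/// using the standard roll-pitch-yaw (Z-Y-X) extraction formula.
fn yaw_from_quaternion(q: [f64; 4]) -> f64 {
    let [x, y, z, w] = q;
    (2.0 * (w * z + x * y)).atan2(1.0 - 2.0 * (y * y + z * z))
}
```

A yaw of zero gives the identity quaternion `[0, 0, 0, 1]`, and converting a heading to a quaternion and back returns the same angle for values within (-π, π).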
```rust // simplified use horus::prelude::*; let pose = Pose3D::new(Point3::new(1.0, 2.0, 0.5), Quaternion::identity()); let identity = Pose3D::identity(); let from_2d = Pose3D::from_pose_2d(&Pose2D::new(1.0, 2.0, 0.5)); let dist = pose.distance_to(&identity); assert!(pose.is_valid()); ``` **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(position, orientation)` | `Pose3D` | Create from Point3 + Quaternion | | `identity()` | `Pose3D` | Identity pose (origin, no rotation) | | `from_pose_2d(&pose)` | `Pose3D` | Create from Pose2D (z=0, rotation around z) | | `distance_to(&other)` | `f64` | Euclidean distance to another Pose3D | | `is_valid()` | `bool` | Check all values are finite | ## PoseStamped Timestamped 3D pose with coordinate frame. **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(pose)` | `PoseStamped` | Create from a Pose3D | | `with_frame_id(frame_id)` | `Self` | Builder: set coordinate frame | ## PoseWithCovariance Pose with 6x6 covariance matrix (36 elements, row-major). **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(pose)` | `PoseWithCovariance` | Create from a Pose3D | | `with_frame_id(frame_id)` | `Self` | Builder: set coordinate frame | | `position_variance()` | `[f64; 3]` | Diagonal elements for position uncertainty (x, y, z) | | `orientation_variance()` | `[f64; 3]` | Diagonal elements for orientation uncertainty | ## TwistWithCovariance Twist with 6x6 covariance matrix (36 elements, row-major). 
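In a 6x6 row-major matrix stored as 36 values, the diagonal entry for row `i` sits at index `i * 6 + i`, that is indices 0, 7, 14, 21, 28 and 35. The sketch below extracts those entries from a plain array. The function names are hypothetical, and the split into "first three linear, last three angular" is an assumption borrowed from the common ROS ordering that this section does not state explicitly.

```rust
/// Diagonal of a 6x6 row-major matrix stored as 36 values.
fn diagonal_6x6(cov: &[f64; 36]) -> [f64; 6] {
    let mut diag = [0.0; 6];
    for (i, d) in diag.iter_mut().enumerate() {
        // Row i, column i in row-major storage.
        *d = cov[i * 6 + i];
    }
    diag
}

/// Split the diagonal into two 3-element groups.
/// Assumption: rows 0..3 are the linear axes and rows 3..6 the angular axes.
fn split_variances(cov: &[f64; 36]) -> ([f64; 3], [f64; 3]) {
    let d = diagonal_6x6(cov);
    ([d[0], d[1], d[2]], [d[3], d[4], d[5]])
}
```

Off-diagonal entries (cross-correlations between axes) are ignored here, which is exactly the information a per-axis variance accessor discards.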
**Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(twist)` | `TwistWithCovariance` | Create from a Twist | | `with_frame_id(frame_id)` | `Self` | Builder: set coordinate frame | | `linear_variance()` | `[f64; 3]` | Diagonal elements for linear velocity uncertainty | | `angular_variance()` | `[f64; 3]` | Diagonal elements for angular velocity uncertainty | ## Accel / AccelStamped 6-DOF acceleration (linear + angular). AccelStamped adds a coordinate frame. **Accel Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(linear, angular)` | `Accel` | Create from [f64; 3] arrays | | `is_valid()` | `bool` | Check all values are finite | **AccelStamped Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(accel)` | `AccelStamped` | Create from an Accel | | `with_frame_id(frame_id)` | `Self` | Builder: set coordinate frame | ## Zero-Copy Performance All geometry messages are optimized for zero-copy shared memory transfer (~50ns): ```rust // simplified use horus::prelude::*; // HORUS automatically uses zero-copy for compatible types let twist_topic: Topic<Twist> = Topic::new("velocity_cmd")?; let pose_topic: Topic<Pose2D> = Topic::new("robot_pose")?; // Send - automatically zero-copy for fixed-size types twist_topic.send(Twist::new_2d(0.5, 0.1)); // Receive - zero-copy access if let Some(pose) = pose_topic.recv() { println!("Robot at ({:.2}, {:.2})", pose.x, pose.y); } ``` ## Robot Control Example ```rust // simplified use horus::prelude::*; struct DifferentialDriveController { pose_sub: Topic<Pose2D>, goal_sub: Topic<Pose2D>, cmd_pub: Topic<Twist>, } impl Node for DifferentialDriveController { fn name(&self) -> &str { "DiffDriveController" } fn tick(&mut self) { // Get current pose and goal let pose = match self.pose_sub.recv() { Some(p) => p, None => return, }; let goal = match self.goal_sub.recv() { Some(g) => g, None => return, }; // Calculate distance and angle to goal let dx = goal.x - pose.x; let dy =
goal.y - pose.y; let distance = (dx * dx + dy * dy).sqrt(); let angle_to_goal = dy.atan2(dx); // Wrap the heading error to [-π, π) so the robot turns the shorter way let angle_error = (angle_to_goal - pose.theta + std::f64::consts::PI).rem_euclid(std::f64::consts::TAU) - std::f64::consts::PI; // Simple proportional controller let cmd = if distance > 0.1 { // Move towards goal Twist::new_2d( 0.3 * distance.min(1.0), // Linear velocity (capped) 1.0 * angle_error // Angular velocity ) } else { // Goal reached Twist::stop() }; self.cmd_pub.send(cmd); } } ``` ## Coordinate Frames HORUS uses right-handed coordinate systems: - **X-axis**: Forward (red) - **Y-axis**: Left (green) - **Z-axis**: Up (blue) For 2D robots: - **X**: Forward - **Y**: Left - **Theta**: Counter-clockwise rotation from X-axis ## See Also - [Navigation Messages](/rust/api/navigation-messages) - Goals, paths, occupancy grids - [Sensor Messages](/rust/api/sensor-messages) - IMU, odometry - [Force Messages](/rust/api/force-messages) - Wrench, force vectors --- ## API Reference Path: /rust/api Description: Complete API reference for the HORUS robotics framework # API Reference Welcome to the HORUS API reference documentation. This section provides detailed documentation for all public types, traits, and functions in the HORUS framework.
## Crates | Crate | Description | |-------|-------------| | [horus_core](/rust/api) | Core runtime - nodes, communication, scheduling | | [horus_library](/rust/api/messages) | Standard message types for robotics | | [horus_macros](/rust/api/macros) | Procedural macros (`node!`, `derive(LogSummary)`) | --- ## Quick Reference ### Core Types | Type | Description | Module | |------|-------------|--------| | [`Node`](/rust/api#node) | Base trait for all computation units | `horus_core` | | [`Topic`](/rust/api/topic) | Unified pub/sub channel with auto-detected backends | `horus_core` | | [`Scheduler`](/rust/api/scheduler) | Node execution orchestrator | `horus_core` | | [`ServiceClient`](/rust/api/services) | Synchronous request/response RPC | `horus_core` | | [`ActionClientNode`](/rust/api/actions) | Long-running tasks with feedback and cancellation | `horus_core` | ### Error Handling | Type | Description | Module | |------|-------------|--------| | [`Error`](/rust/api#error) | Unified error type | `horus_core` | | [`Result`](/rust/api#result) | Result alias | `horus_core` | ### Message Types | Type | Description | Module | |------|-------------|--------| | [`Image`](/rust/api/messages#image) | Camera image data | `horus_library` | | [`LaserScan`](/rust/api/messages#laserscan) | LiDAR scan data | `horus_library` | | [`Imu`](/rust/api/messages#imu) | IMU sensor data | `horus_library` | | [`Twist`](/rust/api/messages#twist) | Velocity commands | `horus_library` | | [`Pose2D`](/rust/api/messages#pose2d) | 2D position and orientation | `horus_library` | --- ## Import Patterns ### Minimal Import ```rust // simplified use horus::prelude::*; ``` This single import gives you access to **165+ types** — everything needed for typical robotics applications. 
### Explicit Import (Alternative) If you prefer explicit imports: ```rust // simplified use horus::prelude::{ // Core traits and types Node, NodeState, HealthStatus, LogSummary, // Communication Topic, // Scheduling Scheduler, FailurePolicy, // Real-time DurationExt, Frequency, Miss, Rate, Stopwatch, // Errors Error, Result, HorusError, // Messages Pose2D, LaserScan, Imu, CmdVel, MotorCommand, }; ``` --- ## Prelude Contents The `horus::prelude` re-exports everything you need without hunting for module paths. Here is the full inventory, grouped by category. ### Core Traits & Types | Type | Kind | Description | |------|------|-------------| | `Node` | trait | Base trait for all computation units | | `NodeState` | enum | Running, Paused, Stopped, Error | | `HealthStatus` | enum | Node health reporting | | `LogSummary` | trait | Structured logging for nodes | ### Communication | Type | Kind | Description | |------|------|-------------| | `Topic` | struct | Typed pub/sub channel with auto-detected backends | ### Scheduling & Execution | Type | Kind | Description | |------|------|-------------| | `Scheduler` | struct | Node execution orchestrator | | `FailurePolicy` | enum | Fatal, Restart, Skip, Ignore | ### Real-Time & Timing | Type | Kind | Description | |------|------|-------------| | `DurationExt` | trait | Ergonomic duration creation: `200.us()`, `1.ms()` | | `Frequency` | struct | Type-safe frequency: `100.hz()` | | `Miss` | enum | Deadline miss policy: Warn, Skip, SafeMode, Stop | | `RtStats` | struct | Real-time execution statistics | | `Rate` | struct | Fixed-rate loop control | | `Stopwatch` | struct | High-resolution timer | | [`RuntimeParams`](/rust/api/runtime-params) | struct | Runtime parameter store | ### Memory Domain Types | Type | Kind | Description | |------|------|-------------| | `Image` | struct | Pool-backed camera image (zero-copy) | | `DepthImage` | struct | Pool-backed depth image (F32/U16) | | `PointCloud` | struct | Pool-backed 3D point 
cloud (zero-copy) | ### Transform Frame | Type | Kind | Description | |------|------|-------------| | `TransformFrame` | struct | Coordinate frame tree manager | | `TransformFrameConfig` | struct | Frame tree configuration | | `TransformFrameStats` | struct | Frame tree statistics | | `Transform` | struct | 3D rigid transformation | | `TransformQuery` | struct | Transform lookup query | | `TransformQueryFrom` | struct | Transform query builder | | `FrameBuilder` | struct | Fluent frame registration | | `FrameInfo` | struct | Frame metadata | | `timestamp_now()` | fn | Current time in nanoseconds | ### Geometry Messages `Accel`, `AccelStamped`, `Point3`, `Pose2D`, `Pose3D`, `PoseStamped`, `PoseWithCovariance`, `Quaternion`, `TransformStamped`, `Twist`, `TwistWithCovariance`, `Vector3` ### Sensor Messages `BatteryState`, `FluidPressure`, `Illuminance`, `Imu`, `JointState`, `LaserScan`, `MagneticField`, `NavSatFix`, `Odometry`, `RangeSensor`, `Temperature` ### Clock & Time Messages `Clock`, `TimeReference`, `SOURCE_WALL`, `SOURCE_SIM`, `SOURCE_REPLAY` ### Control & Actuator Messages `CmdVel`, `DifferentialDriveCommand`, `JointCommand`, `MotorCommand`, `PidConfig`, `ServoCommand`, `TrajectoryPoint` ### Diagnostics Messages `DiagnosticReport`, `DiagnosticStatus`, `DiagnosticValue`, `EmergencyStop`, `Heartbeat`, `NodeHeartbeat`, `NodeStateMsg`, `ResourceUsage`, `SafetyStatus`, `StatusLevel` ### Vision & Perception Messages `BoundingBox2D`, `BoundingBox3D`, `CameraInfo`, `CompressedImage`, `Detection`, `Detection3D`, `Landmark`, `Landmark3D`, `LandmarkArray`, `PlaneDetection`, `RegionOfInterest`, `SegmentationMask`, `TrackedObject`, `TrackingHeader` ### Navigation Messages `CostMap`, `NavGoal`, `NavPath`, `OccupancyGrid`, `PathPlan` ### Force & Impedance Messages `ForceCommand`, `ImpedanceParameters`, `WrenchStamped` ### Input Messages `JoystickInput`, `KeyboardInput` ### Application Messages `CmdVel`, `GenericMessage` (flexible cross-language messaging, 
`MAX_GENERIC_PAYLOAD = 4096`) ### Type Helpers `Device`, `ImageEncoding`, `PointXYZ`, `PointXYZI`, `PointXYZRGB`, `TensorDtype` ### Actions `Action`, `ActionClient`, `ActionClientBuilder`, `ActionClientNode`, `ActionError`, `ActionResult`, `ActionServerBuilder`, `ActionServerNode`, `CancelResponse`, `ClientGoalHandle`, `GoalId`, `GoalOutcome`, `GoalPriority`, `GoalResponse`, `GoalStatus`, `PreemptionPolicy`, `ServerGoalHandle` ### Services `AsyncServiceClient`, `Service`, `ServiceClient`, `ServiceError`, `ServiceRequest`, `ServiceResponse`, `ServiceResult`, `ServiceServer`, `ServiceServerBuilder` ### Error Handling `Error`, `Result`, `HorusError`, `HorusContext`, `CommunicationError`, `ConfigError`, `MemoryError`, `NodeError`, `NotFoundError`, `ParseError`, `ResourceError`, `SerializationError`, `Severity`, `TimeoutError`, `TransformError`, `ValidationError`, `retry_transient()`, `RetryConfig` ### Macros (with `"macros"` feature) | Macro | Description | |-------|-------------| | `message!` | Define custom message types | | `service!` | Define request/response service types | | `action!` | Define long-running action types (goal/feedback/result) | | `standard_action!` | Pre-built action templates | | `hlog!` | Structured node logging | | `hlog_once!` | Log once per program execution | | `hlog_every!` | Throttled logging | | `node!` | Define node with automatic topic registration | --- ## Version Compatibility | HORUS Version | Rust Edition | MSRV | |---------------|--------------|------| | 0.1.x | 2021 | 1.92.0 | --- ## See Also - [Cargo Feature Flags](/rust/api/feature-flags) - Feature flags across all HORUS crates - [Core Concepts](/concepts/architecture) - Understanding HORUS architecture - [Examples](/rust/examples/basic-examples) - Working code examples - [Message Types](/concepts/message-types) - Standard message reference --- ## Perception Messages Path: /rust/api/perception-messages Description: 3D perception, point cloud, depth, landmark, tracking, and 
segmentation message types # Perception Messages HORUS provides message types for 3D perception tasks. These include: - **Pool-backed RAII types**: `PointCloud`, `DepthImage` — zero-copy shared memory backed, managed via global tensor pool - **Serde types**: `PointField`, `PlaneDetection` — flexible, support serialization - **Zero-copy types**: `PointXYZ`, `Landmark`, `TrackedObject`, `SegmentationMask` — fixed-size, suitable for shared memory transport with zero serialization overhead ## PointCloud Pool-backed RAII point cloud type with zero-copy shared memory transport. PointCloud allocates from a global tensor pool. ```rust // simplified use horus::prelude::*; // Create XYZ point cloud from `points`, an existing set of 10000 XYZ samples prepared elsewhere (3 floats per point) let mut cloud = PointCloud::from_xyz(&points)?; // 10000 points // Copy point data let point_data: Vec<u8> = vec![0; 10000 * 3 * 4]; // 10000 points * 3 floats * 4 bytes cloud.copy_from(&point_data); // Set metadata (method chaining) cloud.set_frame_id("lidar_front").set_timestamp_ns(1234567890); // Access properties println!("Points: {}", cloud.point_count()); println!("Fields per point: {}", cloud.fields_per_point()); println!("Is XYZ: {}", cloud.is_xyz()); // Zero-copy data access let data: &[u8] = cloud.data(); // Extract XYZ coordinates as Vec<[f32; 3]> if let Some(points) = cloud.extract_xyz() { for p in &points[..3] { println!("({:.2}, {:.2}, {:.2})", p[0], p[1], p[2]); } } // Access individual point if let Some(point_bytes) = cloud.point_at(0) { println!("Point 0: {} bytes", point_bytes.len()); } ``` **PointCloud methods:** PointCloud is an RAII type — fields are private, accessed through methods. Mutation methods return `&mut Self` for chaining.
| Method | Returns | Description | |--------|---------|-------------| | `new(num_points, fields_per_point, dtype)` | `Result` | Create point cloud (allocates from global pool) | | `data()` | `&[u8]` | Zero-copy access to raw point data | | `data_mut()` | `&mut [u8]` | Mutable access to point data | | `copy_from(src)` | `&mut Self` | Copy data into point cloud | | `point_at(idx)` | `Option<&[u8]>` | Get bytes for the point at index `idx` | | `extract_xyz()` | `Option<Vec<[f32; 3]>>` | Extract XYZ as float arrays (F32 only) | | `point_count()` | `u64` | Number of points | | `fields_per_point()` | `u32` | Floats per point (3=XYZ, 4=XYZI, 6=XYZRGB) | | `dtype()` | `TensorDtype` | Data type of point components | | `is_xyz()` | `bool` | Whether this is a plain XYZ cloud | | `has_intensity()` | `bool` | Whether cloud has intensity field | | `set_frame_id(id)` | `&mut Self` | Set sensor frame identifier | | `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds | ### PointField Describes a field within point cloud data (serde type, used for custom point formats): ```rust // simplified use horus::prelude::*; // Create custom field let intensity = PointField::new("intensity", 12, TensorDtype::F32, 1); println!("Field: {}, size: {} bytes", intensity.name_str(), intensity.field_size()); ``` **PointField** uses `TensorDtype` (available via prelude) for its `datatype` field. See [TensorDtype values](/rust/api/tensor#tensordtype) for the full list. ## Point Types (Zero-Copy) Fixed-size point types suitable for zero-copy shared memory transport. ### PointXYZ Basic 3D point (12 bytes).
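The 12-byte figure follows directly from three `f32` coordinates at 4 bytes each with no padding. The sketch below demonstrates that with a hypothetical `#[repr(C)]` struct and a byte round trip. `PointXyzSketch` is not the HORUS `PointXYZ` type, and little-endian byte order is an assumption made for the example.

```rust
/// Hypothetical fixed-layout point: three f32 fields, no padding under repr(C).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointXyzSketch {
    x: f32,
    y: f32,
    z: f32,
}

impl PointXyzSketch {
    /// Size of one point in bytes as the compiler lays it out.
    fn size_bytes() -> usize {
        std::mem::size_of::<PointXyzSketch>()
    }

    /// Serialize to 12 bytes, assuming little-endian byte order.
    fn to_le_bytes(self) -> [u8; 12] {
        let mut out = [0u8; 12];
        out[0..4].copy_from_slice(&self.x.to_le_bytes());
        out[4..8].copy_from_slice(&self.y.to_le_bytes());
        out[8..12].copy_from_slice(&self.z.to_le_bytes());
        out
    }

    /// Rebuild a point from 12 little-endian bytes.
    fn from_le_bytes(b: [u8; 12]) -> Self {
        Self {
            x: f32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            y: f32::from_le_bytes([b[4], b[5], b[6], b[7]]),
            z: f32::from_le_bytes([b[8], b[9], b[10], b[11]]),
        }
    }
}
```

A fixed size and the absence of pointers are what make a type like this safe to place in shared memory: any process can read the bytes without a deserialization step.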
```rust // simplified use horus::prelude::*; let point = PointXYZ::new(1.0, 2.0, 3.0); println!("Distance from origin: {:.2}m", point.distance()); let other = PointXYZ::new(4.0, 6.0, 3.0); println!("Distance between: {:.2}m", point.distance_to(&other)); ``` | Field | Type | Description | |-------|------|-------------| | `x` | `f32` | X coordinate (meters) | | `y` | `f32` | Y coordinate (meters) | | `z` | `f32` | Z coordinate (meters) | ### PointXYZRGB 3D point with RGB color (16 bytes). Common for RGB-D cameras like Intel RealSense. ```rust // simplified use horus::prelude::*; let point = PointXYZRGB::new(1.0, 2.0, 3.0, 255, 0, 0); // Red point // Convert from PointXYZ (defaults to white) let xyz = PointXYZ::new(1.0, 2.0, 3.0); let colored = PointXYZRGB::from_xyz(xyz); // Get packed RGB as u32 (0xRRGGBBAA) let packed = point.rgb_packed(); // Convert back to PointXYZ (drop color) let xyz_only = point.xyz(); ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `f32` | Coordinates (meters) | | `r`, `g`, `b` | `u8` | Color components (0-255) | | `a` | `u8` | Alpha/padding (255 default) | ### PointXYZI 3D point with intensity (16 bytes). Common for LiDAR sensors (Velodyne, Ouster, Livox). ```rust // simplified use horus::prelude::*; let point = PointXYZI::new(1.0, 2.0, 3.0, 128.0); // Convert from PointXYZ (zero intensity) let xyz = PointXYZ::new(1.0, 2.0, 3.0); let with_intensity = PointXYZI::from_xyz(xyz); ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `f32` | Coordinates (meters) | | `intensity` | `f32` | Reflectance (typically 0-255) | ### PointCloudHeader Header for array transmission of point clouds (64 bytes). Sent before the point data array via IPC. 
```rust // simplified use horus::prelude::*; // Create header for different point types let header = PointCloudHeader::xyz(10000) .with_frame_id("lidar_front") .with_timestamp(timestamp_ns); // Or for colored points let header = PointCloudHeader::xyzrgb(5000); // Calculate total data size println!("Data size: {} bytes", header.data_size()); ``` | Field | Type | Description | |-------|------|-------------| | `num_points` | `u64` | Number of points | | `point_type` | `u32` | 0=XYZ, 1=XYZRGB, 2=XYZI | | `point_stride` | `u32` | Bytes per point | | `timestamp_ns` | `u64` | Nanoseconds since epoch | | `seq` | `u64` | Sequence number | | `frame_id` | `[u8; 32]` | Sensor/coordinate frame | ## DepthImage Pool-backed RAII depth image with zero-copy shared memory transport. Supports both F32 (meters) and U16 (millimeters) formats. ```rust // simplified use horus::prelude::*; // Create F32 depth image (meters) let mut depth = DepthImage::meters(640, 480)?; // Or U16 depth image (millimeters) let mut depth_u16 = DepthImage::millimeters(640, 480)?; // Set metadata (method chaining) depth.set_frame_id("depth_camera").set_timestamp_ns(1234567890); // Access depth at pixel (always returns meters as f32) if let Some(d) = depth.get_depth(320, 240) { println!("Depth at center: {:.3}m", d); } // Set depth at pixel (value in meters) depth.set_depth(100, 100, 1.5); // Get raw U16 value for millimeter-format images if let Some(mm) = depth_u16.get_depth_u16(320, 240) { println!("Raw depth: {}mm", mm); } // Get statistics (min, max, mean in meters) if let Some((min, max, mean)) = depth.depth_statistics() { println!("Depth range: {:.2}-{:.2}m, mean: {:.2}m", min, max, mean); } // Zero-copy data access let data: &[u8] = depth.data(); println!("Image: {}x{}, {} bytes", depth.width(), depth.height(), data.len()); ``` **DepthImage methods:** DepthImage is an RAII type — fields are private, accessed through methods. Mutation methods return `&mut Self` for chaining. 
| Method | Returns | Description | |--------|---------|-------------| | `new(width, height, dtype)` | `Result` | Create depth image (F32 or U16) | | `data()` | `&[u8]` | Zero-copy access to raw depth data | | `data_mut()` | `&mut [u8]` | Mutable access to depth data | | `get_depth(x, y)` | `Option<f32>` | Get depth in meters at pixel | | `set_depth(x, y, value)` | `&mut Self` | Set depth in meters at pixel | | `get_depth_u16(x, y)` | `Option<u16>` | Get raw U16 value (millimeter format only) | | `depth_statistics()` | `Option<(f32, f32, f32)>` | Get (min, max, mean) of valid depths in meters | | `width()` | `u32` | Image width in pixels | | `height()` | `u32` | Image height in pixels | | `set_frame_id(id)` | `&mut Self` | Set camera frame identifier | | `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds | ## PlaneDetection Detected planar surface. ```rust // simplified use horus::prelude::*; // Create plane detection (floor plane) let coefficients = [0.0, 0.0, 1.0, 0.0]; // ax + by + cz + d = 0 let center = Point3::new(0.0, 0.0, 0.0); let normal = Vector3::new(0.0, 0.0, 1.0); let plane = PlaneDetection::new(coefficients, center, normal) .with_type("floor"); // Check distance from point to plane let test_point = Point3::new(1.0, 2.0, 0.1); let distance = plane.distance_to_point(&test_point); // Check if point is on plane (within tolerance) if plane.contains_point(&test_point, 0.05) { println!("Point is on the plane"); } println!("Plane type: {}", plane.plane_type_str()); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `coefficients` | `[f64; 4]` | Plane equation [a, b, c, d] | | `center` | `Point3` | Plane center point | | `normal` | `Vector3` | Plane normal vector | | `size` | `[f64; 2]` | Plane bounds (width, height) | | `inlier_count` | `u32` | Number of inlier points | | `confidence` | `f32` | Detection confidence (0-1) | | `plane_type` | `[u8; 16]` | Type label ("floor", "wall", etc.)
| | `timestamp_ns` | `u64` | Nanoseconds since epoch | ### PlaneArray Array of detected planes (max 16). | Field | Type | Description | |-------|------|-------------| | `planes` | `[PlaneDetection; 16]` | Array of plane detections | | `count` | `u8` | Number of valid planes | | `frame_id` | `[u8; 32]` | Source sensor frame | | `algorithm` | `[u8; 32]` | Detection algorithm used | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## Landmark (Zero-Copy) 2D landmark/keypoint for pose estimation, facial landmarks, hand tracking. Fixed-size (16 bytes). ```rust // simplified use horus::prelude::*; // Create landmark with visibility let nose = Landmark::new(320.0, 240.0, 0.95, 0); // (x, y, visibility, index) // Create visible landmark (visibility = 1.0) let eye = Landmark::visible(300.0, 220.0, 1); // left_eye // Check visibility if nose.is_visible(0.5) { println!("Nose detected at ({}, {})", nose.x, nose.y); } // Distance between landmarks let distance = nose.distance_to(&eye); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f32` | X coordinate (pixels or normalized 0-1) | | `y` | `f32` | Y coordinate (pixels or normalized 0-1) | | `visibility` | `f32` | Visibility/confidence (0.0-1.0) | | `index` | `u32` | Landmark index (joint ID) | ### Landmark3D 3D landmark for 3D pose estimation, MediaPipe-style landmarks. Fixed-size (20 bytes). ```rust // simplified use horus::prelude::*; let landmark = Landmark3D::new(0.5, 0.3, 0.8, 0.95, 0); // Project to 2D (drops Z) let landmark_2d = landmark.to_2d(); // 3D distance let other = Landmark3D::visible(0.6, 0.4, 0.9, 1); let dist = landmark.distance_to(&other); ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `f32` | Coordinates (meters or normalized) | | `visibility` | `f32` | Visibility/confidence (0.0-1.0) | | `index` | `u32` | Landmark index | ### LandmarkArray Header for landmark array transmission. Fixed-size (40 bytes). 
Includes presets for common formats. ```rust // simplified use horus::prelude::*; // Standard pose estimation formats let coco = LandmarkArray::coco_pose(); // 17 2D landmarks let mp_pose = LandmarkArray::mediapipe_pose(); // 33 3D landmarks let mp_hand = LandmarkArray::mediapipe_hand(); // 21 3D landmarks let mp_face = LandmarkArray::mediapipe_face(); // 468 3D landmarks // Custom array with metadata let header = LandmarkArray::new_2d(17) .with_confidence(0.92) .with_bbox(100.0, 50.0, 200.0, 300.0); println!("Data size: {} bytes", header.data_size()); ``` **COCO Pose landmark indices** (available as constants in `landmark::coco`): | Index | Landmark | Index | Landmark | |-------|----------|-------|----------| | 0 | Nose | 9 | Left wrist | | 1 | Left eye | 10 | Right wrist | | 2 | Right eye | 11 | Left hip | | 3 | Left ear | 12 | Right hip | | 4 | Right ear | 13 | Left knee | | 5 | Left shoulder | 14 | Right knee | | 6 | Right shoulder | 15 | Left ankle | | 7 | Left elbow | 16 | Right ankle | | 8 | Right elbow | | | ## TrackedObject (Zero-Copy) Multi-object tracking result with Kalman filter prediction. Fixed-size (96 bytes). 
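As a rough illustration of what a predicted bounding box means, the sketch below estimates a per-frame velocity from two consecutive boxes and shifts the latest box forward by that velocity. This is only the prediction step of a constant-velocity motion model, not a Kalman filter, and it is not taken from the HORUS implementation. The `BBox` struct and both function names are hypothetical stand-ins.

```rust
/// Hypothetical box type for illustration; top-left corner plus size, in pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Estimate velocity (pixels per frame) from two consecutive boxes.
fn velocity(prev: BBox, cur: BBox) -> (f32, f32) {
    (cur.x - prev.x, cur.y - prev.y)
}

/// Constant-velocity prediction: shift the box by velocity * frames and keep its size.
fn predict(bbox: BBox, vx: f32, vy: f32, frames: f32) -> BBox {
    BBox {
        x: bbox.x + vx * frames,
        y: bbox.y + vy * frames,
        ..bbox
    }
}
```

A full tracker would also maintain uncertainty and correct the prediction when a new detection arrives; this sketch leaves that out.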
```rust // simplified use horus::prelude::*; // Create tracked object let bbox = TrackingBBox::new(100.0, 100.0, 50.0, 50.0); let mut track = TrackedObject::new(1, bbox, 0, 0.95); track.set_class_name("person"); // Track lifecycle assert!(track.is_tentative()); // New tracks start tentative track.confirm(); // Promote to confirmed assert!(track.is_confirmed()); // Update with new detection track.update(TrackingBBox::new(110.0, 105.0, 50.0, 50.0), 0.93); println!("Velocity: ({}, {})", track.velocity_x, track.velocity_y); println!("Speed: {:.1}", track.speed()); println!("Heading: {:.2} rad", track.heading()); // Handle missed detection track.mark_missed(); println!("Predicted position: ({}, {})", track.predicted_bbox.x, track.predicted_bbox.y); ``` **TrackedObject fields:** | Field | Type | Description | |-------|------|-------------| | `bbox` | `TrackingBBox` | Current bounding box | | `predicted_bbox` | `TrackingBBox` | Predicted next position (Kalman) | | `track_id` | `u64` | Persistent tracking ID | | `confidence` | `f32` | Detection confidence (0.0-1.0) | | `class_id` | `u32` | Class ID | | `velocity_x`, `velocity_y` | `f32` | Velocity (pixels/frame or m/s) | | `accel_x`, `accel_y` | `f32` | Acceleration | | `age` | `u32` | Frames since first detection | | `hits` | `u32` | Frames with detection | | `time_since_update` | `u32` | Consecutive frames without detection | | `state` | `u32` | 0=tentative, 1=confirmed, 2=deleted | | `class_name` | `[u8; 16]` | Class name (max 15 chars) | **TrackedObject methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(track_id, bbox, class_id, confidence)` | `TrackedObject` | Create a new tracked object | | `class_name()` | `&str` | Get class name as string | | `set_class_name(name)` | — | Set class name | | `is_tentative()` | `bool` | True if track is tentative (state=0) | | `is_confirmed()` | `bool` | True if track is confirmed (state=1) | | `is_deleted()` | `bool` | True if track is deleted 
(state=2) | | `confirm()` | — | Confirm the track | | `delete()` | — | Mark for deletion | | `update(bbox, confidence)` | — | Update with new detection | | `mark_missed()` | — | Mark a missed frame | | `speed()` | `f32` | Estimated speed | | `heading()` | `f32` | Estimated heading (radians) | **TrackingHeader methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(num_tracks, frame_id)` | `TrackingHeader` | Create a new header | | `with_timestamp(timestamp_ns)` | `Self` | Builder: set timestamp | | `data_size()` | `usize` | Size of tracking data buffer | **TrackingBBox fields (16 bytes):** | Field | Type | Description | |-------|------|-------------| | `x`, `y` | `f32` | Top-left corner (pixels) | | `width`, `height` | `f32` | Dimensions (pixels) | **TrackingHeader fields (32 bytes):** | Field | Type | Description | |-------|------|-------------| | `num_tracks` | `u32` | Number of tracked objects | | `frame_id` | `u32` | Frame number | | `timestamp_ns` | `u64` | Nanoseconds since epoch | | `total_tracks` | `u64` | Total tracks created | | `active_tracks` | `u32` | Active confirmed tracks | ## SegmentationMask (Zero-Copy) Header for segmentation masks. Fixed-size (64 bytes). The mask pixel data follows the header as a raw byte array. 
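To make the "raw byte array" idea concrete, here is a minimal sketch of reading such a buffer, assuming a semantic mask stored row-major with one byte (the class ID) per pixel. The row-major layout and both helper names are assumptions for illustration, not part of the documented API.

```rust
/// Look up the class ID at pixel (x, y) in a row-major, one-byte-per-pixel mask.
/// Returns None if the pixel is out of bounds or the buffer is too short.
fn class_at(mask: &[u8], width: u32, height: u32, x: u32, y: u32) -> Option<u8> {
    if x >= width || y >= height {
        return None;
    }
    let idx = y as usize * width as usize + x as usize;
    mask.get(idx).copied()
}

/// Count how many pixels carry the given class ID.
fn count_class(mask: &[u8], class_id: u8) -> usize {
    mask.iter().filter(|&&c| c == class_id).count()
}
```

Dividing `count_class` by `width * height` gives the fraction of the image covered by a class, which is a common quick check on segmentation output.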
```rust // simplified use horus::prelude::*; // Semantic segmentation (class ID per pixel) let mask = SegmentationMask::semantic(1920, 1080, 80) .with_frame_id("camera_front") .with_timestamp(timestamp_ns); println!("Data size: {} bytes", mask.data_size()); // 1920*1080 // Instance segmentation let mask = SegmentationMask::instance(640, 480); // Panoptic segmentation let mask = SegmentationMask::panoptic(640, 480, 80); // Check type if mask.is_panoptic() { println!("Panoptic mask, u16 data: {} bytes", mask.data_size_u16()); } ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `width` | `u32` | Image width | | `height` | `u32` | Image height | | `num_classes` | `u32` | Number of classes (semantic/panoptic) | | `mask_type` | `u32` | 0=semantic, 1=instance, 2=panoptic | | `timestamp_ns` | `u64` | Nanoseconds since epoch | | `seq` | `u64` | Sequence number | | `frame_id` | `[u8; 32]` | Camera/coordinate frame | **Common COCO class IDs** (available as constants in `segmentation::classes`): `BACKGROUND(0)`, `PERSON(1)`, `BICYCLE(2)`, `CAR(3)`, `MOTORCYCLE(4)`, `BUS(6)`, `TRAIN(7)`, `TRUCK(8)` ## Perception Pipeline Example ```rust // simplified use horus::prelude::*; struct PerceptionNode { depth_sub: Topic<DepthImage>, cloud_pub: Topic<PointCloud>, fx: f32, fy: f32, cx: f32, cy: f32, } impl Node for PerceptionNode { fn name(&self) -> &str { "PerceptionNode" } fn tick(&mut self) { if let Some(depth) = self.depth_sub.recv() { // Read depth values and build point cloud let w = depth.width(); let h = depth.height(); let num_points = (w * h) as u32; if let Ok(mut cloud) = PointCloud::new(num_points, 3, TensorDtype::F32) { // Depth-to-pointcloud conversion would fill cloud data here // using camera intrinsics (fx, fy, cx, cy) self.cloud_pub.send(cloud); } } } } ``` ## PointField Describes a single field within a point cloud's point structure. Used to define custom point formats beyond XYZ.
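The byte offsets in a custom point format follow from the sizes of the fields that come before each one. The sketch below computes them for a tightly packed layout (no padding between fields). `FieldSpec` and `packed_layout` are hypothetical names used only for this illustration; they are not HORUS types, and a real sensor format may include padding.

```rust
/// Hypothetical field description for illustration only; not the HORUS `PointField` type.
#[derive(Debug, Clone, PartialEq)]
struct FieldSpec {
    name: &'static str,
    offset: u32,    // byte offset within one point
    elem_size: u32, // bytes per element (4 for a 32-bit float)
    count: u32,     // number of elements in the field
}

/// Lay out fields back to back with no padding.
/// Input tuples are (name, elem_size, count).
/// Returns the placed fields and the total bytes per point (the stride).
fn packed_layout(fields: &[(&'static str, u32, u32)]) -> (Vec<FieldSpec>, u32) {
    let mut offset = 0u32;
    let mut specs = Vec::with_capacity(fields.len());
    for &(name, elem_size, count) in fields {
        specs.push(FieldSpec { name, offset, elem_size, count });
        offset += elem_size * count;
    }
    (specs, offset)
}
```

For four 4-byte scalar fields (`x`, `y`, `z`, `intensity`) this yields offsets 0, 4, 8 and 12 with a 16-byte stride.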
**Fields:** | Field | Type | Description | |-------|------|-------------| | `name` | `[u8; 16]` | Field name (e.g., "x", "y", "z", "intensity", "rgb") | | `offset` | `u32` | Byte offset within the point data structure | | `datatype` | `TensorDtype` | Data type (F32, U8, etc.) | | `count` | `u32` | Number of elements (1 for scalar, 3 for vector) | ```rust // simplified use horus::prelude::*; let fields = vec![ PointField::new("x", 0, TensorDtype::F32, 1), PointField::new("y", 4, TensorDtype::F32, 1), PointField::new("z", 8, TensorDtype::F32, 1), PointField::new("intensity", 12, TensorDtype::F32, 1), ]; ``` ## PlaneArray Collection of detected planes (up to 16). Typically output from plane segmentation algorithms (RANSAC, region growing). **Fields:** | Field | Type | Description | |-------|------|-------------| | `planes` | `[PlaneDetection; 16]` | Array of detected planes | | `count` | `u8` | Number of valid planes (0-16) | | `frame_id` | `[u8; 32]` | Source sensor coordinate frame | | `algorithm` | `[u8; 32]` | Detection algorithm name | | `timestamp_ns` | `u64` | Timestamp in nanoseconds since epoch | ## TrackingHeader Metadata header for multi-object tracking. Accompanies tracked object lists with frame-level statistics. 
**Fields:** | Field | Type | Description | |-------|------|-------------| | `num_tracks` | `u32` | Number of tracked objects in this frame | | `frame_id` | `u32` | Sequential frame number | | `timestamp_ns` | `u64` | Timestamp in nanoseconds since epoch | | `total_tracks` | `u64` | Total tracks ever created (for unique ID generation) | | `active_tracks` | `u32` | Currently confirmed active tracks | ```rust // simplified use horus::prelude::*; let header = TrackingHeader::new(5, 42); // 5 tracks, frame 42 println!("Active: {}, Total created: {}", header.active_tracks, header.total_tracks); ``` ## See Also - [Vision Messages](/rust/api/vision-messages) - Image, CameraInfo, Detection, Detection3D - [Message Types](/concepts/message-types) - Standard message type overview - [Sensor Messages](/rust/api/sensor-messages) - Sensor data types --- ## Force & Haptic Messages Path: /rust/api/force-messages Description: Force sensing, impedance control, and haptic feedback # Force & Haptic Messages HORUS provides message types for force/torque sensors, impedance control, and haptic feedback systems commonly used in manipulation tasks. **Re-exported types** (available via `use horus::prelude::*`): `WrenchStamped`, `ImpedanceParameters`, `ForceCommand`. **Non-re-exported types** (require direct import): `ContactInfo`, `ContactState`, `HapticFeedback`, `TactileArray` — import from `horus_library::messages::force::*`. ## WrenchStamped 6-DOF force and torque measurement from force/torque sensors. 
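Force/torque readings are usually noisy, which is why a low-pass filter is listed among the `WrenchStamped` methods. As a hedged sketch of the general technique, the code below applies first-order exponential smoothing to a 3-vector, with `alpha` weighting the new sample so that a small `alpha` means heavier smoothing. Whether HORUS uses exactly this formula is an assumption; `Vec3` and `low_pass` are hypothetical names.

```rust
/// Hypothetical 3-vector for illustration; HORUS provides its own `Vector3`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    /// Euclidean norm of the vector.
    fn magnitude(&self) -> f64 {
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }
}

/// Exponential low-pass filter: blend the new sample with the previous output.
/// `alpha` is clamped to [0, 1]; smaller values smooth more heavily.
fn low_pass(current: Vec3, previous: Vec3, alpha: f64) -> Vec3 {
    let a = alpha.clamp(0.0, 1.0);
    Vec3 {
        x: a * current.x + (1.0 - a) * previous.x,
        y: a * current.y + (1.0 - a) * previous.y,
        z: a * current.z + (1.0 - a) * previous.z,
    }
}
```

With `alpha = 1.0` the output equals the new sample; with `alpha = 0.0` the output never changes.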
```rust // simplified use horus::prelude::*; // Create wrench measurement let force = Vector3::new(10.0, 5.0, -2.0); // Newtons let torque = Vector3::new(0.1, 0.2, 0.05); // Newton-meters let wrench = WrenchStamped::new(force, torque) .with_frame_id("tool0"); // Check magnitudes println!("Force magnitude: {:.2} N", wrench.force_magnitude()); println!("Torque magnitude: {:.3} Nm", wrench.torque_magnitude()); // Safety check let max_force = 50.0; // N let max_torque = 5.0; // Nm if wrench.exceeds_limits(max_force, max_torque) { println!("Safety limits exceeded!"); } // Create from force only let force_only = WrenchStamped::force_only(Vector3::new(0.0, 0.0, -10.0)); // Create from torque only let torque_only = WrenchStamped::torque_only(Vector3::new(0.0, 0.0, 0.5)); // Low-pass filter noisy sensor readings let mut current = wrench; current.filter(&previous_wrench, 0.1); // alpha = 0.1 (heavy filtering) ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `force` | `Vector3` | N | Force [fx, fy, fz] | | `torque` | `Vector3` | Nm | Torque [tx, ty, tz] | | `point_of_application` | `Point3` | m | Force application point | | `frame_id` | `[u8; 32]` | — | Reference frame | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | > **ROS2 equivalent:** `geometry_msgs/msg/WrenchStamped` **Methods:** | Method | Description | |--------|-------------| | `new(force, torque)` | Create a new wrench measurement | | `force_only(force)` | Create from force only (zero torque) | | `torque_only(torque)` | Create from torque only (zero force) | | `with_frame_id(frame_id)` | Set reference frame (builder pattern) | | `force_magnitude()` | Get force vector magnitude | | `torque_magnitude()` | Get torque vector magnitude | | `exceeds_limits(max_force, max_torque)` | Check if limits exceeded | | `filter(&prev_wrench, alpha)` | Apply low-pass filter (alpha 0.0-1.0) | ## ImpedanceParameters Impedance control parameters for compliant manipulation. 
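For readers new to impedance control, the sketch below shows the textbook spring-damper law that stiffness and damping parameters typically feed: per axis, `F = K * position_error + D * velocity_error`, clamped to a limit. It omits the inertia term and is not taken from the HORUS controller; the function name and argument layout are assumptions for illustration.

```rust
/// Per-axis spring-damper law with output clamping.
/// Axes are ordered [x, y, z, rx, ry, rz]; errors are (desired - measured).
fn impedance_wrench(
    stiffness: &[f64; 6],
    damping: &[f64; 6],
    limits: &[f64; 6],
    pos_err: &[f64; 6],
    vel_err: &[f64; 6],
) -> [f64; 6] {
    let mut out = [0.0; 6];
    for i in 0..6 {
        let f = stiffness[i] * pos_err[i] + damping[i] * vel_err[i];
        // Use the absolute limit so a negative entry cannot invert the clamp range.
        let lim = limits[i].abs();
        out[i] = f.clamp(-lim, lim);
    }
    out
}
```

Lower stiffness produces a smaller restoring force for the same position error, which is what makes a compliant preset gentler on contact.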
```rust // simplified use horus::prelude::*; // Default impedance (moderate compliance) let mut impedance = ImpedanceParameters::new(); // Compliant mode (low stiffness - for delicate tasks) let compliant = ImpedanceParameters::compliant(); // stiffness: [100, 100, 100, 10, 10, 10] // damping: [20, 20, 20, 2, 2, 2] // Stiff mode (high stiffness - for precision tasks) let stiff = ImpedanceParameters::stiff(); // stiffness: [5000, 5000, 5000, 500, 500, 500] // damping: [100, 100, 100, 10, 10, 10] // Enable/disable impedance.enable(); impedance.disable(); // Custom parameters impedance.stiffness = [500.0, 500.0, 200.0, 50.0, 50.0, 50.0]; // [Kx, Ky, Kz, Krx, Kry, Krz] impedance.damping = [30.0, 30.0, 20.0, 3.0, 3.0, 3.0]; // [Dx, Dy, Dz, Drx, Dry, Drz] impedance.force_limits = [50.0, 50.0, 30.0, 5.0, 5.0, 5.0]; // Safety limits ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `stiffness` | `[f64; 6]` | N/m, Nm/rad | Stiffness [Kx, Ky, Kz, Krx, Kry, Krz] | | `damping` | `[f64; 6]` | Ns/m, Nms/rad | Damping [Dx, Dy, Dz, Drx, Dry, Drz] | | `inertia` | `[f64; 6]` | kg, kgm² | Virtual inertia | | `force_limits` | `[f64; 6]` | N, Nm | Force/torque limits | | `enabled` | `u8` | — | Impedance control active (0 = off, 1 = on) | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new()` | Create with default moderate stiffness/damping | | `compliant()` | Create with low stiffness for delicate tasks | | `stiff()` | Create with high stiffness for precision tasks | | `enable()` | Enable impedance control | | `disable()` | Disable impedance control | ## ForceCommand Hybrid force/position control command. 
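The core idea of hybrid control is a per-axis selection: some axes track a force target while the others track a position target. The sketch below illustrates that selection with two proportional gains. The gains `kf` and `kp`, the function name, and the use of plain proportional control are all assumptions for illustration, not the HORUS control law.

```rust
/// Per-axis hybrid selection.
/// Axes flagged 1 in `force_mode` use the force error; all other axes use the position error.
/// Returns one effort value per axis, ordered [x, y, z, rx, ry, rz].
fn hybrid_effort(
    force_mode: &[u8; 6],
    force_err: &[f64; 6],
    pos_err: &[f64; 6],
    kf: f64,
    kp: f64,
) -> [f64; 6] {
    let mut out = [0.0; 6];
    for i in 0..6 {
        out[i] = if force_mode[i] == 1 {
            kf * force_err[i]
        } else {
            kp * pos_err[i]
        };
    }
    out
}
```

With `force_mode = [0, 0, 1, 0, 0, 0]`, only the Z axis responds to the force error, which matches the pattern of pressing on a surface while positioning in X and Y.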
```rust // simplified use horus::prelude::*; // Pure force command let force_cmd = ForceCommand::force_only(Vector3::new(0.0, 0.0, -5.0)); // 5N downward // Hybrid force/position control // Force control on Z axis, position control on X/Y let force_axes: [u8; 6] = [0, 0, 1, 0, 0, 0]; // 1=force, 0=position per axis let target_force = Vector3::new(0.0, 0.0, -10.0); let target_position = Vector3::new(0.5, 0.3, 0.0); let hybrid_cmd = ForceCommand::hybrid(force_axes, target_force, target_position); // Surface contact following let surface_normal = Vector3::new(0.0, 0.0, 1.0); // Horizontal surface let contact_force = 5.0; // 5N contact force let surface_cmd = ForceCommand::surface_contact(contact_force, surface_normal); // Set timeout let cmd_with_timeout = force_cmd.with_timeout(5.0); // 5 second timeout ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `target_force` | `Vector3` | N | Desired force | | `target_torque` | `Vector3` | Nm | Desired torque | | `force_mode` | `[u8; 6]` | — | 1 = force control, 0 = position control per axis | | `position_setpoint` | `Vector3` | m | Position target for position-controlled axes | | `orientation_setpoint` | `Vector3` | rad | Orientation target (Euler angles) | | `max_deviation` | `Vector3` | m | Maximum position deviation | | `gains` | `[f64; 6]` | — | Control gains | | `timeout_seconds` | `f64` | s | Command timeout (0 = no timeout) | | `frame_id` | `[u8; 32]` | — | Reference frame | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | > **ROS2 equivalent:** `geometry_msgs/msg/Wrench` (force/torque portion) **Methods:** | Method | Description | |--------|-------------| | `force_only(target_force)` | Create pure force command (all axes force-controlled) | | `hybrid(force_axes, target_force, target_position)` | Create hybrid force/position command | | `surface_contact(normal_force, surface_normal)` | Create surface following command | | `with_timeout(seconds)` | Set command 
timeout (builder pattern) | ## ContactInfo Contact detection and classification. > **Note:** `ContactInfo` and `ContactState` are not re-exported in the prelude. Import directly: `use horus_library::messages::force::{ContactInfo, ContactState};` ```rust // simplified use horus_library::messages::force::{ContactInfo, ContactState}; // Create contact info let contact = ContactInfo::new(ContactState::StableContact, 15.0); // 15N force // Check contact state if contact.is_in_contact() { println!("Contact force: {:.1} N", contact.contact_force); println!("Duration: {:.2}s", contact.contact_duration_seconds()); println!("Confidence: {:.0}%", contact.confidence * 100.0); } ``` **ContactState values:** | State | Value | Description | |-------|-------|-------------| | `NoContact` | 0 | No contact detected (default) | | `InitialContact` | 1 | First contact moment | | `StableContact` | 2 | Established contact | | `ContactLoss` | 3 | Contact being broken | | `Sliding` | 4 | Sliding contact | | `Impact` | 5 | Impact detected | **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `state` | `u8` | — | Contact state (use `ContactState as u8` to set) | | `contact_force` | `f64` | N | Contact force magnitude | | `contact_normal` | `Vector3` | — | Contact normal vector (estimated, unit vector) | | `contact_point` | `Point3` | m | Contact point (estimated) | | `stiffness` | `f64` | N/m | Contact stiffness (estimated) | | `damping` | `f64` | Ns/m | Contact damping (estimated) | | `confidence` | `f32` | — | Detection confidence (0.0 to 1.0) | | `contact_start_time` | `u64` | ns | Time contact was first detected | | `frame_id` | `[u8; 32]` | — | Reference frame | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new(state, force_magnitude)` | Create new contact info (takes `ContactState`, stores as `u8`) | | `is_in_contact()` | Check if currently in contact (InitialContact, 
StableContact, or Sliding) | | `contact_duration_seconds()` | Get contact duration in seconds | ## HapticFeedback Haptic feedback commands for user interfaces. > **Note:** `HapticFeedback` is not re-exported in the prelude. Import directly: `use horus_library::messages::force::HapticFeedback;` ```rust // simplified use horus::prelude::*; use horus_library::messages::force::HapticFeedback; // Vibration feedback let vibration = HapticFeedback::vibration( 0.8, // intensity (0-1) 250.0, // frequency (Hz) 0.5 // duration (seconds) ); // Force feedback let force = HapticFeedback::force( Vector3::new(1.0, 0.0, 0.0), // Force direction 1.0 // Duration (seconds) ); // Pulse pattern let pulse = HapticFeedback::pulse( 0.6, // intensity 100.0, // frequency 0.3 // duration ); ``` **Pattern Types:** | Constant | Value | Description | |----------|-------|-------------| | `PATTERN_CONSTANT` | 0 | Constant intensity | | `PATTERN_PULSE` | 1 | Pulsing pattern | | `PATTERN_RAMP` | 2 | Ramping intensity | **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `vibration_intensity` | `f32` | — | Vibration intensity (0.0 to 1.0) | | `vibration_frequency` | `f32` | Hz | Vibration frequency | | `duration_seconds` | `f32` | s | Duration of feedback | | `force_feedback` | `Vector3` | N | Force feedback vector | | `pattern_type` | `u8` | — | Feedback pattern (see constants) | | `enabled` | `u8` | — | Enable/disable feedback (0 = off, 1 = on) | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `vibration(intensity, frequency, duration)` | Create vibration feedback (intensity clamped to 0-1) | | `force(force, duration)` | Create force feedback | | `pulse(intensity, frequency, duration)` | Create pulse pattern feedback | ## Force Control Node Example ```rust // simplified use horus::prelude::*; struct ForceControlNode { wrench_sub: Topic<WrenchStamped>, cmd_pub: Topic<ForceCommand>, impedance_pub: Topic<ImpedanceParameters>, target_force:
f64, prev_wrench: Option<WrenchStamped>, } impl Node for ForceControlNode { fn name(&self) -> &str { "ForceControl" } fn tick(&mut self) { if let Some(mut wrench) = self.wrench_sub.recv() { // Apply low-pass filter if let Some(prev) = &self.prev_wrench { wrench.filter(prev, 0.2); } self.prev_wrench = Some(wrench); // Safety check if wrench.exceeds_limits(100.0, 10.0) { // Switch to compliant mode let compliant = ImpedanceParameters::compliant(); self.impedance_pub.send(compliant); return; } // Force control to maintain target contact force let error = self.target_force - wrench.force.z; let correction = error * 0.001; // Simple P control let cmd = ForceCommand::force_only( Vector3::new(0.0, 0.0, self.target_force + correction) ); self.cmd_pub.send(cmd); } } } ``` ## ContactState Represents the phase of a contact event during force-controlled manipulation. | Variant | Value | Description | |---------|-------|-------------| | `NoContact` | 0 | No contact detected | | `InitialContact` | 1 | First moment of contact detected | | `StableContact` | 2 | Stable, sustained contact | | `ContactLoss` | 3 | Contact is being lost | | `Sliding` | 4 | Sliding contact (tangential motion) | | `Impact` | 5 | High-force impact detected | ```rust // simplified use horus::prelude::*; use horus_library::messages::force::ContactState; if let Some(contact) = contact_sub.recv() { match contact.state { s if s == ContactState::Impact as u8 => { // Emergency: zero out force immediately let zero_cmd = ForceCommand::force_only(Vector3::new(0.0, 0.0, 0.0)); cmd_pub.send(zero_cmd); } s if s == ContactState::StableContact as u8 => { // Safe to apply task forces } _ => {} } } ``` ## TactileArray Tactile sensor array reading for contact-rich manipulation. Each taxel reports a force value in Newtons arranged in a row-major grid. > **Note:** `TactileArray` uses `Vec` internally, so it is **not** zero-copy POD — it goes through the serde serialization path.
Import directly: `use horus_library::messages::force::TactileArray;` ```rust // simplified use horus_library::messages::force::TactileArray; // Create a 4x4 tactile sensor pad let mut tactile = TactileArray::new(4, 4); // Set individual taxel readings tactile.set_force(1, 2, 3.5); // row 1, col 2 = 3.5 N // Read a taxel if let Some(force) = tactile.get_force(1, 2) { println!("Force at (1,2): {:.2} N", force); } // Aggregate contact info tactile.in_contact = true; tactile.total_force = [0.0, 0.0, 12.5]; // Net force vector tactile.center_of_pressure = [0.45, 0.52]; // Normalized [0..1] tactile.physical_size = [0.04, 0.04]; // 4cm × 4cm sensor ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `rows` | `u32` | — | Number of taxel rows | | `cols` | `u32` | — | Number of taxel columns | | `forces` | `Vec` | N | Row-major taxel readings (length = rows × cols) | | `total_force` | `[f32; 3]` | N | Net contact force [fx, fy, fz] | | `center_of_pressure` | `[f32; 2]` | — | Normalized CoP [0.0, 1.0] (row, col) | | `in_contact` | `bool` | — | Whether any contact is detected | | `physical_size` | `[f32; 2]` | m | Sensor surface dimensions [width, height] | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new(rows, cols)` | Create array with given dimensions, all forces zeroed | | `get_force(row, col) -> Option` | Read taxel at (row, col), None if out of bounds | | `set_force(row, col, force)` | Set taxel at (row, col), no-op if out of bounds | ## See Also - [Geometry Messages](/rust/api/geometry-messages) - Vector3, Point3 - [Control Messages](/rust/api/control-messages) - Motor and actuator commands --- ## Control Messages Path: /rust/api/control-messages Description: Motor control, servo, PID, trajectory, and joint command messages # Control Messages HORUS provides comprehensive message types for controlling motors, servos, and multi-joint robotic systems. 
All control messages are fixed-size types optimized for zero-copy shared memory transport at ~50ns latency. ## MotorCommand Direct motor control with multiple control modes. ```rust // simplified use horus::prelude::*; // Provides control::MotorCommand; // Velocity control let vel_cmd = MotorCommand::velocity(0, 1.5); // motor_id=0, 1.5 rad/s // Position control with max velocity let pos_cmd = MotorCommand::position(0, 3.14, 2.0); // motor_id=0, 3.14 rad, max 2 rad/s // Stop motor let stop_cmd = MotorCommand::stop(0); // Check validity if vel_cmd.is_valid() { println!("Target: {:.2}", vel_cmd.target); } ``` **Control Modes:** | Constant | Value | Description | |----------|-------|-------------| | `MODE_VELOCITY` | 0 | Velocity control (rad/s) | | `MODE_POSITION` | 1 | Position control (rad) | | `MODE_TORQUE` | 2 | Torque control (Nm) | | `MODE_VOLTAGE` | 3 | Direct voltage control | **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `motor_id` | `u8` | — | Motor identifier (0-255) | | `mode` | `u8` | — | Control mode (see constants above) | | `target` | `f64` | mode-dependent | rad/s (velocity), rad (position), Nm (torque), V (voltage) | | `max_velocity` | `f64` | rad/s | Velocity limit in position mode. 0.0 = no limit | | `max_acceleration` | `f64` | rad/s² | Acceleration limit. 0.0 = no limit | | `feed_forward` | `f64` | — | Feed-forward term | | `enable` | `u8` | — | Motor enabled (1=enabled, 0=disabled) | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | ## DifferentialDriveCommand Commands for differential drive robots.
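Converting a body twist into wheel speeds relies on standard differential-drive kinematics. The sketch below shows that conversion under the usual convention that positive angular velocity turns the robot counter-clockwise, so the right wheel spins faster. It is an illustration of the math, not the HORUS implementation; the function name is hypothetical.

```rust
/// Convert a body twist to (left, right) wheel angular velocities in rad/s.
/// `linear` is in m/s, `angular` in rad/s, `wheel_base` and `wheel_radius` in meters.
/// Returns None if `wheel_radius` is not a positive finite number.
fn twist_to_wheels(
    linear: f64,
    angular: f64,
    wheel_base: f64,
    wheel_radius: f64,
) -> Option<(f64, f64)> {
    if !(wheel_radius.is_finite() && wheel_radius > 0.0) {
        return None;
    }
    // Each wheel's linear speed differs from the body speed by angular * (wheel_base / 2).
    let half = angular * wheel_base / 2.0;
    let left = (linear - half) / wheel_radius;
    let right = (linear + half) / wheel_radius;
    Some((left, right))
}
```

For 0.5 m/s forward, 0.2 rad/s rotation, a 0.5 m wheel base and 0.1 m wheels, this gives roughly 4.5 rad/s on the left wheel and 5.5 rad/s on the right.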
```rust // simplified use horus::prelude::*; // Provides control::DifferentialDriveCommand; // Direct wheel velocities let cmd = DifferentialDriveCommand::new(1.0, 1.2); // left=1.0, right=1.2 rad/s // From linear and angular velocities let wheel_base = 0.5; // 50cm between wheels let wheel_radius = 0.1; // 10cm wheels let cmd = DifferentialDriveCommand::from_twist( 0.5, // 0.5 m/s forward 0.2, // 0.2 rad/s rotation wheel_base, wheel_radius ); // Stop let stop = DifferentialDriveCommand::stop(); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `left_velocity` | `f64` | Left wheel velocity (rad/s) | | `right_velocity` | `f64` | Right wheel velocity (rad/s) | | `max_acceleration` | `f64` | Acceleration limit (rad/s²) | | `enable` | `u8` | Motors enabled (1=enabled, 0=disabled) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | **Methods:** | Method | Description | |--------|-------------| | `new(left, right)` | Create with wheel velocities (rad/s) | | `from_twist(linear, angular, wheel_base, wheel_radius)` | Create from linear/angular velocity | | `stop()` | Stop both motors | | `is_valid()` | Check if all values are finite | ## ServoCommand Position-controlled servo commands. 
```rust
// simplified
use horus::prelude::*; // Provides control::ServoCommand;

// Position command (radians)
let cmd = ServoCommand::new(0, 1.57); // servo_id=0, 90 degrees

// With specific speed (0-1)
let cmd = ServoCommand::with_speed(0, 1.57, 0.3); // 30% speed

// From degrees
let cmd = ServoCommand::from_degrees(0, 90.0);

// Disable servo (release torque)
let disable = ServoCommand::disable(0);
```

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `servo_id` | `u8` | Servo identifier |
| `position` | `f32` | Target position (radians) |
| `speed` | `f32` | Movement speed (0-1, 0=max speed) |
| `enable` | `u8` | Torque enabled (1=enabled, 0=disabled) |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

## PidConfig

PID controller configuration.

```rust
// simplified
use horus::prelude::*; // Provides control::PidConfig;

// Full PID
let pid = PidConfig::new(1.0, 0.1, 0.05); // Kp, Ki, Kd

// P-only controller
let p_only = PidConfig::proportional(2.0);

// PI controller
let pi = PidConfig::pi(1.5, 0.2);

// PD controller
let pd = PidConfig::pd(1.0, 0.1);

// With limits
let pid_limited = PidConfig::new(1.0, 0.1, 0.05)
    .with_limits(10.0, 100.0); // integral_limit, output_limit

// Validation
if pid.is_valid() {
    println!("Kp={}, Ki={}, Kd={}", pid.kp, pid.ki, pid.kd);
}
```

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `controller_id` | `u8` | Controller identifier |
| `kp` | `f64` | Proportional gain |
| `ki` | `f64` | Integral gain |
| `kd` | `f64` | Derivative gain |
| `integral_limit` | `f64` | Integral windup limit |
| `output_limit` | `f64` | Output saturation limit |
| `anti_windup` | `u8` | Anti-windup enabled (1=enabled, 0=disabled) |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

## TrajectoryPoint

Single point in a trajectory.
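A trajectory is normally consumed by sampling between consecutive points. As a rough illustration of how a `time_from_start` value can be used, the standalone sketch below linearly interpolates position between two points. The `Sample` struct and `interpolate` function are hypothetical stand-ins that mirror only two of the documented fields; HORUS itself may offer a different (or no) interpolation helper.

```rust
/// Hypothetical stand-in mirroring two `TrajectoryPoint` fields.
#[derive(Clone, Copy, Debug)]
struct Sample {
    position: [f64; 3],   // [x, y, z]
    time_from_start: f64, // seconds
}

/// Linearly interpolate the position between `a` and `b` at time `t`.
/// Times outside the segment are clamped to its endpoints.
fn interpolate(a: &Sample, b: &Sample, t: f64) -> [f64; 3] {
    let span = b.time_from_start - a.time_from_start;
    if span <= 0.0 {
        // Degenerate segment: nothing to interpolate over.
        return a.position;
    }
    let alpha = ((t - a.time_from_start) / span).clamp(0.0, 1.0);
    let mut out = [0.0; 3];
    for i in 0..3 {
        out[i] = a.position[i] + alpha * (b.position[i] - a.position[i]);
    }
    out
}

fn main() {
    let a = Sample { position: [0.0, 0.0, 0.0], time_from_start: 0.0 };
    let b = Sample { position: [2.0, 4.0, 0.0], time_from_start: 2.0 };
    println!("{:?}", interpolate(&a, &b, 0.5));
}
```

Linear interpolation ignores the velocity and acceleration fields; a cubic or quintic spline would use them for smoother motion.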
```rust
// simplified
use horus::prelude::*; // Provides control::TrajectoryPoint;

// Simple 2D trajectory point
let point = TrajectoryPoint::new_2d(
    1.0, 2.0, // x, y position
    0.5, 0.3, // vx, vy velocity
    1.5       // time from start (seconds)
);

// Stationary point (x, y, z)
let waypoint = TrajectoryPoint::stationary(1.0, 2.0, 0.0);
```

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `position` | `[f64; 3]` | Position [x, y, z] |
| `velocity` | `[f64; 3]` | Velocity [vx, vy, vz] |
| `acceleration` | `[f64; 3]` | Acceleration [ax, ay, az] |
| `orientation` | `[f64; 4]` | Quaternion [x, y, z, w] |
| `angular_velocity` | `[f64; 3]` | Angular velocity [wx, wy, wz] |
| `time_from_start` | `f64` | Time offset (seconds) |

## JointCommand

Multi-joint command for robot arms and manipulators.

```rust
// simplified
use horus::prelude::*; // Provides control::JointCommand;

let mut cmd = JointCommand::new();

// Add position commands
cmd.add_position("shoulder", 0.5)?;
cmd.add_position("elbow", 1.0)?;
cmd.add_position("wrist", -0.3)?;

// Add velocity commands
cmd.add_velocity("gripper", 0.2)?;
```

**Control Modes:**

| Constant | Value | Description |
|----------|-------|-------------|
| `MODE_POSITION` | 0 | Position control (rad) |
| `MODE_VELOCITY` | 1 | Velocity control (rad/s) |
| `MODE_EFFORT` | 2 | Torque/effort control (Nm) |

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `joint_names` | `[[u8; 32]; 16]` | Joint name strings |
| `joint_count` | `u8` | Number of joints (max 16) |
| `positions` | `[f64; 16]` | Position commands (rad) |
| `velocities` | `[f64; 16]` | Velocity commands (rad/s) |
| `efforts` | `[f64; 16]` | Effort commands (Nm) |
| `modes` | `[u8; 16]` | Control mode per joint |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

## Motor Control Node Example

```rust
// simplified
use horus::prelude::*;

struct MotorDriverNode {
    cmd_sub: Topic<CmdVel>,
    left_motor_pub: Topic<MotorCommand>,
    right_motor_pub: Topic<MotorCommand>,
    wheel_base: f64,
    wheel_radius: f64,
}

impl Node for MotorDriverNode {
    fn name(&self) -> &str { "MotorDriver" }

    fn tick(&mut self) {
        if let Some(cmd) = self.cmd_sub.recv() {
            // Convert CmdVel to differential drive
            let diff = DifferentialDriveCommand::from_twist(
                cmd.linear as f64,
                cmd.angular as f64,
                self.wheel_base,
                self.wheel_radius
            );

            // Send motor velocity commands
            self.left_motor_pub.send(MotorCommand::velocity(0, diff.left_velocity));
            self.right_motor_pub.send(MotorCommand::velocity(1, diff.right_velocity));
        }
    }
}
```

## See Also

- [Navigation Messages](/rust/api/navigation-messages) - Path and goal messages
- [Sensor Messages](/rust/api/sensor-messages) - Sensor feedback data

---

## Sensor Messages

Path: /rust/api/sensor-messages
Description: Lidar, IMU, odometry, GPS, range sensors, and battery state

# Sensor Messages

HORUS provides standard sensor data formats for common robotics sensors including lidar, IMU, GPS, and battery monitoring.

All sensor messages are fixed-size types optimized for zero-copy shared memory transport at ~50ns latency.

## LaserScan

Laser scan data from a 2D lidar sensor. Fixed-size array (360 readings) for shared memory safety. Supports up to 360-degree scanning with 1-degree resolution.
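The per-reading angle and the validity rule for a scan are simple arithmetic, sketched here in standalone Rust. The free functions below are illustrative stand-ins, not the HORUS methods: they assume the angle of reading `i` is `angle_min + i * angle_increment` and that a reading is valid when it is finite and within `[range_min, range_max]`.

```rust
/// Angle (rad) of the reading at `index`, assuming evenly spaced beams.
fn angle_at(angle_min: f32, angle_increment: f32, index: usize) -> f32 {
    angle_min + angle_increment * index as f32
}

/// A reading counts as valid when it is finite and inside the sensor limits.
fn is_range_valid(range: f32, range_min: f32, range_max: f32) -> bool {
    range.is_finite() && range >= range_min && range <= range_max
}

/// Smallest valid reading, or `None` when every reading is invalid.
fn min_valid_range(ranges: &[f32], range_min: f32, range_max: f32) -> Option<f32> {
    ranges
        .iter()
        .copied()
        .filter(|&r| is_range_valid(r, range_min, range_max))
        .fold(None, |acc: Option<f32>, r| Some(acc.map_or(r, |m| m.min(r))))
}

fn main() {
    let increment = 1.0_f32.to_radians(); // 1-degree resolution
    println!("angle at index 90: {:.3} rad", angle_at(0.0, increment, 90));

    // 0.0 means "no return"; 40.0 exceeds the 30 m maximum.
    let ranges = [0.0, 2.5, 1.2, 40.0];
    println!("closest obstacle: {:?}", min_valid_range(&ranges, 0.1, 30.0));
}
```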
```rust
// simplified
use horus::prelude::*;

// Create a new laser scan
let mut scan = LaserScan::new();

// Set scan parameters
scan.angle_min = -std::f32::consts::PI; // -180 degrees
scan.angle_max = std::f32::consts::PI;  // +180 degrees
scan.range_min = 0.1;                   // 10cm minimum
scan.range_max = 30.0;                  // 30m maximum

// Fill in range data (360 readings)
for i in 0..360 {
    scan.ranges[i] = 2.5; // 2.5m reading at all angles
}

// Get angle for a specific reading
let angle = scan.angle_at(90); // 90th reading
println!("Angle at index 90: {:.2} rad", angle);

// Check if a reading is valid
if scan.is_range_valid(45) {
    println!("Reading at 45 is valid: {:.2}m", scan.ranges[45]);
}

// Get statistics
let valid = scan.valid_count();
let min_dist = scan.min_range();
println!("Valid readings: {}, Min distance: {:?}", valid, min_dist);
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `ranges` | `[f32; 360]` | m | Range measurements. 0.0 = invalid/no return |
| `angle_min` | `f32` | rad | Start angle (typically -pi or 0) |
| `angle_max` | `f32` | rad | End angle (typically pi or 2*pi) |
| `range_min` | `f32` | m | Minimum valid range (readings below are noise) |
| `range_max` | `f32` | m | Maximum valid range (readings above are no-return) |
| `angle_increment` | `f32` | rad | Angular step between consecutive ranges |
| `time_increment` | `f32` | s | Time between consecutive measurements |
| `scan_time` | `f32` | s | Duration of complete scan rotation |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `sensor_msgs/msg/LaserScan`
>
> **Valid range check:** `range_min <= ranges[i] <= range_max` (ranges outside are invalid)
>
> **Typical values:** RPLiDAR A1: 360 ranges, angle_min=0, angle_max=2*pi, range_max=12m

**Methods:**

| Method | Description |
|--------|-------------|
| `new()` | Create with default parameters |
| `angle_at(index)` | Get angle for a specific range index |
| `is_range_valid(index)` | Check if a reading is valid |
| `valid_count()` | Count valid range readings |
| `min_range()` | Get minimum valid range reading |

## Imu

IMU (Inertial Measurement Unit) sensor data. Provides orientation, angular velocity, and linear acceleration measurements.

```rust
// simplified
use horus::prelude::*;

// Create new IMU message
let mut imu = Imu::new();

// Set orientation from Euler angles (roll, pitch, yaw)
imu.set_orientation_from_euler(0.0, 0.05, 1.57); // Slight pitch, 90° yaw

// Set angular velocity [x, y, z] in rad/s
imu.angular_velocity = [0.0, 0.0, 0.5]; // Rotating around Z-axis

// Set linear acceleration [x, y, z] in m/s²
imu.linear_acceleration = [0.0, 0.0, 9.81]; // Gravity pointing up

// Check data availability
if imu.has_orientation() {
    println!("Orientation: {:?}", imu.orientation);
}

// Get as Vector3 for calculations
let angular_vel = imu.angular_velocity_vec();
let linear_acc = imu.linear_acceleration_vec();
println!("Angular velocity magnitude: {:.2} rad/s", angular_vel.magnitude());

// Validate data
assert!(imu.is_valid());
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `orientation` | `[f64; 4]` | — | Quaternion [x, y, z, w]. Identity = [0, 0, 0, 1] |
| `orientation_covariance` | `[f64; 9]` | rad² | 3x3 row-major. Set first element to -1 if no orientation data |
| `angular_velocity` | `[f64; 3]` | rad/s | Gyroscope [roll_rate, pitch_rate, yaw_rate] |
| `angular_velocity_covariance` | `[f64; 9]` | (rad/s)² | 3x3 row-major covariance |
| `linear_acceleration` | `[f64; 3]` | m/s² | Accelerometer [x, y, z]. Includes gravity (~9.81 on Z when level) |
| `linear_acceleration_covariance` | `[f64; 9]` | (m/s²)² | 3x3 row-major covariance |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `sensor_msgs/msg/Imu`
>
> **Gravity convention:** `linear_acceleration` includes gravity. A stationary IMU reads ~[0, 0, 9.81]. Subtract gravity for motion-only acceleration.
>
> **Covariance = -1:** Set the first element of a covariance matrix to -1.0 to indicate "no data available" for that measurement.

**Methods:**

| Method | Description |
|--------|-------------|
| `new()` | Create new IMU message |
| `set_orientation_from_euler(roll, pitch, yaw)` | Set orientation from Euler angles |
| `has_orientation()` | Check if orientation data available |
| `is_valid()` | Check if all values are finite |
| `angular_velocity_vec()` | Get angular velocity as `Vector3` |
| `linear_acceleration_vec()` | Get linear acceleration as `Vector3` |

## Odometry

Odometry data combining pose and velocity. Typically computed from wheel encoders or visual odometry, provides the robot's estimated position and velocity.

```rust
// simplified
use horus::prelude::*;

// Create odometry message
let mut odom = Odometry::new();

// Set coordinate frames
odom.set_frames("odom", "base_link");

// Update with current pose and velocity
let pose = Pose2D::new(5.0, 3.0, 0.785); // x, y, theta
let twist = Twist::new_2d(0.5, 0.1);     // linear, angular
odom.update(pose, twist);

// Access pose and velocity
println!("Position: ({:.2}, {:.2})", odom.pose.x, odom.pose.y);
println!("Velocity: {:.2} m/s", odom.twist.linear[0]);

// Validate
assert!(odom.is_valid());
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `pose` | `Pose2D` | m, rad | Current position (x, y) and heading (theta) |
| `twist` | `Twist` | m/s, rad/s | Current linear and angular velocity |
| `pose_covariance` | `[f64; 36]` | mixed | 6x6 row-major: [x, y, z, roll, pitch, yaw] covariance |
| `twist_covariance` | `[f64; 36]` | mixed | 6x6 row-major velocity covariance |
| `frame_id` | `[u8; 32]` | — | Reference frame name (e.g., "odom") |
| `child_frame_id` | `[u8; 32]` | — | Body frame name (e.g., "base_link") |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `nav_msgs/msg/Odometry`

**Methods:**

| Method | Description |
|--------|-------------|
| `new()` | Create new odometry message |
| `set_frames(frame, child_frame)` | Set coordinate frame names |
| `update(pose, twist)` | Update pose and velocity with timestamp |
| `is_valid()` | Check if pose and twist are valid |

## RangeSensor

Single-point range sensor data (ultrasonic, infrared).

```rust
// simplified
use horus::prelude::*;

// Create ultrasonic range reading
let ultrasonic = RangeSensor::new(RangeSensor::ULTRASONIC, 1.5); // 1.5m reading

// Create infrared range reading
let ir = RangeSensor::new(RangeSensor::INFRARED, 0.3); // 30cm reading

// Check if reading is valid (within sensor limits)
if ultrasonic.is_valid() {
    println!("Distance: {:.2}m", ultrasonic.range);
}

// Access sensor parameters
println!("FOV: {:.2} rad", ultrasonic.field_of_view);
println!("Range: {:.2} - {:.2}m", ultrasonic.min_range, ultrasonic.max_range);
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `sensor_type` | `u8` | — | 0 = ultrasonic, 1 = infrared |
| `field_of_view` | `f32` | rad | Sensor field of view cone angle |
| `min_range` | `f32` | m | Minimum valid range |
| `max_range` | `f32` | m | Maximum valid range |
| `range` | `f32` | m | Current range reading |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `sensor_msgs/msg/Range`

**Constants:**

| Constant | Value | Description |
|----------|-------|-------------|
| `RangeSensor::ULTRASONIC` | 0 | Ultrasonic sensor |
| `RangeSensor::INFRARED` | 1 | Infrared sensor |

**Methods:**

| Method | Description |
|--------|-------------|
| `new(sensor_type, range)` | Create with sensor type and reading |
| `is_valid()` | Check if reading is within sensor limits |

## NavSatFix

GPS/GNSS position data. Standard GNSS position data from GPS, GLONASS, Galileo, or other satellite navigation systems.
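The methods table for this type states that `distance_to` uses the Haversine formula. For readers unfamiliar with it, here is a standalone sketch of that formula in plain Rust. The function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of this sketch; the constant HORUS uses is not given in this documentation.

```rust
/// Mean Earth radius in metres (an assumed value for this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in metres between two WGS84 points given in degrees.
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();

    // Haversine of the central angle between the two points.
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);

    // Convert the haversine back to an arc length on the sphere.
    2.0 * EARTH_RADIUS_M * a.sqrt().asin()
}

fn main() {
    // The two coordinates used in the NavSatFix example.
    let d = haversine_m(37.7749, -122.4194, 37.8044, -122.2712);
    println!("distance: {:.0} m", d);
}
```

The formula treats the Earth as a sphere and ignores altitude, which is usually adequate for waypoint-scale distances.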
```rust
// simplified
use horus::prelude::*;

// Create GPS fix from coordinates
let fix = NavSatFix::from_coordinates(
    37.7749,   // Latitude (positive = North)
    -122.4194, // Longitude (positive = East)
    10.5       // Altitude in meters
);

// Check fix status
if fix.has_fix() {
    println!("GPS Fix acquired!");
    println!("Position: {:.6}°N, {:.6}°E", fix.latitude, fix.longitude);
    println!("Altitude: {:.1}m", fix.altitude);
    println!("Satellites: {}", fix.satellites_visible);
}

// Get accuracy estimate
let accuracy = fix.horizontal_accuracy();
println!("Horizontal accuracy: ±{:.1}m", accuracy);

// Calculate distance to another position
let destination = NavSatFix::from_coordinates(37.8044, -122.2712, 0.0);
let distance = fix.distance_to(&destination);
println!("Distance to destination: {:.0}m", distance);

// Validate coordinates
assert!(fix.is_valid());
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `latitude` | `f64` | deg | Latitude (+ = North, - = South). WGS84 |
| `longitude` | `f64` | deg | Longitude (+ = East, - = West). WGS84 |
| `altitude` | `f64` | m | Altitude above WGS84 ellipsoid |
| `position_covariance` | `[f64; 9]` | m² | 3x3 position covariance (ENU frame) |
| `position_covariance_type` | `u8` | — | Covariance type (see constants) |
| `status` | `u8` | — | Fix status (see constants) |
| `satellites_visible` | `u16` | — | Satellite count |
| `hdop` | `f32` | — | Horizontal dilution of precision (lower = better, <2 = good) |
| `vdop` | `f32` | — | Vertical dilution of precision |
| `speed` | `f32` | m/s | Ground speed |
| `heading` | `f32` | deg | Course over ground (0 = North, 90 = East) |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `sensor_msgs/msg/NavSatFix`

**Status Constants:**

| Constant | Value | Description |
|----------|-------|-------------|
| `STATUS_NO_FIX` | 0 | No GPS fix |
| `STATUS_FIX` | 1 | Standard GPS fix |
| `STATUS_SBAS_FIX` | 2 | SBAS-augmented fix |
| `STATUS_GBAS_FIX` | 3 | GBAS-augmented fix |

**Covariance Type Constants:**

| Constant | Value | Description |
|----------|-------|-------------|
| `COVARIANCE_TYPE_UNKNOWN` | 0 | Unknown covariance |
| `COVARIANCE_TYPE_APPROXIMATED` | 1 | Approximated covariance |
| `COVARIANCE_TYPE_DIAGONAL_KNOWN` | 2 | Diagonal elements known |
| `COVARIANCE_TYPE_KNOWN` | 3 | Full covariance matrix known |

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `new()` | `NavSatFix` | Create empty fix |
| `from_coordinates(lat, lon, alt)` | `NavSatFix` | Create from coordinates |
| `has_fix()` | `bool` | Check if valid GPS fix |
| `is_valid()` | `bool` | Check if coordinates are valid |
| `horizontal_accuracy()` | `f32` | Estimate accuracy from HDOP |
| `distance_to(&other)` | `f64` | Calculate distance using Haversine formula |

## BatteryState

Battery status monitoring.
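One plausible way to estimate remaining runtime from the documented fields is to divide the remaining charge (Ah) by the discharge current (A). The standalone sketch below shows that arithmetic only. The function `time_remaining_s` is hypothetical, and this documentation does not say how the real `time_remaining()` method is computed, so treat the approach as an assumption.

```rust
/// Estimated seconds of runtime left, or `None` when no estimate is possible.
/// Hypothetical helper: `charge_ah` is the remaining charge in amp-hours and
/// `current_a` is negative while discharging, as in the fields table.
fn time_remaining_s(charge_ah: f32, current_a: f32) -> Option<f32> {
    // Unknown charge is documented as NaN; a non-negative current means the
    // battery is idle or charging, so there is no discharge time to estimate.
    if !charge_ah.is_finite() || !current_a.is_finite() {
        return None;
    }
    if charge_ah < 0.0 || current_a >= 0.0 {
        return None;
    }
    // hours = Ah / A, then convert hours to seconds.
    Some(charge_ah / -current_a * 3600.0)
}

fn main() {
    match time_remaining_s(5.0, -2.5) {
        Some(s) => println!("about {:.0} s left", s),
        None => println!("no estimate available"),
    }
}
```

A constant-current estimate like this drifts as the load changes, so it is best treated as a rough indicator.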
```rust
// simplified
use horus::prelude::*;

// Create battery state
let mut battery = BatteryState::new(12.6, 85.0); // 12.6V, 85%

// Set additional fields
battery.current = -2.5; // Discharging at 2.5A
battery.temperature = 28.0;
battery.power_supply_status = BatteryState::STATUS_DISCHARGING;

// Check battery level
if battery.is_low(20.0) {
    println!("Battery low!");
}
if battery.is_critical() {
    println!("Battery critical (below 10%)!");
}

// Estimate remaining time
if let Some(remaining) = battery.time_remaining() {
    println!("Estimated time remaining: {:.0} seconds", remaining);
}

println!("Voltage: {:.2}V", battery.voltage);
println!("Charge: {:.0}%", battery.percentage);
println!("Temperature: {:.1}°C", battery.temperature);
```

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `voltage` | `f32` | Voltage in volts |
| `current` | `f32` | Current in amperes (- = discharging) |
| `charge` | `f32` | Charge in amp-hours (NaN if unknown) |
| `capacity` | `f32` | Capacity in amp-hours (NaN if unknown) |
| `percentage` | `f32` | Charge percentage (0-100) |
| `power_supply_status` | `u8` | Status (see constants) |
| `temperature` | `f32` | Temperature in Celsius |
| `cell_voltages` | `[f32; 16]` | Individual cell voltages |
| `cell_count` | `u8` | Number of valid cell readings |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |

**Status Constants:**

| Constant | Value | Description |
|----------|-------|-------------|
| `STATUS_UNKNOWN` | 0 | Unknown status |
| `STATUS_CHARGING` | 1 | Charging |
| `STATUS_DISCHARGING` | 2 | Discharging |
| `STATUS_FULL` | 3 | Fully charged |

**Methods:**

| Method | Description |
|--------|-------------|
| `new(voltage, percentage)` | Create new battery state |
| `is_low(threshold)` | Check if below threshold % |
| `is_critical()` | Check if below 10% |
| `time_remaining()` | Estimate remaining time (seconds) |

## JointState

Joint positions, velocities, and efforts for articulated robots.
```rust
// simplified
use horus::prelude::*;

let mut state = JointState::new();
state.add_joint("shoulder", 0.0, 0.0, 0.1).unwrap();
state.add_joint("elbow", 0.5, 0.0, 0.5).unwrap();

// Look up by name
if let Some(pos) = state.position("shoulder") {
    println!("Shoulder at {:.2} rad", pos);
}
```

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `new()` | `JointState` | Create empty joint state |
| `add_joint(name, position, velocity, effort)` | `Result` | Add a joint with all values |
| `joint_name(index)` | `Option<&str>` | Get joint name by index |
| `position(name)` | `Option` | Get position by joint name |
| `velocity(name)` | `Option` | Get velocity by joint name |
| `effort(name)` | `Option` | Get effort by joint name |

## Environmental Sensors

Simple sensor types with `new()` constructor and `with_frame_id()` builder.

| Type | Constructor | Fields |
|------|------------|--------|
| `MagneticField` | `new(field: [f64; 3])` | `x`, `y`, `z` (Tesla) |
| `Temperature` | `new(temperature: f64)` | `temperature` (°C), `variance` |
| `FluidPressure` | `new(pressure: f64)` | `pressure` (Pa), `variance` |
| `Illuminance` | `new(illuminance: f64)` | `illuminance` (lux), `variance` |

All support `with_frame_id(&str) -> Self` for setting the coordinate frame.
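Frame and joint names appear throughout these messages as fixed `[u8; 32]` buffers, which keeps each message a fixed size. The standalone sketch below shows one common way to store a `&str` in such a buffer and read it back. The helper names, the NUL padding, and the choice to reserve one trailing NUL byte are assumptions of this sketch, not documented HORUS behaviour.

```rust
const FRAME_ID_LEN: usize = 32;

/// Copy `name` into a NUL-padded fixed buffer, truncating if necessary.
fn encode_frame_id(name: &str) -> [u8; FRAME_ID_LEN] {
    let mut buf = [0u8; FRAME_ID_LEN];
    // Keep at least one trailing NUL so readers can find the end.
    let mut len = name.len().min(FRAME_ID_LEN - 1);
    // Back up so truncation never splits a multi-byte UTF-8 character.
    while !name.is_char_boundary(len) {
        len -= 1;
    }
    buf[..len].copy_from_slice(&name.as_bytes()[..len]);
    buf
}

/// Read the string back, stopping at the first NUL byte.
fn decode_frame_id(buf: &[u8; FRAME_ID_LEN]) -> &str {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(FRAME_ID_LEN);
    // Fall back to an empty name if the bytes are not valid UTF-8.
    std::str::from_utf8(&buf[..end]).unwrap_or("")
}

fn main() {
    let buf = encode_frame_id("base_link");
    println!("frame: {}", decode_frame_id(&buf));
}
```

Because the buffer is plain bytes with no pointers, the containing struct can be copied directly through shared memory.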
## Sensor Fusion Example

```rust
// simplified
use horus::prelude::*;

struct SensorFusionNode {
    imu_sub: Topic<Imu>,
    odom_sub: Topic<Odometry>,
    gps_sub: Topic<NavSatFix>,
    fused_pose_pub: Topic<Pose2D>,

    // Extended Kalman Filter state
    ekf_state: [f64; 6], // [x, y, theta, vx, vy, omega]
}

impl Node for SensorFusionNode {
    fn name(&self) -> &str { "SensorFusion" }

    fn tick(&mut self) {
        // Process IMU at highest rate
        if let Some(imu) = self.imu_sub.recv() {
            if imu.is_valid() {
                // Use angular velocity for heading prediction
                let omega = imu.angular_velocity[2];
                self.predict_state(omega);
            }
        }

        // Process odometry
        if let Some(odom) = self.odom_sub.recv() {
            if odom.is_valid() {
                // Update with wheel odometry
                self.update_odometry(&odom);
            }
        }

        // Process GPS (lower rate, absolute position)
        if let Some(gps) = self.gps_sub.recv() {
            if gps.has_fix() && gps.is_valid() {
                // Update with GPS (when available)
                self.update_gps(&gps);
            }
        }

        // Publish fused pose
        let pose = Pose2D::new(
            self.ekf_state[0],
            self.ekf_state[1],
            self.ekf_state[2]
        );
        self.fused_pose_pub.send(pose);
    }
}

impl SensorFusionNode {
    fn predict_state(&mut self, omega: f64) {
        // EKF prediction step using IMU
        let dt = horus::dt(); // actual tick timestep
        self.ekf_state[2] += omega * dt;
    }

    fn update_odometry(&mut self, odom: &Odometry) {
        // EKF update with odometry measurement
        // ... implementation
    }

    fn update_gps(&mut self, gps: &NavSatFix) {
        // EKF update with GPS measurement
        // ... implementation
    }
}
```

## See Also

- [Geometry Messages](/rust/api/geometry-messages) - Pose2D, Twist, TransformStamped
- [Navigation Messages](/rust/api/navigation-messages) - Goals, paths, occupancy grids
- [Perception Messages](/rust/api/perception-messages) - Point clouds, depth data

---

## ML & Segmentation

Path: /rust/api/ml-messages
Description: Segmentation masks and ML inference output types

# ML & Segmentation

The original ML message types were removed in 0.1.10.
For ML inference pipelines, use:

- **Model I/O**: [Tensor](/rust/api/tensor) and [TensorPool](/rust/api/tensor) for zero-copy tensor transport
- **Detection results**: [Detection / Detection3D](/rust/api/vision-messages) with bounding boxes
- **Custom outputs**: [GenericMessage](/rust/api/generic-message) for cross-language data
- **Segmentation**: `SegmentationMask` (below)

## SegmentationMask

Output type for semantic, instance, and panoptic segmentation models. Fixed-size header (64 bytes); the pixel data follows in shared memory.

```rust
// simplified
use horus::prelude::*;

// Semantic segmentation (e.g., from DeepLab, 21 classes)
let mask = SegmentationMask::semantic(640, 480, 21)
    .with_frame_id("camera_front")
    .with_timestamp(1234567890);

// Instance segmentation (e.g., from Mask R-CNN)
let mask = SegmentationMask::instance(640, 480);

// Panoptic segmentation (e.g., from Panoptic-FPN, 80 classes)
let mask = SegmentationMask::panoptic(640, 480, 80);

// Send via topic
let topic: Topic<SegmentationMask> = Topic::new("segmentation")?;
topic.send(mask);
```

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `width` | `u32` | Mask width in pixels |
| `height` | `u32` | Mask height in pixels |
| `num_classes` | `u32` | Number of semantic classes |
| `mask_type` | `u32` | 0=semantic, 1=instance, 2=panoptic |
| `timestamp_ns` | `u64` | Nanoseconds since epoch |
| `seq` | `u64` | Sequence number |
| `frame_id` | `[u8; 32]` | Camera frame identifier |

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `semantic(w, h, num_classes)` | `SegmentationMask` | Create semantic mask header |
| `instance(w, h)` | `SegmentationMask` | Create instance mask header |
| `panoptic(w, h, num_classes)` | `SegmentationMask` | Create panoptic mask header |
| `with_frame_id(id)` | `Self` | Set frame ID |
| `with_timestamp(ts)` | `Self` | Set timestamp |
| `frame_id()` | `&str` | Get frame ID as string |
| `data_size()` | `usize` | Mask data size in bytes (w * h) |

## See Also

- [TensorPool API](/rust/api/tensor) - Zero-copy tensor memory management
- [Vision Messages](/rust/api/vision-messages) - Detection and bounding box types
- [Perception Messages](/rust/api/perception-messages) - Point cloud and depth sensing

---

## Navigation Messages

Path: /rust/api/navigation-messages
Description: Path planning, goals, waypoints, occupancy grids, and cost maps

# Navigation Messages

HORUS provides message types for autonomous navigation, path planning, mapping, and localization systems.

**Re-exported types** (available via `use horus::prelude::*`): `NavGoal`, `NavPath`, `PathPlan`, `OccupancyGrid`, `CostMap`.

**Non-re-exported types** (require direct import): `GoalStatus`, `GoalResult`, `Waypoint`, `VelocityObstacle`, `VelocityObstacles`. Import these from `horus_library::messages::navigation::*`.

## NavGoal

Navigation goal specification with tolerance and timeout.

```rust
// simplified
use horus::prelude::*;

// Create navigation goal
let target = Pose2D::new(5.0, 3.0, 1.57);   // x, y, theta
let goal = NavGoal::new(target, 0.1, 0.05); // 10cm position, 0.05rad angle tolerance

// With timeout and priority
let goal = NavGoal::new(target, 0.1, 0.05)
    .with_timeout(30.0) // 30 second timeout
    .with_priority(0);  // Highest priority

// Check if goal reached
let current_pose = Pose2D::new(5.05, 3.02, 1.55);
if goal.is_reached(&current_pose) {
    println!("Goal reached!");
}

// Check position and orientation separately
if goal.is_position_reached(&current_pose) {
    println!("Position reached, adjusting orientation...");
}
if goal.is_orientation_reached(&current_pose) {
    println!("Orientation reached");
}
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `target_pose` | `Pose2D` | m, rad | Target pose to reach |
| `tolerance_position` | `f64` | m | Position tolerance |
| `tolerance_angle` | `f64` | rad | Orientation tolerance |
| `timeout_seconds` | `f64` | s | Maximum time (0 = no limit) |
| `priority` | `u8` | — | Goal priority (0 = highest) |
| `goal_id` | `u32` | — | Unique goal identifier |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `nav2_msgs/action/NavigateToPose`

**Methods:**

| Method | Description |
|--------|-------------|
| `new(target_pose, tolerance_position, tolerance_angle)` | Create a new navigation goal |
| `with_timeout(seconds)` | Set timeout (builder pattern) |
| `with_priority(priority)` | Set priority (builder pattern) |
| `is_position_reached(&current_pose)` | Check if position is within tolerance |
| `is_orientation_reached(&current_pose)` | Check if orientation is within tolerance |
| `is_reached(&current_pose)` | Check if both position and orientation are reached |

## GoalStatus

Goal execution status enumeration.

> **Note:** `GoalStatus` is not re-exported in the prelude. Import directly: `use horus_library::messages::navigation::GoalStatus;`

```rust
// simplified
use horus_library::messages::navigation::GoalStatus;

let status = GoalStatus::Active;

match status {
    GoalStatus::Pending => println!("Waiting to start"),
    GoalStatus::Active => println!("Moving to goal"),
    GoalStatus::Succeeded => println!("Goal reached!"),
    GoalStatus::Aborted => println!("Navigation failed"),
    GoalStatus::Cancelled => println!("Goal cancelled by user"),
    GoalStatus::Preempted => println!("Higher priority goal received"),
    GoalStatus::TimedOut => println!("Goal timed out"),
}
```

**Status Values:**

| Status | Value | Description |
|--------|-------|-------------|
| `Pending` | 0 | Goal pending execution (default) |
| `Active` | 1 | Actively pursuing goal |
| `Succeeded` | 2 | Goal reached successfully |
| `Aborted` | 3 | Navigation aborted (error) |
| `Cancelled` | 4 | Cancelled by user |
| `Preempted` | 5 | Preempted by higher priority |
| `TimedOut` | 6 | Goal timed out |

## GoalResult

Goal status feedback with progress information.

> **Note:** `GoalResult` is not re-exported in the prelude.
Import directly: `use horus_library::messages::navigation::GoalResult;`

```rust
// simplified
use horus_library::messages::navigation::{GoalResult, GoalStatus};

// Create success result
let result = GoalResult::new(42, GoalStatus::Succeeded);

// Create failure result with error
let error_result = GoalResult::new(42, GoalStatus::Aborted)
    .with_error("Obstacle blocking path");

// Update progress
let mut in_progress = GoalResult::new(42, GoalStatus::Active);
in_progress.distance_to_goal = 2.5; // 2.5m remaining
in_progress.eta_seconds = 5.0;      // 5s estimated
in_progress.progress = 0.75;        // 75% complete

println!("Goal {}: status={}, {:.1}m to go, ETA {:.1}s",
    in_progress.goal_id, in_progress.status,
    in_progress.distance_to_goal, in_progress.eta_seconds);
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `goal_id` | `u32` | — | Goal identifier |
| `status` | `u8` | — | Current status (use `GoalStatus as u8` to set) |
| `distance_to_goal` | `f64` | m | Distance remaining |
| `eta_seconds` | `f64` | s | Estimated time to arrive |
| `progress` | `f32` | — | Progress (0.0 to 1.0) |
| `error_message` | `[u8; 64]` | — | Error message if failed |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

**Methods:**

| Method | Description |
|--------|-------------|
| `new(goal_id, status)` | Create a new result (takes `GoalStatus`, stores as `u8`) |
| `with_error(message)` | Set error message string (builder pattern) |

## Waypoint

Single waypoint in a navigation path.

> **Note:** `Waypoint` is not re-exported in the prelude.
Import directly: `use horus_library::messages::navigation::Waypoint;`

```rust
// simplified
use horus::prelude::*;
use horus_library::messages::navigation::Waypoint;

// Simple waypoint
let wp = Waypoint::new(Pose2D::new(1.0, 2.0, 0.0));

// Waypoint with velocity profile
let wp = Waypoint::new(Pose2D::new(1.0, 2.0, 0.0))
    .with_velocity(Twist::new_2d(0.5, 0.0)); // 0.5 m/s forward

// Waypoint requiring stop (e.g., for pickup)
let stop_wp = Waypoint::new(Pose2D::new(3.0, 4.0, 1.57))
    .with_stop();

// Access properties
println!("Position: ({:.1}, {:.1})", wp.pose.x, wp.pose.y);
println!("Curvature: {:.3}", wp.curvature);
println!("Stop required: {}", wp.stop_required); // 0 = no, 1 = yes
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `pose` | `Pose2D` | m, rad | Waypoint pose (x, y, theta) |
| `velocity` | `Twist` | m/s, rad/s | Desired velocity at this point |
| `time_from_start` | `f64` | s | Time from path start |
| `curvature` | `f32` | 1/m | Path curvature (1/radius) |
| `stop_required` | `u8` | — | Whether to stop at waypoint (0 = no, 1 = yes) |

**Methods:**

| Method | Description |
|--------|-------------|
| `new(pose)` | Create a new waypoint with default velocity |
| `with_velocity(twist)` | Set desired velocity (builder pattern) |
| `with_stop()` | Mark as requiring stop, sets velocity to zero |

## NavPath

Navigation path with up to 256 waypoints.
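Two quantities that a path type like this typically exposes are the total length and the waypoint nearest to the robot. The standalone sketch below computes both over plain `(x, y)` tuples. The function names are hypothetical and Euclidean distance in the plane is an assumption; the actual HORUS methods may also account for heading or other factors.

```rust
/// Squared Euclidean distance from point `p` to `(x, y)`.
fn dist2(p: (f64, f64), x: f64, y: f64) -> f64 {
    (p.0 - x).powi(2) + (p.1 - y).powi(2)
}

/// Sum of straight-line segment lengths between consecutive points.
fn path_length(points: &[(f64, f64)]) -> f64 {
    points
        .windows(2)
        .map(|w| dist2(w[0], w[1].0, w[1].1).sqrt())
        .sum()
}

/// Index of the point nearest to `(x, y)`, or `None` for an empty path.
fn closest_index(points: &[(f64, f64)], x: f64, y: f64) -> Option<usize> {
    points
        .iter()
        .enumerate()
        // total_cmp gives a total order on f64, so NaN cannot cause a panic.
        .min_by(|a, b| dist2(*a.1, x, y).total_cmp(&dist2(*b.1, x, y)))
        .map(|(i, _)| i)
}

fn main() {
    // The three waypoint positions from the NavPath example.
    let points = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)];
    println!("length: {:.2} m", path_length(&points));
    println!("closest to (1.2, 0.3): {:?}", closest_index(&points, 1.2, 0.3));
}
```

Comparing squared distances avoids a square root per waypoint, which matters little at 256 points but is a common habit.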
```rust
// simplified
use horus::prelude::*;
use horus_library::messages::navigation::Waypoint;

// Create empty path
let mut path = NavPath::new();

// Add waypoints
path.add_waypoint(Waypoint::new(Pose2D::new(0.0, 0.0, 0.0)))?;
path.add_waypoint(Waypoint::new(Pose2D::new(1.0, 0.0, 0.0)))?;
path.add_waypoint(Waypoint::new(Pose2D::new(2.0, 1.0, 0.785)))?;

// Get path info
println!("Waypoints: {}", path.waypoint_count);
println!("Total length: {:.2}m", path.total_length);

// Get valid waypoints slice
let waypoints = path.waypoints();

// Find closest waypoint to current position
let current = Pose2D::new(1.2, 0.3, 0.0);
if let Some(idx) = path.closest_waypoint_index(&current) {
    println!("Closest waypoint: {}", idx);
}

// Calculate progress along path
let progress = path.calculate_progress(&current);
println!("Path progress: {:.0}%", progress * 100.0);
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `waypoints` | `[Waypoint; 256]` | — | Array of waypoints |
| `waypoint_count` | `u16` | — | Number of valid waypoints |
| `total_length` | `f64` | m | Total path length |
| `duration_seconds` | `f64` | s | Estimated completion time |
| `frame_id` | `[u8; 32]` | — | Coordinate frame |
| `algorithm` | `[u8; 32]` | — | Planning algorithm used |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

> **ROS2 equivalent:** `nav_msgs/msg/Path`

**Methods:**

| Method | Description |
|--------|-------------|
| `new()` | Create a new empty path |
| `add_waypoint(waypoint)` | Add a waypoint (returns `Result`, max 256) |
| `waypoints()` | Get slice of valid waypoints |
| `closest_waypoint_index(&pose)` | Find index of closest waypoint |
| `calculate_progress(&pose)` | Calculate progress along path (0.0 to 1.0) |
| `with_frame_id(frame_id)` | Builder: set coordinate frame |

## PathPlan

Fixed-size path plan for zero-copy IPC transfer. Stores up to 256 waypoints as packed `[x, y, theta]` f32 values.
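To make the packed layout concrete, the standalone sketch below stores waypoint `i` at offsets `3*i`, `3*i + 1`, and `3*i + 2` of a flat `[f32; 768]` array. `PackedPath` and its methods are hypothetical stand-ins written for this illustration. The offset scheme is inferred from the "256 x 3 floats" description rather than taken from the HORUS source.

```rust
const MAX_WAYPOINTS: usize = 256;

/// Hypothetical stand-in for a packed, fixed-size waypoint buffer.
struct PackedPath {
    data: [f32; MAX_WAYPOINTS * 3], // x, y, theta repeated per waypoint
    count: u16,                     // number of valid waypoints
}

impl PackedPath {
    fn new() -> Self {
        Self { data: [0.0; MAX_WAYPOINTS * 3], count: 0 }
    }

    /// Append a waypoint; returns `false` when the buffer is full.
    fn add(&mut self, x: f32, y: f32, theta: f32) -> bool {
        let i = self.count as usize;
        if i >= MAX_WAYPOINTS {
            return false;
        }
        self.data[3 * i..3 * i + 3].copy_from_slice(&[x, y, theta]);
        self.count += 1;
        true
    }

    /// Read waypoint `index`, or `None` if it is past the valid count.
    fn get(&self, index: usize) -> Option<[f32; 3]> {
        if index >= self.count as usize {
            return None;
        }
        let base = 3 * index;
        Some([self.data[base], self.data[base + 1], self.data[base + 2]])
    }
}

fn main() {
    let mut plan = PackedPath::new();
    plan.add(0.0, 0.0, 0.0);
    plan.add(1.0, 0.5, 0.2);
    println!("waypoint 1: {:?}", plan.get(1));
}
```

A flat array of floats contains no pointers or heap allocations, which is what allows the whole struct to be shared between processes without serialization.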
```rust
// simplified
use horus::prelude::*;

// Create path plan from waypoints
let waypoints = &[
    [0.0f32, 0.0, 0.0], // [x, y, theta]
    [1.0, 0.0, 0.0],
    [2.0, 0.5, 0.5],
    [3.0, 1.0, 0.785],
];
let goal = [3.0f32, 1.0, 0.785];
let plan = PathPlan::from_waypoints(waypoints, goal);

// Or build incrementally
let mut plan = PathPlan::new();
plan.add_waypoint(0.0, 0.0, 0.0);
plan.add_waypoint(1.0, 0.5, 0.2);
plan.goal_pose = [1.0, 0.5, 0.2];

println!("Path has {} waypoints", plan.waypoint_count);
println!("Empty: {}", plan.is_empty());

// Access individual waypoints
if let Some(wp) = plan.get_waypoint(0) {
    println!("First waypoint: x={}, y={}, theta={}", wp[0], wp[1], wp[2]);
}
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `waypoint_data` | `[f32; 768]` | m, rad | Packed waypoint data (256 x 3 floats: x, y, theta) |
| `goal_pose` | `[f32; 3]` | m, rad | Goal pose [x, y, theta] |
| `waypoint_count` | `u16` | — | Number of valid waypoints |
| `timestamp_ns` | `u64` | ns | Nanoseconds since epoch |

**Methods:**

| Method | Description |
|--------|-------------|
| `new()` | Create a new empty path plan |
| `from_waypoints(waypoints, goal)` | Create from a slice of `[f32; 3]` waypoints |
| `add_waypoint(x, y, theta)` | Add a waypoint (returns `bool`, max 256) |
| `get_waypoint(index)` | Get waypoint at index as `Option<[f32; 3]>` |
| `is_empty()` | Check if path has no waypoints |

## OccupancyGrid

2D occupancy grid map for navigation. Uses `Vec<i8>` data (Serde-based, variable-size).
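The world-to-grid conversion at the heart of an occupancy grid is a subtract, divide, and floor. The standalone sketch below shows it together with a row-major cell index. `Grid` is a hypothetical stand-in; ignoring the origin's rotation, using row-major storage, and initializing cells to -1 (unknown) are all assumptions of this sketch, not confirmed HORUS behaviour.

```rust
/// Hypothetical stand-in for an occupancy grid with an axis-aligned origin.
struct Grid {
    width: u32,
    height: u32,
    resolution: f32, // metres per cell
    origin_x: f64,   // world position of the grid's bottom-left corner
    origin_y: f64,
    data: Vec<i8>,   // -1 = unknown, 0 = free, 100 = occupied
}

impl Grid {
    fn new(width: u32, height: u32, resolution: f32, origin_x: f64, origin_y: f64) -> Self {
        let cells = width as usize * height as usize;
        Self { width, height, resolution, origin_x, origin_y, data: vec![-1; cells] }
    }

    /// Convert world coordinates (m) to a cell, or `None` if outside the map.
    fn world_to_grid(&self, x: f64, y: f64) -> Option<(u32, u32)> {
        let res = self.resolution as f64;
        let gx = ((x - self.origin_x) / res).floor();
        let gy = ((y - self.origin_y) / res).floor();
        if gx < 0.0 || gy < 0.0 || gx >= self.width as f64 || gy >= self.height as f64 {
            return None;
        }
        Some((gx as u32, gy as u32))
    }

    /// Row-major index of a cell inside `data`.
    fn index(&self, gx: u32, gy: u32) -> usize {
        gy as usize * self.width as usize + gx as usize
    }
}

fn main() {
    let mut grid = Grid::new(200, 200, 0.05, 0.0, 0.0);
    if let Some((gx, gy)) = grid.world_to_grid(5.02, 2.51) {
        let i = grid.index(gx, gy);
        grid.data[i] = 100; // mark the cell as occupied
        println!("cell ({gx}, {gy}) -> index {i}");
    }
}
```

Points that lie exactly on a cell boundary can land in either neighbouring cell because of floating-point rounding, so tests are best written with coordinates inside a cell.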
```rust // simplified use horus::prelude::*; // Create 10m x 10m map at 5cm resolution let origin = Pose2D::origin(); let mut grid = OccupancyGrid::new( 200, // width (200 * 0.05 = 10m) 200, // height 0.05, // resolution (5cm per cell) origin ); // Set occupancy (-1=unknown, 0=free, 100=occupied) grid.set_occupancy(100, 100, 0); // Free cell grid.set_occupancy(150, 150, 100); // Occupied cell (obstacle) // World to grid coordinate conversion if let Some((gx, gy)) = grid.world_to_grid(5.0, 5.0) { println!("World (5.0, 5.0) -> Grid ({}, {})", gx, gy); } // Grid to world coordinate conversion if let Some((x, y)) = grid.grid_to_world(100, 100) { println!("Grid (100, 100) -> World ({:.2}, {:.2})", x, y); } // Check cell status let test_x = 7.5; let test_y = 7.5; if grid.is_free(test_x, test_y) { println!("({}, {}) is free", test_x, test_y); } else if grid.is_occupied(test_x, test_y) { println!("({}, {}) is occupied", test_x, test_y); } // Get occupancy value if let Some((gx, gy)) = grid.world_to_grid(test_x, test_y) { if let Some(value) = grid.occupancy(gx, gy) { println!("Occupancy: {}", value); } } ``` **Occupancy Values:** | Value | Meaning | |-------|---------| | `-1` | Unknown | | `0` | Free | | `1-49` | Probably free | | `50-99` | Probably occupied | | `100` | Occupied | **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `resolution` | `f32` | m/cell | Meters per pixel | | `width` | `u32` | cells | Map width in pixels | | `height` | `u32` | cells | Map height in pixels | | `origin` | `Pose2D` | m, rad | Map origin (bottom-left) | | `data` | `Vec` | — | Occupancy values (-1 to 100) | | `frame_id` | `[u8; 32]` | — | Coordinate frame | | `metadata` | `[u8; 64]` | — | Map metadata | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | > **ROS2 equivalent:** `nav_msgs/msg/OccupancyGrid` **Methods:** | Method | Description | |--------|-------------| | `new(width, height, resolution, origin)` | Create a new grid (initialized to 
unknown) | | `world_to_grid(x, y)` | Convert world coordinates to grid indices | | `grid_to_world(grid_x, grid_y)` | Convert grid indices to world coordinates (cell center) | | `occupancy(grid_x, grid_y)` | Get occupancy value at grid coordinates | | `set_occupancy(grid_x, grid_y, value)` | Set occupancy value (clamped to -1..100) | | `is_free(x, y)` | Check if world point is free (occupancy 0-49) | | `is_occupied(x, y)` | Check if world point is occupied (occupancy >= 50) | ## CostMap Navigation cost map with obstacle inflation. Uses `Vec` cost data (Serde-based, variable-size). ```rust // simplified use horus::prelude::*; // Create occupancy grid first let grid = OccupancyGrid::new(200, 200, 0.05, Pose2D::origin()); // Create costmap with inflation radius let costmap = CostMap::from_occupancy_grid(grid, 0.55); // 55cm inflation // Get cost at world coordinates (0-255, 253=lethal) if let Some(cost) = costmap.cost(5.0, 5.0) { if cost >= costmap.lethal_cost { println!("Position is in obstacle!"); } else { println!("Cost: {}", cost); } } // Access underlying grid println!("Map size: {}x{}", costmap.occupancy_grid.width, costmap.occupancy_grid.height); ``` **Cost Values:** | Value | Meaning | |-------|---------| | `0` | Free space | | `1-252` | Increasing cost (near obstacles) | | `253` | Lethal (default lethal_cost) | | `254-255` | Reserved | **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `occupancy_grid` | `OccupancyGrid` | — | Base occupancy map | | `costs` | `Vec` | — | Cost values (0-255) | | `inflation_radius` | `f32` | m | Inflation radius | | `cost_scaling_factor` | `f32` | — | Cost decay factor | | `lethal_cost` | `u8` | — | Lethal obstacle threshold (default 253) | > **ROS2 equivalent:** `nav2_msgs/msg/Costmap` **Methods:** | Method | Description | |--------|-------------| | `from_occupancy_grid(grid, inflation_radius)` | Create costmap from occupancy grid with inflation | | `cost(x, y)` | Get cost at world 
coordinates (returns lethal for out-of-bounds) | | `compute_costs()` | Recompute costs from occupancy data (call after modifying the grid) | ## VelocityObstacle Dynamic obstacle for velocity-based avoidance. > **Note:** `VelocityObstacle` is not re-exported in the prelude. Import directly: `use horus_library::messages::navigation::VelocityObstacle;` ```rust // simplified use horus_library::messages::navigation::VelocityObstacle; let obstacle = VelocityObstacle { position: [3.0, 2.0], // [x, y] velocity: [0.5, 0.0], // Moving at 0.5 m/s in x radius: 0.3, // 30cm radius time_horizon: 5.0, // 5 second prediction obstacle_id: 1, }; println!("Obstacle {} at ({:.1}, {:.1}) moving at ({:.1}, {:.1})", obstacle.obstacle_id, obstacle.position[0], obstacle.position[1], obstacle.velocity[0], obstacle.velocity[1]); ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `position` | `[f64; 2]` | m | Obstacle position [x, y] | | `velocity` | `[f64; 2]` | m/s | Obstacle velocity [vx, vy] | | `radius` | `f32` | m | Obstacle radius | | `time_horizon` | `f32` | s | Collision prediction horizon | | `obstacle_id` | `u32` | — | Tracking ID | ## VelocityObstacles Array of velocity obstacles (max 32). > **Note:** `VelocityObstacles` is not re-exported in the prelude. 
Import directly: `use horus_library::messages::navigation::VelocityObstacles;` ```rust // simplified use horus_library::messages::navigation::{VelocityObstacles, VelocityObstacle}; let mut obstacles = VelocityObstacles::default(); obstacles.obstacles[0] = VelocityObstacle { position: [2.0, 1.0], velocity: [0.3, 0.1], radius: 0.25, time_horizon: 3.0, obstacle_id: 1, }; obstacles.count = 1; println!("Tracking {} dynamic obstacles", obstacles.count); ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `obstacles` | `[VelocityObstacle; 32]` | — | Obstacle array | | `count` | `u8` | — | Number of valid obstacles | | `timestamp_ns` | `u64` | ns | Nanoseconds since epoch | ## Navigation Node Example ```rust // simplified use horus::prelude::*; use horus_library::messages::navigation::{GoalResult, GoalStatus, Waypoint}; struct NavigationNode { goal_sub: Topic<NavGoal>, odom_sub: Topic<Odometry>, map_sub: Topic<OccupancyGrid>, path_pub: Topic<NavPath>, result_pub: Topic<GoalResult>, current_goal: Option<NavGoal>, current_path: Option<NavPath>, } impl Node for NavigationNode { fn name(&self) -> &str { "Navigation" } fn tick(&mut self) { // Check for new goals if let Some(goal) = self.goal_sub.recv() { self.current_goal = Some(goal); // Plan path to goal if let Some(map) = self.map_sub.recv() { let path = self.plan_path(&goal, &map); self.path_pub.send(path); self.current_path = Some(path); } } // Check goal progress if let (Some(goal), Some(odom)) = (&self.current_goal, self.odom_sub.recv()) { let current_pose = odom.pose; // Odometry.pose is Pose2D if goal.is_reached(&current_pose) { let result = GoalResult::new(goal.goal_id, GoalStatus::Succeeded); self.result_pub.send(result); self.current_goal = None; } else { let mut result = GoalResult::new(goal.goal_id, GoalStatus::Active); result.distance_to_goal = goal.target_pose.distance_to(&current_pose); if let Some(path) = &self.current_path { result.progress = path.calculate_progress(&current_pose); } self.result_pub.send(result); } } } } impl NavigationNode { fn
plan_path(&self, _goal: &NavGoal, _map: &OccupancyGrid) -> NavPath { // Path planning implementation (A*, RRT*, etc.) NavPath::new() } } ``` ## See Also - [Geometry Messages](/rust/api/geometry-messages) - Pose2D, Twist, TransformStamped - [Sensor Messages](/rust/api/sensor-messages) - Odometry, IMU, LaserScan --- ## RuntimeParams Path: /rust/api/runtime-params Description: Dynamic runtime parameter store with typed access, validation, and persistence # RuntimeParams A typed key-value store for runtime configuration. Loads defaults from `.horus/config/params.yaml`, supports typed get/set with validation, concurrent access, and persistence to disk. ## Quick Start ```rust // simplified use horus::prelude::*; let params = RuntimeParams::new()?; // Typed get with default let speed: f64 = params.get_or("max_speed", 1.0); // Typed set params.set("max_speed", 2.0)?; // Check and iterate if params.has("pid_kp") { let keys = params.list_keys(); println!("Parameters: {:?}", keys); } // Persist to disk params.save_to_disk()?; ``` ## Creating ```rust // simplified // Load from .horus/config/params.yaml (or built-in defaults) let params = RuntimeParams::new()?; // Load from explicit file let params = RuntimeParams::new()?; params.load_from_disk(Path::new("config/robot.yaml"))?; ``` If `.horus/config/params.yaml` exists, parameters are loaded from it. Otherwise, built-in defaults are used. ## Reading Parameters ```rust // simplified // Option-based (returns None if missing or type mismatch) let speed: Option<f64> = params.get("max_speed"); // With default value let kp: f64 = params.get_or("pid_kp", 1.0); // With explicit error reporting let speed: f64 = params.get_typed("max_speed")?; // Returns HorusError if missing // Check existence if params.has("emergency_stop_distance") { // ...
} ``` ## Writing Parameters ```rust // simplified // Set any serde-serializable value params.set("max_speed", 1.5_f64)?; params.set("sensor_ids", vec![1, 2, 3])?; params.set("robot_name", "atlas")?; // Optimistic locking (concurrent edit protection) let version = params.get_version("max_speed"); params.set_with_version("max_speed", 2.0, version)?; // Fails if modified since // Remove a parameter params.remove("obsolete_param"); // Reset to defaults params.reset()?; ``` ## Persistence ```rust // simplified // Save current state to .horus/config/params.yaml params.save_to_disk()?; // Load from specific file params.load_from_disk(Path::new("config/prod.yaml"))?; ``` ## Built-in Defaults These parameters are available out of the box: | Key | Default | Description | |-----|---------|-------------| | `tick_rate` | `30` | Default scheduler tick rate (Hz) | | `max_memory_mb` | `512` | Memory limit (MB) | | `max_speed` | `1.0` | Maximum linear speed (m/s) | | `max_angular_speed` | `1.0` | Maximum angular speed (rad/s) | | `acceleration_limit` | `0.5` | Acceleration limit (m/s²) | | `lidar_rate` | `10` | LiDAR scan rate (Hz) | | `camera_fps` | `30` | Camera frame rate (frames/s) | | `sensor_timeout_ms` | `1000` | Sensor timeout (ms) | | `emergency_stop_distance` | `0.3` | E-stop trigger distance (m) | | `collision_threshold` | `0.5` | Collision detection threshold (m) | | `pid_kp` | `1.0` | PID proportional gain | | `pid_ki` | `0.1` | PID integral gain | | `pid_kd` | `0.05` | PID derivative gain | ## Usage in Nodes ```rust // simplified use horus::prelude::*; struct MotorController { params: RuntimeParams, } impl MotorController { fn new() -> Result<Self> { Ok(Self { params: RuntimeParams::new()?, }) } } impl Node for MotorController { fn name(&self) -> &str { "motor_ctrl" } fn tick(&mut self) { let kp: f64 = self.params.get_or("pid_kp", 1.0); let max_speed: f64 = self.params.get_or("max_speed", 1.0); // ...
use params in control loop } } ``` ## Python Equivalent ```python import horus p = horus.Params() kp = p["pid_kp"] # 1.0 p["pid_kp"] = 2.5 p.save() ``` See [Python Bindings — Runtime Parameters](/python/api/python-bindings#runtime-parameters) for full Python API. ## API Reference | Method | Returns | Description | |--------|---------|-------------| | `new()` | `Result<Self>` | Load from `.horus/config/params.yaml` or defaults | | `get::<T>(key)` | `Option<T>` | Typed get, None if missing | | `get_or::<T>(key, default)` | `T` | Get with default | | `get_typed::<T>(key)` | `Result<T>` | Get with error reporting | | `set(key, value)` | `Result<()>` | Set (validates if metadata exists) | | `has(key)` | `bool` | Check existence | | `list_keys()` | `Vec<String>` | All parameter names | | `remove(key)` | `Option` | Remove and return | | `reset()` | `Result<()>` | Reset to built-in defaults | | `save_to_disk()` | `Result<()>` | Persist to YAML | | `load_from_disk(path)` | `Result<()>` | Load from YAML file | | `get_version(key)` | `u64` | Version for optimistic locking | | `set_with_version(key, value, version)` | `Result<()>` | Set with version check | ## Validation Rules Parameters can have validation rules that are enforced on every `set()` call. Rules are defined through `ParamMetadata` and checked automatically.
| Rule | Applies To | Description | |------|-----------|-------------| | `MinValue(f64)` | Numbers | Value must be >= minimum | | `MaxValue(f64)` | Numbers | Value must be <= maximum | | `Range(f64, f64)` | Numbers | Value must be within [min, max] | | `RegexPattern(String)` | Strings | Value must match regex pattern | | `Enum(Vec)` | Strings | Value must be one of the allowed values | | `MinLength(usize)` | Strings/Arrays | Minimum length/element count | | `MaxLength(usize)` | Strings/Arrays | Maximum length/element count | | `RequiredKeys(Vec)` | Objects | JSON object must contain all listed keys | When a `set()` violates a rule, it returns `Err(HorusError::Validation(...))` with a descriptive message. ```rust // simplified use horus::prelude::*; // PID gain with range validation // If metadata exists with Range(0.0, 100.0), this fails: let result = params.set("pid.kp", 150.0); // Err: "Parameter 'pid.kp' value 150 exceeds maximum 100" // Enum-validated parameter let result = params.set("mode", "turbo"); // Err: "Parameter 'mode' value 'turbo' not in allowed values: [manual, auto, standby]" ``` ## Metadata `ParamMetadata` provides descriptions, units, validation, and read-only flags for parameters. 
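As a rough illustration of how metadata of this kind could gate a write, the standalone sketch below combines a read-only flag with a few numeric rules. Everything in it is hypothetical (`Rule`, `MetaSketch`, `validate`, and the error strings); it mirrors the ideas in this section, not the actual `ParamMetadata` type or the messages HORUS emits.

```rust
/// Hypothetical numeric rules, loosely modeled on the rule names in this section.
#[derive(Debug, Clone, PartialEq)]
pub enum Rule {
    MinValue(f64),
    MaxValue(f64),
    Range(f64, f64),
}

/// Hypothetical metadata record: description, unit, rules, read-only flag.
#[derive(Debug, Clone, Default)]
pub struct MetaSketch {
    pub description: Option<String>,
    pub unit: Option<String>,
    pub validation: Vec<Rule>,
    pub read_only: bool,
}

/// Rejects writes to read-only parameters, then reports the first violated rule.
pub fn validate(key: &str, value: f64, meta: &MetaSketch) -> Result<(), String> {
    if meta.read_only {
        return Err(format!("Parameter '{key}' is read-only"));
    }
    for rule in &meta.validation {
        let ok = match *rule {
            Rule::MinValue(min) => value >= min,
            Rule::MaxValue(max) => value <= max,
            Rule::Range(min, max) => value >= min && value <= max,
        };
        if !ok {
            return Err(format!("Parameter '{key}' value {value} violates {rule:?}"));
        }
    }
    Ok(())
}
```

Checking the read-only flag before the rules means a locked parameter is rejected even when the proposed value would otherwise be valid.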
| Field | Type | Description | |-------|------|-------------| | `description` | `Option<String>` | Human-readable description | | `unit` | `Option<String>` | Unit of measurement (e.g., "m/s", "Hz", "rad") | | `validation` | `Vec` | Validation rules (checked on set) | | `read_only` | `bool` | If true, `set()` is rejected | ```rust // simplified // Query metadata for a parameter if let Some(meta) = params.get_metadata("max_speed") { println!("Description: {:?}", meta.description); println!("Unit: {:?}", meta.unit); println!("Read-only: {}", meta.read_only); } ``` ## Versioned Updates (Optimistic Locking) For concurrent parameter tuning (e.g., monitor UI + code both updating the same parameter), use versioned updates to prevent lost writes: ```rust // simplified use horus::prelude::*; // Read current version let version = params.get_version("pid.kp"); // Set with version check — fails if someone else modified it since our read match params.set_with_version("pid.kp", 2.5, version) { Ok(()) => println!("Updated successfully"), Err(e) => println!("Conflict: {}", e), // Someone else changed it } ``` This implements optimistic concurrency control: 1. Read the current version with `get_version(key)` 2. Compute your new value 3. Call `set_with_version(key, value, version)` — if the version hasn't changed, the write succeeds 4. If it fails, re-read and retry ## Persistence Parameters can be saved to and loaded from YAML files: ```rust // simplified // Save all parameters to disk (uses default path) params.save_to_disk()?; // Load parameters from a specific file params.load_from_disk(Path::new("robot_config.yaml"))?; ``` The YAML format is a flat key-value map: ```yaml pid.kp: 2.5 pid.ki: 0.1 pid.kd: 0.05 max_speed: 1.0 mode: "auto" ``` ## Method Reference ### new ```rust // simplified pub fn new() -> HorusResult<Self> ``` Create a parameter store. Loads defaults and any existing values from `.horus/config/params.yaml`.
**Example:** ```rust // simplified let params = RuntimeParams::new()?; ``` --- ### get ```rust // simplified pub fn get<T>(&self, key: &str) -> Option<T> ``` Get a parameter value, deserialized to type `T`. Returns `None` if the key doesn't exist. **Example:** ```rust // simplified let speed: Option<f64> = params.get("max_speed"); ``` --- ### get_typed ```rust // simplified pub fn get_typed<T>(&self, key: &str) -> HorusResult<T> ``` Get a parameter, returning an error if missing or wrong type. Use when the parameter is required. **Example:** ```rust // simplified let kp: f64 = params.get_typed("pid_kp")?; // error if missing ``` --- ### get_or ```rust // simplified pub fn get_or<T>(&self, key: &str, default: T) -> T ``` Get a parameter with a fallback default. **Example:** ```rust // simplified let speed = params.get_or("max_speed", 1.5); // 1.5 if not set ``` --- ### set ```rust // simplified pub fn set<T>(&self, key: &str, value: T) -> Result<(), HorusError> ``` Set a parameter. Validates against metadata constraints (min/max, regex, read-only) if configured. Logs the change to the audit log. **Errors:** `ValidationError::OutOfRange` if value violates constraints, or if the key is read-only. **Example:** ```rust // simplified params.set("pid_kp", 2.5)?; params.set("max_speed", 0.8)?; ``` --- ### set_with_version ```rust // simplified pub fn set_with_version<T>(&self, key: &str, value: T, expected_version: u64) -> Result<(), HorusError> ``` Optimistic locking — set only if the parameter hasn't been modified since `expected_version`. Prevents concurrent edit conflicts. **Example:** ```rust // simplified let version = params.get_version("pid_kp"); // ... compute new value ... params.set_with_version("pid_kp", new_kp, version)?; // fails if someone else changed it ``` --- ### list_keys ```rust // simplified pub fn list_keys(&self) -> Vec<String> ``` Returns all parameter names.
**Example:** ```rust // simplified for key in params.list_keys() { println!("{} = {:?}", key, params.get::(&key)); } ``` --- ### has ```rust // simplified pub fn has(&self, key: &str) -> bool ``` Check if a parameter exists. --- ### remove ```rust // simplified pub fn remove(&self, key: &str) -> Option ``` Remove a parameter. Returns the old value if it existed. --- ### reset ```rust // simplified pub fn reset(&self) -> Result<(), HorusError> ``` Reset all parameters to defaults. Clears user-set values. --- ### get_metadata ```rust // simplified pub fn get_metadata(&self, key: &str) -> Option ``` Get metadata for a parameter: description, unit, read-only flag, validation rules. **Example:** ```rust // simplified if let Some(meta) = params.get_metadata("max_speed") { println!("Unit: {:?}, Read-only: {}", meta.unit, meta.read_only); } ``` --- ### get_version ```rust // simplified pub fn get_version(&self, key: &str) -> u64 ``` Get the current version counter for a parameter. Used with `set_with_version()` for optimistic locking. --- ## See Also - [Python Params](/python/api/python-bindings#runtime-parameters) — Python dict-like API - [horus.toml Configuration](/concepts/horus-toml) — Project-level configuration - [Parameters CLI](/development/parameters) — `horus param get/set/list` --- ## Diagnostics Messages Path: /rust/api/diagnostics-messages Description: System monitoring, health checks, heartbeats, and error reporting # Diagnostics Messages HORUS provides message types for system monitoring, health checks, error reporting, and general diagnostics. ## Heartbeat Periodic signal indicating a node is alive and operational. 
```rust // simplified use horus::prelude::*; // Provides diagnostics::Heartbeat; // Create heartbeat let mut heartbeat = Heartbeat::new("MotorController", 1); // Update for each heartbeat cycle heartbeat.update(120.5); // 120.5 seconds uptime println!("Node: {}", heartbeat.name()); println!("Sequence: {}", heartbeat.sequence); println!("Uptime: {:.1}s", heartbeat.uptime); println!("Alive: {}", heartbeat.alive); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `node_name` | `[u8; 32]` | Node name (null-terminated) | | `node_id` | `u32` | Node identifier | | `sequence` | `u64` | Heartbeat sequence number | | `alive` | `u8` | Node is responding (0 = dead, 1 = alive) | | `uptime` | `f64` | Time since startup (seconds) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## DiagnosticStatus General-purpose status reporting. ```rust // simplified use horus::prelude::*; // Provides DiagnosticStatus, StatusLevel // Create status messages let ok = DiagnosticStatus::ok("System initialized successfully"); let warning = DiagnosticStatus::warn(1001, "Battery level low") .with_component("PowerManager"); let error = DiagnosticStatus::error(2001, "Sensor communication timeout") .with_component("SensorHub"); let fatal = DiagnosticStatus::fatal(9001, "Motor driver fault - emergency stop") .with_component("MotorController"); // Access status info println!("[{:?}] {}: {}", error.level, error.component_str(), error.message_str()); ``` **StatusLevel values:** | Level | Value | Description | |-------|-------|-------------| | `Ok` | 0 | Everything is OK | | `Warn` | 1 | Warning condition | | `Error` | 2 | Error (recoverable) | | `Fatal` | 3 | Fatal error (system should stop) | **Fields:** | Field | Type | Description | |-------|------|-------------| | `level` | `u8` | Severity level (use `StatusLevel as u8` to set) | | `code` | `u32` | Component-specific error code | | `message` | `[u8; 128]` | Human-readable message | | `component` | `[u8; 32]` | 
Reporting component name | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## EmergencyStop Critical safety message to immediately stop all robot motion. ```rust // simplified use horus::prelude::*; // Provides diagnostics::EmergencyStop; // Engage emergency stop let estop = EmergencyStop::engage("Obstacle detected in safety zone") .with_source("SafetyController"); println!("E-STOP engaged: {}", estop.engaged); println!("Reason: {}", estop.reason_str()); // Release emergency stop let release = EmergencyStop::release(); // Allow auto-reset let mut estop_auto = EmergencyStop::engage("Soft limit exceeded"); estop_auto.auto_reset = 1; ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `engaged` | `u8` | Emergency stop is active (0 = off, 1 = on) | | `reason` | `[u8; 64]` | Stop reason | | `source` | `[u8; 32]` | Triggering source | | `auto_reset` | `u8` | Can auto-reset after clearing (0 = no, 1 = yes) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## ResourceUsage System resource utilization. 
```rust // simplified use horus::prelude::*; // Provides diagnostics::ResourceUsage; let mut usage = ResourceUsage::new(); usage.cpu_percent = 45.5; usage.memory_bytes = 1024 * 1024 * 512; // 512MB usage.memory_percent = 25.0; usage.temperature = 65.5; usage.thread_count = 12; // Check thresholds if usage.is_cpu_high(80.0) { println!("Warning: High CPU usage"); } if usage.is_memory_high(90.0) { println!("Warning: High memory usage"); } if usage.is_temperature_high(80.0) { println!("Warning: High temperature"); } println!("CPU: {:.1}%, Memory: {:.1}%, Temp: {:.1}C", usage.cpu_percent, usage.memory_percent, usage.temperature); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `cpu_percent` | `f32` | CPU usage (0-100) | | `memory_bytes` | `u64` | Memory usage in bytes | | `memory_percent` | `f32` | Memory usage (0-100) | | `disk_bytes` | `u64` | Disk usage in bytes | | `disk_percent` | `f32` | Disk usage (0-100) | | `network_tx_bytes` | `u64` | Network bytes sent | | `network_rx_bytes` | `u64` | Network bytes received | | `temperature` | `f32` | System temperature (Celsius) | | `thread_count` | `u32` | Active thread count | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## DiagnosticValue Key-value pair for diagnostic reports. 
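Several messages above store text in fixed-size byte arrays, such as the null-terminated `[u8; 32]` node name in `Heartbeat` or the `[u8; 64]` reason in `EmergencyStop`. The sketch below shows one common way to write and read such fields. The helper names `to_fixed` and `from_fixed` are hypothetical, and reserving a trailing zero byte is an assumption of this sketch rather than a documented HORUS rule.

```rust
/// Copies `s` into a zero-padded array, truncating so at least one trailing 0 remains.
pub fn to_fixed<const N: usize>(s: &str) -> [u8; N] {
    let mut buf = [0u8; N];
    let max = N.saturating_sub(1);
    let mut len = s.len().min(max);
    // Back up to a char boundary so truncation never splits a UTF-8 sequence.
    while !s.is_char_boundary(len) {
        len -= 1;
    }
    buf[..len].copy_from_slice(&s.as_bytes()[..len]);
    buf
}

/// Reads the bytes up to the first 0 and decodes them as UTF-8 (lossy on invalid data).
pub fn from_fixed(buf: &[u8]) -> String {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    String::from_utf8_lossy(&buf[..end]).into_owned()
}
```

Fixed-size fields keep the whole message a plain block of bytes, at the cost of silently truncating long strings, so callers that care should check lengths before writing.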
```rust // simplified use horus::prelude::*; // Provides diagnostics::DiagnosticValue; // Create different value types let string_val = DiagnosticValue::string("firmware_version", "1.2.3"); let int_val = DiagnosticValue::int("error_count", 42); let float_val = DiagnosticValue::float("temperature", 65.5); let bool_val = DiagnosticValue::bool("calibrated", true); ``` **Value Type Constants:** | Constant | Value | Description | |----------|-------|-------------| | `TYPE_STRING` | 0 | String value | | `TYPE_INT` | 1 | Integer value | | `TYPE_FLOAT` | 2 | Float value | | `TYPE_BOOL` | 3 | Boolean value | **Fields:** | Field | Type | Description | |-------|------|-------------| | `key` | `[u8; 32]` | Key name | | `value` | `[u8; 64]` | Value as string | | `value_type` | `u8` | Value type hint | ## DiagnosticReport Diagnostic report with multiple key-value pairs (up to 16). ```rust // simplified use horus::prelude::*; // Provides diagnostics::{DiagnosticReport, StatusLevel}; let mut report = DiagnosticReport::new("MotorController"); // Add diagnostic values report.add_string("firmware", "2.1.0")?; report.add_int("tick_count", 15000)?; report.add_float("voltage", 24.5)?; report.add_bool("calibrated", true)?; // Set overall status report.set_level(StatusLevel::Ok); println!("Report has {} values at level {}", report.value_count, report.level); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `component` | `[u8; 32]` | Component name | | `values` | `[DiagnosticValue; 16]` | Diagnostic values | | `value_count` | `u8` | Number of valid values | | `level` | `u8` | Overall status level (use `StatusLevel as u8` to set) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## NodeState Node execution state enumeration. ```rust // simplified use horus_library::messages::diagnostics::NodeState; // Note: The prelude's NodeState is the core scheduler version. // For the POD message version, import from diagnostics directly. 
let state = NodeState::Running; println!("State: {}", state.as_str()); // "Running" ``` **NodeState values:** | State | Value | Description | |-------|-------|-------------| | `Idle` | 0 | Created but not started | | `Initializing` | 1 | Running initialization | | `Running` | 2 | Active and executing | | `Paused` | 3 | Temporarily suspended | | `Stopped` | 4 | Cleanly shut down | | `Error` | 5 | Error/crashed state | ## HealthStatus Node operational health status. ```rust // simplified use horus::prelude::*; // Provides diagnostics::HealthStatus; let health = HealthStatus::Healthy; println!("Health: {} ({})", health.as_str(), health.color()); // Color codes for monitor display // Healthy -> "green" // Warning -> "yellow" // Error -> "orange" // Critical -> "red" // Unknown -> "gray" ``` **HealthStatus values:** | Status | Value | Description | |--------|-------|-------------| | `Healthy` | 0 | Operating normally | | `Warning` | 1 | Degraded performance | | `Error` | 2 | Errors but running | | `Critical` | 3 | Fatal errors | | `Unknown` | 4 | No heartbeat received | ## NodeHeartbeat Node status heartbeat with health information (written to shared memory). 
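A heartbeat is only useful if a reader can tell a stale record from a fresh one. The standalone sketch below shows the usual age comparison on Unix-epoch seconds. `HeartbeatSketch`, `is_fresh_at`, and `unix_now_secs` are hypothetical names, and the treatment of future timestamps as fresh is a choice made for this sketch, not documented HORUS behavior.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical heartbeat record; only the timestamp matters for freshness.
pub struct HeartbeatSketch {
    pub tick_count: u64,
    /// Unix epoch seconds at which the heartbeat was last written.
    pub heartbeat_timestamp: u64,
}

/// Current wall-clock time in whole seconds since the Unix epoch (0 if the clock is before it).
pub fn unix_now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

impl HeartbeatSketch {
    /// Fresh when the heartbeat is at most `max_age_secs` old relative to `now_secs`.
    /// A timestamp slightly in the future (clock skew) counts as age 0.
    pub fn is_fresh_at(&self, now_secs: u64, max_age_secs: u64) -> bool {
        now_secs.saturating_sub(self.heartbeat_timestamp) <= max_age_secs
    }

    /// Convenience wrapper that samples the system clock.
    pub fn is_fresh(&self, max_age_secs: u64) -> bool {
        self.is_fresh_at(unix_now_secs(), max_age_secs)
    }
}
```

Passing the current time in explicitly, as `is_fresh_at` does, keeps the comparison deterministic and easy to test.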
```rust // simplified use horus::prelude::*; // Provides NodeHeartbeat, HealthStatus use horus_library::messages::diagnostics::NodeState; // POD version (distinct from core NodeState) // Create heartbeat let mut heartbeat = NodeHeartbeat::new(NodeState::Running, HealthStatus::Healthy); heartbeat.tick_count = 15000; heartbeat.target_rate = 100; heartbeat.actual_rate = 98; heartbeat.error_count = 0; // Update timestamp heartbeat.update_timestamp(); // Check freshness (within last 5 seconds) if heartbeat.is_fresh(5) { println!("Node is alive"); } // Serialize for file writing let bytes = heartbeat.to_bytes(); // Deserialize from file if let Some(hb) = NodeHeartbeat::from_bytes(&bytes) { println!("Tick rate: {}/{} Hz", hb.actual_rate, hb.target_rate); } ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `state` | `u8` | Execution state (use `NodeState as u8` to set) | | `health` | `u8` | Health status (use `HealthStatus as u8` to set) | | `tick_count` | `u64` | Total tick count | | `target_rate` | `u32` | Target tick rate | | `actual_rate` | `u32` | Measured tick rate | | `error_count` | `u32` | Error count | | `last_tick_timestamp` | `u64` | Last tick time (unix epoch seconds) | | `heartbeat_timestamp` | `u64` | Heartbeat time (unix epoch seconds) | ## SafetyStatus Safety system status. 
```rust // simplified use horus::prelude::*; // Provides diagnostics::SafetyStatus; let mut safety = SafetyStatus::new(); // SafetyStatus::new() sets good defaults (enabled=1, watchdog=1, limits=1, comms=1) // Override only if needed: safety.estop_engaged = 0; // Check if safe to operate if safety.is_safe() { println!("System is safe to operate"); } else { println!("Safety interlock active - fault code: {}", safety.fault_code); } // Set fault condition safety.set_fault(1001); println!("Mode: {}", match safety.mode { SafetyStatus::MODE_NORMAL => "Normal", SafetyStatus::MODE_REDUCED => "Reduced", SafetyStatus::MODE_SAFE_STOP => "Safe Stop", _ => "Unknown" }); // Clear faults safety.clear_faults(); ``` **Mode Constants:** | Constant | Value | Description | |----------|-------|-------------| | `MODE_NORMAL` | 0 | Normal operation | | `MODE_REDUCED` | 1 | Reduced speed/power | | `MODE_SAFE_STOP` | 2 | Safe stop engaged | **Fields:** | Field | Type | Description | |-------|------|-------------| | `enabled` | `u8` | Safety system active (0 = off, 1 = on) | | `estop_engaged` | `u8` | Emergency stop engaged (0 = no, 1 = yes) | | `watchdog_ok` | `u8` | Watchdog timer OK (0 = fault, 1 = ok) | | `limits_ok` | `u8` | All limits within bounds (0 = fault, 1 = ok) | | `comms_ok` | `u8` | Communication healthy (0 = fault, 1 = ok) | | `mode` | `u8` | Safety mode | | `fault_code` | `u32` | Fault code (0 = none) | | `timestamp_ns` | `u64` | Nanoseconds since epoch | ## Diagnostics Node Example ```rust // simplified use horus::prelude::*; struct DiagnosticsNode { status_pub: Topic<DiagnosticStatus>, resource_pub: Topic<ResourceUsage>, safety_sub: Topic<SafetyStatus>, estop_pub: Topic<EmergencyStop>, tick_count: u64, start_time: std::time::Instant, } impl Node for DiagnosticsNode { fn name(&self) -> &str { "Diagnostics" } fn tick(&mut self) { self.tick_count += 1; // Check safety status if let Some(safety) = self.safety_sub.recv() { if !safety.is_safe() { // Trigger emergency stop let estop = EmergencyStop::engage(&format!( "Safety fault code:
{}", safety.fault_code )).with_source("DiagnosticsNode"); self.estop_pub.send(estop); // Send error status let status = DiagnosticStatus::error(safety.fault_code, "Safety system fault") .with_component("SafetyMonitor"); self.status_pub.send(status); } } // Periodic resource reporting (every 100 ticks) if self.tick_count % 100 == 0 { let mut usage = ResourceUsage::new(); // ... populate with actual system metrics ... // Check thresholds if usage.is_cpu_high(90.0) { let status = DiagnosticStatus::warn(1001, "CPU usage above 90%") .with_component("ResourceMonitor"); self.status_pub.send(status); } self.resource_pub.send(usage); } // Periodic OK status (every 1000 ticks) if self.tick_count % 1000 == 0 { let uptime = self.start_time.elapsed().as_secs_f64(); let status = DiagnosticStatus::ok(&format!("System healthy, uptime: {:.0}s", uptime)) .with_component("DiagnosticsNode"); self.status_pub.send(status); } } } ``` ## StatusLevel Severity level for diagnostic status reports. Used by `DiagnosticStatus` to indicate severity. | Variant | Value | Description | |---------|-------|-------------| | `Ok` | 0 | Everything is operating normally | | `Warn` | 1 | Warning condition (degraded but functional) | | `Error` | 2 | Error condition (recoverable) | | `Fatal` | 3 | Fatal error (system should stop) | ```rust // simplified use horus::prelude::*; let status = DiagnosticStatus::new(StatusLevel::Warn, "Battery low: 15%"); ``` ## NodeStateMsg Represents the lifecycle state of a node. Published by the scheduler for monitoring. 
| Variant | Value | Description | |---------|-------|-------------| | `Idle` | 0 | Node created but not yet started | | `Initializing` | 1 | Running `init()` | | `Running` | 2 | Active and executing `tick()` | | `Paused` | 3 | Temporarily suspended | | `Stopped` | 4 | Cleanly shut down | | `Error` | 5 | Error or crashed state | ```rust // simplified use horus::prelude::*; // Monitor node state transitions if let Some(state) = node_state_sub.recv() { match state { NodeStateMsg::Running => println!("Node is active"), NodeStateMsg::Error => println!("Node has errors!"), _ => {} } } ``` ## See Also - [Safety Monitor](/advanced/safety-monitor) - Safety monitoring features - [BlackBox Flight Recorder](/advanced/blackbox) - Event recording and crash analysis --- ## Vision Messages Path: /rust/api/vision-messages Description: Camera, image, calibration, and visual detection messages # Vision Messages HORUS provides message types for cameras, images, camera calibration, and visual detection systems. ## Image Pool-backed RAII image type with zero-copy shared memory transport. Image allocates from a global tensor pool — you don't manage memory directly. 
```rust
// simplified
use horus::prelude::*;

// Create an RGB image (width, height, encoding); allocates from the global pool
let mut image = Image::new(640, 480, ImageEncoding::Rgb8)?;

// Copy pixel data into the image (Rgb8 uses 3 bytes per pixel)
let pixels: Vec<u8> = vec![128; 480 * 640 * 3];
image.copy_from(&pixels);

// Set metadata (method chaining)
image.set_frame_id("camera_front").set_timestamp_ns(1234567890);

// Access image properties
println!("Image: {}x{}, {:?}", image.width(), image.height(), image.encoding());
println!("Data size: {} bytes", image.data().len());

// Access individual pixel (x, y)
if let Some(pixel) = image.pixel(0, 0) {
    println!("Pixel[0,0]: R={}, G={}, B={}", pixel[0], pixel[1], pixel[2]);
}

// Set a pixel value
image.set_pixel(0, 0, &[255, 0, 0]);

// Extract region of interest (returns raw bytes, not Image)
if let Some(roi_data) = image.roi(0, 0, 100, 100) {
    println!("ROI data: {} bytes", roi_data.len());
}

// Fill entire image with a color
image.fill(&[0, 0, 0]); // Black
```

**ImageEncoding values:**

| Encoding | Channels | Bytes/Pixel | Description |
|----------|----------|-------------|-------------|
| `Mono8` | 1 | 1 | 8-bit monochrome |
| `Mono16` | 1 | 2 | 16-bit monochrome |
| `Rgb8` | 3 | 3 | 8-bit RGB (default) |
| `Bgr8` | 3 | 3 | 8-bit BGR (OpenCV) |
| `Rgba8` | 4 | 4 | 8-bit RGBA |
| `Bgra8` | 4 | 4 | 8-bit BGRA |
| `Yuv422` | 2 | 2 | YUV 4:2:2 |
| `Mono32F` | 1 | 4 | 32-bit float mono |
| `Rgb32F` | 3 | 12 | 32-bit float RGB |
| `BayerRggb8` | 1 | 1 | Bayer pattern (raw) |
| `Depth16` | 1 | 2 | 16-bit depth (mm) |

**ImageEncoding methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `bytes_per_pixel()` | `u32` | Bytes per pixel for this encoding |
| `is_color()` | `bool` | Whether encoding has color information |

**Image methods:**

Image is an RAII type: its fields are private and accessed through methods. Mutation methods return `&mut Self` for chaining.
| Method | Returns | Description | |--------|---------|-------------| | `new(width, height, encoding)` | `Result` | Create image (allocates from global pool) | | `width()` | `u32` | Image width in pixels | | `height()` | `u32` | Image height in pixels | | `encoding()` | `ImageEncoding` | Pixel encoding format | | `data()` | `&[u8]` | Zero-copy access to pixel data | | `data_mut()` | `&mut [u8]` | Mutable access to pixel data | | `copy_from(src)` | `&mut Self` | Copy pixel data into image | | `pixel(x, y)` | `Option<&[u8]>` | Get pixel bytes at coordinates | | `set_pixel(x, y, value)` | `&mut Self` | Set pixel value at coordinates | | `fill(value)` | `&mut Self` | Fill entire image with a value | | `roi(x, y, w, h)` | `Option>` | Extract raw bytes for a region | | `set_frame_id(id)` | `&mut Self` | Set camera frame identifier | | `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds | ## CompressedImage Compressed image data (JPEG, PNG, etc.). ```rust // simplified use horus::prelude::*; // Create compressed image from JPEG data let jpeg_data = std::fs::read("image.jpg")?; let compressed = CompressedImage::new("jpeg", jpeg_data); println!("Format: {}", compressed.format_str()); println!("Compressed size: {} bytes", compressed.data.len()); // Optional: set original dimensions if known let mut img = compressed; img.width = 640; img.height = 480; ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `format` | `[u8; 8]` | Compression format string (null-padded) | | `data` | `Vec` | Compressed data | | `width` | `u32` | Original width (0 if unknown) | | `height` | `u32` | Original height (0 if unknown) | | `frame_id` | `[u8; 32]` | Camera identifier | | `timestamp_ns` | `u64` | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(format, data)` | `CompressedImage` | Create from format string and data (auto-sets timestamp) | | `format_str()` | `String` | Get format as 
string | ## CameraInfo Camera calibration information. ```rust // simplified use horus::prelude::*; // Create camera info with intrinsics let camera = CameraInfo::new( 640, 480, // width, height 525.0, 525.0, // fx, fy 320.0, 240.0 // cx, cy (principal point) ).with_distortion_model("plumb_bob"); // Access intrinsics let (fx, fy) = camera.focal_lengths(); let (cx, cy) = camera.principal_point(); println!("Focal length: ({:.1}, {:.1})", fx, fy); println!("Principal point: ({:.1}, {:.1})", cx, cy); // Set distortion coefficients let mut camera = camera; camera.distortion_coefficients = [ -0.25, // k1 0.12, // k2 0.001, // p1 -0.001, // p2 0.0, 0.0, 0.0, 0.0 // k3-k6 ]; ``` **Camera Matrix (3x3):** ``` [fx, 0, cx] [ 0, fy, cy] [ 0, 0, 1] ``` **Projection Matrix (3x4):** ``` [fx', 0, cx', Tx] [ 0, fy', cy', Ty] [ 0, 0, 1, 0] ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `width` | `u32` | Image width in pixels | | `height` | `u32` | Image height in pixels | | `distortion_model` | `[u8; 16]` | Distortion model name (null-padded) | | `distortion_coefficients` | `[f64; 8]` | [k1, k2, p1, p2, k3, k4, k5, k6] | | `camera_matrix` | `[f64; 9]` | 3x3 intrinsic matrix (row-major) | | `rectification_matrix` | `[f64; 9]` | 3x3 rectification matrix (identity by default) | | `projection_matrix` | `[f64; 12]` | 3x4 projection matrix (row-major) | | `frame_id` | `[u8; 32]` | Camera identifier | | `timestamp_ns` | `u64` | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(width, height, fx, fy, cx, cy)` | `CameraInfo` | Create with intrinsics (auto-sets camera/projection matrices) | | `with_distortion_model(model)` | `CameraInfo` | Set distortion model name (builder) | | `focal_lengths()` | `(f64, f64)` | Get (fx, fy) from camera matrix | | `principal_point()` | `(f64, f64)` | Get (cx, cy) from camera matrix | ## RegionOfInterest Region of interest (bounding box) in an image. 
```rust // simplified use horus::prelude::*; // Create ROI let roi = RegionOfInterest::new(100, 50, 200, 150); // Check if point is inside ROI if roi.contains(150, 100) { println!("Point is inside ROI"); } // Get area println!("ROI area: {} pixels", roi.area()); // Access properties println!("ROI: ({}, {}) -> {}x{}", roi.x_offset, roi.y_offset, roi.width, roi.height); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x_offset` | `u32` | X offset of region | | `y_offset` | `u32` | Y offset of region | | `width` | `u32` | Region width | | `height` | `u32` | Region height | | `do_rectify` | `bool` | Apply rectification | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(x, y, width, height)` | `RegionOfInterest` | Create a new ROI | | `contains(x, y)` | `bool` | Check if point is inside ROI | | `area()` | `u32` | Get area in pixels | ## StereoInfo Stereo camera pair information. ```rust // simplified use horus_library::messages::vision::StereoInfo; use horus::prelude::*; // Create stereo configuration let left = CameraInfo::new(640, 480, 525.0, 525.0, 320.0, 240.0); let right = CameraInfo::new(640, 480, 525.0, 525.0, 320.0, 240.0); let stereo = StereoInfo { left_camera: left, right_camera: right, baseline: 0.12, // 12cm between cameras depth_scale: 1.0, }; // Calculate depth from disparity let disparity = 64.0_f32; // pixels let depth = stereo.depth_from_disparity(disparity); println!("Disparity {} -> depth {:.2}m", disparity, depth); // Calculate disparity from depth let depth = 2.0_f32; // meters let disparity = stereo.disparity_from_depth(depth); println!("Depth {}m -> disparity {:.1}px", depth, disparity); ``` > **Note:** `StereoInfo` is not included in the convenience re-exports. Import it directly from `horus_library::messages::vision::StereoInfo`. 
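The disparity and depth conversions shown above follow the standard pinhole stereo relation, depth = focal length × baseline / disparity. The sketch below is a standalone illustration and not the HORUS implementation: the free functions and the explicit `fx` parameter are hypothetical, and `depth_scale` is left out because this page does not say how it is applied. The guard values mirror the behaviour listed for the `StereoInfo` methods (`INFINITY` for non-positive disparity, 0 for non-positive depth).

```rust
/// Depth in meters from a pixel disparity, using Z = fx * B / d.
/// `fx` is the focal length in pixels, `baseline_m` the camera separation in meters.
fn depth_from_disparity(fx: f32, baseline_m: f32, disparity_px: f32) -> f32 {
    // A zero or negative disparity corresponds to a point at infinity.
    if disparity_px <= 0.0 {
        return f32::INFINITY;
    }
    fx * baseline_m / disparity_px
}

/// Inverse relation: pixel disparity for a point at `depth_m` meters, d = fx * B / Z.
fn disparity_from_depth(fx: f32, baseline_m: f32, depth_m: f32) -> f32 {
    // Non-positive depth has no meaningful disparity.
    if depth_m <= 0.0 {
        return 0.0;
    }
    fx * baseline_m / depth_m
}
```

With the calibration used in the example (`fx = 525.0`, 12 cm baseline), a 64-pixel disparity works out to a depth of just under one meter, and the two functions are inverses of each other for positive inputs.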
**Fields:** | Field | Type | Description | |-------|------|-------------| | `left_camera` | `CameraInfo` | Left camera calibration | | `right_camera` | `CameraInfo` | Right camera calibration | | `baseline` | `f64` | Camera distance (meters) | | `depth_scale` | `f64` | Disparity-to-depth factor | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `depth_from_disparity(disparity: f32)` | `f32` | Calculate depth from pixel disparity (returns `INFINITY` if disparity <= 0) | | `disparity_from_depth(depth: f32)` | `f32` | Calculate disparity from depth (returns 0 if depth <= 0) | ## BoundingBox2D 2D bounding box for object detection. Fixed-size, suitable for zero-copy shared memory transport. ```rust // simplified use horus::prelude::*; // Create from top-left corner let bbox = BoundingBox2D::new(100.0, 50.0, 200.0, 150.0); // Create from center (YOLO format) let bbox = BoundingBox2D::from_center(200.0, 125.0, 200.0, 150.0); // Get properties println!("Center: ({}, {})", bbox.center_x(), bbox.center_y()); println!("Area: {} px²", bbox.area()); // Calculate IoU between two boxes let other = BoundingBox2D::new(150.0, 75.0, 200.0, 150.0); println!("IoU: {:.3}", bbox.iou(&other)); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `x` | `f32` | X of top-left corner (pixels) | | `y` | `f32` | Y of top-left corner (pixels) | | `width` | `f32` | Width (pixels) | | `height` | `f32` | Height (pixels) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(x, y, width, height)` | `BoundingBox2D` | Create from top-left corner | | `from_center(cx, cy, width, height)` | `BoundingBox2D` | Create from center (YOLO format) | | `center_x()` | `f32` | Get center X coordinate | | `center_y()` | `f32` | Get center Y coordinate | | `area()` | `f32` | Get area | | `iou(other)` | `f32` | Intersection over Union with another box | ## Detection 2D object detection result. 
Fixed-size (72 bytes), suitable for zero-copy shared memory transport. ```rust // simplified use horus::prelude::*; // Create detection with class name, confidence, and bounding box coordinates let det = Detection::new("person", 0.95, 100.0, 50.0, 200.0, 300.0); println!("Detected: {} ({:.1}% confidence)", det.class_name(), det.confidence * 100.0); println!("BBox: ({}, {}) {}x{}", det.bbox.x, det.bbox.y, det.bbox.width, det.bbox.height); // Check confidence threshold if det.is_confident(0.9) { println!("High confidence detection!"); } // Create with class ID instead of name let bbox = BoundingBox2D::new(100.0, 50.0, 200.0, 300.0); let det = Detection::with_class_id(1, 0.88, bbox); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `bbox` | `BoundingBox2D` | Bounding box (x, y, width, height) | | `confidence` | `f32` | Detection confidence (0.0-1.0) | | `class_id` | `u32` | Numeric class identifier | | `class_name` | `[u8; 32]` | Class name string (null-padded, max 31 chars) | | `instance_id` | `u32` | Instance ID (for instance segmentation) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(class_name, confidence, x, y, width, height)` | `Detection` | Create with name and bbox coordinates | | `with_class_id(class_id, confidence, bbox)` | `Detection` | Create with numeric class ID | | `set_class_name(name)` | `()` | Set class name (truncates to 31 chars) | | `class_name()` | `&str` | Get class name as string | | `is_confident(threshold)` | `bool` | Check if confidence >= threshold | ## Detection3D 3D object detection from point clouds or depth-aware models. Fixed-size (104 bytes) with velocity tracking. 
```rust // simplified use horus::prelude::*; // Create 3D bounding box (center, dimensions, yaw) let bbox = BoundingBox3D::new( 5.0, 2.0, 0.5, // center (x, y, z) in meters 4.5, 2.0, 1.5, // dimensions (length, width, height) 0.1 // yaw rotation in radians ); // Create 3D detection with velocity let det = Detection3D::new("car", 0.92, bbox) .with_velocity(10.0, 5.0, 0.0); // m/s println!("Detected: {} at ({}, {}, {})", det.class_name(), det.bbox.cx, det.bbox.cy, det.bbox.cz); println!("Volume: {:.1} m³", det.bbox.volume()); ``` **BoundingBox3D fields:** | Field | Type | Description | |-------|------|-------------| | `cx`, `cy`, `cz` | `f32` | Center coordinates (meters) | | `length`, `width`, `height` | `f32` | Dimensions (meters) | | `roll`, `pitch`, `yaw` | `f32` | Euler angles (radians) | **BoundingBox3D methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(cx, cy, cz, length, width, height, yaw)` | `BoundingBox3D` | Create with yaw only | | `with_rotation(center, dimensions, rotation)` | `BoundingBox3D` | Create with full roll/pitch/yaw | | `volume()` | `f32` | Box volume in m³ | **Detection3D fields:** | Field | Type | Description | |-------|------|-------------| | `bbox` | `BoundingBox3D` | 3D bounding box | | `confidence` | `f32` | Confidence score (0.0-1.0) | | `class_id` | `u32` | Numeric class identifier | | `class_name` | `[u8; 32]` | Class name (null-padded, max 31 chars) | | `velocity_x`, `velocity_y`, `velocity_z` | `f32` | Velocity in m/s | | `instance_id` | `u32` | Instance/tracking ID | ## Vision Processing Node Example ```rust // simplified use horus::prelude::*; struct VisionNode { image_sub: Topic, camera_info_sub: Topic, detection_pub: Topic, camera_info: Option, } impl Node for VisionNode { fn name(&self) -> &str { "VisionNode" } fn tick(&mut self) { // Update camera calibration if let Some(info) = self.camera_info_sub.recv() { self.camera_info = Some(info); } // Process images if let Some(image) = 
self.image_sub.recv() { // Run detection (your ML model here) let detection = Detection::new( "person", 0.95, 100.0, 50.0, 200.0, 300.0 ); self.detection_pub.send(detection); } } } ``` ## See Also - [Perception Messages](/rust/api/perception-messages) - PointCloud, DepthImage - [Message Types](/concepts/message-types) - Standard message type overview --- ## Image Path: /rust/api/image Description: Zero-copy shared memory image type for camera pipelines # Image Pool-backed RAII image type with zero-copy shared memory transport. `Image` allocates from a global tensor pool automatically — you never manage memory directly. Only a lightweight descriptor travels through topics; the pixel data stays in shared memory at ~50ns IPC latency. ## Quick Start ```rust // simplified use horus::prelude::*; // Create a 640x480 RGB image let mut img = Image::new(640, 480, ImageEncoding::Rgb8)?; // Fill with red img.fill(&[255, 0, 0]); // Set metadata (chainable) img.set_frame_id("camera_front") .set_timestamp_ns(1_700_000_000_000_000_000); // Publish on a topic — zero-copy, only the descriptor is sent let topic: Topic = Topic::new("camera.rgb")?; topic.send(&img); ``` ```rust // simplified // Receiver side let topic: Topic = Topic::new("camera.rgb")?; if let Some(img) = topic.recv() { println!("{}x{} {:?}", img.width(), img.height(), img.encoding()); println!("Frame: {}, Timestamp: {}ns", img.frame_id(), img.timestamp_ns()); // Direct pixel access — zero-copy if let Some(px) = img.pixel(0, 0) { println!("Top-left pixel: R={} G={} B={}", px[0], px[1], px[2]); } } ``` ## Pixel Access ```rust // simplified use horus::prelude::*; let mut img = Image::new(640, 480, ImageEncoding::Rgb8)?; // Read a pixel — returns None if out of bounds if let Some(pixel) = img.pixel(100, 200) { println!("R={} G={} B={}", pixel[0], pixel[1], pixel[2]); } // Write a pixel — no-op if out of bounds, chainable img.set_pixel(100, 200, &[255, 128, 0]) .set_pixel(101, 200, &[255, 128, 0]); // Fill entire image 
with a single color img.fill(&[0, 0, 0]); // Black // Extract a region of interest (returns raw bytes) if let Some(roi_data) = img.roi(10, 10, 100, 100) { println!("ROI: {} bytes", roi_data.len()); } ``` ## Raw Data Access ```rust // simplified use horus::prelude::*; let mut img = Image::new(640, 480, ImageEncoding::Rgb8)?; // Zero-copy read access to the underlying buffer let data: &[u8] = img.data(); println!("Total bytes: {}", data.len()); // 640 * 480 * 3 // Mutable access for bulk operations let data_mut: &mut [u8] = img.data_mut(); data_mut[0] = 255; // Set first byte directly // Copy from an external buffer let pixels: Vec = vec![128; 640 * 480 * 3]; img.copy_from(&pixels); ``` ## Camera Pipeline Example A complete camera processing node that receives images, processes them, and publishes results. ```rust // simplified use horus::prelude::*; struct CameraNode { raw_sub: Topic, processed_pub: Topic, } impl Node for CameraNode { fn name(&self) -> &str { "CameraProcessor" } fn tick(&mut self) { if let Some(raw) = self.raw_sub.recv() { // Create output image with same dimensions let mut out = Image::new( raw.width(), raw.height(), ImageEncoding::Mono8 ).unwrap(); // Convert RGB to grayscale (simple luminance) let src = raw.data(); let dst = out.data_mut(); for i in 0..((raw.width() * raw.height()) as usize) { let r = src[i * 3] as f32; let g = src[i * 3 + 1] as f32; let b = src[i * 3 + 2] as f32; dst[i] = (0.299 * r + 0.587 * g + 0.114 * b) as u8; } out.set_frame_id(raw.frame_id()) .set_timestamp_ns(raw.timestamp_ns()); self.processed_pub.send(&out); } } } ``` ## Depth Camera Example ```rust // simplified use horus::prelude::*; // Create a 16-bit depth image (values in millimeters) let mut depth = Image::new(640, 480, ImageEncoding::Depth16)?; // Each pixel is 2 bytes (u16 little-endian) let data = depth.data_mut(); let distance_mm: u16 = 1500; // 1.5 meters data[0] = (distance_mm & 0xFF) as u8; data[1] = (distance_mm >> 8) as u8; println!("Depth image: {}x{}, 
{} bytes/pixel, step={}", depth.width(), depth.height(), depth.encoding().bytes_per_pixel(), depth.step()); ``` ## Rust API Reference ### Constructor ```rust // simplified pub fn new(width: u32, height: u32, encoding: ImageEncoding) -> Result ``` Create an image by allocating from the global tensor pool (zero-copy SHM). **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `width` | `u32` | yes | Image width in pixels. Must be > 0. | | `height` | `u32` | yes | Image height in pixels. Must be > 0. | | `encoding` | `ImageEncoding` | yes | Pixel format: `Rgb8`, `Bgr8`, `Rgba8`, `Mono8`, `Depth16`, etc. Determines bytes per pixel. | - `encoding: ImageEncoding` — Pixel format. Common values: - `ImageEncoding::Rgb8` — 3 bytes/pixel (red, green, blue) - `ImageEncoding::Rgba8` — 4 bytes/pixel (with alpha) - `ImageEncoding::Bgr8` — 3 bytes/pixel (OpenCV default order) - `ImageEncoding::Mono8` — 1 byte/pixel (grayscale) - `ImageEncoding::Mono16` — 2 bytes/pixel (16-bit grayscale) **Returns:** `Result` — `Ok(image)` or `Err(MemoryError::PoolExhausted)` if no pool slots available. **Memory:** Allocated from SHM tensor pool. Zero-copy when sent via `Topic`. The image data is NOT copied between publisher and subscriber — they share the same physical memory. **Example:** ```rust // simplified let img = Image::new(640, 480, ImageEncoding::Rgb8)?; assert_eq!(img.width(), 640); assert_eq!(img.height(), 480); assert_eq!(img.channels(), 3); ``` ### Pixel Access | Method | Returns | Description | |--------|---------|-------------| | `pixel(x, y)` | `Option<&[u8]>` | Get pixel bytes at (x, y). Returns `None` if out of bounds | | `set_pixel(x, y, value)` | `&mut Self` | Set pixel value. No-op if out of bounds. Chainable | | `fill(value)` | `&mut Self` | Fill every pixel with the same value. 
Chainable | | `roi(x, y, w, h)` | `Option>` | Extract a rectangular region as raw bytes | ### Metadata | Method | Returns | Description | |--------|---------|-------------| | `width()` | `u32` | Image width in pixels | | `height()` | `u32` | Image height in pixels | | `channels()` | `u32` | Number of channels (e.g., 3 for RGB) | | `encoding()` | `ImageEncoding` | Pixel encoding format | | `step()` | `u32` | Bytes per row (`width * bytes_per_pixel`) | ### pixel ```rust // simplified pub fn pixel(&self, x: u32, y: u32) -> Option<&[u8]> ``` Get pixel bytes at (x, y). Returns a slice of length `channels()` (e.g., 3 bytes for RGB8). **Returns:** `None` if coordinates are out of bounds. **Example:** ```rust // simplified if let Some(rgb) = img.pixel(100, 200) { println!("R={} G={} B={}", rgb[0], rgb[1], rgb[2]); } ``` ### set_pixel ```rust // simplified pub fn set_pixel(&mut self, x: u32, y: u32, value: &[u8]) -> &mut Self ``` Set pixel value at (x, y). No-op if out of bounds. Chainable. **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `x` | `u32` | yes | Pixel x coordinate (column) | | `y` | `u32` | yes | Pixel y coordinate (row) | | `value` | `&[u8]` | yes | Pixel data. Length must match `channels()` (e.g., 3 bytes for RGB8). | **Example:** ```rust // simplified img.set_pixel(100, 200, &[255, 0, 0]) // red .set_pixel(101, 200, &[0, 255, 0]); // green (chained) ``` ### fill ```rust // simplified pub fn fill(&mut self, value: &[u8]) -> &mut Self ``` Fill every pixel with the same value. Uses SIMD-optimized copy. Chainable. **Example:** ```rust // simplified img.fill(&[0, 0, 0]); // black ``` ### roi ```rust // simplified pub fn roi(&self, x: u32, y: u32, w: u32, h: u32) -> Option> ``` Extract a rectangular region as raw bytes. Returns `None` if the region extends out of bounds. 
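As an illustration of what a region extraction like `roi` involves, the sketch below copies a rectangle out of a tightly packed row-major buffer one row at a time. It is a standalone approximation, not the HORUS implementation: `extract_roi` and its parameters are hypothetical names, and it assumes rows carry no padding (`step == width * bytes_per_pixel`).

```rust
/// Copy the `w` x `h` rectangle whose top-left corner is (`x`, `y`) out of a
/// row-major pixel buffer. Returns `None` if the region leaves the image or
/// the buffer is shorter than `width * height * bpp` bytes.
fn extract_roi(
    data: &[u8],
    width: u32,
    height: u32,
    bpp: u32, // bytes per pixel, e.g. 3 for Rgb8
    x: u32,
    y: u32,
    w: u32,
    h: u32,
) -> Option<Vec<u8>> {
    // checked_add guards against u32 overflow on hostile coordinates.
    if x.checked_add(w)? > width || y.checked_add(h)? > height {
        return None;
    }
    let (width, height, bpp) = (width as usize, height as usize, bpp as usize);
    let (x, y, w, h) = (x as usize, y as usize, w as usize, h as usize);

    let step = width * bpp; // bytes per image row
    if data.len() < step * height {
        return None;
    }

    let mut out = Vec::with_capacity(w * h * bpp);
    for row in y..y + h {
        // Offset of the first requested byte in this row.
        let start = row * step + x * bpp;
        out.extend_from_slice(&data[start..start + w * bpp]);
    }
    Some(out)
}
```

The result is always `w * h * bpp` bytes long, which matches the note above that `roi` returns raw bytes rather than a new `Image`.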
**Example:** ```rust // simplified // Extract 100x100 region from top-left corner if let Some(region) = img.roi(0, 0, 100, 100) { println!("Region: {} bytes", region.len()); } ``` ### Data Access | Method | Returns | Description | |--------|---------|-------------| | `data()` | `&[u8]` | Zero-copy read access to raw pixel bytes | | `data_mut()` | `&mut [u8]` | Mutable access to raw pixel bytes | | `copy_from(src)` | `&mut Self` | Copy bytes from a slice into the image buffer (SIMD-optimized). Chainable | ```rust // simplified // Zero-copy access to pixel buffer let bytes: &[u8] = img.data(); println!("{} bytes total", bytes.len()); // Copy from an external source (e.g., camera capture) let camera_data: Vec = capture_frame(); img.copy_from(&camera_data); ``` ### Frame & Timestamp | Method | Returns | Description | |--------|---------|-------------| | `set_frame_id(id)` | `&mut Self` | Set the camera frame identifier. Chainable | | `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds. Chainable | | `frame_id()` | `&str` | Get the camera frame identifier | | `timestamp_ns()` | `u64` | Get timestamp in nanoseconds | ### Type Info | Method | Returns | Description | |--------|---------|-------------| | `dtype()` | `TensorDtype` | Underlying tensor data type (e.g., `U8` for 8-bit encodings) | | `nbytes()` | `u64` | Total size of the pixel buffer in bytes | | `is_cpu()` | `bool` | Whether the image data resides on CPU | ## ImageEncoding Pixel format enum (`#[repr(u8)]`, default: `Rgb8`). 
| Variant | Channels | Bytes/Pixel | Description | |---------|----------|-------------|-------------| | `Mono8` | 1 | 1 | 8-bit grayscale | | `Mono16` | 1 | 2 | 16-bit grayscale | | `Rgb8` | 3 | 3 | 8-bit RGB (default) | | `Bgr8` | 3 | 3 | 8-bit BGR (OpenCV convention) | | `Rgba8` | 4 | 4 | 8-bit RGBA with alpha | | `Bgra8` | 4 | 4 | 8-bit BGRA with alpha | | `Yuv422` | 2 | 2 | YUV 4:2:2 packed | | `Mono32F` | 1 | 4 | 32-bit float grayscale | | `Rgb32F` | 3 | 12 | 32-bit float RGB | | `BayerRggb8` | 1 | 1 | Bayer RGGB raw sensor data | | `Depth16` | 1 | 2 | 16-bit depth in millimeters | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `bytes_per_pixel()` | `u32` | Number of bytes per pixel | | `channels()` | `u32` | Number of color channels | ## Python API The Python `Image` class wraps the same shared memory backend, with zero-copy interop to NumPy, PyTorch, and JAX. ### Constructor & Factories ```python import horus # Create an empty image (height, width, encoding) img = horus.Image(480, 640, encoding="rgb8") # From a NumPy array — encoding auto-detected from shape import numpy as np arr = np.zeros((480, 640, 3), dtype=np.uint8) img = horus.Image.from_numpy(arr, encoding="rgb8") # From a PyTorch tensor import torch t = torch.zeros(480, 640, 3, dtype=torch.uint8) img = horus.Image.from_torch(t, encoding="rgb8") # From raw bytes data = bytes(480 * 640 * 3) img = horus.Image.from_bytes(data, height=480, width=640, encoding="rgb8") ``` ### Zero-Copy Conversions ```python # To NumPy — zero-copy view arr = img.to_numpy() # shape: (480, 640, 3), dtype: uint8 # To PyTorch — zero-copy via DLPack tensor = img.to_torch() # To JAX — zero-copy via DLPack jax_arr = img.to_jax() ``` ### Pixel Operations ```python # Read pixel at (x, y) r, g, b = img.pixel(100, 200) # Write pixel img.set_pixel(100, 200, [255, 0, 0]) # Fill entire image img.fill([128, 128, 128]) # Extract region of interest roi_bytes = img.roi(0, 0, 100, 100) ``` ### Properties 
& Setters

| Property | Type | Description |
|----------|------|-------------|
| `height` | `int` | Image height in pixels |
| `width` | `int` | Image width in pixels |
| `channels` | `int` | Number of channels |
| `encoding` | `str` | Encoding name (e.g., `"rgb8"`) |
| `dtype` | `str` | Tensor data type |
| `nbytes` | `int` | Total buffer size in bytes |
| `step` | `int` | Bytes per row |
| `frame_id` | `str` | Camera frame identifier |
| `timestamp_ns` | `int` | Timestamp in nanoseconds |

```python
img.set_frame_id("camera_front")
img.set_timestamp_ns(1_700_000_000_000_000_000)
```

### Encoding Aliases

Python accepts flexible encoding strings:

| Canonical | Aliases |
|-----------|---------|
| `"mono8"` | `"gray"`, `"grey"`, `"l"` |
| `"rgb8"` | `"rgb"` |
| `"bgr8"` | `"bgr"` |
| `"rgba8"` | `"rgba"` |
| `"bgra8"` | `"bgra"` |
| `"yuv422"` | `"yuyv"` |
| `"mono32f"` | `"gray32f"`, `"float"` |
| `"depth16"` | (none) |

## See Also

- [Vision Messages](/rust/api/vision-messages) - CompressedImage, CameraInfo, Detection types
- [Tensor API](/rust/api/tensor) - Advanced pool management
- [Tensor Messages](/rust/api/tensor-messages) - Tensor descriptor, TensorDtype, Device
- [Python Image](/python/api/image) - Python Image API

---

## PointCloud

Path: /rust/api/pointcloud
Description: Zero-copy shared memory point cloud type for LiDAR and 3D sensing

# PointCloud

HORUS provides a pool-backed RAII `PointCloud` type for LiDAR and 3D sensing workloads. Point cloud data lives in shared memory and is transported zero-copy between nodes: only a lightweight descriptor is transmitted through topics.

## Creating a PointCloud

```rust
// simplified
use horus::prelude::*;

// XYZ point cloud: 10,000 points, 3 fields per point (x, y, z), float32
let xyz = vec![[0.0_f32; 3]; 10_000];
let mut cloud = PointCloud::from_xyz(&xyz)?;

// XYZI point cloud (with intensity): 50,000 points, 4 fields per point
let xyzi = vec![[0.0_f32; 4]; 50_000];
let mut cloud_i = PointCloud::from_xyzi(&xyzi)?;

// XYZRGB point cloud (with color): 20,000 points, 6 fields per point
let xyzrgb = vec![[0.0_f32; 6]; 20_000];
let mut cloud_rgb = PointCloud::from_xyzrgb(&xyzrgb)?;
```

## Writing Point Data

```rust
// simplified
use horus::prelude::*;

// Allocate a cloud with room for 3 XYZ points
let mut cloud = PointCloud::from_xyz(&[[0.0_f32; 3]; 3])?;

// Copy raw bytes into the cloud
let points: Vec<f32> = vec![
    1.0, 2.0, 3.0, // point 0
    4.0, 5.0, 6.0, // point 1
    7.0, 8.0, 9.0, // point 2
];
let bytes: &[u8] = bytemuck::cast_slice(&points);
cloud.copy_from(bytes);

// Or write directly via mutable data access
let data: &mut [u8] = cloud.data_mut();
// ... fill data ...
```

## Reading Points

```rust
// simplified
use horus::prelude::*;

// Placeholder input: 10,000 XYZ points
let points = vec![[0.0_f32; 3]; 10_000];
let cloud = PointCloud::from_xyz(&points)?;

// Extract all XYZ coordinates (F32 clouds only)
if let Some(points) = cloud.extract_xyz() {
    for p in &points[..5] {
        println!("({:.2}, {:.2}, {:.2})", p[0], p[1], p[2]);
    }
}

// Access a single point as raw bytes
if let Some(point_bytes) = cloud.point_at(0) {
    // Reinterpret the raw bytes as f32 (XYZ cloud: 12 bytes = 3 x f32)
    let x = f32::from_le_bytes(point_bytes[0..4].try_into().unwrap());
    let y = f32::from_le_bytes(point_bytes[4..8].try_into().unwrap());
    let z = f32::from_le_bytes(point_bytes[8..12].try_into().unwrap());
    println!("Point 0: x={}, y={}, z={}", x, y, z);
}

// Zero-copy access to the entire buffer
let raw: &[u8] = cloud.data();
```

## Metadata and Properties

```rust
// simplified
use horus::prelude::*;

// Placeholder input: 10,000 XYZI points
let points = vec![[0.0_f32; 4]; 10_000];
let mut cloud = PointCloud::from_xyzi(&points)?;

// Set frame and timestamp (method chaining)
cloud.set_frame_id("velodyne_top")
    .set_timestamp_ns(1_700_000_000_000_000_000);

// Read back
println!("Frame: {}", cloud.frame_id());
println!("Timestamp: {} ns", cloud.timestamp_ns());

// Point layout queries
println!("Points: {}", cloud.point_count());             // 10000
println!("Fields/point: {}", cloud.fields_per_point());  // 4
println!("Is XYZ: {}", cloud.is_xyz());                  // false (4 fields)
println!("Has intensity: {}", cloud.has_intensity());    // true
println!("Has color: {}", cloud.has_color());            // false

// Type info
println!("Dtype: {:?}", cloud.dtype());       // F32
println!("Total bytes: {}", cloud.nbytes());  // 10000 * 4 * 4
println!("Is CPU: {}", cloud.is_cpu());       // true
```

## Sending and Receiving via Topic

Only a lightweight descriptor is transmitted through topics. The point data stays in shared memory, so the transfer is zero-copy.

```rust
// simplified
use horus::prelude::*;

// Publisher
let pub_topic: Topic<PointCloud> = Topic::new("lidar.points")?;

// Placeholder input: 64,000 XYZI points
let points = vec![[0.0_f32; 4]; 64_000];
let mut cloud = PointCloud::from_xyzi(&points)?;
cloud.set_frame_id("velodyne_top");
// ... fill point data from sensor driver ...
pub_topic.send(cloud);
```

```rust
// simplified
// Subscriber
let sub_topic: Topic<PointCloud> = Topic::new("lidar.points")?;

if let Some(cloud) = sub_topic.recv() {
    println!("Received {} points from '{}'", cloud.point_count(), cloud.frame_id());

    if let Some(xyz) = cloud.extract_xyz() {
        // Distance from the sensor origin to the nearest point
        let closest = xyz.iter()
            .map(|p| (p[0]*p[0] + p[1]*p[1] + p[2]*p[2]).sqrt())
            .fold(f32::INFINITY, f32::min);
        println!("Closest point: {:.2}m", closest);
    }
}
```

## LiDAR Processing Pipeline

A complete node that receives raw LiDAR scans, filters ground points, and publishes the result:

```rust
// simplified
use horus::prelude::*;

struct LidarFilterNode {
    raw_sub: Topic<PointCloud>,
    filtered_pub: Topic<PointCloud>,
    ground_threshold: f32,
}

impl Node for LidarFilterNode {
    fn name(&self) -> &str { "LidarFilter" }

    fn tick(&mut self) {
        if let Some(raw) = self.raw_sub.recv() {
            if let Some(points) = raw.extract_xyz() {
                // Filter out ground points (z at or below threshold)
                let non_ground: Vec<f32> = points.iter()
                    .filter(|p| p[2] > self.ground_threshold)
                    .flat_map(|p| p.iter().copied())
                    .collect();

                let num_points = non_ground.len() / 3;
                if let Ok(mut filtered) = PointCloud::new(
                    num_points as u32, 3, TensorDtype::F32
                ) {
                    let bytes: &[u8] = bytemuck::cast_slice(&non_ground);
                    filtered.copy_from(bytes)
                        .set_frame_id(raw.frame_id())
                        .set_timestamp_ns(raw.timestamp_ns());
                    self.filtered_pub.send(filtered);
                }
            }
        }
    }
}
```

## API Reference

### Constructors

#### from_xyz

```rust
// simplified
pub fn from_xyz(points: &[[f32; 3]]) -> Result<PointCloud>
```

Create a point cloud from XYZ coordinate arrays. **Recommended for most LiDAR and depth camera data.**

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `points` | `&[[f32; 3]]` | yes | Array of [x, y, z] coordinates in meters. Right-hand, Z-up convention. |

**Returns:** `HorusResult<PointCloud>`; `Err(MemoryError::PoolExhausted)` if the tensor pool is full.

**Memory:** Data is copied into the SHM tensor pool. Subsequent `Topic::send()` is zero-copy.
```rust
// simplified
let points = vec![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
let cloud = PointCloud::from_xyz(&points)?;
assert_eq!(cloud.point_count(), 2);
assert!(cloud.is_xyz());
```

#### from_xyzi

```rust
// simplified
pub fn from_xyzi(points: &[[f32; 4]]) -> Result<PointCloud>
```

Create from XYZI arrays (with intensity). Intensity is typically 0.0-1.0 (normalized reflectance).

**Parameters:**

- `points: &[[f32; 4]]`: Array of [x, y, z, intensity]. Units: meters for coordinates, unitless (0.0-1.0) for intensity.

#### from_xyzrgb

```rust
// simplified
pub fn from_xyzrgb(points: &[[f32; 6]]) -> Result<PointCloud>
```

Create from XYZRGB arrays. Color values are 0.0-255.0 as floats.

**Parameters:**

- `points: &[[f32; 6]]`: Array of [x, y, z, r, g, b]. Units: meters for coordinates, 0-255 for color.

| Constructor | Fields/Point | Use Case |
|------------|-------------|----------|
| `from_xyz()` | 3 | LiDAR, depth cameras |
| `from_xyzi()` | 4 | LiDAR with reflectance |
| `from_xyzrgb()` | 6 | RGB-D cameras, colored reconstructions |

### point_at

```rust
// simplified
pub fn point_at(&self, idx: u64) -> Option<&[u8]>
```

Get the raw bytes of the point at index `idx`. Returns a slice of length `fields_per_point * sizeof(f32)`.

**Returns:** `None` if index is out of bounds.

**Example:**

```rust
// simplified
if let Some(bytes) = cloud.point_at(0) {
    // For XYZ cloud: 12 bytes = 3 x f32
    let x = f32::from_le_bytes(bytes[0..4].try_into().unwrap());
    println!("First point X: {}", x);
}
```

### extract_xyz

```rust
// simplified
pub fn extract_xyz(&self) -> Option<Vec<[f32; 3]>>
```

Extract all XYZ coordinates as float arrays. Validates alignment and bounds. Only works for F32 dtype.

**Returns:** `None` if dtype is not F32 or data is misaligned.
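To make the byte reinterpretation concrete, here is a standalone sketch that decodes a raw little-endian point buffer into `[f32; 3]` triples, skipping any extra fields such as intensity. It is not the HORUS implementation: `decode_xyz` is a hypothetical helper, `fields_per_point` is passed in explicitly, and the dtype and alignment validation described above is replaced by a length check plus per-value `from_le_bytes` decoding, which works for unaligned input.

```rust
/// Decode the x, y, z fields of every point in a packed f32 buffer.
/// Returns `None` if a point has fewer than 3 fields or the buffer length
/// is not a whole number of points.
fn decode_xyz(data: &[u8], fields_per_point: usize) -> Option<Vec<[f32; 3]>> {
    let stride = fields_per_point * std::mem::size_of::<f32>();
    // The `< 3` test runs first, so `stride` is never zero in the modulo.
    if fields_per_point < 3 || data.len() % stride != 0 {
        return None;
    }

    // Read one little-endian f32 from the first 4 bytes of `b`.
    let read = |b: &[u8]| f32::from_le_bytes([b[0], b[1], b[2], b[3]]);

    Some(
        data.chunks_exact(stride)
            .map(|p| [read(&p[0..4]), read(&p[4..8]), read(&p[8..12])])
            .collect(),
    )
}
```

Decoding value by value copies the coordinates, which is why the result is an owned `Vec` rather than a borrowed view of the buffer.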
**Example:**

```rust
// simplified
if let Some(points) = cloud.extract_xyz() {
    for p in &points {
        println!("({:.2}, {:.2}, {:.2})", p[0], p[1], p[2]);
    }
    println!("{} points total", points.len());
}
```

### Metadata

| Method | Returns | Description |
|--------|---------|-------------|
| `point_count()` | `u64` | Number of points in the cloud |
| `fields_per_point()` | `u32` | Fields per point (3=XYZ, 4=XYZI, 6=XYZRGB) |
| `is_xyz()` | `bool` | True if this is a plain XYZ cloud (3 fields) |
| `has_intensity()` | `bool` | True if cloud includes an intensity field (4+ fields) |
| `has_color()` | `bool` | True if cloud includes color fields (6+ fields) |

### Data Access (Zero-Copy)

| Method | Returns | Description |
|--------|---------|-------------|
| `data()` | `&[u8]` | Immutable access to the raw point buffer |
| `data_mut()` | `&mut [u8]` | Mutable access to the raw point buffer |
| `copy_from(src)` | `&mut Self` | Copy bytes into the point buffer (chainable) |

### Frame and Timestamp

| Method | Returns | Description |
|--------|---------|-------------|
| `set_frame_id(id)` | `&mut Self` | Set sensor/coordinate frame identifier (chainable) |
| `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds (chainable) |
| `frame_id()` | `&str` | Get the frame identifier |
| `timestamp_ns()` | `u64` | Get timestamp in nanoseconds |

### Type Info

| Method | Returns | Description |
|--------|---------|-------------|
| `dtype()` | `TensorDtype` | Data type of each field (e.g., F32, F64) |
| `nbytes()` | `u64` | Total size of the point buffer in bytes |
| `is_cpu()` | `bool` | Whether data resides on CPU (shared memory) |

## Point Types

Fixed-size point structs optimized for zero-copy transport.

### PointXYZ

Basic 3D point (12 bytes).
```rust
// simplified
use horus::prelude::*;

let p = PointXYZ::new(1.0, 2.0, 3.0);
println!("Distance from origin: {:.2}m", p.distance());

let q = PointXYZ::new(4.0, 6.0, 3.0);
println!("Distance between: {:.2}m", p.distance_to(&q));
```

| Field | Type | Description |
|-------|------|-------------|
| `x` | `f32` | X coordinate (meters) |
| `y` | `f32` | Y coordinate (meters) |
| `z` | `f32` | Z coordinate (meters) |

| Method | Returns | Description |
|--------|---------|-------------|
| `new(x, y, z)` | `PointXYZ` | Create a point |
| `distance()` | `f32` | Euclidean distance from the origin |
| `distance_to(other)` | `f32` | Euclidean distance to another point |

### PointXYZI

3D point with intensity (16 bytes). Common for LiDAR sensors (Velodyne, Ouster, Livox).

```rust
// simplified
use horus::prelude::*;

let p = PointXYZI::new(1.0, 2.0, 3.0, 128.0);

// Convert from PointXYZ (zero intensity)
let xyz = PointXYZ::new(1.0, 2.0, 3.0);
let with_intensity = PointXYZI::from_xyz(xyz);

// Convert back to PointXYZ (drop intensity)
let xyz_only = p.xyz();
```

| Field | Type | Description |
|-------|------|-------------|
| `x`, `y`, `z` | `f32` | Coordinates (meters) |
| `intensity` | `f32` | Reflectance intensity (typically 0-255) |

| Method | Returns | Description |
|--------|---------|-------------|
| `new(x, y, z, intensity)` | `PointXYZI` | Create a point with intensity |
| `from_xyz(xyz)` | `PointXYZI` | Convert from PointXYZ (intensity = 0) |
| `xyz()` | `PointXYZ` | Convert to PointXYZ (drop intensity) |

### PointXYZRGB

3D point with RGB color (16 bytes). Common for RGB-D cameras like Intel RealSense.
```rust
// simplified
use horus::prelude::*;

let p = PointXYZRGB::new(1.0, 2.0, 3.0, 255, 0, 0); // Red point

// Convert from PointXYZ (defaults to white)
let xyz = PointXYZ::new(1.0, 2.0, 3.0);
let colored = PointXYZRGB::from_xyz(xyz);

// Get packed RGB as u32 (0xRRGGBBAA)
let packed = p.rgb_packed();

// Convert back to PointXYZ (drop color)
let xyz_only = p.xyz();
```

| Field | Type | Description |
|-------|------|-------------|
| `x`, `y`, `z` | `f32` | Coordinates (meters) |
| `r`, `g`, `b` | `u8` | Color components (0-255) |
| `a` | `u8` | Alpha/padding (255 default) |

| Method | Returns | Description |
|--------|---------|-------------|
| `new(x, y, z, r, g, b)` | `PointXYZRGB` | Create a colored point |
| `from_xyz(xyz)` | `PointXYZRGB` | Convert from PointXYZ (white, alpha 255) |
| `rgb_packed()` | `u32` | Get color as packed 0xRRGGBBAA |
| `xyz()` | `PointXYZ` | Convert to PointXYZ (drop color) |

## Python API (PyPointCloud)

### Constructor and Factories

```python
import horus
import numpy as np

# Create from scratch
cloud = horus.PointCloud(num_points=10000, fields=3, dtype="float32")

# Create from NumPy array (shape must be [N, fields])
arr = np.random.randn(10000, 3).astype(np.float32)
cloud = horus.PointCloud.from_numpy(arr)

# Create from PyTorch tensor
import torch
tensor = torch.randn(10000, 4)
cloud = horus.PointCloud.from_torch(tensor)
```

### Conversions

```python
# Convert to NumPy (zero-copy when possible)
arr = cloud.to_numpy()       # shape: (N, fields)

# Convert to PyTorch tensor
tensor = cloud.to_torch()    # shape: (N, fields)

# Convert to JAX array
jax_arr = cloud.to_jax()     # shape: (N, fields)
```

### Access and Properties

```python
cloud = horus.PointCloud(num_points=10000, fields=4, dtype="float32")

# Properties
print(cloud.point_count)       # 10000
print(cloud.fields_per_point)  # 4
print(cloud.dtype)             # "float32"
print(cloud.nbytes)            # 160000

# Point access
coords = cloud.point_at(0)     # [x, y, z, intensity] as list of floats

# Metadata
cloud.set_frame_id("velodyne_top")
cloud.set_timestamp_ns(1700000000000000000)
print(cloud.frame_id)          # "velodyne_top"
print(cloud.timestamp_ns)      # 1700000000000000000

# Layout queries
cloud.is_xyz()          # False (4 fields)
cloud.has_intensity()   # True
cloud.has_color()       # False
```

### Python LiDAR Pipeline

```python
import horus
import numpy as np

sub = horus.Topic(horus.PointCloud, endpoint="lidar.points")
pub = horus.Topic(horus.PointCloud, endpoint="lidar.filtered")

while True:
    cloud = sub.recv()
    if cloud is None:
        continue

    # Convert to NumPy for processing
    points = cloud.to_numpy()  # (N, 3)

    # Remove points below ground plane
    mask = points[:, 2] > -0.3
    filtered = points[mask]

    # Publish filtered cloud
    out = horus.PointCloud.from_numpy(filtered.astype(np.float32))
    out.set_frame_id(cloud.frame_id)
    out.set_timestamp_ns(cloud.timestamp_ns)
    pub.send(out)
```

## See Also

- [Perception Messages](/rust/api/perception-messages) - PointField, PointCloudHeader, DepthImage, landmarks, tracking
- [Tensor Messages](/rust/api/tensor-messages) - TensorDtype, Device, auto-managed tensor pools
- [Tensor API](/rust/api/tensor) - Advanced pool management
- [Sensor Messages](/rust/api/sensor-messages) - LaserScan for 2D LiDAR
- [Vision Messages](/rust/api/vision-messages) - Image, CameraInfo, Detection3D
- [Python PointCloud](/python/api/pointcloud) - Python PointCloud API
- [Python Perception Types](/python/api/perception) - PointCloudBuffer, DetectionList, TrackedObject

---

## DepthImage
Path: /rust/api/depth-image
Description: Zero-copy shared memory depth image for RGB-D cameras and depth sensors

# DepthImage

Pool-backed depth image with zero-copy shared memory transport. Use it for obstacle detection, 3D reconstruction, or navigation costmap updates from RGB-D cameras.
**Quick example** -- detect obstacles closer than 1 meter:

```rust
// simplified
use horus::prelude::*;

let depth_topic: Topic<DepthImage> = Topic::new("camera.depth")?;

if let Some(depth) = depth_topic.recv() {
    // Check center pixel for close obstacles
    if let Some(center_distance) = depth.get_depth(320, 240) {
        if center_distance > 0.0 && center_distance < 1.0 {
            hlog!(warn, "Obstacle at {:.2}m!", center_distance);
        }
    }

    // Find closest point in the frame
    if let Some((min, _max, _mean)) = depth.depth_statistics() {
        hlog!(info, "Closest point: {:.2}m", min);
    }
}
```

DepthImage supports two formats: F32 (meters) and U16 (millimeters). All depth access methods work in meters regardless of format; U16 values are converted automatically.

## Creating a DepthImage

```rust
// simplified
use horus::prelude::*;

// F32 depth image -- values stored in meters (default choice)
let mut depth = DepthImage::meters(640, 480)?;

// U16 depth image -- values stored in millimeters (common for Intel RealSense, Azure Kinect)
let mut depth_mm = DepthImage::millimeters(640, 480)?;
```

## Reading and Writing Depth

```rust
// simplified
use horus::prelude::*;

let mut depth = DepthImage::meters(640, 480)?;

// Set depth at pixel (x, y) in meters -- returns &mut Self for chaining
depth.set_depth(320, 240, 1.5)?;
depth.set_depth(100, 100, 2.3)?;

// Get depth at pixel -- always returns meters (auto-converts U16 mm to f32 m)
if let Some(d) = depth.get_depth(320, 240) {
    println!("Depth at center: {:.3}m", d);
}

// For U16 images, get raw millimeter value
let depth_u16 = DepthImage::millimeters(640, 480)?;
if let Some(mm) = depth_u16.get_depth_u16(320, 240) {
    println!("Raw depth: {}mm", mm);
}

// Compute min, max, mean over valid (non-zero) depth values
if let Some((min, max, mean)) = depth.depth_statistics() {
    println!("Range: {:.2}-{:.2}m, mean: {:.2}m", min, max, mean);
}
```

## Metadata and Transport

```rust
// simplified
use horus::prelude::*;

let mut depth = DepthImage::meters(640, 480)?;

// Set metadata (method chaining)
depth
    .set_frame_id("depth_camera")
    .set_timestamp_ns(1234567890);

// Check format
println!("{}x{}", depth.width(), depth.height());
println!("Meters: {}, Scale: {}", depth.is_meters(), depth.depth_scale());

// Zero-copy raw data access
let raw: &[u8] = depth.data();
println!("{} bytes", depth.nbytes());
```

## RGB-D Pipeline

```rust
// simplified
use horus::prelude::*;

struct RgbdNode {
    rgb_sub: Topic<Image>,
    depth_sub: Topic<DepthImage>,
    cloud_pub: Topic<PointCloud>,
    fx: f32, fy: f32, cx: f32, cy: f32,
}

impl Node for RgbdNode {
    fn name(&self) -> &str { "RgbdNode" }

    fn tick(&mut self) {
        let depth = match self.depth_sub.recv() {
            Some(d) => d,
            None => return,
        };

        let w = depth.width();
        let h = depth.height();

        // Depth-to-pointcloud using camera intrinsics:
        // back-project every valid pixel before building the cloud
        let mut xyz_points: Vec<[f32; 3]> = Vec::new();
        for y in 0..h {
            for x in 0..w {
                if let Some(z) = depth.get_depth(x, y) {
                    if z > 0.0 {
                        let px = (x as f32 - self.cx) * z / self.fx;
                        let py = (y as f32 - self.cy) * z / self.fy;
                        xyz_points.push([px, py, z]);
                    }
                }
            }
        }

        if let Ok(mut cloud) = PointCloud::from_xyz(&xyz_points) {
            cloud.set_frame_id(depth.frame_id());
            self.cloud_pub.send(cloud);
        }
    }
}
```

## Method Reference

### meters

Creates an F32 depth image storing values in meters.

**Signature**

```rust
// simplified
pub fn meters(width: u32, height: u32) -> HorusResult<Self>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `width` | `u32` | yes | Image width in pixels |
| `height` | `u32` | yes | Image height in pixels |

**Returns**

`HorusResult<DepthImage>` -- F32 depth image. Each pixel stores depth in meters as `f32`.

**When to use**: Processed depth data, simulation output, SLAM maps.

**When NOT to use**: Raw sensor data from RealSense/Kinect; use `millimeters()` instead (matches sensor output format).

**Example:**

```rust
// simplified
let depth = DepthImage::meters(640, 480)?;
assert!(depth.is_meters());
```

### millimeters

Creates a U16 depth image storing values in millimeters.
**Signature**

```rust
// simplified
pub fn millimeters(width: u32, height: u32) -> HorusResult<Self>
```

**Parameters**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `width` | `u32` | yes | Image width in pixels |
| `height` | `u32` | yes | Image height in pixels |

**Returns**

`HorusResult<DepthImage>` -- U16 depth image. Each pixel stores depth in millimeters as `u16`. Range: 0-65535mm (0-65.535m).

**When to use**: Raw sensor data from Intel RealSense, Azure Kinect, stereo cameras; matches their native output format. Half the memory of F32.

**Example:**

```rust
// simplified
let depth = DepthImage::millimeters(640, 480)?;
assert!(depth.is_millimeters());
```

### get_depth

```rust
// simplified
pub fn get_depth(&self, x: u32, y: u32) -> Option<f32>
```

Get depth in **meters** at pixel (x, y). Automatically converts U16 millimeters to F32 meters.

**Returns:** `None` if out of bounds. Returns `0.0` for invalid/no-return pixels.

### set_depth

```rust
// simplified
pub fn set_depth(&mut self, x: u32, y: u32, value: f32) -> HorusResult<&mut Self>
```

Set depth in meters. For U16 images, validates range (0-65.535m). Chainable.

**Errors:** `ValidationError::OutOfRange` if value exceeds U16 range for millimeter images.

### depth_statistics

```rust
// simplified
pub fn depth_statistics(&self) -> Option<(f32, f32, f32)>
```

Compute (min, max, mean) over valid (non-zero) depth values, in meters. Returns `None` if no valid pixels.

**Example:**

```rust
// simplified
if let Some((min, max, mean)) = depth.depth_statistics() {
    println!("Range: {:.2}-{:.2}m, avg: {:.2}m", min, max, mean);
}
```

## API Quick Reference

DepthImage is an RAII type -- fields are private, accessed through methods. Mutation methods return `&mut Self` for chaining.
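The millimeter-to-meter handling behind `get_depth` and `depth_statistics` can be sketched in plain Rust over a raw `u16` pixel slice. This is an illustrative model only, assuming zero marks an invalid pixel as the method descriptions state. `mm_to_m` and `depth_stats_mm` are hypothetical names, not HORUS functions.

```rust
/// 1 mm = 0.001 m.
const MM_TO_M: f32 = 0.001;

/// Convert a raw U16 millimeter reading to meters.
fn mm_to_m(mm: u16) -> f32 {
    mm as f32 * MM_TO_M
}

/// (min, max, mean) in meters over valid (non-zero) readings.
/// Returns `None` when the slice holds no valid pixel.
fn depth_stats_mm(pixels: &[u16]) -> Option<(f32, f32, f32)> {
    let mut min = f32::INFINITY;
    let mut max = f32::NEG_INFINITY;
    let mut sum = 0.0f64; // accumulate in f64 to limit rounding drift
    let mut count = 0u64;

    for &mm in pixels {
        if mm == 0 {
            continue; // zero means "no return", so it is skipped
        }
        let d = mm_to_m(mm);
        min = min.min(d);
        max = max.max(d);
        sum += d as f64;
        count += 1;
    }

    if count == 0 {
        None
    } else {
        Some((min, max, (sum / count as f64) as f32))
    }
}
```

Skipping zeros matters: including them would pull the minimum to 0 m and bias the mean toward the camera.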
| Method | Returns | Description |
|--------|---------|-------------|
| `new(width, height, dtype)` | `Result<Self>` | Create depth image (F32 or U16) from global pool |
| `get_depth(x, y)` | `Option<f32>` | Depth in meters at pixel (auto-converts U16) |
| `set_depth(x, y, value)` | `Result<&mut Self>` | Set depth in meters at pixel |
| `get_depth_u16(x, y)` | `Option<u16>` | Raw U16 millimeters (U16 dtype only) |
| `depth_statistics()` | `Option<(f32, f32, f32)>` | (min, max, mean) in meters over valid depths |
| `width()` | `u32` | Image width in pixels |
| `height()` | `u32` | Image height in pixels |
| `is_meters()` | `bool` | True if dtype is F32 |
| `is_millimeters()` | `bool` | True if dtype is U16 |
| `depth_scale()` | `f32` | Depth unit scale factor |
| `data()` | `&[u8]` | Zero-copy access to raw depth data |
| `data_mut()` | `&mut [u8]` | Mutable access to raw depth data |
| `copy_from(src)` | `&mut Self` | Copy raw bytes into the image |
| `dtype()` | `TensorDtype` | Data type (F32 or U16) |
| `nbytes()` | `u64` | Total size in bytes |
| `is_cpu()` | `bool` | Whether data is on CPU |
| `set_frame_id(id)` | `&mut Self` | Set sensor frame identifier |
| `frame_id()` | `&str` | Get sensor frame identifier |
| `set_timestamp_ns(ts)` | `&mut Self` | Set timestamp in nanoseconds |
| `timestamp_ns()` | `u64` | Get timestamp in nanoseconds |

**Depth format summary:**

| Dtype | Unit | `is_meters()` | `depth_scale()` | Typical sensor |
|-------|------|---------------|-----------------|----------------|
| `TensorDtype::F32` | meters | `true` | `1.0` | Processed / simulated |
| `TensorDtype::U16` | millimeters | `false` | `0.001` | RealSense, Kinect, stereo |

## Python API (PyDepthImage)

```python
import horus
import numpy as np

# Create from dtype string
depth = horus.DepthImage(480, 640, dtype="float32")   # meters
depth = horus.DepthImage(480, 640, dtype="uint16")    # millimeters
depth = horus.DepthImage(480, 640, dtype="mm")        # alias for uint16

# Create from numpy array
arr = np.zeros((480, 640), dtype=np.float32)
depth = horus.DepthImage.from_numpy(arr)

# Create from torch tensor
import torch
t = torch.zeros(480, 640, dtype=torch.float32)
depth = horus.DepthImage.from_torch(t)

# Read/write depth (meters)
depth.set_depth(320, 240, 1.5)
d = depth.get_depth(320, 240)  # -> float

# Statistics
stats = depth.depth_statistics()  # -> (min, max, mean) or None

# Convert to numpy/torch/jax
arr = depth.to_numpy()
t = depth.to_torch()
j = depth.to_jax()

# Properties
depth.height       # 480
depth.width        # 640
depth.dtype        # "float32"
depth.nbytes       # total bytes
depth.depth_scale  # 1.0 for F32, 0.001 for U16

# Metadata
depth.set_frame_id("depth_cam")
depth.set_timestamp_ns(123456)
depth.is_meters()        # True
depth.is_millimeters()   # False
```

**Valid dtype strings:**

| String | Format |
|--------|--------|
| `"float32"`, `"meters"` | F32, meters |
| `"uint16"`, `"millimeters"`, `"mm"` | U16, millimeters |

## See Also

- [Perception Messages](/rust/api/perception-messages) -- PointCloud, Landmark, TrackedObject
- [Vision Messages](/rust/api/vision-messages) -- Image, CameraInfo, Detection
- [Tensor API](/rust/api/tensor) -- Underlying pool management
- [Python DepthImage](/python/api/depth-image) -- Python DepthImage API

---

## Tensor
Path: /rust/api/tensor
Description: Zero-copy tensor descriptor with DLPack interop for ML pipelines

# Tensor

A lightweight tensor descriptor for zero-copy ML data sharing across nodes and processes.

```rust
// simplified
use horus::prelude::*;
```

## Overview

`Tensor` is a lightweight descriptor that references data in shared memory. Only the descriptor is transmitted through topics; the actual tensor data stays in place, enabling zero-copy transport for large ML payloads.
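To make the descriptor/payload split concrete, here is a minimal sketch of a fixed-size descriptor in plain Rust. The `TensorDesc` struct, its field names, and the 8-dimension cap are all assumptions made for illustration; the actual HORUS descriptor layout is not shown on this page.

```rust
use std::mem::size_of;

/// Hypothetical fixed-size descriptor (not the real HORUS layout).
/// It records where the data lives and how to walk it, never the data itself.
#[derive(Clone, Copy, Debug)]
struct TensorDesc {
    shape: [u64; 8],
    strides: [u64; 8], // byte strides per dimension
    ndim: u8,
    elem_size: u8,
    offset: u64, // byte offset into the shared-memory pool (assumed)
}

impl TensorDesc {
    /// Build a C-contiguous (row-major) descriptor for up to 8 dimensions.
    fn contiguous(dims: &[u64], elem_size: u8) -> Option<TensorDesc> {
        if dims.len() > 8 {
            return None;
        }
        let mut shape = [0u64; 8];
        let mut strides = [0u64; 8];
        let mut acc = elem_size as u64;
        // Walk from the innermost dimension outward, accumulating byte strides.
        for i in (0..dims.len()).rev() {
            shape[i] = dims[i];
            strides[i] = acc;
            acc = acc.checked_mul(dims[i])?;
        }
        Some(TensorDesc { shape, strides, ndim: dims.len() as u8, elem_size, offset: 0 })
    }

    /// Payload size in bytes: element count times element size.
    fn nbytes(&self) -> u64 {
        let n = self.ndim as usize;
        self.shape[..n].iter().product::<u64>() * self.elem_size as u64
    }

    /// Size of the descriptor itself, which is all a topic would need to carry.
    fn descriptor_bytes() -> usize {
        size_of::<TensorDesc>()
    }
}
```

For a 1080x1920x3 `u8` image the payload is 6,220,800 bytes, while a descriptor like this one stays well under a few hundred bytes, which is why sending only the descriptor is cheap.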
## Methods

| Method | Return Type | Description |
|--------|-------------|-------------|
| `shape()` | `&[u64]` | Tensor dimensions (e.g., `[1080, 1920, 3]`) |
| `strides()` | `&[u64]` | Byte strides per dimension |
| `numel()` | `u64` | Total number of elements |
| `nbytes()` | `u64` | Total size in bytes (`numel * dtype.element_size()`) |
| `dtype()` | `TensorDtype` | Element data type |
| `device()` | `Device` | Device location (CPU or CUDA) |
| `is_cpu()` | `bool` | True if data resides on CPU / shared memory |
| `is_cuda()` | `bool` | True if device descriptor is set to CUDA |
| `is_contiguous()` | `bool` | True if memory layout is C-contiguous |
| `view(new_shape)` | `Option<Tensor>` | Reshape without copying (fails if not contiguous or element count changes) |
| `slice_first_dim(start, end)` | `Option<Tensor>` | Slice along the first dimension, adjusting strides |

### Reshape and Slice

```rust
// simplified
let topic: Topic<Tensor> = Topic::new("model.input")?;

if let Some(tensor) = topic.recv() {
    // Reshape a flat 1D tensor into a batch of images
    if let Some(reshaped) = tensor.view(&[4, 3, 224, 224]) {
        println!("Batch shape: {:?}", reshaped.shape()); // [4, 3, 224, 224]

        // Take the first 2 items from the reshaped batch
        if let Some(sliced) = reshaped.slice_first_dim(0, 2) {
            println!("Sliced shape: {:?}", sliced.shape()); // [2, 3, 224, 224]
        }
    }
}
```

## TensorDtype

Supported element types with sizes and common use cases:

| Dtype | Size | Use Case |
|-------|------|----------|
| `F32` | 4 bytes | ML training and inference |
| `F64` | 8 bytes | High-precision computation |
| `F16` | 2 bytes | Memory-efficient inference |
| `BF16` | 2 bytes | Training on modern GPUs |
| `I8` | 1 byte | Quantized inference |
| `I16` | 2 bytes | Audio, sensor data |
| `I32` | 4 bytes | General integer |
| `I64` | 8 bytes | Large signed values |
| `U8` | 1 byte | Images |
| `U16` | 2 bytes | Depth sensors (mm) |
| `U32` | 4 bytes | Large indices |
| `U64` | 8 bytes | Counters, timestamps |
| `Bool` | 1 byte | Masks |
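The sizes in the table translate directly into buffer sizes through `nbytes = numel * element_size`. The sketch below mirrors the table with a local `Dtype` enum; it is a stand-in written for this example, not the HORUS `TensorDtype` type.

```rust
/// Local stand-in for the dtype table above (not the HORUS `TensorDtype`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dtype {
    F32, F64, F16, BF16,
    I8, I16, I32, I64,
    U8, U16, U32, U64,
    Bool,
}

impl Dtype {
    /// Element size in bytes, taken from the table.
    fn element_size(self) -> u64 {
        match self {
            Dtype::I8 | Dtype::U8 | Dtype::Bool => 1,
            Dtype::F16 | Dtype::BF16 | Dtype::I16 | Dtype::U16 => 2,
            Dtype::F32 | Dtype::I32 | Dtype::U32 => 4,
            Dtype::F64 | Dtype::I64 | Dtype::U64 => 8,
        }
    }
}

/// Total buffer size: product of the dimensions times the element size.
fn nbytes(shape: &[u64], dtype: Dtype) -> u64 {
    shape.iter().product::<u64>() * dtype.element_size()
}
```

A `[4, 3, 224, 224]` batch holds 602,112 elements, so it needs 2,408,448 bytes as `F32` and half that as `F16`. A 640x480 `U16` depth frame needs 614,400 bytes.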
### TensorDtype Methods

```rust
// simplified
let dtype = TensorDtype::F32;

// Size in bytes
assert_eq!(dtype.element_size(), 4);

// Display (lowercase string representation)
println!("{}", dtype); // "float32"

// Parse from string -- accepts common aliases
let parsed = TensorDtype::parse("float32").unwrap(); // F32
let parsed = TensorDtype::parse("f16").unwrap();     // F16
let parsed = TensorDtype::parse("uint8").unwrap();   // U8
let parsed = TensorDtype::parse("bool").unwrap();    // Bool
```

## Device

Fixed-size device descriptor supporting CPU and CUDA device tags. `Device::cuda(N)` targets CUDA GPU index N. The optimal GPU memory backend is auto-detected at runtime: unified memory on Jetson, managed memory on discrete GPUs.

```rust
// simplified
// Constructors
let cpu = Device::cpu();
let gpu0 = Device::cuda(0); // CUDA device 0

// Check device type
assert!(cpu.is_cpu());
assert!(gpu0.is_cuda());

// Display
println!("{}", gpu0); // "cuda:0"

// Parse from string
let dev = Device::parse("cpu").unwrap();
let dev = Device::parse("cuda:0").unwrap();

// GPU detection
if horus::cuda_available() {
    println!("{} GPU(s) found", horus::cuda_device_count());
}

// GPU-backed Image
let gpu_img = Image::new(640, 480, ImageEncoding::Rgb8)?.to_gpu(Device::cuda(0))?;
assert!(gpu_img.is_cuda());

// Transfer between devices
let cpu_img = gpu_img.to_cpu()?;
let back_on_gpu = cpu_img.to_gpu(Device::cuda(0))?;
```

## ML Pipeline Example

A camera node captures frames using `Image`, while a preprocessing node converts them into batched `Tensor` data for model inference:

```rust
// simplified
use horus::prelude::*;

// Producer: camera capture node -- uses Image, not raw Tensor
node! {
    CameraNode {
        pub { frames: Image -> "camera.rgb" }
        data { frame_count: u64 = 0 }

        tick {
            let image = Image::new(640, 480, ImageEncoding::Rgb8);
            // ... fill pixel data from camera driver ...
            self.frames.send(&image);
            self.frame_count += 1;
        }
    }
}

// Preprocessor: converts Image frames into batched Tensor for ML
node! {
    PreprocessNode {
        sub { frames: Image -> "camera.rgb" }
        pub { batch: Tensor -> "model.input" }
        data { buffer: Vec<Image> = Vec::new() }

        tick {
            if let Some(img) = self.frames.recv() {
                self.buffer.push(img);
                if self.buffer.len() >= 4 {
                    // Build a [4, 3, 224, 224] F32 batch tensor for the model
                    let tensor = Tensor::from_shape(
                        &[4, 3, 224, 224],
                        TensorDtype::F32,
                        Device::cpu(),
                    );
                    // ... resize, normalize, and copy frames into tensor ...
                    self.batch.send(&tensor);
                    self.buffer.clear();
                }
            }
        }
    }
}

// Consumer: inference node -- works with raw Tensor input/output
node! {
    InferenceNode {
        sub { input: Tensor -> "model.input" }
        pub { detections: GenericMessage -> "model.detections" }

        tick {
            if let Some(tensor) = self.input.recv() {
                hlog!(debug, "Input: {:?}, {} bytes", tensor.shape(), tensor.nbytes());
                // Run inference on the batch tensor, publish results ...
            }
        }
    }
}
```

## Python Usage

In Python, use `Image`, `PointCloud`, or `DepthImage` for zero-copy tensor data. They wrap the pool-backed tensor system automatically and provide `.to_numpy()`, `.to_torch()`, and `.to_jax()` conversions:

```python
import horus
import numpy as np

# Image → NumPy (zero-copy)
img = horus.Image(480, 640, "rgb8")
arr = img.to_numpy()  # shape: (480, 640, 3), dtype: uint8

# PointCloud → PyTorch (zero-copy via DLPack)
cloud = horus.PointCloud.from_numpy(np.random.randn(1000, 3).astype(np.float32))
tensor = cloud.to_torch()

# DepthImage → JAX
depth = horus.DepthImage(480, 640, "float32")
jax_arr = depth.to_jax()
```

See [Python Image](/python/api/image), [Python PointCloud](/python/api/pointcloud), and [Python DepthImage](/python/api/depth-image) for the full APIs.

---

## TensorDtype

Enumerates all supported tensor element types. Matches common ML framework dtypes for seamless interop with PyTorch, NumPy, JAX, and DLPack.
| Variant | Value | Size | NumPy | Use Case |
|---------|-------|------|-------|----------|
| `F32` | 0 | 4 bytes | `<f4` | … |

…

```rust
// …
let topic: Topic<GenericMessage> = Topic::new("external.detections")?;
topic.send(msg);
```

```python
# Python side: receive and parse
topic = Topic(GenericMessage, "external.detections")
msg = topic.recv()
data = msg.to_dict()  # {"class": "box", "confidence": 0.95, ...}
```

## When to Use

Use `GenericMessage` when standard types (`CmdVel`, `Imu`, `LaserScan`, etc.) don't cover your data:

- Sending structured data from Rust to Python (or vice versa)
- Prototyping a custom message before defining a dedicated type
- Bridging external systems that produce arbitrary payloads

**Prefer typed messages whenever possible.** Typed POD messages achieve ~200ns latency via zero-copy shared memory. `GenericMessage` requires MessagePack serialization, which adds overhead (~4us). Use it only when you need the flexibility.

## Constructors

### From Raw Bytes

```rust
// simplified
let data: Vec<u8> = vec![0x01, 0x02, 0x03];
let msg = GenericMessage::new(data)?;
```

Maximum payload is **4,096 bytes** (4KB). Returns an error if exceeded.

### With Metadata

```rust
// simplified
let data: Vec<u8> = sensor_bytes.to_vec();
let msg = GenericMessage::with_metadata(data, "lidar_raw".to_string())?;
```

Metadata is an optional string label (max **255 bytes**). Useful for tagging the content type so the receiver knows how to interpret the payload.

### From a Serializable Value

```rust
// simplified
use std::collections::HashMap;

let mut config = HashMap::new();
config.insert("gain", 1.5_f64);
config.insert("offset", 0.3);

let msg = GenericMessage::from_value(&config)?;
```

Serializes any `T: Serialize` via MessagePack into the payload bytes.
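The two size limits stated for the constructors (4,096 payload bytes, 255 metadata bytes) can be checked up front before building a message. The sketch below is a plain-Rust illustration of that validation; `check_limits` and `LimitError` are hypothetical names, and the real constructors may report errors differently.

```rust
/// Limits as stated in this section.
const MAX_PAYLOAD_BYTES: usize = 4096;
const MAX_METADATA_BYTES: usize = 255;

/// Hypothetical error type for this sketch; HORUS uses its own error types.
#[derive(Debug, PartialEq)]
enum LimitError {
    PayloadTooLarge { len: usize },
    MetadataTooLarge { len: usize },
}

/// Check a payload and optional metadata label against the documented limits.
/// `str::len` counts bytes, not characters, which matches a byte-based limit.
fn check_limits(payload: &[u8], metadata: Option<&str>) -> Result<(), LimitError> {
    if payload.len() > MAX_PAYLOAD_BYTES {
        return Err(LimitError::PayloadTooLarge { len: payload.len() });
    }
    if let Some(m) = metadata {
        if m.len() > MAX_METADATA_BYTES {
            return Err(LimitError::MetadataTooLarge { len: m.len() });
        }
    }
    Ok(())
}
```

When serializing a value, the limit applies to the encoded bytes, so a check like this belongs after serialization rather than before it.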
## Methods

| Method | Return Type | Description |
|--------|-------------|-------------|
| `data()` | `Vec<u8>` | Get the raw payload bytes |
| `metadata()` | `Option<String>` | Get the metadata string, if set |
| `to_value::<T>()` | `Result<T>` | Deserialize payload from MessagePack into `T` |

## Examples

### Rust to Rust

```rust
// simplified
use horus::prelude::*;
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct CalibrationData {
    offsets: Vec<f64>,
    scale: f64,
    label: String,
}

// Sender
let cal = CalibrationData {
    offsets: vec![0.1, -0.05, 0.02],
    scale: 1.001,
    label: "imu_0".into(),
};
let topic: Topic<GenericMessage> = Topic::new("calibration")?;
topic.send(GenericMessage::from_value(&cal)?);

// Receiver
if let Some(msg) = topic.recv() {
    let cal: CalibrationData = msg.to_value()?;
    println!("Scale: {}, offsets: {:?}", cal.scale, cal.offsets);
}
```

### Rust to Python

A Rust node publishes config data that a Python node consumes:

```rust
// simplified
// Rust sender
use std::collections::HashMap;

let mut params = HashMap::new();
params.insert("kp", 1.2_f64);
params.insert("ki", 0.01);
params.insert("kd", 0.5);

let topic: Topic<GenericMessage> = Topic::new("controller.params")?;
let msg = GenericMessage::with_metadata(
    GenericMessage::from_value(&params)?.data(),
    "pid_gains".to_string(),
)?;
topic.send(msg);
```

```python
# Python receiver
import horus

topic = horus.Topic(horus.GenericMessage, endpoint="controller.params")
msg = topic.recv()
if msg is not None:
    params = msg.to_dict()  # Deserializes MessagePack
    print(f"PID gains: kp={params['kp']}, ki={params['ki']}, kd={params['kd']}")
    if msg.metadata == "pid_gains":
        controller.update_gains(**params)
```

### With Metadata Tagging

```rust
// simplified
// Tag messages so the receiver can dispatch by type
let msg = GenericMessage::with_metadata(
    GenericMessage::from_value(&sensor_reading)?.data(),
    "temperature".to_string(),
)?;
topic.send(msg);

// Receiver dispatches based on metadata
if let Some(msg) = topic.recv() {
    match msg.metadata().as_deref() {
        Some("temperature") => {
            let temp: f64 = msg.to_value()?;
            hlog!(info, "Temperature: {:.1} C", temp);
        }
        Some("humidity") => {
            let hum: f64 = msg.to_value()?;
            hlog!(info, "Humidity: {:.1}%", hum);
        }
        _ => {
            hlog!(warn, "Unknown message type");
        }
    }
}
```

## Performance

`GenericMessage` uses MessagePack serialization, which adds overhead compared to typed POD messages:

| Payload Size | GenericMessage Latency | Typed Message Latency |
|-------------|----------------------|----------------------|
| Small (up to 256 bytes) | ~4.0 us | ~200 ns |
| Large (up to 4 KB) | ~4.4 us | ~200 ns |

The roughly 20x difference comes from serialization/deserialization. For high-frequency data (IMU at 1 kHz, motor commands at 500 Hz), always use typed messages. `GenericMessage` is appropriate for lower-frequency data like configuration updates, calibration parameters, or diagnostic payloads.

## Limits

| Constraint | Value |
|-----------|-------|
| Maximum payload | 4,096 bytes (4 KB) |
| Maximum metadata | 255 bytes |
| Serialization format | MessagePack |

The 4KB limit keeps `GenericMessage` as a fixed-size type suitable for shared memory transport. For larger payloads, use `Topic<Tensor>` with the [Tensor API](/rust/api/tensor) instead.

## Pitfalls

**No compile-time type safety.** Unlike a typed topic such as `Topic<CmdVel>`, `Topic<GenericMessage>` can carry any data. A publisher sending `{"velocity": 1.0}` and a subscriber expecting `{"speed": 1.0}` will fail silently; the key name mismatch is only caught at runtime.

**Debugging is harder.** GenericMessage shows as raw bytes in the monitor. Typed messages show structured fields. Prefer typed messages for anything you'll need to debug.

**Not for high-frequency data.** The 20x latency overhead (4us vs 200ns) matters at 1kHz+. Use typed messages (`CmdVel`, `Imu`, etc.) for control loops and sensor streams.
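One way to judge whether the overhead matters for a given loop is to express the per-message cost as a fraction of the tick period. The sketch below does that arithmetic with the approximate figures quoted in this section (~4 us and ~200 ns). `period_ns` and `budget_fraction` are hypothetical helpers, and real costs vary with hardware and payload.

```rust
/// Length of one tick in nanoseconds for a loop running at `rate_hz`.
fn period_ns(rate_hz: f64) -> f64 {
    1.0e9 / rate_hz
}

/// Fraction of one tick spent on messaging when each tick handles
/// `messages_per_tick` messages costing `cost_ns` each.
fn budget_fraction(rate_hz: f64, cost_ns: f64, messages_per_tick: u32) -> f64 {
    cost_ns * messages_per_tick as f64 / period_ns(rate_hz)
}
```

At 1 kHz the period is 1,000,000 ns, so a single ~4 us message uses about 0.4% of the tick against about 0.02% for a ~200 ns typed message. The gap grows linearly with message count: ten serialized messages per tick would take about 4% of the period.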
**When to use GenericMessage:**
- Python↔Python communication with arbitrary dicts
- Configuration updates at low frequency
- Prototyping before defining a typed message
- Cross-language payloads where defining a shared struct isn't worth it

## See Also

- [Standard Messages](/rust/api/messages) -- All built-in message types
- [TensorPool API](/rust/api/tensor) -- For payloads larger than 4KB
- [message! Macro](/rust/api/macros#message) -- Define custom typed messages

---

## Input Messages
Path: /rust/api/input-messages
Description: Keyboard and joystick/gamepad input messages for teleoperation and HID control

# Input Messages

HORUS provides input message types for teleoperation and human interface device (HID) control. Both types are fixed-size (72 bytes), optimized for zero-copy shared memory transport.

## KeyboardInput

Keyboard key press/release events with modifier tracking.

```rust
// simplified
use horus::prelude::*;

// Create a key press event
let key = KeyboardInput::new(
    "w".to_string(),        // key name
    87,                     // raw key code
    vec!["Shift".into()],   // active modifiers
    true                    // pressed
);

// Check key state
println!("Key: {}", key.key_name());
println!("Pressed: {}", key.is_pressed());

// Check individual modifiers
if key.is_ctrl() {
    println!("Ctrl is held");
}
if key.is_shift() {
    println!("Shift is held");
}

// Check any modifier by name
if key.has_modifier("Alt") {
    println!("Alt is held");
}

// Get all active modifiers as a list
let mods = key.modifiers(); // Vec<String>
println!("Active modifiers: {:?}", mods);
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `key_name` | `[u8; 32]` | - | Key name buffer (null-terminated, max 31 chars) |
| `code` | `u32` | - | Raw key code |
| `modifier_flags` | `u32` | - | Bit flags for active modifiers |
| `pressed` | `u8` | - | 1 = press, 0 = release |
| `timestamp_ms` | `u64` | ms | Unix timestamp in milliseconds |

**Modifier Flags:**

| Constant | Value | Description |
|----------|-------|-------------|
| `MODIFIER_CTRL` | `1 << 0` | Control key |
| `MODIFIER_ALT` | `1 << 1` | Alt key |
| `MODIFIER_SHIFT` | `1 << 2` | Shift key |
| `MODIFIER_SUPER` | `1 << 3` | Super/Windows/Cmd key |
| `MODIFIER_HYPER` | `1 << 4` | Hyper key |
| `MODIFIER_META` | `1 << 5` | Meta key |

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `new(key, code, modifiers, pressed)` | `KeyboardInput` | Create from key name, code, modifier names, pressed state |
| `key_name()` | `String` | Get key name from buffer |
| `is_pressed()` | `bool` | Check if key is pressed |
| `modifiers()` | `Vec<String>` | Get all active modifiers as string names |
| `has_modifier(name)` | `bool` | Check modifier by name ("Ctrl", "Alt", "Shift", "Super", "Hyper", "Meta") |
| `is_ctrl()` | `bool` | Ctrl held |
| `is_shift()` | `bool` | Shift held |
| `is_alt()` | `bool` | Alt held |

## JoystickInput

Gamepad/joystick events for buttons, axes, hats, and connection state.

```rust
// simplified
use horus::prelude::*;

// Button press
let btn = JoystickInput::new_button(0, 1, "A".to_string(), true);
println!("Button {} pressed: {}", btn.element_name(), btn.is_pressed());

// Axis movement (value: -1.0 to 1.0)
let axis = JoystickInput::new_axis(0, 0, "LeftStickX".to_string(), 0.75);
println!("Axis {}: {:.2}", axis.element_name(), axis.value);

// Hat/D-pad
let hat = JoystickInput::new_hat(0, 0, "DPad".to_string(), 1.0);

// Controller connection event
let conn = JoystickInput::new_connection(0, true); // joystick 0 connected
if conn.is_connection_event() {
    println!("Controller {}: connected={}", conn.joystick_id, conn.is_connected());
}

// Check event type
if axis.is_axis() {
    println!("Axis event: {}", axis.value);
} else if btn.is_button() {
    println!("Button event: pressed={}", btn.is_pressed());
}
```

**Fields:**

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `joystick_id` | `u32` | - | Controller ID (0, 1, 2, ...) |
| `event_type` | `[u8; 16]` | - | Event type: `"button"`, `"axis"`, `"hat"`, `"connection"` |
| `element_id` | `u32` | - | Button/axis/hat number |
| `element_name` | `[u8; 32]` | - | Element name (null-terminated, max 31 chars) |
| `value` | `f32` | - | Button: 0.0/1.0, Axis: -1.0 to 1.0, Hat: directional |
| `pressed` | `u8` | - | For buttons: 1 = pressed, 0 = released |
| `timestamp_ms` | `u64` | ms | Unix timestamp in milliseconds |

> **ROS2 equivalent:** `sensor_msgs/msg/Joy`

**Constructors:**

| Constructor | Description |
|-------------|-------------|
| `new_button(joystick_id, button_id, name, pressed)` | Button press/release event |
| `new_axis(joystick_id, axis_id, name, value)` | Axis movement event |
| `new_hat(joystick_id, hat_id, name, value)` | Hat/D-pad event |
| `new_connection(joystick_id, connected)` | Controller connect/disconnect |

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `event_type()` | `String` | Get event type string |
| `element_name()` | `String` | Get element name string |
| `is_button()` | `bool` | Is a button event |
| `is_axis()` | `bool` | Is an axis event |
| `is_hat()` | `bool` | Is a hat/D-pad event |
| `is_connection_event()` | `bool` | Is a connection event |
| `is_pressed()` | `bool` | Button is pressed |
| `is_connected()` | `bool` | Controller is connected (connection events) |

## AudioFrame

Audio data for speech recognition, sound localization, and audio processing pipelines.
```rust // simplified use horus::prelude::*; // Create mono audio frame let frame = AudioFrame::mono(16000, &samples) .with_frame_id("microphone_0") .with_timestamp(horus::timestamp_ns()); // Create stereo audio frame let stereo = AudioFrame::stereo(44100, &interleaved_samples); // Create multi-channel audio let multi = AudioFrame::multi_channel(48000, 4, &samples); // Access audio data println!("Duration: {:.1}ms", frame.duration_ms()); println!("Samples: {}", frame.valid_samples().len()); println!("Frame count: {}", frame.frame_count()); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `sample_rate` | `u32` | Sample rate in Hz | | `channels` | `u8` | Number of audio channels | | `num_samples` | `u32` | Number of valid samples | | `timestamp_ns` | `u64` | Nanoseconds since epoch | | `frame_id` | `[u8; 32]` | Source identifier | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `mono(sample_rate, samples)` | `AudioFrame` | Create mono (1-channel) frame | | `stereo(sample_rate, samples)` | `AudioFrame` | Create stereo (2-channel) frame | | `multi_channel(sample_rate, channels, samples)` | `AudioFrame` | Create multi-channel frame | | `valid_samples()` | `&[f32]` | Get the valid sample data | | `duration_ms()` | `f64` | Frame duration in milliseconds | | `frame_count()` | `u32` | Number of frames (samples / channels) | | `frame_id_str()` | `&str` | Get frame ID as string | | `with_frame_id(id)` | `Self` | Builder: set source identifier | | `with_timestamp(ts)` | `Self` | Builder: set timestamp | ## Teleoperation Example ```rust // simplified use horus::prelude::*; struct TeleoperationNode { keyboard_sub: Topic<KeyboardInput>, joystick_sub: Topic<JoystickInput>, cmd_vel_pub: Topic<Twist>, linear_speed: f64, angular_speed: f64, } impl Node for TeleoperationNode { fn name(&self) -> &str { "Teleop" } fn tick(&mut self) { let mut linear = 0.0; let mut angular = 0.0; // Process keyboard (WASD) if let Some(key) = self.keyboard_sub.recv() { if 
key.is_pressed() { match key.key_name().as_str() { "w" => linear = self.linear_speed, "s" => linear = -self.linear_speed, "a" => angular = self.angular_speed, "d" => angular = -self.angular_speed, _ => {} } // Shift for boost if key.is_shift() { linear *= 2.0; angular *= 2.0; } } } // Process joystick (overrides keyboard if present) if let Some(joy) = self.joystick_sub.recv() { if joy.is_axis() { match joy.element_name().as_str() { "LeftStickY" => linear = joy.value as f64 * self.linear_speed, "RightStickX" => angular = joy.value as f64 * self.angular_speed, _ => {} } } } self.cmd_vel_pub.send(Twist::new_2d(linear, angular)); } } ``` ## See Also - [Sensor Messages](/rust/api/sensor-messages) - Lidar, IMU, GPS sensor data - [Control Messages](/rust/api/control-messages) - Motor and servo commands --- ## Clock & Time Messages Path: /rust/api/clock-messages Description: Simulation clock, replay time, and time synchronization messages # Clock & Time Messages HORUS provides clock and time synchronization messages for simulation, replay, and multi-sensor time alignment. Both types are fixed-size, optimized for zero-copy shared memory transport. ## Clock Simulation and replay time source. Allows nodes to operate in simulated time instead of wall-clock time. 
```rust // simplified use horus::prelude::*; // Real-time wall clock let wall = Clock::wall_clock(); // Simulation time at 2x speed let sim = Clock::sim_time(5_000_000_000, 2.0); // 5 seconds sim time, 2x speed println!("Sim time: {} ns", sim.clock_ns); println!("Speed: {}x", sim.sim_speed); // Replay time (playing back recorded data) let replay = Clock::replay_time(10_000_000_000, 0.5); // 10s replay time, half speed // Pause/resume let paused = sim.set_paused(true); println!("Paused: {}", paused.is_paused()); // Measure elapsed time between clock messages let earlier = Clock::sim_time(1_000_000_000, 1.0); let later = Clock::sim_time(2_000_000_000, 1.0); let elapsed = later.elapsed_since(&earlier); println!("Elapsed: {} ns", elapsed); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `clock_ns` | `u64` | Current simulation/replay time in nanoseconds | | `realtime_ns` | `u64` | Wall-clock time for comparison | | `sim_speed` | `f64` | Playback speed (1.0 = real-time, 2.0 = 2x, 0.5 = half) | | `paused` | `u8` | 0 = running, 1 = paused | | `source` | `u8` | Time source (see constants) | | `timestamp_ns` | `u64` | When this message was published | **Source Constants:** | Constant | Value | Description | |----------|-------|-------------| | `SOURCE_WALL` | 0 | Wall-clock (real time) | | `SOURCE_SIM` | 1 | Simulation time | | `SOURCE_REPLAY` | 2 | Replay/playback time | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `wall_clock()` | `Clock` | Create real-time clock | | `sim_time(sim_ns, speed)` | `Clock` | Create simulation time clock | | `replay_time(replay_ns, speed)` | `Clock` | Create replay time clock | | `elapsed_since(&earlier)` | `u64` | Nanoseconds between two clock messages | | `is_paused()` | `bool` | Check if clock is paused | | `set_paused(paused)` | `Clock` | Return new clock with paused state (builder) | ## TimeReference External time source for synchronization between sensors or between 
local clock and GPS/NTP/PTP time. ```rust // simplified use horus::prelude::*; // GPS time reference with offset let time_ref = TimeReference::new( 1_709_000_000_000_000_000, // GPS time in nanoseconds "gps", // source name -500_000, // offset: local is 500µs behind GPS ); println!("Source: {}", time_ref.source_name()); // Correct a local timestamp using the reference offset let local_ts = 1_709_000_001_000_000_000; let corrected = time_ref.correct_timestamp(local_ts); println!("Corrected timestamp: {} ns", corrected); ``` **Fields:** | Field | Type | Description | |-------|------|-------------| | `time_ref_ns` | `u64` | External reference time in nanoseconds | | `source` | `[u8; 32]` | Source identifier (null-terminated): `"gps"`, `"ntp"`, `"ptp"` | | `offset_ns` | `i64` | Signed offset: `local_time - reference_time` (nanoseconds) | | `timestamp_ns` | `u64` | When this message was published | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `new(time_ref_ns, source, offset_ns)` | `TimeReference` | Create time reference | | `source_name()` | `&str` | Get source identifier as string | | `correct_timestamp(local_ns)` | `u64` | Apply offset to correct a local timestamp | ## Simulation Time Node Example ```rust // simplified use horus::prelude::*; struct SimAwareNode { clock_sub: Topic<Clock>, current_time_ns: u64, is_paused: bool, } impl Node for SimAwareNode { fn name(&self) -> &str { "SimAwareNode" } fn tick(&mut self) { // Read the latest clock message if let Some(clock) = self.clock_sub.recv() { self.is_paused = clock.is_paused(); self.current_time_ns = clock.clock_ns; // Skip processing when paused if self.is_paused { return; } // Use sim time for physics, animation, etc. 
let sim_seconds = self.current_time_ns as f64 / 1e9; println!("Sim time: {:.3}s (speed: {}x)", sim_seconds, clock.sim_speed); } } } ``` ## Time Synchronization Example ```rust // simplified use horus::prelude::*; struct TimeSyncNode { gps_time_sub: Topic<TimeReference>, ntp_time_sub: Topic<TimeReference>, best_offset_ns: i64, } impl Node for TimeSyncNode { fn name(&self) -> &str { "TimeSync" } fn tick(&mut self) { // Prefer GPS time when available if let Some(gps) = self.gps_time_sub.recv() { self.best_offset_ns = gps.offset_ns; } else if let Some(ntp) = self.ntp_time_sub.recv() { self.best_offset_ns = ntp.offset_ns; } } } ``` ## See Also - [Sensor Messages](/rust/api/sensor-messages) - NavSatFix for GPS data - [Diagnostics Messages](/rust/api/diagnostics-messages) - Heartbeat for system health --- ## Cargo Feature Flags Path: /rust/api/feature-flags Description: Complete reference for HORUS Cargo feature flags across all crates # Cargo Feature Flags HORUS uses Cargo feature flags to keep the default binary small while allowing opt-in to heavier capabilities. This page lists every feature flag across the workspace. ## `horus` (Main Crate) The crate you add to `Cargo.toml`. Re-exports `horus_core` and `horus_library`. 
```toml [dependencies] horus = "0.1" ``` | Feature | Default | What It Enables | |---------|---------|-----------------| | `macros` | **Yes** | Procedural macros: `message!`, `service!`, `action!`, `node!`, `hlog!` | | `telemetry` | **Yes** | Live monitoring export (HTTP, UDP, file) | | `blackbox` | **Yes** | Post-mortem flight recorder via `.blackbox(size_mb)` | ### Disabling defaults To build a minimal binary without macros, telemetry, or blackbox: ```toml [dependencies] horus = { version = "0.1", default-features = false } ``` To selectively re-enable: ```toml [dependencies] horus = { version = "0.1", default-features = false, features = ["macros"] } ``` --- ## `horus_library` (Message & Transform Library) These flags control optional blocking/async APIs in the transform frame system. | Feature | Default | What It Enables | |---------|---------|-----------------| | `wait` | No | Blocking `wait_for_transform()` using condvar — zero overhead when disabled | | `async-wait` | No | Async `wait_for_transform_async()` using `tokio::sync::Notify` — pulls in `tokio` | ### Usage ```toml [dependencies] horus_library = { version = "0.1", features = ["wait"] } ``` ```rust // simplified // Only available with "wait" feature let tf = tf_tree.wait_for_transform("lidar", "world", Duration::from_secs(1))?; ``` ```toml [dependencies] horus_library = { version = "0.1", features = ["async-wait"] } ``` ```rust // simplified // Only available with "async-wait" feature let tf = tf_tree.wait_for_transform_async("lidar", "world", Duration::from_secs(1)).await?; ``` --- ## `horus_core` (Runtime) These mirror the `horus` crate flags and are typically set transitively. 
| Feature | Default | What It Enables | |---------|---------|-----------------| | `macros` | **Yes** | Includes `horus_macros` proc-macro crate | | `telemetry` | **Yes** | Telemetry export subsystem | | `blackbox` | **Yes** | Flight recorder subsystem | | `test-utils` | No | Test utilities like `MockTopic` for downstream crate testing | ### Using test-utils Enable `test-utils` in dev-dependencies for testing: ```toml [dev-dependencies] horus_core = { version = "0.1", features = ["test-utils"] } ``` --- ## `horus_py` (Python Bindings) | Feature | Default | What It Enables | |---------|---------|-----------------| | `extension-module` | **Yes** | Required for building the `.so` Python extension | **Important**: Disable when running Rust unit tests (test binary cannot link against libpython): ```bash cargo test -p horus_py --no-default-features ``` --- ## `horus_manager` (CLI) | Feature | Default | What It Enables | |---------|---------|-----------------| | `schema` | No | JSON Schema generation for manifest types (`schemars`) | --- ## Summary Table | Crate | Feature | Default | Dependency | |-------|---------|---------|------------| | `horus` | `macros` | Yes | `horus_macros` | | `horus` | `telemetry` | Yes | — (marker) | | `horus` | `blackbox` | Yes | — (marker) | | `horus_library` | `wait` | No | — (condvar) | | `horus_library` | `async-wait` | No | `tokio` | | `horus_core` | `test-utils` | No | — (marker) | | `horus_py` | `extension-module` | Yes | `pyo3` | | `horus_manager` | `schema` | No | `schemars` | ## See Also - [Configuration Reference](/package-management/configuration) — `horus.toml` project config - [Prelude Contents](/rust/api#prelude-contents) — What `use horus::prelude::*` includes - [Performance](/performance/performance) — Optimization guide --- ## Rust Examples Path: /rust/examples Description: Working examples demonstrating HORUS patterns in Rust # Rust Examples Learn HORUS through working, copy-paste-ready examples -- from single-node publishers 
to multi-process cross-language systems. ## [Basic Examples](/rust/examples/basic-examples) Fundamental patterns for beginners: - Publisher-Subscriber communication - Robot velocity control and lidar obstacle detection - PID controllers and multi-node pipelines - Camera image and point cloud processing ## [Advanced Examples](/rust/examples/advanced-examples) Complex patterns for production systems: - State machines for autonomous behavior - Priority-based safety systems - Multi-process architectures - Cross-language (Rust + Python) systems --- ## See Also - [Rust API Reference](/rust/api) — Complete API documentation - [Tutorials](/tutorials) — Step-by-step guided learning path - [Recipes](/recipes) — Production-ready patterns for common tasks --- ## Rust Documentation Path: /rust Description: HORUS Rust API, library, and examples # Rust Documentation Everything you need to build robotics applications with HORUS in Rust -- from API reference to working examples. ## Quick Reference | Section | Description | |---------|-------------| | [API Reference](/rust/api) | Node, Topic, Scheduler, messages, and driver APIs | | [Time API](/rust/time-api) | `horus::now()`, `horus::dt()`, `horus::rng()`, and SimClock | | [Examples](/rust/examples) | Working code you can copy and run | ## Sections ### [API Reference](/rust/api) Core Rust API documentation including Node, Topic, Scheduler, and message types. ### [Time API](/rust/time-api) Framework clock, timestep, deterministic RNG, and budget-aware anytime algorithms. ### [Examples](/rust/examples) Working examples demonstrating HORUS patterns. 
- [Basic Examples](/rust/examples/basic-examples) - Publisher-subscriber, multi-node systems - [Advanced Examples](/rust/examples/advanced-examples) - Complex patterns and integrations --- ## See Also - [Python Overview](/python) — Python API for rapid prototyping - [Getting Started](/getting-started/installation) — Installation and first application - [Concepts](/concepts) — Architecture, topics, nodes, and shared memory ======================================== # SECTION: Python ======================================== --- ## Node API Path: /python/api/node Description: Python Node class — constructor kwargs, send/recv methods, topic specs, lifecycle, execution classes, error handling # Node API The `Node` class is the primary building block of a HORUS application. Every component — sensors, controllers, planners, loggers — is a Node. All configuration happens in the constructor via kwargs — no class inheritance needed. > **Rust**: Available via `impl Node for MyStruct`. See [Rust Node API](/rust/api/node). 
```python # simplified import horus def my_tick(node): node.send("heartbeat", {"alive": True}) node = horus.Node(name="pinger", pubs=["heartbeat"], tick=my_tick, rate=1) horus.run(node) ``` --- ## Constructor ```python # simplified horus.Node( name="my_node", # Optional: auto-generated UUID if None tick=my_tick_fn, # Required for useful nodes: tick(node) or async def tick(node) rate=30, # Hz (must be positive) pubs=["cmd_vel"], # Topics this node publishes to subs=["scan"], # Topics this node subscribes to init=my_init_fn, # Optional: init(node), called once at startup shutdown=my_shutdown_fn, # Optional: shutdown(node), called once at exit on_error=my_error_fn, # Optional: on_error(node, exception) order=100, # Execution priority (lower = earlier) budget=300 * horus.us, # Max tick time (seconds, use horus.us/horus.ms) deadline=900 * horus.us, # Hard deadline per tick on_miss="warn", # "warn", "skip", "safe_mode", "stop" failure_policy="fatal", # "fatal", "restart", "skip", "ignore" compute=False, # Run on thread pool (CPU-bound) on="scan", # Event-driven: tick only when topic receives data priority=50, # OS scheduling priority (1-99, requires RT) core=0, # Pin to CPU core watchdog=0.5, # Per-node watchdog timeout (seconds) default_capacity=1024, # Ring buffer capacity for auto-created topics ) ``` **Parameters:** | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `name` | `str` or `None` | UUID | Unique node identifier | | `tick` | `Callable[[Node], None]` | `None` | Main loop function. Can be `async def` | | `rate` | `float` | `30` | Tick rate in Hz. Must be positive | | `pubs` | `list`, `str`, `dict`, or `None` | `None` | Publisher topic specs | | `subs` | `list`, `str`, `dict`, or `None` | `None` | Subscriber topic specs | | `init` | `Callable` or `None` | `None` | One-time setup. Can be `async def` | | `shutdown` | `Callable` or `None` | `None` | Cleanup on exit. 
Can be `async def` | | `on_error` | `Callable` or `None` | `None` | Error handler: `on_error(node, exception)` | | `order` | `int` | `100` | Execution order (lower = runs first in tick cycle) | | `budget` | `float` or `None` | `None` | Max tick execution time in seconds | | `deadline` | `float` or `None` | `None` | Hard deadline in seconds | | `on_miss` | `str` or `None` | `None` | Deadline miss policy | | `failure_policy` | `str` or `None` | `None` | Error recovery strategy | | `compute` | `bool` | `False` | Run on parallel thread pool | | `on` | `str` or `None` | `None` | Event-driven topic trigger | | `priority` | `int` or `None` | `None` | OS SCHED_FIFO priority (1-99) | | `core` | `int` or `None` | `None` | CPU core pinning | | `watchdog` | `float` or `None` | `None` | Per-node watchdog timeout (seconds) | | `default_capacity` | `int` | `1024` | Default ring buffer capacity | **Validation rules:** - `rate` must be positive — `ValueError` raised for 0 or negative - `name` must be unique within a scheduler — duplicate names cause an error at `add()` time - `compute=True` is mutually exclusive with `on="topic"` and `async def tick` - `budget` and `deadline` are in **seconds** — use `horus.us` and `horus.ms` constants: `budget=300 * horus.us` --- ## Lifecycle The scheduler manages the Node lifecycle in a strict order: ``` Construction → Registration → Init → Tick Loop → Shutdown you sched.add() once repeated once ``` 1. **Construction** — `horus.Node(...)` creates the node with your callbacks and config. No I/O happens here. 2. **Registration** — `scheduler.add(node)` registers the node. Validates name uniqueness and config. 3. **Initialization** — On first `run()` or `tick_once()`, the scheduler calls your `init(node)` callback (lazy). If `init` raises an exception, the `failure_policy` determines what happens. 4. **Tick Loop** — Each cycle: scheduler calls `tick(node)`. 
If tick raises, `on_error(node, exception)` is called, then `failure_policy` handles recovery. 5. **Shutdown** — On Ctrl+C, SIGTERM, duration expiry, or `request_stop()`: `shutdown(node)` is called on each node in registration order. **Key Python differences from Rust:** - Init is lazy (first `run()` or `tick_once()`), not at `add()` time - `shutdown()` always runs even if `init()` was never called (returns silently if no `shutdown` callback) - `on_error` receives a Python exception object, not a string - GIL is acquired for each callback, released between ticks --- ## Topic Spec Formats The `pubs` and `subs` parameters accept several formats: ```python # simplified # String — single topic, GenericMessage (~6-50μs) pubs="cmd_vel" # List of strings — multiple topics, GenericMessage pubs=["cmd_vel", "status"] # Typed class — zero-copy POD (~1.7μs) pubs=[horus.CmdVel, horus.LaserScan] # Dict with custom names — typed with explicit topic name pubs={"cmd": horus.CmdVel, "scan": horus.LaserScan} # Dict with config — full control pubs={"cmd": {"type": horus.CmdVel, "capacity": 2048}} ``` **Performance**: Typed topics (`horus.CmdVel`) use zero-copy POD transport (~1.7μs). String topics use `GenericMessage` with MessagePack serialization (~6-50μs). Use typed topics for anything in a control loop or crossing the Rust/Python boundary. 
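To put the latency figures above in context, the following sketch estimates what share of one tick period goes to message transport. It is plain arithmetic under stated assumptions: the helper `transport_share` is hypothetical (not part of HORUS), and the 1.7 μs and 50 μs values are simply the approximate typed and worst-case `GenericMessage` figures quoted above.

```python
def transport_share(rate_hz: float, messages_per_tick: int, latency_us: float) -> float:
    """Fraction of one tick period spent moving messages (hypothetical helper)."""
    period_us = 1_000_000 / rate_hz          # length of one tick in microseconds
    return messages_per_tick * latency_us / period_us


TYPED_US = 1.7            # approximate typed (zero-copy) cost quoted above
GENERIC_WORST_US = 50.0   # upper end of the GenericMessage range quoted above

for rate in (100, 1000):
    typed = transport_share(rate, 10, TYPED_US)
    generic = transport_share(rate, 10, GENERIC_WORST_US)
    print(f"{rate} Hz, 10 msgs/tick: typed {typed:.2%}, generic worst case {generic:.2%}")
```

With ten messages per tick at 1 kHz, typed transport accounts for under 2% of the period, while worst-case serialized transport would take half of it, which is the reasoning behind preferring typed topics in control loops.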
--- ## Methods — Quick Reference | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `send` | `send(topic: str, data) -> bool` | `True` on success | Publish data to a topic | | `recv` | `recv(topic: str) -> Any or None` | Message or `None` | Get next unread message (pull-based, FIFO) | | `recv_all` | `recv_all(topic: str) -> list` | List of messages | Drain all available messages | | `has_msg` | `has_msg(topic: str) -> bool` | `bool` | Check if a message is available | | `log_info` | `log_info(msg: str)` | — | Log at INFO level | | `log_warning` | `log_warning(msg: str)` | — | Log at WARNING level | | `log_error` | `log_error(msg: str)` | — | Log at ERROR level | | `log_debug` | `log_debug(msg: str)` | — | Log at DEBUG level | | `request_stop` | `request_stop()` | — | Signal scheduler to stop | | `publishers` | `publishers() -> list[str]` | Topic names | List of publisher topic names | | `subscribers` | `subscribers() -> list[str]` | Topic names | List of subscriber topic names | --- ## Method Details ### send() ```python # simplified node.send(topic: str, data: Any) -> bool ``` Publish data to a topic. Returns `True` on success. 
**Behavior:** - If the topic was declared in `pubs`, uses the pre-created topic - If the topic was NOT declared in `pubs`, auto-creates a GenericMessage topic (works but won't appear in monitoring) - For typed topics (`pubs=[horus.CmdVel]`), `data` must be the correct type - For string topics (`pubs=["data"]`), `data` is serialized via MessagePack **Errors:** - `TypeError` if `data` is not serializable (lambda, socket, custom class without `__dict__`) - Never blocks — if the ring buffer is full, the oldest message is dropped ```python # simplified # Typed — zero-copy (~1.7μs) node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.5)) # Dict — serialized (~6-50μs) node.send("status", {"battery": 0.85, "mode": "auto"}) ``` ### recv() ```python # simplified node.recv(topic: str) -> Any or None ``` Get the next unread message from a topic. Returns `None` if no messages are available — **never blocks**. **Behavior:** - Pull-based: you decide when to check for messages - Returns the next unread message from the ring buffer (FIFO order). Use `read_latest()` on a standalone `Topic` if you need only the most recent - If `has_msg()` was called and buffered a message, `recv()` returns that buffered message first - Auto-creates topics for undeclared subscriptions (won't appear in monitoring) ```python # simplified def tick(node): msg = node.recv("sensor") if msg is None: return # No data yet — normal, not an error process(msg) ``` ### has_msg() ```python # simplified node.has_msg(topic: str) -> bool ``` Check if messages are available. **Internally peeks by calling recv()** — if a message exists, it's buffered and returned by the next `recv()` call. **Important**: `has_msg()` consumes the message into a peek buffer. The next `recv()` returns the buffered message. 
This means `has_msg()` + `recv()` is safe but costs one recv internally: ```python # simplified # Safe pattern — the peeked message is returned by recv() if node.has_msg("sensor"): data = node.recv("sensor") # Returns the peeked message ``` ### recv_all() ```python # simplified node.recv_all(topic: str) -> list ``` Drain all available messages from the ring buffer. Returns an empty list if none. Use when you need to process every message (not just the latest): ```python # simplified # Process all accumulated messages for msg in node.recv_all("events"): handle_event(msg) ``` ### log_*() ```python # simplified node.log_info(msg: str) node.log_warning(msg: str) node.log_error(msg: str) node.log_debug(msg: str) ``` Log messages via the HORUS logging system (backed by `hlog!` in Rust). **Important**: These only work during `tick()`, `init()`, and `shutdown()` callbacks. Calling log outside these callbacks (e.g., at module level) drops the message silently — there's no active logging context. ### request_stop() ```python # simplified node.request_stop() ``` Signal the scheduler to perform a graceful shutdown. All nodes complete their current tick, then `shutdown()` is called on each node in order. --- ## Properties | Property | Type | Description | |----------|------|-------------| | `name` | `str` | Node name | | `rate` | `float` | Tick rate in Hz | | `order` | `int` | Execution order | | `info` | `NodeInfo` | Scheduler-provided metrics (available during tick) | > **User state**: Assign any attribute in `init()` (e.g., `node.state = {"counter": 0}`) — it persists across ticks as a regular Python attribute. Not a built-in property. 
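The peek-buffer behaviour described for `has_msg()` and `recv()` above can be modelled in a few lines of plain Python. This is a toy sketch for building intuition, not the HORUS implementation: the class `PeekingInbox` and its internals are hypothetical, and it assumes messages are never `None`.

```python
from collections import deque


class PeekingInbox:
    """Toy stand-in for one subscribed topic (hypothetical, not HORUS code)."""

    def __init__(self, messages=()):
        self._ring = deque(messages)   # unread messages, oldest first
        self._peeked = None            # message parked by has_msg()
        self._has_peeked = False

    def _pop(self):
        # Non-blocking: returns None when nothing is queued.
        return self._ring.popleft() if self._ring else None

    def recv(self):
        # A message parked by has_msg() is handed out first.
        if self._has_peeked:
            msg, self._peeked, self._has_peeked = self._peeked, None, False
            return msg
        return self._pop()

    def has_msg(self):
        # Peek by receiving once and parking the result.
        if not self._has_peeked:
            msg = self._pop()
            if msg is None:
                return False
            self._peeked, self._has_peeked = msg, True
        return True


inbox = PeekingInbox(["a", "b"])
if inbox.has_msg():          # parks "a" in the peek buffer
    first = inbox.recv()     # hands back the parked "a", not "b"
second = inbox.recv()        # next unread message in FIFO order
print(first, second, inbox.recv())
```

In this model, repeated `has_msg()` calls do not consume additional messages, and FIFO order is preserved across the check.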
--- ## Execution Classes How your Node kwargs map to execution classes: | Kwargs | Execution Class | Thread | |--------|----------------|--------| | `rate=100` (with budget or deadline) | **Rt** (auto) | RT thread with SCHED_FIFO | | `compute=True` | **Compute** | Worker thread pool | | `on="topic_name"` | **Event** | Triggered on topic data | | `async def tick` | **AsyncIo** | Async I/O thread pool | | Default (none of the above) | **BestEffort** | Main tick thread | RT is auto-detected: if you set `rate` with `budget` or `deadline`, the node is classified as real-time. No explicit `rt=True` needed on the node — the scheduler handles promotion. --- ## Failure Policies | String | Behavior | Use For | |--------|----------|---------| | `"fatal"` | First failure stops the scheduler | Motor controllers, safety monitors | | `"restart"` | Re-initialize with exponential backoff | Sensor drivers (USB reconnect) | | `"skip"` | Tolerate failures with cooldown | Logging, telemetry | | `"ignore"` | Swallow failures, keep ticking | Statistics, debug output | ```python # simplified # Safety-critical motor controller motor = horus.Node(tick=motor_fn, failure_policy="fatal", on_miss="safe_mode") # Recoverable sensor driver lidar = horus.Node(tick=lidar_fn, failure_policy="restart") # Non-critical telemetry telemetry = horus.Node(tick=upload_fn, failure_policy="ignore") ``` --- ## Examples ### Basic Node ```python # simplified import horus def my_tick(node): node.send("heartbeat", {"alive": True, "tick": horus.tick()}) node = horus.Node(name="pinger", pubs=["heartbeat"], tick=my_tick, rate=1) horus.run(node) ``` ### Typed Topics (Zero-Copy) ```python # simplified import horus def controller_tick(node): scan = node.recv("scan") if scan: cmd = horus.CmdVel(linear=0.5, angular=0.0) node.send("cmd_vel", cmd) controller = horus.Node( name="controller", pubs=[horus.CmdVel], subs=[horus.LaserScan], tick=controller_tick, rate=50 ) horus.run(controller) ``` ### Node with Full Lifecycle 
```python # simplified import horus def my_init(node): node.state = {"counter": 0, "errors": 0} node.log_info("Initialized") def my_tick(node): node.state["counter"] += 1 node.send("count", {"value": node.state["counter"]}) def my_shutdown(node): node.log_info(f"Shutting down after {node.state['counter']} ticks, {node.state['errors']} errors") def my_error(node, exception): node.state["errors"] += 1 node.log_error(f"Error #{node.state['errors']}: {exception}") sensor = horus.Node( name="counter", init=my_init, tick=my_tick, shutdown=my_shutdown, on_error=my_error, rate=10, pubs=["count"], failure_policy="restart", ) horus.run(sensor) ``` ### Production: Safety-Critical Motor Controller ```python # simplified import horus def motor_tick(node): cmd = node.recv("cmd_vel") if cmd is None: # SAFETY: no command received — check stale timeout node.state["stale_ticks"] += 1 if node.state["stale_ticks"] > 50: # 500ms at 100Hz node.send("motor", horus.CmdVel(linear=0.0, angular=0.0)) node.log_warning("Stale command — motors zeroed") return node.state["stale_ticks"] = 0 # SAFETY: clamp to safe range linear = max(-1.0, min(1.0, cmd.linear)) angular = max(-2.0, min(2.0, cmd.angular)) node.send("motor", horus.CmdVel(linear=linear, angular=angular)) def motor_init(node): node.state = {"stale_ticks": 0} def motor_shutdown(node): # CRITICAL: zero motors before exit node.send("motor", horus.CmdVel(linear=0.0, angular=0.0)) node.log_info("Motors zeroed") def motor_error(node, exception): node.send("motor", horus.CmdVel(linear=0.0, angular=0.0)) node.log_error(f"Motor error — zeroed: {exception}") motor = horus.Node( name="motor_ctrl", init=motor_init, tick=motor_tick, shutdown=motor_shutdown, on_error=motor_error, rate=100, order=0, budget=500 * horus.us, on_miss="safe_mode", failure_policy="fatal", subs=[horus.CmdVel], pubs=["motor"], ) horus.run(motor, tick_rate=100, rt=True, watchdog_ms=500) ``` ### Production: Sensor Node with Reconnection ```python # simplified import horus 
hw_connected = [False] def sensor_init(node): try: # Open hardware connection node.state = {"device": open_hardware("/dev/ttyUSB0")} hw_connected[0] = True except OSError as e: node.log_error(f"Hardware init failed: {e}") raise # Let failure_policy="restart" handle it def sensor_tick(node): reading = node.state["device"].read() node.send("sensor.data", horus.Imu( accel_x=reading[0], accel_y=reading[1], accel_z=reading[2], gyro_x=reading[3], gyro_y=reading[4], gyro_z=reading[5], )) def sensor_shutdown(node): if hw_connected[0]: node.state["device"].close() sensor = horus.Node( name="imu_driver", init=sensor_init, tick=sensor_tick, shutdown=sensor_shutdown, rate=100, order=0, failure_policy="restart", # Auto-reconnect on USB disconnect pubs=[horus.Imu], ) horus.run(sensor) ``` --- ## Supporting Types ### NodeInfo Available as `node.info` during tick/init/shutdown. Provides scheduler-managed metrics: | Method/Property | Returns | Description | |----------------|---------|-------------| | `name` | `str` | Node name | | `state` | `str` | Current lifecycle state | | `tick_count()` | `int` | Total ticks executed | | `error_count()` | `int` | Total errors | | `successful_ticks()` | `int` | Ticks without error | | `failed_ticks()` | `int` | Ticks that raised exceptions | | `avg_tick_duration_ms()` | `float` | Running average tick time | | `get_uptime()` | `float` | Seconds since init | | `get_metrics()` | `dict` | Full metrics snapshot (tick times, message counts, errors) | | `request_stop()` | — | Stop the scheduler | | `set_custom_data(key, value)` | — | Attach custom metadata | | `get_custom_data(key)` | `str` or `None` | Read custom metadata | ### NodeState String enumeration of node lifecycle states: | State | Description | |-------|-------------| | `UNINITIALIZED` | Registered but `init()` not yet called | | `INITIALIZING` | `init()` is running | | `RUNNING` | Normal operation — `tick()` being called | | `STOPPING` | `shutdown()` is running | | `STOPPED` | Shutdown 
complete | | `ERROR` | `init()` or `tick()` returned an error | | `CRASHED` | `tick()` raised an unhandled exception | ### Miss Deadline miss policies: | Constant | String | Behavior | |----------|--------|----------| | `horus.Miss.WARN` | `"warn"` | Log warning, continue normally | | `horus.Miss.SKIP` | `"skip"` | Skip this tick, resume next cycle | | `horus.Miss.SAFE_MODE` | `"safe_mode"` | Enter safe state, continue ticking | | `horus.Miss.STOP` | `"stop"` | Stop the entire scheduler | --- ## Design Decisions **Why kwargs, not class inheritance?** `class MyNode(horus.Node): def tick(self):...` requires boilerplate and doesn't work with plain functions or lambdas. The kwargs API (`horus.Node(tick=my_fn, rate=30)`) is more Pythonic, matches FastAPI/Click patterns, and all config happens in one call. **Why `recv()` returns data, not a callback?** Pull-based reception keeps timing deterministic — your tick controls when data is consumed. Push-based callbacks fire at unpredictable times, making budget compliance harder. **Why `has_msg()` uses peek buffering?** `has_msg()` internally calls `recv()` and buffers the result. This avoids a separate "peek" API while keeping the common `if node.has_msg("x"): data = node.recv("x")` pattern zero-overhead. **Why undeclared topics auto-create?** Reduces boilerplate for prototyping. But undeclared topics don't appear in monitoring — always declare in `pubs`/`subs` for production. **Why no `is_safe_state` / `enter_safe_state` in Python?** These are Rust-only Node trait methods requiring mutable borrows that don't map cleanly to Python's callback model. Safety-critical nodes should use Rust. Python nodes use `on_miss="safe_mode"` and `on_error` callbacks instead. 
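Since Python nodes rely on callbacks rather than `enter_safe_state`, one possible pattern is to keep the "make it safe" action in a single plain function and call it from both `on_error` and `shutdown`. The sketch below is an illustration under assumptions: `FakeNode` is a hypothetical test double that only mimics `node.send(topic, data)`, the `"motor"` topic name is borrowed from the motor controller example, and a dict payload stands in for a typed message.

```python
class FakeNode:
    """Hypothetical test double: records send() calls instead of publishing."""

    def __init__(self):
        self.sent = []

    def send(self, topic, data):
        self.sent.append((topic, data))
        return True


ZERO_CMD = {"linear": 0.0, "angular": 0.0}


def zero_motors(node, topic="motor"):
    # Single place that defines what "safe" means for this node.
    node.send(topic, dict(ZERO_CMD))


def on_error(node, exception):
    # Same signature as the on_error callback: on_error(node, exception).
    zero_motors(node)


def shutdown(node):
    # Same signature as the shutdown callback: shutdown(node).
    zero_motors(node)


node = FakeNode()
on_error(node, RuntimeError("encoder fault"))
shutdown(node)
print(node.sent)
```

Because the callbacks take the node as an argument, the safe-stop path can be exercised in a unit test like this without starting a scheduler.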
--- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | Kwargs over inheritance | Concise, works with lambdas | No IDE auto-complete for tick signature | | Auto-create topics | Zero boilerplate for prototyping | Undeclared topics invisible to monitoring | | `has_msg()` peek buffering | Clean API, no separate peek method | Consumes message on check | | No `is_safe_state` / `enter_safe_state` | Simpler Python API | Safety-critical nodes must use Rust | | Pull-based `recv()` | Deterministic timing | Must check every tick (no push notification) | | GIL acquired per callback | Python works correctly | ~3μs overhead per tick | --- ## See Also - [Scheduler API](/python/api/scheduler) — Node orchestration and execution - [Topic API](/python/api/topic) — Standalone pub/sub outside node lifecycle - [Async Nodes](/python/api/async-nodes) — Async I/O patterns and best practices - [Node Lifecycle](/python/node-lifecycle) — Init, tick, shutdown deep dive - [Fault Tolerance](/advanced/circuit-breaker) — Failure policies in depth - [GIL & Performance](/python/gil-performance) — Tick rate ceilings, optimization - [Getting Started (Python)](/getting-started/quick-start-python) — First Python application - [Rust Node API](/rust/api/node) — Rust equivalent (582 lines) --- ## Execution Classes (Python) Path: /python/execution-classes Description: The 5 Python execution classes — Rt, Compute, Event, AsyncIo, BestEffort — and when to use each # Execution Classes (Python) A motor controller that misses its deadline by even a millisecond can cause a robot arm to overshoot and collide with a person. A path planner that takes 50 ms of CPU time is completely normal — but if it runs on the same thread as the motor controller, it blocks 50 ticks. A logging node that takes an extra 10 ms is harmless. A cloud uploader that blocks on a network request shouldn't hold up anything. These are fundamentally different workloads. 
Running them all the same way — in a single sequential loop — forces every node to compromise. The fast ones wait for the slow ones. The critical ones share a thread with the optional ones. A single slow node can cascade timing failures across the entire system. HORUS solves this with **execution classes**: five different executors, each optimized for a specific workload type. The scheduler automatically selects the right class based on how you configure the `Node()` constructor — you describe what your node needs, and the scheduler figures out how to run it. ## The Five Classes
[Diagram: the scheduler selects the execution class from your Node() configuration. Default (no special params) → BestEffort (main loop, sequential); `rate=`, `budget=`, or `deadline=` → Rt (dedicated thread, timing enforced); `compute=True` → Compute (thread pool, parallel); `on='topic.name'` → Event (sleeps until message); `async def tick` → AsyncIo (Tokio runtime, non-blocking).]
### BestEffort (Default) The default class. The scheduler **automatically parallelizes** independent BestEffort nodes using a dependency graph built from topic `send()`/`recv()` calls. Nodes that share no topics run simultaneously. Nodes with topic dependencies execute in causal order (publisher before subscriber). When no topic metadata exists, nodes fall back to `order=` tiers. ```python # simplified import horus def update_display(node): stats = node.recv("stats") if stats: print(f"Speed: {stats['speed']:.1f} m/s") display = horus.Node( name="display", subs=["stats"], tick=update_display, order=100, # No rate=, compute=, on=, or async def → BestEffort ) ``` **How it works**: At startup, the scheduler builds a dependency graph from topic metadata. Independent nodes are dispatched to a thread pool via the **ready-dispatch executor** — each node starts the instant its last dependency finishes. Dependent nodes execute in causal order. **Use for**: Sensors, controllers, planners, logging, telemetry — most nodes. The scheduler automatically determines what can run in parallel. **Characteristics**: - Runs at the scheduler's global `tick_rate` - Independent nodes execute in parallel (automatic — no configuration needed) - Dependent nodes execute in causal order (publisher before subscriber) - `order=` is optional — used as tiebreaker for nodes with no topic relationship ### Rt (Real-Time) Each RT node gets a **dedicated thread** with optional OS-level priority scheduling. The scheduler enforces timing budgets and deadlines, and takes action when a node runs too long.
```python # simplified import horus us = horus.us # 1e-6 ms = horus.ms # 1e-3 def motor_control(node): cmd = node.recv("cmd_vel") if cmd: write_to_motor(cmd.linear, cmd.angular) # Auto-derived: budget = 80% of period, deadline = 95% of period motor = horus.Node( name="motor", subs=[horus.CmdVel], tick=motor_control, rate=1000, # 1 kHz → budget=800μs, deadline=950μs on_miss="safe_mode", # Enter safe state on deadline miss order=0, ) # Explicit budget and deadline safety = horus.Node( name="safety_monitor", subs=["safety.heartbeat"], tick=check_safety, budget=100 * us, deadline=200 * us, on_miss="stop", # Full stop on deadline miss order=0, ) ``` **How it works**: Each RT node runs on its own dedicated thread. If `rt=True` is set on the `Scheduler` and the OS supports it, the thread gets `SCHED_FIFO` real-time priority. The scheduler measures every `tick()` call and applies the `on_miss` policy when budget or deadline is exceeded. **Auto-derivation from rate**: When you set `rate=` without explicit `budget=` or `deadline=`, the scheduler derives them: | You set | Scheduler derives | |---------|-------------------| | `rate=100` (10 ms period) | `budget=8 ms` (80%), `deadline=9.5 ms` (95%) | | `rate=1000` (1 ms period) | `budget=800 μs` (80%), `deadline=950 μs` (95%) | | `rate=500` (2 ms period) | `budget=1.6 ms` (80%), `deadline=1.9 ms` (95%) | You can override either or both: ```python # simplified # Auto budget + explicit deadline horus.Node(rate=1000, deadline=900 * us, ...) # Explicit budget + auto deadline horus.Node(rate=1000, budget=300 * us, ...) # Both explicit horus.Node(rate=1000, budget=300 * us, deadline=900 * us, ...) 
``` **Additional RT configuration**: ```python # simplified critical = horus.Node( name="critical_ctrl", tick=control_loop, rate=1000, budget=300 * us, deadline=900 * us, on_miss="skip", priority=90, # OS-level thread priority (1-99, higher = more urgent) core=0, # Pin to CPU core 0 watchdog=0.5, # 500 ms freeze detection order=0, ) ``` **Use for**: Motor control, safety monitoring, sensor fusion — anything where missing a deadline has physical consequences. ### Compute For CPU-heavy work that benefits from parallelism. Multiple Compute nodes run simultaneously on a shared thread pool. ```python # simplified import horus def plan_path(node): scan = node.recv("scan") if scan: path = a_star(scan.ranges, goal) node.send("path", path) planner = horus.Node( name="planner", subs=[horus.LaserScan], pubs=["path"], tick=plan_path, compute=True, # CPU thread pool rate=10, # Optional: tick at most 10 times/sec order=5, ) ``` **How it works**: Compute nodes are dispatched to a thread pool. Multiple compute nodes can run in parallel on different CPU cores. They don't block the main tick loop or RT threads. **Use for**: Path planning, SLAM, point cloud processing, ML inference on CPU, image processing — any CPU-bound work that takes more than ~1 ms per tick. ### Event Nodes that sleep until a specific topic receives new data. Zero CPU usage when idle. ```python # simplified import horus def handle_estop(node): msg = node.recv("emergency.stop") if msg: node.log_warning("EMERGENCY STOP RECEIVED") disable_all_motors() node.request_stop() estop = horus.Node( name="estop_handler", subs=["emergency.stop"], tick=handle_estop, on="emergency.stop", # Sleep until this topic gets data order=0, ) ``` **How it works**: The node's thread sleeps. When any publisher calls `send()` on the named topic, the Event node wakes and `tick()` is called. If multiple messages arrive between wakes, the node ticks once — call `recv()` in a loop inside `tick()` to drain all pending messages. 
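The "drain in a loop" advice above can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeNode` is a hypothetical stand-in for a real node, and it assumes `recv()` returns `None` once no messages are pending (the examples in this page only show the `if msg:` check, so treat that as an assumption).

```python
from collections import deque


class FakeNode:
    """Hypothetical stand-in for a HORUS node; recv() returns None when empty."""

    def __init__(self, pending):
        self._pending = deque(pending)

    def recv(self, topic):
        # The topic name is ignored here; a real node has one queue per topic.
        return self._pending.popleft() if self._pending else None


def drain(node, topic):
    """Collect every message currently queued on `topic`."""
    messages = []
    while (msg := node.recv(topic)) is not None:
        messages.append(msg)
    return messages


def handle_commands(node):
    # One wake-up may cover several messages, so process all of them.
    return [cmd["action"] for cmd in drain(node, "commands")]
```

Reading a single message per wake-up would leave the remaining ones queued until the next `send()` wakes the node again, which is why the loop matters for bursty topics.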
**Use for**: Emergency stop handlers, command receivers, sparse event processors — anything where the node should be completely idle until something specific happens. **Characteristics**: - Zero CPU when no messages arrive - Wake latency: ~microseconds from `send()` to `tick()` - `order=` still applies: if two event nodes wake simultaneously, lower order runs first - The topic name in `on=` must match a topic declared in another node's `pubs` ### AsyncIo For network, file I/O, or GPU operations that would block a real-time thread. Runs on a tokio runtime. In Python, this class is **automatic** — just use `async def` for your tick function. ```python # simplified import horus import aiohttp async def upload_telemetry(node): if node.has_msg("telemetry"): data = node.recv("telemetry") async with aiohttp.ClientSession() as session: await session.post("https://api.example.com/telemetry", json=data) uploader = horus.Node( name="uploader", subs=["telemetry"], tick=upload_telemetry, # async def → automatically AsyncIo rate=1, # Upload once per second order=50, ) ``` **How it works**: The node's `tick()` runs on a tokio-managed thread pool. The node can safely block on network requests, file I/O, or database queries without affecting any other node. Python's `async def` is detected automatically via `inspect.iscoroutinefunction`. **Use for**: HTTP/REST API calls, database writes, file logging, cloud telemetry, WebSocket connections. 
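The automatic detection mentioned above relies on `inspect.iscoroutinefunction` from the Python standard library. The short sketch below shows what that check reports for different kinds of tick callables; `is_async_tick` is a hypothetical helper name for illustration, not part of the HORUS API.

```python
import inspect


def sync_tick(node):
    return "sync"


async def async_tick(node):
    return "async"


def is_async_tick(tick):
    """True when `tick` was declared with `async def`."""
    return inspect.iscoroutinefunction(tick)


# A plain wrapper around an async function is itself a regular function,
# so this check does not see it as async.
wrapped_tick = lambda node: async_tick(node)
```

Here `is_async_tick(async_tick)` is true while `is_async_tick(sync_tick)` and `is_async_tick(wrapped_tick)` are false. If detection works this way, passing the `async def` function itself as `tick=` is the safe choice; how HORUS treats wrapped callables is not stated in this page.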
## How Classes Are Selected The scheduler selects the execution class based on **which `Node()` parameters you set**: | Configuration | Resulting Class | |--------------|-----------------| | (nothing special) | BestEffort | | `rate=100` | Rt (auto-derived budget/deadline) | | `budget=300*us` | Rt | | `deadline=900*us` | Rt | | `rate=100, budget=..., deadline=...` | Rt (explicit overrides) | | `compute=True` | Compute | | `compute=True, rate=10` | Compute (rate-limited, not RT) | | `on="topic.name"` | Event | | `async def tick` | AsyncIo | | `async def tick, rate=1` | AsyncIo (rate-limited, not RT) | **Key rule**: `rate=` only auto-enables RT when no explicit execution class (`compute=True`, `on=`, `async def`) is set. When combined with an explicit class, `rate=` just limits tick frequency. ### The `rate=` Dual Meaning This is the most important interaction to understand: ```python # simplified # rate= ALONE → Rt (dedicated thread, timing enforced) horus.Node(tick=motor_ctrl, rate=1000) # Result: Rt class, budget=800μs, deadline=950μs # rate= WITH compute=True → Compute (frequency cap, no timing enforcement) horus.Node(tick=plan_path, rate=10, compute=True) # Result: Compute class, ticks at most 10/sec, no budget or deadline # rate= WITH on= → Event (frequency cap after wake) horus.Node(tick=handler, rate=100, on="commands") # Result: Event class, processes at most 100 msg/sec after waking # rate= WITH async def → AsyncIo (frequency cap) horus.Node(tick=async_upload, rate=1) # Result: AsyncIo class, uploads at most 1/sec, no budget or deadline ``` The rule is simple: `rate=` triggers RT **only** when it is the sole execution signal. The moment you add `compute=True`, `on=`, or use `async def`, the `rate=` becomes a frequency cap with no timing enforcement. ### Deferred Finalization Class selection happens when `horus.run()` or `sched.add()` resolves the node, not at `Node()` construction time. 
This means parameter order in the constructor does not matter: ```python # simplified # These produce identical results (both Compute, not RT): horus.Node(rate=100, compute=True, tick=fn) horus.Node(compute=True, rate=100, tick=fn) ``` If you accidentally set conflicting classes, the last explicit class wins and a warning is logged: ```python # simplified # compute=True is overridden by on= — warning logged horus.Node(compute=True, on="topic", tick=fn) # → Event, NOT Compute ``` ## Decision Guide | Your node does... | Use | Node() params | |-------------------|-----|---------------| | Motor control at 500+ Hz | **Rt** | `rate=500` | | Safety monitoring with deadlines | **Rt** | `rate=100, budget=..., deadline=..., on_miss="stop"` | | Sensor fusion at 200 Hz | **Rt** | `rate=200` | | Path planning (takes 10-50 ms) | **Compute** | `compute=True` | | ML inference on CPU | **Compute** | `compute=True, rate=30` | | SLAM processing | **Compute** | `compute=True` | | React to emergency stop | **Event** | `on="emergency.stop"` | | Process commands as they arrive | **Event** | `on="commands"` | | Upload telemetry to cloud | **AsyncIo** | `async def tick` | | Write logs to database | **AsyncIo** | `async def tick` | | WebSocket streaming | **AsyncIo** | `async def tick` | | Display dashboard updates | **BestEffort** | (default) | | Simple diagnostics | **BestEffort** | (default) | ## Validation and Common Mistakes The scheduler validates your configuration when adding nodes and catches mistakes at startup, not at runtime. 
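The selection table, the `rate=` auto-derivation (80% budget, 95% deadline), and the rejection rules listed in this section can be condensed into one small function. This is a toy model written for illustration, not the scheduler's code: `classify` is a hypothetical name, durations are in seconds, and the default `rate=30` of the real `Node()` constructor is deliberately not modelled (here `rate` defaults to `None`).

```python
import inspect


def classify(tick, rate=None, budget=None, deadline=None, compute=False, on=None):
    """Return (execution_class, budget_s, deadline_s) for a toy node config."""
    is_async = inspect.iscoroutinefunction(tick)

    # Rejections, following the "What's Rejected" table.
    if on == "":
        raise ValueError("empty topic: node can never trigger")
    if budget is not None and budget <= 0:
        raise ValueError("budget must be > 0")
    if is_async and (compute or on is not None):
        raise ValueError("async def tick is mutually exclusive with compute=/on=")
    explicit = is_async or compute or on is not None
    if explicit and (budget is not None or deadline is not None):
        raise ValueError("budget/deadline only meaningful for RT nodes")

    # Explicit classes: rate= is only a frequency cap here.
    if is_async:
        return ("AsyncIo", None, None)
    if on is not None:
        return ("Event", None, None)  # also wins over compute=True (warned case)
    if compute:
        return ("Compute", None, None)

    # No explicit class and no timing parameters: BestEffort.
    if not rate and budget is None and deadline is None:
        return ("BestEffort", None, None)

    # rate=, budget= or deadline= on their own select Rt.
    if rate:
        period = 1.0 / rate
        budget = 0.80 * period if budget is None else budget
        deadline = 0.95 * period if deadline is None else deadline
    return ("Rt", budget, deadline)
```

For example, `classify(fn, rate=1000)` yields the Rt class with a budget of about 800 μs and a deadline of about 950 μs, while `classify(fn, rate=10, compute=True)` yields Compute with no budget or deadline.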
### What's Rejected | Configuration | Error | |--------------|-------| | `compute=True, budget=300*us` | Budget only meaningful for RT nodes | | `on="topic", deadline=900*us` | Deadline only meaningful for RT nodes | | `async def tick, budget=...` | Budget only meaningful for RT nodes | | `budget=0` | Budget must be > 0 | | `on=""` | Empty topic — node can never trigger | | `compute=True` with `async def tick` | Mutually exclusive — pick one | | `on="topic"` with `async def tick` | Mutually exclusive — pick one | ### What's Warned | Configuration | Warning | |--------------|---------| | `compute=True, on="topic"` | Last class wins (Event), `compute=True` is overridden | | `compute=True, priority=99` | Priority ignored on non-RT nodes | | `on_miss="stop"` without rate/budget/deadline | No deadline to miss — policy has no effect | | `core=0` without rate/budget/deadline | CPU pinning ignored on non-RT nodes | ### Common Mistakes **Mistake 1: Thinking `rate=` always means RT** ```python # simplified # WRONG assumption: "rate=10 means this is an RT node" planner = horus.Node( tick=plan_path, rate=10, compute=True, # ← compute=True overrides RT ) # Result: Compute class. rate=10 is just a frequency cap. # There is NO budget or deadline enforcement. 
``` If you need timing enforcement on a compute-heavy node, drop `compute=True` and use `rate=` alone — but understand that the node gets a dedicated thread, not a pool: ```python # simplified # This IS an RT node — budget and deadline enforced planner = horus.Node( tick=plan_path, rate=10, budget=80 * ms, deadline=95 * ms, on_miss="warn", ) ``` **Mistake 2: Setting `on_miss=` without a deadline** ```python # simplified # on_miss has no effect — there's no deadline to miss horus.Node( tick=log_tick, compute=True, on_miss="stop", # ← useless on a Compute node ) # Fix: make it an RT node so a deadline exists horus.Node( tick=ctrl_tick, rate=100, on_miss="stop", # ← now triggers when the 9.5 ms deadline is missed ) ``` **Mistake 3: Thinking `priority=` works on Compute nodes** ```python # simplified # Priority is silently ignored — only RT nodes get SCHED_FIFO threads horus.Node(tick=plan, compute=True, priority=99) # Fix: make it RT if you need OS-level priority horus.Node(tick=plan, rate=100, priority=99) ``` **Mistake 4: Using `async def` when you want Compute** ```python # simplified # WRONG: this node does CPU-heavy ML inference, but async def # makes it AsyncIo — it runs on the I/O pool, not the compute pool async def infer(node): img = node.recv("camera") if img: result = model.predict(img.to_numpy()) # CPU-bound, not I/O node.send("detections", result) # Fix: use a regular def with compute=True def infer(node): img = node.recv("camera") if img: result = model.predict(img.to_numpy()) node.send("detections", result) detector = horus.Node(tick=infer, compute=True, rate=30) ``` **Mistake 5: Budget on a Compute node** ```python # simplified # REJECTED: budget is only for RT nodes horus.Node( tick=process, compute=True, budget=50 * ms, # ← scheduler rejects this ) # Fix option A: remove compute=True (becomes RT with timing enforcement) horus.Node(tick=process, rate=20, budget=50 * ms, on_miss="skip") # Fix option B: remove budget (stays Compute, no timing 
enforcement) horus.Node(tick=process, compute=True, rate=20) ``` **Mistake 6: Forgetting that `rate=30` (the default) triggers RT** ```python # simplified # This looks innocent but IS an RT node (rate=30 is the default) horus.Node(tick=log_tick) # rate=30 is set by default → auto-RT with budget=26.7ms, deadline=31.7ms # If you truly want BestEffort, you need rate=0 or the node must have # no timing parameters. In practice, the default rate=30 makes most # nodes RT — this is intentional for safety. ``` ## Complete Example: Mixed Execution Classes ```python # simplified import horus import aiohttp us = horus.us ms = horus.ms # --- Rt: 1 kHz motor control with strict timing --- def motor_tick(node): cmd = node.recv("cmd_vel") if cmd: write_motors(cmd.linear, cmd.angular) motor = horus.Node( name="motor_ctrl", subs=[horus.CmdVel], tick=motor_tick, rate=1000, budget=300 * us, on_miss="skip", priority=90, core=0, order=0, ) # --- Event: only runs when emergency.stop topic updates --- def estop_tick(node): msg = node.recv("emergency.stop") if msg: disable_all_motors() node.request_stop() estop = horus.Node( name="estop", subs=["emergency.stop"], tick=estop_tick, on="emergency.stop", order=0, ) # --- Rt: 100 Hz IMU sensor reading --- def imu_tick(node): reading = read_imu_hardware() node.send("imu", horus.Imu( accel_x=reading.ax, accel_y=reading.ay, accel_z=reading.az, gyro_x=reading.gx, gyro_y=reading.gy, gyro_z=reading.gz, )) imu = horus.Node( name="imu_reader", pubs=[horus.Imu], tick=imu_tick, rate=100, order=1, ) # --- Compute: path planning on thread pool --- def plan_tick(node): scan = node.recv("scan") if scan: path = compute_path(scan.ranges) node.send("path", path) planner = horus.Node( name="planner", subs=[horus.LaserScan], pubs=["path"], tick=plan_tick, compute=True, rate=10, order=5, ) # --- AsyncIo: cloud telemetry upload --- async def telemetry_tick(node): if node.has_msg("telemetry"): data = node.recv("telemetry") async with aiohttp.ClientSession() as 
session: await session.post("https://api.example.com/telemetry", json=data) uploader = horus.Node( name="telemetry", subs=["telemetry"], tick=telemetry_tick, rate=0.2, # Every 5 seconds order=50, ) # --- BestEffort: dashboard display in main loop --- def dashboard_tick(node): if node.has_msg("stats"): stats = node.recv("stats") update_display(stats) dashboard = horus.Node( name="dashboard", subs=["stats"], tick=dashboard_tick, order=100, # No rate, compute, on, or async → would be BestEffort # But note: default rate=30 makes this RT. To truly get BestEffort, # the scheduler treats order-only nodes in the main loop. ) # Run everything horus.run( motor, estop, imu, planner, uploader, dashboard, tick_rate=1000, rt=True, watchdog_ms=500, ) ``` ## Unit Constants Python doesn't have Rust's `300_u64.us()` extension trait syntax. Instead, HORUS provides unit constants for readable duration expressions: ```python # simplified import horus # Unit constants horus.us # 1e-6 (microseconds → seconds) horus.ms # 1e-3 (milliseconds → seconds) # Usage in Node() horus.Node( budget=300 * horus.us, # 300 μs = 0.0003 seconds deadline=900 * horus.us, # 900 μs = 0.0009 seconds watchdog=500 * horus.ms, # 500 ms = 0.5 seconds ) # Equivalent raw values (less readable) horus.Node( budget=0.0003, deadline=0.0009, watchdog=0.5, ) ``` | Python | Rust equivalent | Value | |--------|-----------------|-------| | `300 * horus.us` | `300_u64.us()` | 0.0003 s | | `1 * horus.ms` | `1_u64.ms()` | 0.001 s | | `500 * horus.ms` | `500_u64.ms()` | 0.5 s | ## Testing with `tick_once()` Execution classes work with single-tick testing. 
The scheduler still classifies nodes correctly — it just runs one cycle instead of looping: ```python # simplified import horus results = [] def sensor_tick(node): node.send("temp", {"value": 25.0}) def logger_tick(node): msg = node.recv("temp") if msg: results.append(msg["value"]) sensor = horus.Node(name="sensor", pubs=["temp"], tick=sensor_tick, rate=100, order=0) logger = horus.Node(name="logger", subs=["temp"], tick=logger_tick, rate=100, order=1) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(sensor) sched.add(logger) # RT nodes still get their class — tick_once just runs one cycle for _ in range(5): sched.tick_once() assert len(results) == 5 ``` For event-driven nodes, `tick_once()` only ticks them if the trigger topic has data: ```python # simplified def handler_tick(node): msg = node.recv("commands") results.append(msg) handler = horus.Node(name="handler", subs=["commands"], tick=handler_tick, on="commands") sched.add(handler) # handler does NOT tick here — no data on "commands" sched.tick_once() assert len(results) == 0 # Publish data, then tick — now handler runs cmd_topic = horus.Topic("commands") cmd_topic.send({"action": "go"}) sched.tick_once() assert len(results) == 1 ``` ## Design Decisions **Why 5 classes instead of just RT and non-RT?** A thread-per-node model (RT) is wasteful for logging nodes — dedicating OS threads and SCHED_FIFO slots to telemetry is overkill. A single-threaded model (BestEffort) can't handle 50 ms path planning without stalling the control loop. A two-class split (RT vs non-RT) doesn't distinguish between CPU-bound work (Compute), event-driven reactions (Event), and I/O-bound operations (AsyncIo) — each of which has a fundamentally different optimal executor. The five-class model matches the five common robotics workload patterns. 
**Why auto-detection instead of an explicit `execution_class=` parameter?** Most developers don't think in terms of "execution classes" — they think "this node needs to run at 1 kHz" or "this node does heavy computation." Auto-detection from `rate=`, `compute=`, `on=`, and `async def` maps intent to the right executor without requiring framework knowledge. If you set `rate=1000`, the scheduler knows you need a dedicated real-time thread. You don't have to explicitly request one. **Why does `rate=` with `compute=True` not become RT?** Because rate-limiting and real-time are different things. A path planner at 10 Hz means "tick at most 10 times per second" — not "this node has a 100 ms deadline that must be enforced." Mixing the two concepts would force compute nodes to pay RT overhead (dedicated threads, timing measurement) for no benefit. The rule is clear: `rate=` only triggers RT when no explicit class is set. **Why does Python auto-detect `async def` instead of having an `async_io=True` parameter?** Python already distinguishes `def` from `async def` at the language level. Auto-detection means zero boilerplate — just write `async def tick(node):` and the scheduler does the right thing. Adding a redundant `async_io=True` parameter would create two conflicting signals and make the API harder to use. The `async` keyword is the explicit signal. **Why can't you combine `compute=True` with `async def`?** Compute nodes run on a CPU thread pool optimized for parallel computation. Async nodes run on a tokio I/O thread pool optimized for non-blocking await. These are fundamentally different runtimes — a CPU-bound ML inference should not share the I/O pool (it would block other async nodes), and an I/O-bound HTTP request should not occupy a compute slot (it wastes a CPU thread while waiting). The mutual exclusion forces you to pick the right executor for your workload. 
## Trade-offs | Gain | Cost | |------|------| | **Right executor per workload** — each node runs optimally | Must understand which class fits your node | | **Auto-detection** — `rate=` infers RT without explicit configuration | Less explicit — must know the `rate=` + `compute=True` interaction | | **RT isolation** — a slow Compute node can't block an RT motor controller | RT nodes consume one OS thread each | | **Event nodes** — zero CPU when idle | Must match `on="topic"` name exactly to a publisher's topic | | **AsyncIo auto-detect** — `async def` just works | Cannot combine with `compute=True` or `on=` | | **Default `rate=30`** — nodes get timing enforcement by default | Must explicitly opt out for truly best-effort nodes | | **Python unit constants** (`300 * horus.us`) | Less ergonomic than Rust's `300_u64.us()` — multiplication instead of method | ## See Also - [Python API Reference](/python/api) — Complete `Node()`, `Scheduler`, and `horus.run()` reference - [Async Nodes](/python/api/async-nodes) — Deep dive on `async def` tick patterns, timeouts, and cancellation - [Execution Classes (Concepts)](/concepts/execution-classes) — Language-agnostic theory with Rust examples - [Python Bindings](/python/api/python-bindings) — Full binding reference including topic formats and typed messages - [Python Examples](/python/examples) — Complete working examples --- ## Geometry Messages Path: /python/messages/geometry Description: 3D math, poses, transforms, and velocity types for Python robotics — every method explained # Geometry Messages Geometry types are the foundation of all robotics code. They represent positions, orientations, velocities, and coordinate transforms — the language your robot uses to understand where it is and where it needs to go. **When you need these:** Every robotics project uses at least `Pose2D` (mobile robots) or `Pose3D` (arms, drones). 
If you're doing any spatial math — distances, rotations, velocity commands — you need `Vector3`, `Quaternion`, and `Twist`. ```python # simplified from horus import ( Vector3, Point3, Quaternion, Pose2D, Pose3D, Twist, CmdVel, TransformStamped, PoseStamped, PoseWithCovariance, TwistWithCovariance, Accel, AccelStamped, ) ``` --- ## Vector3 A 3D vector representing directions, forces, velocities, or any three-component quantity. Includes built-in math operations so you don't need numpy for basic 3D math. ### Constructor ```python # simplified v = Vector3(x=1.0, y=2.0, z=3.0) ``` All fields default to `0.0`. ### `.zero()` — Zero Vector ```python # simplified v = Vector3.zero() # Vector3(0, 0, 0) ``` Creates a zero vector. Use this instead of `Vector3(x=0.0, y=0.0, z=0.0)` for clarity — it signals intent ("I want no force/velocity/displacement") rather than just default values. ### `.magnitude()` — Vector Length ```python # simplified v = Vector3(x=3.0, y=4.0, z=0.0) print(v.magnitude()) # 5.0 ``` Returns the Euclidean length (L2 norm): `sqrt(x² + y² + z²)`. The physical meaning depends on what the vector represents — if it's a force, magnitude is in Newtons; if it's a velocity, magnitude is in m/s. Use this to check if a vector is "significant" before processing it: ```python # simplified if force_vec.magnitude() < 0.01: return # Ignore negligible force readings ``` ### `.normalize()` — Normalize In-Place ```python # simplified v = Vector3(x=3.0, y=4.0, z=0.0) v.normalize() print(v.magnitude()) # 1.0 ``` Mutates the vector to unit length (magnitude = 1.0). The direction is preserved, only the length changes. Use this when you need a pure direction vector — for example, computing a repulsive force direction away from an obstacle. > **Common mistake:** Normalizing a zero vector. If `magnitude() == 0`, normalizing produces NaN. Always check magnitude first. 
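The zero-vector check recommended above can be sketched in plain Python, without depending on the `Vector3` type. The helper name `safe_normalized` and the `eps` threshold are illustrative assumptions, not part of the HORUS API.

```python
import math


def safe_normalized(x, y, z, eps=1e-9):
    """Return a unit-length (x, y, z) tuple, or None if the vector is ~zero."""
    mag = math.sqrt(x * x + y * y + z * z)
    if mag < eps:
        return None  # direction is undefined; the caller decides what to do
    return (x / mag, y / mag, z / mag)
```

With a `Vector3`, the same guard would be a `magnitude()` comparison before calling `normalize()`. Returning `None` forces the caller to handle the "no direction" case explicitly instead of letting NaN values flow into later math.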
### `.normalized()` — Unit-Length Copy ```python # simplified v = Vector3(x=3.0, y=4.0, z=0.0) unit = v.normalized() print(v.magnitude()) # 5.0 (original unchanged) print(unit.magnitude()) # 1.0 (new copy) ``` Returns a **new** unit-length vector without modifying the original. Use this when you need both the original vector and its direction: ```python # simplified direction = force.normalized() scaled_force = Vector3( x=direction.x * desired_magnitude, y=direction.y * desired_magnitude, z=direction.z * desired_magnitude, ) ``` ### `.dot(other)` — Dot Product ```python # simplified a = Vector3(x=1.0, y=0.0, z=0.0) # Points right b = Vector3(x=0.0, y=1.0, z=0.0) # Points up print(a.dot(b)) # 0.0 — perpendicular ``` Returns a scalar measuring alignment between two vectors: - **Positive** → vectors point in roughly the same direction - **Zero** → vectors are perpendicular (90°) - **Negative** → vectors point in roughly opposite directions The magnitude equals `|a| * |b| * cos(angle)`. For unit vectors, it's just `cos(angle)`. Common use: checking if a sensor is facing a target: ```python # simplified sensor_dir = Vector3(x=1.0, y=0.0, z=0.0) # Sensor faces forward to_target = Vector3(x=0.8, y=0.6, z=0.0).normalized() if sensor_dir.dot(to_target) > 0.5: # Within ~60° cone print("Target is in sensor's field of view") ``` ### `.cross(other)` — Cross Product ```python # simplified x_axis = Vector3(x=1.0, y=0.0, z=0.0) y_axis = Vector3(x=0.0, y=1.0, z=0.0) z_axis = x_axis.cross(y_axis) print(z_axis) # Vector3(0, 0, 1) — right-hand rule ``` Returns a new vector perpendicular to both inputs, following the right-hand rule. The magnitude equals `|a| * |b| * sin(angle)`. Common use: computing surface normals from two edge vectors, or finding the rotation axis between two directions. > **Common mistake:** Cross product is order-dependent: `a.cross(b)` = `-b.cross(a)`. Swapping the order flips the direction. --- ## Point3 A 3D point in space. 
Semantically different from Vector3 — a Point3 is a *location* (like a landmark or waypoint), while Vector3 is a *direction/displacement*. ### Constructor ```python # simplified p = Point3(x=1.0, y=2.0, z=3.0) ``` ### `.origin()` — The Origin ```python # simplified o = Point3.origin() # Point3(0, 0, 0) ``` The world origin. Use this as a default reference point or to represent "no position set": ```python # simplified target = Point3.origin() # Will be updated when target is detected ``` ### `.distance_to(other)` — Euclidean Distance ```python # simplified p1 = Point3(x=1.0, y=0.0, z=0.0) p2 = Point3(x=4.0, y=4.0, z=0.0) print(p1.distance_to(p2)) # 5.0 ``` Straight-line distance between two points in meters. This is the fundamental spatial query in robotics — "how far is the obstacle?", "how far to the goal?", "how close are these two landmarks?" > **Common mistake:** This is 3D Euclidean distance. If you're working in 2D (ground robots), the Z component still contributes. Set Z=0 for both points if you want 2D distance, or use `Pose2D.distance_to()` instead. --- ## Quaternion Quaternion orientation (x, y, z, w). Quaternions avoid the gimbal lock problem of Euler angles and compose cleanly. But you almost never need to work with raw quaternion components — use `from_euler()` to create them from human-readable angles. ### Constructor ```python # simplified q = Quaternion(x=0.0, y=0.0, z=0.0, w=1.0) # Identity (no rotation) ``` Defaults to identity (w=1, rest=0). You rarely need to set raw xyzw values directly. ### `.identity()` — No Rotation ```python # simplified q = Quaternion.identity() # (0, 0, 0, 1) ``` The identity quaternion represents zero rotation. 
Use this as a default orientation or to initialize a pose before setting the actual rotation: ```python # simplified pose = Pose3D(x=1.0, y=2.0, z=0.0, qx=0.0, qy=0.0, qz=0.0, qw=1.0) # Equivalent to: pose = Pose3D.identity() pose.x = 1.0 pose.y = 2.0 ``` ### `.from_euler(roll, pitch, yaw)` — From Euler Angles ```python # simplified import math q = Quaternion.from_euler(roll=0.0, pitch=0.0, yaw=math.pi/2) # 90° left turn ``` **This is the most important method in the geometry library.** In robotics, you think in Euler angles (roll/pitch/yaw), but the system stores quaternions. This method bridges the gap. - **roll**: rotation around the X axis (tilting left/right). Positive = right side down. - **pitch**: rotation around the Y axis (tilting forward/backward). Positive = nose down. - **yaw**: rotation around the Z axis (turning left/right). Positive = turn left (counterclockwise from above). Convention: ZYX (yaw applied first, then pitch, then roll). All angles in **radians**. Common values: | Angle | Radians | Description | |-------|---------|-------------| | 0° | `0.0` | No rotation | | 45° | `math.pi/4` ≈ `0.785` | Eighth turn | | 90° | `math.pi/2` ≈ `1.571` | Right angle (quarter turn) | | 180° | `math.pi` ≈ `3.142` | Half turn | | -90° | `-math.pi/2` | Right turn | ```python # simplified facing_north = Quaternion.from_euler(yaw=0.0) facing_east = Quaternion.from_euler(yaw=-math.pi/2) # 90° right facing_south = Quaternion.from_euler(yaw=math.pi) # 180° tilted_forward = Quaternion.from_euler(pitch=0.1) # Slight nose-down ``` > **Common mistake:** Using degrees instead of radians. `from_euler(yaw=90)` is NOT 90° — it's 90 radians (≈ 14 full rotations). Always use `math.pi/2` for 90°, or convert: `math.radians(90)`. ### `.normalize()` — Fix Quaternion Drift ```python # simplified q.normalize() ``` After repeated quaternion operations (interpolation, composition), floating-point errors accumulate and the quaternion drifts from unit length. Normalizing snaps it back. 
Call this periodically in long-running systems: ```python # simplified # In a sensor fusion loop orientation = update_orientation(orientation, gyro_reading, dt) orientation.normalize() # Prevent drift every tick ``` ### `.is_valid()` — Validation ```python # simplified print(q.is_valid()) # True if all components are finite and quaternion is non-zero ``` Catches NaN and infinity from bad sensor data or division by zero. Use this before publishing orientation data: ```python # simplified if not q.is_valid(): print("WARNING: Invalid orientation, skipping publish") return ``` --- ## Pose2D 2D robot pose — the workhorse of mobile robotics. Represents position (x, y) in meters and heading angle (theta) in radians. ### Constructor ```python # simplified pose = Pose2D(x=1.0, y=2.0, theta=0.5) ``` ### `.origin()` — The Origin Pose ```python # simplified start = Pose2D.origin() # (0, 0, 0) ``` The origin with zero heading. Common as an initial pose or "home" position. ### `.distance_to(other)` — How Far to the Goal? ```python # simplified robot = Pose2D(x=1.0, y=2.0, theta=0.5) goal = Pose2D(x=10.0, y=5.0, theta=0.0) dist = robot.distance_to(goal) # 9.487 meters ``` Euclidean distance between positions, **ignoring heading**. This is the straight-line distance — it doesn't account for obstacles or the robot's orientation. Use this in navigation loops to check "am I close enough?": ```python # simplified if current.distance_to(goal) < 0.2: # Within 20cm print("Close enough — stop") ``` > **Common mistake:** `distance_to()` ignores theta. A robot can be at exactly the right position but facing the wrong direction. For full goal checking (position + heading), use `NavGoal.is_reached()` instead. ### `.normalize_angle()` — Fix Theta Wrap-Around ```python # simplified pose = Pose2D(x=0.0, y=0.0, theta=7.0) pose.normalize_angle() print(f"{pose.theta:.3f}") # 0.717 (wrapped to [-pi, pi]) ``` Wraps theta into the range [-π, π]. 
Without this, angular calculations break — for example, the difference between θ=359° and θ=1° is 2°, not 358°. Always normalize after adding angles: ```python # simplified pose.theta += turn_rate * dt pose.normalize_angle() ``` > **Common mistake:** Comparing angles without normalizing. `theta=3.14` and `theta=-3.14` are almost the same heading (both ≈ 180°), but their numerical difference is 6.28. ### `.is_valid()` — Validation ```python # simplified print(pose.is_valid()) # True if x, y, theta are all finite ``` --- ## Pose3D 3D pose with position and quaternion orientation. Used for arms, drones, and bridging 2D navigation with 3D visualization. ### Constructor ```python # simplified pose = Pose3D(x=1.0, y=2.0, z=0.5, qx=0.0, qy=0.0, qz=0.0, qw=1.0) ``` ### `.identity()` — Default Pose ```python # simplified pose = Pose3D.identity() # Origin, no rotation ``` ### `.from_pose_2d(pose2d)` — Bridge 2D to 3D ```python # simplified nav_pose = Pose2D(x=5.0, y=3.0, theta=1.57) viz_pose = Pose3D.from_pose_2d(nav_pose) # x=5.0, y=3.0, z=0.0, quaternion = 90° yaw ``` Converts a 2D navigation pose to a 3D pose by setting z=0 and converting theta to a yaw quaternion. Essential when your navigation planner works in 2D but your visualization or transform system works in 3D. ### `.distance_to(other)` — 3D Distance ```python # simplified dist = pose_a.distance_to(pose_b) # Euclidean distance, position only ``` Same as Point3.distance_to — measures straight-line 3D distance, ignoring orientation. ### `.is_valid()` — Validation ```python # simplified print(pose.is_valid()) # True if all position + quaternion fields are finite ``` --- ## CmdVel 2D velocity command — the standard way to drive mobile robots. Simpler than Twist (only linear + angular for 2D). 
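To make the two fields concrete, the sketch below integrates a (linear, angular) command into a planar pose with simple Euler steps. It is plain Python with no HORUS imports. `wrap_angle` and `integrate_cmd_vel` are hypothetical helper names, and the unicycle motion model is an assumption about how a ground robot responds to the command, not documented library behavior.

```python
import math

def wrap_angle(theta):
    """Wrap an angle in radians into [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))

def integrate_cmd_vel(x, y, theta, linear, angular, dt):
    """Advance a planar pose by one Euler step of a unicycle model (assumed model).

    linear  -- forward speed in m/s along the current heading
    angular -- turn rate in rad/s (positive = counterclockwise)
    """
    x += linear * math.cos(theta) * dt
    y += linear * math.sin(theta) * dt
    theta = wrap_angle(theta + angular * dt)
    return x, y, theta

# Drive straight for 1 s at 0.5 m/s, in 10 ms steps.
straight = (0.0, 0.0, 0.0)
for _ in range(100):
    straight = integrate_cmd_vel(*straight, linear=0.5, angular=0.0, dt=0.01)

# Turn in place for 1 s at 0.3 rad/s.
spin = (0.0, 0.0, 0.0)
for _ in range(100):
    spin = integrate_cmd_vel(*spin, linear=0.0, angular=0.3, dt=0.01)
```

With zero angular velocity only `x` changes; with zero linear velocity only `theta` changes. This is why two numbers are enough to command a ground robot that cannot move sideways.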
### Constructor ```python # simplified cmd = CmdVel(linear=0.5, angular=0.3) # Forward at 0.5 m/s, turning left ``` ### `.zero()` — Stop Command ```python # simplified stop = CmdVel.zero() # linear=0, angular=0 ``` The emergency stop / halt command. Use this when the robot needs to stop immediately: ```python # simplified if obstacle_too_close: cmd_topic.send(CmdVel.zero()) ``` > **Common mistake:** Using `CmdVel(linear=0.0, angular=0.0)` instead of `CmdVel.zero()`. Both work, but `zero()` is clearer and less error-prone (no risk of typos in field names). --- ## Twist Full 6-DOF velocity (linear xyz + angular xyz). Used for 3D robots — drones, manipulators, holonomic bases. ### Constructor ```python # simplified twist = Twist(linear_x=1.0, linear_y=0.0, linear_z=0.0, angular_x=0.0, angular_y=0.0, angular_z=0.5) ``` ### `.stop()` — Zero Velocity ```python # simplified halt = Twist.stop() # All zeros ``` Emergency halt for any robot type. Published on a velocity topic to command immediate stop. ### `.new_2d(linear_x, angular_z)` — 2D Shorthand ```python # simplified twist = Twist.new_2d(linear_x=0.5, angular_z=0.3) # Sets only linear_x and angular_z, rest are zero ``` Most mobile robots only use 2 of the 6 DOF. This shorthand avoids filling in 4 zeros: ```python # simplified # Instead of: twist = Twist(linear_x=0.5, linear_y=0.0, linear_z=0.0, angular_x=0.0, angular_y=0.0, angular_z=0.3) # Just: twist = Twist.new_2d(linear_x=0.5, angular_z=0.3) ``` ### `.is_valid()` — Validation ```python # simplified print(twist.is_valid()) # True if all 6 components are finite ``` --- ## TransformStamped A 3D coordinate frame transformation (translation + rotation). Used with the TransformFrame system to describe how different parts of the robot relate spatially. 
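As a rough illustration of what "translation + rotation" means, the following sketch maps a point from a child frame into its parent frame: rotate by the unit quaternion, then add the translation. It is plain Python and does not use the HORUS API. `rotate` and `apply_transform` are hypothetical helpers, and the tuple layouts `(x, y, z)` and `(x, y, z, w)` are assumptions chosen for the example.

```python
import math

def rotate(q, v):
    """Rotate vector v = (x, y, z) by unit quaternion q = (x, y, z, w)."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # t = 2 * (u x v), where u is the vector part of q
    tx = 2.0 * (qy * vz - qz * vy)
    ty = 2.0 * (qz * vx - qx * vz)
    tz = 2.0 * (qx * vy - qy * vx)
    # v' = v + w * t + (u x t)
    return (
        vx + qw * tx + (qy * tz - qz * ty),
        vy + qw * ty + (qz * tx - qx * tz),
        vz + qw * tz + (qx * ty - qy * tx),
    )

def apply_transform(translation, rotation, point):
    """Express a child-frame point in the parent frame: rotate, then translate."""
    rx, ry, rz = rotate(rotation, point)
    return (rx + translation[0], ry + translation[1], rz + translation[2])

# A frame mounted 1.0 m forward and 0.5 m up, yawed 90 degrees to the left.
half = math.pi / 4  # half of the 90 degree yaw angle
yaw_90 = (0.0, 0.0, math.sin(half), math.cos(half))
point_in_parent = apply_transform((1.0, 0.0, 0.5), yaw_90, (1.0, 0.0, 0.0))
```

A point one meter ahead of the child frame ends up one meter to the left of the mount position in the parent frame, because the child frame's forward axis points left after the yaw.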
### Constructor ```python # simplified tf = TransformStamped(tx=1.0, ty=0.0, tz=0.5, rx=0.0, ry=0.0, rz=0.0, rw=1.0) ``` ### `.identity()` — No Transform ```python # simplified tf = TransformStamped.identity() # Zero translation, identity rotation ``` ### `.from_pose_2d(pose)` — Convert 2D Pose ```python # simplified pose = Pose2D(x=1.0, y=2.0, theta=0.5) tf = TransformStamped.from_pose_2d(pose) ``` Creates a 3D transform from a 2D pose. Common for publishing the odom→base_link transform from wheel odometry. ### `.is_valid()` — Validation ```python # simplified print(tf.is_valid()) # True if all fields are finite ``` ### `.normalize_rotation()` — Fix Quaternion Drift ```python # simplified tf.normalize_rotation() ``` Same as `Quaternion.normalize()` but applied to the transform's rotation component. Call this periodically when composing transforms over time. --- ## PoseWithCovariance, TwistWithCovariance Pose and velocity with uncertainty estimates. Used in sensor fusion (EKF, particle filters) where you need to track not just the value but how confident you are. ### `.position_variance()` — Position Uncertainty ```python # simplified pwc = PoseWithCovariance(x=1.0, y=2.0, z=0.0) pwc.covariance = [0.1, 0,0,0,0,0, 0,0.1,0,0,0,0, 0,0,0.01,0,0,0, 0,0,0,0.01,0,0, 0,0,0,0,0.01,0, 0,0,0,0,0,0.01] var = pwc.position_variance() # [0.1, 0.1, 0.01] — diagonal of position block ``` Extracts [var_x, var_y, var_z] from the 6x6 covariance matrix diagonal. Larger values mean less certainty about the position. 
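The diagonal extraction can be sketched in plain Python. This assumes the covariance is a row-major flat list of 36 values ordered (x, y, z, roll, pitch, yaw), which is inferred from the example above rather than confirmed by the library. `diagonal_block` is a hypothetical helper, not a HORUS function.

```python
import math

def diagonal_block(cov, start, count, size=6):
    """Return `count` diagonal entries of a row-major size x size matrix.

    Entry (i, i) of a flattened matrix lives at index i * (size + 1).
    """
    if len(cov) != size * size:
        raise ValueError(f"expected {size * size} values, got {len(cov)}")
    return [cov[(start + i) * (size + 1)] for i in range(count)]

cov = [
    0.1, 0, 0, 0, 0, 0,
    0, 0.1, 0, 0, 0, 0,
    0, 0, 0.01, 0, 0, 0,
    0, 0, 0, 0.01, 0, 0,
    0, 0, 0, 0, 0.01, 0,
    0, 0, 0, 0, 0, 0.01,
]

position_var = diagonal_block(cov, start=0, count=3)     # rows 0..2
orientation_var = diagonal_block(cov, start=3, count=3)  # rows 3..5

# Standard deviations are often easier to read: same units as the state.
position_std = [math.sqrt(v) for v in position_var]
```

Variances are in squared units (m² for position), so taking the square root gives a one-sigma uncertainty in meters that is easier to compare against thresholds.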
### `.orientation_variance()` — Orientation Uncertainty ```python # simplified var = pwc.orientation_variance() # [var_roll, var_pitch, var_yaw] ``` ### `.linear_variance()` / `.angular_variance()` Same pattern for TwistWithCovariance — extract velocity uncertainty: ```python # simplified twc = TwistWithCovariance(linear_x=0.5, angular_z=1.0) lin_var = twc.linear_variance() # [var_vx, var_vy, var_vz] ang_var = twc.angular_variance() # [var_wx, var_wy, var_wz] ``` --- ## Accel, AccelStamped Linear and angular acceleration. ### `.is_valid()` — Validation ```python # simplified accel = Accel(linear_x=9.81, angular_z=0.1) print(accel.is_valid()) # True if all fields are finite ``` --- ## Complete Examples ### Example: Obstacle Avoidance with Vector Math ```python # simplified from horus import Vector3, Point3 robot_pos = Point3(x=0.0, y=0.0, z=0.0) obstacle_pos = Point3(x=2.0, y=1.0, z=0.0) dist = robot_pos.distance_to(obstacle_pos) if dist < 3.0: # Compute direction away from obstacle dx = robot_pos.x - obstacle_pos.x dy = robot_pos.y - obstacle_pos.y repulse = Vector3(x=dx, y=dy, z=0.0) repulse.normalize() # Pure direction # Scale by inverse distance (closer = stronger repulsion) strength = 1.0 / max(dist, 0.1) escape = Vector3( x=repulse.x * strength, y=repulse.y * strength, z=0.0, ) print(f"Escape force: {escape.magnitude():.2f} (dir: {repulse})") ``` ### Example: Navigation Goal Check ```python # simplified from horus import Node, run, Pose2D, CmdVel, Topic odom_topic = Topic(Pose2D) cmd_topic = Topic(CmdVel) goal = Pose2D(x=5.0, y=3.0, theta=0.0) def navigate(node): current = odom_topic.recv(node) if current is None: return dist = current.distance_to(goal) if dist < 0.2: cmd_topic.send(CmdVel.zero(), node) # Arrived else: cmd_topic.send(CmdVel(linear=0.3, angular=0.0), node) run(Node(tick=navigate, rate=10, pubs=["cmd_vel"], subs=["pose"])) ``` ### Example: Setting Up Coordinate Frames ```python # simplified from horus import TransformStamped, Pose2D, Quaternion 
import math # Base link to camera: camera is 0.3m forward, 0.2m up, tilted down 15° camera_tf = TransformStamped( tx=0.3, ty=0.0, tz=0.2, rx=0.0, ry=0.0, rz=0.0, rw=1.0, ) # Set the pitch rotation (15° down; positive pitch = nose down) q = Quaternion.from_euler(roll=0.0, pitch=math.radians(15), yaw=0.0) camera_tf.rx = q.x camera_tf.ry = q.y camera_tf.rz = q.z camera_tf.rw = q.w # Odom to base_link from 2D odometry odom_pose = Pose2D(x=1.5, y=0.3, theta=0.2) odom_tf = TransformStamped.from_pose_2d(odom_pose) ``` --- ## Design Decisions **Why separate `Vector3` and `Point3`?** They are mathematically similar (three floats) but semantically different. A `Vector3` is a direction or displacement (it can be added, scaled, normalized). A `Point3` is a location in space (it can be measured for distance, but normalizing a location is meaningless). Separating them prevents bugs like normalizing a waypoint coordinate or taking the distance between two velocities. **Why `Quaternion` instead of Euler angles for orientation?** Euler angles suffer from gimbal lock (a mathematical singularity where you lose a degree of freedom at pitch = ±90°). Quaternions compose cleanly, interpolate smoothly (SLERP), and never have singularities. The tradeoff is that quaternions are unintuitive for humans, which is why `from_euler()` exists as the primary construction method. **Why both `CmdVel` and `Twist`?** `CmdVel` is a 2-DOF velocity (linear + angular) for ground robots. `Twist` is a 6-DOF velocity (linear xyz + angular xyz) for drones, arms, and holonomic bases. Most mobile robots only need `CmdVel`, and using a simpler type prevents accidentally commanding out-of-plane motion. Use `Twist` only when you genuinely need 6 degrees of freedom. **Why `PoseWithCovariance` instead of separate pose and covariance messages?** Covariance is meaningless without the value it describes. Publishing them separately creates a synchronization problem (which covariance goes with which pose?).
Bundling them guarantees the uncertainty estimate always matches the measurement. **Why does `TransformStamped` have `from_pose_2d()` but not `from_pose_3d()`?** The 2D-to-3D bridge is the common case (wheel odometry produces Pose2D, but the transform tree is 3D). A Pose3D already contains a position and quaternion, so converting it to a TransformStamped is just field assignment with no mathematical conversion needed. --- ## See Also - [Sensor Messages](/python/messages/sensor) — LaserScan, Imu, Odometry - [Navigation Messages](/python/messages/navigation) — NavGoal, OccupancyGrid - [TransformFrame API](/python/api/transform-frame) — Coordinate frame management - [Control Messages](/python/messages/control) — CmdVel-to-wheel conversions - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## Async Nodes Path: /python/api/async-nodes Description: Python async/await support for non-blocking HORUS nodes # Async Nodes HORUS automatically detects `async def` tick functions and runs them on the async I/O thread pool. No special classes or imports needed — just pass an async function to `Node()`. ## Basic Usage ```python # simplified import horus import aiohttp async def fetch_weather(node): async with aiohttp.ClientSession() as session: async with session.get("https://api.weather.com/data") as resp: data = await resp.json() node.send("weather", data) node = horus.Node( name="weather", tick=fetch_weather, rate=1, pubs=["weather"], ) horus.run(node) ``` That's it. `async def` is auto-detected — the scheduler runs this node on the async I/O thread pool (matching Rust's `.async_io()` execution class), so it doesn't block other nodes. ## How It Works When you pass an `async def` to `Node(tick=...)`: 1. `Node()` detects that `tick` is an async function 2. The scheduler automatically applies the async I/O execution class (matching Rust `.async_io()`) 3. 
The Rust scheduler runs this node on a separate thread pool, not the main tick thread Other (sync) nodes continue ticking while async nodes await. **Async Node Detection** | Signal | Result | |--------|--------| | `async def tick(node):` | Auto-classified as AsyncIo | | `async def init(node):` | Init runs on async runtime | | `async def shutdown(node):` | Shutdown runs on async runtime | | Regular `def tick(node):` | Stays in default execution class | ## Async Init and Shutdown `init` and `shutdown` callbacks can also be async: ```python # simplified import horus import asyncpg async def setup(node): node.db = await asyncpg.connect("postgresql://localhost/robotics") async def process(node): if node.has_msg("data"): data = node.recv("data") await node.db.execute("INSERT INTO logs (value) VALUES ($1)", data) async def cleanup(node): await node.db.close() node = horus.Node( name="db_logger", tick=process, init=setup, shutdown=cleanup, rate=10, subs=["data"], ) horus.run(node) ``` ## Complete Example: HTTP API + Database ```python # simplified import horus import aiohttp import asyncpg async def fetch(node): """Fetch sensor data from HTTP API""" async with aiohttp.ClientSession() as session: async with session.get("https://api.example.com/sensor") as resp: if resp.status == 200: data = await resp.json() node.send("sensor.data", data) async def store_init(node): node.db = await asyncpg.connect("postgresql://localhost/robotics") async def store(node): """Store received data in database""" if node.has_msg("sensor.data"): data = node.recv("sensor.data") await node.db.execute( "INSERT INTO sensor_log (temp, humidity) VALUES ($1, $2)", data["temperature"], data["humidity"] ) async def store_shutdown(node): await node.db.close() horus.run( horus.Node(name="fetcher", tick=fetch, rate=1, pubs=["sensor.data"], order=0), horus.Node(name="storer", tick=store, init=store_init, shutdown=store_shutdown, rate=10, subs=["sensor.data"], order=1), ) ``` ## Mixing Sync and Async Sync and 
async nodes work together in the same scheduler. Sync nodes run on the main tick thread, async nodes run on the I/O thread pool: ```python # simplified import horus def read_sensor(node): """Fast sync sensor read""" node.send("raw", get_lidar_data()) async def upload(node): """Slow async cloud upload""" if node.has_msg("raw"): data = node.recv("raw") await cloud_client.upload(data) horus.run( horus.Node(name="sensor", tick=read_sensor, rate=100, order=0, pubs=["raw"]), horus.Node(name="upload", tick=upload, rate=1, order=1, subs=["raw"]), ) ``` No special handling — the scheduler detects which is async and routes accordingly. ## Quick Reference | Feature | Sync Node | Async Node | |---------|-----------|------------| | Tick function | `def tick(node):` | `async def tick(node):` | | Execution class | Default / Compute / Rt | AsyncIo (auto) | | Thread pool | Main tick thread | Separate I/O thread pool | | Detection | Automatic | Automatic (`inspect.iscoroutinefunction`) | | Latency overhead | ~0 | ~1ms (event loop scheduling) | | Best for | Sensor reads, control, ML inference | HTTP, database, WebSocket, file I/O | ## When to Use Async **Good use cases:** - HTTP/REST API integration (aiohttp, httpx) - Database operations (asyncpg, aioredis, motor) - WebSocket connections - File I/O operations - Any I/O-bound work that benefits from `await` **Not ideal for:** - CPU-bound computation -- use `compute=True` instead - Real-time control loops -- use sync tick with `budget` and `deadline` - Operations requiring sub-millisecond latency -- async overhead is ~1ms ## Design Decisions **Why auto-detect instead of explicit `async=True` parameter?** Python already distinguishes `def` from `async def` at the language level. Auto-detection means zero boilerplate -- just write `async def tick(node):` and the scheduler does the right thing. This matches Python's "explicit is better than implicit" principle: the `async` keyword *is* the explicit signal. 
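The detection step can be sketched with the standard library's `inspect.iscoroutinefunction`, which the Quick Reference names as the mechanism. `classify_tick` is a hypothetical helper written for illustration. It mirrors the documented rules (async, `compute=True`, and `on=` are mutually exclusive) but is not the scheduler's actual code, and it ignores the rate/budget-based Rt class.

```python
import inspect

def classify_tick(tick, compute=False, on=None):
    """Pick an execution class name from a tick callable (illustrative only)."""
    is_async = inspect.iscoroutinefunction(tick)
    # At most one of the three signals may be set.
    if sum([is_async, bool(compute), on is not None]) > 1:
        raise ValueError(
            "compute=True, on=..., and async def tick are mutually exclusive"
        )
    if is_async:
        return "AsyncIo"
    if compute:
        return "Compute"
    if on is not None:
        return "Event"
    return "BestEffort"

async def fetch(node):
    pass

def read_sensor(node):
    pass

async_class = classify_tick(fetch)
sync_class = classify_tick(read_sensor)
```

Because the check inspects the function object itself, no call is made and no coroutine is created during classification.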
**Why a separate I/O thread pool instead of running async on the main tick thread?** The main tick thread runs all sync nodes with deterministic timing. If an async node's `await` blocked the main thread, it would delay every other node. Running async nodes on a dedicated thread pool isolates I/O latency from control loop timing. **Why no `await` for `node.send()` and `node.recv()`?** Topic operations use lock-free shared memory and complete in microseconds. Making them async would add event loop overhead for no benefit. Keep topic I/O synchronous even inside async tick functions. **Mixing sync and async:** The scheduler treats async nodes as AsyncIo execution class, which runs on the I/O thread pool alongside Rust `.async_io()` nodes. Sync and async nodes communicate through the same topics with no special handling -- shared memory IPC is thread-safe by design. ## Error Handling When an async tick raises an exception, it flows through the same `FailurePolicy` as sync nodes -- there is no separate error path for async. `KeyboardInterrupt` is special-cased: it sets the scheduler's stop flag instead of propagating, so Ctrl+C triggers a clean shutdown rather than crashing. Use try/except inside your async tick to handle expected failures gracefully: ```python # simplified import horus import aiohttp async def resilient_tick(node): try: async with aiohttp.ClientSession() as session: async with session.get("http://api.example.com/data") as resp: data = await resp.json() node.send("api_data", data) except aiohttp.ClientError as e: node.log_error(f"HTTP request failed: {e}") except Exception as e: node.log_error(f"Unexpected error: {e}") node = horus.Node( name="resilient_fetcher", tick=resilient_tick, rate=1, pubs=["api_data"], ) horus.run(node) ``` ## Cancellation Behavior When the scheduler stops (Ctrl+C, duration expires), pending awaits complete before `shutdown` runs -- they are **not** cancelled. This means a hanging `await` will block shutdown indefinitely. 
Always use timeouts on network requests and any other awaits that could hang. If your async tick makes an HTTP call without a timeout and the server never responds, the entire scheduler will hang on shutdown waiting for that tick to finish. ## Timeout Pattern Wrap the whole request, including reading the response body, in `asyncio.wait_for()` to guarantee bounded execution time: ```python # simplified import asyncio import aiohttp async def fetch_json(url): async with aiohttp.ClientSession() as session: async with session.get(url) as resp: return await resp.json() async def safe_tick(node): try: data = await asyncio.wait_for(fetch_json("http://api.example.com/data"), timeout=2.0) node.send("api_data", data) except asyncio.TimeoutError: node.log_warning("API timeout — skipping this tick") except Exception as e: node.log_error(f"API error: {e}") ``` ## Async File I/O Use `aiofiles` for non-blocking log writing: ```python # simplified import horus import aiofiles log_file = None async def logger_init(node): global log_file log_file = await aiofiles.open("sensor_log.csv", "w") await log_file.write("timestamp,value\n") async def logger_tick(node): msg = node.recv("sensor") if msg: await log_file.write(f"{horus.timestamp_ns()},{msg.get('value', 0)}\n") async def logger_shutdown(node): await log_file.close() node.log_info("Log file closed") logger = horus.Node( name="logger", init=logger_init, tick=logger_tick, shutdown=logger_shutdown, rate=100, subs=["sensor"], ) ``` --- ## Testing Async Nodes `tick_once()` works with async nodes — the scheduler runs the async event loop internally: ```python # simplified sched = horus.Scheduler(tick_rate=10, deterministic=True) sched.add(horus.Node(name="fetcher", tick=async_tick, rate=1, pubs=["data"])) # tick_once() handles async nodes transparently for _ in range(5): sched.tick_once() ``` --- ## Common Errors | Error | Cause | Fix | |-------|-------|-----| | Node hangs on shutdown | await without timeout | Use `asyncio.wait_for()` | | `compute=True` with `async def` | Mutually exclusive | Remove `compute=True` or make tick
synchronous | | `on="topic"` with `async def` | Mutually exclusive | Remove `on=` or make tick synchronous | | Low throughput | GIL + await overhead | Reduce rate, use batch processing | ## See Also - [Python Bindings](/python/api/python-bindings) — Core Python API - [ML Utilities](/python/library/ml-utilities) — ML inference helpers - [Examples](/python/examples) — More Python examples - [Custom Messages](/python/api/custom-messages) — Define your own message types --- ## Scheduler API Path: /python/api/scheduler Description: Python Scheduler class — node orchestration, tick-rate control, real-time scheduling, watchdogs, recording, and the run() convenience function # Scheduler API The `Scheduler` orchestrates node execution with tick-rate control, real-time scheduling, watchdogs, and recording. ```python # simplified import horus sched = horus.Scheduler(tick_rate=100, rt=True) sched.add(my_node) sched.run() ``` --- ## Constructor ```python # simplified horus.Scheduler( *, # keyword-only tick_rate=1000.0, # Global tick rate in Hz rt=False, # Enable RT scheduling (memory locking, SCHED_FIFO) deterministic=False, # SimClock, fixed dt, seeded RNG blackbox_mb=0, # Flight recorder size (0 = disabled) watchdog_ms=0, # Global watchdog timeout (0 = disabled) recording=False, # Enable session recording name=None, # Scheduler name for logging cores=None, # CPU affinity list (e.g., [0, 1]) max_deadline_misses=None, # Escalation threshold verbose=False, # Debug logging telemetry=None, # Telemetry endpoint URL ) ``` --- ## Lifecycle Methods | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `add` | `add(node: Node) -> Scheduler` | `self` (chaining) | Register a node | | `run` | `run(duration: float = None)` | — | Start tick loop. 
`None` = run forever | | `stop` | `stop()` | — | Signal graceful shutdown | | `is_running` | `is_running() -> bool` | `bool` | Check if scheduler is running | | `status` | `status() -> str` | `"idle"`, `"running"`, `"stopped"` | Current state | | `current_tick` | `current_tick() -> int` | Tick count | Current tick number | | `scheduler_name` | `scheduler_name() -> str` | Name | Scheduler name for logging | ### add() ```python # simplified sched.add(node) -> Scheduler # Returns self for chaining ``` Registers a node with the scheduler. Returns `self` for chaining: `sched.add(a).add(b).add(c)`. **Edge cases:** - Duplicate `name` raises an error — node names must be unique - Can be called before `run()` only — adding nodes during `run()` is not supported - Does NOT call `init()` — init happens lazily on first `run()` or `tick_once()` ### run() ```python # simplified sched.run(duration: float = None) ``` Start the tick loop. **Blocks** until completion. - `duration=None` — run forever (until Ctrl+C, SIGTERM, or `request_stop()`) - `duration=10.0` — run for 10 seconds, then return - **GIL is released** during the Rust scheduler loop — other Python threads run freely - **GIL is re-acquired** only when calling Python tick/init/shutdown callbacks (~500ns per acquire) - Ctrl+C triggers graceful shutdown: all nodes get `shutdown()` called ### stop() ```python # simplified sched.stop() ``` Signal graceful shutdown from another thread or from within a node's `request_stop()`. --- ## Single-Tick Execution (Testing & Simulation) | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `tick_once` | `tick_once(node_names: list = None)` | — | Execute one tick cycle (lazy init on first call) | | `tick_for` | `tick_for(duration: float, node_names: list = None)` | — | Run tick loop for a duration, then return | ### tick_once() ```python # simplified sched.tick_once(node_names: list = None) ``` Execute exactly one tick cycle and return. 
**Lazy init**: on first call, `init()` is called on all nodes. - `node_names=None` — tick all nodes in order - `node_names=["sensor", "controller"]` — tick only named nodes (skip others) - Each call advances the tick counter by 1 - In deterministic mode, `horus.dt()` returns fixed `1/rate` per tick - Async nodes are handled transparently (async event loop runs internally) **Edge cases:** - Calling before `add()` does nothing (no nodes to tick) - Filtered `node_names` that don't exist are silently ignored - If a node's `init()` fails, `tick_once()` raises based on `failure_policy` ### tick_for() ```python # simplified sched.tick_for(duration: float, node_names: list = None) ``` Run the tick loop for `duration` seconds, then return. Useful for bounded test runs: ```python # simplified sched.tick_for(1.0) # Run for 1 second at tick_rate, then return ``` ### Example: Testing with tick_once() Step through ticks manually for unit testing and simulation: ```python # simplified import horus results = [] def sensor_tick(node): node.send("temp", {"value": 25.0 + horus.tick() * 0.1}) def logger_tick(node): msg = node.recv("temp") if msg: results.append(msg["value"]) sensor = horus.Node(name="sensor", pubs=["temp"], tick=sensor_tick, rate=100, order=0) logger = horus.Node(name="logger", subs=["temp"], tick=logger_tick, rate=100, order=1) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(sensor) sched.add(logger) # Step through 5 ticks for _ in range(5): sched.tick_once() assert len(results) == 5 assert results[0] == 25.0 print(f"Passed: {results}") ``` --- ## Runtime Mutation | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `set_node_rate` | `set_node_rate(name: str, rate: float)` | — | Change a node's tick rate at runtime | | `set_tick_budget` | `set_tick_budget(name: str, budget_us: int)` | — | Change a node's tick budget at runtime (microseconds) | | `add_critical_node` | `add_critical_node(name: str, 
timeout_ms: int)` | — | Mark node as safety-critical with watchdog timeout | | `remove_node` | `remove_node(name: str) -> bool` | `bool` | Exclude a node from stats and queries (node still ticks until next restart) | ### Example: Runtime Safety Configuration Mark safety-critical nodes and adjust budgets at runtime: ```python # simplified import horus def motor_tick(node): cmd = node.recv("cmd_vel") if cmd: node.send("motor_cmd", {"rpm": cmd.linear * 100}) motor = horus.Node( name="motor", subs=[horus.CmdVel], pubs=["motor_cmd"], tick=motor_tick, rate=1000, budget=300 * horus.us, on_miss="safe_mode", ) sched = horus.Scheduler(tick_rate=1000, watchdog_ms=500) sched.add(motor) # Mark motor as critical — triggers enter_safe_state() on all nodes if motor # exceeds 500ms without ticking sched.add_critical_node("motor", timeout_ms=500) # Adjust budget at runtime (e.g., after profiling shows headroom) sched.set_tick_budget("motor", 200) # tighten to 200μs # Check RT capabilities if sched.has_full_rt(): print("Full RT: memory locked, SCHED_FIFO active") else: for d in sched.degradations(): print(f"RT degradation: {d['feature']} — {d['reason']}") sched.run() # After run, inspect safety stats stats = sched.safety_stats() if stats: print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Watchdog expirations: {stats.get('watchdog_expirations', 0)}") ``` --- ## Introspection | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `get_node_stats` | `get_node_stats(name: str) -> dict` | Metrics dict | Get node performance stats | | `get_all_nodes` | `get_all_nodes() -> list` | Node info list | Get all registered nodes | | `get_node_names` | `get_node_names() -> list[str]` | Name list | Get all node names | | `get_node_count` | `get_node_count() -> int` | Count | Number of registered nodes | | `has_node` | `has_node(name: str) -> bool` | `bool` | Check if a node exists | | `get_node_info` | `get_node_info(name: str) -> 
Optional[int]` | Order or `None` | Get execution order of a node | ### Example: Introspection at Runtime ```python # simplified sched = horus.Scheduler(tick_rate=100, name="my_robot") sched.add(sensor) sched.add(controller) # After starting (e.g., in a monitoring thread) print(f"Scheduler: {sched.scheduler_name()}") print(f"Status: {sched.status()}") print(f"Nodes: {sched.get_node_names()}") print(f"Count: {sched.get_node_count()}") print(f"Has motor? {sched.has_node('motor')}") # Per-node stats for name in sched.get_node_names(): stats = sched.get_node_stats(name) print(f" {name}: {stats['total_ticks']} ticks, avg {stats.get('avg_tick_duration_ms', 0):.2f}ms") ``` --- ## RT & Safety | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `capabilities` | `capabilities() -> dict` | Capability dict | RT support, CPU features | | `has_full_rt` | `has_full_rt() -> bool` | `bool` | Full RT capabilities available? | | `degradations` | `degradations() -> list[dict]` | Degradation list | RT features requested but unavailable (dicts with `feature`, `reason`, `severity` keys) | | `safety_stats` | `safety_stats() -> dict` | Stats dict | Watchdog stats, deadline misses, health states | --- ## Recording & Replay | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `is_recording` | `is_recording() -> bool` | `bool` | Session recording active? | | `is_replaying` | `is_replaying() -> bool` | `bool` | Session replay active? 
| | `stop_recording` | `stop_recording() -> list[str]` | File paths | Stop recording, return session files | | `list_recordings` | `list_recordings() -> list[str]` | Session list | List available recordings | | `delete_recording` | `delete_recording(name: str)` | `None` | Delete a recorded session | ### Example: Recording and Replay ```python # simplified import horus def sensor_tick(node): node.send("imu", horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81)) sensor = horus.Node(name="imu", pubs=[horus.Imu], tick=sensor_tick, rate=100) # Record a 5-second session sched = horus.Scheduler(tick_rate=100, recording=True) sched.add(sensor) sched.run(duration=5.0) # Stop recording and get file paths files = sched.stop_recording() print(f"Recorded to: {files}") # List all recordings for rec in sched.list_recordings(): print(f" Session: {rec}") # Delete a recording sched.delete_recording(files[0]) ``` --- ## Context Manager ```python # simplified with horus.Scheduler(tick_rate=100) as sched: sched.add(sensor_node) sched.add(controller_node) sched.run(duration=10.0) # Run for 10 seconds # sched.stop() called automatically on exit ``` --- ## Deterministic Mode Deterministic mode uses SimClock (fixed dt), seeded RNG, and sequential execution — every run produces identical output: ```python # simplified import horus outputs = [] def physics_tick(node): # horus.dt() returns fixed 1/rate in deterministic mode # horus.rng_float() returns tick-seeded values (reproducible) noise = horus.rng_float() * 0.01 position = horus.dt() * 10.0 + noise outputs.append(position) node = horus.Node(name="physics", tick=physics_tick, rate=100) # Run 1 sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(node) sched.run(duration=1.0) run1 = outputs.copy() # Run 2 — identical output outputs.clear() sched2 = horus.Scheduler(tick_rate=100, deterministic=True) sched2.add(horus.Node(name="physics", tick=physics_tick, rate=100)) sched2.run(duration=1.0) run2 = outputs.copy() assert run1 == 
run2, "Deterministic mode guarantees identical output" ``` --- ## Multi-Node System ```python # simplified import horus def sensor_tick(node): reading = horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81) node.send("imu", reading) def controller_tick(node): imu = node.recv("imu") if imu: cmd = horus.CmdVel(linear=0.5 if imu.accel_z > 9.0 else 0.0, angular=0.0) node.send("cmd_vel", cmd) sensor = horus.Node(name="imu_sensor", pubs=[horus.Imu], tick=sensor_tick, rate=100, order=0) controller = horus.Node(name="nav", subs=[horus.Imu], pubs=[horus.CmdVel], tick=controller_tick, rate=50, order=1) sched = horus.Scheduler(tick_rate=100, watchdog_ms=500) sched.add(sensor) sched.add(controller) sched.run() ``` --- ## run() Convenience Function One-liner — creates a Scheduler, adds all nodes, and runs: ```python # simplified horus.run( *nodes, # Node instances to run duration=None, # Seconds to run (None = forever) tick_rate=1000.0, # Global tick rate rt=False, # RT scheduling deterministic=False, # Deterministic mode watchdog_ms=0, # Watchdog timeout blackbox_mb=0, # Flight recorder recording=False, # Session recording name=None, # Scheduler name cores=None, # CPU affinity max_deadline_misses=None, # Miss threshold verbose=False, # Debug logging telemetry=None, # Telemetry endpoint ) ``` Equivalent to: ```python # simplified sched = horus.Scheduler(tick_rate=tick_rate, rt=rt, ...) 
for node in nodes: sched.add(node) sched.run(duration=duration) ``` --- ## Execution Classes The scheduler automatically classifies each node into an execution class based on its configuration: | Node Configuration | Execution Class | Thread | Timing | |-------------------|----------------|--------|--------| | `rate=100, budget=X` or `deadline=X` | **Rt** (auto-detected) | Dedicated RT thread (SCHED_FIFO if available) | Strict budget/deadline enforcement | | `compute=True` | **Compute** | Worker thread pool | No timing guarantee | | `on="topic_name"` | **Event** | Event-triggered | Tick only when topic has data | | `async def tick` | **AsyncIo** | Async I/O thread pool | Async event loop scheduling | | None of the above | **BestEffort** | Main tick thread | Best-effort timing | **RT auto-detection**: Setting `rate` with `budget` or `deadline` automatically classifies the node as RT — no explicit flag needed on the node. The scheduler's `rt=True` enables the RT runtime (memory locking, SCHED_FIFO); individual nodes opt in via timing constraints. **Mutual exclusion**: `compute=True`, `on="topic"`, and `async def tick` are mutually exclusive. Combining them raises an error. --- ## Design Decisions **Why keyword-only constructor?** All parameters are keyword-only (`*` in the signature) to prevent positional argument mistakes. `Scheduler(100, True)` is ambiguous — is 100 the tick rate or watchdog timeout? `Scheduler(tick_rate=100, rt=True)` is unambiguous. **Why `run()` blocks?** The scheduler's tick loop must own the execution thread for deterministic timing. Returning a future or running in a background thread would add jitter. For concurrent Python work (HTTP server, monitoring), use `async def` nodes or Python threads — the GIL is released during `run()`. **Why `rt=True` on the scheduler, not per-node?** RT requires system-level setup (memory locking, SCHED_FIFO). This is a process-level decision, not per-node. 
Individual nodes opt into RT timing via `budget`/`deadline` — the scheduler handles the runtime. **Why `tick_once()` for testing?** Unit tests need deterministic, step-by-step execution. `tick_once()` executes one complete tick cycle (all nodes in order) and returns. This makes tests reproducible without timers or sleeps. **Why release the GIL during `run()`?** The tick loop is Rust code — no Python objects are accessed between callbacks. Releasing the GIL lets other Python threads (Flask server, monitoring, logging) run concurrently. The GIL is re-acquired only for `tick()`/`init()`/`shutdown()` callbacks (~500ns per acquire). --- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | `run()` blocks | Deterministic timing, no jitter | Must use threads for concurrent Python work | | GIL released during run | Other Python threads work | ~500ns GIL re-acquire per tick per node | | Keyword-only constructor | No positional argument mistakes | More typing than positional args | | `tick_once()` for testing | Step-by-step deterministic tests | Must manually loop for multi-tick scenarios | | Auto RT classification | No manual RT flags | Must understand budget/deadline → RT mapping | | Context manager support | Clean resource cleanup | Extra indentation | --- ## See Also - [Node API](/python/api/node) — Node constructor, methods, lifecycle - [Clock API](/python/api/clock) — Framework clock functions - [Deterministic Mode](/advanced/deterministic-mode) — Full deterministic mode guide - [Safety Monitor](/advanced/safety-monitor) — Graduated degradation - [Record & Replay](/advanced/record-replay) — Session recording guide - [Scheduler Deep-Dive](/python/scheduler-guide) — Python scheduler patterns - [Rust Scheduler API](/rust/api/scheduler) — Rust equivalent --- ## Sensor Messages Path: /python/messages/sensor Description: LiDAR, IMU, GPS, battery, and environmental sensor types for Python robotics — every method explained # Sensor Messages Sensor messages 
carry raw and processed data from your robot's sensors — LiDAR scans, IMU readings, GPS positions, battery levels, and more. These are the inputs to every robotics pipeline: perception, navigation, monitoring. **When you need these:** Every robot has sensors. If it moves, you need `Odometry`. If it has a lidar, you need `LaserScan`. If it has an IMU, you need `Imu`. If it runs on battery, you need `BatteryState`. ```python # simplified from horus import ( LaserScan, Imu, Odometry, JointState, BatteryState, NavSatFix, RangeSensor, MagneticField, Temperature, FluidPressure, Illuminance, Clock, TimeReference, ) ``` --- ## LaserScan 2D LiDAR scan — an array of range readings at known angles. The most common sensor for obstacle avoidance in mobile robotics. ### Constructor ```python # simplified import math scan = LaserScan( angle_min=-math.pi, # Start angle (radians) angle_max=math.pi, # End angle (radians) range_min=0.1, # Minimum valid range (meters) range_max=10.0, # Maximum valid range (meters) ranges=[1.0, 1.5, 2.0], # Range readings ) ``` ### `.angle_at(index)` — Angle for a Range Reading ```python # simplified angle = scan.angle_at(0) # Angle of the first reading angle = scan.angle_at(100) # Angle of the 101st reading ``` Computes the angle (in radians) for a given index in the `ranges` array. The formula is: `angle_min + index * angle_increment`. You need this to convert polar (angle, range) readings to cartesian (x, y) coordinates: ```python # simplified import math for i in range(len(scan.ranges)): if scan.is_range_valid(i): angle = scan.angle_at(i) r = scan.ranges[i] x = r * math.cos(angle) # In robot's frame y = r * math.sin(angle) ``` ### `.is_range_valid(index)` — Filter Bad Readings ```python # simplified if scan.is_range_valid(5): print(f"Valid: {scan.ranges[5]:.2f} m") else: print("Invalid: too close, too far, or NaN") ``` Returns `True` if the range at the given index falls within `[range_min, range_max]`. 
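To make the validity rule and the angle formula concrete, here is a plain-Python sketch that does not depend on HORUS. The helper names (`angle_at`, `is_range_valid`, `to_cartesian`) are hypothetical stand-ins for the methods described above. Deriving `angle_increment` from the scan limits is an assumption; the real message may store it as a separate field.

```python
import math

def angle_at(angle_min, angle_max, num_ranges, index):
    """Angle (radians) of reading `index`, assuming evenly spaced beams."""
    # Assumption: the increment is derived from the scan limits.
    if num_ranges < 2:
        return angle_min
    angle_increment = (angle_max - angle_min) / (num_ranges - 1)
    return angle_min + index * angle_increment

def is_range_valid(r, range_min, range_max):
    """True when r is finite and lies within [range_min, range_max]."""
    return math.isfinite(r) and range_min <= r <= range_max

def to_cartesian(ranges, angle_min, angle_max, range_min, range_max):
    """Convert valid polar readings to (x, y) points in the sensor frame."""
    points = []
    for i, r in enumerate(ranges):
        if not is_range_valid(r, range_min, range_max):
            continue  # Skip inf, NaN, and out-of-range readings
        a = angle_at(angle_min, angle_max, len(ranges), i)
        points.append((r * math.cos(a), r * math.sin(a)))
    return points

# Four beams across a 180-degree sweep; the middle two readings are invalid.
points = to_cartesian([1.0, float("inf"), 0.0, 2.0],
                      -math.pi / 2, math.pi / 2, 0.1, 10.0)
print(len(points))
```

Only the first and last readings survive the filter, so the obstacle map never sees the `inf` or `0.0` values.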
LiDAR sensors return invalid readings for many reasons — reflective surfaces, transparent objects, direct sunlight, readings below minimum range (too close to the sensor), or at maximum range (nothing detected). Always filter with this before using a range value. > **Common mistake:** Iterating `scan.ranges` without checking validity. Invalid readings are often 0.0 or `inf`, which will corrupt your obstacle map if not filtered. ### `.min_range()` — Closest Valid Object ```python # simplified closest = scan.min_range() if closest is not None and closest < 0.5: print("DANGER: object at {:.2f} m!".format(closest)) ``` Returns the minimum **valid** range reading, or `None` if all readings are invalid. This is the fastest way to check for close obstacles — one call instead of iterating the entire array. ### `len(scan)` — Valid Reading Count ```python # simplified print(f"{len(scan)} valid readings out of {len(scan.ranges)} total") ``` **Example — LiDAR Obstacle Detection:** ```python # simplified from horus import Node, run, LaserScan, CmdVel, Topic import math scan_topic = Topic(LaserScan) cmd_topic = Topic(CmdVel) def avoid_obstacles(node): scan = scan_topic.recv(node) if scan is None: return closest = scan.min_range() if closest is not None and closest < 0.5: # Too close — stop and turn cmd_topic.send(CmdVel(linear=0.0, angular=0.5), node) else: # Clear path — drive forward cmd_topic.send(CmdVel(linear=0.3, angular=0.0), node) run(Node(tick=avoid_obstacles, rate=10, pubs=["cmd_vel"], subs=["scan"])) ``` --- ## Imu Inertial Measurement Unit — accelerometer (linear acceleration) + gyroscope (angular velocity) + optional orientation quaternion. 
### Constructor ```python # simplified imu = Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81, gyro_x=0.0, gyro_y=0.0, gyro_z=0.1) ``` ### `.set_orientation_from_euler(roll, pitch, yaw)` — Set Orientation ```python # simplified import math imu.set_orientation_from_euler(roll=0.0, pitch=0.05, yaw=math.pi/4) ``` Converts Euler angles (radians) to the internal quaternion representation and stores it. Use this when you have orientation estimates from a sensor fusion algorithm (complementary filter, Madgwick, EKF) and want to publish them with the IMU data. - **roll**: tilt left/right (positive = right side down) - **pitch**: tilt forward/backward (positive = nose down) - **yaw**: compass heading (positive = counterclockwise from above) > **Common mistake:** Publishing orientation without actually computing it. If your IMU only has a 6-axis accelerometer+gyroscope (no magnetometer or fusion), don't set orientation — call `has_orientation()` to check, and let downstream nodes know the orientation field is empty. ### `.has_orientation()` — Is Orientation Data Present? ```python # simplified if imu.has_orientation(): # Safe to use orientation fields print("Orientation available") else: # Only acceleration and angular velocity are valid print("Raw IMU — no orientation estimate") ``` Not all IMUs provide orientation. A raw 6-axis chip gives only acceleration and angular velocity — orientation must be computed by a sensor fusion node. This method checks whether the orientation quaternion has been set (is non-identity). 
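The Euler-to-quaternion conversion that `set_orientation_from_euler` performs can be sketched in plain Python. This is a minimal illustration that assumes the common ZYX (yaw, then pitch, then roll) convention; the helper name `euler_to_quaternion` is hypothetical, and the convention HORUS uses internally may differ.

```python
import math

def euler_to_quaternion(roll, pitch, yaw):
    """Return (w, x, y, z) for ZYX Euler angles given in radians."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# Same angles as the set_orientation_from_euler example above
q = euler_to_quaternion(roll=0.0, pitch=0.05, yaw=math.pi / 4)
norm = math.sqrt(sum(c * c for c in q))
print(f"quaternion={q}, norm={norm:.6f}")
```

All-zero angles map to the identity quaternion `(1, 0, 0, 0)`. If `has_orientation()` really is a non-identity check, as described above, a perfectly level and unrotated robot could look the same as "orientation not set", so it may be worth confirming that behavior before relying on it.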
### `.is_valid()` — Validation ```python # simplified if not imu.is_valid(): print("Bad IMU reading — NaN or inf detected") ``` ### `.angular_velocity_vec()` — Gyroscope as Vector3 ```python # simplified gyro = imu.angular_velocity_vec() # Returns Vector3 print(f"Rotation rate: {gyro.magnitude():.3f} rad/s") if gyro.magnitude() > 2.0: print("Spinning too fast!") ``` Returns the gyroscope reading as a `Vector3`, giving you access to `magnitude()`, `dot()`, `cross()`, and other vector operations. Much more useful than raw `gyro_x, gyro_y, gyro_z` fields when you need to compute rotation rates or compare angular velocities. ### `.linear_acceleration_vec()` — Accelerometer as Vector3 ```python # simplified accel = imu.linear_acceleration_vec() gravity_removed = accel.magnitude() - 9.81 print(f"Dynamic acceleration: {gravity_removed:.2f} m/s²") ``` Same idea — returns a `Vector3` for the accelerometer. On a stationary robot, the magnitude should be ≈ 9.81 m/s² (gravity). Deviations indicate dynamic acceleration (movement, vibration, impact). **Example — Tilt Detection:** ```python # simplified from horus import Imu, Topic import math imu_topic = Topic(Imu) def check_tilt(node): imu = imu_topic.recv(node) if imu is None or not imu.is_valid(): return accel = imu.linear_acceleration_vec() # Pitch angle from accelerometer (simplified, assumes no dynamic motion) pitch = math.atan2(accel.x, math.sqrt(accel.y**2 + accel.z**2)) if abs(pitch) > math.radians(15): print(f"WARNING: Robot tilted {math.degrees(pitch):.1f}° — risk of tipping!") ``` --- ## Odometry Robot odometry — 2D pose + velocity, typically published by wheel encoders or visual odometry systems. ### Constructor ```python # simplified odom = Odometry(x=1.0, y=2.0, theta=0.5, linear_velocity=0.5, angular_velocity=0.1) ``` ### `.set_frames(frame, child_frame)` — Set Coordinate Frames ```python # simplified odom.set_frames("odom", "base_link") ``` Sets the parent and child frame names. 
By convention: - `frame` = "odom" (the fixed odometry frame) - `child_frame` = "base_link" (the robot's body frame) These names are used by the TransformFrame system to build the transform tree. ### `.update(pose, twist)` — Update from New Measurements ```python # simplified from horus import Pose2D, Twist new_pose = Pose2D(x=1.1, y=2.05, theta=0.52) new_twist = Twist.new_2d(linear_x=0.5, angular_z=0.1) odom.update(new_pose, new_twist) ``` Replaces the pose and twist with new measurements. Use this in a sensor fusion node where you compute updated estimates each tick. ### `.is_valid()` — Validation ```python # simplified if not odom.is_valid(): print("Odometry contains NaN — encoder fault?") ``` --- ## JointState Multi-joint state for robot arms and legged robots. Stores up to 16 joint names, positions, velocities, and efforts. The key feature is **by-name lookup** — query individual joints without iterating arrays. ### Constructor ```python # simplified js = JointState( names=["shoulder", "elbow", "wrist"], positions=[1.57, 0.785, 0.0], velocities=[0.0, 0.1, 0.0], efforts=[5.0, 2.0, 0.5], ) ``` ### `.position(name)` — Look Up Joint Position ```python # simplified shoulder_angle = js.position("shoulder") # 1.57 gripper_angle = js.position("gripper") # None (not found) ``` Returns the position of the named joint, or `None` if that joint doesn't exist in this message. Positions are typically in radians for revolute joints or meters for prismatic joints. ### `.velocity(name)` — Look Up Joint Velocity ```python # simplified elbow_speed = js.velocity("elbow") # 0.1 rad/s ``` ### `.effort(name)` — Look Up Joint Effort/Torque ```python # simplified wrist_torque = js.effort("wrist") # 0.5 Nm ``` > **Common mistake:** Expecting the same joint names in `JointState` and `JointCommand`. If the publisher uses "joint_1" but the subscriber expects "shoulder", the by-name lookup returns `None`. Standardize joint names across your system. 
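A startup check can catch joint-name mismatches before they show up as silent `None` lookups. The sketch below is plain Python with a hypothetical helper, `missing_joint_names`; in a real node you would presumably pass `js.names` from the first received message as `published`.

```python
def missing_joint_names(expected, published):
    """Return the expected joint names that the publisher does not provide."""
    published_set = set(published)
    return [name for name in expected if name not in published_set]

# Hypothetical names: the publisher and subscriber disagree on two joints
published = ["joint_1", "joint_2", "wrist"]
expected = ["shoulder", "elbow", "wrist"]

missing = missing_joint_names(expected, published)
if missing:
    print(f"Joint name mismatch, not published: {missing}")
```

Running the check once at startup, rather than on every tick, keeps the cost out of the control loop.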
**Example — Arm Joint Monitor:** ```python # simplified from horus import JointState, Topic js_topic = Topic(JointState) def monitor_arm(node): js = js_topic.recv(node) if js is None: return for name in js.names: eff = js.effort(name) if eff is not None and abs(eff) > 10.0: print(f"WARNING: {name} effort {eff:.1f} Nm exceeds limit!") ``` --- ## BatteryState Battery monitoring with threshold checks and time estimation. ### Constructor ```python # simplified battery = BatteryState(voltage=11.1, percentage=15.0, current=-2.5) ``` ### `.is_critical()` — Below 10% ```python # simplified if battery.is_critical(): print("CRITICAL — land immediately!") ``` Hardcoded threshold at 10%. Intended for "land now" or "return to dock immediately" decisions. ### `.is_low(threshold)` — Below Custom Threshold ```python # simplified if battery.is_low(20.0): print("Low battery — start heading home") ``` Check against your own threshold. Typical values: 20-30% for "start returning", 10% for critical. ### `.time_remaining()` — Estimated Remaining Time ```python # simplified remaining = battery.time_remaining() if remaining is not None: print(f"~{remaining/60:.0f} minutes remaining") else: print("Not discharging (charging or current=0)") ``` Returns estimated seconds of battery life based on current draw, or `None` if the battery isn't discharging (current ≥ 0, meaning it is charging or idle). This is a simple linear estimate — actual runtime depends on load variations. > **Common mistake:** Treating `None` as "infinite battery". `None` means the estimate isn't available — the battery might not be discharging, or the current sensor might not be connected. --- ## NavSatFix GPS/GNSS position with fix status and distance calculation. ### `.from_coordinates(lat, lon, alt)` — From GPS Coordinates ```python # simplified pos = NavSatFix.from_coordinates(lat=37.7749, lon=-122.4194, alt=10.0) ``` Factory method that creates a fix with STATUS_FIX and sets the coordinates. 
Simpler than setting fields individually. ### `.has_fix()` — Is GPS Valid? ```python # simplified if not gps.has_fix(): print("No GPS fix — using dead reckoning only") ``` Returns `True` when the receiver has a valid position fix. Without a fix, latitude/longitude are meaningless. ### `.distance_to(other)` — Great-Circle Distance ```python # simplified home = NavSatFix.from_coordinates(lat=37.7750, lon=-122.4195, alt=10.0) current = NavSatFix.from_coordinates(lat=37.7760, lon=-122.4180, alt=10.0) dist = current.distance_to(home) print(f"Distance to home: {dist:.1f} m") ``` Computes the great-circle distance between two GPS positions using the Haversine formula. Returns meters. This accounts for Earth's curvature and works at any scale, from centimeters to thousands of kilometers. Because Haversine models the Earth as a sphere, expect errors of up to roughly 0.5% over long distances. > **Common mistake:** Using Euclidean distance on latitude/longitude values. `distance = sqrt((lat1-lat2)² + (lon1-lon2)²)` is **wrong**: the result is in degrees rather than meters, and it ignores both Earth's curvature and the fact that longitude degrees get shorter near the poles. Always use `distance_to()`. ### `.horizontal_accuracy()` / `.is_valid()` ```python # simplified print(f"Accuracy: ±{gps.horizontal_accuracy():.1f} m") print(gps.is_valid()) ``` --- ## Clock Time source for wall-clock, simulation, and replay modes. The factory methods create the right clock type for your use case. ### `.wall_clock()` — Real Time ```python # simplified clk = Clock.wall_clock() ``` Production time — actual wall-clock nanoseconds. Use this for logging, timestamps, and real-world operation. ### `.sim_time(sim_ns, speed)` — Simulation Time ```python # simplified clk = Clock.sim_time(sim_ns=0, speed=2.0) # Start at t=0, 2x speed ``` Controlled time for simulation. The scheduler advances sim_time deterministically, so replaying the same sequence produces identical results. Use this for testing and development. > **Common mistake:** Using `wall_clock()` during simulation. 
Wall time advances independently of the simulation — your node will process data at the wrong rate. Always use `sim_time()` in simulation mode. ### `.replay_time(replay_ns, speed)` — Replay Time ```python # simplified clk = Clock.replay_time(replay_ns=1000000000, speed=1.0) ``` Playback from a recorded session. Time advances at `speed` multiplier (1.0 = real-time, 0.5 = half-speed). ### `.elapsed_since(earlier)` — Time Delta ```python # simplified start = Clock.wall_clock() # ... do work ... end = Clock.wall_clock() elapsed_ns = end.elapsed_since(start) print(f"Took {elapsed_ns / 1e6:.1f} ms") ``` Nanoseconds elapsed since another clock reading. Use this for profiling and timing measurements. ### `.is_paused()` — Check Pause State ```python # simplified if clk.is_paused(): print("Simulation paused") ``` --- ## TimeReference External time source for synchronization (GPS time, NTP offset). ### `.correct_timestamp(local_ns)` — Apply Time Offset ```python # simplified tref = TimeReference(time_ref_ns=1000000000, source="gps", offset_ns=-500) corrected = tref.correct_timestamp(local_ns=1000000500) print(tref.source) # "gps" ``` Applies the offset correction to a local timestamp: `corrected = local_ns + offset_ns`. Use this when your system clock differs from an external reference (GPS, NTP server). --- ## RangeSensor Single-point distance measurement from an ultrasonic, infrared, or single-beam LiDAR sensor. 
### Constructor ```python # simplified range_s = RangeSensor(range=2.5, min_range=0.02, max_range=10.0, field_of_view=0.1, radiation_type=0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `range` | `float` | Measured distance in meters | | `min_range` | `float` | Minimum detectable range (meters) | | `max_range` | `float` | Maximum detectable range (meters) | | `field_of_view` | `float` | Sensor beam width in radians | | `radiation_type` | `int` | 0 = ultrasonic, 1 = infrared | **Use case — Cliff detection for mobile robots:** ```python # simplified from horus import RangeSensor, CmdVel, Topic cliff_topic = Topic(RangeSensor) cmd_topic = Topic(CmdVel) def cliff_check(node): cliff = cliff_topic.recv(node) if cliff is None: return if cliff.range > 0.15: # Ground dropped away — cliff! cmd_topic.send(CmdVel(linear=-0.2, angular=0.0), node) # Back up ``` --- ## MagneticField Magnetometer reading for compass heading and magnetic anomaly detection. ### Constructor ```python # simplified mag = MagneticField(x=25e-6, y=-10e-6, z=45e-6) # Tesla (roughly 52 microtesla total) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `x` | `float` | Magnetic field strength along X axis (Tesla) | | `y` | `float` | Magnetic field strength along Y axis (Tesla) | | `z` | `float` | Magnetic field strength along Z axis (Tesla) | Earth's magnetic field is approximately 25-65 microtesla (0.000025-0.000065 T). Values significantly larger indicate nearby ferromagnetic objects or electromagnetic interference. ```python # simplified import math heading = math.atan2(mag.y, mag.x) # Compass heading (radians); assumes the sensor is held level ``` --- ## Temperature Temperature measurement from a thermocouple, thermistor, or integrated sensor. 
### Constructor ```python # simplified temp = Temperature(temperature=72.5, variance=0.1) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `temperature` | `float` | Temperature in degrees Celsius | | `variance` | `float` | Measurement variance (uncertainty squared) | **Use case — Motor thermal protection:** ```python # simplified if temp.temperature > 80.0: print("Motor overheating — reduce duty cycle!") ``` --- ## FluidPressure Atmospheric or fluid pressure measurement from a barometer or pressure transducer. ### Constructor ```python # simplified pressure = FluidPressure(pressure=101325.0, variance=10.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `pressure` | `float` | Pressure in Pascals (Pa) | | `variance` | `float` | Measurement variance | Standard atmospheric pressure at sea level is 101,325 Pa. Barometric altitude estimation uses the pressure-altitude relationship (~12 Pa per meter near sea level). --- ## Illuminance Light level measurement from an ambient light sensor. ### Constructor ```python # simplified lux = Illuminance(illuminance=500.0, variance=5.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `illuminance` | `float` | Light intensity in lux | | `variance` | `float` | Measurement variance | Reference values: 0.001 lux (moonless night), 50 lux (living room), 500 lux (office), 10,000-100,000 lux (direct sunlight). Use for adaptive camera exposure or deciding when to activate headlights. --- ## Design Decisions **Why separate `LaserScan` and `RangeSensor`?** `LaserScan` represents a full sweep of range readings at known angles — it is a *scan*, not a single measurement. `RangeSensor` is a single distance reading from an ultrasonic, infrared, or single-beam sensor. Different use cases: `LaserScan` drives mapping and SLAM; `RangeSensor` is for simple distance checks (cliff detection, fill-level measurement). 
**Why does `Imu` have optional orientation?** Raw IMU chips (6-axis accelerometer+gyroscope) cannot determine orientation without a fusion algorithm. Publishing zeroed orientation would mislead downstream nodes into thinking the robot is level when it might not be. The `has_orientation()` check makes the contract explicit: if it returns `False`, only raw accel/gyro are valid. **Why does `BatteryState` use a hardcoded 10% threshold for `is_critical()`?** Below 10%, most LiPo batteries risk permanent damage and voltage sag can cause brownouts. The threshold is conservative-by-design — if 10% is wrong for your battery chemistry, use `is_low(your_threshold)` instead. **Why nanosecond `timestamp_ns` on every message?** Sensor fusion requires sub-millisecond time correlation. An IMU at 1kHz produces readings every 1ms, so millisecond timestamps lose ordering information. Nanosecond precision is free (it is just a u64) and ensures any fusion algorithm has sufficient temporal resolution. --- ## See Also - [Geometry Messages](/python/messages/geometry) — Pose2D, Vector3, Quaternion - [Navigation Messages](/python/messages/navigation) — NavGoal, OccupancyGrid - [Diagnostics Messages](/python/messages/diagnostics) — BatteryState monitoring patterns - [Control Messages](/python/messages/control) — MotorCommand, JointCommand for actuators - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## Scheduler Deep-Dive (Python) Path: /python/scheduler-guide Description: Complete guide to the HORUS Python scheduler: execution model, real-time features, safety monitoring, deterministic mode, and production patterns # Scheduler Deep-Dive (Python) A warehouse AGV runs eight nodes: two LiDARs, a camera pipeline, a safety monitor, a path planner, motor controllers, a battery watcher, and a cloud uploader. The motor controller must tick every millisecond. The safety monitor must run *before* the motors every cycle. 
The path planner needs 40 ms of CPU time and must never block the motor controller. When the emergency stop fires, motors must halt before sensors disconnect. Coordinating all of this by hand --- threads, locks, timers, signal handlers --- is the single biggest source of bugs in robotics software. The HORUS scheduler handles it. You configure nodes, call `run()`, and the scheduler manages execution order, tick rates, deadline enforcement, safety monitoring, and graceful shutdown. ## The Execution Model ### How a Tick Works Every tick cycle, the scheduler does the following: 1. For each registered node (in `.order()` sequence): - Check the per-node rate limiter --- is this node due to tick? - If yes, start a timer and call the node's `tick()` function - Stop the timer. If the node exceeded its budget, apply its `on_miss` policy - Record metrics (tick duration, overruns, health state) 2. After all nodes are processed, sleep to maintain the global tick rate ``` Tick N Tick N+1 | | v v [ safety ] [ sensor ] [ motor ] [sleep] [ safety ] [ sensor ] ... order=0 order=10 order=20 order=0 order=10 ``` **Key details**: - BestEffort nodes (the default) execute **sequentially in order** on the main thread. RT, Compute, Event, and AsyncIo nodes run on their own threads and synchronize at tick boundaries. - Rate limiting is per-node: a 10 Hz node inside a 1 kHz scheduler only has `tick()` called every 100th cycle. - Budget enforcement happens *after* `tick()` returns --- the scheduler does not preempt mid-tick. It records the overrun and applies the miss policy. ### Initialization When you call `scheduler.run()` (or `horus.run()`), initialization happens lazily: 1. All pending node configurations are finalized (execution class inference, budget/deadline auto-derivation) 2. `init()` is called on every node, in `.order()` sequence 3. If a node's `init()` raises an exception, that node enters Error state and is excluded from ticking. Other nodes continue. 4. 
The main tick loop begins Lazy initialization means you can add and configure nodes in any order. The scheduler resolves everything at startup. ### Shutdown When the scheduler receives a stop signal (Ctrl+C, `scheduler.stop()`, or a node calling `node.request_stop()`): 1. The main loop exits 2. `shutdown()` is called on every node in **reverse order** --- last-added first 3. RT threads are given 3 seconds to exit; stalled threads are detached 4. Shared memory is cleaned up Reverse-order shutdown ensures dependent nodes stop before their dependencies. The motor controller (order 20) shuts down before the sensor (order 10) that feeds it. ## `horus.run()` --- The One-Liner For most programs, `horus.run()` is all you need: ```python # simplified import horus from horus import us def read_sensor(node): node.send("scan", {"ranges": [1.0, 2.0, 3.0]}) def navigate(node): if node.has_msg("scan"): scan = node.recv("scan") node.send("cmd", {"linear": 0.5, "angular": scan["ranges"][0]}) def drive(node): if node.has_msg("cmd"): cmd = node.recv("cmd") # Send to motor hardware node.log_info(f"Driving: linear={cmd['linear']}") sensor = horus.Node(name="sensor", tick=read_sensor, rate=10, order=0, pubs=["scan"]) ctrl = horus.Node(name="controller", tick=navigate, rate=30, order=10, subs=["scan"], pubs=["cmd"]) motor = horus.Node(name="motor", tick=drive, rate=1000, order=20, budget=300*us, subs=["cmd"]) horus.run(sensor, ctrl, motor, rt=True, watchdog_ms=500) ``` `horus.run()` creates a `Scheduler` behind the scenes, adds all nodes, and calls `run()`. Every scheduler-level parameter is available as a keyword argument: ```python # simplified horus.run( *nodes, duration=None, # seconds (None = forever) tick_rate=1000.0, # Hz rt=False, # SCHED_FIFO + mlockall deterministic=False, # SimClock blackbox_mb=0, # Flight recorder size watchdog_ms=0, # Frozen node detection recording=False, # Session recording name=None, # Scheduler name cores=None, # CPU affinity [0, 1, ...] 
max_deadline_misses=None, # Escalation threshold verbose=False, # Debug logging telemetry=None, # Endpoint URL ) ``` **When to use `horus.run()` vs `Scheduler`**: Use `horus.run()` when you create all nodes upfront and run until Ctrl+C (or a fixed duration). Use `Scheduler` when you need runtime mutation (adding/removing nodes, changing rates), `tick_once()` for testing, or the context manager pattern. ## The `Scheduler` Class ### Creating a Scheduler All configuration happens through keyword arguments: ```python # simplified from horus import Scheduler, Node, us, ms sched = Scheduler( tick_rate=1000.0, # 1 kHz global tick rate rt=True, # Enable real-time scheduling watchdog_ms=500, # 500 ms frozen-node detection blackbox_mb=16, # 16 MB flight recorder max_deadline_misses=50, # Emergency stop after 50 misses ) ``` ### Scheduler Parameters Reference | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `tick_rate` | `float` | `1000.0` | Global tick rate in Hz. Match to your fastest node | | `rt` | `bool` | `False` | Enable SCHED_FIFO scheduling and `mlockall` memory locking | | `deterministic` | `bool` | `False` | Enable SimClock (fixed `dt`, seeded RNG, reproducible results) | | `blackbox_mb` | `int` | `0` | Flight recorder buffer size in MB. 0 disables | | `watchdog_ms` | `int` | `0` | Watchdog timeout in milliseconds. 0 disables | | `recording` | `bool` | `False` | Enable session recording for replay | | `name` | `str` or `None` | `None` | Scheduler name for logging and diagnostics | | `cores` | `list[int]` or `None` | `None` | Pin scheduler to specific CPU cores | | `max_deadline_misses` | `int` or `None` | `None` | Emergency stop threshold. 
Default (in Rust): 100 | | `verbose` | `bool` | `False` | Enable verbose debug logging | | `telemetry` | `str` or `None` | `None` | Telemetry export endpoint URL | ### Adding Nodes ```python # simplified sched = Scheduler(tick_rate=1000, rt=True) sched.add(Node(name="safety", tick=safety_fn, rate=1000, order=0)) sched.add(Node(name="motor", tick=motor_fn, rate=1000, order=5, budget=300*us)) sched.add(Node(name="planner", tick=plan_fn, rate=50, order=50, compute=True)) sched.run() ``` `add()` returns `self`, so you can chain: ```python # simplified sched.add(sensor_node).add(ctrl_node).add(motor_node) ``` ### Running ```python # simplified # Run forever (until Ctrl+C or .stop()) sched.run() # Run for a fixed duration sched.run(duration=30.0) # 30 seconds ``` ### Context Manager The context manager calls `stop()` automatically on exit, ensuring clean shutdown even if an exception occurs: ```python # simplified with Scheduler(tick_rate=100, watchdog_ms=500) as sched: sched.add(Node(name="sensor", tick=read_sensor, rate=100, order=0)) sched.add(Node(name="logger", tick=log_data, rate=10, order=100)) sched.run(duration=60.0) # stop() called automatically here ``` This is the recommended pattern for production code. ## Node Scheduling Parameters Every scheduling parameter is set on the `Node` constructor. The scheduler reads them when you call `add()`. ### Parameter Reference | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `rate` | `float` | `30` | Node tick rate in Hz | | `order` | `int` | `100` | Execution order (lower = earlier). 0-9 critical, 10-49 high, 50-99 normal, 100-199 low, 200+ background | | `budget` | `float` or `None` | `None` | Max expected tick duration in **seconds**. `None` = auto (80% of period) | | `deadline` | `float` or `None` | `None` | Hard deadline in **seconds**. 
`None` = auto (95% of period) | | `on_miss` | `str` or `None` | `None` | Deadline miss policy: `"warn"`, `"skip"`, `"safe_mode"`, `"stop"` | | `failure_policy` | `str` or `None` | `None` | Error policy: `"fatal"`, `"restart"`, `"skip"`, `"ignore"` | | `compute` | `bool` | `False` | Run on thread pool (CPU-heavy work) | | `on` | `str` or `None` | `None` | Event-driven --- tick only when this topic receives a message | | `priority` | `int` or `None` | `None` | OS thread priority (SCHED_FIFO 1-99, requires `rt=True`) | | `core` | `int` or `None` | `None` | Pin to specific CPU core index | | `watchdog` | `float` or `None` | `None` | Per-node watchdog timeout in seconds (overrides global) | ### Unit Constants ```python # simplified from horus import us, ms us = 1e-6 # microseconds -> seconds ms = 1e-3 # milliseconds -> seconds # Examples budget=300 * us # 300 microseconds deadline=900 * us # 900 microseconds deadline=5 * ms # 5 milliseconds watchdog=500 * ms # 500 milliseconds (or just use watchdog_ms=500 on the Scheduler) ``` ### Execution Classes The scheduler assigns each node an execution class based on its configuration. 
You do not set this directly --- it is inferred: | Configuration | Assigned Class | Thread Model | Best For | |---------------|---------------|--------------|----------| | `rate`, `budget`, or `deadline` set | **Rt** | Dedicated thread, budget enforced | Motor control, safety, sensor fusion | | `compute=True` | **Compute** | Thread pool | Path planning, SLAM, image processing | | `on="topic.name"` | **Event** | Sleeps until message arrives | Emergency stop, command handlers | | `async def tick` | **AsyncIo** | Tokio runtime | HTTP, cloud upload, database | | None of the above | **BestEffort** | Sequential on main thread | Logging, telemetry, display | ### Budget and Deadline Auto-Derivation When you set `rate` on a node, the scheduler auto-derives timing constraints: - **Budget** = 80% of the period (e.g., 1000 Hz means a 1 ms period, so budget = 800 us) - **Deadline** = 95% of the period (e.g., 1000 Hz means deadline = 950 us) You can override either with explicit values: ```python # simplified # Auto-derived: rate=1000 -> budget=800us, deadline=950us auto_node = Node(name="auto", tick=fn, rate=1000, order=0) # Explicit override explicit_node = Node(name="explicit", tick=fn, rate=1000, order=0, budget=300 * us, deadline=900 * us) ``` ### The `on_miss` Policy When a node exceeds its deadline, the miss policy fires: | Policy | Behavior | Use When | |--------|----------|----------| | `"warn"` | Log warning, continue normally | Non-critical nodes (default) | | `"skip"` | Skip this node's next tick to recover | High-frequency nodes that can afford one skipped cycle | | `"safe_mode"` | Call `enter_safe_state()` on the node | Motor controllers, actuators | | `"stop"` | Stop the entire scheduler | Safety monitors, last resort | ```python # simplified motor = Node( name="motor", tick=motor_ctrl, rate=1000, order=5, budget=300 * us, deadline=900 * us, on_miss="safe_mode", # Safe state on deadline miss ) safety = Node( name="safety_monitor", tick=check_safety, rate=1000, 
order=0, budget=100 * us, deadline=200 * us, on_miss="stop", # Stop everything if safety monitor misses ) ``` ### The `failure_policy` When a node's `tick()` raises an exception, the failure policy determines what happens: | Policy | Behavior | |--------|----------| | `"fatal"` | Stop the entire scheduler | | `"restart"` | Re-initialize the node and resume | | `"skip"` | Skip this tick, try again next cycle | | `"ignore"` | Log the error and continue | ```python # simplified cloud = Node( name="cloud_upload", tick=upload_data, rate=1, order=200, failure_policy="skip", # Network glitches shouldn't stop the robot ) ``` ### Async Nodes If your `tick` function is `async def`, the node is automatically assigned to the AsyncIo execution class. No additional configuration needed. ```python # simplified import aiohttp async def upload_telemetry(node): if node.has_msg("telemetry"): data = node.recv("telemetry") async with aiohttp.ClientSession() as session: await session.post("https://fleet.example.com/telemetry", json=data) uploader = Node( name="cloud", tick=upload_telemetry, # async -> automatically AsyncIo class rate=1, order=200, pubs=[], subs=["telemetry"], failure_policy="skip", ) ``` ### Event-Driven Nodes Set `on="topic.name"` to create a node that sleeps until that topic receives a message: ```python # simplified def handle_estop(node): msg = node.recv("emergency.stop") node.log_warning(f"E-STOP triggered: {msg}") node.request_stop() estop = Node( name="estop", tick=handle_estop, on="emergency.stop", # Only wakes when message arrives order=0, subs=["emergency.stop"], ) ``` ### Compute Nodes Set `compute=True` for CPU-heavy work that should run on a thread pool instead of the main loop: ```python # simplified def plan_path(node): if node.has_msg("map"): grid = node.recv("map") # Heavy computation — runs on thread pool, won't block motor controller path = a_star(grid, start=(0, 0), goal=(10, 10)) node.send("path", path) planner = Node( name="planner", 
tick=plan_path, rate=10, order=50, compute=True, # Thread pool, not main loop subs=["map"], pubs=["path"], ) ``` ### CPU Pinning and Priority For maximum timing determinism, pin nodes to specific CPU cores and set OS thread priority: ```python # simplified motor = Node( name="motor", tick=motor_ctrl, rate=1000, order=0, budget=300 * us, core=2, # Pin to CPU core 2 priority=80, # SCHED_FIFO priority (requires rt=True on scheduler) ) ``` ## The Framework Clock Inside `tick()`, `init()`, and `shutdown()` callbacks, the framework clock provides consistent time across all nodes: ```python # simplified import horus def control_loop(node): t = horus.now() # Current time (seconds) delta = horus.dt() # Time since last tick (seconds) total = horus.elapsed() # Time since scheduler start (seconds) n = horus.tick() # Current tick number (int) remaining = horus.budget_remaining() # Time left in budget (seconds) # Use dt for frame-rate-independent physics velocity += acceleration * delta position += velocity * delta ``` | Function | Returns | Description | |----------|---------|-------------| | `horus.now()` | `float` | Current time in seconds. Wall clock in normal mode, SimClock in deterministic mode | | `horus.dt()` | `float` | Time elapsed since last tick in seconds. Fixed `1/rate` in deterministic mode | | `horus.elapsed()` | `float` | Seconds since scheduler start | | `horus.tick()` | `int` | Monotonically increasing tick counter | | `horus.budget_remaining()` | `float` | Seconds remaining in this tick's budget. `float('inf')` if no budget set | | `horus.rng_float()` | `float` | Random float in [0.0, 1.0). System entropy normally, tick-seeded in deterministic mode | ### Normal vs Deterministic Clock In normal mode, `horus.now()` returns wall-clock time and `horus.dt()` returns the actual elapsed duration. 
In deterministic mode (`deterministic=True`), the scheduler uses a **SimClock**: `horus.dt()` returns a fixed `1/rate` value every tick, and `horus.rng_float()` produces the same sequence across runs. This makes tests and simulations reproducible.

## `tick_once()` --- Testing and Simulation

`tick_once()` executes exactly one tick cycle, then returns. This is the primary tool for testing scheduler behavior without threads or timing.

```python
# simplified
from horus import Scheduler, Node

def accumulate(node):
    count = node.recv("count") or 0
    node.send("count", count + 1)

sched = Scheduler(tick_rate=100)
sched.add(Node(name="counter", tick=accumulate, rate=100, order=0,
               pubs=["count"], subs=["count"]))

sched.tick_once()  # Init (lazy) + one tick
# count topic now has value 1

sched.tick_once()  # Second tick
# count topic now has value 2
```

### Selective Ticking

Pass a list of node names to tick only specific nodes:

```python
# simplified
sched.tick_once(["sensor"])          # Only tick the sensor
sched.tick_once(["sensor", "ctrl"])  # Tick sensor and controller, skip motor
```

This is useful for unit-testing a single node while keeping others frozen.

### `tick_for()` --- Timed Runs

`tick_for()` runs the tick loop for a specific duration, then returns:

```python
# simplified
sched.tick_for(1.0)              # Run for 1 second, then return
sched.tick_for(0.5, ["sensor"])  # Run only sensor for 0.5 seconds
```

This is useful for integration tests that need to observe behavior over time, and for simulation stepping where you advance by a fixed wall-clock interval.

## Runtime Mutation

You can modify the scheduler while it is running.
### `set_node_rate()`

Change a node's tick rate dynamically:

```python
# simplified
# Sensor normally runs at 100 Hz
sched.add(Node(name="sensor", tick=read_lidar, rate=100, order=0))
sched.run()  # running in a thread or after tick_once

# Slow down to conserve power
sched.set_node_rate("sensor", 20)   # 100 Hz -> 20 Hz

# Speed up for precision docking
sched.set_node_rate("sensor", 500)  # 20 Hz -> 500 Hz
```

### `set_tick_budget()`

Change a node's tick budget dynamically. The argument is in **microseconds** (unlike the `Node` constructor which takes seconds):

```python
# simplified
sched.set_tick_budget("motor", 500)  # Allow 500 us per tick
```

### `remove_node()`

Remove a node from the running scheduler:

```python
# simplified
removed = sched.remove_node("cloud_logger")  # Returns True if found
```

The removed node's `shutdown()` is called before it is detached from the tick loop.

### `add_critical_node()`

Mark a node as safety-critical with a dedicated watchdog timeout. If a critical node exceeds this timeout, the scheduler calls `enter_safe_state()` on **all** nodes:

```python
# simplified
sched.add_critical_node("motor_controller", timeout_ms=500)
```

This is stricter than the global watchdog: a regular node that freezes gets isolated; a critical node that freezes triggers a system-wide safe state.

## Safety and Monitoring

### Watchdog

Enable the watchdog with `watchdog_ms` on the scheduler:

```python
# simplified
sched = Scheduler(tick_rate=1000, watchdog_ms=500)
```

The watchdog uses graduated response:

| Timeout | Health State | Action |
|---------|-------------|--------|
| 1x watchdog | Warning | Log warning |
| 2x watchdog | Unhealthy | Skip tick, log error |
| 3x watchdog (critical node) | Isolated | Remove from tick loop, call `enter_safe_state()` |

A single late tick might be a transient GC pause. The graduated response gives transient problems time to resolve while still catching truly frozen nodes.
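The graduated response can be modeled in a few lines of plain Python. This is an illustrative sketch only, not the HORUS implementation: `classify_health` and the `"Healthy"` label are hypothetical names, the thresholds are read from the 1x / 2x / 3x table, and treating each boundary as inclusive is an assumption.

```python
def classify_health(silent_ms: float, watchdog_ms: float) -> str:
    """Map the time since a node's last completed tick to a health state.

    Hypothetical helper: thresholds follow the 1x / 2x / 3x watchdog table.
    """
    if watchdog_ms <= 0:
        return "Healthy"  # watchdog_ms=0 disables the watchdog
    ratio = silent_ms / watchdog_ms
    if ratio >= 3:
        return "Isolated"   # assumed inclusive boundary
    if ratio >= 2:
        return "Unhealthy"
    if ratio >= 1:
        return "Warning"
    return "Healthy"


# With a 500 ms watchdog, a node escalates one level per missed window.
for silent in (100, 500, 1000, 1500):
    print(silent, classify_health(silent, watchdog_ms=500))
```

With `watchdog_ms=500`, a node silent for 100 ms is still healthy, and it only reaches the isolation threshold after 1500 ms, which is the detection delay the graduated scheme trades for tolerance of transient stalls.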
### Per-Node Watchdog Override the global watchdog for specific nodes: ```python # simplified motor = Node( name="motor", tick=motor_ctrl, rate=1000, order=0, watchdog=200 * ms, # Stricter: 200 ms instead of global 500 ms ) ``` ### Deadline Miss Escalation Set `max_deadline_misses` to stop the scheduler after a cumulative threshold: ```python # simplified sched = Scheduler(tick_rate=1000, max_deadline_misses=50) # After 50 total deadline misses across all nodes, the scheduler stops ``` ### `safety_stats()` Query safety monitoring statistics at runtime: ```python # simplified stats = sched.safety_stats() if stats: print(f"Budget overruns: {stats['budget_overruns']}") print(f"Watchdog expirations: {stats['watchdog_expirations']}") print(f"Node health states: {stats['health_states']}") ``` ### `get_node_stats()` Query per-node statistics: ```python # simplified stats = sched.get_node_stats("motor") print(f"Node: {stats['name']}") print(f"Priority: {stats['priority']}") print(f"Total ticks: {stats['total_ticks']}") print(f"Errors: {stats['errors_count']}") ``` ## Real-Time Features ### Enabling RT Pass `rt=True` to the scheduler to request real-time scheduling: ```python # simplified sched = Scheduler(tick_rate=1000, rt=True) ``` This enables: - **SCHED_FIFO**: Linux real-time scheduling class for all RT-class nodes - **mlockall**: Lock all memory pages to prevent page faults during ticks - **CPU isolation**: Use isolated cores when available ### Checking RT Capabilities ```python # simplified sched = Scheduler(tick_rate=1000, rt=True) # Did we get full RT? 
if sched.has_full_rt(): print("Full real-time capabilities active") else: print("Running with degraded RT") for d in sched.degradations(): print(f" Degradation: {d}") # Detailed capabilities caps = sched.capabilities() print(f"RT scheduling: {caps['rt_scheduling']}") print(f"Memory locking: {caps['memory_locking']}") print(f"CPU isolation: {caps['cpu_isolation']}") ``` ### Degradations When `rt=True` is set but the system cannot provide all RT features, the scheduler degrades gracefully and logs what it could not enable: ```python # simplified sched = Scheduler(tick_rate=1000, rt=True) for msg in sched.degradations(): print(f" {msg}") # Example output: # SCHED_FIFO unavailable (no CAP_SYS_NICE) - using SCHED_OTHER # mlockall failed (EPERM) - memory pages may be swapped ``` The scheduler still runs. Nodes still tick. Timing guarantees are weakened but not absent. ## Deterministic Mode Enable deterministic mode for reproducible simulation and testing: ```python # simplified sched = Scheduler(tick_rate=100, deterministic=True) ``` In deterministic mode: - `horus.dt()` returns a **fixed** `1/rate` every tick (not wall-clock elapsed) - `horus.now()` advances by `dt` each tick (SimClock) - `horus.rng_float()` returns a **tick-seeded** sequence --- same across runs - Execution order is determined by the dependency graph (inferred from topic connections), not OS thread scheduling This guarantees identical results across runs on any machine. 
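The clock behaviour listed above can be illustrated with a small stand-alone model. This is a sketch for intuition, not the HORUS SimClock: `ToySimClock` and `simulate` are hypothetical names, and a single fixed seed stands in for the tick-seeded RNG described above.

```python
import random


class ToySimClock:
    """Hypothetical model of a fixed-step clock with a seeded RNG."""

    def __init__(self, rate_hz: float, seed: int = 0):
        self.dt = 1.0 / rate_hz          # fixed step, independent of wall time
        self.ticks = 0
        self._rng = random.Random(seed)  # same seed -> same sequence

    def now(self) -> float:
        return self.ticks * self.dt      # simulated time, not wall-clock time

    def step(self) -> None:
        self.ticks += 1

    def rng_float(self) -> float:
        return self._rng.random()        # float in [0.0, 1.0)


def simulate(n_ticks: int) -> list:
    """Record (time, random sample) pairs for n_ticks simulated ticks."""
    clock = ToySimClock(rate_hz=100, seed=42)
    samples = []
    for _ in range(n_ticks):
        samples.append((clock.now(), clock.rng_float()))
        clock.step()
    return samples


first, second = simulate(5), simulate(5)
print(first == second)  # two independent runs produce identical traces
```

Because neither the time step nor the random sequence depends on the host machine, two runs of `simulate` yield the same trace, which is the property deterministic mode is meant to provide.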
Use it for: - **Unit tests**: Assert exact outputs after N ticks - **Simulation**: Physics engines need fixed timesteps - **Regression tests**: Catch behavioral changes in CI ```python # simplified sched = Scheduler(tick_rate=100, deterministic=True) sched.add(Node(name="sim_sensor", tick=sim_tick, rate=100, order=0)) for _ in range(1000): sched.tick_once() assert horus.dt() == 0.01 # Fixed: 1/100 Hz = 10 ms ``` ## Recording and Replay ### Recording a Session ```python # simplified sched = Scheduler(tick_rate=100, recording=True, blackbox_mb=16) sched.add(Node(name="sensor", tick=read_lidar, rate=100, order=0)) sched.run(duration=60.0) # Record 60 seconds of data # Check recording status print(sched.is_recording()) # True while running ``` ### Stopping and Listing Recordings ```python # simplified # Stop recording and get file paths files = sched.stop_recording() for f in files: print(f"Recorded: {f}") # List all available recordings recordings = sched.list_recordings() for r in recordings: print(f"Session: {r}") ``` ### Deleting Recordings ```python # simplified sched.delete_recording("session_2026_03_20_143000") ``` ### Flight Recorder (Blackbox) The blackbox is a rolling buffer that records the last N megabytes of tick data. 
Unlike recording, which captures everything to disk, the blackbox keeps a fixed-size ring buffer in memory and only writes to disk on crash or explicit dump:

```python
# simplified
sched = Scheduler(tick_rate=1000, blackbox_mb=64)
# On crash: last 64 MB of tick data is saved for post-mortem analysis
```

## Introspection

Query the scheduler at runtime:

```python
# simplified
print(sched.status())          # "idle", "running", or "stopped"
print(sched.current_tick())    # Current tick number
print(sched.is_running())      # True if in the tick loop
print(sched.scheduler_name())  # Scheduler name

# Node introspection
print(sched.get_node_count())   # Number of registered nodes
print(sched.get_node_names())   # ["sensor", "motor", "planner"]
print(sched.has_node("motor"))  # True

# All node info
for info in sched.get_all_nodes():
    print(f"{info['name']}: order={info['order']}, state={info['state']}")

# Specific node
order = sched.get_node_info("motor")  # Returns execution order (int)
```

## Node Lifecycle Callbacks

Beyond `tick`, nodes have `init` and `shutdown` callbacks:

```python
# simplified
def setup_hardware(node):
    node.log_info("Connecting to motor controller...")
    # Hardware init here
    node.send("status", {"state": "ready"})

def control(node):
    if node.has_msg("cmd"):
        cmd = node.recv("cmd")
        # Motor control logic
        node.log_debug(f"Command: {cmd}")

def cleanup(node):
    node.log_info("Stopping motor controller...")
    node.send("cmd", {"linear": 0.0, "angular": 0.0})  # Stop motors

motor = Node(
    name="motor",
    init=setup_hardware,  # Called once at startup
    tick=control,         # Called every tick
    shutdown=cleanup,     # Called once at shutdown
    rate=1000,
    order=5,
    budget=300 * us,
)
```

**Logging methods** (`log_info`, `log_warning`, `log_error`, `log_debug`) only work inside `init`, `tick`, and `shutdown` callbacks. Calling them outside these contexts silently drops the message.
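Because lifecycle callbacks are plain functions that take a `node` argument, they can be exercised without a scheduler by passing in a stand-in object. The sketch below assumes only the `send`, `recv`, `has_msg`, and `log_*` methods used in this section. `FakeNode` is a hypothetical test double, not part of HORUS, and it records log calls rather than dropping them.

```python
class FakeNode:
    """Hypothetical stand-in for the `node` argument passed to callbacks."""

    def __init__(self, inbox=None):
        self.inbox = dict(inbox or {})  # topic -> pending message
        self.sent = []                  # (topic, message) pairs, in send order
        self.logs = []                  # (level, text) pairs

    def has_msg(self, topic):
        return topic in self.inbox

    def recv(self, topic):
        return self.inbox.pop(topic, None)

    def send(self, topic, message):
        self.sent.append((topic, message))

    def log_info(self, text):
        self.logs.append(("info", text))

    def log_debug(self, text):
        self.logs.append(("debug", text))


def cleanup(node):
    """Shutdown callback from the lifecycle example: command zero velocity."""
    node.log_info("Stopping motor controller...")
    node.send("cmd", {"linear": 0.0, "angular": 0.0})


fake = FakeNode()
cleanup(fake)
print(fake.sent)  # the stop command the callback published
```

A test can then assert on `fake.sent` and `fake.logs` directly, which keeps hardware-facing shutdown logic checkable on a machine with no robot attached.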
## Production Patterns ### Warehouse AGV A full warehouse AGV with safety monitoring, path planning, motor control, and fleet reporting: ```python # simplified import horus from horus import Node, Scheduler, us, ms # --- Node callbacks --- def safety_check(node): """Order 0: runs before everything else, every cycle.""" if node.has_msg("scan"): scan = node.recv("scan") min_range = min(scan["ranges"]) if min_range < 0.3: # 30 cm emergency threshold node.send("emergency.stop", {"reason": "obstacle", "distance": min_range}) node.log_warning(f"Emergency stop: obstacle at {min_range:.2f}m") def read_lidar(node): """Order 10: read LiDAR at 40 Hz.""" # Hardware read node.send("scan", {"ranges": [1.2, 0.8, 2.5, 1.1], "angle_min": -1.57}) def plan_path(node): """Order 50: heavy computation on thread pool.""" if node.has_msg("scan"): scan = node.recv("scan") # A* or RRT on occupancy grid (40+ ms of computation) path = compute_path(scan) node.send("path", path) def track_path(node): """Order 60: pure pursuit controller at 100 Hz.""" if node.has_msg("path"): path = node.recv("path") cmd = pure_pursuit(path, lookahead=0.5) node.send("cmd_vel", cmd) def motor_drive(node): """Order 70: motor controller at 1 kHz with tight budget.""" if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") # Write to motor hardware apply_wheel_velocities(cmd["left"], cmd["right"]) def battery_check(node): """Order 100: slow background monitoring.""" voltage = read_battery_voltage() if voltage < 22.0: node.log_warning(f"Low battery: {voltage:.1f}V") node.send("battery", {"voltage": voltage}) def fleet_report(node): """Order 200: async HTTP upload to fleet management.""" if node.has_msg("battery"): data = node.recv("battery") # aiohttp call to fleet server node.send("fleet.telemetry", data) # --- Build and run --- safety = Node(name="safety", tick=safety_check, rate=1000, order=0, budget=100*us, deadline=200*us, on_miss="stop", subs=["scan"], pubs=["emergency.stop"]) lidar = Node(name="lidar", 
tick=read_lidar, rate=40, order=10, pubs=["scan"]) planner = Node(name="planner", tick=plan_path, rate=10, order=50, compute=True, subs=["scan"], pubs=["path"]) tracker = Node(name="tracker", tick=track_path, rate=100, order=60, subs=["path"], pubs=["cmd_vel"]) motor = Node(name="motor", tick=motor_drive, rate=1000, order=70, budget=300*us, deadline=900*us, on_miss="safe_mode", core=2, subs=["cmd_vel"]) battery = Node(name="battery", tick=battery_check, rate=1, order=100) fleet = Node(name="fleet", tick=fleet_report, rate=1, order=200, failure_policy="skip", subs=["battery"], pubs=["fleet.telemetry"]) with Scheduler(tick_rate=1000, rt=True, watchdog_ms=500, blackbox_mb=16, max_deadline_misses=50) as sched: sched.add(safety).add(lidar).add(planner).add(tracker) sched.add(motor).add(battery).add(fleet) sched.add_critical_node("motor", timeout_ms=200) sched.add_critical_node("safety", timeout_ms=100) sched.run() ``` What this sets up: - **Safety monitor** at order 0 runs before everything --- if it detects an obstacle, it fires before the motor ticks - **LiDAR** at 40 Hz feeds the planner and safety monitor - **Path planner** on the Compute thread pool --- its 40 ms computation does not block the 1 kHz motor controller - **Motor controller** pinned to core 2 with a 300 us budget --- enters safe state on miss - **Fleet reporter** tolerates network failures with `failure_policy="skip"` - Motor and safety are **critical nodes** --- if either freezes, the entire system enters safe state ### Drone Flight Controller A quadrotor with IMU fusion, PID control, and telemetry: ```python # simplified import horus from horus import Node, us, ms def imu_read(node): """Read IMU at 400 Hz.""" raw = read_imu_hardware() fused = complementary_filter(raw) node.send("imu", fused) def attitude_control(node): """Inner loop: attitude PID at 400 Hz.""" if node.has_msg("imu") and node.has_msg("setpoint"): imu = node.recv("imu") sp = node.recv("setpoint") delta = horus.dt() motors = 
pid_control(imu, sp, delta) node.send("motor_cmd", motors) def position_control(node): """Outer loop: position PID at 50 Hz.""" if node.has_msg("gps"): gps = node.recv("gps") waypoint = node.recv("waypoint") if node.has_msg("waypoint") else current_waypoint() setpoint = position_pid(gps, waypoint, horus.dt()) node.send("setpoint", setpoint) def motor_output(node): """ESC output at 400 Hz with hard deadline.""" if node.has_msg("motor_cmd"): cmd = node.recv("motor_cmd") write_esc(cmd) def telemetry_log(node): """Log to SD card at 10 Hz.""" if node.has_msg("imu"): node.send("log", { "t": horus.now(), "tick": horus.tick(), "imu": node.recv("imu"), }) imu = Node(name="imu", tick=imu_read, rate=400, order=0, core=0) att = Node(name="attitude", tick=attitude_control, rate=400, order=10, budget=200*us, deadline=500*us, on_miss="safe_mode", core=1, subs=["imu", "setpoint"], pubs=["motor_cmd"]) pos = Node(name="position", tick=position_control, rate=50, order=20, subs=["gps", "waypoint"], pubs=["setpoint"]) esc = Node(name="esc", tick=motor_output, rate=400, order=30, budget=100*us, deadline=200*us, on_miss="stop", core=1, subs=["motor_cmd"]) logger = Node(name="logger", tick=telemetry_log, rate=10, order=100, subs=["imu"], pubs=["log"]) horus.run(imu, att, pos, esc, logger, tick_rate=400, rt=True, watchdog_ms=100, blackbox_mb=8, max_deadline_misses=10, cores=[0, 1]) ``` Key patterns: - **Inner loop (attitude)** and **outer loop (position)** run at different rates through the same scheduler - `horus.dt()` provides frame-rate-independent PID integration - ESC output uses `on_miss="stop"` --- if the ESC misses a deadline, stop the scheduler (drone must not fly with stale motor commands) - `cores=[0, 1]` pins the entire scheduler to two dedicated cores - `blackbox_mb=8` records the last 8 MB of flight data for crash investigation ## Testing with the Scheduler ### Unit Testing a Single Node ```python # simplified def test_counter_node(): """Test that counter increments each 
tick.""" results = [] def counter_tick(node): count = (node.recv("count") or 0) + 1 node.send("count", count) results.append(count) sched = Scheduler(tick_rate=100, deterministic=True) sched.add(Node(name="counter", tick=counter_tick, rate=100, order=0, pubs=["count"], subs=["count"])) sched.tick_once() sched.tick_once() sched.tick_once() assert results == [1, 2, 3] ``` ### Integration Testing Multiple Nodes ```python # simplified def test_sensor_to_motor_pipeline(): """Test that sensor data flows through the controller to the motor.""" motor_cmds = [] def fake_sensor(node): node.send("scan", {"ranges": [0.5, 1.0, 1.5]}) def controller(node): if node.has_msg("scan"): scan = node.recv("scan") node.send("cmd", {"speed": min(scan["ranges"])}) def mock_motor(node): if node.has_msg("cmd"): motor_cmds.append(node.recv("cmd")) sched = Scheduler(tick_rate=100, deterministic=True) sched.add(Node(name="sensor", tick=fake_sensor, rate=100, order=0, pubs=["scan"])) sched.add(Node(name="ctrl", tick=controller, rate=100, order=10, subs=["scan"], pubs=["cmd"])) sched.add(Node(name="motor", tick=mock_motor, rate=100, order=20, subs=["cmd"])) # Run enough ticks for data to flow through the pipeline sched.tick_for(0.1) assert len(motor_cmds) > 0 assert motor_cmds[0]["speed"] == 0.5 ``` ### Testing Safety Behavior ```python # simplified def test_watchdog_detects_frozen_node(): """Verify the watchdog catches a node that hangs.""" import time def frozen_tick(node): time.sleep(2.0) # Simulate frozen node sched = Scheduler(tick_rate=10, watchdog_ms=100) sched.add(Node(name="frozen", tick=frozen_tick, rate=10, order=0)) sched.tick_once() stats = sched.safety_stats() assert stats is not None assert stats["watchdog_expirations"] > 0 ``` ## Design Decisions **Why `horus.run()` instead of manual thread management?** Python's GIL makes multi-threaded Python unreliable for real-time work. `horus.run()` hands control to the Rust scheduler, which manages threads natively. 
Python `tick()` functions are called back from Rust --- the GIL is acquired only for the duration of each Python callback, then released. This gives Python nodes near-native scheduling precision while keeping the API simple.

**Why all config on `Node()` instead of `scheduler.add().order().rate().build()`?** In Python, keyword arguments are idiomatic. Chained builder methods are a Rust pattern that does not translate well. `Node(rate=1000, order=0, budget=300*us)` is immediately readable. It also means the node carries its own configuration --- you can pass it between functions, store it in a list, or serialize it without losing scheduling intent.

**Why `us` and `ms` constants instead of special unit types?** `300 * us` is plain Python math --- it produces a `float` in seconds. No special types to import, no conversion functions to remember, no confusion about what unit a function expects. The cost is that `budget=300` silently means 300 seconds, which is why the constants exist and the docstrings emphasize them.

**Why lazy initialization?** `init()` runs when `scheduler.run()` is called, not when `scheduler.add()` is called. This means you can configure all nodes, set global scheduler settings, and defer hardware initialization until the system is truly ready to start. It also means the scheduler's clock, RT configuration, and recording state are all finalized before any node initializes.

**Why reverse-order shutdown?** Nodes are typically added in dependency order: sensors then controllers then loggers. Reverse-order shutdown means controllers stop motors before sensors disconnect, and loggers record the shutdown sequence before they themselves stop.
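The last two decisions can be illustrated with a toy runner in plain Python. `ToyScheduler` is hypothetical and models ordering only. It assumes, as described above, that nothing is initialized at `add()` time and that shutdown walks the nodes in reverse add order. Running `init` in add order is an additional assumption made for the sketch.

```python
class ToyScheduler:
    """Hypothetical model of lazy init and reverse-order shutdown."""

    def __init__(self):
        self.names = []
        self.events = []

    def add(self, name):
        self.names.append(name)  # registration only; nothing is initialized yet
        return self              # allow chaining, like Scheduler.add()

    def run(self, ticks=1):
        for name in self.names:            # init is deferred until run()
            self.events.append(f"init:{name}")
        for _ in range(ticks):
            for name in self.names:
                self.events.append(f"tick:{name}")
        for name in reversed(self.names):  # shutdown in reverse add order
            self.events.append(f"shutdown:{name}")


toy = ToyScheduler().add("lidar").add("planner").add("motor")
print(toy.events)       # still empty: add() did not initialize anything
toy.run()
print(toy.events[-3:])  # motor shuts down before the lidar it depends on
```

In this model the motor node is stopped while the sensor feeding it is still alive, which is the ordering property the reverse shutdown is meant to guarantee.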
## Trade-offs

| Gain | Cost |
|------|------|
| **One-liner `horus.run()`** --- no boilerplate | Less control than manual `Scheduler` for runtime mutation |
| **Python GIL released between ticks** --- Rust scheduler handles threading | Python `tick()` functions must not hold the GIL for work done outside the callback |
| **Auto-derived budget/deadline from `rate`** --- less to configure | Must use explicit `budget`/`deadline` to override the 80%/95% defaults |
| **Graduated watchdog** --- transient spikes do not kill nodes | 3x timeout before isolation means truly frozen nodes take longer to detect |
| **`deterministic=True`** --- reproducible tests and simulations | Not suitable for production (no wall-clock timing) |
| **All config on `Node()`** --- Pythonic, self-contained | Cannot reconfigure a node after creation (create a new one instead) |
| **`tick_once()` for testing** --- fully deterministic single-step | No rate control in single-step mode (by design) |
| **Recording** --- full session capture for replay | Disk I/O overhead, not suitable for ultra-low-latency production |
| **RT degradation** --- runs on developer laptops and production alike | Must check `has_full_rt()` to confirm actual RT in production |

## See Also

- [Python API Reference](/python/api) --- Full API surface
- [Scheduler: Running Your Nodes](/concepts/scheduler-beginner) --- Beginner introduction
- [Scheduler --- Full Reference](/concepts/core-concepts-scheduler) --- Concepts and architecture
- [Execution Classes](/concepts/execution-classes) --- Deep dive into the five classes
- [Nodes --- Full Reference](/concepts/core-concepts-nodes) --- The components the scheduler runs
- [Safety Monitor](/advanced/safety-monitor) --- Watchdog, budget enforcement, graduated degradation
- [Getting Started (Python)](/getting-started/quick-start-python) --- First Python application

---

## Topic API

Path: /python/api/topic
Description: Python Topic class — standalone pub/sub, typed vs string topics,
GenericMessage, cross-language compatibility # Topic API `Topic` provides standalone pub/sub communication outside the node lifecycle — for scripts, tests, tools, and monitoring. Inside nodes, use `node.send()` / `node.recv()` instead. ```python # simplified import horus topic = horus.Topic(horus.CmdVel) topic.send(horus.CmdVel(linear=1.0, angular=0.0)) ``` --- ## Constructor ```python # simplified horus.Topic( msg_type, # Message class or string name capacity=1024, # Ring buffer capacity endpoint=None, # Network endpoint (e.g., "topic@host:port") ) ``` | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `msg_type` | class or `str` | required | `horus.CmdVel` for typed, `"my_topic"` for generic | | `capacity` | `int` | `1024` | Ring buffer slot count | | `endpoint` | `str` or `None` | `None` | Network endpoint for cross-process topics | ### Typed vs String ```python # simplified # Typed — zero-copy POD transport (~1.7μs) topic = horus.Topic(horus.CmdVel) topic = horus.Topic(horus.Imu, capacity=64) # String — GenericMessage with MessagePack serialization (~6-50μs) topic = horus.Topic("my_data") ``` --- ## Methods | Method | Signature | Returns | Description | |--------|-----------|---------|-------------| | `send` | `send(message) -> bool` | `True` on success | Publish a message | | `recv` | `recv() -> Any or None` | Message or `None` | Get latest message | ## Properties | Property | Type | Description | |----------|------|-------------| | `name` | `str` | Topic name | | `is_network_topic` | `bool` | Whether this is a network-backed topic | | `endpoint` | `str` or `None` | Network endpoint string | | `backend_type` | `str` | Backend type name (e.g., `"SpscIntra"`, `"PodShm"`) | --- ## Typed Topics (Zero-Copy) Typed topics use the same fixed-size `#[repr(C)]` message types as Rust. Data transfers through shared memory with no serialization — just a memcpy of the struct. 
```python
# simplified
import horus

# Publisher
pub = horus.Topic(horus.CmdVel)
pub.send(horus.CmdVel(linear=1.0, angular=0.5))

# Subscriber (same or different process)
sub = horus.Topic(horus.CmdVel)
msg = sub.recv()
if msg:
    print(f"linear={msg.linear}, angular={msg.angular}")
```

**Performance**: ~1.7μs per send+recv (Python overhead from PyO3 + GIL). The underlying Rust transport is ~14-89ns.

**Cross-language**: Typed topics are binary-compatible between Rust and Python. A Python node publishing `horus.CmdVel` is received by a Rust `Topic` subscriber with zero conversion.

---

## GenericMessage (String Topics)

When you use a string name, Python dicts are serialized via MessagePack into a `GenericMessage`.

```python
# simplified
# These work
node.send("data", {"x": 1.0, "y": 2.0})          # dict
node.send("data", [1, 2, 3])                     # list
node.send("data", "hello")                       # string
node.send("data", {"nested": {"a": [1, 2]}})     # nested

# These fail
node.send("data", my_custom_object)              # TypeError
node.send("data", lambda: 42)                    # TypeError
```

**Size limit**: 4KB max per message. Messages exceeding 4KB are spilled to a TensorPool region automatically, but very large dicts (>100KB) should use typed messages or Image/PointCloud instead.

**Serializable types**: Any type MessagePack handles — `dict`, `list`, `str`, `int`, `float`, `bool`, `None`, `bytes`. Nested structures work. Custom classes do NOT work unless you convert to dict first.

**Error behavior**: If serialization fails, `send()` raises `TypeError`.

**Performance**: ~6-12μs for small dicts, ~50-110μs for large dicts (50+ keys).

**Cross-language**: GenericMessage does **NOT** cross to Rust nodes. Rust nodes use typed `Topic` — they cannot receive Python dicts. For cross-language communication, use typed message classes.
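Since custom classes must be converted to a dict before sending, one possible approach is a dataclass plus `dataclasses.asdict` from the standard library. In the sketch below, `Pose` and `is_generic_payload` are hypothetical names. The check mirrors the list of serializable types given above rather than calling MessagePack itself, and it assumes string dict keys for simplicity.

```python
from dataclasses import asdict, dataclass

# Scalar types listed above as serializable in a GenericMessage.
_SCALARS = (str, int, float, bool, bytes, type(None))


def is_generic_payload(value) -> bool:
    """Return True if `value` contains only the serializable types listed above."""
    if isinstance(value, _SCALARS):
        return True
    if isinstance(value, list):
        return all(is_generic_payload(v) for v in value)
    if isinstance(value, dict):
        # String keys are an assumption made for this sketch.
        return all(isinstance(k, str) and is_generic_payload(v)
                   for k, v in value.items())
    return False


@dataclass
class Pose:  # hypothetical custom class
    x: float
    y: float
    tags: list


pose = Pose(x=1.0, y=2.0, tags=["dock", "home"])
payload = asdict(pose)  # plain dict, suitable for node.send("pose", payload)
print(is_generic_payload(payload), is_generic_payload(pose))
```

Running a check like this in a test catches an unconverted object before `send()` raises `TypeError` at runtime.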
--- ## Performance Comparison | Transport | Latency | Use Case | |-----------|---------|----------| | Typed topic (`horus.CmdVel`) | ~1.7μs | Control loops, sensor data, any cross-language | | GenericMessage (dict) | ~6-50μs | Prototyping, config data, Python-only nodes | | Image zero-copy (DLPack) | ~1.1μs | Camera frames to ML inference | **Rule of thumb**: Use typed topics for anything running faster than 10 Hz or crossing the Rust/Python boundary. --- ## Cross-Language Compatibility | Type | Python → Rust | Rust → Python | Transport | |------|:---:|:---:|-----------| | Typed messages (`CmdVel`, `Imu`, etc.) | Yes | Yes | Zero-copy Pod | | `Image`, `PointCloud`, `DepthImage` | Yes | Yes | Pool-backed descriptor | | Python dicts (GenericMessage) | No | No | Python-only (MessagePack) | --- ## Examples ### One-Shot Publisher Script ```python # simplified import horus topic = horus.Topic(horus.CmdVel) topic.send(horus.CmdVel(linear=1.0, angular=0.0)) print("Sent velocity command") ``` ### Topic Monitor Script ```python # simplified import horus import time topic = horus.Topic(horus.Imu) while True: msg = topic.recv() if msg: print(f"accel_z={msg.accel_z:.2f}") time.sleep(0.01) ``` ### Inside Nodes vs Standalone Inside a Node, use `node.send()` / `node.recv()` — these use the topics declared in `pubs` / `subs`: ```python # simplified # Inside node — uses auto-created topics from pubs/subs def tick(node): node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.0)) # Outside node — standalone Topic for scripts/tools topic = horus.Topic(horus.CmdVel) topic.send(horus.CmdVel(linear=1.0, angular=0.0)) ``` --- ## Backpressure Topics use a **fixed-size ring buffer**. When the buffer is full, the oldest message is dropped — `send()` never blocks. 
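The drop-oldest policy can be modeled in plain Python with a bounded `collections.deque`. This is a toy model for reasoning about overflow, not the HORUS implementation (which lives in shared memory); the `RingModel` class and its `dropped` counter are hypothetical names introduced here.

```python
from collections import deque


class RingModel:
    """Toy model of a fixed-size, drop-oldest buffer (illustrative only)."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)
        self.dropped = 0

    def send(self, msg) -> bool:
        # A full deque with maxlen silently evicts its oldest entry on append.
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1
        self._buf.append(msg)
        return True  # never blocks, never refuses

    def recv_all(self) -> list:
        # Drain whatever survived, oldest first.
        items = list(self._buf)
        self._buf.clear()
        return items


ring = RingModel(capacity=4)
for msg in "ABCDE":
    ring.send(msg)

survivors = ring.recv_all()
print(survivors, "dropped:", ring.dropped)
```

With capacity 4 and five sends, the first message is evicted and the remaining four are returned in order.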
``` Buffer capacity: 4 send(A) → [A, _, _, _] send(B) → [A, B, _, _] send(C) → [A, B, C, _] send(D) → [A, B, C, D] ← buffer full send(E) → [B, C, D, E] ← A dropped (oldest) ``` **Implications:** - Slow consumers miss messages — the producer is never throttled - `recv()` always returns the most recent message (or None) - `recv_all()` drains whatever is in the buffer (may be fewer than sent) - Default capacity is 1024 slots — sufficient for most robotics rates **When to increase capacity:** - High-rate producer (>1kHz) with slow consumer — increase to avoid drops - Recording all messages — increase to match burst rate **When to decrease capacity:** - Memory-constrained system — reduce to lower SHM usage - Latest-value semantics only (motor commands) — capacity=2 is fine ```python # simplified # High-rate lidar (40Hz, large scans) — small capacity, latest-only topic = horus.Topic(horus.LaserScan, capacity=4) # Event logging — large capacity to avoid drops topic = horus.Topic("events", capacity=8192) ``` --- ## Network Topics Topics can bridge across processes and machines using the `endpoint` parameter: ```python # simplified # Cross-process on same machine (automatic via SHM — no endpoint needed) topic = horus.Topic(horus.CmdVel) # Cross-machine via network topic = horus.Topic(horus.CmdVel, endpoint="cmd_vel@192.168.1.10:9000") ``` Network topics use UDP for low-latency transport. The endpoint format is `topic_name@host:port`. > **Note**: Most robotics use cases run on a single machine — cross-process topics use shared memory automatically with no configuration. Network topics are for multi-robot systems or distributed architectures. --- ## Pool-Backed Types `Image`, `PointCloud`, and `DepthImage` are NOT transferred through the ring buffer. 
Instead, the data lives in a shared memory pool, and only a **descriptor** (~8 bytes) passes through the topic: ``` Producer Consumer │ │ ├─ Allocate 640×480 image │ │ in SHM pool │ ├─ Write pixels to pool │ ├─ send(descriptor) ─────────────►│ recv() returns descriptor │ (8 bytes through ring buffer) ├─ to_numpy() → view into same SHM │ │ (zero-copy, ~1.1μs) ``` This means: - 640×480 RGB image = 921,600 bytes of pixel data, but only 8 bytes transit the ring buffer - `to_numpy()` and `np.from_dlpack()` return views into the pool — no copy - The pool handles allocation, reference counting, and recycling automatically ```python # simplified # Sending an image — pool allocation is automatic img = horus.Image(480, 640, "rgb8") topic.send(img) # Sends 8-byte descriptor, not 900KB of pixels # Receiving — zero-copy view into shared pool received = topic.recv() frame = received.to_numpy() # (480, 640, 3) uint8 — backed by SHM pool ``` See [NumPy & Zero-Copy](/python/numpy-zerocopy) for detailed patterns and performance data. --- ## Design Decisions **Why pull-based (`recv()`) not push-based (callbacks)?** Pull-based keeps timing deterministic — your tick controls when data is consumed. Callbacks fire at arbitrary times, making budget compliance impossible. The scheduler needs to know exactly how long your tick takes. **Why ring buffer, not a queue?** A queue grows without bound if the consumer is slow — dangerous for embedded systems with fixed memory. A ring buffer has fixed size and drops oldest messages on overflow. This matches robotics semantics: the latest sensor reading is more useful than a stale one. **Why overflow drops oldest, not newest?** In robotics, the latest data is always more relevant. A 50ms-old LiDAR scan is more useful for obstacle avoidance than a 200ms-old one. Dropping the oldest preserves freshness. **Why pool-backed Image instead of ring buffer?** A 640×480 RGB image is 900KB — too large for the ring buffer. 
Pool allocation puts the data in a separate SHM region and passes only a descriptor (~8 bytes) through the ring buffer. This gives zero-copy semantics at any image size. --- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | Ring buffer (fixed size) | Bounded memory, never blocks | Messages dropped on overflow | | Drop oldest on overflow | Latest data always available | Historical messages lost | | Pull-based recv() | Deterministic timing | Must poll every tick | | Pool-backed Image/PointCloud | Zero-copy at any size | Pool memory pre-allocated | | Auto-topic creation | Works without declaration | Undeclared topics invisible to monitoring | --- ## See Also - [Node API](/python/api/node) — Node send/recv (auto-created topics from pubs/subs) - [Standard Messages](/python/api/messages) — 75+ typed message types - [NumPy & Zero-Copy](/python/numpy-zerocopy) — DLPack and pool-backed image patterns - [Shared Memory](/python/shared-memory) — How zero-copy transport works - [Topics Deep-Dive](/python/topics-guide) — Topic patterns and best practices - [Rust Topic API](/rust/api/topic) — Rust equivalent (704 lines) --- ## Debugging & Introspection (Python) Path: /python/debugging Description: Tools and techniques for debugging HORUS Python nodes: CLI introspection, profiling slow ticks, diagnosing dropped messages, and monitoring node health # Debugging & Introspection (Python) When a robot misbehaves, you need answers fast. Is the sensor publishing? Is the controller receiving? Is the tick too slow? HORUS gives you three layers of debugging tools: CLI commands that work from any terminal, a live dashboard that shows everything at once, and a programmatic introspection API you can use inside your nodes. This guide covers all three, with step-by-step workflows for the most common problems. ## Your First Debugging Tool Before writing any debug code, open a second terminal. 
HORUS's CLI tools let you inspect a running system from the outside --- no code changes, no restarts. ### Listing Active Topics ```bash horus topic list ``` ``` TOPICS (3 active) cmd_vel CmdVel 1024 capacity 2 subscribers imu Imu 1024 capacity 1 subscriber scan LaserScan 1024 capacity 1 subscriber ``` This shows every topic in the current shared memory namespace: its name, message type, buffer capacity, and subscriber count. If a topic you expect is missing, the node that creates it is either not running or not started with `horus run`. ### Watching Messages in Real Time ```bash horus topic echo cmd_vel ``` ``` [0.001s] CmdVel { linear: 0.5, angular: 0.1 } [0.011s] CmdVel { linear: 0.5, angular: 0.0 } [0.021s] CmdVel { linear: 0.3, angular: -0.2 } ``` `echo` prints every message published to a topic, with a timestamp relative to when you started listening. Use this to verify: - That a publisher is actually sending data - That the data looks correct (no NaN values, no stale readings) - That messages arrive at the expected frequency Press Ctrl+C to stop. --- ## Verifying Topics Are Publishing A silent topic is one of the most common problems. `horus topic hz` measures the actual publish rate: ```bash horus topic hz imu ``` ``` average rate: 99.8 Hz min: 9.92ms max: 10.15ms std dev: 0.08ms ``` This tells you three things: 1. **The topic is active** --- if nothing is publishing, you will see `no messages received` after a few seconds. 2. **The rate matches your configuration** --- if you configured `rate=100` but see 50 Hz, your tick is taking too long (more on this below). 3. **Jitter is acceptable** --- a standard deviation over 1 ms on a 100 Hz topic suggests the tick is sometimes overrunning its budget. 
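To make the reported numbers concrete, here is a rough sketch of how rate, min/max interval, and jitter can be derived from message arrival times. It is an assumption about the arithmetic, not the actual `horus topic hz` implementation (which may, for example, use a sliding window or a sample standard deviation); `rate_stats` and the timestamps are made up for illustration.

```python
import statistics


def rate_stats(timestamps):
    """Compute rate and interval statistics from arrival times in seconds."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = statistics.fmean(intervals)
    return {
        "rate_hz": 1.0 / mean,
        "min_ms": min(intervals) * 1000,
        "max_ms": max(intervals) * 1000,
        "stddev_ms": statistics.pstdev(intervals) * 1000,
    }


# Five arrivals from a nominal 100 Hz publisher with about 1 ms of jitter
stamps = [0.000, 0.010, 0.020, 0.031, 0.040]
stats = rate_stats(stamps)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

A large gap between `min_ms` and `max_ms` with a normal average rate points at jitter; a low average rate points at a tick that consistently overruns.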
**Quick rate check across all topics:** ```bash # In one terminal, list topics horus topic list # In another, check rates one by one horus topic hz cmd_vel horus topic hz scan ``` If a topic shows a rate significantly lower than the node's configured `rate`, the node's `tick()` is taking too long. Jump to [Debugging Slow tick()](#debugging-slow-tick) below. --- ## Using horus monitor For a live overview of the entire system, use the TUI dashboard: ```bash horus monitor ``` The monitor shows all nodes and topics in a single view, updated in real time: - **Node list** with per-node tick rate, average tick duration, deadline misses, and health state - **Topic list** with publish rate, subscriber count, and buffer usage - **System-wide stats**: total ticks, total errors, scheduler status This is the fastest way to find the bottleneck in a multi-node system. Look for: | Symptom in monitor | Likely cause | |-------------------|--------------| | Node tick duration climbing over time | Memory leak or unbounded data structure | | Deadline misses incrementing | Tick is too slow for configured rate | | Topic rate lower than expected | Publisher node is overloaded or blocked | | Subscriber count is 0 on a topic | No node is subscribed, or subscriber crashed | --- ## Debugging Slow tick() The most common performance problem in Python nodes is a `tick()` that takes longer than its budget. When this happens, the scheduler applies the node's `on_miss` policy and the actual tick rate drops below the configured rate. ### Detecting the Problem From the CLI: ```bash horus topic hz output_topic ``` If you configured `rate=100` (10 ms period) but `hz` reports 30 Hz, your tick is averaging ~33 ms. From inside the node, check `horus.budget_remaining()`: ```python # simplified import horus import time def tick(node): start = time.perf_counter() # ... your processing ... 
do_work() elapsed_ms = (time.perf_counter() - start) * 1000 remaining = horus.budget_remaining() if remaining < 0.001: # Less than 1ms remaining node.log_warning(f"Tick took {elapsed_ms:.1f}ms, budget nearly exhausted") ``` ### Profiling with cProfile For a detailed breakdown of where time is spent, use Python's built-in profiler. Since HORUS manages the tick loop, profile the tick function itself: ```python # simplified import horus import cProfile import pstats import io profiler = cProfile.Profile() profile_active = False def tick(node): global profile_active # Profile ticks 100-200 (skip warmup) tick_num = horus.tick() if tick_num == 100: profiler.enable() profile_active = True elif tick_num == 200 and profile_active: profiler.disable() profile_active = False # Print the top 20 time consumers s = io.StringIO() ps = pstats.Stats(profiler, stream=s).sort_stats("cumulative") ps.print_stats(20) node.log_info(f"Profile results:\n{s.getvalue()}") # Normal tick logic if node.has_msg("sensor"): data = node.recv("sensor") result = process(data) node.send("output", result) node = horus.Node(name="processor", subs=["sensor"], pubs=["output"], tick=tick, rate=50) horus.run(node) ``` ### Checking Budget Remaining `horus.budget_remaining()` returns the time left in the current tick's budget, in seconds.
Use it to skip expensive optional work when you are running out of time: ```python # simplified import horus def tick(node): # Always do critical work if node.has_msg("sensor"): data = node.recv("sensor") cmd = compute_command(data) node.send("cmd_vel", cmd) # Only do visualization if budget allows if horus.budget_remaining() > 0.005: # More than 5ms left update_visualization(data) else: node.log_debug("Skipping visualization, budget low") ``` ### Common Causes of Slow Ticks | Cause | Symptom | Fix | |-------|---------|-----| | NumPy/OpenCV doing work on every tick | Consistent high tick time | Downsample input, reduce resolution | | ML model inference | Consistent 20-50 ms ticks | Lower rate, use `compute=True` so it runs on thread pool | | Allocating large arrays every tick | Tick time grows over time | Pre-allocate in `init()`, reuse buffers | | Calling `recv_all()` on a high-volume topic | Spiky tick times | Use `recv()` (one at a time) or increase tick rate | | File I/O or network calls in tick | Occasional very long ticks | Move to an async node or a separate node | --- ## Debugging Dropped Messages A message is "dropped" when the publisher writes to a full ring buffer, overwriting the oldest unread message. The subscriber never sees it. ### What Causes Drops Drops happen when the publisher sends faster than the subscriber reads. The ring buffer has a fixed capacity (default: 1024 messages). When it fills up, new messages overwrite the oldest ones. Common scenarios: - **Slow subscriber**: A 10 Hz subscriber on a 1000 Hz topic will miss messages between ticks. This is usually intentional --- the subscriber only needs the latest reading. - **Bursty publisher**: A node that publishes 100 messages in a single tick can overflow the buffer before the subscriber ticks. - **Subscriber stall**: If a subscriber's tick takes 500 ms, messages published during that time accumulate and may overflow. 
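Whether a stall actually overflows the buffer is simple arithmetic: messages published during the stall compared against the capacity. A minimal sketch, assuming a steady publish rate and a buffer that was empty when the stall began (`backlog_after_stall` is a hypothetical helper, not a HORUS function):

```python
def backlog_after_stall(publish_hz: float, stall_s: float, capacity: int = 1024):
    """Return (messages published during the stall, messages overwritten)."""
    published = round(publish_hz * stall_s)
    dropped = max(0, published - capacity)
    return published, dropped


# 1 kHz publisher, 500 ms stall: fits in the default 1024-slot buffer
print(backlog_after_stall(1000, 0.5))   # (500, 0)

# 5 kHz publisher, same stall: the oldest 1476 messages are overwritten
print(backlog_after_stall(5000, 0.5))   # (2500, 1476)
```

Under these assumptions, a 500 ms stall only causes drops at the default capacity once the publisher exceeds roughly 2 kHz.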
### Detecting Drops There is no automatic drop counter (the ring buffer is lock-free by design). Instead, use these techniques: **Sequence numbers**: Add a counter to your messages and check for gaps on the subscriber side: ```python # simplified import horus # Publisher send_count = 0 def publisher_tick(node): global send_count send_count += 1 node.send("data", {"seq": send_count, "value": read_sensor()}) # Subscriber last_seq = 0 def subscriber_tick(node): global last_seq if node.has_msg("data"): msg = node.recv("data") if msg["seq"] != last_seq + 1: dropped = msg["seq"] - last_seq - 1 node.log_warning(f"Dropped {dropped} messages (seq {last_seq} -> {msg['seq']})") last_seq = msg["seq"] ``` **Rate comparison**: Compare `horus topic hz` on the publisher and subscriber output topics. If the publisher runs at 100 Hz but the subscriber output is at 50 Hz, messages are being lost. ### Preventing Drops **Increase buffer capacity** for topics that carry bursty or high-frequency data: ```python # simplified node = horus.Node( name="fast_sub", subs={"sensor": {"type": "sensor", "capacity": 4096}}, tick=process, rate=50, ) ``` **Use `recv_all()` to drain the buffer** when you need to process every message: ```python # simplified def tick(node): messages = node.recv_all("commands") for msg in messages: execute(msg) ``` **Match rates**: If both publisher and subscriber need to run at the same rate, set them explicitly: ```python # simplified sensor = horus.Node(name="sensor", pubs=["data"], tick=sense, rate=100, order=0) processor = horus.Node(name="proc", subs=["data"], pubs=["out"], tick=process, rate=100, order=1) ``` Setting `order` ensures the sensor publishes before the processor reads in the same tick cycle. 
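The effect of `order` can be illustrated with a toy scheduler in plain Python. This is a simplified model, not HORUS code: it assumes nodes tick sequentially in ascending `order` and that the latest value stays readable in a single slot; `run_cycles` and `simulate` are hypothetical names, and real HORUS behavior may differ in detail.

```python
def run_cycles(nodes, cycles):
    """Tick every node once per cycle, in ascending order value."""
    for cycle in range(cycles):
        for _, tick in sorted(nodes, key=lambda n: n[0]):
            tick(cycle)


def simulate(sensor_order, proc_order, cycles=3):
    latest = {"value": None}   # stand-in for the topic's latest message
    seen = []                  # what the processor observed each cycle

    def sense(cycle):
        latest["value"] = cycle

    def process(cycle):
        seen.append(latest["value"])

    run_cycles([(sensor_order, sense), (proc_order, process)], cycles)
    return seen


in_order = simulate(sensor_order=0, proc_order=1)
reversed_order = simulate(sensor_order=1, proc_order=0)
print("sensor first:   ", in_order)
print("processor first:", reversed_order)
```

In this model, running the sensor first gives the processor same-cycle data, while the reversed order leaves it with nothing on the first cycle and a one-cycle-old value afterwards.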
--- ## Node Health Monitoring ### The node.info API During `tick()`, `init()`, and `shutdown()`, the node object exposes an `info` property with scheduler-managed metrics: ```python # simplified import horus def tick(node): # Basic metrics ticks = node.info.tick_count() errors = node.info.error_count() avg_ms = node.info.avg_tick_duration_ms() uptime = node.info.get_uptime() state = node.info.state # Full metrics snapshot (dict) metrics = node.info.get_metrics() # Periodic health logging if ticks % 500 == 0: node.log_info( f"Health: {ticks} ticks, {errors} errors, " f"avg {avg_ms:.2f}ms, uptime {uptime:.1f}s, state={state}" ) ``` | Method/Property | Returns | Description | |----------------|---------|-------------| | `node.info.tick_count()` | `int` | Total number of ticks executed | | `node.info.error_count()` | `int` | Total tick errors (exceptions) | | `node.info.successful_ticks()` | `int` | Ticks that completed without error | | `node.info.avg_tick_duration_ms()` | `float` | Average tick execution time in milliseconds | | `node.info.get_uptime()` | `float` | Seconds since the node's `init()` was called | | `node.info.get_metrics()` | `dict` | Full metrics snapshot with all available data | | `node.info.state` | `str` | Current node state (see NodeState below) | | `node.info.set_custom_data(key, value)` | --- | Attach custom key-value metadata | | `node.info.get_custom_data(key)` | value | Retrieve custom metadata | **NodeState values**: `UNINITIALIZED`, `INITIALIZING`, `RUNNING`, `STOPPING`, `STOPPED`, `ERROR`, `CRASHED` **Custom data** lets you attach arbitrary metadata visible to `horus monitor` and `get_node_stats()`: ```python # simplified def tick(node): # Track application-specific metrics detections = run_detector(frame) node.info.set_custom_data("detections_this_tick", len(detections)) node.info.set_custom_data("model_version", "yolov8n") ``` ### Scheduler-Level Monitoring with get_node_stats() When using a `Scheduler` directly (instead of 
`horus.run()`), you can query stats for any node: ```python # simplified import horus scheduler = horus.Scheduler(tick_rate=100, watchdog_ms=500) scheduler.add(sensor) scheduler.add(controller) scheduler.run(duration=10.0) # After the run completes for name in scheduler.get_node_names(): stats = scheduler.get_node_stats(name) print(f"{name}:") print(f" Total ticks: {stats['total_ticks']}") print(f" Failed ticks: {stats['failed_ticks']}") print(f" Avg tick: {stats.get('avg_tick_duration_ms', 0):.2f} ms") print(f" Max tick: {stats.get('max_tick_duration_ms', 0):.2f} ms") print(f" Errors: {stats['errors_count']}") print(f" Uptime: {stats.get('uptime_seconds', 0):.1f}s") ``` ### Safety Stats For systems with budgets, deadlines, or watchdogs, `safety_stats()` reports system-level safety events: ```python # simplified stats = scheduler.safety_stats() if stats: print(f"Budget overruns: {stats.get('budget_overruns', 0)}") print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Watchdog expirations: {stats.get('watchdog_expirations', 0)}") ``` If any of these numbers are non-zero, the system is under stress. Budget overruns mean ticks are taking longer than expected. Deadline misses mean the `on_miss` policy is firing. Watchdog expirations mean a node was unresponsive for an extended period. ### RT Degradations If you requested RT features, check whether they were actually applied: ```python # simplified scheduler = horus.Scheduler(tick_rate=1000, rt=True) # ... add nodes and run ... if not scheduler.has_full_rt(): for d in scheduler.degradations(): print(f"Degraded: {d.get('feature')} -- {d.get('reason')}") caps = scheduler.capabilities() print(f"RT priority: {caps.get('max_priority', 'N/A')}") print(f"Memory lock: {caps.get('memory_locking', False)}") ``` --- ## Logging from Nodes HORUS provides structured logging methods on the node object. These integrate with the scheduler's log system and appear in `horus monitor`. 
```python # simplified def tick(node): node.log_debug("Starting tick processing") if node.has_msg("sensor"): data = node.recv("sensor") node.log_info(f"Received sensor reading: {data}") if data.get("value", 0) > THRESHOLD: node.log_warning(f"Sensor value {data['value']} exceeds threshold {THRESHOLD}") else: node.log_debug("No sensor data this tick") def init(node): node.log_info(f"Node initialized, publishing to: {node.publishers()}") node.log_info(f"Subscribed to: {node.subscribers()}") def shutdown(node): node.log_info("Shutting down, cleaning up resources") ``` | Method | Use for | |--------|---------| | `node.log_debug(msg)` | Verbose tracing during development | | `node.log_info(msg)` | Normal operational messages | | `node.log_warning(msg)` | Anomalies that do not stop operation (stale data, high latency) | | `node.log_error(msg)` | Failures that affect node behavior | **Logging performance tip**: Avoid expensive string formatting in debug logs when they are not needed. Python evaluates f-string arguments before the function call: ```python # simplified # Bad: formats the string even if debug logging is suppressed node.log_debug(f"Full state dump: {expensive_serialize(state)}") # Better: guard expensive formatting if DEBUG_ENABLED: node.log_debug(f"Full state dump: {expensive_serialize(state)}") ``` --- ## Common Debugging Workflows ### "My subscriber is not getting messages" **Step 1: Verify the publisher is running.** ```bash horus topic list ``` If the topic is missing, the publisher node is not running or was not started with `horus run`. **Step 2: Verify messages are being published.** ```bash horus topic echo sensor_data ``` If no output appears, the publisher's `tick()` is not calling `node.send()`. 
Check that: - The publisher's `pubs` list includes the topic name - The `send()` call is actually reached (no early return, no exception swallowed) - The publisher node is not in an error state (`horus monitor`) **Step 3: Verify the subscriber is subscribed to the correct topic name.** Topic names must match exactly, including case. A common mistake: ```python # simplified # Publisher pub = horus.Node(pubs=["sensor.data"], tick=pub_tick, rate=100) # Subscriber -- WRONG: underscore instead of dot sub = horus.Node(subs=["sensor_data"], tick=sub_tick, rate=100) ``` **Step 4: Check execution order.** If both nodes run in the same scheduler, the publisher must have a lower `order` than the subscriber: ```python # simplified sensor = horus.Node(name="sensor", pubs=["data"], tick=sense, rate=100, order=0) processor = horus.Node(name="proc", subs=["data"], tick=process, rate=100, order=1) ``` If `order` is wrong, the subscriber ticks before the publisher in every cycle and always sees an empty buffer. **Step 5: Check that `recv()` is called correctly.** ```python # simplified def tick(node): # WRONG: recv() returns None if no message, not an exception data = node.recv("data") process(data) # TypeError: process() got None # CORRECT: always check for None data = node.recv("data") if data is not None: process(data) # ALSO CORRECT: use has_msg() first if node.has_msg("data"): data = node.recv("data") process(data) ``` --- ### "My node is too slow" **Step 1: Measure the actual rate.** ```bash horus topic hz output_topic ``` Compare to the configured `rate`. If the measured rate is lower, the tick is overrunning. **Step 2: Check tick duration in the monitor.** ```bash horus monitor ``` Look at the average and max tick duration for the slow node. 
**Step 3: Profile the tick function.** Add timing inside the tick to isolate the expensive section: ```python # simplified import time def tick(node): t0 = time.perf_counter() data = node.recv("input") t1 = time.perf_counter() result = heavy_computation(data) t2 = time.perf_counter() node.send("output", result) t3 = time.perf_counter() node.log_debug( f"recv={1000*(t1-t0):.1f}ms " f"compute={1000*(t2-t1):.1f}ms " f"send={1000*(t3-t2):.1f}ms" ) ``` **Step 4: Apply the right fix.** | Bottleneck | Fix | |-----------|-----| | Heavy computation (NumPy, OpenCV) | Use `compute=True` to run on thread pool, or lower the rate | | ML inference | Lower rate to match inference time, or run on a dedicated node | | Data serialization | Use typed messages instead of dicts for zero-copy performance | | File or network I/O | Move to an async node (`async def tick`) | | Repeated allocation | Pre-allocate arrays in `init()`, reuse across ticks | **Step 5: Set a budget to catch future regressions.** ```python # simplified node = horus.Node( name="controller", tick=control_tick, rate=100, budget=0.008, # 8ms budget (80% of 10ms period) on_miss="warn", # Log a warning when exceeded ) ``` --- ### "My robot stops unexpectedly" **Step 1: Check for exceptions in node ticks.** If a node's `tick()` raises an unhandled exception, the default `failure_policy` is `"fatal"`, which stops the entire scheduler. Check the terminal output for tracebacks. 
**Step 2: Check safety stats after the run.** ```python # simplified scheduler = horus.Scheduler(tick_rate=100, watchdog_ms=500) scheduler.add(sensor) scheduler.add(controller) try: scheduler.run() except KeyboardInterrupt: pass # Check what happened stats = scheduler.safety_stats() if stats: print(f"Budget overruns: {stats.get('budget_overruns', 0)}") print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Watchdog expirations: {stats.get('watchdog_expirations', 0)}") for name in scheduler.get_node_names(): ns = scheduler.get_node_stats(name) if ns['failed_ticks'] > 0: print(f"Node '{name}' had {ns['failed_ticks']} failed ticks") ``` **Step 3: Check for `on_miss="stop"` triggering.** If any node has `on_miss="stop"`, a single deadline miss will halt the system. Change to `"warn"` during development: ```python # simplified # Development: warn on deadline miss motor = horus.Node(tick=motor_fn, rate=500, budget=0.0016, on_miss="warn") # Production: stop on deadline miss (safety-critical) motor = horus.Node(tick=motor_fn, rate=500, budget=0.0016, on_miss="stop") ``` **Step 4: Check for `request_stop()` calls.** Search your code for `node.request_stop()`. 
A node may be intentionally stopping the scheduler based on a condition: ```python # simplified def tick(node): if battery_voltage < CRITICAL_THRESHOLD: node.log_error(f"Battery critical: {battery_voltage}V") node.request_stop() # This stops the whole system ``` **Step 5: Use a non-fatal failure policy during development.** ```python # simplified node = horus.Node( name="experimental", tick=experimental_tick, rate=30, failure_policy="skip", # Skip failed ticks instead of crashing ) ``` | Policy | Behavior | When to use | |--------|----------|-------------| | `"fatal"` | Stop the entire scheduler (default) | Production safety-critical nodes | | `"restart"` | Re-run `init()` and resume | Nodes that can recover from errors | | `"skip"` | Skip the failed tick, continue | Development, non-critical nodes | | `"ignore"` | Silently ignore the error | Logging, telemetry | --- ## Python-Specific Debugging ### GIL and Threading Python's Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. This has specific implications for HORUS: - **`compute=True` only helps if your code releases the GIL.** NumPy, OpenCV, and PyTorch release the GIL during C/CUDA operations, so `compute=True` lets them run in parallel. Pure Python computation does not benefit. - **Multiple Python nodes in the same scheduler share the GIL.** If one node's tick runs pure Python for 10 ms, other Python nodes cannot tick during that time, even on separate threads. 
**Detecting GIL contention:** ```python # simplified import horus import time def tick(node): wall_start = time.perf_counter() cpu_start = time.thread_time() # CPU time of this thread only (Python 3.7+) do_work() wall_elapsed = time.perf_counter() - wall_start cpu_elapsed = time.thread_time() - cpu_start # If wall >> cpu, the thread was waiting (GIL or I/O) if wall_elapsed > cpu_elapsed * 1.5: node.log_warning( f"GIL contention: wall={wall_elapsed*1000:.1f}ms " f"cpu={cpu_elapsed*1000:.1f}ms" ) ``` **Mitigations:** - Move GIL-heavy nodes to separate processes using HORUS multi-process mode - Use libraries that release the GIL (NumPy, OpenCV, SciPy, PyTorch) - Lower the rate of pure-Python nodes so they do not contend with critical nodes ### Memory Leaks Python's garbage collector handles most memory, but leaks still happen --- typically from: - Growing lists or dicts that are never cleared - Circular references with `__del__` methods - C extension objects that are not properly released **Detecting leaks with `tracemalloc`:** ```python # simplified import horus import tracemalloc def init(node): tracemalloc.start() node.info.set_custom_data("baseline_mem", 0) def tick(node): if horus.tick() % 1000 == 0: current, peak = tracemalloc.get_traced_memory() node.log_info(f"Memory: current={current/1024:.0f}KB, peak={peak/1024:.0f}KB") # If memory keeps growing, take a snapshot if current > 100 * 1024 * 1024: # Over 100MB snapshot = tracemalloc.take_snapshot() top = snapshot.statistics("lineno")[:5] for stat in top: node.log_warning(f" {stat}") ``` **Common leak patterns in HORUS nodes:** ```python # simplified # LEAK: appending to a list every tick without bound history = [] def tick(node): data = node.recv("sensor") if data: history.append(data) # Grows forever # FIX: use a fixed-size buffer from collections import deque history = deque(maxlen=1000) def tick(node): data = node.recv("sensor") if data: history.append(data) # Oldest entries automatically removed ``` ### Exception Tracing When a `tick()` raises
an exception, HORUS catches it and applies the `failure_policy`. To get full tracebacks for debugging, add an `on_error` handler: ```python # simplified import horus import traceback def on_error(node, error): tb = traceback.format_exception(type(error), error, error.__traceback__) node.log_error(f"Exception in tick:\n{''.join(tb)}") node = horus.Node( name="processor", tick=process_tick, on_error=on_error, rate=50, failure_policy="skip", # Continue after errors ) ``` For intermittent errors that are hard to reproduce, log the node state alongside the traceback: ```python # simplified def on_error(node, error): context = { "tick": horus.tick(), "elapsed": horus.elapsed(), "has_sensor": node.has_msg("sensor"), "tick_count": node.info.tick_count() if node.info else "N/A", "error_count": node.info.error_count() if node.info else "N/A", } node.log_error(f"Error at {context}: {error}") ``` ### Debugging with pdb For interactive debugging, you can use `pdb` inside a tick, but be aware that the scheduler pauses while you are in the debugger --- other nodes will not tick, and watchdogs may fire. 
```python # simplified import horus def tick(node): data = node.recv("sensor") if data and data.get("value", 0) < 0: # Drop into debugger on unexpected negative value import pdb; pdb.set_trace() process(data) ``` --- ## Quick Reference ### CLI Commands | Command | What it shows | |---------|--------------| | `horus topic list` | All active topics with type, capacity, and subscriber count | | `horus topic echo <topic>` | Live stream of messages on a topic | | `horus topic hz <topic>` | Measured publish rate with min/max/stddev timing | | `horus monitor` | Full-system TUI dashboard with nodes, topics, and metrics | ### Python Introspection API | Call | Returns | Description | |------|---------|-------------| | `node.info.tick_count()` | `int` | Total ticks executed | | `node.info.error_count()` | `int` | Total tick errors | | `node.info.avg_tick_duration_ms()` | `float` | Average tick time in ms | | `node.info.get_uptime()` | `float` | Seconds since init | | `node.info.get_metrics()` | `dict` | Full metrics snapshot | | `node.info.state` | `str` | Current node state | | `node.info.set_custom_data(k, v)` | --- | Attach custom metadata | | `node.info.get_custom_data(k)` | value | Retrieve custom metadata | | `horus.budget_remaining()` | `float` | Seconds left in tick budget | | `sched.get_node_stats(name)` | `dict` | Per-node stats from scheduler | | `sched.get_all_nodes()` | `list` | All nodes with config | | `sched.safety_stats()` | `dict` | Budget overruns, deadline misses, watchdog expirations | | `sched.degradations()` | `list` | RT features that could not be applied | | `sched.capabilities()` | `dict` | Detected RT capabilities | | `sched.status()` | `str` | Scheduler state: idle, running, stopped | --- ## See Also - [Python API Reference](/python/api) --- Full Node, Scheduler, and message API - [Python Bindings](/python/api/python-bindings) --- Detailed API with all Scheduler methods - [Common Mistakes (Python)](/getting-started/common-mistakes-python) --- Pitfalls and how to
avoid them
- [Performance Guide](/performance/performance) --- Optimization techniques for HORUS nodes
- [Second Application (Python)](/getting-started/second-application-python) --- Intro to CLI introspection with `horus topic`

---

## Control Messages

Path: /python/messages/control
Description: Motor, servo, drive, PID, and joint control types for Python robotics — every method explained

# Control Messages

Control messages are how Python users make robots move — commanding motors, servos, wheels, and arm joints. The factory methods and presets handle the domain-specific math (kinematics, PID tuning) so you don't have to.

**When you need these:** If your robot has actuators — motors, servos, wheels, arm joints — you need control messages. `CmdVel` for simple differential drive, `MotorCommand` for direct motor control, `JointCommand` for multi-DOF arms.

```python
# simplified
from horus import (
    CmdVel, MotorCommand, ServoCommand, DifferentialDriveCommand,
    PidConfig, TrajectoryPoint, JointCommand,
)
```

> **Note:** `CmdVel` is documented in [Geometry Messages](/python/messages/geometry#cmdvel) since it's a velocity type, but it's listed here too because it's the most common control message.

---

## MotorCommand

Individual motor control with mode-specific factory methods. The factories set the right mode, enable flag, and limits — use them instead of setting raw integers.

### Constructor

```python
# simplified
cmd = MotorCommand(motor_id=0, mode=0, target=1.5,
                   max_velocity=10.0, max_acceleration=5.0, enable=True)
```

### `.velocity(motor_id, velocity)` — Speed Control

```python
# simplified
cmd = MotorCommand.velocity(motor_id=0, velocity=1.5)  # 1.5 rad/s
```

Creates a velocity-mode command. The motor controller maintains the target speed using its internal PID loop. Use this for wheels, conveyors, or any continuous rotation.

The `velocity` parameter is in rad/s for rotary motors or m/s for linear actuators — check your motor controller's documentation.
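
Because a rotary motor takes its target in rad/s, a wheel's desired ground speed has to be divided by the wheel radius before it is passed as `velocity`. The sketch below is plain arithmetic, not HORUS code; the helper name `wheel_speed_rad_s` and the example numbers are assumptions for illustration.

```python
def wheel_speed_rad_s(ground_speed_m_s: float, wheel_radius_m: float) -> float:
    """Convert a wheel's linear ground speed (m/s) to angular speed (rad/s)."""
    if wheel_radius_m <= 0:
        raise ValueError("wheel_radius_m must be positive")
    return ground_speed_m_s / wheel_radius_m

# Hypothetical robot: 0.5 m/s on wheels with a 0.05 m radius.
omega = wheel_speed_rad_s(0.5, 0.05)
print(f"target = {omega:.2f} rad/s")
```

The resulting value is what you would hand to a velocity-mode command for that wheel's motor, assuming the controller expects rad/s at the wheel rather than at the motor shaft.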

### `.position(motor_id, position, max_velocity)` — Position Control

```python
# simplified
cmd = MotorCommand.position(motor_id=0, position=3.14, max_velocity=2.0)
```

Creates a position-mode command. The motor moves to the target position at up to `max_velocity`. Use this for precise positioning — homing, tool changes, pan/tilt control.

The motor controller handles the trajectory — it accelerates, cruises, and decelerates to reach the target smoothly.

### `.stop(motor_id)` — Emergency Stop

```python
# simplified
cmd = MotorCommand.stop(motor_id=0)
```

Immediately stops the motor (zero target, disable). Use this for safety — collision detected, e-stop pressed, fault condition.

> **Common mistake:** Sending `velocity(id, 0.0)` instead of `stop()`. A zero velocity command keeps the motor enabled and holding position. `stop()` actually disables the motor, allowing it to coast to a stop and drawing less power.

### `.is_valid()` — Validation

```python
# simplified
if not cmd.is_valid():
    print("Invalid motor command — check parameters")
```

---

## ServoCommand

Angle-based servo control with speed limiting and degree-to-radian conversion.

### Constructor

```python
# simplified
cmd = ServoCommand(servo_id=0, position=1.57)  # Radians
```

### `.from_degrees(servo_id, degrees)` — From Angle in Degrees

```python
# simplified
cmd = ServoCommand.from_degrees(servo_id=0, degrees=90.0)
# Internally converts to 1.5708 radians
```

Most servo specifications list angle ranges in degrees (0-180°, 0-270°, etc.), but horus uses radians internally. This factory handles the conversion so you can think in degrees.

> **Common mistake:** Using `ServoCommand(servo_id=0, position=90.0)` — that's 90 **radians** (≈ 14 full rotations), which will slam the servo to its limit. Use `from_degrees()` for degree values.
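
The size of that mistake is easy to check with the standard library alone. The sketch below is plain arithmetic rather than HORUS code, and the helper name `full_rotations` is made up for illustration.

```python
import math

def full_rotations(angle_rad: float) -> float:
    """Number of complete turns represented by an angle in radians."""
    return angle_rad / (2 * math.pi)

# The value a degrees-to-radians conversion produces for 90 degrees.
intended = math.radians(90.0)

# What position=90.0 means when the number is read as radians.
mistaken_turns = full_rotations(90.0)

print(f"90 degrees = {intended:.4f} rad")
print(f"90 rad     = {mistaken_turns:.1f} full rotations")
```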

### `.with_speed(servo_id, position, speed)` — Speed-Limited Movement

```python
# simplified
cmd = ServoCommand.with_speed(servo_id=0, position=1.57, speed=0.5)
```

Creates a command with a speed limit in rad/s. Without a speed limit, the servo moves as fast as its hardware allows — which can be dangerous for large servos or when carrying a load.

### `.disable(servo_id)` — Power Off

```python
# simplified
cmd = ServoCommand.disable(servo_id=0)
```

Powers off the servo motor. The servo becomes free-moving (no holding torque). Important for:

- Reducing power consumption when the servo doesn't need to hold position
- Allowing manual positioning of robot arms
- Safety — a disabled servo can't exert force

### `.is_valid()` — Validation

```python
# simplified
print(cmd.is_valid())
```

---

## DifferentialDriveCommand

Wheel-level control for differential drive robots (two-wheeled robots like TurtleBot, iRobot Create, most ground robots). The `from_twist()` factory is the key method — it converts a velocity command to left/right wheel speeds using inverse kinematics.

### Constructor

```python
# simplified
cmd = DifferentialDriveCommand(left_velocity=1.0, right_velocity=-1.0)
# Equal and opposite = spin in place
```

### `.from_twist(linear, angular, wheel_base, wheel_radius)` — Inverse Kinematics

```python
# simplified
cmd = DifferentialDriveCommand.from_twist(
    linear=0.5,         # m/s forward
    angular=0.3,        # rad/s turning (positive = left)
    wheel_base=0.3,     # meters between wheel centers
    wheel_radius=0.05,  # wheel radius in meters
)
```

Converts a desired robot velocity (linear + angular) to individual wheel speeds.
The math:

```
v_left  = (linear - angular × wheel_base / 2) / wheel_radius
v_right = (linear + angular × wheel_base / 2) / wheel_radius
```

**Parameters you need to measure on your robot:**

- `wheel_base`: distance between the centers of the two drive wheels (measure with a ruler)
- `wheel_radius`: radius of each drive wheel (measure the diameter and divide by 2)

When `angular > 0`, the robot turns **left** (counterclockwise from above). The left wheel slows down and the right wheel speeds up.

> **Common mistake:** Swapping `wheel_base` and `wheel_radius`. If your robot drives 10x too fast or too slow, you probably swapped them. `wheel_base` is always larger (typically 0.1-0.5m) and `wheel_radius` is smaller (typically 0.02-0.1m).

### `.stop()` — Stop Both Wheels

```python
# simplified
cmd = DifferentialDriveCommand.stop()
```

Zero velocity on both wheels with enable=true. The motors actively hold position (zero speed).

### `.is_valid()` — Validation

```python
# simplified
print(cmd.is_valid())
```

**Example — Teleop with Kinematics:**

```python
# simplified
from horus import Node, run, CmdVel, DifferentialDriveCommand, Topic

cmd_vel_topic = Topic(CmdVel)
wheel_topic = Topic(DifferentialDriveCommand)

WHEEL_BASE = 0.287    # TurtleBot3 Waffle (the Burger's wheel separation is 0.160)
WHEEL_RADIUS = 0.033  # TurtleBot3 Waffle

def convert_to_wheels(node):
    cmd = cmd_vel_topic.recv(node)
    if cmd is None:
        return
    wheel_cmd = DifferentialDriveCommand.from_twist(
        linear=cmd.linear,
        angular=cmd.angular,
        wheel_base=WHEEL_BASE,
        wheel_radius=WHEEL_RADIUS,
    )
    wheel_topic.send(wheel_cmd, node)

run(Node(tick=convert_to_wheels, rate=50, pubs=["wheel_cmd"], subs=["cmd_vel"]))
```

---

## PidConfig

PID controller gains with preset configurations. The presets are the easiest way to get started — pick the simplest controller that works, then add complexity only if needed.

### Constructor

```python
# simplified
pid = PidConfig(kp=2.0, ki=0.1, kd=0.05)
```

### `.proportional(kp)` — P Controller

```python
# simplified
pid = PidConfig.proportional(kp=1.0)
```

Proportional-only. The simplest controller — output is proportional to error. Fast response, but it usually leaves a **steady-state error** (under a constant load the system settles slightly short of the target).

**Use when:** Speed matters more than precision. Coarse positioning, rough speed control.

### `.pi(kp, ki)` — PI Controller

```python
# simplified
pid = PidConfig.pi(kp=1.0, ki=0.1)
```

Proportional + Integral. The integral term eliminates steady-state error — given enough time, the system reaches the exact target. But high `ki` causes **overshoot and oscillation**.

**Use when:** You need zero steady-state error. Temperature control, steady-state speed tracking.

### `.pd(kp, kd)` — PD Controller

```python
# simplified
pid = PidConfig.pd(kp=1.0, kd=0.05)
```

Proportional + Derivative. The derivative term damps oscillations and improves transient response. But there's no integral term, so steady-state error persists.

**Use when:** You need fast, smooth response without overshoot. Joint angle control, quick positioning.

### `.with_limits(integral_limit, output_limit)` — Anti-Windup

```python
# simplified
pid = PidConfig(kp=2.0, ki=0.5, kd=0.1).with_limits(
    integral_limit=10.0,  # Clamp integral accumulator
    output_limit=100.0,   # Clamp output value
)
```

Sets bounds on the integral accumulator and the total output. Without limits, the integral term can "wind up" during sustained error (e.g., when the motor is stalled) and cause massive overshoot when the error finally clears.

Returns a new `PidConfig` with the limits applied.

> **Common mistake:** Setting `ki` without `with_limits()`. The integral will accumulate unboundedly, eventually producing huge output spikes. Always set limits when using integral control.
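
To make windup concrete, here is a minimal textbook PID loop in plain Python. It is a sketch under stated assumptions, not the controller HORUS or your motor driver runs: the class name `PidSketch`, the `clamp` helper, and the stalled-motor scenario are all hypothetical. It only shows what clamping the accumulator does during a sustained error.

```python
from dataclasses import dataclass

def clamp(value: float, limit: float) -> float:
    """Restrict value to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))

@dataclass
class PidSketch:
    """Hypothetical textbook PID loop; not the HORUS controller."""
    kp: float
    ki: float
    kd: float
    integral_limit: float = float("inf")  # inf means the accumulator is never clamped
    output_limit: float = float("inf")    # inf means the output is never clamped
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        # Anti-windup: clamp the accumulator on every tick.
        self.integral = clamp(self.integral + error * dt, self.integral_limit)
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return clamp(output, self.output_limit)

# Stalled motor: a constant error of 1.0 for 100 s at 100 Hz.
bounded = PidSketch(kp=2.0, ki=0.5, kd=0.1, integral_limit=10.0, output_limit=100.0)
unbounded = PidSketch(kp=2.0, ki=0.5, kd=0.1)
for _ in range(10_000):
    bounded.update(error=1.0, dt=0.01)
    unbounded.update(error=1.0, dt=0.01)

print("bounded accumulator:  ", bounded.integral)
print("unbounded accumulator:", unbounded.integral)
```

The unbounded accumulator keeps growing for as long as the stall lasts, and all of it has to unwind before the output returns to normal once the error clears. The bounded one stops at `integral_limit`.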

### `.is_valid()` — Validation

```python
# simplified
if not pid.is_valid():
    print("Invalid PID gains — kp must be >= 0, all values finite")
```

---

## TrajectoryPoint

A point along a trajectory with position, velocity, and time. Used for smooth motion planning where you specify the desired state at each time step.

### `.new_2d(x, y, vx, vy, time)` — 2D Trajectory Point

```python
# simplified
tp = TrajectoryPoint.new_2d(x=1.0, y=2.0, vx=0.5, vy=0.0, time=1.0)
```

Creates a 2D trajectory point with position (x, y), velocity (vx, vy), and time-from-start in seconds. Used for 2D path planning where you want the robot at specific positions at specific times.

### `.stationary(x, y, z)` — Hold Position

```python
# simplified
hold = TrajectoryPoint.stationary(x=5.0, y=3.0, z=0.0)
```

Creates a point with zero velocity — the robot should be stationary at this position. Use this for the final point in a trajectory (the "stop here" point).

---

## JointCommand

Multi-joint command for robot arms and legged robots. Add position or velocity commands for individual joints by name.

### Constructor

```python
# simplified
cmd = JointCommand()
```

### `.add_position(name, position)` — Position a Joint

```python
# simplified
cmd = JointCommand()
cmd.add_position("shoulder", 1.57)     # Move shoulder to 90°
cmd.add_position("elbow", 0.785)       # Move elbow to 45°
cmd.add_position("wrist_rotate", 0.0)  # Center wrist
```

Adds a position command for the named joint. Position is in radians for revolute joints, meters for prismatic joints. Raises `ValueError` if the joint limit (16 joints) is exceeded.

> **Common mistake:** Joint names must match exactly what the arm driver publishes in `JointState`. If the driver uses "joint_1" but you send "shoulder", the command is ignored. Check `JointState.names` to see what names your hardware uses.
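
Since a mismatched name is ignored rather than rejected, it can help to compare names yourself before building the command. The helper below is a hypothetical pre-flight check in plain Python, not part of HORUS. It assumes you have already copied the driver's names (for example from `JointState.names`) into a list, and it mirrors the 16-joint limit stated above.

```python
MAX_JOINTS = 16  # limit stated for JointCommand on this page

def check_joint_names(commanded, reported):
    """Return the commanded names the driver does not report, in order."""
    if len(commanded) > MAX_JOINTS:
        raise ValueError(
            f"{len(commanded)} joints exceeds the {MAX_JOINTS}-joint limit"
        )
    known = set(reported)
    return [name for name in commanded if name not in known]

# Hypothetical names: what the driver reports vs. what we intend to command.
reported_names = ["joint_1", "joint_2", "gripper"]
wanted = ["shoulder", "joint_2", "gripper"]

unknown = check_joint_names(wanted, reported_names)
if unknown:
    print("These joints would be ignored:", unknown)
```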

### `.add_velocity(name, velocity)` — Velocity-Control a Joint

```python
# simplified
cmd = JointCommand()
cmd.add_velocity("wheel_left", 1.0)   # 1 rad/s
cmd.add_velocity("wheel_right", 1.0)  # 1 rad/s
```

Same pattern but for velocity control. Typically used for wheels or continuous-rotation joints.

### `.is_valid()` — Validation

```python
# simplified
if not cmd.is_valid():
    print("Invalid joint command")
```

**Example — Robot Arm Homing:**

```python
# simplified
from horus import JointCommand, Topic

arm_cmd = Topic(JointCommand)

def home_arm(node):
    cmd = JointCommand()
    cmd.add_position("shoulder", 0.0)
    cmd.add_position("elbow", 0.0)
    cmd.add_position("wrist_pitch", 0.0)
    cmd.add_position("wrist_rotate", 0.0)
    cmd.add_position("gripper", 0.0)
    arm_cmd.send(cmd, node)
```

---

## Design Decisions

**Why factory methods instead of mode integers?** `MotorCommand.velocity(id, 1.5)` is self-documenting; `MotorCommand(motor_id=0, mode=1, target=1.5)` requires you to memorize that mode 1 is velocity. Factory methods encode domain knowledge (correct mode flag, default limits, enable state) so you cannot construct an invalid command by accident.

**Why does `DifferentialDriveCommand.from_twist()` require wheel geometry?** Different robots have different wheel sizes and spacing. Hardcoding these values would work only for one robot. By requiring `wheel_base` and `wheel_radius` as parameters, the same code works for any differential drive robot. Measure your robot once, define them as constants, and the kinematics are correct forever.

**Why PID presets (`proportional()`, `pi()`, `pd()`) instead of just the constructor?** Most PID tuning starts with a P controller and adds complexity only if needed. The presets encode this workflow: start with `proportional()`, add integral with `pi()` if you have steady-state error, add derivative with `pd()` if you have oscillation. The constructor exists for when you need all three gains, but the presets catch the 80% case.

**Why `JointCommand` uses name-based addressing instead of index arrays?** Index-based commands are fragile — if someone adds a joint to the URDF, all indices shift and every command node breaks. Name-based addressing is robust to hardware changes. The cost is a string comparison per joint, but with a 16-joint limit, this is negligible.

---

## See Also

- [CmdVel](/python/messages/geometry#cmdvel) — 2D velocity command (in Geometry page)
- [Geometry Messages](/python/messages/geometry) — Twist, Pose2D for navigation
- [Navigation Messages](/python/messages/navigation) — NavGoal, path following
- [Sensor Messages](/python/messages/sensor) — JointState for reading joint positions
- [Python Message Library](/python/library/python-message-library) — All 55+ message types overview

---

## Clock API

Path: /python/api/clock
Description: Framework clock functions — now(), dt(), elapsed(), tick(), budget_remaining(), rng_float(), and unit constants

# Clock API

Framework-aware clock that respects deterministic mode. Use these instead of `time.time()` for reproducible behavior.

```python
# simplified
import horus

dt = horus.dt()                       # Timestep for this tick
remaining = horus.budget_remaining()  # Budget left
```

---

## Functions

| Function | Returns | Description |
|----------|---------|-------------|
| `horus.now()` | `float` | Current time in seconds (wall clock or SimClock) |
| `horus.dt()` | `float` | Timestep for this tick (actual elapsed or fixed `1/rate`) |
| `horus.elapsed()` | `float` | Seconds since scheduler start |
| `horus.tick()` | `int` | Current tick number |
| `horus.budget_remaining()` | `float` | Seconds of budget left this tick (`inf` if no budget) |
| `horus.rng_float()` | `float` | Random `[0.0, 1.0)` (system entropy or tick-seeded in deterministic) |
| `horus.timestamp_ns()` | `int` | Nanosecond timestamp for TransformFrame queries |

---

## Unit Constants

| Constant | Value | Usage |
|----------|-------|-------|
| `horus.us` | `1e-6` | Microseconds: `300 * horus.us` = 300μs |
| `horus.ms` | `1e-3` | Milliseconds: `1 * horus.ms` = 1ms |

```python
# simplified
# Budget and deadline use seconds — multiply with constants
node = horus.Node(
    budget=300 * horus.us,    # 300 microseconds
    deadline=900 * horus.us,  # 900 microseconds
    ...
)
```

---

## Wall Clock vs SimClock

| Function | Normal Mode | Deterministic Mode |
|----------|------------|-------------------|
| `now()` | Wall clock (`time.monotonic()`) | SimClock (advances by exactly `1/rate` per tick) |
| `dt()` | Actual elapsed since last tick | Fixed `1/rate` (e.g., 0.01 for 100 Hz) |
| `elapsed()` | Real wall time since start | Simulated time since start |
| `tick()` | Monotonic tick counter | Same |
| `rng_float()` | System entropy (non-deterministic) | Tick-seeded (deterministic, reproducible) |
| `budget_remaining()` | Real remaining budget | Same (budget enforcement still real-time) |

Enable deterministic mode:

```python
# simplified
horus.run(my_node, deterministic=True)
# or
sched = horus.Scheduler(deterministic=True)
```

---

## Adaptive Quality with budget_remaining()

Use `budget_remaining()` to do more work when time permits:

```python
# simplified
import horus

def planner_tick(node):
    # Always compute basic path
    path = compute_basic_path()

    # Only optimize if budget allows
    if horus.budget_remaining() > 2 * horus.ms:
        path = optimize_path(path)

    # Only smooth if still have time
    if horus.budget_remaining() > 1 * horus.ms:
        path = smooth_path(path)

    node.send("path", path)

planner = horus.Node(
    name="planner",
    tick=planner_tick,
    rate=10,
    budget=20 * horus.ms,  # 20ms budget
    on_miss="warn",
    pubs=["path"],
)
```

---

## Physics Integration with dt()

Use `dt()` for frame-rate-independent physics.
In deterministic mode, `dt()` returns a fixed value, making simulations reproducible:

```python
# simplified
import math
import horus

# Latest commanded velocities and the integrated pose
linear, angular = 0.0, 0.0
pose = {"x": 0.0, "y": 0.0, "theta": 0.0}

def physics_tick(node):
    global linear, angular
    cmd = node.recv("cmd_vel")
    if cmd:
        linear = cmd.linear
        angular = cmd.angular

    # dt() adapts to actual tick rate; fixed in deterministic mode
    dt = horus.dt()

    # Integrate heading first, then advance along that heading
    pose["theta"] += angular * dt
    pose["x"] += linear * math.cos(pose["theta"]) * dt
    pose["y"] += linear * math.sin(pose["theta"]) * dt

    node.send("odom", dict(pose))
```

---

## Reproducible Random Numbers

`rng_float()` draws from system entropy normally, but returns tick-seeded deterministic values when `deterministic=True`. Use it instead of `random.random()` for reproducible behavior in replay:

```python
# simplified
import horus

def sensor_tick(node):
    # Noise is reproducible across runs in deterministic mode
    noise = horus.rng_float() * 0.01
    reading = 9.81 + noise
    node.send("accel_z", {"value": reading})
```

---

## See Also

- [Scheduler API](/python/api/scheduler) — Deterministic mode, tick_rate
- [Node API](/python/api/node) — Budget, deadline, on_miss kwargs
- [Deterministic Mode](/advanced/deterministic-mode) — Full deterministic mode guide
- [Rust Duration/Frequency API](/rust/api/duration-ext) — Rust equivalent (`100_u64.hz()`, `200_u64.us()`)

---

## Node Configuration Composition (Python)

Path: /python/builder-composition
Description: How Python Node() parameters interact, compose, and override each other — the complete reference for combining rate, compute, budget, on, and friends

# Node Configuration Composition (Python)

You know what `rate` does. You know what `compute` does. But what happens when you pass both to `Node()`? Does `rate` make it RT, or does `compute` override that? What if you add `budget` on top? Does it matter whether you also set `on_miss`?

These are the questions that trip up every Python HORUS developer eventually.
Each parameter's documentation explains what it does in isolation, but the real power — and the real confusion — comes from combining them. This page is the complete reference for how `Node()` parameters interact.

## The Core Rule: All-at-Once Resolution

Every `Node()` parameter is just **a value on a form**. When you call `horus.run()` or `sched.run()`, the scheduler looks at *everything* you passed and resolves the configuration in one pass.

This means **parameter order does not matter** — because there is no order. Python kwargs are evaluated together:

```python
# simplified
import horus

# These three produce the EXACT same node configuration:
horus.Node(name="a", tick=my_fn, rate=100, compute=True, order=5)
horus.Node(name="a", tick=my_fn, compute=True, order=5, rate=100)
horus.Node(name="a", tick=my_fn, order=5, compute=True, rate=100)
```

All three result in a **Compute** node that ticks at most 100 times per second. The scheduler sees both `rate=100` and `compute=True`, and `compute=True` wins — it determines the execution class, while `rate` becomes a frequency cap.

## The `rate` Dual Meaning

This is the single most important interaction to understand. `rate` changes its behavior based on what else you pass:

```python
# simplified
import horus

us = horus.us  # 1e-6

# Scenario A: rate alone → RT
motor = horus.Node(
    name="motor",
    tick=motor_tick,
    rate=1000,  # → Rt class, budget=800μs, deadline=950μs
)

# Scenario B: rate + compute → just a frequency limiter
planner = horus.Node(
    name="planner",
    tick=planner_tick,
    rate=10,       # → just "tick at most 10x/sec"
    compute=True,  # → Compute class, NO budget, NO deadline
)
```

**Why?** Because "run at 1,000 Hz" and "run at most 10 times per second" are different intents. A motor controller running at 1,000 Hz needs a dedicated thread, timing enforcement, and deadline monitoring. A path planner ticking at 10 Hz just needs a frequency cap — it is CPU-bound work that runs on the thread pool.
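
One way to hold the dual meaning in your head is as a small classifier over the arguments. The function below is an illustrative model in plain Python, not the scheduler's code: the name `classify`, its keyword arguments, and the `is_async` flag (standing in for "the tick is an `async def`") are assumptions. It covers only the cases where at most one class selector is passed and deliberately refuses to model conflicts.

```python
def classify(*, rate=None, compute=False, is_async=False, on=None,
             budget=None, deadline=None):
    """Illustrative model of how one selector decides the execution class."""
    selectors = [compute, is_async, on is not None]
    if sum(selectors) > 1:
        # Conflicting selectors have their own rules; this sketch skips them.
        raise ValueError("this sketch handles at most one class selector")
    if on is not None:
        return "Event"       # rate, if present, is ignored
    if is_async:
        return "AsyncIo"     # rate, if present, is a frequency cap
    if compute:
        return "Compute"     # rate, if present, is a frequency cap
    if rate is not None or budget is not None or deadline is not None:
        return "Rt"          # timing parameters alone imply real-time
    return "BestEffort"

print(classify(rate=1000))               # Scenario A
print(classify(rate=10, compute=True))   # Scenario B
```

Reading the function top to bottom gives the same answer as the scenarios above: a class selector always decides the class, and `rate` only implies real-time when no selector is present.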

The resolution rule:

| `rate` combined with... | Resulting class | `rate` means... |
|---|---|---|
| Nothing else | **Rt** | "This node has real-time timing requirements" |
| `compute=True` | **Compute** | "Tick at most N times per second" (frequency cap) |
| `async def tick` | **AsyncIo** | "Tick at most N times per second" (frequency cap) |
| `on="topic"` | **Event** | Ignored — Event nodes trigger on messages, not time |
| `budget` or `deadline` only | **Rt** | Both rate and explicit timing — RT with overrides |

### What `rate` Auto-Derives (Rt Only)

When `rate` results in the Rt class, it auto-derives timing parameters you did not set:

```python
# simplified
import horus

us = horus.us
ms = horus.ms

# rate alone — everything auto-derived
sensor = horus.Node(
    name="sensor",
    tick=sensor_tick,
    rate=100,  # period = 10ms
)
# Auto-derived: budget = 8ms (80%), deadline = 9.5ms (95%)

# rate + explicit budget — budget overrides, deadline still auto
sensor = horus.Node(
    name="sensor",
    tick=sensor_tick,
    rate=100,        # period = 10ms
    budget=5 * ms,   # explicit budget overrides 80% default
)
# Result: budget = 5ms (explicit), deadline = 9.5ms (still auto-derived)

# rate + both explicit — full manual control
sensor = horus.Node(
    name="sensor",
    tick=sensor_tick,
    rate=100,
    budget=5 * ms,
    deadline=8 * ms,  # explicit deadline overrides 95% default
)
# Result: budget = 5ms, deadline = 8ms (both explicit)
```

| What you set | Budget | Deadline |
|---|---|---|
| `rate=100` only | 8ms (80% of 10ms) | 9.5ms (95% of 10ms) |
| `rate=100, budget=5*ms` | 5ms (explicit) | 9.5ms (auto) |
| `rate=100, deadline=8*ms` | 8ms (auto 80%) | 8ms (explicit) |
| `rate=100, budget=5*ms, deadline=8*ms` | 5ms | 8ms |
| `budget=5*ms` only (no rate) | 5ms | 5ms (deadline = budget) |
| `deadline=8*ms` only (no rate) | None | 8ms |

The auto-derivation formula: **budget = 80% of period**, **deadline = 95% of period**.
These defaults give your tick 80% of the period to finish, with a buffer of 15% of the period between the budget and the hard deadline. If your tick consistently runs within the budget, it finishes at least 15% of the period before the deadline fires and 20% before the next period begins.

## Full Interaction Matrix

This table shows what happens when you combine any two configuration parameters. Read it as: "row parameter + column parameter produces what result?"

### Execution Class Parameters

Only one execution class can be active. If you pass multiple, **the last-evaluated one wins** (with a warning logged):

```python
# simplified
# DON'T DO THIS: compute is overridden and a warning is logged
node = horus.Node(
    name="confused",
    tick=my_tick,
    compute=True,  # overridden
    on="scan",     # wins → Event class
)
# Warning: "confused: compute=True overridden by on='scan' — only one execution class applies"
```

| First | + Second | Result | Notes |
|---|---|---|---|
| `compute=True` | `on="topic"` | Event | Warning: `compute` overridden |
| `compute=True` | `async def tick` | AsyncIo | Warning: `compute` overridden |
| `on="topic"` | `compute=True` | Compute | Warning: `on` overridden |
| `on="topic"` | `async def tick` | AsyncIo | Warning: `on` overridden |
| `async def tick` | `compute=True` | **Error** | Mutually exclusive |
| `async def tick` | `on="topic"` | **Error** | Mutually exclusive |

### RT-Only Parameters on Non-RT Nodes

Some parameters only make sense for RT nodes.
Using them on the wrong execution class produces warnings or errors:

| Parameter | On Rt node | On Compute node | On Event node | On AsyncIo node | On BestEffort node |
|---|---|---|---|---|---|
| `budget` | Sets budget | **Error** | **Error** | **Error** | Promotes to Rt |
| `deadline` | Sets deadline | **Error** | **Error** | **Error** | Promotes to Rt |
| `on_miss` | Sets policy | Warning (no effect) | Warning (no effect) | Warning (no effect) | Warning (no effect) |
| `priority` | Sets OS priority | Warning (ignored) | Warning (ignored) | Warning (ignored) | Warning (ignored) |
| `core` | Pins to CPU | Warning (ignored) | Warning (ignored) | Warning (ignored) | Warning (ignored) |
| `watchdog` | Per-node watchdog | Works | Works | Works | Works |
| `rate` | Sets tick rate | Frequency cap | Ignored | Frequency cap | Promotes to Rt |
| `order` | Sets order | Sets order | Sets order | Sets order | Sets order |
| `failure_policy` | Sets policy | Sets policy | Sets policy | Sets policy | Sets policy |

### The Promotion and Conflict Summary

To make the interaction rules concrete, here is every path to each execution class:

| Execution class | How to get it |
|---|---|
| **Rt** | `rate` alone; `budget` alone; `deadline` alone; `rate` + `budget`; `rate` + `deadline` |
| **Compute** | `compute=True` (optionally with `rate` as frequency cap) |
| **Event** | `on="topic"` (rate is ignored if present) |
| **AsyncIo** | `async def tick` (optionally with `rate` as frequency cap) |
| **BestEffort** | No rate, no compute, no on, no budget, no deadline, sync `def tick` |

## Goal-Oriented Recipes

Instead of "what does this parameter do?", here is "I need X — which parameters do I pass?"
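
The recipes below quote auto-derived budgets and deadlines. The arithmetic behind those numbers can be reproduced in plain Python, using the 80%/95% rule and the startup checks described on this page. This is a sketch, not the scheduler's implementation: the function name `derive_timing` is hypothetical, and it models only Rt nodes that have a `rate`.

```python
def derive_timing(rate_hz, budget=None, deadline=None):
    """Return (budget, deadline) in seconds for an Rt node with a rate."""
    period = 1.0 / rate_hz
    if budget is None:
        budget = 0.80 * period    # documented default: 80% of the period
    if deadline is None:
        deadline = 0.95 * period  # documented default: 95% of the period
    if budget <= 0:
        raise ValueError("budget must be positive")
    if budget > deadline:
        raise ValueError("budget must not exceed deadline")
    return budget, deadline

# 100 Hz sensor with everything auto-derived.
print(derive_timing(100))
# 1 kHz controller with both values set explicitly.
print(derive_timing(1000, budget=300e-6, deadline=900e-6))
```

Running it for a rate you plan to use is a quick way to check whether the defaults leave enough headroom before you set explicit values.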

### "100 Hz sensor driver with deadline monitoring"

```python
# simplified
import horus

imu_driver = horus.Node(
    name="imu_driver",
    tick=read_imu,
    order=1,         # After safety monitor (order 0)
    rate=100,        # 10ms period → Rt class
    on_miss="skip",  # Drop a reading if we are late
    subs=["imu.config"],
    pubs=[horus.Imu],
)
```

**Why these parameters**: `rate=100` alone triggers Rt class with auto-derived 8ms budget and 9.5ms deadline. `on_miss="skip"` means if the driver stalls waiting for hardware, skip one reading rather than accumulating delay. `order=1` runs after safety-critical nodes.

**What removing each parameter changes**:

- Remove `rate` → BestEffort, no timing enforcement at all
- Remove `on_miss` → defaults to `"warn"` (logs but takes no action)
- Remove `order` → defaults to 100 (normal priority)

### "Background logger that must not starve RT nodes"

```python
# simplified
import horus

logger = horus.Node(
    name="logger",
    tick=log_data,
    order=200,                # Runs last
    compute=True,             # Thread pool — not the main tick thread
    rate=10,                  # At most 10x/sec (NOT RT!)
    failure_policy="ignore",  # Never crash for logging
    subs=["imu", "cmd_vel", "scan"],
)
```

**Why `compute=True` and not just BestEffort**: A logger doing disk I/O in the main loop would block all BestEffort nodes behind it. `compute=True` moves it to the thread pool. `rate=10` caps frequency (NOT RT — `compute=True` overrides that).

**What if you used `async def tick` instead**: Also works if your logger does network I/O (cloud upload). Use `async def tick` for network I/O, `compute=True` for local file I/O with CPU-bound formatting.

### "Event-driven planner that reacts to new scans"

```python
# simplified
import horus

planner = horus.Node(
    name="planner",
    tick=plan_path,
    order=5,
    on="lidar.scan",  # Sleep until new scan arrives
    subs=[horus.LaserScan],
    pubs=["path"],
)
```

**Why not `rate`**: The planner has nothing to do until a new scan arrives. Polling at a fixed rate wastes CPU.
`on="lidar.scan"` means zero CPU when idle, instant wake on new data.

**Can you add `budget` to an Event node?** No — this is an error at startup. Event nodes trigger on data arrival, not on a fixed schedule, so deadline enforcement does not apply.

### "1 kHz motor controller on production hardware"

```python
# simplified
import horus

us = horus.us

motor_ctrl = horus.Node(
    name="motor_ctrl",
    tick=control_motors,
    order=0,              # Highest priority
    rate=1000,            # 1ms period → Rt
    budget=300 * us,      # Must finish in 300μs
    deadline=900 * us,    # Hard wall at 900μs
    on_miss="safe_mode",  # Hold position on overrun
    priority=90,          # OS-level SCHED_FIFO priority
    core=0,               # Pinned to CPU 0
    subs=[horus.CmdVel],
    pubs=["motor.pwm"],
)
```

**Why explicit `budget` and `deadline`**: Auto-derived values (800μs budget, 950μs deadline at 1 kHz) are generous defaults. After profiling, you know the motor controller takes ~200μs. Setting `budget=300*us` with `deadline=900*us` gives a tighter budget for monitoring while leaving headroom before the deadline fires.

**Why `priority=90` and `core=0`**: On a multi-core robot computer, pinning the motor controller to an isolated CPU core eliminates jitter from OS scheduling and cache migration. `priority=90` ensures the kernel never preempts this thread for normal processes.

### "ML inference that takes 50-200ms"

```python
# simplified
import horus

detector = horus.Node(
    name="yolo_detector",
    tick=run_inference,
    order=10,
    compute=True,  # Thread pool — long-running is fine
    subs=[horus.Image],
    pubs=["detections"],
)
```

**Why no `rate`**: ML inference time varies (50-200ms depending on scene complexity). A fixed rate would either waste CPU (rate too low) or queue up work (rate too high). Let it run as fast as it can on the thread pool.

**Why not `async def tick`**: ML inference is CPU-bound, not I/O-bound. `compute=True` runs on a CPU thread pool optimized for parallel work. `async def tick` runs on the async runtime, which is optimized for I/O waiting.

### "Safety monitor that must never miss"

```python
# simplified
import horus

us = horus.us
ms = horus.ms

safety_monitor = horus.Node(
    name="safety_monitor",
    tick=check_safety,
    order=0,                 # Runs first, always
    rate=1000,               # Matches fastest control loop
    budget=100 * us,         # Must be extremely fast
    deadline=200 * us,       # Tight deadline
    on_miss="stop",          # Kill everything if this misses
    priority=99,             # Maximum OS priority
    core=1,                  # Dedicated CPU core
    watchdog=5 * ms,         # Tight per-node watchdog
    failure_policy="fatal",  # Panic if tick() raises
    pubs=["safety.status"],
)
```

**Every parameter is load-bearing**: Remove any one and you lose a safety guarantee. This is the maximum-configuration pattern for the most critical node in your system.

### "Async telemetry uploader with graceful degradation"

```python
# simplified
import horus
import aiohttp

async def upload_tick(node):
    if node.has_msg("telemetry"):
        data = node.recv("telemetry")
        try:
            async with aiohttp.ClientSession() as session:
                await session.post("https://api.example.com/telemetry", json=data)
        except aiohttp.ClientError:
            node.log_warning("Upload failed — will retry next tick")

uploader = horus.Node(
    name="uploader",
    tick=upload_tick,         # async def → AsyncIo (auto-detected)
    rate=1,                   # At most 1x/sec (frequency cap, not RT)
    order=200,                # Low priority
    failure_policy="ignore",  # Never crash for telemetry
    subs=["telemetry"],
)
```

**Why no `compute=True`**: The `async def tick` is auto-detected and classified as AsyncIo. Adding `compute=True` would be an error — they are mutually exclusive.
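
How HORUS performs this auto-detection is not shown on this page. For intuition, the standard library can already tell an `async def` from a regular function, as the sketch below shows. It uses only `inspect.iscoroutinefunction`; the helper name `looks_async` and both tick functions are placeholders.

```python
import inspect

async def upload_tick(node):
    """Placeholder async tick."""

def log_tick(node):
    """Placeholder synchronous tick."""

def looks_async(tick) -> bool:
    """True when tick was defined with `async def`."""
    return inspect.iscoroutinefunction(tick)

for fn in (upload_tick, log_tick):
    kind = "async def" if looks_async(fn) else "def"
    print(f"{fn.__name__}: {kind}")
```

The check looks at how the function was defined, not at what it does, so a plain `def` that returns a coroutine object would not be recognized by this particular test.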

### "Event-driven emergency stop handler"

```python
# simplified
import horus

def handle_estop(node):
    node.log_error("EMERGENCY STOP received!")
    node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0))
    node.request_stop()

estop = horus.Node(
    name="estop",
    tick=handle_estop,
    order=0,                 # Highest priority
    on="emergency.stop",     # Only fires when message arrives
    failure_policy="fatal",  # If this fails, everything stops
    subs=["emergency.stop"],
    pubs=[horus.CmdVel],
)
```

**Why `on` instead of `rate`**: An emergency stop handler should not be polling. It should sleep and use zero CPU until the moment an `emergency.stop` message arrives. `on="emergency.stop"` provides instant wake-up with no wasted cycles.

## What Happens If I...

Quick answers to common "what if" questions.

**"...pass `rate` and `compute=True`?"**
Compute class. `rate` becomes a frequency cap, not RT. No budget, no deadline, no timing enforcement.

**"...pass `budget` without `rate`?"**
RT class. `budget` alone implies "this node has timing requirements." Deadline auto-derived as `deadline = budget`.

**"...pass `deadline` without `rate` or `budget`?"**
RT class. Budget is not set (no auto-derivation without `rate`). The scheduler monitors wall time against the deadline.

**"...set `budget` larger than `deadline`?"**
Error at startup. Budget is "expected time," deadline is "maximum time." A budget larger than the deadline means you expect the work to take longer than the hard limit — that is a configuration mistake.

**"...set `budget=0`?"**
Error at startup. Zero budget is meaningless.

**"...pass `on_miss="stop"` on a Compute node?"**
Warning: "has no effect without a deadline." Compute nodes have no deadline, so the miss policy can never trigger. The node runs fine, but `on_miss` does nothing.

**"...pass `priority=99` on a Compute node?"**
Warning: "only RT nodes get SCHED_FIFO threads." The priority is ignored. The node runs fine — it just does not get OS-level priority.
**"...pass `on=""` (empty topic string)?"**
Error at startup. An Event node with an empty topic can never trigger.

**"...pass `compute=True` with `async def tick`?"**
Error at startup. These are mutually exclusive. `async def tick` runs on the async I/O runtime; `compute=True` runs on the CPU thread pool. Pick one.

**"...pass `on="topic"` with `async def tick`?"**
Error at startup. Event-driven triggering and async I/O are mutually exclusive.

**"...just pass `name` and `tick` with no other parameters?"**
BestEffort class at `rate=30` (the default) and `order=100`. Ticks in the main loop at the scheduler's global rate. This is the simplest valid configuration.

**"...pass `rate=30` (the default): is that RT?"**
Yes. An explicitly passed `rate` with no other class parameter always means RT, even when the value equals the default. `rate=30` produces an RT node with a 26.7ms budget (80% of 33.3ms) and 31.7ms deadline (95%). If you want 30 Hz without RT, add `compute=True` or use `async def tick`.

**"...use `horus.run()` with `rt=True` and all nodes are Compute?"**
`rt=True` on the scheduler enables system-level RT features (memory locking, SCHED_FIFO). But individual nodes still get their own execution class. A `compute=True` node stays on the thread pool even with `rt=True` on the scheduler; RT thread allocation only happens for nodes classified as Rt.

## Anti-Patterns

### Cargo-culting RT configuration

```python
# simplified
# WRONG: Adding RT parameters "just in case" to a logger
logger = horus.Node(
    name="logger",
    tick=log_data,
    rate=100,
    budget=5 * horus.ms,
    priority=50,
    core=3,
)
```

This wastes a dedicated CPU thread and an entire CPU core on a logger. The `rate=100` alone makes it RT, `budget` confirms RT, and then `priority` and `core` pin it to real hardware resources. Use `compute=True`, or drop the RT parameters (including the explicit `rate`) and leave it as a BestEffort node at the default rate.
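To see what `rate=100` alone already commits the logger to, the 80% / 95% auto-derivation described under "What Happens If I..." can be written out. This is a sketch of the arithmetic only; `derive_rt_timing` is a hypothetical helper, and the two fractions are taken from the `rate=30` answer above rather than from the HORUS source.

```python
def derive_rt_timing(rate_hz, budget_fraction=0.80, deadline_fraction=0.95):
    """Return (period_ms, budget_ms, deadline_ms) for a rate-only RT node."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    period_ms = 1000.0 / rate_hz
    return period_ms, period_ms * budget_fraction, period_ms * deadline_fraction


for rate in (30, 100, 1000):
    period, budget, deadline = derive_rt_timing(rate)
    print(f"rate={rate:>4} Hz  period={period:7.3f} ms  "
          f"budget={budget:7.3f} ms  deadline={deadline:7.3f} ms")
```

Under these assumptions the logger at `rate=100` gets an 8 ms budget and a 9.5 ms deadline before `priority` and `core` are even considered.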
### Using `compute=True` for everything ```python # simplified # WRONG: Motor controller on the thread pool motor_ctrl = horus.Node( name="motor_ctrl", tick=control_motors, compute=True, # No timing guarantees! rate=1000, ) ``` `compute=True` with `rate=1000` gives you a frequency cap, not RT. The motor controller has no budget, no deadline, and no `on_miss` policy. When the thread pool is busy with other Compute nodes, the motor controller waits its turn. Use `rate=1000` alone for nodes with timing requirements. ### Deadline without a response plan ```python # simplified # QUESTIONABLE: Deadline set but using default on_miss="warn" motor_ctrl = horus.Node( name="motor_ctrl", tick=control_motors, rate=1000, budget=300 * horus.us, deadline=900 * horus.us, # on_miss defaults to "warn" ) ``` If you have set explicit budget and deadline, you have decided this node's timing matters. But the default `on_miss="warn"` just logs a warning and continues — the robot keeps moving with a late motor command. Add `on_miss="safe_mode"` or `on_miss="skip"` to define what should actually happen. ### Mixing intent across classes ```python # simplified # WRONG: Event node that also needs a deadline handler = horus.Node( name="handler", tick=handle_command, on="command", deadline=10 * horus.ms, # ERROR at startup! ) ``` Event nodes trigger on messages, not time. A deadline ("must finish within 10ms of... what?") does not apply because there is no periodic schedule to miss. If you need deadline enforcement, use `rate` instead of `on` and poll the topic in your tick function. ### on_miss without a deadline ```python # simplified # WARNING: on_miss has nothing to enforce processor = horus.Node( name="processor", tick=process_data, compute=True, rate=50, on_miss="stop", # Warning: no deadline to miss ) ``` `compute=True` means no deadline. `on_miss="stop"` says "stop the scheduler if I miss my deadline," but there is no deadline to miss. 
This builds successfully with a warning, but `on_miss` is dead code. ### priority on non-RT nodes ```python # simplified # WARNING: priority is ignored on non-RT nodes planner = horus.Node( name="planner", tick=plan_path, compute=True, rate=10, priority=90, # Warning: only RT gets SCHED_FIFO ) ``` `priority` sets the OS-level `SCHED_FIFO` scheduling priority, which only applies to dedicated RT threads. Compute nodes run on the thread pool where OS priority is managed by the thread pool itself. This builds with a warning, and `priority=90` is ignored. ## Putting It All Together: Complete System ```python # simplified import horus us = horus.us ms = horus.ms # --- Tick functions --- def check_safety(node): """Verify all systems nominal.""" imu = node.recv("imu") if imu and abs(imu.accel_z) < 5.0: node.send("safety.status", {"ok": False, "reason": "freefall"}) node.request_stop() else: node.send("safety.status", {"ok": True}) def read_imu(node): """Read IMU hardware, publish typed message.""" reading = read_hardware_imu() node.send("imu", horus.Imu( accel_x=reading[0], accel_y=reading[1], accel_z=reading[2], gyro_x=reading[3], gyro_y=reading[4], gyro_z=reading[5], )) def control_motors(node): """PID loop: read cmd_vel, write motor PWM.""" cmd = node.recv("cmd_vel") if cmd: left = cmd.linear - cmd.angular * 0.3 right = cmd.linear + cmd.angular * 0.3 node.send("motor.pwm", {"left": left, "right": right}) def handle_estop(node): """Immediate stop on emergency signal.""" node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) node.log_error("EMERGENCY STOP activated") node.request_stop() def plan_path(node): """CPU-heavy path planning from latest scan.""" scan = node.recv("scan") if scan: path = compute_path(scan) node.send("path", path) def run_inference(node): """ML object detection (variable duration).""" img = node.recv("camera.rgb") if img: detections = model.predict(img.to_numpy()) for det in detections: node.send("detections", { "class": det.class_name, "confidence":
float(det.confidence), "bbox": [det.x1, det.y1, det.x2, det.y2], }) async def upload_telemetry(node): """Async cloud upload — network I/O.""" import aiohttp if node.has_msg("telemetry"): data = node.recv("telemetry") try: async with aiohttp.ClientSession() as session: await session.post("https://telemetry.example.com/v1", json=data) except Exception as e: node.log_warning(f"Upload failed: {e}") def update_dashboard(node): """Low-priority display update.""" stats = {"tick": horus.tick(), "elapsed": horus.elapsed()} node.send("dashboard", stats) # --- Node definitions --- # Safety monitor — maximum everything safety = horus.Node( name="safety_monitor", tick=check_safety, order=0, rate=1000, budget=100 * us, deadline=200 * us, on_miss="stop", priority=99, core=1, watchdog=5 * ms, failure_policy="fatal", subs=[horus.Imu], pubs=["safety.status"], ) # IMU driver — RT with auto-derived timing imu = horus.Node( name="imu_driver", tick=read_imu, order=1, rate=200, on_miss="skip", pubs=[horus.Imu], ) # Motor controller — strict RT motor = horus.Node( name="motor_ctrl", tick=control_motors, order=2, rate=500, on_miss="safe_mode", priority=80, core=0, subs=[horus.CmdVel], pubs=["motor.pwm"], ) # Emergency stop — event-driven, zero CPU when idle estop = horus.Node( name="estop", tick=handle_estop, order=0, on="emergency.stop", failure_policy="fatal", subs=["emergency.stop"], pubs=[horus.CmdVel], ) # Path planner — CPU-heavy, no rate constraint planner = horus.Node( name="planner", tick=plan_path, order=10, compute=True, subs=[horus.LaserScan], pubs=["path"], ) # ML detector — CPU-heavy, rate-limited detector = horus.Node( name="detector", tick=run_inference, order=11, compute=True, rate=10, subs=[horus.Image], pubs=["detections"], ) # Cloud telemetry — async I/O telemetry = horus.Node( name="telemetry", tick=upload_telemetry, rate=1, order=100, failure_policy="ignore", subs=["telemetry"], ) # Dashboard — BestEffort, no special needs dashboard = horus.Node( name="dashboard", 
tick=update_dashboard, order=200, pubs=["dashboard"], ) # --- Run the system --- sched = horus.Scheduler(tick_rate=500, watchdog_ms=500) sched.add(safety) sched.add(imu) sched.add(motor) sched.add(estop) sched.add(planner) sched.add(detector) sched.add(telemetry) sched.add(dashboard) sched.run() ``` Each node uses exactly the parameters it needs — no more, no less: | Node | Class | Why | |------|-------|-----| | `safety_monitor` | Rt | Every RT parameter enabled — the most critical node | | `imu_driver` | Rt | `rate=200` alone, auto-derived timing, skip on miss | | `motor_ctrl` | Rt | `rate=500` with explicit priority and CPU pinning | | `estop` | Event | `on="emergency.stop"` — zero CPU until triggered | | `planner` | Compute | `compute=True` — CPU-heavy, runs on thread pool | | `detector` | Compute | `compute=True, rate=10` — thread pool with frequency cap | | `telemetry` | AsyncIo | `async def tick` — auto-detected, network I/O | | `dashboard` | BestEffort | No class parameters — runs in main loop at default rate | ## Design Decisions **Why is parameter order irrelevant (all-at-once resolution)?** Python kwargs have no meaningful order. But even if they did, the scheduler resolves everything together at startup because the alternative — resolving eagerly as each parameter is parsed — creates subtle bugs. In an eager system, `rate=100, compute=True` and `compute=True, rate=100` could produce different nodes. All-at-once resolution eliminates this entire class of bugs. **Why does `rate` change meaning based on context?** The alternative was having two parameters: `rate` for RT and `frequency_cap` for non-RT. But this forces developers to understand execution classes before they can set a tick rate. With the current design, the intent is clear from context: `rate=1000` alone means "timing matters" (Rt), while `rate=10, compute=True` means "do not run too often" (frequency cap). The mental model is "describe what you need, the scheduler figures out how to run it." 
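The all-at-once rule and the context-dependent meaning of `rate` can be sketched as a single pure function over the full set of keyword arguments. This is an illustrative model of the rules stated on this page, not the scheduler's real code; `resolve_class` is a hypothetical name, and letting `on` win over `compute` when both are set is an assumption, since the page does not say which one overrides.

```python
def resolve_class(*, rate=None, budget=None, deadline=None,
                  compute=False, on=None, is_async=False):
    """Map a complete kwarg set to an execution class name."""
    if is_async and (compute or on is not None):
        raise ValueError("async def tick excludes compute=True and on=...")
    if is_async:
        return "AsyncIo"
    if on is not None:
        return "Event"      # assumption: `on` overrides `compute`
    if compute:
        return "Compute"    # any `rate` here is only a frequency cap
    if rate is not None or budget is not None or deadline is not None:
        return "Rt"         # timing parameters alone imply RT
    return "BestEffort"


# Keyword order cannot matter: the function only ever sees the whole set.
a = resolve_class(rate=100, compute=True)
b = resolve_class(compute=True, rate=100)
```

Since nothing is decided until every parameter is known, there is no intermediate state in which `rate=100` has already been interpreted as RT before `compute=True` is seen.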
**Why errors instead of silent fixes for invalid combinations?** Setting `budget` on a Compute node is almost always a mistake — the developer thinks they are getting timing enforcement, but Compute nodes do not have it. Silently ignoring the budget would hide the bug. Erroring at startup catches it immediately, before the robot moves. The principle: configuration mistakes should fail fast, not fail silently on the factory floor. **Why is `async def tick` strict about combinations while `compute` and `on` are lenient?** `compute=True` and `on="topic"` are simple flags that the scheduler can reason about — when both are present, one clearly overrides the other. But `async def tick` fundamentally changes how the function runs (coroutine vs regular function). Running an async function on a synchronous thread pool (`compute`) would require wrapping it in `asyncio.run()` per tick, adding 50-100μs of event loop overhead. Rather than silently degrading performance, the scheduler rejects the combination. **Why kwargs instead of a builder chain?** Python does not have HORUS's Rust builder pattern (`.rate(100).compute().build()`). The idiomatic Python equivalent is kwargs: `Node(rate=100, compute=True)`. This gives the same "fill out a form" mental model with standard Python syntax. IDE auto-complete works, type checkers validate parameter types, and `help(horus.Node)` shows every option. 
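The fail-fast principle pairs naturally with keyword arguments: a keyword-only constructor can validate the whole form before anything runs. The sketch below is a stand-in written for this page, not `horus.Node`; the class name `NodeConfig`, the use of seconds for durations, and the error messages are assumptions. The rejected combinations are the ones this page lists as startup errors.

```python
class NodeConfig:
    """Hypothetical stand-in for a kwargs-configured node (durations in seconds)."""

    def __init__(self, *, name, rate=None, budget=None, deadline=None,
                 compute=False, on=None):
        if budget is not None and budget <= 0:
            raise ValueError(f"{name}: budget must be positive")
        if budget is not None and deadline is not None and budget > deadline:
            raise ValueError(f"{name}: budget exceeds deadline")
        if on is not None and on == "":
            raise ValueError(f"{name}: `on` needs a non-empty topic")
        if on is not None and deadline is not None:
            raise ValueError(f"{name}: event nodes cannot have a deadline")
        if compute and budget is not None:
            raise ValueError(f"{name}: budget has no effect on a Compute node")
        self.name, self.rate, self.budget = name, rate, budget
        self.deadline, self.compute, self.on = deadline, compute, on


def try_build(**kwargs):
    """Return the error text for an invalid form, or None if it is accepted."""
    try:
        NodeConfig(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None
```

Every check runs in the constructor, so a bad combination surfaces when the node is defined rather than when the robot is already moving.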
## Trade-offs | Gain | Cost | |------|------| | **All-at-once resolution** — no "pass this last" bugs | Must understand that `rate` alone means RT | | **`rate` dual meaning** — one parameter, context-dependent behavior | Must know that `rate` + `compute=True` is NOT RT | | **Strict validation** — catches mistakes at startup | Learning curve: must understand which combinations are valid | | **RT auto-detection** — no explicit `rt=True` per node | Less visible which nodes are RT (use `horus monitor` to check) | | **Warnings for ignored parameters** — `priority` on Compute logs a warning | Warning fatigue if you are intentionally experimenting | | **`async def` auto-detection** — zero config for async I/O | Cannot use async tick on compute thread pool (must choose) | ## See Also - [Python API Reference](/python/api) — Full Node, Scheduler, Topic, and Clock API - [Async Nodes](/python/api/async-nodes) — Async I/O patterns and best practices - [Execution Classes](/concepts/execution-classes) — The 5 classes and when to use each - [Choosing Configuration](/concepts/choosing-configuration) — Progressive complexity guide (Levels 0-5) - [Builder Composition (Concepts)](/concepts/builder-composition) — Language-agnostic composition reference --- ## Navigation Messages Path: /python/messages/navigation Description: Goals, paths, maps, and planning types for Python robotics navigation — every method explained # Navigation Messages Navigation messages represent the "where do I go?" problem — goals, paths, maps, and planning. These types work together: set a `NavGoal`, follow a `NavPath`, build an `OccupancyGrid`, plan with `CostMap`. **When you need these:** Any robot that moves autonomously. Mobile robots, delivery bots, drones with waypoint missions, cleaning robots with coverage plans. 
```python # simplified from horus import ( NavGoal, GoalResult, Waypoint, NavPath, PathPlan, OccupancyGrid, CostMap, VelocityObstacle, VelocityObstacles, ) ``` --- ## NavGoal A navigation goal with tolerance-based arrival checking. This is the core of every "go to point" behavior — you set a target, then check `is_reached()` every tick until the robot arrives. ### Constructor ```python # simplified goal = NavGoal(x=10.0, y=5.0, theta=0.0, position_tolerance=0.1, angle_tolerance=0.05) ``` - `position_tolerance`: meters. The robot must be within this distance of (x, y). - `angle_tolerance`: radians. The robot's heading must be within this of theta. ### `.is_reached(current_pose)` — Has the Robot Arrived? ```python # simplified from horus import Pose2D current = Pose2D(x=9.95, y=5.02, theta=0.01) if goal.is_reached(current): print("Goal reached!") ``` The single most important navigation method. Returns `True` when **both** conditions are met: 1. Distance from current position to goal position < `position_tolerance` 2. Angular difference between current heading and goal heading < `angle_tolerance` This is what you call every tick in a navigation loop. When it returns `True`, send a stop command. > **Common mistake:** Using `Pose2D.distance_to()` instead of `is_reached()`. Distance only checks position — your robot can be at the right spot but facing the wrong direction. `is_reached()` checks both. ### `.is_position_reached(current_pose)` — Position Only ```python # simplified if goal.is_position_reached(current): print("At the right position (heading may differ)") ``` Checks position tolerance only, ignores heading. Use this when heading doesn't matter — picking up an object (you just need to be close), reaching a charging dock (alignment handled by a separate docking controller). 
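The arrival checks described so far reduce to a distance test and a wrapped-angle test. The sketch below reimplements them in plain Python to make the logic explicit; the `Pose` and `Goal` classes are hypothetical stand-ins, not the HORUS types, and the strict `<` comparison follows the wording above.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    theta: float


@dataclass
class Goal:
    x: float
    y: float
    theta: float
    position_tolerance: float
    angle_tolerance: float

    def is_position_reached(self, pose: Pose) -> bool:
        return math.hypot(self.x - pose.x, self.y - pose.y) < self.position_tolerance

    def _heading_within_tolerance(self, pose: Pose) -> bool:
        # Wrap the heading error into [-pi, pi] so headings on either side
        # of the +/-pi seam still count as close.
        delta = self.theta - pose.theta
        error = math.atan2(math.sin(delta), math.cos(delta))
        return abs(error) < self.angle_tolerance

    def is_reached(self, pose: Pose) -> bool:
        return self.is_position_reached(pose) and self._heading_within_tolerance(pose)


goal = Goal(x=10.0, y=5.0, theta=0.0, position_tolerance=0.1, angle_tolerance=0.05)
near = Pose(x=9.95, y=5.02, theta=0.01)
wrong_heading = Pose(x=9.95, y=5.02, theta=1.0)
```

Here `near` satisfies both conditions, while `wrong_heading` passes only the position test, which is exactly the case the "common mistake" note warns about.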
### `.is_orientation_reached(current_pose)` — Heading Only ```python # simplified if goal.is_orientation_reached(current): print("Facing the right direction (position may differ)") ``` Checks heading tolerance only, ignores position. Use this after position is already correct — for example, rotating in place to face a docking station before approaching. ### `.with_timeout(seconds)` — Set Time Limit ```python # simplified goal = NavGoal(x=10.0, y=5.0, theta=0.0, position_tolerance=0.2).with_timeout(30.0) ``` Returns a new NavGoal with a timeout. The timeout value is stored on the goal — your navigation controller checks it and aborts if time runs out. This prevents the robot from circling forever trying to reach an unreachable goal. ### `.with_priority(level)` — Set Priority ```python # simplified goal = goal.with_priority(1) # High priority ``` For multi-goal systems where goals can preempt each other. Higher priority goals get executed first. > **Common mistake:** Setting `position_tolerance` too tight. With sensor noise and control imprecision, a tolerance of 0.01m (1cm) means the robot may never "arrive" — it oscillates around the target. Use 0.05-0.2m for wheeled robots, 0.01-0.05m for precise arms. 
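Because `.with_timeout()` only stores the value on the goal, the controller has to do the bookkeeping itself. A minimal sketch of that bookkeeping, independent of HORUS: `GoalTimer` and `step` are hypothetical helpers, and the injectable `clock` exists only so the behavior can be checked without sleeping.

```python
import time


class GoalTimer:
    """Tracks how long a goal has been active and reports expiry."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._started = clock()

    def expired(self):
        return self._clock() - self._started >= self._timeout_s


def step(goal_reached, timer):
    """One controller decision per tick: 'arrived', 'abort', or 'drive'."""
    if goal_reached:
        return "arrived"
    if timer.expired():
        return "abort"   # stop the robot and report failure upstream
    return "drive"
```

A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the timeout.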
**Example — Go-to-Goal Navigation Loop:** ```python # simplified from horus import Node, run, NavGoal, Pose2D, CmdVel, Topic import math odom_topic = Topic(Pose2D) cmd_topic = Topic(CmdVel) goal = NavGoal(x=5.0, y=3.0, theta=0.0, position_tolerance=0.2, angle_tolerance=0.1).with_timeout(60.0) def navigate(node): current = odom_topic.recv(node) if current is None: return if goal.is_reached(current): cmd_topic.send(CmdVel.zero(), node) # Arrived — stop return # Simple proportional controller toward goal dx = goal.x - current.x dy = goal.y - current.y target_angle = math.atan2(dy, dx) angle_error = target_angle - current.theta # Normalize angle error to [-pi, pi] while angle_error > math.pi: angle_error -= 2 * math.pi while angle_error < -math.pi: angle_error += 2 * math.pi if abs(angle_error) > 0.3: # Turn toward goal first cmd_topic.send(CmdVel(linear=0.0, angular=0.5 * angle_error), node) else: # Drive toward goal dist = current.distance_to(Pose2D(x=goal.x, y=goal.y, theta=0.0)) speed = min(0.5, dist) # Slow down near goal cmd_topic.send(CmdVel(linear=speed, angular=0.3 * angle_error), node) run(Node(tick=navigate, rate=20, pubs=["cmd_vel"], subs=["odom"])) ``` --- ## NavPath An ordered sequence of waypoints for path following. Includes progress tracking for monitoring and pure-pursuit controllers. ### Constructor + Building ```python # simplified from horus import NavPath, Waypoint path = NavPath() path.add_waypoint(Waypoint(x=0.0, y=0.0)) path.add_waypoint(Waypoint(x=2.0, y=0.0)) path.add_waypoint(Waypoint(x=2.0, y=3.0).with_stop()) # Stop at end ``` ### `.closest_waypoint_index(current_pose)` — Nearest Waypoint ```python # simplified from horus import Pose2D current = Pose2D(x=1.5, y=0.2, theta=0.0) idx = path.closest_waypoint_index(current) print(f"Nearest waypoint: #{idx}") ``` Returns the index of the waypoint closest to the current pose, or `None` if the path is empty. 
This is the "lookahead" for pure-pursuit path following — you steer toward the closest waypoint, then advance to the next one once you pass it. ### `.calculate_progress(current_pose)` — How Far Along the Path? ```python # simplified progress = path.calculate_progress(current) print(f"Path progress: {progress*100:.0f}%") ``` Returns a float from 0.0 (at start) to 1.0 (at end). The progress is based on which waypoint is closest and how far between waypoints the robot is. Use this for: - Progress bars in monitoring dashboards - Deciding when to start slowing down (e.g., if progress > 0.9, reduce speed) - Logging and analytics **Example — Path Following with Progress:** ```python # simplified from horus import Node, run, NavPath, Waypoint, Pose2D, CmdVel, Topic path = NavPath() path.add_waypoint(Waypoint(x=0.0, y=0.0)) path.add_waypoint(Waypoint(x=3.0, y=0.0)) path.add_waypoint(Waypoint(x=3.0, y=4.0)) path.add_waypoint(Waypoint(x=0.0, y=4.0).with_stop()) odom = Topic(Pose2D) cmd = Topic(CmdVel) def follow_path(node): current = odom.recv(node) if current is None: return progress = path.calculate_progress(current) if progress >= 0.98: cmd.send(CmdVel.zero(), node) return # Steer toward closest waypoint (simplified) idx = path.closest_waypoint_index(current) if idx is not None: speed = 0.3 if progress < 0.9 else 0.1 # Slow near end cmd.send(CmdVel(linear=speed, angular=0.0), node) run(Node(tick=follow_path, rate=10, pubs=["cmd_vel"], subs=["odom"])) ``` --- ## Waypoint A single point on a path with optional velocity constraints. ### Constructor ```python # simplified wp = Waypoint(x=5.0, y=3.0, theta=0.0) ``` ### `.with_velocity(twist)` — Set Desired Velocity ```python # simplified from horus import Twist wp = Waypoint(x=5.0, y=3.0, theta=0.0).with_velocity( Twist.new_2d(linear_x=0.5, angular_z=0.0) ) ``` Returns a new Waypoint with a velocity constraint. The path follower should try to match this velocity when passing through the waypoint. 
Use for smooth trajectories where speed varies (e.g., slow down for turns). ### `.with_stop()` — Must Stop Here ```python # simplified wp = Waypoint(x=5.0, y=3.0, theta=0.0).with_stop() ``` Returns a new Waypoint that requires the robot to come to a complete stop. Use for the final waypoint, or at intermediate points where the robot must pause (e.g., intersection, pickup location). --- ## GoalResult Feedback from the navigation system about goal execution. ### `.with_error(message)` — Attach Error Message ```python # simplified result = GoalResult(goal_id=1, status=3) # Aborted result = result.with_error("Path blocked by obstacle") ``` Returns a new GoalResult with an error message. Use when the navigation fails — the message explains why (timeout, obstacle, invalid goal, etc.). --- ## OccupancyGrid A 2D grid map where each cell is free, occupied, or unknown. The foundation for obstacle avoidance and path planning. ### Constructor ```python # simplified grid = OccupancyGrid(width=100, height=100, resolution=0.05) # 100 × 100 cells at 5cm/cell = 5m × 5m map ``` **Understanding the coordinate system:** - `resolution`: meters per cell. 0.05 = each cell covers 5cm × 5cm. - `width` × `height`: number of cells. `width * resolution` = map width in meters. - Origin is at the bottom-left corner of the grid. ### `.world_to_grid(x, y)` — Meters to Cell Indices ```python # simplified gx, gy = grid.world_to_grid(2.5, 2.5) # Returns cell indices (50, 50) for a 0.05m resolution grid ``` Converts world coordinates (meters) to grid cell indices. The formula: `gx = (x - origin.x) / resolution`. Returns `None` if the point is outside the grid bounds. You need this every time you have a sensor reading in meters and want to mark it on the map. ### `.grid_to_world(grid_x, grid_y)` — Cell Indices to Meters ```python # simplified wx, wy = grid.grid_to_world(50, 50) # Returns (2.5, 2.5) — the center of cell (50, 50) ``` The inverse — converts grid indices back to world coordinates. 
Returns the center of the cell in meters. ### `.set_occupancy(grid_x, grid_y, value)` — Mark a Cell ```python # simplified grid.set_occupancy(50, 50, 100) # Occupied grid.set_occupancy(30, 30, 0) # Free grid.set_occupancy(10, 10, -1) # Unknown ``` Takes **grid indices** (not meters). Values: - `100` = definitely occupied (wall, obstacle) - `0` = definitely free (empty space) - `-1` = unknown (not yet observed) - Values 1-99 = probability of occupancy (higher = more likely occupied) ### `.is_free(x, y)` — Can the Robot Go Here? ```python # simplified if grid.is_free(2.5, 2.5): print("Path is clear") ``` Takes **world coordinates** (meters), converts to grid internally, returns `True` if the cell value is in the free range (0-49). This is the method you call in a path planner to check if a position is traversable. > **Common mistake:** Mixing up `set_occupancy()` (takes grid indices) and `is_free()` (takes world meters). They use different coordinate systems — `set_occupancy(50, 50, ...)` and `is_free(2.5, 2.5)` refer to the same cell when resolution=0.05. ### `.is_occupied(x, y)` — Is There an Obstacle? ```python # simplified if grid.is_occupied(2.5, 2.5): print("Obstacle detected!") ``` Takes world coordinates. Returns `True` if the cell value is 100 (occupied). ### `.occupancy(grid_x, grid_y)` — Raw Cell Value ```python # simplified value = grid.occupancy(50, 50) # Returns 0, 100, -1, or probability ``` Takes grid indices. Returns the raw occupancy value for fine-grained decisions. 
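The split between world-coordinate and grid-index methods is easier to keep straight with a toy model. The class below is a plain-Python sketch, not the HORUS `OccupancyGrid`: `TinyGrid`, its origin at (0, 0), the `-1` initial fill, and treating unknown or out-of-bounds cells as not free are all assumptions made for illustration. The value ranges (0-49 free, 100 occupied, -1 unknown) follow the descriptions above.

```python
import math


class TinyGrid:
    def __init__(self, width, height, resolution):
        self.width, self.height, self.resolution = width, height, resolution
        self.cells = [[-1] * width for _ in range(height)]  # start unknown

    def world_to_grid(self, x, y):
        gx = math.floor(x / self.resolution)
        gy = math.floor(y / self.resolution)
        if 0 <= gx < self.width and 0 <= gy < self.height:
            return gx, gy
        return None  # outside the map

    def set_occupancy(self, gx, gy, value):   # takes grid indices
        self.cells[gy][gx] = value

    def is_free(self, x, y):                  # takes world meters
        cell = self.world_to_grid(x, y)
        if cell is None:
            return False
        gx, gy = cell
        return 0 <= self.cells[gy][gx] <= 49


grid = TinyGrid(width=100, height=100, resolution=0.05)
hit = grid.world_to_grid(2.52, 2.52)   # a sensor hit expressed in meters
grid.set_occupancy(*hit, 100)          # mark that cell occupied
grid.set_occupancy(30, 30, 0)          # mark another cell free
```

Keeping the conversion in one place (`world_to_grid`) is what lets `is_free` accept meters while `set_occupancy` stays index-based.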
**Example — Building a Map from LiDAR:** ```python # simplified from horus import LaserScan, OccupancyGrid, Pose2D import math grid = OccupancyGrid(width=200, height=200, resolution=0.05) # 10m × 10m def update_map(scan, robot_pose): for i in range(len(scan.ranges)): if not scan.is_range_valid(i): continue angle = scan.angle_at(i) + robot_pose.theta r = scan.ranges[i] # Obstacle position in world frame ox = robot_pose.x + r * math.cos(angle) oy = robot_pose.y + r * math.sin(angle) result = grid.world_to_grid(ox, oy) if result is not None: gx, gy = result grid.set_occupancy(gx, gy, 100) # Mark occupied ``` --- ## CostMap Inflated cost map for path planning. Built from an OccupancyGrid by expanding obstacles with a safety margin. ### Constructor ```python # simplified costmap = CostMap(grid=grid, inflation_radius=0.3) # Inflates obstacles by 30cm — robot won't plan paths within 30cm of walls ``` ### `.cost(x, y)` — Query Cost at a Position ```python # simplified c = costmap.cost(2.0, 2.0) if c is not None and c > 200: print("Too close to obstacle — find alternate path") ``` Takes world coordinates. Returns the cost at that position (0 = free, 254 = lethal/occupied, values in between = proximity to obstacles). Use this in a path planner to prefer paths that stay far from obstacles. ### `.compute_costs()` — Recompute After Changes ```python # simplified costmap.compute_costs() ``` Recalculates the inflated cost layer from the underlying occupancy grid. Call this after modifying the grid (adding new obstacles from sensor data). --- ## PathPlan Planned path output from a path planning algorithm. 
### `.add_waypoint(x, y, theta)` — Add a Point ```python # simplified plan = PathPlan() plan.add_waypoint(0.0, 0.0, 0.0) plan.add_waypoint(3.0, 0.0, 0.0) plan.add_waypoint(3.0, 4.0, 1.57) ``` ### `.from_waypoints(waypoints, goal)` — Build from Array ```python # simplified plan = PathPlan.from_waypoints( waypoints=[[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [3.0, 4.0, 1.57]], goal=[3.0, 4.0, 1.57], ) ``` ### `.is_empty()` — Check if Empty ```python # simplified if plan.is_empty(): print("No path found!") ``` --- ## VelocityObstacle, VelocityObstacles Dynamic obstacle avoidance using the velocity obstacle model. These are POD types with field access only — used by reactive collision avoidance algorithms. ```python # simplified from horus import VelocityObstacle, VelocityObstacles vo = VelocityObstacle() # position, velocity, radius fields ``` --- ## Design Decisions **Why does `NavGoal.is_reached()` check both position and heading?** A robot at the right position but facing the wrong direction has not truly "arrived" — it cannot perform its task (docking, picking up an object, facing a doorway). Separating `is_position_reached()` and `is_orientation_reached()` lets you handle the two phases independently when needed (drive to position, then rotate to heading). **Why does `OccupancyGrid` use two coordinate systems (grid indices and world meters)?** Grid operations (setting cells, iterating neighbors) are fastest with integer indices. Navigation operations (path planning, goal checking) work in world meters. The dual system avoids constant conversion in hot loops. `world_to_grid()` and `grid_to_world()` bridge the gap when needed. **Why -1 for unknown cells instead of a probability value?** Unknown is fundamentally different from "50% likely occupied." A cell that has never been observed provides no information. Using -1 as a sentinel makes the three-state semantics (free/occupied/unknown) explicit and prevents planners from treating unexplored space as traversable. 
**Why does `CostMap` take an `inflation_radius` at construction?** Robot footprint matters. A 30cm-wide robot cannot fit through a 20cm gap even if both adjacent cells are "free." Inflation expands obstacles by the robot's radius so the planner can treat the robot as a point and still guarantee collision-free paths. Different robots need different radii, so it is a parameter, not a constant. **Why separate `NavPath` and `PathPlan`?** `NavPath` is for execution (waypoint following, progress tracking). `PathPlan` is for planning output (the result of A*, RRT, etc.). The planner produces a `PathPlan`; the path follower consumes a `NavPath`. Keeping them separate prevents tight coupling between planning and execution modules. --- ## See Also - [Geometry Messages](/python/messages/geometry) — Pose2D, CmdVel for goal checking and velocity commands - [Sensor Messages](/python/messages/sensor) — LaserScan for building maps - [Control Messages](/python/messages/control) — DifferentialDriveCommand for wheel-level control - [Diagnostics Messages](/python/messages/diagnostics) — EmergencyStop for navigation safety - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## AI/ML Developer's Guide Path: /python/library/ml-guide Description: Use HORUS with PyTorch, ONNX, HuggingFace, OpenCV, and NumPy — zero-copy pipelines for robotics AI # AI/ML Developer's Guide You're an ML engineer. You have a trained model. You need it running on a robot at 30Hz, receiving camera frames and publishing detections — with minimal latency and zero unnecessary copies. This page is your complete guide. 
---

## Quick Start: YOLO on a Robot

```python
# simplified
import horus
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()

def detect_tick(node):
    img = node.recv("camera.rgb")
    if img is None:
        return

    # Zero-copy: shared memory → NumPy view (no pixel copying)
    frame = img.to_numpy()  # ~3μs (view into SHM), HWC uint8 RGB

    # Pass the NumPy frame directly: the hub model's AutoShape wrapper then
    # handles preprocessing and NMS and returns a result with `.xyxy`.
    # A raw torch tensor would bypass that wrapper.
    with torch.no_grad():
        results = model(frame)

    for *box, conf, cls in results.xyxy[0].cpu().numpy():
        if conf > 0.5:
            node.send("detections", {
                "class": model.names[int(cls)],
                "confidence": float(conf),
                "bbox": [float(x) for x in box],
            })

detector = horus.Node(
    name="yolo",
    subs=[horus.Image],
    pubs=["detections"],
    tick=detect_tick,
    rate=30,        # Frequency cap: at most 30 inferences per second
    compute=True,   # Run on thread pool (CPU-bound inference); no budget or on_miss on Compute nodes
)
horus.run(detector, tick_rate=100)
```

---

## Data Flow: Camera → Model → Action

```
Camera Node (Rust, 30Hz)
    │
    │  Image via SHM pool (zero-copy, ~50ns)
    ▼
ML Node (Python)
    │
    ├── img.to_numpy()          ~3μs   (SHM → NumPy view, no copy)
    ├── torch.from_numpy()      ~1μs   (NumPy → PyTorch, shared memory)
    ├── tensor.cuda()           ~50μs  (CPU → GPU copy, unavoidable)
    ├── model(tensor)           ~10ms  (GPU inference)
    ├── results.cpu()           ~20μs  (GPU → CPU copy)
    └── node.send("det", data)  ~6μs   (GenericMessage via SHM)
    │
    ▼
Planner Node (Rust, 100Hz)
    │
    │  reads detections, plans path
    ▼
Motor Controller (Rust, 1kHz)
```

**Key insight**: The only unavoidable copies are CPU↔GPU transfers. Everything else (camera to ML node, ML node to planner) is zero-copy via shared memory.
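What "zero-copy" means in the flow above can be demonstrated with the standard library alone. In this sketch a `bytearray` stands in for the shared-memory frame; it is an analogy for the view-versus-copy distinction, not how HORUS implements `to_numpy()`, and `make_frame_views` is a hypothetical name.

```python
def make_frame_views(height, width, channels):
    """Return (buffer, zero-copy view, independent copy) for one frame."""
    buffer = bytearray(height * width * channels)      # stand-in for the SHM frame
    view = memoryview(buffer).cast("B", shape=[height, width, channels])
    copy = bytes(buffer)                               # a real copy: work grows with frame size
    return buffer, view, copy


buffer, view, copy = make_frame_views(4, 6, 3)
buffer[0] = 255                 # the "camera" writes a pixel in place
seen_by_view = view[0, 0, 0]    # the view observes the write
seen_by_copy = copy[0]          # the copy still holds the old value
```

Creating the view costs the same regardless of frame size, which is why a view-based handoff can stay in the microsecond range while a copy scales with the number of pixels.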
--- ## Framework Integration ### NumPy (always available) Every HORUS domain type converts to/from NumPy with zero copy: ```python # simplified import numpy as np # Image → NumPy (zero-copy view into shared memory) frame = img.to_numpy() # shape: (H, W, C), dtype: uint8 frame = np.from_dlpack(img) # DLPack protocol — even faster (~1μs) # NumPy → Image (copies data into SHM pool) img = horus.Image.from_numpy(my_array) # PointCloud → NumPy points = cloud.to_numpy() # shape: (N, fields_per_point), dtype: float32 # DepthImage → NumPy depth = depth_img.to_numpy() # shape: (H, W), dtype: float32 # Tensor (arbitrary shape) data = tensor.numpy() # zero-copy view tensor = horus.Tensor.from_numpy(my_array) ``` ### PyTorch ```python # simplified import torch # Image → PyTorch (via NumPy bridge) frame = img.to_numpy() tensor = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0 # HWC → CHW # Or via DLPack (true zero-copy, no intermediate NumPy) tensor = torch.from_dlpack(img) # PyTorch → Image result_np = output_tensor.cpu().numpy() result_img = horus.Image.from_numpy(result_np) # Tensor → PyTorch t = horus.Tensor.from_numpy(np.zeros((3, 3), dtype=np.float32)) pt = t.torch() # zero-copy PyTorch tensor # PyTorch → Tensor t = horus.Tensor.from_numpy(pt.cpu().numpy()) ``` ### ONNX Runtime (recommended for production) Fastest inference on both CPU and GPU. No Python framework overhead: ```python # simplified import onnxruntime as ort import numpy as np # Load model once in init session = ort.InferenceSession("model.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) input_name = session.get_inputs()[0].name def inference_tick(node): img = node.recv("camera.rgb") if img is None: return # ONNX Runtime accepts NumPy directly frame = img.to_numpy() input_data = frame.astype(np.float32).transpose(2, 0, 1)[np.newaxis] / 255.0 outputs = session.run(None, {input_name: input_data}) # Process outputs... 
node = horus.Node( name="onnx_detector", subs=[horus.Image], pubs=["detections"], tick=inference_tick, rate=30, compute=True, ) ``` ### HuggingFace Transformers ```python # simplified from transformers import pipeline import horus # Load once — downloads model on first run classifier = pipeline("image-classification", model="google/vit-base-patch16-224", device=0) def classify_tick(node): img = node.recv("camera.rgb") if img is None: return frame = img.to_numpy() # HF pipeline accepts numpy arrays via PIL from PIL import Image as PILImage pil_img = PILImage.fromarray(frame) results = classifier(pil_img, top_k=3) node.send("classification", { "labels": [r["label"] for r in results], "scores": [r["score"] for r in results], }) node = horus.Node( name="hf_classifier", subs=[horus.Image], pubs=["classification"], tick=classify_tick, rate=10, compute=True, budget=100 * horus.ms, ) horus.run(node) ``` ### OpenCV ```python # simplified import cv2 import numpy as np def vision_tick(node): img = node.recv("camera.rgb") if img is None: return # to_numpy() returns RGB; OpenCV uses BGR frame_rgb = img.to_numpy() frame_bgr = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR) # ArUco marker detection gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY) detector = cv2.aruco.ArucoDetector(cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)) corners, ids, _ = detector.detectMarkers(gray) if ids is not None: for marker_id, corner in zip(ids.flatten(), corners): center = corner[0].mean(axis=0) node.send("markers", { "id": int(marker_id), "x": float(center[0]), "y": float(center[1]), }) ``` ### JAX ```python # simplified import jax import jax.numpy as jnp def jax_tick(node): img = node.recv("camera.rgb") if img is None: return # DLPack: HORUS → JAX (zero-copy on same device) frame = jnp.from_dlpack(img) # Or via NumPy frame = jnp.array(img.to_numpy()) # JAX computation processed = jax.jit(my_model)(frame) ``` --- ## Tensor for Matrix Math `horus.Tensor` supports arbitrary shapes — use it 
for rotation matrices, Jacobians, homogeneous transforms, and any matrix computation: ```python # simplified import horus import numpy as np # 3x3 rotation matrix R = horus.Tensor.from_numpy(np.eye(3, dtype=np.float32)) # 4x4 homogeneous transform T = horus.Tensor.from_numpy(np.array([ [1, 0, 0, 0.5], [0, 1, 0, 0.0], [0, 0, 1, 0.3], [0, 0, 0, 1.0], ], dtype=np.float32)) # 6x6 Jacobian J = horus.Tensor.from_numpy(np.zeros((6, 6), dtype=np.float64)) # Share matrices between nodes via topics topic = horus.Topic("jacobian") topic.send(J) # Receive and use received = topic.recv() J_np = received.numpy() # zero-copy view ``` ### Supported Dtypes | Dtype | NumPy | Bytes | Use case | |-------|-------|-------|----------| | `float32` | `np.float32` | 4 | Images, point clouds, most robotics | | `float64` | `np.float64` | 8 | Precision: covariances, Jacobians | | `uint8` | `np.uint8` | 1 | Raw images, masks | | `uint16` | `np.uint16` | 2 | Depth images (millimeters) | | `int32` | `np.int32` | 4 | Indices, labels | | `int64` | `np.int64` | 8 | Timestamps, counters | | `bool` | `np.bool_` | 1 | Masks, occupancy | ### Shape Operations ```python # simplified t = horus.Tensor.from_numpy(np.zeros((480, 640, 3), dtype=np.float32)) t.shape # (480, 640, 3) t.numel # 921600 t.nbytes # 3686400 t.dtype # 'float32' # Reshape (zero-copy — same underlying data) flat = t.reshape((921600,)) batched = t.view((1, 480, 640, 3)) # Slice (zero-copy view) roi = t[100:200, 200:400, :] # Arithmetic scaled = t * 0.5 normalized = (t - t.mean()) / t.std() ``` --- ## Batch Inference (RL Vectorized Environments) Process multiple observations in one forward pass: ```python # simplified import horus import torch import numpy as np model = torch.jit.load("policy.pt").cuda().eval() # Collect observations from N environments env_topics = [horus.Topic(f"env.{i}.obs") for i in range(16)] def batch_tick(node): observations = [] for topic in env_topics: obs = topic.recv() if obs is not None: 
observations.append(obs.to_numpy()) if len(observations) < 16: return # wait for all envs # Stack into batch: (N, obs_dim) batch = np.stack(observations) batch_tensor = torch.from_numpy(batch).cuda() with torch.no_grad(): actions = model(batch_tensor).cpu().numpy() # Send actions back to each environment for i, action in enumerate(actions): action_topic = horus.Topic(f"env.{i}.action") action_topic.send(horus.Tensor.from_numpy(action)) node = horus.Node( name="rl_policy", tick=batch_tick, rate=100, compute=True, budget=10 * horus.ms, ) horus.run(node) ``` --- ## GPU Memory Management Critical for embedded devices (Jetson: 4-8GB shared CPU/GPU RAM): ```python # simplified import torch # Limit GPU memory (do this BEFORE loading any model) torch.cuda.set_per_process_memory_fraction(0.5) # use max 50% VRAM # Always use no_grad for inference (saves ~30% memory) with torch.no_grad(): output = model(input_tensor) # Periodically clear cache (every N ticks) if horus.tick() % 100 == 0: torch.cuda.empty_cache() # Monitor usage print(f"GPU memory: {torch.cuda.memory_allocated() / 1e6:.1f}MB / {torch.cuda.max_memory_allocated() / 1e6:.1f}MB peak") ``` ### Model Warmup First inference is 10-100x slower due to CUDA kernel compilation: ```python # simplified def my_init(node): global model model = torch.jit.load("model.pt").cuda().eval() # Warmup: run dummy inference in init (before tick loop starts) dummy = torch.zeros(1, 3, 640, 640).cuda() with torch.no_grad(): model(dummy) node.log_info("Model warmed up") ``` ### Choosing the Right Precision | Precision | Memory | Speed | When to use | |-----------|--------|-------|-------------| | FP32 | 100% | 1x | Training, debugging | | FP16 | 50% | 2x | Most inference on GPU | | INT8 | 25% | 4x | Edge deployment (TensorRT) | | TF32 | 100% | 1.5x | Ampere+ GPUs, automatic | ```python # simplified # FP16 inference (halves memory, doubles speed on GPU) model = model.half() input_tensor = input_tensor.half() # Or use torch.autocast with 
torch.cuda.amp.autocast(): output = model(input_tensor) ``` --- ## Production Deployment on Jetson ```python # simplified import horus import numpy as np # Use ONNX Runtime with TensorRT for maximum speed on Jetson import onnxruntime as ort def init_model(node): global session providers = [ ('TensorrtExecutionProvider', {'trt_max_workspace_size': '2147483648'}), 'CUDAExecutionProvider', 'CPUExecutionProvider', ] session = ort.InferenceSession("model.onnx", providers=providers) node.log_info(f"Using provider: {session.get_providers()[0]}") # Warmup dummy = np.zeros((1, 3, 640, 640), dtype=np.float32) session.run(None, {session.get_inputs()[0].name: dummy}) node.log_info("TensorRT engine built and warmed up") def detect_tick(node): img = node.recv("camera.rgb") if img is None: return frame = img.to_numpy().astype(np.float32).transpose(2, 0, 1)[np.newaxis] / 255.0 outputs = session.run(None, {session.get_inputs()[0].name: frame}) # ... process outputs node = horus.Node( name="jetson_detector", subs=[horus.Image], pubs=["detections"], tick=detect_tick, init=init_model, rate=30, compute=True, budget=30 * horus.ms, on_miss="skip", ) horus.run(node, tick_rate=100) ``` --- ## Performance Tips | Tip | Why | Impact | |-----|-----|--------| | Use `img.to_numpy()` not `np.frombuffer(img.data, ...)` | `to_numpy()` is zero-copy from SHM pool | 4-60x faster for large images | | Use `np.from_dlpack(img)` for NumPy | DLPack is even faster than `to_numpy()` | 1.1μs vs 3μs | | Use `compute=True` for CPU inference | Runs on thread pool, doesn't block scheduler | Prevents deadline misses | | Use `on_miss="skip"` for ML nodes | ML inference is variable-latency | Drops frames gracefully | | Warmup model in `init()` | First inference compiles CUDA kernels | Avoids 100x latency spike on first tick | | Use ONNX Runtime for production | Fastest cross-platform inference | 2-5x faster than raw PyTorch | | Use FP16/INT8 on edge devices | Halves memory, doubles throughput | Critical for Jetson | 
| Use typed topics for cross-language | GenericMessage (dicts) don't cross to Rust | ~1.7μs vs ~6μs | | Pre-allocate output tensors | Avoid allocation in tick hot path | Reduces GC pauses | | Batch observations for RL | One forward pass for N envs | N× throughput | --- ## See Also - [Tensor API](/python/api/tensor) — Full Tensor class reference - [Image API](/python/api/image) — Camera frame handling with DLPack - [PointCloud API](/python/api/pointcloud) — 3D point cloud with ML interop - [DepthImage API](/python/api/depth-image) — Depth maps with NumPy/PyTorch - [ML Utilities](/python/library/ml-utilities) — ONNX, PyTorch, OpenCV, TensorFlow integration - [Benchmarks](/performance/benchmarks) — Python binding latency numbers - [Python API](/python/api) — Complete Python reference --- ## Diagnostics Messages Path: /python/messages/diagnostics Description: Health monitoring, emergency stops, and safety system types for Python robotics — every method explained # Diagnostics Messages Diagnostics messages keep robots safe and observable in production. They report node health, trigger emergency stops, monitor resources, and track safety state. Every production robot needs these — even simple hobby robots benefit from battery monitoring and heartbeats. ```python # simplified from horus import ( DiagnosticStatus, EmergencyStop, ResourceUsage, SafetyStatus, DiagnosticReport, DiagnosticValue, Heartbeat, NodeHeartbeat, ) ``` --- ## DiagnosticStatus Node health reporting with severity-level factory methods. Instead of remembering that level 2 means ERROR, use the `error()` factory. ### Constructor ```python # simplified ds = DiagnosticStatus(level=2, code=101, message="overheating", component="motor") ``` ### `.ok(message)` — Everything Is Fine ```python # simplified ds = DiagnosticStatus.ok("All systems nominal") ``` Level 0. Publish this periodically to confirm your node is alive and healthy. Monitoring dashboards show OK nodes in green. 
If a node stops publishing OK statuses, the watchdog knows something is wrong. ### `.warn(code, message)` — Degraded But Functional ```python # simplified ds = DiagnosticStatus.warn(code=101, message="Temperature rising: 65°C") ``` Level 1. The node is still working but something needs attention. Examples: battery getting low, sensor noise increasing, CPU usage above 70%, communication latency above threshold. **When to use warn vs error:** If the robot can still complete its mission, it's a warning. If the mission is compromised, it's an error. ### `.error(code, message)` — Something Is Wrong ```python # simplified ds = DiagnosticStatus.error(code=201, message="Motor stalled on joint 3") ``` Level 2. The node cannot function correctly. Examples: motor stalled, sensor disconnected, localization lost, path blocked. An operator should investigate. > **Common mistake:** Using `error()` for recoverable conditions. If the motor stalls briefly then recovers, that's a `warn()`. `error()` should mean "this needs human intervention." ### `.fatal(code, message)` — System Cannot Continue ```python # simplified ds = DiagnosticStatus.fatal(code=301, message="Hardware fault: CAN bus disconnected") ``` Level 3. Unrecoverable failure. The node should enter safe state and stop. Examples: hardware fault, firmware crash, safety violation. This often triggers an EmergencyStop. ### `.with_component(name)` — Set Component Name ```python # simplified ds = DiagnosticStatus.error(code=201, message="Overheating") \ .with_component("left_drive_motor") ``` Returns a new DiagnosticStatus with the component name set. Always set this — monitoring dashboards group statuses by component, and without it, operators can't tell which motor is overheating. 
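The four severity levels and the "returns a new object" behaviour of `with_component()` can be illustrated without HORUS installed. The sketch below is a plain-Python stand-in: `Severity`, `Status`, and `worst` are hypothetical names chosen for illustration, and nothing here reflects how the real `DiagnosticStatus` is implemented.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Severity(IntEnum):
    """Numeric levels as documented above: 0 = OK through 3 = FATAL."""
    OK = 0
    WARN = 1
    ERROR = 2
    FATAL = 3


@dataclass(frozen=True)
class Status:
    """Immutable stand-in for a diagnostic status (not the HORUS class)."""
    level: Severity
    code: int
    message: str
    component: str = ""

    def with_component(self, name: str) -> "Status":
        # Return a modified copy; the original is left untouched.
        return replace(self, component=name)


def worst(statuses):
    """Pick the most severe status, e.g. for a dashboard summary row."""
    return max(statuses, key=lambda s: s.level)


base = Status(Severity.ERROR, 201, "Overheating")
tagged = base.with_component("left_drive_motor")
summary = worst([Status(Severity.OK, 0, "nominal"), tagged])

print(repr(base.component))   # the original keeps its empty component
print(tagged.component)
print(summary.level.name)
```

Because the stand-in is frozen, chaining `with_component()` never mutates a status that another part of the program may still hold, which is the property the builder-style API relies on.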
### `.message_str()` / `.component_str()` — Read Back as Strings

```python
# simplified
print(ds.message_str())    # "Overheating"
print(ds.component_str())  # "left_drive_motor"
```

The message and component are stored as fixed-size byte arrays internally. These methods convert them to Python strings.

**Example — Node Health Reporter:**

```python
# simplified
from horus import Node, run, DiagnosticStatus, Topic

diag_topic = Topic(DiagnosticStatus)
cpu_percent = 0.0  # Updated elsewhere

def report_health(node):
    if cpu_percent > 90:
        status = DiagnosticStatus.error(code=100, message=f"CPU at {cpu_percent:.0f}%")
    elif cpu_percent > 70:
        status = DiagnosticStatus.warn(code=100, message=f"CPU at {cpu_percent:.0f}%")
    else:
        status = DiagnosticStatus.ok(f"CPU at {cpu_percent:.0f}%")
    diag_topic.send(status.with_component("controller"), node)

run(Node(tick=report_health, rate=1, pubs=["diagnostics"]))
```

---

## EmergencyStop

The panic button. `engage()` triggers an immediate stop; `release()` clears it after an operator confirms safe conditions.

### `.engage(reason)` — Trigger E-Stop

```python
# simplified
estop = EmergencyStop.engage("Obstacle detected at 0.1m")
```

Creates an engaged emergency stop with a reason string. Publish this on the e-stop topic and **all nodes should immediately enter safe state** — stop motors, lock brakes, disable actuators.

### `.release()` — Clear E-Stop

```python
# simplified
release = EmergencyStop.release()
```

Creates a release command. Publish this to clear the e-stop and allow normal operation to resume.

> **Common mistake:** Auto-releasing the e-stop programmatically. E-stop release should **always** require human confirmation — a physical button, operator console acknowledgment, or at minimum a deliberate command. Auto-release defeats the purpose of safety systems.
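One way to enforce the human-confirmation rule is to latch the e-stop in the node that owns it and refuse to clear the latch unless an explicit operator acknowledgment is supplied. The following is a minimal sketch of that idea in plain Python; `EStopLatch` and its `operator_ack` argument are hypothetical and not part of HORUS.

```python
class EStopLatch:
    """Latched e-stop state that only clears on explicit operator acknowledgment."""

    def __init__(self):
        self.engaged = False
        self.reason = ""

    def engage(self, reason: str) -> None:
        # Engaging is always allowed; the first reason seen is kept.
        if not self.engaged:
            self.engaged = True
            self.reason = reason

    def release(self, operator_ack: bool = False) -> bool:
        """Clear the latch only if a human confirmed; return True if cleared."""
        if not operator_ack:
            return False      # programmatic auto-release is refused
        self.engaged = False
        self.reason = ""
        return True


latch = EStopLatch()
latch.engage("Obstacle detected at 0.1m")
latch.engage("second trigger")        # ignored: already latched
first_reason = latch.reason

auto_cleared = latch.release()        # no acknowledgment, so the latch holds
still_engaged = latch.engaged
operator_cleared = latch.release(operator_ack=True)
```

In a real node, the `operator_ack` flag would come from a physical button or an operator console command, and a successful release would be the point at which `EmergencyStop.release()` is published.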
### `.with_source(source)` — Identify Who Triggered It ```python # simplified estop = EmergencyStop.engage("Collision detected") \ .with_source("lidar_safety_node") ``` Returns a new EmergencyStop with a source identifier. When multiple nodes can trigger e-stops, the source tells operators which node detected the problem. ### `.reason_str()` — Read the Reason ```python # simplified print(estop.reason_str()) # "Collision detected" ``` **Example — Safety Controller:** ```python # simplified from horus import Node, run, EmergencyStop, LaserScan, CmdVel, Topic scan_topic = Topic(LaserScan) estop_topic = Topic(EmergencyStop) cmd_topic = Topic(CmdVel) def safety_check(node): scan = scan_topic.recv(node) if scan is None: return closest = scan.min_range() if closest is not None and closest < 0.15: estop = EmergencyStop.engage(f"Object at {closest:.2f}m") \ .with_source("safety_monitor") estop_topic.send(estop, node) cmd_topic.send(CmdVel.zero(), node) run(Node(tick=safety_check, rate=50, pubs=["estop", "cmd_vel"], subs=["scan"])) ``` --- ## ResourceUsage System resource monitoring with threshold checks. ### Constructor ```python # simplified ru = ResourceUsage(cpu_percent=85.0, memory_bytes=4_000_000_000) ``` ### `.is_cpu_high(threshold)` — CPU Overload Check ```python # simplified if ru.is_cpu_high(80.0): print("CPU overloaded!") ``` Returns `True` if `cpu_percent` exceeds the given threshold. Typical thresholds: - **70%**: Warning — consider reducing processing load - **85%**: Error — system may miss deadlines - **95%**: Critical — risk of dropped messages and missed ticks ### `.is_memory_high(threshold)` — Memory Pressure ```python # simplified if ru.is_memory_high(90.0): print("Memory pressure! Consider releasing caches") ``` ### `.is_temperature_high(threshold)` — Thermal Check ```python # simplified if ru.is_temperature_high(75.0): print("Overheating! Reduce motor duty cycle") ``` Hardware-specific. Raspberry Pi throttles at 80°C. Jetson limits at 97°C. 
Industrial PCs vary. --- ## SafetyStatus Safety system state machine with fault tracking. ### Constructor ```python # simplified ss = SafetyStatus() ``` ### `.is_safe()` — All Clear? ```python # simplified if not ss.is_safe(): print("Safety fault — entering safe state") ``` Returns `True` when no faults are active, e-stop is not engaged, and watchdog is healthy. Check this every tick — if it returns `False`, your node should stop actuators. ### `.set_fault(code)` — Register a Fault ```python # simplified ss.set_fault(101) # Motor overcurrent fault ``` Registers a fault code. `is_safe()` will return `False` until all faults are cleared. Use fault codes consistently across your system — document what each code means. ### `.clear_faults()` — Reset After Recovery ```python # simplified ss.clear_faults() print(ss.is_safe()) # True (assuming no other issues) ``` Clears all registered faults. Call this only after the root cause has been fixed — not as a way to ignore problems. --- ## DiagnosticReport Structured diagnostic data with typed key-value pairs. More organized than free-text messages — monitoring tools can parse and chart the values. ### Constructor ```python # simplified report = DiagnosticReport(component="sensor_hub") ``` ### `.add_string(key, value)` — Text Data ```python # simplified report.add_string("firmware_version", "2.1.3") report.add_string("status", "calibrating") ``` ### `.add_int(key, value)` — Integer Data ```python # simplified report.add_int("retry_count", 3) report.add_int("messages_dropped", 0) ``` ### `.add_float(key, value)` — Float Data ```python # simplified report.add_float("temperature_c", 42.5) report.add_float("voltage", 24.1) ``` ### `.add_bool(key, value)` — Boolean Data ```python # simplified report.add_bool("calibrated", True) report.add_bool("firmware_update_available", False) ``` All `add_*` methods raise `ValueError` if the report is full (max 16 values). 
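Because a full report raises `ValueError`, code that adds values in a loop should decide up front what happens to the overflow. The sketch below models the documented 16-value limit with a plain-Python stand-in; `BoundedReport` and `add_all` are hypothetical names and do not reflect how HORUS implements `DiagnosticReport`.

```python
MAX_VALUES = 16  # limit documented for DiagnosticReport


class BoundedReport:
    """Stand-in for a fixed-capacity key-value report (not the HORUS class)."""

    def __init__(self, component: str):
        self.component = component
        self.values = []

    def add_float(self, key: str, value: float) -> None:
        if len(self.values) >= MAX_VALUES:
            raise ValueError(f"report full ({MAX_VALUES} values)")
        self.values.append((key, float(value)))


def add_all(report, readings):
    """Add readings until the report is full; return the keys that did not fit."""
    dropped = []
    for key, value in readings.items():
        try:
            report.add_float(key, value)
        except ValueError:
            dropped.append(key)
    return dropped


report = BoundedReport("sensor_hub")
readings = {f"cell_{i}_voltage": 3.7 for i in range(20)}
dropped = add_all(report, readings)

print(len(report.values), "stored;", len(dropped), "dropped")
```

Adding the most important values first, or splitting a long list across several reports, keeps the overflow from silently discarding the readings that matter most.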
**Example — Periodic Diagnostic Report:** ```python # simplified from horus import DiagnosticReport, Topic diag_topic = Topic(DiagnosticReport) def publish_diagnostics(node, temp, voltage, calibrated): report = DiagnosticReport(component="imu_driver") report.add_float("temperature_c", temp) report.add_float("supply_voltage", voltage) report.add_bool("calibrated", calibrated) report.add_int("tick_count", node.tick) diag_topic.send(report, node) ``` --- ## Heartbeat Simple "I'm alive" signal from nodes. ### `.update(uptime)` — Tick the Heartbeat ```python # simplified hb = Heartbeat(node_name="controller", node_id=1) hb.update(uptime=120.5) # Increments sequence, sets uptime ``` Call once per tick and publish. The monitoring system watches for heartbeats — if a node stops publishing, it's considered dead. --- ## NodeHeartbeat Filesystem-based heartbeat for cross-process discovery. Written to shared memory, not published on topics. ### `.update_timestamp()` — Refresh Timestamp ```python # simplified nhb = NodeHeartbeat(state=1, health=0) nhb.update_timestamp() # Sets to current time ``` ### `.is_fresh(max_age_secs)` — Check Staleness ```python # simplified if not nhb.is_fresh(max_age_secs=5): print("Node heartbeat is stale — node may have crashed") ``` Returns `True` if the timestamp is within `max_age_secs` of the current time. Use this in monitoring tools to detect crashed nodes. --- ## Design Decisions **Why factory methods (`ok()`, `warn()`, `error()`, `fatal()`) instead of severity integers?** Level numbers (0, 1, 2, 3) are meaningless without documentation. Factory methods are self-documenting: `DiagnosticStatus.error(201, "Motor stalled")` is immediately clear. The factories also set default fields correctly, reducing the chance of publishing a severity-2 status with level=0. **Why does `EmergencyStop` have a `.with_source()` method instead of a required field?** Not all e-stop triggers are software nodes. 
A physical e-stop button, a hardware watchdog, or an operator console might trigger an e-stop. The source is optional metadata that helps debugging, not a required field that would complicate hardware integration. **Why `DiagnosticReport` with typed key-value pairs instead of free-text?** Free-text diagnostics are human-readable but machine-unparseable. Typed values (`add_float("temperature_c", 42.5)`) can be charted, alerted on, and aggregated by monitoring tools. The 16-value limit keeps the message Pod-compatible (fixed-size, no heap allocation). **Why both `Heartbeat` (topic-based) and `NodeHeartbeat` (filesystem-based)?** Topic-based heartbeats detect node crashes within the same horus instance. Filesystem-based heartbeats enable cross-process discovery (monitoring tools that are not part of the horus graph can still check if nodes are alive by reading shared memory). Different failure modes require different detection mechanisms. **Why is `SafetyStatus.clear_faults()` a manual operation?** Auto-clearing faults is dangerous. If a motor overcurrent fault clears automatically, the motor re-engages immediately, potentially causing the same overcurrent condition. Manual clearing forces an operator (or a deliberate recovery procedure) to confirm the root cause is resolved before the system resumes. 
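For the filesystem-based heartbeat discussed above, the freshness check reduces to comparing a stored timestamp with the current time. A minimal sketch under that assumption follows; `HeartbeatRecord` is a hypothetical stand-in rather than the HORUS `NodeHeartbeat`, and the clock value can be passed in explicitly so the logic can be exercised without waiting.

```python
import time
from dataclasses import dataclass


@dataclass
class HeartbeatRecord:
    """Stand-in for a heartbeat entry holding a last-update timestamp (seconds)."""
    timestamp: float = 0.0

    def update_timestamp(self, now=None) -> None:
        # Use the wall clock unless a time is injected (useful in tests).
        self.timestamp = time.time() if now is None else now

    def is_fresh(self, max_age_secs: float, now=None) -> bool:
        current = time.time() if now is None else now
        return (current - self.timestamp) <= max_age_secs


hb = HeartbeatRecord()
hb.update_timestamp(now=100.0)

fresh = hb.is_fresh(max_age_secs=5, now=103.0)   # 3 s old
stale = hb.is_fresh(max_age_secs=5, now=106.5)   # 6.5 s old
```

A monitoring tool outside the node graph only needs read access to the stored timestamp to run this check, which is why it keeps working when the publishing process has crashed.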
--- ## See Also - [Navigation Messages](/python/messages/navigation) — NavGoal, path following (often paired with diagnostics) - [Sensor Messages](/python/messages/sensor) — BatteryState for power monitoring - [Force Messages](/python/messages/force) — `WrenchStamped.exceeds_limits()` for force safety - [Control Messages](/python/messages/control) — MotorCommand.stop() for actuator safety - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## Real-Time Systems (Python) Path: /python/real-time Description: Budget, deadline, and miss-policy configuration for real-time Python nodes in HORUS # Real-Time Systems (Python) A motor controller sends velocity commands 100 times per second. Each command says "move at this speed for the next 10 milliseconds." If one command arrives 50 ms late, the motor runs uncorrected for 5x too long. The arm overshoots, oscillates, or collides with the work surface. The problem is not speed — the controller was already plenty fast. The problem is **predictability**. Every command must arrive on time. Real-time means predictable, not fast. A real-time system guarantees that each computation finishes within a bounded time window. Three values define that window: **Budget**: how long the computation *should* take. This is the expected execution time of your `tick()` function. A 5 ms budget means your code should finish its work in 5 ms. **Deadline**: the maximum time the computation *can* take before the system intervenes. A 9 ms deadline means the system tolerates up to 9 ms, but fires the miss policy if the tick exceeds that limit. The deadline is always greater than or equal to the budget. **Jitter**: the variation in timing between consecutive ticks. If your node runs at 100 Hz, perfect timing means exactly 10 ms between each tick. In practice, one tick starts at 10.1 ms and the next at 9.9 ms. That 0.2 ms variation is jitter. Low jitter means smooth control. 
High jitter means the robot stutters or drifts. ``` Perfect timing (zero jitter): | 10ms | 10ms | 10ms | 10ms | 10ms | tick tick tick tick tick tick Real-world timing (some jitter): |10.1ms|9.9ms |10.0ms|9.8ms |10.2ms| tick tick tick tick tick tick Pathological (high jitter, GC pause): | 10ms | 10ms | 45ms | 10ms | 10ms | tick tick tick tick tick ^ GC pause ``` --- ## When Python Is Fine for Real-Time Python runs on CPython, which has a Global Interpreter Lock (GIL). The GIL means only one Python thread executes Python bytecode at a time. Every `tick()` call acquires the GIL, runs your Python code, and releases it. This acquire/release cycle adds overhead and constrains what frequencies are practical. Here is an honest assessment: | Frequency | Period | Typical tick budget | Python viable? | Why | |-----------|--------|--------------------|----|-----| | 1-10 Hz | 100-1000 ms | 80-800 ms | Yes | Huge budget, GIL overhead is negligible | | 10-50 Hz | 20-100 ms | 16-80 ms | Yes | Plenty of time for Python + ML inference | | 50-100 Hz | 10-20 ms | 8-16 ms | Yes, with care | Budget is tight but achievable for simple logic | | 100-500 Hz | 2-10 ms | 1.6-8 ms | Marginal | GIL acquire (~3 us) is small, but GC pauses (~1-5 ms) can blow the budget | | 500+ Hz | <2 ms | <1.6 ms | No | GIL overhead + GC pauses make consistent timing impossible | **The practical ceiling for Python RT is about 100 Hz.** At 100 Hz, your tick budget is 8 ms (80% of the 10 ms period). A typical Python `tick()` doing sensor reads and simple math takes 0.1-2 ms, leaving plenty of margin. At 500 Hz, the budget drops to 1.6 ms, where a single garbage collection pause (1-5 ms) blows through the deadline. --- ## GIL Impact on Timing The GIL adds two sources of timing unpredictability: **GIL acquisition overhead**: ~3 microseconds per tick. The scheduler (written in Rust) must acquire the GIL before calling your Python `tick()` and release it afterward. 
At 100 Hz (10 ms period), 3 us is 0.03% of the period — negligible. At 1000 Hz (1 ms period), 3 us is 0.3% — still small, but it adds up with other overhead. **Garbage collection pauses**: CPython's garbage collector runs periodically to reclaim cyclic references. A minor GC takes 0.1-1 ms. A major GC (generation 2) can take 1-10 ms. These pauses are unpredictable and cannot be preempted — they happen inside `tick()` and count against your budget. ```python # simplified import gc import horus # For critical nodes: disable GC and collect manually between ticks gc.disable() def motor_tick(node): cmd = node.recv("cmd_vel") if cmd: # Fast path — no allocations, no GC risk node.send("motor_cmd", {"rpm": cmd.linear * 100}) # Check remaining budget before optional GC if horus.budget_remaining() > 0.002: # 2ms headroom gc.collect(generation=0) # Minor collection only motor = horus.Node( name="motor_controller", subs=[horus.CmdVel], pubs=["motor_cmd"], tick=motor_tick, rate=100, budget=5 * horus.ms, deadline=8 * horus.ms, on_miss="safe_mode", ) horus.run(motor, rt=True) ``` --- ## Auto-Derived Timing from `rate=` The simplest way to get real-time behavior. Set a tick rate and HORUS calculates safe defaults: ```python # simplified import horus def controller_tick(node): scan = node.recv("scan") if scan: cmd = horus.CmdVel(linear=0.3, angular=0.0) node.send("cmd_vel", cmd) controller = horus.Node( name="controller", subs=[horus.LaserScan], pubs=[horus.CmdVel], tick=controller_tick, rate=100, # 10ms period -> 8ms budget (80%), 9.5ms deadline (95%) ) horus.run(controller, rt=True) ``` When you set `rate=`, the scheduler automatically: 1. Calculates the period: 1/100 = 10 ms 2. Sets the **budget** to 80% of the period: 8 ms 3. Sets the **deadline** to 95% of the period: 9.5 ms 4. Assigns the **Rt execution class** — the node gets a dedicated thread You do not need to call any special method to enable real-time. 
Setting `rate=` (when `compute` and `on` are not set), or `budget=`, or `deadline=` is enough. HORUS auto-detects that the node needs real-time scheduling. --- ## Explicit Budget and Deadline For fine-grained control, set budget and deadline directly instead of relying on auto-derivation: ```python # simplified import horus us = horus.us # 1e-6 ms = horus.ms # 1e-3 def fusion_tick(node): imu = node.recv("imu") gps = node.recv("gps") if imu and gps: # Fuse sensor data estimate = {"x": gps.latitude, "y": gps.longitude, "heading": imu.yaw} node.send("pose", estimate) fusion = horus.Node( name="fusion", subs=[horus.Imu, horus.NavSatFix], pubs=["pose"], tick=fusion_tick, rate=50, budget=3 * ms, # Must finish compute in 3ms deadline=8 * ms, # Hard wall at 8ms on_miss="warn", ) horus.run(fusion, rt=True) ``` Budget and deadline are specified in **seconds**. Use the `horus.us` (1e-6) and `horus.ms` (1e-3) constants for readability. If you set `budget=` without `deadline=`, the deadline equals the budget — your budget IS your hard deadline: ```python # simplified # budget=500us, deadline=500us (auto-derived from budget) critical = horus.Node( name="safety_check", tick=safety_tick, rate=100, budget=500 * us, # Tight: any overrun fires the miss policy on_miss="stop", ) ``` --- ## Checking Budget at Runtime Use `horus.budget_remaining()` inside `tick()` to check how much time is left. 
This lets you skip optional work when running behind: ```python # simplified import horus def perception_tick(node): img = node.recv("camera.rgb") if img is None: return # Always run: fast object detection detections = fast_detect(img) node.send("detections", detections) # Optional: expensive classification (only if budget allows) if horus.budget_remaining() > 0.005: # 5ms headroom classified = classify_objects(img, detections) node.send("classified", classified) perception = horus.Node( name="perception", subs=[horus.Image], pubs=["detections", "classified"], tick=perception_tick, rate=30, budget=20 * horus.ms, deadline=30 * horus.ms, on_miss="skip", ) horus.run(perception, rt=True) ``` `budget_remaining()` returns `float("inf")` if no budget is set. When a budget is active, it returns the remaining seconds. Use this to implement graceful degradation within a single tick — do the critical work first, then fill remaining time with optional processing. --- ## Deadline Miss Policies When a node's `tick()` exceeds its deadline, the miss policy determines what happens next: ```python # simplified import horus # Log and continue (default) — good for development planner = horus.Node( name="planner", tick=planner_tick, rate=10, on_miss="warn", ) # Skip the next tick to recover timing sensor_fusion = horus.Node( name="fusion", tick=fusion_tick, rate=100, budget=5 * horus.ms, on_miss="skip", ) # Call enter_safe_state() — stops motors, holds position actuator = horus.Node( name="actuator", tick=actuator_tick, rate=100, budget=3 * horus.ms, on_miss="safe_mode", ) # Shut down the entire scheduler immediately safety_monitor = horus.Node( name="safety", tick=safety_tick, rate=100, budget=500 * horus.us, on_miss="stop", ) ``` | Policy | String value | What happens | Best for | |--------|-------------|-------------|----------| | Warn | `"warn"` | Logs a warning, continues normally | Default. Non-critical nodes. Development. 
| | Skip | `"skip"` | Skips this node's next tick to let it catch up | High-frequency nodes where one dropped cycle is acceptable | | SafeMode | `"safe_mode"` | Calls `enter_safe_state()` on the node | Motor controllers, actuators — stops movement on overrun | | Stop | `"stop"` | Stops the entire scheduler immediately | Safety monitors — the last line of defense | --- ## Enabling OS-Level Real-Time Scheduling The `rt=True` flag on the Scheduler (or `horus.run()`) enables OS-level real-time features: ```python # simplified import horus # Recommended for most deployments — try RT, continue if unavailable sched = horus.Scheduler(tick_rate=100, rt=True) sched.add(controller) sched.run() # Or via the one-liner horus.run(controller, tick_rate=100, rt=True) ``` When `rt=True`, the scheduler attempts to: | Feature | What it does | Requires | |---------|-------------|----------| | `SCHED_FIFO` | Gives your process priority over all normal processes | Root or `CAP_SYS_NICE` | | `mlockall` | Locks all memory pages — prevents swap-induced page faults | Root or `CAP_IPC_LOCK` | | CPU isolation | Uses isolated cores if available (`isolcpus=` kernel param) | Kernel boot config | If any feature is unavailable (e.g., running without root on a development laptop), HORUS **logs a warning and continues**. This is the `prefer_rt` behavior — apply what you can, degrade gracefully on the rest. ```python # simplified # After starting, check what was achieved sched = horus.Scheduler(tick_rate=100, rt=True) sched.add(controller) caps = sched.capabilities() print(f"Full RT: {sched.has_full_rt()}") for degradation in sched.degradations(): print(f" Degradation: {degradation}") ``` `rt=True` is the right choice for almost all deployments. The same code works on a developer laptop (where RT features are unavailable) and on a production robot (where they are). Timing improves progressively as the platform improves. 
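The "apply what you can" behaviour can be pictured with the standard library alone. The sketch below is not how HORUS implements `rt=True`; it only shows the general pattern of requesting `SCHED_FIFO` through Python's `os` module on Linux and recording a degradation instead of failing. `request_fifo` is a hypothetical helper.

```python
import os


def request_fifo(priority: int = 50):
    """Try to switch the current process to SCHED_FIFO.

    Returns a list of human-readable degradations (empty on success).
    """
    degradations = []
    if not hasattr(os, "sched_setscheduler"):
        # Platforms without the POSIX scheduling API in Python's os module.
        degradations.append("SCHED_FIFO unsupported on this platform")
        return degradations
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        degradations.append("SCHED_FIFO denied (needs root or CAP_SYS_NICE)")
    except OSError as exc:
        degradations.append(f"SCHED_FIFO failed: {exc}")
    return degradations


problems = request_fifo(priority=50)
if problems:
    for problem in problems:
        print("degradation:", problem)
else:
    print("running with SCHED_FIFO")
```

On a development laptop this typically prints a single degradation and carries on, while the same call succeeds on a target that grants the capability, mirroring the laptop-to-robot progression described above.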
--- ## CPU Pinning Pin a node to a specific CPU core to reduce jitter from cache thrashing: ```python # simplified controller = horus.Node( name="controller", tick=controller_tick, rate=100, budget=5 * horus.ms, core=2, # Pin to CPU core 2 ) ``` When a thread migrates between CPU cores (which the OS does for load balancing), it loses its L1 and L2 cache contents. Rebuilding the cache takes microseconds — which shows up as jitter. Pinning eliminates this. CPU pinning is most effective when combined with Linux CPU isolation: ```bash # In /etc/default/grub (then update-grub and reboot) GRUB_CMDLINE_LINUX="isolcpus=2,3" ``` This tells the kernel not to schedule anything else on cores 2 and 3, reserving them entirely for your pinned nodes. --- ## Priority Set the OS scheduling priority for a real-time node: ```python # simplified controller = horus.Node( name="controller", tick=controller_tick, rate=100, budget=5 * horus.ms, priority=90, # SCHED_FIFO priority 1-99 (higher = more urgent) core=2, ) ``` Priority only takes effect when `rt=True` is set on the Scheduler and `SCHED_FIFO` is available. Higher values (closer to 99) mean the node preempts lower-priority real-time threads. The Linux kernel reserves priority 99 for its own real-time threads, so practical values are 1-98. | Priority range | Typical use | |---------------|-------------| | 90-98 | Safety monitors, emergency stop handlers | | 50-89 | Motor controllers, actuator loops | | 10-49 | Sensor fusion, perception pipelines | | 1-9 | Low-priority RT nodes (data recording with timing) | --- ## Watchdog The watchdog detects nodes that stop responding. 
HORUS provides two levels: ### Global Watchdog Set on the Scheduler, applies to all nodes: ```python # simplified sched = horus.Scheduler( tick_rate=100, rt=True, watchdog_ms=500, # Fire if any node is silent for 500ms max_deadline_misses=3, # Escalate after 3 consecutive misses ) ``` The graduated response prevents a single transient spike from killing a node: | Timeout | Health state | Response | |---------|-------------|----------| | 1x watchdog (500 ms) | Warning | Log warning | | 2x watchdog (1000 ms) | Unhealthy | Skip tick, log error | | 3x watchdog (1500 ms, critical node) | Isolated | Remove from tick loop, call `enter_safe_state()` | ### Per-Node Watchdog Override the global watchdog for individual nodes: ```python # simplified # Safety monitor gets a tighter watchdog safety = horus.Node( name="safety", tick=safety_tick, rate=100, watchdog=0.2, # 200ms — tighter than the global 500ms on_miss="stop", ) # ML inference gets a longer watchdog (model loading can be slow) detector = horus.Node( name="detector", tick=detect_tick, rate=30, watchdog=2.0, # 2 seconds — first inference loads the model on_miss="skip", ) ``` --- ## Complete Example: Multi-Node RT System A full system with sensor, controller, and safety monitor at different rates and miss policies: ```python # simplified import horus import gc us = horus.us ms = horus.ms # --- Sensor node: read IMU at 100Hz --- def sensor_tick(node): reading = horus.Imu( accel_x=0.0, accel_y=0.0, accel_z=9.81, gyro_x=0.01, gyro_y=0.0, gyro_z=0.0, ) node.send("imu", reading) sensor = horus.Node( name="imu_driver", pubs=[horus.Imu], tick=sensor_tick, rate=100, budget=2 * ms, on_miss="skip", priority=60, core=2, ) # --- Controller: PID loop at 50Hz --- gc.disable() # No GC pauses in the controller target_speed = 0.5 integral = 0.0 def controller_tick(node): global integral imu = node.recv("imu") if imu is None: return # Simple P+I controller error = target_speed - imu.accel_x integral += error * horus.dt() command = 2.0 * 
error + 0.1 * integral cmd = horus.CmdVel(linear=command, angular=0.0) node.send("cmd_vel", cmd) # Collect GC only if we have headroom if horus.budget_remaining() > 0.003: gc.collect(generation=0) controller = horus.Node( name="pid_controller", subs=[horus.Imu], pubs=[horus.CmdVel], tick=controller_tick, rate=50, budget=5 * ms, deadline=15 * ms, on_miss="safe_mode", priority=70, core=3, ) # --- Safety monitor: check system health at 100Hz --- def safety_tick(node): cmd = node.recv("cmd_vel") if cmd and abs(cmd.linear) > 2.0: node.log_warning(f"Unsafe velocity: {cmd.linear}") node.request_stop() safety = horus.Node( name="safety_monitor", subs=[horus.CmdVel], tick=safety_tick, rate=100, budget=500 * us, on_miss="stop", priority=95, watchdog=0.2, core=2, ) # --- Run everything --- sched = horus.Scheduler( tick_rate=100, rt=True, watchdog_ms=500, max_deadline_misses=3, ) sched.add(sensor) sched.add(controller) sched.add(safety) sched.run() # After shutdown, inspect what happened stats = sched.safety_stats() if stats: print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Watchdog timeouts: {stats.get('watchdog_timeouts', 0)}") ``` --- ## Quick Reference | Your node does... 
| Configuration | Why | |---|---|---| | Motor control at 50 Hz | `rate=50, budget=5*ms, on_miss="safe_mode"` | Explicit budget, safe degradation on overrun | | Sensor fusion at 100 Hz | `rate=100, budget=3*ms, on_miss="skip"` | Skip one reading rather than cascade delays | | Safety monitor | `rate=100, budget=500*us, on_miss="stop", priority=95` | Highest priority, immediate shutdown on overrun | | ML inference at 30 Hz | `rate=30, compute=True, budget=30*ms, on_miss="skip"` | Thread pool for CPU-bound work, skip slow frames | | Path planner (1 Hz, slow) | `rate=1, compute=True` | CPU-heavy, no deadline needed | | Background logging | default (no RT config) | BestEffort is fine | | Emergency stop handler | `on="emergency.stop"` | Runs only when event fires, zero polling overhead | ### Node Parameters for RT | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `rate` | `float` | `30` | Tick rate in Hz. Auto-derives budget/deadline if no `compute`/`on` set | | `budget` | `float` | `None` | Max expected tick time in seconds. Use `horus.us`/`horus.ms` | | `deadline` | `float` | `None` | Hard wall in seconds. Miss policy fires beyond this | | `on_miss` | `str` | `None` | `"warn"`, `"skip"`, `"safe_mode"`, `"stop"` | | `priority` | `int` | `None` | OS SCHED_FIFO priority, 1-99. 
Requires `rt=True` | | `core` | `int` | `None` | Pin to CPU core | | `watchdog` | `float` | `None` | Per-node watchdog timeout in seconds | ### Scheduler Parameters for RT | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `tick_rate` | `float` | `1000.0` | Global tick rate in Hz | | `rt` | `bool` | `False` | Enable SCHED_FIFO + mlockall | | `watchdog_ms` | `int` | `0` | Global watchdog timeout (0 = disabled) | | `max_deadline_misses` | `int` | `None` | Consecutive misses before escalation | | `cores` | `list` | `None` | CPU affinity for the scheduler itself | --- ## Design Decisions **Why allow real-time features in Python at all?** Many robotics teams have Python-heavy codebases — ML pipelines, prototyping, data analysis. Telling them "rewrite in a compiled language for any timing guarantees" means they get no timing guarantees during the prototype phase, which is when timing bugs are cheapest to find. Python RT at 10-100 Hz catches timing problems early. Teams can migrate hot loops to compiled code later with confidence, because the same budget/deadline parameters work in both languages. **Why auto-derive budget and deadline from `rate=`?** Developers think in terms of what their node needs: "this controller must run at 100 Hz." They do not think in scheduling policies. Auto-derivation (80% budget, 95% deadline) maps developer intent to the correct execution class. You can override with explicit `budget=` and `deadline=` after profiling, but the defaults are safe starting points that catch real problems. **Why `"warn"` as the default miss policy?** During development, most deadline misses are transient — a background process spiked, the system was under load, the debugger was attached. Stopping the system on every transient miss makes development painful. `"warn"` logs the problem so you can see the pattern, without disrupting the run. Switch to `"safe_mode"` or `"stop"` when deploying to hardware. 
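The 80% / 95% auto-derivation mentioned above is simple arithmetic. The sketch below assumes both fractions are taken of the tick period (1 / rate), which the text implies but does not state outright; `derive_timing` is a hypothetical helper, not a HORUS function.

```python
def derive_timing(rate_hz, budget_fraction=0.80, deadline_fraction=0.95):
    """Return (budget_s, deadline_s) derived from a tick rate in Hz.

    Assumption: both fractions apply to the tick period (1 / rate_hz).
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    period_s = 1.0 / rate_hz
    return budget_fraction * period_s, deadline_fraction * period_s


budget, deadline = derive_timing(100)  # 100 Hz gives a 10 ms period
print(f"budget={budget * 1e3:.1f} ms, deadline={deadline * 1e3:.1f} ms")
# budget=8.0 ms, deadline=9.5 ms
```

Under that assumption, a 100 Hz node that regularly needs more than about 8 ms per tick is a candidate for explicit `budget=` and `deadline=` values after profiling.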
**Why `budget_remaining()` instead of just enforcing the deadline?** Hard enforcement (kill the tick at the deadline) is dangerous — it can leave data structures in an inconsistent state. `budget_remaining()` enables cooperative degradation: the node checks its remaining time and decides what to skip. This is safer and gives the developer control over graceful degradation within a single tick. **Why `gc.disable()` instead of automatic GC management?** Automatic GC management (e.g., running GC only between ticks) would require deep integration with CPython internals and would break when users import C extensions that trigger GC internally. Manual `gc.disable()` + `gc.collect()` is explicit, portable, and gives the developer full control. The `budget_remaining()` pattern shown in this guide is the recommended approach. ## Trade-offs | Gain | Cost | |------|------| | **Python RT at 10-100 Hz** — catch timing bugs during prototyping | GIL limits max frequency; GC pauses cause jitter above 100 Hz | | **Auto-derived timing from `rate=`** — no explicit configuration needed | Less visible what budget/deadline the node actually got | | **`budget_remaining()` cooperative degradation** — node controls what to skip | Requires developer discipline; nothing prevents ignoring the budget | | **`gc.disable()` for critical nodes** — eliminates GC jitter | Cyclic reference leaks if not manually collecting; must understand GC behavior | | **`rt=True` graceful degradation** — works on laptops and production robots | May run without RT features and developer does not notice (check `degradations()`) | | **Per-node `priority=` and `core=`** — fine-grained OS scheduling control | Requires root or `CAP_SYS_NICE`; incorrect pinning wastes cores | | **Per-node `watchdog=`** — tighter timeout for critical nodes, looser for slow nodes | Multiple timeout values make debugging more complex | --- ## See Also - [Python API Reference](/python/api) — Full Node, Scheduler, and Clock API reference - 
[Real-Time Systems (Concepts)](/concepts/real-time) — Platform-independent RT concepts, hard vs soft vs firm - [Execution Classes](/concepts/execution-classes) — The 5 execution classes and how `rate=` maps to Rt - [RT Setup](/advanced/rt-setup) — Linux kernel configuration for real-time (isolcpus, PREEMPT_RT, ulimits) - [Python Examples](/python/examples) — Complete working examples including RT patterns --- ## Python Examples Path: /python/examples Description: Python code examples for HORUS robotics applications # Python Examples All examples use the standard `horus.Node` callback API — one pattern, no inheritance. ```python # simplified import horus def my_tick(node): node.send("output", data) node = horus.Node(name="my_node", pubs=["output"], tick=my_tick, rate=30) horus.run(node) ``` --- ## Basic Node A minimal node that reads sensor data and publishes motor commands: ```python # simplified import horus def controller(node): """Simple obstacle avoidance""" if node.has_msg("sensor.distance"): distance = node.recv("sensor.distance") if distance < 0.5: node.send("motor.cmd", {"linear": 0.0, "angular": 0.5}) else: node.send("motor.cmd", {"linear": 1.0, "angular": 0.0}) node = horus.Node( name="obstacle_avoider", subs=["sensor.distance"], pubs=["motor.cmd"], tick=controller, rate=10 ) horus.run(node) ``` --- ## Typed Topics Using typed messages for proper logging and cross-language compatibility: ```python # simplified import horus def controller(node): if node.has_msg("localization.pose"): pose = node.recv("localization.pose") # Logs show: Pose2D { x: 2.31, y: 1.31, theta: 0.5 } cmd = horus.CmdVel(linear=1.0, angular=0.0) node.send("control.cmd", cmd) node = horus.Node( name="controller", subs={"localization.pose": {"type": horus.Pose2D}}, pubs={"control.cmd": {"type": horus.CmdVel}}, tick=controller, rate=30 ) horus.run(node) ``` Available typed messages: `CmdVel`, `Pose2D`, `Imu`, `Odometry`, `LaserScan`, `JointState`, `MotorCommand`, and 50+ more. 
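Tick callbacks like the ones above touch the node only through `has_msg`, `recv`, and `send`, so their logic can be exercised without a scheduler. The sketch below uses a hypothetical `FakeNode` stand-in (not part of HORUS) that queues incoming messages and records outgoing ones; the `controller` function repeats the Basic Node logic unchanged.

```python
class FakeNode:
    """Hypothetical test stand-in exposing only has_msg/recv/send."""

    def __init__(self, inbox=None):
        # Copy each topic's messages so the caller's lists are not mutated
        self.inbox = {topic: list(msgs) for topic, msgs in (inbox or {}).items()}
        self.sent = []  # (topic, message) pairs in send order

    def has_msg(self, topic):
        return bool(self.inbox.get(topic))

    def recv(self, topic):
        queue = self.inbox.get(topic)
        return queue.pop(0) if queue else None

    def send(self, topic, message):
        self.sent.append((topic, message))


def controller(node):
    """Same obstacle-avoidance logic as the Basic Node example."""
    if node.has_msg("sensor.distance"):
        distance = node.recv("sensor.distance")
        if distance < 0.5:
            node.send("motor.cmd", {"linear": 0.0, "angular": 0.5})
        else:
            node.send("motor.cmd", {"linear": 1.0, "angular": 0.0})


node = FakeNode({"sensor.distance": [0.3, 2.0]})
controller(node)  # obstacle at 0.3 m: turn in place
controller(node)  # clear at 2.0 m: drive forward
controller(node)  # no message waiting: nothing is sent
print(node.sent)
```

A stand-in like this only checks the decision logic; timing, ordering, and shared-memory transport still need a run under the real scheduler.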
--- ## Multi-Node System Multiple nodes with execution order: ```python # simplified import horus def sensor_tick(node): node.send("sensor.distance", 1.5) def controller_tick(node): if node.has_msg("sensor.distance"): dist = node.recv("sensor.distance") if dist < 0.5: node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.5)) else: node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.0)) def logger_tick(node): if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") print(f"Command: linear={cmd.linear:.1f} angular={cmd.angular:.1f}") sensor = horus.Node(name="sensor", pubs=["sensor.distance"], tick=sensor_tick, rate=30) controller = horus.Node( name="controller", subs=["sensor.distance"], pubs={"cmd_vel": {"type": horus.CmdVel}}, tick=controller_tick, rate=30 ) logger = horus.Node(name="logger", subs=["cmd_vel"], tick=logger_tick, rate=10) # Quick run — no scheduler needed horus.run(sensor, controller, logger, duration=10.0) ``` --- ## Scheduler with Execution Control When you need execution order, failure policies, or RT features: ```python # simplified import horus sensor = horus.Node(name="sensor", pubs=["scan"], tick=sensor_tick, rate=100, order=0) controller = horus.Node(name="ctrl", subs=["scan"], pubs=["cmd"], tick=ctrl_tick, rate=100, order=1) logger = horus.Node(name="logger", subs=["cmd"], tick=log_tick, rate=10, order=10) scheduler = horus.Scheduler(tick_rate=100, watchdog_ms=500) # Critical — runs first scheduler.add(sensor) # RT control — runs after sensor scheduler.add(controller) # Non-critical — runs last scheduler.add(logger) scheduler.run() ``` ### Context Manager ```python # simplified sensor = horus.Node(name="sensor", tick=sensor_tick, rate=100, order=0) controller = horus.Node(name="ctrl", tick=ctrl_tick, rate=100, order=1) with horus.Scheduler(tick_rate=100) as sched: sched.add(sensor) sched.add(controller) sched.run(duration=30.0) # auto-stop on exit ``` ### Advanced Node Configuration For execution classes (compute, async I/O, 
event-driven), configure via Node kwargs: ```python # simplified scheduler = horus.Scheduler(tick_rate=500, rt=True) # Event-triggered: ticks when "lidar_scan" topic receives data scheduler.add(horus.Node(tick=detect_fn, on="lidar_scan", order=0)) # Compute pool: CPU-bound work on separate thread pool scheduler.add(horus.Node(tick=plan_fn, compute=True, rate=10)) # Async I/O: non-blocking network/file operations scheduler.add(horus.Node(tick=upload_fn, rate=1, failure_policy="ignore")) scheduler.run() ``` --- ## Sensor Processing Pipeline Processing laser scans for obstacle avoidance: ```python # simplified import horus def obstacle_avoidance(node): if node.has_msg("scan"): scan = node.recv("scan") if scan and hasattr(scan, 'ranges') and scan.ranges: min_dist = min(r for r in scan.ranges if r > 0.01) if min_dist < 0.5: node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.5)) else: node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.0)) node = horus.Node( name="obstacle_avoider", subs={"scan": {"type": horus.LaserScan}}, pubs={"cmd_vel": {"type": horus.CmdVel}}, tick=obstacle_avoidance, rate=10 ) horus.run(node) ``` --- ## Camera Image Pipeline Send and receive images with zero-copy shared memory: ```python # simplified import horus import numpy as np # Create image backed by shared memory img = horus.Image(480, 640, "rgb8") # Fill from NumPy (zero-copy when possible) pixels = np.zeros((480, 640, 3), dtype=np.uint8) pixels[:, :, 2] = 255 # Blue img = horus.Image.from_numpy(pixels) # Send over topic topic = horus.Topic("camera.rgb") topic.send(img) # Receive (in another node or process) received = topic.recv() if received is not None: arr = received.to_numpy() # zero-copy NumPy tensor = received.to_torch() # zero-copy PyTorch via DLPack print(f"Received {arr.shape[1]}x{arr.shape[0]} image") ``` --- ## Cross-Language Communication Python and Rust nodes communicate via shared memory topics — same machine, automatic: **Python publisher:** ```python # simplified 
import horus import time topic = horus.Topic(horus.CmdVel) while True: topic.send(horus.CmdVel(linear=1.0, angular=0.2)) time.sleep(0.1) ``` **Rust subscriber (separate process):** ```rust use horus::prelude::*; let topic: Topic<CmdVel> = Topic::new("cmd_vel")?; loop { if let Some(cmd) = topic.recv() { println!("linear={}, angular={}", cmd.linear, cmd.angular); } } ``` --- ## Time API Framework clock for deterministic-compatible code: ```python # simplified import horus def physics_tick(node): dt = horus.dt() # Fixed in deterministic mode elapsed = horus.elapsed() # Time since start tick = horus.tick() # Tick number rng = horus.rng_float() # Deterministic random [0, 1) budget = horus.budget_remaining() # Seconds left in budget # Physics integration node.velocity += node.acceleration * dt node.position += node.velocity * dt node = horus.Node(name="physics", tick=physics_tick, rate=100) horus.run(node) ``` | Function | Returns | Description | |----------|---------|-------------| | `horus.now()` | `float` | Current time (seconds) | | `horus.dt()` | `float` | Timestep (seconds) | | `horus.elapsed()` | `float` | Time since start (seconds) | | `horus.tick()` | `int` | Current tick number | | `horus.budget_remaining()` | `float` | Budget left (seconds, `inf` if none) | | `horus.rng_float()` | `float` | Random in [0, 1) | | `horus.timestamp_ns()` | `int` | Nanosecond timestamp | --- ## Async Node Nodes with `async def` tick functions run on the async I/O thread pool — perfect for network requests and file I/O: ```python # simplified import horus async def fetch_weather(node): """Fetch weather data from an API every tick""" import aiohttp async with aiohttp.ClientSession() as session: async with session.get("http://api.weather.local/current") as resp: data = await resp.json() node.send("weather", data) weather = horus.Node( name="weather_fetcher", pubs=["weather"], tick=fetch_weather, # async def auto-detected rate=1 # 1 Hz — fetch every second ) horus.run(weather) ``` HORUS 
auto-detects `async def` and runs it on the async executor. No manual event loop setup needed. --- ## ML Inference ONNX model inference with performance monitoring: ```python # simplified import horus import numpy as np def detect_tick(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") arr = img.to_numpy() # Preprocess input_data = preprocess(arr) # Run inference results = model.run(None, {"input": input_data})[0] # Publish detections for det in parse_detections(results): node.send("detections", det) node = horus.Node( name="detector", subs={"camera.rgb": {"type": horus.Image}}, pubs=["detections"], tick=detect_tick, rate=30, compute=True, order=5 ) scheduler = horus.Scheduler(tick_rate=30) scheduler.add(node) # Runs on compute pool scheduler.run() ``` --- ## See Also - [Python Bindings](/python/api/python-bindings) - Core Python API - [Message Library](/python/library/python-message-library) - Available message types - [Scheduler Deep-Dive](/python/scheduler-guide) — Time API and scheduler reference - [Deterministic Mode](/advanced/deterministic-mode) - Deterministic execution guide --- ## Force & Haptics Messages Path: /python/messages/force Description: Force sensing, impedance control, haptic feedback, and tactile types for Python robotics — every method explained # Force & Haptics Messages Force messages handle the physical interaction between your robot and the world — measuring contact forces, commanding force-controlled actuators, adjusting compliance, and providing haptic feedback to operators. This is the most physics-intensive message category. ```python # simplified from horus import ( WrenchStamped, ForceCommand, ImpedanceParameters, HapticFeedback, ContactInfo, TactileArray, Vector3, ) ``` --- ## WrenchStamped A 6-DOF force/torque measurement from a force/torque sensor. "Wrench" is the physics term for combined force + torque. 
### Constructor ```python # simplified wrench = WrenchStamped(fx=10.0, fy=0.0, fz=-9.81, tx=0.0, ty=0.5, tz=0.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `fx`, `fy`, `fz` | `float` | Force components in Newtons (N) | | `tx`, `ty`, `tz` | `float` | Torque components in Newton-meters (Nm) | | `timestamp_ns` | `int` | Measurement timestamp in nanoseconds | ### `.force_only(vector)` — Force Without Torque ```python # simplified from horus import Vector3 w = WrenchStamped.force_only(Vector3(x=0.0, y=0.0, z=-20.0)) # torque is zero ``` Creates a wrench with only force components. Use when modeling pure forces (gravity, push/pull) without any rotational component. ### `.torque_only(vector)` — Torque Without Force ```python # simplified w = WrenchStamped.torque_only(Vector3(x=0.0, y=0.5, z=0.0)) # force is zero ``` ### `.force_magnitude()` — Total Force Strength ```python # simplified print(f"Force: {wrench.force_magnitude():.1f} N") ``` Euclidean magnitude of the force vector: `sqrt(fx² + fy² + fz²)`. This is the total force regardless of direction — use it for safety checks ("is the force too high?") and monitoring. Typical ranges by robot type: - **Small arm (UR3)**: 0-30N normal, >50N = concerning - **Large arm (UR10)**: 0-100N normal, >150N = concerning - **Gripper**: 0-50N grasp force ### `.torque_magnitude()` — Total Torque Strength ```python # simplified print(f"Torque: {wrench.torque_magnitude():.2f} Nm") ``` ### `.exceeds_limits(max_force, max_torque)` — Safety Check ```python # simplified if wrench.exceeds_limits(max_force=50.0, max_torque=5.0): estop_topic.send(EmergencyStop.engage("Force limit exceeded"), node) cmd_topic.send(CmdVel.zero(), node) ``` Returns `True` if either `force_magnitude() > max_force` OR `torque_magnitude() > max_torque`. **This is a safety-critical method** — call it every tick when the robot is in contact with the environment or near humans. > **Common mistake:** Not calling this frequently enough. 
Force spikes happen in milliseconds — checking at 10Hz might miss a dangerous impact. Check at your control rate (typically 100-1000Hz). ### `.filter(prev_wrench, alpha)` — Noise Smoothing ```python # simplified prev = wrench # Save previous reading new_reading = WrenchStamped(fx=10.5, fy=0.1, fz=-9.8, tx=0.0, ty=0.5, tz=0.0) new_reading.filter(prev, alpha=0.8) # Result: 80% new reading + 20% previous reading ``` Applies exponential moving average (EMA) filter in-place. `alpha` controls responsiveness: - **0.9-1.0**: Very responsive but noisy — use for collision detection - **0.5-0.8**: Balanced — use for force control - **0.1-0.3**: Very smooth but laggy — use for slow-changing measurements > **Common mistake:** Not filtering force/torque sensor data. Raw readings are noisy, and noise causes false safety triggers and jittery force control. Always filter. --- ## ForceCommand Commands for force-controlled actuators. ### `.force_only(target)` — Pure Force Command ```python # simplified cmd = ForceCommand.force_only(Vector3(x=0.0, y=0.0, z=-10.0)) # Push down with 10N ``` Commands the actuator to apply the specified force. The controller maintains the target force using feedback from a force/torque sensor. ### `.surface_contact(normal_force, normal)` — Follow a Surface ```python # simplified cmd = ForceCommand.surface_contact( normal_force=5.0, normal=Vector3(x=0.0, y=0.0, z=1.0), # Surface normal points up ) ``` Creates a compliant contact command — the robot pushes against a surface with constant force along the surface normal while allowing free motion in the tangential plane. Essential for surface polishing, assembly insertion, and inspection tasks. 
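The idea behind surface contact, constant force along the normal with free motion in the tangential plane, rests on splitting a force vector into those two parts. The sketch below shows that decomposition with plain tuples rather than `Vector3`; `split_force` is a hypothetical helper for illustration and says nothing about how the HORUS controller is implemented.

```python
import math


def split_force(force, normal):
    """Split a 3D force into parts along and tangential to a surface normal.

    Returns (along, normal_part, tangential_part), where `along` is the
    signed magnitude of the force in the direction of the unit normal.
    """
    length = math.sqrt(sum(n * n for n in normal))
    if length == 0.0:
        raise ValueError("surface normal must be non-zero")
    unit = tuple(n / length for n in normal)
    along = sum(f * u for f, u in zip(force, unit))  # dot product
    normal_part = tuple(along * u for u in unit)
    tangential_part = tuple(f - p for f, p in zip(force, normal_part))
    return along, normal_part, tangential_part


# Measured force while pressing down on a horizontal surface (normal points up)
along, normal_part, tangential_part = split_force((1.0, 2.0, -5.0), (0.0, 0.0, 1.0))
print(along)            # signed force along the normal
print(tangential_part)  # what remains in the surface plane
```

In this example the normal component is the part a force controller would regulate toward `normal_force`, while the tangential remainder corresponds to the directions left free to move.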
### `.with_timeout(seconds)` — Safety Timeout ```python # simplified cmd = ForceCommand.force_only(Vector3(x=0.0, y=0.0, z=-10.0)) \ .with_timeout(5.0) ``` **Always set a timeout on force commands.** Without one, the robot pushes indefinitely — if the target surface moves away, the robot lunges forward into free space at full force. --- ## ImpedanceParameters Spring-damper model for compliant robot behavior. Think of it as a virtual spring between the robot and its target position. ### `.compliant()` — Soft Spring (Safe for Contact) ```python # simplified params = ImpedanceParameters.compliant() ``` Low stiffness, high damping. The robot yields easily on contact — like pushing against a foam pad. **Use this when the robot is near humans or during the approach phase of contact tasks.** ### `.stiff()` — Rigid Spring (Precise Positioning) ```python # simplified params = ImpedanceParameters.stiff() ``` High stiffness, low damping. The robot holds position rigidly — small forces won't displace it. **Use this for precise positioning in free space** (not during contact — rigid + contact = high forces). ### `.enable()` / `.disable()` — Toggle Impedance Mode ```python # simplified params = ImpedanceParameters.compliant() params.enable() # Activate compliance # ... do contact task ... params.disable() # Back to position control ``` > **Common mistake:** Using `stiff()` during contact approach. If the robot touches something while stiff, the contact force spikes immediately. Always approach in `compliant()` mode, then switch to `stiff()` after stable contact is established. --- ## HapticFeedback Haptic patterns for teleoperation — the operator *feels* what the robot feels. ### `.vibration(intensity, frequency, duration)` — Continuous Vibration ```python # simplified vib = HapticFeedback.vibration(intensity=0.8, frequency=200.0, duration=0.5) ``` - `intensity`: 0.0 (off) to 1.0 (max). Clamped. - `frequency`: Hz. 100-300Hz is most perceptible on most haptic devices. 
- `duration`: seconds. Use for collision warnings, proximity alerts, motor stall notification. ### `.force(force_vec, duration)` — Force Feedback ```python # simplified ff = HapticFeedback.force(force=Vector3(x=0.0, y=0.0, z=-2.0), duration=1.0) ``` Pushes back on the operator's hand. Use to convey the robot's contact force — the operator feels resistance proportional to what the robot feels. ### `.pulse(intensity, frequency, duration)` — Single Pulse ```python # simplified pulse = HapticFeedback.pulse(intensity=1.0, frequency=50.0, duration=0.1) ``` A brief, sharp pulse. Use for event notification — button click confirmation, waypoint reached, object grasped. --- ## ContactInfo Contact detection and classification from force/torque sensors or contact sensors. ### Constructor ```python # simplified contact = ContactInfo(state=1, force=5.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `state` | `int` | Contact state (0=NoContact, 1=Contact, 2=Sliding, 3=Stuck) | | `force` | `float` | Contact force magnitude (Newtons) | | `timestamp_ns` | `int` | Timestamp of contact event | ### `.is_in_contact()` — Is the Robot Touching Something? ```python # simplified if contact.is_in_contact(): print("Contact detected!") ``` Returns `True` when the contact state indicates active contact (not `NoContact`). ### `.contact_duration_seconds()` — How Long Has Contact Lasted? ```python # simplified duration = contact.contact_duration_seconds() if duration > 2.0: print("Stable contact for 2+ seconds — safe to increase force") ``` Returns seconds since initial contact. Use to distinguish brief collisions (< 0.1s) from stable grasps (> 1s). 
**Example — Grasp Verification:** ```python # simplified from horus import ContactInfo, Topic contact_topic = Topic(ContactInfo) def verify_grasp(node): contact = contact_topic.recv(node) if contact is None: return if contact.is_in_contact() and contact.contact_duration_seconds() > 1.0: if contact.force > 2.0: print("Stable grasp confirmed") else: print("Weak grasp — increase gripper force") elif not contact.is_in_contact(): print("No contact — object dropped?") ``` --- ## TactileArray Grid of force readings from a tactile sensor pad — common on robotic gripper fingertips. ### Constructor ```python # simplified tactile = TactileArray(rows=4, cols=4) # 4x4 taxel grid ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `rows` | `int` | Number of taxel rows | | `cols` | `int` | Number of taxel columns | | `physical_size` | `(float, float)` | Sensor pad dimensions in meters (width, height) | ### `.set_force(row, col, force)` / `.get_force(row, col)` — Access Taxels ```python # simplified tactile.set_force(1, 2, 3.5) # Set taxel at row 1, col 2 to 3.5 N force = tactile.get_force(1, 2) # 3.5 force = tactile.get_force(10, 10) # None (out of bounds) ``` Row-major grid. Each taxel is a force reading in Newtons. `physical_size` gives the sensor pad dimensions in meters, `center_of_pressure` gives the weighted average contact point. 
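The weighted average behind a center of pressure can be written out directly. The sketch below works on a plain list-of-lists grid and returns the result in (row, col) index units; both choices are assumptions for illustration, and `center_of_pressure` here is a standalone function, not the HORUS attribute of the same name.

```python
def center_of_pressure(grid):
    """Force-weighted average (row, col) of a taxel grid, or None if unloaded."""
    total = 0.0
    row_moment = 0.0
    col_moment = 0.0
    for r, row in enumerate(grid):
        for c, force in enumerate(row):
            total += force
            row_moment += r * force
            col_moment += c * force
    if total <= 0.0:
        return None  # no contact, so the average is undefined
    return row_moment / total, col_moment / total


grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0, 0.0],  # 1 N at (1, 1) and 3 N at (1, 2)
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cop = center_of_pressure(grid)
print(cop)  # (1.0, 1.75): pulled toward the heavier taxel
```

A contact point drifting across the pad from tick to tick is another slip cue that could be combined with a drop in total force.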
**Example — Detect Grip Slip:** ```python # simplified from horus import TactileArray, Topic tactile_topic = Topic(TactileArray) prev_total = 0.0 def detect_slip(node): global prev_total tactile = tactile_topic.recv(node) if tactile is None: return total_force = 0.0 for r in range(tactile.rows): for c in range(tactile.cols): f = tactile.get_force(r, c) if f is not None: total_force += f if prev_total > 0 and total_force < prev_total * 0.7: print("WARNING: Rapid force drop — possible slip!") prev_total = total_force ``` --- ## Design Decisions **Why "WrenchStamped" instead of "ForceTorque"?** "Wrench" is the standard physics term for a combined force-torque vector (6-DOF). Using the correct physics terminology aligns with academic literature and other robotics frameworks. "Stamped" indicates the message includes a timestamp and frame ID, distinguishing it from a raw wrench without metadata. **Why does `exceeds_limits()` check force and torque independently?** A robot arm can exert dangerous force without dangerous torque (pushing straight), or dangerous torque without dangerous force (twisting a stuck bolt). Checking both independently catches both failure modes. If you only need one, compare `force_magnitude()` or `torque_magnitude()` directly. **Why does `ImpedanceParameters` have presets instead of requiring manual tuning?** Impedance tuning requires knowledge of the robot's mass, inertia, and task requirements. Getting it wrong causes instability (oscillation) or poor performance (too stiff or too soft). The `compliant()` and `stiff()` presets provide safe starting points for the two most common modes. Users can adjust from there. **Why does `ForceCommand` require an explicit timeout?** Force control without a timeout is a safety hazard. If the contact surface moves away, the robot accelerates into free space at the commanded force. A timeout limits the maximum duration, giving the safety system time to intervene. 
The `with_timeout()` method is chained rather than required in the constructor to keep simple examples readable, but production code should always set one. **Why `HapticFeedback` patterns instead of raw motor commands?** Haptic devices vary widely (vibration motors, voice coils, force-feedback joysticks). Patterns (`vibration()`, `pulse()`, `force()`) describe intent, not implementation. The haptic driver maps patterns to device-specific commands. This decouples your teleoperation code from the specific haptic hardware. --- ## See Also - [Diagnostics Messages](/python/messages/diagnostics) — EmergencyStop for force safety - [Control Messages](/python/messages/control) — MotorCommand, JointCommand for actuators - [Geometry Messages](/python/messages/geometry) — Vector3 for force/torque directions - [Sensor Messages](/python/messages/sensor) — JointState for reading joint efforts - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## Node Lifecycle (Python) Path: /python/node-lifecycle Description: Complete guide to Python node lifecycle in HORUS: construction, initialization, tick loop, shutdown, error handling, and state management # Node Lifecycle (Python) A warehouse robot runs an ML detector at 30 Hz, a motor controller at 100 Hz, and a safety monitor that must stop the wheels within one cycle if the detector flags an obstacle. The motor controller needs a serial port opened before the first tick. The detector needs a model loaded into GPU memory. The safety monitor must send a zero-velocity command before the serial port closes. And when someone presses Ctrl+C, all of this must happen in the right order -- motors stop before sensors disconnect, the model releases GPU memory, and the log file flushes before the process exits. In HORUS, every Python node follows the same lifecycle: **construct, initialize, tick, shut down**. 
The scheduler controls when each phase runs, guarantees the order, and catches exceptions at every stage. This page covers the complete lifecycle, the rules for each phase, and the patterns for managing state across them. ## Lifecycle Overview Every node transitions through well-defined states, managed by the scheduler. A node starts in UNINIT at construction, then: UNINIT → INITIALIZING on `Scheduler.run()`; INITIALIZING → RUNNING on `init()` success; RUNNING → STOPPING on Ctrl+C / `stop()`; STOPPING → STOPPED; INITIALIZING → ERROR on an init error; RUNNING → CRASHED when unrecoverable; ERROR → CRASHED; CRASHED → ERROR; ERROR → RUNNING on recovery. *Figure: Node lifecycle: UNINIT → INITIALIZING → RUNNING → STOPPING → STOPPED, with error/crash recovery paths.* **Phase summary**: | Phase | Callback | Called | Purpose | |-------|----------|-------|---------| | Construction | `Node(...)` | Once, by your code | Declare topics, set rate, wire callbacks | | Initialization | `init(node)` | Once, by the scheduler | Open hardware, load models, allocate buffers | | Tick loop | `tick(node)` | Repeatedly, at configured rate | Read sensors, compute, publish results | | Error handling | `on_error(node, exception)` | On each exception in `tick()` | Log, recover, or escalate | | Shutdown | `shutdown(node)` | Once, by the scheduler | Stop motors, close files, release resources | ## NodeState Enum The `NodeState` enum tracks the current lifecycle phase. 
Import and compare it directly: ```python # simplified from horus import NodeState # Values NodeState.UNINITIALIZED # Created but scheduler hasn't started NodeState.INITIALIZING # init() is executing NodeState.RUNNING # Actively ticking NodeState.STOPPING # shutdown() is executing NodeState.STOPPED # Clean shutdown complete NodeState.ERROR # Recoverable error -- on_error() was called NodeState.CRASHED # Unrecoverable -- node removed from tick loop ``` NodeState values are strings, so direct comparison works: ```python # simplified if node.info.state == "running": node.log_info("Node is active") ``` ## Construction Construction declares *what* the node does. No hardware is opened, no connections are made, no resources are allocated. That happens in `init()`. ### Function-Based (Closures) The simplest pattern. Use module-level or closure variables for state: ```python # simplified import horus count = 0 def my_init(node): node.log_info("Sensor starting") def my_tick(node): global count count += 1 node.send("heartbeat", {"tick": count}) def my_shutdown(node): node.log_info(f"Sensor stopped after {count} ticks") sensor = horus.Node( name="sensor", tick=my_tick, init=my_init, shutdown=my_shutdown, pubs=["heartbeat"], rate=10, ) horus.run(sensor) ``` This works well for simple nodes. The `count` variable lives in module scope and persists across ticks. 
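
If the `global` statement feels heavy, the same per-node state can live in a closure instead. The sketch below is plain Python and makes two assumptions for illustration: `make_heartbeat_tick` is a hypothetical helper name, and `StubNode` is a stand-in that only mimics the `send(topic, data)` call shown above so the example runs without HORUS installed.

```python
# Sketch only: StubNode stands in for the real node object so this runs
# without HORUS. make_heartbeat_tick is a hypothetical helper name.

class StubNode:
    """Records what would have been published via node.send()."""

    def __init__(self):
        self.sent = []

    def send(self, topic, data):
        self.sent.append((topic, data))


def make_heartbeat_tick():
    """Return a tick callback whose counter lives in the closure."""
    state = {"count": 0}  # mutable container instead of a module-level global

    def tick(node):
        state["count"] += 1
        node.send("heartbeat", {"tick": state["count"]})

    return tick


tick = make_heartbeat_tick()
node = StubNode()
for _ in range(3):
    tick(node)  # the scheduler would make these calls at the configured rate
```

With the real API, the returned function would presumably be passed as `tick=make_heartbeat_tick()` when constructing the node. Each call to the factory yields an independent counter, which also makes the callback easier to test in isolation.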
### Class-Based (Bound Methods) For nodes with complex state, use a class and pass its methods as callbacks: ```python # simplified import horus class MotorController: def __init__(self): self.serial = None self.velocity = 0.0 self.error_count = 0 def init(self, node): import serial self.serial = serial.Serial("/dev/ttyUSB0", 115200) node.log_info("Motor serial port opened") def tick(self, node): if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") self.velocity = cmd["linear"] self.serial.write(f"V{self.velocity:.2f}\n".encode()) def on_error(self, node, exception): self.error_count += 1 node.log_error(f"Motor error #{self.error_count}: {exception}") if self.error_count > 5: self.velocity = 0.0 node.log_warning("Too many errors -- zeroing velocity") def shutdown(self, node): self.velocity = 0.0 if self.serial: self.serial.write(b"V0.00\n") self.serial.close() node.log_info("Motor stopped safely") motor = MotorController() node = horus.Node( name="motor", tick=motor.tick, init=motor.init, shutdown=motor.shutdown, on_error=motor.on_error, subs=["cmd_vel"], rate=100, order=10, ) horus.run(node) ``` The class instance (`motor`) owns all state. Bound methods (`motor.tick`) automatically have access to `self`, so no globals are needed. ### When to Use Which | Pattern | Best for | Drawback | |---------|----------|----------| | **Functions + closures** | Simple nodes, prototyping, stateless transforms | Global variables, harder to test in isolation | | **Functions + mutable container** | Moderate state (a dict, a counter) | Manual discipline -- nothing enforces structure | | **Class with bound methods** | Complex state, hardware drivers, ML models | More boilerplate | A practical rule: if your node needs more than two mutable variables across ticks, use a class. ## Initialization: `init(node)` Called **once** by the scheduler, before the first `tick()`. This is where you do work that is too slow or too risky for the tick loop. 
```python # simplified def init(node): # Open hardware connections node.serial = serial.Serial("/dev/ttyUSB0", 115200) # Load ML models (may take seconds) node.model = load_yolo_model("yolov8n.pt") # Pre-allocate buffers (avoid allocation in tick) node.buffer = bytearray(4096) # Set actuators to known-safe state node.velocity = 0.0 node.log_info("Controller initialized") ``` **Rules for init()**: - **Do open hardware here.** Serial ports, cameras, GPIO pins -- all should be opened in `init()`, not during construction. Construction happens at import time; `init()` happens when the scheduler is ready to run. - **Do pre-allocate.** Create lists, arrays, buffers, and model inference sessions here. Allocation during `tick()` causes jitter. - **Do set safe defaults.** Motors at zero velocity, grippers open, heaters off. If the system shuts down before the first tick, `shutdown()` should find everything in a safe state. - **Do not start background threads.** The scheduler manages execution. Background threads fight the scheduler for CPU time and break deterministic ordering. - **If init() raises an exception**, the node enters the Error state and its `failure_policy` is applied. Other nodes continue to initialize and run. ### Lazy Initialization `init()` is called lazily -- when `scheduler.run()` or `horus.run()` starts, not when the node is constructed. This means: ```python # simplified # Construction happens here -- no hardware touched motor = horus.Node(name="motor", tick=motor_tick, init=motor_init, ...) # ... possibly minutes later ... # init() runs here -- hardware is opened horus.run(motor) ``` This is deliberate. You can construct and configure all nodes before any hardware is touched. The scheduler finalizes its own configuration (clock, RT settings, recording) before calling any node's `init()`. ## The Tick Loop: `tick(node)` The scheduler calls `tick()` repeatedly at the configured rate. This is your main logic -- read sensors, compute, publish results. 
```python # simplified def tick(node): # Read all subscribed topics if node.has_msg("scan"): scan = node.recv("scan") obstacles = detect_obstacles(scan) if obstacles: cmd = compute_avoidance(obstacles) else: cmd = {"linear": 1.0, "angular": 0.0} node.send("cmd_vel", cmd) ``` ### Rules for tick() **Keep it fast.** The scheduler monitors how long `tick()` takes. If it exceeds the node's budget, that is a deadline miss -- which can trigger safety responses. For a 100 Hz node, each tick has 10 ms. Stay well under. **Never block on I/O.** Do not read files, make HTTP requests, or wait on sockets inside `tick()`. These operations can take milliseconds to seconds, stalling the entire tick cycle. For I/O-heavy work, use an async tick function -- it auto-detects and runs on the async I/O executor: ```python # simplified import aiohttp async def fetch_tick(node): async with aiohttp.ClientSession() as session: async with session.get("http://api.weather.com/data") as resp: data = await resp.json() node.send("weather", data) # async def auto-detected -- runs on async executor, not the main tick loop node = horus.Node(tick=fetch_tick, pubs=["weather"], rate=0.1) ``` **Never call `time.sleep()`.** The scheduler controls timing. Sleeping inside `tick()` wastes the node's entire time budget and delays every node that runs after it. **Always drain your topics.** Call `recv()` on every subscribed topic every tick, even if you do not need the data. Unread messages pile up in the ring buffer, and when you finally read them, you get stale data from potentially seconds ago: ```python # simplified def tick(node): # CORRECT: always drain, even if you only act on the latest scan = node.recv("scan") odom = node.recv("odom") if scan is not None: process(scan) ``` **Do not allocate in the hot path.** Creating lists, dicts, or large objects in `tick()` triggers Python's allocator, which can add milliseconds of jitter. 
Pre-allocate in `init()` and reuse: ```python # simplified class Processor: def __init__(self): self.buffer = [0.0] * 360 # Pre-allocate in constructor def init(self, node): self.result = {"ranges": self.buffer, "count": 0} def tick(self, node): scan = node.recv("scan") if scan: # Reuse pre-allocated buffer for i, r in enumerate(scan["ranges"]): self.buffer[i] = r * 0.01 self.result["count"] += 1 node.send("processed", self.result) ``` ## Shutdown: `shutdown(node)` Called **once** when the scheduler stops -- whether from Ctrl+C, SIGINT, SIGTERM, `node.request_stop()`, or `scheduler.stop()`. Nodes shut down in **reverse order**: the last-added node shuts down first. ```python # simplified def shutdown(node): # 1. Stop actuators FIRST node.send("cmd_vel", {"linear": 0.0, "angular": 0.0}) # 2. Close hardware connections if hasattr(node, "serial") and node.serial: node.serial.close() # 3. Flush and close files if hasattr(node, "log_file") and node.log_file: node.log_file.flush() node.log_file.close() node.log_info("Motor controller shut down safely") ``` **Rules for shutdown()**: - **Stop actuators before closing connections.** Send zero velocity before dropping the serial port. Otherwise, the motor holds its last commanded velocity. - **Never raise exceptions.** If one cleanup step fails, log and continue. Do not let one failure prevent other cleanup: ```python # simplified def shutdown(node): try: send_stop_command() except Exception as e: node.log_error(f"Failed to stop motors: {e}") # Continue cleanup -- don't return early try: close_serial_port() except Exception as e: node.log_warning(f"Failed to close serial: {e}") ``` - **Do not assume tick() ran.** The system may shut down between `init()` and the first `tick()`. Your shutdown code must handle hardware that was opened but never used. ### Reverse-Order Shutdown Nodes are typically added in dependency order: sensors first, then controllers, then loggers. 
Reverse-order shutdown means controllers stop motors before sensors disconnect: ```python # simplified sensor = horus.Node(name="sensor", ...) # Added first, shuts down last controller = horus.Node(name="controller", ...) # Shuts down second logger = horus.Node(name="logger", ...) # Added last, shuts down first horus.run(sensor, controller, logger) # Shutdown order: logger -> controller -> sensor ``` This prevents the scenario where a sensor disconnects while the motor controller is still running with its last received command. ## Error Handling: `on_error(node, exception)` When `tick()` raises an exception, the scheduler catches it and calls `on_error()` if provided. The callback receives both the node and the exception: ```python # simplified def on_error(node, exception): node.log_error(f"Tick failed: {exception}") # Track consecutive errors if not hasattr(node, "_error_count"): node._error_count = 0 node._error_count += 1 if node._error_count > 10: node.log_error("Too many consecutive errors -- requesting stop") node.request_stop() ``` **How error escalation works**: 1. `tick()` raises an exception 2. If `on_error` is set, it is called with `(node, exception)` 3. If `on_error` handles the exception (does not re-raise), the node continues ticking 4. If `on_error` itself raises, or if no `on_error` is set, the exception propagates to the Rust `FailurePolicy` 5. 
The `FailurePolicy` decides what happens next: | Policy | Behavior | |--------|----------| | `"fatal"` (default) | Stop the entire scheduler | | `"restart"` | Re-initialize and resume the node | | `"skip"` | Skip this tick, continue with next | | `"ignore"` | Silently continue | ```python # simplified # Node with custom error handling + skip policy as fallback node = horus.Node( name="detector", tick=detect_objects, on_error=handle_detection_error, failure_policy="skip", # If on_error also fails, skip this tick rate=30, ) ``` **When to use `on_error` vs `failure_policy`**: Use `on_error` when you want to inspect the exception, track error counts, or take corrective action in Python. Use `failure_policy` as a safety net for unhandled exceptions. They compose -- `on_error` runs first, and `failure_policy` only fires if the exception is not handled. ## Topic Operations All topic communication happens through the `node` object passed to your callbacks. Topics must be declared at construction time via `pubs` and `subs`. ### `node.send(topic, data)` Publish a message. Non-blocking. If the ring buffer is full, the oldest unread message is overwritten. ```python # simplified # Dict (generic -- MessagePack serialization) node.send("status", {"battery": 85.2, "state": "running"}) # Typed message (zero-copy POD transport) from horus import CmdVel node.send("cmd_vel", CmdVel(linear=1.0, angular=0.5)) # Primitive node.send("temperature", 22.5) ``` ### `node.recv(topic)` Receive one message (FIFO order). Returns `None` if no messages are available. ```python # simplified msg = node.recv("scan") if msg is not None: process(msg) ``` ### `node.recv_all(topic)` Drain all available messages as a list. Returns an empty list if none are available. 
Use this for batch processing when every message matters: ```python # simplified def tick(node): commands = node.recv_all("commands") for cmd in commands: execute(cmd) node.log_debug(f"Processed {len(commands)} commands this tick") ``` ### `node.has_msg(topic)` Check if at least one message is available without consuming it. The message is buffered internally and returned by the next `recv()` call. ```python # simplified def tick(node): if node.has_msg("emergency_stop"): stop = node.recv("emergency_stop") node.log_warning("Emergency stop received") node.request_stop() ``` ### Multi-Topic Pattern When your node subscribes to multiple topics, always drain all of them every tick. Cache the latest value from each and process when all are available: ```python # simplified class SensorFusion: def __init__(self): self.last_imu = None self.last_odom = None def tick(self, node): # ALWAYS drain both topics -- even if you only act when both are present imu = node.recv("imu") odom = node.recv("odom") if imu is not None: self.last_imu = imu if odom is not None: self.last_odom = odom if self.last_imu and self.last_odom: fused = self.fuse(self.last_imu, self.last_odom) node.send("pose", fused) ``` ## Logging HORUS provides structured logging through the node object. Log messages are routed to the HORUS logging system, which integrates with the TUI monitor and CLI tools. 
```python
# simplified
def tick(node):
    node.log_info("Processing frame")
    node.log_warning("Sensor reading is stale")
    node.log_error("Failed to connect to motor")
    node.log_debug(f"Raw value: {value}")
```

| Method | Level | Use for |
|--------|-------|---------|
| `node.log_debug(msg)` | DEBUG | Verbose diagnostics, variable dumps |
| `node.log_info(msg)` | INFO | Normal operation milestones |
| `node.log_warning(msg)` | WARNING | Degraded operation, stale data |
| `node.log_error(msg)` | ERROR | Failures that affect behavior |

**Logging only works during lifecycle callbacks** -- inside `init()`, `tick()`, `shutdown()`, and `on_error()`. Calling logging methods outside the scheduler produces a `RuntimeWarning` and the message is dropped:

```python
# simplified
# This works -- called inside tick() by the scheduler
def my_tick(node):
    node.log_info("This appears in the log")

node = horus.Node(name="test", tick=my_tick, pubs=["data"], rate=10)

# This is dropped with a RuntimeWarning -- scheduler is not running yet
node.log_info("This will not appear")
```

This restriction exists because the HORUS logging system requires the scheduler's runtime context (timestamps, node identity, routing) to function. Outside the scheduler, that context does not exist.

## NodeInfo: Runtime Metrics

During `init()`, `tick()`, and `shutdown()`, the scheduler attaches a `NodeInfo` object to `node.info`.
It provides scheduler-managed metrics: | Method/Property | Returns | Description | |----------------|---------|-------------| | `node.info.name` | `str` | Node name | | `node.info.state` | `str` | Current NodeState value | | `node.info.tick_count()` | `int` | Total ticks executed | | `node.info.error_count()` | `int` | Total errors encountered | | `node.info.successful_ticks()` | `int` | Ticks that completed without error | | `node.info.avg_tick_duration_ms()` | `float` | Average tick execution time | | `node.info.get_uptime()` | `float` | Seconds since init() completed | | `node.info.get_metrics()` | `dict` | Full metrics snapshot | | `node.info.request_stop()` | -- | Signal the scheduler to stop | ```python # simplified def tick(node): # Periodic health report if node.info.tick_count() % 1000 == 0: metrics = node.info.get_metrics() node.log_info( f"Health: {node.info.tick_count()} ticks, " f"avg={node.info.avg_tick_duration_ms():.2f}ms, " f"errors={node.info.error_count()}" ) # Self-monitoring: stop if error rate is too high total = node.info.tick_count() if total > 100 and node.info.error_count() / total > 0.1: node.log_error("Error rate exceeded 10% -- shutting down") node.info.request_stop() ``` `node.info` is `None` outside of lifecycle callbacks. Do not cache it -- the scheduler replaces it each tick with updated metrics. ## Scheduler Context Manager The `Scheduler` (not `Node`) supports Python's `with` statement for automatic cleanup. When the `with` block exits -- whether normally or via exception -- `stop()` is called automatically: ```python # simplified import horus with horus.Scheduler(tick_rate=100) as sched: sched.add(horus.Node(name="sensor", tick=sensor_fn, rate=100, order=0, pubs=["data"])) sched.add(horus.Node(name="ctrl", tick=ctrl_fn, rate=100, order=1, subs=["data"])) sched.run(duration=10.0) # stop() called automatically here -- all shutdown() callbacks run ``` This is equivalent to wrapping the scheduler in a try/finally block. 
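
A minimal sketch of that equivalence, using a stand-in class rather than the real scheduler: `StubScheduler` is hypothetical and only records calls. It assumes nothing about `horus.Scheduler` beyond what is stated above, namely that leaving the `with` block calls `stop()`.

```python
# Sketch only: StubScheduler mimics the behaviour described above (stop() runs
# when the with-block exits). It is not the HORUS Scheduler class.

class StubScheduler:
    def __init__(self):
        self.events = []

    def run(self, fail=False):
        self.events.append("run")
        if fail:
            raise RuntimeError("simulated failure inside run()")

    def stop(self):
        self.events.append("stop")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()    # always runs, with or without an exception
        return False   # do not swallow the exception


# Form 1: context manager
with_sched = StubScheduler()
try:
    with with_sched as sched:
        sched.run(fail=True)
except RuntimeError:
    pass

# Form 2: explicit try/finally
manual_sched = StubScheduler()
try:
    try:
        manual_sched.run(fail=True)
    finally:
        manual_sched.stop()
except RuntimeError:
    pass
```

Both forms record the same sequence of calls even though `run()` raised, which is the property that matters for shutdown callbacks.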
Use the context manager in scripts and tests to ensure clean shutdown even when exceptions occur.

For most applications, `horus.run()` is simpler:

```python
# simplified
# horus.run() handles scheduler creation and cleanup internally
horus.run(sensor, controller, logger, duration=10.0)
```

## Complete Example: ML Object Detector

This example ties every lifecycle phase together -- a real-world pattern where an ML model processes camera frames and publishes detection results:

```python
# simplified
import time

import horus


class ObjectDetector:
    def __init__(self):
        self.model = None
        self.frame_count = 0
        self.detection_count = 0
        self.log_file = None

    def init(self, node):
        # Load model (slow -- do it here, not in tick)
        import torch
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.eval()

        # Open log file
        self.log_file = open("/tmp/detections.csv", "w")
        self.log_file.write("frame,num_detections,latency_ms\n")
        node.log_info("YOLOv5 model loaded, logging to /tmp/detections.csv")

    def tick(self, node):
        frame = node.recv("camera.rgb")
        if frame is None:
            return

        self.frame_count += 1

        # Run inference (time is imported at module level, not per tick)
        t0 = time.monotonic()
        results = self.model(frame["data"])
        latency = (time.monotonic() - t0) * 1000

        detections = results.xyxy[0].tolist()
        self.detection_count += len(detections)

        # Publish results
        node.send("detections", {
            "frame": self.frame_count,
            "objects": detections,
            "latency_ms": latency,
        })

        # Log to file
        self.log_file.write(f"{self.frame_count},{len(detections)},{latency:.1f}\n")

        if latency > 50:
            node.log_warning(f"Inference slow: {latency:.1f}ms")

    def on_error(self, node, exception):
        node.log_error(f"Detection failed on frame {self.frame_count}: {exception}")
        # Continue -- skip this frame, try the next one

    def shutdown(self, node):
        if self.log_file:
            self.log_file.flush()
            self.log_file.close()
        node.log_info(
            f"Detector stopped: {self.frame_count} frames, "
            f"{self.detection_count} detections"
        )


detector = ObjectDetector()
node = horus.Node(
    name="detector",
    tick=detector.tick,
    init=detector.init,
    shutdown=detector.shutdown,
    on_error=detector.on_error,
    subs=["camera.rgb"],
    pubs=["detections"],
    rate=30,
    order=50,
    failure_policy="skip",  # Skip frame on unhandled error
)

horus.run(node)
```

## Design Decisions

**Why callbacks instead of a base class?** A base class (`class MyNode(horus.Node)`) would require every node to be a class, even trivial ones. Callbacks let you use plain functions for simple nodes and classes for complex ones. The same `Node` constructor handles both -- pass a function or a bound method.

**Why does `init()` run lazily instead of at construction?** Construction happens when your Python module loads. `init()` runs when the scheduler starts. This separation means you can construct all nodes, configure the scheduler, and defer hardware initialization until the system is actually ready. It also means `init()` can depend on the scheduler's configuration (clock source, RT settings) being finalized.

**Why does `on_error` receive `(node, exception)` instead of just `(exception)`?** The node parameter gives your error handler access to `node.log_error()`, `node.request_stop()`, `node.info`, and topic operations. Without it, error handlers would need to capture the node via closure -- forcing every error handler to be a closure or bound method, even for simple "log and continue" cases.

**Why is logging restricted to lifecycle callbacks?** HORUS logging integrates with the scheduler's runtime context -- timestamps, node identity, log routing, and the TUI monitor. Outside the scheduler, that context does not exist. Rather than silently producing broken log entries, HORUS warns you that the message was dropped. Use Python's standard `print()` or `logging` module for messages outside the scheduler.

**Why reverse-order shutdown instead of parallel?** Parallel shutdown is faster but introduces race conditions.
If the sensor and motor controller shut down simultaneously, the motor controller might try to read one last sensor value from a topic that has already been deallocated. Reverse-order shutdown is deterministic: controllers always stop before the sensors they depend on. ## Trade-offs | Gain | Cost | |------|------| | **Callback-based API** -- functions and classes both work without boilerplate | No static type checking on callback signatures (Python limitation) | | **Lazy initialization** -- hardware only opens when the system starts | Must handle the case where `init()` succeeds but `tick()` never runs | | **Structured error handling** -- `on_error` + `failure_policy` compose | Two layers to understand (Python callback + Rust policy) | | **Guaranteed shutdown** -- runs even on Ctrl+C and SIGTERM | Must implement `shutdown()` for every node with hardware | | **Reverse-order shutdown** -- deterministic dependency cleanup | Cannot shut down independent nodes in parallel | | **Logging only in callbacks** -- consistent timestamps and routing | Cannot use `node.log_*()` for setup messages before `run()` | ## See Also - [Python API Reference](/python/api) -- Complete Node, Scheduler, and Topic API - [Python Bindings](/python/api/python-bindings) -- Full Python API with Scheduler, typed messages, and multiprocess - [Nodes -- Full Reference](/concepts/core-concepts-nodes) -- Rust Node trait, lifecycle, and safety mechanisms - [Scheduler -- Full Reference](/concepts/core-concepts-scheduler) -- Execution model, RT enforcement, and deterministic mode - [Execution Classes](/concepts/execution-classes) -- How `rate`, `compute`, `on`, and `async` map to executors --- ## Hardware API Path: /python/api/drivers Description: Python hardware config loading — hardware.load(), NodeParams, register() # Hardware API Load hardware node configurations from `horus.toml` and use them in your Python nodes. 
```python # simplified import horus entries = horus.hardware.load() for name, obj in entries: print(f"Loaded: {name}") ``` --- ## Loading Hardware ### `hardware.load()` Reads the `[hardware]` section from `horus.toml` (searches current directory and up to 10 parents). Returns a list of `(name, obj)` tuples. If a registered Python class matches the `use` field, `obj` is an instance of that class. Otherwise, `obj` is a `NodeParams` dict-like with the config values. ```python # simplified entries = horus.hardware.load() ``` ### `hardware.load_from(path)` Load from a specific config file. Useful for testing. ```python # simplified entries = horus.hardware.load_from("tests/test_hardware.toml") ``` --- ## NodeParams Typed access to config values from a `[hardware.NAME]` table. | Method | Returns | Description | |--------|---------|-------------| | `params.get(key)` | value | Required param — raises `KeyError` if missing | | `params.get_or(key, default)` | value | Optional param with fallback | | `params.has(key)` | `bool` | Whether key exists | | `params.keys()` | `list[str]` | All param names | | `len(params)` | `int` | Number of params | | `params[key]` | value | Dict-like access | | `key in params` | `bool` | Dict-like contains | TOML types are auto-converted to Python: `str`, `int`, `float`, `bool`, `list`. 
```python # simplified entries = horus.hardware.load() for name, obj in entries: if isinstance(obj, horus.NodeParams): port = obj.get_or("port", "/dev/ttyUSB0") baud = obj.get_or("baudrate", 115200) print(f"{name}: {port} @ {baud}") ``` --- ## Registering Python Drivers Register a Python class so `hardware.load()` instantiates it automatically when the `use` field matches: ```python # simplified import horus class ConveyorDriver(horus.Node): def __init__(self, params): super().__init__( name="conveyor", pubs=["conveyor.velocity"], rate=params.get_or("rate", 50), ) self.port = params.get_or("port", "/dev/ttyACM0") self.speed = params.get_or("speed", 1.0) def tick(self): self.send("conveyor.velocity", horus.CmdVel(self.speed, 0.0)) # Register the class horus.hardware.register_driver("ConveyorDriver", ConveyorDriver) ``` Then in `horus.toml`: ```toml [hardware.conveyor] use = "ConveyorDriver" port = "/dev/ttyACM0" speed = 0.5 ``` And in your main script: ```python # simplified entries = horus.hardware.load() nodes = [obj for _, obj in entries if isinstance(obj, horus.Node)] horus.run(*nodes) ``` --- ## Complete Example ```toml # horus.toml [hardware.imu] use = "Bno055Driver" bus = 1 address = 104 [hardware.motors] use = "MotorDriver" port = "/dev/ttyUSB0" baudrate = 115200 ``` ```python # simplified import horus import smbus2 import serial class Bno055Driver(horus.Node): def __init__(self, params): super().__init__(name="imu", pubs=["imu"], rate=100) bus_num = params.get_or("bus", 1) self.addr = params.get_or("address", 0x68) self.i2c = smbus2.SMBus(bus_num) def tick(self): data = self.i2c.read_i2c_block_data(self.addr, 0x08, 6) self.send("imu", horus.Imu(linear_acceleration=parse_accel(data))) class MotorDriver(horus.Node): def __init__(self, params): super().__init__(name="motors", subs=["cmd_vel"], rate=50) port = params.get("port") baud = params.get_or("baudrate", 115200) self.ser = serial.Serial(port, baud) def tick(self): if self.has_msg("cmd_vel"): cmd = 
self.recv("cmd_vel") self.ser.write(encode_motor_cmd(cmd)) horus.hardware.register_driver("Bno055Driver", Bno055Driver) horus.hardware.register_driver("MotorDriver", MotorDriver) entries = horus.hardware.load() nodes = [obj for _, obj in entries if isinstance(obj, horus.Node)] horus.run(*nodes) ``` --- ## Simulation Override Mark hardware entries with `sim = true` to swap them for stubs in simulation mode: ```toml [hardware.imu] use = "Bno055Driver" bus = 1 sim = true ``` ```bash horus run # real hardware horus run --sim # sim mode — stub nodes, simulator publishes to same topics ``` --- ## Error Handling ```python # simplified try: entries = horus.hardware.load() except Exception as e: print(f"No hardware config: {e}") entries = [] for name, obj in entries: if isinstance(obj, horus.NodeParams): print(f" {name}: no registered class (params only)") ``` --- ## Legacy Support The `[drivers]` section name and old source keys (`terra`, `node`, `package`) are still parsed. Migrate to `[hardware]` with the `use` field: ```toml # Old [drivers.imu] terra = "mpu6050" bus = 1 # New [hardware.imu] use = "mpu6050" bus = 1 sim = true ``` Use `horus.hardware` to access hardware drivers. --- ## See Also - [Rust Hardware API](/rust/api/drivers) — Rust equivalent - [Hardware Drivers Tutorial (Python)](/tutorials/05-hardware-and-rt-python) — Step-by-step guide - [Real Hardware Recipe](/recipes/real-hardware) — Complete I2C + serial examples - [Configuration Reference](/package-management/configuration) — `[hardware]` syntax in horus.toml --- ## Perception Messages Path: /python/messages/perception Description: Object detection, tracking, landmarks, segmentation, and plane detection for Python robotics — every method explained # Perception Messages Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems. 
```python # simplified from horus import ( BoundingBox2D, BoundingBox3D, Detection, Detection3D, TrackedObject, TrackingHeader, Landmark, Landmark3D, LandmarkArray, PlaneDetection, PlaneArray, SegmentationMask, ) ``` --- ## BoundingBox2D Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors like YOLO, SSD, Faster R-CNN. ### Constructor ```python # simplified bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0) ``` ### `.from_center(cx, cy, width, height)` — From Center Point ```python # simplified bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0) ``` Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format. ### `.area()` — Box Area in Pixels ```python # simplified print(bbox.area()) # 20000.0 ``` Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image). ### `.iou(other)` — Intersection Over Union ```python # simplified bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0) bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0) print(bbox_a.iou(bbox_b)) # ~0.143 (partial overlap) ``` Returns 0.0 (no overlap) to 1.0 (identical boxes). **This is the core metric for non-maximum suppression (NMS)** — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5). ```python # simplified # Simple NMS pattern detections.sort(key=lambda d: d.confidence, reverse=True) kept = [] for det in detections: if all(det.bbox.iou(k.bbox) < 0.5 for k in kept): kept.append(det) ``` ### `.as_tuple()` / `.as_xyxy()` — Format Conversion ```python # simplified x, y, w, h = bbox.as_tuple() # (x, y, width, height) x1, y1, x2, y2 = bbox.as_xyxy() # (x_min, y_min, x_max, y_max) ``` Different drawing libraries expect different formats. 
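
The two layouts differ only in whether the second pair is a size or a corner. The plain-Python sketch below shows the conversion and reproduces the IoU figure quoted for the two overlapping boxes above. The helper names `xywh_to_xyxy` and `iou_xywh` are hypothetical; they mirror the documented semantics but are not the HORUS implementation.

```python
# Sketch only: plain-Python helpers with hypothetical names. They follow the
# documented meaning of as_xyxy() and iou() but are not the HORUS code.

def xywh_to_xyxy(x, y, w, h):
    """(x, y, width, height) -> (x_min, y_min, x_max, y_max)."""
    return (x, y, x + w, y + h)


def iou_xywh(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(*a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(*b)
    # Overlap extent along each axis, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


box_a = (0.0, 0.0, 100.0, 100.0)
box_b = (50.0, 50.0, 100.0, 100.0)
corners = xywh_to_xyxy(10.0, 20.0, 100.0, 200.0)
overlap = iou_xywh(box_a, box_b)  # 2500 / 17500
```

For the two example boxes the intersection is a 50 × 50 square (2500 px²) and the union is 10000 + 10000 − 2500 = 17500 px², which gives roughly 0.143.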
OpenCV uses (x, y, w, h), some plotting tools use (x1, y1, x2, y2). --- ## BoundingBox3D A 3D bounding box with center, dimensions, and orientation. The constructor takes a single `yaw` angle for ground-plane rotation (the most common case). For full 3D orientation, use `with_rotation`. ### Constructor ```python # simplified bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `cx`, `cy`, `cz` | `float` | Center position in meters | | `length` | `float` | Size along the X axis (meters) | | `width` | `float` | Size along the Y axis (meters) | | `height` | `float` | Size along the Z axis (meters) | | `yaw` | `float` | Rotation around the Z axis (radians) | ### `.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw)` — Full 3D Rotation ```python # simplified bbox = BoundingBox3D.with_rotation( cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, roll=0.0, pitch=0.1, yaw=0.3 ) ``` Use this when the detected object is tilted or on a slope. The constructor only accepts `yaw` (rotation around the vertical axis), which is sufficient for objects on flat ground. `with_rotation` lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor. **Example — 3D Detection from LiDAR:** ```python # simplified from horus import BoundingBox3D # Detected car: 4.5m long, 1.8m wide, 1.5m tall, heading 30 degrees car_box = BoundingBox3D( cx=5.0, cy=2.0, cz=0.75, length=4.5, width=1.8, height=1.5, yaw=0.524 # ~30 degrees ) ``` --- ## Detection A single 2D object detection result — class + confidence + bounding box. 
### Constructor ```python # simplified det = Detection(class_name="person", confidence=0.95, x=10.0, y=20.0, width=100.0, height=200.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `class_name` | `str` | Detected object class (e.g., "person", "car") | | `confidence` | `float` | Detection confidence, 0.0 to 1.0 | | `x`, `y` | `float` | Top-left corner of bounding box (pixels) | | `width`, `height` | `float` | Bounding box dimensions (pixels) | | `class_id` | `int` | Numeric class ID (optional, set via `with_class_id()`) | ### `.is_confident(threshold)` — Filter Low Confidence ```python # simplified if det.is_confident(0.5): print(f"Detected {det.class_name} at {det.confidence:.0%}") ``` Returns `True` if confidence exceeds the threshold. Typical thresholds: - **0.3-0.5**: Real-time applications (more detections, some false positives) - **0.7-0.9**: High-precision applications (fewer detections, almost no false positives) ### `.with_class_id(class_id)` — Set Numeric Class ID ```python # simplified det = det.with_class_id(1) # COCO class ID for "person" ``` Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names. --- ## Detection3D 3D object detection with position, size, and optional velocity. 

### Constructor

```python
# simplified
det3d = Detection3D(class_name="car", confidence=0.9, cx=5.0, cy=2.0, cz=0.0, length=4.5, width=1.8, height=1.5)
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `class_name` | `str` | Detected object class |
| `confidence` | `float` | Detection confidence, 0.0 to 1.0 |
| `cx`, `cy`, `cz` | `float` | Center position in meters |
| `length`, `width`, `height` | `float` | Object dimensions in meters |
| `vx`, `vy`, `vz` | `float` | Velocity components (m/s, optional) |

### `.with_velocity(vx, vy, vz)` — Add Motion Estimate

```python
# simplified
det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0)  # Moving at 10 m/s in x
```

Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).

---

## TrackedObject

Multi-object tracking state with a lifecycle: **tentative → confirmed → deleted**.

A new detection starts as **tentative**. After being seen in multiple consecutive frames, it's **confirmed**. If it's not seen for too long, it's **deleted**. This lifecycle prevents spurious single-frame detections from being treated as real objects.

### Constructor

```python
# simplified
tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9, x=1.0, y=2.0, width=3.0, height=4.0)
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `track_id` | `int` | Unique track identifier (persists across frames) |
| `class_id` | `int` | Object class ID |
| `confidence` | `float` | Latest detection confidence |
| `x`, `y` | `float` | Bounding box top-left (pixels) |
| `width`, `height` | `float` | Bounding box dimensions (pixels) |

### `.is_tentative()` / `.is_confirmed()` / `.is_deleted()` — State Queries

```python
# simplified
if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")
```

### `.confirm()` — Promote to Confirmed

```python
# simplified
tracked.confirm()  # Tentative → Confirmed
```

Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.

### `.update(bbox, confidence)` — New Frame Data

```python
# simplified
tracked.update(new_bbox, new_confidence)
```

Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.

> **Common mistake:** Forgetting to call `update()` for matched tracks. Without it, `time_since_update` grows and the track eventually gets deleted even though you keep detecting the object.

### `.mark_missed()` — Not Seen This Frame

```python
# simplified
tracked.mark_missed()
```

Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.

### `.delete()` — Remove Track

```python
# simplified
tracked.delete()
```

Marks the track as deleted. `is_deleted()` returns `True`.
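
The tentative → confirmed → deleted lifecycle can be sketched in plain Python. This is an illustration only: `SketchTrack`, `TrackState`, `on_frame`, and the thresholds `confirm_after=3` and `max_misses=5` are hypothetical names and values chosen for the example, not part of the HORUS API.

```python
from dataclasses import dataclass
from enum import Enum


class TrackState(Enum):
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"
    DELETED = "deleted"


@dataclass
class SketchTrack:
    """Stand-in for a tracked object; NOT the HORUS TrackedObject class."""
    track_id: int
    state: TrackState = TrackState.TENTATIVE
    hits: int = 1      # consecutive frames with a matched detection
    misses: int = 0    # consecutive frames without a match

    def on_frame(self, matched: bool, confirm_after: int = 3, max_misses: int = 5) -> None:
        """Advance the lifecycle by one frame (thresholds are assumptions)."""
        if self.state is TrackState.DELETED:
            return
        if matched:
            self.hits += 1
            self.misses = 0
            if self.state is TrackState.TENTATIVE and self.hits >= confirm_after:
                self.state = TrackState.CONFIRMED
        else:
            self.hits = 0
            self.misses += 1
            if self.misses >= max_misses:
                self.state = TrackState.DELETED


track = SketchTrack(track_id=42)      # first detection creates the track
for matched in [True, True]:          # matched again in the next two frames
    track.on_frame(matched)
state_after_hits = track.state

for matched in [False] * 5:           # then unseen for five frames
    track.on_frame(matched)
state_after_misses = track.state
```

With the real `TrackedObject`, the matched branch would presumably correspond to `update()` followed by `confirm()` once enough frames have matched, and the unmatched branch to `mark_missed()` followed by `delete()`.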

### `.speed()` / `.heading()` — Motion Estimation

```python
# simplified
print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")
```

Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.

---

## Landmark, Landmark3D, LandmarkArray

Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.

### `Landmark` — 2D Keypoint

```python
# simplified
lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5)  # visibility=1.0
```

### `.is_visible(threshold)` — Filter Occluded Keypoints

```python
# simplified
if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass
```

Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.

### `.distance_to(other)` — Keypoint Distance

```python
# simplified
dist = left_wrist.distance_to(right_wrist)
```

### `Landmark3D` — 3D Keypoint

```python
# simplified
lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d()  # Project to 2D (drops z)
```

### `LandmarkArray` — Skeleton Presets

```python
# simplified
# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose()       # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose()  # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand()      # 21 hand keypoints
face = LandmarkArray.mediapipe_face()      # 478 face mesh keypoints
```

These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.

---

## PlaneDetection

Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `nx`, `ny`, `nz` | `float` | Plane normal vector components |
| `d` | `float` | Distance from origin to plane along the normal |
| `confidence` | `float` | Detection confidence, 0.0 to 1.0 |

### `.distance_to_point(px, py, pz)` — Point-to-Plane Distance

```python
# simplified
plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below
```

The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.

### `.contains_point(px, py, pz, tolerance)` — Is a Point on This Plane?

```python
# simplified
if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5cm)")
```

Returns `True` if the point is within `tolerance` meters of the plane. Use for classifying which objects are on which surface.

---

## SegmentationMask

Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).

### Factory Methods

```python
# simplified
# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)

# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)

# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)
```

### `.is_semantic()` / `.is_instance()` / `.is_panoptic()` — Check Type

```python
# simplified
if mask.is_semantic():
    print("Semantic segmentation mask")
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `width` | `int` | Mask width in pixels |
| `height` | `int` | Mask height in pixels |
| `num_classes` | `int` | Number of semantic classes (semantic/panoptic only) |

### `.data_size()` / `.data_size_u16()`

```python
# simplified
print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")
```

**Example — Check Driveable Area:**

```python
# simplified
from horus import SegmentationMask, Topic

seg_topic = Topic(SegmentationMask)
ROAD_CLASS = 7  # Example value; use the "road" ID from your model's label map

def check_driveable(node):
    mask = seg_topic.recv(node)
    if mask is None or not mask.is_semantic():
        return
    # Count pixels belonging to the road class
    # (actual pixel access depends on your data pipeline)
    print(f"Segmentation mask: {mask.width}x{mask.height}, {mask.num_classes} classes")
```

---

## PointField

Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).

### Constructor

```python
# simplified
field = PointField(name="x", offset=0, datatype=7, count=1)  # FLOAT32
```

### `.field_size()` — Byte Size of One Element

```python
# simplified
size = field.field_size()  # e.g., 4 for FLOAT32, 8 for FLOAT64
```

Returns the byte size of a single element based on the `datatype`. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.

---

## Design Decisions

**Why `BoundingBox2D.iou()` instead of a standalone function?** IoU is always computed between two boxes.
Making it a method (`bbox_a.iou(bbox_b)`) reads naturally and avoids importing a separate utility. It also ensures both boxes use the same coordinate convention (top-left origin, positive width/height).

**Why does `TrackedObject` have an explicit lifecycle (tentative/confirmed/deleted)?** Without lifecycle management, every detection is treated equally. A single-frame false positive would trigger the same response as a stable track. The lifecycle pattern filters noise: only confirmed tracks (seen across multiple frames) should influence planning. This is standard practice in multi-object tracking (SORT, DeepSORT, ByteTrack all use this pattern).

**Why separate `Detection` (2D) and `Detection3D` (3D)?** Most object detectors output 2D bounding boxes in image coordinates. 3D detection requires depth information (stereo, LiDAR, monocular depth estimation). Forcing every detection into a 3D struct would waste 7 fields per detection for the 2D case (which is the 90% case). Separate types keep 2D detections lightweight while giving 3D detections full spatial information.

**Why `LandmarkArray` presets (`coco_pose()`, `mediapipe_pose()`) instead of a generic constructor?** Different pose models output different numbers of keypoints in different orders. COCO has 17 keypoints, MediaPipe Pose has 33, MediaPipe Hand has 21. The presets pre-allocate the correct number of landmarks and set the right dimension, preventing size mismatches between your model output and the message.

**Why `PlaneDetection.distance_to_point()` returns a signed distance?** The sign tells you which side of the plane the point is on. Positive means above the plane (same side as the normal), negative means below. This is essential for tasks like "is this object on the table?" (distance near zero) or "is the robot above the floor?" (distance should be positive).
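
The signed-distance idea can be worked through in plain Python. This is a sketch under one stated assumption: the plane is written as n · p = d, so the signed distance is (n · p − d) / |n|. HORUS may use a different sign convention for `d`; `signed_distance` and `on_plane` are hypothetical helpers, not HORUS functions.

```python
import math


def signed_distance(normal, d, point):
    """Signed distance from point to the plane n . p = d (assumed convention)."""
    nx, ny, nz = normal
    px, py, pz = point
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("plane normal must be non-zero")
    # Dividing by |n| makes the result correct for non-unit normals too.
    return (nx * px + ny * py + nz * pz - d) / norm


def on_plane(normal, d, point, tolerance):
    """True if the point lies within `tolerance` of the plane."""
    return abs(signed_distance(normal, d, point)) <= tolerance


# Table top: horizontal plane 0.75 m above the origin, normal pointing up.
table_normal, table_d = (0.0, 0.0, 1.0), 0.75

above = signed_distance(table_normal, table_d, (1.0, 2.0, 1.0))   # positive: normal side
below = signed_distance(table_normal, table_d, (1.0, 2.0, 0.5))   # negative: opposite side
resting = on_plane(table_normal, table_d, (1.0, 2.0, 0.78), tolerance=0.05)
```

A point 3 cm above the table counts as "on" it with a 5 cm tolerance, which is the kind of check `contains_point` is described as performing.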

---

## See Also

- [Vision Messages](/python/messages/vision) — Image, PointCloud, DepthImage
- [Geometry Messages](/python/messages/geometry) — Point3 for 3D positions
- [Navigation Messages](/python/messages/navigation) — OccupancyGrid for mapping from detections
- [Python Message Library](/python/library/python-message-library) — All 55+ message types overview

---

## Topics Deep-Dive (Python)

Path: /python/topics-guide
Description: Complete guide to HORUS topics in Python: declaration styles, performance paths, receive patterns, cross-process communication, and the standalone Topic class

# Topics Deep-Dive (Python)

A warehouse robot has a Python ML node running YOLO at 30 FPS, a Rust motor controller ticking at 1 kHz, and a Python dashboard logging everything. The ML node detects an obstacle and needs to tell the motor controller to stop -- in under 2 microseconds, across process boundaries, without the GIL blocking the Rust side.

HORUS topics make this work. Every `node.send()` writes into a shared-memory ring buffer. Every `node.recv()` reads from it. No sockets, no serialization for typed messages, no kernel involvement. The Python node and the Rust node share the same memory region -- the ML node writes a `CmdVel`, and the motor controller reads it directly.

This page covers the Python topic API in depth: how to declare topics, what determines your latency, how to receive messages correctly, and when to use the standalone `Topic` class outside the scheduler.

---

## How Topics Work

A topic is a named ring buffer in shared memory. When you call `node.send("cmd_vel", data)`, HORUS writes `data` into the ring buffer for the topic named `"cmd_vel"`. When another node calls `node.recv("cmd_vel")`, it reads the oldest unread message from that same buffer.

```
Python Node A              Shared Memory                 Rust/Python Node B
┌─────────────┐          ┌─────────────────────────┐         ┌─────────────────┐
│ node.send() │ ──write──│ Ring Buffer (1024 slots)│──read── │ node.recv()     │
│             │          │ topic: "cmd_vel"        │         │                 │
└─────────────┘          └─────────────────────────┘         └─────────────────┘
```

Key properties:

- **Lock-free**: Writers and readers never block each other. `send()` always returns immediately.
- **Bounded**: The ring buffer has a fixed capacity (default: 1024 messages). When full, the oldest unread message is overwritten.
- **Cross-language**: A Python publisher and a Rust subscriber share the exact same SHM region. No bridge process, no translation layer.
- **Auto-discovered**: Two nodes that use the same topic name are connected automatically. No configuration, no broker.

---

## Three Ways to Declare Topics

Topic declarations go in the `pubs` and `subs` parameters of `horus.Node()`. The declaration style determines the **performance path** -- string topics use generic serialization, typed topics use zero-copy Pod transfer.

### String Topics (Generic)

```python
# simplified
import horus

node = horus.Node(
    name="logger",
    subs=["sensor.data", "motor.status"],
    pubs=["log.output"],
    tick=my_tick,
    rate=10,
)
```

String topics use `GenericMessage` under the hood -- your data is serialized with MessagePack before writing to SHM. This handles any Python-serializable value: dicts, lists, strings, numbers, nested structures.

```python
# simplified
def my_tick(node):
    # Dicts, lists, nested structures -- all work
    node.send("log.output", {
        "level": "info",
        "message": "Motor started",
        "details": {"voltage": 12.4, "current": 1.2},
    })
```

**Latency**: ~5--50 us per send+recv round-trip (depends on message size). The cost is serialization and deserialization -- MessagePack must convert your Python dict to bytes and back.

**When to use**: Prototyping, logging, configuration updates, any data that changes shape frequently, or when you need to send arbitrary Python objects.

### Typed Topics (Pod Zero-Copy)

```python
# simplified
from horus import Node, CmdVel, Imu, LaserScan

node = Node(
    name="controller",
    subs=[Imu, LaserScan],
    pubs=[CmdVel],
    tick=control_tick,
    rate=100,
)
```

When you pass a message class instead of a string, HORUS uses the Pod (plain-old-data) path. The message type has a fixed binary layout known at compile time. The Python wrapper writes fields directly into SHM -- no serialization step.

Topic names are derived automatically from the type:

| Type | Auto-generated name |
|------|-------------------|
| `CmdVel` | `"cmd_vel"` |
| `Imu` | `"imu"` |
| `LaserScan` | `"scan"` |
| `Pose2D` | `"pose2d"` |
| `Odometry` | `"odometry"` |

```python
# simplified
from horus import CmdVel

def control_tick(node):
    cmd = CmdVel(linear=1.0, angular=0.0)
    node.send("cmd_vel", cmd)
```

**Latency**: ~1.5 us per send+recv round-trip. This is 3--30x faster than the generic path because there is no serialization -- the Pod struct is memory-mapped directly.

**When to use**: Control loops, sensor data, any hot path where latency matters. Use typed topics for anything running above 10 Hz.

### Named Typed Topics

```python
# simplified
node = horus.Node(
    name="dual_arm",
    pubs={"left.cmd": CmdVel, "right.cmd": CmdVel},
    subs={"left.odom": Odometry, "right.odom": Odometry},
    tick=dual_arm_tick,
    rate=100,
)
```

A dict maps custom topic names to types. This gives you the Pod zero-copy performance path with explicit control over the topic name -- essential when you have multiple topics of the same type (two arms, two cameras, four wheels).

```python
# simplified
def dual_arm_tick(node):
    left_odom = node.recv("left.odom")
    right_odom = node.recv("right.odom")
    if left_odom and right_odom:
        node.send("left.cmd", CmdVel(linear=1.0, angular=0.0))
        node.send("right.cmd", CmdVel(linear=1.0, angular=0.0))
```

**Latency**: Same as typed topics (~1.5 us). The dict syntax only changes the name, not the transport mechanism.
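
The difference between the two transport paths can be illustrated with the standard library alone. The sketch below is not how HORUS is implemented: the two-`float32` layout is a hypothetical example (not the real `CmdVel` layout), a `bytearray` stands in for a shared-memory slot, and JSON stands in for MessagePack, which is not in the standard library.

```python
import json
import struct

# Hypothetical fixed layout: two little-endian float32 fields.
POD_LAYOUT = struct.Struct("<ff")
LINEAR_OFFSET, ANGULAR_OFFSET = 0, 4

slot = bytearray(POD_LAYOUT.size)  # stands in for one shared-memory slot

# Typed path: each field is written at a known byte offset.
POD_LAYOUT.pack_into(slot, 0, 1.0, 0.5)

# A reader can pull a single field straight from its offset.
(angular,) = struct.unpack_from("<f", slot, ANGULAR_OFFSET)

# Generic path: encode the whole value to bytes, then decode it again.
encoded = json.dumps({"linear": 1.0, "angular": 0.5}).encode("utf-8")
decoded = json.loads(encoded.decode("utf-8"))

pod_bytes, generic_bytes = POD_LAYOUT.size, len(encoded)
```

The typed path has no encode or decode step and a fixed size per message; the generic path pays for both, and its size depends on the keys and values being sent.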

### Declaration Summary

| Style | Syntax | Transport | Latency | Best for |
|-------|--------|-----------|---------|----------|
| String | `pubs=["data"]` | GenericMessage (MessagePack) | ~5--50 us | Dicts, prototyping, flexible data |
| Typed | `pubs=[CmdVel]` | Pod zero-copy | ~1.5 us | Control loops, sensor data |
| Named typed | `pubs={"cmd": CmdVel}` | Pod zero-copy | ~1.5 us | Multiple topics of the same type |

---

## Performance: Generic vs Typed

The difference between generic and typed topics is not academic. At high frequencies, serialization overhead dominates:

| Scenario | Generic (string) | Typed (Pod) | Speedup |
|----------|-----------------|-------------|---------|
| `CmdVel` (16 bytes) at 100 Hz | ~10 us/msg | ~1.5 us/msg | 6.7x |
| `Imu` (~300 bytes) at 200 Hz | ~25 us/msg | ~2.5 us/msg | 10x |
| `dict` with 5 keys at 30 Hz | ~15 us/msg | N/A | -- |

At 1 kHz, a 25 us generic send+recv consumes 2.5% of every tick cycle. A 1.5 us typed send+recv consumes 0.15%. For a motor controller with a 1 ms budget, that difference is the margin between meeting deadlines and missing them.

**Rule of thumb**: If the topic runs above 10 Hz and carries structured data with a known schema, use a typed topic. If the data is ad-hoc (debug dicts, configuration blobs, log messages), use a string topic.

### Why the Difference?

Generic topics must serialize your Python dict into bytes (MessagePack encoding), copy those bytes into SHM, then deserialize on the receiver side. Typed topics have a fixed binary layout -- the Python wrapper writes each field at its known offset in the SHM slot, and the receiver reads from those same offsets. No encoding, no decoding, no intermediate byte buffer.

---

## Receiving Messages

### `recv()` -- Take One Message

```python
# simplified
def tick(node):
    msg = node.recv("sensor.data")
    if msg is not None:
        process(msg)
```

`recv()` returns the oldest unread message and removes it from the buffer.
If nothing is available, it returns `None` immediately -- it never blocks. Each call consumes exactly one message.

**Important**: If a publisher sends 5 messages between your ticks, a single `recv()` returns only the first one. The other 4 remain in the buffer. If you only ever call `recv()` once per tick but messages arrive faster than you process them, the buffer fills and older messages are dropped.

### `has_msg()` -- Peek Without Consuming

```python
# simplified
def tick(node):
    if node.has_msg("emergency"):
        stop_cmd = node.recv("emergency")
        handle_emergency(stop_cmd)
    else:
        do_normal_work(node)
```

`has_msg()` checks whether at least one message is waiting on the topic, without consuming it. Internally, it performs a `recv()` and buffers the result -- the next `recv()` call returns that same message. This means `has_msg()` followed by `recv()` does not skip a message.

Use `has_msg()` when you need to branch on whether data is available before committing to process it.

### `recv_all()` -- Drain the Buffer

```python
# simplified
def tick(node):
    commands = node.recv_all("commands")
    for cmd in commands:
        execute(cmd)
    node.log_debug(f"Processed {len(commands)} commands")
```

`recv_all()` returns a list of all available messages, draining the buffer completely. Returns an empty list if nothing is available. This is the correct pattern when you must process every message (logging, event recording, command queues) rather than just the latest.

### Pattern: Keep-Latest

The most common pattern in control loops -- drain the buffer and act on only the newest message:

```python
# simplified
def control_tick(node):
    # Drain all buffered IMU readings, keep only the latest
    latest_imu = None
    for msg in node.recv_all("imu"):
        latest_imu = msg

    if latest_imu is not None:
        cmd = compute_control(latest_imu)
        node.send("cmd_vel", cmd)
```

This ensures you always act on the freshest data, even if several messages accumulated between ticks. Stale readings are discarded.
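
The receive semantics described above (`recv()` takes one message, `has_msg()` peeks by buffering, `recv_all()` drains) can be modeled with a small stand-in class. `SketchSubscriber` and its `push` method are hypothetical and exist only for this illustration; the real calls are the `node.recv()`, `node.has_msg()`, and `node.recv_all()` methods shown earlier.

```python
from collections import deque
from typing import Any, List, Optional


class SketchSubscriber:
    """Pure-Python model of the receive semantics; not the HORUS node API."""

    def __init__(self, capacity: int = 1024) -> None:
        self._buffer: deque = deque(maxlen=capacity)
        self._peeked: Optional[Any] = None
        self._has_peeked = False

    def push(self, msg: Any) -> None:
        """Simulate a publisher writing one message."""
        self._buffer.append(msg)

    def recv(self) -> Optional[Any]:
        """Return the oldest unread message, or None without blocking."""
        if self._has_peeked:
            msg, self._peeked, self._has_peeked = self._peeked, None, False
            return msg
        return self._buffer.popleft() if self._buffer else None

    def has_msg(self) -> bool:
        """Peek by receiving one message and holding it for the next recv()."""
        if not self._has_peeked and self._buffer:
            self._peeked = self._buffer.popleft()
            self._has_peeked = True
        return self._has_peeked

    def recv_all(self) -> List[Any]:
        """Drain everything that is currently available."""
        messages = []
        msg = self.recv()
        while msg is not None:
            messages.append(msg)
            msg = self.recv()
        return messages


sub = SketchSubscriber()
for reading in range(1, 6):      # five messages arrive between ticks
    sub.push(reading)

first = sub.recv()               # consumes only the oldest message
available = sub.has_msg()        # peeks; the peeked message is not lost
second = sub.recv()              # returns the message held by has_msg()
rest = sub.recv_all()            # drains what is left
latest = rest[-1] if rest else None   # keep-latest pattern
empty = sub.recv()               # nothing left
```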

### Pattern: Conditional Multi-Topic

```python
# simplified
def fusion_tick(node):
    lidar = node.recv("lidar") if node.has_msg("lidar") else None
    camera = node.recv("camera") if node.has_msg("camera") else None

    if lidar and camera:
        fused = fuse_sensors(lidar, camera)
        node.send("fused", fused)
    elif lidar:
        node.send("fused", lidar_only_estimate(lidar))
```

Check multiple topics and degrade gracefully when some sensors are unavailable.

### Pattern: Batch Processing

```python
# simplified
import json

def logger_tick(node):
    events = node.recv_all("events")
    if events:
        # Batch write is more efficient than one-at-a-time
        with open("log.jsonl", "a") as f:
            for event in events:
                f.write(json.dumps(event) + "\n")
```

Collect all messages and process them in a batch. More efficient for I/O-heavy consumers that benefit from batching.

---

## Dropped Messages

Messages are dropped when the ring buffer is full and a new `send()` overwrites the oldest unread slot. This happens when:

- A publisher sends faster than the subscriber reads (subscriber too slow)
- A subscriber is temporarily blocked (GIL contention, I/O wait, garbage collection pause)
- The buffer capacity is too small for the burst rate

### How to Detect Drops

HORUS does not expose a Python-level `dropped_count()` method on nodes. Instead, use these strategies:

**Sequence numbers**: Add a counter to your messages and check for gaps on the receiver side:

```python
# simplified
send_seq = 0

def publisher_tick(node):
    global send_seq
    send_seq += 1
    node.send("data", {"seq": send_seq, "value": read_sensor()})

last_seq = 0

def subscriber_tick(node):
    global last_seq
    for msg in node.recv_all("data"):
        if msg["seq"] != last_seq + 1:
            node.log_warning(f"Dropped {msg['seq'] - last_seq - 1} messages")
        last_seq = msg["seq"]
        process(msg)
```

**Rate monitoring**: Use `horus monitor --tui` to watch topic publish and subscribe rates in real time. If the publish rate consistently exceeds the subscribe rate, drops are happening.
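
To see the overwrite-oldest behavior and the sequence-number check working together, the following sketch simulates a tiny buffer without HORUS. A `collections.deque` with `maxlen` stands in for the ring buffer (appending to a full deque discards the oldest entry); the capacity of 4 is deliberately small so that drops occur.

```python
from collections import deque

CAPACITY = 4
ring = deque(maxlen=CAPACITY)   # a full deque drops its oldest item on append

# Publisher sends 10 messages before the subscriber gets to run.
for seq in range(1, 11):
    ring.append({"seq": seq})

# Subscriber drains the buffer and counts gaps in the sequence numbers.
last_seq = 0
dropped = 0
received = []
while ring:
    msg = ring.popleft()
    gap = msg["seq"] - last_seq - 1
    if gap > 0:
        dropped += gap
    last_seq = msg["seq"]
    received.append(msg["seq"])
```

Only the four newest messages survive, and the gap before the first surviving sequence number accounts for the other six.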

### How to Prevent Drops

| Strategy | How | When |
|----------|-----|------|
| Drain every tick | Use `recv_all()` instead of single `recv()` | Always, unless you only need latest |
| Increase capacity | `default_capacity=4096` on Node | Bursty publishers |
| Keep-latest pattern | Drain + use only the newest | Control loops |
| Reduce publisher rate | Lower the publisher's `rate` | Publisher is too fast for the subscriber |
| Use Rust for the subscriber | Rust nodes process messages faster | GIL is the bottleneck |

The default buffer capacity is 1024 messages. For most applications, this is large enough that drops only occur under sustained overload, not brief bursts.

---

## Topic Naming Rules

HORUS topic names use **dots** for hierarchy. Never use slashes.

```python
# simplified
# CORRECT
pubs=["sensor.temperature"]
pubs=["robot.arm.left.position"]
pubs=["camera.front.rgb"]

# WRONG -- fails on macOS, may cause subtle bugs on Linux
pubs=["sensor/temperature"]
pubs=["robot/arm/left/position"]
```

### Naming Conventions

| Pattern | Example | Use for |
|---------|---------|---------|
| `subsystem.data` | `sensor.temperature` | Most topics |
| `subsystem.device.data` | `camera.front.rgb` | Multi-device systems |
| `robot.subsystem.data` | `robot1.motor.cmd_vel` | Multi-robot fleets |

**Avoid**: Names starting with `_` (reserved for internal use), names containing special characters, names containing `/`.

---

## Cross-Process Topics

Topics work across process boundaries with zero configuration. A Python node in one process and a Rust node in another process can share the same topic -- HORUS handles everything through shared memory.

### How Auto-Discovery Works

When a node creates a topic, HORUS:

1. Creates (or opens) a shared-memory file named after the topic (e.g., `horus_cmd_vel` for topic `"cmd_vel"`)
2. Writes a `.meta` discovery file containing the topic's type, capacity, and creator PID
3. Maps the SHM region into the process's address space

When a second node (even in a different process, even in a different language) creates a topic with the same name, HORUS:

1. Finds the existing SHM file
2. Maps it into the new process's address space
3. Both processes now read and write to the same ring buffer

No broker. No discovery service. No network. Just filesystem-level coordination through SHM metadata.

### Python + Rust Interop

A Python node and a Rust node share topics seamlessly when they use the same typed messages:

**Python process**:

```python
# simplified
from horus import Node, CmdVel, run

def control_tick(node):
    node.send("cmd_vel", CmdVel(linear=1.0, angular=0.5))

controller = Node(name="py_controller", pubs=[CmdVel], tick=control_tick, rate=50)
run(controller)
```

**Rust process** (running separately):

```rust
use horus::prelude::*;

struct Motor {
    cmd_sub: Topic<CmdVel>,
}

impl Node for Motor {
    fn name(&self) -> &str { "Motor" }

    fn tick(&mut self) {
        if let Some(cmd) = self.cmd_sub.recv() {
            drive(cmd.linear, cmd.angular);
        }
    }
}
```

Both use the `CmdVel` Pod type. The Python side writes fields at the same memory offsets the Rust side reads from. No serialization, no translation. The Rust node sees the exact bytes the Python node wrote.

### Namespace Isolation

By default, each terminal session gets its own SHM namespace (based on session ID and user ID). Two `horus run` commands in different terminals do not share topics.

To explicitly share topics across terminals or processes:

```bash
# Both processes must use the same namespace
HORUS_NAMESPACE=shared horus run my_publisher.py
HORUS_NAMESPACE=shared horus run my_subscriber.py
```

---

## Standalone Topic Class

The `node.send()` and `node.recv()` methods work inside the scheduler tick loop. For code that runs **outside** the scheduler -- scripts, tests, debugging tools, one-shot publishers -- use the standalone `Topic` class directly.

```python
# simplified
from horus import Topic, CmdVel

# Create a typed topic (Pod zero-copy)
cmd_topic = Topic(CmdVel)

# Send a message
cmd_topic.send(CmdVel(linear=1.0, angular=0.0))

# Receive a message
msg = cmd_topic.recv()
if msg is not None:
    print(f"linear={msg.linear}, angular={msg.angular}")
```

### Constructor

```python
# simplified
Topic(msg_type, capacity=None, endpoint=None)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `msg_type` | `type` or `str` | Message class (e.g., `CmdVel`) for typed topics, or a string name for generic topics |
| `capacity` | `int` or `None` | Ring buffer capacity. `None` uses the default (1024) |
| `endpoint` | `str` or `None` | Custom topic name. `None` auto-derives from the type |

### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `send` | `send(message, node=None) -> bool` | Write a message to the ring buffer. Returns `True` on success |
| `recv` | `recv(node=None) -> Optional[Any]` | Read the next message. Returns `None` if empty |

### Properties

| Property | Type | Description |
|----------|------|-------------|
| `name` | `str` | Topic name |
| `msg_type` | `type` | Message type class |
| `endpoint` | `str` or `None` | Custom endpoint if set |
| `backend_type` | `str` | Current backend (e.g., `"shm"`, `"intra"`) |

### When to Use Standalone Topics

**Testing**: Send test data to a node and verify its output without running the full scheduler.

```python
# simplified
from horus import Topic, CmdVel

# Inject test data
cmd_topic = Topic(CmdVel)
cmd_topic.send(CmdVel(linear=1.0, angular=0.0))

# Verify the node processed it
output_topic = Topic("output")
result = output_topic.recv()
assert result is not None
```

**One-shot commands**: Send a single command from a script and exit.

```python
# simplified
from horus import Topic, CmdVel

# Send emergency stop
Topic(CmdVel).send(CmdVel(linear=0.0, angular=0.0))
```

**Monitoring tools**: Read topic data from a separate process for visualization or logging.

```python
# simplified
from horus import Topic, Imu
import time

imu_topic = Topic(Imu)
while True:
    msg = imu_topic.recv()
    if msg:
        print(f"accel=({msg.ax:.2f}, {msg.ay:.2f}, {msg.az:.2f})")
    time.sleep(0.01)
```

---

## Auto-Created Topics

Topics used in `send()` or `recv()` that were not declared in `pubs`/`subs` are created automatically on first use:

```python
# simplified
def tick(node):
    # "debug.info" was not in pubs -- auto-created as a GenericMessage topic
    node.send("debug.info", {"tick": 42, "status": "ok"})
```

Auto-created topics always use the GenericMessage path (string-style, ~5--50 us). If you want the typed fast path, you must declare the topic in `pubs`/`subs` with a message type.

Auto-creation is convenient for debugging and prototyping but should not be relied on in production -- undeclared topics make the data flow harder to trace and always take the slower generic path.

---

## Design Decisions

**Why three declaration styles instead of one?** String topics are the simplest possible API -- you just name your data channel and send anything through it. Typed topics exist because serialization overhead matters at high frequencies. Named typed topics exist because real robots have multiple instances of the same sensor type (two cameras, four wheels) and need distinct names with the same zero-copy performance. Each style exists to serve a different need; none is universally better.

**Why is the default capacity 1024 messages?** A control loop at 1 kHz produces 1000 messages per second. At the default capacity, a subscriber can fall a full second behind before messages are dropped. This is generous enough for most applications while keeping memory usage bounded. Increase it for bursty workloads; decrease it if memory is constrained.
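
The capacity reasoning is simple arithmetic: the buffer absorbs `capacity` messages of backlog, so the time before the first drop is the capacity divided by the net rate at which unread messages accumulate. The helper below is a hypothetical back-of-the-envelope calculator, not a HORUS function.

```python
def seconds_until_drop(capacity: int, publish_hz: float, consume_hz: float = 0.0) -> float:
    """Time a subscriber can lag before unread messages start being overwritten."""
    backlog_rate = publish_hz - consume_hz   # net messages accumulating per second
    if backlog_rate <= 0:
        return float("inf")                  # subscriber keeps up; no drops
    return capacity / backlog_rate


stalled = seconds_until_drop(1024, publish_hz=1000)                 # subscriber fully stalled
slow = seconds_until_drop(1024, publish_hz=1000, consume_hz=900)    # subscriber 10% too slow
larger = seconds_until_drop(4096, publish_hz=1000)                  # larger buffer, stalled
keeping_up = seconds_until_drop(1024, publish_hz=100, consume_hz=100)
```

At 1 kHz the default 1024 slots give roughly one second of slack for a stalled subscriber, and about ten seconds for one that is only 10% too slow.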

**Why does `recv()` return `None` instead of blocking?** Blocking in a tick function stalls the entire scheduler -- every other node waits until the blocked node returns. Returning `None` lets the node continue to its next tick, and the scheduler maintains its timing guarantees. This is the same design as Rust's `Option`-based `recv()` for the same reason.

**Why auto-create topics?** Strict declaration-only topics are safer (every data flow is explicit), but they add friction during prototyping. Auto-creation lets you add a `send("debug.foo", ...)` call without modifying the node constructor. The trade-off is that auto-created topics are invisible in the node's `pubs`/`subs` lists and always use the slower generic path -- a deliberate incentive to declare your topics properly for production.

**Why the standalone Topic class?** Nodes require a scheduler. But testing, scripting, and one-shot commands should not need a full scheduler setup. The standalone `Topic` class provides direct SHM access for these use cases, at the cost of no lifecycle management.

---

## Trade-offs

| Gain | Cost |
|------|------|
| **Three declaration styles** -- pick the right abstraction level | Must understand the performance difference between generic and typed |
| **Auto-created topics** -- fast prototyping, no boilerplate | Undeclared topics take the slow generic path; hidden data flow |
| **1024-slot default buffer** -- generous for most workloads | Uses more memory than Rust's default 4-slot buffer |
| **`recv()` returns `None`** -- never blocks the scheduler | Must check for `None` on every call (no exceptions on empty) |
| **Pod zero-copy for typed** -- 1.5 us cross-process | Only works for fixed-layout message types, not arbitrary dicts |
| **Cross-process transparency** -- same API for in-process and cross-process | All cross-process topics incur SHM overhead; cannot force in-process-only |
| **Standalone `Topic` class** -- works outside the scheduler | No rate control, no lifecycle, no budget enforcement |

---

## See Also

- [Topics: How Nodes Talk](/concepts/topics-beginner) -- Beginner introduction to topics with first-principles explanations
- [Topics -- Full Reference](/concepts/core-concepts-topic) -- Rust-focused deep dive: 10 backends, live migration, capacity tuning
- [Python Bindings](/python/api/python-bindings) -- Complete Python API reference for Node, Scheduler, and Topic
- [Custom Messages](/python/api/custom-messages) -- Define your own typed messages in Python
- [Message Library](/python/library/python-message-library) -- 55+ typed message classes (CmdVel, Imu, LaserScan, etc.)
- [Shared Memory](/concepts/shared-memory) -- How SHM works under the hood: ring buffers, directory structure, namespace isolation

---

## Error Types

Path: /python/api/error-types
Description: Python exception types — HorusNotFoundError, HorusTransformError, HorusTimeoutError, and common error patterns

# Error Types

HORUS defines three custom exception types plus uses standard Python exceptions. All are importable from `horus`.
```python # simplified from horus import HorusNotFoundError, HorusTransformError, HorusTimeoutError ``` --- ## Custom Exceptions | Exception | Base | When Raised | |-----------|------|-------------| | `HorusNotFoundError` | `Exception` | Topic, node, frame, or resource not found | | `HorusTransformError` | `Exception` | Transform lookup fails (frame not in tree, stale data, extrapolation) | | `HorusTimeoutError` | `Exception` | Blocking operation exceeded timeout | ### HorusNotFoundError Raised when looking up a topic, node, or transform frame that doesn't exist. ```python # simplified import horus try: tf = horus.TransformFrame() result = tf.lookup("base_link", "nonexistent_frame") except horus.HorusNotFoundError as e: print(f"Frame not found: {e}") ``` **Common causes:** - Subscribing to a topic that no publisher has created - Looking up a transform frame before it's been broadcast - Querying a node that hasn't registered with the scheduler ### HorusTransformError Raised when a transform lookup fails due to stale data, extrapolation, or disconnected frames. ```python # simplified import horus try: tf = horus.TransformFrame() result = tf.lookup("base_link", "camera", timestamp=horus.timestamp_ns()) except horus.HorusTransformError as e: print(f"Transform error: {e}") # Fall back to last known transform or skip this tick ``` **Common causes:** - Querying a timestamp older than the transform buffer window - Frames not connected in the transform tree - Transform publisher stopped broadcasting ### HorusTimeoutError Raised when a blocking operation exceeds its timeout. 
```python # simplified import horus try: tf = horus.TransformFrame() # Wait up to 1 second for transform to become available result = tf.lookup_with_timeout("base_link", "gripper", timeout_ms=1000) except horus.HorusTimeoutError: print("Transform not available within 1 second") ``` --- ## Standard Python Exceptions HORUS also raises standard Python exceptions in certain cases: | Exception | When | |-----------|------| | `ValueError` | Invalid parameter (negative rate, empty name, bad capacity) | | `TypeError` | Wrong type passed to `send()` (e.g., lambda, socket object) | | `RuntimeError` | Scheduler not running, double-start, or internal error | | `OSError` | Hardware I/O failure (from driver libraries like serialport, smbus2) | --- ## Common Error Patterns ### Graceful Hardware Fallback ```python # simplified import horus try: entries = horus.hardware.load() except Exception as e: print(f"No hardware config: {e}") print("Running in simulation mode") entries = [] for name, obj in entries: if hasattr(obj, 'get_or'): port = obj.get_or("port", "/dev/ttyUSB0") ``` ### Stale Transform Recovery ```python # simplified def tick(node): try: transform = tf.lookup("base_link", "camera") process_with_transform(transform) except horus.HorusTransformError: # Use identity transform as fallback node.log_warning("Transform stale, using identity") process_without_transform() ``` ### Topic Not Found ```python # simplified def tick(node): msg = node.recv("sensor.data") if msg is None: return # No message available -- normal, not an error # recv() returns None for no data, raises HorusNotFoundError # only if the topic itself doesn't exist (rare -- topics auto-created from pubs/subs) ``` ### Protecting Against Serialization Errors ```python # simplified def tick(node): data = compute_result() try: node.send("results", data) except TypeError as e: # Data contains non-serializable type (custom class, socket, etc.) 
node.log_error(f"Cannot serialize: {e}") # Convert to dict and retry node.send("results", {"value": str(data)}) ``` --- ## Rust-to-Python Error Mapping When Rust code raises an error that crosses the PyO3 boundary: | Rust Error | Python Exception | |-----------|-----------------| | `NotFoundError` | `HorusNotFoundError` | | `TransformError` | `HorusTransformError` | | `TimeoutError` | `HorusTimeoutError` | | `ValidationError` | `ValueError` | | `ConfigError` | `ValueError` | | `InvalidInput` | `ValueError` | | `InvalidDescriptor` | `ValueError` | | `ParseError` | `ValueError` | | `SerializationError` | `TypeError` | | `Io` | `OSError` | | `Memory` | `MemoryError` | | `CommunicationError` | `RuntimeError` | | All other variants | `RuntimeError` | --- ## See Also - [Error Handling Guide](/python/error-handling) — Error handling patterns and best practices - [Node API](/python/api/node) — `on_error` callback for node-level error handling - [Scheduler API](/python/api/scheduler) — `failure_policy` for error recovery - [Rust Error Types](/rust/api/error-types) — Rust equivalent (13 variants) --- ## Vision Messages Path: /python/messages/vision Description: Images, point clouds, depth maps, camera calibration, and stereo vision for Python robotics — every method explained # Vision Messages Vision messages carry image data, point clouds, depth maps, and camera calibration. The pool-backed types (`Image`, `PointCloud`, `DepthImage`) use zero-copy shared memory for maximum throughput — no serialization overhead when passing images between nodes. ```python # simplified from horus import ( Image, PointCloud, DepthImage, CameraInfo, CompressedImage, RegionOfInterest, StereoInfo, ) ``` --- ## Image (Pool-Backed) Zero-copy image with numpy and PyTorch interop. Memory lives in shared memory — passing an Image between nodes costs ~3us regardless of resolution. 
### Constructor ```python # simplified img = Image(height=480, width=640, encoding=0) # RGB8 ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `height` | `int` | Image height in pixels | | `width` | `int` | Image width in pixels | | `encoding` | `int` | Pixel format (0=RGB8, 1=BGR8, 2=RGBA8, 3=GRAY8, 4=GRAY16, 5=DEPTH32F) | | `timestamp_ns` | `int` | Capture timestamp in nanoseconds | ### `.from_numpy(array)` — Create from Numpy Array ```python # simplified import numpy as np pixels = np.zeros((480, 640, 3), dtype=np.uint8) img = Image.from_numpy(pixels) ``` Creates an Image from a numpy array. The array shape determines height, width, and channel count. Accepted shapes: `(H, W, 3)` for RGB/BGR, `(H, W, 4)` for RGBA, `(H, W)` for grayscale. ### `.to_numpy()` — Convert to Numpy (Zero-Copy) ```python # simplified arr = img.to_numpy() arr[100, 200] = [255, 0, 0] # Set a red pixel ``` Returns a numpy array view of the image data. This is zero-copy when possible — the numpy array points directly at the shared memory buffer. ### `.to_torch()` — Convert to PyTorch Tensor ```python # simplified tensor = img.to_torch() # CHW format for PyTorch models ``` Returns a PyTorch tensor in CHW (Channel, Height, Width) format, which is what most PyTorch vision models expect. Values are float32 in range [0.0, 1.0]. > **Common mistake:** Holding references to Image across ticks. Pool-backed images are recycled — the same memory slot may be reused on the next tick. Process the image within the current tick, or copy the data with `img.to_numpy().copy()`. **Example — Image Processing Pipeline:** ```python # simplified from horus import Image, Detection, Topic import numpy as np cam_topic = Topic(Image) det_topic = Topic(Detection) def detect_objects(node): img = cam_topic.recv(node) if img is None: return arr = img.to_numpy() # Zero-copy view # Run your detector (YOLO, SSD, etc.) 
results = my_model.predict(arr) for r in results: det = Detection(class_name=r.label, confidence=r.score, x=r.x, y=r.y, width=r.w, height=r.h) det_topic.send(det, node) ``` --- ## PointCloud (Pool-Backed) Zero-copy 3D point cloud with field descriptors. Used for LiDAR data, stereo reconstruction, and depth camera output. ### Constructor ```python # simplified cloud = PointCloud(num_points=1000) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `num_points` | `int` | Number of 3D points | | `timestamp_ns` | `int` | Capture timestamp in nanoseconds | ### `.from_xyz(array)` — Create from Numpy XYZ Array ```python # simplified import numpy as np xyz = np.random.randn(1000, 3).astype(np.float32) cloud = PointCloud.from_xyz(xyz) ``` Creates a point cloud from an Nx3 float32 numpy array. Each row is (x, y, z) in meters. ### `.point_count()` — Number of Points ```python # simplified print(cloud.point_count()) # 1000 ``` **Example — Obstacle Detection from Point Cloud:** ```python # simplified from horus import PointCloud, Topic import numpy as np cloud_topic = Topic(PointCloud) def check_obstacles(node): cloud = cloud_topic.recv(node) if cloud is None: return xyz = cloud.to_numpy() # Nx3 array distances = np.linalg.norm(xyz[:, :2], axis=1) # 2D distance from robot close_points = np.sum(distances < 1.0) if close_points > 50: print(f"WARNING: {close_points} points within 1m!") ``` --- ## DepthImage (Pool-Backed) Depth map from a depth camera (RealSense, Kinect, stereo). Each pixel stores a distance in meters. 
### Constructor ```python # simplified depth = DepthImage(height=480, width=640) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `height` | `int` | Image height in pixels | | `width` | `int` | Image width in pixels | | `timestamp_ns` | `int` | Capture timestamp in nanoseconds | ### `.get_depth(u, v)` — Depth at a Pixel ```python # simplified d = depth.get_depth(320, 240) # Depth at image center (meters) ``` Returns the depth value at pixel (u, v) in meters, or `None` if the pixel is out of bounds or has no valid depth. Invalid depth is typically 0.0 or NaN. ### `.set_depth(u, v, value)` — Set Depth at a Pixel ```python # simplified depth.set_depth(320, 240, 1.5) # Set depth to 1.5m ``` ### `.depth_statistics()` — Summary Statistics ```python # simplified stats = depth.depth_statistics() # min, max, mean, valid_count print(f"Range: {stats[0]:.2f}m - {stats[1]:.2f}m, Mean: {stats[2]:.2f}m") print(f"Valid pixels: {stats[3]} / {depth.width * depth.height}") ``` Returns (min, max, mean, valid_count) across all non-zero pixels. Use this for quick quality checks: if `valid_count` is low, the depth camera may be obstructed or miscalibrated. **Example — Depth-Based Floor Detection:** ```python # simplified from horus import DepthImage, Topic depth_topic = Topic(DepthImage) def check_floor(node): depth = depth_topic.recv(node) if depth is None: return # Sample the bottom row of the image floor_depths = [] for u in range(0, depth.width, 10): d = depth.get_depth(u, depth.height - 1) if d is not None and d > 0: floor_depths.append(d) if floor_depths: avg = sum(floor_depths) / len(floor_depths) print(f"Average floor distance: {avg:.2f}m") ``` --- ## CameraInfo Camera intrinsic calibration parameters — the link between 2D pixel coordinates and 3D world coordinates. 
### Constructor ```python # simplified cam = CameraInfo(width=640, height=480, fx=525.0, fy=525.0, cx=320.0, cy=240.0) ``` ### `.focal_lengths()` — Camera Focal Length ```python # simplified fx, fy = cam.focal_lengths() print(f"Focal length: ({fx:.0f}, {fy:.0f}) pixels") ``` Returns (fx, fy) in pixels. The focal length determines how "zoomed in" the camera is — higher values = narrower field of view. In the pinhole camera model, a 3D point (X, Y, Z) projects to pixel (fx*X/Z + cx, fy*Y/Z + cy). ### `.principal_point()` — Optical Center ```python # simplified cx, cy = cam.principal_point() print(f"Principal point: ({cx:.0f}, {cy:.0f})") ``` Returns (cx, cy) — the pixel where the optical axis hits the image sensor. Usually near the image center, but not exactly — calibration corrects for manufacturing imprecision. ### `.with_distortion_model(model)` — Set Lens Distortion Type ```python # simplified cam = cam.with_distortion_model("plumb_bob") ``` Returns a new CameraInfo with the distortion model name. Common models: - `"plumb_bob"`: Standard radial + tangential (5 coefficients) - `"equidistant"`: Fisheye cameras - `"none"`: Synthetic cameras (no distortion) **Example — 3D Projection from Depth:** ```python # simplified from horus import CameraInfo, DepthImage, Point3 cam = CameraInfo(width=640, height=480, fx=525.0, fy=525.0, cx=320.0, cy=240.0) def pixel_to_3d(cam, depth, u, v): """Convert pixel (u,v) + depth to 3D point.""" fx, fy = cam.focal_lengths() cx, cy = cam.principal_point() z = depth.get_depth(u, v) if z is None or z <= 0: return None x = (u - cx) * z / fx y = (v - cy) * z / fy return Point3(x=x, y=y, z=z) ``` --- ## CompressedImage JPEG/PNG compressed image for network transmission. Smaller than raw Image but requires decompression. 
### Constructor ```python # simplified img = CompressedImage(format="jpeg", data=jpeg_bytes, width=640, height=480) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `format` | `str` | Compression format ("jpeg", "png") | | `data` | `bytes` | Compressed image data | | `width` | `int` | Image width in pixels | | `height` | `int` | Image height in pixels | | `timestamp_ns` | `int` | Capture timestamp in nanoseconds | ### `.format_str()` — Get Format Name ```python # simplified print(img.format_str()) # "jpeg" ``` Use `CompressedImage` for network topics or when bandwidth is limited. Use `Image` for local processing (zero-copy, no decompression needed). **Example — Compress and Send:** ```python # simplified from horus import Image, CompressedImage, Topic import cv2 import numpy as np cam_topic = Topic(Image) net_topic = Topic(CompressedImage) def compress_and_send(node): img = cam_topic.recv(node) if img is None: return arr = img.to_numpy() _, jpeg_data = cv2.imencode('.jpg', arr, [cv2.IMWRITE_JPEG_QUALITY, 80]) compressed = CompressedImage( format="jpeg", data=jpeg_data.tobytes(), width=img.width, height=img.height ) net_topic.send(compressed, node) ``` --- ## RegionOfInterest Rectangular region in an image — for cropping detections or specifying areas of interest. ### Constructor ```python # simplified roi = RegionOfInterest(x=100, y=200, width=50, height=60) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `x` | `int` | Left edge (pixels from left) | | `y` | `int` | Top edge (pixels from top) | | `width` | `int` | Region width in pixels | | `height` | `int` | Region height in pixels | ### `.contains(x, y)` — Is a Pixel Inside? 
```python # simplified if roi.contains(120, 220): print("Pixel is inside the ROI") ``` ### `.area()` — Region Size ```python # simplified print(f"ROI area: {roi.area()} pixels") # 3000 ``` **Use case — Crop a detection from an image:** ```python # simplified arr = img.to_numpy() cropped = arr[roi.y:roi.y+roi.height, roi.x:roi.x+roi.width] ``` --- ## StereoInfo Stereo camera pair calibration with depth-to-disparity conversion. ### Constructor ```python # simplified stereo = StereoInfo(baseline=0.12, fx=525.0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `baseline` | `float` | Distance between camera centers in meters | | `fx` | `float` | Horizontal focal length in pixels | ### `.depth_from_disparity(disparity)` — Convert Disparity to Depth ```python # simplified stereo = StereoInfo(baseline=0.12, fx=525.0) # 12cm between cameras depth = stereo.depth_from_disparity(disparity=30.0) # depth = baseline * fx / disparity ``` The fundamental stereo vision equation: Z = B * f / d. Given disparity (pixel offset between left and right images), compute real-world depth. Larger baseline = better depth accuracy at long range. ### `.disparity_from_depth(depth)` — Convert Depth to Disparity ```python # simplified disp = stereo.disparity_from_depth(depth=2.0) # Expected disparity for an object at 2 meters ``` The inverse — useful for setting depth range limits in stereo matching algorithms. **Example — Stereo Depth Range Check:** ```python # simplified from horus import StereoInfo stereo = StereoInfo(baseline=0.12, fx=525.0) # What disparity range corresponds to 0.5m - 10m depth? near_disp = stereo.disparity_from_depth(0.5) # ~126 pixels far_disp = stereo.disparity_from_depth(10.0) # ~6.3 pixels print(f"Search range: {far_disp:.0f} - {near_disp:.0f} pixels") ``` --- ## Design Decisions **Why pool-backed images instead of heap-allocated buffers?** A 1080p RGB image is 6MB. 
Allocating and freeing 6MB at 30fps causes GC pressure and memory fragmentation. Pool-backed images recycle pre-allocated shared memory slots, eliminating allocation overhead. The tradeoff is a fixed pool size (configurable) and the requirement to process images within a single tick. **Why `to_numpy()` returns a view instead of a copy?** Copying 6MB per frame at 30fps wastes 180MB/s of memory bandwidth. A view points directly at the shared memory buffer, so there is zero overhead. The tradeoff: the view is only valid during the current tick (the pool may recycle the slot on the next tick). If you need the data longer, call `.copy()` on the numpy array. **Why separate `Image` and `CompressedImage`?** They serve different transport needs. `Image` is raw pixels in shared memory (zero-copy, ~3us transfer, local only). `CompressedImage` is JPEG/PNG bytes (requires encoding/decoding, smaller size, suitable for network transport). Using the wrong one wastes either bandwidth (raw over network) or CPU (decompressing locally). **Why does `CameraInfo` store focal length in pixels, not millimeters?** Pixel-based focal lengths are what you need for the pinhole camera model: `pixel_x = fx * X/Z + cx`. Millimeter focal lengths require an additional conversion using sensor pixel size. Since calibration tools (OpenCV, ROS camera_calibration) output pixel-based values, storing them directly avoids a conversion step and a potential source of error. **Why does `StereoInfo` take `baseline` in meters?** The stereo depth equation is `Z = baseline * fx / disparity`. Baseline must be in the same units as the desired depth output. Since horus uses meters for all spatial measurements, baseline is in meters. If your stereo camera spec lists baseline in centimeters or millimeters, convert before construction. 
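The two formulas this page relies on, the stereo relation `Z = baseline * fx / disparity` and pinhole back-projection, can be checked with plain arithmetic. The sketch below is independent of the HORUS classes: the function names are hypothetical and exist only to reproduce the numbers quoted in the stereo example (0.12 m baseline, 525 px focal length).

```python
def depth_from_disparity(baseline_m: float, fx_px: float, disparity_px: float) -> float:
    """Stereo depth equation Z = B * f / d (meters, pixels, pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * fx_px / disparity_px


def disparity_from_depth(baseline_m: float, fx_px: float, depth_m: float) -> float:
    """Inverse relation d = B * f / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return baseline_m * fx_px / depth_m


def back_project(u, v, z, fx, fy, cx, cy):
    """Pinhole model inverted: pixel (u, v) at depth z -> camera-frame (x, y, z)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


near_disp = disparity_from_depth(0.12, 525.0, 0.5)    # close object, large disparity
far_disp = disparity_from_depth(0.12, 525.0, 10.0)    # distant object, small disparity
center = back_project(320, 240, 2.0, 525.0, 525.0, 320.0, 240.0)

print(f"near={near_disp:.1f}px far={far_disp:.1f}px center={center}")
```

A pixel at the principal point back-projects onto the optical axis, which is a quick sanity check for any calibration values you plug in.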
--- ## See Also - [Perception Messages](/python/messages/perception) -- Detection, SegmentationMask - [Geometry Messages](/python/messages/geometry) -- Point3 for 3D positions - [Sensor Messages](/python/messages/sensor) -- LaserScan, Imu for non-vision sensors - [Python Message Library](/python/library/python-message-library) -- All 55+ message types overview --- ## Shared Memory Architecture (Python) Path: /python/shared-memory Description: How HORUS shared memory works from Python -- ring buffers, zero-copy versus serialization, pool-backed types, cross-process discovery, and memory management # Shared Memory Architecture (Python) When your Python node calls `node.send("imu", reading)`, the data does not pass through a socket, a pipe, or the kernel. It lands in a ring buffer backed by shared memory. Another node -- Python or Rust, same process or different -- reads it directly from that same memory region. No serialization for typed messages. No copies for images. Latency of roughly 1.5 us for typed messages. You do not need to configure shared memory, allocate buffers, or manage file descriptors. The HORUS runtime handles all of it. This page explains what happens underneath so you can make informed decisions about message types, debug performance issues, and understand why certain patterns are faster than others. --- ## The Two-Sentence Version **Typed messages** (`CmdVel`, `Imu`, `Pose3D`) are binary-compatible structs written directly into shared memory with no serialization step, approximately 1.5 us end-to-end. **Dict messages** (`node.send("data", {"key": value})`) are serialized to bytes first -- one copy in, one copy out, approximately 5-50 us depending on size. If performance matters for a topic, use a typed message. If convenience matters, use a dict. Both travel through the same ring buffer. --- ## Ring Buffers Every topic is backed by a **circular buffer** in shared memory. The buffer has a fixed number of slots. 
The publisher writes to the next slot; the subscriber reads from the oldest unread slot. ```text Publisher writes here ──┐ ▼ ┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐ │ msg5 │ msg6 │ msg7 │ msg8 │ │ │ │ │ └──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘ ▲ └── Subscriber reads here ``` Key properties: - **Lock-free**: No mutexes. Publisher and subscriber use atomic operations, so neither blocks the other. - **Single-producer / single-consumer (SPSC)**: The fastest path. HORUS auto-detects when multiple publishers or subscribers exist and upgrades the backend. - **Overflow drops the oldest**: If the subscriber cannot keep up, the publisher overwrites the oldest unread slot. The subscriber sees the freshest data, not stale data. This is intentional — a robot controller needs the latest sensor reading, not one from 100 ms ago. - **Capacity is a power of two**: Slot lookup uses a bitmask instead of division — a single CPU instruction. The default capacity is auto-sized from the message type (16-1024 slots). From Python, you never interact with the ring buffer directly. `node.send()` writes into it; `node.recv()` reads from it. --- ## Zero-Copy vs. Copy: When Does a Copy Happen? This is the most important performance distinction in HORUS Python. ### Typed Messages: Zero-Copy (~1.5 us) When you declare a topic with a typed message class, the data is a fixed-size binary struct: ```python # simplified from horus import Node, CmdVel, Imu node = Node( name="controller", pubs=[CmdVel], # typed — Pod zero-copy subs=[Imu], # typed — Pod zero-copy tick=my_tick, rate=100, ) def my_tick(node): reading = node.recv("imu") # reads directly from SHM cmd = CmdVel(linear_x=0.5, angular_z=0.1) node.send("cmd_vel", cmd) # writes directly to SHM ``` What happens internally: 1. `node.send("cmd_vel", cmd)` — The `CmdVel` struct (a few dozen bytes) is `memcpy`'d into the ring buffer slot. No serialization. 2. 
`node.recv("imu")` — The bytes are read from the ring buffer slot and interpreted as an `Imu` struct. No deserialization. This works because typed messages (`CmdVel`, `Imu`, `Pose2D`, `Pose3D`, `LaserScan`, `Odometry`, and the other 50+ types in `horus.messages`) are **POD types** — plain old data with no pointers, no heap allocations, and a fixed binary layout. The Rust and Python representations are byte-for-byte identical. ### Dict Messages: Serialized (~5-50 us) When you use string topic names and send Python dicts, HORUS serializes the data with MessagePack: ```python # simplified node = Node( name="logger", pubs=["status"], # string — GenericMessage tick=log_tick, rate=10, ) def log_tick(node): node.send("status", { "battery": 85.0, "mode": "autonomous", "errors": [], }) ``` What happens internally: 1. `node.send("status", {...})` — The dict is serialized to bytes (MessagePack), then the bytes are copied into the ring buffer slot. **Two operations: serialize + copy.** 2. `node.recv("status")` — The bytes are read from the ring buffer and deserialized back into a Python dict. **Two operations: copy + deserialize.** The overhead depends on the dict size. A small dict (a few fields) takes approximately 5 us. A large dict with nested structures can take 50 us or more. 
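The difference between the two paths can be seen without HORUS at all. The sketch below uses only the standard library and makes two assumptions: `json` stands in for MessagePack (which is not in the standard library), and the two-field `<dd` layout is a hypothetical record, not the real `CmdVel` layout. It shows why a fixed binary layout needs no per-message encoding of field names, while a dict does.

```python
import json
import struct

# Hypothetical fixed layout: two little-endian float64 fields.
# NOT the real CmdVel layout; it only illustrates a fixed-size POD record.
CMD_LAYOUT = struct.Struct("<dd")


def pack_typed(linear_x: float, angular_z: float) -> bytes:
    """Fixed-size record: the byte length is known before any message exists."""
    return CMD_LAYOUT.pack(linear_x, angular_z)


def pack_dict(msg: dict) -> bytes:
    """Self-describing encoding: field names travel with every message.
    json is a stand-in for MessagePack here."""
    return json.dumps(msg).encode("utf-8")


typed = pack_typed(0.5, 0.1)
generic = pack_dict({"linear_x": 0.5, "angular_z": 0.1})

# The typed record is always CMD_LAYOUT.size bytes; the dict encoding
# grows with the field names and with any nesting.
print(len(typed), CMD_LAYOUT.size, len(generic))

# Reading back: a fixed-offset unpack versus a full parse.
linear_x, angular_z = CMD_LAYOUT.unpack(typed)
decoded = json.loads(generic)
```

The typed path is a fixed-offset read and write; the dict path has to walk the structure on both sides, which is where the extra microseconds come from.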
### Decision Table | Pattern | Transport | Latency | When to use | |---------|-----------|---------|-------------| | `pubs=[CmdVel]` | Pod zero-copy | ~1.5 us | Sensor data, control commands, anything with a known type | | `pubs=["data"]` + dict | MessagePack serialization | ~5-50 us | Prototyping, configuration, logs, variable-shape data | | `pubs=[Image]` | Pool-backed descriptor | ~3 us (descriptor) | Camera frames, see next section | | `pubs=[PointCloud]` | Pool-backed descriptor | ~3 us (descriptor) | LiDAR scans, 3D data | | `pubs=[Tensor]` | Pool-backed descriptor | ~3 us (descriptor) | ML features, costmaps, custom arrays | --- ## Pool-Backed Types: Image, PointCloud, DepthImage, Tensor Camera frames, LiDAR scans, and ML tensors are too large for ring buffer slots (a 1080p RGB image is 6 MB). HORUS handles these with **pool-backed shared memory**: the actual data lives in a separate shared memory pool, and only a small descriptor (64-336 bytes) travels through the ring buffer. ```python # simplified from horus import Image, PointCloud, Tensor import numpy as np # Create an image — pixel data is allocated in the SHM pool img = Image(480, 640, "rgb8") # Write pixels — this writes directly into the SHM pool img.copy_from(camera_frame_bytes) # Or create from NumPy — one copy into the pool img = Image.from_numpy(frame_array, encoding="rgb8") ``` ### NumPy Zero-Copy Views The key feature: **reading pool-backed data produces a zero-copy NumPy view**. ```python # simplified def vision_tick(node): img = node.recv("camera") if img: # to_numpy() returns a view into shared memory — NO COPY pixels = img.to_numpy() # shape: (480, 640, 3), dtype: uint8 # You can pass this directly to OpenCV, scikit-image, etc. gray = np.mean(pixels, axis=2) edges = np.abs(np.diff(gray, axis=1)) ``` The `pixels` array points directly into the shared memory pool. No bytes are copied. The array is valid as long as the `img` object exists. 
The same applies to all pool-backed types: ```python # simplified cloud = node.recv("lidar") points = cloud.to_numpy() # shape: (N, 3), dtype: float32 — zero-copy depth = node.recv("depth") depth_map = depth.to_numpy() # shape: (H, W), dtype: float32 — zero-copy tensor = node.recv("features") arr = tensor.numpy() # shape matches creation — zero-copy ``` ### GPU Interop via DLPack Pool-backed types support zero-copy conversion to PyTorch and JAX through the DLPack protocol. When data is on GPU, DLPack exports a CUDA device pointer -- PyTorch receives a CUDA tensor directly, no CPU roundtrip: ```python # simplified import torch # CPU Image to PyTorch — zero-copy via SHM img = node.recv("camera") tensor = img.to_torch() # torch.Tensor on CPU, backed by SHM # GPU Image to PyTorch — zero-copy via CUDA pointer gpu_img = img.to_gpu() # copy to CUDA managed memory gpu_tensor = gpu_img.to_torch() # torch.Tensor on CUDA, zero-copy # Transfer between devices cpu_img = gpu_img.to_cpu() # copy back to CPU print(gpu_img.device) # "cuda:0" print(gpu_img.is_gpu) # True # Tensor to PyTorch — zero-copy features = node.recv("features") pt = torch.from_dlpack(features) # standard DLPack protocol # Tensor to JAX — zero-copy import jax jax_arr = features.to_jax() # GPU detection print(horus.cuda_available()) # True if GPU present print(horus.cuda_device_count()) # Number of GPUs print(horus.gpu_platform()) # e.g., "NVIDIA GeForce RTX 4090 (sm_89, 24.0 GB)" ``` DLPack is the standard zero-copy tensor exchange protocol supported by NumPy (1.25+), PyTorch (1.10+), JAX (0.4+), CuPy, and TensorFlow. One protocol covers all frameworks. GPU tensors export with `device_type=kDLCUDA`, enabling direct GPU-to-GPU transfer between HORUS and ML frameworks. 
### The Zero-Copy Chain For pool-backed types, the full data path involves zero copies on the receive side: ```text Rust allocator ──► SHM pool ──► Python receives descriptor │ └──► img.to_numpy() ──► NumPy view (same memory) └──► img.to_torch() ──► PyTorch tensor (same memory) └──► img.to_jax() ──► JAX array (same memory) ``` `from_numpy()` and `from_torch()` do copy once — they place data into the pool so it can be shared. `to_numpy()` and `to_torch()` do not copy — they create views into existing pool memory. | Direction | Method | Copies | Why | |-----------|--------|--------|-----| | Python to SHM | `Image.from_numpy(arr)` | 1 | Data must go into a pool slot at a specific address | | Python to SHM | `Tensor.from_torch(t)` | 1 | Same reason | | SHM to Python | `img.to_numpy()` | 0 | Returns a view into the pool slot | | SHM to Python | `torch.from_dlpack(tensor)` | 0 | DLPack wraps the pool pointer | --- ## Cross-Process Auto-Discovery Two Python processes (or a Python process and a Rust process) sharing a topic need no configuration. They discover each other through the shared memory filesystem. ```bash # Terminal 1: Python sensor node horus run sensor.py # Terminal 2: Python controller node (or Rust — doesn't matter) horus run controller.py ``` What happens: 1. Process A calls `Topic("imu")` internally, which creates a shared memory file and writes a header with type info, capacity, and a magic number. 2. Process B calls `Topic("imu")`, finds the existing file, validates the type matches, and memory-maps the same region. 3. Both processes read and write the same ring buffer. Writes by A are immediately visible to B. **Namespace isolation**: By default, each `horus run` invocation gets an auto-generated namespace (derived from session ID and user ID). Two terminals get different namespaces and cannot see each other's topics. 
To share topics across terminals: ```bash # Both terminals must use the same namespace HORUS_NAMESPACE=robot horus run sensor.py HORUS_NAMESPACE=robot horus run controller.py ``` Or use `horus launch`, which sets the namespace automatically for all nodes in the launch file. **Mixed languages**: A Python node and a Rust node can share the same typed topic. The binary layout of `CmdVel` in Python and `CmdVel` in Rust is identical — same field offsets, same size, same alignment. Dict topics (GenericMessage) also work cross-language because both sides use the same MessagePack format. --- ## Platform Differences HORUS runs on Linux and macOS with the same Python API. The shared memory mechanism differs underneath: | Aspect | Linux | macOS | |--------|-------|-------| | SHM mechanism | `/dev/shm` (tmpfs backed by RAM) | `shm_open()` (Mach VM) | | Base directory | `/dev/shm/horus_/` | `/tmp/horus_/` | | Topic name rule | Any valid filename characters | **No slashes** — `shm_open` limitation | | Stale detection | `flock` (kernel-managed) | PID-based via `.meta` files | **Topic naming**: Always use dots as separators (`sensor.imu`, `camera.rgb`), never slashes. This rule is enforced on all platforms for portability. Your code works on both Linux and macOS without changes. --- ## Cleanup ### Automatic (You Rarely Need This Section) HORUS has three automatic cleanup layers: 1. **Normal exit**: When your Python process exits (Ctrl+C, `horus.run()` returns), shared memory files owned by that process are removed automatically. 2. **Startup cleanup**: Every `horus` CLI command scans for stale namespaces from dead sessions and removes them. Cost: <1 ms. 3. **Pre-run cleanup**: Before every `horus run`, stale topics older than 5 minutes with no live processes are removed. You almost never need to think about cleanup. It happens silently. 
### Manual Escape Hatch If a process is killed with `kill -9`, it cannot run its exit handlers, so stale SHM files may linger until the next `horus` command triggers auto-cleanup. To remove them sooner, use the manual escape hatch: ```bash # Preview what would be cleaned horus clean --shm --dry-run # Remove stale SHM files horus clean --shm # Nuclear option: SHM + build cache + everything horus clean --all ``` **Never manually delete files under `/dev/shm/horus_*/`** -- use `horus clean --shm` instead. The cleanup command knows which files are stale and which are actively in use. --- ## Memory Management: Python GC vs. Rust Allocator A common concern with shared memory in Python: does the garbage collector interfere? The short answer is **no**. ### How It Works Shared memory buffers are allocated and managed by the Rust runtime, not by Python's memory allocator. When you receive an `Image` in Python, the Python object is a thin wrapper around a Rust-owned reference to a pool slot. The actual pixel data lives in mmap'd shared memory that Python's GC cannot see or move. ```python # simplified img = node.recv("camera") pixels = img.to_numpy() # NumPy view into SHM -- not a Python heap object # Python's GC tracks 'img' and 'pixels' (the wrapper objects) # but the underlying 6MB of pixel data is in SHM, managed by Rust ``` ### Lifetime Rules - **The NumPy view is valid as long as the source object exists.** If you drop the `img` reference and the GC collects it, the NumPy array becomes invalid. In practice, this is rarely a problem -- you typically use the array within the same tick function. - **Pool slots use atomic reference counting.** When multiple subscribers receive the same image, each holds a reference. The pool slot is reclaimed only when all references are dropped. - **Python's GC cycles do not pause SHM access.** The GC only tracks Python wrapper objects (a few hundred bytes each). The megabytes of sensor data in shared memory are invisible to the GC. 
### What This Means in Practice

- You do not need to call `del` or manually free shared memory objects.
- Large images and point clouds do not contribute to GC pressure.
- You can hold references to received messages across ticks without leaking SHM (the reference count keeps the slot alive).
- If you store a `to_numpy()` view in a long-lived variable, make sure the source HORUS object also stays alive.

---

## When to Care About SHM Details

Most Python users never need to think about shared memory. HORUS handles it. Here is when these details matter:

**You should care when:**

- You are choosing between typed messages and dicts for a high-frequency topic. Typed messages are 3-30x faster. See the decision table above.
- You are processing camera images or LiDAR clouds and want to avoid unnecessary copies. Use `to_numpy()` instead of converting to a Python list.
- You see "Topic not found" errors across processes — check that both processes use the same `HORUS_NAMESPACE`.
- You are debugging latency spikes — use `horus topic list --verbose` to see which backend each topic uses.
- You are running on macOS and topic names contain slashes — change to dots.

**You should not care when:**

- You are prototyping at low frequencies (10-30 Hz). Dict topics are fine.
- You have a single-process application. Everything stays in-process with no SHM overhead.
- You are writing application logic. Just call `node.send()` and `node.recv()`.

---

## Design Decisions and Trade-offs

**Why shared memory instead of sockets or pipes?** Sockets and pipes require kernel transitions (`write()` + `read()` system calls) for every message — 1-5 us of overhead per message. Shared memory is just regular memory access: the CPU reads and writes at RAM speed without entering the kernel. A control loop running at 1 kHz has only 1000 us per cycle, so saving 1-5 us on every message sent or received in that cycle helps keep the loop under budget.
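A back-of-the-envelope check of the overhead argument above, as a minimal sketch. The figure of 20 messages per cycle is an assumption chosen for illustration, not a HORUS measurement, and `syscall_overhead_fraction` is a hypothetical helper.

```python
def syscall_overhead_fraction(messages_per_cycle: int,
                              per_message_us: float,
                              loop_hz: float) -> float:
    """Fraction of one control cycle spent on per-message kernel overhead."""
    cycle_us = 1_000_000 / loop_hz  # a 1 kHz loop has a 1000 us cycle
    return messages_per_cycle * per_message_us / cycle_us


# Assumed workload: 20 messages exchanged per cycle at 1 kHz.
best_case = syscall_overhead_fraction(20, 1.0, 1000)   # 1 us per message
worst_case = syscall_overhead_fraction(20, 5.0, 1000)  # 5 us per message
print(f"overhead: {best_case:.0%} to {worst_case:.0%} of each cycle")
```

Under these assumptions, socket-style transport would consume between 2% and 10% of every cycle before any control logic runs.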
**Why ring buffers instead of queues?** Ring buffers have a fixed memory footprint (allocated once at topic creation) and predictable access patterns. A growable queue would require dynamic allocation, which is unpredictable in real-time contexts. The fixed size also means overflow behavior is explicit: the oldest message is dropped, and the subscriber always gets the freshest data. **Why drop oldest on overflow instead of blocking?** A robot controller that blocks waiting for a full buffer to drain is a robot controller that crashes into a wall. Dropping stale data and keeping fresh data is the safer default for robotics. If you need guaranteed delivery (log collection, recording), increase the ring buffer capacity. **Why typed messages are faster than dicts:** A `CmdVel` struct is 24 bytes with a fixed layout known at compile time. A dict `{"linear_x": 0.5, "angular_z": 0.1}` must be serialized to bytes (encoding field names, types, values), copied into the buffer, then deserialized on the other side. The typed path skips all of that — it writes the 24 bytes directly. **Why `from_numpy()` copies but `to_numpy()` does not:** The shared memory pool allocator controls where data lives — each slot is at a specific address within the mmap'd region. A NumPy array created by your application lives at an arbitrary heap address that cannot be shared across processes. So `from_numpy()` copies once into the pool. On the receive side, the data is already in shared memory at a known address, so `to_numpy()` wraps it as a view — no copy. **Why Python's GC does not affect performance:** Shared memory buffers are mmap'd regions managed by the Rust allocator. Python's garbage collector only tracks the small wrapper objects (a few hundred bytes). The megabytes of actual sensor data are invisible to the GC, so GC pauses do not cause latency spikes in the data path. 
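The fixed-capacity, drop-oldest behavior described in the ring-buffer answers above can be sketched with the standard library. `DropOldestRing` is a hypothetical illustration built on `collections.deque`. It shows the overflow policy only, not the shared-memory layout HORUS uses.

```python
from collections import deque


class DropOldestRing:
    """Fixed-capacity buffer that discards the oldest entry on overflow."""

    def __init__(self, capacity: int) -> None:
        self._slots = deque(maxlen=capacity)  # capacity is fixed at creation
        self.dropped = 0

    def push(self, msg) -> None:
        # A full deque with maxlen silently evicts from the left on append;
        # count the eviction so slow subscribers can be detected.
        if len(self._slots) == self._slots.maxlen:
            self.dropped += 1
        self._slots.append(msg)

    def pop(self):
        """Return the oldest surviving message, or None if empty."""
        return self._slots.popleft() if self._slots else None


ring = DropOldestRing(capacity=3)
for seq in range(5):          # publisher outpaces the subscriber
    ring.push(seq)

received = [ring.pop(), ring.pop(), ring.pop()]  # received == [2, 3, 4]
```

Messages 0 and 1 are lost, but the publisher never blocks and the subscriber only sees the most recent data.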
| Choice | Benefit | Cost |
|--------|---------|------|
| SHM ring buffers | Sub-microsecond latency, no kernel involvement | Platform-specific internals (hidden from Python users) |
| Typed messages for POD types | Zero-copy, ~1.5 us end-to-end | Must use predefined message types (50+ available) |
| Dict messages via MessagePack | Any Python object, no schema needed | 3-30x slower than typed messages |
| Pool-backed large types | Zero-copy views for images and clouds | One copy on publish (`from_numpy()`) |
| Atomic refcounting for pool slots | Multiple subscribers share one copy | Slots reclaimed only when all refs dropped |
| Overflow drops oldest | Controller always gets fresh data | Subscribers that fall behind lose messages |

---

## Inspecting Topics at Runtime

```bash
# List all active topics with backend type and message rate
horus topic list --verbose

# Watch a topic's messages in real time
horus topic echo sensor.imu

# Measure publishing rate
horus topic hz sensor.imu

# Measure bandwidth
horus topic bw camera.rgb

# See running nodes and their topic connections
horus node list
```

---

## See Also

- [Shared Memory (Concepts)](/concepts/shared-memory) — Full architecture: ring buffer internals, backend selection, cache-line layout, SIMD optimization
- [Python Memory Types](/python/api/memory-types) — Image, PointCloud, DepthImage, Tensor API reference
- [Python Image](/python/api/image) — Camera image API with encoding table and framework conversions
- [Tensor](/python/api/tensor) — General-purpose shared memory tensor with Pythonic API
- [Multi-Process Architecture](/concepts/multi-process) — Cross-process topics, namespace management, mixed-language nodes
- [Python Bindings](/python/api/python-bindings) — Full Python API: Node, send/recv, topic declaration formats

---

## Rate & Params
Path: /python/api/rate-params
Description: Rate limiter for fixed-frequency loops and RuntimeParams for dynamic configuration

# Rate & Params

Two utility types for timing
control and dynamic configuration outside the scheduler's node lifecycle. --- ## Rate — Fixed-Frequency Loop `horus.Rate` provides drift-compensated rate limiting for loops that need to run at a fixed frequency. Use it for standalone scripts and background threads — inside nodes, the scheduler handles timing automatically. ```python # simplified import horus rate = horus.Rate(100) # 100 Hz while True: do_work() rate.sleep() # Blocks until next tick (drift-compensated) ``` ### Constructor ```python # simplified horus.Rate(hz: float) # Target frequency in Hz ``` ### Methods | Method | Returns | Description | |--------|---------|-------------| | `rate.sleep()` | — | Block until next tick (compensates for work time) | | `rate.remaining()` | `float` | Seconds until next tick | | `rate.reset()` | — | Reset the timer (next sleep starts fresh) | ### Example: Hardware Driver Thread ```python # simplified import horus import threading def imu_reader_thread(): rate = horus.Rate(100) # 100 Hz topic = horus.Topic(horus.Imu) while True: reading = read_imu_hardware() topic.send(reading) rate.sleep() thread = threading.Thread(target=imu_reader_thread, daemon=True) thread.start() ``` ### When to Use Rate vs Scheduler | Use Case | Use | |----------|-----| | Nodes with tick callbacks | Scheduler (handles timing, RT, safety) | | Standalone scripts | `Rate` | | Background threads alongside scheduler | `Rate` | | One-shot tools | Neither — just run once | --- ## Params — Runtime Parameters `horus.Params` is a typed key-value store for dynamic configuration. Change gains, thresholds, or feature flags at runtime without restarting nodes. 
```python
# simplified
import horus

params = horus.Params()
params.set("pid.kp", 1.5)
params.set("pid.ki", 0.01)

kp = params.get("pid.kp")  # 1.5
```

### Constructor

```python
# simplified
horus.Params()  # Empty parameter store
```

### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `params.get(key)` | value | Get parameter value (raises if missing) |
| `params.get_or(key, default)` | value | Get with fallback |
| `params.set(key, value)` | — | Set parameter value |
| `params.has(key)` | `bool` | Check if key exists |
| `params.list_keys()` | `list[str]` | All parameter names |
| `params.remove(key)` | — | Remove a parameter |
| `params.reset()` | — | Clear all parameters |

### Example: Dynamic PID Tuning

```python
# simplified
import horus

params = horus.Params()
params.set("kp", 2.0)
params.set("ki", 0.1)
params.set("kd", 0.05)

# Controller state carried across ticks
integral = 0.0
prev_error = 0.0

def controller_tick(node):
    global integral, prev_error
    kp = params.get_or("kp", 2.0)
    ki = params.get_or("ki", 0.1)
    kd = params.get_or("kd", 0.05)

    dt = horus.dt()  # seconds since the previous tick (see Clock API)
    # get_setpoint() and get_measured() are application-defined helpers
    error = get_setpoint() - get_measured()
    integral += error * dt
    derivative = (error - prev_error) / dt if dt > 0 else 0.0
    prev_error = error

    output = kp * error + ki * integral + kd * derivative
    node.send("control", {"output": output})

# Another thread or monitoring tool can update params at runtime:
# params.set("kp", 3.0)  # Takes effect on next tick
```

### Example: Feature Flags

```python
# simplified
import horus

params = horus.Params()
params.set("enable_slam", True)
params.set("max_speed", 1.0)

def tick(node):
    if params.get_or("enable_slam", False):
        run_slam()  # application-defined
    # `velocity` is the speed requested by your application (not defined here)
    speed = min(velocity, params.get_or("max_speed", 1.0))
    node.send("cmd_vel", horus.CmdVel(linear=speed, angular=0.0))
```

---

## See Also

- [Clock API](/python/api/clock) — Framework time functions (`dt()`, `budget_remaining()`)
- [Scheduler API](/python/api/scheduler) — Scheduler-managed timing for nodes
- [Rust Rate & Stopwatch](/rust/api/rate-stopwatch) — Rust equivalent
- [Rust RuntimeParams](/rust/api/runtime-params) — Rust equivalent

---

## Input & Audio Messages
Path: /python/messages/input
Description: Joystick, keyboard, and audio
types for Python robotics teleoperation and voice control — every method explained # Input & Audio Messages Input messages handle human interaction — gamepad teleoperation, keyboard shortcuts, and audio capture. These are the human interface layer of your robot. ```python # simplified from horus import JoystickInput, KeyboardInput, AudioFrame ``` --- ## JoystickInput Gamepad/joystick events with typed factories for each event kind. The factory methods create properly-typed events without you needing to remember raw integer constants. ### Fields | Field | Type | Description | |-------|------|-------------| | `joystick_id` | `int` | Controller identifier (for multi-controller setups) | | `event_type` | `int` | Event type (0=button, 1=axis, 2=hat, 3=connection) | | `element_id` | `int` | Button/axis/hat ID within the controller | | `element_name` | `str` | Human-readable name (e.g., "A", "left_stick_x") | | `value` | `float` | Analog value (-1.0 to 1.0 for sticks, 0.0/1.0 for buttons) | | `pressed` | `bool` | Button state (True = pressed, False = released) | | `timestamp_ns` | `int` | Event timestamp in nanoseconds | ### Factory Methods ```python # simplified # Button press btn = JoystickInput.new_button(joystick_id=0, button_id=1, name="A", pressed=True) # Analog axis (stick/trigger) — value range: -1.0 to 1.0 for sticks, 0.0 to 1.0 for triggers axis = JoystickInput.new_axis(joystick_id=0, axis_id=0, name="left_stick_x", value=0.75) # D-pad/hat switch hat = JoystickInput.new_hat(joystick_id=0, hat_id=0, name="dpad", value=1.0) # Controller connected/disconnected conn = JoystickInput.new_connection(joystick_id=0, connected=True) ``` ### Event Type Queries — Dispatch Pattern The standard way to handle mixed joystick events is to check the type and dispatch: ```python # simplified def handle_input(joy): if joy.is_button(): if joy.pressed and joy.element_name == "A": trigger_action() elif joy.element_name == "B": cancel_action() elif joy.is_axis(): if "stick_x" in 
joy.element_name: steer(joy.value) elif "trigger" in joy.element_name: throttle(joy.value) elif joy.is_hat(): handle_dpad(joy.value) elif joy.is_connection_event(): if joy.is_connected(): print("Controller connected") else: print("Controller disconnected — engaging e-stop!") ``` ### `.is_button()` — Is This a Button Event? ```python # simplified if joy.is_button(): print(f"Button {joy.element_name}: {'pressed' if joy.pressed else 'released'}") ``` ### `.is_axis()` — Is This an Axis Event? ```python # simplified if joy.is_axis(): print(f"Axis {joy.element_name}: {joy.value:.2f}") ``` Axis values are typically -1.0 to 1.0 for sticks (center = 0.0) and 0.0 to 1.0 for triggers (released = 0.0). ### `.is_hat()` — Is This a Hat/D-pad Event? ```python # simplified if joy.is_hat(): print(f"D-pad direction: {joy.value}") ``` ### `.is_connection_event()` — Is This a Hotplug Event? ```python # simplified if joy.is_connection_event(): connected = joy.is_connected() ``` Handle controller hotplug — if the operator's gamepad disconnects mid-mission, you should trigger an emergency stop. ### `.is_connected()` — Is the Controller Plugged In? 
```python # simplified if joy.is_connection_event() and not joy.is_connected(): estop_topic.send(EmergencyStop.engage("Controller disconnected")) ``` **Example — Gamepad Teleoperation:** ```python # simplified from horus import Node, run, JoystickInput, CmdVel, EmergencyStop, Topic joy_topic = Topic(JoystickInput) cmd_topic = Topic(CmdVel) estop_topic = Topic(EmergencyStop) speed_scale = 1.0 linear = 0.0 angular = 0.0 def teleop(node): global speed_scale, linear, angular joy = joy_topic.recv(node) if joy is None: return if joy.is_axis(): name = joy.element_name if "left_stick_y" in name: linear = -joy.value * speed_scale # Forward/backward elif "left_stick_x" in name: angular = -joy.value * speed_scale # Turn left/right elif "right_trigger" in name: speed_scale = 0.5 + joy.value * 1.5 # Speed boost elif joy.is_button() and joy.pressed: if joy.element_name == "A": estop_topic.send(EmergencyStop.engage("Operator e-stop")) linear = angular = 0.0 elif joy.is_connection_event() and not joy.is_connected(): estop_topic.send(EmergencyStop.engage("Controller disconnected")) linear = angular = 0.0 return cmd_topic.send(CmdVel(linear=linear, angular=angular), node) run(Node(tick=teleop, rate=50, pubs=["cmd_vel", "estop"], subs=["joystick"])) ``` --- ## KeyboardInput Keyboard events with modifier key detection. 
### Constructor ```python # simplified key = KeyboardInput(key_name="A", code=65, pressed=True, modifiers=0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `key_name` | `str` | Key label (e.g., "A", "space", "escape") | | `code` | `int` | Platform-specific key code | | `pressed` | `bool` | True = key down, False = key up | | `modifiers` | `int` | Modifier bit flags (Ctrl=1, Shift=2, Alt=4) | ### `.is_ctrl()` / `.is_shift()` / `.is_alt()` — Modifier Checks ```python # simplified if key.is_ctrl() and key.key_name == "S": save_map() elif key.is_ctrl() and key.key_name == "Q": shutdown() elif key.key_name == "space" and key.pressed: toggle_estop() ``` These check the modifier bit flags. Use them for keyboard shortcuts in operator consoles and development tools. ### `.pressed` — Key State ```python # simplified if key.pressed: print(f"Key down: {key.key_name}") else: print(f"Key up: {key.key_name}") ``` **Example — Keyboard Shortcuts:** ```python # simplified from horus import KeyboardInput, Topic key_topic = Topic(KeyboardInput) def handle_keys(node): key = key_topic.recv(node) if key is None or not key.pressed: return if key.key_name == "space": toggle_pause() elif key.is_ctrl() and key.key_name == "Z": undo() elif key.key_name == "escape": emergency_stop() ``` --- ## AudioFrame Audio data from microphones or audio sources. Factory methods handle channel layout — use `mono()` for single-mic, `stereo()` for stereo pair, `multi_channel()` for microphone arrays. 
### Fields | Field | Type | Description | |-------|------|-------------| | `sample_rate` | `int` | Sample rate in Hz (e.g., 16000, 44100, 48000) | | `channels` | `int` | Number of audio channels | | `samples` | `list[float]` | Interleaved audio samples (-1.0 to 1.0) | | `timestamp_ns` | `int` | Capture timestamp in nanoseconds | ### `.mono(sample_rate, samples)` — Single Channel ```python # simplified frame = AudioFrame.mono(sample_rate=16000, samples=[0.1, 0.2, -0.1, 0.0]) ``` 16kHz is standard for speech recognition. 44.1kHz or 48kHz for music-quality audio. ### `.stereo(sample_rate, samples)` — Two Channels ```python # simplified # Interleaved L/R samples: [L0, R0, L1, R1, ...] frame = AudioFrame.stereo(sample_rate=48000, samples=interleaved_data) ``` ### `.multi_channel(sample_rate, channels, samples)` — Microphone Array ```python # simplified # 4-channel microphone array at 16kHz frame = AudioFrame.multi_channel(sample_rate=16000, channels=4, samples=array_data) ``` Used for sound source localization (beamforming) — determining which direction a sound comes from using multiple microphones. ### `.duration_ms()` — Audio Duration ```python # simplified print(f"Frame duration: {frame.duration_ms():.1f} ms") ``` Computed from sample count and sample rate. Typical frame durations: 10-50ms for real-time processing, 100-500ms for batch processing. 
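The relationship between sample count, channel count, and duration can be sketched in plain Python. This assumes duration is derived from the per-channel sample count, which is the natural reading for interleaved audio. The helper names below are hypothetical and do not show how `AudioFrame` is implemented.

```python
def frames_per_channel(samples: list, channels: int) -> int:
    """Per-channel sample count for an interleaved buffer."""
    if channels <= 0 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return len(samples) // channels


def duration_ms(samples: list, channels: int, sample_rate: int) -> float:
    """Playback duration of an interleaved buffer in milliseconds."""
    return frames_per_channel(samples, channels) * 1000.0 / sample_rate


mono_ms = duration_ms([0.1, 0.2, -0.1, 0.0], channels=1, sample_rate=16000)
stereo_ms = duration_ms([0.0] * 960, channels=2, sample_rate=48000)
print(f"mono: {mono_ms} ms, stereo: {stereo_ms} ms")
```

The stereo buffer holds 960 values but only 480 per channel, so at 48 kHz it lasts 10 ms. Forgetting the channel division would report double the real duration.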
### `.frame_count()` — Number of Samples per Channel

```python
# simplified
print(f"Samples per channel: {frame.frame_count()}")
```

**Example — Audio Recording Node:**

```python
# simplified
from horus import Node, run, AudioFrame, Topic

audio_topic = Topic(AudioFrame)
recorded_samples = []

def record_audio(node):
    frame = audio_topic.recv(node)
    if frame is not None:
        recorded_samples.extend(frame.samples)
        if frame.duration_ms() > 0:
            # Samples are interleaved, so divide by the channel count as well
            total_seconds = len(recorded_samples) / (frame.sample_rate * frame.channels)
            if total_seconds > 10:
                print(f"Recorded {total_seconds:.1f}s of audio")
```

---

## Design Decisions

**Why factory methods (`new_button()`, `new_axis()`) instead of raw event types?** Joystick events carry different data depending on type: buttons have a pressed/released boolean, axes have a floating-point value, hat switches have a direction value. The factories set the correct event type flag and validate the data for that type, preventing a button event from carrying an axis value or vice versa.

**Why does `JoystickInput` include connection events?** Controller disconnection during teleoperation is a critical safety event. If the gamepad disconnects mid-mission, the robot receives no more velocity commands and coasts at its last speed. By including connection events in the same message stream, the handler can immediately detect disconnection and trigger an e-stop without needing a separate watchdog.

**Why `AudioFrame` factory methods for channel layout?** Mono, stereo, and multi-channel audio have different interleaving patterns. Mono is just a flat array. Stereo is interleaved L/R. Multi-channel is interleaved across N channels. The factories set the channel count and validate that the sample array length is a multiple of the channel count, catching a common class of audio processing bugs.

**Why `KeyboardInput` uses modifier bit flags instead of separate booleans?** Modifier keys can be combined (Ctrl+Shift+A).
Bit flags allow efficient combination checking (`is_ctrl() and is_shift()`) without wasting space. The convenience methods (`is_ctrl()`, `is_shift()`, `is_alt()`) hide the bit manipulation, so you never need to work with raw flags. --- ## See Also - [Control Messages](/python/messages/control) — CmdVel for translating joystick input to velocity - [Diagnostics Messages](/python/messages/diagnostics) — EmergencyStop for safety - [Geometry Messages](/python/messages/geometry) — Twist for 6-DOF velocity commands from joysticks - [Python Message Library](/python/library/python-message-library) — All 55+ message types overview --- ## Safety & Policies (Python) Path: /python/safety-policies Description: Watchdog monitoring, deadline miss policies, failure policies, and safety patterns for Python HORUS nodes # Safety and Policies A warehouse robot runs Python nodes for ML-based obstacle detection, path planning, and telemetry. The obstacle detector runs a YOLO model at 30 Hz -- if inference takes too long, the robot drives blind. The path planner occasionally crashes when it receives malformed scan data. The telemetry uploader fails when the network drops. Each failure mode needs a different response: the detector needs to know it missed its timing window, the planner needs to restart, and the telemetry node should keep trying without bringing down the system. HORUS provides three complementary safety systems in Python: - **Miss policies** handle timing violations -- what happens when a node takes too long - **Failure policies** handle execution errors -- what happens when a node raises an exception - **Watchdogs** detect frozen nodes -- graduated response when a node stops responding entirely All three are configured through `Node()` constructor parameters and `Scheduler()` settings. They work together: a node can have a miss policy for deadline overruns, a failure policy for exceptions, and fall under the scheduler's watchdog for freeze detection. 
## Miss Policies

When a node has a `budget` or `deadline` set and exceeds it, the miss policy determines what happens next. Set it with the `on_miss` parameter on `Node()`:

```python
# simplified
import horus

node = horus.Node(
    name="detector",
    tick=detect_fn,
    rate=30,
    budget=0.030,     # 30 ms budget
    on_miss="warn",   # What to do when budget is exceeded
)
```

There are four miss policies:

### warn (default)

Log a warning, continue running normally. The node keeps ticking at its configured rate.

```python
# simplified
detector = horus.Node(
    name="detector",
    tick=run_yolo,
    rate=30,
    budget=0.030,
    on_miss="warn",   # Log but keep going
    pubs=["detections"],
    subs=["camera.rgb"],
)
```

**When to use**: Development and testing. Non-critical nodes where timing overruns are informational. Nodes you are actively profiling -- run with `"warn"` first to understand your timing distribution before tightening to `"skip"` or `"stop"`.

### skip

Skip the next tick to let the system recover timing. If a 30 Hz node takes 50 ms instead of 33 ms, it skips the following tick to avoid falling further behind.

```python
# simplified
planner = horus.Node(
    name="planner",
    tick=plan_path,
    rate=10,
    budget=0.080,
    on_miss="skip",   # Skip next tick to recover
    subs=["map", "pose"],
    pubs=["path"],
)
```

**When to use**: High-frequency nodes where occasional dropped ticks are acceptable. Sensor fusion, path planning, and perception pipelines where using slightly stale data for one cycle is better than accumulating latency. If the node is running at 100 Hz and misses one tick, the system gets 99 ticks that second instead of 100 -- much better than all subsequent ticks starting late.

### safe_mode

Trigger the node's safe-state mechanism when the budget or deadline is exceeded.
```python # simplified motor = horus.Node( name="motor_ctrl", tick=drive_motors, rate=100, budget=0.008, on_miss="safe_mode", # Enter safe state on miss subs=["cmd_vel"], pubs=["motor.status"], ) ``` **When to use**: Motor controllers, actuators, and any node where continued operation after a timing violation could be physically dangerous. In practice, use this policy on Rust nodes that implement `enter_safe_state()`, or pair it with a Python-side try/except pattern (see below). ### stop Stop the entire scheduler immediately. All nodes receive their `shutdown()` callback. ```python # simplified safety_monitor = horus.Node( name="safety", tick=check_safety, rate=200, budget=0.003, on_miss="stop", # Kill everything if safety check is late order=0, # Run first every cycle ) ``` **When to use**: Safety-critical nodes where a late result means the safety guarantee is void. If your safety monitor must complete within 3 ms and it takes 5 ms, the system can no longer guarantee safe operation -- stopping is the correct response. Also appropriate as a last resort for nodes where `"safe_mode"` is insufficient. ### Choosing a Miss Policy | Node role | Policy | Reasoning | |-----------|--------|-----------| | Safety monitor | `"stop"` | Late safety check = no safety guarantee | | Motor controller | `"safe_mode"` | Stop motors on timing violation | | Sensor fusion | `"skip"` | One stale cycle is acceptable | | Path planner | `"skip"` | Can use previous path for one cycle | | ML inference | `"warn"` | Timing varies with input; log and profile | | Telemetry | `"warn"` | Non-critical; timing overruns are informational | ## Failure Policies When a node's `tick()` function raises an exception, the failure policy determines the response. Set it with the `failure_policy` parameter on `Node()`: ```python # simplified node = horus.Node( name="sensor", tick=read_sensor, rate=100, failure_policy="restart", ) ``` ### fatal (default) Stop the entire scheduler on the first exception. 
This is the safest default -- an unhandled exception means the system is in an unknown state. ```python # simplified motor = horus.Node( name="motor_ctrl", tick=drive_motors, rate=100, failure_policy="fatal", subs=["cmd_vel"], ) ``` **When to use**: Motor controllers, safety monitors, and any node where a crash indicates a state that is unsafe to continue from. If `drive_motors()` raises an exception, the motor may be in an unknown state -- continuing could mean uncontrolled motion. ### restart Re-initialize the node with exponential backoff. The `init()` callback runs again, then ticking resumes. After the maximum retry count, escalates to a fatal stop. ```python # simplified lidar = horus.Node( name="lidar", tick=read_lidar, init=connect_lidar, rate=10, failure_policy="restart", pubs=["scan"], ) ``` Configure retry behavior with additional keyword arguments: ```python # simplified camera = horus.Node( name="camera", tick=capture_frame, init=open_camera, rate=30, failure_policy="restart", max_retries=5, # Give up after 5 restarts backoff_ms=100, # Start with 100 ms between retries pubs=["camera.rgb"], ) ``` The backoff doubles on each consecutive failure: 100 ms, 200 ms, 400 ms, 800 ms, 1600 ms. After the 5th failure, the scheduler stops. A successful tick resets the counter and backoff. **When to use**: Hardware drivers that may disconnect (USB sensors, serial devices, network cameras). Nodes that depend on external services that may be temporarily unavailable. Any node where re-running `init()` has a reasonable chance of fixing the problem. ### skip Ignore the failed tick and continue. After `max_failures` consecutive failures, suppress the node for a cooldown period, then try again. 
```python # simplified telemetry = horus.Node( name="telemetry", tick=upload_telemetry, rate=1, failure_policy="skip", max_failures=5, # After 5 consecutive failures cooldown_ms=2000, # Suppress for 2 seconds pubs=["telemetry.status"], ) ``` **When to use**: Non-critical nodes whose absence does not affect core operation. Logging, telemetry upload, visualization, diagnostics. The robot should keep running even if telemetry fails for a few seconds. ### ignore Swallow all exceptions completely. The node keeps ticking every cycle regardless of errors. ```python # simplified stats = horus.Node( name="stats", tick=collect_stats, rate=1, failure_policy="ignore", ) ``` **When to use**: Best-effort nodes where partial results are acceptable. Statistics collectors, debug output, optional monitoring. Use sparingly -- silently swallowing errors can mask real problems. ### Choosing a Failure Policy | Node role | Policy | Why | |-----------|--------|-----| | Motor controller | `"fatal"` | Unknown state after crash is unsafe | | Safety monitor | `"fatal"` | Cannot monitor safety if monitor is broken | | LiDAR driver | `"restart"` | USB reconnect often fixes it | | Camera node | `"restart"` | Hardware reset recovers most failures | | Path planner | `"skip"` | System can coast on last known path | | Cloud upload | `"skip"` | Network outages are transient | | Telemetry | `"skip"` or `"ignore"` | Non-critical data collection | | Debug logger | `"ignore"` | Missing log entries are acceptable | ## Watchdog The watchdog detects frozen nodes -- nodes whose `tick()` function never returns. This catches deadlocks, infinite loops, and hardware calls that block indefinitely. ### Global Watchdog Set a global watchdog timeout on the `Scheduler`. 
Every node must complete its tick within this window: ```python # simplified scheduler = horus.Scheduler( tick_rate=100, watchdog_ms=500, # 500 ms global watchdog ) ``` Or pass it through `horus.run()`: ```python # simplified horus.run(sensor, controller, logger, watchdog_ms=500) ``` ### Per-Node Watchdog Override the global timeout for specific nodes: ```python # simplified # Safety-critical node: tight watchdog safety = horus.Node( name="safety", tick=check_safety, rate=200, watchdog=0.050, # 50 ms -- must respond quickly order=0, ) # ML inference node: loose watchdog detector = horus.Node( name="detector", tick=run_yolo, rate=10, watchdog=2.0, # 2 seconds -- inference can be slow order=5, compute=True, ) scheduler = horus.Scheduler(tick_rate=200, watchdog_ms=500) scheduler.add(safety) # Uses its own 50 ms watchdog scheduler.add(detector) # Uses its own 2 s watchdog ``` ### Critical Nodes Mark a node as critical with `add_critical_node()` to enforce a tight watchdog and trigger an emergency stop if it goes unresponsive: ```python # simplified scheduler = horus.Scheduler(tick_rate=1000, watchdog_ms=500) scheduler.add(sensor) scheduler.add(controller) # This node gets a 5 ms watchdog -- emergency stop if it freezes scheduler.add_critical_node("safety_monitor", timeout_ms=5) ``` Critical nodes bypass the graduated degradation ladder. If a critical node exceeds its timeout, the scheduler stops immediately rather than escalating through Warning and Unhealthy states. 
### Graduated Degradation

For non-critical nodes, the watchdog uses a graduated response based on how many timeout multiples have elapsed:

| Elapsed time | Health state | Scheduler response |
|-------------|-------------|-------------------|
| Within timeout | Healthy | Normal operation |
| 1x timeout | Warning | Log warning, keep ticking |
| 2x timeout | Unhealthy | Skip this node's tick |
| 3x timeout | Isolated | Remove from tick loop |

This prevents a single late tick from triggering a drastic response. A 500 ms watchdog means:

- At 500 ms without a tick: warning logged
- At 1000 ms: node skipped (other nodes keep running)
- At 1500 ms: node isolated entirely

Recovery is automatic. If a Warning node completes a tick successfully, it transitions back to Healthy immediately.

### Monitoring Safety Statistics

Inspect watchdog triggers, deadline misses, and node health at runtime:

```python
# simplified
scheduler = horus.Scheduler(tick_rate=100, watchdog_ms=500)
scheduler.add(sensor)
scheduler.add(controller)
scheduler.run(duration=30.0)

# After run completes, check what happened
stats = scheduler.safety_stats()
if stats:
    print(f"Deadline misses: {stats.get('deadline_misses', 0)}")
    print(f"Budget overruns: {stats.get('budget_overruns', 0)}")
    print(f"Watchdog triggers: {stats.get('watchdog_expirations', 0)}")
```

For per-node inspection:

```python
# simplified
node_stats = scheduler.get_node_stats("controller")
print(f"Total ticks: {node_stats['total_ticks']}")
print(f"Failed ticks: {node_stats['failed_ticks']}")
print(f"Avg duration: {node_stats.get('avg_tick_duration_ms', 0):.2f} ms")
print(f"Max duration: {node_stats.get('max_tick_duration_ms', 0):.2f} ms")
```

### Max Deadline Misses

Set an emergency stop threshold -- after N consecutive deadline misses across the system, the scheduler stops:

```python
# simplified
scheduler = horus.Scheduler(
    tick_rate=100,
    watchdog_ms=500,
    max_deadline_misses=50,   # Emergency stop after 50 consecutive misses
)
```

Or via
`horus.run()`: ```python # simplified horus.run(sensor, controller, max_deadline_misses=50, watchdog_ms=500) ``` This is a system-wide backstop. Individual nodes handle their own misses via `on_miss`, but if the entire system is consistently falling behind, `max_deadline_misses` triggers a clean shutdown before the situation degrades further. ## Python Safety Limitations Python nodes have a meaningful gap compared to their Rust counterparts: there is no Python-side equivalent of `is_safe_state()` or `enter_safe_state()`. In Rust, you implement these methods on your Node trait: ```python # simplified # This is what Rust can do — Python CANNOT: # # impl Node for MotorController { # fn enter_safe_state(&mut self) { # self.velocity = 0.0; # self.disable_motor(); # } # # fn is_safe_state(&self) -> bool { # self.velocity == 0.0 # } # } ``` When `on_miss="safe_mode"` fires on a Python node, the scheduler invokes the default Rust-side safe-state mechanism, but Python cannot define what "entering safe state" means for that specific node. The node cannot report back that it has reached a safe state. This is an intentional design constraint. The safe-state mechanism requires lock-free, deterministic execution that Python's GIL and garbage collector cannot guarantee. A Python `enter_safe_state()` that triggers a GC pause defeats the purpose. ### What Still Works - `on_miss="warn"`, `"skip"`, and `"stop"` all work identically in Python and Rust - All failure policies (`"fatal"`, `"restart"`, `"skip"`, `"ignore"`) work identically - Watchdog monitoring, graduated degradation, and `add_critical_node()` all work identically - `safety_stats()` reports the same data regardless of node language The limitation is narrow: only `on_miss="safe_mode"` has reduced functionality in Python. ## Workaround Patterns for Safety in Python ### Pattern 1: Safety Logic in tick() Handle safety directly in your tick function with try/except. 
This gives you explicit control over what "safe" means: ```python # simplified import horus class MotorState: def __init__(self): self.velocity = 0.0 self.safe = False def tick(self, node): try: if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") self.velocity = cmd["linear"] # Check timing budget remaining = horus.budget_remaining() if remaining < 0.001: # Less than 1 ms left self.enter_safe_state(node) return node.send("motor.cmd", {"velocity": self.velocity}) except Exception as e: node.log_error(f"Motor error: {e}") self.enter_safe_state(node) def enter_safe_state(self, node): self.velocity = 0.0 self.safe = True node.send("motor.cmd", {"velocity": 0.0}) node.log_warning("Entered safe state — motors stopped") state = MotorState() motor = horus.Node( name="motor", tick=state.tick, rate=100, budget=0.008, on_miss="warn", # Log the overrun; safety handled in tick() subs=["cmd_vel"], pubs=["motor.cmd"], ) horus.run(motor) ``` This pattern gives you full control but places the safety burden on your code. The scheduler's `on_miss` still fires for monitoring, but the actual safe-state transition is managed in Python. 
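Because this pattern puts the safety burden on your code, the safe-state path deserves a test of its own. The following is a minimal sketch, not part of HORUS: `FakeNode` is a hypothetical stand-in that implements only the node methods used above, and the budget check is injected as a callable (an assumption made here so a test can control timing instead of calling `horus.budget_remaining()`).

```python
class FakeNode:
    """Hypothetical test double: records what a tick function sends and logs."""

    def __init__(self, inbox=None):
        self.inbox = dict(inbox or {})
        self.sent = []
        self.logs = []

    def has_msg(self, topic):
        return topic in self.inbox

    def recv(self, topic):
        return self.inbox.pop(topic)

    def send(self, topic, msg):
        self.sent.append((topic, msg))

    def log_error(self, text):
        self.logs.append(("error", text))

    def log_warning(self, text):
        self.logs.append(("warning", text))


class MotorState:
    """Same logic as Pattern 1, with the budget check injected for testability."""

    def __init__(self, budget_remaining):
        self.velocity = 0.0
        self.safe = False
        self._budget_remaining = budget_remaining  # callable returning seconds left

    def tick(self, node):
        try:
            if node.has_msg("cmd_vel"):
                self.velocity = node.recv("cmd_vel")["linear"]
            if self._budget_remaining() < 0.001:  # less than 1 ms left
                self.enter_safe_state(node)
                return
            node.send("motor.cmd", {"velocity": self.velocity})
        except Exception as exc:
            node.log_error(f"Motor error: {exc}")
            self.enter_safe_state(node)

    def enter_safe_state(self, node):
        self.velocity = 0.0
        self.safe = True
        node.send("motor.cmd", {"velocity": 0.0})
        node.log_warning("Entered safe state; motors stopped")


# A malformed command (no "linear" key) should end in the safe state.
node = FakeNode(inbox={"cmd_vel": {}})
state = MotorState(budget_remaining=lambda: 0.005)
state.tick(node)
print(state.safe, node.sent[-1])
```

Injecting the budget callable keeps the tick logic identical while letting a test force the "budget nearly exhausted" branch deterministically.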
### Pattern 2: Dedicated Safety Node Run a separate node whose only job is monitoring other nodes and triggering emergency stops: ```python # simplified import horus def safety_tick(node): """Check system health every tick""" # Check for stale motor commands if node.has_msg("motor.status"): status = node.recv("motor.status") age_ms = (horus.timestamp_ns() - status.get("timestamp_ns", 0)) / 1e6 if age_ms > 100: node.log_error(f"Motor status stale: {age_ms:.0f} ms") node.send("emergency.stop", {"reason": "stale_motor_data"}) node.request_stop() return # Check sensor health if not node.has_msg("sensor.heartbeat"): node.send("motor.override", {"velocity": 0.0}) node.log_warning("Sensor heartbeat missing — motors zeroed") safety = horus.Node( name="safety_monitor", tick=safety_tick, rate=200, budget=0.003, on_miss="stop", # If safety monitor is late, stop everything failure_policy="fatal", # If safety monitor crashes, stop everything order=0, # Run before all other nodes subs=["motor.status", "sensor.heartbeat"], pubs=["emergency.stop", "motor.override"], ) ``` This node runs at high frequency, checks system invariants, and calls `node.request_stop()` or publishes override commands when something is wrong. Use `on_miss="stop"` on the safety node itself -- if the monitor cannot keep up, the system cannot guarantee safety. 
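One gap in the example above is worth noting: the age check only runs on ticks where a `motor.status` message is present, so a publisher that goes completely silent is never flagged as stale. A minimal, HORUS-independent sketch of one way to close that gap is to remember when the last message was seen and evaluate its age on every tick. `StalenessMonitor` is a hypothetical helper, and it measures local receipt time with a monotonic clock rather than the message's own `timestamp_ns` field (an assumption that sidesteps clock-domain differences).

```python
import time


class StalenessMonitor:
    """Tracks how long it has been since the last message was observed."""

    def __init__(self, limit_ms, now_ns=time.monotonic_ns):
        self.limit_ms = limit_ms
        self._now_ns = now_ns
        # Start the clock at construction so silence from startup is also caught.
        self._last_seen_ns = now_ns()

    def observe(self):
        """Call whenever a message arrives."""
        self._last_seen_ns = self._now_ns()

    def age_ms(self):
        return (self._now_ns() - self._last_seen_ns) / 1e6

    def is_stale(self):
        return self.age_ms() > self.limit_ms


# Usage inside a tick function (node methods as in the example above):
#
#   if node.has_msg("motor.status"):
#       node.recv("motor.status")
#       motor_status.observe()
#   if motor_status.is_stale():
#       node.send("emergency.stop", {"reason": "stale_motor_data"})
#       node.request_stop()
motor_status = StalenessMonitor(limit_ms=100)
print(motor_status.is_stale())
```

The key difference is that `is_stale()` is evaluated unconditionally, so a missing message counts as evidence rather than being skipped.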
### Pattern 3: Mixed Python and Rust For genuinely safety-critical systems, write the safety-critical node in Rust (with proper `enter_safe_state()` and `is_safe_state()`) and keep Python nodes for perception, planning, and telemetry: ```python # simplified # Python: ML inference node (non-safety-critical) import horus def detect(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") detections = run_model(img) node.send("detections", detections) detector = horus.Node( name="detector", tick=detect, rate=30, compute=True, failure_policy="skip", # Non-critical — skip on failure on_miss="warn", # Inference time varies subs=["camera.rgb"], pubs=["detections"], ) horus.run(detector) ``` Meanwhile, a Rust process runs the motor controller with full safe-state support: ```bash # Run both in the same session — they share topics automatically horus run safety_controller detector.py ``` This architecture plays to each language's strengths: Rust for deterministic, safety-critical control; Python for ML inference and high-level logic. Both communicate through the same shared-memory topics. ## GIL and Garbage Collection Gotchas Python's Global Interpreter Lock (GIL) and garbage collector create timing challenges that do not exist in Rust. Understanding these is essential for setting realistic budgets and interpreting deadline misses. ### GC Pauses Cause False Deadline Misses Python's garbage collector runs periodically and can pause your tick function for 1--10 ms, depending on the number of live objects. A node with a 5 ms budget may miss its deadline not because the tick logic is slow, but because the GC ran mid-tick. 
```python
# simplified
import gc

import horus

def ml_tick(node):
    # Pause the cyclic collector during time-critical work
    # (reference counting still runs)
    gc.disable()
    try:
        if node.has_msg("input"):
            data = node.recv("input")
            result = run_inference(data)  # Time-critical; user-supplied function
            node.send("output", result)
    finally:
        gc.enable()

ml_node = horus.Node(
    name="inference",
    tick=ml_tick,
    rate=30,
    budget=0.030,
    on_miss="warn",
    subs=["input"],
    pubs=["output"],
)
```

### GIL Contention with Multiple Nodes

When multiple Python nodes run in the same process, they share the GIL. Only one node's tick function executes Python bytecode at a time. This means:

- Two 100 Hz Python nodes can both sustain 100 Hz in the same process only if their combined pure-Python tick time fits within the 10 ms period
- Nodes using C extensions that release the GIL (NumPy, PyTorch, OpenCV) do not block other nodes during the C call
- Use `compute=True` on nodes that call GIL-releasing C extensions to run them on the thread pool

```python
# simplified
# This node releases the GIL during PyTorch inference
detector = horus.Node(
    name="detector",
    tick=run_pytorch_inference,
    rate=30,
    compute=True,  # Runs on thread pool; GIL released during inference
    budget=0.030,
    on_miss="warn",
)

# This node holds the GIL for pure Python work
logger = horus.Node(
    name="logger",
    tick=log_data,
    rate=10,
    # No compute=True: runs on main thread
)
```

### Setting Realistic Budgets

Python tick functions are orders of magnitude slower than Rust. Set budgets accordingly:

| Operation | Typical Python time | Suggested budget |
|-----------|-------------------|-----------------|
| Simple dict processing | 0.1--0.5 ms | 2 ms |
| NumPy array operations | 0.5--5 ms | 10 ms |
| ML inference (ONNX) | 5--50 ms | 80 ms |
| ML inference (PyTorch) | 10--100 ms | 150 ms |
| HTTP request (async) | 50--500 ms | Use async node, not budget |

A budget of 800 microseconds makes sense for a Rust motor controller. The same budget on a Python node would trigger `on_miss` on nearly every tick, flooding your logs with false alarms.
Start with generous budgets during development (use `on_miss="warn"`) and tighten after profiling real-world performance. ## Complete Example: Safety-Critical Robot A warehouse robot with Python nodes for perception and planning, using all three safety systems together: ```python # simplified import horus import gc # --- Sensor node: reads LiDAR data --- def lidar_tick(node): if node.has_msg("scan.raw"): scan = node.recv("scan.raw") # Filter and validate scan data if scan and len(scan.get("ranges", [])) > 0: node.send("scan.filtered", scan) else: node.log_warning("Invalid scan data — skipping") lidar = horus.Node( name="lidar_filter", tick=lidar_tick, rate=30, budget=0.010, # 10 ms budget on_miss="skip", # Skip one cycle if filtering is slow failure_policy="restart", # Restart on crash (sensor reconnect) max_retries=5, backoff_ms=200, order=1, subs=["scan.raw"], pubs=["scan.filtered"], ) # --- Safety monitor: checks system invariants --- class SafetyState: def __init__(self): self.missed_heartbeats = 0 self.max_missed = 10 def tick(self, node): # Check motor heartbeat if node.has_msg("motor.heartbeat"): node.recv("motor.heartbeat") self.missed_heartbeats = 0 else: self.missed_heartbeats += 1 if self.missed_heartbeats >= self.max_missed: node.log_error( f"Motor heartbeat lost for {self.missed_heartbeats} cycles" ) node.send("emergency.stop", {"reason": "motor_heartbeat_lost"}) node.request_stop() return # Check scan freshness if node.has_msg("scan.age"): age = node.recv("scan.age") if age > 0.5: # Scan older than 500 ms node.log_warning(f"Scan data stale: {age*1000:.0f} ms") node.send("motor.override", {"velocity": 0.0, "angular": 0.0}) safety_state = SafetyState() safety = horus.Node( name="safety_monitor", tick=safety_state.tick, rate=200, budget=0.003, # 3 ms — must be fast on_miss="stop", # If safety is late, stop everything failure_policy="fatal", # If safety crashes, stop everything order=0, # Always runs first subs=["motor.heartbeat", "scan.age"], 
pubs=["emergency.stop", "motor.override"], ) # --- ML detector: runs YOLO on camera images --- def detect_tick(node): gc.disable() try: if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") detections = run_yolo(img) node.send("detections", detections) finally: gc.enable() detector = horus.Node( name="detector", tick=detect_tick, rate=10, budget=0.080, # 80 ms — ML inference is slow on_miss="warn", # Inference time varies; just log failure_policy="skip", # Skip on crash; not safety-critical max_failures=3, cooldown_ms=5000, compute=True, # Thread pool (releases GIL during inference) order=5, subs=["camera.rgb"], pubs=["detections"], ) # --- Planner: computes paths from detections and scans --- def plan_tick(node): if node.has_msg("scan.filtered") and node.has_msg("detections"): scan = node.recv("scan.filtered") dets = node.recv("detections") path = compute_path(scan, dets) node.send("cmd_vel", path) planner = horus.Node( name="planner", tick=plan_tick, rate=10, budget=0.050, # 50 ms budget on_miss="skip", # Skip if planning takes too long failure_policy="restart", # Restart on crash max_retries=3, backoff_ms=100, order=10, subs=["scan.filtered", "detections"], pubs=["cmd_vel"], ) # --- Telemetry: uploads metrics to cloud --- async def telemetry_tick(node): import aiohttp try: stats = { "tick": horus.tick(), "elapsed": horus.elapsed(), } async with aiohttp.ClientSession() as session: await session.post( "http://telemetry.local/api/metrics", json=stats, timeout=aiohttp.ClientTimeout(total=2.0), ) except Exception: pass # Best-effort; failure_policy handles the rest telemetry = horus.Node( name="telemetry", tick=telemetry_tick, # async — auto-detected rate=1, failure_policy="ignore", # Never bring down the system for telemetry order=200, ) # --- Assemble and run --- scheduler = horus.Scheduler( tick_rate=200, watchdog_ms=500, # Detect frozen nodes rt=True, # Request RT scheduling ) scheduler.add(safety) # order=0, runs first scheduler.add(lidar) # order=1 
scheduler.add(detector) # order=5, compute pool scheduler.add(planner) # order=10 scheduler.add(telemetry) # order=200, async # Mark safety monitor as critical — emergency stop on freeze scheduler.add_critical_node("safety_monitor", timeout_ms=5) scheduler.run() # Post-run diagnostics stats = scheduler.safety_stats() if stats: print(f"\n--- Safety Report ---") print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Budget overruns: {stats.get('budget_overruns', 0)}") print(f"Watchdog triggers: {stats.get('watchdog_expirations', 0)}") for name in scheduler.get_node_names(): ns = scheduler.get_node_stats(name) print(f" {name}: {ns['total_ticks']} ticks, " f"{ns['failed_ticks']} failed, " f"avg={ns.get('avg_tick_duration_ms', 0):.2f} ms") ``` This example shows all three safety systems working together: - **Safety monitor** uses `on_miss="stop"` and `failure_policy="fatal"` -- if the safety node itself is compromised, stop everything - **LiDAR filter** uses `on_miss="skip"` and `failure_policy="restart"` -- skip slow ticks, restart on crashes - **ML detector** uses `on_miss="warn"` and `failure_policy="skip"` with `compute=True` -- non-critical, variable timing - **Planner** uses `on_miss="skip"` and `failure_policy="restart"` -- skip slow ticks, restart on bad data - **Telemetry** uses `failure_policy="ignore"` -- best-effort, never brings down the system - **Global watchdog** at 500 ms catches any frozen node - **Critical node** designation on the safety monitor bypasses graduated degradation ## Design Decisions **Why are miss policies strings instead of an enum?** Python does not enforce enum types at the boundary between Python and Rust (PyO3). Using strings (`"warn"`, `"skip"`, `"safe_mode"`, `"stop"`) avoids requiring an import for a four-value enum. The strings are validated at node construction time -- a typo like `on_miss="wrn"` raises an error immediately, not at runtime. 
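As an illustration of construction-time validation (a sketch of the idea, not HORUS's actual implementation), a constructor can check the policy string against the four documented values and fail immediately with a hint. `validate_miss_policy` and `VALID_MISS_POLICIES` are hypothetical names introduced here.

```python
import difflib

# The four miss-policy strings listed above.
VALID_MISS_POLICIES = ("warn", "skip", "safe_mode", "stop")


def validate_miss_policy(value):
    """Return the policy unchanged, or raise ValueError with a suggestion."""
    if isinstance(value, str) and value in VALID_MISS_POLICIES:
        return value
    suggestion = ""
    if isinstance(value, str):
        # Offer the closest valid spelling, if there is a plausible one.
        close = difflib.get_close_matches(value, VALID_MISS_POLICIES, n=1)
        if close:
            suggestion = f" Did you mean {close[0]!r}?"
    raise ValueError(
        f"invalid on_miss value {value!r}; "
        f"expected one of {VALID_MISS_POLICIES}.{suggestion}"
    )


try:
    validate_miss_policy("wrn")
except ValueError as err:
    print(err)
```

Failing in the constructor means a typo surfaces when the node object is created, long before the first deadline miss would have exercised the policy.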
**Why is the default miss policy `"warn"` and not `"skip"` or `"stop"`?** Most deadline misses during development are caused by untuned budgets, not real problems. Defaulting to `"warn"` means a new user who sets `budget=0.001` on a Python node sees warnings in the logs rather than nodes silently skipping ticks or the scheduler stopping. Once budgets are tuned, the developer switches to `"skip"` or `"stop"` deliberately. **Why is the default failure policy `"fatal"` and not `"restart"`?** An unhandled exception in a robotics node often means hardware is in an unknown state. Restarting by default could re-initialize hardware mid-operation (e.g., re-homing a motor while the robot is moving). `"fatal"` forces the developer to make an explicit decision about which nodes can safely restart. **Why can't Python nodes implement `enter_safe_state()`?** The safe-state mechanism must execute deterministically within microseconds. Python's GIL and garbage collector cannot provide this guarantee. A Python `enter_safe_state()` that triggers a 5 ms GC pause while the robot needs to stop its motors within 1 ms is worse than no safe-state callback at all. The workaround patterns (try/except in `tick()`, dedicated safety node, mixed Python/Rust) provide equivalent functionality with honest timing characteristics. **Why graduated watchdog degradation instead of immediate kill?** A single late tick in Python is often caused by a GC pause or GIL contention -- not a deadlock. Immediately killing the node would cause false positives in Python-heavy systems. Graduated degradation (warn at 1x, skip at 2x, isolate at 3x) gives transient pauses time to resolve while still catching genuinely frozen nodes. 
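The thresholds above can be sketched as a small pure function. This is only an illustration of the table in the Graduated Degradation section; `Health` and `health_state` are hypothetical names, and the real watchdog logic lives inside the scheduler.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    UNHEALTHY = "Unhealthy"
    ISOLATED = "Isolated"


def health_state(elapsed_ms, timeout_ms):
    """Map time since the last completed tick to a graduated health state."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    if elapsed_ms >= 3 * timeout_ms:
        return Health.ISOLATED   # removed from the tick loop
    if elapsed_ms >= 2 * timeout_ms:
        return Health.UNHEALTHY  # this node's tick is skipped
    if elapsed_ms >= timeout_ms:
        return Health.WARNING    # warning logged, node keeps ticking
    return Health.HEALTHY


# With a 500 ms watchdog, as in the earlier example:
for elapsed in (100, 500, 1000, 1500):
    print(elapsed, health_state(elapsed, timeout_ms=500).value)
```

A transient 600 ms GC or GIL stall therefore costs only a logged warning, while a node that stays silent for 1500 ms is isolated.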
## Trade-offs | Gain | Cost | |------|------| | Per-node miss policies match safety requirements to node criticality | Must configure each node individually | | Per-node failure policies prevent cascading crashes | Must reason about failure contracts per node | | Graduated watchdog tolerates GC pauses | A genuinely frozen node takes 3x timeout to isolate | | String-based policy configuration requires no imports | Typos caught at construction, not at load time | | `"fatal"` default prevents unsafe automatic restarts | Requires explicit opt-in to restart on every recoverable node | | Python-side safety workarounds give explicit control | No automatic safe-state integration with the scheduler | | `add_critical_node()` bypasses graduated degradation for safety nodes | Critical nodes get no grace period for transient issues | | GC-disable pattern prevents pause-induced misses | Increases memory pressure; must re-enable GC promptly | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Every tick triggers `on_miss` | Budget too tight for Python | Increase `budget` -- Python ticks take milliseconds, not microseconds | | Node silently stops ticking | `failure_policy="skip"` with low `max_failures` | Increase `max_failures` or fix the underlying exception | | Scheduler stops on network timeout | `failure_policy="fatal"` on a network-dependent node | Use `"restart"` or `"skip"` for nodes with external dependencies | | `on_miss="safe_mode"` has no visible effect | Python nodes cannot implement `enter_safe_state()` | Use try/except in `tick()` or a dedicated safety node | | Watchdog triggers during startup | Node's `init()` takes longer than watchdog timeout | Increase `watchdog_ms` or make `init()` faster | | False deadline misses in bursts | Python GC pause during tick | Disable GC during tight-budget ticks with `gc.disable()` / `gc.enable()` | | Two Python nodes cannot both sustain 100 Hz | GIL contention in same process | Use `compute=True` on nodes that 
call GIL-releasing C extensions, or run nodes in separate processes via `horus run` | | `add_critical_node` raises an error | Node not yet added to the scheduler | Call `scheduler.add(node)` before `scheduler.add_critical_node()` | ## See Also - [Python Bindings](/python/api/python-bindings) -- Core Python API reference - [Async Nodes](/python/api/async-nodes) -- async/await patterns for I/O-bound nodes - [Safety Monitor](/advanced/safety-monitor) -- Detailed watchdog and graduated degradation reference - [Fault Tolerance](/advanced/circuit-breaker) -- Failure policy deep dive with severity-aware handling - [Scheduler Concepts](/concepts/core-concepts-scheduler) -- How the scheduler manages node execution --- ## Standard Messages Path: /python/api/messages Description: 75+ standard robotics message types for Python — overview, decision table, and category navigation # Standard Messages HORUS provides 75+ typed message types covering every common robotics domain. All are importable from `horus` and binary-compatible with Rust for cross-language topics. ```python # simplified from horus import CmdVel, Imu, LaserScan, Pose2D, Odometry ``` --- ## Which Message Do I Need? | I need to... 
| Use | Category | |-------------|-----|----------| | Send velocity commands to motors | `CmdVel`, `Twist` | [Control](/python/api/control-messages) | | Read IMU data (accel + gyro) | `Imu` | [Sensor](/python/api/sensor-messages) | | Read LiDAR scans | `LaserScan` | [Sensor](/python/api/sensor-messages) | | Publish robot position | `Odometry`, `Pose2D`, `Pose3D` | [Sensor](/python/api/sensor-messages), [Geometry](/python/api/geometry-messages) | | Send camera images | `Image` | [Image API](/python/api/image) | | Send point clouds | `PointCloud` | [PointCloud API](/python/api/pointcloud) | | Report ML detections | `Detection`, `Detection3D` | [Perception](/python/api/perception-messages) | | Send navigation goals | `NavGoal`, `NavPath` | [Navigation](/python/api/navigation-messages) | | Report system health | `DiagnosticReport`, `NodeHeartbeat` | [Diagnostics](/python/api/diagnostics-messages) | | Send force/torque data | `WrenchStamped`, `ForceCommand` | [Force](/python/api/force-messages) | | Send joystick/keyboard input | `JoystickInput`, `KeyboardInput` | [Input](/python/api/input-messages) | | Send audio data | `AudioFrame` | [ML](/python/api/ml-messages) | | Send dynamic/untyped data | Python dicts (GenericMessage, 4KB) | [Topic API](/python/api/topic#genericmessage-string-topics) | --- ## Message Categories | Category | Types | Page | |----------|-------|------| | **Sensor** | Imu, LaserScan, Odometry, NavSatFix, BatteryState, RangeSensor, Temperature, MagneticField | [Sensor Messages](/python/api/sensor-messages) | | **Control** | CmdVel, MotorCommand, ServoCommand, PidConfig, DifferentialDriveCommand, JointCommand | [Control Messages](/python/api/control-messages) | | **Geometry** | Pose2D, Pose3D, Twist, Vector3, Point3, Quaternion, TransformStamped, Accel | [Geometry Messages](/python/api/geometry-messages) | | **Navigation** | NavGoal, NavPath, Waypoint, OccupancyGrid, CostMap, VelocityObstacle | [Navigation Messages](/python/api/navigation-messages) | | 
**Perception** | Detection, Detection3D, BoundingBox2D/3D, SegmentationMask, TrackedObject, Landmark | [Perception Messages](/python/api/perception-messages) | | **Diagnostics** | Heartbeat, DiagnosticReport, EmergencyStop, SafetyStatus, NodeHeartbeat | [Diagnostics Messages](/python/api/diagnostics-messages) | | **Vision** | CompressedImage, CameraInfo, StereoInfo, RegionOfInterest | [Vision Messages](/python/api/vision-messages) | | **Force & Haptics** | WrenchStamped, ForceCommand, ContactInfo, HapticFeedback, TactileArray | [Force Messages](/python/api/force-messages) | | **Input** | JoystickInput, KeyboardInput | [Input Messages](/python/api/input-messages) | | **Clock** | Clock, TimeReference | [Clock Messages](/python/api/clock-messages) | | **ML** | AudioFrame | [ML Messages](/python/api/ml-messages) | | **Tensor** | Tensor descriptors | [Tensor Messages](/python/api/tensor-messages) | --- ## Import All standard messages are available from the top-level `horus` module: ```python # simplified import horus # Direct attribute access cmd = horus.CmdVel(linear=1.0, angular=0.5) imu = horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81) # Or import specific types from horus import CmdVel, Imu, LaserScan, Pose2D, Odometry ``` --- ## Custom Messages Need a type that doesn't exist? Use Python dicts for prototyping: ```python # simplified node.send("motor_status", { "rpm": 1500.0, "current_amps": 2.3, "temperature_c": 45.0, }) ``` For production cross-language use, define message types in Rust with `message!` and they become available in Python automatically. See [Custom Messages](/python/api/custom-messages). --- ## Typed vs Generic Performance | Transport | Latency | Cross-Language | |-----------|---------|:-:| | Typed messages (`horus.CmdVel`) | ~1.7μs | Yes | | Python dicts (GenericMessage) | ~6-50μs | No | | `Image` zero-copy (DLPack) | ~1.1μs | Yes | **Rule**: Use typed messages for control loops and cross-language topics. Use dicts for Python-only prototyping. 
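To make the typed-versus-dict distinction concrete, the sketch below contrasts a fixed-layout record with a self-describing dict payload. Everything here is illustrative: the two-float layout is a hypothetical stand-in for a typed message (not HORUS's actual `CmdVel` layout), `json` stands in for the encoding used by dict topics purely to keep the example dependency-free, and the 4KB figure from the table above is treated as a size cap (an assumption).

```python
import json
import struct

# Hypothetical fixed layout: two little-endian float32 fields (linear, angular).
CMD_LAYOUT = struct.Struct("<ff")
GENERIC_LIMIT_BYTES = 4 * 1024  # 4KB GenericMessage figure, assumed to be a cap


def encode_typed(linear, angular):
    """Fixed-size encoding: the size is known before any data is seen."""
    return CMD_LAYOUT.pack(linear, angular)


def encode_generic(payload):
    """Self-describing encoding: field names travel with every message."""
    data = json.dumps(payload).encode("utf-8")
    if len(data) > GENERIC_LIMIT_BYTES:
        raise ValueError(f"payload is {len(data)} bytes, over the 4KB limit")
    return data


typed = encode_typed(1.0, 0.5)
generic = encode_generic({"linear": 1.0, "angular": 0.5})
print(len(typed), len(generic))
```

The fixed layout needs no parsing on the receiving side, which is the property that makes typed messages suitable for control loops; the dict form trades that for flexibility.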
--- ## See Also - [Custom Messages](/python/api/custom-messages) — Runtime and compiled custom types - [Topic API](/python/api/topic) — Typed vs generic topics, GenericMessage constraints - [Rust Standard Messages](/rust/api/messages) — Rust equivalent - [Message Types Concept](/concepts/message-types) — How messages work in HORUS --- ## Multi-Process Architecture (Python) Path: /python/multi-process Description: Run HORUS nodes across separate Python processes — SHM auto-discovery, mixed-language systems, fault isolation # Multi-Process Architecture (Python) HORUS topics work transparently across process boundaries. Two Python processes that use the same topic name connect to the same shared memory region automatically. No broker, no configuration file, no registration step. ```python # simplified # process_a.py — publisher import horus def publish_tick(node): node.send("sensor.temp", {"celsius": 22.5, "location": "motor_1"}) node = horus.Node("temp_pub", pubs=["sensor.temp"], tick=publish_tick, rate=10) horus.run(node) ``` ```python # simplified # process_b.py — subscriber (separate terminal) import horus def monitor_tick(node): if node.has_msg("sensor.temp"): data = node.recv("sensor.temp") print(f"Temperature: {data['celsius']}C at {data['location']}") node = horus.Node("temp_mon", subs=["sensor.temp"], tick=monitor_tick, rate=10) horus.run(node) ``` Run each in its own terminal. They discover each other through shared memory -- no coordination needed. --- ## How Auto-Discovery Works When you create a topic (via `node.send()`, `node.recv()`, or `horus.Topic()`), HORUS creates or opens a shared memory region keyed by the topic name. Any process on the same machine that uses the same topic name connects to the same underlying ring buffer. 
``` Process A Process B ┌──────────────┐ ┌──────────────┐ │ Node("pub") │ │ Node("sub") │ │ │ │ │ │ send("imu")──┼──┐ ┌───┼──recv("imu") │ └──────────────┘ │ │ └──────────────┘ ▼ ▲ ┌─────────────────────┐ │ Shared Memory │ │ Ring Buffer: "imu" │ │ (kernel-managed) │ └─────────────────────┘ ``` There is no discovery protocol, no handshake, and no central broker. The shared memory namespace is the discovery mechanism. Processes can start in any order -- a subscriber that starts before its publisher simply sees no messages until the publisher connects. --- ## Running Multiple Python Processes ### Separate Terminals The simplest approach. Run each node file in its own terminal: ```bash # Terminal 1 horus run sensor.py # Terminal 2 horus run controller.py # Terminal 3 horus run logger.py ``` ### Using `horus run` with Multiple Files ```bash # Launches both as separate processes, manages their lifecycle horus run sensor.py controller.py # Ctrl+C sends SIGTERM to all processes ``` ### Using `horus launch` (Production) Declare your multi-process layout in a YAML launch file: ```yaml # launch.yaml nodes: - name: sensor cmd: horus run sensor.py - name: controller cmd: horus run controller.py - name: logger cmd: horus run logger.py ``` ```bash horus launch launch.yaml ``` ### Using subprocess (Programmatic) ```python # simplified import subprocess # Start a companion process from within Python proc = subprocess.Popen(["horus", "run", "sensor.py"]) # Your main process continues import horus def controller_tick(node): if node.has_msg("sensor.data"): data = node.recv("sensor.data") node.send("cmd_vel", horus.CmdVel(linear=data["speed"], angular=0.0)) node = horus.Node("controller", subs=["sensor.data"], pubs=[horus.CmdVel], tick=controller_tick, rate=50) horus.run(node) ``` ### Using systemd (Deployment) For production deployments, run each process as a systemd service: ```ini # /etc/systemd/system/horus-sensor.service [Unit] Description=HORUS Sensor Node After=network.target 
[Service] ExecStart=/usr/local/bin/horus run /opt/robot/sensor.py Restart=always RestartSec=1 Environment=HORUS_NAMESPACE=production [Install] WantedBy=multi-user.target ``` ```bash sudo systemctl start horus-sensor sudo systemctl start horus-controller ``` Set the same `HORUS_NAMESPACE` across all services so they share the same SHM namespace. --- ## Topic Sharing Across Process Boundaries Topics in HORUS are identified by name. Any process that uses the same topic name and the same SHM namespace connects to the same ring buffer. This applies to both dict topics and typed topics. ### Dict Topics (GenericMessage) ```python # simplified # Process A node.send("status", {"battery": 85.0, "mode": "autonomous"}) # Process B (separate process, same topic name) if node.has_msg("status"): data = node.recv("status") # {"battery": 85.0, "mode": "autonomous"} ``` Dict topics serialize via MessagePack. Both processes see the same data with no configuration. ### Typed Topics ```python # simplified # Process A node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.5)) # Process B if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") # CmdVel object, zero-copy print(cmd.linear) # 1.0 ``` Typed topics use zero-copy shared memory. Both processes must use the same message type for the same topic name. A type mismatch raises an error at connection time. ### Pool-Backed Types (Image, PointCloud, Tensor) Large data transfers use pool-backed shared memory. Only a small descriptor (64-168 bytes) travels through the ring buffer; the actual data stays in the shared memory pool. 
```python # simplified # Process A — camera capture import numpy as np def camera_tick(node): pixels = capture_frame() # returns numpy array img = horus.Image.from_numpy(pixels) # copies once into SHM pool node.send("camera.rgb", img) # sends 64B descriptor, not 1.4MB # Process B — ML inference def detect_tick(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") frame = img.to_numpy() # zero-copy view into SHM pool # Run inference on frame... ``` A 1080p RGB image (1920x1080x3 = ~6MB) transfers in microseconds because only the descriptor crosses the ring buffer. --- ## Namespaces By default, each terminal session gets its own SHM namespace (derived from session ID and user ID). This prevents accidental cross-talk between unrelated projects. To share topics across separate terminals, set the same namespace: ```bash # Terminal 1 HORUS_NAMESPACE=myrobot horus run sensor.py # Terminal 2 HORUS_NAMESPACE=myrobot horus run controller.py ``` `horus launch` automatically sets a shared namespace for all processes in the launch file. | Scenario | Namespace behavior | |----------|--------------------| | Same terminal | Auto-shared (same session ID) | | `horus run a.py b.py` | Auto-shared (same invocation) | | `horus launch` | Auto-shared (launch sets namespace) | | Separate terminals | Separate by default. Set `HORUS_NAMESPACE` to share | | systemd services | Must set `HORUS_NAMESPACE` explicitly | --- ## Mixed-Language Systems The most common multi-process pattern pairs a Python process with processes written in other languages. The shared memory transport is language-agnostic -- any process that uses HORUS topics connects to the same ring buffers. ### Example: Motor Controller + ML Inference A compiled motor controller process runs the real-time control loop at 1kHz. A Python process runs ML inference at 30Hz. Both communicate through shared memory. 
```python # simplified # ml_detector.py — Python ML inference process import horus import numpy as np def detect_tick(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") frame = img.to_numpy() # zero-copy from SHM # Run your ML model detections = run_yolo(frame) # Publish results for the controller if detections: closest = min(detections, key=lambda d: d["distance"]) node.send("obstacle", { "distance": closest["distance"], "angle": closest["angle"], "class": closest["label"], }) detector = horus.Node( name="yolo", subs=[horus.Image], pubs=["obstacle"], tick=detect_tick, rate=30, ) horus.run(detector) ``` ```bash # Terminal 1 — compiled motor controller (real-time, 1kHz) horus run motor_controller # Terminal 2 — Python ML inference (best-effort, 30Hz) horus run ml_detector.py ``` The Image flows from the compiled process through shared memory. The Python process gets a zero-copy view -- no serialization, no copying megabytes of pixel data. ### Binary Compatibility Typed messages are binary-compatible across languages. A `CmdVel` published by a compiled process is received as a `horus.CmdVel` in Python with identical field values and memory layout. This works because all typed messages use the same Pod (Plain Old Data) layout in shared memory. ```python # simplified # Python receives typed messages from any language cmd = node.recv("cmd_vel") # horus.CmdVel — same binary layout imu = node.recv("imu") # horus.Imu — same binary layout scan = node.recv("scan") # horus.LaserScan — same binary layout ``` No protobuf, no JSON, no serialization step. The message bytes in shared memory are read directly. 
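The idea of reading message bytes directly from a named region can be demonstrated with the Python standard library alone. This is a sketch of the general mechanism, not of HORUS internals: it uses `multiprocessing.shared_memory` rather than HORUS's ring buffer, and the two-float layout is a hypothetical Pod layout chosen for illustration.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical Pod layout: linear and angular as little-endian float32.
LAYOUT = struct.Struct("<ff")

# "Publisher": create a named region and write the raw fields into it.
writer = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
try:
    LAYOUT.pack_into(writer.buf, 0, 1.0, 0.5)

    # "Subscriber": attach using only the region's name, as a separate
    # process would, and read the fields straight out of the bytes.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        linear, angular = LAYOUT.unpack_from(reader.buf, 0)
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()  # remove the region once both sides are done

print(linear, angular)
```

Both sides agree on nothing except a name and a byte layout, which is why a fixed layout shared across languages removes the need for a serialization step.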
---

## When to Use Multi-Process

| Factor | Single Process | Multi-Process |
|--------|----------------|---------------|
| **Latency** | ~500 ns (intra-process) | ~1-5 μs (cross-process) |
| **GIL** | All nodes share one GIL | Each process has its own GIL |
| **Fault isolation** | One crash takes down everything | A crash is contained to one process |
| **Languages** | Python only | Mix Python + compiled languages |
| **Restart** | Must restart everything | Restart one process independently |
| **Debugging** | Single debugger session | Attach debugger to one process |
| **Deployment** | One script to run | Multiple scripts/services |
| **Memory** | Shared address space | Separate address spaces |

### Use single-process when

- All nodes are Python
- You need deterministic ordering between nodes (sensor, then controller, then actuator -- in that order)
- Latency at the sub-microsecond level matters
- Simpler deployment is preferred

### Use multi-process when

- **GIL bypass** -- CPU-bound Python nodes (ML inference, image processing) block the GIL. Separate processes give each node its own GIL
- **Fault isolation** -- a segfault in one process (e.g., a buggy C extension) does not crash the rest
- **Mixed languages** -- pair Python ML with compiled real-time control
- **Independent restart** -- update one node without stopping others
- **Independent scaling** -- run the heavy ML inference process on a GPU machine and the sensors on the robot (shared memory is local to one machine, so cross-machine topics require an explicit network bridge)

### The GIL Problem

Python's Global Interpreter Lock means only one thread executes Python bytecode at a time. In a single HORUS process, if one node does heavy computation that holds the GIL (pure-Python loops, or extension calls that do not release it), it blocks all other Python nodes until it finishes. Multi-process solves this completely.
Each process has its own Python interpreter and its own GIL: ```python # simplified # PROBLEM: Single process, GIL blocks everything # sensor_tick waits while inference_tick holds the GIL # SOLUTION: Separate processes # sensor.py — runs at 100Hz, unblocked # inference.py — runs at 10Hz, heavy computation, own GIL ``` --- ## What Happens When a Process Crashes When a Python process dies (exception, SIGKILL, OOM): 1. **SHM files persist** -- the kernel closes file descriptors but the shared memory region stays 2. **Other processes continue** -- subscribers see no new messages from the dead publisher, but they do not crash 3. **Automatic reconnection** -- when the crashed process restarts and recreates its topics, other processes see fresh data again 4. **Automatic cleanup** -- the next `horus` CLI command or `horus.run()` call auto-cleans stale namespaces ```python # simplified # Process A crashes mid-publish # Process B keeps running, ticking normally # Process A restarts # Process B sees fresh data from Process A — no reconfiguration ``` --- ## Debugging Multi-Process Systems ### See all topics from all processes ```bash horus topic list ``` This shows every topic in the current namespace, regardless of which process created it. Use `--verbose` to see publisher/subscriber PIDs. ### Monitor cross-process data flow ```bash # Watch messages on a topic (from any process) horus topic echo sensor.temp # Check actual publishing rate horus topic hz sensor.temp # Measure bandwidth horus topic bw camera.rgb ``` ### See all running nodes ```bash horus node list # Shows all nodes across all processes with PID, rate, CPU usage ``` ### Debug one process at a time ```bash # Start sensor normally horus run sensor.py # Start controller with verbose logging HORUS_LOG=debug horus run controller.py ``` ### System-wide view ```bash horus monitor # Web dashboard at http://localhost:3000 showing all nodes from all processes ``` ### Common debugging workflow 1. 
`horus topic list` -- verify all processes see the expected topics 2. `horus topic hz sensor.data` -- verify the publisher sends at the expected rate 3. `horus topic echo sensor.data` -- verify message content is correct 4. `horus node list` -- verify all nodes are `Running` (not `Error` or `Crashed`) --- ## Complete Example: Three-Process Pipeline A camera process captures frames. An ML process detects objects. A controller process drives the robot. **camera.py** -- captures at 30 FPS: ```python # simplified import horus import numpy as np def camera_tick(node): # Simulate camera capture pixels = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) img = horus.Image.from_numpy(pixels) node.send("camera.rgb", img) cam = horus.Node("camera", pubs=[horus.Image], tick=camera_tick, rate=30) horus.run(cam) ``` **detector.py** -- ML inference at 10 FPS: ```python # simplified import horus def detect_tick(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") frame = img.to_numpy() # zero-copy # Run detection model detections = my_model.predict(frame) node.send("detections", { "count": len(detections), "closest_distance": min(d["dist"] for d in detections) if detections else 999.0, }) det = horus.Node("detector", subs=[horus.Image], pubs=["detections"], tick=detect_tick, rate=10) horus.run(det) ``` **controller.py** -- drives motors at 50Hz: ```python # simplified import horus def control_tick(node): if node.has_msg("detections"): det = node.recv("detections") if det["closest_distance"] < 1.0: # Obstacle close — stop node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) else: # Clear path — drive forward node.send("cmd_vel", horus.CmdVel(linear=0.5, angular=0.0)) ctrl = horus.Node("controller", subs=["detections"], pubs=[horus.CmdVel], tick=control_tick, rate=50) horus.run(ctrl) ``` **Run all three:** ```bash # Option A: separate terminals (set same namespace) HORUS_NAMESPACE=robot horus run camera.py HORUS_NAMESPACE=robot horus run detector.py 
HORUS_NAMESPACE=robot horus run controller.py # Option B: launch file horus launch robot_launch.yaml # Option C: single command horus run camera.py detector.py controller.py ``` --- ## Cleaning Up Shared memory files persist after processes exit. HORUS auto-cleans stale regions on every `horus` CLI command and every `horus.run()` call. For manual cleanup: ```bash horus clean --shm ``` --- ## Common Errors | Error | Cause | Fix | |-------|-------|-----| | Topics not visible across terminals | Different SHM namespaces | Set `HORUS_NAMESPACE=shared` in both terminals | | Type mismatch on topic | Process A uses `CmdVel`, Process B uses a different type for the same name | Ensure both processes use the same message type for the same topic name | | Stale data after crash | SHM files persist after process death | Auto-cleaned on next `horus run`. Manual: `horus clean --shm` | | High message drops | Subscriber is slower than publisher | Increase subscriber rate or decrease publisher rate | | Permission denied on SHM | Different users running processes | Run both as the same user | --- ## Design Decisions **Why auto-discovery via shared memory names instead of a configuration file?** When you use a topic name, HORUS maps that name deterministically to a shared memory region. Any process that uses the same name connects to the same region. There is no registration step, no discovery protocol, and no configuration listing endpoints. This eliminates an entire class of misconfiguration bugs ("I forgot to register my topic") and means processes can start and stop in any order. The cost is that topics only work on a single machine -- cross-machine communication requires an explicit network bridge. **Why no message broker?** Message brokers (like MQTT) add a routing hop between every publisher and subscriber, and even brokerless middleware (like DDS in ROS2) adds discovery and serialization overhead. Even optimized brokers add latency and create a single point of failure. HORUS uses direct shared memory: publishers write to a ring buffer, subscribers read from it.
This gives microsecond-level latency and means there is no central process that can crash and take down communication. The tradeoff is single-machine scope. **Why separate processes instead of threads for GIL bypass?** Python threads share the GIL, so CPU-bound work in one thread blocks all others. `multiprocessing` uses fork/spawn which requires pickling data across process boundaries. HORUS processes use shared memory topics -- the same API as single-process -- so splitting into multiple processes requires zero code changes. Each process gets its own GIL, its own memory space, and fault isolation. **Why kernel-managed namespaces instead of a registry?** Shared memory is a kernel-level namespace. Any process on the same machine that opens the same named region gets the same memory. This is inherently race-free and requires no coordination daemon. A registry-based approach would need a long-running process to maintain state, which adds complexity and a failure point. ## Trade-offs | Area | Benefit | Cost | |------|---------|------| | **Auto-discovery** | Zero configuration; start/stop in any order | No explicit topology -- use `horus topic list` to audit connections | | **No broker** | Microsecond latency; no single point of failure | Single-machine only; cross-machine needs a network bridge | | **Process isolation** | One crash does not take down the system; independent restart | Higher latency (~1-5us cross-process vs ~500ns same-process) | | **Separate GILs** | CPU-bound nodes do not block each other | Higher memory usage; one Python interpreter per process | | **Shared memory persistence** | Fast reconnection; no handshake on restart | Stale files persist after crashes; auto-cleaned on next startup | | **No cross-process ordering** | Each process runs at its own rate independently | Sensor-to-actuator chains across processes depend on timing, not scheduler order | --- ## See Also - [Python Bindings](/python/api/python-bindings) -- Core Python API reference - 
[Shared Memory](/concepts/shared-memory) -- SHM architecture and ring buffers - [Message Design (Python)](/python/message-design) -- Choosing between dict and typed topics - [Async Nodes](/python/api/async-nodes) -- Non-blocking I/O nodes --- ## Sensor Messages Path: /python/api/sensor-messages Description: IMU, LiDAR, odometry, GPS, range sensors, battery state, and environmental sensors # Sensor Messages Standard sensor data formats for common robotics sensors. All are typed, zero-copy, and binary-compatible with Rust. ```python # simplified from horus import Imu, LaserScan, Odometry, NavSatFix, BatteryState, RangeSensor ``` --- ## Imu IMU (Inertial Measurement Unit) — accelerometer + gyroscope data. ```python # simplified import horus # All 6 accel/gyro fields are REQUIRED positional arguments imu = horus.Imu( 0.0, # accel_x — m/s², linear acceleration X 0.0, # accel_y — m/s², linear acceleration Y 9.81, # accel_z — m/s², linear acceleration Z (gravity) 0.0, # gyro_x — rad/s, angular velocity X (roll rate) 0.0, # gyro_y — rad/s, angular velocity Y (pitch rate) 0.05, # gyro_z — rad/s, angular velocity Z (yaw rate) timestamp_ns=0, # optional ) # Access fields print(f"Gravity: {imu.accel_z:.2f} m/s²") print(f"Yaw rate: {imu.gyro_z:.3f} rad/s") ``` **Constructor:** `Imu(accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z, timestamp_ns=0)` All 6 accel/gyro values are **required positional** arguments. `timestamp_ns` is optional (keyword-only). 
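Drivers often report acceleration in g and angular rate in deg/s, whereas the constructor above expects m/s² and rad/s. As a rough sketch (not part of the HORUS API; `to_imu_args` and `all_finite` are hypothetical helper names), the six positional values could be prepared like this:

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s² per g (conventional standard value)


def to_imu_args(accel_g, gyro_dps):
    """Hypothetical helper: convert (g, deg/s) triples to SI, in constructor order."""
    ax, ay, az = (a * STANDARD_GRAVITY for a in accel_g)
    gx, gy, gz = (math.radians(w) for w in gyro_dps)
    return (ax, ay, az, gx, gy, gz)


def all_finite(values):
    """True when no value is NaN or infinite."""
    return all(math.isfinite(v) for v in values)


# A sensor at rest, turning at 90 deg/s about Z.
args = to_imu_args((0.0, 0.0, 1.0), (0.0, 0.0, 90.0))
# With HORUS installed, the tuple would be unpacked positionally,
# e.g. horus.Imu(*args, timestamp_ns=0).
```

Keeping the conversion in one place makes it harder to publish values in the wrong unit by accident.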
**Fields (getters/setters):** | Field | Type | Unit | Description | |-------|------|------|-------------| | `accel_x`, `accel_y`, `accel_z` | `float` | m/s² | Linear acceleration | | `gyro_x`, `gyro_y`, `gyro_z` | `float` | rad/s | Angular velocity | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf in accel and gyro fields | | `has_orientation()` | `bool` | True if orientation quaternion is set (not identity with covariance[0] == -1) | | `set_orientation_from_euler(roll, pitch, yaw)` | — | Set orientation quaternion from Euler angles (radians) | | `angular_velocity_vec()` | `Vector3` | Returns gyroscope readings as a Vector3 | | `linear_acceleration_vec()` | `Vector3` | Returns accelerometer readings as a Vector3 | **Validation pattern:** ```python # simplified def imu_tick(node): imu = read_hardware() # IMPORTANT: always validate before publishing if not imu.is_valid(): node.log_warning("Invalid IMU reading (NaN/inf) — skipping") return node.send("imu.raw", imu) ``` **Common pitfalls:** - All 6 accel/gyro fields are **required positional** args — you cannot use keyword-only construction - Hardware fault produces NaN — always check `is_valid()` before publishing - Orientation is NOT set by default — `has_orientation()` returns `False` until you call `set_orientation_from_euler()` - There are NO `mag_x`/`mag_y`/`mag_z` fields on `Imu` in Python — use `MagneticField` for magnetometer data - Units: acceleration is m/s² (not g), angular velocity is rad/s (not deg/s) > **ROS2 equivalent:** `sensor_msgs/msg/Imu` --- ## LaserScan 2D LiDAR scan data — 360 range readings. 
```python # simplified import horus scan = horus.LaserScan( angle_min=-3.14159, # rad — start angle angle_max=3.14159, # rad — end angle angle_increment=0.01745, # rad — step between readings range_min=0.1, # m — minimum valid range range_max=30.0, # m — maximum valid range ranges=[1.5, 2.0, 0.0, 3.2], # optional list of range measurements timestamp_ns=0, # optional ) # Access range data for i in range(len(scan)): if scan.is_range_valid(i): print(f"Reading {i}: {scan.ranges[i]:.2f}m at {scan.angle_at(i):.2f} rad") ``` **Constructor:** `LaserScan(angle_min=0.0, angle_max=0.0, angle_increment=0.0, range_min=0.0, range_max=0.0, ranges=None, timestamp_ns=0)` All parameters are optional keyword arguments. **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `ranges` | `list[float]` | m | Range measurements (0.0 = invalid) | | `angle_min` | `float` | rad | Start angle | | `angle_max` | `float` | rad | End angle | | `angle_increment` | `float` | rad | Step between readings | | `range_min` | `float` | m | Minimum valid range | | `range_max` | `float` | m | Maximum valid range | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `len(scan)` | `int` | Count of valid range readings (via `__len__`) | | `angle_at(index)` | `float` | Get the angle (radians) for a specific range index | | `is_range_valid(index)` | `bool` | Check if `range_min <= ranges[index] <= range_max` | | `min_range()` | `float` or `None` | Minimum valid range reading (None if all invalid) | **Validation pattern:** ```python # simplified def lidar_tick(node): scan = node.recv("scan") if scan is None: return # Find closest obstacle min_dist = scan.min_range() if min_dist is not None and min_dist < 0.5: node.log_warning(f"Obstacle at {min_dist:.2f}m!") node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) return # Count valid readings — use len(), not .valid_count() valid = 
len(scan) node.log_debug(f"{valid}/360 valid readings") ``` **Typical sensor configurations:** | Sensor | Ranges | `angle_min` | `angle_max` | `range_max` | |--------|--------|-------------|-------------|-------------| | RPLiDAR A1 | 360 | 0 | 2π | 12m | | RPLiDAR A2 | 360 | 0 | 2π | 18m | | Hokuyo UTM-30LX | 1081 | -π/2 | π/2 | 30m | **Common pitfalls:** - `ranges[i] == 0.0` means invalid/no return — NOT zero distance - Always check `is_range_valid(i)` before using a reading - Use `len(scan)` to get the valid reading count — there is no `valid_count()` method - `angle_at(i)` assumes uniform spacing — verify `angle_increment` matches your sensor - There is no `scan_time` field in Python > **ROS2 equivalent:** `sensor_msgs/msg/LaserScan` --- ## Odometry Robot odometry — pose + velocity as flat scalar fields. ```python # simplified import horus odom = horus.Odometry( x=1.0, # m — position X y=2.0, # m — position Y theta=0.5, # rad — heading linear_velocity=0.5, # m/s — forward speed angular_velocity=0.1, # rad/s — turn rate timestamp_ns=0, # optional ) print(f"Position: ({odom.x}, {odom.y})") print(f"Heading: {odom.theta:.2f} rad") print(f"Speed: {odom.linear_velocity:.2f} m/s") ``` **Constructor:** `Odometry(x=0.0, y=0.0, theta=0.0, linear_velocity=0.0, angular_velocity=0.0, timestamp_ns=0)` All parameters are optional keyword arguments with defaults of 0.0. 
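To show how the flat fields relate to each other, here is a minimal dead-reckoning sketch. It assumes a simple unicycle motion model with forward-Euler integration; `OdomState` and `integrate` are hypothetical stand-ins, not HORUS types, and HORUS itself does not prescribe how these fields are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class OdomState:
    """Hypothetical stand-in mirroring Odometry's flat scalar fields."""
    x: float = 0.0
    y: float = 0.0
    theta: float = 0.0
    linear_velocity: float = 0.0
    angular_velocity: float = 0.0


def integrate(state, dt):
    """Advance the pose by dt seconds using a forward-Euler unicycle model."""
    x = state.x + state.linear_velocity * math.cos(state.theta) * dt
    y = state.y + state.linear_velocity * math.sin(state.theta) * dt
    theta = state.theta + state.angular_velocity * dt
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to [-pi, pi]
    return OdomState(x, y, theta, state.linear_velocity, state.angular_velocity)


start = OdomState(x=1.0, y=2.0, theta=0.0, linear_velocity=0.5, angular_velocity=0.1)
nxt = integrate(start, dt=0.02)  # one step at 50 Hz
```

The resulting values map one-to-one onto the keyword arguments of the constructor above.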
**Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `x` | `float` | m | Position X | | `y` | `float` | m | Position Y | | `theta` | `float` | rad | Heading (orientation) | | `linear_velocity` | `float` | m/s | Forward speed | | `angular_velocity` | `float` | rad/s | Turn rate | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `set_frames(frame, child_frame)` | `None` | Set the coordinate frame and child frame strings | | `update(pose, twist)` | `None` | Update from a `Pose2D` and `Twist` | | `is_valid()` | `bool` | Check for NaN/inf values | **Common pitfalls:** - Odometry uses **flat scalar fields** — there are no nested `Pose2D` or `Twist` objects - `theta` is in radians, not degrees > **ROS2 equivalent:** `nav_msgs/msg/Odometry` --- ## NavSatFix GPS/GNSS fix — latitude, longitude, altitude. ```python # simplified import horus fix = horus.NavSatFix( latitude=37.7749, # degrees longitude=-122.4194, # degrees altitude=10.5, # meters above WGS84 ellipsoid timestamp_ns=0, # optional ) print(f"Position: {fix.latitude}, {fix.longitude}") print(f"Satellites: {fix.satellites_visible}") print(f"HDOP: {fix.hdop}") ``` **Constructor:** `NavSatFix(latitude=0.0, longitude=0.0, altitude=0.0, timestamp_ns=0)` All parameters are optional keyword arguments. 
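The `distance_to(other)` method listed under Methods for this type is described as using the Haversine formula. For readers unfamiliar with it, a stand-alone sketch follows. The Earth radius constant is an assumption (the mean radius), and `haversine_m` is a hypothetical name; the HORUS implementation may differ in both.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against a tiny overshoot above 1.0 from rounding.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude along the prime meridian.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
```

The formula treats the Earth as a sphere, so results are approximate over long distances.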
**Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `latitude` | `float` | deg | WGS84 latitude | | `longitude` | `float` | deg | WGS84 longitude | | `altitude` | `float` | m | Height above ellipsoid | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Additional getters:** | Getter | Type | Description | |--------|------|-------------| | `satellites_visible` | `int` | Number of visible satellites | | `hdop` | `float` | Horizontal dilution of precision | | `speed` | `float` | Ground speed | | `heading` | `float` | Course heading | | `status` | `int` | Fix status (0=no fix, 1=fix, 2=DGPS, 3=RTK) | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `NavSatFix.from_coordinates(lat, lon, alt)` | `NavSatFix` | Create from latitude, longitude, altitude | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `has_fix()` | `bool` | True if GPS has a valid fix | | `is_valid()` | `bool` | True if coordinates are valid (not NaN, within range) | | `horizontal_accuracy()` | `float` | Estimated horizontal accuracy from HDOP (meters) | | `distance_to(other)` | `float` | Distance to another NavSatFix using Haversine formula (meters) | > **ROS2 equivalent:** `sensor_msgs/msg/NavSatFix` --- ## BatteryState Battery monitoring — voltage, current, charge level. ```python # simplified import horus battery = horus.BatteryState( voltage=12.4, # V percentage=0.75, # 0.0-1.0 current=-2.5, # A (negative = discharging) temperature=35.0, # °C (default: 25.0) power_supply_status=1, # u8 status code timestamp_ns=0, # optional ) if battery.percentage < 0.2: print("LOW BATTERY WARNING") # Check power supply status (u8), NOT a bool print(f"Status code: {battery.power_supply_status}") ``` **Constructor:** `BatteryState(voltage=0.0, percentage=0.0, current=0.0, temperature=25.0, power_supply_status=0, timestamp_ns=0)` All parameters are optional keyword arguments. 
Note that `temperature` defaults to 25.0 (room temperature). **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `voltage` | `float` | V | Battery voltage | | `percentage` | `float` | 0-1 | Charge level (0.0 = empty, 1.0 = full) | | `current` | `float` | A | Current (negative = discharging) | | `temperature` | `float` | °C | Battery temperature (default: 25.0) | | `power_supply_status` | `int` | — | Power supply status code (u8) | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_low(threshold)` | `bool` | True if percentage is below threshold | | `is_critical()` | `bool` | True if percentage is below 10% | | `time_remaining()` | `float` or `None` | Estimated time remaining in seconds (None if current is zero) | **Common pitfalls:** - There is NO `is_charging` field — use `power_supply_status` (a `u8` integer) instead - `temperature` defaults to 25.0, not 0.0 > **ROS2 equivalent:** `sensor_msgs/msg/BatteryState` --- ## RangeSensor Single-point distance sensor (ultrasonic, IR, ToF). ```python # simplified import horus dist = horus.RangeSensor( range=1.5, # m — measured distance sensor_type=0, # sensor type code field_of_view=0.26, # rad (~15°) min_range=0.02, # m (default: 0.02) max_range=4.0, # m (default: 4.0) timestamp_ns=0, # optional ) ``` **Constructor:** `RangeSensor(range=0.0, sensor_type=0, field_of_view=0.1, min_range=0.02, max_range=4.0, timestamp_ns=0)` All parameters are optional keyword arguments. 
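The `min_range` and `max_range` defaults define the window in which a reading counts as usable. A minimal stand-in for that check is sketched below; `range_is_valid` is a hypothetical function, and treating both bounds as inclusive is an assumption rather than documented behavior.

```python
import math


def range_is_valid(range_m, min_range=0.02, max_range=4.0):
    """Stand-in check: reading is not NaN and lies within [min_range, max_range]."""
    return not math.isnan(range_m) and min_range <= range_m <= max_range


# Filter a batch of raw readings (meters) down to the usable ones.
readings = [1.5, float("nan"), 0.01, 4.0, 7.2]
usable = [r for r in readings if range_is_valid(r)]
```

Filtering before publishing keeps out-of-window readings from reaching downstream nodes.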
**Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `range` | `float` | m | Measured distance | | `sensor_type` | `int` | — | Sensor type code | | `field_of_view` | `float` | rad | Sensor beam width (default: 0.1) | | `min_range` | `float` | m | Minimum valid range (default: 0.02) | | `max_range` | `float` | m | Maximum valid range (default: 4.0) | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | True if range is within min_range..max_range and not NaN | --- ## JointState Joint positions, velocities, and efforts for articulated robots. ```python # simplified import horus state = horus.JointState( names=["shoulder", "elbow", "wrist", "gripper", "j5", "j6"], positions=[0.0, 0.5, -0.3, 1.2, 0.0, 0.0], velocities=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0], efforts=[0.1, 0.5, 0.3, 0.8, 0.0, 0.0], timestamp_ns=0, # optional ) print(f"Joint '{state.names[1]}' at {state.positions[1]:.2f} rad") ``` **Constructor:** `JointState(names=None, positions=None, velocities=None, efforts=None, timestamp_ns=0)` All parameters are optional keyword arguments. `names` is the first parameter. 
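Because the constructor takes parallel lists, each value is tied to its joint only by position. The sketch below shows one way to sanity-check the lists and look a value up by name using plain Python. `check_lengths` and `lookup` are hypothetical helpers, not HORUS functions, and whether HORUS enforces matching lengths is not stated here.

```python
def check_lengths(names, *columns):
    """Hypothetical pre-flight check: each non-empty column has one entry per joint."""
    return all(len(col) in (0, len(names)) for col in columns)


def lookup(names, values, name):
    """Return the value paired with `name`, or None if the joint is unknown."""
    try:
        return values[names.index(name)]
    except ValueError:
        return None


names = ["shoulder", "elbow", "wrist", "gripper", "j5", "j6"]
positions = [0.0, 0.5, -0.3, 1.2, 0.0, 0.0]
efforts = [0.1, 0.5, 0.3, 0.8, 0.0, 0.0]

ok = check_lengths(names, positions, efforts)
elbow = lookup(names, positions, "elbow")
missing = lookup(names, positions, "ankle")
```

A check like this before construction catches a dropped or extra list element early.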
**Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `names` | `list[str]` | — | Joint names | | `positions` | `list[float]` | rad | Joint positions | | `velocities` | `list[float]` | rad/s | Joint velocities | | `efforts` | `list[float]` | Nm | Joint torques/forces | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `position(name)` | `float` or `None` | Get position of a joint by name | | `velocity(name)` | `float` or `None` | Get velocity of a joint by name | | `effort(name)` | `float` or `None` | Get effort of a joint by name | | `len(state)` | `int` | Number of joints (via `__len__`) | ```python # simplified # Look up joint values by name instead of index shoulder_pos = state.position("shoulder") # Returns None if not found elbow_vel = state.velocity("elbow") ``` > **ROS2 equivalent:** `sensor_msgs/msg/JointState` --- ## Environmental Sensors ### Temperature ```python # simplified temp = horus.Temperature(temperature=22.5, variance=0.1, timestamp_ns=0) ``` **Constructor:** `Temperature(temperature=0.0, variance=0.0, timestamp_ns=0)` | Field | Type | Unit | Description | |-------|------|------|-------------| | `temperature` | `float` | °C | Temperature reading | | `variance` | `float` | — | Measurement variance | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | ### MagneticField ```python # simplified mag = horus.MagneticField(x=0.00002, y=0.00005, z=-0.00004, timestamp_ns=0) # Tesla ``` **Constructor:** `MagneticField(x=0.0, y=0.0, z=0.0, timestamp_ns=0)` | Field | Type | Unit | Description | |-------|------|------|-------------| | `x`, `y`, `z` | `float` | T | Magnetic field components | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | ### FluidPressure ```python # simplified pressure = horus.FluidPressure(pressure=101325.0, variance=10.0, timestamp_ns=0) # Pascals ``` **Constructor:** `FluidPressure(pressure=0.0, 
variance=0.0, timestamp_ns=0)` | Field | Type | Unit | Description | |-------|------|------|-------------| | `pressure` | `float` | Pa | Fluid pressure | | `variance` | `float` | — | Measurement variance | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | ### Illuminance ```python # simplified lux = horus.Illuminance(illuminance=500.0, variance=5.0, timestamp_ns=0) # Lux ``` **Constructor:** `Illuminance(illuminance=0.0, variance=0.0, timestamp_ns=0)` | Field | Type | Unit | Description | |-------|------|------|-------------| | `illuminance` | `float` | lx | Illuminance reading | | `variance` | `float` | — | Measurement variance | | `timestamp_ns` | `int` | ns | Nanoseconds since epoch | --- ## Example: IMU Sensor Node ```python # simplified import horus def imu_tick(node): imu = read_hardware() # Your I2C/SPI driver # Validate before publishing if not imu.is_valid(): return node.send("imu.raw", imu) sensor = horus.Node( name="imu_reader", pubs=[horus.Imu], tick=imu_tick, rate=100, order=0, ) horus.run(sensor, rt=True) ``` --- ## See Also - [Control Messages](/python/api/control-messages) — CmdVel, MotorCommand, ServoCommand - [Geometry Messages](/python/api/geometry-messages) — Pose2D, Twist, Quaternion - [Standard Messages](/python/api/messages) — Full message catalog - [IMU Reader Recipe](/recipes/imu-reader-python) — Complete IMU example - [Rust Sensor Messages](/rust/api/sensor-messages) — Rust equivalent --- ## Message Design (Python) Path: /python/message-design Description: Choose the right message strategy for Python HORUS nodes — dict topics, typed topics, pool-backed types, and custom messages # Message Design (Python) HORUS gives you two fundamentally different ways to send data between nodes: **dict topics** (flexible, serialized) and **typed topics** (fast, zero-copy). Choosing the right one for each topic in your system is one of the most important design decisions you will make. 
```python # simplified import horus # Dict topic — any Python dict, serialized via MessagePack node.send("status", {"battery": 85.0, "mode": "autonomous"}) # Typed topic — fixed-layout struct, zero-copy via shared memory node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.5)) ``` Both approaches share the same `node.send()` / `node.recv()` API. The difference is what happens underneath. --- ## Dict Topics (GenericMessage) Pass a Python dict as the message payload. HORUS serializes it via MessagePack into a `GenericMessage` with a 4KB maximum payload. ```python # simplified import horus def sensor_tick(node): node.send("environment", { "temperature": 22.5, "humidity": 65.0, "light_level": 800, "room": "lab_3", }) def logger_tick(node): if node.has_msg("environment"): data = node.recv("environment") print(f"Temp: {data['temperature']}C in {data['room']}") sensor = horus.Node("sensor", pubs=["environment"], tick=sensor_tick, rate=10) logger = horus.Node("logger", subs=["environment"], tick=logger_tick, rate=10) horus.run(sensor, logger) ``` ### What dict topics support - Primitive types: `int`, `float`, `str`, `bool`, `None` - Collections: `list`, `dict` (nested) - Bytes: `bytes`, `bytearray` ### What dict topics do not support - Custom class instances (unless you convert to dict first) - NumPy arrays (convert with `.tolist()` first) - Payloads larger than 4KB (use pool-backed types instead) ### When to use dict topics - **Prototyping** -- evolve your message schema without restarting - **Low-frequency data** -- status updates, configuration, logs - **Variable structure** -- messages where fields change between sends - **Quick experiments** -- send arbitrary data to see what works --- ## Typed Topics Pass a HORUS message type (Pod struct) as the payload. The message is written directly into shared memory with no serialization. The receiver reads the same bytes -- zero-copy. 
```python # simplified import horus def drive_tick(node): node.send("cmd_vel", horus.CmdVel(linear=0.5, angular=0.1)) def motor_tick(node): if node.has_msg("cmd_vel"): cmd = node.recv("cmd_vel") # CmdVel object, zero-copy apply_motor(cmd.linear, cmd.angular) driver = horus.Node("driver", pubs=[horus.CmdVel], tick=drive_tick, rate=50) motor = horus.Node("motor", subs=[horus.CmdVel], tick=motor_tick, rate=50) horus.run(driver, motor) ``` ### Declaring typed topics Declare typed topics in `pubs` and `subs` by passing the message class: ```python # simplified # Single typed topic — auto-derives topic name from the type node = horus.Node("ctrl", pubs=[horus.CmdVel], tick=my_tick, rate=50) # Publishes on "cmd_vel" (derived from CmdVel.__topic_name__) # Multiple typed topics node = horus.Node("nav", pubs=[horus.CmdVel, horus.Pose2D], subs=[horus.LaserScan, horus.Imu], tick=my_tick, rate=50, ) # Mixed: typed + dict topics in the same node node = horus.Node("hybrid", pubs=[horus.CmdVel, "debug_log"], # typed + dict subs=[horus.LaserScan, "config"], # typed + dict tick=my_tick, rate=50, ) ``` ### Override topic name By default, typed topics use the name from the type's `__topic_name__` attribute (e.g., `CmdVel` maps to `"cmd_vel"`). 
Override with dict syntax: ```python # simplified node = horus.Node("ctrl", pubs={"my_velocity": horus.CmdVel}, # publishes on "my_velocity", not "cmd_vel" tick=my_tick, rate=50, ) node.send("my_velocity", horus.CmdVel(linear=1.0, angular=0.0)) ``` ### When to use typed topics - **High-frequency data** -- sensor streams, control commands, anything above 50Hz - **Cross-language systems** -- binary-compatible with other HORUS language bindings - **Production deployments** -- type safety catches mismatches at connection time - **Performance-critical paths** -- ~1.5us vs ~5-50us for dict topics --- ## Performance Comparison | Approach | Latency | Throughput | Max Payload | Serialization | |----------|---------|------------|-------------|---------------| | **Typed (Pod)** | ~1.5us | ~650K msgs/sec | Fixed-size struct | None (zero-copy) | | **Dict (GenericMessage)** | ~5-50us | ~20-200K msgs/sec | 4KB | MessagePack | | **Pool-backed (Image, Tensor)** | ~3-5us | ~300K descriptors/sec | Unlimited (pool) | Descriptor only (64-168B) | | **Custom Runtime** | ~20-40us | ~25K msgs/sec | Fixed-size struct | Python `struct` module | | **Custom Compiled** | ~3-5us | ~200K msgs/sec | Fixed-size struct | None (zero-copy) | Dict latency varies with payload size: a 50-byte dict serializes in ~5us; a 3KB dict takes ~50us. --- ## Built-in Typed Messages HORUS provides 55+ typed message classes. 
All are importable from `horus`: ```python # simplified from horus import CmdVel, LaserScan, Imu, Odometry, Image, Pose2D ``` ### Common message types by use case | Use Case | Messages | Typical Rate | |----------|----------|--------------| | Mobile robot drive | `CmdVel`, `Odometry` | 50-100Hz | | LiDAR processing | `LaserScan` | 10-40Hz | | IMU integration | `Imu` | 100-1000Hz | | Camera pipeline | `Image`, `DepthImage` | 30-60Hz | | Object detection | `Detection`, `BoundingBox2D` | 10-30Hz | | Robot arm control | `JointState`, `JointCommand` | 100-1000Hz | | Navigation | `NavGoal`, `NavPath`, `OccupancyGrid` | 1-10Hz | | System health | `Heartbeat`, `DiagnosticStatus`, `BatteryState` | 1-10Hz | ### Creating typed messages ```python # simplified # Most constructors take keyword arguments cmd = horus.CmdVel(linear=1.0, angular=0.5) pose = horus.Pose2D(x=1.0, y=2.0, theta=0.785) # Imu takes its six accel/gyro values positionally (see Sensor Messages) imu = horus.Imu(0.0, 0.0, 9.81, 0.0, 0.0, 0.0) # accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z # Access fields directly print(cmd.linear) # 1.0 print(pose.theta) # 0.785 # All messages include a nanosecond timestamp print(cmd.timestamp_ns) ``` --- ## Pool-Backed Types: Image, PointCloud, DepthImage, Tensor For large data (camera frames, LiDAR scans, ML tensors), HORUS uses pool-backed shared memory. Only a small descriptor travels through the ring buffer; the actual data stays in a shared memory pool.
```python # simplified import horus import numpy as np # Image — camera frames img = horus.Image(480, 640, "rgb8") # create empty img = horus.Image.from_numpy(pixels) # from NumPy (one copy into pool) arr = img.to_numpy() # to NumPy (zero-copy view) # PointCloud — LiDAR scans cloud = horus.PointCloud(10000, 3) # 10k points, XYZ cloud = horus.PointCloud.from_numpy(points) # DepthImage — depth maps depth = horus.DepthImage(480, 640) # F32 meters by default # Tensor — arbitrary array data tensor = horus.Tensor([1000, 1000], dtype="float32") # costmap tensor = horus.Tensor.from_numpy(np_array) ``` ### NumPy interop All pool-backed types convert to and from NumPy arrays: ```python # simplified # to_numpy() — zero-copy view into shared memory arr = img.to_numpy() # shape=(480, 640, 3), dtype=uint8 arr = cloud.to_numpy() # shape=(10000, 3), dtype=float32 arr = depth.to_numpy() # shape=(480, 640), dtype=float32 arr = tensor.numpy() # shape matches creation shape # from_numpy() — one copy into shared memory pool img = horus.Image.from_numpy(np_array) cloud = horus.PointCloud.from_numpy(np_array) tensor = horus.Tensor.from_numpy(np_array) ``` `to_numpy()` is zero-copy (~3us) because it returns a view into the existing shared memory. `from_numpy()` copies once because the data must be placed into a specific pool slot for cross-process sharing. 
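The view-versus-copy distinction can be illustrated without HORUS or NumPy. In the sketch below a `bytearray` stands in for one pool slot (a real pool lives in shared memory), and a standard-library `memoryview` plays the role of the zero-copy array view. All names are illustrative.

```python
HEIGHT, WIDTH, CHANNELS = 4, 4, 3

# Stand-in for one slot in a shared memory pool.
pool_slot = bytearray(HEIGHT * WIDTH * CHANNELS)

# from_numpy()-style step: the caller's pixels are copied once into the slot.
caller_pixels = bytes(range(HEIGHT * WIDTH * CHANNELS))
pool_slot[:] = caller_pixels

# to_numpy()-style step: a shaped view over the slot; no pixel bytes are copied.
view = memoryview(pool_slot).cast("B", shape=[HEIGHT, WIDTH, CHANNELS])

# A write to the slot is visible through the view, but not in the caller's copy.
pool_slot[0] = 255
through_view = view[0, 0, 0]
in_caller_copy = caller_pixels[0]
```

The same reasoning suggests caution when mutating an array obtained from a view: the change lands in the underlying buffer rather than in a private copy.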
### PyTorch and JAX interop (DLPack) ```python # simplified import torch # Image to PyTorch (zero-copy via DLPack) pt_tensor = torch.from_dlpack(img.as_tensor()) # Tensor to PyTorch pt_tensor = torch.from_dlpack(tensor) # JAX (zero-copy via DLPack) jax_array = img.to_jax() ``` ### Sending pool-backed types ```python # simplified def camera_tick(node): pixels = capture_frame() # numpy array img = horus.Image.from_numpy(pixels) # copy into SHM pool node.send("camera.rgb", img) # sends 64B descriptor def vision_tick(node): if node.has_msg("camera.rgb"): img = node.recv("camera.rgb") # receives descriptor frame = img.to_numpy() # zero-copy into numpy # 6MB of pixel data never moved — only the 64B descriptor did ``` ### When to use pool-backed types - **Camera frames** -- `Image` for RGB/BGR/grayscale/Bayer - **LiDAR scans** -- `PointCloud` for XYZ, XYZI, XYZRGB point data - **Depth cameras** -- `DepthImage` for F32 meter or U16 millimeter depth maps - **ML data** -- `Tensor` for costmaps, feature maps, CNN outputs, RL observations - **Any large array** -- anything bigger than a few KB benefits from pool-backed transport --- ## GenericMessage for Dynamic Data When you send a dict, HORUS wraps it in a `GenericMessage` automatically. You can also use `GenericMessage` explicitly for more control: ```python # simplified import horus # Implicit — just send a dict node.send("data", {"x": 1.0, "y": 2.0, "label": "waypoint"}) # Receive — comes back as a dict data = node.recv("data") # {"x": 1.0, "y": 2.0, "label": "waypoint"} ``` `GenericMessage` uses MessagePack serialization with a 4KB maximum payload. For small messages (256 bytes or fewer), it uses an inline buffer with no heap allocation. 
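The two size thresholds above can be turned into a quick pre-send check. This is a sketch under assumptions: `classify_payload` is a hypothetical helper that takes already-serialized bytes, "4KB" is read as 4096 bytes, and the "heap" label for mid-sized payloads is an inference from the inline-buffer remark, not documented behavior.

```python
INLINE_LIMIT = 256      # bytes; "256 bytes or fewer" uses the inline buffer
MAX_PAYLOAD = 4 * 1024  # bytes; assumes "4KB" means 4096


def classify_payload(serialized: bytes) -> str:
    """Classify a serialized payload by the size thresholds described above."""
    size = len(serialized)
    if size <= INLINE_LIMIT:
        return "inline"
    if size <= MAX_PAYLOAD:
        return "heap"
    return "too_large"  # candidate for a pool-backed type instead


small = classify_payload(b"\x00" * 100)
large = classify_payload(b"\x00" * 5000)
```

Running such a check in tests, on payloads serialized with the same MessagePack encoder used in production, helps catch dicts that have grown past the limit.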
### Nested structures ```python # simplified node.send("state", { "position": {"x": 1.0, "y": 2.0, "z": 0.0}, "velocity": {"linear": 0.5, "angular": 0.1}, "sensors": { "battery": 85.0, "temperature": 42.3, }, "flags": [True, False, True], }) ``` ### Limitations - Maximum 4KB serialized payload - No NumPy arrays (use `.tolist()` to convert first) - No custom class instances (convert to dict first) - No type checking at connection time -- a subscriber sees whatever the publisher sends --- ## Custom Messages When built-in types don't fit your data model, create custom messages. ### Runtime messages (no build step) Use `horus.msggen` to define custom binary-serialized messages at runtime: ```python # simplified from horus.msggen import define_message # Define a custom message type RobotStatus = define_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ('is_active', 'bool'), ('motor_temp', 'f32'), ]) # Create, serialize, send status = RobotStatus(battery_level=85.0, error_code=0, is_active=True, motor_temp=42.3) node.send("robot.status", status.to_bytes()) # Receive, deserialize raw = node.recv("robot.status") status = RobotStatus.from_bytes(raw) print(status.battery_level) # 85.0 ``` Runtime messages use Python's `struct` module for fixed-layout binary serialization. They support only primitive types (`f32`, `f64`, `i32`, `u64`, `bool`, etc.) -- no nested objects or variable-length arrays. ### Compiled messages (production) For maximum performance, compile custom messages via `horus.msggen`. 
This generates code that produces the same Pod types as built-in messages: ```python # simplified from horus.msggen import register_message, build_messages register_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ('is_active', 'bool'), ]) build_messages() # generates code and rebuilds # After build, use like any built-in type from horus import RobotStatus, Topic topic = Topic(RobotStatus) topic.send(RobotStatus(battery_level=85.0, error_code=0, is_active=True)) ``` See [Custom Messages](/python/api/custom-messages) for the full API. --- ## Choosing the Right Approach ### Decision flowchart **Is it a standard robotics type?** (CmdVel, LaserScan, Imu, Pose2D, etc.) - Yes: Use the built-in typed message. Done. **Is it large array data?** (camera frames, point clouds, feature maps) - Yes: Use pool-backed types (Image, PointCloud, DepthImage, Tensor). Done. **Is the schema stable and performance-critical?** - Yes: Define a custom compiled message via `horus.msggen`. Done. **Is this for prototyping or low-frequency data?** - Yes: Use a dict topic. Done. ### Recommendation by project phase | Phase | Approach | Why | |-------|----------|-----| | **Prototyping** | Dict topics for everything | Iterate fast, change schemas freely | | **Early development** | Dict for custom data, typed for standard robotics types | Get type safety where it matters | | **Pre-production** | Migrate high-frequency dict topics to typed or custom compiled | Performance optimization | | **Production** | Typed for all fixed-schema data, dict only for truly dynamic data | Maximum performance and type safety | --- ## Cross-Language Compatibility Typed messages and pool-backed types are binary-compatible across all HORUS language bindings. A `CmdVel` published in one language is received as the same struct in any other language. Field names, types, and memory layout are identical. 
```python
# simplified
# Python publishes
node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.5))

# Any HORUS language binding receives the same CmdVel
# with linear=1.0, angular=0.5 -- same bytes in shared memory
```

Dict topics (GenericMessage) are also cross-language compatible. The MessagePack serialization format is language-agnostic.

Pool-backed types (Image, PointCloud, Tensor) share the same memory pool across languages. A compiled process publishes an Image; a Python process reads it with `img.to_numpy()` -- zero-copy.

---

## Complete Example: Multi-Topic Node

A single node that uses dict topics, typed topics, and pool-backed types together:

```python
# simplified
import horus
import numpy as np

def robot_tick(node):
    # Receive typed sensor data
    if node.has_msg("imu"):
        imu = node.recv("imu")
        roll, pitch = estimate_orientation(imu)

    # Receive pool-backed camera frame
    detections = []  # stays empty on ticks where no frame arrived
    if node.has_msg("camera.rgb"):
        img = node.recv("camera.rgb")
        frame = img.to_numpy()
        detections = detect_objects(frame)

    # Send dict topic (variable structure)
    node.send("detections", {
        "count": len(detections),
        "objects": [{"class": d.label, "conf": d.confidence} for d in detections],
    })

    # Send typed command
    node.send("cmd_vel", horus.CmdVel(linear=0.5, angular=0.0))

    # Send pool-backed costmap
    costmap = horus.Tensor([100, 100], dtype="float32")
    costmap.numpy()[:] = compute_costmap()
    node.send("nav.costmap", costmap)

robot = horus.Node(
    name="robot_brain",
    subs=[horus.Imu, horus.Image, "config"],
    pubs=[horus.CmdVel, "detections", "nav.costmap"],
    tick=robot_tick,
    rate=30,
)
horus.run(robot)
```

---

## Design Decisions

**Why two message paths (dict vs typed) instead of a unified format?** Flexibility and performance are at odds. Dict topics give maximum flexibility -- send any Python dict, change the schema at will, no compilation needed. Typed topics give maximum performance -- zero-copy shared memory, no serialization, compile-time type safety. A single format would compromise one or the other.
The dual-path design lets you prototype with dicts and migrate to typed messages for production, topic by topic. **Why MessagePack for GenericMessage instead of JSON or protobuf?** MessagePack is compact (30-50% smaller than JSON), fast to serialize/deserialize, and produces deterministic output. JSON is human-readable but slower and larger. Protobuf requires a schema definition file and a compilation step, which defeats the purpose of a flexible dict-based format. The tradeoff is that MessagePack is not human-readable in raw form, but `horus topic echo` handles display. **Why Pod (Plain Old Data) for typed messages instead of protobuf or FlatBuffers?** Pod types are fixed-size structs with no pointers, no heap allocation, and no serialization. They can be placed directly in shared memory and read by any process without parsing. Protobuf and FlatBuffers require a deserialization step, even if minimal. For robotics control loops running at 1kHz+, the difference between "deserialize then use" and "just use" matters. The cost is that Pod types cannot contain variable-length fields (strings, arrays) -- those use GenericMessage or pool-backed types. **Why pool-backed transport for large data instead of serializing into the ring buffer?** A 1080p RGB image is ~6MB. Copying 6MB through a ring buffer wastes bandwidth and adds milliseconds of latency. Pool-backed types keep the data in a shared memory pool and send only a 64-168 byte descriptor through the ring buffer. The receiver gets a zero-copy view into the pool. This makes camera and LiDAR pipelines practical at full sensor frame rates. **Why `from_numpy()` copies but `to_numpy()` does not?** Publishing requires placing data into a specific pool slot. A NumPy array at an arbitrary heap address cannot be shared across processes. So `from_numpy()` copies once into the shared memory pool. `to_numpy()` returns a view into the already-shared memory -- no copy needed. 
This asymmetry is intentional: one copy on publish, zero copies on receive. ## Trade-offs | Area | Benefit | Cost | |------|---------|------| | **Dict topics** | Any Python dict; no schema; evolve freely | Serialization overhead (~5-50us); no type checking; 4KB limit | | **Typed topics** | Zero-copy (~1.5us); type safety; cross-language compatible | Fixed schema; only primitive fields; must use built-in or compiled types | | **Pool-backed types** | Zero-copy for megabytes of data; NumPy/PyTorch interop | One copy on `from_numpy()`; pool slot management; descriptors add indirection | | **Custom runtime messages** | No build step; instant iteration | Slower (~20-40us); primitive types only; manual `to_bytes()`/`from_bytes()` | | **Custom compiled messages** | Same performance as built-in types; cross-language | Requires `maturin develop` build step; primitive types only | | **GenericMessage inline buffer** | No heap allocation for small messages (256B or less) | 4KB maximum; overflow to heap above 256B | --- ## See Also - [Python Message Library](/python/library/python-message-library) -- All 55+ built-in message types - [Custom Messages](/python/api/custom-messages) -- Define your own typed messages - [Memory Types](/python/api/memory-types) -- Image, PointCloud, DepthImage, Tensor deep reference - [Multi-Process Architecture (Python)](/python/multi-process) -- Running nodes across separate processes - [Tensor](/python/api/tensor) -- General-purpose shared memory tensor for ML data --- ## Control Messages Path: /python/api/control-messages Description: Motor control, servo, PID, differential drive, and joint command messages # Control Messages Command messages for motors, servos, and actuators. All are typed, zero-copy, and binary-compatible with Rust. ```python # simplified from horus import CmdVel, MotorCommand, ServoCommand, DifferentialDriveCommand, PidConfig, JointCommand ``` --- ## CmdVel 2D velocity command — the most common control message in robotics. 
Used for differential drive, holonomic, and Ackermann robots. ```python # simplified import horus cmd = horus.CmdVel(linear=0.5, angular=0.3) print(f"Forward: {cmd.linear} m/s") print(f"Turn: {cmd.angular} rad/s") ``` **Fields:** | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear` | `float` | m/s | Forward/backward velocity | | `angular` | `float` | rad/s | Rotational velocity (counter-clockwise positive) | > **ROS2 equivalent:** `geometry_msgs/msg/Twist` (2D simplified) > > **Common rates:** 10-1000 Hz depending on controller. Motor control at 100+ Hz, teleoperation at 10-30 Hz. **Safety patterns:** ```python # simplified # ALWAYS clamp before sending to hardware MAX_LINEAR = 1.0 # m/s MAX_ANGULAR = 2.0 # rad/s def safe_cmd(linear, angular): return horus.CmdVel( linear=max(-MAX_LINEAR, min(MAX_LINEAR, linear)), angular=max(-MAX_ANGULAR, min(MAX_ANGULAR, angular)), ) # ALWAYS send zero in shutdown def motor_shutdown(node): node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) # Stale command watchdog — zero motors if no command for 500ms stale_ticks = [0] def motor_tick(node): cmd = node.recv("cmd_vel") if cmd is None: stale_ticks[0] += 1 if stale_ticks[0] > 50: # 500ms at 100Hz node.send("motor", horus.CmdVel(linear=0.0, angular=0.0)) return stale_ticks[0] = 0 node.send("motor", safe_cmd(cmd.linear, cmd.angular)) ``` **Common pitfalls:** - Sending unclamped values → motor damage. Always clamp to hardware limits. - Missing `shutdown()` → motors keep running after program exits - No stale timeout → motors hold last command forever if publisher dies ### Example: Teleoperation ```python # simplified def teleop_tick(node): joy = node.recv("joystick") if joy: cmd = horus.CmdVel( linear=joy.axes[1] * 1.0, # Left stick Y → forward/back angular=joy.axes[0] * 2.0, # Left stick X → turn ) node.send("cmd_vel", cmd) ``` --- ## MotorCommand Direct motor control — position, velocity, or torque mode. 
```python # simplified import horus cmd = horus.MotorCommand( motor_id=1, # Motor identifier mode=1, # 0=position, 1=velocity, 2=torque target=5.0, # Target value (units depend on mode) max_velocity=10.0, # rad/s — velocity limit max_acceleration=50.0, # rad/s^2 — acceleration limit feed_forward=0.1, # Feed-forward term enable=True, # Motor enabled ) ``` **Constructor:** `MotorCommand(motor_id=0, mode=0, target=0.0, max_velocity=inf, max_acceleration=inf, feed_forward=0.0, enable=True, timestamp_ns=0)` **Fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `motor_id` | `int` | `0` | Motor identifier | | `mode` | `int` | `0` | Control mode (0=position, 1=velocity, 2=torque) | | `target` | `float` | `0.0` | Target value (units depend on mode) | | `max_velocity` | `float` | `inf` | Maximum velocity limit (rad/s) | | `max_acceleration` | `float` | `inf` | Maximum acceleration limit (rad/s^2) | | `feed_forward` | `float` | `0.0` | Feed-forward term | | `enable` | `bool` | `True` | Whether motor is enabled | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `MotorCommand.velocity(motor_id, velocity)` | `MotorCommand` | Create a velocity-mode command | | `MotorCommand.position(motor_id, position, max_velocity)` | `MotorCommand` | Create a position-mode command with velocity limit | | `MotorCommand.stop(motor_id)` | `MotorCommand` | Create a stop command (velocity=0, enabled) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf in target and limits | --- ## ServoCommand Position command for hobby or industrial servos. 
```python # simplified import horus cmd = horus.ServoCommand( servo_id=3, # Servo identifier position=1.57, # rad — target position speed=0.8, # Movement speed (0.0-1.0) enable=True, # Servo enabled ) ``` **Constructor:** `ServoCommand(servo_id=0, position=0.0, speed=0.5, enable=True, timestamp_ns=0)` **Fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `servo_id` | `int` | `0` | Servo identifier | | `position` | `float` | `0.0` | Target position (rad) | | `speed` | `float` | `0.5` | Movement speed (0.0-1.0) | | `enable` | `bool` | `True` | Whether servo is enabled | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `ServoCommand.with_speed(servo_id, position, speed)` | `ServoCommand` | Create with explicit speed | | `ServoCommand.disable(servo_id)` | `ServoCommand` | Create a disable command | | `ServoCommand.from_degrees(servo_id, degrees)` | `ServoCommand` | Create from degrees (auto-converts to radians) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf values | --- ## DifferentialDriveCommand Direct left/right wheel velocity command. 
```python # simplified import horus cmd = horus.DifferentialDriveCommand( left_velocity=0.5, # m/s — left wheel right_velocity=0.3, # m/s — right wheel max_acceleration=2.0, # m/s^2 — acceleration limit enable=True, # Drive enabled ) ``` **Constructor:** `DifferentialDriveCommand(left_velocity=0.0, right_velocity=0.0, max_acceleration=inf, enable=True, timestamp_ns=0)` **Fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `left_velocity` | `float` | `0.0` | Left wheel velocity (m/s) | | `right_velocity` | `float` | `0.0` | Right wheel velocity (m/s) | | `max_acceleration` | `float` | `inf` | Maximum acceleration limit (m/s^2) | | `enable` | `bool` | `True` | Whether drive is enabled | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `DifferentialDriveCommand.stop()` | `DifferentialDriveCommand` | Create a stop command (both wheels zero) | | `DifferentialDriveCommand.from_twist(linear, angular, wheel_base, wheel_radius)` | `DifferentialDriveCommand` | Convert CmdVel-style values to wheel speeds using kinematics | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf values | > Use `CmdVel` for high-level velocity commands. Use `DifferentialDriveCommand` when your motor driver accepts raw wheel speeds. --- ## PidConfig PID controller configuration — gains, limits, and tuning. 
```python # simplified import horus pid = horus.PidConfig( kp=2.0, # Proportional gain ki=0.1, # Integral gain kd=0.05, # Derivative gain controller_id=1, # Controller identifier integral_limit=10.0, # Anti-windup integral limit output_limit=1.0, # Output clamp (symmetric) anti_windup=True, # Anti-windup enabled ) ``` **Constructor:** `PidConfig(kp=1.0, ki=0.0, kd=0.0, controller_id=0, integral_limit=inf, output_limit=inf, anti_windup=True, timestamp_ns=0)` **Fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `kp` | `float` | `1.0` | Proportional gain | | `ki` | `float` | `0.0` | Integral gain | | `kd` | `float` | `0.0` | Derivative gain | | `controller_id` | `int` | `0` | Controller identifier | | `integral_limit` | `float` | `inf` | Anti-windup integral limit | | `output_limit` | `float` | `inf` | Output clamp (symmetric +/-) | | `anti_windup` | `bool` | `True` | Whether anti-windup is enabled | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `PidConfig.proportional(kp)` | `PidConfig` | P-only controller | | `PidConfig.pi(kp, ki)` | `PidConfig` | PI controller | | `PidConfig.pd(kp, kd)` | `PidConfig` | PD controller | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `with_limits(integral_limit, output_limit)` | `PidConfig` | Return a copy with limits set | | `is_valid()` | `bool` | Check for NaN/inf in gains and limits | --- ## TrajectoryPoint A single point along a trajectory with position, velocity, acceleration, and timing. 
```python
# simplified
import horus

# Assumes TrajectoryPoint is exported from the top-level horus package,
# like the other control messages on this page
tp = horus.TrajectoryPoint(
    position=[1.0, 2.0, 0.0],
    velocity=[0.5, 0.0, 0.0],
    time_from_start=1.5,
)
```

**Constructor:** `TrajectoryPoint(position=None, velocity=None, acceleration=None, orientation=None, angular_velocity=None, time_from_start=0.0)`

**Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `position` | `list[float]` | Position coordinates |
| `velocity` | `list[float]` | Velocity components |
| `acceleration` | `list[float]` | Acceleration components |
| `orientation` | `list[float]` | Quaternion (4 elements) |
| `angular_velocity` | `list[float]` | Angular velocity components |
| `time_from_start` | `float` | Time offset from trajectory start (seconds) |

**Static Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `TrajectoryPoint.new_2d(x, y, vx, vy, time)` | `TrajectoryPoint` | Create a 2D trajectory point |
| `TrajectoryPoint.stationary(x, y, z)` | `TrajectoryPoint` | Create a stationary point (zero velocity) |

---

## JointCommand

Multi-joint position/velocity/effort command for robot arms.
```python # simplified import horus cmd = horus.JointCommand( names=["shoulder", "elbow", "wrist_1", "wrist_2", "wrist_3", "gripper"], positions=[0.0, 0.5, -0.3, 1.2, 0.0, 0.0], velocities=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], efforts=[], # Empty = no effort control modes=[1, 1, 1, 1, 1, 1], # Per-joint control modes ) ``` **Constructor:** `JointCommand(names=None, positions=None, velocities=None, efforts=None, modes=None, timestamp_ns=0)` **Fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `names` | `list[str]` | `None` | Joint names | | `positions` | `list[float]` | `None` | Target joint positions (rad) | | `velocities` | `list[float]` | `None` | Max joint velocities (rad/s) | | `efforts` | `list[float]` | `None` | Joint torques (Nm) | | `modes` | `list[int]` | `None` | Per-joint control modes | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `add_position(name, position)` | `None` | Add a joint position command by name | | `add_velocity(name, velocity)` | `None` | Add a joint velocity command by name | | `is_valid()` | `bool` | Check for NaN/inf values | | `len(cmd)` | `int` | Number of joints (via `__len__`) | --- ## Example: Motor Control Node ```python # simplified import horus def motor_tick(node): cmd = node.recv("cmd_vel") if cmd is None: return # Differential drive kinematics wheel_base = 0.3 # meters left = cmd.linear - cmd.angular * wheel_base / 2 right = cmd.linear + cmd.angular * wheel_base / 2 # Clamp to safe range max_speed = 1.0 left = max(-max_speed, min(max_speed, left)) right = max(-max_speed, min(max_speed, right)) node.send("drive", horus.DifferentialDriveCommand( left_velocity=left, right_velocity=right, enable=True, )) def motor_shutdown(node): # SAFETY: stop both motors node.send("drive", horus.DifferentialDriveCommand( left_velocity=0.0, right_velocity=0.0, enable=False, )) motor = horus.Node( 
name="motor_ctrl", subs=[horus.CmdVel], pubs=[horus.DifferentialDriveCommand], tick=motor_tick, shutdown=motor_shutdown, rate=100, order=10, on_miss="safe_mode", ) horus.run(motor, rt=True) ``` --- ## See Also - [Sensor Messages](/python/api/sensor-messages) — Imu, LaserScan, Odometry - [Geometry Messages](/python/api/geometry-messages) — Pose2D, Twist, Vector3 - [Standard Messages](/python/api/messages) — Full message catalog - [Differential Drive Recipe](/recipes/differential-drive) — Complete kinematics example - [Rust Control Messages](/rust/api/control-messages) — Rust equivalent --- ## Error Handling (Python) Path: /python/error-handling # Error Handling (Python) Every production node fails eventually. A sensor disconnects, shared memory fills up, a transform goes stale. What matters is how your node responds. This page covers HORUS exception types, failure policies, error callbacks, and defensive patterns that keep your robot running when things go wrong. --- ## Exception Types HORUS raises three domain-specific exceptions that map to distinct failure modes. Import them from `horus`: ```python # simplified from horus import HorusNotFoundError, HorusTransformError, HorusTimeoutError ``` ### HorusNotFoundError Raised when a topic, transform frame, or node does not exist. ```python # simplified try: data = node.recv("nonexistent.topic") except HorusNotFoundError as e: node.log_error(f"Topic missing: {e}") # Error message includes a hint: "Run: horus topic list" ``` **Common triggers:** - Subscribing to a topic that no publisher has created yet - Looking up a transform frame that was never broadcast - Querying a node that has not been registered with the scheduler ### HorusTransformError Raised when a coordinate transform cannot be computed. 
Two sub-cases: - **Extrapolation** -- the requested timestamp is outside the buffered range - **Stale data** -- the transform exists but has not been updated recently ```python # simplified from horus import TransformFrame, HorusTransformError tf = TransformFrame() try: transform = tf.lookup("base_link", "camera_link") except HorusTransformError as e: node.log_warning(f"Transform unavailable: {e}") # Hint may suggest using tf_at() for clamped lookup ``` ### HorusTimeoutError Raised when a blocking operation exceeds its deadline. The error message includes the resource name, elapsed time, and the deadline that was exceeded. ```python # simplified from horus import HorusTimeoutError try: data = node.recv("lidar.scan", timeout=0.5) except HorusTimeoutError: node.log_warning("LiDAR scan not received within 500ms") ``` ### Standard Python Exceptions HORUS also raises standard Python exceptions for input and system errors: | Exception | When | |-----------|------| | `ValueError` | Invalid input: bad topic name, invalid config, parse failure | | `TypeError` | Serialization failure: wrong message type for a typed topic | | `IOError` | File or shared memory I/O failure | | `MemoryError` | Shared memory allocation failed (SHM segment full) | | `RuntimeError` | Internal error or unclassified failure | | `KeyError` | Missing key in driver parameters | --- ## Failure Policies When `tick()` raises an unhandled exception, the scheduler applies the node's `failure_policy` to decide what happens next. 
Set it on the node: ```python # simplified import horus node = horus.Node( name="sensor", tick=read_sensor, rate=100, failure_policy="restart", ) ``` ### Policy Reference | Policy | Behavior | When to use | |--------|----------|-------------| | `"fatal"` | Stops the entire scheduler immediately | Safety-critical nodes where any error means stop | | `"restart"` | Retries the node (up to max retries with backoff) | Nodes that recover from transient failures | | `"skip"` | Skips the failed tick, continues on next cycle | Sensor nodes where one missed reading is acceptable | | `"ignore"` | Swallows the exception silently | Logging or telemetry nodes that must never stop the system | **Default behavior:** If no `failure_policy` is set, unhandled exceptions propagate to the scheduler, which logs the error and continues running other nodes. Set an explicit policy for every production node. ### How Each Policy Works **Fatal** stops everything. The scheduler calls `shutdown()` on all nodes and exits. Use this for motor controllers or safety monitors where an error means the robot is in an unknown state. ```python # simplified motor = horus.Node( name="motor_ctrl", tick=motor_tick, rate=1000, failure_policy="fatal", ) ``` **Restart** retries the node with exponential backoff. The scheduler calls `shutdown()` then `init()` again before resuming ticks. If retries are exhausted, the node is marked unhealthy and removed from the tick loop. ```python # simplified camera = horus.Node( name="camera", tick=capture_frame, rate=30, failure_policy="restart", ) ``` **Skip** drops the current tick and moves on. The scheduler increments an error counter and continues to the next tick cycle. The node's state is preserved -- `init()` is not called again. ```python # simplified logger = horus.Node( name="db_logger", tick=log_to_database, rate=10, failure_policy="skip", ) ``` **Ignore** swallows the exception without logging. The scheduler does not even increment the error counter. 
Use sparingly -- silent failures are hard to debug.

---

## The on_error Callback

For custom error handling, pass an `on_error` function to `Node()`. It runs before the failure policy kicks in:

```python
# simplified
import horus

error_count = 0

def handle_error(node, exception):
    global error_count
    error_count += 1
    node.log_error(f"Error #{error_count}: {exception}")
    if error_count > 10:
        node.log_error("Too many errors -- escalating to failure_policy")
        raise exception  # Re-raise to trigger failure_policy

node = horus.Node(
    name="sensor",
    tick=read_sensor,
    rate=100,
    on_error=handle_error,
    failure_policy="skip",
)
```

### on_error Flow

1. `tick()` raises an exception
2. `on_error(node, exception)` is called
3. If `on_error` returns normally, the exception is **suppressed** -- the scheduler continues as if tick succeeded
4. If `on_error` re-raises the exception, the failure policy takes over
5. If `on_error` raises a different exception, the **original** exception is the one passed on to the failure policy

This means `on_error` acts as a filter. Return normally to swallow the error. Re-raise to escalate.

```python
# simplified
from horus import HorusTimeoutError

def selective_handler(node, exception):
    if isinstance(exception, HorusTimeoutError):
        node.log_warning("Timeout -- will retry next tick")
        return  # Swallow timeout errors
    # All other errors escalate to failure_policy
    raise exception
```

---

## Structured Logging

HORUS provides four logging methods on the node object.
These integrate with the scheduler's structured logging pipeline and appear in `horus logs` output: ```python # simplified def tick(node): node.log_debug("Processing frame 42") node.log_info("Detection found: person at (120, 340)") node.log_warning("LiDAR signal weak -- SNR below threshold") node.log_error("Motor controller not responding") ``` **Logging only works during `init()`, `tick()`, and `shutdown()`.** Calling `node.log_info()` outside the scheduler lifecycle (before `run()` or after shutdown) will emit a `RuntimeWarning` and the message will be dropped. Use standard `print()` or Python's `logging` module for setup-time diagnostics. ### Log Levels in Practice | Method | Use for | Shows in `horus logs` | |--------|---------|----------------------| | `node.log_debug()` | Internal state, per-frame values | Only with `--verbose` | | `node.log_info()` | State changes, detections, milestones | Default output | | `node.log_warning()` | Degraded operation, approaching limits | Default + highlighted | | `node.log_error()` | Failures that trigger recovery | Default + highlighted | --- ## Common Errors and Fixes ### Topic Not Found The subscriber starts before the publisher. The topic does not exist yet. ```python # simplified def tick(node): if not node.has_msg("lidar.scan"): return # No data yet -- skip this tick data = node.recv("lidar.scan") process(data) ``` **Fix:** Always check `node.has_msg()` before `node.recv()`. This is the standard pattern for handling publisher/subscriber startup order. ### Shared Memory Full The ring buffer is full because the consumer is slower than the producer. ```python # simplified def tick(node): try: node.send("big.pointcloud", cloud_data) except MemoryError: node.log_warning("SHM full -- dropping frame") ``` **Fix:** Increase the ring buffer capacity or reduce the publishing rate. 
The `default_capacity` parameter on `Node()` controls buffer size: ```python # simplified node = horus.Node( name="publisher", tick=publish_fn, rate=30, pubs=["big.pointcloud"], default_capacity=4096, # Larger buffer (default: 1024) ) ``` You can also clean up stale shared memory segments with `horus clean --shm`. ### Permission Denied (SHM) Shared memory segments created by one user cannot be accessed by another. ```python # simplified # This error appears as an IOError try: node.send("topic", data) except IOError as e: if "Permission denied" in str(e): node.log_error("SHM permission error -- check user/group") ``` **Fix:** Run all nodes as the same user, or clean stale segments with `horus clean --shm` and restart. ### Transform Stale A transform frame has not been updated recently. ```python # simplified from horus import TransformFrame, HorusTransformError tf = TransformFrame() def tick(node): try: t = tf.lookup("base_link", "camera_link") except HorusTransformError: node.log_warning("Camera transform stale -- using last known") return process_with_transform(t) ``` **Fix:** Ensure the sensor driver publishing the transform is running and healthy. Check with `horus frame list`. --- ## Defensive tick() Pattern A production-ready tick function handles all expected failures explicitly, uses `on_error` as a safety net, and lets the failure policy handle everything else. 
```python
# simplified
import horus
from horus import HorusNotFoundError, HorusTimeoutError

def motor_tick(node):
    # Guard: skip if no command available
    if not node.has_msg("cmd_vel"):
        return

    try:
        cmd = node.recv("cmd_vel")
    except HorusNotFoundError:
        node.log_error("cmd_vel topic disappeared")
        return

    try:
        result = apply_motor_command(cmd)
    except TimeoutError:
        node.log_error("Motor hardware timeout")
        emergency_stop()
        return
    except ValueError as e:
        node.log_warning(f"Invalid command: {e}")
        return

    node.send("motor.status", {"applied": True, "velocity": result})

def motor_error_handler(node, exception):
    """Last resort before failure_policy."""
    node.log_error(f"Unhandled motor error: {exception}")
    emergency_stop()
    raise exception  # Escalate to failure_policy="fatal"

motor = horus.Node(
    name="motor_ctrl",
    tick=motor_tick,
    rate=1000,
    order=0,
    subs=["cmd_vel"],
    pubs=["motor.status"],
    on_error=motor_error_handler,
    failure_policy="fatal",
)
horus.run(motor, rt=True)
```

**Structure:**

1. **Guard clause** -- `has_msg()` check, return early if no data
2. **Specific try/except** -- catch expected exceptions by type
3. **Handle and continue** -- log, take corrective action, return
4. **on_error as safety net** -- catches anything tick() missed
5. **failure_policy as last resort** -- scheduler-level response

---

## Exception Handling Anti-Patterns

### Bare except

Never use bare `except:` -- it catches `KeyboardInterrupt` and `SystemExit`, preventing clean shutdown:

```python
# simplified
# BAD: catches Ctrl+C, prevents shutdown
def tick(node):
    try:
        data = node.recv("topic")
    except:
        pass

# GOOD: catch specific exceptions
def tick(node):
    try:
        data = node.recv("topic")
    except (HorusNotFoundError, HorusTimeoutError) as e:
        node.log_warning(f"Expected error: {e}")
```

### Catch-and-ignore in Safety Nodes

Swallowing exceptions in a safety-critical node defeats the purpose of the failure policy:

```python
# simplified
# BAD: hides failures in a critical node
def safety_tick(node):
    try:
        check_limits()
    except Exception:
        pass  # "It's fine."

# GOOD: let failure_policy handle it
def safety_tick(node):
    check_limits()  # If this fails, failure_policy="fatal" stops everything
```

### Logging without Action

Catching an exception just to log it, then re-raising, adds noise without value:

```python
# simplified
# UNNECESSARY: the scheduler already logs unhandled exceptions
def tick(node):
    try:
        process()
    except Exception as e:
        node.log_error(f"Error: {e}")
        raise  # Scheduler logs this again

# BETTER: either handle it or don't catch it
def tick(node):
    process()  # Let failure_policy handle errors
```

---

## Design Decisions

**Why three custom exceptions instead of one?** `HorusNotFoundError`, `HorusTransformError`, and `HorusTimeoutError` represent distinct failure modes with different recovery strategies. A missing topic means a node has not started. A stale transform means a sensor stopped publishing. A timeout means the system is overloaded. Catching a generic `HorusError` would force every handler to inspect the message string. Separate types let you write precise except clauses.

**Why does on_error suppress by default?** If `on_error` returns normally, the exception is swallowed.
This lets `on_error` act as a filter -- handle what you know, re-raise what you do not. The alternative (always propagating after `on_error`) would make custom error handling useless, since the failure policy would fire regardless. Swallow-by-default gives the callback full control. **Why no exception hierarchy?** The three HORUS exceptions inherit directly from `Exception`, not from a shared `HorusError` base class. This is intentional. In practice, you almost never want to catch "any HORUS error" -- you want to catch a specific failure mode and respond accordingly. A base class encourages overly broad except clauses. If you truly need to catch all three, list them explicitly: `except (HorusNotFoundError, HorusTransformError, HorusTimeoutError)`. **Why standard exceptions for input errors?** Invalid topic names raise `ValueError`, not a custom exception. Serialization failures raise `TypeError`. These are programming errors, not runtime failures. Standard exceptions mean you do not need to import HORUS-specific types to catch bugs in your own code, and linters/IDEs already understand them. ## Trade-offs **failure_policy vs on_error:** The failure policy is coarse-grained (one policy per node) but reliable (enforced by the scheduler, works even if Python crashes). `on_error` is fine-grained but fragile (runs in Python, can itself fail). For safety-critical nodes, rely on `failure_policy="fatal"` and keep `on_error` simple. For non-critical nodes, `on_error` gives flexibility to implement retry logic, circuit breakers, or rate-limited alerts. **Skip vs Ignore:** `"skip"` logs the error and increments the error counter. `"ignore"` does neither. Use `"skip"` unless you have measured that the logging overhead is unacceptable at your tick rate. Silent failures with `"ignore"` make debugging production issues significantly harder. **Defensive tick vs let-it-crash:** Adding try/except for every expected failure makes `tick()` verbose but predictable. 
Removing all error handling and relying on `failure_policy="restart"` is simpler code but slower recovery (restart calls `shutdown()` then `init()` again). The right balance depends on your recovery cost: if `init()` takes 2 seconds to reconnect to hardware, defensive handling inside `tick()` avoids that penalty. --- ## See Also - [Python Bindings](/python/api/python-bindings) -- Core API reference - [Async Nodes](/python/api/async-nodes) -- Error handling in async tick functions - [Production Deployment](/python/production) -- Failure policies in production --- ## Geometry Messages Path: /python/api/geometry-messages Description: Poses, twists, vectors, quaternions, and transforms for spatial robotics # Geometry Messages Spatial data types for positions, orientations, velocities, and coordinate transforms. ```python # simplified from horus import Pose2D, Pose3D, Twist, Vector3, Quaternion, TransformStamped ``` --- ## Pose2D 2D position and orientation — the most common pose type for ground robots. ```python # simplified pose = Pose2D(x=1.0, y=2.0, theta=0.5) # meters, radians ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `x` | `float` | m | X position | | `y` | `float` | m | Y position | | `theta` | `float` | rad | Heading angle | | `timestamp_ns` | `int` | ns | Timestamp | | Static Method | Returns | Description | |---------------|---------|-------------| | `Pose2D.origin()` | `Pose2D` | Create a pose at the origin (0, 0, 0) | | Method | Returns | Description | |--------|---------|-------------| | `distance_to(other)` | `float` | Euclidean distance to another Pose2D | | `normalize_angle()` | `None` | Normalize theta to [-pi, pi] in-place | | `is_valid()` | `bool` | Check for NaN/inf values | --- ## Pose3D 6-DOF pose — position + quaternion orientation. 
```python # simplified pose = Pose3D( x=1.0, y=2.0, z=0.5, qx=0.0, qy=0.0, qz=0.0, qw=1.0, # identity rotation ) ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `x`, `y`, `z` | `float` | m | Position | | `qx`, `qy`, `qz`, `qw` | `float` | — | Quaternion orientation | | `timestamp_ns` | `int` | ns | Timestamp | | Static Method | Returns | Description | |---------------|---------|-------------| | `Pose3D.identity()` | `Pose3D` | Identity pose (origin, no rotation) | | `Pose3D.from_pose_2d(pose)` | `Pose3D` | Create from a Pose2D (z=0, rotation around z-axis) | | Method | Returns | Description | |--------|---------|-------------| | `distance_to(other)` | `float` | Euclidean distance to another Pose3D | | `is_valid()` | `bool` | Check for NaN/inf values | --- ## Twist 6-DOF velocity — linear + angular. ```python # simplified twist = Twist( linear_x=0.5, linear_y=0.0, linear_z=0.0, angular_x=0.0, angular_y=0.0, angular_z=0.3, ) ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear_x`, `linear_y`, `linear_z` | `float` | m/s | Linear velocity | | `angular_x`, `angular_y`, `angular_z` | `float` | rad/s | Angular velocity | | `timestamp_ns` | `int` | ns | Timestamp | | Static Method | Returns | Description | |---------------|---------|-------------| | `Twist.stop()` | `Twist` | Zero velocity (all components 0) | | `Twist.new_2d(linear_x=0.0, angular_z=0.0)` | `Twist` | 2D twist (only forward + yaw) | | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf values | > **ROS2 equivalent:** `geometry_msgs/msg/Twist` --- ## Vector3 3D vector for general-purpose spatial math. 
```python # simplified v = Vector3(x=1.0, y=0.0, z=0.0) ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `float` | Vector components | | Static Method | Returns | Description | |---------------|---------|-------------| | `Vector3.zero()` | `Vector3` | Zero vector (0, 0, 0) | | Method | Returns | Description | |--------|---------|-------------| | `magnitude()` | `float` | Vector length (L2 norm) | | `normalize()` | `None` | Normalize to unit length in-place | | `normalized()` | `Vector3` | Return a new unit-length vector (does not mutate self) | | `dot(other)` | `float` | Dot product with another Vector3 | | `cross(other)` | `Vector3` | Cross product with another Vector3 | --- ## Point3 3D point (semantically distinct from Vector3 — a position, not a direction). ```python # simplified p = Point3(x=1.0, y=2.0, z=3.0) ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `float` | Point coordinates | | Static Method | Returns | Description | |---------------|---------|-------------| | `Point3.origin()` | `Point3` | Origin point (0, 0, 0) | | Method | Returns | Description | |--------|---------|-------------| | `distance_to(other)` | `float` | Euclidean distance to another Point3 | --- ## Quaternion Rotation quaternion (x, y, z, w format). 
```python # simplified q = Quaternion(x=0.0, y=0.0, z=0.0, w=1.0) # identity ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z`, `w` | `float` | Quaternion components | | Static Method | Returns | Description | |---------------|---------|-------------| | `Quaternion.identity()` | `Quaternion` | Identity rotation (0, 0, 0, 1) | | `Quaternion.from_euler(roll=0.0, pitch=0.0, yaw=0.0)` | `Quaternion` | Create from Euler angles (radians) | | Method | Returns | Description | |--------|---------|-------------| | `normalize()` | `None` | Normalize to unit quaternion in-place | | `is_valid()` | `bool` | Check if quaternion is approximately unit length | --- ## Accel 6-DOF acceleration. ```python # simplified accel = Accel( linear_x=0.0, linear_y=0.0, linear_z=9.81, angular_x=0.0, angular_y=0.0, angular_z=0.0, ) ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear_x`, `linear_y`, `linear_z` | `float` | m/s² | Linear acceleration | | `angular_x`, `angular_y`, `angular_z` | `float` | rad/s² | Angular acceleration | | `timestamp_ns` | `int` | ns | Timestamp | | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check for NaN/inf values | ## TransformStamped Timestamped rigid-body transform (translation + rotation). 
### Constructor ```python # simplified tf = horus.TransformStamped( tx=0.1, ty=0.0, tz=0.3, rx=0.0, ry=0.0, rz=0.0, rw=1.0, timestamp_ns=horus.timestamp_ns(), ) ``` | Field | Type | Description | |-------|------|-------------| | `tx`, `ty`, `tz` | `float` | Translation (meters) | | `rx`, `ry`, `rz`, `rw` | `float` | Rotation quaternion | | `timestamp_ns` | `int` | Timestamp in nanoseconds | ### Static Methods | Method | Returns | Description | |--------|---------|-------------| | `TransformStamped.identity()` | `TransformStamped` | Identity transform (no translation or rotation) | | `TransformStamped.from_pose_2d(pose)` | `TransformStamped` | Create from a `Pose2D` (z=0, rotation around z-axis) | ### Methods | Method | Returns | Description | |--------|---------|-------------| | `is_valid()` | `bool` | Check if the rotation quaternion is normalized | | `normalize_rotation()` | `None` | Normalize the rotation quaternion in-place | ## PoseStamped Timestamped pose. ```python # simplified ps = horus.PoseStamped( x=1.0, y=2.0, z=0.0, qx=0.0, qy=0.0, qz=0.0, qw=1.0, frame_id="map", timestamp_ns=horus.timestamp_ns(), ) ``` ## PoseWithCovariance Pose with uncertainty estimate (6x6 covariance matrix, row-major). 
```python # simplified pwc = PoseWithCovariance( x=1.0, y=2.0, z=0.0, qx=0.0, qy=0.0, qz=0.0, qw=1.0, frame_id="map", ) pwc.covariance = [0.01] * 36 # 6x6 row-major ``` | Field | Type | Description | |-------|------|-------------| | `x`, `y`, `z` | `float` | Position | | `qx`, `qy`, `qz`, `qw` | `float` | Quaternion orientation | | `covariance` | `list[float]` | 6x6 covariance matrix (36 elements, row-major) | | `frame_id` | `str` | Coordinate frame | | `timestamp_ns` | `int` | Timestamp | | Method | Returns | Description | |--------|---------|-------------| | `position_variance()` | `list[float]` | Diagonal elements [0,0], [1,1], [2,2] — position uncertainty (x, y, z) | | `orientation_variance()` | `list[float]` | Diagonal elements [3,3], [4,4], [5,5] — orientation uncertainty (roll, pitch, yaw) | ## TwistWithCovariance Twist with uncertainty estimate (6x6 covariance matrix, row-major). ```python # simplified twc = TwistWithCovariance( linear_x=0.5, linear_y=0.0, linear_z=0.0, angular_x=0.0, angular_y=0.0, angular_z=0.1, frame_id="base_link", ) twc.covariance = [0.01] * 36 ``` | Field | Type | Description | |-------|------|-------------| | `linear_x`, `linear_y`, `linear_z` | `float` | Linear velocity | | `angular_x`, `angular_y`, `angular_z` | `float` | Angular velocity | | `covariance` | `list[float]` | 6x6 covariance matrix (36 elements, row-major) | | `frame_id` | `str` | Coordinate frame | | `timestamp_ns` | `int` | Timestamp | | Method | Returns | Description | |--------|---------|-------------| | `linear_variance()` | `list[float]` | Diagonal elements [0,0], [1,1], [2,2] — linear velocity uncertainty | | `angular_variance()` | `list[float]` | Diagonal elements [3,3], [4,4], [5,5] — angular velocity uncertainty | --- ## See Also - [Sensor Messages](/python/api/sensor-messages) — Odometry uses Pose2D + Twist - [Navigation Messages](/python/api/navigation-messages) — Goals, paths, waypoints - [TransformFrame API](/python/api/transform-frame) — Coordinate frame 
tree - [Rust Geometry Messages](/rust/api/geometry-messages) — Rust equivalent --- ## Production Deployment (Python) Path: /python/production # Production Deployment (Python) Your Python HORUS nodes work on your laptop. Now they need to run 24/7 on a robot with no keyboard, no monitor, and nobody watching. This page covers virtual environments, dependency pinning, systemd services, logging, monitoring, garbage collection tuning, memory profiling, and the decision of what stays in Python versus what gets rewritten. --- ## Virtual Environment Setup Always isolate HORUS Python nodes in a virtual environment. System Python packages drift between OS updates and break silently. ```bash # Create a dedicated venv for your project python3 -m venv /opt/myrobot/venv # Activate and install horus source /opt/myrobot/venv/bin/activate pip install maturin cd /path/to/horus/horus_py maturin develop --release # Install your project dependencies pip install -r requirements.txt ``` ### venv in horus.toml Projects If you are using `horus.toml` for project management, HORUS generates a `.horus/pyproject.toml` from your manifest. The venv still works -- install the generated project after activation: ```bash source /opt/myrobot/venv/bin/activate cd /path/to/your/project horus build # generates .horus/pyproject.toml from horus.toml pip install -e .horus/ ``` --- ## Dependency Pinning Pin every dependency version. An unpinned `numpy` upgrade at 3 AM will crash your robot at 3:01 AM. 
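A pinning policy is easy to enforce mechanically in CI. The sketch below is one hypothetical way to flag unpinned entries in a `requirements.txt`-style file. `find_unpinned` is an illustrative helper, not part of HORUS or pip, and it assumes simple `name==version` lines (it does not follow `-r` includes or resolve URLs).

```python
import re

# Matches "name==version", optionally with extras and an environment marker.
# Deliberately simple: ranges, bare names, and URLs all count as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^=\s;]+(\s*;.*)?$")


def find_unpinned(lines):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = [
        "numpy==1.26.4",
        "scipy>=1.12",             # version range: flagged
        "opencv-python-headless",  # bare name: flagged
        "torch==2.2.1+cpu",
    ]
    for entry in find_unpinned(sample):
        print(f"unpinned: {entry}")
```

A CI job could read the real file with `open("requirements.txt").read().splitlines()` and fail the build when the returned list is non-empty.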
### requirements.txt ```text numpy==1.26.4 opencv-python-headless==4.9.0.80 torch==2.2.1+cpu onnxruntime==1.17.1 scipy==1.12.0 ``` Generate from your working environment: ```bash pip freeze > requirements.txt ``` ### horus.toml For HORUS-managed projects, pin in the manifest: ```toml [dependencies] numpy = { version = "1.26.4", source = "pypi" } opencv-python-headless = { version = "4.9.0.80", source = "pypi" } torch = { version = "2.2.1+cpu", source = "pypi" } ``` HORUS generates `horus.lock` (lockfile v3) with exact resolved versions for reproducible installs across machines. ### CPU-Only PyTorch Production robots rarely have datacenter GPUs. Use the CPU-only torch build to save 2 GB of disk and avoid CUDA driver version mismatches: ```bash pip install torch==2.2.1+cpu --index-url https://download.pytorch.org/whl/cpu ``` --- ## systemd Service Files Run HORUS Python nodes as systemd services for automatic restart, logging, and boot-time startup. ### Basic Service ```ini # /etc/systemd/system/horus-myrobot.service [Unit] Description=HORUS MyRobot Nodes After=network.target [Service] Type=simple User=robot Group=robot WorkingDirectory=/opt/myrobot Environment=PATH=/opt/myrobot/venv/bin:/usr/local/bin:/usr/bin ExecStart=/opt/myrobot/venv/bin/python -u main.py Restart=on-failure RestartSec=3 StandardOutput=journal StandardError=journal # Shared memory access SupplementaryGroups= # Real-time scheduling (optional) LimitMEMLOCK=infinity LimitRTPRIO=99 [Install] WantedBy=multi-user.target ``` ### Key Settings | Setting | Value | Why | |---------|-------|-----| | `Type=simple` | Required | HORUS blocks on `horus.run()` | | `User=robot` | Dedicated user | Never run as root in production | | `-u` flag on Python | Required | Unbuffered output so journald gets logs immediately | | `Restart=on-failure` | Auto-restart | systemd restarts if the process exits non-zero | | `RestartSec=3` | 3 second delay | Prevents restart loops from burning CPU | | `LimitMEMLOCK=infinity` | For 
RT nodes | Allows memory locking to prevent page faults | | `LimitRTPRIO=99` | For RT nodes | Allows real-time scheduling priority | ### Enable and Start ```bash sudo systemctl daemon-reload sudo systemctl enable horus-myrobot.service sudo systemctl start horus-myrobot.service # Check status sudo systemctl status horus-myrobot.service # View logs journalctl -u horus-myrobot.service -f ``` ### Multi-Node Service with Separate Processes For process isolation, run each node as its own service: ```ini # /etc/systemd/system/horus-camera.service [Unit] Description=HORUS Camera Node After=network.target [Service] Type=simple User=robot Environment=PATH=/opt/myrobot/venv/bin:/usr/local/bin:/usr/bin ExecStart=/opt/myrobot/venv/bin/python -u nodes/camera_node.py Restart=on-failure RestartSec=2 [Install] WantedBy=horus-myrobot.target ``` ```ini # /etc/systemd/system/horus-planner.service [Unit] Description=HORUS Planner Node After=horus-camera.service [Service] Type=simple User=robot Environment=PATH=/opt/myrobot/venv/bin:/usr/local/bin:/usr/bin ExecStart=/opt/myrobot/venv/bin/python -u nodes/planner_node.py Restart=on-failure RestartSec=2 [Install] WantedBy=horus-myrobot.target ``` Use `After=` to express startup order between nodes. Use a shared `.target` to start/stop the entire robot stack as one unit. --- ## Log Collection HORUS nodes produce two streams of logs: structured logs from `node.log_*()` calls and standard output from `print()`. 
### horus logs The `horus logs` CLI command reads the structured log stream: ```bash # Follow logs from all running nodes horus logs -f # Filter by node name horus logs -f --node camera # Filter by level horus logs -f --level warning ``` ### node.log_* Output Inside your tick function, use the structured logging methods: ```python # simplified def tick(node): node.log_info("Frame processed") node.log_warning("Latency spike: 12ms") node.log_error("Motor timeout") node.log_debug("Raw encoder: 4821") ``` These go through the scheduler's logging pipeline, tagged with the node name and timestamp. They appear in `horus logs` and, when running under systemd, in the journal. ### journald Integration When running as a systemd service, all output (structured logs and print statements) goes to the journal: ```bash # Live follow journalctl -u horus-myrobot.service -f # Last 100 lines journalctl -u horus-myrobot.service -n 100 # Since last boot journalctl -u horus-myrobot.service -b # Export for analysis journalctl -u horus-myrobot.service --output=json > logs.json ``` ### Log Rotation journald handles rotation automatically. 
For long-running deployments, configure retention: ```ini # /etc/systemd/journald.conf.d/horus.conf [Journal] SystemMaxUse=500M MaxRetentionSec=7d ``` --- ## Performance Tuning ### What Stays in Python Python is the right choice for nodes that are I/O-bound, compute-heavy-but-batchable, or change frequently: | Node type | Why Python works | Typical rate | |-----------|-----------------|--------------| | ML inference | PyTorch/ONNX ecosystem, GPU offload | 10-30 Hz | | Data logging | I/O-bound (disk, database, network) | 1-10 Hz | | Path planning | SciPy/NumPy, `compute=True` offloads to thread pool | 1-10 Hz | | Visualization | matplotlib, OpenCV display | 1-30 Hz | | HTTP/API integration | aiohttp, async nodes handle I/O naturally | 0.1-10 Hz | | Prototyping | Fast iteration, no compile step | Any | ### When to Rewrite a Python Node in Another Language Rewrite a node when Python becomes the bottleneck, not before. Profile first: | Signal | What it means | Action | |--------|---------------|--------| | `tick()` consistently exceeds budget | CPU-bound work is too slow | Profile, optimize, then rewrite hot path | | Deadline misses under load | GIL contention or GC pauses | Try `gc.disable()`, then rewrite if still missing | | Memory growing unbounded | Python object overhead | Profile with tracemalloc, rewrite if unfixable | | Latency jitter >1ms at >100Hz | Python overhead is inherent | Rewrite -- Python cannot do sub-ms deterministic ticks | **The practical threshold:** If your node needs deterministic ticks above 100 Hz, or sub-millisecond jitter, rewrite it. Below that, Python is fine. --- ## Monitoring Python Nodes ### horus monitor The `horus monitor` command shows a live dashboard of all running nodes: ```bash horus monitor ``` This shows per-node tick rate, budget usage, deadline misses, error counts, and health state. 
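The dashboard reports aggregates. To judge a workload against the jitter threshold described under Performance Tuning, it can also help to measure raw period jitter offline. The following is a minimal standalone sketch using only the standard library. `measure_jitter` and its parameters are hypothetical names, and it times a plain sleep loop rather than a HORUS scheduler tick, so treat the numbers as a rough indication of what the interpreter and OS deliver on the target machine.

```python
import statistics
import time


def measure_jitter(rate_hz, ticks, work=lambda: None):
    """Run `work` at a target rate; return period-jitter stats in microseconds."""
    period_ns = int(1e9 / rate_hz)
    deadline = time.perf_counter_ns()
    stamps = []
    for _ in range(ticks):
        deadline += period_ns
        work()
        # Sleep until the next absolute deadline so errors do not accumulate.
        remaining = deadline - time.perf_counter_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        stamps.append(time.perf_counter_ns())
    # Jitter = deviation of each observed period from the nominal period.
    periods = [b - a for a, b in zip(stamps, stamps[1:])]
    errors_us = [abs(p - period_ns) / 1000 for p in periods]
    return {
        "mean_us": statistics.fmean(errors_us),
        "max_us": max(errors_us),
        "samples": len(errors_us),
    }


if __name__ == "__main__":
    print(measure_jitter(rate_hz=100, ticks=200))
```

Passing the node's real per-tick computation as `work` gives a first estimate of whether it fits the budget before wiring it into the scheduler.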
### Programmatic Monitoring Use the `Scheduler` API to query stats from within your code: ```python # simplified import horus sched = horus.Scheduler(tick_rate=1000, rt=True) sched.add(sensor_node) sched.add(planner_node) # Start in background or check after running stats = sched.get_node_stats("sensor") print(f"Total ticks: {stats['total_ticks']}") print(f"Errors: {stats['errors_count']}") ``` ### safety_stats() For safety-critical deployments, query the safety monitor: ```python # simplified safety = sched.safety_stats() if safety: print(f"Watchdog: {safety}") # Returns dict with watchdog stats, deadline misses, health states ``` ### Health Checks Build a health-check endpoint for external monitoring (Prometheus, Grafana, fleet manager): ```python # simplified import horus import json from http.server import HTTPServer, BaseHTTPRequestHandler sched = None class HealthHandler(BaseHTTPRequestHandler): def do_GET(self): stats = {} for name in ["camera", "planner", "motor"]: stats[name] = sched.get_node_stats(name) healthy = all(s.get("errors_count", 0) == 0 for s in stats.values()) self.send_response(200 if healthy else 503) self.send_header("Content-Type", "application/json") self.end_headers() self.wfile.write(json.dumps(stats).encode()) def log_message(self, format, *args): pass # Suppress access logs def start_health_server(): server = HTTPServer(("0.0.0.0", 8080), HealthHandler) server.serve_forever() ``` Run the health server in a background thread or as a separate async node. --- ## Garbage Collection Tuning Python's garbage collector introduces non-deterministic pauses. For nodes with timing constraints, tune or disable it. 
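Before tuning, it helps to know how long collections actually take in your process. A minimal sketch, using the standard library's `gc.callbacks` hook to time each collection. The `GcPauseTimer` class and its fields are hypothetical names for illustration and are not part of HORUS.

```python
import gc
import time


class GcPauseTimer:
    """Record the duration of every garbage collection via gc.callbacks."""

    def __init__(self):
        self.pauses_us = []  # list of (generation, duration in microseconds)
        self._start_ns = None

    def __call__(self, phase, info):
        # CPython invokes callbacks with phase "start" before a collection
        # and "stop" after it; info carries the generation being collected.
        if phase == "start":
            self._start_ns = time.perf_counter_ns()
        elif phase == "stop" and self._start_ns is not None:
            elapsed_us = (time.perf_counter_ns() - self._start_ns) / 1000
            self.pauses_us.append((info["generation"], elapsed_us))
            self._start_ns = None

    def install(self):
        gc.callbacks.append(self)

    def uninstall(self):
        gc.callbacks.remove(self)


if __name__ == "__main__":
    timer = GcPauseTimer()
    timer.install()
    # Create reference cycles, then force a full collection.
    for _ in range(10_000):
        cycle = []
        cycle.append(cycle)
    gc.collect()
    timer.uninstall()
    worst = max(duration for _, duration in timer.pauses_us)
    print(f"{len(timer.pauses_us)} collections, worst pause {worst:.0f} us")
```

Comparing the worst recorded pause with the node's tick budget indicates whether disabling, tuning, or leaving the collector alone is the appropriate choice.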
### Disable GC for Real-Time Nodes If your node has a tight budget (sub-10ms) and allocates few objects per tick, disable GC entirely: ```python # simplified import gc import horus def init(node): gc.disable() node.log_info("GC disabled for RT node") def tick(node): # Pre-allocated buffers only -- no new objects per tick cmd = node.recv("cmd_vel") if cmd: apply_command(cmd) motor = horus.Node( name="motor", tick=tick, init=init, rate=1000, subs=["cmd_vel"], failure_policy="fatal", ) ``` **Requirement:** When GC is disabled, you must not create circular references. Use pre-allocated buffers, avoid closures that capture `self`, and avoid building data structures in `tick()`. Reference counting still frees acyclic objects immediately, but any reference cycle created while GC is disabled is not reclaimed unless you call `gc.collect()` yourself. ### Tune GC Thresholds for Other Nodes For nodes that allocate objects (ML inference, data processing), tune the thresholds instead of disabling: ```python # simplified import gc def init(node): # Default: (700, 10, 10) # Raise gen0 threshold to reduce collection frequency gc.set_threshold(1500, 15, 15) node.log_info(f"GC thresholds: {gc.get_threshold()}") ``` Higher thresholds mean fewer GC pauses but higher peak memory usage. Measure both latency and memory for your workload. ### Manual GC Between Ticks For the best control, disable automatic GC and trigger collection manually during idle periods: ```python # simplified import gc import horus gc.disable() def tick(node): if not node.has_msg("camera.rgb"): # No frame to process -- good time to collect gc.collect(generation=0) # Only gen0, fast (~100us) return frame = node.recv("camera.rgb") detect(frame) ``` --- ## Memory Profiling ### tracemalloc for Leak Detection Python nodes running for days can leak memory through accumulating references. 
Use `tracemalloc` to find the source: ```python # simplified import tracemalloc import horus tracemalloc.start(10) # Keep 10 frames of traceback tick_count = 0 baseline = None def tick(node): global tick_count, baseline tick_count += 1 # Normal work process_data(node) # Snapshot every 10000 ticks if tick_count % 10000 == 0: snapshot = tracemalloc.take_snapshot() if baseline is None: baseline = snapshot else: stats = snapshot.compare_to(baseline, "lineno") for stat in stats[:5]: node.log_warning(f"Memory growth: {stat}") ``` ### What to Look For | Pattern | Likely cause | Fix | |---------|-------------|-----| | Steady growth in one file/line | List or dict accumulating entries | Cap size or use `collections.deque(maxlen=N)` | | Growth in `node.recv()` calls | Holding references to old messages | Process and discard, do not store | | Growth in `json.loads()` | String interning or dict caching | Use `msgpack` or typed messages instead | | Growth in third-party library | Library-internal caching | Check library docs for cache control | ### Resource Monitoring Monitor system resources from within a node: ```python # simplified import os import resource import horus def monitor_tick(node): # Peak RSS in MB (ru_maxrss is the high-water mark, not current usage) usage = resource.getrusage(resource.RUSAGE_SELF) rss_mb = usage.ru_maxrss / 1024 # Linux reports KB; macOS reports bytes node.send("diagnostics.memory", { "rss_mb": rss_mb, "pid": os.getpid(), }) if rss_mb > 500: node.log_warning(f"High memory: {rss_mb:.0f} MB") monitor = horus.Node( name="resource_monitor", tick=monitor_tick, rate=1, pubs=["diagnostics.memory"], ) ``` --- ## Mixed Deployments The most effective production architectures combine Python and other HORUS-supported languages. Each language handles what it does best, communicating through zero-copy shared memory topics. 
### Typical Architecture ``` Camera Driver (high-freq, safety) ──→ camera.rgb topic ML Inference (Python, PyTorch) ←── camera.rgb topic ──→ detections topic Path Planner (Python, scipy) ←── detections topic ──→ path topic Motor Controller (high-freq, RT) ←── path topic ──→ motor.status topic Safety Monitor (high-freq, RT) ←── motor.status topic ``` Safety-critical nodes (camera driver, motor controller, safety monitor) benefit from compiled languages. Python handles ML inference and path planning where ecosystem libraries matter more than tick latency. ### Running Together Each process runs independently. They communicate through HORUS topics over shared memory: ```bash # Terminal 1: compiled safety-critical nodes horus run safety_stack # Terminal 2: Python ML nodes source /opt/myrobot/venv/bin/activate python ml_nodes.py # Terminal 3: Python planner source /opt/myrobot/venv/bin/activate python planner.py ``` Or use systemd to manage all processes: ```ini # /etc/systemd/system/horus-safety.service [Service] ExecStart=/usr/local/bin/horus run safety_stack # /etc/systemd/system/horus-ml.service [Service] ExecStart=/opt/myrobot/venv/bin/python -u ml_nodes.py # /etc/systemd/system/horus-planner.service [Service] ExecStart=/opt/myrobot/venv/bin/python -u planner.py ``` ### The Handoff Pattern When a Python prototype node gets promoted to production, the topic interface stays the same. Only the implementation changes: ```python # simplified # Python prototype (runs at 30 Hz, good enough for testing) def planner_tick(node): scan = node.recv("lidar.scan") if scan: path = compute_path(scan) # scipy A* node.send("path", path) ``` The compiled replacement subscribes to the same topics and publishes the same messages. No other node needs to change. This is the key benefit of topic-based IPC: language boundaries are invisible to the rest of the system. 
--- ## Pre-Deployment Checklist Before shipping Python nodes to production: - [ ] All dependencies pinned in `requirements.txt` or `horus.toml` - [ ] Virtual environment created and tested on target hardware - [ ] systemd service file with `Restart=on-failure` - [ ] `failure_policy` set on every node (not relying on defaults) - [ ] `node.log_*()` used instead of `print()` for operational messages - [ ] GC tuned or disabled for nodes with timing constraints - [ ] Memory profiled under sustained load (run for hours, check RSS) - [ ] `horus monitor` shows all nodes healthy under load - [ ] Health-check endpoint accessible for external monitoring - [ ] Shared memory cleaned before first deploy (`horus clean --shm`) --- ## Design Decisions **Why venv instead of containers?** Containers add overhead (cgroup management, overlay filesystem, network namespacing) that hurts real-time performance. Shared memory IPC between containers requires explicit `--ipc=host` flags that defeat isolation. A virtual environment gives dependency isolation without the performance or IPC penalty. Use containers for CI/CD and development, not for production robots. **Why systemd instead of a HORUS-native process manager?** systemd is battle-tested, ships with every Linux distribution, integrates with journald for logging, and supports cgroup resource limits. Building a custom process manager would duplicate all of this poorly. The HORUS scheduler manages node execution within a process; systemd manages processes within the system. Each tool does what it does best. **Why not auto-detect which nodes need GC tuning?** Garbage collection impact depends on allocation patterns, object lifetimes, and timing requirements -- all application-specific. A node publishing pre-allocated IMU structs at 1000 Hz needs GC disabled. A node building detection lists at 10 Hz needs GC enabled. There is no heuristic that works for both. Explicit tuning by the developer is the only reliable approach. 
## Trade-offs **Python for ML vs compiled inference:** Python gives you the full PyTorch/ONNX/HuggingFace ecosystem. Compiled inference (ONNX Runtime C++, TensorRT) gives lower latency and no GIL. For most robotics workloads, Python inference at 10-30 Hz is fast enough. Rewrite when profiling shows that Python overhead (not model inference) is the bottleneck. **Single process vs multi-process:** Running all Python nodes in one process (one `horus.run()` call) shares the GIL. Running each node as a separate process (separate systemd services) avoids GIL contention but uses more memory and loses in-process topic shortcuts. Single process is simpler to deploy. Multi-process scales better when you have CPU-bound Python nodes competing for the GIL. **gc.disable() vs gc.set_threshold():** Disabling GC eliminates pauses completely but risks memory leaks if you create circular references. Tuning thresholds reduces pause frequency while still collecting cycles, at the cost of higher peak memory. For nodes with pre-allocated buffers and no circular references, disable. For nodes that build temporary data structures, tune thresholds. When in doubt, tune rather than disable -- an occasional longer pause is easier to debug than a mysterious OOM after 48 hours. **Pinned versions vs version ranges:** Pinned versions (`numpy==1.26.4`) guarantee reproducibility but require manual updates for security patches. Version ranges (`numpy>=1.26,<1.27`) allow patch updates but risk behavior changes. For production robots, pin everything. Run `pip install --upgrade` in CI, run your test suite, and pin the new versions explicitly. 
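Pins only help if the deployed environment actually matches them. As a sketch, a startup check can compare installed versions against the pinned file using the standard library's `importlib.metadata`. `check_pins` and its injectable `get_version` parameter are hypothetical helpers for illustration, and the parser assumes plain `name==version` lines without extras or environment markers.

```python
from importlib import metadata


def check_pins(lines, get_version=metadata.version):
    """Return (name, pinned, installed) for every pin the environment violates.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace with the lines of your requirements.txt.
    for name, pinned, installed in check_pins(["numpy==1.26.4"]):
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Running such a check before the nodes start turns silent environment drift into an explicit startup failure that systemd can log.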
--- ## See Also - [Error Handling](/python/error-handling) -- Exception types and failure policies - [Python Bindings](/python/api/python-bindings) -- Core Python API reference - [Async Nodes](/python/api/async-nodes) -- Async I/O nodes for HTTP and database - [ML Guide](/python/library/ml-guide) -- ML inference optimization --- ## Navigation Messages Path: /python/api/navigation-messages Description: Path planning, goals, waypoints, occupancy grids, and cost maps # Navigation Messages Data types for autonomous navigation — goals, paths, maps, and obstacle avoidance. ```python # simplified from horus import NavGoal, GoalResult, PathPlan, Waypoint, NavPath, OccupancyGrid, CostMap ``` --- ## NavGoal Navigation goal — target pose with tolerance. ```python # simplified import horus goal = horus.NavGoal( x=5.0, y=3.0, theta=1.57, position_tolerance=0.1, angle_tolerance=0.05, timeout=30.0, timestamp_ns=horus.timestamp_ns(), ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `x`, `y`, `theta` | `float` | `0.0` | Target pose (m, rad) | | `position_tolerance` | `float` | `0.1` | Position tolerance (m) | | `angle_tolerance` | `float` | `0.1` | Heading tolerance (rad) | | `timeout` | `float` | `30.0` | Goal timeout (seconds) | | `tolerance_position` | `float` | — | Position tolerance getter (same as position_tolerance) | | `tolerance_angle` | `float` | — | Angle tolerance getter (same as angle_tolerance) | | `timeout_seconds` | `float` | — | Timeout getter (same as timeout) | | `goal_id` | `int` | `0` | Goal identifier | | `priority` | `int` | `0` | Goal priority (higher = more important) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `with_timeout(seconds)` | `NavGoal` | Return a copy with timeout set | | `with_priority(priority)` | `NavGoal` | Return a copy with priority set | | `is_position_reached(current)` | `bool` | True if current 
`Pose2D` is within position tolerance | | `is_orientation_reached(current)` | `bool` | True if current `Pose2D` is within angle tolerance | | `is_reached(current)` | `bool` | True if both position and orientation are reached | ## GoalResult Navigation goal outcome. ```python # simplified result = horus.GoalResult(goal_id=1, status=0, progress=1.0, timestamp_ns=horus.timestamp_ns()) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `goal_id` | `int` | `0` | Associated goal identifier | | `status` | `int` | `0` | Result status code | | `progress` | `float` | `0.0` | Completion progress (0.0-1.0) | | `distance_to_goal` | `float` | `0.0` | Remaining distance to goal (m) | | `eta_seconds` | `float` | `0.0` | Estimated time of arrival (seconds) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `with_error(message)` | `GoalResult` | Return a copy with error message set | ## Waypoint Single waypoint in a path. ```python # simplified wp = horus.Waypoint(x=2.0, y=3.0, theta=0.0) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `x`, `y`, `theta` | `float` | `0.0` | Waypoint pose (m, rad) | | `pose` | `Pose2D` | — | Waypoint pose as Pose2D object | | `velocity` | `Twist` | — | Desired velocity at this waypoint | | `curvature` | `float` | `0.0` | Path curvature at this point | | `stop_required` | `bool` | `False` | Whether the robot must stop here | | Method | Returns | Description | |--------|---------|-------------| | `with_velocity(twist)` | `Waypoint` | Return a copy with velocity set | | `with_stop()` | `Waypoint` | Return a copy with stop_required=True | ## NavPath Ordered list of waypoints. Created with no parameters. 
```python # simplified path = horus.NavPath() ``` | Getter | Type | Description | |--------|------|-------------| | `waypoint_count` | `int` | Number of waypoints | | `total_length` | `float` | Total path length (m) | | `duration_seconds` | `float` | Estimated path duration (s) | | `timestamp_ns` | `int` | Timestamp | | Method | Returns | Description | |--------|---------|-------------| | `add_waypoint(waypoint)` | `None` | Append a `Waypoint` to the path | | `get_waypoints()` | `list` | Get all waypoints | | `closest_waypoint_index(pose)` | `int` or `None` | Index of closest waypoint to a `Pose2D` | | `calculate_progress(pose)` | `float` | Progress along path (0.0-1.0) for a `Pose2D` | ## PathPlan Planned path to a goal. ```python # simplified plan = horus.PathPlan(goal_x=5.0, goal_y=3.0, goal_theta=1.57, timestamp_ns=horus.timestamp_ns()) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `goal_x`, `goal_y`, `goal_theta` | `float` | `0.0` | Goal pose (m, rad) | | `waypoint_count` | `int` | — | Number of waypoints (getter only) | | `goal_pose` | `tuple` | — | Goal as (x, y, theta) tuple (getter only) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `PathPlan.from_waypoints(waypoints, goal_x, goal_y, goal_theta)` | `PathPlan` | Create from a list of [x, y, theta] waypoints | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `add_waypoint(x, y, theta)` | `bool` | Add a waypoint, returns False if full | | `waypoint(index)` | `tuple` or `None` | Get waypoint at index as (x, y, theta) | | `is_empty()` | `bool` | True if no waypoints | ## OccupancyGrid 2D grid map — each cell is free (0), occupied (100), or unknown (-1). 
```python # simplified grid = horus.OccupancyGrid(width=100, height=100, resolution=0.05) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `width`, `height` | `int` | `0` | Grid dimensions (cells) | | `resolution` | `float` | `0.05` | Meters per cell | | `origin` | `Pose2D` | — | Map origin (bottom-left corner) | | `data` | `list[int]` | — | Occupancy values (-1 to 100) | | `timestamp_ns` | `int` | `0` | Timestamp | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `occupancy(gx, gy)` | `int` or `None` | Get occupancy value at grid coordinates | | `set_occupancy(gx, gy, value)` | `bool` | Set occupancy value, returns False if out of bounds | | `is_free(x, y)` | `bool` | True if world point is free (occupancy 0-49) | | `is_occupied(x, y)` | `bool` | True if world point is occupied (occupancy >= 50) | | `world_to_grid(x, y)` | `tuple` or `None` | Convert world coords to grid indices | | `grid_to_world(gx, gy)` | `tuple` or `None` | Convert grid indices to world coords | ## CostMap Weighted grid for path planning — higher cost = harder to traverse. ```python # simplified costmap = horus.CostMap(grid=OccupancyGrid(), inflation_radius=0.55) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `occupancy_grid` | `OccupancyGrid` | — | Underlying occupancy grid (getter) | | `costs` | `list[int]` | — | Cost values (0-255) | | `inflation_radius` | `float` | `0.55` | Obstacle inflation radius (m) | | `cost_scaling_factor` | `float` | — | Cost decay factor | | `lethal_cost` | `int` | — | Cost threshold for lethal obstacles | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `cost(x, y)` | `int` or `None` | Get cost at world coordinates | | `compute_costs()` | `None` | Recompute costs from occupancy data | ## VelocityObstacle / VelocityObstacles Dynamic obstacle representation for reactive avoidance. 
```python # simplified vo = horus.VelocityObstacle( px=2.0, py_=1.0, vx=0.5, vy=0.0, radius=0.3, time_horizon=1.0, obstacle_id=0, ) vos = horus.VelocityObstacles() # No parameters ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `px` | `float` | `0.0` | Position X (m) | | `py_` | `float` | `0.0` | Position Y (m) — note trailing underscore | | `vx`, `vy` | `float` | `0.0` | Velocity (m/s) | | `radius` | `float` | `0.0` | Obstacle radius (m) | | `time_horizon` | `float` | `1.0` | Planning time horizon (s) | | `obstacle_id` | `int` | `0` | Obstacle identifier | **VelocityObstacles methods:** | Method | Returns | Description | |--------|---------|-------------| | `add_obstacle(obs)` | `None` | Add a `VelocityObstacle` | | `get_obstacles()` | `list` | Get all obstacles | --- ## See Also - [Geometry Messages](/python/api/geometry-messages) — Pose2D, Twist, Vector3 - [Sensor Messages](/python/api/sensor-messages) — LaserScan, Odometry - [Rust Navigation Messages](/rust/api/navigation-messages) — Rust equivalent --- ## Performance Guide (Python) Path: /python/performance Description: Optimization techniques for Python HORUS nodes — topic types, GIL management, zero-copy patterns, GPU interop, and profiling # Performance Guide (Python) Your Python node works. Now you want it faster. This page covers every optimization available to Python HORUS nodes — from topic type selection to GPU interop — with concrete latency numbers so you can decide what matters for your application. **Golden rule:** Optimize only after your system works correctly. A fast controller that computes the wrong output is worse than a slow one that gets it right. 
---

## Quick Reference: Operation Latencies

| Operation | Typical Latency | Notes |
|-----------|----------------|-------|
| Typed topic send/recv (`horus.CmdVel`) | ~1.5-1.7 us | Zero-copy Pod, binary-compatible with Rust |
| Dict topic send (small, 3-5 keys) | ~6-12 us | MessagePack serialization |
| Dict topic send (large, 50+ keys) | ~50-110 us | Proportional to dict size |
| `Image.to_numpy()` | ~3 us | Zero-copy view into SHM pool |
| `Image.to_torch()` | ~3 us | Zero-copy via DLPack |
| `Image.from_numpy()` | ~50-200 us | One copy into SHM pool (size-dependent) |
| `torch.from_dlpack(tensor)` | ~1 us | Zero-copy tensor exchange |
| `tensor.cuda()` (CPU to GPU) | ~50 us | Unavoidable PCIe transfer |
| GIL acquire per tick | ~3 us | Fixed overhead per Python tick |
| Runtime custom message | ~20-40 us | `struct` serialization |
| Compiled custom message | ~3-5 us | Generated PyO3 bindings |
| `node.recv()` (no message) | ~0.1 us | Lock-free ring buffer check |

---

## Dict Topics vs Typed Topics

String topics (`pubs=["data"]`) use `GenericMessage` with MessagePack serialization. Typed topics (`pubs=[horus.CmdVel]`) use zero-copy Pod transport. The performance gap is significant.

### Benchmarks

```
Dict topic (3 keys):     ~6-12 μs per send/recv
Dict topic (50+ keys):   ~50-110 μs per send/recv
Typed topic (CmdVel):    ~1.5-1.7 μs per send/recv
```

That is roughly a 4x gap for small dicts and 30x or more for large ones. For a control loop at 100Hz (10ms budget), dict overhead is negligible: even a large dict uses about 1% of the budget. For 1kHz loops (1ms budget), a small dict costs about 1% of the budget per send, and a large dict costs 5-11%.
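The budget percentages follow from dividing a per-message latency by the tick period. The sketch below reproduces that arithmetic with the upper-bound latencies from the benchmark block. It assumes the budget equals the full tick period (as the text does), and `budget_share` is a hypothetical helper, not part of the HORUS API.

```python
def budget_share(latency_us: float, rate_hz: float) -> float:
    """Fraction of one tick period consumed by a single send/recv."""
    period_us = 1_000_000 / rate_hz  # tick period in microseconds
    return latency_us / period_us


# Upper-bound latencies taken from the benchmark figures in this section.
CASES = [("typed", 1.7), ("dict, small", 12.0), ("dict, large", 110.0)]

for label, latency_us in CASES:
    for rate_hz in (100, 1000):
        share = budget_share(latency_us, rate_hz)
        print(f"{label:12s} @ {rate_hz:4d} Hz: {share:.2%} of the tick period")
```

A node that sends several messages per tick pays this cost once per send, so multiply accordingly before deciding whether a dict topic fits.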
### When to Upgrade

**Stay with dicts** when:

- You are prototyping and the schema changes frequently
- Rate is <50Hz and the message is small (<10 keys)
- Communication is Python-to-Python only
- You value iteration speed over latency

**Switch to typed** when:

- Rate is >100Hz or budget is <1ms
- Messages cross to Rust nodes (dicts cannot cross the language boundary)
- You need deterministic, predictable latency
- The message schema is stable

### Upgrading

```python
# simplified
import horus

# Before: dict topic (~8 μs)
# Define the tick function first so the name exists when Node() is constructed
def my_tick(node):
    node.send("cmd_vel", {"linear": 0.5, "angular": 0.1})

node = horus.Node(
    name="controller",
    pubs=["cmd_vel"],
    tick=my_tick,
    rate=100,
)

# After: typed topic (~1.5 μs)
def my_tick(node):
    node.send("cmd_vel", horus.CmdVel(linear=0.5, angular=0.1))

node = horus.Node(
    name="controller",
    pubs=[horus.CmdVel],
    tick=my_tick,
    rate=100,
)
```

The node API is identical; only the `pubs`/`subs` spec and the data you pass to `send()` change.

---

## GIL Impact on Tick Latency

Every Python tick acquires the GIL. This costs ~3 us per tick, a fixed and unavoidable overhead. The GIL is released during `run()` and re-acquired only when the scheduler calls your `tick`, `init`, or `shutdown` callback.

### What This Means in Practice

| Tick Rate | GIL Overhead per Second | Budget Consumed |
|-----------|------------------------|-----------------|
| 10 Hz | 30 us | Negligible |
| 100 Hz | 300 us | Negligible |
| 1 kHz | 3 ms | 0.3% of wall time |

For most Python nodes, GIL overhead is irrelevant. It becomes a concern only at very high tick rates (>500Hz) where the 3 us per tick adds up.
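The GIL table is plain multiplication: ticks per second times the per-tick acquire cost. A minimal sketch of that calculation, assuming the ~3 us figure quoted in this guide (`gil_overhead` is an illustrative helper, not a HORUS function):

```python
GIL_ACQUIRE_US = 3.0  # per-tick acquire cost quoted in this guide (approximate)


def gil_overhead(rate_hz: float, per_tick_us: float = GIL_ACQUIRE_US):
    """Return (overhead in us per second, fraction of wall time) for a tick rate."""
    per_second_us = rate_hz * per_tick_us
    wall_fraction = per_second_us / 1_000_000
    return per_second_us, wall_fraction


for rate_hz in (10, 100, 1000):
    per_second_us, wall_fraction = gil_overhead(rate_hz)
    print(f"{rate_hz:5d} Hz: {per_second_us:7.0f} us/s ({wall_fraction:.3%} of wall time)")
```

The same function can be pointed at a measured acquire cost on your own hardware by passing `per_tick_us` explicitly.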
### GC Pauses Python's garbage collector can introduce unpredictable pauses: - **Generation 0 collection:** ~0.1-0.5 ms (frequent, small) - **Generation 1 collection:** ~1-5 ms (less frequent) - **Generation 2 collection:** ~5-50 ms (rare, large heap) For latency-sensitive nodes, minimize allocations inside `tick()`: ```python # simplified import gc # Pre-allocate outside tick cmd = horus.CmdVel(linear=0.0, angular=0.0) def controller_tick(node): scan = node.recv("scan") if scan: # Reuse pre-allocated message — no allocation in tick cmd.linear = 0.5 if min(scan.ranges) > 0.3 else 0.0 cmd.angular = 0.0 node.send("cmd_vel", cmd) # For tight budgets, disable GC during critical phases def init(node): gc.disable() # Manual GC control def shutdown(node): gc.enable() ``` **Warning:** Disabling GC risks memory growth. Only do this for short-duration, allocation-light nodes. --- ## Zero-Copy Patterns Pool-backed types (`Image`, `PointCloud`, `DepthImage`, `Tensor`) use shared memory. The zero-copy path avoids serialization entirely. ### The Zero-Copy Pipeline ``` Camera Node (Rust, 30Hz) │ │ Image descriptor (64 bytes) via ring buffer │ Pixel data stays in SHM pool ▼ Python Node │ ├── img.to_numpy() ~3 μs (NumPy view into SHM, no copy) ├── img.to_torch() ~3 μs (DLPack, no copy) ├── img.to_jax() ~3 μs (DLPack, no copy) │ │ Processing happens on SHM data directly │ ├── Image.from_numpy() ~50-200 μs (one copy into SHM pool) └── node.send() ~1.5 μs (descriptor only) ``` **Key insight:** `to_*()` methods are zero-copy. `from_*()` methods copy once into the pool. Design your pipeline to minimize `from_*()` calls. ### What Copies and What Does Not | Operation | Copy? 
| Latency | Why | |-----------|-------|---------|-----| | `img.to_numpy()` | No | ~3 us | Returns view into existing SHM | | `img.to_torch()` | No | ~3 us | DLPack wraps SHM pointer | | `img.to_jax()` | No | ~3 us | DLPack wraps SHM pointer | | `img.as_tensor()` | No | ~3 us | Tensor shares same SHM slot | | `Image.from_numpy(arr)` | Yes (1x) | ~50-200 us | Must place data in pool slot | | `Image.from_torch(t)` | Yes (1x) | ~50-200 us | Must place data in pool slot | | `node.send("topic", dict)` | Yes | ~6-110 us | MessagePack serialization | | `node.send("topic", typed)` | No | ~1.5 us | Fixed-size Pod written directly into ring buffer slot, no serialization | ### Anti-Pattern: Unnecessary Copies ```python # simplified import numpy as np # BAD: to_numpy() returns a view, but np.array() then copies it def tick(node): img = node.recv("camera.rgb") arr = np.array(img.to_numpy()) # Unnecessary copy! process(arr) # GOOD: One view, zero copies def tick(node): img = node.recv("camera.rgb") arr = img.to_numpy() # Zero-copy view process(arr) ``` --- ## NumPy Interop HORUS pool-backed types implement the array protocol. NumPy operations work directly on shared memory. 
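Whether a NumPy operation returns a view or a copy can be checked directly with `np.shares_memory`. The sketch below uses an ordinary heap array as a stand-in for the result of `img.to_numpy()`, so it runs without HORUS. The view/copy behavior shown is standard NumPy and should carry over to an SHM-backed array, though that is an assumption about the HORUS side.

```python
import numpy as np

# Stand-in for img.to_numpy(); a real frame would be backed by the SHM pool.
pixels = np.zeros((480, 640, 3), dtype=np.uint8)

crop_view = pixels[100:300, 200:400]          # basic slicing returns a view
crop_copy = pixels[100:300, 200:400].copy()   # explicit copy, new buffer
as_array = np.asarray(pixels)                 # no copy for an existing ndarray
as_new = np.array(pixels)                     # np.array() copies by default
as_float = pixels.astype(np.float32)          # dtype conversion allocates
masked = pixels[pixels[..., 0] > 0]           # boolean indexing allocates

for name, arr in [("crop_view", crop_view), ("crop_copy", crop_copy),
                  ("as_array", as_array), ("as_new", as_new),
                  ("as_float", as_float), ("masked", masked)]:
    print(f"{name:10s} shares memory: {np.shares_memory(pixels, arr)}")

# Writing through a view modifies the original buffer.
crop_view[0, 0, 0] = 255
print(pixels[100, 200, 0])
```

Running the same check on a real frame during development is a cheap way to confirm that a processing step has not silently introduced a copy.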
### Direct Array Operations

```python
# simplified
import numpy as np

def vision_tick(node):
    img = node.recv("camera.rgb")
    if img is None:
        return
    pixels = img.to_numpy()  # (H, W, C) view, zero-copy

    # NumPy reads the input straight from SHM; only the results are newly allocated
    gray = np.mean(pixels, axis=2, dtype=np.float32)
    edges = np.abs(np.diff(gray, axis=1))
    obstacle_count = np.sum(edges > 128)
    node.send("obstacles", {"count": int(obstacle_count)})
```

### Avoiding Copies with NumPy

```python
# simplified
# BAD: .copy() forces allocation
cropped = pixels[100:300, 200:400].copy()

# GOOD: A basic slice is a view. There is no copy-on-write:
# writing through the slice modifies the underlying SHM data.
cropped = pixels[100:300, 200:400]

# COSTLY: astype() to a different dtype allocates a converted copy.
# Convert only the region you need, not the whole frame.
float_crop = cropped.astype(np.float32)

# view() reinterprets the raw bytes without converting values, so it cannot
# turn uint8 pixels into floats. It only helps for same-size dtypes:
signed_pixels = pixels.view(np.int8)
```

### PointCloud with NumPy

```python
# simplified
import numpy as np

cloud = node.recv("lidar.points")
if cloud:
    points = cloud.to_numpy()  # (N, 3) float32, zero-copy

    # Filter ground plane: reads from SHM, but boolean indexing returns a new array
    above_ground = points[points[:, 2] > 0.1]

    # Compute centroid
    centroid = np.mean(above_ground, axis=0)
```

---

## GPU Interop

HORUS supports zero-copy tensor exchange with PyTorch, JAX, and CuPy via DLPack.
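DLPack is a framework-neutral protocol: the producer exposes `__dlpack__`, and the consumer wraps the same buffer instead of copying it. The sketch below shows that property using NumPy alone (NumPy 1.22 or later provides `np.from_dlpack`). The NumPy array is only a stand-in for a HORUS tensor; how HORUS implements its side of the protocol is not shown here.

```python
import numpy as np

# Stand-in producer; in a HORUS node this role is played by a pool-backed tensor.
producer = np.arange(12, dtype=np.float32).reshape(3, 4)

# The consumer wraps the producer's buffer through the DLPack protocol.
consumer = np.from_dlpack(producer)

print(np.shares_memory(producer, consumer))  # both arrays use one buffer

# A write on the producer side is visible through the consumer: nothing was copied.
producer[0, 0] = 42.0
print(consumer[0, 0])
```

`torch.from_dlpack` and `jax.dlpack.from_dlpack` play the consumer role in the same way for their own array types.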
### DLPack with PyTorch ```python # simplified import torch def ml_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy: SHM → PyTorch CPU tensor cpu_tensor = img.to_torch() # ~3 μs, no copy # CPU → GPU (unavoidable PCIe transfer, ~50 μs) gpu_tensor = cpu_tensor.cuda().float() / 255.0 gpu_tensor = gpu_tensor.permute(2, 0, 1).unsqueeze(0) # Inference with torch.no_grad(): output = model(gpu_tensor) # GPU → CPU results = output.cpu().numpy() node.send("detections", parse_results(results)) ``` ### DLPack with JAX ```python # simplified import jax def jax_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy: SHM → JAX array jax_array = img.to_jax() # ~3 μs # JAX processing processed = jax.numpy.mean(jax_array, axis=2) node.send("processed", {"mean_brightness": float(processed.mean())}) ``` ### Tensor Bridge for Custom Data Use `.as_tensor()` to get a general-purpose Tensor from any pool-backed type, then pass it to any framework via DLPack: ```python # simplified import torch img = node.recv("camera.rgb") t = img.as_tensor() # shape=[480, 640, 3], zero-copy pt = torch.from_dlpack(t) # zero-copy to PyTorch # Process with PyTorch... ``` ### GPU Pipeline Performance ``` img.to_torch() ~3 μs (SHM → CPU tensor, zero-copy) tensor.cuda() ~50 μs (CPU → GPU, PCIe transfer) model(tensor) ~5-30 ms (GPU inference) output.cpu() ~20 μs (GPU → CPU) node.send(results) ~6-12 μs (dict) or ~1.5 μs (typed) ─────────────────────────────────── Total pipeline: ~5-30 ms (dominated by inference) ``` The IPC overhead (3 us + 6 us) is negligible compared to GPU inference time. Optimize the model, not the transport. 
--- ## Profiling ### budget_remaining() Check how much time is left in your tick budget: ```python # simplified import horus def adaptive_tick(node): img = node.recv("camera.rgb") if img is None: return frame = img.to_numpy() # Always run fast detection fast_result = fast_detect(frame) node.send("detections", fast_result) # Run expensive refinement only if budget allows remaining = horus.budget_remaining() if remaining > 5 * horus.ms: refined = expensive_refinement(frame, fast_result) node.send("detections.refined", refined) node = horus.Node( name="adaptive_detector", subs=[horus.Image], pubs=["detections", "detections.refined"], tick=adaptive_tick, rate=30, budget=30 * horus.ms, on_miss="skip", ) ``` ### Node Metrics Query tick duration and error stats at runtime: ```python # simplified sched = horus.Scheduler(tick_rate=100) sched.add(detector) sched.add(planner) # After running for a while... for name in sched.get_node_names(): stats = sched.get_node_stats(name) avg_ms = stats.get("avg_tick_duration_ms", 0) total = stats.get("total_ticks", 0) errors = stats.get("error_count", 0) print(f"{name}: avg={avg_ms:.2f}ms, ticks={total}, errors={errors}") ``` ### cProfile for Tick Functions Profile individual tick functions to find bottlenecks: ```python # simplified import cProfile import pstats profiler = cProfile.Profile() tick_count = 0 def profiled_tick(node): global tick_count profiler.enable() actual_tick(node) # Your real tick logic profiler.disable() tick_count += 1 def shutdown(node): stats = pstats.Stats(profiler) stats.sort_stats("cumulative") stats.print_stats(20) # Top 20 hotspots print(f"Profiled {tick_count} ticks") node = horus.Node( name="profiled_node", tick=profiled_tick, shutdown=shutdown, rate=30, ) horus.run(node, duration=10.0) ``` ### CLI Profiling Use the HORUS CLI to monitor node performance without modifying code: ```bash # Watch tick rates and latencies for all nodes horus monitor # Check topic message rates horus topic hz camera.rgb # View 
topic data in real time horus topic echo detections ``` --- ## When to Move Work to Rust Python is the right choice for most ML, prototyping, and I/O-heavy work. Move to Rust when Python becomes the bottleneck — not before. ### Concrete Guidelines | Situation | Recommendation | |-----------|---------------| | Tick rate >1 kHz | Move to Rust (GIL overhead dominates) | | Budget <100 us | Move to Rust (Python tick overhead alone is ~3 us) | | Safety-critical node | Move to Rust (`is_safe_state` / `enter_safe_state` unavailable in Python) | | Tight control loop | Move to Rust (GC pauses are unpredictable) | | ML inference at 30Hz | Stay in Python (inference dominates, not tick overhead) | | I/O-heavy (HTTP, DB) | Stay in Python (async support is natural) | | Prototyping any rate | Stay in Python (iterate faster, optimize later) | | Data visualization | Stay in Python (matplotlib, plotly ecosystem) | ### The Hybrid Pattern The most common production architecture: Rust for high-frequency control, Python for ML and I/O. ``` Python ML Node (30Hz) Rust Control Node (1kHz) ├── Receives camera images ├── Receives detections ├── Runs YOLO inference ├── Runs path planning ├── Publishes detections ├── Publishes motor commands │ │ └── budget=30ms, compute=True └── budget=200μs, deadline=500μs ``` Both share the same topics via zero-copy SHM. The Python node uses `compute=True` to run on the thread pool. The Rust node uses budget/deadline for hard timing guarantees. --- ## Memory: Pool-Backed vs Heap-Allocated ### Pool-Backed Types `Image`, `PointCloud`, `DepthImage`, and `Tensor` are backed by a shared memory pool. The pool pre-allocates slots, so creating and sending these types avoids per-tick heap allocation. 
```python # simplified def camera_tick(node): # Image.from_numpy() places data in a pre-allocated pool slot # Only the 64-byte descriptor is sent through the ring buffer frame = capture_camera() img = horus.Image.from_numpy(frame) node.send("camera.rgb", img) # ~1.5 μs (descriptor only) ``` **Performance:** Pool allocation is O(1) — a single atomic compare-and-swap to claim a slot. No malloc, no GC pressure. ### Heap-Allocated (Dict Topics) Dict topics allocate a new MessagePack buffer on every `send()`. This creates GC pressure: ```python # simplified def telemetry_tick(node): # Every send() allocates a new MessagePack buffer node.send("telemetry", { "cpu": get_cpu(), "mem": get_mem(), "temp": get_temp(), }) ``` At low rates (<100Hz), this is fine. At high rates, the repeated allocations can trigger GC pauses. ### Reducing Allocation Pressure ```python # simplified # Pre-allocate typed message (reuse across ticks) cmd = horus.CmdVel(linear=0.0, angular=0.0) def fast_tick(node): scan = node.recv("scan") if scan: cmd.linear = compute_speed(scan) cmd.angular = compute_turn(scan) node.send("cmd_vel", cmd) # No allocation — reuses existing Pod ``` For pool-backed types, the pool handles reuse automatically. For typed Pod messages, you can reuse the same object across ticks. --- ## Design Decisions **Why does Python have a ~3 us GIL overhead per tick?** The HORUS scheduler is Rust code. It releases the GIL during the main tick loop so other Python threads (Flask servers, background tasks) can run concurrently. The GIL is re-acquired only when calling your Python callback. This design prioritizes scheduler determinism: the Rust tick loop runs without Python interference, and Python code gets a clean, bounded window. **Why is `GenericMessage` slower than typed topics?** Dict topics serialize Python objects to MessagePack binary format, which requires traversing the dict, type-checking each value, and writing variable-length output. 
Typed topics (`horus.CmdVel`) are fixed-size Plain Old Data — a single `memcpy` of known size. The serialization cost is the price of Python's dynamic typing. **Why does `from_numpy()` copy but `to_numpy()` does not?** The shared memory pool controls memory layout and lifetime. `from_numpy()` must copy data into a specific pool slot for cross-process sharing. `to_numpy()` returns a view into that already-shared slot. This is one copy on publish, zero copies on subscribe — the optimal tradeoff for pub/sub patterns where one publisher serves many subscribers. **Why not auto-detect when to use typed vs dict topics?** Explicit is better than implicit. Dict topics and typed topics have different semantics (cross-language support, size limits, error behavior). Forcing the choice at `Node()` construction time makes the performance characteristics visible in the code, not hidden behind heuristics. --- ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | GIL release during `run()` | Other Python threads run freely | ~3 us re-acquire per tick | | Dict topics for flexibility | Any Python object works | ~5-50 us vs ~1.5 us for typed | | Pool-backed large data | Zero-copy IPC for images/clouds | One copy on `from_numpy()` | | DLPack for GPU interop | Works with PyTorch, JAX, CuPy | Requires framework-specific import | | Pre-allocation for speed | No GC pressure in tick | More setup code, less flexibility | | `budget_remaining()` for adaptive work | Maximizes budget usage | Adds branching complexity | | Disabling GC | Eliminates GC pauses | Risks memory growth | --- ## See Also - [Python API](/python/api) -- Node, Scheduler, Topic, Clock API reference - [Image API](/python/api/image) -- Zero-copy camera frames with NumPy/PyTorch - [Memory Types](/python/api/memory-types) -- Pool-backed Image, PointCloud, DepthImage, Tensor - [ML Developer's Guide](/python/library/ml-guide) -- PyTorch, ONNX, and OpenCV integration - [Advanced 
Patterns](/python/advanced-patterns) -- Compute nodes, async I/O, deterministic mode - [Benchmarks](/performance/benchmarks) -- Full IPC and scheduler benchmark data --- ## Advanced Patterns (Python) Path: /python/advanced-patterns Description: Compute nodes, async I/O, event-driven triggers, runtime mutation, deterministic mode, record/replay, and testing patterns # Advanced Patterns (Python) Beyond the basics of `Node`, `Scheduler`, and `run()`, HORUS provides execution classes, runtime mutation, deterministic simulation, and recording — all accessible from Python kwargs. This page covers every advanced pattern with working code. --- ## Compute Nodes By default, nodes run on the main tick thread sequentially. `compute=True` moves a node to a parallel thread pool, freeing the main thread for time-critical nodes. ### When to Use Use `compute=True` for CPU-heavy work that would block other nodes: ML inference, SLAM computation, path planning, image processing. The scheduler runs compute nodes concurrently with the main tick thread. ```python # simplified import horus import numpy as np model = load_model("yolov8n.onnx") def detect_tick(node): img = node.recv("camera.rgb") if img is None: return frame = img.to_numpy() detections = model.predict(frame) for det in detections: node.send("detections", { "class": det.class_name, "confidence": float(det.confidence), "bbox": [det.x1, det.y1, det.x2, det.y2], }) detector = horus.Node( name="yolo_detector", subs=[horus.Image], pubs=["detections"], tick=detect_tick, rate=30, compute=True, # Runs on thread pool, not main tick thread budget=30 * horus.ms, # 30ms budget per frame on_miss="skip", # Skip frame if inference overruns ) horus.run(detector, tick_rate=100) ``` ### How It Works Without `compute=True`, a 25ms ML inference at 30Hz would block the main tick thread for 25ms every 33ms — starving any 100Hz control node on the same scheduler. 
With `compute=True`, inference runs on a separate thread pool and does not delay the main tick. | Configuration | Main Thread | Thread Pool | |--------------|-------------|-------------| | Default (no flags) | Runs this node | Not used | | `compute=True` | Free for other nodes | Runs this node | | `async def tick` | Free for other nodes | Runs on async I/O pool | **Constraint:** `compute=True` is mutually exclusive with `async def tick` and `on="topic"`. A node can only have one execution class. --- ## Async I/O Nodes For network requests, database queries, file I/O, and WebSocket connections, use `async def tick`. The scheduler auto-detects async functions and runs them on a dedicated I/O thread pool. ```python # simplified import horus import aiohttp async def cloud_tick(node): if node.has_msg("telemetry"): data = node.recv("telemetry") try: async with aiohttp.ClientSession() as session: await session.post( "https://api.myrobot.io/telemetry", json=data, timeout=aiohttp.ClientTimeout(total=2.0), ) except aiohttp.ClientError as e: node.log_warning(f"Upload failed: {e}") uploader = horus.Node( name="cloud_uploader", subs=["telemetry"], tick=cloud_tick, # async def auto-detected rate=1, # 1Hz — upload every second ) horus.run(uploader) ``` No configuration needed beyond writing `async def`. The scheduler inspects the function and routes it to the I/O executor automatically. 
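Detecting an `async def` callback needs nothing beyond the standard library. The sketch below shows one way such routing could work. `pick_executor` and the executor names are hypothetical illustrations; the source does not describe how the HORUS scheduler performs this check internally.

```python
import asyncio
import inspect


def pick_executor(tick) -> str:
    """Hypothetical router: name the executor a tick callback would run on."""
    # inspect.iscoroutinefunction is True only for functions defined with `async def`.
    return "io_pool" if inspect.iscoroutinefunction(tick) else "main_thread"


def sync_tick(node):
    pass


async def async_tick(node):
    await asyncio.sleep(0)


print(pick_executor(sync_tick))
print(pick_executor(async_tick))
```

One consequence of this style of detection is that a plain function which merely returns a coroutine is not recognized as async, so the tick itself should be declared with `async def`.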
### Async Init and Shutdown `init` and `shutdown` callbacks can also be async — useful for establishing database connections or closing network sessions: ```python # simplified import asyncpg async def setup(node): node.db = await asyncpg.connect("postgresql://localhost/robotics") async def store(node): if node.has_msg("sensor.data"): data = node.recv("sensor.data") await node.db.execute( "INSERT INTO readings (value, ts) VALUES ($1, $2)", data["value"], data["timestamp"], ) async def cleanup(node): await node.db.close() db_node = horus.Node( name="db_logger", subs=["sensor.data"], tick=store, init=setup, shutdown=cleanup, rate=10, ) ``` ### Timeout Discipline Async ticks that hang will block scheduler shutdown. Always wrap network calls: ```python # simplified import asyncio async def safe_tick(node): try: result = await asyncio.wait_for(fetch_data(), timeout=2.0) node.send("data", result) except asyncio.TimeoutError: node.log_warning("Fetch timed out, skipping tick") ``` --- ## Event-Driven Nodes Event-driven nodes tick only when a specific topic receives data, rather than at a fixed rate. Use `on="topic.name"` for sparse events like emergency stops, button presses, or configuration changes. 
```python # simplified import horus def emergency_handler(node): event = node.recv("emergency.stop") if event: node.log_warning("Emergency stop triggered!") node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) node.send("status", {"state": "emergency_stopped"}) estop = horus.Node( name="estop_handler", subs=["emergency.stop"], pubs=[horus.CmdVel, "status"], tick=emergency_handler, on="emergency.stop", # Only tick when this topic has data ) horus.run(estop) ``` ### Behavior - The node does not tick on the scheduler's fixed rate - When a message arrives on the trigger topic, the node ticks once - Multiple messages arriving between scheduler cycles result in one tick (not one per message) - `node.recv()` inside the tick returns the latest message **Constraint:** `on="topic"` is mutually exclusive with `compute=True` and `async def tick`. ### Rate-Limited Events Combine `on="topic"` with `rate=` to cap how often an event-driven node can fire: ```python # simplified # Ticks on new config, but no more than once per second config_node = horus.Node( name="config_watcher", subs=["system.config"], tick=apply_config, on="system.config", rate=1, # Max 1Hz even if config changes faster ) ``` --- ## Rate-Limited Compute Combine `compute=True` with `rate=` to run CPU-heavy work at a lower frequency than the scheduler tick rate: ```python # simplified # Scheduler ticks at 100Hz, but SLAM only runs at 10Hz on the thread pool slam_node = horus.Node( name="slam", subs=[horus.LaserScan, horus.Odometry], pubs=["map", "localization.pose"], tick=slam_tick, compute=True, rate=10, # 10Hz SLAM updates budget=80 * horus.ms, on_miss="warn", ) # Controller runs at full 100Hz on the main tick thread controller = horus.Node( name="controller", subs=["localization.pose"], pubs=[horus.CmdVel], tick=controller_tick, rate=100, order=1, ) horus.run(slam_node, controller, tick_rate=100) ``` The scheduler skips the compute node on ticks where it is not scheduled, so the main thread never waits 
for it. --- ## Runtime Mutation The scheduler supports modifying nodes while running — adjusting rates, budgets, and even adding or removing nodes. ### Changing Node Rate ```python # simplified sched = horus.Scheduler(tick_rate=100) sched.add(sensor) sched.add(controller) # Later (e.g., from a monitoring thread or on_error callback): sched.set_node_rate("sensor", 200) # Speed up sensor to 200Hz sched.set_node_rate("controller", 50) # Slow down controller to 50Hz ``` ### Changing Tick Budget ```python # simplified # Tighten budget after profiling shows headroom sched.set_tick_budget("motor_controller", 150) # 150 μs # Loosen budget during degraded mode sched.set_tick_budget("motor_controller", 500) # 500 μs ``` ### Removing Nodes ```python # simplified # Remove a malfunctioning node at runtime if error_count > threshold: removed = sched.remove_node("flaky_sensor") if removed: print("Removed flaky_sensor from scheduler") ``` ### Adaptive Rate Pattern Adjust node rate based on system load: ```python # simplified def monitor_tick(node): stats = node.info.get_metrics() avg_ms = stats.get("avg_tick_duration_ms", 0) if avg_ms > 8.0: # Getting close to 10ms budget node.log_warning(f"High load: {avg_ms:.1f}ms avg tick") # Could signal another node to reduce rate via topic monitor = horus.Node( name="load_monitor", tick=monitor_tick, rate=1, order=999, # Run last ) ``` --- ## Deterministic Mode Deterministic mode makes every run produce identical output — essential for testing, simulation, and debugging. It enables SimClock (fixed dt), seeded RNG, and sequential execution. 
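The idea behind a fixed-step clock plus seeded RNG can be shown without HORUS. The toy model below is a sketch under stated assumptions: `FixedStepClock` and `simulate` are hypothetical names, and a single seeded generator stands in for whatever seeding scheme the real scheduler uses.

```python
import random


class FixedStepClock:
    """Toy simulated clock: time advances by exactly 1/rate per tick."""

    def __init__(self, rate_hz: float, seed: int = 0):
        self.dt = 1.0 / rate_hz
        self.tick = 0
        self._rng = random.Random(seed)  # seeded, so the sequence repeats

    def now(self) -> float:
        # Derived from the tick count, so rounding error does not accumulate.
        return self.tick * self.dt

    def rng_float(self) -> float:
        return self._rng.random()

    def advance(self) -> None:
        self.tick += 1


def simulate(ticks: int, rate_hz: float = 100, seed: int = 0):
    """Run a fixed number of ticks and record (time, noise) per tick."""
    clock = FixedStepClock(rate_hz, seed)
    trace = []
    for _ in range(ticks):
        trace.append((clock.now(), clock.rng_float()))
        clock.advance()
    return trace


run1 = simulate(100)
run2 = simulate(100)
print(run1 == run2)  # identical traces, independent of wall-clock timing
```

Because neither time nor randomness depends on the wall clock, two runs compare equal element by element, which is what makes exact assertions in tests practical.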
### Basic Usage ```python # simplified sched = horus.Scheduler(tick_rate=100, deterministic=True) ``` In deterministic mode: - `horus.dt()` returns exactly `1/rate` (fixed timestep, not wall clock) - `horus.now()` advances by exactly `dt` each tick (SimClock) - `horus.rng_float()` returns tick-seeded values (same sequence every run) - Nodes execute sequentially in `order` — no thread pool randomness ### tick_once() for Testing Step through ticks manually. This is the primary testing tool for deterministic logic: ```python # simplified import horus outputs = [] def sensor_tick(node): node.send("temp", {"value": 20.0 + horus.tick() * 0.5}) def logger_tick(node): msg = node.recv("temp") if msg: outputs.append(msg["value"]) sensor = horus.Node(name="sensor", pubs=["temp"], tick=sensor_tick, rate=100, order=0) logger = horus.Node(name="logger", subs=["temp"], tick=logger_tick, rate=100, order=1) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(sensor) sched.add(logger) # Step 5 ticks for _ in range(5): sched.tick_once() assert len(outputs) == 5 assert outputs[0] == 20.0 assert outputs[4] == 22.0 ``` ### Filtered tick_once() Tick only specific nodes — useful for isolating behavior: ```python # simplified # Tick only the sensor node (logger does not run) sched.tick_once(node_names=["sensor"]) # Tick only the logger (consumes whatever sensor published) sched.tick_once(node_names=["logger"]) ``` ### tick_for() Run the tick loop for a bounded duration, then return: ```python # simplified # Run for exactly 1 second (100 ticks at 100Hz in deterministic mode) sched.tick_for(duration=1.0) assert sched.current_tick() == 100 ``` ### SimClock for Reproducibility In deterministic mode, time is simulated. 
Physics integration is bit-exact across runs: ```python # simplified positions = [] def physics_tick(node): dt = horus.dt() # Exactly 0.01 at 100Hz noise = horus.rng_float() * 0.001 # Same noise every run position = dt * 10.0 + noise positions.append(position) node = horus.Node(name="physics", tick=physics_tick, rate=100) # Run 1 sched1 = horus.Scheduler(tick_rate=100, deterministic=True) sched1.add(node) sched1.run(duration=1.0) run1 = positions.copy() # Run 2 — identical output positions.clear() sched2 = horus.Scheduler(tick_rate=100, deterministic=True) sched2.add(horus.Node(name="physics", tick=physics_tick, rate=100)) sched2.run(duration=1.0) run2 = positions.copy() assert run1 == run2 # Bit-exact ``` --- ## Record and Replay Record all topic data during a run for later analysis, debugging, or test regression. ### Recording a Session ```python # simplified sched = horus.Scheduler(tick_rate=100, recording=True) sched.add(sensor) sched.add(controller) # Run for 10 seconds sched.run(duration=10.0) # Stop recording and get file paths files = sched.stop_recording() print(f"Session saved to: {files}") ``` ### Managing Recordings ```python # simplified # List all saved recordings for session in sched.list_recordings(): print(f" {session}") # Delete a recording sched.delete_recording(session_name) # Check recording state if sched.is_recording(): print("Recording in progress") if sched.is_replaying(): print("Replay in progress") ``` ### Recording with Deterministic Mode Combine recording and deterministic mode for fully reproducible test sessions: ```python # simplified sched = horus.Scheduler( tick_rate=100, deterministic=True, recording=True, ) sched.add(sensor) sched.add(controller) sched.run(duration=5.0) files = sched.stop_recording() # This recording can be replayed and will produce identical output ``` --- ## Custom Messages with GenericMessage For ad-hoc data that does not warrant a typed message, dict topics (GenericMessage) provide maximum flexibility: 
```python
# simplified
def diagnostics_tick(node):
    node.send("diagnostics", {
        "cpu_temp": get_cpu_temp(),
        "battery_pct": get_battery(),
        "error_log": get_recent_errors(),
        "uptime_sec": horus.elapsed(),
        "tick_count": horus.tick(),
    })
```

### Size and Type Constraints

- **Max size:** 4KB per message (larger messages spill to TensorPool automatically)
- **Supported types:** `dict`, `list`, `str`, `int`, `float`, `bool`, `None`, `bytes`
- **Unsupported:** Custom classes, lambdas, sockets, file handles
- **Cross-language:** GenericMessage does NOT cross to Rust nodes

For structured data that crosses to Rust, use typed messages or custom messages:

```python
# simplified
from horus.msggen import define_message

RobotDiagnostics = define_message("RobotDiagnostics", "diagnostics", [
    ("cpu_temp", "f32"),
    ("battery_pct", "f32"),
    ("error_code", "i32"),
    ("uptime_sec", "f64"),
])

def diagnostics_tick(node):
    diag = RobotDiagnostics(
        cpu_temp=65.0,
        battery_pct=85.0,
        error_code=0,
        uptime_sec=horus.elapsed(),
    )
    node.send("diagnostics", diag.to_bytes())
```

See [Custom Messages](/python/api/custom-messages) for runtime vs compiled message details.

---

## Testing Patterns

### tick_once with Mock Data

The primary testing pattern: set up nodes, inject data via topics, step through ticks, assert outputs.
```python
# simplified
import horus

def test_obstacle_avoidance():
    results = []

    def sensor_tick(node):
        # Simulate obstacle at 0.3m
        node.send("distance", {"value": 0.3})

    def controller_tick(node):
        msg = node.recv("distance")
        if msg:
            if msg["value"] < 0.5:
                cmd = horus.CmdVel(linear=0.0, angular=0.5)
            else:
                cmd = horus.CmdVel(linear=1.0, angular=0.0)
            node.send("cmd_vel", cmd)
            results.append(cmd)

    sensor = horus.Node(name="sensor", pubs=["distance"], tick=sensor_tick, rate=100, order=0)
    ctrl = horus.Node(
        name="controller",
        subs=["distance"],
        pubs=[horus.CmdVel],
        tick=controller_tick,
        rate=100,
        order=1,
    )

    sched = horus.Scheduler(tick_rate=100, deterministic=True)
    sched.add(sensor)
    sched.add(ctrl)
    sched.tick_once()

    assert len(results) == 1
    assert results[0].linear == 0.0  # Should stop
    assert results[0].angular == 0.5  # Should turn
```

### Deterministic RNG for Reproducible Tests

```python
# simplified
def test_noisy_sensor():
    readings = []

    def noisy_sensor(node):
        base = 25.0
        noise = horus.rng_float() * 0.5  # Deterministic noise
        readings.append(base + noise)

    node = horus.Node(name="sensor", tick=noisy_sensor, rate=100)
    sched = horus.Scheduler(tick_rate=100, deterministic=True)
    sched.add(node)
    for _ in range(10):
        sched.tick_once()

    # Same 10 readings every run
    expected = readings.copy()
    readings.clear()

    sched2 = horus.Scheduler(tick_rate=100, deterministic=True)
    sched2.add(horus.Node(name="sensor", tick=noisy_sensor, rate=100))
    for _ in range(10):
        sched2.tick_once()

    assert readings == expected
```

### Testing Node Interactions

Test that multiple nodes communicate correctly:

```python
# simplified
def test_pipeline():
    final_outputs = []

    def producer_tick(node):
        node.send("raw", {"value": 42})

    def transformer_tick(node):
        msg = node.recv("raw")
        if msg:
            node.send("processed", {"doubled": msg["value"] * 2})

    def consumer_tick(node):
        msg = node.recv("processed")
        if msg:
            final_outputs.append(msg["doubled"])

    producer = horus.Node(name="producer", pubs=["raw"], tick=producer_tick,
                          rate=100, order=0)
    transformer = horus.Node(
        name="transformer",
        subs=["raw"],
        pubs=["processed"],
        tick=transformer_tick,
        rate=100,
        order=1,
    )
    consumer = horus.Node(
        name="consumer",
        subs=["processed"],
        tick=consumer_tick,
        rate=100,
        order=2,
    )

    sched = horus.Scheduler(tick_rate=100, deterministic=True)
    sched.add(producer)
    sched.add(transformer)
    sched.add(consumer)
    sched.tick_once()

    assert final_outputs == [84]
```

### Testing Safety Behavior

```python
# simplified
def test_deadline_miss_policy():
    import time

    miss_count = [0]

    def slow_tick(node):
        time.sleep(0.01)  # 10ms — exceeds 1ms budget
        node.send("output", {"tick": horus.tick()})

    def on_error(node, exc):
        miss_count[0] += 1

    node = horus.Node(
        name="slow",
        pubs=["output"],
        tick=slow_tick,
        on_error=on_error,
        rate=100,
        budget=1 * horus.ms,
        on_miss="warn",
    )

    sched = horus.Scheduler(tick_rate=100)
    sched.add(node)
    sched.run(duration=0.5)

    stats = sched.safety_stats()
    print(f"Deadline misses: {stats.get('deadline_misses', 0)}")
```

### Introspection in Tests

Query node and scheduler state for assertions:

```python
# simplified
sched = horus.Scheduler(tick_rate=100, deterministic=True)
sched.add(sensor)
sched.add(controller)

for _ in range(100):
    sched.tick_once()

# Assert node health
assert sched.has_node("sensor")
assert sched.has_node("controller")
assert sched.get_node_count() == 2

# Assert performance
sensor_stats = sched.get_node_stats("sensor")
assert sensor_stats["total_ticks"] == 100
assert sensor_stats["error_count"] == 0
assert sensor_stats["avg_tick_duration_ms"] < 1.0
```

---

## Design Decisions

**Why `compute=True` as a kwarg instead of a separate `ComputeNode` class?** One `Node` class with kwargs is simpler than a class hierarchy. `compute=True` is a scheduling hint, not a different kind of node. The same tick function, the same send/recv API, the same error handling. Changing a node from default to compute is a one-kwarg edit, not a class refactor.

**Why auto-detect `async def` instead of requiring `async=True`?** Python's `async def` already declares the function's execution model at the language level. Requiring a separate `async=True` kwarg would be redundant and error-prone (what happens if `async=True` but tick is a regular `def`?). Auto-detection means the declaration is the configuration.

**Why is `on="topic"` mutually exclusive with `compute=True`?** These are different execution models. Event-driven nodes wake on data arrival (interrupt-like). Compute nodes run on a fixed schedule on a thread pool (periodic). Combining them creates ambiguous semantics: should the node run when data arrives AND on a fixed schedule? The explicit choice avoids confusion.

**Why does deterministic mode use SimClock instead of a slowed wall clock?** SimClock advances by exactly `1/rate` per tick regardless of actual execution time. This means a 1-second simulation produces the same physics whether the machine takes 0.5s or 5s to compute it. A wall clock, even slowed, would produce different results depending on system load.

**Why `tick_once()` instead of mocking the scheduler?** `tick_once()` runs the real scheduler for exactly one tick — same init sequence, same ordering, same safety checks. Mocking the scheduler would test your mock, not your node. `tick_once()` gives you deterministic, real execution with zero test infrastructure.

---

## Trade-offs

| Choice | Benefit | Cost |
|--------|---------|------|
| `compute=True` for CPU work | Main thread stays responsive | Thread pool scheduling adds ~0.1ms jitter |
| `async def` for I/O | Non-blocking, natural Python async | ~1ms event loop overhead, pending awaits block shutdown |
| `on="topic"` for events | Zero CPU when idle | Cannot guarantee tick rate, mutually exclusive with compute |
| `rate=` on compute nodes | Prevents thread pool saturation | Node might miss data between ticks |
| Runtime mutation (`set_node_rate`) | Adaptive systems, degraded mode | Potential for race conditions if multiple threads mutate |
| Deterministic mode | Reproducible tests, debugging | Single-threaded, slower than real-time |
| Recording | Post-mortem debugging, regression tests | Disk I/O overhead, storage growth |
| Dict topics for prototyping | Any Python data, zero setup | ~5-50 us vs ~1.5 us typed, Python-only |
| `tick_once()` for testing | Real scheduler, real ordering | No parallel execution in deterministic mode |

---

## See Also

- [Python API](/python/api) -- Node constructor, Scheduler, Clock API
- [Async Nodes](/python/api/async-nodes) -- Deep dive on async patterns
- [Custom Messages](/python/api/custom-messages) -- Runtime and compiled message definitions
- [Performance Guide](/python/performance) -- Optimization techniques and latency numbers
- [Deterministic Mode](/advanced/deterministic-mode) -- Full deterministic execution guide
- [Python Examples](/python/examples) -- Working code examples

---

## Perception Messages

Path: /python/api/perception-messages
Description: Object detection, bounding boxes, segmentation, tracking, and landmark types for ML/CV output

# Perception Messages

Output types for machine learning and computer vision pipelines — detections, segmentation masks, tracked objects, and landmarks.
```python # simplified from horus import Detection, Detection3D, BoundingBox2D, BoundingBox3D, SegmentationMask ``` --- ## Detection 2D object detection — flat constructor with class, confidence, and bounding box fields. ```python # simplified import horus det = horus.Detection( class_name="person", confidence=0.95, x=100.0, y=50.0, width=100.0, height=250.0, class_id=0, instance_id=0, ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `class_name` | `str` | `""` | Detected class name | | `confidence` | `float` | `0.0` | Detection confidence | | `x`, `y` | `float` | `0.0` | Bounding box top-left (px) | | `width`, `height` | `float` | `0.0` | Bounding box size (px) | | `class_id` | `int` | `0` | Numeric class identifier | | `instance_id` | `int` | `0` | Instance identifier | | `bbox` | `BoundingBox2D` | — | Bounding box as a BoundingBox2D object | | Method | Returns | Description | |--------|---------|-------------| | `is_confident(threshold)` | `bool` | True if confidence exceeds threshold | | `with_class_id(class_id)` | `Detection` | Return a copy with class_id set | ## Detection3D 3D object detection — flat constructor with center, dimensions, and yaw. 
```python # simplified det3d = horus.Detection3D( class_name="car", confidence=0.87, cx=5.0, cy=2.0, cz=0.8, length=4.5, width=1.8, height=1.5, yaw=0.0, ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `class_name` | `str` | `""` | Detected class name | | `confidence` | `float` | `0.0` | Detection confidence | | `cx`, `cy`, `cz` | `float` | `0.0` | Bounding box center (m) | | `length`, `width`, `height` | `float` | `0.0` | Bounding box dimensions (m) | | `yaw` | `float` | `0.0` | Heading angle (rad) | | `bbox` | `BoundingBox3D` | — | Bounding box as a BoundingBox3D object | | `velocity_x`, `velocity_y`, `velocity_z` | `float` | `0.0` | Object velocity | | Method | Returns | Description | |--------|---------|-------------| | `with_velocity(vx, vy, vz)` | `Detection3D` | Return a copy with velocity set | ## BoundingBox2D Axis-aligned 2D bounding box in pixel coordinates. ```python # simplified bbox = horus.BoundingBox2D(x=100.0, y=50.0, width=100.0, height=250.0) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `x`, `y` | `float` | `0.0` | Top-left corner (px) | | `width`, `height` | `float` | `0.0` | Box dimensions (px) | | `center_x` | `float` | — | Box center X (getter only) | | `center_y` | `float` | — | Box center Y (getter only) | | `area` | `float` | — | Box area in pixels (getter only) | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `BoundingBox2D.from_center(cx, cy, width, height)` | `BoundingBox2D` | Create from center point | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `iou(other)` | `float` | Intersection over Union with another BoundingBox2D | | `as_tuple()` | `tuple` | Returns (x, y, width, height) | | `as_xyxy()` | `tuple` | Returns (x1, y1, x2, y2) format | ## BoundingBox3D 3D bounding box in world coordinates. 
```python # simplified bbox3d = horus.BoundingBox3D( cx=5.0, cy=2.0, cz=0.8, length=4.5, width=1.8, height=1.5, yaw=0.0, ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `cx`, `cy`, `cz` | `float` | `0.0` | Box center (m) | | `length`, `width`, `height` | `float` | `0.0` | Box dimensions (m) | | `yaw` | `float` | `0.0` | Heading angle (rad) | ## SegmentationMask Per-pixel class labels. ```python # simplified mask = horus.SegmentationMask(width=640, height=480, mask_type=0, num_classes=21) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `width`, `height` | `int` | `0` | Image dimensions (px) | | `mask_type` | `int` | `0` | Segmentation mask type (getter only) | | `num_classes` | `int` | `0` | Number of semantic classes | | `frame_id` | `str` | — | Frame identifier (getter only) | | `timestamp_ns` | `int` | `0` | Timestamp | | `seq` | `int` | `0` | Sequence number | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `SegmentationMask.semantic(width, height, num_classes)` | `SegmentationMask` | Create a semantic segmentation mask | | `SegmentationMask.instance(width, height)` | `SegmentationMask` | Create an instance segmentation mask | | `SegmentationMask.panoptic(width, height, num_classes)` | `SegmentationMask` | Create a panoptic segmentation mask | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `is_semantic()` | `bool` | True if semantic segmentation | | `is_instance()` | `bool` | True if instance segmentation | | `is_panoptic()` | `bool` | True if panoptic segmentation | | `data_size()` | `int` | Size of mask data buffer in bytes | | `data_size_u16()` | `int` | Size of mask data buffer in u16 elements | ## TrackedObject Object with persistent tracking ID across frames. 
```python # simplified tracked = horus.TrackedObject( track_id=42, x=100.0, y=50.0, width=100.0, height=250.0, class_id=0, confidence=0.9, ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `track_id` | `int` | `0` | Persistent tracking ID | | `x`, `y` | `float` | `0.0` | Bounding box top-left (px) | | `width`, `height` | `float` | `0.0` | Bounding box size (px) | | `class_id` | `int` | `0` | Numeric class identifier | | `confidence` | `float` | `0.0` | Detection confidence | | `class_name` | `str` | — | Class name | | `bbox` | `BoundingBox2D` | — | Current bounding box (getter only) | | `predicted_bbox` | `BoundingBox2D` | — | Predicted bounding box (getter only) | | `velocity_x`, `velocity_y` | `float` | — | Estimated velocity (getter only) | | `velocity` | `tuple` | — | Velocity as (vx, vy) tuple (getter only) | | `accel_x`, `accel_y` | `float` | — | Estimated acceleration (getter only) | | `age` | `int` | — | Track age in frames (getter only) | | `hits` | `int` | — | Number of detection hits (getter only) | | `time_since_update` | `int` | — | Frames since last update (getter only) | | `state` | `int` | — | Track state code (getter only) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `speed()` | `float` | Estimated speed (magnitude of velocity) | | `heading()` | `float` | Estimated heading angle (radians) | | `is_tentative()` | `bool` | True if track is tentative (not yet confirmed) | | `is_confirmed()` | `bool` | True if track is confirmed | | `is_deleted()` | `bool` | True if track is marked for deletion | | `confirm()` | `None` | Confirm the track | | `delete()` | `None` | Mark the track for deletion | | `mark_missed()` | `None` | Mark a missed detection (no match this frame) | | `update(bbox, confidence)` | `None` | Update with new detection | ## Landmark / Landmark3D Visual landmarks for SLAM and localization. 
```python # simplified lm = horus.Landmark(x=1.5, y=2.3, visibility=0.95, index=7) lm3d = horus.Landmark3D(x=1.5, y=2.3, z=0.8, visibility=0.95, index=7) ``` ### Landmark | Field | Type | Default | Description | |-------|------|---------|-------------| | `x`, `y` | `float` | `0.0` | Position (px or m) | | `visibility` | `float` | `1.0` | Visibility score (0.0-1.0) | | `index` | `int` | `0` | Landmark index | | Static Method | Returns | Description | |---------------|---------|-------------| | `Landmark.visible(x, y, index)` | `Landmark` | Create a visible landmark (visibility=1.0) | | Method | Returns | Description | |--------|---------|-------------| | `is_visible(threshold)` | `bool` | True if visibility exceeds threshold | | `distance_to(other)` | `float` | Euclidean distance to another Landmark | ### Landmark3D | Field | Type | Default | Description | |-------|------|---------|-------------| | `x`, `y`, `z` | `float` | `0.0` | 3D position (m) | | `visibility` | `float` | `1.0` | Visibility score (0.0-1.0) | | `index` | `int` | `0` | Landmark index | | Static Method | Returns | Description | |---------------|---------|-------------| | `Landmark3D.visible(x, y, z, index)` | `Landmark3D` | Create a visible 3D landmark | | Method | Returns | Description | |--------|---------|-------------| | `is_visible(threshold)` | `bool` | True if visibility exceeds threshold | | `distance_to(other)` | `float` | Euclidean distance to another Landmark3D | | `to_2d()` | `Landmark` | Project to 2D (drops z coordinate) | --- ## Example: YOLO Detection Pipeline ```python # simplified import horus def detect_tick(node): img = node.recv("camera.rgb") if img is None: return frame = img.to_numpy() # Zero-copy results = model.predict(frame) for r in results: det = horus.Detection( class_name=r.class_name, confidence=float(r.confidence), x=r.x, y=r.y, width=r.w, height=r.h, class_id=r.class_id, ) node.send("detections", det) detector = horus.Node( name="yolo", subs=[horus.Image], 
pubs=[horus.Detection], tick=detect_tick, rate=30, compute=True, on_miss="skip", ) horus.run(detector) ``` --- ## See Also - [Image API](/python/api/image) — Zero-copy camera frames - [Vision Messages](/python/api/vision-messages) — CameraInfo, CompressedImage - [Python CV Node Recipe](/recipes/python-cv-node) — Complete CV pipeline - [Rust Perception Messages](/rust/api/perception-messages) — Rust equivalent --- ## Diagnostics Messages Path: /python/api/diagnostics-messages Description: Heartbeat, diagnostic status, emergency stop, safety status, and system health monitoring # Diagnostics Messages System health and safety monitoring types. ```python # simplified from horus import Heartbeat, DiagnosticStatus, EmergencyStop, SafetyStatus ``` --- ## Heartbeat Node alive signal — publish periodically to indicate health. ```python # simplified hb = horus.Heartbeat(node_name="motor_ctrl", node_id=1, timestamp_ns=horus.timestamp_ns()) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `node_name` | `str` | `""` | Name of the sending node | | `node_id` | `int` | `0` | Numeric node identifier | | `sequence` | `int` | — | Heartbeat sequence number (getter only) | | `alive` | `bool` | — | Whether the node is alive (getter only) | | `uptime` | `float` | — | Node uptime in seconds (getter only) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `update(uptime)` | `None` | Update heartbeat with current uptime, increments sequence | ## DiagnosticStatus Diagnostic report from a subsystem. 
```python # simplified status = horus.DiagnosticStatus(component="battery", level=0, message="OK") # level: 0=OK, 1=WARN, 2=ERROR, 3=FATAL ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `level` | `int` | `0` | 0=OK, 1=WARN, 2=ERROR, 3=FATAL | | `code` | `int` | `0` | Diagnostic error code | | `message` | `str` | `""` | Human-readable status message | | `component` | `str` | `""` | Component name | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `DiagnosticStatus.ok(message)` | `DiagnosticStatus` | Create an OK status | | `DiagnosticStatus.warn(code, message)` | `DiagnosticStatus` | Create a warning | | `DiagnosticStatus.error(code, message)` | `DiagnosticStatus` | Create an error | | `DiagnosticStatus.fatal(code, message)` | `DiagnosticStatus` | Create a fatal error | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `with_component(component)` | `DiagnosticStatus` | Return a copy with component name set | | `message_str()` | `str` | Get message as a string | | `component_str()` | `str` | Get component as a string | ## EmergencyStop E-stop signal — safety-critical. 
```python # simplified estop = horus.EmergencyStop(engaged=True, reason="obstacle detected", timestamp_ns=horus.timestamp_ns()) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `engaged` | `bool` | `True` | Whether the E-stop is engaged | | `reason` | `str` | `""` | Reason for engagement | | `auto_reset` | `bool` | — | Whether auto-reset is enabled (getter only) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `EmergencyStop.engage(reason)` | `EmergencyStop` | Create an engaged E-stop with reason | | `EmergencyStop.release()` | `EmergencyStop` | Create a released E-stop | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `with_source(source)` | `EmergencyStop` | Return a copy with source name set | | `reason_str()` | `str` | Get reason as a string | ## SafetyStatus Overall safety monitoring state. Created with no parameters — read state via getters. 
```python # simplified safety = horus.SafetyStatus() # Read-only getters safety.enabled # bool safety.estop_engaged # bool safety.watchdog_ok # bool safety.limits_ok # bool safety.comms_ok # bool safety.mode # int safety.fault_code # int safety.timestamp_ns # int ``` | Getter | Type | Description | |--------|------|-------------| | `enabled` | `bool` | Safety system enabled | | `estop_engaged` | `bool` | Emergency stop is engaged | | `watchdog_ok` | `bool` | Watchdog is healthy | | `limits_ok` | `bool` | All limits within bounds | | `comms_ok` | `bool` | Communications healthy | | `mode` | `int` | Current safety mode | | `fault_code` | `int` | Active fault code | | `timestamp_ns` | `int` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `is_safe()` | `bool` | True if all safety checks pass | | `set_fault(code)` | `None` | Set a fault code | | `clear_faults()` | `None` | Clear all fault codes | ## NodeHeartbeat Per-node health heartbeat with state and health level. 
```python # simplified nhb = horus.NodeHeartbeat(state=1, health=0) # state and health are u8 integer codes ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `state` | `int` | `0` | Node state (u8) | | `health` | `int` | `0` | Node health level (u8) | | `tick_count` | `int` | — | Total tick count (getter only) | | `target_rate_hz` | `int` | — | Target tick rate (getter only) | | `actual_rate_hz` | `int` | — | Actual measured tick rate (getter only) | | `error_count` | `int` | — | Cumulative error count (getter only) | | `last_tick_timestamp` | `int` | — | Last tick timestamp in ns (getter only) | | `heartbeat_timestamp` | `int` | — | Heartbeat timestamp in ns (getter only) | | Method | Returns | Description | |--------|---------|-------------| | `is_fresh(max_age_secs)` | `bool` | True if heartbeat is within max_age_secs of now | | `update_timestamp()` | `None` | Update the heartbeat timestamp to now | ## ResourceUsage System resource monitoring. ```python # simplified usage = horus.ResourceUsage( cpu_percent=45.0, memory_bytes=268435456, memory_percent=12.5, timestamp_ns=horus.timestamp_ns(), ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `cpu_percent` | `float` | `0.0` | CPU usage percentage | | `memory_bytes` | `int` | `0` | Memory usage in bytes | | `memory_percent` | `float` | `0.0` | Memory usage percentage | | `disk_bytes` | `int` | `0` | Disk usage in bytes | | `disk_percent` | `float` | `0.0` | Disk usage percentage | | `network_tx_bytes` | `int` | `0` | Network bytes transmitted | | `network_rx_bytes` | `int` | `0` | Network bytes received | | `temperature` | `float` | `0.0` | System temperature (°C) | | `thread_count` | `int` | `0` | Active thread count | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `is_cpu_high(threshold)` | `bool` | True if CPU usage exceeds threshold | | 
`is_memory_high(threshold)` | `bool` | True if memory usage exceeds threshold | | `is_temperature_high(threshold)` | `bool` | True if temperature exceeds threshold | ## DiagnosticReport Aggregated diagnostic report. Values are added via helper methods, not constructor args. ```python # simplified report = horus.DiagnosticReport(component="motors", level=0) report.add_value("temperature", 45.2) report.add_string("firmware", "v2.1.0") ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `component` | `str` | `""` | Component being reported on | | `level` | `int` | `0` | Overall diagnostic level | | Method | Description | |--------|-------------| | `add_value(key, value)` | Add a numeric diagnostic value | | `add_string(key, value)` | Add a string diagnostic value | ## DiagnosticValue Single key-value diagnostic entry. ```python # simplified dv = horus.DiagnosticValue(key="temperature", value="45.2") ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `key` | `str` | `""` | Diagnostic key | | `value` | `str` | `""` | Diagnostic value (as string) | | `value_type` | `int` | — | Value type code (getter only) | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `DiagnosticValue.int(key, value)` | `DiagnosticValue` | Create an integer diagnostic value | | `DiagnosticValue.float(key, value)` | `DiagnosticValue` | Create a float diagnostic value | | `DiagnosticValue.boolean(key, value)` | `DiagnosticValue` | Create a boolean diagnostic value | --- ## See Also - [Emergency Stop Recipe](/recipes/emergency-stop-python) — Complete E-stop pattern - [Safety Monitor](/advanced/safety-monitor) — Graduated degradation - [Rust Diagnostics Messages](/rust/api/diagnostics-messages) — Rust equivalent --- ## TransformFrame Guide (Python) Path: /python/transform-frame Description: Coordinate frame trees, static and dynamic transforms, lookups, point transforms, staleness, and 
diagnostics for Python developers # TransformFrame Guide (Python) A camera is mounted 15 cm above and 10 cm forward of the robot's center. A LiDAR is 30 cm above. An object detected in camera coordinates at `[2.0, 0.0, 1.5]` is meaningless to the path planner unless you can express it in the world frame. The path planner does not know where the camera is. It knows where the robot is in the world, and it needs the obstacle in the same frame. This is what TransformFrame solves. It stores a tree of named coordinate frames, each with a transform relative to its parent, and computes the transform between any two frames in the tree. You register "camera is offset from base_link by X," update "base_link is at this pose in the world," and then ask "where is this camera point in world coordinates?" TransformFrame chains the transforms through the tree and gives you the answer. ```python # simplified from horus import TransformFrame, Transform tf_tree = TransformFrame() # Build the tree: world -> base_link -> camera, lidar tf_tree.register_frame("base_link", "world") tf_tree.register_static_frame("camera", Transform.from_translation([0.1, 0.0, 0.15]), parent="base_link") tf_tree.register_static_frame("lidar", Transform.from_translation([0.0, 0.0, 0.3]), parent="base_link") # Update robot pose as it moves tf_tree.update_transform("base_link", Transform.from_euler([1.5, 2.0, 0.0], [0.0, 0.0, 0.78])) # Where is a LiDAR detection in the world? world_point = tf_tree.transform_point("lidar", "world", [5.0, 0.0, 0.0]) ``` --- ## Creating a TransformFrame The `TransformFrame` constructor takes an optional `TransformFrameConfig` that controls capacity and history depth. 
```python # simplified from horus import TransformFrame, TransformFrameConfig # Default: 512 frames, 32 history entries per frame tf_tree = TransformFrame() # Custom configuration config = TransformFrameConfig(max_frames=256, history_len=64) tf_tree = TransformFrame(config=config) # Presets for common use cases tf_tree = TransformFrame.small() # 256 frames, ~550 KB tf_tree = TransformFrame.medium() # 1024 frames, ~2.2 MB tf_tree = TransformFrame.large() # 4096 frames, ~9 MB tf_tree = TransformFrame.massive() # 16384 frames, ~35 MB ``` `max_frames` is the maximum number of frames the tree can hold. `history_len` is the number of past transforms stored per dynamic frame, used for temporal interpolation. When the history buffer fills, the oldest entry is dropped. **Choosing a size**: A simple mobile robot (base, 2 wheels, camera, LiDAR, IMU) needs ~10 frames. A humanoid (30 joints + sensors) needs ~50. A multi-robot system with 10 robots might need 500. `TransformFrame.small()` covers most single-robot systems. Use `TransformFrame.medium()` if you need headroom. ```python # simplified # Check memory usage before allocating config = TransformFrameConfig(max_frames=1024, history_len=32) print(config.memory_estimate()) # "~2.2MB" ``` --- ## Building the Frame Tree Every frame has a name, a parent (except root frames), and a transform from the parent to the child. The tree is built by registering frames with their parent-child relationships. ### Root Frames A root frame has no parent. Most systems have one root called `"world"`. Register a frame under a root by passing the root name as the parent: ```python # simplified tf_tree.register_frame("base_link", "world") ``` If the parent frame does not exist, HORUS raises `HorusNotFoundError`. Register parents before children. ### Static vs Dynamic Frames This is the most important distinction in the frame tree. **Static frames** have a fixed transform that never changes at runtime. 
Sensor mounts, fixed offsets between joints, calibration transforms -- these are static. Register them with `register_static_frame()`, which stores exactly one transform and skips interpolation on lookup: ```python # simplified from horus import Transform # Camera is 10cm forward, 15cm up from base_link, fixed forever tf_tree.register_static_frame( "camera", Transform.from_translation([0.1, 0.0, 0.15]), parent="base_link", ) # LiDAR is 30cm above base_link, slightly tilted tf_tree.register_static_frame( "lidar", Transform.from_euler([0.0, 0.0, 0.3], [0.0, 0.05, 0.0]), parent="base_link", ) ``` **Dynamic frames** change at runtime. The robot's pose in the world, a pan-tilt head, a rotating joint -- these are dynamic. Register them with `register_frame()`, then call `update_transform()` as new data arrives: ```python # simplified # Register (no initial transform) tf_tree.register_frame("base_link", "world") # Update as odometry arrives tf_tree.update_transform( "base_link", Transform.from_euler([x, y, 0.0], [0.0, 0.0, yaw]), ) ``` Dynamic frames store a history of timestamped transforms (controlled by `history_len`). Lookups between timestamps use interpolation. 
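
The history-and-interpolation behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not HORUS's implementation: `FrameHistory`, `slerp`, and `yaw_quat` are hypothetical helpers, quaternions are assumed to be `(x, y, z, w)` tuples, and timestamps are assumed to arrive in increasing order. It uses linear interpolation for translation and SLERP for rotation, the scheme this guide names for historical lookups, and a bounded buffer that drops its oldest entry when full.

```python
import bisect
import math
from collections import deque


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # flip one quaternion so the shorter arc is used
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: linear blend avoids dividing by ~0
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


class FrameHistory:
    """Hypothetical per-frame buffer of (timestamp_ns, translation, rotation)."""

    def __init__(self, history_len=32):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._entries = deque(maxlen=history_len)

    def push(self, timestamp_ns, translation, rotation):
        self._entries.append((timestamp_ns, tuple(translation), tuple(rotation)))

    def lookup(self, timestamp_ns):
        stamps = [entry[0] for entry in self._entries]
        if not stamps or not (stamps[0] <= timestamp_ns <= stamps[-1]):
            raise LookupError("timestamp outside the history window")
        i = bisect.bisect_left(stamps, timestamp_ns)
        if stamps[i] == timestamp_ns:  # exact hit, nothing to interpolate
            return self._entries[i][1], self._entries[i][2]
        t0, p0, q0 = self._entries[i - 1]
        t1, p1, q1 = self._entries[i]
        alpha = (timestamp_ns - t0) / (t1 - t0)
        position = tuple(a + alpha * (b - a) for a, b in zip(p0, p1))
        return position, slerp(q0, q1, alpha)


def yaw_quat(yaw):
    """Quaternion (x, y, z, w) for a rotation of `yaw` radians about Z."""
    return (0.0, 0.0, math.sin(yaw / 2.0), math.cos(yaw / 2.0))


history = FrameHistory(history_len=4)
history.push(0, (0.0, 0.0, 0.0), yaw_quat(0.0))
history.push(100_000_000, (1.0, 0.0, 0.0), yaw_quat(math.pi / 2))

# Query halfway between the two stored poses.
position, rotation = history.lookup(50_000_000)
yaw = 2.0 * math.atan2(rotation[2], rotation[3])
print(position, yaw)
```

A larger `history_len` widens the time window in which such lookups succeed, at the cost of more memory per dynamic frame.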
### Complete Tree Example A typical mobile robot frame tree: ```python # simplified from horus import TransformFrame, Transform tf_tree = TransformFrame() # Root: world frame (implicit, created when first child registers under it) tf_tree.register_frame("base_link", "world") # Robot body in world tf_tree.register_frame("odom", "world") # Odometry estimate # Static sensor mounts on the robot body tf_tree.register_static_frame("camera_link", Transform.from_translation([0.1, 0.0, 0.15]), parent="base_link") tf_tree.register_static_frame("lidar_link", Transform.from_translation([0.0, 0.0, 0.3]), parent="base_link") tf_tree.register_static_frame("imu_link", Transform.from_translation([0.0, 0.0, 0.05]), parent="base_link") # Dynamic: camera on a pan-tilt mount tf_tree.register_frame("camera_optical", "camera_link") ``` --- ## Updating Transforms Call `update_transform()` whenever new pose data arrives. Each call adds a timestamped entry to the frame's history buffer: ```python # simplified from horus import Transform, get_timestamp_ns # Basic update (uses current time automatically) tf_tree.update_transform("base_link", Transform.from_euler( [x, y, 0.0], [0.0, 0.0, yaw] )) # Explicit timestamp (useful when the sensor provides its own timestamp) tf_tree.update_transform("base_link", Transform.from_euler([x, y, 0.0], [0.0, 0.0, yaw]), timestamp_ns=sensor_timestamp_ns, ) ``` For performance-critical paths, use `update_transform_by_id()` to skip the name-to-ID lookup: ```python # simplified # Cache the frame ID at init time base_id = tf_tree.frame_id("base_link") # Use the numeric ID in the hot loop def odom_tick(node): tf_tree.update_transform_by_id(base_id, new_transform) ``` ### Overwriting Static Transforms Static transforms are set at registration time. 
To change one later (e.g., after a recalibration), use `set_static_transform()`: ```python # simplified # Recalibrate camera offset tf_tree.set_static_transform("camera_link", Transform.from_euler([0.105, 0.002, 0.148], [0.001, 0.0, 0.003]) ) ``` --- ## Looking Up Transforms ### Latest Transform `tf()` returns the most recent transform from the source frame to the target frame. This is the most common operation: ```python # simplified # Get the transform from lidar to world lidar_to_world = tf_tree.tf("lidar_link", "world") print(f"LiDAR position in world: {lidar_to_world.translation}") ``` Under the hood, `tf()` walks the frame tree from source to target, composing transforms along the path. If the two frames share no common ancestor, it raises `HorusTransformError`. ### Historical Transform `tf_at()` returns the interpolated transform at a specific timestamp. This is essential for sensor fusion, where sensor readings arrive with timestamps that may not align with the latest transform updates: ```python # simplified # Get the robot's pose at the time the LiDAR scan was captured lidar_to_world_at_scan = tf_tree.tf_at("lidar_link", "world", timestamp_ns=scan_timestamp) ``` Interpolation uses SLERP for rotation and linear interpolation for translation. If the requested timestamp falls outside the history window, `HorusTransformError` is raised. For strict lookups without interpolation: ```python # simplified # Exact timestamp match only (no interpolation) tf = tf_tree.tf_at_strict("lidar_link", "world", timestamp_ns=exact_ts) # Interpolation with tolerance window (default 100ms) tf = tf_tree.tf_at_with_tolerance("lidar_link", "world", timestamp_ns=ts, tolerance_ns=50_000_000) # 50ms tolerance ``` ### Checking Availability Before a blocking lookup, check if the transform path exists: ```python # simplified if tf_tree.can_transform("lidar_link", "world"): tf = tf_tree.tf("lidar_link", "world") # Use tf... 
# Check at a specific timestamp if tf_tree.can_transform_at("lidar_link", "world", timestamp_ns=ts): tf = tf_tree.tf_at("lidar_link", "world", timestamp_ns=ts) ``` ### Waiting for Transforms At startup, transforms may not be available yet (e.g., waiting for the first odometry message). Use `wait_for_transform()` to block until the transform becomes available: ```python # simplified try: tf = tf_tree.wait_for_transform("lidar_link", "world", timeout_sec=2.0) print(f"Got transform: {tf.translation}") except HorusTimeoutError: print("Transform not available within 2 seconds") ``` For non-blocking waits, use the async variant: ```python # simplified import concurrent.futures future = tf_tree.wait_for_transform_async("lidar_link", "world", timeout_sec=5.0) # Do other work... try: tf = future.result(timeout=5.0) # Block only when you need the result except concurrent.futures.TimeoutError: print("Still waiting for transform") ``` --- ## Transforming Points and Vectors The most common use of TransformFrame: transform sensor readings from the sensor's coordinate frame into a frame the rest of the system understands. ### Points A point has position. `transform_point()` applies translation and rotation: ```python # simplified # LiDAR detected an obstacle at [5.0, 0.0, 0.0] in lidar coordinates # Where is it in the world? world_point = tf_tree.transform_point("lidar_link", "world", [5.0, 0.0, 0.0]) print(f"Obstacle at world: ({world_point[0]:.2f}, {world_point[1]:.2f}, {world_point[2]:.2f})") ``` ### Vectors A vector has direction but no position. `transform_vector()` applies rotation only (no translation). Use this for velocities, normals, and directions: ```python # simplified # IMU measures acceleration in its own frame # What direction is that in the world frame? 
accel_world = tf_tree.transform_vector("imu_link", "world", [0.0, 0.0, 9.81]) ``` ### Batch Pattern For processing multiple points (e.g., a full LiDAR scan), look up the transform once and apply it to each point: ```python # simplified import math def process_scan(node): scan = node.recv("scan") if scan is None: return # Look up the transform once lidar_to_world = tf_tree.tf("lidar_link", "world") # Apply to each scan point world_points = [] for angle, distance in zip(scan.angles, scan.ranges): if distance > 0.0: x = distance * math.cos(angle) y = distance * math.sin(angle) world_pt = lidar_to_world.transform_point([x, y, 0.0]) world_points.append(world_pt) node.send("obstacles", world_points) ``` This is more efficient than calling `tf_tree.transform_point()` for each point, because the tree walk and transform composition happen only once. --- ## Staleness Detection Stale transforms are a real problem. If the odometry node crashes, `base_link` stops updating. Every lookup still succeeds -- it returns the last known transform -- but the robot has moved. You are now working with wrong data. ### Checking Staleness ```python # simplified # How long since base_link was last updated? age = tf_tree.time_since_last_update("base_link") if age is not None: print(f"base_link last updated {age:.3f}s ago") else: print("base_link has never been updated") # Convenience: is this frame's data too old? 
if tf_tree.is_stale("base_link", max_age_sec=0.1): print("WARNING: Odometry data is stale!") ``` ### Staleness Guard Pattern Check staleness before using transforms for control decisions: ```python # simplified def controller_tick(node): # Safety check: don't navigate with stale odometry if tf_tree.is_stale("base_link", max_age_sec=0.2): node.log_warning("Odometry stale — sending zero velocity") node.send("cmd_vel", horus.CmdVel(linear=0.0, angular=0.0)) return # Safe to navigate robot_in_world = tf_tree.tf("base_link", "world") cmd = compute_velocity(robot_in_world, goal) node.send("cmd_vel", cmd) ``` ### What "Stale" Means `time_since_last_update(name)` returns seconds since the last `update_transform()` call for that frame. It returns `None` if the frame was registered but never updated. `is_stale(name, max_age_sec)` returns `True` if the frame has never been updated or if the last update was longer ago than `max_age_sec`. Static frames are never stale -- they have no temporal component. `is_stale()` always returns `False` for static frames. 
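The staleness rules above can be written down as a small standalone model. This is a sketch of the described semantics only, not the HORUS implementation: `StalenessTracker`, `register`, and `mark_updated` are hypothetical names, and the injectable `clock` is an assumption added so the behaviour can be tested without sleeping.

```python
import time


class StalenessTracker:
    """Hypothetical model of the staleness rules described above."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_update = {}  # frame name -> time of last update, or None
        self._static = set()

    def register(self, name, static=False):
        self._last_update[name] = None
        if static:
            self._static.add(name)

    def mark_updated(self, name):
        self._last_update[name] = self._clock()

    def time_since_last_update(self, name):
        last = self._last_update[name]
        return None if last is None else self._clock() - last

    def is_stale(self, name, max_age_sec):
        if name in self._static:
            return False  # static frames have no temporal component
        age = self.time_since_last_update(name)
        # never updated counts as stale, as does an update that is too old
        return age is None or age > max_age_sec


now = [0.0]  # fake clock so the example is deterministic
tracker = StalenessTracker(clock=lambda: now[0])
tracker.register("base_link")
tracker.register("lidar_link", static=True)
tracker.mark_updated("base_link")
now[0] = 0.5
print(tracker.is_stale("base_link", max_age_sec=0.1))
print(tracker.is_stale("lidar_link", max_age_sec=0.1))
```

Treating a never-updated dynamic frame as stale is the conservative choice: a controller guarded by `is_stale()` stays in its safe branch until the first real update arrives.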
--- ## Diagnostics ### Tree Visualization Print a human-readable tree of all registered frames: ```python # simplified print(tf_tree.format_tree()) ``` Output: ``` world base_link [dynamic, age=0.015s] camera_link [static] lidar_link [static] imu_link [static] camera_optical [dynamic, age=0.022s] ``` ### Graphviz Export Export the frame tree as a DOT graph for rendering with Graphviz: ```python # simplified dot = tf_tree.frames_as_dot() with open("frame_tree.dot", "w") as f: f.write(dot) # Render: dot -Tpng frame_tree.dot -o frame_tree.png ``` ### YAML Export Export the frame tree structure as YAML: ```python # simplified print(tf_tree.frames_as_yaml()) ``` ### Statistics ```python # simplified stats = tf_tree.stats() print(f"Frames: {stats['total_frames']}/{stats['max_frames']}") print(f"Static: {stats['static_frames']}, Dynamic: {stats['dynamic_frames']}") print(f"Tree depth: {stats['tree_depth']}, Roots: {stats['root_count']}") ``` | Key | Type | Description | |-----|------|-------------| | `total_frames` | `int` | Total registered frames | | `static_frames` | `int` | Frames that never change | | `dynamic_frames` | `int` | Frames updated at runtime | | `max_frames` | `int` | Maximum capacity | | `history_len` | `int` | Transform history buffer size per frame | | `tree_depth` | `int` | Maximum depth of the frame tree | | `root_count` | `int` | Number of root frames (no parent) | ### Per-Frame Info ```python # simplified info = tf_tree.frame_info("camera_link") print(f"Frame: {info['name']}, Parent: {info['parent']}, Static: {info['is_static']}") ``` ### Integrity Validation ```python # simplified if not tf_tree.validate(): print("Frame tree integrity check failed!") ``` `validate()` checks for cycles, orphaned frames, and internal consistency. Returns `True` if the tree is healthy. 
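As a rough illustration of what a cycle and orphan check involves, here is a standalone sketch over a plain parent map. It is not how `validate()` is implemented; `validate_tree` and the `parents` dictionary layout are assumptions made for this example.

```python
def validate_tree(parents):
    """Return True if the parent map has no cycles and no orphaned frames.

    `parents` maps each frame name to its parent name, or None for a root.
    """
    for name in parents:
        seen = set()
        current = name
        while current is not None:
            if current in seen:
                return False  # walked back onto a frame already visited: cycle
            if current not in parents:
                return False  # parent was never registered: orphan
            seen.add(current)
            current = parents[current]
    return True


healthy = {
    "world": None,
    "base_link": "world",
    "lidar_link": "base_link",
    "camera_link": "base_link",
}
print(validate_tree(healthy))
```

Walking from every frame to its root keeps the check easy to read; for very large trees a version that caches already-verified frames would avoid repeated walks.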
--- ## Error Handling | Operation | Exception | When | |-----------|-----------|------| | `tf_tree.tf("missing", "world")` | `HorusNotFoundError` | Frame not registered | | `tf_tree.tf("lidar", "disconnected")` | `HorusTransformError` | No path between frames | | `tf_tree.wait_for_transform(..., 1.0)` | `HorusTimeoutError` | Transform not available in time | | `tf_tree.register_frame("child", "missing")` | `HorusNotFoundError` | Parent not registered | | `tf_tree.unregister_frame("missing")` | `HorusNotFoundError` | Frame does not exist | | `tf_tree.tf_at_strict(s, d, old_ts)` | `HorusTransformError` | No exact timestamp match | | `Transform.from_matrix(bad)` | `ValueError` | Not a valid 4x4 matrix | The safe pattern for tick functions: ```python # simplified from horus import HorusNotFoundError, HorusTransformError def perception_tick(node): try: lidar_to_world = tf_tree.tf("lidar_link", "world") point_in_world = lidar_to_world.transform_point([5.0, 0.0, 0.0]) node.send("obstacle", point_in_world) except HorusNotFoundError: pass # Frame not yet registered — skip this tick except HorusTransformError as e: node.log_warning(f"Transform failed: {e}") ``` --- ## Complete Example A mobile robot with IMU, LiDAR, and camera. The odometry node updates `base_link` in the world. The perception node transforms LiDAR detections into world coordinates. Static frames define sensor mounting positions. 
```python # simplified import horus from horus import TransformFrame, Transform, HorusNotFoundError, HorusTransformError import math # Global transform tree tf_tree = TransformFrame.medium() def setup_frames(node): """Register all coordinate frames at startup.""" tf_tree.register_frame("base_link", "world") # Static sensor mounts tf_tree.register_static_frame("lidar_link", Transform.from_translation([0.0, 0.0, 0.3]), parent="base_link") tf_tree.register_static_frame("camera_link", Transform.from_euler([0.1, 0.0, 0.15], [0.0, 0.1, 0.0]), parent="base_link") tick_count = 0 def odometry_tick(node): """Simulate robot moving in a circle, updating base_link pose.""" global tick_count tick_count += 1 t = tick_count * 0.01 x = math.cos(t) * 2.0 y = math.sin(t) * 2.0 yaw = t + math.pi / 2 tf_tree.update_transform("base_link", Transform.from_euler([x, y, 0.0], [0.0, 0.0, yaw])) def perception_tick(node): """Transform LiDAR detections into world coordinates.""" if tf_tree.is_stale("base_link", max_age_sec=0.1): node.log_warning("Odometry stale, skipping perception") return try: lidar_to_world = tf_tree.tf("lidar_link", "world") # Simulated detection at 5m directly ahead of LiDAR point_in_world = lidar_to_world.transform_point([5.0, 0.0, 0.0]) node.send("obstacle_world", point_in_world) node.log_info(f"Obstacle at world: ({point_in_world[0]:.2f}, " f"{point_in_world[1]:.2f}, {point_in_world[2]:.2f})") except (HorusNotFoundError, HorusTransformError): pass # Frame not yet available odom_node = horus.Node( name="odometry", tick=odometry_tick, init=setup_frames, rate=100, pubs=["odom"], order=0, ) perception_node = horus.Node( name="perception", tick=perception_tick, rate=10, pubs=["obstacle_world"], order=1, ) horus.run(odom_node, perception_node, duration=10) ``` --- ## Design Decisions **Why a global tree instead of per-node transform state?** Multiple nodes need the same frame data. The perception node needs `lidar_link -> world`. The controller needs `base_link -> world`. The visualizer needs everything. 
A per-node tree would require broadcasting every transform update to every node, duplicating memory and adding synchronization complexity. A single shared tree, backed by lock-free shared memory, lets all nodes read the same data with no contention. **Why SLERP for rotation interpolation?** Linear interpolation of quaternions produces non-unit quaternions, which are invalid rotations. SLERP (Spherical Linear Interpolation) follows the shortest arc on the unit sphere, producing valid rotations at every interpolation step. This matters because transform lookups between timestamps use interpolation -- incorrect interpolation means incorrect robot pose, which compounds through the frame tree and produces incorrect world-frame coordinates. **Why bounded transform history instead of unlimited?** Each dynamic frame stores the last N transforms (configurable via `history_len`). Unlimited history would leak memory in long-running robots. Bounded history means old transforms are discarded -- if you query a timestamp older than the history window, you get a `HorusTransformError`. The default history length covers several seconds at typical update rates, which is sufficient for sensor fusion while keeping memory bounded. **Why separate static and dynamic frames?** Static frames (sensor mounts, fixed offsets) never change -- storing history for them wastes memory and adds lookup overhead. `register_static_frame()` stores exactly one transform and skips interpolation. Dynamic frames (robot base, moving joints) need timestamped history for interpolation. Separating them lets the system optimize each case: static lookups are O(1) with no interpolation, dynamic lookups use binary search + SLERP. **Why lock-free shared memory for the frame tree?** Multiple nodes read transforms concurrently (perception, control, visualization). A mutex-protected tree would serialize all readers, creating a bottleneck. 
The lock-free implementation uses atomic operations so readers never block each other or the writer. The cost is slightly more complex update logic, but the benefit is zero contention in multi-node systems. ## Trade-offs | Gain | Cost | |------|------| | **Single shared tree** -- all nodes see the same transforms | Must manage tree lifecycle carefully; clearing the tree affects everyone | | **Lock-free reads** -- zero contention between nodes | Writers pay slightly more overhead for atomic updates | | **Bounded history** -- memory usage is predictable | Old timestamps become unavailable; must tune `history_len` for your update rate | | **Static frame optimization** -- O(1) lookup, no interpolation | Must decide at registration time; changing a static frame requires `set_static_transform()` | | **SLERP interpolation** -- correct rotation at all interpolation points | Slightly more expensive than linear interpolation (~2x per lookup) | | **Name-based API** -- readable code (`tf("lidar", "world")`) | String lookup per call; use `frame_id()` + `tf_by_id()` on hot paths | --- ## See Also - [TransformFrame API Reference](/python/api/transform-frame) -- Full method tables and signatures - [TransformFrame Concepts](/concepts/transform-frame) -- Architecture and design of the frame tree system - [Recipe: Coordinate Transform Tree](/recipes/transform-frames) -- Copy-paste transform tree setup - [Geometry Messages](/python/messages/geometry) -- Pose, Twist, and TransformStamped message types - [Python Bindings](/python/api/python-bindings) -- Core Python API reference --- ## Vision Messages Path: /python/api/vision-messages Description: Camera calibration, compressed images, stereo info, and region of interest # Vision Messages Camera metadata and compressed image types. For raw images, see the [Image API](/python/api/image). 
```python # simplified from horus import CompressedImage, CameraInfo, StereoInfo, RegionOfInterest ``` --- ## CameraInfo Camera intrinsic calibration — focal length, principal point, distortion. ```python # simplified info = horus.CameraInfo( width=640, height=480, fx=525.0, fy=525.0, # Focal length (pixels) cx=320.0, cy=240.0, # Principal point (pixels) ) ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `width`, `height` | `int` | px | Image dimensions | | `fx`, `fy` | `float` | px | Focal length | | `cx`, `cy` | `float` | px | Principal point | ## CompressedImage JPEG/PNG encoded image for network transport. ```python # simplified comp = horus.CompressedImage(format="jpeg", data=jpeg_bytes) ``` ## RegionOfInterest Rectangular region within an image. ```python # simplified roi = horus.RegionOfInterest(x_offset=100, y_offset=50, width=200, height=150) ``` ## StereoInfo Stereo camera configuration. ```python # simplified stereo = horus.StereoInfo(baseline=0.12, left_info=left_cam, right_info=right_cam) ``` --- ## See Also - [Image API](/python/api/image) — Zero-copy raw images - [Perception Messages](/python/api/perception-messages) — Detection, segmentation - [Rust Vision Messages](/rust/api/vision-messages) — Rust equivalent --- ## Ring Buffer Semantics (Python) Path: /python/ring-buffer Description: How HORUS topic buffers work in Python: capacity, overflow, recv behavior, drop detection, and tuning # Ring Buffer Semantics (Python) You have a LiDAR node publishing scans at 40 Hz and a path planner consuming them at 10 Hz. Four scans arrive between each planner tick. Where do the extra three go? They sit in the ring buffer. When the planner calls `node.recv("scan")`, it gets the oldest unread scan -- not the latest. If the planner falls further behind and 1024 scans pile up, the buffer is full. The next scan overwrites the oldest one. No error. No exception. No backpressure. The publisher keeps publishing at full speed. 
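The scenario above can be reproduced with a small stand-in for the topic buffer. This is a sketch in plain Python, not HORUS code: `DropOldestBuffer` and `publish` are hypothetical names, while `recv`, `recv_all`, and `has_msg` mirror the behaviour this page describes for the node API.

```python
from collections import deque


class DropOldestBuffer:
    """Hypothetical model of a fixed-capacity, drop-oldest topic buffer."""

    def __init__(self, capacity=1024):
        # A deque with maxlen silently discards the oldest item when full
        self._slots = deque(maxlen=capacity)

    def publish(self, msg):
        self._slots.append(msg)  # never blocks, never raises on overflow

    def recv(self):
        # Oldest unread message, or None when the buffer is empty
        return self._slots.popleft() if self._slots else None

    def recv_all(self):
        drained = list(self._slots)
        self._slots.clear()
        return drained

    def has_msg(self):
        return len(self._slots) > 0


buf = DropOldestBuffer(capacity=1024)
for scan_id in range(1025):  # one more publish than the buffer can hold
    buf.publish(scan_id)

# Scan 0 was overwritten by the 1025th publish, so the oldest unread scan is 1
print(buf.recv())
```

Shrinking `capacity` to a handful of slots makes the overflow easy to watch in a debugger or a REPL.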
This is the fundamental contract of HORUS topic communication: **publishers never block, subscribers never stall, and when the buffer fills, the oldest message is silently dropped.** Understanding this contract is the difference between a system that works reliably and one where "my subscriber misses messages" becomes a recurring mystery. --- ## How the Ring Buffer Works Every topic subscription creates a ring buffer. The buffer is a fixed-size circular queue in shared memory. Publishers write to the head. Subscribers read from the tail. ``` Buffer capacity: 8 (shown small for illustration; default is 1024) After 5 publishes, 0 reads: [msg1] [msg2] [msg3] [msg4] [msg5] [ ] [ ] [ ] tail head ^ recv() returns msg1 After 5 publishes, 2 reads: [ ] [ ] [msg3] [msg4] [msg5] [ ] [ ] [ ] tail head ^ recv() returns msg3 Buffer full (8 publishes, 0 reads): [msg1] [msg2] [msg3] [msg4] [msg5] [msg6] [msg7] [msg8] tail head 9th publish — msg1 is dropped, msg9 takes its slot: [msg9] [msg2] [msg3] [msg4] [msg5] [msg6] [msg7] [msg8] tail head ^ recv() returns msg2 (msg1 is gone) ``` ### Key Properties | Property | Behavior | |----------|----------| | Default capacity | 1024 messages per topic | | Overflow policy | Oldest message dropped (no backpressure) | | `recv()` when empty | Returns `None` | | `recv()` when available | Returns oldest unread message (FIFO) | | `recv_all()` | Returns all buffered messages as a list, drains the buffer | | `has_msg()` | Returns `True` if at least one message is available, does not consume it | | Publisher behavior on full buffer | Overwrites oldest, continues at full speed | | Subscriber behavior on empty buffer | Returns immediately with `None` | --- ## The Core API ### `recv()` -- Read One Message ```python # simplified def tick(node): msg = node.recv("scan") if msg is None: return # Nothing available this tick # Process the oldest unread message process(msg) ``` `recv()` returns the oldest unread message from the buffer and removes it. 
If the buffer is empty, it returns `None` immediately -- it never blocks. Each call consumes exactly one message. ### `recv_all()` -- Drain Everything ```python # simplified def tick(node): messages = node.recv_all("commands") for msg in messages: execute(msg) # messages is [] if nothing was buffered ``` `recv_all()` returns every buffered message as a Python list and empties the buffer. Returns an empty list if nothing is available. Use this when you need to process every message and cannot afford to drop any. ### `has_msg()` -- Peek Without Consuming ```python # simplified def tick(node): if node.has_msg("emergency.stop"): stop_msg = node.recv("emergency.stop") handle_emergency(stop_msg) ``` `has_msg()` checks if at least one message is available without consuming it. The message stays in the buffer for the next `recv()` call. Use this for conditional processing -- check first, then decide whether to consume. --- ## "My Subscriber Misses Messages" This is the most common question from developers new to HORUS. Here is exactly what happens and why. ### The Scenario A camera node publishes images at 30 Hz. A detection node subscribes and processes them at 10 Hz. Over 3 seconds: - Camera publishes 90 images - Detection processes 30 images (one per tick at 10 Hz) - 60 images sit in the buffer (90 - 30 = 60) If this continues indefinitely, the buffer fills to 1024. After that, every new publish drops the oldest unread image. **This is not a bug.** The detection node is processing images slower than they arrive. The ring buffer absorbs the burst, and when full, it keeps only the most recent 1024 images. ### When `recv()` Returns `None` `recv()` returns `None` only when the buffer is truly empty -- the subscriber has consumed everything and the publisher has not written anything new since the last read. This happens when: 1. The publisher has not started yet (startup race) 2. The publisher runs slower than the subscriber 3. 
The subscriber just called `recv_all()` and drained everything `recv()` does **not** return `None` when messages were dropped due to overflow. Overflow is invisible to the subscriber -- it simply never sees the dropped messages. ### How to Know If Messages Were Dropped There is no built-in per-message drop counter on the Python API. Instead, use these approaches: **Approach 1: Check publish rate vs. receive rate with CLI tools** ```bash # In one terminal: check how fast a topic is being published horus topic hz scan # In another: check how many messages are buffered horus topic info scan ``` If the publish rate is significantly higher than your subscriber's tick rate and the buffer is near capacity, drops are happening. **Approach 2: Use message timestamps** If your messages include a `timestamp_ns` field (all typed messages do), check for gaps: ```python # simplified last_ts = None def tick(node): global last_ts msg = node.recv("imu") if msg is None: return if last_ts is not None: gap_ms = (msg.timestamp_ns - last_ts) / 1_000_000 expected_ms = 10.0 # 100 Hz publisher if gap_ms > expected_ms * 2: node.log_warning(f"IMU gap: {gap_ms:.1f}ms (expected ~{expected_ms:.1f}ms)") last_ts = msg.timestamp_ns ``` A timestamp gap larger than the expected publish interval means either messages were dropped or the publisher hiccupped. 
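The gap check can be taken one step further to estimate how many messages are missing between two consecutive reads. The helper below is a hypothetical sketch, not part of HORUS, and it assumes the publisher runs at a steady, known rate; jitter smaller than half a period is absorbed by rounding.

```python
def estimate_missed(prev_ts_ns, ts_ns, expected_period_ns):
    """Estimate how many messages were skipped between two timestamps.

    Assumes a steady publisher; the result is an estimate, not an exact count.
    """
    periods = round((ts_ns - prev_ts_ns) / expected_period_ns)
    return max(0, periods - 1)


PERIOD_NS = 10_000_000  # 100 Hz publisher

print(estimate_missed(0, 10_000_000, PERIOD_NS))  # consecutive messages
print(estimate_missed(0, 40_000_000, PERIOD_NS))  # a 40 ms gap
```

Like the timestamp check itself, this cannot tell a dropped message from a publisher that paused, so treat a non-zero result as a prompt to look at `horus topic hz` rather than as proof of overflow.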
**Approach 3: Count what you receive** ```python # simplified received = 0 def tick(node): global received msgs = node.recv_all("data") received += len(msgs) # After the run: # If publisher sent 10000 and you received 8500, 1500 were dropped ``` --- ## Tuning Buffer Capacity ### Setting Capacity Set the default buffer capacity for all topics auto-created by a node using `default_capacity` on the `Node` constructor: ```python # simplified import horus # Small buffer: only keep the 16 most recent images camera_sub = horus.Node( name="detector", subs=[horus.Image], tick=detect_tick, rate=10, default_capacity=16, ) # Large buffer: hold up to 4096 commands so bursts are not dropped command_processor = horus.Node( name="cmd_processor", subs=["commands"], tick=process_tick, rate=100, default_capacity=4096, ) ``` ### Choosing the Right Capacity The right capacity depends on two numbers: the publisher's rate and the subscriber's rate. **Minimum to avoid drops**: `capacity >= publisher_rate / subscriber_rate * max_burst_duration`, where `max_burst_duration` is the longest subscriber tick measured in nominal tick periods, and the subscriber drains the buffer every tick with `recv_all()`. For a 100 Hz publisher and a 10 Hz subscriber, 10 messages arrive per subscriber tick. If the subscriber occasionally takes 2x longer (e.g., ML inference on a complex frame), 20 messages arrive in that period. A capacity of 64 gives comfortable headroom for occasional slowdowns. A subscriber that reads only one message per tick with `recv()` falls 9 messages further behind every tick, so no capacity prevents drops in that case. **Rule of thumb**: | Publisher rate | Subscriber rate | Recommended capacity | |---------------|-----------------|---------------------| | Same or lower | Any | Default (1024) is fine | | 2-10x faster | Any | Default (1024) is fine | | 10-100x faster | Any | Consider larger if every message matters | | Any | Bursty (variable processing time) | 2-4x the steady-state backlog | ### Memory Impact Each slot in the ring buffer holds one message. For typed (Pod) messages, the memory per slot is the message size (e.g., 48 bytes for `CmdVel`, 64 bytes for `Imu`). For generic messages (dicts), each slot holds a serialized MessagePack blob. 
| Capacity | `CmdVel` (48 B) | `Imu` (64 B) | `Image` 640x480 RGB (921 KB) | |----------|-----------------|--------------|------------------------------| | 64 | 3 KB | 4 KB | 57 MB | | 1024 (default) | 48 KB | 64 KB | 922 MB | | 4096 | 192 KB | 256 KB | 3.6 GB | For large messages like images, **reduce the buffer capacity**. A 1024-slot buffer for 640x480 RGB images uses nearly 1 GB. Set `default_capacity=8` or `default_capacity=16` for image topics -- you almost certainly want the latest frame, not a backlog of 1024 frames. --- ## When Drops Are OK vs. When They Are Not ### Drops Are Fine **Video frames**: A perception node running at 10 Hz on a 30 Hz camera stream should process the most recent frame, not a queue of old ones. Dropped frames are not just acceptable -- they are desirable. Processing a 3-second-old frame is worse than skipping it. **Sensor readings for display**: A dashboard showing IMU data at 1 Hz does not need every 100 Hz reading. The latest reading is sufficient. **Telemetry / logging**: If the logger falls behind, it is better to lose some data points than to build up an ever-growing backlog that delays all future data. **Redundant readings**: If a temperature sensor publishes at 100 Hz but the value changes every few seconds, dropping 99% of the readings loses no information. For these cases, use `recv()` to get one message per tick, and accept that earlier messages were overwritten. Or even better, drain with `recv_all()` and only use the last one: ```python # simplified def tick(node): # Get all buffered frames, use only the latest frames = node.recv_all("camera.rgb") if frames: latest = frames[-1] process(latest) ``` ### Drops Are Not OK **Motor commands**: If a trajectory planner sends a sequence of 100 velocity commands and the motor controller drops 20, the robot follows the wrong trajectory. Every command in the sequence matters. 
**State machine transitions**: If a state manager sends "arm -> disarm -> arm" and the middle message is dropped, the system ends up armed, as intended, but it never passed through the disarmed state and so skipped the safety check tied to that transition. **Configuration updates**: If a parameter server sends updated PID gains and the message is dropped, the controller runs with stale gains indefinitely. **Sequential protocols**: Any communication where message ordering and completeness matter (handshakes, acknowledgments, multi-step commands). For these cases: ```python # simplified # Use recv_all() to process every message def motor_tick(node): commands = node.recv_all("trajectory") for cmd in commands: apply_velocity(cmd) # And increase capacity if needed motor = horus.Node( name="motor", subs=["trajectory"], tick=motor_tick, rate=100, default_capacity=4096, ) ``` --- ## Patterns ### Latest-Only (Skip Backlog) When you only care about the most recent value: ```python # simplified def tick(node): msgs = node.recv_all("camera.rgb") if msgs: process(msgs[-1]) # Only the newest ``` ### Process All (Never Drop) When every message must be handled: ```python # simplified def tick(node): for cmd in node.recv_all("commands"): execute(cmd) ``` ### Conditional Processing When some messages need immediate handling: ```python # simplified def tick(node): # Always check for emergency stop first if node.has_msg("emergency.stop"): stop = node.recv("emergency.stop") handle_emergency(stop) return # Normal processing msg = node.recv("data") if msg is not None: process(msg) ``` ### Accumulate and Batch When you want to collect messages over time and process as a batch: ```python # simplified accumulated = [] def tick(node): global accumulated accumulated.extend(node.recv_all("measurements")) if len(accumulated) >= 100: result = analyze_batch(accumulated) node.send("analysis", result) accumulated.clear() ``` --- ## Common Mistakes ### Mistake 1: Calling `recv()` Once When Multiple Messages Are Buffered ```python # simplified # WRONG: processes only 1 message 
per tick, even if 50 are buffered def tick(node): msg = node.recv("commands") if msg: execute(msg) ``` If the publisher is faster than the subscriber, this creates an ever-growing backlog. The buffer eventually fills and messages are dropped. If every message matters, use `recv_all()`: ```python # simplified # CORRECT: processes all buffered messages each tick def tick(node): for msg in node.recv_all("commands"): execute(msg) ``` ### Mistake 2: Assuming `recv()` Returns the Latest Message ```python # simplified # WRONG assumption: "recv gives me the newest" def tick(node): latest = node.recv("scan") # This is the OLDEST unread, not the newest ``` `recv()` returns the oldest unread message (FIFO order). If you want the latest, drain and use the last: ```python # simplified # CORRECT: get the latest def tick(node): scans = node.recv_all("scan") if scans: latest = scans[-1] ``` ### Mistake 3: Not Checking for `None` ```python # simplified # WRONG: crashes when buffer is empty def tick(node): msg = node.recv("data") process(msg["value"]) # TypeError: 'NoneType' is not subscriptable ``` Always check: ```python # simplified # CORRECT def tick(node): msg = node.recv("data") if msg is not None: process(msg["value"]) ``` ### Mistake 4: Using Default Capacity for Large Messages ```python # simplified # DANGEROUS: 1024 * 921KB = 922 MB for one topic node = horus.Node( subs=[horus.Image], tick=process_tick, rate=10, # default_capacity=1024 (implicit) ) ``` Reduce capacity for large messages: ```python # simplified # CORRECT: 16 * 921KB = ~14 MB node = horus.Node( subs=[horus.Image], tick=process_tick, rate=10, default_capacity=16, ) ``` --- ## Monitoring Buffer Health Use the HORUS CLI to inspect topic buffer state at runtime: ```bash # Check publish rate horus topic hz scan # Output: scan: 40.0 Hz # List active topics and their metadata horus topic list # Watch a topic's contents horus topic echo scan --count 5 ``` If you see the publish rate is much higher than your 
subscriber's tick rate and you cannot afford drops, either increase the subscriber's rate, increase the buffer capacity, or use `recv_all()` to drain each tick. --- ## Design Decisions **Why a fixed-size ring buffer instead of an unbounded queue?** Unbounded queues leak memory. A LiDAR publishing at 40 Hz on a 1 Hz subscriber would accumulate 40 messages per second indefinitely. After an hour, that is 144,000 messages consuming memory that will never be reclaimed because the subscriber will never catch up. A fixed-size ring buffer caps memory at `capacity * message_size`, and the publisher keeps running at full speed regardless of subscriber behavior. In robotics, predictable memory usage is a requirement, not a luxury. **Why drop the oldest message instead of the newest?** When the buffer is full and a new message arrives, the system has two choices: drop the new message (keep old data) or drop the oldest message (keep new data). In robotics, newer data is almost always more valuable -- the robot has moved since the old reading was taken. Keeping old data at the expense of new data means the subscriber is working with increasingly stale information. Dropping the oldest keeps the buffer as fresh as possible. **Why no backpressure?** Backpressure means the publisher slows down or blocks when the subscriber cannot keep up. In robotics, publishers are often tied to physical sensors (a camera produces frames at 30 Hz regardless of what the subscriber does). Blocking the sensor read thread causes hardware buffer overflows, dropped interrupts, or device resets. Even for software publishers, backpressure from a slow logger should never cause a motor controller to miss its deadline. The ring buffer decouples publisher and subscriber timing completely. **Why `recv()` returns the oldest, not the latest?** FIFO ordering preserves causality. If a trajectory planner sends commands `[1, 2, 3, 4, 5]`, the motor controller should execute them in that order. 
Returning the latest would skip commands. Developers who want only the latest can use the `recv_all()[-1]` pattern, but the default behavior preserves message ordering for safety-critical communication.

**Why 1024 as the default capacity?** It is large enough to absorb multi-second bursts between a fast publisher and a slow subscriber (e.g., with a 100 Hz publisher and a 10 Hz subscriber, 10 messages arrive per subscriber tick, so 1024 slots hold about 10 seconds of publisher output, or roughly 100 subscriber ticks). It is small enough that memory usage stays reasonable for small messages (1024 * 64 bytes = 64 KB per topic). Large messages (images, point clouds) should explicitly reduce the capacity.

## Trade-offs

| Gain | Cost |
|------|------|
| **Publishers never block** -- sensor nodes run at full speed regardless of subscribers | Subscribers can miss messages with no notification |
| **Fixed memory** -- `capacity * message_size`, never grows | Must choose capacity at construction time; too small drops messages, too large wastes memory |
| **FIFO ordering** -- messages arrive in publish order | Subscribers that only need the latest must drain the buffer themselves |
| **No backpressure** -- a slow subscriber cannot stall a fast publisher | No built-in mechanism to signal "subscriber is falling behind" |
| **Zero-copy shared memory** -- publisher and subscriber share the same buffer | Buffer is per-subscriber; N subscribers means N copies of each message |
| **Default capacity 1024** -- works for most small-message topics without tuning | Image topics at 1024 slots use nearly 1 GB; must tune manually |

---

## See Also

- [Python Bindings](/python/api/python-bindings) -- Full `recv()`, `recv_all()`, `has_msg()`, and `send()` reference
- [Topics: How Nodes Talk](/concepts/topics-beginner) -- Beginner introduction to topic communication
- [Topics -- Full Reference](/concepts/core-concepts-topic) -- Topic architecture, zero-copy IPC, and shared memory
- [Shared Memory](/concepts/shared-memory) -- How HORUS uses shared memory for zero-copy transport
- 
[Python Examples](/python/examples) -- Complete working examples --- ## Force & Haptic Messages Path: /python/api/force-messages Description: Force/torque sensing, impedance control, haptic feedback, and tactile arrays # Force & Haptic Messages Force/torque sensing and haptic feedback types for manipulation and human-robot interaction. ```python # simplified from horus import WrenchStamped, ForceCommand, ContactInfo, HapticFeedback ``` --- ## WrenchStamped 6-DOF force/torque measurement with timestamp. ```python # simplified wrench = horus.WrenchStamped( fx=0.5, fy=0.0, fz=-9.81, tx=0.0, ty=0.1, tz=0.0, timestamp_ns=horus.timestamp_ns(), ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `fx`, `fy`, `fz` | `float` | `0.0` | Force components (N) | | `tx`, `ty`, `tz` | `float` | `0.0` | Torque components (Nm) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `WrenchStamped.force_only(force)` | `WrenchStamped` | Create from a `Vector3` force (zero torque) | | `WrenchStamped.torque_only(torque)` | `WrenchStamped` | Create from a `Vector3` torque (zero force) | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `force_magnitude()` | `float` | L2 norm of force vector | | `torque_magnitude()` | `float` | L2 norm of torque vector | | `exceeds_limits(max_force, max_torque)` | `bool` | True if force or torque exceeds limits | | `filter(prev, alpha)` | `None` | Low-pass filter with previous reading (mutates self) | ## ForceCommand Force/torque command for impedance or force control. 
```python # simplified cmd = horus.ForceCommand( fx=0.0, fy=0.0, fz=-5.0, tx=0.0, ty=0.0, tz=0.0, timeout=1.0, timestamp_ns=horus.timestamp_ns(), ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `fx`, `fy`, `fz` | `float` | `0.0` | Force components (N) | | `tx`, `ty`, `tz` | `float` | `0.0` | Torque components (Nm) | | `timeout_seconds` | `float` | `0.0` | Command timeout (seconds). Constructor param: `timeout` | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `ForceCommand.force_only(target)` | `ForceCommand` | Create from a `Vector3` target force | | `ForceCommand.surface_contact(normal_force, normal)` | `ForceCommand` | Create a surface contact command | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `with_timeout(seconds)` | `ForceCommand` | Return a copy with timeout set | ## ContactInfo Contact state information. ```python # simplified contact = horus.ContactInfo(state=1, contact_force=2.5, confidence=0.9, timestamp_ns=horus.timestamp_ns()) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `state` | `int` | `0` | Contact state code | | `contact_force` | `float` | `0.0` | Contact force magnitude (N) | | `confidence` | `float` | `0.0` | Detection confidence | | `stiffness` | `float` | `0.0` | Contact stiffness | | `damping` | `float` | `0.0` | Contact damping | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `is_in_contact()` | `bool` | True if currently in contact | | `contact_duration_seconds()` | `float` | Duration of current contact in seconds | ## ImpedanceParameters Impedance control parameters (stiffness, damping). Created with no parameters — read via getters. 
```python # simplified imp = horus.ImpedanceParameters() # Read-only getters imp.stiffness # list[float] imp.damping # list[float] imp.inertia # list[float] imp.force_limits # list[float] imp.enabled # bool ``` | Getter | Type | Description | |--------|------|-------------| | `stiffness` | `list[float]` | Stiffness values per axis | | `damping` | `list[float]` | Damping values per axis | | `inertia` | `list[float]` | Inertia values per axis | | `force_limits` | `list[float]` | Force limits per axis | | `enabled` | `bool` | Whether impedance control is enabled | | `timestamp_ns` | `int` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `ImpedanceParameters.compliant()` | `ImpedanceParameters` | Create compliant (soft) parameters | | `ImpedanceParameters.stiff()` | `ImpedanceParameters` | Create stiff (rigid) parameters | **Methods:** | Method | Returns | Description | |--------|---------|-------------| | `enable()` | `None` | Enable impedance control | | `disable()` | `None` | Disable impedance control | ## HapticFeedback Haptic feedback command for teleoperation. 
```python # simplified haptic = horus.HapticFeedback( vibration_intensity=0.8, vibration_frequency=100.0, duration_seconds=0.05, pattern_type=0, ) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `vibration_intensity` | `float` | `0.0` | Vibration intensity (0.0-1.0) | | `vibration_frequency` | `float` | `0.0` | Vibration frequency (Hz) | | `duration_seconds` | `float` | `0.0` | Duration in seconds | | `pattern_type` | `int` | `0` | Haptic pattern type | | `enabled` | `bool` | `False` | Whether haptic feedback is enabled | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | **Static Methods:** | Method | Returns | Description | |--------|---------|-------------| | `HapticFeedback.vibration(intensity, frequency, duration)` | `HapticFeedback` | Create a vibration effect | | `HapticFeedback.force(force, duration)` | `HapticFeedback` | Create a force feedback effect (force is a `Vector3`) | | `HapticFeedback.pulse(intensity, frequency, duration)` | `HapticFeedback` | Create a pulsing vibration | ## TactileArray Tactile sensor array (e.g., fingertip pressure grid). 
```python # simplified tactile = horus.TactileArray(rows=4, cols=4) ``` | Field | Type | Default | Description | |-------|------|---------|-------------| | `rows` | `int` | `0` | Number of rows | | `cols` | `int` | `0` | Number of columns | | `forces` | `list[float]` | — | Raw force values (rows * cols elements) | | `total_force` | `list[float]` | — | Total force vector (3 elements: x, y, z) | | `center_of_pressure` | `list[float]` | — | Center of pressure (2 elements: x, y) | | `in_contact` | `bool` | — | Whether any cell is in contact | | `physical_size` | `list[float]` | — | Physical dimensions (2 elements: width, height in meters) | | `timestamp_ns` | `int` | `0` | Timestamp in nanoseconds | | Method | Returns | Description | |--------|---------|-------------| | `get_force(row, col)` | `float` or `None` | Get force at a specific cell | | `set_force(row, col, force)` | `None` | Set force at a specific cell | --- ## See Also - [Control Messages](/python/api/control-messages) — Motor and servo commands - [Rust Force Messages](/rust/api/force-messages) — Rust equivalent --- ## Testing & Deterministic Mode (Python) Path: /python/testing Description: Unit testing, deterministic simulation, record and replay, pytest patterns, and CI/CD for Python HORUS nodes # Testing and Deterministic Mode (Python) A PID controller runs at 50 Hz. In production, it reads an IMU, computes a correction, and publishes a velocity command --- 20 ms per cycle, wall-clock time, with real sensor noise and OS jitter. In a test, you need the opposite: no wall clock, no jitter, no hardware, and identical results every run. You need to step the scheduler one tick at a time, inject fake sensor data, and assert that the controller produces the exact expected output. 
HORUS provides three tools for this: - **`tick_once()`** --- step the scheduler by exactly one tick cycle, then return control to your test - **Deterministic mode** --- replace the wall clock with a SimClock, fix `dt`, seed the RNG, and guarantee identical results across runs - **Record and replay** --- capture a full session and replay it for regression testing ## Single-Tick Testing with `tick_once()` `tick_once()` is the foundation of all HORUS testing. It executes exactly one tick cycle --- calling `tick()` on every due node in order --- then returns. No threads, no timing, no sleeping. You control the clock. ```python # simplified import horus results = [] def counter_tick(node): count = (node.recv("count") or 0) + 1 node.send("count", count) results.append(count) sched = horus.Scheduler(tick_rate=100) sched.add(horus.Node( name="counter", tick=counter_tick, rate=100, order=0, pubs=["count"], subs=["count"], )) sched.tick_once() # init (lazy) + tick 1 sched.tick_once() # tick 2 sched.tick_once() # tick 3 assert results == [1, 2, 3] ``` On the first call to `tick_once()`, the scheduler lazily initializes: it finalizes node configurations, calls `init()` on every node in order, then runs one tick. Subsequent calls run one tick each. ### Selective Ticking Pass a list of node names to tick only specific nodes: ```python # simplified sched.tick_once(["sensor"]) # only tick "sensor" sched.tick_once(["sensor", "controller"]) # tick sensor and controller, skip motor ``` This is the primary tool for unit-testing a single node in isolation. Other nodes remain frozen --- they do not tick, but their published data stays on the topics. 
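The behavior described in this section (lazy initialization on the first call, execution in `order`, and selective ticking) can be modelled with a toy scheduler in plain Python. This is only a sketch for building intuition: `MiniNode` and `MiniScheduler` are hypothetical names, the model ignores per-node rates, and it says nothing about how HORUS implements `tick_once()` internally.

```python
class MiniNode:
    """Hypothetical stand-in for a node: a name, an order, and init/tick callables."""

    def __init__(self, name, tick, order=0, init=None):
        self.name = name
        self.tick = tick
        self.order = order
        self.init = init or (lambda: None)


class MiniScheduler:
    """Toy model of tick_once(): lazy init on the first call, then one tick per call."""

    def __init__(self):
        self.nodes = []
        self.initialized = False

    def add(self, node):
        self.nodes.append(node)

    def tick_once(self, names=None):
        ordered = sorted(self.nodes, key=lambda n: n.order)
        if not self.initialized:
            for node in ordered:  # init() runs once, on every node, in order
                node.init()
            self.initialized = True
        for node in ordered:
            if names is None or node.name in names:
                node.tick()  # nodes that are not listed stay frozen this tick


log = []
sched = MiniScheduler()
sched.add(MiniNode("motor", tick=lambda: log.append("motor.tick"), order=20))
sched.add(MiniNode("sensor", tick=lambda: log.append("sensor.tick"), order=0,
                   init=lambda: log.append("sensor.init")))

sched.tick_once()            # first call: init, then every node ticks in order
sched.tick_once(["sensor"])  # selective: only the sensor ticks
print(log)
```

In this model the motor node is added first but ticks second, because execution follows `order` rather than insertion order, and the second call leaves the motor untouched.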
### `tick_for()` --- Timed Runs `tick_for()` runs the tick loop for a specific wall-clock duration, then returns: ```python # simplified sched.tick_for(1.0) # run for 1 second sched.tick_for(0.5, ["sensor"]) # run only sensor for 0.5 seconds ``` Use this for integration tests that need to observe behavior over many ticks without calling `tick_once()` in a loop. --- ## Deterministic Mode Enable deterministic mode on the Scheduler for reproducible results: ```python # simplified sched = horus.Scheduler(tick_rate=100, deterministic=True) ``` In deterministic mode, three things change: | Feature | Normal mode | Deterministic mode | |---------|------------|-------------------| | `horus.dt()` | Actual wall-clock elapsed time | Fixed `1/rate` every tick (e.g., 0.01 at 100 Hz) | | `horus.now()` | Wall-clock time | SimClock --- advances by `dt` each tick | | `horus.rng_float()` | System entropy | Tick-seeded sequence --- same across runs | This guarantees **identical results on any machine, any run**. The physics does not depend on how fast your CPU is or what else the OS is doing. ```python # simplified sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(horus.Node(name="sim", tick=sim_tick, rate=100, order=0)) for _ in range(1000): sched.tick_once() assert horus.dt() == 0.01 # always 1/100 ``` ### Why Deterministic Mode Matters Without deterministic mode, a PID controller's output depends on the actual `dt` between ticks. On a fast machine, `dt` might be 9.98 ms. On a loaded CI server, it might be 10.5 ms. The integral term accumulates differently, and the test produces different outputs each run. With `deterministic=True`, `dt` is always exactly `1/rate`, and the test is reproducible. 
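To make the effect concrete, here is a minimal sketch in plain Python (no HORUS calls) that accumulates a controller's integral term over 100 ticks, once with a fixed `dt` and once with a jittered `dt`. The gain, the jitter range, and the helper names `integrate_error` and `jittered` are illustrative assumptions, not values or functions taken from HORUS.

```python
import random


def integrate_error(dts, error=1.0, ki=0.1):
    """Accumulate the integral term of a controller over a sequence of dt values."""
    integral = 0.0
    for dt in dts:
        integral += error * dt
    return ki * integral


TICKS = 100
RATE_HZ = 100

# Deterministic mode: dt is always exactly 1/rate.
fixed = [1.0 / RATE_HZ] * TICKS


def jittered(seed):
    """Simulated wall-clock mode: each dt lands somewhere between 9.5 ms and 10.5 ms."""
    rng = random.Random(seed)
    return [rng.uniform(0.0095, 0.0105) for _ in range(TICKS)]


fixed_a = integrate_error(fixed)
fixed_b = integrate_error(fixed)
jitter_a = integrate_error(jittered(seed=1))  # stands in for "run 1"
jitter_b = integrate_error(jittered(seed=2))  # stands in for "run 2"

print(fixed_a == fixed_b)    # fixed dt: the two runs are bit-identical
print(jitter_a == jitter_b)  # jittered dt: the two runs disagree
```

The differences between jittered runs are small, but an assertion with a tight tolerance on controller output will still pass on one run and fail on the next, which is exactly the flakiness a fixed `dt` removes.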
### SimClock Behavior The SimClock starts at 0.0 and advances by `1/rate` each tick: ```python # simplified sched = horus.Scheduler(tick_rate=50, deterministic=True) sched.add(horus.Node(name="n", tick=lambda node: None, rate=50, order=0)) sched.tick_once() # horus.now() == 0.02, horus.dt() == 0.02 sched.tick_once() # horus.now() == 0.04, horus.dt() == 0.02 sched.tick_once() # horus.now() == 0.06, horus.dt() == 0.02 ``` All time-dependent logic --- `horus.now()`, `horus.dt()`, `horus.elapsed()` --- uses the SimClock. Wall-clock time is irrelevant. This means a 1000-tick test that simulates 10 seconds of robot time finishes in milliseconds. ### Seeded RNG `horus.rng_float()` returns a repeatable sequence in deterministic mode: ```python # simplified sched = horus.Scheduler(tick_rate=100, deterministic=True) values = [] def tick(node): values.append(horus.rng_float()) sched.add(horus.Node(name="rng", tick=tick, rate=100, order=0)) for _ in range(5): sched.tick_once() # values is identical across runs, across machines ``` Use `horus.rng_float()` instead of Python's `random.random()` for any randomness inside `tick()`. The system `random` module is not seeded by HORUS and will produce different results across runs. 
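One way to picture a tick-seeded sequence is a generator whose output depends only on a fixed seed and the tick index. The sketch below illustrates that idea in plain Python. `TickSeededRng` and its seeding scheme are hypothetical and do not describe how `horus.rng_float()` is actually implemented.

```python
import random


class TickSeededRng:
    """Hypothetical illustration: each tick's value derives from (seed, tick index)."""

    def __init__(self, seed=0):
        self.seed = seed
        self.tick = 0

    def next_float(self):
        # Seed a fresh generator from the run seed and the tick counter,
        # so the value depends only on those two integers.
        rng = random.Random(self.seed * 1_000_003 + self.tick)
        self.tick += 1
        return rng.random()


def run(ticks, seed=0):
    """Simulate one test run that draws one value per tick."""
    rng = TickSeededRng(seed)
    return [rng.next_float() for _ in range(ticks)]


first_run = run(5)
second_run = run(5)
print(first_run == second_run)  # same seed and tick indices: identical sequence

# The module-level generator is seeded from OS entropy unless you seed it yourself,
# so this list differs from one process to the next.
unseeded = [random.random() for _ in range(5)]
print(unseeded)
```

Because nothing in the sequence depends on wall-clock time or process state, two runs of the same test draw the same values at the same ticks.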
--- ## Testing Patterns ### Unit Testing a Single Node's `tick()` Test one node's tick function in isolation by running it in a minimal deterministic scheduler alongside a capture node: ```python # simplified def test_pid_proportional_gain(): """Verify the P-term of the PID controller.""" integral = [0.0] target = 1.0 def pid_tick(node): imu = node.recv("imu.data") if imu is None: return error = target - imu.accel_x integral[0] += error * horus.dt() command = 2.0 * error + 0.1 * integral[0] node.send("cmd_vel", horus.CmdVel(linear=command, angular=0.0)) results = [] def capture_tick(node): cmd = node.recv("cmd_vel") if cmd: results.append(cmd.linear) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(horus.Node(name="pid", tick=pid_tick, rate=100, order=0, subs=[horus.Imu], pubs=[horus.CmdVel])) sched.add(horus.Node(name="capture", tick=capture_tick, rate=100, order=10, subs=[horus.CmdVel])) # Inject fake IMU data by publishing before ticking imu_topic = horus.Topic("imu.data") imu_topic.send(horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81, gyro_x=0.0, gyro_y=0.0, gyro_z=0.0)) sched.tick_once() assert len(results) == 1 assert abs(results[0] - 2.0) < 0.01 # P-gain * error = 2.0 * 1.0, plus a 0.001 integral contribution ``` ### Integration Testing with Real Topics Test data flow across multiple nodes: ```python # simplified def test_sensor_to_controller_pipeline(): """Verify that sensor data flows through the controller to a motor command.""" motor_cmds = [] def fake_imu(node): node.send("imu.data", horus.Imu( accel_x=0.3, accel_y=0.0, accel_z=9.81, gyro_x=0.0, gyro_y=0.0, gyro_z=0.0, )) def controller(node): imu = node.recv("imu.data") if imu: cmd = horus.CmdVel(linear=0.5 - imu.accel_x, angular=0.0) node.send("cmd_vel", cmd) def mock_motor(node): cmd = node.recv("cmd_vel") if cmd: motor_cmds.append(cmd.linear) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(horus.Node(name="imu", tick=fake_imu, rate=100, order=0, pubs=[horus.Imu])) sched.add(horus.Node(name="ctrl", tick=controller, rate=100,
order=10, subs=[horus.Imu], pubs=[horus.CmdVel])) sched.add(horus.Node(name="motor", tick=mock_motor, rate=100, order=20, subs=[horus.CmdVel])) sched.tick_for(0.1) # 10 ticks at 100 Hz assert len(motor_cmds) > 0 assert abs(motor_cmds[0] - 0.2) < 0.01 # 0.5 - 0.3 = 0.2 ``` ### Testing Safety Behavior Verify that a safety monitor and the watchdog fire correctly: ```python # simplified def test_safety_monitor_stops_on_dangerous_velocity(): """Safety monitor should request_stop() when velocity exceeds threshold.""" stopped = [False] def unsafe_controller(node): node.send("cmd_vel", horus.CmdVel(linear=5.0, angular=0.0)) def safety_tick(node): cmd = node.recv("cmd_vel") if cmd and abs(cmd.linear) > 2.0: node.log_warning(f"Unsafe: {cmd.linear}") node.request_stop() stopped[0] = True sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(horus.Node(name="ctrl", tick=unsafe_controller, rate=100, order=0, pubs=[horus.CmdVel])) sched.add(horus.Node(name="safety", tick=safety_tick, rate=100, order=1, subs=[horus.CmdVel])) sched.tick_once() assert stopped[0] is True def test_watchdog_detects_frozen_node(): """Verify the watchdog catches a node that hangs.""" import time def frozen_tick(node): time.sleep(2.0) # simulate frozen node sched = horus.Scheduler(tick_rate=10, watchdog_ms=100) sched.add(horus.Node(name="frozen", tick=frozen_tick, rate=10, order=0)) sched.tick_once() stats = sched.safety_stats() assert stats is not None assert stats["watchdog_expirations"] > 0 ``` ### Testing Multi-Rate Interactions Verify that nodes running at different rates interact correctly: ```python # simplified def test_multi_rate_data_flow(): """50 Hz controller reads from 100 Hz sensor --- one reading per controller tick.""" readings = [] def fast_sensor(node): t = horus.now() node.send("sensor.data", {"value": t * 100}) def slow_controller(node): data = node.recv("sensor.data") if data: readings.append(data["value"]) sched = horus.Scheduler(tick_rate=100, deterministic=True)
sched.add(horus.Node(name="sensor", tick=fast_sensor, rate=100, order=0, pubs=["sensor.data"])) sched.add(horus.Node(name="ctrl", tick=slow_controller, rate=50, order=10, subs=["sensor.data"])) # Run 10 ticks. The sensor ticks 10 times, the controller ticks 5 times. for _ in range(10): sched.tick_once() # Controller should have received about 5 readings assert len(readings) >= 4 assert len(readings) <= 6 ``` --- ## Mock Patterns for Hardware Dependencies Nodes that depend on hardware --- serial ports, I2C buses, GPIO pins --- cannot run in CI. Replace the hardware call with a mock in tests. ### Pattern 1: Dependency Injection via Closure ```python # simplified def make_motor_node(write_fn=None): """Create a motor node with injectable hardware write function.""" if write_fn is None: write_fn = real_motor_write # production: actual hardware def tick(node): cmd = node.recv("cmd_vel") if cmd: write_fn(cmd.linear, cmd.angular) node.send("motor.state", {"velocity": cmd.linear}) return horus.Node( name="motor", tick=tick, rate=50, order=10, subs=[horus.CmdVel], pubs=["motor.state"], ) def test_motor_writes_velocity(): """Test motor node without real hardware.""" writes = [] def mock_write(linear, angular): writes.append((linear, angular)) motor = make_motor_node(write_fn=mock_write) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(motor) # Inject a command topic = horus.Topic("cmd_vel") topic.send(horus.CmdVel(linear=0.5, angular=0.1)) sched.tick_once() assert len(writes) == 1 assert abs(writes[0][0] - 0.5) < 0.01 ``` ### Pattern 2: Fake Sensor Node Replace a real hardware driver with a fake that publishes known data: ```python # simplified def make_fake_lidar(ranges=None): """Fake LiDAR that publishes static scan data.""" if ranges is None: ranges = [1.0, 2.0, 3.0, 1.5] def tick(node): node.send("scan", horus.LaserScan( ranges=ranges, angle_min=-1.57, angle_max=1.57, angle_increment=3.14 / len(ranges), )) return horus.Node( name="lidar", tick=tick, 
rate=40, order=0, pubs=[horus.LaserScan], ) def test_planner_avoids_close_obstacle(): """Planner should slow down when an obstacle is close.""" commands = [] def capture(node): cmd = node.recv("cmd_vel") if cmd: commands.append(cmd.linear) sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(make_fake_lidar(ranges=[0.3, 0.5, 1.0, 2.0])) sched.add(horus.Node(name="planner", tick=planner_tick, rate=10, order=10, compute=True, subs=[horus.LaserScan], pubs=[horus.CmdVel])) sched.add(horus.Node(name="capture", tick=capture, rate=100, order=20, subs=[horus.CmdVel])) sched.tick_for(0.5) assert len(commands) > 0 assert all(c < 1.0 for c in commands) # slowed down near obstacle ``` ### Pattern 3: Test Fixture with Setup and Teardown ```python # simplified import pytest @pytest.fixture def robot_sched(): """Create a deterministic scheduler with standard test nodes.""" sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(make_fake_lidar()) sched.add(make_controller()) sched.add(make_safety()) yield sched # Scheduler cleans up automatically when garbage collected def test_controller_publishes_at_rate(robot_sched): """Controller should publish commands at its configured rate.""" commands = [] def capture(node): cmd = node.recv("cmd_vel") if cmd: commands.append(cmd) robot_sched.add(horus.Node(name="capture", tick=capture, rate=100, order=100, subs=[horus.CmdVel])) for _ in range(50): robot_sched.tick_once() # Controller at 50 Hz in a 100 Hz scheduler: ~25 commands in 50 ticks assert len(commands) >= 20 ``` --- ## Record and Replay Record a session to capture all topic data for later replay and regression testing.
### Recording a Session ```python # simplified sched = horus.Scheduler(tick_rate=100, recording=True) sched.add(horus.Node(name="sensor", tick=sensor_tick, rate=100, order=0, pubs=["scan"])) sched.add(horus.Node(name="ctrl", tick=ctrl_tick, rate=50, order=10, subs=["scan"], pubs=["cmd_vel"])) # Run for 60 seconds, recording all topic data sched.run(duration=60.0) # Check recording status print(sched.is_recording()) # True while running ``` ### Stopping and Listing Recordings ```python # simplified # Stop recording and get file paths files = sched.stop_recording() for f in files: print(f"Recorded: {f}") # List all available recordings recordings = sched.list_recordings() for r in recordings: print(f"Session: {r}") ``` ### Using Recordings for Regression Tests Record a known-good session, then replay the inputs in a test and assert that outputs match: ```python # simplified def test_controller_regression(): """Replay recorded sensor data and verify controller outputs match baseline.""" sched = horus.Scheduler(tick_rate=100, deterministic=True) sched.add(horus.Node(name="ctrl", tick=ctrl_tick, rate=100, order=0, subs=["scan"], pubs=["cmd_vel"])) # Load recorded sensor data recordings = sched.list_recordings() assert len(recordings) > 0 # Replay: inject recorded data tick by tick # (actual replay API depends on your recording format) for _ in range(1000): sched.tick_once() # Compare outputs to baseline # ... 
``` ### Deleting Old Recordings ```python # simplified sched.delete_recording("session_2026_03_20_143000") ``` --- ## pytest Integration ### Project Layout ``` my-robot/ horus.toml src/ main.py nodes/ imu.py controller.py planner.py safety.py tests/ conftest.py test_controller.py test_safety.py test_integration.py ``` ### `conftest.py` --- Shared Fixtures ```python # simplified import pytest import horus @pytest.fixture def det_sched(): """Deterministic scheduler for reproducible tests.""" return horus.Scheduler(tick_rate=100, deterministic=True) @pytest.fixture def fake_imu_node(): """Fake IMU that publishes constant readings.""" def tick(node): node.send("imu.data", horus.Imu( accel_x=0.0, accel_y=0.0, accel_z=9.81, gyro_x=0.0, gyro_y=0.0, gyro_z=0.0, )) return horus.Node(name="fake_imu", tick=tick, rate=100, order=0, pubs=[horus.Imu]) @pytest.fixture def capture_node(): """Capture node that collects CmdVel messages.""" captured = [] def tick(node): cmd = node.recv("cmd_vel") if cmd: captured.append(cmd) node = horus.Node(name="capture", tick=tick, rate=100, order=99, subs=[horus.CmdVel]) node._test_captured = captured # attach for test access return node ``` ### Test File Example ```python # simplified # tests/test_controller.py import horus from nodes.controller import make_controller def test_controller_zero_error_zero_output(det_sched, fake_imu_node, capture_node): """When IMU reads match the target, controller output should be near zero.""" det_sched.add(fake_imu_node) det_sched.add(make_controller(target_speed=0.0)) det_sched.add(capture_node) for _ in range(10): det_sched.tick_once() cmds = capture_node._test_captured assert len(cmds) > 0 assert all(abs(c.linear) < 0.01 for c in cmds) def test_controller_nonzero_target(det_sched, fake_imu_node, capture_node): """When target speed is 1.0 and IMU reads 0.0, controller should output positive.""" det_sched.add(fake_imu_node) det_sched.add(make_controller(target_speed=1.0)) det_sched.add(capture_node) for _ in 
range(10): det_sched.tick_once() cmds = capture_node._test_captured assert len(cmds) > 0 assert all(c.linear > 0.0 for c in cmds) ```

### Running Tests

```bash
# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run a specific test
pytest tests/test_controller.py::test_controller_zero_error_zero_output

# Run with coverage
pytest tests/ --cov=src --cov-report=term-missing
```

---

## CI/CD: Running Tests in Docker

HORUS tests need shared memory (`/dev/shm`) for topic IPC. Docker mounts `/dev/shm` by default, but only at 64 MB, so you may need to increase the size for large test suites.

### Dockerfile

```dockerfile
FROM python:3.11-slim

# The slim image does not ship curl, which the installer command needs
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install HORUS
RUN curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash

# pytest is not part of the base image
RUN pip install --no-cache-dir pytest

# Copy project
WORKDIR /app
COPY . .

# Run tests with deterministic mode
CMD ["pytest", "tests/", "-v", "--tb=short"]
```

### Docker Compose for CI

```yaml
services:
  test:
    build: .
    shm_size: "256m"  # increase shared memory for topic IPC
    environment:
      - HORUS_LOG=warn  # reduce log noise in CI
```

### GitHub Actions Example

```yaml
name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install HORUS
        run: curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash

      - name: Run tests
        run: pytest tests/ -v --tb=short

      - name: Clean shared memory
        if: always()
        run: horus clean --shm
```

### Key CI Rules

**Always use `deterministic=True` in tests.** Without it, tests depend on wall-clock timing and fail intermittently on loaded CI servers.

**Always clean shared memory after tests.** HORUS topics use shared memory segments that persist after the process exits. Run `horus clean --shm` in your CI teardown step to avoid leaking shared memory between runs.

**Set `HORUS_LOG=warn` in CI.** Debug-level logging produces megabytes of output per test run. Reduce to `warn` to keep CI logs readable.

**Do not use `rt=True` in CI tests.** CI runners do not have `CAP_SYS_NICE` or `SCHED_FIFO`.
The scheduler degrades gracefully, but RT features add overhead without benefit in CI. --- ## Deterministic Reproducibility Guarantees When `deterministic=True` is set: | Property | Guarantee | |----------|-----------| | `horus.dt()` | Always `1/rate`, independent of wall clock | | `horus.now()` | Monotonically increases by `dt` each tick | | `horus.rng_float()` | Same sequence across runs, across machines | | Execution order | Determined by `order=`, not OS thread scheduling | | Tick count | Exact: 100 calls to `tick_once()` means exactly 100 ticks | | Topic delivery | Same-tick delivery for same-order publishers and subscribers | What is **not** guaranteed: | Property | Why | |----------|-----| | Absolute wall-clock duration | `tick_once()` returns as fast as the CPU allows | | Python `random.random()` | Not seeded by HORUS --- use `horus.rng_float()` | | `time.time()` inside `tick()` | Returns wall clock, not SimClock | | External I/O (files, network) | Non-deterministic by nature | | GC timing with `gc.disable()` | GC pauses depend on allocation patterns | --- ## Quick Reference | I want to... 
| Pattern | |---|---| | Test one tick of a node | `sched.tick_once()` | | Test only specific nodes | `sched.tick_once(["node_a", "node_b"])` | | Run for N ticks | `for _ in range(N): sched.tick_once()` | | Run for a duration | `sched.tick_for(1.0)` | | Get reproducible results | `Scheduler(deterministic=True)` | | Inject fake sensor data | `horus.Topic("name").send(data)` before `tick_once()` | | Capture node outputs | Add a capture node that appends to a list | | Mock hardware writes | Dependency injection via closure | | Record a session | `Scheduler(recording=True)` | | Stop and save recording | `sched.stop_recording()` | | List recordings | `sched.list_recordings()` | | Run in CI | `deterministic=True`, no `rt=True`, clean shm after | ## See Also - [Scheduler Deep-Dive (Python)](/python/scheduler-guide) --- `tick_once()`, `tick_for()`, deterministic mode internals - [Safety and Policies (Python)](/python/safety-policies) --- miss policies, failure policies, watchdog - [Node Lifecycle (Python)](/python/node-lifecycle) --- init, tick, shutdown, error handling - [Real-Time Systems (Python)](/python/real-time) --- budget, deadline, GIL impact - [Tutorial: Real-Time Control (Python)](/tutorials/realtime-control-python) --- build the system these tests verify - [Python API Reference](/python/api) --- complete `Node()`, `Scheduler`, and `Topic` reference --- ## Input Messages Path: /python/api/input-messages Description: Joystick and keyboard input for teleoperation # Input Messages Human input device types for teleoperation and manual control. ```python # simplified from horus import JoystickInput, KeyboardInput ``` --- ## JoystickInput Event-based joystick/gamepad input. Each message represents a single input event (button press, axis move, hat change, or connection event) rather than the full controller state. 
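Because each message carries only one event, code that needs the full controller state has to accumulate events itself. The sketch below shows one way to do that in plain Python. `Event` is a hypothetical stand-in that mirrors a few of the `JoystickInput` fields (`event_type`, `element_name`, `value`, `pressed`); it is not the HORUS type, and `ControllerState` is likewise an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in mirroring a subset of JoystickInput fields."""
    event_type: str      # "button" or "axis"
    element_name: str
    value: float = 0.0
    pressed: bool = False


class ControllerState:
    """Folds a stream of single-element events into a full-state snapshot."""

    def __init__(self):
        self.axes = {}
        self.buttons = {}

    def apply(self, event):
        if event.event_type == "axis":
            self.axes[event.element_name] = event.value
        elif event.event_type == "button":
            self.buttons[event.element_name] = event.pressed


state = ControllerState()
for ev in [
    Event("axis", "LeftY", value=0.75),
    Event("button", "A", pressed=True),
    Event("axis", "LeftY", value=0.25),  # a later event overwrites the earlier axis value
]:
    state.apply(ev)

print(state.axes)
print(state.buttons)
```

With real messages, the same fold would run inside `tick()` over whatever events arrived since the previous tick.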
### Constructor ```python # simplified joy = JoystickInput(joystick_id=0, element_id=0, value=0.0, pressed=False) ``` ### Fields | Field | Type | Access | Description | |-------|------|--------|-------------| | `joystick_id` | `int` | get | Controller ID | | `element_id` | `int` | get | Button/axis/hat element ID | | `value` | `float` | get/set | Axis value (-1.0 to 1.0) or hat value | | `pressed` | `bool` | get | Whether a button is pressed | | `event_type` | `str` | get | Event type: `"button"`, `"axis"`, `"hat"`, or `"connection"` | | `element_name` | `str` | get | Human-readable element name | | `timestamp_ms` | `int` | get | Event timestamp in milliseconds | ### Static Methods (Factory Constructors) | Method | Returns | Description | |--------|---------|-------------| | `JoystickInput.new_button(joystick_id, button_id, name, pressed)` | `JoystickInput` | Create a button event | | `JoystickInput.new_axis(joystick_id, axis_id, name, value)` | `JoystickInput` | Create an axis event | | `JoystickInput.new_hat(joystick_id, hat_id, name, value)` | `JoystickInput` | Create a hat/d-pad event | | `JoystickInput.new_connection(joystick_id, connected)` | `JoystickInput` | Create a connection/disconnection event | ### Methods | Method | Returns | Description | |--------|---------|-------------| | `is_button()` | `bool` | True if this is a button event | | `is_axis()` | `bool` | True if this is an axis event | | `is_hat()` | `bool` | True if this is a hat/d-pad event | | `is_connection_event()` | `bool` | True if this is a connection event | | `is_connected()` | `bool` | True if the joystick just connected (only valid for connection events) | ### Example ```python # simplified # Create events using factory methods btn = JoystickInput.new_button(joystick_id=0, button_id=0, name="A", pressed=True) axis = JoystickInput.new_axis(joystick_id=0, axis_id=1, name="LeftY", value=0.75) # Query event type if btn.is_button() and btn.pressed: print(f"Button {btn.element_name} pressed") 
if axis.is_axis(): print(f"Axis {axis.element_name} = {axis.value}") ``` --- ## KeyboardInput Keyboard key event with modifier support. ### Constructor ```python # simplified key = KeyboardInput(key_name="", code=0, pressed=True, modifiers=0) ``` ### Fields | Field | Type | Access | Description | |-------|------|--------|-------------| | `key_name` | `str` | get | Human-readable key name | | `code` | `int` | get | Numeric key code | | `pressed` | `bool` | get | Whether the key is pressed | | `modifier_flags` | `int` | get | Bitmask of active modifiers | | `timestamp_ms` | `int` | get | Event timestamp in milliseconds | ### Methods | Method | Returns | Description | |--------|---------|-------------| | `is_ctrl()` | `bool` | True if Ctrl modifier is active | | `is_shift()` | `bool` | True if Shift modifier is active | | `is_alt()` | `bool` | True if Alt modifier is active | ### Example ```python # simplified key = KeyboardInput(key_name="w", code=119, pressed=True) if key.pressed and key.is_ctrl(): print("Ctrl+W pressed") ``` --- ## Example: Joystick Teleoperation ```python # simplified import horus linear_speed = 0.0 angular_speed = 0.0 def teleop_tick(node): global linear_speed, angular_speed joy = node.recv("joystick") if joy is None: return # Handle axis events for driving if joy.is_axis(): if joy.element_name == "LeftY": linear_speed = joy.value if abs(joy.value) > 0.1 else 0.0 elif joy.element_name == "LeftX": angular_speed = joy.value * 2.0 if abs(joy.value) > 0.1 else 0.0 # E-stop on button A if joy.is_button() and joy.element_name == "A" and joy.pressed: linear_speed = 0.0 angular_speed = 0.0 cmd = horus.CmdVel(linear=linear_speed, angular=angular_speed) node.send("cmd_vel", cmd) ``` --- ## AudioFrame Audio data for speech recognition, sound localization, and audio processing. 
### Constructor ```python # simplified frame = AudioFrame(sample_rate=16000, channels=1, samples=None, timestamp_ns=0, frame_id="") ``` ### Fields | Field | Type | Access | Description | |-------|------|--------|-------------| | `samples` | `list[float]` | get | Audio sample data (f32) | | `num_samples` | `int` | get | Number of valid samples | | `sample_rate` | `int` | get | Sample rate in Hz | | `channels` | `int` | get | Number of audio channels | | `duration_ms` | `float` | get | Frame duration in milliseconds | | `frame_count` | `int` | get | Number of frames (samples / channels) | | `timestamp_ns` | `int` | get/set | Timestamp in nanoseconds | | `frame_id` | `str` | get | Source identifier | ### Static Methods | Method | Returns | Description | |--------|---------|-------------| | `AudioFrame.mono(sample_rate, samples)` | `AudioFrame` | Create mono (1-channel) frame | | `AudioFrame.stereo(sample_rate, samples)` | `AudioFrame` | Create stereo (2-channel) frame | | `AudioFrame.multi_channel(sample_rate, channels, samples)` | `AudioFrame` | Create multi-channel frame | ### Example ```python # simplified # Mono audio at 16kHz frame = AudioFrame.mono(16000, samples=[0.0] * 1600) print(f"Duration: {frame.duration_ms:.1f}ms") # 100.0ms # Stereo audio at 44.1kHz stereo = AudioFrame.stereo(44100, samples=[0.0] * 88200) ``` --- ## See Also - [Control Messages](/python/api/control-messages) — CmdVel, MotorCommand - [Rust Input Messages](/rust/api/input-messages) — Rust equivalent --- ## Clock Messages Path: /python/api/clock-messages Description: Clock synchronization and time reference messages # Clock Messages Time synchronization types for multi-node and multi-process systems. ```python # simplified from horus import Clock, TimeReference ``` --- ## Clock Timestamp message for clock distribution across nodes. 
### Constructor ```python # simplified clock = Clock(clock_ns=0, realtime_ns=0, sim_speed=1.0, paused=False, source=0, timestamp_ns=0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `clock_ns` | `int` | Clock time in nanoseconds | | `realtime_ns` | `int` | Real (wall) time in nanoseconds | | `sim_speed` | `float` | Simulation speed multiplier (1.0 = real-time) | | `paused` | `bool` | Whether the clock is paused | | `source` | `int` | Clock source: 0 = wall, 1 = sim, 2 = replay | | `timestamp_ns` | `int` | Message timestamp in nanoseconds | ### Static Methods | Method | Returns | Description | |--------|---------|-------------| | `Clock.wall_clock()` | `Clock` | Create a wall clock (real time) | | `Clock.sim_time(sim_ns, speed)` | `Clock` | Create a simulation clock with given time and speed multiplier | | `Clock.replay_time(replay_ns, speed)` | `Clock` | Create a replay clock with given time and speed multiplier | ### Methods | Method | Returns | Description | |--------|---------|-------------| | `elapsed_since(earlier)` | `int` | Elapsed time in nanoseconds since another clock reading | | `is_paused()` | `bool` | Check if the clock is paused | ### Example ```python # simplified # Wall clock clock = Clock.wall_clock() # Simulation clock at 2x speed sim = Clock.sim_time(sim_ns=1_000_000_000, speed=2.0) # Check elapsed time elapsed = clock.elapsed_since(sim) # Replay clock replay = Clock.replay_time(replay_ns=500_000_000, speed=1.0) print(replay.paused) # False ``` --- ## TimeReference External time reference for synchronization (GPS, NTP, PTP). ### Constructor ```python # simplified tref = TimeReference(time_ref_ns=0, source="", offset_ns=0, timestamp_ns=0) ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `time_ref_ns` | `int` | External reference time in nanoseconds | | `source` | `str` | Time source name (e.g. 
`"gps"`, `"ntp"`, `"ptp"`) | | `offset_ns` | `int` | Signed offset in nanoseconds (local_time - reference_time) | | `timestamp_ns` | `int` | Message timestamp in nanoseconds | ### Methods | Method | Returns | Description | |--------|---------|-------------| | `correct_timestamp(local_ns)` | `int` | Correct a local timestamp using this reference's offset | ### Example ```python # simplified tref = TimeReference(time_ref_ns=1_000_000_000, source="gps", offset_ns=-500) corrected = tref.correct_timestamp(local_ns=1_000_000_500) ``` --- ## See Also - [Clock API](/python/api/clock) — Framework time functions (`now()`, `dt()`) - [Rust Clock Messages](/rust/api/clock-messages) — Rust equivalent --- ## GIL & Performance Path: /python/gil-performance Description: How the GIL works in HORUS, tick rate ceilings, and when to use Python vs Rust # GIL & Performance The GIL (Global Interpreter Lock) is the single most important performance factor for Python HORUS nodes. Understanding how it works lets you design systems that maximize Python's strengths while avoiding its limitations. --- ## How the GIL Works in HORUS The scheduler's tick loop runs in Rust. The GIL is only acquired when calling your Python callbacks: ``` Rust scheduler tick loop (no GIL held) │ ├── Acquire GIL (~500ns) ├── Call Python tick(node) ├── Release GIL │ ├── Acquire GIL (~500ns) ├── Call Python tick(node) for next node ├── Release GIL │ └── ... (Rust handles timing, SHM, RT) ``` **Key insight**: The scheduler, shared memory transport, ring buffers, and RT scheduling are all pure Rust — they run without the GIL. Only your Python `tick()`, `init()`, and `shutdown()` callbacks acquire the GIL. --- ## Tick Rate Ceiling The GIL acquisition + Python callback overhead is ~11μs per tick. This puts a hard ceiling on Python tick rates: | Target Rate | Budget per Tick | Achievable? 
| Headroom | |-------------|----------------|:-----------:|----------| | 100 Hz | 10ms | Yes | 900x | | 1,000 Hz | 1ms | Yes | 90x | | 5,000 Hz | 200μs | Marginal | ~18x | | 10,000 Hz | 100μs | No | Measured: ~5,932 Hz max | **Practical ceiling: ~5-6 kHz** for trivial tick functions. With real work (NumPy, I/O, computation), expect lower. ### What Costs What | Operation | Time | Source | |-----------|------|--------| | GIL acquire + release | ~500ns + 500ns | PyO3 boundary | | Python object allocation | ~700ns | Per-tick overhead | | `node.send(CmdVel)` | ~1.7μs | Typed message (total) | | `node.send(dict)` | ~6-50μs | GenericMessage serialization | | `node.recv()` | ~1.5μs | Typed message | | NumPy array creation | ~1-5μs | Depends on size | | `img.to_numpy()` | ~3μs | SHM view | | `np.from_dlpack(img)` | ~1.1μs | True zero-copy | --- ## When to Use Python vs Rust | Use Case | Language | Why | |----------|---------|-----| | ML inference (PyTorch, YOLO, TensorFlow) | **Python** | 1.7μs overhead negligible vs 10-200ms inference | | Data science, prototyping | **Python** | Developer velocity matters more than latency | | HTTP APIs, database queries | **Python** | Use async nodes, GIL released during I/O | | Visualization, dashboards | **Python** | matplotlib, plotly, etc. | | Motor control at 1kHz+ | **Rust** | 89ns vs 1,700ns — 19x difference | | Safety monitors | **Rust** | Deterministic timing, no GIL | | Sensor fusion at 500Hz+ | **Rust** | Predictable p99 latency | | High-frequency sensor drivers | **Rust** | Direct hardware access, no Python overhead | **Rule of thumb**: If your tick function takes >1ms (ML inference, complex planning, I/O), Python is fine — the GIL overhead is negligible. If it takes <100μs (control loops, sensor processing), use Rust. 
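The rule of thumb can be checked with a little arithmetic. The sketch below assumes the ~11μs per-tick overhead quoted above; `headroom` and `overhead_fraction` are hypothetical helper names for illustration, not part of the HORUS API.

```python
# Fixed per-tick cost quoted in this guide (GIL acquire + callback overhead).
# Treat it as an approximate figure, not a guarantee for your hardware.
PYTHON_OVERHEAD_US = 11.0


def headroom(rate_hz: float, overhead_us: float = PYTHON_OVERHEAD_US) -> float:
    """How many times the per-tick overhead fits into one tick period."""
    period_us = 1_000_000.0 / rate_hz
    return period_us / overhead_us


def overhead_fraction(work_us: float, overhead_us: float = PYTHON_OVERHEAD_US) -> float:
    """Share of the total tick time spent on overhead rather than useful work."""
    return overhead_us / (work_us + overhead_us)


if __name__ == "__main__":
    for rate in (100, 1_000, 5_000):
        print(f"{rate} Hz: headroom ~{headroom(rate):.0f}x")
    # 20 ms of ML inference compared with 50 us of control logic
    print(f"20 ms tick: {overhead_fraction(20_000):.2%} overhead")
    print(f"50 us tick: {overhead_fraction(50):.2%} overhead")
```

Under that assumption a 20 ms inference tick spends well under 0.1% of its time on overhead, while a 50 μs control tick spends roughly 18%, which is the gap the rule of thumb describes.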
--- ## `compute=True` for CPU-Bound Nodes For CPU-heavy Python nodes (ML inference, path planning), use `compute=True` to run on a thread pool: ```python # simplified detector = horus.Node( name="yolo", tick=detect_tick, rate=30, compute=True, # Runs on worker thread, not main tick loop on_miss="skip", ) ``` **What this does**: The node runs on a separate thread. The GIL is still acquired for `tick()`, but it doesn't block the main scheduler loop — other nodes tick on time. **What this doesn't do**: It doesn't bypass the GIL. Two `compute=True` Python nodes still serialize through the GIL. For true parallelism in Python, use multi-process with `horus launch`. --- ## GC Pauses Python's garbage collector can cause tick jitter: | GC Impact | Typical Duration | |-----------|-----------------| | Gen 0 collection | ~50-200μs | | Gen 1 collection | ~500μs-2ms | | Gen 2 collection | ~5-50ms | ### Mitigation ```python # simplified import gc import horus def init(node): # Disable automatic GC — run manually between ticks gc.disable() def tick(node): do_work() # Run GC only when budget allows if horus.budget_remaining() > 5 * horus.ms: gc.collect(0) # Gen 0 only (~100μs) ``` For hard-RT Python nodes, disable GC entirely and manage memory manually (pre-allocate buffers, reuse objects). 
--- ## Optimization Patterns ### Pre-allocate Outside tick() ```python # simplified import numpy as np # BAD: allocate every tick def tick(node): buffer = np.zeros((640, 480, 3)) # ~1ms allocation process(buffer) # GOOD: allocate once in init buffer = None def init(node): global buffer buffer = np.zeros((640, 480, 3)) # One-time cost def tick(node): process(buffer) # Reuse buffer ``` ### Use Typed Messages, Not Dicts ```python # simplified # SLOW: ~6-50μs (MessagePack serialization) node.send("cmd_vel", {"linear": 1.0, "angular": 0.5}) # FAST: ~1.7μs (zero-copy POD) node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.5)) ``` ### Use DLPack for Images ```python # simplified import numpy as np # SLOW: ~14μs (data copy) frame = np.array(img) # FAST: ~1.1μs (zero-copy view) frame = np.from_dlpack(img) ``` --- ## Measuring Tick Performance ```python # simplified import horus import time tick_times = [] def profiled_tick(node): start = time.perf_counter_ns() # Your actual work here do_work() elapsed_us = (time.perf_counter_ns() - start) / 1000 tick_times.append(elapsed_us) if len(tick_times) % 1000 == 0: avg = sum(tick_times[-1000:]) / 1000 p99 = sorted(tick_times[-1000:])[990] node.log_info(f"Tick avg: {avg:.1f}μs, p99: {p99:.1f}μs") ``` Or use the scheduler's built-in stats: ```python # simplified sched = horus.Scheduler(tick_rate=100) sched.add(my_node) sched.run(duration=10.0) stats = sched.get_node_stats("my_node") print(f"Avg tick: {stats.get('avg_tick_duration_ms', 0):.2f}ms") ``` --- ## See Also - [Performance Guide](/python/performance) — Python-specific performance patterns - [Benchmarks](/performance/benchmarks) — Measured latency numbers - [Choosing a Language](/getting-started/choosing-language) — Rust vs Python comparison - [Scheduler API](/python/api/scheduler) — `compute=True`, budgets, deadlines - [NumPy & Zero-Copy](/python/numpy-zerocopy) — DLPack and pool-backed images --- ## ML Integration Path: /python/library/ml-utilities Description: Using 
PyTorch, ONNX Runtime, TensorFlow, and OpenCV with HORUS nodes # ML Integration Use ML frameworks directly in horus nodes — no wrapper library needed. Import PyTorch, ONNX Runtime, TensorFlow, or OpenCV and use them in your `tick` function. ## Zero-Copy Interop Matrix horus data types integrate with the Python ML ecosystem via three protocols: `__array_interface__` (NumPy), `__dlpack__` (universal), and `__cuda_array_interface__` (GPU). | horus type | NumPy | PyTorch | JAX | OpenCV | ONNX RT | |-----------|-------|---------|-----|--------|---------| | **Image** | `to_numpy()` / `from_numpy()` | `to_torch()` / `from_torch()` | `to_jax()` | via `to_numpy()` | via `to_numpy()` | | **PointCloud** | `to_numpy()` / `from_numpy()` | `to_torch()` / `from_torch()` | `to_jax()` | — | via `to_numpy()` | | **DepthImage** | `to_numpy()` / `from_numpy()` | `to_torch()` / `from_torch()` | `to_jax()` | via `to_numpy()` | via `to_numpy()` | All conversions are **zero-copy** (~3μs constant time, regardless of data size). The Python side gets a view into horus shared memory — no pixel data is copied. ```python # simplified img = node.recv("camera") # Any of these — all zero-copy, all ~3μs: np_arr = img.to_numpy() # NumPy ndarray tensor = img.to_torch() # PyTorch tensor jax_arr = img.to_jax() # JAX array dlpack = np.from_dlpack(img) # DLPack protocol (979ns) ``` **Performance**: A 1920×1080 RGB image (6MB) takes 3μs to access as NumPy vs 178μs to copy — **59x faster**. See [Benchmarks](/performance/benchmarks#python-benchmarks) for full numbers. 
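The difference between a view and a copy can be illustrated without HORUS at all. The sketch below is a stand-in: a plain `bytearray` plays the role of the shared-memory slot and a `memoryview` plays the role of the array returned by `to_numpy()`. It is not how HORUS implements its transport, only a way to see why a view is cheap and why it changes when the underlying buffer is overwritten.

```python
# Stand-in for a 1920x1080 RGB frame held in a shared buffer.
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3
frame_bytes = WIDTH * HEIGHT * CHANNELS  # 6,220,800 bytes, roughly 6 MB

pool = bytearray(frame_bytes)   # plays the role of the shared-memory slot

view = memoryview(pool)         # no pixel data copied: a window onto pool
copy = bytes(pool)              # full copy: cost grows with the frame size

# A later write to the underlying buffer (like the next frame arriving)
# is visible through the view but not through the copy.
pool[0] = 255

seen_through_view = view[0]
seen_through_copy = copy[0]
print(seen_through_view, seen_through_copy)

view.release()
```

The same trade-off applies to the conversions above: a view costs the same regardless of image size, but its contents are only stable for as long as the buffer behind it is.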
## ONNX Runtime (Recommended for Production) ```python # simplified import horus import onnxruntime as ort import numpy as np session = ort.InferenceSession("yolov8n.onnx", providers=["CUDAExecutionProvider"]) def detect(node): if node.has_msg("camera"): img = node.recv("camera").to_numpy() img = img.astype(np.float32) / 255.0 img = np.transpose(img, (2, 0, 1))[np.newaxis] # HWC→NCHW output = session.run(None, {"images": img}) node.send("detections", output[0]) horus.run( horus.Node(tick=detect, rate=30, subs=["camera"], pubs=["detections"], order=0), ) ``` ## PyTorch ```python # simplified import horus import torch model = torch.jit.load("resnet50.pt", map_location="cuda:0") model.eval() def classify(node): if node.has_msg("camera"): img = node.recv("camera").to_torch() # Zero-copy to PyTorch tensor with torch.no_grad(): output = model(img.unsqueeze(0).cuda()) class_id = output.argmax(dim=1).item() node.send("class", {"id": class_id, "confidence": output.max().item()}) horus.run( horus.Node(tick=classify, rate=10, subs=["camera"], pubs=["class"]), ) ``` ## OpenCV ```python # simplified import horus import cv2 import numpy as np def process_frame(node): if node.has_msg("camera"): img = node.recv("camera").to_numpy() gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) edges = cv2.Canny(gray, 50, 150) result = horus.Image.from_numpy(edges) node.send("edges", result) horus.run( horus.Node(tick=process_frame, rate=30, subs=["camera"], pubs=["edges"]), ) ``` ## TensorFlow / TFLite ```python # simplified import horus import tensorflow as tf model = tf.saved_model.load("saved_model") def infer(node): if node.has_msg("input"): data = node.recv("input") tensor = tf.convert_to_tensor(data, dtype=tf.float32) output = model(tensor) node.send("output", output.numpy()) horus.run(horus.Node(tick=infer, rate=10, subs=["input"], pubs=["output"])) ``` ## Performance Tips - **Use `compute=True`** for CPU-bound inference — runs on thread pool, releases GIL during C extension calls (NumPy, 
ONNX, PyTorch): ```python # simplified horus.Node(tick=detect, rate=30, compute=True, ...) ``` - **Set realistic `budget`** to detect slow inference: ```python # simplified horus.Node(tick=detect, rate=30, budget=50 * horus.ms, on_miss="skip") ``` - **Use `horus.Image.to_torch()`** for zero-copy GPU transfer — no pixel data copied. - **Batch with `recv_all()`** if messages queue up: ```python # simplified def batch_infer(node): frames = node.recv_all("camera") if frames: batch = np.stack([f.to_numpy() for f in frames]) outputs = session.run(None, {"images": batch}) for det in outputs[0]: node.send("detections", det) ``` ### GPU Memory Management Critical for Jetson and other embedded devices with 4-8GB shared RAM between CPU and GPU. ```python # simplified import torch # Limit GPU memory on embedded devices torch.cuda.set_per_process_memory_fraction(0.5) # Use max 50% of VRAM # Always use no_grad for inference with torch.no_grad(): output = model(input_tensor) # Periodically clear cache torch.cuda.empty_cache() ``` - Prefer FP16 or INT8 quantized models on embedded - Monitor with `torch.cuda.memory_allocated()` / `torch.cuda.max_memory_allocated()` ### Error Handling ```python # simplified def my_init(node): try: model = torch.load("model.pt", map_location="cuda") except FileNotFoundError: node.log_error("Model file not found — running without ML") model = None except RuntimeError as e: if "CUDA out of memory" in str(e): node.log_error("GPU OOM — try a smaller model or reduce batch size") model = None else: raise ``` ### Model Warmup First inference is 10-100x slower than steady-state due to CUDA kernel compilation and memory allocation. 
Run a dummy inference in `init()` before the tick loop: ```python # simplified def my_init(node): model.eval() dummy = torch.zeros(1, 3, 640, 640).cuda() with torch.no_grad(): model(dummy) # warmup — first call compiles CUDA kernels node.log_info('Model warmed up') ``` ## Quick Reference | Framework | Import | Zero-Copy From Image | Inference Pattern | |-----------|--------|---------------------|-------------------| | ONNX Runtime | `import onnxruntime as ort` | `img.to_numpy()` | `session.run(None, input_dict)` | | PyTorch | `import torch` | `img.to_torch()` | `model(tensor.unsqueeze(0).cuda())` | | OpenCV | `import cv2` | `img.to_numpy()` | `cv2.cvtColor(arr, cv2.COLOR_RGB2GRAY)` | | TensorFlow | `import tensorflow as tf` | `img.to_numpy()` | `model(tf.convert_to_tensor(arr))` | | JAX | `import jax` | `img.to_jax()` | `model.apply(params, arr)` | ## Design Decisions **Why no ML wrapper library?** HORUS provides zero-copy data types (Image, PointCloud, DepthImage) with direct interop to ML frameworks via `to_numpy()`, `to_torch()`, `to_jax()`. Adding a wrapper would hide the framework API, limit flexibility, and add maintenance burden as frameworks evolve. Instead, import your framework directly and use HORUS types as the data bridge. **Why ONNX Runtime recommended for production?** ONNX Runtime provides consistent cross-platform inference with hardware acceleration (CUDA, TensorRT, OpenVINO) and does not require a full training framework at runtime. PyTorch models export to ONNX via `torch.onnx.export()`, giving you PyTorch for training and ONNX RT for deployment. **Why `compute=True` for CPU inference instead of async?** ML inference is CPU-bound (or GPU-bound), not I/O-bound. Async nodes are designed for I/O waits (HTTP, database). The `compute=True` flag runs the node on a compute thread pool and releases the GIL during C extension calls (NumPy, ONNX, PyTorch), giving better throughput than async for number-crunching workloads. 
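The effect of releasing the GIL during native calls can be sketched with the standard library. Here `time.sleep` stands in for a C-extension call that releases the GIL while it runs, and `fake_inference` is a hypothetical placeholder. The thread pool is Python's own `ThreadPoolExecutor`, not the HORUS compute pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference(seconds: float) -> float:
    """Stand-in for a native call that releases the GIL while it runs."""
    time.sleep(seconds)  # time.sleep releases the GIL, as many native calls do
    return seconds


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_inference, [0.2, 0.2]))
elapsed = time.perf_counter() - start

# Both calls overlap, so the total stays well below the 0.4 s serial time.
print(f"two 0.2 s calls took {elapsed:.2f} s")
```

Pure-Python work inside `tick()` would not overlap this way, since it holds the GIL for its whole duration.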
**Why set `budget` for inference nodes?** ML inference time varies with input complexity (more detections = slower NMS). Setting a budget (e.g., `budget=50 * horus.ms`) lets the scheduler detect when inference exceeds its time allocation and take action (`on_miss="skip"` drops the frame, `on_miss="warn"` logs it). This prevents a slow model from starving downstream control nodes. ## See Also - [Image](/python/api/image), [PointCloud](/python/api/pointcloud), [DepthImage](/python/api/depth-image) — zero-copy types - [Async Nodes](/python/api/async-nodes) — Non-blocking inference with `async def` - [Perception Types](/python/api/perception) — Detection, BoundingBox, Landmark types - [Image](/stdlib/messages/image) — Camera image type for ML pipelines - [Detection](/stdlib/messages/detection) — Object detection output types --- ## ML Messages Path: /python/api/ml-messages Description: Audio frames and ML inference data types # ML Messages Data types for machine learning pipelines and audio processing. ```python # simplified from horus import AudioFrame ``` --- ## AudioFrame Audio data — mono or stereo PCM samples. 
```python # simplified import horus frame = horus.AudioFrame( channels=1, # 1=mono, 2=stereo sample_rate=16000, # Hz samples=[0.0] * 1600, # 100ms of audio at 16kHz ) ``` | Field | Type | Unit | Description | |-------|------|------|-------------| | `channels` | `int` | — | Number of channels (1 or 2) | | `sample_rate` | `int` | Hz | Sample rate | | `samples` | `list[float]` | — | PCM samples (interleaved for stereo) | --- ## See Also - [Tensor API](/python/api/tensor) — General-purpose tensor data - [Image API](/python/api/image) — Camera frames - [Rust ML Messages](/rust/api/ml-messages) — Rust equivalent --- ## NumPy & Zero-Copy Path: /python/numpy-zerocopy Description: Zero-copy data transfer between HORUS and NumPy/PyTorch via DLPack and pool-backed shared memory # NumPy & Zero-Copy HORUS transfers images, point clouds, and tensors between Rust and Python with zero memory copies. This page shows how. --- ## The Three Paths | Method | Latency | Copy? | Use When | |--------|---------|:-----:|----------| | `np.from_dlpack(img)` | ~1.1μs | No | ML inference, GPU pipelines | | `img.to_numpy()` | ~3.0μs | No (SHM view) | General numpy processing | | `np.array(img)` / `np.copy()` | ~14μs | Yes | Need to modify data or hold past next `recv()` | --- ## Image to NumPy ```python # simplified import horus import numpy as np def detect_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy — returns a numpy view backed by shared memory frame = img.to_numpy() # shape: (480, 640, 3), dtype: uint8 # ~3μs — no data movement # Or use DLPack for maximum performance frame = np.from_dlpack(img) # ~1.1μs — true zero-copy # Process with numpy/OpenCV mean_brightness = frame.mean() node.send("brightness", {"value": float(mean_brightness)}) ``` ### When Copies Happen The zero-copy view is backed by the HORUS shared memory pool. It becomes invalid when: - **Next `recv()` overwrites the slot** — the ring buffer reuses memory. 
If you need to hold the frame across ticks, copy it: `frame = img.to_numpy().copy()` - **You modify the array** — `to_numpy()` returns a read-only view. To modify, copy first: `frame = img.to_numpy().copy(); frame[0,0] = 255` - **You pass to a function that requires contiguous/owned memory** — some libraries need owned arrays ```python # simplified # SAFE: process immediately, don't hold across ticks def tick(node): img = node.recv("camera") if img: result = model.predict(img.to_numpy()) # Used immediately, no copy needed # UNSAFE: holding reference across ticks stored_frame = None def tick(node): global stored_frame img = node.recv("camera") if img: stored_frame = img.to_numpy() # BAD — will be overwritten on next recv() # SAFE: copy if you need to hold it def tick(node): global stored_frame img = node.recv("camera") if img: stored_frame = img.to_numpy().copy() # OK — owned copy ``` --- ## Image to PyTorch ```python # simplified import torch def inference_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy to PyTorch tensor via DLPack tensor = torch.from_dlpack(img) # (H, W, C) uint8 on CPU # Move to GPU for inference tensor = tensor.permute(2, 0, 1).unsqueeze(0).float() / 255.0 tensor = tensor.to("cuda") with torch.no_grad(): output = model(tensor) node.send("predictions", process_output(output)) ``` --- ## PointCloud to NumPy ```python # simplified def lidar_tick(node): cloud = node.recv("lidar.points") if cloud is None: return # Zero-copy to numpy — shape depends on point type points = cloud.to_numpy() # XYZ: shape (N, 3), dtype float32 # XYZI: shape (N, 4), dtype float32 # XYZRGB: shape (N, 6), dtype float32 # Filter points within 5m range distances = np.linalg.norm(points[:, :3], axis=1) nearby = points[distances < 5.0] node.send("nearby_points", {"count": len(nearby)}) ``` --- ## DepthImage to NumPy ```python # simplified def depth_tick(node): depth = node.recv("camera.depth") if depth is None: return # Zero-copy — shape (H, W), 
dtype float32 (meters) depth_map = depth.to_numpy() # Find closest obstacle valid = depth_map[depth_map > 0] if len(valid) > 0: min_dist = valid.min() node.send("closest", {"distance_m": float(min_dist)}) ``` --- ## Performance Summary Data from [Benchmarks](/performance/benchmarks) page, measured on i9-14900K: | Operation | Latency | Throughput | |-----------|---------|-----------| | `np.from_dlpack()` (640x480 RGB) | **1.1μs** | 3.5M/s | | `img.to_numpy()` (640x480 RGB) | **3.0μs** | 1.5M/s | | `np.copy()` (640x480 RGB) | 14.0μs | 334K/s | | Typed message send+recv (`CmdVel`) | **1.7μs** | 2.7M/s | | Dict send+recv (small) | 6.2μs | 714K/s | **DLPack is 13x faster than copying** — it returns a numpy/torch array backed directly by the shared memory pool. --- ## Complete ML Pipeline ```python # simplified import horus import numpy as np def detect_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy to numpy (1.1μs via DLPack) frame = np.from_dlpack(img) # Run YOLO inference (~20-100ms) results = model.predict(frame) # Publish detections for r in results: node.send("detections", horus.Detection( class_id=r.class_id, class_name=r.class_name, confidence=float(r.confidence), bbox=horus.BoundingBox2D( x_min=r.x1, y_min=r.y1, x_max=r.x2, y_max=r.y2, ), )) detector = horus.Node( name="yolo", subs=[horus.Image], pubs=[horus.Detection], tick=detect_tick, rate=30, compute=True, budget=50 * horus.ms, on_miss="skip", ) horus.run(detector, tick_rate=100) ``` --- ## See Also - [Image API](/python/api/image) — Image constructor, encoding, pool allocation - [PointCloud API](/python/api/pointcloud) — Point cloud types and formats - [DepthImage API](/python/api/depth-image) — Depth map types - [GIL & Performance](/python/gil-performance) — Tick rate ceilings, optimization patterns - [Benchmarks](/performance/benchmarks) — Full measured performance data --- ## Tensor Messages Path: /python/api/tensor-messages Description: Tensor descriptor types for ML 
inference pipelines # Tensor Messages Tensor metadata types for ML pipelines. For the full tensor API with DLPack zero-copy, see [Tensor API](/python/api/tensor). --- ## Usage Tensor data flows through HORUS using the `Image`, `PointCloud`, and `Tensor` pool-backed types — not message structs. The tensor descriptor types here are metadata for describing tensor shapes and dtypes. For ML inference, the typical pattern is: ```python # simplified import horus import numpy as np def inference_tick(node): img = node.recv("camera.rgb") if img is None: return # Zero-copy to numpy via DLPack frame = np.from_dlpack(img) # (H, W, C) uint8 # Run inference result = model(frame) node.send("predictions", result) ``` See [Tensor API](/python/api/tensor) for TensorHandle, TensorPool, and DLPack patterns. --- ## See Also - [Tensor API](/python/api/tensor) — TensorHandle, TensorPool, DLPack - [Image API](/python/api/image) — Zero-copy camera frames - [Rust Tensor Messages](/rust/api/tensor-messages) — Rust equivalent --- ## Python Deployment Path: /python/deployment Description: Deploying Python HORUS nodes to robots — venv, Docker, systemd, ARM platforms # Python Deployment How to get your Python HORUS nodes running on a real robot. 
--- ## Virtual Environments `horus run` manages Python virtual environments automatically via `.horus/`: ```bash # First run creates .horus/venv and installs dependencies from horus.toml horus run src/main.py # Dependencies declared in horus.toml [dependencies] with source = "pypi" # are installed into .horus/venv automatically ``` For manual venv management: ```bash python3 -m venv .venv source .venv/bin/activate pip install horus smbus2 pyserial numpy python src/main.py ``` --- ## Docker ```dockerfile FROM python:3.12-slim # Install system deps: curl for the installer (not in the slim image), i2c-tools for hardware access RUN apt-get update && apt-get install -y \ curl ca-certificates \ i2c-tools \ && rm -rf /var/lib/apt/lists/* # Install HORUS RUN curl -fsSL https://horusrobotics.dev/install | bash # Install Python deps COPY horus.toml . RUN pip install smbus2 pyserial numpy # Copy project COPY src/ src/ # Run CMD ["horus", "run", "src/main.py"] ``` ```bash # Build docker build -t my-robot . # Run with hardware access docker run --rm \ --device /dev/i2c-1 \ --device /dev/ttyUSB0 \ --cap-add SYS_NICE \ my-robot ``` **Key flags:** - `--device /dev/i2c-1` — I2C bus access - `--device /dev/ttyUSB0` — Serial port access - `--cap-add SYS_NICE` — RT scheduling permissions - `--privileged` — Full hardware access (use only if needed) --- ## systemd Service Auto-start your robot on boot: ```ini # /etc/systemd/system/my-robot.service [Unit] Description=My Robot HORUS Node After=network.target [Service] Type=simple User=robot WorkingDirectory=/home/robot/my-robot ExecStart=/home/robot/.cargo/bin/horus run src/main.py Restart=on-failure RestartSec=5 Environment=PYTHONPATH=/home/robot/my-robot # RT scheduling permissions AmbientCapabilities=CAP_SYS_NICE LimitRTPRIO=99 LimitMEMLOCK=infinity [Install] WantedBy=multi-user.target ``` ```bash sudo systemctl daemon-reload sudo systemctl enable my-robot sudo systemctl start my-robot sudo journalctl -u my-robot -f # View logs ``` --- ## ARM Platforms ### Raspberry Pi ```bash # Install HORUS curl -fsSL 
https://horusrobotics.dev/install | bash # NumPy wheels are available for Pi 4/5 (aarch64) pip install numpy # I2C access sudo apt install i2c-tools python3-smbus sudo usermod -aG i2c $USER # GPIO access (for direct GPIO, not via horus drivers) pip install RPi.GPIO # or gpiod for modern interface ``` ### NVIDIA Jetson ```bash # NumPy + CUDA pre-installed on JetPack # PyTorch available via NVIDIA's wheel index: pip install torch --index-url https://pypi.ngc.nvidia.com # Install HORUS curl -fsSL https://horusrobotics.dev/install | bash ``` ### Common ARM Issues | Issue | Fix | |-------|-----| | `numpy` install fails (compilation) | Use `pip install numpy` on aarch64 (wheels available for Pi 4+) | | `scipy` takes 30+ min to install | Use `apt install python3-scipy` instead of pip | | `opencv-python` fails | Use `apt install python3-opencv` or `pip install opencv-python-headless` | | Permission denied on `/dev/i2c-*` | `sudo usermod -aG i2c $USER` then re-login | | Permission denied on `/dev/ttyUSB*` | `sudo usermod -aG dialout $USER` then re-login | --- ## Mixed Rust + Python Deployment For robots with both Rust and Python nodes, use a launch file: ```yaml # launch.yaml nodes: - name: motor_driver command: "horus run src/motor.rs" rate_hz: 1000 - name: ml_detector command: "horus run src/detector.py" rate_hz: 30 depends_on: [motor_driver] ``` ```bash horus launch launch.yaml ``` Both processes share topics via shared memory — zero-copy between Rust and Python. 
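The idea of two processes reading the same memory can be illustrated with the standard library alone. The sketch below uses `multiprocessing.shared_memory` and attaches to one block twice by name, with the second handle standing in for a separate consumer process. The two-float layout is a hypothetical example and says nothing about how HORUS lays out its topics.

```python
import struct
from multiprocessing import shared_memory

# "Producer" side: create a named block and write one (linear, angular) pair.
producer = shared_memory.SharedMemory(create=True, size=8)
struct.pack_into("<ff", producer.buf, 0, 0.5, 0.25)

# "Consumer" side: attach to the same block by name. In a real deployment this
# would happen in a different process; both handles map the same memory.
consumer = shared_memory.SharedMemory(name=producer.name)
linear, angular = struct.unpack_from("<ff", consumer.buf, 0)
print(linear, angular)

# Clean up: close every handle, then unlink once from the creating side.
consumer.close()
producer.close()
producer.unlink()
```

Because the consumer reads the bytes in place, no serialization or socket transfer is involved, which is the property the launch setup above relies on.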
--- ## Freeze Dependencies Pin exact versions for reproducible deploys: ```bash # Generate lockfile from horus.toml horus lock # Or with pip pip freeze > requirements.txt ``` --- ## See Also - [Deployment Guide](/operations/deployment) — General deployment (Rust + Python) - [Real Hardware Recipe](/recipes/real-hardware) — I2C + serial examples - [GIL & Performance](/python/gil-performance) — Performance considerations - [Linux RT Setup](/advanced/rt-setup) — PREEMPT_RT, CPU isolation, permissions --- ## Python Bindings Path: /python/api/python-bindings Description: Production-ready Python API for HORUS robotics with advanced features # HORUS Python Bindings **Production-Ready Python API** for the HORUS robotics framework - combines simplicity with advanced features for professional robotics applications. ## Why HORUS Python? - **Zero Boilerplate**: Working node in 10 lines - **Flexible API**: Functional style or class inheritance - your choice - **Production Performance**: ~500ns latency (same shared memory as Rust) - **Per-Node Rate Control**: Different nodes at different frequencies (100Hz sensor, 10Hz logger) - **Message Timestamps**: Typed messages include `timestamp_ns` for timing - **Typed Messages**: Optional type-safe messages from Rust - **Multiprocess Support**: Process isolation and multi-language nodes - **Pythonic**: Feels like native Python, not a foreign function wrapper - **Rich Ecosystem**: Use NumPy, OpenCV, scikit-learn, etc. --- ## Quick Start ### Installation **Automatic (Recommended)** Python bindings are automatically installed when you run the HORUS installer: ```bash # From HORUS root directory ./install.sh ``` The installer will detect Python 3.9+ and automatically build and install the bindings. 
**Manual Installation** If you prefer to install manually or need to rebuild: ```bash # Install maturin (Python/Rust build tool) # Option A: Via Cargo (recommended for Ubuntu 24.04+) cargo install maturin # Option B: Via pip (if not blocked by PEP 668) # pip install maturin # Build and install from source cd horus_py maturin develop --release ``` **Requirements**: - Python 3.9+ - Rust 1.70+ - Linux (for shared memory support) ### Minimal Example ```python # simplified import horus def process(node): node.send("output", "Hello HORUS!") node = horus.Node(pubs="output", tick=process, rate=1) horus.run(node, duration=3) ``` This minimal example demonstrates functional-style node creation without class boilerplate. --- ## Core API ### Creating a Node ```python # simplified def Node( name: str = "", # Node name (auto-generated from tick function if empty) subs: str | list = "", # Topics to subscribe (string, list of strings, or list of message types) pubs: str | list = "", # Topics to publish (string, list of strings, or list of message types) tick: Callable = None, # Function called every tick: tick(node) -> None init: Callable = None, # Optional: called once on startup: init(node) -> None shutdown: Callable = None, # Optional: called on graceful shutdown: shutdown(node) -> None rate: int = 30, # Tick rate in Hz order: int = 100, # Execution priority (lower = earlier) budget: float = None, # Max tick duration in seconds (None = auto from rate) deadline: float = None, # Hard deadline in seconds (None = auto from rate) on_miss: str = None, # "warn", "skip", "safe_mode", "stop" (None = "warn") ) -> Node ``` **Parameters:** - `name` — Node identifier. If empty, derived from tick function name. - `subs/pubs` — Topic declarations. Accepts: `"topic_name"`, `["topic1", "topic2"]`, or typed `[CmdVel, LaserScan]` (see formats below). - `tick` — Main loop function, called at `rate` Hz. Receives the node instance. - `rate` — Tick frequency in Hz. Default: 30. 
Setting rate auto-enables RT scheduling. - `order` — Priority within scheduler tick (0-9: critical, 10-49: sensors, 50-99: processing, 100+: background). **Example:** ```python # simplified from horus import Node, CmdVel, LaserScan, Imu node = Node( name="controller", pubs=[CmdVel], # Typed — fast Pod zero-copy (~1.5μs) subs=[LaserScan, Imu], # Auto-named: "scan", "imu" tick=control_fn, rate=100, order=0, ) ``` **Topic declaration formats** (determines performance path): ```python # simplified # FAST — typed (Pod zero-copy, ~2.7μs send+recv) pubs=[CmdVel] # auto-name from type: "cmd_vel" pubs=[CmdVel, Pose2D] # multiple typed topics pubs={"motor": CmdVel} # custom name + type # GENERIC — string (MessagePack, ~10μs send+recv) pubs=["data"] # GenericMessage — for dicts and custom data pubs="single_topic" # shorthand for single string topic ``` **Parameters**: - `name` (str, optional): Node name (auto-generated if omitted) - `pubs`: Topics to publish — `[CmdVel]` (typed, fast) or `["name"]` (generic) - `subs`: Topics to subscribe — same formats as pubs - `tick` (callable): Function called each cycle, receives `(node)` as argument - `rate` (float): Execution rate in Hz (default: 30) - `init` (callable, optional): Setup function, called once at start - `shutdown` (callable, optional): Cleanup function, called once at end - `on_error` (callable, optional): Error handler, called if tick raises an exception - `default_capacity` (int, optional): Buffer capacity for auto-created topics (default: 1024) ### Alternative: Class as State Container For nodes with complex state, use a plain class and pass its method as the tick function: ```python # simplified import horus class SensorState: def __init__(self): self.reading = 0.0 def tick(self, node): self.reading += 0.1 node.send("temperature", self.reading) def init(self, node): print("Sensor initialized!") def shutdown(self, node): print("Sensor shutting down!") # Use it state = SensorState() sensor = horus.Node(name="sensor", 
tick=state.tick, init=state.init, shutdown=state.shutdown, pubs=["temperature"], rate=10) horus.run(sensor) ``` **Both patterns work!** Use functional style for simplicity or class containers for complex nodes with state. ### Node Functions Your tick function receives the node as a parameter: ```python # simplified def my_tick(node): # Check for messages if node.has_msg("input"): data = node.recv("input") # Get one message # Get all messages all_msgs = node.recv_all("input") # Send messages node.send("output", {"value": 42}) ``` **Node Methods**: #### send ```python # simplified def send(topic: str, data: Any) -> None ``` Publish a message to a topic. Non-blocking. Overwrites oldest if buffer is full. **Parameters:** - `topic: str` — Topic name (must be in the node's `pubs` list) - `data: Any` — Message to send. Can be: a dict, a typed message (`CmdVel`, `Image`, etc.), or any serializable object ```python # simplified node.send("cmd_vel", {"linear": 1.0, "angular": 0.0}) # dict node.send("cmd_vel", horus.CmdVel(1.0, 0.0)) # typed message ``` #### recv ```python # simplified def recv(topic: str) -> Optional[Any] ``` Receive one message from a topic (FIFO order). Returns `None` if no messages available. **Parameters:** - `topic: str` — Topic name (must be in the node's `subs` list) **Returns:** The message, or `None` if buffer is empty. ```python # simplified msg = node.recv("scan") if msg is not None: print(f"Got {len(msg.ranges)} ranges") ``` #### `node.recv_all(topic) -> list` Receive ALL available messages as a list. Drains the buffer completely. Returns an empty list if none available. 
Use this for batch processing when you need to handle every message, not just the latest: ```python # simplified def tick(node): # Process all queued commands (don't drop any) commands = node.recv_all("commands") for cmd in commands: execute_command(cmd) node.log_debug(f"Processed {len(commands)} commands this tick") ``` #### `node.has_msg(topic) -> bool` Check if at least one unread message is available on the topic without consuming it. The message is buffered internally and returned by the next `recv()` call. ```python # simplified def tick(node): if node.has_msg("emergency_stop"): stop = node.recv("emergency_stop") node.log_warning("Emergency stop received!") node.request_stop() ``` #### `node.request_stop()` Request the scheduler to shut down gracefully after the current tick completes. Use this to stop execution programmatically from within a node. ```python # simplified def tick(node): if horus.tick() >= 1000: node.log_info("Reached 1000 ticks, stopping") node.request_stop() error = check_safety() if error: node.log_error(f"Safety violation: {error}") node.request_stop() ``` #### `node.publishers() -> List[str]` Returns the list of topic names this node publishes to. ```python # simplified def init(node): node.log_info(f"Publishing to: {node.publishers()}") node.log_info(f"Subscribing to: {node.subscribers()}") ``` #### `node.subscribers() -> List[str]` Returns the list of topic names this node subscribes to. #### Logging Methods | Method | Description | |--------|-------------| | `node.log_info(msg)` | Log an informational message | | `node.log_warning(msg)` | Log a warning message | | `node.log_error(msg)` | Log an error message | | `node.log_debug(msg)` | Log a debug message | **Important:** Logging only works during `init()`, `tick()`, or `shutdown()` callbacks. Calling outside the scheduler raises a `RuntimeWarning` and the message is silently dropped. 
```python # simplified def tick(node): value = 42.0 # placeholder reading node.log_info("Processing sensor data") node.log_warning("Sensor reading is stale") node.log_error("Failed to process data") node.log_debug(f"Raw value: {value}") # Outside scheduler — message is dropped with RuntimeWarning: node = horus.Node(name="test", tick=tick) node.log_info("This will be dropped!") # RuntimeWarning ``` **Node scheduling kwargs** (map 1:1 to the Rust NodeBuilder): | Kwarg | Default | Rust equivalent | |-------|---------|-----------------| | `rate` | `30` | `.rate()` | | `order` | `100` | `.order()` | | `budget` | `None` | `.budget()` | | `deadline` | `None` | `.deadline()` | | `on_miss` | `None` (warn) | `.on_miss()` | | `failure_policy` | `None` (fatal) | `.failure_policy()` | | `compute` | `False` | `.compute()` | | `on` | `None` | `.on(topic)` | | `priority` | `None` | `.priority()` | | `core` | `None` | `.core()` | | `watchdog` | `None` | `.watchdog()` | | async `tick` | auto-detected | `.async_io()` | :::tip Python Timing Guide **Budget/deadline** in Python detect overruns; they do not guarantee timing. Python ticks take milliseconds, not microseconds. Use realistic values: ```python # simplified # Rust node: microsecond budget Node(tick=motor_ctrl, rate=1000, budget=300 * us) # 300μs — Rust can do this # Python ML node: millisecond budget — detects when inference is too slow Node(tick=run_model, rate=30, budget=50 * ms) # 50ms — triggers on_miss if exceeded ``` **compute=True** is useful when your tick calls C extensions that release the GIL (NumPy, PyTorch, OpenCV) — they run in parallel on the thread pool. **priority/core** are for mixed Rust+Python systems — tell the OS to schedule Rust RT nodes before Python, and keep Python off RT cores. ::: ### Running Nodes ```python # simplified def run(*nodes: Node, duration: float = None) -> None ``` Convenience one-liner: creates a Scheduler, adds all nodes, and runs.
**Parameters:** - `*nodes: Node` — One or more Node instances to run - `duration: float` — Optional. Run for this many seconds, then stop. `None` = run until Ctrl+C. ```python # simplified # Single node — runs until Ctrl+C horus.run(node) # Multiple nodes for 10 seconds horus.run(node1, node2, node3, duration=10) ``` --- ## Examples ### 1. Simple Publisher ```python # simplified import horus def publish_temperature(node): node.send("temperature", 25.5) sensor = horus.Node( name="temp_sensor", pubs="temperature", tick=publish_temperature, rate=1 # 1 Hz ) horus.run(sensor, duration=10) ``` ### 2. Subscriber ```python # simplified import horus def display_temperature(node): if node.has_msg("temperature"): temp = node.recv("temperature") print(f"Temperature: {temp}°C") display = horus.Node( name="display", subs="temperature", tick=display_temperature ) horus.run(display) ``` ### 3. Pub/Sub Pipeline ```python # simplified import horus def publish(node): node.send("raw", 42.0) def process(node): if node.has_msg("raw"): data = node.recv("raw") result = data * 2.0 node.send("processed", result) def display(node): if node.has_msg("processed"): value = node.recv("processed") print(f"Result: {value}") # Create pipeline publisher = horus.Node("publisher", pubs="raw", tick=publish, rate=1) processor = horus.Node("processor", subs="raw", pubs="processed", tick=process) displayer = horus.Node("display", subs="processed", tick=display) # Run all together horus.run(publisher, processor, displayer, duration=5) ``` ### 4. Using Lambda Functions ```python # simplified import horus # Producer (inline) producer = horus.Node( pubs="numbers", tick=lambda n: n.send("numbers", 42), rate=1 ) # Transformer (inline) doubler = horus.Node( subs="numbers", pubs="doubled", tick=lambda n: n.send("doubled", n.recv("numbers") * 2) if n.has_msg("numbers") else None ) horus.run(producer, doubler, duration=5) ``` ### 5. 
Multi-Topic Robot Controller ```python # simplified import horus def robot_controller(node): # Read from multiple sensors lidar_data = None camera_data = None if node.has_msg("lidar"): lidar_data = node.recv("lidar") if node.has_msg("camera"): camera_data = node.recv("camera") # Compute commands if lidar_data and camera_data: cmd = compute_navigation(lidar_data, camera_data) node.send("motors", cmd) node.send("status", "navigating") robot = horus.Node( name="robot_controller", subs=["lidar", "camera"], pubs=["motors", "status"], tick=robot_controller, rate=50 # 50Hz control loop ) ``` ### 6. Lifecycle Management ```python # simplified import horus class Context: def __init__(self): self.count = 0 self.file = None ctx = Context() def init_handler(node): print("Starting up!") ctx.file = open("data.txt", "w") def tick_handler(node): ctx.count += 1 data = f"Tick {ctx.count}" node.send("data", data) ctx.file.write(data + "\n") def shutdown_handler(node): print(f"Processed {ctx.count} messages") ctx.file.close() node = horus.Node( pubs="data", init=init_handler, tick=tick_handler, shutdown=shutdown_handler, rate=10 ) horus.run(node, duration=5) ``` --- ## Advanced Features (Production-Ready) HORUS Python includes advanced features that match or exceed ROS2 capabilities while maintaining simplicity. 
### NodeState The `NodeState` enum tracks which lifecycle phase a node is in: ```python # simplified from horus import NodeState # Values: NodeState.UNINITIALIZED # Created but not yet running NodeState.INITIALIZING # init() is executing NodeState.RUNNING # Actively ticking NodeState.STOPPING # shutdown() is executing NodeState.STOPPED # Clean shutdown complete NodeState.ERROR # Recoverable error state NodeState.CRASHED # Unrecoverable crash ``` NodeState values are strings — you can compare directly: ```python # simplified if node_state == "running": print("Node is active") ``` ### Scheduler ```python # simplified def Scheduler( tick_rate: float = 1000.0, # Global tick rate in Hz rt: bool = False, # Enable real-time scheduling (prefer_rt) watchdog_ms: int = 0, # Watchdog timeout in ms (0 = disabled) deterministic: bool = False, # Deterministic execution mode verbose: bool = False, # Enable verbose logging ) -> Scheduler ``` The Scheduler orchestrates node execution with priority ordering, per-node rate control, and real-time features. **Methods:** - `scheduler.add(node: Node) -> Scheduler` — Register a node (returns self for chaining) - `scheduler.run(duration: float = None) -> None` — Start the main loop. 
Blocks until Ctrl+C, `stop()`, or `duration` expires - `scheduler.stop() -> None` — Request graceful shutdown **Creating a Scheduler:** ```python # simplified # All config on Node(), scheduler.add() takes only the node scheduler = horus.Scheduler() scheduler.add(horus.Node(tick=motor_fn, rate=1000, order=0, budget=200 * horus.us)) scheduler.add(horus.Node(tick=planner_fn, order=5, compute=True)) scheduler.add(horus.Node(tick=telemetry_fn, rate=1, order=10)) ``` **Node configuration** (kwargs on `Node()`): | Method | Description | |--------|-------------| | `.order(n)` | Execution priority (lower = runs first) | | `.rate(hz)` | Node tick rate in Hz — auto-derives budget/deadline, marks as RT | | `.budget(us)` | Tick budget in microseconds | | `.on_miss(policy)` | `"warn"`, `"skip"`, `"safe_mode"`, or `"stop"` | | `.on(topic)` | Event-driven — wakes only when topic has new data | | `.compute()` | Offload to worker thread pool (planning, ML) | | `.async_io()` | Run on async executor (network, disk) | | `.failure_policy(name, ...)` | `"fatal"`, `"restart"`, `"skip"`, or `"ignore"` — optional kwargs: `max_retries`, `backoff_ms`, `max_failures`, `cooldown_ms` | | `.build()` | Finalize and register — returns `Scheduler` | **Adding Nodes:** All configuration (order, rate, budget, etc.) goes on the `Node()` constructor.
`scheduler.add()` takes only the node: ```python # simplified sensor = horus.Node(name="sensor", tick=sensor_fn, rate=100, order=0) controller = horus.Node(name="ctrl", tick=ctrl_fn, rate=100, order=1) logger = horus.Node(name="logger", tick=log_fn, rate=10, order=2) scheduler.add(sensor) scheduler.add(controller) scheduler.add(logger) ``` **Execution:** | Method | Description | |--------|-------------| | `scheduler.run()` | Run until Ctrl+C or `.stop()` | | `scheduler.run(duration=10.0)` | Run for a specific duration, then shut down | | `scheduler.stop()` | Signal graceful shutdown | | `scheduler.current_tick()` | Current tick count | **Monitoring:** | Method | Description | |--------|-------------| | `scheduler.get_node_stats(name)` | Stats dict: `total_ticks`, `errors_count`, `avg_tick_duration_ms`, etc. | | `scheduler.set_node_rate(name, rate)` | Change a node's tick rate at runtime | | `scheduler.set_tick_budget(name, us)` | Update per-node tick budget (microseconds) | | `scheduler.get_all_nodes()` | List all nodes with their configuration | | `scheduler.get_node_count()` | Number of registered nodes | | `scheduler.has_node(name)` | Check if a node is registered | | `scheduler.get_node_names()` | List of registered node names | | `scheduler.remove_node(name)` | Remove a node (returns `True` if found) | | `scheduler.status()` | Formatted status string | | `scheduler.capabilities()` | Dict of RT capabilities | | `scheduler.has_full_rt()` | `True` if all RT features available | | `scheduler.safety_stats()` | Dict of budget overruns, deadline misses, watchdog expirations | **Recording & Replay:** | Method | Description | |--------|-------------| | `scheduler.is_recording()` | Check if recording is active | | `scheduler.is_replaying()` | Check if replaying | | `scheduler.stop_recording()` | Stop recording, returns list of saved file paths | | `Scheduler.list_recordings()` | List available recordings (static method) | | `Scheduler.delete_recording(name)` | Delete a 
recording (static method) | **Context Manager:** The Scheduler supports the `with` statement for automatic cleanup: ```python # simplified with horus.Scheduler(tick_rate=100) as sched: sched.add(horus.Node(tick=sensor_fn, rate=100, order=0)) sched.add(horus.Node(tick=ctrl_fn, rate=100, order=1)) sched.run(duration=10.0) # stop() called automatically on exit, even if an exception occurs ``` **Expanded Method Details:** #### `scheduler.get_node_stats(name) -> dict` Returns a dictionary with detailed statistics for the named node: ```python # simplified stats = scheduler.get_node_stats("motor_ctrl") print(f"Total ticks: {stats['total_ticks']}") print(f"Avg tick: {stats.get('avg_tick_duration_ms', 0):.2f} ms") print(f"Errors: {stats['errors_count']}") ``` Keys include: `name`, `priority`, `total_ticks`, `successful_ticks`, `failed_ticks`, `avg_tick_duration_ms`, `max_tick_duration_ms`, `errors_count`, `uptime_seconds`. #### `scheduler.status() -> str` Returns the current scheduler state: `"idle"`, `"running"`, or `"stopped"`. #### `scheduler.current_tick() -> int` Returns the current tick count (0-indexed). #### `scheduler.set_node_rate(name, rate)` Change a node's tick rate at runtime. Useful for adaptive control: ```python # simplified # Slow down logging when battery is low if battery_low: scheduler.set_node_rate("logger", 1) # 1 Hz else: scheduler.set_node_rate("logger", 10) # 10 Hz ``` #### `scheduler.run(duration=None)` Start the scheduler tick loop. Blocks until Ctrl+C, `stop()`, or duration expires. ```python # simplified scheduler.run() # Run forever (until Ctrl+C) scheduler.run(duration=30.0) # Run for 30 seconds, then stop ``` #### `scheduler.stop()` Signal graceful shutdown. All nodes' `shutdown()` callbacks run before exit. ```python # simplified # From another thread or a node's tick: scheduler.stop() ``` #### `scheduler.get_all_nodes() -> List[Dict]` Returns all registered nodes with their configuration. 
```python # simplified for node in scheduler.get_all_nodes(): print(f"{node['name']}: order={node.get('order', '?')}") ``` #### `scheduler.get_node_count() -> int` Number of registered nodes. ```python # simplified print(f"Running {scheduler.get_node_count()} nodes") ``` #### `scheduler.has_node(name) -> bool` Check if a node with the given name is registered. ```python # simplified if scheduler.has_node("motor_ctrl"): stats = scheduler.get_node_stats("motor_ctrl") ``` #### `scheduler.get_node_names() -> List[str]` List of all registered node names. ```python # simplified print(f"Nodes: {scheduler.get_node_names()}") ``` #### `scheduler.remove_node(name) -> bool` Remove a node by name. Returns `True` if found and removed. ```python # simplified if scheduler.remove_node("debug_logger"): print("Debug logger removed") ``` #### `scheduler.capabilities() -> Dict` Returns a dict of detected RT capabilities (SCHED_FIFO, memory locking, CPU affinity, etc.). ```python # simplified caps = scheduler.capabilities() print(f"RT priority: {caps.get('max_priority', 'N/A')}") print(f"Memory lock: {caps.get('memory_locking', False)}") ``` #### `scheduler.has_full_rt() -> bool` Returns `True` if all requested RT features are available (no degradations). ```python # simplified if not scheduler.has_full_rt(): print("Warning: running with degraded RT — check capabilities()") ``` #### `scheduler.safety_stats() -> Dict` Returns safety monitor statistics: budget overruns, deadline misses, watchdog expirations. ```python # simplified stats = scheduler.safety_stats() if stats: print(f"Deadline misses: {stats.get('deadline_misses', 0)}") print(f"Budget overruns: {stats.get('budget_overruns', 0)}") ``` #### `scheduler.is_recording() -> bool` Check if session recording is currently active. #### `scheduler.is_replaying() -> bool` Check if the scheduler is replaying a recorded session. #### `scheduler.stop_recording() -> List[str]` Stop recording and return the list of saved file paths. 
```python # simplified if scheduler.is_recording(): paths = scheduler.stop_recording() print(f"Saved recordings: {paths}") ``` #### `Scheduler.list_recordings() -> List[str]` List available recording sessions (static method). ```python # simplified recordings = horus.Scheduler.list_recordings() for r in recordings: print(f" {r}") ``` #### `Scheduler.delete_recording(name) -> bool` Delete a recording by name (static method). #### `scheduler.tick(node_names) -> None` Execute one tick cycle for the specified nodes only. Essential for deterministic testing — run exactly one tick and verify output. ```python # simplified scheduler = horus.Scheduler(tick_rate=100) scheduler.add(sensor) scheduler.add(controller) # Test: run one tick, check output scheduler.tick(["Sensor", "Controller"]) stats = scheduler.get_node_stats("Controller") assert stats["total_ticks"] == 1 ``` #### `scheduler.tick_for(node_names, duration_seconds) -> None` Execute ticks for the specified nodes over a duration. Useful for benchmarking and timed test runs. ```python # simplified # Benchmark sensor processing for 5 seconds scheduler.tick_for(["SensorProcessor"], 5.0) stats = scheduler.get_node_stats("SensorProcessor") print(f"Processed {stats['total_ticks']} ticks in 5s") ``` #### `scheduler.is_running() -> bool` Check if the scheduler is currently executing its tick loop. ```python # simplified import threading import time def monitor_thread(sched): while sched.is_running(): print(f"Tick: {sched.current_tick()}") time.sleep(1.0) t = threading.Thread(target=monitor_thread, args=(scheduler,), daemon=True) t.start() scheduler.run(duration=10.0) ``` #### `scheduler.get_node_info(name) -> Optional[int]` Get the execution order (priority) for a named node. Returns `None` if the node is not registered.
```python # simplified order = scheduler.get_node_info("motor_ctrl") if order is not None: print(f"motor_ctrl runs at order {order}") ``` #### `scheduler.degradations() -> List[Dict]` Returns a list of RT feature degradations — features that were requested but couldn't be applied (e.g., SCHED_FIFO unavailable without root). ```python # simplified scheduler = horus.Scheduler(tick_rate=1000, rt=True) # ... add nodes and run ... for d in scheduler.degradations(): print(f"Degraded: {d.get('feature')} — {d.get('reason')}") ``` Each dict contains `feature` (what was requested) and `reason` (why it couldn't be applied). **`horus.run()` — The ONE way to run nodes:** ```python # simplified from horus import Node, run, us sensor = Node(tick=read_lidar, rate=10, order=0, pubs=["scan"]) ctrl = Node(tick=navigate, rate=30, order=1, subs=["scan"], pubs=["cmd"]) motor = Node(tick=drive, rate=1000, order=2, budget=300*us, subs=["cmd"]) # All scheduler config as kwargs run(sensor, ctrl, motor, rt=True, watchdog_ms=500) run(node, duration=10, deterministic=True) ``` **All `run()` kwargs** (maps to Rust Scheduler builder): | Kwarg | Default | Rust equivalent | |-------|---------|-----------------| | `duration` | `None` (forever) | `.run(duration=)` | | `tick_rate` | `1000.0` | `.tick_rate()` | | `rt` | `False` | `.prefer_rt()` | | `deterministic` | `False` | `.deterministic()` | | `watchdog_ms` | `0` | `.watchdog()` | | `blackbox_mb` | `0` | `.blackbox()` | | `recording` | `False` | `.with_recording()` | | `name` | `None` | `.name()` | | `cores` | `None` | `.cores()` | | `max_deadline_misses` | `None` | `.max_deadline_misses()` | | `verbose` | `False` | `.verbose()` | | `telemetry` | `None` | `.telemetry()` | ### Miss — Deadline Miss Policy The `Miss` class defines what happens when a node exceeds its deadline: ```python # simplified from horus import Miss # Available policies Miss.WARN # Log warning and continue (default) Miss.SKIP # Skip the node for this tick Miss.SAFE_MODE # Call 
enter_safe_state() on the node Miss.STOP # Stop the entire scheduler ``` Use via the Node constructor: ```python # simplified # Config on Node, then add motor = horus.Node(name="motor", tick=motor_fn, rate=500, order=0, budget=200 * horus.us, on_miss="safe_mode") scheduler.add(motor) ``` ### Scheduler Configuration All configuration is passed via constructor kwargs: ```python # simplified from horus import Scheduler, Node # Development — simple scheduler = Scheduler() # Production — watchdog + RT scheduler = Scheduler(tick_rate=1000, rt=True, watchdog_ms=500) # With blackbox + telemetry scheduler = Scheduler( tick_rate=1000, watchdog_ms=500, blackbox_mb=64, telemetry="http://localhost:9090", verbose=True, ) # Deterministic mode for simulation/testing scheduler = Scheduler(tick_rate=100, deterministic=True) ``` **Testing with short runs:** ```python # simplified scheduler = Scheduler() scheduler.add(Node(name="sensor", tick=sensor_fn, rate=100, order=0)) scheduler.add(Node(name="ctrl", tick=ctrl_fn, rate=100, order=1)) # Run for a short duration scheduler.run(duration=0.1) ``` ### Message Timestamps Timestamps are managed by the Rust Topic backend. Typed messages include a `timestamp_ns` field for nanosecond-precision timing: ```python # simplified import horus import time def control_tick(node): if node.has_msg("sensor_data"): msg = node.recv("sensor_data") # Use message-level timestamps for latency checks if hasattr(msg, 'timestamp_ns') and msg.timestamp_ns: age_s = (time.time_ns() - msg.timestamp_ns) / 1e9 if age_s > 0.1: # More than 100ms old node.log_warning(f"Stale data: {age_s*1000:.1f}ms old") return latency = age_s print(f"Latency: {latency*1000:.1f}ms") # Process fresh data process(msg) ``` **Timestamp access:** Use `msg.timestamp_ns` on typed messages (CmdVel, Pose2D, Imu, etc.) for nanosecond timestamps set by the Rust backend.
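The staleness check above comes down to a small piece of arithmetic on nanosecond timestamps. The standalone sketch below isolates that arithmetic so it can be unit-tested without HORUS. `message_age_s` and `is_stale` are hypothetical helper names, and the sketch assumes `timestamp_ns` uses the same clock as `time.time_ns()` (Unix epoch), as the example above does.

```python
import time
from typing import Optional


def message_age_s(timestamp_ns: int, now_ns: Optional[int] = None) -> float:
    """Age of a message in seconds, given its nanosecond timestamp."""
    if now_ns is None:
        now_ns = time.time_ns()
    return (now_ns - timestamp_ns) / 1e9


def is_stale(timestamp_ns: int, max_age_s: float = 0.1,
             now_ns: Optional[int] = None) -> bool:
    """True if the message is older than max_age_s (default 100 ms)."""
    return message_age_s(timestamp_ns, now_ns) > max_age_s


if __name__ == "__main__":
    now = time.time_ns()
    print(is_stale(now - 250_000_000, now_ns=now))  # message that is 250 ms old
    print(is_stale(now - 20_000_000, now_ns=now))   # message that is 20 ms old
```

Passing `now_ns` explicitly keeps the check deterministic in tests; inside a tick function it can be left out so the current wall-clock time is used.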
### Multiprocess Execution Run Python nodes in separate processes for isolation and multi-language support: ```bash # Run multiple Python files as separate processes horus run node1.py node2.py node3.py # Mix Python and Rust nodes horus run sensor.rs controller.py visualizer.py # Mix Rust and Python horus run lidar_driver.rs planner.py motor_control.rs ``` All nodes in the same `horus run` session automatically communicate via shared memory! **Example - Distributed System:** ```python # simplified # sensor_node.py import horus def sensor_tick(node): data = read_lidar() # Your sensor code node.send("lidar_data", data) sensor = horus.Node(name="lidar", pubs="lidar_data", tick=sensor_tick) horus.run(sensor) ``` ```python # simplified # controller_node.py import horus def control_tick(node): if node.has_msg("lidar_data"): data = node.recv("lidar_data") cmd = compute_control(data) node.send("motor_cmd", cmd) controller = horus.Node( name="controller", subs="lidar_data", pubs="motor_cmd", tick=control_tick ) horus.run(controller) ``` ```bash # Run both in separate processes horus run sensor_node.py controller_node.py ``` **Benefits:** - **Process isolation**: One crash doesn't kill everything - **Multi-language**: Mix Python and Rust nodes in the same application - **Parallel execution**: True multicore utilization - **Zero configuration**: Shared memory IPC automatically set up ### Complete Example: All Features Together ```python # simplified import horus import time def sensor_tick(node): """High-frequency sensor (100Hz)""" imu = {"accel_x": 1.0, "accel_y": 0.0, "accel_z": 9.8} node.send("imu_data", imu) node.log_info("Published IMU data") def control_tick(node): """Medium-frequency control (50Hz)""" if node.has_msg("imu_data"): imu = node.recv("imu_data") cmd = {"linear": 1.0, "angular": 0.0} node.send("cmd_vel", cmd) def logger_tick(node): """Low-frequency logging (10Hz)""" if node.has_msg("cmd_vel"): msg = node.recv("cmd_vel") node.log_info(f"Command received: 
{msg}") # Create nodes with rate and order configured on the Node sensor = horus.Node(name="imu", pubs="imu_data", tick=sensor_tick, rate=100, order=0) controller = horus.Node(name="ctrl", subs="imu_data", pubs="cmd_vel", tick=control_tick, rate=50, order=1) logger = horus.Node(name="log", subs="cmd_vel", tick=logger_tick, rate=10, order=2) # Add nodes to scheduler scheduler = horus.Scheduler() scheduler.add(sensor) scheduler.add(controller) scheduler.add(logger) scheduler.run(duration=5.0) # Check statistics stats = scheduler.get_node_stats("imu") print(f"Sensor: {stats['total_ticks']} ticks in 5 seconds") ``` --- ## Network Communication HORUS Python supports network communication for distributed multi-machine systems. Topic and Router both work transparently over the network. ### Topic Network Endpoints Add an `endpoint` parameter to communicate over the network: ```python # simplified from horus import Topic, CmdVel # Local (shared memory) - default local_topic = Topic(CmdVel) # Network (UDP direct) network_topic = Topic(CmdVel, endpoint="cmdvel@192.168.1.100:8000") # Router (TCP broker for WAN/NAT traversal) router_topic = Topic(CmdVel, endpoint="cmdvel@router") ``` **Endpoint Syntax:** - `"topic"` - Local shared memory (~500ns latency) - `"topic@host:port"` - Direct UDP (<50μs latency) - `"topic@router"` - Router broker (auto-discovery on localhost:7777) - `"topic@192.168.1.100:7777"` - Router broker at specific address ### Topic Methods | Method | Description | |--------|-------------| | `topic.send(msg, node=None)` | Send a message. Pass optional `node` for automatic IPC logging. Returns `True`. | | `topic.recv(node=None)` | Receive one message. Returns the message or `None` if empty.
| | `topic.name` | Property: the topic name string | | `topic.backend_type` | Property: the active backend name (e.g., `"direct"`, `"spsc_shm"`) | | `topic.is_network_topic` | Property: `True` if this topic uses network transport | | `topic.endpoint` | Property: the endpoint string, or `None` for local topics | | `topic.stats()` | Returns a dict with `messages_sent`, `messages_received`, `send_failures`, `recv_failures`, `is_network`, `backend` | | `topic.is_generic()` | Returns `True` if this is a generic (string-name) topic | **Example:** ```python # simplified from horus import Topic, CmdVel topic = Topic(CmdVel) # Send and receive typed messages topic.send(CmdVel(linear=1.0, angular=0.5)) msg = topic.recv() if msg: print(f"linear={msg.linear}, angular={msg.angular}") # Check topic properties print(f"Name: {topic.name}") # "cmd_vel" print(f"Backend: {topic.backend_type}") # e.g. "mpmc_shm" print(f"Stats: {topic.stats()}") ``` ### Generic Topics When you create a Topic with a **string name** (instead of a typed class), you get a generic topic that accepts any JSON-serializable data: ```python # simplified from horus import Topic, CmdVel # Generic topic (string name = dynamic typing) topic = Topic("my_topic") # Typed topic (class = static typing, better performance) typed_topic = Topic(CmdVel) ``` Generic topics use the same `send()` and `recv()` methods as typed topics, but accept any JSON-serializable Python object. Data is serialized via MessagePack internally. 
```python # simplified from horus import Topic topic = Topic("sensor_data") # Send dict, list, or any JSON-serializable data topic.send({"temperature": 25.5, "humidity": 60.0}) topic.send([1.0, 2.0, 3.0, 4.0]) topic.send("status: OK") # Receive (returns Python object) msg = topic.recv() # {"temperature": 25.5, "humidity": 60.0} # Check if generic print(topic.is_generic()) # True ``` **Typed vs Generic Performance:** | Topic Type | Serialization | Use Case | |------------|---------------|----------| | Typed (`Topic(CmdVel)`) | Direct field extraction (no serde) | Production, cross-language, high-frequency | | Generic (`Topic("name")`) | Python → JSON → MessagePack | Dynamic schemas, prototyping, Python-only | ### Automatic Transport Selection HORUS automatically selects the fastest communication path based on where publishers and subscribers are located. You never need to configure this manually: ```python # simplified from horus import Topic, CmdVel # Just create a topic — HORUS picks the fastest path automatically: # Same-thread: ~3ns (when pub+sub are in the same node) # Same-process: ~18-36ns (when pub+sub are in different nodes, same process) # Cross-process: ~85-171ns (when pub+sub are in different processes) topic = Topic("cmd_vel", CmdVel) ``` **Automatic Transport Tiers:** | Scenario | Latency | When it applies | |----------|---------|-----------------| | Same thread | ~3ns | Publisher and subscriber are in the same node | | Same process (1:1) | ~18ns | One publisher, one subscriber, same process | | Same process (many:1) | ~26ns | Multiple publishers, one subscriber, same process | | Same process (many:many) | ~36ns | Multiple publishers and subscribers, same process | | Cross-process (1:1) | ~85ns | One publisher, one subscriber, different processes | | Cross-process (many:many) | ~91ns | Multiple publishers and subscribers, different processes | ### Router Client (WAN/NAT Traversal) For communication across networks, through NAT, or for large-scale 
deployments, use the Router: ```python # simplified from horus import RouterClient, Topic, CmdVel # Create router client for explicit connection management router = RouterClient("192.168.1.100", 7777) # Build endpoints through the router cmd_endpoint = router.endpoint("cmdvel") # Returns "cmdvel@192.168.1.100:7777" pose_endpoint = router.endpoint("pose") # Use endpoints with Topic topic = Topic(CmdVel, endpoint=cmd_endpoint) # Router properties print(f"Address: {router.address}") # "192.168.1.100:7777" print(f"Connected: {router.is_connected}") # True print(f"Topics: {router.topics}") # ["cmdvel", "pose"] print(f"Uptime: {router.uptime_seconds}s") ``` **Helper Functions:** ```python # simplified from horus import default_router_endpoint, router_endpoint # Default router (localhost:7777) ep1 = default_router_endpoint("cmdvel") # "cmdvel@router" # Custom router address ep2 = router_endpoint("cmdvel", "192.168.1.100", 7777) # "cmdvel@192.168.1.100:7777" ``` **Router Server (for testing):** ```python # simplified from horus import RouterServer # Start a local router (for development/testing) server = RouterServer(port=7777) server.start() # For production, use CLI instead: # $ horus router start --port 7777 ``` ### When to Use What | Transport | Latency | Use Case | |-----------|---------|----------| | Same-process (`Topic(CmdVel)`) | ~18-36ns | In-process communication (automatic) | | Cross-process, 1:1 (`Topic(CmdVel)`) | ~85ns | Same machine, one publisher and one subscriber | | Cross-process, many:many (`Topic(CmdVel)`) | ~91ns | Same machine, multiple publishers and subscribers | | Network (`endpoint="topic@host:port"`) | <50μs | Multi-machine on LAN (direct UDP) | | Router (`endpoint="topic@router"`) | 10-50ms | WAN, NAT traversal, cloud deployments | ### Multi-Machine Example ```python # simplified # === ROBOT (192.168.1.50) === from horus import Topic, CmdVel, Imu, Odometry # Local: Critical flight control (ultra-fast) imu_topic = Topic(Imu) # ~85ns local 
shared memory # Network: Telemetry to ground station telemetry = Topic(Odometry, endpoint="telem@192.168.1.100:8000") # Network: Commands from ground station commands = Topic(CmdVel, endpoint="cmd@0.0.0.0:8001") # === GROUND STATION (192.168.1.100) === from horus import Topic, CmdVel, Odometry # Receive telemetry from robot telemetry_sub = Topic(Odometry, endpoint="telem@0.0.0.0:8000") # Send commands to robot command_pub = Topic(CmdVel, endpoint="cmd@192.168.1.50:8001") ```

---

## Integration with Python Ecosystem

### NumPy Integration

```python
# simplified
import horus
import numpy as np

def process_array(node):
    if node.has_msg("raw_data"):
        data = node.recv("raw_data")
        # Convert to NumPy array
        arr = np.array(data)
        # Process with NumPy
        result = np.fft.fft(arr)
        # FFT output is complex, which is not JSON-serializable,
        # so publish the magnitudes as plain floats
        node.send("fft_result", np.abs(result).tolist())

processor = horus.Node(
    subs="raw_data",
    pubs="fft_result",
    tick=process_array
)
```

### OpenCV Integration

```python
# simplified
import horus
import cv2
import numpy as np

def process_image(node):
    if node.has_msg("camera"):
        img_data = node.recv("camera")
        # Convert to OpenCV format
        img = np.array(img_data, dtype=np.uint8).reshape((480, 640, 3))
        # Apply OpenCV processing
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        # Publish result
        node.send("edges", edges.flatten().tolist())

vision = horus.Node(
    subs="camera",
    pubs="edges",
    tick=process_image,
    rate=30
)
```

### scikit-learn Integration

```python
# simplified
import horus
from sklearn.linear_model import LinearRegression
import numpy as np

model = LinearRegression()

def train_model(node):
    if node.has_msg("training_data"):
        data = node.recv("training_data")
        X = np.array(data['features'])
        y = np.array(data['labels'])
        # Train model
        model.fit(X, y)
        score = model.score(X, y)
        # Cast to a built-in float so the value serializes cleanly
        node.send("model_score", float(score))

trainer = horus.Node(
    subs="training_data",
    pubs="model_score",
    tick=train_model
)
```

---

## Advanced Patterns

### State Management ```python # simplified import horus class RobotState:
def __init__(self): self.position = {"x": 0.0, "y": 0.0} self.velocity = 0.0 self.last_update = 0 state = RobotState() def update_state(node): if node.has_msg("velocity"): state.velocity = node.recv("velocity") if node.has_msg("position"): state.position = node.recv("position") # Publish combined state node.send("robot_state", { "pos": state.position, "vel": state.velocity }) state_manager = horus.Node( subs=["velocity", "position"], pubs="robot_state", tick=update_state ) ``` ### Rate Limiting ```python # simplified import horus import time class RateLimiter: def __init__(self, min_interval): self.min_interval = min_interval self.last_send = 0 limiter = RateLimiter(min_interval=0.1) # 100ms minimum def rate_limited_publish(node): current_time = time.time() if current_time - limiter.last_send >= limiter.min_interval: node.send("output", "data") limiter.last_send = current_time node = horus.Node( pubs="output", tick=rate_limited_publish, rate=100 # Node runs at 100Hz, but publishes at max 10Hz ) ``` ### Error Handling ```python # simplified import horus def safe_processing(node): try: if node.has_msg("input"): data = node.recv("input") result = risky_operation(data) node.send("output", result) except Exception as e: node.send("errors", str(e)) print(f"Error: {e}") processor = horus.Node( subs="input", pubs=["output", "errors"], tick=safe_processing ) ``` --- ## Performance Tips ### 1. Use Per-Node Rate Control ```python # simplified # Configure rate and order on the Node, then add to scheduler sensor = horus.Node(name="sensor", tick=sensor_fn, rate=100, order=0) controller = horus.Node(name="ctrl", tick=ctrl_fn, rate=50, order=1) logger = horus.Node(name="logger", tick=log_fn, rate=10, order=2) scheduler = horus.Scheduler() scheduler.add(sensor) scheduler.add(controller) scheduler.add(logger) scheduler.run() # Monitor performance with get_node_stats() stats = scheduler.get_node_stats("sensor") print(f"Sensor executed {stats['total_ticks']} ticks") ``` ### 2. 
Check Message Freshness ```python # simplified import time def control_tick(node): if node.has_msg("sensor_data"): data = node.recv("sensor_data") # Use message-level timestamps for staleness checks if hasattr(data, 'timestamp_ns') and data.timestamp_ns: age_s = (time.time_ns() - data.timestamp_ns) / 1e9 if age_s > 0.1: node.log_warning("Skipping stale sensor data") return process(data) ``` ### 3. Use Dicts for Messages ```python # simplified # Send messages as Python dicts (automatically serialized to JSON) cmd = {"linear": 1.5, "angular": 0.8} node.send("cmd_vel", cmd) # For staleness checks, use typed messages with timestamp_ns # or track send time at the application level ``` ### 4. Batch Processing ```python # simplified # Use node.recv_all() to process all available messages at once def batch_processor(node): messages = node.recv_all("input") if messages: results = [process(msg) for msg in messages] for result in results: node.send("output", result) ``` ### 5. Keep tick() Fast ```python # simplified # GOOD: Fast tick def good_tick(node): if node.has_msg("input"): data = node.recv("input") result = quick_operation(data) node.send("output", result) # BAD: Slow tick def bad_tick(node): time.sleep(1) # Don't block! data = requests.get("http://api.example.com") # Don't do I/O! ``` ### 6. Offload Heavy Processing ```python # simplified from concurrent.futures import ThreadPoolExecutor executor = ThreadPoolExecutor(max_workers=4) def heavy_processing_node(node): if node.has_msg("input"): data = node.recv("input") # Offload to thread pool future = executor.submit(expensive_operation, data) # Don't block - check result later or use callback ``` ### 7. 
Use Multiprocess for CPU-Intensive Tasks

```bash
# Isolate heavy processing in separate processes
horus run sensor.py heavy_vision.py light_controller.py
# Each node gets its own CPU core
```

---

## Development

### Building from Source

```bash
# Debug build (fast compile, slow runtime)
cd horus_py
maturin develop

# Release build (slow compile, fast runtime)
maturin develop --release

# Build wheel for distribution
maturin build --release
```

### Running Tests

```bash
# Install test dependencies (pytest-cov provides the --cov flag used below)
pip install pytest pytest-cov

# Run all tests
pytest tests/

# Run specific feature tests
horus run tests/test_rate_control.py     # Phase 1: Per-node rates
horus run tests/test_timestamps.py       # Phase 2: Timestamps
horus run tests/test_typed_messages.py   # Phase 3: Typed messages

# With coverage
pytest --cov=horus tests/

# Test multiprocess execution (Phase 4)
horus run tests/multiprocess_publisher.py tests/multiprocess_subscriber.py
```

### Mock Mode

HORUS Python includes a mock mode for testing without Rust bindings:

```python
# simplified
# If Rust bindings aren't available, automatically falls back to mock
# You'll see: "Warning: Rust bindings not available. Running in mock mode."
# Use for unit testing Python logic without HORUS running
```

### Debugging Tips

```python
# simplified
import time
import horus

# Check node statistics
scheduler = horus.Scheduler()
scheduler.add(my_node)

stats = scheduler.get_node_stats("my_node")
print(f"Ticks: {stats['total_ticks']}, Errors: {stats['errors_count']}")

# Monitor message timestamps via message-level fields
msg = node.recv("topic")
if msg and hasattr(msg, 'timestamp_ns') and msg.timestamp_ns:
    age = (time.time_ns() - msg.timestamp_ns) / 1e9
    print(f"Message age: {age*1000:.1f}ms")
```

---

## Interoperability

### With Rust Nodes

**Important**: For cross-language communication, use typed topics by passing a message type to `Topic()`.
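
One way to see why typed topics suit cross-language use: a fixed binary layout can be read by any language that agrees on the field order and types, while a generic payload depends on a text encoding and key names. The sketch below is illustrative only. It assumes a hypothetical layout of two little-endian `float32` fields; the actual in-memory layout of `CmdVel` is not specified here, and `pack_cmd_vel` / `unpack_cmd_vel` are made-up names.

```python
import json
import struct

# Hypothetical layout for illustration: two little-endian float32 fields
# (linear, angular). This is NOT the documented HORUS CmdVel layout.
CMD_VEL_LAYOUT = struct.Struct("<ff")


def pack_cmd_vel(linear: float, angular: float) -> bytes:
    """Encode the two fields into a fixed-size binary record."""
    return CMD_VEL_LAYOUT.pack(linear, angular)


def unpack_cmd_vel(buf: bytes) -> tuple:
    """Decode a record produced by pack_cmd_vel."""
    return CMD_VEL_LAYOUT.unpack(buf)


typed_bytes = pack_cmd_vel(1.0, 0.5)
generic_bytes = json.dumps({"linear": 1.0, "angular": 0.5}).encode("utf-8")

print(f"fixed layout: {len(typed_bytes)} bytes")
print(f"JSON payload: {len(generic_bytes)} bytes")
print(unpack_cmd_vel(typed_bytes))
```

A reader in another language only needs the same two-field layout to decode the fixed record, whereas the JSON payload requires a parser and agreement on key names.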
#### Cross-Language with Typed Topics

```python
# simplified
# Python node with typed topic
from horus import Topic, CmdVel

cmd_topic = Topic(CmdVel)  # Typed topic
cmd_topic.send(CmdVel(linear=1.0, angular=0.5))
```

```rust
// Rust node receives
use horus::prelude::*;

let topic: Topic<CmdVel> = Topic::new("cmd_vel")?;
if let Some(cmd) = topic.recv() {
    println!("Got: linear={}, angular={}", cmd.linear, cmd.angular);
}
```

#### Generic Topic (String Topics)

```python
# simplified
# Generic Topic - for custom topics
from horus import Topic

topic = Topic("my_topic")  # Pass string for generic topic
topic.send({"linear": 1.0, "angular": 0.5})  # Uses JSON serialization
```

**Typed topics:** Use `Topic(CmdVel)`, `Topic(Pose2D)` for cross-language communication. See [Python Message Library](/python/library/python-message-library) for details.

---

## Time API

Framework-aware time functions. Use these instead of `time.time()` — they integrate with deterministic mode and SimClock.

**Quick reference:**

| Function | Returns | Description |
|----------|---------|-------------|
| `horus.now()` | `float` | Current time in seconds |
| `horus.dt()` | `float` | Timestep for this tick in seconds |
| `horus.elapsed()` | `float` | Time since scheduler start |
| `horus.tick()` | `int` | Current tick number |
| `horus.budget_remaining()` | `float` | Time left in tick budget |
| `horus.rng_float()` | `float` | Random float in [0, 1) |
| `horus.timestamp_ns()` | `int` | Nanosecond timestamp |

### `horus.now() -> float`

Current framework time in seconds.

- **Normal mode**: Wall clock (`time.time()` equivalent)
- **Deterministic mode**: Virtual SimClock that advances by fixed `dt` each tick

```python
# simplified
def tick(node):
    t = horus.now()
    node.send("timestamp", t)
```

### `horus.dt() -> float`

Timestep for this tick in seconds. Use this for physics integration instead of measuring elapsed time manually.
- **Normal mode**: Actual elapsed time since last tick - **Deterministic mode**: Fixed `1.0 / rate` — identical across runs ```python # simplified def tick(node): # PID controller using dt() for correct integration error = target - current integral += error * horus.dt() derivative = (error - prev_error) / horus.dt() output = kp * error + ki * integral + kd * derivative ``` ### `horus.elapsed() -> float` Time elapsed since the scheduler started, in seconds. ```python # simplified def tick(node): if horus.elapsed() > 30.0: node.log_info("Running for 30+ seconds, stabilized") ``` ### `horus.tick() -> int` Current tick number (0-indexed, increments each scheduler cycle). ```python # simplified def tick(node): if horus.tick() % 100 == 0: node.log_info(f"Tick {horus.tick()}: system healthy") ``` ### `horus.budget_remaining() -> float` Time remaining in this tick's budget, in seconds. Returns `float('inf')` if no budget is configured. Use this for adaptive quality — do more work when time permits, skip expensive operations when tight. ```python # simplified def tick(node): # Always do critical work process_sensor_data() # Only do expensive work if budget allows if horus.budget_remaining() > 0.001: # >1ms remaining run_expensive_optimization() ``` ### `horus.rng_float() -> float` Random float in `[0.0, 1.0)`. - **Normal mode**: System entropy (non-deterministic) - **Deterministic mode**: Tick-seeded RNG — produces identical sequences across runs ```python # simplified def tick(node): # Simulated sensor noise (reproducible in deterministic mode) noise = (horus.rng_float() - 0.5) * 0.1 reading = true_value + noise ``` ### `horus.timestamp_ns() -> int` Current timestamp in nanoseconds. Use for TransformFrame queries and message timestamping. 
```python # simplified def tick(node): ts = horus.timestamp_ns() transform = tf.lookup("camera", "base_link", ts) ``` ### Deterministic Mode When using `horus.run(..., deterministic=True)`, the time functions switch from wall clock to SimClock: | Function | Normal Mode | Deterministic Mode | |----------|-------------|-------------------| | `now()` | Wall clock | SimClock (virtual) | | `dt()` | Actual elapsed | Fixed `1/rate` | | `elapsed()` | Real elapsed | Virtual elapsed | | `rng_float()` | System entropy | Tick-seeded (reproducible) | | `tick()` | Same | Same | | `budget_remaining()` | Same | Same | | `timestamp_ns()` | Real nanoseconds | Virtual nanoseconds | This ensures identical behavior across runs — critical for simulation, testing, and replay. --- ## Runtime Parameters `horus.Params` provides dict-like access to runtime configuration stored in `.horus/config/params.yaml`. Change PID gains, speed limits, and thresholds without recompiling. ### `Params(path=None)` Create a parameter store. 
- `Params()` — loads from `.horus/config/params.yaml` (default) - `Params("path/to/file.yaml")` — loads from explicit path ### Methods | Method | Returns | Description | |--------|---------|-------------| | `get(key, default=None)` | `Any` | Get value, return default if missing | | `params[key]` | `Any` | Get value, raise `KeyError` if missing | | `params[key] = value` | — | Set value | | `has(key)` | `bool` | Check if key exists | | `key in params` | `bool` | Same as `has(key)` | | `keys()` | `List[str]` | All parameter names | | `len(params)` | `int` | Number of parameters | | `save()` | — | Persist to disk | | `remove(key)` | `bool` | Remove a key, returns True if existed | | `reset()` | — | Reset all parameters to defaults | ### Example: Live PID Tuning ```python # simplified import horus params = horus.Params() def controller_tick(node): # Read gains from params — change at runtime via CLI or monitor kp = params.get("pid_kp", 1.0) ki = params.get("pid_ki", 0.1) kd = params.get("pid_kd", 0.01) max_speed = params.get("max_speed", 1.5) error = target - current output = min(kp * error, max_speed) node.send("cmd_vel", output) controller = horus.Node(name="PIDController", tick=controller_tick, rate=100, pubs=["cmd_vel"]) horus.run(controller) ``` Set parameters at runtime: ```bash horus param set pid_kp 2.5 horus param set max_speed 0.8 horus param list ``` --- ## Rate Limiter `horus.Rate` provides drift-compensated rate limiting for background threads and standalone loops. For nodes, use the `rate=` constructor kwarg instead. ### `Rate(hz)` Create a rate limiter targeting `hz` iterations per second. ### Methods | Method | Returns | Description | |--------|---------|-------------| | `sleep()` | — | Sleep until next cycle. 
Compensates for work time to maintain target rate | | `actual_hz()` | `float` | Measured frequency (smoothed average) | | `target_hz()` | `float` | Target frequency | | `period()` | `float` | Target period in seconds (1/hz) | | `is_late()` | `bool` | True if current cycle exceeded the target period | | `reset()` | — | Reset timing (call after a pause to avoid burst catch-up) | ### Example: Camera Capture Thread ```python # simplified import threading from horus import Rate, Topic, Image def camera_loop(): rate = Rate(30) # 30 FPS target topic = Topic(Image) while running: frame = capture_camera() topic.send(frame) if rate.is_late(): print(f"Camera behind: {rate.actual_hz():.1f} Hz (target {rate.target_hz():.0f})") rate.sleep() thread = threading.Thread(target=camera_loop, daemon=True) thread.start() ``` --- ## Hardware Configuration `horus.hardware` loads hardware node configs from `horus.toml`'s `[hardware]` section. ### Module Functions | Function | Returns | Description | |----------|---------|-------------| | `hardware.load()` | `list[(str, obj)]` | Load hardware entries from `horus.toml` | | `hardware.load_from(path)` | `list[(str, obj)]` | Load from explicit config path | | `hardware.register_driver(name, cls)` | — | Register a Python node class | ### NodeParams Dict-like access to a hardware entry's configuration values. 
| Method | Returns | Description |
|--------|---------|-------------|
| `params[key]` | `Any` | Get value (KeyError if missing) |
| `params.get(key)` | `Any` | Get value (KeyError if missing) |
| `params.get_or(key, default)` | `Any` | Get with default if missing |
| `params.has(key)` | `bool` | Check if key exists |
| `params.keys()` | `List[str]` | All parameter names |

### Example

```toml
# horus.toml
[hardware.arm]
use = "ArmDriver"
port = "/dev/ttyUSB0"
baudrate = 1000000

[hardware.lidar]
use = "rplidar"
port = "/dev/ttyUSB1"
sim = true
```

```python
# simplified
import horus

entries = horus.hardware.load()
for name, obj in entries:
    if isinstance(obj, horus.NodeParams):
        port = obj.get_or("port", "/dev/ttyUSB0")
        print(f"{name} on {port}")
```

---

## Unit Constants

```python
# simplified
import horus
from horus import us, ms

us  # 1e-6 — microseconds to seconds
ms  # 1e-3 — milliseconds to seconds

# Use with budget/deadline for readability
node = horus.Node(tick=fn, rate=1000, budget=300 * us, deadline=900 * us)
```

---

## Error Types

```python
# simplified
from horus import HorusNotFoundError, HorusTransformError, HorusTimeoutError

try:
    tf.tf("missing_frame", "base")
except (HorusNotFoundError, HorusTransformError) as e:
    # A missing frame raises HorusNotFoundError; extrapolation or stale
    # data raises HorusTransformError (see the table below)
    print(f"Transform failed: {e}")

try:
    tf.wait_for_transform("src", "dst", timeout_sec=1.0)
except HorusTimeoutError:
    print("Timed out waiting for transform")
```

| Exception | Rust source | Raised when |
|-----------|-------------|-------------|
| `HorusNotFoundError` | `NotFound(...)` | Missing topic, frame, node |
| `HorusTransformError` | `Transform(...)` | TF extrapolation, stale data |
| `HorusTimeoutError` | `Timeout(...)` | Blocking operation timed out |

Other Rust errors map to stdlib: `Config` → `ValueError`, `Io` → `IOError`, `Memory` → `MemoryError`, etc.
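
The mapping described above can be pictured as a lookup table from Rust error variant to Python exception class. This is a sketch of the idea, not the binding's actual code: the three `Horus*` classes are local stand-ins for the real exceptions, `exception_for` is a hypothetical helper, and the `RuntimeError` fallback follows the documented behavior for unmapped variants.

```python
# Local stand-ins so the sketch runs without HORUS installed.
class HorusNotFoundError(Exception):
    pass


class HorusTransformError(Exception):
    pass


class HorusTimeoutError(Exception):
    pass


# Rust variant name -> Python exception class (subset listed in this section).
RUST_TO_PYTHON = {
    "NotFound": HorusNotFoundError,
    "Transform": HorusTransformError,
    "Timeout": HorusTimeoutError,
    "Config": ValueError,
    "Io": IOError,
    "Memory": MemoryError,
}


def exception_for(variant: str) -> type:
    """Return the Python exception class for a Rust variant name."""
    # Variants without a specific mapping fall back to RuntimeError.
    return RUST_TO_PYTHON.get(variant, RuntimeError)


for variant in ("NotFound", "Config", "Io", "SomethingElse"):
    print(f"{variant} -> {exception_for(variant).__name__}")
```

In Python 3, `IOError` is an alias of `OSError`, so an `except OSError` clause also catches errors mapped from `Io`.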
---

## See Also

- [Transform Frame](/python/api/transform-frame) — Coordinate transforms (`TransformFrame`, `Transform`)
- [Perception](/python/api/perception) — Detection, landmarks, tracking (`DetectionList`, `PointXYZ`, `COCOPose`)
- [Image](/python/api/image), [PointCloud](/python/api/pointcloud), [DepthImage](/python/api/depth-image) — zero-copy types
- [Async Nodes](/python/api/async-nodes) — Async tick functions
- [ML Utilities](/python/library/ml-utilities) — PyTorch/ONNX inference helpers

---

## Common Patterns

### Producer-Consumer

```python
# simplified
# Producer
producer = horus.Node(
    pubs="queue",
    tick=lambda n: n.send("queue", generate_work())
)

# Consumer
consumer = horus.Node(
    subs="queue",
    tick=lambda n: process_work(n.recv("queue")) if n.has_msg("queue") else None
)

horus.run(producer, consumer)
```

### Request-Response

```python
# simplified
def request_node(node):
    node.send("requests", {"id": 1, "query": "data"})

def response_node(node):
    if node.has_msg("requests"):
        req = node.recv("requests")
        response = handle_request(req)
        node.send("responses", response)

req = horus.Node(pubs="requests", tick=request_node)
res = horus.Node(subs="requests", pubs="responses", tick=response_node)
```

### Periodic Tasks

```python
# simplified
import time

class PeriodicTask:
    def __init__(self, interval):
        self.interval = interval
        self.last_run = 0

task = PeriodicTask(interval=5.0)  # Every 5 seconds

def periodic_tick(node):
    current = time.time()
    if current - task.last_run >= task.interval:
        node.send("periodic", "task_executed")
        task.last_run = current

node = horus.Node(pubs="periodic", tick=periodic_tick, rate=10)
```

---

## Troubleshooting

### Import Errors

```bash
# If you see: ModuleNotFoundError: No module named 'horus'
# Rebuild and install:
cd horus_py
maturin develop --release
```

### Slow Performance ```python # simplified # Use a release build, not debug (in a shell: maturin develop --release) # Check that the tick rate isn't too high node = horus.Node(tick=fn,
rate=30) # 30Hz is reasonable ``` ### Memory Issues ```python # simplified # Avoid accumulating data in closures # BAD: all_data = [] def bad_tick(node): all_data.append(node.recv("input")) # Memory leak! # GOOD: def good_tick(node): data = node.recv("input") process_and_discard(data) # Process immediately ``` --- ## Monitor Integration and Logging ### Current Limitations **Python nodes currently do NOT appear in the HORUS monitor logs.** The Python bindings do not integrate with the Rust logging system: ```python # simplified # Python nodes use standard print() for logging print("Debug message") # Visible in console, not in monitor ``` **What this means:** - Python nodes communicate via shared memory - All message passing functionality works - Python log messages don't appear in monitor logs - Use `print()` for Python-side debugging ### Monitoring Python Nodes Since Python nodes don't integrate with the monitor logging system, use these alternatives: 1. **Node-level logging methods:** ```python # simplified def tick(node): node.log_info("Processing sensor data") node.log_warning("Sensor reading is stale") node.log_error("Failed to process data") node.log_debug("Debug information") # These print to console, not monitor ``` 2. **Manual topic monitoring:** ```python # simplified def tick(node): if node.has_msg("input"): data = node.recv("input") print(f"[{node.name}] Received: {data}") node.send("output", result) print(f"[{node.name}] Published: {result}") ``` 3. **Node statistics:** ```python # simplified scheduler = horus.Scheduler() scheduler.add(node) scheduler.run(duration=10) # Get stats after running stats = scheduler.get_node_stats("my_node") print(f"Ticks: {stats['total_ticks']}") print(f"Errors: {stats['errors_count']}") ``` ### Future Improvements Monitor integration for Python nodes is planned for a future release. 
This will include: - Full `NodeInfo` context in Python callbacks - `LogSummary` for Python message types - Python node logs visible in the monitor TUI and web dashboard --- ### Custom Exceptions HORUS defines three custom exception types plus maps internal errors to standard Python exceptions: ```python # simplified from horus import HorusNotFoundError, HorusTransformError, HorusTimeoutError try: result = some_horus_operation() except HorusNotFoundError: print("Resource not found") except HorusTransformError: print("Transform computation failed") except HorusTimeoutError: print("Operation timed out") ``` **Custom exceptions** (inherit from `Exception`): | Exception | When Raised | Rust Source | |-----------|-------------|-------------| | `HorusNotFoundError` | Topic, frame, node, or parent frame not found | `HorusError::NotFound` | | `HorusTransformError` | Transform extrapolation or stale data | `HorusError::Transform` | | `HorusTimeoutError` | Blocking operation exceeded time limit | `HorusError::Timeout` | **Standard Python exceptions** raised by HORUS operations: | Python Exception | When Raised | Rust Source | |------------------|-------------|-------------| | `IOError` | File or IPC I/O failures | `HorusError::Io` | | `MemoryError` | Shared memory or pool allocation failures | `HorusError::Memory` | | `ValueError` | Invalid parameters, bad config, parse errors | `HorusError::InvalidInput`, `InvalidDescriptor`, `Parse`, `Config` | | `TypeError` | Serialization/deserialization failures | `HorusError::Serialization` | | `RuntimeError` | Internal or unmapped errors | All other variants | All exceptions preserve the original Rust error message, so you get full context: ```python # simplified try: tf = tf_tree.tf("nonexistent", "world") except HorusNotFoundError as e: print(e) # "Frame not found: nonexistent" try: img = Image(height=-1, width=640, encoding="rgb8") except ValueError as e: print(e) # "Invalid input: height must be positive" ``` **Catch hierarchy** — 
order matters when catching:

```python
# simplified
try:
    result = horus_operation()
except HorusNotFoundError:
    pass  # Specific: missing resource
except HorusTransformError:
    pass  # Specific: TF failure
except HorusTimeoutError:
    pass  # Specific: deadline exceeded
except (ValueError, TypeError):
    pass  # Bad input or serialization
except (IOError, MemoryError):
    pass  # System-level failures
except RuntimeError:
    pass  # Catch-all for internal errors
```

---

## See Also

- [Python Examples](/python/examples) - Code examples
- [Core Concepts](/concepts/core-concepts-nodes) - Understanding HORUS architecture
- [Monitor](/development/monitor) - Real-time monitoring and visualization
- [Python Message Library](/python/library/python-message-library) - Typed message classes
- [Multi-Language Support](/concepts/multi-language) - Cross-language communication
- [Performance](/performance/performance) - Optimization guide

---

**Remember**: With HORUS Python, you focus on *what* your robot does, not *how* the framework works!

---

## Python Memory Types
Path: /python/api/memory-types
Description: Zero-copy Image, PointCloud, DepthImage, and Tensor — pool-backed types for camera, LiDAR, depth, and custom ML data

# Python Memory Types

Pool-backed types for sharing large sensor data between nodes with zero-copy IPC. Only a small descriptor (64-168 bytes) travels through the ring buffer; actual data stays in shared memory.
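
To get a feel for the gap between descriptor and payload, the arithmetic below computes raw data sizes for three buffers. The shapes and element sizes (a 480x640 `rgb8` image, a 10,000-point XYZ `float32` cloud, a 480x640 `float32` depth map) are assumptions chosen to match the constructor examples on this page, and `payload_bytes` is a hypothetical helper, not a HORUS function.

```python
from math import prod

# Upper end of the descriptor size range quoted above.
DESCRIPTOR_MAX_BYTES = 168


def payload_bytes(shape, itemsize):
    """Raw buffer size in bytes for a dense array of the given shape."""
    return prod(shape) * itemsize


buffers = {
    "image (480x640 rgb8)": payload_bytes((480, 640, 3), 1),
    "point cloud (10000 XYZ float32)": payload_bytes((10000, 3), 4),
    "depth map (480x640 float32)": payload_bytes((480, 640), 4),
}

for name, size in buffers.items():
    ratio = size // DESCRIPTOR_MAX_BYTES
    print(f"{name}: {size} bytes of data, roughly {ratio}x the largest descriptor")
```

Under these assumptions each payload is hundreds to thousands of times larger than the descriptor, which is why sending only the descriptor keeps per-message cost low.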
| Type | Use case | Create | See | |------|----------|--------|-----| | [**Tensor**](/python/api/tensor) | Custom data: costmaps, feature maps, state vectors | `Tensor([1000, 1000])` | [Full reference](/python/api/tensor) | | [**Image**](/python/api/image) | Camera frames (RGB, BGR, grayscale, Bayer) | `Image(480, 640, "rgb8")` | [Full reference](/python/api/image) | | [**PointCloud**](/python/api/pointcloud) | LiDAR scans, 3D data (XYZ, XYZI, XYZRGB) | `PointCloud(10000, 3)` | [Full reference](/python/api/pointcloud) | | [**DepthImage**](/python/api/depth-image) | Depth maps (F32 meters, U16 millimeters) | `DepthImage(480, 640)` | [Full reference](/python/api/depth-image) | All four types support zero-copy conversion to NumPy, PyTorch, and JAX via DLPack: ```python # simplified np_array = img.to_numpy() # zero-copy torch_tensor = torch.from_dlpack(img.as_tensor()) # zero-copy via DLPack jax_array = img.to_jax() # zero-copy via DLPack ``` `to_*()` and `.as_tensor()` methods are zero-copy (~3 us). `from_*()` methods copy once into shared memory. ## Tensor Bridge: `.as_tensor()` Every domain type can be converted to a `Tensor` for full Pythonic operations. This is zero-copy -- the Tensor shares the same shared memory: ```python # simplified img = Image(480, 640, "rgb8") t = img.as_tensor() # shape=[480, 640, 3], dtype=uint8 t[0:10] += 128 # brighten top rows (writes to SHM) bright = img + 50 # arithmetic returns Tensor pt = torch.from_dlpack(t) # direct to PyTorch (zero-copy) cloud = PointCloud(10000) t = cloud.as_tensor() # shape=[10000, 3], dtype=float32 cloud[0] # first point (direct indexing) len(cloud) # 10000 depth = DepthImage(480, 640) t = depth.as_tensor() # shape=[480, 640], dtype=float32 ``` ## Design Decisions **Why pool-backed instead of heap-allocated?** Pool-backed memory enables cross-process sharing. A heap-allocated NumPy array must be serialized for IPC. 
Pool-backed types live in shared memory from the start, so `topic.send()` copies only the descriptor, not megabytes of pixel data. **Why DLPack for PyTorch/JAX?** DLPack is the standard protocol for zero-copy tensor exchange across ML frameworks (NumPy 1.25+, PyTorch 1.10+, JAX 0.4+, CuPy, TensorFlow). One protocol covers all frameworks. **Why `from_numpy()` copies but `to_numpy()` doesn't?** Publishing requires placing data at a specific pool slot. NumPy arrays at arbitrary heap addresses can't be shared. So `from_numpy()` copies once into the pool. `to_numpy()` returns a view into the already-shared memory -- no copy. **Thread safety:** Pool-backed types use atomic reference counting. NumPy/PyTorch views should not outlive the source object -- when the HORUS type is dropped, the pool slot may be reclaimed. ## See Also - [Tensor](/python/api/tensor) -- General-purpose tensor with Pythonic API - [Image](/python/api/image) -- Camera images with encoding support - [PointCloud](/python/api/pointcloud) -- 3D point clouds with format queries - [DepthImage](/python/api/depth-image) -- Depth maps with typed access - [ML Utilities](/python/library/ml-utilities) -- ML framework integration --- ## Python Message Library Path: /python/library/python-message-library Description: Standard robotics message types for Python — 55+ types for sensors, control, navigation, vision, and more # Python Message Library 55+ typed message classes for Python robotics. Zero-copy shared memory transport, nanosecond timestamps, and binary-compatible cross-language IPC. **Which messages do I need?** | I'm building a... 
| Start with these | |---|---| | Mobile robot | `CmdVel`, `Odometry`, `LaserScan`, `Imu` | | Robot arm | `JointState`, `JointCommand`, `WrenchStamped`, `TrajectoryPoint` | | Drone | `Imu`, `NavSatFix`, `MotorCommand`, `BatteryState` | | Vision system | `Image`, `Detection`, `PointCloud`, `DepthImage` | | Multi-robot | `Pose2D`, `Heartbeat`, `DiagnosticStatus`, `TransformStamped` | | Teleoperation | `JoystickInput`, `CmdVel`, `EmergencyStop` | ## Overview **Key Features:** - **55+ message types** — All standard robotics messages available in Python - **Zero-copy IPC** — POD types transfer via shared memory with no serialization overhead - **Cross-language compatible** — Binary-compatible across all horus language bindings - **Nanosecond timestamps** — All messages include `timestamp_ns` field - **Typed Topic support** — Use `Topic(CmdVel)` for type-safe pub/sub All message types are importable directly from the top-level `horus` module (e.g., `from horus import CmdVel, LaserScan`). Category-based imports (`from horus.messages.geometry import Pose2D`) are also supported. 
--- ## Quick Reference — All 55+ Types ### Geometry (13 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/geometry) | |------|---------------------|-------------|------| | `Vector3` | 3D vector for directions, forces, velocities | `magnitude()`, `normalize()`, `dot()`, `cross()`, `zero()` | [docs](/python/messages/geometry#vector3) | | `Point3` | 3D point (location in space) | `distance_to()`, `origin()` | [docs](/python/messages/geometry#point3) | | `Quaternion` | Rotation without gimbal lock | `from_euler()`, `identity()`, `normalize()`, `is_valid()` | [docs](/python/messages/geometry#quaternion) | | `Pose2D` | 2D position + heading for mobile robots | `distance_to()`, `normalize_angle()`, `origin()` | [docs](/python/messages/geometry#pose2d) | | `Pose3D` | 3D position + quaternion orientation | `from_pose_2d()`, `identity()`, `distance_to()` | [docs](/python/messages/geometry#pose3d) | | `Twist` | 6-DOF velocity (linear + angular) | `stop()`, `new_2d()`, `is_valid()` | [docs](/python/messages/geometry#twist) | | `CmdVel` | 2D velocity command for ground robots | `zero()` | [docs](/python/messages/geometry#cmdvel) | | `TransformStamped` | 3D coordinate frame transform | `identity()`, `from_pose_2d()`, `normalize_rotation()` | [docs](/python/messages/geometry#transformstamped) | | `PoseStamped` | Pose with timestamp and frame ID | `is_valid()` | [docs](/python/messages/geometry#posestamped-posewithcovariance) | | `PoseWithCovariance` | Pose + uncertainty estimate (6x6) | `position_variance()`, `orientation_variance()` | [docs](/python/messages/geometry#posewithcovariance-twistwithcovariance) | | `TwistWithCovariance` | Velocity + uncertainty estimate | `linear_variance()`, `angular_variance()` | [docs](/python/messages/geometry#posewithcovariance-twistwithcovariance) | | `Accel` | Linear + angular acceleration | `is_valid()` | [docs](/python/messages/geometry#accel-accelstamped) | | `AccelStamped` | Acceleration with timestamp | 
`is_valid()` | [docs](/python/messages/geometry#accel-accelstamped) | ### Sensor (13 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/sensor) | |------|---------------------|-------------|------| | `LaserScan` | 2D LiDAR scan (array of range readings) | `angle_at()`, `is_range_valid()`, `min_range()`, `len()` | [docs](/python/messages/sensor#laserscan) | | `Imu` | Accelerometer + gyroscope + optional orientation | `set_orientation_from_euler()`, `has_orientation()`, `angular_velocity_vec()` | [docs](/python/messages/sensor#imu) | | `Odometry` | 2D pose + velocity from wheel encoders | `set_frames()`, `update()`, `is_valid()` | [docs](/python/messages/sensor#odometry) | | `JointState` | Multi-joint positions, velocities, efforts | `position(name)`, `velocity(name)`, `effort(name)` | [docs](/python/messages/sensor#jointstate) | | `BatteryState` | Battery voltage, percentage, current | `is_critical()`, `is_low()`, `time_remaining()` | [docs](/python/messages/sensor#batterystate) | | `NavSatFix` | GPS/GNSS position with fix status | `from_coordinates()`, `has_fix()`, `distance_to()` | [docs](/python/messages/sensor#navsatfix) | | `RangeSensor` | Single-point distance measurement | Field access only | [docs](/python/messages/sensor#rangesensor) | | `MagneticField` | Magnetometer reading (Tesla) | Field access only | [docs](/python/messages/sensor#magneticfield) | | `Temperature` | Temperature in Celsius + variance | Field access only | [docs](/python/messages/sensor#temperature) | | `FluidPressure` | Barometric/fluid pressure (Pascals) | Field access only | [docs](/python/messages/sensor#fluidpressure) | | `Illuminance` | Ambient light level (lux) | Field access only | [docs](/python/messages/sensor#illuminance) | | `Clock` | Time source (wall, sim, replay) | `wall_clock()`, `sim_time()`, `replay_time()`, `elapsed_since()` | [docs](/python/messages/sensor#clock) | | `TimeReference` | External time sync (GPS, NTP) | `correct_timestamp()` | 
[docs](/python/messages/sensor#timereference) | ### Control (7 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/control) | |------|---------------------|-------------|------| | `MotorCommand` | Individual motor control | `velocity()`, `position()`, `stop()` | [docs](/python/messages/control#motorcommand) | | `ServoCommand` | Angle-based servo control | `from_degrees()`, `with_speed()`, `disable()` | [docs](/python/messages/control#servocommand) | | `DifferentialDriveCommand` | Left/right wheel speeds | `from_twist()`, `stop()` | [docs](/python/messages/control#differentialdrivecommand) | | `PidConfig` | PID controller gains + presets | `proportional()`, `pi()`, `pd()`, `with_limits()` | [docs](/python/messages/control#pidconfig) | | `TrajectoryPoint` | Position + velocity + time for paths | `new_2d()`, `stationary()` | [docs](/python/messages/control#trajectorypoint) | | `JointCommand` | Multi-joint position/velocity commands | `add_position()`, `add_velocity()` | [docs](/python/messages/control#jointcommand) | | `CmdVel` | 2D velocity (also in Geometry) | `zero()` | [docs](/python/messages/geometry#cmdvel) | ### Navigation (9 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/navigation) | |------|---------------------|-------------|------| | `NavGoal` | Target position with tolerance checking | `is_reached()`, `is_position_reached()`, `with_timeout()` | [docs](/python/messages/navigation#navgoal) | | `NavPath` | Ordered waypoint sequence | `closest_waypoint_index()`, `calculate_progress()` | [docs](/python/messages/navigation#navpath) | | `Waypoint` | Single path point with velocity constraints | `with_velocity()`, `with_stop()` | [docs](/python/messages/navigation#waypoint) | | `GoalResult` | Navigation goal execution feedback | `with_error()` | [docs](/python/messages/navigation#goalresult) | | `OccupancyGrid` | 2D obstacle map (free/occupied/unknown) | `world_to_grid()`, `grid_to_world()`, 
`is_free()`, `is_occupied()` | [docs](/python/messages/navigation#occupancygrid) | | `CostMap` | Inflated cost map for path planning | `cost()`, `compute_costs()` | [docs](/python/messages/navigation#costmap) | | `PathPlan` | Planned path output | `add_waypoint()`, `from_waypoints()`, `is_empty()` | [docs](/python/messages/navigation#pathplan) | | `VelocityObstacle` | Dynamic obstacle for reactive avoidance | Field access only | [docs](/python/messages/navigation#velocityobstacle-velocityobstacles) | | `VelocityObstacles` | Collection of velocity obstacles | Field access only | [docs](/python/messages/navigation#velocityobstacle-velocityobstacles) | ### Diagnostics (8 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/diagnostics) | |------|---------------------|-------------|------| | `DiagnosticStatus` | Node health with severity levels | `ok()`, `warn()`, `error()`, `fatal()`, `with_component()` | [docs](/python/messages/diagnostics#diagnosticstatus) | | `EmergencyStop` | E-stop trigger and release | `engage()`, `release()`, `with_source()` | [docs](/python/messages/diagnostics#emergencystop) | | `ResourceUsage` | CPU, memory, temperature monitoring | `is_cpu_high()`, `is_memory_high()`, `is_temperature_high()` | [docs](/python/messages/diagnostics#resourceusage) | | `SafetyStatus` | Safety system state machine | `is_safe()`, `set_fault()`, `clear_faults()` | [docs](/python/messages/diagnostics#safetystatus) | | `DiagnosticReport` | Typed key-value diagnostics | `add_string()`, `add_int()`, `add_float()`, `add_bool()` | [docs](/python/messages/diagnostics#diagnosticreport) | | `DiagnosticValue` | Single diagnostic key-value pair | Field access only | [docs](/python/messages/diagnostics#diagnosticreport) | | `Heartbeat` | Topic-based "I'm alive" signal | `update()` | [docs](/python/messages/diagnostics#heartbeat) | | `NodeHeartbeat` | Filesystem-based cross-process heartbeat | `update_timestamp()`, `is_fresh()` | 
[docs](/python/messages/diagnostics#nodeheartbeat) | ### Force and Haptics (6 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/force) | |------|---------------------|-------------|------| | `WrenchStamped` | 6-DOF force/torque measurement | `force_magnitude()`, `torque_magnitude()`, `exceeds_limits()`, `filter()` | [docs](/python/messages/force#wrenchstamped) | | `ForceCommand` | Force-controlled actuator command | `force_only()`, `surface_contact()`, `with_timeout()` | [docs](/python/messages/force#forcecommand) | | `ImpedanceParameters` | Spring-damper compliance model | `compliant()`, `stiff()`, `enable()`, `disable()` | [docs](/python/messages/force#impedanceparameters) | | `HapticFeedback` | Haptic patterns for teleoperation | `vibration()`, `force()`, `pulse()` | [docs](/python/messages/force#hapticfeedback) | | `ContactInfo` | Contact detection and classification | `is_in_contact()`, `contact_duration_seconds()` | [docs](/python/messages/force#contactinfo) | | `TactileArray` | Grid of force readings (tactile sensor pad) | `set_force()`, `get_force()` | [docs](/python/messages/force#tactilearray) | ### Perception (13 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/perception) | |------|---------------------|-------------|------| | `BoundingBox2D` | 2D axis-aligned bounding box | `iou()`, `area()`, `from_center()`, `as_xyxy()` | [docs](/python/messages/perception#boundingbox2d) | | `BoundingBox3D` | 3D oriented bounding box | `with_rotation()` | [docs](/python/messages/perception#boundingbox3d) | | `Detection` | 2D object detection (class + bbox) | `is_confident()`, `with_class_id()` | [docs](/python/messages/perception#detection) | | `Detection3D` | 3D object detection with velocity | `with_velocity()` | [docs](/python/messages/perception#detection3d) | | `TrackedObject` | Multi-frame tracked object | `is_tentative()`, `is_confirmed()`, `confirm()`, `update()`, `speed()` | 
[docs](/python/messages/perception#trackedobject) | | `TrackingHeader` | Tracking metadata (frame count, etc.) | Field access only | [docs](/python/messages/perception#trackedobject) | | `Landmark` | 2D body pose keypoint | `is_visible()`, `distance_to()`, `visible()` | [docs](/python/messages/perception#landmark-landmark3d-landmarkarray) | | `Landmark3D` | 3D body pose keypoint | `to_2d()` | [docs](/python/messages/perception#landmark-landmark3d-landmarkarray) | | `LandmarkArray` | Skeleton with model presets | `coco_pose()`, `mediapipe_pose()`, `mediapipe_hand()`, `mediapipe_face()` | [docs](/python/messages/perception#landmark-landmark3d-landmarkarray) | | `PlaneDetection` | Detected planar surface | `distance_to_point()`, `contains_point()` | [docs](/python/messages/perception#planedetection) | | `PlaneArray` | Collection of detected planes | Field access only | [docs](/python/messages/perception#planedetection) | | `SegmentationMask` | Pixel-level image segmentation | `semantic()`, `instance()`, `panoptic()`, `is_semantic()` | [docs](/python/messages/perception#segmentationmask) | | `PointField` | Point cloud field descriptor | `field_size()` | [docs](/python/messages/perception#pointfield) | ### Vision (7 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/vision) | |------|---------------------|-------------|------| | `Image` | Zero-copy image (pool-backed) | `from_numpy()`, `to_numpy()`, `to_torch()` | [docs](/python/messages/vision#image-pool-backed) | | `PointCloud` | Zero-copy 3D point cloud (pool-backed) | `from_xyz()`, `point_count()` | [docs](/python/messages/vision#pointcloud-pool-backed) | | `DepthImage` | Zero-copy depth map (pool-backed) | `get_depth()`, `set_depth()`, `depth_statistics()` | [docs](/python/messages/vision#depthimage-pool-backed) | | `CompressedImage` | JPEG/PNG for network transport | `format_str()` | [docs](/python/messages/vision#compressedimage) | | `CameraInfo` | Camera intrinsic calibration | 
`focal_lengths()`, `principal_point()`, `with_distortion_model()` | [docs](/python/messages/vision#camerainfo) | | `RegionOfInterest` | Rectangular image region | `contains()`, `area()` | [docs](/python/messages/vision#regionofinterest) | | `StereoInfo` | Stereo camera calibration | `depth_from_disparity()`, `disparity_from_depth()` | [docs](/python/messages/vision#stereoinfo) | ### Input and Audio (3 types) | Type | One-Line Description | Key Methods | [Details](/python/messages/input) | |------|---------------------|-------------|------| | `JoystickInput` | Gamepad/joystick events | `new_button()`, `new_axis()`, `is_button()`, `is_connected()` | [docs](/python/messages/input#joystickinput) | | `KeyboardInput` | Keyboard events with modifiers | `is_ctrl()`, `is_shift()`, `is_alt()` | [docs](/python/messages/input#keyboardinput) | | `AudioFrame` | Audio data from microphones | `mono()`, `stereo()`, `multi_channel()`, `duration_ms()` | [docs](/python/messages/input#audioframe) | --- ## Using Types with Topics ```python # simplified import horus # Typed topic — zero-copy Pod transport (~1.7us) node = horus.Node( pubs=[horus.CmdVel], # auto-creates topic named "cmd_vel" subs=[horus.LaserScan], # auto-creates topic named "scan" tick=my_tick, rate=50 ) # String topic — GenericMessage with serialization (~6-12us) node = horus.Node( pubs=["my_data"], # GenericMessage — any dict works subs=["sensor_raw"], tick=my_tick, rate=50 ) # Send typed node.send("cmd_vel", horus.CmdVel(linear=1.0, angular=0.0)) # Send dict (generic) node.send("my_data", {"x": 1.0, "y": 2.0}) ``` Typed topics auto-derive the topic name from the type's `__topic_name__` class attribute (e.g., `CmdVel.__topic_name__ = 'cmd_vel'`). Override with dict syntax: `pubs={'my_name': horus.CmdVel}`. --- ## Cross-Language Compatibility All 55+ Python message types are **binary-compatible** with their counterparts in other horus language bindings via zero-copy shared memory. 
A `CmdVel` message published by a Python node can be consumed by a node written in any other supported language that subscribes to the same topic, with no serialization overhead. ```python # simplified # Python publisher from horus import Topic, Twist topic = Topic(Twist) topic.send(Twist(linear_x=1.0, angular_z=0.5)) ``` Any node subscribed to the same topic receives the exact same bytes with zero-copy semantics. --- ## Usage Patterns ### Robot Controller with Multiple Sensors ```python # simplified from horus import Node, Topic, CmdVel, LaserScan, Imu scan_topic = Topic(LaserScan) imu_topic = Topic(Imu) cmd_topic = Topic(CmdVel) def controller_tick(node): scan = scan_topic.recv(node) imu = imu_topic.recv(node) if scan: closest = scan.min_range() if closest is not None and closest < 0.5: cmd_topic.send(CmdVel.zero(), node) # Stop else: cmd_topic.send(CmdVel(linear=0.5, angular=0.0), node) node = Node(name="controller", tick=controller_tick, rate=10, pubs=["cmd_vel"], subs=["scan", "imu"]) ``` ### Safety Monitor ```python # simplified from horus import Node, run, BatteryState, EmergencyStop, DiagnosticStatus, Topic battery_topic = Topic(BatteryState) estop_topic = Topic(EmergencyStop) diag_topic = Topic(DiagnosticStatus) def safety_monitor(node): battery = battery_topic.recv(node) if battery is None: return if battery.is_critical(): estop_topic.send( EmergencyStop.engage("Battery critical").with_source("safety"), node ) elif battery.is_low(20.0): diag_topic.send( DiagnosticStatus.warn(100, f"Battery at {battery.percentage:.0f}%"), node ) else: diag_topic.send(DiagnosticStatus.ok("Battery OK"), node) run(Node(tick=safety_monitor, rate=1, pubs=["estop", "diagnostics"], subs=["battery"])) ``` --- ## Design Decisions **Why 55+ types instead of a minimal set?** Mixed-language systems are the norm in robotics: compiled languages for real-time control, Python for ML and prototyping. If Python lacked certain message types, cross-language pipelines would break. 
Full parity means any topic can be consumed from Python with binary-compatible zero-copy transport. **Why PyO3 bindings instead of Python dataclasses?** PyO3 bindings expose the actual compiled structs, so Python messages are binary-identical in shared memory. Python dataclasses would require a serialization layer, adding latency and risking format mismatches. The tradeoff is that adding a new message type requires a compiled rebuild, but the `horus.msggen` module provides runtime custom messages for rapid iteration. **Why `from horus import X` instead of `horus.messages.X`?** Flat imports reduce typing and match the convention used across all horus language bindings. All 55+ types are importable from the top-level `horus` module. Category-based imports (`from horus.messages.geometry import Pose2D`) are also supported for code organization. **Why nanosecond timestamps on every message?** Nanosecond precision is needed for sensor fusion (IMU at 1kHz needs sub-millisecond accuracy) and for correlating events across nodes. Embedding the timestamp in every message (instead of a separate Header) keeps messages flat and Pod-compatible. --- ## See Also - [Python Bindings](/python/api/python-bindings) — Full Python API guide - [Python Perception Types](/python/api/perception) — DetectionList, TrackedObject, COCOPose - [Custom Messages](/python/api/custom-messages) — Runtime and compiled message generation - [Python API Reference](/python/api) — Complete Python API for Node, Scheduler, Topics - [Message Types Overview](/concepts/message-types) — Conceptual message type documentation --- ## Custom Messages Path: /python/api/custom-messages Description: Create your own typed messages in Python without writing Rust # Custom Messages (horus.msggen) The `horus.msggen` module lets you define **custom typed messages** directly in Python. 
Two approaches are available: | Approach | Build Step | Latency | Best For | |----------|-----------|---------|----------| | **Runtime Messages** | None | ~20-40μs | Prototyping, quick iteration | | **Compiled Messages** | `maturin develop` | ~3-5μs | Production, high-frequency | --- ## Runtime Messages (No Build Step) Create custom messages instantly without any compilation. Uses Python's `struct` module for fixed-layout binary serialization. ### Basic Usage ```python # simplified from horus.msggen import define_message # Define a custom message type RobotStatus = define_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ('is_active', 'bool'), ('timestamp', 'u64'), ]) # Create instances status = RobotStatus(battery_level=85.0, error_code=0, is_active=True, timestamp=0) # Access fields print(status.battery_level) # 85.0 status.error_code = 5 # Serialize for IPC raw_bytes = status.to_bytes() # 17 bytes # Reconstruct from bytes status2 = RobotStatus.from_bytes(raw_bytes) ``` ### Supported Types | Type String | Size | Description | |-------------|------|-------------| | `f32` / `float32` | 4 bytes | 32-bit float | | `f64` / `float64` | 8 bytes | 64-bit float | | `i8` | 1 byte | Signed 8-bit int | | `i16` | 2 bytes | Signed 16-bit int | | `i32` | 4 bytes | Signed 32-bit int | | `i64` | 8 bytes | Signed 64-bit int | | `u8` | 1 byte | Unsigned 8-bit int | | `u16` | 2 bytes | Unsigned 16-bit int | | `u32` | 4 bytes | Unsigned 32-bit int | | `u64` | 8 bytes | Unsigned 64-bit int | | `bool` | 1 byte | Boolean | ### NumPy Messages (Better Performance) If NumPy is available, use `define_numpy_message` for better performance: ```python # simplified from horus.msggen import define_numpy_message import numpy as np # NumPy-based message (uses structured arrays internally) SensorData = define_numpy_message('SensorData', 'sensor.data', [ ('x', np.float32), ('y', np.float32), ('z', np.float32), ('temperature', np.float32), ('timestamp', 
np.uint64), ]) # Create instance data = SensorData(x=1.0, y=2.0, z=3.0, temperature=25.5, timestamp=0) # Get underlying numpy structured array arr = data.to_numpy() # Zero-copy bytes access raw = data.to_bytes() ``` ### With Topic (IPC) Runtime messages work with Topic for inter-process communication. Use `to_bytes()` / `from_bytes()` for serialization over generic topics: ```python # simplified from horus import Topic from horus.msggen import define_message # Define message RobotStatus = define_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ]) # Publisher — use generic topic with manual serialization pub_topic = Topic("robot.status") status = RobotStatus(battery_level=85.0, error_code=0) pub_topic.send(status.to_bytes()) # Subscriber (different process) sub_topic = Topic("robot.status") raw = sub_topic.recv() if raw: received = RobotStatus.from_bytes(raw) print(received.battery_level) # 85.0 ``` Or use the Node convenience API (handles serialization automatically): ```python # simplified import horus from horus.msggen import define_message RobotStatus = define_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ]) def publisher_tick(node): status = RobotStatus(battery_level=85.0, error_code=0) node.send("robot.status", status.to_bytes()) def subscriber_tick(node): if node.has_msg("robot.status"): raw = node.recv("robot.status") status = RobotStatus.from_bytes(raw) print(status.battery_level) # 85.0 pub = horus.Node("publisher", pubs="robot.status", tick=publisher_tick) sub = horus.Node("subscriber", subs="robot.status", tick=subscriber_tick) horus.run(pub, sub, duration=3) ``` --- ## Compiled Messages (Production) For maximum performance (~3-5μs), compile your messages to Rust. This generates PyO3 bindings with the same zero-copy performance as built-in types. 
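
The runtime example earlier on this page notes that a `RobotStatus` with `f32`, `i32`, `bool`, and `u64` fields serializes to 17 bytes. The sketch below reproduces that figure with Python's `struct` module and contrasts it with a natively aligned layout, which is what a C-compatible struct would typically use. Whether the generated Rust types use that layout is an assumption here; this page does not specify it. The `TYPE_CODES` mapping and the format strings are illustrative and are not part of `horus.msggen`.

```python
import struct

# Illustrative mapping from the type strings in the table above to struct codes.
TYPE_CODES = {
    "f32": "f", "f64": "d", "bool": "?",
    "i8": "b", "i16": "h", "i32": "i", "i64": "q",
    "u8": "B", "u16": "H", "u32": "I", "u64": "Q",
}

fields = [
    ("battery_level", "f32"),
    ("error_code", "i32"),
    ("is_active", "bool"),
    ("timestamp", "u64"),
]

codes = "".join(TYPE_CODES[type_name] for _, type_name in fields)

packed = struct.Struct("<" + codes)   # little-endian, no padding between fields
aligned = struct.Struct("@" + codes)  # native alignment, platform dependent

# Round-trip one record through the packed layout.
raw = packed.pack(85.0, 0, True, 0)
battery_level, error_code, is_active, timestamp = packed.unpack(raw)

print(packed.size)   # 4 + 4 + 1 + 8 = 17
print(aligned.size)  # never smaller; padding is added so u64 meets its alignment
```

If the two layouts differ in size, bytes produced with one cannot be read with the other, so both ends of a topic need to agree on which layout is in use.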
### Step 1: Define Messages ```python # simplified from horus.msggen import register_message # Register one or more messages register_message('RobotStatus', 'robot.status', [ ('battery_level', 'f32'), ('error_code', 'i32'), ('is_active', 'bool'), ('timestamp', 'u64'), ]) register_message('SensorReading', 'sensor.reading', [ ('x', 'f64'), ('y', 'f64'), ('z', 'f64'), ]) ``` ### Step 2: Build ```python # simplified from horus.msggen import build_messages # Generate Rust code and rebuild build_messages() # Runs: maturin develop --release ``` This generates Rust code in `horus_py/src/custom_messages/` and rebuilds the module. ### Step 3: Use After building, your messages are available directly from `horus`: ```python # simplified from horus import RobotStatus, SensorReading, Topic # Create typed topic topic = Topic(RobotStatus) # Send status = RobotStatus(battery_level=85.0, error_code=0, is_active=True, timestamp=0) topic.send(status) # Receive (typed!) received = topic.recv() print(received.battery_level) ``` ### YAML Schema (Recommended for Teams) For larger projects, define messages in YAML: ```yaml # messages.yaml messages: - name: RobotStatus topic: robot.status fields: - name: battery_level type: f32 - name: error_code type: i32 - name: is_active type: bool - name: SensorReading topic: sensor.reading fields: - name: x type: f64 - name: y type: f64 - name: z type: f64 ``` ```python # simplified from horus.msggen import generate_messages_from_yaml, build_messages generate_messages_from_yaml('messages.yaml') build_messages() ``` ### Rebuild Detection The builder tracks message definitions via hash. 
It won't rebuild unless messages change: ```python # simplified from horus.msggen import check_needs_rebuild, build_messages if check_needs_rebuild(): build_messages() else: print("Messages are up to date") ``` Force rebuild with: ```python # simplified build_messages(force=True) ``` --- ## Performance Comparison | Approach | Latency | Throughput | Use Case | |----------|---------|------------|----------| | **Built-in (Rust)** | ~3μs | 300K msgs/sec | CmdVel, Pose2D, etc. | | **Compiled Custom** | ~3-5μs | 200K msgs/sec | Production custom types | | **Runtime** | ~20-40μs | 25K msgs/sec | Prototyping | | **Runtime (NumPy)** | ~15-30μs | 35K msgs/sec | NumPy integration | | **Pickle** | ~50-100μs | 10K msgs/sec | Legacy/dynamic types | **Recommendation**: Start with runtime messages for fast iteration, then compile for production. --- ## API Reference ### define_message ```python # simplified def define_message( name: str, topic: str, fields: List[Tuple[str, str]] ) -> Type[RuntimeMessage] ``` Create a runtime message class. **Parameters:** - `name`: Class name (e.g., `"RobotStatus"`) - `topic`: Topic name (e.g., `"robot.status"`) - `fields`: List of `(field_name, type_string)` tuples **Returns:** New message class ### define_numpy_message ```python # simplified def define_numpy_message( name: str, topic: str, fields: List[Tuple[str, Any]] ) -> Type[NumpyMessage] ``` Create a NumPy-based message class. **Parameters:** - `name`: Class name - `topic`: Topic name - `fields`: List of `(field_name, numpy_dtype)` tuples **Returns:** New NumPy message class ### register_message ```python # simplified def register_message( name: str, topic: str, fields: List[Tuple[str, str]] ) -> None ``` Register a message for compiled generation. ### build_messages ```python # simplified def build_messages( force: bool = False, verbose: bool = True ) -> bool ``` Build all registered messages. 
**Parameters:** - `force`: Rebuild even if unchanged - `verbose`: Print progress **Returns:** `True` if successful ### check_needs_rebuild ```python # simplified def check_needs_rebuild() -> bool ``` Check if registered messages differ from last build. --- ## Complete Example ```python # simplified #!/usr/bin/env python3 """Custom message example with runtime messages.""" import horus from horus.msggen import define_message # Define custom sensor message MySensor = define_message('MySensor', 'my.sensor', [ ('distance', 'f32'), ('angle', 'f32'), ('confidence', 'f32'), ('object_id', 'u32'), ]) def sensor_tick(node): """Publish sensor readings.""" reading = MySensor( distance=2.5, angle=0.785, confidence=0.95, object_id=42 ) node.send("my.sensor", reading.to_bytes()) def processor_tick(node): """Process sensor readings.""" if node.has_msg("my.sensor"): raw = node.recv("my.sensor") reading = MySensor.from_bytes(raw) print(f"Object {reading.object_id}: {reading.distance}m at {reading.angle}rad") # Create nodes sensor = horus.Node("sensor", pubs="my.sensor", tick=sensor_tick, rate=10) processor = horus.Node("processor", subs="my.sensor", tick=processor_tick) # Run horus.run(sensor, processor, duration=3) ``` --- ## When to Use Each Approach ### Use Runtime Messages When: - Prototyping new message types - Message schema changes frequently - You don't want to wait for compilation - Performance requirements are moderate (<50Hz) ### Use Compiled Messages When: - Deploying to production - High-frequency data (>100Hz) - Cross-language compatibility required - Type safety is critical ### Use Built-in Messages When: - Standard robotics types (CmdVel, Pose2D, LaserScan) - Maximum performance needed - Compatibility with other HORUS systems ## Design Decisions **Why two approaches (runtime vs compiled) instead of one?** Development speed and runtime speed are inversely correlated. 
Runtime messages need no build step and let you iterate on message schemas in seconds -- ideal for prototyping. Compiled messages take longer to build but achieve near-native performance -- essential for production. The two-path design matches the typical robotics workflow: prototype fast, then optimize for deployment. **Why `struct` module for runtime messages instead of pickle?** `struct` produces a fixed-size binary layout that is deterministic and cross-process compatible. Pickle produces variable-length output, is version-sensitive, and has security implications (arbitrary code execution on deserialization). The tradeoff is that `struct` only supports primitive types (no nested objects), but this matches the fixed-size Pod constraint of zero-copy IPC. **Why YAML schema support?** Teams need a shared source of truth for message definitions that is language-agnostic and version-controllable. YAML schemas serve as the canonical definition that generates both Python runtime classes and Rust compiled types, ensuring cross-language compatibility. **Why generate Rust code for compiled messages instead of a Python-only binary format?** Generated Rust code produces the same Pod types used by the standard library, so compiled custom messages get identical zero-copy IPC performance and are binary-compatible with Rust nodes. A Python-only approach would sacrifice cross-language support. --- ## See Also - [Python Bindings](/python/api/python-bindings) — Core Python API - [Custom Messages Tutorial](/tutorials/04-custom-messages) — Step-by-step guide - [Message Types](/concepts/message-types) — How messages work in HORUS - [Python Message Library](/python/library/python-message-library) — Standard message types --- ## Python Image Path: /python/api/image Description: Zero-copy camera images backed by shared memory with NumPy, PyTorch, and JAX interop # Python Image A camera image backed by shared memory for zero-copy inter-process communication. 
Only a small descriptor travels through the ring buffer; the actual pixel data stays in a shared memory pool. This enables real-time image pipelines at full camera frame rates without serialization overhead. ## When to Use Use `Image` when your robot has a camera and you need to share frames between nodes — for example, between a camera driver, a vision node, and a display node. A 1080p RGB image transfers in microseconds, not milliseconds. **ROS2 equivalent:** `sensor_msgs/Image` — same concept, but HORUS uses shared memory pools instead of serialized byte buffers. ## Constructor ```python # simplified from horus import Image # Image(height, width, encoding) img = Image(480, 640, "rgb8") ``` **Parameters:** - `height: int` — Image height in pixels - `width: int` — Image width in pixels - `encoding: str` — Pixel format (default: `"rgb8"`, see encoding table below) ### Factory Methods ```python # simplified # From NumPy array — copies data into shared memory pool import numpy as np pixels = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) img = Image.from_numpy(pixels) # encoding auto-detected from shape img = Image.from_numpy(pixels, encoding="bgr8") # explicit encoding # From PyTorch tensor — copies into pool import torch tensor = torch.zeros(480, 640, 3, dtype=torch.uint8) img = Image.from_torch(tensor, encoding="rgb8") # From raw bytes — copies into pool img = Image.from_bytes(raw_data, height=480, width=640, encoding="rgb8") ``` | Factory | Parameters | Use case | |---------|-----------|----------| | `Image(h, w, enc)` | height, width, encoding | Create empty image to fill manually | | `Image.from_numpy(arr, enc?)` | ndarray, optional encoding | Camera capture, OpenCV output | | `Image.from_torch(tensor, enc?)` | Tensor, optional encoding | ML model output | | `Image.from_bytes(data, h, w, enc)` | bytes, height, width, encoding | Network/file loading | ## Supported Encodings | Encoding | Channels | Bytes/Pixel | When to use | 
|----------|----------|-------------|-------------| | `"mono8"` | 1 | 1 | Grayscale cameras, edge detection output | | `"mono16"` | 1 | 2 | High dynamic range grayscale | | `"rgb8"` | 3 | 3 | Standard color cameras (default) | | `"bgr8"` | 3 | 3 | OpenCV output (OpenCV uses BGR internally) | | `"rgba8"` | 4 | 4 | Images with transparency | | `"bgra8"` | 4 | 4 | Windows/DirectX style with transparency | | `"yuv422"` | 2 | 2 | Raw USB camera output | | `"mono32f"` | 1 | 4 | ML model output (float grayscale) | | `"rgb32f"` | 3 | 12 | HDR imaging, ML float output | | `"bayer_rggb8"` | 1 | 1 | Raw sensor data before debayering | | `"depth16"` | 1 | 2 | 16-bit depth in millimeters (use `DepthImage` for float meters) | ## Properties | Property | Type | Description | |----------|------|-------------| | `height` | `int` | Image height in pixels | | `width` | `int` | Image width in pixels | | `channels` | `int` | Number of color channels (1, 2, 3, or 4) | | `encoding` | `str` | Encoding string (e.g., `"rgb8"`) | | `dtype` | `str` | Data type string (e.g., `"uint8"`) | | `nbytes` | `int` | Total pixel data size in bytes | | `step` | `int` | Row stride in bytes (`width * bytes_per_pixel`) | | `frame_id` | `str` | Coordinate frame (e.g., `"camera_front"`) | | `timestamp_ns` | `int` | Timestamp in nanoseconds since epoch | ## Methods ### Pixel Access ```python # simplified # Read pixel at (x, y) — returns channel values as list pixel = img.pixel(320, 240) # e.g., [128, 64, 255] for RGB # Write pixel at (x, y) img.set_pixel(320, 240, [255, 0, 0]) # Red pixel # Fill entire image with one color img.fill([0, 0, 0]) # Black # Copy raw bytes into image img.copy_from(raw_bytes) # Extract region of interest (returns raw bytes) roi_data = img.roi(x=100, y=100, w=200, h=200) ``` | Method | Signature | Description | |--------|-----------|-------------| | `pixel(x, y)` | `(int, int) -> list[int]` | Read pixel channel values | | `set_pixel(x, y, val)` | `(int, int, list[int]) -> None` | 
Write pixel | | `fill(val)` | `(list[int]) -> None` | Fill entire image | | `copy_from(data)` | `(bytes) -> None` | Overwrite pixel data from bytes | | `roi(x, y, w, h)` | `(int, int, int, int) -> list[int]` | Extract region of interest | ### Framework Conversions ```python # simplified # To NumPy — zero-copy (shared memory view) np_array = img.to_numpy() # Shape: (H, W, C) for color, (H, W) for mono # To PyTorch — zero-copy via DLPack torch_tensor = img.to_torch() # To JAX — zero-copy via DLPack jax_array = img.to_jax() ``` All `to_*()` methods are zero-copy (~3 us). They return views into the shared memory pool — no pixel data is copied. `from_*()` methods copy data into the pool (one copy at publish time). This is necessary because the pool allocator controls memory layout. ### Metadata ```python # simplified # Set coordinate frame for TransformFrame integration img.set_frame_id("camera_front") # Set timestamp for time-based queries img.set_timestamp_ns(horus.timestamp_ns()) ``` ## Complete Example ```python # simplified import horus from horus import Image, Topic import numpy as np img_topic = Topic(Image) def camera_tick(node): # Simulate camera capture frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) img = Image.from_numpy(frame, encoding="rgb8") img.set_frame_id("camera_front") img.set_timestamp_ns(horus.timestamp_ns()) img_topic.send(img) def vision_tick(node): img = img_topic.recv() if img: # Zero-copy to NumPy for OpenCV processing pixels = img.to_numpy() gray = np.mean(pixels, axis=2).astype(np.uint8) edges = np.abs(np.diff(gray, axis=1)) node.log_info(f"Detected {np.sum(edges > 128)} edge pixels") camera = horus.Node(name="camera", tick=camera_tick, rate=30, order=0, pubs=["image"]) vision = horus.Node(name="vision", tick=vision_tick, rate=30, order=1, subs=["image"]) horus.run(camera, vision) ``` ## Tensor Interop Convert an Image to a general-purpose `Tensor` for Pythonic operations. 
This is zero-copy -- the Tensor shares the same shared memory: ```python # simplified t = img.as_tensor() # shape=[480, 640, 3], dtype=uint8 t[0:10] += 128 # brighten top rows (writes to SHM) features = t.flatten() # Tensor reshape pt = torch.from_dlpack(t) # zero-copy to PyTorch ``` Images also support direct indexing and arithmetic: ```python # simplified pixel = img[240, 320] # read pixel at (y, x) img[0:10] = 255 # write to rows bright = img + 50 # returns Tensor normalized = img / 255.0 # returns Tensor ``` See [Tensor](/python/api/tensor) for the full Pythonic API (reshape, arithmetic, reductions, type conversion). --- ## Design Decisions **Why pool-backed shared memory instead of serialized byte buffers?** Serializing a 1080p RGB image (6 MB) takes ~2 ms and doubles memory usage (sender buffer + receiver buffer). With pool-backed shared memory, only the 64-byte descriptor is copied; the pixel data stays in one place and every subscriber maps the same physical memory. Latency stays under 10 us regardless of resolution. **Why fixed encoding enums instead of arbitrary format strings?** Fixed enums enable compile-time size calculations (`step = width * bytes_per_pixel`) and prevent encoding mismatches between publisher and subscriber. The enum covers all common camera output formats; for exotic encodings, use `GenericMessage` with manual layout. **Why `from_numpy()` copies but `to_numpy()` doesn't?** Writing into the shared memory pool requires placing data at a specific pool slot. `from_numpy()` copies once into that slot. Reading (`to_numpy()`) returns a view into the existing pool memory — no copy needed. One copy on publish, zero copies on subscribe. 
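The size arithmetic behind these decisions can be checked by hand. The sketch below is plain Python rather than HORUS code: `BYTES_PER_PIXEL` and `image_layout` are hypothetical names, the per-pixel sizes are copied from the encoding table on this page, and rows are assumed to be tightly packed (no padding), matching `step = width * bytes_per_pixel`.

```python
# Hypothetical helper, not part of the HORUS API.
# Bytes per pixel copied from the encoding table on this page.
BYTES_PER_PIXEL = {
    "mono8": 1, "mono16": 2, "rgb8": 3, "bgr8": 3,
    "rgba8": 4, "bgra8": 4, "yuv422": 2, "mono32f": 4,
    "rgb32f": 12, "bayer_rggb8": 1, "depth16": 2,
}


def image_layout(width: int, height: int, encoding: str) -> tuple[int, int]:
    """Return (step, nbytes), assuming tightly packed rows."""
    step = width * BYTES_PER_PIXEL[encoding]  # row stride in bytes
    return step, step * height                # total pixel payload in bytes


step, nbytes = image_layout(1920, 1080, "rgb8")
print(step, nbytes)   # 5760 6220800 (about 6.2 MB per 1080p RGB frame)
print(nbytes // 64)   # 97200: the payload is ~97,000x the 64-byte descriptor
```

This is why copying only the descriptor keeps the per-message cost independent of resolution: the descriptor size stays fixed while the pixel payload grows with `width * height`.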
--- ## See Also - [Tensor](/python/api/tensor) — General-purpose tensor with full Pythonic API - [Image (Stdlib)](/stdlib/messages/image) — Image message type overview - [PointCloud (Python)](/python/api/pointcloud) — 3D point cloud data - [DepthImage (Python)](/python/api/depth-image) — Depth maps - [Python CV Node Recipe](/recipes/python-cv-node) — Computer vision with Python - [ML Utilities](/python/library/ml-utilities) — ML framework integration --- ## Tensor Path: /python/api/tensor Description: General-purpose shared memory tensor with Pythonic API for ML/AI workflows # Tensor General-purpose shared memory tensor for custom data. Feels like PyTorch/NumPy but lives in shared memory for zero-copy IPC. Use `Tensor` for costmaps, feature maps, state vectors, occupancy grids, CNN outputs, RL observations -- any array data that needs to move between nodes without serialization overhead. ```python # simplified import horus import numpy as np import torch # Create a costmap in shared memory costmap = horus.Tensor([1000, 1000], dtype="float32") costmap.numpy()[:] = compute_costmap() # write directly to SHM # Send to another process (168B descriptor, not 4MB) topic = horus.Topic("nav.costmap", "Tensor") topic.send(costmap) # Receive and use with any library received = topic.recv() arr = received.numpy() # NumPy (zero-copy) pt = torch.from_dlpack(received) # PyTorch (zero-copy) ``` ## Constructors ```python # simplified from horus import Tensor # Shape + dtype (zero-initialized) t = Tensor([480, 640, 3], dtype="uint8") t = Tensor([1000, 1000], dtype="float32") # Static constructors t = Tensor.zeros([100, 100]) # explicit zeros t = Tensor.empty([100, 100]) # fast, uninitialized # From data (one copy into shared memory) t = Tensor.from_numpy(np.array(...)) # from NumPy t = Tensor.from_torch(torch_tensor) # from PyTorch t = Tensor.from_dlpack(any_dlpack_obj) # from any DLPack source ``` ## Properties ```python # simplified t.shape # [480, 640, 3] t.dtype # "float32" 
t.nbytes # total bytes t.numel # total elements len(t) # first dimension ``` ## Framework Interop Every Python ML library works with `horus.Tensor` out of the box: ### NumPy ```python # simplified arr = t.numpy() # zero-copy view arr = np.asarray(t) # also zero-copy (via __array_interface__) arr = np.from_dlpack(t) # also works (via __dlpack__) ``` ### PyTorch ```python # simplified pt = torch.from_dlpack(t) # zero-copy into PyTorch t = Tensor.from_torch(pt) # copy into shared memory # Use directly in models features = model(torch.from_dlpack(t)) action = Tensor.from_torch(policy(obs_tensor)) ``` ### SciPy ```python # simplified # Signal processing on IMU data filtered = scipy.signal.filtfilt(b, a, t.numpy()) # Distance transform on costmap inflated = scipy.ndimage.distance_transform_edt(t.numpy()) # KD-tree on point cloud tree = scipy.spatial.KDTree(t.numpy()) ``` ### scikit-learn ```python # simplified # Normalize sensor data scaled = StandardScaler().fit_transform(t.numpy()) # PCA on feature vectors reduced = PCA(n_components=10).fit_transform(t.numpy()) # Cluster point cloud labels = DBSCAN(eps=0.5).fit_predict(t.numpy()) ``` ### OpenCV ```python # simplified # Image processing gray = cv2.cvtColor(t.numpy(), cv2.COLOR_RGB2GRAY) edges = cv2.Canny(t.numpy(), 50, 150) resized = cv2.resize(t.numpy(), (320, 240)) ``` ### Pandas ```python # simplified # Robot telemetry as DataFrame df = pd.DataFrame(t.numpy(), columns=["x", "y", "theta", "v"]) rolling_avg = df.rolling(50).mean() ``` ### JAX ```python # simplified jax_arr = jnp.array(t.numpy()) grad = jax.grad(loss_fn)(jax_arr) ``` ## Pythonic Operations ### Indexing ```python # simplified t[0] # first row/element t[10:20] # slice t[0, :, 3] # multi-dimensional t[5] = 42.0 # write t[0:10] = arr # slice write ``` ### Shape Operations ```python # simplified t.reshape(100, 100) # reshape (view, no copy if contiguous) t.reshape([50, 200]) # list form t.flatten() # 1D view t.squeeze() # remove size-1 dims t.unsqueeze(0) # 
add dim: [5] -> [1, 5] t.unsqueeze(-1) # add dim: [5] -> [5, 1] t.T # transpose (2D) t.view([2, 3, 4]) # explicit reshape t.slice(10, 20) # slice first dimension ``` ### Arithmetic Returns a new `Tensor` backed by shared memory: ```python # simplified result = t + 10 # scalar result = a + b # tensor + tensor result = t + np_array # tensor + numpy result = t * 2.0 result = t / 5 result = t - other result = -t # negation ``` ### Comparisons Returns a bool `Tensor`: ```python # simplified mask = t > 0.5 mask = t == 0 mask = t < threshold mask = t >= 2.0 mask = t <= other ``` ### Reductions ```python # simplified t.sum() # scalar sum (returns 1-element Tensor) t.sum(dim=0) # sum along axis t.mean() t.mean(dim=1) t.max() t.min(dim=0) ``` ### Type Conversion ```python # simplified t.astype("float16") # returns new Tensor t.to_float32() # convenience t.to_float16() t.to_int32() t.to_uint8() t.tolist() # Python list ``` ## Supported Dtypes | dtype | NumPy | PyTorch | Bytes | |-------|-------|---------|-------| | `"float32"` | `float32` | `float32` | 4 | | `"float64"` | `float64` | `float64` | 8 | | `"float16"` | `float16` | `float16` | 2 | | `"int8"` | `int8` | `int8` | 1 | | `"int16"` | `int16` | `int16` | 2 | | `"int32"` | `int32` | `int32` | 4 | | `"int64"` | `int64` | `int64` | 8 | | `"uint8"` | `uint8` | `uint8` | 1 | | `"uint16"` | `uint16` | -- | 2 | | `"uint32"` | `uint32` | -- | 4 | | `"uint64"` | `uint64` | -- | 8 | | `"bool"` | `bool_` | `bool` | 1 | ## Usage with Topics ```python # simplified from horus import Tensor, Topic topic = Topic("features", "Tensor") # Publish — 168B descriptor through ring buffer, data stays in SHM t = Tensor.from_numpy(np.random.randn(64, 64).astype(np.float32)) topic.send(t) # Subscribe — zero-copy access received = topic.recv() if received: arr = received.numpy() # direct view into shared memory ``` ## Relation to Image, PointCloud, DepthImage `Tensor` is the general-purpose type. 
`Image`, `PointCloud`, and `DepthImage` are specialized wrappers with domain-specific methods (`.width`, `.encoding`, `.point_count`). All domain types can be converted to a `Tensor` view via `.as_tensor()`: ```python # simplified img = horus.Image(480, 640, "rgb8") t = img.as_tensor() # zero-copy Tensor view t.shape # [480, 640, 3] torch.from_dlpack(t) # works t.reshape(480 * 640, 3) # works t + 50 # works ``` Domain types also support direct indexing and arithmetic: ```python # simplified img[240, 320] # pixel access img + 50 # returns Tensor cloud[0] # first point len(cloud) # number of points depth[100, 200] # depth value ``` ## Robotics Examples ### Costmap for Path Planning ```python # simplified grid = horus.Tensor([500, 500], dtype="float32") arr = grid.numpy() arr[:] = compute_costmap() # Inflate obstacles with scipy from scipy.ndimage import distance_transform_edt obstacles = (arr > 0.8).astype(np.float32) inflated = 1.0 - np.clip(distance_transform_edt(1 - obstacles) / 20.0, 0, 1) grid.numpy()[:] = inflated topic.send(grid) ``` ### RL Policy Inference ```python # simplified # Observation from sensors obs = horus.Tensor.from_numpy( np.concatenate([imu_data, lidar_ranges, joint_positions]).astype(np.float32) ) # PyTorch inference pt_obs = torch.from_dlpack(obs) with torch.no_grad(): action = policy(pt_obs) # Send command cmd = horus.Tensor.from_torch(action) cmd_topic.send(cmd) ``` ### Feature Map Between Nodes ```python # simplified # Detection node img_tensor = torch.from_dlpack(img.as_tensor()) features = backbone(img_tensor.unsqueeze(0).float() / 255.0) feature_map = horus.Tensor.from_torch(features.squeeze(0)) feature_topic.send(feature_map) # Planning node features = feature_topic.recv() planning_input = features.numpy() # zero-copy into numpy path = planner.plan(planning_input) ``` ## See Also - [Memory Types](/python/api/memory-types) -- Overview of all pool-backed types - [Image](/python/api/image) -- Camera images - 
[PointCloud](/python/api/pointcloud) -- 3D point clouds - [DepthImage](/python/api/depth-image) -- Depth maps - [Topic](/concepts/core-concepts-topic) -- Zero-copy IPC --- ## Python TransformFrame Path: /python/api/transform-frame Description: Coordinate frame management and 3D transform lookups in Python # Python TransformFrame HORUS provides a Python API for coordinate frame management — registering frames, updating transforms, and looking up transformations between any two frames in the tree. ## Transform A 3D rigid transformation (translation + quaternion rotation). ### Creating Transforms ```python # simplified from horus import Transform # Identity transform tf = Transform.identity() # From translation and rotation quaternion tf = Transform(translation=[1.0, 2.0, 0.0], rotation=[0.0, 0.0, 0.0, 1.0]) # From translation only (identity rotation) tf = Transform.from_translation([1.0, 2.0, 3.0]) # From Euler angles (translation + roll/pitch/yaw) tf = Transform.from_euler([1.0, 0.0, 0.5], [0.0, 0.1, 1.57]) # From 4x4 homogeneous matrix tf = Transform.from_matrix(matrix_4x4) ``` ### Properties ```python # simplified tf.translation # [x, y, z] as list of floats tf.rotation # [x, y, z, w] quaternion as list of floats ``` Both are readable and writable: ```python # simplified tf.translation = [2.0, 3.0, 0.0] tf.rotation = [0.0, 0.0, 0.383, 0.924] # 45 degrees around Z ``` ### Methods ```python # simplified # Convert to Euler angles roll, pitch, yaw = tf.to_euler() # Compose transforms: result = self * other combined = parent_tf.compose(child_tf) # Inverse transform inv = tf.inverse() # Apply to a point point_in_world = tf.transform_point([1.0, 0.0, 0.0]) # Apply rotation only (no translation) rotated = tf.transform_vector([1.0, 0.0, 0.0]) # Interpolate between two transforms (SLERP for rotation) halfway = tf_a.interpolate(tf_b, t=0.5) # Magnitudes dist = tf.translation_magnitude() # Translation distance angle = tf.rotation_angle() # Rotation angle in radians # Export 
as 4x4 matrix matrix = tf.to_matrix() # 4x4 nested list of floats ``` | Method | Returns | Description | |--------|---------|-------------| | `identity()` | `Transform` | No translation, no rotation | | `from_translation([x,y,z])` | `Transform` | Translation only | | `from_euler([x,y,z], [r,p,y])` | `Transform` | Translation + Euler angles | | `from_matrix(4x4)` | `Transform` | From homogeneous matrix | | `to_euler()` | `[roll, pitch, yaw]` | Get Euler angles | | `compose(other)` | `Transform` | Chain transforms (self * other) | | `inverse()` | `Transform` | Compute inverse | | `transform_point([x,y,z])` | `[x,y,z]` | Apply full transform to point | | `transform_vector([x,y,z])` | `[x,y,z]` | Apply rotation only to vector | | `interpolate(other, t)` | `Transform` | SLERP interpolation, `t` in 0.0-1.0 | | `translation_magnitude()` | `float` | Translation distance | | `rotation_angle()` | `float` | Rotation angle in radians | | `to_matrix()` | `4x4 list` | Export as homogeneous matrix | --- ## TransformFrame The frame tree manager. Stores a hierarchy of coordinate frames and computes transforms between any two frames.
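A lookup across the tree amounts to chaining the per-frame transforms along the path between the two frames. The sketch below shows that chaining in plain Python with no HORUS imports. `quat_mul`, `rotate`, `compose`, and `apply` are hypothetical helpers, and the sketch assumes each stored transform maps child-frame coordinates into the parent frame, with quaternions in the `[x, y, z, w]` order used above. The internal HORUS implementation may differ.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions in [x, y, z, w] order."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return [
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ]


def rotate(q, v):
    """Rotate vector v by unit quaternion q: v + w*t + cross(q_xyz, t), t = 2*cross(q_xyz, v)."""
    x, y, z, w = q
    tx = 2 * (y * v[2] - z * v[1])
    ty = 2 * (z * v[0] - x * v[2])
    tz = 2 * (x * v[1] - y * v[0])
    return [
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    ]


def compose(parent, child):
    """Return parent * child as (translation, rotation): child applies first."""
    (pt, pq), (ct, cq) = parent, child
    rt = rotate(pq, ct)
    return [pt[i] + rt[i] for i in range(3)], quat_mul(pq, cq)


def apply(tf, point):
    """Apply a (translation, rotation) pair to a point."""
    t, q = tf
    r = rotate(q, point)
    return [t[i] + r[i] for i in range(3)]


half = math.pi / 4  # half of a 90 degree yaw
base_in_world = ([2.0, 0.0, 0.0], [0.0, 0.0, math.sin(half), math.cos(half)])
lidar_in_base = ([0.0, 0.0, 0.3], [0.0, 0.0, 0.0, 1.0])

lidar_in_world = compose(base_in_world, lidar_in_base)
point_in_world = apply(lidar_in_world, [5.0, 0.0, 0.0])
print([round(c, 6) for c in point_in_world])  # [2.0, 5.0, 0.3]
```

A point 5 m ahead of the lidar ends up 5 m along world Y because the base is yawed by 90 degrees, then shifted by the base position and the lidar mount height.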
### Creating a TransformFrame ```python # simplified from horus import TransformFrame, TransformFrameConfig # Default configuration tf_tree = TransformFrame() # With custom config config = TransformFrameConfig(max_frames=1024, history_len=64) tf_tree = TransformFrame(config=config) # Preset sizes tf_tree = TransformFrame.small() # 256 frames, ~550KB tf_tree = TransformFrame.medium() # 1024 frames, ~2.2MB tf_tree = TransformFrame.large() # 4096 frames, ~9MB tf_tree = TransformFrame.massive() # 16384 frames, ~35MB ``` ### Registering Frames ```python # simplified # Register child frame under a parent — returns frame ID (int) frame_id = tf_tree.register_frame("base_link", "world") tf_tree.register_frame("lidar", "base_link") tf_tree.register_frame("camera", "base_link") # Unregister a frame (raises HorusNotFoundError if not found) tf_tree.unregister_frame("camera") ``` ### Updating Transforms ```python # simplified from horus import Transform # Set the transform from parent to child (raises on unknown frame) tf_tree.update_transform("base_link", Transform( translation=[1.0, 0.0, 0.0], rotation=[0.0, 0.0, 0.0, 1.0] )) # With optional explicit timestamp tf_tree.update_transform("lidar", Transform.from_translation([0.0, 0.0, 0.3]), timestamp_ns=1234567890) ``` ### Looking Up Transforms ```python # simplified # Get transform from source frame to target frame tf = tf_tree.tf("lidar", "world") print(f"Lidar in world: {tf.translation}") # With specific timestamp (for interpolation) tf = tf_tree.tf_at("lidar", "world", timestamp_ns=1234567890) ``` ### Querying the Tree ```python # simplified # List all registered frames frames = tf_tree.all_frames() # ["world", "base_link", "lidar", "camera"] # Get parent of a frame parent = tf_tree.parent("lidar") # "base_link" # Get children of a frame children = tf_tree.children("base_link") # ["lidar", "camera"] # Seconds since last transform update (None if never updated) age = tf_tree.time_since_last_update("lidar") # e.g., 0.015 # Wait 
for a transform to become available (blocking with timeout) tf = tf_tree.wait_for_transform("lidar", "world", timeout_sec=1.0) print(f"Got transform: {tf.translation}") ``` ### Frame Registration | Method | Returns | Description | |--------|---------|-------------| | `register_frame(name, parent)` | `int` | Register dynamic frame, returns frame ID | | `register_static_frame(name, transform, parent=None)` | `int` | Register frame with fixed transform | | `unregister_frame(name)` | `None` | Remove a frame | | `has_frame(name)` | `bool` | Check if frame exists | | `frame_count()` | `int` | Number of registered frames | | `frame_id(name)` | `int` | Get numeric frame ID from name | | `frame_name(id)` | `str` | Get frame name from numeric ID | ### Updating Transforms | Method | Returns | Description | |--------|---------|-------------| | `update_transform(name, tf, timestamp_ns=None)` | `None` | Set transform for a frame | | `update_transform_by_id(frame_id, tf, timestamp_ns=None)` | `None` | Update by numeric ID (faster) | | `set_static_transform(name, transform)` | `None` | Set a static (never-changing) transform | ### Looking Up Transforms | Method | Returns | Description | |--------|---------|-------------| | `tf(source, target)` | `Transform` | Latest transform between frames | | `tf_at(source, target, ts)` | `Transform` | Transform at specific timestamp (interpolated) | | `tf_at_strict(source, target, ts)` | `Transform` | Exact timestamp match (no interpolation) | | `tf_at_with_tolerance(src, dst, ts, tolerance_ns=100_000_000)` | `Transform` | Interpolated with tolerance window | | `tf_by_id(src_id, dst_id)` | `Transform` | Look up by numeric IDs (fastest) | | `can_transform(source, target)` | `bool` | Check if transform path exists | | `can_transform_at(src, dst, ts)` | `bool` | Check if available at timestamp | | `can_transform_at_with_tolerance(src, dst, ts, tolerance_ns)` | `bool` | Check with tolerance | | `wait_for_transform(src, dst, timeout_sec)` | 
`Transform` | Block until available | | `wait_for_transform_async(src, dst, timeout_sec)` | `Future[Transform]` | Async wait (returns `concurrent.futures.Future`) | ### Applying Transforms | Method | Returns | Description | |--------|---------|-------------| | `transform_point(source, target, [x,y,z])` | `[x,y,z]` | Transform a point between frames | | `transform_vector(source, target, [x,y,z])` | `[x,y,z]` | Transform a vector (rotation only) | ### Querying the Tree | Method | Returns | Description | |--------|---------|-------------| | `all_frames()` | `list[str]` | All registered frame names | | `parent(name)` | `str` or `None` | Parent frame name | | `children(name)` | `list[str]` | Child frame names | | `frame_chain(source, target)` | `list[str]` | Frame path from source to target | | `time_since_last_update(name)` | `float` or `None` | Seconds since last update | | `is_stale(name, max_age_sec=1.0)` | `bool` | Check if frame data is stale | ### Diagnostics | Method | Returns | Description | |--------|---------|-------------| | `stats()` | `dict` | Frame tree statistics | | `validate()` | `bool` | Validate tree integrity | | `frame_info(name)` | `dict` | Metadata for a single frame | | `frame_info_all()` | `list[dict]` | Metadata for all frames | | `format_tree()` | `str` | Human-readable tree visualization | | `frames_as_dot()` | `str` | DOT graph format (for Graphviz) | | `frames_as_yaml()` | `str` | YAML export of frame tree | #### `stats()` Return Value The `stats()` method returns a dictionary with the following keys: | Key | Type | Description | |-----|------|-------------| | `total_frames` | `int` | Total registered frames | | `static_frames` | `int` | Frames that never change | | `dynamic_frames` | `int` | Frames updated at runtime | | `max_frames` | `int` | Maximum capacity | | `history_len` | `int` | Transform history buffer size | | `tree_depth` | `int` | Maximum depth of the frame tree | | `root_count` | `int` | Number of root frames (no parent) | 
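The tree-level queries and statistics listed above can be understood in terms of a plain parent map. The following sketch is illustrative only: `chain_to_root` and `frame_chain` are hypothetical functions written for this example, the parent map mirrors the frames registered earlier on this page, and depth is assumed to count edges from the root. The values and ordering that HORUS itself reports may be defined differently.

```python
def chain_to_root(parents, name):
    """Walk parent links from a frame up to its root (inclusive)."""
    chain = [name]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain


def frame_chain(parents, source, target):
    """Path from source up to the closest common ancestor, then down to target."""
    up = chain_to_root(parents, source)
    down = chain_to_root(parents, target)
    common = next((f for f in up if f in down), None)
    if common is None:
        raise LookupError(f"no path between {source!r} and {target!r}")
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]


# Assumed example tree: world -> base_link -> {lidar, camera}
parents = {
    "world": None,
    "base_link": "world",
    "lidar": "base_link",
    "camera": "base_link",
}

print(frame_chain(parents, "lidar", "camera"))  # ['lidar', 'base_link', 'camera']
print(frame_chain(parents, "lidar", "world"))   # ['lidar', 'base_link', 'world']

tree_depth = max(len(chain_to_root(parents, f)) - 1 for f in parents)
root_count = sum(parent is None for parent in parents.values())
print(tree_depth, root_count)  # 2 1
```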
```python # simplified stats = tf_tree.stats() print(f"Frames: {stats['total_frames']}/{stats['max_frames']}") print(f"Tree depth: {stats['tree_depth']}, Roots: {stats['root_count']}") ``` #### `frame_info(name)` Return Value The `frame_info(name)` method returns a dictionary with metadata for a single frame: | Key | Type | Description | |-----|------|-------------| | `name` | `str` | Frame name | | `id` | `int` | Internal frame ID | | `parent` | `str` or `None` | Parent frame name (`None` for root) | | `is_static` | `bool` | Whether this frame never changes | ```python # simplified info = tf_tree.frame_info("camera") print(f"Frame: {info['name']}, Parent: {info['parent']}, Static: {info['is_static']}") ``` ### Advanced Usage ```python # simplified # Static frames: set once, never change tf_tree.register_static_frame("lidar", Transform.from_translation([0.0, 0.0, 0.3]), parent="base_link") # Transform points directly between frames world_point = tf_tree.transform_point("lidar", "world", [5.0, 0.0, 0.0]) # Async wait: the call returns a Future immediately import concurrent.futures future = tf_tree.wait_for_transform_async("lidar", "world", timeout_sec=2.0) tf = future.result() # Blocks until ready # Check staleness before using data if tf_tree.is_stale("base_link", max_age_sec=0.1): print("Odometry data is stale!") # Diagnostics print(tf_tree.format_tree()) # Visual tree print(tf_tree.stats()) # {"total_frames": 5, "max_frames": 1024, ...} tf_tree.validate() # Checks tree integrity ``` --- ## TransformFrameConfig Configuration for the frame tree.
```python # simplified from horus import TransformFrameConfig # Custom config = TransformFrameConfig(max_frames=512, history_len=16) # Presets config = TransformFrameConfig.small() # 256 frames, ~550KB config = TransformFrameConfig.medium() # 1024 frames, ~2.2MB config = TransformFrameConfig.large() # 4096 frames, ~9MB config = TransformFrameConfig.massive() # 16384 frames, ~35MB # Check memory usage print(config.max_frames) # 512 print(config.history_len) # 16 print(config.memory_estimate()) # "~1.1MB" ``` --- ## Complete Example ```python # simplified from horus import Node, Scheduler, TransformFrame, Transform import math # Global transform tree tf_tree = TransformFrame.medium() def setup_frames(node): tf_tree.register_frame("base_link", "world") tf_tree.register_frame("lidar", "base_link") tf_tree.register_frame("camera", "base_link") # Static transform: lidar is 30cm above base tf_tree.update_transform("lidar", Transform.from_translation([0.0, 0.0, 0.3])) # Static transform: camera is 10cm forward, 15cm up tf_tree.update_transform("camera", Transform.from_translation([0.1, 0.0, 0.15])) tick_count = 0 def odometry_tick(node): global tick_count tick_count += 1 # Simulate robot moving in a circle t = tick_count * 0.01 x = math.cos(t) * 2.0 y = math.sin(t) * 2.0 yaw = t + math.pi / 2 # Update base_link in world tf_tree.update_transform("base_link", Transform.from_euler([x, y, 0.0], [0.0, 0.0, yaw])) def perception_tick(node): # Transform a lidar point into world coordinates try: lidar_to_world = tf_tree.tf("lidar", "world") point_in_world = lidar_to_world.transform_point([5.0, 0.0, 0.0]) node.log_info(f"Obstacle at world: {point_in_world}") except Exception: pass # Frame not yet available odom = Node(name="odom", tick=odometry_tick, init=setup_frames, rate=100, order=0) percept = Node(name="perception", tick=perception_tick, rate=10, order=1) scheduler = Scheduler() scheduler.add(odom) scheduler.add(percept) scheduler.run(duration=10) ``` ## Utility ```python # 
simplified from horus import get_timestamp_ns # Get current time in nanoseconds (same clock as Rust) now = get_timestamp_ns() ``` ## Error Handling | Operation | Exception | When | |-----------|-----------|------| | `tf_tree.tf("missing", "world")` | `HorusNotFoundError` | Frame not registered | | `tf_tree.tf("lidar", "disconnected_tree")` | `HorusTransformError` | No path between frames | | `tf_tree.wait_for_transform(..., timeout_sec=1.0)` | `HorusTimeoutError` | Transform not available within timeout | | `tf_tree.unregister_frame("missing")` | `HorusNotFoundError` | Frame doesn't exist | | `tf_tree.register_frame("child", "missing_parent")` | `HorusNotFoundError` | Parent frame not registered | | `tf_tree.tf_at_strict(src, dst, stale_ts)` | `HorusTransformError` | No exact timestamp match | | `Transform.from_matrix(bad_matrix)` | `ValueError` | Not a valid 4x4 matrix | ```python # simplified from horus import HorusNotFoundError, HorusTransformError, HorusTimeoutError # Safe transform lookup try: tf = tf_tree.tf("lidar", "world") point_in_world = tf.transform_point([5.0, 0.0, 0.0]) except HorusNotFoundError: pass # Frame not yet registered — skip this tick except HorusTransformError as e: print(f"Transform failed: {e}") # Stale data or disconnected tree ``` --- ## Design Decisions **Why SLERP for rotation interpolation?** Linear interpolation of quaternions produces non-unit quaternions (invalid rotations). SLERP (Spherical Linear Interpolation) follows the shortest arc on the unit sphere, producing valid rotations at every interpolation step. This matters because transform lookups between timestamps use interpolation — incorrect interpolation means incorrect robot pose, which compounds through the frame tree. **Why bounded transform history instead of unlimited?** Each frame stores the last N transforms (configurable via `history_len`). Unlimited history would leak memory in long-running robots. 
Bounded history means old transforms are discarded — if you query a timestamp older than the history window, you get a `HorusTransformError`. The default history length (64) covers about 0.64 s at a 100 Hz update rate (roughly 2 s at 30 Hz), which is sufficient for sensor fusion while keeping memory use bounded. **Why separate static and dynamic frames?** Static frames (sensor mounts, fixed offsets) never change — storing history for them wastes memory and adds lookup overhead. `register_static_frame()` stores exactly one transform and skips interpolation. Dynamic frames (robot base, moving joints) need timestamped history for interpolation. Separating them lets the system optimize each case: static lookups are O(1) with no interpolation, dynamic lookups use binary search + SLERP. **Why lock-free shared memory for the frame tree?** Multiple nodes read transforms concurrently (perception, control, visualization). A mutex-protected tree would serialize all readers, creating a bottleneck. The lock-free implementation uses atomic operations so readers never block each other or the writer. The cost is slightly more complex update logic, but the benefit is zero contention in multi-node systems. --- ## See Also - [TransformFrame Concepts](/concepts/transform-frame) — Architecture and design - [Python Bindings](/python/api/python-bindings) — Core Python API - [Python Geometry Messages](/python/messages/geometry) — TransformStamped type --- ## Perception Types Path: /python/api/perception Description: Python perception types for object detection, tracking, pose estimation, and point cloud processing # Python Perception Types Types for computer vision pipelines — object detection, tracking, pose estimation, and point cloud processing.
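Detection filtering and track matching in these pipelines commonly rely on Intersection over Union (IoU), which `BoundingBox2D.iou(other)` exposes. As a minimal sketch of the underlying arithmetic, the function below computes IoU for plain `(x, y, width, height)` tuples, assuming `(x, y)` is the top-left corner. `iou` here is a standalone illustration, not the HORUS implementation.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]

    # Overlap along each axis, clamped to zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 10, 10)))    # 25 / 175, about 0.143
print(iou((0, 0, 10, 10), (20, 20, 10, 10)))  # 0.0 (no overlap)
```

An IoU threshold (0.5 is a common choice) is typically used to decide whether two boxes refer to the same object.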
**Quick example** — publish YOLO detection results: ```python # simplified import horus from horus import Image, Detection, DetectionList, BoundingBox2D, Topic # load_yolo is a placeholder for your own model loader model = load_yolo("model.pt") sub_image = Topic(Image, "camera.rgb") pub_detections = Topic(DetectionList, "detections") def detector_tick(node): img = sub_image.recv() if img is not None: results = model.predict(img.to_numpy()) detections = DetectionList() for r in results: det = Detection.from_bbox( BoundingBox2D( x=r.x, y=r.y, width=r.w, height=r.h, ), class_name=r.class_name, confidence=r.score, ) detections.append(det) pub_detections.send(detections) detector = horus.Node(name="detector", tick=detector_tick, rate=30, subs=["camera.rgb"], pubs=["detections"]) ``` All perception types are importable from `horus`: ```python # simplified from horus import ( BoundingBox2D, Detection, DetectionList, PointXYZ, PointXYZRGB, PointCloudBuffer, Landmark, Landmark3D, LandmarkArray, TrackedObject, COCOPose, ) ``` --- ## BoundingBox2D 2D bounding box in pixel coordinates. ```python # simplified bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0) bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0) ``` | Property / Method | Returns | Description | |-------------------|---------|-------------| | `.x`, `.y`, `.width`, `.height` | `float` | Box coordinates | | `.center_x()`, `.center_y()` | `float` | Center point | | `.area()` | `float` | Area in pixels² | | `.iou(other)` | `float` | Intersection over Union | | `.as_tuple()` | `(x, y, w, h)` | XYWH format | | `.as_xyxy()` | `(x1, y1, x2, y2)` | Corner format | --- ## Detection 2D object detection result.
```python # simplified det = Detection(class_name="person", confidence=0.95, x=10.0, y=20.0, width=100.0, height=200.0) det = Detection.from_bbox(bbox, class_name="car", confidence=0.87) ``` | Property / Method | Returns | Description | |-------------------|---------|-------------| | `.bbox` | `BoundingBox2D` | Bounding box | | `.confidence` | `float` | Detection confidence (0-1) | | `.class_id` | `int` | Numeric class identifier | | `.class_name` | `str` | Class label string | | `.instance_id` | `int` | Instance tracking ID | | `.is_confident(threshold)` | `bool` | Check if above threshold | | `.to_bytes()` / `.from_bytes(data)` | `bytes` / `Detection` | Serialization | --- ## DetectionList Filterable collection of detections with iteration support. ```python # simplified detections = DetectionList() detections.append(Detection("person", 0.95, 10, 20, 100, 200)) detections.append(Detection("car", 0.72, 300, 150, 80, 60)) detections.append(Detection("person", 0.45, 500, 100, 90, 180)) # Filter by confidence confident = detections.filter_confidence(0.7) # 2 detections # Filter by class people = detections.filter_class("person") # 2 detections # Iterate for det in detections: print(f"{det.class_name}: {det.confidence:.2f}") # Index access first = detections[0] count = len(detections) # Convert to dicts (for JSON/logging) dicts = detections.to_dicts() # Serialization data = detections.to_bytes() restored = DetectionList.from_bytes(data) ``` | Method | Returns | Description | |--------|---------|-------------| | `.append(det)` | — | Add a detection | | `.filter_confidence(threshold)` | `DetectionList` | Keep detections above threshold | | `.filter_class(name)` | `DetectionList` | Keep only matching class | | `.to_dicts()` | `list[dict]` | Convert to list of Python dicts | | `.to_bytes()` / `.from_bytes(data)` | `bytes` / `DetectionList` | Serialization | | `len(detections)` | `int` | Number of detections | | `detections[i]` | `Detection` | Index access | | `for det in 
detections` | — | Iteration | --- ## PointXYZ / PointXYZRGB Individual 3D point types. ```python # simplified point = PointXYZ(x=1.0, y=2.0, z=3.0) print(point.distance()) # Distance from origin print(point.distance_to(other_point)) # Distance between points np_arr = point.to_numpy() # [1.0, 2.0, 3.0] colored = PointXYZRGB(x=1.0, y=2.0, z=3.0, r=255, g=0, b=0) print(colored.rgb()) # (255, 0, 0) print(colored.xyz()) # PointXYZ(1.0, 2.0, 3.0) ``` --- ## PointCloudBuffer Mutable point cloud buffer for building point clouds incrementally. ```python # simplified buffer = PointCloudBuffer(capacity=10000, frame_id="lidar_front") # Add points one at a time buffer.add_point(1.0, 2.0, 3.0) buffer.add_point(4.0, 5.0, 6.0) # From NumPy — shape (N, 3) buffer = PointCloudBuffer.from_numpy(np_points, frame_id="lidar") # Access point = buffer[0] # PointXYZ count = len(buffer) np_arr = buffer.to_numpy() # Shape (N, 3) data = buffer.to_bytes() # Serialization ``` --- ## TrackedObject Object with tracking state (for multi-object tracking pipelines). 
```python # simplified tracked = TrackedObject( track_id=42, bbox=BoundingBox2D(10, 20, 100, 200), class_name="person", confidence=0.95, ) ``` | Property / Method | Returns | Description | |-------------------|---------|-------------| | `.track_id` | `int` | Unique track identifier | | `.bbox` | `BoundingBox2D` | Current bounding box | | `.confidence` | `float` | Detection confidence | | `.class_id` / `.class_name` | `int` / `str` | Class info | | `.velocity` | `(float, float)` | Estimated (vx, vy) in pixels/frame | | `.speed()` | `float` | Speed magnitude | | `.age` | `int` | Frames since creation | | `.hits` | `int` | Successful detections | | `.is_tentative()` | `bool` | Not yet confirmed | | `.is_confirmed()` | `bool` | Track is confirmed | | `.is_deleted()` | `bool` | Track marked for deletion | | `.update(bbox, confidence)` | — | Update with new detection | | `.mark_missed()` | — | No detection this frame | | `.confirm()` / `.delete()` | — | State transitions | ### Tracking Pipeline Example ```python # simplified from horus import DetectionList, TrackedObject, Topic det_topic = Topic(DetectionList, "detections") tracks: dict[int, TrackedObject] = {} def tracker_tick(node): detections = det_topic.recv() if not detections: return # Simple nearest-neighbor matching; match_detections is a user-supplied helper returning {track_id: Detection} matched = match_detections(tracks, detections) for track_id, det in matched.items(): tracks[track_id].update(det.bbox, det.confidence) # Tracks that received no detection this frame unmatched_tracks = set(tracks) - set(matched) for track_id in unmatched_tracks: tracks[track_id].mark_missed() if tracks[track_id].age > 30: tracks[track_id].delete() ``` --- ## COCOPose Constants for COCO 17-keypoint pose estimation.
```python # simplified from horus import COCOPose # Keypoint indices COCOPose.NOSE # 0 COCOPose.LEFT_EYE # 1 COCOPose.RIGHT_EYE # 2 COCOPose.LEFT_EAR # 3 COCOPose.RIGHT_EAR # 4 COCOPose.LEFT_SHOULDER # 5 COCOPose.RIGHT_SHOULDER # 6 COCOPose.LEFT_ELBOW # 7 COCOPose.RIGHT_ELBOW # 8 COCOPose.LEFT_WRIST # 9 COCOPose.RIGHT_WRIST # 10 COCOPose.LEFT_HIP # 11 COCOPose.RIGHT_HIP # 12 COCOPose.LEFT_KNEE # 13 COCOPose.RIGHT_KNEE # 14 COCOPose.LEFT_ANKLE # 15 COCOPose.RIGHT_ANKLE # 16 COCOPose.NUM_KEYPOINTS # 17 ``` ### Pose Estimation Example ```python # simplified from horus import Landmark, LandmarkArray, COCOPose # Create landmarks from pose model output landmarks = LandmarkArray(num_landmarks=17, dimension=2) landmarks.confidence = 0.92 # Access specific keypoints nose = Landmark(x=320.0, y=200.0, visibility=0.99, index=COCOPose.NOSE) left_wrist = Landmark(x=450.0, y=380.0, visibility=0.85, index=COCOPose.LEFT_WRIST) if nose.is_visible(0.5) and left_wrist.is_visible(0.5): dist = nose.distance_to(left_wrist) print(f"Nose to wrist: {dist:.1f}px") ``` --- ## Error Handling Perception types raise standard Python exceptions: | Operation | Exception | When | |-----------|-----------|------| | `Detection(confidence=-1)` | `ValueError` | Confidence outside 0.0–1.0 | | `BoundingBox2D(width=-5)` | `ValueError` | Negative dimensions | | `detections[99]` on 3-item list | `IndexError` | Index out of range | | `TrackedObject.update(None, 0.5)` | `TypeError` | Wrong argument type | | `DetectionList.from_bytes(bad_data)` | `ValueError` | Corrupt or incompatible bytes | | `PointCloudBuffer(capacity=0)` | `ValueError` | Zero capacity | ```python # simplified # Safe detection pipeline try: detections = DetectionList.from_bytes(raw_data) confident = detections.filter_confidence(0.7) except ValueError as e: print(f"Bad detection data: {e}") confident = DetectionList() # Empty fallback ``` --- ## Design Decisions **Why is `Detection` immutable but `DetectionList` is mutable?** A detection is a 
single observation from a model — changing its confidence or bounding box after creation would make debugging impossible ("where did this value come from?"). The list, however, needs filtering, appending, and iteration as detections flow through the pipeline. Immutable atoms in a mutable container is a common pattern in data processing. **Why `DetectionList` instead of a plain Python `list`?** `DetectionList` provides domain-specific operations (`filter_confidence`, `filter_class`, `to_bytes`) and efficient serialization for IPC. A plain list would require manual filtering and custom serialization code in every node. The wrapper keeps pipeline code concise: `detections.filter_confidence(0.7).filter_class("person")`. **Why does `TrackedObject` have explicit state transitions (`confirm`, `delete`, `mark_missed`)?** Object tracking requires lifecycle management — a detection must be seen multiple times before it's trusted ("confirmed"), and must be missing for several frames before it's removed ("deleted"). Explicit state methods make the lifecycle visible in code and prevent silent state corruption from ad-hoc flag setting. The states follow the standard SORT/DeepSORT tracking pattern. **Why separate `PointXYZ` and `PointCloudBuffer`?** `PointXYZ` is a value type for individual point operations (distance, transform). `PointCloudBuffer` is a container optimized for bulk operations (NumPy conversion, serialization). Trying to make one type serve both roles would compromise the API — individual point access would be slow on bulk containers, and bulk operations would be awkward on individual points. 
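The tentative, confirmed, deleted lifecycle discussed for `TrackedObject` can be sketched in plain Python. This is an illustration of the general SORT-style rule, not the HORUS implementation: `SketchTrack`, `min_hits`, and `max_misses` are hypothetical names and thresholds chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class TrackState(Enum):
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"
    DELETED = "deleted"


@dataclass
class SketchTrack:
    """Illustrative track lifecycle; not the HORUS TrackedObject."""
    track_id: int
    min_hits: int = 3      # hypothetical: hits needed before confirming
    max_misses: int = 5    # hypothetical: consecutive misses before deletion
    hits: int = 0
    misses: int = 0
    state: TrackState = TrackState.TENTATIVE

    def update(self) -> None:
        """A detection matched this track on the current frame."""
        if self.state is TrackState.DELETED:
            return
        self.hits += 1
        self.misses = 0
        if self.state is TrackState.TENTATIVE and self.hits >= self.min_hits:
            self.state = TrackState.CONFIRMED

    def mark_missed(self) -> None:
        """No detection matched this track on the current frame."""
        if self.state is TrackState.DELETED:
            return
        self.misses += 1
        if self.misses >= self.max_misses:
            self.state = TrackState.DELETED


track = SketchTrack(track_id=42)
for _ in range(3):
    track.update()
print(track.state)  # TrackState.CONFIRMED

for _ in range(5):
    track.mark_missed()
print(track.state)  # TrackState.DELETED
```

Routing every transition through `update()` and `mark_missed()` keeps the counters and the state in step, which is the benefit the explicit-transition design is aiming for.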
--- ## See Also - [Python Message Library](/python/library/python-message-library) — All 55+ message types - [Image](/python/api/image), [PointCloud](/python/api/pointcloud), [DepthImage](/python/api/depth-image) - [ML Utilities](/python/library/ml-utilities) — ML framework integration --- ## Python PointCloud Path: /python/api/pointcloud Description: Zero-copy 3D point clouds backed by shared memory with NumPy, PyTorch, and JAX interop # Python PointCloud A 3D point cloud backed by shared memory for zero-copy inter-process communication. Supports XYZ, XYZI (intensity), and XYZRGB (color) point formats with direct conversion to NumPy, PyTorch, and JAX. ## When to Use Use `PointCloud` when your robot has a LiDAR, depth camera, or stereo vision system producing 3D point data. A 100K-point cloud transfers between nodes in microseconds via shared memory — no serialization. **ROS2 equivalent:** `sensor_msgs/PointCloud2` — same concept, but HORUS uses shared memory pools instead of serialized byte arrays. 
## Constructor ```python # simplified from horus import PointCloud # PointCloud(num_points, fields, dtype) cloud = PointCloud(num_points=10000, fields=3, dtype="float32") ``` **Parameters:** - `num_points: int` — Number of points to allocate - `fields: int` — Floats per point: `3` = XYZ, `4` = XYZI, `6` = XYZRGB (default: `3`) - `dtype: str` — Data type (default: `"float32"`) ### Factory Methods ```python # simplified # From NumPy — shape (N, F) where F = fields per point import numpy as np points = np.random.randn(10000, 3).astype(np.float32) cloud = PointCloud.from_numpy(points) # From PyTorch tensor import torch tensor = torch.randn(10000, 3) cloud = PointCloud.from_torch(tensor) ``` | Factory | Parameters | Use case | |---------|-----------|----------| | `PointCloud(n, f, dtype)` | count, fields, dtype | Pre-allocate for filling | | `PointCloud.from_numpy(arr)` | ndarray shape (N, F) | LiDAR driver output | | `PointCloud.from_torch(tensor)` | Tensor shape (N, F) | ML model output | ## Properties | Property | Type | Description | |----------|------|-------------| | `point_count` | `int` | Number of points | | `fields_per_point` | `int` | Floats per point (3, 4, or 6) | | `dtype` | `str` | Data type string (e.g., `"float32"`) | | `nbytes` | `int` | Total data size in bytes | | `frame_id` | `str` | Sensor coordinate frame (e.g., `"lidar_front"`) | | `timestamp_ns` | `int` | Timestamp in nanoseconds | ### Point Format Queries ```python # simplified cloud.is_xyz() # True if 3 fields (XYZ only) cloud.has_intensity() # True if 4+ fields (XYZI) cloud.has_color() # True if 6+ fields (XYZRGB) ``` ## Methods ### Point Access ```python # simplified # Get i-th point as list of floats point = cloud.point_at(0) # e.g., [1.0, 2.0, 3.0] ``` ### Framework Conversions ```python # simplified # To NumPy — zero-copy, shape (N, F) np_points = cloud.to_numpy() # To PyTorch — zero-copy via DLPack torch_points = cloud.to_torch() # To JAX — zero-copy via DLPack jax_points = 
cloud.to_jax() ``` ### Metadata ```python # simplified cloud.set_frame_id("lidar_front") cloud.set_timestamp_ns(horus.timestamp_ns()) ``` ## Complete Example ```python # simplified import horus from horus import PointCloud, Topic import numpy as np scan_topic = Topic(PointCloud) def lidar_tick(node): # Simulate LiDAR scan (10000 XYZ points) points = np.random.randn(10000, 3).astype(np.float32) cloud = PointCloud.from_numpy(points) cloud.set_frame_id("lidar_front") scan_topic.send(cloud) def obstacle_tick(node): cloud = scan_topic.recv() if cloud: pts = cloud.to_numpy() # Zero-copy (N, 3) # Find points within 2m distances = np.linalg.norm(pts, axis=1) nearby = np.sum(distances < 2.0) if nearby > 100: node.log_warning(f"{nearby} points within 2m!") lidar = horus.Node(name="lidar", tick=lidar_tick, rate=10, order=0, pubs=["scan"]) detector = horus.Node(name="obstacle", tick=obstacle_tick, rate=10, order=1, subs=["scan"]) horus.run(lidar, detector) ``` ## Design Decisions **Why `fields_per_point` instead of named fields?** Point clouds in robotics use a small set of layouts: XYZ (3), XYZI (4), XYZRGB (6). A flat float array with a known field count is the fastest representation — no per-point struct overhead, direct memcpy to GPU, and trivial NumPy reshaping. Named fields would add indirection and prevent zero-copy to ML frameworks. **Why pool-backed?** Same as Image — shared memory pools enable zero-copy IPC. A 100K-point XYZ cloud is 1.2 MB. Serializing it takes milliseconds; sharing a 64-byte descriptor takes microseconds. 
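The 1.2 MB figure follows from the flat float32 layout: points times fields times 4 bytes. A small sketch of that arithmetic for the three layouts, where `cloud_nbytes` is a hypothetical helper written for this example and not a HORUS function:

```python
BYTES_PER_FLOAT32 = 4
DESCRIPTOR_BYTES = 64  # descriptor size quoted in the text above


def cloud_nbytes(num_points: int, fields: int) -> int:
    """Payload size of a flat float32 point cloud."""
    return num_points * fields * BYTES_PER_FLOAT32


for name, fields in (("XYZ", 3), ("XYZI", 4), ("XYZRGB", 6)):
    size = cloud_nbytes(100_000, fields)
    ratio = size // DESCRIPTOR_BYTES
    print(f"{name}: {size:,} bytes ({size / 1e6:.1f} MB), {ratio:,}x the descriptor")
```

For the XYZ case this gives 1,200,000 bytes, so passing the descriptor moves roughly 18,750 times less data than copying the payload.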
--- ## See Also - [Image (Python)](/python/api/image) — Camera images - [DepthImage (Python)](/python/api/depth-image) — Depth maps - [PointCloudBuffer](/python/api/perception) — Incremental point cloud building - [LiDAR Obstacle Avoidance](/recipes/lidar-obstacle-avoidance) — Recipe using PointCloud --- ## Python DepthImage Path: /python/api/depth-image Description: Zero-copy depth images backed by shared memory with F32 meters and U16 millimeters support # Python DepthImage A depth image backed by shared memory for zero-copy inter-process communication. Supports F32 (meters) and U16 (millimeters) formats. Use for stereo cameras, structured light sensors (RealSense, Kinect), and ToF cameras. ## When to Use Use `DepthImage` when you need per-pixel depth values with typed access (`get_depth()` returns meters, `depth_statistics()` gives min/max/mean). For raw 16-bit depth transport without typed access, use `Image` with `"depth16"` encoding instead. **ROS2 equivalent:** `sensor_msgs/Image` with `encoding=32FC1` — but HORUS adds typed depth access methods and statistics. ## Constructor ```python # simplified from horus import DepthImage # F32 depth image (meters) — most common depth = DepthImage(height=480, width=640, dtype="float32") # U16 depth image (millimeters) — RealSense raw output depth_u16 = DepthImage(height=480, width=640, dtype="uint16") ``` **Parameters:** - `height: int` — Image height in pixels - `width: int` — Image width in pixels - `dtype: str` — `"float32"` (meters) or `"uint16"` (millimeters). 
Default: `"float32"` ### Factory Methods ```python # simplified # From NumPy — shape (H, W), dtype auto-detected import numpy as np depth_data = np.random.uniform(0.5, 5.0, (480, 640)).astype(np.float32) depth = DepthImage.from_numpy(depth_data) # From PyTorch tensor import torch depth = DepthImage.from_torch(torch.randn(480, 640)) ``` ## Properties | Property | Type | Description | |----------|------|-------------| | `height` | `int` | Image height | | `width` | `int` | Image width | | `dtype` | `str` | `"float32"` or `"uint16"` | | `nbytes` | `int` | Total data size in bytes | | `frame_id` | `str` | Camera coordinate frame | | `timestamp_ns` | `int` | Timestamp in nanoseconds | | `depth_scale` | `float` | Scale factor (1.0 for meters, 0.001 for mm→m) | ### Format Queries ```python # simplified depth.is_meters() # True if F32 (float32) depth.is_millimeters() # True if U16 (uint16) ``` ## Methods ### Depth Access ```python # simplified # Get depth at pixel — always returns meters as float d = depth.get_depth(320, 240) print(f"Center depth: {d:.3f}m") # Set depth at pixel — value in meters depth.set_depth(100, 100, 1.5) # Statistics (min, max, mean) — None if all pixels are invalid/zero stats = depth.depth_statistics() if stats: min_d, max_d, mean_d = stats print(f"Range: {min_d:.2f}–{max_d:.2f}m, mean: {mean_d:.2f}m") ``` | Method | Signature | Description | |--------|-----------|-------------| | `get_depth(x, y)` | `(int, int) -> float` | Depth in meters at pixel | | `set_depth(x, y, val)` | `(int, int, float) -> None` | Set depth in meters | | `depth_statistics()` | `() -> Optional[tuple[float, float, float]]` | (min, max, mean) in meters | ### Framework Conversions ```python # simplified # To NumPy — zero-copy, shape (H, W) np_depth = depth.to_numpy() # To PyTorch — zero-copy via DLPack torch_depth = depth.to_torch() # To JAX — zero-copy via DLPack jax_depth = depth.to_jax() ``` ### Metadata ```python # simplified depth.set_frame_id("depth_camera") 
depth.set_timestamp_ns(horus.timestamp_ns()) ``` ## Complete Example ```python # simplified import horus from horus import DepthImage, Topic import numpy as np depth_topic = Topic(DepthImage) def depth_camera_tick(node): # Simulate depth camera (0.3m–10m range) raw = np.random.uniform(0.3, 10.0, (480, 640)).astype(np.float32) depth = DepthImage.from_numpy(raw) depth.set_frame_id("realsense_depth") depth_topic.send(depth) def safety_tick(node): depth = depth_topic.recv() if depth: stats = depth.depth_statistics() if stats: min_d, _, _ = stats if min_d < 0.5: node.log_warning(f"Object at {min_d:.2f}m — too close!") # Check specific region (center pixel) center_d = depth.get_depth(320, 240) if center_d < 1.0: node.log_error(f"Collision risk: {center_d:.2f}m ahead") camera = horus.Node(name="depth_cam", tick=depth_camera_tick, rate=30, order=0, pubs=["depth"]) safety = horus.Node(name="safety", tick=safety_tick, rate=30, order=1, subs=["depth"]) horus.run(camera, safety) ``` ## Design Decisions **Why separate DepthImage instead of `Image` with depth encoding?** `Image` with `"depth16"` encoding gives you raw 16-bit values with no unit semantics. `DepthImage` adds typed depth access: `get_depth()` always returns meters (auto-converting from mm for U16), and `depth_statistics()` computes min/max/mean. Use `Image` for transport, `DepthImage` for processing. **Why F32 and U16 but not F64 or U32?** F32 (meters) gives 7 decimal digits of precision — sub-millimeter accuracy up to 1 km. More than any depth sensor provides. U16 (millimeters) covers 0–65.5m range, matching RealSense/Kinect native output. F64 and U32 would double memory usage with no practical benefit. 
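The range and precision claims can be checked with the standard library alone. The sketch below computes the spacing between adjacent float32 values at a given depth and the maximum range of a millimeter-scaled uint16; `float32_ulp` is a helper written for this example and is valid for positive finite inputs only.

```python
import struct


def float32_ulp(value: float) -> float:
    """Gap between value (rounded to float32) and the next float32 above it."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


U16_MAX = 65_535
MM_TO_M = 0.001

print(f"U16 max range: {U16_MAX * MM_TO_M:.3f} m")
for depth_m in (1.0, 10.0, 1000.0):
    step_mm = float32_ulp(depth_m) * 1000.0
    print(f"float32 step at {depth_m} m: {step_mm:.6f} mm")
```

At 1000 m the float32 step is 2^-14 m, about 0.06 mm, which is where the sub-millimeter claim comes from.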
--- ## See Also - [Image (Python)](/python/api/image) — Camera images - [PointCloud (Python)](/python/api/pointcloud) — 3D point clouds - [Image (Stdlib)](/stdlib/messages/image) — Image message type overview --- ## Python API Path: /python/api Description: Complete Python API reference for HORUS — Node, Scheduler, Topics, Clock, Drivers, and Messages # Python API You have a Python ML model running inference at 30Hz. You need it to receive camera images from a Rust sensor driver and publish detections to a Rust planner — all through shared memory at microsecond latency. This page is the complete reference for doing that. > **Two layers**: `_horus` (Rust PyO3 internals) and `horus` (Python wrapper you import). This page documents the `horus` layer — the user-facing API. ```python # simplified import horus ``` --- ## Quick Reference | API | Description | Page | |-----|-------------|------| | `horus.Node(...)` | Computation unit — tick, init, shutdown via kwargs | [Node API](/python/api/node) | | `horus.Scheduler(...)` | Node orchestrator — tick rate, RT, watchdog, recording | [Scheduler API](/python/api/scheduler) | | `horus.run(*nodes)` | One-liner: create scheduler, add nodes, run | [Scheduler API](/python/api/scheduler#run-convenience-function) | | `horus.Topic(type)` | Standalone pub/sub (outside node lifecycle) | [Topic API](/python/api/topic) | | `horus.now()` / `horus.dt()` | Framework clock (wall clock or SimClock) | [Clock API](/python/api/clock) | | `horus.CmdVel`, `horus.Imu`, ... 
| 75+ standard robotics message types | [Messages](/python/api/messages) | | `horus.Image`, `horus.PointCloud` | Zero-copy domain types (NumPy/DLPack) | [Image](/python/api/image), [PointCloud](/python/api/pointcloud) | | `horus.drivers.load()` | Load hardware drivers from `horus.toml` | [Drivers](/python/api/drivers) | | `horus.TransformFrame` | Coordinate frame tree | [TransformFrame](/python/api/transform-frame) | | `horus.Params()` | Runtime parameter store | [Rate & Params](/python/api/rate-params) | | `horus.Rate(hz)` | Drift-compensated rate limiter | [Rate & Params](/python/api/rate-params) | --- ## Node The primary building block — configure with kwargs, no inheritance needed. Constructor, methods, topic specs, lifecycle callbacks, and examples. **[Node API Reference →](/python/api/node)** --- ## Scheduler Orchestrates node execution — tick-rate control, RT scheduling, watchdogs, recording, deterministic mode. 25+ methods across lifecycle, introspection, safety, and recording. Includes `run()` one-liner. **[Scheduler API Reference →](/python/api/scheduler)** --- ## Clock Framework clock — `now()`, `dt()`, `elapsed()`, `tick()`, `budget_remaining()`, `rng_float()`. Wall clock vs SimClock behavior. Unit constants `horus.us`, `horus.ms`. **[Clock API Reference →](/python/api/clock)** --- ## Topic Standalone pub/sub for scripts, tests, and tools. Typed topics (~1.7μs, cross-language) vs GenericMessage (~6-50μs, Python-only). Performance comparison and cross-language compatibility rules. **[Topic API Reference →](/python/api/topic)** --- ## Drivers Load hardware node configs from `horus.toml`: ```python # simplified entries = horus.hardware.load() # From horus.toml [hardware] entries = horus.hardware.load_from("path") # From custom path horus.hardware.register_driver("name", MyClass) # Register Python node class ``` Returns `list[(name, obj)]` — registered classes are instantiated, others returned as `NodeParams`. 
**NodeParams**: `get(key)`, `get_or(key, default)`, `has(key)`, `keys()`, `params[key]` --- ## Error Types | Exception | When | |-----------|------| | `horus.HorusNotFoundError` | Topic, node, or resource not found | | `horus.HorusTransformError` | Transform lookup fails (frame not in tree) | | `horus.HorusTimeoutError` | Operation timed out | --- ## Supporting Types See [Node API](/python/api/node#supporting-types) for NodeInfo, NodeState, and Miss. --- ## Mixed Rust + Python Projects The most common production pattern: Rust for control loops, Python for ML inference. Both communicate via shared memory topics. ### Project Structure ``` my-robot/ ├── horus.toml # Shared config ├── src/ │ ├── sensor.rs # Rust: hardware driver at 1kHz │ └── detector.py # Python: YOLO inference at 30Hz └── launch.yaml # Launch both together ``` ### Which Types Cross the Boundary? | Type | Rust → Python | Python → Rust | Transport | |------|:---:|:---:|-----------| | Typed messages (`CmdVel`, `Imu`, `LaserScan`, etc.) | Yes | Yes | Zero-copy Pod (~1.7μs) | | `Image`, `PointCloud`, `DepthImage` | Yes | Yes | Pool-backed descriptor | | Python dicts (GenericMessage) | No | No | Python-only (MessagePack) | | Custom `message!` types | Yes | Yes | If same `#[repr(C)]` layout | **Rule**: Use typed message classes for cross-language topics. Python dicts only work between Python nodes. 
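One way to see why typed messages cross the language boundary while dicts do not: a typed message has a fixed, C-compatible byte layout that any process can reinterpret, whereas a dict needs a Python-side serializer. The sketch below uses `ctypes` to show the idea. `CmdVelLayout` and its two float fields are an assumption made for illustration and may not match the real HORUS `CmdVel` definition.

```python
import ctypes


class CmdVelLayout(ctypes.Structure):
    """Hypothetical fixed-layout message: two 32-bit floats in C field order."""
    _fields_ = [
        ("linear", ctypes.c_float),
        ("angular", ctypes.c_float),
    ]


msg = CmdVelLayout(linear=0.5, angular=-0.25)

# The raw bytes are the whole message: no schema and no length prefix.
raw = bytes(msg)
print(ctypes.sizeof(CmdVelLayout), len(raw))  # 8 8

# Any process that agrees on the layout can rebuild the message from the bytes.
decoded = CmdVelLayout.from_buffer_copy(raw)
print(decoded.linear, decoded.angular)  # 0.5 -0.25
```

A Rust struct marked `#[repr(C)]` with the same two `f32` fields would read these 8 bytes identically, which is the condition the table attaches to custom `message!` types.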
### Launch File ```yaml # launch.yaml nodes: - name: sensor command: "horus run src/sensor.rs" rate_hz: 1000 - name: detector command: "horus run src/detector.py" rate_hz: 30 depends_on: [sensor] ``` ```bash horus launch launch.yaml ``` ### Debugging Cross-Language Issues ```bash # Verify both processes see the same topics horus topic list --verbose # Check message rate from Rust publisher horus topic hz camera.rgb # Watch messages flowing between languages horus topic echo camera.rgb ``` If a Python node can't receive from a Rust node: verify both use the **same typed message class** (e.g., `horus.Imu` in Python, `horus_library::messages::Imu` in Rust). String topics (GenericMessage) do not cross the boundary. --- ## Differences from Rust | Aspect | Python | Rust | |--------|--------|------| | Node definition | `horus.Node(tick=fn, ...)` kwargs | `impl Node for MyStruct` trait | | Topic creation | Auto-created from `pubs`/`subs` | Manual `Topic::new("name")` | | Async support | Auto-detected from `async def` | Explicit `.async_io()` builder | | Safety methods | Not available (`is_safe_state`, `enter_safe_state`) | Available on Node trait | | GIL | Acquired per tick, released during `run()` | N/A | | Performance | ~3μs overhead per tick (GIL acquire) | Zero overhead | | Error routing | Python exceptions → Rust `FailurePolicy` | Panics → `catch_unwind` → `FailurePolicy` | | Unit syntax | `300 * horus.us` | `300_u64.us()` (DurationExt) | --- ## Production Example: ML Inference Node ```python # simplified import horus import numpy as np # Assume a pre-loaded model model = load_model("yolov8n.pt") def detect_tick(node): img = node.recv("camera.rgb") if img is None: return # Convert to numpy (zero-copy via DLPack) frame = img.to_numpy() # Run inference detections = model.predict(frame) # Publish results for det in detections: node.send("detections", { "class": det.class_name, "confidence": float(det.confidence), "bbox": [det.x1, det.y1, det.x2, det.y2], "timestamp_ns": 
horus.timestamp_ns(), }) detector = horus.Node( name="yolo_detector", subs=[horus.Image], pubs=["detections"], tick=detect_tick, rate=30, compute=True, # Run on thread pool (CPU-bound inference) budget=30 * horus.ms, on_miss="skip", # Skip frame if inference takes too long ) horus.run(detector, tick_rate=100) ``` --- ## Design Decisions **Why a Python wrapper layer over PyO3?** The raw PyO3 bindings (`_horus`) expose Rust types directly, which have Rust-idiomatic APIs (builders, traits, enums). The Python wrapper (`horus`) translates these into Pythonic patterns: kwargs instead of builders, plain functions instead of trait methods, `None` instead of `Option`. This means the Python API can evolve independently of the Rust API. **Why kwargs, not class inheritance?** Class inheritance (`class MyNode(horus.Node): def tick(self):...`) requires boilerplate and doesn't work with plain functions or lambdas. The kwargs API (`horus.Node(tick=my_fn, rate=30)`) is more Pythonic, matches FastAPI/Click patterns, and all config happens in one call. **Why `recv()` returns data, not a callback?** Pull-based reception keeps timing deterministic — your tick controls when data is consumed. Push-based callbacks fire at unpredictable times, making budget compliance harder. This matches the Rust `try_recv()` pattern. **Why release the GIL during `run()`?** The scheduler's tick loop is Rust code. Releasing the GIL via `py.detach()` during `run()` lets other Python threads (e.g., a Flask telemetry server) run concurrently. The GIL is re-acquired only when calling Python tick/init/shutdown callbacks. **Why `has_msg()` uses peek buffering?** `has_msg()` internally calls `recv()` and buffers the result. The next `recv()` returns the buffered value. This avoids a separate "peek" API while keeping the common `if node.has_msg("x"): data = node.recv("x")` pattern zero-overhead. 
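The peek-buffering behavior described for `has_msg()` can be sketched with a one-slot buffer in front of a queue. This is a minimal illustration of the pattern, not the HORUS source: `PeekBufferedInbox`, `push`, and `_transport_recv` are hypothetical names, and a `deque` stands in for the shared memory transport.

```python
from collections import deque
from typing import Any, Optional


class PeekBufferedInbox:
    """Illustrative inbox: has_msg() pulls one message and holds it for recv()."""

    def __init__(self) -> None:
        self._queue: deque = deque()          # stands in for the transport
        self._peeked: Optional[Any] = None    # one-slot peek buffer

    def push(self, msg: Any) -> None:
        self._queue.append(msg)

    def _transport_recv(self) -> Optional[Any]:
        return self._queue.popleft() if self._queue else None

    def has_msg(self) -> bool:
        # Receive at most once; repeated checks reuse the buffered message.
        if self._peeked is None:
            self._peeked = self._transport_recv()
        return self._peeked is not None

    def recv(self) -> Optional[Any]:
        # Hand out the buffered message first so nothing is lost or reordered.
        if self._peeked is not None:
            msg, self._peeked = self._peeked, None
            return msg
        return self._transport_recv()


inbox = PeekBufferedInbox()
inbox.push({"value": 21})
if inbox.has_msg():
    data = inbox.recv()
    print(data["value"] * 2)  # 42
print(inbox.has_msg())  # False
```

The cost of this design is that `has_msg()` removes the message from the transport even if `recv()` is never called afterwards.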
---

## Trade-offs

| Choice | Benefit | Cost |
|--------|---------|------|
| Kwargs over inheritance | Concise, works with lambdas | No IDE auto-complete for tick signature |
| Auto-create topics from specs | Zero boilerplate for pubs/subs | Topics created even if never used |
| GenericMessage for string topics | Any Python dict/object works | ~6-50μs vs ~1.7μs for typed |
| GIL release during run() | Other threads run freely | ~3μs GIL re-acquire per tick |
| Peek buffering for has_msg() | Clean API, no separate peek | has_msg() consumes the message from the topic and holds it until the next recv() |
| No `is_safe_state`/`enter_safe_state` | Simpler Python API | Safety-critical nodes must use Rust |

---

## See Also

**Core API:**

- [Node API](/python/api/node) — Constructor, methods, topic specs, lifecycle
- [Scheduler API](/python/api/scheduler) — Orchestration, RT, recording, `run()`
- [Topic API](/python/api/topic) — Standalone pub/sub, typed vs generic
- [Clock API](/python/api/clock) — Time functions, deterministic mode

**Data Types:**

- [Standard Messages](/python/api/messages) — 75+ typed robotics messages
- [Image API](/python/api/image) — Zero-copy camera frames with NumPy/PyTorch
- [PointCloud API](/python/api/pointcloud) — Zero-copy 3D point clouds
- [TransformFrame](/python/api/transform-frame) — Coordinate frame management

**Advanced:**

- [Python Bindings Deep Dive](/python/api/python-bindings) — Full PyO3 binding reference
- [Async Nodes](/python/api/async-nodes) — Async I/O patterns
- [Custom Messages](/python/api/custom-messages) — Runtime and compiled messages
- [Getting Started (Python)](/getting-started/quick-start-python) — First Python application

---

## Python

Path: /python
Description: Python bindings, message library, and examples for HORUS

# Python

HORUS provides full Python bindings via PyO3, giving you the same zero-copy shared memory IPC and real-time scheduling in Python.
--- ## Quick Start ```python # simplified import horus def my_tick(node): if node.has_msg("sensor"): data = node.recv("sensor") node.send("output", {"processed": data["value"] * 2}) node = horus.Node("processor", subs=["sensor"], pubs=["output"], tick=my_tick, rate=100) horus.run(node) ``` --- ## API Reference | Page | Description | |------|-------------| | [Node API](/python/api/node) | Constructor kwargs, send/recv, topic specs, lifecycle | | [Scheduler API](/python/api/scheduler) | Orchestration, RT, recording, `run()` | | [Topic API](/python/api/topic) | Standalone pub/sub, typed vs generic | | [Clock API](/python/api/clock) | `now()`, `dt()`, deterministic mode | | [Standard Messages](/python/api/messages) | 75+ typed message types (sensor, control, geometry, ...) | | [Drivers](/python/api/drivers) | Hardware config loading from `horus.toml` | | [Error Types](/python/api/error-types) | Exception types and patterns | | [Image](/python/api/image), [PointCloud](/python/api/pointcloud) | Zero-copy domain types | | [TransformFrame](/python/api/transform-frame) | Coordinate frame tree | | [Async Nodes](/python/api/async-nodes) | Async I/O patterns | ## Guides - [Real-Time Systems](/python/real-time) — Budget, deadline, miss policies, GIL implications - [Node Lifecycle](/python/node-lifecycle) — Init, tick, shutdown deep dive - [Scheduler Deep-Dive](/python/scheduler-guide) — Advanced scheduling patterns - [Topics Deep-Dive](/python/topics-guide) — Topic patterns and best practices - [Message Library](/python/library/python-message-library) — 75+ typed message classes - [Examples](/python/examples) — Complete Python examples ## Design Decisions **Why Python bindings instead of Python-only?** The core runtime (scheduler, shared memory, transport) is Rust for deterministic real-time performance. Python bindings via PyO3 give Python users the same zero-copy IPC and scheduling guarantees. 
A pure Python runtime would add milliseconds of jitter to every tick and lose shared memory support. **When to use Python vs Rust:** Use Python for ML inference, prototyping, data visualization, and I/O-heavy tasks (HTTP, databases). Use Rust for real-time control loops, safety-critical nodes, and high-frequency sensor processing. Both languages share the same topics and messages via zero-copy IPC. --- ## See Also - [Quick Start (Python)](/getting-started/quick-start-python) — First Python application - [Choosing a Language](/getting-started/choosing-language) — Rust vs Python comparison - [Python API Overview](/python/api) — API landing page with cross-language guide - [Real Hardware Recipe](/recipes/real-hardware) — Complete I2C + serial examples with pip libraries ======================================== # SECTION: Development ======================================== --- ## Debugging Workflows Path: /development/debugging Description: Step-by-step debugging for deadline misses, panics, and performance bottlenecks # Debugging Workflows Your robot is misbehaving at runtime: motors stutter, nodes panic, or the system cannot keep up with its tick rate. Here are three step-by-step workflows to diagnose and fix the most common issues. ## When To Use This - A motor or actuator stutters or responds inconsistently (deadline misses) - A node crashes with a panic and you need to find the root cause - The system runs but cannot maintain its target tick rate (performance) **Use [Testing](/development/testing) instead** if you are writing tests before deployment, not debugging a live system. **Use [Monitor](/development/monitor) instead** if you want a visual dashboard for ongoing observation rather than targeted debugging. 
## Prerequisites

- A HORUS application that reproduces the problem
- Familiarity with [Nodes](/concepts/core-concepts-nodes) and [Execution Classes](/concepts/execution-classes)
- Access to a terminal on the robot (or SSH)

## Workflow 1: "My Motor Stutters"

Stuttering usually means deadline misses — the control loop is not completing within its budget.

### Step 1: Check Scheduler Output

Enable monitoring and look for deadline miss warnings in stderr. The scheduler prints a timing report on shutdown. Look for lines like:

```
[WARN] motor_ctrl: 12 deadline misses (worst: 2.3ms, budget: 1.0ms)
```

### Step 2: Profile Tick Timing

Use `profile()` to get percentile statistics:

```rust
// simplified
let report = scheduler.profile(5000)?;
println!("{report}");

// Check per-node budget utilization
for node in &report.nodes {
    if let Some(used) = node.budget_used {
        if used > 0.8 {
            println!("WARNING: {} using {:.0}% of budget", node.name, used * 100.0);
        }
    }
}
```

If `p99` exceeds the budget, the node has latency spikes. If `p99` is much higher than `median`, the node's execution time is inconsistent.

### Step 3: Use Blackbox to Find the Exact Tick

Enable the blackbox to record the last N ticks per node. After a miss, inspect the blackbox to find what happened on the tick that exceeded the budget. The blackbox records tick duration, input values, and events.
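Keeping only the last N ticks is the classic ring-buffer idea. The sketch below is a language-neutral illustration written in Python, not the HORUS blackbox API: `TickRecord`, `TickRing`, and `over_budget` are hypothetical names, and the durations are made-up sample data.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TickRecord:
    tick: int
    duration_ms: float


class TickRing:
    """Keeps only the most recent `capacity` tick records."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen drops the oldest record once capacity is reached
        self._records: deque = deque(maxlen=capacity)

    def record(self, tick: int, duration_ms: float) -> None:
        self._records.append(TickRecord(tick, duration_ms))

    def over_budget(self, budget_ms: float) -> list:
        """Return the retained ticks whose duration exceeded the budget."""
        return [r for r in self._records if r.duration_ms > budget_ms]


ring = TickRing(capacity=4)
durations = [0.6, 0.7, 2.3, 0.5, 0.6, 0.8]  # sample data, in milliseconds
for tick, duration_ms in enumerate(durations):
    ring.record(tick, duration_ms)

for rec in ring.over_budget(budget_ms=1.0):
    print(f"tick {rec.tick} took {rec.duration_ms} ms")  # tick 2 took 2.3 ms
```

Because old records are overwritten, the offending tick is only visible while it is still within the last N entries, so the buffer should be inspected soon after the miss.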
### Step 4: Fix Common Causes

| Cause | Symptom | Fix |
|-------|---------|-----|
| Allocation in `tick()` | Sporadic spikes | Pre-allocate buffers in `init()` |
| Blocking I/O | Consistent high latency | Move to `.async_io()` node |
| Lock contention | Spikes correlated with other nodes | Use `try_lock()` or lock-free channels |
| Large computation | Always near budget | Move to `.compute()` with a longer budget |

```rust
// simplified
// Bad: allocating every tick
fn tick(&mut self) {
    let data: Vec<f32> = self.sensor.read_all(); // allocates (element type assumed)
    self.process(&data);
}

// Good: pre-allocate, reuse buffer
fn init(&mut self) -> Result<()> {
    self.buffer = vec![0.0; 128]; // allocate once
    Ok(())
}

fn tick(&mut self) {
    self.sensor.read_into(&mut self.buffer); // reuse
    self.process(&self.buffer);
}
```

## Workflow 2: "My Node Panicked"

A node panic is caught by the scheduler. The node is marked `Unhealthy` and `on_error()` is called.

### Step 1: Check on_error() Output

Implement `on_error()` on your node to log the error.

### Step 2: Get a Full Backtrace

```bash
RUST_BACKTRACE=1 ./target/release/my_robot
```

### Step 3: Reproduce with Deterministic Mode

Use deterministic mode and `tick_once()` to replay the exact scenario. Use `tick(&["motor_ctrl"])` to isolate a single node.

### Step 4: Fix the Panic

Fix the bug directly if it is in your code. For panics in third-party code, wrap with `catch_unwind`:

```rust
// simplified
fn tick(&mut self) {
    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        self.flaky_library.update();
    }));
    if let Err(e) = result {
        eprintln!("Library panicked: {e:?}");
    }
}
```

## Workflow 3: "My System Is Slow"

The system runs but cannot keep up with its tick rate.
### Step 1: Find the Slowest Node ```rust // simplified let report = scheduler.profile(1000)?; println!("{report}"); // Nodes are listed with median and p99 timing // The node with the highest p99 is your bottleneck for node in &report.nodes { println!("{}: median={:?} p99={:?}", node.name, node.median, node.p99); } ``` Sort by `p99` to find the node causing the most delay. ### Step 2: Check Execution Classes A common mistake is running heavy work as `BestEffort` (the default), which blocks the main thread: ### Step 3: Check CPU and Profile ```bash # Per-core CPU usage — if one core is 100% while others idle, use .compute() mpstat -P ALL 1 5 # Profile with perf to find hot functions perf record -g ./target/release/my_robot && perf report ``` | Symptom | Likely Cause | Fix | |---------|-------------|-----| | One core at 100% | Work not distributed | Use `.compute()`, `.cores(&[...])` | | Periodic spikes ~1s | Allocator pressure | Use `jemalloc`, pre-allocate | | Latency grows over time | Memory leak | Monitor RSS, fix leaking buffers | | Random multi-ms stalls | Page faults | `.require_rt()` calls `mlockall` | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Motor stutters periodically | Deadline misses in control node | Profile with `scheduler.profile()`, increase `.budget()` or move allocations to `init()` | | Node marked `Unhealthy` | Panic caught by scheduler | Implement `on_error()`, run with `RUST_BACKTRACE=1`, reproduce with `tick_once()` | | System cannot keep tick rate | Bottleneck node blocking main thread | Profile to find slowest node, use `.compute()` or `.async_io()` for heavy work | | Sporadic latency spikes ~1s apart | Allocator pressure (malloc) | Pre-allocate buffers in `init()`, consider `jemalloc` | | Latency grows over time | Memory leak in a node | Monitor RSS over time, fix leaking buffers or growing collections | | Random multi-ms stalls | Page faults from virtual memory | Use `.require_rt()` which calls `mlockall` to 
lock pages | --- ## See Also - [Logging](/development/logging) — Structured logging with `hlog!`, `hlog_once!`, and `hlog_every!` - [BlackBox Flight Recorder](/advanced/blackbox) — Post-mortem tick analysis and anomaly filtering - [Monitor](/development/monitor) — Real-time visual debugging with web and TUI dashboards - [Execution Classes](/concepts/execution-classes) — Understanding Rt, Compute, Event, AsyncIo, and BestEffort - [Troubleshooting](/getting-started/troubleshooting) — Common issues and solutions --- ## Logging Path: /development/logging Description: Structured node logging with hlog!, hlog_once!, and hlog_every! macros # Logging You need to log diagnostic messages from your nodes without flooding the console or blocking the real-time loop. HORUS provides structured, node-aware logging macros that write to both the console and a shared memory buffer (visible in `horus monitor` and `horus log`). ## When To Use This - Adding diagnostic output to nodes during development - Rate-limiting log output from high-frequency nodes (100+ Hz) - Logging one-time events (first frame received, calibration complete) - Filtering and viewing logs by node name or severity level **Use [Telemetry Export](/development/telemetry) instead** if you need to send numeric metrics to an external monitoring system. **Use [BlackBox](/advanced/blackbox) instead** if you need post-mortem flight recorder data with per-tick granularity. ## Prerequisites - A HORUS project with `use horus::prelude::*;` - Familiarity with [Nodes](/concepts/core-concepts-nodes) (the logging context is set automatically per node) --- ## hlog! 
— Standard Logging Log a message with a level and the current node context: ### Log Levels | Level | Color | Use For | |-------|-------|---------| | `info` | Blue | Normal operation events (startup, config loaded, calibration done) | | `warn` | Yellow | Abnormal but recoverable conditions (battery low, sensor noisy) | | `error` | Red | Failures that need attention (hardware disconnected, topic timeout) | | `debug` | Gray | Detailed information for development (raw values, timing) | ### Output Format Logs appear on stderr with color and node attribution: ```text [INFO] [SensorNode] Initialized on /dev/ttyUSB0 [WARN] [BatteryMonitor] Battery at 15% — consider charging [ERROR] [MotorController] Failed to read encoder: timeout [DEBUG] [Planner] Path computed in 2.3ms, 47 waypoints ``` The scheduler automatically sets the node context before each `tick()`, `init()`, and `shutdown()` call — you don't need to pass the node name manually. ### Example in a Node --- ## hlog_once! — Log Once Per Run Log a message exactly once, regardless of how many times the callsite executes. Subsequent calls from the same location are silently ignored. Equivalent to ROS2's `RCLCPP_INFO_ONCE`. **Use for:** - First-time events ("calibration complete", "first message received") - One-time warnings ("running without GPU acceleration") - Init messages inside `tick()` that only matter once --- ## hlog_every! — Rate-Limited Logging Log at most once per N milliseconds. Prevents log flooding from high-frequency nodes. **Syntax:** `hlog_every!(interval_ms, level, format, args...)` Equivalent to ROS2's `RCLCPP_INFO_THROTTLE`. **Use for:** - Status updates from high-frequency nodes (>10 Hz) - Periodic health reports - Any log inside `tick()` that would otherwise flood the console --- ## Viewing Logs ### Console Logs appear on stderr in real-time with color coding. 
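To make the semantics of `hlog_once!` and `hlog_every!` described above concrete, here is a minimal sketch of the two behaviors in plain `std` Rust. It is not the HORUS implementation: `Throttle` and `OnceFlag` are hypothetical helpers, and the real macros presumably keep this state per callsite.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Allows at most one message per `interval` (the idea behind a throttled log).
struct Throttle {
    interval: Duration,
    last: Option<Instant>,
}

impl Throttle {
    fn new(interval_ms: u64) -> Self {
        Throttle { interval: Duration::from_millis(interval_ms), last: None }
    }

    /// Returns true if a message may be emitted at `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.last {
            Some(prev) if now.duration_since(prev) < self.interval => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}

/// Returns true only on the first call (the idea behind a log-once callsite).
struct OnceFlag(AtomicBool);

impl OnceFlag {
    const fn new() -> Self {
        OnceFlag(AtomicBool::new(false))
    }

    fn first_time(&self) -> bool {
        !self.0.swap(true, Ordering::Relaxed)
    }
}

fn main() {
    let mut throttle = Throttle::new(1000);
    let once = OnceFlag::new();
    let start = Instant::now();
    let mut emitted = 0;
    // Simulate a 100 Hz node for 3 seconds using synthetic timestamps.
    for tick in 0..300u64 {
        let now = start + Duration::from_millis(tick * 10);
        if once.first_time() {
            eprintln!("[INFO] first tick received");
        }
        if throttle.allow(now) {
            emitted += 1;
            eprintln!("[INFO] status at tick {tick}");
        }
    }
    println!("throttled messages emitted: {emitted}");
}
```

With a 1000 ms interval, a 100 Hz loop emits one status line per second instead of one hundred, which is why rate-limited logging suits anything inside `tick()`.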
### `horus log` CLI ```bash # View all logs horus log # Follow live (like tail -f) horus log -f # Filter by node name horus log SensorNode # Filter by level horus log --level warn # Show last N entries horus log -n 50 # Filter by time horus log -s "5m ago" # Clear logs horus log --clear ``` ### `horus monitor` The web dashboard (`horus monitor`) and TUI (`horus monitor -t`) show a live log stream with filtering by node and level. --- ## Best Practices | Do | Don't | |----|-------| | `hlog_every!(1000, ...)` in `tick()` at >10 Hz | `hlog!(info, ...)` every tick at 1000 Hz | | `hlog_once!(info, "First frame")` for one-time events | `if self.first { hlog!(...); self.first = false; }` | | `hlog!(info, ...)` in `init()` and `shutdown()` | Log only in `tick()` | | Include units: `"speed: {:.1} m/s"` | Bare numbers: `"speed: {}"` | | Use `debug` for high-volume data | Use `info` for everything | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Console flooded with log lines | Using `hlog!` in a high-frequency `tick()` | Use `hlog_every!(1000, ...)` to rate-limit to once per second | | Log messages missing node name | Logging outside of scheduler context | Only use `hlog!` inside `init()`, `tick()`, or `shutdown()` where the scheduler sets the node context | | Python logging not appearing | Using `print()` instead of node log methods | Use `node.log_info()`, `node.log_warning()`, etc. 
| | Logs not visible in `horus log` | Application not writing to shared memory buffer | Ensure you use `hlog!` macros (not `println!` or `eprintln!`) | --- ## See Also - [Monitor](/development/monitor) — Live log dashboard with filtering by node and level - [CLI Reference](/development/cli-reference) — `horus log` command options and filters - [BlackBox](/advanced/blackbox) — Flight recorder for post-mortem per-tick analysis - [Debugging Workflows](/development/debugging) — Step-by-step debugging with log-based diagnosis --- ## Telemetry Export Path: /development/telemetry Description: Export runtime metrics to HTTP, UDP, file, or stdout for external monitoring and dashboards # Telemetry Export You need to send HORUS runtime metrics (node tick durations, message counts, deadline misses) to an external monitoring system like Grafana, Prometheus, or a custom dashboard. Here is how to enable telemetry export with a single builder method. ## When To Use This - Sending scheduler metrics to Grafana, InfluxDB, or Prometheus via an HTTP receiver - Logging metrics to a file for offline analysis - Streaming metrics over UDP to a LAN monitoring tool - Debugging performance with stdout metric output **Use [Monitor](/development/monitor) instead** if you want a built-in web/TUI dashboard without external tooling. **Use [Logging](/development/logging) instead** if you need text-based diagnostic messages, not numeric metrics. ## Prerequisites - A HORUS Rust project with `use horus::prelude::*;` - The `telemetry` feature enabled on `horus_core` (enabled by default) - For HTTP export: a running HTTP endpoint that accepts JSON POST requests ## Quick Start Enable telemetry with a single builder method: The scheduler exports a JSON snapshot every 1 second (default interval). --- ## Architecture Telemetry in HORUS is designed around one constraint: **the scheduler tick loop must never block on I/O**. 
The architecture achieves this with a three-stage pipeline that decouples metric collection from serialization and delivery. ``` ┌─────────────────────────────────────────────────────────────┐ │ Scheduler Tick Loop (RT thread) │ │ │ │ for each node: │ │ tick() → measure duration → record in Profiler │ │ │ │ if export_interval elapsed: │ │ Profiler.node_stats → TelemetryManager.gauge/counter │ │ TelemetryManager.build_snapshot() │ │ TelemetryManager.export() │ │ ├─ HTTP → try_send(snapshot) on bounded channel ──┐ │ │ ├─ UDP → sendto() (single syscall, fire-forget) │ │ │ ├─ File → write_all() (buffered, local disk) │ │ │ └─ Stdout→ println() (debugging only) │ │ └──────────────────────────────────────────────────────┬───┘ │ │ │ ┌─────────────────────────────────────┘ │ │ Bounded Channel (capacity: 4) │ │ try_send() — non-blocking │ │ If full → snapshot silently dropped │ └──────────────┬──────────────────────────────┘ │ ┌──────────────▼──────────────────────────────┐ │ Background Thread ("horus-telemetry-http") │ │ │ │ loop: │ │ snapshot = rx.recv() │ │ TcpStream::connect(url) │ │ HTTP POST (JSON body) │ │ read status line (200/201/204 = ok) │ │ │ │ On Drop: sender dropped → recv returns Err │ │ → thread exits, handle.join() │ └──────────────────────────────────────────────┘ ``` ### Data flow step by step 1. **Collect** -- The scheduler tick loop executes each node's `tick()` and records timing data in the internal `Profiler`. This is just a `HashMap` update per node, with no allocation on the hot path. 2. **Package** -- At the configured export interval (default: 1 second), `TelemetryManager` reads the profiler stats and calls `gauge_with_labels()` / `counter_with_labels()` for each node. It then calls `build_snapshot()`, which clones the accumulated `HashMap` into a `TelemetrySnapshot` struct. 3. **Deliver** -- `export()` dispatches the snapshot to the configured endpoint: - **HTTP**: The snapshot is sent via `try_send()` on a bounded `mpsc::sync_channel(4)`. 
If the background thread is busy with a slow POST and the channel is full, the snapshot is silently dropped. The scheduler thread returns immediately in all cases. - **UDP**: A single `sendto()` syscall. The socket is pre-bound at startup and cached. - **File**: `File::create()` + `write_all()` with pretty-printed JSON. - **Stdout**: Direct `println!()` for debugging. ### Why the background thread only applies to HTTP UDP, file, and stdout exports are inherently fast and bounded -- a UDP `sendto()` completes in microseconds, file writes go to the kernel page cache, and stdout is local. Only HTTP involves potentially unbounded network latency (DNS, TCP handshake, slow receivers), which is why it gets its own dedicated thread. ### Graceful shutdown When the `TelemetryManager` is dropped (scheduler exit), it drops the channel sender. The background thread's `rx.recv()` returns `Err`, breaking the loop. The `Drop` implementation then calls `handle.join()` to drain any queued snapshots before the process exits, ensuring no data is lost on clean shutdown. 
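The HTTP path described above (bounded channel, `try_send()`, background thread, join on shutdown) can be sketched with the standard library alone. This is an illustration under assumptions, not the `TelemetryManager` source: `Exporter`, `export`, and `shutdown` are hypothetical names, snapshots are plain `String`s, and an explicit `shutdown()` stands in for the `Drop` implementation the text mentions.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the HTTP side of a telemetry manager.
struct Exporter {
    tx: SyncSender<String>,
    worker: JoinHandle<usize>,
    dropped: usize,
}

impl Exporter {
    /// Spawns a background thread that calls `deliver` for each queued snapshot.
    fn new(capacity: usize, mut deliver: impl FnMut(String) + Send + 'static) -> Self {
        let (tx, rx) = sync_channel::<String>(capacity);
        let worker = thread::spawn(move || {
            let mut delivered = 0usize;
            // recv() returns Err once the sender is dropped and the queue is empty,
            // so queued snapshots are drained before the thread exits.
            while let Ok(snapshot) = rx.recv() {
                deliver(snapshot);
                delivered += 1;
            }
            delivered
        });
        Exporter { tx, worker, dropped: 0 }
    }

    /// Never blocks: returns false (and counts a drop) if the channel is full.
    fn export(&mut self, snapshot: String) -> bool {
        let sent = self.tx.try_send(snapshot).is_ok();
        if !sent {
            self.dropped += 1;
        }
        sent
    }

    /// Drops the sender, then joins the worker; returns how many snapshots it delivered.
    fn shutdown(self) -> usize {
        drop(self.tx);
        self.worker.join().expect("worker thread panicked")
    }
}

fn main() {
    // Assumed slow receiver: each delivery takes 50 ms.
    let mut exporter = Exporter::new(4, |_snapshot| thread::sleep(Duration::from_millis(50)));
    let accepted = (0..10)
        .filter(|i| exporter.export(format!("{{\"seq\":{i}}}")))
        .count();
    let dropped = exporter.dropped;
    let delivered = exporter.shutdown();
    println!("accepted={accepted} dropped={dropped} delivered={delivered}");
}
```

Because `export()` only ever calls `try_send()`, the producer's cost is the same whether the receiver is healthy, slow, or gone. Every snapshot that was accepted is still delivered during shutdown.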
--- ## Endpoint Types The `.telemetry(endpoint)` method accepts a string that determines the export backend: | String Format | Endpoint | Behavior | |--------------|----------|----------| | `"http://host:port/path"` | HTTP POST | Non-blocking — background thread handles network I/O | | `"https://host:port/path"` | HTTPS POST | Same as HTTP with TLS | | `"udp://host:port"` | UDP datagram | Compact JSON, single packet per snapshot | | `"file:///path/to/metrics.json"` | Local file | Pretty-printed JSON, overwritten each export | | `"/path/to/metrics.json"` | Local file | Same as `file://` prefix | | `"stdout"` or `"local"` | Stdout | Pretty-printed to terminal (debugging) | | `"disabled"` or `""` | Disabled | No export (default) | ### HTTP Endpoint (Recommended for Production) ```rust // simplified let mut scheduler = Scheduler::new() .telemetry("http://localhost:9090/metrics"); ``` HTTP export is **fully non-blocking** for the scheduler: 1. Scheduler calls `export()` on its cycle — posts a snapshot to a bounded channel (capacity 4) 2. A dedicated background thread reads from the channel and performs the HTTP POST 3. If the channel is full (receiver slow), the snapshot is silently dropped — the scheduler never blocks ### UDP Endpoint (Low Overhead) ```rust // simplified let mut scheduler = Scheduler::new() .telemetry("udp://192.168.1.100:9999"); ``` Sends compact single-line JSON per snapshot. Good for LAN monitoring where packet loss is acceptable. ### File Endpoint (Debugging & Logging) ```rust // simplified let mut scheduler = Scheduler::new() .telemetry("/tmp/horus-metrics.json"); ``` Overwrites the file on each export cycle. Useful for debugging or feeding into log aggregation pipelines. 
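As a way to read the endpoint table above, the following sketch maps each string format to a backend. It is an assumption about how such parsing could work, not the HORUS parser: the `Endpoint` enum and `parse_endpoint` function are hypothetical, and treating any unrecognized string as a file path is a simplification made here.

```rust
/// Hypothetical representation of the export backends listed in the table.
#[derive(Debug, PartialEq)]
enum Endpoint {
    Http(String),
    Udp(String),
    File(String),
    Stdout,
    Disabled,
}

/// Maps an endpoint string to a backend, following the documented formats.
fn parse_endpoint(s: &str) -> Endpoint {
    let s = s.trim();
    if s.is_empty() || s == "disabled" {
        Endpoint::Disabled
    } else if s == "stdout" || s == "local" {
        Endpoint::Stdout
    } else if s.starts_with("http://") || s.starts_with("https://") {
        // Keep the full URL; the scheme decides whether TLS is used.
        Endpoint::Http(s.to_string())
    } else if let Some(addr) = s.strip_prefix("udp://") {
        Endpoint::Udp(addr.to_string())
    } else if let Some(path) = s.strip_prefix("file://") {
        Endpoint::File(path.to_string())
    } else {
        // Assumption: anything else is treated as a bare file path.
        Endpoint::File(s.to_string())
    }
}

fn main() {
    for s in [
        "http://localhost:9090/metrics",
        "udp://192.168.1.100:9999",
        "file:///tmp/horus-metrics.json",
        "/tmp/horus-metrics.json",
        "stdout",
        "",
    ] {
        println!("{s:?} -> {:?}", parse_endpoint(s));
    }
}
```

Note that the `file://` form and the bare path resolve to the same backend, which matches the table's "Same as `file://` prefix" row.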
--- ## JSON Payload Format Every export produces a `TelemetrySnapshot`: ```json { "timestamp_secs": 1710547200, "scheduler_name": "motor_control", "uptime_secs": 42.5, "metrics": [ { "name": "node.tick_duration_us", "value": { "Gauge": 145.2 }, "labels": { "node": "MotorCtrl" }, "timestamp_secs": 1710547200 }, { "name": "node.total_ticks", "value": { "Counter": 4250 }, "labels": { "node": "MotorCtrl" }, "timestamp_secs": 1710547200 }, { "name": "scheduler.deadline_misses", "value": { "Counter": 0 }, "labels": {}, "timestamp_secs": 1710547200 } ] } ``` ### Metric Value Types | Type | JSON | Description | |------|------|-------------| | Counter | `{ "Counter": 42 }` | Monotonically increasing (total ticks, messages sent) | | Gauge | `{ "Gauge": 3.14 }` | Current value (tick duration, CPU usage) | | Histogram | `{ "Histogram": [0.1, 0.2, 0.15] }` | Distribution of values | | Text | `{ "Text": "Healthy" }` | String status | --- ## Auto-Collected Metrics When telemetry is enabled, the scheduler automatically exports: | Metric Name | Type | Labels | Description | |-------------|------|--------|-------------| | `node.total_ticks` | Counter | `node` | Total ticks executed | | `node.tick_duration_us` | Gauge | `node` | Last tick duration in microseconds | | `node.errors` | Counter | `node` | Total tick errors | | `scheduler.deadline_misses` | Counter | — | Total deadline misses across all nodes | | `scheduler.uptime_secs` | Gauge | — | Scheduler uptime | --- ## Feature Flag Telemetry requires the `telemetry` feature on `horus_core` — **enabled by default**. 
To disable at compile time (saves binary size): ```toml [dependencies] horus = { version = "0.1", default-features = false, features = ["macros", "blackbox"] } ``` --- ## Integration with External Tools ### Grafana + Custom Receiver Write a small HTTP server that receives the JSON POST and forwards metrics to Prometheus/InfluxDB: ```python # receiver.py — minimal Flask example from flask import Flask, request app = Flask(__name__) @app.route("/metrics", methods=["POST"]) def metrics(): snapshot = request.json for m in snapshot["metrics"]: print(f"{m['name']} = {m['value']}") return "ok", 200 app.run(port=9090) ``` ### horus monitor (Alternative) For local debugging, `horus monitor` provides a built-in TUI dashboard — no external setup needed. See [Monitor](/development/monitor) for details. ### Prometheus Integration Prometheus scrapes metrics over HTTP in its own exposition format, not JSON. HORUS exports JSON, so you need a small bridge that converts the JSON payload into Prometheus-compatible text. There are two approaches. 
**Approach A: Python bridge (recommended for quick setup)**

This script receives HORUS JSON POSTs and serves a `/metrics` endpoint that Prometheus scrapes:

```python
# prometheus_bridge.py
from flask import Flask, request
from prometheus_client import Gauge, Counter, generate_latest, REGISTRY

app = Flask(__name__)

# Dynamic metric storage
gauges = {}
counters = {}
# Last HORUS total seen per (metric name, label set), used to compute counter deltas
last_totals = {}


def get_or_create_gauge(name, labels):
    key = name
    if key not in gauges:
        label_keys = list(labels.keys()) if labels else []
        gauges[key] = Gauge(
            name.replace(".", "_"),
            f"HORUS metric: {name}",
            label_keys,
        )
    return gauges[key]


def get_or_create_counter(name, labels):
    key = name
    if key not in counters:
        label_keys = list(labels.keys()) if labels else []
        counters[key] = Counter(
            name.replace(".", "_"),
            f"HORUS metric: {name}",
            label_keys,
        )
    return counters[key]


@app.route("/ingest", methods=["POST"])
def ingest():
    """Receives HORUS telemetry JSON and updates Prometheus metrics."""
    snapshot = request.json
    for m in snapshot["metrics"]:
        labels = m.get("labels", {})
        value_dict = m["value"]
        if "Gauge" in value_dict:
            g = get_or_create_gauge(m["name"], labels)
            if labels:
                g.labels(**labels).set(value_dict["Gauge"])
            else:
                g.set(value_dict["Gauge"])
        elif "Counter" in value_dict:
            c = get_or_create_counter(m["name"], labels)
            # Prometheus counters only increment, so apply the change since the
            # previous snapshot rather than the HORUS running total.
            key = (m["name"], tuple(sorted(labels.items())))
            total = value_dict["Counter"]
            delta = total - last_totals.get(key, 0)
            last_totals[key] = total  # also re-bases after a HORUS restart
            if delta > 0:
                if labels:
                    c.labels(**labels).inc(delta)
                else:
                    c.inc(delta)
    return "ok", 200


@app.route("/metrics")
def metrics():
    """Prometheus scrape endpoint."""
    return generate_latest(REGISTRY), 200, {"Content-Type": "text/plain; charset=utf-8"}


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9090)
```

Configure HORUS to push to the bridge:

```rust
// simplified
let mut scheduler = Scheduler::new()
    .tick_rate(100_u64.hz())
    .telemetry("http://localhost:9090/ingest"); // POST to the bridge
```

Add the bridge to your `prometheus.yml`:

```yaml
scrape_configs: - job_name: "horus" scrape_interval: 5s static_configs: - targets: ["localhost:9090"] ``` **Approach B: File-based with `node_exporter` textfile collector** For environments where you cannot run an extra HTTP service, write telemetry to a file that Prometheus `node_exporter` picks up: ```rust // simplified let mut scheduler = Scheduler::new() .tick_rate(100_u64.hz()) .telemetry("/var/lib/node_exporter/textfile/horus.json"); ``` Then use a cron job or sidecar script to convert the JSON file to Prometheus exposition format: ```python #!/usr/bin/env python3 # json_to_prom.py — convert HORUS JSON to Prometheus textfile format import json, sys with open(sys.argv[1]) as f: snapshot = json.load(f) output = sys.argv[2] if len(sys.argv) > 2 else "/var/lib/node_exporter/textfile/horus.prom" lines = [] for m in snapshot["metrics"]: name = m["name"].replace(".", "_") labels = m.get("labels", {}) label_str = ",".join(f'{k}="{v}"' for k, v in labels.items()) label_part = "{" + label_str + "}" if label_str else "" value_dict = m["value"] for vtype in ("Gauge", "Counter"): if vtype in value_dict: lines.append(f"horus_{name}{label_part} {value_dict[vtype]}") with open(output, "w") as f: f.write("\n".join(lines) + "\n") ``` Run it periodically: `watch -n 1 python3 json_to_prom.py /var/lib/node_exporter/textfile/horus.json` --- ## Performance Overhead All telemetry export happens **outside the scheduler's critical timing path**. The overhead per export cycle depends on the endpoint: | Endpoint | Overhead per export | Thread | Notes | |----------|-------------------|--------|-------| | HTTP | ~50us on the scheduler thread | Background thread does the actual POST | Scheduler only pays for `build_snapshot()` + `try_send()` into the bounded channel. Network I/O is fully off the RT thread. | | UDP | ~5us | Scheduler thread (single syscall) | `sendto()` is fire-and-forget. Socket is pre-bound at startup. No connection overhead per export. 
| | File | ~10us | Scheduler thread (buffered write) | `File::create()` + `write_all()`. Goes to kernel page cache, not disk. Actual disk flush is asynchronous. | | Stdout | Negligible | Scheduler thread | `println!()` to terminal. Only useful for debugging; not recommended in production. | | Disabled | Zero | N/A | No-op. `should_export()` returns `false` immediately. | ### What "overhead" means in practice At a 100 Hz tick rate, the scheduler has a 10 ms budget per tick. A 50us telemetry export (the worst case for HTTP) consumes 0.5% of that budget, and it only runs once per second (every 100th tick), not every tick. The amortized per-tick cost is effectively 0.005%. For UDP and file endpoints, the overhead is even smaller. The `should_export()` check (a single `Instant::elapsed()` comparison) runs every tick and costs <1us. ### Memory overhead `TelemetryManager` maintains a `HashMap` that grows with the number of unique metric names. For a typical system with 10 nodes, this is approximately 2-4 KB. The bounded channel for HTTP holds at most 4 snapshots, each roughly 1-2 KB serialized. 
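The overhead figures quoted above follow from simple arithmetic, reproduced here as a small worked example. The function names are hypothetical; the inputs (50 us per HTTP export, 100 Hz tick rate, 1-second interval, 4 queued snapshots of about 2 KB) are the numbers given in this section.

```rust
/// Fraction of a single tick's period consumed by one export call.
fn per_export_budget_fraction(export_us: f64, tick_hz: f64) -> f64 {
    let tick_period_us = 1_000_000.0 / tick_hz;
    export_us / tick_period_us
}

/// Fraction of total scheduler time spent exporting, averaged over the interval.
fn amortized_fraction(export_us: f64, interval_secs: f64) -> f64 {
    export_us / (interval_secs * 1_000_000.0)
}

/// Upper bound on bytes held in the bounded HTTP channel.
fn max_queue_bytes(capacity: usize, snapshot_bytes: usize) -> usize {
    capacity * snapshot_bytes
}

fn main() {
    let on_tick = per_export_budget_fraction(50.0, 100.0); // 50 us / 10 ms
    let amortized = amortized_fraction(50.0, 1.0); // 50 us per 1 s
    println!("share of the tick that exports: {:.3}%", on_tick * 100.0);
    println!("amortized share of scheduler time: {:.4}%", amortized * 100.0);
    println!("max queued snapshot bytes: {}", max_queue_bytes(4, 2048));
}
```

The same functions can be used to check a faster loop: at 1 kHz the tick period is 1 ms, so a 50 us export would take 5% of the one tick on which it runs, while the amortized share stays the same.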
--- ## Complete Example ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | No metrics exported | Telemetry endpoint not configured | Add `.telemetry("http://localhost:9090/metrics")` to `Scheduler::new()` builder | | HTTP metrics silently dropped | Receiver is slow or unreachable, bounded channel full (capacity 4) | Check receiver is running, increase receiver throughput — the scheduler never blocks | | Compile error: telemetry method not found | `telemetry` feature disabled | Ensure `horus` dependency has `telemetry` feature (enabled by default) | | File endpoint not updating | File overwritten each cycle but viewer caches | Re-read the file or use `watch cat /tmp/horus-metrics.json` | | UDP metrics lost | Network packet loss on LAN | Expected behavior for UDP; use HTTP for reliable delivery | --- ## Design Decisions Understanding **why** telemetry works this way helps you make the right integration choices for your system. ### Why non-blocking (RT safety) HORUS is a real-time robotics framework. The scheduler tick loop has hard timing budgets -- a motor control node running at 1 kHz has exactly 1 ms per tick. If the telemetry export blocked on a slow HTTP server or a full TCP buffer, it would cause deadline misses and potentially unsafe behavior. The non-blocking design ensures that **telemetry is always best-effort and never degrades control loop timing**. The bounded channel with `try_send()` is the key mechanism. On the scheduler thread, `export()` either succeeds instantly (channel has space) or drops the snapshot instantly (channel full). Both paths complete in nanoseconds. The actual network I/O happens on a separate thread that the scheduler never waits on. ### Why bounded channel with drop (not backpressure) Backpressure means "slow the producer when the consumer is slow." 
In telemetry for a real-time system, this is exactly wrong -- you never want your 1 kHz motor controller to slow down because Grafana's ingest endpoint is overloaded.

The bounded channel (capacity 4) provides a small buffer for transient slowness (e.g., a brief network hiccup) while hard-capping memory usage. When the buffer is full, `try_send()` fails and the snapshot being exported is dropped, while the snapshots already queued stay in place. Losing that one snapshot is acceptable because telemetry goes stale quickly and a fresher snapshot will be produced within a second.

The capacity of 4 is deliberate: at the default 1-second export interval, it absorbs up to 4 seconds of receiver downtime before dropping begins. This handles common transient issues (GC pauses, brief network congestion) without unbounded memory growth.

### Why JSON (not Protocol Buffers or MessagePack)

Three reasons:

1. **Debuggability** -- You can pipe HORUS telemetry to `jq`, read it in a browser, or `cat` the file endpoint directly. Binary formats require dedicated tools to inspect.
2. **No build dependency** -- Protocol Buffers require `protoc` and `.proto` files. MessagePack requires additional codec libraries. JSON serialization via `serde_json` is already a transitive dependency of HORUS and adds zero extra build complexity.
3. **Adequate performance** -- Serializing a typical telemetry snapshot (~10 metrics) to JSON takes <10us. For a 1 Hz export, this is negligible. If HORUS ever needs sub-millisecond export intervals (unlikely for telemetry), a binary format would be reconsidered.

### Why feature-gated

The `telemetry` feature on `horus_core` is enabled by default, but it can be disabled for two reasons:

- **Binary size** -- Disabling `telemetry` removes `serde_json`, the `TelemetryManager`, and the HTTP background thread infrastructure. For deeply embedded targets with tight flash constraints, this matters.
- **Attack surface** -- In locked-down production deployments, some teams prefer to compile out all network-facing code paths.
Disabling the feature guarantees at the type level that no telemetry endpoint is reachable. When disabled, all telemetry builder methods are either absent (compile error if used) or no-op, depending on the feature gate configuration. There is no runtime overhead. --- ## Trade-offs | Decision | Benefit | Cost | |----------|---------|------| | Non-blocking `try_send()` for HTTP | Scheduler never stalls on network I/O | Snapshots may be silently dropped if receiver is slow | | Bounded channel (capacity 4) | Predictable memory usage, absorbs transient slowness | Only 4 seconds of buffering before drops begin | | Silent drop (no error log on full channel) | No log spam when receiver is down for extended periods | Harder to notice when telemetry delivery is degraded (check receiver-side metrics) | | JSON serialization | Human-readable, no build dependencies, easy debugging | ~5x larger payload than protobuf; ~3x slower serialization (still <10us per snapshot) | | Single background thread (HTTP only) | Minimal resource usage, simple shutdown semantics | HTTP throughput limited to one in-flight POST at a time; pipelining not supported | | File endpoint overwrites (not appends) | Simple, bounded disk usage, always shows latest state | No history; use HTTP + a time-series database for historical data | | UDP fire-and-forget | Lowest overhead (~5us), good for LAN monitoring | No delivery guarantees; packet loss on congested networks | | 1-second default interval | Low overhead, sufficient for most dashboards | Not suitable for sub-second alerting (use `horus monitor` for real-time views) | | Feature-gated (compile-time opt-out) | Zero overhead when disabled, smaller binary | Must recompile to enable/disable; no runtime toggle | | Per-metric `HashMap` accumulation | Deduplicates metrics by name, latest value always wins | Small allocation per unique metric name; not lock-free (acceptable since only the scheduler thread writes) | --- ## See Also - 
[Monitor](/development/monitor) — Built-in web and TUI dashboards (no external setup needed) - [Logging](/development/logging) — Structured text logging with `hlog!` macros - [Scheduler API](/rust/api/scheduler) — Scheduler builder methods including `.telemetry()` - [Debugging Workflows](/development/debugging) — Using telemetry data to diagnose performance issues - [Operations](/operations) — Production monitoring and deployment patterns --- ## Monitor Guide Path: /development/monitor Description: Monitor, debug, and manage your HORUS applications in real-time with web and TUI dashboards # Monitor You need to observe your running robot system in real-time: see which nodes are active, watch message flow between topics, inspect performance metrics, and tune parameters without restarting. Here is how to use the HORUS Monitor. ## When To Use This - Debugging message flow between nodes ("is my publisher actually sending data?") - Monitoring node performance and tick rates during development - Live tuning of PID gains, speed limits, and other runtime parameters - Remote monitoring of headless robots over SSH (TUI mode) - Verifying system health before field deployment **Use [Telemetry Export](/development/telemetry) instead** if you need to send metrics to external dashboards like Grafana or Prometheus. **Use [Debugging Workflows](/development/debugging) instead** if you need to diagnose a specific problem like deadline misses or panics. ## Prerequisites - A running HORUS application (`horus run`) - A second terminal for the monitor (or access via browser from another device) ## Quick Start ```bash # Start your HORUS application horus run # In another terminal, start the monitor horus monitor ``` Browser opens automatically to `http://localhost:3000`. On first run, you'll be prompted to set a password (or press Enter to skip). 
```bash # Custom port horus monitor 8080 # Terminal UI mode (no browser needed) horus monitor --tui # Reset password horus monitor --reset-password ``` ## How It Works The monitor is a read-only observer that attaches to a running HORUS application without modifying its behavior. It reads data that the scheduler already writes as part of normal operation. ``` ┌─────────────────────────────────────────────────────────────────────┐ │ HORUS Application │ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ Node A │ │ Node B │ │ Node C │ Scheduler writes │ │ │ tick() │ │ tick() │ │ tick() │ NodeMetrics + topic │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ metadata to SHM │ │ │ │ │ after each tick cycle │ │ ▼ ▼ ▼ │ │ ┌──────────────────────────────────────────┐ │ │ │ Shared Memory (SHM) │ │ │ │ ┌────────────┐ ┌───────────────────┐ │ │ │ │ │ NodeMetrics │ │ Topic Ring Buffers│ │ │ │ │ │ per node │ │ headers + data │ │ │ │ │ └────────────┘ └───────────────────┘ │ │ │ └──────────────────────┬───────────────────┘ │ └─────────────────────────┼───────────────────────────────────────────┘ │ mmap read-only│ (no copies, no syscalls) │ ┌─────────────────────────┼───────────────────────────────────────────┐ │ horus monitor │ │ │ ▼ │ │ ┌──────────────────────────────────┐ │ │ │ SHM Reader (mmap) │ Reads NodeMetrics, topic │ │ │ /proc scan for node discovery │ headers, log buffer │ │ └──────┬───────────────┬───────────┘ │ │ │ │ │ │ ┌─────▼─────┐ ┌────▼──────────────────────┐ │ │ │ TUI Mode │ │ Web Mode │ │ │ │ ratatui │ │ ┌──────────┐ ┌──────────┐ │ │ │ │ redraws │ │ │ Axum │ │WebSocket │ │ │ │ │ at 4 Hz │ │ │ HTTP │ │ push │ │ │ │ │ │ │ │ server │ │ at 4 Hz │ │ │ │ └───────────┘ │ └──────────┘ └──────────┘ │ │ │ └────────────────────────────-┘ │ └─────────────────────────────────────────────────────────────────────┘ ``` **Data flow step by step:** 1. 
**Scheduler writes metrics** -- After each tick cycle, the scheduler updates `NodeMetrics` (tick count, duration, deadline misses) in the node's SHM region. Topic ring buffer headers already contain pub/sub counts, pending messages, and drop counts as part of normal IPC operation. 2. **Monitor reads via mmap** -- The monitor process opens the same SHM files read-only via `mmap`. This is a pointer dereference, not a copy -- reads hit L1 cache when the data is recent. The monitor scans `/proc` to discover running node processes and reads the SHM topics directory to enumerate active topics. 3. **Web UI via Axum** -- In web mode, an Axum HTTP server runs on a separate thread. REST endpoints (`/api/nodes`, `/api/topics`, `/api/graph`) return JSON snapshots. A WebSocket at `/api/ws` pushes live updates to connected browsers at 4 Hz (every 250ms). 4. **TUI via ratatui** -- In TUI mode, a crossterm/ratatui terminal UI polls for keyboard input at 100ms intervals and refreshes the display at 250ms intervals (4 Hz). No HTTP server is started. ## Web Interface The web monitor has **3 main tabs**: ### Monitor Tab The main monitoring view with two sub-views: **List View** — Shows nodes and topics in a grid layout: - **Nodes card**: All running nodes with their status - **Topics card**: Active message channels with sizes **Graph View** — Interactive canvas showing: - Nodes as circles connected to their topics - Visual representation of the pub/sub network - Helps answer "which nodes are talking to which topics?" 
A **status bar** at the top always shows: - Active Nodes count (hover for node list) - Active Topics count (hover for topic list) - Monitor port **What the dashboard looks like:** ``` ┌─────────────────────────────────────────────────────────────────┐ │ [Monitor] [Parameters] [Packages] ● 5 Nodes ● 8 Topics │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ ┌─ Nodes ──────────────────┐ ┌─ Topics ──────────────────┐ │ │ │ ● imu_driver [RT] │ │ sensors.imu 4 msgs │ │ │ │ ● camera_node [Comp] │ │ sensors.lidar 12 msgs │ │ │ │ ● slam_engine [RT] │ │ cmd_vel 1 msg │ │ │ │ ● planner [Comp] │ │ map.grid 0 msgs │ │ │ │ ● motor_driver [RT] │ │ odom 2 msgs │ │ │ └──────────────────────────┘ └────────────────────────────┘ │ │ │ │ [List View] [Graph View] │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` In **Graph View**, nodes appear as circles and topics as labeled connection points. Arrows show publish direction. Hovering a node highlights all of its connected topics. ### Parameters Tab Live runtime parameter editor: - **Search** parameters by name - **Add** new parameters at runtime - **Edit** existing values (changes apply immediately) - **Delete** parameters - **Export** all parameters to file - **Import** parameters from file Useful for tuning PID gains, speed limits, sensor thresholds without restarting. 
### Packages Tab Browse and manage HORUS packages: - Search the registry - Install packages - Manage environments ## Terminal UI Mode For SSH sessions and headless servers: ```bash horus monitor --tui ``` The TUI provides **8 tabs** navigated with arrow keys: | Tab | Description | |-----|-------------| | **Overview** | System health summary with log panel | | **Nodes** | Running nodes with detailed metrics | | **Topics** | Active topics and message flow | | **Network** | Network connections and transport status | | **TransformFrame** | TransformFrame protocol inspection | | **Packages** | Package management | | **Params** | Runtime parameter editor | | **Recordings** | Session recordings browser | **What the TUI looks like:** ``` ┌─ HORUS Monitor ──────────────────────────────────────────────┐ │ [Overview] [Nodes] [Topics] [Network] [TF] [Pkg] [Par] [Rec]│ ├──────────────────────────────────────────────────────────────┤ │ │ │ System Health: OK Uptime: 00:14:32 │ │ Active Nodes: 5/5 Tick Rate: 100 Hz │ │ │ │ ┌─ Node Status ────────────────────────────────────────┐ │ │ │ imu_driver ████████████████████░░ 92% budget │ │ │ │ camera_node ██████████░░░░░░░░░░░░ 45% budget │ │ │ │ slam_engine ██████████████████░░░░ 78% budget │ │ │ │ planner ████████░░░░░░░░░░░░░░ 35% budget │ │ │ │ motor_driver ██████████████░░░░░░░░ 62% budget │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ │ ┌─ Log ─────────────────────────────────────────────── ┐ │ │ │ 14:32:01 [INFO] imu_driver: tick 142800 (0.4ms) │ │ │ │ 14:32:01 [INFO] slam_engine: map updated (2.1ms) │ │ │ │ 14:32:01 [WARN] camera_node: frame dropped │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ │ ← → Navigate tabs ↑ ↓ Select Enter: Details q: Quit │ └──────────────────────────────────────────────────────────────┘ ``` Navigate between tabs with left/right arrow keys. Within each tab, use up/down arrows to select items, Enter to open detail panels, and Esc to close them. 
Press `q` to quit, `p` to pause/resume updates, `?` to show the help overlay. ### Topic Debug Logging In the **Topics** tab, press **Enter** on any topic to enable runtime debug logging. All `send()` and `recv()` calls on that topic will emit live log entries showing direction, IPC latency, and message summaries (if `LogSummary` is implemented). Press **Esc** to disable logging — zero overhead resumes immediately. No code changes or recompilation required. ## Performance Overhead The monitor is designed to be always-on during development with negligible impact on your application. | Component | Overhead | Notes | |-----------|----------|-------| | SHM metric reads | Sub-microsecond | mmap pointer dereference, hits L1/L2 cache | | `/proc` node scan | ~1ms per scan | Runs at 4 Hz in the monitor process, not in your application | | HTTP server (Axum) | Own thread | Separate OS thread, does not compete with RT node threads | | WebSocket push | 4 Hz (250ms) | JSON serialization of node/topic snapshots, ~2-5 KB per push | | TUI redraw | 4 Hz (250ms) | Terminal write in monitor process only, event polling at 10 Hz | | Topic debug logging | ~100ns per `send()`/`recv()` | Only when verbose is enabled on a specific topic. Writes one log entry per call to the global log buffer | | Parameter reads | ~50ns | Lock-free atomic reads from `RuntimeParams` SHM | | **Total on application** | **Less than 0.1% CPU** | Monitor runs as a separate process. The only cost inside your application is the metric writes the scheduler already does | **When does overhead increase?** - **Topic debug logging**: Enabling verbose mode on a high-frequency topic (e.g., 1 kHz IMU) adds log entries at that rate. Each entry is ~100ns, but at 1 kHz that is 100 microseconds per second -- still negligible, but visible in profiling. - **Many WebSocket clients**: Each connected browser receives the full snapshot. With 10+ simultaneous browser tabs, JSON serialization time may reach ~1ms per push cycle. 
- **Parameter writes**: Setting parameters from the web UI triggers a write to the params SHM. This is a one-time cost per edit, not a recurring overhead. ## Network Access The monitor binds to all network interfaces (`0.0.0.0`), so you can access it from: - Same machine: `http://localhost:3000` - Any device on the network: `http://<robot-ip>:3000` **Always set a password** when the monitor is network-accessible. ## Security The monitor supports password-based authentication for networked deployments. ### Setup On first run, set a password (or press Enter to skip authentication): ```bash horus monitor [SECURITY] HORUS Monitor - First Time Setup Password: ******** Confirm password: ******** [SUCCESS] Password set successfully! ``` Reset the password at any time: ```bash horus monitor --reset-password ``` ### How Authentication Works When a password is set: 1. The web UI shows a login page before granting access 2. All API endpoints require a valid session token (except `/api/login`) 3. Sessions expire after 1 hour of inactivity 4. Failed login attempts are rate-limited When no password is set (Enter pressed at setup): - All endpoints are accessible without authentication - Suitable for local development only ### API Authentication ```bash # Login: returns a session token curl -X POST http://localhost:3000/api/login \ -H "Content-Type: application/json" \ -d '{"password": "your_password"}' # Returns: {"token": "abc123..."} # Use token for API requests curl http://localhost:3000/api/nodes \ -H "Authorization: Bearer abc123..." # Logout curl -X POST http://localhost:3000/api/logout \ -H "Authorization: Bearer abc123..." ``` ### Security Details | Feature | Value | |---------|-------| | Password hashing | Argon2id | | Session timeout | 1 hour inactivity | | Rate limiting | 5 attempts per 60 seconds | | Token size | 256-bit random (base64-encoded) | The password hash is stored at `~/.horus/dashboard_password.hash`.
For production deployments, consider placing a reverse proxy with TLS (e.g., nginx) in front of the monitor. ### Recovery If you are locked out: ```bash # Option 1: Reset via CLI horus monitor --reset-password # Option 2: Delete the password hash file rm ~/.horus/dashboard_password.hash horus monitor # Re-prompts for password setup ``` ## API Endpoints The monitor exposes a REST API (authenticated when a password is set): | Endpoint | Method | Description | |----------|--------|-------------| | `/api/status` | GET | System health status | | `/api/nodes` | GET | Running nodes info | | `/api/topics` | GET | Active topics | | `/api/graph` | GET | Node-topic graph | | `/api/network` | GET | Network connections | | `/api/logs/all` | GET | All logs | | `/api/logs/node/:name` | GET | Logs for specific node | | `/api/logs/topic/:name` | GET | Logs for specific topic | | `/api/params` | GET | List parameters | | `/api/params/:key` | GET/POST/DELETE | Get/set/delete parameter | | `/api/params/export` | POST | Export all parameters | | `/api/params/import` | POST | Import parameters | | `/api/packages/registry` | GET | Search packages | | `/api/packages/install` | POST | Install package | | `/api/packages/uninstall` | POST | Uninstall package | | `/api/recordings` | GET | List recordings | | `/api/login` | POST | Authenticate | | `/api/logout` | POST | End session | ## Common Scenarios ### Debugging Message Flow **"My subscriber isn't getting messages"** 1. Open the Monitor tab and switch to Graph View 2. Is there an arrow from publisher -> topic -> subscriber? 3. If not, either the topic names do not match or one of the nodes is not running **"The robot is running slow"** 1. Check the nodes list for high CPU usage 2. Check tick rates: which node can't keep up? 3. Use the logs endpoints (`/api/logs/all` or `/api/logs/node/:name`) to check for slow-tick warnings ### Live Parameter Tuning **Tuning a PID controller:** 1. Open the Parameters tab 2. Search for `pid` 3. Edit the `pid.kp` value; the change applies immediately 4. Watch the robot's behavior and adjust until the response is acceptable 5.
Export final values with Export button ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Monitor shows nothing | HORUS application not running | Start your app first with `horus run` | | Cannot access from another device | Devices on different networks or firewall blocking | Ensure same network, run `sudo ufw allow 3000` | | Port already in use | Another monitor or process on port 3000 | Use a different port: `horus monitor 8080` | | Locked out of password | Forgotten password | Run `horus monitor --reset-password` or delete `~/.horus/dashboard_password.hash` | | TUI rendering broken | Terminal does not support 256 colors | Use a modern terminal (kitty, alacritty, wezterm) or try `TERM=xterm-256color horus monitor --tui` | | API returns 401 Unauthorized | Session expired or invalid token | Re-authenticate via `/api/login` endpoint | ## Design Decisions ### Why SHM-based, not network-based Traditional monitoring tools (e.g., Prometheus exporters, ROS2 introspection) add network hops to collect metrics. HORUS chose shared memory because: - **Zero overhead when not monitoring.** The scheduler writes `NodeMetrics` to SHM regardless -- it uses the same data for deadline enforcement. No extra serialization, no sockets, no packets. - **No configuration.** The monitor auto-discovers running nodes by scanning `/proc` and the SHM topics directory. No need to configure exporters, ports, or scrape endpoints. - **Works offline.** SHM monitoring works without any network stack, which matters on embedded systems and in containers without network access. If you need to send metrics to an external system (Grafana, Prometheus, Datadog), use [Telemetry Export](/development/telemetry) -- it reads the same SHM data and forwards it over the network. ### Why both Web and TUI The monitor provides two interfaces because robots are developed and deployed in different environments: - **Web UI** is for development machines with a browser available. 
It supports the interactive graph view and drag-and-drop parameter files, and multiple team members can open it simultaneously from different devices on the network. - **TUI** is for SSH sessions into headless robots, CI environments, and embedded systems. It requires only a terminal -- no browser, no X11, no port forwarding. The TUI has feature parity with the web UI for reading data (nodes, topics, params, logs) and additionally supports topic debug logging via mmap. Both interfaces read from the same SHM data source. You can run them simultaneously -- `horus monitor` in one terminal for the web UI and `horus monitor --tui` in another for the TUI. ### Why password auth, not tokens or mTLS The monitor uses simple password-based sessions (Argon2id hashing, 256-bit random session tokens) instead of API keys, OAuth, or mutual TLS: - **Single-user tool.** The monitor is typically accessed by one developer or a small team on a local network. OAuth and mTLS add complexity with no benefit. - **No external dependencies.** Password hashing is done locally with Argon2id. No identity provider, no certificate authority, no token refresh flows. - **Quick setup.** First run prompts for a password. No config files, no key generation, no certificate management. For production deployments exposed to the internet, place a reverse proxy with TLS (e.g., nginx) in front of the monitor rather than building TLS into the monitor itself. ### Why read-only access to SHM The monitor opens SHM files with read-only mmap. It cannot modify topic buffers, corrupt node state, or interfere with the scheduler. The single exception is the topic verbose flag (one byte per topic), which enables debug logging. Read-only access is intentional: - **Safety.** A monitoring tool must never be able to crash or corrupt the system it observes. Read-only mmap enforces this at the OS level. - **No synchronization needed.** The monitor reads atomic counters and ring buffer headers that the scheduler writes.
No locks, no contention, no priority inversion risk for RT nodes. ## Trade-offs | Choice | Benefit | Cost | |--------|---------|------| | SHM-based monitoring | Zero overhead, no network config, works offline | Only works on the same machine (use Telemetry Export for remote) | | Separate monitor process | Cannot crash the application, clean resource isolation | Must be started separately (`horus monitor`), adds one process | | Web + TUI dual interface | Works everywhere: browser, SSH, headless, embedded | Two codebases to maintain, feature parity requires discipline | | Password auth (not mTLS) | Simple setup, no PKI infrastructure needed | No per-user access control, no audit trail beyond rate limiting | | Read-only SHM access | Cannot corrupt application state, no lock contention | Cannot inject test data or modify parameters from the SHM side (uses params API instead) | | 4 Hz refresh rate | Smooth real-time feel without CPU waste | Cannot capture sub-250ms transient events (use BlackBox or topic debug logging for those) | | `/proc` scan for node discovery | Works without any registration protocol | Linux-specific, ~1ms per scan, may show stale entries briefly after node crash | --- ## See Also - [CLI Reference](/development/cli-reference) — Full `horus monitor` command options - [Parameters Guide](/development/parameters) — Runtime parameter management in detail - [Debugging Workflows](/development/debugging) — Step-by-step diagnosis for deadline misses, panics, and slowdowns - [Telemetry Export](/development/telemetry) — Export metrics to external systems (Grafana, Prometheus) - [Operations](/operations) — Production monitoring and deployment --- ## Native Tool Integration Path: /development/native-tools Description: Use cargo, pip, and cmake directly in HORUS projects — transparent proxy with automatic horus.toml sync # Native Tool Integration You want to use `cargo`, `pip`, and `cmake` the way you always have, but inside a HORUS project where `horus.toml` is 
the source of truth. HORUS transparently proxies these tools so commands are routed through the build pipeline: native build files are generated from `horus.toml`, the real tool runs against those files, and dependency changes sync back to `horus.toml`. Outside a HORUS project, the tools behave exactly as they normally would. ## When To Use This - Using `cargo add`, `cargo build`, `cargo test` inside a HORUS project - Using `pip install` to add Python dependencies that sync to `horus.toml` - Using `cmake` for C++ components that integrate with the HORUS build - Working with third-party cargo subcommands (`cargo audit`, `cargo nextest`, `cargo llvm-cov`) **Use [horus add](/development/cli-reference) instead** if you prefer the HORUS-native command for adding dependencies directly. **Use the bypass (`HORUS_NO_PROXY=1 cargo build`) instead** if you need to run the real tool without HORUS interception. ## Prerequisites - HORUS installed (the installer sets up proxies automatically) - A project with `horus.toml` in the directory tree ## How It Works When you run a proxied command, here is what happens under the hood: 1. **You type a command** — for example, `cargo build`. 2. **Shell function intercepts it** — the HORUS installer adds thin shell functions that shadow the real `cargo`, `pip`, and `cmake` binaries. 3. **Project detection** — the shell function calls `horus _is-project` to check whether the current directory (or any parent) contains a `horus.toml`. 4. **Delegation** — if a HORUS project is detected, the command is delegated to `horus cargo build` (or `horus pip ...`, `horus cmake ...`). 5. **Build file generation** — the proxy generates `.horus/Cargo.toml` (or `.horus/pyproject.toml`, `.horus/CMakeLists.txt`) from `horus.toml`. 6. **Real tool execution** — the real `cargo` runs with `--manifest-path .horus/Cargo.toml`, pointing it at the generated build file. 7. 
**Sync back** — if the command modified dependencies (e.g., `cargo add serde`), the proxy detects the change and syncs it back to `horus.toml`. If no HORUS project is detected in step 3, the shell function passes the command straight to the real binary with no interception. ``` ┌──────────────┐ ┌────────────────┐ ┌──────────────────┐ │ cargo build │────▶│ shell function │────▶│ horus _is-project│ └──────────────┘ └────────────────┘ └──────────────────┘ │ ┌───────────────────────┤ ▼ ▼ ┌────────────────┐ ┌────────────────┐ │ horus project │ │ not a project │ │ horus cargo ..│ │ real cargo │ └────────────────┘ └────────────────┘ │ ▼ ┌────────────────┐ │ generate │ │ .horus/ │ │ Cargo.toml │ └────────────────┘ │ ▼ ┌────────────────┐ │ run real cargo │ │ --manifest-path│ │ .horus/... │ └────────────────┘ │ ▼ ┌────────────────┐ │ sync changes │ │ back to │ │ horus.toml │ └────────────────┘ ``` --- ## Setup ### Automatic (recommended) The HORUS installer sets up native tool proxying automatically. After installing HORUS, open a new terminal and verify: ```bash type cargo # Should show: cargo is a function ``` If you see `cargo is a function`, the proxy is active. ### Manual Setup If you installed HORUS without the installer, or need to re-initialize: ```bash # Write shell environment and register proxies horus env --init ``` This does two things: 1. Writes `~/.horus/env.sh` containing the shell functions for `cargo`, `pip`, and `cmake`. 2. Adds a `source ~/.horus/env.sh` line to your `~/.bashrc` and/or `~/.zshrc`. Restart your shell (or `source ~/.horus/env.sh`) to activate. ### Removing Proxies To uninstall the shell proxies and restore the original tool behavior: ```bash horus env --uninstall ``` This removes the source line from your shell config and deletes `~/.horus/env.sh`. The original `cargo`, `pip`, and `cmake` binaries are untouched. 
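To make steps 3 and 4 of "How It Works" concrete, here is a minimal Python sketch of the detection and delegation logic. It is an illustration under stated assumptions, not the actual shell functions written to `~/.horus/env.sh`: the names `find_horus_project` and `choose_command` are hypothetical, and the sketch only models which command line would be run.

```python
from pathlib import Path
from typing import Optional


def find_horus_project(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains horus.toml.

    Hypothetical stand-in for the check the docs attribute to `horus _is-project`.
    """
    current = start.resolve()
    for directory in (current, *current.parents):
        if (directory / "horus.toml").is_file():
            return directory
    return None


def choose_command(tool: str, args: list[str], cwd: Path, env: dict[str, str]) -> list[str]:
    """Pick the command line for one invocation of cargo, pip, or cmake."""
    # HORUS_NO_PROXY=1 is the documented bypass: always run the real tool.
    if env.get("HORUS_NO_PROXY") == "1":
        return [tool, *args]
    # Outside a HORUS project the real tool runs with no interception.
    if find_horus_project(cwd) is None:
        return [tool, *args]
    # Inside a project the call is delegated, e.g. `horus cargo build`.
    return ["horus", tool, *args]
```

The search walks from the current directory up to the filesystem root, which matches the "current directory (or any parent)" rule described above.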
--- ## Cargo All standard `cargo` commands work transparently inside a HORUS project: ```bash # Building cargo build # builds from .horus/Cargo.toml cargo build --release # release build cargo build -p member # workspace member selection # Dependencies cargo add serde # adds to .horus/Cargo.toml, syncs to horus.toml cargo add tokio -F full # features work too cargo remove serde # removes from both # Testing and quality cargo test # runs tests cargo clippy # linting cargo fmt # formatting cargo bench # benchmarks # Documentation cargo doc # generate docs cargo doc --open # generate and open in browser # Third-party subcommands cargo audit # security audit (if installed) cargo nextest run # alternative test runner cargo llvm-cov # code coverage ``` ### Workspace Members If your `horus.toml` defines workspace members, the `-p` flag works as expected: ```bash cargo build -p my_driver # build only the driver crate cargo test -p my_algorithm # test only the algorithm crate ``` --- ## Pip Python dependency management syncs bidirectionally with `horus.toml`: ```bash # Installing packages pip install numpy # installs + syncs to horus.toml [dependencies] pip install "requests>=2" # version constraints preserved pip install -e . # editable install (rewrites paths to .horus/) # Removing packages pip uninstall numpy # removes package + syncs removal from horus.toml # Read-only commands (no sync needed) pip list # pass-through, no sync pip freeze # pass-through pip show numpy # pass-through ``` ### Editable Installs When you run `pip install -e .`, the proxy rewrites the path to point at `.horus/pyproject.toml` so the editable install references the correct generated build file. --- ## CMake CMake integration rewrites build paths to keep generated files inside `.horus/`: ```bash # Configure — source dir is rewritten to .horus/ cmake . 
# configures with -B .horus/cpp-build # Build cmake --build .horus/cpp-build # Install cmake --install .horus/cpp-build # Common options pass through cmake . -DCMAKE_BUILD_TYPE=Release cmake . -G Ninja ``` The proxy ensures that `CMakeLists.txt` in `.horus/` is generated from the `[cmake]` section of your `horus.toml` before running the real `cmake`. --- ## How Sync Works The proxy uses fingerprinting to detect and merge changes between `horus.toml` and native build files. ### Fingerprinting Each time the proxy generates a `.horus/Cargo.toml` (or other native file), it computes a SHA-256 hash of the generated content and stores it in `.horus/.fingerprints`. On the next invocation, it compares: - **Current `horus.toml`** against the last-generated fingerprint. - **Current native file** against the last-generated fingerprint. ### Detecting External Changes If the native file changed but `horus.toml` did not, the proxy knows the native tool modified the file (e.g., `cargo add` wrote to `.horus/Cargo.toml`). It parses the diff and applies matching changes to `horus.toml`. If `horus.toml` changed but the native file did not, the proxy regenerates the native file from `horus.toml`. If both changed, `horus.toml` wins and the native file is regenerated. ### Internal Dependencies Dependencies on HORUS workspace crates (`horus_core`, `horus_library`, `horus_macros`, etc.) are never synced back to `horus.toml`. These are injected by the build pipeline based on your project configuration and should not appear in user-facing dependency lists. 
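The rules under "Detecting External Changes" amount to a small decision table. The Python sketch below shows one way to express it. The on-disk layout of `.horus/.fingerprints` is not documented here, so the `stored` dictionary with `manifest` and `native` keys is an assumption, as are the names `SyncAction`, `fingerprint`, and `decide`.

```python
import hashlib
from enum import Enum


class SyncAction(Enum):
    NOTHING = "nothing"        # neither file changed since the last run
    SYNC_BACK = "sync_back"    # only the native file changed (e.g. after `cargo add`)
    REGENERATE = "regenerate"  # horus.toml changed, alone or together with the native file


def fingerprint(content: str) -> str:
    """SHA-256 hex digest of a file's text content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def decide(stored: dict[str, str], manifest_text: str, native_text: str) -> SyncAction:
    """Compare current contents against the fingerprints saved on the previous run.

    `stored` is assumed to hold {"manifest": <hash>, "native": <hash>}.
    """
    manifest_changed = fingerprint(manifest_text) != stored["manifest"]
    native_changed = fingerprint(native_text) != stored["native"]
    if manifest_changed:
        # Also covers the case where both changed: horus.toml wins.
        return SyncAction.REGENERATE
    if native_changed:
        return SyncAction.SYNC_BACK
    return SyncAction.NOTHING
```

Checking the manifest first is what encodes the "horus.toml wins" rule: once the manifest differs, any edit made to the native file is discarded by regeneration.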
--- ## Bypassing the Proxy If you ever need to run the real tool directly, bypassing the HORUS proxy: ```bash # Use the full path /usr/bin/cargo build $(which -a cargo | tail -1) build # Or use command to skip the shell function command cargo build # Or temporarily disable HORUS_NO_PROXY=1 cargo build ``` --- ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `cargo` not being proxied (`type cargo` shows binary path) | Shell proxy not installed or sourced | Run `horus env --init` then `source ~/.horus/env.sh` | | `cargo add` does not update `horus.toml` | Fingerprint state out of sync | Run `horus build` to force regeneration and sync | | Native file conflicts with `horus.toml` | Both files modified independently | `horus.toml` always wins; run `horus build` to regenerate `.horus/` files | | Proxy not detecting HORUS project | No `horus.toml` in current or parent directories | Verify with `horus _is-project`, ensure `horus.toml` exists in project root | | `pip install` not syncing to `horus.toml` | Read-only pip commands (`list`, `freeze`, `show`) do not trigger sync | Only install/uninstall commands trigger sync; verify with `horus param list` | --- ## See Also - [CLI Reference](/development/cli-reference) — Full `horus env`, `horus add`, and `horus build` command options - [horus.toml](/concepts/horus-toml) — Project manifest that the proxy syncs with - [Multi-Crate Workspaces](/development/workspaces) — Workspace support for native tool integration - [Package Management](/package-management/package-management) — Registry-based dependency management --- ## Multi-Crate Workspaces Path: /development/workspaces Description: Organize robotics projects with multiple crates — shared libraries, drivers, and binary targets in one workspace # Multi-Crate Workspaces You need to split your robotics project into multiple crates: shared message types, a hardware abstraction layer, algorithm libraries, and the main binary. 
HORUS workspaces let you organize these as members that compile together, share dependencies, and reference each other directly, without publishing anything to a registry. ## When To Use This - Your project has shared message types used by multiple crates - You need a driver abstraction layer that separates hardware from application logic - Algorithm libraries (PID, SLAM, path planning) should be reusable across projects - Build times are slow because the entire project recompiles on every change **Use a single-file project instead** if you are prototyping or have a simple system with all nodes in one file. ## Prerequisites - A HORUS project initialized with `horus new --workspace` - Familiarity with [horus.toml](/concepts/horus-toml) (workspace and member configuration) - Familiarity with Rust crate structure (`src/lib.rs`, `src/main.rs`) ## Why Workspaces A single-crate project works fine for prototypes, but production robotics code benefits from separation: - **Shared message types** — Define sensor readings, commands, and state once. Every crate imports them - **Driver abstraction** — Isolate hardware-specific code behind traits. Swap real hardware for simulation without touching application logic - **Algorithm libraries** — PID controllers, path planners, and SLAM modules become reusable across projects - **Faster compilation** — Only changed crates recompile. Your message types crate rarely changes, so it compiles once - **Clear ownership** — Each crate has a focused purpose. 
Code review and testing stay manageable ## Creating a Workspace ```bash horus new my-robot --workspace -r ``` This generates a workspace scaffold with a single binary member: ``` my-robot/ ├── horus.toml # [workspace] members = ["crates/*"] ├── .horus/ │ ├── Cargo.toml # Generated workspace Cargo.toml │ └── my-robot/ │ └── Cargo.toml # Generated member Cargo.toml └── crates/ └── my-robot/ ├── horus.toml # [package] name = "my-robot" └── src/ └── main.rs ``` The root `horus.toml` defines workspace-level settings. Each member under `crates/` has its own `horus.toml` for package-specific configuration. ## Adding Library Members Add library crates alongside your binary: ```bash cd my-robot horus new messages --lib -r -o crates horus new driver --lib -r -o crates ``` Your workspace now looks like this: ``` my-robot/ ├── horus.toml ├── .horus/ └── crates/ ├── messages/ │ ├── horus.toml │ └── src/ │ └── lib.rs ├── driver/ │ ├── horus.toml │ └── src/ │ └── lib.rs └── my-robot/ ├── horus.toml └── src/ └── main.rs ``` ## Workspace horus.toml Format The root `horus.toml` defines workspace membership and shared dependencies: ```toml # Root horus.toml [workspace] members = ["crates/*"] exclude = ["crates/experimental"] [workspace.dependencies] horus_library = "0.1.9" serde = { version = "1.0", source = "crates.io", features = ["derive"] } nalgebra = { version = "0.33", source = "crates.io" } ``` **`members`** accepts glob patterns. `"crates/*"` includes every directory under `crates/`. **`exclude`** removes specific directories from the glob match. Useful for in-progress crates that should not build with the rest of the workspace. **`[workspace.dependencies]`** centralizes version and source declarations. Members inherit these without repeating version numbers. 
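As a rough illustration of how `members` and `exclude` interact, the following Python sketch expands the glob patterns and then filters out excluded directories. It is not the HORUS implementation; the function name `resolve_members` is hypothetical, and treating `exclude` entries as patterns matched against the relative path is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path


def resolve_members(root: Path, members: list[str], exclude: list[str]) -> list[str]:
    """Expand member globs relative to `root`, then drop excluded directories."""
    found = set()
    for pattern in members:
        for path in root.glob(pattern):
            if not path.is_dir():
                continue  # only directories can be workspace members
            rel = path.relative_to(root).as_posix()
            if not any(fnmatch(rel, ex) for ex in exclude):
                found.add(rel)
    return sorted(found)
```

With `members = ["crates/*"]` and `exclude = ["crates/experimental"]`, a workspace containing `crates/messages`, `crates/driver`, and `crates/experimental` resolves to the first two only.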
## Member horus.toml Format Each member declares its own package metadata and dependencies: ```toml # crates/messages/horus.toml [package] name = "my-robot-messages" version = "0.1.0" type = "lib" [dependencies] serde = { workspace = true } ``` The `workspace = true` marker tells horus to pull the version, source, and features from the root `[workspace.dependencies]` table. ## Target Types The `type` field in a member's `[package]` table controls what gets built: | Type | What it produces | Use case | |------|-----------------|----------| | `"bin"` | Executable binary (default) | Main application, CLI tools | | `"lib"` | Library crate | Shared types, drivers, algorithms | | `"both"` | Library + binary in same crate | Library with a companion CLI | ```toml # A crate that is both a library and a binary [package] name = "my-robot-driver" version = "0.1.0" type = "both" ``` ## Dependency Inheritance When a member uses `workspace = true`, it inherits the version, source, and features from the root `[workspace.dependencies]`: ```toml # Root horus.toml [workspace.dependencies] serde = { version = "1.0", source = "crates.io", features = ["derive"] } ``` ```toml # crates/messages/horus.toml [dependencies] serde = { workspace = true } # Inherits: version = "1.0", source = "crates.io", features = ["derive"] ``` Members can add extra features on top of the inherited ones: ```toml # crates/driver/horus.toml [dependencies] serde = { workspace = true, features = ["alloc"] } # Gets "derive" from workspace + "alloc" from this member ``` This keeps version management in one place. Bump `serde` once in the root, and every member picks up the change. 
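The inheritance rule can be sketched as a small merge function. This Python example assumes the tables have already been parsed into dictionaries; `resolve_dependency` is a hypothetical name, and the handling of the string shorthand (such as `horus_library = "0.1.9"`) is an assumption about how a bare version would be normalized.

```python
def resolve_dependency(name: str, member_spec: dict, workspace_deps: dict) -> dict:
    """Merge a member's dependency entry with the root [workspace.dependencies] table."""
    if not member_spec.get("workspace"):
        return dict(member_spec)  # ordinary dependency, nothing to inherit
    if name not in workspace_deps:
        raise KeyError(f"{name} is not declared in [workspace.dependencies]")

    base = workspace_deps[name]
    if isinstance(base, str):
        base = {"version": base}  # shorthand form: name = "1.2.3"

    resolved = dict(base)
    # Member features are added on top of the inherited ones, without duplicates.
    features = list(base.get("features", []))
    for feature in member_spec.get("features", []):
        if feature not in features:
            features.append(feature)
    if features:
        resolved["features"] = features
    return resolved
```

For the `serde` example above, a member entry of `{ workspace = true, features = ["alloc"] }` resolves to version `1.0` from `crates.io` with features `derive` and `alloc`.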
## Inter-Member Dependencies Members reference each other with path dependencies: ```toml # crates/controller/horus.toml [package] name = "my-robot-controller" version = "0.1.0" type = "lib" [dependencies] my-robot-messages = { path = "../messages" } my-robot-driver = { path = "../driver" } horus_library = { workspace = true } ``` Path dependencies resolve relative to the member's directory. The build system handles everything — no publishing or installation required. ## Building and Running Workspace-level commands operate on all members: ```bash horus build # Build all members horus build -p messages # Build specific member horus run -p my-robot # Run specific binary horus test # Test all members horus test -p driver # Test specific member ``` Native cargo commands also work through the generated build files: ```bash cargo build --workspace # Build everything via native toolchain cargo test --workspace # Test everything cargo clippy --workspace # Lint everything ``` ## Generated Build Files Horus generates native build files in `.horus/` from your `horus.toml` declarations. You should never edit these directly: ``` .horus/ ├── Cargo.toml # [workspace] with members list ├── my-robot-messages/ │ └── Cargo.toml # [lib] pointing to ../../crates/messages/src/ ├── my-robot-driver/ │ └── Cargo.toml # [lib] pointing to ../../crates/driver/src/ ├── my-robot-controller/ │ └── Cargo.toml # [lib] pointing to ../../crates/controller/src/ ├── my-robot/ │ └── Cargo.toml # [[bin]] pointing to ../../crates/my-robot/src/ └── target/ # Shared build artifacts for all members ``` The generated `Cargo.toml` files point back to your source directories. All members share a single `target/` directory, so common dependencies compile once. Running `horus build` regenerates these files if your `horus.toml` has changed, then invokes cargo against the `.horus/` workspace. 
## Example: Complete Robotics Workspace Here is a realistic project structure for an autonomous robot: ``` my-robot/ ├── horus.toml └── crates/ ├── messages/ # Shared types │ ├── horus.toml │ └── src/lib.rs ├── driver/ # Hardware abstraction │ ├── horus.toml │ └── src/lib.rs ├── controller/ # Algorithms │ ├── horus.toml │ └── src/lib.rs └── my-robot/ # Main binary ├── horus.toml └── src/main.rs ``` ### messages crate Define sensor readings and commands that every other crate shares: ```rust // simplified // crates/messages/src/lib.rs use serde::{Deserialize, Serialize}; #[derive(Clone, Debug, Serialize, Deserialize)] pub struct LaserScan { pub ranges: Vec<f32>, pub angle_min: f32, pub angle_max: f32, pub range_max: f32, } #[derive(Clone, Debug, Serialize, Deserialize)] pub struct CmdVel { pub linear_x: f64, pub angular_z: f64, } #[derive(Clone, Debug, Serialize, Deserialize)] pub struct Odometry { pub x: f64, pub y: f64, pub theta: f64, } ``` ```toml # crates/messages/horus.toml [package] name = "my-robot-messages" version = "0.1.0" type = "lib" [dependencies] serde = { workspace = true } ``` ### driver crate Abstract hardware behind traits so you can swap real sensors for simulated ones: ```rust // simplified // crates/driver/src/lib.rs use my_robot_messages::{CmdVel, LaserScan, Odometry}; pub trait LidarDriver: Send + Sync { fn scan(&self) -> LaserScan; } pub trait MotorDriver: Send + Sync { fn send_velocity(&self, cmd: &CmdVel); fn read_odometry(&self) -> Odometry; } ``` ```toml # crates/driver/horus.toml [package] name = "my-robot-driver" version = "0.1.0" type = "lib" [dependencies] my-robot-messages = { path = "../messages" } ``` ### controller crate Implement algorithms that depend on message types and driver traits: ```rust // simplified // crates/controller/src/lib.rs use my_robot_messages::{CmdVel, LaserScan}; pub struct ObstacleAvoider { pub min_distance: f32, pub turn_speed: f64, } impl ObstacleAvoider { pub fn compute(&self, scan: &LaserScan) -> CmdVel {
let closest = scan.ranges.iter().cloned().fold(f32::MAX, f32::min); if closest < self.min_distance { CmdVel { linear_x: 0.0, angular_z: self.turn_speed } } else { CmdVel { linear_x: 0.5, angular_z: 0.0 } } } } ``` ```toml # crates/controller/horus.toml [package] name = "my-robot-controller" version = "0.1.0" type = "lib" [dependencies] my-robot-messages = { path = "../messages" } ``` ### Main binary Tie everything together with horus nodes: ```toml # crates/my-robot/horus.toml [package] name = "my-robot" version = "0.1.0" [dependencies] horus_library = { workspace = true } my-robot-messages = { path = "../messages" } my-robot-controller = { path = "../controller" } ``` ### Root workspace configuration ```toml # Root horus.toml [workspace] members = ["crates/*"] [workspace.dependencies] horus_library = "0.1.9" serde = { version = "1.0", source = "crates.io", features = ["derive"] } ``` Build and run the whole project: ```bash horus build # Compiles messages, driver, controller, then my-robot horus run -p my-robot # Runs the main binary horus test # Tests all crates ``` ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `unresolved import` for a workspace member | Member not added as a path dependency | Add `my-member = { path = "../member" }` to the consuming crate's `horus.toml` | | `workspace = true` not resolving | Dependency not declared in root `[workspace.dependencies]` | Add the dependency to the root `horus.toml` `[workspace.dependencies]` table | | `horus build` compiles everything every time | Source directory symlinks or timestamps changing | Ensure `.horus/` generated `Cargo.toml` points to correct `../../crates/` paths | | Member not included in build | Glob pattern does not match directory | Check `members = ["crates/*"]` matches your directory structure, or add explicit member paths | | Feature merge conflicts | Same dependency with different features across members | Centralize features in `[workspace.dependencies]` and use 
`workspace = true` everywhere | --- ## See Also - [horus.toml](/concepts/horus-toml) — Project manifest format for workspaces and members - [CLI Reference](/development/cli-reference) — `horus build`, `horus run -p`, and workspace commands - [Native Tool Integration](/development/native-tools) — Using `cargo` natively inside HORUS workspaces - [Package Management](/package-management/package-management) — Publishing workspace members to the registry --- ## Testing HORUS Applications Path: /development/testing Description: Unit testing, integration testing, tick_once testing, and record/replay for HORUS nodes # Testing HORUS Applications You need to verify that your HORUS nodes work correctly before deploying to a real robot. Here is how to write unit tests, integration tests, and simulation tests using Rust's built-in test framework and HORUS testing utilities. ## When To Use This - Writing unit tests for individual node logic - Testing multi-node pipelines with real shared memory IPC - Using `tick_once()` for deterministic single-tick testing - Setting up CI/CD test pipelines with `horus test` - Recording and replaying sessions for regression testing **Use [Debugging Workflows](/development/debugging) instead** if you need to diagnose a problem in a running system rather than write tests. 
## Prerequisites - A HORUS project with `horus.toml` (see [Quick Start](/getting-started/quick-start)) - Familiarity with [Nodes](/concepts/core-concepts-nodes) and [Topics](/concepts/core-concepts-topic) - Rust testing basics (`#[test]`, `#[cfg(test)]`, `assert!`) ## Testing Strategies | Strategy | When to Use | Complexity | |----------|-------------|------------| | **Unit test (single node)** | Test node logic in isolation | Low | | **Integration test (multi-node)** | Test nodes communicating through topics | Medium | | **Business logic extraction** | Test pure functions without topics | Lowest | | **`tick_once()` testing** | Deterministic single-tick scheduler tests | Medium | | **Record/replay** | Reproduce bugs and regression test | High | ## Unit Testing a Single Node Test individual node behavior in isolation. ### Example: Testing a Temperature Sensor ### Run the Tests ### Key Testing Patterns **1. Test Node Creation:** ```rust // simplified #[test] fn test_node_creation() { let node = MyNode::new().unwrap(); assert_eq!(node.some_field, expected_value); } ``` **2. Test Initialization:** ```rust // simplified #[test] fn test_init() { let mut node = MyNode::new().unwrap(); assert!(node.init().is_ok()); } ``` **3. Test Tick Logic:** ```rust // simplified #[test] fn test_tick() { let mut node = MyNode::new().unwrap(); node.tick(); // Verify state changes assert_eq!(node.counter, 1); } ``` **4. Test Shutdown:** ```rust // simplified #[test] fn test_shutdown() { let mut node = MyNode::new().unwrap(); assert!(node.shutdown().is_ok()); } ``` ## Testing Multiple Nodes Together Test nodes communicating through topics. 
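Because topics are backed by real shared memory, multi-node tests that reuse a topic name can observe each other's messages. One way to avoid this is to generate a fresh name per test. The sketch below uses only the standard library; `unique_topic` and `NEXT_TOPIC_ID` are hypothetical names for illustration, not part of the HORUS API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Process-wide counter so every call yields a different suffix.
static NEXT_TOPIC_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical test helper (not a HORUS API): builds a topic name that is
/// unique within this process, and distinct across processes via the PID.
fn unique_topic(prefix: &str) -> String {
    let id = NEXT_TOPIC_ID.fetch_add(1, Ordering::Relaxed);
    format!("{}_{}_{}", prefix, std::process::id(), id)
}
```

The generated name would be passed to `Topic::new(...)` in place of a hard-coded string, which assumes your node constructors accept a topic name as an argument.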
### Example: Publisher-Subscriber Test **File: `src/main.rs`** ```rust // simplified use horus::prelude::*; use std::sync::{Arc, Mutex}; // Publisher node pub struct PublisherNode { data_pub: Topic, } impl PublisherNode { pub fn new() -> Result { Ok(Self { data_pub: Topic::new("test_data")?, }) } } impl Node for PublisherNode { fn name(&self) -> &str { "PublisherNode" } fn tick(&mut self) { self.data_pub.send(42.0); } } // Subscriber node that stores received data pub struct SubscriberNode { data_sub: Topic, received: Arc>>, } impl SubscriberNode { pub fn new(received: Arc>>) -> Result { Ok(Self { data_sub: Topic::new("test_data")?, received, }) } } impl Node for SubscriberNode { fn name(&self) -> &str { "SubscriberNode" } fn tick(&mut self) { if let Some(data) = self.data_sub.recv() { self.received.lock().unwrap().push(data); } } } fn main() -> Result<()> { let received = Arc::new(Mutex::new(Vec::new())); let mut scheduler = Scheduler::new(); scheduler.add(PublisherNode::new()?).order(0).build()?; scheduler.add(SubscriberNode::new(received)?).order(1).build()?; scheduler.run()?; Ok(()) } #[cfg(test)] mod tests { use super::*; use std::thread; use std::time::Duration; #[test] fn test_pubsub_communication() { // Shared storage for received messages let received = Arc::new(Mutex::new(Vec::new())); // Create publisher and subscriber let mut pub_node = PublisherNode::new().unwrap(); let mut sub_node = SubscriberNode::new(Arc::clone(&received)).unwrap(); // Publish a message pub_node.tick(); // Small delay to allow IPC (shared memory needs time to propagate) thread::sleep(Duration::from_millis(10)); // Subscriber receives the message sub_node.tick(); // Verify message was received let data = received.lock().unwrap(); assert_eq!(data.len(), 1); assert_eq!(data[0], 42.0); } #[test] fn test_multiple_messages() { let received = Arc::new(Mutex::new(Vec::new())); let mut pub_node = PublisherNode::new().unwrap(); let mut sub_node = 
SubscriberNode::new(Arc::clone(&received)).unwrap(); // Publish 5 messages for _ in 0..5 { pub_node.tick(); thread::sleep(Duration::from_millis(5)); sub_node.tick(); } // Verify all messages received let data = received.lock().unwrap(); assert_eq!(data.len(), 5); for value in data.iter() { assert_eq!(*value, 42.0); } } } ``` ### Run Integration Tests ```bash horus test test_pubsub_communication --test-threads 1 ``` **Why `--test-threads 1`?** - Prevents tests from running in parallel (this is the default for `horus test`) - Avoids shared memory conflicts between tests - Ensures deterministic behavior **Expected Output:** ``` running 2 tests test tests::test_pubsub_communication ... ok test tests::test_multiple_messages ... ok test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured ``` ## Testing Business Logic in Isolation Test node logic without real Topic connections. ### Example: Extracting Testable Logic ```rust // simplified use horus::prelude::*; // Node that processes temperature data pub struct TemperatureProcessor { input_sub: Topic, output_pub: Topic, } impl TemperatureProcessor { pub fn new() -> Result { Ok(Self { input_sub: Topic::new("input_temp")?, output_pub: Topic::new("output_temp")?, }) } // Public method for testing business logic pub fn process_temperature(&self, temp: f32) -> f32 { // Convert Celsius to Fahrenheit temp * 9.0 / 5.0 + 32.0 } } impl Node for TemperatureProcessor { fn name(&self) -> &str { "TemperatureProcessor" } fn tick(&mut self) { if let Some(celsius) = self.input_sub.recv() { let fahrenheit = self.process_temperature(celsius); self.output_pub.send(fahrenheit); } } } #[cfg(test)] mod tests { use super::*; #[test] fn test_temperature_conversion_logic() { // Test business logic WITHOUT Topic let processor = TemperatureProcessor::new().unwrap(); // Test known conversions assert_eq!(processor.process_temperature(0.0), 32.0); assert_eq!(processor.process_temperature(100.0), 212.0); assert_eq!(processor.process_temperature(-40.0), 
-40.0); } #[test] fn test_with_mock_data() { let mut processor = TemperatureProcessor::new().unwrap(); // We can't easily mock Topic, but we can test the logic // by calling process_temperature directly let celsius_readings = vec![0.0, 10.0, 20.0, 30.0, 100.0]; let expected_fahrenheit = vec![32.0, 50.0, 68.0, 86.0, 212.0]; for (celsius, expected) in celsius_readings.iter().zip(expected_fahrenheit.iter()) { let result = processor.process_temperature(*celsius); assert_eq!(result, *expected); } } } ``` ### Testing Strategy Without Topic Mocks Since HORUS Topics use real shared memory, full mocking is complex. Instead: **1. Extract Business Logic:** ```rust // simplified // Good: Business logic in testable method pub fn process_temperature(&self, temp: f32) -> f32 { temp * 9.0 / 5.0 + 32.0 } // Test this directly without Topic #[test] fn test_logic() { let node = TemperatureProcessor::new().unwrap(); assert_eq!(node.process_temperature(0.0), 32.0); } ``` **2. Test Tick with Real Topics:** ```rust // simplified // Topics are lightweight — use real ones in tests #[test] fn test_with_real_topic() { let mut node = MyNode::new().unwrap(); node.tick(); // Verify behavior } ``` **3. Use Shared State for Verification:** ```rust // simplified // Store results in node for verification pub struct TestNode { pub last_result: Option, } #[test] fn test_result() { let mut node = TestNode::new(); node.tick(); assert_eq!(node.last_result, Some(42.0)); } ``` ## Complete Testing Example A fully tested 3-node system. 
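The data flow of this system (generate, double, collect) can be checked without any topics first, following the business-logic extraction strategy described earlier. This is a minimal sketch; the function names `generate`, `double`, and `run_pipeline` are illustrative and do not appear in the listing that follows.

```rust
/// Mirrors the generator stage: yields 1, 2, 3, ... up to `count`.
fn generate(count: u32) -> impl Iterator<Item = u32> {
    1..=count
}

/// Mirrors the doubler stage.
fn double(n: u32) -> u32 {
    n * 2
}

/// Chains generator -> doubler and collects the results,
/// as the collector node would after `count` pipeline iterations.
fn run_pipeline(count: u32) -> Vec<u32> {
    generate(count).map(double).collect()
}
```

If the pure version and the topic-based version disagree, the difference points at IPC timing or ordering rather than at the arithmetic.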
**File: `src/main.rs`** ```rust // simplified use horus::prelude::*; use std::sync::{Arc, Mutex}; // Node 1: Generate numbers pub struct GeneratorNode { output_pub: Topic, counter: u32, } impl GeneratorNode { pub fn new() -> Result { Ok(Self { output_pub: Topic::new("numbers")?, counter: 0, }) } } impl Node for GeneratorNode { fn name(&self) -> &str { "GeneratorNode" } fn tick(&mut self) { self.counter += 1; self.output_pub.send(self.counter); } } // Node 2: Double the numbers pub struct DoublerNode { input_sub: Topic, output_pub: Topic, } impl DoublerNode { pub fn new() -> Result { Ok(Self { input_sub: Topic::new("numbers")?, output_pub: Topic::new("doubled")?, }) } } impl Node for DoublerNode { fn name(&self) -> &str { "DoublerNode" } fn tick(&mut self) { if let Some(n) = self.input_sub.recv() { self.output_pub.send(n * 2); } } } // Node 3: Collect results pub struct CollectorNode { input_sub: Topic, collected: Arc>>, } impl CollectorNode { pub fn new(collected: Arc>>) -> Result { Ok(Self { input_sub: Topic::new("doubled")?, collected, }) } } impl Node for CollectorNode { fn name(&self) -> &str { "CollectorNode" } fn tick(&mut self) { if let Some(n) = self.input_sub.recv() { self.collected.lock().unwrap().push(n); } } } fn main() -> Result<()> { let collected = Arc::new(Mutex::new(Vec::new())); let mut scheduler = Scheduler::new(); scheduler.add(GeneratorNode::new()?).order(0).build()?; scheduler.add(DoublerNode::new()?).order(1).build()?; scheduler.add(CollectorNode::new(collected)?).order(2).build()?; scheduler.run()?; Ok(()) } #[cfg(test)] mod tests { use super::*; use std::thread; use std::time::Duration; #[test] fn test_generator_node() { let mut node = GeneratorNode::new().unwrap(); // Initial state assert_eq!(node.counter, 0); // After 3 ticks for _ in 0..3 { node.tick(); } assert_eq!(node.counter, 3); } #[test] fn test_pipeline() { let collected = Arc::new(Mutex::new(Vec::new())); let mut gen = GeneratorNode::new().unwrap(); let mut dbl = 
DoublerNode::new().unwrap(); let mut col = CollectorNode::new(Arc::clone(&collected)).unwrap(); // Run 5 iterations of the pipeline for _ in 0..5 { gen.tick(); thread::sleep(Duration::from_millis(5)); dbl.tick(); thread::sleep(Duration::from_millis(5)); col.tick(); } // Verify results: 1*2=2, 2*2=4, 3*2=6, 4*2=8, 5*2=10 let results = collected.lock().unwrap(); assert_eq!(*results, vec![2, 4, 6, 8, 10]); } } ``` ### Run All Tests ```bash horus test --test-threads 1 ``` **Output:** ``` running 2 tests test tests::test_generator_node ... ok test tests::test_pipeline ... ok test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured ``` ## Best Practices ### 1. Test Business Logic Separately Extract pure functions for easy testing: ```rust // simplified // Good: Pure function (easy to test) fn calculate_velocity(distance: f32, time: f32) -> f32 { distance / time } #[test] fn test_velocity() { assert_eq!(calculate_velocity(100.0, 10.0), 10.0); } ``` ### 2. Use Arc for Shared Test Data Share data between nodes for verification: ```rust // simplified let results = Arc::new(Mutex::new(Vec::new())); let node = TestNode::new(Arc::clone(&results))?; // Later in test assert_eq!(results.lock().unwrap().len(), 5); ``` ### 3. Add Small Delays for IPC Shared memory needs time to propagate: ```rust // simplified pub_node.tick(); thread::sleep(Duration::from_millis(10)); // Allow IPC sub_node.tick(); ``` ### 4. Run Tests Sequentially `horus test` defaults to single-threaded execution to prevent shared memory conflicts. If you use `--parallel`, ensure each test uses unique topic names: ```bash horus test # Already single-threaded by default ``` ### 5. 
Test Edge Cases

```rust
// simplified
#[test]
fn test_no_messages() {
    // SubscriberNode::new() takes the shared results buffer (see the pub/sub example)
    let received = Arc::new(Mutex::new(Vec::new()));
    let mut node = SubscriberNode::new(Arc::clone(&received)).unwrap();
    node.tick(); // Nothing was published; should handle gracefully
}

#[test]
fn test_invalid_data() {
    // `process` stands in for your node's own fallible logic
    let node = MyNode::new().unwrap();
    let result = node.process(-1.0);
    assert!(result.is_err());
}
```

## Running Tests with horus test

### Basic Commands

```bash
# Run all tests (defaults to single-threaded for shared memory safety)
horus test

# Run specific test by name filter
horus test test_sensor_initialization

# Run tests matching a pattern
horus test sensor

# Show test output (println!, hlog!, etc.)
horus test --nocapture

# Run with multiple threads (override default single-threaded mode)
horus test --parallel
horus test --test-threads 4

# Run in release mode (optimized build)
horus test --release

# Skip the build step (use existing build artifacts)
horus test --no-build

# Skip shared memory cleanup after tests
horus test --no-cleanup

# Verbose output
horus test -v

# Run integration tests (tests marked #[ignore])
horus test --integration

# Enable simulation drivers (no hardware required)
horus test --sim
```

### Test Organization

```rust
// simplified
#[cfg(test)]
mod tests {
    use super::*;

    mod unit_tests {
        use super::*;

        #[test]
        fn test_creation() { /* ... */ }
    }

    mod integration_tests {
        use super::*;

        // Mark integration tests with #[ignore]; run them with `horus test --integration`
        #[test]
        #[ignore]
        fn test_full_pipeline() { /* ... */ }
    }
}
```

## Single-Tick Testing with tick_once()

For simulation and fine-grained testing, `tick_once()` executes exactly one scheduler tick cycle and returns. This gives you full control over the execution loop.

### tick_once()

Executes all registered nodes exactly once, in priority order.

### tick()

Execute only specific nodes by name.
Non-existent names are silently ignored: ```rust // simplified #[test] fn test_selective_tick() { let mut scheduler = Scheduler::new(); scheduler.add(SensorNode::new().unwrap()).order(0).build().unwrap(); scheduler.add(ControlNode::new().unwrap()).order(1).build().unwrap(); scheduler.add(LoggerNode::new().unwrap()).order(2).build().unwrap(); // Tick only the sensor — control and logger are skipped scheduler.tick(&["SensorNode"]).unwrap(); // Tick sensor + control, skip logger scheduler.tick(&["SensorNode", "ControlNode"]).unwrap(); } ``` ### Simulation Loop Pattern Use `tick_once()` to integrate HORUS nodes into a simulation loop: ```rust // simplified fn run_simulation() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(MotorController::new()?).order(0).build()?; scheduler.add(SensorFusion::new()?).order(1).build()?; let dt = Duration::from_millis(10); // 100 Hz simulation for step in 0..1000 { // 1. Step physics simulation sim.step(dt); // 2. Run HORUS nodes for this timestep scheduler.tick_once()?; // 3. Render or collect results sim.render(); } Ok(()) } ``` ### Lazy Initialization `tick_once()` and `tick()` lazily initialize nodes on the first call — you don't need to call `run()` or any separate init method. The first `tick_once()` call runs `init()` on all nodes, then executes the tick. ## Time-Limited Test Runs Use `scheduler.run_for()` to run tests for a fixed duration: ```rust // simplified #[test] fn test_system_runs_for_one_second() { let mut scheduler = Scheduler::new(); scheduler.add(SensorNode::new().unwrap()).order(0).build().unwrap(); scheduler.add(ControlNode::new().unwrap()).order(1).build().unwrap(); // Run for exactly 1 second, then shutdown gracefully scheduler.run_for(std::time::Duration::from_secs(1)).unwrap(); } ``` ## Record/Replay Testing HORUS supports recording node execution and replaying it later for deterministic debugging. This is useful for reproducing bugs and regression testing. 
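The idea behind record/replay can be illustrated with a small in-memory model: capture what a node produced on each tick, then feed the same values back in the same order. This is a conceptual sketch only. The `Sample` and `Recording` types are assumptions made for illustration and say nothing about the actual `.horus` file format or the HORUS replay API.

```rust
/// One recorded output of a node at a given tick.
#[derive(Debug, Clone, PartialEq)]
struct Sample {
    tick: u64,
    value: f32,
}

/// In-memory stand-in for a recording file (illustrative only).
#[derive(Default)]
struct Recording {
    samples: Vec<Sample>,
}

impl Recording {
    /// Capture what a node produced on `tick`.
    fn record(&mut self, tick: u64, value: f32) {
        self.samples.push(Sample { tick, value });
    }

    /// Return the recorded values in their original order,
    /// restricted to the given inclusive tick range.
    fn replay(&self, range: std::ops::RangeInclusive<u64>) -> Vec<f32> {
        self.samples
            .iter()
            .filter(|s| range.contains(&s.tick))
            .map(|s| s.value)
            .collect()
    }
}
```

Because replay returns exactly what was captured, a test that consumes the replayed values sees the same inputs on every run, which is what makes a recorded failure reproducible.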
### Recording a Session

Record all node inputs/outputs during a run:

```bash
# Record while running
horus run --record
```

Recordings are saved to `~/.horus/recordings//` in binary format (`.horus` files). Each node gets its own recording file:

```
~/.horus/recordings/my_session/
├── sensor_node@abc123.horus    # Node recording
├── control_node@def456.horus   # Node recording
└── scheduler@main789.horus     # Scheduler execution order
```

### Replaying in Tests

Use `scheduler.add_replay()` to replay a recorded node:

```rust
// simplified
use std::path::PathBuf;

#[test]
fn test_replay_crash_scenario() {
    let mut scheduler = Scheduler::new();

    // PathBuf does not expand `~`, so resolve the home directory explicitly
    let home = std::env::var("HOME").expect("HOME is not set");
    let recording = PathBuf::from(home)
        .join(".horus/recordings/crash/motor_node@abc123.horus");

    // Replay the motor node from a crash recording
    scheduler.add_replay(recording, 1 /* priority */).unwrap();

    // Add a live node to test against the recorded data
    scheduler.add(DiagnosticNode::new().unwrap()).order(2).build().unwrap();

    scheduler.run_for(std::time::Duration::from_secs(5)).unwrap();
}
```

### Replay Modes

| Mode | Description |
|------|-------------|
| **Full replay** | Replay all nodes from a scheduler recording |
| **Mixed replay** | Replay some nodes while others run live |
| **Range replay** | Replay only a specific tick range |

### Managing Recordings

```bash
# List recording sessions
ls ~/.horus/recordings/

# Recordings auto-cap at 100MB per node
# Delete old recordings to free space
```

## Common Errors

| Symptom | Cause | Fix |
|---------|-------|-----|
| Tests fail randomly | Shared memory conflicts from parallel tests | Use `horus test --test-threads 1` (default) or unique topic names per test |
| "Topic not found" errors | Topic created in one test leaks into another | Use unique topic names: `Topic::new("test_topic_1")?` |
| Messages not received | IPC needs time to propagate through shared memory | Add `thread::sleep(Duration::from_millis(10))` between send and recv |
| Test hangs forever | `scheduler.run()` blocks indefinitely | Use
`scheduler.run_for(Duration::from_secs(1))` or `tick_once()` instead | | Compilation errors in tests | Missing `use super::*` or `use horus::prelude::*` | Ensure test module imports parent scope and prelude | | Shared memory permission error | Previous test run crashed without cleanup | Run `horus clean --shm` before testing | --- ## See Also - [CLI Reference: horus test](/development/cli-reference) — Full test command options and flags - [Scheduler API: tick_once()](/rust/api/scheduler) — Single-tick execution for deterministic testing - [Deterministic Mode](/advanced/deterministic-mode) — Reproducible test runs with fixed ordering - [Debugging Workflows](/development/debugging) — Step-by-step diagnosis for runtime issues - [Record/Replay](/advanced/record-replay) — Record sessions for regression testing --- ## Parameters Guide Path: /development/parameters Description: Configure and tune your HORUS applications with runtime parameters, validation, and persistence # Parameters Guide You need to adjust robot behavior (speeds, gains, thresholds) without recompiling code. HORUS runtime parameters let you change values on-the-fly via code, CLI, or the web monitor, with validation, persistence, and versioning built in. ## When To Use This - Tuning PID gains, speed limits, or sensor thresholds while the robot runs - Loading different configuration presets for different environments (lab vs. field) - Enabling or disabling features at runtime without recompilation - Persisting tuned values across restarts via YAML files **Use [horus.toml](/concepts/horus-toml) instead** for build-time configuration that does not change at runtime (project name, dependencies, workspace members). **Use [Topics](/concepts/core-concepts-topic) instead** for high-frequency data exchange between nodes. Parameters are for configuration, not message passing. 
## Prerequisites - A HORUS project with `use horus::prelude::*;` - Familiarity with [Nodes](/concepts/core-concepts-nodes) (parameters are typically accessed in `init()` and `tick()`) ## Solution **Without parameters** (hardcoded, requires recompile to change): **With parameters** (dynamic, change at runtime via monitor or CLI): **Key benefits:** - **No recompilation** — Change values without rebuilding - **Live tuning** — Adjust while robot is running - **Persistence** — Save/load from YAML files - **Validation** — Range, regex, enum, and read-only constraints - **Versioning** — Optimistic locking for concurrent edits ## Core Concepts ### Parameter Storage Parameters are stored in a **thread-safe map**: ```rust // simplified Arc>> ``` **Location:** `.horus/config/params.yaml` (relative to your project directory) **Format:** ```yaml # Flat key-value pairs (keys are plain strings) tick_rate: 30 max_memory_mb: 512 max_speed: 1.0 max_angular_speed: 1.0 acceleration_limit: 0.5 lidar_rate: 10 camera_fps: 30 sensor_timeout_ms: 1000 emergency_stop_distance: 0.3 collision_threshold: 0.5 pid_kp: 1.0 pid_ki: 0.1 pid_kd: 0.05 ``` ### Parameter Types HORUS supports all JSON-compatible types: - **Numbers** - `f64`, `i64`, `u64` (stored as `Value::Number`) - **Strings** - `String` (stored as `Value::String`) - **Booleans** - `bool` (stored as `Value::Bool`) - **Arrays** - `Vec` (stored as `Value::Array`) - **Objects** - `HashMap` (stored as `Value::Object`) ### Key Organization Keys are stored as **flat strings** in a `BTreeMap` (sorted alphabetically). 
Use descriptive names with underscores:

```rust
// simplified
// Descriptive flat keys
self.params.get_or("max_speed", 1.5);
self.params.get_or("pid_kp", 1.0);
self.params.get_or("lidar_rate", 10);
```

You can use dot notation as a naming convention for grouping, but dots are treated as literal characters; there is no automatic hierarchy:

```rust
// simplified
// Dot notation is a naming convention, not a hierarchy
self.params.get_or("motion.max_speed", 1.5);
self.params.get_or("control.pid.kp", 1.0);
```

## Using Parameters in Nodes

### Accessing Parameters

### Parameter Methods

**Generic get with default (`get_or`):**

```rust
// simplified
// Returns the default if the parameter doesn't exist
let speed = self.params.get_or("max_speed", 1.5);                    // f64
let enabled = self.params.get_or("auto_mode", false);                // bool
let rate: i32 = self.params.get_or("update_rate", 60);               // i32
let name = self.params.get_or("node_name", "default".to_string());   // String
```

**Generic get (returns `Option<T>`):**

```rust
// simplified
// Returns None if parameter doesn't exist or type doesn't match
if let Some(speed) = self.params.get::<f64>("max_speed") {
    self.max_speed = speed;
}
```

**Set parameter:**

```rust
// simplified
// Update parameter value (validates against metadata if set)
self.params.set("max_speed", 2.0)?;

// Set complex types
self.params.set("camera_resolution", vec![1920, 1080])?;
```

**Query methods:**

```rust
// simplified
self.params.has("max_speed");     // bool - check if key exists
self.params.list_keys();          // Vec<String> - all parameter keys
self.params.get_all();            // BTreeMap<String, Value> - all params
self.params.remove("old_key");    // Option<Value> - remove and return
self.params.reset()?;             // Reset all params to defaults
```

**Persistence:**

```rust
// simplified
// Save current params to .horus/config/params.yaml
self.params.save_to_disk()?;

// Load params from a specific YAML file
self.params.load_from_disk(Path::new("my_params.yaml"))?; ``` ### Live Reloading **Check for updates every tick:** ```rust // simplified pub struct AdaptiveController { params: RuntimeParams, max_speed: f64, tick_count: u64, } impl Node for AdaptiveController { fn name(&self) -> &str { "adaptive_controller" } fn tick(&mut self) { // Check every 60 ticks (~1 second at 60 Hz) if self.tick_count % 60 == 0 { let new_speed = self.params.get_or("max_speed", 1.5); if new_speed != self.max_speed { hlog!(info, "Speed updated: {} → {}", self.max_speed, new_speed); self.max_speed = new_speed; } } self.tick_count += 1; } } ``` **Performance note:** Parameter access is fast (~80-350ns via `Arc`), but avoid reading hundreds of parameters every tick. Cache values and reload periodically. ### Complex Parameter Types **Arrays:** ```rust // simplified // Set array self.params.set("waypoints", vec![1.0, 2.5, 3.0, 4.5])?; // Get array let waypoints: Vec = self.params .get::>("waypoints") .map(|v| v.iter().filter_map(|x| x.as_f64()).collect()) .unwrap_or_default(); ``` **Objects:** ```rust // simplified use serde_json::json; // Set nested object let config = json!({ "ip": "192.168.1.100", "port": 8080, "timeout_ms": 5000 }); self.params.set("network_config", config)?; // Get the object back if let Some(config) = self.params.get::>("network_config") { let ip = config.get("ip").and_then(|v| v.as_str()).unwrap_or("localhost"); let port = config.get("port").and_then(|v| v.as_i64()).unwrap_or(8080); } ``` ## Default Parameters When `RuntimeParams::init()` is called and no `.horus/config/params.yaml` file exists, HORUS provides these defaults: ```yaml # .horus/config/params.yaml (auto-generated defaults) # System tick_rate: 30 max_memory_mb: 512 # Motion max_speed: 1.0 max_angular_speed: 1.0 acceleration_limit: 0.5 # Sensors lidar_rate: 10 camera_fps: 30 sensor_timeout_ms: 1000 # Safety emergency_stop_distance: 0.3 collision_threshold: 0.5 # PID pid_kp: 1.0 pid_ki: 0.1 pid_kd: 0.05 ``` 
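How a flat, alphabetically sorted key store with fallback defaults behaves can be sketched with the standard library alone. This is not `RuntimeParams` itself: `default_params` and the free function `get_or` are stand-ins written for illustration, and only a subset of the defaults listed above is included.

```rust
use std::collections::BTreeMap;

/// Illustrative subset of the defaults listed above, stored under flat keys.
fn default_params() -> BTreeMap<String, f64> {
    [
        ("max_speed", 1.0),
        ("pid_kp", 1.0),
        ("pid_ki", 0.1),
        ("pid_kd", 0.05),
    ]
    .iter()
    .map(|&(key, value)| (key.to_string(), value))
    .collect()
}

/// Look up `key`, falling back to `default` when the key is absent.
fn get_or(params: &BTreeMap<String, f64>, key: &str, default: f64) -> f64 {
    params.get(key).copied().unwrap_or(default)
}
```

A key present in the map wins over the caller's default, and a missing key never panics. That is the behavior the `get_or` calls in this guide rely on.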
**Customization:** 1. Defaults are loaded if no params file exists in the project 2. Edit the YAML file directly, use the monitor, or use `horus param set` 3. Call `params.reset()` to restore defaults 4. Call `params.save_to_disk()` to persist changes ## Managing Parameters ### Via Monitor **Web interface** (easiest method): ```bash # Start monitor horus monitor # Navigate to Parameters tab # View all parameters # Edit values inline # Changes auto-save to disk ``` **Features:** - Live editing with validation - Type indicators (number/string/boolean) - Export entire parameter set - Import from YAML/JSON - Delete individual parameters See [Monitor Guide](/development/monitor#tune-parameters-live) for API details. ### Via Code **Save to disk:** ```rust // simplified // Parameters are NOT auto-saved on set() — you must save explicitly self.params.save_to_disk()?; ``` **Load from disk:** ```rust // simplified use std::path::Path; // Load from a specific file self.params.load_from_disk(Path::new(".horus/config/params.yaml"))?; ``` ### Via CLI Use `horus param` to manage parameters from the command line: ```bash # List all parameters horus param list horus param list --verbose # Include metadata horus param list --json # JSON output # Get/set values horus param get max_speed horus param set max_speed 2.0 horus param set enabled true # Delete a parameter horus param delete old_key # Reset all parameters to defaults horus param reset horus param reset --force # Skip confirmation # Save/load from files horus param save -o my_preset.yaml horus param load my_preset.yaml # Dump all parameters as YAML to stdout horus param dump ``` ### Via File Edit **Direct YAML editing:** ```bash # Edit parameters file (project-relative) vim .horus/config/params.yaml # Changes take effect on next RuntimeParams::init() or load_from_disk() ``` **Format:** ```yaml # Use spaces (2 or 4), not tabs max_speed: 2.0 # number mode: "auto" # string (quotes optional for simple strings) enabled: true # 
boolean rates: [10, 30, 100] # array # Comments are preserved pid_kp: 1.0 # Proportional gain pid_ki: 0.1 # Integral gain pid_kd: 0.05 # Derivative gain ``` ## Common Patterns ### PID Controller Tuning ```rust // simplified use horus::prelude::*; pub struct PIDController { params: RuntimeParams, kp: f64, ki: f64, kd: f64, integral: f64, last_error: f64, } impl Node for PIDController { fn name(&self) -> &str { "pid_controller" } fn init(&mut self) -> Result<()> { self.kp = self.params.get_or("pid_kp", 1.0); self.ki = self.params.get_or("pid_ki", 0.1); self.kd = self.params.get_or("pid_kd", 0.05); hlog!(info, "PID: Kp={}, Ki={}, Kd={}", self.kp, self.ki, self.kd); Ok(()) } fn tick(&mut self) { let error = self.compute_error(); self.integral += error; let derivative = error - self.last_error; let output = self.kp * error + self.ki * self.integral + self.kd * derivative; self.last_error = error; // Use output... } } impl PIDController { fn compute_error(&self) -> f64 { // Your error calculation 0.0 } } ``` **Tuning workflow:** 1. Start robot with default gains 2. Open monitor → Parameters 3. Adjust `pid_kp`/`pid_ki`/`pid_kd` while robot runs 4. Observe behavior in monitor metrics 5. Repeat until satisfactory 6. 
Save with `horus param save` or `params.save_to_disk()` ### Feature Flags ```rust // simplified pub struct AdvancedController { params: RuntimeParams, enable_obstacle_avoidance: bool, enable_path_planning: bool, enable_localization: bool, } impl Node for AdvancedController { fn name(&self) -> &str { "advanced_controller" } fn init(&mut self) -> Result<()> { self.enable_obstacle_avoidance = self.params.get_or("obstacle_avoidance", false); self.enable_path_planning = self.params.get_or("path_planning", false); self.enable_localization = self.params.get_or("localization", true); Ok(()) } fn tick(&mut self) { if self.enable_localization { self.update_localization(); } if self.enable_obstacle_avoidance { self.avoid_obstacles(); } if self.enable_path_planning { self.plan_path(); } } } impl AdvancedController { fn update_localization(&mut self) { /* ... */ } fn avoid_obstacles(&mut self) { /* ... */ } fn plan_path(&mut self) { /* ... */ } } ``` ### Environment-Specific Config ```rust // simplified pub struct NetworkNode { params: RuntimeParams, server_url: String, timeout_ms: u64, } impl Node for NetworkNode { fn name(&self) -> &str { "network_node" } fn init(&mut self) -> Result<()> { let env: String = self.params.get_or("environment", "development".to_string()); match env.as_str() { "production" => { self.server_url = self.params.get_or("prod_url", "prod.example.com:8080".to_string()); self.timeout_ms = self.params.get_or("prod_timeout_ms", 3000_i64) as u64; }, "staging" => { self.server_url = self.params.get_or("staging_url", "staging.example.com:8080".to_string()); self.timeout_ms = self.params.get_or("staging_timeout_ms", 5000_i64) as u64; }, _ => { self.server_url = self.params.get_or("dev_url", "localhost:8080".to_string()); self.timeout_ms = self.params.get_or("dev_timeout_ms", 10000_i64) as u64; } } hlog!(info, "Connecting to {} (timeout: {}ms)", self.server_url, self.timeout_ms); Ok(()) } fn tick(&mut self) { // Network logic... 
    }
}
```

### Rate Limiting

```rust
// simplified
pub struct SensorPublisher {
    params: RuntimeParams,
    publish_rate: u64,
    last_publish: std::time::Instant,
}

impl Node for SensorPublisher {
    fn name(&self) -> &str { "sensor_publisher" }

    fn init(&mut self) -> Result<()> {
        // Clamp to at least 1 Hz so the interval division in tick() cannot divide by zero
        self.publish_rate = self.params.get_or("publish_rate", 10_i64).max(1) as u64;
        hlog!(info, "Publishing at {} Hz", self.publish_rate);
        Ok(())
    }

    fn tick(&mut self) {
        let interval = std::time::Duration::from_millis(1000 / self.publish_rate);
        if self.last_publish.elapsed() >= interval {
            self.publish_data();
            self.last_publish = std::time::Instant::now();
        }
    }
}

impl SensorPublisher {
    fn publish_data(&mut self) {
        // Publishing logic
    }
}
```

## Best Practices

### Naming Conventions

**Use descriptive names:**

```yaml
# Good
lidar_scan_rate: 10
camera_resolution_width: 1920

# Bad
rate: 10
w: 1920
```

**Use consistent snake_case:**

```yaml
# Good (snake_case)
max_speed: 1.5
acceleration_limit: 0.5

# Bad (mixed casing)
maxSpeed: 1.5
acceleration_limit: 0.5
```

### Always Provide Defaults

**Never crash on missing parameters:**

```rust
// simplified
// Good - provides fallback
let speed = self.params.get_or("max_speed", 1.5);

// Bad - panics if missing
let speed = self.params.get::<f64>("max_speed").unwrap();
```

**Use sensible defaults:**

```rust
// simplified
// Good - safe defaults
let emergency_stop = self.params.get_or("emergency_stop", true); // Default to safe state
let max_speed = self.params.get_or("max_speed", 1.0);            // Default to slow

// Bad - unsafe defaults
let emergency_stop = self.params.get_or("emergency_stop", false); // Unsafe!
let max_speed = self.params.get_or("max_speed", 100.0);           // Too fast!
``` ### Document Parameters **Add comments in YAML:** ```yaml # Maximum linear velocity in m/s (default: 1.0) max_speed: 1.0 # Maximum angular velocity in rad/s (default: 1.0) max_angular_speed: 1.0 # Proportional gain - affects responsiveness (range: 0.1-10.0) pid_kp: 1.0 # Integral gain - affects steady-state error (range: 0.01-1.0) pid_ki: 0.1 # Derivative gain - affects damping (range: 0.001-0.1) pid_kd: 0.05 ``` **Add documentation in code:** ```rust // simplified fn init(&mut self) -> Result<()> { // Load PID gains (tuning range: Kp=0.1-10, Ki=0.01-1, Kd=0.001-0.1) self.kp = self.params.get_or("pid_kp", 1.0); self.ki = self.params.get_or("pid_ki", 0.1); self.kd = self.params.get_or("pid_kd", 0.05); Ok(()) } ``` ### Validate Parameter Values **Use built-in validation rules:** RuntimeParams supports metadata with validation rules that are checked on `set()`: ```rust // simplified use horus::prelude::*; // Provides RuntimeParams, ParamMetadata, ValidationRule let params = RuntimeParams::init()?; // Set validation rules for a parameter params.set_metadata("max_speed", ParamMetadata { description: Some("Maximum robot speed".to_string()), unit: Some("m/s".to_string()), validation: vec![ValidationRule::Range(0.0, 5.0)], read_only: false, })?; // This succeeds: params.set("max_speed", 2.0)?; // This returns an error (out of range): params.set("max_speed", 10.0)?; // Error! 
``` **Available validation rules:** | Rule | Description | |------|-------------| | `MinValue(f64)` | Minimum numeric value | | `MaxValue(f64)` | Maximum numeric value | | `Range(f64, f64)` | Numeric range (min, max) | | `RegexPattern(String)` | String must match regex | | `Enum(Vec<String>)` | Value must be one of allowed strings | | `MinLength(usize)` | Minimum string/array length | | `MaxLength(usize)` | Maximum string/array length | | `RequiredKeys(Vec<String>)` | Object must contain these keys | **Read-only parameters:** ```rust // simplified params.set_metadata("version", ParamMetadata { description: Some("System version".to_string()), unit: None, validation: vec![], read_only: true, })?; // This returns an error: params.set("version", "1.0.0")?; // Error: Parameter 'version' is read-only ``` **Manual bounds checking in code:** ```rust // simplified fn init(&mut self) -> Result<()> { let speed: f64 = self.params.get_or("max_speed", 1.5); // Clamp to safe range self.max_speed = speed.max(0.0).min(5.0); if speed != self.max_speed { hlog!(warn, "max_speed {} out of range, clamped to {}", speed, self.max_speed); } Ok(()) } ``` ### Export Parameter Sets **Create presets for different scenarios:** ```bash # Save current parameters to a preset file horus param save -o aggressive_tuning.yaml # Backup current params and switch to a different preset cp .horus/config/params.yaml .horus/config/params_backup.yaml horus param load aggressive_tuning.yaml # Dump current params to stdout for inspection horus param dump ``` ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Parameters show default values despite YAML file existing | YAML syntax error, wrong file location, or file permissions | Validate with `yamllint .horus/config/params.yaml`, check `ls -la .horus/config/params.yaml`, verify with `horus param list` | | Changes via `set()` do not persist after restart | `set()` only updates in-memory storage | Call `self.params.save_to_disk()?` explicitly, or use `horus param 
save` from CLI | | Type mismatch error (`expected f64, got String`) | YAML value quoted as string (`"1.5"` instead of `1.5`) | Remove quotes in YAML: use `max_speed: 1.5` not `max_speed: "1.5"` | | Parameters reset to defaults after update | `.horus/config/params.yaml` was deleted or is empty | Backup with `horus param save -o params_backup.yaml` before updates, restore with `horus param load params_backup.yaml` | | Validation error on `set()` | Value violates metadata rules (range, regex, read-only) | Check metadata with `horus param list --verbose`, adjust value to match constraints | | Parameter not found by other nodes | Nodes using different `RuntimeParams` instances | All nodes should call `RuntimeParams::init()` which shares the same backing store | ## Performance Considerations ### Access Speed Parameters are stored in a `BTreeMap` behind an `Arc`-shared read-write lock: - **Read**: ~80-350ns (read lock + BTreeMap lookup) - **Write**: ~100-500ns (write lock + BTreeMap insert + potential save) - **Thread-safe**: Multiple nodes can read simultaneously **Fast enough for:** - Loading parameters in `init()` (one-time) - Checking parameters every tick (60 Hz) - Checking parameters every 100 ticks (~1.7 seconds at 60 Hz) **Too slow for:** - Reading hundreds of parameters every tick - Using as real-time message passing (use buffered `Topic` instead) ### Caching Strategy **Good: Cache and reload periodically** ```rust // simplified fn tick(&mut self) { // Reload every 60 ticks (~1 second) if self.reload_counter % 60 == 0 { self.max_speed = self.params.get_or("max_speed", 1.5); } self.reload_counter += 1; // Use cached value let velocity = calculate_velocity(self.max_speed); } ``` **Bad: Read every tick unnecessarily** ```rust // simplified fn tick(&mut self) { // Wasteful - reads same value 60 times per second let max_speed = self.params.get_or("max_speed", 1.5); } ``` ## Version Tracking RuntimeParams includes an optimistic locking system for concurrent edit protection: ```rust // simplified // Get current version of a parameter 
let version = self.params.get_version("max_speed"); // Set with version check — fails if another writer changed it self.params.set_with_version("max_speed", 2.0, version)?; ``` This prevents lost updates when multiple processes or threads modify the same parameter simultaneously. ## Audit Logging Parameter changes are automatically logged to `.horus/logs/param_changes.log`: ``` [2025-01-15 14:30:00] max_speed: 1.0 -> 2.0 [2025-01-15 14:31:15] pid_kp: 1.0 -> 1.5 ``` This provides a history of all runtime parameter modifications for debugging and tuning review. --- ## See Also - [RuntimeParams API](/rust/api/runtime-params) — Complete API reference for `RuntimeParams` methods - [CLI Reference: horus param](/development/cli-reference) — Command-line parameter management (get, set, list, save, load) - [Monitor](/development/monitor) — Live parameter tuning via web interface Parameters tab - [horus.toml](/concepts/horus-toml) — Build-time project configuration (complementary to runtime parameters) - [Nodes](/concepts/core-concepts-nodes) — Node lifecycle where parameters are typically accessed --- ## Static Analysis Path: /development/static-analysis Description: Project validation and code checking with horus check, horus lint, and horus fmt # Static Analysis You need to catch configuration errors, type issues, and code quality problems before running your robot. HORUS provides `horus check` for project validation, `horus lint` for code quality, and `horus fmt` for consistent formatting. ## When To Use This - Validating `horus.toml` and workspace configuration before building - Running pre-deployment checks in CI/CD pipelines - Catching Rust borrow checker and type errors before runtime - Enforcing code style with formatting and linting **Use [Testing](/development/testing) instead** if you need to verify runtime behavior, not static correctness. 
**Use [horus doctor](/development/cli-reference) instead** if you need to check the environment (toolchains, system dependencies, SHM availability). ## Prerequisites - A HORUS project with `horus.toml` - Rust toolchain (for Rust projects) or Python 3 (for Python projects) ## Solution ### Quick Start ```bash # Validate project configuration and code horus check # Format all code (Rust + Python) horus fmt # Lint all code (clippy + ruff) horus lint # CI pipeline: fail on any issue horus fmt --check && horus lint && horus check ``` ## What `horus check` Validates ### Phase 1: Manifest Validation Validates your `horus.toml` project manifest: | Check | Description | |-------|-------------| | **Project name** | Must be present, lowercase alphanumeric + hyphens/underscores | | **Version** | Must be valid semver | | **Build file** | Detects `Cargo.toml` (Rust) or `pyproject.toml` (Python) | | **Required fields** | Ensures all mandatory fields are present | ```bash horus check ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Phase 1: Manifest Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✓ Project name valid ✓ Version is valid semver ✓ Build file detected (Cargo.toml) ✓ Configuration valid ``` ### Phase 2: Rust Deep Check For Rust projects, runs `cargo check` to verify code compiles: ```bash ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Phase 2: Rust Deep Check ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Running cargo check... ✓ Rust code compiles successfully ``` This catches type errors, borrow checker issues, and missing imports before runtime. ### Phase 3: Python Validation For Python projects, checks syntax and imports: | Check | Description | |-------|-------------| | **Syntax** | Runs `py_compile` on all `.py` files | | **Imports** | Verifies that imported modules are available | ```bash ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Phase 3: Python Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Checking Python files... 
✓ main.py - syntax OK ✓ nodes/sensor.py - syntax OK ✓ All imports resolvable ``` ### Additional Checks `horus check` also validates: - **Toolchain**: Verifies Rust toolchain and Python interpreter are available - **System requirements**: Checks disk space and system dependencies - **API usage**: Validates HORUS API usage patterns in project code - **Registry connectivity**: Tests connection to the HORUS package registry (if packages are declared) ## Example Output ```bash $ horus check ╔══════════════════════════════════════════╗ ║ HORUS Project Check ║ ╚══════════════════════════════════════════╝ Project: my_robot (v0.1.0) Language: python Phase 1: Manifest ...................... ✓ Phase 2: Python Validation ............ ✓ Phase 3: System Requirements .......... ✓ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Result: All checks passed ✓ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ``` ## Variations ### CI/CD Pipeline ```bash #!/bin/bash set -e horus fmt --check # Fail if code is unformatted horus lint # Fail on lint errors horus check --quiet # Validate project (errors only) horus test # Run all tests horus doc --coverage --fail-under 80 # Doc coverage gate ``` ### Lint Configuration `horus lint` runs clippy (Rust) and ruff check (Python). Configure clippy via Cargo.toml `[lints]`. Configure ruff via `ruff.toml` or `pyproject.toml [tool.ruff]`. ### Machine-Readable Output For CI, use `horus build --json-diagnostics` for machine-parseable Cargo output, or `horus check --json` for structured validation output. 
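
As a rough illustration of consuming that output in CI, the sketch below tallies diagnostics by severity. It assumes `horus build --json-diagnostics` forwards Cargo's JSON message stream unchanged (one JSON object per line with `reason`, `message.level`, and `message.spans` fields, as `cargo build --message-format=json` emits). That pass-through is an assumption that this page does not confirm, and `summarize_diagnostics` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_diagnostics(lines):
    """Count Cargo-style compiler diagnostics by level.

    Assumes one JSON object per line in Cargo's message format.
    Lines that are not JSON objects (progress text, blanks) are skipped.
    """
    counts = Counter()
    locations = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. human-readable progress output
        if not isinstance(event, dict) or event.get("reason") != "compiler-message":
            continue  # ignore artifact and build-finished events
        message = event.get("message", {})
        level = message.get("level", "unknown")
        counts[level] += 1
        # Report the primary span, if the diagnostic carries one.
        primary = next(
            (s for s in message.get("spans") or [] if s.get("is_primary")), None
        )
        if primary:
            locations.append(
                (level, primary["file_name"], primary["line_start"], message.get("message", ""))
            )
    return counts, locations


# Hand-written sample lines in Cargo's format (not real HORUS output).
sample = [
    '{"reason":"compiler-message","message":{"level":"warning","message":"unused variable: x","spans":[{"file_name":"src/main.rs","line_start":12,"is_primary":true}]}}',
    '{"reason":"compiler-message","message":{"level":"error","message":"mismatched types","spans":[{"file_name":"src/main.rs","line_start":30,"is_primary":true}]}}',
    '{"reason":"build-finished","success":false}',
    "Compiling my_robot v0.1.0",
]

counts, locations = summarize_diagnostics(sample)
for level, file_name, line, text in locations:
    print(f"{level}: {file_name}:{line}: {text}")
```

In a pipeline, the same function could be fed the build's standard output line by line, and the job failed whenever `counts["error"]` is non-zero.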
### Auto-Fix Lint Issues ```bash # Fix what can be auto-fixed horus lint --fix horus fmt ``` ### CI Pipeline (Full Example) ```bash #!/bin/bash set -euo pipefail # Stage 1: Static analysis horus fmt --check # Fail if code is unformatted horus lint # Fail on lint errors horus check --json # Structured validation output # Stage 2: Build and test horus build --json-diagnostics # Machine-parseable build output horus test # Run all tests ``` ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `horus check` fails on manifest | Invalid `horus.toml` syntax or missing fields | Run `horus check` and fix the reported issues | | `cargo check` fails | Rust compilation errors (types, borrows, imports) | Fix the errors shown in the output | | Python syntax check fails | Invalid Python syntax in `.py` files | Fix the syntax errors shown by `py_compile` | | `horus fmt --check` exits non-zero | Unformatted code | Run `horus fmt` to auto-format | | `horus lint` reports warnings | Clippy or ruff detected anti-patterns | Run `horus lint --fix` for auto-fixable issues, fix remaining manually | --- ## See Also - [CLI Reference](/development/cli-reference) — Full `horus check`, `horus fmt`, and `horus lint` options - [Testing](/development/testing) — Runtime testing after static validation passes - [Error Handling](/development/error-handling) — Error types and propagation patterns - [AI-Assisted Development](/development/ai-assisted-development) — Structured diagnostics with `--json-diagnostics` for AI agents - [Getting Started](/getting-started/quick-start) — Project setup guide --- ## CLI Reference Path: /development/cli-reference Description: Complete reference for all HORUS CLI commands with options, examples, and errors # CLI Reference Provides the complete command-line interface for building, running, testing, monitoring, and managing HORUS robotics applications. 
## Quick Reference | Category | Command | Description | |----------|---------|-------------| | **Project** | `horus init` | Initialize workspace in current directory | | | `horus new ` | Create a new project | | | `horus run [files...]` | Build and run your app | | | `horus build [files...]` | Build without running | | | `horus test [filter]` | Run tests | | | `horus check [path]` | Validate `horus.toml` and workspace | | | `horus clean` | Clean build artifacts and shared memory | | | `horus lock` | Generate or verify `horus.lock` | | | `horus launch ` | Launch multiple nodes from YAML | | **Monitoring** | `horus monitor [port]` | Monitor your system (web or TUI) | | | `horus topic ` | Topic introspection (list, echo, info, hz, pub) | | | `horus node ` | Node management (list, info, kill, restart, pause, resume) | | | `horus service ` | Service interaction (list, call, info, find) | | | `horus action ` | Action introspection (list, info, send-goal, cancel-goal) | | | `horus log [node]` | View and filter logs | | | `horus blackbox` | Inspect BlackBox flight recorder (alias: `bb`) | | **Frames** | `horus frame ` | Transform Frames (list, echo, tree, info, can, hz) (alias: `tf`) | | **Dependencies** | `horus add ` | Add a dependency to `horus.toml` (auto-detects source) | | | `horus remove ` | Remove a dependency from `horus.toml` | | **Packages** | `horus install ` | Install standalone package or plugin from registry | | | `horus uninstall ` | Uninstall a standalone package or plugin | | | `horus list [query]` | List/search packages (`--global`, `--all`, `--json`) | | | `horus search ` | Search available packages/plugins | | | `horus info ` | Show detailed package/plugin info | | | `horus update [package]` | Update project dependencies | | | `horus publish` | Publish current package to registry | | | `horus unpublish ` | Unpublish a package from registry | | | `horus yank ` | Yank a package version | | | `horus deprecate ` | Mark a package as deprecated | | | `horus 
owner ` | Manage package owners (list, add, remove, transfer) | | | `horus cache ` | Cache management (info, list, clean, purge) | | **Params** | `horus param ` | Parameter management (get, set, list, delete, reset, dump, load, save) | | | `horus msg ` | Message type introspection (list, info, hash) | | **Dev Tools** | `horus fmt` | Format code (Rust + Python) | | | `horus lint` | Lint code (clippy + ruff/pylint) | | | `horus doc` | Generate documentation | | | `horus bench [filter]` | Run benchmarks | | | `horus deps ` | Dependency insight (tree, why, outdated, audit) | | **Maintenance** | `horus doctor` | Comprehensive ecosystem health check | | | `horus self update` | Update the horus CLI to latest version | | | `horus config ` | View/edit `horus.toml` settings (get, set, list) | | | `horus migrate` | Migrate project to unified `horus.toml` format | | **Advanced** | `horus deploy [target]` | Deploy to remote robot | | | `horus record ` | Record/replay for debugging and testing | | | `horus scripts [name]` | Run a script from `horus.toml` `[scripts]` | | **Plugins** | `horus plugin ` | Plugin management (enable, disable, verify) | | **Auth** | `horus auth ` | Authentication (login, api-key, signing-key, logout, whoami) | --- ## `horus init` - Initialize Workspace **What it does**: Initializes a HORUS workspace in the current directory, creating the necessary configuration files. **Why it's useful**: Quickly set up an existing directory as a HORUS project without creating new files from templates. 
### Basic Usage ```bash # Initialize in current directory (uses directory name) horus init # Initialize with custom name horus init --name my_robot ``` ### All Options ```bash horus init [OPTIONS] Options: -n, --name Workspace name (defaults to directory name) ``` ### Examples **Initialize existing code as HORUS project**: ```bash cd ~/my-robot-code horus init # Creates horus.toml with project configuration ``` **Initialize with specific name**: ```bash horus init --name sensor_array ``` ### What Gets Created Running `horus init` creates: - `horus.toml` - Project config with name and version This is useful when you have existing code and want to add HORUS support, or when setting up a workspace that will contain multiple HORUS projects. --- ## `horus new` - Create Projects **What it does**: Creates a new HORUS project with all the boilerplate set up for you. **Why it's useful**: Minimal configuration required. Select a language and begin development. ### Basic Usage ```bash # Interactive mode (asks you questions) horus new my_project # Rust with node! macro (recommended for reduced boilerplate) horus new my_project --macro # Python project horus new my_project --python ``` ### All Options ```bash horus new [OPTIONS] Options: -m, --macro Rust with node! macro (less boilerplate) -r, --rust Plain Rust project -p, --python Python project -w, --workspace Create a workspace project with multiple crates -l, --lib Create a library crate (instead of binary) -o, --output Where to create it (default: current directory) ``` ### Examples **Start with Rust + macros** (easiest): ```bash horus new temperature_monitor --macro cd temperature_monitor horus run ``` **Python for prototyping**: ```bash horus new sensor_test --python cd sensor_test python main.py ``` **Put it somewhere specific**: ```bash horus new robot_controller --output ~/projects/robots ``` --- ## `horus run` - Build and Run **What it does**: Compiles your code and runs it. Handles all the build tools for you. 
**Why it's useful**: One command works for Rust and Python. Language is auto-detected from native build files (`Cargo.toml` for Rust, `pyproject.toml` for Python). For Rust, it builds with Cargo. For Python, it handles the appropriate tooling. ### Basic Usage ```bash # Run current directory (finds main.rs or main.py) horus run # Run specific file horus run src/controller.rs # Run optimized (release mode) horus run --release ``` ### All Options ```bash horus run [FILES...] [OPTIONS] [-- ARGS] Options: -r, --release Optimize for speed (recommended for benchmarks) -c, --clean Remove cached build artifacts and dependencies (Use after updating HORUS or when compilation fails) -q, --quiet Suppress progress indicators -p, --package Build and run a specific workspace member -d, --drivers Override detected drivers (comma-separated) Example: --drivers camera,lidar,imu -e, --enable Enable capabilities (comma-separated) Example: --enable cuda,editor,python --sim Enable simulation mode — hardware entries with sim = true are replaced with stubs --record Enable recording for this session --json Output build diagnostics as JSON --json-diagnostics Output Cargo diagnostics as JSON --no-hooks Skip pre/post build hooks -- Arguments for your program ``` ### Using --sim for Simulation Mode The `--sim` flag runs your project in simulation mode without physical hardware: ```bash # Run with simulated drivers horus run --sim # Combine with release mode horus run --sim --release # Pass arguments to your program horus run --sim -- --config warehouse.yaml ``` When `--sim` is active: 1. Hardware entries with `sim = true` are replaced with stub nodes 2. The simulator plugin specified in `[robot] simulator` (default: `"sim3d"`) is launched automatically 3. 
Your node code runs unchanged — the same topics and message types are used Configure simulation hardware in `horus.toml`: ```toml [hardware] front_lidar = { use = "rplidar", port = "/dev/ttyUSB0", sim = true } imu = { use = "bno055", bus = 1, sim = true } ``` See [Configuration Reference: \[hardware\]](/package-management/configuration#hardware) for full details. ### Using --enable for Capabilities The `--enable` flag lets you quickly enable features without editing `horus.toml`: ```bash # Enable CUDA GPU acceleration horus run --enable cuda # Enable multiple capabilities horus run --enable cuda,editor,python # Combine with hardware features horus run --enable gpio,i2c --release ``` **Available capabilities:** | Capability | Description | |------------|-------------| | `cuda`, `gpu` | CUDA GPU acceleration | | `editor` | Scene editor UI | | `python`, `py` | Python bindings | | `headless` | No rendering (for training) | | `gpio`, `i2c`, `spi`, `can`, `serial` | Hardware interfaces | | `opencv` | OpenCV backend | | `realsense` | Intel RealSense support | | `full` | All features | **Or configure in horus.toml:** ```toml enable = ["cuda", "editor"] ``` ### Why --release Matters Debug builds have significantly higher overhead than release builds due to runtime checks and lack of optimizations. **Debug mode** (default): Fast compilation, slower execution - Use case: Development iteration - Typical tick time: 60-200μs - Includes overflow checks, bounds checking, assertions **Release mode** (`--release`): Slower compilation, optimized execution - Use case: Performance testing, benchmarks, production deployment - Typical tick time: 1-3μs - Full compiler optimizations enabled **Common Mistake:** ```bash horus run # Debug mode # You see: [IPC: 1862ns | Tick: 87μs] - Looks slow! horus run --release # Release mode # You see: [IPC: 947ns | Tick: 2μs] - Actually fast! 
``` **The tick time difference is dramatic:** - Debug: 60-200μs per tick (too slow for real-time control) - Release: 1-3μs per tick (production-ready performance) **Rule of thumb:** Always use `--release` when: - Measuring performance - Running benchmarks - Testing real-time control loops - Deploying to production - Wondering "why is HORUS slow?" ### Why --clean Matters The `--clean` flag removes the `.horus/target/` directory, which contains cached build artifacts and dependencies. **When to use `--clean`:** 1. **After updating HORUS** - Most common use case ```bash # You updated horus CLI to a new version horus run --clean ``` This fixes version mismatch errors like: ``` error: the package `horus` depends on `horus_core 0.1.0`, but `horus_core 0.1.3` is installed ``` 2. **Compilation fails mysteriously** ```bash horus run --clean # Sometimes cached state gets corrupted ``` 3. **Dependencies changed** ```bash # You modified Cargo.toml dependencies horus run --clean ``` **What it does:** - Removes `.horus/target/` (build artifacts) - Removes cached lock files - Forces fresh dependency resolution - Next build rebuilds everything from scratch **Trade-off:** - First build after `--clean` is slower (5-30 seconds) - Subsequent builds are fast again (incremental compilation) **Note:** The `--clean` flag only affects the current project's `.horus/` directory, not the global `~/.horus/` cache. 
### Examples **Daily development**: ```bash horus run # Fast iteration, slower execution ``` **Testing performance**: ```bash horus run --release # See real speed ``` **Build for CI without running** (use `horus build`): ```bash horus build --release ``` **Fresh build** (when things act weird or after updating HORUS): ```bash horus run --clean --release ``` **After updating HORUS CLI** (fixes version mismatch errors): ```bash # Clean removes cached dependencies from .horus/target/ horus run --clean ``` **Pass arguments to your program**: ```bash horus run -- --config robot.yaml --verbose ``` ### Important: Single-File Projects Only `horus run` is designed for **single-file HORUS projects** (main.rs or main.py). It creates a temporary workspace in `.horus/` and automatically handles dependencies. **What works with `horus run`:** - Single main.rs with all nodes defined in one file - Simple Python scripts (main.py) **What doesn't work:** - Multi-crate Cargo workspaces - Projects with multiple Cargo.toml files - Complex module structures with separate crate directories **For multi-crate projects**, use `cargo` directly: ```bash cd your_multi_crate_project cargo build --release cargo run --release ``` **Example of a proper single-file structure:** ### Concurrent Multi-Process Execution HORUS supports running multiple node files concurrently as separate processes using **glob patterns**. This is ideal for distributed robotics systems where nodes need to run independently. **Basic Usage:** ```bash horus run "nodes/*.py" # Run all Python nodes concurrently horus run "src/*.rs" # Run all Rust nodes concurrently ``` **How it works:** 1. **Phase 1 (Build)**: Builds all files sequentially, respecting Cargo's file lock 2. **Phase 2 (Execute)**: Spawns all processes concurrently with their own schedulers 3. 
Each process communicates via HORUS shared memory IPC **Features:** - **Color-coded output**: Each node is prefixed with `[node_name]` in a unique color - **Graceful shutdown**: Ctrl+C cleanly terminates all processes - **Multi-language**: Works with Rust and Python - **Automatic detection**: No flags needed, just use glob patterns **Example output:** ```bash $ horus run "nodes/*.py" Executing 3 files concurrently: 1. nodes/sensor.py (python) 2. nodes/controller.py (python) 3. nodes/logger.py (python) Phase 1: Building all files... Phase 2: Starting all processes... Started [sensor] Started [controller] Started [logger] All processes running. Press Ctrl+C to stop. [sensor] Sensor reading: 25.3°C [controller] Motor speed: 45% [logger] System operational [sensor] Sensor reading: 26.1°C [controller] Motor speed: 50% [logger] System operational ``` **When to use concurrent execution:** - Multi-node systems where each node is in a separate file - Distributed control architectures (similar to ROS nodes) - Testing multiple nodes simultaneously - Microservices-style robotics architectures **When to use single-process execution:** - All nodes in one file (typical for simple projects) - Projects requiring predictable scheduling across all nodes - Maximum performance with minimal overhead **Important:** Each process runs its own scheduler. Nodes communicate through HORUS shared memory topics, not direct function calls. 
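
To make the spawn-and-prefix pattern concrete, here is a minimal sketch of the general technique: one OS process per node file, with each output line tagged by a name derived from the file stem. This is not HORUS's implementation. `run_concurrently` and `node_name` are hypothetical helpers, and the demo launches stand-in Python one-liners rather than real nodes, so it exercises only process handling and output prefixing, not shared memory IPC.

```python
import subprocess
import sys
import threading
from pathlib import Path


def node_name(path):
    """Derive a display name from a node file path, e.g. nodes/sensor.py -> sensor."""
    return Path(path).stem


def run_concurrently(commands):
    """Run each argv list as its own process and collect prefixed output lines.

    `commands` maps a node name to the argv list that starts it.
    Returns (lines, exit_codes).
    """
    lines = []
    lock = threading.Lock()

    def pump(name, proc):
        # Tag every stdout line with the node name, like "[sensor] ...".
        for line in proc.stdout:
            with lock:
                lines.append(f"[{name}] {line.rstrip()}")

    # Start all processes first so they run side by side.
    procs = {
        name: subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for name, argv in commands.items()
    }
    threads = [threading.Thread(target=pump, args=(n, p)) for n, p in procs.items()]
    for t in threads:
        t.start()
    try:
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        # Ctrl+C: ask every child to stop, then wait for the readers to drain.
        for p in procs.values():
            p.terminate()
        for t in threads:
            t.join()
    exit_codes = {name: p.wait() for name, p in procs.items()}
    return lines, exit_codes


# Stand-in commands; real node files would be launched the same way.
demo = {
    node_name("nodes/sensor.py"): [sys.executable, "-c", "print('Sensor reading: 25.3')"],
    node_name("nodes/logger.py"): [sys.executable, "-c", "print('System operational')"],
}
lines, exit_codes = run_concurrently(demo)
for line in sorted(lines):
    print(line)
```

Because the processes run independently, the order in which their lines arrive is not deterministic, which is why the demo sorts before printing.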
### Common Errors | Error | Cause | Fix | |-------|-------|-----| | `No horus.toml found` | Not in a project directory | `cd` into your project or run `horus init` | | `No main source file found` | Missing `src/main.rs` or `src/main.py` | Create the file or specify: `horus run src/your_file.rs` | | `Compilation failed` | Rust/Python syntax errors | Fix the errors shown in the output | | `Permission denied` on shared memory | SHM permissions | Run `horus doctor` to diagnose, then fix OS-level permissions | | `Address already in use` | Another horus process running | `horus clean --shm` to clear stale SHM | --- ## `horus check` - Validate Project **What it does**: Validates `horus.toml`, source files, and the workspace configuration. **Why it's useful**: Quickly diagnose configuration issues, missing dependencies, or environment problems before building. ### Basic Usage ```bash # Check current directory horus check # Check a specific path horus check path/to/project # Quiet mode (errors only) horus check --quiet ``` ### All Options ```bash horus check [OPTIONS] [PATH] Arguments: [PATH] Path to file, directory, or workspace (default: current directory) Options: --json Output as JSON --full Run full validation (all checks) --health Run health checks on running system ``` ### Examples **Validate before building**: ```bash horus check # Validates horus.toml, build files, and environment ``` **CI/CD validation**: ```bash #!/bin/bash if ! horus check --quiet; then echo "Validation failed" exit 1 fi horus build --release ``` **Related:** - [Static Analysis](/development/static-analysis) — Detailed validation phases and output format - [AI-Assisted Development](/development/ai-assisted-development) — Structured diagnostics for AI coding agents --- ## `horus monitor` - Monitor Everything **What it does**: Opens a visual monitor showing all your running nodes, messages, and performance. **Why it's useful**: Debug problems visually. See message flow in real-time. 
Monitor performance. ### Basic Usage ```bash # Web monitor (opens in browser) horus monitor # Different port horus monitor 8080 # Text-based (for SSH) horus monitor --tui # Reset monitor password before starting horus monitor --reset-password # Disable authentication (local development only) horus monitor --no-auth ``` ### What You See The monitor shows: - **All running nodes** - Names, status, tick rates - **Message flow** - What's talking to what - **Performance** - CPU, memory, latency per node - **Topics** - All active communication channels - **Graph view** - Visual network of your system ### Examples **Start monitoring** (in a second terminal): ```bash # Terminal 1: Run your app horus run --release # Terminal 2: Watch it horus monitor ``` **Access from your phone**: ```bash horus monitor # Visit http://your-computer-ip:3000 from phone ``` **Monitor over SSH**: ```bash ssh robot@192.168.1.100 horus monitor --tui ``` See [Monitor Guide](/development/monitor) for detailed features. --- ## `horus search` - Search Packages **Alias**: `horus s` **What it does**: Search the HORUS registry for available packages and plugins. **Why it's useful**: Find drivers, libraries, and plugins before installing them. ### All Options ```bash horus search [OPTIONS] Arguments: Search query (e.g., "camera", "lidar", "motor") Options: -c, --category Filter by category (camera, lidar, imu, motor, servo, bus, gps, simulation, cli) --json Output as JSON ``` ### Examples **Search for packages**: ```bash horus search lidar ``` **Filter by category**: ```bash horus search driver --category camera ``` **Machine-readable output**: ```bash horus search motor --json ``` --- ## `horus info` - Package/Plugin Info **What it does**: Shows detailed information about a package or plugin, including version, description, dependencies, and metadata. **Why it's useful**: Inspect a package before installing, or check details of an installed package. 
### All Options ```bash horus info [OPTIONS] Arguments: Package or plugin name Options: --json Output as JSON ``` ### Examples **Show package details**: ```bash horus info pid-controller ``` **JSON output for tooling**: ```bash horus info rplidar-driver --json ``` --- ## `horus list` - List Installed Packages **What it does**: Lists installed packages and plugins in the current project or globally. **Why it's useful**: See what is installed, verify versions, and audit dependencies. ### All Options ```bash horus list [OPTIONS] Options: -g, --global List global scope packages only -a, --all List all (local + global) --json Output as JSON ``` ### Examples **List local packages**: ```bash horus list ``` **List everything (local + global)**: ```bash horus list --all ``` **List global packages as JSON**: ```bash horus list --global --json ``` --- ## `horus update` - Update Packages **What it does**: Updates project dependencies, the horus CLI tool itself, and installed plugins. **Why it's useful**: Keep dependencies current, apply security patches, and upgrade the CLI without manual reinstallation. ### All Options ```bash horus update [PACKAGE] [OPTIONS] Arguments: [PACKAGE] Specific package to update (updates all deps if omitted) Options: -g, --global Update global scope packages --dry-run Show what would be updated without making changes ``` ### Examples **Update all project dependencies**: ```bash horus update ``` **Update a specific package**: ```bash horus update pid-controller ``` **Preview updates without applying**: ```bash horus update --dry-run ``` **Update global packages**: ```bash horus update --global ``` --- ## `horus publish` - Publish Package **What it does**: Publishes the current package to the HORUS registry. Validates the package, builds it, and uploads to the registry. **Why it's useful**: Share your drivers, libraries, and tools with the community or your team. 
### All Options ```bash horus publish [OPTIONS] Options: --dry-run Validate and build the package without actually publishing ``` ### Examples **Publish to registry**: ```bash # First login horus auth login # Then publish from your project directory horus publish ``` **Validate without publishing**: ```bash horus publish --dry-run ``` --- ## `horus unpublish` - Unpublish Package **What it does**: Removes a published package version from the HORUS registry. **Why it's useful**: Retract a broken release or remove a package that should no longer be available. ### All Options ```bash horus unpublish [OPTIONS] Arguments: Package name (supports name@version syntax, e.g. my-pkg@1.0.0) Options: -y, --yes Skip confirmation prompt ``` ### Examples **Unpublish a specific version**: ```bash horus unpublish my-package@1.0.0 ``` **Skip confirmation**: ```bash horus unpublish my-package@1.0.0 --yes ``` --- ## `horus auth signing-key` - Generate Signing Keys **What it does**: Generates an Ed25519 signing key pair for package signing. The private key is stored locally and the public key can be shared or uploaded to the registry. **Why it's useful**: Sign your packages to prove authenticity. Users can verify that packages have not been tampered with. ### Basic Usage ```bash horus auth signing-key ``` ### Examples **Generate a new key pair**: ```bash horus auth signing-key # Generates Ed25519 key pair # Private key saved to ~/.horus/keys/ # Public key printed to stdout ``` --- ## `horus auth` - Authentication & Keys **What it does**: Authenticate with the registry, manage API keys, and generate signing keys. **Why it's useful**: Secure access to the package registry for publishing and downloading. 
### Commands ```bash # Login with GitHub horus auth login # Generate API key (for CI/CD publishing) horus auth api-key # Generate Ed25519 signing key pair (for package signing) horus auth signing-key # Check who you are horus auth whoami # Logout horus auth logout # Manage API keys horus auth keys list horus auth keys revoke ``` ### Examples **First time setup**: ```bash horus auth login # Opens browser for GitHub login ``` **Check you're logged in**: ```bash horus auth whoami ``` **Generate API key for CI/CD**: ```bash horus auth api-key --name github-actions --environment ci-cd # Save the generated key in your CI secrets ``` **Generate signing key for package integrity**: ```bash horus auth signing-key # Ed25519 key pair saved to ~/.horus/keys/ ``` **Logout**: ```bash horus auth logout ``` --- ## `horus build` - Build Without Running **What it does**: Compiles your project without executing it. **Why it's useful**: Validate compilation, prepare for deployment, or integrate with CI/CD pipelines. ### Basic Usage ```bash # Build current project horus build # Build in release mode horus build --release # Clean build (remove cached artifacts first) horus build --clean ``` ### All Options ```bash horus build [FILES...] 
[OPTIONS]

Options:
  -r, --release            Build in release mode (optimized)
  -c, --clean              Clean before building
  -q, --quiet              Suppress progress indicators
  -p, --package            Build a specific workspace member
  -d, --drivers            Override detected drivers (comma-separated)
  -e, --enable             Enable capabilities (comma-separated)
      --json               Output build diagnostics as JSON
      --json-diagnostics   Output Cargo diagnostics as JSON
      --no-hooks           Skip pre/post build hooks
```

### Examples

**CI/CD build validation**:
```bash
horus build --release
# Exit code 0 = success, non-zero = failure
```

**Clean release build for deployment**:
```bash
horus build --clean --release
```

### Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| `No horus.toml found` | Not in a project directory | Run `horus init` or `cd` into the project |
| `error[E0432]: unresolved import` | Missing dependency | `horus add <package> --source crates-io` |
| `.horus/Cargo.toml generation failed` | Malformed horus.toml | Run `horus check` to validate |
| `linker 'cc' not found` | Missing C compiler | `sudo apt install build-essential` |

---

## `horus test` - Run Tests

**What it does**: Runs your HORUS project's test suite.

**Why it's useful**: Validate functionality, run integration tests with simulation, and ensure code quality.
### Basic Usage ```bash # Run all tests horus test # Run tests matching a filter horus test my_node # Run with parallel execution horus test --parallel # Run simulation tests horus test --sim ``` ### All Options ```bash horus test [OPTIONS] [FILTER] Arguments: [FILTER] Test name filter (runs tests matching this string) Options: -r, --release Run tests in release mode --parallel Allow parallel test execution --sim Enable simulation mode (no hardware required) --integration Run integration tests (tests marked #[ignore]) --nocapture Show test output -j, --test-threads Number of test threads (default: 1) --no-build Skip the build step -v, --verbose Verbose output -d, --drivers Override detected drivers (comma-separated) -e, --enable Enable capabilities (comma-separated) --json Output test results as JSON --no-hooks Skip pre/post build hooks ``` ### Examples **Run specific tests**: ```bash horus test sensor_node --nocapture ``` **Fast parallel test run**: ```bash horus test --parallel --release ``` **Integration tests with simulator**: ```bash horus test --integration --sim ``` --- ## `horus clean` - Clean Build Artifacts **What it does**: Removes build artifacts, cached dependencies, and shared memory files. **Why it's useful**: Fix corrupted builds, reclaim disk space, or reset shared memory after crashes. 
### Basic Usage ```bash # Clean everything (build + shared memory + cache) horus clean --all # Only clean shared memory horus clean --shm # Preview what would be cleaned horus clean --dry-run ``` ### All Options ```bash horus clean [OPTIONS] Options: --shm Only clean shared memory -a, --all Clean everything (build cache + shared memory + horus cache) -n, --dry-run Show what would be cleaned without removing anything -f, --force Skip confirmation prompts --json Output as JSON ``` ### Examples **After a crash (clean stale shared memory)**: ```bash horus clean --shm ``` **Full reset before deployment**: ```bash horus clean --all horus build --release ``` --- ## `horus topic` - Topic Introspection **What it does**: Inspect, monitor, and interact with HORUS topics (shared memory communication channels). **Why it's useful**: Debug message flow, verify data publishing, and measure topic rates. ### Subcommands ```bash horus topic list # List all active topics horus topic echo # Print messages as they arrive horus topic info # Show topic details (type, publishers, subscribers) horus topic hz # Measure publishing rate horus topic pub # Publish a message horus topic bw # Measure bandwidth ``` ### All Options | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information | | `list` | `--json` | Output as JSON | | `echo` | `-n, --count ` | Number of messages to print then exit | | `echo` | `-r, --rate ` | Throttle display rate | | `hz` | `-w, --window ` | Averaging window size | | `pub` | `-r, --rate ` | Publishing rate | | `pub` | `-n, --count ` | Number of messages to publish | | `bw` | `-w, --window ` | Averaging window size | ### Examples **List all topics**: ```bash horus topic list # Output: # cmd_vel (CmdVel) - 2 publishers, 1 subscriber # scan (LaserScan) - 1 publisher, 3 subscribers # odom (Odometry) - 1 publisher, 1 subscriber ``` **Monitor a topic in real-time**: ```bash horus topic echo scan # Prints 
each LaserScan message as it arrives ``` **Check publishing rate**: ```bash horus topic hz cmd_vel # Output: average rate: 50.0 Hz, min: 49.2, max: 50.8 ``` **Publish test message**: ```bash horus topic pub cmd_vel '{"stamp_nanos": 0, "linear": 1.0, "angular": 0.5}' ``` --- ## `horus node` - Node Management **What it does**: List, inspect, and control running HORUS nodes. **Why it's useful**: Debug node states, restart misbehaving nodes, or pause nodes during testing. ### Subcommands ```bash horus node list # List all running nodes horus node info # Show detailed node information horus node kill # Terminate a node horus node restart # Restart a node horus node pause # Pause a node's tick execution horus node resume # Resume a paused node ``` ### All Options | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information | | `list` | `--json` | Output as JSON | | `list` | `-c, --category ` | Filter by category | | `kill` | `-f, --force` | Force kill without graceful shutdown | ### Examples **List all running nodes**: ```bash horus node list # Output: # NAME PID RATE CPU MEMORY STATUS # SensorNode 12345 100Hz 1.2% 10MB Running # ControllerNode 12346 50Hz 2.5% 15MB Running # LoggerNode 12347 10Hz 0.1% 5MB Paused ``` **Get detailed node info**: ```bash horus node info SensorNode # Shows: tick count, error count, subscribed topics, published topics ``` **Restart a stuck node**: ```bash horus node restart ControllerNode ``` **Pause/resume for debugging**: ```bash horus node pause SensorNode # ... inspect state ... horus node resume SensorNode ``` --- ## `horus service` - Service Interaction **Alias**: `horus srv` **What it does**: Inspect and interact with HORUS services (request/response communication channels). **When to use**: Debug service calls, verify server availability, or test request/response workflows from the command line. 
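As a rough illustration of scripting a service call, the sketch below builds the argument list for `horus service call` from a Python dict. It relies only on the `call` subcommand and the `--timeout` flag documented in this section; the helper name `service_call_argv` is hypothetical, and actually running the command requires a HORUS installation with the service available.

```python
import json
import shlex


def service_call_argv(name, request, timeout=None):
    """Build the argv list for `horus service call` (hypothetical helper).

    `request` is any JSON-serializable object; it is passed as one argument.
    """
    argv = ["horus", "service", "call", name, json.dumps(request)]
    if timeout is not None:
        # --timeout takes seconds, per the options table in this section.
        argv += ["--timeout", str(float(timeout))]
    return argv


# Print the shell-quoted command line for inspection.
print(shlex.join(service_call_argv("add_two_ints", {"a": 3, "b": 4})))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with the JSON request.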
### Subcommands | Subcommand | Description | |-----------|-------------| | `list` | List all active services | | `call ` | Call a service with a JSON request | | `info ` | Show type info and status for a service | | `find ` | Find services matching a name filter | ### All Options ```bash horus service list [-v, --verbose] [--json] horus service call [-t, --timeout ] horus service info horus service find ``` | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information (servers, clients, topics) | | `list` | `--json` | Output as JSON | | `call` | `-t, --timeout ` | Timeout in seconds (default: 5.0) | ### Examples **List all services**: ```bash horus service list # Output: # NAME SERVERS CLIENTS STATUS # add_two_ints 1 1 active # get_map 1 0 active ``` **Call a service**: ```bash horus service call add_two_ints '{"a": 3, "b": 4}' # Response received: # { "sum": 7 } ``` **Call with custom timeout**: ```bash horus service call get_map '{}' --timeout 10.0 ``` **Show service details**: ```bash horus service info add_two_ints # Name: add_two_ints # Request topic: add_two_ints/request # Response topic: add_two_ints/response # Status: active # Servers: 1 # Clients: 1 ``` **Find services by name**: ```bash horus service find map # get_map # save_map ``` --- ## `horus action` - Action Introspection **Alias**: `horus a` **What it does**: Inspect and interact with long-running action servers. Actions use a goal/feedback/result protocol for tasks that take time to complete. **When to use**: Debug navigation goals, manipulation tasks, or any action-based node. Send goals from the command line and monitor progress. 
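To make the goal/feedback/result protocol concrete, here is a minimal sketch that derives the five sub-topic names of an action and groups them by direction. The suffixes and directions come from the Action Topics table in this section. The separator is an assumption: the table lists dotted suffixes, while the sample `horus action info` output prints names with `/`, so it is left as a parameter. The function names are hypothetical.

```python
# Suffixes and directions as listed in the Action Topics table.
CLIENT_TO_SERVER = ("goal", "cancel")
SERVER_TO_CLIENT = ("status", "feedback", "result")


def action_topics(action, sep="."):
    """Return the five sub-topic names for an action (separator is assumed)."""
    return [f"{action}{sep}{suffix}" for suffix in CLIENT_TO_SERVER + SERVER_TO_CLIENT]


def topic_direction(topic):
    """Classify a sub-topic by its final suffix, accepting '.' or '/'."""
    suffix = topic.replace("/", ".").rsplit(".", 1)[-1]
    if suffix in CLIENT_TO_SERVER:
        return "client->server"
    if suffix in SERVER_TO_CLIENT:
        return "server->client"
    raise ValueError(f"not an action sub-topic: {topic}")


for name in action_topics("navigate_to_pose"):
    print(name, topic_direction(name))
```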
### Subcommands

| Subcommand | Description |
|-----------|-------------|
| `list` | List all active actions |
| `info` | Show action details (topics, publishers, subscribers) |
| `send-goal` | Send a goal to an action server |
| `cancel-goal` | Cancel an active goal |

### All Options

```bash
horus action list [-v, --verbose] [--json]
horus action info
horus action send-goal [-w, --wait] [-t, --timeout ]
horus action cancel-goal [-i, --goal-id ]
```

| Subcommand | Flag | Description |
|-----------|------|-------------|
| `list` | `-v, --verbose` | Show detailed information (topics, publishers) |
| `list` | `--json` | Output as JSON |
| `send-goal` | `-w, --wait` | Wait for and display the result |
| `send-goal` | `-t, --timeout` | Timeout in seconds when waiting for the result (default: 30.0) |
| `cancel-goal` | `-i, --goal-id` | Specific goal ID to cancel (cancels all if omitted) |

### Action Topics

Each action creates five sub-topics, formed by appending one of these suffixes to the action name:

| Topic | Direction | Purpose |
|-------|-----------|---------|
| `.goal` | Client -> Server | Clients send goals here |
| `.cancel` | Client -> Server | Clients request cancellation |
| `.status` | Server -> Client | Server broadcasts goal states |
| `.feedback` | Server -> Client | Server sends progress updates |
| `.result` | Server -> Client | Server sends final result |

### Examples

**List all actions**:
```bash
horus action list
# Output:
#   NAME              GOALS  TOPICS
#   navigate_to_pose  2      5/5 topics
#   pick_object       1      3/5 topics
```

**Send a navigation goal**:
```bash
horus action send-goal navigate_to_pose '{"target_x": 5.0, "target_y": 3.0}'
```

**Send a goal and wait for the result**:
```bash
horus action send-goal navigate_to_pose '{"target_x": 5.0, "target_y": 3.0}' --wait
# Sending goal to action: navigate_to_pose
# Goal ID: abc12345
#
# Goal sent
# Feedback: {"progress": 0.5}
# Status: Running
# Feedback: {"progress": 1.0}
# Status: Succeeded
#
# Result:
# { "success": true, "distance_traveled": 5.83 }
```

**Send a goal with custom
timeout**: ```bash horus action send-goal navigate_to_pose '{"x": 10.0}' --wait --timeout 60.0 ``` **Cancel a specific goal**: ```bash horus action cancel-goal navigate_to_pose --goal-id abc-123-def ``` **Cancel all goals on an action**: ```bash horus action cancel-goal navigate_to_pose ``` **Get action details**: ```bash horus action info navigate_to_pose # Action Information # Name: navigate_to_pose # Topics: # navigate_to_pose/goal -- clients send goals here # navigate_to_pose/cancel -- clients request cancellation # navigate_to_pose/status -- server broadcasts goal states # navigate_to_pose/feedback -- server sends progress # navigate_to_pose/result -- server sends final result # Goal publishers (clients): 2 # Result subscribers (clients): 1 ``` --- ## `horus log` - View and Filter Logs **What it does**: View, filter, and follow HORUS system logs. **Why it's useful**: Debug issues, monitor specific nodes, and track errors in real-time. ### Basic Usage ```bash # View all recent logs horus log # Filter by node horus log SensorNode # Follow logs in real-time horus log --follow # Show only errors horus log --level error ``` ### All Options ```bash horus log [OPTIONS] [NODE] Arguments: [NODE] Filter by node name Options: -l, --level Filter by log level (trace, debug, info, warn, error) -s, --since Show logs from last duration (e.g., "5m", "1h", "30s") -f, --follow Follow log output in real-time -n, --count Number of recent log entries to show --clear Clear logs instead of viewing --clear-all Clear all logs (including file-based logs) -h, --help Print help ``` ### Examples **Follow logs from a specific node**: ```bash horus log SensorNode --follow ``` **View errors from last 10 minutes**: ```bash horus log --level error --since 10m ``` **Show last 50 warnings and errors**: ```bash horus log --level warn --count 50 ``` --- ## `horus param` - Parameter Management **What it does**: Manage node parameters at runtime (get, set, list, dump, save, load). 
**Why it's useful**: Tune robot behavior without recompiling, persist configurations, and debug parameter values. ### Subcommands ```bash horus param list # List all parameters horus param get # Get parameter value horus param set # Set parameter value horus param delete # Delete a parameter horus param reset # Reset all parameters to defaults horus param load # Load parameters from YAML file horus param save [path] # Save parameters to YAML file horus param dump # Dump all parameters as YAML to stdout ``` ### All Options | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information | | `list` | `--json` | Output as JSON | | `get` | `--json` | Output as JSON | | `reset` | `-f, --force` | Skip confirmation prompt | ### Examples **List all parameters**: ```bash horus param list # Output: # /SensorNode/sample_rate: 100 # /SensorNode/filter_size: 5 # /ControllerNode/kp: 1.5 # /ControllerNode/ki: 0.1 ``` **Tune a controller at runtime**: ```bash horus param set ControllerNode kp 2.0 horus param set ControllerNode ki 0.2 ``` **Save and restore configuration**: ```bash # Save current params horus param save robot_config.yaml # Later, restore them horus param load robot_config.yaml ``` --- ## `horus tf` - Transform Frames (Coordinate Transforms) **What it does**: Inspect and monitor coordinate frame transforms (similar to ROS tf). **Why it's useful**: Debug transform chains, visualize frame relationships, and verify sensor mounting. 
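To illustrate what a "transform chain" is, the sketch below models a frame tree as a child-to-parent map and finds the chain of frames between two of them, in the spirit of `horus tf can`. This is a toy model under stated assumptions, not HORUS's implementation: the dict representation and the function names are hypothetical, and the frame names are taken from the example tree in this section.

```python
def chain_to_root(parents, frame):
    """Follow parent links from `frame` up to the root of its tree."""
    chain = [frame]
    while frame in parents:
        frame = parents[frame]
        if frame in chain:
            raise ValueError(f"cycle detected at frame {frame}")
        chain.append(frame)
    return chain


def transform_chain(parents, source, target):
    """Return the frames from source to target, or None if unconnected."""
    up = chain_to_root(parents, source)
    down = chain_to_root(parents, target)
    for i, frame in enumerate(up):
        if frame in down:
            # Climb to the common ancestor, then descend to the target.
            j = down.index(frame)
            return up[: i + 1] + down[:j][::-1]
    return None


# Child -> parent links matching the example frame tree.
PARENTS = {
    "base_link": "world",
    "laser_frame": "base_link",
    "camera_frame": "base_link",
    "imu_frame": "base_link",
}

print(" -> ".join(transform_chain(PARENTS, "laser_frame", "world")))
```

A lookup between sibling frames, such as `laser_frame` and `camera_frame`, passes through their common parent `base_link`.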
### Subcommands ```bash # Introspection horus tf list # List all frames (-v, --json) horus tf echo # Echo transform (-n, -r, --once, -t) horus tf tree # Show frame tree hierarchy (-o) horus tf info # Detailed frame information horus tf can # Check if transform is possible horus tf hz # Monitor frame update rates (-w) # Recording & Replay horus tf record -o # Record transforms to a .tfr file horus tf play # Replay a .tfr recording horus tf diff # Compare two .tfr recordings # Calibration horus tf tune # Interactively tune a static frame's offset horus tf calibrate # Compute sensor-to-base transform from point pairs (SVD) horus tf hand-eye # Solve hand-eye calibration (AX=XB) from pose pairs ``` ### Introspection Options | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information | | `list` | `--json` | Output as JSON | | `echo` | `-n, --count ` | Number of transforms to print then exit | | `echo` | `-r, --rate ` | Throttle display rate | | `echo` | `--once` | Print one transform and exit | | `echo` | `-t, --timeout ` | Timeout waiting for transform | | `tree` | `-o, --output ` | Write tree to file | | `hz` | `-w, --window ` | Averaging window size | ### Examples **View frame tree**: ```bash horus tf tree # Output: # world # └── base_link # ├── laser_frame # ├── camera_frame # └── imu_frame ``` **Monitor a transform**: ```bash horus tf echo laser_frame world # Prints: translation [x, y, z] rotation [qx, qy, qz, qw] ``` **Check transform chain**: ```bash horus tf can laser_frame world # Output: Yes, chain: laser_frame -> base_link -> world ``` ### Recording & Replay Record transforms for offline analysis, comparison, or replay. 
```bash # Record transforms for 30 seconds horus tf record -o session1.tfr -d 30 # Replay at half speed horus tf play session1.tfr -s 0.5 # Compare two recordings horus tf diff session1.tfr session2.tfr --threshold-m 0.01 ``` **Options:** | Subcommand | Flag | Description | |------------|------|-------------| | `record` | `-o, --output ` | Output .tfr file path (required) | | `record` | `-d, --duration ` | Maximum recording duration in seconds | | `play` | `-s, --speed ` | Playback speed multiplier (default: 1.0) | | `diff` | `--threshold-m ` | Translation difference threshold in meters (default: 0.001) | | `diff` | `--threshold-deg ` | Rotation difference threshold in degrees (default: 0.1) | | `diff` | `--json` | Output as JSON | ### Calibration Calibrate sensor mounting and hand-eye transforms directly from the CLI. **Tune a static frame interactively:** ```bash # Adjust laser_frame offset with fine steps horus tf tune laser_frame --step-m 0.001 --step-deg 0.1 ``` **Compute sensor-to-base transform from known point correspondences:** ```bash # CSV format: sensor_x,sensor_y,sensor_z,world_x,world_y,world_z horus tf calibrate --points-file calibration_points.csv ``` **Solve hand-eye calibration (AX=XB):** ```bash horus tf hand-eye --robot-poses robot.csv --sensor-poses sensor.csv ``` **Options:** | Subcommand | Flag | Description | |------------|------|-------------| | `tune` | `--step-m ` | Translation step size in meters (default: 0.001) | | `tune` | `--step-deg ` | Rotation step size in degrees (default: 0.1) | | `calibrate` | `--points-file ` | CSV file with point pairs (sensor_x,y,z,world_x,y,z) | | `hand-eye` | `--robot-poses ` | CSV file with robot poses | | `hand-eye` | `--sensor-poses ` | CSV file with sensor poses | --- ## `horus msg` - Message Type Introspection **What it does**: Inspect HORUS message type definitions and schemas. **Why it's useful**: Understand message structures, debug serialization issues, and verify type compatibility. 
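One practical use of a message definition is building a matching JSON payload for `horus topic pub`. The sketch below mirrors the `CmdVel` fields that `horus msg info CmdVel` prints in the examples of this section. The Python dataclass is only a stand-in for illustration, not a type from the HORUS Python bindings, and `to_pub_payload` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CmdVel:
    """Stand-in mirroring the CmdVel fields shown by `horus msg info CmdVel`."""

    stamp_nanos: int = 0   # u64, nanoseconds
    linear: float = 0.0    # f32, m/s forward velocity
    angular: float = 0.0   # f32, rad/s turning velocity


def to_pub_payload(msg):
    """Serialize the message to a JSON string, keeping field order."""
    return json.dumps(asdict(msg))


print(to_pub_payload(CmdVel(stamp_nanos=0, linear=1.0, angular=0.5)))
```

The resulting string has the same shape as the payload used in the `horus topic pub cmd_vel` example.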
### Subcommands ```bash horus msg list # List all message types horus msg info # Show message definition horus msg hash # Show hash (for compatibility checking) ``` ### All Options | Subcommand | Flag | Description | |-----------|------|-------------| | `list` | `-v, --verbose` | Show detailed information | | `list` | `-f, --filter ` | Filter by name | | `list` | `--json` | Output as JSON | | `info` | `--json` | Output as JSON | | `hash` | `--json` | Output as JSON | ### Examples **List available message types**: ```bash horus msg list # Output: # CmdVel (horus_library::messages::cmd_vel) # LaserScan (horus_library::messages::sensor) # Odometry (horus_library::messages::sensor) # Image (horus_library::messages::vision) ``` **Show message definition**: ```bash horus msg info CmdVel # Output: # struct CmdVel { # stamp_nanos: u64, // nanoseconds # linear: f32, // m/s forward velocity # angular: f32, // rad/s turning velocity # } ``` --- ## `horus launch` - Launch Multiple Nodes **What it does**: Launch multiple nodes from a YAML configuration file. **Why it's useful**: Start complex multi-node systems with one command, define node dependencies and parameters. 
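The dependency handling mentioned above (nodes started in topological order of `depends_on`, with circular dependencies rejected) can be sketched with Kahn's algorithm. This is a minimal illustration, not the launcher's actual code: `launch_order` is a hypothetical function, node names are assumed unique, ties are broken by file order, and rejecting an unknown dependency is an assumption about the behaviour.

```python
from collections import deque


def launch_order(nodes):
    """Return node names in an order that respects `depends_on`.

    `nodes` is a list of dicts shaped like launch-file entries.
    Raises ValueError on unknown or circular dependencies.
    """
    names = [node["name"] for node in nodes]
    deps = {node["name"]: list(node.get("depends_on", [])) for node in nodes}

    dependents = {name: [] for name in names}
    for name, wanted in deps.items():
        for dep in wanted:
            if dep not in deps:
                # Assumption: a dependency on an undefined node is an error.
                raise ValueError(f"{name} depends on unknown node {dep}")
            dependents[dep].append(name)

    remaining = {name: len(wanted) for name, wanted in deps.items()}
    ready = deque(name for name in names if remaining[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in dependents[current]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(names):
        raise ValueError("circular dependency detected")
    return order


robot = [
    {"name": "sensor_node"},
    {"name": "controller", "depends_on": ["sensor_node"]},
    {"name": "logger", "depends_on": ["sensor_node", "controller"]},
]
print(launch_order(robot))
```

In a diamond (D depends on B and C, both depend on A), A appears once in the result, ahead of B, C, and D.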
### Basic Usage ```bash # Launch from file horus launch robot.yaml # Preview without launching horus launch robot.yaml --dry-run # Launch with namespace horus launch robot.yaml --namespace robot1 ``` ### All Options ```bash horus launch [OPTIONS] Arguments: Path to launch file (YAML) Options: -n, --dry-run Show what would launch without actually launching --namespace Namespace prefix for all nodes --list List nodes in the launch file without launching --shutdown-timeout Graceful shutdown timeout in seconds (default: 2) ``` ### Launch File Format ```yaml # robot.yaml session: warehouse_bot # Optional: session name for identification namespace: /robot # Optional: global namespace prefix for all nodes env: # Optional: global env vars (applied to all nodes) HORUS_LOG_LEVEL: info nodes: - name: sensor_node command: "horus run src/sensor.rs" rate_hz: 100 priority: 0 params: sample_rate: 100 - name: controller command: "horus run src/controller.rs" rate_hz: 50 priority: 1 depends_on: [sensor_node] # Waits for sensor_node to start first params: kp: 1.5 ki: 0.1 env: RUST_LOG: info - name: logger command: "horus run src/logger.py" rate_hz: 10 priority: 2 depends_on: [sensor_node, controller] start_delay: 1.0 # Wait 1s after dependencies are up restart: on-failure # Restart if it crashes ``` **Top-level fields:** | Field | Type | Required | Default | Description | |-------|------|----------|---------|-------------| | `session` | string | No | file stem | Session name for identification | | `namespace` | string | No | — | Global namespace prefix for all nodes | | `env` | map | No | `{}` | Environment variables applied to all nodes | | `nodes` | list | Yes | — | List of node configurations | **Node fields:** | Field | Type | Required | Default | Description | |-------|------|----------|---------|-------------| | `name` | string | Yes | — | Node identifier | | `command` | string | Yes* | — | Shell command to run (*or `package`) | | `package` | string | Yes* | — | HORUS package 
name (*or `command`) |
| `args` | list | No | `[]` | Arguments appended to command |
| `priority` | int | No | — | Execution priority (lower = higher) |
| `rate_hz` | int | No | — | Node tick rate in Hz |
| `namespace` | string | No | — | Local namespace prefix |
| `params` | map | No | `{}` | Parameters passed to the node |
| `env` | map | No | `{}` | Node-specific environment variables (overrides global) |
| `depends_on` | list | No | `[]` | Node names that must start before this one |
| `start_delay` | float | No | — | Delay in seconds before starting this node |
| `restart` | string | No | `never` | Restart policy: `never`, `always`, `on-failure` |

### Dependency Ordering

Nodes are launched in **topological order** based on `depends_on`. The launcher detects circular dependencies and rejects the launch file. A valid example:

```yaml
nodes:
  - name: camera
    command: "horus run src/camera.rs"

  - name: lidar
    command: "horus run src/lidar.rs"

  - name: detector
    command: "horus run src/detector.rs"
    depends_on: [camera]           # camera starts first

  - name: planner
    command: "horus run src/planner.rs"
    depends_on: [detector, lidar]  # waits for both
```

Diamond dependencies work fine (D depends on B and C, both depend on A — A starts once).

### Namespace Resolution

Namespaces compose hierarchically.
The global namespace (from the top-level `namespace` field or the `--namespace` flag) is prepended to the node's local namespace:

| Global | Node namespace | Node name | Result |
|--------|---------------|-----------|--------|
| `/fleet` | `/arm` | `gripper` | `/fleet/arm/gripper` |
| `/fleet` | — | `gripper` | `/fleet/gripper` |
| — | `/arm` | `gripper` | `/arm/gripper` |
| — | — | `gripper` | `gripper` |

### Auto-Set Environment Variables

The launcher automatically sets these for each node process:

| Variable | Value |
|----------|-------|
| `HORUS_NODE_NAME` | Node name |
| `HORUS_NODE_PRIORITY` | Priority (if set) |
| `HORUS_NODE_RATE_HZ` | Rate in Hz (if set) |
| `HORUS_NAMESPACE` | Global namespace (if set) |
| `HORUS_NODE_NAMESPACE` | Node namespace (if set) |
| `HORUS_PARAM_` | Each param (key uppercased, dashes become underscores) |

### Lifecycle

1. **Parse** — Validate YAML, check for circular dependencies
2. **Sort** — Topological sort by `depends_on`
3. **Launch** — Start nodes in order, applying `start_delay` per node
4. **Monitor** — All processes run, stdout/stderr inherited to terminal
5.
**Shutdown** — Ctrl+C sends SIGTERM, waits `--shutdown-timeout` seconds, then SIGKILL ### Examples **Launch robot system**: ```bash horus launch robot.yaml ``` **Launch with namespace (for multi-robot)**: ```bash horus launch robot.yaml --namespace robot1 horus launch robot.yaml --namespace robot2 ``` **Multi-robot fleet with same launch file**: ```yaml # formation.yaml nodes: - name: scout_controller package: multi_robot rate_hz: 20 namespace: /scout_1 params: robot_id: 1 - name: scout_controller package: multi_robot rate_hz: 20 namespace: /scout_2 params: robot_id: 2 - name: coordinator package: multi_robot rate_hz: 5 depends_on: [scout_controller] params: num_robots: 2 ``` **Preview what would launch**: ```bash horus launch formation.yaml --dry-run ``` **List nodes without launching**: ```bash horus launch formation.yaml --list ``` --- ## `horus deploy` - Deploy to Remote Robot(s) **What it does**: Cross-compile and deploy your project to one or more remote robots over SSH. Supports named targets from `deploy.yaml`, fleet deployment to multiple robots, and parallel sync. **Why it's useful**: Deploy from development machine to embedded robots. Build once, sync to many. Supports Raspberry Pi, Jetson, and any Linux target. ### Basic Usage ```bash # Deploy to a host directly horus deploy pi@192.168.1.100 # Deploy to a named target from deploy.yaml horus deploy jetson-01 # Deploy and run immediately horus deploy jetson-01 --run # Deploy to multiple robots horus deploy jetson-01 jetson-02 jetson-03 # Deploy to ALL configured targets horus deploy --all # List configured targets horus deploy --list ``` ### All Options ```bash horus deploy [OPTIONS] [TARGETS]... Arguments: [TARGETS]... 
Target(s) — named targets from deploy.yaml or direct user@host

Options:
      --all          Deploy to ALL targets in deploy.yaml
      --parallel     Deploy to multiple targets in parallel
  -d, --dir          Remote directory (default: ~/horus_deploy)
  -a, --arch         Target architecture (aarch64, armv7, x86_64, native)
      --run          Run the project after deploying
      --debug        Build in debug mode instead of release
  -p, --port         SSH port (default: 22)
  -i, --identity     SSH identity file
  -n, --dry-run      Show what would be done without doing it
      --list         List configured deployment targets
```

### Fleet Deployment

Deploy to multiple robots at once. The build runs once (shared across targets with the same architecture), then syncs to each robot. Targets are synced one at a time by default; add `--parallel` to sync them in parallel:

```bash
# Sequential (default) — deploys one at a time
horus deploy jetson-01 jetson-02 jetson-03

# All targets from deploy.yaml
horus deploy --all

# Dry run to preview fleet deployment
horus deploy --all --dry-run
```

### Configure Named Targets

Create `deploy.yaml` in your project root:

```yaml
targets:
  jetson-01:
    host: nvidia@10.0.0.1
    arch: aarch64
    dir: ~/robot
  jetson-02:
    host: nvidia@10.0.0.2
    arch: aarch64
    dir: ~/robot
  arm-controller:
    host: pi@10.0.0.10
    arch: aarch64
    dir: ~/arm
    port: 2222
    identity: ~/.ssh/robot_key
```

Then deploy by name:

```bash
horus deploy jetson-01              # single target
horus deploy jetson-01 jetson-02    # multiple targets
horus deploy --all                  # all targets
horus deploy --list                 # show all configured targets
```

### Examples

**Deploy to Raspberry Pi**:
```bash
horus deploy pi@raspberrypi.local --arch aarch64
```

**Deploy and run on NVIDIA Jetson**:
```bash
horus deploy ubuntu@jetson.local --arch aarch64 --run
```

**Deploy entire warehouse fleet**:
```bash
horus deploy --all --dry-run   # preview first
horus deploy --all             # deploy to all robots
```

**Deploy a named target and run it**:
```bash
horus deploy jetson-01 --run
```

---

## `horus add` - Add Dependency

**What it does**: Adds a dependency to `horus.toml`. Auto-detects the source (crates.io, PyPI, system) based on your project language.
Like `cargo add` — for project dependencies that get built with your code. **Why it's useful**: Single command to add dependencies from any ecosystem — Rust, Python, system packages, or the horus registry. For standalone tools and plugins, use `horus install` instead. ### Basic Usage ```bash # Auto-detects source from project language horus add serde # Rust project → crates.io horus add numpy # Python project → PyPI # Explicit source override horus add serde --source crates-io horus add numpy --source pypi horus add libudev --source system horus add horus-nav-stack --source registry # With version and features horus add serde@1.0 --features derive # Add to dev-dependencies horus add criterion --dev # Add as driver horus add camera-driver --driver # JSON output (for tooling) horus add serde --json ``` ### Source Auto-Detection | Project Type | `horus add foo` defaults to | |-------------|----------------------------| | Rust only | crates.io | | Python only | PyPI | | Multi-language | Checks known package tables, then project context | | C++ only | system | Override with `--source`: `crates-io`, `pypi`, `system`, `registry`, `git`, `path` --- ## `horus install` - Install Standalone Package **What it does**: Installs a standalone package or plugin from the horus registry to `~/.horus/cache/`. Like `cargo install` — for tools and plugins that aren't project dependencies. **Why it's useful**: Install prebuilt drivers, plugins, and CLI extensions that work across all your projects. Does NOT modify `horus.toml` — use `horus add` for project dependencies. ### `horus add` vs `horus install` | Command | Like | Purpose | Modifies horus.toml? 
| |---------|------|---------|---------------------| | `horus add serde` | `cargo add` | Add project dependency | Yes | | `horus install slam-toolbox` | `cargo install` | Install standalone tool/plugin | No | ### Basic Usage ```bash horus install horus-sim3d # Install a plugin horus install rplidar-driver@1.2.0 # Install specific version horus install horus-visualizer --plugin # Install as CLI plugin horus install driver --target aarch64 # Install for specific target horus install driver --json # JSON output ``` --- ## `horus remove` - Remove Dependency **What it does**: Removes a dependency from `horus.toml`. **Why it's useful**: Clean up unused dependencies from your project. ### Basic Usage ```bash horus remove pid-controller # Remove and purge unused dependencies horus remove sensor-fusion --purge ``` ### All Options ```bash horus remove [OPTIONS] Arguments: Package/driver/plugin name to remove Options: --purge Also remove unused dependencies ``` --- ## `horus plugin` - Plugin Management **Alias**: `horus plugins` **What it does**: Manage HORUS plugins (extensions that add CLI commands or features). Plugins are installed via `horus install --plugin` and managed with the `plugin` subcommands. **Why it's useful**: Enable/disable plugins without uninstalling, verify plugin integrity after updates. 
### Subcommands ```bash horus plugin enable # Enable a disabled plugin horus plugin disable # Disable a plugin (keep installed but don't execute) horus plugin verify [plugin] # Verify integrity of installed plugins ``` ### All Options | Subcommand | Flag | Description | |-----------|------|-------------| | `disable` | `--reason ` | Reason for disabling (recorded in config) | | `verify` | `--json` | Output as JSON | ### Examples **Enable a plugin**: ```bash horus plugin enable horus-visualizer ``` **Disable a plugin temporarily**: ```bash horus plugin disable horus-visualizer --reason "debugging" ``` **Verify all plugins**: ```bash horus plugin verify ``` **Verify a specific plugin**: ```bash horus plugin verify horus-visualizer --json ``` **Install and manage a plugin end-to-end**: ```bash horus search visualizer # Find plugins horus install horus-visualizer --plugin # Install as plugin horus plugin disable horus-visualizer # Disable temporarily horus plugin enable horus-visualizer # Re-enable horus plugin verify horus-visualizer # Verify integrity ``` --- ## `horus cache` - Cache Management **What it does**: Manage the HORUS package cache (downloaded packages, compiled artifacts). **Why it's useful**: Reclaim disk space, troubleshoot package issues. 

### Subcommands

```bash
horus cache info     # Show cache statistics (size, package count)
horus cache list     # List all cached packages
horus cache clean    # Remove unused packages (--dry-run to preview)
horus cache purge    # Remove ALL cached packages (-y to skip confirmation)
```

### All Options

| Subcommand | Flag | Description |
|-----------|------|-------------|
| `info` | `--json` | Output as JSON |
| `list` | `--json` | Output as JSON |
| `clean` | `-n, --dry-run` | Preview what would be removed |
| `purge` | `-y, --yes` | Skip confirmation prompt |

### Examples

**Check cache usage**:
```bash
horus cache info

# Output:
# Location: ~/.horus/cache
# Total size: 1.2 GB
# Packages: 45
# Last cleaned: 7 days ago
```

**Clean unused packages**:
```bash
horus cache clean
# Removes packages not used by any project
```

---

## `horus record` - Record/Replay Management

**What it does**: Manage recorded sessions for debugging and testing.

**Why it's useful**: Replay exact scenarios, compare runs, debug timing-sensitive issues.

### Subcommands

```bash
horus record list      # List all recordings
horus record info      # Show recording details
horus record replay    # Replay a recording
horus record delete    # Delete a recording
horus record diff      # Compare two recordings
horus record export    # Export to different format
horus record inject    # Inject recorded data into live scheduler
```

### All Options

| Subcommand | Flag | Description |
|-----------|------|-------------|
| `list` | `-l, --long` | Show extended details |
| `list` | `--json` | Output as JSON |
| `info` | `--json` | Output as JSON |
| `delete` | `-f, --force` | Skip confirmation prompt |
| `replay` | `--start-tick ` | Start replay from tick N |
| `replay` | `--stop-tick ` | Stop replay at tick N |
| `replay` | `--speed ` | Playback speed multiplier (default: 1.0) |
| `replay` | `--override ` | Override node output (format: `node.output=value`, repeatable) |
| `diff` | `-n, --limit ` | Limit number of differences shown |
| `export` | `-o, --output ` | Output file path (required) |
| `export` | `-f, --format ` | Output format (default: json) |
| `inject` | `-n, --nodes ` | Node names to inject (comma-separated) |
| `inject` | `--all` | Inject all nodes from recording |
| `inject` | `-s, --script ` | Script file for injection rules |
| `inject` | `--start-tick ` | Start injection from tick N |
| `inject` | `--stop-tick ` | Stop injection at tick N |
| `inject` | `--speed ` | Injection speed multiplier (default: 1.0) |
| `inject` | `--loop` | Loop the injection continuously |

### Examples

**List recordings**:
```bash
horus record list

# Output:
# ID        DATE                 DURATION  NODES  SIZE
# rec_001   2024-01-15 10:30:00  5m 23s    4      45MB
# rec_002   2024-01-15 14:15:00  2m 10s    3      18MB
```

**Replay a session**:
```bash
horus record replay rec_001
```

**Compare two runs**:
```bash
horus record diff rec_001 rec_002
# Shows differences in timing, message counts, errors
```

**Inject recorded data into live system**:
```bash
# Use recorded sensor data with live controller
horus record inject rec_001 --nodes SensorNode
```

---

## `horus blackbox` - BlackBox Flight Recorder

**Alias**: `horus bb`

**What it does**: Inspects the BlackBox flight recorder for post-mortem crash
analysis. The BlackBox automatically records scheduler events, errors, deadline misses, and safety state changes.

**Why it's useful**: After a crash or anomaly, review exactly what happened — which nodes failed, when deadlines were missed, and what the safety state was at each tick.

### Basic Usage

```bash
# View all recorded events
horus blackbox

# Show only anomalies (errors, deadline misses, WCET violations, e-stops)
horus blackbox --anomalies

# Follow mode — stream events in real-time (like tail -f)
horus blackbox --follow
```

### All Options

```bash
horus blackbox [OPTIONS]

Options:
  -a, --anomalies    Show only anomalies (errors, deadline misses, WCET violations, e-stops)
  -f, --follow       Follow mode — stream new events as they arrive
  -t, --tick         Filter by tick range (e.g. "4500-4510" or "4500")
  -n, --node         Filter by node name (partial, case-insensitive)
  -e, --event        Filter by event type (e.g. "DeadlineMiss", "NodeError")
      --json         Output as machine-readable JSON
  -l, --last         Show only the last N events
  -p, --path         Custom blackbox directory (default: .horus/blackbox/)
      --clear        Clear all blackbox data (with confirmation)
```

### Examples

**View recent anomalies**:
```bash
horus bb --anomalies --last 20
```

**Filter by node and tick range**:
```bash
horus bb --node controller --tick 4500-4510
```

**Stream events in real-time while debugging**:
```bash
horus bb --follow --anomalies
```

**Export to JSON for external analysis**:
```bash
horus bb --json > blackbox_dump.json
```

**Clear old data**:
```bash
horus bb --clear
```

---

## `horus fmt` - Format Code

**What it does**: Formats your project's source code using language-appropriate tools (Rust via `rustfmt`, Python via `ruff`/`black`).

**Why it's useful**: Enforce consistent code style across your project without manual formatting. Use `--check` in CI to fail on unformatted code.

### Basic Usage

```bash
# Format all code in the project
horus fmt

# Check formatting without modifying files (useful for CI)
horus fmt --check
```

### All Options

```bash
horus fmt [OPTIONS] [-- EXTRA_ARGS]

Options:
      --check    Check formatting without modifying files
      --         Additional arguments passed to underlying tools
```

**Options:**

| Flag | Description |
|------|-------------|
| `--check` | Check formatting without modifying files (exit code 1 if unformatted) |
| `-- ` | Additional arguments passed to `rustfmt` or `ruff format` |

### Examples

**Format before committing:**
```bash
horus fmt
git add -A && git commit -m "formatted"
```

**CI formatting check:**
```bash
horus fmt --check || (echo "Run 'horus fmt' to fix formatting" && exit 1)
```

**Pass extra arguments to rustfmt:**
```bash
horus fmt -- --edition 2021
```

---

## `horus lint` - Lint Code

**What it does**: Runs linters on your project (Rust via `clippy`, Python via `ruff`/`pylint`). Optionally runs type checking for Python.

**Why it's useful**: Catch bugs, anti-patterns, and style issues before they reach production.

### Basic Usage

```bash
# Lint all code
horus lint

# Auto-fix lint issues where possible
horus lint --fix

# Also run Python type checker (mypy/pyright)
horus lint --types
```

### All Options

```bash
horus lint [OPTIONS] [-- EXTRA_ARGS]

Options:
      --fix      Auto-fix lint issues where possible
      --types    Also run Python type checker (mypy/pyright)
      --         Additional arguments passed to underlying tools
```

**Options:**

| Flag | Description |
|------|-------------|
| `--fix` | Auto-fix lint issues where possible |
| `--types` | Also run Python type checker (mypy/pyright) |
| `-- ` | Additional arguments passed to `clippy` or `ruff check` |

### Examples

**Lint and auto-fix:**
```bash
horus lint --fix
```

**Full lint with type checking:**
```bash
horus lint --types
```

**CI lint gate:**
```bash
horus lint || exit 1
```

---

## `horus doc` - Generate Documentation

**What it does**: Generates documentation for your project. Supports multiple output formats (JSON, Markdown, HTML), doc coverage reporting, API extraction, and topic/message flow graphs.

**Why it's useful**: Generate API docs, measure documentation coverage, extract machine-readable API data for tooling, and enforce minimum coverage in CI.

### Basic Usage

```bash
# Generate docs and open in browser
horus doc --open

# Show documentation coverage report
horus doc --coverage

# Extract machine-readable API documentation as JSON
horus doc --extract --json
```

### All Options

```bash
horus doc [OPTIONS] [-- EXTRA_ARGS]

Options:
      --open          Open documentation in browser after generating
      --extract       Extract machine-readable API documentation
      --json          Output as JSON
      --md            Output as markdown (for LLM context)
      --html          Output as self-contained HTML report
      --full          Include doc comments in brief output
      --all           Include private/crate-only symbols
      --lang          Filter by language (rust, python)
      --coverage      Show documentation coverage report
  -o, --output        Write output to file instead of stdout
      --diff          Compare against a baseline JSON file
      --fail-under    Fail if doc coverage is below this percentage (for CI)
      --watch         Watch for file changes and regenerate
      --              Additional arguments passed to underlying tools
```

**Options:**

| Flag | Description |
|------|-------------|
| `--open` | Open documentation in browser after generating |
| `--extract` | Extract machine-readable API documentation |
| `--json` | Output as JSON |
| `--md` | Output as markdown (useful for LLM context) |
| `--html` | Output as self-contained HTML report |
| `--full` | Include doc comments in brief output |
| `--all` | Include private/crate-only symbols |
| `--lang ` | Filter by language (`rust`, `python`) |
| `--coverage` | Show documentation coverage report |
| `-o, --output ` | Write output to file instead of stdout |
| `--diff ` | Compare against a baseline JSON file |
| `--fail-under ` | Fail if doc coverage is below this percentage (for CI) |
| `--watch` | Watch for file changes and regenerate |

### Examples

**Generate and browse docs:**
```bash
horus doc --open
```

**CI documentation coverage gate:**
```bash
horus doc --coverage --fail-under 80
```

**Extract API docs as markdown for LLM context:**
```bash
horus doc --extract --md -o api.md
```

**Track API changes between
releases:**
```bash
horus doc --extract --json -o api-v2.json
horus doc --diff api-v1.json
```

**Watch mode during development:**
```bash
horus doc --watch --open
```

---

## `horus bench` - Run Benchmarks

**What it does**: Runs benchmarks for your HORUS project. Supports filtering by name and passing extra arguments to the underlying benchmark framework.

**Why it's useful**: Measure and track performance of your nodes, algorithms, and IPC throughput.

### Basic Usage

```bash
# Run all benchmarks
horus bench

# Run benchmarks matching a filter
horus bench latency
```

### All Options

```bash
horus bench [OPTIONS] [FILTER] [-- EXTRA_ARGS]

Arguments:
  [FILTER]    Filter benchmarks by name

Options:
      --      Additional arguments passed to underlying tools
```

**Options:**

| Flag | Description |
|------|-------------|
| `[FILTER]` | Filter benchmarks by name (substring match) |
| `-- ` | Additional arguments passed to the benchmark runner |

### Examples

**Run all benchmarks:**
```bash
horus bench
```

**Run specific benchmarks:**
```bash
horus bench ipc_throughput
```

**Pass extra arguments to criterion:**
```bash
horus bench -- --sample-size 100
```

---

## `horus deps` - Dependency Insight

**What it does**: Inspect, audit, and manage your project's dependencies. Provides subcommands for viewing the dependency tree, explaining why a package is included, checking for outdated packages, and running security audits.

**Why it's useful**: Understand your dependency graph, find unused or outdated packages, and catch known vulnerabilities.

### Subcommands

```bash
horus deps tree        # Show dependency tree
horus deps why         # Explain why a dependency is included
horus deps outdated    # Check for outdated dependencies
horus deps audit       # Security audit of dependencies
```

### Examples

**View dependency tree:**
```bash
horus deps tree
```

**Find out why a package is included:**
```bash
horus deps why serde
# Output: serde is required by:
#   horus_core -> serde (serialize node configs)
#   horus_library -> serde (message serialization)
```

**Check for outdated dependencies:**
```bash
horus deps outdated
```

**Run security audit:**
```bash
horus deps audit
# Checks against RustSec advisory database
```

**Pass extra arguments to underlying tools:**
```bash
horus deps tree -- --depth 2
```

---

## `horus doctor` - Health Check

**What it does**: Runs a comprehensive health check of your HORUS ecosystem, including toolchain versions, configuration validity, system dependencies, and environment setup. With `--fix`, installs missing toolchains and system dependencies automatically.

**Why it's useful**: Quickly diagnose setup issues, verify that all required tools are installed, and ensure your environment is ready for development. New contributors can run `horus doctor --fix` to get a working environment in one command.
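
For scripted checks, the JSON output of `horus doctor` can be filtered in Python instead of `jq`. The following is a minimal sketch, not an official client. It assumes `horus doctor --json` prints a JSON array of check objects, each with a `health` field equal to `"ok"` when the check passes (the shape implied by the `jq` filter in this command's CI example). Any other fields are unknown here and are passed through untouched; the function names are hypothetical.

```python
import json
import subprocess


def unhealthy_checks(doctor_json):
    """Return the check objects whose `health` field is not "ok"."""
    checks = json.loads(doctor_json)
    return [check for check in checks if check.get("health") != "ok"]


def run_doctor():
    """Run `horus doctor --json` (assumes `horus` is on PATH) and return failing checks."""
    result = subprocess.run(
        ["horus", "doctor", "--json"], capture_output=True, text=True
    )
    return unhealthy_checks(result.stdout)
```

A CI step could call `run_doctor()` and fail the job when the returned list is non-empty.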

### Basic Usage

```bash
# Run health check
horus doctor

# Install missing toolchains and system dependencies
horus doctor --fix

# Verbose output with detailed check results
horus doctor --verbose

# Output as JSON (for tooling)
horus doctor --json
```

### All Options

```bash
horus doctor [OPTIONS]

Options:
  -v, --verbose    Show detailed output for each check
      --json       Output as JSON
      --fix        Install missing toolchains and system dependencies
```

| Flag | Description |
|------|-------------|
| `-v, --verbose` | Show detailed output for each check |
| `--json` | Output as JSON |
| `--fix` | Install missing toolchains and system deps, pin versions to `horus.lock` |

### What It Checks

| Check | Description |
|-------|-------------|
| Toolchains | cargo, rustc, python3, ruff, pytest, cmake, clang-format, clang-tidy |
| Manifest | horus.toml validity and structure |
| Shared Memory | SHM directory exists and is accessible |
| Plugins | Global and local plugin counts |
| Disk | Build cache size (.horus/ directory) |
| Languages | Detected project languages (Rust, Python, C++) |
| Dependencies | Dependency source validation |
| Drivers | Device reachability (serial ports, I2C buses, network endpoints) |
| System Deps | Python version, C++ compiler, system libraries from horus.toml |

### Examples

**Quick health check:**
```bash
horus doctor
# horus doctor
#
# ✓ Toolchains — 6/8 tools found
# ✓ Manifest — horus.toml valid
# ✓ Shared Memory — /dev/shm available
# ✓ Plugins — 2 global, 0 local
# ✓ Disk — Build cache uses 148.3 MB
# ✓ Languages — Rust, Python
# ✓ Dependencies — 12 deps, sources look correct
# ✓ Drivers — 3 driver(s) reachable
# ✗ System Deps — 2/3 satisfied — run horus doctor --fix to install
```

**Fix missing dependencies:**
```bash
horus doctor --fix
# Runs all checks, then installs missing toolchains and system packages.
# Pins installed versions to horus.lock [toolchain] and [[system]] sections.
```

**Hardware reachability** (checks `[hardware]` entries from horus.toml):
```bash
horus doctor --verbose
# ...
# ✓ Hardware — 3 hardware node(s) checked, some unreachable
#     ✓ 'imu': /dev/i2c-1 found
#     ✗ driver 'arm': /dev/ttyUSB0 not found
#     ! driver 'lidar': 192.168.1.201:2368 unreachable
```

Checks serial ports (`Path::exists`), I2C buses (`/dev/i2c-N`), and network endpoints (`TcpStream` with 2s timeout). No terra dependency — pure OS-level checks.

**CI environment validation:**
```bash
horus doctor --json | jq '.[] | select(.health != "ok")'
```

---

## `horus self update` - Update CLI

**What it does**: Updates the horus CLI binary to the latest version. Like `rustup self update`.

**Why it's useful**: Keep your CLI current without affecting project dependencies (use `horus update` for deps).

### Basic Usage

```bash
# Update the CLI binary
horus self update

# Check for updates without installing
horus self update --check
```

### Options

| Flag | Description |
|------|-------------|
| `--check` | Check for available updates without installing |

### Examples

**Check if update is available:**
```bash
horus self update --check
# Output: Current: 0.1.9, Latest: 0.2.0 — update available
```

**Update the CLI:**
```bash
horus self update
```

**Update project dependencies separately:**
```bash
horus update         # Update deps in horus.toml
horus self update    # Update CLI binary
```

---

## `horus config` - Config Management

**What it does**: View and edit `horus.toml` settings from the command line using dot-notation keys.

**Why it's useful**: Quickly inspect or modify project configuration without opening an editor. Useful for scripting and CI.
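
As an illustration of the scripting use case, here is a minimal Python sketch that bumps the patch number of `package.version` from a release script. It relies only on the `get` and `set` subcommands of `horus config`. It assumes `horus config get` prints the bare value on stdout and that the version has exactly three dot-separated numeric parts. The helper names (`bump_patch`, `horus_config`, `bump_project_version`) are hypothetical and not part of HORUS.

```python
import subprocess


def bump_patch(version):
    """Increment the last component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = version.strip().strip('"').split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def horus_config(*args):
    """Run `horus config ...` (assumes `horus` is on PATH) and return trimmed stdout."""
    result = subprocess.run(
        ["horus", "config", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def bump_project_version():
    """Read package.version, bump the patch number, and write it back."""
    current = horus_config("get", "package.version")
    new_version = bump_patch(current)
    horus_config("set", "package.version", new_version)
    return new_version
```

Because `check=True` is set, a failing `horus config` call raises `subprocess.CalledProcessError` instead of silently writing a bad version.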

### Subcommands

```bash
horus config get     # Get a config value
horus config set     # Set a config value
horus config list    # List all config values
```

### Examples

**Get a config value:**
```bash
horus config get package.name
# Output: my_robot
```

**Set a config value:**
```bash
horus config set package.version "0.2.0"
```

**List all config values:**
```bash
horus config list
# Output:
# package.name = "my_robot"
# package.version = "0.1.0"
# package.language = "rust"
```

---

## `horus migrate` - Migrate to horus.toml

**What it does**: Migrates an existing project to the unified `horus.toml` format. Detects existing `Cargo.toml`, `pyproject.toml`, or `package.xml` files and consolidates them into a single `horus.toml` manifest.

**Why it's useful**: Move legacy projects to the unified HORUS manifest format. The `--dry-run` flag lets you preview changes before committing.

### Basic Usage

```bash
# Migrate (interactive, asks for confirmation)
horus migrate

# Preview what would change
horus migrate --dry-run

# Skip confirmation prompts
horus migrate --force
```

### All Options

```bash
horus migrate [OPTIONS]

Options:
  -n, --dry-run    Show what would change without modifying
  -f, --force      Skip confirmation prompts
```

**Options:**

| Flag | Description |
|------|-------------|
| `-n, --dry-run` | Show what would change without modifying |
| `-f, --force` | Skip confirmation prompts |

### Examples

**Preview migration:**
```bash
horus migrate --dry-run
# Output:
# Would create: horus.toml
# Would move: Cargo.toml -> .horus/Cargo.toml
# Would extract: 5 dependencies from Cargo.toml
```

**Migrate existing Rust project:**
```bash
cd my-existing-project
horus migrate --force
# horus.toml created, Cargo.toml moved to .horus/
```

---

## `horus scripts` - Run Scripts

**Alias**: `horus script`

**What it does**: Runs a named script defined in the `[scripts]` section of `horus.toml`. If no script name is given, lists all available scripts.

**Why it's useful**: Define project-specific commands (test suites, deployment steps, data processing) in `horus.toml` and run them with a single command.

### Basic Usage

```bash
# List available scripts
horus scripts

# Run a script by name
horus scripts deploy

# Run a script with arguments
horus scripts test -- --verbose
```

### All Options

```bash
horus scripts [NAME] [-- ARGS]

Arguments:
  [NAME]    Script name to run (omit to list available scripts)
  --        Arguments to pass to the script
```

### Examples

**Define scripts in horus.toml:**
```toml
[scripts]
deploy = "rsync -avz ./target/release/ robot@192.168.1.100:~/app/"
test-hw = "horus test --integration --sim"
bench = "horus bench -- --sample-size 50"
```

**List scripts:**
```bash
horus scripts
# Output:
# deploy    rsync -avz ./target/release/ robot@192.168.1.100:~/app/
# test-hw   horus test --integration --sim
# bench     horus bench -- --sample-size 50
```

**Run a script:**
```bash
horus scripts deploy
```

---

## `horus env` - Shell Integration

**What it does**: Manages shell integration for HORUS, setting up proxy wrappers so that `cargo`, `pip`, and `cmake` invocations inside a HORUS project automatically use the correct `.horus/` build directory and dependencies.

**Why it's useful**: Lets native tools (cargo, pip, cmake) work seamlessly with horus.toml projects without manual configuration.

### Basic Usage

```bash
# Print shell integration script to stdout
horus env

# Install shell integration (sets up cargo/pip/cmake proxy)
horus env --init

# Remove shell integration
horus env --uninstall
```

### All Options

```bash
horus env [OPTIONS]

Options:
      --init         Install shell integration (cargo/pip/cmake proxy)
      --uninstall    Remove shell integration
```

### How It Works

Running `horus env --init` adds shell functions that intercept calls to `cargo`, `pip`, and `cmake`.
When you run `cargo build` inside a HORUS project directory, the proxy detects the `horus.toml`, redirects to the generated `.horus/Cargo.toml`, and passes the build through the HORUS pipeline. Outside of HORUS projects, the native tools behave normally.

To activate for the current shell session without permanent installation:

```bash
eval "$(horus env)"
```

To make it permanent, `horus env --init` appends the integration to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.).

### Examples

**One-time setup (recommended):**
```bash
horus env --init
# Restart your shell or source your profile
source ~/.bashrc
```

**Verify integration is active:**
```bash
horus env
# Prints the shell functions; if already sourced, cargo/pip are proxied
```

**Remove if no longer needed:**
```bash
horus env --uninstall
source ~/.bashrc
```

For a detailed walkthrough of how native tool integration works, see **[Native Tool Integration](/development/native-tools)**.

---

## `horus completion` - Shell Completions

**What it does**: Generates shell completion scripts for bash, zsh, fish, elvish, or PowerShell. This is a hidden command (not shown in `horus --help`).

**Why it's useful**: Get tab-completion for all horus commands and flags in your shell.

### Basic Usage

```bash
# Bash
horus completion bash > ~/.local/share/bash-completion/completions/horus

# Zsh
horus completion zsh > ~/.zfunc/_horus

# Fish
horus completion fish > ~/.config/fish/completions/horus.fish
```

### Supported Shells

| Shell | Value |
|-------|-------|
| Bash | `bash` |
| Zsh | `zsh` |
| Fish | `fish` |
| Elvish | `elvish` |
| PowerShell | `powershell` |

---

## Common Workflows

### First Time Using HORUS

```bash
# Create a project
horus new my_robot --rust
cd my_robot

# Run it
horus run

# Monitor it (open a second terminal)
horus monitor
```

### Daily Development Cycle

```bash
# Edit your code
vim src/main.rs

# Format + lint before running
horus fmt
horus lint

# Run and test
horus run
horus test

# Monitor in another terminal
horus monitor
```

### Debugging a Motor Stutter

Your robot's motor is stuttering. Here's how to diagnose with the CLI:

```bash
# Step 1: Run with monitoring
horus run --release

# Step 2: Watch the motor command topic (another terminal)
horus topic echo motor/cmd_vel

# Step 3: Check if the motor node is missing deadlines
horus node info motor_ctrl
# Look at: avg_tick_ms, max_tick_ms, deadline_misses

# Step 4: Check the blackbox for deadline miss events
horus blackbox --follow

# Step 5: Record a session for offline analysis
horus record start
# ... reproduce the stutter ...
horus record stop

# Step 6: Replay the session to reproduce
horus record replay --session latest
```

### Adding a New Sensor

You're adding a LiDAR to your robot:

```bash
# Step 1: Check if there's a driver package available
horus search rplidar
horus install rplidar

# Step 2: If not, create your own driver
horus new lidar_driver --rust
cd lidar_driver
# ... write your driver code ...

# Step 3: Run and verify data flow
horus run
# In another terminal:
horus topic list               # See all topics
horus topic echo lidar/scan    # Watch scan data
horus topic hz lidar/scan      # Check publish rate

# Step 4: Verify coordinate frames
horus tf tree                  # See frame hierarchy
horus tf echo lidar base_link  # Check transform
```

### Preparing for Field Deployment

Pre-deployment checklist using the CLI:

```bash
# Step 1: Validate everything
horus doctor           # Ecosystem health check
horus check            # Validate horus.toml
horus fmt --check      # Ensure code is formatted
horus lint             # Check for issues

# Step 2: Run tests
horus test
horus bench            # Check performance hasn't regressed

# Step 3: Check dependencies
horus deps outdated    # Find outdated deps
horus deps audit       # Security audit

# Step 4: Clean build and verify
horus run --clean --release

# Step 5: Deploy to robot
horus deploy robot@192.168.1.10 --release
```

### Multi-Robot Development

Working with multiple robots that share code:

```bash
# Step 1: Publish shared packages
cd common_messages
horus publish
cd ../lidar_driver
horus publish

# Step 2: Install on each robot project
cd robot_alpha
horus install common_messages lidar_driver
cd ../robot_beta
horus install common_messages lidar_driver

# Step 3: Check running nodes
horus node list
```

### CI/CD Pipeline

```bash
# In your CI config (GitHub Actions, GitLab CI, etc.):
horus fmt --check                       # Fail if unformatted
horus lint                              # Fail on lint errors
horus check                             # Validate project
horus test                              # Run all tests
horus bench --fail-under 0.95           # Performance gate (optional)
horus doc --coverage --fail-under 80    # Doc coverage gate (optional)
```

### Share Your Work

```bash
# Login once
horus auth login

# Publish
horus publish

# Others can now:
horus install your-package-name
```

---

## Troubleshooting

### "command not found: horus"

Add cargo to your PATH:

```bash
export PATH="$HOME/.cargo/bin:$PATH"
echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```

### "Port already in use"

```bash
# Use different port
horus monitor 3001

# Or kill the old process
lsof -ti:3000 | xargs kill -9
```

### Build is slow

First build is always slow (5-10 min). After that it's fast (seconds).

Use `--release` only when you need speed, not during development.

### "Failed to create Topic"

Topic name conflict. Try a unique name or clean up stale shared memory.

**Note**: HORUS automatically cleans up shared memory after each run using session isolation. This error usually means a previous run crashed.

```bash
# Clean all HORUS shared memory (if needed after crashes)
horus clean --shm
```

---

## Environment Variables

Optional configuration:

```bash
# Custom registry (for companies)
export HORUS_REGISTRY_URL=https://your-company-registry.com

# Debug mode (see what's happening)
export RUST_LOG=debug
horus run

# CI/CD authentication
export HORUS_API_KEY=your-key-here
```

---

## Utility Scripts

Beyond the `horus` CLI, the repository includes helpful scripts:

```bash
./install.sh    # Install or update HORUS
```

See **[Troubleshooting & Maintenance](/getting-started/troubleshooting)** for complete details.

---

## Next Steps

Now that you know the commands:

1. **[Quick Start](/getting-started/quick-start)** - Build your first app
2. **[node! Macro](/concepts/node-macro)** - Write less code
3. **[Monitor Guide](/development/monitor)** - Master monitoring
4. **[Examples](/rust/examples/basic-examples)** - See real applications

**Having issues?** Check the **[Troubleshooting Guide](/getting-started/troubleshooting)** for solutions to common problems.

---

## See Also

- [Development Tools](/development) — Overview of all development tools and workflows
- [horus.toml](/concepts/horus-toml) — Project manifest reference and configuration options
- [Package Management](/package-management/package-management) — Dependency management and registry
- [Monitor Guide](/development/monitor) — Detailed web and TUI monitoring features
- [Troubleshooting](/getting-started/troubleshooting) — Solutions to common CLI errors and environment issues

---

## AI Integration
Path: /development/ai-integration
Description: Integrate AI and ML models (PyTorch, ONNX, cloud APIs) into HORUS robotics nodes

# AI Integration

You need to run AI/ML inference (object detection, path planning, scene understanding) alongside real-time robot control. Here is how to integrate ML models into HORUS nodes without blocking the control loop.

## When To Use This

- Adding object detection, classification, or pose estimation to a robot
- Running ONNX or PyTorch models alongside real-time control nodes
- Calling cloud APIs (GPT-4, Claude) for task planning from HORUS
- Prototyping ML pipelines in Python with HORUS pub/sub

**Use [AI-Assisted Development](/development/ai-assisted-development) instead** if you want to use AI coding agents to write HORUS code, not embed ML models in your robot.

## Prerequisites

- A HORUS project (Rust or Python)
- For Python ML: PyTorch, ONNX Runtime, or your preferred ML library installed
- For Rust ML: `ort` or `tract-onnx` crate added to dependencies
- Familiarity with [Topics](/concepts/core-concepts-topic) (ML nodes communicate via pub/sub)

## Overview

HORUS supports AI integration through two main approaches:

**Python ML Nodes** (Recommended for Prototyping)
- Use any Python ML library (PyTorch, TensorFlow, ONNX, etc.)
- Hardware nodes handle camera/sensor capture
- Pub/sub connects ML pipeline to control nodes
- 10-100ms typical inference latency

**Rust Inference** (For Production)
- ONNX Runtime via `ort` crate
- Tract (pure Rust inference engine)
- 1-50ms typical inference latency

### Architecture Pattern

*Diagram: Sensor Node (~1ms) → Topic → ML Node (~10-50ms) → Topic → Control Node (~1μs). Caption: "AI pipeline: Sensor → ML inference → Real-time control".*

The key insight: keep AI inference in dedicated nodes. HORUS topics decouple the fast control loop from slower ML processing, so a slow inference step doesn't block motor commands.

---

## ML Inference

### Model Format Comparison

| Format | Crate | Use Case | External Deps |
|--------|-------|----------|---------------|
| **ONNX** | `ort` | General (PyTorch, TF exports) | ONNX Runtime C lib |
| **ONNX** | `tract-onnx` | Pure Rust inference | None |
| **TFLite** | `tflite` | Edge/mobile models | TFLite C lib |

---

## Cloud API Integration

For complex reasoning tasks (task planning, scene understanding, natural language), call cloud APIs from HORUS nodes.

---

## Performance Considerations

### Latency Budget

Typical robotics control loop at 100Hz (10ms cycle):

```
Sensor capture:   ~1-16ms  (hardware dependent)
ML inference:     ~5-50ms  (model dependent)
Topic transfer:   ~85ns    (HORUS shared memory)
Control logic:    ~1μs     (HORUS node tick)
Motor command:    ~1ms     (hardware actuator)
```

ML inference is typically the bottleneck. Strategies to manage this:

### Throttle Inference

Process every Nth frame instead of every frame:

### Async Processing

Run ML in a background thread so the control loop isn't blocked:

### Use Appropriate Models

| Task | CPU Model | GPU Model | Cloud API |
|------|-----------|-----------|-----------|
| **Object Detection** | YOLOv8n (ONNX) | YOLOv8x | GPT-4 Vision |
| **Classification** | MobileNet (TFLite) | EfficientNet | Cloud Vision |
| **Pose Estimation** | MediaPipe | OpenPose | - |
| **Task Planning** | Phi-3 Mini | Llama 3 | GPT-4 / Claude |
| **Depth Estimation** | MiDaS Small | MiDaS Large | - |

---

## Best Practices

1. **Separate concerns**: Keep AI inference in dedicated nodes. Don't mix ML code with control logic.

2. **Handle failures gracefully**: AI inference can fail.
Always have a safe fallback:

   ```python
   def control_tick(node):
       if node.has_msg("detections"):
           detections = node.recv("detections")
           react_to(detections)
       else:
           # Safe default when no detections available
           node.send("cmd_vel", {"linear": 0.0, "angular": 0.0})
   ```

3. **Monitor performance**: Use `horus monitor` to watch node timing and message flow:

   ```bash
   horus monitor
   # See which nodes are slow
   ```

4. **Start with Python**: Prototype in Python first, then move performance-critical inference to Rust if needed.

5. **Cache results**: For cloud APIs, cache common responses to reduce latency and cost.

## Common Errors

| Symptom | Cause | Fix |
|---------|-------|-----|
| Control loop stutters when ML node runs | ML inference blocking the scheduler | Run ML node at a lower rate (`rate=10`) or use `.compute()` / `.async_io()` execution class |
| CUDA out of memory | Model too large for GPU | Use a smaller model variant (YOLOv8n instead of YOLOv8x) or offload to CPU |
| Python ML node not receiving messages | Topic name mismatch or wrong subscription | Verify topic names match exactly (use dots, e.g., `cam.image_raw`) |
| Cloud API timeout | Network latency or API rate limit | Add retry logic, use `.async_io()` for network nodes, cache common responses |
| ONNX model fails to load | Wrong ONNX opset version or missing operators | Check model compatibility with your `ort` or `tract` version |

---

## See Also

- [Python Bindings](/python/api/python-bindings) — Core Python API for HORUS nodes
- [ML Utilities](/python/library/ml-utilities) — Pre-built ONNX inference nodes and performance monitoring
- [Execution Classes](/concepts/execution-classes) — Understanding `.compute()` and `.async_io()` for ML workloads
- [AI-Assisted Development](/development/ai-assisted-development) — Using AI agents to write HORUS code
- [Telemetry Export](/development/telemetry) — Export ML inference metrics to external dashboards

---

## AI-Assisted Development
Path:
/development/ai-assisted-development Description: Use HORUS with AI coding agents — structured errors, machine-readable API extraction, and auto-fix workflows # AI-Assisted Development You want AI coding agents (Claude Code, Cursor, Copilot) to build, test, and fix your HORUS project autonomously. HORUS provides structured JSON diagnostics with fix commands, machine-readable API extraction, and diff-based change detection so agents can work without parsing compiler output with regex. ## When To Use This - Setting up a HORUS project for AI-assisted vibe coding - Configuring `CLAUDE.md` or similar agent instructions for HORUS - Using `--json-diagnostics` for automated error resolution in CI - Extracting API surface as JSON/Markdown for LLM context windows **Use [AI Integration](/development/ai-integration) instead** if you want to embed ML models inside your robot, not use AI tools to write code. ## Prerequisites - A HORUS project with `horus.toml` - An AI coding agent (Claude Code, Cursor, or any tool that can execute shell commands) ## The AI Development Loop An AI agent working on a HORUS project follows this loop: ``` 1. Understand the project horus doc --extract --json 2. Build and check horus build --json-diagnostics 3. Parse errors (structured JSON, one per line) 4. Auto-fix Execute the "fix" command from each diagnostic 5. Run tests horus test 6. Check what changed horus doc --extract --diff baseline.json 7. Repeat ``` Every step produces machine-readable output. No regex parsing of compiler errors needed. 
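Steps 3 and 4 of the loop can be sketched in a few lines of Python. This is a minimal sketch rather than part of HORUS: the helper name `collect_fix_commands` is hypothetical, and it assumes each diagnostic arrives as one JSON object per line carrying the `fix` field (`type` and `command`) shown in the diagnostics examples in this guide. Lines that are not JSON are skipped.

```python
import json
import shlex


def collect_fix_commands(stderr_text):
    """Return the argv list of every command-type fix found in JSON-lines diagnostics."""
    commands = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON diagnostic (plain tool output)
        try:
            diag = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip it rather than abort the loop
        fix = diag.get("fix") or {}
        if fix.get("type") == "command" and fix.get("command"):
            # Split into argv so the agent can run it without a shell.
            commands.append(shlex.split(fix["command"]))
    return commands


# Sample input: one diagnostic in the documented shape plus one non-JSON line.
sample = "\n".join([
    '{"tool":"cargo","code":"H001","severity":"error",'
    '"message":"Crate \'serde\' not found on crates.io",'
    '"fix":{"type":"command","command":"horus add serde"}}',
    "warning: plain text line that is not JSON",
])
fixes = collect_fix_commands(sample)
print(fixes)
```

An agent would typically pass each returned list to `subprocess.run(...)` and then rebuild. Returning argv lists instead of raw strings keeps the fix commands out of a shell, which is a cautious default when executing tool-suggested commands.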
--- ## Structured Error Diagnostics ### Enable JSON diagnostics ```bash horus build --json-diagnostics horus run --json-diagnostics ``` Every error, warning, and hint is emitted as a JSON object on stderr: ```json {"tool":"cargo","code":"H001","severity":"error","message":"Crate 'serde' not found on crates.io","hint":"Check the name or add it with:\n horus add serde","fix":{"type":"command","command":"horus add serde"},"docs_url":"https://horus.dev/errors/missing-crate"} ``` ### The Fix Field Every diagnostic includes a `fix` field that the AI agent can execute directly: ```json { "fix": { "type": "command", "command": "horus add serde" } } ``` The agent parses this, runs `horus add serde`, and the dependency is installed. No guessing. ### Error Code Catalog Diagnostics use standardized codes (H001-H064) grouped by tool: | Range | Tool | Examples | |-------|------|---------| | H001-H007 | Cargo (Rust) | Missing crate, version conflict, linker error | | H010-H014 | Pip (Python) | Package not found, version conflict, wheel build failure | | H030-H040 | Runtime | ModuleNotFoundError, ImportError, SyntaxError | | H050-H064 | Preflight | Missing toolchain, low disk space | ### JSON Output for Build/Test Results ```bash # Build with JSON result horus build --json # Output: {"success": true, "command": "build"} # Or: {"success": false, "command": "build", "errors": [{"message": "..."}]} # Test with JSON result horus test --json # Output: {"success": true, "command": "test"} ``` --- ## API Extraction for Context AI agents need to understand the project's API surface before making changes. `horus doc --extract` provides this in a single command. 
### Quick Overview ```bash # Brief text summary (fits in any context window) horus doc --extract ``` Output: ``` # my_robot v0.1.0 — 24 symbols, 78% documented # 3 nodes, 4 messages, 5 topics ## Message Types CmdVel { linear: f32, angular: f32 } Odometry { x: f64, y: f64, theta: f64 } ## Nodes ControllerNode (impl Node) [100hz, Rt] pub -> cmd_vel: CmdVel sub <- odom: Odometry SensorNode (impl Node) [200hz, Rt] pub -> imu: ImuReading ## Topic Graph cmd_vel: CmdVel ControllerNode -> MotorDriver odom: Odometry MotorDriver -> ControllerNode ## src/controller.rs — PID controller struct ControllerNode { pid: PidState } impl Node fn new(kp: f64, ki: f64, kd: f64) -> Result ``` ### Full JSON for Programmatic Access ```bash horus doc --extract --json ``` Returns a `ProjectDoc` with: - All symbols (functions, structs, enums, traits, messages, services, actions) - Doc comments and deprecation annotations - Trait implementations and method associations - Message flow graph (which nodes publish/subscribe to which topics) - Entry points (main functions, Node implementations) - Documentation coverage statistics ### Markdown for LLM System Prompts ```bash horus doc --extract --md > api.md ``` Produces markdown suitable for injecting into an LLM's context window. Include in your `CLAUDE.md` or system prompt: ```markdown # Project API Reference Run `horus doc --extract --md` for current API surface. 
``` ### Write to File ```bash horus doc --extract --json -o api.json horus doc --extract --html -o docs/api.html ``` ### Filter by Language ```bash horus doc --extract --lang rust # Only Rust symbols horus doc --extract --lang python # Only Python symbols ``` --- ## API Diff for Change Detection Compare the current API against a saved baseline to detect breaking changes: ```bash # Save baseline (e.g., on main branch) horus doc --extract --json -o baseline.json # After changes, diff against baseline horus doc --extract --diff baseline.json ``` Output: ``` API Changes: Added: + src/sensors/lidar.rs: function pub fn calibrate(&mut self) Removed: [!] BREAKING - src/legacy.rs: function pub fn old_handler() Changed: ~ src/controller.rs: function compute was: pub fn compute(&mut self, setpoint: f64, measurement: f64) -> f64 now: pub fn compute(&mut self, setpoint: f64, measurement: f64, dt: f64) -> f64 [!] BREAKING: added parameter `dt: f64` Summary: +1 added, -1 removed, ~1 changed, 2 breaking changes ``` ### CI Integration ```bash # Fail CI if breaking changes detected horus doc --extract --diff baseline.json # Exit code 1 if breaking changes found, 0 otherwise ``` ### Documentation Coverage Gate ```bash # Fail if coverage drops below 60% horus doc --extract --coverage --fail-under 60 ``` --- ## Watch Mode for Live Development ```bash # Regenerate docs on every file save horus doc --extract --watch # Write to file on every change horus doc --extract --json --watch -o api.json ``` The agent can poll `api.json` to stay updated as the code changes. --- ## Self-Contained HTML Report ```bash horus doc --extract --html -o api-docs.html ``` Generates a single HTML file with: - Embedded CSS (dark mode support) - Client-side search - Collapsible module sections - SVG topic flow graph - Coverage table - TODO/FIXME list No external dependencies — works offline, safe to share.
--- ## Typical Agent Workflow ### Step 1: Understand the project ```bash horus doc --extract --json -o api.json # Agent reads api.json, understands: # - What nodes exist and their topics # - What message types are defined # - What the public API surface looks like ``` ### Step 2: Make changes The agent edits source files based on the API understanding. ### Step 3: Build and auto-fix ```bash horus build --json-diagnostics 2> errors.jsonl # Agent parses each line, extracts "fix" commands, executes them # Example: {"fix": {"type": "command", "command": "horus add tokio"}} # Agent runs: horus add tokio # Rebuild until no errors ``` ### Step 4: Verify ```bash horus test --json horus doc --extract --diff baseline.json --json # Agent checks: tests pass? Any breaking changes? ``` ### Step 5: Report ```bash horus doc --extract --coverage # Agent reports: documentation coverage, new symbols, changes ``` --- ## CLAUDE.md Integration Add to your project's `CLAUDE.md` for AI agents: ```markdown ## Build Commands horus build --json-diagnostics # Build with structured errors horus test --json # Run tests horus doc --extract --json # Get API surface ## Auto-Fix Workflow When build fails, parse stderr JSON lines. Each diagnostic has a "fix" field with a command to run. Execute fix commands, then rebuild. ## API Understanding Run `horus doc --extract --brief` to see the project API. The topic graph shows message flow between nodes. 
``` --- ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `--json-diagnostics` not producing output | Compilation succeeds with no errors | This is expected; JSON output only appears for errors/warnings | | Agent cannot parse JSON output | Multiple JSON objects per line or mixed stderr | Ensure agent reads stderr line-by-line, each line is one JSON object | | `horus doc --extract` shows stale data | Source files changed but docs not regenerated | Run `horus doc --extract` again, or use `--watch` mode | | API diff shows false-positive breaking changes | Baseline JSON from different branch or version | Regenerate baseline from the correct branch with `horus doc --extract --json -o baseline.json` | --- ## See Also - [CLI Reference](/development/cli-reference) — Full `horus doc`, `horus build`, and `horus test` command options - [Testing](/development/testing) — Writing and running tests that AI agents can verify - [Error Handling](/development/error-handling) — Error types, severity, and the codes referenced by diagnostics - [Static Analysis](/development/static-analysis) — `horus check`, `horus lint`, and `horus fmt` for code quality - [AI Integration](/development/ai-integration) — Embedding ML models in HORUS nodes (complementary topic) --- ## Error Handling Path: /development/error-handling Description: Unified error types, result handling, retry configuration, and best practices for HORUS applications # Error Handling You need to handle errors properly in your HORUS nodes: propagate failures with context, retry transient errors, and degrade gracefully when hardware is unavailable. HORUS provides a unified `Error` type built on Rust's `Result` with structured sub-errors, automatic conversions, and retry utilities. 
## When To Use This - Returning errors from node `init()` and helper functions - Adding context when propagating errors with `?` - Matching on specific error variants to handle them differently - Retrying transient errors (network faults, topic full) with exponential backoff - Integrating with `anyhow` in application-level code **Use [Debugging Workflows](/development/debugging) instead** if you need to diagnose runtime problems like panics and deadline misses. ## Prerequisites - Familiarity with Rust's `Result` and `?` operator - `use horus::prelude::*;` in your code (exports `Error`, `Result`, `horus_internal!`, `HorusContext`) ## Quick Start The prelude exports these error types: - `Error` - The main error enum (short alias for `HorusError`) - `Result<T>` - Alias for `std::result::Result<T, Error>` (short alias for `HorusResult<T>`) The long names `HorusError` and `HorusResult` still work for backward compatibility, but new code should prefer the short aliases. ## Core Error Types ### Error The main error type for all HORUS operations (also available as `HorusError` for backward compatibility). ### Error Variants Most variants wrap a structured **sub-error enum** with specific fields for pattern matching; the rest carry their data directly: | Variant | Sub-error type | Domain | |---------|---------------|--------| | `Io(std::io::Error)` | — | File system and I/O errors | | `Config(ConfigError)` | `ConfigError` | Configuration parsing/validation | | `Communication(CommunicationError)` | `CommunicationError` | IPC, topics, network | | `Node(NodeError)` | `NodeError` | Node lifecycle (init, tick, shutdown) | | `Memory(MemoryError)` | `MemoryError` | SHM, mmap, tensor pools | | `Serialization(SerializationError)` | `SerializationError` | JSON, YAML, TOML, binary | | `NotFound(NotFoundError)` | `NotFoundError` | Missing frames, topics, nodes | | `Resource(ResourceError)` | `ResourceError` | Already exists, permission denied, unsupported | | `InvalidInput(ValidationError)` | `ValidationError` | Out-of-range, invalid
format, constraints | | `Parse(ParseError)` | `ParseError` | Integer, float, boolean parsing | | `InvalidDescriptor(String)` | — | Cross-process tensor descriptor validation | | `Transform(TransformError)` | `TransformError` | Extrapolation, stale data | | `Timeout(TimeoutError)` | `TimeoutError` | Operation exceeded time limit | | `Internal { message, file, line }` | — | Internal errors with source location | | `Contextual { message, source }` | — | Error with preserved source chain | ## Creating Errors ### Using Constructors `Error` provides convenience constructors for the most common variants: ```rust // simplified use horus::prelude::*; // Configuration error let err = Error::config("Invalid frequency: must be positive"); // Node error with context (takes node name + message) let err = Error::node("MotorController", "Failed to initialize PWM"); // Network fault (communication sub-type) let err = Error::network_fault("192.168.1.100", "Connection refused"); // Internal error (prefer horus_internal! 
macro for file/line capture) let err = horus_internal!("Unexpected state reached"); ``` ### Using Variants Directly For errors without convenience constructors, construct the sub-error directly: ```rust // simplified use horus::prelude::*; use horus::error::{ResourceError, NotFoundError, CommunicationError}; let err = Error::Resource(ResourceError::PermissionDenied { resource: "/dev/ttyUSB0".into(), required_permission: "read/write".into(), }); let err = Error::Resource(ResourceError::AlreadyExists { resource_type: "session".into(), name: "main".into(), }); let err = Error::NotFound(NotFoundError::Topic { name: "cmd_vel".into(), }); ``` ### Internal Errors with Source Location Use the `horus_internal!()` macro to create internal errors that automatically capture file and line number: ```rust // simplified use horus::prelude::*; // Captures file/line automatically return Err(horus_internal!("Unexpected state: {:?}", state)); // Produces: Internal { message: "Unexpected state: ...", file: "src/foo.rs", line: 42 } ``` ### Contextual Errors with Source Chain Use `Error::Contextual` to wrap errors with additional context while preserving the original error chain: ```rust // simplified use horus::prelude::*; let config = load_file("robot.yaml") .map_err(|e| Error::Contextual { message: "Failed to load robot configuration".to_string(), source: Box::new(e), })?; // Produces: "Failed to load robot configuration\n Caused by: " ``` ## Error Context The `HorusContext` trait lets you wrap errors with descriptive context: ```rust // simplified use horus::prelude::*; fn load_config(path: &str) -> Result<Config> { let data = std::fs::read_to_string(path) .horus_context(format!("Failed to read config from {}", path))?; let config: Config = toml::from_str(&data) .horus_context("Invalid TOML in config file")?; Ok(config) } ``` | Method | Description | |--------|-------------| | `.horus_context(msg)` | Wrap error with a context message (built eagerly) | | `.horus_context_with(\|\| format!(...))` |
Wrap with a lazily-evaluated message (avoids allocation on success) | Works on any `Result<T, E>` where `E: std::error::Error`. ## Error Propagation ### Using the `?` Operator ```rust // simplified use horus::prelude::*; fn load_robot_config(path: &str) -> Result<Config> { // File I/O errors automatically convert to Error::Io let content = std::fs::read_to_string(path)?; // JSON errors automatically convert to Error::Serialization let config: Config = serde_json::from_str(&content)?; Ok(config) } ``` ### Automatic Conversions `Error` implements `From<E>` for many common error types: | Source Type | Target Variant | |-------------|----------------| | `std::io::Error` | `Error::Io` | | `serde_json::Error` | `Error::Serialization` | | `serde_yaml::Error` | `Error::Serialization` | | `toml::de::Error` | `Error::Config` | | `toml::ser::Error` | `Error::Serialization` | | `std::num::ParseIntError` | `Error::Parse` | | `std::num::ParseFloatError` | `Error::Parse` | | `std::str::ParseBoolError` | `Error::Parse` | | `uuid::Error` | `Error::Internal` | | `std::sync::PoisonError<T>` | `Error::Internal` | | `Box` | `Error::Internal` | | `Box` | `Error::Contextual` | | `anyhow::Error` | `Error::Internal` | ## Error Checking ### Pattern Matching ```rust // simplified use horus::prelude::*; use horus::error::{NotFoundError, NodeError, ResourceError}; match result { Ok(value) => process(value), Err(Error::NotFound(NotFoundError::Topic { name })) => { eprintln!("Topic not found: {}", name); } Err(Error::Node(node_err)) => { eprintln!("Node error: {}", node_err); } Err(Error::Resource(ResourceError::PermissionDenied { resource, .. })) => { eprintln!("Permission denied: {}", resource); } Err(Error::Internal { message, file, line }) => { eprintln!("Internal error at {}:{}: {}", file, line, message); } Err(e) => { eprintln!("Unexpected error: {}", e); if let Some(hint) = e.help() { eprintln!(" hint: {}", hint); } } } ``` ## Best Practices ### 1.
Use Specific Error Types ```rust // simplified // Good: Specific error with context return Err(Error::node("IMU", "I2C read failed on register 0x3B")); // Avoid: Internal without context return Err(horus_internal!("something went wrong")); ``` ### 2. Add Context When Propagating ```rust // simplified fn initialize_sensor() -> Result<()> { open_i2c_bus().map_err(|e| { Error::node("IMU", format!("Failed to open I2C: {}", e)) })?; Ok(()) } ``` ### 3. Handle Expected Errors Gracefully ```rust // simplified fn get_config() -> Result<Config> { match load_config_file("config.yaml") { Ok(config) => Ok(config), Err(Error::NotFound(_)) => { // Expected: use defaults Ok(Config::default()) } Err(e) => Err(e), // Propagate unexpected errors } } ``` ### 4. Log Errors Before Propagating ```rust // simplified use horus::prelude::*; fn critical_operation() -> Result<()> { match do_something_important() { Ok(result) => Ok(result), Err(e) => { hlog!(error, "Critical operation failed: {}", e); Err(e) } } } ``` ## Node Error Handling ### In Tick Methods ### Initialization Errors ### Graceful Degradation ## Testing Error Handling ```rust // simplified #[cfg(test)] mod tests { use super::*; use horus::prelude::*; #[test] fn test_returns_not_found_for_missing_file() { let result = load_config("nonexistent.yaml"); assert!(matches!(result, Err(Error::NotFound(_)))); } #[test] fn test_returns_config_error_for_invalid_yaml() { let result = parse_config("invalid: [yaml"); assert!(matches!(result, Err(Error::Config(_)))); } #[test] fn test_error_context() { let err = Error::node("TestNode", "test message"); let display = format!("{}", err); assert!(display.contains("TestNode")); assert!(display.contains("test message")); } } ``` ## Integration with anyhow For applications that prefer `anyhow`: ```rust // simplified use anyhow::{Context, Result as AnyhowResult}; use horus::prelude::*; fn load_robot() -> AnyhowResult<Robot> { let config = load_config("robot.yaml") .context("Failed to load robot
configuration")?; let robot = Robot::from_config(config) .context("Failed to create robot from config")?; Ok(robot) } // Convert back to horus::Result if needed fn horus_function() -> Result<Robot> { load_robot().map_err(|e| Error::from(e)) } ``` ## HorusError Variants Reference The `HorusError` enum (aliased as `Error`) is `#[non_exhaustive]` and wraps structured sub-error types: | Variant | Wraps | Example sub-variants | |---------|-------|----------------------| | `Io(std::io::Error)` | std I/O error | — | | `Config(ConfigError)` | Config parsing/validation | `MissingField`, `ParseFailed`, `Other` | | `Communication(CommunicationError)` | IPC, topics, network | `TopicFull`, `TopicNotFound`, `NetworkFault` | | `Node(NodeError)` | Node lifecycle | `InitPanic`, `InitFailed`, `TickFailed`, `Other { node, message }` | | `Memory(MemoryError)` | SHM, tensor pools | `PoolExhausted`, `ShmCreateFailed`, `MmapFailed`, `AllocationFailed` | | `Serialization(SerializationError)` | Serde errors | `Json`, `Yaml`, `Toml`, `Binary` | | `NotFound(NotFoundError)` | Missing resources | `Frame`, `Topic`, `Node`, `Service`, `Parameter` | | `Resource(ResourceError)` | Resource lifecycle | `AlreadyExists`, `PermissionDenied`, `Unsupported` | | `InvalidInput(ValidationError)` | Input validation | `OutOfRange`, `InvalidFormat`, `InvalidEnum`, `MissingRequired` | | `Parse(ParseError)` | Parsing failures | `Int`, `Float`, `Bool`, `Custom` | | `InvalidDescriptor(String)` | Tensor descriptor | — | | `Transform(TransformError)` | TF errors | `Extrapolation`, `Stale` | | `Timeout(TimeoutError)` | Timeouts | — | | `Internal { message, file, line }` | Debug errors | — | | `Contextual { message, source }` | Error chains | — | ### Sub-Error Variant Details #### ConfigError | Variant | Fields | Severity | |---------|--------|----------| | `ParseFailed` | `format: &'static str`, `reason: String` | Permanent | | `MissingField` | `field: String`, `context: Option<String>` | Permanent | | `ValidationFailed` |
`field: String`, `expected: String`, `actual: String` | Permanent | | `InvalidValue` | `key: String`, `reason: String` | Permanent | | `Other(String)` | error message | Permanent | #### CommunicationError | Variant | Fields | Severity | |---------|--------|----------| | `TopicFull` | `topic: String` | Transient | | `TopicNotFound` | `topic: String` | Permanent | | `TopicCreationFailed` | `topic: String`, `reason: String` | Permanent | | `NetworkFault` | `peer: String`, `reason: String` | Transient | | `SerializationFailed` | `reason: String` | Permanent | | `ActionFailed` | `reason: String` | Permanent | #### NodeError | Variant | Fields | Severity | |---------|--------|----------| | `InitPanic` | `node: String` | Fatal | | `ReInitPanic` | `node: String` | Fatal | | `ShutdownPanic` | `node: String` | Permanent | | `InitFailed` | `node: String`, `reason: String` | Permanent | | `TickFailed` | `node: String`, `reason: String` | Permanent | | `Other` | `node: String`, `message: String` | Permanent | #### MemoryError | Variant | Fields | Severity | |---------|--------|----------| | `PoolExhausted` | `reason: String` | Transient | | `AllocationFailed` | `reason: String` | Permanent | | `ShmCreateFailed` | `path: String`, `reason: String` | Permanent | | `MmapFailed` | `reason: String` | Permanent | | `DLPackImportFailed` | `reason: String` | Permanent | | `OffsetOverflow` | (no fields) | Permanent | #### SerializationError | Variant | Fields | Severity | |---------|--------|----------| | `Json` | `source: serde_json::Error` | Permanent | | `Yaml` | `source: serde_yaml::Error` | Permanent | | `Toml` | `source: toml::ser::Error` | Permanent | | `Other` | `format: String`, `reason: String` | Permanent | #### NotFoundError | Variant | Fields | Severity | |---------|--------|----------| | `Frame` | `name: String` | Permanent | | `ParentFrame` | `name: String` | Permanent | | `Topic` | `name: String` | Permanent | | `Node` | `name: String` | Permanent | | `Service` | `name: 
String` | Permanent | | `Action` | `name: String` | Permanent | | `Parameter` | `name: String` | Permanent | | `Other` | `kind: String`, `name: String` | Permanent | #### ResourceError | Variant | Fields | Severity | |---------|--------|----------| | `AlreadyExists` | `resource_type: String`, `name: String` | Permanent | | `PermissionDenied` | `resource: String`, `required_permission: String` | Permanent | | `Unsupported` | `feature: String`, `reason: String` | Permanent | #### ValidationError | Variant | Fields | Severity | |---------|--------|----------| | `OutOfRange` | `field: String`, `min: String`, `max: String`, `actual: String` | Permanent | | `InvalidFormat` | `field: String`, `expected_format: String`, `actual: String` | Permanent | | `InvalidEnum` | `field: String`, `valid_options: String`, `actual: String` | Permanent | | `MissingRequired` | `field: String` | Permanent | | `ConstraintViolation` | `field: String`, `constraint: String` | Permanent | | `InvalidValue` | `field: String`, `value: String`, `reason: String` | Permanent | | `Conflict` | `field_a: String`, `field_b: String`, `reason: String` | Permanent | | `Other(String)` | error message | Permanent | #### ParseError | Variant | Fields | Severity | |---------|--------|----------| | `Int` | `input: String`, `source: ParseIntError` | Permanent | | `Float` | `input: String`, `source: ParseFloatError` | Permanent | | `Bool` | `input: String`, `source: ParseBoolError` | Permanent | | `Custom` | `type_name: String`, `input: String`, `reason: String` | Permanent | #### TransformError | Variant | Fields | Severity | |---------|--------|----------| | `Extrapolation` | `frame: String`, `requested_ns: u64`, `oldest_ns: u64`, `newest_ns: u64` | Permanent | | `Stale` | `frame: String`, `age: Duration`, `threshold: Duration` | Transient | #### TimeoutError (struct) | Field | Type | |-------|------| | `resource` | `String` | | `elapsed` | `Duration` | | `deadline` | `Option` | Severity: Transient All sub-error 
enums are `#[non_exhaustive]` — new variants may be added in future releases. ### Constructing Errors ```rust // simplified // Named constructors (3 available) Error::config("Invalid YAML syntax"); Error::node("SensorNode", "Sensor not responding"); Error::network_fault("192.168.1.100", "Connection refused"); // Internal errors (captures file and line automatically) horus_internal!("Unexpected state: {:?}", state); // Contextual errors (wrapping another error) Error::Contextual { message: "Failed to initialize sensor".into(), source: Box::new(io_error), }; ``` ### Type Aliases ```rust // simplified pub type HorusResult<T> = std::result::Result<T, HorusError>; pub type Result<T> = HorusResult<T>; // Convenience alias pub type Error = HorusError; // Short name ``` ## Rate and Stopwatch Utilities ### Rate Drift-compensated rate limiter for controlling loop frequency: ```rust // simplified use horus::prelude::*; let mut rate = Rate::new(100.0); // Target 100 Hz loop { do_work(); rate.sleep(); // Sleeps for the remainder of the 10ms period } // Check actual performance println!("Actual: {:.1} Hz", rate.actual_hz()); println!("Late: {}", rate.is_late()); ``` | Method | Description | |--------|-------------| | `Rate::new(hz)` | Create targeting `hz` frequency | | `rate.sleep()` | Sleep for remainder of current period (drift-compensated) | | `rate.actual_hz()` | Exponentially smoothed actual frequency | | `rate.target_hz()` | Configured target frequency | | `rate.period()` | Target period as `Duration` | | `rate.reset()` | Reset cycle start (after long pauses) | | `rate.is_late()` | Whether current cycle exceeded target | ### Stopwatch Simple elapsed time tracker: ```rust // simplified use horus::prelude::*; let mut sw = Stopwatch::start(); expensive_operation(); println!("Took {:.2} ms", sw.elapsed_ms()); // Lap: return elapsed and reset let lap_time = sw.lap(); ``` | Method | Description | |--------|-------------| | `Stopwatch::start()` | Create and start immediately | | `sw.elapsed()` | Elapsed time as
`Duration` | | `sw.elapsed_us()` | Elapsed microseconds (`u64`) | | `sw.elapsed_ms()` | Elapsed milliseconds (`f64`) | | `sw.reset()` | Reset to zero | | `sw.lap()` | Return elapsed and reset | ## Retry Configuration ### RetryConfig Configuration for automatic retry of transient errors with exponential backoff: ```rust // simplified use horus::prelude::*; // Default: 3 retries, 10ms initial backoff, 2x multiplier, 1s cap let config = RetryConfig::default(); // Custom let config = RetryConfig::new(5, 20_u64.ms()) .with_max_backoff(500_u64.ms()) .with_multiplier(1.5); ``` | Method | Returns | Description | |--------|---------|-------------| | `RetryConfig::new(max_retries, initial_backoff)` | `Self` | Create with 2x multiplier and 1s cap | | `.with_max_backoff(duration)` | `Self` | Set maximum backoff duration | | `.with_multiplier(f64)` | `Self` | Set backoff multiplier (must be positive and finite) | | `max_retries()` | `u32` | Maximum retry attempts | | `initial_backoff()` | `Duration` | Initial backoff before first retry | | `max_backoff()` | `Duration` | Maximum backoff cap | | `backoff_multiplier()` | `f64` | Multiplier applied after each retry | Default values: 3 retries, 10ms initial backoff, 2x multiplier, 1s max backoff. ### retry_transient() Generic retry function that only retries transient errors: ```rust // simplified use horus::prelude::*; let config = RetryConfig::new(3, 10_u64.ms()); let result = retry_transient(&config, || { some_operation_that_may_fail() })?; ``` **Signature:** ```rust // simplified pub fn retry_transient<T, F>(config: &RetryConfig, f: F) -> HorusResult<T> where F: FnMut() -> HorusResult<T>, ``` **Behavior:** - Calls `f()` up to `max_retries + 1` times (initial attempt + retries) - Only `Severity::Transient` errors trigger retry (with exponential backoff) - `Severity::Permanent` and `Severity::Fatal` errors propagate immediately ### Error Severity Each error variant has an associated severity that determines retry behavior: | Severity | Retry?
| Examples | |----------|--------|----------| | `Transient` | Yes | `TopicFull`, `NetworkFault`, `PoolExhausted`, `Timeout`, `Stale` | | `Permanent` | No | `TopicNotFound`, `MissingField`, `PermissionDenied`, `InitFailed` | | `Fatal` | No | `Internal`, `Io` | `retry_transient` and `ServiceClient::call_resilient` both use this severity classification. --- ## Quick Recipes ### Recipe: Hardware Init with Retry and Fallback ```rust // simplified fn init(&mut self) -> Result<()> { let config = RetryConfig::new(3, 100_u64.ms()); match retry_transient(&config, || open_hardware("/dev/ttyUSB0")) { Ok(hw) => { self.hardware = Some(hw); hlog!(info, "Hardware connected"); } Err(e) => { hlog!(warn, "Hardware unavailable, using simulation: {}", e); self.hardware = None; // Fallback to sim mode } } Ok(()) } ``` ### Recipe: Pattern-Match Specific Errors ```rust // simplified match Topic::::new("imu") { Ok(topic) => { /* use topic */ } Err(Error::Communication(CommunicationError::TopicCreationFailed { topic, reason })) => { hlog!(error, "Cannot create topic '{}': {} — check SHM permissions", topic, reason); } Err(e) => { hlog!(error, "Unexpected error: {}", e); } } ``` ### Recipe: Conditional Stop on Repeated Failures ```rust // simplified fn tick(&mut self) { match self.sensor.read() { Ok(data) => { self.consecutive_failures = 0; self.process(data); } Err(e) => { self.consecutive_failures += 1; hlog!(warn, "Sensor read failed ({}/5): {}", self.consecutive_failures, e); if self.consecutive_failures >= 5 { hlog!(error, "Too many failures, entering safe state"); self.enter_safe_state(); } } } } ``` --- ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `Error::NotFound(Topic { name })` | Publishing or subscribing to a topic that does not exist | Ensure topic names match exactly (case-sensitive, use dots not slashes) | | `Error::Memory(ShmCreateFailed { .. 
})` | Shared memory permission issue or stale segment | Run `horus clean --shm`, check OS-level SHM permissions | | `Error::Config(MissingField { .. })` | Required field missing in `horus.toml` | Run `horus check` to identify the missing field | | `retry_transient` retries permanent errors | Error variant has wrong severity classification | Check that the error is truly `Transient` (network, topic full), not `Permanent` | | `Error::Contextual` chain too deep | Multiple layers of `.horus_context()` wrapping | Use `.horus_context()` at domain boundaries, not on every `?` | --- ## See Also - [Error Types Reference](/rust/api/error-types) — Complete listing of all error variants and sub-errors - [Nodes](/concepts/core-concepts-nodes) — Node lifecycle where error handling patterns apply - [Services API](/rust/api/services) — `call_resilient` uses `RetryConfig` for service calls - [Debugging Workflows](/development/debugging) — Runtime debugging when errors manifest as panics or deadline misses - [Troubleshooting](/getting-started/troubleshooting) — Common issues and environment-level solutions --- ## Development Tools Path: /development Description: Build, test, debug, and monitor HORUS applications # Development Tools Everything you need to build, test, debug, and monitor HORUS robotics applications from your development machine. 
--- ## Quick Reference | Task | Command | Details | |------|---------|---------| | See all CLI commands | `horus --help` | [CLI Reference](/development/cli-reference) | | Run your project | `horus run` | [CLI Reference](/development/cli-reference) | | Run tests | `horus test` | [Testing](/development/testing) | | Monitor running system | `horus monitor` | [Monitor](/development/monitor) | | Check code quality | `horus lint` | [Static Analysis](/development/static-analysis) | | View logs | `horus log` | [Logging](/development/logging) | | Check system health | `horus doctor` | [CLI Reference](/development/cli-reference) | | Export metrics | `.telemetry(endpoint)` | [Telemetry Export](/development/telemetry) | | Manage parameters | `horus param` | [Parameters](/development/parameters) | --- ## Start Here | I want to... | Start with | |-------------|-----------| | Learn the CLI | [CLI Reference](/development/cli-reference) | | Debug a running robot | [Debugging](/development/debugging) then [Monitor](/development/monitor) | | Set up automated testing | [Testing](/development/testing) | | Add runtime parameters to my node | [Parameters](/development/parameters) | | Handle errors properly | [Error Handling](/development/error-handling) | | Use native cargo/pip inside HORUS projects | [Native Tool Integration](/development/native-tools) | | Organize a multi-crate project | [Multi-Crate Workspaces](/development/workspaces) | | Use AI to help write nodes | [AI-Assisted Development](/development/ai-assisted-development) | | Set up CI/CD | [Static Analysis](/development/static-analysis) then [Testing](/development/testing) | --- ## CLI and Build Tools - [CLI Reference](/development/cli-reference) — Complete reference for all 46+ `horus` commands with options, examples, and common errors - [Multi-Crate Workspaces](/development/workspaces) — Organize robotics projects with shared libraries, drivers, and binary targets in one workspace - [Native Tool 
Integration](/development/native-tools) — Use `cargo`, `pip`, and `cmake` directly inside HORUS projects with transparent proxy and `horus.toml` sync ## Testing and Debugging - [Testing](/development/testing) — Unit tests, integration tests, `tick_once()` testing, record/replay, and `horus test` - [Debugging](/development/debugging) — Step-by-step workflows for deadline misses, panics, and performance bottlenecks - [Monitor](/development/monitor) — Web and TUI dashboards for live node, topic, and parameter observation - [Static Analysis](/development/static-analysis) — Project validation with `horus check`, `horus lint`, and `horus fmt` - [Logging](/development/logging) — Structured node logging with `hlog!`, `hlog_once!`, and `hlog_every!` macros ## Configuration and Error Handling - [Parameters](/development/parameters) — Runtime parameters with live tuning, validation, persistence, and CLI management - [Error Handling](/development/error-handling) — Unified `Error` type, `Result` propagation, retry configuration, and graceful degradation ## Metrics and Integration - [Telemetry Export](/development/telemetry) — Export scheduler metrics to HTTP, UDP, file, or stdout for external dashboards - [AI Integration](/development/ai-integration) — Embed AI/ML models (PyTorch, ONNX, cloud APIs) in HORUS nodes - [AI-Assisted Development](/development/ai-assisted-development) — Structured errors, machine-readable API extraction, and auto-fix workflows for AI coding agents --- ## See Also - [Operations](/operations) — Deployment, production monitoring, fleet management - [Advanced Topics](/advanced) — Scheduler tuning, determinism, safety monitor, real-time setup - [API Cheatsheet](/reference/api-cheatsheet) — Quick API lookup for common patterns - [horus.toml](/concepts/horus-toml) — Project manifest reference ======================================== # SECTION: Advanced Topics ======================================== --- ## BlackBox Flight Recorder Path: /advanced/blackbox 
Description: Event recording and post-mortem debugging for HORUS applications # BlackBox Flight Recorder Your robot crashed overnight and the logs are gone. You need a crash-safe, always-on event recorder that captures the sequence of events leading up to any failure. The BlackBox is that recorder. ## When To Use This | Situation | Use BlackBox? | |-----------|--------------| | Robot runs unattended (production, field tests) | **Yes** -- you need crash forensics | | Safety-critical system (motors, arms, drones) | **Yes** -- every deadline miss is recorded | | Development with debugger attached | Optional -- you can inspect directly | | Short test runs (under 5 minutes) | Optional -- logs are usually sufficient | | Overnight regression testing | **Yes** -- find intermittent failures | **Use [Record & Replay](/advanced/record-replay) instead** if you need full node state (inputs/outputs) for deterministic replay. The BlackBox captures lightweight events (what happened), not full data. ## Prerequisites - Familiarity with [Scheduler Configuration](/advanced/scheduler-configuration) -- especially the `.blackbox()` builder method - Understanding of [Safety Monitor](/advanced/safety-monitor) for interpreting deadline misses and watchdog events ## Understanding the BlackBox **BlackBox vs Logging:** Logs are text, grow forever, and require parsing. The BlackBox is structured events, fixed-size (never fills your disk), and queryable by type (show only anomalies). **BlackBox vs Record/Replay:** Record/Replay captures full node state (inputs/outputs) for deterministic replay -- great for debugging but storage-heavy. The BlackBox captures lightweight events (what happened, not the full data) -- always-on, zero overhead, crash-safe. ## How It Works The BlackBox is a **circular buffer** — it keeps the last N events and discards the oldest when full. 
This means: - **Fixed memory** — never grows beyond the configured size - **Always-on** — negligible performance impact (events are tiny structs) - **Crash-safe** — data persists even if the process is killed - **No manual instrumentation** — the Scheduler records events automatically ## Enabling BlackBox Use the `.blackbox(size_mb)` builder method on the scheduler to enable the BlackBox. The argument is the buffer size in megabytes; for example, `.blackbox(16)` configures a 16 MB buffer. ## What Gets Recorded The BlackBox automatically captures events during scheduler execution: | Event | Description | |-------|-------------| | Scheduler start/stop | When the scheduler begins and ends | | Node execution | Each node tick with duration and success/failure | | Node errors | Failed node executions | | Deadline misses | Nodes that missed their timing deadline | | Budget violations | Nodes that exceeded their execution time budget | | Failure policy events | Failure policy state transitions | | Emergency stops | Safety system activations | | Custom events | User-defined markers | ## Post-Mortem Debugging After a failure, the BlackBox contains the sequence of events leading up to it. Inspect via the CLI (works for both Rust and Python projects) or programmatically in Rust: ```bash # CLI — works for any HORUS project (Rust or Python) horus blackbox --anomalies horus blackbox --json horus blackbox show --filter errors horus blackbox show --last 100 ``` ### Programmatic access In Rust, `scheduler.get_blackbox()` returns the recorder handle when the BlackBox is enabled, and its `anomalies()` method iterates the recorded anomaly events. A simplified snippet appears under Monitoring Failures in [Fault Tolerance](/advanced/circuit-breaker). ## Circular Buffer Behavior The BlackBox uses a fixed-size circular buffer. When full, the oldest events are discarded: ``` Buffer capacity: 50,000 records (10MB) Event 1 → [1, _, _, _, _] New events fill the buffer Event 2 → [1, 2, _, _, _] ... Event N → [1, 2, ..., N-1, N] Buffer full Event N+1 → [2, 3, ..., N, N+1] Oldest dropped ``` This ensures bounded memory usage while keeping the most recent events for debugging.
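The overwrite behavior described above can be modeled in a few lines. This is a minimal, illustrative sketch in plain Python, not the HORUS implementation: the `EventRing` class, its `record()` and `snapshot()` methods, and the `dropped` counter are hypothetical names chosen for the example.

```python
from collections import deque


class EventRing:
    """Toy fixed-capacity event buffer (illustrative only, not HORUS code)."""

    def __init__(self, capacity):
        # A deque with maxlen discards its oldest item automatically when full.
        self._events = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, event):
        if len(self._events) == self._events.maxlen:
            # The buffer is full, so appending overwrites the oldest event.
            self.dropped += 1
        self._events.append(event)

    def snapshot(self):
        """Return the surviving events, oldest first."""
        return list(self._events)


ring = EventRing(capacity=5)
for tick in range(1, 8):  # record events 1..7 into a 5-slot buffer
    ring.record(tick)

print(ring.snapshot())  # [3, 4, 5, 6, 7]
print(ring.dropped)     # 2
```

Memory use is bounded by `capacity` no matter how long the loop runs, which is the property the size table below relies on. The cost is visible in `dropped`: anything older than the last `capacity` events is gone.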
## Recommended Buffer Sizes | Use Case | Configuration | Buffer Size | |----------|---------------|-------------| | Development | `.blackbox(16)` | 16 MB | | Long-running production | `.blackbox(100)` | 100 MB | | Safety-critical | `.blackbox(1024)` | 1 GB | ## CLI Usage Inspect the BlackBox from the command line: ```bash # View all events horus blackbox # View anomalies only (errors, deadline misses, e-stops) horus blackbox --anomalies # Follow in real-time (like tail -f) horus blackbox --follow # Filter by node horus blackbox --node motor_ctrl # Filter by event type horus blackbox --event DeadlineMiss # JSON output for scripts/dashboards horus blackbox --json ``` ## Debugging Walkthrough: "My Robot Crashed Overnight" **Scenario:** Your mobile robot stopped moving during an overnight warehouse test. The process restarted but the original crash data is gone. **Step 1: Check the BlackBox** ```bash horus blackbox --anomalies ``` **Step 2: Read the timeline** ``` [03:17:01.001] SchedulerStart { nodes: 4, rate: 500Hz } [03:17:01.500] NodeTick { name: "planner", duration_us: 2100, success: true } [03:17:01.502] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 4200 } [03:17:01.503] DeadlineMiss { name: "collision_checker", deadline_us: 1900, actual_us: 5100 } [03:17:01.504] NodeError { name: "arm_controller", error: "joint limit exceeded" } [03:17:01.504] EmergencyStop { reason: "deadline miss threshold exceeded" } ``` **Step 3: Diagnose** The collision checker started missing its 1.9ms deadline (taking 4-5ms instead). During that time, the planner sent a trajectory that would have been rejected — but the check arrived too late. The arm exceeded its joint limits. 
**Step 4: Fix** - Tighten the collision checker's budget: `.budget(1500_u64.us())` - Or add a safety interlock: hold trajectory execution until collision check completes - Or move collision checking to the same RT thread as the arm controller ## BlackBox vs Other Debugging Tools | Tool | What it captures | Storage | When to use | |------|-----------------|---------|-------------| | **BlackBox** | Scheduler events (lightweight) | Fixed ring buffer (16-1024 MB) | Always-on crash forensics | | **Record/Replay** | Full node state (inputs/outputs) | Grows with time | Reproduce specific bugs | | **horus log** | Text log messages | Grows with time | Verbose debugging | | **horus monitor** | Live system state | None (real-time only) | Active debugging | ## Design Decisions **Why a ring buffer instead of a log file?** A log file grows without bound and eventually fills the disk. A ring buffer has fixed, predictable memory usage. For a robot running 24/7 in a warehouse, you cannot afford to run out of disk space. The ring buffer keeps the most recent events and silently discards the oldest. **Why structured events instead of text logs?** Structured events can be filtered by type (`--event DeadlineMiss`), queried by node (`--node motor_ctrl`), and exported to JSON for dashboards. Text logs require regex parsing and are fragile. Structured events also have lower overhead -- no string formatting during the hot path. **Why automatic recording instead of manual instrumentation?** The scheduler knows when every node ticks, when deadlines are missed, and when failures occur. Requiring developers to manually add recording calls would lead to incomplete data. The BlackBox captures everything the scheduler sees, with zero code changes. 
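The point about structured events versus text logs can be made concrete with a small sketch. It assumes a hypothetical `Event` record shape (the real BlackBox schema is not shown in this page) and reuses values from the walkthrough timeline. Filtering becomes a field comparison instead of a regular expression.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Event:
    # Hypothetical record shape for illustration; not the HORUS schema.
    time: str
    kind: str
    node: str
    detail: dict


# Event kinds treated as anomalies in this sketch (an assumption).
ANOMALY_KINDS = {"DeadlineMiss", "NodeError", "EmergencyStop"}

events = [
    Event("03:17:01.500", "NodeTick", "planner", {"duration_us": 2100}),
    Event("03:17:01.502", "DeadlineMiss", "collision_checker",
          {"deadline_us": 1900, "actual_us": 4200}),
    Event("03:17:01.504", "NodeError", "arm_controller",
          {"error": "joint limit exceeded"}),
]

# Filter by category, then by node, using plain field comparisons.
anomalies = [e for e in events if e.kind in ANOMALY_KINDS]
misses = [e for e in anomalies if e.node == "collision_checker"]

# Export the filtered records as JSON for a script or dashboard.
print(json.dumps([asdict(e) for e in misses], indent=2))
```

The same query against free-form log text would need a pattern per message format and would break whenever the wording changed.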
## Trade-offs | Gain | Cost | |------|------| | Fixed memory -- never fills disk | Oldest events are lost when buffer is full | | Always-on with zero overhead | Only captures scheduler events, not application-level data | | Crash-safe (survives process kill) | Requires post-mortem inspection (not real-time alerting) | | No code changes required | Cannot record custom application data (use Record/Replay for that) | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `horus blackbox` shows no events | BlackBox not enabled | Add `.blackbox(16)` to the scheduler builder | | BlackBox missing events from crash | Buffer too small, events were overwritten | Increase buffer size: `.blackbox(100)` or `.blackbox(1024)` | | Cannot read BlackBox after process restart | BlackBox data is in-memory, not persisted to disk by default | Use `horus blackbox --follow` during the run, or configure filesystem persistence | | `horus blackbox --anomalies` shows nothing | No anomalies occurred (all nodes ran within budget) | This is normal. Use `horus blackbox` without filters to see all events | | High memory usage | Buffer size too large for the system | Reduce `.blackbox(size_mb)` to match available memory | ## See Also - [Safety Monitor](/advanced/safety-monitor) — Deadline enforcement and watchdog events recorded by the BlackBox - [Fault Tolerance](/advanced/circuit-breaker) — Failure policies whose state transitions are recorded - [Record & Replay](/advanced/record-replay) — Full node state recording for deterministic replay - [Scheduler Configuration](/advanced/scheduler-configuration) — `.blackbox()` builder method --- ## Record & Replay Path: /advanced/record-replay Description: Recording and tick-perfect replay for debugging, testing, and analysis # Record & Replay You need to capture a robot's execution and replay it later for debugging, regression testing, or what-if analysis. 
HORUS record/replay captures full node state and replays it with tick-perfect determinism. ## When To Use This - You need to reproduce a bug that only occurs in specific conditions (field test, customer site) - You want to regression-test a new planner/controller against recorded sensor data - You need to compare two algorithm versions on the same input data - You are debugging a crash and need to step through the event timeline **Use [BlackBox](/advanced/blackbox) instead** if you only need lightweight event logging for crash forensics. Record/Replay captures full node state (inputs/outputs) and is storage-heavy. The BlackBox captures event metadata and is always-on. ## Prerequisites - Familiarity with [Scheduler Configuration](/advanced/scheduler-configuration) -- especially `.with_recording()` and `.deterministic(true)` - Understanding of [Deterministic Mode](/advanced/deterministic-mode) for replay to produce identical results - Understanding of [Topics](/concepts/core-concepts-topic) for how recorded data is injected into shared memory ## Overview The record/replay system supports: - **Full recording**: Capture entire system execution - **Tick-perfect replay**: Reproduce exact behavior deterministically - **Time travel**: Jump to any recorded tick - **Mixed replay**: Combine recorded nodes with live execution - **Playback control**: Speed adjustment, tick ranges ## Enabling Recording ### Via Builder API Enable recording through builder methods: ### Via CLI ```bash # Record during a run horus run --record my_session my_project ``` When recording is enabled, the scheduler automatically captures each node's inputs, outputs, and timing. 
## Replaying Recordings ### Full Replay Replay an entire recorded session: ### Time Travel Jump to specific tick ranges during replay: ### Mixed Replay Combine recorded nodes with live execution for what-if testing: ### Output Overrides Override specific outputs during replay: ## CLI Commands Record and replay from the command line: ```bash # Start recording during a run horus run --record my_session my_project # List recording sessions horus record list horus record list --long # Show file sizes and tick counts # Show details of a session horus record info my_session # Replay a recording horus record replay my_session horus record replay my_session --start-tick 1000 --stop-tick 2000 horus record replay my_session --speed 0.5 # Compare two recording sessions horus record diff session1 session2 horus record diff session1 session2 --limit 50 # Export to JSON or CSV horus record export my_session --output data.json --format json horus record export my_session --output data.csv --format csv # Inject recorded nodes into a new run horus record inject my_session --nodes camera_node,lidar_node horus record inject my_session --all --loop # Delete a recording session horus record delete my_session horus record delete my_session --force ``` ## Managing Recordings ## `replay_from` vs `add_replay` | Method | Use Case | Clock | |--------|----------|-------| | `Scheduler::replay_from(path)` | Full replay — all nodes from one recording | ReplayClock (recorded timestamps) | | `scheduler.add_replay(path, priority)` | Mixed — replay some nodes, run others live | ReplayClock for replay nodes | **When to use which:** - Use `replay_from()` for **post-mortem debugging** — replay an entire session exactly as recorded - Use `add_replay()` for **regression testing** — replay recorded sensor data while running a new version of your planner/controller live ```rust,ignore // Post-mortem: "what happened in production?" 
let mut scheduler = Scheduler::replay_from("crash_session.hbag")?; scheduler.run()?; // Regression test: "does the new planner work with the same sensor data?" let mut scheduler = Scheduler::new(); scheduler.add_replay("sensor_data.horus".into(), 0)?; // recorded LiDAR + IMU scheduler.add(NewPlannerV2::new()).order(1).build()?; // live planner under test scheduler.run()?; ``` ## Python Complete Recording Workflow ```python import horus def sensor_tick(node): node.send("imu", horus.Imu(accel_x=0.0, accel_y=0.0, accel_z=9.81)) sensor = horus.Node(name="imu", pubs=[horus.Imu], tick=sensor_tick, rate=100) # Step 1: Record a session sched = horus.Scheduler(tick_rate=100, recording=True) sched.add(sensor) sched.run(duration=5.0) # Step 2: Get recording files files = sched.stop_recording() print(f"Recorded to: {files}") # Step 3: List and manage for rec in sched.list_recordings(): print(f" Available: {rec}") # Step 4: Full replay sched2 = horus.Scheduler.replay_from(files[0]) sched2.run() # Step 5: Time travel replay sched3 = horus.Scheduler.replay_from(files[0]) sched3.start_at_tick(100) sched3.stop_at_tick(400) sched3.set_replay_speed(0.5) sched3.run() # Step 6: Mixed replay (recorded sensor + new controller) sched4 = horus.Scheduler(tick_rate=100) sched4.add_replay("recordings/imu@001.horus", priority=0) sched4.add(horus.Node(tick=new_controller, rate=100, order=1)) sched4.run() ``` > **Note**: Python supports the full replay API: `Scheduler.replay_from()`, `add_replay()`, `start_at_tick()`, `stop_at_tick()`, `set_replay_speed()`, and `set_replay_override()`. The workflow above exercises each of these except `set_replay_override()`; `new_controller` in Step 6 stands for your own tick function. --- ## Design Decisions **Why record at the topic level instead of the node level?** Recording topic data (inputs/outputs) rather than internal node state means recordings are portable across code versions. You can replay recorded sensor data against a new planner without recompiling the sensor driver. This is the same approach used by ROS 2's `rosbag2`.
**Why mixed replay instead of full-system-only replay?** The most common debugging workflow is: "replay the recorded sensors, but run my new controller live." Mixed replay enables this without re-recording. You swap out the node under test while keeping all other data identical. **Why `.horus` format instead of standard formats?** The `.horus` recording format preserves tick-level timing, shared memory layout, and type metadata. Standard formats (CSV, JSON) lose timing precision and type safety. Export to JSON/CSV is available via `horus record export` for analysis tools. ## Trade-offs | Gain | Cost | |------|------| | Tick-perfect deterministic replay | Recordings grow with session length (not bounded like BlackBox) | | Mixed replay enables what-if testing | Replaying with different code may produce different results (expected) | | Time travel to any tick | Random access requires indexing, which adds to recording size | | CLI tools for comparison and export | Custom `.horus` format requires HORUS tools to read | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Recording is empty | `.with_recording()` not set on the scheduler | Add `.with_recording()` to the scheduler builder | | Replay produces different results | Code changed between recording and replay | Use the same binary version, or use mixed replay for the changed node | | `replay_from()` fails with file not found | Incorrect recording path | Use `horus record list` to find available recordings | | Mixed replay node does not receive data | Topic names do not match between recorded and live nodes | Verify topic names are identical (case-sensitive) | | Replay runs instantly (no pacing) | Replay uses virtual time by default | Use `.with_replay_speed(1.0)` for real-time pacing | | Large recording files filling disk | Long sessions with many topics | Use `horus record delete` to clean up, or record only specific sessions | | `horus record diff` shows no differences | Sessions are identical 
| This confirms both runs produced the same output | ## See Also - [BlackBox Flight Recorder](/advanced/blackbox) — Lightweight event recording for crash forensics - [Deterministic Mode](/advanced/deterministic-mode) — Required for bit-identical replay - [Scheduler Configuration](/advanced/scheduler-configuration) — `.with_recording()` and `.deterministic(true)` builder methods - [Time API](/rust/time-api) — ReplayClock and time backends --- ## Fault Tolerance Path: /advanced/circuit-breaker Description: Per-node failure policies for preventing cascading failures — Fatal, Restart, Skip, Ignore with real robotics examples # Fault Tolerance You need to prevent a single crashing node from killing your entire robot. HORUS failure policies let you define per-node behavior on failure: stop the system, restart with backoff, skip temporarily, or ignore entirely. ## When To Use This - Any robot with more than one node (which is every production robot) - Sensor drivers that may disconnect (USB, serial, network) - Systems with non-critical nodes (logging, telemetry) that should not bring down the system - Safety-critical deployments where you need explicit failure contracts per node **Use [Safety Monitor](/advanced/safety-monitor) instead** if you need watchdog timers, deadline enforcement, or graduated degradation. Failure policies and safety monitoring are complementary -- use both in production. 
## Prerequisites - Familiarity with [Nodes](/concepts/core-concepts-nodes) and the [Scheduler Configuration](/advanced/scheduler-configuration) node builder API - Understanding of which nodes are critical vs non-critical in your system ## The Problem Without failure policies, one crashing node kills the entire system: ``` Tick 100: sensor_driver panics (USB disconnected) Tick 100: scheduler stops Tick 100: motor_controller stops receiving commands Result: robot stops moving in the middle of a task ``` With failure policies, the system adapts: ``` Tick 100: sensor_driver panics (USB disconnected) Tick 100: FailurePolicy::Restart → re-init sensor_driver (10ms backoff) Tick 101: sensor_driver panics again → restart (20ms backoff) Tick 102: USB reconnects → sensor_driver.init() succeeds → normal operation Result: robot paused briefly, then resumed automatically ``` ## The Four Policies ### Fatal — Stop Everything The first failure stops the scheduler immediately. Use for nodes where continued operation after failure is **unsafe**: - Motor controllers (stale commands = uncontrolled motion) - Safety monitors (can't monitor safety if the monitor is broken) - Emergency stop handlers **When it triggers**: `node.tick()` raises an exception (Python) or panics (Rust). The scheduler calls `stop()` and shuts down all nodes cleanly. ### Restart — Re-Initialize with Backoff The scheduler re-initializes the node with exponential backoff. Once `max_restarts` is exhausted, the policy escalates to a fatal stop. ``` failure 1 → restart, wait 50ms failure 2 → restart, wait 100ms (2x backoff) failure 3 → restart, wait 200ms (2x backoff) failure 4 → max_restarts exceeded → fatal stop ``` After a successful tick, the backoff timer clears.
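The backoff schedule above can be reproduced with a small model. This is a sketch under stated assumptions, not the HORUS scheduler: `RestartTracker`, `restart_delays`, and their parameters are hypothetical names, and the model assumes a successful tick resets the failure count along with the backoff.

```python
def restart_delays(initial_ms, max_restarts, factor=2):
    """Backoff delay in milliseconds before each permitted restart."""
    return [initial_ms * factor ** attempt for attempt in range(max_restarts)]


class RestartTracker:
    """Toy model of a Restart policy (hypothetical, not the HORUS API)."""

    def __init__(self, initial_ms=50, max_restarts=3):
        self.delays = restart_delays(initial_ms, max_restarts)
        self.failures = 0

    def on_failure(self):
        """Return the wait in ms before restarting, or None to escalate to a fatal stop."""
        self.failures += 1
        if self.failures > len(self.delays):
            return None  # max_restarts exceeded
        return self.delays[self.failures - 1]

    def on_success(self):
        # Assumption: a successful tick clears all backoff state.
        self.failures = 0


tracker = RestartTracker(initial_ms=50, max_restarts=3)
outcomes = [tracker.on_failure() for _ in range(4)]
print(outcomes)  # [50, 100, 200, None]
```

With `initial_ms=50` and `max_restarts=3`, the four consecutive failures map onto the four rows of the diagram: three restarts with doubling waits, then escalation.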
Use for nodes that can recover from transient failures: - Sensor drivers (hardware reconnection) - Network clients (server temporarily unavailable) - Camera nodes (USB reset) ### Skip — Tolerate with Cooldown After `max_failures` consecutive failures, the node is **suppressed** for the cooldown period. After cooldown, the node is allowed again and the failure counter resets. ``` failure 1 → continue failure 2 → continue failure 3 → continue failure 4 → continue failure 5 → node suppressed for 1 second ... 1 second passes ... node allowed again, failure count = 0 ``` Use for nodes whose absence doesn't affect core robot operation: - Logging and telemetry upload - Diagnostics reporting - Cloud sync - Non-critical monitoring ### Ignore — Swallow Failures Failures are completely ignored. The node keeps ticking every cycle regardless of errors. Use only when partial results are acceptable: - Statistics collectors (missing one sample is fine) - Best-effort visualization - Debug output nodes ## Severity-Aware Handling HORUS errors carry severity levels that can **override** the configured policy: | Severity | Effect | |----------|--------| | **Fatal** (e.g., shared memory corruption) | Always stops the scheduler, even with `Ignore` policy | | **Transient** (e.g., topic full, network timeout) | De-escalates `Fatal` policy to `Restart` (transient errors are recoverable) | | **Permanent** (e.g., invalid configuration) | Follows the configured policy | This means a safety-critical node with `Fatal` policy won't kill the system on a transient network glitch — it'll restart instead. But a shared-memory corruption always stops, even on an `Ignore` node. 
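The override rules in the severity table reduce to a short decision function. The sketch below is illustrative only: the `Policy` and `Severity` enums and `effective_policy()` are hypothetical names, and the logic encodes just the three rows of the table above.

```python
from enum import Enum


class Policy(Enum):
    FATAL = "fatal"
    RESTART = "restart"
    SKIP = "skip"
    IGNORE = "ignore"


class Severity(Enum):
    FATAL = "fatal"
    TRANSIENT = "transient"
    PERMANENT = "permanent"


def effective_policy(configured, severity):
    """Resolve the policy actually applied to one error (illustrative model)."""
    if severity is Severity.FATAL:
        # Fatal errors always stop the scheduler, even on an Ignore node.
        return Policy.FATAL
    if severity is Severity.TRANSIENT and configured is Policy.FATAL:
        # Transient errors are recoverable, so Fatal de-escalates to Restart.
        return Policy.RESTART
    # Permanent errors, and transient errors on other policies, follow the configuration.
    return configured


print(effective_policy(Policy.IGNORE, Severity.FATAL))     # Policy.FATAL
print(effective_policy(Policy.FATAL, Severity.TRANSIENT))  # Policy.RESTART
print(effective_policy(Policy.SKIP, Severity.PERMANENT))   # Policy.SKIP
```

The table does not say how a transient error interacts with `Skip` or `Ignore`; this sketch assumes the configured policy applies unchanged in those cases.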
## Complete Robot Example ## Choosing the Right Policy | Node Type | Policy | Why | |-----------|--------|-----| | Motor control, safety | `Fatal` | Unsafe to continue without these | | Sensor drivers | `Restart(3-5, 50-200ms)` | Hardware reconnects are common | | Perception pipelines | `Restart(3, 100ms)` or `Skip(5, 2s)` | Can recover or degrade gracefully | | Logging, telemetry | `Skip(5, 1s)` or `Ignore` | Non-critical, absence is tolerable | | Debug/visualization | `Ignore` | Partial results are fine | ## Python Error Handlers Use the `on_error` callback to handle failures in Python nodes: ```python import horus def my_error_handler(node, exception): node.log_error(f"Node failed: {exception}") # Optionally take corrective action if "USB" in str(exception): node.log_warning("USB disconnected — will retry on restart") def sensor_tick(node): data = read_hardware() # May raise OSError node.send("sensor", data) sensor = horus.Node( name="sensor", tick=sensor_tick, on_error=my_error_handler, failure_policy="restart", rate=100, pubs=["sensor"], ) horus.run(sensor) ``` The `on_error` callback runs after the failure policy processes the error. It receives the node and the exception object. Use it for logging, alerting, or state cleanup before the next restart. --- ## Monitoring Failures Failure events are recorded in the [BlackBox](/advanced/blackbox) flight recorder: ```bash # View failure events from the blackbox horus blackbox show --filter errors # Monitor live horus log -f --level error ``` In code: ```rust // simplified // Inspect anomalies via CLI: horus blackbox --anomalies if let Some(bb) = scheduler.get_blackbox() { for record in bb.lock().expect("blackbox lock").anomalies() { println!("[tick {}] {:?}", record.tick, record.event); } } ``` ## Design Decisions **Why per-node policies instead of a global failure mode?** Different nodes have fundamentally different failure characteristics. A motor controller failure is safety-critical -- you must stop. 
A telemetry uploader failure is harmless -- you can ignore it. Per-node policies match the failure contract to the node's role in the system. **Why severity-aware handling?** A transient network timeout on a `Fatal` node should not kill the system -- it should restart. But shared memory corruption on an `Ignore` node must always stop -- it indicates fundamental system failure. Severity-aware handling prevents both over-reaction (killing the system on a glitch) and under-reaction (ignoring corruption). **Why exponential backoff on Restart?** A sensor driver that fails once probably has a transient issue (USB reset). A sensor driver that fails 5 times in rapid succession has a permanent problem. Exponential backoff gives transient issues time to resolve while quickly escalating persistent failures. ## Trade-offs | Gain | Cost | |------|------| | One crashing node does not kill the system | Must think about failure contracts per node | | Restart with backoff handles transient hardware issues | Backoff adds latency during recovery | | Skip prevents non-critical failures from cascading | Skipped nodes produce no output during cooldown | | Severity overrides prevent both over- and under-reaction | Behavior depends on error severity, not just configured policy | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | System stops on a transient network error | `Fatal` policy on a node with transient failures | Use `Restart` for nodes that can recover, or rely on severity-aware de-escalation | | Node restarts endlessly | `max_restarts` set too high, or the failure is permanent | Lower `max_restarts` and check logs for the root cause | | Non-critical node drags down performance | `Skip` cooldown too short, node fails immediately after cooldown | Increase cooldown duration or investigate the root cause | | `Fatal` node did not stop the system | Error was classified as `Transient` severity, de-escalated to Restart | Expected behavior. 
Truly fatal errors (e.g., SHM corruption) always stop regardless | | Cannot find failure events after crash | No BlackBox configured | Add `.blackbox(64)` to the scheduler for crash forensics | ## See Also - [Safety Monitor](/advanced/safety-monitor) — Watchdog timers and graduated degradation - [BlackBox Recorder](/advanced/blackbox) — Crash forensics and post-mortem debugging - [Scheduler Configuration](/advanced/scheduler-configuration) — Per-node builder API and execution classes - [Nodes](/concepts/core-concepts-nodes) — Node lifecycle and trait methods --- ## Deterministic Mode Path: /advanced/deterministic-mode Description: Run-to-run reproducible execution with SimClock, dependency ordering, and deterministic RNG for simulation, testing, and replay # Deterministic Mode You need reproducible, bit-identical execution across runs for simulation, testing, or replay. Here is how to enable deterministic mode and what it changes about scheduler behavior. ## When To Use This - You are running physics simulation and need virtual time (SimClock) instead of wall clock - You want CI tests that never flake due to timing nondeterminism - You are replaying recorded sessions for regression testing - You need to compare two runs and verify identical outputs **Do not use for real hardware.** Deterministic mode uses virtual time -- a motor controller receiving `horus::dt()` gets a fixed value regardless of how fast ticks actually execute. Use normal mode for real actuators. 
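To see why a fixed virtual `dt` gives reproducible results but says nothing about real elapsed time, consider this toy model. It is a sketch only: `VirtualClock` and `simulate()` are hypothetical and stand in for the idea behind SimClock, not its implementation.

```python
class VirtualClock:
    """Toy fixed-step clock (illustrative; not the HORUS SimClock)."""

    def __init__(self, rate_hz):
        self.dt_ns = 1_000_000_000 // rate_hz  # fixed step, e.g. 1 ms at 1 kHz
        self.now_ns = 0

    def advance(self):
        # Time moves by exactly one step per tick, however long the tick took.
        self.now_ns += self.dt_ns
        return self.dt_ns


def simulate(ticks, rate_hz=1000, velocity_m_s=0.3):
    """Integrate a constant velocity using virtual time only."""
    clock = VirtualClock(rate_hz)
    position_m = 0.0
    for _ in range(ticks):
        dt_s = clock.advance() / 1e9
        position_m += velocity_m_s * dt_s
    return clock.now_ns, position_m


run_a = simulate(500)
run_b = simulate(500)
print(run_a == run_b)  # True
print(run_a[0])        # 500000000
```

Both runs report 0.5 s of virtual time regardless of how long the loop took on the wall clock. That mismatch is harmless in simulation and is exactly what makes the mode unsuitable for real actuators.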
## Prerequisites - Familiarity with [Scheduler](/concepts/core-concepts-scheduler) and [Scheduler Configuration](/advanced/scheduler-configuration) - Understanding of [Nodes](/concepts/core-concepts-nodes) (especially `publishers()` and `subscribers()` for dependency ordering) ## Enabling Deterministic Mode ## What Changes in Deterministic Mode | Aspect | Normal Mode | Deterministic Mode | |--------|-------------|-------------------| | Clock | Wall clock (real time) | Virtual SimClock (fixed dt per step) | | RNG | System entropy | Tick-seeded (reproducible) | | Dependency graph | Same graph (from topic metadata) | Same graph (from topic metadata) | | Independent nodes | **Parallel** (ready-dispatch) | **Sequential** (reproducible order) | | Dependent nodes | Causal order (producer before consumer) | Causal order (producer before consumer) | | BestEffort execution | Ready-dispatch (optimal parallelism) | Sequential within steps, SimClock between steps | | Execution classes | All active | All active | | Failure policies | Active | Active | | Watchdog | Active | Active | **Both modes use the same dependency graph.** The difference is execution strategy: normal mode dispatches independent nodes to a thread pool for maximum parallelism; deterministic mode executes them sequentially for bit-identical reproducibility. Dependent nodes (publisher → subscriber) are always causally ordered in both modes. ## Framework Time API Use `horus::now()`, `horus::dt()`, and `horus::rng()` instead of `Instant::now()` and `rand::random()`. These are the standard framework API — same pattern as `hlog!()` for logging. See the [Time API reference](/rust/time-api) for the full API. ## Dependency Ordering The scheduler builds a dependency graph automatically from topic `send()`/`recv()` calls. When a node calls `send()` on a topic, it registers as a publisher. When another calls `recv()`, it registers as a subscriber. The scheduler detects these edges and sequences producer before consumer. 
**No manual metadata or trait methods needed.** ```rust // simplified struct SensorDriver { scan_topic: Topic, } impl Node for SensorDriver { fn name(&self) -> &str { "sensor" } fn tick(&mut self) { // send() registers this node as publisher of "scan" self.scan_topic.send(self.read_hardware()); } } struct Controller { scan_topic: Topic, cmd_topic: Topic, } impl Node for Controller { fn name(&self) -> &str { "controller" } fn tick(&mut self) { // recv() registers as subscriber of "scan" if let Some(scan) = self.scan_topic.recv() { let cmd = self.compute_velocity(&scan); // send() registers as publisher of "cmd" self.cmd_topic.send(cmd); } } } ``` The scheduler automatically ensures `sensor` ticks before `controller` because `controller` subscribes to a topic that `sensor` publishes. No `.order()` values needed — the dependency graph handles it. In **normal mode**, independent nodes (no shared topics) run in parallel via the ready-dispatch executor. In **deterministic mode**, they run sequentially within each dependency step for reproducibility. ### Fallback Without Metadata If nodes don't call `send()`/`recv()` during `init()` or the first tick, the scheduler has no topic metadata to build edges from. It falls back to `.order()` tiers: lower order runs first, same order = independent (parallel in normal mode, sequential in deterministic mode). 
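The ordering rule can be sketched as a topological sort over publisher and subscriber sets. This is an illustrative model, not the scheduler's algorithm: the `execution_order()` function and the node dictionary layout are hypothetical, and using `(order, name)` to sequence independent nodes within a step is an assumption made so the output is deterministic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(nodes):
    """Return a deterministic producer-before-consumer order.

    nodes maps name -> {"pubs": set of topics, "subs": set of topics, "order": int}.
    """
    # Map each topic to the nodes that publish it.
    publishers = {}
    for name, meta in nodes.items():
        for topic in meta["pubs"]:
            publishers.setdefault(topic, set()).add(name)

    # A node depends on every other node publishing a topic it subscribes to.
    sorter = TopologicalSorter()
    for name, meta in nodes.items():
        deps = set()
        for topic in meta["subs"]:
            deps |= publishers.get(topic, set()) - {name}
        sorter.add(name, *deps)

    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        # Nodes ready in the same step are independent; sequence them
        # by (order, name) so every run produces the same list.
        step = sorted(sorter.get_ready(), key=lambda n: (nodes[n]["order"], n))
        result.extend(step)
        sorter.done(*step)
    return result


nodes = {
    "controller": {"pubs": {"cmd"}, "subs": {"scan"}, "order": 0},
    "sensor": {"pubs": {"scan"}, "subs": set(), "order": 0},
    "logger": {"pubs": set(), "subs": set(), "order": 5},
    "motor": {"pubs": set(), "subs": {"cmd"}, "order": 0},
}
print(execution_order(nodes))  # ['sensor', 'logger', 'controller', 'motor']
```

`sensor` and `logger` share no topics, so they land in the same step; a parallel executor could run them together, while a sequential one runs them in the sorted order shown. If no node declared any topics, every node would be ready in the first step and the result would reduce to plain `.order()` tiers.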
## Normal vs Deterministic: When to Use Which | Purpose | Mode | Why | |---------|------|-----| | Real robot deployment | Normal | Wall clock matches hardware reality | | Simulation (physics engine) | Deterministic | Virtual clock matches physics time | | Unit / integration tests | Deterministic | Reproducible, no flakes | | CI pipeline | Deterministic | Same result every run | | Record/replay debugging | Replay (`replay_from()`) | Recorded clock reproduces exact scenario | | Recording a session on real robot | Normal + `.with_recording()` | Wall clock for hardware, recording for later | **Deterministic mode uses virtual time** — it cannot drive real hardware. A motor controller receiving `horus::dt()` in deterministic mode gets a fixed value (e.g., exactly 1ms for 1kHz), regardless of how fast ticks actually execute. This is correct for simulation but wrong for real actuators. ## Record and Replay During replay, recorded topic data is injected into shared memory so live subscriber nodes see the replayed data. ## Determinism Guarantees **What HORUS guarantees**: same binary + same hardware produces bit-identical outputs, tick for tick, across unlimited runs. **What is NOT deterministic** (hardware/compiler, not HORUS): - **Cross-platform float**: IEEE 754 differs across CPUs (FMA, extended precision). Same binary + same hardware = deterministic. - **Direct `Instant::now()`**: Bypasses the framework clock. Use `horus::now()` instead. - **`HashMap` iteration**: Rust randomizes per process. Use `BTreeMap` in deterministic nodes. ## Design Decisions **Why virtual time instead of slowed-down wall clock?** A slowed wall clock still introduces nondeterminism from OS scheduling jitter. Virtual time (SimClock) advances by exactly `1/rate` per tick, making output bit-identical across runs. This is the same approach used by Gazebo, Drake, and Isaac Sim. **Why tick-seeded RNG instead of a global seed?** A global seed produces different sequences if nodes are added or removed. 
Tick-seeded RNG produces the same value at tick N regardless of how many nodes are in the system, making results stable across configuration changes. **Why dependency ordering instead of always-sequential?** Sequential execution is deterministic but slow. Dependency ordering gives the same guarantees (producer before consumer) while allowing independent nodes to run in parallel in normal mode; deterministic mode walks the same graph sequentially. This matches real-world robotics where sensor nodes have no dependencies on each other. ## Trade-offs | Gain | Cost | |------|------| | Bit-identical runs for testing and replay | Cannot drive real hardware (virtual time) | | Dependency ordering eliminates data races | Requires topic metadata from `send()`/`recv()` calls for automatic ordering (falls back to `.order()` otherwise) | | Tick-seeded RNG is stable across config changes | Must use `horus::rng()` instead of `rand::random()` | | Same dependency graph as normal mode | Independent nodes run sequentially (slower than normal mode) | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Different results across runs | Using `Instant::now()` or `rand::random()` directly | Use `horus::now()`, `horus::dt()`, and `horus::rng()` instead | | Different results across platforms | IEEE 754 float differences (FMA, extended precision) | Expected. 
Same binary + same hardware = deterministic | | Nodes execute in unexpected order | No topic metadata registered (nodes did not call `send()`/`recv()` during `init()` or the first tick) | Call `send()`/`recv()` during `init()` or the first tick, or use `.order()` as fallback | | `HashMap` iteration order varies | Rust randomizes HashMap per process | Use `BTreeMap` in deterministic nodes | | Motor overshoots in deterministic mode | Virtual time does not match real actuator timing | Do not use deterministic mode with real hardware | | Replay produces different output | Code changed between recording and replay | Replay requires the same binary version | --- ## See Also - [Record & Replay](/advanced/record-replay) -- Session recording and mixed replay - [Scheduler Configuration](/advanced/scheduler-configuration) -- Builder methods including `.deterministic(true)` - [Time API](/rust/time-api) -- SimClock, `horus::now()`, `horus::dt()`, `horus::rng()` - [Scheduler Concepts](/concepts/core-concepts-scheduler) -- How the scheduler manages node execution --- ## Network Backends Path: /advanced/network-backends Description: Communication backends in HORUS -- automatic local IPC with planned network transport # Communication Backends You need to understand how HORUS routes messages between nodes and what latency to expect. HORUS automatically selects the optimal communication backend based on topology -- no configuration needed. ## When To Use This - You want to understand the latency characteristics of node-to-node communication - You are debugging unexpected latency in your pub/sub pipeline - You are planning a multi-process architecture and need to know cross-process overhead - You want to know the roadmap for multi-machine communication **This is informational.** You do not need to configure backends manually -- HORUS selects them automatically. 
## Prerequisites - Familiarity with [Topics](/concepts/core-concepts-topic) and how pub/sub works in HORUS - Understanding of [Multi-Process Communication](/concepts/multi-process) if using cross-process topics ## Automatic Backend Selection When you call `Topic::new("name")`, HORUS automatically detects the optimal backend based on the number of publishers, subscribers, and whether they're in the same process: No configuration needed. The backend upgrades and downgrades dynamically as participants join and leave. ### Verifying the active backend Use `horus topic list --verbose` to see which backend each topic is using and how many publishers/subscribers are connected: ```bash horus topic list --verbose ``` ```text TOPIC TYPE BACKEND PUBS SUBS LATENCY motors.cmd_vel CmdVel in-process 1 1 ~18ns sensors.imu Imu in-process 1 3 ~24ns camera.rgb Image shm 1 1 ~85ns diagnostics.status Generic shm 2 2 ~91ns ``` The `BACKEND` column tells you whether a topic is using in-process channels or shared memory. If you expect in-process but see `shm`, one of the participants is in a different process. If you expect `shm` but see in-process, all participants happen to be in the same process (which is faster -- no action needed). You can also check a single topic: ```bash horus topic info motors.cmd_vel ``` This shows the backend type, message size, publisher/subscriber count, and measured latency. ## Communication Paths HORUS selects the optimal path based on where your nodes are running and how many publishers/subscribers are involved: ### Same-Process Communication | Scenario | Latency | |----------|---------| | Same thread, 1 publisher → 1 subscriber | ~3ns | | 1 publisher → 1 subscriber | ~18ns | | 1 publisher → many subscribers | ~24ns | | Many publishers → 1 subscriber | ~26ns | | Many publishers → many subscribers | ~36ns | Same-process communication uses lock-free ring buffers. No system calls, no serialization, no copies. 
The subscriber reads directly from the publisher's buffer. ### Cross-Process Communication (Shared Memory) | Scenario | Latency | |----------|---------| | 1 publisher → 1 subscriber (simple types) | ~50ns | | Many publishers → 1 subscriber | ~65ns | | 1 publisher → many subscribers | ~70ns | | 1 publisher → 1 subscriber | ~85ns | | Many publishers → many subscribers | ~91ns | Cross-process communication uses POSIX shared memory (`shm_open` on Linux/macOS). The publisher writes to a shared memory segment; the subscriber reads from the same segment. For POD (plain old data) types, this is zero-copy -- the subscriber reads the bytes directly without deserialization. ### How latency scales Latency is determined by three factors: **Process locality.** Same-thread is fastest (~14ns in benchmarks). Crossing a thread boundary adds ~68ns (14ns to 82ns). Crossing a process boundary adds another ~80ns (82ns to 162ns). These numbers are from actual benchmarks on a 4-core Linux system with `performance` CPU governor. **Topology.** Each additional subscriber adds ~6-8ns of overhead (the publisher must write to each subscriber's slot). Each additional publisher adds synchronization cost (~10-15ns for the lock-free MPSC coordination). **Message size.** For POD types, latency is nearly independent of message size up to a few KB (the shared memory segment is memory-mapped, so the OS handles paging). For serialized types (`String`, `Vec`, nested structs), serialization time scales linearly with payload size -- roughly 10ns per additional KB. 
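As a back-of-envelope aid, the three factors above can be combined into a rough estimate. The sketch below is an assumption-heavy model, not a HORUS API: `Locality`, `Payload`, and `estimate_latency_ns` are hypothetical names, the constants are the approximate benchmark figures quoted in this section, and adding the factors together is a simplification that real measurements may not follow.

```rust
/// Where publisher and subscriber run relative to each other.
#[derive(Clone, Copy)]
enum Locality {
    SameThread,
    CrossThread,
    CrossProcess,
}

/// How the message travels.
#[derive(Clone, Copy)]
enum Payload {
    /// Fixed-size plain-old-data; treated as size-independent in this model.
    Pod,
    /// Serialized type carrying roughly `kb` kilobytes.
    Serialized { kb: u64 },
}

/// Rough latency estimate in nanoseconds.
/// Assumption: the three factors simply add up.
fn estimate_latency_ns(locality: Locality, extra_subscribers: u64, payload: Payload) -> u64 {
    // Base figures quoted for each locality level.
    let base = match locality {
        Locality::SameThread => 14,
        Locality::CrossThread => 82,
        Locality::CrossProcess => 162,
    };
    // Midpoint of the quoted ~6-8ns per additional subscriber.
    let fan_out = 7 * extra_subscribers;
    // Roughly 10ns per KB for serialized types; nothing extra for POD.
    let serialization = match payload {
        Payload::Pod => 0,
        Payload::Serialized { kb } => 10 * kb,
    };
    base + fan_out + serialization
}
```

For example, a cross-process topic with two extra subscribers and a 4 KB serialized payload comes out at 162 + 14 + 40 = 216ns under this model. Treat the result as an order-of-magnitude check and confirm with `horus topic hz` on the real system.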
The path is selected based on: - **Process locality**: Same thread → same process → cross-process - **Topology**: Number of publishers and subscribers - **Data type**: Simple fixed-size types get the fastest cross-process path ## Dynamic Migration HORUS dynamically migrates between backends as topology changes: ``` Single publisher + single subscriber (same process) → ~18ns Second subscriber joins (same process) → ~24ns Subscriber in different process joins → ~70ns All subscribers disconnect except one in-process → ~18ns ``` Migration is transparent -- `send()` and `recv()` calls are unaffected. ## Performance Characteristics | Metric | In-Process | Shared Memory | |--------|-----------|---------------| | Latency | 3-36ns | 50-171ns | | Throughput | Millions msg/s | Millions msg/s | | Zero-copy | Yes | Yes | | Cross-machine | No | No | ## Debugging Latency If your topic latency is higher than expected, follow these steps in order: ### Step 1: Check the backend type ```bash horus topic list --verbose ``` If a topic shows `shm` but you expected `in-process`, one of the publishers or subscribers is in a different process. This alone adds ~80ns. Check whether all nodes that use this topic are in the same `horus run` invocation. ### Step 2: Check if you are cross-process Cross-process communication adds ~80ns over same-process. If all your nodes are in one process, you should see `in-process` backend. If you launched multiple `horus run` commands that share a topic, the backend upgrades to `shm` automatically. ```bash # See how many processes are connected to a topic horus topic info sensors.imu ``` ### Step 3: Check message size For serialized types (anything with `String`, `Vec`, or `Option`), larger payloads take longer to serialize. 
Measure with: ```bash horus topic hz sensors.imu --latency ``` If latency grows with message size, consider: - Switching to a `#[fixed]` POD type (zero-copy, no serialization) - Reducing the payload (send only changed fields) - Using `GenericMessage` only for prototyping, not production ### Step 4: Check CPU governor The CPU frequency governor has a major impact on latency. `powersave` mode can double latency compared to `performance` mode: ```bash # Check current governor cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor # Set to performance (requires root) sudo cpufreq-set -g performance ``` On embedded systems (Raspberry Pi, Jetson), the default governor is often `ondemand` or `powersave`. For real-time control loops, always use `performance`. ### Step 5: Measure with `horus topic hz` ```bash # Measure publish rate and latency for a running topic horus topic hz motors.cmd_vel ``` This shows the actual publish frequency and per-message latency. If the measured rate is lower than the node's configured rate, the node's `tick()` function is taking too long -- the bottleneck is in your code, not the transport. ### Step 6: Check for contention If many publishers write to the same topic, lock-free coordination adds overhead. The `horus topic list --verbose` output shows publisher and subscriber counts. For high-frequency topics, prefer a single-publisher architecture (one node publishes, many subscribe). 
--- ## Planned: Network Transport (Zenoh) Future versions of HORUS will add Zenoh-based network transport for: - **Multi-robot communication** across machines - **Cloud connectivity** for telemetry and remote monitoring - **ROS2 interoperability** via Zenoh DDS bridge The planned architecture adds network backends alongside the existing local backends: | Transport | Latency | Use Case | |-----------|---------|----------| | In-process | 3-36ns | Same-process nodes | | Shared Memory | 50-171ns | Same-machine IPC | | Zenoh (planned) | ~100us+ | Multi-robot, cloud, ROS2 | When network transport is implemented, `Topic::new()` will continue to auto-select the optimal backend. Network transport will only be used when topics are explicitly configured for remote communication. ## Future: Multi-Machine Transport HORUS does not yet support multi-machine communication natively. The likely approach is Zenoh, a pub/sub protocol designed for robotics and IoT that provides: - **Automatic discovery** of peers on the local network (no broker, no configuration) - **Protocol flexibility** -- TCP, UDP, and shared memory, selected automatically - **DDS bridge** for ROS2 interoperability without running a full DDS stack - **Low overhead** -- Zenoh adds ~100-200us of latency for LAN communication, compared to milliseconds for DDS The planned integration will work like this: topics marked as `remote` in the HORUS configuration will be bridged to Zenoh. Local topics (the default) will continue to use shared memory. No code changes will be required -- only configuration. ### What to do today for multi-machine setups If you need multi-machine communication right now, you have two options: **Option 1: `horus deploy` (recommended).** Deploy your HORUS project to a remote machine and run it there. Each machine runs its own HORUS instance with local shared memory. Use an external coordinator (HTTP, MQTT, or a custom bridge node) to exchange data between machines. 
This is the most reliable approach today. ```bash # Deploy and run on a remote machine horus deploy --target pi@192.168.1.10 --run ``` **Option 2: Custom UDP bridge node.** Write a node that subscribes to local topics, serializes messages, and sends them over UDP to a node on the other machine that deserializes and publishes locally. This adds ~1-5ms of latency (UDP + serialization) but works with any network topology. Both approaches are interim solutions. When Zenoh support ships, migration will require only configuration changes, not code changes. --- ## Design Decisions **Why shared memory instead of sockets?** Shared memory provides 50-171ns latency for cross-process communication. Unix domain sockets would add 5-15us. TCP sockets add 50-100us. For robotics control loops running at 1kHz (1ms budget), every microsecond matters. At 50ns, the transport is effectively invisible -- your control algorithm is always the bottleneck, not the communication layer. **Why automatic selection instead of manual configuration?** Developers should not need to know whether their nodes are in the same thread, same process, or different processes. `Topic::new("name")` always works, and HORUS picks the fastest available path. This also means topology changes (moving a node to a separate process) do not require code changes. A node that works in a single-process prototype continues to work when deployed as multiple processes -- the backend upgrades automatically. **Why dynamic migration between backends?** Nodes can start and stop at any time. When a cross-process subscriber joins, HORUS upgrades from in-process channels to shared memory. When it leaves, HORUS downgrades back. This happens transparently, with no message loss during migration. Without dynamic migration, you would need to pre-configure the backend at startup, which means topology changes require restarts. **Why not DDS?** DDS (Data Distribution Service) is the transport layer behind ROS2. 
It provides multi-machine communication, QoS policies, and automatic discovery. However: - **Latency overhead.** DDS adds 50-200us of latency even for same-machine communication. HORUS achieves 3-171ns. - **Complexity.** DDS requires configuring QoS profiles, domain IDs, and participant discovery. HORUS requires zero configuration. - **Binary size.** A DDS implementation (Fast-DDS, Cyclone DDS) adds 5-20MB to the binary. HORUS's shared memory backend adds <100KB. - **Startup time.** DDS participant discovery takes 1-5 seconds. HORUS topics are available immediately. HORUS will support DDS interoperability via a Zenoh-DDS bridge for teams that need to integrate with ROS2 systems, without imposing DDS overhead on the core framework. **Why not raw TCP/UDP for cross-process?** Even on localhost, TCP adds ~50us and UDP adds ~20us per message due to kernel-to-userspace copies and system call overhead. Shared memory eliminates these copies entirely -- the publisher and subscriber read and write the same physical memory pages. The OS kernel is not involved in the data path at all. 
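To put the transport figures from these design decisions in context, the short calculation below expresses each per-message latency as a share of one control period. It is only illustrative arithmetic: `budget_share_percent`, `TRANSPORTS`, and `report` are hypothetical helpers, and the latencies are the approximate values quoted above.

```rust
/// Percentage of one control period consumed by a single message hop.
fn budget_share_percent(latency_ns: u64, rate_hz: u64) -> f64 {
    let period_ns = 1_000_000_000.0 / rate_hz as f64;
    latency_ns as f64 / period_ns * 100.0
}

/// (label, latency in ns) pairs using the approximate figures from this page.
const TRANSPORTS: [(&str, u64); 4] = [
    ("shared memory (best case)", 50),
    ("shared memory (worst case)", 171),
    ("UDP on localhost", 20_000),
    ("TCP on localhost", 50_000),
];

/// One human-readable line per transport for the given loop rate.
fn report(rate_hz: u64) -> Vec<String> {
    TRANSPORTS
        .iter()
        .map(|(label, ns)| {
            format!(
                "{label}: {:.3}% of a {rate_hz} Hz period",
                budget_share_percent(*ns, rate_hz)
            )
        })
        .collect()
}
```

At 1 kHz the period is 1 ms, so a 50ns hop uses 0.005% of it, while a 50us TCP hop uses 5%. A pipeline that crosses several topics per tick multiplies those shares accordingly.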
--- ## Trade-offs | Gain | Cost | |------|------| | Zero-configuration backend selection | Less explicit control over transport | | Sub-microsecond latency for all local paths | No cross-machine communication (yet) | | Dynamic migration handles topology changes transparently | Brief latency spike during migration (~1-2 messages) | | Shared memory provides zero-copy cross-process IPC | Shared memory segments require cleanup on crash (`horus clean --shm`) | | No DDS overhead or configuration | No built-in QoS policies (reliability, durability, history depth) | | Immediate topic availability (no discovery phase) | Topics must use dots not slashes for macOS compatibility | | Backend auto-upgrade when cross-process subscribers join | Latency increases from ~18ns to ~70ns when upgrading to shm | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Cross-process topic not receiving messages | Topic names do not match exactly (case-sensitive) | Verify topic names are identical in both processes | | Stale shared memory after crash | Process was killed without cleanup | Run `horus clean --shm` to clear shared memory segments | | Higher latency than expected | Nodes in different processes when they could be in the same process | Move nodes into the same process if latency is critical | | Topic names with `/` fail on macOS | macOS `shm_open` does not support slashes | Use dots instead of slashes: `"sensors.lidar"` not `"sensors/lidar"` | | Latency doubles after system idle | CPU governor switched to `powersave` | Set governor to `performance`: `sudo cpufreq-set -g performance` | | `horus topic list` shows no topics | No HORUS process is running | Start your application first, then inspect topics | | Subscriber gets stale data on startup | Shared memory retains last message from previous run | Run `horus clean --shm` before starting, or handle stale data in your node | ## See Also - [Topics](/concepts/core-concepts-topic) -- Shared memory architecture and 
Topic API - [Multi-Process Communication](/concepts/multi-process) -- Cross-process topic routing - [Rust API Reference](/rust/api) -- Topic creation and usage - [Performance](/performance/performance) -- Benchmarks and optimization --- ## Scheduler Configuration Path: /advanced/scheduler-configuration Description: Configuring the HORUS scheduler with the fluent node builder API, execution classes, and per-node settings # Scheduler Configuration You need to configure how your robot's nodes execute: which ones get real-time threads, how to handle deadline misses, and what order nodes tick in. This guide covers the full scheduler and node builder API. ## When To Use This - You are moving beyond the defaults and need per-node timing, priority, or failure handling - You need to assign execution classes (RT, Compute, Event, AsyncIo) to different workloads - You are configuring a production system with watchdogs, blackbox, or RT requirements **Use [Scheduler Concepts](/concepts/core-concepts-scheduler) instead** if you need to understand how the scheduler works before configuring it. ## Prerequisites - Familiarity with [Nodes](/concepts/core-concepts-nodes) and [Scheduler](/concepts/core-concepts-scheduler) - Understanding of [Execution Classes](/concepts/execution-classes) ## Creating a Scheduler Every scheduler starts with `Scheduler::new()`. From there you can optionally set global parameters with builder methods before adding nodes: ### Builder Methods | Method | Description | Default | |--------|-------------|---------| | `.tick_rate(freq)` | Global scheduler tick rate | `100 Hz` | | `.deterministic(bool)` | Deterministic mode — SimClock, dependency ordering, seeded RNG. 
See [Deterministic Mode](/advanced/deterministic-mode) | `false` | | `.watchdog(Duration)` | Frozen node detection — auto-creates safety monitor | disabled | | `.blackbox(size_mb)` | BlackBox flight recorder (n MB ring buffer) | disabled | | `.max_deadline_misses(n)` | Emergency stop after n deadline misses | `100` | | `.require_rt()` | Hard real-time — panics without RT capabilities | — | | `.prefer_rt()` | Request RT features (degrades gracefully) | — | | `.cores(&[usize])` | Pin scheduler threads to specific CPU cores | all cores | | `.verbose(bool)` | Enable/disable non-emergency logging | `true` | | `.with_recording()` | Enable record/replay | — | | `.telemetry(endpoint)` | Export telemetry to UDP/file endpoint | disabled | ## Adding Nodes Add nodes with `scheduler.add(n)`, then chain configuration calls, and finalize with `.build()?`: ## Execution Classes Every node belongs to exactly one execution class. Set it in the builder chain: | Method | Class | Description | |--------|-------|-------------| | `.compute()` | Compute | Offloaded to a worker thread pool. Use for planning, SLAM, or ML inference. | | `.on(topic)` | Event-Driven | Wakes only when the named topic receives new data. | | `.async_io()` | Async I/O | Runs on an async executor. Use for network, disk, or cloud calls. | If no execution class is specified, the node defaults to **BestEffort**. A node is automatically promoted to the **RT** class when you set `.rate(Frequency)` (which auto-derives budget at 80% and deadline at 95% of the period). ### When to Use Each Class - **RT (auto-detected)** — Motor controllers, safety monitors, sensor fusion, anything that must run every tick with bounded latency. Triggered by `.rate(Frequency)` on a BestEffort node. - **`.compute()`** — Path planning, point cloud processing, ML inference. These can take longer than a single tick without blocking RT nodes. - **`.on(topic)`** — Collision detection, event handlers, reactive behaviors. 
Only runs when there is new data, saving CPU when idle. - **`.async_io()`** — Telemetry upload, log shipping, cloud API calls. Never blocks any real-time or compute work. **What each class means for your robot:** - **RT** — Your motor controller sends PWM commands every millisecond. Missing one cycle causes the motor to overshoot. This node needs a dedicated RT thread. - **Compute** — Your SLAM algorithm takes 50ms to process a lidar scan. If it runs on the RT thread, the motor controller misses 50 deadlines. Compute nodes run on a separate thread pool. - **Event** — Your collision detector only needs to run when new lidar data arrives, not every cycle. Event nodes sleep until their topic gets a message. - **AsyncIo** — Your telemetry node uploads data to a cloud server. Network calls can take seconds. AsyncIo nodes run on a tokio thread pool so they never block anything. - **BestEffort** — Your debug logger. Runs on the main thread when there's time, no timing guarantees. ## Per-Node Configuration ### Ordering and Timing | Method | Description | |--------|-------------| | `.order(n)` | Tiebreaker for independent nodes (lower = runs first). 
Optional when nodes have topic dependencies — the dependency graph handles ordering automatically | | `.rate(Frequency)` | Node-specific tick rate — auto-derives budget (80%) and deadline (95%), auto-marks as RT | | `.budget(Duration)` | Override auto-derived tick budget (max execution time) | | `.deadline(Duration)` | Override auto-derived absolute deadline | | `.on_miss(Miss)` | What to do on deadline miss (`Miss::Warn`, `Miss::Skip`, `Miss::SafeMode`, `Miss::Stop`) | ### RT Configuration | Method | Description | |--------|-------------| | `.priority(i32)` | OS thread priority (SCHED_FIFO 1-99) for this node's RT thread | | `.core(usize)` | Pin this node's RT thread to a specific CPU core | | `.watchdog(Duration)` | Per-node watchdog timeout (overrides scheduler global) | These are only meaningful for RT nodes (nodes with `.rate()`). They require Linux with `CAP_SYS_NICE` and degrade gracefully when RT capabilities are unavailable. ```rust // simplified // Safety-critical node: highest priority, pinned to core 2, tight watchdog scheduler.add(EmergencyStop::new()) .order(0) .rate(1000_u64.hz()) .priority(99) .core(2) .watchdog(2_u64.ms()) .on_miss(Miss::Stop) .build()?; // Logger: long watchdog, async I/O scheduler.add(Logger::new()) .order(200) .async_io() .watchdog(5_u64.secs()) .build()?; ``` ### Failure Policy | Method | Description | |--------|-------------| | `.failure_policy(policy)` | Per-node failure handling (see [Fault Tolerance](/advanced/circuit-breaker)) | | `.build()` | Finalize and register the node (returns `Result`) | ### Order Guidelines - **0-9**: Critical real-time (motor control, safety) - **10-49**: High priority (sensors, fast control loops) - **50-99**: Normal priority (processing, planning) - **100-199**: Low priority (logging, diagnostics) - **200+**: Background (telemetry, non-essential) ## Global Configuration with Composable Builders Compose the builder methods you need for each deployment stage: ## Execution Modes HORUS 
**automatically parallelizes independent nodes** while maintaining causal ordering for dependent nodes. No manual configuration needed — the scheduler builds a dependency graph from topic `send()`/`recv()` metadata. ### Default Mode (Auto-Parallel) The scheduler builds a dependency graph from topic metadata and dispatches independent nodes to a thread pool via the **ready-dispatch executor**. Each node starts the instant its last dependency finishes — no barriers, no wasted time. | Metric | Value | |--------|-------| | Independent nodes | **Parallel** (multi-core) | | Dependent nodes | **Causal order** (publisher before subscriber) | | Latency | Optimal — critical path only | | `.order()` needed | **No** — optional tiebreaker | ```rust,ignore use horus::prelude::*; let mut scheduler = Scheduler::new(); // Independent sensors — run in parallel automatically scheduler.add(lidar_node).build()?; // publishes "scan" scheduler.add(camera_node).build()?; // publishes "image" scheduler.add(imu_node).build()?; // publishes "imu" // Fusion depends on all three — waits for all to complete scheduler.add(fusion_node).build()?; // subscribes "scan", "image", "imu" scheduler.run()?; ``` ### Deterministic Mode Uses the **same dependency graph** but executes nodes sequentially within each step. SimClock advances between steps. Produces bit-identical results across runs. 
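The idea of a virtual clock that advances by a fixed step can be sketched in plain Rust. This is a hypothetical model named `VirtualClock` to make clear it is not the HORUS `SimClock` type; it only shows why a fixed `dt` makes repeated runs produce identical values.

```rust
use std::time::Duration;

/// Hypothetical fixed-step clock (not the HORUS `SimClock` implementation).
struct VirtualClock {
    tick: u64,
    step_nanos: u64,
}

impl VirtualClock {
    /// Creates a clock whose step is 1/rate. Panics if `rate_hz` is zero.
    fn new(rate_hz: u64) -> Self {
        assert!(rate_hz > 0, "rate must be non-zero");
        VirtualClock { tick: 0, step_nanos: 1_000_000_000 / rate_hz }
    }

    /// Virtual time elapsed since start: tick count times the fixed step.
    fn now(&self) -> Duration {
        Duration::from_nanos(self.tick * self.step_nanos)
    }

    /// The fixed step, independent of how long a tick really took.
    fn dt(&self) -> Duration {
        Duration::from_nanos(self.step_nanos)
    }

    /// Moves to the next step; called once per scheduler step in this model.
    fn advance(&mut self) {
        self.tick += 1;
    }
}

/// Integrates a constant velocity and returns the position after each tick.
/// Only virtual time is read, so the trace is the same on every run.
fn simulate(rate_hz: u64, ticks: u64, velocity: f64) -> Vec<f64> {
    let mut clock = VirtualClock::new(rate_hz);
    let mut position = 0.0;
    let mut trace = Vec::new();
    for _ in 0..ticks {
        position += velocity * clock.dt().as_secs_f64();
        trace.push(position);
        clock.advance();
    }
    trace
}
```

Calling `simulate(100, 50, 0.3)` twice yields two identical traces, because nothing in the loop depends on wall-clock time. Replacing `clock.dt()` with a measured `Instant` difference would reintroduce run-to-run variation.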
| Metric | Value | |--------|-------| | Independent nodes | **Sequential** (reproducible order) | | Dependent nodes | **Causal order** (same as default) | | Clock | Virtual SimClock | | Best For | Simulation, testing, replay, CI | ```rust,ignore use horus::prelude::*; let mut scheduler = Scheduler::new() .deterministic(true) .tick_rate(100_u64.hz()); scheduler.add(sensor).build()?; scheduler.add(controller).build()?; scheduler.run()?; ``` ### Mode Comparison | Feature | Default (Auto-Parallel) | Deterministic | |---------|------------------------|---------------| | Independent nodes | Parallel (multi-core) | Sequential (reproducible) | | Dependent nodes | Causal order | Causal order | | Clock | Wall clock | SimClock | | Certification ready | Yes (causal ordering guaranteed) | Yes (fully reproducible) | ## DurationExt and Frequency HORUS provides ergonomic extension methods for creating `Duration` and `Frequency` values, replacing verbose `Duration::from_micros(200)` calls: ### Duration Helpers ```rust // simplified use horus::prelude::*; // Microseconds let budget = 200_u64.us(); // Duration::from_micros(200) // Milliseconds let deadline = 1_u64.ms(); // Duration::from_millis(1) // Seconds let timeout = 5_u64.secs(); // Duration::from_secs(5) ``` Works on `u64` literals via the `DurationExt` trait. 
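For readers unfamiliar with extension traits, the pattern behind helpers like these can be sketched with the standard library alone. The trait below is a stand-in named `DurationSketch` (a hypothetical name); it mirrors the three helpers shown above but is not the HORUS `DurationExt` source.

```rust
use std::time::Duration;

/// Illustrative stand-in for an extension trait such as `DurationExt`.
trait DurationSketch {
    fn us(self) -> Duration;
    fn ms(self) -> Duration;
    fn secs(self) -> Duration;
}

/// Implementing the trait for `u64` adds the methods to integer values.
impl DurationSketch for u64 {
    fn us(self) -> Duration {
        Duration::from_micros(self)
    }

    fn ms(self) -> Duration {
        Duration::from_millis(self)
    }

    fn secs(self) -> Duration {
        Duration::from_secs(self)
    }
}
```

With the trait in scope, `200_u64.us()` evaluates to `Duration::from_micros(200)`. The methods only become available where the trait is imported, which is why the examples above start with `use horus::prelude::*;`.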
### Frequency Type The `.hz()` method creates a `Frequency` that auto-derives timing parameters: ```rust // simplified use horus::prelude::*; let freq = 100_u64.hz(); freq.value() // 100.0 Hz freq.period() // 10ms (1/frequency) freq.budget_default() // 8ms (80% of period) freq.deadline_default() // 9.5ms (95% of period) ``` Use `Frequency` with the node builder's `.rate()` method to auto-configure RT timing: ```rust // simplified // Auto-derives budget (80% period) and deadline (95% period) // Also auto-marks the node as RT scheduler.add(motor_ctrl) .order(0) .rate(500_u64.hz()) // period=2ms, budget=1.6ms, deadline=1.9ms .on_miss(Miss::Skip) .build()?; ``` | Method | Returns | Description | |--------|---------|-------------| | `.us()` | `Duration` | Microseconds | | `.ms()` | `Duration` | Milliseconds | | `.secs()` | `Duration` | Seconds | | `.hz()` | `Frequency` | Frequency in Hz | | `freq.value()` | `f64` | Frequency in Hz | | `freq.period()` | `Duration` | 1/frequency | | `freq.budget_default()` | `Duration` | 80% of period | | `freq.deadline_default()` | `Duration` | 95% of period | ## Design Decisions **Why auto-derive budget and deadline from `.rate()`?** Most developers think in terms of "this node runs at 1kHz" rather than "this node has an 800us budget and 950us deadline." Auto-derivation (budget = 80% period, deadline = 95% period) provides safe defaults without requiring timing expertise. Override with explicit `.budget()` and `.deadline()` when profiling shows different requirements. **Why composable builders instead of presets?** Early versions of HORUS had presets like `deploy()` and `hard_rt()`. These were removed because real systems need specific combinations of features. Composable builders let you pick exactly what you need: `.watchdog(500_u64.ms()).blackbox(64)` is clearer than a preset that might enable features you do not want. **Why `.order()` instead of automatic dependency ordering?** Explicit ordering is predictable and debuggable. 
Automatic dependency ordering is derived from topic `send()`/`recv()` metadata in both modes, so `.order()` is optional for nodes that share topics. It remains available as an explicit tiebreaker for independent nodes and as a fallback when no topic metadata exists. ## Trade-offs | Gain | Cost | |------|------| | Per-node execution classes match workload to executor | More configuration decisions when adding nodes | | Auto-derived timing from `.rate()` reduces configuration | 80%/95% defaults may not match your workload profile | | Composable builders allow precise feature selection | No single-line "production mode" shortcut | | Explicit `.order()` is predictable | Must be maintained manually as nodes are added | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Node runs as BestEffort when you expected RT | `.rate()` not set, or `.compute()` overrides it | Set `.rate(freq)` and do not combine with `.compute()` | | "Cannot set SCHED_FIFO" at startup | Missing RT permissions | See [RT Setup](/advanced/rt-setup) for `limits.conf` and `setcap` | | Deadline misses on every tick | Budget too tight for actual computation time | Profile with `horus monitor`, then increase `.budget()` or lower `.rate()` | | Node never ticks | `.on(topic)` set but no publisher on that topic | Verify another node publishes to the same topic name | | `.build()` returns error | Conflicting configuration (e.g., `.on()` with `.budget()`) | Event nodes cannot have budgets. Remove timing constraints from event nodes | | Nodes execute in wrong order | Topic dependencies not detected (no `send()`/`recv()` calls) | Ensure nodes call `send()`/`recv()` during `init()` or first tick. 
Use `.order()` as fallback for non-topic dependencies | ## See Also - [Scheduler Concepts](/concepts/core-concepts-scheduler) — How the scheduler works - [Execution Classes](/concepts/execution-classes) — The 5 execution classes and when to use each - [Safety Monitor](/advanced/safety-monitor) — Watchdog and deadline enforcement - [Fault Tolerance](/advanced/circuit-breaker) — Failure policies and recovery - [RT Setup](/advanced/rt-setup) — Linux real-time kernel configuration --- ## Safety Monitor Path: /advanced/safety-monitor Description: Real-time safety monitoring with watchdogs, budget enforcement, and deadline miss policies # Safety Monitor You need to enforce timing constraints on your robot's nodes and detect when a node hangs, overruns its budget, or misses a deadline. The safety monitor handles this automatically through watchdogs, budget enforcement, and configurable miss policies. ## When To Use This - Any system with timing requirements (motor control, sensor fusion, safety-critical nodes) - Production deployments where a hung node must be detected and handled - Systems that need graduated degradation instead of all-or-nothing failure - When regulations or safety standards require deadline monitoring **Use [Fault Tolerance](/advanced/circuit-breaker) alongside this** for per-node failure policies (restart, skip, ignore). The safety monitor handles timing violations; failure policies handle execution errors. Use both in production. 
## Prerequisites - Familiarity with [Scheduler Configuration](/advanced/scheduler-configuration) — especially `.rate()`, `.budget()`, `.deadline()`, and `.on_miss()` - Understanding of [Execution Classes](/concepts/execution-classes) — especially RT auto-detection - Understanding of [Nodes](/concepts/core-concepts-nodes) — especially `enter_safe_state()` and `is_safe_state()` ## Overview The Safety Monitor includes: - **Watchdogs**: Monitor node liveness -- trigger action if a critical node hangs - **Budget Enforcement**: Per-node tick budgets -- act if a node takes too long (implicit when nodes have `.rate()` set) - **Deadline Tracking**: Count deadline misses and apply the configured `Miss` policy - **Miss Policies**: `Warn`, `Skip`, `SafeMode`, or `Stop` -- per-node control over what happens on deadline miss The Scheduler manages the safety monitor internally -- you configure it with composable builder methods and the scheduler automatically feeds watchdogs, checks budgets, and applies miss policies. ## Enabling Safety Monitoring Use composable builder methods to enable safety monitoring. Each method adds a specific safety feature: ### Composable Builder Comparison | Builder | Watchdog | Budget Enforcement | Memory Locking | Blackbox | |---------|----------|-------------------|----------------|----------| | `new()` | No | Implicit (when nodes have `.rate()`) | No | No | | `.watchdog(500_u64.ms())` | **Yes** (500ms) | Implicit | No | No | | `.require_rt()` | No | Implicit | **Yes** | No | | `.watchdog(500_u64.ms()).require_rt()` | **Yes** (500ms) | Implicit | **Yes** | No | | `.watchdog(500_u64.ms()).blackbox(64)` | **Yes** (500ms) | Implicit | No | **Yes** (64MB) | ## Configuring Nodes with Rates After configuring the scheduler, add nodes with timing constraints using the node builder. Setting `.rate()` automatically marks the node as RT and derives budget (80% of period) and deadline (95% of period). ## Watchdogs Watchdogs monitor node liveness.
The scheduler automatically feeds watchdogs on successful node ticks. If a critical node fails to execute within the watchdog timeout, the safety monitor triggers graduated degradation. ``` Normal operation: Node tick → success → watchdog fed → timer reset Failure scenario: Node hangs → watchdog timeout expires → graduated degradation → EMERGENCY STOP ``` ### Timeout Guidelines ``` Watchdog timeout should be: - Longer than expected execution time - Shorter than safety-critical response time Example: Expected tick period: 10ms Safety deadline: 100ms Watchdog timeout: 50ms (5× period) ``` ## Budget and Deadline Enforcement Budget and deadline are two levels of timing enforcement: - **Budget** is the expected computation time (soft limit). Budget violations are tracked in `RtStats` for monitoring. - **Deadline** is the hard limit. When exceeded, the `Miss` policy fires (`Warn`, `Skip`, `SafeMode`, or `Stop`). When you set `.rate()`, both are auto-derived: budget = 80% of period, deadline = 95% of period. When you set `.budget()` without `.deadline()`, the deadline equals the budget — your budget IS your hard limit: ```rust // simplified // Auto-derived from rate scheduler.add(motor_controller) .order(0) .rate(1000_u64.hz()) // budget=800us, deadline=950us .on_miss(Miss::SafeMode) // Fires on DEADLINE miss (>950us) .build()?; // Explicit budget — deadline auto-derived to match scheduler.add(fast_loop) .order(0) .budget(500_u64.us()) // budget=500us, deadline=500us (auto) .on_miss(Miss::Stop) // Fires when tick exceeds 500us .build()?; // Explicit budget + deadline — slack between them scheduler.add(with_slack) .order(0) .budget(500_u64.us()) // Soft: track violations above 500us .deadline(900_u64.us()) // Hard: Miss policy fires above 900us .on_miss(Miss::SafeMode) .build()?; ``` Violations are also recorded in the BlackBox when using `.blackbox(n)`. ## Node Health States Every node has a health state tracked internally by the scheduler. 
The four states form a graduated degradation ladder: | State | Meaning | |-------|---------| | `Healthy` | Normal operation — node ticks every cycle | | `Warning` | Watchdog at 1x timeout — node still ticks, but a warning is logged | | `Unhealthy` | Watchdog at 2x timeout — node is **skipped** in the tick loop | | `Isolated` | Watchdog at 3x timeout — `enter_safe_state()` is called, node is skipped | ### Graduated Degradation Transitions The scheduler evaluates watchdog severity every tick and transitions nodes through health states automatically. [Diagram: Graduated degradation: Healthy → Warning → Unhealthy → Isolated, with recovery paths. Transitions: Healthy → Warning (1x timeout), Warning → Unhealthy (2x timeout), Unhealthy → Isolated (3x timeout), Warning → Healthy (successful tick), Isolated → Healthy (recovery, RestoreRate).] **Escalation** happens when a node's watchdog is not fed (the node is slow or hung): - **Healthy to Warning** — 1x watchdog timeout elapsed. The node still runs, but the scheduler logs a warning. - **Warning to Unhealthy** — 2x timeout. The node is skipped entirely in the tick loop to prevent cascading delays. - **Unhealthy to Isolated** — 3x timeout. The scheduler calls `enter_safe_state()` on the node and continues to skip it. For critical nodes, this also triggers an emergency stop. **Recovery** happens on successful ticks: - A `Warning` node that ticks successfully transitions back to `Healthy` immediately, and its watchdog is re-fed. - An `Isolated` or rate-reduced node can recover through the graduated degradation system — after enough consecutive successful ticks at a reduced rate, the scheduler restores the original rate and transitions back to `Healthy`. ### Relationship to Miss Policies Node health states and `Miss` policies are complementary: - **`Miss` policies** act on individual deadline/budget violations (skip one tick, enter safe mode, stop the scheduler). - **Health states** track sustained behavior over time via the watchdog.
A node can be in `Warning` even if its `Miss` policy is `Warn` — repeated warnings escalate to `Unhealthy` and eventually `Isolated`. Both systems work together: the `Miss` policy handles immediate responses, while health states provide graduated, automatic degradation for persistently failing nodes. ### Shutdown Report When the scheduler shuts down with `.watchdog()` enabled, the timing report includes a health summary: ``` Node Health: [OK] All 4 nodes healthy ``` Or, if any nodes degraded during the run: ``` Node Health: 3 healthy, 1 warning, 0 unhealthy, 0 isolated, 0 stopped - sensor_fusion: WARNING ``` ## Miss — Deadline Miss Policy The `Miss` enum controls what happens when a node exceeds its deadline: | Policy | Behavior | |--------|----------| | `Miss::Warn` | Log a warning and continue (default) | | `Miss::Skip` | Skip the node for this tick | | `Miss::SafeMode` | Call `enter_safe_state()` on the node | | `Miss::Stop` | Stop the entire scheduler | ### SafeMode in Detail When `Miss::SafeMode` triggers: 1. The scheduler calls `enter_safe_state()` on the offending node 2. Each subsequent tick, the scheduler checks `is_safe_state()` 3. When the node reports safe, normal operation resumes Implement `enter_safe_state()` and `is_safe_state()` on your Node so the scheduler can drive this sequence. ## RT Node Isolation Each RT node runs on its own dedicated thread by default. If one RT node stalls (deadlock, infinite loop, hardware fault), other RT nodes keep ticking independently on their own threads. ``` Thread 1: [MotorLeft.tick()] → sleep → repeat Thread 2: [MotorRight.tick()] → sleep → repeat ← keeps running Thread 3: [ArmServo.tick()] → sleep → repeat ← keeps running If MotorLeft stalls, MotorRight and ArmServo are unaffected. ``` This is critical for robots where each actuator must be independently controllable. A stalled left wheel controller must not take down the right wheel.
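
The thread-per-node isolation described above can be illustrated with plain `std::thread`. This is a minimal sketch under stated assumptions, not the HORUS RT executor: `run_isolated` is a hypothetical function, the "nodes" are bare loops, and the stall is simulated by a thread that never completes a tick.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical model: three "nodes", each on its own thread.
/// When `stall_first` is true, node 0 never completes a tick.
/// Returns the number of completed ticks per node.
pub fn run_isolated(stall_first: bool, run_for: Duration) -> Vec<u64> {
    let running = Arc::new(AtomicBool::new(true));
    let counters: Vec<Arc<AtomicU64>> =
        (0..3).map(|_| Arc::new(AtomicU64::new(0))).collect();

    let mut handles = Vec::new();
    for (i, counter) in counters.iter().enumerate() {
        let running = Arc::clone(&running);
        let counter = Arc::clone(counter);
        handles.push(thread::spawn(move || {
            while running.load(Ordering::Relaxed) {
                if i == 0 && stall_first {
                    // Simulated stall: no tick is ever counted. The loop still
                    // observes `running` so that this example can terminate.
                    thread::sleep(Duration::from_millis(1));
                    continue;
                }
                counter.fetch_add(1, Ordering::Relaxed); // one completed tick
                thread::sleep(Duration::from_millis(1)); // wait for next period
            }
        }));
    }

    thread::sleep(run_for);
    running.store(false, Ordering::Relaxed);
    for handle in handles {
        let _ = handle.join();
    }
    counters.iter().map(|c| c.load(Ordering::Relaxed)).collect()
}
```

Calling `run_isolated(true, Duration::from_millis(50))` should report zero ticks for the first node and a positive count for the other two, which is the property the section relies on: a stalled node does not hold up its siblings.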
Use `.core(N)` to pin specific nodes to CPU cores for cache locality: ```rust // simplified scheduler.add(left_motor).order(0).rate(1000_u64.hz()).core(2).build()?; scheduler.add(right_motor).order(1).rate(1000_u64.hz()).core(3).build()?; ``` > **Note:** The watchdog detects stalled nodes but cannot preempt a running `tick()` — cooperative scheduling means the node must return from `tick()` for the watchdog to take action. Thread isolation ensures the stall doesn't cascade to other nodes. ## Shutdown Safety The scheduler guarantees that shutdown always completes, even if an RT node is stalled. Each RT thread gets 3 seconds to exit cleanly after `running` is set to `false`. If a thread doesn't exit within the timeout, it is detached and the scheduler continues shutting down other nodes. This prevents a single stalled node from blocking the entire process — critical for emergency stop scenarios where the robot must halt immediately. ## Emergency Stop Emergency stop is triggered automatically by: - Watchdog expiration (node hangs) - `Miss::Stop` policy on deadline miss - Exceeding the `max_deadline_misses` threshold When emergency stop triggers: 1. All node execution is halted 2. An emergency stop event is recorded in the BlackBox 3. The scheduler transitions to emergency state 4. RT threads are given 3 seconds to exit before being detached ### Inspecting After Emergency Stop ```rust,ignore use horus::prelude::*; let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()) .blackbox(64) .tick_rate(1000_u64.hz()); // ... application runs and hits emergency stop ... // Inspect what happened via BlackBox // Inspect safety events via CLI: horus blackbox --anomalies if let Some(bb) = scheduler.get_blackbox() { for record in bb.lock().expect("blackbox lock").anomalies() { println!("[tick {}] {:?}", record.tick, record.event); } } ``` ## Best Practices ### 1. 
Start with Conservative Rates Set rates generously initially, then tighten after profiling: ```rust // simplified // Start: use rate() — auto-derives budget at 80% of period scheduler.add(motor_controller) .order(0) .rate(500_u64.hz()) // period=2ms, budget=1.6ms .on_miss(Miss::Warn) // Log only while tuning .build()?; // After profiling: tighten to 1kHz scheduler.add(motor_controller) .order(0) .rate(1000_u64.hz()) // period=1ms, budget=800us .on_miss(Miss::SafeMode) // Enforce in production .build()?; ``` ### 2. Layer Safety Checks Use composable builders (watchdog + blackbox) with per-node miss policies: ```rust // simplified // .watchdog() gives you frozen node detection // Budget enforcement is implicit from .rate() let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()) .blackbox(64) .tick_rate(1000_u64.hz()); // Then set per-node policies for fine-grained control scheduler.add(motor_controller) .order(0) .rate(1000_u64.hz()) .on_miss(Miss::SafeMode) // Critical — enter safe state .build()?; scheduler.add(telemetry) .order(10) .rate(10_u64.hz()) .on_miss(Miss::Skip) // Non-critical — just skip .build()?; ``` ### 3. Choose the Right Configuration | Use Case | Configuration | |----------|--------------| | Medical / surgical robots | `.require_rt().watchdog(500_u64.ms()).blackbox(64)` | | Industrial control | `.require_rt().watchdog(500_u64.ms())` | | CNC / aerospace | `.require_rt().watchdog(500_u64.ms()).blackbox(64).max_deadline_misses(3)` | | General production | `.watchdog(500_u64.ms()).blackbox(64)` | ### 4. 
Test Safety Setup Verify your system handles deadline misses correctly: ```rust,ignore #[test] fn test_safety_critical_setup() { let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()) .tick_rate(1000_u64.hz()); scheduler.add(test_node) .order(0) .rate(1000_u64.hz()) .on_miss(Miss::SafeMode) .build() .expect("should build node"); } ``` ## Graduated Watchdog Severity > **Note:** The watchdog and health states are managed automatically by the scheduler — you configure them via `.watchdog(Duration)` and `.on_miss(Miss)` on the node builder. The internal severity levels below explain the scheduler's behavior, not APIs you call directly. The watchdog doesn't just fire a binary "alive/dead" check. It uses **graduated severity** based on how many timeout multiples have elapsed since the last heartbeat: ```text Time since last heartbeat: 0────────1x timeout────────2x timeout────────3x timeout──── │ Ok │ Warning │ Expired │ Critical │ (healthy) │ (node is slow) │ (skip this node) │ (safety response) ``` | Severity | Threshold | Scheduler Response | |----------|-----------|-------------------| | **Ok** | Within timeout | Normal execution | | **Warning** | 1x timeout elapsed | Log warning, node health → `Warning` | | **Expired** | 2x timeout elapsed | Skip node in tick loop, health → `Unhealthy` | | **Critical** | 3x timeout elapsed | Trigger safety response, health → `Isolated` | This prevents a brief jitter from triggering an emergency stop. The scheduler escalates gradually: 1. **Warn** first (gives the node a chance to recover) 2. **Skip** if still unresponsive (other nodes keep running) 3. 
**Isolate** if critically stuck (enter safe state if configured) ## Tick Timing Ring The scheduler tracks per-node timing statistics using a circular ring buffer: - **Min/Max/Avg** tick execution time per node - Used by the monitor TUI and web dashboard to display CPU load - Helps identify nodes that are close to their budget limits ```rust // simplified use horus::prelude::*; // Timing stats are reported in the shutdown summary: // ┌─ Timing Report ─────────────────┐ // │ lidar_driver: avg=0.8ms max=1.2ms budget=2.0ms ✓ // │ planner: avg=4.5ms max=8.1ms budget=5.0ms ⚠ (max exceeds budget) // │ motor_ctrl: avg=0.2ms max=0.3ms budget=1.0ms ✓ // └──────────────────────────────────┘ ``` ## Design Decisions **Why graduated degradation instead of immediate shutdown?** A brief jitter (1x timeout) should not trigger an emergency stop -- the node may recover on the next tick. Graduated escalation (warn, skip, isolate) gives transient issues time to resolve while still catching persistent failures. This matches how industrial safety systems work: alarm first, then intervene. **Why auto-derive budget from `.rate()` instead of requiring explicit values?** Most developers know their node's desired frequency but not its exact execution time. Auto-deriving budget at 80% of period and deadline at 95% provides a safe starting point. After profiling, developers can override with explicit `.budget()` and `.deadline()` values. **Why cooperative watchdogs instead of preemptive?** The watchdog cannot preempt a running `tick()` -- it can only detect that a tick has not completed. This is a deliberate choice: preempting a tick mid-execution could leave hardware in an unsafe state. Thread isolation ensures that a stalled node does not cascade to other nodes, while the watchdog triggers the safety response. 
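
The two numeric rules discussed above (severity by timeout multiples, and budget and deadline at 80% and 95% of the period) can be modelled in a few lines. This is an illustrative sketch, not the scheduler's internal code: `Severity`, `classify`, and `derive_timing` are hypothetical names, and treating each threshold as inclusive (`>=`) is an assumption the source does not confirm.

```rust
use std::time::Duration;

/// Hypothetical severity levels mirroring the Graduated Watchdog Severity table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Ok,
    Warning,
    Expired,
    Critical,
}

/// Classify the time since the last heartbeat in multiples of the timeout.
/// Assumption: thresholds are inclusive (exactly 1x already counts as Warning).
pub fn classify(elapsed: Duration, timeout: Duration) -> Severity {
    if elapsed >= timeout * 3 {
        Severity::Critical
    } else if elapsed >= timeout * 2 {
        Severity::Expired
    } else if elapsed >= timeout {
        Severity::Warning
    } else {
        Severity::Ok
    }
}

/// Derive (budget, deadline) from a rate: 80% and 95% of the period.
pub fn derive_timing(rate_hz: u64) -> (Duration, Duration) {
    assert!(rate_hz > 0, "rate must be non-zero");
    let period_ns = 1_000_000_000u64 / rate_hz;
    (
        Duration::from_nanos(period_ns * 80 / 100),
        Duration::from_nanos(period_ns * 95 / 100),
    )
}
```

For a 1 kHz node, `derive_timing(1000)` gives 800 µs and 950 µs, matching the values quoted in the budget and deadline examples.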
## Trade-offs | Gain | Cost | |------|------| | Graduated degradation prevents overreaction to jitter | A truly stuck node takes 3x watchdog timeout to reach Isolated | | Auto-derived timing reduces configuration burden | 80%/95% defaults may not match your workload | | Per-node miss policies allow fine-grained control | Must configure each node individually | | Cooperative watchdogs cannot leave hardware in unsafe state | Cannot preempt a running tick | | Thread isolation prevents cascading stalls | RT threads consume OS resources | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | No watchdog warnings despite slow nodes | `.watchdog()` not set on the scheduler | Add `.watchdog(500_u64.ms())` to the scheduler builder | | Emergency stop on startup | Watchdog timeout shorter than node initialization time | Increase watchdog timeout, or ensure `init()` completes quickly | | `Miss::SafeMode` has no effect | Node does not implement `enter_safe_state()` | Implement `enter_safe_state()` and `is_safe_state()` on your Node | | Node stuck in Isolated state | Node's `is_safe_state()` always returns `false` | Fix `is_safe_state()` to return `true` when the node has reached a safe state | | Budget violations but no deadline misses | Budget is soft (tracking only), deadline is the hard trigger | This is expected. Budget violations are informational. 
Set `.on_miss()` for deadline enforcement | | High deadline miss count in shutdown report | Rate too aggressive for actual computation time | Profile with `horus monitor`, then lower `.rate()` or increase `.budget()` | ## See Also - [Scheduler Concepts](/concepts/core-concepts-scheduler) — How the scheduler manages node execution - [BlackBox Flight Recorder](/advanced/blackbox) — Event recording for post-mortem analysis - [Fault Tolerance](/advanced/circuit-breaker) — Per-node failure policies (restart, skip, ignore) - [Scheduler Configuration](/advanced/scheduler-configuration) — Builder methods and per-node configuration - [RT Setup](/advanced/rt-setup) — Linux real-time kernel for hard timing guarantees --- ## Linux RT Setup Path: /advanced/rt-setup Description: Configure Linux for real-time scheduling — PREEMPT_RT, permissions, CPU isolation, and Horus RT integration # Linux RT Setup HORUS handles real-time automatically — no manual setup required. `rate=1000` just works on any Linux. This page covers **optional** configuration for users who need the lowest possible jitter. ## Quick Setup (Recommended) ```bash # Check current RT status horus setup-rt --check # Install RT kernel + configure system (interactive) sudo horus setup-rt # Reboot, then verify sudo reboot horus setup-rt --check ``` That's it. `horus setup-rt` detects your distro, installs the RT kernel package, configures memory lock limits, and suggests CPU isolation. See [Real-Time Tuning](/performance/rt-tuning) for what each setting does. ## When To Use This - Your robot has force control, balance, or high-bandwidth servo loops that need ±20μs jitter - You are using `.require_rt()` on the scheduler - You see "Operation not permitted" or "cannot set SCHED_FIFO" errors at startup - `horus doctor` shows "Standard kernel" in the Real-Time section **Skip this if** you are prototyping, running in simulation, or doing position control. 
The default scheduler works without RT configuration, and `.prefer_rt()` degrades gracefully when RT is unavailable. ## Manual Setup If `horus setup-rt` doesn't support your distro, or you prefer manual control: ### Check Current RT Capabilities ```bash # HORUS built-in check (recommended) horus setup-rt --check # Or manually: uname -v | grep -i preempt # Check for PREEMPT_RT ulimit -r # RT priority limit (0 = no RT) chrt -f 1 echo "RT works" # Test SCHED_FIFO ``` If `ulimit -r` returns `0` or `chrt` fails with "Operation not permitted", follow the sections below. ## Grant RT Permissions Edit `/etc/security/limits.conf` to allow your user (or group) to use RT scheduling: ```bash # Add to /etc/security/limits.conf # Replace 'robotics' with your username or group (@groupname for groups) robotics soft rtprio 99 robotics hard rtprio 99 robotics soft memlock unlimited robotics hard memlock unlimited ``` **Log out and back in** for changes to take effect. Verify with `ulimit -r` — it should now return `99`. ## Install PREEMPT_RT Kernel A standard kernel uses `PREEMPT_VOLUNTARY` or `PREEMPT_DYNAMIC`, which gives millisecond-scale worst-case latency. `PREEMPT_RT` brings that down to microseconds. ### Ubuntu / Debian ```bash sudo apt install linux-image-rt-amd64 # Debian sudo apt install linux-lowlatency # Ubuntu (close to RT) # For full PREEMPT_RT on Ubuntu: sudo apt install linux-image-realtime # Ubuntu Pro / 24.04+ ``` Reboot and select the RT kernel from GRUB. Verify: ```bash uname -v # Should contain "PREEMPT_RT" or "PREEMPT RT" ``` ### From Source (Any Distro) Download the PREEMPT_RT patch from [kernel.org/pub/linux/kernel/projects/rt](https://kernel.org/pub/linux/kernel/projects/rt/), apply it to a matching kernel version, and build with `CONFIG_PREEMPT_RT=y`. ## CPU Isolation Isolate cores from the Linux scheduler so only your RT threads run on them. This eliminates scheduling jitter from other processes. 
Add `isolcpus` to your kernel command line in `/etc/default/grub`: ```bash # Isolate cores 2 and 3 for RT use GRUB_CMDLINE_LINUX="isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3" ``` Then `sudo update-grub && sudo reboot`. Verify with: ```bash cat /sys/devices/system/cpu/isolated # Should output: 2-3 ``` Pin HORUS nodes to isolated cores with `.core(N)` on the node builder. ## Grant CAP_SYS_NICE As an alternative to `limits.conf`, you can grant the RT capability directly to a binary: ```bash sudo setcap cap_sys_nice=eip ./target/release/my_robot ``` This lets that specific binary use RT scheduling without root or `limits.conf` changes. Useful for deployment where you do not want blanket RT permissions. ## Verify the Setup ```bash # Run a program with SCHED_FIFO at priority 50 chrt -f 50 ./target/release/my_robot # Check that it is actually running with RT scheduling ps -eo pid,cls,rtprio,comm | grep my_robot # Should show "FF" (FIFO) and priority 50 ``` ## HORUS RT Integration ### `.prefer_rt()` vs `.require_rt()` (Rust) / `rt=True` (Python) Use `.prefer_rt()` / `rt=True` during development. Use `.require_rt()` (Rust) in production when timing guarantees matter. ### Checking Degradations After building the scheduler, inspect whether RT was successfully acquired. If RT was requested but unavailable, a degradation entry explains why (missing permissions, no PREEMPT_RT, etc.). ## Troubleshooting ### "Operation not permitted" / "cannot set SCHED_FIFO" 1. Check `ulimit -r` — must be > 0 2. Check that `limits.conf` changes are applied (requires re-login) 3. Try `setcap cap_sys_nice=eip` on the binary 4. If running in Docker: add `--cap-add SYS_NICE` to `docker run` ### RT works but latency is high 1. Verify `PREEMPT_RT` kernel: `uname -v | grep PREEMPT_RT` 2. Check for isolated CPUs: `cat /sys/devices/system/cpu/isolated` 3. Disable CPU frequency scaling: `cpupower frequency-set -g performance` 4.
Disable SMT/hyperthreading in BIOS for dedicated RT cores ## Platform-Specific Notes ### NVIDIA Jetson Jetson runs a custom L4T kernel. PREEMPT_RT patches are available from NVIDIA for Jetson Orin and later. Apply them when building the kernel with the Jetson Linux BSP. `isolcpus` works — isolate the performance cores (typically 4-7 on Orin). ### Raspberry Pi Use the `linux-image-rt` package from the Raspberry Pi OS repo, or apply the PREEMPT_RT patch to the `rpi-6.x.y` kernel branch. The Pi 4/5 have 4 cores — isolating cores 2-3 for RT while leaving 0-1 for the OS works well. Set `arm_freq` in `config.txt` to a fixed value to avoid frequency scaling jitter. ## Design Decisions **Why `.prefer_rt()` and `.require_rt()` instead of always using RT?** RT scheduling requires kernel support and permissions that are not always available -- especially during development, CI, or in containers. `.prefer_rt()` lets you develop on any Linux system and only enforce RT in production. `.require_rt()` fails fast at startup so you know immediately if your deployment target is misconfigured. **Why CPU isolation with `isolcpus` instead of just thread affinity?** Thread affinity (`.core(N)`) pins your node to a specific core, but other processes can still be scheduled on that core. `isolcpus` removes the core from the Linux scheduler entirely, so only your pinned threads run there. This eliminates scheduling jitter from kernel threads, interrupts, and other userspace processes. **Why `CAP_SYS_NICE` instead of running as root?** Running a robot as root is a security risk. `setcap cap_sys_nice=eip` grants RT scheduling to a specific binary without root access. This follows the principle of least privilege -- the binary can set thread priorities but cannot modify the filesystem or network configuration. 
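
Because `.core(N)` only pays off when the target core is actually isolated, it can help to check the kernel's view before pinning. The sketch below assumes the standard kernel cpulist format (for example `2-3` or `2,4-7`) in `/sys/devices/system/cpu/isolated`; `parse_cpu_list` and `isolated_cpus` are hypothetical helpers and not part of the HORUS API.

```rust
use std::fs;

/// Parse a kernel cpulist string such as "2-3" or "2,4-7" into core indices.
/// Returns None on malformed input; an empty string means "no isolated cores".
pub fn parse_cpu_list(s: &str) -> Option<Vec<usize>> {
    let s = s.trim();
    let mut cpus = Vec::new();
    if s.is_empty() {
        return Some(cpus);
    }
    for part in s.split(',') {
        match part.split_once('-') {
            // A range like "4-7" expands to every core in it, inclusive.
            Some((lo, hi)) => {
                let lo: usize = lo.trim().parse().ok()?;
                let hi: usize = hi.trim().parse().ok()?;
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            // A single core index.
            None => cpus.push(part.trim().parse().ok()?),
        }
    }
    Some(cpus)
}

/// Read the isolated-core list from sysfs (Linux only).
/// Returns None if the file is missing or unparsable.
pub fn isolated_cpus() -> Option<Vec<usize>> {
    let raw = fs::read_to_string("/sys/devices/system/cpu/isolated").ok()?;
    parse_cpu_list(&raw)
}
```

A startup check could compare the cores passed to `.core(N)` against `isolated_cpus()` and log a warning when a pinned core is not in the list.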
## Trade-offs | Gain | Cost | |------|------| | PREEMPT_RT gives microsecond worst-case latency | Must install and maintain a separate kernel | | CPU isolation eliminates scheduling jitter | Isolated cores are unavailable for other processes | | `setcap` avoids running as root | Must re-apply after recompiling the binary | | `.prefer_rt()` degrades gracefully | Timing guarantees silently degrade if RT is unavailable | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | "Operation not permitted" on startup | Missing RT permissions | Add `rtprio 99` to `/etc/security/limits.conf` and re-login | | `ulimit -r` still returns 0 after editing limits.conf | Did not log out and back in | Log out completely (not just close terminal) and log back in | | RT works but latency spikes above 1ms | No PREEMPT_RT kernel | Install `linux-image-rt-amd64` (Debian) or `linux-image-realtime` (Ubuntu) | | Latency spikes on specific cores | CPU frequency scaling or SMT interference | Set `cpupower frequency-set -g performance` and disable hyperthreading in BIOS | | `.require_rt()` panics in Docker | Missing `CAP_SYS_NICE` capability | Add `--cap-add SYS_NICE` to `docker run` | | `setcap` has no effect | Binary was recompiled after `setcap` | Re-run `setcap` after every `cargo build --release` | | Isolated cores show 0% usage in `htop` | Expected -- `isolcpus` removes them from the general scheduler | Use `htop` with per-thread view to see your pinned RT threads | --- ## RT Readiness Report Run `horus doctor --rt` to get a complete assessment of your system's real-time capabilities. The report includes a live jitter benchmark, IPC latency measurement, and actionable recommendations. 
```bash horus doctor --rt ``` **Example output:** ``` ╔══════════════════════════════════════════════════════════════╗ ║ HORUS RT READINESS REPORT ║ ║ Grade: STANDARD ★★☆ ║ ╠══════════════════════════════════════════════════════════════╣ ║ SYSTEM ║ ║ Kernel: Linux 6.1.0-rt7 ║ ║ PREEMPT_RT: ✗ ║ ║ SCHED_FIFO: ✓ ║ ║ Memory lock: ✓ ║ ║ CPUs: 8 total, 2 isolated ║ ╠══════════════════════════════════════════════════════════════╣ ║ JITTER BENCHMARK @ 1kHz (2999 samples) ║ ║ P99: 12.3 μs ║ ║ Max: 45.7 μs ║ ║ Rate: 999.8 Hz (target: 1000 Hz) ║ ╠══════════════════════════════════════════════════════════════╣ ║ IPC BENCHMARK ║ ║ Latency: 136 ns per message ║ ║ Throughput: 7362626 msg/sec ║ ╠══════════════════════════════════════════════════════════════╣ ║ RECOMMENDATIONS ║ ║ → Install PREEMPT_RT for sub-20μs jitter ║ ╚══════════════════════════════════════════════════════════════╝ ``` **Grades:** | Grade | Meaning | Requirements | |-------|---------|-------------| | **Production** ★★★ | Safety-critical deployment | PREEMPT_RT + SCHED_FIFO + mlockall + P99 jitter <50μs | | **Standard** ★★☆ | Most robotics applications | SCHED_FIFO + P99 jitter <500μs | | **Development** ★☆☆ | Prototyping only | No RT capabilities or high jitter | Run this command on every target machine before deploying. The recommendations tell you exactly what to fix. **From code:** ```rust use horus_core::scheduling::rt_report::RtReport; use std::time::Duration; let report = RtReport::generate(Duration::from_secs(5)); report.print(); assert!(report.is_production_ready()); ``` --- ## What HORUS Auto-Configures After `horus setup-rt`, when your code uses `.core()` or `.prefer_rt()`, the RT executor automatically: 1. **Locks CPU governor to `performance`** on pinned cores — prevents frequency scaling jitter (10-100us spikes) 2. **Moves hardware interrupts** off RT cores — prevents IRQ latency (50-500us spikes) 3. 
**Attempts SCHED_DEADLINE** when `.deadline_scheduler()` is used — kernel-guaranteed CPU bandwidth 4. **Falls back gracefully** to SCHED_FIFO, then normal scheduling, if any feature is unavailable You don't need to configure these manually — they happen automatically when the permissions are available. ## SCHED_DEADLINE (Advanced) For nodes that need **kernel-guaranteed** CPU bandwidth (not just priority): ```rust scheduler.add(motor_ctrl) .rate(1000_u64.hz()) .budget(500_u64.us()) .deadline_scheduler() // kernel EDF scheduling .build()?; ``` The kernel guarantees 500us of CPU every 1ms. If the system can't honor this (CPU overcommitted), it rejects the request and HORUS falls back to SCHED_FIFO. Requires PREEMPT_RT kernel (installed by `horus setup-rt`) and `CAP_SYS_NICE`. --- ## See Also - [Real-Time Concepts](/concepts/real-time) — Why real-time matters for robotics - [Execution Classes](/concepts/execution-classes) — RT auto-detection from `.rate()`, `.budget()`, `.deadline()` - [Scheduler Configuration](/advanced/scheduler-configuration) — Tick rate tuning and per-node RT options - [Safety Monitor](/advanced/safety-monitor) — Deadline enforcement and watchdog timers --- ## Advanced Topics Path: /advanced Description: Scheduler tuning, determinism, safety monitoring, and production configuration for HORUS # Advanced Topics Production-readiness features for HORUS: real-time scheduling, safety monitoring, failure recovery, and data recording. These guides solve specific problems you encounter when moving from prototype to deployment. ## Quick Reference | I need to... 
| Read this | |--------------|-----------| | Tune tick rates, thread pools, and execution classes | [Scheduler Configuration](/advanced/scheduler-configuration) | | Get reproducible execution for simulation or testing | [Deterministic Mode](/advanced/deterministic-mode) | | Configure Linux for hard real-time scheduling | [RT Setup](/advanced/rt-setup) | | Monitor node health and enforce timing deadlines | [Safety Monitor](/advanced/safety-monitor) | | Handle node failures without crashing the system | [Fault Tolerance](/advanced/circuit-breaker) | | Record events for crash forensics | [BlackBox Recorder](/advanced/blackbox) | | Record and replay full sessions for debugging | [Record & Replay](/advanced/record-replay) | | Understand communication backends and latency | [Network Backends](/advanced/network-backends) | --- ## Scheduling & Timing - [Scheduler Configuration](/advanced/scheduler-configuration) — Tick rates, execution classes, per-node timing, and priority ordering - [Deterministic Mode](/advanced/deterministic-mode) — Reproducible execution with SimClock, dependency ordering, and seeded RNG - [RT Setup](/advanced/rt-setup) — Linux real-time kernel, SCHED_FIFO, CPU isolation, and PREEMPT_RT ## Safety & Reliability - [Safety Monitor](/advanced/safety-monitor) — Watchdog timers, budget enforcement, deadline miss policies, and graduated degradation - [Fault Tolerance](/advanced/circuit-breaker) — Per-node failure policies (Fatal, Restart, Skip, Ignore) for preventing cascading failures ## Data Recording - [BlackBox Recorder](/advanced/blackbox) — Flight recorder for post-crash analysis with ring buffer and CLI tools - [Record & Replay](/advanced/record-replay) — Session recording for debugging, regression testing, and mixed replay ## Infrastructure - [Network Backends](/advanced/network-backends) — Automatic backend selection, shared memory IPC, and planned network transport --- ## See Also - [Core Concepts](/concepts) — Prerequisite knowledge for all 
advanced topics - [Recipes](/recipes) — Practical code patterns that use these features - [Rust API Reference](/rust/api) — Exact method signatures and parameters - [Performance](/performance/performance) — Optimization and benchmarks ======================================== # SECTION: Standard Library ======================================== --- ## Imu Path: /stdlib/messages/imu Description: Inertial Measurement Unit data — accelerometer, gyroscope, and orientation for sensor fusion and motion estimation # Imu Provides three-axis linear acceleration, three-axis angular velocity, and an optional orientation quaternion from an IMU sensor. The primary message for motion sensing, sensor fusion, and orientation estimation on mobile robots, drones, and manipulators. > **Python**: Available via `horus.Imu(accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z)`. See [Python Sensor Messages](/python/messages/sensor). > > **ROS2 equivalent**: `sensor_msgs/Imu` — identical field layout (orientation quaternion + angular velocity + linear acceleration + covariance matrices). ```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `orientation` | `[f64; 4]` | — | Quaternion `[x, y, z, w]`. Default: identity `[0, 0, 0, 1]` | | `orientation_covariance` | `[f64; 9]` | — | 3x3 row-major covariance. 
`[0] = -1.0` means no orientation data | | `angular_velocity` | `[f64; 3]` | rad/s | Angular velocity `[x, y, z]` | | `angular_velocity_covariance` | `[f64; 9]` | — | 3x3 row-major covariance matrix | | `linear_acceleration` | `[f64; 3]` | m/s² | Linear acceleration `[x, y, z]` | | `linear_acceleration_covariance` | `[f64; 9]` | — | 3x3 row-major covariance matrix | | `timestamp_ns` | `u64` | ns | Timestamp in nanoseconds since epoch | ## Quick Reference — Methods | Method | Returns | Description | |--------|---------|-------------| | `new()` | `Imu` | Creates with identity orientation, zero values, current timestamp | | `set_orientation_from_euler(roll, pitch, yaw)` | `()` | Sets orientation quaternion from Euler angles | | `has_orientation()` | `bool` | Checks if orientation data is available | | `is_valid()` | `bool` | Checks if all values are finite | | `angular_velocity_vec()` | `Vector3` | Returns angular velocity as a `Vector3` | | `linear_acceleration_vec()` | `Vector3` | Returns linear acceleration as a `Vector3` | --- ## Constructor Methods ### `new()` Creates a new IMU message with identity orientation, zero acceleration and velocity, and the current timestamp. **Signature** ```rust // simplified pub fn new() -> Self ``` **Parameters** None. **Returns** `Imu` — with these defaults: - `orientation`: `[0.0, 0.0, 0.0, 1.0]` (identity quaternion) - `orientation_covariance`: `[-1.0; 9]` (no orientation data — `has_orientation()` returns `false`) - `angular_velocity`: `[0.0; 3]` - `linear_acceleration`: `[0.0; 3]` - `angular_velocity_covariance` and `linear_acceleration_covariance`: `[0.0; 9]` - `timestamp_ns`: current time via `timestamp_now()` **Panics** Never. **Behavior** - `orientation_covariance[0]` is set to `-1.0`, meaning no orientation is available by default. Call `set_orientation_from_euler()` or set `orientation` directly to provide orientation data, then set `orientation_covariance[0]` to a non-negative value.
**Example** ```rust // simplified use horus::prelude::*; let mut imu = Imu::new(); imu.linear_acceleration = [0.0, 0.0, 9.81]; // gravity on Z imu.angular_velocity = [0.0, 0.0, 0.1]; // yawing slowly let topic: Topic<Imu> = Topic::new("imu.data")?; topic.send(&imu); ``` --- ### Python Constructor Creates an IMU message from individual axis values. **Signature** ```python Imu(accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z, timestamp_ns=0) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `accel_x` | `f64` | yes | Linear acceleration X-axis (m/s²) | | `accel_y` | `f64` | yes | Linear acceleration Y-axis (m/s²) | | `accel_z` | `f64` | yes | Linear acceleration Z-axis (m/s²) | | `gyro_x` | `f64` | yes | Angular velocity X-axis (rad/s) | | `gyro_y` | `f64` | yes | Angular velocity Y-axis (rad/s) | | `gyro_z` | `f64` | yes | Angular velocity Z-axis (rad/s) | | `timestamp_ns` | `u64` | no | Timestamp in nanoseconds. Default: `0`. | **Returns** `Imu` instance. **Example** ```python from horus import Imu, Topic # Sensor measuring gravity on Z while yawing slowly imu = Imu(0.0, 0.0, 9.81, 0.0, 0.0, 0.1) topic = Topic(Imu) topic.send(imu) # Access individual axes (read/write) print(f"Accel Z: {imu.accel_z}") # 9.81 imu.gyro_z = 0.2 ``` --- ## Methods ### `set_orientation_from_euler(roll, pitch, yaw)` Sets the orientation quaternion from Euler angles. Does not mark orientation as available on its own; see Behavior. **Signature** ```rust // simplified pub fn set_orientation_from_euler(&mut self, roll: f64, pitch: f64, yaw: f64) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `roll` | `f64` | yes | Roll angle in radians (rotation about X-axis) | | `pitch` | `f64` | yes | Pitch angle in radians (rotation about Y-axis) | | `yaw` | `f64` | yes | Yaw angle in radians (rotation about Z-axis) | **Returns** Nothing (`()`). **Panics** Never. NaN/infinite angles produce a NaN quaternion — check with `is_valid()`.
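The Euler-to-quaternion step can be illustrated with a minimal, standalone sketch. It assumes the common ZYX (yaw, then pitch, then roll) rotation order and the `[x, y, z, w]` storage order used by `orientation`. `euler_to_quaternion` is a hypothetical helper written for illustration, not the HORUS implementation, whose convention may differ.

```rust
/// Hypothetical helper (not part of HORUS): converts roll/pitch/yaw in
/// radians to a unit quaternion stored as [x, y, z, w].
/// Assumes the ZYX rotation order (yaw about Z, pitch about Y, roll about X).
fn euler_to_quaternion(roll: f64, pitch: f64, yaw: f64) -> [f64; 4] {
    // Half-angle sines and cosines for each axis.
    let (sr, cr) = (roll * 0.5).sin_cos();
    let (sp, cp) = (pitch * 0.5).sin_cos();
    let (sy, cy) = (yaw * 0.5).sin_cos();
    [
        sr * cp * cy - cr * sp * sy, // x
        cr * sp * cy + sr * cp * sy, // y
        cr * cp * sy - sr * sp * cy, // z
        cr * cp * cy + sr * sp * sy, // w
    ]
}
```

A pure 90° yaw gives `x = y = 0` and `z = w = sin(45°)`. A NaN input propagates into the result, which is why a finiteness check is recommended after the call.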
**Behavior** - Converts Euler angles to quaternion via `Quaternion::from_euler()` - Stores result in `self.orientation` as `[x, y, z, w]` - Does NOT automatically update `orientation_covariance` — set `orientation_covariance[0]` to a non-negative value manually if you want `has_orientation()` to return `true` **Example** ```rust // simplified use horus::prelude::*; let mut imu = Imu::new(); imu.set_orientation_from_euler(0.0, 0.0, std::f64::consts::FRAC_PI_2); // 90° yaw imu.orientation_covariance[0] = 0.01; // mark orientation as available assert!(imu.has_orientation()); ``` --- ### `has_orientation()` Checks whether orientation data is available. **Signature** ```rust // simplified pub fn has_orientation(&self) -> bool ``` **Parameters** None. **Returns** `true` if `orientation_covariance[0] >= 0.0`. `false` if `orientation_covariance[0] < 0.0` (the default from `new()`). **Behavior** - Convention follows ROS2 `sensor_msgs/Imu`: setting `orientation_covariance[0]` to `-1.0` signals that orientation is unavailable (e.g., gyro-only sensor) - Consumers should always call `has_orientation()` before reading the `orientation` quaternion **When to use** - Before reading the orientation quaternion — avoids using stale identity values - Filtering in fusion nodes — only fuse orientation if the sensor provides it **Example** ```rust // simplified use horus::prelude::*; fn process_imu(imu: &Imu) { if imu.has_orientation() { let q = imu.orientation; // safe to use } else { // Gyro-only — compute orientation from angular velocity integration } } ``` --- ### `is_valid()` Checks whether all acceleration, velocity, and orientation values are finite. **Signature** ```rust // simplified pub fn is_valid(&self) -> bool ``` **Parameters** None. **Returns** `true` if all values in `orientation`, `angular_velocity`, and `linear_acceleration` are finite (not NaN, not infinite). `false` otherwise. 
**When to use** - Validating sensor data before publishing — hardware faults can produce NaN - Input validation in fusion nodes — reject corrupted readings **Example** ```rust // simplified use horus::prelude::*; fn tick(&mut self) { let imu = read_hardware_imu(); if imu.is_valid() { self.imu_pub.send(imu); } else { hlog!(warn, "Invalid IMU reading — hardware fault?"); } } ``` --- ### `angular_velocity_vec()` Returns angular velocity as a `Vector3` struct. **Signature** ```rust // simplified pub fn angular_velocity_vec(&self) -> Vector3 ``` **Parameters** None. **Returns** `Vector3` — with `x`, `y`, `z` fields matching `angular_velocity[0]`, `[1]`, `[2]`. **When to use** - When you need named field access (`.x`, `.y`, `.z`) instead of array indexing - Passing velocity to functions that accept `Vector3` --- ### `linear_acceleration_vec()` Returns linear acceleration as a `Vector3` struct. **Signature** ```rust // simplified pub fn linear_acceleration_vec(&self) -> Vector3 ``` **Parameters** None. **Returns** `Vector3` — with `x`, `y`, `z` fields matching `linear_acceleration[0]`, `[1]`, `[2]`. --- ## Common Patterns ### Sensor Fusion Pipeline ``` IMU hardware → Imu message → complementary/Kalman filter → Pose3D (orientation) └→ dead reckoning → Odometry (position estimate) ``` ### Fall Detection ```rust // simplified use horus::prelude::*; fn check_freefall(imu: &Imu) -> bool { let accel = imu.linear_acceleration_vec(); let magnitude = (accel.x * accel.x + accel.y * accel.y + accel.z * accel.z).sqrt(); magnitude < 1.0 // Near-zero gravity = freefall } ``` ### Covariance Convention Set `orientation_covariance[0]` to `-1.0` when orientation is not available (e.g., gyro-only sensor). Consumers should call `has_orientation()` before reading the quaternion. This matches the ROS2 `sensor_msgs/Imu` convention. 
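The covariance convention can be sketched without the HORUS crate. `ImuOrientation` below is a hypothetical stand-in that mirrors only the two relevant `Imu` fields, and `orientation_checked` is an illustrative helper, not a HORUS method.

```rust
/// Hypothetical stand-in for the orientation-related fields of `Imu`.
struct ImuOrientation {
    orientation: [f64; 4],            // quaternion [x, y, z, w]
    orientation_covariance: [f64; 9], // 3x3 row-major
}

impl ImuOrientation {
    /// Identity quaternion, flagged as unavailable via covariance[0] = -1.0.
    fn unavailable() -> Self {
        let mut cov = [0.0; 9];
        cov[0] = -1.0;
        Self {
            orientation: [0.0, 0.0, 0.0, 1.0],
            orientation_covariance: cov,
        }
    }

    /// Same rule as documented for `has_orientation()`.
    fn has_orientation(&self) -> bool {
        self.orientation_covariance[0] >= 0.0
    }

    /// Illustrative helper: yields the quaternion only when the producer
    /// marked it as available, so callers cannot read a stale identity value.
    fn orientation_checked(&self) -> Option<[f64; 4]> {
        if self.has_orientation() {
            Some(self.orientation)
        } else {
            None
        }
    }
}
```

Wrapping the check in an `Option`-returning accessor is one way to make the "check before reading" rule hard to forget in consumer code.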
--- ## See Also - [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API for all sensor types - [Sensor Messages (Python)](/python/messages/sensor) — Python sensor API - [IMU Reader Recipe](/recipes/imu-reader) — Complete IMU integration guide - [Sensor Node Tutorial](/tutorials/01-sensor-node) — Build an IMU node step by step - [Odometry](/stdlib/messages/odometry) — Combine with IMU for dead reckoning --- ## CmdVel Path: /stdlib/messages/cmd-vel Description: Velocity command for mobile robots — linear speed and angular turning rate # CmdVel Sends a 2D velocity command with forward speed and turning rate. The most common message type in mobile robotics — used between planners, controllers, and drive systems on differential drive, holonomic, and Ackermann platforms. > **Python**: Available via `horus.CmdVel(linear, angular)`. See [Python Control Messages](/python/messages/control). > > **ROS2 equivalent**: `geometry_msgs/Twist` (2D subset). HORUS uses a dedicated 2D type for the common case. Converts to/from `Twist` automatically. ```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear` | `f32` | m/s | Forward velocity. Positive = forward, negative = backward | | `angular` | `f32` | rad/s | Turning rate. Positive = counter-clockwise, negative = clockwise | | `timestamp_ns` | `u64` | ns | Timestamp in nanoseconds since epoch | **Size**: 16 bytes (`u64` + `f32` + `f32`). `#[repr(C)]` for zero-copy IPC. 
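The 16-byte figure can be checked with a standalone layout sketch. `CmdVelLayout` is a hypothetical mirror of the documented fields. The field order shown (timestamp first) is an assumption, not taken from the HORUS source.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the documented `CmdVel` fields.
/// The field order is an assumption made for this sketch.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct CmdVelLayout {
    timestamp_ns: u64, // 8 bytes
    linear: f32,       // 4 bytes
    angular: f32,      // 4 bytes
}

// Compile-time check: 8 + 4 + 4 bytes with no padding.
const _: () = assert!(size_of::<CmdVelLayout>() == 16);
```

With `#[repr(C)]`, field order matters: on targets where `u64` is 8-byte aligned, placing the `u64` between the two `f32` fields would introduce padding and grow the struct to 24 bytes.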
## Quick Reference — Methods | Method | Returns | Description | |--------|---------|-------------| | `new(linear, angular)` | `CmdVel` | Creates with current timestamp | | `zero()` | `CmdVel` | Creates zero velocity (stopped) | | `with_timestamp(linear, angular, timestamp_ns)` | `CmdVel` | Creates with explicit timestamp | | `default()` | `CmdVel` | Same as `zero()` | ## Quick Reference — Conversions | From | To | Mapping | |------|----|---------| | `Twist` | `CmdVel` | `twist.linear[0]` → `linear`, `twist.angular[2]` → `angular` (f64→f32) | | `CmdVel` | `Twist` | `linear` → `twist.linear[0]`, `angular` → `twist.angular[2]` (f32→f64) | --- ## Constructor Methods ### `new(linear, angular)` Creates a velocity command with the current timestamp. **Signature** ```rust // simplified pub fn new(linear: f32, angular: f32) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `linear` | `f32` | yes | Forward velocity in m/s. Positive = forward, negative = backward. | | `angular` | `f32` | yes | Turning rate in rad/s. Positive = counter-clockwise, negative = clockwise. | **Returns** `CmdVel` — with `timestamp_ns` set to the current time via `timestamp_now()`. **Panics** Never. **Example** ```rust // simplified use horus::prelude::*; // Drive forward at 0.5 m/s, turn left at 0.3 rad/s let cmd = CmdVel::new(0.5, 0.3); let topic: Topic<CmdVel> = Topic::new("cmd_vel")?; topic.send(cmd); ``` --- ### `zero()` Creates a zero velocity command (stopped). **Signature** ```rust // simplified pub fn zero() -> Self ``` **Parameters** None. **Returns** `CmdVel` — with `linear = 0.0`, `angular = 0.0`, and the current timestamp.
**When to use** - Shutdown sequences — always send zero velocity before the motor controller stops - Emergency stop — override the current command with zero - Idle state — robot should hold position **Example** ```rust // simplified use horus::prelude::*; // SAFETY: always send zero on shutdown to prevent runaway fn shutdown(&mut self) -> Result<()> { self.cmd_pub.send(CmdVel::zero()); Ok(()) } ``` --- ### `with_timestamp(linear, angular, timestamp_ns)` Creates a velocity command with an explicit timestamp. **Signature** ```rust // simplified pub fn with_timestamp(linear: f32, angular: f32, timestamp_ns: u64) -> Self ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `linear` | `f32` | yes | Forward velocity in m/s. | | `angular` | `f32` | yes | Turning rate in rad/s. | | `timestamp_ns` | `u64` | yes | Timestamp in nanoseconds since epoch. | **Returns** `CmdVel` — with the specified timestamp. **When to use** - Replay/recording systems where timestamps come from a log file - Simulation with deterministic time --- ### Python Constructor Creates a velocity command from keyword arguments. **Signature** ```python CmdVel(linear, angular, timestamp_ns=0) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `linear` | `f32` | yes | Forward velocity in m/s | | `angular` | `f32` | yes | Turning rate in rad/s | | `timestamp_ns` | `u64` | no | Timestamp. Default: `0`. | **Returns** `CmdVel` instance. Python also provides a class method `CmdVel.zero()`. 
**Example** ```python from horus import CmdVel, Topic cmd = CmdVel(linear=0.5, angular=0.3) topic = Topic(CmdVel) topic.send(cmd) # Stop the robot topic.send(CmdVel.zero()) ``` --- ## Safety Notes - **Clamp values** — enforce maximum speed and turn rate before sending to hardware drivers - **Pair with E-stop** — the [Emergency Stop](/recipes/emergency-stop) recipe overrides `cmd_vel` when triggered - **Watchdog timeout** — motor controllers should stop if no `CmdVel` is received within a timeout (e.g., 100ms) --- ## Common Patterns ### Differential Drive Kinematics Convert `CmdVel` to left/right wheel speeds: ```rust // simplified use horus::prelude::*; fn to_wheel_speeds(cmd: &CmdVel, wheel_base: f32, wheel_radius: f32) -> (f32, f32) { let left = (cmd.linear - cmd.angular * wheel_base / 2.0) / wheel_radius; let right = (cmd.linear + cmd.angular * wheel_base / 2.0) / wheel_radius; (left, right) } ``` See the [Differential Drive](/recipes/differential-drive) recipe for a complete example. ### Clamped Teleop ```rust // simplified use horus::prelude::*; fn clamp_cmd(cmd: CmdVel, max_linear: f32, max_angular: f32) -> CmdVel { CmdVel::new( cmd.linear.clamp(-max_linear, max_linear), cmd.angular.clamp(-max_angular, max_angular), ) } ``` --- ## See Also - [Control Messages (Rust)](/rust/api/control-messages) — Full Rust API for all control types - [Control Messages (Python)](/python/messages/control) — Python control API - [Twist](/stdlib/messages/twist) — Full 3D linear + angular velocity - [Odometry](/stdlib/messages/odometry) — Position feedback from wheel encoders - [Differential Drive Recipe](/recipes/differential-drive) — Mobile robot control - [Emergency Stop Recipe](/recipes/emergency-stop) — Safety stop pattern --- ## LaserScan Path: /stdlib/messages/laser-scan Description: 2D LiDAR scan data for obstacle detection, mapping, and SLAM # LaserScan Stores up to 360 range measurements from a 2D LiDAR sensor in a fixed-size array. 
The fixed `[f32; 360]` layout makes it a POD type safe for zero-copy shared memory transport (~50 ns per message). > **Python**: Available via `horus.LaserScan(angle_min, angle_max, ...)`. See [Python Sensor Messages](/python/messages/sensor). > > **ROS2 equivalent**: `sensor_msgs/LaserScan` — same conceptual fields. HORUS uses a fixed `[f32; 360]` array (shared-memory safe) instead of a dynamic `Vec<f32>`. ```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Fields | Field | Type | Unit | Default | Description | |-------|------|------|---------|-------------| | `ranges` | `[f32; 360]` | m | `[0.0; 360]` | Range measurements. `0.0` = invalid reading | | `angle_min` | `f32` | rad | `-PI` | Start angle of the scan | | `angle_max` | `f32` | rad | `PI` | End angle of the scan | | `range_min` | `f32` | m | `0.1` | Minimum valid range | | `range_max` | `f32` | m | `30.0` | Maximum valid range | | `angle_increment` | `f32` | rad | `PI/180` | Angular resolution (1 degree) | | `time_increment` | `f32` | s | `0.0` | Time between individual measurements | | `scan_time` | `f32` | s | `0.1` | Time to complete a full scan | | `timestamp_ns` | `u64` | ns | `0` | Timestamp in nanoseconds since epoch | ## Quick Reference — Methods | Method | Returns | Description | |--------|---------|-------------| | `new()` | `LaserScan` | Creates with default parameters and current timestamp | | `angle_at(index)` | `f32` | Angle in radians for a given range index | | `is_range_valid(index)` | `bool` | Checks if a range reading is valid | | `valid_count()` | `usize` | Number of valid range readings | | `min_range()` | `Option<f32>` | Minimum valid range, or `None` | --- ## Constructor Methods ### `new()` Creates a laser scan with default parameters and the current timestamp. **Signature** ```rust // simplified pub fn new() -> Self ``` **Parameters** None.
**Returns** `LaserScan` — with defaults: `-PI` to `PI` scan range, 1-degree resolution, 0.1–30.0m valid range, all ranges zeroed, and `timestamp_ns` set to current time. **Panics** Never. **Example** ```rust // simplified use horus::prelude::*; let mut scan = LaserScan::new(); scan.range_min = 0.1; scan.range_max = 12.0; // Populate from sensor driver scan.ranges[0] = 2.5; // 2.5m at angle_min scan.ranges[90] = 1.2; // 1.2m, 90 entries (90°) past angle_min scan.ranges[180] = 0.0; // invalid (no return) let topic: Topic<LaserScan> = Topic::new("lidar.scan")?; topic.send(&scan); ``` --- ### Python Constructor Creates a laser scan from keyword arguments. **Signature** ```python LaserScan(angle_min=0.0, angle_max=0.0, angle_increment=0.0, range_min=0.0, range_max=0.0, ranges=None, timestamp_ns=0) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `angle_min` | `f32` | no | Start angle in radians. Default: `0.0`. | | `angle_max` | `f32` | no | End angle in radians. Default: `0.0`. | | `angle_increment` | `f32` | no | Angular resolution in radians. Default: `0.0`. | | `range_min` | `f32` | no | Minimum valid range in meters. Default: `0.0`. | | `range_max` | `f32` | no | Maximum valid range in meters. Default: `0.0`. | | `ranges` | `list[f32]` | no | Range values. Padded with zeros to 360, truncated if longer. Default: `None` (all zeros). | | `timestamp_ns` | `u64` | no | Timestamp. Default: `0`. | **Returns** `LaserScan` instance. Python supports `len(scan)`, which returns `valid_count()`. **Example** ```python from horus import LaserScan, Topic scan = LaserScan( angle_min=-3.14159, angle_max=3.14159, angle_increment=0.01745, range_min=0.1, range_max=12.0, ranges=[1.0, 1.1, 1.2, 0.0, 2.5], ) topic = Topic(LaserScan) topic.send(scan) print(f"Min range: {scan.min_range()}") print(f"Valid readings: {len(scan)}") ``` --- ## Methods ### `angle_at(index)` Returns the angle in radians for a given range index.
**Signature** ```rust // simplified pub fn angle_at(&self, index: usize) -> f32 ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `index` | `usize` | yes | Range array index (0–359). | **Returns** `f32` — angle in radians, computed as `angle_min + (index as f32) * angle_increment`. **Panics** Never. Out-of-range indices produce mathematically valid but meaningless angles. **When to use** - Converting a range reading to a Cartesian point: `x = range * cos(angle_at(i))`, `y = range * sin(angle_at(i))` - Filtering readings by angular sector **Example** ```rust // simplified use horus::prelude::*; let scan = LaserScan::new(); for i in 0..360 { if scan.is_range_valid(i) { let angle = scan.angle_at(i); let x = scan.ranges[i] * angle.cos(); let y = scan.ranges[i] * angle.sin(); // (x, y) is the obstacle position in the sensor frame } } ``` --- ### `is_range_valid(index)` Checks whether a range reading at the given index is valid. **Signature** ```rust // simplified pub fn is_range_valid(&self, index: usize) -> bool ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `index` | `usize` | yes | Range array index (0–359). | **Returns** `true` if `ranges[index]` is within `[range_min, range_max]` AND is finite. `false` otherwise (including `0.0`, NaN, infinite). **Behavior** - `0.0` is always invalid — it indicates no return from the sensor - Values below `range_min` are invalid (too close, likely noise) - Values above `range_max` are invalid (beyond sensor capability) - NaN and infinite values are invalid (sensor fault) --- ### `valid_count()` Returns the number of valid range readings in the scan. **Signature** ```rust // simplified pub fn valid_count(&self) -> usize ``` **Parameters** None. **Returns** `usize` — count of indices where `is_range_valid(i)` returns `true`. 
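The validity rule and the counting built on it can be sketched as free functions over a plain slice. The names below are hypothetical stand-ins that restate the documented rule; they are not the HORUS methods. `closest_valid` also returns the index, which is what a caller would pass to `angle_at()` to get the bearing of the nearest obstacle.

```rust
/// Hypothetical free-function form of the documented validity rule:
/// finite, non-zero, and within [range_min, range_max].
fn is_range_valid(range: f32, range_min: f32, range_max: f32) -> bool {
    range.is_finite() && range != 0.0 && range >= range_min && range <= range_max
}

/// Number of readings that pass the validity rule.
fn valid_count(ranges: &[f32], range_min: f32, range_max: f32) -> usize {
    ranges
        .iter()
        .filter(|&&r| is_range_valid(r, range_min, range_max))
        .count()
}

/// Index and value of the smallest valid reading, or `None` if there is none.
fn closest_valid(ranges: &[f32], range_min: f32, range_max: f32) -> Option<(usize, f32)> {
    ranges
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, r)| is_range_valid(r, range_min, range_max))
        .min_by(|a, b| a.1.total_cmp(&b.1))
}
```

Filtering before taking the minimum matters: a naive minimum over the raw array would return `0.0` (no return) or be poisoned by NaN.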
**When to use** - Diagnostics — a scan with zero valid readings indicates a sensor fault - Adaptive algorithms that need to know scan density --- ### `min_range()` Returns the minimum valid range reading, or `None` if no readings are valid. **Signature** ```rust // simplified pub fn min_range(&self) -> Option<f32> ``` **Parameters** None. **Returns** - `Some(f32)` — the smallest valid range value in the scan - `None` — no valid readings exist **When to use** - Emergency stop — if `min_range() < safety_distance`, halt the robot - Closest-obstacle detection for reactive navigation **When NOT to use** - When you need the angle of the closest obstacle — iterate with `angle_at()` instead **Example** ```rust // simplified use horus::prelude::*; fn emergency_stop(scan: &LaserScan, safety_distance: f32) -> bool { match scan.min_range() { Some(closest) => closest < safety_distance, None => true, // no valid readings = assume danger } } ``` --- ## Production Example Reactive obstacle avoidance node: ```rust // simplified use horus::prelude::*; struct ObstacleAvoidance { scan_sub: Topic<LaserScan>, cmd_pub: Topic<CmdVel>, safety_distance: f32, } impl Node for ObstacleAvoidance { fn name(&self) -> &str { "ObstacleAvoidance" } fn tick(&mut self) { if let Some(scan) = self.scan_sub.recv() { if let Some(closest) = scan.min_range() { if closest < self.safety_distance { // Too close — stop and turn self.cmd_pub.send(CmdVel::new(0.0, 0.5)); } else { // Clear — drive forward self.cmd_pub.send(CmdVel::new(0.3, 0.0)); } } else { // No valid readings — stop self.cmd_pub.send(CmdVel::zero()); } } } fn shutdown(&mut self) -> Result<()> { self.cmd_pub.send(CmdVel::zero()); Ok(()) } } ``` --- ## See Also - [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API for all sensor types - [Sensor Messages (Python)](/python/messages/sensor) — Python sensor API - [LiDAR Obstacle Avoidance Recipe](/recipes/lidar-obstacle-avoidance) — Reactive navigation - [Sensor Node Tutorial](/tutorials/01-sensor-node) —
Build a sensor node - [OccupancyGrid](/stdlib/messages/occupancy-grid) — Grid maps built from laser scans --- ## Image Path: /stdlib/messages/image Description: Zero-copy camera images backed by shared memory with ML framework interop # Image A camera image backed by shared memory for zero-copy inter-process communication. Only a small descriptor (metadata) travels through the ring buffer; the actual pixel data stays in a shared memory pool. This enables real-time image pipelines at full camera frame rates without serialization overhead. ## When to Use Use `Image` when your robot has a camera and you need to share frames between nodes -- for example, between a camera driver node, a computer vision node, and a display node. The zero-copy design means a 1080p RGB image transfers in microseconds, not milliseconds. ## ROS2 Equivalent `sensor_msgs/Image` -- same concept (width, height, encoding, pixel data), but HORUS uses shared memory pools instead of serialized byte buffers. ## Zero-Copy Architecture ``` Camera driver Vision node Display node | | | |-- descriptor --> | | | (64 bytes) |-- descriptor --> | | | (64 bytes) | +-----+ +-----+ +-----+ | | | v v v [ Shared Memory Pool -- pixel data lives here ] ``` The descriptor contains pool ID, slot index, dimensions, and encoding. Each recipient maps the same physical memory -- no copies at any stage. 
## Encoding Types | Encoding | Channels | Bytes/Pixel | Description | |----------|----------|-------------|-------------| | `Mono8` | 1 | 1 | 8-bit grayscale | | `Mono16` | 1 | 2 | 16-bit grayscale | | `Rgb8` | 3 | 3 | 8-bit RGB (default) | | `Bgr8` | 3 | 3 | 8-bit BGR (OpenCV format) | | `Rgba8` | 4 | 4 | 8-bit RGBA | | `Bgra8` | 4 | 4 | 8-bit BGRA | | `Yuv422` | 2 | 2 | YUV 4:2:2 | | `Mono32F` | 1 | 4 | 32-bit float grayscale | | `Rgb32F` | 3 | 12 | 32-bit float RGB | | `BayerRggb8` | 1 | 1 | Bayer pattern (raw sensor) | | `Depth16` | 1 | 2 | 16-bit depth in millimeters | ## Constructor ## Example ## Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `width` | `u32` | px | Image width | | `height` | `u32` | px | Image height | | `channels` | `u32` | -- | Number of color channels | | `encoding` | `ImageEncoding` | -- | Pixel format (see table above) | | `step` | `u32` | bytes | Bytes per row (width * bytes_per_pixel) | | `frame_id` | `str` | -- | Coordinate frame (e.g., `"camera_front"`) | | `timestamp_ns` | `u64` | ns | Timestamp in nanoseconds since epoch | ## Methods | Method | Signature | Description | |--------|-----------|-------------| | `new(w, h, enc)` | `(u32, u32, ImageEncoding) -> Image` | Create zero-initialized image | | `pixel(x, y)` | `(u32, u32) -> Option<&[u8]>` | Read pixel bytes at (x, y) | | `set_pixel(x, y, val)` | `(u32, u32, &[u8]) -> &mut Self` | Write pixel, chainable | | `fill(val)` | `(&[u8]) -> &mut Self` | Fill entire image with color | | `roi(x, y, w, h)` | `(u32, u32, u32, u32) -> Option>` | Extract region of interest | | `data()` | `-> &[u8]` | Raw pixel data slice | | `data_mut()` | `-> &mut [u8]` | Mutable pixel data slice | | `from_numpy(arr)` | Python: array -> Image | Create from numpy (copies in) | | `to_numpy()` | Python: -> ndarray | Zero-copy to numpy | | `to_torch()` | Python: -> Tensor | Zero-copy to PyTorch via DLPack | | `to_jax()` | Python: -> Array | Zero-copy to JAX via DLPack | ## 
Common Patterns **Camera-to-ML pipeline:** Image (SHM) → `to_torch()` → YOLO model → Detection, with a second branch YOLO model → `to_numpy()` → OpenCV overlay. **Multi-encoding workflow:** ```rust // simplified use horus::prelude::*; // Camera outputs BGR (OpenCV convention) let bgr = Image::new(640, 480, ImageEncoding::Bgr8)?; // Depth camera outputs 16-bit depth in millimeters let depth = Image::new(640, 480, ImageEncoding::Depth16)?; // ML model expects float grayscale let gray = Image::new(640, 480, ImageEncoding::Mono32F)?; ``` ## Design Decisions **Why pool-backed shared memory instead of serialized byte buffers?** Serializing a 1080p RGB image (6 MB) takes ~2ms and doubles memory usage (sender buffer + receiver buffer). With pool-backed shared memory, only the 64-byte descriptor is copied; the pixel data stays in one place and every subscriber maps the same physical memory. This keeps latency under 10us regardless of resolution. **Why fixed encoding enums instead of arbitrary format strings?** Fixed enums enable compile-time size calculations (`step = width * bytes_per_pixel`) and prevent encoding mismatches between publisher and subscriber. The enum covers all common camera output formats; for exotic encodings, use `GenericMessage` with manual layout. **Why `from_numpy()` copies data in but `to_numpy()` is zero-copy?** Writing into the shared memory pool requires placing data at a specific pool slot, so `from_numpy()` must copy once. Reading (`to_numpy()`) returns a view into the existing pool memory -- no copy needed. This asymmetry is intentional: one copy on publish, zero copies on subscribe. **Image vs DepthImage:** Use `Image` with `Depth16` encoding for raw depth sensor output (16-bit millimeters). Use `DepthImage` when you need float-meter depth values with statistics and min/max queries. They serve different pipeline stages: `Image` is for transport, `DepthImage` is for processing.
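The size arithmetic behind these decisions can be worked through in a few lines. The sketch below is standalone: `Encoding` is a hypothetical subset of the encodings table, and `bytes_per_pixel`, `step`, and `image_bytes` are illustrative helpers rather than HORUS functions. The 64-byte descriptor size is taken from the text above.

```rust
/// Hypothetical subset of the encodings table, for illustration only.
#[derive(Clone, Copy)]
enum Encoding {
    Mono8,
    Depth16,
    Rgb8,
    Rgba8,
    Rgb32F,
}

/// Bytes per pixel, following the encodings table.
const fn bytes_per_pixel(enc: Encoding) -> usize {
    match enc {
        Encoding::Mono8 => 1,
        Encoding::Depth16 => 2,
        Encoding::Rgb8 => 3,
        Encoding::Rgba8 => 4,
        Encoding::Rgb32F => 12,
    }
}

/// Bytes per row: step = width * bytes_per_pixel.
const fn step(width: usize, enc: Encoding) -> usize {
    width * bytes_per_pixel(enc)
}

/// Total pixel payload for a tightly packed image.
const fn image_bytes(width: usize, height: usize, enc: Encoding) -> usize {
    step(width, enc) * height
}

/// Descriptor size quoted in the text above.
const DESCRIPTOR_BYTES: usize = 64;

/// 1080p RGB payload: 1920 * 1080 * 3 = 6_220_800 bytes (about 6 MB).
const FULL_HD_RGB_BYTES: usize = image_bytes(1920, 1080, Encoding::Rgb8);
```

For a 1080p RGB frame the payload is 6,220,800 bytes, so the 64-byte descriptor that actually travels through the ring buffer is 97,200 times smaller than the pixel data it refers to.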
--- ## See Also - [Image API (Rust)](/rust/api/image) — Full Rust Image API with pool-backed allocation - [Python Image](/python/api/image) — NumPy/PyTorch zero-copy - [Python CV Node Recipe](/recipes/python-cv-node) — Computer vision with Python - [DepthImage](/rust/api/depth-image) — Depth maps for stereo/structured light - [PointCloud](/rust/api/perception-messages) — 3D point cloud data --- ## Twist Path: /stdlib/messages/twist Description: 3D linear and angular velocity — the full velocity state for robots in 3D space # Twist Represents 3D linear velocity and 3D angular velocity. Used for full 6-DOF velocity representation in odometry, navigation, and force/torque control. For 2D mobile robots, prefer the simpler `CmdVel`. ## When to Use Use `Twist` when you need full 3D velocity: drones, underwater vehicles, manipulator end-effectors, or any system that moves in all six degrees of freedom. Also used as a component of `Odometry` and `TwistWithCovariance`. ## ROS2 Equivalent `geometry_msgs/Twist` — identical field layout (linear `[x, y, z]` + angular `[x, y, z]`). 
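For 2D platforms, the relationship between `Twist` and `CmdVel` (forward speed maps to `linear[0]`, turning rate maps to `angular[2]`, with `f64`/`f32` conversion) can be sketched with standalone stand-in types. `Twist3` and `CmdVel2` are hypothetical names used only here. Zeroing the unused axes when widening back to 3D is an assumption of this sketch.

```rust
/// Hypothetical stand-in for the 3D velocity fields of `Twist`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Twist3 {
    linear: [f64; 3],  // m/s, [x, y, z]
    angular: [f64; 3], // rad/s, about [x, y, z]
}

/// Hypothetical stand-in for the 2D velocity fields of `CmdVel`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CmdVel2 {
    linear: f32,  // m/s, forward
    angular: f32, // rad/s, about Z
}

impl From<Twist3> for CmdVel2 {
    fn from(t: Twist3) -> Self {
        // Lossy: drops lateral/vertical velocity and roll/pitch rates,
        // and narrows f64 to f32.
        Self {
            linear: t.linear[0] as f32,
            angular: t.angular[2] as f32,
        }
    }
}

impl From<CmdVel2> for Twist3 {
    fn from(c: CmdVel2) -> Self {
        // Lossless widening; unused axes are set to zero (assumption).
        Self {
            linear: [f64::from(c.linear), 0.0, 0.0],
            angular: [0.0, 0.0, f64::from(c.angular)],
        }
    }
}
```

The round trip `CmdVel2 -> Twist3 -> CmdVel2` is exact, while `Twist3 -> CmdVel2` discards four of the six components, so it is only appropriate for planar robots.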
## Example ## Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `linear` | `[f64; 3]` | m/s | Linear velocity `[x, y, z]` | | `angular` | `[f64; 3]` | rad/s | Angular velocity about `[x, y, z]` (roll, pitch, yaw rates) | | `timestamp_ns` | `u64` | ns | Timestamp | ## TwistWithCovariance For uncertainty-aware systems (EKF, navigation stacks), use the covariance variant: ```rust // simplified let twist_cov = TwistWithCovariance { twist: Twist { linear: [1.0, 0.0, 0.0], angular: [0.0, 0.0, 0.1], timestamp_ns: 0 }, covariance: [0.0; 36], // 6x6 row-major: [vx, vy, vz, wx, wy, wz] }; ``` | Field | Type | Description | |-------|------|-------------| | `twist` | `Twist` | The velocity | | `covariance` | `[f64; 36]` | 6x6 covariance matrix (row-major) | ## CmdVel vs Twist | Feature | `CmdVel` | `Twist` | |---------|----------|---------| | Dimensions | 2D (linear + angular) | 3D (6-DOF) | | Field types | `f32` | `f64` | | Zero-copy | Yes (`#[repr(C)]`, 16 bytes) | Yes (`#[repr(C)]`, 56 bytes) | | Use case | Mobile robots | Drones, 3D systems | ## Related Types - [CmdVel](/stdlib/messages/cmd-vel) — Simplified 2D velocity command - [Odometry](/stdlib/messages/odometry) — Combines pose + twist for localization - [Accel](/rust/api/geometry-messages) — Linear + angular acceleration --- ## See Also - [Geometry Messages (Rust)](/rust/api/geometry-messages) — Full Rust API - [Geometry Messages (Python)](/python/messages/geometry) — Python API - [CmdVel](/stdlib/messages/cmd-vel) — 2D velocity subset --- ## Standard Library Path: /stdlib Description: HORUS standard robotics types — 55+ message types, coordinate transforms, and zero-copy domain types # Standard Library The HORUS Standard Library (`horus_library`) provides **55+ typed messages** for robotics, coordinate transforms (`TransformFrame`), and zero-copy domain types (`Image`, `PointCloud`, `DepthImage`). All types work in both Rust and Python. ## Which Type Do I Need? | I need to...
| Use | Rust | Python | |-------------|-----|------|--------| | Drive wheels / send velocity | `CmdVel` | `CmdVel::new(0.5, 0.1)` | `CmdVel(linear=0.5, angular=0.1)` | | Read IMU (accel + gyro) | `Imu` | `imu.linear_acceleration` | `imu.linear_acceleration` | | Read LiDAR scan | `LaserScan` | `scan.ranges` | `scan.ranges` | | Send camera images | `Image` | `Image::new(640, 480, Rgb8)` | `Image(640, 480, encoding=0)` | | Send 3D point cloud | `PointCloud` | `PointCloud::new(1000, XYZ)` | `PointCloud(num_points=1000)` | | Read wheel odometry | `Odometry` | `odom.x`, `odom.y` | `odom.x`, `odom.y` | | Control joints | `JointState` | `js.positions` | `js.positions` | | Detect objects (2D) | `Detection` | `det.class_name` | `det.class_name` | | Detect objects (3D) | `Detection3D` | `det.bbox` | `det.bbox` | | Build a 2D map | `OccupancyGrid` | `grid.is_free(x, y)` | `grid.is_free(x, y)` | | Plan a path | `NavPath` | `path.add_waypoint(wp)` | `path.add_waypoint(wp)` | | Estimate body pose | `LandmarkArray` | `.coco_pose()` | `.coco_pose()` | | Transform coordinates | `TransformFrame` | `tf.lookup("camera", "world")` | `tf.lookup("camera", "world")` | | Monitor robot health | `SafetyStatus` | `status.estop_engaged` | `status.estop_engaged` | ## Sections - **[Message Types](/stdlib/messages)** — All 55+ message types organized by category - **[TransformFrame](/concepts/transform-frame)** — Coordinate transform system - **[Image](/rust/api/image)** — Zero-copy camera images - **[PointCloud](/rust/api/perception-messages)** — 3D point cloud data - **[DepthImage](/rust/api/depth-image)** — Depth maps from stereo/structured light - **[Tensor & DLPack](/rust/api/tensor)** — ML tensor exchange with PyTorch/JAX - **[Python Message Library](/python/library/python-message-library)** — All types in Python with field tables ## Design Decisions **Why a standard library instead of user-defined messages only?** Standardized message types enable interoperability: a SLAM node from one 
developer works with a path planner from another because they agree on `OccupancyGrid`. Without standard types, every team invents its own Pose, IMU, and scan formats, making integration painful (the ROS2 ecosystem learned this lesson early). **Why 55+ types instead of a minimal set?** Robotics has well-established data patterns (IMU, LiDAR, odometry, detection, etc.). Providing them out of the box means users start building application logic immediately instead of defining message schemas. Custom messages (`message!` macro or `horus.msggen`) cover domain-specific needs. **Why Rust and Python parity?** Mixed-language systems are common: Rust for real-time control, Python for ML inference and prototyping. Every standard type is available in both languages with binary-compatible zero-copy IPC, so a Rust SLAM node can publish `OccupancyGrid` that a Python planner consumes with no translation layer. --- ## See Also - [Message Types Concept](/concepts/message-types) — How messages work in HORUS - [Rust API Messages](/rust/api/messages) — Complete Rust message reference - [Python Message Library](/python/library/python-message-library) — Python equivalents --- ## OccupancyGrid & CostMap Path: /stdlib/messages/occupancy-grid Description: Grid-based environment maps for navigation, mapping, and path planning # OccupancyGrid & CostMap `OccupancyGrid` represents a 2D grid map of the environment where each cell stores an occupancy probability. `CostMap` extends it with inflated costs around obstacles for safe path planning. Together they form the mapping and planning backbone for autonomous navigation. ## When to Use Use `OccupancyGrid` when your robot builds maps from sensor data (SLAM) or loads pre-built maps for localization. Use `CostMap` when you need to plan paths that keep a safe distance from obstacles. ## ROS2 Equivalent `nav_msgs/OccupancyGrid` -- same grid structure with origin, resolution, and cell values. 
HORUS adds `CostMap` as a first-class type (in ROS2 this lives in the costmap_2d package). ## Cell Values | Value | Meaning | |-------|---------| | `-1` | Unknown (unexplored) | | `0` | Free space | | `1-49` | Probably free (low probability of obstacle) | | `50-99` | Probably occupied | | `100` | Definitely occupied | ## Example ## OccupancyGrid Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `resolution` | `f32` | m/cell | Meters per cell. Default: `0.05` (5cm) | | `width` | `u32` | cells | Grid width | | `height` | `u32` | cells | Grid height | | `origin` | `Pose2D` | m, rad | World pose of the bottom-left corner | | `data` | `Vec<i8>` | 0--100 | Cell values, row-major. `-1`=unknown, `0`=free, `100`=occupied | | `frame_id` | `[u8; 32]` | -- | Coordinate frame (e.g., `"map"`) | | `metadata` | `[u8; 64]` | -- | Free-form metadata | | `timestamp_ns` | `u64` | ns | Timestamp in nanoseconds since epoch | ## OccupancyGrid Methods | Method | Signature | Description | |--------|-----------|-------------| | `new(w, h, res, origin)` | `(u32, u32, f32, Pose2D) -> Self` | Create grid initialized to unknown (-1) | | `world_to_grid(x, y)` | `(f64, f64) -> Option<(u32, u32)>` | Convert world coordinates to grid indices | | `grid_to_world(gx, gy)` | `(u32, u32) -> Option<(f64, f64)>` | Convert grid indices to world coordinates (cell center) | | `occupancy(gx, gy)` | `(u32, u32) -> Option<i8>` | Get cell value at grid coordinates | | `set_occupancy(gx, gy, val)` | `(u32, u32, i8) -> bool` | Set cell value (clamped to -1..100) | | `is_free(x, y)` | `(f64, f64) -> bool` | True if occupancy is in 0..50 | | `is_occupied(x, y)` | `(f64, f64) -> bool` | True if occupancy >= 50 | ## CostMap Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `occupancy_grid` | `OccupancyGrid` | -- | Underlying occupancy grid | | `costs` | `Vec<u8>` | 0--255 | Cost values per cell. 
`253`=lethal, `255`=unknown | | `inflation_radius` | `f32` | m | Inflation radius. Default: `0.55` | | `cost_scaling_factor` | `f32` | -- | Exponential decay factor. Default: `10.0` | | `lethal_cost` | `u8` | -- | Cost threshold for lethal obstacles. Default: `253` | ## CostMap Methods | Method | Signature | Description | |--------|-----------|-------------| | `from_occupancy_grid(grid, radius)` | `(OccupancyGrid, f32) -> Self` | Create costmap with inflation | | `compute_costs()` | `-> ()` | Recompute costs from occupancy data | | `cost(x, y)` | `(f64, f64) -> Option<u8>` | Get cost at world coordinates | ## Common Patterns **SLAM pipeline:** ``` LaserScan --> SLAM algorithm --> OccupancyGrid (map) \-> Pose2D (localization) ``` **Path planning pipeline:** `CostMap (inflated) --> path planner --> NavPath, CmdVel` **Inflation:** The `CostMap` applies exponential cost decay around obstacles. A cell at distance `d` from an obstacle gets cost proportional to `(1 - d/radius)^scaling_factor`. This creates a smooth gradient that path planners use to keep the robot away from walls and obstacles. ## Design Decisions **Why `-1` for unknown instead of a separate boolean mask?** Using `-1` within the same `i8` cell value avoids a second parallel array and matches the ROS2 convention. Three-state cells (unknown/free/occupied) are sufficient for most SLAM and planning algorithms. The tradeoff is that cell values are signed, which requires care when converting to unsigned cost values. **Why separate OccupancyGrid and CostMap?** Raw occupancy (from SLAM) and inflated costs (for planning) serve different consumers. A SLAM node publishes `OccupancyGrid`; a planner reads `CostMap`. Keeping them separate means SLAM doesn't need to know inflation parameters, and multiple planners can use different inflation radii from the same base map. 
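
As a rough illustration of the inflation rule described under Common Patterns, the sketch below maps a distance to a cost using the quoted expression `(1 - d/radius)^scaling_factor`. It is standalone Rust, not the HORUS implementation: the helper name `inflated_cost`, the scaling to just below the lethal value, and the choice of zero cost at or beyond the radius are all assumptions made for this example.

```rust
/// Hypothetical helper, not part of the HORUS API.
/// Maps distance-to-obstacle (meters) to a cost using the
/// `(1 - d/radius)^scaling_factor` expression quoted in the text.
fn inflated_cost(d: f32, radius: f32, scaling_factor: f32, lethal_cost: u8) -> u8 {
    if d <= 0.0 {
        return lethal_cost; // the obstacle cell itself
    }
    if d >= radius {
        return 0; // outside the inflation radius (assumed to cost nothing)
    }
    let falloff = (1.0 - d / radius).powf(scaling_factor); // value in (0, 1)
    // Assumption: inflated cells stay strictly below the lethal cost.
    (falloff * (lethal_cost as f32 - 1.0)).round() as u8
}

fn main() {
    // Defaults taken from the CostMap Fields table.
    let (radius, scaling, lethal) = (0.55_f32, 10.0_f32, 253_u8);
    for d in [0.0_f32, 0.05, 0.10, 0.25, 0.55] {
        println!("d = {d:.2} m -> cost {}", inflated_cost(d, radius, scaling, lethal));
    }
}
```

With the default `cost_scaling_factor` of `10.0`, the cost falls off quickly within the first few centimeters, so a larger `inflation_radius` mainly widens the low-cost fringe rather than the high-cost core.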
**Why exponential cost decay for inflation?** Linear decay creates sharp cost boundaries that cause path oscillation. Exponential decay (`(1 - d/radius)^scaling_factor`) produces smooth gradients that path planners follow naturally, keeping the robot at a comfortable distance from obstacles without sudden turns. **Resolution tradeoffs:** 5cm default resolution balances map precision (fine enough for indoor navigation) with memory usage (a 20m x 20m map at 5cm is 400x400 = 160KB). For outdoor robots, use 10-25cm resolution; for warehouse micro-navigation, use 1-2cm. --- ## See Also - [Navigation Messages (Rust)](/rust/api/navigation-messages) — Full Rust API - [Navigation Messages (Python)](/python/messages/navigation) — Python API - [Navigation](/stdlib/messages/navigation) — NavGoal, NavPath, PathPlan - [LaserScan](/stdlib/messages/laser-scan) — LiDAR data that builds occupancy grids --- ## Odometry Path: /stdlib/messages/odometry Description: Robot position and velocity from wheel encoders or visual odometry # Odometry Combines a 2D pose (position + heading) with a twist (velocity), covariance matrices, and coordinate frame IDs. The standard output from wheel encoders, visual odometry, or any localization system. > **Python**: Available via `horus.Odometry(x, y, theta, linear_velocity, angular_velocity)`. See [Python Sensor Messages](/python/messages/sensor). > > **ROS2 equivalent**: `nav_msgs/Odometry` — similar structure (pose + twist + covariances + frame IDs). 
```rust // simplified use horus::prelude::*; ``` --- ## Quick Reference — Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `pose` | `Pose2D` | m, rad | Current position (x, y) and heading (theta) | | `twist` | `Twist` | m/s, rad/s | Current velocity (linear + angular) | | `pose_covariance` | `[f64; 36]` | — | 6x6 row-major covariance for pose (x, y, z, roll, pitch, yaw) | | `twist_covariance` | `[f64; 36]` | — | 6x6 row-major covariance for velocity | | `frame_id` | `[u8; 32]` | — | Coordinate frame for pose (e.g., `"odom"`, `"map"`) | | `child_frame_id` | `[u8; 32]` | — | Coordinate frame for twist (e.g., `"base_link"`) | | `timestamp_ns` | `u64` | ns | Timestamp in nanoseconds since epoch | ## Quick Reference — Methods | Method | Returns | Description | |--------|---------|-------------| | `new()` | `Odometry` | Creates at origin with zero velocity and current timestamp | | `set_frames(frame, child_frame)` | `()` | Sets coordinate frame IDs | | `update(pose, twist)` | `()` | Updates pose and velocity, refreshes timestamp | | `is_valid()` | `bool` | Checks if pose and twist contain finite values | --- ## Constructor Methods ### `new()` Creates an odometry message at the origin with zero velocity. **Signature** ```rust // simplified pub fn new() -> Self ``` **Parameters** None. **Returns** `Odometry` — with these defaults: - `pose`: `Pose2D::origin()` (x=0, y=0, theta=0) - `twist`: `Twist::stop()` (all zeros) - `pose_covariance`: `[0.0; 36]` - `twist_covariance`: `[0.0; 36]` - `frame_id`: empty (set via `set_frames()`) - `child_frame_id`: empty (set via `set_frames()`) - `timestamp_ns`: current time via `timestamp_now()` **Panics** Never. 
**Example** ```rust // simplified use horus::prelude::*; let mut odom = Odometry::new(); odom.pose = Pose2D { x: 1.5, y: 2.0, theta: 0.785, timestamp_ns: 0 }; odom.twist.linear[0] = 0.3; // forward velocity (m/s) odom.twist.angular[2] = 0.05; // yaw rate (rad/s) odom.set_frames("odom", "base_link"); let topic: Topic<Odometry> = Topic::new("odom")?; topic.send(odom); ``` --- ### Python Constructor Creates an odometry message from flat parameters. **Signature** ```python Odometry(x=0.0, y=0.0, theta=0.0, linear_velocity=0.0, angular_velocity=0.0, timestamp_ns=0) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `x` | `f64` | no | Position X in meters. Default: `0.0`. | | `y` | `f64` | no | Position Y in meters. Default: `0.0`. | | `theta` | `f64` | no | Heading in radians. Default: `0.0`. | | `linear_velocity` | `f64` | no | Forward velocity in m/s. Default: `0.0`. | | `angular_velocity` | `f64` | no | Yaw rate in rad/s. Default: `0.0`. | | `timestamp_ns` | `u64` | no | Timestamp. Default: `0`. | **Returns** `Odometry` instance. **Behavior** - Python provides flat property access: `odom.x`, `odom.y`, `odom.theta`, `odom.linear_velocity`, `odom.angular_velocity` (all read/write) - Internally maps to the nested Rust struct: `x` → `pose.x`, `linear_velocity` → `twist.linear[0]`, `angular_velocity` → `twist.angular[2]` **Example** ```python from horus import Odometry, Topic odom = Odometry(x=1.5, y=2.0, theta=0.785, linear_velocity=0.3, angular_velocity=0.05) odom.set_frames("odom", "base_link") topic = Topic(Odometry) topic.send(odom) print(f"Position: ({odom.x}, {odom.y}), heading: {odom.theta}") print(f"Velocity: {odom.linear_velocity} m/s, yaw: {odom.angular_velocity} rad/s") ``` --- ## Methods ### `set_frames(frame, child_frame)` Sets the coordinate frame IDs for this odometry message. 
**Signature** ```rust // simplified pub fn set_frames(&mut self, frame: &str, child_frame: &str) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `frame` | `&str` | yes | Coordinate frame for the pose (e.g., `"odom"`, `"map"`). Max 31 characters. | | `child_frame` | `&str` | yes | Coordinate frame for the twist (e.g., `"base_link"`). Max 31 characters. | **Returns** Nothing (`()`). **Behavior** - Copies frame strings into fixed-size `[u8; 32]` arrays with null termination - Strings longer than 31 characters are silently truncated - Convention: `frame_id` is the reference frame (where the pose is expressed), `child_frame_id` is the body frame (where the twist is expressed) **When to use** - Always — consumers need frame IDs to integrate odometry into the transform tree - Standard convention: `frame_id = "odom"`, `child_frame_id = "base_link"` **Example** ```rust // simplified use horus::prelude::*; let mut odom = Odometry::new(); odom.set_frames("odom", "base_link"); ``` --- ### `update(pose, twist)` Updates the pose and twist, and refreshes the timestamp to the current time. **Signature** ```rust // simplified pub fn update(&mut self, pose: Pose2D, twist: Twist) ``` **Parameters** | Name | Type | Required | Description | |------|------|----------|-------------| | `pose` | `Pose2D` | yes | New position and heading estimate | | `twist` | `Twist` | yes | New velocity estimate | **Returns** Nothing (`()`). 
**Behavior** - Overwrites `self.pose` and `self.twist` - Sets `self.timestamp_ns` to the current time via `timestamp_now()` - Does NOT update covariances — set those separately if needed **When to use** - Drive nodes that integrate wheel encoders each tick - Fusion nodes that update the pose estimate **Example** ```rust // simplified use horus::prelude::*; fn tick(&mut self) { let new_pose = Pose2D { x: self.x, y: self.y, theta: self.theta, timestamp_ns: 0 }; let new_twist = Twist { linear: [self.speed, 0.0, 0.0], angular: [0.0, 0.0, self.omega], timestamp_ns: 0, }; self.odom.update(new_pose, new_twist); self.odom_pub.send(self.odom); } ``` --- ### `is_valid()` Checks whether the pose and twist contain finite values. **Signature** ```rust // simplified pub fn is_valid(&self) -> bool ``` **Parameters** None. **Returns** `true` if both `self.pose.is_valid()` and `self.twist.is_valid()` return `true`. `false` if any field is NaN or infinite. **When to use** - Validating odometry before feeding it to a planner or fusion node - Detecting encoder faults or integration overflow --- ## Production Example Wheel encoder odometry node with frame IDs: ```rust // simplified use horus::prelude::*; struct WheelOdom { odom_pub: Topic<Odometry>, odom: Odometry, x: f64, y: f64, theta: f64, } impl Node for WheelOdom { fn name(&self) -> &str { "WheelOdom" } fn tick(&mut self) { // Read encoders, compute dx, dtheta let (dx, dtheta) = self.read_encoders(); self.x += dx * self.theta.cos(); self.y += dx * self.theta.sin(); self.theta += dtheta; self.odom.pose = Pose2D { x: self.x, y: self.y, theta: self.theta, timestamp_ns: 0, }; self.odom.twist.linear[0] = dx * 100.0; // dx per tick * tick rate (100 Hz here) self.odom.twist.angular[2] = dtheta * 100.0; self.odom.set_frames("odom", "base_link"); self.odom_pub.send(self.odom); } } ``` --- ## See Also - [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API - [Sensor Messages (Python)](/python/messages/sensor) — Python API - [Pose2D](/stdlib/messages/pose) 
— 2D position and heading - [Twist](/stdlib/messages/twist) — 3D velocity - [Differential Drive Recipe](/recipes/differential-drive) — Publishes odometry from wheel encoders - [Multi-Sensor Fusion Recipe](/recipes/multi-sensor-fusion) — Combine odometry with IMU --- ## Detection & Detection3D Path: /stdlib/messages/detection Description: ML object detection results for 2D bounding boxes and 3D oriented bounding boxes # Detection & Detection3D Fixed-size object detection messages for zero-copy IPC. `Detection` holds a 2D bounding box result from models like YOLO or SSD. `Detection3D` holds a 3D oriented bounding box from point cloud detectors or depth-aware models. Both are fixed-size types — transferred via zero-copy shared memory (72 and 104 bytes respectively). ## When to Use Use `Detection` when your robot runs a 2D object detection model on camera images and needs to publish results to downstream nodes (tracking, planning, visualization). Use `Detection3D` when you have 3D detections from LiDAR-based or depth-aware models. 
## ROS2 Equivalent - `Detection` maps to `vision_msgs/Detection2D` - `Detection3D` maps to `vision_msgs/Detection3D` ## Example ## Detection Fields | Field | Type | Unit | Size | Description | |-------|------|------|------|-------------| | `bbox` | `BoundingBox2D` | px | 16 B | Bounding box `(x, y, width, height)` | | `confidence` | `f32` | 0--1 | 4 B | Detection confidence | | `class_id` | `u32` | -- | 4 B | Numeric class identifier | | `class_name` | `[u8; 32]` | -- | 32 B | UTF-8 class label, null-padded (max 31 chars) | | `instance_id` | `u32` | -- | 4 B | Instance ID for instance segmentation | **Total size: 72 bytes (fixed-size, zero-copy)** ## Detection Methods | Method | Signature | Description | |--------|-----------|-------------| | `new(name, conf, x, y, w, h)` | `(&str, f32, f32, f32, f32, f32) -> Self` | Create with class name and bounding box | | `with_class_id(id, conf, bbox)` | `(u32, f32, BoundingBox2D) -> Self` | Create with numeric class ID | | `class_name()` | `-> &str` | Get class name as string | | `set_class_name(name)` | `(&str) -> ()` | Set class name (truncates to 31 chars) | | `is_confident(threshold)` | `(f32) -> bool` | True if confidence >= threshold | ## BoundingBox2D Methods | Method | Signature | Description | |--------|-----------|-------------| | `new(x, y, w, h)` | `(f32, f32, f32, f32) -> Self` | Create from top-left corner | | `from_center(cx, cy, w, h)` | `(f32, f32, f32, f32) -> Self` | Create from center (YOLO format) | | `center_x()`, `center_y()` | `-> f32` | Center coordinates | | `area()` | `-> f32` | Box area in pixels | | `iou(other)` | `(&BoundingBox2D) -> f32` | Intersection over Union | ## Detection3D Fields | Field | Type | Unit | Size | Description | |-------|------|------|------|-------------| | `bbox` | `BoundingBox3D` | m, rad | 48 B | 3D box: center, dimensions, rotation (roll/pitch/yaw) | | `confidence` | `f32` | 0--1 | 4 B | Detection confidence | | `class_id` | `u32` | -- | 4 B | Numeric class identifier | 
| `class_name` | `[u8; 32]` | -- | 32 B | UTF-8 class label | | `velocity_x/y/z` | `f32` | m/s | 12 B | Object velocity (for tracking-enabled detectors) | | `instance_id` | `u32` | -- | 4 B | Tracking/instance ID | **Total size: 104 bytes (fixed-size, zero-copy)** ## Common Patterns **Camera-to-tracking pipeline:** `Image --> YOLO model --> Detection --> tracker --> TrackedObject`; `Detection` also feeds the confidence filter and the class filter. **Confidence filtering pattern:** ```rust // simplified use horus::prelude::*; fn filter_detections(detections: &[Detection], min_conf: f32) -> Vec<&Detection> { detections.iter() .filter(|d| d.is_confident(min_conf)) .collect() } ``` **NMS (Non-Maximum Suppression):** ```rust // simplified use horus::prelude::*; fn nms(dets: &mut Vec<Detection>, iou_threshold: f32) { dets.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap()); let mut keep = vec![true; dets.len()]; for i in 0..dets.len() { if !keep[i] { continue; } for j in (i + 1)..dets.len() { if keep[j] && dets[i].bbox.iou(&dets[j].bbox) > iou_threshold { keep[j] = false; } } } let mut idx = 0; dets.retain(|_| { let k = keep[idx]; idx += 1; k }); } ``` ## Design Decisions **Why fixed-size 72/104 bytes instead of variable-length?** Fixed-size enables zero-copy shared memory transport without serialization. Every `Detection` fits in the same ring buffer slot, so the scheduler can deliver detections at the same rate as IMU or odometry data -- no allocation jitter. **Why `[u8; 32]` for class_name instead of a String?** Strings are heap-allocated and variable-length, making them incompatible with zero-copy transport. A 32-byte null-padded array fits in the fixed-size layout and accommodates class labels up to 31 characters (enough for COCO, ImageNet, and custom class names). If you need longer labels, use `class_id` and maintain a lookup table. 
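
The fixed-size label idea can be sketched in standalone Rust as follows. `encode_label` and `decode_label` are hypothetical helpers written for this illustration; they are not the HORUS `set_class_name()` / `class_name()` implementations, and truncating on a UTF-8 character boundary is an assumption about what a careful implementation would do.

```rust
/// Hypothetical helper: copy a label into a 32-byte null-padded buffer,
/// keeping at most 31 bytes so a terminating 0 always remains.
fn encode_label(name: &str) -> [u8; 32] {
    let mut buf = [0u8; 32];
    let mut len = name.len().min(31);
    // Back up to a char boundary so truncation never splits a UTF-8 sequence.
    while !name.is_char_boundary(len) {
        len -= 1;
    }
    buf[..len].copy_from_slice(&name.as_bytes()[..len]);
    buf
}

/// Hypothetical helper: read the label back up to the first null byte.
fn decode_label(buf: &[u8; 32]) -> &str {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end]).unwrap_or("")
}

fn main() {
    let short = encode_label("person");
    println!("{}", decode_label(&short));

    // Labels longer than 31 bytes are cut, never rejected.
    let long = encode_label("a_very_long_class_label_that_exceeds_the_limit");
    println!("{} ({} bytes)", decode_label(&long), decode_label(&long).len());
}
```

Because the buffer is a plain byte array, the whole message stays copyable byte for byte, which is what a fixed-size zero-copy layout needs.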
**Why separate Detection and Detection3D?** 2D and 3D detections have different data layouts and come from different pipelines (camera vs LiDAR/depth). Keeping them separate avoids wasted memory (no 3D fields when doing 2D detection) and makes topic typing unambiguous. A tracker node can subscribe to one or both. **Why no built-in NMS?** Non-Maximum Suppression strategies vary by model and use case (soft-NMS, class-aware NMS, rotated NMS). HORUS provides the `iou()` primitive and lets you implement the NMS variant your pipeline needs, as shown in the Common Patterns section above. --- ## See Also - [Perception Messages (Rust)](/rust/api/perception-messages) — Full Rust API - [Perception Messages (Python)](/python/messages/perception) — Python API - [Segmentation](/stdlib/messages/segmentation) — Related perception type - [Image](/stdlib/messages/image) — Camera frames that feed detection models - [TrackedObject](/rust/api/perception-messages) — Multi-frame object tracking --- ## Pose2D / Pose3D Path: /stdlib/messages/pose Description: 2D and 3D position and orientation types for robotics # Pose2D / Pose3D Position and orientation in 2D or 3D space. The fundamental geometry types for localization, navigation, and manipulation. ## Pose2D — 2D Position + Heading The standard representation for ground robots: position (x, y) and heading angle (theta). ### Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `x` | `f64` | m | X position | | `y` | `f64` | m | Y position | | `theta` | `f64` | rad | Heading angle (0 = forward, positive = counter-clockwise) | --- ## Pose3D — 3D Position + Quaternion Full 6-DOF pose for 3D applications: drones, manipulator end-effectors, VR tracking. 
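
When a ground robot's planar heading has to be expressed as a 3D orientation, the heading becomes a rotation about the z axis. The sketch below shows that conversion in standalone Rust. The `Quat` struct and both functions are illustrative stand-ins that mirror an `{ x, y, z, w }` quaternion; they are not HORUS types or API calls.

```rust
/// Stand-in for an `{ x, y, z, w }` quaternion (not the HORUS type).
#[derive(Debug, Clone, Copy)]
struct Quat {
    x: f64,
    y: f64,
    z: f64,
    w: f64,
}

/// Rotation of `yaw` radians about the z axis.
fn quat_from_yaw(yaw: f64) -> Quat {
    let half = yaw * 0.5;
    Quat { x: 0.0, y: 0.0, z: half.sin(), w: half.cos() }
}

/// Recover the yaw angle from a unit quaternion (ZYX Euler convention).
fn yaw_from_quat(q: Quat) -> f64 {
    (2.0 * (q.w * q.z + q.x * q.y)).atan2(1.0 - 2.0 * (q.y * q.y + q.z * q.z))
}

fn main() {
    // 90 degrees counter-clockwise, matching the positive-theta convention of Pose2D.
    let q = quat_from_yaw(std::f64::consts::FRAC_PI_2);
    println!("{q:?} -> yaw {:.4} rad", yaw_from_quat(q));
}
```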
### Fields | Field | Type | Description | |-------|------|-------------| | `position` | `Point3` | 3D position `{ x, y, z }` in meters | | `orientation` | `Quaternion` | Orientation as `{ x, y, z, w }` quaternion | | `timestamp_ns` | `u64` | Timestamp | --- ## PoseStamped — Pose3D with Frame ID Adds a coordinate frame identifier for use with the [Transform Frame](/concepts/transform-frame) system. ```rust // simplified let stamped = PoseStamped { pose: Pose3D { position: Point3 { x: 1.0, y: 0.0, z: 0.0 }, orientation: Quaternion { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }, timestamp_ns: 0, }, frame_id: *b"base_link\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", timestamp_ns: 0, }; ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `pose` | `Pose3D` | The 3D pose | | `frame_id` | `[u8; 32]` | Coordinate frame (null-terminated string, max 31 chars) | | `timestamp_ns` | `u64` | Timestamp | --- ## PoseWithCovariance — Uncertainty-Aware Pose For EKF and probabilistic localization. The 6x6 covariance matrix represents uncertainty in `[x, y, z, roll, pitch, yaw]`. 
```rust // simplified let pose_cov = PoseWithCovariance { pose: Pose3D { position: Point3 { x: 1.0, y: 2.0, z: 0.0 }, orientation: Quaternion { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }, timestamp_ns: 0, }, covariance: [0.0; 36], // 6x6 row-major }; ``` ### Fields | Field | Type | Description | |-------|------|-------------| | `pose` | `Pose3D` | The 3D pose | | `covariance` | `[f64; 36]` | 6x6 covariance matrix (row-major) | --- ## Choosing the Right Type | Type | Dimensions | Use Case | |------|-----------|----------| | `Pose2D` | x, y, theta | Ground robots, 2D navigation | | `Pose3D` | xyz + quaternion | Drones, arms, 3D perception | | `PoseStamped` | Pose3D + frame | Transform frame integration | | `PoseWithCovariance` | Pose3D + covariance | EKF, probabilistic localization | ## Related Types - [Odometry](/stdlib/messages/odometry) — Pose + velocity from wheel encoders - [Twist](/stdlib/messages/twist) — Velocity (the derivative of pose) - [TransformStamped](/concepts/transform-frame) — Relative transform between frames - [Point3, Vector3, Quaternion](/rust/api/geometry-messages) — Primitive geometry types --- ## See Also - [Geometry Messages (Rust)](/rust/api/geometry-messages) — Full Rust API - [Geometry Messages (Python)](/python/messages/geometry) — Python API - [TransformFrame](/concepts/transform-frame) — Coordinate frame transforms --- ## SegmentationMask Path: /stdlib/messages/segmentation Description: Pixel-level segmentation masks for semantic, instance, and panoptic segmentation # SegmentationMask A fixed-size header (64 bytes, zero-copy transport) describing a pixel-level segmentation mask. The mask data follows the header as a raw byte array where each pixel stores a class ID (semantic), instance ID (instance), or both (panoptic). Three static constructors select the segmentation mode. ## When to Use Use `SegmentationMask` when your robot runs a segmentation model and needs to label every pixel in an image. 
Common scenarios include driveable surface detection (semantic), grasping individual objects (instance), and full scene understanding (panoptic). ## ROS2 Equivalent No direct ROS2 equivalent. ROS2 typically publishes segmentation as `sensor_msgs/Image` with class IDs encoded as pixel values. HORUS provides a dedicated type with mode metadata. ## Three Segmentation Modes | Mode | Value | Pixel Meaning | Use Case | |------|-------|---------------|----------| | **Semantic** | `0` | Class ID (0-255). Each class gets one color. | "What is this pixel?" -- road, sidewalk, sky | | **Instance** | `1` | Instance ID (0-255). Each object gets a unique ID. | "Which object is this pixel?" -- person #1, person #2 | | **Panoptic** | `2` | Both class and instance encoded. | "What and which?" -- car #3, tree #7 | ## Example ## Fields | Field | Type | Unit | Size | Description | |-------|------|------|------|-------------| | `width` | `u32` | px | 4 B | Image width | | `height` | `u32` | px | 4 B | Image height | | `num_classes` | `u32` | -- | 4 B | Number of classes (semantic/panoptic). 
`0` for instance mode | | `mask_type` | `u32` | -- | 4 B | `0`=semantic, `1`=instance, `2`=panoptic | | `timestamp_ns` | `u64` | ns | 8 B | Timestamp in nanoseconds since epoch | | `seq` | `u64` | -- | 8 B | Sequence number | | `frame_id` | `[u8; 32]` | -- | 32 B | Coordinate frame (e.g., `"camera_front"`) | **Total header size: 64 bytes (fixed-size, zero-copy)** ## Methods | Method | Signature | Description | |--------|-----------|-------------| | `semantic(w, h, classes)` | `(u32, u32, u32) -> Self` | Create semantic mask header | | `instance(w, h)` | `(u32, u32) -> Self` | Create instance mask header | | `panoptic(w, h, classes)` | `(u32, u32, u32) -> Self` | Create panoptic mask header | | `is_semantic()` | `-> bool` | True if `mask_type == 0` | | `is_instance()` | `-> bool` | True if `mask_type == 1` | | `is_panoptic()` | `-> bool` | True if `mask_type == 2` | | `data_size()` | `-> usize` | Buffer size for u8 mask (`width * height`) | | `data_size_u16()` | `-> usize` | Buffer size for u16 mask (`width * height * 2`) | | `with_frame_id(id)` | `(&str) -> Self` | Set coordinate frame, chainable | | `with_timestamp(ts)` | `(u64) -> Self` | Set timestamp, chainable | | `frame_id()` | `-> &str` | Get frame ID as string | ## COCO Class Constants The `segmentation::classes` module provides standard COCO class IDs: ```rust // simplified use horus::prelude::*; use horus_library::messages::segmentation::classes; // Note: not re-exported in prelude let is_person = pixel_class == classes::PERSON; // 1 let is_car = pixel_class == classes::CAR; // 3 let is_dog = pixel_class == classes::DOG; // 18 let is_background = pixel_class == classes::BACKGROUND; // 0 ``` ## Common Patterns **Segmentation pipeline:** `Image --> segmentation model --> SegmentationMask`, plus `segmentation model --> overlay on Image`. **Driveable surface detection:** ```rust // simplified use horus::prelude::*; fn driveable_area(mask_data: &[u8], width: u32, road_class: 
u8) -> f32 { let total = mask_data.len() as f32; let road_pixels = mask_data.iter().filter(|&&p| p == road_class).count() as f32; road_pixels / total } ``` **Instance counting:** ```rust // simplified use horus::prelude::*; fn count_instances(mask_data: &[u8]) -> usize { let mut seen = [false; 256]; for &id in mask_data { if id > 0 { // Skip background seen[id as usize] = true; } } seen.iter().filter(|&&v| v).count() } ``` **Panoptic encoding:** In panoptic mode, use u16 masks (`data_size_u16()`) to encode both class and instance: `encoded = class_id * 256 + instance_id`. This supports up to 256 classes with up to 256 instances each. ## Design Decisions **Why a fixed-size header with separate mask data instead of one variable-length message?** The 64-byte header travels through the zero-copy ring buffer while the mask data (potentially megabytes for high-resolution images) lives in a separate shared memory region. This keeps the ring buffer compact and avoids blocking other messages. **Why u8 per pixel instead of bitfields?** A u8 per pixel supports up to 256 classes (semantic) or 256 instances -- sufficient for virtually all segmentation models. Bitfield packing would halve memory usage but add bit-shift overhead on every pixel access, which matters when processing millions of pixels per frame. **Why three explicit modes instead of a generic mask?** Semantic, instance, and panoptic segmentation have different downstream processing (class lookup vs instance counting vs combined decoding). Explicit modes let subscribers branch on `is_semantic()` / `is_instance()` / `is_panoptic()` without guessing the encoding convention. **Panoptic encoding convention:** `class_id * 256 + instance_id` in u16 format supports 256 classes with 256 instances each. This matches the COCO panoptic format and avoids the complexity of separate class and instance buffers. 
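
The `class_id * 256 + instance_id` convention can be sketched as a pair of pack and unpack functions. This is a standalone illustration; `encode_panoptic` and `decode_panoptic` are hypothetical names and are not taken from the HORUS API.

```rust
/// Hypothetical helper: pack a class ID and an instance ID into one u16 pixel,
/// following the `class_id * 256 + instance_id` convention from the text.
fn encode_panoptic(class_id: u8, instance_id: u8) -> u16 {
    (class_id as u16) * 256 + instance_id as u16
}

/// Hypothetical helper: split a u16 pixel back into (class_id, instance_id).
fn decode_panoptic(encoded: u16) -> (u8, u8) {
    ((encoded / 256) as u8, (encoded % 256) as u8)
}

fn main() {
    // Class 3 with instance 7, i.e. "car #7" if class 3 is CAR as in the constants above.
    let px = encode_panoptic(3, 7);
    let (class_id, instance_id) = decode_panoptic(px);
    println!("encoded = {px}, class = {class_id}, instance = {instance_id}");
}
```

Since both halves are `u8`, every combination fits in a `u16` without overflow, which is why the u16 mask buffer (`data_size_u16()`) is the one to size for panoptic data.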
--- ## See Also - [Perception Messages (Rust)](/rust/api/perception-messages) — Full Rust API - [Perception Messages (Python)](/python/messages/perception) — Python API - [Detection](/stdlib/messages/detection) — Related perception type - [Image](/stdlib/messages/image) — Camera frames that feed segmentation models - [OccupancyGrid](/stdlib/messages/occupancy-grid) — Grid-based maps from segmentation output --- ## BatteryState Path: /stdlib/messages/battery-state Description: Battery monitoring — voltage, current, temperature, cell voltages, and charge status # BatteryState Battery health and charge state for any battery-powered robot. Reports voltage, current draw, temperature, individual cell voltages, and charge percentage. ## When to Use Use `BatteryState` when your robot runs on batteries and you need to monitor power levels, trigger low-battery warnings, or initiate safe shutdown. Essential for mobile robots, drones, and any untethered system. ## ROS2 Equivalent `sensor_msgs/BatteryState` — similar structure (voltage, current, charge, capacity, temperature, cell voltages). 
## Example ## Fields | Field | Type | Unit | Description | |-------|------|------|-------------| | `voltage` | `f32` | V | Total pack voltage | | `current` | `f32` | A | Current draw (negative = discharging) | | `charge` | `f32` | Ah | Remaining charge (`NaN` if unknown) | | `capacity` | `f32` | Ah | Full capacity (`NaN` if unknown) | | `percentage` | `f32` | % | State of charge (0–100) | | `power_supply_status` | `u8` | — | 0=unknown, 1=charging, 2=discharging, 3=full | | `temperature` | `f32` | °C | Pack temperature | | `cell_voltages` | `[f32; 16]` | V | Per-cell voltages (if available) | | `cell_count` | `u8` | — | Number of valid cell readings | | `timestamp_ns` | `u64` | ns | Timestamp | ## Common Patterns ### Low Battery Warning ```rust // simplified fn tick(&mut self) { // IMPORTANT: always recv() every tick if let Some(battery) = self.battery_sub.recv() { if battery.percentage < 20.0 { hlog!(warn, "Low battery: {:.0}% ({:.1}V)", battery.percentage, battery.voltage); } if battery.percentage < 5.0 { // SAFETY: trigger safe shutdown hlog!(error, "Critical battery: {:.0}% — shutting down", battery.percentage); self.cmd_pub.send(CmdVel { linear: 0.0, angular: 0.0, timestamp_ns: 0 }); } } } ``` ## Quick Reference | Method / Field | Type | Description | |----------------|------|-------------| | `voltage` | `f32` | Total pack voltage (V) | | `current` | `f32` | Current draw, negative = discharging (A) | | `percentage` | `f32` | State of charge 0--100 (%) | | `temperature` | `f32` | Pack temperature (°C) | | `power_supply_status` | `u8` | 0=unknown, 1=charging, 2=discharging, 3=full | | `cell_voltages` | `[f32; 16]` | Per-cell voltages, valid up to `cell_count` | | `cell_count` | `u8` | Number of valid cell readings | | `charge` | `f32` | Remaining charge in Ah (`NaN` if unknown) | | `capacity` | `f32` | Full capacity in Ah (`NaN` if unknown) | ## Design Decisions **Why `NaN` for unknown charge/capacity instead of `Option<f32>` or sentinel values?** `BatteryState` is a 
fixed-size Pod type for zero-copy transport. `Option` is not Pod-compatible. IEEE 754 NaN is universally recognized as "not available" and works in both Rust and Python without extra wrapping.

**Why negative current for discharging?** Sign convention matches physics and electrical engineering standards: positive current flows *into* the battery (charging), negative flows *out* (discharging). This lets you compute power with `voltage * current` directly -- negative power means the battery is supplying energy.

**Why 16 cell slots instead of variable-length?** Fixed-size arrays enable zero-copy transport. 16 cells cover common battery configurations: 3S/4S/6S LiPo (drones), 4S-14S (ground robots), and most industrial packs. For larger packs, publish multiple `BatteryState` messages per pack segment.

**Why no built-in state-of-health or cycle count?** Battery health estimation algorithms vary widely by chemistry (LiPo, LiFePO4, NiMH) and require calibration data. HORUS provides the raw measurements; health estimation belongs in a domain-specific node that publishes `DiagnosticStatus`.

## Related Types

- [EmergencyStop](/rust/api/diagnostics-messages) — Triggered by critical battery
- [DiagnosticStatus](/rust/api/diagnostics-messages) — General health reporting
- [ResourceUsage](/rust/api/diagnostics-messages) — CPU, memory, and system stats

---

## See Also

- [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API
- [Diagnostics Messages](/rust/api/diagnostics-messages) — System health reporting
- [SafetyStatus](/rust/api/diagnostics-messages) — System-wide safety state

---

## Navigation Messages

Path: /stdlib/messages/navigation
Description: NavGoal, NavPath, PathPlan, and CostMap — the navigation stack interface

# Navigation Messages

Messages for goal-based navigation: send a target pose, receive a planned path, and follow waypoints.
These form the interface between high-level commands ("go to the kitchen") and low-level control ([CmdVel](/stdlib/messages/cmd-vel)).

## NavGoal — Where to Go

A navigation target with position, orientation, and tolerances.

### Fields

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `target_pose` | `Pose2D` | m, rad | Target position and heading |
| `tolerance_position` | `f64` | m | Acceptable position error |
| `tolerance_angle` | `f64` | rad | Acceptable heading error |
| `timeout_seconds` | `f64` | s | Max time to reach goal (0 = unlimited) |
| `timestamp_ns` | `u64` | ns | Timestamp |

---

## NavPath — Planned Waypoints

A sequence of waypoints computed by a path planner. Each waypoint has position, heading, and speed.

### Key Fields

| Field | Type | Description |
|-------|------|-------------|
| `waypoints` | `[Waypoint; 256]` | Array of waypoints (max 256) |
| `waypoint_count` | `u16` | Number of valid waypoints |
| `total_length` | `f64` | Total path length in meters |
| `estimated_time` | `f64` | Estimated completion time in seconds |

---

## PathPlan — Compact Path Representation

A flat-array path format optimized for zero-copy IPC. Stores waypoints as packed `[x, y, theta]` triples.

### Key Fields

| Field | Type | Description |
|-------|------|-------------|
| `waypoint_data` | `[f32; 768]` | Packed `[x, y, theta]` × 256 waypoints |
| `goal_pose` | `[f32; 3]` | Target `[x, y, theta]` |
| `waypoint_count` | `u16` | Number of valid waypoints |

---

## CostMap — Navigation Cost Grid

An inflated cost grid built on top of an [OccupancyGrid](/stdlib/messages/occupancy-grid). Used by planners to avoid obstacles with safety margins.
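
To make inflation concrete, here is a minimal sketch of how a per-cell cost could fall off with distance from the nearest obstacle. It is not the HORUS implementation: the `inflated_cost` helper and the plain exponential decay are assumptions made for illustration. Only the cost values (0 = free, 253 = lethal) and the two parameter names come from the CostMap field table.

```rust
// Illustrative sketch only; not the HORUS CostMap implementation.
// Cost values follow the CostMap field table: 0 = free, 253 = lethal.
const FREE: u8 = 0;
const LETHAL: u8 = 253;

/// Hypothetical helper: cost of a cell `distance_m` meters from the nearest obstacle.
/// Assumes a plain exponential decay controlled by `cost_scaling_factor`.
fn inflated_cost(distance_m: f32, inflation_radius: f32, cost_scaling_factor: f32) -> u8 {
    if distance_m <= 0.0 {
        return LETHAL; // the cell sits on an obstacle
    }
    if distance_m >= inflation_radius {
        return FREE; // outside the safety margin
    }
    // Decay from just below lethal toward free as the distance grows.
    let decay = (-cost_scaling_factor * distance_m).exp();
    (f32::from(LETHAL - 1) * decay).round() as u8
}
```

Under this assumed formula, a larger `cost_scaling_factor` makes the cost drop off faster, so a planner is willing to pass closer to obstacles; a smaller value keeps paths further away.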

### Key Fields

| Field | Type | Description |
|-------|------|-------------|
| `occupancy_grid` | `OccupancyGrid` | Base grid data |
| `costs` | `Vec<u8>` | Cell costs (0=free, 253=lethal, 255=unknown) |
| `inflation_radius` | `f32` | Obstacle inflation radius in meters |
| `cost_scaling_factor` | `f32` | Exponential cost decay factor |

---

## Navigation Pipeline

```text
User/Planner → NavGoal → Path Planner → NavPath/PathPlan → Path Follower → CmdVel → Drive
                              ↑
                   CostMap (from OccupancyGrid + LiDAR)
```

## Related Types

- [CmdVel](/stdlib/messages/cmd-vel) — Velocity commands (output of path follower)
- [Odometry](/stdlib/messages/odometry) — Robot position feedback
- [OccupancyGrid](/stdlib/messages/occupancy-grid) — 2D grid map
- [Pose2D](/stdlib/messages/pose) — Position representation

## Design Decisions

**Why both NavPath and PathPlan?** `NavPath` carries rich waypoints (position, heading, speed, curvature) for sophisticated path followers. `PathPlan` uses a flat `[f32; 768]` array for maximum zero-copy efficiency when you only need `(x, y, theta)` triples. Use `NavPath` for full-featured navigation; use `PathPlan` for high-frequency replanning where minimal overhead matters.

**Why 256 max waypoints?** Fixed-size arrays enable zero-copy shared memory transport. 256 waypoints at 5cm spacing covers 12.8m of path -- enough for most local planning horizons. Global planners that need longer paths can publish successive segments or use NavGoal for high-level goals.

**Why tolerances on NavGoal instead of exact target matching?** Real robots can't reach exact positions due to sensor noise, wheel slip, and control loop granularity. Position and angle tolerances let the path follower declare "goal reached" when the robot is close enough, preventing endless oscillation around the target.

**Why timeout on NavGoal?** A stuck robot should not attempt a goal forever.
The timeout provides a safety bound -- if the robot can't reach the goal within the specified time, the navigation stack can abort and report failure via `GoalResult`, letting higher-level logic decide what to do next (retry, pick a different goal, or request human help).

---

## See Also

- [Navigation Messages (Rust)](/rust/api/navigation-messages) — Full Rust API
- [Navigation Messages (Python)](/python/messages/navigation) — Python API
- [OccupancyGrid](/stdlib/messages/occupancy-grid) — Map representation
- [CmdVel](/stdlib/messages/cmd-vel) — Velocity commands from path follower
- [Odometry](/stdlib/messages/odometry) — Robot position feedback for navigation

---

## JointState / JointCommand

Path: /stdlib/messages/joint-state
Description: Multi-joint feedback and control for robot arms, grippers, and articulated mechanisms

# JointState / JointCommand

Messages for multi-joint robots: manipulator arms, grippers, legged robots, and any system with named revolute or prismatic joints. `JointState` reports current positions/velocities/efforts; `JointCommand` sends target positions/velocities.

## When to Use

Use `JointState` to publish feedback from joint encoders (position, velocity, effort). Use `JointCommand` to send target positions or velocities to a servo controller. Supports up to 16 joints per message.

## ROS2 Equivalent

`sensor_msgs/JointState` and `trajectory_msgs/JointTrajectoryPoint` — similar structure.

## JointState — Feedback

### Fields

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `names` | `[[u8; 32]; 16]` | -- | Joint names (null-terminated strings) |
| `joint_count` | `u8` | -- | Number of active joints (max 16) |
| `positions` | `[f64; 16]` | rad / m | Position: radians (revolute) or meters (prismatic) |
| `velocities` | `[f64; 16]` | rad/s / m/s | Velocity |
| `efforts` | `[f64; 16]` | Nm / N | Torque (revolute) or force (prismatic) |
| `timestamp_ns` | `u64` | ns | Timestamp |

---

## JointCommand — Control

### Fields

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `joint_names` | `[[u8; 32]; 16]` | -- | Joint names (null-terminated strings) |
| `joint_count` | `u8` | -- | Number of active joints (max 16) |
| `positions` | `[f64; 16]` | rad / m | Target positions |
| `velocities` | `[f64; 16]` | rad/s / m/s | Velocity limits or targets |
| `efforts` | `[f64; 16]` | Nm / N | Torque/force limits or targets |
| `timestamp_ns` | `u64` | ns | Timestamp |

---

## Common Patterns

### Arm Controller Node

```rust
// simplified
fn tick(&mut self) {
    // IMPORTANT: always recv() every tick
    if let Some(cmd) = self.cmd_sub.recv() {
        for i in 0..cmd.joint_count as usize {
            // SAFETY: clamp to joint limits
            let pos = cmd.positions[i].clamp(self.min_limits[i], self.max_limits[i]);
            self.write_servo(i, pos);
        }
    }

    // Read encoder feedback
    let mut state = JointState::default();
    state.joint_count = self.num_joints;
    for i in 0..self.num_joints as usize {
        state.positions[i] = self.read_encoder(i);
    }
    self.state_pub.send(state);
}

fn shutdown(&mut self) -> Result<()> {
    // SAFETY: move all joints to home position
    for i in 0..self.num_joints as usize {
        self.write_servo(i, 0.0);
    }
    Ok(())
}
```

## Quick Reference

| Type | Direction | Key Fields | Use Case |
|------|-----------|------------|----------|
| `JointState` | Feedback (sensor to controller) | `positions`, `velocities`, `efforts` | Report current joint values from encoders
| `JointCommand` | Control (controller to actuator) | `positions`, `velocities`, `efforts` | Send target positions/velocities to servos |

## Design Decisions

**Why fixed-size arrays of 16 joints instead of variable-length?** Fixed-size arrays enable zero-copy shared memory transport with no heap allocation. 16 joints cover common configurations: 6-DOF arms (6), humanoid arms (7), and dual-arm setups (14). Larger robots do not fit in one message: an 18-joint hexapod (3 x 6 legs), for example, is split across two messages. For any robot with more than 16 joints, publish separate messages per limb/chain.

**Why `[u8; 32]` for joint names instead of strings?** Same rationale as Detection -- fixed-size Pod layout for zero-copy. 31 characters plus the null terminator are enough for descriptive names like `"left_shoulder_pitch"` or `"gripper_finger_left"`.

**Why positions/velocities/efforts as parallel arrays instead of per-joint structs?** Parallel arrays match how servo drivers and motor controllers work: you read all positions at once from a bus scan, write all targets in one command. This layout also maps directly to NumPy arrays for Python ML pipelines.

**Why both position and velocity in JointCommand?** Different control modes need different fields. Position control uses `positions` (servo moves to target). Velocity control uses `velocities` (motor spins at target speed). Impedance/force control uses `efforts`. Publishing all three lets the receiving controller pick the appropriate mode without separate message types.
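
The fixed-size name slots can be filled and read back with a few lines of code. The sketch below is standalone Rust and does not use any HORUS helper: `encode_name` and `decode_name` are hypothetical names, and the only facts taken from this page are the 32-byte slot and the null terminator.

```rust
// Illustrative sketch; HORUS may ship its own helpers for this.
const NAME_LEN: usize = 32;

/// Hypothetical helper: pack a joint name into a null-terminated 32-byte slot.
/// Returns `None` if the name needs more than 31 bytes or contains a NUL byte.
fn encode_name(name: &str) -> Option<[u8; NAME_LEN]> {
    let bytes = name.as_bytes();
    if bytes.len() >= NAME_LEN || bytes.contains(&0) {
        return None;
    }
    let mut slot = [0u8; NAME_LEN]; // unused bytes stay zero, so the name is terminated
    slot[..bytes.len()].copy_from_slice(bytes);
    Some(slot)
}

/// Hypothetical helper: read a name back, stopping at the first NUL byte.
/// Falls back to an empty string if the bytes are not valid UTF-8.
fn decode_name(slot: &[u8; NAME_LEN]) -> &str {
    let end = slot.iter().position(|&b| b == 0).unwrap_or(NAME_LEN);
    std::str::from_utf8(&slot[..end]).unwrap_or("")
}
```

Rejecting names longer than 31 bytes at encode time, rather than truncating them silently, keeps two long joint names from collapsing into the same slot contents.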

## Related Types

- [ServoCommand](/rust/api/control-messages) — Single servo control
- [TrajectoryPoint](/rust/api/control-messages) — Timed trajectory waypoint
- [Servo Controller](/recipes/servo-controller) — Complete servo bus recipe

---

## See Also

- [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API
- [Servo Controller Recipe](/recipes/servo-controller) — Multi-servo control
- [BatteryState](/stdlib/messages/battery-state) — Power monitoring for actuated systems

---

## AudioFrame

Path: /stdlib/messages/audio-frame
Description: Audio data from microphones for speech recognition, anomaly detection, and human-robot interaction

# AudioFrame

Audio data from a microphone or audio source. Fixed-size Pod type for zero-copy shared memory transport. Supports mono, stereo, and multi-channel microphone arrays.

## When to Use

Use `AudioFrame` when your robot has microphones and needs to share audio between nodes -- for example, between a microphone driver node, a speech recognition node, and an anomaly detection node.

Common use cases:

- **Voice commands** -- speech-to-text for human-robot interaction
- **Anomaly detection** -- motor fault detection by sound
- **Acoustic SLAM** -- using sound for localization
- **Teleoperation** -- two-way audio between operator and robot

## ROS2 Equivalent

`audio_common_msgs/AudioData` -- similar concept, but HORUS uses a fixed-size Pod buffer for zero-copy SHM instead of variable-length serialized bytes.

## Quick Start

## Constructors

## Fields

| Field | Type | Unit | Description |
|-------|------|------|-------------|
| `samples` | `[f32; 4800]` | -- | Audio sample buffer (Rust), `list[float]` (Python -- only valid samples).
Range: [-1.0, 1.0] (F32) |
| `num_samples` | `u32` | -- | Number of valid samples in buffer |
| `sample_rate` | `u32` | Hz | Sample rate (8000, 16000, 44100, 48000) |
| `channels` | `u8` | -- | Channel count (1=mono, 2=stereo, N=mic array) |
| `encoding` | `u8` | -- | Audio encoding (0=F32, 1=I16) |
| `timestamp_ns` | `u64` | ns | Capture timestamp in nanoseconds |
| `frame_id` | `[u8; 32]` | -- | Source identifier (e.g. "mic_left") |

## Computed Properties

| Property | Type | Description |
|----------|------|-------------|
| `duration_ms()` | `f64` | Duration of this audio chunk in milliseconds |
| `frame_count()` | `u32` | Number of audio frames (samples per channel) |
| `valid_samples()` | `&[f32]` | Slice of only the valid samples (Rust) |

## Buffer Size

`MAX_AUDIO_SAMPLES = 4800` -- enough for 48kHz at 100ms chunks. For common configurations:

| Sample Rate | Chunk Duration | Samples Needed | Fits? |
|-------------|---------------|----------------|-------|
| 8kHz | 100ms | 800 | Yes |
| 16kHz | 20ms | 320 | Yes |
| 16kHz | 100ms | 1600 | Yes |
| 44.1kHz | 20ms | 882 | Yes |
| 48kHz | 100ms | 4800 | Yes (max) |
| 48kHz stereo | 50ms | 4800 | Yes (max) |

For longer chunks, send multiple frames.

## Multi-Channel Audio

For microphone arrays, samples are **interleaved**: channel 0 sample 0, channel 1 sample 0, channel 0 sample 1, channel 1 sample 1, etc.

```rust
// simplified
// 4-channel mic array, 16kHz, 10ms chunk = 640 samples
let samples = capture_4ch_audio(); // [ch0_s0, ch1_s0, ch2_s0, ch3_s0, ch0_s1, ...]
let frame = AudioFrame::multi_channel(16000, 4, &samples);
assert_eq!(frame.frame_count(), 160); // 640 / 4 channels
```

## AudioEncoding

The encoding format for audio samples in the buffer.

| Variant | Value | Description |
|---------|-------|-------------|
| `F32` | 0 | 32-bit float, range [-1.0, 1.0] (normalized) |
| `I16` | 1 | 16-bit signed integer, range [-32768, 32767] (PCM) |

```rust
// simplified
use horus::prelude::*;

// Float encoding (default, best for processing)
let frame = AudioFrame::mono(16000, &float_samples);
assert_eq!(frame.encoding, AudioEncoding::F32 as u8);

// Integer encoding (common for hardware capture)
let mut frame = AudioFrame::default();
frame.encoding = AudioEncoding::I16 as u8;
```

## Wire Format

`AudioFrame` is a fixed-size Pod type (~19.2 KB). It uses the same zero-copy SHM transport as all other Pod messages -- no serialization overhead.

```
[f32 x 4800] samples      = 19200 bytes
[u32]        num_samples  = 4 bytes
[u32]        sample_rate  = 4 bytes
[u8]         channels     = 1 byte
[u8]         encoding     = 1 byte
[u8 x 2]     padding      = 2 bytes
[u64]        timestamp_ns = 8 bytes
[u8 x 32]    frame_id     = 32 bytes
Total                     = 19252 bytes
```

## Design Decisions

**Why fixed-size `[f32; 4800]` instead of variable-length?** Fixed-size enables zero-copy Pod transport with no heap allocation. 4800 samples fits the largest common configuration (48kHz mono at 100ms) and smaller configurations use only a portion of the buffer, with `num_samples` tracking the valid range. The ~19KB overhead per message is acceptable given the transport speed advantage.

**Why F32 as the default encoding instead of I16 PCM?** Float-normalized audio ([-1.0, 1.0]) is the standard input format for speech recognition models (Whisper, Wav2Vec2), anomaly detection, and audio ML in general. This avoids a normalization step in every consumer node. For hardware that captures I16 PCM, convert once at the driver level.

**Why interleaved multi-channel instead of planar?** Interleaved layout (`[ch0_s0, ch1_s0, ch0_s1, ch1_s1, ...]`) matches how audio hardware and ALSA/PulseAudio deliver data. This avoids a deinterleave step in the driver node.
ML models that need planar audio can reshape via NumPy: `arr.reshape(-1, channels).T`.

**Why 100ms max chunk duration?** Audio processing in robotics needs low latency for reactive behavior (voice commands, anomaly detection). 100ms chunks balance processing efficiency (enough samples for FFT) with responsiveness. For streaming speech recognition, 20ms chunks at 16kHz (320 samples) are typical.

**AudioFrame vs Image for spectrograms:** Use `AudioFrame` for raw time-domain audio. If your pipeline computes spectrograms or mel-frequency features, publish the result as an `Image` (Mono32F encoding) -- this lets downstream ML nodes use the standard Image zero-copy path.

---

## See Also

- [Sensor Messages (Rust)](/rust/api/sensor-messages) — Full Rust API
- [Input Messages (Python)](/python/messages/input) — Python audio/input API
- [Image](/stdlib/messages/image) — For spectrogram/mel-feature transport
- [Async Nodes (Python)](/python/api/async-nodes) — Non-blocking audio streaming

---

## Message Types

Path: /stdlib/messages
Description: 55+ standard robotics message types with ROS2 equivalence mapping

# Message Types

HORUS provides **60+ typed messages** covering every common robotics domain. All types are available in both Rust (`use horus::prelude::*;`) and Python (`from horus import TypeName`).

## Coming from ROS2?

| ROS2 Package | ROS2 Message | HORUS Equivalent |
|-------------|-------------|-----------------|
| `geometry_msgs` | Twist, Pose, Pose2D, TransformStamped, Vector3, Quaternion | Twist, Pose3D, Pose2D, TransformStamped, Vector3, Quaternion |
| `sensor_msgs` | Imu, LaserScan, Image, PointCloud2, JointState, BatteryState, CameraInfo | Imu, LaserScan, Image, PointCloud, JointState, BatteryState, CameraInfo |
| `nav_msgs` | Odometry, OccupancyGrid, Path | Odometry, OccupancyGrid, NavPath + Waypoint |
| `vision_msgs` | Detection2D, Detection3D | Detection, Detection3D |
| `audio_common_msgs` | AudioData | AudioFrame |
| `std_msgs` | Header | *(embedded — timestamp_ns and frame_id are fields on each message)* |

**Key difference from ROS2:** No separate `Header` message. Every HORUS message has `timestamp_ns` and `frame_id` as direct fields.

## Message Categories

| Category | Types | Use Case |
|----------|-------|----------|
| **[Geometry](/rust/api/geometry-messages)** | Pose2D, Pose3D, Twist, Vector3, Point3, Quaternion, TransformStamped, Accel | Position, orientation, motion |
| **[Sensors](/rust/api/sensor-messages)** | Imu, LaserScan, Odometry, JointState, BatteryState, Range, Temperature, MagneticField | Sensor data from hardware |
| **[Control](/rust/api/control-messages)** | CmdVel, MotorCommand, ServoCommand, JointCommand, PidState | Motor and actuator commands |
| **[Navigation](/rust/api/navigation-messages)** | NavGoal, GoalResult, Waypoint, NavPath, OccupancyGrid, CostMap, VelocityObstacle | Path planning and mapping |
| **[Perception](/rust/api/perception-messages)** | Detection, Detection3D, TrackedObject, SegmentationMask, LandmarkArray, PlaneDetection | Computer vision and ML output |
| **[Vision](/rust/api/vision-messages)** | CompressedImage, CameraInfo, RegionOfInterest, StereoInfo | Camera configuration and compressed data |
| **[Force/Haptics](/rust/api/force-messages)** | WrenchStamped, ForceCommand, ContactInfo, ImpedanceParameters,
HapticFeedback | Force sensing and control |
| **[Diagnostics](/rust/api/diagnostics-messages)** | Heartbeat, DiagnosticStatus, NodeHeartbeat, SafetyStatus, EmergencyStop | System health monitoring |
| **[Audio](/stdlib/messages/audio-frame)** | AudioFrame | Microphone data, speech, anomaly detection |
| **[Input](/rust/api/input-messages)** | JoystickInput, KeyboardInput | Human input devices |

## Custom Messages

Need a type that doesn't exist? Create your own:

```rust
// simplified
use horus::prelude::*;

message! {
    MotorStatus {
        rpm: f32,
        current_amps: f32,
        temperature_c: f32,
        fault_code: u32,
    }
}

// Now use it like any standard message
let topic: Topic<MotorStatus> = Topic::new("motor.status")?;
```

## Design Decisions

**Why fixed-size Pod types for most messages?** Fixed-size types (`#[repr(C)]`, `Copy`, `Pod`) can be placed directly in shared memory ring buffers with no serialization, allocation, or copying. This gives deterministic sub-microsecond latency. Variable-length types (Image, PointCloud) use a descriptor + pool pattern -- the descriptor is fixed-size and travels through the ring buffer, while bulk data lives in a separate memory pool.

**Why no Header message like ROS2?** ROS2's `std_msgs/Header` adds an indirection layer and allocation for every message. HORUS embeds `timestamp_ns` and `frame_id` directly as fields on each message type, eliminating the extra allocation and keeping messages flat and Pod-compatible.

**Why categories instead of a flat namespace?** Grouping by domain (geometry, sensor, control, navigation, perception, etc.) helps users discover the right type. The Rust `use horus::prelude::*` and Python `from horus import X` still provide flat access -- the categories are organizational, not API barriers.
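
To see what "fixed-size and flat" means in practice, the sketch below hand-writes a Pod-style struct in plain Rust. It is an illustration under assumptions: `MotorStatusPod`, `SLOT_BYTES`, and `publish_into` are hypothetical names, the field order is chosen here, and the real output of the `message!` macro may differ.

```rust
use std::mem::size_of;

// Hand-written stand-in for a Pod-style message (assumption: not the
// actual `message!` expansion). Every field has a fixed size, so the
// whole struct does too.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct MotorStatusPod {
    timestamp_ns: u64,  // embedded timestamp instead of a separate Header
    rpm: f32,
    current_amps: f32,
    temperature_c: f32,
    fault_code: u32,
    frame_id: [u8; 32], // fixed-size, null-terminated identifier
}

/// The size is known at compile time, so a buffer slot can be sized up front.
const SLOT_BYTES: usize = size_of::<MotorStatusPod>();

/// Hypothetical helper: because the type is `Copy`, writing a message into a
/// slot is a fixed-size memory copy with no serialization or allocation.
fn publish_into(slot: &mut MotorStatusPod, msg: &MotorStatusPod) {
    *slot = *msg;
}
```

With this field order the struct needs no padding: 8 bytes of timestamp, 16 bytes of `f32`/`u32` fields, and 32 bytes of `frame_id` add up to 56 bytes.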

---

## See Also

- [Message Types Concept](/concepts/message-types) — How messages work in HORUS
- [Custom Messages Tutorial](/tutorials/04-custom-messages) — Step-by-step guide
- [Python Message Library](/python/library/python-message-library) — Python equivalents

========================================
# SECTION: Plugins
========================================

---

## Creating CLI Plugins

Path: /plugins/creating-plugins
Description: Build custom CLI plugins that extend the horus command

# Creating CLI Plugins

A CLI plugin is a standalone binary that adds a subcommand to `horus`. When a user runs `horus mycommand`, HORUS discovers your plugin binary and executes it, passing through all arguments.

## Problem Statement

You need to extend the `horus` CLI with a custom command for your team or project -- for example, a simulator launcher, a hardware calibration tool, or a deployment helper.

## When To Use

- You want to add a new `horus <subcommand>` that integrates seamlessly with the CLI
- You need to distribute a tool that other HORUS users can install from the registry
- You have project-specific tooling that should feel native to the HORUS workflow

## Prerequisites

- Rust toolchain installed (`rustup`, `cargo`)
- HORUS CLI installed ([Installation guide](/getting-started/installation))
- Basic familiarity with Rust and [clap](https://docs.rs/clap) for argument parsing

---

## Zero-Config Convention

Any Rust package named `horus-*` with a `[[bin]]` target is automatically detected as a plugin. No extra configuration required.

**Example:** A package named `horus-sim3d` with `[[bin]] name = "sim3d"` automatically provides the `horus sim3d` command.

## Step-by-Step Guide

### Step 1: Create a New Rust Project

```bash
cargo new horus-mycommand
cd horus-mycommand
```

### Step 2: Set Up Cargo.toml

```toml
[package]
name = "horus-mycommand"
version = "0.1.0"
edition = "2021"
description = "My custom HORUS plugin"

[[bin]]
name = "mycommand"
path = "src/main.rs"

[dependencies]
clap = { version = "4", features = ["derive"] }
```

The key points:

- Package name starts with `horus-`
- The `[[bin]]` name defines the subcommand (users will run `horus mycommand`)
- Use `clap` or similar for argument parsing

### Step 3: Implement the Plugin

```rust
// src/main.rs
use clap::Parser;

#[derive(Parser)]
#[command(name = "mycommand", about = "My custom HORUS command")]
struct Cli {
    /// Target to operate on
    #[arg(short, long)]
    target: Option<String>,

    /// Enable verbose output
    #[arg(short, long)]
    verbose: bool,
}

fn main() {
    let cli = Cli::parse();

    // Check if running as a HORUS plugin
    if std::env::var("HORUS_PLUGIN").is_ok() {
        let horus_version = std::env::var("HORUS_VERSION")
            .unwrap_or_else(|_| "unknown".to_string());
        if cli.verbose {
            eprintln!("Running as HORUS plugin (HORUS v{})", horus_version);
        }
    }

    // Your plugin logic here
    match cli.target {
        Some(target) => println!("Operating on: {}", target),
        None => println!("No target specified. Use --help for usage."),
    }
}
```

### Step 4: Build and Test Locally

```bash
# Build the plugin
cargo build --release

# Test it standalone
./target/release/mycommand --help

# Test it as a HORUS plugin (simulating the environment)
HORUS_PLUGIN=1 HORUS_VERSION=0.1.0 ./target/release/mycommand --target foo
```

### Step 5: Install Locally

To test with the actual `horus` CLI, install the binary where HORUS can find it:

```bash
# Option A: Copy to global plugin bin directory
mkdir -p ~/.horus/bin
cp target/release/mycommand ~/.horus/bin/horus-mycommand

# Option B: Install via horus from a local path
horus install --plugin horus-mycommand --local
```

Now `horus mycommand --help` should work.
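
Step 5 relies on a naming rule: the subcommand `mycommand` is looked up as a binary called `horus-mycommand`. The sketch below shows that mapping for the project and global bin directories named on this page. It is an illustration, not the HORUS resolver: `candidate_paths` and its `home` parameter are hypothetical, and the lock files and `PATH` lookup are left out.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: places where the binary for `horus <subcommand>`
/// could live, project directory first, then the global one under `home`.
fn candidate_paths(subcommand: &str, home: &Path) -> Vec<PathBuf> {
    // The installed binary carries the `horus-` prefix; the subcommand does not.
    let binary = format!("horus-{subcommand}");
    vec![
        Path::new(".horus/bin").join(&binary),
        home.join(".horus/bin").join(&binary),
    ]
}
```

A resolver built on this idea would run the first candidate that exists on disk, which is why copying the binary to `~/.horus/bin/mycommand` without the prefix would not be found.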

## Environment Variables

HORUS sets these environment variables when executing plugins:

| Variable | Value | Description |
|----------|-------|-------------|
| `HORUS_PLUGIN` | `1` | Always set when running as a plugin |
| `HORUS_VERSION` | e.g., `0.1.7` | Version of the HORUS CLI |

Your plugin inherits the user's `stdin`, `stdout`, and `stderr`, so interactive prompts and colored output work normally.

## Plugin Discovery

HORUS discovers plugin binaries in this order:

1. **Project `plugins.lock`** — `.horus/plugins.lock` in the current project
2. **Global `plugins.lock`** — `~/.horus/plugins.lock`
3. **Project bin directory** — `.horus/bin/horus-*`
4. **Global bin directory** — `~/.horus/bin/horus-*`
5. **System PATH** — Any `horus-*` binary in `$PATH`

The first match wins. This means project-level plugins always override global ones.

## Plugin Detection

HORUS discovers plugins by scanning for packages named `horus-*`. The discovery system reads your `Cargo.toml` `[package]` section (name, version, description) and auto-detects the plugin category from the name (e.g., `horus-realsense` → Camera, `horus-rplidar` → LiDAR, `horus-sim3d` → Simulation).

## Security

When a plugin is installed through the registry, HORUS records a SHA-256 checksum of the binary.
Before each execution, the checksum is verified:

- If the binary has been modified, HORUS refuses to run it
- Run `horus verify` to check all plugin integrity
- Reinstall a plugin with `horus install --plugin <name>` if verification fails

## Example: A Complete Plugin

Here is a minimal but complete plugin that queries HORUS topic statistics:

```rust
use clap::Parser;
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "topic-stats", about = "Show topic statistics summary")]
struct Cli {
    /// Output as JSON
    #[arg(long)]
    json: bool,

    /// Path to shared memory directory
    #[arg(long, default_value = "/dev/shm/horus")]
    shm_dir: PathBuf,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();

    if !cli.shm_dir.exists() {
        eprintln!("No HORUS topics found at {}", cli.shm_dir.display());
        std::process::exit(1);
    }

    let mut topics = Vec::new();
    for entry in std::fs::read_dir(&cli.shm_dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            let name = entry.file_name().to_string_lossy().to_string();
            let size = entry.metadata()?.len();
            topics.push((name, size));
        }
    }

    if cli.json {
        println!("{}", serde_json::to_string_pretty(&topics)?);
    } else {
        println!("Found {} topics:", topics.len());
        for (name, size) in &topics {
            println!("  {} ({} bytes)", name, size);
        }
    }

    Ok(())
}
```

**Cargo.toml:**

```toml
[package]
name = "horus-topic-stats"
version = "0.1.0"
edition = "2021"
description = "Show HORUS topic statistics"

[[bin]]
name = "topic-stats"
path = "src/main.rs"

[dependencies]
clap = { version = "4", features = ["derive"] }
serde_json = "1"
```

After building and installing, users run:

```bash
horus topic-stats
horus topic-stats --json
```

## Common Errors

### "no such command: `mycommand`"

The plugin binary is not in a location HORUS searches. Ensure the binary is named `horus-mycommand` and is in `~/.horus/bin/`, `.horus/bin/`, or your system `PATH`.

### Plugin runs standalone but not via `horus`

Check that the `[[bin]]` name in `Cargo.toml` matches the subcommand you expect.
The binary name `mycommand` maps to `horus mycommand`. Also verify the installed binary is named `horus-mycommand` (with the `horus-` prefix).

### "Checksum mismatch" on execution

The plugin binary was modified after installation. Reinstall with `horus install --plugin horus-mycommand` to update the recorded checksum.

---

## Next Steps

- **[Publishing Plugins](/package-management/package-management#publishing-packages)** — Publish your plugin to the HORUS registry
- **[Managing Plugins](/plugins/managing-plugins)** — Install, enable/disable, and verify plugins

---

## See Also

- [Managing Plugins](/plugins/managing-plugins) — Install and list plugins
- [CLI Reference](/development/cli-reference) — How plugins extend the CLI

---

## Managing Plugins

Path: /plugins/managing-plugins
Description: Install, remove, enable/disable, and verify HORUS plugins

# Managing Plugins

HORUS provides CLI commands for managing plugins throughout their lifecycle.

## Problem Statement

You need to install, update, disable, or remove CLI plugins across your development workflow -- whether globally for all projects or locally for a specific one.

## When To Use

- Installing community or vendor plugins from the registry
- Managing plugin versions across multiple projects
- Temporarily disabling a plugin without uninstalling it
- Verifying plugin integrity after system updates

## Prerequisites

- HORUS CLI installed ([Installation guide](/getting-started/installation))
- Network access to the HORUS registry (for remote installs)

---

## Installing Plugins

Plugins are installed through the `horus install --plugin` command. They default to **global** installation since they extend the CLI tool itself.

```bash
# Install a plugin globally (default)
horus install --plugin horus-sim3d

# Install a specific version
horus install --plugin horus-sim3d -v 1.2.0

# Install locally to current project only
horus install --plugin horus-sim3d --local
```

**What happens during installation:**

1.
   Downloads the package from the registry
2. Detects plugin from the `horus-*` naming convention
3. Registers the plugin binary in `plugins.lock`
4. Creates symlinks in the bin directory
5. Updates project configuration (for local installs)

### Global vs Local Installation

| Aspect | Global | Local |
|--------|--------|-------|
| **Location** | `~/.horus/cache/` | `.horus/packages/` |
| **Scope** | All projects | Current project only |
| **Lock file** | `~/.horus/plugins.lock` | `.horus/plugins.lock` |
| **Default for plugins** | Yes | No (use `--local`) |
| **Override behavior** | — | Overrides global |

Local plugins always take priority over global plugins with the same command name. This lets you pin a specific plugin version for a project without affecting other projects.

## Listing Plugins

```bash
# List installed plugins
horus list

# List all plugins including disabled
horus list --all
```

## Searching for Plugins

```bash
# Search for plugins by keyword
horus search camera

# Show all available plugins from registry
horus search

# Include local development plugins
horus search --local

# Show detailed info about a specific plugin
horus info horus-rplidar
```

## Removing Plugins

```bash
# Remove a plugin
horus remove horus-sim3d

# Remove from global scope explicitly
horus remove horus-sim3d --global
```

## Enabling and Disabling Plugins

You can temporarily disable a plugin without uninstalling it:

```bash
# Disable a plugin
horus disable sim3d

# Disable with a reason
horus disable sim3d --reason "Conflicts with sim2d"

# Re-enable it
horus enable sim3d
```

Disabled plugins remain installed but HORUS will not execute them.

## Verifying Plugin Integrity

HORUS records SHA-256 checksums when plugins are installed. Verify that no binaries have been tampered with:

```bash
# Verify all plugins
horus verify

# Verify a specific plugin
horus verify sim3d
```

## The plugins.lock File

Plugin registrations are stored in `plugins.lock` (JSON format).
You typically don't need to edit this file directly:

```json
{
  "schema_version": "1.0",
  "scope": "global",
  "horus_version": "0.1.9",
  "updated_at": "2025-10-15T10:30:00Z",
  "plugins": {
    "sim3d": {
      "package": "horus-sim3d",
      "version": "1.0.0",
      "source": { "type": "registry" },
      "binary": "/home/user/.horus/cache/horus-sim3d@1.0.0/bin/sim3d",
      "checksum": "sha256:abc123...",
      "installed_at": "2025-10-15T10:30:00Z",
      "installed_by": "0.1.9",
      "compatibility": {
        "horus_min": "0.1.0",
        "horus_max": "2.0.0"
      },
      "commands": [
        { "name": "sim3d", "description": "HORUS 3D robot simulator" }
      ]
    }
  },
  "disabled": {},
  "inherit_global": true
}
```

**File locations:**

- Global: `~/.horus/plugins.lock`
- Project: `.horus/plugins.lock`

## Using Plugins via `horus install`

The `horus install` command provides smart auto-detection. It recognizes plugins automatically:

```bash
# Auto-detects as a plugin and installs globally
horus install horus-sim3d

# Force install as plugin (if auto-detection fails)
horus install my-tool --plugin
```

## Troubleshooting

### Plugin Not Found

```
error: no such command: `mycommand`
```

**Solutions:**

```bash
# Check if plugin is installed
horus list

# Reinstall
horus install --plugin horus-mycommand

# Check if binary exists in PATH
which horus-mycommand
```

### Checksum Mismatch

The binary was modified after installation. Reinstall:

```bash
horus install --plugin horus-mycommand
```

### Plugin Binary Not Found

The package is registered but the binary is missing. Reinstall:

```bash
horus remove horus-mycommand
horus install --plugin horus-mycommand
```

## Common Errors

### "Registry unavailable" during install

The HORUS registry is unreachable. Check your network connection, or use `horus doctor` to diagnose connectivity. For offline development, use local installs with `--local`.

### Plugin version conflicts between projects

Use local plugin installs (`--local`) to pin a specific version per project.
Local plugins always override global ones with the same command name. --- ## Next Steps - **[Creating Plugins](/plugins/creating-plugins)** — Build your own plugin - **[Package Management](/package-management/package-management)** — General package management --- ## See Also - [Creating Plugins](/plugins/creating-plugins) — Build your own plugin - [Package Management](/package-management/package-management) — Package registry --- ## Plugins Path: /plugins Description: Extend the HORUS CLI with custom subcommands # Plugins HORUS supports **CLI plugins** — standalone binaries that add new subcommands to the `horus` command. When you run `horus sim3d`, HORUS discovers and executes the `sim3d` plugin binary, passing through all arguments. --- ## Quick Reference | Task | Command | |------|---------| | Install a plugin | `horus install --plugin horus-sim3d` | | List installed plugins | `horus list` | | Use a plugin | `horus sim3d --robot kuka_iiwa` | | Verify plugin integrity | `horus verify` | | Disable a plugin | `horus disable sim3d` | | Enable a plugin | `horus enable sim3d` | | Remove a plugin | `horus remove horus-sim3d` | --- ## How CLI Plugins Work 1. Any package named `horus-*` with a binary is automatically a plugin 2. Installing it registers the binary with the HORUS plugin system 3. 
Running `horus ` checks plugins before showing "unknown command" ```bash # Install a plugin horus install --plugin horus-sim3d # Use it as a native command horus sim3d --robot kuka_iiwa # List installed plugins horus list ``` ## Plugin Resolution Order | Priority | Location | Scope | |----------|----------|-------| | 1st | `.horus/plugins.lock` (project) | Local project only | | 2nd | `~/.horus/plugins.lock` (global) | All projects | | 3rd | `.horus/bin/horus-*` (project) | Local project only | | 4th | `~/.horus/bin/horus-*` (global) | All projects | | 5th | `PATH` lookup for `horus-*` binaries | System-wide | Project plugins override global ones, so you can pin a specific version per project. ## Security Plugins are verified via SHA-256 checksums recorded at install time. Before execution, HORUS verifies the binary hasn't been modified: ```bash # Verify all installed plugins horus verify ``` ## Next Steps - **[Creating Plugins](/plugins/creating-plugins)** — Build a plugin that adds commands to `horus` - **[Managing Plugins](/plugins/managing-plugins)** — Install, remove, and configure plugins --- ## See Also - [Package Management](/package-management/package-management) — Registry and dependencies - [CLI Reference](/development/cli-reference) — All HORUS commands ======================================== # SECTION: Package Management ======================================== --- ## Using Pre-Built Nodes Path: /package-management/using-prebuilt-nodes Description: The idiomatic way to build HORUS applications with ready-made components # Using Pre-Built Nodes **The HORUS Philosophy:** Don't reinvent the wheel. Use comprehensive, battle-tested nodes from the registry and `horus_library`, then configure them to work together. ## Problem Statement You need to build a robotics application but don't want to write every sensor driver, controller, and algorithm from scratch. 
## When To Use - Starting a new project and looking for ready-made components - Building a system from common robotics building blocks (PID controllers, sensor drivers, planners) - Prototyping quickly before investing in custom implementations ## Prerequisites - HORUS CLI installed ([Installation guide](/getting-started/installation)) - A HORUS project (`horus new mybot`) - Familiarity with [Nodes](/concepts/core-concepts-nodes) and [Topics](/concepts/core-concepts-topic) --- ## Why Use Pre-Built Nodes? **Advantages of pre-built nodes:** - Production-ready and tested - Configure instead of coding - Focus on application logic, not infrastructure - Nodes use standard HORUS interfaces for interoperability ## Quick Example Instead of writing a PID controller from scratch, just install and configure: ```bash # Install from registry horus install pid-controller ``` ```rust use pid_controller::PIDNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Configure the pre-built node let pid = PIDNode::new(1.0, 0.1, 0.01); // kp, ki, kd scheduler.add(pid).order(5).build()?; scheduler.run()?; Ok(()) } ``` That's it! Production-ready PID control in 3 lines. 
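
For readers unfamiliar with what the three gains passed to `PIDNode::new` configure, here is a minimal sketch of a textbook PID update step. It is illustrative Python only, not the `pid-controller` package's implementation; `SimplePID` and `update` are hypothetical names, and a production controller would typically add output clamping and integral anti-windup.

```python
class SimplePID:
    """Textbook PID controller (hypothetical helper, not part of HORUS)."""

    def __init__(self, kp, ki, kd):
        self.kp = kp              # proportional gain: reacts to the current error
        self.ki = ki              # integral gain: reacts to accumulated error
        self.kd = kd              # derivative gain: reacts to the error's rate of change
        self.integral = 0.0
        self.prev_error = None    # no previous error before the first update

    def update(self, setpoint, measured, dt):
        """Return the control output for one time step of length dt (seconds)."""
        error = setpoint - measured
        self.integral += error * dt
        # Skip the derivative term on the first call, when there is no history.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    pid = SimplePID(1.0, 0.1, 0.01)  # same kp, ki, kd as the Rust example
    print(pid.update(setpoint=1.0, measured=0.0, dt=0.01))
```

The pre-built node wraps this kind of logic behind topics and scheduler integration, which is why configuring it is usually preferable to rewriting it.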
--- ## Discovering Pre-Built Nodes ### From the Registry **Web Interface:** ```bash # Visit the registry in your browser https://registry.horusrobotics.dev ``` Browse by category: - **Control** - PID controllers, motion planners - **Perception** - Camera, LIDAR, sensor fusion - **Drivers** - Motor controllers, sensor interfaces - **Safety** - Emergency stop, watchdogs - **Utilities** - Loggers, data recorders **CLI Search:** ```bash # Search for specific functionality horus list sensor horus list controller horus list motor ``` ### From Standard Library The `horus_library` crate includes standard message types used across nodes: ```rust use horus::prelude::*; // Motion messages CmdVel, Twist, Pose2D, Odometry // Sensor messages LaserScan, Imu, BatteryState, PointCloud // Input messages KeyboardInput, JoystickInput // And many more... ``` **Note**: Hardware-interfacing nodes (sensor drivers, motor controllers, etc.) are available as **registry packages** or **Python nodes** -- they are not built into `horus_library`. Search the registry for ready-made nodes. --- ## Installation Patterns ### Installing from HORUS Registry ```bash # Latest version horus install motion-planner # Specific version horus install sensor-fusion@2.1.0 # Multiple packages horus install pid-controller motion-planner sensor-drivers ``` ### Adding crates.io Dependencies ```bash # Add Rust project dependencies (writes to horus.toml) horus add serde horus add tokio@1.35.0 ``` ### Adding PyPI Dependencies ```bash # Add Python project dependencies (writes to horus.toml) horus add numpy horus add opencv-python ``` ### Using Standard Library The standard library is available automatically with `horus run`: ```bash horus run main.rs # horus_library is included by default ``` Or explicitly in your `Cargo.toml`: ```toml [dependencies] horus = { path = "..." } horus_library = { path = "..." } ``` --- ## The Idiomatic Pattern ### 1. 
Discover What You Need **Example Goal:** Build a mobile robot with keyboard control **Required Nodes:** - Input: Keyboard control - Control: Velocity command processing - Output: Motor driver ### 2. Search and Install ```bash # Check what's available horus list keyboard horus list motor # Install what you need horus install keyboard-input horus install differential-drive ``` ### 3. Configure and Compose ```rust use keyboard_input::KeyboardNode; use differential_drive::DiffDriveNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Keyboard input node (order 0 - runs first) let keyboard = KeyboardNode::new("keyboard.input")?; scheduler.add(keyboard).order(0).build()?; // Differential drive controller (order 5) let drive = DiffDriveNode::new( "keyboard.input", // Input topic "motor.left", // Left motor output "motor.right", // Right motor output 0.5 // Wheel separation (meters) )?; scheduler.add(drive).order(5).build()?; scheduler.run()?; Ok(()) } ``` **That's it!** A functional robot in ~20 lines, no custom nodes needed. --- ## Common Workflows ### Mobile Robot Base ```bash # Install components horus install keyboard-input horus install differential-drive horus install emergency-stop ``` ```rust use keyboard_input::KeyboardNode; use differential_drive::DiffDriveNode; use emergency_stop::EStopNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Input scheduler.add(KeyboardNode::new("keyboard")?).order(0).build()?; // Safety (runs first!) scheduler.add(EStopNode::new("estop", "cmd_vel")?).order(0).build()?; // Drive control scheduler.add(DiffDriveNode::new("cmd_vel", "motor.left", "motor.right", 0.5)?) 
.order(1).build()?; scheduler.run()?; Ok(()) } ``` ### Sensor Fusion System ```bash horus install lidar-driver horus install imu-driver horus install kalman-filter ``` ```rust use lidar_driver::LidarNode; use imu_driver::ImuNode; use kalman_filter::EKFNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Sensors (order 2) scheduler.add(LidarNode::new("/dev/ttyUSB0", "scan")?).order(2).build()?; scheduler.add(ImuNode::new("/dev/i2c-1", "imu")?).order(2).build()?; // Fusion (order 3 - runs after sensors) scheduler.add(EKFNode::new("scan", "imu", "pose")?).order(3).build()?; scheduler.run()?; Ok(()) } ``` ### Vision Processing Pipeline ```bash # Install vision packages from registry horus install camera-driver horus install image-processor horus install object-detector ``` ```rust use camera_driver::CameraNode; use image_processor::ImageProcessorNode; use object_detector::ObjectDetectorNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Using registry packages let camera = CameraNode::new("/dev/video0", "camera.raw", 30)?; let processor = ImageProcessorNode::new("camera.raw", "camera.processed")?; let detector = ObjectDetectorNode::new("camera.processed", "objects")?; scheduler.add(camera).order(2).build()?; scheduler.add(processor).order(3).build()?; scheduler.add(detector).order(3).build()?; scheduler.run()?; Ok(()) } ``` --- ## Configuration Best Practices ### Use Builder Patterns Many registry packages support fluent configuration: ```rust // Example: camera-driver package from registry let camera = CameraNode::new("/dev/video0")? 
.with_resolution(1920, 1080) .with_fps(60) .with_format(ImageFormat::RGB8); scheduler.add(camera).order(2).build()?; ``` ### Parameter-Based Configuration Configure nodes via the parameter system: ```rust use horus::prelude::*; // Set parameters via RuntimeParams let params = RuntimeParams::init()?; params.set("motor.max_speed", 2.0)?; params.set("motor.acceleration", 0.5)?; // Node reads from parameters let motor = MotorNode::from_params()?; scheduler.add(motor).order(1).build()?; ``` **Adjust at runtime via monitor!** ### Reproducible Setup Commit `horus.lock` to git to pin all dependency versions. On another machine, `horus build` will install the exact same versions. --- ## Composing Complex Systems ### Pipeline Pattern Chain nodes together via topics: Sensor → Filter → Controller → Actuator (each arrow is a topic). ```rust // Each node subscribes to previous, publishes to next scheduler.add(sensor).order(2).build()?; // Publishes "raw" scheduler.add(filter).order(3).build()?; // Subscribes "raw", publishes "filtered" scheduler.add(controller).order(4).build()?; // Subscribes "filtered", publishes "cmd" scheduler.add(actuator).order(5).build()?; // Subscribes "cmd" ``` ### Parallel Processing Multiple nodes at the same priority (same `order` value) run concurrently: ```rust // All run in parallel (order 2) scheduler.add(lidar).order(2).build()?; scheduler.add(camera).order(2).build()?; scheduler.add(imu).order(2).build()?; ``` ### Safety Layering Critical nodes run first: ```rust // Order 0 - Safety checks (runs first) scheduler.add(watchdog).order(0).build()?; scheduler.add(estop).order(0).build()?; // Order 1 - Control scheduler.add(controller).order(1).build()?; // Order 2 - Sensors scheduler.add(lidar).order(2).build()?; // Order 4 - Logging (runs last) scheduler.add(logger).order(4).build()?; ``` --- ## When to Build Custom 
Nodes **Use pre-built nodes when:** - Functionality exists in the registry or `horus_library` - Node can be configured to your needs - Performance is acceptable **Build custom nodes when:** - No existing node matches your hardware - Unique algorithm or business logic - Extreme performance requirements **Pro tip:** Even then, consider: 1. Starting with a similar pre-built node 2. Forking and modifying it 3. Publishing your improved version back to the registry --- ## Finding the Right Node ### By Use Case **I need to...** - Control a motor → `motor-driver`, `differential-drive`, `servo-controller` - Read a sensor → `lidar-driver`, `camera-node`, `imu-driver` - Process data → `kalman-filter`, `pid-controller`, `image-processor` - Handle safety → `emergency-stop`, `safety-monitor`, `watchdog` - Log data → `data-logger`, `rosbag-writer`, `csv-logger` ### By Hardware ```bash # Search by device type horus list lidar horus list camera horus list imu ``` ### By Category Browse registry by category: - **control** - Motion control, PID, path following - **perception** - Sensors, computer vision, SLAM - **planning** - Path planning, motion planning - **drivers** - Hardware interfaces - **safety** - Safety systems, fault tolerance - **utils** - Logging, visualization, debugging --- ## Package Quality Indicators When choosing packages, look for: **High Download Count** ``` Downloads: 5,234 (last 30 days) ``` **Recent Updates** ``` Last updated: 2025-09-28 ``` **Good Documentation** ``` Documentation: 98% coverage ``` **Active Maintenance** ``` Issues: 2 open, 45 closed (96% resolution rate) ``` --- ## Complete Example: Autonomous Robot **Goal:** Build an autonomous mobile robot that avoids obstacles **1. Install Components:** ```bash horus install lidar-driver horus install obstacle-detector horus install path-planner horus install differential-drive horus install emergency-stop ``` **2. 
Compose System:** ```rust use lidar_driver::LidarNode; use obstacle_detector::ObstacleDetectorNode; use path_planner::LocalPlannerNode; use differential_drive::DiffDriveNode; use emergency_stop::EStopNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Safety (order 0 - runs first) scheduler.add(EStopNode::new("estop", "cmd_vel")?).order(0).build()?; // Sensors (order 1) scheduler.add(LidarNode::new("/dev/ttyUSB0", "scan")?).order(1).build()?; // Perception (order 2) scheduler.add(ObstacleDetectorNode::new("scan", "obstacles")?).order(2).build()?; // Planning (order 3) scheduler.add(LocalPlannerNode::new("obstacles", "cmd_vel")?).order(3).build()?; // Control (order 4) scheduler.add(DiffDriveNode::new("cmd_vel", "motor.left", "motor.right", 0.5)?).order(4).build()?; scheduler.run()?; Ok(()) } ``` **That's a full autonomous robot in ~40 lines of configuration!** --- ## Common Errors ### "Package not found" when installing The package name may be misspelled or not published yet. Search with `horus search ` to find the correct name. ### Node topic mismatch Pre-built nodes expect specific topic names. Check the package documentation for the expected topic names and ensure your nodes publish and subscribe to matching topics. ### Version incompatibility If a pre-built node requires a newer HORUS version, update with the one-line installer. Use `horus doctor` to check your toolchain versions. --- ## Next Steps - **[Package Management](/package-management/package-management)** - Discover and manage packages - **[node! 
Macro](/concepts/node-macro)** - When you need custom functionality - **[Examples](/rust/examples/basic-examples)** - See complete working systems --- ## See Also - [Package Management](/package-management/package-management) — Installing packages - [Plugins](/plugins) — CLI plugin system --- ## Package Management Path: /package-management/package-management Description: Install, publish, and manage reusable HORUS components # Package Management > **Note**: Publishing packages requires the registry backend to be deployed. Installing public packages works immediately. HORUS provides a comprehensive package management system for sharing and discovering robotics components. Create reusable nodes, message types, and algorithms that the community can use. ## Quick Reference | Task | Command | |------|---------| | Add a dependency | `horus add serde` | | Add with version | `horus add serde@1.0 --features derive` | | Install a tool/plugin | `horus install horus-sim3d` | | Remove a package | `horus remove pid-controller` | | Search registry | `horus search slam` | | List installed | `horus list` | | Update all | `horus update` | | Authenticate | `horus auth login` | | Publish | `horus publish` | --- ## Overview The package system allows you to: - **Install packages** from multiple sources (HORUS registry, crates.io, PyPI) - **Publish your work** for others to use - **Manage dependencies** automatically - **Version control** with semantic versioning - **Search and discover** community packages ## Package Sources HORUS supports packages from multiple sources: | Source | Description | Command | |--------|-------------|---------| | **HORUS Registry** | Curated robotics packages | `horus add pid-controller` or `horus install pid-controller` | | **crates.io** | Rust ecosystem packages | `horus add serde` | | **PyPI** | Python ecosystem packages | `horus add numpy` | | **Git** | Git repositories | `horus add https://github.com/org/repo` | | **Local Path** | Local filesystem | 
`horus add ./path/to/pkg` | ### `horus add` vs `horus install` | Command | Like | Purpose | Modifies horus.toml? | |---------|------|---------|---------------------| | `horus add serde` | `cargo add` | Add project dependency | Yes | | `horus install slam-toolbox` | `cargo install` | Install standalone tool/plugin globally | No | Use `horus add` for libraries your project depends on. Use `horus install` for standalone tools, plugins, and CLI extensions. ## Quick Start ### Adding Project Dependencies ```bash # Add from crates.io (auto-detected for Rust projects) horus add serde horus add tokio # Add from PyPI (auto-detected for Python projects) horus add numpy horus add opencv-python # Add with specific version and features horus add serde@1.0 --features derive # Add from HORUS registry horus add pid-controller ``` ### Installing Standalone Packages ```bash # Install a standalone tool or plugin globally horus install horus-sim3d horus install rplidar-driver@1.2.0 horus install horus-visualizer --plugin ``` ### Automatic Source Detection HORUS automatically detects the package source: 1. First checks HORUS registry 2. Then checks both PyPI and crates.io 3. If found in multiple sources, prompts you to choose: ``` Package 'package_name' found in BOTH PyPI and crates.io Which package source do you want to use? [1] [PYTHON] PyPI (Python package) [2] [RUST] crates.io (Rust binary) [3] [FAIL] Cancel installation Choice [1-3]: ``` ### System Package Detection If a package is already installed system-wide, HORUS offers to reuse it: ``` Package 'ripgrep' v14.0.0 already installed system-wide [1] Use system package (no download) [2] Install fresh copy to HORUS [3] Cancel Choice [1-3]: ``` **What happens during installation:** 1. Detects package source (HORUS registry, crates.io, or PyPI) 2. Downloads package from the appropriate source 3. Resolves dependencies automatically 4. Caches locally in `~/.horus/cache/` or `.horus/packages/` 5. 
Makes package available for use ### Using an Installed Package ```rust // In your main.rs or any file use pid_controller::PIDNode; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Use the installed package let pid = PIDNode::new(1.0, 0.1, 0.01); scheduler.add(pid).order(5).build()?; scheduler.run()?; Ok(()) } ``` ### Publishing Your Package ```bash # 1. Authenticate first (one-time) horus auth login # 2. Navigate to your project cd my-awesome-controller # 3. Publish horus publish ``` ## Package Locations ### Local Packages **Project-local** (default): ``` my_project/ ── .horus/ ── packages/ ── pid-controller@1.0.0/ # HORUS registry ── serde@1.0.200/ # crates.io ── pypi_numpy@1.24.0/ # PyPI (prefixed with pypi_) ── src/ ── main.rs ``` **Why use local:** - Different projects can use different versions - Clean separation per project - Easy to delete with project ### Global Packages **System-wide** (installed with `-g` flag): ``` ~/.horus/ ── cache/ ── pid-controller@1.0.0/ # HORUS registry ── serde@1.0.200/ # crates.io ── pypi_numpy@1.24.0/ # PyPI packages ── git_abc123/ # Git dependencies ``` **Naming conventions by source:** | Source | Directory Format | Example | |--------|------------------|---------| | HORUS Registry | `<name>@<version>/` | `pid-controller@1.0.0/` | | crates.io | `<name>@<version>/` | `serde@1.0.200/` | | PyPI | `pypi_<name>@<version>/` | `pypi_numpy@1.24.0/` | | Git | `git_<hash>/` | `git_abc123def/` | **Why use global:** - Share common packages across all projects - Save disk space (one copy for everything) - Faster install after first download ### Priority Order & Smart Dependency Resolution When resolving packages, HORUS checks in this order: **1. Project-local `.horus/packages/` (highest priority)** - Checked first, ALWAYS wins - Can be symlink to global OR real directory - Enables local override of broken global packages **2. 
Global cache `~/.horus/cache/`** - Only checked if not found locally - Shared across all projects - Version-specific directories (e.g., `serde@1.0.228/`) **3. System install `/usr/local/lib/horus/` (if available)** - Last resort fallback **Smart Dependency Resolution:** When you run `horus add`, HORUS auto-detects the source and writes to `horus.toml`: ```bash # Add a dependency (auto-detects source) horus add serde # Rust project → crates.io horus add numpy # Python project → PyPI horus add pid-controller # → horus registry # Dependencies are fetched on next horus build/run ``` **Local Package Cache:** Project dependencies installed from the horus registry are cached in `.horus/packages/` with symlinks to the global cache (`~/.horus/cache/`) for disk efficiency. Local copies always take precedence over global ones. **Benefits:** - **Local override** - Bypass broken global packages - **Version isolation** - Different projects can use different versions - **Disk efficient** - Shares global cache when possible - **Zero config** - Works automatically Commit `horus.lock` to git so teammates get identical dependency versions with `horus build`. ## Package Commands ### `horus add` Add a project dependency to `horus.toml`. Auto-detects the source (crates.io, PyPI, horus registry) from your project language. Like `cargo add`. 
**Usage:** ```bash horus add [OPTIONS] ``` **Options:** - `-s, --source ` - Override source: `crates-io`, `pypi`, `system`, `registry`, `git`, `path` - `-F, --features ` - Enable features (comma-separated) - `--dev` - Add to `[dev-dependencies]` - `--driver` - Add to `[hardware]` **Examples:** ```bash # Auto-detected source based on project language horus add serde # Rust project → crates.io horus add numpy # Python project → PyPI horus add pid-controller # → horus registry # With version and features horus add serde@1.0 --features derive horus add tokio --features full # Explicit source override horus add opencv --source system # Dev dependencies horus add criterion --dev # Drivers horus add camera --driver ``` ### `horus install` Install a standalone package or plugin globally. Like `cargo install` — does NOT modify `horus.toml`. **Usage:** ```bash horus install [OPTIONS] ``` **Options:** - `--plugin` - Install as CLI plugin - `-t, --target ` - Target workspace/project name **Examples:** ```bash # Install standalone tool/plugin horus install horus-sim3d horus install rplidar-driver@1.2.0 horus install horus-visualizer --plugin ``` #### Installing from crates.io When installing Rust packages from crates.io, HORUS uses `cargo install` under the hood: ```bash horus install ripgrep ``` **Output:** ``` Installing ripgrep from crates.io... Compiling ripgrep... Installing with cargo... Package installed: ripgrep@14.0.0 Location: ~/.horus/cache/ripgrep@14.0.0/ ``` **Requirements:** - Rust toolchain must be installed (`rustup`) - `cargo` must be available in PATH #### Adding PyPI Dependencies Add Python packages as project dependencies with `horus add`: ```bash horus add numpy horus add opencv-python ``` This writes to `horus.toml [dependencies]` with `source = "pypi"`. The package is installed via pip on the next `horus build` or `horus run`. 
**Requirements:** - Python 3.x must be installed - `pip` must be available in PATH When using `horus run`, Python package paths are automatically configured — no manual `sys.path` needed. **HORUS Registry Output:** ``` Installing pid-controller@1.2.0... Downloaded (245 KB) Extracted to .horus/packages/pid-controller@1.2.0/ Installed dependencies: control-utils@1.0.0 Build successful Package installed: pid-controller@1.2.0 Location: .horus/packages/pid-controller@1.2.0/ Usage: use pid_controller::PIDNode; ``` ### `horus remove` Uninstall a package. **Usage:** ```bash horus remove ``` **Options:** - `-g, --global` - Remove from global cache - `-t, --target ` - Target workspace/project name **Examples:** ```bash # Remove local package horus remove motion-planner # Remove from global cache horus remove common-utils -g # Remove from specific workspace horus remove pid-controller -t my-project ``` **Output:** ``` Removing pid-controller@1.2.0... Removed from .horus/packages/ Freed 892 KB Package removed: pid-controller@1.2.0 ``` ### `horus list` List installed packages or search the registry. **Usage:** ```bash horus list [QUERY] [OPTIONS] ``` **Options:** - `-g, --global` - List global cache packages - `-a, --all` - List all (local + global) **List Local Packages:** ```bash horus list ``` **Output:** ``` Local packages: pid-controller 1.2.0 motion-planner 2.0.1 sensor-drivers 1.5.0 ``` **List Global Cache:** ```bash horus list -g ``` **Search Registry:** ```bash # Search by keyword horus list sensor ``` **Output:** ``` Found 3 package(s): sensor-fusion 2.1.0 - Kalman filter fusion sensor-drivers 1.5.0 - LIDAR/IMU/camera drivers sensor-calibration 1.0.0 - Calibration tools ``` ### `horus update` Update installed packages to their latest versions. 
**Usage:** ```bash horus update [PACKAGE] [OPTIONS] ``` **Options:** - `-g, --global` - Update global cache packages - `--dry-run` - Show what would be updated without making changes **Examples:** ```bash # Update all local packages horus update # Update a specific package horus update pid-controller # Update global packages horus update -g # Preview updates without applying horus update --dry-run ``` ### `horus search` Search the registry for packages. **Usage:** ```bash horus search [OPTIONS] ``` **Options:** - `--category ` - Filter by category (driver, algorithm, plugin, tool) **Examples:** ```bash horus search slam horus search lidar --category driver ``` --- ## Dependency Management ### Version Constraints ```bash horus add pid-controller@1.2.0 # Exact version horus add motion-planner@^2.0 # Compatible (2.x.x, not 3.0.0) horus add sensor-drivers@~1.5.0 # Patch updates (1.5.x) ``` ### Automatic Resolution HORUS automatically resolves and installs transitive dependencies: ```bash horus add robot-controller ``` ``` Resolving dependencies... robot-controller@1.0.0 ├── motion-planner@2.0.1 │ └── pathfinding-utils@1.2.0 └── pid-controller@1.2.0 └── control-utils@1.0.0 Installing 5 packages... ✓ All dependencies installed ``` ### Specifying Dependencies Use `horus add` for all dependency sources: ```bash # HORUS registry horus add pid-controller horus add motion-planner # crates.io horus add serde --source crates-io # PyPI horus add numpy --source pypi # System library horus add opencv --source system ``` See [Configuration Reference](/package-management/configuration) for details on `horus.toml` dependency syntax. 
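
The constraint operators listed under Version Constraints can be read as version ranges. The sketch below illustrates that reading in Python. It is not HORUS's resolver: `parse_version` and `satisfies` are hypothetical helper names, and the sketch ignores pre-release tags and the special handling of `0.x` versions that resolvers such as Cargo apply.

```python
def parse_version(text):
    """Parse '2.0' or '1.5.0' into a tuple of ints, padding missing parts with 0."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version, constraint):
    """Return True if `version` lies inside the range implied by `constraint`."""
    v = parse_version(version)
    if constraint.startswith("^"):
        # Caret: same major version, at least the stated version.
        low = parse_version(constraint[1:])
        high = (low[0] + 1, 0, 0)
        return low <= v < high
    if constraint.startswith("~"):
        # Tilde: same major.minor, at least the stated version.
        low = parse_version(constraint[1:])
        high = (low[0], low[1] + 1, 0)
        return low <= v < high
    # No operator: exact match only.
    return v == parse_version(constraint)


if __name__ == "__main__":
    print(satisfies("2.0.1", "^2.0"))    # True: 2.x.x is compatible
    print(satisfies("3.0.0", "^2.0"))    # False: next major is excluded
    print(satisfies("1.5.9", "~1.5.0"))  # True: patch update within 1.5.x
    print(satisfies("1.6.0", "~1.5.0"))  # False: minor bump is excluded
```

Read this way, `motion-planner@^2.0` accepts the `2.0.1` shown in the resolution tree above, while an exact pin such as `pid-controller@1.2.0` accepts only that one version.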
--- ## Common Workflows ### Using Multiple Packages ```bash # Add dependencies horus add pid-controller horus add motion-planner horus add sensor-fusion ``` ```rust use pid_controller::PIDController; use motion_planner::AStarPlanner; use sensor_fusion::KalmanFilter; use horus::prelude::*; fn main() -> Result<()> { let mut scheduler = Scheduler::new(); // Combine multiple packages into your system Ok(()) } ``` ### Updating Dependencies ```bash # Update all packages to latest versions horus update # Update a specific package horus update pid-controller # Preview updates without applying horus update --dry-run ``` --- ## Publishing To publish packages to the HORUS registry, see the **[Registry Author Guide](https://registry.horusrobotics.dev/guide)** on the registry website. It covers: - Authentication (`horus auth login`) - Publishing workflow (`horus publish`) - CI/CD integration (GitHub Actions, GitLab CI) - Package versioning and lifecycle (`yank`, `deprecate`, `owner`) - API key management Quick start: ```bash horus auth login # One-time GitHub OAuth cd my-package horus publish # Publish to registry ``` --- ## Troubleshooting ### Package Not Found ``` Error: Package 'nonexistent-package' not found in registry ``` ```bash # Check spelling / search for the correct name horus search sensor ``` ### Version Conflict ``` Error: Version conflict robot-controller requires motion-planner ^2.0 sensor-fusion requires motion-planner ^1.5 ``` ```bash # Update the conflicting package to a compatible version horus add sensor-fusion@2.0.0 ``` ### Build Failures ```bash # Clean and retry horus remove my-package horus add my-package # Check the registry for known issues horus info my-package ``` ### Registry Unavailable ```bash # Use cached packages for offline development horus list -g # Check connectivity horus doctor ``` --- ## Next Steps - **[Using Prebuilt Nodes](/package-management/using-prebuilt-nodes)** — Install and use community packages - 
**[Configuration](/package-management/configuration)** — horus.toml dependency syntax - **[Lockfile](/package-management/lockfile)** — Reproducible builds with horus.lock - **[CLI Reference](/development/cli-reference)** — Complete command documentation - **[Registry Guide](https://registry.horusrobotics.dev/guide)** — Publishing packages to the registry --- ## See Also - [horus.toml](/concepts/horus-toml) — Project manifest - [Lockfile](/package-management/lockfile) — Reproducible builds - [CLI Reference](/development/cli-reference) — `horus add`, `horus install` --- ## Lockfile & Reproducibility Path: /package-management/lockfile Description: How horus.lock pins dependencies for reproducible builds across machines and platforms # Lockfile & Reproducibility `horus.lock` is the single file that ensures every machine builds with identical dependencies. It pins exact versions for packages, toolchains, and system libraries. ``` horus.toml → what you WANT (version ranges, declarative) horus.lock → what you GOT (exact pins, reproducible) ``` This is the same pattern as `Cargo.lock`, `package-lock.json`, or `uv.lock`. ## Quick Reference | Task | Command | |------|---------| | Generate lockfile | `horus lock` | | Validate lockfile | `horus lock --check` | | Build (auto-verifies lockfile) | `horus build` | | Check system deps | `horus doctor` | --- ## How It Works ```bash # First build: resolves dependencies, creates horus.lock horus build # Subsequent builds: uses pinned versions from horus.lock horus build # Teammate clones repo, gets identical deps git clone && cd && horus build ``` `horus build` handles everything: 1. Reads `horus.toml` (your dependency declarations) 2. Reads `horus.lock` (pinned versions) — creates it if missing 3. Checks toolchain versions (Rust, Python) and warns on mismatches 4. Checks system dependencies via `pkg-config` and suggests install commands 5. Installs language packages (crates.io, PyPI, registry) 6. 
Compiles the project ## The Lockfile Format (v4) ```toml version = 4 config_hash = "sha256:a1b2c3..." [toolchain] rust = "1.78.0" python = "3.12.3" features = ["monitor"] [[package]] name = "horus_library" version = "0.1.9" source = "registry" checksum = "sha256:abc..." [[package]] name = "serde" version = "1.0.215" source = "crates.io" checksum = "sha256:def..." [[package]] name = "numpy" version = "1.26.4" source = "pypi" [[system]] name = "opencv" version = "4.8.1" pkg_config = "opencv4" apt = "libopencv-dev" brew = "opencv" pacman = "opencv" ``` ### Sections | Section | Purpose | |---------|---------| | `version` | Schema version (currently 4) | | `config_hash` | SHA-256 of `horus.toml` for staleness detection | | `[toolchain]` | Pinned Rust/Python/CMake versions | | `features` | Active feature flags at lock time | | `[[package]]` | Pinned package versions (registry, crates.io, PyPI) | | `[[system]]` | System dependencies with cross-platform package names | ## Cross-Platform System Dependencies Each `[[system]]` entry includes package names for multiple platforms: ```toml [[system]] name = "opencv" version = "4.8.1" pkg_config = "opencv4" # How to detect (cross-platform) apt = "libopencv-dev" # Debian/Ubuntu brew = "opencv" # macOS (Homebrew) pacman = "opencv" # Arch Linux choco = "opencv" # Windows (Chocolatey) ``` When `horus build` detects a missing system dependency, it prints the correct install command for your platform: ```bash # On Ubuntu: ✗ opencv — install with: sudo apt install -y libopencv-dev # On macOS: ✗ opencv — install with: brew install opencv # On Arch: ✗ opencv — install with: sudo pacman -S opencv ``` ## Commands ### `horus lock` Regenerate `horus.lock` from `horus.toml`: ```bash horus lock # [✓] Generated horus.lock v4 (12 packages) ``` ### `horus lock --check` Verify the lockfile is valid and check system dependencies: ```bash horus lock --check # [✓] horus.lock v4 is valid (12 packages, 2 system deps) # ⚠ Rust version mismatch: lockfile 
pins 1.78.0, you have 1.79.0 # ✗ opencv — install with: sudo apt install -y libopencv-dev ``` ### `horus build` Build the project. Automatically verifies the lockfile first: ```bash horus build # Lockfile verification: # ⚠ Python version mismatch: lockfile pins 3.12.3, you have 3.11.0 # [i] Building project in debug mode... ``` ## Workflows ### Team Development ```bash # Developer A: adds a dependency horus add numpy --source pypi horus build # updates horus.lock git add horus.toml horus.lock git commit -m "add numpy dependency" git push # Developer B: gets identical deps git pull horus build # reads horus.lock, installs exact versions ``` ### Robot Deployment ```bash # On dev machine: horus build --release scp -r . robot@192.168.1.5:~/project/ # On robot: cd ~/project horus build --release # horus.lock ensures identical deps ``` ### CI/CD ```yaml steps: - uses: actions/checkout@v4 - name: Install horus run: curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash - name: Build run: horus build --release - name: Test run: horus test ``` The lockfile in the repo ensures CI builds match local builds exactly. ## Version Checking The `[toolchain]` section records which Rust/Python versions were used when the lockfile was generated. On `horus build`, version mismatches produce warnings (not errors): - **Same major.minor** (e.g., 1.78.0 vs 1.78.5): no warning - **Different minor** (e.g., 1.78.0 vs 1.79.0): warning printed - **Different major** (e.g., 3.12 vs 2.7): warning printed Warnings don't block the build — they inform you that behavior may differ. ## Backward Compatibility - `horus.lock` v3 files (packages only) are still readable - Missing sections (`[toolchain]`, `[[system]]`, `features`) default to empty - On the next `horus lock` or `horus build`, the file is upgraded to v4 ## Best Practices 1. **Commit `horus.lock` to git** — this is the reproducibility mechanism 2. **Don't edit `horus.lock` manually** — use `horus lock` to regenerate 3. 
**Run `horus lock --check` in CI** — catch dependency drift early 4. **Update lockfile when adding deps** — `horus add` + `horus build` updates it automatically 5. **Pin toolchain versions** — `horus doctor --fix` installs missing toolchains/system deps and pins their versions to `[toolchain]` and `[[system]]` ## Common Errors ### "Lockfile is stale" The `config_hash` in `horus.lock` doesn't match the current `horus.toml`. Run `horus lock` to regenerate. ### Toolchain version mismatch warnings Warnings like "Rust version mismatch: lockfile pins 1.78.0, you have 1.79.0" are informational and don't block the build. Update the lockfile with `horus lock` to record your current toolchain versions. ### Missing system dependency When `horus build` reports a missing system library, install it using the platform-specific command shown in the output (e.g., `sudo apt install -y libopencv-dev`). --- ## See Also - [Package Management](/package-management/package-management) — Dependency management - [Configuration](/package-management/configuration) — Registry configuration --- ## Configuration Reference Path: /package-management/configuration Description: Complete field reference for horus.toml — the single source of truth for HORUS projects # Configuration Reference `horus.toml` is the **single source of truth** for a HORUS project. It contains project metadata, dependencies (all languages), hardware devices, scripts, hooks, and build configuration. Native build files (`Cargo.toml`, `pyproject.toml`, `CMakeLists.txt`) are **generated** into `.horus/` automatically — you never edit them directly. 
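Because every generated file derives from `horus.toml`, a content hash of that one file is enough to notice drift. The lockfile page describes `config_hash` as a SHA-256 of `horus.toml` and describes toolchain warnings as firing when the major or minor version differs. The sketch below illustrates both checks in Python. It is not the HORUS implementation: hashing the raw file bytes is an assumption, and the function names `config_hash`, `is_stale`, and `toolchain_warning` are hypothetical.

```python
import hashlib
from typing import Optional


def config_hash(manifest_bytes: bytes) -> str:
    """Return a hash in the `sha256:<hex>` form shown in the lockfile example.

    Assumption: the hash covers the raw bytes of horus.toml.
    """
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def is_stale(manifest_bytes: bytes, locked_hash: str) -> bool:
    """True when the manifest no longer matches the hash stored in the lockfile."""
    return config_hash(manifest_bytes) != locked_hash


def toolchain_warning(tool: str, pinned: str, installed: str) -> Optional[str]:
    """Warn when major or minor differ; patch-level differences are ignored."""
    if pinned.split(".")[:2] == installed.split(".")[:2]:
        return None
    return f"{tool} version mismatch: lockfile pins {pinned}, you have {installed}"


if __name__ == "__main__":
    manifest = b'[package]\nname = "my-robot"\nversion = "0.1.0"\n'
    locked = config_hash(manifest)
    print(is_stale(manifest, locked))                       # unchanged manifest
    print(is_stale(manifest + b"# edited\n", locked))       # edited manifest
    print(toolchain_warning("Rust", "1.78.0", "1.79.0"))    # minor differs
```

A warning string rather than an exception mirrors the documented behavior that mismatches inform but do not block the build.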
---

## Quick Reference

**Minimal `horus.toml`:**

```toml
[package]
name = "my-robot"
version = "0.1.0"
```

**Full `horus.toml`:**

```toml
# Top-level key: must appear before the first [table] header
enable = ["cuda"]

[package]
name = "my-robot"
version = "1.2.3"
description = "Autonomous mobile robot with SLAM navigation"
authors = ["Robotics Team"]
license = "Apache-2.0"
repository = "https://github.com/team/my-robot"
package-type = "app"
categories = ["navigation", "perception"]
type = "bin"

[dependencies]
# HORUS registry (default source)
pid-controller = "1.0"
# Rust crates
serde = { version = "1.0", source = "crates.io", features = ["derive"] }
tokio = { version = "1", source = "crates.io", features = ["full"] }
# Python packages
numpy = { version = ">=1.24", source = "pypi" }
opencv-python = { version = ">=4.8", source = "pypi" }
# System libraries
opencv = { source = "system", apt = "libopencv-dev", cmake_package = "OpenCV" }
# Local path
my-lib = { path = "../my-lib" }
# Git
my-driver = { git = "https://github.com/team/driver.git", branch = "main" }

[dev-dependencies]
criterion = { version = "0.5", source = "crates.io" }

[robot]
name = "turtlebot"
description = "robot.urdf"
simulator = "sim3d"

[hardware]
arm = { use = "dynamixel", port = "/dev/ttyUSB0", baudrate = 1000000 }
lidar = { use = "rplidar", port = "/dev/ttyUSB1" }
camera = { use = "opencv" }
imu = { use = "bno055", bus = 1 }
realsense = { use = "exec:./realsense_bridge", args = ["--serial", "12345"] }
# Simulation-only device
sim_lidar = { use = "rplidar", sim = true, noise = 0.01 }

[scripts]
sim = "horus sim start --world warehouse"
deploy-pi = "horus deploy robot@192.168.1.5 --release"

[hooks]
pre_run = ["fmt", "lint"]
pre_test = ["lint"]

[ignore]
files = ["debug_*.py"]
directories = ["experiments/"]

[cpp]
compiler = "clang++"
cmake_args = ["-DCMAKE_BUILD_TYPE=Release"]
toolchain = "aarch64"
```

---

## `[package]`

Project metadata. Required for all projects except virtual workspaces.
### `name` (Required) **Type:** String **Validation:** 2-64 characters, lowercase alphanumeric + hyphens + underscores + `@` + `/` (for scoped names like `@org/package`) **Reserved names** (will be rejected): `horus`, `core`, `std`, `lib`, `test`, `main`, `admin`, `api`, `root`, `system`, `internal`, `config`, `setup`, `install` ```toml name = "my-robot" ``` ### `version` (Required) **Type:** String (semantic versioning) **Format:** `MAJOR.MINOR.PATCH` ```toml version = "0.1.0" ``` ### `description` (Optional) **Type:** String **Default:** None ```toml description = "Autonomous mobile robot with SLAM navigation" ``` ### `authors` (Optional) **Type:** Array of strings **Default:** `[]` ```toml authors = ["Jane Doe ", "Robotics Team"] ``` ### `license` (Optional) **Type:** String (SPDX identifier) **Default:** None **Common values:** `"Apache-2.0"`, `"MIT"`, `"GPL-3.0"`, `"BSD-3-Clause"` ```toml license = "Apache-2.0" ``` ### `edition` (Optional) **Type:** String **Default:** `"1"` Manifest schema version. Controls which fields and features are available. ```toml edition = "1" ``` ### `repository` (Optional) **Type:** String (URL) **Default:** None ```toml repository = "https://github.com/team/my-robot" ``` ### `package-type` (Optional) **Type:** String **Default:** None **Values:** `"node"`, `"driver"`, `"tool"`, `"algorithm"`, `"model"`, `"message"`, `"app"` Used for registry classification and discovery. ```toml package-type = "app" ``` ### `categories` (Optional) **Type:** Array of strings **Default:** `[]` ```toml categories = ["navigation", "perception", "control"] ``` ### `type` (Optional) **Type:** String **Default:** `"bin"` **Values:** `"bin"`, `"lib"`, `"both"` Crate target type. `"lib"` for shared libraries, `"bin"` for executables, `"both"` for crates that are both. ```toml type = "lib" ``` ### `standard` (Optional) **Type:** String **Default:** None C++ language standard for C++ projects. 
```toml standard = "c++20" ``` ### `rust_edition` (Optional) **Type:** String **Default:** None Rust edition override for the generated `.horus/Cargo.toml`. If not set, defaults to `"2021"`. ```toml rust_edition = "2024" ``` --- ## `[dependencies]` All project dependencies — Rust, Python, system, local, and git — declared in one place. HORUS generates the appropriate native build files from these declarations. ### Simple Form A version string defaults to the HORUS registry: ```toml [dependencies] pid-controller = "1.0" sensor-fusion = "0.5" ``` ### Detailed Form A table with `source` and other fields: ```toml [dependencies] serde = { version = "1.0", source = "crates.io", features = ["derive"] } numpy = { version = ">=1.24", source = "pypi" } ``` ### Dependency Sources | Source | TOML value | When to use | Generated into | |--------|-----------|-------------|----------------| | HORUS Registry | `"registry"` (default) | HORUS packages | `.horus/Cargo.toml` or `.horus/pyproject.toml` | | crates.io | `"crates.io"` | Rust crates | `.horus/Cargo.toml` | | PyPI | `"pypi"` | Python packages | `.horus/pyproject.toml` | | System | `"system"` | OS packages (apt/brew) | `.horus/CMakeLists.txt` | | Path | `"path"` | Local workspace deps | `.horus/Cargo.toml` | | Git | `"git"` | Git repositories | `.horus/Cargo.toml` | ### Detailed Dependency Fields | Field | Type | Default | Description | |-------|------|---------|-------------| | `version` | String | None | Version requirement (`"1.0"`, `">=1.24"`, `"^2.0"`) | | `source` | String | `"registry"` | Dependency source (see table above) | | `features` | Array | `[]` | Cargo/Python features to enable | | `optional` | Boolean | `false` | Whether this dependency is optional | | `path` | String | None | Local path (for `source = "path"`) | | `git` | String | None | Git repository URL (for `source = "git"`) | | `branch` | String | None | Git branch | | `tag` | String | None | Git tag | | `rev` | String | None | Git revision hash | | 
`apt` | String | None | Apt package name (for `source = "system"`) | | `cmake_package` | String | None | CMake `find_package()` name (for `source = "system"`) | | `lang` | String | None | Language hint: `"cpp"`, `"rust"`, `"python"` | | `workspace` | Boolean | `false` | Inherit from `[workspace.dependencies]` | ### Examples by Source **crates.io (Rust):** ```toml [dependencies] serde = { version = "1.0", source = "crates.io", features = ["derive"] } tokio = { version = "1", source = "crates.io", features = ["full"] } ``` **PyPI (Python):** ```toml [dependencies] numpy = { version = ">=1.24", source = "pypi" } torch = { version = ">=2.0", source = "pypi" } ``` **System (OS packages):** ```toml [dependencies] opencv = { source = "system", apt = "libopencv-dev", cmake_package = "OpenCV" } eigen3 = { source = "system", apt = "libeigen3-dev", cmake_package = "Eigen3" } ``` **Path (local):** ```toml [dependencies] my-messages = { path = "../my-messages" } ``` **Git:** ```toml [dependencies] my-driver = { git = "https://github.com/team/driver.git", branch = "main" } my-algo = { git = "https://github.com/team/algo.git", tag = "v1.0" } ``` **Adding dependencies via CLI:** ```bash horus add serde --source crates.io --features derive horus add numpy --source pypi horus add pid-controller # auto-detects registry horus add ../my-lib --source path horus remove numpy ``` --- ## `[dev-dependencies]` Same format as `[dependencies]`. Dev-only dependencies are included in `horus test` and `horus bench` builds but excluded from `horus publish` and `horus deploy`. ```toml [dev-dependencies] criterion = { version = "0.5", source = "crates.io" } pytest = { version = ">=7.0", source = "pypi" } ``` Add via CLI: ```bash horus add criterion --dev --source crates.io ``` --- ## `[workspace]` Multi-crate workspace configuration. When present, this manifest is a workspace root. 
| Field | Type | Default | Description | |-------|------|---------|-------------| | `members` | Array of strings | `[]` | Glob patterns for workspace members (e.g., `["crates/*"]`) | | `exclude` | Array of strings | `[]` | Glob patterns to exclude from membership | | `dependencies` | Table | `{}` | Shared dependencies inherited by members via `workspace = true` | ```toml [workspace] members = ["crates/*"] exclude = ["crates/experimental"] [workspace.dependencies] serde = { version = "1.0", source = "crates.io", features = ["derive"] } horus_library = "0.1.9" ``` Members inherit shared dependencies: ```toml # crates/my-node/horus.toml [package] name = "my-node" version = "0.1.0" [dependencies] serde = { workspace = true } # inherits version + features from root horus_library = { workspace = true } ``` Create a workspace: ```bash horus new my-robot --workspace -r ``` ### Virtual Workspaces A workspace without `[package]` is a **virtual workspace** — it organizes members but is not a package itself: ```toml # Root horus.toml — no [package] section [workspace] members = ["sensor-node", "controller", "planner"] [workspace.dependencies] horus_library = "0.1.9" ``` Virtual workspaces must have at least one member. Use for monorepos where the root directory is just an organizer. --- ## `[hardware]` Hardware device configuration. Each entry declares a device by name, specifies which node drives it via the `use` field, and passes device-specific parameters. ### The `use` field Every device entry requires a `use` field that identifies the node responsible for the device. The value is looked up in the **node registry** — this includes Terra HAL drivers, registry packages, and local project nodes. ```toml [hardware] arm = { use = "dynamixel", port = "/dev/ttyUSB0", baudrate = 1000000 } lidar = { use = "rplidar", port = "/dev/ttyUSB1" } camera = { use = "opencv" } imu = { use = "bno055", bus = "i2c-1", address = 0x28 } ``` The `use` value resolves in order: 1. 
**Local project nodes** — a node defined in the current project 2. **Installed registry packages** — a driver package installed via `horus install` 3. **Terra HAL drivers** — a built-in hardware abstraction driver ### Subprocess drivers (`exec:` prefix) For vendor-provided binaries, custom bridge programs, or drivers written in languages without native HORUS bindings, prefix the `use` value with `exec:` to run an external binary as a child process: ```toml [hardware.camera] use = "exec:./realsense_bridge" args = ["--serial", "12345"] [hardware.lidar] use = "exec:/opt/velodyne/vlp16_driver" args = ["--port", "2368", "--model", "VLP-16"] ``` The `args` field is an array of strings passed as command-line arguments to the binary. The binary must publish and subscribe to HORUS topics using shared memory (any language with HORUS bindings works). ### Simulation devices (`sim` flag) Mark a device as simulation-only by setting `sim = true`. Simulation devices are loaded only when running with `--sim` (`horus run --sim` or `horus test --sim`) and skipped during real-hardware runs. ```toml [hardware] # Real hardware — always loaded (skipped in --sim mode if a sim variant exists) front_lidar = { use = "rplidar", port = "/dev/ttyUSB0" } imu = { use = "bno055", bus = 1 } # Simulation overrides — loaded only with --sim front_lidar_sim = { use = "rplidar", sim = true, noise = 0.01, range_max = 12.0, num_rays = 360 } imu_sim = { use = "bno055", sim = true, drift_rate = 0.001 } camera_sim = { use = "opencv", sim = true, width = 640, height = 480, fps = 30 } ``` When `horus run --sim` is active, devices with `sim = true` replace their non-sim counterparts (matched by the device name prefix before `_sim`). Your node code does not change between simulation and real hardware — the same topic names and message types are used in both modes. ### Device parameters All keys other than the reserved keys are captured as a `HashMap` and passed to the device node at runtime via `NodeParams`. 
Any TOML value type works — strings, integers, floats, booleans, arrays, and inline tables. ```toml [hardware.arm] use = "dynamixel" port = "/dev/ttyUSB0" baudrate = 1000000 servo_ids = [1, 2, 3, 4, 5, 6] # array parameter torque_limit = 0.8 # float parameter ``` ### Reserved keys | Key | Description | |-----|-------------| | `use` | **Required.** Node registry name (or `exec:` prefixed path for subprocess drivers) | | `sim` | Boolean. Marks device as simulation-only (`true`) | | `args` | Array of strings. Command-line arguments for `exec:` subprocess drivers | **Legacy keys** (still accepted, mapped internally): `terra`, `package`, `node`, `crate`, `source`, `pip`, `exec`, `simulated` ### Disabling a device ```toml [hardware] imu = false # Disabled — not loaded ``` ### Topic mapping (optional) Three reserved fields configure auto-bridging to HORUS topics: ```toml [hardware.imu] use = "mpu6050" bus = "i2c-1" topic = "sensors.imu" # Sensor data output topic [hardware.arm] use = "dynamixel" port = "/dev/ttyUSB0" topic_state = "arm.joint_states" # Joint state output topic topic_command = "arm.joint_cmd" # Command input topic ``` | Field | Description | |-------|-------------| | `topic` | Sensor data output topic name | | `topic_state` | State/feedback output topic name | | `topic_command` | Command input topic name | These fields are **not** included in `NodeParams` — they are bridge configuration used by the terra-horus layer to auto-create sensor/actuator forwarding. Access via `hw.topic_mapping("name")` in code. ### Common `use` values | Name | Hardware | |------|----------| | `dynamixel` | Dynamixel servos | | `rplidar` | SLAMTEC RPLiDAR | | `realsense` | Intel RealSense cameras | | `webcam` | V4L2 cameras | | `mpu6050`, `bno055` | I2C IMU sensors | | `vesc` | VESC motor controllers | | `i2c`, `spi`, `serial`, `can`, `gpio`, `pwm` | Raw bus access | See the [Driver API reference](/rust/api/drivers) for the runtime API. 
### Migration from `[drivers]` / `[sim-drivers]`

If you have an existing project using the old syntax, the mapping is straightforward:

```toml
# Old syntax
[drivers]
arm = { terra = "dynamixel", port = "/dev/ttyUSB0" }
lidar = { package = "horus-driver-rplidar" }
conveyor = { node = "ConveyorDriver" }
front_lidar = { crate = "rplidar-driver" }
imu = { pip = "adafruit-bno055", bus = 1 }
camera = { exec = "./realsense_bridge" }

[sim-drivers]
front_lidar = { simulated = true, noise = 0.01 }

# New syntax
[hardware]
arm = { use = "dynamixel", port = "/dev/ttyUSB0" }
lidar = { use = "horus-driver-rplidar" }
conveyor = { use = "ConveyorDriver" }
front_lidar = { use = "rplidar-driver" }
imu = { use = "adafruit-bno055", bus = 1 }
camera = { use = "exec:./realsense_bridge" }
# Inline sim = true instead
front_lidar_sim = { use = "rplidar", sim = true, noise = 0.01 }
```

The old `[drivers]` section name and the six source keys (`terra`, `package`, `node`, `crate`, `pip`, `exec`) are still parsed for backward compatibility. HORUS maps them to the `use` field internally. Likewise, `[sim-drivers]` entries with `simulated = true` are mapped to `sim = true`. No migration is required for existing projects, but new projects should use the `[hardware]` + `use` syntax.

---

## `[scripts]`

Custom project commands. Like npm scripts or Justfiles.

**Type:** Table of `name = "command"` string pairs

```toml
[scripts]
sim = "horus sim start --world warehouse"
deploy-pi = "horus deploy robot@192.168.1.5 --release"
test-hw = "cargo test --features hardware -- --ignored"
```

**Running scripts:**

```bash
horus run sim            # checks [scripts] before looking for files
horus scripts sim        # explicit script execution
horus scripts            # list all available scripts
horus scripts sim -- -v  # pass extra args after --
```

**Hook integration:** Custom hook names in `[hooks]` that aren't `fmt`, `lint`, or `check` are looked up in `[scripts]`.
For example, `post_test = ["clean-shm"]` runs the `clean-shm` script after tests. --- ## `[hooks]` Pre/post action hooks that run automatically before or after `horus run`, `horus build`, and `horus test`. | Field | Type | Default | Description | |-------|------|---------|-------------| | `pre_run` | Array of strings | `[]` | Run before `horus run` | | `pre_build` | Array of strings | `[]` | Run before `horus build` | | `pre_test` | Array of strings | `[]` | Run before `horus test` | | `post_test` | Array of strings | `[]` | Run after `horus test` | **Built-in hook names:** `fmt`, `lint`, `check`. Any other name is looked up in `[scripts]`. ```toml [hooks] pre_run = ["fmt", "lint"] # auto-format and lint before every run pre_test = ["lint"] # lint before testing post_test = ["clean-shm"] # custom script (defined in [scripts]) ``` **Skipping hooks:** ```bash horus run --no-hooks # skip all hooks for this run horus test --no-hooks # skip hooks for this test ``` If any hook fails (non-zero exit), the command is aborted and the error is shown. **Built-in hook names and what they do:** | Name | Runs | |------|------| | `fmt` | `horus fmt` (rustfmt + ruff format) | | `lint` | `horus lint` (clippy + ruff check) | | `check` | `horus check` (validate horus.toml + source) | Any other name is looked up in `[scripts]`. If not found in scripts, it's executed as a shell command. **Note:** Only `pre_run`, `pre_build`, `pre_test`, and `post_test` exist. There are no `post_run` or `post_build` hooks — the process exits after run/build completes. --- ## `[ignore]` Patterns to exclude from HORUS file scanning and processing. ### `files` **Type:** Array of glob patterns ```toml [ignore] files = ["debug_*.py", "test_*.rs", "**/experiments/**"] ``` ### `directories` **Type:** Array of directory names ```toml [ignore] directories = ["old/", "experiments/", "benchmarks/"] ``` ### `packages` **Type:** Array of package name strings Skip specific packages during auto-install. 
```toml [ignore] packages = ["ipython", "debugpy"] ``` --- ## `enable` Top-level array of capability flags to enable. **Type:** Array of strings (at the top level, outside any table) ```toml enable = ["cuda", "editor"] ``` Capabilities are passed to the build system as feature flags. Use `horus run --enable cuda` for one-off activation without editing `horus.toml`. **Available capabilities:** | Capability | Description | |-----------|-------------| | `cuda`, `gpu` | CUDA GPU acceleration | | `editor` | Scene editor UI | | `python`, `py` | Python bindings | | `headless` | No rendering (for training/CI) | | `gpio`, `i2c`, `spi`, `can`, `serial` | Hardware interface support | | `opencv` | OpenCV backend | | `realsense` | Intel RealSense support | | `full` | Enable all features | --- ## `[cpp]` C++ build configuration. Only needed for projects with C++ code. | Field | Type | Default | Description | |-------|------|---------|-------------| | `compiler` | String | None | Override C++ compiler (e.g., `"clang++"`) | | `cmake_args` | Array of strings | `[]` | Additional CMake arguments | | `toolchain` | String | None | Cross-compilation target (e.g., `"aarch64"`, `"armv7"`) | ```toml [cpp] compiler = "clang++" cmake_args = ["-DCMAKE_BUILD_TYPE=Release", "-DBUILD_TESTS=ON"] toolchain = "aarch64" ``` --- ## `[robot]` Robot-specific metadata. Used by the simulator, URDF loading, and topic namespacing. | Field | Type | Default | Description | |-------|------|---------|-------------| | `name` | String | None | Robot name, used as a namespace prefix in topic names (e.g., `turtlebot.imu`) | | `description` | String | None | Path to the robot's URDF file, relative to the project root | | `simulator` | String | `"sim3d"` | Simulator plugin to use when running in simulation mode | ```toml [robot] name = "turtlebot" description = "robot.urdf" simulator = "sim3d" ``` The `name` field sets the robot identity for the session. 
When present, some drivers and tools use it to namespace topics (e.g., a lidar driver might publish to `turtlebot.scan` instead of `scan`). The `description` field points to a URDF file that describes the robot's kinematic structure. The simulator and transform frame system both read this file to build the robot model. The path is relative to the project root: ``` my_project/ ├── horus.toml ├── robot.urdf ← description = "robot.urdf" ├── models/ │ └── arm.urdf ← description = "models/arm.urdf" └── src/ └── main.rs ``` The `simulator` field selects which simulator plugin to launch when you run `horus run --sim`. The default is `"sim3d"` (the built-in Bevy 3D physics simulator). To use a different simulator plugin, install it from the registry and set its name here: ```bash horus install my-custom-sim ``` ```toml [robot] simulator = "my-custom-sim" ``` --- ## The `.horus/` Directory The `.horus/` directory is **automatically generated** by HORUS. You should never edit files inside it. ``` my_project/ ├── horus.toml ← You edit this (single source of truth) ├── src/ │ ├── main.rs │ └── main.py └── .horus/ ← Generated (don't edit) ├── Cargo.toml ← Generated from horus.toml [dependencies] (Rust deps) ├── pyproject.toml ← Generated from horus.toml [dependencies] (Python deps) ├── CMakeLists.txt ← Generated from horus.toml [dependencies] (C++ deps) ├── target/ ← Rust build artifacts ├── cpp-build/ ← CMake build artifacts └── packages/ ← Cached registry packages ``` **Git:** Always add `.horus/` to `.gitignore`. The `horus new` command does this automatically. **Cleaning:** If `.horus/` gets corrupted: ```bash horus clean --all # delete everything, regenerated on next build ``` **When `.horus/` is created:** Automatically on `horus run`, `horus build`, or `horus install`. --- ## Validation HORUS validates `horus.toml` on every command (`horus run`, `horus build`, `horus check`, etc.). 
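As a rough illustration of what such a check involves, the sketch below applies the `name` and `version` rules documented in the `[package]` section: 2-64 characters from the allowed set, the 14 reserved names, and a three-segment version. It is a minimal Python approximation, not the validator HORUS ships. The function name `validate_package` and the wording of the invalid-name message are assumptions, and pre-release or build suffixes are deliberately not modeled.

```python
import re

# Reserved names as listed in the `name` field reference.
RESERVED = {
    "horus", "core", "std", "lib", "test", "main", "admin", "api",
    "root", "system", "internal", "config", "setup", "install",
}
# 2-64 characters: lowercase alphanumerics plus - _ @ /
NAME_RE = re.compile(r"[a-z0-9@/_-]{2,64}")
# Exactly three numeric segments: MAJOR.MINOR.PATCH
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def validate_package(name: str, version: str) -> list[str]:
    """Return a list of error messages; an empty list means both fields pass."""
    errors = []
    if not NAME_RE.fullmatch(name):
        errors.append(f"Invalid package name '{name}'")  # hypothetical wording
    elif name in RESERVED:
        errors.append(f"Package name '{name}' is reserved")
    if not VERSION_RE.fullmatch(version):
        errors.append(f"Invalid version format '{version}'")
    return errors


if __name__ == "__main__":
    for name, version in [("my-robot", "0.1.0"), ("horus", "1.0.0"), ("my-robot", "1.0")]:
        print(name, version, validate_package(name, version))
```

Returning every problem at once, rather than stopping at the first, is a design choice of this sketch so that a single run reports all fields that need fixing.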
### Validation Rules | Rule | Details | |------|---------| | **Name** | 2-64 chars, lowercase alphanumeric + `-_@/`. 14 reserved names rejected. | | **Version** | Must be valid semver: `MAJOR.MINOR.PATCH` (e.g., `"1.0.0"`, not `"1.0"`) | | **Edition** | Only `"1"` is recognized. Unknown editions produce a warning, not an error. | | **Sources** | Must be one of: `registry`, `crates.io`, `pypi`, `system`, `path`, `git` | | **Hardware** | Each device entry must have a `use` field. Legacy source keys (`terra`, `package`, `node`, `crate`, `pip`, `exec`) are accepted for backward compatibility. | | **Workspace** | Virtual workspace (no `[package]`) must have at least one member | ### Source auto-inference You can omit `source` when `path` or `git` is present — HORUS infers the source: ```toml # No source needed — inferred from path my-lib = { path = "../my-lib" } # → DepSource::Path # No source needed — inferred from git my-driver = { git = "https://..." } # → DepSource::Git ``` ### Common Errors | Error | Fix | |-------|-----| | `Missing required field 'name'` | Add `name` and `version` under `[package]` | | `Invalid version format '1.0'` | Use `MAJOR.MINOR.PATCH` format: `"1.0.0"` (3 segments required) | | `Unknown dependency source 'npm'` | Use: `registry`, `crates.io`, `pypi`, `system`, `path`, `git` | | `Invalid TOML syntax at line N` | Check TOML syntax: keys use `=`, strings are quoted, tables use `[brackets]` | | `Package name 'horus' is reserved` | Choose a different name (14 names are reserved) | | `Virtual workspace must have members` | Add at least one path to `[workspace] members = [...]` | --- ## Design Decisions **Why one file instead of per-language configs?** A robot using Rust for control and Python for ML traditionally needs `Cargo.toml`, `pyproject.toml`, and possibly `requirements.txt`. When a team member adds a dependency, they need to know which file to edit. 
`horus.toml` is the single source of truth: all dependencies are declared once with an explicit `source` field. HORUS generates the native build files into `.horus/` automatically. One file to learn, one file to review in PRs, one file to validate in CI. **Why generated build files in .horus/?** Native build tools (cargo, pip, cmake) need their own config files to function. Rather than forcing users to maintain both `horus.toml` and `Cargo.toml` in sync, HORUS generates the native files into `.horus/`. Users never edit these files. `cargo build` and `pip install` still work under the hood with their standard tooling — HORUS generates their input, it does not replace build systems. **Why TOML?** YAML has implicit type coercion (`3.10` becomes `3.1`, `yes` becomes `true`). TOML has an unambiguous grammar — every value has an explicit type. HORUS still uses YAML for launch configs and parameter files where flexibility is useful, but the manifest uses TOML because dependency versions must be parsed unambiguously. 
## Trade-offs | Gain | Cost | |------|------| | **Single manifest** — one file for all languages | Must specify `source` for non-default registries | | **Generated build files** — native tooling works unchanged | Cannot use `Cargo.toml` features not exposed by `horus.toml` | | **`horus add/remove`** — no need to know per-language syntax | `cargo add` and `pip install` don't update `horus.toml` (use `horus cargo add` or `horus pip install` for auto-sync) | | **Workspace support** — multi-crate with shared deps | Members still need their own `horus.toml` | --- ## See Also - [horus.toml Concept](/concepts/horus-toml) — Why a single manifest - [Package Management](/package-management/package-management) — Install, search, publish - [CLI Reference](/development/cli-reference) — `horus add`, `horus build`, `horus run` - [Multi-Crate Workspaces](/development/workspaces) — Workspace setup guide - [Native Tool Integration](/development/native-tools) — `horus cargo`, `horus pip` proxy --- ## Publishing & Registry Path: /package-management/publishing Description: Publish packages, manage versions, and use the HORUS registry — the package ecosystem for robotics # Publishing & Registry The [HORUS Registry](https://registry.horusrobotics.dev) is the package ecosystem for robotics — drivers, algorithms, plugins, and tools shared across the community. This page covers the author/publisher workflow. For installing packages, see [Package Management](/package-management/package-management). > **Full guide**: The registry website has a comprehensive [Publisher Guide](https://registry.horusrobotics.dev/guide) with screenshots and detailed walkthroughs. This page is a quick reference. 
--- ## Authentication ```bash # Login with GitHub (opens browser) horus auth login # Check who you're logged in as horus auth whoami # Generate API key for CI/CD publishing horus auth api-key --name github-actions --environment ci-cd # Generate Ed25519 signing key for package signing horus auth signing-key # List and revoke API keys horus auth keys list horus auth keys revoke # Logout horus auth logout ``` --- ## Publishing a Package ### 1. Prepare your package Your `horus.toml` must have at minimum: ```toml [package] name = "my-lidar-driver" version = "0.1.0" description = "RPLiDAR A2 driver for HORUS" license = "MIT" package-type = "driver" categories = ["lidar", "driver"] ``` ### 2. Validate ```bash horus publish --dry-run # Validates: horus.toml, source files, builds, checks package size ``` ### 3. Publish ```bash horus publish ``` The registry validates the package, builds it, and makes it available for `horus install` and `horus add`. **Rules**: - Package names must be unique across the registry - Once a version is published, that version number cannot be reused — bump the version for changes - Version must follow semver: `MAJOR.MINOR.PATCH` --- ## Version Management ### Yanking (reversible) Yanking hides a version from new installs but doesn't break existing users who already have it: ```bash # Yank a broken version horus yank my-lidar-driver@0.1.0 --reason "bug in scan parsing" # Reverse a yank horus unyank my-lidar-driver@0.1.0 ``` ### Unpublishing (irreversible) Permanently deletes a version. Use with caution — other packages may depend on it: ```bash horus unpublish my-lidar-driver@0.1.0 --yes ``` ### Deprecating a Package Mark an entire package as deprecated to discourage new installs: ```bash # Deprecate with a migration message horus deprecate my-lidar-driver -m "Use rplidar-driver-v2 instead" # Remove deprecation horus undeprecate my-lidar-driver ``` --- ## Ownership & Transfer Packages can have multiple owners. 
Any owner can publish new versions and manage other owners. ```bash # List current owners horus owner list my-lidar-driver # Add a co-owner horus owner add my-lidar-driver --user teammate # Remove an owner horus owner remove my-lidar-driver --user former-teammate # Transfer ownership to another user horus owner transfer my-lidar-driver --target new-maintainer # Accept a pending transfer horus owner accept ``` --- ## CI/CD Publishing Automate publishing from GitHub Actions or GitLab CI: ```yaml # .github/workflows/publish.yml name: Publish to HORUS Registry on: push: tags: ['v*'] jobs: publish: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Install HORUS run: curl -fsSL https://github.com/softmata/horus/raw/release/install.sh | bash - name: Publish run: horus publish env: HORUS_API_KEY: ${{ secrets.HORUS_API_KEY }} ``` Generate the API key: ```bash horus auth api-key --name github-actions --environment ci-cd # Save the key as a GitHub secret: HORUS_API_KEY ``` --- ## Developer Portal The [Developer Portal](https://registry.horusrobotics.dev/developer) is a web dashboard for managing your published packages. 
After signing in with GitHub, you can: - **Edit metadata** — description, categories, keywords, README - **Browse source files** — syntax-highlighted file browser - **View download stats** — charts showing trends over time - **Manage owners** — add/remove co-owners, transfer ownership - **Yank/unyank versions** — via web UI instead of CLI - **Deprecate packages** — with a migration message - **Upload media** — screenshots, GIFs, video embeds - **View build status** — trigger verification, view logs - **Generate API keys** — for CI/CD publishing - **SBOM viewer** — software bill of materials - **License compatibility** — check across dependency tree --- ## Package Types | Type | Description | Example | |------|-------------|---------| | `driver` | Hardware driver | RPLiDAR, Dynamixel, RealSense | | `algorithm` | Algorithm/library | SLAM, path planning, PID | | `plugin` | CLI extension | `horus sim3d`, `horus visualize` | | `tool` | Development tool | Linter, formatter, analyzer | | `model` | ML model | YOLO, depth estimation | | `message` | Custom message types | Industry-specific protocols | | `app` | Complete application | Warehouse robot, delivery bot | | `node` | Reusable node | Sensor fusion, motor controller | Set in `horus.toml`: ```toml [package] package-type = "driver" ``` --- ## Common Errors | Error | Cause | Fix | |-------|-------|-----| | `Not authenticated` | Not logged in | `horus auth login` | | `Package name already taken` | Name conflict | Choose a different name | | `Version already published` | Can't reuse versions | Bump version in horus.toml | | `Registry unavailable` | Network issue or registry down | Check https://api.horusrobotics.dev/health | | `Package too large` | Exceeds size limit | Check `.horusignore`, exclude test data/binaries | | `Missing required field` | horus.toml incomplete | Add `description`, `license`, `package-type` | --- ## See Also - [Package Management](/package-management/package-management) — Installing and managing 
packages - [Using Pre-Built Nodes](/package-management/using-prebuilt-nodes) — Compose systems from registry packages - [Configuration Reference](/package-management/configuration) — `horus.toml` format - [CLI Reference](/development/cli-reference) — All publish/auth/owner commands - [Registry Guide](https://registry.horusrobotics.dev/guide) — Full publisher guide on the registry website ======================================== # SECTION: Performance ======================================== --- ## Performance Optimization Path: /performance/performance Description: Get maximum performance from HORUS # Performance Optimization HORUS is already fast by default. This guide helps you squeeze out extra performance when needed. ## Problem Statement Your HORUS application works correctly but you need to reduce latency, increase throughput, or meet real-time deadlines for production deployment. ## When To Use - Your node `tick()` exceeds 1ms and you need to find the bottleneck - You are deploying to resource-constrained hardware (Raspberry Pi, Jetson Nano) - You need bounded latency for safety-critical control loops - Your throughput doesn't meet requirements for high-frequency sensors ## Prerequisites - A working HORUS application (optimize correctness first, then performance) - Release builds (`horus run --release`) -- debug builds are 10-100x slower - Basic understanding of [Nodes](/concepts/core-concepts-nodes) and [Scheduler](/concepts/core-concepts-scheduler) --- ## Cross-Platform Philosophy HORUS is designed for **development on any OS** with **production deployment on Linux**: | Phase | Supported Platforms | Performance | |-------|---------------------|-------------| | **Development** | Windows, macOS, Linux | Good (standard IPC) | | **Testing** | Windows, macOS, Linux | Good (standard IPC) | | **Production** | Linux (recommended) | Best (sub-100ns with RT) | All performance features use **graceful degradation** — your code runs everywhere, with maximum performance 
on Linux. Advanced features like `.prefer_rt()` (which enables SCHED_FIFO and mlockall on Linux) and SIMD acceleration automatically fall back to safe defaults on unsupported platforms. ## Why HORUS is Fast ### Shared Memory Architecture **Zero network overhead**: Data written to shared memory, read directly by subscribers HORUS automatically selects the optimal shared memory backend for your platform (Linux, macOS, Windows). No configuration needed. **Zero serialization**: Fixed-size structs copied directly to shared memory **Zero-copy loan pattern**: Publishers write directly to shared memory slots ### Data Path Where does latency actually go? This diagram shows the publish-to-subscribe path: ``` publish() │ ▼ ┌─────────────────────┐ │ POD type? │ │ yes → memcpy │ ◄── ~6ns │ no → serialize │ ◄── 50-500ns └─────────────────────┘ │ ▼ ┌─────────────────────┐ │ Ring buffer slot │ (shared memory) └─────────────────────┘ │ │ atomic index update ◄── ~2ns │ ▼ recv() │ ▼ ┌─────────────────────┐ │ POD → memcpy │ │ else → deserialize │ └─────────────────────┘ ``` **Where time is spent:** - **POD path** (fixed-size `Copy` types): ~11ns for the memcpy, ~3ns for the atomic — total **14ns** same-thread (measured via RDTSC on i9-14900K) - **Serde path** (types with `Vec`, `String`, etc.): serialization dominates, typically 50-500ns depending on size - **Cross-thread overhead**: cache-line transfer adds ~68ns for SPSC, ~164ns for contended MPMC This is why fixed-size types matter: they skip serialization entirely. ### Optimized Data Structures HORUS uses carefully optimized memory layouts to minimize latency. The communication paths are designed for maximum throughput with predictable timing — same-thread paths achieve **14ns**, cross-thread SPSC achieves **82ns**, and cross-process achieves **162ns** with only 99ns overhead over the 63ns hardware floor. ## Benchmark Results Measured on **Intel i9-14900K (32 cores), WSL2, release mode, RDTSC timing**. 
Run `cargo run --release -p horus_benchmarks --bin all_paths_latency` to reproduce on your hardware. For detailed methodology and raw data, see the dedicated **[Benchmarks](/performance/benchmarks)** page. ### IPC Latency (All Backend Paths) | Scenario | Backend | p50 | p99 | p99.9 | max | |----------|---------|-----|-----|-------|-----| | Same thread | DirectChannel | **12ns** | 13ns | 13ns | 13ns | | Cross-thread 1:1 | SpscIntra | **91ns** | 107ns | 125ns | 125ns | | Cross-thread 1:N | SpmcIntra | **80ns** | 92ns | 94ns | 94ns | | Cross-thread N:1 | MpscIntra | **187ns** | 372ns | 458ns | 464ns | | Cross-thread N:N | FanoutIntra | **150ns** | 307ns | 322ns | 322ns | | Cross-process 1:1 | SpscShm | **171ns** | 192ns | 195ns | 195ns | | Cross-process MPMC | FanoutShm | **91ns** | 230ns | — | — | | Cross-process broadcast | PodShm | **152ns** | 227ns | 254ns | 254ns | | Hardware floor (raw SHM atomic) | — | **57ns** | — | — | — | Framework overhead: ~99ns over the 63ns hardware floor for cross-process 1:1. ### Robotics Message Types | Message | Size | Median | p99 | Throughput | |---------|------|--------|-----|-----------| | CmdVel | 16B | **89ns** | 91ns | 11.1M msg/s | | Imu | 304B | **119ns** | 150ns | 7.8M msg/s | | LaserScan | 1,480B | **151ns** | 184ns | 6.3M msg/s | | JointCommand | 928B | **128ns** | 157ns | 8.1M msg/s | All message types pass real-time suitability: CmdVel at 10kHz (p99=91ns), Imu at 500Hz (p99=150ns), LaserScan at 40Hz (p99=184ns). 
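To put the suitability figures above in context, it helps to express each p99 latency as a share of the message period. The following is a minimal sketch of that arithmetic using the table values; the helper name `latency_share_of_period` is illustrative and not part of HORUS.

```python
# Share of one message period consumed by p99 IPC latency.
# Rates and p99 values are copied from the table above;
# the helper name is hypothetical, not a HORUS API.

def latency_share_of_period(p99_ns: float, rate_hz: float) -> float:
    """Return p99 latency as a fraction of the period at rate_hz."""
    period_ns = 1e9 / rate_hz
    return p99_ns / period_ns


# message name -> (p99 latency in ns, publish rate in Hz)
MESSAGES = {
    "CmdVel": (91.0, 10_000.0),
    "Imu": (150.0, 500.0),
    "LaserScan": (184.0, 40.0),
}

shares = {
    name: latency_share_of_period(p99, rate)
    for name, (p99, rate) in MESSAGES.items()
}

for name, share in shares.items():
    print(f"{name}: {share * 100:.5f}% of the period")
```

Even the tightest case, CmdVel at 10kHz, spends roughly 0.09% of its 100μs period on IPC, which leaves nearly the whole period for the node's own work.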
### Python Binding Performance | Operation | p50 | p99 | Throughput | Overhead vs Rust | |-----------|-----|-----|-----------|-----------------| | CmdVel send+recv (typed) | **1.7μs** | 2.4μs | 2.7M msg/s | 78x (GIL + PyO3) | | Pose2D send+recv (typed) | **1.7μs** | 3.0μs | 2.7M msg/s | 76x | | Imu send+recv (typed) | **1.9μs** | 4.2μs | 2.4M msg/s | 63x | | dict send+recv (1 key) | **6.2μs** | 19.9μs | 714K msg/s | 284x | | dict send+recv (4 keys) | **12.4μs** | 34.2μs | 382K msg/s | 564x | | dict send+recv (50 keys) | **111μs** | 196μs | 42K msg/s | 5,065x | | Image.to_numpy (640x480) | **3.0μs** | 14.7μs | 1.5M/s | — | | np.from_dlpack (640x480) | **1.1μs** | 3.9μs | 3.5M/s | Zero-copy | **Key takeaway**: Typed Python topics (`Topic(CmdVel)`) achieve **1.7μs** — fast enough for 30Hz ML inference pipelines. Generic dict topics are 4-60x slower due to serialization. DLPack gives true zero-copy image access at **1.1μs**. **Scheduler tick overhead**: Python GIL acquire adds ~11μs per tick (Rust→Python→Rust). At 10kHz target, achieved 5,932 Hz — GIL is the bottleneck. For high-frequency control, use Rust nodes. 
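The effect of the ~11μs per-tick GIL cost can be estimated with simple arithmetic: subtract it from the tick period to see how much time is left for user code. This is a rough sketch that takes the 11μs figure quoted above as given; the helper names are hypothetical.

```python
# Time left for user code in each Python tick once the GIL round trip is paid.
# GIL_OVERHEAD_US is the figure quoted above; helper names are hypothetical.

GIL_OVERHEAD_US = 11.0


def period_us(rate_hz: float) -> float:
    """Length of one tick period in microseconds."""
    return 1e6 / rate_hz


def remaining_budget_us(rate_hz: float, overhead_us: float = GIL_OVERHEAD_US) -> float:
    """Microseconds left per tick after the fixed overhead."""
    return period_us(rate_hz) - overhead_us


def overhead_share(rate_hz: float, overhead_us: float = GIL_OVERHEAD_US) -> float:
    """Fraction of each period consumed by the fixed overhead."""
    return overhead_us / period_us(rate_hz)


for rate in (30, 1_000, 10_000):
    print(
        f"{rate:>6} Hz: {remaining_budget_us(rate):10.1f} us left, "
        f"{overhead_share(rate) * 100:5.2f}% overhead"
    )
```

At 1kHz the fixed cost is about 1% of the period; at 10kHz it is 11% before any user code runs, which is why the guidance above points high-frequency control at Rust nodes.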
### HORUS vs Competition | Transport | Size | p50 | Throughput | Speedup | |-----------|------|-----|-----------|---------| | **HORUS SHM** | 8B | **23ns** | 100M+ msg/s | — | | Raw UDP | 8B | 1,235ns | 3.9M msg/s | **54x slower** | | **HORUS SHM** | 32B | **23ns** | 101M+ msg/s | — | | Raw UDP | 32B | 1,122ns | 4.1M msg/s | **49x slower** | ### Scalability | Topology | Throughput | |----------|-----------| | 1 pub, 1 sub | 2.4M msg/s | | 2 pub, 1 sub | 7.2M msg/s | | 4 pub, 1 sub | 11.8M msg/s | | 4 pub, 4 sub | 11.2M msg/s | | 8 pub, 8 sub | 8.4M msg/s | | Peak (6 pub, 1 sub) | **13.5M msg/s** | ### Real-Time Determinism | Metric | Value | |--------|-------| | Median latency | 86ns | | p99 | 109ns | | p99.9 | 112ns | | Std dev | 7.9ns | | Deadline misses at 1μs | 0.02% (212/1M) | ### Hardware Baselines (Raw Operations) | Operation | Size | Median | |-----------|------|--------| | memcpy | 8B | 11ns | | memcpy | 1KB | 17ns | | memcpy | 8KB | 49ns | | memcpy | 64KB | 811ns | | Atomic store+load | 8B | 11ns | | mmap write+read | 8B | 11ns | ### Running Benchmarks ```bash # All backend paths (main benchmark, ~2 min) cargo run --release -p horus_benchmarks --bin all_paths_latency # Robotics message types (CmdVel, Imu, LaserScan) cargo run --release -p horus_benchmarks --bin robotics_messages_benchmark # HORUS vs UDP comparison cargo run --release -p horus_benchmarks --bin competitor_comparison # Scalability (thread count sweep) cargo run --release -p horus_benchmarks --bin scalability_benchmark # Hardware floor baselines cargo run --release -p horus_benchmarks --bin raw_baselines # RT determinism analysis cargo run --release -p horus_benchmarks --bin determinism_benchmark # Full suite (~30 min) ./benchmarks/research/run_all.sh ``` ## Build Optimization ### Always Use Release Mode Debug builds are **10-100x slower**: ```bash # SLOW: Debug build (50us/tick) horus run # FAST: Release build (500ns/tick) horus run --release ``` Always use `--release` for benchmarks and 
production. There is no scenario where profiling debug builds gives useful results.

### Link-Time Optimization (LTO)

Enable LTO in your `Cargo.toml` for an additional 10-20% speedup:

```toml
# Cargo.toml
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
```

**Warning**: Fat LTO with a single codegen unit makes compilation noticeably slower; the payoff is faster execution.

### Target CPU Features

**CPU-Specific Optimizations:** HORUS compiles with Rust compiler optimizations enabled in release mode and is tuned for modern x86-64 and ARM64 processors.

**Gains**: 5-15% from CPU-specific SIMD instructions (automatically enabled in release builds).

### Hardware Acceleration

HORUS automatically uses hardware-accelerated memory operations when available (e.g., SIMD on x86_64). No configuration needed: your code runs on any platform, with extra performance on supported hardware.

For maximum performance, compile targeting your specific CPU. A binary built this way may fail to run on CPUs that lack the build machine's instruction set extensions, so build on (or for) the deployment hardware:

```bash
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

## Message Optimization

### Use Fixed-Size Types

```rust
// simplified
// FAST: Fixed-size array
pub struct LaserScan {
    pub ranges: [f32; 360], // Stored inline in the struct, no heap pointer
}

// SLOW: Dynamic vector
pub struct BadLaserScan {
    pub ranges: Vec<f32>, // Heap-allocated
}
```

**Impact**: Fixed-size types avoid heap allocations in the hot path.

### Choose Typed Messages Over Generic

```rust
// simplified
// FAST: Small, fixed-size struct
let topic: Topic<Pose2D> = Topic::new("pose")?;
topic.send(Pose2D { x: 1.0, y: 2.0, theta: 0.5 });
// IPC latency: ~23-155ns depending on topology

// SLOWER: Larger struct with more data
let topic: Topic = Topic::new("sensors")?;
// Latency scales linearly with message size
```

**Rule**: Use the smallest struct that represents your data. Avoid padding and unused fields.
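Padding is easy to overlook, so here is a small illustration of how field order alone changes the size of a fixed-layout struct. It uses Python's `ctypes`, which follows the platform's C layout rules; the two struct definitions are hypothetical examples, not HORUS message types. In Rust, the default representation lets the compiler reorder fields, while `#[repr(C)]` keeps declaration order; whether HORUS messages use `#[repr(C)]` is an assumption to check against the message definitions.

```python
import ctypes


class PaddedPose(ctypes.Structure):
    # Small fields interleaved with 8-byte fields force alignment padding.
    _fields_ = [
        ("valid", ctypes.c_uint8),
        ("x", ctypes.c_double),
        ("frame", ctypes.c_uint8),
        ("y", ctypes.c_double),
    ]


class CompactPose(ctypes.Structure):
    # Same fields, largest first: the small fields share one padded tail.
    _fields_ = [
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
        ("valid", ctypes.c_uint8),
        ("frame", ctypes.c_uint8),
    ]


padded_size = ctypes.sizeof(PaddedPose)
compact_size = ctypes.sizeof(CompactPose)
payload_size = 2 * ctypes.sizeof(ctypes.c_double) + 2 * ctypes.sizeof(ctypes.c_uint8)

print(f"payload bytes: {payload_size}")
print(f"interleaved layout: {padded_size} bytes")
print(f"largest-first layout: {compact_size} bytes")
```

On a typical 64-bit platform the interleaved layout is a third larger than the largest-first one for the same 18 bytes of payload, and every padded byte is copied on each send.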
### Choose Appropriate Precision

```rust
// simplified
// f32 (single precision) - sufficient for most robotics
pub struct FastPose {
    pub x: f32, // 4 bytes
    pub y: f32, // 4 bytes
}

// f64 (double precision) - scientific applications
pub struct PrecisePose {
    pub x: f64, // 8 bytes
    pub y: f64, // 8 bytes
}
```

**Rule**: Use `f32` (about 7 significant decimal digits) unless you need scientific precision.

### Minimize Message Size

Every byte adds latency. Message size is the single biggest factor after backend selection.

```rust
// simplified
// GOOD: 8 bytes, fast memcpy
struct CompactCmd {
    linear: f32,  // 4 bytes
    angular: f32, // 4 bytes
}

// BAD: 1KB+ of unnecessary bulk
struct BloatedCmd {
    linear: f32,
    angular: f32,
    metadata: [u8; 256],   // Unused
    debug_info: [u8; 768], // Unused
}
```

For genuinely large data (images, point clouds), compress before publishing:

```rust
// simplified
// Large raw image: 1MB per message
pub struct RawImage {
    pixels: [u8; 1_000_000],
}

// Compressed: ~50KB per message (20x smaller)
pub struct CompressedImage {
    data: Vec<u8>, // JPEG compressed
}
```

Compression adds CPU cost, and a `Vec<u8>` payload takes the Serde path rather than the POD memcpy path, but it dramatically reduces IPC time for large payloads. Profile to find the crossover point for your use case (typically around 10KB+).

### Batch Small Messages

Instead of sending 100 separate f32 values:

```rust
// simplified
// SLOW: 100 separate messages
for value in values {
    topic.send(value); // 100 IPC operations
}

// FAST: One batched message
pub struct BatchedData {
    values: [f32; 100],
}
topic.send(batched); // 1 IPC operation
```

**Speedup**: 50-100x for batched operations.

## Node Optimization

### Keep tick() Fast

Target: **<1ms per tick** for real-time control.

```rust
// simplified
// GOOD: Fast tick
fn tick(&mut self) {
    let data = self.read_sensor(); // Quick read
    self.process_pub.send(data);   // ~500ns
}

// BAD: Slow tick
fn tick(&mut self) {
    let data = std::fs::read_to_string("config.yaml").unwrap(); // 1-10ms!
    // ...
}
```

**File I/O, network calls, sleeps = slow**.
Do these in `init()` or use the `.async_io()` execution class for I/O-bound nodes.

### Pre-Allocate in init()

Heap allocations in `tick()` are one of the most common performance killers. The allocator may need to request memory from the OS, which can take microseconds and is unpredictable.

```rust
// simplified
struct MyNode {
    buffer: Vec<f32>, // Pre-allocated storage
    device: Device,
    config: Config,
}

fn init(&mut self) -> Result<()> {
    // Pre-allocate buffers
    self.buffer = vec![0.0; 10000];
    // Open connections
    self.device = Device::open()?;
    // Load configuration
    self.config = Config::from_file("config.yaml")?;
    Ok(())
}

fn tick(&mut self) {
    // Use pre-allocated resources: no allocations here!
    self.buffer[0] = self.device.read();
    // Reuse self.buffer instead of creating new Vecs
}
```

Common hidden allocations to watch for: `format!()`, `String::from()`, `Vec::push()` past capacity, `collect()`, `to_string()`.

### Avoid Unnecessary Cloning

```rust
// simplified
// BAD: Unnecessary clone
fn tick(&mut self) {
    if let Some(data) = self.sub.recv() {
        let copy = data.clone(); // Unnecessary!
        self.process(copy);
    }
}

// GOOD: Direct use
fn tick(&mut self) {
    if let Some(data) = self.sub.recv() {
        self.process(data); // Already cloned by recv()
    }
}
```

`Topic::recv()` already returns its own copy of the data. Don't clone again.

### Minimize Logging in Hot Paths

Logging involves formatting strings (allocation), writing to a sink (I/O), and often acquiring a lock. In a 1kHz control loop, that overhead adds up fast.

```rust
// simplified
// BAD: Logging every tick at 60Hz = 60 format! + write calls/sec
fn tick(&mut self) {
    hlog!(debug, "Tick #{}", self.counter); // Slow!
    self.counter += 1;
}

// GOOD: Conditional logging, 1 log per 1000 ticks
fn tick(&mut self) {
    if self.counter % 1000 == 0 {
        hlog!(info, "Reached tick #{}", self.counter);
    }
    self.counter += 1;
}
```

**Rule**: Log sparingly in hot paths. Use `horus monitor` for real-time metrics instead of printf-style debugging.
## Scheduler Optimization

### Understanding Tick Rate

The default scheduler runs at 100 Hz (10ms per tick). Use `.tick_rate()` to change it:

```rust
// simplified
// Default: 100 Hz
let scheduler = Scheduler::new();

// 10kHz for high-performance control loops
let scheduler = Scheduler::new().tick_rate(10000_u64.hz());
```

**Key Point**: Keep each node's tick() well inside the scheduler period (10ms at 100 Hz, 100μs at 10kHz), ideally <1ms at the default rate, so the scheduler can hold its target tick rate.

### Use Priority Levels

```rust
// simplified
// Critical tasks run first (order 0 = highest)
scheduler.add(safety).order(0).build()?;

// Logging runs last (order 100 = lowest)
scheduler.add(logger).order(100).build()?;
```

A predictable execution order keeps latency-critical nodes from waiting behind housekeeping work. Use lower numbers for higher-priority tasks.

### Minimize Node Count

```rust
// simplified
// BAD: 50 small nodes
for i in 0..50 {
    scheduler.add(TinyNode::new(i)).order(50).build()?;
}

// GOOD: One aggregated node
scheduler.add(AggregatedNode::new()).order(50).build()?;
```

**Fewer nodes** = less scheduling overhead.

## Ultra-Low-Latency Networking (Linux)

HORUS provides optional kernel bypass networking for low-microsecond latency requirements.
### Transport Options | Transport | Latency (send+recv) | Throughput | Requirements | |-----------|---------|------------|--------------| | Shared Memory (same thread) | **14ns** | 100M+ msg/s | Local only | | Shared Memory (cross thread, 1:1) | **82ns** | 13M+ msg/s | Local only | | io_uring | 2-3us | 500K+ msg/s | Linux 5.1+ | | Batch UDP | 3-5us | 300K+ msg/s | Linux 3.0+ | | Standard UDP | 5-10us | 200K+ msg/s | Cross-platform | ### Enable io_uring Transport io_uring eliminates syscalls on the send path using kernel-side polling: ```bash # Build with io_uring support (Cargo feature flag) cargo build --release --features io-uring-net ``` **Requirements:** - Linux 5.1+ (5.6+ recommended for SQ polling) - CAP_SYS_NICE capability for SQ_POLL mode ### Batch UDP and Combined Features Batch UDP (`sendmmsg`/`recvmmsg`) is automatically enabled on Linux 3.0+ with no extra flags. To enable all ultra-low-latency features together (io_uring + batch UDP): ```bash cargo build --release --features ultra-low-latency ``` ### Smart Transport Selection For network topics, HORUS automatically selects the best transport based on available system features and kernel version. Configure network endpoints through topic configuration rather than the `Topic::new()` API (which creates local shared memory topics). See [Network Backends](/advanced/network-backends) for details. ## Shared Memory Optimization HORUS uses platform-native shared memory managed by `horus_sys` — you never need to manage paths manually. - **Check space**: `horus doctor` includes a shared memory space check. On Linux, tmpfs defaults to 50% of RAM — increase it if messages are being dropped. - **Cleanup**: `horus clean --shm` removes stale topics (rarely needed — cleanup is automatic). - **Memory footprint**: Each topic slot is proportional to message size (`Topic` = 16B/slot, `Topic` = 120KB/slot). Smaller messages = lower total shared memory usage. 
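A rough way to budget shared memory is to multiply the slot size by the ring capacity for each topic and compare the total with the tmpfs limit. The sketch below uses the two slot sizes quoted above; the ring capacity of 64 slots, the 1KB = 1024 bytes convention, and the 8 GiB machine are assumptions chosen for illustration, and per-topic header overhead is ignored.

```python
# Back-of-the-envelope shared memory budget.
# Slot sizes come from the text above; everything else is an assumption.

SLOT_SMALL_BYTES = 16          # 16B/slot
SLOT_LARGE_BYTES = 120 * 1024  # 120KB/slot, taking 1KB = 1024 bytes (assumption)
ASSUMED_CAPACITY = 64          # hypothetical ring capacity, not a HORUS default
ASSUMED_RAM_BYTES = 8 * 1024**3  # hypothetical 8 GiB machine


def topic_footprint_bytes(slot_bytes: int, capacity: int) -> int:
    """Payload bytes reserved by one topic's ring (headers ignored)."""
    return slot_bytes * capacity


def tmpfs_default_bytes(ram_bytes: int) -> int:
    """Linux tmpfs defaults to 50% of RAM, as noted above."""
    return ram_bytes // 2


small = topic_footprint_bytes(SLOT_SMALL_BYTES, ASSUMED_CAPACITY)
large = topic_footprint_bytes(SLOT_LARGE_BYTES, ASSUMED_CAPACITY)
limit = tmpfs_default_bytes(ASSUMED_RAM_BYTES)
max_large_topics = limit // large

print(f"small topic: {small} bytes")
print(f"large topic: {large} bytes")
print(f"large topics that fit in default tmpfs: {max_large_topics}")
```

Under these assumptions small topics are negligible and large-slot topics dominate the budget, so they are the first place to look if `horus doctor` reports low shared memory space.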
## Profiling and Measurement ### Built-In Metrics HORUS automatically tracks node performance metrics. Use `horus monitor` to view real-time performance data including tick duration, messages sent, and CPU usage. **Available metrics** (on `NodeMetrics`): - `total_ticks`: Total number of ticks - `avg_tick_duration_ms`: Average tick time in milliseconds - `max_tick_duration_ms`: Worst-case tick time in milliseconds - `messages_sent`: Messages published - `messages_received`: Messages received - `errors_count`: Total error count - `uptime_seconds`: Node uptime in seconds ### IPC Latency Logging HORUS automatically tracks IPC timing for each topic operation. The `horus monitor` web interface displays per-log-entry metrics: ``` Tick: 12us | IPC: 296ns ``` Each log entry includes `tick_us` (node tick time in microseconds) and `ipc_ns` (IPC write time in nanoseconds). ### CPU Profiling with perf and Flamegraphs `perf` is the standard Linux profiler. Combined with flamegraphs, it pinpoints exactly where CPU time goes. **Step 1: Record a profile** ```bash # Profile your HORUS application for 30 seconds perf record -g --call-graph dwarf -- horus run --release # Or profile an already-running process perf record -g --call-graph dwarf -p $(pidof horus) -- sleep 30 ``` The `-g --call-graph dwarf` flags capture full call stacks using DWARF debug info. If your binary is stripped, use `--call-graph fp` instead (requires frame pointers). **Step 2: Generate a flamegraph** ```bash # Install the tools (one-time) cargo install inferno # Convert perf data to a flamegraph perf script | inferno-collapse-perf | inferno-flamegraph > flame.svg ``` Open `flame.svg` in a browser — it is interactive (click to zoom). **Step 3: Read the flamegraph** - **Width of a bar** = proportion of total CPU time in that function. Wide bars are hot. - **Stack depth** (vertical) = call chain. Read bottom-to-top: `main` at bottom, leaf functions at top. 
- **Look for**: `alloc::`, `__GI___libc_malloc` — allocator calls in hot paths. `syscall`, `__kernel_` — unexpected kernel transitions. Your `tick()` function — is it the widest bar? If not, something else dominates. - **Ignore**: `perf-` artifacts, idle/sleep functions. **Alternative: cargo-flamegraph** For a simpler workflow that wraps `perf` automatically: ```bash cargo install flamegraph # Generate flamegraph directly cargo flamegraph --bin horus -- run --release # Output: flamegraph.svg ``` ### Memory Profiling CPU profiling catches slow code; memory profiling catches hidden allocations that cause latency spikes and unbounded growth. **heaptrack** traces every allocation with full call stacks and low overhead (~2x slowdown, much less than Valgrind): ```bash # Install (Debian/Ubuntu) sudo apt install heaptrack heaptrack-gui # Profile your application heaptrack horus run --release # Analyze results (GUI) heaptrack_gui heaptrack.horus.*.zst # Or analyze in terminal heaptrack_print heaptrack.horus.*.zst ``` **What to look for:** - **Peak allocation**: Total heap high-water mark. If this grows over time, you have a leak. - **Allocation rate during steady-state**: After `init()` completes, allocations should drop to near-zero. If you see steady allocation in `tick()`, something is allocating per-tick (format strings, Vec growth, String building). - **Top allocation sites**: Sort by count, not size. Thousands of small allocations hurt latency more than one large allocation. - **Flamegraph tab**: heaptrack_gui has a flamegraph view filtered to allocations only — this directly shows which call paths allocate. For production monitoring, `horus monitor` reports per-node memory metrics without profiling overhead. 
### Manual Timing in Code For targeted measurement, time specific operations directly: ```rust // simplified fn tick(&mut self) { let start = Instant::now(); self.expensive_operation(); let elapsed = start.elapsed(); // Log periodically, not every tick if self.tick_count % 1000 == 0 { hlog!(info, "Operation: {:?}", elapsed); } } ``` For round-trip latency: timestamp before `send()`, check elapsed after `recv()` on the return path. For throughput: count messages over a fixed time window using `Instant::elapsed()`. ## Common Performance Pitfalls ### Pitfall: Synchronous I/O in tick() ```rust // simplified // BAD: Blocking I/O fn tick(&mut self) { let data = std::fs::read("data.txt").unwrap(); // Blocks! } // GOOD: Async or pre-loaded fn init(&mut self) -> Result<()> { self.data = std::fs::read("data.txt")?; // Load once Ok(()) } ``` **Fix**: Move I/O to `init()` or use `.async_io()` execution class. > For other common pitfalls (allocations in tick, excessive logging, oversized messages, debug builds), see the detailed guidance in [Node Optimization](#node-optimization), [Message Optimization](#message-optimization), and [Build Optimization](#build-optimization) above. ## Design Decisions Understanding *why* HORUS is built this way helps you work with the architecture instead of against it. ### Why Ring Buffers, Not Channels Channels (`mpsc`, `crossbeam`) require heap allocation per message, involve lock contention on the queue, and cannot be shared across processes. Ring buffers in shared memory provide: - **Fixed memory footprint**: No per-message allocation. The buffer is allocated once at topic creation and reused forever. - **Cross-process communication**: Shared memory works between any processes on the same machine, not just threads within one process. - **Predictable latency**: No allocator jitter, no lock contention. The write path is a memcpy plus an atomic store. - **Natural backpressure**: If the subscriber falls behind, old messages are overwritten. 
For real-time systems, stale data is worse than dropped data. The trade-off is that subscribers must keep up or lose messages. This is deliberate — a safety-critical motor controller should always read the *latest* command, not process a queue of stale ones. ### Why Automatic Backend Selection HORUS detects your message type at compile time and selects the fastest available transport: - **POD types** (no pointers, no heap data, `Copy` + fixed-size): direct `memcpy` into shared memory. No serialization. - **Non-POD types** (contains `Vec`, `String`, `Box`, etc.): Serde serialization into shared memory. - **Large messages** (above internal threshold on x86_64): SIMD-accelerated memcpy using AVX2/SSE2. This means you write `topic.send(msg)` and get the fastest path automatically. No configuration flags, no "zero-copy mode" toggle. The type system determines the backend. ### Why SIMD for Large Messages For messages above ~256 bytes on x86_64, HORUS uses SIMD (AVX2/SSE2) for memory copies instead of the standard `memcpy`. On most platforms the compiler's built-in memcpy already uses SIMD, but HORUS's implementation is tuned for the specific access patterns of ring-buffer slots (aligned, known-size, non-overlapping). The result is consistent throughput for large sensor data (point clouds, images) without relying on the platform's libc quality. On platforms without SIMD (ARM32, older hardware), this falls back to standard `memcpy` with zero code changes. ### Why POD Auto-Detection Rather than requiring users to annotate messages with `#[zero_copy]` or `#[serde]`, HORUS inspects the type at compile time. If a struct is `Copy`, has no pointers, and has a fixed layout, it is automatically treated as POD and copied directly. Otherwise, Serde kicks in. This eliminates a class of bugs where users forget to annotate a message type and unknowingly get slow-path serialization, or annotate a non-POD type as zero-copy and get UB. 
The compiler makes the decision — the user just writes structs. ## Trade-offs Every design choice in HORUS sacrifices something. Understanding these trade-offs helps you make informed decisions about when to work within the defaults and when to reach for alternatives. | Decision | Benefit | Cost | When the cost matters | |----------|---------|------|-----------------------| | **Ring buffer (overwrite-oldest)** | Constant memory, always-fresh data, no backpressure stalls | Slow subscribers lose messages | Logging, recording, or batch processing that needs every message | | **Fixed ring capacity** | Predictable memory usage, no allocator calls at runtime | Must choose capacity at topic creation; too small = lost messages, too large = wasted memory | High-burst topics where message rate varies 100x between peaks and steady-state | | **POD auto-detection** | Zero-config zero-copy for simple types, no annotation burden | Cannot force zero-copy for types that contain `Vec`/`String` even if you know the layout is stable | Rarely — if you need zero-copy for dynamic types, redesign the message as fixed-size | | **SIMD memcpy** | ~15-30% faster large-message throughput on x86_64 | Binary is x86_64-specific when enabled; no benefit on ARM NEON (falls back) | Cross-compiling for ARM targets; SIMD is a no-op there but binary still works | | **Automatic backend selection** | Users never pick wrong backend, no configuration surface | Cannot override backend per-topic (e.g., force Serde for a POD type) | Debugging serialization issues — workaround: wrap POD in a newtype with `Vec` | | **Shared memory IPC** | Sub-microsecond latency, zero syscalls on hot path | Same-machine only; need network transport for distributed systems | Multi-machine deployments — use network backends (io_uring, batch UDP) for those topics | ## Performance Checklist Before deployment, verify: - [ ] Build in release mode (`--release`) - [ ] Profile with `perf` or flamegraph — identify actual hotspots before 
optimizing - [ ] tick() completes in <1ms - [ ] No allocations in tick() (verify with heaptrack) - [ ] Messages use fixed-size types where possible - [ ] Logging is rate-limited in hot paths - [ ] Shared memory has sufficient space (`horus doctor`) - [ ] IPC latency is <10us - [ ] Priority levels set correctly ## Real-Time Configuration For hard real-time applications requiring bounded latency, use the Scheduler builder API: ```rust // simplified use horus::prelude::*; let mut scheduler = Scheduler::new() .tick_rate(1000_u64.hz()) .require_rt() // Enables mlockall, SCHED_FIFO (Linux only) .cores(&[2, 3]); // Pin to isolated CPU cores scheduler.add(MotorController::new()) .order(0) .rate(1000_u64.hz()) // Auto-derives budget + deadline, auto-enables RT .priority(80) // SCHED_FIFO priority (1-99) .core(2) // Pin this node's thread to core 2 .build()?; ``` > **Linux only**: RT features (`SCHED_FIFO`, `mlockall`, CPU pinning) require a Linux kernel. On other platforms, `.prefer_rt()` degrades gracefully to best-effort scheduling. For detailed configuration options, see the [Scheduler Configuration](/advanced/scheduler-configuration). ## Next Steps - Apply these optimizations to your [Examples](/rust/examples/basic-examples) - Configure [Scheduler Settings](/advanced/scheduler-configuration) for bounded latency - Learn about [Multi-Language Support](/concepts/multi-language) - Read the [Core Concepts](/concepts/core-concepts-nodes) for deeper understanding - Check the [CLI Reference](/development/cli-reference) for build options --- ## See Also - [Benchmarks](/performance/benchmarks) — Measured latency and throughput - [Scheduler Configuration](/advanced/scheduler-configuration) — Tick rate tuning - [RT Setup](/advanced/rt-setup) — Real-time kernel setup --- ## Real-Time Tuning Path: /performance/rt-tuning Description: Check and optimize real-time performance on your system # Real-Time Tuning HORUS handles real-time automatically. 
Set your rate, HORUS does the rest: ```python horus.run(Node(tick=my_controller, rate=1000)) ``` No configuration needed. HORUS auto-detects your system's RT capabilities and uses the best available: SCHED_FIFO priority, memory locking, CPU pinning, lock-free IPC — all built in. No external packages required. This page covers **optional** tuning for users who want the lowest possible jitter. --- ## Check Your System ```bash horus doctor ``` The Real-Time section shows what HORUS detected: ``` Real-Time ✓ PREEMPT_RT active, jitter ±10μs or ⚠ Standard kernel, jitter ±100μs (run `horus setup-rt` for ±20μs) ``` For detailed RT status: ```bash horus setup-rt --check ``` ``` HORUS Real-Time Setup Kernel: Linux 6.8.0-generic ⚠ PREEMPT_RT: not detected ✓ SCHED_FIFO: available (priority 1-99) ⚠ Memory locking: limited ℹ CPU cores: 8 ℹ Isolated CPUs: none ℹ Estimated jitter: ±100μs ``` --- ## Install RT Kernel (Optional) If you need tighter timing (force control, high-bandwidth servo loops): ```bash sudo horus setup-rt ``` This command: 1. Detects your Linux distribution (Ubuntu, Debian, Fedora, Arch) 2. Installs the RT kernel package from your distro's repository 3. Configures memory lock limits (`/etc/security/limits.d/99-horus-rt.conf`) 4. Suggests CPU core isolation for dedicated RT threads Requires a reboot after install. Run `horus setup-rt --check` to verify. To undo: `sudo horus setup-rt --undo` --- ## Expected Performance | Setup | Jitter (p99) | Good For | |-------|-------------|----------| | Standard Linux | ±200μs | Position control, ML policy deployment, navigation | | + RT kernel (`horus setup-rt`) | ±20-80μs | Force control, humanoid balance, high-rate servos | | + CPU isolation | ±10-30μs | High-bandwidth servo loops, multi-joint coordination | **Most robots work fine without any tuning.** The standard Linux setup handles position control, velocity control, ML policy deployment, and navigation at rates up to 1kHz. 
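To see which row of the table is likely to apply to a given machine without HORUS installed, the same kinds of signals can be read directly from Linux. This is a sketch using only the Python standard library and standard Linux interfaces (`/sys/kernel/realtime`, `/sys/devices/system/cpu/isolated`, `RLIMIT_MEMLOCK`); it is not how `horus doctor` or `horus setup-rt --check` are implemented, and the function names are hypothetical.

```python
import os
import platform
from pathlib import Path

try:
    import resource  # Unix-only module
except ImportError:  # e.g. Windows
    resource = None


def read_text_or_none(path):
    """Return stripped file contents, or None if missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def rt_status():
    """Collect basic real-time capability hints for this machine."""
    version = platform.version()
    rt_flag = read_text_or_none("/sys/kernel/realtime")
    preempt_rt = (
        rt_flag == "1" or "PREEMPT_RT" in version or "PREEMPT RT" in version
    )

    # None means "could not determine" (no resource module on this platform).
    memlock_unlimited = None
    if resource is not None and hasattr(resource, "RLIMIT_MEMLOCK"):
        soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
        memlock_unlimited = soft == resource.RLIM_INFINITY

    isolated = read_text_or_none("/sys/devices/system/cpu/isolated")
    return {
        "preempt_rt": preempt_rt,
        "memlock_unlimited": memlock_unlimited,
        "cpu_count": os.cpu_count(),
        "isolated_cpus": isolated or "none",
    }


status = rt_status()
for key, value in status.items():
    print(f"{key}: {value}")
```

On non-Linux systems the sysfs files are absent, so the sketch reports no PREEMPT_RT and no isolated CPUs rather than failing.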
--- ## What HORUS Does Automatically When you set `rate=1000`, HORUS internally: - Detects if PREEMPT_RT is available and uses it - Sets `SCHED_FIFO` thread priority for RT nodes - Calls `mlockall()` to prevent page faults - Pre-faults 256KB of stack memory - Pins RT threads to isolated CPU cores (if available) - Sets budget to 80% of period (800μs for 1kHz) - Sets deadline to 95% of period (950μs for 1kHz) - Enables graduated watchdog (warn at 3 misses, isolate at 10, kill at 20) **You never need to configure any of this.** Override only if you have specific requirements: ```python Node( tick=my_controller, rate=1000, # 1kHz tick rate budget=300 * us, # Override: 300μs budget (default: 800μs) core=6, # Override: pin to CPU 6 on_miss="skip", # Override: skip missed ticks (default: warn) ) ``` --- ## See Also - [Benchmarks](/performance/benchmarks) — Measured IPC latency and throughput - [Performance Optimization](/performance/performance) — General optimization guide - [Execution Classes](/concepts/execution-classes) — RT, Compute, Event, AsyncIo, BestEffort - [Safety Monitor](/advanced/safety) — Graduated watchdog and deadline enforcement --- ## Benchmarks Path: /performance/benchmarks Description: Measured IPC latency, throughput, and real-time performance on real hardware # Benchmarks All numbers on this page are **measured values** from the HORUS benchmark suite, not estimates. Run on **Intel i9-14900K (32 cores), WSL2, release mode, RDTSC cycle-accurate timing** on 2026-03-22. 
Reproduce on your hardware: ```bash cargo run --release -p horus_benchmarks --bin all_paths_latency ``` --- ## Quick Reference | Transport | p50 | Throughput | Use Case | |-----------|-----|-----------|----------| | Same-thread | **12ns** | 100M+ msg/s | In-process pipeline | | Cross-thread 1:1 | **91ns** | 13M+ msg/s | Multi-threaded nodes | | Cross-thread N:N | **150ns** | 8M+ msg/s | Multi-producer/consumer | | Cross-process 1:1 | **171ns** | 5M+ msg/s | Multi-process systems | | Cross-process MPMC | **91ns** | 10M+ msg/s | Multi-process multi-participant | | Cross-process broadcast | **152ns** | 5M+ msg/s | Latest-value broadcast | | CmdVel (16B) | **89ns** | 11.1M msg/s | Motor control at 1kHz+ | | Imu (304B) | **119ns** | 7.8M msg/s | Sensor fusion at 500Hz+ | | LaserScan (1.5KB) | **151ns** | 6.3M msg/s | Lidar at 10-40Hz | | Python typed msg | **1.7μs** | 2.7M msg/s | ML inference nodes | | HORUS vs iceoryx2 | **1.4–6.3x faster** | — | Beats on every IPC path | --- ## IPC Latency — All Backend Paths Measured with `all_paths_latency`: 100,000 iterations per scenario, RDTSC timing with 6ns overhead subtracted, Tukey IQR outlier removal, bootstrap 95% confidence intervals. 
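
The post-processing steps named above (overhead subtraction, Tukey IQR outlier removal, bootstrap confidence interval) can be sketched with the Python standard library. This is an illustration of the general techniques only, not the code of `all_paths_latency`: the helper names, the sample data, and the choice of 1,000 resamples are assumptions made for the example.

```python
import random
import statistics

TIMER_OVERHEAD_NS = 6  # measurement overhead subtracted from every sample


def tukey_filter(samples, k=1.5):
    """Keep samples inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


def bootstrap_median_ci(samples, resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(samples)
    medians = sorted(
        statistics.median(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    lo_idx = int((1 - level) / 2 * resamples)
    hi_idx = int((1 + level) / 2 * resamples) - 1
    return medians[lo_idx], medians[hi_idx]


# Hypothetical raw timings in nanoseconds, with one scheduling spike.
raw = [97, 98, 96, 97, 99, 98, 97, 96, 5000, 98, 97, 99]
corrected = [s - TIMER_OVERHEAD_NS for s in raw]
kept = tukey_filter(corrected)
ci = bootstrap_median_ci(kept)
print(f"kept {len(kept)}/{len(raw)} samples, median CI = {ci}")
```

The spike is removed by the IQR fence before the interval is computed, which is why the reported percentiles in the tables below sit so close to the median.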
### Intra-Process | Scenario | Backend | p50 | p95 | p99 | p99.9 | max | CV | |----------|---------|-----|-----|-----|-------|-----|----| | Same thread | DirectChannel | **12ns** | 12ns | 13ns | 13ns | 13ns | 0.047 | | Cross-thread 1:1 | SpscIntra | **91ns** | 105ns | 107ns | 125ns | 125ns | 0.077 | | Cross-thread 1:N | SpmcIntra | **80ns** | 88ns | 92ns | 94ns | 94ns | 0.053 | | Cross-thread N:1 | MpscIntra | **187ns** | 312ns | 372ns | 458ns | 464ns | 0.313 | | Cross-thread N:N | FanoutIntra | **150ns** | 270ns | 307ns | 322ns | 322ns | 0.354 | ### Cross-Process | Scenario | Backend | p50 | p95 | p99 | p99.9 | max | |----------|---------|-----|-----|-----|-------|-----| | 1 pub, 1 sub | SpscShm | **171ns** | 186ns | 192ns | 195ns | 195ns | | 2 pub, 1 sub | MpscShm | **158ns** | 182ns | 190ns | 200ns | 200ns | | 2 pub, 2 sub (MPMC) | FanoutShm | **91ns** | — | 230ns | — | — | | Broadcast (POD) | PodShm | **152ns** | 203ns | 227ns | 254ns | 254ns | ### Hardware Floor | Operation | p50 | What it measures | |-----------|-----|------------------| | Raw SHM atomic (cross-process) | **63ns** | Kernel/hardware minimum for cross-process | | Raw memcpy 8B | **11ns** | Cache-to-cache copy | | Raw memcpy 1KB | **17ns** | L1 cache bandwidth | | Raw memcpy 8KB | **49ns** | L2 cache bandwidth | | Raw memcpy 64KB | **811ns** | RAM bandwidth | | Atomic store+load | **11ns** | Single atomic round-trip | ### Framework Overhead | Path | Total | Hardware floor | HORUS overhead | |------|-------|---------------|----------------| | Cross-process 1:1 (SpscShm) | 171ns | 57ns | **114ns** | | Cross-process MPMC (FanoutShm) | 91ns | 57ns | **34ns** | | Cross-process broadcast (PodShm) | 152ns | 57ns | **95ns** | HORUS adds 34–114ns over the hardware minimum for cross-process IPC. FanoutShm achieves the lowest overhead because its contention-free SPSC channel matrix eliminates CAS operations on the hot path. 
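
The overhead column above is the measured total minus the hardware floor. A short sketch of that arithmetic follows, using the 57ns floor from the Framework Overhead table (the Hardware Floor table lists 63ns for the raw SHM atomic, so treat the floor as an input). The names `TOTAL_NS` and `framework_overhead` are hypothetical.

```python
HARDWARE_FLOOR_NS = 57  # floor value used in the Framework Overhead table

# p50 totals per cross-process backend, copied from the table above.
TOTAL_NS = {
    "SpscShm": 171,
    "FanoutShm": 91,
    "PodShm": 152,
}


def framework_overhead(total_ns: int, floor_ns: int = HARDWARE_FLOOR_NS) -> int:
    """Latency attributable to the framework rather than the hardware."""
    return total_ns - floor_ns


overheads = {name: framework_overhead(t) for name, t in TOTAL_NS.items()}
for name, ns in overheads.items():
    print(f"{name}: {ns}ns over the hardware floor")
```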
--- ## Robotics Message Types Measured with `robotics_messages_benchmark`: 50,000 iterations, cross-thread producer/consumer on separate cores. | Message | Size | Median | p99 | Throughput | Typical Rate | Headroom | |---------|------|--------|-----|-----------|-------------|----------| | **CmdVel** | 16B | **89ns** | 91ns | 11.1M msg/s | 1,000 Hz | 11,100x | | **Imu** | 304B | **119ns** | 150ns | 7.8M msg/s | 500 Hz | 15,600x | | **JointCommand** | 928B | **128ns** | 157ns | 8.1M msg/s | 500 Hz | 16,200x | | **LaserScan** | 1,480B | **151ns** | 184ns | 6.3M msg/s | 40 Hz | 157,500x | ### Real-Time Suitability | Control Rate | Budget | Worst-Case (p99) | Result | |-------------|--------|-------------------|--------| | 1 kHz (motor control) | 1ms | CmdVel 91ns | **PASS** (11,000x headroom) | | 10 kHz (servo control) | 100μs | CmdVel 91ns | **PASS** (1,100x headroom) | | 500 Hz (sensor fusion) | 2ms | Imu 150ns | **PASS** (13,300x headroom) | | 40 Hz (lidar) | 25ms | LaserScan 184ns | **PASS** (135,000x headroom) | All message types pass real-time suitability at their typical robotics frequencies. --- ## HORUS vs Competition Measured with `competitor_comparison`: 5 seconds sustained per transport, same machine. | Transport | Size | p50 | p95 | p99 | Throughput | |-----------|------|-----|-----|-----|-----------| | **HORUS SHM** | 8B | **23ns** | 25ns | 29ns | 100M+ msg/s | | Raw UDP | 8B | 1,235ns | 1,328ns | 1,558ns | 3.9M msg/s | | **HORUS SHM** | 32B | **23ns** | 25ns | 29ns | 101M+ msg/s | | Raw UDP | 32B | 1,122ns | 1,246ns | 2,129ns | 4.1M msg/s | **Speedup: 54x** (8B), **49x** (32B) over raw UDP on the same machine. HORUS eliminates the kernel network stack entirely. UDP requires `sendto()` + `recvfrom()` system calls (~1,100ns of kernel overhead). HORUS uses direct shared memory access (~23ns total). ### HORUS vs iceoryx2 [iceoryx2](https://github.com/eclipse-iceoryx/iceoryx2) is Eclipse's lock-free zero-copy IPC middleware. 
Measured with `iceoryx2_comparison` and `fanout_shm_bench`: same machine, same message types, release mode. | Scenario | HORUS | iceoryx2 | Speedup | |----------|-------|----------|---------| | Same-thread | 11 ns | 69 ns | **6.3x** | | Cross-thread 1:1 | 95 ns | 182 ns | **1.9x** | | Cross-process 1:1 | 170 ns | 361 ns | **2.1x** | | Cross-process MPMC 2P/2S | 96 ns | 135 ns | **1.4x** | | Throughput (u64) | 95 M/s | 22 M/s | **4.3x** | HORUS beats iceoryx2 on every IPC path. The cross-process MPMC advantage comes from FanoutShm — a contention-free SPSC channel matrix that eliminates all CAS operations on the hot path. ```bash # Reproduce (requires iceoryx2 feature) cargo run --release -p horus_benchmarks --bin iceoryx2_comparison --features iceoryx2 # HORUS-only cross-process MPMC cargo run --release -p horus_benchmarks --bin fanout_shm_bench ``` --- ## Scalability Measured with `scalability_benchmark`: sustained throughput with varying producer/consumer thread counts. ### Thread Scaling | Producers | Consumers | Throughput | Per-Thread | Scaling Efficiency | |-----------|-----------|-----------|------------|-------------------| | 1 | 1 | 2.4M msg/s | 1.20M | baseline | | 2 | 1 | 7.2M msg/s | 2.40M | 300% | | 4 | 1 | 11.8M msg/s | 2.35M | 489% | | 1 | 2 | 3.5M msg/s | 1.17M | 146% | | 1 | 4 | 2.9M msg/s | 0.58M | 122% | | 2 | 2 | 6.8M msg/s | 1.70M | 141% | | 4 | 4 | 11.2M msg/s | 1.41M | 117% | | 8 | 8 | 8.4M msg/s | 0.52M | 44% | ### Producer Scaling (1 Consumer) ``` 1 producer: 3.0 M/s ████████████ 2 producers: 8.7 M/s ██████████████████████████████████ 3 producers: 11.5 M/s ████████████████████████████████████████████ 4 producers: 11.9 M/s ████████████████████████████████████████████ 6 producers: 13.5 M/s █████████████████████████████████████████████████ ← peak 8 producers: 11.5 M/s ████████████████████████████████████████████ ``` Peak throughput at 6 producers (13.5M msg/s). Beyond 6, contention on the atomic head pointer causes slight degradation. 
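
To quantify the "slight degradation" past the peak, the chart values can be compared directly. This is a small sketch over the numbers printed in the chart above; the helper names are hypothetical.

```python
# Producers -> throughput in millions of messages per second (from the chart).
PRODUCER_THROUGHPUT_M = {1: 3.0, 2: 8.7, 3: 11.5, 4: 11.9, 6: 13.5, 8: 11.5}


def peak(throughput):
    """Return (producer_count, throughput) for the best configuration."""
    return max(throughput.items(), key=lambda kv: kv[1])


def drop_from_peak(throughput, producers):
    """Fractional throughput loss at `producers` relative to the peak."""
    _, best = peak(throughput)
    return (best - throughput[producers]) / best


best_n, best_rate = peak(PRODUCER_THROUGHPUT_M)
print(f"peak: {best_n} producers at {best_rate} M/s")
print(f"drop at 8 producers: {drop_from_peak(PRODUCER_THROUGHPUT_M, 8):.1%}")
```

On these figures, going from 6 to 8 producers costs roughly 15% of peak throughput, so it is worth measuring your own workload before adding producers beyond the peak.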
### Consumer Scaling (1 Producer) ``` 1 consumer: 2.8 M/s ███████████ 2 consumers: 4.8 M/s ██████████████████ 4 consumers: 4.8 M/s ██████████████████ 8 consumers: 3.7 M/s ██████████████ ``` Consumer scaling plateaus at 2 — the ring buffer uses broadcast semantics (all consumers read the same data), so adding consumers doesn't increase total throughput. --- ## Real-Time Determinism Measured with `determinism_benchmark`: 10 runs of 100,000 iterations each, CPU-pinned to cores 0 and 1. | Metric | Value | |--------|-------| | Mean latency | 87.0ns | | Median latency | 86.0ns | | Std dev | 7.9ns | | Min | 61ns | | Max | 112ns | | p95 | 102ns | | p99 | 109ns | | p99.9 | 112ns | | p99.99 | 112ns | | Run-to-run CV | 0.060 | | Deadline misses at 1μs | 212 / 1,000,000 (0.02%) | **Interpretation**: The 7.9ns standard deviation and 0.06 run-to-run coefficient of variation indicate highly deterministic behavior. The 0.02% deadline miss rate at the extremely aggressive 1μs deadline is due to OS scheduling jitter in WSL2 — on a bare-metal Linux system with `PREEMPT_RT` kernel and `isolcpus`, expect zero misses. --- ## Python Binding Performance Measured with `research_bench_python.py`: 5 seconds sustained per test, Python 3.12, PyO3 bindings v0.1.9. 
### Typed Message IPC (Zero-Copy Pod Path)

| Message | p50 | p95 | p99 | p99.9 | Throughput |
|---------|-----|-----|-----|-------|-----------|
| CmdVel send+recv | **1.7μs** | 1.8μs | 2.4μs | 15.2μs | 2.7M msg/s |
| Pose2D send+recv | **1.7μs** | 1.9μs | 3.0μs | 18.2μs | 2.7M msg/s |
| Imu send+recv | **1.9μs** | 2.0μs | 4.2μs | 21.1μs | 2.4M msg/s |

### Generic Message IPC (Serialization Path)

| Payload | p50 | p95 | p99 | Throughput |
|---------|-----|-----|-----|-----------|
| dict {v: 1.0} | **6.2μs** | 7.9μs | 19.9μs | 714K msg/s |
| dict {x,y,z,w} | **12.4μs** | 15.3μs | 34.2μs | 382K msg/s |
| dict 50 keys (~1KB) | **111μs** | 143μs | 196μs | 42K msg/s |

**Typed messages are 4-65x faster than dicts** because they bypass serialization and use direct Pod memcpy through the Rust layer.

### Image Zero-Copy

| Operation | p50 | Throughput | Notes |
|-----------|-----|-----------|-------|
| Image.to_numpy (640x480) | **3.0μs** | 1.5M/s | Returns view into SHM pool |
| np.from_dlpack (640x480) | **1.1μs** | 3.5M/s | DLPack protocol, true zero-copy |
| np.copy (640x480) baseline | **14.0μs** | 334K/s | For comparison (actual copy) |

`np.from_dlpack()` is **13x faster** than `np.copy()` — it returns a numpy array backed by the shared memory pool with no data movement.

### FFI Overhead Attribution

| Operation | Rust (ns) | Python (ns) | Overhead | Factor |
|-----------|-----------|-------------|----------|--------|
| CmdVel | 14 | 1,712 | 1,698ns | 122x |
| Pose2D | 14 | 1,682 | 1,668ns | 120x |
| Imu | 14 | 1,884 | 1,870ns | 135x |
| dict (small) | 14 | 6,246 | 6,232ns | 446x |

The ~1.7μs Python overhead comes from: PyO3 boundary crossing (~500ns), GIL acquisition (~500ns), and Python object allocation (~700ns). For typed messages this overhead is roughly **constant** regardless of message size (1.7-1.9μs across the three types measured); the dict path is slower because it adds serialization.
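
The Overhead and Factor columns in the attribution table can be reproduced from the two timing columns. A minimal sketch, with hypothetical names and the values copied from the table:

```python
RUST_NS = 14  # Rust-side cost, identical for every row in the table

# Python-side p50 per operation, in nanoseconds.
PYTHON_NS = {
    "CmdVel": 1712,
    "Pose2D": 1682,
    "Imu": 1884,
    "dict (small)": 6246,
}


def attribution(python_ns: int, rust_ns: int = RUST_NS):
    """Return (absolute overhead in ns, rounded slowdown factor)."""
    return python_ns - rust_ns, round(python_ns / rust_ns)


for name, ns in PYTHON_NS.items():
    overhead, factor = attribution(ns)
    print(f"{name}: +{overhead}ns ({factor}x)")
```

The factor looks dramatic because the Rust baseline is only 14ns; the absolute overhead is the more useful number when budgeting a control loop.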
### Scheduler Tick Overhead | Metric | Value | |--------|-------| | Target rate | 10,000 Hz | | Achieved rate | **5,932 Hz** | | Per-tick overhead | ~11μs (Rust→Python→Rust) | | GC dip (worst second) | 96 fewer ticks | The GIL is the bottleneck for Python tick rate. For control loops above ~5kHz, use Rust nodes. ### When to Use Python vs Rust | Use Case | Recommended | Why | |----------|-------------|-----| | ML inference (PyTorch, YOLO) | **Python** | 1.7μs overhead negligible vs 10-200ms inference | | Data science, prototyping | **Python** | Developer velocity matters more than latency | | Motor control at 1kHz+ | **Rust** | 89ns vs 1,700ns — 19x difference | | Safety monitors | **Rust** | Deterministic timing, no GIL | | Sensor fusion at 500Hz+ | **Rust** | Predictable p99 latency | --- ## C++ Binding Performance Measured with `cpp_benchmark`: release mode, g++ -O2, 10,000 iterations per test, `high_resolution_clock` timing with warmup. ### FFI Boundary Cost | Operation | Min | Median | P99 | P999 | Max | Stddev | |-----------|-----|--------|-----|------|-----|--------| | FFI call (abi_version) | **15ns** | **16ns** | 17ns | 17ns | 5.0μs | 49ns | | Atomic read (is_running) | **16ns** | **17ns** | 18ns | 18ns | 33ns | 0.6ns | The raw cost of crossing the Rust-C++ boundary is **15-17ns** — comparable to a C++ virtual function call. ### Scheduler Tick from C++ | Scenario | Min | Median | P99 | P999 | Max | |----------|-----|--------|-----|------|-----| | Empty scheduler | **35ns** | **37ns** | 45ns | 49ns | 1.6μs | | 1 node + callback | **243ns** | **250ns** | 10.5μs | 11.3μs | 13.8μs | | 10 nodes | **2.1μs** | **2.2μs** | 102μs | 109μs | 127μs | | 50 nodes | **10.7μs** | **11.0μs** | 515μs | 549μs | 597μs | Per-node overhead is ~220ns, which includes `catch_unwind` safety wrapper + closure dispatch through the FFI boundary. 
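
The ~220ns per-node figure can be recovered from the median column of the table above as the slope between two node counts. This sketch uses only the table values; the function name is hypothetical.

```python
# Node count -> median tick time in nanoseconds (from the table above).
MEDIAN_TICK_NS = {1: 250, 10: 2200, 50: 11000}


def per_node_cost(points, a, b):
    """Average added cost per node when going from `a` to `b` nodes."""
    return (points[b] - points[a]) / (b - a)


print(per_node_cost(MEDIAN_TICK_NS, 10, 50))  # slope between 10 and 50 nodes
print(per_node_cost(MEDIAN_TICK_NS, 1, 10))   # slope between 1 and 10 nodes
```

Both slopes land close to 220ns, which is consistent with the cost being dominated by a fixed per-node dispatch rather than by anything that grows with the node count.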
### Throughput

| Metric | Value |
|--------|-------|
| Ticks/second (1 node) | **2,844,911** |
| Time per tick | **0.35μs** |
| CPU overhead at 1kHz | **0.035%** of one core |

### Scalability

| Nodes | Median Tick | Per-Node Cost |
|-------|-------------|---------------|
| 1 | 250ns | 250ns |
| 10 | 2.2μs | 220ns |
| 50 | 11.0μs | 220ns |

Linear scaling — per-node cost settles at ~220ns (250ns for a single node) regardless of node count.

### C++ vs Rust vs Python Overhead

| Language | 1-Node Tick | Overhead vs Rust | Notes |
|----------|-------------|-----------------|-------|
| **Rust** (native) | ~89ns | baseline | Direct scheduler call |
| **C++** (FFI) | ~250ns | +161ns (2.8x) | extern "C" + catch_unwind |
| **Python** (PyO3) | ~1,700ns | +1,611ns (19x) | GIL + PyO3 + object alloc |

C++ adds **161ns** over native Rust — the cost of the `extern "C"` boundary and panic safety wrapper. For perspective, this is 0.16 microseconds — invisible at any practical control rate.

### Memory Safety

Validated with AddressSanitizer (`g++ -fsanitize=address`):

| Test | Iterations | ASAN Errors |
|------|-----------|-------------|
| Scheduler create/destroy | 1,000 | **0** |
| Sustained ticks | 5,000 | **0** |
| 50 concurrent nodes | 5,000 ticks | **0** |
| Null pointer calls | 10,000 | **0** |

Zero memory safety violations across the entire FFI surface.
### Running C++ Benchmarks ```bash # Build release cargo build --release --no-default-features -p horus_cpp # Compile benchmark g++ -std=c++17 -O2 -o cpp_benchmark \ horus_cpp/tests/cpp_benchmark.cpp \ -L target/release -lhorus_cpp -lpthread -ldl -lm # Run LD_LIBRARY_PATH=target/release ./cpp_benchmark # With ASAN g++ -std=c++17 -O2 -fsanitize=address -fno-omit-frame-pointer \ -o cpp_stress_asan horus_cpp/tests/cpp_stress_test.cpp \ -L target/release -lhorus_cpp -lpthread -ldl -lm LD_LIBRARY_PATH=target/release ./cpp_stress_asan ``` --- ## Running Benchmarks ### Rust Benchmarks ```bash # Main benchmark: all 10 backend paths (~2 min) cargo run --release -p horus_benchmarks --bin all_paths_latency # Robotics message types: CmdVel, Imu, LaserScan, JointCommand cargo run --release -p horus_benchmarks --bin robotics_messages_benchmark # HORUS vs UDP comparison cargo run --release -p horus_benchmarks --bin competitor_comparison # Scalability: thread count sweep cargo run --release -p horus_benchmarks --bin scalability_benchmark # RT determinism: jitter analysis cargo run --release -p horus_benchmarks --bin determinism_benchmark # Hardware floor: raw memcpy, atomic, mmap cargo run --release -p horus_benchmarks --bin raw_baselines # Cross-process: true inter-process IPC cargo run --release -p horus_benchmarks --bin cross_process_benchmark # Full research suite (~30 min) ./benchmarks/research/run_all.sh # Quick validation (~3 min) ./benchmarks/research/run_all.sh --quick ``` ### Python Benchmarks ```bash cd horus_py # Quick validation (2s per test) PYTHONPATH=. python3 benchmarks/research_bench_python.py --duration 2 # Full research run (10s per test) PYTHONPATH=. python3 benchmarks/research_bench_python.py --duration 10 --csv results.csv # JSON summary for CI PYTHONPATH=. 
python3 benchmarks/research_bench_python.py --json summary.json ``` ### Criterion Micro-Benchmarks ```bash # All criterion benches (HTML reports in target/criterion/) cargo bench -p horus_benchmarks # Filter by name cargo bench -p horus_benchmarks -- topic_latency ``` --- ## Methodology ### Timing - **RDTSC** (x86_64) with serializing fences (`lfence` + `mfence`), calibrated per-run (~3.37 GHz on test machine) - **Overhead**: ~6ns per measurement, subtracted from all samples - **Fallback**: `Instant::now()` (~11ns) on non-x86 ### Statistical Analysis - **Percentiles**: p1, p5, p25, p50, p75, p95, p99, p99.9, p99.99 - **Confidence intervals**: Bootstrap with 10,000 resamples, 95% level - **Outlier filtering**: Tukey IQR (1.5x fence) - **Determinism metrics**: Coefficient of variation, run-to-run variance ### Environment Control - **CPU governor**: Performance mode recommended (numbers above measured with `powersave` in WSL2) - **CPU affinity**: Producer and consumer pinned to separate physical cores - **Warmup**: 5,000-10,000 iterations discarded before measurement - **Measurement**: 50,000-100,000 iterations per test - **Turbo boost**: Disabled recommended for reproducibility ### Reproducing These Numbers Your numbers will differ based on CPU, OS, governor, and VM/bare-metal: | Factor | Impact | |--------|--------| | `performance` governor vs `powersave` | 2-5x faster latencies | | Bare-metal Linux vs WSL2 | 10-30% faster, fewer outliers | | `PREEMPT_RT` kernel | Near-zero deadline misses | | `isolcpus` for benchmark cores | Lower jitter, tighter p99 | | Older CPU (i5 vs i9) | 1.5-3x slower | | ARM (Raspberry Pi, Jetson) | 3-10x slower, still sub-microsecond | --- ## See Also - [Performance Optimization](/performance/performance) — How to write fast HORUS code - [Shared Memory](/concepts/shared-memory) — SHM architecture and ring buffer details - [Scheduler API](/rust/api/scheduler) — Timing configuration and execution classes - [Python API](/python/api) — Python 
binding overhead and usage patterns --- ## Performance Path: /performance Description: Optimization techniques and benchmarks for HORUS robotics applications # Performance HORUS is designed for performance by default — zero-copy shared memory IPC, lock-free ring buffers, and priority-based scheduling. Most applications need no tuning. **Optimize only after your system works correctly.** Premature optimization in robotics is dangerous — a fast controller that computes the wrong output is worse than a slow one that's correct. --- ## Quick Reference | Metric | HORUS (typical) | ROS2 (typical) | |--------|-----------------|----------------| | Same-thread IPC | ~12 ns | N/A | | Cross-thread IPC (1:1) | ~91 ns | 20-50 us | | Cross-process IPC (1:1) | ~171 ns | 50-100 us | | Cross-process MPMC | ~91 ns | 50-100 us | | vs iceoryx2 | **1.4–6.3x faster** | N/A | | Scheduler tick overhead | ~1 us per node | 10-50 us per node | | Node scaling (100 nodes) | +14% overhead | +200-400% overhead | | Topic scaling (1000 topics) | 0% degradation | Significant | --- ## Guides - [Optimization Guide](/performance/performance) — When and how to tune: message sizes, buffer capacities, tick rates, CPU pinning, memory layout - [Benchmarks](/performance/benchmarks) — Latency, throughput, and scalability measurements comparing HORUS to alternatives --- ## What HORUS Optimizes For You | Feature | How | Tunable? 
|
|---------|-----|----------|
| IPC latency | Shared memory (no serialization for same-machine) | Buffer size via `Topic::with_capacity()` |
| Message passing | Lock-free ring buffer | Slot count and size |
| Scheduling | Priority-ordered tick loop | `.order()`, `.rate()`, execution classes |
| Large data | Pool-backed allocation (Image, PointCloud, Tensor) | Pool size |
| Real-time | Dedicated RT threads, `SCHED_FIFO` | `.budget()`, `.deadline()`, `.on_miss()` |

---

## See Also

- [Scheduler Configuration](/advanced/scheduler-configuration) — Advanced tick rate and thread pool tuning
- [RT Setup](/advanced/rt-setup) — Linux real-time kernel configuration
- [API Cheatsheet](/reference/api-cheatsheet) — Quick reference for all APIs

========================================
# SECTION: Operations
========================================

---

## Deploy to Your Robot

Path: /operations/deploy-to-robot
Description: Step-by-step guide to deploying HORUS projects from your development PC to a robot

# Deploy to Your Robot

This guide walks through every step from "my project works on my PC" to "it's running on my robot." No prior SSH or networking experience required.

## Problem Statement

Your HORUS project runs on your development machine and you need to transfer it to a physical robot (Raspberry Pi, Jetson, or other Linux board) and run it there.

## When To Use

- First-time deployment to a new robot
- Setting up SSH key authentication for passwordless deploys
- Configuring named deploy targets for repeated deployments
- Deploying to a fleet of multiple robots at once

## Prerequisites

- A working HORUS project on your development PC
- A Linux-based robot (Raspberry Pi, Jetson, or any ARM/x86 board)
- Both machines on the same network (WiFi or Ethernet)

---

## How It Works

`horus deploy` is a three-step pipeline that runs on your **development PC**:

*[Diagram: horus deploy: build on your PC, sync via SSH, run on the robot. Step 1: cross-compile for ARM on your PC. Step 2: rsync over SSH to `~/horus_deploy/` on the robot. Step 3: run the binary over SSH.]*

Your code is compiled on your PC and the finished binary is copied to the robot. The robot just runs it.

## What Each Machine Needs

### Your Development PC

| Requirement | Why | How to install |
|-------------|-----|----------------|
| HORUS CLI | Builds and deploys your project | [Installation guide](/getting-started/installation) |
| Rust toolchain | Cross-compiles for robot architecture | Installed with HORUS |
| rsync | Syncs files to robot efficiently | `sudo apt install rsync` (usually pre-installed) |
| SSH client | Connects to robot | `sudo apt install openssh-client` (usually pre-installed) |

**Supported dev PC operating systems:** Linux, macOS, WSL 2. Native Windows without WSL is not supported (no rsync/SSH).

### Your Robot

| Requirement | Why |
|-------------|-----|
| Linux | Any distribution (Raspberry Pi OS, Ubuntu, Debian, etc.) |
| SSH server enabled | So your PC can connect and copy files |
| Network connection | WiFi or Ethernet, same network as your PC |

**The robot does NOT need:** HORUS, Rust, cargo, or any build tools. The compiled binary is self-contained.

**Exception for Python projects:** The robot also needs `python3` and the HORUS Python package:

```bash
# Run this ON the robot (one-time setup)
pip install horus-robotics
```

## Step-by-Step

### Step 1: Prepare Your Robot

If you have a new Raspberry Pi or Jetson that isn't set up yet:

**Raspberry Pi:**

1. Download [Raspberry Pi Imager](https://www.raspberrypi.com/software/) on your PC
2. Flash an SD card with **Raspberry Pi OS** (or Ubuntu Server)
3. In the imager settings (gear icon), configure:
   - **Username and password** (e.g., `pi` / `yourpassword`)
   - **WiFi** network name and password
   - **Enable SSH** (check the box)
4. Insert the SD card into the Pi and power it on
5.
Wait 1-2 minutes for it to boot and connect to WiFi **Jetson Nano/Xavier/Orin:** 1. Flash the SD card or eMMC with NVIDIA JetPack (Ubuntu-based) 2. Complete the first-boot setup (username, password, WiFi) 3. SSH is enabled by default on JetPack **Any other Linux board:** - Ensure SSH is enabled: `sudo systemctl enable ssh && sudo systemctl start ssh` - Ensure it's connected to the same network as your PC ### Step 2: Find Your Robot's IP Address Your PC and robot must be on the same network (same WiFi or same Ethernet switch). You need the robot's IP address to connect. **Option A: mDNS (easiest, try this first)** Most Linux boards advertise their hostname on the local network: ```bash # On your PC: ping raspberrypi.local # Raspberry Pi default hostname ping jetson.local # Jetson default hostname ``` If it responds, you can use `raspberrypi.local` instead of an IP address: ```bash horus deploy pi@raspberrypi.local --run ``` **Option B: Check your router** 1. Open your router's admin page in a browser (usually `192.168.1.1` or `192.168.0.1`) 2. Look for "Connected Devices" or "DHCP Clients" 3. Find your robot's hostname (e.g., "raspberrypi") and note its IP **Option C: Scan your network** ```bash # On your PC — scan for all devices: nmap -sn 192.168.1.0/24 # Or if your network is 192.168.0.x: nmap -sn 192.168.0.0/24 ``` Look for your robot's hostname or MAC address in the results. **Option D: Connect a monitor to the robot** Plug a monitor and keyboard into the robot and run: ```bash hostname -I # Output: 192.168.1.50 ``` ### Step 3: Test SSH Connection Before deploying, verify you can connect to the robot: ```bash ssh pi@192.168.1.50 # Enter your password when prompted ``` Replace `pi` with your robot's username and `192.168.1.50` with the IP you found. If you see the robot's terminal prompt, it works. Type `exit` to disconnect. **If "Connection refused":** SSH is not enabled on the robot. 
Connect a monitor and keyboard to the robot and run: ```bash sudo systemctl enable ssh sudo systemctl start ssh ``` ### Step 4: Set Up SSH Keys (Recommended) Without SSH keys, `horus deploy` will prompt for your password during every deployment. Setting up keys makes it passwordless: ```bash # On your PC — generate a key (press Enter for all prompts): ssh-keygen -t ed25519 # Copy it to the robot: ssh-copy-id pi@192.168.1.50 # Enter your password one last time # Verify — this should connect WITHOUT asking for a password: ssh pi@192.168.1.50 ``` ### Step 5: Deploy In your project directory on your PC: ```bash horus deploy pi@192.168.1.50 --run ``` Press `Ctrl+C` to stop the robot. ### Step 6: Save Your Target Typing the full `pi@192.168.1.50` every time gets tedious. Create a `deploy.yaml` file in your project root: ```yaml targets: robot: host: pi@192.168.1.50 arch: aarch64 ``` Now deploy with just a name: ```bash horus deploy robot --run ``` For multiple robots, add more targets: ```yaml targets: robot: host: pi@192.168.1.50 arch: aarch64 jetson: host: nvidia@jetson.local arch: aarch64 port: 2222 identity: ~/.ssh/jetson_key arm-pc: host: ubuntu@10.0.0.20 arch: x86_64 dir: ~/my_app ``` ## Deploy Options | Option | What it does | Example | |--------|-------------|---------| | `--run` | Execute the binary after deploying | `horus deploy robot --run` | | `--dry-run` | Show what would happen without doing it | `horus deploy robot --dry-run` | | `--arch` | Override target architecture | `horus deploy robot --arch armv7` | | `--debug` | Build in debug mode (faster build, slower binary) | `horus deploy robot --debug` | | `-d, --dir` | Custom remote directory (default: `~/horus_deploy`) | `horus deploy robot -d /opt/robot` | | `-p, --port` | Custom SSH port (default: 22) | `horus deploy robot -p 2222` | | `-i, --identity` | SSH private key file | `horus deploy robot -i ~/.ssh/robot_key` | | `--all` | Deploy to every target in deploy.yaml | `horus deploy --all --run` | | 
`--list` | Show configured targets | `horus deploy --list` | ## Fleet Deployment Deploy to all robots with one command: ```bash horus deploy --all --run ``` This builds **once** and syncs to each robot. If one robot is unreachable, the others still get deployed: ``` HORUS Fleet Deploy (3 targets) 1. robot -> pi@192.168.1.50 2. jetson -> nvidia@jetson.local 3. arm-pc -> ubuntu@10.0.0.20 Step 1: Building for ARM64 (shared across 3 targets)... Build complete Step 2: Syncing to 3 targets... This will sync to 3 remote hosts (with --delete) Continue? [y/N] y --- [1/3] robot Files synced [1/3] robot done --- [2/3] jetson [2/3] jetson done --- [3/3] arm-pc [x] arm-pc failed: SSH connection refused === Fleet Deploy Summary 2/3 targets deployed successfully ``` ## What Gets Copied to the Robot The project directory is synced to `~/horus_deploy/` on the robot, **excluding**: - `target/` (build artifacts — only the final binary is kept) - `.git/` (version control history) - `__pycache__/` and `*.pyc` (Python cache) - `node_modules/` (JavaScript dependencies) The rsync `--delete` flag keeps the remote directory in sync — files you delete locally are also deleted on the robot. 
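
The sync behaviour described above resembles an `rsync` call with `--delete` and one `--exclude` per ignored pattern. The sketch below builds such a command line in Python so the pieces are easy to inspect. It is an approximation for illustration, not the command `horus deploy` actually runs: the function name, the `-az` flags, and the argument layout are assumptions, and the step that keeps the final binary from `target/` is not modeled.

```python
import shlex

# Patterns the page lists as excluded from the sync.
EXCLUDES = ["target/", ".git/", "__pycache__/", "*.pyc", "node_modules/"]


def build_rsync_command(src, host, remote_dir="~/horus_deploy/", port=22):
    """Assemble an rsync argument list mirroring `src` to `host:remote_dir`."""
    cmd = ["rsync", "-az", "--delete", "-e", f"ssh -p {port}"]
    for pattern in EXCLUDES:
        cmd += ["--exclude", pattern]
    cmd += [src, f"{host}:{remote_dir}"]
    return cmd


# Print a shell-quoted version for inspection; nothing is executed here.
print(shlex.join(build_rsync_command("./", "pi@192.168.1.50")))
```

Because `--delete` removes remote files that no longer exist locally, it is worth pointing a custom remote directory at a location that holds nothing but the deployed project.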
## Architecture Auto-Detection If you don't specify `--arch`, horus guesses from the hostname: | Hostname contains | Detected architecture | |---|---| | `jetson`, `nano`, `xavier`, `orin` | aarch64 | | `pi4`, `pi5`, `raspberry` | aarch64 | | `pi3`, `pi2` | armv7 | | Anything else | aarch64 (default) | Override with `--arch` if the auto-detection is wrong: ```bash horus deploy myrobot@10.0.0.5 --arch x86_64 ``` ## Rust vs Python Differences | Aspect | Rust project | Python project | |--------|-------------|---------------| | Build step | Cross-compiles for target architecture | Skipped (no compilation needed) | | Robot requirements | Nothing beyond Linux + SSH | Python 3 + `pip install horus-robotics` | | What runs on robot | Single binary (`./target/.../my_robot`) | `python3 src/main.py` | | Binary size | Self-contained (~5-50 MB) | Source files only | | First deploy speed | Slower (compilation) | Faster (just file sync) | ## Auto-Start on Boot (systemd) To run your HORUS project automatically when the robot boots, create a systemd service: ```bash # Check status sudo systemctl status horus-robot # View logs journalctl -u horus-robot -f ``` ## Docker Deployment For containerized robots or fleet management, deploy as a Docker image: ### Multi-Stage Dockerfile (Rust) ```dockerfile # Stage 1: Build on x86_64 (cross-compile for ARM if needed) FROM rust:1.92-slim AS builder RUN apt-get update && apt-get install -y \ gcc-aarch64-linux-gnu \ && rm -rf /var/lib/apt/lists/* # Install HORUS RUN curl -fsSL https://horusrobotics.dev/install | bash WORKDIR /app COPY . . 
# Cross-compile for ARM64 (adjust for your robot's arch) RUN rustup target add aarch64-unknown-linux-gnu RUN CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \ cargo build --release --target aarch64-unknown-linux-gnu # Stage 2: Minimal runtime image FROM debian:bookworm-slim RUN apt-get update && apt-get install -y \ i2c-tools \ && rm -rf /var/lib/apt/lists/* COPY --from=builder /app/target/aarch64-unknown-linux-gnu/release/my_robot /usr/local/bin/ COPY horus.toml /app/ WORKDIR /app CMD ["my_robot"] ``` ### Running with Hardware Access ```bash docker build -t my-robot . docker run --rm \ --device /dev/i2c-1 \ --device /dev/ttyUSB0 \ --cap-add SYS_NICE \ my-robot ``` **Key flags:** | Flag | Purpose | |------|---------| | `--device /dev/i2c-1` | I2C bus access | | `--device /dev/ttyUSB0` | Serial port access | | `--device /dev/gpiochip0` | GPIO access | | `--cap-add SYS_NICE` | RT scheduling (SCHED_FIFO) | | `--privileged` | Full hardware access (use only if needed) | | `--ipc=host` | Share host IPC namespace (for cross-container SHM topics) | | `-v /dev/shm:/dev/shm` | Share host shared memory (alternative to `--ipc=host`) | ### Docker Compose (Multi-Node Robot) ```yaml # docker-compose.yml version: "3.8" services: sensor: build: ./sensor devices: - /dev/i2c-1 - /dev/ttyUSB0 cap_add: - SYS_NICE ipc: host # Share SHM for zero-copy topics restart: unless-stopped controller: build: ./controller cap_add: - SYS_NICE ipc: host depends_on: - sensor restart: unless-stopped ``` ```bash docker compose up -d docker compose logs -f ``` > **SHM sharing**: For nodes in separate containers to communicate via zero-copy shared memory, use `ipc: host` or mount `/dev/shm`. Without this, topics fall back to network transport. 
---

## Troubleshooting

### "Connection refused" when deploying

SSH is not enabled or not running on the robot:

```bash
# Connect a monitor/keyboard to the robot and run:
sudo systemctl enable ssh
sudo systemctl start ssh
```

### "Permission denied (publickey)"

The robot is rejecting your SSH key or password:

```bash
# Try with explicit password authentication:
ssh -o PreferredAuthentications=password pi@192.168.1.50

# If that works, re-copy your key:
ssh-copy-id pi@192.168.1.50
```

### "rsync: command not found"

Install rsync on your **development PC**:

```bash
# Ubuntu/Debian:
sudo apt install rsync

# macOS:
brew install rsync
```

### Binary crashes or "Exec format error" on robot

Wrong architecture. The binary was compiled for a different CPU than the robot has:

```bash
# Check what the robot actually runs:
ssh pi@192.168.1.50 "uname -m"
# aarch64 = use --arch aarch64
# armv7l = use --arch armv7
# x86_64 = use --arch x86_64

# Re-deploy with the correct architecture:
horus deploy pi@192.168.1.50 --arch armv7 --run
```

### "cargo build failed" during cross-compilation

The cross-compilation target may not be installed. HORUS installs it automatically, but if that fails:

```bash
# Install manually on your PC:
rustup target add aarch64-unknown-linux-gnu

# For ARM 32-bit:
rustup target add armv7-unknown-linux-gnueabihf
```

You may also need the cross-compilation linker:

```bash
# Ubuntu/Debian:
sudo apt install gcc-aarch64-linux-gnu

# For ARM 32-bit:
sudo apt install gcc-arm-linux-gnueabihf
```

### "ping raspberrypi.local" doesn't work

mDNS may not be available. Try:

```bash
# Install mDNS support on your PC:
sudo apt install avahi-utils

# Install on the robot (connect monitor/keyboard):
sudo apt install avahi-daemon
```

Or skip mDNS and find the IP through your router or `nmap` instead.
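For the architecture check in the "Exec format error" entry above, the mapping from `uname -m` output to an `--arch` value can be captured in a small helper. This is only a sketch: the name `arch_flag_for` is hypothetical and not part of HORUS, and the three mappings are exactly the ones listed in that entry's comments.

```python
# Hypothetical helper, not part of the HORUS CLI: maps `uname -m` output
# to the value the troubleshooting entry says to pass to `--arch`.
UNAME_TO_ARCH = {
    "aarch64": "aarch64",
    "armv7l": "armv7",
    "x86_64": "x86_64",
}


def arch_flag_for(uname_machine: str) -> str:
    """Return the --arch value for a given `uname -m` string."""
    key = uname_machine.strip()  # ssh output usually ends with a newline
    if key not in UNAME_TO_ARCH:
        raise ValueError(f"no known --arch mapping for {key!r}")
    return UNAME_TO_ARCH[key]
```

Feeding it the output of `ssh pi@192.168.1.50 "uname -m"` yields the value to pass after `--arch`. Unknown strings raise an error rather than silently falling back to a default, since a wrong guess is what causes the crash in the first place.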
### Deploy works but program doesn't find hardware (GPIO, serial, I2C)

The robot user may not have permission to access hardware devices:

```bash
# On the robot:
sudo usermod -aG dialout,gpio,i2c,spi pi
# Log out and back in for changes to take effect
```

## See Also

- [Operations Overview](/operations) — Quick reference for all operations commands
- [RT Setup](/advanced/rt-setup) — RT kernel, safety monitoring, and performance tuning
- [CLI Reference](/development/cli-reference) — Full `horus deploy` flag reference

---

## Operations

Path: /operations
Description: Deploy, monitor, and maintain HORUS robotics applications in production

# Operations

From development to production deployment, fleet management, and ongoing maintenance. This section covers everything you need to ship your HORUS application to real hardware.

---

## Quick Reference

| Task | Command |
|------|---------|
| Deploy to one robot | `horus deploy pi@192.168.1.100` |
| Deploy to named target | `horus deploy jetson-01` |
| Deploy to all robots | `horus deploy --all` |
| List deploy targets | `horus deploy --list` |
| Monitor running system | `horus monitor` |
| Check system health | `horus doctor` |
| View flight recorder | `horus blackbox` |
| Record a session | `horus run --record session1` |
| Replay a recording | `horus record replay session1` |

---

## Deployment

New to deploying? See **[Deploy to Your Robot](/operations/deploy-to-robot)** for the full setup guide — from preparing your robot to running your first deploy.
### Single Robot

Deploy directly to a host:

```bash
# Build, sync, and deploy
horus deploy pi@192.168.1.100

# Deploy and run immediately
horus deploy pi@192.168.1.100 --run

# Deploy to specific architecture
horus deploy ubuntu@jetson.local --arch aarch64

# Preview without deploying
horus deploy pi@192.168.1.100 --dry-run
```

### Named Targets

Configure robots in `deploy.yaml` (project root):

```yaml
targets:
  jetson-01:
    host: nvidia@10.0.0.1
    arch: aarch64
    dir: ~/robot
  jetson-02:
    host: nvidia@10.0.0.2
    arch: aarch64
    dir: ~/robot
  arm-controller:
    host: pi@10.0.0.10
    arch: aarch64
    dir: ~/arm
    port: 2222
    identity: ~/.ssh/robot_key
```

Deploy by name:

```bash
horus deploy jetson-01
horus deploy jetson-01 --run
```

### Fleet Deployment

Deploy to multiple robots at once:

```bash
# Multiple named targets
horus deploy jetson-01 jetson-02 jetson-03

# All targets from deploy.yaml
horus deploy --all

# Preview fleet deployment
horus deploy --all --dry-run

# List all configured targets
horus deploy --list
```

Fleet deploy builds **once** (shared binary for same architecture), then syncs to each robot sequentially. Confirmation is asked once for the entire fleet, not per robot. If a deployment fails for one robot, the fleet continues to the next.
A summary is printed at the end:

```
HORUS Fleet Deploy (3 targets)
--- [1/3] jetson-01
[checkmark] jetson-01 done
--- [2/3] jetson-02
[checkmark] jetson-02 done
--- [3/3] jetson-03
[x] jetson-03 failed: SSH connection refused
=== Fleet Deploy Summary
[checkmark] 2/3 targets deployed successfully
```

### Supported Architectures

| Architecture | Alias | Common Robots |
|-------------|-------|---------------|
| `aarch64` | `arm64`, `jetson`, `pi4`, `pi5` | Raspberry Pi 4/5, Jetson Nano/Xavier/Orin |
| `armv7` | `arm`, `pi3`, `pi2` | Raspberry Pi 2/3, older ARM boards |
| `x86_64` | `x64`, `amd64`, `intel` | Intel NUC, standard PCs |
| `native` | `host`, `local` | Same as build machine |

---

## Monitoring

### Web Dashboard

```bash
horus monitor
# Opens web dashboard at http://localhost:4200
```

The monitor shows:

- **Active nodes** with health status, tick rates, CPU/memory usage
- **Topic graph** with message flow and rates
- **Parameters** with live editing
- **Packages** with install/uninstall
- **API Docs** with searchable symbol browser and topic flow visualization
- **Logs** with filtering by node and severity

### TUI Dashboard

```bash
horus monitor --tui
# Terminal-based dashboard (no browser needed)
```

### Programmatic Access

```bash
# Node status
horus node list --json

# Topic rates
horus topic list --json

# System health
horus doctor --json
```

---

## Environment Management

### Reproducible Builds

`horus.lock` pins every dependency to an exact version. Commit it to git, and `horus build` on another machine installs identical deps.

---

## Flight Recorder (BlackBox)

The BlackBox records the last N events before a crash — like an airplane's black box.
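The "last N events" behavior is what a fixed-size ring buffer provides: once it is full, each new event evicts the oldest one. The sketch below illustrates that idea with Python's `collections.deque`. It is not the HORUS BlackBox implementation, and the class name `EventRing` and its methods are made up for illustration.

```python
from collections import deque


class EventRing:
    """Keeps only the most recent `capacity` events (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops its oldest item when a new one is appended
        self._events = deque(maxlen=capacity)

    def record(self, event: str) -> None:
        """Store an event, evicting the oldest one if the buffer is full."""
        self._events.append(event)

    def dump(self) -> list:
        """Return the retained events, oldest first."""
        return list(self._events)
```

Recording five events into an `EventRing(3)` leaves only the last three, which is the property that makes a recorder of this kind cheap to keep running: memory use is bounded no matter how long the system has been up.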
```bash
# View after crash
horus blackbox
horus blackbox --json
horus blackbox anomalies
```

---

## Record and Replay

Record a session for debugging or regression testing:

```bash
# Record
horus run --record session1

# List recordings
horus record list

# Replay
horus record replay session1

# Compare two recordings
horus record diff session1 session2

# Export for analysis
horus record export session1 --format mcap
```

---

## Health Checks

```bash
# Full system health check
horus doctor

# JSON output for CI
horus doctor --json
```

The doctor checks:

- Rust toolchain (cargo, rustc, clippy, fmt)
- Python toolchain (python3, ruff, pytest)
- Project manifest validity
- Shared memory status
- Disk usage
- Plugin registry connectivity

---

## Deployment Checklist

Before deploying to production, see [Production Deployment](/operations/deploy-to-robot) for the full guide.

---

## See Also

- [Production Deployment](/operations/deploy-to-robot) — RT setup, safety, performance tuning
- [Scheduler Configuration](/advanced/scheduler-configuration) — Tick rates, budgets, deadlines
- [Package Management](/package-management/package-management) — Installing and managing packages
- [CLI Reference](/development/cli-reference) — All 46+ commands

========================================
# SECTION: Reference
========================================

---

## API Lookup by Task

Path: /reference/api-index
Description: Find the right HORUS type for your task — organized by what you want to do, not by module

# API Lookup by Task

Find the right type for what you're building. Organized by task, not module. For method signatures, see [API Cheatsheet](/reference/api-cheatsheet).

## Message Types — "What type do I use for...?"
### Motion & Control | Task | Type | Constructor | Topic | |------|------|-------------|-------| | Move a wheeled robot | `CmdVel` | `CmdVel::new(linear, angular)` | `cmd_vel` | | Move a differential drive | `DifferentialDriveCommand` | `DifferentialDriveCommand::new(left, right)` | `drive_cmd` | | Control a servo | `ServoCommand` | `ServoCommand::new(servo_id, position)` | `servo_cmd` | | Control a joint | `JointCommand` | `JointCommand::new()` | `joint_cmd` | | Control a motor | `MotorCommand` | `MotorCommand::velocity(motor_id, velocity)` | `motor_cmd` | | Follow a trajectory | `TrajectoryPoint` | `TrajectoryPoint::new(position, velocity)` | `trajectory` | | PID tuning | `PidConfig` | `PidConfig::new(kp, ki, kd)` | `pid_config` | ### Sensors | Task | Type | Constructor | Topic | |------|------|-------------|-------| | LiDAR scan | `LaserScan` | `LaserScan::new()` | `scan` | | Camera image | `Image` | `Image::new(w, h, ImageEncoding::Rgb8)?` | `image` | | Depth image | `DepthImage` | `DepthImage::meters(w, h)?` | `depth` | | Compressed image | `CompressedImage` | `CompressedImage::new("jpeg", data)` | `image/compressed` | | IMU (accel+gyro) | `Imu` | `Imu::new()` | `imu` | | Odometry (pose+velocity) | `Odometry` | `Odometry::new()` | `odom` | | Joint positions | `JointState` | `JointState::from_positions(&pos)` | `joint_states` | | Battery level | `BatteryState` | `BatteryState::new(voltage, percentage)` | `battery` | | Temperature | `Temperature` | `Temperature::new(value)` | `temperature` | | GPS | `NavSatFix` | `NavSatFix::from_coordinates(lat, lon, alt)` | `gps` | | Range sensor | `RangeSensor` | `RangeSensor::new(sensor_type, range)` | `range` | | Magnetic field | `MagneticField` | `MagneticField::new(x, y, z)` | `mag` | | Audio | `AudioFrame` | `AudioFrame::mono(sample_rate, &samples)` | `audio` | ### Perception | Task | Type | Constructor | |------|------|-------------| | 2D object detection | `Detection` | `Detection::new(class_name, confidence, x, y, w, 
h)` | | 3D object detection | `Detection3D` | `Detection3D::new(class_name, confidence, bbox3d)` | | 2D bounding box | `BoundingBox2D` | `BoundingBox2D::new(x, y, w, h)` | | 3D bounding box | `BoundingBox3D` | `BoundingBox3D::new(cx, cy, cz, length, width, height, yaw)` | | Object tracking | `TrackedObject` | POD struct — set fields directly | | Segmentation mask | `SegmentationMask` | `SegmentationMask::semantic(w, h, num_classes)` | | Point cloud | `PointCloud` | `PointCloud::from_xyz(&points)?` | | Plane detection | `PlaneDetection` | `PlaneDetection::new(coefficients, center, normal)` | | Landmark | `Landmark` | `Landmark::new(x, y, visibility, index)` | ### Navigation | Task | Type | Constructor | |------|------|-------------| | Send a goal | `NavGoal` | `NavGoal::new(target_pose, pos_tolerance, angle_tolerance)` | | Plan a path | `PathPlan` | `PathPlan::from_waypoints(&waypoints)` | | Occupancy grid | `OccupancyGrid` | `OccupancyGrid::new(w, h, resolution, origin)` | | Cost map | `CostMap` | `CostMap::new()` | | Waypoint | `Waypoint` | `Waypoint::new(pose: Pose2D)` | ### Geometry | Task | Type | Constructor | |------|------|-------------| | 3D point | `Point3` | `Point3::new(x, y, z)` | | 3D vector | `Vector3` | `Vector3::new(x, y, z)` | | Rotation | `Quaternion` | `Quaternion::identity()` | | 2D pose | `Pose2D` | `Pose2D::new(x, y, theta)` | | 3D pose | `Pose3D` | `Pose3D::new(position, orientation)` | | Velocity | `Twist` | `Twist::new(linear, angular)` | | Acceleration | `Accel` | `Accel::new(linear, angular)` | | Transform | `TransformStamped` | `TransformStamped::new(parent, child, translation, rotation)` | ### Force & Contact | Task | Type | Constructor | |------|------|-------------| | Force command | `ForceCommand` | `ForceCommand::force_only(force_vec)` | | Impedance control | `ImpedanceParameters` | `ImpedanceParameters::new(stiffness, damping)` | | Wrench (force+torque) | `WrenchStamped` | `WrenchStamped::new(force, torque)` | | Contact info | 
`ContactInfo` | `ContactInfo::new(force, position)` | ### Diagnostics | Task | Type | Constructor | |------|------|-------------| | Status report | `DiagnosticStatus` | `DiagnosticStatus::ok("message")` | | Emergency stop | `EmergencyStop` | `EmergencyStop::trigger("reason")` | | Heartbeat | `Heartbeat` | `Heartbeat::now()` | | Resource usage | `ResourceUsage` | `ResourceUsage::new(cpu, memory)` | --- ## Core API Cheatsheet --- ## CLI Commands | Command | Purpose | |---------|---------| | `horus new NAME -r/-p` | Create Rust/Python project | | `horus run` | Build and run | | `horus build` | Build only | | `horus test` | Run tests | | `horus add NAME --source SOURCE` | Add dependency | | `horus remove NAME` | Remove dependency | | `horus install PKG` | Install package from registry | | `horus topic list` | List active topics | | `horus node list` | List running nodes | | `horus param get/set KEY VALUE` | Runtime parameters | | `horus monitor` | Web dashboard | | `horus monitor --tui` | Terminal dashboard | | `horus log` | View logs | | `horus doctor` | Health check | | `horus check` | Validate project | | `horus fmt` | Format code | | `horus lint` | Lint code | | `horus deploy TARGET` | Deploy to robot | | `horus publish` | Publish to registry | | `horus auth login` | Authenticate | ## See Also - [Getting Started](/getting-started/quick-start) — First project in 5 minutes - [Tutorials](/tutorials) — Step-by-step guided examples - [Rust API](/rust/api/scheduler) — Full Rust API reference - [Python API](/python/api/python-bindings) — Python bindings reference - [CLI Reference](/development/cli-reference) — Complete CLI documentation --- ## API Cheatsheet Path: /reference/api-cheatsheet Description: Every public type signature in horus — Node, Topic, Scheduler, Messages, Services, Actions, TransformFrame — in one page. # API Cheatsheet Every public type signature in HORUS on one page. Use this as a quick lookup when coding. For explanations, see [Concepts](/concepts). 
For detailed docs, see the [Rust API](/rust/api) and [Python API](/python/api). --- ## Node Trait | Method | Signature | Default | Description | |--------|-----------|---------|-------------| | `name` | `fn name(&self) -> &str` | Type name | Unique node identifier | | `init` | `fn init(&mut self) -> Result<()>` | `Ok(())` | Called once at startup | | `tick` | `fn tick(&mut self)` | *required* | Main loop, called repeatedly | | `shutdown` | `fn shutdown(&mut self) -> Result<()>` | `Ok(())` | Called once at cleanup | | `publishers` | `fn publishers(&self) -> Vec` | `vec![]` | Topic metadata for pubs | | `subscribers` | `fn subscribers(&self) -> Vec` | `vec![]` | Topic metadata for subs | | `on_error` | `fn on_error(&mut self, error: &str)` | Logs via `hlog!` | Custom error recovery | | `is_safe_state` | `fn is_safe_state(&self) -> bool` | `true` | Safety monitor query | | `enter_safe_state` | `fn enter_safe_state(&mut self)` | No-op | Emergency stop transition | ## NodeBuilder | Method | Parameter | Description | |--------|-----------|-------------| | `order` | `u32` | Execution priority (lower = earlier). 0-9 critical, 10-49 high, 50-99 normal, 100+ low | | `rate` | `Frequency` | Tick rate. Auto-derives budget (80%) and deadline (95%). Auto-enables RT for BestEffort nodes | | `budget` | `Duration` | Max tick execution time. Overrides auto-derived 80% budget | | `deadline` | `Duration` | Absolute latest tick finish. Overrides auto-derived 95% deadline | | `on_miss` | `Miss` | Deadline miss policy: `Warn`, `Skip`, `SafeMode`, `Stop` | | `compute` | — | Parallel thread pool execution (CPU-bound work) | | `on` | `&str` | Event-triggered on topic update | | `async_io` | — | Tokio blocking pool execution (I/O-bound work) | | `failure_policy` | `FailurePolicy` | Per-node failure handling: `fatal()`, `restart(n, backoff)`, `skip()`, `ignore()` | | `priority` | `i32` | OS-level SCHED_FIFO priority (1-99). 
RT nodes only | | `core` | `usize` | Pin RT thread to CPU core (also locks governor + moves IRQs) | | `deadline_scheduler` | — | Opt-in to SCHED_DEADLINE (kernel EDF). Falls back to SCHED_FIFO | | `no_alloc` | — | Panic if `tick()` allocates heap memory | | `watchdog` | `Duration` | Per-node watchdog timeout (overrides global) | | `build` | — | Finalize and register node. Returns `Result<&mut Scheduler>` | ## Scheduler | Method | Parameter | Description | |--------|-----------|-------------| | `Scheduler::new()` | — | Constructor with RT capability auto-detection | | `Scheduler::simulation()` | — | Constructor that skips RT detection (for tests) | | `name` | `&str` | Set scheduler name (default: `"Scheduler"`) | | `tick_rate` | `Frequency` | Global tick rate (e.g. `1000_u64.hz()`) | | `deterministic` | `bool` | Sequential execution, SimClock, fixed dt, seeded RNG | | `prefer_rt` | — | Try mlockall + SCHED_FIFO, degrade gracefully | | `require_rt` | — | Panic if RT unavailable | | `cores` | `&[usize]` | Pin scheduler threads to CPU cores | | `watchdog` | `Duration` | Frozen-node detection timeout | | `blackbox` | `usize` | Flight recorder size in MB | | `max_deadline_misses` | `u64` | Emergency stop threshold (default: 100) | | `verbose` | `bool` | Enable/disable executor thread logging | | `with_recording` | — | Enable session recording | | `telemetry` | `&str` | Export endpoint (`"udp://host:port"`) | | `add` | `impl Node` | Returns `NodeBuilder` for fluent configuration | | `set_node_rate` | `(&str, Frequency)` | Change node rate at runtime | | `run` | — | Start the main loop (blocks) | | `run_for` | `Duration` | Run for a fixed duration | | `run_ticks` | `u64` | Run exactly N ticks | | `run_until` | `FnMut() -> bool, u64` | Run until predicate or max ticks | | `tick_once` | — | Single-tick execution (sim/test). 
Lazy-inits on first call | | `stop` | — | Signal scheduler to stop | | `status` | — | Human-readable status report | ## `Topic` | Method | Signature | Description | |--------|-----------|-------------| | `new` | `fn new(name: impl Into) -> Result` | Create topic with auto-detected backend | | `send` | `fn send(&self, msg: T)` | Fire-and-forget with bounded retry | | `try_send` | `fn try_send(&self, msg: T) -> Result<(), T>` | Non-blocking send, returns msg on failure | | `send_blocking` | `fn send_blocking(&self, msg: T, timeout: Duration) -> Result<(), SendBlockingError>` | Blocking send with spin-yield-sleep strategy | | `recv` | `fn recv(&self) -> Option` | Receive next message | | `try_recv` | `fn try_recv(&self) -> Option` | Non-blocking receive (no logging) | | `read_latest` | `fn read_latest(&self) -> Option where T: Copy` | Peek latest without advancing consumer | | `name` | `fn name(&self) -> &str` | Topic name | | `metrics` | `fn metrics(&self) -> TopicMetrics` | Send/recv counts and failure counts | ## Services | Type | Method | Description | |------|--------|-------------| | `ServiceClient` | `new() -> Result` | Create blocking client for service `S` | | `ServiceClient` | `call(req, timeout) -> Result` | Blocking RPC call | | `ServiceClient` | `call_with_retry(req, timeout, RetryConfig) -> Result` | Call with retry policy | | `AsyncServiceClient` | `new() -> Result` | Create non-blocking client | | `AsyncServiceClient` | `call_async(req) -> PendingResponse` | Non-blocking call, returns handle | | `ServiceServerBuilder` | `new() -> Self` | Start building a server | | `ServiceServerBuilder` | `on_request(Fn(Req) -> Result) -> Self` | Set request handler | | `ServiceServerBuilder` | `build() -> Result>` | Spawn background polling thread | | `ServiceServer` | `stop(&self)` | Stop server (also happens on drop) | ## Actions | Type | Method | Description | |------|--------|-------------| | `ActionClientBuilder` | `new() -> Self` | Start building an action 
client | | `ActionClientBuilder` | `build() -> Result>` | Create client node | | `ActionClientNode` | `send_goal(goal) -> ClientGoalHandle` | Send goal, get tracking handle | | `ActionClientNode` | `send_goal_with_priority(goal, GoalPriority) -> ClientGoalHandle` | Send prioritized goal | | `ActionClientNode` | `cancel_goal(GoalId)` | Cancel a running goal | | `ClientGoalHandle` | `status() -> GoalStatus` | Current goal status | | `ClientGoalHandle` | `is_active() -> bool` | Still running? | | `ClientGoalHandle` | `is_done() -> bool` | Terminal state? | | `ClientGoalHandle` | `result() -> Option` | Get result if complete | | `ClientGoalHandle` | `last_feedback() -> Option` | Most recent feedback | | `ClientGoalHandle` | `await_result(timeout) -> Option` | Block until done or timeout | | `ClientGoalHandle` | `await_result_with_feedback(timeout, Fn(&Feedback))` | Block with feedback callback | | `ActionServerBuilder` | `new() -> Self` | Start building an action server | | `ActionServerBuilder` | `on_goal(Fn(Goal) -> GoalResponse) -> Self` | Accept/reject handler | | `ActionServerBuilder` | `on_cancel(Fn(GoalId) -> CancelResponse) -> Self` | Cancel handler | | `ActionServerBuilder` | `on_execute(Fn(ServerGoalHandle) -> GoalOutcome) -> Self` | Execution handler | | `ActionServerBuilder` | `build() -> Result>` | Create server node | | `ServerGoalHandle` | `goal() -> &A::Goal` | Access goal data | | `ServerGoalHandle` | `is_cancel_requested() -> bool` | Client requested cancel? | | `ServerGoalHandle` | `should_abort() -> bool` | Cancel or preempt requested? 
| | `ServerGoalHandle` | `publish_feedback(feedback)` | Send progress feedback | | `ServerGoalHandle` | `succeed(result) -> GoalOutcome` | Complete successfully | | `ServerGoalHandle` | `abort(result) -> GoalOutcome` | Server-side abort | | `ServerGoalHandle` | `canceled(result) -> GoalOutcome` | Acknowledge cancellation | | `ServerGoalHandle` | `preempted(result) -> GoalOutcome` | Preempted by higher-priority goal | ## TransformFrame | Method | Signature | Description | |--------|-----------|-------------| | `new` | `fn new() -> Self` | Create empty TF tree (default config) | | `register_frame` | `fn register_frame(&self, name: &str, parent: Option<&str>) -> Result` | Add a frame to the tree | | `tf` | `fn tf(&self, src: &str, dst: &str) -> Result` | Lookup latest transform between frames | | `tf_at` | `fn tf_at(&self, src: &str, dst: &str, timestamp_ns: u64) -> Result` | Lookup transform at a specific time | | `tf_at_strict` | `fn tf_at_strict(&self, src: &str, dst: &str, timestamp_ns: u64) -> Result` | Strict time lookup (no interpolation beyond tolerance) | | `tf_at_with_tolerance` | `fn tf_at_with_tolerance(&self, src: &str, dst: &str, ts: u64, tol: u64) -> Result` | Custom time tolerance | | `update_transform` | `fn update_transform(&self, parent: &str, child: &str, ts: u64, tf: Transform) -> Result<()>` | Update a frame's transform | | `has_frame` | `fn has_frame(&self, name: &str) -> bool` | Check if frame exists | | `all_frames` | `fn all_frames(&self) -> Vec` | List all registered frame names | | `frame_count` | `fn frame_count(&self) -> usize` | Number of registered frames | | `tf_by_id` | `fn tf_by_id(&self, src: FrameId, dst: FrameId) -> Option` | Lookup by numeric ID (fast path) | | `tf_at_by_id` | `fn tf_at_by_id(&self, src: FrameId, dst: FrameId, ts: u64) -> Option` | Time-based lookup by ID | ## DurationExt | Method | Input | Output | Example | |--------|-------|--------|---------| | `ns` | `u64`, `f64`, `i32` | `Duration` | `500_u64.ns()` | | `us` 
| `u64`, `f64`, `i32` | `Duration` | `200_u64.us()` | | `ms` | `u64`, `f64`, `i32` | `Duration` | `1_u64.ms()` | | `secs` | `u64`, `f64`, `i32` | `Duration` | `5_u64.secs()` | | `hz` | `u64`, `f64`, `i32` | `Frequency` | `100_u64.hz()` | `Frequency` methods: `value() -> f64`, `period() -> Duration`, `budget_default() -> Duration` (80%), `deadline_default() -> Duration` (95%). ## Enums ### ExecutionClass | Variant | Description | |---------|-------------| | `Rt` | Dedicated RT thread with spin-wait timing | | `Compute` | Parallel thread pool for CPU-bound work | | `Event(String)` | Triggered by topic updates | | `AsyncIo` | Tokio blocking pool for I/O-bound work | | `BestEffort` | Default -- main tick loop, sequential | ### Miss | Variant | Description | |---------|-------------| | `Warn` | Log warning and continue (default) | | `Skip` | Skip this tick, resume next cycle | | `SafeMode` | Call `enter_safe_state()`, continue ticking in safe mode | | `Stop` | Stop the entire scheduler | ### NodeState | Variant | Description | |---------|-------------| | `Uninitialized` | Created but not started | | `Initializing` | `init()` in progress | | `Running` | Normal operation | | `Stopping` | `shutdown()` in progress | | `Stopped` | Cleanly stopped | | `Error(String)` | Error occurred, still running | | `Crashed(String)` | Fatal error, unresponsive | ### HealthStatus | Variant | Description | |---------|-------------| | `Healthy` | Operating normally | | `Warning` | Degraded performance (slow ticks, missed deadlines) | | `Error` | Errors occurring but still running | | `Critical` | Fatal errors, about to crash | | `Unknown` | No heartbeat received (default) | ## Standard Message Types ### Geometry | Type | Fields | Size | |------|--------|------| | `Pose2D` | `x: f64, y: f64, theta: f64` | 24 B | | `Pose3D` | `translation: [f64; 3], rotation: [f64; 4]` | 56 B | | `Point3` | `x: f64, y: f64, z: f64` | 24 B | | `Vector3` | `x: f64, y: f64, z: f64` | 24 B | | `Quaternion` | `x: 
f64, y: f64, z: f64, w: f64` | 32 B | | `Twist` | `linear: [f64; 3], angular: [f64; 3]` | 48 B | | `Accel` | `linear: [f64; 3], angular: [f64; 3]` | 48 B | | `TransformStamped` | `parent: str, child: str, timestamp_ns: u64, transform: Transform` | var | | `PoseStamped` | `pose: Pose3D, timestamp_ns: u64, frame_id: str` | var | | `PoseWithCovariance` | `pose: Pose3D, covariance: [f64; 36]` | 344 B | | `TwistWithCovariance` | `twist: Twist, covariance: [f64; 36]` | 336 B | | `AccelStamped` | `accel: Accel, timestamp_ns: u64, frame_id: str` | var | ### Sensors | Type | Fields | Size | |------|--------|------| | `Imu` | `orientation: [f64;4], orientation_covariance: [f64;9], angular_velocity: [f64;3], linear_acceleration: [f64;3], + covariances` | 304 B | | `LaserScan` | `ranges: [f32; 360], angle_min/max/increment: f32, range_min/max: f32` | 1480 B | | `Odometry` | `pose: Pose2D, twist: Twist, pose_covariance: [f64;36], twist_covariance: [f64;36]` | POD | | `JointState` | `position: Vec, velocity: Vec, effort: Vec, name: Vec` | var | | `BatteryState` | `voltage: f32, percentage: f32, current: f32, charging: bool, temperature: f32` | 17 B | | `RangeSensor` | `sensor_type: u8, range: f32, min_range: f32, max_range: f32, fov: f32` | 17 B | | `NavSatFix` | `latitude: f64, longitude: f64, altitude: f64, status: i8, covariance: [f64;9]` | var | | `MagneticField` | `field: [f64; 3], covariance: [f64; 9]` | 96 B | | `Temperature` | `temperature: f64, variance: f64` | 16 B | | `FluidPressure` | `pressure: f64, variance: f64` | 16 B | | `Illuminance` | `illuminance: f64, variance: f64` | 16 B | ### Control | Type | Fields | Size | |------|--------|------| | `CmdVel` | `linear: f32, angular: f32` | 8 B | | `MotorCommand` | `left: f64, right: f64` | 16 B | | `ServoCommand` | `servo_id: u8, position: f32, speed: f32, torque_limit: f32` | 13 B | | `JointCommand` | `name: Vec, position: Vec, velocity: Vec, effort: Vec` | var | | `PidConfig` | `kp: f64, ki: f64, kd: f64, output_min: 
f64, output_max: f64` | 40 B | | `TrajectoryPoint` | `position: [f64;3], linear_velocity: Vec3, angular_velocity: Vec3, time: f64` | var | | `DifferentialDriveCommand` | `left: f64, right: f64, timestamp_ns: u64` | 24 B | ### Navigation | Type | Fields | Size | |------|--------|------| | `NavGoal` | `target_pose: Pose2D, position_tolerance: f64, angle_tolerance: f64` | 40 B | | `GoalResult` | `goal_id: u32, status: GoalStatus, message: String` | var | | `NavPath` | `waypoints: Vec` | var | | `PathPlan` | `poses: Vec, cost: f64` | var | | `Waypoint` | `pose: Pose2D, speed: f64, tolerance: f64` | 40 B | | `OccupancyGrid` | `width: u32, height: u32, resolution: f32, origin: Pose2D, data: Vec` | var | | `CostMap` | `width: u32, height: u32, resolution: f32, origin: Pose2D, data: Vec` | var | | `VelocityObstacle` | `position: Point3, velocity: Vector3, radius: f64` | 56 B | ### Vision | Type | Fields | |------|--------| | `CompressedImage` | `format: String, data: Vec, timestamp_ns: u64` | | `CameraInfo` | `width: u32, height: u32, fx: f64, fy: f64, cx: f64, cy: f64, distortion: Vec` | | `RegionOfInterest` | `x: u32, y: u32, width: u32, height: u32` | | `StereoInfo` | `left: CameraInfo, right: CameraInfo, baseline: f64` | ### Perception and Detection | Type | Fields | |------|--------| | `BoundingBox2D` | `x: f32, y: f32, width: f32, height: f32` | | `BoundingBox3D` | `cx: f32, cy: f32, cz: f32, length: f32, width: f32, height: f32, yaw: f32` | | `Detection` | `class_name: String, confidence: f32, bbox: BoundingBox2D` | | `Detection3D` | `class_name: String, confidence: f32, bbox: BoundingBox3D` | | `Landmark` | `x: f32, y: f32, visibility: f32, index: u32` | | `Landmark3D` | `x: f32, y: f32, z: f32, visibility: f32, index: u32` | | `LandmarkArray` | `landmarks: Vec, landmarks_3d: Vec` | | `SegmentationMask` | `width: u32, height: u32, class_ids: Vec` | | `TrackedObject` | `track_id: u64, bbox: BoundingBox2D, class_id: u32, confidence: f32` | | `PlaneDetection` | 
`coefficients: [f64;4], center: Point3, normal: Vector3` | ### Force and Haptics | Type | Fields | |------|--------| | `WrenchStamped` | `force: Vector3, torque: Vector3, timestamp_ns: u64, frame_id: String` | | `ForceCommand` | `force: Vector3, torque: Vector3, frame_id: String` | | `ImpedanceParameters` | `stiffness: [f64;6], damping: [f64;6], inertia: [f64;6]` | | `HapticFeedback` | `force: Vector3, vibration_frequency: f64, vibration_amplitude: f64` | | `ContactInfo` | `state: ContactState, force_magnitude: f64, contact_point: Point3, normal: Vector3` | | `TactileArray` | `rows: u32, cols: u32, forces: Vec, total_force: [f32;3], center_of_pressure: [f32;2], in_contact: bool` | ### Diagnostics | Type | Fields | |------|--------| | `Heartbeat` | `node_name: String, node_id: u32, timestamp_ns: u64, sequence: u64, health: HealthStatus` | | `DiagnosticStatus` | `level: StatusLevel, code: u32, message: String, values: Vec` | | `DiagnosticReport` | `statuses: Vec, timestamp_ns: u64` | | `EmergencyStop` | `triggered: bool, source: String, reason: String, timestamp_ns: u64` | | `SafetyStatus` | `state: NodeStateMsg, health: HealthStatus, violations: Vec` | | `ResourceUsage` | `cpu_percent: f32, memory_bytes: u64, thread_count: u32` | | `NodeHeartbeat` | `node_name: String, tick_count: u64, health: HealthStatus` | ### Input | Type | Fields | |------|--------| | `JoystickInput` | `joystick_id: u32, input_type: InputType, button_id/axis_id: u32, value: f32` | | `KeyboardInput` | `key: String, code: u32, modifiers: Vec, pressed: bool` | ### Clock | Type | Fields | |------|--------| | `Clock` | `timestamp_ns: u64, clock_type: ClockType` | | `TimeReference` | `time_ref_ns: u64, source_name: String, offset_ns: i64` | ## Python API ### Node (Functional) | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `name` | `Optional[str]` | Auto-generated UUID | Unique node name | | `subs` | `str`, `list`, or `dict` | `None` | Topics to subscribe 
to | | `pubs` | `str`, `list`, or `dict` | `None` | Topics to publish to | | `tick` | `Callable[[Node], None]` | `None` | Main loop callback (can be `async def`) | | `init` | `Callable[[Node], None]` | `None` | Startup callback | | `shutdown` | `Callable[[Node], None]` | `None` | Cleanup callback | | `on_error` | `Callable[[Node, Exception], None]` | `None` | Error handler | | `rate` | `float` | `30` | Tick rate in Hz | ### Node Methods | Method | Signature | Description | |--------|-----------|-------------| | `has_msg` | `has_msg(topic: str) -> bool` | Check if messages available (peeks) | | `recv` | `recv(topic: str) -> Optional[Any]` | Receive next message | | `recv_all` | `recv_all(topic: str) -> List[Any]` | Drain all available messages | | `send` | `send(topic: str, data: Any) -> bool` | Send data to topic | | `log_info` | `log_info(message: str) -> None` | Log info (during tick only) | | `log_warning` | `log_warning(message: str) -> None` | Log warning (during tick only) | | `log_error` | `log_error(message: str) -> None` | Log error (during tick only) | | `log_debug` | `log_debug(message: str) -> None` | Log debug (during tick only) | | `request_stop` | `request_stop() -> None` | Request scheduler shutdown | | `publishers` | `publishers() -> List[str]` | List pub topic names | | `subscribers` | `subscribers() -> List[str]` | List sub topic names | ### Scheduler | Method | Signature | Description | |--------|-----------|-------------| | `__init__` | `Scheduler(*, tick_rate=1000.0, rt=False, deterministic=False, blackbox_mb=0, watchdog_ms=0, recording=False)` | Constructor | | `add` | `add(node, order=100, rate=None, rt=False, failure_policy=None, on_miss=None, budget=None, deadline=None) -> Scheduler` | Add node with options | | `node` | `node(node) -> NodeBuilder` | Fluent builder API | | `run` | `run(duration=None) -> None` | Run scheduler (blocks). 
Pass seconds or `None` for forever | | `stop` | `stop() -> None` | Signal stop | | `set_node_rate` | `set_node_rate(name: str, rate: float) -> None` | Change node rate at runtime | | `get_node_stats` | `get_node_stats(name: str) -> Dict` | Per-node metrics | | `get_all_nodes` | `get_all_nodes() -> List[Dict]` | All node metrics | | `get_node_count` | `get_node_count() -> int` | Number of registered nodes | | `has_node` | `has_node(name: str) -> bool` | Check node exists | | `get_node_names` | `get_node_names() -> List[str]` | All node names | | `status` | `status() -> str` | Scheduler status string | | `capabilities` | `capabilities() -> Optional[Dict]` | RT capabilities | | `has_full_rt` | `has_full_rt() -> bool` | Full RT available? | | `safety_stats` | `safety_stats() -> Optional[Dict]` | Budget overruns, deadline misses | | `current_tick` | `current_tick() -> int` | Tick counter | ### `horus.run()` | Parameter | Type | Description | |-----------|------|-------------| | `*nodes` | `Node` | Node instances to run (positional) | | `duration` | `Optional[float]` | Seconds to run (`None` = forever) | | `tick_rate` | `float` | Global tick rate in Hz (default: 1000.0) | | `deterministic` | `bool` | SimClock, fixed dt, seeded RNG | | `rt` | `bool` | Enable memory locking + RT scheduling | | `watchdog_ms` | `int` | Watchdog timeout (0 = disabled) | | `blackbox_mb` | `int` | Flight recorder size (0 = disabled) | | `recording` | `bool` | Enable session recording | ## CLI Commands | Command | Usage | Description | |---------|-------|-------------| | `horus new` | `horus new mybot [--python\|--rust\|--cpp]` | Create a new project | | `horus init` | `horus init [-n name]` | Initialize workspace in current directory | | `horus run` | `horus run [files...] [-r] [--sim] [--record name]` | Build and run project | | `horus build` | `horus build [files...] 
[-r] [-c]` | Build without running | | `horus test` | `horus test [FILTER] [-r] [--sim] [--integration]` | Run tests | | `horus check` | `horus check [PATH] [--full] [--health]` | Validate manifest and sources | | `horus clean` | `horus clean [--shm] [--all] [-n]` | Clean build artifacts and SHM | | `horus launch` | `horus launch file.yaml [--dry-run]` | Launch multiple nodes from YAML | | `horus topic` | `horus topic list\|echo\|pub\|info` | Topic introspection | | `horus node` | `horus node list\|info\|kill` | Node management | | `horus param` | `horus param get\|set\|list\|delete` | Runtime parameters | | `horus frame` | `horus frame list\|echo\|tree` | TransformFrame operations (alias: `horus tf`) | | `horus service` | `horus service list\|call\|info` | Service interaction | | `horus action` | `horus action list\|info\|send-goal\|cancel-goal` | Action interaction | | `horus msg` | `horus msg list\|show\|fields` | Message type introspection | | `horus log` | `horus log [NODE] [-l level] [-f] [-n N]` | View and filter logs | | `horus blackbox` | `horus blackbox [-a] [-f] [--json]` | Flight recorder inspection | | `horus monitor` | `horus monitor [PORT] [--tui]` | Live TUI/web dashboard | | `horus install` | `horus install name[@ver] [--driver\|--plugin]` | Install package or driver | --- ## See Also - [API Quick Reference](/reference/api-index) — One-page type reference - [Rust API](/rust/api) — Complete Rust API documentation - [Python API](/python/api) — Complete Python API documentation --- ## C++ API Reference Path: /reference/cpp-api Description: Complete reference for all HORUS C++ classes, methods, and types # C++ API Reference All C++ types live in the `horus` namespace. Message types are in `horus::msg`. ## horus::Scheduler The central orchestrator — creates, configures, and runs nodes. 
```cpp #include ``` ### Constructor | Method | Description | |--------|-------------| | `Scheduler()` | Create with default configuration | ### Builder Methods (chainable) | Method | Description | |--------|-------------| | `tick_rate(Frequency)` | Set main loop rate (`100_hz`) | | `name(string_view)` | Set scheduler name | | `prefer_rt()` | Prefer RT scheduling (degrade gracefully) | | `require_rt()` | Require RT scheduling (fail if unavailable) | | `deterministic(bool)` | Enable deterministic mode (SimClock + seeded RNG) | | `verbose(bool)` | Enable verbose logging | | `watchdog(Duration)` | Set global watchdog timeout | | `blackbox(size_t)` | Set flight recorder size in MB | | `network(bool)` | Enable/disable LAN replication (on by default) | ### Topic Methods | Method | Description | |--------|-------------| | `advertise(name)` | Create a `Publisher` for the named topic | | `subscribe(name)` | Create a `Subscriber` for the named topic | ### Node Methods | Method | Description | |--------|-------------| | `add(name)` | Add a node, returns `NodeBuilder` | ### Lifecycle | Method | Description | |--------|-------------| | `spin()` | Run until stopped (blocks) | | `tick_once()` | Execute one tick of all nodes | | `stop()` | Stop the scheduler (thread-safe) | | `is_running()` | Check if still running | | `get_name()` | Get scheduler name | | `status()` | Human-readable status string | | `has_full_rt()` | Check RT capabilities | | `node_list()` | Get registered node names | --- ## horus::Node (struct-based) Base class for nodes with built-in pub/sub and lifecycle hooks. Like Rust's `impl Node for T`. 
```cpp #include class Controller : public horus::Node { public: Controller() : Node("controller") { scan_ = subscribe("lidar.scan"); cmd_ = advertise("motor.cmd"); } void tick() override { auto scan = scan_->recv(); if (!scan) return; msg::CmdVel cmd{}; cmd.linear = 0.3f; cmd_->send(cmd); } void init() override { /* called once before first tick */ } void enter_safe_state() override { /* stop motors */ } private: Subscriber* scan_; Publisher* cmd_; }; // Register with scheduler: Controller ctrl; sched.add(ctrl).rate(100_hz).order(10).build(); ``` | Method | Description | |--------|-------------| | `tick()` | Called every scheduler tick (pure virtual) | | `init()` | Called once before first tick | | `enter_safe_state()` | Called on safety events | | `on_shutdown()` | Called on scheduler shutdown | | `advertise(topic)` | Create publisher, returns `Publisher*` | | `subscribe(topic)` | Create subscriber, returns `Subscriber*` | | `name()` | Node name | | `publishers()` | List of published topic names | | `subscriptions()` | List of subscribed topic names | --- ## horus::LambdaNode (Python-style) Declarative node with builder pattern. Like Python's `horus.Node()`. ```cpp #include auto node = horus::LambdaNode("controller") .sub("lidar.scan") .pub("motor.cmd") .on_tick([](horus::LambdaNode& self) { auto scan = self.recv("lidar.scan"); if (!scan) return; self.send("motor.cmd", msg::CmdVel{0, 0.3f, 0.0f}); }); sched.add(node).order(10).build(); ``` | Method | Description | |--------|-------------| | `pub(topic)` | Declare publisher (builder, chainable) | | `sub(topic)` | Declare subscriber (builder, chainable) | | `on_tick(fn)` | Set tick callback: `void(LambdaNode&)` | | `on_init(fn)` | Set init callback: `void(LambdaNode&)` | | `send(topic, msg)` | Send message to topic (call from tick) | | `recv(topic)` | Receive from topic (call from tick) | | `has_msg(topic)` | Check if message available | --- ## horus::NodeBuilder Configures scheduling for a node. 
Returned by `Scheduler::add()`. ### Builder Methods (chainable) | Method | Description | Default | |--------|-------------|---------| | `rate(Frequency)` | Tick rate. Auto-derives budget (80%) and deadline (95%) | BestEffort | | `budget(Duration)` | Max tick time. Overrides auto-derived | 80% of period | | `deadline(Duration)` | Absolute latest a tick can finish | 95% of period | | `on_miss(Miss)` | Policy on deadline miss | `Miss::Warn` | | `compute()` | Parallel thread pool execution | — | | `async_io()` | Tokio blocking pool | — | | `on(topic)` | Event-triggered by topic update | — | | `order(uint32_t)` | Execution order (lower = earlier). 0-9: critical, 10-49: high, 50-99: normal, 100+: low | 0 | | `pin_core(size_t)` | Pin to CPU core | Auto | | `priority(int32_t)` | SCHED_FIFO priority (1-99) | Auto | | `watchdog(Duration)` | Per-node watchdog | Global | | `tick(function)` | Set tick callback (lambda or `Node`/`LambdaNode`) | Required | ### Finalize | Method | Description | |--------|-------------| | `build()` | Register the node with the scheduler | ### Three Ways to Add Nodes ```cpp // 1. Lambda (quick) sched.add("ctrl").tick([&]{ /* ... */ }).build(); // 2. Struct-based (stateful, lifecycle hooks) class MyNode : public horus::Node { /* ... */ }; MyNode node; sched.add(node).rate(100_hz).build(); // 3. LambdaNode (Python-style, declarative) auto node = horus::LambdaNode("ctrl").sub("in").pub("out").on_tick(fn); sched.add(node).build(); ``` --- ## horus::Publisher<T> Publishes messages to a topic. Create via `Node::advertise()` or directly. | Method | Description | |--------|-------------| | `loan()` | Get a writable `LoanedSample` (zero-copy from SHM) | | `publish(LoanedSample&&)` | Publish the sample (move-only) | | `send(const T&)` | Send by copy (simpler, slight overhead for large types) | | `name()` | Topic name | --- ## horus::Subscriber<T> Receives messages from a topic. Create via `Scheduler::subscribe()`. 
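As a rough mental model of what a subscriber reads from, the sketch below mimics a bounded topic buffer in plain Python: `recv()` returns `None` when nothing is pending, `has_msg()` does not consume, and a full buffer drops its oldest entry (the behavior this reference describes for the Rust `Topic::send`). `RingTopic` and its `dropped` counter are hypothetical names used only for illustration, and whether the C++ `Subscriber` buffers exactly this way is an assumption.

```python
from collections import deque


class RingTopic:
    """Toy bounded buffer with overwrite-oldest semantics (illustrative, not a HORUS API)."""

    def __init__(self, capacity: int) -> None:
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # messages lost because the buffer was full

    def send(self, msg) -> None:
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1
        self._buf.append(msg)  # a deque with maxlen discards the oldest entry

    def has_msg(self) -> bool:
        return bool(self._buf)  # peek only, nothing is consumed

    def recv(self):
        return self._buf.popleft() if self._buf else None


topic = RingTopic(capacity=2)
for value in (1, 2, 3):  # third send overwrites the first message
    topic.send(value)
first = topic.recv()
```

In this model a consumer that does not drain the buffer often enough silently loses the oldest messages, which is the reason the reference recommends receiving on every tick.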
| Method | Description | |--------|-------------| | `recv()` | Receive next message → `std::optional>` | | `has_msg()` | Check if message available (non-consuming) | | `name()` | Topic name | --- ## horus::LoanedSample<T> Writable handle to shared memory. Move-only, RAII release. | Method | Description | |--------|-------------| | `operator->()` | Direct SHM pointer (0ns) | | `operator*()` | Dereference | | `get()` | Raw `T*` pointer | **Move-only**: copy constructor and copy assignment are deleted. --- ## horus::BorrowedSample<T> Read-only received message. Move-only, RAII release. | Method | Description | |--------|-------------| | `operator->()` | Read pointer (const) | | `operator*()` | Dereference (const) | | `get()` | Raw `const T*` pointer | --- ## horus::Frequency Represents a frequency in Hertz. Created via `_hz` literal. | Method | Description | |--------|-------------| | `value()` | Get Hz as `double` | | `period()` | Get period as `std::chrono::microseconds` | | `budget_default()` | 80% of period | | `deadline_default()` | 95% of period | --- ## horus::Miss Deadline miss policy enum. | Value | Description | |-------|-------------| | `Miss::Warn` | Log warning, continue | | `Miss::Skip` | Skip this tick | | `Miss::SafeMode` | Enter safe state | | `Miss::Stop` | Stop scheduler | --- ## Duration Literals ```cpp using namespace horus::literals; ``` | Literal | Type | Example | |---------|------|---------| | `_hz` | `Frequency` | `100_hz` | | `_ms` | `microseconds` | `5_ms` (5000us) | | `_us` | `microseconds` | `200_us` | | `_ns` | `nanoseconds` | `500_ns` | | `_s` | `microseconds` | `3_s` (3000000us) | --- ## horus::TensorPool SHM-backed memory pool for zero-copy large data (images, point clouds, tensors). 
```cpp #include ``` | Method | Description | |--------|-------------| | `TensorPool(pool_id, size, max_slots)` | Create pool with given capacity | | `stats()` | Get `Stats{allocated, used_bytes, free_bytes}` | --- ## horus::Tensor Pool-allocated N-dimensional tensor. Move-only, RAII. | Method | Description | |--------|-------------| | `Tensor(pool, shape, ndim, dtype)` | Allocate from pool | | `data()` | Raw `uint8_t*` to SHM memory | | `nbytes()` | Size in bytes | | `release()` | Return to pool early | `horus::Dtype`: `F32`, `F64`, `U8`, `I32` --- ## horus::Image Pool-backed image. Move-only, RAII. ```cpp auto pool = horus::TensorPool(1, 16*1024*1024, 64); auto img = horus::Image(pool, 640, 480, horus::Encoding::Rgb8); ``` | Method | Description | |--------|-------------| | `Image(pool, w, h, encoding)` | Allocate from pool | | `width()` / `height()` | Dimensions | | `data_size()` | Total bytes (w * h * channels) | `horus::Encoding`: `Rgb8`, `Rgba8`, `Gray8`, `Bgr8` --- ## horus::PointCloud Pool-backed point cloud. Move-only, RAII. ```cpp auto pc = horus::PointCloud(pool, 1000, 3); // 1000 XYZ points ``` | Method | Description | |--------|-------------| | `PointCloud(pool, n, fields)` | Allocate from pool. fields: 3=XYZ, 4=XYZI, 6=XYZRGB | | `num_points()` | Number of points | | `fields_per_point()` | Fields per point | --- ## horus::Params Dynamic runtime parameters (key-value store). ```cpp #include auto params = horus::Params(); params.set("max_speed", 1.5); double speed = params.get("max_speed", 0.0); ``` | Method | Description | |--------|-------------| | `set(key, value)` | Set parameter (overloaded for double, int64_t, bool, string) | | `get(key, default)` | Get with default. T: `double`, `int64_t`, `int`, `bool`, `std::string` | | `get_f64(key)` / `get_i64(key)` / `get_bool(key)` / `get_string(key)` | Typed getters returning `std::optional` | | `has(key)` | Check if parameter exists | --- ## horus::ServiceClient Request/response RPC client. 
```cpp #include auto client = horus::ServiceClient("add_two_ints"); auto resp = client.call(R"({"a":3,"b":4})", 1000ms); ``` | Method | Description | |--------|-------------| | `ServiceClient(name)` | Create client for named service | | `call(json, timeout)` | Call with JSON request, returns `std::optional` | --- ## horus::ServiceServer Request/response RPC server. | Method | Description | |--------|-------------| | `ServiceServer(name)` | Create server for named service | | `set_handler(fn)` | Set handler: `bool(const uint8_t* req, size_t len, uint8_t* res, size_t* res_len)` | --- ## horus::ActionClient Long-running task client with progress feedback. ```cpp #include auto client = horus::ActionClient("navigate"); auto goal = client.send_goal(R"({"x":5.0})"); while (goal.is_active()) { /* poll */ } ``` | Method | Description | |--------|-------------| | `ActionClient(name)` | Create client for named action | | `send_goal(json)` | Send goal, returns `GoalHandle` | --- ## horus::GoalHandle Tracks a sent goal. Move-only. | Method | Description | |--------|-------------| | `status()` | `GoalStatus` enum | | `id()` | Goal ID | | `is_active()` | Still running? | | `cancel()` | Request cancellation | `horus::GoalStatus`: `Pending`, `Active`, `Succeeded`, `Aborted`, `Canceled`, `Rejected` --- ## horus::ActionServer Long-running task server. | Method | Description | |--------|-------------| | `ActionServer(name)` | Create server for named action | | `set_accept_handler(fn)` | `uint8_t(const uint8_t* goal, size_t len)` → 0=accept, 1=reject | | `set_execute_handler(fn)` | `void(uint64_t goal_id, const uint8_t* goal, size_t len)` | | `is_ready()` | Both handlers set? | --- ## horus::TransformFrame Coordinate frame system (TF tree). 
```cpp #include auto tf = horus::TransformFrame(); tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.update("base_link", {1,2,0}, {0,0,0,1}, timestamp); auto t = tf.lookup("base_link", "world"); ``` | Method | Description | |--------|-------------| | `TransformFrame()` | Create with default capacity (256 frames) | | `TransformFrame(max)` | Create with custom max frames | | `register_frame(name, parent)` | Register frame. `parent=nullptr` for root | | `update(frame, pos, rot, ts)` | Update transform. rot=[qx,qy,qz,qw] | | `lookup(source, target)` | Returns `std::optional` | | `can_transform(source, target)` | Check if path exists | --- ## Message Types All in `horus::msg::` namespace. Include via `` or individual headers. See [Getting Started: C++](/docs/getting-started/cpp) for the full type table. --- ## Performance FFI boundary: **15-17ns**. Scheduler tick (1 node): **250ns** median. Throughput: **2.84M ticks/sec**. Zero ASAN errors. See [Benchmarks: C++ Binding Performance](/docs/performance/benchmarks#c-binding-performance) for full results with percentiles and scalability analysis. --- ## AI Context — Complete Framework Reference Path: /reference/ai-context Description: Dense, structured reference for AI agents. Complete API surface, safety rules, patterns, and examples in one page. # AI Context — Complete Framework Reference This page is a dense, machine-readable reference designed for AI coding agents (Claude, GPT, Copilot). It contains the complete API surface, safety rules, common patterns, and message types needed to generate correct HORUS code. Human developers should use the [API Cheatsheet](/reference/api-cheatsheet) or the full [API Reference](/reference) instead. --- ## Mental Model HORUS is a tick-based robotics runtime. Applications are composed of **Nodes** (units of computation) that exchange data through **Topics** (typed pub/sub channels backed by shared memory). 
A **Scheduler** orchestrates node execution in priority order each tick cycle. Communication is local shared memory IPC with latencies from ~3ns (same-thread) to ~167ns (cross-process). Systems can be single-process (all nodes in one scheduler) or multi-process (nodes in separate processes sharing topics via platform-managed shared memory). There are no callbacks — nodes implement `tick()` which the scheduler calls each cycle. --- ## Node Trait (Complete) All nodes implement the `Node` trait. Only `tick()` is required; all others have defaults. ```rust // simplified use horus::prelude::*; ``` | Method | Signature | Default | Description | |--------|-----------|---------|-------------| | `name` | `fn name(&self) -> &str` | **Required** | Unique node identifier | | `tick` | `fn tick(&mut self)` | **Required** | Called every scheduler cycle; do work here | | `init` | `fn init(&mut self) -> Result<()>` | `Ok(())` | Called once before first tick; open hardware, allocate buffers | | `shutdown` | `fn shutdown(&mut self) -> Result<()>` | `Ok(())` | Called on Ctrl+C / scheduler stop; release hardware, zero actuators | | `publishers` | `fn publishers(&self) -> Vec` | `vec![]` | Declare published topic names for introspection (internal — auto-generated by node! macro) | | `subscribers` | `fn subscribers(&self) -> Vec` | `vec![]` | Declare subscribed topic names for introspection (internal — auto-generated by node! macro) | | `on_error` | `fn on_error(&mut self, error: &str)` | no-op | Called when tick() panics or returns error | | `is_safe_state` | `fn is_safe_state(&self) -> bool` | `true` | Query whether node is in a safe state | | `enter_safe_state` | `fn enter_safe_state(&mut self)` | no-op | Called by safety monitor on deadline miss with `Miss::SafeMode` | --- ## NodeBuilder API (Complete) Chain after `scheduler.add(node)`. Finalize with `.build()`. 
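The `.rate()` entry in the table that follows states that a budget and a deadline are derived from the tick period (80% and 95%). That arithmetic can be sketched in plain Python. `derive_timing` and `Timing` are hypothetical helpers, not HORUS APIs, and the use of integer nanoseconds is an assumption made to keep the numbers exact.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    period_ns: int
    budget_ns: int    # 80% of the period
    deadline_ns: int  # 95% of the period


def derive_timing(rate_hz: float) -> Timing:
    """Derive default budget and deadline from a tick rate in Hz."""
    # Mirrors the documented Frequency validation: reject 0, negatives, NaN, infinity.
    if not math.isfinite(rate_hz) or rate_hz <= 0:
        raise ValueError(f"invalid rate: {rate_hz!r}")
    period_ns = round(1_000_000_000 / rate_hz)
    return Timing(period_ns, period_ns * 80 // 100, period_ns * 95 // 100)


timing = derive_timing(100)  # 100 Hz -> 10 ms period
print(timing.period_ns, timing.budget_ns, timing.deadline_ns)  # 10000000 8000000 9500000
```

An explicit `.budget()` or `.deadline()` would replace the corresponding derived value, as the table notes.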
| Method | Parameter | Effect | |--------|-----------|--------| | `.order(n)` | `u32` | Execution priority; lower = runs first. 0-9 critical, 10-49 high, 50-99 normal, 100-199 low, 200+ background | | `.rate(freq)` | `Frequency` | Per-node tick rate. Auto-derives budget (80% period), deadline (95% period). Auto-marks as RT | | `.budget(dur)` | `Duration` | Max allowed tick execution time. Auto-marks as RT | | `.deadline(dur)` | `Duration` | Hard deadline for tick completion. Auto-marks as RT | | `.on_miss(policy)` | `Miss` | Deadline miss policy: `Warn`, `Skip`, `SafeMode`, `Stop` | | `.compute()` | — | Mark as CPU-heavy compute node (may use worker threads) | | `.on(topic)` | `&str` | Event-driven: node ticks only when topic receives a message | | `.async_io()` | — | Mark as async I/O node (non-blocking network/file) | | `.priority(prio)` | `i32` | OS-level SCHED_FIFO priority (1-99). RT nodes only | | `.core(cpu_id)` | `usize` | Pin RT thread to CPU core via `sched_setaffinity` | | `.watchdog(timeout)` | `Duration` | Per-node watchdog timeout (overrides global) | | `.failure_policy(p)` | `FailurePolicy` | Override failure handling: `Fatal`, `Restart`, `Skip`, `Ignore` | | `.build()` | — | Finalize and register the node. Returns `Result<()>` | **RT auto-detection**: Setting `.budget()`, `.deadline()`, or `.rate(Frequency)` automatically sets `is_rt=true` and `ExecutionClass::Rt`. There is no `.rt()` method. This is order-independent via deferred `finalize()`. --- ## Topic API (Complete) `Topic` requires `T: Clone + Serialize + Deserialize + Send + Sync + 'static`. | Method | Signature | Behavior | |--------|-----------|----------| | `new` | `Topic::new(name: &str) -> Result` | Create/connect to a named topic. Auto-selects optimal IPC backend | | `with_capacity` | `Topic::with_capacity(name: &str, capacity: u32, slot_size: Option) -> Result` | Custom ring buffer capacity | | `send` | `fn send(&self, msg: T)` | Publish message. Infallible. 
Overwrites oldest on buffer full | | `try_send` | `fn try_send(&self, msg: T) -> Result<(), T>` | Attempt send; returns message on failure | | `send_blocking` | `fn send_blocking(&self, msg: T, timeout: Duration) -> Result<(), SendBlockingError>` | Block until space available or timeout. For critical commands | | `recv` | `fn recv(&self) -> Option` | Non-blocking receive. Returns `None` if empty | | `try_recv` | `fn try_recv(&self) -> Option` | Same as `recv()` for most use cases | | `read_latest` | `fn read_latest(&self) -> Option where T: Copy` | Read latest without advancing consumer position. Requires `T: Copy` | | `has_message` | `fn has_message(&self) -> bool` | Check if messages are available without consuming | | `pending_count` | `fn pending_count(&self) -> u64` | Number of messages waiting | | `name` | `fn name(&self) -> &str` | Topic name | | `metrics` | `fn metrics(&self) -> TopicMetrics` | Message counts, send/recv failures | | `dropped_count` | `fn dropped_count(&self) -> u64` | Messages lost to buffer overflow | --- ## Scheduler API | Method | Signature | Description | |--------|-----------|-------------| | `new` | `Scheduler::new() -> Scheduler` | Create with auto-detected capabilities (~30-100us) | | `.tick_rate(freq)` | `Frequency` | Global tick rate (default: 100 Hz) | | `.prefer_rt()` | — | Try RT features, degrade gracefully | | `.require_rt()` | — | Enable RT features, panic if unavailable | | `.watchdog(dur)` | `Duration` | Frozen node detection, auto-creates safety monitor | | `.deterministic(bool)` | `bool` | Enable deterministic mode | | `.with_blackbox(mb)` | `usize` | BlackBox flight recorder buffer | | `.with_recording()` | — | Enable record/replay | | `.max_deadline_misses(n)` | `u32` | Emergency stop after n deadline misses (default: 100) | | `.verbose(bool)` | `bool` | Enable/disable non-emergency logging | | `.add(node)` | `impl Node` | Add node, returns chainable builder | | `.run()` | `-> Result<()>` | Main loop until Ctrl+C | | 
`.run_for(dur)` | `Duration -> Result<()>` | Run for specific duration | | `.tick_once()` | `-> Result<()>` | Execute exactly one tick cycle | | `.tick(names)` | `&[&str] -> Result<()>` | One tick cycle for named nodes only (unstable) | | `.set_node_rate(name, freq)` | `&str, Frequency` | Change per-node rate at runtime (unstable) | | `.stop()` | — | Stop the scheduler | | `.is_running()` | `-> bool` | Check if running | | `.metrics()` | `-> Vec` | Per-node performance metrics | | `.safety_stats()` | `-> Option` | Budget overruns, deadline misses, watchdog expirations (unstable) | | `.node_list()` | `-> Vec` | Registered node names (unstable) | --- ## DurationExt and Frequency | Method | Type | Example | Result | |--------|------|---------|--------| | `.ns()` | `Duration` | `500.ns()` | 500 nanoseconds | | `.us()` | `Duration` | `200.us()` | 200 microseconds | | `.ms()` | `Duration` | `10.ms()` | 10 milliseconds | | `.secs()` | `Duration` | `1.secs()` | 1 second | | `.hz()` | `Frequency` | `100.hz()` | 100 Hz (period = 10ms) | **Frequency methods**: `value() -> f64`, `period() -> Duration`, `budget_default() -> Duration` (80% period), `deadline_default() -> Duration` (95% period). **Validation**: `Frequency` panics on 0, negative, NaN, or infinity. Works on `u64`, `f64`, `i32`. Import via `use horus::prelude::*;`. --- ## Execution Classes > `ExecutionClass` is `#[doc(hidden)]` and internal to the scheduler. Users never set it directly — it is auto-derived from builder methods (`.rate()`, `.compute()`, `.on()`, `.async_io()`). 
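One way to picture the derivation described in the note above is a small function that maps builder calls to a class. This is a sketch in plain Python, not the scheduler's code: `derive_execution_class` and its keyword names are hypothetical, and raising an error when more than one class is requested is an assumption (the reference only says the classes are mutually exclusive, not how conflicts are resolved).

```python
from enum import Enum
from typing import Optional


class ExecutionClass(Enum):
    RT = "Rt"
    COMPUTE = "Compute"
    EVENT = "Event"
    ASYNC_IO = "AsyncIo"
    BEST_EFFORT = "BestEffort"


def derive_execution_class(
    *,
    rate_hz: Optional[float] = None,
    budget_us: Optional[int] = None,
    deadline_us: Optional[int] = None,
    compute: bool = False,
    on_topic: Optional[str] = None,
    async_io: bool = False,
) -> ExecutionClass:
    """Map builder-style options to an execution class (illustrative only)."""
    selected = []
    # Any timing parameter implies RT; there is no explicit rt flag.
    if rate_hz is not None or budget_us is not None or deadline_us is not None:
        selected.append(ExecutionClass.RT)
    if compute:
        selected.append(ExecutionClass.COMPUTE)
    if on_topic is not None:
        selected.append(ExecutionClass.EVENT)
    if async_io:
        selected.append(ExecutionClass.ASYNC_IO)
    if len(selected) > 1:
        # Assumption: conflicting requests are rejected in this sketch.
        raise ValueError(f"conflicting execution classes: {selected}")
    return selected[0] if selected else ExecutionClass.BEST_EFFORT


motor = derive_execution_class(rate_hz=1000, budget_us=800)
logger = derive_execution_class()
```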
| Class | When Used | Thread Model | How Triggered |
|-------|-----------|--------------|---------------|
| **Rt** | Real-time nodes with timing guarantees | Priority-scheduled, optional CPU pinning | Auto: set `.budget()`, `.deadline()`, or `.rate()` |
| **Compute** | CPU-heavy work (ML inference, path planning) | May use worker thread pool | `.compute()` |
| **Event** | React to incoming data | Wakes on topic message | `.on("topic.name")` |
| **AsyncIo** | Network, file, database I/O | Non-blocking async runtime | `.async_io()` |
| **BestEffort** | Default; no special scheduling | Runs in tick order | No method called (default) |

Execution classes are **mutually exclusive** per node. RT is always implicit from timing parameters.

---

## Miss (Deadline Policy)

| Policy | What Happens | Use For |
|--------|--------------|---------|
| `Miss::Warn` | Log warning, continue normally | Soft RT (logging, UI). **Default** |
| `Miss::Skip` | Skip this node for current tick | Firm RT (video encoding) |
| `Miss::SafeMode` | Call `enter_safe_state()` on the node | Motor controllers, safety nodes |
| `Miss::Stop` | Stop entire scheduler | Hard RT safety-critical |

---

## Safety Rules (CRITICAL)

**Rule one: Always implement `shutdown()` for actuators.** Without it, motors continue at last velocity on Ctrl+C.

```rust
// simplified
fn shutdown(&mut self) -> Result<()> {
    self.motor.set_velocity(0.0);
    Ok(())
}
```

**Rule two: Always call `recv()` every tick.** Ring buffers overwrite old messages. Skipping ticks loses data.

```rust
// simplified
fn tick(&mut self) {
    // ALWAYS recv, cache result
    if let Some(msg) = self.sub.recv() {
        self.cached = Some(msg);
    }
    // Then use cached value
}
```

**Rule three: Never `sleep()` in `tick()`.** It blocks the entire scheduler. All nodes share the tick cycle.
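To see why a single sleeping node hurts every other node, consider a toy model in plain Python that uses a simulated clock instead of real sleeping. It assumes nodes run one after another within a tick cycle; `SimClock`, `run_cycle`, and the node functions are hypothetical names for illustration, not HORUS APIs.

```python
class SimClock:
    """Simulated time in milliseconds, so the example never really sleeps."""

    def __init__(self) -> None:
        self.now_ms = 0

    def advance(self, ms: int) -> None:
        self.now_ms += ms


def run_cycle(clock: SimClock, nodes) -> int:
    """Tick each node once, in order, and return the cycle duration in ms."""
    start = clock.now_ms
    for tick in nodes:
        tick(clock)
    return clock.now_ms - start


def fast_node(clock: SimClock) -> None:
    clock.advance(1)  # 1 ms of real work


def sleepy_node(clock: SimClock) -> None:
    clock.advance(100)  # stands in for a 100 ms sleep inside tick()


PERIOD_MS = 10  # a 100 Hz tick rate leaves 10 ms per cycle

healthy = run_cycle(SimClock(), [fast_node, fast_node])
blocked = run_cycle(SimClock(), [fast_node, sleepy_node, fast_node])
print(healthy <= PERIOD_MS, blocked > PERIOD_MS)  # True True
```

In this model the node that runs after `sleepy_node` starts 100 ms late even though its own work takes 1 ms, so the whole cycle overruns the period.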
```rust
// simplified
// BAD: std::thread::sleep(Duration::from_millis(100));
// GOOD: Use scheduler rate control instead
```

**Rule four: Never do blocking I/O in `tick()`.** File reads, network calls, and database queries belong in `init()` or in an `.async_io()` node.

```rust
// simplified
// BAD: let data = std::fs::read_to_string("file.txt")?;
// GOOD: Read in init(), use cached data in tick()
```

**Rule five: Use dots not slashes in topic names.** Slashes conflict with shared memory paths and fail on macOS.

```rust
// simplified
// CORRECT: Topic::new("sensors.lidar")
// WRONG: Topic::new("sensors/lidar")
```

---

## Common Patterns

### Pattern: Publisher

### Pattern: Subscriber

### Pattern: Pub+Sub Pipeline

### Pattern: Multi-Rate System

### Pattern: State Machine

Always call `recv()` unconditionally, outside the state match.

### Pattern: Aggregator with Caching

Synchronize multiple topics by caching the latest value from each.

---

## Standard Message Types

All types available via `use horus::prelude::*;`. All fixed-size types support zero-copy shared memory transport.
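The reason fixed-size types suit zero-copy transport can be sketched with Python's `ctypes`: a record with a fixed layout can be written into a buffer and read back through a second view without any serialization step. The field list follows the `CmdVel` row in the Geometry table; the exact in-memory layout HORUS uses is an assumption here, and the `bytearray` merely stands in for a shared memory slot.

```python
import ctypes


class CmdVel(ctypes.Structure):
    """Fixed-size record; field order and padding are assumed for illustration."""

    _fields_ = [
        ("linear", ctypes.c_float),
        ("angular", ctypes.c_float),
        ("timestamp_ns", ctypes.c_uint64),
    ]


# Stand-in for one slot of shared memory, sized for exactly one message.
slot = bytearray(ctypes.sizeof(CmdVel))

# The writer fills the slot in place; no intermediate copy or encoding.
writer = CmdVel.from_buffer(slot)
writer.linear = 0.5
writer.angular = 0.0
writer.timestamp_ns = 123

# The reader maps the same bytes and sees the fields directly.
reader = CmdVel.from_buffer(slot)
print(ctypes.sizeof(CmdVel), reader.linear, reader.timestamp_ns)  # 16 0.5 123
```

Variable-size types such as `CompressedImage` cannot be mapped this way, which matches the reference's note that they are serialized instead.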
### Geometry | Type | Key Fields | |------|------------| | `Twist` | `linear: [f64; 3]`, `angular: [f64; 3]`, `timestamp_ns: u64` | | `CmdVel` | `linear: f32`, `angular: f32`, `timestamp_ns: u64` | | `Pose2D` | `x: f64`, `y: f64`, `theta: f64`, `timestamp_ns: u64` | | `Pose3D` | `position: Point3`, `orientation: Quaternion`, `timestamp_ns: u64` | | `PoseStamped` | `pose: Pose3D`, `frame_id: [u8; 32]` | | `PoseWithCovariance` | `pose: Pose3D`, `covariance: [f64; 36]` | | `TwistWithCovariance` | `twist: Twist`, `covariance: [f64; 36]` | | `Point3` | `x: f64`, `y: f64`, `z: f64` | | `Vector3` | `x: f64`, `y: f64`, `z: f64` | | `Quaternion` | `x: f64`, `y: f64`, `z: f64`, `w: f64` | | `TransformStamped` | `translation: [f64; 3]`, `rotation: [f64; 4]` | | `Accel` / `AccelStamped` | `linear: [f64; 3]`, `angular: [f64; 3]` | ### Sensors | Type | Key Fields | |------|------------| | `Imu` | `orientation: [f64; 4]`, `angular_velocity: [f64; 3]`, `linear_acceleration: [f64; 3]`, covariance arrays | | `LaserScan` | `ranges: [f32; 360]`, `angle_min/max: f32`, `range_min/max: f32` | | `Odometry` | `pose: Pose2D`, `twist: Twist`, covariance arrays, frame IDs | | `JointState` | `names: [[u8;32];16]`, `positions/velocities/efforts: [f64;16]`, `joint_count: u8` | | `NavSatFix` | `latitude/longitude/altitude: f64`, `status: u8`, `satellites_visible: u16` | | `BatteryState` | `voltage/current/charge/capacity: f32`, `percentage: f32`, `temperature: f32` | | `RangeSensor` | `range: f32`, `sensor_type: u8`, `field_of_view: f32` | | `Temperature` | `temperature: f64`, `variance: f64` | | `FluidPressure` | `fluid_pressure: f64`, `variance: f64` | | `Illuminance` | `illuminance: f64`, `variance: f64` | | `MagneticField` | `magnetic_field: [f64; 3]`, `covariance: [f64; 9]` | ### Control | Type | Key Fields | |------|------------| | `MotorCommand` | `motor_id: u8`, `mode: u8` (0=vel,1=pos,2=torque,3=voltage), `target: f64` | | `ServoCommand` | `servo_id: u8`, `position: f32` (rad), `speed: 
f32` (0-1) | | `JointCommand` | Joint-level position/velocity/torque commands | | `PidConfig` | `kp/ki/kd: f64`, `integral_limit: f64`, `output_limit: f64` | | `DifferentialDriveCommand` | `left_velocity: f64`, `right_velocity: f64` (rad/s) | | `TrajectoryPoint` | `position: [f64;3]`, `velocity: [f64;3]`, `orientation: [f64;4]`, `time_from_start: f64` | ### Navigation | Type | Key Fields | |------|------------| | `NavGoal` | `target_pose: Pose2D`, `tolerance_position/angle: f64`, `timeout_seconds: f64` | | `NavPath` | `waypoints: [Waypoint; 256]`, `waypoint_count: u16`, `total_length: f64` | | `OccupancyGrid` | Grid-based occupancy map (variable-size) | | `CostMap` | Navigation cost map (variable-size) | | `PathPlan` | Planned path with algorithm metadata | ### Vision | Type | Key Fields | |------|------------| | `Image` | Pool-backed RAII. `Image::new(w, h, encoding)?`. Zero-copy via shared memory pool | | `DepthImage` | Pool-backed RAII. F32 or U16 depth data | | `CompressedImage` | `format: [u8;8]`, `data: Vec` (variable-size, MessagePack serialized) | | `CameraInfo` | `width/height: u32`, `camera_matrix: [f64;9]`, `distortion_coefficients: [f64;8]` | | `RegionOfInterest` | `x_offset/y_offset/width/height: u32`, `do_rectify: bool` | ### Perception | Type | Key Fields | |------|------------| | `Detection` | `bbox: BoundingBox2D`, `confidence: f32`, `class_id: u32` | | `Detection3D` | `bbox: BoundingBox3D`, `confidence: f32`, `velocity_x/y/z: f32` | | `BoundingBox2D` | `x/y/width/height: f32` (pixels) | | `BoundingBox3D` | `cx/cy/cz/length/width/height: f32` (meters), `roll/pitch/yaw: f32` | | `PointCloud` | Pool-backed RAII. `PointXYZ`, `PointXYZI`, `PointXYZRGB` formats | | `Landmark` / `Landmark3D` | `x/y(/z): f32`, `visibility: f32`, `index: u32` | | `LandmarkArray` | Up to N landmarks. 
Presets: `coco_pose()`, `mediapipe_pose/hand/face()` | | `SegmentationMask` | `width/height: u32`, `num_classes: u32`, `mask_type: u32` | | `PlaneDetection` | `coefficients: [f64;4]`, `center: Point3`, `normal: Vector3` | | `TrackedObject` | Object tracking with ID and velocity | ### Force and Impedance | Type | Key Fields | |------|------------| | `WrenchStamped` | `force: [f64;3]`, `torque: [f64;3]`, `frame_id` | | `ForceCommand` | Force/torque command for compliant control | | `ImpedanceParameters` | Stiffness, damping, inertia for impedance control | ### Diagnostics | Type | Key Fields | |------|------------| | `DiagnosticReport` | `component: [u8;32]`, up to 16 `DiagnosticValue` entries | | `DiagnosticStatus` | Status level + message | | `DiagnosticValue` | Typed key-value: `string()`, `int()`, `float()`, `bool()` | | `EmergencyStop` | Emergency stop command | | `Heartbeat` / `NodeHeartbeat` | Periodic health signal with tick count and rate | | `SafetyStatus` | Overall system safety state | | `ResourceUsage` | CPU, memory, disk usage | ### Input | Type | Key Fields | |------|------------| | `JoystickInput` | Axes, buttons, hat switches for teleoperation | | `KeyboardInput` | Key events for HID control | ### Clock | Type | Key Fields | |------|------------| | `Clock` | Simulation/wall time. 
Sources: `SOURCE_WALL`, `SOURCE_SIM`, `SOURCE_REPLAY` | | `TimeReference` | Time synchronization reference | --- ## Python API Quick Reference ### Functional Node ```python import horus def my_tick(node): node.send("temperature", 25.5) sensor = horus.Node( name="temp_sensor", pubs="temperature", tick=my_tick, rate=10 ) horus.run(sensor, duration=30) ``` ### Stateful Node (class container) ```python import horus from horus import CmdVel class DriveState: def tick(self, node): node.send("cmd_vel", CmdVel(linear=0.5, angular=0.0)) def shutdown(self, node): node.send("cmd_vel", CmdVel(linear=0.0, angular=0.0)) drive = DriveState() node = horus.Node(name="drive_node", tick=drive.tick, shutdown=drive.shutdown, pubs=["cmd_vel"], rate=50, order=0) ``` ### Run ```python horus.run(node) ``` ### One-Liner ```python horus.run(sensor_node, controller_node, duration=60) ``` ### Topic ```python from horus import Topic, CmdVel pub = Topic("cmd_vel", CmdVel) pub.send(CmdVel(linear=1.0, angular=0.0)) sub = Topic("cmd_vel", CmdVel) msg = sub.recv() # Returns None if empty ``` ### Functional recv ```python def process(node): if node.has_msg("scan"): scan = node.recv("scan") all_scans = node.recv_all("scan") ``` --- ## CLI Quick Reference | Command | Usage | |---------|-------| | `horus new ` | Create new project with `horus.toml` + `src/` | | `horus run [files...]` | Build and run application | | `horus build` | Build without running | | `horus test [filter]` | Run tests | | `horus check` | Validate `horus.toml` and workspace | | `horus clean --shm` | Clean stale shared memory regions | | `horus monitor` | Web + TUI monitoring dashboard | | `horus topic list` | List active topics | | `horus topic echo ` | Print messages on a topic | | `horus node list` | List running nodes | | `horus tf tree` | Print transform frame tree | | `horus install ` | Install package from registry | | `horus launch ` | Launch multi-node system from YAML | | `horus param get ` | Get runtime parameter | | 
`horus deploy [target]` | Deploy to remote robot | | `horus doctor` | Comprehensive health check | | `horus fmt` | Format code (Rust + Python) | | `horus lint` | Lint code (clippy + ruff) | --- ## horus.toml Format `horus.toml` is the single source of truth. Native build files (`Cargo.toml`, `pyproject.toml`) are generated into `.horus/` automatically. ```toml [package] name = "my-robot" version = "0.1.0" description = "My robot project" authors = ["Name "] [dependencies] # Rust deps (auto-detected as crates.io) serde = { version = "1.0", source = "crates.io", features = ["derive"] } nalgebra = "0.32" # Python deps (specify source = "pypi") numpy = { version = ">=1.24", source = "pypi" } torch = { version = ">=2.0", source = "pypi" } # System deps libudev = { version = "*", source = "system" } # Path deps my_lib = { path = "../my_lib" } # Git deps some_crate = { git = "https://github.com/user/repo", branch = "main" } [dev-dependencies] criterion = { version = "0.5", source = "crates.io" } pytest = { version = ">=7.0", source = "pypi" } [scripts] sim = "horus sim start --world warehouse" deploy = "horus deploy pi@robot --release" test-hw = "horus run tests/hardware_check.rs" [hardware] realsense = { use = "realsense", sim = true } dynamixel = { use = "dynamixel", port = "/dev/ttyUSB0", sim = true } [hooks] pre_run = ["fmt", "lint"] post_build = ["test"] ``` **Dependency sources**: `crates.io` (Rust, default), `pypi` (Python), `system`, `path`, `git`. **Generated files**: `horus build` creates `.horus/Cargo.toml` and `.horus/pyproject.toml`. Never edit these directly. --- ## Import Pattern ```rust // simplified use horus::prelude::*; // Provides: Node, Topic, Scheduler, DurationExt, Frequency, Miss, // all message types, error types, macros, services, actions, etc. // 165+ types in one import. 
``` ## Custom Messages ```rust // simplified use serde::{Serialize, Deserialize}; #[derive(Clone, Serialize, Deserialize)] struct MyMessage { x: f32, y: f32, label: String, } let topic: Topic = Topic::new("my.data")?; ``` ## Macros | Macro | Purpose | |-------|---------| | `message!` | Define custom message types | | `service!` | Define request/response service types | | `action!` | Define long-running action types (goal/feedback/result) | | `node!` | Define node with automatic topic registration | | `topics!` | Compile-time topic name + type descriptors | | `hlog!(level, ...)` | Structured node logging | | `hlog_once!(level, ...)` | Log once per execution | | `hlog_every!(n, level, ...)` | Throttled logging every n calls | ## Error Types ```rust // simplified use horus::prelude::*; // Error, Result, HorusError // Variants: CommunicationError, ConfigError, MemoryError, NodeError, // NotFoundError, ParseError, ResourceError, SerializationError, // TimeoutError, TransformError, ValidationError // Helpers: retry_transient(), RetryConfig ``` ## Performance Reference | Metric | Value | |--------|-------| | Same-thread topic | ~3 ns | | Same-process 1:1 | ~18 ns | | Same-process N:M | ~36 ns | | Cross-process | ~50-167 ns | | Scheduler tick overhead | ~50-100 ns | | Shared memory allocation | ~100 ns | --- ## See Also - [API Cheatsheet](/reference/api-cheatsheet) — Quick command and type lookup - [Internals](/reference/internals) — Architecture for contributors --- ## API Reference Path: /reference Description: Complete API reference for HORUS — pure lookup tables for every method, type, and configuration option # API Reference Pure lookup -- find the method signature you need. For explanations and tutorials, see [Learn](/learn) and [Tutorials](/tutorials). 
--- ## Core APIs | API | What it covers | |-----|---------------| | **[Scheduler API](/rust/api/scheduler)** | Builder methods, execution modes, introspection | | **[Topic API](/rust/api/topic)** | send, recv, try_send, try_recv, read_latest, capacity | | **[Services API](/rust/api/services)** | ServiceClient, ServiceServer, call, call_resilient | | **[Actions API](/rust/api/actions)** | ActionClient, ActionServer, GoalStatus, GoalOutcome | | **[TransformFrame API](/rust/api/transform-frame)** | Frame registration, lookups, interpolation | ## Helpers | API | What it covers | |-----|---------------| | **[DurationExt & Frequency](/rust/api/duration-ext)** | `.hz()`, `.ms()`, `.us()`, `.secs()`, `Frequency` type | | **[Macros](/rust/api/macros)** | `message!`, `service!`, `action!`, `node!`, `hlog!` | | **[Feature Flags](/rust/api/feature-flags)** | telemetry, blackbox, macros | ## Configuration | Reference | What it covers | |-----------|---------------| | **[CLI Reference](/development/cli-reference)** | All `horus` commands and flags | | **[Error Reference](/development/error-handling)** | HorusError variants, error patterns | | **[horus.toml](/concepts/horus-toml)** | Project manifest format | | **[Configuration](/package-management/configuration)** | Global and project configuration | ## Quick Lookup | Reference | What it covers | |-----------|---------------| | **[API Cheatsheet](/reference/api-cheatsheet)** | Every public type signature on one page | | **[AI Context](/reference/ai-context)** | Dense, machine-readable API reference for AI agents | --- ## See Also - [API Cheatsheet](/reference/api-cheatsheet) — Every public type signature on one page - [AI Context](/reference/ai-context) — Machine-readable API context for AI coding agents - [Internals](/reference/internals) — Architecture for contributors - [Benchmarks](/performance/benchmarks) — Latency and throughput measurements - [CLI Reference](/development/cli-reference) — All 46+ commands and flags --- ## 
Platform Support Path: /reference/platform-support Description: How HORUS works across Linux, macOS, and Windows — what's identical, what differs, and what to watch for # Platform Support HORUS runs on Linux, macOS, and Windows. The core framework — topics, nodes, scheduler, shared memory IPC — works identically on all platforms. You write the same code everywhere. The `horus_sys` hardware abstraction layer handles OS differences internally. ## Quick Summary | Capability | Linux | macOS | Windows | |-----------|-------|-------|---------| | **Topics (pub/sub)** | Full | Full | Full | | **Nodes and Scheduler** | Full | Full | Full | | **Shared Memory IPC** | Full | Full | Full | | **Real-Time Scheduling** | Full (SCHED_FIFO) | Partial (Mach threads) | Partial (REALTIME_PRIORITY) | | **CLI (`horus` commands)** | Full | Full | Full | | **Python Bindings** | Full | Full | Full | | **Plugin Sandbox** | Full (seccomp) | Not available | Not available | | **Device Discovery** | Full (/dev) | Full (IOKit) | Full (SetupDi) | ## Same Code, Every Platform Your HORUS application code is platform-independent. The same Rust node source builds and runs on all three OSes with no `#ifdef`, no platform checks, and no conditional imports. HORUS handles it. ## How It Works Under the Hood ### Shared Memory (IPC) HORUS uses shared memory for zero-copy inter-process communication. Each platform uses a different OS mechanism, but the API is identical: | Platform | Backend | Location | Performance | |----------|---------|----------|-------------| | **Linux** | tmpfs-backed files | `/dev/shm/horus_{namespace}/` | Fastest (RAM-backed filesystem) | | **macOS** | POSIX `shm_open()` | Kernel objects (no filesystem path) | Fast (kernel-managed) | | **Windows** | `CreateFileMappingW` | Pagefile-backed | Fast (OS page cache) | You never interact with these directly. When you call `Topic::new("sensor.imu")`, HORUS creates the shared memory region using the right backend for your OS. 
### Real-Time Scheduling Real-time scheduling quality varies by OS: | Platform | Mechanism | Priority Range | Typical Jitter | Memory Lock | |----------|-----------|---------------|----------------|-------------| | **Linux** | `SCHED_FIFO` | 1–99 | ~10μs (PREEMPT_RT) | Full (`mlockall`) | | **macOS** | Mach thread constraints | 1–99 | ~500μs | Partial (`mlock`) | | **Windows** | `REALTIME_PRIORITY_CLASS` | 1–31 | ~1–10ms | Partial (`VirtualLock`) | When you call `.rate(100_u64.hz())` or `.budget(200_u64.us())` on a node builder, HORUS automatically uses the best available RT mechanism. On macOS and Windows, real-time guarantees are weaker — the framework does its best, but the OS kernel doesn't provide the same determinism as Linux with PREEMPT_RT. ### Timer Precision | Platform | Sleep Resolution | Clock Source | |----------|-----------------|--------------| | **Linux** | 1 ns (`clock_nanosleep`) | `CLOCK_MONOTONIC` | | **macOS** | ~42 ns (`mach_absolute_time`) | Mach timebase | | **Windows** | ~100 ns (QPC + `timeBeginPeriod`) | Query Performance Counter | Your code doesn't need to handle this — `Scheduler` uses the best available timer automatically. ## CLI Commands All `horus` CLI commands work on every platform: ```bash horus new my-project # Create a project horus build # Build it horus run # Run it horus test # Test it horus check # Lint and validate horus doctor # Check system health ``` ### Scripts Scripts defined in `horus.toml` run through the platform's native shell: | Platform | Shell | Command | |----------|-------|---------| | **Linux / macOS** | `sh -c` | POSIX shell (bash, zsh, etc.) | | **Windows** | `cmd.exe /C` | Windows command processor | ```toml [scripts] build-release = "cargo build --release" # Works everywhere start-sim = "horus sim3d --headless" # Works everywhere ``` ### File Paths HORUS handles path separators automatically. 
Use forward slashes in `horus.toml` — they work on all platforms: ```toml [dependencies] my-driver = { path = "drivers/my-driver" } # Works on Windows too ``` ## Platform-Specific Notes ### Linux Linux is the primary development and deployment platform for HORUS. - **Full real-time support** with PREEMPT_RT kernel (see [RT Setup Guide](/advanced/rt-setup)) - **Plugin sandboxing** via seccomp-BPF for untrusted plugins - **Device access** via `/dev/*` (serial ports, cameras, GPIO) - **Deploy to robots** with `horus deploy` (requires SSH + rsync on target) ### macOS macOS is fully supported for development and testing. - **Shared memory** uses `shm_open()` kernel objects — no `/dev/shm` directory needed - **No PREEMPT_RT** — real-time scheduling uses Mach thread constraints (best-effort) - **No plugin sandbox** — seccomp is Linux-only. Plugins run without sandboxing - **Xcode Command Line Tools** are the only prerequisite (`xcode-select --install`) ### Windows Windows support is for development and testing. Production deployment should target Linux. - **Shared memory** uses Windows file mappings (pagefile-backed) — works without special setup - **Real-time scheduling** uses `REALTIME_PRIORITY_CLASS` — weaker guarantees than Linux - **No plugin sandbox** — seccomp is Linux-only - **No `rsync`/`ssh` by default** — `horus deploy` requires WSL 2 or Git Bash ## Developing Cross-Platform Projects If your HORUS project needs to work on multiple OSes: 1. **Use dots in topic names** — `"sensor.imu"` not `"sensor/imu"` 2. **Avoid Unix-specific tools in scripts** — or provide platform alternatives 3. **Test RT-sensitive code on Linux** — macOS/Windows RT jitter is higher 4. **Use `horus doctor`** to check platform capabilities on any machine 5. 
**Deploy to Linux** for production — develop anywhere ## Compile-Time Verification HORUS verifies cross-platform compilation in CI on every commit: - **Linux x86_64**: Full build + 3,000+ tests - **macOS x86_64 + ARM64**: Full build + tests - **Windows x86_64**: Full build + core tests - **Linux ARM64**: Cross-compilation check (for Raspberry Pi / Jetson) If your code compiles on any one platform, it compiles on all of them. --- ## Node Discovery & Monitoring Path: /reference/internals Description: Cross-process node discovery and lifecycle monitoring — for building custom monitoring tools # Node Discovery & Monitoring These types are for **building custom monitoring and discovery tools**. Most users don't need them — the built-in [Monitor](/development/monitor) handles this automatically. --- ## NodePresence Node presence files written to shared memory for cross-process discovery. The scheduler creates these automatically at startup; monitoring tools read them. Paths are managed by `horus_sys` and vary by platform. | Method | Returns | Description | |--------|---------|-------------| | `name()` | `&str` | Node name | | `pid()` | `u32` | Process ID (verified for liveness) | | `scheduler()` | `Option<&str>` | Scheduler name | | `publishers()` | `&[TopicMetadata]` | Topics this node publishes | | `subscribers()` | `&[TopicMetadata]` | Topics this node subscribes to | | `rate_hz()` | `Option` | Configured tick rate | | `health_status()` | `Option<&str>` | Healthy / Warning / Error / Critical | | `tick_count()` | `u64` | Total ticks since start | | `error_count()` | `u32` | Total errors since start | | `services()` | `&[String]` | Provided services | | `actions()` | `&[String]` | Provided actions | **Static methods:** `read(name) → Option`, `read_all() → Vec` --- ## NodeAnnouncement Real-time lifecycle events broadcast on the `horus.ctl.{scheduler}` topic. Unlike presence files (polled), announcements are push-based — you get notified the moment a node starts or stops. 
| Field | Type | Description | |-------|------|-------------| | `name` | `String` | Node name | | `pid` | `u32` | Process ID | | `event` | `NodeEvent` | `Started` or `Stopped` | | `publishers` | `Vec` | Published topic names | | `subscribers` | `Vec` | Subscribed topic names | | `timestamp_ms` | `u64` | Event timestamp | --- ## See Also - [Monitor Guide](/development/monitor) — Built-in web + TUI monitoring - [Scheduler API](/rust/api/scheduler) — `ProfileReport` and `status()` - [Topic API](/rust/api/topic) — `TopicMetrics` ======================================== # SECTION: Recipes ======================================== --- ## Recipes Path: /recipes Description: Copy-paste-ready patterns for common robotics tasks. Each recipe is a complete, self-contained program optimized for AI code generation. ## Recipes Copy-paste-ready patterns for common robotics tasks. Each recipe is a complete, self-contained program you can adapt and ship. Unlike tutorials (which teach step-by-step), recipes are pure working programs with inline safety annotations. 
**Every recipe includes:** - A complete `horus.toml` manifest - Full source code with all imports and entry point - Inline `// SAFETY:` / `# SAFETY:` and `// IMPORTANT:` / `# IMPORTANT:` comments for critical patterns - Expected terminal output ### Start Here | Recipe | What It Builds | Key Patterns | |--------|---------------|--------------| | [Real Hardware](/recipes/real-hardware) | I2C IMU + serial motors with real crates.io/pip libraries | Complete hardware path, no stubs | ### Rust Recipes | Recipe | What It Builds | Key Patterns | |--------|---------------|--------------| | [Differential Drive](/recipes/differential-drive) | 2-wheel robot: `CmdVel` to motor commands | Actuator safety, shutdown, execution order | | [IMU Reader](/recipes/imu-reader) | 100Hz IMU sensor with orientation publishing | Sensor node, `#[repr(C)]`, zero-copy | | [PID Controller](/recipes/pid-controller) | Generic PID loop with configurable gains | Control theory, rate-based RT | | [LiDAR Obstacle Avoidance](/recipes/lidar-obstacle-avoidance) | Reactive velocity from `LaserScan` | Sensor fusion, safety zones | | [Servo Controller](/recipes/servo-controller) | Multi-servo bus with safe shutdown | Multi-actuator, ordered shutdown | | [Multi-Sensor Fusion](/recipes/multi-sensor-fusion) | IMU + odometry state estimation | Multi-topic aggregation, caching | | [Emergency Stop](/recipes/emergency-stop) | E-stop monitor with safety state | Safety-critical, `Miss::SafeMode` | | [Telemetry Logger](/recipes/telemetry-logger) | Log topics to file via async I/O | `async_io()`, non-blocking file writes | | [Transform Frames](/recipes/transform-frames) | Coordinate frame tree with sensor mounts | `TransformFrame`, static/dynamic frames, `FrameId` caching | | [Record & Replay](/recipes/record-replay) | Record execution, replay, mixed replay for regression | `.with_recording()`, `replay_from()`, `add_replay()`, CLI | ### Python Recipes | Recipe | What It Builds | Key Patterns | 
|--------|---------------|--------------| | [Differential Drive (Python)](/recipes/differential-drive-python) | 2-wheel robot with kinematics + odometry | `CmdVel`, RPM clamping, safe shutdown | | [IMU Reader (Python)](/recipes/imu-reader-python) | 100Hz IMU sensor with orientation | `horus.Imu`, typed topics, NaN validation | | [PID Controller (Python)](/recipes/pid-controller-python) | Generic PID with anti-windup | `horus.dt()`, integral clamping, gain tuning | | [LiDAR Avoidance (Python)](/recipes/lidar-avoidance-python) | Reactive velocity from `LaserScan` | Three-zone split, safety stop, `CmdVel` | | [Servo Controller (Python)](/recipes/servo-controller-python) | Multi-servo bus with safe shutdown | `ServoCommand`, `JointState`, joint limits | | [Multi-Sensor Fusion (Python)](/recipes/multi-sensor-fusion-python) | IMU + odometry complementary filter | NumPy, multi-topic aggregation, angle wrapping | | [Emergency Stop (Python)](/recipes/emergency-stop-python) | Safety monitor with fail-safe | `EmergencyStop`, debounce, stale detection | | [Telemetry Logger (Python)](/recipes/telemetry-logger-python) | Async CSV logging | `async def tick`, `aiofiles`, non-blocking I/O | | [Coordinate Transforms (Python)](/recipes/transform-frames-python) | Frame tree with sensor mounts | `TransformFrame`, static/dynamic, point conversion | | [Python CV Node](/recipes/python-cv-node) | Python computer vision with `horus.Node` | Python API, NumPy integration | | [Record & Replay (Python)](/recipes/record-replay-python) | Record execution, replay and diff via CLI | `recording=True`, CLI replay, export to CSV | ### How to Use a Recipe ```bash horus new my-robot -r cd my-robot ``` Copy the `horus.toml` and `src/main.rs` from any recipe, then: ```bash horus run ``` ### Conventions - **Execution order**: Publishers run before subscribers (lower `.order()` first) - **Safety**: Every actuator node implements `shutdown()` that zeros outputs - **recv() every tick**: Always call `.recv()` 
on every subscriber, every tick — even if you discard the value - **No sleep()**: Never use `std::thread::sleep()` — use `.rate()` on the scheduler - **No blocking I/O in tick()**: Use `.async_io()` for file/network operations --- ## See Also - [Tutorials](/tutorials) — Step-by-step learning path - [Getting Started](/getting-started/quick-start) — First application --- ## Real Hardware Path: /recipes/real-hardware Description: Complete working examples with real I2C IMU and serial motor controller using crates.io and pip libraries — no stubs # Real Hardware Other recipes use `read_hardware()` stubs you need to replace. This recipe shows the **complete working path** — real libraries, real I/O, no placeholders. Use this as a starting point for your robot. ## When To Use This - You have a real robot with I2C sensors and/or serial actuators - You want to see how crates.io / pip libraries integrate with HORUS - You want a copy-paste starting point that actually talks to hardware ## Example 1: I2C IMU (MPU6050) Reads an MPU6050 accelerometer + gyroscope over I2C at 100 Hz. Publishes `Imu` messages. --- ## Example 2: Serial Motor Controller (VESC / Generic UART) Subscribes to `CmdVel`, converts to left/right wheel speeds, sends commands over serial UART. Publishes odometry feedback. --- ## Example 3: Both Together Run the IMU and motor controller as a single robot. The scheduler handles ordering, timing, and safe shutdown for both. 
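
The example listings are not reproduced in this section, so here is a minimal Python sketch of the decoding step that Example 1 relies on. It is not the recipe's actual code. It assumes the MPU6050's default full-scale ranges (±2 g and ±250 °/s, that is 16384 LSB per g and 131 LSB per °/s) and a 14-byte burst read starting at register `0x3B`. The function name `decode_mpu6050` and the returned dictionary layout are hypothetical. On real hardware the bytes would come from an I2C read, for example `smbus2`'s `read_i2c_block_data`.

```python
import math
import struct

ACCEL_LSB_PER_G = 16384.0    # assumed: MPU6050 at +/-2 g full scale
GYRO_LSB_PER_DPS = 131.0     # assumed: MPU6050 at +/-250 deg/s full scale
STANDARD_GRAVITY = 9.80665   # m/s^2


def decode_mpu6050(raw: bytes) -> dict:
    """Decode a 14-byte burst read (accel, temperature, gyro) into SI units.

    Hypothetical helper for illustration; not part of the HORUS API.
    """
    if len(raw) != 14:
        raise ValueError("expected 14 bytes: accel(6) + temp(2) + gyro(6)")
    # Seven big-endian signed 16-bit words; the temperature word is ignored here.
    ax, ay, az, _temp, gx, gy, gz = struct.unpack(">7h", raw)
    accel = tuple(v / ACCEL_LSB_PER_G * STANDARD_GRAVITY for v in (ax, ay, az))
    gyro = tuple(math.radians(v / GYRO_LSB_PER_DPS) for v in (gx, gy, gz))
    return {"accel_mps2": accel, "gyro_rads": gyro}


# Synthetic sample: sensor lying flat and still (1 g on Z, no rotation).
sample = struct.pack(">7h", 0, 0, 16384, 0, 0, 0, 0)
reading = decode_mpu6050(sample)
print(reading)
```

In a node, a function like this would typically run inside `tick()`, with the bus opened once in `init()`, matching the `init()` / `tick()` / `shutdown()` split described under Key Points.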
--- ## Adding Dependencies HORUS projects pull hardware libraries from crates.io and PyPI through `horus.toml`: ```toml [dependencies] # Rust crates (from crates.io) serialport = { version = "4.7", source = "crates.io" } i2cdev = { version = "0.6", source = "crates.io" } rppal = { version = "0.19", source = "crates.io" } # Python packages (from PyPI) smbus2 = { version = ">=0.4", source = "pypi" } pyserial = { version = ">=3.5", source = "pypi" } ``` Or via the CLI: ```bash horus add serialport --source crates.io horus add smbus2 --source pypi ``` HORUS generates the native build files (`.horus/Cargo.toml`, `.horus/pyproject.toml`) from these entries. You never edit them directly. ## Common Hardware Libraries | Hardware | Rust (crates.io) | Python (PyPI) | |----------|-----------------|---------------| | Serial/UART | `serialport` | `pyserial` | | I2C | `i2cdev` | `smbus2` | | SPI | `spidev` | `spidev` | | GPIO | `rppal` (RPi) or `gpio-cdev` | `RPi.GPIO` or `gpiod` | | CAN bus | `socketcan` | `python-can` | | USB | `nusb` or `rusb` | `pyusb` | | Cameras | `v4l2` or `nokhwa` | `opencv-python` | | Bluetooth | `btleplug` | `bleak` | ## Key Points - **No special driver API needed** — use any crates.io or pip library inside your Node - **`horus.toml [dependencies]`** with `source = "crates.io"` or `source = "pypi"` pulls the library - **The Node trait** is the integration point: `init()` opens hardware, `tick()` reads/writes, `shutdown()` stops - **The scheduler** handles timing, RT priority, deadline enforcement, and ordered shutdown - **Topics** connect your hardware nodes to the rest of the system with zero-copy IPC - **All other recipes** use `read_hardware()` stubs — replace them with the patterns shown here --- ## See Also - [IMU Reader](/recipes/imu-reader) — IMU recipe with orientation estimation (uses stubs) - [Servo Controller](/recipes/servo-controller) — Multi-servo bus recipe (uses stubs) - [Differential Drive](/recipes/differential-drive) — Full kinematics 
+ odometry - [Driver API](/rust/api/drivers) — Config-based driver loading (optional convenience) - [Configuration Reference](/package-management/configuration) — `horus.toml` dependency syntax --- ## Differential Drive (Python) Path: /recipes/differential-drive-python Description: Convert CmdVel velocity commands to left/right motor outputs with kinematics, odometry, and safety shutdown — Python version ## Differential Drive (Python) Converts `CmdVel` velocity commands into left/right wheel speeds with RPM clamping, dead-reckoning odometry, and safe shutdown. ### Problem You need to drive a 2-wheel robot from `CmdVel` commands, with safety limits and position tracking. ### When To Use - Any 2-wheel differential drive robot (TurtleBot, AGV, hobby bots) - Converting `CmdVel` from a planner or teleoperation to motor outputs - When you need velocity clamping and safe shutdown ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Familiarity with [CmdVel](/stdlib/messages/cmd-vel) ### Kinematics ```text v_left = v - ω * L / 2 v_right = v + ω * L / 2 ``` Where `v` is linear velocity (m/s), `ω` is angular velocity (rad/s), and `L` is the wheel base (m). 
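
As a quick numeric check of the equations above, here is a small Python sketch. The helpers `wheel_speeds` and `mps_to_rpm` and the sample values (`L = 0.3` m, `r = 0.05` m) are illustrative assumptions, not part of the HORUS API.

```python
import math


def wheel_speeds(v: float, omega: float, wheel_base: float) -> tuple:
    """Return (v_left, v_right) in m/s for a differential drive."""
    half = omega * wheel_base / 2.0
    return v - half, v + half


def mps_to_rpm(v_wheel: float, wheel_radius: float) -> float:
    """Convert a wheel rim speed in m/s to wheel revolutions per minute."""
    return v_wheel / wheel_radius * 60.0 / (2.0 * math.pi)


# Forward at 0.5 m/s while turning counter-clockwise at 1 rad/s.
v_left, v_right = wheel_speeds(v=0.5, omega=1.0, wheel_base=0.3)
print(v_left, v_right)  # roughly 0.35 and 0.65 m/s

# Turning in place: the wheels move at equal speed in opposite directions.
spin_left, spin_right = wheel_speeds(v=0.0, omega=1.0, wheel_base=0.3)
print(spin_left, spin_right)

print(mps_to_rpm(v_left, 0.05), mps_to_rpm(v_right, 0.05))
```

The right wheel comes out faster than the left for a positive `ω`, which is what a counter-clockwise turn requires.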
### horus.toml

```toml
[package]
name = "diff-drive-py"
version = "0.1.0"
language = "python"
```

### Complete Code

```python
import horus
from horus import Node, CmdVel, run, us
import math

# ── Robot parameters ──────────────────────────────────
WHEEL_BASE = 0.3      # meters between wheels
WHEEL_RADIUS = 0.05   # meters
MAX_RPM = 200.0       # motor safety limit

# ── State ─────────────────────────────────────────────
x, y, theta = [0.0], [0.0], [0.0]

# ── Hardware stub ─────────────────────────────────────
def write_motors(left_rpm, right_rpm):
    """Send to motor controller — replace with serial/CAN driver."""
    pass

# ── Node callbacks ────────────────────────────────────
def drive_tick(node):
    cmd = node.recv("cmd_vel")
    if cmd is None:
        return

    v = cmd.linear    # m/s forward
    w = cmd.angular   # rad/s counter-clockwise

    # IMPORTANT: differential drive kinematics
    v_left = v - w * WHEEL_BASE / 2.0
    v_right = v + w * WHEEL_BASE / 2.0

    # Convert m/s to RPM
    left_rpm = v_left / WHEEL_RADIUS * 60.0 / (2.0 * math.pi)
    right_rpm = v_right / WHEEL_RADIUS * 60.0 / (2.0 * math.pi)

    # SAFETY: clamp to motor limits
    left_rpm = max(-MAX_RPM, min(MAX_RPM, left_rpm))
    right_rpm = max(-MAX_RPM, min(MAX_RPM, right_rpm))

    write_motors(left_rpm, right_rpm)

    # Dead-reckoning odometry
    dt = 1.0 / 50.0  # 50 Hz
    theta[0] += w * dt
    x[0] += v * dt * math.cos(theta[0])
    y[0] += v * dt * math.sin(theta[0])

    node.send("odom", {
        "x": x[0], "y": y[0], "theta": theta[0],
        "v_linear": v, "v_angular": w,
        "left_rpm": left_rpm, "right_rpm": right_rpm,
    })

def drive_shutdown(node):
    # SAFETY: stop both motors before exit
    write_motors(0.0, 0.0)
    node.log_info(f"Shutdown at ({x[0]:.2f}, {y[0]:.2f})")

# ── Main ──────────────────────────────────────────────
drive = Node(
    name="DiffDrive",
    tick=drive_tick,
    shutdown=drive_shutdown,
    rate=50,
    order=10,
    budget=500 * us,
    on_miss="safe_mode",
    subs=[CmdVel],
    pubs=["odom"],
)

run(drive, tick_rate=100, rt=True)
```

### Expected Output

```text
[HORUS] Scheduler running — tick_rate: 100 Hz
[HORUS] Node "DiffDrive" started (Rt, 50 Hz, budget: 500μs)
^C
[HORUS] Shutting down...
[HORUS] DiffDrive: Shutdown at (1.23, 0.45)
[HORUS] Node "DiffDrive" shutdown complete
```

### Key Points

- **RPM clamping** prevents motor damage from bad upstream commands
- **`shutdown()` zeros both motors** — critical for robots under gravity or in motion
- **`on_miss="safe_mode"`** stops motors if the tick budget is exceeded
- **Dead-reckoning drifts** — in production, fuse with IMU or wheel encoders
- **`write_motors()` is a stub** — replace with `pyserial`, `python-can`, or GPIO. See [Real Hardware](/recipes/real-hardware) for complete examples.

### Common Errors

| Symptom | Cause | Fix |
|---------|-------|-----|
| Robot spins in circles | `WHEEL_BASE` wrong (too small) | Measure wheel center-to-center distance |
| Motors saturate at low speed | `WHEEL_RADIUS` too small → RPM too high | Measure actual wheel radius |
| Odometry drifts badly | Pure dead-reckoning, no correction | Fuse with IMU ([Multi-Sensor Fusion](/recipes/multi-sensor-fusion-python)) |
| Robot does not stop on Ctrl+C | Missing `shutdown` callback | Always implement `shutdown` for actuator nodes |

---

## See Also

- [Differential Drive (Rust)](/recipes/differential-drive) — Rust version with full kinematics diagram
- [CmdVel](/stdlib/messages/cmd-vel) — Velocity command reference
- [Real Hardware](/recipes/real-hardware) — Complete serial motor examples
- [PID Controller (Python)](/recipes/pid-controller-python) — Closed-loop speed control

---

## Differential Drive

Path: /recipes/differential-drive

Description: Convert CmdVel velocity commands to left/right motor outputs with kinematics, odometry, and safety shutdown

# Differential Drive

Differential drive is the workhorse of mobile robotics.
From TurtleBots in research labs to warehouse AGVs moving pallets at Amazon, from sidewalk delivery robots to hospital logistics carts — two independently driven wheels with a caster is the simplest platform that can both drive forward and turn in place. If you are building a ground robot, there is a good chance it is differential drive. The pattern is straightforward: take a velocity command (how fast, which direction) and split it into two wheel speeds. But production code needs more than kinematics — it needs RPM clamping so a bad upstream command cannot burn out a motor, odometry integration so the robot knows where it is, and a safe shutdown that zeros both wheels before the process exits. This recipe covers all three. ## When To Use This - Any 2-wheel differential drive robot (TurtleBot, warehouse AGV, hobby bots) - Converting `CmdVel` from a path planner or teleoperation to motor outputs - When you need velocity clamping and safe shutdown **Use [Servo Controller](/recipes/servo-controller) instead** for directly commanding servo positions. **Use [DifferentialDriveCommand](/rust/api/control-messages) instead** if your motor driver accepts left/right wheel speeds directly. ## Prerequisites - [Quick Start](/getting-started/quick-start) completed - Familiarity with [CmdVel](/stdlib/messages/cmd-vel) and the [Node trait](/concepts/core-concepts-nodes) ## Kinematics A differential drive robot has two wheels separated by a distance **L** (the wheel base), each with radius **r**. A `CmdVel` command provides two values: linear velocity **v** (m/s, forward/backward) and angular velocity **omega** (rad/s, turning). The job is to convert `(v, omega)` into individual wheel angular velocities. 
```text
     ┌───────── L ─────────┐
     │                     │
┌────┤                     ├────┐
│ L  │        Robot        │ R  │
│    │       Center        │    │
│    │          ●───► v    │    │
│    │          │          │    │
└────┤          ▼ ω        ├────┘
     │                     │
     └─────────────────────┘

L = wheel base (center-to-center distance)
v = linear velocity (m/s, forward)
ω = angular velocity (rad/s, counter-clockwise positive)
```

The conversion follows from the fact that, during a counter-clockwise turn (`ω > 0`), the left wheel traces the inner arc and the right wheel traces the outer arc:

```text
v_left  = v - ω * L / 2
v_right = v + ω * L / 2
```

Where `v_left` and `v_right` are the linear velocities at each wheel (m/s). To get angular velocity of each wheel in rad/s, divide by the wheel radius **r**:

```text
ω_left  = (v - ω * L / 2) / r
ω_right = (v + ω * L / 2) / r
```

**Intuition**: When `omega = 0` (driving straight), both wheels spin at the same speed `v / r`. When `v = 0` and `omega > 0` (turning in place), the left wheel spins backward and the right wheel spins forward at equal magnitude `omega * L / (2 * r)`.

## Solution

### horus.toml

```toml
[package]
name = "differential-drive"
version = "0.1.0"
description = "2-wheel differential drive with safe shutdown"
```

### src/main.rs (or src/main.py)

## Odometry: Closing the Loop

Driving blind is fine for teleoperation, but any autonomous behavior needs position feedback. Odometry integrates wheel encoder ticks into a `(x, y, theta)` estimate. This is dead reckoning — it drifts over time, but it is the foundation that every higher-level system (SLAM, path planning, localization) builds on.

The math mirrors the kinematics in reverse. Instead of splitting a velocity command into wheel speeds, we combine wheel displacements back into a robot displacement:

```text
Δs = (Δleft + Δright) / 2       distance traveled by robot center
Δθ = (Δright - Δleft) / L       change in heading
x += Δs * cos(θ + Δθ / 2)       update x position
y += Δs * sin(θ + Δθ / 2)       update y position
θ += Δθ                         update heading
```

Where `Δleft` and `Δright` are the linear distances each wheel traveled since the last tick, computed from encoder ticks:

```text
Δleft  = (left_ticks - prev_left_ticks) * meters_per_tick
Δright = (right_ticks - prev_right_ticks) * meters_per_tick
```

Evaluating the heading at the midpoint, `cos(θ + Δθ / 2)` rather than just `cos(θ)`, is second-order Runge-Kutta (midpoint) integration. It approximates the robot following a circular arc during the tick instead of a straight line, which reduces drift when turning.

### Odometry Node

```rust
// simplified
use horus::prelude::*;

const WHEEL_BASE: f32 = 0.3;
const TICKS_PER_REV: f32 = 1440.0;       // encoder resolution
const WHEEL_CIRCUMFERENCE: f32 = 0.314;  // 2 * π * 0.05m radius
const METERS_PER_TICK: f32 = WHEEL_CIRCUMFERENCE / TICKS_PER_REV;

/// Raw encoder readings from motor driver
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)]
#[repr(C)]
struct WheelEncoder {
    left_ticks: i64,
    right_ticks: i64,
}

struct OdometryNode {
    encoder_sub: Topic<WheelEncoder>,
    odom_pub: Topic<Odometry>,
    prev_left: i64,
    prev_right: i64,
    x: f64,
    y: f64,
    theta: f64,
    initialized: bool,
}

impl OdometryNode {
    fn new() -> Result<Self> {
        Ok(Self {
            encoder_sub: Topic::new("wheel.encoder")?,
            odom_pub: Topic::new("odom")?,
            prev_left: 0,
            prev_right: 0,
            x: 0.0,
            y: 0.0,
            theta: 0.0,
            initialized: false,
        })
    }
}

impl Node for OdometryNode {
    fn name(&self) -> &str { "Odometry" }

    fn tick(&mut self) {
        let enc = match self.encoder_sub.recv() {
            Some(e) => e,
            None => return,
        };

        // First reading — just record baseline, no integration
        if !self.initialized {
            self.prev_left = enc.left_ticks;
            self.prev_right = enc.right_ticks;
            self.initialized = true;
            return;
        }

        // Convert encoder deltas to linear distance
        let d_left = (enc.left_ticks - self.prev_left) as f64 * METERS_PER_TICK as f64;
        let d_right = (enc.right_ticks - self.prev_right) as f64 * METERS_PER_TICK as f64;
        self.prev_left = enc.left_ticks;
        self.prev_right = enc.right_ticks;

        // Dead reckoning integration (second-order Runge-Kutta)
        let ds = (d_left + d_right) / 2.0;
        let d_theta = (d_right - d_left) / WHEEL_BASE as f64;
        let mid_theta = self.theta + d_theta / 2.0;
        self.x += ds * mid_theta.cos();
        self.y += ds * mid_theta.sin();
        self.theta += d_theta;

        let mut odom = Odometry::default();
        odom.pose = Pose2D {
            x: self.x,
            y: self.y,
            theta: self.theta,
            ..Default::default()
        };
        self.odom_pub.send(odom);
    }
}
```

### Running Both Nodes Together

```rust
// simplified
fn main() -> Result<()> {
    let mut scheduler = Scheduler::new();

    // Order matters: odometry reads encoders first, drive writes commands second
    scheduler.add(OdometryNode::new()?)
        .order(0)
        .rate(50_u64.hz())
        .build()?;

    scheduler.add(DriveNode::new()?)
        .order(1)
        .rate(50_u64.hz())
        .on_miss(Miss::Warn)
        .build()?;

    scheduler.run()
}
```

## Understanding the Code

- **Kinematics**: `v - w*L/2` and `v + w*L/2` is the standard unicycle-to-differential conversion, where `v` is linear velocity, `w` is angular velocity, and `L` is the wheel base
- **`#[repr(C)]` + `Copy`** on `WheelCmd` enables zero-copy shared memory transport
- **`.rate(50_u64.hz())`** auto-enables RT with 80% budget (16 ms) and 95% deadline (19 ms)
- **`shutdown()`** sends zero RPM — prevents wheels spinning if the program crashes mid-tick
- **`clamp()`** enforces motor safety limits even if upstream sends dangerous velocities

## Design Decisions

### Why shutdown zeros velocity

If the drive node crashes or the user hits Ctrl+C mid-tick, the last `WheelCmd` published to shared memory persists. Any downstream motor driver reading that topic will keep spinning the wheels at whatever speed was last commanded.
The `shutdown()` method is the only guaranteed opportunity to publish a safe value before the process exits. This is not optional — it is the difference between a robot that stops and a robot that drives into a wall. ### Why RPM clamping A path planner bug, a corrupted message, or a numerical overflow can produce arbitrarily large velocity commands. Without clamping, those commands pass straight through to the motor driver. At best, the driver rejects them and the robot does nothing. At worst, the driver accepts them and the motor stalls, overheats, or strips gears. The `MAX_RPM` constant acts as a last line of defense that is independent of upstream correctness. ### Why fixed constants instead of parameter tuning `WHEEL_BASE`, `WHEEL_RADIUS`, and `MAX_RPM` are compile-time constants in this recipe for clarity. In production, load them from a config file or `horus.toml` parameters so you can calibrate per-robot without recompiling. The trade-off: constants are faster (no lookup), parameters are flexible (no rebuild). For a fleet of identical robots, constants are fine. For heterogeneous hardware, use parameters. ### Why f64 for odometry The drive node uses `f32` because motor commands do not accumulate error — each tick produces an independent output. Odometry is different: `x`, `y`, and `theta` are running sums that grow over thousands of ticks. With `f32` (about 7 significant decimal digits), representable values near `x = 100.0` meters are about 7.6 micrometers apart, so small per-tick increments are rounded on every addition and the error compounds over thousands of ticks. `f64` (about 15 decimal digits) keeps sub-micrometer precision out to kilometers. The final `Odometry` message is still `f32` because downstream consumers (visualization, path planner) do not need `f64` precision.
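The effect of accumulating a running sum in single precision can be reproduced without any HORUS code. The sketch below is an illustration only: the starting position, the per-tick step, and the `to_f32` and `accumulate` helpers are assumptions chosen to make the rounding visible, not values from the recipe. It emulates `f32` in Python by round-tripping through the `struct` module.

```python
import struct


def to_f32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def accumulate(start: float, step: float, ticks: int, single_precision: bool) -> float:
    """Add `step` to `start` once per tick, optionally rounding to f32 each time."""
    total = to_f32(start) if single_precision else start
    for _ in range(ticks):
        total += step
        if single_precision:
            total = to_f32(total)  # emulate storing the sum in an f32 field
    return total


START_X = 100.0  # metres from the origin (assumed)
STEP = 2e-6      # 2 micrometres of x travel per tick (assumed)
TICKS = 10_000

x_f32 = accumulate(START_X, STEP, TICKS, single_precision=True)
x_f64 = accumulate(START_X, STEP, TICKS, single_precision=False)

print(f"f32 running sum: {x_f32:.6f} m")
print(f"f64 running sum: {x_f64:.6f} m")
```

With these numbers the `f32` sum never leaves 100.0 m, because each 2 µm step is smaller than half the spacing between adjacent `f32` values at that magnitude, while the `f64` sum reaches roughly 100.02 m. Steps this small are realistic for `x` when the robot is heading almost parallel to the y axis, since the increment is `Δs * cos(θ)`.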
## Trade-offs | Decision | Upside | Downside | |----------|--------|----------| | 50 Hz control loop | Smooth enough for most ground robots, low CPU cost | Too slow for high-speed or high-inertia platforms (use 200+ Hz) | | RPM clamping | Prevents motor damage from bad upstream commands | Silent saturation — path planner does not know it was clamped | | Dead reckoning odometry | No extra sensors needed, always available | Drift accumulates — unusable alone over long distances | | Second-order integration | Reduces drift during turns vs first-order | Marginally more computation per tick (one extra trig call) | | `WheelCmd` as RPM | Matches most hobby motor drivers directly | Industrial drives may expect rad/s, m/s, or duty cycle | | Fixed constants | No config loading, zero runtime overhead | Must recompile to change robot dimensions | | Separate drive + odom nodes | Independent rates, independent failure | Two nodes sharing `WHEEL_BASE` must stay in sync | ## Variations ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Robot drives in circles | Wheel base constant wrong | Measure center-to-center distance between wheels | | Motors overshoot on startup | No acceleration limiting | Add a ramp rate (see Variations above) | | Robot drifts when commanded straight | Wheel radii unequal | Calibrate `WHEEL_RADIUS` per wheel | | Wheels spin after Ctrl+C | Missing `shutdown()` | Always send zero in `shutdown()` | | Motor driver rejects commands | RPM exceeds driver limit | Reduce `MAX_RPM` to match your hardware | | Odometry drifts in a curve | Wheel base measured wrong | Spin robot 360 degrees, compare actual vs reported angle | | Odometry jumps on first tick | Missing initialization guard | Skip integration on first encoder reading (see `initialized` flag) | | Position loses precision far from origin | Using f32 for odometry state | Use f64 for running sums, cast to f32 only in the output message | ## See Also - [CmdVel](/stdlib/messages/cmd-vel) — 
Velocity command type - [Odometry](/stdlib/messages/odometry) — Position feedback from wheel encoders - [Motor Controller Tutorial](/tutorials/02-motor-controller) — Step-by-step motor control - [Emergency Stop Recipe](/recipes/emergency-stop) — Override cmd_vel on safety trigger - [PID Controller Recipe](/recipes/pid-controller) — Closed-loop control for velocity tracking - [Multi-Sensor Fusion](/recipes/multi-sensor-fusion) — Combine wheel odometry with IMU --- ## Recipe: IMU Reader (Python) Path: /recipes/imu-reader-python Description: 100 Hz IMU sensor node in Python that reads accelerometer and gyroscope data, estimates orientation via gyro integration, and publishes Imu messages. ## IMU Reader (Python) Reads a 6-axis IMU (accelerometer + gyroscope) at 100 Hz and publishes `Imu` messages for downstream consumers. Includes simple gyro integration for orientation estimation. ### Problem You need to read IMU hardware at a fixed rate, validate readings, and publish typed `Imu` messages from Python. ### When To Use - Reading a 6-axis or 9-axis IMU over I2C/SPI from Python - Publishing raw `Imu` messages for fusion, logging, or SLAM nodes - Prototyping IMU pipelines before porting to Rust ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Python 3.8+ with `horus` package ### horus.toml ```toml [package] name = "imu-reader-py" version = "0.1.0" description = "100 Hz IMU sensor with orientation publishing" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """100 Hz IMU reader with gyro-integrated orientation estimation.""" import math import horus from horus import Node, Scheduler, Imu, us, ms # ── State ──────────────────────────────────────────────────── roll = [0.0] pitch = [0.0] yaw = [0.0] tick_count = [0] # ── Hardware stub ──────────────────────────────────────────── def read_hardware(): """Read IMU hardware — replace with your I2C/SPI driver. 
Returns: horus.Imu with accel and gyro fields populated. """ return Imu( accel_x=0.0, accel_y=0.0, accel_z=9.81, # gravity on Z axis gyro_x=0.01, # slow roll gyro_y=0.0, gyro_z=0.05, # slow yaw rotation ) # ── Node callbacks ─────────────────────────────────────────── def imu_tick(node): imu = read_hardware() # Validate before publishing — hardware faults produce NaN if (math.isnan(imu.accel_x) or math.isnan(imu.accel_y) or math.isnan(imu.accel_z) or math.isnan(imu.gyro_x) or math.isnan(imu.gyro_y) or math.isnan(imu.gyro_z)): return # skip corrupted readings # Publish raw IMU for any subscriber node.send("imu.raw", imu) # Simple gyro integration (replace with Madgwick/Mahony in production) dt = 1.0 / 100.0 # 100 Hz -> 10 ms per tick roll[0] += imu.gyro_x * dt pitch[0] += imu.gyro_y * dt yaw[0] += imu.gyro_z * dt tick_count[0] += 1 # Publish orientation estimate as a dict (or define a custom class) node.send("imu.orientation", { "roll": roll[0], "pitch": pitch[0], "yaw": yaw[0], "tick": tick_count[0], }) def imu_shutdown(node): print(f"ImuReader: {tick_count[0]} ticks, final yaw={yaw[0]:.4f} rad") # ── Main ───────────────────────────────────────────────────── imu_node = Node( name="ImuReader", tick=imu_tick, shutdown=imu_shutdown, rate=100, # 100 Hz sensor rate order=0, pubs=["imu.raw", "imu.orientation"], subs=[], budget=800 * us, # 800 us budget for I2C/SPI read ) if __name__ == "__main__": horus.run(imu_node) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz [HORUS] Node "ImuReader" started (100 Hz) ^C ImuReader: 500 ticks, final yaw=0.2500 rad [HORUS] Shutting down...
[HORUS] Node "ImuReader" shutdown complete ``` ### Key Points - **`horus.Imu`** is the typed message with `accel_x/y/z` and `gyro_x/y/z` fields - **NaN validation** catches hardware faults before they propagate to downstream nodes - **Gyro integration drifts** over time — in production, use a complementary or Madgwick filter - **`read_hardware()`** is a placeholder — see [Real Hardware Recipe](/recipes/real-hardware) for a complete MPU6050 example with `smbus2` - **`budget=800 * us`** accounts for I2C bus latency on typical embedded hardware ### Variations - **Complementary filter**: Blend accelerometer gravity direction with gyro integration for stable pitch/roll - **Hardware I2C**: Use `smbus2` to read from an MPU6050 or BNO055 over I2C - **High-rate IMU**: Set `rate=1000` with `budget=200 * us` for 1 kHz industrial IMUs ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Orientation drifts continuously | Pure gyro integration without correction | Use complementary or Madgwick filter | | `NaN` in published data | Hardware fault or I2C read error | Add NaN check before publishing | | Orientation jumps on startup | Initial gyro bias not calibrated | Average first 100 readings as bias offset | | Yaw rotates when stationary | Gyro bias in Z-axis | Subtract calibrated bias from `gyro_z` | | Topic data stale | Publisher rate too low | Verify `rate=100` matches hardware capability | --- ## See Also - [IMU Reader (Rust)](/recipes/imu-reader) — Rust version of this recipe - [Multi-Sensor Fusion (Python)](/recipes/multi-sensor-fusion-python) — Combine IMU with odometry - [Python CV Node](/recipes/python-cv-node) — Python vision pipeline pattern --- ## IMU Reader Path: /recipes/imu-reader Description: 100 Hz IMU sensor node that reads accelerometer and gyroscope data and publishes orientation estimates # IMU Reader You need to read an IMU sensor (accelerometer + gyroscope) at a fixed rate and publish the data for downstream consumers. 
Here's a production-ready pattern with raw data publishing and orientation estimation. ## When To Use This - Reading a 6-axis or 9-axis IMU over I2C/SPI - Publishing raw `Imu` messages for sensor fusion or logging - Estimating orientation from gyroscope integration (simple) or a complementary filter (production) **Use the built-in `Imu` message type** — don't define custom IMU structs. Downstream nodes (fusion, SLAM, display) expect `Imu`. ## Prerequisites - [Quick Start](/getting-started/quick-start) completed - Familiarity with [Imu](/stdlib/messages/imu) message type and [Topic](/rust/api/topic) ## Solution ### horus.toml ```toml [package] name = "imu-reader" version = "0.1.0" description = "100 Hz IMU sensor with orientation publishing" ``` ### src/main.rs (or src/main.py) ## Understanding the Code - **`Imu::new()`** creates a message with identity quaternion, zero acceleration/velocity, and current timestamp - **`imu.is_valid()`** rejects NaN/infinite readings from faulty hardware - **`#[repr(C)]` + `Copy`** on `Orientation` enables zero-copy shared memory (~50 ns latency) - **Gyro integration drifts over time** — in production, use a Madgwick, Mahony, or complementary filter ## Variations ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Orientation drifts continuously | Pure gyro integration without correction | Use complementary or Madgwick filter | | `NaN` in published data | Hardware fault or I2C read error | Check `imu.is_valid()` before publishing | | Orientation jumps on startup | Initial gyro bias not calibrated | Average first 100 readings as bias offset | | Yaw rotates when stationary | Gyro bias in Z-axis | Subtract calibrated bias from `angular_velocity[2]` | | Topic data stale | Publisher rate too low | Verify `.rate(100_u64.hz())` matches hardware capability | ## See Also - [Real Hardware Recipe](/recipes/real-hardware) — Complete MPU6050 I2C example (no stubs) - [Imu](/stdlib/messages/imu) — IMU message type reference - 
[Sensor Node Tutorial](/tutorials/01-sensor-node) — Step-by-step sensor tutorial - [Multi-Sensor Fusion Recipe](/recipes/multi-sensor-fusion) — Combine IMU with odometry - [Sensor Messages (Rust)](/rust/api/sensor-messages) — All sensor message types --- ## Recipe: PID Controller Path: /recipes/pid-controller Description: Generic PID control loop with configurable gains, anti-windup, and derivative filtering. ## PID Controller A reusable PID controller node that reads a setpoint and measured value, then publishes a control output. Includes integral anti-windup and derivative low-pass filtering. ### Problem You need a closed-loop controller that drives a measured value toward a target setpoint with tunable response characteristics. ### How PID Works A PID controller is a **feedback loop**. It continuously measures the difference between where you *want* to be (setpoint) and where you *are* (measured value), then computes a control output to close that gap. The output is the sum of three terms, each addressing a different aspect of the error: **P (Proportional) -- reacts to the present error** `P = Kp * error` The proportional term is the simplest: multiply the current error by a gain. Large error produces large output, pushing the system toward the setpoint. P alone gets you close but usually leaves a small residual error (called *steady-state error*) because the output shrinks as the error shrinks -- at some point the output is too small to overcome friction or load. **I (Integral) -- eliminates steady-state error** `I = Ki * sum_of_errors_over_time` The integral term accumulates past errors. Even if the current error is tiny, the integral keeps growing until the system reaches the setpoint exactly. This eliminates steady-state error but introduces risk: if the system is saturated (actuator at its limit), the integral keeps accumulating, causing *windup*. That is why the code clamps the integral to `integral_max`. 
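The difference between the P and I terms described above can be seen on a toy plant. The sketch below is not part of the recipe: the plant model (`x' = -x + u - load`), the load value, the gains, and the `simulate` helper are all assumptions made for illustration. It runs the same loop once with `ki = 0` and once with a non-zero `ki`, keeping the integral clamp in place.

```python
def simulate(kp: float, ki: float, integral_max: float = 5.0,
             setpoint: float = 1.0, load: float = 0.5,
             dt: float = 0.01, steps: int = 5000) -> float:
    """Run a P or PI loop on a toy first-order plant and return the final error."""
    x = 0.0         # measured value
    integral = 0.0  # running sum of error * dt
    for _ in range(steps):
        error = setpoint - x
        # Accumulate the error, then clamp the sum (anti-windup)
        integral += error * dt
        integral = max(-integral_max, min(integral_max, integral))
        u = kp * error + ki * integral
        # Assumed plant: first-order lag with a constant opposing load
        x += dt * (-x + u - load)
    return setpoint - x


p_only_error = simulate(kp=2.0, ki=0.0)
pi_error = simulate(kp=2.0, ki=1.0)

print(f"P-only residual error: {p_only_error:.4f}")
print(f"PI residual error:     {pi_error:.4f}")
```

On this plant the P-only loop settles with a residual error of about 0.5, because the proportional output shrinks until it only just balances the load. With the integral term enabled the residual error decays to approximately zero.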
**D (Derivative) -- reduces overshoot** `D = Kd * rate_of_change_of_error` The derivative term looks at how fast the error is changing. If the error is shrinking rapidly (you are approaching the setpoint), D produces a braking force that prevents overshoot. The downside: differentiating a noisy sensor signal amplifies the noise. That is why the code applies a low-pass filter to the derivative. **Combined output:** `output = P + I + D` The final output is clamped to `[output_min, output_max]` to protect the actuator from impossible commands. ### When To Use - Position, velocity, or temperature control loops - Any system where you have a measurable output and a controllable input - When you need tunable response without a full model of the plant ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic understanding of nodes and topics ([Quick Start](/getting-started/quick-start)) ### horus.toml ```toml [package] name = "pid-controller" version = "0.1.0" description = "Generic PID with anti-windup and derivative filtering" ``` ### Complete Code ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 200 Hz [HORUS] Node "PID" started (Rt, 200 Hz, budget: 400μs, deadline: 4.75ms) ^C [HORUS] Shutting down... [HORUS] Node "PID" shutdown complete ``` ### Key Points - **Anti-windup**: Integral term is clamped to `integral_max` — prevents windup during saturation - **Derivative filter**: Low-pass filter (alpha=0.8) smooths noisy sensor feedback - **`ControlOutput` includes debug fields** (`error`, `p_term`, `i_term`, `d_term`) for tuning - **`shutdown()` zeros output** — prevents actuator from holding last command - **200Hz is typical** for position/velocity PID; use 1kHz+ for current/torque loops - **Gains (kp, ki, kd)** are constructor parameters — wire from config or topic for online tuning ### Tuning Your PID Getting the gains right is the hardest part of PID. Two approaches work well in practice. 
**Manual tuning (recommended starting point):** 1. Set `ki = 0` and `kd = 0`. Start with P-only control. 2. Increase `kp` until the system responds quickly to setpoint changes but starts to oscillate or overshoot. 3. Add `kd` to dampen the overshoot. Increase `kd` until the overshoot is acceptable. If the output becomes jerky, increase the derivative filter `alpha`. 4. Add a small `ki` to eliminate steady-state error. Start low (e.g., 0.1) and increase until the residual error disappears. Watch for integral windup -- if the system overshoots after saturation, reduce `integral_max`. **Ziegler-Nichols method (systematic):** 1. Set `ki = 0` and `kd = 0`. 2. Increase `kp` from zero until the system oscillates with a constant amplitude. This is the *ultimate gain* `Ku`. 3. Measure the *oscillation period* `Tu` (seconds per cycle). 4. Compute the gains: | Controller | Kp | Ki | Kd | |---|---|---|---| | P-only | `0.5 * Ku` | 0 | 0 | | PI | `0.45 * Ku` | `1.2 * Kp / Tu` | 0 | | PID | `0.6 * Ku` | `2 * Kp / Tu` | `Kp * Tu / 8` | Ziegler-Nichols gives aggressive tuning. You will likely need to reduce `kp` by 20-30% and increase `kd` for less overshoot. ### Online Tuning with Parameters You can adjust PID gains at runtime without restarting. 
Add `RuntimeParams` to the node and read gains each tick: ```rust // simplified use horus::prelude::*; struct TunablePidNode { params: RuntimeParams, setpoint_sub: Topic<Setpoint>, measurement_sub: Topic<Measurement>, output_pub: Topic<ControlOutput>, // State kp: f32, ki: f32, kd: f32, integral: f32, prev_error: f32, prev_derivative: f32, output_min: f32, output_max: f32, integral_max: f32, alpha: f32, target: f32, measured: f32, tick_count: u64, } impl TunablePidNode { fn new() -> Result<Self> { let params = RuntimeParams::init()?; Ok(Self { kp: params.get_or("kp", 2.0), ki: params.get_or("ki", 0.5), kd: params.get_or("kd", 0.1), params, setpoint_sub: Topic::new("pid.setpoint")?, measurement_sub: Topic::new("pid.measurement")?, output_pub: Topic::new("pid.output")?, integral: 0.0, prev_error: 0.0, prev_derivative: 0.0, output_min: -1.0, output_max: 1.0, integral_max: 0.5, alpha: 0.8, target: 0.0, measured: 0.0, tick_count: 0, }) } } impl Node for TunablePidNode { fn name(&self) -> &str { "TunablePID" } fn tick(&mut self) { // Reload gains every 200 ticks (~1s at 200Hz) if self.tick_count % 200 == 0 { self.kp = self.params.get_or("kp", self.kp as f64) as f32; self.ki = self.params.get_or("ki", self.ki as f64) as f32; self.kd = self.params.get_or("kd", self.kd as f64) as f32; } self.tick_count += 1; // ... rest of PID logic identical to PidNode::tick() ... } } ``` Then tune from the command line while the system is running: ```bash # Increase proportional gain horus param set TunablePID kp 3.0 # Reduce integral to fix overshoot horus param set TunablePID ki 0.2 # Add more derivative damping horus param set TunablePID kd 0.3 ``` Changes take effect within one second (the next param reload cycle). No restart needed.
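The gain values passed to `horus param set` have to come from somewhere. As a sketch, the Ziegler-Nichols table from the tuning section can be turned into a small helper that computes starting gains from a measured `Ku` and `Tu`. The function name `ziegler_nichols` and the example measurements (`Ku = 10`, `Tu = 2 s`) are hypothetical; only the formulas come from the table.

```python
def ziegler_nichols(ku: float, tu: float, kind: str = "PID") -> dict:
    """Return kp/ki/kd from the ultimate gain `ku` and oscillation period `tu` (seconds)."""
    if tu <= 0:
        raise ValueError("tu must be positive")
    if kind == "P":
        kp, ki, kd = 0.5 * ku, 0.0, 0.0
    elif kind == "PI":
        kp = 0.45 * ku
        ki, kd = 1.2 * kp / tu, 0.0
    elif kind == "PID":
        kp = 0.6 * ku
        ki, kd = 2.0 * kp / tu, kp * tu / 8.0
    else:
        raise ValueError(f"unknown controller kind: {kind!r}")
    return {"kp": kp, "ki": ki, "kd": kd}


# Hypothetical measurements: sustained oscillation at Ku = 10 with a 2 s period
gains = ziegler_nichols(ku=10.0, tu=2.0, kind="PID")
for name, value in gains.items():
    print(f"{name} = {value:.3f}")
```

For these inputs the PID row gives `kp = 6`, `ki = 6`, and `kd = 1.5`. Since Ziegler-Nichols tuning is aggressive, treat the result as a starting point and back off `kp` before applying it to real hardware.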
### Testing Your PID Use `tick_once()` to step through the controller deterministically and assert on output: ```rust // simplified #[test] fn test_pid_step_response() { // Create scheduler in deterministic mode let mut scheduler = Scheduler::new() .deterministic(true) .tick_rate(200_u64.hz()); let setpoint_pub: Topic<Setpoint> = Topic::new("pid.setpoint").unwrap(); let measurement_pub: Topic<Measurement> = Topic::new("pid.measurement").unwrap(); let output_sub: Topic<ControlOutput> = Topic::new("pid.output").unwrap(); scheduler.add(PidNode::new(2.0, 0.5, 0.1).unwrap()) .order(0) .rate(200_u64.hz()) .build() .unwrap(); // Send a step input: target=1.0, measured=0.0 let mut measured = 0.0; setpoint_pub.send(Setpoint { target: 1.0 }); measurement_pub.send(Measurement { value: measured }); // First tick — large error, P dominates scheduler.tick_once().unwrap(); let mut out = output_sub.recv().unwrap(); assert!(out.command > 0.0, "output should be positive for positive error"); assert!((out.error - 1.0).abs() < 1e-6, "error should be 1.0"); assert!(out.p_term > out.i_term, "P should dominate on first tick"); // Run 200 more ticks — simulate convergence for _ in 0..200 { // Simulate plant: measured moves toward the latest command measured += out.command * 0.01; // simple integrator plant measurement_pub.send(Measurement { value: measured }); scheduler.tick_once().unwrap(); // Keep the most recent controller output for the next plant step if let Some(latest) = output_sub.recv() { out = latest; } } // Error should be small after 200 ticks (1 second at 200Hz) assert!(out.error.abs() < 0.1, "should converge near setpoint"); } ``` Key testing patterns: - **Deterministic mode** ensures identical results on every run - **`tick_once()`** gives precise single-step control -- no threads, no timing variance - **Test the step response**: set target, measure output after N ticks, assert convergence - **Test edge cases**: zero gains, saturated output, setpoint changes, integral windup ### Design Decisions **Why anti-windup (integral clamping)?** Without
clamping, the integral term grows unbounded when the output is saturated (e.g., the motor is at full power but the error persists). When conditions change and the error reverses, the integral has accumulated so much that the controller overshoots massively before recovering. Clamping `integral` to `[-integral_max, integral_max]` caps this accumulation. The `integral_max` value should be set so that `ki * integral_max` produces roughly half of your output range. **Why a derivative filter (low-pass on D)?** The derivative term computes `(error - prev_error) / dt`. Real sensor data has noise -- small random fluctuations that produce enormous derivative spikes. These spikes translate directly into actuator jitter. The exponential moving average filter (`alpha * prev + (1-alpha) * new`) smooths the derivative at the cost of a small phase lag. An `alpha` of 0.8 provides strong filtering; 0.1 is nearly unfiltered. For noisy sensors (encoders, IMUs), keep `alpha` between 0.7 and 0.9. **Why output clamping?** Every actuator has physical limits. Sending a motor a command of 10.0 when it only accepts -1.0 to 1.0 is at best wasted and at worst dangerous. Clamping the final output to `[output_min, output_max]` enforces actuator limits in software. Set these to match your hardware specifications. 
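The three decisions above (integral clamp, derivative low-pass filter, output clamp) fit in a few lines. The sketch below is a plain-Python illustration rather than the recipe's `PidNode`: the `Pid` class and `clamp` helper are hypothetical names, and the default limits and gains are borrowed from the `TunablePidNode` example. It uses a fixed `dt` derived from the tick rate.

```python
from dataclasses import dataclass


def clamp(value: float, low: float, high: float) -> float:
    """Limit `value` to the closed range [low, high]."""
    return max(low, min(high, value))


@dataclass
class Pid:
    kp: float
    ki: float
    kd: float
    dt: float                   # fixed tick period in seconds
    integral_max: float = 0.5
    alpha: float = 0.8          # weight on the previous filtered derivative
    output_min: float = -1.0
    output_max: float = 1.0
    integral: float = 0.0
    prev_error: float = 0.0
    prev_derivative: float = 0.0

    def step(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured

        # Anti-windup: accumulate, then clamp the running sum
        self.integral = clamp(self.integral + error * self.dt,
                              -self.integral_max, self.integral_max)

        # Derivative filter: exponential moving average of the raw slope
        raw = (error - self.prev_error) / self.dt
        derivative = self.alpha * self.prev_derivative + (1.0 - self.alpha) * raw
        self.prev_error = error
        self.prev_derivative = derivative

        # Output clamp: never command more than the actuator accepts
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return clamp(output, self.output_min, self.output_max)


# Hold a large error for 1000 ticks at 200 Hz to force saturation
pid = Pid(kp=2.0, ki=0.5, kd=0.1, dt=1.0 / 200.0)
outputs = [pid.step(setpoint=10.0, measured=0.0) for _ in range(1000)]
print(f"last output: {outputs[-1]}, stored integral: {pid.integral}")
```

Even though the error persists for five simulated seconds, the stored integral stops at `integral_max` and the output stays pinned at `output_max`, so the controller has little accumulated error to unwind once the error reverses. Note that this sketch starts with `prev_error = 0`, which produces a derivative spike on the first call; a production controller would typically seed `prev_error` from the first measurement.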
### Trade-offs | Decision | Benefit | Cost | |----------|---------|------| | Anti-windup clamping | Prevents massive overshoot after saturation | Slows convergence when `integral_max` is too low | | Derivative low-pass filter | Eliminates actuator jitter from sensor noise | Adds phase lag; slower response to real transients | | Output clamping | Protects actuators from impossible commands | Controller cannot express urgency beyond the clamp | | Fixed `dt` (from rate) | Simple, no clock dependency in tick | Inaccurate if scheduler tick drifts significantly | | Debug fields in `ControlOutput` | Easy tuning via `horus topic echo pid.output` | Extra 16 bytes per message (negligible for most use cases) | | Single-rate PID | Straightforward implementation | Cannot run inner loop faster than outer loop (use cascaded PID for that) | | Constructor gains | Compile-time configuration, no runtime overhead | Requires restart to change (use `RuntimeParams` for online tuning) | ### Variations - **Velocity PID**: Change `dt` to match your actual tick rate. 
Use `.rate(1000_u64.hz())` for current/torque loops - **Cascaded PID**: Chain two PID nodes -- outer loop (position) publishes setpoint for inner loop (velocity) - **Online tuning**: Subscribe to a `pid.gains` topic to adjust `kp`, `ki`, `kd` at runtime without restarting - **Feed-forward**: Add a feed-forward term (`ff * setpoint`) to the output for faster response to known disturbances ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Output oscillates wildly | `kd` too high or no derivative filter | Reduce `kd` or increase `alpha` (more filtering) | | Slow to reach setpoint | `kp` too low | Increase `kp`; check `output_max` is not too restrictive | | Overshoots then settles | `ki` too high or `integral_max` too large | Reduce `ki` or lower `integral_max` | | Output saturates at limit | `output_min`/`output_max` too tight | Widen output range to match actuator capability | | Steady-state error | `ki` is zero | Add a small `ki` term (start with 0.1) | | Jerky output at low rates | Running below 100Hz | Increase `.rate()` or use a stronger derivative filter | --- ## See Also - [Differential Drive Recipe](/recipes/differential-drive) — Uses PID output - [Scheduler API](/rust/api/scheduler) — Rate and timing configuration - [Parameters](/development/parameters) — `RuntimeParams` for online tuning - [Scheduler Concepts](/concepts/core-concepts-scheduler) — `tick_once()`, deterministic mode, execution classes --- ## Recipe: LiDAR Obstacle Avoidance (Python) Path: /recipes/lidar-avoidance-python Description: Reactive obstacle avoidance in Python using LaserScan data to generate safe CmdVel velocity commands. ## LiDAR Obstacle Avoidance (Python) Subscribes to `LaserScan` from a 2D LiDAR, splits the scan into three zones (left, center, right), identifies the closest obstacle in each, and publishes reactive `CmdVel` commands. Stops if an obstacle is too close. 
### Problem You need a Python node to avoid obstacles in real time using 2D LiDAR scan data without a map. ### When To Use - Mobile robots navigating unknown environments - Reactive safety layer underneath a path planner - Quick prototyping of autonomous navigation in Python ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - A LiDAR driver publishing `LaserScan` to `lidar.scan` ### horus.toml ```toml [package] name = "lidar-avoidance-py" version = "0.1.0" description = "Reactive obstacle avoidance from LaserScan (Python)" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Reactive LiDAR obstacle avoidance — three-zone split with safety stop.""" import math import horus from horus import Node, CmdVel, LaserScan, us, ms # ── Safety zones (meters) ──────────────────────────────────── STOP_DISTANCE = 0.3 # emergency stop SLOW_DISTANCE = 0.8 # reduce speed CRUISE_SPEED = 0.5 # m/s forward TURN_SPEED = 0.8 # rad/s turning # ── Helpers ────────────────────────────────────────────────── def min_range(ranges, start, end): """Find minimum valid range in a slice of the scan.""" valid = [r for r in ranges[start:end] if math.isfinite(r) and r > 0.01] return min(valid) if valid else float("inf") # ── Node callbacks ─────────────────────────────────────────── def avoidance_tick(node): # IMPORTANT: always recv() every tick to drain the buffer scan = node.recv("lidar.scan") if scan is None: return # no data yet — skip this tick ranges = scan.ranges n = len(ranges) if n == 0: return # Split scan into three zones: left, center, right third = n // 3 left_min = min_range(ranges, 0, third) center_min = min_range(ranges, third, 2 * third) right_min = min_range(ranges, 2 * third, n) # Reactive behavior if center_min < STOP_DISTANCE: # WARNING: obstacle dead ahead — emergency stop cmd = CmdVel(linear=0.0, angular=0.0) elif center_min < SLOW_DISTANCE: # Obstacle ahead — turn toward the more open side angular = TURN_SPEED if 
left_min > right_min else -TURN_SPEED cmd = CmdVel(linear=0.1, angular=angular) elif left_min < SLOW_DISTANCE: # Obstacle on left — veer right cmd = CmdVel(linear=CRUISE_SPEED * 0.7, angular=-TURN_SPEED * 0.5) elif right_min < SLOW_DISTANCE: # Obstacle on right — veer left cmd = CmdVel(linear=CRUISE_SPEED * 0.7, angular=TURN_SPEED * 0.5) else: # Clear — cruise forward cmd = CmdVel(linear=CRUISE_SPEED, angular=0.0) node.send("cmd_vel", cmd) def avoidance_shutdown(node): # SAFETY: stop the robot on exit node.send("cmd_vel", CmdVel(linear=0.0, angular=0.0)) print("Avoidance: shutdown — robot stopped") # ── Main ───────────────────────────────────────────────────── avoidance_node = Node( name="Avoidance", tick=avoidance_tick, shutdown=avoidance_shutdown, rate=20, # 20 Hz — match typical LiDAR rate order=0, subs=["lidar.scan"], pubs=["cmd_vel"], on_miss="warn", ) if __name__ == "__main__": horus.run(avoidance_node) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz [HORUS] Node "Avoidance" started (20 Hz) ^C Avoidance: shutdown — robot stopped [HORUS] Shutting down... 
[HORUS] Node "Avoidance" shutdown complete ``` ### Key Points - **Three-zone split** (left/center/right) is the simplest reactive architecture — extend to N zones for smoother behavior - **`min_range()` filters** invalid readings (`NaN`, `Inf`, near-zero) before comparison - **`STOP_DISTANCE`** is the hard safety limit — tune to your robot's stopping distance at cruise speed - **`shutdown()` sends zero velocity** — robot stops even if killed mid-avoidance - **20 Hz matches most 2D LiDARs** (RPLiDAR A1/A2, Hokuyo URG) — no benefit running faster than the sensor - **Pair with a differential drive node** — this publishes `cmd_vel`, the drive node subscribes to it ### Variations - **N-zone split**: Divide the scan into more zones (e.g., 8) for smoother steering gradients - **Speed scaling**: Scale `CRUISE_SPEED` proportionally to nearest obstacle distance - **Rear sensor**: Add a second `LaserScan` subscriber for rear obstacle detection during reversing - **Hysteresis**: Add state tracking to prevent oscillating between turn directions ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Robot stops but nothing is nearby | `NaN`/`Inf` in scan ranges not filtered | Check `min_range()` filters invalid readings | | Robot turns in circles | `STOP_DISTANCE` too large for the environment | Reduce `STOP_DISTANCE` or increase `SLOW_DISTANCE` gap | | Robot hits obstacles | `STOP_DISTANCE` too small for braking distance | Increase `STOP_DISTANCE` to match max speed stopping distance | | No velocity commands published | No `LaserScan` on `lidar.scan` | Verify LiDAR driver is running with `horus monitor` | | Jittery steering | Scan data noisy or rate too high | Add temporal smoothing or reduce `rate` to match LiDAR rate | --- ## See Also - [LiDAR Obstacle Avoidance (Rust)](/recipes/lidar-obstacle-avoidance) — Rust version of this recipe - [Emergency Stop (Python)](/recipes/emergency-stop-python) — Safety stop pattern --- ## Recipe: LiDAR Obstacle Avoidance Path: 
/recipes/lidar-obstacle-avoidance Description: Reactive obstacle avoidance using LaserScan data to generate safe velocity commands. ## LiDAR Obstacle Avoidance Reads `LaserScan` data from a 2D LiDAR, identifies obstacles in three zones (left, center, right), and publishes reactive `CmdVel` commands to avoid collisions. Stops if an obstacle is too close. ### Problem You need a robot to avoid obstacles in real time using 2D LiDAR scan data without a pre-built map. ### When To Use - Mobile robots navigating unknown environments - Reactive safety layer underneath a path planner - Quick prototyping of autonomous navigation ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - A LiDAR driver publishing `LaserScan` to `lidar.scan` (or the [IMU Reader](/recipes/imu-reader) recipe for testing) ### horus.toml ```toml [package] name = "lidar-avoidance" version = "0.1.0" description = "Reactive obstacle avoidance from LaserScan" ``` ### Complete Code ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 20 Hz [HORUS] Node "Avoidance" started (Rt, 20 Hz, budget: 40.0ms, deadline: 47.5ms) ^C [HORUS] Shutting down... 
[HORUS] Node "Avoidance" shutdown complete ``` ### Key Points - **Three-zone split** (left/center/right) is the simplest reactive architecture — extend to N zones for smoother behavior - **`min_range()` filters** invalid readings (`NaN`, `Inf`, near-zero) before comparison - **`STOP_DISTANCE`** is the hard safety limit — tune to your robot's stopping distance at cruise speed - **`shutdown()` sends zero velocity** — robot stops even if killed mid-avoidance-maneuver - **20Hz matches most 2D LiDARs** (RPLiDAR A1/A2, Hokuyo URG) — no benefit running faster than sensor - **Pair with differential-drive recipe** — this publishes `cmd_vel`, the drive recipe subscribes to it ### Variations - **N-zone split**: Divide the scan into more zones (e.g., 8) for smoother steering gradients - **Vector Field Histogram (VFH)**: Replace zone logic with polar histogram for denser environments - **Speed scaling**: Scale `CRUISE_SPEED` proportionally to nearest obstacle distance for smoother deceleration - **Rear sensor**: Add a second `LaserScan` subscriber for rear obstacle detection during reversing ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Robot stops but nothing is nearby | `NaN`/`Inf` in scan ranges not filtered | Check `min_range()` filters invalid readings | | Robot turns in circles | `STOP_DISTANCE` too large for the environment | Reduce `STOP_DISTANCE` or increase `SLOW_DISTANCE` gap | | Robot hits obstacles | `STOP_DISTANCE` too small for braking distance | Increase `STOP_DISTANCE` to match max speed stopping distance | | No velocity commands published | No `LaserScan` on `lidar.scan` | Verify LiDAR driver is running with `horus monitor` | | Jittery steering | Scan data noisy or rate too high | Add temporal smoothing or reduce `.rate()` to match LiDAR rate | --- ## See Also - [LaserScan](/stdlib/messages/laser-scan) — LiDAR message type - [Emergency Stop Recipe](/recipes/emergency-stop) — Safety stop pattern --- ## Recipe: PID Controller (C++) 
Path: /recipes/pid-controller-cpp
Description: Generic PID control loop with configurable gains, anti-windup, and derivative filtering.

## PID Controller (C++)

A reusable PID controller node that reads a setpoint and a measured value, then publishes a control output. Includes integral anti-windup and derivative low-pass filtering.

### Problem

You need a closed-loop controller that drives a measured value toward a target setpoint with tunable response characteristics.

### How PID Works

A PID controller is a **feedback loop**. It continuously measures the difference between where you *want* to be (setpoint) and where you *are* (measured value), then computes a control output to close that gap.

**P (Proportional) — reacts to the present error**

`P = Kp * error`

A large error produces a large output. P alone gets you close but usually leaves a steady-state error: near the setpoint the output becomes too small to overcome friction.

**I (Integral) — eliminates steady-state error**

`I = Ki * sum_of_errors_over_time`

The integral accumulates past errors. Even a tiny persistent error builds up until the output is large enough to bring the system to the setpoint. Risk: if the actuator saturates, the integral keeps growing (*windup*). That is why the code clamps the integral to `integral_max_`.

**D (Derivative) — reduces overshoot**

`D = Kd * rate_of_change_of_error`

D looks at how fast the error is changing. If the error is shrinking rapidly, D produces a braking force that limits overshoot. Downside: differentiating a noisy sensor signal amplifies the noise, so the code applies a low-pass filter.
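The three terms, together with the clamped sum that forms the final output, can be sketched as a single update step in plain C++ with no HORUS dependency. This is a minimal sketch under stated assumptions: `PidGains`, `PidState`, and `pid_step` are hypothetical names introduced here for illustration, and the default `integral_max` and `alpha` values are placeholders rather than recommended settings.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper types for illustration only (not HORUS API).
struct PidGains {
    double kp, ki, kd;
};

struct PidState {
    double integral = 0.0;             // running sum of error * dt
    double prev_error = 0.0;           // error from the previous step
    double filtered_derivative = 0.0;  // low-pass filtered d(error)/dt
};

// One PID update. Returns the control output clamped to [-1, 1].
inline double pid_step(const PidGains& g, PidState& s,
                       double setpoint, double measured, double dt,
                       double integral_max = 0.5, double alpha = 0.8) {
    assert(dt > 0.0);
    const double error = setpoint - measured;

    // P: proportional to the present error.
    const double p_term = g.kp * error;

    // I: accumulate error over time, clamped to limit windup.
    s.integral = std::clamp(s.integral + error * dt, -integral_max, integral_max);
    const double i_term = g.ki * s.integral;

    // D: low-pass filtered rate of change of the error.
    const double raw_derivative = (error - s.prev_error) / dt;
    s.filtered_derivative =
        alpha * s.filtered_derivative + (1.0 - alpha) * raw_derivative;
    const double d_term = g.kd * s.filtered_derivative;
    s.prev_error = error;

    // Combined output, clamped to the actuator range.
    return std::clamp(p_term + i_term + d_term, -1.0, 1.0);
}
```

Keeping the state in a separate struct makes the step easy to unit-test: calling `pid_step` repeatedly with a large constant error shows the output pinned at the clamp while `integral` stops growing at `integral_max`.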
**Combined output:**

`output = clamp(P + I + D, -1.0, 1.0)`

### When To Use

- Position, velocity, or temperature control loops
- Any system with a measurable output and controllable input
- When you need tunable response without a plant model

### Prerequisites

- HORUS installed ([Installation Guide](/docs/getting-started/installation))
- Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp))

### Complete Code

```cpp
#include #include #include #include
using namespace horus::literals;

class PidController : public horus::Node {
public:
    PidController(double kp, double ki, double kd)
        : Node("pid_controller"), kp_(kp), ki_(ki), kd_(kd)
    {
        setpoint_sub_    = subscribe<horus::msg::CmdVel>("pid.setpoint");
        measurement_sub_ = subscribe<horus::msg::CmdVel>("pid.measurement");
        output_pub_      = advertise<horus::msg::CmdVel>("pid.output");
    }

    void tick() override {
        // IMPORTANT: always recv() every tick to drain buffers
        if (auto sp = setpoint_sub_->recv()) {
            target_ = sp->get()->linear;
        }
        if (auto fb = measurement_sub_->recv()) {
            measured_ = fb->get()->linear;
        }

        double error = target_ - measured_;
        double dt = 0.01;  // 100 Hz tick rate

        // ── Proportional ────────────────────────────────────────
        double p_term = kp_ * error;

        // ── Integral with anti-windup ───────────────────────────
        integral_ += error * dt;
        integral_ = std::clamp(integral_, -integral_max_, integral_max_);
        double i_term = ki_ * integral_;

        // ── Derivative with low-pass filter ─────────────────────
        // Raw derivative amplifies sensor noise.
        // Filter: alpha * prev + (1 - alpha) * new
        double raw_derivative = (error - prev_error_) / dt;
        filtered_derivative_ = alpha_ * filtered_derivative_
                             + (1.0 - alpha_) * raw_derivative;
        double d_term = kd_ * filtered_derivative_;
        prev_error_ = error;

        // ── Output (clamped to actuator limits) ─────────────────
        double output = std::clamp(p_term + i_term + d_term,
                                   output_min_, output_max_);

        // Publish with diagnostic fields
        horus::msg::CmdVel cmd{};
        cmd.linear = static_cast<float>(output);
        cmd.angular = static_cast<float>(error);  // expose error for monitoring
        output_pub_->send(cmd);

        // Log every 100 ticks (1 Hz at 100 Hz)
        if (++tick_count_ % 100 == 0) {
            char buf[128];
            std::snprintf(buf, sizeof(buf),
                "err=%.3f P=%.3f I=%.3f D=%.3f out=%.3f",
                error, p_term, i_term, d_term, output);
            horus::log::info("pid", buf);
        }
    }

    void enter_safe_state() override {
        // Zero output on safety event — stop actuator
        horus::msg::CmdVel stop{};
        output_pub_->send(stop);
        integral_ = 0;  // reset integral to prevent windup during recovery
        horus::blackbox::record("pid", "Entered safe state, output zeroed");
    }

private:
    horus::Subscriber<horus::msg::CmdVel>* setpoint_sub_;
    horus::Subscriber<horus::msg::CmdVel>* measurement_sub_;
    horus::Publisher<horus::msg::CmdVel>* output_pub_;

    // Gains
    double kp_, ki_, kd_;

    // State
    double integral_ = 0;
    double prev_error_ = 0;
    double filtered_derivative_ = 0;
    double target_ = 0;
    double measured_ = 0;
    int tick_count_ = 0;

    // Limits
    double output_min_ = -1.0;
    double output_max_ = 1.0;
    double integral_max_ = 0.5;  // anti-windup clamp
    double alpha_ = 0.8;         // derivative filter (0=no filter, 0.99=heavy)
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(100_hz).name("pid_demo");

    PidController pid(1.0, 0.1, 0.05);  // Kp=1, Ki=0.1, Kd=0.05
    sched.add(pid)
        .order(10)
        .budget(5_ms)
        .on_miss(horus::Miss::Skip)  // skip if overrun — don't accumulate lag
        .build();

    sched.spin();
}
```

### Tuning Guide

**Start with P only** (Ki=0, Kd=0). Increase Kp until the system oscillates, then back off 50%.

**Add I** to eliminate steady-state error.
Start small (Ki = Kp/10). If the system overshoots and oscillates slowly, reduce Ki.

**Add D** to reduce overshoot. Start with Kd = Kp/5. If the output is noisy, increase `alpha_` (heavier filter).

**Ziegler-Nichols method:**

1. Set Ki=0, Kd=0
2. Increase Kp until sustained oscillation (this is Ku)
3. Measure oscillation period Tu
4. Set: Kp = 0.6 * Ku, Ki = 2 * Kp / Tu, Kd = Kp * Tu / 8

### Common Errors

| Symptom | Cause | Fix |
|---------|-------|-----|
| Oscillation that grows | Kp too high | Reduce Kp by 50% |
| Slow convergence | Kp too low or Ki too low | Increase Kp or Ki |
| Overshoot then settle | Kd too low | Increase Kd |
| Noisy output | Kd amplifying sensor noise | Increase `alpha_` (0.9+) or reduce Kd |
| Windup after saturation | `integral_max_` too large | Reduce `integral_max_` to match actuator range |
| Output stuck at ±1.0 | Sustained error with high Ki | Reduce Ki or lower `integral_max_` |

### Design Decisions

| Choice | Rationale |
|--------|-----------|
| `horus::Node` subclass | State (integral, prev_error) persists across ticks cleanly |
| `enter_safe_state()` zeros output | Actuator stops immediately on safety event |
| Derivative filter | Raw `d/dt(error)` amplifies sensor noise — low-pass essential |
| Integral clamp | Prevents windup when actuator is saturated |
| `Miss::Skip` policy | PID with accumulated lag is worse than skipping a tick |
| 100 Hz tick rate | Fast enough for motor control, slow enough for most sensors |

### Variations

**Velocity PID** — for motor speed control, compute on velocity error instead of position:

```cpp
double error = target_velocity - measured_velocity;
```

**Cascaded PID** — outer loop sets inner loop's setpoint:

```cpp
// Outer: position → velocity setpoint
// Inner: velocity → motor command
double vel_setpoint = position_pid.compute(pos_error);
double cmd = velocity_pid.compute(vel_setpoint - measured_vel);
```

**Runtime gain tuning** with `horus::Params`:

```cpp
void tick() override {
    kp_ = params_.get("pid_kp", kp_);
    ki_ = params_.get("pid_ki", ki_);
    kd_ = params_.get("pid_kd", kd_);
    // ... rest of PID computation
}
```

### Key Takeaways

- Always drain subscriber buffers every tick (`recv()` unconditionally)
- Use `enter_safe_state()` for safety-critical actuator control
- Anti-windup is mandatory for any real PID — integral grows unbounded without it
- Filter the derivative — raw differentiation amplifies noise
- `Miss::Skip` is better than `Miss::Warn` for control loops — accumulated lag causes oscillation

---

## PID Controller (Python)

Path: /recipes/pid-controller-python
Description: Generic PID loop with configurable gains, anti-windup, output clamping, and dt-based integration

## PID Controller (Python)

A reusable PID controller node that subscribes to a setpoint and a measured value, computes a control output, and publishes the result. Includes anti-windup and output clamping.

### Problem

You need a closed-loop controller (speed, position, temperature, etc.) that runs at a fixed rate with deterministic timesteps.
### When To Use - Motor speed control (RPM tracking) - Position holding (arm joints, pan-tilt) - Temperature regulation - Any setpoint-tracking control loop ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Familiarity with [Nodes](/concepts/core-concepts-nodes) ### horus.toml ```toml [package] name = "pid-controller-py" version = "0.1.0" language = "python" ``` ### Complete Code ```python import horus from horus import Node, run, us, ms # ── PID gains ───────────────────────────────────────── KP = 2.0 KI = 0.5 KD = 0.1 OUTPUT_MIN = -1.0 OUTPUT_MAX = 1.0 INTEGRAL_MAX = 10.0 # anti-windup limit # ── State ───────────────────────────────────────────── integral = [0.0] prev_error = [0.0] def pid_init(node): integral[0] = 0.0 prev_error[0] = 0.0 node.log_info(f"PID: kp={KP}, ki={KI}, kd={KD}") def pid_tick(node): setpoint = node.recv("setpoint") measured = node.recv("measured") if setpoint is None or measured is None: return sp = setpoint.get("value", 0.0) if isinstance(setpoint, dict) else float(setpoint) mv = measured.get("value", 0.0) if isinstance(measured, dict) else float(measured) # Error error = sp - mv # dt from framework clock — fixed in deterministic mode, real in production dt = horus.dt() # Proportional p_term = KP * error # Integral with anti-windup integral[0] += error * dt integral[0] = max(-INTEGRAL_MAX, min(INTEGRAL_MAX, integral[0])) i_term = KI * integral[0] # Derivative (on error, not measurement — simpler but noisier) d_term = KD * (error - prev_error[0]) / dt if dt > 0 else 0.0 prev_error[0] = error # IMPORTANT: clamp output to actuator limits output = p_term + i_term + d_term output = max(OUTPUT_MIN, min(OUTPUT_MAX, output)) node.send("control", { "output": output, "error": error, "p": p_term, "i": i_term, "d": d_term, }) pid = Node( name="PID", init=pid_init, tick=pid_tick, rate=100, order=5, budget=200 * us, subs=["setpoint", "measured"], pubs=["control"], ) run(pid, tick_rate=100) ``` ### Expected 
Output ```text [HORUS] Scheduler running — tick_rate: 100 Hz [HORUS] Node "PID" started (BestEffort, 100 Hz, budget: 200μs) [HORUS] PID: kp=2.0, ki=0.5, kd=0.1 ``` ### Key Points - **`horus.dt()`** gives the actual timestep — adapts to rate changes, fixed in deterministic mode for reproducible behavior - **Anti-windup** clamps the integral term to prevent saturation after long errors - **Output clamping** prevents actuator damage (motors, heaters, etc.) - **Derivative on error** is simple but amplifies noise — for noisy measurements, use derivative on measurement: `d_term = -KD * (mv - prev_mv) / dt` - **Deterministic mode** (`deterministic=True`) makes `dt()` return a fixed `1/rate`, so PID behavior is identical across runs ### Variations **Derivative on Measurement** (less noise): ```python d_term = -KD * (mv - prev_measured[0]) / dt if dt > 0 else 0.0 prev_measured[0] = mv ``` **Runtime Gain Tuning** (via Params): ```python import horus params = horus.Params() params.set("kp", 2.0) params.set("ki", 0.5) def pid_tick(node): kp = params.get_or("kp", 2.0) ki = params.get_or("ki", 0.5) # ... 
use kp, ki in calculation ``` **Cascaded PID** (position → velocity → torque): ```python # Outer loop: position → velocity setpoint (10 Hz) pos_pid = Node(name="pos_pid", tick=pos_tick, rate=10, order=0, subs=["pos_setpoint", "pos_measured"], pubs=["vel_setpoint"]) # Inner loop: velocity → torque command (100 Hz) vel_pid = Node(name="vel_pid", tick=vel_tick, rate=100, order=1, subs=["vel_setpoint", "vel_measured"], pubs=["torque_cmd"]) run(pos_pid, vel_pid, tick_rate=100) ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Output oscillates wildly | Kd too high or noisy measurement | Lower Kd, use derivative-on-measurement, or low-pass filter | | Output saturates at max/min | Ki too high or sustained error | Lower Ki or increase INTEGRAL_MAX with caution | | Slow response to setpoint changes | Kp too low | Increase Kp (but watch for oscillation) | | Different behavior in deterministic mode | Using `time.time()` instead of `horus.dt()` | Always use `horus.dt()` for timestep | | Integral windup after long error | INTEGRAL_MAX too large | Reduce anti-windup limit | --- ## See Also - [PID Controller (Rust)](/recipes/pid-controller) — Rust version - [Differential Drive (Python)](/recipes/differential-drive-python) — Uses PID for wheel speed control - [Clock API](/python/api/clock) — `horus.dt()` and deterministic mode - [Rate & Params](/python/api/rate-params) — Runtime gain tuning with Params --- ## Recipe: Differential Drive (C++) Path: /recipes/differential-drive-cpp Description: Two-wheel robot control with odometry feedback ## Differential Drive (C++) Convert velocity commands into left/right wheel speeds for a two-wheeled mobile robot, with dead-reckoning odometry and safe shutdown. ### Problem You have a robot with two independently driven wheels and a caster. A path planner or teleoperation system publishes `CmdVel` (linear velocity + angular velocity). 
You need to split that into individual wheel speeds, track the robot's position via odometry, and guarantee the wheels stop if the node exits or enters a safety state.

### How It Works

A differential drive robot steers by varying the speed difference between its two wheels. When both spin at the same speed, the robot drives straight. When one spins faster, the robot curves. When they spin at equal speed in opposite directions, the robot rotates in place.

**Inverse kinematics** converts a `(v, omega)` command into wheel angular velocities (rad/s):

```text
v_left  = (v - omega * L / 2) / r
v_right = (v + omega * L / 2) / r
```

Where `L` is the wheel base (center-to-center distance) and `r` is the wheel radius.

**Forward kinematics** (odometry) reverses the process. Given the robot's linear velocity `v` and angular velocity `omega` over a timestep `dt`, compute how its pose changed:

```text
x     += v * cos(theta) * dt
y     += v * sin(theta) * dt
theta += omega * dt
```

This is first-order Euler integration. This recipe integrates the commanded velocities (dead reckoning), so it drifts over time, but it provides the baseline position estimate that SLAM and localization systems build upon.
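The two sets of equations above can be sketched as standalone C++ functions, which makes them easy to check with concrete numbers before wiring them into a node. This is a minimal sketch: `WheelSpeeds`, `Pose2D`, `inverse_kinematics`, and `integrate_pose` are hypothetical names introduced here for illustration and are not part of the HORUS API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration only (not HORUS API).
struct WheelSpeeds {
    double left, right;  // wheel angular velocities, rad/s
};

struct Pose2D {
    double x = 0.0, y = 0.0, theta = 0.0;  // meters, meters, radians
};

// Inverse kinematics: body velocity (v, omega) -> wheel angular velocities.
inline WheelSpeeds inverse_kinematics(double v, double omega,
                                      double wheel_base, double wheel_radius) {
    assert(wheel_radius > 0.0);
    return {(v - omega * wheel_base / 2.0) / wheel_radius,
            (v + omega * wheel_base / 2.0) / wheel_radius};
}

// Forward kinematics: one first-order Euler step of dead-reckoning odometry.
inline void integrate_pose(Pose2D& p, double v, double omega, double dt) {
    p.x += v * std::cos(p.theta) * dt;
    p.y += v * std::sin(p.theta) * dt;
    p.theta += omega * dt;
}
```

For a robot with a 0.16 m wheel base and 0.033 m wheel radius, a command of `v = 0.1` m/s with `omega = 0` gives both wheels `0.1 / 0.033`, about 3.03 rad/s. A pure rotation (`v = 0`, `omega = 1` rad/s) gives equal and opposite wheel speeds of about 2.42 rad/s.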
### When To Use

- Any 2-wheel differential drive robot (TurtleBot, warehouse AGV, hobby bots)
- Converting `CmdVel` from a planner or teleop to motor outputs
- When you need velocity clamping and safe shutdown on a ground platform

### Prerequisites

- HORUS installed ([Installation Guide](/docs/getting-started/installation))
- Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp))

### Complete Code

```cpp
#include #include #include #include
using namespace horus::literals;

class DiffDrive : public horus::Node {
public:
    DiffDrive(double wheel_base, double wheel_radius, double max_wheel_speed)
        : Node("diff_drive"),
          wheel_base_(wheel_base),
          wheel_radius_(wheel_radius),
          max_wheel_speed_(max_wheel_speed)
    {
        // Subscribe to velocity commands from planner/teleop
        cmd_sub_ = subscribe<horus::msg::CmdVel>("cmd_vel");
        // Publish odometry for localization / SLAM
        odom_pub_ = advertise<horus::msg::Odometry>("odom");
        // Publish individual wheel velocities for the motor driver
        motor_pub_ = advertise<horus::msg::CmdVel>("motor.wheels");
    }

    void tick() override {
        // Always drain the subscriber buffer to avoid stale commands
        if (auto cmd = cmd_sub_->recv()) {
            last_v_ = cmd->get()->linear;
            last_w_ = cmd->get()->angular;
        }

        // ── Inverse kinematics: CmdVel -> wheel angular velocities ──
        // For positive omega (left turn), the left wheel traces the inner arc
        // and the right wheel the outer arc
        double v_left  = (last_v_ - last_w_ * wheel_base_ / 2.0) / wheel_radius_;
        double v_right = (last_v_ + last_w_ * wheel_base_ / 2.0) / wheel_radius_;

        // Clamp to motor limits — prevents burning out motors on bad commands
        v_left  = std::clamp(v_left, -max_wheel_speed_, max_wheel_speed_);
        v_right = std::clamp(v_right, -max_wheel_speed_, max_wheel_speed_);

        // Pack into CmdVel (linear=left, angular=right) for the motor driver
        horus::msg::CmdVel wheels{};
        wheels.linear = static_cast<float>(v_left);
        wheels.angular = static_cast<float>(v_right);
        motor_pub_->send(wheels);

        // ── Forward kinematics: update dead-reckoning odometry ──────
        double dt = 1.0 / 50.0;  // matches 50 Hz tick rate
        x_ += last_v_ * std::cos(theta_) * dt;
        y_ += last_v_ * std::sin(theta_) * dt;
        theta_ += last_w_ * dt;

        // Normalize theta to [-pi, pi] so the heading stays bounded
        if (theta_ > M_PI) theta_ -= 2.0 * M_PI;
        if (theta_ < -M_PI) theta_ += 2.0 * M_PI;

        horus::msg::Odometry odom{};
        odom.pose.x = x_;
        odom.pose.y = y_;
        odom.pose.theta = theta_;
        odom_pub_->send(odom);

        // Log at 1 Hz (every 50th tick at 50 Hz)
        if (++tick_count_ % 50 == 0) {
            char buf[128];
            std::snprintf(buf, sizeof(buf),
                "pos=(%.2f, %.2f) theta=%.1f deg wheels=(%.1f, %.1f) rad/s",
                x_, y_, theta_ * 180.0 / M_PI, v_left, v_right);
            horus::log::info("diff_drive", buf);
        }
    }

    void enter_safe_state() override {
        // Zero both motors immediately on safety event
        horus::msg::CmdVel stop{};
        motor_pub_->send(stop);
        last_v_ = 0;
        last_w_ = 0;
        horus::blackbox::record("diff_drive", "Entered safe state, motors zeroed");
    }

private:
    horus::Subscriber<horus::msg::CmdVel>* cmd_sub_;
    horus::Publisher<horus::msg::Odometry>* odom_pub_;
    horus::Publisher<horus::msg::CmdVel>* motor_pub_;

    // Physical parameters
    double wheel_base_;
    double wheel_radius_;
    double max_wheel_speed_;  // rad/s limit per wheel

    // Command state (persists between ticks for hold-last-value behavior)
    double last_v_ = 0;
    double last_w_ = 0;

    // Odometry state (double precision — running sums accumulate error with f32)
    double x_ = 0, y_ = 0, theta_ = 0;
    int tick_count_ = 0;
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(50_hz).name("diff_drive_demo");

    // TurtleBot3 Burger: 16cm wheelbase, 3.3cm wheel radius, ~60 rad/s max
    DiffDrive drive(0.16, 0.033, 61.5);
    sched.add(drive)
        .order(10)
        .budget(5_ms)
        .on_miss(horus::Miss::Warn)  // warn on overrun — not safety-critical alone
        .build();

    sched.spin();
}
```

### Common Errors

| Symptom | Cause | Fix |
|---------|-------|-----|
| Robot drives in circles when commanded straight | `wheel_base_` constant wrong | Measure center-to-center distance between wheels precisely |
| Wheels spin after Ctrl+C | Missing `enter_safe_state()` | Always zero motors in safe state and shutdown |
| Odometry drifts 
even driving straight | Wheel radii unequal in hardware | Calibrate `wheel_radius_` per wheel or average measured values | | Motor driver rejects commands | Wheel speed exceeds driver limit | Reduce `max_wheel_speed_` to match your motor's spec | | Odometry theta wraps unexpectedly | No angle normalization | Keep theta in [-pi, pi] with modular wrapping | | Position precision degrades far from origin | Using float for odometry state | Use double for x/y/theta running sums, cast to float only in output | ### Design Decisions | Choice | Rationale | |--------|-----------| | `horus::Node` subclass | Odometry state (x, y, theta) must persist across ticks cleanly | | Hold-last-value for `cmd_sub_` | If no new command arrives, keep the last velocity rather than stopping — lets sparse planners work | | `double` for odometry, `float` for output | Running sums lose precision with float; downstream consumers do not need double | | `enter_safe_state()` zeros motors | Prevents runaway wheels on any scheduler safety event | | `Miss::Warn` policy | Drive node alone is not safety-critical — pair with Emergency Stop for `Miss::SafeMode` | | 50 Hz tick rate | Fast enough for ground robots, slow enough for affordable compute | | Wheel speed clamping | Prevents motor damage from upstream bugs or corrupted messages | ### Variations **Acceleration limiting** — prevent wheel slip on sudden commands: ```cpp double target_left = (last_v_ - last_w_ * wheel_base_ / 2.0) / wheel_radius_; double max_accel = 5.0; // rad/s^2 double dt = 1.0 / 50.0; v_left = prev_left_ + std::clamp(target_left - prev_left_, -max_accel * dt, max_accel * dt); prev_left_ = v_left; ``` **Encoder-based odometry** — use actual wheel encoder ticks instead of integrating commands: ```cpp // More accurate than command integration: measures what the wheels actually did double d_left = (encoder_left - prev_enc_left_) * meters_per_tick_; double d_right = (encoder_right - prev_enc_right_) * meters_per_tick_; double ds = 
(d_left + d_right) / 2.0; double d_theta = (d_right - d_left) / wheel_base_; x_ += ds * std::cos(theta_ + d_theta / 2.0); // second-order Runge-Kutta y_ += ds * std::sin(theta_ + d_theta / 2.0); theta_ += d_theta; ``` **Mecanum (holonomic) drive** — 4-wheel omnidirectional: ```cpp // Requires lateral velocity (vy) in addition to v and omega double fl = (v - omega * L / 2.0 - vy) / radius; double fr = (v + omega * L / 2.0 + vy) / radius; double rl = (v - omega * L / 2.0 + vy) / radius; double rr = (v + omega * L / 2.0 - vy) / radius; ``` ### Key Takeaways - Always clamp wheel speeds to motor limits — upstream bugs produce unbounded velocities - Use `double` for odometry running sums — `float` loses sub-millimeter precision past 100m - `enter_safe_state()` must zero motor outputs, not just stop computing - Hold-last-value on `cmd_sub_` lets sparse planners work without the robot stopping between commands - Normalize theta to [-pi, pi] to prevent floating-point drift over long runs --- ## Recipe: Servo Controller (Python) Path: /recipes/servo-controller-python Description: Multi-servo bus controller in Python with position commands, feedback reading, joint limits, and safe shutdown. ## Servo Controller (Python) Controls a bus of servos (Dynamixel, hobby PWM, etc.). Subscribes to `ServoCommand` messages, enforces joint limits, writes to hardware, publishes `JointState` feedback. Returns all servos to home position on shutdown. ### Problem You need to control multiple servos on a bus with joint limit enforcement and safe shutdown from Python. 
### When To Use - Robot arms with 3-12 servos on a shared bus - Pan-tilt camera mounts - Any multi-actuator system requiring coordinated position control ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Python 3.8+ with `horus` package ### horus.toml ```toml [package] name = "servo-controller-py" version = "0.1.0" description = "Multi-servo bus controller with safe shutdown" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Multi-servo controller with joint limits and safe shutdown.""" import math import horus from horus import Node, ServoCommand, JointState, us, ms # ── Configuration ──────────────────────────────────────────── NUM_SERVOS = 6 HOME_POSITION = 0.0 # radians — safe resting position JOINT_LIMIT = math.pi # +/- pi radians TEMP_WARNING = 70.0 # celsius SERVO_NAMES = [f"joint_{i}" for i in range(NUM_SERVOS)] # ── State ──────────────────────────────────────────────────── positions = [HOME_POSITION] * NUM_SERVOS velocities = [0.0] * NUM_SERVOS temperatures = [35.0] * NUM_SERVOS # ── Hardware stubs ─────────────────────────────────────────── def write_servo(servo_id, position): """Write position to a single servo — replace with real driver. 
For Dynamixel: use dynamixel_sdk For hobby PWM: use RPi.GPIO or pigpio """ if 0 <= servo_id < NUM_SERVOS: positions[servo_id] = position def read_feedback(): """Read feedback from all servos — replace with real driver.""" return positions[:], velocities[:], temperatures[:] # ── Node callbacks ─────────────────────────────────────────── def servo_tick(node): # IMPORTANT: always recv() every tick to drain the buffer cmd = node.recv("servo.command") if cmd is not None: # SAFETY: enforce joint limits to prevent mechanical damage clamped_pos = max(-JOINT_LIMIT, min(JOINT_LIMIT, cmd.position)) write_servo(cmd.servo_id, clamped_pos) # Read current state from all servos pos, vel, temp = read_feedback() # WARNING: check for overheating servos for i, t in enumerate(temp): if t > TEMP_WARNING: print(f"WARNING: servo {i} temperature {t:.1f}C exceeds limit") # Publish joint state feedback feedback = JointState( names=SERVO_NAMES, positions=pos, velocities=vel, efforts=[0.0] * NUM_SERVOS, ) node.send("servo.feedback", feedback) def servo_init(node): print(f"ServoController: initialized {NUM_SERVOS} servos") # Read current positions from hardware before accepting commands pos, _, _ = read_feedback() for i, p in enumerate(pos): positions[i] = p def servo_shutdown(node): # SAFETY: return ALL servos to home position before exiting for i in range(NUM_SERVOS): write_servo(i, HOME_POSITION) feedback = JointState( names=SERVO_NAMES, positions=[HOME_POSITION] * NUM_SERVOS, velocities=[0.0] * NUM_SERVOS, efforts=[0.0] * NUM_SERVOS, ) node.send("servo.feedback", feedback) print("ServoController: all servos returned to home position") # ── Main ───────────────────────────────────────────────────── servo_node = Node( name="ServoController", tick=servo_tick, init=servo_init, shutdown=servo_shutdown, rate=100, # 100 Hz servo update rate order=0, subs=["servo.command"], pubs=["servo.feedback"], budget=800 * us, # 800 us budget for bus communication on_miss="warn", ) if __name__ == "__main__": 
horus.run(servo_node) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz ServoController: initialized 6 servos [HORUS] Node "ServoController" started (100 Hz) ^C ServoController: all servos returned to home position [HORUS] Shutting down... [HORUS] Node "ServoController" shutdown complete ``` ### Key Points - **`ServoCommand`** has `servo_id`, `position`, `speed`, and `enable` fields — one command per servo per tick - **`JointState`** publishes all servo positions/velocities in a single message with named joints - **Joint limit clamping** in `tick()` prevents hardware damage regardless of upstream commands - **Temperature monitoring** catches overheating before servo damage - **`shutdown()` returns to home** — critical for robot arms that hold pose under gravity - **`init()` reads current positions** — prevents jerking to an unexpected position on startup - **800 us budget** accounts for serial bus latency (Dynamixel at 1Mbps takes ~500 us for 6 servos) ### Variations - **Batch commands**: Accept all servo positions in a single `JointCommand` message instead of per-servo `ServoCommand` - **Velocity mode**: Replace position commands with velocity targets for wheeled joints - **Trajectory interpolation**: Accept waypoints and interpolate between them over time - **Current limiting**: Track servo current draw and reduce torque if limits are exceeded ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Servos jerk on startup | No initial position read before first command | Read current positions in `init()` before accepting commands | | Overheating warnings constantly | Servo loaded beyond continuous rating | Reduce duty cycle or upgrade servos | | Positions overshoot limits | Clamp range too wide for physical joint | Tighten clamp values to match hardware stops | | Bus timeout errors | Baud rate mismatch or cable issue | Verify baud rate matches servo firmware | | Servos do not return to home on Ctrl+C | `shutdown()` not 
implemented | Always implement `shutdown()` for actuator nodes | --- ## See Also - [Real Hardware Recipe](/recipes/real-hardware) — Complete serial motor examples (no stubs) - [Servo Controller (Rust)](/recipes/servo-controller) — Rust version of this recipe - [JointState](/stdlib/messages/joint-state) — Joint position feedback type --- ## Recipe: Servo Controller Path: /recipes/servo-controller Description: Multi-servo bus controller with position commands, feedback reading, and ordered safe shutdown. ## Servo Controller Controls a bus of servos (e.g., Dynamixel, hobby PWM). Reads position commands from a topic, writes to hardware, publishes joint feedback. Implements ordered shutdown -- all servos return to home position before exit. ### Problem You need to control multiple servos on a bus (serial, CAN, or PWM) with joint limit enforcement and safe shutdown. ### When To Use - Robot arms with 3-12 servos on a shared bus - Pan-tilt camera mounts - Any multi-actuator system requiring coordinated position control ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Servo hardware or a simulated servo bus for testing ### horus.toml ```toml [package] name = "servo-controller" version = "0.1.0" description = "Multi-servo bus with safe shutdown" ``` ### Complete Code ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 100 Hz [HORUS] Node "Servo" started (Rt, 100 Hz, budget: 800μs, deadline: 9.5ms) ^C [HORUS] Shutting down... 
[HORUS] Node "Servo" shutdown complete ``` ### Key Points - **Fixed-size arrays** (`[f32; NUM_SERVOS]`) enable `#[repr(C)]` + `Copy` for zero-copy IPC - **Joint limit clamping** in `tick()` prevents hardware damage regardless of upstream commands - **Temperature monitoring** catches overheating before servo damage - **`shutdown()` returns to home** — critical for robot arms that hold pose under gravity - **800μs budget** accounts for serial bus latency (Dynamixel at 1Mbps takes ~500μs for 6 servos) - **100Hz** is typical for hobby servos; use 200-500Hz for industrial servos ### Variations - **Velocity mode**: Replace position commands with velocity targets for wheeled joints - **Trajectory interpolation**: Accept waypoints and interpolate between them over time in `tick()` - **Mixed bus**: Use different `NUM_SERVOS` constants per bus and run multiple `ServoNode` instances - **Current limiting**: Read `current_amps` from feedback and reduce torque if limits are exceeded ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Servos jerk on startup | No initial position read before first command | Read current positions in `init()` before accepting commands | | Overheating warnings constantly | Servo loaded beyond continuous rating | Reduce duty cycle or upgrade servos | | Positions overshoot limits | Clamp range too wide for physical joint | Tighten `clamp()` values to match hardware stops | | Bus timeout errors | Baud rate mismatch or cable issue | Verify baud rate in `horus.toml` matches servo firmware | | Servos do not return to home on Ctrl+C | `shutdown()` not implemented | Always implement `shutdown()` for actuator nodes | --- ## See Also - [Real Hardware Recipe](/recipes/real-hardware) — Complete serial motor examples (no stubs) - [JointState](/stdlib/messages/joint-state) — Joint position feedback - [Driver API](/rust/api/drivers) — Hardware driver loading --- ## Recipe: IMU Reader (C++) Path: /recipes/imu-reader-cpp Description: Read IMU 
data and compute orientation with complementary filter ## IMU Reader (C++) Read accelerometer and gyroscope data from an IMU, fuse them with a complementary filter, and publish a stable orientation estimate. ### Problem Raw IMU data is noisy and incomplete. The gyroscope gives accurate short-term rotation rates but drifts over time. The accelerometer gives a gravity reference (absolute roll and pitch) but is noisy and wrong during acceleration. You need to combine both into a single orientation estimate that is both stable and responsive. ### How It Works A **complementary filter** blends two imperfect signals by exploiting their complementary strengths: - **Gyroscope**: Integrate angular velocity to get angle. Accurate over short intervals but drifts because integration accumulates small errors indefinitely. - **Accelerometer**: Compute angle from gravity direction (for example, `atan2(ay, az)` for roll). No drift, but noisy and corrupted by any linear acceleration (driving, bumping, vibration). The filter runs a weighted blend every tick: ```text angle = alpha * (angle + gyro * dt) + (1 - alpha) * accel_angle ``` - `alpha = 0.98` means 98% gyro trust (smooth, responsive) and 2% accelerometer correction (eliminates long-term drift). - High `alpha` (0.99): smoother but slower to correct drift. Good for slow-moving platforms. - Low `alpha` (0.90): faster drift correction but more susceptible to acceleration noise. This is mathematically equivalent to a first-order high-pass filter on the gyro-integrated angle and a low-pass filter on the accelerometer angle, with the crossover frequency determined by `alpha` and the sample rate. 
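As a rough illustration of the blend described above, the following Python sketch runs the update rule on a stationary, tilted sensor and estimates the filter's time constant. The function names `complementary_step` and `time_constant` are hypothetical helpers for this example, not part of the HORUS API, and the relation `tau = alpha * dt / (1 - alpha)` is the commonly quoted approximation for this filter rather than anything HORUS-specific.

```python
import math


def complementary_step(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One update: integrate the gyro rate, then pull toward the accel angle."""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle


def time_constant(alpha, dt):
    """Approximate crossover time constant in seconds (hypothetical helper)."""
    return alpha * dt / (1.0 - alpha)


# Assumed scenario: sensor held still at a 0.1 rad tilt, gyro reads zero,
# and the filter starts from a wrong estimate of 0 rad.
dt = 1.0 / 200.0          # 200 Hz sample period
angle = 0.0
for _ in range(1000):     # 5 seconds of samples
    angle = complementary_step(angle, gyro_rate=0.0, accel_angle=0.1, dt=dt)

tau = time_constant(0.98, dt)
print(f"tau = {tau:.3f} s, angle after 5 s = {angle:.4f} rad")
```

With `alpha = 0.98` at 200 Hz the time constant is roughly a quarter of a second, so the estimate settles on the accelerometer angle well within the 5 simulated seconds. Raising `alpha` lengthens that time constant, which is the "slower to correct drift" trade-off listed above.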
### When To Use - Orientation estimation for ground robots, drones, or stabilization platforms - When you need a lightweight filter without the complexity of a full Kalman filter - When the robot experiences moderate linear acceleration (walking, driving) ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) ### Complete Code ```cpp #include #include #include using namespace horus::literals; class ImuReader : public horus::Node { public: ImuReader(double alpha = 0.98) : Node("imu_reader"), alpha_(alpha) { // Subscribe to raw IMU data from the hardware driver imu_sub_ = subscribe("imu.raw"); // Publish filtered orientation for downstream consumers pose_pub_ = advertise("imu.orientation"); } void tick() override { auto data = imu_sub_->recv(); if (!data) return; const auto* imu = data->get(); // ── Extract sensor readings ───────────────────────────────── double ax = imu->linear_acceleration[0]; double ay = imu->linear_acceleration[1]; double az = imu->linear_acceleration[2]; double gx = imu->angular_velocity[0]; // roll rate (rad/s) double gy = imu->angular_velocity[1]; // pitch rate (rad/s) double gz = imu->angular_velocity[2]; // yaw rate (rad/s) double dt = 1.0 / 200.0; // matches 200 Hz tick rate // ── Accelerometer angles from gravity vector ──────────────── // Only valid when linear acceleration is negligible double accel_roll = std::atan2(ay, az); double accel_pitch = std::atan2(-ax, std::sqrt(ay * ay + az * az)); // ── Complementary filter ──────────────────────────────────── // Gyro integration for short-term accuracy, // accelerometer correction for long-term drift elimination roll_ = alpha_ * (roll_ + gx * dt) + (1.0 - alpha_) * accel_roll; pitch_ = alpha_ * (pitch_ + gy * dt) + (1.0 - alpha_) * accel_pitch; // Yaw from gyro only — accelerometer cannot sense yaw // (magnetometer would be needed for absolute yaw 
reference) yaw_ += gz * dt; // Normalize yaw to [-pi, pi] if (yaw_ > M_PI) yaw_ -= 2.0 * M_PI; if (yaw_ < -M_PI) yaw_ += 2.0 * M_PI; // ── Publish orientation ───────────────────────────────────── horus::msg::Pose2D pose{}; pose.x = roll_; // repurpose x for roll (radians) pose.y = pitch_; // repurpose y for pitch (radians) pose.theta = yaw_; // theta for yaw (radians) pose_pub_->send(pose); // ── Detect anomalies ──────────────────────────────────────── // Gravity magnitude should be ~9.81 m/s^2. Large deviations // indicate high linear acceleration — filter becomes less reliable double gravity_mag = std::sqrt(ax * ax + ay * ay + az * az); if (std::abs(gravity_mag - 9.81) > 2.0) { horus::log::warn("imu_reader", "High linear acceleration detected — filter accuracy degraded"); } // Log at 1 Hz (every 200th tick at 200 Hz) if (++tick_count_ % 200 == 0) { char buf[128]; std::snprintf(buf, sizeof(buf), "roll=%.1f pitch=%.1f yaw=%.1f deg |g|=%.2f", roll_ * 180.0 / M_PI, pitch_ * 180.0 / M_PI, yaw_ * 180.0 / M_PI, gravity_mag); horus::log::info("imu_reader", buf); } } void enter_safe_state() override { // IMU reader has no actuators, but publish a zeroed orientation // so downstream controllers see a neutral state and stop moving horus::msg::Pose2D zero{}; pose_pub_->send(zero); horus::blackbox::record("imu_reader", "Entered safe state, published zero orientation"); } private: horus::Subscriber* imu_sub_; horus::Publisher* pose_pub_; // Filter parameter double alpha_; // gyro trust factor (0.90-0.99 typical) // Orientation state (radians) double roll_ = 0; double pitch_ = 0; double yaw_ = 0; int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(200_hz).name("imu_demo"); // alpha=0.98: 98% gyro, 2% accel — good default for ground robots ImuReader imu(0.98); sched.add(imu) .order(5) // run early — downstream nodes depend on orientation .budget(200_us) // IMU processing is lightweight .on_miss(horus::Miss::Skip) // skip if overrun — stale orientation 
is worse than missing one .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Orientation drifts in one direction | Gyro bias not compensated | Calibrate gyro at startup (average readings while stationary) | | Pitch/roll jitter during motion | `alpha` too low (too much accel trust) | Increase alpha to 0.98-0.99 | | Pitch/roll sluggish to correct | `alpha` too high (too little accel correction) | Decrease alpha toward 0.95 | | Yaw drifts continuously | No absolute yaw reference (expected) | Add magnetometer or use SLAM for yaw correction | | Angles jump when robot accelerates | Accelerometer reads linear accel, not just gravity | Increase alpha or gate accel when `\|g\| != 9.81` | | NaN in orientation output | NaN or non-finite samples from the IMU driver; once one enters the filter state it persists (`std::atan2(0, 0)` returns 0 on IEEE 754 platforms, so it is not the source) | Validate IMU data with `std::isfinite()` before filtering and skip bad samples | ### Design Decisions | Choice | Rationale | |--------|-----------| | Complementary filter over Kalman | Simpler, cheaper, no matrix math — sufficient for most ground robots | | `alpha = 0.98` default | Standard starting point: responsive with low drift | | 200 Hz tick rate | Matches typical IMU output rate (MPU-6050, ICM-20948) | | `Miss::Skip` policy | Stale orientation from accumulated lag is worse than dropping one reading | | Yaw from gyro only | Accelerometer cannot sense yaw — magnetometer needed for correction | | Gravity magnitude check | Detects high linear acceleration where filter accuracy degrades | | `Pose2D` for output | Lightweight, sufficient for 3-axis orientation (roll/pitch/yaw) | ### Variations **Gyro bias calibration** — average readings at startup for drift reduction: ```cpp // During first 500 ticks, accumulate gyro bias if (calibrating_ && cal_count_ < 500) { gx_bias_ += gx; gy_bias_ += gy; gz_bias_ += gz; cal_count_++; return; } if (calibrating_) { gx_bias_ /= 500; gy_bias_ /= 500; gz_bias_ /= 500; calibrating_ = false; horus::log::info("imu_reader", "Gyro calibration complete"); } gx -= gx_bias_; gy -= 
gy_bias_; gz -= gz_bias_; ``` **Adaptive alpha** — trust accelerometer less during high acceleration: ```cpp double gravity_mag = std::sqrt(ax*ax + ay*ay + az*az); double deviation = std::abs(gravity_mag - 9.81); // Scale alpha from 0.98 (stationary) to 0.999 (high accel) double adaptive_alpha = std::clamp(0.98 + deviation * 0.005, 0.98, 0.999); roll_ = adaptive_alpha * (roll_ + gx * dt) + (1.0 - adaptive_alpha) * accel_roll; ``` **Magnetometer-corrected yaw** — add absolute yaw reference: ```cpp double mag_yaw = std::atan2(mag_y, mag_x); // from magnetometer yaw_ = alpha_ * (yaw_ + gz * dt) + (1.0 - alpha_) * mag_yaw; ``` ### Key Takeaways - The complementary filter is the simplest reliable IMU fusion — start here before reaching for a Kalman filter - `alpha` controls the drift vs. noise trade-off: higher values give a smoother estimate but correct gyro drift more slowly - Yaw always drifts without an external reference (magnetometer or SLAM) - Run the IMU reader early in the execution order — downstream nodes (controllers, SLAM) depend on fresh orientation - `Miss::Skip` is correct for sensor processing — accumulated lag causes stale readings --- ## Recipe: Multi-Sensor Fusion (Python) Path: /recipes/multi-sensor-fusion-python Description: Complementary filter in Python that fuses IMU heading with wheel odometry position into a corrected pose estimate using NumPy. ## Multi-Sensor Fusion (Python) Fuses IMU orientation with wheel odometry using a complementary filter. Subscribes to `Imu` and `Odometry`, caches the latest from each, blends heading when both are available, and publishes a corrected `Odometry` with fused heading. Uses NumPy for efficient angle wrapping. ### Problem You need a single pose estimate from multiple sensors that run at different rates and have different noise characteristics. 
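The angle wrapping mentioned in the overview matters most when two heading sources sit on opposite sides of the ±pi seam. The sketch below is a standalone illustration, assuming plain `math` instead of NumPy to stay dependency-free; `blend_heading` is a hypothetical helper, not a HORUS function. It compares a naive weighted average with a blend taken along the shortest arc.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def blend_heading(a, b, weight_b):
    """Move from heading a toward heading b along the shortest arc.

    weight_b = 0 returns a, weight_b = 1 returns b (hypothetical helper).
    """
    diff = wrap_angle(b - a)           # signed shortest difference
    return wrap_angle(a + weight_b * diff)


# Two headings only 2 degrees apart, but on opposite sides of the seam.
odom_theta = math.radians(179.0)
imu_theta = math.radians(-179.0)

# Naive weighted average: wrapping afterwards cannot repair the result.
naive = wrap_angle(0.3 * odom_theta + 0.7 * imu_theta)
# Shortest-arc blend: wrap the difference first, then apply the weight.
safe = blend_heading(odom_theta, imu_theta, 0.7)

print(f"naive = {math.degrees(naive):.1f} deg")
print(f"safe  = {math.degrees(safe):.1f} deg")
```

The naive average lands far from both inputs, while the shortest-arc blend stays between them. Wrapping the difference before weighting is one common way to avoid this. It only becomes necessary when the two headings come from independent sources.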
### When To Use - Combining IMU heading with wheel odometry for ground robots - Any system where no single sensor gives a complete state estimate - When you need a confidence metric for downstream planners ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Python 3.8+ with `numpy` installed ### horus.toml ```toml [package] name = "sensor-fusion-py" version = "0.1.0" description = "IMU + odometry complementary filter" language = "python" [dependencies] numpy = { version = ">=1.24", source = "pypi" } ``` ### Complete Code ```python #!/usr/bin/env python3 """Complementary filter: fuse IMU heading with wheel odometry position.""" import math import numpy as np import horus from horus import Node, Imu, Odometry, us, ms # ── Configuration ──────────────────────────────────────────── ALPHA = 0.7 # Favor IMU for heading (less drift than wheels on turns) HIGH_YAW_RATE = 0.5 # rad/s — threshold for reducing confidence during fast turns # ── State ──────────────────────────────────────────────────── last_imu = [None] last_odom = [None] tick_count = [0] # ── Helpers ────────────────────────────────────────────────── def wrap_angle(angle): """Wrap angle to [-pi, pi] using NumPy.""" return float(np.arctan2(np.sin(angle), np.cos(angle))) def imu_yaw(imu_msg): """Extract yaw rate from IMU gyro_z (simplified — no magnetometer).""" return imu_msg.gyro_z # ── Node callbacks ─────────────────────────────────────────── def fusion_tick(node): # IMPORTANT: always recv() ALL topics every tick to drain buffers imu = node.recv("imu.raw") odom = node.recv("odom.wheels") if imu is not None: last_imu[0] = imu if odom is not None: last_odom[0] = odom # Fuse only when both sources are available if last_imu[0] is None or last_odom[0] is None: return tick_count[0] += 1 imu_data = last_imu[0] odom_data = last_odom[0] # Integrate IMU yaw rate to get heading estimate dt = 1.0 / 50.0 # 50 Hz fusion rate imu_heading = odom_data.theta + imu_data.gyro_z * dt # 
Complementary filter: blend odom heading with IMU-corrected heading fused_theta = wrap_angle( (1.0 - ALPHA) * odom_data.theta + ALPHA * imu_heading ) # Confidence drops during fast turns (IMU gyro saturates) yaw_rate = abs(imu_data.gyro_z) confidence = 0.6 if yaw_rate > HIGH_YAW_RATE else 0.9 # Publish fused pose as corrected Odometry fused = Odometry( x=odom_data.x, y=odom_data.y, theta=fused_theta, linear_velocity=odom_data.linear_velocity, angular_velocity=odom_data.angular_velocity, ) node.send("pose.fused", fused) # Publish confidence as a separate lightweight message node.send("pose.confidence", {"confidence": confidence, "tick": tick_count[0]}) if tick_count[0] % 100 == 0: print( f"Fusion tick {tick_count[0]}: " f"odom=({odom_data.x:.2f}, {odom_data.y:.2f}, {odom_data.theta:.2f}) " f"fused_theta={fused_theta:.2f} conf={confidence:.1f}" ) def fusion_shutdown(node): print(f"Fusion: completed {tick_count[0]} ticks") # ── Main ───────────────────────────────────────────────────── fusion_node = Node( name="Fusion", tick=fusion_tick, shutdown=fusion_shutdown, rate=50, # 50 Hz output — from 100 Hz IMU + 20 Hz odom order=10, # After sensor nodes subs=["imu.raw", "odom.wheels"], pubs=["pose.fused", "pose.confidence"], ) if __name__ == "__main__": horus.run(fusion_node) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz [HORUS] Node "Fusion" started (50 Hz) Fusion tick 100: odom=(0.50, 0.00, 0.10) fused_theta=0.10 conf=0.9 Fusion tick 200: odom=(1.00, 0.05, 0.20) fused_theta=0.20 conf=0.9 ^C Fusion: completed 250 ticks [HORUS] Shutting down... 
[HORUS] Node "Fusion" shutdown complete ``` ### Key Points - **Multi-topic aggregation pattern**: `recv()` all topics, cache latest, fuse when both are available - **Complementary filter** is the simplest sensor fusion — for production, consider an Extended Kalman Filter (EKF) - **`ALPHA = 0.7`** favors IMU for heading — wheel odometry drifts on carpet/tile; tune per surface - **`wrap_angle()`** uses NumPy to keep heading in the `[-pi, pi]` range, avoiding discontinuities - **No `shutdown()` for actuators** — fusion nodes don't move anything, just publish estimates - **50 Hz output** from 100 Hz IMU + 20 Hz odometry is fine — fusion runs on cached values - **Confidence field** lets downstream nodes decide how much to trust the estimate ### Variations - **Extended Kalman Filter (EKF)**: Replace the complementary filter with `filterpy` or a custom EKF - **Three-sensor fusion**: Add GPS or LiDAR-based localization as a third input - **Adaptive alpha**: Adjust `ALPHA` based on vehicle dynamics — favor IMU during fast turns, odometry on straights - **NumPy matrix filter**: Use a full state vector `[x, y, theta, v, omega]` with matrix operations ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Fused heading drifts over time | Complementary filter cannot correct absolute drift | Add an absolute reference (GPS, magnetometer) or switch to EKF | | No output published | One or both sensors not publishing | Check `horus monitor` for active topics on `imu.raw` and `odom.wheels` | | Heading jumps at startup | First `recv()` returns stale cached data | Initialize caches to `None` and only fuse when both are fresh | | Confidence always low | `HIGH_YAW_RATE` threshold too tight | Tune the threshold to match your robot's normal turn rate | | Heading wraps incorrectly | Not using angle wrapping | Use `wrap_angle()` around any heading blend | --- ## See Also - [Multi-Sensor Fusion (Rust)](/recipes/multi-sensor-fusion) — Rust version of this recipe - [IMU Reader 
(Python)](/recipes/imu-reader-python) — IMU publishing pattern --- ## Recipe: Multi-Sensor Fusion Path: /recipes/multi-sensor-fusion Description: Combine IMU and wheel odometry into a fused state estimate using complementary filtering. ## Multi-Sensor Fusion Fuses IMU orientation with wheel odometry position using a complementary filter. Publishes a unified pose estimate. Demonstrates the multi-topic aggregation pattern -- cache latest from each sensor, fuse when both available. ### Problem You need a single pose estimate from multiple sensors that run at different rates and have different noise characteristics. ### When To Use - Combining IMU heading with wheel odometry for ground robots - Any system where no single sensor gives a complete state estimate - When you need a confidence metric for downstream planners ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Upstream nodes publishing `WheelOdom` and `ImuHeading` (or simulated data for testing) ### horus.toml ```toml [package] name = "sensor-fusion" version = "0.1.0" description = "IMU + odometry complementary filter" ``` ### Complete Code ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 50 Hz [HORUS] Node "Fusion" started (Rt, 50 Hz, budget: 16.0ms, deadline: 19.0ms) ^C [HORUS] Shutting down... 
[HORUS] Node "Fusion" shutdown complete ``` ### Key Points - **Multi-topic aggregation pattern**: `recv()` all topics, cache with `Option`, fuse when both are `Some` - **Complementary filter** is the simplest sensor fusion — for production, consider an Extended Kalman Filter (EKF) - **`alpha = 0.7`** favors IMU for heading — wheel odometry drifts on carpet/tile; tune per surface - **No `shutdown()` needed** — fusion nodes don't actuate anything - **50Hz output** from 100Hz IMU + 20Hz odometry is fine — fusion runs on cached values - **Confidence field** lets downstream nodes decide how much to trust the estimate ### Variations - **Extended Kalman Filter (EKF)**: Replace the complementary filter with an EKF for nonlinear systems - **Three-sensor fusion**: Add GPS or LiDAR-based localization as a third input with its own alpha weight - **Adaptive alpha**: Adjust `alpha` based on vehicle dynamics -- favor IMU during fast turns, odometry on straights - **Timestamp-based interpolation**: Use message timestamps to interpolate between sensor readings at different rates ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Fused heading drifts over time | Complementary filter cannot correct absolute drift | Add an absolute reference (GPS, magnetometer) or switch to EKF | | No output published | One or both sensors not publishing | Check `horus monitor` for active topics on `odom.wheels` and `imu.heading` | | Heading jumps at startup | First `recv()` returns stale cached data | Initialize `last_odom` and `last_imu` to `None` and only fuse when both are fresh | | Confidence always low | `yaw_rate` threshold too tight | Tune the `0.5` threshold to match your robot's normal turn rate | | Position lags behind reality | `alpha` too high (over-trusts IMU position) | Reduce `alpha` or only apply it to heading, not position | --- ## See Also - [IMU Reader Recipe](/recipes/imu-reader) — Single-sensor pattern - [Odometry](/stdlib/messages/odometry) — Odometry 
message type --- ## Recipe: Emergency Stop (Python) Path: /recipes/emergency-stop-python Description: Safety monitor in Python that subscribes to sensor topics, detects unsafe conditions, and publishes EmergencyStop to halt all motion. ## Emergency Stop (Python) Safety monitor node that subscribes to sensor topics, evaluates safety conditions every tick, and publishes `EmergencyStop` when hazards are detected. Sends zero `CmdVel` to override any active motion. Implements debounce on clear signals and fail-safe behavior when sensor data stops arriving. ### Problem You need a Python safety node that can halt the robot when sensors detect unsafe conditions (obstacles too close, battery low, communication lost). ### When To Use - Any robot with actuators (motors, servos, grippers) - When safety regulations require guaranteed shutdown - As a software safety layer alongside a hardware E-stop circuit ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Upstream nodes publishing sensor data (`LaserScan`, `BatteryState`) ### horus.toml ```toml [package] name = "emergency-stop-py" version = "0.1.0" description = "E-stop monitor with multi-condition safety checks" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Emergency stop monitor — multi-condition safety with debounce and fail-safe.""" import horus from horus import Node, CmdVel, EmergencyStop, LaserScan, BatteryState, us, ms # ── Safety thresholds ──────────────────────────────────────── MIN_OBSTACLE_DISTANCE = 0.2 # meters — hard stop if anything closer LOW_BATTERY_VOLTAGE = 10.5 # volts — shutdown to prevent brownout CLEAR_THRESHOLD = 50 # ticks of clear signals before releasing STALE_THRESHOLD = 20 # ticks without sensor data = fault # ── State ──────────────────────────────────────────────────── estop_active = [False] consecutive_clears = [0] ticks_since_scan = [0] ticks_since_battery = [0] total_ticks = [0] trigger_reason = [""] # ── Safety checks 
──────────────────────────────────────────── def check_obstacle(scan): """Check if any LaserScan range is below the safety minimum.""" if scan is None: return False, "" for i, r in enumerate(scan.ranges): if 0.01 < r < MIN_OBSTACLE_DISTANCE: return True, f"obstacle at {r:.2f}m (ray {i})" return False, "" def check_battery(battery): """Check if battery voltage is critically low.""" if battery is None: return False, "" if battery.voltage > 0 and battery.voltage < LOW_BATTERY_VOLTAGE: return True, f"battery low: {battery.voltage:.1f}V" return False, "" def check_stale_sensors(): """Check if sensor data has stopped arriving (publisher crash).""" if ticks_since_scan[0] > STALE_THRESHOLD: return True, f"lidar stale ({ticks_since_scan[0]} ticks)" if ticks_since_battery[0] > STALE_THRESHOLD: return True, f"battery stale ({ticks_since_battery[0]} ticks)" return False, "" # ── Node callbacks ─────────────────────────────────────────── def estop_tick(node): total_ticks[0] += 1 ticks_since_scan[0] += 1 ticks_since_battery[0] += 1 # IMPORTANT: always recv() ALL topics every tick to drain buffers scan = node.recv("lidar.scan") battery = node.recv("battery.state") if scan is not None: ticks_since_scan[0] = 0 if battery is not None: ticks_since_battery[0] = 0 # Evaluate all safety conditions triggered = False reason = "" obstacle_bad, obstacle_reason = check_obstacle(scan) if obstacle_bad: triggered = True reason = obstacle_reason battery_bad, battery_reason = check_battery(battery) if battery_bad: triggered = True reason = battery_reason stale_bad, stale_reason = check_stale_sensors() if stale_bad: triggered = True reason = stale_reason if triggered: # SAFETY: immediately activate E-stop estop_active[0] = True consecutive_clears[0] = 0 trigger_reason[0] = reason print(f"E-STOP TRIGGERED: {reason}") else: consecutive_clears[0] += 1 # Require N consecutive clear signals before releasing if estop_active[0] and consecutive_clears[0] >= CLEAR_THRESHOLD: estop_active[0] = False 
print("E-STOP RELEASED after clear period") if estop_active[0]: # SAFETY: override cmd_vel with zero — stops all motion node.send("cmd_vel", CmdVel(linear=0.0, angular=0.0)) # Publish E-stop status for monitoring node.send("safety.estop", EmergencyStop( engaged=estop_active[0], reason=trigger_reason[0] if estop_active[0] else "", )) def estop_shutdown(node): # SAFETY: zero velocity on shutdown node.send("cmd_vel", CmdVel(linear=0.0, angular=0.0)) node.send("safety.estop", EmergencyStop(engaged=True, reason="shutdown")) print(f"EStop: shutdown after {total_ticks[0]} ticks") # ── Main ───────────────────────────────────────────────────── estop_node = Node( name="EStop", tick=estop_tick, shutdown=estop_shutdown, rate=100, # 100 Hz safety monitoring order=100, # IMPORTANT: runs LAST — overrides cmd_vel from other nodes subs=["lidar.scan", "battery.state"], pubs=["cmd_vel", "safety.estop"], budget=200 * us, # tight budget for safety node deadline=500 * us, # tight deadline on_miss="safe_mode", # force safe state on deadline miss ) if __name__ == "__main__": horus.run( estop_node, watchdog_ms=500, ) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz [HORUS] Node "EStop" started (100 Hz) E-STOP TRIGGERED: obstacle at 0.15m (ray 42) E-STOP RELEASED after clear period ^C EStop: shutdown after 1500 ticks [HORUS] Shutting down... 
[HORUS] Node "EStop" shutdown complete ``` ### Key Points - **High `.order(100)`** ensures E-stop runs AFTER drive/planning nodes — it overrides their `cmd_vel` output - **Multi-condition checks**: obstacle proximity, battery voltage, and sensor staleness - **Debounce with `CLEAR_THRESHOLD`** prevents flickering E-stop from bouncing on/off - **No signal = fault**: if sensors stop publishing, the node treats it as triggered (fail-safe design) - **`on_miss="safe_mode"`** is the strictest miss policy — any deadline overrun triggers safe state - **200 us budget** keeps safety checks deterministic and fast - **`EmergencyStop`** message has `engaged` and `reason` fields for downstream monitoring ### Variations - **Wireless E-stop**: Subscribe to a network heartbeat topic; missing heartbeats trigger E-stop - **Multi-zone E-stop**: Separate E-stop nodes per actuator group (arm vs. wheels) - **Graduated response**: Reduce speed before full stop using distance-proportional scaling - **Hardware GPIO**: Read a physical E-stop button via `RPi.GPIO` and publish to `safety.estop` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | E-stop doesn't override motors | E-stop `order` lower than motor node | Set E-stop `.order()` HIGHER than motor node | | Motors resume after E-stop | Only sending zero once | Send `CmdVel(0, 0)` every tick while active | | E-stop flickers on/off | No debounce on clear signal | Use `CLEAR_THRESHOLD` consecutive clears | | E-stop never releases | `CLEAR_THRESHOLD` too high or no clear signals | Reduce threshold or verify sensors are publishing | | False triggers on startup | Sensors haven't published yet and stale threshold is too low | Increase `STALE_THRESHOLD` or skip checks for first N ticks | --- ## See Also - [Emergency Stop (Rust)](/recipes/emergency-stop) — Rust version with full system example - [LiDAR Obstacle Avoidance (Python)](/recipes/lidar-avoidance-python) — Reactive avoidance --- ## Emergency Stop Path: 
/recipes/emergency-stop Description: Safety-critical E-stop monitor that forces all actuators to safe state when triggered # Emergency Stop Every robot that moves can hurt someone. Functional-safety standards such as IEC 61508 (functional safety) and IEC 62061 (safety of machinery) expect safety-related control systems to bring hazardous motion to a stop reliably. Whether you are building a warehouse AGV, a surgical arm, or a hobby rover, an emergency stop is not optional: it is the single most important safety subsystem in your robot. HORUS provides the building blocks for a **cooperative software E-stop**: the scheduler's `enter_safe_state()` callback, `Miss::SafeMode` deadline enforcement, and shared-memory topics for cross-node signal propagation. This recipe shows you how to wire them together into a production-ready pattern with debounce, fail-safe defaults, and testable shutdown behavior. ## When To Use This - Any robot with actuators (motors, servos, grippers) - When safety regulations require guaranteed shutdown - When you need sub-millisecond response to safety events **Use [Fault Tolerance](/advanced/circuit-breaker) instead** if you need graceful degradation (reduced speed, limited range of motion) rather than full shutdown. 
## Prerequisites - Familiarity with [Nodes](/concepts/core-concepts-nodes) and [CmdVel](/stdlib/messages/cmd-vel) - Understanding of [Miss policies](/rust/api/scheduler#miss-enum) and `enter_safe_state()` ## Solution ### horus.toml ```toml [package] name = "emergency-stop" version = "0.1.0" description = "E-stop monitor with safety state handling" ``` ### src/main.rs (or src/main.py) ## Understanding the Code - **`enter_safe_state()`** is called by the scheduler when this node misses its deadline — the robot stops automatically without any application logic - **`Miss::SafeMode`** is the strictest miss policy — any deadline overrun triggers safe state - **High `.order(100)`** ensures E-stop runs AFTER drive/planning nodes — it overrides their `cmd_vel` output - **Debounce with `CLEAR_THRESHOLD`** prevents flickering E-stop signals from bouncing - **No signal = fault** — if the E-stop topic stops publishing, the node treats it as triggered (fail-safe design) - **200 us budget** is generous for this simple node — keeps safety checks deterministic ## Full System Example A real robot does not run the E-stop node alone. Below is a complete scheduler with three nodes — `DriveNode` (motor control), `PlannerNode` (path planning), and `EStopNode` (safety override) — showing how `enter_safe_state()` and `is_safe_state()` cooperate across the system. 
```rust // simplified use horus::prelude::*; // --- Messages (same EStopSignal and SafetyStatus as above) --- #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct EStopSignal { triggered: u8, source: u8, } #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct SafetyStatus { estop_active: u8, consecutive_clears: u32, uptime_ticks: u64, } #[derive(Debug, Clone, Copy, Default, Serialize, Deserialize, LogSummary)] #[repr(C)] struct WheelCmd { left_rpm: f32, right_rpm: f32, } // --- DriveNode: converts cmd_vel to wheel commands --- struct DriveNode { cmd_sub: Topic<CmdVel>, wheel_pub: Topic<WheelCmd>, } impl DriveNode { fn new() -> Result<Self> { Ok(Self { cmd_sub: Topic::new("cmd_vel")?, wheel_pub: Topic::new("wheel.cmd")?, }) } } impl Node for DriveNode { fn name(&self) -> &str { "Drive" } fn tick(&mut self) { if let Some(cmd) = self.cmd_sub.recv() { let wheel_base = 0.3_f32; let radius = 0.05_f32; let to_rpm = 60.0 / (2.0 * std::f32::consts::PI); let left = ((cmd.linear - cmd.angular * wheel_base / 2.0) / radius) * to_rpm; let right = ((cmd.linear + cmd.angular * wheel_base / 2.0) / radius) * to_rpm; self.wheel_pub.send(WheelCmd { left_rpm: left.clamp(-200.0, 200.0), right_rpm: right.clamp(-200.0, 200.0), }); } } fn enter_safe_state(&mut self) { // Zero both motors immediately — this is the critical safety action self.wheel_pub.send(WheelCmd { left_rpm: 0.0, right_rpm: 0.0 }); } fn shutdown(&mut self) -> Result<()> { self.wheel_pub.send(WheelCmd { left_rpm: 0.0, right_rpm: 0.0 }); Ok(()) } } // --- PlannerNode: publishes cmd_vel, checks path safety --- struct PlannerNode { cmd_pub: Topic<CmdVel>, path_clear: bool, } impl PlannerNode { fn new() -> Result<Self> { Ok(Self { cmd_pub: Topic::new("cmd_vel")?, path_clear: true, }) } } impl Node for PlannerNode { fn name(&self) -> &str { "Planner" } fn tick(&mut self) { // In production, check lidar/camera for obstacles if self.path_clear { self.cmd_pub.send(CmdVel::new(0.5, 0.0)); } 
else { self.cmd_pub.send(CmdVel::zero()); } } fn is_safe_state(&self) -> bool { // Scheduler queries this — if false, the safety monitor can escalate !self.path_clear } fn enter_safe_state(&mut self) { self.path_clear = false; self.cmd_pub.send(CmdVel::zero()); } } // --- EStopNode: same as the Solution section above --- // (omitted for brevity — use the full EStopNode from above) fn main() -> Result<()> { let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()) .max_deadline_misses(1); // Planner runs first — publishes cmd_vel scheduler.add(PlannerNode::new()?) .order(0) .rate(20_u64.hz()) .on_miss(Miss::Skip) .build()?; // Drive runs second — converts cmd_vel to wheel commands scheduler.add(DriveNode::new()?) .order(10) .rate(50_u64.hz()) .budget(500_u64.us()) .on_miss(Miss::SafeMode) .build()?; // E-stop runs LAST — overrides cmd_vel if triggered scheduler.add(EStopNode::new()?) .order(100) .rate(100_u64.hz()) .budget(200_u64.us()) .deadline(500_u64.us()) .on_miss(Miss::SafeMode) .build()?; scheduler.run() } ``` The execution order matters: Planner (0) produces velocity, Drive (10) converts it to motor commands, and EStop (100) overrides `cmd_vel` with zero if triggered. Because EStop writes to the same `cmd_vel` topic _after_ Planner, the zero command propagates to Drive on the next tick. When the scheduler calls `enter_safe_state()` on DriveNode (due to a deadline miss or watchdog expiry), DriveNode zeros its wheel output independently of the E-stop signal. This gives you two layers of protection: the E-stop node zeroes the _command_, and the drive node zeroes the _output_. ## Hardware GPIO Integration Physical E-stop buttons connect to GPIO pins. The pattern below reads a hardware button and publishes to the `safety.estop` topic so the `EStopNode` (running in the same or a different process) can react. 
```rust // simplified use horus::prelude::*; use gpio_cdev::{Chip, LineRequestFlags}; struct GpioEstopPublisher { estop_pub: Topic<EStopSignal>, gpio_line: gpio_cdev::Line, pin: u32, } impl GpioEstopPublisher { fn new(chip_path: &str, pin: u32) -> Result<Self> { let mut chip = Chip::new(chip_path) .map_err(|e| horus::Error::msg(format!("GPIO chip: {e}")))?; let line = chip.get_line(pin) .map_err(|e| horus::Error::msg(format!("GPIO line {pin}: {e}")))?; Ok(Self { estop_pub: Topic::new("safety.estop")?, gpio_line: line, pin, }) } } impl Node for GpioEstopPublisher { fn name(&self) -> &str { "GpioEstop" } fn tick(&mut self) { // Request the line as input each tick (some drivers require re-request) // Any GPIO error (request or read) is reported as "triggered" instead of panicking inside tick() let value = self.gpio_line .request(LineRequestFlags::INPUT, 0, "horus-estop") .and_then(|handle| handle.get_value()) .unwrap_or(1); // fail-safe: default to triggered // E-stop buttons are normally-closed (NC): pin LOW = safe, pin HIGH = triggered // Wiring a NC button means a broken wire also triggers the E-stop (fail-safe) self.estop_pub.send(EStopSignal { triggered: value, source: 0, // hardware }); } } ``` **Why normally-closed wiring?** A normally-closed (NC) button keeps the circuit completed when not pressed. If the wire breaks or the connector fails, the circuit opens — which reads the same as "button pressed." This is the standard industrial fail-safe pattern: any hardware fault triggers the E-stop rather than silently disabling it. ## Multi-Process E-stop HORUS topics use shared memory. When two processes open `Topic::new("safety.estop")`, they share the same SHM segment. This means a hardware GPIO publisher in one process and an `EStopNode` in another process see the same signal with zero serialization overhead. The pattern for multi-process E-stop: ```rust // simplified // Process 1: Hardware monitor (publishes E-stop from GPIO) fn main() -> Result<()> { let mut scheduler = Scheduler::new(); scheduler.add(GpioEstopPublisher::new("/dev/gpiochip0", 17)?) 
.order(0) .rate(200_u64.hz()) // sample GPIO at 200 Hz for fast response .budget(100_u64.us()) .build()?; scheduler.run() } // Process 2: Main robot controller (subscribes to E-stop) fn main() -> Result<()> { let mut scheduler = Scheduler::new() .watchdog(500_u64.ms()) .max_deadline_misses(1); scheduler.add(PlannerNode::new()?) .order(0) .rate(20_u64.hz()) .build()?; scheduler.add(DriveNode::new()?) .order(10) .rate(50_u64.hz()) .on_miss(Miss::SafeMode) .build()?; // E-stop reads from SHM — sees GPIO publisher's writes scheduler.add(EStopNode::new()?) .order(100) .rate(100_u64.hz()) .budget(200_u64.us()) .deadline(500_u64.us()) .on_miss(Miss::SafeMode) .build()?; scheduler.run() } ``` Every process that controls actuators should subscribe to `safety.estop`. The publisher only needs to exist once (in the hardware monitor process), but any number of subscribers can react to it. Because SHM is lock-free, the E-stop signal propagates in under 1 microsecond regardless of how many subscribers exist. ## Testing E-stop Use `tick_once()` to verify E-stop behavior deterministically without running the full scheduler loop. Each call to `tick_once()` executes exactly one tick of every node in order. 
```rust // simplified #[cfg(test)] mod tests { use super::*; #[test] fn motor_runs_when_estop_clear() { let mut scheduler = Scheduler::new(); scheduler.add(DriveNode::new().unwrap()) .order(10) .build().unwrap(); scheduler.add(EStopNode::new().unwrap()) .order(100) .build().unwrap(); // Publish a clear E-stop signal let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap(); estop_pub.send(EStopSignal { triggered: 0, source: 0 }); // Publish a velocity command let cmd_pub: Topic<CmdVel> = Topic::new("cmd_vel").unwrap(); cmd_pub.send(CmdVel::new(1.0, 0.0)); // Tick once — Drive should produce non-zero wheel output scheduler.tick_once(); let wheel_sub: Topic<WheelCmd> = Topic::new("wheel.cmd").unwrap(); let wheel = wheel_sub.recv().unwrap(); assert!(wheel.left_rpm.abs() > 0.0, "motor should be running"); assert!(wheel.right_rpm.abs() > 0.0, "motor should be running"); } #[test] fn motor_stops_when_estop_triggered() { let mut scheduler = Scheduler::new(); scheduler.add(DriveNode::new().unwrap()) .order(10) .build().unwrap(); scheduler.add(EStopNode::new().unwrap()) .order(100) .build().unwrap(); // Publish a triggered E-stop let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap(); estop_pub.send(EStopSignal { triggered: 1, source: 1 }); // Publish a velocity command — E-stop should override it let cmd_pub: Topic<CmdVel> = Topic::new("cmd_vel").unwrap(); cmd_pub.send(CmdVel::new(1.0, 0.0)); // Tick once — EStopNode overwrites cmd_vel with zero scheduler.tick_once(); // Tick again — DriveNode reads the zeroed cmd_vel scheduler.tick_once(); let wheel_sub: Topic<WheelCmd> = Topic::new("wheel.cmd").unwrap(); let wheel = wheel_sub.recv().unwrap(); assert_eq!(wheel.left_rpm, 0.0, "motor must be stopped"); assert_eq!(wheel.right_rpm, 0.0, "motor must be stopped"); } #[test] fn estop_requires_debounce_to_clear() { let mut scheduler = Scheduler::new(); scheduler.add(EStopNode::new().unwrap()) .order(100) .build().unwrap(); let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap(); let status_sub: Topic<SafetyStatus> = 
Topic::new("safety.status").unwrap(); // Trigger E-stop estop_pub.send(EStopSignal { triggered: 1, source: 0 }); scheduler.tick_once(); let status = status_sub.recv().unwrap(); assert_eq!(status.estop_active, 1, "should be active after trigger"); // Send a single clear — should NOT release (need 50 consecutive) estop_pub.send(EStopSignal { triggered: 0, source: 0 }); scheduler.tick_once(); let status = status_sub.recv().unwrap(); assert_eq!(status.estop_active, 1, "should still be active — debounce not met"); } #[test] fn no_signal_keeps_estop_active() { let mut scheduler = Scheduler::new(); scheduler.add(EStopNode::new().unwrap()) .order(100) .build().unwrap(); let estop_pub: Topic<EStopSignal> = Topic::new("safety.estop").unwrap(); let status_sub: Topic<SafetyStatus> = Topic::new("safety.status").unwrap(); // Trigger E-stop, then stop publishing entirely estop_pub.send(EStopSignal { triggered: 1, source: 0 }); scheduler.tick_once(); // Tick without publishing — simulates publisher crash scheduler.tick_once(); let status = status_sub.recv().unwrap(); assert_eq!(status.estop_active, 1, "no signal = fault = stay active"); } } ``` These tests run in milliseconds and cover the four critical scenarios: normal operation, triggered stop, debounce behavior, and publisher failure. ## Safety Standards Note For systems that must meet **SIL (Safety Integrity Level)** ratings under IEC 61508 or performance levels under ISO 13849: - **Hardware E-stop circuit**: A physical relay that cuts power to actuators independently of software. This is the primary safety system. The relay must be rated for the motor's stall current and must be fail-safe (normally-closed contacts). - **Software E-stop (this recipe)**: A secondary layer that reacts within one scheduler tick (under 10 ms at 100 Hz, compared with a relay switching time of 5-20 ms) and adds richer behavior (debounce, status reporting, coordinated shutdown). It monitors the same physical button and coordinates the software stack. 
- **Watchdog timer**: A hardware watchdog (e.g., on the microcontroller or SBC) that resets the system if software stops sending heartbeats. HORUS's `.watchdog()` is a software watchdog — pair it with a hardware one for defense in depth. The layered approach: hardware relay catches catastrophic failures (software crash, kernel panic, power brownout), software E-stop handles the 99% case with better UX (status reporting, coordinated shutdown, logging). ## Design Decisions **Why fail-safe (no signal = fault)?** If the E-stop publisher crashes, the subscriber stops receiving messages. A "fail-dangerous" design would interpret silence as "all clear" — the robot keeps moving with no safety monitoring. The fail-safe design treats silence as a fault and activates the E-stop. This is the same principle as a dead man's switch: you must actively assert safety, not passively assume it. **Why debounce on clear (not on trigger)?** Triggering the E-stop must be instant — any delay could mean the robot travels further into a hazard. But releasing the E-stop can afford 0.5 seconds of delay. Debounce on clear prevents a flickering signal (e.g., a loose wire on the GPIO pin) from rapidly cycling the robot between motion and stop, which can damage motors and gearboxes. **Why same-scheduler ordering instead of priority?** HORUS uses cooperative scheduling with deterministic ordering, not preemptive priorities. The `.order(100)` guarantee means the E-stop node always runs after drive nodes within the same tick. This is simpler to reason about than preemptive priorities, which are prone to priority inversion, and the worst-case latency is bounded by the tick period (10 ms at 100 Hz) rather than being unbounded. **Why `u8` instead of `bool` for `triggered`?** The `#[repr(C)]` attribute ensures the struct has a predictable memory layout for zero-copy SHM transport. A Rust `bool` is also one byte, but only the bit patterns 0 and 1 are valid; reading any other value out of shared memory (for example, one written by a non-Rust or misbehaving process) as a `bool` is undefined behavior. A `u8` accepts every byte value and makes the wire format explicit: 0 = clear, 1 = triggered. 
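To illustrate what an explicit two-byte wire format buys you, here is a minimal C++ sketch of how a reader in another language could mirror `EStopSignal` and decode it from raw bytes. `EStopSignalRaw` and `is_triggered` are hypothetical names invented for this example, not part of the HORUS C++ bindings, and treating every non-zero byte as "triggered" is an assumption chosen to stay fail-safe.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical C++ mirror of the Rust `EStopSignal` struct (two u8 fields).
// This is an illustration only; it is not a HORUS header.
struct EStopSignalRaw {
    std::uint8_t triggered;  // 0 = clear, 1 = triggered
    std::uint8_t source;     // 0 = hardware, as in the GPIO publisher
};

// Compile-time checks that the layout is what both sides expect.
static_assert(sizeof(EStopSignalRaw) == 2, "two bytes, no padding");
static_assert(std::is_standard_layout_v<EStopSignalRaw>, "C-compatible layout");
static_assert(std::is_trivially_copyable_v<EStopSignalRaw>, "safe to memcpy");

// Decode a signal from a raw byte buffer (for example, a shared memory slot).
// Assumption: any non-zero `triggered` byte counts as triggered, so a corrupted
// value errs on the side of stopping.
inline bool is_triggered(const unsigned char* bytes) {
    EStopSignalRaw sig;
    std::memcpy(&sig, bytes, sizeof sig);
    return sig.triggered != 0;
}
```

Because every byte value is a valid `std::uint8_t` (and a valid Rust `u8`), neither side has to trust the other to write only 0 or 1.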
## Trade-offs | Approach | Latency | Reliability | Complexity | When to use | |----------|---------|-------------|------------|-------------| | Same-scheduler E-stop (this recipe) | <1 tick (10 ms at 100 Hz) | High — deterministic ordering | Low | Single-process robots, most use cases | | Multi-process SHM E-stop | <1 us propagation + subscriber tick period | High — survives publisher crash via fail-safe | Medium | Multi-process architectures | | Event-driven `.on("safety.estop")` | <1 us (wake on signal) | Medium — no periodic checking | Low | When lowest latency matters more than periodic monitoring | | Hardware relay only (no software) | 5-20 ms relay switching | Very high — independent of software | Very low | Certification requirement, last-resort backup | | Hardware relay + software E-stop | <1 tick software, 5-20 ms hardware backup | Highest — defense in depth | Medium | Production robots, SIL-rated systems | ## Variations ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | E-stop doesn't override motors | E-stop `.order()` lower than motor node | Set E-stop `.order()` HIGHER than motor node | | Motors resume after E-stop | Only sending zero once | Send `CmdVel::zero()` every tick while active | | E-stop flickers on/off | No debounce on clear signal | Use `CLEAR_THRESHOLD` consecutive clears | | System continues after deadline miss | Using `Miss::Warn` instead of `Miss::SafeMode` | Set `.on_miss(Miss::SafeMode)` | | E-stop never releases | `CLEAR_THRESHOLD` too high or no clear signals | Verify upstream publishes clear signals | | GPIO reads inverted | Button wired normally-open instead of normally-closed | Invert the logic or rewire as NC for fail-safe | | Multi-process E-stop has lag | Subscriber tick rate too low | Increase E-stop node `.rate()` or use event-driven | ## See Also - [Safety Monitor](/advanced/safety-monitor) — Graduated degradation and watchdog - [Fault Tolerance](/advanced/circuit-breaker) — Circuit breaker pattern for 
graceful degradation - [CmdVel](/stdlib/messages/cmd-vel) — Velocity command type - [Miss Enum](/rust/api/scheduler#miss-enum) — All deadline miss policies - [Differential Drive](/recipes/differential-drive) — Motor control node that pairs with E-stop --- ## Recipe: LiDAR Obstacle Avoidance (C++) Path: /recipes/lidar-avoidance-cpp Description: Reactive obstacle avoidance using LaserScan data ## LiDAR Obstacle Avoidance (C++) Read 360-degree LiDAR scans, identify the closest obstacle, and generate reactive velocity commands to steer away from danger while maintaining forward progress. ### Problem Your robot has a planar LiDAR and needs to navigate without hitting things. A full path planner is overkill for simple environments. You need a lightweight reactive behavior: drive forward when the path is clear, slow down and steer away when obstacles appear. ### How It Works The node processes a 360-degree `LaserScan` in three zones: 1. **Front zone** (roughly -30 to +30 degrees): if an obstacle is here, the robot must stop or slow down 2. **Left zone** (0-180 degrees): obstacles here pull the robot to the right 3. **Right zone** (180-360 degrees): obstacles here pull the robot to the left The algorithm finds the closest point in the full scan, then decides: - If the closest obstacle is beyond the safe distance: drive at full speed - If it is within the safe distance but outside the critical zone: slow down and turn away - If it is within the critical distance: stop and rotate in place The turn direction is chosen to steer *away* from the closest obstacle. If the obstacle is on the left half of the scan (index 0-179), turn right (negative angular velocity). If on the right half (180-359), turn left (positive angular velocity). 
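The three-way decision described above can be sketched as a pure function, independent of any HORUS types. `Velocity`, `AvoidanceParams`, and `decide` are hypothetical names for this illustration, and the 50% speed and turn scaling in the caution tier is an assumed placeholder, not a value taken from the recipe.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a velocity command; not a HORUS message type.
struct Velocity {
    float linear;   // m/s
    float angular;  // rad/s, positive = turn left
};

// Hypothetical tuning parameters for the sketch.
struct AvoidanceParams {
    float safe_dist;      // m, start slowing down below this
    float critical_dist;  // m, stop and rotate below this
    float max_speed;      // m/s
    float turn_rate;      // rad/s
};

// Pick a command from the closest return in a 360-beam scan.
inline Velocity decide(const std::array<float, 360>& ranges,
                       const AvoidanceParams& p) {
    // Find the index of the closest obstacle.
    std::size_t min_idx = 0;
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i] < ranges[min_idx]) min_idx = i;
    }
    const float min_range = ranges[min_idx];

    // Obstacle on the left half (index 0-179): turn right (negative).
    // Obstacle on the right half (index 180-359): turn left (positive).
    const float away = (min_idx < 180) ? -1.0f : 1.0f;

    if (min_range < p.critical_dist) {
        return Velocity{0.0f, away * p.turn_rate};  // stop and rotate in place
    }
    if (min_range < p.safe_dist) {
        // Assumed scaling: half speed, half turn rate.
        return Velocity{0.5f * p.max_speed, 0.5f * away * p.turn_rate};
    }
    return Velocity{p.max_speed, 0.0f};  // path clear: full speed ahead
}
```

With `safe_dist = 0.8` and `critical_dist = 0.3`, a 0.2 m return at index 90 produces zero forward speed and a negative (rightward) turn. Keeping the decision in a pure function like this makes it easy to unit-test without a scheduler or a sensor.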
### When To Use - Simple hallway or warehouse navigation without a map - Backup safety behavior when the primary planner fails - Prototyping before implementing a full costmap-based planner ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) ### Complete Code ```cpp #include #include #include #include using namespace horus::literals; class LidarAvoidance : public horus::Node { public: LidarAvoidance(float safe_dist, float critical_dist, float max_speed, float turn_rate) : Node("lidar_avoidance"), safe_dist_(safe_dist), critical_dist_(critical_dist), max_speed_(max_speed), turn_rate_(turn_rate) { scan_sub_ = subscribe("lidar.scan"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; const auto* s = scan->get(); // ── Find minimum range and its bearing ────────────────────── float min_range = 999.0f; int min_idx = 0; int valid_count = 0; for (int i = 0; i < 360; i++) { float r = s->ranges[i]; // Filter invalid readings: too close (sensor noise) or too far (max range) if (r > 0.05f && r < 12.0f) { valid_count++; if (r < min_range) { min_range = r; min_idx = i; } } } // ── Detect sensor failure ─────────────────────────────────── // If fewer than 10% of beams are valid, sensor may be obscured or broken if (valid_count < 36) { horus::log::error("lidar_avoidance", "Insufficient valid LiDAR readings — stopping for safety"); horus::msg::CmdVel stop{}; cmd_pub_->send(stop); horus::blackbox::record("lidar_avoidance", "Sensor failure: <10% valid beams"); return; } // ── Decide action based on proximity ──────────────────────── horus::msg::CmdVel cmd{}; if (min_range < critical_dist_) { // CRITICAL: obstacle very close — rotate in place cmd.linear = 0.0f; cmd.angular = (min_idx < 180) ? 
-turn_rate_ : turn_rate_; horus::log::warn("lidar_avoidance", "Critical obstacle — rotating in place"); } else if (min_range < safe_dist_) { // CAUTION: obstacle within safe zone — slow down and steer away // Speed proportional to distance (closer = slower) float speed_factor = (min_range - critical_dist_) / (safe_dist_ - critical_dist_); cmd.linear = max_speed_ * speed_factor * 0.5f; cmd.angular = (min_idx < 180) ? -turn_rate_ * 0.7f : turn_rate_ * 0.7f; } else { // CLEAR: no close obstacles — full speed ahead cmd.linear = max_speed_; cmd.angular = 0.0f; } cmd_pub_->send(cmd); // Log at 1 Hz (every 10th tick at 10 Hz) if (++tick_count_ % 10 == 0) { char buf[128]; std::snprintf(buf, sizeof(buf), "min=%.2fm at %d deg cmd=(%.2f m/s, %.2f rad/s) valid=%d/360", min_range, min_idx, cmd.linear, cmd.angular, valid_count); horus::log::info("lidar_avoidance", buf); } } void enter_safe_state() override { // Stop all motion on safety event horus::msg::CmdVel stop{}; cmd_pub_->send(stop); horus::blackbox::record("lidar_avoidance", "Entered safe state, motion stopped"); } private: horus::Subscriber* scan_sub_; horus::Publisher* cmd_pub_; // Tunable parameters float safe_dist_; // meters — start slowing down float critical_dist_; // meters — stop and rotate float max_speed_; // m/s forward speed float turn_rate_; // rad/s rotation speed int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(10_hz).name("lidar_avoidance_demo"); // safe=0.8m, critical=0.3m, max=0.3 m/s, turn=0.8 rad/s LidarAvoidance avoid(0.8f, 0.3f, 0.3f, 0.8f); sched.add(avoid) .order(10) .budget(5_ms) .on_miss(horus::Miss::Skip) // skip if overrun — stale scan data is dangerous .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Robot oscillates left/right near walls | Turn rate too high or safe distance too large | Reduce `turn_rate_` or `safe_dist_` | | Robot drives into obstacles | `safe_dist_` too small for robot's stopping distance | 
Increase safe distance to account for braking at `max_speed_` | | Robot stops and never recovers | `critical_dist_` too large or turn direction blocked | Reduce critical distance or add a "wander" timeout | | Robot ignores nearby obstacles | LiDAR returns values <0.05m that are filtered out | Adjust minimum range filter to match your sensor's spec | | Jerky motion near obstacles | 10 Hz too slow for smooth speed modulation | Increase tick rate to 20-30 Hz | | Robot stops with no obstacle | Sensor returns spurious short-range readings | Add median filter on `ranges[]` before processing | ### Design Decisions | Choice | Rationale | |--------|-----------| | Three-tier response (clear/caution/critical) | Smoother behavior than binary go/stop — gradual slowdown feels natural | | Speed proportional to obstacle distance | Closer obstacles mean slower approach — natural deceleration curve | | Sensor failure detection | If the LiDAR is blocked or broken, stopping is safer than driving blind | | 10 Hz tick rate | Matches typical LiDAR publish rate (10-15 Hz); faster ticks waste CPU re-processing the same scan | | `Miss::Skip` policy | Processing a stale scan with old obstacle positions is dangerous | | `enter_safe_state()` stops robot | Any system-level safety event must halt motion | | Min range filter at 0.05m | Most LiDARs report noise below their minimum range | ### Variations **Weighted gap-finding** — steer toward the largest open gap instead of away from the closest obstacle: ```cpp // Find the widest contiguous gap of ranges > safe_dist_ (gaps are not merged across the 359 -> 0 wrap) int best_start = 0, best_len = 0, cur_start = 0, cur_len = 0; for (int i = 0; i < 360; i++) { if (s->ranges[i] > safe_dist_) { if (cur_len == 0) cur_start = i; cur_len++; } else { if (cur_len > best_len) { best_start = cur_start; best_len = cur_len; } cur_len = 0; } } // Account for a gap that runs through the last beam if (cur_len > best_len) { best_start = cur_start; best_len = cur_len; } int gap_center = best_start + best_len / 2; // Signed bearing in degrees: 0 = straight ahead, positive = left half (index 0-179), negative = right half int bearing = (gap_center < 180) ? gap_center : gap_center - 360; cmd.angular = bearing * 0.01f; // steer toward gap (positive = turn left) ``` **Vector field histogram (VFH)** — build a polar histogram 
of obstacle density: ```cpp // Bin ranges into sectors, compute steering from obstacle-free sectors float histogram[36] = {}; // 10-degree bins for (int i = 0; i < 360; i++) { int bin = i / 10; if (s->ranges[i] < safe_dist_) histogram[bin] += 1.0f; } // Steer toward the bin with lowest obstacle density ``` **Follow-the-wall** — maintain a constant distance from a wall for corridor navigation: ```cpp // Use readings at 90 and 270 degrees for left/right wall distance float left_dist = s->ranges[90]; float right_dist = s->ranges[270]; float wall_error = left_dist - desired_wall_dist_; cmd.angular = kp_wall_ * wall_error; // P-controller on wall distance ``` ### Key Takeaways - Always validate sensor data — filter out-of-range readings and detect sensor failure - Three-tier response (clear/caution/critical) is smoother and safer than binary go/stop - Match tick rate to sensor publish rate — faster ticks just reprocess the same data - `Miss::Skip` is essential for reactive avoidance — stale obstacle data is dangerous - Speed proportional to obstacle distance gives a natural deceleration profile --- ## Recipe: Emergency Stop (C++) Path: /recipes/emergency-stop-cpp Description: Safety system that monitors sensors and triggers emergency stop ## Emergency Stop (C++) A safety-critical node that monitors sensor data for imminent collision, triggers an emergency stop to halt all motion, and requires deliberate confirmation before resuming. Uses `Miss::SafeMode` for the strictest deadline enforcement. ### Problem Your robot has actuators that can injure people or damage property. You need a last line of defense that overrides all motion commands when a hazard is detected, logs the event for post-mortem analysis, and does not resume until the hazard has definitively cleared. A single spurious "all clear" must not release the stop. 
### How It Works The E-stop monitor runs at the **highest priority** (`order(0)`) with the **tightest budget** and the **strictest miss policy** (`Miss::SafeMode`). Every tick it: 1. Reads the latest LiDAR scan 2. Checks if any beam reports an obstacle within the critical distance (15cm) 3. If danger detected: activates E-stop, publishes `EmergencyStop{engaged=1}`, zeros `cmd_vel`, and records the event to the blackbox 4. If danger clears: increments a debounce counter. Only after N consecutive clear readings does it release the E-stop **Why debounce on clear but not on trigger?** Triggering must be instant — any delay means the robot travels further into the hazard. But releasing can afford 0.5 seconds of delay. Debounce prevents a flickering signal (e.g., a person walking past the sensor) from rapidly cycling the robot between motion and stop, which stresses motors and gearboxes. **Why fail-safe default?** If the LiDAR node crashes and stops publishing, `scan_sub_->recv()` returns null. The monitor treats this as a sensor failure and keeps the E-stop active. Silence is never interpreted as "safe." 
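The trigger and release rules above can be sketched as a small latch, separate from any sensor or HORUS code. `EstopLatch` is a hypothetical helper written for this illustration. It makes one simplifying assumption: a missing observation (`std::nullopt`) counts as danger immediately, without any grace period.

```cpp
#include <optional>

// Hypothetical E-stop latch: instant trigger, debounced release.
// Not part of HORUS; a sketch of the state machine only.
class EstopLatch {
public:
    explicit EstopLatch(int clear_threshold)
        : clear_threshold_(clear_threshold) {}

    // Feed one observation per tick:
    //   true         = danger detected
    //   false        = reading is clear
    //   std::nullopt = no sensor data this tick (treated as danger)
    // Returns true while the stop is engaged.
    bool update(std::optional<bool> observation) {
        const bool danger = !observation.has_value() || *observation;
        if (danger) {
            // Trigger immediately and restart the debounce count.
            active_ = true;
            consecutive_clears_ = 0;
        } else if (active_) {
            // Release only after N consecutive clear readings.
            if (++consecutive_clears_ >= clear_threshold_) {
                active_ = false;
                consecutive_clears_ = 0;
            }
        }
        return active_;
    }

    bool active() const { return active_; }

private:
    int clear_threshold_;
    bool active_ = false;
    int consecutive_clears_ = 0;
};
```

A node would call `update()` once per tick and keep publishing a zero velocity command for as long as it returns `true`. With a threshold of 25 at 50 Hz, release takes 25 / 50 = 0.5 seconds of uninterrupted clear readings, and a single danger reading in between restarts the count.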
### When To Use - Any robot with actuators (motors, servos, grippers) - When safety regulations require guaranteed shutdown - As a secondary software safety layer alongside hardware E-stop relays ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) - Understanding of [Miss policies](/docs/concepts/core-concepts-scheduling) ### Complete Code ```cpp #include #include using namespace horus::literals; class SafetyMonitor : public horus::Node { public: SafetyMonitor(float critical_dist = 0.15f, int clear_threshold = 25) : Node("safety_monitor"), critical_dist_(critical_dist), clear_threshold_(clear_threshold) { scan_sub_ = subscribe("lidar.scan"); estop_pub_ = advertise("emergency.stop"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { auto scan = scan_sub_->recv(); // ── Fail-safe: no scan data = sensor failure ──────────────── // If the LiDAR node crashed, we have no sensor coverage. // Treat silence as danger, not safety. 
if (!scan) { ticks_without_scan_++; if (ticks_without_scan_ > 10 && !estop_active_) { estop_active_ = true; consecutive_clears_ = 0; horus::log::error("safety", "No LiDAR data for 10 ticks — activating E-stop"); horus::blackbox::record("safety", "E-stop triggered: LiDAR timeout (sensor failure)"); } // Always zero cmd_vel while E-stop is active if (estop_active_) { horus::msg::CmdVel stop{}; cmd_pub_->send(stop); horus::msg::EmergencyStop estop{}; estop.engaged = 1; estop_pub_->send(estop); } return; } ticks_without_scan_ = 0; // ── Check all beams for critical proximity ────────────────── bool danger = false; for (int i = 0; i < 360; i++) { float r = scan->get()->ranges[i]; // Valid reading below critical distance = imminent collision if (r > 0.02f && r < critical_dist_) { danger = true; break; } } // ── E-stop state machine ──────────────────────────────────── if (danger && !estop_active_) { // TRIGGER: immediate activation, no debounce estop_active_ = true; consecutive_clears_ = 0; horus::msg::EmergencyStop estop{}; estop.engaged = 1; estop_pub_->send(estop); horus::log::error("safety", "EMERGENCY STOP — obstacle within 15cm"); horus::blackbox::record("safety", "E-stop triggered: obstacle within critical distance"); // Zero all motor commands immediately horus::msg::CmdVel stop{}; cmd_pub_->send(stop); } else if (danger && estop_active_) { // Still in danger — reset clear counter, keep sending stop consecutive_clears_ = 0; horus::msg::CmdVel stop{}; cmd_pub_->send(stop); } else if (!danger && estop_active_) { // Danger cleared — debounce before releasing consecutive_clears_++; if (consecutive_clears_ >= clear_threshold_) { estop_active_ = false; horus::msg::EmergencyStop clear{}; clear.engaged = 0; estop_pub_->send(clear); horus::log::info("safety", "E-stop cleared after debounce"); horus::blackbox::record("safety", "E-stop released after clear threshold met"); } else { // Still debouncing — keep motors stopped horus::msg::CmdVel stop{}; cmd_pub_->send(stop); } 
} // Log at 1 Hz (every 50th tick at 50 Hz) if (++tick_count_ % 50 == 0) { char buf[128]; std::snprintf(buf, sizeof(buf), "estop=%s clears=%d/%d", estop_active_ ? "ACTIVE" : "clear", consecutive_clears_, clear_threshold_); horus::log::info("safety", buf); } } void enter_safe_state() override { // Called by scheduler on deadline miss — force E-stop horus::msg::CmdVel stop{}; cmd_pub_->send(stop); estop_active_ = true; consecutive_clears_ = 0; horus::blackbox::record("safety", "Scheduler forced safe state (deadline miss or watchdog)"); } private: horus::Subscriber* scan_sub_; horus::Publisher* estop_pub_; horus::Publisher* cmd_pub_; // Configuration float critical_dist_; // meters — trigger threshold int clear_threshold_; // consecutive clear ticks before release // State bool estop_active_ = false; int consecutive_clears_ = 0; int ticks_without_scan_ = 0; int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(50_hz).name("safety_demo"); // 15cm critical distance, 25 clear readings (0.5s at 50 Hz) to release SafetyMonitor safety(0.15f, 25); sched.add(safety) .order(0) // highest priority — runs first .budget(2_ms) // tight budget for safety-critical code .on_miss(horus::Miss::SafeMode) // strictest policy — force safe state on any overrun .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | E-stop never triggers | `critical_dist_` too small or LiDAR beams miss the obstacle | Increase distance or verify LiDAR field of view | | E-stop flickers on/off rapidly | No debounce on clear, or `clear_threshold_` too low | Increase `clear_threshold_` (25-50 at 50 Hz) | | Motors keep running after E-stop | E-stop node runs after motor node in execution order | Set E-stop to `order(0)` (highest priority) | | E-stop activates on startup | No LiDAR data yet, fail-safe triggers | Allow a startup grace period before fail-safe kicks in | | System enters safe mode unexpectedly | Budget too tight for scan processing | 
Increase budget from 2ms to 3ms, or reduce scan resolution | | E-stop never releases | Sensor has spurious short-range noise | Filter readings below sensor's minimum range (typically 0.02-0.05m) | ### Design Decisions | Choice | Rationale | |--------|-----------| | `order(0)` — highest priority | Safety must run before any other node can publish motion commands | | `Miss::SafeMode` policy | Any deadline miss in the safety node is itself a safety failure — force full stop | | Instant trigger, debounced clear | Triggering must be immediate; releasing can afford 0.5s delay to prevent flicker | | Fail-safe on missing data | Silence from LiDAR means no sensor coverage — assume danger | | `horus::blackbox::record()` | All E-stop events logged for post-mortem analysis and incident reporting | | `enter_safe_state()` forces E-stop | Scheduler can force safety independently of sensor data | | 50 Hz tick rate | Fast enough for <20ms response time, reasonable CPU cost | ### Variations **Multi-sensor E-stop** — monitor LiDAR, bumpers, and IMU simultaneously: ```cpp void tick() override { auto scan = scan_sub_->recv(); auto bump = bumper_sub_->recv(); auto imu = imu_sub_->recv(); bool danger = false; if (scan) danger |= check_proximity(scan); if (bump) danger |= (bump->get()->pressed != 0); if (imu) danger |= check_tilt(imu); // tip-over detection // Same state machine as single-sensor version // ... 
} ``` **Zone-based E-stop** — different thresholds for front, sides, and rear: ```cpp // Front (330-30 deg): 30cm critical, 80cm warning // Sides (30-150, 210-330): 15cm critical // Rear (150-210): 10cm critical (backing up is slower) for (int i = 0; i < 360; i++) { float r = scan->get()->ranges[i]; float threshold = get_zone_threshold(i); if (r > 0.02f && r < threshold) { danger = true; break; } } ``` **Hardware GPIO E-stop integration** — read a physical E-stop button: ```cpp // Subscribe to a GPIO publisher that reads the hardware button auto hw_estop = hw_estop_sub_->recv(); if (hw_estop && hw_estop->get()->engaged) { estop_active_ = true; // hardware override takes priority horus::blackbox::record("safety", "Hardware E-stop pressed"); } ``` ### Key Takeaways - Safety nodes run at `order(0)` with `Miss::SafeMode` — no exceptions - `horus::blackbox::record()` is mandatory for every E-stop event — you will need it for incident investigation - Debounce on clear, never on trigger — instant activation, deliberate release - Fail-safe means silence = danger — if the sensor stops publishing, assume the worst - `enter_safe_state()` gives the scheduler an independent path to force a stop, separate from sensor data --- ## Recipe: Telemetry Logger (Python) Path: /recipes/telemetry-logger-python Description: Async Python node that subscribes to topics and writes CSV logs using non-blocking file I/O. ## Telemetry Logger (Python) Subscribes to sensor and motor topics and writes timestamped data to a CSV file. Uses `async def tick` so file I/O runs on an async thread pool and never blocks the real-time control loop. ### Problem You need to log topic data to disk for post-run analysis without affecting the timing of real-time control nodes. 
### When To Use - Recording flight/drive data for post-analysis - Capturing sensor streams for offline algorithm development - Debugging intermittent issues that require long recordings ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Upstream nodes publishing topics you want to log ### horus.toml ```toml [package] name = "telemetry-logger-py" version = "0.1.0" description = "Non-blocking topic logger to CSV (Python)" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Async telemetry logger — writes CSV without blocking the control loop.""" import os import time import aiofiles import asyncio import horus from horus import Node, Odometry, Imu, us, ms # ── Configuration ──────────────────────────────────────────── LOG_FILE = f"telemetry_{int(time.time())}.csv" CSV_HEADER = "tick,timestamp,odom_x,odom_y,odom_theta,odom_speed,imu_accel_z,imu_gyro_z\n" # ── State ──────────────────────────────────────────────────── file_handle = [None] tick_count = [0] lines_written = [0] # ── Node callbacks ─────────────────────────────────────────── async def logger_init(node): """Open log file and write CSV header — runs once before tick loop.""" file_handle[0] = await aiofiles.open(LOG_FILE, mode="w") await file_handle[0].write(CSV_HEADER) await file_handle[0].flush() print(f"Logger: writing to {LOG_FILE}") async def logger_tick(node): """Read topics and append a CSV line — async file I/O, never blocks RT nodes.""" tick_count[0] += 1 # IMPORTANT: always recv() every tick to drain buffers odom = node.recv("pose.fused") imu = node.recv("imu.raw") # Use defaults if topics haven't published yet odom_x = odom.x if odom else 0.0 odom_y = odom.y if odom else 0.0 odom_theta = odom.theta if odom else 0.0 odom_speed = odom.linear_velocity if odom else 0.0 imu_az = imu.accel_z if imu else 0.0 imu_gz = imu.gyro_z if imu else 0.0 # Write CSV line — async I/O so this never blocks the scheduler if file_handle[0] is not None: 
line = ( f"{tick_count[0]}," f"{time.time():.6f}," f"{odom_x:.4f}," f"{odom_y:.4f}," f"{odom_theta:.4f}," f"{odom_speed:.4f}," f"{imu_az:.4f}," f"{imu_gz:.4f}\n" ) await file_handle[0].write(line) lines_written[0] += 1 # Flush every 100 lines to balance performance and data safety if lines_written[0] % 100 == 0: await file_handle[0].flush() async def logger_shutdown(node): """Flush and close the log file.""" if file_handle[0] is not None: await file_handle[0].flush() await file_handle[0].close() file_handle[0] = None print(f"Logger: wrote {lines_written[0]} lines to {LOG_FILE}") # ── Main ───────────────────────────────────────────────────── logger_node = Node( name="Logger", tick=logger_tick, # async def — auto-detected, runs on async I/O pool init=logger_init, shutdown=logger_shutdown, rate=10, # 10 Hz logging — enough for post-analysis order=99, # Runs AFTER all data-producing nodes subs=["pose.fused", "imu.raw"], pubs=[], ) if __name__ == "__main__": horus.run(logger_node) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 1000 Hz Logger: writing to telemetry_1711036800.csv [HORUS] Node "Logger" started (10 Hz) ^C Logger: wrote 150 lines to telemetry_1711036800.csv [HORUS] Shutting down... [HORUS] Node "Logger" shutdown complete ``` Generated CSV: ```text tick,timestamp,odom_x,odom_y,odom_theta,odom_speed,imu_accel_z,imu_gyro_z 1,1711036800.100000,0.0000,0.0000,0.0000,0.0000,9.8100,0.0000 2,1711036800.200000,0.0100,0.0000,0.0100,0.5000,9.8100,0.0500 3,1711036800.300000,0.0200,0.0010,0.0200,0.5000,9.8100,0.0500 ... 
``` ### Synchronous Alternative If you do not need async I/O (or `aiofiles` is not available), use a synchronous tick with `compute=True` to run on a thread pool: ```python import horus from horus import Node log_file = [None] tick_count = [0] def sync_init(node): log_file[0] = open("telemetry.csv", "w") log_file[0].write("tick,data\n") def sync_tick(node): tick_count[0] += 1 # IMPORTANT: always recv() every tick to drain the buffer data = node.recv("sensor.data") if log_file[0] and data is not None: # Write both columns declared in the header: tick, data log_file[0].write(f"{tick_count[0]},{data}\n") def sync_shutdown(node): if log_file[0]: log_file[0].flush() log_file[0].close() logger = Node( name="SyncLogger", tick=sync_tick, init=sync_init, shutdown=sync_shutdown, rate=10, order=99, compute=True, # thread pool — file I/O won't block RT nodes subs=["sensor.data"], ) if __name__ == "__main__": horus.run(logger) ``` ### Key Points - **`async def tick`** is auto-detected by `horus.Node` — marks the node for the async I/O execution class - **`aiofiles`** provides non-blocking file writes — the scheduler thread is never blocked by disk I/O - **`init()` opens the file** once before the tick loop starts - **`shutdown()` flushes and closes** the file — prevents data loss on Ctrl+C - **Periodic flush** (every 100 lines) balances write performance with crash-safety - **10 Hz logging** is typical for post-flight analysis; use 100 Hz+ for real-time debugging - **`order=99`** ensures the logger runs after all data-producing nodes have published ### Variations - **Binary format**: Replace CSV with MessagePack or struct packing for smaller files - **Rotating logs**: Create a new file every N minutes or N MB to prevent single-file bloat - **Selective logging**: Add a `log.enable` topic to toggle logging on/off at runtime - **Network streaming**: Replace file writes with UDP socket sends for live ground station telemetry ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | File is empty after run | `shutdown()` did not flush | Always call `flush()` and `close()` in `shutdown()` | | CSV has all zeros | Topics not publishing before logger
starts | Use defaults for missing data and check that upstream nodes are active | | Disk fills up quickly | Logging at too high a rate | Reduce `rate` or switch to binary format | | RT nodes slowed down | Logger not using async or compute mode | Use `async def tick` or `compute=True` | | `ModuleNotFoundError: aiofiles` | `aiofiles` not installed | Run `pip install aiofiles` or use the synchronous alternative | --- ## See Also - [Telemetry Logger (Rust)](/recipes/telemetry-logger) — Rust version with `.async_io()` - [Multi-Sensor Fusion (Python)](/recipes/multi-sensor-fusion-python) — Produces `pose.fused` topic --- ## Recipe: Telemetry Logger Path: /recipes/telemetry-logger Description: Log any topic to CSV file using async I/O — never blocks the control loop. ## Telemetry Logger Subscribes to topics and writes them to a CSV log file using `.async_io()` execution class. File I/O happens on a Tokio blocking thread -- it never blocks or delays the real-time control loop. ### Problem You need to log topic data to disk for post-run analysis without affecting the timing of real-time control nodes. ### When To Use - Recording flight/drive data for post-analysis - Capturing sensor streams for offline algorithm development - Debugging intermittent issues that require long recordings ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Upstream nodes publishing the topics you want to log ### horus.toml ```toml [package] name = "telemetry-logger" version = "0.1.0" description = "Non-blocking topic logger to CSV" ``` ### Complete Code ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 10 Hz [HORUS] Node "Logger" started (AsyncIo, 10 Hz) ^C [HORUS] Shutting down... [HORUS] Node "Logger" shutdown complete ``` Generated `telemetry.csv`: ```text tick,x,y,theta,speed,confidence,left_rpm,right_rpm,battery_v 1,0.0000,0.0000,0.0000,0.0000,0.00,0.0,0.0,0.00 2,0.0100,0.0000,0.0100,0.5000,0.90,45.0,47.0,12.40 ... 
``` ### Key Points - **`.async_io()`** runs the node on a Tokio blocking thread pool — file writes never block the RT scheduler - **`init()`** opens the file once before the tick loop starts - **`shutdown()`** flushes and closes the file — prevents data loss - **`unwrap_or_default()`** on recv — logger uses default (zeros) if a topic hasn't published yet - **10Hz logging** is typical for post-flight analysis; use 100Hz+ for real-time debugging - **Combine with any other recipe** — just match the topic names and message types ### Variations - **Binary format**: Replace CSV with MessagePack or bincode for smaller files and faster writes - **Rotating logs**: Create a new file every N minutes or N MB to prevent single-file bloat - **Selective logging**: Add a `log.enable` topic that lets other nodes toggle logging on/off at runtime - **Network streaming**: Replace `File` with a TCP socket to stream telemetry to a ground station ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | File is empty after run | `shutdown()` did not flush | Always call `f.flush()` in `shutdown()` | | CSV has all zeros | Topics not publishing before logger starts | Use `unwrap_or_default()` and check that upstream nodes are active | | Disk fills up quickly | Logging at too high a rate | Reduce `.rate()` or switch to binary format | | RT nodes slowed down | Logger not using `.async_io()` | Add `.async_io()` to the logger's scheduler config | | Permission denied on file create | Running as unprivileged user | Write to a user-writable directory or run with appropriate permissions | --- ## See Also - [Execution Classes](/concepts/execution-classes) — AsyncIo class - [Telemetry Export](/development/telemetry) — Prometheus/Grafana export --- ## Recipe: Multi-Sensor Fusion (C++) Path: /recipes/multi-sensor-fusion-cpp Description: Fuse IMU + odometry for localization using a simple Kalman-style filter ## Multi-Sensor Fusion (C++) Fuse IMU heading with wheel odometry position for 
more accurate localization than either sensor alone. Uses a complementary filter approach to blend the strengths of both sensors. ### Problem Wheel odometry gives position (x, y) and heading (theta), but heading drifts because of wheel slip, uneven terrain, and tire wear. The IMU gyroscope gives accurate short-term heading changes but drifts long-term due to bias. You need to combine both sensors so the robot's pose estimate is better than either one alone. ### How It Works This fusion node runs a **complementary filter on heading** while taking position directly from odometry: - **Position (x, y)**: comes from wheel odometry. The IMU accelerometer is too noisy for double-integration into position — it drifts by meters within seconds. Wheel odometry is the only practical source for position without GPS or SLAM. - **Heading (theta)**: blended between odometry heading and IMU gyro integration. The gyro is trusted for short-term changes (no wheel slip), while odometry heading provides a long-term reference (no gyro drift). The fusion formula for heading: ```text theta = alpha * (theta + gyro_z * dt) + (1 - alpha) * odom_theta ``` Where `alpha` is the IMU trust factor (0.95 = 95% gyro, 5% odometry correction). High alpha gives smoother heading during turns (gyro-dominated). Low alpha corrects gyro drift faster (odometry-dominated). With `alpha = 0.95` at 100 Hz, the odometry correction acts with a time constant of roughly `dt / (1 - alpha)` = 0.2 s. Because the blend is linear, both headings must lie on the same branch of the angle (for example, by wrapping their difference into [-pi, pi]) before they are mixed; otherwise the result jumps when one of them crosses pi. **Why not a full Kalman filter?** A Kalman filter is the optimal estimator for linear systems with Gaussian noise, but it requires a noise model (process and measurement covariances), matrix operations, and careful tuning. The complementary filter captures most of the benefit for a fraction of the complexity: one tunable parameter and a single line of arithmetic per tick. Start here, and upgrade to Kalman only if you need covariance estimates or more than two sensors.
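The heading update described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the recipe's node: `wrap_angle` and `fuse_heading` are hypothetical helper names and are not part of the HORUS API. Algebraically it is the same blend as the formula, rewritten as "predict, then correct", with the odometry/gyro difference wrapped into [-pi, pi] so the two headings are mixed along the shortest angular path.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a HORUS function): wrap an angle in radians
// into the range [-pi, pi].
inline double wrap_angle(double a) {
    return std::atan2(std::sin(a), std::cos(a));
}

// Hypothetical helper (not a HORUS function): one complementary-filter
// step for heading.
//   theta      : previous fused heading (rad)
//   gyro_z     : yaw rate from the IMU (rad/s)
//   odom_theta : heading reported by wheel odometry (rad)
//   alpha      : gyro trust factor in [0, 1]
//   dt         : time since the previous step (s)
inline double fuse_heading(double theta, double gyro_z, double odom_theta,
                           double alpha, double dt) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    // Predict: integrate the gyro over one step.
    const double predicted = theta + gyro_z * dt;
    // Correct: pull toward odometry along the shortest angular path.
    // alpha * predicted + (1 - alpha) * odom_theta
    //   == predicted + (1 - alpha) * (odom_theta - predicted)
    const double error = wrap_angle(odom_theta - predicted);
    return wrap_angle(predicted + (1.0 - alpha) * error);
}
```

With `alpha = 1.0` the function reduces to pure gyro integration, and with `alpha = 0.0` it returns the odometry heading, which matches the two limiting cases described for the trust factor.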
### When To Use - Any ground robot with wheel encoders and an IMU - When odometry heading drifts due to wheel slip (carpet, gravel, wet floors) - When you need better localization than odometry alone but cannot run SLAM ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) - An odometry source (e.g., [Differential Drive recipe](/docs/recipes/differential-drive-cpp)) - An IMU source (e.g., [IMU Reader recipe](/docs/recipes/imu-reader-cpp)) ### Complete Code ```cpp #include #include #include using namespace horus::literals; class SensorFusion : public horus::Node { public: SensorFusion(double alpha = 0.95) : Node("sensor_fusion"), alpha_(alpha) { // Subscribe to both sensor sources imu_sub_ = subscribe("imu.data"); odom_sub_ = subscribe("odom"); // Publish the fused estimate fused_pub_ = advertise("fused.pose"); } void tick() override { // ── Always drain both subscribers to avoid stale data ──────── auto imu = imu_sub_->recv(); auto odom = odom_sub_->recv(); double dt = 1.0 / 100.0; // matches 100 Hz tick rate // ── Predict from odometry (position + heading reference) ──── if (odom) { // Position comes directly from odometry — no IMU correction // IMU accelerometer is too noisy for position estimation x_ = odom->get()->pose.x; y_ = odom->get()->pose.y; // Odometry heading provides long-term reference (no gyro drift) odom_theta_ = odom->get()->pose.theta; has_odom_ = true; } // ── Correct heading from IMU gyroscope ────────────────────── if (imu) { double gz = imu->get()->angular_velocity[2]; // yaw rate (rad/s) if (has_odom_) { // Complementary filter: gyro for short-term, odom for long-term // alpha_ high (0.95) = trust gyro more = smooth during wheel slip // alpha_ low (0.80) = trust odom more = corrects gyro drift faster theta_ = alpha_ * (theta_ + gz * dt) + (1.0 - alpha_) * odom_theta_; } else { // No odometry yet — 
integrate gyro only (will drift) theta_ += gz * dt; } has_imu_ = true; } else if (has_odom_ && !has_imu_) { // No IMU data — fall back to odometry heading theta_ = odom_theta_; } // Normalize theta to [-pi, pi] if (theta_ > M_PI) theta_ -= 2.0 * M_PI; if (theta_ < -M_PI) theta_ += 2.0 * M_PI; // ── Publish fused estimate ────────────────────────────────── horus::msg::Odometry fused{}; fused.pose.x = x_; fused.pose.y = y_; fused.pose.theta = theta_; fused_pub_->send(fused); // Log at 2 Hz (every 50th tick at 100 Hz) if (++tick_count_ % 50 == 0) { double heading_diff = theta_ - odom_theta_; char buf[128]; std::snprintf(buf, sizeof(buf), "fused=(%.2f, %.2f, %.1f deg) odom_drift=%.1f deg", x_, y_, theta_ * 180.0 / M_PI, heading_diff * 180.0 / M_PI); horus::log::info("fusion", buf); // Log large heading disagreement as a warning if (std::abs(heading_diff) > 0.17) { // ~10 degrees horus::log::warn("fusion", "Large IMU/odom heading disagreement — possible wheel slip"); } } } void enter_safe_state() override { // Publish last known good pose with a flag indicating degraded mode. // Downstream nodes (e.g., navigation) should check for stale data // and switch to a conservative behavior. 
horus::msg::Odometry last{}; last.pose.x = x_; last.pose.y = y_; last.pose.theta = theta_; fused_pub_->send(last); horus::blackbox::record("fusion", "Entered safe state — publishing last known pose"); horus::log::warn("fusion", "Safe state active — fused estimate may be stale"); } private: horus::Subscriber* imu_sub_; horus::Subscriber* odom_sub_; horus::Publisher* fused_pub_; // Filter parameter double alpha_; // IMU trust factor (0.80-0.99 typical) // Fused state (double for running-sum precision) double x_ = 0, y_ = 0, theta_ = 0; double odom_theta_ = 0; // Data availability flags bool has_odom_ = false; bool has_imu_ = false; int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(100_hz).name("fusion_demo"); // alpha=0.95: 95% gyro trust for heading, 5% odometry correction SensorFusion fusion(0.95); sched.add(fusion) .order(20) // after odometry (0-10) and IMU (5) nodes .budget(2_ms) // lightweight math .on_miss(horus::Miss::Warn) // warn but don't stop — downstream can tolerate a stale estimate .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Heading oscillates during turns | `alpha` too low — odometry heading disagrees with gyro | Increase alpha (0.95-0.99) to trust gyro more during turns | | Heading drifts when stationary | `alpha` too high — gyro drift not corrected | Decrease alpha toward 0.90 or calibrate gyro bias | | Position is completely wrong | Odometry node not running or publishing on wrong topic | Verify odom node publishes to "odom" topic | | Sudden heading jumps | Odometry theta wraps at pi/-pi differently than fused theta | Normalize both angles before blending | | No output on startup | Neither sensor has published yet | Check has_odom/has_imu flags; fusion waits for first data | | Heading matches odometry exactly | IMU not publishing or alpha = 0 | Verify IMU node runs and alpha is set correctly | ### Design Decisions | Choice | Rationale | |--------|-----------| | 
Position from odometry only | IMU accelerometer double-integration drifts meters in seconds — unusable for position | | Complementary filter over Kalman | 90% of Kalman's benefit with 10% of the complexity; no covariance matrices | | `alpha = 0.95` default | Trusts gyro for smooth turns while slowly correcting drift from odometry | | Graceful degradation | Works with odometry-only, IMU-only, or both — does not crash on missing data | | `order(20)` | Runs after sensor nodes (IMU=5, odom=10) so both inputs are fresh | | `Miss::Warn` policy | A stale fused estimate is imperfect but not dangerous — downstream nodes tolerate it | | Heading disagreement warning | Large drift between IMU and odometry signals wheel slip or sensor failure | | `enter_safe_state()` publishes last pose | Downstream navigation nodes get a final estimate rather than silence during a safety event | ### Variations **Extended Kalman Filter (EKF)** — for optimal fusion with uncertainty estimates: ```cpp // State: [x, y, theta], Measurement: [odom_x, odom_y, odom_theta, gyro_z] // Requires process noise Q and measurement noise R matrices void predict(double v, double w, double dt) { x_ += v * std::cos(theta_) * dt; y_ += v * std::sin(theta_) * dt; theta_ += w * dt; // P = F * P * F^T + Q (covariance propagation) } void correct_odom(double ox, double oy, double otheta) { // K = P * H^T * (H * P * H^T + R)^-1 (Kalman gain) // x = x + K * (z - H * x) (state update) } ``` **GPS fusion** — add absolute position correction when available: ```cpp if (auto gps = gps_sub_->recv()) { // GPS provides absolute x/y but at low rate (1-10 Hz) and with noise (~2m) double gps_alpha = 0.05; // low trust — GPS is noisy x_ = (1.0 - gps_alpha) * x_ + gps_alpha * gps->get()->x; y_ = (1.0 - gps_alpha) * y_ + gps_alpha * gps->get()->y; } ``` **Multi-IMU fusion** — average multiple IMUs for redundancy and noise reduction: ```cpp auto imu1 = imu1_sub_->recv(); auto imu2 = imu2_sub_->recv(); double gz = 0; int count = 0; if 
(imu1) { gz += imu1->get()->angular_velocity[2]; count++; } if (imu2) { gz += imu2->get()->angular_velocity[2]; count++; } if (count > 0) gz /= count; // averaged yaw rate ``` ### Key Takeaways - Never integrate IMU accelerometer for position — it drifts meters in seconds - The complementary filter is the right starting point: simple, tunable, and effective - Drain both subscribers every tick, even if only one has new data, to avoid stale buffers - Log the heading disagreement between sensors — it reveals wheel slip and sensor failures - `order(20)` ensures sensor nodes run first and fusion gets fresh data every tick - Use `enter_safe_state()` to publish a final known-good pose before the system shuts down --- ## Recipe: Python CV Node Path: /recipes/python-cv-node Description: Python computer vision node using OpenCV with horus.Node for camera processing. ## Python CV Node A Python node that reads camera images, runs OpenCV processing (e.g., ArUco marker detection), and publishes detection results. Uses `horus.Node` with NumPy-backed images for zero-copy interop. ### Problem You need to run computer vision (OpenCV, ML inference) in Python while integrating with the HORUS node lifecycle and shared memory topics. 
### When To Use - ArUco marker detection, object detection, or lane following - Any task that benefits from Python's ML/CV ecosystem (OpenCV, PyTorch, TensorFlow) - Prototyping vision pipelines before porting hot paths to Rust ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Python 3.8+ with `opencv-python` and `numpy` installed - A camera driver publishing images to `camera.image` ### horus.toml ```toml [package] name = "python-cv" version = "0.1.0" description = "Python computer vision with OpenCV" language = "python" [dependencies] opencv-python = { version = ">=4.8", source = "pypi" } numpy = { version = ">=1.24", source = "pypi" } ``` ### Complete Code ```python import horus import numpy as np # ── Detection result ───────────────────────────────────────── class DetectionResult: """Detected marker with ID and pose.""" def __init__(self, marker_id=0, x=0.0, y=0.0, confidence=0.0): self.marker_id = marker_id self.x = x self.y = y self.confidence = confidence # ── CV Node ────────────────────────────────────────────────── def make_marker_detector(): frame_count = [0] def tick(node): # IMPORTANT: always call recv() every tick to drain the buffer img = node.recv("camera.image") if img is None: return # no frame yet frame_count[0] += 1 # Convert horus Image to NumPy array (zero-copy via shared memory pool) frame = img.to_numpy() # returns (height, width, channels) view — no copy # --- OpenCV processing --- # Convert to grayscale for detection gray = frame[:, :, 0] # simplified — use cv2.cvtColor in production # Simulated detection (replace with cv2.aruco.detectMarkers) # In production: # import cv2 # aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50) # params = cv2.aruco.DetectorParameters() # corners, ids, rejected = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params) detection = DetectionResult( marker_id=42, x=float(frame.shape[1] / 2), y=float(frame.shape[0] / 2), confidence=0.95, ) 
node.send("vision.detections", detection) def shutdown(node): print(f"MarkerDetector: processed {frame_count[0]} frames") return horus.Node(name="MarkerDetector", tick=tick, shutdown=shutdown, subs=["camera.image"], pubs=["vision.detections"], rate=30) # ── Main ───────────────────────────────────────────────────── if __name__ == "__main__": horus.run(make_marker_detector()) ``` ### Expected Output ```text [HORUS] Scheduler running — tick_rate: 30 Hz [HORUS] Node "MarkerDetector" started (30 Hz) ^C [HORUS] Shutting down... MarkerDetector: processed 150 frames [HORUS] Node "MarkerDetector" shutdown complete ``` ### Key Points - **`horus.Node`** wraps the Rust scheduler -- Python nodes get the same lifecycle (init/tick/shutdown) - **`node.recv("topic")`** is the Python API for receiving messages -- always call every tick - **`node.send("topic", data)`** publishes to any topic - **`horus.run(*nodes)`** is the one-liner to start the scheduler - **NumPy zero-copy**: `horus.Image` data can be reshaped into NumPy arrays without copying - **30Hz** matches most USB cameras -- no benefit running faster than the sensor - **Pair with Rust nodes**: Python CV node publishes detections, Rust control node subscribes at 100Hz+ ### Variations - **Object detection**: Replace ArUco detection with a YOLO or SSD model for general object detection - **Stereo depth**: Subscribe to two camera topics and compute a disparity map - **GPU acceleration**: Use `cv2.cuda` or PyTorch GPU tensors for real-time inference on large frames - **Multi-camera**: Create multiple `horus.Node` instances, each subscribed to a different camera topic ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `ImportError: No module named 'cv2'` | OpenCV not installed | Run `pip install opencv-python` | | Frame is all zeros | Image format mismatch (RGB vs BGR, wrong resolution) | Check `img.width`, `img.height`, and channel count before reshape | | Node runs slower than camera rate | Processing 
takes longer than frame interval | Reduce resolution, use ROI cropping, or offload to GPU | | `node.recv()` always returns `None` | Camera driver not running or wrong topic name | Verify topic name with `horus monitor` | | Memory grows over time | NumPy arrays not freed between ticks | Avoid storing frame history; process and discard each frame | --- ## See Also - [Python Bindings](/python/api/python-bindings) — Python API reference - [Image](/stdlib/messages/image) — Image message type - [Image](/python/api/image), [PointCloud](/python/api/pointcloud), [DepthImage](/python/api/depth-image) — zero-copy NumPy/PyTorch/JAX --- ## Recipe: Servo Controller (C++) Path: /recipes/servo-controller-cpp Description: Control servos with position commands and runtime parameter tuning ## Servo Controller (C++) Control a servo motor with position commands, including proportional control, effort clamping, and runtime gain tuning via `horus::Params`. Suitable for robotic arms, pan-tilt heads, and gripper fingers. ### Problem You have a servo or position-controlled actuator that accepts effort commands (torque or voltage). You need to drive it to target positions smoothly, clamp the maximum effort to prevent hardware damage, and adjust gains at runtime without restarting the system. ### How It Works The controller implements a **proportional position loop with effort clamping**: ```text error = target_position - current_position effort = clamp(Kp * error, -max_effort, max_effort) ``` Errors inside a small deadband are ignored, and the clamped effort is then velocity-limited before it is published. The servo hardware moves in response to the effort command, and the controller reads the new position on the next tick. Two parameters dominate behavior: **Kp (Proportional gain) -- determines tracking aggression** - **Low Kp (0.5-1.0)**: slow, smooth motion with low overshoot. Good for camera gimbals where smooth video matters more than speed. - **Medium Kp (2.0-5.0)**: responsive tracking.
Good for robotic arm joints that need to reach targets within a few hundred milliseconds. - **High Kp (10.0+)**: fast response, but at risk of oscillation. Only use with derivative damping (see PID recipe). **max_effort -- protects hardware from self-destruction** Every servo has a torque or current limit beyond which gears strip, windings overheat, or mounting hardware deforms. The `max_effort` clamp is the last line of defense. Set it to 60-80% of the servo's rated stall torque to leave margin for transient loads. **Runtime parameter tuning** via `horus::Params` lets you adjust `servo_kp`, `servo_max_effort`, `servo_max_velocity`, and `servo_deadband` while the robot is running. This is critical for field calibration where you cannot afford to stop the system, recompile, and redeploy for every gain change. Parameters are read every tick, so changes take effect within one tick (10 ms at 100 Hz). **Feedback timeout** protects against encoder failures. If the position feedback topic goes silent for 2 seconds (200 ticks at 100 Hz), the controller zeros effort rather than blindly driving the servo into a hard stop. This is a fail-safe: the servo coasts to a stop under its own friction rather than crashing at full torque.
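The control law and the feedback timeout described above can be sketched in isolation from the node. This is a minimal sketch, assuming C++17 for `std::clamp`; `clamped_effort` and `FeedbackWatchdog` are hypothetical names introduced for illustration and are not HORUS types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a HORUS function): proportional effort,
// Kp * error, limited to +/- max_effort.
inline double clamped_effort(double kp, double target, double current,
                             double max_effort) {
    assert(max_effort >= 0.0);
    return std::clamp(kp * (target - current), -max_effort, max_effort);
}

// Hypothetical helper (not a HORUS type): counts consecutive ticks
// without feedback and reports whether feedback is still fresh.
class FeedbackWatchdog {
public:
    explicit FeedbackWatchdog(int timeout_ticks)
        : timeout_ticks_(timeout_ticks) {}

    // Call once per tick; pass true if a feedback sample arrived this tick.
    // Returns true only if feedback has been seen at least once and fewer
    // than timeout_ticks consecutive ticks have passed without a sample.
    bool update(bool got_feedback) {
        if (got_feedback) {
            silent_ticks_ = 0;
            seen_ = true;
        } else if (silent_ticks_ < timeout_ticks_) {
            ++silent_ticks_;  // saturates at timeout_ticks_, so no overflow
        }
        return seen_ && silent_ticks_ < timeout_ticks_;
    }

private:
    int timeout_ticks_;
    int silent_ticks_ = 0;
    bool seen_ = false;
};
```

At a 100 Hz tick rate, `FeedbackWatchdog(200)` corresponds to the 2-second timeout: while `update()` returns false, the controller should publish zero effort instead of the value from `clamped_effort`.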
### When To Use - Position-controlled servos (hobby PWM servos, Dynamixel, industrial servo drives) - Pan-tilt camera heads with live tuning requirements - Gripper fingers that need adjustable grip force - Any single-DOF actuator needing smooth position tracking with hardware protection - Field-deployed robots where gains must be adjusted without restarting ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) - A servo or actuator that accepts effort (torque/voltage) commands and provides position feedback ### Complete Code ```cpp #include #include #include #include using namespace horus::literals; class ServoController : public horus::Node { public: ServoController() : Node("servo_controller") { // Subscribe to position targets from upstream (planner, teleop, etc.) target_sub_ = subscribe("servo.target"); // Subscribe to actual position feedback from servo encoder feedback_sub_ = subscribe("servo.feedback"); // Publish effort commands to the servo driver motor_pub_ = advertise("servo.cmd"); } void tick() override { // ── Read runtime parameters (can be changed without restart) ─ double kp = params_.get("servo_kp", 2.0); double max_effort = params_.get("servo_max_effort", 5.0); double max_vel = params_.get("servo_max_velocity", 2.0); // rad/s double deadband = params_.get("servo_deadband", 0.005); // rad // ── Read latest target and feedback ───────────────────────── // IMPORTANT: always recv() every tick to drain buffers even if // you don't use the value this tick. 
if (auto target = target_sub_->recv()) { target_pos_ = target->get()->linear; } if (auto feedback = feedback_sub_->recv()) { prev_pos_ = current_pos_; current_pos_ = feedback->get()->linear; has_feedback_ = true; no_feedback_count_ = 0; } else if (++no_feedback_count_ >= 200) { // 200 silent ticks = 2 s at 100 Hz has_feedback_ = false; // feedback lost: fall back to zero effort below } // ── Safety: do not command without fresh position feedback ── if (!has_feedback_) { if (no_feedback_count_ % 200 == 0) { // warn every 2s at 100 Hz horus::log::warn("servo", "No feedback for 2s — holding zero effort for safety"); } horus::msg::ServoCommand cmd{}; cmd.target_position = static_cast<float>(target_pos_); cmd.target_velocity = 0.0f; cmd.max_force = 0.0f; // zero effort without feedback motor_pub_->send(cmd); return; } // ── Deadband: ignore errors smaller than encoder resolution ─ double error = target_pos_ - current_pos_; if (std::abs(error) < deadband) { // Close enough — command zero velocity and hold the current position horus::msg::ServoCommand cmd{}; cmd.target_position = static_cast<float>(current_pos_); cmd.target_velocity = 0.0f; cmd.max_force = static_cast<float>(max_effort); motor_pub_->send(cmd); return; } // ── Proportional control ──────────────────────────────────── double effort = std::clamp(kp * error, -max_effort, max_effort); // ── Velocity limiting ─────────────────────────────────────── // Estimate current velocity from position delta double dt = 0.01; // 100 Hz tick rate double measured_vel = (current_pos_ - prev_pos_) / dt; // Clamp effort to keep velocity under max_vel // If already moving faster than max_vel, reduce effort if (std::abs(measured_vel) > max_vel) { double brake = std::copysign(max_effort * 0.5, -measured_vel); effort = brake; // actively brake if overspeeding } else { // Limit implied acceleration double implied_vel = std::abs(effort) / kp; if (implied_vel > max_vel) { effort = std::copysign(max_vel * kp, effort); effort = std::clamp(effort, -max_effort, max_effort); } } // ── Publish servo command ─────────────────────────────────── horus::msg::ServoCommand cmd{}; cmd.target_position =
static_cast<float>(target_pos_); cmd.target_velocity = static_cast<float>(effort); cmd.max_force = static_cast<float>(max_effort); motor_pub_->send(cmd); // ── Diagnostics at 2 Hz (every 50th tick at 100 Hz) ──────── if (++tick_count_ % 50 == 0) { char buf[196]; std::snprintf(buf, sizeof(buf), "target=%.3f actual=%.3f error=%.4f effort=%.2f " "vel=%.2f kp=%.1f max_eff=%.1f", target_pos_, current_pos_, error, effort, measured_vel, kp, max_effort); horus::log::info("servo", buf); } } void enter_safe_state() override { // Zero effort on safety event — servo coasts to stop under friction. // Holding position under power during an emergency is dangerous: // a jammed servo draws maximum current and can overheat or break // mechanical linkages. horus::msg::ServoCommand stop{}; stop.target_position = static_cast<float>(current_pos_); stop.target_velocity = 0.0f; stop.max_force = 0.0f; motor_pub_->send(stop); horus::blackbox::record("servo", "Entered safe state, effort zeroed, servo coasting"); } void set_params(horus::Params& p) { params_ = std::move(p); } private: horus::Subscriber* target_sub_; horus::Subscriber* feedback_sub_; horus::Publisher* motor_pub_; horus::Params params_; // State double target_pos_ = 0; double current_pos_ = 0; double prev_pos_ = 0; bool has_feedback_ = false; int no_feedback_count_ = 0; int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(100_hz).name("servo_demo"); // Set initial parameters (adjustable at runtime via `horus param set`) horus::Params params; params.set("servo_kp", 2.0); // proportional gain params.set("servo_max_effort", 5.0); // Nm (set to 80% of servo rated max) params.set("servo_max_velocity", 2.0); // rad/s params.set("servo_deadband", 0.005); // rad (~0.3 degrees) ServoController servo; servo.set_params(params); sched.add(servo) .order(10) .budget(2_ms) .on_miss(horus::Miss::Skip) // skip if overrun — stale effort is dangerous .build(); sched.spin(); } ``` ### Gain Tuning Guide **Start with low Kp** (0.5).
Command a small step (e.g., 0.1 rad) and watch the response: - **Slow, never reaches target**: increase Kp. Double it until the servo reaches the target within 0.5 seconds. - **Reaches target but slowly**: Kp is close. Increase by 20% increments. - **Overshoots then settles**: Kp is at the edge. Back off 30% or add derivative damping (see PID recipe). - **Oscillates continuously**: Kp is too high. Halve it. **Set max_effort conservatively**. Read the servo datasheet for stall torque. Set `servo_max_effort` to 60-80% of stall torque. A Dynamixel MX-64 has 6.0 Nm stall torque, so `max_effort = 4.0` is a safe starting point. **Set max_velocity for the application**: | Application | Typical max_vel | |-------------|-----------------| | Camera gimbal | 0.5-1.0 rad/s | | Robotic arm joint | 1.0-3.0 rad/s | | Gripper finger | 0.2-0.5 rad/s | | Pan-tilt turret | 1.5-4.0 rad/s | **Runtime tuning via CLI** (no restart needed): ```bash horus param set servo_kp 3.5 horus param set servo_max_effort 4.0 horus param set servo_max_velocity 1.5 ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Servo oscillates around target | Kp too high for the mechanical system | Reduce `servo_kp` by 50%; add derivative damping (see PID recipe) | | Servo moves slowly, never reaches target | Kp too low or max_effort too low | Increase `servo_kp` or `servo_max_effort`; check for mechanical binding | | Servo jerks on large target changes | Velocity limiting too permissive | Reduce `servo_max_velocity` to limit tracking speed | | Servo does nothing | No feedback data (has_feedback_ stays false) | Verify feedback publisher topic name matches `servo.feedback`; check encoder wiring | | Effort maxes out constantly | Target far from current position | Working correctly but servo needs more travel time; reduce target step size or increase max_effort | | Servo holds position but drifts slowly | Deadband too large | Reduce `servo_deadband` (default 0.005 rad is ~0.3 degrees) | | 
Servo chatters near target | Deadband too small relative to encoder noise | Increase `servo_deadband` to 2x the encoder's noise floor | | Servo runs away after encoder disconnect | Feedback timeout not triggering | The code handles this: zero effort after 200 ticks (2s) without feedback | | Log says "No feedback for 2s" at startup | Feedback publisher not started yet | Normal during system bringup; servo holds zero effort until feedback arrives | ### Design Decisions | Choice | Rationale | |--------|-----------| | `horus::Params` for runtime tuning | Field calibration requires gain changes without restart; impossible to predict optimal gains for every installation | | Position feedback required before commanding | Open-loop servo control risks collisions and mechanical damage; the controller refuses to move without knowing where it is | | Feedback timeout (2s) | If the encoder dies mid-operation, zero effort prevents the servo from driving into a hard stop at maximum torque | | `enter_safe_state()` zeros effort | Holding position under power during an emergency risks overheating, stripped gears, or broken linkages if the servo is jammed | | `Miss::Skip` policy | A stale effort command computed from an old position reading drives the servo based on outdated state, which is worse than one missed tick | | Deadband around target | Eliminates chatter near the setpoint caused by encoder quantization noise; saves power when the servo is already at the target | | Velocity limiting via measured velocity | Prevents jerky motion on large target step changes and actively brakes if the servo is overspeeding | | 100 Hz tick rate | Standard for servo control: fast enough for smooth motion (10 ms latency), slow enough for most encoder hardware | | `order(10)` with 2 ms budget | Servo control runs early in the tick (before logging/diagnostics) with a tight budget to guarantee real-time behavior | ### Variations **Multi-joint arm control** -- control N servos as a coordinated unit: 
```cpp class ArmController : public horus::Node { static constexpr int NUM_JOINTS = 6; horus::Subscriber* target_sub_; horus::Subscriber* feedback_sub_; horus::Publisher* cmd_pub_; double positions_[NUM_JOINTS] = {}; double kp_[NUM_JOINTS] = {3.0, 2.5, 2.5, 4.0, 4.0, 5.0}; // per-joint gains double max_effort_[NUM_JOINTS] = {6.0, 5.0, 5.0, 3.0, 3.0, 2.0}; // Nm per joint void tick() override { if (auto fb = feedback_sub_->recv()) { for (int i = 0; i < NUM_JOINTS; i++) positions_[i] = fb->get()->position[i]; } if (auto target = target_sub_->recv()) { horus::msg::JointState cmd{}; for (int i = 0; i < NUM_JOINTS; i++) { double error = target->get()->position[i] - positions_[i]; cmd.effort[i] = std::clamp(kp_[i] * error, -max_effort_[i], max_effort_[i]); } cmd_pub_->send(cmd); } } }; ``` **Trajectory following** -- interpolate between waypoints for smooth motion with zero velocity at each waypoint: ```cpp void tick() override { double t_elapsed = (tick_count_ - segment_start_) * 0.01; // seconds (10 ms per tick at 100 Hz) double t_norm = std::clamp(t_elapsed / segment_duration_, 0.0, 1.0); // Cubic interpolation (zero velocity at endpoints) double s = 3.0 * t_norm * t_norm - 2.0 * t_norm * t_norm * t_norm; target_pos_ = start_pos_ + s * (end_pos_ - start_pos_); // ... rest of proportional control as above } ``` **PID instead of P-only** -- for faster convergence and zero steady-state error under load: ```cpp double error = target_pos_ - current_pos_; double dt = 0.01; integral_ += error * dt; integral_ = std::clamp(integral_, -0.5, 0.5); // anti-windup double derivative = (error - prev_error_) / dt; double effort = kp * error + ki * integral_ + kd * derivative; prev_error_ = error; ``` **Gravity compensation** -- for vertical joints that fight gravity: ```cpp // Add a position-dependent feedforward term (scaled by the cosine of the joint angle) to counteract gravity torque. // Calibrate by measuring the effort needed to hold the joint still. 
double gravity_comp = 1.2; // Nm, measured experimentally double effort = kp * error + gravity_comp * std::cos(current_pos_); ``` ### Key Takeaways - Always require position feedback before commanding effort -- open-loop servo control is dangerous and risks mechanical damage - `horus::Params` enables runtime tuning without restarting -- essential for field calibration where stopping the system is not an option - Deadband eliminates chatter near the target and saves power by not fighting encoder noise - Velocity limiting prevents jerky motion on large target changes and actively brakes overspeeding servos - `enter_safe_state()` should zero effort, not hold position -- powering the motor during an emergency creates fire and mechanical failure risks - `Miss::Skip` is correct for servo control because a stale error computation drives the servo based on an outdated position reading - Set `max_effort` to 60-80% of the servo's rated stall torque -- leave margin for transient loads --- ## Recipe: Coordinate Transforms (Python) Path: /recipes/transform-frames-python Description: Set up a robot's coordinate frame tree in Python with TransformFrame: static sensor mounts and dynamic joint transforms with point conversion. ## Coordinate Transforms (Python) Build a robot's coordinate frame tree using `horus.TransformFrame`. Register static sensor mounts (camera, LiDAR) and dynamic frames (base link, arm joint). A publisher node updates dynamic transforms each tick, and a consumer node queries transforms to convert points between frames. ### Problem You need to convert sensor data between coordinate frames (e.g., a camera detection into world coordinates) from Python. 
### When To Use - Robots with sensors mounted at known offsets from the base - Articulated arms where end-effector pose depends on joint angles - Any system that needs spatial reasoning between components ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic understanding of 3D transforms (translation + rotation) ### horus.toml ```toml [package] name = "transform-frames-py" version = "0.1.0" description = "Coordinate frame tree with sensor mounts" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Coordinate frame tree: static mounts + dynamic joints + point transforms.""" import math import horus from horus import Node, Scheduler, TransformFrame, Transform, us, ms # ── Shared TransformFrame instance ─────────────────────────── tf = TransformFrame() # ============================================================================ # Node 1: FramePublisher — registers and updates the coordinate frame tree # ============================================================================ tick_pub = [0] def publisher_init(node): """Register the frame hierarchy. 
Frame tree: world (root) └── base_link (dynamic — robot moves in world) ├── camera_link (static — bolted to chassis) ├── lidar_link (static — mounted on top) └── arm_link (dynamic — joint rotates) └── end_effector (dynamic — tool tip) """ # Root frame tf.register_frame("world") # Robot base — dynamic, position changes as robot drives tf.register_frame("base_link", parent="world") # Camera — static mount: 10cm forward, 30cm up, tilted down 15 degrees camera_tf = Transform.from_euler( translation=[0.1, 0.0, 0.3], rpy=[0.0, 0.26, 0.0], # roll=0, pitch=15deg, yaw=0 ) tf.register_static_frame("camera_link", camera_tf, parent="base_link") # LiDAR — static mount: centered on top, 40cm up lidar_tf = Transform.from_translation([0.0, 0.0, 0.4]) tf.register_static_frame("lidar_link", lidar_tf, parent="base_link") # Arm link — dynamic, rotates around Z axis tf.register_frame("arm_link", parent="base_link") # End effector — dynamic, extends from arm tf.register_frame("end_effector", parent="arm_link") print(f"Frame tree registered — {tf.frame_count()} frames") print(tf.format_tree()) def publisher_tick(node): tick_pub[0] += 1 t = tick_pub[0] * 0.01 # 100 Hz -> 10 ms per tick # Update base_link: robot drives in a circle radius = 2.0 speed = 0.2 # rad/s base_x = radius * math.cos(speed * t) base_y = radius * math.sin(speed * t) base_yaw = speed * t + math.pi / 2 # face tangent direction base_tf = Transform.from_euler( translation=[base_x, base_y, 0.0], rpy=[0.0, 0.0, base_yaw], ) tf.update_transform("base_link", base_tf) # Update arm_link: joint sweeps back and forth arm_angle = 0.8 * math.sin(t * 0.5) # +/- 0.8 rad arm_tf = Transform.from_euler( translation=[0.15, 0.0, 0.2], rpy=[0.0, 0.0, arm_angle], ) tf.update_transform("arm_link", arm_tf) # Update end_effector: extends from arm tip ee_tf = Transform.from_translation([0.3, 0.0, 0.0]) # 30cm arm tf.update_transform("end_effector", ee_tf) if tick_pub[0] % 100 == 0: print( f"[FRAMES] base=({base_x:.2f}, {base_y:.2f}), " 
f"arm_angle={arm_angle:.2f} rad" ) # ============================================================================ # Node 2: FrameUser — queries transforms and converts sensor data # ============================================================================ tick_user = [0] def user_tick(node): tick_user[0] += 1 # Simulate a detection at a fixed point in camera frame # (1.5m forward, 0.2m left from camera) detection_camera = [1.5, 0.2, 0.0] # Transform the detection from camera_link to world frame if tf.can_transform("camera_link", "world"): world_point = tf.transform_point("camera_link", "world", detection_camera) node.send("detection.world", { "x": world_point[0], "y": world_point[1], "z": world_point[2], }) if tick_user[0] % 100 == 0: print( f"[USER] Camera ({detection_camera[0]:.2f}, " f"{detection_camera[1]:.2f}, {detection_camera[2]:.2f}) " f"-> World ({world_point[0]:.2f}, {world_point[1]:.2f}, " f"{world_point[2]:.2f})" ) # Query end effector position in world frame if tick_user[0] % 100 == 0 and tf.can_transform("end_effector", "world"): ee_tf = tf.tf("end_effector", "world") ee_pos = ee_tf.translation print( f"[USER] End effector in world: " f"({ee_pos[0]:.2f}, {ee_pos[1]:.2f}, {ee_pos[2]:.2f})" ) # Print the frame chain for debugging chain = tf.frame_chain("end_effector", "world") print(f"[USER] Frame chain: {' -> '.join(chain)}") # Check for stale transforms (sensor disconnect detection) if tick_user[0] % 200 == 0: if tf.is_stale("base_link", max_age_sec=0.5): print("WARNING: base_link transform is stale — odometry may be disconnected") def user_shutdown(node): print(f"FrameUser: {tick_user[0]} ticks") print(tf.format_tree()) # ============================================================================ # Main — two nodes sharing the same TransformFrame # ============================================================================ publisher_node = Node( name="FramePublisher", tick=publisher_tick, init=publisher_init, rate=100, # 100 Hz — updates dynamic 
transforms order=0, # Runs first pubs=[], subs=[], on_miss="skip", ) user_node = Node( name="FrameUser", tick=user_tick, shutdown=user_shutdown, rate=50, # 50 Hz — queries are cheaper than updates order=1, # Runs after publisher pubs=["detection.world"], subs=[], on_miss="warn", ) if __name__ == "__main__": horus.run(publisher_node, user_node) ``` ### Expected Output ```text Frame tree registered — 6 frames world └── base_link ├── camera_link [static] ├── lidar_link [static] └── arm_link └── end_effector [HORUS] Scheduler running — tick_rate: 1000 Hz [HORUS] Node "FramePublisher" started (100 Hz) [HORUS] Node "FrameUser" started (50 Hz) [FRAMES] base=(2.00, 0.00), arm_angle=0.00 rad [USER] Camera (1.50, 0.20, 0.00) -> World (2.07, 0.22, 0.30) [USER] End effector in world: (2.43, 0.03, 0.20) [USER] Frame chain: end_effector -> arm_link -> base_link -> world [FRAMES] base=(1.96, 0.39), arm_angle=0.39 rad [USER] Camera (1.50, 0.20, 0.00) -> World (1.89, 0.65, 0.30) [USER] End effector in world: (2.18, 0.57, 0.20) [USER] Frame chain: end_effector -> arm_link -> base_link -> world ^C FrameUser: 500 ticks world └── base_link ├── camera_link [static] ├── lidar_link [static] └── arm_link └── end_effector [HORUS] Shutting down... ``` ### Key Points - **`TransformFrame`** is shared between nodes — both publisher and user reference the same tree. The underlying Rust implementation is lock-free, so concurrent access is safe. - **Static vs dynamic frames**: `camera_link` and `lidar_link` use `register_static_frame()` because they are bolted to the chassis. Dynamic frames use `register_frame()` + `update_transform()`. - **`Transform.from_euler(translation, rpy)`** creates a transform from position and roll/pitch/yaw angles. - **`transform_point(src, dst, point)`** converts a 3D point from one frame to another, walking the tree automatically. - **`can_transform()`** checks before querying — avoids exceptions when frames are not yet updated. 
- **`is_stale()`** catches disconnected sensors or frozen publishers without polling. - **`frame_chain()`** returns the path between frames — useful for debugging unexpected transform results. ### Variations - **URDF loading**: Parse a URDF file at init to build the frame tree automatically - **Multi-process frames**: Publish `TransformStamped` messages on a shared topic so multiple processes share transforms - **Time-based queries**: Use `tf_at()` to query transforms at a specific timestamp for interpolation - **Visualization**: Publish frame positions to a topic for rendering in horus-sim3d ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `HorusTransformError` | Frame not registered or parent chain incomplete | Verify all frames are registered in `init()` with correct parent names | | Stale transform warnings | Publisher stopped updating a dynamic frame | Check that `update_transform()` runs every tick for dynamic frames | | Wrong point coordinates | Frame chain goes through unexpected intermediate frames | Use `frame_chain()` to debug the actual path | | Static transforms change | Using `update_transform()` on a static frame | Use `register_static_frame()` for fixed sensor mounts | | `can_transform()` returns False | Dynamic frame never received its first update | Ensure publisher `init()` or first `tick()` updates all dynamic frames | --- ## See Also - [Coordinate Transform Tree (Rust)](/recipes/transform-frames) — Rust version with `FrameId` caching - [TransformFrame Concepts](/concepts/transform-frame) — How frames work --- ## Recipe: Coordinate Transform Tree Path: /recipes/transform-frames Description: Set up a robot's coordinate frame tree with base_link, sensors, and end effector frames. # Recipe: Coordinate Transform Tree Build a robot's coordinate frame tree with static sensor mounts and dynamic joint transforms. 
A `FramePublisher` node registers the frame hierarchy (world -> base_link -> {camera_link, lidar_link, arm_link -> end_effector}) and updates dynamic transforms each tick to simulate a moving robot. A `FrameUser` node queries transforms between frames and converts sensor data from camera coordinates into the world frame. ### Problem You need to convert sensor data between coordinate frames (e.g., a camera detection into world coordinates) across a multi-joint robot. ### When To Use - Robots with sensors mounted at known offsets from the base - Articulated arms where end-effector pose depends on joint angles - Any system that needs to reason about spatial relationships between components ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic understanding of 3D transforms (translation + rotation) ### horus.toml ```toml [package] name = "transform_frames" version = "0.1.0" language = "rust" [dependencies] horus = "0.1" ``` ### Complete Code ### Expected Output ``` world └── base_link ├── camera_link [static] ├── lidar_link [static] └── arm_link └── end_effector [INFO] Frame tree registered — 6 frames [INFO] Frame user ready — will transform detections from camera to world [INFO] Transform frame system running — Ctrl+C to stop [INFO] [FRAMES] base=(2.00, 0.00), arm_angle=0.00 rad [INFO] [USER] Detection in camera: (1.50, 0.20, 0.00) -> world: (2.07, 0.22, 0.30) [INFO] [USER] End effector in world: (2.43, 0.03, 0.20) [INFO] [USER] Frame chain: end_effector -> arm_link -> base_link -> world [INFO] [FRAMES] base=(1.96, 0.39), arm_angle=0.39 rad [INFO] [USER] Detection in camera: (1.50, 0.20, 0.00) -> world: (1.89, 0.65, 0.30) [INFO] [USER] End effector in world: (2.18, 0.57, 0.20) [INFO] [USER] Frame chain: end_effector -> arm_link -> base_link -> world ^C [INFO] Frame publisher shutdown — 6 total, 2 static, 4 dynamic, depth 4 [INFO] Frame user shutdown after 500 ticks world └── base_link ├── camera_link [static] ├── lidar_link [static] └── arm_link 
└── end_effector ``` ### Key Points - **`Arc<TransformFrame>`** shared across nodes: Both the publisher and user hold a reference to the same tree. `TransformFrame` is lock-free internally, so concurrent reads and writes are safe without mutexes. - **Static vs dynamic frames**: `camera_link` and `lidar_link` use `.static_transform()` because they are bolted to the chassis. `base_link`, `arm_link`, and `end_effector` are dynamic and updated every tick. - **`FrameId` caching**: The publisher caches `FrameId` values at init and uses `update_transform_by_id()` in the hot loop. This avoids string-based name resolution (~200ns) and uses the faster ID path (~50ns). - **No `sleep()` calls**: All timing is managed by `.rate()`. The publisher runs at 100Hz and the user at 50Hz. The scheduler handles rate differences. - **Staleness detection**: `is_stale_now()` checks whether a frame's transform data is older than a threshold, which catches disconnected sensors or frozen publishers without polling. ### Variations - **URDF loading**: Parse a URDF file at init to build the frame tree automatically instead of manual `add_frame()` calls - **Network-distributed frames**: Publish `TFMessage` on a shared topic so multiple processes share the same tree - **Interpolation**: Use `query().at_time(t)` to interpolate between transform updates for smoother downstream data - **Visualization**: Publish frame positions to a `frames.viz` topic for rendering in horus-sim3d or an external viewer ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `Transform not available` error | Frame not registered or parent chain incomplete | Verify all frames are added in `init()` with correct parent names | | Stale transform warnings | Publisher stopped updating a dynamic frame | Check that `update_transform()` runs every tick for dynamic frames | | Wrong point coordinates | Frame chain goes through unexpected intermediate frames | Use `query().chain()` to debug the actual path | | Performance degrades 
with many frames | String-based lookups in hot loop | Cache `FrameId` at init and use `update_transform_by_id()` | | Static transforms drift | Using `update_transform()` on a static frame | Use `.static_transform()` at registration time for fixed mounts | --- ## See Also - [TransformFrame Concepts](/concepts/transform-frame) — How frames work - [TransformFrame API](/rust/api/transform-frame) — Full API reference --- ## Recipe: Transform Frames (C++) Path: /recipes/transform-frames-cpp Description: Coordinate frame tree for multi-sensor robots ## Transform Frames (C++) Build a coordinate frame tree for a robot with multiple sensors, set static transforms for sensor mounts, update dynamic transforms from localization, and query spatial relationships between frames. ### Problem Your robot has a LiDAR, a camera, and an IMU, each mounted at different positions and orientations on the chassis. A LiDAR point at `(1.5, 0.3)` in the LiDAR frame is useless until you know where that is in the robot's body frame or the world frame. You need a transform tree that tracks the spatial relationship between every sensor and the world, updated in real time as the robot moves. ### How It Works A **transform tree** is a directed acyclic graph (usually a tree) where: - Each **node** is a coordinate frame (e.g., `world`, `base_link`, `lidar`, `camera`) - Each **edge** stores a transform: translation (x, y, z) and rotation (quaternion) - **Static transforms** (sensor mounts) are set once at startup and never change - **Dynamic transforms** (localization) are updated every tick To find the transform between any two frames, HORUS composes transforms along the path in the tree. For example, to get `lidar` in `world` coordinates: ```text T_lidar_in_world = T_base_in_world * T_lidar_in_base ``` The `lookup(source, target)` function does this composition automatically, walking the tree from source to target and multiplying transforms along the way. 
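The composition described above can be sketched without any HORUS dependency. This is a minimal illustration, not the library's implementation: `RigidTransform`, `rotate`, `multiply`, `compose`, `apply`, and `lidar_point_in_world` are hypothetical names introduced here, and the robot pose (2 m, 1 m, 90 degree yaw) is an assumed example value. Quaternions use the `(x, y, z, w)` ordering stated in this recipe.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;  // (x, y, z, w), assumed unit length

// Hypothetical stand-in for one tree edge: child frame expressed in its parent.
struct RigidTransform {
    Vec3 t;  // translation
    Quat q;  // rotation
};

// Rotate v by unit quaternion q: v' = v + 2w(u x v) + 2 u x (u x v), u = (x, y, z).
Vec3 rotate(const Quat& q, const Vec3& v) {
    const Vec3 u{q[0], q[1], q[2]};
    const double w = q[3];
    const Vec3 c1{u[1] * v[2] - u[2] * v[1],
                  u[2] * v[0] - u[0] * v[2],
                  u[0] * v[1] - u[1] * v[0]};
    const Vec3 c2{u[1] * c1[2] - u[2] * c1[1],
                  u[2] * c1[0] - u[0] * c1[2],
                  u[0] * c1[1] - u[1] * c1[0]};
    return {v[0] + 2.0 * (w * c1[0] + c2[0]),
            v[1] + 2.0 * (w * c1[1] + c2[1]),
            v[2] + 2.0 * (w * c1[2] + c2[2])};
}

// Hamilton product a * b (apply b first, then a).
Quat multiply(const Quat& a, const Quat& b) {
    return {a[3] * b[0] + a[0] * b[3] + a[1] * b[2] - a[2] * b[1],
            a[3] * b[1] - a[0] * b[2] + a[1] * b[3] + a[2] * b[0],
            a[3] * b[2] + a[0] * b[1] - a[1] * b[0] + a[2] * b[3],
            a[3] * b[3] - a[0] * b[0] - a[1] * b[1] - a[2] * b[2]};
}

// T_child_in_grandparent = T_parent_in_grandparent * T_child_in_parent
RigidTransform compose(const RigidTransform& parent, const RigidTransform& child) {
    const Vec3 r = rotate(parent.q, child.t);
    return {{parent.t[0] + r[0], parent.t[1] + r[1], parent.t[2] + r[2]},
            multiply(parent.q, child.q)};
}

// Map a point from the transform's child frame into its parent frame.
Vec3 apply(const RigidTransform& T, const Vec3& p) {
    const Vec3 r = rotate(T.q, p);
    return {r[0] + T.t[0], r[1] + T.t[1], r[2] + T.t[2]};
}

// Demo: robot at (2, 1) facing +Y (90 degree yaw); LiDAR mounted 20 cm forward, 30 cm up.
Vec3 lidar_point_in_world() {
    const double half = std::acos(-1.0) / 4.0;  // half of 90 degrees, in radians
    const RigidTransform base_in_world{{2.0, 1.0, 0.0}, {0.0, 0.0, std::sin(half), std::cos(half)}};
    const RigidTransform lidar_in_base{{0.20, 0.0, 0.30}, {0.0, 0.0, 0.0, 1.0}};
    const RigidTransform lidar_in_world = compose(base_in_world, lidar_in_base);
    return apply(lidar_in_world, {1.5, 0.3, 0.0});
}
```

With these assumed values, the LiDAR point `(1.5, 0.3, 0)` lands at approximately `(1.7, 2.7, 0.3)` in the world frame. The rotation term matters: adding only the translation would give a different point whenever the robot is not aligned with the world axes.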
**Quaternion convention**: HORUS uses `(x, y, z, w)` quaternion ordering. An identity rotation is `{0, 0, 0, 1}`. A 5-degree pitch (rotation about the Y axis) is `{0, sin(2.5deg), 0, cos(2.5deg)} = {0, 0.0436, 0, 0.999}`; a yaw (rotation about the Z axis) puts the sine term in the z-component instead. ### When To Use - Any robot with multiple sensors mounted at different positions - When you need sensor fusion that requires aligning data into a common frame - When localization updates the robot's pose in the world frame ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) ### Complete Code ```cpp #include #include #include using namespace horus::literals; class LocalizationNode : public horus::Node { public: LocalizationNode(horus::TransformFrame& tf) : Node("localizer"), tf_(tf) { odom_sub_ = subscribe("odom"); } void tick() override { if (auto odom = odom_sub_->recv()) { // Update base_link position in world frame from odometry double x = odom->get()->pose.x; double y = odom->get()->pose.y; double theta = odom->get()->pose.theta; // Convert heading to quaternion (rotation around Z axis) double half_angle = theta / 2.0; tf_.update("base_link", {x, y, 0.0}, // translation {0, 0, std::sin(half_angle), std::cos(half_angle)}, // quaternion (x,y,z,w) horus::now_ns()); // timestamp } } void enter_safe_state() override { // Stop updating transforms — downstream nodes see stale data // and should interpret it as "localization degraded" horus::blackbox::record("localizer", "Entered safe state, transform updates halted"); } private: horus::TransformFrame& tf_; horus::Subscriber* odom_sub_; }; class SensorProcessorNode : public horus::Node { public: SensorProcessorNode(horus::TransformFrame& tf) : Node("sensor_processor"), tf_(tf) { scan_sub_ = subscribe("lidar.scan"); } void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; // ── Transform a LiDAR point into world coordinates ────────── // Example: closest obstacle point 
in LiDAR frame float min_range = 999.0f; int min_idx = 0; for (int i = 0; i < 360; i++) { if (scan->get()->ranges[i] > 0.05f && scan->get()->ranges[i] < min_range) { min_range = scan->get()->ranges[i]; min_idx = i; } } // Point in LiDAR frame: (range * cos(angle), range * sin(angle), 0) double angle_rad = min_idx * M_PI / 180.0; double lx = min_range * std::cos(angle_rad); double ly = min_range * std::sin(angle_rad); // Look up the transform from lidar to world if (auto t = tf_.lookup("lidar", "world")) { // t->translation gives lidar origin in world frame // Full point transform: world_point = R * lidar_point + t // For 2D (z=0): approximate with translation only double wx = t->translation[0] + lx; double wy = t->translation[1] + ly; if (++tick_count_ % 50 == 0) { char buf[128]; std::snprintf(buf, sizeof(buf), "Closest obstacle: lidar=(%.2f, %.2f) world=(%.2f, %.2f)", lx, ly, wx, wy); horus::log::info("tf_demo", buf); } } // ── Check sensor alignment ────────────────────────────────── if (tf_.can_transform("camera", "lidar")) { // Camera-to-lidar transform exists — can align detections // Use for matching camera bounding boxes with LiDAR point clouds } } private: horus::TransformFrame& tf_; horus::Subscriber* scan_sub_; int tick_count_ = 0; }; int main() { horus::TransformFrame tf; // ── Build frame tree ──────────────────────────────────────────── // world // +-- base_link (dynamic — updated by localization) // +-- lidar (static — front of robot, 20cm forward, 30cm up) // +-- camera (static — front, 10cm forward, 25cm up, 5 deg pitch down) // +-- imu (static — center, 15cm up) tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.register_frame("lidar", "base_link"); tf.register_frame("camera", "base_link"); tf.register_frame("imu", "base_link"); // Static transforms: sensor mounts (set once, timestamp 0 = static) // These must match the physical mounting on your robot tf.update("lidar", {0.20, 0.0, 0.30}, {0, 0, 0, 1}, 0); tf.update("camera", 
{0.10, 0.0, 0.25}, {0, 0.0436, 0, 0.999}, 0); // ~5 deg pitch (rotation about Y) tf.update("imu", {0.0, 0.0, 0.15}, {0, 0, 0, 1}, 0); horus::log::info("tf_demo", "Frame tree built: world -> base_link -> {lidar, camera, imu}"); // ── Build scheduler ───────────────────────────────────────────── horus::Scheduler sched; sched.tick_rate(50_hz).name("tf_demo"); // Localization updates base_link in world frame LocalizationNode localizer(tf); sched.add(localizer) .order(0) // run first — other nodes need fresh transforms .budget(2_ms) .on_miss(horus::Miss::Warn) // warn on overrun — stale transforms degrade accuracy .build(); // Sensor processor queries transforms to map LiDAR points into world SensorProcessorNode processor(tf); sched.add(processor) .order(10) // after localization — uses updated transforms .budget(5_ms) .on_miss(horus::Miss::Skip) // skip if overrun — stale LiDAR data is not useful .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `lookup()` returns null | Frame tree is disconnected or frames not registered | Check that parent frames are registered before children | | LiDAR points in wrong world position | Static transform does not match physical sensor mount | Measure sensor offsets on the actual robot hardware | | Transform stale / outdated | Localization node runs after consumer node | Set localization `order(0)` so it runs first each tick | | Rotation wrong | Quaternion components in wrong order | HORUS uses (x, y, z, w) — identity is `{0, 0, 0, 1}` | | Frame tree breaks on second run | Stale SHM from previous run | Run `horus clean --shm` between runs | | Camera-lidar alignment off | Camera pitch angle wrong in static transform | Recalculate quaternion: pitch rotates about Y, so `sin(angle/2)` goes in the y-component and `cos(angle/2)` in w | ### Design Decisions | Choice | Rationale | |--------|-----------| | Static transforms with timestamp 0 | Sensor mounts do not change — setting once avoids redundant updates | | Dynamic base_link updated per tick | Robot pose changes 
continuously from localization | | `horus::Node` subclass for consumers | State (transform references) persists cleanly across ticks | | Localization at `order(0)` | Downstream nodes need fresh transforms — localization must run first | | `can_transform()` before complex queries | Graceful handling when frame tree is incomplete during startup | | Quaternion rotation (not Euler angles) | No gimbal lock, composable, compact (4 floats vs 3x3 matrix) | | Tree structure (not graph) | Single parent per frame guarantees unique transform paths | ### Variations **Robot arm with end effector** — chain of joint transforms: ```cpp // Each joint is a child of the previous joint tf.register_frame("base_link", "world"); tf.register_frame("shoulder", "base_link"); tf.register_frame("elbow", "shoulder"); tf.register_frame("wrist", "elbow"); tf.register_frame("end_effector", "wrist"); // Update joint angles each tick (quaternion from joint angle) void update_joint(const std::string& name, double angle_rad) { double half = angle_rad / 2.0; tf_.update(name, {0, 0, link_length_}, {0, 0, std::sin(half), std::cos(half)}, horus::now_ns()); } ``` **Multi-robot transforms** — separate subtrees per robot: ```cpp tf.register_frame("world"); tf.register_frame("robot_a", "world"); tf.register_frame("robot_a/lidar", "robot_a"); tf.register_frame("robot_b", "world"); tf.register_frame("robot_b/lidar", "robot_b"); // Now you can lookup robot_a/lidar -> robot_b/lidar for fleet coordination ``` **Publish transforms on a topic** for cross-process sharing: ```cpp // Publisher node horus::msg::TFMessage tf_msg{}; tf_msg.frame = "base_link"; tf_msg.parent = "world"; tf_msg.translation = {x, y, 0}; tf_msg.rotation = {0, 0, sin_half, cos_half}; tf_pub_->send(tf_msg); // Subscriber node in another process if (auto msg = tf_sub_->recv()) { tf_.update(msg->get()->frame, msg->get()->translation, msg->get()->rotation, horus::now_ns()); } ``` ### Key Takeaways - Register the frame tree once at startup; update 
static transforms with timestamp 0 - Localization must run before any node that queries transforms (`order(0)`) - `lookup()` composes transforms along the tree path automatically — no manual chain multiplication - Measure sensor offsets on the actual robot — CAD models drift from physical reality - `can_transform()` is cheap — use it to guard lookups during startup when the tree may be incomplete --- ## Recipe: Telemetry Logger (C++) Path: /recipes/telemetry-logger-cpp Description: Log robot state to console and horus log at configurable intervals ## Telemetry Logger (C++) A low-priority observer node that subscribes to all robot state topics and logs summary telemetry at a configurable rate. Useful for debugging, monitoring, and post-run analysis without affecting real-time performance. ### Problem You need to see what your robot is doing: where it is, what it is commanding, what its sensors read. But logging at full sensor rate (100-200 Hz) floods the terminal, wastes CPU on string formatting, and can cause real-time nodes to miss deadlines if the logger takes too long. You need a lightweight telemetry tap that runs at the lowest priority and logs at a human-readable rate. ### How It Works The telemetry logger runs at the **lowest execution priority** (`order(100)`) so it never delays real-time nodes. It subscribes to key topics (odometry, velocity commands, IMU) and samples them at a configurable interval. **Rate-limited logging**: Instead of logging every tick, the node counts ticks and only formats output every Nth tick. At 100 Hz with `log_every_n = 100`, you get 1 Hz logging. This approach is cheaper than running the node at 1 Hz because: 1. The node still drains subscriber buffers every tick (prevents stale data buildup) 2. No separate clock or timer needed 3. The log interval is always an integer multiple of the tick period (N ticks, so 100 x 10 ms = 1 s in this example) **Why `horus::log::info()` instead of `printf`?** The horus log system is non-blocking and thread-safe. 
Output appears in `horus log` CLI, can be filtered by tag, and is captured by the blackbox for post-mortem replay. `printf` blocks on stdout and can cause deadline misses in nodes that share the same terminal. ### When To Use - Debugging any multi-node robot system - Monitoring robot state during integration testing - Recording summary telemetry for post-run analysis ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) ### Complete Code ```cpp #include #include #include using namespace horus::literals; class TelemetryLogger : public horus::Node { public: TelemetryLogger(int log_every_n = 100) : Node("telemetry"), log_every_n_(log_every_n) { // Subscribe to all topics we want to monitor odom_sub_ = subscribe("odom"); cmd_sub_ = subscribe("cmd_vel"); imu_sub_ = subscribe("imu.data"); estop_sub_ = subscribe("emergency.stop"); } void tick() override { // ── Always drain subscriber buffers ───────────────────────── // Even when not logging, drain to prevent stale data buildup. // This ensures the next log entry shows the *latest* data, // not data that has been sitting in the buffer for seconds. 
auto odom = odom_sub_->recv(); auto cmd = cmd_sub_->recv(); auto imu = imu_sub_->recv(); auto estop = estop_sub_->recv(); // Cache latest values for logging if (odom) { last_x_ = odom->get()->pose.x; last_y_ = odom->get()->pose.y; last_theta_ = odom->get()->pose.theta; odom_alive_ = true; } if (cmd) { last_linear_ = cmd->get()->linear; last_angular_ = cmd->get()->angular; cmd_alive_ = true; } if (imu) { last_accel_z_ = imu->get()->linear_acceleration[2]; last_gyro_z_ = imu->get()->angular_velocity[2]; imu_alive_ = true; } if (estop) { estop_engaged_ = (estop->get()->engaged != 0); } // ── Rate-limited logging ──────────────────────────────────── tick_count_++; if (tick_count_ % log_every_n_ != 0) return; // ── Format and publish summary ────────────────────────────── char buf[256]; std::snprintf(buf, sizeof(buf), "pos=(%.2f, %.2f) hdg=%.1f deg " "cmd=(%.2f m/s, %.2f rad/s) " "az=%.2f m/s^2 gz=%.2f rad/s " "estop=%s", last_x_, last_y_, last_theta_ * 180.0 / M_PI, last_linear_, last_angular_, last_accel_z_, last_gyro_z_, estop_engaged_ ? "ACTIVE" : "clear"); horus::log::info("telemetry", buf); // ── Health check: warn on missing topics ──────────────────── if (!odom_alive_) horus::log::warn("telemetry", "No odom data received"); if (!cmd_alive_) horus::log::warn("telemetry", "No cmd_vel data received"); if (!imu_alive_) horus::log::warn("telemetry", "No IMU data received"); // Record to blackbox at lower rate (every 10th log = 0.1 Hz) if (tick_count_ % (log_every_n_ * 10) == 0) { char bb[128]; std::snprintf(bb, sizeof(bb), "pos=(%.2f,%.2f) cmd=%.2f estop=%d", last_x_, last_y_, last_linear_, estop_engaged_ ? 
1 : 0); horus::blackbox::record("telemetry", bb); } } void enter_safe_state() override { // Logger has no actuators, but record the event for post-mortem horus::blackbox::record("telemetry", "System entered safe state — final snapshot: " "see preceding log entries for last known state"); horus::log::warn("telemetry", "Safe state activated by scheduler"); } private: horus::Subscriber* odom_sub_; horus::Subscriber* cmd_sub_; horus::Subscriber* imu_sub_; horus::Subscriber* estop_sub_; // Configurable log interval int log_every_n_; // Cached latest values double last_x_ = 0, last_y_ = 0, last_theta_ = 0; float last_linear_ = 0, last_angular_ = 0; double last_accel_z_ = 0, last_gyro_z_ = 0; bool estop_engaged_ = false; // Health tracking bool odom_alive_ = false; bool cmd_alive_ = false; bool imu_alive_ = false; int tick_count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(100_hz).name("telemetry_demo"); // Log every 100 ticks = 1 Hz output at 100 Hz tick rate TelemetryLogger logger(100); sched.add(logger) .order(100) // lowest priority — never delay real-time nodes .on_miss(horus::Miss::Skip) // skip if overrun — logging is expendable .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Log output floods terminal | `log_every_n` too small | Increase to 100+ (1 Hz at 100 Hz tick) | | Logged data is stale | Not draining subscribers every tick | Call `recv()` every tick, not just on log ticks | | Real-time nodes miss deadlines | Logger at too high priority or `printf` blocking stdout | Set `order(100)` and use `horus::log::info()` | | Topic shows "no data" forever | Topic name mismatch between publisher and subscriber | Verify topic names match exactly (dots, not slashes) | | Blackbox entries too large | Logging too much detail in blackbox | Keep blackbox entries short; detailed logging goes to `horus::log` | | Memory grows over time | Subscriber buffers not drained | Ensure `recv()` is called every tick for 
all subscribers | ### Design Decisions | Choice | Rationale | |--------|-----------| | `order(100)` — lowest priority | Logging must never delay safety, control, or sensor nodes | | `Miss::Skip` policy | Logging is expendable — skipping a log entry is always acceptable | | Drain every tick, log every Nth | Fresh data on every log entry; no stale buffer accumulation | | `horus::log::info()` over printf | Non-blocking, thread-safe, filterable, captured by blackbox | | Cache values between log entries | Avoid repeated null checks; always have something to log | | Health check warnings | Surfaces broken topics early — one of the most useful debugging aids | | Blackbox recording at 0.1 Hz | Lightweight persistent record for post-mortem analysis without overhead | ### Variations **CSV file logging** — write to a file for offline analysis: ```cpp void tick() override { // ... drain subscribers and cache values ... if (tick_count_ % log_every_n_ != 0) return; // Append to CSV file FILE* f = std::fopen("/tmp/telemetry.csv", "a"); if (f) { std::fprintf(f, "%d,%.4f,%.4f,%.4f,%.4f,%.4f\n", tick_count_, last_x_, last_y_, last_theta_, last_linear_, last_angular_); std::fclose(f); } } ``` **Per-topic rate monitoring** — track publish rates to detect stalled sensors: ```cpp int odom_count_ = 0, cmd_count_ = 0, imu_count_ = 0; void tick() override { if (odom_sub_->recv()) odom_count_++; if (cmd_sub_->recv()) cmd_count_++; if (imu_sub_->recv()) imu_count_++; if (tick_count_ % 100 == 0) { char buf[128]; std::snprintf(buf, sizeof(buf), "rates: odom=%d Hz cmd=%d Hz imu=%d Hz", odom_count_, cmd_count_, imu_count_); horus::log::info("telemetry", buf); odom_count_ = cmd_count_ = imu_count_ = 0; } } ``` **Conditional detail logging** — verbose output only when anomalies detected: ```cpp if (estop_engaged_) { // Log extra detail during E-stop events horus::log::warn("telemetry", "E-stop active — dumping full state"); // ... log every subscriber value in detail ... 
} else { // Normal summary only horus::log::info("telemetry", summary_buf); } ``` ### Key Takeaways - The telemetry logger runs at `order(100)` with `Miss::Skip` — it must never interfere with real-time nodes - Drain all subscriber buffers every tick, even when not logging, to prevent stale data - Use `horus::log::info()` instead of printf — it is non-blocking and captured by the blackbox - Health check warnings ("no odom data received") are one of the most useful debugging features - Rate-limited logging (every Nth tick) is simpler and cheaper than running a separate low-rate node --- ## Recipe: Multi-Rate Pipeline (C++) Path: /recipes/multi-rate-pipeline-cpp Description: Run sensors at different rates in a single scheduler using tick dividers and deterministic ordering. ## Multi-Rate Pipeline (C++) Run an IMU at 200 Hz, a controller at 100 Hz, a camera pipeline at 30 Hz, and a logger at 1 Hz — all in one scheduler with deterministic ordering and per-node budgets. Each node is a `horus::Node` subclass with its own tick divider. ### Problem Your robot has sensors and actuators that operate at fundamentally different rates. The IMU publishes at 200 Hz, the depth camera at 30 Hz, the control loop needs 100 Hz, and diagnostics only need 1 Hz. Running separate schedulers wastes resources and makes ordering non-deterministic. You need a single scheduler where every node runs at its natural rate with guaranteed execution order. ### How It Works A HORUS scheduler ticks at one master rate — typically the rate of the **fastest node** in the pipeline. Slower nodes use **tick dividers**: an internal counter that increments every tick and only executes the node's real work when the counter hits the divisor. For example, if the scheduler runs at 200 Hz and a node needs 100 Hz, that node's divider is 2 — it does work every 2nd tick and returns immediately on the others. 
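The divider mechanism described above can be sketched in isolation, without any HORUS types. This is a minimal illustration, not the framework's implementation: `TickDivider` and `active_ticks` are hypothetical names introduced here, and the counter logic simply mirrors the "count every tick, work on every Nth" description.

```cpp
#include <cassert>

// Hypothetical helper (not a HORUS type): gates work to every Nth master tick.
class TickDivider {
public:
    explicit TickDivider(int divisor) : divisor_(divisor) {
        assert(divisor_ >= 1 && "divisor must be at least 1");
    }

    // Call exactly once per scheduler tick; returns true on active ticks.
    bool should_run() { return ++counter_ % divisor_ == 0; }

private:
    int divisor_;
    int counter_ = 0;
};

// Counts how many of `master_ticks` scheduler ticks are active for a divisor.
int active_ticks(int divisor, int master_ticks) {
    TickDivider divider(divisor);
    int active = 0;
    for (int tick = 0; tick < master_ticks; ++tick) {
        if (divider.should_run()) ++active;
    }
    return active;
}
```

Over one second of a 200 Hz master (200 ticks), `active_ticks(2, 200)` yields 100 active ticks, which matches the 100 Hz node in the example above.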
The key formula is: ```text divisor = master_rate / node_rate ``` When the rates do not divide evenly, round the result to the nearest integer. So at 200 Hz master: IMU divisor = 1 (every tick), controller divisor = 2 (every other tick), camera divisor = 7 (200 / 30 ≈ 6.67, rounded to 7, so it runs every 7th tick at ~28.6 Hz), logger divisor = 200 (once per second). **Ordering** is deterministic within each tick. Nodes with lower `order()` values run first. This guarantees that sensor producers run before consumers: IMU (order 0) publishes before the controller (order 10) reads, and the controller publishes before the logger (order 100) reads. Even nodes that skip their tick via the divider still occupy their slot — they just return immediately. **Budget enforcement** applies per tick, not per active cycle. If a node is in its skip phase, it consumes near-zero time. If it is in its active phase and overruns the budget, the miss policy kicks in. This means you can give tight budgets to critical nodes and generous budgets to non-critical ones without them interfering. ### When To Use - Any robot with sensors at different rates (IMU at 200 Hz, camera at 30 Hz, LiDAR at 10 Hz) - When you need deterministic execution order across all rates - When running multiple schedulers would waste CPU or complicate inter-node communication - Embedded systems where a single event loop is simpler than multi-threaded pipelines ### Prerequisites - HORUS installed ([Installation Guide](/docs/getting-started/installation)) - Basic understanding of nodes and topics ([C++ Quick Start](/docs/getting-started/quick-start-cpp)) - Understanding of [scheduling and ordering](/docs/concepts/core-concepts-scheduling) ### Complete Code ```cpp #include #include #include #include using namespace horus::literals; // ============================================================ // IMU Driver — 200 Hz (every tick) // Reads the IMU hardware and publishes raw measurements. 
// ============================================================ class ImuDriver : public horus::Node { public: ImuDriver() : Node("imu_driver") { imu_pub_ = advertise("imu.data"); } void tick() override { // Real driver would read SPI/I2C here horus::msg::Imu data{}; data.linear_acceleration[0] = 0.02f; // slight forward tilt (m/s^2) data.linear_acceleration[1] = 0.0f; data.linear_acceleration[2] = 9.81f; // gravity data.angular_velocity[0] = 0.001f; // near-zero roll rate (rad/s) data.angular_velocity[1] = 0.0f; data.angular_velocity[2] = 0.05f; // gentle yaw rate during a turn imu_pub_->send(data); } private: horus::Publisher* imu_pub_; }; // ============================================================ // Camera Pipeline — 30 Hz (every ~7th tick at 200 Hz master) // Processes depth frames and publishes detection results. // ============================================================ class CameraPipeline : public horus::Node { public: explicit CameraPipeline(int divisor) : Node("camera_pipeline"), divisor_(divisor) { det_pub_ = advertise("camera.detections"); } void tick() override { // Tick divider: skip ticks that aren't ours if (++tick_counter_ % divisor_ != 0) return; // Simulate depth frame processing — in production this would // run an inference model or stereo matching pipeline float detection_score = 0.87f; // confidence of nearest obstacle float detection_range = 2.4f; // meters to detected object horus::msg::CmdVel det{}; det.linear = detection_range; // pack range into linear field det.angular = detection_score; // pack confidence into angular field det_pub_->send(det); if (++active_count_ % 30 == 0) { horus::log::info("camera", "Frame processed — 30 Hz active cycle"); } } private: horus::Publisher* det_pub_; int divisor_; int tick_counter_ = 0; int active_count_ = 0; }; // ============================================================ // Controller — 100 Hz (every 2nd tick at 200 Hz master) // Fuses IMU + camera detections into velocity commands. 
// ============================================================ class Controller : public horus::Node { public: explicit Controller(int divisor) : Node("controller"), divisor_(divisor) { imu_sub_ = subscribe("imu.data"); det_sub_ = subscribe("camera.detections"); cmd_pub_ = advertise("cmd_vel"); } void tick() override { // IMPORTANT: always drain subscribers every tick, even on skip ticks. // If you only recv() on active ticks, stale messages pile up in the buffer. // Cache what was drained: a message that arrives on a skip tick would // otherwise be discarded before the next active tick can use it. if (auto imu = imu_sub_->recv()) { last_gz_ = imu->get()->angular_velocity[2]; have_imu_ = true; } if (auto det = det_sub_->recv()) { last_range_ = det->get()->linear; last_confidence_ = det->get()->angular; have_det_ = true; } // Tick divider: skip the computation; the buffers were already drained above if (++tick_counter_ % divisor_ != 0) return; // Base forward speed — 0.3 m/s cruising float linear = 0.3f; float angular = 0.0f; // If the latest camera detection shows an obstacle, slow down proportionally if (have_det_ && last_confidence_ > 0.7f && last_range_ < 3.0f) { // Scale speed: full at 3m, zero at 0.5m float factor = std::clamp((last_range_ - 0.5f) / 2.5f, 0.0f, 1.0f); linear = 0.3f * factor; } // Use the latest IMU yaw rate for heading correction // Compensate for drift: if turning unintentionally, correct if (have_imu_ && std::abs(last_gz_) > 0.02f) { angular = -0.5f * last_gz_; // P-controller on yaw drift } horus::msg::CmdVel cmd{}; cmd.linear = linear; cmd.angular = angular; cmd_pub_->send(cmd); } void enter_safe_state() override { // Zero all motion on safety event horus::msg::CmdVel stop{}; cmd_pub_->send(stop); horus::blackbox::record("controller", "Entered safe state — motors zeroed"); } private: horus::Subscriber* imu_sub_; horus::Subscriber* det_sub_; horus::Publisher* cmd_pub_; int divisor_; int tick_counter_ = 0; // Cached latest sensor values (updated on every tick, used on active ticks) float last_gz_ = 0.0f; float last_range_ = 0.0f; float last_confidence_ = 0.0f; bool have_imu_ = false; bool have_det_ = false; }; // ============================================================ // Diagnostics Logger — 1 Hz (every 200th tick at 200 Hz master) // Records system health and topic throughput. 
// ============================================================ class DiagLogger : public horus::Node { public: explicit DiagLogger(int divisor) : Node("diag_logger"), divisor_(divisor) { cmd_sub_ = subscribe("cmd_vel"); imu_sub_ = subscribe("imu.data"); } void tick() override { // Drain every tick to count messages if (auto cmd = cmd_sub_->recv()) { last_linear_ = cmd->get()->linear; last_angular_ = cmd->get()->angular; cmd_count_++; } if (auto imu = imu_sub_->recv()) { imu_count_++; } if (++tick_counter_ % divisor_ != 0) return; // Log once per second char buf[256]; std::snprintf(buf, sizeof(buf), "cmd_vel=(%.2f m/s, %.2f rad/s) " "imu_msgs/s=%d cmd_msgs/s=%d", last_linear_, last_angular_, imu_count_, cmd_count_); horus::log::info("diag", buf); // Reset counters for next 1-second window imu_count_ = 0; cmd_count_ = 0; } private: horus::Subscriber* cmd_sub_; horus::Subscriber* imu_sub_; int divisor_; int tick_counter_ = 0; int imu_count_ = 0; int cmd_count_ = 0; float last_linear_ = 0.0f; float last_angular_ = 0.0f; }; int main() { horus::Scheduler sched; // Master rate = fastest node (IMU at 200 Hz) sched.tick_rate(200_hz).name("multi_rate_pipeline"); // ── Compute divisors from master rate ─────────────────── // IMU: 200 / 200 = 1 (every tick) // Camera: 200 / 30 ~ 7 (every 7th tick = 28.6 Hz) // Controller: 200 / 100 = 2 (every 2nd tick) // Logger: 200 / 1 = 200 (every 200th tick) ImuDriver imu; sched.add(imu) .order(0) // runs first — sensor producer .budget(500_us) // SPI/I2C read should be fast .on_miss(horus::Miss::Warn) // IMU overrun is notable but not fatal .build(); CameraPipeline camera(7); sched.add(camera) .order(1) // after IMU, before controller .budget(15_ms) // vision processing takes longer .on_miss(horus::Miss::Skip) // skip if overrun — stale frame is useless .build(); Controller ctrl(2); sched.add(ctrl) .order(10) // after all sensors .budget(3_ms) // control math is lightweight .on_miss(horus::Miss::Skip) // skip if overrun — stale commands 
dangerous .build(); DiagLogger logger(200); sched.add(logger) .order(100) // runs last — reads everything .budget(5_ms) // logging can take a moment .on_miss(horus::Miss::Warn) // missed log is annoying but harmless .build(); sched.spin(); } ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Subscriber buffer overflows | Only calling `recv()` on active ticks, so 200 Hz messages pile up between 1 Hz reads | Always drain subscribers every tick, even on skip ticks | | Camera runs at wrong rate | Divisor calculation is off (200/30 = 6.67, rounded to 7 gives 28.6 Hz not 30) | Accept the approximation, or set master rate to LCM of all rates (e.g., 600 Hz) | | Controller sees stale IMU data | Controller `order` is lower than IMU — it runs before IMU publishes | Ensure producers have lower order values than consumers | | Logger shows 0 messages/sec | Logger divisor is wrong or logger does not drain on skip ticks | Drain subscribers every tick; only gate the logging output | | High CPU usage with many skip ticks | Tick overhead per node even on skip, multiplied by 200 Hz | Reduce master rate if most nodes are slow, or use `Miss::Skip` to shed load | | Budget miss on camera node | 15ms budget too tight for inference on the target hardware | Profile actual inference time and set budget to p99 + 2ms headroom | ### Design Decisions | Choice | Rationale | |--------|-----------| | Master rate = fastest node (200 Hz) | Ensures the fastest node gets a tick every cycle without needing sub-tick scheduling | | Tick dividers inside each Node | Each node owns its rate logic — the scheduler does not need to know about multiple rates | | Drain subscribers on every tick | Prevents buffer overflow when a slow consumer reads a fast producer's topic | | Separate Node subclasses (not lambdas) | Each node has its own state, `enter_safe_state()`, and testable interface | | Controller has `enter_safe_state()` | The controller is the actuator node — it must zero outputs on 
any safety event | | `Miss::Skip` for controller and camera | Stale sensor data or stale commands are worse than skipping a cycle | | `Miss::Warn` for IMU and logger | IMU overrun is unusual and worth investigating; logger overrun is harmless | ### Variations **LCM-based master rate** — if you need exact rates for all nodes, set the master rate to the least common multiple: ```cpp // Exact 200 Hz, 100 Hz, 30 Hz, 1 Hz requires LCM = 600 Hz sched.tick_rate(600_hz); // Divisors: IMU=3, Controller=6, Camera=20, Logger=600 // Downside: 600 Hz master means more overhead per tick ``` **Node-level rate configuration** — pass the target rate as a parameter instead of hard-coding the divisor for each node: ```cpp class RateNode : public horus::Node { public: RateNode(const char* name, int master_hz, int target_hz) : Node(name), divisor_(compute_divisor(master_hz, target_hz)) {} void tick() override { // Common divider logic if (++counter_ % divisor_ != 0) return; do_work(); // subclass implements actual computation } protected: virtual void do_work() = 0; private: // Round to the nearest integer (200 / 30 gives 7, not the truncated 6) // and never return 0, which would make the modulo above divide by zero. static int compute_divisor(int master_hz, int target_hz) { if (target_hz <= 0) return 1; int d = (master_hz + target_hz / 2) / target_hz; return d > 0 ? d : 1; } int divisor_; int counter_ = 0; }; ``` **Priority-based budgeting** — give critical nodes more budget by stealing from non-critical ones: ```cpp // Total per-tick budget: 5ms at 200 Hz // IMU: 500us (critical — sensor timing matters) // Controller: 3ms (critical — actuator safety) // Camera: 15ms (only runs every 7th tick, can burst) // Logger: 1ms (best effort — can be skipped entirely) sched.add(logger) .order(100) .budget(1_ms) .on_miss(horus::Miss::Skip) // expendable — skip if time is tight .build(); ``` ### Key Takeaways - Set the master tick rate to the fastest node's rate — slower nodes use integer divisors - Always drain subscriber buffers every tick, even on skip ticks, to prevent stale data accumulation - Execution order is deterministic: lower `order()` values run first, guaranteeing producer-before-consumer - Each node should be a `horus::Node` subclass with its own state and `enter_safe_state()` for actuator nodes - Accept 
approximate rates from integer division (200/7 = 28.6 Hz instead of 30 Hz) or use LCM master rates for exactness --- ## Recipe: Record & Replay Path: /recipes/record-replay Description: Record robot execution, replay deterministically, and use mixed replay for regression testing ## Record & Replay Capture a robot's execution and replay it later for debugging or regression testing. Unlike external bag tools, HORUS recording is built into the scheduler — zero serialization overhead, tick-perfect determinism, and mixed replay for what-if testing. ### When To Use This - Debugging a bug that only reproduces with specific sensor data - Regression testing a new controller against recorded inputs - Comparing two algorithm versions on identical data - Capturing field data for offline analysis ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic understanding of nodes and topics ([Quick Start](/getting-started/quick-start)) ### horus.toml ```toml [package] name = "record-replay-demo" version = "0.1.0" description = "Record and replay demonstration" ``` ### Complete Code ### Understanding the Code - **`.with_recording()`** / `recording=True` captures every node's topic inputs and outputs each tick as raw shared memory bytes — zero serialization overhead - **`stop_recording()`** flushes to disk and returns file paths (one `.horus` file per node + one scheduler metadata file) - **`replay_from()`** loads the scheduler recording and replays all nodes with the original timing and data - **`add_replay()`** is the key differentiator from external bag tools — it replays one node's recorded outputs while running other nodes live, enabling regression testing without re-recording ### CLI Workflows ```bash # Record during any run horus run --record my_session # List and inspect horus record list --long horus record info my_session # Full replay (all nodes) horus record replay my_session horus record replay my_session --speed 0.5 --start-tick 
100 # Mixed replay (recorded sensor + live code) horus record inject my_session --nodes sensor # Compare two runs horus record diff session_v1 session_v2 # Export for analysis horus record export my_session --output data.csv --format csv # Cleanup horus record clean --max-age-days 30 horus record delete old_session ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Empty recording | `.with_recording()` not set | Add to scheduler builder or use `--record` CLI flag | | Replay output differs | Code changed between record and replay | Expected for mixed replay; use `replay_from` for exact reproduction | | `inject --nodes X` has no effect | Topic name mismatch | Names are case-sensitive and dot-separated | | `FileNotFoundError` on replay | Wrong session name | Run `horus record list` to see available sessions | ### See Also - [Record & Replay Reference](/advanced/record-replay) — Full API docs, design decisions, trade-offs - [Debug with Record & Replay Tutorial](/tutorials/record-replay-debugging) — Step-by-step debugging walkthrough - [BlackBox Flight Recorder](/advanced/blackbox) — Lightweight always-on crash forensics - [Deterministic Mode](/advanced/deterministic-mode) — Bit-identical replay requirements --- ## Recipe: Record & Replay (Python) Path: /recipes/record-replay-python Description: Record robot execution from Python, manage recordings, and replay via CLI for debugging and regression testing ## Record & Replay (Python) Capture a Python robot's execution and replay it later for debugging or regression testing. Recording is enabled via the `Scheduler` constructor; replay and mixed replay use the CLI. 
### When To Use - Debugging a bug that only reproduces with specific sensor data - Capturing field data for offline analysis or algorithm development - Regression testing after controller changes ### Prerequisites - HORUS installed ([Installation Guide](/getting-started/installation)) - Basic understanding of Python nodes ([Quick Start (Python)](/getting-started/quick-start-python)) ### horus.toml ```toml [package] name = "record-replay-py" version = "0.1.0" description = "Record and replay demonstration (Python)" language = "python" ``` ### Complete Code ```python #!/usr/bin/env python3 """Record a robot session, then replay and analyze via CLI.""" import math import horus from horus import Node, Scheduler # ── Sensor Node ──────────────────────────────────────────────── def sensor_init(node): node.state = {"tick": 0} def sensor_tick(node): t = node.state["tick"] * 0.01 # Sine wave with periodic spikes spike = 5.0 if node.state["tick"] % 73 == 0 else 0.0 value = math.sin(t * 2.0) * 10.0 + spike node.send("sensor.reading", {"value": value, "tick": node.state["tick"]}) node.state["tick"] += 1 sensor = Node( name="sensor", pubs=["sensor.reading"], tick=sensor_tick, init=sensor_init, rate=100, ) # ── Controller Node ──────────────────────────────────────────── def ctrl_init(node): node.state = {"integral": 0.0, "setpoint": 0.0} def ctrl_tick(node): data = node.recv("sensor.reading") if data is None: return error = node.state["setpoint"] - data["value"] dt = horus.dt() # IMPORTANT: Clamp integral to prevent windup node.state["integral"] = max(-10.0, min(10.0, node.state["integral"] + error * dt )) output = 0.5 * error + 0.1 * node.state["integral"] node.send("ctrl.cmd", {"output": output, "integral": node.state["integral"]}) controller = Node( name="controller", subs=["sensor.reading"], pubs=["ctrl.cmd"], tick=ctrl_tick, init=ctrl_init, rate=100, ) # ── Record ───────────────────────────────────────────────────── print("=== Recording 5 seconds ===") sched = 
Scheduler(tick_rate=100, recording=True) sched.add(sensor) sched.add(controller) sched.run(duration=5.0) # Save and list files = sched.stop_recording() print(f"Saved to: {files}") for rec in sched.list_recordings(): print(f" Available session: {rec}") # ── Step 2: Full replay ────────────────────────────────────────── print("\n=== Full replay ===") scheduler_file = [f for f in files if "scheduler@" in f][0] replay_sched = Scheduler.replay_from(scheduler_file) replay_sched.run() # ── Step 3: Time travel replay ───────────────────────────────── print("\n=== Time travel (ticks 200-400, half speed) ===") replay2 = Scheduler.replay_from(scheduler_file) replay2.start_at_tick(200) replay2.stop_at_tick(400) replay2.set_replay_speed(0.5) replay2.run() # ── Step 4: Mixed replay (recorded sensor + live controller) ─── print("\n=== Mixed replay ===") sensor_file = [f for f in files if "sensor@" in f][0] mixed = Scheduler(tick_rate=100) mixed.add_replay(sensor_file, priority=0) # Recorded sensor mixed.add(controller) # Live controller mixed.run(duration=5.0) # ── Step 5: What-if override ─────────────────────────────────── import struct print("\n=== What-if override ===") ov_sched = Scheduler.replay_from(scheduler_file) ov_sched.set_replay_override("sensor", "sensor.reading", struct.pack(' # horus record diff session_v1 session_v2 # horus record export --output data.csv --format csv # horus record clean --max-age-days 7 ``` ### Understanding the Code - **`Scheduler(recording=True)`** enables per-tick capture of all node inputs and outputs as raw shared memory bytes - **`stop_recording()`** flushes data to `~/.local/share/horus/recordings/` and returns file paths - **`Scheduler.replay_from(path)`** loads an entire scheduler recording for deterministic replay - **`add_replay(path)`** loads a single node's recording for mixed replay (recorded + live nodes) - **`start_at_tick()` / `stop_at_tick()`** enable time travel to specific tick ranges - **`set_replay_speed()`** controls 
playback speed (0.01x to 100x) - **`set_replay_override()`** replaces a node's output with custom bytes for what-if testing ### CLI Quick Reference ```bash # Record (alternative to code) horus run --record my_session src/main.py # Inspect horus record list --long horus record info my_session # Replay horus record replay my_session horus record replay my_session --speed 0.5 --start-tick 100 # Mixed replay (regression testing) horus record inject my_session --nodes sensor # Compare horus record diff before_fix after_fix # Export for pandas horus record export my_session --output data.csv --format csv # Cleanup horus record clean --max-age-days 30 ``` ### Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | `list_recordings()` returns empty | `recording=True` not set in Scheduler | Add `recording=True` to constructor | | `stop_recording()` returns empty paths | Scheduler did not run (no ticks) | Ensure `sched.run(duration=N)` runs before stopping | | `horus record inject` has no effect | Topic name mismatch | Python topic names must match exactly (case-sensitive, dot-separated) | | Recording files are large | High-frequency topics with large payloads | Use `horus record clean` to manage disk space | ### See Also - [Record & Replay Reference](/advanced/record-replay) — Full documentation and design decisions - [Record & Replay Recipe (Rust)](/recipes/record-replay) — Rust version with programmatic replay and mixed replay - [Debug with Record & Replay Tutorial](/tutorials/record-replay-debugging) — Step-by-step debugging walkthrough - [BlackBox Flight Recorder](/advanced/blackbox) — Lightweight always-on crash forensics ======================================== # SECTION: cpp ======================================== --- ## C++ Real-Time Guide Path: /cpp/realtime Description: Budget, deadline, miss policies, SCHED_FIFO, CPU pinning, and watchdog configuration # C++ Real-Time Guide HORUS provides built-in real-time support. 
Set `.rate()`, `.budget()`, or `.deadline()` on a node and the scheduler automatically upgrades it to RT execution. No manual thread management. ## Execution Classes ```cpp using namespace horus::literals; horus::Scheduler sched; sched.tick_rate(1000_hz).prefer_rt(); // RT — auto-detected from rate/budget/deadline sched.add("motor") .rate(1000_hz) .budget(800_us) // max 800us per tick .deadline(950_us) // must finish by 950us .on_miss(horus::Miss::SafeMode) // enter safe state on overrun .pin_core(2) // pin to CPU core 2 .priority(90) // SCHED_FIFO priority 90 .order(0) // runs first .tick([&] { /* motor control */ }) .build(); // Compute — thread pool for CPU-heavy work sched.add("planner") .compute() .order(50) .tick([&] { /* path planning */ }) .build(); // Event — triggered by topic update sched.add("estop_handler") .on("emergency.stop") .tick([&] { /* handle e-stop */ }) .build(); // AsyncIo — for GPU/network I/O sched.add("detector") .async_io() .order(60) .tick([&] { /* ML inference */ }) .build(); // BestEffort — no timing guarantees sched.add("logger") .order(100) .tick([&] { /* log data */ }) .build(); ``` ## Miss Policies | Policy | Behavior | |--------|----------| | `Miss::Warn` | Log warning, continue (default) | | `Miss::Skip` | Skip this tick, resume next | | `Miss::SafeMode` | Call `enter_safe_state()` on the node | | `Miss::Stop` | Stop the entire scheduler | ## Node Lifecycle Hooks ```cpp class MotorCtrl : public horus::Node { public: MotorCtrl() : Node("motor") { cmd_ = advertise("motor.cmd"); } void init() override { // Called once before first tick horus::log::info("motor", "Initialized, homing..."); } void tick() override { // Called every tick (1kHz with .rate(1000_hz)) } void enter_safe_state() override { // Called on Miss::SafeMode, watchdog timeout, or safety event horus::msg::CmdVel stop{}; cmd_->send(stop); horus::blackbox::record("motor", "Entered safe state"); } private: horus::Publisher* cmd_; }; ``` ## Watchdog ```cpp 
sched.watchdog(5_s); // Global: any node stuck > 5s triggers action sched.add("sensor") .watchdog(500_ms) // Per-node: this node must tick within 500ms .tick([&] { /* ... */ }) .build(); ``` ## RT Requirements For full RT (SCHED_FIFO): - Linux with `PREEMPT_RT` kernel (recommended but not required) - `CAP_SYS_NICE` capability or root - CPU governor set to `performance` Without RT kernel, HORUS degrades gracefully — `.prefer_rt()` logs warnings but continues. Use `.require_rt()` to fail if RT is unavailable. ## Performance | Metric | Value | |--------|-------| | FFI overhead | 15 ns | | Scheduler tick (1 node) | 248 ns median | | Throughput | 2.89M ticks/sec | | ASAN | Zero errors | --- ## C++ Performance Guide Path: /cpp/performance Description: Zero-copy patterns, loan API, pool sizing, and FFI overhead # C++ Performance Guide ## Zero-Copy Publishing The loan pattern gives you a direct pointer into shared memory: ```cpp auto pub = sched.advertise("lidar.scan"); auto sample = pub.loan(); // ~3ns — get SHM buffer sample->ranges[0] = 1.5f; // 0ns — direct write to SHM sample->ranges[1] = 2.0f; // 0ns — no copies pub.publish(std::move(sample)); // ~3ns — make visible to subscribers // Total: ~6ns regardless of message size (LaserScan is 1472 bytes) ``` **vs. send by copy:** ```cpp horus::msg::LaserScan scan{}; scan.ranges[0] = 1.5f; pub.send(scan); // ~15ns — copies 1472 bytes to SHM ``` **Rule:** Use `loan()` + `publish()` for large messages (> 64 bytes). Use `send()` for small messages (CmdVel = 16 bytes). 
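The size rule above can be encoded once rather than decided at every call site. The following is a minimal sketch under stated assumptions: `prefer_loan`, `publish_with`, `MockPublisher`, `SmallMsg`, and `LargeMsg` are hypothetical names introduced for illustration, the message structs are stand-ins rather than the real `horus::msg` layouts, and the mock only mimics the `loan()` / `publish()` / `send()` call shape shown in this guide.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Threshold taken from the rule above: loan for payloads larger than 64 bytes.
constexpr std::size_t kLoanThresholdBytes = 64;

template <typename T>
constexpr bool prefer_loan() {
    return sizeof(T) > kLoanThresholdBytes;
}

// Hypothetical stand-in messages; NOT the real horus::msg layouts.
struct SmallMsg { float linear; float angular; };  // a few bytes
struct LargeMsg { float ranges[368]; };            // 368 * 4 = 1472 bytes

// Hypothetical mock publisher that records which path was taken.
template <typename T>
struct MockPublisher {
    int loans = 0;
    int sends = 0;
    T last{};

    std::unique_ptr<T> loan() { ++loans; return std::make_unique<T>(); }
    void publish(std::unique_ptr<T> sample) {
        assert(sample != nullptr);
        last = *sample;
    }
    void send(const T& msg) { ++sends; last = msg; }
};

// Fills a message in place and publishes it via the cheaper path for its size.
// The branch is resolved at compile time, so there is no runtime dispatch cost.
template <typename T, typename Pub, typename Fill>
void publish_with(Pub& pub, Fill&& fill) {
    if constexpr (prefer_loan<T>()) {
        auto sample = pub.loan();        // write directly into the loaned buffer
        fill(*sample);
        pub.publish(std::move(sample));
    } else {
        T msg{};                         // small enough that a copy is cheap
        fill(msg);
        pub.send(msg);
    }
}
```

A call such as `publish_with<LargeMsg>(pub, [](LargeMsg& m) { m.ranges[0] = 1.5f; });` takes the loan path, while the same call with `SmallMsg` takes the copy path. Whether a real HORUS sample supports `*sample` is an assumption of this sketch; the guide itself only shows `sample->field` access.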
## TensorPool for Large Data For camera images and point clouds, use pool-backed types: ```cpp horus::TensorPool pool(1, 64 * 1024 * 1024, 128); // 64MB, 128 slots // Camera image — allocated from SHM pool, not heap horus::Image img(pool, 1920, 1080, horus::Encoding::Rgb8); // 1920 * 1080 * 3 = 6.2MB — zero-copy from pool // Neural network tensor uint64_t shape[] = {1, 3, 224, 224}; horus::Tensor tensor(pool, shape, 4, horus::Dtype::F32); float* data = reinterpret_cast<float*>(tensor.data()); data[0] = 1.0f; // Direct SHM write ``` ## Measured Performance | Operation | Latency | Notes | |-----------|---------|-------| | FFI call (horus_get_abi_version) | 11 ns | Baseline overhead | | CmdVel send (16 bytes) | 15 ns | Small message | | LaserScan send (1472 bytes) | ~20 ns | Zero-copy loan | | Scheduler tick (empty) | 37 ns | No nodes | | Scheduler tick (1 node) | 248 ns | Median | | Scheduler tick (10 nodes) | 2.2 us | Median | | Scheduler tick (50 nodes) | 10.9 us | Median | | Throughput | 2.89M ticks/sec | 1 node | ## Avoiding Common Pitfalls **Don't allocate in tick:** ```cpp // BAD — allocates every tick void tick() override { std::string str(256, ' '); // 256 chars exceeds the small-string buffer, so this heap-allocates } // GOOD — pre-allocate class MyNode : public horus::Node { char buf_[256]; // fixed-size member buffer, allocated once with the node public: void tick() override { std::snprintf(buf_, sizeof(buf_), "hello"); } }; ``` **Don't copy large messages:** ```cpp // BAD — copies 1472 bytes horus::msg::LaserScan scan{}; pub.send(scan); // GOOD — writes directly to SHM auto sample = pub.loan(); sample->ranges[0] = 1.5f; pub.publish(std::move(sample)); ``` **Pre-create publishers/subscribers:** ```cpp // BAD — creates Topic in tick (SHM open + mmap) void tick() override { horus::Publisher pub("cmd"); // DON'T } // GOOD — create once in constructor class MyNode : public horus::Node { horus::Publisher* pub_; public: MyNode() : Node("x") { pub_ = advertise("cmd"); } }; ``` --- ## C++ Testing Guide Path: /cpp/testing Description: Unit testing, ASAN, stress testing, and 
cross-process verification # C++ Testing Guide ## Building Test Binaries ```bash # Build the shared library cargo build --no-default-features -p horus_cpp # Compile a C++ test g++ -std=c++17 -fext-numeric-literals \ -I horus_cpp/include \ -o my_test tests/my_test.cpp \ -L target/debug -lhorus_cpp -lpthread -ldl -lm # Run LD_LIBRARY_PATH=target/debug ./my_test ``` ## Writing C++ Tests Use a simple CHECK macro pattern: ```cpp #include static int pass = 0, fail = 0; #define CHECK(cond, name) do { \ if (cond) { printf("[PASS] %s\n", name); pass++; } \ else { printf("[FAIL] %s\n", name); fail++; } \ } while(0) void test_pubsub() { horus::Publisher pub("test.cmd"); horus::Subscriber sub("test.cmd"); horus::msg::CmdVel msg{}; msg.linear = 1.5f; pub.send(msg); auto recv = sub.recv(); CHECK(recv.has_value(), "message received"); // Only inspect the payload if something arrived; dereferencing an empty result is undefined behavior if (recv.has_value()) { CHECK(recv->get()->linear == 1.5f, "field preserved"); } } int main() { test_pubsub(); printf("Results: %d passed, %d failed\n", pass, fail); return fail > 0 ? 1 : 0; } ``` ## AddressSanitizer Compile with ASAN to detect memory errors: ```bash g++ -std=c++17 -fsanitize=address -fno-omit-frame-pointer \ -I horus_cpp/include \ -o my_test_asan tests/my_test.cpp \ -L target/debug -lhorus_cpp -lpthread -ldl -lm LD_LIBRARY_PATH=target/debug ASAN_OPTIONS=detect_leaks=0 ./my_test_asan ``` ## Stress Testing Test stability under load: ```cpp // 1000 scheduler create/destroy cycles for (int i = 0; i < 1000; i++) { horus::Scheduler sched; sched.add("node").tick([]{ }).build(); sched.tick_once(); } // 50 nodes with 100 ticks each { horus::Scheduler sched; for (int i = 0; i < 50; i++) { sched.add(("node_" + std::to_string(i)).c_str()) .tick([]{ }).build(); } for (int i = 0; i < 100; i++) sched.tick_once(); } ``` ## Cross-Process Testing Test IPC between separate processes: ```bash # Terminal 1: subscriber (start first) LD_LIBRARY_PATH=target/debug ./cross_process_sub "test.topic" # Terminal 2: publisher LD_LIBRARY_PATH=target/debug ./cross_process_pub "test.topic" 
``` The subscriber must start first — it creates the SHM ring buffer that both processes share. ## CI Integration The `.github/workflows/cpp-bindings.yml` pipeline runs: 1. **Rust FFI tests** (139 tests) 2. **C++ compilation** (15 binaries) 3. **C++ unit tests** (e2e, ergonomic, full API, user API, stress) 4. **Cross-process IPC** (CmdVel, JSON, Service, Action) 5. **ASAN** (stress + full API under AddressSanitizer) 6. **Benchmarks** (release-mode performance) --- ## C++ Error Handling Path: /cpp/error-handling Description: Exception safety, null checks, RAII, and panic-safe FFI # C++ Error Handling ## RAII Everywhere All HORUS C++ types use RAII — destructors clean up automatically: ```cpp { horus::Scheduler sched; // creates Rust scheduler horus::TensorPool pool(1, 1024*1024, 64); // creates SHM pool horus::Image img(pool, 640, 480); // allocates from pool // ... use them ... } // all destroyed in reverse order, SHM cleaned up ``` **Move-only types** prevent accidental copies: ```cpp auto pub = sched.advertise("cmd"); auto pub2 = std::move(pub); // OK — transfer ownership // auto pub3 = pub2; // ERROR — copy deleted ``` ## Null Safety All C API functions handle null gracefully: ```cpp horus_scheduler_destroy(nullptr); // no-op horus_image_width(nullptr); // returns 0 horus_params_has(nullptr, nullptr); // returns false horus_action_client_send_goal(nullptr, nullptr); // returns nullptr ``` C++ wrappers check validity: ```cpp horus::TensorPool pool(1, 1024, 4); if (!pool) { // Pool creation failed (e.g., SHM permission denied) } auto img = horus::Image(pool, 640, 480); if (!img) { // Image allocation failed (pool full) } ``` ## Exception Safety The FFI boundary uses `catch_unwind` to prevent Rust panics from crossing into C++: ``` C++ tick lambda └─ extern "C" trampoline function └─ CppNode::tick() in Rust └─ std::panic::catch_unwind └─ your callback └─ if panic → caught, node marked failed, continues ``` If a C++ exception is thrown inside a tick callback, it 
**must be caught before returning**. Unwinding through `extern "C"` is undefined behavior. ```cpp sched.add("safe_node") .tick([&] { try { risky_operation(); } catch (const std::exception& e) { horus::log::error("node", e.what()); } }) .build(); ``` ## Failed Nodes If a Rust panic occurs in a node's tick, the node is disabled: - First panic: caught, logged, node marked `failed` - Subsequent ticks: silently skipped (no-op) - Other nodes continue running This prevents one misbehaving node from taking down the entire system. ## Error Logging Use `horus::log` for structured error reporting: ```cpp horus::log::info("sensor", "Calibration complete"); horus::log::warn("controller", "PID output near saturation"); horus::log::error("safety", "Watchdog timeout on motor driver"); horus::blackbox::record("crash", "Segfault in vision pipeline"); ``` All messages appear in `horus log` CLI output. --- ## C++ Hardware Integration Path: /cpp/hardware Description: Connect real sensors and actuators using horus::Node # C++ Hardware Integration ## Pattern: Hardware Driver Node Every hardware device gets its own `horus::Node` subclass: ```cpp class LidarDriver : public horus::Node { public: LidarDriver(const char* port) : Node("lidar_driver"), port_(port) { scan_pub_ = advertise("lidar.scan"); } void init() override { // Open serial port, configure device fd_ = open(port_, O_RDWR | O_NOCTTY); if (fd_ < 0) { horus::log::error("lidar", "Failed to open port"); return; } horus::log::info("lidar", "Connected"); } void tick() override { if (fd_ < 0) return; // Read raw data from device uint8_t buf[2048]; int n = read(fd_, buf, sizeof(buf)); if (n <= 0) return; // Parse into LaserScan message auto scan = scan_pub_->loan(); parse_rplidar_packet(buf, n, scan.get()); scan_pub_->publish(std::move(scan)); } void enter_safe_state() override { // Stop motor on safety event if (fd_ >= 0) { uint8_t stop_cmd[] = {0xA5, 0x25}; write(fd_, stop_cmd, 2); } } private: const char* port_; int fd_ = -1; 
horus::Publisher* scan_pub_; void parse_rplidar_packet(const uint8_t* buf, int len, horus::msg::LaserScan* scan) { // Device-specific parsing... for (int i = 0; i < 360; i++) { scan->ranges[i] = 5.0f; // placeholder } } }; ``` ## Multi-Device Robot ```cpp int main() { horus::Scheduler sched; sched.tick_rate(100_hz).prefer_rt(); // Hardware drivers — highest priority, deterministic LidarDriver lidar("/dev/ttyUSB0"); sched.add(lidar).order(0).rate(100_hz).budget(2_ms).build(); ImuDriver imu("/dev/ttyUSB1"); sched.add(imu).order(1).rate(200_hz).budget(1_ms).build(); MotorDriver motors("/dev/ttyUSB2"); sched.add(motors).order(2).rate(100_hz).budget(1_ms) .on_miss(horus::Miss::SafeMode).build(); // Control — reads from hardware topics Controller ctrl; sched.add(ctrl).order(10).budget(5_ms).build(); // Monitoring — lowest priority TelemetryLogger logger; sched.add(logger).order(100).build(); sched.spin(); } ``` ## Coordinate Frames Mount sensors with known transforms: ```cpp horus::TransformFrame tf; tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.register_frame("lidar", "base_link"); tf.register_frame("camera", "base_link"); // Static sensor mounts tf.update("lidar", {0.2, 0.0, 0.3}, {0, 0, 0, 1}, 0); tf.update("camera", {0.1, 0.0, 0.25}, {0, 0, 0.04, 1.0}, 0); // In processing node: auto lidar_in_world = tf.lookup("lidar", "world"); ``` ## Runtime Parameters Tune hardware settings without recompiling: ```cpp horus::Params params; params.set("lidar_rpm", int64_t(600)); params.set("motor_max_torque", 2.5); params.set("imu_calibration_offset", 0.02); // In driver tick: double max_torque = params.get("motor_max_torque", 1.0); ``` ## CLI Integration While your C++ robot runs: ```bash horus topic list # see all active topics horus topic echo lidar.scan # watch raw sensor data horus node list # see node status, tick rates, CPU usage horus log # see log messages from all nodes ``` --- ## C++ API Reference Path: /cpp/api Description: Complete API 
reference for the HORUS C++ bindings — scheduling, nodes, topics, services, and actions # C++ API Reference The HORUS C++ API provides first-class access to the real-time scheduler, zero-copy IPC, services, and actions. The bindings are generated from the Rust core via a C FFI layer (`horus_c.h`), so the C++ API has identical semantics to Rust and Python -- same shared memory transport, same scheduling guarantees, same message types. > **Rust**: See [Rust API Reference](/rust/api) for the native API. > **Python**: See [Python API](/python/api) for the scripting API. ```cpp // simplified #include using namespace horus::literals; ``` --- ## Quick Reference -- All Classes ### Core | Class | Header | Description | |-------|--------|-------------| | [`Scheduler`](/cpp/api/scheduler) | `scheduler.hpp` | Real-time scheduler -- creates, configures, and runs nodes | | [`NodeBuilder`](/cpp/api/scheduler#nodebuilder) | `scheduler.hpp` | Builder for configuring a node before registration | | [`Node`](/cpp/api/node) | `node.hpp` | Base class for struct-based nodes (like Rust `impl Node`) | | [`LambdaNode`](/cpp/api/node#lambdanode) | `node.hpp` | Declarative node with builder pattern (like Python `horus.Node`) | ### Communication | Class | Header | Description | |-------|--------|-------------| | [`Publisher`](/cpp/api/topic#publisher) | `topic.hpp` | Sends messages to a topic via zero-copy SHM | | [`Subscriber`](/cpp/api/topic#subscriber) | `topic.hpp` | Receives messages from a topic via zero-copy SHM | | [`LoanedSample`](/cpp/api/topic#loanedsample) | `topic.hpp` | Writable zero-copy buffer for publishing | | [`BorrowedSample`](/cpp/api/topic#borrowedsample) | `topic.hpp` | Read-only zero-copy buffer from subscribing | ### Services and Actions | Class | Header | Description | |-------|--------|-------------| | [`ServiceClient`](/cpp/api/services#serviceclient) | `service.hpp` | Synchronous request/response RPC client | | [`ServiceServer`](/cpp/api/services#serviceserver) | 
`service.hpp` | Request/response RPC server with handler callback | | [`ActionClient`](/cpp/api/services#actionclient) | `action.hpp` | Long-running task client with goal tracking | | [`ActionServer`](/cpp/api/services#actionserver) | `action.hpp` | Long-running task server with accept/execute handlers | | [`GoalHandle`](/cpp/api/services#goalhandle) | `action.hpp` | Handle to a running action goal | | [`GoalStatus`](/cpp/api/services#goalstatus) | `action.hpp` | Enum: Pending, Active, Succeeded, Aborted, Canceled, Rejected | ### Support Types | Type | Header | Description | |------|--------|-------------| | `Frequency` | `duration.hpp` | Frequency in Hertz, created via `100_hz` literal | | `Duration` | `duration.hpp` | Alias for `std::chrono::microseconds` | | `Miss` | `error.hpp` | Deadline miss policy: Warn, Skip, SafeMode, Stop | --- ## Include Patterns ### Single Include (Recommended) ```cpp #include using namespace horus::literals; // 100_hz, 5_ms, 200_us ``` This pulls in all headers: scheduler, node, topic, messages, services, actions, duration, and error types. ### Selective Includes ```cpp #include // Scheduler, NodeBuilder #include // Node, LambdaNode #include // Publisher, Subscriber, LoanedSample, BorrowedSample #include // All 51 message types #include // ServiceClient, ServiceServer #include // ActionClient, ActionServer, GoalHandle #include // Frequency, Duration, literals ``` ### Message Category Includes ```cpp #include // Point3, Vector3, Quaternion, Twist, Pose2D, Pose3D, ... #include // LaserScan, Imu, Odometry, JointState, BatteryState, ... #include // CmdVel, MotorCommand, JointCommand, PidConfig, ... #include // NavGoal, GoalResult, Waypoint, PathPlan #include // Heartbeat, DiagnosticStatus, EmergencyStop, ... #include // Detection, Detection3D, TrackedObject, Landmark, ... #include // CameraInfo, StereoInfo, RegionOfInterest #include // WrenchStamped, ForceCommand, ContactInfo, ... 
#include // Clock, TimeReference, SimSync, RateRequest #include // KeyboardInput, JoystickInput, AudioFrame ``` --- ## Minimal Example ```cpp #include using namespace horus::literals; int main() { auto sched = horus::Scheduler() .tick_rate(100_hz) .name("my_robot") .prefer_rt(); auto cmd_pub = sched.advertise("motor.cmd"); auto scan_sub = sched.subscribe("lidar.scan"); sched.add("controller") .rate(50_hz) .budget(5_ms) .on_miss(horus::Miss::Skip) .tick([&] { auto scan = scan_sub.recv(); if (!scan) return; auto cmd = cmd_pub.loan(); cmd->linear = scan->ranges[0] > 1.0f ? 0.5f : 0.0f; cmd->angular = 0.0f; cmd_pub.publish(std::move(cmd)); }) .build(); sched.spin(); // blocks until Ctrl+C } ``` --- ## Build Integration HORUS C++ projects use `horus.toml` as the single source of truth. The CLI generates `CMakeLists.txt` into `.horus/`: ```toml # horus.toml [package] name = "my_robot" language = "cpp" version = "0.1.0" [dependencies] horus = "0.1" ``` ```bash horus build # generates .horus/CMakeLists.txt, compiles horus run # builds + runs ``` --- ## Duration and Frequency Literals HORUS provides user-defined literals that match the Rust `DurationExt` trait. 
Import them with `using namespace horus::literals;`: | Literal | Type | Example | |---------|------|---------| | `_hz` | `Frequency` | `100_hz` = 100 Hz tick rate | | `_ms` | `Duration` (microseconds) | `5_ms` = 5000 us | | `_us` | `Duration` (microseconds) | `200_us` = 200 us | | `_ns` | `std::chrono::nanoseconds` | `500_ns` = 500 ns | | `_s` | `Duration` (microseconds) | `5_s` = 5000000 us | ```cpp using namespace horus::literals; auto freq = 100_hz; // Frequency(100.0) auto dur = 5_ms; // std::chrono::microseconds(5000) auto us = 200_us; // std::chrono::microseconds(200) // Frequency helper methods auto period = freq.period(); // 10000 us (10 ms) auto budget = freq.budget_default(); // 8000 us (80% of period) auto deadline = freq.deadline_default();// 9500 us (95% of period) ``` --- ## Cross-Language Equivalence The C++ API mirrors Rust and Python exactly. Code written in one language can be ported line-by-line: | Concept | Rust | C++ | Python | |---------|------|-----|--------| | Scheduler | `Scheduler::new()` | `horus::Scheduler()` | `horus.Scheduler()` | | Tick rate | `.tick_rate(100.hz())` | `.tick_rate(100_hz)` | `tick_rate=100` | | Publish | `topic.send(&msg)?` | `pub.send(msg)` | `node.send("topic", msg)` | | Subscribe | `topic.recv()` | `sub.recv()` | `node.recv("topic")` | | Node | `impl Node for T` | `class T : public horus::Node` | `horus.Node(name, tick)` | | Run | `scheduler.spin()` | `sched.spin()` | `horus.run(node)` | All three languages share the same SHM transport. A Rust publisher and a C++ subscriber on the same topic communicate with zero serialization overhead. --- ## Ownership and Move Semantics All core types (`Scheduler`, `Publisher`, `Subscriber`, `LoanedSample`, `BorrowedSample`, `ServiceClient`, `ServiceServer`, `ActionClient`, `ActionServer`, `GoalHandle`) are **move-only**. Copy constructors and copy assignment are deleted. This matches Rust's ownership model -- each resource has exactly one owner. 
```cpp auto pub = sched.advertise("cmd"); // auto pub2 = pub; // COMPILE ERROR: copy deleted auto pub2 = std::move(pub); // OK: ownership transferred ``` --- ## See Also - [Scheduler API](/cpp/api/scheduler) -- Builder pattern, node registration, execution - [Node API](/cpp/api/node) -- Lifecycle, struct-based and lambda-based nodes - [Publisher and Subscriber API](/cpp/api/topic) -- Zero-copy messaging, loan pattern - [Services and Actions API](/cpp/api/services) -- RPC and long-running tasks - [C++ Real-Time Guide](/cpp/realtime) -- Budget, deadline, miss policies, SCHED_FIFO - [C++ Error Handling](/cpp/error-handling) -- Error types and recovery - [C++ Performance](/cpp/performance) -- Benchmarks and optimization --- ## Scheduler API Path: /cpp/api/scheduler Description: Complete API reference for the HORUS C++ Scheduler — builder pattern, node registration, real-time execution, and runtime queries # Scheduler API The `Scheduler` is the central orchestrator in HORUS. It creates topics, registers nodes, manages their lifecycle, and drives the tick loop. Configuration uses a builder pattern -- chain methods to set tick rate, RT mode, watchdog, and networking, then call `spin()` to run. > **Rust**: See [`Scheduler`](/rust/api/scheduler) for the Rust equivalent. > **Python**: See [`horus.Scheduler`](/python/api) for the Python equivalent. 
```cpp // simplified #include using namespace horus::literals; ``` --- ## Quick Reference -- Scheduler Methods | Method | Returns | Description | |--------|---------|-------------| | `Scheduler()` | `Scheduler` | Construct a new scheduler | | `.tick_rate(Frequency)` | `Scheduler&` | Set the global tick rate | | `.name(string_view)` | `Scheduler&` | Set the scheduler name | | `.prefer_rt()` | `Scheduler&` | Prefer RT scheduling (graceful degradation) | | `.require_rt()` | `Scheduler&` | Require RT scheduling (fail if unavailable) | | `.deterministic(bool)` | `Scheduler&` | Enable deterministic mode (SimClock + seeded RNG) | | `.verbose(bool)` | `Scheduler&` | Enable verbose logging | | `.watchdog(Duration)` | `Scheduler&` | Set global watchdog timeout | | `.blackbox(size_t)` | `Scheduler&` | Set BlackBox flight recorder size in MB | | `.enable_network()` | `Scheduler&` | Enable LAN network replication | | `.advertise(string_view)` | `Publisher` | Create a publisher for a named topic | | `.subscribe(string_view)` | `Subscriber` | Create a subscriber for a named topic | | `.add(string_view)` | `NodeBuilder` | Add a lambda node by name | | `.add(Node&)` | `NodeBuilder` | Add a struct-based node | | `.add(LambdaNode&)` | `NodeBuilder` | Add a LambdaNode | | `.spin()` | `void` | Run the scheduler (blocks until stopped) | | `.tick_once()` | `void` | Execute a single tick of all nodes | | `.stop()` | `void` | Stop the scheduler (thread-safe) | | `.is_running()` | `bool` | Check if the scheduler is still running | | `.get_name()` | `std::string` | Get the scheduler name | | `.status()` | `std::string` | Get a human-readable status string | | `.has_full_rt()` | `bool` | Check if full RT capabilities are available | | `.node_list()` | `std::vector<std::string>` | Get list of registered node names | ## Quick Reference -- NodeBuilder Methods | Method | Returns | Description | |--------|---------|-------------| | `.rate(Frequency)` | `NodeBuilder&` | Set tick rate for this node | | 
`.budget(Duration)` | `NodeBuilder&` | Set execution budget (auto-enables RT) | | `.deadline(Duration)` | `NodeBuilder&` | Set hard deadline (auto-enables RT) | | `.on_miss(Miss)` | `NodeBuilder&` | Set deadline miss policy | | `.compute()` | `NodeBuilder&` | Mark as compute-class (CPU-bound) | | `.async_io()` | `NodeBuilder&` | Mark as async I/O class | | `.on(string_view)` | `NodeBuilder&` | Trigger on topic message (event-driven) | | `.order(uint32_t)` | `NodeBuilder&` | Set execution order within tick | | `.pin_core(size_t)` | `NodeBuilder&` | Pin to a specific CPU core | | `.priority(int32_t)` | `NodeBuilder&` | Set thread priority | | `.watchdog(Duration)` | `NodeBuilder&` | Set per-node watchdog timeout | | `.tick(function)` | `NodeBuilder&` | Set the tick callback | | `.init(function)` | `NodeBuilder&` | Set the init callback (called once) | | `.safe_state(function)` | `NodeBuilder&` | Set the enter_safe_state callback | | `.build()` | `void` | Finalize and register the node | --- ## Construction and Configuration The Scheduler uses a builder pattern. All configuration methods return `Scheduler&` for chaining. Configuration is deferred -- nothing runs until `spin()` or `tick_once()`. ```cpp #include using namespace horus::literals; auto sched = horus::Scheduler() .tick_rate(100_hz) // 100 Hz global tick rate .name("arm_controller") // scheduler name (shown in logs) .prefer_rt() // request SCHED_FIFO (fallback to SCHED_OTHER) .watchdog(500_ms) // kill nodes that exceed 500ms .blackbox(64) // 64 MB flight recorder .verbose(true); // print scheduling decisions ``` ### RT Mode Selection | Method | Behavior | |--------|----------| | `.prefer_rt()` | Request real-time scheduling. Falls back gracefully if `CAP_SYS_NICE` is unavailable | | `.require_rt()` | Require real-time scheduling. 
Fails with an error if RT is unavailable | | (neither) | Best-effort scheduling only | Check RT availability at runtime: ```cpp if (sched.has_full_rt()) { printf("Running with SCHED_FIFO\n"); } else { printf("Falling back to SCHED_OTHER\n"); } ``` ### Deterministic Mode Deterministic mode replaces wall clock with SimClock and seeds all RNG. Use for reproducible tests and replay: ```cpp auto sched = horus::Scheduler() .tick_rate(100_hz) .deterministic(true); // SimClock + seeded RNG ``` --- ## Creating Topics Topics are created on the scheduler before nodes are added. The scheduler owns the underlying shared memory segments. ```cpp auto cmd_pub = sched.advertise("motor.cmd"); auto scan_sub = sched.subscribe("lidar.scan"); auto imu_sub = sched.subscribe("imu.data"); auto odom_pub = sched.advertise("odom"); ``` Topic names use **dots** (not slashes) as separators. This is required for macOS `shm_open` compatibility. See [Publisher and Subscriber API](/cpp/api/topic) for the full messaging API. --- ## Adding Nodes The scheduler supports three node styles. All go through `NodeBuilder` for scheduling configuration. ### Style 1: Lambda Node (Inline) The simplest approach. Pass a name and a tick callback: ```cpp sched.add("obstacle_detector") .rate(50_hz) .budget(5_ms) .on_miss(horus::Miss::Skip) .tick([&] { auto scan = scan_sub.recv(); if (!scan) return; // process scan... }) .build(); ``` ### Style 2: Struct-Based Node Subclass `horus::Node` for complex nodes with state (see [Node API](/cpp/api/node)): ```cpp ArmController ctrl; // subclass of horus::Node sched.add(ctrl).rate(100_hz).budget(2_ms).build(); ``` ### Style 3: LambdaNode Declarative node with builder pattern for pub/sub. 
Like Python's `horus.Node()`: ```cpp auto nav = horus::LambdaNode("navigator") .sub("odom") .pub("motor.cmd") .on_tick([](horus::LambdaNode& self) { auto odom = self.recv("odom"); if (!odom) return; self.send("motor.cmd", horus::msg::CmdVel{0, 0.3f, 0.0f}); }); sched.add(nav) .rate(20_hz) .build(); ``` See [Node API](/cpp/api/node) for the full lifecycle and introspection API. --- ## NodeBuilder Configuration Every `sched.add(...)` call returns a `NodeBuilder`. Chain scheduling options before calling `.build()`. ### Execution Class Auto-Detection The scheduler automatically assigns an execution class based on what you configure: | Configuration | Detected Class | Thread | |---------------|---------------|--------| | `.rate()` + `.budget()` or `.deadline()` | **Rt** | Dedicated RT thread, SCHED_FIFO | | `.rate()` only | **BestEffort** | Shared thread pool | | `.compute()` | **Compute** | CPU-bound thread pool | | `.async_io()` | **AsyncIo** | I/O thread pool | | `.on("topic")` | **Event** | Wakes on message arrival | ```cpp // RT node: rate + budget auto-detects as Rt class sched.add("safety_monitor") .rate(1000_hz) .budget(100_us) .deadline(900_us) .on_miss(horus::Miss::SafeMode) .priority(90) .pin_core(3) .tick([&] { /* safety checks */ }) .build(); // Compute node: long-running CPU work sched.add("path_planner") .compute() .tick([&] { /* A* search */ }) .build(); // Event-driven node: wakes on message sched.add("logger") .on("diagnostics.status") .tick([&] { /* log message */ }) .build(); ``` ### Deadline Miss Policies | Policy | Behavior | |--------|----------| | `Miss::Warn` | Log a warning, continue execution | | `Miss::Skip` | Skip the current tick, reset for next cycle | | `Miss::SafeMode` | Call `enter_safe_state()`, then continue | | `Miss::Stop` | Stop the node permanently | ### Init and Safe State Callbacks Lambda nodes can set lifecycle callbacks through the builder: ```cpp sched.add("motor_driver") .rate(100_hz) .budget(2_ms) .init([&] { printf("Motor 
driver initialized\n"); // one-time hardware setup }) .safe_state([&] { // send zero velocity on watchdog timeout cmd_pub.send(horus::msg::CmdVel{0, 0.0f, 0.0f}); }) .tick([&] { // normal motor control }) .build(); ``` --- ## Running the Scheduler ### Blocking Spin `spin()` blocks the calling thread until the scheduler is stopped (via Ctrl+C, SIGTERM, or `.stop()`): ```cpp sched.spin(); // execution resumes here after shutdown ``` ### Single Tick `tick_once()` executes exactly one tick of all registered nodes. Useful for testing and stepped simulation: ```cpp for (int i = 0; i < 1000; ++i) { sched.tick_once(); } ``` ### Stopping from Another Thread `stop()` is thread-safe. Call it from a signal handler, another thread, or a node's tick callback: ```cpp // Stop programmatically after 10 seconds: std::thread timer([&] { std::this_thread::sleep_for(std::chrono::seconds(10)); sched.stop(); }); sched.spin(); timer.join(); ``` --- ## Runtime Queries Query the scheduler state at any time (all methods are thread-safe): ```cpp // Check if still running if (sched.is_running()) { /* ... 
*/ } // Get the scheduler name std::string name = sched.get_name(); // Get human-readable status std::string info = sched.status(); // List all registered nodes auto nodes = sched.node_list(); for (const auto& n : nodes) { printf(" node: %s\n", n.c_str()); } ``` --- ## Common Patterns ### Multi-Rate System Different nodes run at different rates within the same scheduler: ```cpp auto sched = horus::Scheduler() .tick_rate(1000_hz) // LCM of all node rates (here, the fastest node) .prefer_rt(); sched.add("safety") .rate(1000_hz).budget(50_us).priority(99) .tick([&] { /* fastest, highest priority */ }).build(); sched.add("controller") .rate(100_hz).budget(2_ms) .tick([&] { /* medium rate */ }).build(); sched.add("planner") .rate(10_hz).compute() .tick([&] { /* slow, CPU-heavy */ }).build(); sched.spin(); ``` ### Test Harness with tick_once Step through execution deterministically for unit tests: ```cpp auto sched = horus::Scheduler().tick_rate(100_hz).deterministic(true); auto pub = sched.advertise("cmd"); auto sub = sched.subscribe("cmd"); int tick_count = 0; sched.add("producer").rate(100_hz) .tick([&] { pub.send(horus::msg::CmdVel{0, 1.0f, 0.0f}); tick_count++; }) .build(); sched.tick_once(); assert(tick_count == 1); auto msg = sub.recv(); assert(msg.has_value()); ``` ### Network-Enabled Scheduler ```cpp auto sched = horus::Scheduler().tick_rate(100_hz) .enable_network().name("robot_01"); // topics visible on LAN auto pub = sched.advertise("robot_01.odom"); ``` --- ## Ownership `Scheduler` is move-only. It owns the underlying Rust `Box` and releases it in the destructor. 
Copy is deleted: ```cpp horus::Scheduler a; // horus::Scheduler b = a; // COMPILE ERROR horus::Scheduler b = std::move(a); // OK ``` --- ## See Also - [Node API](/cpp/api/node) -- Struct-based and lambda-based node lifecycle - [Publisher and Subscriber API](/cpp/api/topic) -- Zero-copy messaging - [Services and Actions API](/cpp/api/services) -- RPC and long-running tasks - [C++ Real-Time Guide](/cpp/realtime) -- Budget, deadline, SCHED_FIFO, CPU pinning - [C++ API Overview](/cpp/api) -- All classes at a glance --- ## Node API Path: /cpp/api/node Description: Complete API reference for the HORUS C++ Node class — struct-based nodes, LambdaNode, lifecycle, pub/sub, and introspection # Node API Every component in a HORUS system -- sensor drivers, controllers, planners, loggers -- is a node. The C++ API provides three styles: inline lambdas (simplest), `horus::Node` subclasses (most control), and `horus::LambdaNode` (declarative builder). All three are scheduled identically by the `Scheduler`. > **Rust**: See [`Node`](/rust/api/node) for the Rust trait. Same lifecycle semantics. > **Python**: See [`horus.Node`](/python/api) for the Python wrapper. 
```cpp // simplified #include #include ``` --- ## Quick Reference -- Node (Base Class) | Method | Virtual | Default | Description | |--------|---------|---------|-------------| | `Node(string_view name)` | -- | -- | Constructor with node name | | `tick()` | **Pure** | -- | Main execution, called every cycle | | `init()` | Yes | no-op | One-time initialization at startup | | `enter_safe_state()` | Yes | no-op | Called by watchdog on critical timeout | | `on_shutdown()` | Yes | no-op | Cleanup on scheduler exit | | `name()` | No | -- | Returns the node name | | `advertise(topic)` | No | -- | Create a `Publisher*` (call in constructor) | | `subscribe(topic)` | No | -- | Create a `Subscriber*` (call in constructor) | | `publishers()` | No | -- | List of published topic names | | `subscriptions()` | No | -- | List of subscribed topic names | ## Quick Reference -- LambdaNode | Method | Returns | Description | |--------|---------|-------------| | `LambdaNode(string_view name)` | `LambdaNode` | Constructor with node name | | `.pub(topic)` | `LambdaNode&` | Declare a publisher (builder) | | `.sub(topic)` | `LambdaNode&` | Declare a subscriber (builder) | | `.on_tick(fn)` | `LambdaNode&` | Set tick callback: `void(LambdaNode&)` | | `.on_init(fn)` | `LambdaNode&` | Set init callback: `void(LambdaNode&)` | | `.send(topic, msg)` | `void` | Send a message by copy (call from tick) | | `.recv(topic)` | `optional>` | Receive a message (call from tick) | | `.has_msg(topic)` | `bool` | Check if a message is available | | `.name()` | `const string&` | Returns the node name | | `.publishers()` | `const vector&` | List of published topic names | | `.subscriptions()` | `const vector&` | List of subscribed topic names | --- ## Lifecycle The scheduler manages the node lifecycle in a strict order, regardless of which node style you use: ``` Construction --> Registration --> Init --> Tick Loop --> Shutdown you scheduler once repeated once ``` 1. 
**Construction** -- You create the node, declare topics, set initial state 2. **Registration** -- `sched.add(node).build()` validates and registers the node 3. **Initialization** -- On first `spin()` or `tick_once()`, the scheduler calls `init()` once 4. **Tick Loop** -- Each cycle: feed watchdog, call `tick()`, measure timing, check budget/deadline 5. **Shutdown** -- On Ctrl+C, SIGTERM, or `sched.stop()`: `on_shutdown()` called on each node If `tick()` throws an exception, HORUS catches it (`catch_unwind` equivalent via the FFI boundary). The node is marked unhealthy but the scheduler continues running other nodes. --- ## Style 1: horus::Node Subclass The most powerful style. Subclass `horus::Node`, override lifecycle methods, and create topics in the constructor. This is the C++ equivalent of Rust's `impl Node for T`. ```cpp #include #include class ObstacleDetector : public horus::Node { public: ObstacleDetector() : Node("obstacle_detector") { scan_sub_ = subscribe("lidar.scan"); alert_pub_ = advertise("safety.alert"); } void init() override { printf("[%s] initialized, min_range=%.2f\n", name().c_str(), min_range_); } void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; for (int i = 0; i < 360; ++i) { if (scan->ranges[i] > 0.0f && scan->ranges[i] < min_range_) { auto alert = alert_pub_->loan(); alert->level = 1; // WARN alert_pub_->publish(std::move(alert)); return; } } } void enter_safe_state() override { // Called by watchdog on timeout -- publish emergency alert auto alert = alert_pub_->loan(); alert->level = 3; // CRITICAL alert_pub_->publish(std::move(alert)); } void on_shutdown() override { printf("[%s] shutting down\n", name().c_str()); } private: horus::Subscriber* scan_sub_; horus::Publisher* alert_pub_; float min_range_ = 0.3f; }; ``` Register with the scheduler: ```cpp ObstacleDetector detector; sched.add(detector) .rate(50_hz) .budget(5_ms) .on_miss(horus::Miss::Skip) .build(); ``` ### Topic Creation in Constructor Call 
`advertise()` and `subscribe()` in the constructor. They return raw pointers that the `Node` base class owns internally (via `shared_ptr`). The pointers are valid for the lifetime of the node: ```cpp class MyNode : public horus::Node { public: MyNode() : Node("my_node") { // Create topics in constructor -- Node base class owns them pub_ = advertise("cmd"); sub_ = subscribe("imu"); } void tick() override { auto imu = sub_->recv(); if (!imu) return; pub_->send(horus::msg::CmdVel{0, 0.5f, 0.0f}); } private: horus::Publisher* pub_; horus::Subscriber* sub_; }; ``` --- ## Style 2: Lambda Node (Inline) For simple nodes that do not need their own class. Pass a lambda to `NodeBuilder::tick()`: ```cpp auto cmd_pub = sched.advertise("motor.cmd"); auto scan_sub = sched.subscribe("lidar.scan"); sched.add("reactive_driver") .rate(50_hz) .budget(2_ms) .init([&] { printf("Reactive driver ready\n"); }) .tick([&] { auto scan = scan_sub.recv(); if (!scan) return; float front = scan->ranges[0]; float speed = front > 1.0f ? 0.5f : 0.0f; cmd_pub.send(horus::msg::CmdVel{0, speed, 0.0f}); }) .safe_state([&] { cmd_pub.send(horus::msg::CmdVel{0, 0.0f, 0.0f}); }) .build(); ``` Lambda nodes capture topics from the enclosing scope by reference. Topics must outlive the scheduler. Since topics are created on the scheduler, this is guaranteed. --- ## Style 3: LambdaNode (Declarative Builder) `LambdaNode` combines topic declaration and callbacks in a single builder chain. It is the C++ equivalent of Python's `horus.Node(name, pubs, subs, tick)`: ```cpp auto controller = horus::LambdaNode("controller") .sub("lidar.scan") .sub("imu.data") .pub("motor.cmd") .on_init([](horus::LambdaNode& self) { printf("[%s] ready\n", self.name().c_str()); }) .on_tick([](horus::LambdaNode& self) { auto scan = self.recv("lidar.scan"); if (!scan) return; float speed = scan->ranges[0] > 1.0f ? 
0.3f : 0.0f; self.send("motor.cmd", horus::msg::CmdVel{0, speed, 0.0f}); }); sched.add(controller) .rate(50_hz) .budget(5_ms) .build(); ``` ### LambdaNode Runtime API Inside the `on_tick` callback, use `self.send()`, `self.recv()`, and `self.has_msg()`: ```cpp .on_tick([](horus::LambdaNode& self) { // Check before consuming if (self.has_msg("imu.data")) { auto imu = self.recv("imu.data"); // process... } // Send by copy (simple, slight overhead for large messages) self.send("motor.cmd", horus::msg::CmdVel{0, 0.5f, 0.1f}); }) ``` --- ## Introspection Both `Node` and `LambdaNode` expose their topic lists for monitoring and debugging: ```cpp // Node subclass ObstacleDetector detector; for (const auto& topic : detector.publishers()) { printf("publishes: %s\n", topic.c_str()); } for (const auto& topic : detector.subscriptions()) { printf("subscribes: %s\n", topic.c_str()); } // LambdaNode auto node = horus::LambdaNode("nav") .pub("cmd") .sub("odom"); // node.publishers() -> ["cmd"] // node.subscriptions() -> ["odom"] ``` The scheduler uses these lists for the topic graph, `horus monitor`, and BlackBox flight recorder. --- ## Panic Safety and Error Handling HORUS wraps every `tick()` call at the FFI boundary. If a C++ node throws an unhandled exception: 1. The exception is caught at the FFI boundary (equivalent to Rust's `catch_unwind`) 2. The node is marked as **unhealthy** 3. Other nodes continue running normally 4. The watchdog may call `enter_safe_state()` depending on configuration This means a bug in one node does not crash the entire system: ```cpp void tick() override { auto scan = scan_sub_->recv(); if (!scan) return; // If this throws, the node is isolated -- other nodes keep running process_scan(*scan); } ``` Best practice: handle errors locally within `tick()` rather than letting exceptions propagate. 
---

## Common Patterns

### Stateful Node with Configuration

```cpp
class PidController : public horus::Node {
public:
    PidController(float kp, float ki, float kd)
        : Node("pid_controller"), kp_(kp), ki_(ki), kd_(kd) {
        error_sub_ = subscribe("nav.error");
        cmd_pub_ = advertise("motor.cmd");
    }

    void tick() override {
        auto error = error_sub_->recv();
        if (!error) return;

        // Integral and derivative are computed per tick (no dt scaling)
        float e = static_cast<float>(error->x);
        integral_ += e;
        float derivative = e - last_error_;
        last_error_ = e;
        float output = kp_ * e + ki_ * integral_ + kd_ * derivative;

        auto cmd = cmd_pub_->loan();
        cmd->linear = output;
        cmd->angular = 0.0f;
        cmd_pub_->publish(std::move(cmd));
    }

    void enter_safe_state() override {
        // Zero output on safety event
        cmd_pub_->send(horus::msg::CmdVel{0, 0.0f, 0.0f});
        integral_ = 0.0f;
    }

private:
    horus::Subscriber* error_sub_;
    horus::Publisher* cmd_pub_;
    float kp_, ki_, kd_;
    float integral_ = 0.0f;
    float last_error_ = 0.0f;
};

// Usage:
PidController pid(1.0f, 0.01f, 0.1f);
sched.add(pid).rate(100_hz).budget(500_us).build();
```

### Multiple Nodes Sharing Topics

Nodes communicate through shared topics.
Create topics on the scheduler, then reference them from multiple nodes:

```cpp
auto odom_pub = sched.advertise("odom");
auto odom_sub = sched.subscribe("odom");
auto cmd_pub = sched.advertise("cmd");

sched.add("localizer")
    .rate(100_hz)
    .tick([&] {
        auto odom = odom_pub.loan();
        odom->pose.x = compute_x();
        odom->pose.y = compute_y();
        odom_pub.publish(std::move(odom));
    })
    .build();

sched.add("controller")
    .rate(50_hz)
    .tick([&] {
        auto odom = odom_sub.recv();
        if (!odom) return;
        cmd_pub.send(horus::msg::CmdVel{0, 0.3f, 0.0f});
    })
    .build();
```

### Choosing a Node Style

| Style | Best For | Topic Creation | State |
|-------|----------|---------------|-------|
| Lambda (inline) | Simple nodes, prototyping | On scheduler | Captured variables |
| `horus::Node` subclass | Complex nodes, reusable components | In constructor | Member variables |
| `horus::LambdaNode` | Declarative, Python-like ergonomics | Builder chain | Via callback closure |

---

## See Also

- [Scheduler API](/cpp/api/scheduler) -- Node registration and execution
- [Publisher and Subscriber API](/cpp/api/topic) -- Zero-copy messaging details
- [Services and Actions API](/cpp/api/services) -- RPC and long-running tasks
- [C++ Real-Time Guide](/cpp/realtime) -- Budget, deadline, miss policies
- [C++ API Overview](/cpp/api) -- All classes at a glance

---

## Publisher & Subscriber API

Path: /cpp/api/topic
Description: Complete API reference for HORUS C++ zero-copy IPC -- Publisher, Subscriber, LoanedSample, BorrowedSample, and all 51 message types

# Publisher and Subscriber API

HORUS uses shared memory (SHM) for inter-process communication. `Publisher` writes messages, `Subscriber` reads them -- both zero-copy. The loan pattern lets you write directly into shared memory without any intermediate buffers or serialization.

> **Rust**: See [`Topic`](/rust/api/topic) for the Rust unified pub/sub API.

> **Python**: See [`node.send()` / `node.recv()`](/python/api) for the Python wrapper.

```cpp // simplified #include #include ```

---

## Quick Reference -- `Publisher<T>`

| Method | Returns | Description |
|--------|---------|-------------|
| `.loan()` | `LoanedSample<T>` | Get a writable buffer from SHM (zero-copy) |
| `.publish(LoanedSample<T>&&)` | `void` | Make a loaned sample visible to subscribers |
| `.send(const T&)` | `void` | Send a message by copy (simpler API) |
| `.name()` | `const string&` | Get the topic name |

## Quick Reference -- `Subscriber<T>`

| Method | Returns | Description |
|--------|---------|-------------|
| `.recv()` | `optional<BorrowedSample<T>>` | Receive the next message (nullopt if none) |
| `.has_msg()` | `bool` | Check if a message is available without consuming |
| `.name()` | `const string&` | Get the topic name |

## Quick Reference -- `LoanedSample<T>`

| Method | Returns | Description |
|--------|---------|-------------|
| `operator->()` | `T*` / `const T*` | Direct SHM pointer access (zero-copy write) |
| `operator*()` | `T&` / `const T&` | Dereference to the data |
| `.get()` | `T*` / `const T*` | Raw pointer to the data |

## Quick Reference -- `BorrowedSample<T>`

| Method | Returns | Description |
|--------|---------|-------------|
| `operator->()` | `const T*` | Read-only pointer access |
| `operator*()` | `const T&` | Read-only dereference |
| `.get()` | `const T*` | Raw read-only pointer |

---

## Creating Topics

Topics are created on the `Scheduler` or inside a `Node` / `LambdaNode`. Topic names use **dots** as separators (not slashes); this is required for macOS `shm_open` compatibility.

### On the Scheduler (for Lambda Nodes)

```cpp
auto cmd_pub = sched.advertise("motor.cmd");
auto scan_sub = sched.subscribe("lidar.scan");
```

### Inside a Node Subclass (Constructor) ```cpp class Controller : public horus::Node { public: Controller() : Node("controller") { scan_sub_ = subscribe("lidar.scan"); cmd_pub_ = advertise("motor.cmd"); } // ... 
private: horus::Subscriber* scan_sub_; horus::Publisher* cmd_pub_; }; ```

### Inside a LambdaNode (Builder)

```cpp
auto node = horus::LambdaNode("controller")
    .sub("lidar.scan")
    .pub("motor.cmd");
```

---

## Zero-Copy Loan Pattern

The loan pattern is the primary way to publish messages in HORUS. It eliminates all copies between producer and consumer.

### How It Works

```
1. pub.loan()                      --> SHM allocates a slot, returns LoanedSample
2. sample->field = value           --> writes directly into SHM (0 ns copy overhead)
3. pub.publish(std::move(sample))  --> flips visibility flag, subscribers can read
```

### Full Example

```cpp
auto pub = sched.advertise("odom");

sched.add("localizer")
    .rate(100_hz)
    .tick([&] {
        // Step 1: Loan a writable slot from shared memory
        auto sample = pub.loan();

        // Step 2: Write directly into SHM (zero copy)
        sample->pose.x = current_x;
        sample->pose.y = current_y;
        sample->pose.theta = current_theta;
        sample->timestamp_ns = now_ns();

        // Step 3: Publish (consumes the sample via move)
        pub.publish(std::move(sample));
    })
    .build();
```

After `publish()`, the `LoanedSample` has been moved from and must not be used again. C++ compilers do not reject use-after-move, so treat any later access as a bug; static analyzers such as clang-tidy can flag it.

### LoanedSample Access Patterns

```cpp
auto sample = pub.loan();

// Arrow operator -- most common
sample->linear = 0.5f;
sample->angular = 0.1f;

// Dereference operator
horus::msg::CmdVel& ref = *sample;
ref.linear = 0.5f;

// Raw pointer
horus::msg::CmdVel* ptr = sample.get();
ptr->linear = 0.5f;
```

---

## Send-by-Copy Pattern

For simplicity or small messages, `send()` copies the message into shared memory in one call:

```cpp
auto pub = sched.advertise("motor.cmd");

// Single-call publish -- copies msg into SHM
pub.send(horus::msg::CmdVel{0, 0.5f, 0.1f});
```

This is equivalent to `loan()` + write + `publish()` but performs one copy. For small messages like `CmdVel` (16 bytes), the overhead is negligible. For large messages like `LaserScan` (1468 bytes) or `Odometry`, prefer the loan pattern.

| Pattern | Copy Count | Best For |
|---------|-----------|----------|
| `loan()` + `publish()` | **0** | Large messages, high-frequency topics |
| `send()` | **1** | Small messages, simple code |

---

## Receiving Messages

### Basic Receive

`recv()` returns `std::optional<BorrowedSample<T>>`. If no message is available, it returns `std::nullopt`:

```cpp
auto sub = sched.subscribe("lidar.scan");

sched.add("processor")
    .rate(50_hz)
    .tick([&] {
        auto scan = sub.recv();
        if (!scan) return;  // no message this tick

        // Access fields through arrow operator (read-only)
        float front_range = scan->ranges[0];
        float min_range = scan->min_range;

        // Dereference
        const horus::msg::LaserScan& data = *scan;

        // Raw pointer
        const horus::msg::LaserScan* ptr = scan->get();
    })
    .build();
```

### Check Before Consuming

Use `has_msg()` to check availability without consuming the message:

```cpp
sched.add("conditional_processor")
    .rate(50_hz)
    .tick([&] {
        if (!scan_sub.has_msg()) {
            // No scan available -- do something else
            return;
        }
        auto scan = scan_sub.recv();  // guaranteed to have a value
    })
    .build();
```

### BorrowedSample Lifetime

The `BorrowedSample` is valid until it goes out of scope. RAII releases the SHM slot automatically:

```cpp
{
    auto scan = sub.recv();
    if (scan) {
        process(*scan);  // valid here
    }
}  // BorrowedSample released here -- SHM slot freed
```

---

## Move Semantics

All topic types are **move-only**. Copy constructors and copy assignment operators are deleted. This enforces single ownership, matching Rust's ownership model:

```cpp
auto pub = sched.advertise("cmd");
// auto pub2 = pub;               // COMPILE ERROR: copy deleted
auto pub2 = std::move(pub);       // OK: ownership transferred

auto sample = pub2.loan();
// auto sample2 = sample;         // COMPILE ERROR: copy deleted
pub2.publish(std::move(sample));  // OK: consumed by publish
// sample->linear = 1.0;          // BUG: sample was moved from (the compiler does not catch this)
```

---

## Supported Message Types (51)

All message types live in the `horus::msg` namespace.
They are `#[repr(C)]` compatible with Rust for zero-copy SHM IPC. Each type gets a `Publisher` and `Subscriber` template specialization generated internally via `HORUS_TOPIC_IMPL` in `impl/topic_impl.hpp`. **Geometry (12):** `Point3`, `Vector3`, `Quaternion`, `Twist`, `Pose2D`, `TransformStamped`, `Pose3D`, `PoseStamped`, `PoseWithCovariance`, `TwistWithCovariance`, `Accel`, `AccelStamped` **Sensor (11):** `LaserScan`, `Imu`, `Odometry`, `RangeSensor`, `BatteryState`, `NavSatFix`, `MagneticField`, `Temperature`, `FluidPressure`, `Illuminance`, `JointState` **Control (7):** `CmdVel`, `MotorCommand`, `DifferentialDriveCommand`, `ServoCommand`, `PidConfig`, `TrajectoryPoint`, `JointCommand` **Navigation (5):** `NavGoal`, `GoalResult`, `Waypoint`, `VelocityObstacle`, `PathPlan` **Diagnostics (8):** `Heartbeat`, `DiagnosticStatus`, `EmergencyStop`, `ResourceUsage`, `NodeHeartbeat`, `SafetyStatus`, `DiagnosticValue`, `DiagnosticReport` **Detection (12):** `BoundingBox2D`, `BoundingBox3D`, `Detection`, `Detection3D`, `Landmark`, `Landmark3D`, `LandmarkArray`, `TrackedObject`, `TrackingHeader`, `PlaneDetection`, `SegmentationMask`, `PointField` **Vision (3):** `CameraInfo`, `RegionOfInterest`, `StereoInfo` **Force (5):** `WrenchStamped`, `ImpedanceParameters`, `ForceCommand`, `ContactInfo`, `HapticFeedback` **Time (4):** `Clock`, `TimeReference`, `SimSync`, `RateRequest` **Input (3):** `KeyboardInput`, `JoystickInput`, `AudioFrame` --- ## Common Patterns ### Multi-Topic Node A single node can publish and subscribe to multiple topics: ```cpp auto imu_sub = sched.subscribe("imu.data"); auto odom_sub = sched.subscribe("odom"); auto cmd_pub = sched.advertise("motor.cmd"); auto diag_pub = sched.advertise("health.ctrl"); sched.add("fusion_controller") .rate(100_hz) .tick([&] { auto imu = imu_sub.recv(); auto odom = odom_sub.recv(); if (imu && odom) { auto cmd = cmd_pub.loan(); cmd->linear = compute_speed(*odom); cmd->angular = compute_turn(*imu); cmd_pub.publish(std::move(cmd)); 
} /* Always publish heartbeat; static keeps the counter across ticks */ static uint64_t seq = 0; diag_pub.send(horus::msg::Heartbeat{ {}, seq++, true, 0.0f, 0.0f, 0 }); }) .build(); ```

### Pipeline Pattern

Chain nodes through topics to form a processing pipeline:

```cpp
auto raw_sub = sched.subscribe("lidar.raw");
auto filt_pub = sched.advertise("lidar.filtered");
auto filt_sub = sched.subscribe("lidar.filtered");
auto cmd_pub = sched.advertise("motor.cmd");

// Stage 1: Filter raw scan
sched.add("filter")
    .rate(50_hz).order(1)
    .tick([&] {
        auto raw = raw_sub.recv();
        if (!raw) return;
        auto filtered = filt_pub.loan();
        *filtered = *raw;  // copy, then modify in place
        for (int i = 0; i < 360; ++i) {
            if (filtered->ranges[i] < 0.05f) filtered->ranges[i] = 0.0f;
        }
        filt_pub.publish(std::move(filtered));
    })
    .build();

// Stage 2: React to filtered scan
sched.add("controller")
    .rate(50_hz).order(2)
    .tick([&] {
        auto scan = filt_sub.recv();
        if (!scan) return;
        cmd_pub.send(horus::msg::CmdVel{0, 0.3f, 0.0f});
    })
    .build();
```

### Topic Name Conventions

Use dot-separated hierarchical names:

```
sensor.lidar.scan        # LiDAR data
sensor.imu.data          # IMU readings
motor.cmd                # Motor commands
nav.goal                 # Navigation goal
safety.alert             # Safety alerts
diagnostics.heartbeat    # Heartbeat
```

---

## See Also

- [Scheduler API](/cpp/api/scheduler) -- Topic creation via `advertise` and `subscribe`
- [Node API](/cpp/api/node) -- Node styles and lifecycle
- [Services and Actions API](/cpp/api/services) -- Request/response and long-running tasks
- [C++ API Overview](/cpp/api) -- All classes at a glance
- [Rust Topic API](/rust/api/topic) -- Equivalent Rust API

---

## Services & Actions API

Path: /cpp/api/services
Description: Complete API reference for HORUS C++ services (request/response RPC) and actions (long-running tasks with progress and cancellation)

# Services and Actions API

HORUS provides two communication patterns beyond pub/sub topics: **services** for synchronous request/response RPC, and **actions** for long-running tasks with progress
feedback and cancellation. Both use JSON as the wire format over shared memory topics, so they work same-process and cross-process with zero configuration. > **Rust**: See [Services](/rust/api/services) and [Actions](/rust/api/actions) for the Rust API. > **Python**: Services and actions are available via the Python bindings. ```cpp // simplified #include #include ``` --- ## Quick Reference -- ServiceClient | Method | Returns | Description | |--------|---------|-------------| | `ServiceClient(name)` | -- | Construct a client for the named service | | `.call(json, timeout)` | `optional` | Send request JSON, wait for response | | `operator bool()` | `bool` | Check if the client handle is valid | ## Quick Reference -- ServiceServer | Method | Returns | Description | |--------|---------|-------------| | `ServiceServer(name)` | -- | Construct a server for the named service | | `.set_handler(fn)` | `void` | Set the request handler callback | | `operator bool()` | `bool` | Check if the server handle is valid | ## Quick Reference -- ActionClient | Method | Returns | Description | |--------|---------|-------------| | `ActionClient(name)` | -- | Construct a client for the named action | | `.send_goal(json)` | `GoalHandle` | Send a goal and get a handle to track it | | `operator bool()` | `bool` | Check if the client handle is valid | ## Quick Reference -- ActionServer | Method | Returns | Description | |--------|---------|-------------| | `ActionServer(name)` | -- | Construct a server for the named action | | `.set_accept_handler(fn)` | `void` | Set the goal acceptance callback | | `.set_execute_handler(fn)` | `void` | Set the goal execution callback | | `.is_ready()` | `bool` | Check if both handlers are set | | `operator bool()` | `bool` | Check if the server handle is valid | ## Quick Reference -- GoalHandle | Method | Returns | Description | |--------|---------|-------------| | `.status()` | `GoalStatus` | Current status of the goal | | `.id()` | `uint64_t` | Unique goal 
identifier | | `.is_active()` | `bool` | True if Pending or Active | | `.cancel()` | `void` | Request cancellation of the goal | | `operator bool()` | `bool` | Check if the handle is valid | ## Quick Reference -- GoalStatus Enum | Value | Description | |-------|-------------| | `GoalStatus::Pending` | Goal accepted, waiting to start | | `GoalStatus::Active` | Goal is executing | | `GoalStatus::Succeeded` | Goal completed successfully | | `GoalStatus::Aborted` | Goal failed during execution | | `GoalStatus::Canceled` | Goal was canceled by client | | `GoalStatus::Rejected` | Goal was rejected by server | --- ## Services -- Request/Response RPC Services implement a synchronous request/response pattern. A client sends a JSON request and blocks until the server responds or a timeout expires. ### ServiceClient Create a client by name. Call `.call()` with a JSON string and a timeout: ```cpp #include #include using namespace std::chrono_literals; horus::ServiceClient client("add_two_ints"); // Check that the handle was created successfully if (!client) { fprintf(stderr, "Failed to create service client\n"); return; } // Send request, wait up to 1 second for response auto response = client.call(R"({"a": 3, "b": 4})", 1000ms); if (response) { printf("Response: %s\n", response->c_str()); // Output: {"sum": 7} } else { printf("Service call timed out or failed\n"); } ``` The `call()` method accepts both `const char*` and `const std::string&`: ```cpp // String literal auto r1 = client.call(R"({"x": 1.0})", 500ms); // std::string std::string request = R"({"x": 1.0, "y": 2.0})"; auto r2 = client.call(request, 500ms); ``` ### ServiceServer Create a server and set a handler function. 
The handler receives raw bytes (the JSON request) and writes raw bytes (the JSON response):

```cpp
#include
#include
#include

horus::ServiceServer server("add_two_ints");

server.set_handler([](const uint8_t* req, size_t req_len,
                      uint8_t* res, size_t* res_len) -> bool {
    // Parse request (in production, use a JSON library)
    // For this example, assume req is: {"a": 3, "b": 4}
    int a = 3, b = 4;  // parsed from req

    // Write response into the 4096-byte response buffer
    int written = snprintf(reinterpret_cast<char*>(res), 4096,
                           R"({"sum": %d})", a + b);
    if (written < 0 || written >= 4096) return false;  // encoding error or truncated
    *res_len = static_cast<size_t>(written);
    return true;  // true = success, false = error
});
```

The handler signature is:

```cpp
using Handler = bool(*)(const uint8_t* req, size_t req_len,
                        uint8_t* res, size_t* res_len);
```

- `req` / `req_len`: Request payload (JSON bytes)
- `res` / `res_len`: Response buffer (4096 bytes max) -- write your response here
- Return `true` for success, `false` for error

---

## Actions -- Long-Running Tasks

Actions handle operations that take time to complete, like navigation or trajectory execution. The client sends a goal and gets a `GoalHandle` to track progress and request cancellation. The server accepts or rejects goals and executes them asynchronously.
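
The track-then-cancel flow can be sketched without HORUS. In the block below, `FakeGoal` is a hypothetical stand-in that finishes after a fixed number of polls, the local `GoalStatus` enum only mirrors the documented names, and `wait_or_cancel` is an illustrative helper, not a library function. It polls with a sleep between checks and cancels when the poll budget runs out.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Local enum mirroring the documented status names (not the HORUS type).
enum class GoalStatus { Pending, Active, Succeeded, Aborted, Canceled, Rejected };

// Hypothetical goal handle: succeeds after a fixed number of polls.
struct FakeGoal {
    int polls_until_done;
    GoalStatus state = GoalStatus::Pending;

    bool is_active() {
        if (state == GoalStatus::Pending || state == GoalStatus::Active) {
            state = (polls_until_done-- <= 0) ? GoalStatus::Succeeded
                                              : GoalStatus::Active;
        }
        return state == GoalStatus::Pending || state == GoalStatus::Active;
    }
    void cancel() { state = GoalStatus::Canceled; }
    GoalStatus status() const { return state; }
};

// Poll up to max_polls times, sleeping between checks; cancel on timeout.
template <typename Goal>
GoalStatus wait_or_cancel(Goal& goal, int max_polls,
                          std::chrono::milliseconds period) {
    for (int i = 0; i < max_polls; ++i) {
        if (!goal.is_active()) return goal.status();
        std::this_thread::sleep_for(period);
    }
    if (goal.is_active()) goal.cancel();
    return goal.status();
}
```

Sleeping between polls avoids spinning a CPU core while the goal runs. The same helper shape should work with any handle type that offers `is_active()`, `cancel()`, and `status()`.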
### ActionClient Create a client, send a goal as JSON, and track it via `GoalHandle`: ```cpp #include horus::ActionClient client("navigate_to_pose"); if (!client) return; auto goal = client.send_goal(R"({"target_x": 5.0, "target_y": 3.0})"); if (!goal) return; printf("Goal %lu submitted\n", goal.id()); // Poll until complete while (goal.is_active()) { /* do other work or sleep */ } if (goal.status() == horus::GoalStatus::Succeeded) { printf("Navigation complete!\n"); } ``` ### Canceling a Goal Call `cancel()` on the `GoalHandle` to request cancellation: ```cpp auto goal = client.send_goal(R"({"x": 10.0, "y": 0.0})"); // Cancel after 5 seconds if not done std::this_thread::sleep_for(std::chrono::seconds(5)); if (goal.is_active()) goal.cancel(); ``` ### GoalHandle Lifecycle ``` send_goal() --> Pending --> Active --> Succeeded | | | +--> Aborted (server error) | +--> Canceled (client cancel) | +--> Rejected (server rejects) ``` The `is_active()` method returns `true` for `Pending` and `Active` states, `false` for all terminal states. ### ActionServer Create a server with two handlers: one to accept/reject goals, one to execute them: ```cpp #include horus::ActionServer server("navigate_to_pose"); // Accept handler: receives goal data, returns 0 to accept, 1 to reject server.set_accept_handler([](const uint8_t* goal_data, size_t len) -> uint8_t { // Parse goal JSON, validate parameters // Return 0 = accept, 1 = reject return 0; // accept all goals }); // Execute handler: receives goal_id and goal data, runs the task server.set_execute_handler([](uint64_t goal_id, const uint8_t* goal_data, size_t len) { printf("Executing goal %lu\n", goal_id); // Parse goal, execute navigation... 
// When done, the goal transitions to Succeeded/Aborted automatically }); // Verify both handlers are set if (server.is_ready()) { printf("Action server ready\n"); } ``` The handler signatures are: ```cpp using AcceptHandler = uint8_t(*)(const uint8_t* goal, size_t len); using ExecuteHandler = void(*)(uint64_t goal_id, const uint8_t* goal, size_t len); ``` --- ## JSON Wire Transport Services and actions use JSON as the wire format, serialized into `JsonWireMessage` Pod structs (4 KB each) transported over SHM topics. This design means: - **No code generation**: No `.srv` or `.action` IDL files. Just send/receive JSON strings - **Cross-language**: A Rust service server can handle requests from a C++ client (and vice versa) - **Cross-process**: Uses the same SHM transport as topics -- works across processes automatically - **Debugging**: JSON payloads are human-readable in `horus monitor` and BlackBox recordings ### Topic Layout Each service creates two internal topics: ``` {name}.request -- client sends request here {name}.response.{client_pid} -- server sends response here (per-client) ``` Each action creates three internal topics: ``` {name}.goal -- client sends goals here {name}.feedback -- server publishes progress here {name}.result -- server publishes final result here ``` These are standard HORUS topics -- you can monitor them with `horus topic list` and `horus monitor`. ## Ownership and Move Semantics All service and action types are **move-only**. Copy is deleted: ```cpp horus::ServiceClient a("my_svc"); // horus::ServiceClient b = a; // COMPILE ERROR horus::ServiceClient b = std::move(a); // OK horus::GoalHandle g = client.send_goal("{}"); // horus::GoalHandle g2 = g; // COMPILE ERROR horus::GoalHandle g2 = std::move(g); // OK ``` Resources are released in destructors via the C FFI (`horus_*_destroy` functions). Use RAII -- do not call destroy manually. 
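
The move-only RAII pattern described above can be illustrated with a small stand-alone wrapper. This is a sketch under stated assumptions: `demo_create` and `demo_destroy` are hypothetical stand-ins for a C FFI create/destroy pair, and `DemoHandle` is not a HORUS class.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical C-style API standing in for an FFI create/destroy pair.
static int g_live_handles = 0;
static void* demo_create() { ++g_live_handles; return new int(0); }
static void demo_destroy(void* p) { --g_live_handles; delete static_cast<int*>(p); }

// Move-only owner: copy is deleted, move transfers the raw handle,
// and the destructor releases it exactly once.
class DemoHandle {
public:
    DemoHandle() : ptr_(demo_create()) {}
    ~DemoHandle() { if (ptr_) demo_destroy(ptr_); }

    DemoHandle(const DemoHandle&) = delete;
    DemoHandle& operator=(const DemoHandle&) = delete;

    DemoHandle(DemoHandle&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)) {}
    DemoHandle& operator=(DemoHandle&& other) noexcept {
        if (this != &other) {
            if (ptr_) demo_destroy(ptr_);
            ptr_ = std::exchange(other.ptr_, nullptr);
        }
        return *this;
    }

    // Same validity check style as the documented handles.
    explicit operator bool() const { return ptr_ != nullptr; }

private:
    void* ptr_;
};

static_assert(!std::is_copy_constructible<DemoHandle>::value, "copy is deleted");
static_assert(std::is_move_constructible<DemoHandle>::value, "move is allowed");
```

After a move, the source handle tests `false` and its destructor does nothing, so the underlying resource is destroyed once, by whichever object currently owns it.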
---

## Common Patterns

### Service with JSON Library

In production, use a JSON library (nlohmann/json, rapidjson, simdjson) for parsing:

```cpp
#include
#include
using json = nlohmann::json;

horus::ServiceServer server("compute_ik");

server.set_handler([](const uint8_t* req, size_t req_len,
                      uint8_t* res, size_t* res_len) -> bool {
    auto request = json::parse(req, req + req_len);
    double x = request["target_x"];
    double y = request["target_y"];
    double z = request["target_z"];

    // Compute inverse kinematics...
    json response = {
        {"joint_angles", {0.1, 0.5, -0.3, 0.0, 1.2, 0.0}},
        {"success", true}
    };

    std::string resp_str = response.dump();
    if (resp_str.size() > 4096) return false;  // response buffer is 4096 bytes
    std::memcpy(res, resp_str.data(), resp_str.size());
    *res_len = resp_str.size();
    return true;
});
```

### Cross-Process Service Call

Services work across processes with no extra configuration. Start the server in one process and the client in another -- they communicate via SHM automatically:

```cpp
// Process A: server
horus::ServiceServer server("robot.status");
server.set_handler([](const uint8_t*, size_t, uint8_t* res, size_t* res_len) -> bool {
    const char* s = R"({"battery": 85, "state": "idle"})";
    std::memcpy(res, s, strlen(s));
    *res_len = strlen(s);
    return true;
});
```

```cpp
// Process B: client
horus::ServiceClient client("robot.status");
auto resp = client.call("{}", std::chrono::milliseconds(500));
if (resp) printf("Robot status: %s\n", resp->c_str());
```

### Action with Scheduler Integration

Run an action server inside a scheduled node for navigation: ```cpp horus::ActionServer nav_server("navigate"); /* handlers are plain function pointers, so the lambdas cannot capture; the shared flag is static (std::atomic needs <atomic>) */ static std::atomic<bool> nav_active{false}; nav_server.set_accept_handler([](const uint8_t*, size_t) -> uint8_t { return nav_active ? 1 : 0; /* reject if already navigating */ }); nav_server.set_execute_handler([](uint64_t id, const uint8_t* data, size_t len) { nav_active = true; // parse target from data... 
}); auto odom_sub = sched.subscribe("odom"); auto cmd_pub = sched.advertise("motor.cmd"); sched.add("nav_executor").rate(50_hz) .tick([&] { if (!nav_active) return; auto odom = odom_sub.recv(); if (!odom) return; double dist = std::sqrt(/* dx^2 + dy^2 */); if (dist < 0.1) { nav_active = false; cmd_pub.send(horus::msg::CmdVel{0, 0.0f, 0.0f}); } else { cmd_pub.send(horus::msg::CmdVel{0, 0.3f, 0.0f}); } }).build(); ``` --- ## See Also - [Scheduler API](/cpp/api/scheduler) -- Where services and actions are registered alongside nodes - [Node API](/cpp/api/node) -- Node lifecycle for hosting service/action servers - [Publisher and Subscriber API](/cpp/api/topic) -- The underlying SHM transport - [C++ API Overview](/cpp/api) -- All classes at a glance - [Rust Services API](/rust/api/services) -- Equivalent Rust service API - [Rust Actions API](/rust/api/actions) -- Equivalent Rust action API --- ## TensorPool, Image & PointCloud API Path: /cpp/api/pool Description: Zero-copy SHM-backed memory pools for camera images, lidar point clouds, and neural network tensors # TensorPool, Image & PointCloud API HORUS provides zero-copy shared memory pools for large data: camera frames, lidar scans, and neural network tensors. Data lives in a pre-allocated SHM region and is passed between nodes by reference -- no serialization, no copies. ```cpp #include ``` --- ## Architecture The pool system has four classes arranged in a hierarchy: | Class | Purpose | |-------|---------| | `TensorPool` | Manages a shared memory region with fixed-size slots | | `Tensor` | Raw N-dimensional array allocated from a pool | | `Image` | Camera frame with width, height, and pixel encoding | | `PointCloud` | Lidar scan with per-point fields (XYZ, intensity, color) | `Image` and `PointCloud` are convenience wrappers over pool-backed memory. When you publish any pool-backed type on a topic, subscribers get a pointer into the same SHM region -- zero copies. 
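
Because pool memory is pre-allocated, it helps to estimate payload sizes before picking a pool size. The helpers below are a rough sketch, assuming tightly packed pixels and 32-bit float point fields; the function names are illustrative and not part of HORUS, and any per-slot overhead the real pool adds is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes for one image: width * height * bytes per pixel.
constexpr std::size_t image_bytes(std::uint32_t w, std::uint32_t h,
                                  std::size_t bytes_per_pixel) {
    return static_cast<std::size_t>(w) * h * bytes_per_pixel;
}

// Bytes for one point cloud, assuming each field is a 32-bit float.
constexpr std::size_t cloud_bytes(std::size_t points, std::size_t fields) {
    return points * fields * sizeof(float);
}

// Lower bound on pool size for `slots` payloads of `payload` bytes each.
constexpr std::size_t min_pool_bytes(std::size_t payload, std::size_t slots) {
    return payload * slots;
}

static_assert(image_bytes(1280, 720, 3) == 2764800, "720p RGB8 frame");
static_assert(image_bytes(1920, 1080, 3) == 6220800, "1080p RGB8 frame");
```

For example, three 720p RGB8 frames in flight need at least `min_pool_bytes(image_bytes(1280, 720, 3), 3)` bytes, about 8.3 MB, before any headroom.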
--- ## TensorPool A fixed-size shared memory region divided into allocation slots. ### Constructor ```cpp horus::TensorPool pool( uint32_t pool_id, // unique pool identifier size_t pool_size_bytes, // total SHM region size size_t max_slots // maximum concurrent allocations ); ``` **Parameters** | Name | Type | Description | |------|------|-------------| | `pool_id` | `uint32_t` | Unique identifier for this pool. Must not collide with other pools in the same system. | | `pool_size_bytes` | `size_t` | Total size of the SHM-backed memory region in bytes. | | `max_slots` | `size_t` | Maximum number of tensors/images/point clouds that can be allocated simultaneously. | **Example** ```cpp // 16 MB pool with up to 64 concurrent allocations auto pool = horus::TensorPool(1, 16 * 1024 * 1024, 64); ``` ### stats() Returns current pool utilization. ```cpp TensorPool::Stats stats() const; ``` **Returns** a `Stats` struct: | Field | Type | Description | |-------|------|-------------| | `allocated` | `size_t` | Number of active allocations | | `used_bytes` | `size_t` | Bytes currently in use | | `free_bytes` | `size_t` | Bytes available for allocation | ```cpp auto s = pool.stats(); horus::log::info("pool", ("used: " + std::to_string(s.used_bytes) + " free: " + std::to_string(s.free_bytes)).c_str()); ``` ### Ownership and Validity `TensorPool` is **move-only** -- copy construction and copy assignment are deleted. Use `explicit operator bool()` to check if the pool handle is valid: ```cpp auto pool2 = std::move(pool); // OK -- transfers ownership if (!pool2) { horus::log::error("init", "Failed to create tensor pool"); } ``` --- ## Tensor A raw N-dimensional array allocated from a `TensorPool`. 
### Constructor ```cpp horus::Tensor tensor( const TensorPool& pool, // pool to allocate from const uint64_t* shape, // dimension sizes array size_t ndim, // number of dimensions Dtype dtype // element data type ); ``` **Parameters** | Name | Type | Description | |------|------|-------------| | `pool` | `const TensorPool&` | Pool to allocate from. Must remain valid for the tensor's lifetime. | | `shape` | `const uint64_t*` | Array of dimension sizes. Length must equal `ndim`. | | `ndim` | `size_t` | Number of dimensions (rank of the tensor). | | `dtype` | `Dtype` | Element data type. See Dtype enum below. | ### Methods | Method | Return | Description | |--------|--------|-------------| | `data()` | `uint8_t*` | Raw pointer to tensor memory. Cast based on `dtype`. | | `nbytes()` | `uint64_t` | Total size in bytes (product of dimensions * element size). | | `release()` | `void` | Returns the slot to the pool early. Called automatically on destruction. | --- ## Dtype Enum Element data types for tensors. ```cpp enum class Dtype : uint8_t { F32, // 32-bit float F64, // 64-bit float U8, // unsigned 8-bit integer I32, // signed 32-bit integer }; ``` | Variant | Size | Use Case | |---------|------|----------| | `Dtype::F32` | 4 bytes | Neural network weights, sensor readings | | `Dtype::F64` | 8 bytes | High-precision computation | | `Dtype::U8` | 1 byte | Image pixels, raw byte buffers | | `Dtype::I32` | 4 bytes | Integer indices, counters | --- ## Image A camera frame with width, height, and pixel encoding, backed by pool memory. ### Constructor ```cpp horus::Image img( const TensorPool& pool, uint32_t width, uint32_t height, Encoding enc = Encoding::Rgb8 // optional, defaults to RGB8 ); ``` **Parameters** | Name | Type | Description | |------|------|-------------| | `pool` | `const TensorPool&` | Pool to allocate from. | | `width` | `uint32_t` | Image width in pixels. | | `height` | `uint32_t` | Image height in pixels. | | `enc` | `Encoding` | Pixel encoding format. 
Defaults to `Encoding::Rgb8`. | ### Methods | Method | Return Type | Description | |--------|-------------|-------------| | `width()` | `uint32_t` | Image width in pixels | | `height()` | `uint32_t` | Image height in pixels | | `data_size()` | `size_t` | Total image data size in bytes (width * height * bytes_per_pixel) | --- ## Encoding Enum Pixel encoding formats for images. ```cpp enum class Encoding : uint8_t { Rgb8, // 3 bytes per pixel: red, green, blue Rgba8, // 4 bytes per pixel: red, green, blue, alpha Gray8, // 1 byte per pixel: grayscale Bgr8, // 3 bytes per pixel: blue, green, red (OpenCV default) }; ``` | Variant | Bytes/Pixel | Use Case | |---------|-------------|----------| | `Encoding::Rgb8` | 3 | Standard color images | | `Encoding::Rgba8` | 4 | Images with transparency | | `Encoding::Gray8` | 1 | Depth maps, IR images, edge detection output | | `Encoding::Bgr8` | 3 | OpenCV-native format, avoids channel swap | --- ## PointCloud A lidar scan or 3D point set, backed by pool memory. ### Constructor ```cpp horus::PointCloud pc( const TensorPool& pool, uint32_t num_points, uint32_t fields_per_point ); ``` **Parameters** | Name | Type | Description | |------|------|-------------| | `pool` | `const TensorPool&` | Pool to allocate from. | | `num_points` | `uint32_t` | Number of points in the cloud. | | `fields_per_point` | `uint32_t` | Floats per point: 3 = XYZ, 4 = XYZI, 6 = XYZRGB. | ### Methods | Method | Return Type | Description | |--------|-------------|-------------| | `num_points()` | `uint64_t` | Number of points in the cloud | | `fields_per_point()` | `uint32_t` | Number of float fields per point | ### Common Field Layouts | Fields | Layout | Use Case | |--------|--------|----------| | 3 | X, Y, Z | Basic geometry | | 4 | X, Y, Z, Intensity | Lidar with reflectance | | 6 | X, Y, Z, R, G, B | Colored point cloud | --- ## Example: Camera Pipeline A camera node that captures frames and publishes them over a zero-copy topic. 
```cpp
#include
#include

auto pool = horus::TensorPool(1, 32 * 1024 * 1024, 32);
auto cam_topic = horus::Topic("camera.rgb");

// Inside tick():
auto frame = horus::Image(pool, 1280, 720, horus::Encoding::Rgb8);
if (frame) {
    // Fill frame data from hardware driver...
    // data_size() = 1280 * 720 * 3 = 2,764,800 bytes
    cam_topic.send(std::move(frame));
    // Subscriber receives a pointer into the same SHM -- zero copies
}
```

## Example: Neural Network Tensor

```cpp
#include

auto pool = horus::TensorPool(2, 64 * 1024 * 1024, 16);

// Allocate a [1, 3, 224, 224] input tensor for a vision model
uint64_t shape[] = {1, 3, 224, 224};
auto input = horus::Tensor(pool, shape, 4, horus::Dtype::F32);
if (input) {
    float* data = reinterpret_cast<float*>(input.data());
    // Fill with preprocessed image data...
    // nbytes() = 1 * 3 * 224 * 224 * 4 = 602,112 bytes
}
```

---

## Pool Sizing Guidelines

| Workload | Pool Size | Max Slots | Rationale |
|----------|-----------|-----------|-----------|
| Single 720p camera | 16 MB | 8 | ~2.7 MB per frame, triple-buffered with headroom |
| Stereo 1080p cameras | 64 MB | 16 | ~6.2 MB per frame, two cameras, pipeline depth |
| 16-beam lidar | 8 MB | 16 | ~360 KB per scan at 30K XYZ points (3 floats x 4 bytes each) |
| NN inference batch | 64 MB | 16 | Depends on model input/output sizes |

Overallocating pool size is safe -- unused SHM is not committed to physical memory on Linux. Underallocating `max_slots` causes allocation failures when the pipeline is full.

---

## Runtime Parameters API

Path: /cpp/api/params
Description: Dynamic configuration for C++ nodes -- set, get, and live-tune parameters at runtime

# Runtime Parameters API

`horus::Params` provides dynamic key-value configuration for nodes. Parameters can be read and written at any time -- including while the node is running. This enables live tuning of gains, thresholds, and behavior without restarting the system.
```cpp
#include
```

---

## Quick Reference

| Method | Description |
|--------|-------------|
| `Params()` | Create a new empty parameter store |
| `set(key, value)` | Set a parameter (6 overloads by type) |
| `get<T>(key, default)` | Get a typed parameter with fallback (5 specializations) |
| `get_f64(key)` | Get as `std::optional<double>` |
| `get_i64(key)` | Get as `std::optional<int64_t>` |
| `get_bool(key)` | Get as `std::optional<bool>` |
| `get_string(key)` | Get as `std::optional<std::string>` |
| `has(key)` | Check if a key exists |

---

## Constructor

```cpp
horus::Params params;
```

Creates an empty parameter store. No arguments. Move-only -- copy construction and copy assignment are deleted.

---

## Typed Setters

Six overloads of `set()` cover all supported types. Each returns `true` on success, `false` on failure.

```cpp
bool set(const char* key, double value);
bool set(const char* key, int64_t value);
bool set(const char* key, int value);
bool set(const char* key, bool value);
bool set(const char* key, const char* value);
bool set(const char* key, const std::string& value);
```

**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `key` | `const char*` | Parameter name. Use dot-separated namespaces (e.g., `"pid.kp"`). |
| `value` | varies | The value to store. Type determines storage format. |

**Returns** `true` if the parameter was set successfully, `false` on error.

The `int` overload promotes to `int64_t` internally, so `set("count", 42)` and `set("count", int64_t(42))` are equivalent.

```cpp
horus::Params params;
params.set("max_speed", 1.5);               // double
params.set("timeout_ms", int64_t(500));     // int64_t
params.set("retries", 3);                   // int -> int64_t
params.set("enabled", true);                // bool
params.set("name", "robot1");               // const char*

std::string label = "arm_left";
params.set("label", label);                 // const std::string&
```

---

## Template Get with Default

The primary way to read parameters. Returns the value if it exists and matches the type, otherwise returns the default.
```cpp
template <typename T>
T get(const char* key, T default_val) const;
```

**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `key` | `const char*` | Parameter name to look up. |
| `default_val` | `T` | Value to return if the key does not exist. |

### Template Specializations

| Type `T` | Internal Call | Example |
|----------|---------------|---------|
| `double` | `get_f64(key)` | `params.get<double>("kp", 1.0)` |
| `int64_t` | `get_i64(key)` | `params.get<int64_t>("count", 0)` |
| `int` | `get_i64(key)` cast to `int` | `params.get<int>("retries", 3)` |
| `bool` | `get_bool(key)` | `params.get<bool>("enabled", true)` |
| `std::string` | `get_string(key)` | `params.get<std::string>("name", "default")` |

```cpp
double kp = params.get<double>("pid.kp", 1.0);
int64_t timeout = params.get<int64_t>("timeout_ms", 500);
int retries = params.get<int>("retries", 3);
bool debug = params.get<bool>("debug", false);
std::string name = params.get<std::string>("robot.name", "unnamed");
```

---

## Optional Getters

For cases where you need to distinguish "parameter missing" from "parameter set to the default value," use the optional getters directly. Each returns `std::nullopt` if the key does not exist.

### get_f64

```cpp
std::optional<double> get_f64(const char* key) const;
```

### get_i64

```cpp
std::optional<int64_t> get_i64(const char* key) const;
```

### get_bool

```cpp
std::optional<bool> get_bool(const char* key) const;
```

### get_string

```cpp
std::optional<std::string> get_string(const char* key) const;
```

Returns the string value if it exists. Internally uses a 1024-byte buffer -- strings longer than 1023 characters are truncated.

```cpp
auto maybe_name = params.get_string("robot.name");
if (maybe_name.has_value()) {
    horus::log::info("config", ("Name: " + *maybe_name).c_str());
} else {
    horus::log::warn("config", "No robot name configured");
}
```

---

## has()

```cpp
bool has(const char* key) const;
```

Returns `true` if the key exists in the parameter store, regardless of its type.
```cpp if (params.has("safety.max_force")) { double limit = params.get("safety.max_force", 100.0); // apply force limit... } ``` --- ## Parameter Naming Conventions Use dot-separated namespaces to organize parameters. | Pattern | Example | Purpose | |---------|---------|---------| | `subsystem.param` | `pid.kp` | Group by subsystem | | `subsystem.component.param` | `arm.elbow.max_torque` | Hierarchical grouping | | `safety.*` | `safety.max_speed` | Safety-critical parameters | Avoid flat names like `"kp"` or `"max_speed"` -- they collide across nodes. --- ## Example: PID Gain Tuning at Runtime A motor controller that reads PID gains from parameters, allowing live adjustment without restart. ```cpp #include #include struct MotorController { horus::Params params; double integral = 0.0; double prev_error = 0.0; void init() { // Set initial gains params.set("pid.kp", 2.0); params.set("pid.ki", 0.1); params.set("pid.kd", 0.05); params.set("pid.output_limit", 10.0); params.set("pid.anti_windup", true); } double compute(double setpoint, double measured, double dt) { // Read gains -- can be changed at runtime via CLI or dashboard double kp = params.get("pid.kp", 2.0); double ki = params.get("pid.ki", 0.1); double kd = params.get("pid.kd", 0.05); double limit = params.get("pid.output_limit", 10.0); bool anti_windup = params.get("pid.anti_windup", true); double error = setpoint - measured; integral += error * dt; // Anti-windup: clamp integral term if (anti_windup) { double i_max = limit / ki; if (integral > i_max) integral = i_max; if (integral < -i_max) integral = -i_max; } double derivative = (error - prev_error) / dt; prev_error = error; double output = kp * error + ki * integral + kd * derivative; // Clamp output if (output > limit) output = limit; if (output < -limit) output = -limit; return output; } }; ``` To tune gains at runtime, another node or the CLI updates the parameter store: ```cpp // From a tuning node or dashboard callback: controller.params.set("pid.kp", 
2.5); // increase proportional gain controller.params.set("pid.ki", 0.05); // reduce integral gain // Changes take effect on the next tick() -- no restart needed ``` --- ## Live Tuning Pattern For nodes that support runtime reconfiguration, read parameters every tick rather than caching them in `init()`. The overhead of `get()` is negligible compared to typical tick budgets. ```cpp // Good: reads fresh values every tick void tick() { double speed = params.get("max_speed", 1.0); bool enabled = params.get("enabled", true); // ... } // Avoid: stale values after parameter update void init() { max_speed_ = params.get("max_speed", 1.0); // cached, never refreshed } ``` For parameters that require validation or have side effects on change, check with `has()` and validate before applying: ```cpp void tick() { if (params.has("new_target")) { auto target = params.get("new_target", ""); if (validate_target(target)) { current_target_ = target; horus::log::info("nav", "Target updated"); } } } ``` --- ## TransformFrame API Path: /cpp/api/transform Description: Coordinate frame tree for tracking spatial relationships between robot links, sensors, and world frames # TransformFrame API `horus::TransformFrame` maintains a tree of coordinate frames -- world, base link, sensors, end effectors -- and computes transforms between any two frames. This is the C++ equivalent of ROS TF2: register frames in a parent-child hierarchy, update their poses over time, and query the transform between arbitrary frames. 
```cpp
#include
```

---

## Quick Reference

| Method | Description |
|--------|-------------|
| `TransformFrame()` | Create frame tree with default capacity |
| `TransformFrame(max_frames)` | Create frame tree with reserved capacity |
| `register_frame(name, parent)` | Add a frame to the tree |
| `update(frame, pos, rot, ts)` | Set a frame's current pose |
| `lookup(source, target)` | Compute transform between two frames |
| `can_transform(source, target)` | Check if a path exists between frames |

---

## Transform Struct

The result of a `lookup()` call. Contains translation and rotation.

```cpp
struct Transform {
    std::array<double, 3> translation;  // [x, y, z] in meters
    std::array<double, 4> rotation;     // [qx, qy, qz, qw] quaternion
};
```

| Field | Type | Description |
|-------|------|-------------|
| `translation` | `std::array<double, 3>` | Position offset as [x, y, z] in meters |
| `rotation` | `std::array<double, 4>` | Orientation as quaternion [qx, qy, qz, qw]. Identity = [0, 0, 0, 1] |

The quaternion convention is Hamilton (scalar-last): `qx, qy, qz` are the imaginary components, `qw` is the real (scalar) component. Identity rotation is `{0, 0, 0, 1}`.

---

## Constructors

### Default

```cpp
horus::TransformFrame tf;
```

Creates a frame tree with default capacity. Suitable for most robots.

### With Capacity

```cpp
horus::TransformFrame tf(size_t max_frames);
```

Pre-allocates storage for `max_frames` frames. Use when you know the frame count ahead of time to avoid reallocation.

```cpp
// Robot with 20 joints + 5 sensors + world + base = 27 frames
horus::TransformFrame tf(32);
```

`TransformFrame` is move-only. Copy construction and copy assignment are deleted.

---

## register_frame()

```cpp
int register_frame(const char* name, const char* parent = nullptr);
```

Adds a frame to the tree under the specified parent.

**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `name` | `const char*` | Unique name for this frame (e.g., `"base_link"`, `"camera_optical"`). |
| `parent` | `const char*` | Parent frame name, or `nullptr`/`""` for a root frame. |

**Returns** the frame ID (non-negative) on success, or `-1` on error (duplicate name, parent not found).

**Rules**:

- The first frame registered with no parent becomes the root.
- Every subsequent frame must have a parent that already exists in the tree.
- Frame names must be unique within the tree.

```cpp
horus::TransformFrame tf;
tf.register_frame("world");                  // root (no parent)
tf.register_frame("base_link", "world");     // child of world
tf.register_frame("shoulder", "base_link");  // child of base_link
tf.register_frame("camera", "base_link");    // another child of base_link
```

---

## update()

```cpp
bool update(
    const char* frame,
    const std::array<double, 3>& pos,
    const std::array<double, 4>& rot,
    uint64_t timestamp_ns
);
```

Sets the current pose of a frame relative to its parent.

**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `frame` | `const char*` | Name of the frame to update. Must already be registered. |
| `pos` | `std::array<double, 3>` | Translation [x, y, z] in meters relative to parent. |
| `rot` | `std::array<double, 4>` | Rotation [qx, qy, qz, qw] quaternion relative to parent. |
| `timestamp_ns` | `uint64_t` | Timestamp in nanoseconds. Use monotonic clock for consistency. |

**Returns** `true` on success, `false` if the frame is not found.

```cpp
uint64_t now = get_monotonic_ns();

// Base link is 0.5m above ground, no rotation
tf.update("base_link", {0.0, 0.0, 0.5}, {0, 0, 0, 1}, now);

// Shoulder rotated 45 degrees about Z
// qz = sin(pi/8), qw = cos(pi/8)
tf.update("shoulder", {0.0, 0.0, 0.3}, {0, 0, 0.3827, 0.9239}, now);
```

---

## lookup()

```cpp
std::optional<Transform> lookup(const char* source, const char* target) const;
```

Computes the transform that converts a point in the `source` frame to the `target` frame. Traverses the tree through the common ancestor.
**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `source` | `const char*` | Frame to transform from. |
| `target` | `const char*` | Frame to transform to. |

**Returns** `std::optional<Transform>` -- the transform if a path exists, `std::nullopt` if either frame is unknown or they are in disconnected subtrees.

```cpp
auto result = tf.lookup("camera", "world");
if (result.has_value()) {
    auto& t = *result;
    double cam_x = t.translation[0];  // camera X in world frame
    double cam_y = t.translation[1];
    double cam_z = t.translation[2];
}
```

### Transform Direction

`lookup("A", "B")` gives you the transform **from A to B**. To transform a point `p` expressed in frame A into frame B coordinates, apply this transform:

```
p_in_B = rotation * p_in_A + translation
```

---

## can_transform()

```cpp
bool can_transform(const char* source, const char* target) const;
```

Returns `true` if a path exists between the two frames in the tree. Use this to check connectivity before calling `lookup()`.

```cpp
if (tf.can_transform("lidar", "base_link")) {
    auto t = tf.lookup("lidar", "base_link");  // guaranteed to succeed
}
```

---

## Static vs Dynamic Transforms

**Static transforms** are set once and never change -- sensor mounts, fixed joints, calibration offsets.

**Dynamic transforms** change every tick -- joint angles, odometry, SLAM corrections.

Both use the same `update()` call. The distinction is in how often you call it.
```cpp void init() { tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.register_frame("camera", "base_link"); tf.register_frame("lidar", "base_link"); uint64_t now = get_monotonic_ns(); // Static: camera mounted 0.1m forward, 0.3m up, no rotation tf.update("camera", {0.1, 0.0, 0.3}, {0, 0, 0, 1}, now); // Static: lidar mounted 0.0m forward, 0.4m up, no rotation tf.update("lidar", {0.0, 0.0, 0.4}, {0, 0, 0, 1}, now); } void tick() { uint64_t now = get_monotonic_ns(); // Dynamic: base_link updated from odometry every tick double odom_x = get_odometry_x(); double odom_y = get_odometry_y(); double odom_yaw = get_odometry_yaw(); double qz = std::sin(odom_yaw / 2.0); double qw = std::cos(odom_yaw / 2.0); tf.update("base_link", {odom_x, odom_y, 0.5}, {0, 0, qz, qw}, now); // Now lookup camera position in world frame auto cam_world = tf.lookup("camera", "world"); // Automatically chains: camera -> base_link -> world } ``` --- ## Example: Multi-Sensor Robot A mobile robot with a camera, lidar, and IMU, each at a fixed offset from the base. 
```cpp #include struct SensorFusion { horus::TransformFrame tf; void init() { tf.register_frame("world"); tf.register_frame("odom", "world"); tf.register_frame("base_link", "odom"); tf.register_frame("camera_link", "base_link"); tf.register_frame("camera_optical", "camera_link"); tf.register_frame("lidar", "base_link"); uint64_t now = get_monotonic_ns(); // Static sensor mounts (set once) tf.update("camera_link", {0.12, 0.0, 0.25}, {0, 0, 0, 1}, now); tf.update("camera_optical", {0, 0, 0}, {-0.5, 0.5, -0.5, 0.5}, now); tf.update("lidar", {0.0, 0.0, 0.35}, {0, 0, 0, 1}, now); } void tick() { uint64_t now = get_monotonic_ns(); tf.update("odom", {odom_x, odom_y, 0}, {0, 0, odom_qz, odom_qw}, now); // Fuse lidar points into world frame if (tf.can_transform("lidar", "world")) { auto t = tf.lookup("lidar", "world"); // t->translation = lidar position in world coordinates } } double odom_x = 0, odom_y = 0, odom_qz = 0, odom_qw = 1; }; ``` --- ## Frame Tree Conventions Follow these naming conventions for consistency with the HORUS ecosystem and ROS interop: | Frame | Parent | Purpose | |-------|--------|---------| | `world` | (root) | Fixed global reference | | `odom` | `world` | Odometry origin (may drift) | | `base_link` | `odom` | Robot center (on ground plane) | | `base_footprint` | `base_link` | Projected to ground (optional) | | `*_link` | `base_link` | Sensor mount points | | `*_optical` | `*_link` | Optical convention (Z-forward) | --- ## Logging & BlackBox API Path: /cpp/api/logging Description: Structured logging visible in the CLI and persistent blackbox recording for post-mortem analysis # Logging & BlackBox API HORUS provides two complementary recording mechanisms: - **`horus::log`** -- Structured log messages visible in the `horus log` CLI. Use for operational status, warnings, and errors during normal operation. - **`horus::blackbox`** -- Persistent flight recorder for post-mortem analysis. Use for events that matter after a crash or anomaly. 
```cpp #include ``` --- ## Quick Reference ### Logging | Function | Level | Use Case | |----------|-------|----------| | `horus::log::info(node, msg)` | Info | Normal operation, milestones | | `horus::log::warn(node, msg)` | Warn | Degraded operation, approaching limits | | `horus::log::error(node, msg)` | Error | Failures, safety events | ### BlackBox | Function | Purpose | |----------|---------| | `horus::blackbox::record(node, event, data)` | Record an event to the flight recorder | --- ## horus::log ### info() ```cpp void horus::log::info(const char* node, const char* msg); void horus::log::info(const std::string& node, const std::string& msg); ``` Emits an informational log message. Visible in `horus log` with default filters. **Parameters** | Name | Type | Description | |------|------|-------------| | `node` | `const char*` or `const std::string&` | Name of the node emitting the log. Should match the node's registered name. | | `msg` | `const char*` or `const std::string&` | Log message content. | ```cpp horus::log::info("motor_ctrl", "Motor controller initialized at 1000Hz"); horus::log::info("camera", "Streaming 1280x720 RGB @ 30fps"); ``` ### warn() ```cpp void horus::log::warn(const char* node, const char* msg); void horus::log::warn(const std::string& node, const std::string& msg); ``` Emits a warning log message. Indicates degraded operation or approaching limits. ```cpp horus::log::warn("imu", "Calibration drift detected: 0.3 deg/s"); horus::log::warn("battery", "Battery at 15% -- consider returning to base"); ``` ### error() ```cpp void horus::log::error(const char* node, const char* msg); void horus::log::error(const std::string& node, const std::string& msg); ``` Emits an error log message. Indicates a failure that requires attention. 
```cpp horus::log::error("safety", "Emergency stop triggered by collision sensor"); horus::log::error("motor_ctrl", "Motor driver communication timeout"); ``` ### Log Levels | Level | Constant | When to Use | |-------|----------|-------------| | Info | 0 | Normal operation: startup, milestones, periodic status | | Warn | 1 | Something is wrong but the system continues: sensor drift, low battery, approaching limits | | Error | 2 | Something failed: communication loss, safety event, unrecoverable state | The log level is passed internally as an integer to the C FFI layer (`horus_log(level, node, msg)`). Info = 0, Warn = 1, Error = 2. --- ## Viewing Logs with the CLI All log messages are visible in the `horus log` command: ```bash # Stream all logs horus log # Filter by node horus log --node motor_ctrl # Filter by level horus log --level warn # Filter by level and node horus log --level error --node safety ``` Logs appear in real-time as nodes emit them. The CLI connects to the same SHM transport that nodes use for IPC. --- ## String Formatting The C++ log API takes `const char*` or `const std::string&`. For formatted messages, build the string before logging: ```cpp // Using std::to_string double temp = 72.5; horus::log::info("sensor", ("Temperature: " + std::to_string(temp) + " C").c_str()); // Using snprintf for precise formatting char buf[256]; std::snprintf(buf, sizeof(buf), "Position: (%.3f, %.3f, %.3f)", x, y, z); horus::log::info("odom", buf); // Using std::string concatenation std::string msg = "Joint angles: ["; for (size_t i = 0; i < 6; ++i) { if (i > 0) msg += ", "; msg += std::to_string(angles[i]); } msg += "]"; horus::log::info("arm", msg); ``` --- ## BlackBox Recording The blackbox is a persistent flight recorder. Unlike logs, blackbox entries survive crashes and are designed for post-mortem analysis. Think of it as the "black box" on an aircraft. 
### record() ```cpp void horus::blackbox::record(const char* node, const char* event, const char* data); ``` Records a structured event to the flight recorder. **Parameters** | Name | Type | Description | |------|------|-------------| | `node` | `const char*` | Node that recorded the event. | | `event` | `const char*` | Event category (e.g., `"collision"`, `"estop"`, `"joint_limit"`). | | `data` | `const char*` | Event payload -- typically JSON or a structured string. | ```cpp horus::blackbox::record("safety", "collision", "{\"sensor\": \"bumper_front\", \"force_n\": 45.2}"); horus::blackbox::record("motor_ctrl", "overcurrent", "{\"motor\": 3, \"current_a\": 12.5, \"limit_a\": 10.0}"); horus::blackbox::record("nav", "goal_reached", "{\"goal\": \"charging_station\", \"error_m\": 0.02}"); ``` ### Viewing BlackBox Data ```bash # Dump all blackbox entries horus blackbox dump # Filter by node horus blackbox dump --node safety # Filter by event type horus blackbox dump --event collision # Export for analysis horus blackbox export --format json > crash_report.json ``` --- ## When to Use Log vs BlackBox | Scenario | Use | Why | |----------|-----|-----| | "Motor initialized at 1000Hz" | `log::info` | Operational status, no post-mortem value | | "IMU drift exceeds 1 deg/s" | `log::warn` | Operator should see it, but not a crash event | | "Emergency stop triggered" | Both | Operator sees it now (log), investigators see it later (blackbox) | | "Joint hit limit at 2.35 rad" | `blackbox::record` | Critical for root-cause analysis after incident | | "Collision detected, force=45N" | `blackbox::record` | Physical event that must be preserved | | "Starting path to waypoint 3" | `log::info` | Operational, no forensic value | | "Motor overcurrent: 12.5A" | Both | Safety event visible now and preserved | **Rule of thumb**: If you would want to see it when investigating why a robot stopped working 3 hours ago, record it in the blackbox. 
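For the "Both" rows in the table above, one option is a small helper that sends the same event to the live log and to the flight recorder in a single call. The sketch below is an illustration only: `sketch::overcurrent_payload` and `sketch::report_safety_event` are hypothetical names that are not part of HORUS, and the two sinks are passed in as callables so the example compiles without any HORUS headers.

```cpp
#include <cstdio>
#include <string>

namespace sketch {  // hypothetical helpers, not part of the HORUS API

// Formats an overcurrent payload as a small JSON string.
inline std::string overcurrent_payload(int motor, double current_a, double limit_a) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "{\"motor\": %d, \"current_a\": %.1f, \"limit_a\": %.1f}",
                  motor, current_a, limit_a);
    return buf;
}

// Sends one safety event to both sinks.
// LogFn is invoked as log(node, msg); RecordFn as record(node, event, data).
template <typename LogFn, typename RecordFn>
void report_safety_event(LogFn&& log, RecordFn&& record,
                         const char* node, const char* event,
                         const std::string& human_msg,
                         const std::string& payload) {
    log(node, human_msg.c_str());          // operator sees it now
    record(node, event, payload.c_str());  // investigators see it later
}

}  // namespace sketch
```

In a HORUS node the two callables would presumably be thin lambdas that forward to `horus::log::error` and `horus::blackbox::record`. Lambdas are used rather than the function names themselves because `horus::log::error` is overloaded and cannot be deduced directly as a template argument.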
--- ## Example: Structured Diagnostics A motor controller that logs operational status and records safety events for post-mortem analysis. ```cpp #include #include struct MotorDiagnostics { double current_limit = 10.0; // amps int overcurrent_count = 0; void check_motor(int motor_id, double current_a, double temp_c) { // Normal status -- log char buf[256]; std::snprintf(buf, sizeof(buf), "Motor %d: %.1fA, %.1f C", motor_id, current_a, temp_c); horus::log::info("motor_diag", buf); // Warning threshold -- warn if (current_a > current_limit * 0.8) { std::snprintf(buf, sizeof(buf), "Motor %d approaching current limit: %.1fA / %.1fA", motor_id, current_a, current_limit); horus::log::warn("motor_diag", buf); } // Overcurrent -- error + blackbox if (current_a > current_limit) { overcurrent_count++; std::snprintf(buf, sizeof(buf), "Motor %d overcurrent: %.1fA (limit %.1fA)", motor_id, current_a, current_limit); horus::log::error("motor_diag", buf); // Blackbox: structured data for post-mortem char event_data[512]; std::snprintf(event_data, sizeof(event_data), "{\"motor\": %d, \"current_a\": %.2f, \"limit_a\": %.2f, " "\"temp_c\": %.1f, \"count\": %d}", motor_id, current_a, current_limit, temp_c, overcurrent_count); horus::blackbox::record("motor_diag", "overcurrent", event_data); } // Thermal warning if (temp_c > 80.0) { std::snprintf(buf, sizeof(buf), "Motor %d thermal warning: %.1f C", motor_id, temp_c); horus::log::warn("motor_diag", buf); char event_data[256]; std::snprintf(event_data, sizeof(event_data), "{\"motor\": %d, \"temp_c\": %.1f}", motor_id, temp_c); horus::blackbox::record("motor_diag", "thermal_warning", event_data); } } }; ``` --- ## Best Practices **Do:** - Use the node name consistently -- match the name registered with the scheduler - Keep messages concise -- include the key metric, not a paragraph - Use blackbox for any event you would want during incident investigation - Include units in numeric values (`"45.2N"`, `"12.5A"`, `"2.35rad"`) **Avoid:** - 
Logging every tick at info level -- at 1000 Hz this floods the log. Log periodic summaries instead
- Logging inside tight loops without rate limiting
- Using error level for non-errors (e.g., "no new data this tick" is normal, not an error)
- Putting sensitive data (passwords, keys) in logs or blackbox

---

## Duration & Frequency API

Path: /cpp/api/duration
Description: User-defined literals for durations and frequencies, plus the Miss policy enum for deadline handling

# Duration & Frequency API

HORUS provides user-defined literals that mirror the Rust `DurationExt` trait: `100_hz`, `10_ms`, `200_us`. These make scheduling configuration readable and type-safe.

```cpp
#include
using namespace horus::literals;
```

---

## Quick Reference

| Literal | Type | Example | Value |
|---------|------|---------|-------|
| `_hz` | `horus::Frequency` | `100_hz` | 100 Hz frequency |
| `_s` | `std::chrono::microseconds` | `5_s` | 5,000,000 us |
| `_ms` | `std::chrono::microseconds` | `10_ms` | 10,000 us |
| `_us` | `std::chrono::microseconds` | `200_us` | 200 us |
| `_ns` | `std::chrono::nanoseconds` | `500_ns` | 500 ns |

All duration literals produce `std::chrono` types. The `_hz` literal produces `horus::Frequency`.

---

## Duration Type

```cpp
namespace horus {
using Duration = std::chrono::microseconds;
}
```

HORUS uses microseconds as the standard duration unit internally. The `_s` and `_ms` literals convert to microseconds automatically.
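To make the conversion concrete, the sketch below defines stand-in literals in a hypothetical `sketch` namespace using only `<chrono>`. It is not the HORUS implementation, just one plausible way literals with this behavior could be written; the `static_assert` lines check the microsecond values quoted in the Quick Reference at compile time.

```cpp
#include <chrono>

namespace sketch {  // hypothetical stand-ins; the real literals live in horus::literals

using Duration = std::chrono::microseconds;

// Seconds are converted to the microsecond representation.
constexpr Duration operator""_s(unsigned long long s) {
    return std::chrono::duration_cast<Duration>(
        std::chrono::seconds(static_cast<long long>(s)));
}

// Milliseconds are converted to the microsecond representation.
constexpr Duration operator""_ms(unsigned long long ms) {
    return std::chrono::duration_cast<Duration>(
        std::chrono::milliseconds(static_cast<long long>(ms)));
}

// Microseconds are already in the target unit.
constexpr Duration operator""_us(unsigned long long us) {
    return Duration(static_cast<long long>(us));
}

// Parentheses are required: `5_s.count()` would be lexed as a single token.
static_assert((5_s).count() == 5000000, "5 s is 5,000,000 us");
static_assert((10_ms).count() == 10000, "10 ms is 10,000 us");
static_assert((200_us).count() == 200, "200 us stays 200 us");

}  // namespace sketch
```

Because every literal here returns the same `std::chrono::microseconds` type, values written in different units can be compared and added directly, for example `5_s == 5000_ms`.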
--- ## Literals Enable the literals with: ```cpp using namespace horus::literals; ``` ### Signatures ```cpp constexpr horus::Frequency operator""_hz(unsigned long long hz); constexpr std::chrono::microseconds operator""_s(unsigned long long s); constexpr std::chrono::microseconds operator""_ms(unsigned long long ms); constexpr std::chrono::microseconds operator""_us(unsigned long long us); constexpr std::chrono::nanoseconds operator""_ns(unsigned long long ns); ``` Note: `_ns` returns `std::chrono::nanoseconds`; all other duration literals return `std::chrono::microseconds`. ### Examples ```cpp auto rate = 100_hz; // Frequency(100.0) auto timeout = 5_s; // 5,000,000 microseconds auto budget = 10_ms; // 10,000 microseconds auto jitter = 800_us; // 800 microseconds auto precise = 500_ns; // 500 nanoseconds ``` --- ## Frequency Class `horus::Frequency` wraps a frequency value in Hz and provides methods to derive scheduling parameters. ### Constructor ```cpp constexpr explicit Frequency(double hz); ``` Creates a frequency from a `double`. Prefer the `_hz` literal for integer frequencies. ### value() ```cpp constexpr double value() const; ``` Returns the frequency in Hz. ```cpp auto rate = 100_hz; double hz = rate.value(); // 100.0 ``` ### period() ```cpp constexpr std::chrono::microseconds period() const; ``` Returns the period (time between ticks) as microseconds. Computed as `1,000,000 / hz`. ```cpp auto rate = 100_hz; auto p = rate.period(); // 10,000 us = 10 ms ``` | Frequency | Period | |-----------|--------| | `10_hz` | 100,000 us (100 ms) | | `50_hz` | 20,000 us (20 ms) | | `100_hz` | 10,000 us (10 ms) | | `500_hz` | 2,000 us (2 ms) | | `1000_hz` | 1,000 us (1 ms) | ### budget_default() ```cpp constexpr std::chrono::microseconds budget_default() const; ``` Returns the default compute budget: **80% of the period**. This is the maximum time a node's `tick()` should take under normal conditions. 
```cpp auto rate = 100_hz; auto budget = rate.budget_default(); // 8,000 us = 8 ms (80% of 10 ms) ``` ### deadline_default() ```cpp constexpr std::chrono::microseconds deadline_default() const; ``` Returns the default deadline: **95% of the period**. If `tick()` exceeds this, the `Miss` policy is triggered. ```cpp auto rate = 100_hz; auto deadline = rate.deadline_default(); // 9,500 us = 9.5 ms (95% of 10 ms) ``` ### Derived Timing Table | Frequency | Period | Budget (80%) | Deadline (95%) | |-----------|--------|-------------|----------------| | `10_hz` | 100 ms | 80 ms | 95 ms | | `50_hz` | 20 ms | 16 ms | 19 ms | | `100_hz` | 10 ms | 8 ms | 9.5 ms | | `500_hz` | 2 ms | 1.6 ms | 1.9 ms | | `1000_hz` | 1 ms | 0.8 ms | 0.95 ms | --- ## Miss Enum Defines what happens when a node's `tick()` exceeds its deadline. ```cpp enum class Miss { Warn, // Log a warning, continue running Skip, // Skip the next tick to catch up SafeMode, // Call enter_safe_state() on the node Stop, // Stop the scheduler entirely }; ``` | Variant | Behavior | Use Case | |---------|----------|----------| | `Miss::Warn` | Logs a deadline-miss warning. Node keeps running at normal rate. | Non-critical nodes: logging, visualization, telemetry | | `Miss::Skip` | Skips the next scheduled tick to recover timing. Prevents cascading overruns. | Compute-heavy nodes: planning, perception | | `Miss::SafeMode` | Calls `enter_safe_state()` on the node, then continues. | Safety-critical nodes: motor control, force limiting | | `Miss::Stop` | Triggers scheduler shutdown. All nodes receive `shutdown()`. | Hard real-time systems where any overrun is catastrophic | --- ## Using with NodeBuilder Durations and frequencies are designed for the scheduler's builder API. ### rate() Sets the node's tick frequency. The scheduler calls `tick()` at this rate. 
```cpp using namespace horus::literals; horus::Scheduler sched; sched.add("motor_ctrl") .rate(1000_hz) .tick([&] { /* runs every 1 ms */ }) .build(); ``` ### budget() Sets the maximum expected execution time for `tick()`. If exceeded, the watchdog records a budget overrun. ```cpp sched.add("motor_ctrl") .rate(1000_hz) .budget(800_us) // tick() should finish within 800 us .tick([&] { /* motor control */ }) .build(); ``` ### deadline() Sets the hard deadline. If exceeded, the `Miss` policy is triggered. ```cpp sched.add("motor_ctrl") .rate(1000_hz) .budget(800_us) .deadline(950_us) .on_miss(horus::Miss::SafeMode) .tick([&] { /* motor control */ }) .build(); ``` ### Full Example A complete scheduler setup with three nodes at different rates and miss policies. ```cpp #include #include using namespace horus::literals; int main() { horus::Scheduler sched; sched.tick_rate(1000_hz); // Hard RT: motor control at 1 kHz, enters safe state on overrun sched.add("motor_ctrl") .rate(1000_hz) .budget(800_us) .deadline(950_us) .on_miss(horus::Miss::SafeMode) .pin_core(2) .priority(90) .order(0) .tick([&] { /* PID loop */ }) .build(); // Soft RT: state estimation at 200 Hz, skips on overrun sched.add("estimator") .rate(200_hz) .budget(4_ms) .deadline(4500_us) .on_miss(horus::Miss::Skip) .order(10) .tick([&] { /* EKF update */ }) .build(); // Best effort: logging at 10 Hz, warns on overrun sched.add("logger") .rate(10_hz) .budget(50_ms) .on_miss(horus::Miss::Warn) .order(100) .tick([&] { /* write logs */ }) .build(); sched.run(); } ``` --- ## Conversion Examples ### Frequency to Duration ```cpp auto rate = 100_hz; auto period = rate.period(); // 10,000 us auto budget = rate.budget_default(); // 8,000 us auto deadline = rate.deadline_default(); // 9,500 us ``` ### Duration Arithmetic Since durations are `std::chrono` types, standard arithmetic works: ```cpp auto a = 500_us; auto b = 300_us; auto sum = a + b; // 800 us auto diff = a - b; // 200 us auto scaled = a * 2; // 1000 us // 
Compare bool fits = (800_us < 1_ms); // true ``` --- ## Sensor Messages Path: /cpp/api/sensor-messages Description: LaserScan, Imu, Odometry, JointState, RangeSensor, BatteryState, NavSatFix, and more # Sensor Messages (C++) All sensor types live in `horus::msg::` namespace. Include via ``. ## Quick Reference | Type | Size | Key Fields | Use Case | |------|------|------------|----------| | `LaserScan` | 1472 B | `ranges[360]`, `angle_min/max` | 2D LiDAR | | `Imu` | 304 B | `orientation[4]`, `angular_velocity[3]`, `linear_acceleration[3]` | IMU | | `Odometry` | ~700 B | `pose` (Pose2D), `twist` (Twist), covariance matrices | Wheel encoders | | `JointState` | ~900 B | `names[16][32]`, `positions[16]`, `velocities[16]`, `efforts[16]` | Robot arms | | `RangeSensor` | 32 B | `range`, `min/max_range`, `field_of_view` | Ultrasonic/IR | | `BatteryState` | 40 B | `voltage`, `current`, `percentage`, `health`, `status` | Battery monitor | | `NavSatFix` | 112 B | `latitude`, `longitude`, `altitude`, `satellites` | GPS | | `MagneticField` | 136 B | `magnetic_field[3]`, covariance | Magnetometer | | `Temperature` | 56 B | `temperature` (Celsius), `variance` | Thermometer | | `FluidPressure` | 56 B | `fluid_pressure` (Pascals) | Barometer | | `Illuminance` | 56 B | `illuminance` (Lux) | Light sensor | ## LaserScan The most common sensor message. 360 range readings for a full-circle 2D LiDAR. 
```cpp #include class LidarNode : public horus::Node { public: LidarNode() : Node("lidar") { pub_ = advertise("lidar.scan"); } void tick() override { auto scan = pub_->loan(); // Fill range data (meters, 0 = invalid) for (int i = 0; i < 360; i++) { scan->ranges[i] = read_range(i); } scan->angle_min = 0.0f; // start angle (radians) scan->angle_max = 6.28318f; // end angle (2*pi) scan->min_range = 0.12f; // sensor minimum (meters) scan->max_range = 12.0f; // sensor maximum scan->time_increment = 0.0f; // time between measurements scan->scan_time = 0.1f; // full scan duration (seconds) scan->timestamp_ns = 0; pub_->publish(std::move(scan)); } private: horus::Publisher* pub_; float read_range(int angle) { return 2.0f; } // placeholder }; ``` ### Finding Minimum Range ```cpp auto scan = sub_->recv(); if (!scan) return; float min_range = 999.0f; int min_angle = 0; for (int i = 0; i < 360; i++) { float r = scan->get()->ranges[i]; if (r > 0.01f && r < min_range) { min_range = r; min_angle = i; } } // min_angle is in degrees (0-359) ``` ## Imu 9-axis IMU: orientation quaternion + angular velocity + linear acceleration, each with 3x3 covariance. 
```cpp horus::msg::Imu imu{}; imu.orientation[0] = 0.0; // qx imu.orientation[1] = 0.0; // qy imu.orientation[2] = 0.0; // qz imu.orientation[3] = 1.0; // qw (identity) imu.angular_velocity[2] = 0.01; // yaw rate (rad/s) imu.linear_acceleration[2] = 9.81; // gravity (m/s^2) imu.timestamp_ns = 0; // Covariance: -1 in first element = "no data" imu.orientation_covariance[0] = -1.0; ``` ## JointState Up to 16 joints with names, positions, velocities, and efforts: ```cpp horus::msg::JointState js{}; js.joint_count = 6; // 6-DOF arm // Set joint names (null-terminated, max 31 chars) std::strncpy(reinterpret_cast(js.names[0]), "shoulder_pan", 31); std::strncpy(reinterpret_cast(js.names[1]), "shoulder_lift", 31); std::strncpy(reinterpret_cast(js.names[2]), "elbow", 31); std::strncpy(reinterpret_cast(js.names[3]), "wrist_1", 31); std::strncpy(reinterpret_cast(js.names[4]), "wrist_2", 31); std::strncpy(reinterpret_cast(js.names[5]), "wrist_3", 31); // Set positions (radians for revolute joints) js.positions[0] = 0.0; js.positions[1] = -1.57; // shoulder down js.positions[2] = 1.57; // elbow bent // velocities and efforts default to 0 ``` ## See Also - [Control Messages](/docs/cpp/api/control-messages) — MotorCommand, ServoCommand, PidConfig - [Geometry Messages](/docs/cpp/api/geometry-messages) — Twist, Pose2D, Point3, Quaternion - [Publisher & Subscriber API](/docs/cpp/api/topic) — how to send/receive these types --- ## Control Messages Path: /cpp/api/control-messages Description: CmdVel, MotorCommand, ServoCommand, JointCommand, PidConfig, and more # Control Messages (C++) All control types in `horus::msg::`. Include via ``. 
## Quick Reference | Type | Size | Key Fields | Use Case | |------|------|------------|----------| | `CmdVel` | 16 B | `linear` (m/s), `angular` (rad/s) | Mobile robot velocity | | `MotorCommand` | 32 B | `target_velocity`, `target_position`, `kp`, `kd`, `max_effort` | Single motor | | `ServoCommand` | 24 B | `target_position`, `target_velocity`, `max_force` | Servo actuator | | `DifferentialDriveCommand` | 16 B | `left_velocity`, `right_velocity` | Two-wheel robot | | `PidConfig` | 32 B | `kp`, `ki`, `kd`, `integral_limit`, `derivative_filter` | PID tuning | | `TrajectoryPoint` | 104 B | `positions[3]`, `velocities[3]`, `accelerations[3]`, `effort[3]` | Path following | | `JointCommand` | 88 B | `joint_id[32]`, `target[3]`, `velocity[3]` | Joint-level control | ## CmdVel — The Most Common Message 16 bytes. Used by virtually every mobile robot: ```cpp // Publishing velocity commands auto pub = sched.advertise("cmd_vel"); horus::msg::CmdVel cmd{}; cmd.linear = 0.3f; // 0.3 m/s forward cmd.angular = 0.0f; // no turning cmd.timestamp_ns = 0; pub.send(cmd); // Zero-copy with loan pattern auto sample = pub.loan(); sample->linear = 0.5f; sample->angular = -0.1f; // slight right turn pub.publish(std::move(sample)); ``` ## MotorCommand — Single Motor Control ```cpp horus::msg::MotorCommand cmd{}; cmd.target_velocity = 100.0f; // RPM cmd.target_position = 0.0f; // not used in velocity mode cmd.kp = 1.0f; // proportional gain cmd.kd = 0.1f; // derivative gain cmd.max_effort = 5.0f; // max torque (Nm) cmd.timestamp_ns = 0; ``` ## PidConfig — Runtime Gain Tuning Send PID gains as a message (allows live tuning from another node): ```cpp class GainTuner : public horus::Node { public: GainTuner(horus::Params& params) : Node("tuner"), params_(params) { pid_pub_ = advertise("pid.config"); } void tick() override { horus::msg::PidConfig cfg{}; cfg.kp = static_cast(params_.get("kp", 1.0)); cfg.ki = static_cast(params_.get("ki", 0.1)); cfg.kd = static_cast(params_.get("kd", 0.05)); 
cfg.integral_limit = 10.0f; cfg.derivative_filter = 0.8f; pid_pub_->send(cfg); } private: horus::Params& params_; horus::Publisher* pid_pub_; }; ``` ## See Also - [Sensor Messages](/docs/cpp/api/sensor-messages) — LaserScan, Imu, Odometry - [Recipe: PID Controller](/docs/recipes/pid-controller-cpp) — using PidConfig in practice - [Recipe: Differential Drive](/docs/recipes/differential-drive-cpp) — CmdVel to wheel speeds --- ## Geometry Messages Path: /cpp/api/geometry-messages Description: Twist, Pose2D, Pose3D, Point3, Vector3, Quaternion, TransformStamped, and covariance types # Geometry Messages (C++) Spatial types in `horus::msg::`. Include via ``. ## Quick Reference | Type | Size | Key Fields | Use Case | |------|------|------------|----------| | `Twist` | 56 B | `linear[3]`, `angular[3]` | 3D velocity | | `Pose2D` | 32 B | `x`, `y`, `theta` | 2D robot pose | | `Pose3D` | 64 B | `position` (Point3), `orientation` (Quaternion) | 3D pose | | `Point3` | 24 B | `x`, `y`, `z` | 3D point | | `Vector3` | 24 B | `x`, `y`, `z` | 3D vector | | `Quaternion` | 32 B | `x`, `y`, `z`, `w` | 3D rotation | | `TransformStamped` | 64 B | `translation[3]`, `rotation[4]` | TF tree data | | `PoseStamped` | 72 B | `Pose3D` + `timestamp_ns` | Timestamped pose | | `PoseWithCovariance` | ~360 B | `Pose3D` + `covariance[36]` | Uncertain pose | | `TwistWithCovariance` | ~344 B | `Twist` + `covariance[36]` | Uncertain velocity | | `Accel` | 56 B | `linear[3]`, `angular[3]` | 3D acceleration | | `AccelStamped` | 64 B | `Accel` + `timestamp_ns` | Timestamped accel | ## Twist — 3D Velocity ```cpp horus::msg::Twist twist{}; twist.linear[0] = 0.5; // vx (m/s) — forward twist.linear[1] = 0.0; // vy — lateral (holonomic only) twist.linear[2] = 0.0; // vz — vertical twist.angular[0] = 0.0; // roll rate (rad/s) twist.angular[1] = 0.0; // pitch rate twist.angular[2] = 0.3; // yaw rate twist.timestamp_ns = 0; ``` ## Pose2D — 2D Robot Position The simplest pose — x, y, heading: ```cpp 
horus::msg::Pose2D pose{}; pose.x = 1.5; // meters pose.y = 2.0; // meters pose.theta = 0.785; // radians (45 degrees) pose.timestamp_ns = 0; ``` ## Quaternion — 3D Rotation Always normalize. `w=1, x=y=z=0` is identity (no rotation): ```cpp horus::msg::Quaternion q{}; q.x = 0.0; q.y = 0.0; q.z = 0.0; q.w = 1.0; // identity // 90° rotation around Z axis: q.x = 0.0; q.y = 0.0; q.z = 0.7071; q.w = 0.7071; ``` ## TransformStamped — For TF Tree Used internally by TransformFrame, but also publishable: ```cpp horus::msg::TransformStamped tf{}; tf.translation[0] = 0.2; // x offset (meters) tf.translation[1] = 0.0; tf.translation[2] = 0.3; // z offset (height) tf.rotation[0] = 0.0; // qx tf.rotation[1] = 0.0; // qy tf.rotation[2] = 0.0; // qz tf.rotation[3] = 1.0; // qw (identity) tf.timestamp_ns = 0; ``` ## Odometry Uses Pose2D + Twist ```cpp // Odometry combines position and velocity: auto odom = sub_->recv(); if (odom) { double x = odom->get()->pose.x; double y = odom->get()->pose.y; double heading = odom->get()->pose.theta; double vx = odom->get()->twist.linear[0]; double wz = odom->get()->twist.angular[2]; } ``` ## See Also - [Sensor Messages](/docs/cpp/api/sensor-messages) — Odometry uses Pose2D + Twist - [TransformFrame API](/docs/cpp/api/transform) — uses TransformStamped internally - [Navigation Messages](/docs/cpp/api/navigation-messages) — NavGoal uses Pose2D --- ## Navigation Messages Path: /cpp/api/navigation-messages Description: NavGoal, GoalResult, Waypoint, PathPlan, VelocityObstacle # Navigation Messages (C++) Path planning and goal types in `horus::msg::`. Include via ``. 
## Quick Reference

| Type | Key Fields | Use Case |
|------|------------|----------|
| `NavGoal` | `target_x`, `target_y`, `target_theta`, `tolerance` | Send navigation target |
| `GoalResult` | `success`, `distance_error`, `angle_error` | Navigation completion |
| `Waypoint` | `x`, `y`, `heading`, `velocity` | Path waypoint |
| `PathPlan` | `waypoint_data[768]`, `waypoint_count`, `goal_pose[3]` | Planned path |
| `VelocityObstacle` | `center[2]`, `velocity[2]`, `radius` | Collision avoidance |

## NavGoal — Send Navigation Target

```cpp
horus::msg::NavGoal goal{};
goal.target_x = 5.0;       // meters
goal.target_y = 3.0;       // meters
goal.target_theta = 1.57;  // radians (~90 degrees; faces +y if theta is measured counterclockwise from +x)
goal.tolerance = 0.1f;     // accept within 10 cm
goal.timestamp_ns = 0;
```

## PathPlan — Compact Path Representation

`waypoint_data` holds 768 floats: up to 256 waypoints x 3 values (x, y, heading), so waypoint `i` occupies indices `3*i` through `3*i + 2`:

```cpp
horus::msg::PathPlan plan{};
plan.waypoint_count = 3;
plan.goal_pose[0] = 5.0f;  // goal x
plan.goal_pose[1] = 3.0f;  // goal y
plan.goal_pose[2] = 0.0f;  // goal heading

// Waypoint 0: (0, 0, 0)
plan.waypoint_data[0] = 0.0f;
plan.waypoint_data[1] = 0.0f;
plan.waypoint_data[2] = 0.0f;

// Waypoint 1: (2.5, 1.5, 0.5)
plan.waypoint_data[3] = 2.5f;
plan.waypoint_data[4] = 1.5f;
plan.waypoint_data[5] = 0.5f;

// Waypoint 2: (5.0, 3.0, 0.0) = goal
plan.waypoint_data[6] = 5.0f;
plan.waypoint_data[7] = 3.0f;
plan.waypoint_data[8] = 0.0f;
```

## See Also

- [Geometry Messages](/docs/cpp/api/geometry-messages) — Pose2D used by Odometry
- [Recipe: LiDAR Avoidance](/docs/recipes/lidar-avoidance-cpp) — reactive navigation

---

## Detection & Vision Messages

Path: /cpp/api/detection-messages
Description: BoundingBox, Detection, CameraInfo, TrackedObject, Landmark, SegmentationMask

# Detection & Vision Messages (C++)

Perception types in `horus::msg::`. Include via `` and ``.
## Detection Types | Type | Key Fields | Use Case | |------|------------|----------| | `BoundingBox2D` | `center_x/y`, `width`, `height`, `angle` | 2D object detection | | `BoundingBox3D` | `center[3]`, `size[3]`, `rotation[4]`, `confidence` | 3D object detection | | `Detection` | `bbox` (BoundingBox2D), `class_id`, `confidence` | YOLO/SSD output | | `Detection3D` | `bbox` (BoundingBox3D), `class_id`, `velocity[3]` | 3D detector output | | `TrackedObject` | `track_id`, `position[3]`, `velocity[3]`, `age` | MOT tracker | | `SegmentationMask` | `width`, `height`, `num_classes`, `mask_type` | Semantic segmentation | ## Vision Types | Type | Key Fields | Use Case | |------|------------|----------| | `CameraInfo` | `width`, `height`, `fx/fy/cx/cy`, `distortion[5]` | Camera calibration | | `RegionOfInterest` | `x/y_offset`, `width`, `height`, `do_rectify` | Image crop region | | `StereoInfo` | `left` (CameraInfo), `right` (CameraInfo), `baseline` | Stereo pair | ## Detection Example ```cpp class Detector : public horus::Node { public: Detector() : Node("detector") { det_pub_ = advertise("detections"); } void tick() override { // After running inference... 
horus::msg::Detection det{}; det.bbox.center_x = 320.0f; // pixels det.bbox.center_y = 240.0f; det.bbox.width = 50.0f; det.bbox.height = 80.0f; det.bbox.angle = 0.0f; det.class_id = 1; // "person" det.confidence = 0.95f; det.timestamp_ns = 0; det_pub_->send(det); } private: horus::Publisher* det_pub_; }; ``` ## CameraInfo — Intrinsic Calibration ```cpp horus::msg::CameraInfo cam{}; cam.width = 640; cam.height = 480; cam.fx = 525.0; // focal length x (pixels) cam.fy = 525.0; // focal length y cam.cx = 320.0; // principal point x cam.cy = 240.0; // principal point y cam.distortion[0] = -0.28; // k1 cam.distortion[1] = 0.07; // k2 cam.distortion[2] = 0.0; // p1 cam.distortion[3] = 0.0; // p2 cam.distortion[4] = 0.0; // k3 ``` ## Tracking Types | Type | Key Fields | Use Case | |------|------------|----------| | `Landmark` | `id`, `x`, `y`, `covariance[4]` | 2D landmark | | `Landmark3D` | `x`, `y`, `z`, `visibility`, `index` | 3D landmark (packed) | | `LandmarkArray` | `num_landmarks`, `confidence`, `bbox_*` | Pose estimation output | | `TrackingHeader` | `frame_count`, `active_tracks` | Tracker metadata | ## See Also - [Sensor Messages](/docs/cpp/api/sensor-messages) — LaserScan, Imu for perception input - [TensorPool API](/docs/cpp/api/pool) — Image and PointCloud for raw perception data --- ## Diagnostics Messages Path: /cpp/api/diagnostics-messages Description: Heartbeat, EmergencyStop, DiagnosticStatus, ResourceUsage, SafetyStatus, NodeHeartbeat # Diagnostics Messages (C++) System health and safety types in `horus::msg::`. Include via ``. 
## Quick Reference | Type | Key Fields | Use Case | |------|------------|----------| | `Heartbeat` | `node_id[32]`, `sequence`, `alive`, `cpu_usage`, `memory_usage` | Node liveness | | `EmergencyStop` | `engaged` (u8), `reason[64]`, `source[32]`, `auto_reset` | Safety stop | | `DiagnosticStatus` | `level` (0-3), `message[256]` | Component health | | `ResourceUsage` | `cpu_percent`, `memory_mb`, `disk_percent` | System resources | | `NodeHeartbeat` | `node_name[32]`, `tick_rate`, `cpu_usage`, `uptime_ms` | Per-node metrics | | `SafetyStatus` | `safe` (bool), `confidence`, `last_fault[128]` | Safety system state | | `DiagnosticValue` | `key[32]`, `value[64]`, `value_type` | Key-value diagnostic | | `DiagnosticReport` | `component[32]`, `values[16]`, `level` | Multi-value report | ## Heartbeat Published periodically to prove a node is alive: ```cpp class HeartbeatPublisher : public horus::Node { public: HeartbeatPublisher() : Node("heartbeat_pub") { hb_pub_ = advertise("heartbeat"); } void tick() override { if (++tick_ % 100 != 0) return; // 1 Hz at 100 Hz scheduler horus::msg::Heartbeat hb{}; std::strncpy(reinterpret_cast(hb.node_id), "motor_ctrl", 31); hb.sequence = seq_++; hb.alive = true; hb.cpu_usage = 12.5f; // percent hb.memory_usage = 45.0f; // MB hb_pub_->send(hb); } private: horus::Publisher* hb_pub_; uint64_t seq_ = 0; int tick_ = 0; }; ``` ## EmergencyStop The most safety-critical message. `engaged=1` means all actuators must stop immediately: ```cpp // Trigger e-stop horus::msg::EmergencyStop estop{}; estop.engaged = 1; std::strncpy(reinterpret_cast(estop.reason), "Obstacle < 10cm", 63); std::strncpy(reinterpret_cast(estop.source), "safety_monitor", 31); estop.auto_reset = 0; // manual reset required estop_pub_->send(estop); // Clear e-stop horus::msg::EmergencyStop clear{}; clear.engaged = 0; estop_pub_->send(clear); ``` **Convention:** Every actuator node must subscribe to `"emergency.stop"` and zero outputs when `engaged == 1`. 
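The convention above can be sketched as a small latch that every actuator node consults before sending a command. This is a minimal illustration, not HORUS API: `EStopState`, `VelocityCmd`, and `EStopLatch` are hypothetical stand-ins for `horus::msg::EmergencyStop`, `horus::msg::CmdVel`, and the node's own bookkeeping. Treating any non-zero `engaged` value as "stop" is an assumption made here for safety.

```cpp
#include <cstdint>

// Hypothetical stand-ins mirroring only the fields used in this sketch.
struct EStopState {
    std::uint8_t engaged = 0;  // non-zero = stop requested
};

struct VelocityCmd {
    float linear = 0.0f;   // m/s
    float angular = 0.0f;  // rad/s
};

// Remembers the most recent e-stop message and gates outgoing commands.
class EStopLatch {
public:
    // Call whenever a message arrives on the e-stop topic.
    void update(const EStopState& msg) { engaged_ = (msg.engaged != 0); }

    // Returns the command that may be sent to the actuators: the requested
    // command while the e-stop is clear, an all-zero command while engaged.
    VelocityCmd apply(const VelocityCmd& requested) const {
        return engaged_ ? VelocityCmd{} : requested;
    }

    bool engaged() const { return engaged_; }

private:
    bool engaged_ = false;
};
```

In a node's `tick()`, this would mean calling `update()` on each received e-stop message and passing every outgoing command through `apply()`. Latching the last state matters because an actuator node may tick many times between e-stop messages.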
## DiagnosticReport — Multi-Value Health Check ```cpp horus::msg::DiagnosticReport report{}; std::strncpy(reinterpret_cast(report.component), "motor_driver", 31); report.level = 1; // 0=OK, 1=WARN, 2=ERROR report.value_count = 3; // Value 0: temperature std::strncpy(reinterpret_cast(report.values[0].key), "temperature", 31); std::strncpy(reinterpret_cast(report.values[0].value), "72.5", 63); report.values[0].value_type = 2; // float // Value 1: current std::strncpy(reinterpret_cast(report.values[1].key), "current_amps", 31); std::strncpy(reinterpret_cast(report.values[1].value), "3.2", 63); report.values[1].value_type = 2; // Value 2: status std::strncpy(reinterpret_cast(report.values[2].key), "status", 31); std::strncpy(reinterpret_cast(report.values[2].value), "overheating", 63); report.values[2].value_type = 0; // string ``` ## See Also - [Recipe: Emergency Stop](/docs/recipes/emergency-stop-cpp) — full safety monitor implementation - [Guide: Real-Time](/docs/cpp/realtime) — watchdog and miss policies --- ## Force & Tactile Messages Path: /cpp/api/force-messages Description: WrenchStamped, ForceCommand, ContactInfo, HapticFeedback, ImpedanceParameters # Force & Tactile Messages (C++) Force/torque sensing and haptic types in `horus::msg::`. Include via ``. 
## Quick Reference

| Type | Key Fields | Use Case |
|------|------------|----------|
| `WrenchStamped` | `force[3]`, `torque[3]` | Force/torque sensor |
| `ForceCommand` | `target_force[3]`, `target_torque[3]`, `stiffness`, `damping` | Force control |
| `ContactInfo` | `position[3]`, `normal[3]`, `force_magnitude`, `in_contact` | Contact detection |
| `HapticFeedback` | `force[3]`, `vibration_amplitude/freq`, `enabled` | Haptic devices |
| `ImpedanceParameters` | `stiffness[6]`, `damping[6]`, `mass[6]` | Impedance control |

## WrenchStamped — Force/Torque Sensor

```cpp
horus::msg::WrenchStamped wrench{};
wrench.force[0] = 0.0;    // Fx (N)
wrench.force[1] = 0.0;    // Fy (N)
wrench.force[2] = -9.81;  // Fz (N), e.g. the weight of a 1 kg payload
wrench.torque[0] = 0.0;   // Tx (Nm)
wrench.torque[1] = 0.1;   // Ty (Nm)
wrench.torque[2] = 0.0;   // Tz (Nm)
wrench.timestamp_ns = 0;
```

## ForceCommand — Impedance/Force Control

```cpp
horus::msg::ForceCommand cmd{};
cmd.target_force[2] = -5.0;  // push down with 5 N
cmd.stiffness = 500.0f;      // N/m
cmd.damping = 10.0f;         // Ns/m
cmd.max_force = 20.0f;       // safety limit (N)
```

## ContactInfo — Collision Detection

```cpp
// In a contact detection node:
auto contact = contact_sub_->recv();
if (contact && contact->get()->in_contact) {
    // force_magnitude is a scalar, not a per-axis component
    double force = contact->get()->force_magnitude;
    horus::log::warn("contact",
        "Contact detected, force=" + std::to_string(force) + "N");
}
```

## See Also

- [Sensor Messages](/docs/cpp/api/sensor-messages) — complementary sensor types
- [Control Messages](/docs/cpp/api/control-messages) — MotorCommand, ServoCommand

---

## Tracking & Perception Messages

Path: /cpp/api/tracking-messages
Description: TrackedObject, Landmark, SegmentationMask, PointField, PlaneDetection

# Tracking & Perception Messages (C++)

Object tracking, landmarks, segmentation, and point cloud field types.
## Tracking

| Type | Key Fields | Use Case |
|------|------------|----------|
| `TrackedObject` | `track_id`, `position[3]`, `velocity[3]`, `confidence`, `age` | Multi-object tracking |
| `TrackingHeader` | `frame_count`, `active_tracks` | Tracker metadata |

```cpp
horus::msg::TrackedObject obj{};
obj.track_id = 42;
obj.position[0] = 1.5;   // x (meters)
obj.position[1] = 0.3;   // y
obj.position[2] = 0.0;   // z
obj.velocity[0] = 0.5;   // vx (m/s)
obj.confidence = 0.92f;
obj.age = 15;            // frames since first detection
```

## Landmarks

| Type | Key Fields | Use Case |
|------|------------|----------|
| `Landmark` | `id`, `x`, `y`, `covariance[4]` | 2D SLAM landmark |
| `Landmark3D` | `x`, `y`, `z`, `visibility`, `index` | Pose estimation (packed) |
| `LandmarkArray` | `num_landmarks`, `confidence`, `bbox_*` | Body/hand pose output |

```cpp
horus::msg::LandmarkArray pose{};
pose.num_landmarks = 21;    // hand keypoints
pose.dimension = 3;         // 3D
pose.confidence = 0.85f;
pose.bbox_x = 100.0f;       // bounding box
pose.bbox_y = 200.0f;
pose.bbox_width = 80.0f;
pose.bbox_height = 120.0f;
```

## Segmentation

```cpp
horus::msg::SegmentationMask mask{};
mask.width = 640;
mask.height = 480;
mask.num_classes = 21;  // e.g. Pascal VOC: 20 classes + background
mask.mask_type = 0;     // 0=semantic, 1=instance
// Pixel data is stored separately via Image/TensorPool
```

## Point Cloud Fields

```cpp
horus::msg::PointField field{};
std::strncpy(reinterpret_cast<char*>(field.name), "x", 15);
field.offset = 0;
field.datatype = 0;  // 0=f32
field.count = 1;
```

## Clock & Time

| Type | Key Fields | Use Case |
|------|------------|----------|
| `Clock` | `nanoseconds`, `source` | Time synchronization |
| `TimeReference` | `time_ref_ns`, `source[32]` | External time source |

## Input

| Type | Key Fields | Use Case |
|------|------------|----------|
| `KeyboardInput` | `key_code`, `pressed`, `modifiers` | Keyboard teleop |
| `JoystickInput` | `axes[8]`, `buttons[16]`, `num_axes/buttons` | Gamepad control |
| `AudioFrame` | `samples[4096]`,
`sample_rate`, `channels` | Audio processing | ## See Also - [Detection Messages](/docs/cpp/api/detection-messages) — BoundingBox, Detection - [TensorPool API](/docs/cpp/api/pool) — Image and PointCloud for raw data --- ## Basic Examples Path: /cpp/examples/basic Description: Complete working C++ projects — from hello world to multi-node pub/sub # Basic Examples Complete, runnable C++ projects. Each includes `horus.toml`, `CMakeLists.txt`, and `src/main.cpp`. --- ## Example 1: Hello Node The simplest HORUS program — one node that prints on each tick. ### horus.toml ```toml [package] name = "hello_node" version = "0.1.0" language = "cpp" ``` ### src/main.cpp ```cpp #include #include using namespace horus::literals; int main() { horus::Scheduler sched; sched.tick_rate(1_hz).name("hello"); int count = 0; sched.add("greeter") .tick([&] { std::printf("Hello from HORUS! tick=%d\n", ++count); }) .build(); sched.spin(); } ``` ```bash horus new hello_node --lang cpp # paste the code above horus run ``` --- ## Example 2: Publisher + Subscriber Two nodes sharing data over a topic. ### src/main.cpp ```cpp #include #include using namespace horus::literals; int main() { horus::Scheduler sched; sched.tick_rate(10_hz).name("pubsub_demo"); auto pub = sched.advertise("demo.cmd"); auto sub = sched.subscribe("demo.cmd"); int seq = 0; sched.add("publisher") .order(0) .tick([&] { auto msg = pub.loan(); msg->linear = static_cast(seq) * 0.1f; msg->angular = 0.0f; msg->timestamp_ns = static_cast(seq); pub.publish(std::move(msg)); seq++; }) .build(); sched.add("subscriber") .order(10) .tick([&] { auto msg = sub.recv(); if (msg) { std::printf("Received: linear=%.2f seq=%lu\n", msg->get()->linear, msg->get()->timestamp_ns); } }) .build(); sched.spin(); } ``` --- ## Example 3: Struct-Based Node A proper Node subclass with state, pub/sub, and lifecycle hooks. 
### src/main.cpp ```cpp #include #include #include using namespace horus::literals; class SineWaveGenerator : public horus::Node { public: SineWaveGenerator() : Node("sine_wave") { pub_ = advertise("wave.output"); } void init() override { horus::log::info("sine", "Generator started at 50 Hz"); } void tick() override { double t = tick_++ * 0.02; // 50 Hz → 20ms per tick horus::msg::CmdVel msg{}; msg.linear = static_cast(std::sin(2.0 * M_PI * 0.5 * t)); // 0.5 Hz sine msg.angular = static_cast(std::cos(2.0 * M_PI * 0.5 * t)); pub_->send(msg); } private: horus::Publisher* pub_; int tick_ = 0; }; class Plotter : public horus::Node { public: Plotter() : Node("plotter") { sub_ = subscribe("wave.output"); } void tick() override { auto msg = sub_->recv(); if (!msg) return; if (++count_ % 25 != 0) return; // print at 2 Hz float v = msg->get()->linear; int bar = static_cast((v + 1.0f) * 20); // scale to 0-40 std::printf("[%6d] ", count_); for (int i = 0; i < bar; i++) std::printf("█"); std::printf(" %.2f\n", v); } private: horus::Subscriber* sub_; int count_ = 0; }; int main() { horus::Scheduler sched; sched.tick_rate(50_hz).name("sine_demo"); SineWaveGenerator gen; sched.add(gen).order(0).build(); Plotter plot; sched.add(plot).order(10).build(); sched.spin(); } ``` --- ## Example 4: Runtime Parameters Change behavior without recompiling. ### src/main.cpp ```cpp #include #include using namespace horus::literals; int main() { horus::Params params; params.set("speed", 0.3); params.set("enabled", true); params.set("robot_name", "atlas"); horus::Scheduler sched; sched.tick_rate(10_hz); auto pub = sched.advertise("cmd_vel"); sched.add("driver") .tick([&] { bool enabled = params.get("enabled", true); double speed = params.get("speed", 0.3); std::string name = params.get("robot_name", "unknown"); horus::msg::CmdVel cmd{}; cmd.linear = enabled ? 
static_cast(speed) : 0.0f; pub.send(cmd); static int t = 0; if (++t % 10 == 0) { std::printf("[%s] speed=%.1f enabled=%s\n", name.c_str(), speed, enabled ? "yes" : "no"); } }) .build(); // Simulate parameter change after 3 seconds for (int i = 0; i < 100; i++) { sched.tick_once(); if (i == 30) { params.set("speed", 0.8); std::printf(">>> Speed changed to 0.8\n"); } if (i == 60) { params.set("enabled", false); std::printf(">>> Motor disabled\n"); } } } ``` --- ## Example 5: Transform Frames Coordinate frame tree for a robot with sensors. ### src/main.cpp ```cpp #include #include using namespace horus::literals; int main() { horus::TransformFrame tf; tf.register_frame("world"); tf.register_frame("base_link", "world"); tf.register_frame("lidar", "base_link"); tf.register_frame("camera", "base_link"); // Static mounts tf.update("lidar", {0.2, 0.0, 0.3}, {0, 0, 0, 1}, 0); tf.update("camera", {0.1, 0.0, 0.25}, {0, 0, 0, 1}, 0); // Simulate robot moving forward for (int i = 0; i < 50; i++) { double x = i * 0.1; tf.update("base_link", {x, 0.0, 0.0}, {0, 0, 0, 1}, 0); if (auto t = tf.lookup("lidar", "world")) { std::printf("tick %2d: lidar in world = (%.1f, %.1f, %.1f)\n", i, t->translation[0], t->translation[1], t->translation[2]); } } } ``` ## See Also - [Advanced Examples](/docs/cpp/examples/advanced) — multi-process, cross-language, RT - [Quick Start](/docs/getting-started/quick-start-cpp) — step-by-step first project --- ## Advanced Examples Path: /cpp/examples/advanced Description: Multi-process pipelines, cross-language systems, RT control, and production patterns # Advanced Examples Production-grade patterns for real robots. --- ## Example 1: Multi-Process Robot Three separate binaries communicating over SHM. 
### sensor_driver.cpp

```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)
#include <cmath>            // std::sin

using namespace horus::literals;

class LidarDriver : public horus::Node {
public:
    LidarDriver() : Node("lidar") {
        pub_ = advertise<horus::msg::LaserScan>("lidar.scan");
    }

    void tick() override {
        auto scan = pub_->loan();
        // Synthetic data: 2 m baseline with a slowly drifting ripple
        for (int i = 0; i < 360; i++)
            scan->ranges[i] = 2.0f + 0.3f * std::sin(i * 0.1f + tick_ * 0.05f);
        scan->angle_min = 0.0f;
        scan->angle_max = 6.28318f;
        pub_->publish(std::move(scan));
        tick_++;
    }

private:
    horus::Publisher<horus::msg::LaserScan>* pub_;
    int tick_ = 0;
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(10_hz).name("lidar_proc");
    LidarDriver lidar;
    sched.add(lidar).order(0).build();
    sched.spin();
}
```

### controller.cpp

```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)

using namespace horus::literals;

class Controller : public horus::Node {
public:
    Controller() : Node("controller") {
        scan_ = subscribe<horus::msg::LaserScan>("lidar.scan");
        cmd_ = advertise<horus::msg::CmdVel>("cmd_vel");
    }

    void tick() override {
        auto scan = scan_->recv();
        if (!scan) return;

        // Closest valid reading (values <= 0.01 are treated as invalid)
        float min_r = 999.0f;
        for (int i = 0; i < 360; i++) {
            float r = scan->get()->ranges[i];
            if (r > 0.01f && r < min_r) min_r = r;
        }

        // Drive forward when clear, otherwise stop and turn in place
        horus::msg::CmdVel cmd{};
        cmd.linear = min_r > 0.5f ? 0.3f : 0.0f;
        cmd.angular = min_r > 0.5f ? 0.0f : 0.5f;
        cmd_->send(cmd);
    }

    void enter_safe_state() override {
        horus::msg::CmdVel stop{};
        cmd_->send(stop);
    }

private:
    horus::Subscriber<horus::msg::LaserScan>* scan_;
    horus::Publisher<horus::msg::CmdVel>* cmd_;
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(50_hz).name("ctrl_proc").prefer_rt();
    Controller ctrl;
    sched.add(ctrl).order(0).budget(5_ms).on_miss(horus::Miss::SafeMode).build();
    sched.spin();
}
```

### Run

```bash
# Terminal 1 (start subscriber first)
LD_LIBRARY_PATH=target/debug ./controller &
sleep 0.5

# Terminal 2
LD_LIBRARY_PATH=target/debug ./sensor_driver &

# Terminal 3
horus topic list          # shows lidar.scan + cmd_vel
horus topic echo cmd_vel
```

---

## Example 2: Cross-Language Pipeline

C++ sensor + Python AI + C++ motor control.
### cpp_sensor.cpp

```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)

using namespace horus::literals;

int main() {
    horus::Scheduler sched;
    sched.tick_rate(30_hz);
    auto pub = sched.advertise<horus::msg::CmdVel>("camera.detections");

    sched.add("camera")
        .tick([&] {
            // CmdVel is reused here as a simple two-float carrier
            horus::msg::CmdVel det{};
            det.linear = 0.95f;  // detection confidence
            det.angular = 1.0f;  // class: person
            pub.send(det);
        })
        .build();

    sched.spin();
}
```

### python_ai.py

```python
import horus
from horus._horus import Topic, CmdVel

src = Topic(CmdVel)  # reads "cmd_vel"

def ai_tick(node):
    det = node.recv("camera.detections")
    if det is not None and det.linear > 0.8:
        # High-confidence detection → slow down
        node.send("ai.decision", CmdVel(linear=0.1, angular=0.0))
    else:
        node.send("ai.decision", CmdVel(linear=0.3, angular=0.0))

horus.run(
    horus.Node(name="ai", subs=["camera.detections"], pubs=["ai.decision"],
               tick=ai_tick, rate=10)
)
```

### cpp_motor.cpp

```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)

using namespace horus::literals;

class MotorCtrl : public horus::Node {
public:
    MotorCtrl() : Node("motor") {
        sub_ = subscribe<horus::msg::CmdVel>("ai.decision");
        pub_ = advertise<horus::msg::CmdVel>("motor.pwm");
    }

    void tick() override {
        auto cmd = sub_->recv();
        if (!cmd) return;

        // Scale to PWM range
        horus::msg::CmdVel pwm{};
        pwm.linear = cmd->get()->linear * 255.0f;
        pub_->send(pwm);
    }

    void enter_safe_state() override {
        horus::msg::CmdVel stop{};
        pub_->send(stop);
    }

private:
    horus::Subscriber<horus::msg::CmdVel>* sub_;
    horus::Publisher<horus::msg::CmdVel>* pub_;
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(100_hz).prefer_rt();
    MotorCtrl motor;
    sched.add(motor).order(0).budget(2_ms).on_miss(horus::Miss::SafeMode).build();
    sched.spin();
}
```

### Run all three

```bash
./cpp_motor &            # C++ motor (subscriber, start first)
python3 python_ai.py &   # Python AI
./cpp_sensor &           # C++ sensor (publisher)

horus topic list         # 3 topics across 2 languages (C++ and Python)
horus node list          # 3 nodes with different PIDs
```

---

## Example 3: 1 kHz RT Control Loop

Production motor control with jitter tracking.
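Jitter here means how far each measured tick interval deviates from the nominal period, not the interval itself. The bookkeeping can be sketched independently of the scheduler as below. `JitterTracker` is a hypothetical helper written for illustration, not a HORUS class, and it uses only `std::chrono`.

```cpp
#include <chrono>
#include <cstdint>

// Tracks the worst deviation of the tick-to-tick interval from the
// nominal period, in microseconds.
class JitterTracker {
public:
    explicit JitterTracker(std::chrono::microseconds period)
        : period_us_(period.count()) {}

    // Feed the timestamp of each tick. The first call only sets the
    // reference point, so startup delay is not counted as jitter.
    void on_tick(std::chrono::steady_clock::time_point now) {
        if (has_last_) {
            const std::int64_t dt_us =
                std::chrono::duration_cast<std::chrono::microseconds>(now - last_).count();
            const std::int64_t dev =
                dt_us > period_us_ ? dt_us - period_us_ : period_us_ - dt_us;
            if (dev > max_jitter_us_) max_jitter_us_ = dev;
        }
        last_ = now;
        has_last_ = true;
    }

    std::int64_t max_jitter_us() const { return max_jitter_us_; }

private:
    std::int64_t period_us_;
    std::int64_t max_jitter_us_ = 0;
    std::chrono::steady_clock::time_point last_{};
    bool has_last_ = false;
};
```

A helper like this can be unit-tested with synthetic timestamps, with no scheduler or real-time kernel involved. The full node below does its timing inline in `tick()`: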
```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)
#include <chrono>
#include <cstdio>

using namespace horus::literals;

class RtMotor : public horus::Node {
public:
    RtMotor() : Node("rt_motor") {
        cmd_ = subscribe<horus::msg::CmdVel>("motor.target");
        pwm_ = advertise<horus::msg::CmdVel>("motor.pwm");
    }

    void init() override { last_ = std::chrono::steady_clock::now(); }

    void tick() override {
        auto now = std::chrono::steady_clock::now();
        long dt = static_cast<long>(
            std::chrono::duration_cast<std::chrono::microseconds>(now - last_).count());
        last_ = now;

        // Jitter = deviation of the measured interval from the 1000 us period.
        // The first interval is measured from init(), so it is skipped.
        long jitter = dt > kPeriodUs ? dt - kPeriodUs : kPeriodUs - dt;
        if (ticks_ > 0 && jitter > max_jitter_) max_jitter_ = jitter;
        ticks_++;

        auto cmd = cmd_->recv();
        double target = cmd ? cmd->get()->linear : 0.0;

        // PI controller (no derivative term), dt = 1 ms
        double error = target - velocity_;
        integral_ += error * 0.001;
        double output = 2.0 * error + 0.1 * integral_;
        velocity_ += output * 0.001;  // simulate dynamics

        horus::msg::CmdVel pwm{};
        pwm.linear = static_cast<float>(output);
        pwm_->send(pwm);

        if (ticks_ % 1000 == 0) {
            std::printf("[RT] max_jitter=%ldus ticks=%d vel=%.3f\n",
                        max_jitter_, ticks_, velocity_);
        }
    }

    void enter_safe_state() override {
        horus::msg::CmdVel stop{};
        pwm_->send(stop);
        std::printf("[RT] SAFE STATE — max jitter was %ldus\n", max_jitter_);
        horus::blackbox::record("rt_motor",
            "Safe state, max_jitter=" + std::to_string(max_jitter_) + "us");
    }

private:
    static constexpr long kPeriodUs = 1000;  // nominal period at 1 kHz

    horus::Subscriber<horus::msg::CmdVel>* cmd_;
    horus::Publisher<horus::msg::CmdVel>* pwm_;
    std::chrono::steady_clock::time_point last_;
    double velocity_ = 0, integral_ = 0;
    long max_jitter_ = 0;
    int ticks_ = 0;
};

int main() {
    horus::Scheduler sched;
    sched.tick_rate(1000_hz).name("rt_demo").prefer_rt();

    RtMotor motor;
    sched.add(motor)
        .order(0)
        .budget(500_us)
        .deadline(900_us)
        .on_miss(horus::Miss::SafeMode)
        .pin_core(3)
        .priority(95)
        .build();

    sched.spin();
}
```

---

## Example 4: Service + Action

Request/response RPC and long-running navigation goal.
```cpp
#include <horus/horus.hpp>  // umbrella header (path assumed)
#include <chrono>
#include <cstdio>

int main() {
    // Service: add two numbers
    horus::ServiceServer server("calc.add");
    server.set_handler([](const uint8_t* req, size_t len,
                          uint8_t* res, size_t* res_len) -> bool {
        int a = 0, b = 0;
        // Reject requests that do not contain both operands
        if (std::sscanf(reinterpret_cast<const char*>(req),
                        R"({"a":%d,"b":%d})", &a, &b) != 2) {
            return false;
        }
        *res_len = std::snprintf(reinterpret_cast<char*>(res), 4096,
                                 R"({"sum":%d})", a + b);
        return true;
    });

    horus::ServiceClient client("calc.add");
    auto resp = client.call(R"({"a":10,"b":32})", std::chrono::milliseconds(1000));
    if (resp) std::printf("10 + 32 = %s\n", resp->c_str());

    // Action: navigate to goal
    horus::ActionClient nav("navigate");
    auto goal = nav.send_goal(R"({"x":5.0,"y":3.0})");
    std::printf("Goal %lu: %s\n", static_cast<unsigned long>(goal.id()),
                goal.is_active() ? "active" : "done");

    goal.cancel();
    std::printf("Canceled: %s\n",
                goal.status() == horus::GoalStatus::Canceled ? "yes" : "no");
}
```

## See Also

- [Basic Examples](/docs/cpp/examples/basic) — hello world, pub/sub, params, TF
- [Tutorial 3: Full Robot](/docs/tutorials/03-full-robot-cpp) — 6-node system
- [Cross-Language Interop](/docs/tutorials/cross-language-interop) — C++ + Rust + Python