Monitor

You need to observe your running robot system in real-time: see which nodes are active, watch message flow between topics, inspect performance metrics, and tune parameters without restarting. Here is how to use the HORUS Monitor.

The HORUS Monitor is under active development. Core monitoring features work (nodes, topics, graph, parameters, packages). Some functionality (remote deployment, recordings browser) is still being finalized.

When To Use This

  • Debugging message flow between nodes ("is my publisher actually sending data?")
  • Monitoring node performance and tick rates during development
  • Live tuning of PID gains, speed limits, and other runtime parameters
  • Remote monitoring of headless robots over SSH (TUI mode)
  • Verifying system health before field deployment

Use Telemetry Export instead if you need to send metrics to external dashboards like Grafana or Prometheus.

Use Debugging Workflows instead if you need to diagnose a specific problem like deadline misses or panics.

Prerequisites

  • A running HORUS application (horus run)
  • A second terminal for the monitor (or access via browser from another device)

Quick Start

# Start your HORUS application
horus run

# In another terminal, start the monitor
horus monitor

The browser opens automatically at http://localhost:3000. On first run, you'll be prompted to set a password (or press Enter to skip).

# Custom port
horus monitor 8080

# Terminal UI mode (no browser needed)
horus monitor --tui

# Reset password
horus monitor --reset-password

How It Works

The monitor is a read-only observer that attaches to a running HORUS application without modifying its behavior. It reads data that the scheduler already writes as part of normal operation.

┌─────────────────────────────────────────────────────────────────────┐
│                          HORUS Application                          │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                           │
│  │  Node A  │  │  Node B  │  │  Node C  │   Scheduler writes        │
│  │  tick()  │  │  tick()  │  │  tick()  │   NodeMetrics + topic     │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘   metadata to SHM         │
│       │             │             │      after each tick cycle      │
│       ▼             ▼             ▼                                 │
│  ┌──────────────────────────────────────────┐                       │
│  │           Shared Memory (SHM)            │                       │
│  │  ┌─────────────┐  ┌───────────────────┐  │                       │
│  │  │ NodeMetrics │  │ Topic Ring Buffers│  │                       │
│  │  │  per node   │  │  headers + data   │  │                       │
│  │  └─────────────┘  └───────────────────┘  │                       │
│  └──────────────────────┬───────────────────┘                       │
└─────────────────────────┼───────────────────────────────────────────┘
                          │
            mmap read-only│  (no copies, no syscalls)
                          │
┌─────────────────────────┼───────────────────────────────────────────┐
│         horus monitor   │                                           │
│                         ▼                                           │
│  ┌──────────────────────────────────┐                               │
│  │        SHM Reader (mmap)         │  Reads NodeMetrics, topic     │
│  │   /proc scan for node discovery  │  headers, log buffer          │
│  └──────┬──────────────┬────────────┘                               │
│         │              │                                            │
│   ┌─────▼─────┐   ┌────▼───────────────────────┐                    │
│   │  TUI Mode │   │          Web Mode          │                    │
│   │  ratatui  │   │  ┌──────────┐ ┌──────────┐ │                    │
│   │  redraws  │   │  │  Axum    │ │WebSocket │ │                    │
│   │  at 4 Hz  │   │  │  HTTP    │ │  push    │ │                    │
│   │           │   │  │  server  │ │  at 4 Hz │ │                    │
│   └───────────┘   │  └──────────┘ └──────────┘ │                    │
│                   └────────────────────────────┘                    │
└─────────────────────────────────────────────────────────────────────┘

Data flow step by step:

  1. Scheduler writes metrics -- After each tick cycle, the scheduler updates NodeMetrics (tick count, duration, deadline misses) in the node's SHM region. Topic ring buffer headers already contain pub/sub counts, pending messages, and drop counts as part of normal IPC operation.

  2. Monitor reads via mmap -- The monitor process opens the same SHM files read-only via mmap. This is a pointer dereference, not a copy -- reads hit L1 cache when the data is recent. The monitor scans /proc to discover running node processes and reads the SHM topics directory to enumerate active topics.

  3. Web UI via Axum -- In web mode, an Axum HTTP server runs on a separate thread. REST endpoints (/api/nodes, /api/topics, /api/graph) return JSON snapshots. A WebSocket at /api/ws pushes live updates to connected browsers at 4 Hz (every 250ms); see the example at the end of this section.

  4. TUI via ratatui -- In TUI mode, a crossterm/ratatui terminal UI polls for keyboard input at 100ms intervals and refreshes the display at 250ms intervals (4 Hz). No HTTP server is started.

The monitor never writes to SHM regions used by the application. The only write it performs is setting a verbose flag on a topic's SHM header when you enable topic debug logging from the TUI -- this is a single byte that the topic checks on each send()/recv().
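
You can watch both delivery paths from the command line. A quick sketch using curl plus the third-party websocat tool; the endpoints come from the API table below, but the JSON shapes your version returns may differ:

# Point-in-time JSON snapshot of running nodes
curl -s http://localhost:3000/api/nodes

# Stream the 4 Hz live updates (requires websocat)
websocat ws://localhost:3000/api/ws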

Web Interface

The web monitor has 3 main tabs:

Monitor Tab

The main monitoring view with two sub-views:

List View — Shows nodes and topics in a grid layout:

  • Nodes card: All running nodes with their status
  • Topics card: Active message channels with sizes

Graph View — Interactive canvas showing:

  • Nodes as circles connected to their topics
  • Visual representation of the pub/sub network
  • Helps answer "which nodes are talking to which topics?"

A status bar at the top always shows:

  • Active Nodes count (hover for node list)
  • Active Topics count (hover for topic list)
  • Monitor port

What the dashboard looks like:

┌─────────────────────────────────────────────────────────────────┐
│  [Monitor]  [Parameters]  [Packages]       ● 5 Nodes  ● 8 Topics │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─ Nodes ──────────────────┐  ┌─ Topics ──────────────────┐  │
│   │  ● imu_driver     [RT]   │  │  sensors.imu    4 msgs    │  │
│   │  ● camera_node    [Comp] │  │  sensors.lidar  12 msgs   │  │
│   │  ● slam_engine    [RT]   │  │  cmd_vel        1 msg     │  │
│   │  ● planner        [Comp] │  │  map.grid       0 msgs    │  │
│   │  ● motor_driver   [RT]   │  │  odom           2 msgs    │  │
│   └──────────────────────────┘  └────────────────────────────┘  │
│                                                                 │
│   [List View]  [Graph View]                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

In Graph View, nodes appear as circles and topics as labeled connection points. Arrows show publish direction. Hovering a node highlights all of its connected topics.

Parameters Tab

Live runtime parameter editor:

  • Search parameters by name
  • Add new parameters at runtime
  • Edit existing values (changes apply immediately)
  • Delete parameters
  • Export all parameters to file
  • Import parameters from file

Useful for tuning PID gains, speed limits, sensor thresholds without restarting.
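
The same operations are available over the REST API (see the endpoint table further down), so tuning can be scripted. A sketch -- the POST body shape here is an assumption, so mirror whatever the GET returns on your version:

# Read a parameter
curl -s http://localhost:3000/api/params/pid.kp

# Set it (body shape illustrative)
curl -s -X POST http://localhost:3000/api/params/pid.kp \
  -H "Content-Type: application/json" \
  -d '{"value": 1.2}'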

Packages Tab

Browse and manage HORUS packages:

  • Search the registry
  • Install packages
  • Manage environments

Terminal UI Mode

For SSH sessions and headless servers:

horus monitor --tui
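
To monitor a headless robot, run the TUI at the far end of an SSH session. The -t flag is standard ssh (it allocates the TTY the terminal UI needs); user and host are placeholders:

# -t forces a TTY so the TUI can render remotely
ssh -t robot@192.168.1.42 horus monitor --tui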

The TUI provides 8 tabs navigated with arrow keys:

| Tab            | Description                               |
|----------------|-------------------------------------------|
| Overview       | System health summary with log panel      |
| Nodes          | Running nodes with detailed metrics       |
| Topics         | Active topics and message flow            |
| Network        | Network connections and transport status  |
| TransformFrame | TransformFrame protocol inspection        |
| Packages       | Package management                        |
| Params         | Runtime parameter editor                  |
| Recordings     | Session recordings browser                |

What the TUI looks like:

┌─ HORUS Monitor ──────────────────────────────────────────────┐
│ [Overview] [Nodes] [Topics] [Network] [TF] [Pkg] [Par] [Rec]│
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  System Health: OK           Uptime: 00:14:32               │
│  Active Nodes: 5/5           Tick Rate: 100 Hz              │
│                                                              │
│  ┌─ Node Status ────────────────────────────────────────┐   │
│  │  imu_driver     ████████████████████░░  92% budget   │   │
│  │  camera_node    ██████████░░░░░░░░░░░░  45% budget   │   │
│  │  slam_engine    ██████████████████░░░░  78% budget   │   │
│  │  planner        ████████░░░░░░░░░░░░░░  35% budget   │   │
│  │  motor_driver   ██████████████░░░░░░░░  62% budget   │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌─ Log ────────────────────────────────────────────────┐   │
│  │  14:32:01 [INFO] imu_driver: tick 142800 (0.4ms)     │   │
│  │  14:32:01 [INFO] slam_engine: map updated (2.1ms)    │   │
│  │  14:32:01 [WARN] camera_node: frame dropped          │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ← → Navigate tabs  ↑ ↓ Select  Enter: Details  q: Quit    │
└──────────────────────────────────────────────────────────────┘

Navigate between tabs with left/right arrow keys. Within each tab, use up/down arrows to select items, Enter to open detail panels, and Esc to close them. Press q to quit, p to pause/resume updates, ? to show the help overlay.

Topic Debug Logging

In the Topics tab, press Enter on any topic to enable runtime debug logging. Every send() and recv() call on that topic then emits a live log entry showing direction, IPC latency, and a message summary (if LogSummary is implemented). Press Esc to disable logging; the zero-overhead path resumes immediately.

No code changes or recompilation required.

Performance Overhead

The monitor is designed to be always-on during development with negligible impact on your application.

| Component            | Overhead                 | Notes                                                                                                     |
|----------------------|--------------------------|-----------------------------------------------------------------------------------------------------------|
| SHM metric reads     | Sub-microsecond          | mmap pointer dereference, hits L1/L2 cache                                                                 |
| /proc node scan      | ~1ms per scan            | Runs at 4 Hz in the monitor process, not in your application                                               |
| HTTP server (Axum)   | Own thread               | Separate OS thread, does not compete with RT node threads                                                  |
| WebSocket push       | 4 Hz (250ms)             | JSON serialization of node/topic snapshots, ~2-5 KB per push                                               |
| TUI redraw           | 4 Hz (250ms)             | Terminal write in monitor process only, event polling at 10 Hz                                             |
| Topic debug logging  | ~100ns per send()/recv() | Only when verbose is enabled on a specific topic. Writes one log entry per call to the global log buffer   |
| Parameter reads      | ~50ns                    | Lock-free atomic reads from RuntimeParams SHM                                                              |
| Total on application | Less than 0.1% CPU       | Monitor runs as a separate process. The only cost inside your application is the metric writes the scheduler already does |

The scheduler writes NodeMetrics regardless of whether the monitor is running -- this data is used internally for deadline enforcement and watchdog detection. Running horus monitor adds zero overhead to your application's hot path. The monitor process itself typically uses 1-3% of a single CPU core.
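
You can verify the monitor's own footprint with standard tools. The process name pattern below is an assumption -- adjust it to whatever ps shows on your system:

# Watch the monitor's CPU usage
top -p "$(pgrep -f 'horus monitor' | head -1)"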

When does overhead increase?

  • Topic debug logging: Enabling verbose mode on a high-frequency topic (e.g., 1 kHz IMU) adds log entries at that rate. Each entry is ~100ns, but at 1 kHz that is 100 microseconds per second -- still negligible, but visible in profiling.
  • Many WebSocket clients: Each connected browser receives the full snapshot. With 10+ simultaneous browser tabs, JSON serialization time may reach ~1ms per push cycle.
  • Parameter writes: Setting parameters from the web UI triggers a write to the params SHM. This is a one-time cost per edit, not a recurring overhead.

Network Access

The monitor binds to all network interfaces (0.0.0.0), so you can access it from:

  • Same machine: http://localhost:3000
  • Any device on the network: http://<your-ip>:3000

Always set a password when the monitor is network-accessible.
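
If you'd rather not expose the port at all, a standard SSH tunnel works (nothing HORUS-specific here):

# Forward the robot's monitor to your laptop, then browse http://localhost:3000
ssh -L 3000:localhost:3000 robot@<robot-ip>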

Security

The monitor supports password-based authentication for networked deployments.

Setup

On first run, set a password (or press Enter to skip authentication):

horus monitor

[SECURITY] HORUS Monitor - First Time Setup
Password: ********
Confirm password: ********
[SUCCESS] Password set successfully!

Reset password anytime:

horus monitor --reset-password

How Authentication Works

When a password is set:

  1. The web UI shows a login page before granting access
  2. All API endpoints require a valid session token (except /api/login)
  3. Sessions expire after 1 hour of inactivity
  4. Failed login attempts are rate-limited

When no password is set (Enter pressed at setup):

  • All endpoints are accessible without authentication
  • Suitable for local development only

API Authentication

# Login — returns a session token
curl -X POST http://localhost:3000/api/login \
  -H "Content-Type: application/json" \
  -d '{"password": "your_password"}'
# Returns: {"token": "abc123..."}

# Use token for API requests
curl http://localhost:3000/api/nodes \
  -H "Authorization: Bearer abc123..."

# Logout
curl -X POST http://localhost:3000/api/logout \
  -H "Authorization: Bearer abc123..."

Security Details

| Feature          | Value                           |
|------------------|---------------------------------|
| Password hashing | Argon2id                        |
| Session timeout  | 1 hour inactivity               |
| Rate limiting    | 5 attempts per 60 seconds       |
| Token size       | 256-bit random (base64-encoded) |

Password hash stored at ~/.horus/dashboard_password.hash.

For production deployments, consider placing a reverse proxy with TLS (e.g., nginx) in front of the monitor.

Recovery

If locked out:

# Option 1: Reset via CLI
horus monitor --reset-password

# Option 2: Delete the password hash file
rm ~/.horus/dashboard_password.hash
horus monitor  # Re-prompts for password setup

API Endpoints

The monitor exposes a REST API (authenticated when a password is set):

| Endpoint                | Method          | Description              |
|-------------------------|-----------------|--------------------------|
| /api/status             | GET             | System health status     |
| /api/nodes              | GET             | Running nodes info       |
| /api/topics             | GET             | Active topics            |
| /api/graph              | GET             | Node-topic graph         |
| /api/network            | GET             | Network connections      |
| /api/logs/all           | GET             | All logs                 |
| /api/logs/node/:name    | GET             | Logs for specific node   |
| /api/logs/topic/:name   | GET             | Logs for specific topic  |
| /api/params             | GET             | List parameters          |
| /api/params/:key        | GET/POST/DELETE | Get/set/delete parameter |
| /api/params/export      | POST            | Export all parameters    |
| /api/params/import      | POST            | Import parameters        |
| /api/packages/registry  | GET             | Search packages          |
| /api/packages/install   | POST            | Install package          |
| /api/packages/uninstall | POST            | Uninstall package        |
| /api/recordings         | GET             | List recordings          |
| /api/login              | POST            | Authenticate             |
| /api/logout             | POST            | End session              |
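
These read-only endpoints make a quick scripted health check easy. jq is optional pretty-printing here; the exact field names depend on your HORUS version:

# Snapshot system health, nodes, and the pub/sub graph
curl -s http://localhost:3000/api/status | jq .
curl -s http://localhost:3000/api/nodes | jq .
curl -s http://localhost:3000/api/graph | jq .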

Common Scenarios

Debugging Message Flow

"My subscriber isn't getting messages"

  1. Open Monitor tab, switch to Graph View
  2. Is there an arrow from publisher -> topic -> subscriber?
  3. If not: check that the topic names match exactly and that both nodes are running

"The robot is running slow"

  1. Check nodes list for high CPU usage
  2. Check tick rates — which node can't keep up?
  3. Use logs endpoint to check for slow tick warnings
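
Step 3 works from the command line too. The node name here is illustrative, and the grep is a crude filter over the JSON response:

# Pull one node's logs and look for slow-tick warnings
curl -s http://localhost:3000/api/logs/node/slam_engine | grep -i warn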

Live Parameter Tuning

Tuning PID controller:

  1. Open Parameters tab
  2. Search for pid
  3. Edit pid.kp value — change applies instantly
  4. Watch robot behavior, adjust until optimal
  5. Export final values with Export button
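
Step 5 is also scriptable via the export endpoint from the API table (the output format depends on your HORUS version):

# Save the tuned values to a file
curl -s -X POST http://localhost:3000/api/params/export -o tuned_params.json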

Common Errors

| Symptom                           | Cause                                               | Fix                                                                                                |
|-----------------------------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------------|
| Monitor shows nothing             | HORUS application not running                       | Start your app first with horus run                                                                |
| Cannot access from another device | Devices on different networks or firewall blocking  | Ensure both devices are on the same network; open the port with sudo ufw allow 3000                |
| Port already in use               | Another monitor or process on port 3000             | Use a different port: horus monitor 8080                                                           |
| Locked out of password            | Forgotten password                                  | Run horus monitor --reset-password or delete ~/.horus/dashboard_password.hash                      |
| TUI rendering broken              | Terminal does not support 256 colors                | Use a modern terminal (kitty, alacritty, wezterm) or try TERM=xterm-256color horus monitor --tui   |
| API returns 401 Unauthorized      | Session expired or invalid token                    | Re-authenticate via the /api/login endpoint                                                        |

Design Decisions

Why SHM-based, not network-based

Traditional monitoring tools (e.g., Prometheus exporters, ROS2 introspection) add network hops to collect metrics. HORUS chose shared memory because:

  • Zero overhead when not monitoring. The scheduler writes NodeMetrics to SHM regardless -- it uses the same data for deadline enforcement. No extra serialization, no sockets, no packets.
  • No configuration. The monitor auto-discovers running nodes by scanning /proc and the SHM topics directory. No need to configure exporters, ports, or scrape endpoints.
  • Works offline. SHM monitoring works without any network stack, which matters on embedded systems and in containers without network access.

If you need to send metrics to an external system (Grafana, Prometheus, Datadog), use Telemetry Export -- it reads the same SHM data and forwards it over the network.

Why both Web and TUI

The monitor provides two interfaces because robots are developed and deployed in different environments:

  • Web UI is for development machines with a browser available. It supports the interactive graph view, drag-and-drop parameter files, and multiple team members can open it simultaneously from different devices on the network.
  • TUI is for SSH sessions into headless robots, CI environments, and embedded systems. It requires only a terminal -- no browser, no X11, no port forwarding. The TUI has feature parity with the web UI for reading data (nodes, topics, params, logs) and additionally supports topic debug logging via mmap.

Both interfaces read from the same SHM data source. You can run them simultaneously -- horus monitor in one terminal for the web UI and horus monitor --tui in another for the TUI.
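
Both commands come straight from this page and attach to the same running application, so nothing extra is needed:

# Terminal 1: web UI for the team
horus monitor

# Terminal 2: TUI in your SSH session
horus monitor --tui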

Why password auth, not tokens or mTLS

The monitor uses simple password-based sessions (Argon2id hashing, 256-bit random session tokens) instead of API keys, OAuth, or mutual TLS:

  • Single-user tool. The monitor is typically accessed by one developer or a small team on a local network. OAuth and mTLS add complexity with no benefit.
  • No external dependencies. Password hashing is done locally with Argon2id. No identity provider, no certificate authority, no token refresh flows.
  • Quick setup. First run prompts for a password. No config files, no key generation, no certificate management.

For production deployments exposed to the internet, place a reverse proxy with TLS (e.g., nginx) in front of the monitor rather than building TLS into the monitor itself.

Why read-only access to SHM

The monitor opens SHM files with read-only mmap. It cannot modify topic buffers, corrupt node state, or interfere with the scheduler. The single exception is the topic verbose flag (one byte per topic), which enables debug logging. This is intentional:

  • Safety. A monitoring tool must never be able to crash or corrupt the system it observes. Read-only mmap enforces this at the OS level.
  • No synchronization needed. The monitor reads atomic counters and ring buffer headers that the scheduler writes. No locks, no contention, no priority inversion risk for RT nodes.

Trade-offs

| Choice                         | Benefit                                                 | Cost                                                                               |
|--------------------------------|---------------------------------------------------------|-------------------------------------------------------------------------------------|
| SHM-based monitoring           | Zero overhead, no network config, works offline         | Only works on the same machine (use Telemetry Export for remote)                    |
| Separate monitor process       | Cannot crash the application, clean resource isolation  | Must be started separately (horus monitor), adds one process                        |
| Web + TUI dual interface       | Works everywhere: browser, SSH, headless, embedded      | Two codebases to maintain, feature parity requires discipline                       |
| Password auth (not mTLS)       | Simple setup, no PKI infrastructure needed              | No per-user access control, no audit trail beyond rate limiting                     |
| Read-only SHM access           | Cannot corrupt application state, no lock contention    | Cannot inject test data or modify parameters from the SHM side (use the params API instead) |
| 4 Hz refresh rate              | Smooth real-time feel without CPU waste                 | Cannot capture sub-250ms transient events (use BlackBox or topic debug logging for those) |
| /proc scan for node discovery  | Works without any registration protocol                 | Linux-specific, ~1ms per scan, may show stale entries briefly after a node crash    |

See Also