Monitor

You need to observe your running robot system in real-time: see which nodes are active, watch message flow between topics, inspect performance metrics, and tune parameters without restarting. Here is how to use the HORUS Monitor.

The HORUS Monitor is under active development. Core monitoring features work (nodes, topics, graph, parameters, packages). Some functionality (remote deployment, recordings browser) is still being finalized.

When To Use This

  • Debugging message flow between nodes ("is my publisher actually sending data?")
  • Monitoring node performance and tick rates during development
  • Live tuning of PID gains, speed limits, and other runtime parameters
  • Remote monitoring of headless robots over SSH (TUI mode)
  • Verifying system health before field deployment

Use Telemetry Export instead if you need to send metrics to external dashboards like Grafana or Prometheus.

Use Debugging Workflows instead if you need to diagnose a specific problem like deadline misses or panics.

Prerequisites

  • A running HORUS application (horus run)
  • A second terminal for the monitor (or access via browser from another device)

Quick Start

# Start your HORUS application
horus run

# In another terminal, start the monitor
horus monitor

The browser opens automatically at http://localhost:3000. On first run, you'll be prompted to set a password (or press Enter to skip).

# Custom port
horus monitor 8080

# Terminal UI mode (no browser needed)
horus monitor --tui

# Reset password
horus monitor --reset-password

How It Works

The monitor is a read-only observer that attaches to a running HORUS application without modifying its behavior. It reads data that the scheduler already writes as part of normal operation.

┌─────────────────────────────────────────────────────────────────────┐
│                          HORUS Application                          │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                           │
│  │  Node A  │  │  Node B  │  │  Node C  │   Scheduler writes        │
│  │  tick()  │  │  tick()  │  │  tick()  │   NodeMetrics + topic     │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘   metadata to SHM         │
│       │             │             │      after each tick cycle      │
│       ▼             ▼             ▼                                 │
│  ┌──────────────────────────────────────────┐                       │
│  │           Shared Memory (SHM)            │                       │
│  │  ┌─────────────┐  ┌───────────────────┐  │                       │
│  │  │ NodeMetrics │  │ Topic Ring Buffers│  │                       │
│  │  │  per node   │  │  headers + data   │  │                       │
│  │  └─────────────┘  └───────────────────┘  │                       │
│  └──────────────────────┬───────────────────┘                       │
└─────────────────────────┼───────────────────────────────────────────┘
                          │
            mmap read-only│  (no copies, no syscalls)
                          │
┌─────────────────────────┼───────────────────────────────────────────┐
│         horus monitor   │                                           │
│                         ▼                                           │
│  ┌──────────────────────────────────┐                               │
│  │        SHM Reader (mmap)         │  Reads NodeMetrics, topic     │
│  │   /proc scan for node discovery  │  headers, log buffer          │
│  └──────┬──────────────┬────────────┘                               │
│         │              │                                            │
│   ┌─────▼─────┐   ┌────▼───────────────────────┐                    │
│   │  TUI Mode │   │          Web Mode          │                    │
│   │  ratatui  │   │  ┌──────────┐ ┌──────────┐ │                    │
│   │  redraws  │   │  │  Axum    │ │WebSocket │ │                    │
│   │  at 4 Hz  │   │  │  HTTP    │ │  push    │ │                    │
│   │           │   │  │  server  │ │  at 4 Hz │ │                    │
│   └───────────┘   │  └──────────┘ └──────────┘ │                    │
│                   └────────────────────────────┘                    │
└─────────────────────────────────────────────────────────────────────┘

Data flow step by step:

  1. Scheduler writes metrics -- After each tick cycle, the scheduler updates NodeMetrics (tick count, duration, deadline misses) in the node's SHM region. Topic ring buffer headers already contain pub/sub counts, pending messages, and drop counts as part of normal IPC operation.

  2. Monitor reads via mmap -- The monitor process opens the same SHM files read-only via mmap. This is a pointer dereference, not a copy -- reads hit L1 cache when the data is recent. The monitor scans /proc to discover running node processes and reads the SHM topics directory to enumerate active topics.

  3. Web UI via Axum -- In web mode, an Axum HTTP server runs on a separate thread. REST endpoints (/api/nodes, /api/topics, /api/graph) return JSON snapshots. A WebSocket at /api/ws pushes live updates to connected browsers at 4 Hz (every 250ms); see the example at the end of this section.

  4. TUI via ratatui -- In TUI mode, a crossterm/ratatui terminal UI polls for keyboard input at 100ms intervals and refreshes the display at 250ms intervals (4 Hz). No HTTP server is started.

The monitor never writes to SHM regions used by the application. The only write it performs is setting a verbose flag on a topic's SHM header when you enable topic debug logging from the TUI -- this is a single byte that the topic checks on each send()/recv().
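
You can watch both delivery paths from the command line. A quick sketch using curl plus the third-party websocat tool; the endpoints come from the API table below, but the JSON shapes your version returns may differ:

# Point-in-time JSON snapshot of running nodes
curl -s http://localhost:3000/api/nodes

# Stream the 4 Hz live updates (requires websocat)
websocat ws://localhost:3000/api/ws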

Web Interface

The web monitor has 3 main tabs:

Monitor Tab

The main monitoring view with two sub-views:

List View — Shows nodes and topics in a grid layout:

  • Nodes card: All running nodes with their status
  • Topics card: Active message channels with sizes

Graph View — Interactive canvas showing:

  • Nodes as circles connected to their topics
  • Visual representation of the pub/sub network
  • Helps answer "which nodes are talking to which topics?"

A status bar at the top always shows:

  • Active Nodes count (hover for node list)
  • Active Topics count (hover for topic list)
  • Monitor port

What the dashboard looks like:

┌─────────────────────────────────────────────────────────────────┐
│  [Monitor]  [Parameters]  [Packages]       ● 5 Nodes  ● 8 Topics │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─ Nodes ──────────────────┐  ┌─ Topics ──────────────────┐  │
│   │  ● imu_driver     [RT]   │  │  sensors.imu    4 msgs    │  │
│   │  ● camera_node    [Comp] │  │  sensors.lidar  12 msgs   │  │
│   │  ● slam_engine    [RT]   │  │  cmd_vel        1 msg     │  │
│   │  ● planner        [Comp] │  │  map.grid       0 msgs    │  │
│   │  ● motor_driver   [RT]   │  │  odom           2 msgs    │  │
│   └──────────────────────────┘  └────────────────────────────┘  │
│                                                                 │
│   [List View]  [Graph View]                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

In Graph View, nodes appear as circles and topics as labeled connection points. Arrows show publish direction. Hovering a node highlights all of its connected topics.

Parameters Tab

Live runtime parameter editor:

  • Search parameters by name
  • Add new parameters at runtime
  • Edit existing values (changes apply immediately)
  • Delete parameters
  • Export all parameters to file
  • Import parameters from file

Useful for tuning PID gains, speed limits, sensor thresholds without restarting.
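
The same operations are available over the REST API (see the endpoint table further down), so tuning can be scripted. A sketch -- the POST body shape here is an assumption, so mirror whatever the GET returns on your version:

# Read a parameter
curl -s http://localhost:3000/api/params/pid.kp

# Set it (body shape illustrative)
curl -s -X POST http://localhost:3000/api/params/pid.kp \
  -H "Content-Type: application/json" \
  -d '{"value": 1.2}'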

Packages Tab

Browse and manage HORUS packages:

  • Search the registry
  • Install packages
  • Manage environments

Terminal UI Mode

For SSH sessions and headless servers:

horus monitor --tui
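
To monitor a headless robot, run the TUI at the far end of an SSH session. The -t flag is standard ssh (it allocates the TTY the terminal UI needs); user and host are placeholders:

# -t forces a TTY so the TUI can render remotely
ssh -t robot@192.168.1.42 horus monitor --tui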

The TUI provides 8 tabs navigated with arrow keys:

| Tab            | Description                               |
|----------------|-------------------------------------------|
| Overview       | System health summary with log panel      |
| Nodes          | Running nodes with detailed metrics       |
| Topics         | Active topics and message flow            |
| Network        | Network connections and transport status  |
| TransformFrame | TransformFrame protocol inspection        |
| Packages       | Package management                        |
| Params         | Runtime parameter editor                  |
| Recordings     | Session recordings browser                |

What the TUI looks like:

┌─ HORUS Monitor ──────────────────────────────────────────────┐
│ [Overview] [Nodes] [Topics] [Network] [TF] [Pkg] [Par] [Rec]│
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  System Health: OK           Uptime: 00:14:32               │
│  Active Nodes: 5/5           Tick Rate: 100 Hz              │
│                                                              │
│  ┌─ Node Status ────────────────────────────────────────┐   │
│  │  imu_driver     ████████████████████░░  92% budget   │   │
│  │  camera_node    ██████████░░░░░░░░░░░░  45% budget   │   │
│  │  slam_engine    ██████████████████░░░░  78% budget   │   │
│  │  planner        ████████░░░░░░░░░░░░░░  35% budget   │   │
│  │  motor_driver   ██████████████░░░░░░░░  62% budget   │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌─ Log ────────────────────────────────────────────────┐   │
│  │  14:32:01 [INFO] imu_driver: tick 142800 (0.4ms)     │   │
│  │  14:32:01 [INFO] slam_engine: map updated (2.1ms)    │   │
│  │  14:32:01 [WARN] camera_node: frame dropped          │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ← → Navigate tabs  ↑ ↓ Select  Enter: Details  q: Quit    │
└──────────────────────────────────────────────────────────────┘

Navigate between tabs with left/right arrow keys. Within each tab, use up/down arrows to select items, Enter to open detail panels, and Esc to close them. Press q to quit, p to pause/resume updates, ? to show the help overlay.

Topic Debug Logging

In the Topics tab, press Enter on any topic to enable runtime debug logging. Every send() and recv() call on that topic then emits a live log entry showing direction, IPC latency, and a message summary (if LogSummary is implemented). Press Esc to disable logging; the zero-overhead path resumes immediately.

No code changes or recompilation required.

Performance Overhead

The monitor is designed to be always-on during development with negligible impact on your application.

| Component            | Overhead                 | Notes                                                                                                     |
|----------------------|--------------------------|-----------------------------------------------------------------------------------------------------------|
| SHM metric reads     | Sub-microsecond          | mmap pointer dereference, hits L1/L2 cache                                                                 |
| /proc node scan      | ~1ms per scan            | Runs at 4 Hz in the monitor process, not in your application                                               |
| HTTP server (Axum)   | Own thread               | Separate OS thread, does not compete with RT node threads                                                  |
| WebSocket push       | 4 Hz (250ms)             | JSON serialization of node/topic snapshots, ~2-5 KB per push                                               |
| TUI redraw           | 4 Hz (250ms)             | Terminal write in monitor process only, event polling at 10 Hz                                             |
| Topic debug logging  | ~100ns per send()/recv() | Only when verbose is enabled on a specific topic. Writes one log entry per call to the global log buffer   |
| Parameter reads      | ~50ns                    | Lock-free atomic reads from RuntimeParams SHM                                                              |
| Total on application | Less than 0.1% CPU       | Monitor runs as a separate process. The only cost inside your application is the metric writes the scheduler already does |

The scheduler writes NodeMetrics regardless of whether the monitor is running -- this data is used internally for deadline enforcement and watchdog detection. Running horus monitor adds zero overhead to your application's hot path. The monitor process itself typically uses 1-3% of a single CPU core.
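
You can verify the monitor's own footprint with standard tools. The process name pattern below is an assumption -- adjust it to whatever ps shows on your system:

# Watch the monitor's CPU usage
top -p "$(pgrep -f 'horus monitor' | head -1)"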

When does overhead increase?

  • Topic debug logging: Enabling verbose mode on a high-frequency topic (e.g., 1 kHz IMU) adds log entries at that rate. Each entry is ~100ns, but at 1 kHz that is 100 microseconds per second -- still negligible, but visible in profiling.
  • Many WebSocket clients: Each connected browser receives the full snapshot. With 10+ simultaneous browser tabs, JSON serialization time may reach ~1ms per push cycle.
  • Parameter writes: Setting parameters from the web UI triggers a write to the params SHM. This is a one-time cost per edit, not a recurring overhead.

Network Access

The monitor binds to all network interfaces (0.0.0.0), so you can access it from:

  • Same machine: http://localhost:3000
  • Any device on the network: http://<your-ip>:3000

Always set a password when the monitor is network-accessible.
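
If you'd rather not expose the port at all, a standard SSH tunnel works (nothing HORUS-specific here):

# Forward the robot's monitor to your laptop, then browse http://localhost:3000
ssh -L 3000:localhost:3000 robot@<robot-ip>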

Security

The monitor supports password-based authentication for networked deployments.

Setup

On first run, set a password (or press Enter to skip authentication):

horus monitor

[SECURITY] HORUS Monitor - First Time Setup
Password: ********
Confirm password: ********
[SUCCESS] Password set successfully!

Reset password anytime:

horus monitor --reset-password

How Authentication Works

When a password is set:

  1. The web UI shows a login page before granting access
  2. All API endpoints require a valid session token (except /api/login)
  3. Sessions expire after 1 hour of inactivity
  4. Failed login attempts are rate-limited

When no password is set (Enter pressed at setup):

  • All endpoints are accessible without authentication
  • Suitable for local development only

API Authentication

# Login — returns a session token
curl -X POST http://localhost:3000/api/login \
  -H "Content-Type: application/json" \
  -d '{"password": "your_password"}'
# Returns: {"token": "abc123..."}

# Use token for API requests
curl http://localhost:3000/api/nodes \
  -H "Authorization: Bearer abc123..."

# Logout
curl -X POST http://localhost:3000/api/logout \
  -H "Authorization: Bearer abc123..."

Security Details

| Feature          | Value                           |
|------------------|---------------------------------|
| Password hashing | Argon2id                        |
| Session timeout  | 1 hour inactivity               |
| Rate limiting    | 5 attempts per 60 seconds       |
| Token size       | 256-bit random (base64-encoded) |

Password hash stored at ~/.horus/dashboard_password.hash.

For production deployments, consider placing a reverse proxy with TLS (e.g., nginx) in front of the monitor.

Recovery

If locked out:

# Option 1: Reset via CLI
horus monitor --reset-password

# Option 2: Delete the password hash file
rm ~/.horus/dashboard_password.hash
horus monitor  # Re-prompts for password setup

API Endpoints

The monitor exposes a REST API (authenticated when a password is set):

| Endpoint                | Method          | Description              |
|-------------------------|-----------------|--------------------------|
| /api/status             | GET             | System health status     |
| /api/nodes              | GET             | Running nodes info       |
| /api/topics             | GET             | Active topics            |
| /api/graph              | GET             | Node-topic graph         |
| /api/network            | GET             | Network connections      |
| /api/logs/all           | GET             | All logs                 |
| /api/logs/node/:name    | GET             | Logs for specific node   |
| /api/logs/topic/:name   | GET             | Logs for specific topic  |
| /api/params             | GET             | List parameters          |
| /api/params/:key        | GET/POST/DELETE | Get/set/delete parameter |
| /api/params/export      | POST            | Export all parameters    |
| /api/params/import      | POST            | Import parameters        |
| /api/packages/registry  | GET             | Search packages          |
| /api/packages/install   | POST            | Install package          |
| /api/packages/uninstall | POST            | Uninstall package        |
| /api/recordings         | GET             | List recordings          |
| /api/login              | POST            | Authenticate             |
| /api/logout             | POST            | End session              |
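
These read-only endpoints make a quick scripted health check easy. jq is optional pretty-printing here; the exact field names depend on your HORUS version:

# Snapshot system health, nodes, and the pub/sub graph
curl -s http://localhost:3000/api/status | jq .
curl -s http://localhost:3000/api/nodes | jq .
curl -s http://localhost:3000/api/graph | jq .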

Common Scenarios

Debugging Message Flow

"My subscriber isn't getting messages"

  1. Open Monitor tab, switch to Graph View
  2. Is there an arrow from publisher -> topic -> subscriber?
  3. If not: check that the topic names match exactly and that both nodes are running

"The robot is running slow"

  1. Check nodes list for high CPU usage
  2. Check tick rates — which node can't keep up?
  3. Use logs endpoint to check for slow tick warnings
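
Step 3 works from the command line too. The node name here is illustrative, and the grep is a crude filter over the JSON response:

# Pull one node's logs and look for slow-tick warnings
curl -s http://localhost:3000/api/logs/node/slam_engine | grep -i warn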

Live Parameter Tuning

Tuning PID controller:

  1. Open Parameters tab
  2. Search for pid
  3. Edit pid.kp value — change applies instantly
  4. Watch robot behavior, adjust until optimal
  5. Export final values with Export button
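
Step 5 is also scriptable via the export endpoint from the API table (the output format depends on your HORUS version):

# Save the tuned values to a file
curl -s -X POST http://localhost:3000/api/params/export -o tuned_params.json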

Common Errors

| Symptom                           | Cause                                               | Fix                                                                                                |
|-----------------------------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------------|
| Monitor shows nothing             | HORUS application not running                       | Start your app first with horus run                                                                |
| Cannot access from another device | Devices on different networks or firewall blocking  | Ensure both devices are on the same network; open the port with sudo ufw allow 3000                |
| Port already in use               | Another monitor or process on port 3000             | Use a different port: horus monitor 8080                                                           |
| Locked out of password            | Forgotten password                                  | Run horus monitor --reset-password or delete ~/.horus/dashboard_password.hash                      |
| TUI rendering broken              | Terminal does not support 256 colors                | Use a modern terminal (kitty, alacritty, wezterm) or try TERM=xterm-256color horus monitor --tui   |
| API returns 401 Unauthorized      | Session expired or invalid token                    | Re-authenticate via the /api/login endpoint                                                        |

Design Decisions

Why SHM-based, not network-based

Traditional monitoring tools (e.g., Prometheus exporters, ROS2 introspection) add network hops to collect metrics. HORUS chose shared memory because:

  • Zero overhead when not monitoring. The scheduler writes NodeMetrics to SHM regardless -- it uses the same data for deadline enforcement. No extra serialization, no sockets, no packets.
  • No configuration. The monitor auto-discovers running nodes by scanning /proc and the SHM topics directory. No need to configure exporters, ports, or scrape endpoints.
  • Works offline. SHM monitoring works without any network stack, which matters on embedded systems and in containers without network access.

If you need to send metrics to an external system (Grafana, Prometheus, Datadog), use Telemetry Export -- it reads the same SHM data and forwards it over the network.

Why both Web and TUI

The monitor provides two interfaces because robots are developed and deployed in different environments:

  • Web UI is for development machines with a browser available. It supports the interactive graph view, drag-and-drop parameter files, and multiple team members can open it simultaneously from different devices on the network.
  • TUI is for SSH sessions into headless robots, CI environments, and embedded systems. It requires only a terminal -- no browser, no X11, no port forwarding. The TUI has feature parity with the web UI for reading data (nodes, topics, params, logs) and additionally supports topic debug logging via mmap.

Both interfaces read from the same SHM data source. You can run them simultaneously -- horus monitor in one terminal for the web UI and horus monitor --tui in another for the TUI.
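
Both commands come straight from this page and attach to the same running application, so nothing extra is needed:

# Terminal 1: web UI for the team
horus monitor

# Terminal 2: TUI in your SSH session
horus monitor --tui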

Why password auth, not tokens or mTLS

The monitor uses simple password-based sessions (Argon2id hashing, 256-bit random session tokens) instead of API keys, OAuth, or mutual TLS:

  • Single-user tool. The monitor is typically accessed by one developer or a small team on a local network. OAuth and mTLS add complexity with no benefit.
  • No external dependencies. Password hashing is done locally with Argon2id. No identity provider, no certificate authority, no token refresh flows.
  • Quick setup. First run prompts for a password. No config files, no key generation, no certificate management.

For production deployments exposed to the internet, place a reverse proxy with TLS (e.g., nginx) in front of the monitor rather than building TLS into the monitor itself.

Why read-only access to SHM

The monitor opens SHM files with read-only mmap. It cannot modify topic buffers, corrupt node state, or interfere with the scheduler. The single exception is the topic verbose flag (one byte per topic), which enables debug logging. This is intentional:

  • Safety. A monitoring tool must never be able to crash or corrupt the system it observes. Read-only mmap enforces this at the OS level.
  • No synchronization needed. The monitor reads atomic counters and ring buffer headers that the scheduler writes. No locks, no contention, no priority inversion risk for RT nodes.

Trade-offs

| Choice                         | Benefit                                                 | Cost                                                                               |
|--------------------------------|---------------------------------------------------------|-------------------------------------------------------------------------------------|
| SHM-based monitoring           | Zero overhead, no network config, works offline         | Only works on the same machine (use Telemetry Export for remote)                    |
| Separate monitor process       | Cannot crash the application, clean resource isolation  | Must be started separately (horus monitor), adds one process                        |
| Web + TUI dual interface       | Works everywhere: browser, SSH, headless, embedded      | Two codebases to maintain, feature parity requires discipline                       |
| Password auth (not mTLS)       | Simple setup, no PKI infrastructure needed              | No per-user access control, no audit trail beyond rate limiting                     |
| Read-only SHM access           | Cannot corrupt application state, no lock contention    | Cannot inject test data or modify parameters from the SHM side (use the params API instead) |
| 4 Hz refresh rate              | Smooth real-time feel without CPU waste                 | Cannot capture sub-250ms transient events (use BlackBox or topic debug logging for those) |
| /proc scan for node discovery  | Works without any registration protocol                 | Linux-specific, ~1ms per scan, may show stale entries briefly after a node crash    |

See Also