Perception Messages

Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems.

from horus import (
    BoundingBox2D, BoundingBox3D, Detection, Detection3D,
    TrackedObject, TrackingHeader,
    Landmark, Landmark3D, LandmarkArray,
    PlaneDetection, PlaneArray,
    PointField,
    SegmentationMask,
)

BoundingBox2D

Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors such as YOLO, SSD, and Faster R-CNN.

Constructor

bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0)

.from_center(cx, cy, width, height) — From Center Point

bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0)

Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format.
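The conversion is just a half-extent shift. Independent of the horus API, the arithmetic is:

```python
def center_to_topleft(cx, cy, width, height):
    """Convert (center_x, center_y, w, h) to top-left (x, y, w, h)."""
    return (cx - width / 2.0, cy - height / 2.0, width, height)

# A box centered at (60, 120) with size 100x200 has its top-left at (10, 20)
print(center_to_topleft(60.0, 120.0, 100.0, 200.0))  # (10.0, 20.0, 100.0, 200.0)
```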

.area() — Box Area in Pixels

print(bbox.area())  # 20000.0

Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image).

.iou(other) — Intersection Over Union

bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0)
bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0)
print(bbox_a.iou(bbox_b))  # ~0.143 (partial overlap)

Returns 0.0 (no overlap) to 1.0 (identical boxes). This is the core metric for non-maximum suppression (NMS) — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5).

# Simple NMS pattern
detections.sort(key=lambda d: d.confidence, reverse=True)
kept = []
for det in detections:
    if all(det.bbox.iou(k.bbox) < 0.5 for k in kept):
        kept.append(det)
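For reference, this is the arithmetic that an IoU computation performs — a plain-Python sketch over (x, y, w, h) tuples, not the horus internals:

```python
def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes with top-left origin."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap extents clamp to zero when the boxes don't intersect
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(round(iou_xywh((0, 0, 100, 100), (50, 50, 100, 100)), 3))  # 0.143
```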

.as_tuple() / .as_xyxy() — Format Conversion

x, y, w, h = bbox.as_tuple()       # (x, y, width, height)
x1, y1, x2, y2 = bbox.as_xyxy()    # (x_min, y_min, x_max, y_max)

Different drawing libraries expect different formats. OpenCV uses (x, y, w, h), some plotting tools use (x1, y1, x2, y2).
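The two layouts are related by a simple corner shift — a plain-Python sketch of the conversion:

```python
def xywh_to_xyxy(x, y, w, h):
    """Top-left + size -> opposite corners."""
    return (x, y, x + w, y + h)

def xyxy_to_xywh(x1, y1, x2, y2):
    """Opposite corners -> top-left + size."""
    return (x1, y1, x2 - x1, y2 - y1)

print(xywh_to_xyxy(10.0, 20.0, 100.0, 200.0))  # (10.0, 20.0, 110.0, 220.0)
```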


BoundingBox3D

A 3D bounding box with center, dimensions, and orientation. The constructor takes a single yaw angle for ground-plane rotation (the most common case). For full 3D orientation, use with_rotation.

Constructor

bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3)

Fields

| Field | Type | Description |
| --- | --- | --- |
| cx, cy, cz | float | Center position in meters |
| length | float | Size along the X axis (meters) |
| width | float | Size along the Y axis (meters) |
| height | float | Size along the Z axis (meters) |
| yaw | float | Rotation around the Z axis (radians) |

.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw) — Full 3D Rotation

bbox = BoundingBox3D.with_rotation(
    cx=1.0, cy=2.0, cz=0.5,
    length=2.0, width=1.0, height=1.5,
    roll=0.0, pitch=0.1, yaw=0.3
)

Use this when the detected object is tilted or on a slope. The constructor only accepts yaw (rotation around the vertical axis), which is sufficient for objects on flat ground. with_rotation lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor.

Example — 3D Detection from LiDAR:

from horus import BoundingBox3D

# Detected car: 4.5m long, 1.8m wide, 1.5m tall, heading 30 degrees
car_box = BoundingBox3D(
    cx=5.0, cy=2.0, cz=0.75,
    length=4.5, width=1.8, height=1.5,
    yaw=0.524  # ~30 degrees
)
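Downstream planners often need the box's ground-plane corners rather than its center. A plain-Python sketch of recovering the footprint from center, size, and yaw (this is standard 2D rotation math, not a horus API):

```python
import math

def footprint_corners(cx, cy, length, width, yaw):
    """Ground-plane (x, y) corners of a yaw-rotated box.
    Length runs along the box's local X axis, width along local Y."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for lx, ly in [(0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)]:
        dx, dy = lx * length, ly * width
        # Rotate the local offset by yaw, then translate to the center
        corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return corners

# Unrotated 4.5 x 1.8 box centered at (5, 2)
for x, y in footprint_corners(5.0, 2.0, 4.5, 1.8, 0.0):
    print(f"({x:.2f}, {y:.2f})")
```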

Detection

A single 2D object detection result — class + confidence + bounding box.

Constructor

det = Detection(class_name="person", confidence=0.95,
                x=10.0, y=20.0, width=100.0, height=200.0)

Fields

| Field | Type | Description |
| --- | --- | --- |
| class_name | str | Detected object class (e.g., "person", "car") |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| x, y | float | Top-left corner of bounding box (pixels) |
| width, height | float | Bounding box dimensions (pixels) |
| class_id | int | Numeric class ID (optional, set via with_class_id()) |

.is_confident(threshold) — Filter Low Confidence

if det.is_confident(0.5):
    print(f"Detected {det.class_name} at {det.confidence:.0%}")

Returns True if confidence exceeds the threshold. Typical thresholds:

  • 0.3-0.5: Real-time applications (more detections, some false positives)
  • 0.7-0.9: High-precision applications (fewer detections, almost no false positives)
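The threshold is the first filter in most pipelines. A minimal sketch with hypothetical (class, confidence) pairs standing in for Detection objects:

```python
# Hypothetical raw model outputs as (class_name, confidence) pairs
raw = [("person", 0.92), ("person", 0.41), ("car", 0.18)]

THRESHOLD = 0.5  # real-time setting; raise to 0.7+ for high precision
confident = [d for d in raw if d[1] > THRESHOLD]
print(confident)  # [('person', 0.92)]
```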

.with_class_id(class_id) — Set Numeric Class ID

det = det.with_class_id(1)  # COCO class ID for "person"

Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names.


Detection3D

3D object detection with position, size, and optional velocity.

Constructor

det3d = Detection3D(class_name="car", confidence=0.9,
                     cx=5.0, cy=2.0, cz=0.0,
                     length=4.5, width=1.8, height=1.5)

Fields

| Field | Type | Description |
| --- | --- | --- |
| class_name | str | Detected object class |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| cx, cy, cz | float | Center position in meters |
| length, width, height | float | Object dimensions in meters |
| vx, vy, vz | float | Velocity components (m/s, optional) |

.with_velocity(vx, vy, vz) — Add Motion Estimate

det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0)  # Moving at 10 m/s in x

Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).


TrackedObject

Multi-object tracking state with a lifecycle: tentative → confirmed → deleted.

A new detection starts as tentative. After being seen in multiple consecutive frames, it's confirmed. If it's not seen for too long, it's deleted. This lifecycle prevents spurious single-frame detections from being treated as real objects.
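The lifecycle reduces to two counters and two thresholds. A minimal plain-Python sketch of the state machine (thresholds are illustrative, not horus defaults):

```python
CONFIRM_AFTER = 3   # consecutive hits before a track is trusted
DELETE_AFTER = 5    # consecutive misses before a track is dropped

class TrackState:
    def __init__(self):
        self.state = "tentative"
        self.hits = 0
        self.misses = 0

    def hit(self):
        """Object matched this frame."""
        self.hits += 1
        self.misses = 0
        if self.state == "tentative" and self.hits >= CONFIRM_AFTER:
            self.state = "confirmed"

    def miss(self):
        """Object not seen this frame."""
        self.misses += 1
        if self.misses >= DELETE_AFTER:
            self.state = "deleted"

t = TrackState()
for _ in range(3):
    t.hit()
print(t.state)  # confirmed
```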

Constructor

tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9,
                         x=1.0, y=2.0, width=3.0, height=4.0)

Fields

| Field | Type | Description |
| --- | --- | --- |
| track_id | int | Unique track identifier (persists across frames) |
| class_id | int | Object class ID |
| confidence | float | Latest detection confidence |
| x, y | float | Bounding box top-left (pixels) |
| width, height | float | Bounding box dimensions (pixels) |

.is_tentative() / .is_confirmed() / .is_deleted() — State Queries

if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")

.confirm() — Promote to Confirmed

tracked.confirm()  # Tentative → Confirmed

Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.

.update(bbox, confidence) — New Frame Data

tracked.update(new_bbox, new_confidence)

Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.

Common mistake: Forgetting to call update() for matched tracks. Without it, time_since_update grows and the track eventually gets deleted even though you keep detecting the object.

.mark_missed() — Not Seen This Frame

tracked.mark_missed()

Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.
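The invariant that avoids the mistake above: every live track gets exactly one of update() or mark_missed() per frame. A plain-Python sketch of that bookkeeping, with tracks reduced to a track_id → miss-count dict and an illustrative deletion threshold:

```python
def step_frame(tracks, matches):
    """tracks: dict track_id -> miss count; matches: ids detected this frame."""
    for tid in list(tracks):
        if tid in matches:
            tracks[tid] = 0          # update(): reset time-since-update
        else:
            tracks[tid] += 1         # mark_missed(): grow the miss counter
            if tracks[tid] >= 5:     # illustrative deletion threshold
                del tracks[tid]      # delete(): track lost

tracks = {42: 0, 7: 4}
step_frame(tracks, matches={42})
print(tracks)  # {42: 0} — track 7 hit its fifth miss and was deleted
```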

.delete() — Remove Track

tracked.delete()

Marks the track as deleted. is_deleted() returns True.

.speed() / .heading() — Motion Estimation

print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")

Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.
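What a per-frame estimate of this kind computes, from two successive box centers — a plain-Python sketch, not the horus internals:

```python
import math

def motion(prev_xy, curr_xy):
    """Speed and heading between two successive positions."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    speed = math.hypot(dx, dy)     # pixels (or meters) per frame
    heading = math.atan2(dy, dx)   # radians; 0 points along +x
    return speed, heading

speed, heading = motion((100.0, 50.0), (103.0, 54.0))
print(f"{speed:.1f} px/frame at {heading:.2f} rad")  # 5.0 px/frame at 0.93 rad
```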


Landmark, Landmark3D, LandmarkArray

Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.

Landmark — 2D Keypoint

lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5)  # visibility=1.0

.is_visible(threshold) — Filter Occluded Keypoints

if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass

Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.

.distance_to(other) — Keypoint Distance

dist = left_wrist.distance_to(right_wrist)

Landmark3D — 3D Keypoint

lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d()  # Project to 2D (drops z)

LandmarkArray — Skeleton Presets

# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose()        # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose()   # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand()       # 21 hand keypoints
face = LandmarkArray.mediapipe_face()       # 478 face mesh keypoints

These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.


PlaneDetection

Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.

Fields

| Field | Type | Description |
| --- | --- | --- |
| nx, ny, nz | float | Plane normal vector components |
| d | float | Distance from origin to plane along the normal |
| confidence | float | Detection confidence, 0.0 to 1.0 |

.distance_to_point(px, py, pz) — Point-to-Plane Distance

plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below

The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.
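The underlying math, assuming a unit normal and the convention from the fields above that points on the plane satisfy n · p = d (a plain-Python sketch, not the horus implementation):

```python
def signed_distance(nx, ny, nz, d, px, py, pz):
    """Signed perpendicular distance from point p to plane n . p = d."""
    return nx * px + ny * py + nz * pz - d

def on_plane(nx, ny, nz, d, px, py, pz, tolerance):
    """What a contains_point-style check reduces to."""
    return abs(signed_distance(nx, ny, nz, d, px, py, pz)) <= tolerance

# Horizontal table surface at z = 0.75, normal pointing up
print(round(signed_distance(0.0, 0.0, 1.0, 0.75, 1.0, 2.0, 0.80), 2))  # 0.05
```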

.contains_point(px, py, pz, tolerance) — Is a Point on This Plane?

if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5cm)")

Returns True if the point is within tolerance meters of the plane. Use for classifying which objects are on which surface.


SegmentationMask

Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).

Factory Methods

# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)

# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)

# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)

.is_semantic() / .is_instance() / .is_panoptic() — Check Type

if mask.is_semantic():
    print("Semantic segmentation mask")

Fields

| Field | Type | Description |
| --- | --- | --- |
| width | int | Mask width in pixels |
| height | int | Mask height in pixels |
| num_classes | int | Number of semantic classes (semantic/panoptic only) |

.data_size() / .data_size_u16()

print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")

Example — Check Driveable Area:

from horus import SegmentationMask, Topic

seg_topic = Topic(SegmentationMask)
ROAD_CLASS = 7  # Example: COCO stuff class for "road"

def check_driveable(node):
    mask = seg_topic.recv(node)
    if mask is None or not mask.is_semantic():
        return
    # Count pixels belonging to the road class
    # (actual pixel access depends on your data pipeline)
    print(f"Segmentation mask: {mask.width}x{mask.height}, {mask.num_classes} classes")

PointField

Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).

Constructor

field = PointField(name="x", offset=0, datatype=7, count=1)  # FLOAT32

.field_size() — Byte Size of One Element

size = field.field_size()  # e.g., 4 for FLOAT32, 8 for FLOAT64

Returns the byte size of a single element based on the datatype. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.
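A sketch of that offset computation for a custom XYZ + intensity layout. The datatype code 7 for FLOAT32 comes from the constructor example above; the size table and helper here are illustrative plain Python, not horus constants:

```python
FLOAT32 = 7
SIZES = {FLOAT32: 4}  # bytes per element, as field_size() would report

def build_layout(names, datatype=FLOAT32, count=1):
    """Assign packed byte offsets to a sequence of same-typed fields."""
    fields, offset = [], 0
    for name in names:
        fields.append({"name": name, "offset": offset,
                       "datatype": datatype, "count": count})
        offset += SIZES[datatype] * count
    return fields, offset  # final offset is the stride of one point

fields, stride = build_layout(["x", "y", "z", "intensity"])
print(stride)  # 16 — four packed FLOAT32 fields per point
```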


Design Decisions

Why BoundingBox2D.iou() instead of a standalone function? IoU is always computed between two boxes. Making it a method (bbox_a.iou(bbox_b)) reads naturally and avoids importing a separate utility. It also ensures both boxes use the same coordinate convention (top-left origin, positive width/height).

Why does TrackedObject have an explicit lifecycle (tentative/confirmed/deleted)? Without lifecycle management, every detection is treated equally. A single-frame false positive would trigger the same response as a stable track. The lifecycle pattern filters noise: only confirmed tracks (seen across multiple frames) should influence planning. This is standard practice in multi-object tracking (SORT, DeepSORT, ByteTrack all use this pattern).

Why separate Detection (2D) and Detection3D (3D)? Most object detectors output 2D bounding boxes in image coordinates. 3D detection requires depth information (stereo, LiDAR, monocular depth estimation). Forcing every detection into a 3D struct would waste 7 fields per detection for the 2D case (which is the 90% case). Separate types keep 2D detections lightweight while giving 3D detections full spatial information.

Why LandmarkArray presets (coco_pose(), mediapipe_pose()) instead of a generic constructor? Different pose models output different numbers of keypoints in different orders. COCO has 17 keypoints, MediaPipe Pose has 33, MediaPipe Hand has 21. The presets pre-allocate the correct number of landmarks and set the right dimension, preventing size mismatches between your model output and the message.

Why PlaneDetection.distance_to_point() returns a signed distance? The sign tells you which side of the plane the point is on. Positive means above the plane (same side as the normal), negative means below. This is essential for tasks like "is this object on the table?" (distance near zero) or "is the robot above the floor?" (distance should be positive).


See Also