Perception Messages

Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems.

from horus import (
    BoundingBox2D, BoundingBox3D, Detection, Detection3D,
    TrackedObject, TrackingHeader,
    Landmark, Landmark3D, LandmarkArray,
    PlaneDetection, PlaneArray,
    PointField,
    SegmentationMask,
)

BoundingBox2D

Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors such as YOLO, SSD, and Faster R-CNN.

Constructor

bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0)

.from_center(cx, cy, width, height) — From Center Point

bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0)

Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format.
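The conversion is just a half-extent shift. Independent of the horus API, the arithmetic is:

```python
def center_to_topleft(cx, cy, width, height):
    """Convert (center_x, center_y, w, h) to top-left (x, y, w, h)."""
    return (cx - width / 2.0, cy - height / 2.0, width, height)

# A box centered at (60, 120) with size 100x200 has its top-left at (10, 20)
print(center_to_topleft(60.0, 120.0, 100.0, 200.0))  # (10.0, 20.0, 100.0, 200.0)
```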

.area() — Box Area in Pixels

print(bbox.area())  # 20000.0

Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image).

.iou(other) — Intersection Over Union

bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0)
bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0)
print(bbox_a.iou(bbox_b))  # ~0.143 (partial overlap)

Returns 0.0 (no overlap) to 1.0 (identical boxes). This is the core metric for non-maximum suppression (NMS) — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5).

# Simple NMS pattern
detections.sort(key=lambda d: d.confidence, reverse=True)
kept = []
for det in detections:
    if all(det.bbox.iou(k.bbox) < 0.5 for k in kept):
        kept.append(det)
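For reference, this is the arithmetic that an IoU computation performs — a plain-Python sketch over (x, y, w, h) tuples, not the horus internals:

```python
def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes with top-left origin."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap extents clamp to zero when the boxes don't intersect
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(round(iou_xywh((0, 0, 100, 100), (50, 50, 100, 100)), 3))  # 0.143
```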

.as_tuple() / .as_xyxy() — Format Conversion

x, y, w, h = bbox.as_tuple()       # (x, y, width, height)
x1, y1, x2, y2 = bbox.as_xyxy()    # (x_min, y_min, x_max, y_max)

Different drawing libraries expect different formats. OpenCV uses (x, y, w, h), some plotting tools use (x1, y1, x2, y2).
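The two layouts are related by a simple corner shift — a plain-Python sketch of the conversion:

```python
def xywh_to_xyxy(x, y, w, h):
    """Top-left + size -> opposite corners."""
    return (x, y, x + w, y + h)

def xyxy_to_xywh(x1, y1, x2, y2):
    """Opposite corners -> top-left + size."""
    return (x1, y1, x2 - x1, y2 - y1)

print(xywh_to_xyxy(10.0, 20.0, 100.0, 200.0))  # (10.0, 20.0, 110.0, 220.0)
```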


BoundingBox3D

A 3D bounding box with center, dimensions, and orientation. The constructor takes a single yaw angle for ground-plane rotation (the most common case). For full 3D orientation, use with_rotation.

Constructor

bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3)

Fields

| Field | Type | Description |
| --- | --- | --- |
| cx, cy, cz | float | Center position in meters |
| length | float | Size along the X axis (meters) |
| width | float | Size along the Y axis (meters) |
| height | float | Size along the Z axis (meters) |
| yaw | float | Rotation around the Z axis (radians) |

.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw) — Full 3D Rotation

bbox = BoundingBox3D.with_rotation(
    cx=1.0, cy=2.0, cz=0.5,
    length=2.0, width=1.0, height=1.5,
    roll=0.0, pitch=0.1, yaw=0.3
)

Use this when the detected object is tilted or on a slope. The constructor only accepts yaw (rotation around the vertical axis), which is sufficient for objects on flat ground. with_rotation lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor.

Example — 3D Detection from LiDAR:

from horus import BoundingBox3D

# Detected car: 4.5m long, 1.8m wide, 1.5m tall, heading 30 degrees
car_box = BoundingBox3D(
    cx=5.0, cy=2.0, cz=0.75,
    length=4.5, width=1.8, height=1.5,
    yaw=0.524  # ~30 degrees
)
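Downstream planners often need the box's ground-plane corners rather than its center. A plain-Python sketch of recovering the footprint from center, size, and yaw (this is standard 2D rotation math, not a horus API):

```python
import math

def footprint_corners(cx, cy, length, width, yaw):
    """Ground-plane (x, y) corners of a yaw-rotated box.
    Length runs along the box's local X axis, width along local Y."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for lx, ly in [(0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)]:
        dx, dy = lx * length, ly * width
        # Rotate the local offset by yaw, then translate to the center
        corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return corners

# Unrotated 4.5 x 1.8 box centered at (5, 2)
for x, y in footprint_corners(5.0, 2.0, 4.5, 1.8, 0.0):
    print(f"({x:.2f}, {y:.2f})")
```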

Detection

A single 2D object detection result — class + confidence + bounding box.

Constructor

det = Detection(class_name="person", confidence=0.95,
                x=10.0, y=20.0, width=100.0, height=200.0)

Fields

| Field | Type | Description |
| --- | --- | --- |
| class_name | str | Detected object class (e.g., "person", "car") |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| x, y | float | Top-left corner of bounding box (pixels) |
| width, height | float | Bounding box dimensions (pixels) |
| class_id | int | Numeric class ID (optional, set via with_class_id()) |

.is_confident(threshold) — Filter Low Confidence

if det.is_confident(0.5):
    print(f"Detected {det.class_name} at {det.confidence:.0%}")

Returns True if confidence exceeds the threshold. Typical thresholds:

  • 0.3-0.5: Real-time applications (more detections, some false positives)
  • 0.7-0.9: High-precision applications (fewer detections, almost no false positives)
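The threshold is the first filter in most pipelines. A minimal sketch with hypothetical (class, confidence) pairs standing in for Detection objects:

```python
# Hypothetical raw model outputs as (class_name, confidence) pairs
raw = [("person", 0.92), ("person", 0.41), ("car", 0.18)]

THRESHOLD = 0.5  # real-time setting; raise to 0.7+ for high precision
confident = [d for d in raw if d[1] > THRESHOLD]
print(confident)  # [('person', 0.92)]
```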

.with_class_id(class_id) — Set Numeric Class ID

det = det.with_class_id(1)  # COCO class ID for "person"

Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names.


Detection3D

3D object detection with position, size, and optional velocity.

Constructor

det3d = Detection3D(class_name="car", confidence=0.9,
                     cx=5.0, cy=2.0, cz=0.0,
                     length=4.5, width=1.8, height=1.5)

Fields

| Field | Type | Description |
| --- | --- | --- |
| class_name | str | Detected object class |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| cx, cy, cz | float | Center position in meters |
| length, width, height | float | Object dimensions in meters |
| vx, vy, vz | float | Velocity components (m/s, optional) |

.with_velocity(vx, vy, vz) — Add Motion Estimate

det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0)  # Moving at 10 m/s in x

Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).


TrackedObject

Multi-object tracking state with a lifecycle: tentative → confirmed → deleted.

A new detection starts as tentative. After being seen in multiple consecutive frames, it's confirmed. If it's not seen for too long, it's deleted. This lifecycle prevents spurious single-frame detections from being treated as real objects.
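The lifecycle reduces to two counters and two thresholds. A minimal plain-Python sketch of the state machine (thresholds are illustrative, not horus defaults):

```python
CONFIRM_AFTER = 3   # consecutive hits before a track is trusted
DELETE_AFTER = 5    # consecutive misses before a track is dropped

class TrackState:
    def __init__(self):
        self.state = "tentative"
        self.hits = 0
        self.misses = 0

    def hit(self):
        """Object matched this frame."""
        self.hits += 1
        self.misses = 0
        if self.state == "tentative" and self.hits >= CONFIRM_AFTER:
            self.state = "confirmed"

    def miss(self):
        """Object not seen this frame."""
        self.misses += 1
        if self.misses >= DELETE_AFTER:
            self.state = "deleted"

t = TrackState()
for _ in range(3):
    t.hit()
print(t.state)  # confirmed
```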

Constructor

tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9,
                         x=1.0, y=2.0, width=3.0, height=4.0)

Fields

| Field | Type | Description |
| --- | --- | --- |
| track_id | int | Unique track identifier (persists across frames) |
| class_id | int | Object class ID |
| confidence | float | Latest detection confidence |
| x, y | float | Bounding box top-left (pixels) |
| width, height | float | Bounding box dimensions (pixels) |

.is_tentative() / .is_confirmed() / .is_deleted() — State Queries

if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")

.confirm() — Promote to Confirmed

tracked.confirm()  # Tentative → Confirmed

Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.

.update(bbox, confidence) — New Frame Data

tracked.update(new_bbox, new_confidence)

Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.

Common mistake: Forgetting to call update() for matched tracks. Without it, time_since_update grows and the track eventually gets deleted even though you keep detecting the object.

.mark_missed() — Not Seen This Frame

tracked.mark_missed()

Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.
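The invariant that avoids the mistake above: every live track gets exactly one of update() or mark_missed() per frame. A plain-Python sketch of that bookkeeping, with tracks reduced to a track_id → miss-count dict and an illustrative deletion threshold:

```python
def step_frame(tracks, matches):
    """tracks: dict track_id -> miss count; matches: ids detected this frame."""
    for tid in list(tracks):
        if tid in matches:
            tracks[tid] = 0          # update(): reset time-since-update
        else:
            tracks[tid] += 1         # mark_missed(): grow the miss counter
            if tracks[tid] >= 5:     # illustrative deletion threshold
                del tracks[tid]      # delete(): track lost

tracks = {42: 0, 7: 4}
step_frame(tracks, matches={42})
print(tracks)  # {42: 0} — track 7 hit its fifth miss and was deleted
```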

.delete() — Remove Track

tracked.delete()

Marks the track as deleted. is_deleted() returns True.

.speed() / .heading() — Motion Estimation

print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")

Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.
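What a per-frame estimate of this kind computes, from two successive box centers — a plain-Python sketch, not the horus internals:

```python
import math

def motion(prev_xy, curr_xy):
    """Speed and heading between two successive positions."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    speed = math.hypot(dx, dy)     # pixels (or meters) per frame
    heading = math.atan2(dy, dx)   # radians; 0 points along +x
    return speed, heading

speed, heading = motion((100.0, 50.0), (103.0, 54.0))
print(f"{speed:.1f} px/frame at {heading:.2f} rad")  # 5.0 px/frame at 0.93 rad
```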


Landmark, Landmark3D, LandmarkArray

Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.

Landmark — 2D Keypoint

lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5)  # visibility=1.0

.is_visible(threshold) — Filter Occluded Keypoints

if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass

Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.

.distance_to(other) — Keypoint Distance

dist = left_wrist.distance_to(right_wrist)

Landmark3D — 3D Keypoint

lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d()  # Project to 2D (drops z)

LandmarkArray — Skeleton Presets

# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose()        # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose()   # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand()       # 21 hand keypoints
face = LandmarkArray.mediapipe_face()       # 478 face mesh keypoints

These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.


PlaneDetection

Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.

Fields

| Field | Type | Description |
| --- | --- | --- |
| nx, ny, nz | float | Plane normal vector components |
| d | float | Distance from origin to plane along the normal |
| confidence | float | Detection confidence, 0.0 to 1.0 |

.distance_to_point(px, py, pz) — Point-to-Plane Distance

plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below

The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.
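The underlying math, assuming a unit normal and the convention from the fields above that points on the plane satisfy n · p = d (a plain-Python sketch, not the horus implementation):

```python
def signed_distance(nx, ny, nz, d, px, py, pz):
    """Signed perpendicular distance from point p to plane n . p = d."""
    return nx * px + ny * py + nz * pz - d

def on_plane(nx, ny, nz, d, px, py, pz, tolerance):
    """What a contains_point-style check reduces to."""
    return abs(signed_distance(nx, ny, nz, d, px, py, pz)) <= tolerance

# Horizontal table surface at z = 0.75, normal pointing up
print(round(signed_distance(0.0, 0.0, 1.0, 0.75, 1.0, 2.0, 0.80), 2))  # 0.05
```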

.contains_point(px, py, pz, tolerance) — Is a Point on This Plane?

if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5cm)")

Returns True if the point is within tolerance meters of the plane. Use for classifying which objects are on which surface.


SegmentationMask

Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).

Factory Methods

# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)

# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)

# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)

.is_semantic() / .is_instance() / .is_panoptic() — Check Type

if mask.is_semantic():
    print("Semantic segmentation mask")

Fields

| Field | Type | Description |
| --- | --- | --- |
| width | int | Mask width in pixels |
| height | int | Mask height in pixels |
| num_classes | int | Number of semantic classes (semantic/panoptic only) |

.data_size() / .data_size_u16()

print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")

Example — Check Driveable Area:

from horus import SegmentationMask, Topic

seg_topic = Topic(SegmentationMask)
ROAD_CLASS = 7  # Example: COCO stuff class for "road"

def check_driveable(node):
    mask = seg_topic.recv(node)
    if mask is None or not mask.is_semantic():
        return
    # Count pixels belonging to the road class
    # (actual pixel access depends on your data pipeline)
    print(f"Segmentation mask: {mask.width}x{mask.height}, {mask.num_classes} classes")

PointField

Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).

Constructor

field = PointField(name="x", offset=0, datatype=7, count=1)  # FLOAT32

.field_size() — Byte Size of One Element

size = field.field_size()  # e.g., 4 for FLOAT32, 8 for FLOAT64

Returns the byte size of a single element based on the datatype. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.
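A sketch of that offset computation for a custom XYZ + intensity layout. The datatype code 7 for FLOAT32 comes from the constructor example above; the size table and helper here are illustrative plain Python, not horus constants:

```python
FLOAT32 = 7
SIZES = {FLOAT32: 4}  # bytes per element, as field_size() would report

def build_layout(names, datatype=FLOAT32, count=1):
    """Assign packed byte offsets to a sequence of same-typed fields."""
    fields, offset = [], 0
    for name in names:
        fields.append({"name": name, "offset": offset,
                       "datatype": datatype, "count": count})
        offset += SIZES[datatype] * count
    return fields, offset  # final offset is the stride of one point

fields, stride = build_layout(["x", "y", "z", "intensity"])
print(stride)  # 16 — four packed FLOAT32 fields per point
```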


Design Decisions

Why BoundingBox2D.iou() instead of a standalone function? IoU is always computed between two boxes. Making it a method (bbox_a.iou(bbox_b)) reads naturally and avoids importing a separate utility. It also ensures both boxes use the same coordinate convention (top-left origin, positive width/height).

Why does TrackedObject have an explicit lifecycle (tentative/confirmed/deleted)? Without lifecycle management, every detection is treated equally. A single-frame false positive would trigger the same response as a stable track. The lifecycle pattern filters noise: only confirmed tracks (seen across multiple frames) should influence planning. This is standard practice in multi-object tracking (SORT, DeepSORT, ByteTrack all use this pattern).

Why separate Detection (2D) and Detection3D (3D)? Most object detectors output 2D bounding boxes in image coordinates. 3D detection requires depth information (stereo, LiDAR, monocular depth estimation). Forcing every detection into a 3D struct would waste 7 fields per detection for the 2D case (which is the 90% case). Separate types keep 2D detections lightweight while giving 3D detections full spatial information.

Why LandmarkArray presets (coco_pose(), mediapipe_pose()) instead of a generic constructor? Different pose models output different numbers of keypoints in different orders. COCO has 17 keypoints, MediaPipe Pose has 33, MediaPipe Hand has 21. The presets pre-allocate the correct number of landmarks and set the right dimension, preventing size mismatches between your model output and the message.

Why PlaneDetection.distance_to_point() returns a signed distance? The sign tells you which side of the plane the point is on. Positive means above the plane (same side as the normal), negative means below. This is essential for tasks like "is this object on the table?" (distance near zero) or "is the robot above the floor?" (distance should be positive).


See Also