Perception Messages
Perception messages carry computer vision results — detected objects, tracked targets, body pose keypoints, segmentation masks, and plane surfaces. These are the outputs of your ML models and the inputs to your planning/control systems.
from horus import (
    BoundingBox2D, BoundingBox3D, Detection, Detection3D,
    TrackedObject, TrackingHeader,
    Landmark, Landmark3D, LandmarkArray,
    PlaneDetection, PlaneArray,
    SegmentationMask, PointField,
)
BoundingBox2D
Axis-aligned bounding box in 2D image coordinates. The fundamental output of object detectors such as YOLO, SSD, and Faster R-CNN.
Constructor
bbox = BoundingBox2D(x=10.0, y=20.0, width=100.0, height=200.0)
.from_center(cx, cy, width, height) — From Center Point
bbox = BoundingBox2D.from_center(cx=60.0, cy=120.0, width=100.0, height=200.0)
Many ML models output bounding boxes as (center_x, center_y, width, height). This factory creates a BoundingBox2D from that format.
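The conversion is just an offset by half the box size; a standalone sketch of the arithmetic (plain Python, independent of horus):

```python
def center_to_topleft(cx, cy, width, height):
    """Convert (center_x, center_y, w, h) to top-left (x, y, w, h)."""
    return (cx - width / 2.0, cy - height / 2.0, width, height)

# A 100x200 box centered at (60, 120) has its top-left corner at (10, 20),
# matching the constructor example above.
print(center_to_topleft(60.0, 120.0, 100.0, 200.0))  # (10.0, 20.0, 100.0, 200.0)
```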
.area() — Box Area in Pixels
print(bbox.area()) # 20000.0
Width × height. Use for filtering — ignore very small detections (noise) or very large ones (false positives spanning the whole image).
.iou(other) — Intersection Over Union
bbox_a = BoundingBox2D(x=0.0, y=0.0, width=100.0, height=100.0)
bbox_b = BoundingBox2D(x=50.0, y=50.0, width=100.0, height=100.0)
print(bbox_a.iou(bbox_b)) # ~0.143 (partial overlap)
Returns 0.0 (no overlap) to 1.0 (identical boxes). This is the core metric for non-maximum suppression (NMS) — when your detector finds multiple boxes for the same object, keep the highest-confidence one and suppress any box with IoU > threshold (typically 0.3-0.5).
# Simple NMS pattern
detections.sort(key=lambda d: d.confidence, reverse=True)
kept = []
for det in detections:
    if all(det.bbox.iou(k.bbox) < 0.5 for k in kept):
        kept.append(det)
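The ~0.143 figure in the example follows directly from the overlap arithmetic; a self-contained sketch of the computation (plain Python, independent of horus):

```python
def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes with a top-left origin."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# The two boxes from the example: 50x50 overlap, union 17500 -> 2500 / 17500
print(iou_xywh((0, 0, 100, 100), (50, 50, 100, 100)))  # ~0.1429
```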
.as_tuple() / .as_xyxy() — Format Conversion
x, y, w, h = bbox.as_tuple() # (x, y, width, height)
x1, y1, x2, y2 = bbox.as_xyxy() # (x_min, y_min, x_max, y_max)
Different drawing libraries expect different formats: OpenCV generally works with (x, y, w, h), while some plotting tools expect (x1, y1, x2, y2).
BoundingBox3D
A 3D bounding box with center, dimensions, and orientation. The constructor takes a single yaw angle for ground-plane rotation (the most common case). For full 3D orientation, use with_rotation.
Constructor
bbox = BoundingBox3D(cx=1.0, cy=2.0, cz=0.5, length=2.0, width=1.0, height=1.5, yaw=0.3)
Fields
| Field | Type | Description |
|---|---|---|
| cx, cy, cz | float | Center position in meters |
| length | float | Size along the X axis (meters) |
| width | float | Size along the Y axis (meters) |
| height | float | Size along the Z axis (meters) |
| yaw | float | Rotation around the Z axis (radians) |
.with_rotation(cx, cy, cz, length, width, height, roll, pitch, yaw) — Full 3D Rotation
bbox = BoundingBox3D.with_rotation(
    cx=1.0, cy=2.0, cz=0.5,
    length=2.0, width=1.0, height=1.5,
    roll=0.0, pitch=0.1, yaw=0.3
)
Use this when the detected object is tilted or on a slope. The constructor only accepts yaw (rotation around the vertical axis), which is sufficient for objects on flat ground. with_rotation lets you specify all three Euler angles for objects at arbitrary orientations — a crate on a ramp, a drone in flight, or a wall-mounted sensor.
Example — 3D Detection from LiDAR:
from horus import BoundingBox3D
# Detected car: 4.5m long, 1.8m wide, 1.5m tall, heading 30 degrees
car_box = BoundingBox3D(
    cx=5.0, cy=2.0, cz=0.75,
    length=4.5, width=1.8, height=1.5,
    yaw=0.524  # ~30 degrees
)
Detection
A single 2D object detection result — class + confidence + bounding box.
Constructor
det = Detection(class_name="person", confidence=0.95,
                x=10.0, y=20.0, width=100.0, height=200.0)
Fields
| Field | Type | Description |
|---|---|---|
| class_name | str | Detected object class (e.g., "person", "car") |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| x, y | float | Top-left corner of bounding box (pixels) |
| width, height | float | Bounding box dimensions (pixels) |
| class_id | int | Numeric class ID (optional, set via with_class_id()) |
.is_confident(threshold) — Filter Low Confidence
if det.is_confident(0.5):
    print(f"Detected {det.class_name} at {det.confidence:.0%}")
Returns True if confidence exceeds the threshold. Typical thresholds:
- 0.3-0.5: Real-time applications (more detections, some false positives)
- 0.7-0.9: High-precision applications (fewer detections, almost no false positives)
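Applied to a batch of results, the threshold is a single filter pass; a sketch with plain (class_name, confidence) tuples standing in for Detection objects (hypothetical values):

```python
# Hypothetical raw model output: (class_name, confidence) pairs
raw = [("person", 0.92), ("person", 0.41), ("car", 0.78), ("dog", 0.12)]

THRESHOLD = 0.5  # real-time setting; high-precision applications use 0.7-0.9
confident = [(name, conf) for name, conf in raw if conf >= THRESHOLD]
print(confident)  # [('person', 0.92), ('car', 0.78)]
```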
.with_class_id(class_id) — Set Numeric Class ID
det = det.with_class_id(1) # COCO class ID for "person"
Returns a new Detection with the class ID set. Many ML frameworks output numeric class IDs alongside string names.
Detection3D
3D object detection with position, size, and optional velocity.
Constructor
det3d = Detection3D(class_name="car", confidence=0.9,
                    cx=5.0, cy=2.0, cz=0.0,
                    length=4.5, width=1.8, height=1.5)
Fields
| Field | Type | Description |
|---|---|---|
| class_name | str | Detected object class |
| confidence | float | Detection confidence, 0.0 to 1.0 |
| cx, cy, cz | float | Center position in meters |
| length, width, height | float | Object dimensions in meters |
| vx, vy, vz | float | Velocity components (m/s, optional) |
.with_velocity(vx, vy, vz) — Add Motion Estimate
det3d = det3d.with_velocity(vx=10.0, vy=0.0, vz=0.0) # Moving at 10 m/s in x
Returns a new Detection3D with velocity components. Use when your 3D detector also estimates object motion (e.g., from multi-frame tracking or radar fusion).
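One common use of the velocity estimate is short-horizon prediction; a constant-velocity extrapolation sketch in plain Python (not a horus API):

```python
def predict_position(cx, cy, cz, vx, vy, vz, dt):
    """Constant-velocity extrapolation of an object's center, dt seconds ahead."""
    return (cx + vx * dt, cy + vy * dt, cz + vz * dt)

# Where will a car at (5, 2, 0) moving 10 m/s in x be in half a second?
print(predict_position(5.0, 2.0, 0.0, 10.0, 0.0, 0.0, 0.5))  # (10.0, 2.0, 0.0)
```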
TrackedObject
Multi-object tracking state with a lifecycle: tentative → confirmed → deleted.
A new detection starts as tentative. After being seen in multiple consecutive frames, it's confirmed. If it's not seen for too long, it's deleted. This lifecycle prevents spurious single-frame detections from being treated as real objects.
Constructor
tracked = TrackedObject(track_id=42, class_id=1, confidence=0.9,
                        x=1.0, y=2.0, width=3.0, height=4.0)
Fields
| Field | Type | Description |
|---|---|---|
| track_id | int | Unique track identifier (persists across frames) |
| class_id | int | Object class ID |
| confidence | float | Latest detection confidence |
| x, y | float | Bounding box top-left (pixels) |
| width, height | float | Bounding box dimensions (pixels) |
.is_tentative() / .is_confirmed() / .is_deleted() — State Queries
if tracked.is_tentative():
    print("New detection — not yet reliable")
elif tracked.is_confirmed():
    print("Stable track — use for planning")
elif tracked.is_deleted():
    print("Lost track — remove from state")
.confirm() — Promote to Confirmed
tracked.confirm() # Tentative → Confirmed
Call after the object has been matched across enough frames (typically 3-5). Only confirmed tracks should be used for navigation and planning decisions.
.update(bbox, confidence) — New Frame Data
tracked.update(new_bbox, new_confidence)
Updates the track with the latest detection. Resets the "time since update" counter. Call this every frame where the object is re-detected.
Common mistake: forgetting to call update() for matched tracks. Without it, time_since_update grows and the track eventually gets deleted even though you keep detecting the object.
.mark_missed() — Not Seen This Frame
tracked.mark_missed()
Call when the object was NOT detected in the current frame. Increments the miss counter — after enough misses, the track should be deleted.
.delete() — Remove Track
tracked.delete()
Marks the track as deleted. is_deleted() returns True.
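The lifecycle methods above combine into a per-frame state machine. A minimal sketch of the tentative → confirmed → deleted pattern in plain Python (mirroring TrackedObject's behavior, not the horus API; the thresholds are illustrative):

```python
class MiniTrack:
    """Toy track lifecycle: tentative -> confirmed -> deleted."""
    CONFIRM_AFTER = 3   # consecutive hits before promotion (typically 3-5)
    DELETE_AFTER = 5    # consecutive misses before removal

    def __init__(self):
        self.state = "tentative"
        self.hits = 1               # the creating detection counts as a hit
        self.time_since_update = 0

    def update(self):               # matched a detection this frame
        self.hits += 1
        self.time_since_update = 0
        if self.state == "tentative" and self.hits >= self.CONFIRM_AFTER:
            self.state = "confirmed"

    def mark_missed(self):          # no matching detection this frame
        self.time_since_update += 1
        if self.time_since_update > self.DELETE_AFTER:
            self.state = "deleted"

track = MiniTrack()
track.update()
track.update()
print(track.state)  # confirmed (3 hits total)
```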
.speed() / .heading() — Motion Estimation
print(f"Speed: {tracked.speed():.1f} px/frame")
print(f"Heading: {tracked.heading():.1f} rad")
Computed from the tracked trajectory. Speed is in pixels per frame (or meters per frame if tracking in world coordinates). Heading is the direction of motion in radians.
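A trajectory-based estimate reduces to finite differences between consecutive positions; a standalone sketch of the math (plain Python, not the horus implementation):

```python
import math

def speed_and_heading(prev, curr):
    """Per-frame speed and heading (radians) from two (x, y) positions."""
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Object moved from (100, 200) to (103, 204) between frames: a 3-4-5 triangle
speed, heading = speed_and_heading((100.0, 200.0), (103.0, 204.0))
print(f"{speed:.1f} px/frame, {heading:.2f} rad")  # 5.0 px/frame, 0.93 rad
```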
Landmark, Landmark3D, LandmarkArray
Body pose estimation keypoints — skeleton joints from COCO, MediaPipe, or custom pose models.
Landmark — 2D Keypoint
lm = Landmark(x=100.0, y=200.0, visibility=0.95, index=5)
visible_lm = Landmark.visible(x=100.0, y=200.0, index=5) # visibility=1.0
.is_visible(threshold) — Filter Occluded Keypoints
if lm.is_visible(0.5):
    # Keypoint is visible — use for pose estimation
    pass
Visibility is a confidence score (0.0 = occluded/not detected, 1.0 = clearly visible). Filter low-visibility keypoints to avoid using unreliable data.
.distance_to(other) — Keypoint Distance
dist = left_wrist.distance_to(right_wrist)
Landmark3D — 3D Keypoint
lm3d = Landmark3D(x=1.0, y=2.0, z=0.5, visibility=0.9, index=10)
lm2d = lm3d.to_2d() # Project to 2D (drops z)
LandmarkArray — Skeleton Presets
# Standard presets for popular pose models
skeleton = LandmarkArray.coco_pose() # 17 COCO keypoints
skeleton = LandmarkArray.mediapipe_pose() # 33 MediaPipe pose keypoints
hand = LandmarkArray.mediapipe_hand() # 21 hand keypoints
face = LandmarkArray.mediapipe_face() # 478 face mesh keypoints
These presets set the correct number of landmarks and dimension for each model. Fill in the actual keypoint coordinates from your model's output.
PlaneDetection
Detected planar surfaces — floors, walls, tables. Used for navigation (floor detection), manipulation (table surface), and augmented reality.
Fields
| Field | Type | Description |
|---|---|---|
| nx, ny, nz | float | Plane normal vector components |
| d | float | Distance from origin to plane along the normal |
| confidence | float | Detection confidence, 0.0 to 1.0 |
.distance_to_point(px, py, pz) — Point-to-Plane Distance
plane = PlaneDetection(...)
dist = plane.distance_to_point(1.0, 2.0, 0.5)
# Signed distance — positive = above plane, negative = below
The signed perpendicular distance from a point to the plane. Use this to check if objects are on, above, or below a surface.
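The underlying math is the plane equation evaluated at the point. A sketch assuming the Hessian normal form n·p = d with a unit-length normal (plain Python, not the horus API; the library's sign convention for d may differ):

```python
def signed_distance(nx, ny, nz, d, px, py, pz):
    """Signed point-to-plane distance, assuming a unit normal and plane n.p = d."""
    return nx * px + ny * py + nz * pz - d

# Floor plane: normal pointing up (+z), passing through the origin
print(signed_distance(0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.5))   # 0.5 (above the floor)
print(signed_distance(0.0, 0.0, 1.0, 0.0, 1.0, 2.0, -0.2))  # -0.2 (below the floor)
```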
.contains_point(px, py, pz, tolerance) — Is a Point on This Plane?
if plane.contains_point(1.0, 2.0, 0.01, tolerance=0.05):
    print("Point is on the table surface (within 5cm)")
Returns True if the point is within tolerance meters of the plane. Use for classifying which objects are on which surface.
SegmentationMask
Pixel-level image segmentation — semantic (class per pixel), instance (unique ID per object), or panoptic (both).
Factory Methods
# Semantic: what class is each pixel?
mask = SegmentationMask.semantic(width=640, height=480, num_classes=21)
# Instance: which object is each pixel?
mask = SegmentationMask.instance(width=640, height=480)
# Panoptic: both class AND instance for each pixel
mask = SegmentationMask.panoptic(width=640, height=480, num_classes=21)
.is_semantic() / .is_instance() / .is_panoptic() — Check Type
if mask.is_semantic():
    print("Semantic segmentation mask")
Fields
| Field | Type | Description |
|---|---|---|
| width | int | Mask width in pixels |
| height | int | Mask height in pixels |
| num_classes | int | Number of semantic classes (semantic/panoptic only) |
.data_size() / .data_size_u16()
print(f"Data: {mask.data_size()} bytes (u8), {mask.data_size_u16()} elements (u16)")
Example — Check Driveable Area:
from horus import SegmentationMask, Topic
seg_topic = Topic(SegmentationMask)
ROAD_CLASS = 7 # Example: COCO stuff class for "road"
def check_driveable(node):
    mask = seg_topic.recv(node)
    if mask is None or not mask.is_semantic():
        return
    # Count pixels belonging to the road class
    # (actual pixel access depends on your data pipeline)
    print(f"Segmentation mask: {mask.width}x{mask.height}, {mask.num_classes} classes")
PointField
Describes a single field in a point cloud — name, byte offset, datatype, and element count. Used when defining custom point cloud formats (e.g., XYZ + RGB + intensity).
Constructor
field = PointField(name="x", offset=0, datatype=7, count=1) # FLOAT32
.field_size() — Byte Size of One Element
size = field.field_size() # e.g., 4 for FLOAT32, 8 for FLOAT64
Returns the byte size of a single element based on the datatype. Useful when computing byte offsets for the next field in a point cloud layout, or when parsing raw point cloud buffers.
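Computing a packed layout is a running sum of field sizes. A sketch for an XYZ + intensity layout in plain Python, assuming horus's datatype codes match the ROS sensor_msgs/PointField convention (7 = FLOAT32, 4 bytes); the helper below is illustrative, not a horus API:

```python
# Datatype code -> byte size, per the ROS PointField convention
# (1=INT8, 2=UINT8, 3=INT16, 4=UINT16, 5=INT32, 6=UINT32, 7=FLOAT32, 8=FLOAT64)
SIZES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 4, 8: 8}

def layout(names, datatype=7):
    """Assign consecutive byte offsets to fields of one datatype."""
    fields, offset = [], 0
    for name in names:
        fields.append((name, offset, datatype))
        offset += SIZES[datatype]
    return fields, offset  # the final offset doubles as the point stride

fields, stride = layout(["x", "y", "z", "intensity"])
print(fields)  # [('x', 0, 7), ('y', 4, 7), ('z', 8, 7), ('intensity', 12, 7)]
print(stride)  # 16 (bytes per point)
```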
Design Decisions
Why BoundingBox2D.iou() instead of a standalone function? IoU is always computed between two boxes. Making it a method (bbox_a.iou(bbox_b)) reads naturally and avoids importing a separate utility. It also ensures both boxes use the same coordinate convention (top-left origin, positive width/height).
Why does TrackedObject have an explicit lifecycle (tentative/confirmed/deleted)? Without lifecycle management, every detection is treated equally. A single-frame false positive would trigger the same response as a stable track. The lifecycle pattern filters noise: only confirmed tracks (seen across multiple frames) should influence planning. This is standard practice in multi-object tracking (SORT, DeepSORT, ByteTrack all use this pattern).
Why separate Detection (2D) and Detection3D (3D)? Most object detectors output 2D bounding boxes in image coordinates. 3D detection requires depth information (stereo, LiDAR, monocular depth estimation). Forcing every detection into a 3D struct would waste 7 fields per detection for the 2D case (which is the 90% case). Separate types keep 2D detections lightweight while giving 3D detections full spatial information.
Why LandmarkArray presets (coco_pose(), mediapipe_pose()) instead of a generic constructor? Different pose models output different numbers of keypoints in different orders. COCO has 17 keypoints, MediaPipe Pose has 33, MediaPipe Hand has 21. The presets pre-allocate the correct number of landmarks and set the right dimension, preventing size mismatches between your model output and the message.
Why PlaneDetection.distance_to_point() returns a signed distance? The sign tells you which side of the plane the point is on. Positive means above the plane (same side as the normal), negative means below. This is essential for tasks like "is this object on the table?" (distance near zero) or "is the robot above the floor?" (distance should be positive).
See Also
- Vision Messages — Image, PointCloud, DepthImage
- Geometry Messages — Point3 for 3D positions
- Navigation Messages — OccupancyGrid for mapping from detections
- Python Message Library — All 55+ message types overview