Tensor
A lightweight tensor descriptor for zero-copy ML data sharing across nodes and processes.
```rust
use horus::prelude::*;
```
Overview
Tensor is a lightweight descriptor that references data in shared memory. Only the descriptor is transmitted through topics -- the actual tensor data stays in place, enabling zero-copy transport of large ML payloads.
You typically obtain a Tensor in one of two ways:
- Allocate from a TensorPool (via Topic<Tensor> or a manual pool) -- see the TensorPool API
- Receive from a Topic -- another node sends it; you read the descriptor and access the backing data
Methods
| Method | Return Type | Description |
|---|---|---|
| shape() | &[u64] | Tensor dimensions (e.g., [1080, 1920, 3]) |
| strides() | &[u64] | Byte strides per dimension |
| numel() | u64 | Total number of elements |
| nbytes() | u64 | Total size in bytes (numel * dtype.element_size()) |
| dtype() | TensorDtype | Element data type |
| device() | Device | Device location (CPU or CUDA) |
| is_cpu() | bool | True if data resides on CPU / shared memory |
| is_cuda() | bool | True if the device descriptor is set to CUDA |
| is_contiguous() | bool | True if memory layout is C-contiguous |
| view(new_shape) | Option<Self> | Reshape without copying (fails if not contiguous or if the element count changes) |
| slice_first_dim(start, end) | Option<Self> | Slice along the first dimension, adjusting the shape and data offset |
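To make the relationships in the table concrete, here is a small arithmetic sketch (plain Python, not the HORUS API) computing C-contiguous byte strides, numel, and nbytes for a 1080x1920x3 U8 image:

```python
# Illustrative arithmetic only -- not the HORUS API.
# For a C-contiguous tensor, the byte stride of dimension i is
# element_size * product(shape[i+1:]).

def c_contiguous_strides(shape, element_size):
    strides = []
    acc = element_size
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return list(reversed(strides))

shape = [1080, 1920, 3]        # HxWxC RGB image
element_size = 1               # U8 -> 1 byte per element

strides = c_contiguous_strides(shape, element_size)
numel = 1
for dim in shape:
    numel *= dim
nbytes = numel * element_size  # matches nbytes() = numel * dtype.element_size()

print(strides)  # [5760, 3, 1]
print(numel)    # 6220800
print(nbytes)   # 6220800
```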
Reshape and Slice
```rust
let topic: Topic<Tensor> = Topic::new("model.input")?;

if let Some(handle) = topic.recv_handle() {
    // Reshape a flat 1D tensor into a batch of images
    if let Some(reshaped) = handle.view(&[4, 3, 224, 224]) {
        println!("Batch shape: {:?}", reshaped.tensor().shape()); // [4, 3, 224, 224]
    }

    // Take the first 2 items from a batch
    if let Some(sliced) = handle.slice_first_dim(0, 2) {
        println!("Sliced shape: {:?}", sliced.tensor().shape()); // [2, 3, 224, 224]
    }
}
```
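The checks behind view and slice_first_dim can be sketched in plain Python (hypothetical helpers, not the HORUS API): a reshape succeeds only when the layout is contiguous and the element count is preserved, while slicing the first dimension shrinks shape[0] and advances the data offset without touching the strides.

```python
import math

# Hypothetical sketch of the descriptor-level rules -- not the HORUS API.

def try_view(shape, new_shape, is_contiguous):
    """view() fails on non-contiguous layouts or element-count mismatches."""
    if not is_contiguous or math.prod(shape) != math.prod(new_shape):
        return None
    return list(new_shape)

def slice_first_dim(shape, strides, start, end):
    """Slice [start, end) along dim 0: shape[0] shrinks, strides are unchanged."""
    if not (0 <= start <= end <= shape[0]):
        return None
    new_shape = [end - start] + list(shape[1:])
    byte_offset = start * strides[0]  # the data pointer advances; no copy
    return new_shape, list(strides), byte_offset

# Reshape a flat buffer of 4*3*224*224 elements into a batch of images
print(try_view([602112], [4, 3, 224, 224], is_contiguous=True))
# -> [4, 3, 224, 224]

# Take items [1, 3) from an F32 batch with C-contiguous byte strides
print(slice_first_dim([4, 3, 224, 224], [602112, 200704, 896, 4], 1, 3))
# -> ([2, 3, 224, 224], [602112, 200704, 896, 4], 602112)
```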
TensorDtype
Supported element types with sizes and common use cases:
| Dtype | Size | Use Case |
|---|---|---|
| F32 | 4 bytes | ML training and inference |
| F64 | 8 bytes | High-precision computation |
| F16 | 2 bytes | Memory-efficient inference |
| BF16 | 2 bytes | Training on modern GPUs |
| I8 | 1 byte | Quantized inference |
| I16 | 2 bytes | Audio, sensor data |
| I32 | 4 bytes | General integer |
| I64 | 8 bytes | Large signed values |
| U8 | 1 byte | Images |
| U16 | 2 bytes | Depth sensors (mm) |
| U32 | 4 bytes | Large indices |
| U64 | 8 bytes | Counters, timestamps |
| Bool | 1 byte | Masks |
TensorDtype Methods
```rust
let dtype = TensorDtype::F32;

// Size in bytes
assert_eq!(dtype.element_size(), 4);

// Display (lowercase string representation)
println!("{}", dtype); // "float32"

// Parse from string -- accepts common aliases
let parsed = TensorDtype::parse("float32").unwrap(); // F32
let parsed = TensorDtype::parse("f16").unwrap();     // F16
let parsed = TensorDtype::parse("uint8").unwrap();   // U8
let parsed = TensorDtype::parse("bool").unwrap();    // Bool
```
Device
Pod-safe device descriptor supporting CPU and CUDA device tags. Device is metadata only — Device::cuda(N) tags a tensor with a device target but does not allocate GPU memory (GPU tensor pools are not yet implemented).
```rust
// Constructors
let cpu = Device::cpu();
let gpu0 = Device::cuda(0); // Descriptor only -- no GPU allocation

// Check device type
assert!(cpu.is_cpu());
assert!(gpu0.is_cuda());

// Display
println!("{}", gpu0); // "cuda:0"

// Parse from string
let dev = Device::parse("cpu").unwrap();
let dev = Device::parse("cuda:0").unwrap();
```
ML Pipeline Example
A complete example showing a camera node feeding frames to an inference node via Topic<Tensor>:
```rust
use horus::prelude::*;

// Producer: camera capture node
node! {
    CameraNode {
        pub { frames: Tensor -> "camera.rgb" }
        data { frame_count: u64 = 0 }
        tick {
            // Allocate from the topic's auto-managed pool
            if let Ok(mut handle) = self.frames.alloc_tensor(
                &[480, 640, 3],
                TensorDtype::U8,
                Device::cpu(),
            ) {
                let pixels = handle.data_slice_mut().unwrap();
                // ... fill pixel data from camera driver ...
                self.frames.send_handle(&handle);
                self.frame_count += 1;
            }
        }
    }
}

// Consumer: inference node
node! {
    InferenceNode {
        sub { frames: Tensor -> "camera.rgb" }
        pub { detections: GenericMessage -> "model.detections" }
        tick {
            if let Some(handle) = self.frames.recv_handle() {
                let data = handle.data_slice().unwrap(); // Zero-copy access
                let shape = handle.shape();
                hlog!(debug, "Frame: {:?}, {} bytes", shape, handle.nbytes());
                // Run inference, publish results ...
            }
        }
    }
}
```
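The zero-copy access pattern in the consumer can be illustrated with Python's standard library alone (this is not the HORUS API): given a shape and dtype, a flat buffer can be reinterpreted in place as a multi-dimensional view, the same way a Tensor descriptor lets a consumer read shared memory without copying.

```python
# Plain-Python illustration of zero-copy viewing -- not the HORUS API.
# A descriptor (shape + dtype) lets a consumer reinterpret a shared flat
# buffer in place; memoryview.cast does the same for local bytes.

height, width, channels = 480, 640, 3
buf = bytearray(height * width * channels)  # stand-in for a shared-memory region

# Reinterpret the flat bytes as a HxWxC array of unsigned bytes, no copy
frame = memoryview(buf).cast("B", shape=[height, width, channels])
frame[0, 0, 0] = 255                        # writes through to buf

print(frame.shape)   # (480, 640, 3)
print(buf[0])        # 255
print(frame.nbytes)  # 921600
```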
Python API
PyTensorHandle provides interop with NumPy, PyTorch, and JAX, with zero-copy conversions where the memory layout allows.
Creating Tensors
```python
import horus
import numpy as np

# From NumPy (zero-copy when possible)
arr = np.zeros((480, 640, 3), dtype=np.float32)
handle = horus.PyTensorHandle.from_numpy(arr)

# From PyTorch
import torch

t = torch.randn(1, 3, 224, 224)
handle = horus.PyTensorHandle.from_torch(t)
```
Converting Back
```python
# To NumPy (zero-copy)
arr = handle.to_numpy()

# To PyTorch
tensor = handle.to_torch()

# To JAX
jax_arr = handle.to_jax()
```
Properties and Methods
| Property / Method | Description |
|---|---|
| shape | Tuple of dimensions |
| dtype | Data type string (e.g., "float32") |
| device | Device descriptor string (e.g., "cpu") |
| is_contiguous() | Check C-contiguous layout |
| view(new_shape) | Reshape without copying |
| slice(start, end) | Slice along first dimension |
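The is_contiguous() check in both APIs amounts to comparing the descriptor's byte strides against tightly packed C (row-major) order. A hedged plain-Python sketch of that predicate (hypothetical helper, not the HORUS API):

```python
# Hypothetical sketch of an is_contiguous()-style check -- not the HORUS API.

def is_c_contiguous(shape, strides, element_size):
    """True when byte strides match tightly packed C (row-major) order."""
    expected = element_size
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True

# Packed 480x640x3 U8 image
print(is_c_contiguous([480, 640, 3], [1920, 3, 1], 1))    # True

# Every other column of that image (a strided view) is not packed
print(is_c_contiguous([480, 320, 3], [1920, 6, 1], 1))    # False

# A first-dimension slice of a packed F32 batch stays contiguous
print(is_c_contiguous([2, 3, 224, 224], [602112, 200704, 896, 4], 4))  # True
```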
Python ML Pipeline
```python
import horus
import numpy as np
import torch

# Subscribe to camera frames
topic = horus.Topic("camera.rgb", horus.Tensor)

while True:
    handle = topic.recv_handle()
    if handle is not None:
        # Zero-copy to NumPy for processing
        frame = handle.to_numpy()
        print(f"Frame shape: {frame.shape}, dtype: {frame.dtype}")

        # Or pass directly to a PyTorch model
        tensor = handle.to_torch()
        # output = model(tensor.unsqueeze(0))
```
See Also
- TensorPool API -- Pool management, allocation, and configuration
- Tensor Messages -- Tensor descriptor, TensorDtype, Device reference
- Python Memory Types -- Python TensorHandle with NumPy/PyTorch/JAX zero-copy interop