# TensorPool API

HORUS provides efficient tensor memory management through shared memory pools, enabling zero-copy data sharing between processes.
## Overview

The TensorPool system consists of:

- `TensorPool` - CPU tensor allocation via shared memory
- `TensorHandle` - RAII wrapper for automatic memory management
- `Tensor` - Lightweight tensor descriptor (metadata only)
- `Device` - Pod-safe device location descriptor (`Device::cpu()`, `Device::cuda(N)`)
For most users, the high-level domain types (`Image`, `PointCloud`, `DepthImage`) are the recommended API — they handle pool management automatically. `TensorPool` and `TensorHandle` are internal types for advanced use cases.
## Recommended: Use Domain Types

Most users should use `Image`, `PointCloud`, or `DepthImage` instead of working with `TensorPool`/`TensorHandle` directly. These types handle pool management automatically:
```rust
use horus::prelude::*;

// Create an image — pool allocation is automatic
let mut img = Image::new(640, 480, ImageEncoding::Rgb8)?;
img.fill(&[255, 0, 0]); // Red

// Send via topic — zero-copy, only a lightweight descriptor travels
let topic: Topic<Image> = Topic::new("camera.rgb")?;
topic.send(&img);

// Receive
if let Some(img) = topic.recv() {
    let pixel = img.pixel(0, 0); // Direct pixel access
}

// Point clouds work the same way
let pc = PointCloud::new(10000, 3, TensorDtype::F32)?;
let depth = DepthImage::new(480, 640, TensorDtype::F32)?;
```
See Basic Examples — Camera Image Pipeline for complete send/recv examples.
## Auto-Managed Pools (Advanced)

Within the advanced API, the simplest entry point is `Topic<Tensor>`, which auto-manages a pool per topic:
```rust
use horus::prelude::*;

let topic: Topic<Tensor> = Topic::new("camera.rgb")?;

// Allocate from the topic's auto-managed pool
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;

// Write pixel data
let pixels = handle.data_slice_mut()?;
// ... fill pixels ...

// Send — only a lightweight descriptor is transmitted, not the data
topic.send_handle(&handle);

// Receiver side
let topic: Topic<Tensor> = Topic::new("camera.rgb")?;
if let Some(handle) = topic.recv_handle() {
    let data = handle.data_slice()?; // Zero-copy access
    println!("Shape: {:?}, Dtype: {:?}", handle.shape(), handle.dtype());
}
// Refcount decremented automatically on drop
```
The pool is created lazily on first use and shared across all `Topic<Tensor>` instances with the same name — even across processes.
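That lazy, name-keyed sharing is essentially a registry pattern. A minimal single-process sketch in plain Python (illustrative only — the function and pool structure here are invented, not HORUS API):

```python
# Minimal sketch of lazy, name-keyed pool sharing (illustrative only,
# not the HORUS implementation).
_pools = {}

def pool_for_topic(name):
    """Create the pool on first use; return the same object afterwards."""
    if name not in _pools:
        _pools[name] = {"name": name, "slots": []}  # stand-in for a real pool
    return _pools[name]

a = pool_for_topic("camera.rgb")
b = pool_for_topic("camera.rgb")
assert a is b  # same pool for the same topic name
```

In HORUS the registry additionally spans processes, because the pool lives in named shared memory rather than a process-local dict.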
## Manual CPU TensorPool

For advanced use cases requiring direct pool control.

### Creating a Pool
```rust
use horus::prelude::*;
use std::sync::Arc;

// Create with the default config (1024 slots, 1GB pool)
let pool = Arc::new(TensorPool::new(1, TensorPoolConfig::default())?);

// Or customize
let config = TensorPoolConfig {
    pool_size: 2 * 1024 * 1024 * 1024, // 2GB
    max_slots: 2048,
    slot_alignment: 64, // Cache-line aligned
};
let pool = Arc::new(TensorPool::new(1, config)?);

// Open an existing pool (for consumer processes)
let pool = Arc::new(TensorPool::open(1)?);

// Preset configs
let small_config = TensorPoolConfig::small(); // 64MB, 256 slots
let large_config = TensorPoolConfig::large(); // 4GB, 4096 slots
```
### Allocating Tensors

```rust
use horus::prelude::*;

// `pool` is the Arc<TensorPool> created above

// Allocate a 1080p RGB image
let handle = TensorHandle::alloc(
    pool.clone(),
    &[1080, 1920, 3],
    TensorDtype::U8,
    Device::cpu(),
)?;

// Access data
let data: &mut [u8] = handle.data_slice_mut()?;

// Get the tensor descriptor (for sending through a Topic)
let tensor: &Tensor = handle.tensor();

// Clone increases the refcount automatically
let handle2 = handle.clone();
// Refcount decremented on drop
```
### Supported Data Types

| Type | Rust | Size |
|---|---|---|
| `TensorDtype::F32` | `f32` | 4 bytes |
| `TensorDtype::F64` | `f64` | 8 bytes |
| `TensorDtype::F16` | `f16` | 2 bytes |
| `TensorDtype::BF16` | `bf16` | 2 bytes |
| `TensorDtype::I8` | `i8` | 1 byte |
| `TensorDtype::I16` | `i16` | 2 bytes |
| `TensorDtype::I32` | `i32` | 4 bytes |
| `TensorDtype::I64` | `i64` | 8 bytes |
| `TensorDtype::U8` | `u8` | 1 byte |
| `TensorDtype::U16` | `u16` | 2 bytes |
| `TensorDtype::U32` | `u32` | 4 bytes |
| `TensorDtype::U64` | `u64` | 8 bytes |
| `TensorDtype::Bool` | `bool` | 1 byte |
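A tensor's buffer size follows directly from this table: the product of the shape times the dtype size. A quick plain-Python illustration (independent of HORUS; the helper name is invented):

```python
# Byte size of a tensor buffer = product(shape) * dtype size.
# Sizes mirror the table above.
DTYPE_SIZES = {
    "f32": 4, "f64": 8, "f16": 2, "bf16": 2,
    "i8": 1, "i16": 2, "i32": 4, "i64": 8,
    "u8": 1, "u16": 2, "u32": 4, "u64": 8, "bool": 1,
}

def tensor_nbytes(shape, dtype):
    n = 1
    for dim in shape:
        n *= dim
    return n * DTYPE_SIZES[dtype]

# A 1080p RGB image of u8 pixels:
print(tensor_nbytes((1080, 1920, 3), "u8"))  # 6220800 bytes (~6 MB)
```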
### TensorPoolConfig

| Field | Type | Default | Description |
|---|---|---|---|
| `pool_size` | `usize` | 1GB | Total pool size in bytes |
| `max_slots` | `usize` | 1024 | Maximum concurrent tensors |
| `slot_alignment` | `usize` | 64 | Memory alignment in bytes |

Presets: `TensorPoolConfig::small()` (64MB, 256 slots), `TensorPoolConfig::large()` (4GB, 4096 slots).
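How `slot_alignment` interacts with allocation sizes can be shown with a small alignment helper (plain Python, and an assumption for illustration: each slot is rounded up to a multiple of the alignment — the exact accounting inside HORUS is internal):

```python
def align_up(size, alignment=64):
    """Round size up to the next multiple of the slot alignment."""
    return (size + alignment - 1) // alignment * alignment

def fits(size, pool_size, used_bytes):
    """Would an aligned allocation of `size` bytes still fit in the pool?"""
    return used_bytes + align_up(size) <= pool_size

print(align_up(100))  # 128 — a 100-byte tensor occupies two 64-byte units
```

Cache-line alignment (64 bytes) keeps concurrently accessed slots from sharing a cache line, at the cost of a little padding per slot.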
### Pool Statistics

```rust
let stats = pool.stats();
println!(
    "Pool: {}/{} slots used, {}/{} bytes",
    stats.allocated_slots, stats.max_slots,
    stats.used_bytes, stats.pool_size
);
```
## Device Descriptors

The `Device` type is a Pod-safe descriptor that tags tensors with a target device. `Device::cuda(N)` creates a descriptor — it does not allocate GPU memory. Actual GPU tensor pools are not yet implemented.

```rust
use horus::prelude::*;

// Device is metadata only — no GPU allocation happens here
let mut tensor = Tensor::default();
tensor.set_device(Device::cuda(0));
assert!(tensor.is_cuda());

// Stub detection functions (currently always return false/0)
println!("CUDA available: {}", cuda_available()); // false
println!("CUDA devices: {}", cuda_device_count()); // 0
```
## Tensor Descriptor

`Tensor` is a lightweight descriptor that acts as a handle to tensor data in shared memory. Only the descriptor is transmitted through topics — the actual data stays in place for zero-copy access.

```rust
// Key methods on Tensor:
let shape = tensor.shape(); // &[u64] - dimensions
let dtype = tensor.dtype(); // TensorDtype
let dev = tensor.device();  // Device descriptor
let size = tensor.size;     // Total size in bytes (pub field)
```
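The descriptor-travels, data-stays pattern can be sketched with Python's stdlib shared memory (a HORUS-independent analogy — `descriptor` here is just a dict, not the real `Tensor` layout):

```python
from multiprocessing import shared_memory
import struct

# Producer: put data in shared memory; only the small descriptor
# (name, shape, dtype) would travel over IPC.
shm = shared_memory.SharedMemory(create=True, size=12)
struct.pack_into("<3f", shm.buf, 0, 1.0, 2.0, 3.0)
descriptor = {"name": shm.name, "shape": (3,), "dtype": "f32"}

# Consumer: attach by name — the 12 data bytes are never copied.
view = shared_memory.SharedMemory(name=descriptor["name"])
values = struct.unpack_from("<3f", view.buf, 0)
print(values)  # (1.0, 2.0, 3.0)

view.close()
shm.close()
shm.unlink()
```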
## Python API

```python
import horus
import numpy as np

# Create a tensor pool
pool = horus.TensorPool(pool_id=1, size_mb=1024, max_slots=1024)

# Allocate a CPU tensor
handle = pool.alloc(shape=(1080, 1920, 3), dtype="float32")

# Use numpy arrays directly
arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# Convert to framework tensors as needed
import torch
torch_tensor = torch.from_numpy(arr)  # Zero-copy
```
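`torch.from_numpy` is zero-copy in the same sense that a stdlib `memoryview` is: it exposes an existing buffer rather than duplicating it. A dependency-free illustration of that idea:

```python
# Stdlib analogy for zero-copy views: a memoryview shares the
# underlying buffer instead of copying it.
buf = bytearray(4)
view = memoryview(buf)

view[0] = 255   # write through the view...
print(buf[0])   # 255 — ...and the original buffer sees it

assert view.obj is buf  # same underlying object, no copy was made
```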
## Performance

### CPU TensorPool

| Operation | Latency |
|---|---|
| Slot allocation | ~100ns |
| Cross-process access | Zero-copy shared memory |
## Best Practices

- Use auto-managed pools: Prefer `Topic<Tensor>` with `alloc_tensor()`/`send_handle()`/`recv_handle()` — the framework handles the pool lifecycle
- Wrap in Arc: `TensorPool` doesn't implement `Clone` — when using manual pools, share them via `Arc<TensorPool>`
- Reuse pools: Create pools once at startup, not per tensor
- Use TensorHandle: Prefer `TensorHandle` over manual `retain`/`release` for automatic refcounting
- Match dtypes: Ensure the sender and receiver use the same dtype
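The refcounting behavior behind the last two handle-related practices can be sketched in a few lines of Python (illustrative only — HORUS implements this in Rust via `Clone`/`Drop` on `TensorHandle`; the class and method names here are invented):

```python
# Sketch of handle-style refcounting on a pool slot.
class Slot:
    def __init__(self):
        self.refcount = 0

class Handle:
    def __init__(self, slot):
        self.slot = slot
        slot.refcount += 1         # acquiring a handle retains the slot
    def clone(self):
        return Handle(self.slot)   # clone increments the refcount
    def release(self):
        self.slot.refcount -= 1    # drop decrements it

slot = Slot()
h1 = Handle(slot)
h2 = h1.clone()
assert slot.refcount == 2
h2.release()
h1.release()
assert slot.refcount == 0  # slot can now be reused by the pool
```

With RAII handles, `release` happens automatically when the handle goes out of scope, which is why manual `retain`/`release` is discouraged.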
## See Also

- Tensor Messages - `Tensor`, `TensorDtype`, `Device` types and domain wrappers
- Topic & Shared Memory - How HORUS achieves zero-copy IPC
- Python Memory Types - Python `TensorHandle`, `Image`, `PointCloud`, `DepthImage`