TensorPool API

HORUS provides efficient tensor memory management through shared memory pools, enabling zero-copy data sharing between processes.

Overview

The TensorPool system consists of:

  • TensorPool - CPU tensor allocation via shared memory
  • TensorHandle - RAII wrapper for automatic memory management
  • Tensor - Lightweight tensor descriptor (metadata only)
  • Device - Pod-safe device location descriptor (Device::cpu(), Device::cuda(N))

For most users, the high-level domain types (Image, PointCloud, DepthImage) are the recommended API; TensorPool and TensorHandle are internal types for advanced use cases. The domain types handle pool management automatically:

use horus::prelude::*;

// Create an image — pool allocation is automatic
let mut img = Image::new(640, 480, ImageEncoding::Rgb8)?;
img.fill(&[255, 0, 0]); // Red

// Send via topic — zero-copy, only a lightweight descriptor travels
let topic: Topic<Image> = Topic::new("camera.rgb")?;
topic.send(&img);

// Receive
if let Some(img) = topic.recv() {
    let pixel = img.pixel(0, 0); // Direct pixel access
}

// Point clouds work the same way
let pc = PointCloud::new(10000, 3, TensorDtype::F32)?;
let depth = DepthImage::new(480, 640, TensorDtype::F32)?;

See Basic Examples — Camera Image Pipeline for complete send/recv examples.

Auto-Managed Pools (Advanced)

The simplest way to use tensor pools is through Topic<Tensor>, which auto-manages a pool per topic:

use horus::prelude::*;

let topic: Topic<Tensor> = Topic::new("camera.rgb")?;

// Allocate from the topic's auto-managed pool
let handle = topic.alloc_tensor(&[1080, 1920, 3], TensorDtype::U8, Device::cpu())?;

// Write pixel data
let pixels = handle.data_slice_mut()?;
// ... fill pixels ...

// Send — only a lightweight descriptor is transmitted, not the data
topic.send_handle(&handle);

// Receiver side (typically a separate process)
let topic: Topic<Tensor> = Topic::new("camera.rgb")?;

if let Some(handle) = topic.recv_handle() {
    let data = handle.data_slice()?;  // Zero-copy access
    println!("Shape: {:?}, Dtype: {:?}", handle.shape(), handle.dtype());
}
// Refcount decremented automatically on drop

The pool is created lazily on first use and shared across all Topic<Tensor> instances with the same name — even across processes.

Manual CPU TensorPool

Manual pools are intended for advanced use cases that require direct control over pool creation, sizing, and lifetime.

Creating a Pool

use horus::prelude::*;
use std::sync::Arc;

// Create with default config (1024 slots, 1GB pool)
let pool = Arc::new(TensorPool::new(1, TensorPoolConfig::default())?);

// Or customize
let config = TensorPoolConfig {
    pool_size: 2 * 1024 * 1024 * 1024, // 2GB
    max_slots: 2048,
    slot_alignment: 64,                 // Cache-line aligned
};
let pool = Arc::new(TensorPool::new(1, config)?);

// Open an existing pool (for consumer processes)
let pool = Arc::new(TensorPool::open(1)?);

// Preset configs
let small_config = TensorPoolConfig::small();  // 64MB, 256 slots
let large_config = TensorPoolConfig::large();  // 4GB, 4096 slots

Allocating Tensors

use horus::prelude::*;

// Allocate a 1080p RGB image
let handle = TensorHandle::alloc(
    pool.clone(),
    &[1080, 1920, 3],
    TensorDtype::U8,
    Device::cpu(),
)?;

// Access data
let data: &mut [u8] = handle.data_slice_mut()?;

// Get tensor descriptor (for sending through Topic)
let tensor: &Tensor = handle.tensor();

// Clone increases refcount automatically
let handle2 = handle.clone();
// Refcount decremented on drop

Supported Data Types

Type               Rust   Size
TensorDtype::F32   f32    4 bytes
TensorDtype::F64   f64    8 bytes
TensorDtype::F16   f16    2 bytes
TensorDtype::BF16  bf16   2 bytes
TensorDtype::I8    i8     1 byte
TensorDtype::I16   i16    2 bytes
TensorDtype::I32   i32    4 bytes
TensorDtype::I64   i64    8 bytes
TensorDtype::U8    u8     1 byte
TensorDtype::U16   u16    2 bytes
TensorDtype::U32   u32    4 bytes
TensorDtype::U64   u64    8 bytes
TensorDtype::Bool  bool   1 byte

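A tensor's byte footprint follows directly from this table: the product of its dimensions times the element size. A self-contained sketch of that arithmetic (the `TensorDtype` enum and `dtype_size` helper below are illustrative stand-ins, not the HORUS definitions):

```rust
// Illustrative stand-ins for the HORUS types; not the real definitions.
#[derive(Clone, Copy, Debug)]
enum TensorDtype {
    F32, F64, F16, BF16, I8, I16, I32, I64, U8, U16, U32, U64, Bool,
}

/// Element size in bytes, matching the table above.
fn dtype_size(dtype: TensorDtype) -> usize {
    use TensorDtype::*;
    match dtype {
        I8 | U8 | Bool => 1,
        F16 | BF16 | I16 | U16 => 2,
        F32 | I32 | U32 => 4,
        F64 | I64 | U64 => 8,
    }
}

/// Total bytes for a tensor: product of dimensions times element size.
fn tensor_bytes(shape: &[u64], dtype: TensorDtype) -> usize {
    shape.iter().product::<u64>() as usize * dtype_size(dtype)
}

fn main() {
    // A 1080p RGB U8 image: 1080 * 1920 * 3 * 1 = 6_220_800 bytes.
    assert_eq!(tensor_bytes(&[1080, 1920, 3], TensorDtype::U8), 6_220_800);
    println!("{}", tensor_bytes(&[1080, 1920, 3], TensorDtype::U8));
}
```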
TensorPoolConfig

Field           Type   Default  Description
pool_size       usize  1GB      Total pool size in bytes
max_slots       usize  1024     Maximum concurrent tensors
slot_alignment  usize  64       Memory alignment in bytes

Presets: TensorPoolConfig::small() (64MB, 256 slots), TensorPoolConfig::large() (4GB, 4096 slots).
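slot_alignment determines how each allocation is rounded to a slot boundary. Assuming the usual power-of-two round-up (the `align_up` helper is an illustrative sketch, not the HORUS implementation), a 64-byte-aligned pool would size slots like this:

```rust
/// Round `size` up to the next multiple of `align` (align must be a power
/// of two). Illustrative of how a pool with slot_alignment = 64 could size
/// its slots; not the actual HORUS implementation.
fn align_up(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

fn main() {
    // A 100-byte tensor in a 64-byte-aligned pool occupies a 128-byte slot.
    assert_eq!(align_up(100, 64), 128);
    // Sizes already on a boundary are unchanged.
    assert_eq!(align_up(128, 64), 128);
    println!("{}", align_up(100, 64));
}
```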

Pool Statistics

let stats = pool.stats();
println!("Pool: {}/{} slots used, {}/{} bytes",
    stats.allocated_slots, stats.max_slots,
    stats.used_bytes, stats.pool_size);

Device Descriptors

The Device type is a Pod-safe descriptor that tags tensors with a target device. Device::cuda(N) creates a descriptor — it does not allocate GPU memory. Actual GPU tensor pools are not yet implemented.

use horus::prelude::*;

// Device is metadata only — no GPU allocation happens here
let mut tensor = Tensor::default();
tensor.set_device(Device::cuda(0));
assert!(tensor.is_cuda());

// Stub detection functions (always return false/0 currently)
println!("CUDA available: {}", cuda_available());   // false
println!("CUDA devices: {}", cuda_device_count());  // 0
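"Pod-safe" means the descriptor is plain old data: a fixed layout with no pointers, so it can be copied byte-for-byte through shared memory. A hypothetical sketch of what such a descriptor can look like (field names, sizes, and layout here are assumptions for illustration, not the actual HORUS definition):

```rust
// Hypothetical Pod-safe device descriptor: #[repr(C)], Copy, and free of
// heap pointers, so it can be memcpy'd through shared memory. The field
// layout is an assumption, not the real HORUS type.
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Device {
    kind: u8,  // 0 = CPU, 1 = CUDA
    index: u8, // device ordinal for CUDA
}

impl Device {
    const fn cpu() -> Self { Device { kind: 0, index: 0 } }
    const fn cuda(index: u8) -> Self { Device { kind: 1, index } }
    fn is_cuda(&self) -> bool { self.kind == 1 }
}

fn main() {
    let dev = Device::cuda(0);
    assert!(dev.is_cuda());
    assert!(!Device::cpu().is_cuda());
    // Fixed size, no indirection: safe to transmit as raw bytes.
    assert_eq!(std::mem::size_of::<Device>(), 2);
    println!("{:?}", dev);
}
```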

Tensor Descriptor

Tensor is a lightweight descriptor that acts as a handle to tensor data in shared memory. Only the descriptor is transmitted through topics — the actual data stays in-place for zero-copy access.

// Key methods on Tensor:
let shape = tensor.shape();         // &[u64] - dimensions
let dtype = tensor.dtype();         // TensorDtype
let dev = tensor.device();          // Device descriptor
let size = tensor.size;             // Total size in bytes (pub field)

Python API

import horus

# Create tensor pool
pool = horus.TensorPool(pool_id=1, size_mb=1024, max_slots=1024)

# Allocate CPU tensor
handle = pool.alloc(shape=(1080, 1920, 3), dtype="float32")

# Use numpy arrays directly
import numpy as np
arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# Convert to framework tensors as needed
import torch
torch_tensor = torch.from_numpy(arr)  # Zero-copy

Performance

CPU TensorPool

Operation             Latency
Slot allocation       ~100ns
Cross-process access  Zero-copy (shared memory)

Best Practices

  1. Use auto-managed pools: Prefer Topic<Tensor> with alloc_tensor()/send_handle()/recv_handle() — the framework handles pool lifecycle
  2. Wrap in Arc: When using manual pools, TensorPool doesn't implement Clone — use Arc<TensorPool> for sharing
  3. Reuse pools: Create pools once at startup, not per-tensor
  4. Use TensorHandle: Prefer TensorHandle over manual retain/release for automatic refcounting
  5. Match dtypes: Ensure the sender and receiver use the same dtype
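Practice 2 can be sketched with a stand-in pool type (the `Pool` struct and `fan_out` helper are placeholders, not HORUS types): one Arc-wrapped pool is shared across worker threads, and only the refcount changes.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for a pool type that is not Clone (like TensorPool).
// Illustrative only; sharing goes through Arc instead of copying.
struct Pool { id: u32 }

/// Share one pool across `workers` threads via Arc; returns the pool id
/// each worker observed, showing they all used the same underlying pool.
fn fan_out(pool: Arc<Pool>, workers: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let pool = Arc::clone(&pool); // bumps the refcount, no pool copy
            thread::spawn(move || pool.id)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let pool = Arc::new(Pool { id: 1 });
    let ids = fan_out(Arc::clone(&pool), 4);
    assert_eq!(ids, vec![1, 1, 1, 1]);
    // All worker clones have been dropped; only main's Arc remains.
    assert_eq!(Arc::strong_count(&pool), 1);
    println!("{:?}", ids);
}
```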

See Also