nexus-scribe-models

ML model management, loading, and caching.

Overview

Centralized model management with:

  • Model discovery and validation
  • Checksum verification
  • Version extraction from filenames
  • LRU caching
  • Warm-up and benchmarking

Model Manager

use nexus_scribe_models::ModelManager;
use std::path::PathBuf;

let manager = ModelManager::new(
    PathBuf::from("/opt/NexusScribe/models"),
    hailo_device,  // Optional
)?;

Model Discovery

// Discover all models
let models = manager.discover_models().await?;

// With checksum computation
use nexus_scribe_models::DiscoveryOptions;

let options = DiscoveryOptions::new().with_checksums();
let models = manager.discover_models_with_options(&options).await?;

Supported formats: .hef, .onnx, .pt

Loading Models

// Load Whisper model
let whisper = manager.load_whisper_model("large-v3").await?;

// Load speaker embedding model
let speaker = manager.load_speaker_model().await?;

// Generic model by name
let info = manager.get_model_info("custom-model").await?;

Model Types

pub enum ModelType {
    Whisper,
    SpeakerEmbedding,
    VAD,
    LanguageDetection,
    Custom(String),
}

Auto-detected from model name:

  • whisper-* -> Whisper
  • speaker-*, diarization-* -> SpeakerEmbedding
  • vad-*, voice-activity-* -> VAD
  • lang-*, lid-* -> LanguageDetection
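The prefix rules above can be pictured with a small matcher. This is a hypothetical sketch of the detection logic, not the crate's actual implementation; the `detect_model_type` helper and the derives on `ModelType` are assumptions for illustration:

```rust
// Sketch of the prefix-based auto-detection rules above.
// Hypothetical helper; the crate's real logic may differ.

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelType {
    Whisper,
    SpeakerEmbedding,
    VAD,
    LanguageDetection,
    Custom(String),
}

fn detect_model_type(name: &str) -> ModelType {
    let n = name.to_ascii_lowercase();
    if n.starts_with("whisper-") {
        ModelType::Whisper
    } else if n.starts_with("speaker-") || n.starts_with("diarization-") {
        ModelType::SpeakerEmbedding
    } else if n.starts_with("vad-") || n.starts_with("voice-activity-") {
        ModelType::VAD
    } else if n.starts_with("lang-") || n.starts_with("lid-") {
        ModelType::LanguageDetection
    } else {
        // Anything unrecognized falls back to Custom.
        ModelType::Custom(name.to_string())
    }
}

fn main() {
    assert_eq!(detect_model_type("whisper-large-v3"), ModelType::Whisper);
    assert_eq!(detect_model_type("vad-silero"), ModelType::VAD);
}
```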

Model Info

pub struct ModelInfo {
    pub name: String,
    pub version: String,
    pub model_type: ModelType,
    pub path: PathBuf,
    pub size_bytes: u64,
    pub checksum: Option<String>,
    pub metadata: HashMap<String, String>,
}

Version Extraction

Extracted automatically from filenames:

Filename              Name           Version
whisper-large-v3.hef  whisper-large  3.0.0
speaker-v2.1.onnx     speaker        2.1.0
model_v1.0.0.pt       model          1.0.0
vad-silero.hef        vad-silero     1.0.0
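The table's behavior can be sketched as a small parser: strip the extension, look for a trailing `vX[.Y[.Z]]` segment, pad it to `major.minor.patch`, and default to `1.0.0` when no version is present. The `split_name_version` helper below is a hypothetical illustration, not the crate's parser:

```rust
// Hypothetical sketch of the filename parsing shown in the table above;
// the crate's real parser may handle more cases.

/// Split "whisper-large-v3.hef" into ("whisper-large", "3.0.0").
/// Filenames without a version suffix default to "1.0.0".
fn split_name_version(filename: &str) -> (String, String) {
    // Strip the extension (.hef / .onnx / .pt).
    let stem = filename.rsplit_once('.').map(|(s, _)| s).unwrap_or(filename);

    // Look for a trailing "-vX[.Y[.Z]]" or "_vX[.Y[.Z]]" segment.
    if let Some(idx) = stem.rfind(|c| c == '-' || c == '_') {
        let (name, tail) = (&stem[..idx], &stem[idx + 1..]);
        if let Some(digits) = tail.strip_prefix('v') {
            if digits.chars().next().map_or(false, |c| c.is_ascii_digit()) {
                // Pad to major.minor.patch.
                let mut parts: Vec<&str> = digits.split('.').collect();
                while parts.len() < 3 {
                    parts.push("0");
                }
                return (name.to_string(), parts.join("."));
            }
        }
    }
    (stem.to_string(), "1.0.0".to_string())
}

fn main() {
    assert_eq!(
        split_name_version("whisper-large-v3.hef"),
        ("whisper-large".to_string(), "3.0.0".to_string())
    );
    assert_eq!(
        split_name_version("vad-silero.hef"),
        ("vad-silero".to_string(), "1.0.0".to_string())
    );
}
```

Note that the extension is stripped at the last `.`, so a stem like `model_v1.0.0` keeps its full dotted version intact.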

Checksum Verification

// Validate model integrity
let valid = manager.validate_model("whisper-large-v3").await?;

// Or manually
if let Some(info) = manager.get_model_info("model-name").await? {
    if info.checksum.is_some() {
        let valid = info.verify_checksum()?;
    }
}

Caching

LRU cache for loaded models:

// Check cache stats
let stats = manager.cache_stats();
println!("Cached: {}/{}", stats.cached_models, stats.max_cache_size);

// Clear cache
let unload_stats = manager.clear_cache().await?;
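Conceptually, an LRU cache evicts the least recently used model when a new one is loaded into a full cache. A minimal std-only illustration of that eviction policy (not the crate's implementation, which also tracks sizes and NPU resources):

```rust
use std::collections::VecDeque;

// Minimal LRU eviction demo using only std.
struct LruCache {
    capacity: usize,
    // Front = most recently used, back = least recently used.
    entries: VecDeque<String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Touch a model name; returns the evicted name, if any.
    fn touch(&mut self, name: &str) -> Option<String> {
        if let Some(pos) = self.entries.iter().position(|n| n == name) {
            // Already cached: move to the front (most recently used).
            let n = self.entries.remove(pos).unwrap();
            self.entries.push_front(n);
            return None;
        }
        self.entries.push_front(name.to_string());
        if self.entries.len() > self.capacity {
            self.entries.pop_back() // Evict least recently used.
        } else {
            None
        }
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.touch("whisper-large-v3");
    cache.touch("vad-silero");
    cache.touch("whisper-large-v3"); // Refresh recency.
    // Inserting a third model evicts the least recently used one.
    assert_eq!(cache.touch("speaker-v2.1"), Some("vad-silero".to_string()));
}
```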

Unloading Models

// Unload specific model
let stats = manager.unload_model(&loaded_model).await?;
println!("Released {} bytes", stats.model_size_bytes);

// Unload by name
if let Some(stats) = manager.unload_model_by_name("whisper-large-v3").await? {
    println!("NPU released: {}", stats.npu_resources_released);
}

Unload Stats

pub struct UnloadStats {
    pub model_name: String,
    pub model_size_bytes: u64,
    pub removed_from_cache: bool,
    pub npu_resources_released: bool,
    pub duration_ms: u64,
}

Warm-up

Pre-load models for faster first inference:

// Warm up default models
manager.warmup_models().await?;

LoadedModel

pub struct LoadedModel {
    pub info: ModelInfo,
    pub hailo_model: Option<HailoModel>,
}

Registry

use nexus_scribe_models::ModelRegistry;

let mut registry = ModelRegistry::new();

// Register model
registry.register(model_info)?;

// Find by name
let info = registry.find_by_name("whisper-large-v3");

// List all
let all = registry.list_all();

Discovery Options

// Compute SHA256 checksums during discovery
let options = DiscoveryOptions::new().with_checksums();

// Skip checksum computation
let options = DiscoveryOptions::new().without_checksums();

Usage

[dependencies]
nexus-scribe-models = { path = "../nexus-scribe-models" }

Model Directory Structure

/opt/NexusScribe/models/
├── whisper/
│   ├── ggml-tiny.bin
│   ├── ggml-base.bin
│   └── ggml-large-v3.bin
├── speaker/
│   └── speaker-embedding-v2.onnx
└── vad/
    └── silero-vad-v4.onnx
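Discovery over a layout like this amounts to a recursive walk that keeps files with a supported extension. The sketch below uses only std and a caller-supplied extension list; it is an assumption for illustration, while the crate's actual `discover_models()` is async and also extracts names, versions, types, and (optionally) checksums:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical sketch of recursive model-file discovery;
// not the crate's actual discovery code.
fn find_model_files(dir: &Path, exts: &[&str], out: &mut Vec<PathBuf>) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Recurse into subdirectories (whisper/, speaker/, vad/, ...).
            find_model_files(&path, exts, out)?;
        } else if path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| exts.contains(&e))
        {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Demonstrate against a throwaway directory tree.
    let root = std::env::temp_dir().join("nexus-scribe-models-demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("vad"))?;
    fs::write(root.join("vad/silero-vad-v4.onnx"), b"")?;
    fs::write(root.join("README.txt"), b"not a model")?;

    let mut found = Vec::new();
    find_model_files(&root, &["hef", "onnx", "pt"], &mut found)?;
    assert_eq!(found.len(), 1); // Only the .onnx file matches.
    fs::remove_dir_all(&root)?;
    Ok(())
}
```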