nexus-scribe-transcription

Speech recognition and transcription engine using Whisper models.

Overview

Provides two transcription approaches:

  1. SimpleTranscriptionEngine (Recommended) - Uses whisper-rs directly
  2. TranscriptionEngine - Legacy engine with optional NPU support

Quick Start
use nexus_scribe_transcription::SimpleTranscriptionEngine;

let engine = SimpleTranscriptionEngine::new(
    "/opt/NexusScribe/models/whisper/ggml-base.bin",
    Some("en".to_string()),
)?;

// Transcribe 16kHz mono f32 audio
let result = engine.transcribe(&audio_samples).await?;

println!("Text: {}", result.text);
println!("Confidence: {:.2}", result.confidence);
println!("Duration: {}ms", result.duration_ms);
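The `transcribe()` call expects 16 kHz mono f32 samples; `transcribe_i16` (shown later) accepts i16 PCM at other rates. For illustration only — this is not the crate's implementation — the conversion can be pictured as a normalization step plus a naive linear-interpolation resample:

```rust
/// Convert i16 PCM samples to normalized f32 in [-1.0, 1.0].
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Naive linear-interpolation resampler (sketch only; prefer a
/// windowed-sinc resampler for production audio quality).
fn resample(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    let ratio = f64::from(from_hz) / f64::from(to_hz);
    let out_len = (input.len() as f64 / ratio) as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```

In practice, prefer the engine's own i16 entry point over hand-rolled conversion.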

Configuration

use nexus_scribe_transcription::{TranscriptionConfig, TranscriptionTask};

let config = TranscriptionConfig {
    model_version: "large-v3".to_string(),
    language: "en".to_string(),
    task: TranscriptionTask::Transcribe,
    beam_size: 5,                       // Beam search width; larger = slower, more accurate
    temperature: 0.0,                   // 0.0 = deterministic (greedy) decoding
    compression_ratio_threshold: 2.4,   // Reject segments whose text compresses above this ratio (repetition guard)
    log_prob_threshold: -1.0,           // Reject segments with mean log-probability below this
    no_speech_threshold: 0.6,           // Treat segment as silence above this no-speech probability
};

SimpleTranscriptionEngine

Works on any platform without special hardware:

use nexus_scribe_transcription::SimpleTranscriptionEngine;

// Basic transcription
let engine = SimpleTranscriptionEngine::new(model_path, language)?;

// With translation to English
let engine = SimpleTranscriptionEngine::with_translation(
    model_path,
    Some("es".to_string()),  // Source language
)?;

// Transcribe f32 samples
let result = engine.transcribe(&audio_f32).await?;

// Transcribe i16 samples (common from audio capture)
let result = engine.transcribe_i16(&audio_i16, 44100).await?;

TranscriptionResult

pub struct TranscriptionResult {
    pub text: String,
    pub language: String,
    pub confidence: f32,
    pub word_timings: Vec<WordTiming>,
    pub duration_ms: u64,
}

pub struct WordTiming {
    pub word: String,
    pub start_ms: i64,
    pub end_ms: i64,
    pub confidence: f32,
}
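The per-word timings lend themselves to caption generation. As a minimal sketch (the helper name is illustrative, not part of the crate), `start_ms`/`end_ms` can be rendered as SRT-style timestamps:

```rust
/// Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
fn srt_timestamp(ms: i64) -> String {
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, millis) = (rem / 1_000, rem % 1_000);
    format!("{h:02}:{m:02}:{s:02},{millis:03}")
}
```

Paired with `word_timings`, this yields caption cues such as `srt_timestamp(w.start_ms)` + " --> " + `srt_timestamp(w.end_ms)`.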

Model Sizes

Model      Size    Speed     Accuracy
tiny       75MB    Fastest   Lower
base       142MB   Fast      Good
small      466MB   Medium    Better
medium     1.5GB   Slow      High
large-v3   3GB     Slowest   Best

Download models:

# Download using helper binary
cargo run --bin download_model -- base

NPU Support (Optional)

For Hailo NPU acceleration:

use std::sync::Arc;

use nexus_scribe_transcription::TranscriptionEngine;
use nexus_scribe_models::ModelManager;
use nexus_scribe_hailo::{HailoConfig, HailoDevice};

let hailo = Arc::new(HailoDevice::new(HailoConfig::default())?);
let model_manager = ModelManager::new(models_path, Some(hailo.clone()))?;

let engine = TranscriptionEngine::new(
    config,
    &model_manager,
    Some(hailo),
).await?;

// Check availability
if engine.is_npu_available() {
    println!("Using Hailo NPU acceleration");
} else if engine.is_cpu_fallback_available() {
    println!("Using CPU fallback (whisper-rs)");
}

CPU-Only Engine

let engine = TranscriptionEngine::new_cpu_only(
    config,
    "/opt/NexusScribe/models/whisper/ggml-base.bin",
)?;

Whisper Module

Direct access to whisper-rs:

use nexus_scribe_transcription::{WhisperTranscriber, WhisperConfig};

let config = WhisperConfig {
    model_path: "/path/to/model.bin".to_string(),
    language: Some("en".to_string()),
    translate: false,
    n_threads: 0,  // Auto-detect
    use_gpu: true,  // Use CUDA if available
};

let transcriber = WhisperTranscriber::new(config)?;
let result = transcriber.transcribe(&audio).await?;

Post-Processing

Text cleanup and normalization:

use nexus_scribe_transcription::TextPostprocessor;

let processor = TextPostprocessor::new();
let cleaned = processor.process(&raw_text)?;

Features:

  • Punctuation normalization
  • Sentence casing
  • Filler word removal (optional)
  • Profanity filtering (optional)
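
For intuition, filler word removal can be pictured as a simple token filter. This sketch is illustrative only and is not `TextPostprocessor`'s actual logic:

```rust
/// Illustrative filler-word filter (not the crate's implementation).
fn remove_fillers(text: &str) -> String {
    const FILLERS: [&str; 4] = ["um", "uh", "erm", "hmm"];
    text.split_whitespace()
        .filter(|word| !FILLERS.contains(&word.to_lowercase().as_str()))
        .collect::<Vec<_>>()
        .join(" ")
}
```

The real postprocessor additionally handles punctuation and casing, so a bare word filter like this is only a mental model.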

Feature Flags

[dependencies]
# Enable the "hailo" feature for NPU support
nexus-scribe-transcription = { path = "../nexus-scribe-transcription", features = ["hailo"] }

Performance

Target latency: <200ms for real-time transcription on Raspberry Pi 5.

Tips:

  • Use base model for real-time
  • Use large-v3 for post-processing
  • Enable VAD to skip silence
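
To check the latency target on your own hardware, a small timing helper (name is illustrative) can wrap any blocking call:

```rust
use std::time::Instant;

/// Run a closure and return its result plus elapsed wall-clock milliseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed().as_millis())
}
```

For the async engine API, it is simpler to take `Instant::now()` before the `.await` and read `elapsed()` after it.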