# nexus-scribe-transcription

Speech recognition and transcription engine using Whisper models.
## Overview

Provides two transcription approaches:

- `SimpleTranscriptionEngine` (Recommended) - Uses whisper-rs directly
- `TranscriptionEngine` - Legacy engine with optional NPU support
## Quick Start (Recommended)

```rust
use nexus_scribe_transcription::SimpleTranscriptionEngine;

let engine = SimpleTranscriptionEngine::new(
    "/opt/NexusScribe/models/whisper/ggml-base.bin",
    Some("en".to_string()),
)?;

// Transcribe 16kHz mono f32 audio
let result = engine.transcribe(&audio_samples).await?;

println!("Text: {}", result.text);
println!("Confidence: {:.2}", result.confidence);
println!("Duration: {}ms", result.duration_ms);
```
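Capture pipelines often deliver signed 16-bit PCM rather than the `f32` samples `transcribe` expects. A minimal sketch of the normalization, assuming the usual symmetric `[-1.0, 1.0]` convention (the helper name is illustrative, not part of the crate):

```rust
/// Convert signed 16-bit PCM to f32 in [-1.0, 1.0].
/// Illustrative helper; see also `transcribe_i16` below, which accepts i16 directly.
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}
```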
## Configuration

```rust
use nexus_scribe_transcription::{TranscriptionConfig, TranscriptionTask};

let config = TranscriptionConfig {
    model_version: "large-v3".to_string(),
    language: "en".to_string(),
    task: TranscriptionTask::Transcribe,
    beam_size: 5,
    temperature: 0.0,
    compression_ratio_threshold: 2.4,
    log_prob_threshold: -1.0,
    no_speech_threshold: 0.6,
};
```
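The three threshold fields gate per-segment quality during decoding. A rough sketch of the typical Whisper-style checks, using the default values shown above (the real gating lives inside the engine; the struct and function here are illustrative):

```rust
/// Decoding statistics for one segment (illustrative, not a crate type).
struct SegmentStats {
    compression_ratio: f32, // gzip ratio of the text; high = repetitive output
    avg_log_prob: f32,      // mean token log-probability
    no_speech_prob: f32,    // model's estimate that the segment is silence
}

/// Apply the thresholds from `TranscriptionConfig` to one segment.
fn keep_segment(s: &SegmentStats) -> bool {
    // Reject degenerate, repetitive decodes (compression_ratio_threshold).
    if s.compression_ratio > 2.4 {
        return false;
    }
    // Reject low-confidence decodes (log_prob_threshold).
    if s.avg_log_prob < -1.0 {
        return false;
    }
    // Skip segments the model considers silence (no_speech_threshold).
    if s.no_speech_prob > 0.6 {
        return false;
    }
    true
}
```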
## SimpleTranscriptionEngine

Works on any platform without special hardware:

```rust
use nexus_scribe_transcription::SimpleTranscriptionEngine;

// Basic transcription
let engine = SimpleTranscriptionEngine::new(model_path, language)?;

// With translation to English
let engine = SimpleTranscriptionEngine::with_translation(
    model_path,
    Some("es".to_string()), // Source language
)?;

// Transcribe f32 samples
let result = engine.transcribe(&audio_f32).await?;

// Transcribe i16 samples (common from audio capture)
let result = engine.transcribe_i16(&audio_i16, 44100).await?;
```
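Whisper models consume 16 kHz mono audio, so a 44.1 kHz capture has to be resampled somewhere along the way. A naive linear-interpolation sketch of that step, for illustration only (passing the source rate to `transcribe_i16` lets the engine handle this for you; a production pipeline would use a proper windowed-sinc resampler):

```rust
/// Naive linear-interpolation resampler from `src_rate` to 16 kHz mono.
/// Illustrative only; do not use where audio quality matters.
fn resample_to_16k(input: &[f32], src_rate: u32) -> Vec<f32> {
    let ratio = src_rate as f64 / 16_000.0;
    let out_len = (input.len() as f64 / ratio) as usize;
    (0..out_len)
        .map(|i| {
            // Fractional position in the source signal.
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac // linear interpolation between neighbors
        })
        .collect()
}
```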
## TranscriptionResult

```rust
pub struct TranscriptionResult {
    pub text: String,
    pub language: String,
    pub confidence: f32,
    pub word_timings: Vec<WordTiming>,
    pub duration_ms: u64,
}

pub struct WordTiming {
    pub word: String,
    pub start_ms: i64,
    pub end_ms: i64,
    pub confidence: f32,
}
```
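Per-word confidences make it easy to flag uncertain spans for human review. A small sketch against a local mirror of `WordTiming` (the filter function is illustrative, not a crate API):

```rust
// Local mirror of the crate's `WordTiming` so the example stands alone.
#[allow(dead_code)]
struct WordTiming {
    word: String,
    start_ms: i64,
    end_ms: i64,
    confidence: f32,
}

/// Return the words whose confidence falls below `floor` (illustrative helper).
fn low_confidence_words(timings: &[WordTiming], floor: f32) -> Vec<&str> {
    timings
        .iter()
        .filter(|w| w.confidence < floor)
        .map(|w| w.word.as_str())
        .collect()
}
```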
## Model Sizes

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | 75MB | Fastest | Lower |
| base | 142MB | Fast | Good |
| small | 466MB | Medium | Better |
| medium | 1.5GB | Slow | High |
| large-v3 | 3GB | Slowest | Best |
Download models:

```shell
# Download using helper binary
cargo run --bin download_model -- base
```
## NPU Support (Optional)

For Hailo NPU acceleration:

```rust
use std::sync::Arc;

use nexus_scribe_transcription::TranscriptionEngine;
use nexus_scribe_models::ModelManager;
use nexus_scribe_hailo::{HailoConfig, HailoDevice};

// Wrap the device in an Arc once so it can be shared with both consumers.
let hailo = Arc::new(HailoDevice::new(HailoConfig::default())?);

let model_manager = ModelManager::new(models_path, Some(hailo.clone()))?;

let engine = TranscriptionEngine::new(
    config,
    &model_manager,
    Some(hailo),
).await?;

// Check availability
if engine.is_npu_available() {
    println!("Using Hailo NPU acceleration");
} else if engine.is_cpu_fallback_available() {
    println!("Using CPU fallback (whisper-rs)");
}
```
## CPU-Only Engine

```rust
let engine = TranscriptionEngine::new_cpu_only(
    config,
    "/opt/NexusScribe/models/whisper/ggml-base.bin",
)?;
```
## Whisper Module

Direct access to whisper-rs:

```rust
use nexus_scribe_transcription::{WhisperTranscriber, WhisperConfig};

let config = WhisperConfig {
    model_path: "/path/to/model.bin".to_string(),
    language: Some("en".to_string()),
    translate: false,
    n_threads: 0, // Auto-detect
    use_gpu: true, // Use CUDA if available
};

let transcriber = WhisperTranscriber::new(config)?;
let result = transcriber.transcribe(&audio).await?;
```
## Post-Processing

Text cleanup and normalization:

```rust
use nexus_scribe_transcription::TextPostprocessor;

let processor = TextPostprocessor::new();
let cleaned = processor.process(&raw_text)?;
```
Features:
- Punctuation normalization
- Sentence casing
- Filler word removal (optional)
- Profanity filtering (optional)
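As one concrete example of the cleanups listed above, filler-word removal can be sketched as a simple token filter. This is illustrative only; the crate's `TextPostprocessor` applies its own logic internally, and the filler list here is an assumption:

```rust
/// Remove common spoken fillers from a transcript (illustrative sketch;
/// the filler list is an assumption, not the crate's actual list).
fn strip_fillers(text: &str) -> String {
    const FILLERS: [&str; 3] = ["um", "uh", "hmm"];
    text.split_whitespace()
        .filter(|w| !FILLERS.contains(&w.to_lowercase().as_str()))
        .collect::<Vec<_>>()
        .join(" ")
}
```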
## Feature Flags

```toml
[dependencies]
# TOML inline tables must stay on one line; "hailo" enables NPU support.
nexus-scribe-transcription = { path = "../nexus-scribe-transcription", features = ["hailo"] }
```
## Performance

Target latency: <200ms for real-time transcription on Raspberry Pi 5.

Tips:

- Use the `base` model for real-time transcription
- Use `large-v3` for offline post-processing
- Enable VAD to skip silence
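The simplest form of VAD is an energy gate: only hand frames to the transcriber when their RMS energy crosses a threshold. A minimal sketch, for illustration (a production setup would use a trained VAD such as WebRTC's or Silero for robustness to noise):

```rust
/// Energy-based voice activity check: a frame counts as speech when its
/// RMS energy exceeds `threshold`. Illustrative sketch only.
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    if frame.is_empty() {
        return false;
    }
    // Mean of squared samples, then square root = RMS energy.
    let mean_sq: f32 = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
    mean_sq.sqrt() > threshold
}
```

Skipping silent frames this way reduces both latency and wasted model invocations on quiet input.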