nexus-scribe-speaker
Speaker diarization and identification engine.
Overview
Real-time speaker identification with:
- Neural speaker embedding extraction (d-vector/x-vector)
- Voice profile enrollment and management
- Speaker change detection (<50ms latency)
- Clustering for unknown speakers
- Multi-speaker conversation tracking
Configuration
use nexus_scribe_speaker::SpeakerConfig;
let config = SpeakerConfig {
    embedding_dim: 256,
    similarity_threshold: 0.75,
    change_detection_threshold: 0.65,
    min_segment_duration_ms: 500,
    clustering_threshold: 0.70,
};
Speaker Engine
use nexus_scribe_speaker::{SpeakerEngine, SpeakerConfig};
let engine = SpeakerEngine::new(config, db).await?;
Speaker Enrollment
Register known speakers:
// Enroll new speaker with audio sample
let speaker_id = engine.enroll_speaker(&audio_samples, "John Doe").await?;
// Update profile with additional samples
engine.update_speaker_profile(speaker_id, &more_samples).await?;
// List all enrolled speakers
let speakers = engine.list_speakers().await?;
// Delete speaker profile
engine.delete_speaker(speaker_id).await?;
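How update_speaker_profile combines new samples with an existing profile is not described here. One plausible approach, given that SpeakerProfile stores both an embedding and an enrollment_samples count, is a running mean over embeddings. The helper below is a minimal sketch under that assumption; merge_embedding is a hypothetical name and not part of the crate's API.

```rust
/// Fold one new embedding into a running mean built from `count` earlier samples.
/// Hypothetical helper for illustration; the crate may update profiles differently.
fn merge_embedding(mean: &[f32], count: usize, new: &[f32]) -> Vec<f32> {
    assert_eq!(mean.len(), new.len(), "embedding dimensions must match");
    let n = count as f32;
    mean.iter()
        .zip(new)
        // Weighted update: (old_mean * n + new_value) / (n + 1)
        .map(|(m, x)| (m * n + x) / (n + 1.0))
        .collect()
}
```

With this scheme each enrollment sample carries equal weight, so a single noisy sample has less influence as the profile accumulates more audio.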
Real-time Processing
Process audio chunks during meetings:
let segment = engine.process_chunk(&audio, timestamp_ms).await?;
println!("Speaker: {:?}", segment.speaker_name);
println!("Confidence: {:.2}", segment.confidence);
Speaker Identification
Identify speaker from audio:
if let Some(speaker_id) = engine.identify_speaker(&audio).await? {
    println!("Identified speaker: {}", speaker_id);
} else {
    println!("Unknown speaker");
}
Data Types
SpeakerProfile
pub struct SpeakerProfile {
    pub id: Uuid,
    pub name: String,
    pub embedding: Vec<f32>,
    pub enrollment_samples: usize,
    pub created_at: i64,
    pub updated_at: i64,
}
SpeakerSegment
pub struct SpeakerSegment {
    pub speaker_id: Option<Uuid>,
    pub speaker_name: Option<String>,
    pub start_ms: u64,
    pub end_ms: u64,
    pub confidence: f32,
}
Modules
Embedding Extraction (embedding.rs)
Extract speaker embeddings from audio:
use nexus_scribe_speaker::EmbeddingExtractor;
let extractor = EmbeddingExtractor::new(256)?;
let embedding = extractor.extract(&audio)?;
Features:
- MFCC feature extraction
- Neural network embedding
- Normalization
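The normalization step is not specified further. A common choice for speaker embeddings is L2 normalization, which scales each vector to unit length so that cosine similarity reduces to a dot product. The sketch below assumes that choice; l2_normalize is a hypothetical helper, not a function exported by this crate.

```rust
/// Scale a vector to unit L2 norm. A zero vector is returned unchanged
/// to avoid dividing by zero. Hypothetical helper for illustration.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}
```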
Profile Management (profile.rs)
Database-backed speaker profiles:
use nexus_scribe_speaker::ProfileManager;
let manager = ProfileManager::new(db)?;
// Identify speaker by embedding
let (speaker_id, confidence) = manager.identify(
    &embedding,
    similarity_threshold,
).await?;
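Threshold-based identification can be pictured as a nearest-profile search: score the query embedding against every stored profile and accept the best score only if it reaches similarity_threshold. The sketch below assumes cosine similarity as the score and an in-memory slice of (name, embedding) pairs in place of the database; best_match and cosine_similarity are hypothetical names.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the name and score of the most similar profile, or None when no
/// profile reaches `threshold`. Hypothetical stand-in for a database lookup.
fn best_match<'a>(
    query: &[f32],
    profiles: &'a [(String, Vec<f32>)],
    threshold: f32,
) -> Option<(&'a str, f32)> {
    profiles
        .iter()
        .map(|(name, emb)| (name.as_str(), cosine_similarity(query, emb)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Raising the threshold trades missed identifications for fewer false matches; the 0.75 default in SpeakerConfig sits between the two extremes.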
Clustering (clustering.rs)
Group unknown speakers:
use nexus_scribe_speaker::SpeakerClusterer;
let clusterer = SpeakerClusterer::new(0.70);
let cluster_id = clusterer.add_embedding(&embedding)?;
let clusters = clusterer.get_clusters();
Uses agglomerative clustering with cosine similarity.
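To illustrate the idea, the sketch below runs batch agglomerative clustering over a set of embeddings: every embedding starts as its own cluster, and the two most similar clusters are merged repeatedly until no pair reaches the threshold. Average linkage (mean pairwise cosine similarity) is an assumption here, as is the batch formulation; the crate's SpeakerClusterer adds embeddings incrementally and may use a different linkage. All function names are hypothetical.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Average linkage: mean pairwise cosine similarity between two clusters,
/// each given as a list of indices into `points`.
fn linkage(points: &[Vec<f32>], a: &[usize], b: &[usize]) -> f32 {
    let mut total = 0.0;
    for &i in a {
        for &j in b {
            total += cosine_similarity(&points[i], &points[j]);
        }
    }
    total / (a.len() * b.len()) as f32
}

/// Cluster `points`, returning each cluster as a list of indices.
fn agglomerative_cluster(points: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    // Start with one singleton cluster per embedding.
    let mut clusters: Vec<Vec<usize>> = (0..points.len()).map(|i| vec![i]).collect();
    loop {
        // Find the most similar pair of clusters.
        let mut best: Option<(usize, usize, f32)> = None;
        for i in 0..clusters.len() {
            for j in (i + 1)..clusters.len() {
                let s = linkage(points, &clusters[i], &clusters[j]);
                if best.map_or(true, |(_, _, b)| s > b) {
                    best = Some((i, j, s));
                }
            }
        }
        match best {
            // Merge while the best pair is still similar enough.
            Some((i, j, s)) if s >= threshold => {
                let merged = clusters.remove(j); // j > i, so index i stays valid
                clusters[i].extend(merged);
            }
            // No pair left, or the best pair is below the threshold.
            _ => return clusters,
        }
    }
}
```

The number of clusters is not fixed in advance; it falls out of the threshold, which suits meetings where the number of unknown speakers is not known up front.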
Change Detection (detector.rs)
Detect speaker transitions:
use nexus_scribe_speaker::ChangeDetector;
let detector = ChangeDetector::new(0.65);
if detector.detect_change(&current_embedding)? {
    // Speaker changed
}
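The detection rule itself is not documented. A simple way to think about it is to compare each new embedding with the previous one and report a change when their similarity drops below the threshold. The sketch below assumes cosine similarity and a one-embedding history; SimpleChangeDetector is a hypothetical type and does not mirror the crate's ChangeDetector signature.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical detector that remembers only the previous embedding.
struct SimpleChangeDetector {
    threshold: f32,
    previous: Option<Vec<f32>>,
}

impl SimpleChangeDetector {
    fn new(threshold: f32) -> Self {
        Self { threshold, previous: None }
    }

    /// Returns true when the similarity to the previous embedding falls
    /// below the threshold. The first embedding never counts as a change.
    fn detect_change(&mut self, current: &[f32]) -> bool {
        let changed = match &self.previous {
            Some(prev) => cosine_similarity(prev, current) < self.threshold,
            None => false,
        };
        self.previous = Some(current.to_vec());
        changed
    }
}
```

A production detector would likely smooth over several chunks and honor min_segment_duration_ms so that a single noisy chunk does not split a segment.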
NPU Acceleration
With the hailo Cargo feature enabled, the engine can run on a Hailo NPU device:
use std::sync::Arc;
#[cfg(feature = "hailo")]
let engine = SpeakerEngine::new_with_hailo(
    config,
    db,
    Some(Arc::new(hailo_device)),
).await?;
Usage
[dependencies]
nexus-scribe-speaker = { path = "../nexus-scribe-speaker" }
Performance
- Embedding extraction: ~20ms per chunk
- Speaker identification: ~5ms
- Change detection: ~2ms
Total pipeline latency target: <50ms
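As a quick check of the budget, the sketch below adds up the three stage figures listed above, assuming the stages run one after another for each chunk (an assumption; the pipeline may overlap them). The constant and function names are hypothetical.

```rust
// Approximate per-stage latencies from the list above, in milliseconds.
const EMBEDDING_MS: u32 = 20;
const IDENTIFICATION_MS: u32 = 5;
const CHANGE_DETECTION_MS: u32 = 2;
// Overall pipeline target, in milliseconds.
const TARGET_MS: u32 = 50;

/// Sum of the stage latencies, assuming strictly sequential execution.
fn pipeline_latency_ms() -> u32 {
    EMBEDDING_MS + IDENTIFICATION_MS + CHANGE_DETECTION_MS
}

/// Budget left for audio I/O, database access, and scheduling overhead.
fn headroom_ms() -> u32 {
    TARGET_MS - pipeline_latency_ms()
}
```

Under that assumption the listed stages account for about 27ms, leaving roughly 23ms of the 50ms target for everything else.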