nexus-scribe-speaker

Speaker diarization and identification engine.

Overview

Real-time speaker identification with:

  • Neural speaker embedding extraction (d-vector/x-vector)
  • Voice profile enrollment and management
  • Speaker change detection (within the <50ms pipeline latency target)
  • Clustering for unknown speakers
  • Multi-speaker conversation tracking

Configuration

use nexus_scribe_speaker::SpeakerConfig;

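let config = SpeakerConfig {
    embedding_dim: 256,               // length of each speaker embedding vector
    similarity_threshold: 0.75,       // minimum similarity to match an enrolled profile
    change_detection_threshold: 0.65, // threshold used to flag a speaker change
    min_segment_duration_ms: 500,     // shortest segment duration, in milliseconds
    clustering_threshold: 0.70,       // similarity threshold for grouping unknown speakers
};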

Speaker Engine

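use nexus_scribe_speaker::{SpeakerConfig, SpeakerEngine};

// `config` is the SpeakerConfig built above; `db` is the database handle
// that backs the speaker profiles.
let engine = SpeakerEngine::new(config, db).await?;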

Speaker Enrollment

Register known speakers:

// Enroll a new speaker from audio samples
let speaker_id = engine.enroll_speaker(&audio_samples, "John Doe").await?;

// Update profile with additional samples
engine.update_speaker_profile(speaker_id, &more_samples).await?;

// List all enrolled speakers
let speakers = engine.list_speakers().await?;

// Delete speaker profile
engine.delete_speaker(speaker_id).await?;

Real-time Processing

Process audio chunks during meetings:

let segment = engine.process_chunk(&audio, timestamp_ms).await?;

println!("Speaker: {:?}", segment.speaker_name);
println!("Confidence: {:.2}", segment.confidence);

Speaker Identification

Identify speaker from audio:

if let Some(speaker_id) = engine.identify_speaker(&audio).await? {
    println!("Identified speaker: {}", speaker_id);
} else {
    println!("Unknown speaker");
}

Data Types

SpeakerProfile

pub struct SpeakerProfile {
    pub id: Uuid,
    pub name: String,
    pub embedding: Vec<f32>,
    pub enrollment_samples: usize,
    pub created_at: i64,
    pub updated_at: i64,
}
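
The enrollment_samples counter makes it possible to keep the stored embedding as a running mean of everything enrolled so far. The README does not say how update_speaker_profile combines samples, so the following is only a sketch of one plausible rule; ProfileStats and fold_sample are hypothetical names, not part of the crate.

```rust
/// Hypothetical stand-in for the `embedding` and `enrollment_samples`
/// fields of `SpeakerProfile` (assumption: not the crate's own type).
pub struct ProfileStats {
    pub embedding: Vec<f32>,
    pub enrollment_samples: usize,
}

/// Fold one new embedding into the stored mean.
/// Returns false and leaves the profile untouched when dimensions differ.
pub fn fold_sample(profile: &mut ProfileStats, sample: &[f32]) -> bool {
    if profile.embedding.len() != sample.len() {
        return false;
    }
    let n = profile.enrollment_samples as f32;
    for (mean, x) in profile.embedding.iter_mut().zip(sample) {
        // Incremental mean: new_mean = (n * old_mean + x) / (n + 1).
        *mean = (n * *mean + *x) / (n + 1.0);
    }
    profile.enrollment_samples += 1;
    true
}
```

If embeddings are compared by cosine similarity, the averaged vector would typically be re-normalized before matching.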

SpeakerSegment

pub struct SpeakerSegment {
    pub speaker_id: Option<Uuid>,
    pub speaker_name: Option<String>,
    pub start_ms: u64,
    pub end_ms: u64,
    pub confidence: f32,
}
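
A caller that receives one segment per processed chunk may want to collapse consecutive segments from the same speaker into a single turn. The sketch below shows one way to do that. It uses a simplified Segment type with a u32 speaker id in place of Option<Uuid> so that it needs no external crates; both Segment and merge_turns are illustrative assumptions rather than crate APIs.

```rust
/// Simplified stand-in for `SpeakerSegment` (assumption: a `u32` id
/// replaces `Uuid`, and name/confidence are omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub speaker: Option<u32>,
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Merge back-to-back segments attributed to the same known speaker.
/// Segments with no speaker are never merged, since two unknown
/// segments may belong to different people.
pub fn merge_turns(segments: &[Segment]) -> Vec<Segment> {
    let mut turns: Vec<Segment> = Vec::new();
    for seg in segments {
        if let Some(last) = turns.last_mut() {
            if last.speaker.is_some() && last.speaker == seg.speaker {
                // Same speaker as the previous turn: extend it.
                last.end_ms = seg.end_ms;
                continue;
            }
        }
        // First segment, unknown speaker, or a different speaker.
        turns.push(seg.clone());
    }
    turns
}
```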

Modules

Embedding Extraction (embedding.rs)

Extract speaker embeddings from audio:

use nexus_scribe_speaker::EmbeddingExtractor;

let extractor = EmbeddingExtractor::new(256)?;
let embedding = extractor.extract(&audio)?;

Features:

  • MFCC feature extraction
  • Neural network embedding
  • Normalization
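
The README does not specify which normalization the extractor applies. A common choice for speaker embeddings is L2 normalization, which scales each vector to unit length so that a plain dot product equals cosine similarity. A minimal sketch under that assumption (l2_normalize is a hypothetical helper, not a crate function):

```rust
/// Scale a vector to unit L2 norm.
/// A zero vector is returned unchanged to avoid dividing by zero.
pub fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}
```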

Profile Management (profile.rs)

Database-backed speaker profiles:

use nexus_scribe_speaker::ProfileManager;

let manager = ProfileManager::new(db)?;

// Identify speaker by embedding
let (speaker_id, confidence) = manager.identify(
    &embedding,
    similarity_threshold,
).await?;
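
Conceptually, threshold-based identification compares the query embedding against every stored profile and keeps the best score if it clears the threshold. The sketch below illustrates that idea in memory, assuming cosine similarity as the score; cosine_similarity and best_match are illustrative names, and the crate's database-backed lookup may work differently.

```rust
/// Cosine similarity of two vectors.
/// Returns 0.0 when lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Return the index and score of the best-matching profile embedding,
/// or None when no score reaches `threshold`.
pub fn best_match(
    query: &[f32],
    profiles: &[Vec<f32>],
    threshold: f32,
) -> Option<(usize, f32)> {
    profiles
        .iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(query, p)))
        .filter(|&(_, score)| score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Returning None below the threshold is what lets the caller treat the audio as an unknown speaker and hand it to clustering.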

Clustering (clustering.rs)

Group unknown speakers:

use nexus_scribe_speaker::SpeakerClusterer;

let clusterer = SpeakerClusterer::new(0.70);

let cluster_id = clusterer.add_embedding(&embedding)?;
let clusters = clusterer.get_clusters();

Uses agglomerative clustering with cosine similarity.
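
To make the technique concrete, here is a small offline sketch of agglomerative clustering over cosine similarity. It assumes centroid linkage and merges until no pair of clusters is at least as similar as the threshold; the linkage rule and the function names are assumptions, and the crate's incremental add_embedding interface is not reproduced here.

```rust
/// Cosine similarity; 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Element-wise mean of the member embeddings.
/// Assumes `members` is non-empty and all embeddings share one length.
fn centroid(members: &[usize], embeddings: &[Vec<f32>]) -> Vec<f32> {
    let mut c = vec![0.0f32; embeddings[members[0]].len()];
    for &m in members {
        for (acc, x) in c.iter_mut().zip(&embeddings[m]) {
            *acc += *x;
        }
    }
    for acc in c.iter_mut() {
        *acc /= members.len() as f32;
    }
    c
}

/// Agglomerative clustering with centroid linkage.
/// Returns clusters as lists of indices into `embeddings`.
pub fn cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    // Start with one cluster per embedding.
    let mut clusters: Vec<Vec<usize>> =
        (0..embeddings.len()).map(|i| vec![i]).collect();
    loop {
        // Find the most similar pair of clusters.
        let mut best: Option<(usize, usize, f32)> = None;
        for i in 0..clusters.len() {
            for j in (i + 1)..clusters.len() {
                let s = cosine(
                    &centroid(&clusters[i], embeddings),
                    &centroid(&clusters[j], embeddings),
                );
                if best.map_or(true, |(_, _, b)| s > b) {
                    best = Some((i, j, s));
                }
            }
        }
        match best {
            // Merge while the closest pair is at least `threshold` similar.
            Some((i, j, s)) if s >= threshold => {
                let merged = clusters.remove(j); // j > i, so index i stays valid
                clusters[i].extend(merged);
            }
            _ => break,
        }
    }
    clusters
}
```

This version recomputes centroids on every pass, which is fine for a handful of unknown speakers but would need caching for larger inputs.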

Change Detection (detector.rs)

Detect speaker transitions:

use nexus_scribe_speaker::ChangeDetector;

let detector = ChangeDetector::new(0.65);

if detector.detect_change(&current_embedding)? {
    // Speaker changed
}
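
One simple way to implement this kind of check is to remember the previous embedding and flag a change when its cosine similarity to the current one drops below the threshold. The sketch below assumes that rule; SimpleChangeDetector is a hypothetical type for illustration, and the crate's ChangeDetector may use a different criterion or smoothing.

```rust
/// Cosine similarity; 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Minimal stateful change detector (illustrative only).
pub struct SimpleChangeDetector {
    threshold: f32,
    previous: Option<Vec<f32>>,
}

impl SimpleChangeDetector {
    pub fn new(threshold: f32) -> Self {
        Self { threshold, previous: None }
    }

    /// Returns true when the similarity to the previous embedding falls
    /// below the threshold. The first embedding never counts as a change.
    pub fn detect_change(&mut self, current: &[f32]) -> bool {
        let changed = match &self.previous {
            Some(prev) => cosine(prev, current) < self.threshold,
            None => false,
        };
        // Remember the current embedding for the next comparison.
        self.previous = Some(current.to_vec());
        changed
    }
}
```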

NPU Acceleration

With the hailo feature enabled:

use std::sync::Arc;

// `hailo_device` is an already-initialized Hailo device handle.
#[cfg(feature = "hailo")]
let engine = SpeakerEngine::new_with_hailo(
    config,
    db,
    Some(Arc::new(hailo_device)),
).await?;

Usage

Add the crate to Cargo.toml as a path dependency:

[dependencies]
nexus-scribe-speaker = { path = "../nexus-scribe-speaker" }

Performance

  • Embedding extraction: ~20ms per chunk
  • Speaker identification: ~5ms
  • Change detection: ~2ms

Target total pipeline latency: <50ms (the stages above sum to roughly 27ms)