nexus-scribe-audio

Audio capture and preprocessing for real-time transcription.

Overview

Handles the complete audio pipeline:

  • Multi-microphone capture with CPAL
  • Noise reduction and preprocessing
  • Resampling to 16kHz mono
  • Voice Activity Detection (VAD)
  • Real-time streaming with buffering

Audio Configuration

use nexus_scribe_audio::{AudioConfig, AudioProcessor};

let config = AudioConfig {
    sample_rate: 16000,      // Whisper requires 16kHz
    channels: 1,             // Mono for speech recognition
    chunk_size: 1024,        // Samples per chunk
    device_name: None,       // Use default device
};
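With channels: 1 and chunk_size: 1024, each chunk at 16 kHz covers 1024 * 1000 / 16000 = 64 ms of audio. Capture devices often deliver interleaved multi-channel frames, so they have to be downmixed before chunking. The sketch below assumes simple channel averaging. downmix_to_mono and split_into_chunks are hypothetical helpers, not part of nexus_scribe_audio.

```rust
/// Hypothetical helper (not part of nexus_scribe_audio): average each
/// interleaved frame of `channels` samples into one mono sample.
/// A trailing partial frame is ignored.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Hypothetical helper: split mono samples into full `chunk_size` chunks and
/// return the incomplete tail separately so it can be carried into the next call.
fn split_into_chunks(samples: &[f32], chunk_size: usize) -> (Vec<&[f32]>, &[f32]) {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let full_len = samples.len() / chunk_size * chunk_size;
    let chunks: Vec<&[f32]> = samples[..full_len].chunks_exact(chunk_size).collect();
    (chunks, &samples[full_len..])
}

fn main() {
    let (sample_rate, chunk_size) = (16_000_usize, 1024_usize);
    // 1024 samples at 16 kHz: 1024 * 1000 / 16000 = 64 ms per chunk.
    println!("chunk duration: {} ms", chunk_size * 1000 / sample_rate);

    // Two stereo frames, (0.5, -0.5) and (1.0, 0.0), averaged to mono.
    println!("{:?}", downmix_to_mono(&[0.5, -0.5, 1.0, 0.0], 2)); // [0.0, 0.5]

    let samples = vec![0.0_f32; 2500];
    let (chunks, tail) = split_into_chunks(&samples, chunk_size);
    println!("{} full chunks, {} leftover samples", chunks.len(), tail.len()); // 2, 452
}
```

Averaging is only one downmix choice. Picking the channel nearest the speaker is another.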

Audio Processor

Main coordinator for the audio pipeline:

let (mut processor, rx) = AudioProcessor::new(config)?;

// Initialize capture device
processor.initialize_capture()?;

// Start capturing
processor.start_capture().await?;

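// Feed raw sample buffers through the pipeline.
// `raw_buffers` (an iterator of (samples, timestamp) pairs) and
// `transcription_tx` are placeholders for your own source and sink.
for (audio, timestamp) in raw_buffers {
    if let Some(chunk) = processor.process_chunk(&audio, timestamp).await? {
        // Send to the transcription engine
        transcription_tx.send(chunk).await?;
    }
}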

// Stop when done
processor.stop_capture().await?;

Modules

Capture (capture.rs)

CPAL-based audio capture:

use nexus_scribe_audio::AudioCapture;

let capture = AudioCapture::new(device, config, tx)?;
capture.start()?;
// ...
capture.stop()?;
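The capture side hands samples to tx, and the Performance Notes state that frames are dropped under backpressure instead of blocking input. Here is a minimal std-only sketch of that policy, assuming a bounded channel. The real-time callback uses a non-blocking try_send and counts what it drops. on_input_buffer is a hypothetical stand-in for the CPAL data callback, not the crate's code. The crate is described as using lock-free ring buffers, which this sketch does not reproduce. The per-call to_vec allocation is a simplification, since real-time paths typically avoid allocating.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical stand-in for the device data callback. It must never block,
/// so when the bounded queue is full the buffer is dropped and counted.
fn on_input_buffer(data: &[f32], tx: &SyncSender<Vec<f32>>, dropped: &AtomicUsize) {
    match tx.try_send(data.to_vec()) {
        Ok(()) => {}
        Err(TrySendError::Full(_)) => {
            dropped.fetch_add(1, Ordering::Relaxed);
        }
        // The consumer has shut down; there is nobody left to deliver to.
        Err(TrySendError::Disconnected(_)) => {}
    }
}

fn main() {
    // The queue holds at most 4 buffers.
    let (tx, rx) = sync_channel::<Vec<f32>>(4);
    let dropped = AtomicUsize::new(0);

    // Simulate 10 callbacks arriving while the consumer is stalled.
    for i in 0..10 {
        on_input_buffer(&[i as f32; 256], &tx, &dropped);
    }

    let delivered = rx.try_iter().count();
    // Prints "delivered 4, dropped 6"
    println!("delivered {delivered}, dropped {}", dropped.load(Ordering::Relaxed));
}
```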

Preprocessing (preprocessing.rs)

Audio enhancement:

use nexus_scribe_audio::AudioPreprocessor;

let mut preprocessor = AudioPreprocessor::new(16000)?;

// Apply noise reduction and normalization (modifies samples in place)
preprocessor.process(&mut samples)?;

Features:

  • Noise gate
  • Normalization
  • DC offset removal
  • Optional echo cancellation

Resampler (resampler.rs)

Convert between sample rates:

use nexus_scribe_audio::AudioResampler;

// Device is 48kHz, Whisper needs 16kHz
let resampler = AudioResampler::new(48000, 16000, 1)?;

let resampled = resampler.process(&samples)?;

Uses high-quality sinc interpolation.
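The filter design behind the sinc interpolation is not documented here. As an illustration only, the sketch below shows a windowed-sinc decimator for the integer 48 kHz to 16 kHz case (factor 3). It applies a Hann-windowed sinc low-pass with its cutoff at the output Nyquist frequency (8 kHz, i.e. 1/6 of the input rate), then keeps every third sample. The tap count (63) and the window choice are assumptions. Putting the cutoff exactly at Nyquist lets a little energy from the transition band alias, so practical designs usually place it slightly lower. Non-integer ratios need a polyphase or fractional-delay design, which this does not cover.

```rust
use std::f32::consts::PI;

/// Hann-windowed sinc low-pass FIR. `cutoff` is a fraction of the input
/// sample rate (0.0 to 0.5); `num_taps` should be odd so there is a centre tap.
fn lowpass_taps(cutoff: f32, num_taps: usize) -> Vec<f32> {
    let m = (num_taps - 1) as f32;
    let mut taps: Vec<f32> = (0..num_taps)
        .map(|n| {
            let x = n as f32 - m / 2.0;
            // Ideal low-pass response sin(2*pi*fc*x) / (pi*x), with limit 2*fc at x = 0.
            let sinc = if x == 0.0 {
                2.0 * cutoff
            } else {
                (2.0 * PI * cutoff * x).sin() / (PI * x)
            };
            let hann = 0.5 - 0.5 * (2.0 * PI * n as f32 / m).cos();
            sinc * hann
        })
        .collect();
    // Normalise to unity gain at DC.
    let sum: f32 = taps.iter().sum();
    taps.iter_mut().for_each(|t| *t /= sum);
    taps
}

/// Low-pass filter `input` with `taps`, then keep every `factor`-th sample.
/// Samples outside the buffer are treated as zero.
fn decimate(input: &[f32], factor: usize, taps: &[f32]) -> Vec<f32> {
    let half = (taps.len() / 2) as isize;
    (0..input.len())
        .step_by(factor)
        .map(|i| {
            taps.iter()
                .enumerate()
                .filter_map(|(k, &t)| {
                    let j = i as isize + k as isize - half;
                    input.get(usize::try_from(j).ok()?).map(|&x| t * x)
                })
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    // Cutoff at the 16 kHz output's Nyquist frequency: 8 kHz / 48 kHz = 1/6.
    let taps = lowpass_taps(1.0 / 6.0, 63);
    // One second of a 1 kHz tone sampled at 48 kHz.
    let input: Vec<f32> = (0..48_000)
        .map(|n| (2.0 * PI * 1000.0 * n as f32 / 48_000.0).sin())
        .collect();
    let output = decimate(&input, 3, &taps);
    println!("{} samples in, {} samples out", input.len(), output.len()); // 48000 in, 16000 out
}
```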

Voice Activity Detection (vad.rs)

Detect speech segments:

use nexus_scribe_audio::VoiceActivityDetector;

let vad = VoiceActivityDetector::new(16000)?;

if vad.is_speech(&samples)? {
    // Process speech segment
}

Features:

  • Energy-based detection
  • Zero-crossing rate analysis
  • Configurable thresholds
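Energy-based detection and zero-crossing rate can be combined in a few lines. This sketch is an assumption about how such a detector might look, not VoiceActivityDetector's actual logic. The VadThresholds struct and its values are hypothetical and would need tuning. The idea is that voiced speech tends to have relatively high short-term energy and a moderate zero-crossing rate, while hiss and broadband noise tend to cross zero far more often.

```rust
/// Hypothetical tunable thresholds for a frame-level detector.
struct VadThresholds {
    min_rms: f32, // frames quieter than this are treated as silence
    max_zcr: f32, // zero crossings per sample pair; above this looks noise-like
}

fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Fraction of adjacent sample pairs whose signs differ.
fn zero_crossing_rate(frame: &[f32]) -> f32 {
    if frame.len() < 2 {
        return 0.0;
    }
    let crossings = frame
        .windows(2)
        .filter(|w| (w[0] >= 0.0) != (w[1] >= 0.0))
        .count();
    crossings as f32 / (frame.len() - 1) as f32
}

fn is_speech(frame: &[f32], t: &VadThresholds) -> bool {
    rms(frame) >= t.min_rms && zero_crossing_rate(frame) <= t.max_zcr
}

fn main() {
    let t = VadThresholds { min_rms: 0.02, max_zcr: 0.25 };
    // 30 ms frame (480 samples at 16 kHz) of a 200 Hz tone: loud, few crossings.
    let tone: Vec<f32> = (0..480)
        .map(|n| 0.3 * (2.0 * std::f32::consts::PI * 200.0 * n as f32 / 16_000.0).sin())
        .collect();
    // Alternating-sign samples: loud, but crossing zero at every sample.
    let hiss: Vec<f32> = (0..480).map(|n| if n % 2 == 0 { 0.3 } else { -0.3 }).collect();
    let silence = vec![0.0_f32; 480];

    println!("tone: {}", is_speech(&tone, &t)); // true
    println!("hiss: {}", is_speech(&hiss, &t)); // false
    println!("silence: {}", is_speech(&silence, &t)); // false
}
```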

Decode (decode.rs)

Audio format decoding:

use nexus_scribe_audio::decode_audio_file;

// Decode MP3/WAV/FLAC to f32 samples
let samples = decode_audio_file("recording.mp3")?;
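This README does not describe how decode_audio_file handles each format. To show what "decode to f32 samples" involves in the simplest case, here is a std-only sketch. It walks the RIFF chunks of a 16-bit PCM WAV buffer and scales each little-endian i16 sample by 1/32768 into the range [-1.0, 1.0). decode_wav_pcm16 and encode_wav_pcm16 are hypothetical helpers, and the encoder exists only to exercise the decoder. Multi-channel samples are returned interleaved. MP3 and FLAC need a real codec and are out of scope here.

```rust
/// Hypothetical helper: decode a 16-bit PCM WAV byte buffer.
/// Returns (sample_rate, channels, samples); multi-channel samples stay interleaved.
fn decode_wav_pcm16(bytes: &[u8]) -> Result<(u32, u16, Vec<f32>), String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut format: Option<(u16, u16, u32, u16)> = None; // (audio format, channels, rate, bits)
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes(bytes[pos + 4..pos + 8].try_into().unwrap()) as usize;
        let body_end = (pos + 8)
            .checked_add(size)
            .filter(|&end| end <= bytes.len())
            .ok_or("truncated chunk")?;
        let body = &bytes[pos + 8..body_end];
        match id {
            b"fmt " if body.len() >= 16 => {
                let u16_at = |i: usize| u16::from_le_bytes([body[i], body[i + 1]]);
                let rate = u32::from_le_bytes(body[4..8].try_into().unwrap());
                format = Some((u16_at(0), u16_at(2), rate, u16_at(14)));
            }
            b"data" => {
                let (audio_format, channels, rate, bits) =
                    format.ok_or("data chunk appears before fmt chunk")?;
                if audio_format != 1 || bits != 16 {
                    return Err("only 16-bit integer PCM is supported".into());
                }
                // Little-endian i16 scaled into [-1.0, 1.0).
                let samples: Vec<f32> = body
                    .chunks_exact(2)
                    .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
                    .collect();
                return Ok((rate, channels, samples));
            }
            _ => {}
        }
        // Chunk bodies are padded to an even length.
        pos = body_end + (size & 1);
    }
    Err("no data chunk found".into())
}

/// Hypothetical helper: build a canonical 16-bit PCM WAV buffer.
fn encode_wav_pcm16(rate: u32, channels: u16, samples: &[i16]) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVEfmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // integer PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&rate.to_le_bytes());
    out.extend_from_slice(&(rate * channels as u32 * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&(channels * 2).to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    samples.iter().for_each(|s| out.extend_from_slice(&s.to_le_bytes()));
    out
}

fn main() -> Result<(), String> {
    let wav = encode_wav_pcm16(16_000, 1, &[0, 16_384, -32_768]);
    let (rate, channels, samples) = decode_wav_pcm16(&wav)?;
    // Prints "16000 Hz, 1 channel(s): [0.0, 0.5, -1.0]"
    println!("{rate} Hz, {channels} channel(s): {samples:?}");
    Ok(())
}
```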

Device Enumeration

// DeviceTrait provides Device::name(); HostTrait provides input_devices()
use cpal::traits::{DeviceTrait, HostTrait};

let host = cpal::default_host();

for device in host.input_devices()? {
    println!("Device: {}", device.name()?);
}

Streaming Subscription

// Get an audio receiver for streaming (`recv` needs a mutable binding)
let mut rx = processor.subscribe();

tokio::spawn(async move {
    while let Some(chunk) = rx.recv().await {
        // Handle audio chunk
    }
});
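How subscribe distributes chunks to several receivers is not documented. One common pattern keeps one sender per subscriber, shares each chunk through an Arc so it is not copied per subscriber, and prunes subscribers whose receivers have been dropped. The sketch below shows that pattern with synchronous std channels, as an assumption rather than the crate's design; the snippet above uses an async tokio receiver. ChunkHub is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Hypothetical fan-out hub: every subscriber gets its own channel.
struct ChunkHub {
    subscribers: Vec<Sender<Arc<Vec<f32>>>>,
}

impl ChunkHub {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> Receiver<Arc<Vec<f32>>> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send a chunk to every live subscriber; drop those whose receiver is gone.
    fn publish(&mut self, chunk: Vec<f32>) {
        let chunk = Arc::new(chunk);
        self.subscribers.retain(|tx| tx.send(Arc::clone(&chunk)).is_ok());
    }
}

fn main() {
    let mut hub = ChunkHub::new();
    let rx_a = hub.subscribe();
    let rx_b = hub.subscribe();

    hub.publish(vec![0.1, 0.2]);
    drop(rx_b); // subscriber B goes away
    hub.publish(vec![0.3]);

    println!("A received {} chunks", rx_a.try_iter().count()); // 2
    println!("live subscribers: {}", hub.subscribers.len()); // 1
}
```

These channels are unbounded, so a slow subscriber accumulates a backlog. Switching to bounded channels with try_send would match the drop-under-backpressure policy described in the Performance Notes.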

Performance Notes

  • Uses lock-free ring buffers internally
  • Processes 30-second windows with 5-second overlap
  • Drops frames under backpressure (never blocks input)
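A 30-second window with a 5-second overlap implies a hop of 30 - 5 = 25 seconds. Assuming the windows are taken over the 16 kHz output, that is 480,000 samples per window and 400,000 samples per hop, so each window repeats the last 80,000 samples of the one before it. The sketch below computes window boundaries for a buffer of a given length. window_ranges is a hypothetical helper. The README does not say how a final partial window is handled, so here it is simply emitted as a shorter window.

```rust
use std::ops::Range;

/// Hypothetical helper: sample ranges for fixed-length windows that overlap
/// by `overlap_secs`. The last window may be shorter than `window_secs`.
fn window_ranges(
    total_samples: usize,
    sample_rate: usize,
    window_secs: usize,
    overlap_secs: usize,
) -> Vec<Range<usize>> {
    assert!(overlap_secs < window_secs, "overlap must be shorter than the window");
    let window = window_secs * sample_rate;
    let hop = (window_secs - overlap_secs) * sample_rate;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + window).min(total_samples);
        ranges.push(start..end);
        if end == total_samples {
            break;
        }
        start += hop;
    }
    ranges
}

fn main() {
    // 60 seconds of 16 kHz audio yields windows at 0-30 s, 25-55 s and 50-60 s.
    for r in window_ranges(60 * 16_000, 16_000, 30, 5) {
        println!("{:.1}s - {:.1}s", r.start as f64 / 16_000.0, r.end as f64 / 16_000.0);
    }
}
```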

Usage

[dependencies]
nexus-scribe-audio = { path = "../nexus-scribe-audio" }

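Platform audio backends (used through CPAL):

  • ALSA (Linux): the ALSA development package must be installed, e.g. libasound2-dev on Debian/Ubuntu or alsa-lib-devel on Fedora
  • CoreAudio (macOS): ships with the OS
  • WASAPI (Windows): ships with the OS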