nexus-scribe-audio
Audio capture and preprocessing for real-time transcription.
Overview
Handles the complete audio pipeline:
- Multi-microphone capture with CPAL
- Noise reduction and preprocessing
- Resampling to 16kHz mono
- Voice Activity Detection (VAD)
- Real-time streaming with buffering
Audio Configuration
use nexus_scribe_audio::{AudioConfig, AudioProcessor};
let config = AudioConfig {
    sample_rate: 16000, // Whisper requires 16kHz
    channels: 1,        // Mono for speech recognition
    chunk_size: 1024,   // Samples per chunk
    device_name: None,  // Use default device
};
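With these settings, one chunk covers 1024 / 16000 s = 64 ms of audio. The sketch below works through that arithmetic using ConfigSketch, a hypothetical stand-in for AudioConfig (not the crate's type). It also assumes that chunk_size counts frames and that samples are 4-byte f32 values.

```rust
/// Hypothetical stand-in for the documented AudioConfig fields (not the crate's type).
struct ConfigSketch {
    sample_rate: u32,
    channels: u16,
    chunk_size: usize,
}

impl ConfigSketch {
    /// Milliseconds of audio per chunk, assuming chunk_size counts frames.
    fn chunk_duration_ms(&self) -> f64 {
        self.chunk_size as f64 * 1000.0 / self.sample_rate as f64
    }

    /// How many chunks arrive per second of capture.
    fn chunks_per_second(&self) -> f64 {
        self.sample_rate as f64 / self.chunk_size as f64
    }

    /// Buffer size per chunk, assuming 4-byte f32 samples.
    fn bytes_per_chunk(&self) -> usize {
        self.chunk_size * self.channels as usize * std::mem::size_of::<f32>()
    }
}

fn main() {
    let cfg = ConfigSketch { sample_rate: 16_000, channels: 1, chunk_size: 1024 };
    println!("{} ms per chunk", cfg.chunk_duration_ms()); // 64 ms per chunk
    println!("{} chunks per second", cfg.chunks_per_second()); // 15.625 chunks per second
    println!("{} bytes per chunk", cfg.bytes_per_chunk()); // 4096 bytes per chunk
}
```

Smaller chunks lower latency but raise per-chunk overhead. At 1024 samples the pipeline handles about 15.6 chunks per second.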
Audio Processor
Main coordinator for the audio pipeline:
let (mut processor, rx) = AudioProcessor::new(config)?;
// Initialize capture device
processor.initialize_capture()?;
// Start capturing
processor.start_capture().await?;
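// Process each block of captured audio (`audio` and `timestamp` come from
// the capture stream; `transcription_tx` is your transcription channel)
if let Some(chunk) = processor.process_chunk(&audio, timestamp).await? {
    // A full chunk is ready: send it to the transcription engine
    transcription_tx.send(chunk).await?;
}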
// Stop when done
processor.stop_capture().await?;
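Because process_chunk returns an Option, it presumably buffers input until a full chunk is available. The following is a minimal sketch of that buffering. It assumes fixed-size chunks and timestamps in seconds. ChunkBuffer, push and next_chunk are hypothetical names, not the crate's API.

```rust
use std::collections::VecDeque;

/// A fixed-size block of mono samples plus the time (in seconds) of its first sample.
#[derive(Debug)]
struct Chunk {
    samples: Vec<f32>,
    start_secs: f64,
}

/// Hypothetical accumulator: buffers captured samples and hands out
/// fixed-size chunks once enough audio has arrived.
struct ChunkBuffer {
    sample_rate: u32,
    chunk_size: usize,
    pending: VecDeque<f32>,
    /// Timestamp of the oldest sample still in `pending`.
    pending_start_secs: f64,
}

impl ChunkBuffer {
    fn new(sample_rate: u32, chunk_size: usize) -> Self {
        Self { sample_rate, chunk_size, pending: VecDeque::new(), pending_start_secs: 0.0 }
    }

    /// Append captured samples; `timestamp_secs` is the time of `audio[0]`.
    fn push(&mut self, audio: &[f32], timestamp_secs: f64) {
        if self.pending.is_empty() {
            self.pending_start_secs = timestamp_secs;
        }
        self.pending.extend(audio.iter().copied());
    }

    /// Remove and return the next full chunk, or `None` if not enough audio is buffered.
    fn next_chunk(&mut self) -> Option<Chunk> {
        if self.pending.len() < self.chunk_size {
            return None;
        }
        let samples: Vec<f32> = self.pending.drain(..self.chunk_size).collect();
        let start_secs = self.pending_start_secs;
        // Advance the clock by the duration of the samples just removed
        self.pending_start_secs += self.chunk_size as f64 / self.sample_rate as f64;
        Some(Chunk { samples, start_secs })
    }
}

fn main() {
    let mut buf = ChunkBuffer::new(16_000, 1024);
    // Simulate three 600-sample capture callbacks
    for i in 0..3 {
        let ts = i as f64 * 600.0 / 16_000.0;
        buf.push(&[0.0; 600], ts);
        while let Some(chunk) = buf.next_chunk() {
            println!("chunk of {} samples at {:.3}s", chunk.samples.len(), chunk.start_secs);
        }
    }
}
```

Draining with while let is safe here because next_chunk consumes buffered audio. The loop therefore ends once fewer than chunk_size samples remain.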
Modules
Capture (capture.rs)
CPAL-based audio capture:
use nexus_scribe_audio::AudioCapture;
let capture = AudioCapture::new(device, config, tx)?;
capture.start()?;
// ...
capture.stop()?;
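CPAL delivers samples interleaved in the device's native channel layout (L R L R ... for stereo), while the pipeline works on mono. The README does not say whether nexus-scribe-audio downmixes inside AudioCapture or later. The function below is a hypothetical sketch of one common approach, which averages each frame:

```rust
/// Average interleaved frames (e.g. L R L R ...) into a single mono channel.
/// Trailing samples that do not form a complete frame are ignored.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

fn main() {
    let stereo = [0.5, -0.5, 1.0, 0.0, 0.25, 0.75];
    let mono = downmix_to_mono(&stereo, 2);
    println!("{:?}", mono); // [0.0, 0.5, 0.5]
}
```

Averaging keeps the result within [-1.0, 1.0]. Summing the channels instead could clip when both carry the same loud signal.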
Preprocessing (preprocessing.rs)
Audio enhancement:
use nexus_scribe_audio::AudioPreprocessor;
let mut preprocessor = AudioPreprocessor::new(16000)?;
// Apply noise reduction and normalization in place
preprocessor.process(&mut samples)?;
Features:
- Noise gate
- Normalization
- DC offset removal
- Optional echo cancellation
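The sketch below shows textbook versions of three of the listed steps. The function names, the per-sample gate, and the processing order are assumptions, not AudioPreprocessor's actual implementation. Echo cancellation is omitted.

```rust
/// Subtract the mean so the signal is centered on zero (DC offset removal).
fn remove_dc_offset(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let mean = samples.iter().sum::<f32>() / samples.len() as f32;
    for s in samples.iter_mut() {
        *s -= mean;
    }
}

/// Zero out samples whose magnitude is below `threshold` (a crude per-sample noise gate).
fn noise_gate(samples: &mut [f32], threshold: f32) {
    for s in samples.iter_mut() {
        if s.abs() < threshold {
            *s = 0.0;
        }
    }
}

/// Scale so the loudest sample reaches `target_peak` (peak normalization).
/// Silent input is left untouched to avoid dividing by zero.
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}

fn main() {
    let mut samples = vec![0.5, 0.7, 0.5, 0.3];
    remove_dc_offset(&mut samples);
    noise_gate(&mut samples, 0.01);
    normalize_peak(&mut samples, 0.9);
    println!("{:?}", samples);
}
```

A production noise gate usually tracks a smoothed envelope with attack and release times rather than gating individual samples. Gating each sample separately would chop the quiet portions of every waveform cycle.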
Resampler (resampler.rs)
Convert between sample rates:
use nexus_scribe_audio::AudioResampler;
// Device is 48kHz, Whisper needs 16kHz
let resampler = AudioResampler::new(48000, 16000, 1)?;
let resampled = resampler.process(&samples)?;
Uses high-quality sinc interpolation.
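To illustrate sinc interpolation, here is a from-scratch Hann-windowed sinc resampler. It is a sketch, not the resampler the crate uses (which is not named here). The function resample_sinc and its zero_crossings parameter are hypothetical. A real implementation would also precompute polyphase filter tables instead of evaluating sin for every tap.

```rust
use std::f64::consts::PI;

/// Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
fn sinc(x: f64) -> f64 {
    if x.abs() < 1e-12 {
        1.0
    } else {
        (PI * x).sin() / (PI * x)
    }
}

/// Resample mono `input` from `from_hz` to `to_hz` with a Hann-windowed sinc kernel.
/// `zero_crossings` sets the kernel length (quality vs. cost).
fn resample_sinc(input: &[f32], from_hz: u32, to_hz: u32, zero_crossings: usize) -> Vec<f32> {
    let ratio = to_hz as f64 / from_hz as f64;
    // Input samples advanced per output sample
    let step = from_hz as f64 / to_hz as f64;
    // When downsampling, lower the cutoff to the output Nyquist frequency to avoid aliasing
    let cutoff = ratio.min(1.0);
    // Kernel half-width in input samples; it widens as the cutoff drops
    let radius = (zero_crossings as f64 / cutoff).ceil();
    let out_len = input.len() * to_hz as usize / from_hz as usize;
    let mut output = Vec::with_capacity(out_len);
    for n in 0..out_len {
        // Position of output sample n on the input sample grid
        let t = n as f64 * step;
        let first = (t - radius).ceil().max(0.0) as usize;
        let last = ((t + radius).floor() as usize).min(input.len() - 1);
        let mut acc = 0.0;
        for k in first..=last {
            let x = t - k as f64;
            // Hann window tapers the kernel to zero at |x| = radius
            let window = 0.5 * (1.0 + (PI * x / radius).cos());
            acc += input[k] as f64 * cutoff * sinc(cutoff * x) * window;
        }
        output.push(acc as f32);
    }
    output
}

fn main() {
    // One second of a 440 Hz tone at 48 kHz, resampled to 16 kHz
    let input: Vec<f32> = (0..48_000)
        .map(|i| (2.0 * std::f32::consts::PI * 440.0 * i as f32 / 48_000.0).sin())
        .collect();
    let output = resample_sinc(&input, 48_000, 16_000, 16);
    println!("{} -> {} samples", input.len(), output.len()); // 48000 -> 16000 samples
}
```

When downsampling from 48 kHz to 16 kHz, the cutoff drops to the new 8 kHz Nyquist limit. Content above that limit is filtered out rather than aliased into the speech band.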
Voice Activity Detection (vad.rs)
Detect speech segments:
use nexus_scribe_audio::VoiceActivityDetector;
let vad = VoiceActivityDetector::new(16000)?;
if vad.is_speech(&samples)? {
    // Process speech segment
}
Features:
- Energy-based detection
- Zero-crossing rate analysis
- Configurable thresholds
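The following is a minimal sketch of how energy and zero-crossing rate can combine into a speech decision. VadSketch and its two thresholds are hypothetical. The crate's VoiceActivityDetector may weigh these features differently.

```rust
/// Hypothetical energy + zero-crossing-rate detector; thresholds need tuning per setup.
struct VadSketch {
    energy_threshold: f32,
    max_zcr: f32,
}

impl VadSketch {
    /// Mean squared amplitude of the frame.
    fn energy(frame: &[f32]) -> f32 {
        if frame.is_empty() {
            return 0.0;
        }
        frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32
    }

    /// Fraction of adjacent sample pairs whose signs differ.
    fn zero_crossing_rate(frame: &[f32]) -> f32 {
        if frame.len() < 2 {
            return 0.0;
        }
        let crossings = frame.windows(2).filter(|w| (w[0] >= 0.0) != (w[1] >= 0.0)).count();
        crossings as f32 / (frame.len() - 1) as f32
    }

    /// Speech if the frame is loud enough and does not cross zero so often that it looks like hiss.
    fn is_speech(&self, frame: &[f32]) -> bool {
        Self::energy(frame) > self.energy_threshold
            && Self::zero_crossing_rate(frame) < self.max_zcr
    }
}

fn main() {
    let vad = VadSketch { energy_threshold: 0.01, max_zcr: 0.25 };
    // 20 ms of a 200 Hz tone at 16 kHz, roughly a voiced-speech pitch
    let tone: Vec<f32> = (0..320)
        .map(|i| 0.5 * (2.0 * std::f32::consts::PI * 200.0 * i as f32 / 16_000.0).sin())
        .collect();
    println!("tone is speech: {}", vad.is_speech(&tone));
}
```

High zero-crossing rates are typical of broadband noise, but also of unvoiced consonants such as "s". A rule this simple is therefore usually paired with smoothing across several frames.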
Decode (decode.rs)
Audio format decoding:
use nexus_scribe_audio::decode_audio_file;
// Decode MP3/WAV/FLAC to f32 samples
let samples = decode_audio_file("recording.mp3")?;
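Decoders for formats such as WAV and FLAC commonly yield integer PCM, which is then scaled to f32. The helper below is a hypothetical sketch of that last step for 16-bit little-endian samples. It does not show how decode_audio_file works internally.

```rust
/// Convert little-endian signed 16-bit PCM bytes (as in a typical WAV data chunk)
/// into f32 samples in [-1.0, 1.0). A trailing odd byte is ignored.
fn pcm16le_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        // Dividing by 32768 maps i16::MIN to exactly -1.0
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect()
}

fn main() {
    // 0x4000 = 16384 (half scale), 0x8000 = -32768 (full negative scale)
    let bytes = [0x00, 0x40, 0x00, 0x80];
    println!("{:?}", pcm16le_to_f32(&bytes)); // [0.5, -1.0]
}
```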
Device Enumeration
// HostTrait provides input_devices(); DeviceTrait provides name()
use cpal::traits::{DeviceTrait, HostTrait};
let host = cpal::default_host();
for device in host.input_devices()? {
    println!("Device: {}", device.name()?);
}
Streaming Subscription
// Get an audio receiver for streaming (recv() needs a mutable receiver)
let mut rx = processor.subscribe();
tokio::spawn(async move {
    while let Some(chunk) = rx.recv().await {
        // Handle the audio chunk
    }
});
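The Performance Notes state that frames are dropped under backpressure rather than blocking input. The following std-only sketch shows how a subscribe-style fan-out can achieve that with bounded channels and non-blocking sends. Broadcaster is hypothetical. With tokio, the same pattern maps onto tokio::sync::mpsc::Sender::try_send.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical fan-out hub: each subscriber gets a bounded queue and the
/// producer never blocks. Full queues drop the item; closed ones are pruned.
struct Broadcaster<T: Clone> {
    subscribers: Vec<SyncSender<T>>,
    capacity: usize,
    dropped: u64,
}

impl<T: Clone> Broadcaster<T> {
    /// `capacity` should be > 0; a zero-capacity sync_channel only accepts
    /// try_send while a receiver is already waiting.
    fn new(capacity: usize) -> Self {
        Self { subscribers: Vec::new(), capacity, dropped: 0 }
    }

    /// Register a new consumer and return its receiving end.
    fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    /// Deliver `item` to every live subscriber without blocking.
    fn publish(&mut self, item: T) {
        let mut dropped = 0;
        self.subscribers.retain(|tx| match tx.try_send(item.clone()) {
            Ok(()) => true,
            // Consumer is behind: drop this item rather than stall capture
            Err(TrySendError::Full(_)) => {
                dropped += 1;
                true
            }
            // Consumer hung up: forget it
            Err(TrySendError::Disconnected(_)) => false,
        });
        self.dropped += dropped;
    }
}

fn main() {
    let mut hub: Broadcaster<Vec<f32>> = Broadcaster::new(8);
    let rx = hub.subscribe();
    let consumer = std::thread::spawn(move || {
        // The loop ends once the hub (and with it every sender) is dropped
        for chunk in rx {
            println!("got {} samples", chunk.len());
        }
    });
    for _ in 0..4 {
        hub.publish(vec![0.0; 1024]);
    }
    drop(hub);
    consumer.join().unwrap();
}
```

Counting drops, as the dropped field does, makes backpressure visible in metrics instead of silently losing audio.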
Performance Notes
- Uses lock-free ring buffers internally
- Processes 30-second windows with 5-second overlap
- Drops frames under backpressure (never blocks input)
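At 16 kHz, a 30-second window is 480,000 samples. A 5-second overlap means each window starts 25 s (400,000 samples) after the previous one. The following sketch computes the resulting window boundaries. It assumes the final window is clipped to the input rather than padded, since the notes do not say which. The function window_bounds is hypothetical.

```rust
/// Start/end sample offsets of overlapping analysis windows.
/// The final, possibly shorter, window is clipped to the input length.
fn window_bounds(
    total_samples: usize,
    sample_rate: usize,
    window_secs: usize,
    overlap_secs: usize,
) -> Vec<(usize, usize)> {
    assert!(overlap_secs < window_secs, "overlap must be shorter than the window");
    let window = window_secs * sample_rate;
    // Each new window starts this many samples after the previous one
    let hop = (window_secs - overlap_secs) * sample_rate;
    let mut bounds = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + window).min(total_samples);
        bounds.push((start, end));
        if end == total_samples {
            break;
        }
        start += hop;
    }
    bounds
}

fn main() {
    // 60 s of audio at 16 kHz with 30 s windows and 5 s overlap
    for (start, end) in window_bounds(60 * 16_000, 16_000, 30, 5) {
        println!("{:>7} .. {:>7}", start, end);
    }
}
```

The overlap gives the transcriber context across window edges, so words spanning a boundary appear whole in at least one window.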
Usage
[dependencies]
nexus-scribe-audio = { path = "../nexus-scribe-audio" }
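Required system audio backends (used through CPAL):
- ALSA (Linux; building requires the ALSA development headers, e.g. libasound2-dev on Debian/Ubuntu)
- CoreAudio (macOS, included with the OS)
- WASAPI (Windows, included with the OS)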