jtkim-kaist/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
Implements multi-resolution cochleagram (MRCG) feature extraction, plus configurable post-processing parameters (hang_before, hang_over, on/off_length) that address common VAD errors such as false positives and dropouts. Built on TensorFlow with Python and MATLAB support. Includes a real-world dataset recorded in four acoustic environments (bus stop, construction site, park, room), sampled at 16 kHz, with manual speech annotations and low SNR conditions (2 to 18 dB).
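To illustrate what post-processing of this kind typically does, here is a minimal sketch that smooths a sequence of per-frame speech/non-speech decisions. It is not the toolkit's own code. The names hang_before and hang_over come from the description above, but their exact semantics are an assumption here (frames of padding added before and after each speech segment), and min_on_length and min_off_length are hypothetical stand-ins for the on/off_length settings.

```python
from typing import Iterator, List, Tuple


def _runs(labels: List[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (value, start, end_exclusive) for each run of equal labels."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield labels[start], start, i
            start = i


def flip_short_runs(labels: List[int], value: int, min_length: int,
                    interior_only: bool = False) -> List[int]:
    """Flip runs of `value` shorter than `min_length` frames to the other label."""
    out = list(labels)
    n = len(labels)
    for v, s, e in _runs(labels):
        if v != value or (e - s) >= min_length:
            continue
        if interior_only and (s == 0 or e == n):
            continue  # leave leading/trailing runs untouched
        for i in range(s, e):
            out[i] = 1 - value
    return out


def apply_hang(labels: List[int], hang_before: int, hang_over: int) -> List[int]:
    """Extend every speech run by hang_before frames earlier and hang_over frames later."""
    out = list(labels)
    n = len(labels)
    for v, s, e in _runs(labels):
        if v == 1:
            for i in range(max(0, s - hang_before), s):
                out[i] = 1
            for i in range(e, min(n, e + hang_over)):
                out[i] = 1
    return out


def postprocess(labels: List[int], hang_before: int = 1, hang_over: int = 1,
                min_on_length: int = 2, min_off_length: int = 2) -> List[int]:
    """Assumed pipeline: drop short speech bursts, bridge short gaps, then pad."""
    labels = flip_short_runs(labels, 1, min_on_length)            # false positives
    labels = flip_short_runs(labels, 0, min_off_length, True)     # dropouts
    return apply_hang(labels, hang_before, hang_over)


raw = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
smoothed = postprocess(raw, hang_before=1, hang_over=1,
                       min_on_length=2, min_off_length=2)
print(smoothed)
```

In this example the isolated one-frame detection at index 2 is discarded, the one-frame dropout at index 9 is bridged, and the resulting speech segment is padded by one frame on each side. The order of the three steps is a design choice of the sketch; other orderings give different results near segment edges.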
869 stars. No commits in the last 6 months.
Stars: 869
Forks: 233
Language: MATLAB
License: —
Category:
Last pushed: Jun 09, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jtkim-kaist/VAD"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
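The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the path layout (category, owner, repository) is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not confirm. The helper names quality_url and fetch_quality are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is an inferred pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    req = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("voice-ai", "jtkim-kaist", "VAD")
print(url)
# data = fetch_quality("voice-ai", "jtkim-kaist", "VAD")  # performs a network call
```

The network call is left commented out so the snippet runs offline; uncommenting it counts against the daily request limit.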
Higher-rated alternatives
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn...
FluidInference/FluidAudio
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity...
phuc-nt/my-translator
Real-time speech translation — macOS & Windows, free TTS, no server, your API keys only
Blaizzy/mlx-audio-swift
A modular Swift SDK for audio processing with MLX on Apple Silicon
pot-app/pot-desktop
🌈 A cross-platform app for text-selection translation and OCR.