jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.


Emerging

Implements multi-resolution cochleagram (MRCG) feature extraction and configurable post-processing parameters (hang_before, hang_over, on/off_length) that target common VAD errors such as false positives and dropouts. Built on TensorFlow with Python and MATLAB support, it includes a dataset recorded in four real-world acoustic environments (bus stop, construction site, park, room), sampled at 16 kHz, manually annotated for speech, and covering low SNR conditions (2-18 dB).
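As a rough illustration of what post-processing parameters of this kind typically do, here is a minimal sketch in plain Python. It is not the toolkit's implementation: the function name `smooth_vad`, the helper `runs`, and the split of on/off_length into `min_on` and `min_off` are assumptions made for this example. Only the names hang_before and hang_over come from the description above, and their exact semantics in the toolkit may differ.

```python
def runs(seq):
    """Return [start, end, value] for each maximal run of equal values (end exclusive)."""
    out = []
    for i, v in enumerate(seq):
        if out and out[-1][2] == v:
            out[-1][1] = i + 1
        else:
            out.append([i, i + 1, v])
    return out


def smooth_vad(frames, hang_before=0, hang_over=0, min_on=0, min_off=0):
    """Smooth per-frame binary VAD decisions (hypothetical helper, illustrative only).

    hang_before / hang_over: frames added before / after each speech segment.
    min_on:  speech runs shorter than this are discarded (assumed meaning).
    min_off: interior silent gaps shorter than this are filled (assumed meaning).
    """
    x = [bool(v) for v in frames]
    n = len(x)

    # Step 1: fill short silent gaps between two speech runs (dropouts).
    for s, e, v in runs(x):
        if not v and s > 0 and e < n and (e - s) < min_off:
            x[s:e] = [True] * (e - s)

    # Step 2: discard short speech bursts (likely false positives).
    for s, e, v in runs(x):
        if v and (e - s) < min_on:
            x[s:e] = [False] * (e - s)

    # Step 3: pad every surviving speech segment on both sides.
    out = list(x)
    for s, e, v in runs(x):
        if v:
            for i in range(max(0, s - hang_before), min(n, e + hang_over)):
                out[i] = True
    return [int(v) for v in out]


raw = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
smoothed = smooth_vad(raw, hang_before=1, hang_over=2, min_on=2, min_off=2)
print(smoothed)
```

In this sketch the isolated one-frame detection at index 2 is dropped, the one-frame gap at index 9 is bridged, and the remaining segment is padded by one frame before and two frames after. The order of the three steps is a design choice: filling gaps first keeps a long utterance with brief dropouts from being split into bursts that would then be discarded.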

869 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 25 / 25


Stars 869

Forks 233

Language MATLAB

License —

Higher-rated alternatives

k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn...

FluidInference/FluidAudio

Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity...

phuc-nt/my-translator

Real-time speech translation — macOS & Windows, free TTS, no server, your API keys only

Blaizzy/mlx-audio-swift

A modular Swift SDK for audio processing with MLX on Apple Silicon

pot-app/pot-desktop

🌈 A cross-platform application for selected-text translation and OCR.
