All Voice AI Tools

6,981 tools ranked by quality score · Page 7 of 70

Showing 601–700 of 6,981
# Tool Score Tier
601 myshell-ai/MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Support...

48
Emerging
602 ringger/transcribe-critic

Multi-source transcript merging inspired by textual criticism — LLM...

48
Emerging
603 zlargon/google-tts

Google TTS (Text-To-Speech) for node.js

48
Emerging
604 AutoArk/GPA

[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion...

48
Emerging
605 artcore-c/AI-Voice-Clone-with-Coqui-XTTS-v2

Free voice cloning for creators using Coqui XTTS-v2 on Google Colab. Clone...

48
Emerging
606 devnen/Dia-TTS-Server

Self-host the powerful Dia TTS model. This server offers a user-friendly Web...

48
Emerging
607 alesaccoia/VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in...

48
Emerging
608 Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

48
Emerging
609 tarepan/VoiceConversionLab

A collection of voice conversion research

48
Emerging
610 KKshitiz/J.A.R.V.I.S

Iron Man-inspired personal virtual assistant

48
Emerging
611 jxzhanggg/nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC

48
Emerging
612 just-ai/aimybox-android-assistant

Embeddable custom voice assistant for Android applications

48
Emerging
613 abhirooptalasila/AutoSub

A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using...

48
Emerging
614 dngda/bot-whatsapp

Unmaintained - Multipurpose WhatsApp Bot 🤖 using open-wa/wa-automate-nodejs...

48
Emerging
615 yl4579/AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)

48
Emerging
616 openspeech-team/openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging...

48
Emerging
617 HiMeditator/auto-caption

Cross-platform real-time subtitle display software.

48
Emerging
618 reriiasu/speech-to-text

Real-time transcription using faster-whisper

48
Emerging
619 SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

48
Emerging
620 tsurumeso/vocal-remover

Vocal Remover using Deep Neural Networks

48
Emerging
621 keenresearch/keenasr-ios-poc

Proof of concept app that demonstrates use of KeenASR SDK in ObjC. WE ARE...

48
Emerging
622 jpuigcerver/Laia

Laia: A deep learning toolkit for HTR based on Torch

48
Emerging
623 Aivis-Project/AivisSpeech

AivisSpeech: AI Voice Imitation System - Text to Speech Software

48
Emerging
624 gitmylo/bark-voice-cloning-HuBERT-quantizer

The code for the bark-voicecloning model. Training and inference.

48
Emerging
625 blaisewf/rvc-cli

🚀 RVC + UVR = A perfect set of tools for voice cloning, easily and free!

48
Emerging
626 KinglittleQ/GST-Tacotron

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling,...

48
Emerging
627 BoltzmannEntropy/xtts2-ui

A User Interface for XTTS-2 Text-Based Voice Cloning using only 10 seconds of speech

48
Emerging
628 gentaiscool/end2end-asr-pytorch

End-to-End Automatic Speech Recognition on PyTorch

48
Emerging
629 keonlee9420/STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and...

48
Emerging
630 gooofy/zamia-speech

Open tools and data for cloudless automatic speech recognition

48
Emerging
631 XiaoMi/kaldi-onnx

Kaldi model converter to ONNX

48
Emerging
632 haoheliu/voicefixer_main

General Speech Restoration

48
Emerging
633 EnjiRouz/Voice-Assistant-App

Python Voice Assistant project can: recognize and synthesize speech without...

48
Emerging
634 filippogiruzzi/voice_activity_detection

Voice Activity Detection based on Deep Learning & TensorFlow

48
Emerging
635 gexgd0419/NaturalVoiceSAPIAdapter

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.

48
Emerging
636 yl4579/PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

48
Emerging
637 rolczynski/Automatic-Speech-Recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

48
Emerging
638 harry0703/AudioNotes

Quickly extract audio and video content and organize it into structured Markdown notes

48
Emerging
639 clovaai/ClovaCall

ClovaCall dataset and Pytorch LAS baseline code (Interspeech 2020)

48
Emerging
640 iamjanvijay/rnnt_decoder_cuda

An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.

48
Emerging
641 fulldecent/vowel-practice

iOS application for finding formants in spoken sounds

48
Emerging
642 rishikksh20/FastSpeech2

PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End...

48
Emerging
643 lucasnewman/best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech...

48
Emerging
644 philipperemy/tensorflow-ctc-speech-recognition

Application of Connectionist Temporal Classification (CTC) for Speech...

48
Emerging
645 andresayac/edge-tts

Edge TTS is a Node or Bun package that allows access to the online...

48
Emerging
646 thepirat000/spleeter-api

Audio separation API using Spleeter from Deezer

48
Emerging
647 PlayVoice/vits_chinese

Best practice TTS based on BERT and VITS with some Natural Speech Features...

48
Emerging
648 Mag1cFall/AIStudio2API

OpenAI-compatible API proxy for Google AI Studio

48
Emerging
649 enhuiz/vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

48
Emerging
650 voicekit-team/T-one

T-one is a high-performance streaming ASR pipeline for Russian, specialized...

47
Emerging
651 prateekkalra/Selection-js

A lightweight JavaScript library which provides users with a set of options...

47
Emerging
652 cvqluu/Factorized-TDNN

PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal...

47
Emerging
653 saiteja-talluri/Speech2Face

Implementation of the CVPR 2019 Paper - Speech2Face: Learning the Face...

47
Emerging
654 symblai/speech-recognition-evaluation

Evaluate results from ASR/Speech-to-Text quickly

47
Emerging
655 roatienza/efficientspeech

PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.

47
Emerging
656 IBM/MAX-Speech-to-Text-Converter

Converts spoken words into text form.

47
Emerging
657 linto-ai/linto-studio

Transcription and annotation interface for recorded audio or video files

47
Emerging
658 microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

47
Emerging
659 yy4382/tts-importer

Easily import the Azure TTS speech synthesis service into reading apps. Currently supports Legado (阅读), 爱阅记, and 源阅读.

47
Emerging
660 alphacep/vosk-asterisk

Speech Recognition in Asterisk with Vosk Server

47
Emerging
661 mediatechlab/tts-wrapper

TTS-Wrapper makes it easier to use text-to-speech APIs by providing a...

47
Emerging
662 LEEYOONHYUNG/BVAE-TTS

Official implementation of BVAE-TTS

47
Emerging
663 StephenVinouze/KontinuousSpeechRecognizer

A Kotlin Speech Recognizer that runs continuously and is triggered with an...

47
Emerging
664 hs-CN/msedge-tts

This library is a wrapper of MSEdge Read aloud function API. You can use it...

47
Emerging
665 IS2AI/Kazakh_TTS

An expanded version of the previously released Kazakh text-to-speech...

47
Emerging
666 Nikorasu/LiveWhisper

A nearly-live implementation of OpenAI's Whisper, using sounddevice....

47
Emerging
667 BandarLabs/gitpodcast

Convert any git repository into an engaging podcast

47
Emerging
668 chenyme/Chenyme-AAVT

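A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and then merges the subtitles with the video to produce the translated video.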

47
Emerging
669 coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying...

47
Emerging
670 exPHAT/SwiftWhisper

🎤 The easiest way to transcribe audio in Swift

47
Emerging
671 lmnt-com/wavegrad

A fast, high-quality neural vocoder.

47
Emerging
672 EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from...

47
Emerging
673 tiberiu44/TTS-Cube

End-2-end speech synthesis with recurrent neural networks

47
Emerging
674 JJWRoeloffs/transcribe_align_textgrid

A small wrapper package around whisper-timestamped. Create force-aligned...

47
Emerging
675 ArchishmanSengupta/autovoiceevals

A self-improving loop for voice AI agents. Uses karpathy's autoresearch as...

47
Emerging
676 1038lab/ComfyUI-EdgeTTS

ComfyUI-EdgeTTS is a powerful text-to-speech node for ComfyUI, leveraging...

47
Emerging
677 gitmylo/audio-webui

A webui for different audio related Neural Networks

47
Emerging
678 bricewalker/Hey-Jetson

Deep Learning based Automatic Speech Recognition with attention for the...

47
Emerging
679 shamspias/vibevoice-studio

Beautiful voice app: record or upload to train a voice, generate speech from...

47
Emerging
680 Ashish-Patnaik/kokoclone

Voice Cloning, Now Inside Kokoro. Generate natural multilingual speech and...

47
Emerging
681 ycyy/edge-tts-webui

edge-tts webui

47
Emerging
682 puntorigen/podcast_tts

A class for generating realistic audio (TTS) for podcasts and dialogues.

47
Emerging
683 bensonruan/Chrome-Web-Speech-API

Chrome Web Speech API

47
Emerging
684 dspavankumar/keras-kaldi

Keras Interface for Kaldi ASR

47
Emerging
685 DrewThomasson/VoxNovel

VoxNovel: generate audiobooks giving each character a different voice actor.

47
Emerging
686 BatuhanYilmaz26/Auto-Subtitled-Video-Generator

Input a YouTube video link or upload a video file and get a video with subtitles.

47
Emerging
687 atomicoo/tacotron2-mandarin

Tensorflow implementation of Chinese/Mandarin TTS (Text-to-Speech) based on...

47
Emerging
688 VidyasagarMSC/WatBot

An Android ChatBot powered by IBM Watson Services (Assistant V1,...

47
Emerging
689 xenova/whisper-web

ML-powered speech recognition directly in your browser

47
Emerging
690 ivcylc/OpenMusic

OpenMusic: SOTA Text-to-music (TTM) Generation

47
Emerging
691 travisvn/edge-tts-client

Client-side (web browser) implementation of Edge TTS package — Microsoft...

47
Emerging
692 BuildWithAIs/voicekey

Voice to text, one key to input.

47
Emerging
693 yl4579/PitchExtractor

Deep Neural Pitch Extractor for Voice Conversion and TTS Training

47
Emerging
694 BogiHsu/Tacotron2-PyTorch

Yet another PyTorch implementation of Tacotron 2 with reduction factor and...

47
Emerging
695 amazon-archives/amazon-polly-sample

Sample application for Amazon Polly. Allows you to convert any blog into an...

47
Emerging
696 longluo/EbookReader

The EbookReader Android app. Supports file formats like epub, pdf, txt, html,...

47
Emerging
697 jimbozhang/kaldi-gop

Kaldi-based goodness of pronunciation (GOP)

47
Emerging
698 louiskirsch/speechT

Open-source speech-to-text software written in TensorFlow

47
Emerging
699 supershaneski/openai-whisper-talk

openai-whisper-talk is a sample voice conversation application powered by...

47
Emerging
700 Kyubyong/tacotron_asr

Speech Recognition Using Tacotron

47
Emerging