Voice AI Categories

.NET TTS Libraries

.NET/PowerShell libraries and SDKs for text-to-speech integration across multiple providers (Azure, OpenAI, Microsoft Speech SDK). Does NOT include general voice-ai SDKs, platform-specific implementations, or non-.NET speech tools.

203 tools

General Purpose Voice Assistants

Standalone voice assistant applications with integrated speech recognition, NLP, and task automation capabilities. Does NOT include specialized assistants (emergency, customer service, voice cloning), mobile-only implementations, or tools focused primarily on a single function like speech-to-text.

187 tools

Lightweight TTS Libraries

Minimal, dependency-light text-to-speech implementations and wrappers for local/offline synthesis. Does NOT include API wrappers, cloud-based services, speech recognition, or production-grade TTS engines.

185 tools

Automatic Speech Recognition

Libraries, frameworks, and tools for building, training, and evaluating automatic speech recognition (ASR) systems. Does NOT include pre-built transcription APIs, TTS systems, or ASR applications like transcription apps or meeting summarizers.

161 tools

Web Speech API Libraries

Angular and JavaScript libraries wrapping the browser's native Web Speech API for speech recognition functionality. Does NOT include commercial speech APIs (Speechly, Deepgram), text-to-speech, or framework-agnostic speech frameworks.

149 tools

Web Speech API TTS

Browser-native text-to-speech implementations using the Web Speech API for client-side voice synthesis. Does NOT include cloud TTS services, advanced voice cloning, specialized model implementations (TensorFlow, Coqui, etc.), or documentation/content generation tools.

149 tools

Speech-To-Text Converters

Tools that transcribe audio files or streams into text using Whisper or similar models. Does NOT include diarization, video processing, subtitle generation, voice typing, or server/API implementations.

147 tools

Android Speech Apps

Native Android applications integrating speech-to-text (STT) and text-to-speech (TTS) capabilities for mobile use cases like messaging, navigation, and accessibility. Does NOT include cloud APIs, SDKs, or web-based tools.

113 tools

Keyword Speech Recognition

Machine learning models for recognizing isolated spoken words/commands from audio using CNNs, RNNs, and neural networks. Does NOT include continuous speech-to-text ASR, end-to-end speech recognition pipelines, or general audio classification beyond single-word detection.

112 tools

End-to-End ASR Frameworks

PyTorch-based implementations of complete automatic speech recognition systems with integrated acoustic modeling, feature extraction, and decoding. Does NOT include ASR evaluation metrics, language models, individual components (vocoder, G2P), or non-PyTorch frameworks like Kaldi-only solutions.

109 tools

Local Voice Assistants

Complete offline voice assistant systems that combine speech recognition, language models, and text-to-speech into integrated conversational agents running entirely on local hardware. Does NOT include individual components (ASR, TTS, LLM separately), cloud-dependent assistants, or specialized applications like coding assistants or language tutoring.

101 tools

iOS Speech Frameworks

Native iOS/macOS/tvOS applications and SDKs wrapping Apple's Speech.framework and AVFoundation for text-to-speech, speech recognition, or voice synthesis. Does NOT include cloud-only services, Python/web implementations, or general voice AI applications without iOS-native code.

99 tools

Self-Hosted TTS Servers

Complete server implementations and APIs for running TTS models locally, including voice cloning and streaming capabilities. Does NOT include TTS libraries, wrappers for commercial services, or standalone applications without API/server components.

97 tools

Voice Agent Applications

End-to-end voice AI agents that handle specific real-world tasks (customer service, insurance claims, healthcare, political outreach, restaurant orders). Does NOT include standalone ASR/TTS components, SDK libraries, or single-capability voice tools.

88 tools

Discord TTS Bots

Discord bots that convert text messages to speech in voice channels. Does NOT include music bots, general Discord bots without TTS functionality, or TTS APIs/services themselves.

86 tools

Python Voice Assistants

Desktop and local Python-based voice assistants with task automation capabilities. Includes general-purpose voice command systems inspired by JARVIS/Alexa. Does NOT include platform-specific implementations (Discord, Slack bots), specialized voice applications (gaming, coding), or core speech/TTS components.

82 tools

Voice Controlled Robotics

Physical robots and robotic platforms controlled via voice commands or speech recognition. Includes DIY robot projects, robot kits, and voice-activated mechanical systems. Does NOT include home automation, virtual assistants, or purely software-based voice applications.

81 tools

Lightweight TTS Runtimes

Lightweight, self-contained text-to-speech implementations optimized for edge deployment, local inference, and minimal dependencies (typically C++/ONNX-based). Does NOT include cloud-based TTS APIs, general speech synthesis frameworks, or non-TTS applications.

79 tools

Speech Recognition APIs

Tools and implementations for converting spoken audio to text using cloud APIs (Google, etc.) and basic speech-to-text workflows. Does NOT include speech translation, conversational systems, cascading architectures, or domain-specific applications like voice typing editors or JOSM integration.

78 tools

AI Video Generation

Tools for automatically generating short-form or long-form videos from text, topics, or source material using AI for scripting, visuals, voiceovers, and editing. Does NOT include video editing software, speech recognition tools, or real-time video synthesis from physics simulation.

75 tools

Google TTS Libraries

Node.js/JavaScript libraries and wrappers for Google's Text-to-Speech API and Google Translate TTS. Does NOT include other TTS providers (AWS Polly, IBM Watson, ElevenLabs), home automation integrations, or applications built on top of TTS.

75 tools

FastSpeech TTS Models

PyTorch implementations and variants of FastSpeech and FastSpeech2 architectures for neural text-to-speech synthesis. Does NOT include other TTS architectures (Transformer-TTS, Glow-TTS), vocoder implementations, or non-FastSpeech based speech synthesis models.

74 tools

Kokoro TTS Ecosystem

Implementations, deployments, and applications of the Kokoro TTS model across different platforms and formats (ONNX, CoreML, C++, Android, web). Does NOT include other TTS models, voice cloning extensions, or non-Kokoro speech synthesis engines.

72 tools

Voice Cloning Tools

Applications and libraries for cloning voices from audio samples and synthesizing speech in cloned voices. Includes web interfaces, APIs, and local implementations using models like XTTS-v2, Coqui TTS, and YourTTS. Does NOT include general TTS without cloning capability, speech recognition, or avatar/video generation.

71 tools

OpenAI TTS Applications

Web and desktop applications built on OpenAI's text-to-speech API, including wrappers, UI clients, and end-user tools. Does NOT include other TTS providers (Azure, AWS Polly, Piper, ElevenLabs), speech recognition, or lower-level SDKs/libraries.

71 tools

Neural Vocoder Implementations

Tools and models for converting mel-spectrograms or acoustic features into high-fidelity waveforms using neural networks (GANs, diffusion, autoregressive models). Does NOT include end-to-end TTS systems, speech recognition, or general audio processing.

71 tools

Coqui TTS Applications

Production implementations and wrappers around the Coqui TTS engine (including XTTS variants) with APIs, servers, language-specific adaptations, and UI frontends. Does NOT include general TTS frameworks, other TTS engines, or speech recognition tools.

70 tools

Tacotron TTS Models

Implementations and variants of Tacotron and Tacotron2 neural architectures for end-to-end text-to-speech synthesis. Does NOT include other TTS architectures (FastSpeech, Glow-TTS, VITS), neural vocoders, or non-TTS applications.

70 tools

Voice Chatbot Applications

Conversational AI systems that process voice input, generate intelligent responses, and output speech. Includes chatbots with speech-to-text and text-to-speech pipelines for dialogue. Does NOT include standalone speech recognition, text-to-speech engines, or non-conversational voice tools.

67 tools

Kaldi ASR Ecosystem

Tools, recipes, models, and utilities built on or for the Kaldi ASR framework, including language-specific implementations, format converters, and training pipelines. Does NOT include non-Kaldi ASR systems, general speech recognition APIs, or TTS tools.

66 tools

CTC ASR Implementations

End-to-end speech recognition systems using the Connectionist Temporal Classification (CTC) loss function for acoustic modeling. Includes CTC decoders, training frameworks, and CTC variants (CTC-CRF). Does NOT include general ASR frameworks without explicit CTC focus, TTS systems, or non-neural ASR approaches.

65 tools

Java TTS Libraries

Java-based text-to-speech libraries and frameworks that synthesize speech from text. Includes wrappers around cloud APIs (Google, Microsoft) and offline TTS engines. Does NOT include speech recognition, voice cloning, Android apps, or server/application implementations.

65 tools

Voice Command Assistants

Full-featured AI assistants that combine speech recognition, natural language understanding, and voice synthesis to handle conversational tasks like news retrieval, information lookup, and general assistance. Does NOT include isolated ASR/TTS components, chatbots without voice I/O, or task-specific voice tools like calculators or coding assistants.

65 tools

Qwen3 TTS Applications

Web and desktop applications built on Alibaba's Qwen3-TTS model for text-to-speech synthesis, voice cloning, voice customization, and audiobook generation. Does NOT include ASR tools, non-Qwen TTS models, or lower-level TTS SDKs/APIs.

64 tools

Browser TTS Extensions

Chrome/browser extensions that convert web content (text, subtitles, selected passages) to speech for accessibility and reading assistance. Does NOT include standalone TTS libraries, voice cloning, ASR/transcription, or non-browser applications.

63 tools

Speech Corpora Datasets

Collections and catalogs of annotated speech audio data for training ASR, TTS, and voice AI models. Does NOT include tools for processing/cleaning datasets, annotation pipelines, or model implementations.

63 tools

eBook to Audiobook Conversion

Tools for converting written documents (ebooks, PDFs, EPUBs) into audio format using text-to-speech technology. Does NOT include general TTS engines, podcast creation from scripts, or audio-only content generation without a source document.

62 tools

Edge TTS Implementations

Implementations and integrations of Microsoft Edge's text-to-speech service across different platforms and applications. Does NOT include other TTS engines (AWS Polly, IBM Watson), speech recognition, or general voice AI SDKs.

62 tools

Text To Speech Frameworks

62 tools

Local Voice Dictation

Tools for real-time, on-device speech-to-text input that integrates with system text fields and applications via hotkey activation. Focuses on privacy-preserving local transcription for immediate typing/input workflows. Does NOT include cloud-based transcription, diarization, subtitle generation, or voice synthesis.

62 tools

Whisper Subtitle Generation

Tools that automatically generate subtitle files (SRT, VTT, etc.) from video/audio using speech recognition and optionally translate them. Does NOT include real-time captioning, audio alignment/synchronization libraries, or diarization systems.

60 tools

AI Tutoring Platforms

Interactive AI-powered educational applications that provide personalized instruction, conversational learning, and real-time feedback across subjects through voice, chat, or multimodal interfaces. Does NOT include general educational content generators, speech therapy tools, or standalone speech recognition/pronunciation analysis without tutoring context.

59 tools

Go TTS Libraries

Go/Golang libraries and SDKs for text-to-speech conversion, including integrations with cloud speech APIs and lightweight local TTS engines. Does NOT include applications built on top of TTS, non-Go implementations, or speech recognition (ASR) tools.

59 tools

Content-to-Podcast Converters

Tools that transform written content (emails, web articles, PDFs, spreadsheets) into audio podcast episodes using AI script generation and text-to-speech. Does NOT include general TTS tools, speech translation, or manual podcast production platforms.

58 tools

Voice AI Learning Collections

Educational Python repositories and coding practice collections covering diverse domains (utilities, automation, tutorials). Does NOT include specialized voice-AI tools, production applications, or focused libraries for specific tasks like TTS/ASR.

57 tools

Educational Voice Apps

Mobile and web applications that use voice AI (speech recognition, text-to-speech, voice commands) as core features for learning, teaching, or educational engagement across subjects like language learning, history, accessibility, and skill development. Does NOT include general-purpose voice assistants, voice translation tools without educational context, or accessibility apps focused primarily on independence tasks (banking, navigation, shopping).

56 tools

AI Avatar Platforms

Real-time interactive digital humans with synchronized lip-sync, voice cloning, and conversational AI. Includes avatar creation systems, facial animation, and end-to-end platforms combining video synthesis with voice interaction. Does NOT include standalone TTS, ASR, or video generation tools used independently.

53 tools

Speech AI Coursework

Educational materials, course assignments, tutorials, and seminar presentations focused on teaching speech processing, voice systems, and audio AI fundamentals. Does NOT include production tools, commercial applications, or research papers without educational scaffolding.

53 tools

Voice ChatGPT Interfaces

Conversational AI interfaces that combine ChatGPT/LLMs with speech-to-text and text-to-speech for spoken dialogue. Does NOT include single-purpose assistants (coding, cooking, robotics), SDK libraries, or implementations of specific TTS/ASR engines.

53 tools

Multimodal Medical Assistants

AI healthcare assistants combining voice (speech recognition/synthesis) with vision (image analysis) for medical consultation and diagnosis support. Does NOT include general medical chatbots without multimodal capabilities, voice-only medical apps, or non-healthcare multimodal systems.

53 tools

Android Voice Assistants

Complete voice assistant applications for Android devices with integrated speech recognition and command processing. Does NOT include SDKs/libraries, web-based assistants, or specialized single-function voice tools.

52 tools

TTS Model Fine-Tuning

Repositories for fine-tuning and training text-to-speech models on custom datasets, including LoRA and full model adaptation. Does NOT include pre-built TTS services, inference-only implementations, or general voice cloning without model training.

52 tools

Assistive Vision AI

Tools combining computer vision (object detection, OCR, scene understanding) with a voice interface to assist visually impaired users in real-time navigation and environmental awareness. Does NOT include general image-to-speech, security surveillance, or non-accessibility-focused vision systems.

50 tools

Telegram Voice Transcription

Telegram bots that convert voice messages, video notes, and audio files to text transcripts. Does NOT include general chatbots, translation-only tools, or non-Telegram voice applications.

49 tools

Meeting Transcription Summarizers

Tools that automatically transcribe meetings/lectures and generate summaries from audio or video recordings. Includes speaker diarization and timestamp organization. Does NOT include general transcription tools, podcast converters, or lecture slide generation.

49 tools

Voice Controlled Desktop Automation

Tools that use voice commands to control desktop applications, automate system tasks, and interact with OS functions. Does NOT include chatbots, general-purpose assistants without desktop control, or physical hardware projects.

47 tools

FunASR Speech Recognition

Speech recognition APIs and clients built on or wrapping FunASR and similar open-source ASR frameworks. Includes deployment servers, language bindings, and integration layers. Does NOT include text-to-speech, voice assistants, or end-user applications using ASR as a component.

46 tools

Wav2Vec2 ASR Models

Fine-tuning frameworks and implementations of Wav2Vec 2.0 for automatic speech recognition across languages. Does NOT include general ASR systems using other architectures (WaveNet, etc.), TTS, or non-ASR applications of Wav2Vec.

46 tools

Speech Emotion Recognition

Tools for detecting, classifying, and analyzing emotional states from audio speech input. Includes multimodal approaches combining speech with text/lyrics. Does NOT include general sentiment analysis of text-only content, hate speech detection, or emotion-modulated TTS output generation.

45 tools

Wake Word Detection

Tools for detecting specific trigger words or commands in audio streams, typically optimized for embedded/edge devices. Does NOT include general speech recognition, ASR, or speech classification beyond wake-word activation.

45 tools

Vue Speech Recognition

Vue.js components and libraries for integrating Web Speech API and speech-to-text functionality into web applications. Does NOT include text-to-speech, voice synthesis, or non-Vue speech recognition tools.

45 tools

Rust TTS Libraries

Rust bindings, crates, and wrappers for text-to-speech engines and TTS APIs. Does NOT include non-Rust TTS implementations, speech recognition, or higher-level applications built on TTS.

45 tools

Audio Transcription Apps

Web and mobile applications that convert speech to text in real time or from audio files, with features like note-taking, translation, or formatting. Does NOT include ASR model implementations, TTS synthesis, or voice assistants.

44 tools

eSpeak-NG Ecosystem

Wrappers, bindings, and extensions for eSpeak NG across multiple programming languages and platforms. Does NOT include other TTS engines, general text-to-speech tools, or non-eSpeak speech synthesis projects.

43 tools

Speech Translation Apps

Applications that translate spoken language from one language to another in real time or near real time, combining speech recognition, translation, and text-to-speech synthesis. Does NOT include standalone translation tools, transcription-only apps, or single-language speech synthesis.

43 tools

Zero-Shot Voice Synthesis

Tools for synthesizing speech with zero-shot or few-shot learning, enabling speaker cloning, emotion control, style transfer, and voice conversion without extensive training data. Does NOT include general text-to-speech engines, ASR systems, or non-zero-shot voice synthesis approaches.

43 tools

Deepgram Starter Projects

Beginner-friendly demo applications and starter templates for Deepgram's speech APIs (transcription, text-to-speech, voice agents) across multiple frameworks and languages. Does NOT include SDK libraries, production applications, or non-Deepgram speech tools.

43 tools

Vosk ASR Implementations

Offline speech recognition tools and integrations built on the Vosk toolkit. Does NOT include other ASR engines, text-to-speech, or general voice AI platforms.

42 tools

Gradio TTS WebUIs

Gradio-based web interfaces for text-to-speech and voice synthesis tools. Includes wrapped TTS engines with UI controls for voice selection, speed, and audio export. Does NOT include standalone TTS libraries, non-Gradio web frameworks, or voice cloning without TTS generation.

42 tools

Video Dubbing Tools

End-to-end solutions for automatically translating and dubbing video content with synchronized speech synthesis and voice cloning. Does NOT include general video generation, subtitle tools, or standalone TTS/ASR services.

41 tools

Voice Cloning Synthesis

41 tools

ElevenLabs Integrations

Wrappers, clients, and integrations for the ElevenLabs API and platform. Does NOT include general TTS tools, other TTS services, or applications that only use ElevenLabs as one of multiple backends.

40 tools

Video Transcription Extraction

Tools that transcribe video/audio content into text format with optional summarization, translation, or subtitle generation. Does NOT include voice cloning, speaker diarization, or real-time streaming analysis.

39 tools

Real-Time Voice Translation

Tools that capture spoken language in real time, translate it to another language, and output the result as speech or text. Focuses on live interpretation across languages. Does NOT include static text translation, document translation, or tools primarily designed for transcription without translation.

38 tools

Piper TTS Ecosystem

Tools, integrations, and extensions built around the Piper TTS system, including model training, platform ports, wrappers, and specialized implementations. Does NOT include general TTS systems, other TTS engines, or non-Piper speech synthesis tools.

38 tools

Speaker Diarization Embedding

37 tools

Whisper Transcription Apps

36 tools

AWS Polly TTS

Tools and applications for converting text to speech using AWS Polly and related cloud TTS services. Includes integrations, API wrappers, and implementations across various platforms. Does NOT include general TTS frameworks, speech recognition, or voice cloning tools.

36 tools

Twitch Chat TTS

Tools that convert Twitch chat messages to spoken audio for streamers and their audiences. Includes chat filtering, channel points integration, and stream platform adapters. Does NOT include general TTS services, non-streaming chat applications, or LLM-based response generation (unless TTS output is the primary focus).

35 tools

Sign Language Translation

Tools that convert between sign language and spoken/written language using gesture recognition, computer vision, and speech synthesis. Does NOT include general gesture control, voice commands, or accessibility features that don't involve sign language translation.

34 tools

AI-Powered eReaders

Tools for reading digital books (EPUB, PDF, manga, etc.) with integrated AI features like text-to-speech, synchronized highlighting, OCR-based translation, and interactive learning. Does NOT include general TTS engines, standalone audiobook players, or non-reading-focused applications.

33 tools

React Native Voice Libraries

React Native libraries and modules for speech recognition, text-to-speech, and voice processing on iOS and Android. Does NOT include complete voice applications, web frameworks, or non-React Native speech tools.

33 tools

TTS Dataset Creation

Tools and workflows for preparing, recording, processing, and organizing audio datasets specifically for training text-to-speech models. Does NOT include pre-built TTS datasets, TTS model training frameworks, or general speech datasets for ASR/voice cloning.

33 tools

VITS TTS Implementations

VITS-based text-to-speech models, servers, and fine-tuning projects across multiple languages. Includes VITS variants (Bert-VITS2, MB-iSTFT-VITS), API implementations, and language-specific deployments. Does NOT include non-VITS TTS engines, speech recognition, or voice cloning systems without TTS synthesis.

32 tools

PDF to Audio Conversion

Tools that convert PDF documents into audio files through text extraction and text-to-speech synthesis. Does NOT include general TTS engines, video conversion, or tools that read non-PDF text files.

32 tools

Audio Transcription Tools

31 tools

React Speech Recognition

React applications using browser-based speech-to-text APIs for voice input and transcription. Does NOT include text-to-speech, voice synthesis, backend speech processing, or non-React voice implementations.

31 tools

Voice Dictation Typing

Tools that convert speech to text in real time and enter the transcribed text directly into applications for typing and dictation. Does NOT include translation, diarization, subtitle generation, or voice synthesis applications.

30 tools

Parakeet ASR Implementations

Open-source speech-to-text and transcription tools built on or compatible with NVIDIA Parakeet models, including local deployments, API servers, and optimized inference frameworks. Does NOT include general TTS synthesis, non-Parakeet ASR systems, or voice cloning applications.

30 tools

System TTS Wrappers

Lightweight wrappers and CLI interfaces for operating system built-in text-to-speech engines (macOS `say`, Windows SAPI, etc.). Does NOT include cloud-based TTS APIs, specialized speech synthesis libraries, or applications built on top of TTS.

30 tools

SMS Voice Integrations

Plugins and integrations for sending SMS and text-to-speech calls through third-party APIs (like seven.io, TotalVoice, InfoBip) into existing platforms and workflows. Does NOT include standalone TTS engines, speech recognition, or general voice AI assistants.

29 tools

Cross-Platform TTS Frameworks

Libraries and frameworks that provide unified APIs for accessing multiple TTS engines across different operating systems and platforms. Does NOT include standalone TTS applications, speech recognition, or language-specific TTS implementations.

29 tools

Conformer ASR Implementations

Implementations and variants of the Conformer architecture for automatic speech recognition. Does NOT include general ASR frameworks, other acoustic models, or fine-tuned models for specific languages/domains without Conformer as the core architecture.

28 tools

Voice Enabled Coding Assistants

Tools that add voice input/output capabilities to AI coding assistants (Claude Code, etc.) via TTS, STT, or voice cloning. Does NOT include standalone audio tools, general voice apps, or tools without coding assistant integration.

28 tools

Text To Speech Conversion

27 tools

Whisper Framework Ports

Framework-specific implementations and bindings of whisper.cpp for game engines, mobile platforms, and system integrations. Includes language bindings and platform-specific compilations. Does NOT include higher-level applications, UI wrappers, or server implementations that use Whisper.

27 tools

Whisper Fine-Tuning

Tools and frameworks for fine-tuning Whisper models on custom datasets, including language-specific adaptation, accent conditioning, and model distillation. Does NOT include pre-built Whisper applications, deployment wrappers, or inference optimization without training components.

27 tools

Live Caption Generation

Real-time speech-to-text transcription and subtitle display for audio/video streams, broadcasts, and live events. Does NOT include speech translation (unless paired with transcription), general ASR models, or non-real-time captioning systems.

27 tools

ASR Evaluation Metrics

Tools for measuring and analyzing the accuracy of automatic speech recognition systems through metrics like WER, CER, and DER. Does NOT include ASR models themselves, transcription services, or general audio quality assessment.

26 tools

ComfyUI TTS Nodes

Custom ComfyUI nodes and integrations for text-to-speech synthesis across multiple TTS models and engines. Does NOT include standalone TTS applications, speech recognition, voice conversion, or non-ComfyUI TTS tools.

26 tools

PHP TTS Libraries

PHP libraries and packages for text-to-speech synthesis across multiple TTS providers (Google Cloud, Amazon Alexa, etc.). Does NOT include framework-specific implementations (Laravel, Yii2), non-PHP TTS tools, or speech recognition systems.

26 tools

Audio Noise Reduction

25 tools

Live Meeting Translation

Real-time speech translation and captioning for meetings, lectures, and video calls using browser APIs or dedicated platforms. Does NOT include general speech recognition, text translation tools, or non-real-time transcription services.

25 tools

Whisper Diarization

Tools that combine OpenAI Whisper (or similar ASR) with speaker diarization to identify and separate speakers in audio. Does NOT include general transcription without speaker identification, or standalone diarization tools without ASR components.

24 tools

Grapheme-to-Phoneme Conversion

Tools for converting written text (graphemes) into phonetic representations (phonemes) across languages. Includes G2P models, phonemizers, and language-specific phonetic converters. Does NOT include general text-to-speech synthesis, speech recognition, or IPA transcription editors.

24 tools

Rust Speech Recognition

Rust-based speech-to-text and audio processing libraries with local inference capabilities. Includes STT engines, noise reduction, and voice activity detection. Does NOT include cloud-dependent APIs, TTS, LLMs, or non-Rust implementations.

24 tools

Streamlit TTS Apps

Streamlit-based applications for text-to-speech conversion and speech synthesis. Includes multi-language TTS, audio playback interfaces, and speech editing tools. Does NOT include speech recognition (STT), voice cloning, specialized TTS libraries, or non-Streamlit implementations.

24 tools

Embedded TTS Systems

Lightweight text-to-speech implementations for microcontrollers and embedded devices (Arduino, ESP32, Teensy). Does NOT include cloud-based TTS services, general TTS libraries for standard computers, or speech recognition systems.

22 tools

Interactive AI Avatars

Tools for creating animated 2D/3D character avatars (Live2D, VRM) that interact via voice and text with real-time lip-sync, facial expressions, and emotional responses. Does NOT include static avatar generators, general chatbots without animation, or VTuber streaming infrastructure separate from the avatar interaction system.

22 tools

Anki TTS Integration

Tools and utilities that integrate text-to-speech capabilities into Anki flashcard decks for audio generation, playback, and language learning. Does NOT include standalone TTS engines, general speech recognition, or non-Anki language learning applications.

22 tools

OpenClaw Voice Assistants

Complete voice interface applications and integrations built on the OpenClaw platform, combining speech recognition, text-to-speech, and conversational AI. Does NOT include standalone STT/TTS tools, general voice SDKs, or non-OpenClaw voice assistants.

21 tools

Voice AI SDKs

Python SDKs and client libraries for commercial voice AI platforms (ASR, TTS, translation, voice agents). Does NOT include open-source speech recognition implementations, language models, or framework-specific integrations.

21 tools

Yandex SpeechKit Tools

SDKs, wrappers, and integrations for Yandex SpeechKit API across multiple languages and platforms. Does NOT include general speech recognition/TTS tools, other cloud providers' speech APIs, or smart home devices unrelated to SpeechKit.

21 tools

Audio Source Separation

20 tools

News Audio Bulletins

Tools that automatically gather, summarize, and convert news content into audio format for consumption via broadcasts, bots, or apps. Includes news scraping, summarization, and TTS integration. Does NOT include general content-to-podcast converters, non-news audio synthesis, or standalone TTS/ASR tools.

19 tools

Web-Based TTS Apps

Text-to-speech web applications built on Flask and other web frameworks, with user interfaces for text input. Includes dictionary/image text extraction features integrated into web apps. Does NOT include standalone TTS libraries, speech-to-text, or voice cloning tools.

19 tools

AI Interview Simulators

Platforms for practicing job interviews with AI interviewers using voice/video, real-time feedback, and automated scoring. Does NOT include general career coaching, resume builders, or hiring/recruitment platforms for employers.

19 tools

Image-to-Speech Synthesis

Tools that convert visual content (images, documents, video frames) into spoken audio through image captioning, optical character recognition, or visual description generation combined with text-to-speech. Does NOT include standalone OCR, image captioning without audio output, or general TTS systems without visual input processing.

19 tools

Text Normalization Engines

Tools for normalizing written text into spoken forms across languages, handling numbers, dates, abbreviations, and special characters for TTS and speech processing. Does NOT include general text-to-speech synthesis, speech recognition, or audio processing.

17 tools

Home Assistant TTS

Integrations and plugins that add text-to-speech capabilities to the Home Assistant automation platform. Does NOT include standalone TTS engines, general voice AI tools, or non-Home Assistant smart home integrations.

17 tools

Face Recognition Systems

Authentication and access control systems that use facial recognition as the primary security mechanism, often combined with multimodal verification (voice, liveness detection, QR codes). Does NOT include general computer vision, face detection without authentication, or standalone speech/voice systems.

17 tools

IBM Watson Speech

Tools and integrations for IBM Watson's speech-to-text and text-to-speech APIs. Includes implementations across programming languages, real-time transcription, and Watson service applications. Does NOT include other cloud speech providers (AWS, Google Cloud) or general-purpose voice AI frameworks.

15 tools

Government Procurement Docs

Solicitation documents, acquisition guides, and procurement-related materials for government technology projects and services. Does NOT include actual software tools, implementations, or operational systems.

15 tools

Clipboard Text-to-Speech

Tools that monitor the clipboard and automatically read aloud any copied text using TTS engines. Does NOT include general text-to-speech, selected-text readers without clipboard integration, or voice synthesis SDKs.

15 tools

Ukrainian Voice AI

Open-source speech recognition, text-to-speech, and phonetic processing tools specifically for the Ukrainian language. Does NOT include multilingual solutions, language-agnostic frameworks, or non-Ukrainian language implementations.

13 tools

Text To Speech TTS

12 tools

Whisper Speech Transcription

12 tools

Voice Assistant Devices

12 tools

Persian Speech AI

Tools, datasets, and models specifically for Persian/Farsi speech recognition, text-to-speech, and related NLP tasks. Does NOT include general multilingual speech tools or speech AI resources for other languages.

12 tools

Audio Music Learning

11 tools

Multilingual Speech Datasets

Curated speech corpora and audio datasets across multiple languages for training ASR and speech processing models. Does NOT include text-to-speech synthesis, voice cloning, or speech recognition inference tools.

11 tools

Speech To Text Transcription

10 tools

Voice Assistant Applications

9 tools

Voice AI Assistants

8 tools

Voice Controlled Calculators

Tools that perform mathematical calculations through voice input/commands and typically provide spoken output. Includes scientific calculators, unit converters, and math solvers with voice interfaces. Does NOT include general math tutoring apps, voice coding assistants, or text-to-speech engines without calculation functionality.

8 tools

STT

8 tools

AI Podcast Generation

7 tools

Conversational Chatbot Applications

7 tools

Lip Reading Synthesis

7 tools

Sign Language Recognition

7 tools

Voice Interactive Games

Web-based games and educational applications using speech recognition for user input and gameplay mechanics (number guessing, word games, pronunciation training). Does NOT include voice assistants, transcription tools, or non-interactive speech recognition systems.

7 tools

Text To Speech

6 tools

Voice Assistant Frameworks

6 tools

Multimodal Vision Language

6 tools

Voice Assistant Projects

5 tools

TTS

5 tools

Wav2Vec2 Speech Recognition

4 tools

Data Annotation Tools

4 tools

Speech Synthesis Diffusion

4 tools

Text To Video Generation

4 tools

Virtual Assistants NLP

4 tools

Bioacoustic Species Classification

4 tools

Audio Event Classification

4 tools

Voice AI Agents

3 tools

Speech Recognition Datasets

3 tools

Unity ML Inference

3 tools

Deepfake Detection Systems

3 tools

Personal Assistant RAG

3 tools

Conversational RAG Agents

3 tools

Image Caption Generation

3 tools

Facial Attribute Classification

3 tools

Joke Telling Apps

Web and desktop applications that fetch jokes from APIs and use text-to-speech to narrate them aloud. Does NOT include general text-to-speech tools, voice cloning, speech recognition, or joke APIs themselves.

3 tools

Text To Speech MCP

2 tools

LLM Scaling Architecture

2 tools

ComfyUI Extensions

2 tools

Flutter AI Chat Apps

2 tools

Multimodal AI Assistants

2 tools

AI Virtual Companions

2 tools

Machine Translation Systems

2 tools

AI Chatbot Interfaces

2 tools

Next Word Prediction

2 tools

Audio Classification Transformers

2 tools

AI Image Generation Platforms

2 tools

Natural Language Task Scheduling

2 tools

Text Translation Tools

2 tools

AI Workflow Automation

1 tool

AI Assistant Platforms

1 tool

Text Embedding Runtimes

1 tool

MediaPipe Implementations

1 tool

Discord AI Chatbots

1 tool

Vision Language Models

1 tool

Indic Language Translation

1 tool

Neural Machine Translation

1 tool

GPT Implementation Tutorials

1 tool

Gemini Prompt Workbenches

1 tool

Speculative Decoding Algorithms

1 tool

Text Scanning OCR

1 tool

Text Emotion Recognition

1 tool

Multi Agent Orchestration

1 tool

LLM Inference Serving

1 tool

Vibe Coding Frameworks

1 tool

Vietnamese NLP Tools

1 tool

Respiratory Disease Detection

1 tool

AI Terminal Agents

1 tool

AI Note Taking Apps

1 tool

Document QA Chatbots

1 tool

AI Children Storytelling

1 tool

NLP Task Libraries

1 tool

LLM Fine Tuning

1 tool

Chatbot Frameworks

1 tool

Talking Head Generation

1 tool

Gemini API Applications

1 tool

LLM Docker Deployments

1 tool

Stress Detection ML

1 tool

NLP Dataset Collections

1 tool

Fullstack AI Assistants

1 tool

Graph Database RAG

1 tool

Video Content Intelligence

1 tool

Temporal Expression Parsing

1 tool

Health App Development

1 tool

CLIP Vision Language

1 tool

AI Interview Coaching

1 tool

Hand Gesture Control

1 tool

ML Benchmarking Frameworks

1 tool

Viral Clip Generation

1 tool

Model Compression Optimization

1 tool

Edge Camera ML

1 tool

OCR Document Extraction

1 tool

Go ML Bindings

1 tool

Reading Comprehension QA

1 tool

Tokenization Libraries

1 tool

LLM Translation Tools

1 tool

AI Skill Integrations

1 tool

Facial Recognition Apps

1 tool

Federated Learning Frameworks

1 tool

Personal Knowledge Management

1 tool

Flashcard Generation

1 tool

Streamlit Chatbot Apps

1 tool

ML Learning Resources

1 tool

LLM SDK Packages

1 tool

Semantic Kernel Tools

1 tool

Embedding Model Tuning

1 tool

LLM Learning Resources

1 tool

Chatbot NLP Frameworks

1 tool

Telegram LLM Bots

1 tool

NLU Game Applications

1 tool

Diffusion Model Frameworks

1 tool

Image Classification Demos

1 tool