jonatasgrosman/huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
Provides unified APIs for speech recognition inference, fine-tuning, and evaluation using Hugging Face Hub models, with optional language model decoding via KenLM or Flashlight. Supports character-level timestamp and confidence outputs and multiple LM decoder backends (Kensho, Parlance, Flashlight), and handles various audio formats through ffmpeg integration. Built on PyTorch and Transformers, targeting wav2vec2 and similar CTC-based models for multilingual speech tasks.
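To make the character-level timestamp and confidence outputs more concrete, here is a minimal sketch of how such a result might be grouped into words. It assumes a result dictionary with the keys `transcription`, `start_timestamps`, `end_timestamps`, and `probabilities`, with one entry per character of the transcription (spaces included). That shape is an assumption based on the project's documented examples, so check it against real output from `SpeechRecognitionModel.transcribe` before relying on it. The helper `group_into_words` and the `sample` data are hypothetical and not part of the library.

```python
from typing import Dict, List, Tuple

# (text, start, end, mean character probability); timestamp units are whatever
# the library reports and are not converted here.
Word = Tuple[str, int, int, float]


def group_into_words(result: Dict) -> List[Word]:
    """Group per-character outputs into words, splitting on spaces.

    Assumes the four lists in `result` are aligned character by character.
    """
    words: List[Word] = []
    chars: List[str] = []
    starts: List[int] = []
    ends: List[int] = []
    probs: List[float] = []

    def flush() -> None:
        # Emit the word collected so far, if any, and reset the buffers.
        if chars:
            mean_prob = sum(probs) / len(probs)
            words.append(("".join(chars), starts[0], ends[-1], mean_prob))
            chars.clear()
            starts.clear()
            ends.clear()
            probs.clear()

    for ch, start, end, prob in zip(
        result["transcription"],
        result["start_timestamps"],
        result["end_timestamps"],
        result["probabilities"],
    ):
        if ch == " ":
            flush()
        else:
            chars.append(ch)
            starts.append(start)
            ends.append(end)
            probs.append(prob)
    flush()  # emit the final word
    return words


# Hand-written sample in the assumed shape (not real model output).
sample = {
    "transcription": "hi yo",
    "start_timestamps": [0, 20, 40, 60, 80],
    "end_timestamps": [20, 40, 60, 80, 100],
    "probabilities": [1.0, 0.5, 1.0, 0.25, 0.75],
}
words = group_into_words(sample)
```

A word-level mean probability like this can be used to flag low-confidence words for review, although averaging is only one of several reasonable aggregation choices.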
470 stars and 404 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars: 470
Forks: 46
Language: Python
License: MIT
Category:
Last pushed: Sep 20, 2023
Monthly downloads: 404
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jonatasgrosman/huggingsound"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
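The same request can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the path layout `quality/<category>/<owner>/<repo>` is inferred from the single URL shown above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment order is inferred from the curl example."""
    segments = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response body, assuming the API returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "jonatasgrosman", "huggingsound")` issues the same GET request as the curl command. The fields in the returned data are not documented here, so inspect the response before depending on any particular key.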