fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Feeds contextual embeddings from pretrained multilingual BERT models into the phoneme text encoder of a VITS2 generative framework, enabling cross-lingual voice synthesis. The architecture combines these BERT embeddings with VITS2's normalizing flows and adversarially trained waveform decoder to improve naturalness across multiple languages. It is designed for training custom voice models from scratch, with preprocessing pipelines exposed via `webui_preprocess.py`.
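To make the idea of combining BERT's contextual embeddings with a phoneme-level encoder concrete, here is a minimal pure-Python sketch. It assumes that there is one BERT vector per word or token and that each vector is repeated once per phoneme, using a per-token phoneme count called `word2ph` here. Each repeated vector is then projected to the phoneme-embedding width with a linear layer and added to the phoneme embeddings. All names (`expand_to_phonemes`, `project`, `fuse`, `word2ph`) are hypothetical. This illustrates the general technique, not Bert-VITS2's own implementation, which operates on PyTorch tensors.

```python
"""Hypothetical sketch: fusing word-level BERT features with phoneme embeddings.

Assumptions (illustrative only, not Bert-VITS2's actual code):
- there is one BERT vector per word/token;
- word2ph[i] is the number of phonemes produced by token i;
- a linear projection maps BERT vectors to the phoneme-embedding width,
  and the result is added element-wise to the phoneme embeddings.
"""
from typing import List

Vector = List[float]


def expand_to_phonemes(bert_feats: List[Vector], word2ph: List[int]) -> List[Vector]:
    """Repeat each token's BERT vector once for every phoneme it covers."""
    if len(bert_feats) != len(word2ph):
        raise ValueError("need exactly one phoneme count per BERT token")
    expanded: List[Vector] = []
    for vec, n_ph in zip(bert_feats, word2ph):
        expanded.extend(list(vec) for _ in range(n_ph))
    return expanded


def project(vec: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Linear layer: out[j] = sum_k weight[j][k] * vec[k] + bias[j]."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def fuse(
    phone_embs: List[Vector],
    bert_feats: List[Vector],
    word2ph: List[int],
    weight: List[Vector],
    bias: Vector,
) -> List[Vector]:
    """Add projected, phoneme-aligned BERT features to the phoneme embeddings."""
    aligned = expand_to_phonemes(bert_feats, word2ph)
    if len(aligned) != len(phone_embs):
        raise ValueError("sum(word2ph) must equal the number of phonemes")
    return [
        [p + q for p, q in zip(ph, project(bv, weight, bias))]
        for ph, bv in zip(phone_embs, aligned)
    ]


if __name__ == "__main__":
    # Two tokens with 2-dim BERT vectors; the first covers 2 phonemes, the second 1.
    bert_feats = [[1.0, 0.0], [0.0, 1.0]]
    word2ph = [2, 1]
    # Project 2-dim BERT vectors to a 3-dim hidden size.
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    bias = [0.0, 0.0, 0.5]
    phone_embs = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(fuse(phone_embs, bert_feats, word2ph, weight, bias))
    # → [[2.0, 1.0, 2.5], [2.0, 1.0, 2.5], [1.0, 2.0, 2.5]]
```

Summing the projected features, rather than concatenating them, keeps the hidden width unchanged. The downstream encoder therefore needs no extra input channels. In this sketch the projection is the only place where the BERT width (2 here) meets the hidden width (3 here).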
8,707 stars. Actively maintained with 1 commit in the last 30 days.
Stars: 8,707
Forks: 1,267
Language: Python
License: AGPL-3.0
Category:
Last pushed: Mar 09, 2026
Commits (30d): 1
Related tools
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment...
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
OpenMOSS/MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...
OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis....
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model