fishaudio/Bert-VITS2

vits2 backbone with multilingual-bert

64 / 100

Established

Adds pretrained multilingual BERT encoders to the VITS2 text front end, so that contextual semantic embeddings accompany the phoneme inputs, enabling cross-lingual voice synthesis within a VITS2 generative framework. The architecture combines BERT's contextual embeddings with VITS2's flow-based generative model and adversarial training for improved naturalness across multiple languages. Designed for training custom voice models from scratch, with preprocessing pipelines exposed via `webui_preprocess.py`.
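
As a rough illustration of how token-level BERT embeddings can be paired with a phoneme sequence, the sketch below repeats each token's feature vector once per phoneme that token covers. It is a minimal sketch in plain Python, not code from the repository: the function name `expand_to_phonemes`, the `phones_per_token` counts, and the toy 2-dimensional vectors are all hypothetical, and a real pipeline would use tensors produced by a BERT model.

```python
from typing import List, Sequence


def expand_to_phonemes(
    token_features: Sequence[Sequence[float]],
    phones_per_token: Sequence[int],
) -> List[List[float]]:
    """Repeat each token's feature vector once per phoneme it covers.

    Hypothetical helper for illustration only; not part of Bert-VITS2.
    """
    if len(token_features) != len(phones_per_token):
        raise ValueError("one phoneme count is required per token")

    phone_features: List[List[float]] = []
    for vector, count in zip(token_features, phones_per_token):
        if count < 0:
            raise ValueError("phoneme counts must be non-negative")
        # Copy the vector so each phoneme position owns its own list.
        phone_features.extend(list(vector) for _ in range(count))
    return phone_features


# Toy 2-dimensional "embeddings" for three tokens (assumed values;
# real BERT vectors have hundreds of dimensions).
token_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Assumed number of phonemes each token maps to.
phones_per_token = [2, 1, 3]

phone_features = expand_to_phonemes(token_features, phones_per_token)
print(len(phone_features))  # one feature vector per phoneme
```

The result has one feature vector per phoneme, so it can be combined position by position with phoneme embeddings before the rest of the synthesis model runs.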

8,707 stars. Actively maintained with 1 commit in the last 30 days.

No Package, No Dependents

Maintenance 16 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25

Stars: 8,707

Forks: 1,267

Language: Python

License: AGPL-3.0

Related tools

FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment...

travisvn/chatterbox-tts-api

Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...

OpenMOSS/MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...

OpenMOSS/MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis....

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model
