Aratako/Irodori-TTS

A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control

Score: 42 / 100 (Emerging)

Employs a Rectified Flow Diffusion Transformer over DACVAE continuous latents for 48kHz synthesis, with joint-attention conditioning for zero-shot voice cloning and emoji-driven style control. Supports distributed multi-GPU training via torchrun with mixed precision (bf16), gradient accumulation, and parameter-efficient LoRA fine-tuning. Provides inference via CLI, Gradio UI, and direct HuggingFace Hub checkpoint loading with configurable guidance modes and DACVAE codec control.
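
To make the "Rectified Flow" part of the description concrete, here is a minimal toy sketch of rectified-flow sampling with fixed-step Euler integration. It is not taken from the Irodori-TTS code base. The names `ideal_velocity` and `euler_sample` are hypothetical, and the closed-form velocity stands in for the trained Diffusion Transformer, which in the real model would predict the velocity over DACVAE latents given text and speaker conditioning.

```python
import random


def ideal_velocity(x, t, target):
    """Velocity of the straight path from the current point to `target`.

    Assumption: this closed form replaces a trained network. On the path
    x_t = (1 - t) * x0 + t * x1, the velocity x1 - x0 equals (x1 - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last step uses t = 1 - dt, so 1 - t is never zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
target = [0.5, -1.0, 2.0]                          # toy stand-in for a latent frame
noise = [random.gauss(0.0, 1.0) for _ in target]   # sample from the Gaussian prior
sample = euler_sample(lambda x, t: ideal_velocity(x, t, target), noise, num_steps=8)
```

Because the ideal path is a straight line, even a handful of Euler steps carries the noise sample onto the target in this toy setting. A learned velocity field is only approximately straight, so the step count becomes a quality and speed trade-off.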

No Package · No Dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 11 / 25
Community 14 / 25


Stars: 40
Forks: 6
Language: Python
License: MIT
Last pushed: Feb 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Aratako/Irodori-TTS"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
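
The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions not stated on this page: that the endpoint returns JSON, and that other repositories follow the same `category/owner/repo` path layout as the URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the data for one repository.

    Assumption: the response body is UTF-8 encoded JSON.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "Aratako", "Irodori-TTS")` requests the same URL as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible default.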