declare-lab/jamify

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment


Emerging

Implements lyrics-to-song generation with rectified flow on a compact 530M-parameter DiT (Diffusion Transformer) backbone. Word- and phoneme-level timing control lets users specify vocal prosody precisely. The project reports roughly 3× lower phoneme and word error rates, which it attributes to phoneme boundary attention. For aesthetic alignment, it applies Direct Preference Optimization (DPO) to synthetically generated preference data, so no manual annotation is required. Inference can be distributed across multiple GPUs via Hugging Face Accelerate, and generation can be steered by either a reference audio clip (for style extraction) or a text prompt, producing songs up to 3 min 50 s long.
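As a rough, hypothetical illustration (not code from the jamify repository), the sketch below shows three of the techniques named above in plain Python: expanding word/phoneme timestamps into a frame-level conditioning track, Euler sampling along a rectified flow from noise (t = 0) to data (t = 1), and the standard DPO preference loss. The frame rate, the 0-for-silence convention, all function names, and the toy velocity field are assumptions; in JAM the velocity would come from the DiT and the conditioning is richer.

```python
import math
import random

FRAMES_PER_SECOND = 10  # hypothetical latent frame rate, not JAM's actual value


def align_phonemes(timed_phonemes, duration_s, fps=FRAMES_PER_SECOND):
    """Expand (phoneme_id, start_s, end_s) spans into one phoneme ID per frame.

    Frames not covered by any span keep ID 0 (silence/padding in this sketch).
    """
    n_frames = int(round(duration_s * fps))
    track = [0] * n_frames
    for phoneme_id, start_s, end_s in timed_phonemes:
        start = int(round(start_s * fps))
        end = min(int(round(end_s * fps)), n_frames)
        for f in range(start, end):
            track[f] = phoneme_id
    return track


def rectified_flow_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed stably for either sign of z
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Usage: hypothetical timings for two phonemes in a 1.2 s clip.
timed = [(3, 0.0, 0.4), (7, 0.4, 1.0)]
track = align_phonemes(timed, duration_s=1.2)
# track == [3, 3, 3, 3, 7, 7, 7, 7, 7, 7, 0, 0]

# Toy "clean latent": the phoneme track itself, as floats.
target = [float(p) for p in track]


def toy_velocity(x, t):
    # Exact velocity of the straight path from the current x to a fixed target.
    return [(yi - xi) / (1.0 - t) for xi, yi in zip(x, target)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
sample = rectified_flow_sample(toy_velocity, noise)
```

The toy velocity (target − x)/(1 − t) is the exact velocity of a straight noise-to-target path, so Euler integration lands on the target at t = 1. A learned velocity field only makes paths approximately straight, so the number of Euler steps trades speed against fidelity. Flow models also lack cheap exact log-likelihoods, so the log-probability arguments to `dpo_loss` stand in for whatever per-sample proxy a training recipe uses.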

154 stars. No commits in the last 6 months.

Stale (6 months)
No package
No dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 15 / 25


Stars: 154

Forks: (not listed)

Language: Python

License: —

Higher-rated alternatives

whitphx/streamlit-stt-app

Real-time, web-based speech-to-text app built with Streamlit

open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to...

saidsef/tika-document-to-text

Apache Tika: extract text and metadata from any document format with this pre-built containerised...

hipnologo/EchoForge_Studio

Multi-LLM writing and voice production workspace built with Streamlit.

SiddhantSadangi/st_deepgram_playground

API playground for Deepgram built with Streamlit
