declare-lab/jamify
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
Implements a rectified-flow generative model with word- and phoneme-level timing control on a compact 530M-parameter DiT backbone, enabling precise specification of vocal prosody in lyrics-to-song generation. Reports 3× lower phoneme and word error rates, attributed to phoneme boundary attention, and applies Direct Preference Optimization on synthetic preference datasets for aesthetic alignment without manual annotation. Inference can be distributed across multiple GPUs via Hugging Face Accelerate, and generation can be conditioned on either a reference audio clip (for style extraction) or a text prompt, for songs up to 3 min 50 s long.
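As a rough illustration of what rectified-flow sampling involves, the sketch below integrates a velocity field from noise (t = 0) to a sample (t = 1) with fixed Euler steps. It is not taken from the JAM codebase: `euler_sample` and `make_oracle_velocity` are hypothetical names, and the "oracle" velocity that points straight at a known target merely stands in for a trained DiT, which would predict the velocity from the noisy input and its conditioning.

```python
def euler_sample(velocity, x0, num_steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target):
    """Toy stand-in for a trained model (assumption, not JAM's network).

    On the straight path x_t = (1 - t) * x0 + t * target, the velocity
    target - x0 equals (target - x_t) / (1 - t) for t < 1.
    """
    def velocity(x, t):
        return [(yi - xi) / (1.0 - t) for xi, yi in zip(x, target)]
    return velocity


noise = [0.3, -1.2, 0.8]    # starting point at t = 0
target = [1.0, 0.0, -1.0]   # point the toy field transports the noise to
sample = euler_sample(make_oracle_velocity(target), noise)
```

Because the toy path is a straight line, the Euler steps stay on it and land on the target after any number of steps; a learned velocity field is only approximately straight, so the step count trades speed against accuracy.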
154 stars. No commits in the last 6 months.
Stars: 154
Forks: 20
Language: Python
License: not listed
Category:
Last pushed: Aug 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/declare-lab/jamify"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
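The same request can be made from Python using only the standard library. This is a minimal sketch with assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/name` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which this page does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as 'owner/name'."""
    owner, name = repo.split("/", 1)
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, name)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed format)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_quality_url("voice-ai", "declare-lab/jamify"))
```

Calling `fetch_quality("voice-ai", "declare-lab/jamify")` performs the network request and counts against the daily request limit.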
Higher-rated alternatives
whitphx/streamlit-stt-app
Real time web based Speech-to-Text app with Streamlit
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to...
saidsef/tika-document-to-text
Apache Tika extract text and metadata from any document format with this pre-built containerised...
hipnologo/EchoForge_Studio
Multi-LLM writing and voice production workspace built with Streamlit.
SiddhantSadangi/st_deepgram_playground
API playground for Deepgram built with Streamlit