JIA-Lab-research/MGM-Omni
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Leverages modality-specific encoders (audio, video, image, text) feeding into a unified MLLM, with speech output generated by a SpeechLM that uses chunk-based parallel decoding and flow-matching vocoding for streaming synthesis. Handles hour-long speech inputs and generates 10+ minutes of coherent audio, and supports zero-shot voice cloning from ~10-second reference clips in Chinese and English. Built on the MiniGemini/Lyra architecture, with open-source model weights on Hugging Face, and includes the Long-TTS-Eval benchmark for evaluating long-form speech synthesis.
Stars
265
Forks
16
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 16, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/MGM-Omni"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
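The same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint shape (`/api/v1/quality/<registry>/<owner>/<repo>`) is taken from the curl example above, but the response fields are not documented here, so the JSON structure is an assumption.

```python
# Minimal sketch of calling the quality API shown above using only the
# Python standard library. The URL shape mirrors the curl example; the
# response schema is an assumption and may differ in practice.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{BASE}/{registry}/{owner}/{repo}"


def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(quality_url(registry, owner, repo)) as resp:
        return json.load(resp)


# Reconstructs the exact URL from the curl example above:
print(quality_url("transformers", "JIA-Lab-research", "MGM-Omni"))
```

Anonymous callers are limited to 100 requests/day, so batch lookups should cache responses or send an API key if one is issued.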