PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration: a toolkit supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox, built for high performance and flexibility.
Built on PaddlePaddle, it provides unified training pipelines across vision-language understanding (LLaVA, Qwen-VL, DeepSeek-VL2), text-to-image generation (Stable Diffusion, FLUX), and video generation, with specialized models such as PP-DocBee for document understanding and PP-VCtrl for video control. The toolkit includes Fast-Diffusers acceleration algorithms (training-free inference optimizations and distillation techniques achieving 2x+ speedups), multi-modal data processing via DataCopilot, and distributed training on GPUs and Ascend 910B chips via PaddlePaddle's 4D hybrid parallelism.
718 stars. Maintained, with 1 commit in the last 30 days.
Stars
718
Forks
224
Language
Python
License
Apache-2.0
Category
diffusion
Last pushed
Mar 06, 2026
Commits (30d)
1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/PaddlePaddle/PaddleMIX"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
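The curl example above can be wrapped in a small Python helper. This is a minimal sketch: the endpoint path comes from the curl example, while the `fetch_quality` helper name and the assumption that the response is JSON are illustrative, not documented API guarantees.

```python
import json
import urllib.request

# Base path taken from the curl example shown on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch a repo's quality data.

    Assumes the endpoint returns JSON; the response schema is not
    documented here, so inspect the result before relying on fields.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


# Reproduces the URL from the curl example above.
print(quality_url("diffusion", "PaddlePaddle", "PaddleMIX"))
```

Unauthenticated calls are limited to 100 requests/day, so cache responses rather than polling in a loop.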
Related models
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
HorizonWind2004/reconstruction-alignment
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal...