microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Built on foundation architectures like DeepNet (1,000+ layer Transformers) and Magneto (general-purpose multimodal modeling), the project implements unified pre-training across diverse modalities including vision (BEiT, DiT), speech (WavLM, VALL-E), and document understanding (LayoutLM series). It emphasizes training stability and efficiency through techniques like sparse Mixture-of-Experts (X-MoE) and length extrapolation, while supporting 100+ languages via models like InfoXLM and DeltaLM for cross-lingual transfer and machine translation.
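Several of these model families publish pre-trained checkpoints that can be loaded through the Hugging Face transformers library. The sketch below assumes the microsoft/beit-base-patch16-224 checkpoint and the standard Auto* loading API; neither is described in this listing, so treat it as an illustrative example rather than the project's own usage pattern.

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image

# Hypothetical example: load a BEiT image-classification checkpoint from the
# Hugging Face Hub and classify a single local image.
checkpoint = "microsoft/beit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")  # any local image file
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])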
Stars: 22,042
Forks: 2,692
Language: Python
License: MIT
Category: transformers
Last pushed: Jan 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/unilm"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
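For programmatic use, the same request can be made from Python. This is a minimal sketch using the requests library; the response schema is not documented here, so the JSON is printed as-is rather than parsed into specific fields.

import requests

# Fetch the repository quality data from the endpoint shown above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/unilm"
response = requests.get(url, timeout=10)
response.raise_for_status()
print(response.json())

How a free API key would be supplied (header or query parameter) is not shown on this page, so it is deliberately omitted from the sketch.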
Related models
jncraton/languagemodels
Explore large language models in 512MB of RAM
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
Cardinal-Operations/ORLM
ORLM: Training Large Language Models for Optimization Modeling