microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Built on foundation architectures like DeepNet (1,000+ layer Transformers) and Magneto (general-purpose multimodal modeling), the project implements unified pre-training across diverse modalities including vision (BEiT, DiT), speech (WavLM, VALL-E), and document understanding (LayoutLM series). It emphasizes training stability and efficiency through techniques like sparse Mixture-of-Experts (X-MoE) and length extrapolation, while supporting 100+ languages via models like InfoXLM and DeltaLM for cross-lingual transfer and machine translation.
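Several of these model families publish pre-trained checkpoints that can be loaded through the Hugging Face transformers library. The sketch below assumes the microsoft/beit-base-patch16-224 checkpoint and the standard Auto* loading API; neither is described in this listing, so treat it as an illustrative example rather than the project's own usage pattern.

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image

# Hypothetical example: load a BEiT image-classification checkpoint from the
# Hugging Face Hub and classify a single local image.
checkpoint = "microsoft/beit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")  # any local image file
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])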
Stars: 22,042
Forks: 2,692
Language: Python
License: MIT
Category: transformers
Last pushed: Jan 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/unilm"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
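For programmatic use, the same request can be made from Python. This is a minimal sketch using the requests library; the response schema is not documented here, so the JSON is printed as-is rather than parsed into specific fields.

import requests

# Fetch the repository quality data from the endpoint shown above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/unilm"
response = requests.get(url, timeout=10)
response.raise_for_status()
print(response.json())

How a free API key would be supplied (header or query parameter) is not shown on this page, so it is deliberately omitted from the sketch.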
Related models
jncraton/languagemodels
Explore large language models in 512MB of RAM
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
Cardinal-Operations/ORLM
ORLM: Training Large Language Models for Optimization Modeling