datawhalechina/tiny-universe

"A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe

48 / 100 (Emerging)

Implements core LLM components from first principles in PyTorch, including Tiny Diffusion for image generation, Tiny Llama3 for pretraining, and a Tiny Transformer architecture, alongside practical systems for RAG, Agent orchestration, and evaluation. The focus is on minimal, interpretable implementations with detailed code comments, so that readers can learn without depending on high-level frameworks such as Hugging Face and can modify each system on their own. Covers the complete pipeline from tokenizer training through inference, plus GraphRAG construction and domain-specific evaluation metrics.

4,598 stars.

No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 20 / 25


Stars: 4,598
Forks: 450
Language: Jupyter Notebook
License: none
Last pushed: Feb 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/datawhalechina/tiny-universe"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
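The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the path pattern for other repositories is assumed to match the curl example, and the response is assumed to be JSON (the page does not document its schema).

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path copied from the curl example; the "rag" segment is kept verbatim.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair.

    Assumption: other repositories follow the same path pattern
    as the single example shown on the page.
    """
    # Percent-encode each path segment so unusual characters stay in place.
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Send a keyless GET request and decode the body.

    Assumption: the body is JSON; its fields are not documented here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("datawhalechina", "tiny-universe")` issues the same request as the curl command. Since keyless access is limited to 100 requests/day, it may be worth caching the result rather than fetching on every run.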