FareedKhan-dev/Building-llama3-from-scratch
LLaMA 3 is one of the most promising open-source models after Mistral; this repository recreates its architecture in a simplified manner.
Implements LLaMA 3's transformer architecture entirely in plain Python without object-oriented abstractions, covering RMSNorm pre-normalization, the SwiGLU activation, rotary positional embeddings (RoPE), and grouped-query attention. Uses OpenAI's tiktoken tokenizer, supports an 8,192-token context length, and scales to the 8B and 70B parameter models on CPU-only setups with 17 GB+ of RAM. Includes step-by-step implementations of tokenization, embeddings, multi-head attention, and inference-time generation, for educational understanding of modern LLM internals.
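To illustrate one of the components listed above, here is a minimal sketch of RMSNorm pre-normalization in NumPy. This is an illustrative example, not the repository's actual code; the function name and shapes are assumptions.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square of the features.
    # Unlike LayerNorm, it does not subtract the mean or add a bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy example: one 4-dimensional activation vector, unit gain weights.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones(4)
y = rms_norm(x, w)
```

After normalization the output has root-mean-square close to 1, which is the property LLaMA relies on when applying this before each attention and feed-forward block.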
203 stars. No commits in the last 6 months.
Stars: 203
Forks: 46
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Aug 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FareedKhan-dev/Building-llama3-from-scratch"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
SPUTNIKAI/LeechTransformer
Leech-Lila: A Geometric Attention Transformer (language model) with Leech Lattice Attention
liangyuwang/Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
viralcode/superGPT
Train your own LLM from scratch
microsoft/Text2Grad
🚀 Text2Grad: Converting natural language feedback into gradient signals for precise model...