FareedKhan-dev/Building-llama3-from-scratch
LLaMA 3 is one of the most promising open-source models after Mistral; this repository recreates its architecture in a simplified manner.
Implements LLaMA 3's transformer architecture entirely in plain Python without object-oriented abstractions, covering RMSNorm pre-normalization, the SwiGLU activation, rotary positional embeddings (RoPE), and grouped-query attention. Uses OpenAI's tiktoken tokenizer, supports an 8,192-token context length, and scales to the 8B and 70B parameter models on CPU-only setups with 17 GB+ of RAM. Includes step-by-step implementations of tokenization, embeddings, multi-head attention, and inference-time generation, for educational understanding of modern LLM internals.
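To illustrate one of the components listed above, here is a minimal sketch of RMSNorm pre-normalization in NumPy. This is an illustrative example, not the repository's actual code; the function name and shapes are assumptions.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square of the features.
    # Unlike LayerNorm, it does not subtract the mean or add a bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy example: one 4-dimensional activation vector, unit gain weights.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones(4)
y = rms_norm(x, w)
```

After normalization the output has root-mean-square close to 1, which is the property LLaMA relies on when applying this before each attention and feed-forward block.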
203 stars. No commits in the last 6 months.
Stars: 203
Forks: 46
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Aug 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FareedKhan-dev/Building-llama3-from-scratch"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
SPUTNIKAI/LeechTransformer
Leech-Lila: A Geometric Attention Transformer (language model) with Leech Lattice Attention
liangyuwang/Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
viralcode/superGPT
Train your own LLM from scratch
microsoft/Text2Grad
🚀 Text2Grad: Converting natural language feedback into gradient signals for precise model...