bassrehab/speculative-decoding

Reference implementation of LLM inference acceleration techniques. Includes speculative decoding, tree speculation, EAGLE, Medusa, KV-cache compression, and diffusion model efficiency with roofline analysis.

/ 100

Experimental

No package · No dependents

Maintenance 6 / 25

Adoption 1 / 25

Maturity 9 / 25

Community 0 / 25


Stars

Forks

—

Language

Python

License

MIT

Category

llm-implementation-from-scratch

Last pushed

Dec 17, 2025

Commits (30d)


LLM Implementation From Scratch · 44 models

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bassrehab/speculative-decoding"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
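The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the base URL comes from the curl example above, but the path layout `/api/v1/quality/<category>/<owner>/<repo>` is inferred from that single example, the helpers `quality_url` and `fetch_quality` are hypothetical names, and the JSON response body is assumed rather than documented here. How an API key is passed is not stated on this page, so the sketch makes anonymous requests only.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example; the <category>/<owner>/<repo> path
# layout below is an assumption inferred from that one example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-data URL for a GitHub repository (hypothetical helper)."""
    parts = [category, owner, repo]
    # Percent-encode each path segment so unusual repo names stay valid URLs.
    return BASE_URL + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch quality data anonymously, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("bassrehab", "speculative-decoding")` issues the same GET request as the curl command and counts against the anonymous limit of 100 requests/day.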

Higher-rated alternatives

rasbt/LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

facebookresearch/LayerSkip

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

FareedKhan-dev/train-llm-from-scratch

A straightforward method for training your LLM, from downloading data to generating text.

kmeng01/rome

Locating and editing factual associations in GPT (NeurIPS 2022)

datawhalechina/llms-from-scratch-cn

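Build a large language model from scratch with only basic Python; build GLM4, Llama3, and RWKV6 step by step from scratch to gain a deep understanding of how large models work.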
