DongmingShenDS/Mistral_From_Scratch
Mistral and Mixtral (MoE) from scratch
This project helps machine learning engineers and researchers understand and build large language models (LLMs) from the ground up. It provides step-by-step implementations of the Mistral and Mixtral (Mixture of Experts) architectures, including key components like RoPE, RMSNorm, and various attention mechanisms. Anyone looking to dive deep into the mechanics of modern LLMs will find this a useful reference.
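To illustrate two of the components named above, here is a minimal pure-Python sketch of RMSNorm and rotary position embeddings (RoPE). This is an illustrative sketch of the standard formulations, not code taken from the repository; function names, the epsilon value, and the base of 10000 are conventional defaults, not confirmed details of this implementation.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def rope(x, pos, base=10000.0):
    # Rotary position embedding: rotate each (even, odd) coordinate pair
    # by an angle pos * theta_i, where theta_i decays with the pair index.
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = base ** (-i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Note that at position 0 RoPE is the identity, and at every position it preserves the vector's norm, since each pair is only rotated.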
No commits in the last 6 months.
Use this if you are a machine learning engineer, researcher, or student who wants to learn the fundamental building blocks of advanced large language models by implementing them yourself.
Not ideal if you are an end-user simply looking to apply or fine-tune existing LLMs without needing to understand their internal mechanics.
Stars: 9
Forks: —
Language: Python
License: —
Category: —
Last pushed: May 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/DongmingShenDS/Mistral_From_Scratch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
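The same endpoint shown in the curl command above can be queried from Python. A minimal sketch using only the standard library follows; it assumes the endpoint returns JSON, which is not documented here, so the response is just printed as-is.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    # Build the endpoint URL for an "owner/name" repo slug.
    return f"{BASE}/{repo}"

if __name__ == "__main__":
    # Network call; assumes a JSON response body (schema not documented here).
    url = quality_url("DongmingShenDS/Mistral_From_Scratch")
    with urllib.request.urlopen(url) as resp:
        print(json.loads(resp.read()))
```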
Higher-rated alternatives
mistralai/mistral-inference
Official inference library for Mistral models
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
open-compass/MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
vicuna-tools/vicuna-installation-guide
The "vicuna-installation-guide" provides step-by-step instructions for installing and...
pleisto/yuren-13b
Yuren 13B is an information synthesis large language model that has been continuously trained...