LLM Scaling Architecture (LLM Tools)

Research implementations and codebases focused on scaling language models across languages, sequence lengths, and parameters, including multilingual adaptation, embedding optimization, and architectural innovations for handling massive model capacity. Excludes deployment infrastructure, inference optimization, and general LLM applications.

There are 49 LLM scaling architecture tools tracked. One scores 50 or above (Established tier): the highest-rated, aalok-sathe/surprisal, at 50/100 with 51 stars and 240 monthly downloads. One of the top 10 is actively maintained.

Get the projects as JSON. The example request below sets `limit=20`, so it returns at most 20 of the 49 projects:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-scaling-architecture&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
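The curl request above sets `limit=20`. For scripting, here is a minimal Python sketch using only the standard library. It assumes that `limit` caps the number of returned projects (so 49 would cover the whole list) and makes no assumption about the JSON schema, which this page does not document. The helper names `build_query_url` and `fetch_projects` are hypothetical, and because the page does not say how the free API key is supplied, the sketch leaves it out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(limit: int = 20) -> str:
    """Build the dataset URL. Assumption: `limit` caps how many projects are returned."""
    params = {
        "domain": "llm-tools",
        "subcategory": "llm-scaling-architecture",
        "limit": limit,
    }
    # urlencode keeps insertion order and leaves hyphens unescaped.
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit: int = 49):
    """Perform the request and return the decoded JSON unchanged (schema not documented here).

    No API key is sent: the unauthenticated tier allows 100 requests/day.
    """
    with urlopen(build_query_url(limit), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to actually hit the network.
    print(build_query_url(49))
```

Calling `fetch_projects(49)` performs the network request and returns the decoded JSON as-is. Inspect its structure before relying on specific fields.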

#	Tool and description	Score (/100)	Tier	Stars	Language
1	aalok-sathe/surprisal A unified interface for computing surprisal (log probabilities) from...	50	Established	51	Python
2	EvolvingLMMs-Lab/lmms-engine A simple, unified multimodal models training engine. Lean, flexible, and...	43	Emerging	740	Python
3	FunnySaltyFish/Better-Ruozhiba [Item-by-item processing complete] QA dataset of selected Ruozhiba questions, each manually reviewed and revised	38	Emerging	253	—
4	reasoning-machines/pal PaL: Program-Aided Language Models (ICML 2023)	38	Emerging	518	Python
5	microsoft/monitors4codegen Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of...	36	Emerging	280	Python
6	apenab/pyrlm-runtime Minimal runtime for Recursive Language Models (RLMs) inspired by the MIT...	33	Emerging	14	Python
7	JKevin17/TM-LLM The official code for "(ISCC 2025) Network Traffic Matrix Imputation via...	33	Emerging	6	Python
8	YutongWang1216/DocMTAgent Code and data releases for the paper -- DelTA: An Online Document-Level...	32	Emerging	59	Roff
9	FreedomIntelligence/EchoX EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for...	32	Emerging	47	Python
10	nercone-dev/zeta-llm-tool Fully Open-source LLM Tool	31	Emerging	5	Python
11	merantix-momentum/acip 🗜️Codebase of the ACIP algorithm 🗜️	30	Emerging	16	Python
12	Mxoder/Maxs-Awesome-Datasets Max's interesting datasets (Max's awesome datasets)	27	Experimental	68	—
13	ch3njust1n/smart Self-modifying code at runtime with Large Language Models	26	Experimental	7	Python
14	Kitsunp/Prueba-de-modelo-de-ByteLatentTransformer This is a proof of concept of the aforementioned Meta paper, along with other...	26	Experimental	8	Python
15	nitinvetcha/DeGAML-LLM DeGAML-LLM: Decoupling Generalization and Adaptation in Meta-Learning for...	25	Experimental	16	Python
16	ZetangForward/CSA-GEC This is the official code for "Beyond Hard Samples: Robust and Effective...	24	Experimental	3	Python
17	farukalpay/ISO-639-2023 large language model	24	Experimental	1	—
18	fatemafaria142/Large-Language-Models-Over-Transformer-Models-for-Bangla-NLI This research examines the performance of Large Language Models (GPT-3.5...	24	Experimental	3	Jupyter Notebook
19	zhiyuanpeng/SPTAR Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models	23	Experimental	16	Jupyter Notebook
20	Y-debug-sys/LMTE [INFOCOM 2026] Official Implementation of "LMTE: Putting the {Reasoning}...	22	Experimental	4	Python
21	burcgokden/PLDR-LLM-Self-Organized-Criticality Code used in paper titled "PLDR-LLMs Reason at Self-Organized Criticality"	22	Experimental	—	Python
22	zjunlp/LookAheadTuning [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews	21	Experimental	17	Python
23	LARK-AI-Lab/CodeScaler The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time...	21	Experimental	32	Python
24	lime9903/SemanticHAR LLM-based Human Activity Recognition System	20	Experimental	1	Python
25	Dahouabdelhalim/CodeSeg Replication code for "Semantic Code Segmentation with Language Models"...	20	Experimental	1	Jupyter Notebook
26	GeorgeVern/qe-fusion This repo contains the code for the paper "Don't Rank, Combine! Combining...	19	Experimental	5	Python
27	hmyousuf2010/bodh A morphology-aware Bengali tokenizer for large language models.	19	Experimental	—	Rust
28	a-m-team/a-m-models a-m-team's exploration in large language modeling	18	Experimental	194	—
29	ictnlp/StreamUni StreamUni is a framework that efficiently enables unified Large...	17	Experimental	19	Python
30	Lucky-Wang-Chenlong/CodeSync [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code...	17	Experimental	25	Python
31	mllpresearch/ESO-dataset ESO speech dataset: an English-language speech corpus of the oncology domain...	17	Experimental	2	—
32	PrithwishJana/CoTran Official repository for CoTran: An LLM-based code translator for...	16	Experimental	16	Java
33	WSE-research/Code2Code-Translations-using-LLMs-ENASE-2026 The repository to the paper Code2Code Translations using LLMs	16	Experimental	2	Python
34	originaonxi/prm-replication Live proof of arXiv:2603.17815 — O(N) confirmed R²=0.952, 1,984 API calls	15	Experimental	1	Python
35	Jaso1024/Semantic-Code-Embeddings IEEE 2023 | SCALE: Semantic Code Analysis via Learned Embeddings	15	Experimental	2	Python
36	aakarsh/rl-llm-calibration-test Attempt at replication of the parts of the paper "Language models (mostly)...	14	Experimental	1	Jupyter Notebook
37	JingyingHu/ChineseL2Writing-Surprisals Materials and code for Hu and Cong (2025) - Modeling Chinese L2 Writing...	14	Experimental	3	R
38	AidanCooper/constrained-decoding A guide to structured generation using constrained decoding	14	Experimental	14	Jupyter Notebook
39	sky24h/Training-Free_Zero-Shot_Semantic_Segmentation_with_LLM_Refinement This repository contains the official implementation of the paper "Training-Free...	13	Experimental	5	Jupyter Notebook
40	tony10101105/ExpEmergence [ICLR'25] U-shaped and Inverted-U Scaling behind Emergent Abilities of Large...	12	Experimental	3	Python
41	sunwang-ai-linguist/bilingual-rlhf-semantic-repair-corpus Daily Mandarin-English semantic alignment corpus for RLHF training, tone...	11	Experimental	—	Python
42	lindeng0/Replication-of-LARGE-LANGUAGE-MODELS-AN-APPLIED-ECONOMETRIC-FRAMEWORK Replication of LLM econometric framework: leakage checks, prompt/model...	11	Experimental	—	Jupyter Notebook
43	Vidit-Ostwal/RLM-demo Recursive Language Model Demo	11	Experimental	—	TypeScript
44	aliasgar-m/Inventory-Opt-LLM A comparison between Large Language Models for Inventory Optimization	11	Experimental	—	Python
45	ymgw55/repro-superposition Unofficial implementation to reproduce the experiments from "Superposition...	11	Experimental	—	Jupyter Notebook
46	isaacwiafe/speech_data_ghana_ug The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani,...	10	Experimental	1	HTML
47	ikeasamoahansah/univ-model A Universal Document Understanding Model (UDUM) which accepts various file types	10	Experimental	1	Jupyter Notebook
48	MaLA-LM/emma-500 EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models	10	Experimental	4	Python
49	vitorhcsousa/llm-w-mlx Large Language Models with MLX	10	Experimental	1	Python