LLM Scaling Architecture LLM Tools
Research implementations and codebases focused on scaling language models across languages, sequence lengths, and parameters—including multilingual adaptation, embedding optimization, and architectural innovations for handling massive model capacity. Does NOT include deployment infrastructure, inference optimization, or general LLM applications.
There are 49 LLM scaling architecture tools tracked. 1 scores 50 or above (Established tier). The highest-rated is aalok-sathe/surprisal at 50/100 with 51 stars and 240 monthly downloads. 1 of the top 10 is actively maintained.
Get all 49 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-scaling-architecture&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
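The same request can be made from Python. The following is a minimal sketch using only the standard library; it reuses the endpoint and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed payload as-is rather than assuming any field names. It also keeps `limit=20` from the curl example, since the accepted range of `limit` is not stated.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-scaling-architecture", limit=20):
    """Build the request URL (hypothetical helper; parameters mirror the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and parse the body as JSON.

    The response schema is an unknown here, so the parsed object
    is returned unchanged instead of being unpacked.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request quota, so it is worth caching the result locally when experimenting.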
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | aalok-sathe/surprisal: A unified interface for computing surprisal (log probabilities) from... | | Established |
| 2 | EvolvingLMMs-Lab/lmms-engine: A simple, unified multimodal models training engine. Lean, flexible, and... | | Emerging |
| 3 | FunnySaltyFish/Better-Ruozhiba: [Item-by-item processing complete] A QA dataset of curated Ruozhiba questions, with every entry manually reviewed and revised | | Emerging |
| 4 | reasoning-machines/pal: PaL: Program-Aided Language Models (ICML 2023) | | Emerging |
| 5 | microsoft/monitors4codegen: Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of... | | Emerging |
| 6 | apenab/pyrlm-runtime: Minimal runtime for Recursive Language Models (RLMs) inspired by the MIT... | | Emerging |
| 7 | JKevin17/TM-LLM: The official code for "(ISCC 2025) Network Traffic Matrix Imputation via... | | Emerging |
| 8 | YutongWang1216/DocMTAgent: Code and data releases for the paper -- DelTA: An Online Document-Level... | | Emerging |
| 9 | FreedomIntelligence/EchoX: EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for... | | Emerging |
| 10 | nercone-dev/zeta-llm-tool: Fully Open-source LLM Tool | | Emerging |
| 11 | merantix-momentum/acip: 🗜️Codebase of the ACIP algorithm 🗜️ | | Emerging |
| 12 | Mxoder/Maxs-Awesome-Datasets: Max's awesome datasets | | Experimental |
| 13 | ch3njust1n/smart: Self-modifying code at runtime with Large Language Models | | Experimental |
| 14 | Kitsunp/Prueba-de-modelo-de-ByteLatentTransformer: This is a proof of concept of the referenced Meta paper along with other... | | Experimental |
| 15 | nitinvetcha/DeGAML-LLM: DeGAML-LLM: Decoupling Generalization and Adaptation in Meta-Learning for... | | Experimental |
| 16 | ZetangForward/CSA-GEC: This is the official code for "Beyond Hard Samples: Robust and Effective... | | Experimental |
| 17 | farukalpay/ISO-639-2023: large language model | | Experimental |
| 18 | fatemafaria142/Large-Language-Models-Over-Transformer-Models-for-Bangla-NLI: This research examines the performance of Large Language Models (GPT-3.5... | | Experimental |
| 19 | zhiyuanpeng/SPTAR: Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models | | Experimental |
| 20 | Y-debug-sys/LMTE: [INFOCOM 2026] Official Implementation of "LMTE: Putting the {Reasoning}... | | Experimental |
| 21 | burcgokden/PLDR-LLM-Self-Organized-Criticality: Code used in paper titled "PLDR-LLMs Reason at Self-Organized Criticality" | | Experimental |
| 22 | zjunlp/LookAheadTuning: [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews | | Experimental |
| 23 | LARK-AI-Lab/CodeScaler: The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time... | | Experimental |
| 24 | lime9903/SemanticHAR: LLM-based Human Activity Recognition System | | Experimental |
| 25 | Dahouabdelhalim/CodeSeg: Replication code for "Semantic Code Segmentation with Language Models"... | | Experimental |
| 26 | GeorgeVern/qe-fusion: This repo contains the code for the paper "Don't Rank, Combine! Combining... | | Experimental |
| 27 | hmyousuf2010/bodh: A morphology-aware Bengali tokenizer for large language models. | | Experimental |
| 28 | a-m-team/a-m-models: a-m-team's exploration in large language modeling | | Experimental |
| 29 | ictnlp/StreamUni: StreamUni is a framework that efficiently enables unified Large... | | Experimental |
| 30 | Lucky-Wang-Chenlong/CodeSync: [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code... | | Experimental |
| 31 | mllpresearch/ESO-dataset: ESO speech dataset: an English-language speech corpus of the oncology domain... | | Experimental |
| 32 | PrithwishJana/CoTran: Official repository for CoTran: An LLM-based code translator for... | | Experimental |
| 33 | WSE-research/Code2Code-Translations-using-LLMs-ENASE-2026: The repository to the paper Code2Code Translations using LLMs | | Experimental |
| 34 | originaonxi/prm-replication: Live proof of arXiv:2603.17815 — O(N) confirmed R²=0.952, 1,984 API calls | | Experimental |
| 35 | Jaso1024/Semantic-Code-Embeddings: IEEE 2023 \| SCALE: Semantic Code Analysis via Learned Embeddings | | Experimental |
| 36 | aakarsh/rl-llm-calibration-test: Attempt at replication of the parts of the paper "Language models (mostly)... | | Experimental |
| 37 | JingyingHu/ChineseL2Writing-Surprisals: Materials and code for Hu and Cong (2025) - Modeling Chinese L2 Writing... | | Experimental |
| 38 | AidanCooper/constrained-decoding: A guide to structured generation using constrained decoding | | Experimental |
| 39 | sky24h/Training-Free_Zero-Shot_Semantic_Segmentation_with_LLM_Refinement: This repository contains official implementation of the paper "Training-Free... | | Experimental |
| 40 | tony10101105/ExpEmergence: [ICLR'25] U-shaped and Inverted-U Scaling behind Emergent Abilities of Large... | | Experimental |
| 41 | sunwang-ai-linguist/bilingual-rlhf-semantic-repair-corpus: Daily Mandarin-English semantic alignment corpus for RLHF training, tone... | | Experimental |
| 42 | lindeng0/Replication-of-LARGE-LANGUAGE-MODELS-AN-APPLIED-ECONOMETRIC-FRAMEWORK: Replication of LLM econometric framework: leakage checks, prompt/model... | | Experimental |
| 43 | Vidit-Ostwal/RLM-demo: Recursive Language Model Demo | | Experimental |
| 44 | aliasgar-m/Inventory-Opt-LLM: A comparison between Large Language Models for Inventory Optimization | | Experimental |
| 45 | ymgw55/repro-superposition: Unofficial implementation to reproduce the experiments from "Superposition... | | Experimental |
| 46 | isaacwiafe/speech_data_ghana_ug: The dataset comprises of 5000 hours speech corpus in Akan, Ewe, Dagbani,... | | Experimental |
| 47 | ikeasamoahansah/univ-model: A Universal Document Understanding Model (UDUM) which accepts various file types | | Experimental |
| 48 | MaLA-LM/emma-500: EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models | | Experimental |
| 49 | vitorhcsousa/llm-w-mlx: Large Language Models with MLX | | Experimental |