LLM Scaling Architecture LLM Tools

Research implementations and codebases focused on scaling language models across languages, sequence lengths, and parameters—including multilingual adaptation, embedding optimization, and architectural innovations for handling massive model capacity. Does NOT include deployment infrastructure, inference optimization, or general LLM applications.

There are 49 LLM scaling architecture tools tracked. One scores 50 or above (Established tier): the highest-rated, aalok-sathe/surprisal, at 50/100 with 51 stars and 240 monthly downloads. One of the top 10 is actively maintained.

Get the projects as JSON (the request below sets limit=20, so it returns at most 20 of the 49):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-scaling-architecture&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
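A minimal Python sketch of the same request, using only the standard library. It assumes the endpoint returns a JSON body; the response schema is not documented on this page, so the parsed value is returned unchanged. The helper names build_url and fetch are hypothetical. How a free key is passed is not described here either, so the sketch sends keyless requests, which count against the 100/day allowance.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl example (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch(domain: str, subcategory: str, limit: int = 20, timeout: float = 30.0):
    """GET the endpoint and decode the body as JSON (hypothetical helper).

    No key is sent, so each call counts toward the keyless 100 requests/day.
    The schema is unknown, so the decoded value is returned as-is.
    """
    url = build_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL only; call fetch(...) yourself to make a live request.
    print(build_url("llm-tools", "llm-scaling-architecture"))
```

Passing a larger limit (for example 49) may return the full list, but whether the endpoint honors values above 20 is an assumption, not something this page states.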

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | aalok-sathe/surprisal | A unified interface for computing surprisal (log probabilities) from... | 50 | Established |
| 2 | EvolvingLMMs-Lab/lmms-engine | A simple, unified multimodal models training engine. Lean, flexible, and... | 43 | Emerging |
| 3 | FunnySaltyFish/Better-Ruozhiba | [Item-by-item processing complete] A QA dataset of curated Ruozhiba questions, with every entry manually reviewed and revised | 38 | Emerging |
| 4 | reasoning-machines/pal | PaL: Program-Aided Language Models (ICML 2023) | 38 | Emerging |
| 5 | microsoft/monitors4codegen | Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of... | 36 | Emerging |
| 6 | apenab/pyrlm-runtime | Minimal runtime for Recursive Language Models (RLMs) inspired by the MIT... | 33 | Emerging |
| 7 | JKevin17/TM-LLM | The official code for "(ISCC 2025) Network Traffic Matrix Imputation via... | 33 | Emerging |
| 8 | YutongWang1216/DocMTAgent | Code and data releases for the paper "DelTA: An Online Document-Level... | 32 | Emerging |
| 9 | FreedomIntelligence/EchoX | EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for... | 32 | Emerging |
| 10 | nercone-dev/zeta-llm-tool | Fully Open-source LLM Tool | 31 | Emerging |
| 11 | merantix-momentum/acip | 🗜️ Codebase of the ACIP algorithm 🗜️ | 30 | Emerging |
| 12 | Mxoder/Maxs-Awesome-Datasets | Max's awesome datasets | 27 | Experimental |
| 13 | ch3njust1n/smart | Self-modifying code at runtime with Large Language Models | 26 | Experimental |
| 14 | Kitsunp/Prueba-de-modelo-de-ByteLatentTransformer | This is a proof of concept of the aforementioned Meta paper, along with other... | 26 | Experimental |
| 15 | nitinvetcha/DeGAML-LLM | DeGAML-LLM: Decoupling Generalization and Adaptation in Meta-Learning for... | 25 | Experimental |
| 16 | ZetangForward/CSA-GEC | This is the official code for "Beyond Hard Samples: Robust and Effective... | 24 | Experimental |
| 17 | farukalpay/ISO-639-2023 | large language model | 24 | Experimental |
| 18 | fatemafaria142/Large-Language-Models-Over-Transformer-Models-for-Bangla-NLI | This research examines the performance of Large Language Models (GPT-3.5... | 24 | Experimental |
| 19 | zhiyuanpeng/SPTAR | Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models | 23 | Experimental |
| 20 | Y-debug-sys/LMTE | [INFOCOM 2026] Official Implementation of "LMTE: Putting the Reasoning... | 22 | Experimental |
| 21 | burcgokden/PLDR-LLM-Self-Organized-Criticality | Code used in the paper "PLDR-LLMs Reason at Self-Organized Criticality" | 22 | Experimental |
| 22 | zjunlp/LookAheadTuning | [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews | 21 | Experimental |
| 23 | LARK-AI-Lab/CodeScaler | The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time... | 21 | Experimental |
| 24 | lime9903/SemanticHAR | LLM-based Human Activity Recognition System | 20 | Experimental |
| 25 | Dahouabdelhalim/CodeSeg | Replication code for "Semantic Code Segmentation with Language Models"... | 20 | Experimental |
| 26 | GeorgeVern/qe-fusion | This repo contains the code for the paper "Don't Rank, Combine! Combining... | 19 | Experimental |
| 27 | hmyousuf2010/bodh | A morphology-aware Bengali tokenizer for large language models. | 19 | Experimental |
| 28 | a-m-team/a-m-models | a-m-team's exploration in large language modeling | 18 | Experimental |
| 29 | ictnlp/StreamUni | StreamUni is a framework that efficiently enables unified Large... | 17 | Experimental |
| 30 | Lucky-Wang-Chenlong/CodeSync | [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code... | 17 | Experimental |
| 31 | mllpresearch/ESO-dataset | ESO speech dataset: an English-language speech corpus of the oncology domain... | 17 | Experimental |
| 32 | PrithwishJana/CoTran | Official repository for CoTran: An LLM-based code translator for... | 16 | Experimental |
| 33 | WSE-research/Code2Code-Translations-using-LLMs-ENASE-2026 | The repository for the paper "Code2Code Translations using LLMs" | 16 | Experimental |
| 34 | originaonxi/prm-replication | Live proof of arXiv:2603.17815: O(N) confirmed, R²=0.952, 1,984 API calls | 15 | Experimental |
| 35 | Jaso1024/Semantic-Code-Embeddings | (IEEE 2023) SCALE: Semantic Code Analysis via Learned Embeddings | 15 | Experimental |
| 36 | aakarsh/rl-llm-calibration-test | An attempt to replicate parts of the paper "Language models (mostly)... | 14 | Experimental |
| 37 | JingyingHu/ChineseL2Writing-Surprisals | Materials and code for Hu and Cong (2025): Modeling Chinese L2 Writing... | 14 | Experimental |
| 38 | AidanCooper/constrained-decoding | A guide to structured generation using constrained decoding | 14 | Experimental |
| 39 | sky24h/Training-Free_Zero-Shot_Semantic_Segmentation_with_LLM_Refinement | This repository contains the official implementation of the paper "Training-Free... | 13 | Experimental |
| 40 | tony10101105/ExpEmergence | [ICLR'25] U-shaped and Inverted-U Scaling behind Emergent Abilities of Large... | 12 | Experimental |
| 41 | sunwang-ai-linguist/bilingual-rlhf-semantic-repair-corpus | Daily Mandarin-English semantic alignment corpus for RLHF training, tone... | 11 | Experimental |
| 42 | lindeng0/Replication-of-LARGE-LANGUAGE-MODELS-AN-APPLIED-ECONOMETRIC-FRAMEWORK | Replication of an LLM econometric framework: leakage checks, prompt/model... | 11 | Experimental |
| 43 | Vidit-Ostwal/RLM-demo | Recursive Language Model Demo | 11 | Experimental |
| 44 | aliasgar-m/Inventory-Opt-LLM | A comparison of Large Language Models for inventory optimization | 11 | Experimental |
| 45 | ymgw55/repro-superposition | Unofficial implementation to reproduce the experiments from "Superposition... | 11 | Experimental |
| 46 | isaacwiafe/speech_data_ghana_ug | The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani,... | 10 | Experimental |
| 47 | ikeasamoahansah/univ-model | A Universal Document Understanding Model (UDUM) that accepts various file types | 10 | Experimental |
| 48 | MaLA-LM/emma-500 | EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models | 10 | Experimental |
| 49 | vitorhcsousa/llm-w-mlx | Large Language Models with MLX | 10 | Experimental |
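The top-ranked entry, aalok-sathe/surprisal, computes surprisal from log probabilities. As a sketch of the quantity itself (not that library's API), the surprisal of a token is its negative log probability given the preceding context, -log2 P(token | context) in bits. The function names below are hypothetical, and the token probabilities are assumed to come from some language model.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of an event with probability `prob`, in bits: -log2(prob)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def surprisal_from_logprob(logprob: float, base: float = 2.0) -> float:
    """Convert a natural-log probability (a common LM output) to surprisal in `base` units."""
    return -logprob / math.log(base)


def sequence_surprisal(token_probs):
    """Per-token surprisals (bits) for one sequence, plus their total.

    The total is the surprisal of the whole sequence, because the log of a
    product of conditional probabilities equals the sum of their logs.
    """
    per_token = [surprisal_bits(p) for p in token_probs]
    return per_token, sum(per_token)


if __name__ == "__main__":
    print(sequence_surprisal([0.5, 0.25, 0.125]))  # ([1.0, 2.0, 3.0], 6.0)
```

Working in bits (base 2) is a convention; passing base=math.e to surprisal_from_logprob gives nats instead.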