Code Model Training AI Coding Tools

Tools and frameworks for pre-training, fine-tuning, and optimizing language models specifically for code generation and programming tasks. Does NOT include inference-only tools, deployment platforms, or general LLM training frameworks.

76 code model training tools are tracked. One scores above 50 (the Established tier). The highest-rated is k4black/codebleu at 58/100, with 130 stars and 5,089 monthly downloads.

Get the projects as JSON (the example request sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-coding&subcategory=code-model-training&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
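The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint returns a JSON body, and since the response schema is not described on this page, the decoded value is returned as-is. The helper names `build_quality_url` and `fetch_quality` are illustrative, not part of any published client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="ai-coding", subcategory="code-model-training", limit=20):
    """Build the query URL; the defaults mirror the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Perform the GET request and decode the body as JSON.

    Assumption: the body is UTF-8 encoded JSON. Its structure is not
    documented here, so no fields are accessed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
url = build_quality_url()
print(url)
```

Calling `fetch_quality(url)` performs the actual request and counts against the daily quota stated above.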

#	Tool (repository and description)	Score (/100)	Tier	Stars	Language
1	k4black/codebleu Pip compatible CodeBLEU metric implementation available for linux/macos/win	58	Established	130	Python
2	LiveCodeBench/LiveCodeBench Official repository for the paper "LiveCodeBench: Holistic and Contamination...	46	Emerging	818	Python
3	EdinburghNLP/code-docstring-corpus Preprocessed Python functions and docstrings for automated code...	41	Emerging	211	Python
4	AS-SiliconMind/SiliconMind-V1 Inference Engine for SiliconMind-V1 Verilog Coding Models	41	Emerging	16	Python
5	hendrycks/apps APPS: Automated Programming Progress Standard (NeurIPS 2021)	39	Emerging	520	Python
6	solis-team/Hydra [FSE 2026] Do Not Treat Code as Natural Language: Implications for...	38	Emerging	5	Python
7	alxschwrz/codex_py2cpp Converts python code into c++ by using OpenAI CODEX.	36	Emerging	505	Python
8	reddy-lab-code-research/PPOCoder Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation...	33	Emerging	117	Python
9	tongye98/Awesome-Code-Benchmark A comprehensive code domain benchmark review of LLM researches.	33	Emerging	208	—
10	bharathsudharsan/OTA-TinyML Code for IEEE Internet Computing Journal paper 'OTA-TinyML: Over the Air...	32	Emerging	29	C++
11	logpai/LogBench A benchmark for logging statement generation.	31	Emerging	26	Python
12	s2e-lab/Code-Smell-Code-Generation Source code for "An Empirical Study of Code Smells in Transformer-based Code...	30	Emerging	11	Python
13	JHansiduYapa/Fine-Tuning-a-Small-Language-Model-for-Cypher-Query-Generation This project fine-tunes Unsloth's Gemma-3 4B IT (4-bit) model to translate...	30	Emerging	6	Jupyter Notebook
14	zorazrw/odex [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation	29	Experimental	49	Python
15	vl2g/floco Flow Chart Image-to-Code Generation	28	Experimental	36	Python
16	code-gen/cscg Code Generation as a Dual Task of Code Summarization.	28	Experimental	30	Jupyter Notebook
17	99EnriqueD/verilog_autocompletion Code implementation for "A Deep Learning Framework for Verilog...	28	Experimental	8	Jupyter Notebook
18	CloudIDEaaS-zz/hydra Hydra is a app generation product. Hydra aims to reduce the "concept to...	28	Experimental	5	JavaScript
19	devashish-gupta/Geode A zero-shot geospatial question answering agent with precise spatiotemporal...	27	Experimental	8	Python
20	Gen-Verse/CURE [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via...	27	Experimental	159	Python
21	s2e-lab/SecurityEval Repository for "SecurityEval Dataset: Mining Vulnerability Examples to...	27	Experimental	85	Python
22	matlab-deep-learning/Deep_Learning_Poker_Player_using_MATLAB_and_Raspberry_Pi This example shows how to use automatic code generation to deploy a deep...	26	Experimental	6	MATLAB
23	martin-wey/cl-code-apis Replication package of the paper "On the Usage of Continual Learning for...	26	Experimental	5	Python
24	formula-code/terminal-bench Evaluation harness for FormulaCode	25	Experimental	4	Python
25	madaan/pie-perf Training language models to make programs faster	25	Experimental	98	Jupyter Notebook
26	formula-code/fc-eval Evaluation harness for FormulaCode	25	Experimental	4	Python
27	WebPAI/Interaction2Code [ASE 2025] Benchmarking MLLM-based Interactive Webpage Code Generation from...	24	Experimental	53	Python
28	Pavansomisetty21/Automated-Code-Generation-and-Execution-Agent-using-LangChain-and-Cohere-LLM In this we implement an agent which generates and executes code using cohere...	23	Experimental	2	Jupyter Notebook
29	adpena/vertigo-lora Domain-specialized LoRA fine-tuning pipeline for Roblox/Luau code generation...	23	Experimental	1	Python
30	skpig/MPSC [ACL 2024] Enhancing Large Language Models in Coding Through...	23	Experimental	6	Python
31	matthewdeanmartin/paipi Pypi search, except the backend is an LLM's pixelated memory of Pypi.	23	Experimental	1	Python
32	yunbow/ai-dev-os-benchmark Benchmark: how AI coding guidelines affect code quality — 3 conditions × 9...	23	Experimental	1	TypeScript
33	HIT-SCIR/Abacus Abacus Code LLM (珠算代码大模型)	22	Experimental	58	—
34	HySonLab/Design2Code Large Language Model in combination with Large Vision Model for the task of...	22	Experimental	10	Python
35	kroq86/honeybadger formal VM benchmark and inspectable reasoning runtime for testing whether...	22	Experimental	—	Python
36	carlos-life/OpenEvolve Evolve algorithms with LLMs. Open-source AlphaEvolve alternative. Uses...	22	Experimental	—	Python
37	sephirxth/LLM_code_test LLM code generation benchmark — Claude vs Gemini vs DeepSeek vs Grok on a...	22	Experimental	—	Python
38	Meisdy/Speech-to-Code-Generation-for-Collaborative-Robots A modular pipeline that lets users program collaborative robots through...	22	Experimental	—	Python
39	Rudra5417/Code-Generator-using-GPT-3 Natural Language to Code	22	Experimental	14	Jupyter Notebook
40	Training-Datasmith/olmo3-code-150m-pretrain Pre-training a ~150M parameter code-specialized language model using OLMo 3...	22	Experimental	—	Jupyter Notebook
41	aswathselvam/Potholes Realtime pothole detection on Android phone's IMU data. SVM model in C++, ...	22	Experimental	3	C
42	sanskar9999/CodeEvolveLLM A framework for using local LLMs (Qwen2.5-coder 7B) that are fine-tuned...	21	Experimental	8	Python
43	aixcoder-plugin/nl2code-dataset Aix-bench, the Java benchmark for code synthesis problem.	20	Experimental	51	Java
44	domaineval/DomainEval DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation...	20	Experimental	14	Python
45	KohlerHECTOR/interpreter-py Implementation of Interpretable and Editable Programmatic Tree Policies for...	20	Experimental	15	Python
46	jszheng21/RACE RACE is a multi-dimensional benchmark for code generation that focuses on...	20	Experimental	12	Python
47	VaibhavYadav/pytorch_pix2code A pytorch Implementation of pix2code	20	Experimental	8	Jupyter Notebook
48	albertusk95/intention-to-code-lstm Source Code Generation Based On User Intention Using LSTM Networks	19	Experimental	19	Python
49	seal-research/OmniCode OmniCode: A Diverse Software Engineering Benchmark for Evaluating Large...	19	Experimental	13	Python
50	medxiaorudan/CodeGeneration Prompt engineering with Langchain and fine-tuning the CodeLlama model. The...	18	Experimental	8	C++
51	CodeEff/ECCO [EMNLP 2024] Code for the paper "ECCO: Can We Improve Model-Generated Code...	18	Experimental	7	Python
52	LiuZeJie97/Code-Generation-From-Flowcharts-with-Texts-A-Benchmark-Dataset-and-An-Approach Code for the paper "Code Generation From Flowcharts with Texts: A Benchmark...	17	Experimental	13	Jupyter Notebook
53	ftrou/Decodifier The Compiler for AI-Generated Software LLMs don’t write code. ...	16	Experimental	1	Python
54	AngelicaArabe/OTA-IOT 🔧 Develop IoT applications with ESP32-S3 using OTA updates, SPIFFS web...	16	Experimental	—	C++
55	ameerkhan9394/ide-ai-benchmark 🚀 Evaluate and compare AI models across multiple IDEs with a comprehensive...	15	Experimental	1	Python
56	LIANGQINGYUAN/Lyra Lyra: A Benchmark for Turducken-Style Code Generation	15	Experimental	15	Python
57	PAN001/LeToRr LeToRr: Learning to Re-rank with Application in Code Generation	14	Experimental	1	—
58	cloudrishi/springboot-ai-generator AI-powered Spring Boot code generator using CodeLlama LLM running locally via Ollama	14	Experimental	—	Python
59	dakshjain-1616/nemotron3-super-vs-gpt5.4-nano Head-to-head benchmark comparing Nemotron and GPT-5.4-nano on code generation tasks	14	Experimental	—	Python
60	ALM3ARQ/character-prefix-conditioning 🔍 Streamline token sampling with character prefix conditioning using a...	14	Experimental	—	Python
61	ada994/prism-bench 🌐 Benchmark models using the PRISM framework and access the FLUX-Reason-6M...	14	Experimental	—	Python
62	jacopotagliabue/LLMs-to-Alloy Example of LLM generated Alloy code for deductive reasoning from English...	14	Experimental	4	Alloy
63	yueyueL/ReliableLM4Code Collections of research, benchmarks and tools towards more robust and...	14	Experimental	30	—
64	przeprogramowani/10x-bench-eval Scoring criteria for 10x-bench (10xbench.ai)	13	Experimental	—	—
65	sssszh/CodePLAN The code repository for the paper “Enhancing Code Generation Performance of...	13	Experimental	8	Python
66	kabirjaipal/Evil-Codes Evil Codes is a repository where you will find many useful code snippets and...	13	Experimental	5	C++
67	Bifrost-Technologies/Prometheus A developer platform for generating complete Solana programs in one-shot...	13	Experimental	—	C#
68	betterenvi/open-dataset Links to awesome open dataset.	12	Experimental	3	—
69	evalops/llmcc LLM-native compiler toolchain - implementing 'LLM ≈ probabilistic compiler'...	12	Experimental	1	TypeScript
70	AshrafMorningstar/omni-code-polyglot A massive, SEO‑optimized collection of 300+ ready‑to‑run code snippets in...	12	Experimental	1	—
71	rajat-kumar-thakur/LLMs-for-Resource-Constrained-Devices This work was done as part of SRIP 2025 Internship, IIT Gandhinagar	11	Experimental	—	Jupyter Notebook
72	navneetprabhakar/telegram-bot-llm Telegram bot with LLM code gen capabilities	11	Experimental	—	Java
73	runaicode/ai-coding-benchmarks Standardized test prompts and benchmarks for evaluating AI coding...	11	Experimental	—	—
74	gokhanercan/gen-atomic An LLM-based code generation framework aims to support a wide range of...	11	Experimental	7	Python
75	falconvn2006/GPasT GPT for Pascal code generation :)	11	Experimental	2	Jupyter Notebook
76	motazsaad/Natural-Language-to-Python Natural Language to Python code Translation	10	Experimental	1	Jupyter Notebook
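For working with the table offline, the rows can be parsed into records. This is a sketch under two assumptions: the columns are tab-separated, and the first space in the tool column separates the repository name from its description. The `ToolRow` class and `parse_row` function are hypothetical names introduced for illustration, and the three sample rows are copied from the table above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MISSING = "\u2014"  # the dash the table uses for an empty Stars or Language cell


@dataclass
class ToolRow:
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: Optional[int]
    language: Optional[str]


def parse_row(line: str) -> ToolRow:
    """Parse one table row (assumed tab-separated, six columns)."""
    rank, tool, score, tier, stars, language = line.rstrip("\n").split("\t")
    # Assumption: the repository name contains no spaces.
    repo, _, description = tool.partition(" ")
    return ToolRow(
        rank=int(rank),
        repo=repo,
        description=description,
        score=int(score),
        tier=tier,
        stars=None if stars == MISSING else int(stars.replace(",", "")),
        language=None if language == MISSING else language,
    )


sample = [
    "1\tk4black/codebleu Pip compatible CodeBLEU metric implementation available for linux/macos/win\t58\tEstablished\t130\tPython",
    "14\tzorazrw/odex [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation\t29\tExperimental\t49\tPython",
    "64\tprzeprogramowani/10x-bench-eval Scoring criteria for 10x-bench (10xbench.ai)\t13\tExperimental\t\u2014\t\u2014",
]

rows = [parse_row(line) for line in sample]
tier_counts = Counter(row.tier for row in rows)
print(tier_counts)
```

Missing star counts and languages become `None` rather than zero or an empty string, so they are not mistaken for real values when sorting or averaging.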