Instruction Tuning Datasets: LLM Tools

Datasets, papers, and resources specifically for instruction tuning and instruction-following in LLMs. Does NOT include general fine-tuning methods, evaluation benchmarks, or model inference tools.

This category tracks 28 instruction tuning dataset tools. Only one scores above 50 (Established tier). The highest-rated is MantisAI/sieves, at 54/100 with 125 stars and 605 monthly downloads.

Get the projects as JSON (note that the request below sets limit=20, fewer than the 28 tracked projects):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=instruction-tuning-datasets&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
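For scripted use, the same request can be issued from Python using only the standard library. This is a minimal sketch under stated assumptions. The helper names build_url and fetch_projects are hypothetical. The response is returned as parsed JSON without assuming any particular fields, because this page does not describe the schema. The sketch also sticks to the anonymous 100 requests/day tier, since the way an API key is supplied is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the request URL for the instruction-tuning-datasets subcategory.

    build_url is a hypothetical helper; the parameter names and values
    mirror the curl example (domain, subcategory, limit).
    """
    params = {
        "domain": "llm-tools",
        "subcategory": "instruction-tuning-datasets",
        "limit": limit,
    }
    # urlencode keeps dict insertion order, so the query string matches the curl URL.
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 10.0):
    """Fetch and decode the JSON response (anonymous tier, no API key).

    fetch_projects is a hypothetical helper. The response structure is
    not documented on this page, so the decoded JSON is returned as-is
    rather than mapped to specific fields.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_projects(limit=28) should request the full list, assuming the endpoint accepts a limit larger than the 20 used in the curl example.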

#	Tool	Description	Score (/100)	Tier	Stars	Language
1	MantisAI/sieves	Plug-and-play document AI with zero-shot models.	54	Established	125	Python
2	xiaoya-li/Instruction-Tuning-Survey	Project for the paper entitled `Instruction Tuning for Large Language...	37	Emerging	230	n/a
3	princeton-pli/STAT	Skill-Targeted Adaptive Training	29	Experimental	16	Python
4	TencentARC-QQ/TagGPT	TagGPT: Large Language Models are Zero-shot Multimodal Taggers	28	Experimental	66	Python
5	rafaelpierre/bullet	bullet: A Zero-Shot / Few-Shot Learning, LLM Based, text classification framework	28	Experimental	12	Jupyter Notebook
6	amazon-science/adaptive-in-context-learning	AdaICL: Which Examples to Annotate of In-Context Learning? Towards Effective...	27	Experimental	20	Python
7	18907305772/Explore-Instruct	EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage...	27	Experimental	5	Python
8	andrewzamai/SLIMER_IT	An Instruction-tuned LLM for zero-shot NER on Italian	26	Experimental	4	Jupyter Notebook
9	Shivanshu-Gupta/in-context-learning	Easy in-context learning experiments with a variety of datasets, LLMs, and...	25	Experimental	1	Python
10	LIN-SHANG/InstructERC	The official realization of InstructERC	24	Experimental	148	Python
11	Lichang-Chen/InstructZero	Official Implementation of InstructZero; the first framework to optimize bad...	23	Experimental	199	Python
12	OpenGVLab/Instruct2Act	Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with...	23	Experimental	373	Python
13	LGDiMaggio/few-shot-fault-diagnosis-multimodal-LLM	Few-shot bearing fault diagnosis using multimodal LLMs and prototypical networks	22	Experimental	4	Python
14	HamedBabaei/author-profiling-pan2023	Symbol Team model for PAN@AP 2023 shared task on Profiling Cryptocurrency...	22	Experimental	1	Python
15	raunak-agarwal/instruction-datasets	Datasets for Instruction Tuning of Large Language Models	21	Experimental	261	n/a
16	basicv8vc/chinese-instruction-datasets-for-llms	Chinese instruction datasets for fine-tuning LLMs	20	Experimental	29	n/a
17	MK2112/conflicting-few-shots	Experiments on how conflicting few-shot examples affect emotion...	19	Experimental	n/a	Python
18	OpenDFM/HeadsUp	[ICML 2025] Code for the paper "Heads up! Large Language Models Can Perform...	18	Experimental	3	Jupyter Notebook
19	snowood1/Zero-Shot-PLOVER	Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political...	18	Experimental	6	Jupyter Notebook
20	MiuLab/InstUPR	Source code of our paper "InstUPR: Instruction-based Unsupervised Passage...	16	Experimental	3	Python
21	A-baoYang/instruction-finetune-datasets	Collect and maintain high quality instruction finetune datasets in different...	15	Experimental	20	n/a
22	andrewzamai/SLIMER	Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines...	14	Experimental	4	Python
23	Reason-Wang/notable-instruction-llm	The repo collects model and data projects for instruction following large...	14	Experimental	1	n/a
24	Showndarya/Few-Shot-ChatGPT	Zero-Shot and Few-shot learning method using ChatGPT on problem sets	13	Experimental	5	Jupyter Notebook
25	mukhal/icl-ensembling	[Me-FoMo ICLR 2023 - Oral] Exploring Demonstration Ensembling for In-context Learning	13	Experimental	5	Python
26	Ghost---Shadow/InSQuaD	InSQuaD is a research framework for efficient in-context learning that...	13	Experimental	2	Python
27	DeperiasKerre/qpInstruct	Instruction Dataset for QCL properties Extraction from Text	11	Experimental	n/a	Python
28	davidandym/Multitask-Transfer-Instruction-Tuning	This is the official code repository for the ACL Findings Paper "Multi-Task...	10	Experimental	1	n/a