Instruction Tuning Datasets LLM Tools

Datasets, papers, and resources specifically for instruction tuning and instruction-following in LLMs. This category does not include general fine-tuning methods, evaluation benchmarks, or model inference tools.

28 instruction tuning dataset tools are tracked. One scores above 50 (Established tier). The highest-rated is MantisAI/sieves at 54/100, with 125 stars and 605 monthly downloads.

Get all 28 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=instruction-tuning-datasets&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
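The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint returns a JSON body, as the curl example suggests. The response schema is not documented here, so the sketch does not parse individual fields. The helper names `build_url` and `fetch_projects` are illustrative, not part of any official client. Note that the curl example passes `limit=20`; whether a larger value such as 28 is accepted is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="instruction-tuning-datasets",
              limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=10):
    """Fetch the listing and return the decoded JSON.

    Assumption: the response body is JSON; its structure is not
    documented in this page, so it is returned as-is.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects(limit=28)  # limit=28 is an assumed-valid value
# print(json.dumps(data, indent=2)[:500])
```

Unauthenticated use counts against the 100 requests/day allowance, so it may be worth caching the result locally rather than refetching on every run.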

# | Tool | Description | Score | Tier
1 | MantisAI/sieves | Plug-and-play document AI with zero-shot models. | 54 | Established
2 | xiaoya-li/Instruction-Tuning-Survey | Project for the paper entitled `Instruction Tuning for Large Language... | 37 | Emerging
3 | princeton-pli/STAT | Skill-Targeted Adaptive Training | 29 | Experimental
4 | TencentARC-QQ/TagGPT | TagGPT: Large Language Models are Zero-shot Multimodal Taggers | 28 | Experimental
5 | rafaelpierre/bullet | bullet: A Zero-Shot / Few-Shot Learning, LLM Based, text classification framework | 28 | Experimental
6 | amazon-science/adaptive-in-context-learning | AdaICL: Which Examples to Annotate for In-Context Learning? Towards Effective... | 27 | Experimental
7 | 18907305772/Explore-Instruct | EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage... | 27 | Experimental
8 | andrewzamai/SLIMER_IT | An Instruction-tuned LLM for zero-shot NER on Italian | 26 | Experimental
9 | Shivanshu-Gupta/in-context-learning | Easy in-context learning experiments with a variety of datasets, LLMs, and... | 25 | Experimental
10 | LIN-SHANG/InstructERC | The official implementation of InstructERC | 24 | Experimental
11 | Lichang-Chen/InstructZero | Official Implementation of InstructZero; the first framework to optimize bad... | 23 | Experimental
12 | OpenGVLab/Instruct2Act | Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with... | 23 | Experimental
13 | LGDiMaggio/few-shot-fault-diagnosis-multimodal-LLM | Few-shot bearing fault diagnosis using multimodal LLMs and prototypical networks | 22 | Experimental
14 | HamedBabaei/author-profiling-pan2023 | Symbol Team model for PAN@AP 2023 shared task on Profiling Cryptocurrency... | 22 | Experimental
15 | raunak-agarwal/instruction-datasets | Datasets for Instruction Tuning of Large Language Models | 21 | Experimental
16 | basicv8vc/chinese-instruction-datasets-for-llms | Chinese instruction datasets for fine-tuning LLMs | 20 | Experimental
17 | MK2112/conflicting-few-shots | Experiments on how conflicting few-shot examples affect emotion... | 19 | Experimental
18 | OpenDFM/HeadsUp | [ICML 2025] Codes for the paper "Heads up! Large Language Models Can Perform... | 18 | Experimental
19 | snowood1/Zero-Shot-PLOVER | Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political... | 18 | Experimental
20 | MiuLab/InstUPR | Source code of our paper "InstUPR: Instruction-based Unsupervised Passage... | 16 | Experimental
21 | A-baoYang/instruction-finetune-datasets | Collect and maintain high quality instruction finetune datasets in different... | 15 | Experimental
22 | andrewzamai/SLIMER | Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines... | 14 | Experimental
23 | Reason-Wang/notable-instruction-llm | The repo collects model and data projects for instruction following large... | 14 | Experimental
24 | Showndarya/Few-Shot-ChatGPT | Zero-Shot and Few-shot learning method using ChatGPT on problem sets | 13 | Experimental
25 | mukhal/icl-ensembling | [Me-FoMo ICLR 2023 - Oral] Exploring Demonstration Ensembling for In-context Learning | 13 | Experimental
26 | Ghost---Shadow/InSQuaD | InSQuaD is a research framework for efficient in-context learning that... | 13 | Experimental
27 | DeperiasKerre/qpInstruct | Instruction Dataset for QCL properties Extraction from Text | 11 | Experimental
28 | davidandym/Multitask-Transfer-Instruction-Tuning | This is the official code repository for the ACL Findings Paper "Multi-Task... | 10 | Experimental