LLM Domain Datasets Transformer Models

There are 16 LLM domain dataset models tracked. One scores above 50 (Established tier). The highest-rated is mlabonne/llm-datasets at 53/100, with 4,319 stars. One of the top 10 is actively maintained.

Get all 16 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
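The request above can be wrapped in a small shell helper so the domain, subcategory, and limit are easy to change. This is a minimal sketch, assuming the endpoint accepts exactly the three query parameters shown in the example request. The function names `build_url` and `fetch_listing` are hypothetical. Values are inserted without URL-encoding, so they should be plain slugs like the ones above.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): build and fetch the quality listing URL.
# Assumption: the endpoint takes only the domain, subcategory, and limit
# query parameters used in the example curl request.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# build_url DOMAIN SUBCATEGORY LIMIT
# Prints the full request URL. Values are not URL-encoded, so pass plain slugs.
build_url() {
  local domain="$1" subcategory="$2" limit="$3"
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "$domain" "$subcategory" "$limit"
}

# fetch_listing DOMAIN SUBCATEGORY LIMIT
# Writes the JSON response body to stdout.
# -f: exit non-zero on an HTTP error status; -s: no progress bar; -S: still show errors.
fetch_listing() {
  curl -fsS "$(build_url "$1" "$2" "$3")"
}
```

For example, `fetch_listing transformers llm-domain-datasets 20 > llm-domain-datasets.json` saves the listing to a file. A limit of 20 covers all 16 projects, so a single request returns the whole category and counts only once against the daily quota.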

Projects by rank, with score (out of 100), tier, and description:

1. mlabonne/llm-datasets (53, Established)
   Curated list of datasets and tools for post-training.
2. malteos/llm-datasets (48, Emerging)
   A collection of datasets for language model pretraining including scripts...
3. magpie-align/magpie (43, Emerging)
   [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs...
4. willxxy/ECG-Bench (41, Emerging)
   A Unified Framework for Benchmarking Generative Electrocardiogram-Language...
5. geobrain-ai/geogalactica (40, Emerging)
   Code and datasets for paper "GeoGalactica: A Scientific Large Language Model...
6. HaoAreYuDong/MachineLearningLM (34, Emerging)
   Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML
7. dsdanielpark/open-llm-datasets (32, Emerging)
   Repository for organizing datasets and papers used in Open LLM.
8. asimsinan/LLM-Research (30, Emerging)
   A collection of LLM-related papers, theses, tools, datasets, courses, open...
9. seedatnabeel/CLLM (29, Experimental)
   Curated LLM (ICML 2024)
10. shahriargolchin/time-travel-in-llms (29, Experimental)
   The official repository for the paper entitled "Time Travel in LLMs: Tracing...
11. artpli/CodeIE (24, Experimental)
   [ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot...
12. sodascience/social_science_inferences_with_llms (23, Experimental)
   Addressing LLM-related measurement error in social science modeling research.
13. OSU-NLP-Group/LLM-IOAA (22, Experimental)
   Code and data for the paper "Large Language Models Achieve Gold Medal...
14. mahadi-nahid/TabSQLify (22, Experimental)
   [NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through...
15. rmovva/LLM-publication-patterns-public (15, Experimental)
   [NAACL 2024] Topics, Authors, and Institutions in Large Language Model...
16. vicgalle/distilled-self-critique (14, Experimental)
   distilled Self-Critique refines the outputs of an LLM with only synthetic data
