LLM Domain Datasets Transformer Models
This category tracks 16 LLM domain dataset projects. One scores above 50 (Established tier): the highest-rated, mlabonne/llm-datasets, at 53/100 with 4,319 stars. One of the top 10 is actively maintained.
Get all 16 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-domain-datasets&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
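The same request can be made from a script. The following is a minimal Python sketch using only the standard library; it assumes the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the result is returned as decoded JSON without assuming any particular fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-domain-datasets", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Fetch the URL and decode the body as JSON.

    The shape of the returned object is an assumption left open on purpose:
    inspect it before relying on specific keys.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily limit, so it is worth caching the result locally when experimenting.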
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | mlabonne/llm-datasets: Curated list of datasets and tools for post-training. | 53 | Established |
| 2 | malteos/llm-datasets: A collection of datasets for language model pretraining including scripts... | | Emerging |
| 3 | magpie-align/magpie: [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs... | | Emerging |
| 4 | willxxy/ECG-Bench: A Unified Framework for Benchmarking Generative Electrocardiogram-Language... | | Emerging |
| 5 | geobrain-ai/geogalactica: Code and datasets for paper "GeoGalactica: A Scientific Large Language Model... | | Emerging |
| 6 | HaoAreYuDong/MachineLearningLM: Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML | | Emerging |
| 7 | dsdanielpark/open-llm-datasets: Repository for organizing datasets and papers used in Open LLM. | | Emerging |
| 8 | asimsinan/LLM-Research: A collection of LLM related papers, thesis, tools, datasets, courses, open... | | Emerging |
| 9 | seedatnabeel/CLLM: Curated LLM (ICML 2024) | | Experimental |
| 10 | shahriargolchin/time-travel-in-llms: The official repository for the paper entitled "Time Travel in LLMs: Tracing... | | Experimental |
| 11 | artpli/CodeIE: [ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot... | | Experimental |
| 12 | sodascience/social_science_inferences_with_llms: Addressing LLM-related measurement error in social science modeling research. | | Experimental |
| 13 | OSU-NLP-Group/LLM-IOAA: Code and data for the paper "Large Language Models Achieve Gold Medal... | | Experimental |
| 14 | mahadi-nahid/TabSQLify: [NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through... | | Experimental |
| 15 | rmovva/LLM-publication-patterns-public: [NAACL 2024] Topics, Authors, and Institutions in Large Language Model... | | Experimental |
| 16 | vicgalle/distilled-self-critique: distilled Self-Critique refines the outputs of a LLM with only synthetic data | | Experimental |