weAIDB/awesome-data-llm

Official Repository of "LLM × DATA" Survey Paper

Score: 52 / 100 (Established)

Organizes research across three interconnected domains: data optimization for LLM training (via the IaaS framework, which addresses inclusiveness, abundance, articulation, and sanitization); LLM/Agent-as-Data-Analyst techniques spanning structured to heterogeneous data modalities; and LLM-enhanced data preparation workflows for cleaning, integration, and enrichment. Curates papers and methodologies covering the full LLM lifecycle, from pretraining and fine-tuning through RAG and agent systems, as well as data infrastructure concerns such as deduplication, filtering, storage formats, and serving optimization. Also covers emerging paradigms around prompt-driven data workflows and agentic preparation systems, together with foundational data-centric approaches for scaling model performance.

740 stars. Actively maintained with 10 commits in the last 30 days.

No license, no package, no dependents.
Maintenance 17 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 17 / 25
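
The four category scores above add up to the overall score of 52 out of 100. The short check below only illustrates that arithmetic; it assumes the overall score is a plain sum of the four categories, which matches the numbers shown here but is not stated on this page as the official formula.

```python
# Category scores as listed above; each category is out of 25.
subscores = {
    "Maintenance": 17,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 17,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
max_total = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {max_total}")  # prints "52 / 100"
```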


Stars: 740
Forks: 66
Language: not listed
License: none
Last pushed: Mar 05, 2026
Commits (30d): 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/weAIDB/awesome-data-llm"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
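
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is copied from the curl example above, and the response is assumed (not confirmed here) to be JSON. How an API key is supplied is not described on this page, so the sketch covers keyless access only.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper, pattern from the curl example)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response.

    Assumption: the endpoint returns JSON. The response schema is not
    documented on this page, so the result is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, counts toward the daily limit):
# data = fetch_quality("llm-tools", "weAIDB", "awesome-data-llm")
# print(json.dumps(data, indent=2))
```

Building the URL in a separate function keeps the request logic testable without network access, and caching results locally helps stay within the 100 requests/day keyless limit.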