amazon-science/synthesizrr
Synthesizing realistic and diverse text-datasets from augmented LLMs
Combines retrieval-augmented generation with LLM-based data synthesis to create diverse training datasets, using spaCy and NLTK for linguistic processing. Distributes computation across Ray clusters with support for large external corpora stored on S3, enabling scalable generation of realistic text examples conditioned on retrieved context.
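The retrieval-conditioned synthesis described above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names (`retrieve`, `build_prompt`, `synthesize`), the token-overlap retriever, and the `echo_llm` stand-in are all hypothetical, chosen only to show the idea of conditioning each generated example on a different retrieved document. A real pipeline would presumably swap in a proper retriever, a real LLM call, and distributed execution.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, corpus, k=2):
    """Toy retriever (assumption): rank documents by token overlap with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc in corpus:
        overlap = sum((query_counts & Counter(tokenize(doc))).values())
        scored.append((overlap, doc))
    # Stable sort keeps corpus order among documents with equal scores.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(label, context):
    """Hypothetical prompt template that grounds generation in one retrieved document."""
    return (
        f"Context:\n{context}\n\n"
        f"Write a realistic {label} example grounded in the context above."
    )


def synthesize(seed, label, corpus, generate, k=2):
    """Produce one synthetic example per retrieved document.

    `generate` is any callable mapping a prompt string to generated text.
    """
    return [
        {"label": label, "context": doc, "text": generate(build_prompt(label, doc))}
        for doc in retrieve(seed, corpus, k)
    ]


def echo_llm(prompt):
    """Stand-in for an LLM call so the sketch runs offline."""
    return f"[generated from {len(prompt)} chars of prompt]"


corpus = [
    "The central bank raised interest rates by a quarter point on Tuesday.",
    "The home team won the championship after a late goal in extra time.",
    "Bond markets fell as investors reacted to the interest rate decision.",
]
examples = synthesize("interest rates news", "business headline", corpus, echo_llm)
for example in examples:
    print(example["label"], "|", example["context"])
```

Because each example is conditioned on a different retrieved document, diversity in the output comes from the corpus rather than from sampling temperature alone.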
Stars
16
Forks
5
Language
Python
License
Apache-2.0
Category
Last pushed
Jan 26, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/synthesizrr"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
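The same request can be made from Python. The sketch below uses only the standard library; it assumes the endpoint returns a JSON body and that the path follows a category/owner/repo pattern, both of which are inferred from the curl example rather than documented here. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (assumed pattern: /<category>/<owner>/<repo>)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it as JSON (assumes a JSON response)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("rag", "amazon-science", "synthesizrr")
print(quality_url("rag", "amazon-science", "synthesizrr"))
```

With the keyless limit of 100 requests/day, caching responses locally is a sensible precaution when polling several repositories.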
Higher-rated alternatives
WangRongsheng/awesome-LLM-resources
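🧑🚀 The world's best summary of LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models) | Summary of the...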
katanaml/sparrow
Structured data extraction and instruction calling with ML, LLM and Vision LLM
luhengshiwo/LLMForEverybody
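LLM knowledge sharing that everyone can understand; a must-read before LLM interviews in the spring/autumn recruitment seasons, so you can talk confidently with your interviewer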
LazyAGI/LazyLLM
The easiest and laziest way to build multi-agent LLM applications.
SylphAI-Inc/AdalFlow
AdalFlow: The library to build & auto-optimize LLM applications.