amazon-science/synthesizrr

Synthesizing realistic and diverse text-datasets from augmented LLMs

40 / 100 (Emerging)

Combines retrieval-augmented generation with LLM-based data synthesis to create diverse training datasets, using spaCy and NLTK for linguistic processing. Distributes computation across Ray clusters with support for large external corpora stored on S3, enabling scalable generation of realistic text examples conditioned on retrieved context.
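
The retrieve-then-generate flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual API: the function names (`retrieve`, `build_prompt`, `synthesize`), the token-overlap scoring, and the `generate` callable standing in for an LLM are all assumptions made for the example. The real project uses spaCy/NLTK, Ray, and S3-hosted corpora, none of which appear here.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (stand-in for spaCy/NLTK)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k corpus documents sharing the most tokens with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        doc_counts = Counter(tokenize(doc))
        return sum(min(query_counts[t], doc_counts[t]) for t in query_counts)

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(corpus, key=overlap, reverse=True)[:k]


def build_prompt(label: str, context: str) -> str:
    """Hypothetical prompt template conditioning generation on retrieved text."""
    return (
        f"Context: {context}\n"
        f"Write a new, realistic example of class '{label}' "
        f"grounded in the context above."
    )


def synthesize(
    seed_queries: List[str],
    corpus: List[str],
    label: str,
    generate: Callable[[str], str],  # assumed LLM wrapper: prompt -> text
    k: int = 1,
) -> List[Dict[str, str]]:
    """Produce one synthetic example per (seed query, retrieved document) pair."""
    examples = []
    for query in seed_queries:
        for doc in retrieve(query, corpus, k):
            text = generate(build_prompt(label, doc))
            examples.append({"label": label, "text": text, "source": doc})
    return examples
```

Because each retrieved document yields its own prompt, the diversity of the output is tied to the diversity of the corpus rather than to sampling temperature alone. Passing a different `generate` callable swaps in whichever model is available.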

No Package, No Dependents
Maintenance 10 / 25
Adoption 6 / 25
Maturity 9 / 25
Community 15 / 25


Stars: 16
Forks: 5
Language: Python
License: Apache-2.0
Last pushed: Jan 26, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/synthesizrr"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
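
For scripted access, the same request can be made from Python with the standard library. This is a sketch under assumptions: the path layout (category, owner, repository) is inferred from the single curl URL above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the three-segment layout is an assumption."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """GET the quality record and decode it, assuming a JSON response body."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("rag", "amazon-science", "synthesizrr")
```

With the keyless limit of 100 requests/day, caching the decoded result locally is advisable for anything that runs on a schedule.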