ZEKE320/llm-dataset-generator
The LLM Dataset Generator is an open-source tool for generating text data compatible with various language models supported by LangChain. You can customize it to meet your specific needs, which makes it a useful resource for researchers, developers, and organizations working on NLP applications.
No commits in the last 6 months.
Stars: 6
Forks: —
Language: Jupyter Notebook
License: CC0-1.0
Category:
Last pushed: Nov 18, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ZEKE320/llm-dataset-generator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
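The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The base URL comes from the curl example above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the `transformers` path part as a category is an assumption, and the response format is not documented here, so the code tries JSON and falls back to raw text.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(segment: str, repo: str) -> str:
    """Build the request URL.

    `segment` is the path part before the repository name ("transformers"
    in the curl example; assumed, not confirmed, to be a category).
    `repo` is the "owner/name" repository identifier.
    """
    return f"{BASE_URL}/{urllib.parse.quote(segment)}/{urllib.parse.quote(repo, safe='/')}"


def fetch_quality(segment: str, repo: str, timeout: float = 10.0):
    """GET the endpoint; return parsed JSON, or the raw text if the body is not JSON."""
    url = build_quality_url(segment, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # The response format is an assumption, so keep the raw body usable.
        return body


# Example call (performs a network request and counts against the daily limit):
# print(fetch_quality("transformers", "ZEKE320/llm-dataset-generator"))
```

With the keyless limit of 100 requests/day, a script that polls many repositories may want to cache responses locally rather than re-fetch them.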
Higher-rated alternatives
VikParuchuri/textbook_quality
Generate textbook-quality synthetic LLM pretraining data
dmanuel64/codablellm
A framework for creating and curating high-quality code datasets tailored for large language models
BhabhaAI/dataformer
Solving data for LLMs - Create quality synthetic datasets!
BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models
This repository contains the source code and synthetic datasets used in the research on scam...
iiis-ai/TemplateMath
[ICLR 2025 DATA-FM] Training and Evaluating Language Models with Template-based Data Generation...