qubasehq/qudata
A comprehensive LLM data processing system designed to transform raw multi-format data into high-quality training datasets optimized for Large Language Models.
No commits in the last 6 months. Available on PyPI.
Stars: 1
Forks: —
Language: Python
License: —
Category:
Last pushed: Aug 22, 2025
Commits (30d): 0
Dependencies: 32
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/qubasehq/qudata"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
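The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the owner/repo path layout is inferred from the single URL shown above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example above. Treating the last two
# path segments as owner/repo is an assumption drawn from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET and parse the body as JSON.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("qubasehq", "qudata")` issues the same GET as the curl command. No API key is sent, so each call counts against the 100 requests/day keyless limit.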
Higher-rated alternatives
allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
waikato-llm/llm-dataset-converter
For converting LLM datasets from one format into another.
refuel-ai/autolabel
Label, clean and enrich text datasets with LLMs.
niclasgriesshaber/llm_patent_pipeline
LLMs for Historical Dataset Construction from Archival Image Scans
cgxjdzz/FeatureForge-LLM
FeatureForge LLM is a Python package that leverages large language models (LLMs) to automate and...