carbonz0/alpaca-chinese-dataset
Alpaca Chinese instruction fine-tuning dataset
The dataset comprises instruction-output pairs produced by machine translation and the self-instruct methodology, formatted identically to the original Alpaca dataset's JSON structure for drop-in compatibility with existing LLM fine-tuning pipelines. Generation combines automated translation of the English Alpaca instructions with self-bootstrapping; cleaning methodologies and keyword-filtering rules are still in development. It targets instruction tuning of Chinese language models, adapting instruction-following capabilities to Mandarin without requiring manual annotation at scale.
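The Alpaca JSON layout the description refers to is a list of objects with "instruction", "input", and "output" fields. The record below is a hypothetical illustration in that layout, not an actual entry from this dataset:

```python
import json

# One record in the original Alpaca JSON layout: a list of objects with
# "instruction", "input", and "output" keys. The Chinese text here is an
# invented example, not taken from the dataset itself.
sample = """[
  {
    "instruction": "将下面的句子翻译成英文。",
    "input": "今天天气很好。",
    "output": "The weather is nice today."
  }
]"""

records = json.loads(sample)
for r in records:
    # Instruction-only records leave "input" as an empty string.
    print(r["instruction"], "->", r["output"])
```

Because the layout matches the original Alpaca release, the file can be fed to Alpaca-compatible fine-tuning scripts without any schema conversion.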
397 stars. No commits in the last 6 months.
Stars: 397
Forks: 24
Language: —
License: —
Category: —
Last pushed: Mar 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/carbonz0/alpaca-chinese-dataset"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
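The same endpoint can be called from Python. The sketch below only builds the request from the URL shown above; the response schema is not documented on this page, so the actual fetch is left as a commented-out step:

```python
import urllib.request

# Public endpoint from the page (100 requests/day without a key).
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/carbonz0/alpaca-chinese-dataset")
req = urllib.request.Request(url, headers={"Accept": "application/json"})
print(req.full_url)

# To actually fetch (requires network access):
#   import json
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       data = json.load(resp)
```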
Higher-rated alternatives
axolotl-ai-cloud/axolotl: Go ahead and axolotl questions
google/paxml: Pax is a Jax-based machine learning framework for training large-scale models. Pax allows for...
JosefAlbers/PVM: Phi-3.5 for Mac: locally-run vision and language models for Apple Silicon
iamarunbrahma/finetuned-qlora-falcon7b-medical: Fine-tuning of the Falcon-7B LLM using QLoRA on a mental health conversational dataset
h2oai/h2o-wizardlm: Open-source implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning