carbonz0/alpaca-chinese-dataset

Alpaca Chinese instruction fine-tuning dataset

31 / 100 (Emerging)

The dataset consists of instruction-output pairs produced by machine translation and the self-instruct method, stored in the same JSON structure as the original Alpaca dataset so that it can be used with existing LLM fine-tuning pipelines without conversion. Generation combines automated translation of English instructions with self-bootstrapping; the cleaning methodology and keyword filtering rules are still in development. It targets instruction tuning of Chinese language models, bringing instruction-following capability to Mandarin without large-scale manual annotation.
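As a rough illustration of what "the same JSON structure as the original Alpaca dataset" means in practice, the sketch below parses records with the `instruction`, `input`, and `output` keys used by the original Alpaca release. That this repository uses exactly those keys is an assumption based on the description above, and the sample records, `load_records`, and `to_prompt` are hypothetical, written only for this example.

```python
import json

# Hypothetical sample in the original Alpaca layout: a JSON array of
# objects with "instruction", "input", and "output" keys. These records
# are invented for illustration and are not taken from the dataset.
SAMPLE = """
[
  {"instruction": "将下面的句子翻译成英文。", "input": "今天天气很好。", "output": "The weather is nice today."},
  {"instruction": "列举三种常见的水果。", "input": "", "output": "苹果、香蕉、橙子。"}
]
"""


def load_records(text):
    """Parse Alpaca-style JSON and keep records that have both an instruction and an output."""
    records = json.loads(text)
    return [r for r in records if r.get("instruction") and r.get("output")]


def to_prompt(record):
    """Join the instruction and the optional input into one prompt string."""
    if record.get("input"):
        return f"{record['instruction']}\n\n{record['input']}"
    return record["instruction"]


records = load_records(SAMPLE)
prompts = [to_prompt(r) for r in records]
```

To read a real file instead of the inline sample, pass its contents to `load_records`, for example `load_records(open(path, encoding="utf-8").read())`, where `path` points at whichever JSON file the repository provides.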

397 stars. No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 13 / 25


Stars: 397
Forks: 24
Language:
License:
Category: llm-fine-tuning
Last pushed: Mar 26, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/carbonz0/alpaca-chinese-dataset"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
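The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the `owner/repo` URL pattern is inferred from the single curl example above, and the response is assumed to be JSON, since its schema is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example; that other repositories follow
# the same owner/repo pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """GET the endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("carbonz0", "alpaca-chinese-dataset")` performs the network request; with the keyless limit of 100 requests/day, it is worth caching the result rather than calling it in a loop.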