carbonz0/alpaca-chinese-dataset
Alpaca Chinese instruction fine-tuning dataset
The dataset comprises instruction-output pairs produced by machine translation and the self-instruct methodology, formatted identically to the original Alpaca dataset's JSON structure for drop-in compatibility with existing LLM fine-tuning pipelines. Generation combines automated translation of the English Alpaca instructions with self-bootstrapping; cleaning methodologies and keyword-filtering rules are still in development. It targets instruction tuning of Chinese language models, adapting instruction-following capabilities to Mandarin without requiring manual annotation at scale.
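The Alpaca JSON layout the description refers to is a list of objects with "instruction", "input", and "output" fields. The record below is a hypothetical illustration in that layout, not an actual entry from this dataset:

```python
import json

# One record in the original Alpaca JSON layout: a list of objects with
# "instruction", "input", and "output" keys. The Chinese text here is an
# invented example, not taken from the dataset itself.
sample = """[
  {
    "instruction": "将下面的句子翻译成英文。",
    "input": "今天天气很好。",
    "output": "The weather is nice today."
  }
]"""

records = json.loads(sample)
for r in records:
    # Instruction-only records leave "input" as an empty string.
    print(r["instruction"], "->", r["output"])
```

Because the layout matches the original Alpaca release, the file can be fed to Alpaca-compatible fine-tuning scripts without any schema conversion.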
397 stars. No commits in the last 6 months.
Stars: 397
Forks: 24
Language: —
License: —
Category: —
Last pushed: Mar 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/carbonz0/alpaca-chinese-dataset"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
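The same endpoint can be called from Python. The sketch below only builds the request from the URL shown above; the response schema is not documented on this page, so the actual fetch is left as a commented-out step:

```python
import urllib.request

# Public endpoint from the page (100 requests/day without a key).
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/carbonz0/alpaca-chinese-dataset")
req = urllib.request.Request(url, headers={"Accept": "application/json"})
print(req.full_url)

# To actually fetch (requires network access):
#   import json
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       data = json.load(resp)
```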
Higher-rated alternatives
axolotl-ai-cloud/axolotl: Go ahead and axolotl questions
google/paxml: Pax is a Jax-based machine learning framework for training large-scale models. Pax allows for...
JosefAlbers/PVM: Phi-3.5 for Mac: locally-run vision and language models for Apple Silicon
iamarunbrahma/finetuned-qlora-falcon7b-medical: Fine-tuning of the Falcon-7B LLM using QLoRA on a mental health conversational dataset
h2oai/h2o-wizardlm: Open-source implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning