charent/Phi2-mini-Chinese

Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for LangChain integration to load a local knowledge base for retrieval-augmented generation (RAG).

38 / 100 · Emerging

Implements a complete training pipeline from tokenizer creation (byte-level BPE) through CLM pretraining and SFT instruction tuning to DPO preference optimization, with Flash Attention 2 acceleration support. Uses BELLE datasets for both pretraining and fine-tuning, with specialized data cleaning and formatting (BOS/EOS tokens, document boundaries) optimized for Chinese text. Integrates with LangChain for RAG applications, enabling retrieval-augmented generation with local knowledge bases through the standard HuggingFace transformers interface.
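The document-boundary formatting mentioned above can be illustrated with a minimal shell sketch. It assumes a cleaned corpus with one document per line and wraps each non-empty document in begin/end markers so boundaries survive when documents are concatenated for CLM pretraining. The marker strings `[BOS]` and `[EOS]` and the function name `add_boundaries` are hypothetical placeholders; this is not how the repository itself implements the step, which would normally rely on the tokenizer's own special tokens.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: wrap each document (one per input line) with
# BOS/EOS markers so document boundaries survive concatenation.
# "[BOS]" and "[EOS]" are placeholder strings, not the repository's tokens.
BOS="[BOS]"
EOS="[EOS]"

add_boundaries() {
  # Read line by line; the "|| [ -n ... ]" also handles a final line without a newline.
  while IFS= read -r doc || [ -n "$doc" ]; do
    # Skip empty lines left over from data cleaning.
    [ -z "$doc" ] && continue
    printf '%s%s%s\n' "$BOS" "$doc" "$EOS"
  done
}

# Usage: two documents separated by a blank line.
printf '今天天气很好。\n\n我喜欢读书。\n' | add_boundaries
```

Each surviving document is printed on its own line with the markers attached, and the blank line is dropped.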

586 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 9 / 25
Community 19 / 25


Stars: 586
Forks: 66
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jul 11, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/charent/Phi2-mini-Chinese"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.