charent/Phi2-mini-Chinese

Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for LangChain integration to load a local knowledge base for retrieval-augmented generation (RAG).

/ 100

Emerging

Implements a complete training pipeline from tokenizer creation (byte-level BPE) through CLM pretraining and SFT instruction tuning to DPO preference optimization, with Flash Attention 2 acceleration support. Uses BELLE datasets for both pretraining and fine-tuning, with specialized data cleaning and formatting (BOS/EOS tokens, document boundaries) optimized for Chinese text. Integrates with LangChain for RAG applications, enabling retrieval-augmented generation with local knowledge bases through the standard HuggingFace transformers interface.
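
The BOS/EOS formatting step for CLM pretraining can be illustrated with a minimal sketch. It assumes each document is already tokenized into integer IDs and that the tokenizer reserves one ID each for BOS and EOS. The IDs `BOS_ID` and `EOS_ID`, the helper name `pack_documents`, and the fixed-length packing strategy are all hypothetical choices made for illustration, not taken from the repository.

```python
from typing import Iterable, List

# Hypothetical special-token IDs; real values come from the trained tokenizer.
BOS_ID = 1
EOS_ID = 2


def pack_documents(docs: Iterable[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate tokenized documents and cut them into fixed-length blocks.

    Each document is wrapped as [BOS] tokens [EOS] so that document
    boundaries remain visible after concatenation. A trailing partial
    block is dropped.
    """
    stream: List[int] = []
    for tokens in docs:
        stream.append(BOS_ID)
        stream.extend(tokens)
        stream.append(EOS_ID)

    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


if __name__ == "__main__":
    docs = [[10, 11, 12], [20, 21], [30]]
    print(pack_documents(docs, block_size=4))
```

Dropping the trailing partial block keeps every training example the same length without padding; a real pipeline might instead pad it or carry it over to the next batch.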

586 stars. No commits in the last 6 months.

Stale (6m), No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 9 / 25

Community 19 / 25


Stars: 586

Forks:

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

run-llama/llama_index

LlamaIndex is the leading document agent and OCR platform

emarco177/documentation-helper

Reference implementation of a RAG-based documentation helper using LangChain, Pinecone, and Tavily..

janus-llm/janus-llm

Leveraging LLMs for modernization through intelligent chunking, iterative prompting and...

JetXu-LLM/llama-github

Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and...

Vasallo94/ObsidianRAG

RAG system to query your Obsidian notes using LangGraph and local LLMs (Ollama)
