OceanPresentChao/llm-corpus

Build an LLM knowledge base from scratch (Build LLM RAG Corpus from scratch)

29 / 100 (Experimental)

Implements a complete RAG pipeline with custom Word2Vec embedding training for Chinese corpora, vector persistence in Qdrant, and flexible model backends supporting both local ChatGLM2-6B deployment and OpenAI APIs. The modular architecture separates document ingestion, embedding generation, vector storage, and inference into independent components configurable via JSON, enabling experimentation with different embedding models and LLM providers without code changes.
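The idea of swapping components through JSON rather than code can be sketched as follows. This is a minimal illustration, not the repository's implementation: the config keys (`embedding`, `llm`, `type`, `dim`), the registries, and every class and function name are hypothetical, and toy stand-ins replace the Word2Vec model, the Qdrant store, and the ChatGLM2-6B/OpenAI backends so the example runs with the standard library alone.

```python
import json
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def make_hashing_embedder(dim: int) -> Callable[[str], Vector]:
    """Toy stand-in for a trained embedding model (this is NOT Word2Vec)."""
    def embed(text: str) -> Vector:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Deterministic bucket: byte sum modulo the vector size.
            vec[sum(token.encode("utf-8")) % dim] += 1.0
        return vec
    return embed


def make_echo_llm() -> Callable[[str], str]:
    """Toy stand-in for an LLM backend: returns the prompt unchanged."""
    return lambda prompt: prompt


# Hypothetical registries: map the "type" string in the config to a factory.
EMBEDDERS: Dict[str, Callable[[dict], Callable[[str], Vector]]] = {
    "hashing": lambda cfg: make_hashing_embedder(cfg["dim"]),
}
LLMS: Dict[str, Callable[[dict], Callable[[str], str]]] = {
    "echo": lambda cfg: make_echo_llm(),
}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Toy stand-in for a vector database such as Qdrant."""

    def __init__(self) -> None:
        self.items: List[Tuple[Vector, str]] = []

    def add(self, vec: Vector, text: str) -> None:
        self.items.append((vec, text))

    def search(self, vec: Vector, top_k: int = 1) -> List[str]:
        ranked = sorted(self.items, key=lambda it: cosine(vec, it[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


class Pipeline:
    """Ingestion, embedding, storage, and inference wired together from a config."""

    def __init__(self, config: dict) -> None:
        emb_cfg, llm_cfg = config["embedding"], config["llm"]
        self.embed = EMBEDDERS[emb_cfg["type"]](emb_cfg)
        self.llm = LLMS[llm_cfg["type"]](llm_cfg)
        self.store = InMemoryStore()

    def ingest(self, docs: List[str]) -> None:
        for doc in docs:
            self.store.add(self.embed(doc), doc)

    def ask(self, question: str) -> str:
        context = self.store.search(self.embed(question), top_k=1)
        prompt = "Context: " + " ".join(context) + "\nQuestion: " + question
        return self.llm(prompt)


# Hypothetical config; changing "type" selects a different registered component.
CONFIG_JSON = '{"embedding": {"type": "hashing", "dim": 64}, "llm": {"type": "echo"}}'

pipeline = Pipeline(json.loads(CONFIG_JSON))
pipeline.ingest(["qdrant stores vectors", "chatglm2 generates answers"])
answer = pipeline.ask("where are vectors stored")
```

Under this design, supporting another embedding model or LLM provider means registering one more factory; the pipeline class and the calling code stay unchanged, and only the JSON selects which component is used.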

No commits in the last 6 months.

No License | Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 12 / 25


Stars: 86
Forks: 9
Language: Python
License: (none listed)
Category: local-rag-stacks
Last pushed: Oct 23, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/OceanPresentChao/llm-corpus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
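The same request can be made from Python with the standard library. This is a sketch under assumptions: the URL layout is copied from the curl command above, the function names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, following the path layout of the curl example."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (assumes a JSON response body)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("vector-db", "OceanPresentChao", "llm-corpus")
```

Calling `fetch_quality("vector-db", "OceanPresentChao", "llm-corpus")` performs the network request; with the keyless limit of 100 requests/day, caching the result locally is a reasonable precaution.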