zeroentropy-ai/zchunk
A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.
Leverages Llama-70B's log-probabilities (logprobs) to detect semantic boundaries without regex rules or embedding similarity: it injects a special split token and measures the model's confidence in placing that token at each position across the document. Includes a benchmarked hyperparameter-tuning pipeline and evaluates performance on LegalBenchConsumerContractsQA with two metrics, retrieval faithfulness and signal-to-noise ratio. Outperforms fixed-size and embedding-based chunking strategies across multiple document types (Python, Markdown, HTML, legal text) and is optimized for efficient inference with KV caching.
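The boundary-detection idea can be illustrated with a small sketch. This is a hypothetical example, not zchunk's actual code. It assumes you already have, for every token position, the model's log-probability of emitting the special split token at that point. With a causal model such as Llama-70B, a single forward pass over the document yields the next-token distribution at every position, so these values can be read off together. The function names `pick_boundaries` and `split_tokens`, the default `threshold` of 0.5, and the `min_gap` spacing rule are illustrative assumptions.

```python
import math
from typing import List, Sequence


def pick_boundaries(
    logprobs: Sequence[float], threshold: float = 0.5, min_gap: int = 1
) -> List[int]:
    """Hypothetical helper: return token positions where the split token is likely.

    logprobs[i] is assumed to be the model's log-probability of emitting the
    split token immediately before tokens[i].
    """
    boundaries: List[int] = []
    for pos, lp in enumerate(logprobs):
        prob = math.exp(lp)  # convert log-probability back to a probability
        far_enough = not boundaries or pos - boundaries[-1] >= min_gap
        if prob >= threshold and far_enough:
            boundaries.append(pos)
    return boundaries


def split_tokens(tokens: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Hypothetical helper: cut the token list at each boundary position."""
    chunks: List[List[str]] = []
    start = 0
    for b in boundaries:
        if b > start:  # skip boundaries that would create an empty chunk
            chunks.append(list(tokens[start:b]))
            start = b
    if start < len(tokens):
        chunks.append(list(tokens[start:]))  # trailing chunk
    return chunks


if __name__ == "__main__":
    tokens = ["Intro", "text", ".", "Section", "two", "."]
    logprobs = [-5.0, -4.0, -3.0, -0.1, -6.0, -2.0]
    cuts = pick_boundaries(logprobs)
    print(cuts, split_tokens(tokens, cuts))
```

In this toy input only position 3 clears the threshold (e^-0.1 ≈ 0.90), so the text splits into two chunks. Values such as `threshold` and `min_gap` are the kind of knobs a hyperparameter-tuning pipeline would search over.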
254 stars. No commits in the last 6 months.
Stars: 254
Forks: 20
Language: Python
License: none listed
Category: none listed
Last pushed: Jan 28, 2025
Commits (30d): 0
Higher-rated alternatives
icereed/paperless-gpt
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI
retab-dev/retab
The developer starter pack for document processing
SharanyaAchanta/LexTransition-AI
LexTransition AI is an open-source, offline-first legal assistant. It helps users navigate the...
lvwzhen/law-cn-ai
⚖️ AI legal assistant
CTU-LinguTechies/VN-Law-Advisor
An application for searching and asking questions about legal knowledge, based on Vietnam's Legal Code (Bộ pháp điển) and its database of legal normative documents (CSDL văn bản QPPL).