laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction, hybrid search, and SHA256 deduplication.
Supports multi-format ingestion via MarkItDown (PPTX, XLSX, DOCX, PDFs), markdown-aware recursive chunking, and customizable LLM-based metadata extraction through YAML configuration. Provides pluggable embedding providers (Ollama, OpenAI, HuggingFace, Google, FastEmbed) with Qdrant's dense/sparse/hybrid search, plus directory-level ingestion with file and chunk-level SHA256 deduplication. Designed as modular Python components with environment variable substitution for production deployments.
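The file-level and chunk-level SHA256 deduplication mentioned above can be illustrated with a minimal sketch using only the Python standard library. This is not RAGWire's actual implementation: the function names (`sha256_of_file`, `sha256_of_text`, `filter_new_chunks`) and the in-memory `seen_hashes` set are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 65536) -> str:
    """Hash a file in fixed-size blocks so large documents are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def sha256_of_text(text: str) -> str:
    """Hash the UTF-8 encoding of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_new_chunks(chunks: list[str], seen_hashes: set[str]) -> list[str]:
    """Return only chunks whose hash is not yet in seen_hashes (hypothetical store).

    seen_hashes is updated in place, so repeated calls skip earlier chunks.
    """
    new_chunks = []
    for chunk in chunks:
        chunk_hash = sha256_of_text(chunk)
        if chunk_hash not in seen_hashes:
            seen_hashes.add(chunk_hash)
            new_chunks.append(chunk)
    return new_chunks
```

In a real pipeline the set of seen hashes would presumably live alongside the vector store rather than in memory, so that re-running ingestion on the same directory skips unchanged files and chunks.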
8 stars and 1,249 monthly downloads. Used by 1 other package. Available on PyPI.
Stars: 8
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 27, 2026
Monthly downloads: 1,249
Commits (30d): 0
Dependencies: 11
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/laxmimerit/RAGWire"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
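The same request can be made from Python with the standard library. The sketch below is hedged in two ways: the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not confirm. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is an assumption from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and parse it as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("rag", "laxmimerit", "RAGWire")
```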
Related tools
thiswillbeyourgithub/wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...
Arterning/DeepParseX
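DeepParseX is a powerful multimodal document parsing and knowledge management platform that supports intelligent parsing of PDF, Word, Excel, PPT, image, video, audio, and other file formats, automatically extracts key information, and builds...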
NoEdgeAI/pdfdeal
A python wrapper for the Doc2X API and comes with native texts processing (to improve PDF recall...
atpuxiner/docsloader
This is a document loader. (Document parsing loader, RAG document parsing, RAG knowledge base construction)
David-Lolly/ViewRAG
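A richly illustrated PDF RAG system supporting layout-aware chunking, deep understanding of figures and tables, and precise visual source tracing. Multimodal PDF RAG: Features layout-aware chunking,...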