laxmimerit/RAGWire

Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction, hybrid search, and SHA256 deduplication.

Established

Supports multi-format ingestion via MarkItDown (PPTX, XLSX, DOCX, PDFs), markdown-aware recursive chunking, and customizable LLM-based metadata extraction through YAML configuration. Provides pluggable embedding providers (Ollama, OpenAI, HuggingFace, Google, FastEmbed) with Qdrant's dense/sparse/hybrid search, plus directory-level ingestion with file and chunk-level SHA256 deduplication. Designed as modular Python components with environment variable substitution for production deployments.
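The file-level and chunk-level SHA256 deduplication mentioned above can be illustrated with a minimal sketch. This is not RAGWire's actual implementation: the `DedupIndex` class, its method names, and the in-memory sets are hypothetical and chosen only for illustration. A real pipeline would presumably persist the hashes, for example alongside the vectors in Qdrant.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of raw bytes as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory record of hashes that were already ingested."""

    def __init__(self):
        self.file_hashes = set()   # digests of whole files
        self.chunk_hashes = set()  # digests of individual text chunks

    def is_new_file(self, content: bytes) -> bool:
        """File-level check: True only the first time this exact content is seen."""
        digest = sha256_hex(content)
        if digest in self.file_hashes:
            return False
        self.file_hashes.add(digest)
        return True

    def new_chunks(self, chunks):
        """Chunk-level check: keep only chunks whose text was not seen before."""
        fresh = []
        for chunk in chunks:
            digest = sha256_hex(chunk.encode("utf-8"))
            if digest not in self.chunk_hashes:
                self.chunk_hashes.add(digest)
                fresh.append(chunk)
        return fresh
```

In this sketch the file-level check lets an ingestion run skip an unchanged file before any parsing or embedding happens, while the chunk-level check catches repeated passages that appear in otherwise different files.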

8 stars and 1,249 monthly downloads. Used by 1 other package. Available on PyPI.

Maintenance 13 / 25

Adoption 12 / 25

Maturity 18 / 25

Community 8 / 25

Language: Python

License: MIT

Related tools

thiswillbeyourgithub/wdoc

Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...

Arterning/DeepParseX

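DeepParseX is a powerful multimodal document parsing and knowledge management platform that supports intelligent parsing of many file formats, including PDF, Word, Excel, PPT, images, video, and audio, automatically extracts key information, and builds...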

NoEdgeAI/pdfdeal

A python wrapper for the Doc2X API and comes with native texts processing (to improve PDF recall...

atpuxiner/docsloader

This is a document loader (document parsing loader, RAG document parsing, RAG knowledge base construction).

David-Lolly/ViewRAG

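A richly illustrated PDF RAG system that supports layout-aware chunking, deep understanding of figures and tables, and precise visual source tracing. Multimodal PDF RAG: Features layout-aware chunking,...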
