RAGWire and rag-document-pipeline
These are competitors: both are production-grade RAG toolkits for ingesting multi-format documents such as PDFs. rag-document-pipeline focuses on extraction and intelligent chunking to prepare documents for vector DB ingestion, while RAGWire offers a more complete solution that also covers LLM-based metadata extraction, hybrid search, and deduplication with Qdrant.
About RAGWire
laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction, hybrid search, and SHA256 deduplication.
Supports multi-format ingestion via MarkItDown (PPTX, XLSX, DOCX, PDFs), markdown-aware recursive chunking, and customizable LLM-based metadata extraction through YAML configuration. Provides pluggable embedding providers (Ollama, OpenAI, HuggingFace, Google, FastEmbed) with Qdrant's dense/sparse/hybrid search, plus directory-level ingestion with file and chunk-level SHA256 deduplication. Designed as modular Python components with environment variable substitution for production deployments.
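To make the file-level and chunk-level SHA256 deduplication concrete, here is a minimal sketch of the general technique. It is not RAGWire's actual code or API: the `DedupIndex` class and its method names are hypothetical, and the in-memory sets stand in for whatever store a real deployment would consult (for example, hashes kept alongside the vectors).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory index of content hashes (illustration only)."""

    def __init__(self) -> None:
        self.seen_files: set[str] = set()
        self.seen_chunks: set[str] = set()

    def is_new_file(self, content: bytes) -> bool:
        """File-level check: skip a file whose exact bytes were already ingested."""
        digest = sha256_hex(content)
        if digest in self.seen_files:
            return False
        self.seen_files.add(digest)
        return True

    def new_chunks(self, chunks: list[str]) -> list[str]:
        """Chunk-level check: keep only chunks whose text has not been seen before."""
        fresh = []
        for chunk in chunks:
            digest = sha256_hex(chunk.encode("utf-8"))
            if digest not in self.seen_chunks:
                self.seen_chunks.add(digest)
                fresh.append(chunk)
        return fresh
```

Hashing at both levels serves different purposes: the file hash lets an ingestion run skip unchanged files before any parsing happens, while the chunk hash catches repeated passages that appear in otherwise different files.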
About rag-document-pipeline
salim-lakhal/rag-document-pipeline
Production RAG pipeline: multi-format document extraction → intelligent chunking → metadata-enriched JSONL for vector DB ingestion
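The chunking-to-JSONL stage of such a pipeline might look roughly like the sketch below. It is an illustration under stated assumptions, not rag-document-pipeline's implementation: the paragraph-packing chunker is a simple stand-in for "intelligent chunking", and the function names and record fields (`id`, `text`, `metadata`) are hypothetical.

```python
import json


def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Pack consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_jsonl(text: str, source: str, max_chars: int = 200) -> str:
    """Serialize chunks as JSONL, one metadata-enriched record per line."""
    lines = []
    for i, chunk in enumerate(chunk_paragraphs(text, max_chars)):
        record = {
            "id": f"{source}#{i}",  # hypothetical ID scheme
            "text": chunk,
            "metadata": {
                "source": source,
                "chunk_index": i,
                "char_count": len(chunk),
            },
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Each output line is an independent JSON object, so a downstream loader can stream the file record by record into whichever vector database is in use.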