RAGWire and pdf-process-rag

RAGWire and pdf-process-rag are competing tools with overlapping RAG-based PDF processing features. RAGWire is the more mature of the two and is described as production-grade: it supports multiple document formats, hybrid search, and deduplication. pdf-process-rag appears to be an earlier-stage alternative focused on vector embeddings for PDF querying.

RAGWire
Score: 51 (Established)
Maintenance 13/25
Adoption 12/25
Maturity 18/25
Community 8/25
Stars: 8
Forks: 1
Downloads: 1,249
Commits (30d): 0
Language: Python
License: MIT
Risk flags: none

pdf-process-rag
Score: 24 (Experimental)
Maintenance 2/25
Adoption 1/25
Maturity 9/25
Community 12/25
Stars: 1
Forks: 1
Downloads: not listed
Commits (30d): 0
Language: Python
License: MIT
Risk flags: Stale 6m, No Package, No Dependents

About RAGWire

laxmimerit/RAGWire

Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction, hybrid search, and SHA256 deduplication.

Supports multi-format ingestion via MarkItDown (PPTX, XLSX, DOCX, PDFs), markdown-aware recursive chunking, and customizable LLM-based metadata extraction through YAML configuration. Provides pluggable embedding providers (Ollama, OpenAI, HuggingFace, Google, FastEmbed) with Qdrant's dense/sparse/hybrid search, plus directory-level ingestion with file and chunk-level SHA256 deduplication. Designed as modular Python components with environment variable substitution for production deployments.
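
To make the file-level and chunk-level SHA256 deduplication concrete, here is a minimal sketch using only the Python standard library. It is not RAGWire's actual code or API: the `DedupIndex` class and its method names are hypothetical, and the index lives in memory, whereas a real pipeline would likely persist hashes alongside the vector store.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory index of content hashes seen so far."""

    def __init__(self) -> None:
        self.file_hashes: set[str] = set()
        self.chunk_hashes: set[str] = set()

    def is_new_file(self, content: bytes) -> bool:
        """Record the file hash; return False if identical bytes were seen before."""
        digest = sha256_hex(content)
        if digest in self.file_hashes:
            return False
        self.file_hashes.add(digest)
        return True

    def filter_new_chunks(self, chunks: list[str]) -> list[str]:
        """Keep only chunks whose exact text has not been ingested yet."""
        fresh = []
        for chunk in chunks:
            digest = sha256_hex(chunk.encode("utf-8"))
            if digest not in self.chunk_hashes:
                self.chunk_hashes.add(digest)
                fresh.append(chunk)
        return fresh
```

Checking the file hash first lets an ingester skip an unchanged file before parsing it, while the chunk-level check catches repeated passages that appear in otherwise different files. Both checks match exact content only, so a whitespace change produces a different hash.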

About pdf-process-rag

salameaz/pdf-process-rag

A Python-based application that extracts and processes PDF content using a Retrieval-Augmented Generation (RAG) approach. Leverage vector embeddings to enable efficient querying of both text-based and scanned PDFs, and interact with your documents using a large language model.
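
As a rough illustration of querying by vector embeddings, the sketch below ranks text chunks by cosine similarity to a query. It is not taken from pdf-process-rag: the `embed`, `cosine`, and `top_k` names are hypothetical, and word counts stand in for a real embedding model. PDF text extraction and OCR for scanned pages are out of scope here.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In a RAG setup, the top-ranked chunks would then be passed to a large language model as context for answering the query.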

Scores updated daily from GitHub, PyPI, and npm data.