rknightion/paperless-ngx-dedupe
Helps dedupe your paperless-ngx instance
Combines MinHash/LSH algorithms for O(n log n) duplicate detection with LLM-powered metadata extraction (OpenAI/Anthropic) and RAG-based document Q&A using hybrid vector+full-text search. Syncs bidirectionally with Paperless-NGX via REST API, processes documents in background worker threads, and exposes results through a web UI and REST API with Server-Sent Events progress tracking. Single-container deployment with SQLite, OpenTelemetry observability, and bulk operations for reviewing and applying deduplication results.
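To make the MinHash/LSH idea in the description concrete, here is a minimal TypeScript sketch of that general technique. It is not taken from the repository: the function names (`shingles`, `seededHash`, `minhashSignature`, `estimateJaccard`, `candidatePairs`), the 3-word shingle size, the 64-hash signature length, and the 16-band split are all assumptions chosen for illustration, and the hash is a simple FNV-1a-style stand-in rather than whatever the project actually uses.

```typescript
// Illustrative sketch only; all names and parameters are hypothetical.

// Split text into overlapping k-word shingles.
function shingles(text: string, k: number = 3): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// FNV-1a-style 32-bit hash, salted with a seed and finished with a mixing
// step, so that each seed behaves like a separate hash function.
function seededHash(s: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b) >>> 0;
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35) >>> 0;
  h ^= h >>> 16;
  return h >>> 0;
}

// MinHash signature: for each hash function, keep the minimum hash value
// seen over all shingles of the document.
function minhashSignature(items: Set<string>, numHashes: number = 64): number[] {
  const sig: number[] = [];
  for (let i = 0; i < numHashes; i++) sig.push(0xffffffff);
  items.forEach((item) => {
    for (let i = 0; i < numHashes; i++) {
      const h = seededHash(item, i);
      if (h < sig[i]) sig[i] = h;
    }
  });
  return sig;
}

// The fraction of matching signature positions estimates Jaccard similarity.
function estimateJaccard(a: number[], b: number[]): number {
  let same = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) same++;
  }
  return same / a.length;
}

// LSH banding: documents that agree on every row of at least one band land
// in the same bucket and become candidate pairs, so only those pairs need
// a full comparison.
function candidatePairs(sigs: Map<string, number[]>, bands: number = 16): Set<string> {
  const pairs = new Set<string>();
  const buckets = new Map<string, string[]>();
  sigs.forEach((sig, id) => {
    const rows = Math.floor(sig.length / bands);
    for (let b = 0; b < bands; b++) {
      const key = b + ":" + sig.slice(b * rows, (b + 1) * rows).join(",");
      const bucket = buckets.get(key) || [];
      bucket.forEach((other) => pairs.add(other + "|" + id));
      bucket.push(id);
      buckets.set(key, bucket);
    }
  });
  return pairs;
}
```

In a setup like this, each document's text would be shingled and hashed once, and only the pairs returned by `candidatePairs` would be scored with `estimateJaccard`. Avoiding the all-pairs comparison is what keeps the cost well below quadratic.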
Stars: 11
Forks: —
Language: TypeScript
License: GPL-3.0
Category:
Last pushed: Mar 27, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/rknightion/paperless-ngx-dedupe"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
joungminsung/OpenDocuments
Self-hosted open-source RAG platform that unifies organizational documents and answers natural...
PT-Perkasa-Pilar-Utama/ppu-pdf
PDF utilities for extracting text from digital PDFs and converting scanned PDFs into canvas.
osllmai/inDox
The Indox Ecosystem offers integrated AI tools for data workflows. Our four components...
pega2077/ai_file_manager
AIFileManager: an AI-based file manager. Auto-tag, classify, and RAG your documents, images, and videos.
Harry-027/DocuMind
A document based RAG application