jonaswinkler/paperless-ng
A supercharged version of paperless: scan, index and archive all your physical documents
Archived
Performs automatic OCR and full-text indexing on documents (PDF, images, Office formats via Apache Tika), with machine learning-powered auto-tagging of correspondents and document types. Provides a modern single-page web frontend with relevance-ranked full-text search, email ingestion with filtering rules, and parallel document processing optimized for multi-core systems. Stores documents plainly on disk with configurable naming schemes, integrates with network scanners via FTP or mobile apps, and ships as a Docker Compose deployment.
5,416 stars. No commits in the last 6 months.
Stars: 5,416
Forks: 349
Language: Python
License: GPL-3.0
Category: (not specified)
Last pushed: Feb 14, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jonaswinkler/paperless-ng"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
GoogleCloudPlatform/document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on...
aphp/edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides...
aws-solutions/document-understanding-solution
Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon Comprehend Medical,...
naiveHobo/InvoiceNet
Deep neural network to extract intelligent information from invoice documents.