ICIJ/datashare

A self‑hosted search engine for documents

62 / 100 · Established

Ingests heterogeneous file formats (PDFs, emails, spreadsheets, images, archives), extracts their text automatically (using OCR where needed), and enriches documents with named-entity recognition before exposing them through a full-text search UI and a REST API. It is built on Elasticsearch for indexing, PostgreSQL with Liquibase for schema management, and Redis for task orchestration. Everything runs self-hosted, with a Vue 3 frontend and an extensible plugin architecture for custom analysis modules.

713 stars. Actively maintained with 95 commits in the last 30 days.

No package · No dependents
Maintenance 25 / 25
Adoption 10 / 25
Maturity 9 / 25
Community 18 / 25
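The four category scores are consistent with the overall score shown above. The page does not state the formula, so the following assumes the total is a plain sum of the four 25-point categories:

```latex
\text{Score}
  = \underbrace{25}_{\text{Maintenance}}
  + \underbrace{10}_{\text{Adoption}}
  + \underbrace{9}_{\text{Maturity}}
  + \underbrace{18}_{\text{Community}}
  = 62 \quad (\text{out of } 4 \times 25 = 100)
```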


Stars: 713
Forks: 66
Language: Java
License: AGPL-3.0
Last pushed: Mar 16, 2026
Commits (30d): 95

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/ICIJ/datashare"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.