ICIJ/datashare
A self‑hosted search engine for documents
Ingests heterogeneous file formats (PDFs, emails, spreadsheets, images, archives), extracts their text automatically (using OCR for images and scanned pages), and enriches the documents with named-entity recognition before exposing them through a full-text search UI and a REST API. It is built on Elasticsearch for indexing, PostgreSQL with Liquibase for schema management, and Redis for task orchestration. Everything runs self-hosted, with a Vue 3 frontend and an extensible plugin architecture for custom analysis modules.
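To make the ingest, enrich, index and search flow concrete, here is a minimal, hypothetical Java sketch. It is not Datashare's code or API. It assumes text extraction (and any OCR) has already produced plain text, replaces named-entity recognition with a naive capitalized-word heuristic, and uses an in-memory inverted index as a stand-in for Elasticsearch. The class and method names (`ToyDocPipeline`, `ingest`, `search`, `entitiesOf`) are illustrative only.

```java
import java.util.*;

/**
 * Hypothetical toy pipeline: ingest -> enrich (entities) -> index -> search.
 * NOT Datashare's API. Assumes text is already extracted; the "NER" step is a
 * naive heuristic and the inverted index stands in for Elasticsearch.
 */
public class ToyDocPipeline {
    // Hypothetical document record: id, extracted text, detected entities.
    record Doc(String id, String text, Set<String> entities) {}

    private final Map<String, Doc> docs = new HashMap<>();
    // term -> sorted set of document ids containing that term
    private final Map<String, Set<String>> invertedIndex = new HashMap<>();

    /** Enrich the extracted text with entities, then index every token. */
    public void ingest(String id, String extractedText) {
        docs.put(id, new Doc(id, extractedText, extractEntities(extractedText)));
        for (String term : tokenize(extractedText)) {
            invertedIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    /** AND query: return ids of documents containing every query term. */
    public List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> postings = invertedIndex.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? List.of() : new ArrayList<>(result);
    }

    /** Entities detected for a document, or an empty set if unknown. */
    public Set<String> entitiesOf(String id) {
        Doc d = docs.get(id);
        return d == null ? Set.of() : d.entities();
    }

    // Lowercase and split on non-word characters, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Naive stand-in for NER: any capitalized token longer than one character.
    // It also picks up non-entities such as "Monday".
    static Set<String> extractEntities(String text) {
        Set<String> out = new TreeSet<>();
        for (String t : text.split("\\W+")) {
            if (t.length() > 1 && Character.isUpperCase(t.charAt(0))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyDocPipeline p = new ToyDocPipeline();
        p.ingest("memo-1", "Alice wired funds to Bob in Panama.");
        p.ingest("memo-2", "Bob flew to Paris on Monday.");
        System.out.println(p.search("bob"));        // [memo-1, memo-2]
        System.out.println(p.search("bob panama")); // [memo-1]
        System.out.println(p.entitiesOf("memo-2")); // [Bob, Monday, Paris]
    }
}
```

Queries here are simple AND matches over lowercase tokens. A real engine such as Elasticsearch adds analyzers, relevance scoring and a distributed index, all of which this sketch deliberately omits.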
713 stars. Actively maintained with 95 commits in the last 30 days.
Stars: 713
Forks: 66
Language: Java
License: AGPL-3.0
Last pushed: Mar 16, 2026
Commits (30d): 95
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/ICIJ/datashare"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
meilisearch/meilisearch
A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.
nuclia/nucliadb
NucliaDB, The AI Search database for RAG
vespa-engine/vespa
AI + Data, online. https://vespa.ai
PrithivirajDamodaran/FlashRank
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and...
oramasearch/orama
🌌 A complete search engine and RAG pipeline in your browser, server or edge network with...