opensemanticsearch/open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

51 / 100 (Established)

Built on Apache Solr for distributed search and Apache Tika for document extraction, the project orchestrates a microservices pipeline via Docker Compose that integrates spaCy-services for NLP and optional OCR. Its modular architecture uses Git submodules (particularly Open Semantic ETL) to decouple components such as crawling, text analysis, and indexing, so each can be scaled and extended independently. It can be deployed as Debian/Ubuntu packages, Docker containers, or a VirtualBox appliance, and ships automated integration and E2E tests that use Playwright for browser automation.
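Because the search layer is Apache Solr, an indexed collection can in principle be queried through Solr's standard `/select` request handler. The following is a minimal sketch that only builds such a request URL. The host and port, the core name `opensemanticsearch`, and the facet field `person_ss` are assumptions for illustration and are not taken from this page.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, text, facet_field=None, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    # q, rows and wt are standard Solr query parameters.
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if facet_field:
        # facet and facet.field turn on field faceting for one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names; adjust to the actual deployment.
url = build_solr_query(
    "http://localhost:8983",
    "opensemanticsearch",
    "annual report",
    facet_field="person_ss",
)
print(url)
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect before it is sent to a running instance.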

1,154 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 23 / 25
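The four category scores above add up to the overall score shown at the top (2 + 10 + 16 + 23 = 51, out of 4 × 25 = 100). The page does not state the scoring formula, so the short check below should be read as an observation about these numbers and not as the documented method.

```python
# Category scores as listed on this page, each out of 25.
category_scores = {
    "Maintenance": 2,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 23,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the plain sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)
print(f"{total} / {maximum}")  # prints "51 / 100"
```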


Stars: 1,154
Forks: 196
Language: Shell
License: GPL-3.0
Last pushed: Apr 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/opensemanticsearch/open-semantic-search"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
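The same request can be made from a script. The sketch below uses only the Python standard library and assumes that the endpoint returns JSON and that the URL path follows a `category/owner/repo` pattern, as the curl example suggests; neither point is confirmed on this page, and the function names are illustrative.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred from the curl example)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the response body is JSON."""
    request = Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "opensemanticsearch", "open-semantic-search")` would issue the same request as the curl command. With the keyless limit of 100 requests/day, it makes sense to cache the result locally instead of polling.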