jacobmarks/semantic-document-search-plugin

Semantically search through OCR text blocks with Qdrant, Sentence Transformers, and FiftyOne!

Experimental

This tool helps you quickly find specific information within large collections of scanned documents, like research papers or historical archives. It takes your scanned documents (with text extracted by OCR) and a natural language query, then identifies and shows you the most relevant text blocks. This is ideal for researchers, librarians, or anyone who needs to pinpoint exact content across many documents without relying on exact keyword matches.
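The retrieval step described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the plugin's actual code: per the description, the plugin embeds text with Sentence Transformers and stores vectors in Qdrant, whereas this sketch swaps in a toy character-trigram embedding and an in-memory list. The names `toy_embed` and `search`, the block dictionary layout, and the file names are all hypothetical.

```python
import math
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding vector (arbitrary choice)


def toy_embed(text: str) -> list[float]:
    """Hash character trigrams into a fixed-size, L2-normalised vector.

    Assumption: this stands in for a real sentence embedding model.
    It captures spelling overlap only, not meaning.
    """
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    vec = [0.0] * DIM
    for gram, n in counts.items():
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query, blocks, embed=toy_embed, top_k=3):
    """Return the top_k (score, block) pairs ranked by cosine similarity."""
    q = embed(query)
    scored = []
    for block in blocks:
        b = embed(block["text"])
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(x * y for x, y in zip(q, b))
        scored.append((score, block))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical OCR text blocks; the third contains a typical OCR error.
blocks = [
    {"doc": "page_001.png", "text": "Annual rainfall measurements for the coastal region"},
    {"doc": "page_002.png", "text": "Minutes of the library committee meeting"},
    {"doc": "page_003.png", "text": "Rainfa11 totals recorded at the harbour station"},
]

results = search("rainfall totals", blocks, top_k=2)
for score, block in results:
    print(f"{score:.3f}  {block['doc']}  {block['text']}")
```

Because `search` takes the embedding function as a parameter, a real sentence embedding model could be passed in place of `toy_embed` without changing the ranking logic. A vector database would replace the linear scan once the collection grows beyond what fits comfortably in memory.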

No commits in the last 6 months.

Use this if you need to intelligently search through digitized documents, finding relevant passages even when your search terms aren't exact matches to the text.

Not ideal if you are working with born-digital text documents where keyword search is sufficient, or if your documents are not yet processed with OCR.

document-analysis information-retrieval research-assist archive-management digital-humanities

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Language: Python

License: —

Featured in

Embeddings Are Easier Than Whatever You're Doing Instead

Higher-rated alternatives

meilisearch/meilisearch

A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.

nuclia/nucliadb

NucliaDB, The AI Search database for RAG

vespa-engine/vespa

AI + Data, online. https://vespa.ai

ICIJ/datashare

A self‑hosted search engine for documents

PrithivirajDamodaran/FlashRank

Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and...
