jacobmarks/semantic-document-search-plugin

Semantically search through OCR text blocks with Qdrant, Sentence Transformers, and FiftyOne!

Score: 13 / 100 (Experimental)

This tool helps you quickly find specific information within large collections of scanned documents, like research papers or historical archives. It takes your scanned documents (with text extracted by OCR) and a natural language query, then identifies and shows you the most relevant text blocks. This is ideal for researchers, librarians, or anyone who needs to pinpoint exact content across many documents without relying on exact keyword matches.
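The retrieval flow described above (embed each OCR text block, embed the query, rank blocks by similarity) can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the `embed`, `cosine`, and `search` helpers and the sample blocks are hypothetical, and a toy bag-of-words vector stands in for a Sentence Transformers embedding while a plain Python list stands in for a Qdrant collection. Because of that swap, this sketch only matches on shared words; a real sentence embedding model is what would make the ranking semantic.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption for illustration only; the plugin is described as using
    Sentence Transformers for this step.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, blocks, top_k=3):
    """Return the top_k (score, block) pairs, most similar first.

    A vector database such as Qdrant would normally store the block
    vectors and perform this ranking; here it is a linear scan.
    """
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(b["text"])), b) for b in blocks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical OCR text blocks, each tagged with its source document.
blocks = [
    {"doc": "paper_01.png", "text": "The experiment measured soil moisture over two seasons."},
    {"doc": "paper_02.png", "text": "Funding was provided by the national archive council."},
    {"doc": "letter_07.png", "text": "Rainfall and soil moisture records from the valley farms."},
]

results = search("soil moisture measurements", blocks, top_k=2)
for score, block in results:
    print(f"{score:.3f}  {block['doc']}  {block['text']}")
```

Keeping the source document alongside each block is what lets a result point back to the scanned page it came from.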

No commits in the last 6 months.

Use this if you need to intelligently search through digitized documents, finding relevant passages even when your search terms aren't exact matches to the text.

Not ideal if you are working with born-digital text documents where keyword search is sufficient, or if your documents are not yet processed with OCR.

document-analysis information-retrieval research-assist archive-management digital-humanities
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

Stars: 9
Forks: (not shown)
Language: Python
License: none
Last pushed: Apr 05, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/jacobmarks/semantic-document-search-plugin"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
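For scripted access, the same request could be made from Python with the standard library. This is a sketch under assumptions: the `quality_url` and `fetch_quality` helpers are hypothetical names, the category/owner/repo path pattern is inferred from the single URL shown above, and the response is assumed (not confirmed here) to be JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; the path pattern beyond this
# one URL is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("vector-db", "jacobmarks", "semantic-document-search-plugin")
print(url)
```

Calling `fetch_quality("vector-db", "jacobmarks", "semantic-document-search-plugin")` performs the actual network request, which counts against the daily request limit.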