colonelwatch/abstracts-search

Semantic search engine indexing 110 million academic publications

Score: 44 / 100 (Emerging)

Generates dense vector embeddings from 110M academic abstracts using the Stella 1.5B model, then builds a FAISS index for fast approximate nearest-neighbor retrieval. All components (embeddings, index, search interface) are published as separate Hugging Face datasets and Spaces for modularity. The project integrates with OpenAlex for publication metadata and supports incremental syncing against quarterly dataset snapshots to keep the index current. The modular architecture allows running just the search interface without rebuilding anything, or performing a full reindex on commodity hardware (RTX 3060, 32 GB RAM plus swap) in under a week.
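To make the retrieval step concrete, here is a minimal sketch of what nearest-neighbor search over embeddings computes. It is not the project's code: it uses exact brute-force cosine similarity in pure Python instead of FAISS, and the three-dimensional vectors and the `paper_*` identifiers are made-up placeholders rather than real Stella embeddings. An approximate index such as FAISS aims to return roughly the same ranking without scanning every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the ids of the k corpus vectors most similar to the query.

    This is an exact scan over every vector; an ANN index trades a little
    accuracy for speed by avoiding that full scan.
    """
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy data: real embeddings have far more dimensions.
toy_corpus = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.9, 0.0],
    "paper_c": [0.8, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_corpus, k=2)
```

At 110M vectors a linear scan like this would be impractical for interactive search, which is the motivation for building an index once and publishing it separately from the search interface.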


No package. No dependents.
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 9 / 25
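
The four subscores add up to the overall figure of 44. The quick check below assumes the overall score is a plain sum of the four categories; the listing itself does not state the formula.

```python
# Subscores as listed, each out of 25.
subscores = {"Maintenance": 10, "Adoption": 9, "Maturity": 16, "Community": 9}

# Assumption: the overall score is the unweighted sum of the subscores.
overall = sum(subscores.values())
maximum = 25 * len(subscores)
```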


Stars: 102
Forks: 6
Language: Python
License: Apache-2.0
Last pushed: Jan 19, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/colonelwatch/abstracts-search"

Open to everyone at 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
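
The same request can be made from Python with only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON (the listing does not document the response format), and that the URL follows a category/owner/repo pattern inferred from the single example above. The helper names `quality_url` and `fetch_quality` are illustrative and not part of any published client.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="embeddings"):
    """Build the endpoint URL; the path pattern is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(owner, repo, category="embeddings", timeout=10):
    """Fetch the quality record, assuming the body is JSON."""
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("colonelwatch", "abstracts-search")
```

Calling `fetch_quality("colonelwatch", "abstracts-search")` performs the network request. With the unauthenticated limit of 100 requests/day, it is worth caching the result rather than fetching it repeatedly.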