colonelwatch/abstracts-search

Semantic search engine indexing 110 million academic publications

/ 100

Emerging

Generates dense vector embeddings from 110M academic abstracts using the Stella 1.5B model, then builds a FAISS index for fast approximate nearest-neighbor retrieval. All components (embeddings, index, search interface) are published as separate Hugging Face datasets and Spaces, so each can be used on its own. Integrates with OpenAlex for publication metadata and supports incremental syncing against quarterly dataset snapshots to keep the index current. The modular design allows running just the search interface without rebuilding anything, or performing a full reindex on commodity hardware (RTX 3060, 32 GB of RAM plus swap) in under a week.
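
As a rough illustration of the embed, index, and search flow described above, here is a minimal sketch in pure Python. It is not the project's code: the `embed` function is a toy hashed bag-of-words stand-in for the Stella model, and `search` is an exact brute-force scan standing in for the approximate FAISS index. All function names, the 64-dimension vector size, and the sample abstracts are assumptions made for illustration.

```python
import hashlib
import math

DIM = 64  # toy dimensionality (assumption); real embedding models use far more


def embed(text: str) -> list[float]:
    """Toy stand-in for a neural embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # hashlib gives a hash that is stable across runs, unlike built-in hash()
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(abstracts: list[str]) -> list[list[float]]:
    """The 'index' here is just the list of embeddings, scanned exhaustively."""
    return [embed(a) for a in abstracts]


def search(index: list[list[float]], query: str, k: int = 3) -> list[tuple[int, float]]:
    """Return the k (position, cosine similarity) pairs most similar to the query."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by position
    return [(i, score) for score, i in scored[:k]]


# Hypothetical sample data, not taken from the real dataset.
abstracts = [
    "graph neural networks for molecular property prediction",
    "a survey of approximate nearest neighbor search methods",
    "climate model ensembles and regional precipitation trends in the tropics",
]
index = build_index(abstracts)
hits = search(index, "graph neural networks for molecular property prediction", k=2)
```

A real deployment would swap `embed` for a learned model and the linear scan for an approximate index, trading a little recall for the speed needed at the scale of 110M vectors.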

102 stars.

No Package, No Dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 9 / 25

Stars

102

Forks

Language

Python

License

Apache-2.0

Related tools

ahr9n/quranic-search-v2

Quranic Lexical/Semantic Search

VIGINUM-FR/D3lta

A Python implementation of the D3lta algorithm for duplicated textual content detection

geetanjaliapp/geetanjali

RAG-powered ethical decision guidance from Bhagavad Geeta. Analyze dilemmas, get structured...

hazemabdelkawy/SunnahGPT

SunnahGPT is a natural language processing (NLP) project aimed at scraping hadith data from the...

mufaizz/FAIZ-AI

FAIZ AI 🔍 – The search bot that finds what others miss. Searches HTTP, FTP, IPFS & Torrent with...
