xhluca/bm25s

Fast lexical search implementing BM25 in Python

/ 100

Verified

Leverages sparse matrix representations to eagerly compute and cache BM25 scores for all document tokens at indexing time. Answering a query then reduces to summing precomputed scores instead of evaluating the BM25 formula on the fly, which is what enables sub-millisecond query scoring. Built on NumPy and SciPy, with optional Numba JIT compilation for further acceleration, and integrates with lightweight stemming libraries such as PyStemmer for linguistic preprocessing. Designed as a lightweight alternative to Elasticsearch and rank-bm25, offering a Python-native option with no external service dependencies.
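The eager-scoring idea in the description above can be illustrated with a small pure-Python sketch. It is not bm25s's code: it assumes plain dicts in place of SciPy sparse matrices, lowercase whitespace tokenization without stemming, and one common BM25 variant (Robertson-style term-frequency saturation with a non-negative, Lucene-style IDF). The function names `build_eager_index` and `query` and the defaults `k1=1.5`, `b=0.75` are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def build_eager_index(docs, k1=1.5, b=0.75):
    """Precompute the BM25 contribution of every (token, doc) pair.

    Hypothetical helper. Returns a sparse mapping token -> {doc_id: score};
    pairs where a token does not occur in a document are simply absent,
    i.e. implicit zeros, much like entries missing from a sparse matrix.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents each token appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    index = defaultdict(dict)
    for doc_id, tokens in enumerate(tokenized):
        dl = len(tokens)
        for token, tf in Counter(tokens).items():
            # Non-negative IDF, so very common tokens never lower a score.
            idf = math.log(1 + (n_docs - df[token] + 0.5) / (df[token] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * dl / avgdl)
            index[token][doc_id] = idf * tf * (k1 + 1) / norm
    return index, n_docs


def query(index, n_docs, text, k=3):
    """Rank documents by summing the cached contributions of query tokens.

    Hypothetical helper; no BM25 arithmetic happens here, only lookups
    and additions.
    """
    scores = [0.0] * n_docs
    for token in text.lower().split():
        # .get() does not insert missing keys into the defaultdict.
        for doc_id, contribution in index.get(token, {}).items():
            scores[doc_id] += contribution
    ranked = sorted(range(n_docs), key=lambda d: scores[d], reverse=True)
    return [(d, scores[d]) for d in ranked[:k]]


corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]
index, n = build_eager_index(corpus)
print(query(index, n, "does the fish purr like a cat", k=2))
```

Because every (token, document) contribution is computed at indexing time, a query only visits the stored entries for its own tokens and adds them up; the cost of the BM25 formula is paid once, up front. Without stemming, "like" in the query does not match "likes" in the first document, which is the kind of gap a stemmer such as PyStemmer closes during preprocessing.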

1,560 stars and 1,192,545 monthly downloads. Used by 12 other packages. Actively maintained with 13 commits in the last 30 days. Available on PyPI.

Maintenance 20 / 25

Adoption 25 / 25

Maturity 18 / 25

Community 17 / 25


Stars: 1,560

Language: Python

License: MIT

Category: retrieval-ranking-fusion

Last pushed: Mar 06, 2026

Monthly downloads: 1,192,545

Commits (30d): 13

Reverse dependents: 12


Retrieval Ranking Fusion · 3 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/xhluca/bm25s"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.

Featured in

Embeddings Are Easier Than Whatever You're Doing Instead

Related tools

ALucek/QuicKB

Optimize Document Retrieval with Fine-Tuned KnowledgeBases

analyticsinmotion/symrank

🐍📦 High-performance cosine similarity ranking for Retrieval-Augmented Generation (RAG) pipelines.
