xhluca/bm25s

Fast lexical search implementing BM25 in Python

Score: 80 / 100 (Verified)

Uses sparse matrices to eagerly compute and cache BM25 scores for every token-document pair at index time, so query-time scoring reduces to looking up and summing precomputed values, which keeps query latency low. Built on NumPy, with optional Numba JIT compilation for further acceleration, and integrates with lightweight stemming libraries such as PyStemmer for linguistic preprocessing. Positioned as a Python-native alternative to Elasticsearch and rank-bm25 that needs no external service.
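
The eager-scoring idea described above can be illustrated with a small standard-library sketch. This is not the library's actual code or API: `build_index` and `score` are hypothetical names, the sparse matrix is modelled as a plain dict of dicts, and the Lucene-style IDF and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Eagerly compute a BM25 score for every (term, document) pair.

    Returns a dict mapping term -> {doc_id: score}, i.e. a sparse
    term-by-document matrix that stores only the nonzero entries.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    index = defaultdict(dict)
    for doc_id, doc in enumerate(corpus_tokens):
        length_norm = 1 - b + b * len(doc) / avg_len
        for term, tf in Counter(doc).items():
            df = doc_freq[term]
            # Lucene-style IDF (an assumption here); it is always positive.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            index[term][doc_id] = idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return dict(index)


def score(index, query_tokens, n_docs):
    """Score a query by summing precomputed entries; no BM25 math runs here."""
    scores = [0.0] * n_docs
    for term in query_tokens:
        for doc_id, value in index.get(term, {}).items():
            scores[doc_id] += value
    return scores


corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend",
    "a bird is a beautiful animal that can fly",
]
corpus_tokens = [doc.split() for doc in corpus]
index = build_index(corpus_tokens)
scores = score(index, "does the cat purr".split(), len(corpus_tokens))
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

All of the BM25 arithmetic happens once in `build_index`; `score` only performs lookups and additions, which is the trade the description points to: more work and memory at index time in exchange for cheaper queries.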

1,560 stars and 1,192,545 monthly downloads. Used by 12 other packages. Actively maintained with 13 commits in the last 30 days. Available on PyPI.

Maintenance 20 / 25
Adoption 25 / 25
Maturity 18 / 25
Community 17 / 25

Stars: 1,560
Forks: 93
Language: Python
License: MIT
Last pushed: Mar 06, 2026
Monthly downloads: 1,192,545
Commits (30d): 13
Dependencies: 1
Reverse dependents: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/xhluca/bm25s"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.