rragundez/chunkdot

Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K most similar items for a large number of items by chunking the item matrix representation (embeddings) and using Numba to accelerate the calculations.

Emerging

Implements memory-efficient chunked processing with configurable RAM budgets and supports both self-similarity and cross-similarity queries, returning results as sparse CSR matrices. Provides a scikit-learn transformer interface for end-to-end pipelines with structured data, enabling direct integration into preprocessing workflows. Numba JIT compilation accelerates the core similarity computations while multi-threading parallelizes chunk processing across CPU cores.
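The chunking idea described above can be illustrated with a minimal NumPy-only sketch. This is not chunkdot's implementation: it is single-threaded, uses no Numba, and returns dense index and value arrays rather than a sparse CSR matrix. The function names `chunk_rows_for_budget` and `cosine_top_k_chunked` and the `max_memory_bytes` parameter are hypothetical, chosen only for this example.

```python
import numpy as np


def chunk_rows_for_budget(n_items, max_memory_bytes, itemsize=8):
    """Rows per chunk so that one (rows x n_items) float block fits the budget.

    Hypothetical helper: assumes the similarity block dominates memory use.
    """
    rows = max_memory_bytes // (n_items * itemsize)
    return int(max(1, min(n_items, rows)))


def cosine_top_k_chunked(embeddings, top_k, max_memory_bytes=64 * 1024**2):
    """Return (indices, values) of the top_k most similar items for each row.

    Each item is compared against all items, itself included, one chunk of
    rows at a time, so the full n x n similarity matrix is never materialized.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    x = x / norms  # unit rows: a dot product is now a cosine similarity

    n = x.shape[0]
    top_k = min(top_k, n)
    step = chunk_rows_for_budget(n, max_memory_bytes)

    indices = np.empty((n, top_k), dtype=np.int64)
    values = np.empty((n, top_k), dtype=np.float64)

    for start in range(0, n, step):
        stop = min(start + step, n)
        sims = x[start:stop] @ x.T  # shape: (chunk_rows, n)

        # Select the top_k largest entries per row without a full sort.
        part = np.argpartition(sims, n - top_k, axis=1)[:, n - top_k:]
        part_vals = np.take_along_axis(sims, part, axis=1)

        # Sort only the selected entries, most similar first.
        order = np.argsort(-part_vals, axis=1)
        indices[start:stop] = np.take_along_axis(part, order, axis=1)
        values[start:stop] = np.take_along_axis(part_vals, order, axis=1)

    return indices, values
```

Peak memory in this sketch is governed by the `(chunk_rows, n)` similarity block, which is why the chunk size is derived from the budget rather than fixed. A real implementation would also need to account for the embeddings themselves and for per-thread buffers.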

No commits in the last 6 months. Available on PyPI.

Maintenance 0 / 25

Adoption 9 / 25

Maturity 25 / 25

Community 8 / 25

Language Python

License MIT

Higher-rated alternatives

curiosity-ai/catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...

Azure/azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.

supabase/embeddings-generator

GitHub Action to generate embeddings from the markdown files in your repository.

vector-ai/vectorai

Vector AI — A platform for building vector based applications. Encode, query and analyse data...

yusufhilmi/client-vector-search

A client side vector search library that can embed, store, search, and cache vectors. Works on...
