rragundez/chunkdot
Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K most similar items for a large number of items by chunking the item matrix representation (embeddings) and using Numba to accelerate the calculations.
Implements memory-efficient chunked processing with configurable RAM budgets and supports both self-similarity and cross-similarity queries, returning results as sparse CSR matrices. Provides a scikit-learn transformer interface for end-to-end pipelines with structured data, enabling direct integration into preprocessing workflows. Numba JIT compilation accelerates the core similarity computations while multi-threading parallelizes chunk processing across CPU cores.
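The chunked top-K idea described above can be sketched in plain NumPy. This is an illustrative sketch, not chunkdot's implementation: the names `rows_for_budget` and `chunked_top_k_cosine` are hypothetical, the memory estimate assumes only the float64 similarity block counts against the budget, and the Numba compilation and multi-threading are left out for brevity.

```python
import numpy as np


def rows_for_budget(n_items, max_memory_bytes, itemsize=8):
    """Rows per chunk so one (rows x n_items) float64 block fits the budget.

    Assumption: only the similarity block is counted, not the embeddings.
    """
    return max(1, max_memory_bytes // (n_items * itemsize))


def chunked_top_k_cosine(embeddings, top_k, chunk_rows=1024):
    """Top-K cosine self-similarity, returned as CSR arrays (data, indices, indptr)."""
    x = np.asarray(embeddings, dtype=np.float64)
    n_items = x.shape[0]
    if n_items == 0 or top_k < 1:
        raise ValueError("need at least one item and top_k >= 1")
    top_k = min(top_k, n_items)

    # Normalise rows once so a plain dot product equals the cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    x = x / norms

    data = np.empty(n_items * top_k, dtype=np.float64)
    indices = np.empty(n_items * top_k, dtype=np.int64)

    for start in range(0, n_items, chunk_rows):
        stop = min(start + chunk_rows, n_items)
        # Only a (chunk x n_items) block of similarities exists at any time.
        sims = x[start:stop] @ x.T
        # argpartition selects the K largest entries per row without a full sort.
        cols = np.argpartition(sims, -top_k, axis=1)[:, -top_k:]
        cols.sort(axis=1)  # CSR convention: ascending column indices per row
        vals = np.take_along_axis(sims, cols, axis=1)
        data[start * top_k:stop * top_k] = vals.ravel()
        indices[start * top_k:stop * top_k] = cols.ravel()

    # Every row stores exactly top_k entries, so row pointers are evenly spaced.
    indptr = np.arange(0, n_items * top_k + 1, top_k)
    return data, indices, indptr
```

If SciPy is available, the three arrays can be wrapped as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_items, n_items))`. Peak memory is governed by `chunk_rows` rather than by the full item-by-item similarity matrix, which is what makes a RAM budget configurable.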
No commits in the last 6 months. Available on PyPI.
Stars: 86
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Dec 28, 2024
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/rragundez/chunkdot"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
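The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions not confirmed by the page: that the path segments after `/quality/` are category, owner, and repository (inferred from the single URL shown), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the segment order is inferred, not documented here."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("embeddings", "rragundez", "chunkdot")` would issue the same GET request as the curl command, and each call counts against the daily request limit.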
Higher-rated alternatives
curiosity-ai/catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...
Azure/azure-search-vector-samples
A repository of code samples for Vector search capabilities in Azure AI Search.
supabase/embeddings-generator
GitHub Action to generate embeddings from the markdown files in your repository.
vector-ai/vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data...
yusufhilmi/client-vector-search
A client side vector search library that can embed, store, search, and cache vectors. Works on...