gentaiscool/distfuse
A library for calculating similarity scores between two collections of text sequences encoded with transformer models. Intended uses are bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).
Supports ensemble scoring that combines multiple heterogeneous encoders (Hugging Face, OpenAI, Cohere), including instruction-tuned models, with configurable weights and distance metrics (cosine, Euclidean, Manhattan). Implements the MINERS approach for multilingual dense retrieval, enabling both pairwise similarity computation and multi-reference evaluation, in which each prediction is scored against a variable number of reference texts.
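The weighted ensemble idea can be illustrated with a minimal NumPy sketch. This is not the DistFuse API: the function names `pairwise_distance`, `fused_distance`, and `best_reference_distance` are hypothetical, the toy arrays stand in for real encoder outputs, and taking the minimum over references is an assumed reduction chosen only for illustration.

```python
import numpy as np

def pairwise_distance(a, b, metric):
    """Return a (len(a), len(b)) matrix of distances between rows of a and b."""
    if metric == "cosine":
        a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
        b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
        return 1.0 - a_n @ b_n.T
    diff = a[:, None, :] - b[None, :, :]  # broadcast to (len(a), len(b), dim)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")

def fused_distance(encoded_pairs, weights, metrics):
    """Weighted sum of per-encoder distance matrices (hypothetical helper)."""
    total = None
    for (a, b), w, m in zip(encoded_pairs, weights, metrics):
        d = w * pairwise_distance(a, b, m)
        total = d if total is None else total + d
    return total

def best_reference_distance(pred_vec, ref_vecs, metric="cosine"):
    """Score one prediction against any number of references.

    Assumption: the closest reference decides the score.
    """
    return float(pairwise_distance(pred_vec[None, :], ref_vecs, metric).min())

# Toy embeddings from two encoders with different dimensions.
enc1 = (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0], [0.0, 1.0]]))
enc2 = (np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]),
        np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]))

fused = fused_distance([enc1, enc2], weights=[1.0, 0.5],
                       metrics=["cosine", "euclidean"])
nearest = fused.argmin(axis=1)  # index of the closest target for each source

refs = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
multi_ref_score = best_reference_distance(np.array([1.0, 0.0]), refs)
```

Each encoder can have its own embedding dimension because only the distance matrices are combined. Cosine distance is bounded while Euclidean and Manhattan distances are not, so in a sketch like this the weights also have to compensate for differences in scale between metrics.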
No commits in the last 6 months. Available on PyPI.
Stars: 5
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 22, 2024
Monthly downloads: 20
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/gentaiscool/distfuse"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
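The same request can be issued from Python with the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path follows a `category/owner/repo` layout (inferred from the single URL above) and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the endpoint URL (assumed layout: category/owner/repo)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)

def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Building the URL needs no network access; calling fetch_quality() does,
# and each call counts against the daily request limit.
url = quality_url("embeddings", "gentaiscool", "distfuse")
```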
Higher-rated alternatives
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
pinecone-io/pinecone-datasets
An open-source dataset library for pre-embedded dataset: create your own data catalog, or use...