gentaiscool/distfuse

A library for calculating similarity scores between two collections of text sequences encoded with transformer models. Intended uses are bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).

Emerging

Supports ensemble scoring that combines multiple heterogeneous encoders (Hugging Face, OpenAI, Cohere), including instruction-tuned models, with configurable per-encoder weights and a choice of distance metric (cosine, Euclidean, Manhattan). Implements the MINERS approach for multilingual dense retrieval. It handles both pairwise similarity computation and multi-reference evaluation, where each prediction is scored against a variable number of reference texts.
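The idea of weighted ensemble scoring can be illustrated with a minimal sketch. This is not the distfuse API: the function name `fused_distances`, the toy embeddings, and the choice to combine encoders by a weighted sum of per-encoder distances are all assumptions made for illustration. It assumes each encoder has already turned both text collections into vectors, and that encoders may have different embedding sizes.

```python
import math


def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}


def fused_distances(left_by_encoder, right_by_encoder, weights, metric="cosine"):
    """Hypothetical helper: weighted sum of per-encoder distance matrices.

    left_by_encoder[k][i] is the embedding of left text i under encoder k,
    right_by_encoder[k][j] likewise for right text j.
    Returns a len(left) x len(right) matrix of fused distances.
    """
    dist = METRICS[metric]
    n_left = len(left_by_encoder[0])
    n_right = len(right_by_encoder[0])
    fused = [[0.0] * n_right for _ in range(n_left)]
    for left, right, weight in zip(left_by_encoder, right_by_encoder, weights):
        for i, u in enumerate(left):
            for j, v in enumerate(right):
                fused[i][j] += weight * dist(u, v)
    return fused


# Toy embeddings (made up): encoder 0 is 2-dimensional, encoder 1 is 3-dimensional.
left_by_encoder = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
]
right_by_encoder = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
]

scores = fused_distances(left_by_encoder, right_by_encoder, [0.5, 0.5], metric="cosine")

# For retrieval, pick the right-hand text with the smallest fused distance.
best_match = [row.index(min(row)) for row in scores]
```

Because distances from different metrics and encoders live on different scales, the weights in a sketch like this also act as a rough normalisation. For the multi-reference case, one could compute a row of fused distances per prediction and reduce it over that prediction's references, though how the library aggregates is not stated here.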

No commits in the last 6 months. Available on PyPI.

Maintenance 0 / 25

Adoption 7 / 25

Maturity 18 / 25

Community 14 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

aryn-ai/sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

deepset-ai/haystack-tutorials

Here you can find all the Tutorials for Haystack 📓

MaartenGr/PolyFuzz

Fuzzy string matching, grouping, and evaluation.

unum-cloud/USearch

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...

pinecone-io/pinecone-datasets

An open-source dataset library for pre-embedded dataset: create your own data catalog, or use...
