shibing624/similarities

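Similarities: a toolkit for similarity calculation and semantic search, supporting text-to-text, text-to-image, and image-to-image search over hundreds of millions of items. Written in Python 3 and usable out of the box.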
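Embedding-based semantic search ultimately comes down to scoring a query vector against stored vectors and keeping the closest ones. The sketch below shows exact (brute-force) cosine top-k search in plain Python. It does not use the similarities library's own API. The toy 3-dimensional vectors and the helper names `cosine` and `top_k` are hypothetical stand-ins for real sentence or image embeddings. Backends such as Faiss, Annoy, or HNSW approximate this same ranking so that it scales to very large corpora.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact search: score every corpus vector against the query and keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones would come from an encoder such as CoSENT or CLIP.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], corpus, k=2))
```

With this query, `doc_a` ranks first and `doc_b` second, while the orthogonal `doc_c` scores 0.0 and is dropped. Brute force costs one comparison per stored vector, which is exactly the cost that approximate nearest-neighbor indexes are designed to avoid at scale.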

Established

Implements semantic matching models, including CoSENT and CLIP, and supports several embedding-based approximate nearest-neighbor search backends (Faiss, Annoy, HNSW) for billion-scale retrieval. Provides unified APIs for computing text, image, and cross-modal similarities with pre-trained transformer models from Hugging Face, plus literal matching methods (BM25, Word2Vec, SimHash) for cold-start scenarios. Includes CLI tools, FastAPI backend services, and Gradio frontends for deploying search and clustering pipelines in production.
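Literal matching methods such as BM25 rank documents by weighted term overlap, so they need no trained model. That property makes them useful for cold starts. Below is a minimal, self-contained sketch of Okapi BM25 using the common "+1 inside the log" IDF variant. It is not the similarities library's implementation. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the pre-tokenized English toy corpus are assumptions for illustration. Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length in tokens
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF with +1 inside the log so that very common terms never score negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            denom = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    ["how", "to", "reset", "password"],
    ["reset", "router", "settings"],
    ["weather", "today"],
]
print(bm25_scores(["reset", "password"], docs))
```

The first document matches both query terms and scores highest. The second matches only "reset", and the third shares no terms and scores 0.0. `k1` caps how much repeated occurrences of a term can add, and `b` controls how strongly long documents are penalized.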

899 stars and 404 monthly downloads. Actively maintained with 1 commit in the last 30 days. Available on PyPI.

Maintenance 13 / 25

Adoption 16 / 25

Maturity 18 / 25

Community 19 / 25

Stars: 899

Forks:

Language: Python

License: Apache-2.0

Related tools

explosion/sense2vec

🦆 Contextually-keyed word vectors

chakki-works/chakin

Simple downloader for pre-trained word vectors

pdrm83/sent2vec

How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.

sebischair/Lbl2Vec

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with...

code-kern-ai/embedders

With embedders, you can easily convert your texts into sentence- or token-level embeddings...
