shibing624/similarities
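Similarities: a toolkit for similarity calculation and semantic search. It supports text-to-text, text-to-image, and image-to-image search over data at the scale of hundreds of millions of items, is written in Python 3, and works out of the box.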
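Text-to-text search of this kind usually comes down to ranking corpus embeddings by their cosine similarity to a query embedding. The sketch below is a minimal illustration under stated assumptions: the embeddings are taken as already computed (the toy 3-d vectors are hypothetical stand-ins for encoder output), and every corpus item is scored exhaustively. It is not the similarities package's API, and the `cosine` and `top_k` helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], corpus: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Score every corpus vector against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; a real system would get these from a pre-trained encoder.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

The linear scan costs one similarity computation per corpus item per query, which is why approximate nearest-neighbour indexes take over once the corpus grows large.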
Implements several semantic matching models, including CoSENT and CLIP, and supports multiple embedding-based search backends (Faiss, Annoy, HNSW) for large-scale retrieval. Provides unified APIs for computing text, image, and cross-modal similarities with pre-trained transformer models from Hugging Face, plus literal matching methods (BM25, Word2Vec, SimHash) for cold-start scenarios. Includes CLI tools, FastAPI backend services, and Gradio frontends for deploying search and clustering pipelines in production.
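In cold-start scenarios without a trained encoder, a literal matcher such as BM25 ranks documents by weighted term overlap with the query. The following is a minimal from-scratch sketch of standard Okapi BM25, not the similarities package's implementation; the whitespace tokenizer, the defaults k1=1.5 and b=0.75, the Lucene-style IDF, and the `bm25_scores` name are all assumptions for illustration. Chinese text would need a word segmenter rather than `str.split`.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = freq + k1 * (1.0 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1.0) / denom
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "dogs chase the cat".split(),
    "stock prices fell today".split(),
]
print(bm25_scores("cat mat".split(), docs))
```

The first document matches both query terms and the rarer term "mat" carries a higher IDF, so it outranks the second; the third shares no terms and scores zero.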
899 stars and 404 monthly downloads. Actively maintained with 1 commit in the last 30 days. Available on PyPI.
Stars: 899
Forks: 90
Language: Python
License: Apache-2.0
Last pushed: Mar 05, 2026
Monthly downloads: 404
Commits (30d): 1
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/shibing624/similarities"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Related tools
explosion/sense2vec
🦆 Contextually-keyed word vectors
chakki-works/chakin
Simple downloader for pre-trained word vectors
pdrm83/sent2vec
How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.
sebischair/Lbl2Vec
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with...
code-kern-ai/embedders
With embedders, you can easily convert your texts into sentence- or token-level embeddings...