SeanWong17/Semantic-Text-Deduplicator
A high-performance text deduplication tool built on Transformer models (such as BERT) and FAISS indexing, designed to handle semantic duplicates in large-scale corpora.
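The description above names the general technique: embed each text as a vector, then drop texts whose vectors are nearly identical to one already kept. The following is a minimal sketch of that idea only, not the repository's implementation. The function names, the 0.9 threshold, and the toy 2-D vectors are all assumptions for illustration; a real pipeline would obtain the vectors from a Transformer encoder and would replace the brute-force comparison with a FAISS index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(texts, vectors, threshold=0.9):
    """Greedy deduplication: keep a text only if its vector is below
    `threshold` similarity to every previously kept vector.

    Hypothetical helper: brute-force search stands in for the
    nearest-neighbour index (e.g. FAISS) used at scale.
    """
    kept = []  # indices of texts retained so far
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]


# Toy data: the vectors are made up, standing in for model embeddings.
texts = [
    "The cat sat on the mat.",
    "A cat was sitting on the mat.",
    "Stock prices fell today.",
]
vectors = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]

unique = deduplicate(texts, vectors)
print(unique)
```

With these toy vectors the second sentence is treated as a semantic duplicate of the first and dropped, while the third is kept.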
No commits in the last 6 months.
Stars: 3
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Aug 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/SeanWong17/Semantic-Text-Deduplicator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
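The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the reading of the URL path as category, owner, and repository is inferred from the single example above, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Assemble the endpoint URL (path layout is an assumption
    inferred from the one example URL)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text.

    The response schema is unknown, so no parsing is attempted.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = build_quality_url("embeddings", "SeanWong17", "Semantic-Text-Deduplicator")
print(url)
```

Calling `fetch_quality("embeddings", "SeanWong17", "Semantic-Text-Deduplicator")` performs the network request and counts against the daily limit.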
Higher-rated alternatives
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.