MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
Supports ensemble matching across multiple algorithms (edit distance, TF-IDF n-grams, FastText, GloVe, and transformer embeddings) with unified `fit`/`transform` workflows for production deployment. Includes hierarchical grouping via single-linkage clustering and precision-recall evaluation curves to validate matching quality at different similarity thresholds. Integrates with Hugging Face transformers, Flair, spaCy, and Gensim backends with optional sparse matrix acceleration.
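To make the TF-IDF n-gram approach and the `fit`/`transform` split more concrete, here is a minimal standard-library sketch. It is not PolyFuzz's code or API: the class `NgramTfidfMatcher`, the helper `char_ngrams`, and the `min_similarity` parameter are hypothetical names chosen for illustration, and the smoothed IDF formula is an assumption. The idea is that `fit` learns n-gram weights from a reference list, and `transform` maps unseen strings onto that list by cosine similarity.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramTfidfMatcher:
    """Toy matcher (hypothetical, not PolyFuzz): fit on reference strings,
    then transform unseen strings into (query, best match, similarity)."""

    def __init__(self, n=3):
        self.n = n
        self.idf = {}
        self.targets = []  # list of (original string, unit-length vector)

    def _vectorize(self, text):
        counts = Counter(char_ngrams(text, self.n))
        # n-grams never seen during fit have no IDF weight and are dropped.
        vec = {g: c * self.idf[g] for g, c in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def fit(self, to_list):
        doc_freq = Counter()
        for s in to_list:
            doc_freq.update(set(char_ngrams(s, self.n)))
        total = len(to_list)
        # Smoothed inverse document frequency (an assumed, common variant).
        self.idf = {
            g: math.log((1 + total) / (1 + df)) + 1
            for g, df in doc_freq.items()
        }
        self.targets = [(s, self._vectorize(s)) for s in to_list]
        return self

    def transform(self, from_list, min_similarity=0.0):
        results = []
        for s in from_list:
            vec = self._vectorize(s)
            best, best_score = None, 0.0
            for target, tvec in self.targets:
                # Both vectors are unit length, so the dot product is the cosine.
                score = sum(w * tvec.get(g, 0.0) for g, w in vec.items())
                if score > best_score:
                    best, best_score = target, score
            if best_score < min_similarity:
                best = None  # below threshold: report no match
            results.append((s, best, round(best_score, 3)))
        return results
```

With a reference list such as `["apple", "apples", "house", "mouse"]`, calling `NgramTfidfMatcher().fit(reference).transform(["appl", "hous", "zzzz"])` pairs `"appl"` with `"apple"` and `"hous"` with `"house"`, and returns no match for `"zzzz"` because it shares no trigram with the reference list. Raising `min_similarity` trades recall for precision, which is the trade-off a precision-recall curve over thresholds makes visible.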
792 stars. Used by 2 other packages. No commits in the last 6 months. Available on PyPI.
Stars: 792
Forks: 71
Language: Python
License: MIT
Category:
Last pushed: Jul 10, 2025
Commits (30d): 0
Dependencies: 9
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/MaartenGr/PolyFuzz"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
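The same request can be made from Python with the standard library. This is a sketch under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is returned as raw text because its format is not documented here. How an API key would be supplied is not stated, so the sketch covers only the keyless case.

```python
import urllib.request

# Base path taken from the curl example; the segment layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text, unparsed."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("embeddings", "MaartenGr", "PolyFuzz")` issues the same GET request as the curl command and counts against the same daily limit.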
Related tools
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
pinecone-io/pinecone-datasets
An open-source dataset library for pre-embedded dataset: create your own data catalog, or use...
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.