vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

48 / 100 (Emerging)

Combines PyTorch Lightning for training with the N2 approximate nearest neighbor library to enable efficient blocking at scale, achieving ~0.99 recall (very few true duplicate pairs are missed). Uses contrastive learning to arrange embeddings in an N-dimensional space where duplicate records cluster together, and is optimized for the blocking/indexing stage rather than final pair matching. Designed as a preprocessing component in entity resolution pipelines: it returns high-recall candidate pairs for downstream filtering by pairwise classifiers.
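To make the blocking idea concrete, here is a minimal sketch of how candidate pairs can be derived from record embeddings. It does not use entity-embed's API or the N2 library: the function names (`cosine`, `candidate_pairs`), the hand-written vectors, and the `k` and `threshold` values are all illustrative assumptions, and a brute-force cosine search stands in for the approximate nearest neighbor index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def candidate_pairs(embeddings, k=2, threshold=0.9):
    """Return candidate duplicate pairs from a dict of id -> vector.

    For each record, look at its k most similar records (brute force here;
    an ANN index would replace this loop at scale) and keep the pairs whose
    similarity reaches the threshold.
    """
    pairs = set()
    for id_a, vec_a in embeddings.items():
        scored = sorted(
            (
                (cosine(vec_a, vec_b), id_b)
                for id_b, vec_b in embeddings.items()
                if id_b != id_a
            ),
            reverse=True,
        )
        for sim, id_b in scored[:k]:
            if sim >= threshold:
                # Store each pair once, regardless of lookup direction.
                pairs.add(tuple(sorted((id_a, id_b))))
    return pairs


# Hypothetical embeddings: in practice a trained model would produce these.
embeddings = {
    "acme_inc": [0.98, 0.10, 0.05],
    "acme_incorporated": [0.97, 0.12, 0.06],
    "globex": [0.05, 0.99, 0.10],
    "globex_corp": [0.07, 0.98, 0.12],
    "initech": [0.10, 0.05, 0.99],
}

pairs = candidate_pairs(embeddings)
```

The resulting pairs are candidates only. In a pipeline like the one described above, a downstream pairwise classifier would decide which of them are true matches.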

161 stars. No commits in the last 6 months. Available on PyPI.

Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 13 / 25

Stars: 161
Forks: 16
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 18, 2022
Commits (30d): 0
Dependencies: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/vintasoftware/entity-embed"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
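The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single curl example above, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; the category/owner/repo layout
# is inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality data; assumes the endpoint returns a JSON body."""
    with urllib.request.urlopen(
        quality_url(category, owner, repo), timeout=timeout
    ) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("embeddings", "vintasoftware", "entity-embed")
# data = fetch_quality("embeddings", "vintasoftware", "entity-embed")
```

The network call is left commented out so the snippet can be imported without consuming the daily request allowance.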