vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Combines PyTorch Lightning for training with the N2 approximate nearest neighbor library to enable efficient blocking at scale, achieving ~0.99 recall with minimal false negatives. Uses contrastive learning to organize embeddings in N-dimensional space where duplicate records cluster together, optimized specifically for the blocking/indexing stage rather than final pair matching. Designed as a preprocessing component in entity resolution pipelines, returning high-recall candidate pairs for downstream filtering by pairwise classifiers.
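Embedding-based blocking works by pairing each record with its nearest neighbors in the vector space. It is then judged by pair recall, the fraction of true duplicate pairs that end up among the candidates. The sketch below illustrates this idea under several assumptions. The embeddings are taken as already produced by a trained model. Exact brute-force cosine k-nearest-neighbor search stands in for the N2 approximate index. `knn_blocking` and `pair_recall` are hypothetical helper names and are not part of entity-embed's API.

```python
from __future__ import annotations

import numpy as np


def knn_blocking(embeddings: np.ndarray, k: int) -> set[tuple[int, int]]:
    """Hypothetical helper: link each record to its k most similar records.

    Uses exact brute-force cosine similarity as a stand-in for an ANN index
    such as N2. Returns unordered candidate pairs as (i, j) with i < j.
    """
    # L2-normalize rows so the dot product equals cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a record is never its own candidate
    k = min(k, len(embeddings) - 1)
    # Column indices of the k highest similarities in each row
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    pairs: set[tuple[int, int]] = set()
    for i, row in enumerate(neighbors):
        for j in row:
            j = int(j)
            pairs.add((min(i, j), max(i, j)))
    return pairs


def pair_recall(candidates: set[tuple[int, int]], true_pairs: set[tuple[int, int]]) -> float:
    """Hypothetical helper: fraction of true duplicate pairs kept by blocking."""
    if not true_pairs:
        return 1.0
    found = sum(1 for pair in true_pairs if pair in candidates)
    return found / len(true_pairs)


# Toy data: records 0/1 and 2/3 are duplicates that sit close together
emb = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.98]])
candidates = knn_blocking(emb, k=1)
print(sorted(candidates))
print(pair_recall(candidates, {(0, 1), (2, 3)}))
```

A real ANN index trades a small amount of recall for search that scales sub-quadratically. The brute-force version here computes every pairwise similarity and is only practical for small datasets. A larger `k` raises recall at the cost of more candidate pairs for the downstream pairwise classifier.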
161 stars. No commits in the last 6 months. Available on PyPI.
Stars: 161
Forks: 16
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 18, 2022
Commits (30d): 0
Dependencies: 12
Higher-rated alternatives
MilaNLProc/contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings...
vinid/cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora
ina-foss/twembeddings
Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and...
criteo-research/CausE
Code for the RecSys 2018 paper "Causal Embeddings for Recommendation".
spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics