Domain-Specific Embedding Tools

Task-specific embedding models and representations trained on specialized vocabularies, domains, or linguistic phenomena (legislation, events, topics, entities, skills). Does NOT include general-purpose pre-trained embeddings, embedding infrastructure, or domain-agnostic retrieval systems.

82 domain-specific embedding tools are tracked. One scores above 50 (Established tier). The highest-rated is MilaNLProc/contextualized-topic-models at 58/100, with 1,266 stars and 1,326 monthly downloads.

Get all 82 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=domain-specific-embeddings&limit=20"

The API is open to everyone at 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
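For scripted access, the curl request above can be reproduced in Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented on this page, so the sketch returns the parsed payload without assuming any field names. The query parameters are copied from the curl command, including `limit=20`; whether a larger `limit` returns all 82 projects in one call is not stated here and should be checked against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="embeddings",
                      subcategory="domain-specific-embeddings",
                      limit=20):
    """Build the request URL (hypothetical helper, not part of any SDK)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Perform the GET request and parse the body as JSON.

    The response schema is an unknown here, so the parsed object is
    returned as-is without reading any specific keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_quality_url()
    print(url)
    # Uncomment to perform the network request (counts against the daily quota):
    # data = fetch_quality(url)
    # print(type(data))
```

Keeping URL construction separate from the network call makes it easy to vary `limit` or `subcategory` and to inspect the exact request before spending any of the daily quota.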

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | MilaNLProc/contextualized-topic-models | A python package to run contextualized topic modeling. CTMs combine... | 58 | Established |
| 2 | vinid/cade | Compass-aligned Distributional Embeddings. Align embeddings from different corpora | 47 | Emerging |
| 3 | ina-foss/twembeddings | Sentence embeddings for unsupervised event detection in the Twitter stream:... | 43 | Emerging |
| 4 | criteo-research/CausE | Code for the Recsys 2018 paper entitled Causal Embeddings for Recommandation. | 41 | Emerging |
| 5 | spcl/ncc | Neural Code Comprehension: A Learnable Representation of Code Semantics | 41 | Emerging |
| 6 | vintasoftware/entity-embed | PyTorch library for transforming entities like companies, products, etc.... | 41 | Emerging |
| 7 | bnosac/ruimtehol | R package to Embed All the Things! using StarSpace | 38 | Emerging |
| 8 | rodrigobressan/entity_embeddings_categorical | Discover relevant information about categorical data with entity embeddings... | 37 | Emerging |
| 9 | BaseModelAI/cleora | Cleora AI is a general-purpose open-source model for efficient, scalable... | 36 | Emerging |
| 10 | mop/bier | Cleaned up reference implementation of BIER: Boosting Independent Embeddings... | 34 | Emerging |
| 11 | tony-hong/event-embedding-multitask | *SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach | 33 | Emerging |
| 12 | uhh-lt/sensegram | Making sense embedding out of word embeddings using graph-based word sense induction | 33 | Emerging |
| 13 | jxmorris12/cde | code for training & evaluating Contextual Document Embedding models | 31 | Emerging |
| 14 | cpa-analytics/embedding-encoder | Scikit-Learn compatible transformer that turns categorical variables into... | 31 | Emerging |
| 15 | dustinstoltz/CMDist | DEPRECATED - The Concept Mover's Distance Method is now available in the... | 30 | Emerging |
| 16 | bnosac/ETM | Topic Modelling in Semantic Embedding Spaces | 30 | Emerging |
| 17 | lfmatosm/embedded-topic-model | A package to run embedded topic modelling with ETM. Adapted from the... | 30 | Emerging |
| 18 | WladimirSidorenko/SentiLex | Sentiment Lexicon Generation Suite | 30 | Emerging |
| 19 | wangjksjtu/multi-embedding-cws | Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019 | 30 | Emerging |
| 20 | y3ro/meemi | Improving cross-lingual word embeddings by meeting in the middle | 29 | Experimental |
| 21 | milangritta/Minimalist-Location-Metonymy-Resolution | The code and data accompanying the ACL 2017 "outstanding award" publication ... | 29 | Experimental |
| 22 | dustinstoltz/cartography_poetics | Reproduction Repository for "Cultural Cartography with Word Embeddings" | 27 | Experimental |
| 23 | dkn22/embedder | Embed categorical variables via neural networks. | 27 | Experimental |
| 24 | marziehf/TS_Embeddings | Learning topic-sensitive word embeddings | 26 | Experimental |
| 25 | kaushalshetty/Positional-Encoding | Encoding position with the word embeddings. | 26 | Experimental |
| 26 | oentaryorj/smu.softeng.crossact | Cross-platform activity prediction | 25 | Experimental |
| 27 | vgupta123/P-SIF | Source code for our AAAI 2020 paper P-SIF: Document Embeddings using... | 25 | Experimental |
| 28 | arsena-k/discourse_atoms | How are topics encoded in semantic space? Repository to accompany PNAS... | 25 | Experimental |
| 29 | ikergarcia1996/MVM-Embeddings | A monolingual and cross-lingual meta-embedding generation framework | 24 | Experimental |
| 30 | ltgoslo/diachronic_armed_conflicts | Diachronic armed conflicts prediction from news texts | 24 | Experimental |
| 31 | garawalid/Multilingual-Unsupervised-Embeddings | Align two embeddings (EN - FR) using MUSE (Unsupervised) | 24 | Experimental |
| 32 | skesiraju/smm | Subspace multinomial model for learning document representations | 24 | Experimental |
| 33 | dwulff/embedR | Generate and analyze state-of-the-art text embeddings | 24 | Experimental |
| 34 | harmanpreet93/poincare-embedding-using-gensim | Train poincare embedding using gensim | 23 | Experimental |
| 35 | catalyst-cooperative/ccai-entity-matching | An exploration of generalizable approaches to unsupervised entity matching... | 23 | Experimental |
| 36 | cisnlp/MEXA | 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | 23 | Experimental |
| 37 | moonlockwood/BinaryNeuralNetwork | Tiny nn for experimenting with '8-hot' binary encoded embeddings | 22 | Experimental |
| 38 | armintabari/Emotional-Embedding | Retraining embedding models to incorporate emotional constraints. | 21 | Experimental |
| 39 | arranger1044/spae | Code and supplemental material for "Sum-Product Autoencoding: Encoding and... | 21 | Experimental |
| 40 | BUTSpeechFIT/BaySMM | A Bayesian Multilingual Document Model | 21 | Experimental |
| 41 | rug-compling/bimu | Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders | 21 | Experimental |
| 42 | stephantul/piecelearn | Learning BPE embeddings by first learning a segmentation model and then... | 20 | Experimental |
| 43 | ChenghaoMou/embeddings | zero-vocab or low-vocab embeddings | 20 | Experimental |
| 44 | shuxiaobo/text-representation | Text representation works, such as : paper, code, review, datasets, blogs,... | 20 | Experimental |
| 45 | dalisson/am_softmax | This is a pytorch implementation of the am_softmax, this softmax layer... | 20 | Experimental |
| 46 | corradomonti/ideological-embeddings | Code and data for the CIKM2021 paper "Learning Ideological Embeddings From... | 20 | Experimental |
| 47 | MiuLab/GenDef | Probing task; contextual embeddings -> textual definitions (EMNLP19) | 20 | Experimental |
| 48 | zouharvi/pwesuite | Suite for phonetic word embeddings, especially their evaluation and baseline models. | 20 | Experimental |
| 49 | fursovia/geometric_embedding | "Zero-Training Sentence Embedding via Orthogonal Basis" paper implementation | 19 | Experimental |
| 50 | centre-for-humanities-computing/embedding-projection | This is a repository for reproducing the results of Continuous sentiment... | 18 | Experimental |
| 51 | rimonim/embedplyr | Tools for Working With Text Embeddings in R | 18 | Experimental |
| 52 | gabmoreira/subspaces | Code for the paper Learning Visual-Semantic Subspace Representations | 18 | Experimental |
| 53 | yyaghoobzadeh/figment-multi | Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities | 18 | Experimental |
| 54 | g-laz77/Cross-Lingual-Word-Embeddings | Learn a shared embedding space between words in multiple languages. | 18 | Experimental |
| 55 | victor7246/MRF-LDA | MRF-LDA model for topic modelling | 18 | Experimental |
| 56 | pedrada88/relative | Repository to learn relation vectors from text corpora. Includes the... | 17 | Experimental |
| 57 | manojsukhavasi/Unsupervised-Cross-Lingual-Embeddings | cross-lingual word embeddings with unsupervised learning | 17 | Experimental |
| 58 | junyachen/NPMM | A nonparametric model for online topic discovery with word embeddings | 17 | Experimental |
| 59 | soliblue/Reddit-Politics | Code for a large-scale analysis of political subcommunities on Reddit,... | 16 | Experimental |
| 60 | JanEnglerRWTH/SensePOLAR | Code related to the project: SensePOLAR: Word sense aware interpretability... | 16 | Experimental |
| 61 | vpuru98/Embeddings | Training Word Embeddings and using them to perform Sentiment Analysis with... | 16 | Experimental |
| 62 | izhx/uni-rep | Code for embedding and retrieval research. | 15 | Experimental |
| 63 | kiudee/pareto-embeddings | Advanced choice modeling with multidimensional utility representations. | 14 | Experimental |
| 64 | jparkerweb/fast-topic-analysis | 🏷️ Fast Topic Analysis is a tool for analyzing text against predefined... | 14 | Experimental |
| 65 | gabmoreira/subembed | Repository for the paper: Native Logical and Hierarchical Representations... | 14 | Experimental |
| 66 | r2d4/blog-embeddings | Script to generate embeddings from a blog and use GPT-3.5 to categorize the... | 13 | Experimental |
| 67 | Riccorl/sense-embedding | BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText | 13 | Experimental |
| 68 | do-me/embedding-algebra | Test scripts for common word embedding falsehoods like King - Man + Woman =... | 13 | Experimental |
| 69 | satya77/Entity_Embedding | Reference implementation of the paper "Word Embeddings for Entity-annotated Texts" | 12 | Experimental |
| 70 | pedrada88/rwe | Repository containing data and code of the ACL-19 paper "Relational Word Embeddings" | 12 | Experimental |
| 71 | csiro-robotics/MDL | 🔥[IEEE TPAMI 2023] Official repository TPAMI 2023 paper "Exploiting Field... | 12 | Experimental |
| 72 | tteofili/jtm | tool for extraction of topics from jira issues | 12 | Experimental |
| 73 | jdenes/TopicEmbeddings | An open-source framework to create and test document embeddings using topic models. | 11 | Experimental |
| 74 | yigitsever/Evaluating-Dictionary-Alignment | Code for the paper "Evaluating cross-lingual textual similarity on... | 11 | Experimental |
| 75 | stannida/skill-embeddings | Repository for the Master Thesis "Encoding semantic information about skills... | 11 | Experimental |
| 76 | Develop-Packt/Deep-Learning-for-Text-embeddings | This module demonstrates the power of word embeddings and explains the... | 11 | Experimental |
| 77 | slowwavesleep/FnSenseMapper | A tool to map FrameNet Lexical Units to BabelNet synsets using the distance... | 11 | Experimental |
| 78 | akshaychawla/Accelerated-Training-by-disentangling-neural-representations | Just a theory. | 11 | Experimental |
| 79 | KlaraGtknst/text_topic | This repository implements a pipeline to store various data of files from a... | 11 | Experimental |
| 80 | thecml/neural_embedder | A small library that can encode categorical variables to entity embeddings... | 10 | Experimental |
| 81 | apostolidoum/modeling-behaviour-of-SoC-players | Code for my Diploma Thesis. The goal was to model the players' behavior by... | 10 | Experimental |
| 82 | amitkumarj441/CIKM2023_SubspaceEmbedding | Pluggable Embedding Code for our CIKM paper titled "Lightweight Adaptation... | 10 | Experimental |