Pretrained Embedding Models (Embedding Tools)

Tools and implementations for loading, extracting, and utilizing pre-trained language model embeddings (BERT, ELMo, GloVe, RoBERTa, etc.). Does NOT include embedding APIs, vector databases, downstream applications like semantic search, or domain-specific embedding use cases.

45 pretrained embedding model tools are tracked. One scores above 70 (Verified tier). The highest-rated is MinishLab/model2vec at 87/100, with 2,008 stars and 582,040 monthly downloads. One of the top 10 is actively maintained.

Get all 45 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=pretrained-embedding-models&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
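The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the decoded payload is returned unchanged. The curl example uses `limit=20`; whether a larger `limit` returns all 45 projects in one response is an assumption that would need to be checked against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings",
              subcategory="pretrained-embedding-models",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Perform the GET request and decode the JSON body.

    The response schema is not specified in this document, so the
    decoded object is returned as-is rather than being unpacked.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the keyless quota of 100 requests/day, so caching the result locally is a reasonable design choice for repeated analysis.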

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | MinishLab/model2vec | Fast State-of-the-Art Static Embeddings | 87 | Verified |
| 2 | AnswerDotAI/ModernBERT | Bringing BERT into modernity via both architecture changes and scaling | 55 | Established |
| 3 | Santosh-Gupta/SpeedTorch | Library for faster pinned CPU <-> GPU transfer in PyTorch | 54 | Established |
| 4 | twang2218/vocab-coverage | Analysis of the Chinese cognitive abilities of language models | 54 | Established |
| 5 | Embedding/Chinese-Word-Vectors | 100+ Chinese Word Vectors (over a hundred kinds of pretrained Chinese word vectors) | 51 | Established |
| 6 | tensorflow/hub | A library for transfer learning by reusing parts of TensorFlow models. | 51 | Established |
| 7 | MinishLab/tokenlearn | Pre-train Static Word Embeddings | 47 | Emerging |
| 8 | AliOsm/simplerepresentations | Easy-to-use text representations extraction library based on the... | 46 | Emerging |
| 9 | ltgoslo/simple_elmo | Simple library to work with pre-trained ELMo models in TensorFlow | 45 | Emerging |
| 10 | jasonwei20/eda_nlp | Data augmentation for NLP, presented at EMNLP 2019 | 42 | Emerging |
| 11 | pdasigi/onto-lstm | Keras implementation of ontology aware token embeddings | 42 | Emerging |
| 12 | PlanTL-GOB-ES/lm-spanish | Official source for Spanish Language Models and resources made @ BSC-TEMU... | 40 | Emerging |
| 13 | Riccorl/transformers-embedder | A Word Level Transformer layer based on PyTorch and 🤗 Transformers. | 40 | Emerging |
| 14 | setu4993/convert-labse-tf-pt | Convert LaBSE model from TF Hub to PyTorch. | 38 | Emerging |
| 15 | YC-wind/embedding_study | A study of generating character embeddings with Chinese pretrained models, testing how well BERT and ELMo perform on Chinese | 37 | Emerging |
| 16 | davidberenstein1957/fast-sentence-transformers | Simply, faster, sentence-transformers | 36 | Emerging |
| 17 | siddk/relation-network | TensorFlow Implementation of Relation Networks for the bAbI QA Task,... | 34 | Emerging |
| 18 | MoleculeTransformers/smiles-featurizers | Extract Molecular SMILES embeddings from language models pre-trained with... | 29 | Experimental |
| 19 | fsxfreak/nlp-augment | A collection of utilities used in exploring data augmentation of... | 28 | Experimental |
| 20 | milistu/bertdistiller | Faster, smaller BERT models in just a few lines of code. | 28 | Experimental |
| 21 | Textualization/Ropherta | Compute RoBERTa embeddings in PHP using ONNX framework. | 26 | Experimental |
| 22 | agadetsky/pytorch-definitions | [ACL 2018] Conditional Generators of Words Definitions | 26 | Experimental |
| 23 | jina-ai/embedding-fingerprints | Identify which embedding model produced a vector using digit-level... | 25 | Experimental |
| 24 | windsuzu/Joint-Semantic-Phonetic-Embedding | We use phonetics as a feature to create a joint semantic-phonetic embedding... | 25 | Experimental |
| 25 | WenchenLi/capricorn | NLP vocabulary builder and embedding loader | 25 | Experimental |
| 26 | Textualization/sentence-transphormers | Compute RoBERTa sentence embeddings in PHP using ONNX framework | 24 | Experimental |
| 27 | rcarmo/asterisk-embedding-model | A small text embedding model for low-resource hardware | 23 | Experimental |
| 28 | dataiku/dss-plugin-nlp-embedding | Dataiku DSS plugin to extract vector embeddings from text data 👾 | 23 | Experimental |
| 29 | SpydazWebAI-NLP/SpydazWebAI_NLP_Models | Word/Image/Audio Embedding models, Tokenizer models, Ngram language models,... | 22 | Experimental |
| 30 | ksm26/Understanding-and-Applying-Text-Embeddings | Dive into the world of text embeddings. This course will guide you through... | 21 | Experimental |
| 31 | smpanaro/ModernBERT-AppleNeuralEngine | ModernBERT model optimized for Apple Neural Engine. | 21 | Experimental |
| 32 | greninja/NPLM | Neural Network for word embeddings and Language Model | 20 | Experimental |
| 33 | sz128/pretrained_word_embeddings | It is about how to load and aggregate pretrained word embeddings in PyTorch,... | 20 | Experimental |
| 34 | rbitr/ferrite | Simple, lightweight transformers in Fortran | 20 | Experimental |
| 35 | Repmak/sentenCPP | C++20 library designed to replicate the functionality and ease of use of the... | 19 | Experimental |
| 36 | rahmanidashti/pretrain-lightfm | Pre-train Embedding in LightFM Recommender System Framework | 18 | Experimental |
| 37 | MayankSingh-coder/octopus-prime | Perceptron-based neural models with tokenization, embeddings, and a minimal... | 18 | Experimental |
| 38 | vliu15/elmo-kmeans | GPU-accelerated Topic Analysis pipeline | 18 | Experimental |
| 39 | ruanchaves/elmo | Supporting code for the paper "Portuguese Language Models and Word... | 18 | Experimental |
| 40 | vliu15/qanet | TensorFlow QANet with ELMo | 17 | Experimental |
| 41 | EsterHlav/Quantitative-Comparison-NLP-Embeddings-from-GloVe-to-RoBERTa | Fair quantitative comparison of NLP embeddings from GloVe to RoBERTa with... | 16 | Experimental |
| 42 | chmcbs/chinese-noun-embeddings | An analysis of how encoder transformer models represent Chinese nouns,... | 15 | Experimental |
| 43 | HenryNdubuaku/pete | Parameter-efficient transformer embeddings replace learned embeddings with... | 15 | Experimental |
| 44 | ada-k/LanguageModels | Pretrained transformer and embeddings language models | 14 | Experimental |
| 45 | dayyass/muse_tf2pt | Convert MUSE from TensorFlow to PyTorch and ONNX | 14 | Experimental |