Embedding Evaluation Benchmarks (Embedding Tools)

Tools and frameworks for evaluating, testing, and benchmarking embedding models across various dimensions (quality, stress-testing, cross-lingual performance). Does NOT include embedding generation, pre-trained models, or domain-specific embedding applications.

58 embedding evaluation benchmark tools are tracked. One scores above 70 (Verified tier). The highest-rated is embeddings-benchmark/mteb at 99/100, with 3,159 stars and 1,555,633 monthly downloads. One of the top 10 is actively maintained.

Get all 58 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-evaluation-benchmarks&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
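The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The function names `build_url` and `fetch_projects` are hypothetical, and the response schema is not described on this page, so the parsed JSON is returned unchanged. Whether `limit` can be raised above 20 to return all 58 projects in one call is an assumption that has not been verified.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings",
              subcategory="embedding-evaluation-benchmarks",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the listing and return the parsed JSON as-is.

    The structure of the response is not documented here, so no fields
    are assumed; inspect the returned object before relying on its shape.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request and counts against the daily quota):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

With the default arguments, `build_url()` reproduces the URL from the curl command exactly, so the unauthenticated limit of 100 requests/day applies to it in the same way.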

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | embeddings-benchmark/mteb | MTEB: Massive Text Embedding Benchmark | 99 | Verified |
| 2 | yannvgn/laserembeddings | LASER multilingual sentence embeddings as a pip package | 53 | Established |
| 3 | harmonydata/harmony | The Harmony Python library: a research tool for psychologists to harmonise... | 52 | Established |
| 4 | embeddings-benchmark/results | Data for the MTEB leaderboard | 47 | Emerging |
| 5 | MilaNLProc/honest | A Python package to compute HONEST, a score to measure hurtful sentence... | 45 | Emerging |
| 6 | fresh-stack/freshstack | This repository helps you evaluate your models on the FreshStack benchmark! | 44 | Emerging |
| 7 | autonomio/signs | A suite of tools for text preparation, vectorization and processing for deep... | 41 | Emerging |
| 8 | Hironsan/awesome-embedding-models | A curated list of awesome embedding models tutorials, projects and communities. | 41 | Emerging |
| 9 | SeanLee97/AnglE | Train and Infer Powerful Sentence Embeddings with AnglE \| 🔥 SOTA on STS and... | 39 | Emerging |
| 10 | flipz357/S3BERT | Semantically Structured Sentence Embeddings | 39 | Emerging |
| 11 | etalab-ia/mediatech | Collection of public datasets from the French administration, vectorized and... | 38 | Emerging |
| 12 | plasticityai/magnitude | A fast, efficient universal vector embedding utility package. | 37 | Emerging |
| 13 | isaacus-dev/mleb | The code used to evaluate embedding models on the Massive Legal Embedding... | 37 | Emerging |
| 14 | bheinzerling/bpemb | Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) | 37 | Emerging |
| 15 | ricsinaruto/dialog-eval | Evaluate your dialog model with 17 metrics! (see paper) | 37 | Emerging |
| 16 | MaxwellRebo/awesome-2vec | Curated list of 2vec-type embedding models | 36 | Emerging |
| 17 | wangyuxinwhy/uniem | unified embedding model | 36 | Emerging |
| 18 | encord-team/ebind | A 5-way embedding model for text, audio, image, video, and 3D point clouds. | 34 | Emerging |
| 19 | IndicoDataSolutions/Enso | Enso: An Open Source Library for Benchmarking Embeddings + Transfer Learning Methods | 32 | Emerging |
| 20 | janluke/embfile | A package for reading/writing files containing pre-trained word embeddings... | 31 | Emerging |
| 21 | isaacus-dev/open-australian-legal-embeddings-creator | The code used to create and update the Open Australian Legal Embeddings, the... | 29 | Experimental |
| 22 | DeepK/hoDMD-experiments | EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition | 29 | Experimental |
| 23 | ikergarcia1996/MetaVec | A monolingual and cross-lingual meta-embedding generation and evaluation framework | 28 | Experimental |
| 24 | vered1986/NC_embeddings | Comparison between various noun compound embeddings | 27 | Experimental |
| 25 | jfilter/hyperhyper | 🧮 Python package to construct word embeddings for small data using PMI and SVD | 27 | Experimental |
| 26 | sberdevices/saf_vectorizers | A plugin for SmartApp Framework that performs vectorization (obtaining... | 26 | Experimental |
| 27 | EloiZ/embedding_evaluation | Evaluate your word embeddings | 25 | Experimental |
| 28 | semvec/embedstresstest | Stress Testing Embedding Models | 24 | Experimental |
| 29 | sileod/embcomp | Composition of embeddings | 24 | Experimental |
| 30 | louisbrulenaudet/tax-retrieval-benchmark | An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text... | 24 | Experimental |
| 31 | yanaiela/easyEmbed | downloading pre-trained embedding easily and keeping only the necessary... | 23 | Experimental |
| 32 | s1mb1o/epg-embedding-benchmark | Evaluating sentence embedding models for cross-lingual TV program guide... | 23 | Experimental |
| 33 | Sandipan99/POLAR | The POLAR Framework: polar Opposites Enable Interpretability of Pre-Trained... | 23 | Experimental |
| 34 | MukundaKatta/EmbedBench | Embedding model comparison toolkit: benchmark TF-IDF, BoW, n-gram... | 22 | Experimental |
| 35 | Hanscal/textembedding | A package of algorithms frequently needed when computing text similarity | 21 | Experimental |
| 36 | AbdulSametTurkmenoglu/embedding_compare | Embedding Model Comparison for Turkish Medical Texts | 20 | Experimental |
| 37 | ClimSocAna/tecb-de | German Text Embedding Clustering Benchmark | 20 | Experimental |
| 38 | rafalposwiata/pl-mteb | PL-MTEB: Polish Massive Text Embedding Benchmark | 20 | Experimental |
| 39 | neural-dialogue-metrics/EmbeddingBased | Embedding-based evaluation metrics for dialogue generation. | 20 | Experimental |
| 40 | eifuentes/awesome-embeddings | 🪁A curated list of awesome resources around entity embeddings | 20 | Experimental |
| 41 | OctaviusLeo/rag-lite-tfidf-eval | AI/SWE | 19 | Experimental |
| 42 | guenthermi/table-embeddings | Tools for training schema-aware Web table embedding for unsupervised and... | 19 | Experimental |
| 43 | busycaesar/Embeddings_And_Cosine_Similarity | Code for the presentation. | 17 | Experimental |
| 44 | Paulescu/text-embedding-evaluation | Join 15k builders to the Real-World ML Newsletter ⬇️⬇️⬇️ | 16 | Experimental |
| 45 | paithiov909/apportita | Utility for handling ‘magnitude’ pretrained word embeddings | 16 | Experimental |
| 46 | TonioDominguez/dungeons_and_pythons_embeddings | A particular adaptation of text-based role-playing games using NLP technology... | 15 | Experimental |
| 47 | kushmadlani/embedtrics | Word embedding evaluation package for word similarity, word analogies & word... | 15 | Experimental |
| 48 | iamtatsuki05/MIREI | MIREI is a research workspace that builds encoder/decoder text-embedding... | 15 | Experimental |
| 49 | dali-does/vse-probing | Code for COLING2020 paper: Probing Multimodal Embeddings for Linguistic Properties. | 14 | Experimental |
| 50 | BYU-PCCL/regexv | Regex using word embeddings for text matching | 13 | Experimental |
| 51 | France-Travail/embcompare | A simple python tool for embedding comparison | 13 | Experimental |
| 52 | MinionAttack/conllu-conll-tool | Tool to convert CoNLL-U format files to CoNLL format files and manipulate... | 12 | Experimental |
| 53 | abhimishra91/corpus-creator | This tool can be used to create a word corpus from locally available... | 12 | Experimental |
| 54 | tahsinkoc/test-embrix-experimental | Comprehensive benchmark suite for evaluating embedding model performance... | 11 | Experimental |
| 55 | aravpanwar/Embedding_Comparision | This repository provides a framework to benchmark the performance and... | 11 | Experimental |
| 56 | metawake/awesome-text-embeddings | A curated list of text embedding models, benchmarks, and tools for semantic... | 11 | Experimental |
| 57 | alecokas/subword-embedding | A tool for generating sub-word (phone or grapheme) level embeddings from an... | 10 | Experimental |
| 58 | inkrement/StuffedTurkey | Distributed Embedding Aggregation | 10 | Experimental |