Embedding Model Tuning Tools

Tools, techniques, and frameworks for fine-tuning embedding models on domain-specific data to improve performance on downstream tasks. Does NOT include pre-trained embedding models, embedding inference/serving, or applications built on top of embeddings.

There are 41 embedding model tuning tools tracked. One scores above 50 (Established tier). The highest-rated is ContextualAI/gritlm at 56/100, with 688 stars and 12,353 monthly downloads.

Get all 41 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-model-tuning&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
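For scripted access, the same request can be sketched in Python using only the standard library. The endpoint and query parameters are copied from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the payload is returned undecoded beyond `json.load`. The curl command passes `limit=20`; whether a single call returns all 41 projects is not stated here, so `limit` is left as a parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="embedding-model-tuning", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=10):
    """Fetch the dataset and parse it as JSON.

    Assumption: the endpoint answers unauthenticated GET requests with a JSON
    body, as the page describes. The response schema is unknown, so the parsed
    object is returned as-is.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to perform the network request.
    print(build_url())
```

Each call to `fetch_projects()` counts against the daily request quota mentioned above, so caching the result locally is a reasonable choice when experimenting.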

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ContextualAI/gritlm | Generative Representational Instruction Tuning | 56 | Established |
| 2 | xlang-ai/instructor-embedding | [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings | 38 | Emerging |
| 3 | liuqidong07/LLMEmb | [AAAI'25 Oral] The official implementation code of LLMEmb | 35 | Emerging |
| 4 | ritesh-modi/embedding-hallucinations | This repo shows how foundational model hallucinates and how we can fix such... | 33 | Emerging |
| 5 | hpcaitech/CachedEmbedding | A memory efficient DLRM training solution using ColossalAI | 33 | Emerging |
| 6 | ritesh-modi/fine-tuning-embeddings-template | This repo is a template to fine-tune embedding models using... | 31 | Emerging |
| 7 | shobrook/weightgain | Train an adapter for any embedding model in under a minute | 30 | Emerging |
| 8 | lperezmo/embeddings-extraction | Scripts for reading, extracting, and organizing data from either HTML or PDF... | 29 | Experimental |
| 9 | jjcmoon/DeepSoftLog | Soft-Unification in Deep Probabilistic Logic (NeurIPS 2023) | 28 | Experimental |
| 10 | jina-ai/llm-query-expansion | Query Expansion for Better Query Embedding using LLMs | 28 | Experimental |
| 11 | CodeSoul-co/THETA | LLM-adaptive embeddings (Zero-shot / LoRA) with Generative Topic Modeling &... | 27 | Experimental |
| 12 | Benja1972/topicphrase | Simple project for extraction of key-phrases from single document based on... | 24 | Experimental |
| 13 | IsmaelMekene/meteor-CUTIE | Spatial and Semantic Segmentation | 23 | Experimental |
| 14 | FelipeBenavidesMz/AlphaEarth-Interpretability-Experiments | Binary classification experiments to interpret Google AlphaEarth Foundation... | 22 | Experimental |
| 15 | Jiayu7Yao/llm-classifier | Classify, cluster, and extract data using structured LLM outputs with... | 22 | Experimental |
| 16 | Blue16-WangFudi/DialectSense | Chinese dialect identification using audio embeddings from LLMs. | 21 | Experimental |
| 17 | aws-samples/finetune-bge-embeddings-blog | Code associated with the blog post titled, "Fine-Tuning BGE Embeddings Using... | 21 | Experimental |
| 18 | AnderssonProgramming/llm-embeddings-text-preprocessing | LLM text preprocessing and embedding pipeline implementation for the... | 19 | Experimental |
| 19 | LivingFutureLab/UQABench | [KDD 2025] The source code for UQABench | 19 | Experimental |
| 20 | shimo-lab/modelmap | Embedding language models in probability space via log-likelihood vectors | 18 | Experimental |
| 21 | rag-fish/noesisnoema-pipeline | Modular pipeline for building RAG and LLM workflows in Colab, including... | 17 | Experimental |
| 22 | zh-he/Document-Based-Fine-Tuning-Tool | One-stop pipeline for building IR datasets from PDFs and fine-tuning... | 17 | Experimental |
| 23 | csinva/fmri | Experiments with language fMRI data from Alex Huth lab. More organized repo... | 16 | Experimental |
| 24 | warrofua/n-dimensional-llm | Research exploration of multi-field information bottlenecks and... | 15 | Experimental |
| 25 | aws-samples/fine-tune-embedding-models-on-sagemaker | This repository contains samples for fine-tuning embedding models using... | 15 | Experimental |
| 26 | csinva/interpretable-embeddings | Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024) | 14 | Experimental |
| 27 | vidhiJain/SpatialEmbeddings | Learning Embeddings that Capture Spatial Semantics for Indoor Navigation,... | 14 | Experimental |
| 28 | quantumxiaol/activation_beacon | fork from... | 14 | Experimental |
| 29 | rubsj/ai-contrastive-embedding-finetuning | Domain-specific embedding fine-tuning with contrastive learning and PEFT/LoRA | 14 | Experimental |
| 30 | ksm26/Embedding-Models-From-Architecture-to-Implementation | Understand and build embedding models, focusing on word and sentence... | 14 | Experimental |
| 31 | meghanmane84/LLM-Manifold-Based-Compression-Techniques | Research code for LLM Compression using Functional Algorithms, exploring... | 13 | Experimental |
| 32 | PetropoulakisPanagiotis/igae | State Representations as Incentives for Reinforcement Learning Agents: A... | 12 | Experimental |
| 33 | NC0DER/LMRank | LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for... | 12 | Experimental |
| 34 | sine2pi/ASR-model | ASR model | 12 | Experimental |
| 35 | LCEmT/LCEmT | Lossless Compression Techniques for Embedding Tables in Substantial Deep... | 11 | Experimental |
| 36 | AparnaRoy76/Fine-Tune-Embedding-Model | 🚀 Generate high-quality triplet datasets for job titles & skills, and... | 11 | Experimental |
| 37 | IMSUVEN/wubba | Wubba learns layout-invariant embeddings from raw HTML using contrastive... | 11 | Experimental |
| 38 | 1kkiRen/Embeddings-Division | Python script for dividing embedding layer of LLM. | 11 | Experimental |
| 39 | Renatoelho/embeddings-consultas-similaridade | I will show how to convert plain text into mathematical representations... | 11 | Experimental |
| 40 | kushagraghosh/EuroSAT | Trained a ResNet50 model on the EuroSAT satellite imagery dataset w/... | 11 | Experimental |
| 41 | daniau23/Fine_Tuning_LLMs_and_Embeddings | Exploring the fine tuning of both LLMs and Embedding models. | 11 | Experimental |