Embedding Model Tuning Tools

Tools, techniques, and frameworks for fine-tuning embedding models on domain-specific data to improve performance on downstream tasks. Does NOT include pre-trained embedding models, embedding inference/serving, or applications built on top of embeddings.

41 embedding model tuning tools are tracked. One scores above 50 (Established tier). The highest-rated is ContextualAI/gritlm at 56/100, with 688 stars and 12,353 monthly downloads.

Get all 41 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-model-tuning&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
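
For scripted access, the same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the parsed result is returned as-is. The command above passes `limit=20`; whether a larger `limit` returns all 41 projects in one call is an assumption this page does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="embedding-model-tuning", limit=20):
    """Build the request URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and return the decoded JSON body.

    The response structure is an assumption left to the caller to inspect;
    no key is sent, so the unauthenticated daily limit applies.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the request and counts against the daily quota; `build_url()` alone only constructs the URL.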

#	Tool	Score	Tier	Stars	Language
1	ContextualAI/gritlm Generative Representational Instruction Tuning	56	Established	688	Jupyter Notebook
2	xlang-ai/instructor-embedding [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings	38	Emerging	2,023	Python
3	liuqidong07/LLMEmb [AAAI'25 Oral] The official implementation code of LLMEmb	35	Emerging	52	Python
4	ritesh-modi/embedding-hallucinations This repo shows how foundational model hallucinates and how we can fix such...	33	Emerging	9	Python
5	hpcaitech/CachedEmbedding A memory efficient DLRM training solution using ColossalAI	33	Emerging	107	Python
6	ritesh-modi/fine-tuning-embeddings-template This repo is a template to fine-tune embedding models using...	31	Emerging	7	Python
7	shobrook/weightgain Train an adapter for any embedding model in under a minute	30	Emerging	129	Python
8	lperezmo/embeddings-extraction Scripts for reading, extracting, and organizing data from either HTML or PDF...	29	Experimental	13	Python
9	jjcmoon/DeepSoftLog Soft-Unification in Deep Probabilistic Logic (NeurIPS 2023)	28	Experimental	10	Python
10	jina-ai/llm-query-expansion Query Expansion for Better Query Embedding using LLMs	28	Experimental	68	Python
11	CodeSoul-co/THETA LLM-adaptive embeddings (Zero-shot / LoRA) with Generative Topic Modeling &...	27	Experimental	11	Python
12	Benja1972/topicphrase Simple project for extraction of key-phrases from single document based on...	24	Experimental	7	Python
13	IsmaelMekene/meteor-CUTIE Spatial and Semantic Segmentation	23	Experimental	2	Python
14	FelipeBenavidesMz/AlphaEarth-Interpretability-Experiments Binary classification experiments to interpret Google AlphaEarth Foundation...	22	Experimental	—	Jupyter Notebook
15	Jiayu7Yao/llm-classifier Classify, cluster, and extract data using structured LLM outputs with...	22	Experimental	—	Python
16	Blue16-WangFudi/DialectSense Chinese dialect identification using audio embeddings from LLMs.	21	Experimental	2	Python
17	aws-samples/finetune-bge-embeddings-blog Code associated with the blog post titled, "Fine-Tuning BGE Embeddings Using...	21	Experimental	11	Jupyter Notebook
18	AnderssonProgramming/llm-embeddings-text-preprocessing LLM text preprocessing and embedding pipeline implementation for the...	19	Experimental	—	Jupyter Notebook
19	LivingFutureLab/UQABench [KDD 2025] The source code for UQABench	19	Experimental	13	Python
20	shimo-lab/modelmap Embedding language models in probability space via log-likelihood vectors	18	Experimental	16	Jupyter Notebook
21	rag-fish/noesisnoema-pipeline Modular pipeline for building RAG and LLM workflows in Colab, including...	17	Experimental	3	Python
22	zh-he/Document-Based-Fine-Tuning-Tool One-stop pipeline for building IR datasets from PDFs and fine-tuning...	17	Experimental	2	Python
23	csinva/fmri Experiments with language fMRI data from Alex Huth lab. More organized repo...	16	Experimental	4	Jupyter Notebook
24	warrofua/n-dimensional-llm Research exploration of multi‑field information bottlenecks and...	15	Experimental	—	Python
25	aws-samples/fine-tune-embedding-models-on-sagemaker This repository contains samples for fine-tuning embedding models using...	15	Experimental	15	Jupyter Notebook
26	csinva/interpretable-embeddings Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)	14	Experimental	46	Python
27	vidhiJain/SpatialEmbeddings Learning Embeddings that Capture Spatial Semantics for Indoor Navigation,...	14	Experimental	9	Python
28	quantumxiaol/activation_beacon fork from...	14	Experimental	—	Python
29	rubsj/ai-contrastive-embedding-finetuning Domain-specific embedding fine-tuning with contrastive learning and PEFT/LoRA	14	Experimental	—	HTML
30	ksm26/Embedding-Models-From-Architecture-to-Implementation Understand and build embedding models, focusing on word and sentence...	14	Experimental	7	Jupyter Notebook
31	meghanmane84/LLM-Manifold-Based-Compression-Techniques Research code for LLM Compression using Functional Algorithms, exploring...	13	Experimental	—	Jupyter Notebook
32	PetropoulakisPanagiotis/igae State Representations as Incentives for Reinforcement Learning Agents: A...	12	Experimental	4	Python
33	NC0DER/LMRank LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for...	12	Experimental	4	Python
34	sine2pi/ASR-model ASR model	12	Experimental	1	Python
35	LCEmT/LCEmT Lossless Compression Techniques for Embedding Tables in Substantial Deep...	11	Experimental	—	C++
36	AparnaRoy76/Fine-Tune-Embedding-Model 🚀 Generate high-quality triplet datasets for job titles & skills, and...	11	Experimental	—	Jupyter Notebook
37	IMSUVEN/wubba Wubba learns layout-invariant embeddings from raw HTML using contrastive...	11	Experimental	—	Python
38	1kkiRen/Embeddings-Division Python script for dividing embedding layer of LLM.	11	Experimental	—	Python
39	Renatoelho/embeddings-consultas-similaridade I'll show how to convert simple text into mathematical representations...	11	Experimental	—	Python
40	kushagraghosh/EuroSAT Trained a ResNet50 model on the EuroSAT satellite imagery dataset w/...	11	Experimental	—	Python
41	daniau23/Fine_Tuning_LLMs_and_Embeddings Exploring the fine tuning of both LLMs and Embedding models.	11	Experimental	—	Jupyter Notebook