xlang-ai/instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Conditions each embedding on a task-specific instruction (e.g., "Represent the Science title for retrieval"), so a single model adapts to 70+ diverse tasks without additional finetuning. The transformer-based model comes in three sizes (base, large, xl) on Hugging Face. It runs on PyTorch and supports batched inference that returns either NumPy arrays or PyTorch tensors. It is evaluated on the MTEB benchmark and on domain-specific datasets. One embedding API serves downstream applications such as semantic search, clustering, and text classification.
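The instruction-conditioned input format and a semantic-search step can be sketched as follows. This is a minimal sketch under stated assumptions. `make_pairs` and `top_k_cosine` are hypothetical helper names and are not part of the InstructorEmbedding package. The `[instruction, text]` pair format follows the project README. The embeddings are assumed to be a 2-D array like the NumPy output of the model's `encode` method. Cosine-similarity ranking is one common way to do semantic search, not necessarily what the project itself uses.

```python
import numpy as np


def make_pairs(instruction, texts):
    """Pair one task instruction with each text.

    Assumption: mirrors the [instruction, text] input format from the
    INSTRUCTOR README; this helper itself is hypothetical.
    """
    return [[instruction, text] for text in texts]


def top_k_cosine(query_emb, corpus_emb, k=3):
    """Return (indices, scores) of the k corpus rows most cosine-similar to the query.

    Assumes non-zero vectors; query_emb is 1-D, corpus_emb is 2-D (one row per document).
    """
    q = np.asarray(query_emb, dtype=float)
    c = np.asarray(corpus_emb, dtype=float)
    # L2-normalise so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q
    k = min(k, len(scores))
    # Sort by descending similarity and keep the top k.
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()
```

With the real package installed, the pairs would be embedded as in the README, e.g. `INSTRUCTOR('hkunlp/instructor-large').encode(make_pairs(...))` after `from InstructorEmbedding import INSTRUCTOR`. Queries and documents can carry different instructions, and the resulting vectors can then be ranked with `top_k_cosine`. Keeping the ranking step separate from the model call lets it be tested without downloading weights.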
2,023 stars. No commits in the last 6 months.
Stars: 2,023
Forks: 156
Language: Python
License: Apache-2.0
Last pushed: Jan 15, 2025
Commits (30d): 0
Higher-rated alternatives
ContextualAI/gritlm: Generative Representational Instruction Tuning
liuqidong07/LLMEmb: [AAAI'25 Oral] The official implementation code of LLMEmb
hpcaitech/CachedEmbedding: A memory-efficient DLRM training solution using ColossalAI
shobrook/weightgain: Train an adapter for any embedding model in under a minute
jina-ai/llm-query-expansion: Query Expansion for Better Query Embedding using LLMs