huggingface/text-embeddings-inference
A blazing fast inference solution for text embeddings models
Leverages Flash Attention, Candle, and cuBLASLt for optimized transformer inference; supports dynamic token-based batching for variable sequence lengths. Deploys embedding, re-ranking, and sequence classification models via REST and gRPC APIs with no compilation step, Safetensors/ONNX loading, and production observability through OpenTelemetry metrics.
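The REST API described above can be exercised with a tiny client. The sketch below builds a request for TEI's `/embed` route; the `inputs`/`truncate` payload shape follows the project's documented REST API, but the URL, port, and the example texts are assumptions — check them against your own deployment.

```python
import json

# Assumed endpoint: a text-embeddings-inference server listening locally.
# Adjust host/port to match your deployment.
TEI_URL = "http://localhost:8080/embed"

def build_embed_request(texts):
    """Build the JSON payload for TEI's /embed endpoint.

    TEI accepts either a single string or a list of strings under the
    "inputs" key; a list is embedded as one dynamically sized batch.
    "truncate" asks the server to clip over-length sequences instead of
    rejecting them.
    """
    return {"inputs": texts, "truncate": True}

payload = build_embed_request(["Deep learning is fun", "What is vector search?"])
body = json.dumps(payload)

# To actually send it (needs the `requests` package and a running server):
# import requests
# embeddings = requests.post(TEI_URL, json=payload, timeout=30).json()
# `embeddings` is a list of float vectors, one per input string.
```

Because the server batches by token count rather than request count, sending a list of strings in one call is generally cheaper than one request per string.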
4,582 stars. Actively maintained with 9 commits in the last 30 days.
Stars
4,582
Forks
370
Language
Rust
License
Apache-2.0
Category
Embeddings
Last pushed
Mar 12, 2026
Commits (30d)
9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/huggingface/text-embeddings-inference"
Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.
Related tools
Anush008/fastembed-rs
Rust library for vector embeddings and reranking.
MinishLab/model2vec-rs
Official Rust implementation of Model2Vec
finalfusion/finalfusion-rust
finalfusion embeddings in Rust
finalfusion/finalfusion-python
Finalfusion embeddings in Python
benoitc/erlang-python
Execute Python from Erlang using dirty NIFs with GIL-aware execution, rate limiting, and...