huggingface/text-embeddings-inference

A blazing fast inference solution for text embeddings models

/ 100

Established

Uses Flash Attention, Candle, and cuBLASLt for optimized transformer inference, and supports dynamic token-based batching to handle variable sequence lengths. Serves embedding, re-ranking, and sequence classification models over REST and gRPC APIs. It requires no model graph compilation step, loads Safetensors and ONNX weights, and provides production observability through OpenTelemetry distributed tracing and Prometheus metrics.
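To make the idea of token-based batching concrete, here is a minimal sketch in Python. It is not the project's actual implementation (which is written in Rust); the function name `batch_by_token_budget` and the parameter `max_batch_tokens` are hypothetical names chosen for illustration. The sketch assumes each request has already been tokenized and that a batch is closed as soon as adding the next request would exceed a fixed token budget.

```python
from typing import List


def batch_by_token_budget(token_counts: List[int], max_batch_tokens: int) -> List[List[int]]:
    """Group request indices into batches capped by total token count.

    Illustrative only: `token_counts[i]` is the token length of request i,
    and `max_batch_tokens` is a hypothetical per-batch token budget.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0

    for idx, n_tokens in enumerate(token_counts):
        if n_tokens > max_batch_tokens:
            # A single request larger than the budget can never be scheduled.
            raise ValueError(f"request {idx} has {n_tokens} tokens, budget is {max_batch_tokens}")

        # Close the current batch if this request would push it over budget.
        if current and current_tokens + n_tokens > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0

        current.append(idx)
        current_tokens += n_tokens

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(batch_by_token_budget([5, 3, 9, 2, 8], max_batch_tokens=10))
```

Capping batches by total tokens rather than by request count means a batch holds many short inputs or a few long ones, so the work per batch stays roughly bounded even when sequence lengths vary widely.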

4,582 stars. Actively maintained with 9 commits in the last 30 days.

No Package · No Dependents

Maintenance 20 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25


Stars: 4,582
Forks: 370
Language: Rust
License: Apache-2.0
Category: text-embedding-runtimes
Last pushed: Mar 12, 2026
Commits (30d): 9

Text Embedding Runtimes · 42 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/huggingface/text-embeddings-inference"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
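The same request can be made from a script. The sketch below uses only the Python standard library and rests on two assumptions not confirmed by this page: that the endpoint returns JSON, and that the URL path follows a `category/owner/repo` pattern (inferred from the single example above). The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path pattern beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one tool (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_quality("embeddings", "huggingface", "text-embeddings-inference")
    print(json.dumps(data, indent=2))
```

Because unauthenticated access is limited to 100 requests/day, it is worth caching responses locally rather than calling the endpoint on every run.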

Featured in

Embeddings Are Easier Than Whatever You're Doing Instead

Related tools

Anush008/fastembed-rs

Rust library for vector embeddings and reranking.

MinishLab/model2vec-rs

Official Rust Implementation of Model2Vec

finalfusion/finalfusion-rust

finalfusion embeddings in Rust

finalfusion/finalfusion-python

Finalfusion embeddings in Python

benoitc/erlang-python

Execute Python from Erlang using dirty NIFs with GIL-aware execution, rate limiting, and...
