huggingface/text-embeddings-inference

A blazing fast inference solution for text embeddings models

Score: 65 / 100 (Established)

Leverages Flash Attention, Candle, and cuBLASLt for optimized transformer inference, and supports dynamic token-based batching for variable sequence lengths. Serves embedding, re-ranking, and sequence classification models over REST and gRPC APIs, with no compilation step, Safetensors and ONNX model loading, and production observability through OpenTelemetry metrics.
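
To illustrate what token-based batching means in general, here is a minimal sketch that groups requests so each batch stays under a total token budget rather than a fixed request count. It is not taken from the project's source: the function name `batch_by_token_budget`, the parameter `max_batch_tokens`, and the greedy in-order strategy are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def batch_by_token_budget(
    token_counts: Iterable[int], max_batch_tokens: int
) -> Iterator[List[int]]:
    """Greedily group request indices so each batch fits the token budget.

    Hypothetical helper: the real server's scheduler may differ.
    """
    batch: List[int] = []
    used = 0
    for index, count in enumerate(token_counts):
        if count > max_batch_tokens:
            # A single request larger than the budget can never be scheduled.
            raise ValueError(f"request {index} exceeds the token budget")
        if batch and used + count > max_batch_tokens:
            # Adding this request would overflow, so close the current batch.
            yield batch
            batch, used = [], 0
        batch.append(index)
        used += count
    if batch:
        yield batch


# Five requests of varying length, with an assumed budget of 256 tokens.
batches = list(batch_by_token_budget([5, 120, 30, 200, 8], max_batch_tokens=256))
print(batches)
```

With a token budget, many short inputs can share a batch while long inputs get smaller batches, which is the motivation for batching by tokens when sequence lengths vary.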

4,582 stars. Actively maintained with 9 commits in the last 30 days.

No Package · No Dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25

Stars: 4,582
Forks: 370
Language: Rust
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/huggingface/text-embeddings-inference"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
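
The same request can be made from a script. The sketch below uses only the Python standard library and the URL shown in the curl command. The response schema is not documented on this page, so the code assumes only that the body is JSON and returns it without interpreting any fields. The helper names `parse_payload` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# URL copied verbatim from the curl example above.
QUALITY_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "embeddings/huggingface/text-embeddings-inference"
)


def parse_payload(raw: bytes):
    """Decode a response body, assuming it is UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_quality(url: str = QUALITY_URL, timeout: float = 10.0):
    """Send a GET request with no API key and return the parsed JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_payload(response.read())
```

Calling `fetch_quality()` performs one network request, which counts against the daily request limit stated above.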