fresh-stack/freshstack

This repository helps you evaluate your models on the FreshStack benchmark!

Established

The repository supports both dense embedding models (via BEIR) and multi-vector retrieval systems (via PyLate/ColBERT), so different retrieval architectures can be evaluated under the same setup. The benchmark is constructed automatically from real Stack Overflow queries and GitHub repository code. Retrieval quality is scored with nugget-based metrics (α-NDCG, coverage, recall), which give partial credit when a question has multiple valid answers. Datasets and evaluation scripts are provided for five technical topics, all in the standardized BEIR format.
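To make the nugget-based metrics more concrete, here is a minimal sketch of coverage@k and α-NDCG@k in plain Python. It is not taken from the freshstack package: the function names, the `doc_nuggets` mapping (document ID to the set of nugget IDs it supports), and the toy data are all assumptions for illustration, and the α-NDCG follows the commonly used formulation with a greedy ideal ranking, which may differ in detail from what the repository computes.

```python
import math
from typing import Dict, List, Set


def coverage_at_k(ranking: List[str], doc_nuggets: Dict[str, Set[str]],
                  all_nuggets: Set[str], k: int) -> float:
    """Fraction of a query's nuggets supported by at least one top-k document."""
    covered: Set[str] = set()
    for doc_id in ranking[:k]:
        covered |= doc_nuggets.get(doc_id, set())
    return len(covered & all_nuggets) / len(all_nuggets) if all_nuggets else 0.0


def _novelty_gain(doc_id: str, doc_nuggets: Dict[str, Set[str]],
                  seen: Dict[str, int], alpha: float) -> float:
    """Gain of a document: each nugget is worth (1 - alpha)^(times seen earlier)."""
    return sum((1 - alpha) ** seen.get(n, 0) for n in doc_nuggets.get(doc_id, set()))


def _alpha_dcg(ranking: List[str], doc_nuggets: Dict[str, Set[str]],
               alpha: float, k: int) -> float:
    seen: Dict[str, int] = {}
    score = 0.0
    for rank, doc_id in enumerate(ranking[:k], start=1):
        score += _novelty_gain(doc_id, doc_nuggets, seen, alpha) / math.log2(rank + 1)
        for n in doc_nuggets.get(doc_id, set()):
            seen[n] = seen.get(n, 0) + 1
    return score


def alpha_ndcg_at_k(ranking: List[str], doc_nuggets: Dict[str, Set[str]],
                    alpha: float = 0.5, k: int = 10) -> float:
    """alpha-DCG of the ranking divided by alpha-DCG of a greedily built ideal ranking."""
    remaining = set(doc_nuggets)
    seen: Dict[str, int] = {}
    ideal: List[str] = []
    while remaining and len(ideal) < k:
        # Pick the document with the highest novelty gain; sorted() makes ties deterministic.
        best = max(sorted(remaining),
                   key=lambda d: _novelty_gain(d, doc_nuggets, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        for n in doc_nuggets[best]:
            seen[n] = seen.get(n, 0) + 1
    ideal_score = _alpha_dcg(ideal, doc_nuggets, alpha, k)
    if ideal_score == 0:
        return 0.0
    return _alpha_dcg(ranking, doc_nuggets, alpha, k) / ideal_score


# Hypothetical toy data: three documents supporting three nuggets of one query.
doc_nuggets = {"d1": {"n1", "n2"}, "d2": {"n1"}, "d3": {"n3"}}
all_nuggets = {"n1", "n2", "n3"}
ranking = ["d2", "d1", "d3"]  # a system's ranked output

print(coverage_at_k(ranking, doc_nuggets, all_nuggets, k=2))
print(alpha_ndcg_at_k(ranking, doc_nuggets, alpha=0.5, k=3))
```

In this sketch a document that repeats an already covered nugget earns a discounted gain, so a ranking that surfaces new nuggets early scores higher than one that returns redundant documents first.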

Available on PyPI.

Maintenance 6 / 25

Adoption 11 / 25

Maturity 24 / 25

Community 9 / 25

Language: Python

License: Apache-2.0

Featured in

Embeddings Are Easier Than Whatever You're Doing Instead

You're Shipping AI You Can't Measure

Related tools

embeddings-benchmark/mteb

MTEB: Massive Text Embedding Benchmark

yannvgn/laserembeddings

LASER multilingual sentence embeddings as a pip package

harmonydata/harmony

The Harmony Python library: a research tool for psychologists to harmonise data and...

embeddings-benchmark/results

Data for the MTEB leaderboard

Hironsan/awesome-embedding-models

A curated list of awesome embedding models tutorials, projects and communities.
