google-research/rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

Archived
41 / 100 (Emerging)

Implements stratified bootstrap confidence intervals to quantify statistical uncertainty in aggregate metrics, together with performance profiles that show the distribution of scores across runs and tasks. Provides aggregate metrics such as the interquartile mean (IQM), which is more robust to outlier scores than the mean and more statistically efficient than the median, as well as optimality gap and probability of improvement. Includes visualization tools for performance profiles and sample-efficiency curves across multi-task benchmarks. Targets RL evaluation on standard suites such as Atari, DeepMind Control, and Procgen, with APIs for computing interval estimates from per-algorithm score matrices.
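The two core ideas in this description, the IQM and a stratified bootstrap interval, can be sketched with the Python standard library alone. This is a minimal illustration and not rliable's own API: the function names `iqm` and `stratified_bootstrap_ci`, the `reps` and `alpha` defaults, and the toy score matrix are all assumptions made for the example. Scores are laid out as one row per run and one column per task.

```python
import random
from statistics import fmean


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all run-task scores, pooled."""
    flat = sorted(s for run in scores for s in run)
    k = int(0.25 * len(flat))  # number of scores trimmed from each tail
    return fmean(flat[k:len(flat) - k])


def stratified_bootstrap_ci(scores, aggregate=iqm, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling runs independently within each task."""
    rng = random.Random(seed)
    n_runs, n_tasks = len(scores), len(scores[0])
    estimates = []
    for _ in range(reps):
        # Each task is a stratum: draw n_runs scores with replacement from that task only.
        cols = [
            [scores[rng.randrange(n_runs)][t] for _ in range(n_runs)]
            for t in range(n_tasks)
        ]
        # Transpose back to the runs x tasks layout expected by the aggregate.
        resampled = [[cols[t][r] for t in range(n_tasks)] for r in range(n_runs)]
        estimates.append(aggregate(resampled))
    estimates.sort()
    lo = estimates[int((alpha / 2) * reps)]
    hi = estimates[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lo, hi


# Hypothetical normalized scores: 3 runs (rows) x 4 tasks (columns).
# The last task is an outlier with much larger scores.
scores = [
    [0.1, 0.5, 0.9, 10.0],
    [0.2, 0.4, 1.0, 12.0],
    [0.3, 0.6, 0.8, 11.0],
]
point = iqm(scores)
lo, hi = stratified_bootstrap_ci(scores)
print(f"IQM = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

On this toy matrix the outlier task inflates the plain mean to about 3.15, while the IQM discards the top and bottom quarter of the pooled scores and comes out at 0.7. The library itself works on NumPy arrays and offers more metrics, so treat this only as a picture of the technique.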

866 stars. No commits in the last 6 months.

Archived, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 866
Forks: 49
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Aug 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/google-research/rliable"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.