google-research/rliable
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Archived
Implements stratified bootstrap confidence intervals and performance profiles to quantify statistical uncertainty in aggregate metrics like Interquartile Mean (IQM), which is more robust to outlier tasks than mean or median. Provides specialized aggregate metrics (IQM, optimality gap, probability of improvement) alongside visualization tools for performance profiles and sample efficiency curves across multi-task benchmarks. Targets RL evaluation on standard suites like Atari, DeepMind Control, and Procgen with APIs for computing interval estimates across algorithm score matrices.
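To make the IQM and stratified bootstrap ideas concrete, here is a minimal NumPy-only sketch. It does not use rliable's own API: the function names `iqm` and `stratified_bootstrap_ci`, the `(num_runs, num_tasks)` score layout, and the default number of resamples are assumptions chosen for illustration. The bootstrap is stratified by task, meaning runs are resampled with replacement independently within each task column, and the interval is a plain percentile interval.

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all run-task scores."""
    flat = np.sort(np.asarray(scores, dtype=float).ravel())
    n = flat.size
    cut = int(0.25 * n)  # number of values dropped from each tail
    return float(flat[cut:n - cut].mean())


def stratified_bootstrap_ci(scores, metric=iqm, reps=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for `metric`.

    `scores` is assumed to have shape (num_runs, num_tasks). Each bootstrap
    replicate resamples runs with replacement separately for every task
    (one stratum per task), then recomputes the aggregate metric.
    """
    scores = np.asarray(scores, dtype=float)
    num_runs, num_tasks = scores.shape
    rng = np.random.default_rng(seed)
    task_idx = np.arange(num_tasks)
    estimates = np.empty(reps)
    for i in range(reps):
        # run_idx[r, t] picks which original run fills slot r for task t.
        run_idx = rng.integers(0, num_runs, size=(num_runs, num_tasks))
        estimates[i] = metric(scores[run_idx, task_idx])
    low, high = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(scores), (float(low), float(high))


if __name__ == "__main__":
    # Hypothetical normalized scores: 5 runs (seeds) on 4 tasks.
    demo = np.random.default_rng(1).uniform(0.0, 1.5, size=(5, 4))
    point, (low, high) = stratified_bootstrap_ci(demo)
    print(f"IQM = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With only a handful of runs per task, the interval from a sketch like this will typically be wide, which is the uncertainty the library is meant to surface rather than hide behind a single point estimate.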
866 stars. No commits in the last 6 months.
Stars
866
Forks
49
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
Aug 12, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/google-research/rliable"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
opentensor/bittensor
Internet-scale Neural Networks
trailofbits/fickling
A Python pickling decompiler and static analyzer
benchopt/benchopt
A framework for reproducible, comparable benchmarks
BiomedSciAI/fuse-med-ml
A python framework accelerating ML based discovery in the medical field by encouraging code...
taoshidev/vanta-network
Vanta Network built on Bittensor