google-research/rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

Archived

/ 100

Emerging

Implements stratified bootstrap confidence intervals and performance profiles to quantify the statistical uncertainty of results aggregated over a few runs. Its aggregate metrics include the Interquartile Mean (IQM), which is more robust to outlier scores than the mean and more statistically efficient than the median, along with optimality gap and probability of improvement. Also provides visualization tools for performance profiles and sample-efficiency curves across multi-task benchmarks. Targets RL evaluation on standard suites such as Atari, DeepMind Control, and Procgen, with APIs for computing interval estimates from per-algorithm score matrices.
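To make the metrics named above concrete, here is a minimal standard-library sketch of IQM, optimality gap, and a stratified bootstrap percentile interval. It is an illustration under stated assumptions, not rliable's own code: the function names (`iqm`, `optimality_gap`, `stratified_bootstrap_ci`), the toy score matrix, and the default `reps`, `alpha`, and `gamma` values are all hypothetical, and scores are assumed to be normalized and laid out as runs by tasks.

```python
import random
from statistics import fmean


def iqm(scores):
    """Interquartile mean: pool all run-task scores, drop the lowest and
    highest 25%, and average what remains."""
    flat = sorted(x for run in scores for x in run)
    k = int(0.25 * len(flat))  # number of scores trimmed from each end
    return fmean(flat[k:len(flat) - k])


def optimality_gap(scores, gamma=1.0):
    """Average shortfall below the target score gamma (assumed to be 1.0
    for normalized scores); scores above gamma are capped at gamma."""
    return gamma - fmean(min(x, gamma) for run in scores for x in run)


def stratified_bootstrap_ci(scores, metric, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric`, resampling runs with
    replacement independently within each task (the strata)."""
    rng = random.Random(seed)
    n_runs, n_tasks = len(scores), len(scores[0])
    estimates = []
    for _ in range(reps):
        # One resampled column of run scores per task.
        columns = [
            [scores[rng.randrange(n_runs)][task] for _draw in range(n_runs)]
            for task in range(n_tasks)
        ]
        # Transpose back to the runs-by-tasks layout the metric expects.
        resampled = [[col[run] for col in columns] for run in range(n_runs)]
        estimates.append(metric(resampled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lower, upper


# Toy data: 3 runs x 4 tasks of normalized scores, with one outlier (10.0).
scores = [
    [0.1, 0.5, 0.9, 10.0],
    [0.2, 0.4, 1.0, 0.8],
    [0.3, 0.6, 1.1, 0.7],
]

all_scores = [x for run in scores for x in run]
print(f"mean = {fmean(all_scores):.3f}")            # 1.383, pulled up by the outlier
print(f"IQM  = {iqm(scores):.3f}")                  # 0.650
print(f"gap  = {optimality_gap(scores):.3f}")       # 0.375
print("95% CI for IQM:", stratified_bootstrap_ci(scores, iqm))
```

On this toy matrix the single outlier roughly doubles the mean while leaving the IQM unchanged, which is the robustness property the description refers to. Resampling within each task, rather than across the pooled scores, keeps every task represented equally in each bootstrap replicate.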

866 stars. No commits in the last 6 months.

Archived Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 15 / 25

Stars

866

Forks

Language

Jupyter Notebook

License

Apache-2.0

Higher-rated alternatives

opentensor/bittensor

Internet-scale Neural Networks

trailofbits/fickling

A Python pickling decompiler and static analyzer

benchopt/benchopt

A framework for reproducible, comparable benchmarks

BiomedSciAI/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code...

taoshidev/vanta-network

Vanta Network built on Bittensor
