YangLinyi/GLUE-X

We use 14 datasets as out-of-distribution (OOD) test data and evaluate 21 widely used models on 8 NLU tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: significant performance decay relative to in-distribution (ID) accuracy appears in all settings.

Score: 21 / 100 (Experimental)

No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 4 / 25
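
The four category scores listed above add up to the overall score of 21 out of 100. The sketch below only checks that arithmetic; it assumes the overall score is a plain sum of the four categories, which the page does not state explicitly, and the dictionary name is hypothetical.

```python
# Category scores as listed on the page, each out of 25.
# Assumption: the overall score is the unweighted sum of the categories.
category_scores = {
    "Maintenance": 0,
    "Adoption": 9,
    "Maturity": 8,
    "Community": 4,
}

MAX_PER_CATEGORY = 25

total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")
```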


Stars: 93
Forks: 2
Language: Python
License: none
Last pushed: Aug 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/YangLinyi/GLUE-X"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
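
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not confirm. The URL pattern is taken from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. No API key is sent, so the
    keyless daily limit mentioned on the page applies.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL needs no network access; call fetch_quality()
    # with the same arguments to perform the actual request.
    print(build_quality_url("nlp", "YangLinyi", "GLUE-X"))
```

Because the keyless tier is limited to 100 requests per day, it may be worth caching the returned record locally rather than requesting it on every run.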