Maluuba/nlg-eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Implements nine metrics spanning n-gram overlap (BLEU, METEOR, ROUGE, CIDEr, SPICE) and semantic similarity approaches (SkipThought, GloVe embeddings, greedy matching). The toolkit provides both CLI and Python APIs (functional and object-oriented) for corpus-level and sentence-level evaluation, with pre-trained models and embeddings downloaded automatically during setup.
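To illustrate one of the semantic-similarity metrics above, here is a minimal, self-contained sketch of greedy matching over word embeddings. It uses tiny hypothetical 2-D vectors in place of real GloVe embeddings and is not the library's implementation, just the idea: each hypothesis token is greedily paired with its most similar reference token by cosine similarity, the scores are averaged, and the result is symmetrized.

```python
from math import sqrt

# Hypothetical toy word vectors standing in for GloVe embeddings
# (real GloVe vectors are 50-300 dimensional).
VECS = {
    "the": (1.0, 0.0), "cat": (0.8, 0.6), "sat": (0.0, 1.0),
    "dog": (0.7, 0.7), "ran": (0.1, 0.9),
}

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_match(hyp, ref):
    """Symmetric greedy matching score between two token lists."""
    def directed(a, b):
        # For each token in a, take its best cosine match in b, then average.
        scores = [max(cosine(VECS[w], VECS[x]) for x in b) for w in a]
        return sum(scores) / len(scores)
    return (directed(hyp, ref) + directed(ref, hyp)) / 2

score = greedy_match(["the", "cat", "sat"], ["the", "dog", "ran"])
```

Identical sentences score 1.0, and unrelated vocabularies score near 0, which is the intuition the embedding-based metrics in the toolkit build on.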
1,391 stars. No commits in the last 6 months.
Stars
1,391
Forks
227
Language
Python
License
—
Category
—
Last pushed
Aug 20, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Maluuba/nlg-eval"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
google/langfun
OO for LLMs
tanaos/artifex
Small Language Model Inference, Fine-Tuning and Observability. No GPU, no labeled data needed.
vulnerability-lookup/VulnTrain
A tool to generate datasets and models based on vulnerabilities descriptions from @Vulnerability-Lookup.
DataScienceUIBK/HintEval
HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions
microsoft/LMChallenge
A library & tools to evaluate predictive language models.