devxiongmao/llm-scorecaster
LLM-Scorecaster is a Python-based system designed to evaluate and analyze LLM-generated responses. It calculates a variety of metric scores (either synchronously or asynchronously) for LLM responses against user-persisted inputs, then emits the results. Ideal for NLP researchers and developers looking to assess LLM accuracy and performance with precision.
Stars
—
Forks
—
Language
Python
License
MIT
Category
—
Last pushed
Mar 16, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/devxiongmao/llm-scorecaster"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
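The endpoint above follows a simple path scheme (`/{owner}/{repo}`), so it can be queried programmatically as well as with curl. The sketch below is a minimal, hedged example: the URL is taken from the listing, but the shape of the JSON response is not documented here and is left as an opaque dict.

```python
import json
import urllib.request

# Base endpoint taken from the curl example in the listing above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def scorecard_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_scorecard(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON scorecard.

    Without a key the API allows 100 requests/day; the response
    schema is not documented here, so it is returned as a raw dict.
    """
    with urllib.request.urlopen(scorecard_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(scorecard_url("devxiongmao", "llm-scorecaster"))
```

For one-off checks the curl command above is simpler; a helper like this is only worth it when polling several repositories.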
Higher-rated alternatives
google/langfun
OO for LLMs
tanaos/artifex
Small Language Model Inference, Fine-Tuning and Observability. No GPU, no labeled data needed.
vulnerability-lookup/VulnTrain
A tool to generate datasets and models based on vulnerabilities descriptions from @Vulnerability-Lookup.
DataScienceUIBK/HintEval
HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions
microsoft/LMChallenge
A library & tools to evaluate predictive language models.