justplus/llm-eval

An LLM evaluation platform supporting multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation on custom datasets.

Score: 45 / 100 (Emerging)

Provides LLM-agnostic evaluation across multiple task formats (QA, MCQ, RAG), with built-in LLM-as-a-judge scoring via the Ragas framework for RAG pipelines and customizable Jinja2 templates for domain-specific metrics. Also includes concurrent performance stress-testing with latency/throughput analysis, multi-model management via unified API configuration, and result export, all through a DaisyUI-based web UI with real-time task status updates.
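As a rough illustration of the Jinja2-templated judge prompts mentioned above, the sketch below renders a hypothetical grading prompt. The template text and the field names (`criteria`, `question`, `answer`) are assumptions for illustration, not the platform's actual schema, and it assumes the `jinja2` package is installed.

```python
from jinja2 import Template  # assumes jinja2 is installed

# Hypothetical judge-prompt template; field names are illustrative only.
JUDGE_TEMPLATE = Template(
    "You are grading an answer for {{ criteria }}.\n"
    "Question: {{ question }}\n"
    "Answer: {{ answer }}\n"
    "Reply with a score from 1 to 5."
)

# Render a concrete prompt that would be sent to the judge model.
prompt = JUDGE_TEMPLATE.render(
    criteria="factual accuracy",
    question="What does RAG stand for?",
    answer="Retrieval-Augmented Generation",
)
print(prompt)
```

Swapping the template file lets a user redefine the metric (e.g. tone, citation quality) without touching evaluation code, which is the point of template-driven judging.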

No commits in the last 6 months.

Stale (6 months) · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 15 / 25
Community 19 / 25


Stars: 82
Forks: 18
Language: Python
License: MIT
Last pushed: Aug 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/justplus/llm-eval"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
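The same endpoint can be called from Python. This is a minimal sketch using only the standard library; the URL pattern comes from the curl example above, the `quality_url`/`fetch_quality` helper names are my own, and the response schema is not documented here, so the fetch simply returns the parsed JSON.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality data (network call; schema undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

url = quality_url("justplus", "llm-eval")
print(url)  # https://pt-edge.onrender.com/api/v1/quality/rag/justplus/llm-eval
```

Stay under the 100 requests/day anonymous limit, or pass a key once you have one (the auth mechanism is not described on this page).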