justinwetch/SkillEval

A visual workbench for A/B testing AI skills. Upload two skill files, run them through a batch of test prompts, and let an AI judge score the results.

Score: 32 / 100 (Emerging)

The workbench auto-generates evaluation criteria and test prompts with an AI, and supports text or visual outputs via an optional Puppeteer-based screenshot server. It integrates exclusively with the Anthropic API, letting users pick from the Claude model family (Opus, Sonnet, Haiku) for both skill execution and judging. The client-side application is built with React and requires Node.js.

No package · No dependents

Maintenance: 13 / 25
Adoption: 6 / 25
Maturity: 9 / 25
Community: 4 / 25


Stars: 21
Forks: 1
Language: JavaScript
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
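The same endpoint can be called programmatically. A minimal Node.js (18+) sketch, assuming only the URL shown above; the response schema is not documented here, so the result is printed as-is rather than destructured:

```javascript
// Build the quality-data URL for a given repo and fetch it.
// Note: the response shape is an assumption-free passthrough; we only
// rely on the endpoint path documented above.
const BASE = "https://pt-edge.onrender.com/api/v1/quality/agents";

function qualityUrl(owner, repo) {
  // encodeURIComponent guards against unusual characters in names
  return `${BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

async function fetchQuality(owner, repo) {
  // No API key required for up to 100 requests/day
  const res = await fetch(qualityUrl(owner, repo));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Usage:
// fetchQuality("justinwetch", "SkillEval").then(console.log);
```

`fetch` is global in Node.js 18+; on older runtimes, substitute a library such as `node-fetch`.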