justinwetch/SkillEval
A visual workbench for A/B testing AI skills. Upload two skill files, run them through a batch of test prompts, and let an AI judge score the results.
The workbench auto-generates evaluation criteria and test prompts with an AI model, and supports text or visual outputs via an optional Puppeteer-based screenshot server. It integrates exclusively with the Anthropic API, letting users pick Claude models (Opus, Sonnet, Haiku) separately for skill execution and for judging. The client-side application is built with React and requires Node.js.
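As a rough illustration of the judging step, the sketch below calls the Anthropic Messages API from Node.js. It is a minimal sketch only: the model ID, rubric wording, and JSON scoring format are assumptions for illustration, not the repository's actual implementation.

// Minimal sketch of an LLM-as-judge call via the Anthropic Messages API.
// Model name, rubric text, and the JSON scoring format are placeholders.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function judgeOutputs(criteria, outputA, outputB) {
  const response = await client.messages.create({
    model: "claude-3-5-haiku-latest", // placeholder; any Claude model ID works here
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content:
          `You are grading two AI skill outputs against these criteria:\n${criteria}\n\n` +
          `Output A:\n${outputA}\n\nOutput B:\n${outputB}\n\n` +
          `Reply with JSON: {"winner": "A" | "B", "reason": "..."}`,
      },
    ],
  });
  // For a plain-text completion, the first content block holds the text.
  return JSON.parse(response.content[0].text);
}

Swapping the model ID is how a cheaper model (e.g. Haiku) can be used for judging while a larger one runs the skills, which matches the model-selection options the workbench exposes.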
Stars: 21
Forks: 1
Language: JavaScript
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
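The same endpoint can also be queried from Node.js 18+ with the built-in fetch. The response schema isn't documented here, so this sketch just prints whatever JSON the service returns:

// Query the quality API from Node.js 18+ (global fetch) and print the raw JSON.
// No response field names are assumed. Run as an ES module (.mjs) for top-level await.
const url = "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval";

const res = await fetch(url);
if (!res.ok) throw new Error(`Request failed: ${res.status}`);
console.log(JSON.stringify(await res.json(), null, 2));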
Higher-rated alternatives
memodb-io/Acontext: Agent Skills as a Memory Layer
powroom/flins: Universal skill installer for AI coding agents
vaibhavtupe/skill-guard: The quality gate for Agent Skills — validate, secure, conflict-detect, and test skills across...
DougTrajano/pydantic-ai-skills: This package implements Agent Skills (https://agentskills.io) support with progressive...
ARPAHLS/skillware: A Python framework for modular, self-contained skill management for machines.