justinwetch/SkillEval
A visual workbench for A/B testing AI skills. Upload two skill files, run them through a batch of test prompts, and let an AI judge score the results.
The workbench auto-generates evaluation criteria and test prompts with an AI model, and supports text or visual outputs via an optional Puppeteer-based screenshot server. It integrates exclusively with the Anthropic API, letting users pick Claude models (Opus, Sonnet, Haiku) separately for skill execution and for judging. The client-side application is built with React and requires Node.js.
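As a rough illustration of the judging step, the sketch below calls the Anthropic Messages API from Node.js. It is a minimal sketch only: the model ID, rubric wording, and JSON scoring format are assumptions for illustration, not the repository's actual implementation.

// Minimal sketch of an LLM-as-judge call via the Anthropic Messages API.
// Model name, rubric text, and the JSON scoring format are placeholders.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function judgeOutputs(criteria, outputA, outputB) {
  const response = await client.messages.create({
    model: "claude-3-5-haiku-latest", // placeholder; any Claude model ID works here
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content:
          `You are grading two AI skill outputs against these criteria:\n${criteria}\n\n` +
          `Output A:\n${outputA}\n\nOutput B:\n${outputB}\n\n` +
          `Reply with JSON: {"winner": "A" | "B", "reason": "..."}`,
      },
    ],
  });
  // For a plain-text completion, the first content block holds the text.
  return JSON.parse(response.content[0].text);
}

Swapping the model ID is how a cheaper model (e.g. Haiku) can be used for judging while a larger one runs the skills, which matches the model-selection options the workbench exposes.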
Stars: 21
Forks: 1
Language: JavaScript
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
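The same endpoint can also be queried from Node.js 18+ with the built-in fetch. The response schema isn't documented here, so this sketch just prints whatever JSON the service returns:

// Query the quality API from Node.js 18+ (global fetch) and print the raw JSON.
// No response field names are assumed. Run as an ES module (.mjs) for top-level await.
const url = "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval";

const res = await fetch(url);
if (!res.ok) throw new Error(`Request failed: ${res.status}`);
console.log(JSON.stringify(await res.json(), null, 2));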
Higher-rated alternatives
memodb-io/Acontext: Agent Skills as a Memory Layer
powroom/flins: Universal skill installer for AI coding agents
vaibhavtupe/skill-guard: The quality gate for Agent Skills — validate, secure, conflict-detect, and test skills across...
DougTrajano/pydantic-ai-skills: This package implements Agent Skills (https://agentskills.io) support with progressive...
ARPAHLS/skillware: A Python framework for modular, self-contained skill management for machines.