promptfoo and promptfoo-action
promptfoo-action is a wrapper that integrates the core promptfoo testing framework into CI/CD pipelines; the two are complements designed to be used together rather than alternatives.
About promptfoo
promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Supports automated red teaming and vulnerability scanning through LLM-generated adversarial prompts, alongside traditional metric-based evaluations with custom grading logic. Executes tests locally with configurable providers (OpenAI, Anthropic, Bedrock, Ollama, etc.) while storing results for comparison, and integrates natively with GitHub code scanning and CI/CD pipelines for continuous LLM app security validation.
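To make the declarative config concrete, here is a minimal sketch of a promptfooconfig.yaml; the prompt text, model IDs, and test values are illustrative placeholders, not taken from the repositories above:

```yaml
# promptfooconfig.yaml — minimal sketch; prompt, providers, and test data are illustrative
prompts:
  - "Summarize the following text in one sentence: {{text}}"

# Compare two providers side by side
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

# Each test supplies variables and assertions for grading outputs
tests:
  - vars:
      text: "Promptfoo is an open-source framework for testing LLM apps."
    assert:
      - type: icontains        # case-insensitive substring check
        value: "promptfoo"
```

Running `npx promptfoo eval` in the same directory executes the prompt-by-provider matrix locally and stores results for later comparison.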
About promptfoo-action
promptfoo/promptfoo-action
The GitHub Action for Promptfoo. Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Automatically compares prompt changes between git commits and posts interactive before/after evaluations directly to pull requests, with support for push and manual workflow triggers. The action integrates with promptfoo's declarative YAML config system and web viewer for side-by-side result exploration, while supporting optional result caching and pass/fail thresholds to enforce quality gates in CI/CD pipelines.
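A workflow wiring the action into a pull-request check might look like the sketch below; the input names follow the action's documented interface, while the file paths and secret names are placeholders you would adapt to your repository:

```yaml
# .github/workflows/prompt-eval.yml — illustrative sketch, not a verbatim example from the repo
name: Evaluate prompts
on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write   # required to post the before/after comment
    steps:
      - uses: actions/checkout@v4
      - uses: promptfoo/promptfoo-action@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: promptfooconfig.yaml          # placeholder path
          prompts: prompts/summarize.txt        # placeholder path
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```

On each pull request, the action evaluates only prompts changed between commits and posts the side-by-side results as a PR comment.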