Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.


Promptfoo: LLM evals & red teaming


promptfoo is a developer-friendly local tool for testing LLM applications. Stop the trial-and-error approach and start shipping secure, reliable AI apps.

Quick Start

# Install and initialize project
npx promptfoo@latest init

# Run your first evaluation
npx promptfoo eval

See Getting Started (evals) or Red Teaming (vulnerability scanning) for more.
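The init command scaffolds a promptfooconfig.yaml in the working directory. As a rough sketch of the declarative config format, here is a minimal example that compares two providers on a single test case; the prompt, provider IDs, and assertion are illustrative placeholders, not the exact file that init generates:

# promptfooconfig.yaml (illustrative)
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "promptfoo is a local tool for evaluating and red teaming LLM applications."
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: promptfoo

Running npx promptfoo eval executes every prompt/provider/test combination, and npx promptfoo view opens the results matrix in the web viewer.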

What can you do with Promptfoo?

  • Test your prompts and models with automated evaluations
  • Secure your LLM apps with red teaming and vulnerability scanning
  • Compare models side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and more)
  • Automate checks in CI/CD (see the workflow sketch after this list)
  • Share results with your team
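
As a sketch of the CI/CD item above: a common pattern is to run the eval in a GitHub Actions job so that failed assertions fail the build. The workflow below is illustrative; it assumes a Node-capable runner, a promptfooconfig.yaml at the repo root, and a provider API key stored as a repository secret named OPENAI_API_KEY:

# .github/workflows/promptfoo.yml (illustrative)
name: LLM evals
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # promptfoo exits non-zero when assertions fail, which fails the job
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}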

Here's what it looks like in action:

[Screenshot: prompt evaluation matrix in the web viewer]

It works on the command line too:

[Screenshot: prompt evaluation matrix on the command line]

It can also generate security vulnerability reports:

[Screenshot: generative AI red team report]
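
Red teaming uses the same declarative config. Very roughly, and deferring to the Red Teaming docs for the current schema, a scan is configured by pointing at a target and adding a redteam block that lists plugins (vulnerability categories to probe) and strategies (attack techniques). The keys, plugin names, and target below are illustrative:

# promptfooconfig.yaml (illustrative red team setup)
targets:
  - openai:gpt-4o-mini   # the model or app under test

redteam:
  purpose: "Customer support chatbot for a retail store"
  plugins:
    - harmful
    - pii
    - excessive-agency
  strategies:
    - jailbreak
    - prompt-injection

# Generate adversarial test cases and evaluate them:
#   npx promptfoo redteam run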

Why promptfoo?

  • 🚀 Developer-first: Fast, with features like live reload and caching
  • 🔒 Private: Runs 100% locally - your prompts never leave your machine
  • 🔧 Flexible: Works with any LLM API or programming language
  • 💪 Battle-tested: Powers LLM apps serving 10M+ users in production
  • 📊 Data-driven: Make decisions based on metrics, not gut feel
  • 🤝 Open source: MIT licensed, with an active community

Learn More

Full documentation, guides, and examples are available at https://www.promptfoo.dev/docs/intro.

Contributing

We welcome contributions! Check out our contributing guide to get started.

Join our Discord community for help and discussion.
