ITBench

📢 Announcements

Latest Updates

[February 7, 2025] Initial release 0.1.0! 🎉 Includes research paper, self-hosted environment setup tooling, sample scenarios, and baseline agents

Coming Soon

[March 2025] Limited Access Beta 🏆 Get invite-only access to the ITBench-hosted scenario environments and be among the first participants on our leaderboard. You register your SRE, FinOps, or CISO AI agent and prepare it to interact with the ITBench-hosted scenario environments. ITBench automatically handles scenario deployment, agent evaluation, and leaderboard updates. To request access, e-mail us here.
[April 2025] Public Launch 🚀 Complete ITBench platform access opens to all.

Overview

The goal of ITBench is to measure the performance of AI agents across a wide variety of complex, real-life IT automation tasks targeting three key personas:

Site Reliability Engineering (SRE) - focusing on availability and resiliency
Financial Operations (FinOps) - focusing on enforcing cost efficiencies and optimizing return on investment
Compliance and Security Operations (CISO) - focusing on ensuring compliance and security of IT implementations

Through push-button workflows and interpretable metrics, ITBench helps AI researchers and developers explore both the challenges and the potential of IT automation.

ITBench centers on two core principles:

Real-world representation of IT environments and of the incident scenarios that occur in them
Open, extensible framework with comprehensive IT coverage

ITBench enables researchers and developers to replicate real-world incidents in Kubernetes environments (scenarios) and develop AI agents to address them. As of February 2025, we are open-sourcing:

Push-button deployment tooling for environment setup
Framework for recreating:
- 6 SRE scenarios
- 4 categories of CISO scenarios
Two reference AI agents:
- CISO (Chief Information Security Officer) Agent
- SRE (Site Reliability Engineering) Agent

Scenarios

ITBench incorporates a collection of problems that we call scenarios. For example, one of the SRE scenarios in ITBench is to resolve a “High error rate on service order-management” in a Kubernetes environment. Another scenario, relevant to the CISO persona, involves assessing the compliance posture for a “new control rule detected for RHEL 9.” Each ITBench scenario is deployed in an operational environment in which one or more problems occur.

The scenarios can be found here.

Agents

Two baseline agents (SRE-FinOps and CISO) are being open-sourced with ITBench. We use the open-source CrewAI framework to create and manage agents. The agents can be configured to use various LLMs through watsonx, Azure, or vLLM. Each agent is initialized with a prompt that describes its goal, the context, the tasks, and the expected output format. In-context learning examples are included to guide the agent and demonstrate tool usage. Agents use natural language to invoke tools, which interact with the environment to gather information.
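
As a rough illustration of the prompt structure described above, the sketch below assembles a prompt from a goal, context, task list, expected output format, and optional in-context examples. It uses only the Python standard library. The class names `AgentPrompt` and `Example`, the field names, and the sample text are hypothetical; this is neither the CrewAI API nor the prompts used by the ITBench baseline agents.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """One in-context example: a situation and a natural-language tool request."""
    situation: str
    tool_call: str


@dataclass
class AgentPrompt:
    """Hypothetical container for the prompt parts named in the text above."""
    goal: str
    context: str
    tasks: list[str]
    expected_output: str
    examples: list[Example] = field(default_factory=list)

    def render(self) -> str:
        """Join the parts into a single prompt string, one section per part."""
        lines = [
            f"Goal: {self.goal}",
            f"Context: {self.context}",
            "Tasks:",
        ]
        lines += [f"  {i}. {task}" for i, task in enumerate(self.tasks, start=1)]
        lines.append(f"Expected output format: {self.expected_output}")
        if self.examples:
            # In-context examples are optional and appended last.
            lines.append("Examples:")
            for ex in self.examples:
                lines.append(f"  Situation: {ex.situation}")
                lines.append(f"  Tool call: {ex.tool_call}")
        return "\n".join(lines)


# Illustrative values only; the scenario title is borrowed from the Scenarios section.
prompt = AgentPrompt(
    goal="Resolve the high error rate on service order-management",
    context="A Kubernetes environment with an active incident",
    tasks=["Gather information about the incident", "Propose a remediation"],
    expected_output="A short report naming the suspected cause and the fix",
    examples=[Example("Pods are restarting", "list the pods in the affected namespace")],
)
print(prompt.render())
```

Keeping the parts as separate fields, rather than one hand-written string, makes it easy to swap the in-context examples or the output format without touching the rest of the prompt.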

CAA Agent

Source code repository here.

SRE Agent

Source code repository here.

How to Cite

@misc{jha2025itbench,
      title={ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks},
      author={Jha, Saurabh and Arora, Rohan and Watanabe, Yuji and others},
      year={2025},
      url={https://github.com/IBM/itbench-sample-scenarios/blob/main/it_bench_arxiv.pdf}
}

Contributors

Saurabh Jha
Rohan Arora
Yuji Watanabe
Takumi Yanagawa
Yinfang Chen (UIUC - University of Illinois at Urbana-Champaign)
Jackson Clark (UIUC - University of Illinois at Urbana-Champaign)
Bhavya Bhavya
Mudit Verma
Harshit Kumar
Hirokuni Kitahara
Noah Zheutlin
Saki Takano
Divya Pathak
Felix George
Xinbo Wu (UIUC - University of Illinois at Urbana-Champaign)
Bekir O Turkkan
Gerard Vanloo
Michael Nidd
Ting Dai
Oishik Chatterjee
Pranjal Gupta
Suranjana Samanta
Pooja Aggarwal
Rong Lee
Pavankumar Murali
Jae-wook Ahn
Debanjana Kar
Ameet Rahane
Carlos Fonseca
Amit Paradkar
Yu Deng
Pratibha Moogi
Prateeti Mohapatra
Naoki Abe
Chandrasekhar Narayanaswami
Tianyin Xu (UIUC - University of Illinois at Urbana-Champaign)
Lav R. Varshney (UIUC - University of Illinois at Urbana-Champaign)
Ruchi Mahindru
Anca Sailer
Laura Shwartz
Daby Sow
Nicholas C. M. Fuller
Ruchir Puri

Contacts

agent-bench-automation@ibm.com
Saurabh Jha (saurabh.jha@ibm.com)
Yuji Watanabe (muew@jp.ibm.com)
Ruchi Mahindru (rmahindr@us.ibm.com)
Anca Sailer (ancas@us.ibm.com)
