Agent TARS

Agent TARS is an open-source multimodal AI agent that operates browsers by visually interpreting web pages and integrates with the command line and the file system.

Caution

DISCLAIMER: Agent TARS is still in the Technical Preview stage and is not yet stable. It is not recommended for production use.

Showcases

agent-tars-demo-01.mp4

For more showcases, see https://agent-tars.com/showcase

✨️ Features

  • 🌐 Advanced Browser Operations: Executes sophisticated tasks like Deep Research and Operator functions through an agent framework, enabling comprehensive planning and execution.
  • 🛠️ Comprehensive Tool Support: Integrates with search, file editing, command line, and Model Context Protocol (MCP) tools to handle complex workflows.
  • 💻️ Enhanced Desktop App: A revamped UI with displays for browsers, multimodal elements, session management, model configuration, dialogue flow visualization, and browser/search status tracking.
  • 🔄 Workflow Orchestration: Connects the GUI Agent tools (search, browse, explore links) and synthesizes the gathered information into final outputs.
  • ⚙️ Developer-Friendly Framework: Simplifies integration with UI-TARS and custom workflow creation for GUI Agent projects.
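
The workflow orchestration described above (search, browse, explore links, synthesize) can be pictured as a chain of steps that pass a shared state along. The sketch below is only an illustration of that idea: every name in it (`WorkflowState`, `makeStep`, `runWorkflow`, the step names) is hypothetical and is not taken from the Agent TARS codebase, and the steps are synchronous stubs where a real agent would call asynchronous tools.

```typescript
// Hypothetical sketch: none of these names come from the Agent TARS codebase.
type StepName = "search" | "browse" | "exploreLinks" | "synthesize";

interface WorkflowState {
  query: string;
  notes: string[];   // findings accumulated by each step
  trace: StepName[]; // order in which the steps ran
}

type Step = (state: WorkflowState) => WorkflowState;

// Build a stub step that records its name and appends one note.
// A real step would call a search, browser, or model tool instead.
function makeStep(name: StepName, note: (s: WorkflowState) => string): Step {
  return (state) => ({
    ...state,
    notes: [...state.notes, note(state)],
    trace: [...state.trace, name],
  });
}

const steps: Step[] = [
  makeStep("search", (s) => `results for "${s.query}"`),
  makeStep("browse", () => "page content"),
  makeStep("exploreLinks", () => "linked pages"),
  makeStep("synthesize", (s) => `summary of ${s.notes.length} findings`),
];

// Run the steps in order, threading the state through each one.
function runWorkflow(query: string): WorkflowState {
  let state: WorkflowState = { query, notes: [], trace: [] };
  for (const step of steps) {
    state = step(state);
  }
  return state;
}
```

Calling `runWorkflow("agent frameworks")` returns a state whose `trace` lists the four steps in order and whose last note is produced by the synthesize stub. Keeping each step as a plain function over an immutable state makes the order of operations easy to inspect and to reorder.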

Install

You can download the latest release version of Agent TARS from our releases page.

Note: If you have Homebrew installed, you can install Agent TARS by running the following command:

brew install --cask agent-tars

Getting Started

See Quick Start.

Contributing

Please read the contributing guide and let's build Agent TARS together.

Code of conduct

This repo has adopted the ByteDance Open Source Code of Conduct. See the Code of Conduct for details.

Roadmap

Agent TARS is more than a tool: it is a platform for the future of multimodal agents. Upcoming enhancements include:

  • Ongoing optimization of the synergy between the agent framework and the GUI Agent, with expanded model compatibility.
  • Expansion to mobile device operations with a cross-platform framework.
  • Integration with game environments for AI-driven gameplay.

Credits

Thanks to:

  • The browser-use project, whose work inspired us to operate browsers better.
  • @alexchenzl for developing the innovative nanobrowser Chrome extension, which provided valuable technical references for our browser control in Electron.
  • @EGOIST for creating the remarkable AI chatbot ChatWise, from which we drew significant inspiration for local browser detection and local browser search.
  • Anthropic for building the Model Context Protocol, which helps us better manage local tools.
  • The Puppeteer team for their excellent browser automation toolkit, which greatly enhanced our workflow.
  • The Web Infra team and the Rslib project for helping us build our libraries better.
  • The UI-TARS and UI-TARS-desktop development teams for laying crucial foundational frameworks.
  • All contributors and members of the open-source community who supported this journey with their expertise and encouragement.

License

Agent TARS is licensed under the Apache License 2.0.