Commit 89c63fd

gregpr07 and MagMueller authored
Added custom actions registry and fixed extraction layer (browser-use#20)
* Validator
* Test mind2web
* Cleaned up logger
* Pytest logger
* Cleaned up logger
* Disable flag for human input
* Multiple clicks per button
* Multiple clicks per button
* More structured system prompt
* Fields with description
* System prompt example
* One logger
* Cleaner logging
* Log step in step function
* Fix critical clicking error - wrong argument used
* Improved thought process of agent
* Improve system prompt
* Remove human input message
* Custome action registration
* Pydantic model for custom actions
* Pydantic model for custome output
* Runs through, model outputs functions, but not called yet
* Work in progress - description for custome actions
* Description works, but schema not yet
* Model can call the right action - but is not executed
* Seperate is_controller_action and is_custom_action
* Works! Model can call custom function
* Use registry for action, but result is not feed back to model
* Include result in messages
* Works with custom function - but typing is not correct
* Renamed registry
* First test cases
* Captcha tests
* Pydantic for tests
* Improve prompts for multy step
* System prompt structure
* Handle errors like validation error
* Refactor error handling in agent
* Refactor error handling in agent
* Improved logging
* Update view
* Fix click parameter to index
* Simplify dynamic actions
* Use run instead of step
* Rename history
* Rename AgentService to Agent
* Rename ControllerService to Controller
* Pytest file
* Rename get state
* Rename BrowserService
* reversed dom extraction recursion to while
* Rename use_vision
* Rename use_vision
* reversed dom tree items and made browser less anoying
* Renaming and fixing type errors
* Renamed class names for agent
* updated requirements
* Update prompt
* Action registration works for user and controller
* Fix done call by returning ActionResult
* Fix if result is none
* Rename AgentOutput and ActionModel
* Improved prompt Passes 6/8 tests from test_agent_actions
* Calculate token cost
* Improve display
* Simplified logger
* Test function calling
* created super simple xpath extraction algo
* Tests logging
* tiny fixes to dom extraction
* Remove test
* Dont log number of clicks
* Pytest file
* merged per element js checks
* Check if driver is still open
* super fast processing
* fixed agent planning and stuff
* Fix example
* Fix example
* Improve error
* Improved error correction
* New line for step
* small type error fixes
* Test for pydantic
* Fix line
* Removed sample
* fixed readme and examples

---------

Co-authored-by: magmueller <[email protected]>
1 parent 5b5ee3e commit 89c63fd

42 files changed: +19240 -1319 lines changed

.gitignore

+5 -1
@@ -160,4 +160,8 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
-temp
+temp
+tmp
+
+
+.DS_Store

.vscode/launch.json

+42 -11
@@ -2,20 +2,51 @@
     "version": "0.2.0",
     "configurations": [
         {
-            "name": "Python: Debug Tests",
+            "name": "Python Debugger: Module",
+            "type": "debugpy",
+            "request": "launch",
+            "module": "examples.extend_actions"
+        },
+        {
+            "name": "Python: Debug extend_actions",
+            "type": "module",
+            "request": "launch",
+            "module": "examples.extend_actions",
+            "console": "integratedTerminal",
+            "justMyCode": false,
+            "env": {
+                "PYTHONPATH": "${workspaceFolder}"
+            }
+        },
+        {
+            "name": "Python: Debug Captcha Tests",
+            "type": "python",
+            "request": "launch",
+            "module": "pytest",
+            "args": [
+                "tests/test_agent_actions.py",
+                "-v",
+                "-k",
+                "test_captcha_solver",
+                "--capture=no",
+            ],
+            "console": "integratedTerminal",
+            "justMyCode": false
+        },
+        {
+            "name": "Python: Debug Ecommerce Interaction",
             "type": "python",
             "request": "launch",
-            "program": "${workspaceFolder}/.venv/bin/pytest",
+            "module": "pytest",
             "args": [
-                "src/tests/test_kayak_search.py",
+                "tests/test_agent_actions.py",
                 "-v",
-                "-s"
+                "-k",
+                "test_ecommerce_interaction",
+                "--capture=no",
             ],
             "console": "integratedTerminal",
-            "justMyCode": false,
-            "env": {
-                "PYTHONPATH": "${workspaceFolder}"
-            }
-        }
-    ]
-}
+            "justMyCode": false
+        }
+    ]
+}
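
The pytest launch entries in this file switch from `"program"` to `"module": "pytest"` plus an `"args"` list. As a rough illustration of what such an entry corresponds to outside the editor, the sketch below turns a module-style configuration into a `python -m ...` command line. `launch_command` is a hypothetical helper written for this note, and the command it builds runs the tests without attaching a debugger.

```python
import shlex
import sys


def launch_command(config: dict) -> list[str]:
    """Build the argv for a launch configuration that uses "module" and "args".

    Hypothetical helper: it only mirrors the `python -m <module> <args>` shape
    and does not attach a debugger.
    """
    return [sys.executable, '-m', config['module'], *config.get('args', [])]


# Mirrors the "Python: Debug Ecommerce Interaction" entry from the diff.
ecommerce_config = {
    'name': 'Python: Debug Ecommerce Interaction',
    'module': 'pytest',
    'args': [
        'tests/test_agent_actions.py',
        '-v',
        '-k',
        'test_ecommerce_interaction',
        '--capture=no',
    ],
}

command = launch_command(ecommerce_config)
print(shlex.join(command))
```

Run from the repository root, the printed command should select the same test as the launch entry.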

README.md

+80 -125
@@ -1,194 +1,149 @@
-<div align="center">
-
-# 🌐 Browser-Use
+# 🌐 Browser Use
 
-### Open-Source Web Automation with LLMs
+Make websites accessible for AI agents 🤖.
 
 [![GitHub stars](https://img.shields.io/github/stars/gregpr07/browser-use?style=social)](https://github.com/gregpr07/browser-use/stargazers)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
-[![Discord](https://img.shields.io/discord/1303749220842340412?color=7289DA&label=Discord&logo=discord&logoColor=white)](https://discord.gg/uaCtrbbv)
+[![Discord](https://img.shields.io/discord/1303749220842340412?color=7289DA&label=Discord&logo=discord&logoColor=white)](https://link.browser-use.com/discord)
 
-</div>
+Browser use is the easiest way to connect your AI agents with the browser. If you have used Browser Use for your project feel free to show it off in our [Discord](https://link.browser-use.com/discord).
 
-Let LLMs interact with websites through a simple interface.
+# Quick start
 
-## Short Example
+With pip:
 
 ```bash
 pip install browser-use
 ```
 
+Spin up your agent:
+
 ```python
 from langchain_openai import ChatOpenAI
 from browser_use import Agent
 
 agent = Agent(
-    task="Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.",
+    task="Find a one-way flight from Bali to Oman on 12 January 2025 on Google Flights. Return me the cheapest option.",
     llm=ChatOpenAI(model="gpt-4o"),
 )
 
 # ... inside an async function
 await agent.run()
 ```
 
-## Demo
-
-<div>
-<a href="https://www.loom.com/share/63612b5994164cb1bb36938d62fe9983">
-<img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/63612b5994164cb1bb36938d62fe9983-7133f9e169672e6f-full-play.gif">
-</a>
-<p><i>Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed) </i></p>
-</div>
-<div>
-<a href="https://www.loom.com/share/2af938b9f8024647950a9e18b3946054">
-<img style="max-width:300px;" src="https://cdn.loom.com/sessions/thumbnails/2af938b9f8024647950a9e18b3946054-b99c733cf670e568-full-play.gif">
-</a>
-<p><i>Prompt: Search the top 3 AI companies 2024 and find what out what concrete hardware each is using for their model. (1x speed)</i></p>
-</div>
-
-
-
-<div style="display: flex; justify-content: space-between; margin-top: 20px;">
-<div style="flex: 1; margin-right: 10px;">
-<img style="width: 100%;" src="./static/kayak.gif" alt="Kayak flight search demo">
-<p><i>Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)</i></p>
-</div>
-<div style="flex: 1; margin-left: 10px;">
-<img style="width: 100%;" src="./static/photos.gif" alt="Photos search demo">
-<p><i>Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)</i></p>
-</div>
-</div>
-</div>
-
-## Local Setup
-
-1. Create a virtual environment and install dependencies:
+And don't forget to add your API keys to your `.env` file.
 
 ```bash
-# To install all dependencies including dev
-pip install . ."[dev]"
+OPENAI_API_KEY=
+ANTHROPIC_API_KEY=
 ```
 
-2. Add your API keys to the `.env` file:
+# Demo
 
-```bash
-cp .env.example .env
-```
+DEMO VIDEO HERE
 
-E.g. for OpenAI:
+# Features ⭐
 
-```bash
-OPENAI_API_KEY=
-```
+- Vision + html extraction
+- Automatic multi-tab management
+- Extract clicked elements XPaths
+- Add custom actions (e.g. add data to database which the LLM can use)
+- Self-correcting
+- Use any LLM supported by LangChain (e.g. gpt4o, gpt4o mini, claude 3.5 sonnet, llama 3.1 405b, etc.)
 
-You can use any LLM model supported by LangChain by adding the appropriate environment variables. See [langchain models](https://python.langchain.com/docs/integrations/chat/) for available options.
+## Register custom actions
 
-## Features
+If you want to add custom actions your agent can take, you can register them like this:
 
-- Universal LLM Support - Works with any Language Model
-- Interactive Element Detection - Automatically finds interactive elements
-- Multi-Tab Management - Seamless handling of browser tabs
-- XPath Extraction for scraping functions - No more manual DevTools inspection
-- Vision Model Support - Process visual page information
-- Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
-- Handles dynamic content - dont worry about cookies or changing content
-- Chain-of-thought prompting with memory - Solve long-term tasks
-- Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
+```python
+from browser_use.agent.service import Agent
+from browser_use.browser.service import Browser
+from browser_use.controller.service import Controller
 
-## Advanced Examples
+# Initialize controller first
+controller = Controller()
 
-### Chain of Agents
+@controller.action('Ask user for information')
+def ask_human(question: str, display_question: bool) -> str:
+    return input(f'\n{question}\nInput: ')
+```
 
-You can persist the browser across multiple agents and chain them together.
+Or define your parameters using Pydantic
 
 ```python
-from asyncio import run
-from browser_use import Agent, Controller
-from dotenv import load_dotenv
-from langchain_anthropic import ChatAnthropic
-load_dotenv()
-
-# Persist browser state across agents
-controller = Controller()
+class JobDetails(BaseModel):
+    title: str
+    company: str
+    job_link: str
+    salary: Optional[str] = None
+
+@controller.action('Save job details which you found on page', param_model=JobDetails, requires_browser=True)
+def save_job(params: JobDetails, browser: Browser):
+    print(params)
+
+    # use the browser normally
+    browser.driver.get(params.job_link)
+```
 
-# Initialize browser agent
-agent1 = Agent(
-    task="Open 3 VCs websites in the New York area.",
-    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
-    controller=controller)
-agent2 = Agent(
-    task="Give me the names of the founders of the companies in all tabs.",
-    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
-    controller=controller)
+and then run your agent:
 
-run(agent1.run())
-founders, history = run(agent2.run())
+```python
+model = ChatAnthropic(model_name='claude-3-5-sonnet-20240620', timeout=25, stop=None, temperature=0.3)
+agent = Agent(task=task, llm=model, controller=controller)
 
-print(founders)
+await agent.run()
 ```
 
-You can use the `history` to run the agents again deterministically.
+## Get XPath history
 
-## Command Line Usage
+To get the entire history of everything the agent has done, you can use the output of the `run` method:
 
-Run examples directly from the command line (clone the repo first):
+```python
+history: list[AgentHistory] = await agent.run()
 
-```bash
-python examples/try.py "Your query here" --provider [openai|anthropic]
+print(history)
 ```
 
-### Anthropic
+## More examples
 
-You need to add `ANTHROPIC_API_KEY` to your environment variables. Example usage:
+For more examples see the [examples](examples) folder or join the [Discord](https://link.browser-use.com/discord) and show off your project.
 
-```bash
+# Contributing
 
-python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic
-```
+Contributions are welcome! Feel free to open issues for bugs or feature requests.
 
-### OpenAI
+## Setup
 
-You need to add `OPENAI_API_KEY` to your environment variables. Example usage:
+1. Create a virtual environment and install dependencies:
 
 ```bash
-python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. " --provider anthropic
+# To install all dependencies including dev
+pip install -r requirements.txt -r requirements-dev.txt
 ```
 
-## 🤖 Supported Models
-
-All LangChain chat models are supported. Tested with:
-
-- GPT-4o
-- GPT-4o Mini
-- Claude 3.5 Sonnet
-- LLama 3.1 405B
+2. Add your API keys to the `.env` file:
 
-## Limitations
+```bash
+cp .env.example .env
+```
 
-- When extracting page content, the message length increases and the LLM gets slower.
-- Currently one agent costs about 0.01$
-- Sometimes it tries to repeat the same task over and over again.
-- Some elements might not be extracted which you want to interact with.
-- What should we focus on the most?
-  - Robustness
-  - Speed
-  - Cost reduction
+or copy the following to your `.env` file:
 
-## Roadmap
+```bash
+OPENAI_API_KEY=
+ANTHROPIC_API_KEY=
+```
 
-- [x] Save agent actions and execute them deterministically
-- [ ] Pydantic forced output
-- [ ] Third party SERP API for faster Google Search results
-- [ ] Multi-step action execution to increase speed
-- [ ] Test on mind2web dataset
-- [ ] Add more browser actions
+You can use any LLM model supported by LangChain by adding the appropriate environment variables. See [langchain models](https://python.langchain.com/docs/integrations/chat/) for available options.
 
-## Contributing
+### Building the package
 
-Contributions are welcome! Feel free to open issues for bugs or feature requests.
+```bash
+hatch build
+```
 
-Feel free to join the [Discord](https://discord.gg/uaCtrbbv) for discussions and support.
+Feel free to join the [Discord](https://link.browser-use.com/discord) for discussions and support.
 
 ---
 
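
The new README registers custom actions with `@controller.action(...)`, but the registry implementation is not part of this excerpt. The following is only a minimal, self-contained sketch of how a decorator-based action registry of this kind could work. `ActionRegistry`, `describe`, and `execute` are hypothetical names chosen for illustration, not browser-use's API, and plain keyword arguments stand in for the Pydantic parameter model.

```python
import inspect
from typing import Any, Callable


class ActionRegistry:
    """Hypothetical stand-in for a controller that collects custom actions."""

    def __init__(self) -> None:
        # Maps action name -> (description, callable)
        self._actions: dict[str, tuple[str, Callable[..., Any]]] = {}

    def action(self, description: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator factory: register the decorated function under its own name."""

        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._actions[func.__name__] = (description, func)
            return func  # leave the function usable as a normal callable

        return decorator

    def describe(self) -> str:
        """Render one line per action, e.g. for inclusion in a system prompt."""
        lines = []
        for name, (description, func) in self._actions.items():
            params = ', '.join(inspect.signature(func).parameters)
            lines.append(f'{name}({params}): {description}')
        return '\n'.join(lines)

    def execute(self, name: str, **kwargs: Any) -> Any:
        """Look up an action by name and call it with keyword arguments."""
        if name not in self._actions:
            raise KeyError(f'Unknown action: {name}')
        _, func = self._actions[name]
        return func(**kwargs)


registry = ActionRegistry()


@registry.action('Save job details which you found on page')
def save_job(title: str, company: str) -> str:
    return f'saved {title} at {company}'
```

With this sketch, `registry.execute('save_job', title='Engineer', company='Acme')` dispatches to the registered function, and `registry.describe()` produces the text a model could be shown when choosing an action.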

browser_use/__init__.py

+7 -3
@@ -1,6 +1,10 @@
-from browser_use.agent.service import AgentService as Agent
-from browser_use.browser.service import BrowserService as Browser
-from browser_use.controller.service import ControllerService as Controller
+from browser_use.logging_config import setup_logging
+
+setup_logging()
+
+from browser_use.agent.service import Agent as Agent
+from browser_use.browser.service import Browser as Browser
+from browser_use.controller.service import Controller as Controller
 from browser_use.dom.service import DomService
 
 __all__ = ['Agent', 'Browser', 'Controller', 'DomService']
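
This hunk calls `setup_logging()` at import time, before the service classes are imported. `browser_use/logging_config.py` is not shown in this excerpt, so the following is a hedged sketch of what an idempotent setup function of this kind might look like, using only the standard library. The logger name `demo_pkg`, the format string, and the default level are assumptions made for the example.

```python
import logging
import sys


def setup_logging(name: str = 'demo_pkg', level: int = logging.INFO) -> logging.Logger:
    """Attach a single stdout handler to the package logger; safe to call repeatedly."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured (e.g. setup was triggered twice): do nothing.
        return logger

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter('%(levelname)-8s [%(name)s] %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(level)
    # Keep records from also reaching the root logger, which would duplicate output.
    logger.propagate = False
    return logger


log = setup_logging()
setup_logging()  # second call is a no-op
log.info('logging configured')
```

Guarding on existing handlers matters when setup runs as an import side effect, because any later explicit call would otherwise add a second handler and duplicate every log line.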
