feat: various enhancements #508


Merged · 16 commits · Apr 8, 2025
54 changes: 15 additions & 39 deletions docs/demos/rosbot_xl.md
@@ -6,15 +6,19 @@ This demo utilizes Open 3D Engine simulation and allows you to work with RAI on

## Quick start

> [!TIP]
> The demo uses the `complex_model` LLM configured in [../../config.toml](../../config.toml). The model should be a multimodal, tool-calling model.

1. Download the newest binary release:

- Ubuntu 22.04 & ROS 2 Humble: [link](https://robotec-ml-roscon2024-demos.s3.eu-central-1.amazonaws.com/ROSCON_Release/RAIROSBotDemo_1.0.0_jammyhumble.zip)
- Ubuntu 24.04 & ROS 2 Jazzy: [link](https://robotec-ml-roscon2024-demos.s3.eu-central-1.amazonaws.com/ROSCON_Release/RAIROSBotDemo_1.0.0_noblejazzy.zip)

2. Install required packages
2. Install and download required packages

```bash
sudo apt install ros-${ROS_DISTRO}-ackermann-msgs ros-${ROS_DISTRO}-gazebo-msgs ros-${ROS_DISTRO}-control-toolbox ros-${ROS_DISTRO}-nav2-bringup
vcs import < demos.repos
rosdep install --from-paths src --ignore-src -r -y
poetry install --with openset
```

@@ -32,56 +36,28 @@ This demo utilizes Open 3D Engine simulation and allows you to work with RAI on
If you would like more freedom to adapt the simulation to your needs, you can make changes using
[O3DE Editor](https://www.docs.o3de.org/docs/welcome-guide/) and build the project
yourself.
Please refer to [rai husarion rosbot xl demo][rai rosbot demo] for more details.
Please refer to [rai husarion rosbot xl demo](https://github.com/RobotecAI/rai-rosbot-xl-demo) for more details.

# Running RAI

1. Robot identity

The process of setting up the robot identity is described in [create_robots_whoami](../create_robots_whoami.md).
We provide a ready-made whoami package for ROSbot XL.

```bash
cd rai
vcs import < demos.repos
colcon build --symlink-install --packages-select rosbot_xl_whoami
```

2. Running rai nodes and agents, navigation stack and O3DE simulation.
1. Running rai nodes and agents, navigation stack and O3DE simulation.

```bash
ros2 launch ./examples/rosbot-xl.launch.py game_launcher:=path/to/RAIROSBotXLDemo.GameLauncher
```

3. Play with the demo, adding tasks to the RAI agent. Here are some examples:
2. Run streamlit gui:

```bash
# Ask robot where it is. RAI will use camera to describe the environment
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Where are you?'}"

# See integration with the navigation stack
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Drive 1 meter forward'}"
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Spin 90 degrees'}"

# Try out more complicated tasks
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: ' Drive forward if the path is clear, otherwise backward'}"
streamlit run examples/rosbot-xl-demo.py
```

> **NOTE**: For now, the agent can perform only one task at a time.
> Human-Robot Interaction module is not yet included in the demo (coming soon!).

### What is happening?

By looking at the example code in [rai/examples/rosbot-xl-demo.py](../../examples/rosbot-xl-demo.py) you can see that:

- This node has no information about the robot besides what it can get from `rai_whoami_node`.
- Topics can be whitelisted to only receive information about the robot.
- Before every LLM decision, `rai_node` sends its state to the LLM Agent. By default, it contains the ROS interfaces (topics, services, actions) and a summary of the logs, but the state can be extended.
- In the example, we also add a description of the camera image to the state.

If you wish, you can learn more about [configuring RAI for a specific robot](../create_robots_whoami.md).
3. Play with the demo, prompting the agent to perform tasks. Here are some examples:

[rai rosbot demo]: https://github.com/RobotecAI/rai-rosbot-xl-demo
- Where are you now?
- What do you see?
- What is the position of the bed?
- Navigate to the kitchen.

> [!TIP]
> If you are having trouble running the binary, you can build it from source [here](https://github.com/RobotecAI/rai-rosbot-xl-demo).
247 changes: 247 additions & 0 deletions docs/developer_guide/tools.md
@@ -0,0 +1,247 @@
# Tools

Tools are a fundamental concept in LangChain that allow AI models to interact with external systems and perform specific operations. Think of tools as callable functions that bridge the gap between natural language understanding and system execution.
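To make this idea concrete, here is a minimal, framework-free sketch of the pattern: the model emits a tool call as structured data (a name plus arguments), and a dispatcher routes it to an ordinary Python function. All names here are illustrative, not part of LangChain or RAI.

```python
import json
from typing import Callable, Dict

# A registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@register_tool
def grab_object(object_name: str) -> str:
    # Stand-in for a real robot command.
    return f"grabbed {object_name}"


def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to the matching function."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["args"])


# A model might emit this JSON as its "tool call":
result = dispatch('{"name": "grab_object", "args": {"object_name": "cup"}}')
print(result)  # grabbed cup
```

Frameworks like LangChain automate exactly this loop: schema generation, argument validation, and dispatch.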

RAI offers a comprehensive set of pre-built tools, both general-purpose and ROS 2-specific, available [here](../../src/rai_core/rai/tools/ros2). However, in some cases, you may need to develop custom tools tailored to specific robots or applications. This guide demonstrates how to create custom tools in RAI using the [LangChain framework](https://python.langchain.com/docs/).

RAI supports two primary approaches for implementing tools, each with distinct advantages:

### `BaseTool` Class

- Offers full control over tool behavior and lifecycle
- Allows configuration parameters
- Supports stateful operations (e.g., maintaining ROS 2 connector instances)

### `@tool` Decorator

- Provides a lightweight, functional approach
- Ideal for stateless operations
- Minimizes boilerplate code
- Suited for simple, single-purpose tools

Use the `BaseTool` class when state management or extensive configuration is required. Choose the `@tool` decorator for simple, stateless functionality where conciseness is preferred.

---

## Creating a Custom Tool

LangChain tools typically return either a string or a tuple containing a string and an artifact.

RAI extends LangChain’s tool capabilities by supporting **multimodal tools**: tools that return not only text but also other content types, such as images, audio, or structured data. This is achieved using a special object called `MultimodalArtifact` along with a custom `ToolRunner` class.

---

### Single-Modal Tool (Text Output)

Here’s an example of a single-modal tool implemented using class inheritance:

```python
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type


class GrabObjectToolInput(BaseModel):
    """Input schema for the GrabObjectTool."""

    object_name: str = Field(description="The name of the object to grab")


class GrabObjectTool(BaseTool):
    """Tool for grabbing objects using a robot."""

    name: str = "grab_object"
    description: str = "Grabs a specified object using the robot's manipulator"
    args_schema: Type[GrabObjectToolInput] = GrabObjectToolInput

    def _run(self, object_name: str) -> str:
        """Execute the object grabbing operation."""
        try:
            # `robot` is a placeholder for your robot's control API
            status = robot.grab_object(object_name)
            return f"Successfully grabbed object: {object_name}, status: {status}"
        except Exception as e:
            return f"Failed to grab object: {object_name}, error: {str(e)}"
```

Alternatively, using the `@tool` decorator:

```python
from langchain_core.tools import tool


@tool
def grab_object(object_name: str) -> str:
    """Grabs a specified object using the robot's manipulator."""
    try:
        status = robot.grab_object(object_name)
        return f"Successfully grabbed object: {object_name}, status: {status}"
    except Exception as e:
        return f"Failed to grab object: {object_name}, error: {str(e)}"
```

---

### Multimodal Tool (Text + Image Output)

RAI supports multimodal tools through the `rai.agents.tool_runner.ToolRunner` class. These tools must use this runner either directly or via agents such as [`create_react_runnable`](../../src/rai_core/rai/agents/langchain/runnables.py) to handle multimedia output correctly.

```python
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type, Tuple
from rai.messages import MultimodalArtifact


class Get360ImageToolInput(BaseModel):
    """Input schema for the Get360ImageTool."""

    topic: str = Field(description="The topic name for the 360 image")


class Get360ImageTool(BaseTool):
    """Tool for retrieving 360-degree images."""

    name: str = "get_360_image"
    description: str = "Retrieves a 360-degree image from the specified topic"
    args_schema: Type[Get360ImageToolInput] = Get360ImageToolInput
    response_format: str = "content_and_artifact"

    def _run(self, topic: str) -> Tuple[str, MultimodalArtifact]:
        try:
            image = robot.get_360_image(topic)
            return "Successfully retrieved 360 image", MultimodalArtifact(images=[image])
        except Exception as e:
            return f"Failed to retrieve image: {str(e)}", MultimodalArtifact(images=[])
```

---

### ROS 2 Tools

RAI includes a base class for ROS 2 tools, which supports configuring readable, writable, and forbidden topics, actions, and services, as well as the ROS 2 connector. TODO(docs): link docs to the ARIConnector.

```python
from rai.tools.ros2.base import BaseROS2Tool
from pydantic import BaseModel, Field
from typing import Type, cast
from sensor_msgs.msg import PointCloud2


class GetROS2LidarDataToolInput(BaseModel):
    """Input schema for the GetROS2LidarDataTool."""

    topic: str = Field(description="The topic name for the LiDAR data")


class GetROS2LidarDataTool(BaseROS2Tool):
    """Tool for retrieving and processing LiDAR data."""

    name: str = "get_ros2_lidar_data"
    description: str = "Retrieves and processes LiDAR data from the specified topic"
    args_schema: Type[GetROS2LidarDataToolInput] = GetROS2LidarDataToolInput

    def _run(self, topic: str) -> str:
        try:
            lidar_data = self.connector.receive_message(topic)
            msg = cast(PointCloud2, lidar_data.payload)
            # Process the LiDAR data here
            return "Successfully processed LiDAR data. Detected objects: ..."
        except Exception as e:
            return f"Failed to process LiDAR data: {str(e)}"
```

Refer to the [BaseROS2Tool source code](../../src/rai_core/rai/tools/ros2/base.py) for more information.
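The access-control idea behind these readable/writable/forbidden lists can be sketched in plain Python. This is an illustrative model of the filtering behavior, not RAI's actual implementation; the class and method names here are hypothetical.

```python
from typing import List, Optional


class TopicAccessPolicy:
    """Illustrative model of readable/writable/forbidden topic filtering."""

    def __init__(
        self,
        readable: Optional[List[str]] = None,
        writable: Optional[List[str]] = None,
        forbidden: Optional[List[str]] = None,
    ):
        # None means "no restriction"; a list means "only these names".
        self.readable = readable
        self.writable = writable
        self.forbidden = forbidden or []

    def can_read(self, topic: str) -> bool:
        # Forbidden always wins, even over an explicit readable entry.
        if topic in self.forbidden:
            return False
        return self.readable is None or topic in self.readable

    def can_write(self, topic: str) -> bool:
        if topic in self.forbidden:
            return False
        return self.writable is None or topic in self.writable


policy = TopicAccessPolicy(readable=["/camera/image"], forbidden=["/cmd_vel"])
print(policy.can_read("/camera/image"))  # True
print(policy.can_read("/cmd_vel"))       # False
```

A tool built on such a policy would check `can_read`/`can_write` before touching a topic, so a misdirected LLM tool call cannot, for example, publish to `/cmd_vel`.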

---

## Tool Initialization

Tools can be initialized with parameters such as a connector, enabling custom configurations for ROS 2 environments.

```python
from rai.communication.ros2 import ROS2ARIConnector
from rai.tools.ros2 import (
    GetROS2ImageTool,
    GetROS2TopicsNamesAndTypesTool,
    PublishROS2MessageTool,
)


def initialize_tools(connector: ROS2ARIConnector):
    """Initialize and configure ROS 2 tools.

    Returns:
        list: A list of configured tools.
    """
    readable_names = ["/color_image5", "/depth_image5", "/color_camera_info5"]
    forbidden_names = ["cmd_vel"]
    writable_names = ["/to_human"]

    return [
        GetROS2ImageTool(
            connector=connector, readable=readable_names, forbidden=forbidden_names
        ),
        GetROS2TopicsNamesAndTypesTool(
            connector=connector,
            readable=readable_names,
            forbidden=forbidden_names,
            writable=writable_names,
        ),
        PublishROS2MessageTool(
            connector=connector, writable=writable_names, forbidden=forbidden_names
        ),
    ]
```

---

### Using Tools in a RAI Agent (Distributed Setup)

TODO(docs): add link to the BaseAgent docs (regarding distributed setup)

```python
from rai.agents import ReActAgent
from rai.communication import ROS2ARIConnector, ROS2HRIConnector
from rai.tools.ros2 import ROS2Toolkit
from rai.utils import ROS2Context, wait_for_shutdown


@ROS2Context()
def main() -> None:
    """Initialize and run the RAI agent with configured tools."""
    connector = ROS2HRIConnector(sources=["/from_human"], targets=["/to_human"])
    ari_connector = ROS2ARIConnector()
    agent = ReActAgent(
        connectors={"hri": connector},
        tools=initialize_tools(connector=ari_connector),
    )
    agent.run()
    wait_for_shutdown([agent])


if __name__ == "__main__":
    main()

# Example interaction from another terminal:
# ros2 topic pub /from_human rai_interfaces/msg/HRIMessage "{\"text\": \"What do you see?\"}"
# ros2 topic echo /to_human rai_interfaces/msg/HRIMessage
```

---

### Using Tools in LangChain/LangGraph Agent (Local Setup)

```python
from langchain.schema import HumanMessage
from rai.agents.langchain import create_react_runnable
from rai.communication.ros2 import ROS2ARIConnector
from rai.utils import ROS2Context


@ROS2Context()
def main():
    ari_connector = ROS2ARIConnector()
    # initialize_tools is defined in the "Tool Initialization" section above
    agent = create_react_runnable(
        tools=initialize_tools(connector=ari_connector),
        system_prompt="You are a helpful assistant that can answer questions and help with tasks.",
    )
    state = {"messages": []}
    while True:
        input_text = input("Enter a prompt: ")
        state["messages"].append(HumanMessage(content=input_text))
        response = agent.invoke(state)
        print(response)


if __name__ == "__main__":
    main()
```

---

## Related Topics

- [Connectors](../communication/connectors.md)
- [ROS2ARIConnector](../communication/ros2.md)
- [ROS2HRIConnector](../communication/ros2.md)
2 changes: 1 addition & 1 deletion examples/manipulation-demo.py
@@ -19,7 +19,7 @@
from rai.agents.conversational_agent import create_conversational_agent
from rai.communication.ros2.connectors import ROS2ARIConnector
from rai.tools.ros.manipulation import GetObjectPositionsTool, MoveToPointTool
from rai.tools.ros2.topics import GetROS2ImageTool, GetROS2TopicsNamesAndTypesTool
from rai.tools.ros2 import GetROS2ImageTool, GetROS2TopicsNamesAndTypesTool
from rai.utils.model_initialization import get_llm_model
from rai_open_set_vision.tools import GetGrabbingPointTool
