feat: various enhancements #508


Merged · 16 commits · Apr 8, 2025
54 changes: 15 additions & 39 deletions docs/demos/rosbot_xl.md
@@ -6,15 +6,19 @@ This demo utilizes Open 3D Engine simulation and allows you to work with RAI on

## Quick start

> [!TIP]
> The demo uses the `complex_model` LLM configured in [../../config.toml](../../config.toml). The model should be a multimodal, tool-calling model.

1. Download the newest binary release:

- Ubuntu 22.04 & ROS 2 Humble: [link](https://robotec-ml-roscon2024-demos.s3.eu-central-1.amazonaws.com/ROSCON_Release/RAIROSBotDemo_1.0.0_jammyhumble.zip)
- Ubuntu 24.04 & ROS 2 Jazzy: [link](https://robotec-ml-roscon2024-demos.s3.eu-central-1.amazonaws.com/ROSCON_Release/RAIROSBotDemo_1.0.0_noblejazzy.zip)

2. Install required packages
2. Install and download required packages

```bash
sudo apt install ros-${ROS_DISTRO}-ackermann-msgs ros-${ROS_DISTRO}-gazebo-msgs ros-${ROS_DISTRO}-control-toolbox ros-${ROS_DISTRO}-nav2-bringup
vcs import < demos.repos
rosdep install --from-paths src --ignore-src -r -y
poetry install --with openset
```

@@ -32,56 +36,28 @@ This demo utilizes Open 3D Engine simulation and allows you to work with RAI on
If you would like more freedom to adapt the simulation to your needs, you can make changes using
[O3DE Editor](https://www.docs.o3de.org/docs/welcome-guide/) and build the project
yourself.
Please refer to [rai husarion rosbot xl demo][rai rosbot demo] for more details.
Please refer to [rai husarion rosbot xl demo](https://github.com/RobotecAI/rai-rosbot-xl-demo) for more details.

# Running RAI

1. Robot identity

The process of setting up the robot identity is described in [create_robots_whoami](../create_robots_whoami.md).
We provide a ready-made whoami package for ROSbot XL.

```bash
cd rai
vcs import < demos.repos
colcon build --symlink-install --packages-select rosbot_xl_whoami
```

2. Running rai nodes and agents, navigation stack and O3DE simulation.
1. Running rai nodes and agents, navigation stack and O3DE simulation.

```bash
ros2 launch ./examples/rosbot-xl.launch.py game_launcher:=path/to/RAIROSBotXLDemo.GameLauncher
```

3. Play with the demo, adding tasks to the RAI agent. Here are some examples:
2. Run streamlit gui:

```bash
# Ask robot where it is. RAI will use camera to describe the environment
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Where are you?'}"

# See integration with the navigation stack
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Drive 1 meter forward'}"
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: 'Spin 90 degrees'}"

# Try out more complicated tasks
ros2 action send_goal -f /perform_task rai_interfaces/action/Task "{priority: 10, description: '', task: ' Drive forward if the path is clear, otherwise backward'}"
streamlit run examples/rosbot-xl-demo.py
```

> **NOTE**: For now, the agent can perform only one task at a time.
> Human-Robot Interaction module is not yet included in the demo (coming soon!).

### What is happening?

By looking at the example code in [rai/examples/rosbot-xl-demo.py](../../examples/rosbot-xl-demo.py) you can see that:

- This node has no information about the robot besides what it can get from `rai_whoami_node`.
- Topics can be whitelisted to only receive information about the robot.
- Before every LLM decision, `rai_node` sends its state to the LLM Agent. By default, it contains the ROS interfaces (topics, services, actions) and a summary of the logs, but the state can be extended.
- In the example, we also add a description of the camera image to the state.

If you wish, you can learn more about [configuring RAI for a specific robot](../create_robots_whoami.md).
3. Play with the demo, prompting the agent to perform tasks. Here are some examples:

[rai rosbot demo]: https://github.com/RobotecAI/rai-rosbot-xl-demo
- Where are you now?
- What do you see?
- What is the position of the bed?
- Navigate to the kitchen.

> [!TIP]
> If you are having trouble running the binary, you can build it from source [here](https://github.com/RobotecAI/rai-rosbot-xl-demo).
247 changes: 247 additions & 0 deletions docs/developer_guide/tools.md
@@ -0,0 +1,247 @@
# Tools

Tools are a fundamental concept in LangChain that allow AI models to interact with external systems and perform specific operations. Think of tools as callable functions that bridge the gap between natural language understanding and system execution.
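To make this idea concrete, here is a minimal, framework-free sketch of the pattern: the model emits a tool call as structured data (a name plus arguments), and a dispatcher routes it to an ordinary Python function. All names here are illustrative, not part of LangChain or RAI.

```python
import json
from typing import Callable, Dict

# A registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@register_tool
def grab_object(object_name: str) -> str:
    # Stand-in for a real robot command.
    return f"grabbed {object_name}"


def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to the matching function."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["args"])


# A model might emit this JSON as its "tool call":
result = dispatch('{"name": "grab_object", "args": {"object_name": "cup"}}')
print(result)  # grabbed cup
```

Frameworks like LangChain automate exactly this loop: schema generation, argument validation, and dispatch.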

RAI offers a comprehensive set of pre-built tools, both general-purpose and ROS 2-specific, available [here](../../src/rai_core/rai/tools/ros2). However, in some cases, you may need to develop custom tools tailored to specific robots or applications. This guide demonstrates how to create custom tools in RAI using the [LangChain framework](https://python.langchain.com/docs/).

RAI supports two primary approaches for implementing tools, each with distinct advantages:

### `BaseTool` Class

- Offers full control over tool behavior and lifecycle
- Allows configuration parameters
- Supports stateful operations (e.g., maintaining ROS 2 connector instances)

### `@tool` Decorator

- Provides a lightweight, functional approach
- Ideal for stateless operations
- Minimizes boilerplate code
- Suited for simple, single-purpose tools

Use the `BaseTool` class when state management or extensive configuration is required. Choose the `@tool` decorator for simple, stateless functionality where conciseness is preferred.

---

## Creating a Custom Tool

LangChain tools typically return either a string or a tuple containing a string and an artifact.

RAI extends LangChain’s tool capabilities by supporting **multimodal tools**: tools that return not only text but also other content types, such as images, audio, or structured data. This is achieved using a special object called `MultimodalArtifact` along with a custom `ToolRunner` class.

---

### Single-Modal Tool (Text Output)

Here’s an example of a single-modal tool implemented using class inheritance:

```python
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type


class GrabObjectToolInput(BaseModel):
    """Input schema for the GrabObjectTool."""

    object_name: str = Field(description="The name of the object to grab")


class GrabObjectTool(BaseTool):
    """Tool for grabbing objects using a robot."""

    name: str = "grab_object"
    description: str = "Grabs a specified object using the robot's manipulator"
    args_schema: Type[GrabObjectToolInput] = GrabObjectToolInput

    def _run(self, object_name: str) -> str:
        """Execute the object grabbing operation."""
        try:
            # `robot` is a placeholder for your robot's control API
            status = robot.grab_object(object_name)
            return f"Successfully grabbed object: {object_name}, status: {status}"
        except Exception as e:
            return f"Failed to grab object: {object_name}, error: {str(e)}"
```

Alternatively, using the `@tool` decorator:

```python
from langchain_core.tools import tool


@tool
def grab_object(object_name: str) -> str:
    """Grabs a specified object using the robot's manipulator."""
    try:
        status = robot.grab_object(object_name)
        return f"Successfully grabbed object: {object_name}, status: {status}"
    except Exception as e:
        return f"Failed to grab object: {object_name}, error: {str(e)}"
```

---

### Multimodal Tool (Text + Image Output)

RAI supports multimodal tools through the `rai.agents.tool_runner.ToolRunner` class. These tools must use this runner either directly or via agents such as [`create_react_runnable`](../../src/rai_core/rai/agents/langchain/runnables.py) to handle multimedia output correctly.

```python
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type, Tuple
from rai.messages import MultimodalArtifact


class Get360ImageToolInput(BaseModel):
    """Input schema for the Get360ImageTool."""

    topic: str = Field(description="The topic name for the 360 image")


class Get360ImageTool(BaseTool):
    """Tool for retrieving 360-degree images."""

    name: str = "get_360_image"
    description: str = "Retrieves a 360-degree image from the specified topic"
    args_schema: Type[Get360ImageToolInput] = Get360ImageToolInput
    response_format: str = "content_and_artifact"

    def _run(self, topic: str) -> Tuple[str, MultimodalArtifact]:
        try:
            image = robot.get_360_image(topic)
            return "Successfully retrieved 360 image", MultimodalArtifact(images=[image])
        except Exception as e:
            return f"Failed to retrieve image: {str(e)}", MultimodalArtifact(images=[])
```

---

### ROS 2 Tools

RAI includes a base class for ROS 2 tools, which supports configuring readable, writable, and forbidden topics, actions, and services, as well as the ROS 2 connector. TODO(docs): link docs to the ARIConnector.

```python
from rai.tools.ros2.base import BaseROS2Tool
from pydantic import BaseModel, Field
from typing import Type, cast
from sensor_msgs.msg import PointCloud2


class GetROS2LidarDataToolInput(BaseModel):
    """Input schema for the GetROS2LidarDataTool."""

    topic: str = Field(description="The topic name for the LiDAR data")


class GetROS2LidarDataTool(BaseROS2Tool):
    """Tool for retrieving and processing LiDAR data."""

    name: str = "get_ros2_lidar_data"
    description: str = "Retrieves and processes LiDAR data from the specified topic"
    args_schema: Type[GetROS2LidarDataToolInput] = GetROS2LidarDataToolInput

    def _run(self, topic: str) -> str:
        try:
            lidar_data = self.connector.receive_message(topic)
            msg = cast(PointCloud2, lidar_data.payload)
            # Process the LiDAR data here
            return "Successfully processed LiDAR data. Detected objects: ..."
        except Exception as e:
            return f"Failed to process LiDAR data: {str(e)}"
```

Refer to the [BaseROS2Tool source code](../../src/rai_core/rai/tools/ros2/base.py) for more information.
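The access-control idea behind these readable/writable/forbidden lists can be sketched in plain Python. This is an illustrative model of the filtering behavior, not RAI's actual implementation; the class and method names here are hypothetical.

```python
from typing import List, Optional


class TopicAccessPolicy:
    """Illustrative model of readable/writable/forbidden topic filtering."""

    def __init__(
        self,
        readable: Optional[List[str]] = None,
        writable: Optional[List[str]] = None,
        forbidden: Optional[List[str]] = None,
    ):
        # None means "no restriction"; a list means "only these names".
        self.readable = readable
        self.writable = writable
        self.forbidden = forbidden or []

    def can_read(self, topic: str) -> bool:
        # Forbidden always wins, even over an explicit readable entry.
        if topic in self.forbidden:
            return False
        return self.readable is None or topic in self.readable

    def can_write(self, topic: str) -> bool:
        if topic in self.forbidden:
            return False
        return self.writable is None or topic in self.writable


policy = TopicAccessPolicy(readable=["/camera/image"], forbidden=["/cmd_vel"])
print(policy.can_read("/camera/image"))  # True
print(policy.can_read("/cmd_vel"))       # False
```

A tool built on such a policy would check `can_read`/`can_write` before touching a topic, so a misdirected LLM tool call cannot, for example, publish to `/cmd_vel`.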

---

## Tool Initialization

Tools can be initialized with parameters such as a connector, enabling custom configurations for ROS 2 environments.

```python
from rai.communication.ros2 import ROS2ARIConnector
from rai.tools.ros2 import (
    GetROS2ImageTool,
    GetROS2TopicsNamesAndTypesTool,
    PublishROS2MessageTool,
)


def initialize_tools(connector: ROS2ARIConnector):
    """Initialize and configure ROS 2 tools.

    Returns:
        list: A list of configured tools.
    """
    readable_names = ["/color_image5", "/depth_image5", "/color_camera_info5"]
    forbidden_names = ["cmd_vel"]
    writable_names = ["/to_human"]

    return [
        GetROS2ImageTool(
            connector=connector, readable=readable_names, forbidden=forbidden_names
        ),
        GetROS2TopicsNamesAndTypesTool(
            connector=connector,
            readable=readable_names,
            forbidden=forbidden_names,
            writable=writable_names,
        ),
        PublishROS2MessageTool(
            connector=connector, writable=writable_names, forbidden=forbidden_names
        ),
    ]
```

---

### Using Tools in a RAI Agent (Distributed Setup)

TODO(docs): add link to the BaseAgent docs (regarding distributed setup)

```python
from rai.agents import ReActAgent
from rai.communication import ROS2ARIConnector, ROS2HRIConnector
from rai.tools.ros2 import ROS2Toolkit
from rai.utils import ROS2Context, wait_for_shutdown


@ROS2Context()
def main() -> None:
    """Initialize and run the RAI agent with configured tools."""
    connector = ROS2HRIConnector(sources=["/from_human"], targets=["/to_human"])
    ari_connector = ROS2ARIConnector()
    agent = ReActAgent(
        connectors={"hri": connector},
        tools=initialize_tools(connector=ari_connector),
    )
    agent.run()
    wait_for_shutdown([agent])


if __name__ == "__main__":
    main()

# Example interaction from another terminal:
# ros2 topic pub /from_human rai_interfaces/msg/HRIMessage "{\"text\": \"What do you see?\"}"
# ros2 topic echo /to_human rai_interfaces/msg/HRIMessage
```

---

### Using Tools in LangChain/LangGraph Agent (Local Setup)

```python
from langchain.schema import HumanMessage
from rai.agents.langchain import create_react_runnable
from rai.communication.ros2 import ROS2ARIConnector
from rai.utils import ROS2Context


@ROS2Context()
def main():
    ari_connector = ROS2ARIConnector()
    # initialize_tools is defined in the "Tool Initialization" section above
    agent = create_react_runnable(
        tools=initialize_tools(connector=ari_connector),
        system_prompt="You are a helpful assistant that can answer questions and help with tasks.",
    )
    state = {"messages": []}
    while True:
        input_text = input("Enter a prompt: ")
        state["messages"].append(HumanMessage(content=input_text))
        response = agent.invoke(state)
        print(response)


if __name__ == "__main__":
    main()
```

---

## Related Topics

- [Connectors](../communication/connectors.md)
- [ROS2ARIConnector](../communication/ros2.md)
- [ROS2HRIConnector](../communication/ros2.md)
2 changes: 1 addition & 1 deletion examples/manipulation-demo.py
@@ -19,7 +19,7 @@
from rai.agents.conversational_agent import create_conversational_agent
from rai.communication.ros2.connectors import ROS2ARIConnector
from rai.tools.ros.manipulation import GetObjectPositionsTool, MoveToPointTool
from rai.tools.ros2.topics import GetROS2ImageTool, GetROS2TopicsNamesAndTypesTool
from rai.tools.ros2 import GetROS2ImageTool, GetROS2TopicsNamesAndTypesTool
from rai.utils.model_initialization import get_llm_model
from rai_open_set_vision.tools import GetGrabbingPointTool
