diff --git a/examples/document_analysis_mcp/Dockerfile b/examples/document_analysis_mcp/Dockerfile
new file mode 100644
index 00000000..2a17bb6e
--- /dev/null
+++ b/examples/document_analysis_mcp/Dockerfile
@@ -0,0 +1,37 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy requirements first to leverage Docker cache
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy the rest of the application
+COPY servers /app/servers
+COPY configs /app/configs
+
+# Expose the port
+EXPOSE 9902
+
+# Run the server
+CMD ["python", "-m", "servers.server"]
\ No newline at end of file
diff --git a/examples/document_analysis_mcp/README.md b/examples/document_analysis_mcp/README.md
new file mode 100644
index 00000000..2e6a8eae
--- /dev/null
+++ b/examples/document_analysis_mcp/README.md
@@ -0,0 +1,186 @@
+
+
+# Document Analysis MCP Example
+
+This example demonstrates how to use AIQ Toolkit with the Model Context Protocol (MCP) to create a document analysis and question answering system. It shows how multiple tools and a more advanced configuration fit together within the AIQ Toolkit framework.
+
+## Features
+
+- URL content fetching with HTML parsing
+- Document analysis and information extraction
+- Question answering about analyzed documents
+- Enhanced error handling and retry mechanisms
+- Docker support for easy deployment
+- Full MCP server and client implementation
+
+## Prerequisites
+
+- Python 3.11 or higher
+- Docker and Docker Compose
+- NVIDIA API Key for accessing the LLM
+- AIQ Toolkit installed with required plugins
+
+## Setup
+
+1. Set your NVIDIA API Key:
+   ```bash
+   export NVIDIA_API_KEY=your_api_key_here
+   ```
+
+2. Install the required AIQ Toolkit plugins:
+   ```bash
+   uv pip install -e '.[langchain]'
+   ```
+
+3. Build and start the Docker container:
+   ```bash
+   docker compose -f deployment/docker-compose.yml up --build
+   ```
+
+4. The MCP fetch server is then available at `http://localhost:9903/sse` (the host port mapped in `deployment/docker-compose.yml` and the URL used by `configs/config.yml`).
+
+## Available Tools
+
+1. **Fetch Tool**
+   - Fetches content from a URL
+   - Parses HTML and extracts text
+   - Handles errors and timeouts
+   ```json
+   {
+       "url": "https://example.com"
+   }
+   ```
+
+2. **Document Analysis Tool**
+   - Analyzes document text
+   - Splits it into chunks
+   - Creates a vector store for Q&A
+   ```json
+   {
+       "text": "Your document text here"
+   }
+   ```
+
+3. **Question Answering Tool**
+   - Answers questions about analyzed documents
+   - Uses vector search for context
+   - Provides detailed answers
+   ```json
+   {
+       "question": "Your question here"
+   }
+   ```
+
+## Architecture
+
+- `Dockerfile` and `docker-compose.yml`: Container configuration and orchestration for the document analysis server
+- `deployment/docker-compose.yml`: Orchestration for the MCP fetch proxy used by the AIQ workflow
+- `configs/config.yml`: AIQ Toolkit workflow configuration
+- `servers/Dockerfile.proxy` and `servers/run_fetch.sh`: MCP proxy image and the script that launches the fetch MCP server
+- `requirements.txt`: Python dependencies for the document analysis server
+
+## How it Works
+
+1. The server provides three main tools:
+   - URL content fetching
+   - Document analysis
+   - Question answering
+
+2. Each tool has:
+   - Input validation
+   - Error handling
+   - Retry mechanisms
+   - Detailed logging
+
+3. The system uses (a sketch of this pipeline follows the list):
+   - LangChain for document processing
+   - FAISS for vector storage
+   - BeautifulSoup for HTML parsing
+   - Docker for deployment
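+
+The fetch, chunk, and vector-store flow described above is implemented by the server code, which is not included in this diff, so the following is only a minimal, illustrative sketch of that flow using the libraries from `requirements.txt`. The `HuggingFaceEmbeddings` model is an assumption for illustration; it additionally requires `sentence-transformers`, which this example does not list.
+
+```python
+# Illustrative sketch only: fetch a page, strip HTML, chunk it, index it in FAISS,
+# and retrieve context for a question. The embeddings model is an assumption.
+import requests
+from bs4 import BeautifulSoup
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.embeddings import HuggingFaceEmbeddings
+from langchain_community.vectorstores import FAISS
+
+# 1. Fetch the page and reduce the HTML to plain text
+html = requests.get("https://example.com", timeout=30).text
+text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)
+
+# 2. Split the text into overlapping chunks
+splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
+chunks = splitter.split_text(text)
+
+# 3. Index the chunks in an in-memory FAISS vector store
+store = FAISS.from_texts(chunks, HuggingFaceEmbeddings())
+
+# 4. Retrieve the most relevant chunks for a question; the server would pass
+#    these to the LLM to produce the final answer.
+for doc in store.similarity_search("What is the main topic?", k=3):
+    print(doc.page_content[:200])
+```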
+
+
+## Installation and Setup
+
+If you have not already done so, follow the instructions in the [Install Guide](../../docs/source/quick-start/installing.md#install-from-source) to create the development environment and install AIQ Toolkit.
+
+To run this example, do the following:
+
+1. Start up docker compose using the provided `docker-compose.yml` file.
+   ```bash
+   docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml up -d
+   ```
+   The container will pull down the necessary code to run the server when it starts, so it may take a few minutes before the server is ready.
+   You can inspect the logs by running:
+   ```bash
+   docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml logs
+   ```
+   The server is ready when you see the following:
+   ```bash
+   mcp-proxy-aiq  | INFO:     Started server process [1]
+   mcp-proxy-aiq  | INFO:     Waiting for application startup.
+   mcp-proxy-aiq  | INFO:     Application startup complete.
+   mcp-proxy-aiq  | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
+   ```
+
+2. In a new terminal, from the root of the AIQ Toolkit repository, run the workflow:
+   ```bash
+   source .venv/bin/activate
+   aiq run --config_file=examples/document_analysis_mcp/configs/config.yml --input="What is langchain?"
+   ```
+
+   The ReAct Agent will use the tool to answer the question:
+   ```console
+   2025-03-11 16:13:29,922 - aiq.agent.react_agent.agent - INFO - The agent's thoughts are:
+   Thought: To answer this question, I need to find out what LangChain is. It's possible that it's a recent development or a concept that has been discussed online. I can use the internet to find the most up-to-date information about LangChain.
+
+   Action: mcp_url_tool
+   Action Input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false}
+
+   2025-03-11 16:13:29,924 - aiq.agent.react_agent.agent - INFO - Calling tool mcp_url_tool with input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false}
+   ```
+
+   The workflow then returns a result similar to the following:
+   ```console
+   Workflow Result:
+   ["LangChain is a composable framework that supports developers in building, running, and managing applications powered by Large Language Models (LLMs). It offers a suite of products, including LangChain, LangGraph, and LangSmith, which provide tools for building context-aware and reasoning applications, deploying LLM applications at scale, and debugging, collaborating, testing, and monitoring LLM apps. LangChain's products are designed to help developers create reliable and efficient GenAI applications, and its platform is used by teams of all sizes across various industries."]
+   ```
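+
+The log above shows the agent reaching the `fetch` tool through the `mcp_url_tool` wrapper defined in `configs/config.yml`. The same SSE endpoint can also be exercised directly with an MCP client outside of the workflow. The following is a minimal, illustrative sketch; it assumes a recent version of the `mcp` Python SDK (listed in `requirements.txt`) and the proxy from `deployment/docker-compose.yml` running on `localhost:9903`.
+
+```python
+# Illustrative sketch: connect to the MCP proxy over SSE, list its tools, and
+# call the `fetch` tool directly. Endpoint and tool name mirror configs/config.yml.
+import asyncio
+
+from mcp import ClientSession
+from mcp.client.sse import sse_client
+
+
+async def main() -> None:
+    async with sse_client("http://localhost:9903/sse") as (read, write):
+        async with ClientSession(read, write) as session:
+            await session.initialize()
+
+            # The proxy should expose the `fetch` tool from mcp-server-fetch
+            tools = await session.list_tools()
+            print([tool.name for tool in tools.tools])
+
+            # The same tool call the agent made, issued directly
+            result = await session.call_tool("fetch", {"url": "https://langchain.dev/"})
+            print(result.content)
+
+
+asyncio.run(main())
+```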
+
+## Usage Examples
+
+With the document analysis server from the top-level `docker-compose.yml` running (host port `9903`, mapped to the container's `9902`), you can call its HTTP endpoints directly:
+
+1. Fetch content from a URL:
+   ```bash
+   curl -X POST http://localhost:9903/tools/fetch \
+     -H "Content-Type: application/json" \
+     -d '{"url": "https://example.com"}'
+   ```
+
+2. Analyze a document:
+   ```bash
+   curl -X POST http://localhost:9903/tools/analyze_document \
+     -H "Content-Type: application/json" \
+     -d '{"text": "Your document text here"}'
+   ```
+
+3. Ask a question:
+   ```bash
+   curl -X POST http://localhost:9903/tools/answer_question \
+     -H "Content-Type: application/json" \
+     -d '{"question": "What is the main topic?"}'
+   ```
+
+## Related Documentation
+
+- [AIQ Toolkit Documentation](https://docs.nvidia.com/aiqtoolkit)
+- [MCP Server Guide](../../docs/source/workflows/mcp/mcp-server.md)
+- [MCP Client Guide](../../docs/source/workflows/mcp/mcp-client.md)
+- [LangChain Integration](../../docs/source/plugins/langchain.md)
diff --git a/examples/document_analysis_mcp/configs/config.yml b/examples/document_analysis_mcp/configs/config.yml
new file mode 100644
index 00000000..a4e11f3b
--- /dev/null
+++ b/examples/document_analysis_mcp/configs/config.yml
@@ -0,0 +1,40 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+general:
+  use_uvloop: true
+
+functions:
+  mcp_url_tool:
+    _type: mcp_tool_wrapper
+    url: "http://localhost:9903/sse"
+    mcp_tool_name: fetch
+
+llms:
+  nim_llm:
+    _type: nim
+    model_name: meta/llama-3.1-70b-instruct
+    temperature: 0
+    max_tokens: 4096
+    top_p: 1
+
+workflow:
+  _type: react_agent
+  tool_names:
+    - mcp_url_tool
+  verbose: true
+  llm_name: nim_llm
\ No newline at end of file
diff --git a/examples/document_analysis_mcp/deployment/docker-compose.yml b/examples/document_analysis_mcp/deployment/docker-compose.yml
new file mode 100644
index 00000000..a0361ec5
--- /dev/null
+++ b/examples/document_analysis_mcp/deployment/docker-compose.yml
@@ -0,0 +1,32 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+services:
+  fetch_mcp_server:
+    container_name: mcp-proxy-aiq
+    build:
+      context: ../servers
+      dockerfile: Dockerfile.proxy
+    ports:
+      - "9903:8080"
+    volumes:
+      - ../servers/run_fetch.sh:/scripts/run_fetch.sh
+    command:
+      - "--sse-port=8080"
+      - "--sse-host=0.0.0.0"
+      - "/scripts/run_fetch.sh"
+    environment:
+      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
diff --git a/examples/document_analysis_mcp/docker-compose.yml b/examples/document_analysis_mcp/docker-compose.yml
new file mode 100644
index 00000000..75dbd450
--- /dev/null
+++ b/examples/document_analysis_mcp/docker-compose.yml
@@ -0,0 +1,28 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+services:
+  document_analysis_mcp:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "9903:9902"
+    volumes:
+      - ./configs:/app/configs
+    environment:
+      - PYTHONPATH=/app
+      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
+    command: python -m servers.server
\ No newline at end of file
diff --git a/examples/document_analysis_mcp/requirements.txt b/examples/document_analysis_mcp/requirements.txt
new file mode 100644
index 00000000..24b0aed1
--- /dev/null
+++ b/examples/document_analysis_mcp/requirements.txt
@@ -0,0 +1,11 @@
+mcp>=0.1.0
+langchain>=0.1.0
+langchain-community>=0.1.0
+python-dotenv>=1.0.0
+fastapi>=0.68.0
+uvicorn>=0.15.0
+pydantic>=1.8.0
+faiss-cpu>=1.7.4
+beautifulsoup4>=4.12.0
+requests>=2.31.0
+python-multipart>=0.0.5
\ No newline at end of file
diff --git a/examples/document_analysis_mcp/run.sh b/examples/document_analysis_mcp/run.sh
new file mode 100755
index 00000000..0f6db565
--- /dev/null
+++ b/examples/document_analysis_mcp/run.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# Set NVIDIA API Key
+export NVIDIA_API_KEY=<>
+
+# Build and start the container
+docker compose up --build
\ No newline at end of file
diff --git a/examples/document_analysis_mcp/servers/Dockerfile.proxy b/examples/document_analysis_mcp/servers/Dockerfile.proxy
new file mode 100644
index 00000000..43d21dec
--- /dev/null
+++ b/examples/document_analysis_mcp/servers/Dockerfile.proxy
@@ -0,0 +1,25 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM ubuntu:22.04
+
+# Install Python and pip, then clean up the apt lists
+RUN apt-get update && apt-get upgrade -y && apt-get install -y python3 python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+# uv provides the `uvx` runner used by run_fetch.sh; mcp-proxy bridges stdio to SSE
+RUN pip3 install uv
+RUN pip3 install mcp-proxy
+
+RUN mkdir /scripts
+COPY ./run_fetch.sh /scripts/run_fetch.sh
+
+ENTRYPOINT [ "mcp-proxy", "--pass-environment" ]
diff --git a/examples/document_analysis_mcp/servers/run_fetch.sh b/examples/document_analysis_mcp/servers/run_fetch.sh
new file mode 100755
index 00000000..6882d89c
--- /dev/null
+++ b/examples/document_analysis_mcp/servers/run_fetch.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Launch the fetch MCP server (stdio); mcp-proxy exposes it over SSE.
+uvx mcp-server-fetch --ignore-robots-txt