From 2b25455184f61d533c33077a77c31352f82166fc Mon Sep 17 00:00:00 2001 From: Raghu Ramesha Date: Wed, 14 May 2025 14:10:51 -0700 Subject: [PATCH 1/3] Added MCP example to create a document analysis and question answering system with AIQ --- examples/document_analysis_mcp/Dockerfile | 37 +++++ examples/document_analysis_mcp/README.md | 141 ++++++++++++++++++ .../document_analysis_mcp/configs/config.yml | 40 +++++ .../deployment/docker-compose.yml | 32 ++++ .../document_analysis_mcp/docker-compose.yml | 28 ++++ .../document_analysis_mcp/requirements.txt | 11 ++ examples/document_analysis_mcp/run.sh | 7 + .../servers/Dockerfile.proxy | 25 ++++ .../servers/run_fetch.sh | 18 +++ 9 files changed, 339 insertions(+) create mode 100644 examples/document_analysis_mcp/Dockerfile create mode 100644 examples/document_analysis_mcp/README.md create mode 100644 examples/document_analysis_mcp/configs/config.yml create mode 100644 examples/document_analysis_mcp/deployment/docker-compose.yml create mode 100644 examples/document_analysis_mcp/docker-compose.yml create mode 100644 examples/document_analysis_mcp/requirements.txt create mode 100755 examples/document_analysis_mcp/run.sh create mode 100644 examples/document_analysis_mcp/servers/Dockerfile.proxy create mode 100755 examples/document_analysis_mcp/servers/run_fetch.sh diff --git a/examples/document_analysis_mcp/Dockerfile b/examples/document_analysis_mcp/Dockerfile new file mode 100644 index 00000000..2a17bb6e --- /dev/null +++ b/examples/document_analysis_mcp/Dockerfile @@ -0,0 +1,37 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +FROM python:3.11-slim + +WORKDIR /app + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + build-essential \ + && rm -rf /var/lib/apt/lists/* + +# Copy requirements first to leverage Docker cache +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Copy the rest of the application +COPY servers /app/servers +COPY configs /app/configs + +# Expose the port +EXPOSE 9902 + +# Run the server +CMD ["python", "-m", "servers.server"] \ No newline at end of file diff --git a/examples/document_analysis_mcp/README.md b/examples/document_analysis_mcp/README.md new file mode 100644 index 00000000..9214ca48 --- /dev/null +++ b/examples/document_analysis_mcp/README.md @@ -0,0 +1,141 @@ + + +# Document Analysis MCP Example + +This example demonstrates how to use AIQ Toolkit with Model Context Protocol (MCP) to create a document analysis and question answering system. It showcases the integration of multiple tools and sophisticated configurations within the AIQ Toolkit framework. 
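Under the hood, the ReAct agent defined in `configs/config.yml` reaches a reference `fetch` MCP server through AIQ Toolkit's `mcp_tool_wrapper`, which talks to an `mcp-proxy` container over SSE at `http://localhost:9903/sse`. For orientation, the sketch below shows roughly how that same proxied tool could be called directly; it is an illustration only (not part of the example's code) and assumes the `mcp` Python package's SSE client API (`sse_client` and `ClientSession`).

```python
# Rough sketch only: call the proxied fetch tool directly with the MCP Python client,
# approximating what AIQ's mcp_tool_wrapper does on the workflow's behalf.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    # Endpoint and tool name mirror the mcp_url_tool entry in configs/config.yml
    async with sse_client("http://localhost:9903/sse") as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool("fetch", {"url": "https://example.com"})
            for item in result.content:
                print(getattr(item, "text", item))


if __name__ == "__main__":
    asyncio.run(main())
```

With the proxy stack from `deployment/docker-compose.yml` running, this should print the fetched page text in the same form the agent receives it.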
+ +## Features + +- URL content fetching with HTML parsing +- Document analysis and information extraction +- Question answering about analyzed documents +- Enhanced error handling and retry mechanisms +- Docker support for easy deployment +- Full MCP server and client implementation + +## Prerequisites + +- Python 3.11 or higher +- Docker and Docker Compose +- NVIDIA API Key for accessing the LLM +- AIQ Toolkit installed with required plugins + +## Setup + +1. Set your NVIDIA API Key: + ```bash + export NVIDIA_API_KEY=your_api_key_here + ``` + +2. Install required AIQ Toolkit plugins: + ```bash + uv pip install -e '.[langchain]' + ``` + +3. Build and start the Docker container: + ```bash + docker-compose -f deployment/docker-compose.yml up --build + ``` + +4. The server will be available at `http://localhost:9902` + +## Available Tools + +1. **Fetch Tool** + - Fetches content from a URL + - Parses HTML and extracts text + - Handles errors and timeouts + ```python + { + "url": "https://example.com" + } + ``` + +2. **Document Analysis Tool** + - Analyzes document text + - Splits into chunks + - Creates vector store for Q&A + ```python + { + "text": "Your document text here" + } + ``` + +3. **Question Answering Tool** + - Answers questions about analyzed documents + - Uses vector search for context + - Provides detailed answers + ```python + { + "question": "Your question here" + } + ``` + +## Architecture + +- `Dockerfile`: Container configuration +- `deployment/docker-compose.yml`: Service orchestration + +## How it Works + +1. The server provides three main tools: + - URL content fetching + - Document analysis + - Question answering + +2. Each tool has: + - Input validation + - Error handling + - Retry mechanisms + - Detailed logging + +3. The system uses: + - LangChain for document processing + - FAISS for vector storage + - BeautifulSoup for HTML parsing + - Docker for deployment + +## Usage Examples + +1. Fetch content from a URL: + ```bash + curl -X POST http://localhost:9902/tools/fetch \ + -H "Content-Type: application/json" \ + -d '{"url": "https://example.com"}' + ``` + +2. Analyze a document: + ```bash + curl -X POST http://localhost:9902/tools/analyze_document \ + -H "Content-Type: application/json" \ + -d '{"text": "Your document text here"}' + ``` + +3. Ask a question: + ```bash + curl -X POST http://localhost:9902/tools/answer_question \ + -H "Content-Type: application/json" \ + -d '{"question": "What is the main topic?"}' + ``` + +## Related Documentation + +- [AIQ Toolkit Documentation](https://docs.nvidia.com/aiqtoolkit) +- [MCP Server Guide](./docs/source/workflows/mcp/mcp-server.md) +- [MCP Client Guide](./docs/source/workflows/mcp/mcp-client.md) +- [LangChain Integration](./docs/source/plugins/langchain.md) \ No newline at end of file diff --git a/examples/document_analysis_mcp/configs/config.yml b/examples/document_analysis_mcp/configs/config.yml new file mode 100644 index 00000000..a4e11f3b --- /dev/null +++ b/examples/document_analysis_mcp/configs/config.yml @@ -0,0 +1,40 @@ +# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + + +general: + use_uvloop: true + +functions: + mcp_url_tool: + _type: mcp_tool_wrapper + url: "http://localhost:9903/sse" + mcp_tool_name: fetch + +llms: + nim_llm: + _type: nim + model_name: nvdev/meta/llama-3.1-70b-instruct + temperature: 0 + max_tokens: 4096 + top_p: 1 + +workflow: + _type: react_agent + tool_names: + - mcp_url_tool + verbose: true + llm_name: nim_llm \ No newline at end of file diff --git a/examples/document_analysis_mcp/deployment/docker-compose.yml b/examples/document_analysis_mcp/deployment/docker-compose.yml new file mode 100644 index 00000000..a0361ec5 --- /dev/null +++ b/examples/document_analysis_mcp/deployment/docker-compose.yml @@ -0,0 +1,32 @@ +# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +services: + fetch_mcp_server: + container_name: mcp-proxy-aiq + build: + context: ../servers + dockerfile: Dockerfile.proxy + ports: + - "9903:8080" + volumes: + - ../servers/run_fetch.sh:/scripts/run_fetch.sh + command: + - "--sse-port=8080" + - "--sse-host=0.0.0.0" + - "/scripts/run_fetch.sh" + environment: + - NVIDIA_API_KEY=${NVIDIA_API_KEY} diff --git a/examples/document_analysis_mcp/docker-compose.yml b/examples/document_analysis_mcp/docker-compose.yml new file mode 100644 index 00000000..75dbd450 --- /dev/null +++ b/examples/document_analysis_mcp/docker-compose.yml @@ -0,0 +1,28 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +services: + document_analysis_mcp: + build: + context: . 
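      # Note: this standalone compose file runs the example's own MCP server module
      # (python -m servers.server) on container port 9902, whereas
      # deployment/docker-compose.yml instead exposes the reference fetch server
      # through mcp-proxy over SSE. The build context is the example root so the
      # image can COPY servers/ and configs/.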
+ dockerfile: Dockerfile + ports: + - "9903:9902" + volumes: + - ./configs:/app/configs + environment: + - PYTHONPATH=/app + - NVIDIA_API_KEY=${NVIDIA_API_KEY} + command: python -m servers.server \ No newline at end of file diff --git a/examples/document_analysis_mcp/requirements.txt b/examples/document_analysis_mcp/requirements.txt new file mode 100644 index 00000000..24b0aed1 --- /dev/null +++ b/examples/document_analysis_mcp/requirements.txt @@ -0,0 +1,11 @@ +mcp>=0.1.0 +langchain>=0.1.0 +langchain-community>=0.1.0 +python-dotenv>=1.0.0 +fastapi>=0.68.0 +uvicorn>=0.15.0 +pydantic>=1.8.0 +faiss-cpu>=1.7.4 +beautifulsoup4>=4.12.0 +requests>=2.31.0 +python-multipart>=0.0.5 \ No newline at end of file diff --git a/examples/document_analysis_mcp/run.sh b/examples/document_analysis_mcp/run.sh new file mode 100755 index 00000000..0f6db565 --- /dev/null +++ b/examples/document_analysis_mcp/run.sh @@ -0,0 +1,7 @@ +#!/bin/bash + +# Set NVIDIA API Key +export NVIDIA_API_KEY=<> + +# Build and start the container +docker-compose up --build \ No newline at end of file diff --git a/examples/document_analysis_mcp/servers/Dockerfile.proxy b/examples/document_analysis_mcp/servers/Dockerfile.proxy new file mode 100644 index 00000000..43d21dec --- /dev/null +++ b/examples/document_analysis_mcp/servers/Dockerfile.proxy @@ -0,0 +1,25 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +FROM ubuntu:22.04 + +RUN apt-get update && apt-get upgrade -y && apt install -y python3 python3-pip +RUN pip3 install uv uvx +RUN pip3 install mcp-proxy + +RUN mkdir /scripts +COPY ./run_fetch.sh /scripts/run_fetch.sh + +ENTRYPOINT [ "mcp-proxy", "--pass-environment"] diff --git a/examples/document_analysis_mcp/servers/run_fetch.sh b/examples/document_analysis_mcp/servers/run_fetch.sh new file mode 100755 index 00000000..6882d89c --- /dev/null +++ b/examples/document_analysis_mcp/servers/run_fetch.sh @@ -0,0 +1,18 @@ +#!/bin/bash + +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
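
# mcp-proxy runs this script as its stdio backend and re-exposes the fetch server over
# SSE on port 8080 (published as 9903 by deployment/docker-compose.yml), which is the
# endpoint that configs/config.yml points the mcp_url_tool at.
# Note: depending on the installed uv version, the invocation below may need to be
# `uvx mcp-server-fetch --ignore-robots-txt` (uvx takes the tool name directly, without
# a `run` subcommand); adjust it if the proxy fails to launch the fetch server.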
+ +uvx run mcp-server-fetch -- --ignore-robots-txt From 03aefcb2439939deffd4b01b9a24d4b965d5c641 Mon Sep 17 00:00:00 2001 From: Raghu Ramesha Date: Wed, 14 May 2025 14:33:31 -0700 Subject: [PATCH 2/3] Update README.md Signed-off-by: Raghu Ramesha --- examples/document_analysis_mcp/README.md | 47 +++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/examples/document_analysis_mcp/README.md b/examples/document_analysis_mcp/README.md index 9214ca48..ba4ab9ba 100644 --- a/examples/document_analysis_mcp/README.md +++ b/examples/document_analysis_mcp/README.md @@ -110,6 +110,51 @@ This example demonstrates how to use AIQ Toolkit with Model Context Protocol (MC - BeautifulSoup for HTML parsing - Docker for deployment + +## Installation and Setup + +If you have not already done so, follow the instructions in the [Install Guide](../../docs/source/quick-start/installing.md#install-from-source) to create the development environment and install AIQ Toolkit. + +To run this example do the following: + 1) Start up docker compose using the provided `docker-compose.yml` file. + ```bash + docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml up -d + ``` + The container will pull down the necessary code to run the server when it starts, so it may take a few minutes before the server is ready. + You can inspect the logs by running + ```bash + docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml logs + ``` + The server is ready when you see the following: + ```bash + mcp-proxy-aiq | INFO: Started server process [1] + mcp-proxy-aiq | INFO: Waiting for application startup. + mcp-proxy-aiq | INFO: Application startup complete. + mcp-proxy-aiq | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) + ``` + + 2) In a new terminal, from the root of the AIQ Toolkit repository run the workflow: + ```bash + source .venv/bin/activate + aiq run --config_file=examples/document_analysis_mcp/configs/config.yml --input="What is langchain?" + ``` + + The ReAct Agent will use the tool to answer the question + ```console + 2025-03-11 16:13:29,922 - aiq.agent.react_agent.agent - INFO - The agent's thoughts are: +Thought: To answer this question, I need to find out what LangChain is. It's possible that it's a recent development or a concept that has been discussed online. I can use the internet to find the most up-to-date information about LangChain. + +Action: mcp_url_tool +Action Input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false} + + +2025-03-11 16:13:29,924 - aiq.agent.react_agent.agent - INFO - Calling tool mcp_url_tool with input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false} +``` +```console +Workflow Result: +["LangChain is a composable framework that supports developers in building, running, and managing applications powered by Large Language Models (LLMs). It offers a suite of products, including LangChain, LangGraph, and LangSmith, which provide tools for building context-aware and reasoning applications, deploying LLM applications at scale, and debugging, collaborating, testing, and monitoring LLM apps. LangChain's products are designed to help developers create reliable and efficient GenAI applications, and its platform is used by teams of all sizes across various industries."] + + ## Usage Examples 1. 
Fetch content from a URL: @@ -138,4 +183,4 @@ This example demonstrates how to use AIQ Toolkit with Model Context Protocol (MC - [AIQ Toolkit Documentation](https://docs.nvidia.com/aiqtoolkit) - [MCP Server Guide](./docs/source/workflows/mcp/mcp-server.md) - [MCP Client Guide](./docs/source/workflows/mcp/mcp-client.md) -- [LangChain Integration](./docs/source/plugins/langchain.md) \ No newline at end of file +- [LangChain Integration](./docs/source/plugins/langchain.md) From b117dda141c7e9a99c9f517e5f9ad3dd4f7bdabe Mon Sep 17 00:00:00 2001 From: Raghu Ramesha Date: Wed, 14 May 2025 14:35:53 -0700 Subject: [PATCH 3/3] Update README.md Signed-off-by: Raghu Ramesha --- examples/document_analysis_mcp/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/examples/document_analysis_mcp/README.md b/examples/document_analysis_mcp/README.md index ba4ab9ba..2e6a8eae 100644 --- a/examples/document_analysis_mcp/README.md +++ b/examples/document_analysis_mcp/README.md @@ -116,7 +116,8 @@ This example demonstrates how to use AIQ Toolkit with Model Context Protocol (MC If you have not already done so, follow the instructions in the [Install Guide](../../docs/source/quick-start/installing.md#install-from-source) to create the development environment and install AIQ Toolkit. To run this example do the following: - 1) Start up docker compose using the provided `docker-compose.yml` file. + +1. Start up docker compose using the provided `docker-compose.yml` file. ```bash docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml up -d ``` @@ -133,7 +134,7 @@ To run this example do the following: mcp-proxy-aiq | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) ``` - 2) In a new terminal, from the root of the AIQ Toolkit repository run the workflow: + 2. In a new terminal, from the root of the AIQ Toolkit repository run the workflow: ```bash source .venv/bin/activate aiq run --config_file=examples/document_analysis_mcp/configs/config.yml --input="What is langchain?" @@ -153,7 +154,7 @@ Action Input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index ```console Workflow Result: ["LangChain is a composable framework that supports developers in building, running, and managing applications powered by Large Language Models (LLMs). It offers a suite of products, including LangChain, LangGraph, and LangSmith, which provide tools for building context-aware and reasoning applications, deploying LLM applications at scale, and debugging, collaborating, testing, and monitoring LLM apps. LangChain's products are designed to help developers create reliable and efficient GenAI applications, and its platform is used by teams of all sizes across various industries."] - +``` ## Usage Examples @@ -163,7 +164,6 @@ Workflow Result: -H "Content-Type: application/json" \ -d '{"url": "https://example.com"}' ``` - 2. Analyze a document: ```bash curl -X POST http://localhost:9902/tools/analyze_document \