Added MCP example to create a document analysis and question answering #276

Open
wants to merge 6 commits into
base: develop
37 changes: 37 additions & 0 deletions examples/document_analysis_mcp/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY servers /app/servers
COPY configs /app/configs

# Expose the port
EXPOSE 9902

# Run the server
CMD ["python", "-m", "servers.server"]
186 changes: 186 additions & 0 deletions examples/document_analysis_mcp/README.md
@@ -0,0 +1,186 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Document Analysis MCP Example

This example demonstrates how to use AIQ Toolkit with the Model Context Protocol (MCP) to build a document analysis and question answering system. It shows how several tools and their configuration fit together in an AIQ Toolkit workflow.

## Features

- URL content fetching with HTML parsing
- Document analysis and information extraction
- Question answering about analyzed documents
- Enhanced error handling and retry mechanisms
- Docker support for easy deployment
- Full MCP server and client implementation

## Prerequisites

- Python 3.11 or higher
- Docker and Docker Compose
- NVIDIA API Key for accessing the LLM
- AIQ Toolkit installed with required plugins

## Setup

1. Set your NVIDIA API Key:
   ```bash
   export NVIDIA_API_KEY=your_api_key_here
   ```

2. Install required AIQ Toolkit plugins:
   ```bash
   uv pip install -e '.[langchain]'
   ```

3. Build and start the Docker container (run from `examples/document_analysis_mcp`):
   ```bash
   docker compose -f deployment/docker-compose.yml up --build
   ```

4. The MCP server will be available at `http://localhost:9903/sse`; the Compose file maps host port 9903 to the proxy's port 8080.

## Available Tools

1. **Fetch Tool**
   - Fetches content from a URL
   - Parses HTML and extracts text
   - Handles errors and timeouts
   ```json
   {
     "url": "https://example.com"
   }
   ```

2. **Document Analysis Tool**
   - Analyzes document text
   - Splits into chunks
   - Creates vector store for Q&A
   ```json
   {
     "text": "Your document text here"
   }
   ```

3. **Question Answering Tool**
   - Answers questions about analyzed documents
   - Uses vector search for context
   - Provides detailed answers
   ```json
   {
     "question": "Your question here"
   }
   ```
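
The Document Analysis and Question Answering tools describe a chunk-then-retrieve flow. A minimal sketch of that flow follows, using only the standard library. A word-overlap score stands in for the FAISS vector search named in this README, and the function names (`split_into_chunks`, `top_chunks`) and chunk sizes are illustrative assumptions, not this example's server code.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most distinct words with the question."""
    query = _words(question)
    ranked = sorted(chunks, key=lambda c: len(query & _words(c)), reverse=True)
    return ranked[:k]
```

In a real deployment the ranking step would be an embedding similarity search; the overlap between neighbouring chunks keeps sentences that straddle a boundary retrievable from either side.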

## Architecture

- `Dockerfile`: Container configuration
- `deployment/docker-compose.yml`: Service orchestration

## How it Works

1. The server provides three main tools:
   - URL content fetching
   - Document analysis
   - Question answering

2. Each tool has:
   - Input validation
   - Error handling
   - Retry mechanisms
   - Detailed logging

3. The system uses:
   - LangChain for document processing
   - FAISS for vector storage
   - BeautifulSoup for HTML parsing
   - Docker for deployment
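
The per-tool guarantees listed above (input validation, error handling, retries, logging) can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not the server's implementation: `validate_fetch_input`, `with_retries`, the logger name, and the backoff parameters are all hypothetical.

```python
import logging
import time
from urllib.parse import urlparse

logger = logging.getLogger("document_analysis_mcp.sketch")  # hypothetical logger name


def validate_fetch_input(payload: dict) -> str:
    """Return the URL from a fetch payload, or raise ValueError if it is unusable."""
    url = payload.get("url")
    if not isinstance(url, str):
        raise ValueError("'url' must be a string")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return url


def with_retries(func, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call func(); on failure, log a warning and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # a real tool would catch narrower exception types
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.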


## Installation and Setup

If you have not already done so, follow the instructions in the [Install Guide](../../docs/source/quick-start/installing.md#install-from-source) to create the development environment and install AIQ Toolkit.

To run this example do the following:

1. Start Docker Compose using the provided `docker-compose.yml` file:
   ```bash
   docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml up -d
   ```
   The container downloads the code it needs to run the server when it starts, so it may take a few minutes before the server is ready.
   You can inspect the logs by running:
   ```bash
   docker compose -f examples/document_analysis_mcp/deployment/docker-compose.yml logs
   ```
   The server is ready when you see the following:
   ```console
   mcp-proxy-aiq | INFO: Started server process [1]
   mcp-proxy-aiq | INFO: Waiting for application startup.
   mcp-proxy-aiq | INFO: Application startup complete.
   mcp-proxy-aiq | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
   ```

2. In a new terminal, from the root of the AIQ Toolkit repository, run the workflow:
   ```bash
   source .venv/bin/activate
   aiq run --config_file=examples/document_analysis_mcp/configs/config.yml --input="What is langchain?"
   ```

The ReAct agent uses the `mcp_url_tool` tool to answer the question:
```console
2025-03-11 16:13:29,922 - aiq.agent.react_agent.agent - INFO - The agent's thoughts are:
Thought: To answer this question, I need to find out what LangChain is. It's possible that it's a recent development or a concept that has been discussed online. I can use the internet to find the most up-to-date information about LangChain.

Action: mcp_url_tool
Action Input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false}


2025-03-11 16:13:29,924 - aiq.agent.react_agent.agent - INFO - Calling tool mcp_url_tool with input: {"url": "https://langchain.dev/", "max_length": 5000, "start_index": 0, "raw": false}
```
```console
Workflow Result:
["LangChain is a composable framework that supports developers in building, running, and managing applications powered by Large Language Models (LLMs). It offers a suite of products, including LangChain, LangGraph, and LangSmith, which provide tools for building context-aware and reasoning applications, deploying LLM applications at scale, and debugging, collaborating, testing, and monitoring LLM apps. LangChain's products are designed to help developers create reliable and efficient GenAI applications, and its platform is used by teams of all sizes across various industries."]
```
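
Instead of watching the logs in step 1 for the Uvicorn startup line, a script can poll the published port until it accepts connections. A minimal sketch, assuming the proxy is reachable on host port 9903 as mapped in `deployment/docker-compose.yml`; `wait_for_port` is a hypothetical helper, and an open port only shows that the listener is up, not that the MCP tools are registered.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, calling `wait_for_port("localhost", 9903)` before `aiq run` avoids starting the workflow while the container is still installing its dependencies.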

## Usage Examples

These examples assume the server started from the top-level `docker-compose.yml`, which publishes container port 9902 on host port 9903.

1. Fetch content from a URL:
   ```bash
   curl -X POST http://localhost:9903/tools/fetch \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com"}'
   ```

2. Analyze a document:
   ```bash
   curl -X POST http://localhost:9903/tools/analyze_document \
     -H "Content-Type: application/json" \
     -d '{"text": "Your document text here"}'
   ```

3. Ask a question:
   ```bash
   curl -X POST http://localhost:9903/tools/answer_question \
     -H "Content-Type: application/json" \
     -d '{"question": "What is the main topic?"}'
   ```
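
The same calls can be made from Python. The sketch below only builds the request with the standard library, so it can be inspected before anything is sent. `build_tool_request` is a hypothetical helper, and the base URL is an assumption (host port 9903, as published by the top-level `docker-compose.yml`) that must match the port your deployment exposes.

```python
import json
import urllib.request


def build_tool_request(base_url: str, tool: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for /tools/<tool>, mirroring the curl examples."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tools/{tool}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed base URL; adjust the port to match your deployment.
request = build_tool_request("http://localhost:9903", "fetch", {"url": "https://example.com"})
# To send it: urllib.request.urlopen(request, timeout=30).read()
```

Building the request separately from sending it makes the URL, method, and body easy to log or assert on when a call fails.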

## Related Documentation

- [AIQ Toolkit Documentation](https://docs.nvidia.com/aiqtoolkit)
- [MCP Server Guide](../../docs/source/workflows/mcp/mcp-server.md)
- [MCP Client Guide](../../docs/source/workflows/mcp/mcp-client.md)
- [LangChain Integration](../../docs/source/plugins/langchain.md)
40 changes: 40 additions & 0 deletions examples/document_analysis_mcp/configs/config.yml
@@ -0,0 +1,40 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



general:
  use_uvloop: true

functions:
  mcp_url_tool:
    _type: mcp_tool_wrapper
    url: "http://localhost:9903/sse"
    mcp_tool_name: fetch

llms:
  nim_llm:
    _type: nim
    model_name: nvdev/meta/llama-3.1-70b-instruct
    temperature: 0
    max_tokens: 4096
    top_p: 1

workflow:
  _type: react_agent
  tool_names:
    - mcp_url_tool
  verbose: true
  llm_name: nim_llm
32 changes: 32 additions & 0 deletions examples/document_analysis_mcp/deployment/docker-compose.yml
@@ -0,0 +1,32 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


services:
  fetch_mcp_server:
    container_name: mcp-proxy-aiq
    build:
      context: ../servers
      dockerfile: Dockerfile.proxy
    ports:
      - "9903:8080"
    volumes:
      - ../servers/run_fetch.sh:/scripts/run_fetch.sh
    command:
      - "--sse-port=8080"
      - "--sse-host=0.0.0.0"
      - "/scripts/run_fetch.sh"
    environment:
      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
28 changes: 28 additions & 0 deletions examples/document_analysis_mcp/docker-compose.yml
@@ -0,0 +1,28 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

services:
  document_analysis_mcp:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "9903:9902"
    volumes:
      - ./configs:/app/configs
    environment:
      - PYTHONPATH=/app
      - NVIDIA_API_KEY=${NVIDIA_API_KEY}
    command: python -m servers.server
11 changes: 11 additions & 0 deletions examples/document_analysis_mcp/requirements.txt
@@ -0,0 +1,11 @@
mcp>=0.1.0
langchain>=0.1.0
langchain-community>=0.1.0
python-dotenv>=1.0.0
fastapi>=0.68.0
uvicorn>=0.15.0
pydantic>=1.8.0
faiss-cpu>=1.7.4
beautifulsoup4>=4.12.0
requests>=2.31.0
python-multipart>=0.0.5
7 changes: 7 additions & 0 deletions examples/document_analysis_mcp/run.sh
@@ -0,0 +1,7 @@
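#!/bin/bash

# Require NVIDIA_API_KEY to be set in the environment rather than hard-coding it here.
# The ${VAR:?message} expansion aborts the script with the message if the variable is unset or empty.
: "${NVIDIA_API_KEY:?Set NVIDIA_API_KEY before running this script}"
export NVIDIA_API_KEY

# Build and start the container
docker-compose up --build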
25 changes: 25 additions & 0 deletions examples/document_analysis_mcp/servers/Dockerfile.proxy
@@ -0,0 +1,25 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM ubuntu:22.04

RUN apt-get update && apt-get upgrade -y && apt install -y python3 python3-pip
RUN pip3 install uv uvx
RUN pip3 install mcp-proxy

RUN mkdir /scripts
COPY ./run_fetch.sh /scripts/run_fetch.sh

ENTRYPOINT [ "mcp-proxy", "--pass-environment"]
18 changes: 18 additions & 0 deletions examples/document_analysis_mcp/servers/run_fetch.sh
@@ -0,0 +1,18 @@
#!/bin/bash

# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

uvx run mcp-server-fetch -- --ignore-robots-txt