This directory contains code for deploying OmniParser v2 to Amazon SageMaker as an asynchronous inference endpoint.
```
omniparser-sagemaker/
├── container/              # Container files for SageMaker deployment
│   ├── Dockerfile          # Docker configuration for the container
│   └── inference.py        # SageMaker model server implementation
├── model/                  # Model artifacts
│   ├── download_weights.py # Script to download weights from Hugging Face
│   └── weights/            # Local directory for temporary weight storage
├── scripts/                # Deployment and build scripts
│   ├── build_and_push.sh   # Script to build and push Docker image to ECR
│   └── deploy.py           # Script to deploy model to SageMaker
├── .python-version         # Python version specification
├── pyproject.toml          # Project configuration and dev dependencies
├── requirements.txt        # Production dependencies
└── .gitignore              # Git ignore rules
```
- AWS CLI installed and configured with appropriate credentials
- Docker installed and running
- Python 3.11
- Required Python packages (install via `pip install -r requirements.txt`):

```
# Core Dependencies
boto3
sagemaker
sagemaker-inference
multi-model-server

# ML & Vision
torch
torchvision
transformers
ultralytics==8.3.70
supervision==0.18.0
opencv-python
opencv-python-headless

# OCR Components
paddlepaddle
paddleocr
easyocr

# Utilities
numpy==1.26.4
einops==0.8.0
```
This project uses `pyproject.toml` for development dependencies and configuration. To set up a development environment:
```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```
```bash
# Install required packages
pip install -r requirements.txt

# Configure AWS CLI with your credentials
aws configure
```
```bash
cd sagemaker/scripts

# Set your S3 bucket for model weights
export OMNIPARSER_MODEL_BUCKET="your-model-bucket-name"

# Build and push (this will also download and upload model weights)
./build_and_push.sh
```
This script will:
- Create the S3 bucket if it doesn't exist
- Download model weights from Hugging Face
- Create a tarball and upload it to `s3://${OMNIPARSER_MODEL_BUCKET}/model/omniparser-v2/model.tar.gz` (you can verify the upload with the snippet below)
- Build and push the Docker container to ECR
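
If you want to confirm that the tarball landed where the deploy step expects it, a quick check with boto3 works (this helper is illustrative, not part of the repo):

```python
import os
import boto3

# Confirm the model tarball exists at the location deploy.py expects.
bucket = os.environ["OMNIPARSER_MODEL_BUCKET"]
key = "model/omniparser-v2/model.tar.gz"

s3 = boto3.client("s3")
head = s3.head_object(Bucket=bucket, Key=key)
print(f"Found s3://{bucket}/{key} ({head['ContentLength'] / 1e9:.2f} GB)")
```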
```python
from scripts.deploy import deploy_omniparser

# Deploy using the same bucket used in the build step
predictor = deploy_omniparser(
    model_bucket="your-model-bucket-name"
)
```
This will:
- Create a SageMaker model using the ECR container
- Configure the model to use weights from S3
- Deploy an async inference endpoint
- Return a predictor object for making inferences
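
Conceptually, `deploy_omniparser` amounts to something like the following sketch using the SageMaker Python SDK (the image URI, role ARN, instance type, and output path here are illustrative placeholders, not values from the repo):

```python
from sagemaker.model import Model
from sagemaker.async_inference import AsyncInferenceConfig

# Illustrative sketch of what deploy_omniparser sets up; the image URI,
# role ARN, instance type, and output path are placeholders.
model = Model(
    image_uri="<account>.dkr.ecr.us-west-2.amazonaws.com/omniparser-v2:latest",
    model_data="s3://your-model-bucket-name/model/omniparser-v2/model.tar.gz",
    role="arn:aws:iam::<account>:role/YourSageMakerExecutionRole",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # placeholder: any suitable GPU instance
    endpoint_name="omniparser-v2-async",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://your-model-bucket-name/async-inference-output/",
    ),
)
```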
```python
import time

from examples.invoke_endpoint import invoke_omniparser, get_results

# Submit an inference request
image_path = "path/to/your/image.png"
output_location = invoke_omniparser(image_path)

# Wait for processing (or poll the output location; see the sketch below)
time.sleep(30)

# Get results
labeled_image, coordinates, content = get_results(output_location)
```
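
Instead of a fixed sleep, you can poll the async output location until the result object appears. A minimal sketch (adjust the timeout and interval to your workload):

```python
import time
from urllib.parse import urlparse

import boto3
from botocore.exceptions import ClientError

def wait_for_result(output_location, timeout=300, interval=5):
    """Poll the async inference output S3 URI until the object exists."""
    parsed = urlparse(output_location)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    s3 = boto3.client("s3")
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return  # result is ready
        except ClientError as e:
            if e.response["Error"]["Code"] not in ("404", "NoSuchKey"):
                raise
            time.sleep(interval)
    raise TimeoutError(f"No result at {output_location} after {timeout}s")
```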
OmniParser v2 uses two main model components:
- Icon Detection Model (YOLO-based)
- Icon Caption Model (Florence2)
The weights are managed in two stages:
- Build Time:
  - Downloaded from Hugging Face (see the download sketch below)
  - Packaged into `model.tar.gz`
  - Uploaded to S3: `s3://<bucket>/model/omniparser-v2/model.tar.gz`
- Runtime:
  - SageMaker automatically downloads the weights from S3
  - Extracts them to `/opt/ml/model` in the container
  - Used by the model for inference
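
For reference, the core of the build-time download step looks roughly like this (a sketch using `huggingface_hub`; it assumes the public microsoft/OmniParser-v2.0 repository, and the actual download_weights.py may differ):

```python
from huggingface_hub import snapshot_download

# Download the OmniParser v2 weights (icon detection + icon caption models)
# into the local staging directory that gets packed into model.tar.gz.
snapshot_download(
    repo_id="microsoft/OmniParser-v2.0",
    local_dir="model/weights",
)
```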
```bash
# Required:
export OMNIPARSER_MODEL_BUCKET="your-bucket"  # S3 bucket for model weights

# Optional:
export AWS_DEFAULT_REGION="us-west-2"         # Defaults to us-west-2
```
```python
# In deploy.py:
predictor = deploy_omniparser(
    model_bucket="your-bucket",
    model_prefix="model/omniparser-v2"  # Optional, defaults to this value
)
```
```python
# In invoke_endpoint.py:
request = {
    'image': encode_image(image_path),
    'box_threshold': 0.05,   # Detection confidence threshold
    'iou_threshold': 0.7,    # Box overlap threshold
    'use_paddleocr': False,  # Whether to use PaddleOCR
    'batch_size': 128        # Batch size for caption generation
}
```
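
Here, `encode_image` is expected to return the image as a base64-encoded string. A minimal sketch (the actual helper in examples/invoke_endpoint.py may differ in detail):

```python
import base64

def encode_image(image_path: str) -> str:
    """Read an image file and return its bytes as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```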
- CloudWatch Metrics (see the sketch below for pulling these programmatically):
  - Endpoint invocations
  - Model latency
  - GPU utilization
- CloudWatch Logs:
  - Container logs
  - Inference errors
- S3 Monitoring:
  - Async inference results
  - Failed inference requests
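
Endpoint metrics can also be pulled programmatically, for example average model latency over the last hour (a sketch using boto3; the metric and dimension names come from the standard `AWS/SageMaker` namespace):

```python
from datetime import datetime, timedelta

import boto3

# Average model latency for the endpoint over the last hour.
cloudwatch = boto3.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "omniparser-v2-async"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
print(stats["Datapoints"])
```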
- Build Issues:
  - Check S3 bucket permissions
  - Verify Hugging Face access
  - Check the Docker build logs
  - Ensure enough disk space for the weights
- Deployment Issues:
  - Verify that IAM roles have the necessary permissions
  - Check SageMaker service quotas
  - Verify GPU instance availability
- Inference Issues:
  - Check the async output location
  - Verify the input image format
  - Monitor GPU memory usage
```python
import boto3

# Delete the endpoint
sagemaker = boto3.client('sagemaker')
sagemaker.delete_endpoint(EndpointName='omniparser-v2-async')

# Delete model weights (optional)
s3 = boto3.client('s3')
s3.delete_object(
    Bucket='your-model-bucket',
    Key='model/omniparser-v2/model.tar.gz'
)
```
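
Note that `delete_endpoint` leaves the endpoint configuration and model resources in place. To remove those as well, a sketch (the resource names below are assumptions; the SDK often reuses the endpoint name, so verify the actual names in the SageMaker console):

```python
import boto3

sagemaker = boto3.client('sagemaker')

# The names below are assumptions; check the SageMaker console for the
# actual endpoint config and model names created during deployment.
sagemaker.delete_endpoint_config(EndpointConfigName='omniparser-v2-async')
sagemaker.delete_model(ModelName='omniparser-v2-async')
```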