BlabberSeg 💬: Optimization by Reusing Features for Real-Time On-Device Open-Vocabulary Aerial Segmentation

About

BlabberSeg is a novel feature-reusing architecture that extends CLIPSeg (https://arxiv.org/abs/2112.10003) to improve computational efficiency in safe landing zone segmentation.

Application

Existing models require massive computation, which limits their use in real-time applications. Our model improves on this by reusing features to reduce computational overhead and achieve real-time segmentation. The approach can be extended to other open-vocabulary segmentation models. To validate our methodology, we applied it to safe landing, using the setting from the recent DOVESEI paper (https://arxiv.org/abs/2308.11471), which presented a streamlined reactive UAV system that uses visual servoing and open-vocabulary image segmentation.

Quickstart

This was only tested on Jetson Orin AGX. Dependencies may vary and reinstalling versions (Python3, torch, torchvision, etc.) might be required.

Docker Image for Jetson Orin AGX and DOVESEI

docker run --runtime nvidia -it --rm --network=host --volume="$HOME/.Xauthority:/root/.Xauthority:rw" -e DISPLAY=$DISPLAY -v $(pwd):/home haechanmarkbong/blabberseg

Aerial Images

We generated our images with AerialViewGenerator (https://github.com/ricardodeazambuja/AerialViewGenerator), but you can use any images.

Generate CLIPSeg Modification Backbones

mkdir CLIPActivations
mkdir -p fp16_quant_dataset/activations
mkdir -p fp16_quant_dataset/input_images
mkdir -p segmentations/{name of the optimized model (see scripts for the models)}
python3 scripts/Generate\ CLIPSeg\ Mod\ Backbones.py

Generate CLIP Activations & Conditionals

python3 scripts/Generate\ Activations\ and\ Conditionals.py

Generate ONNX Models

mkdir -p onnx
mkdir -p int8_quant_dataset/activations
python3 scripts/ONNX\ model\ generation.py
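
The working directories used in the steps so far can also be created in one go from Python. This is a minimal sketch, assuming it is run against the repository root; `make_workdirs` and its `model_name` argument are hypothetical names for illustration and are not part of the repo.

```python
from pathlib import Path

# Directories named in the quickstart commands, relative to the repo root.
WORKDIRS = [
    "CLIPActivations",
    "fp16_quant_dataset/activations",
    "fp16_quant_dataset/input_images",
    "int8_quant_dataset/activations",
    "onnx",
]


def make_workdirs(root, model_name):
    """Create the quickstart directories under root and return their paths."""
    root = Path(root)
    targets = [root / d for d in WORKDIRS]
    # One segmentation output folder per optimized model
    # (model_name is a placeholder; see the scripts for the actual names).
    targets.append(root / "segmentations" / model_name)
    for path in targets:
        # parents=True creates missing parent folders,
        # exist_ok=True makes repeated runs harmless.
        path.mkdir(parents=True, exist_ok=True)
    return targets
```

Calling `make_workdirs(".", "<model name>")` from the repository root is safe to repeat, since existing directories are left untouched.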

Simplify ONNX Models

cd onnx
onnxsim CLIPActivations_fp16_352.onnx CLIPActivations_fp16_352_simpl.onnx
onnxsim CLIPSegDecoder_fp16_352.onnx CLIPSegDecoder_fp16_352_simpl.onnx

TensorRT Engine

trtexec --onnx=CLIPActivations_fp16_352_simpl.onnx --saveEngine=CLIPActivations.trt --workspace=128 --fp16 --verbose --best
trtexec --onnx=CLIPSegDecoder_fp16_352_simpl.onnx --saveEngine=CLIPSegDecoder.trt --workspace=128 --fp16 --verbose --best

Test without DOVESEI

python3 scripts/CLIPSeg_Mod_TensorRT_IOBinding_Dovesei.py

Test with DOVESEI

git clone --recurse-submodules https://github.com/MISTLab/DOVESEI.git
colcon build --symlink-install --packages-select ros2_satellite_aerial_view_simulator ros2_open_voc_landing_heatmap ros2_open_voc_landing_heatmap_srv
source install/setup.bash
Replace this file with the one from this repo:
src/ros2_open_voc_landing_heatmap/ros2_open_voc_landing_heatmap/generate_landing_heatmap.py
cd src/ros2_satellite_aerial_view_simulator
pip install --use-pep517 ros2_satellite_aerial_view_simulator
ros2 launch ros2_open_voc_landing_heatmap start_aerialview.launch.py

The Reusing Architecture

Reusing Features

Conditionals
CLIP Activations
FiLM
Image Encoding
Positional Embeddings
Image (multi-prompts/image)
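
The idea behind the list above can be sketched as a cache: results that depend only on the image, or only on the prompt, are computed once and reused across prompts and frames. This is a minimal illustration and not BlabberSeg's actual implementation; `ReuseCache`, `encode_image`, `encode_prompt`, and `decode` are hypothetical stand-ins for the real encoders and decoder.

```python
class ReuseCache:
    """Caches per-image and per-prompt features so each is computed once."""

    def __init__(self, encode_image, encode_prompt, decode):
        self.encode_image = encode_image
        self.encode_prompt = encode_prompt
        self.decode = decode
        self._conditionals = {}   # prompt text -> conditional vector
        self._image_key = None    # identifier of the last encoded image
        self._activations = None  # activations of the last encoded image

    def conditional(self, prompt):
        # Prompts rarely change between frames, so encode each one only once.
        if prompt not in self._conditionals:
            self._conditionals[prompt] = self.encode_prompt(prompt)
        return self._conditionals[prompt]

    def activations(self, image_key, image):
        # Re-run the image encoder only when a new image arrives.
        if image_key != self._image_key:
            self._activations = self.encode_image(image)
            self._image_key = image_key
        return self._activations

    def segment(self, image_key, image, prompts):
        # One image encoding is shared by every prompt for this image.
        acts = self.activations(image_key, image)
        return [self.decode(acts, self.conditional(p)) for p in prompts]


# Toy stand-ins that count how often the expensive encoders run.
calls = {"image": 0, "prompt": 0}


def toy_encode_image(image):
    calls["image"] += 1
    return sum(image)


def toy_encode_prompt(prompt):
    calls["prompt"] += 1
    return len(prompt)


def toy_decode(acts, cond):
    return acts * cond


cache = ReuseCache(toy_encode_image, toy_encode_prompt, toy_decode)
prompts = ["grass", "road", "water"]
masks_a = cache.segment("frame0", [1, 2, 3], prompts)
masks_b = cache.segment("frame1", [4, 5, 6], prompts)
print(calls)
```

In this toy run the image encoder is called once per frame and the prompt encoder once per distinct prompt, regardless of how many prompts are evaluated per image.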

Hardware Acceleration

To facilitate reproducible experiments, all tests were run on the Nvidia Jetson Orin AGX 64GB, Orin NX 16GB, Nano 8GB, AGX Xavier 64GB, and Google Colab. The Orin NX and Nano were emulated on the Orin AGX for performance comparison. Beyond the reusing methods, we accelerated our model with TensorRT and ONNX Runtime input/output binding to take advantage of efficient hardware acceleration. In addition, our model integrates OpenCLIP instead of the original CLIP so that the whole model can be cast to float16. The internal cast to float32 in CLIP can break compatibility with float16, making it difficult to run at a different precision. OpenCLIP allows the entire model to be cast to float16 seamlessly, which adds flexibility without compromising precision.

Performance

Compared with the original CLIPSeg, our model increased computation speed to 1261.09% on the Jetson Orin AGX.
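
To put the percentage in more familiar terms, it can be converted to a speedup factor and a relative per-frame time. This is a minimal sketch, assuming the figure expresses the new speed as a percentage of the original CLIPSeg speed; the function names are illustrative only.

```python
def speedup_factor(speed_percent):
    """Convert a speed given as a percentage of the baseline into a factor."""
    return speed_percent / 100.0


def relative_frame_time(speed_percent):
    """Per-frame time as a fraction of the baseline per-frame time."""
    return 1.0 / speedup_factor(speed_percent)


factor = speedup_factor(1261.09)
fraction = relative_frame_time(1261.09)
print(f"{factor:.2f}x the baseline speed, "
      f"{fraction:.1%} of the baseline time per frame")
```

Under that reading, the model runs at roughly 12.6 times the baseline speed, so each frame takes a little under 8% of the original processing time.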

Publication

For more information about BlabberSeg, refer to our paper: https://lnkd.in/gcthVaqk
