GitHub - mbari-org/aipipeline: Library for running detection, clustering, or classification AI pipelines using Apache Beam

aipipeline is a library for running AI pipelines and monitoring their performance, e.g. accuracy, precision, recall, and F1 score. Pipelines may include object detection, clustering, classification, and vector search algorithms. The library is designed for MBARI projects that require advanced workflows to process large amounts of images or video. Once workflows are developed, they may be moved to the project repositories for production use. The roadmap includes making the core functionality of some processing components available for broader use in the MBARI AI ecosystem.
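As a rough illustration of the metrics named above, the following minimal sketch computes per-class precision, recall, and F1 score from two lists of labels. It is not the library's own implementation; the function name `per_class_metrics` and the sample labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label was wrong
            fn[truth] += 1  # the true label was missed

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # times the label was predicted
        actual = tp[label] + fn[label]     # times the label was the truth
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, for illustration only.
truth = ["fish", "fish", "jelly", "jelly", "squid"]
preds = ["fish", "jelly", "jelly", "jelly", "fish"]
scores = per_class_metrics(truth, preds)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A class that is never predicted gets a precision of 0.0 here rather than raising a division error; other conventions (such as reporting it as undefined) are equally valid.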

See the MBARI Internal AI documentation for more information on the tools and services used in the pipelines.

Example plots from the t-SNE, confusion matrix, and accuracy analysis of exemplar data.
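For readers unfamiliar with the confusion matrix behind such plots, here is a minimal sketch, assuming plain lists of true and predicted labels. The helper names `confusion_matrix` and `accuracy_from_matrix` and the sample data are hypothetical, and the code does not reproduce how aipipeline generates its plots.

```python
from collections import defaultdict


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs as a nested dict."""
    matrix = defaultdict(lambda: defaultdict(int))
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Diagonal total divided by the total number of samples."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row.get(label, 0) for label, row in matrix.items())
    return correct / total if total else 0.0


# Hypothetical labels, for illustration only.
truth = ["fish", "fish", "jelly", "jelly", "squid"]
preds = ["fish", "jelly", "jelly", "jelly", "fish"]
cm = confusion_matrix(truth, preds)
overall = accuracy_from_matrix(cm)
print(f"accuracy={overall:.2f}")
```

Rows are true labels and columns are predicted labels, so off-diagonal counts such as `cm["fish"]["jelly"]` show which classes are confused with each other.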

Requirements

Three tools are required to run the code in this repository:

Anaconda environment

Anaconda is a package and environment manager for Python. We recommend using the Miniconda version of Anaconda. Install it on Mac OS X with the following command:

brew install miniconda

or on Ubuntu with the following command:

sudo apt install miniconda

Docker

Docker is a containerization tool that lets you run code in isolated containers.

just tool.

just is a handy tool for running the scripts in the project. It is easier to use than make and cleaner than bash scripts. Try it out!

Install on Mac OS X with the following command:

port install just

or on Ubuntu with the following command:

sudo apt install just

Installation

Clone the repository and run the setup command.

git clone https://github.com/mbari-org/aipipeline.git
cd aipipeline
just setup

Sensitive information is stored in a .env file. Create a .env file in the root directory of the project with the following contents, setting ENVIRONMENT to either testing or production:

TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing
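
Before running any recipes, it can help to confirm that the .env file defines all three variables. The following is a minimal sketch using only the Python standard library; the helper names `parse_env` and `check_env` are hypothetical, and the assumption that ENVIRONMENT accepts exactly `testing` or `production` is taken from the description above rather than from the project's code.

```python
REQUIRED_KEYS = ("TATOR_TOKEN", "REDIS_PASSWORD", "ENVIRONMENT")
ALLOWED_ENVIRONMENTS = ("testing", "production")  # assumed from the text above


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    env = values.get("ENVIRONMENT")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(
            f"ENVIRONMENT must be one of {ALLOWED_ENVIRONMENTS}, got {env!r}"
        )
    return problems


sample = """
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing
"""
problems = check_env(parse_env(sample))
print(problems or "ok")
```

To check a real file, pass its contents instead of the sample string, e.g. `check_env(parse_env(open(".env").read()))`.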

Usage

Recipes are available to run the pipelines. To see the available recipes, run the following command:

just list

Command	Description
list	List recipes
install	Setup the environment
cp-env	Copy the default .env file to the project
update_trackers	Update the environment. Run this command after checking out any code changes
update-env	Update environment
cp-core	Copy core dev code to the project on doris
cp-dev-cfe	Copy cfe dev code to the project on doris
cp-dev-ptvr	Copy planktivore dev code to the project on doris
cp-dev-uav	Copy uav dev code to the project on doris
cp-dev-bio	Copy bio dev code to the project on doris
cp-dev-i2map	Copy i2map dev code to the project on doris
init-labels project='uav' leaf_type_id='19'	Initialize labels for quick lookup, e.g. just init-labels uav 19
plot-tsne-vss project='uav'	Generate a tsne plot of the VSS database
optimize-vss project='uav' *more_args=""	Optimize the VSS database
calc-acc-vss project='uav'	Calculate the accuracy of the VSS database; run after download, then optimize
reset-vss-all	Reset the VSS database, removing all data. Proceed with caution!!
reset-vss project='uav'	Reset the VSS database, removing all data
remove-vss project='uav' *more_args=""	Remove an entry from the VSS database
init-vss project='uav' *more_args=""	Initialize the VSS database for a project
load-vss project='uav'	Load already computed exemplars into the VSS database
load-cfe-isiis-videos missions=""	Load CFE ISIIS mission videos
cluster-cfe-isiis-frames roi_dir=...	Cluster CFE ISIIS Hawaii mission frames
load-ptvr-images images='tmp/roi' *args=""	Load planktivore ROI images
cluster-ptvr-images *more_args=""	Cluster planktivore ROI images
load-ptvr-clusters clusters='tmp/...' *args=""	Load planktivore ROI clusters
rescale-ifcb-images collection="2014"	Rescale IFCB ROI images
rescale-ptvr-images collection="..."	Rescale planktivore ROI images
download-rescale-ptvr-images collection="..."	Download and rescale planktivore ROI images
cluster-uav *more_args=""	Cluster mission in UAV project
detect-uav *more_args=""	Detect mission in UAV project
detect-uav-test	Detect mission data for testing
load-uav-images	Load UAV mission images
load-uav type="cluster"	Load UAV detections/clusters
fix-uav-metadata	Fix UAV metadata lat/lon/alt
compute-saliency project='uav' *args=""	Compute saliency for downloaded VOC data
crop project='uav' *more_args=""	Crop detections from VOC formatted downloads
download-crop project='uav' *more_args=""	Download and crop with defaults
download project='uav'	Download only
cluster project='uav' *more_args=""	Cluster only
predict-vss project='uav' image_dir=...	Predict images using the VSS database
run-ctenoA-prod	Run strided inference on videos in TSV file
run-mega-inference	Run mega strided inference on single video
run-mega-stride-bio video=...	Run mega strided pipeline for bio project
run-mega-track-bio video=...	Run mega strided tracking pipeline for bio project
run-mega-stride-i2map video=...	Run mega strided pipeline for i2map
run-mega-track-i2map video=...	Run mega strided tracking pipeline for i2map
run-mega-track-test-1min	Test mega strided tracking pipeline on one video
run-mega-track-test-fastapiyv5	Test mega strided pipeline with FastAPI
run-mega-track-isiis-video video=...	Mega strided tracking pipeline for cfe project
cluster-i2mapbulk	Run inference and cluster on i2MAP bulk data
download-cluster project="i2map" ...	Download and cluster data
cluster-ptvr-sweep roi_dir=... save_dir=...	Sweep clustering for planktivore data
load-i2mapbulk data='data'	Load i2MAP bulk data
load-cluster project="uav" data='data' ...	Load clusters for any project
download-i2mapbulk-unlabeled	Download i2MAP bulk unlabeled data
replace-m3-urls	Replace m3 URLs with mantis
gen-bio-data image_dir=""	Generate training data for the bio project
gen-cfe-data	Generate training data for the CFE project
gen-i2map-data	Generate training data for the i2map project
gen-i2mapbulk-data	Generate training data for the i2map project from bulk
gen-uav-data	Generate training data for the UAV project
gen-stats-csv project='UAV' data=...	Generate training data stats
gen-ptvr-lowmag-data	Generate training data for the planktivore low mag
init-ptvr-lowmag-vss	Initialize VSS for planktivore low mag data
transcode-i2map	Transcode i2MAP videos from mov to mp4
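
Recipe parameters are passed positionally in the order shown in the table, as in the documented example `just init-labels uav 19`. For scripting several recipes, the following minimal sketch builds and optionally runs such a command from Python. The helpers `just_command` and `run_recipe` are hypothetical and not part of aipipeline; the sketch defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def just_command(recipe, *args):
    """Build the argument list for `just <recipe> <args...>`."""
    return ["just", recipe, *map(str, args)]


def run_recipe(recipe, *args, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    command = just_command(recipe, *args)
    print(shlex.join(command))
    if not dry_run:
        # check=True raises CalledProcessError if the recipe fails
        subprocess.run(command, check=True)
    return command


# Dry run of the example from the table: just init-labels uav 19
cmd = run_recipe("init-labels", "uav", 19)
```

Passing `dry_run=False` executes the recipe, which requires `just` to be installed and the working directory to be the repository root.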

Related projects

aidata - A tool for extract, transform, load, and download operations on AI data.
sdcat - Sliced Detection and Clustering Analysis Toolkit; a tool to detect and cluster objects in images.
deepsea-ai - A tool to train and run object detection and tracking on video at scale in the cloud (AWS).
fastapi-yolov5 - A RESTful API for running YOLOv5 object detection models on images either locally or in the cloud (AWS).
fastapi-vss - A RESTful API for vector similarity search using foundational models.
fastapi-tator - A RESTful API server for bulk operations on a Tator annotation database.

🗓️ Last updated: 2025-06-01

