# portal-containers

Docker containers to pre-process data for visualization in the portal.

The subdirectories in this repo all have the same structure:

- `context/`: A Docker context, including a `Dockerfile` and typically `main.py`, `requirements.txt`, and `requirements-freeze.txt`.
- `test-input/`, `test-output-actual/`, `test-output-expected/`: Test fixtures.
- `VERSION`: A semantic version number.
- `README.md`: A description of the container.
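
For example, a typical container directory (here `ome-tiff-offsets`, purely as an illustration; the exact file names inside `context/` vary by container) looks roughly like this:

```sh
# Illustrative layout only.
$ ls ome-tiff-offsets/
README.md  VERSION  context/  test-input/  test-output-actual/  test-output-expected/
$ ls ome-tiff-offsets/context/
Dockerfile  main.py  requirements-freeze.txt  requirements.txt
```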

Images are named by the containing directory. Running `test.sh` will build (and test!) all the images. You can then define `$INPUT_DIR`, `$OUTPUT_DIR`, and `$IMAGE` to run an image with your own data:

docker run \
  --mount type=bind,source=$INPUT_DIR,target=/input \
  --mount type=bind,source=$OUTPUT_DIR,target=/output \
  $IMAGE
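
For instance, to run one of the images against its own test fixtures (the image name below assumes the local tag matches the directory name, as noted above; adjust it to whatever `docker images` shows on your machine):

```sh
# Hypothetical invocation; swap in the container directory and image you actually built.
INPUT_DIR=$(pwd)/ome-tiff-offsets/test-input
OUTPUT_DIR=$(pwd)/ome-tiff-offsets/test-output-actual
IMAGE=ome-tiff-offsets
mkdir -p "$OUTPUT_DIR"
docker run \
  --mount type=bind,source=$INPUT_DIR,target=/input \
  --mount type=bind,source=$OUTPUT_DIR,target=/output \
  $IMAGE
```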

To push the latest versions to Docker Hub, just run:

test_docker.sh push

## Getting it to run in production

This repo is included as a submodule in ingest-pipeline. When there are changes here that you want to run in production:

- Bump the `VERSION` file in the corresponding container's subdirectory.
- Update the version referenced in the corresponding `.cwl` file in the root directory.
- Run `test_docker.sh push`.
- Make a PR in ingest-pipeline to update that submodule to the latest code here, and add Joel as a reviewer on the PR.

Depending on the rate of change, it might be good to have a weekly routine of making PRs to ingest-pipeline. TBD.
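
For a single container, the first three steps might look like this (the container name and version number below are illustrative only):

```sh
# Hypothetical release of a new container version.
echo "0.0.2" > ome-tiff-offsets/VERSION   # bump the semantic version
# ...edit ome-tiff-offsets.cwl in the repo root so it references the same version...
./test_docker.sh push                     # push the updated image to Docker Hub
```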

In addition, each workflow must have a corresponding `-manifest.json` file conforming to this schema, with a `pattern`, `description`, and `edam_ontology_term` entry for each output file (see here for information about EDAM).
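
As a rough sketch (the file location, JSON structure, and EDAM term below are assumptions; consult the linked schema for the real requirements), a manifest for a container with a single JSON output file could look like this:

```sh
# Hypothetical -manifest.json; the field names (pattern, description, edam_ontology_term)
# come from the schema described above, everything else is illustrative.
cat > ome-tiff-offsets-manifest.json <<'EOF'
[
  {
    "pattern": "*.offsets.json",
    "description": "Byte offsets for efficient access to the OME-TIFF",
    "edam_ontology_term": "EDAM_1.24.format_3464"
  }
]
EOF
```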

# In ingest-pipeline:
git checkout devel
git pull
git submodule update --init --recursive # This fails right now because we're not using plain URLs in .gitmodules.
git checkout -b username/update-portal-containers
cd src/ingest-pipeline/airflow/dags/cwl/portal-containers/
git checkout master
git pull
cd -
git commit -am 'Update portal-containers'
git push -u origin username/update-portal-containers
# And then make a PR at: https://github.com/hubmapconsortium/ingest-pipeline

Here is a template for the PR into ingest-pipeline, for use when there is a new pipeline in portal-containers that needs to be run. It helps us communicate what the pipeline's input, output, and purpose are:

# --NAME OF THE PIPELINE--

## Input Pipeline/Original Dataset:

## Output Pipeline (Optional):

## Description:

For example:

# [ome-tiff-tiler](https://github.com/hubmapconsortium/portal-containers/blob/master/ome-tiff-tiler.cwl)

## Input Pipeline/Original Dataset + Files:

- High resolution imaging from Vanderbilt data (OME-TIFF) files, such as those in `/hive/hubmap/lz/Vanderbilt TMC/e6e9bb7c01d3cb9cdb31a8da857f8832/processedMicroscopy/`

## Output Pipeline:

- [ome-tiff-offsets](https://github.com/hubmapconsortium/portal-containers/blob/master/ome-tiff-offsets.cwl)

## Description:

This pipeline takes as input Vanderbilt's processed microscopy data and outputs an image pyramid for visualization.  In addition, the `ome-tiff-offsets` pipeline needs to be run the output of `ome-tiff-tiler` so that images with z-stacks/large numbers of channels can be efficiently visualized.