[New Feature]: Modularize the EMIT workflow #302

LucaCinquini opened this issue Feb 3, 2025 · 7 comments
Labels: enhancement (New feature or request), U-SPS

Comments

@LucaCinquini
Collaborator

The old version of the EMIT workflow is here:
http://awslbdockstorestack-lb-1429770210.us-west-2.elb.amazonaws.com:9998/api/ga4gh/trs/v2/tools/%23workflow%2Fdockstore.org%2FGodwinShen%2Femit-ghg/versions/9/plain-CWL/descriptor/workflow.cwl

and the parameter files for executing in unity-venue-dev and unity-venue-test are:

https://raw.githubusercontent.com/GodwinShen/emit-ghg/refs/heads/main/test/emit-ghg-dev.json
https://raw.githubusercontent.com/GodwinShen/emit-ghg/refs/heads/main/test/emit-ghg-test.json

This task involves updating the EMIT workflow to follow the new stage-in / process / stage-out design. The Docker container that executes processing:

godwinshen/emit-ghg:bc61e769

will probably have to be updated as well.
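
As a rough illustration of the stage-in / process / stage-out split, here is a minimal sketch that assumes each stage is a function reading one directory and writing another. The names `Stage`, `copy_files`, `process`, and `run_pipeline` are hypothetical placeholders; this is not the Unity SPS DAG or the EMIT processing code.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Stage:
    """One step of the pipeline: a name plus a (input_dir, output_dir) callable."""
    name: str
    run: Callable[[Path, Path], None]


def copy_files(src: Path, dst: Path) -> None:
    # Placeholder for stage-in and stage-out: copy every regular file across.
    for item in src.iterdir():
        if item.is_file():
            shutil.copy2(item, dst / item.name)


def process(src: Path, dst: Path) -> None:
    # Placeholder for the science step: record which inputs were seen.
    names = sorted(p.name for p in src.iterdir() if p.is_file())
    (dst / "product.txt").write_text("\n".join(names))


def run_pipeline(stages: List[Stage], source: Path, work_root: Path) -> List[Path]:
    """Run stages in order; each stage reads the previous stage's output directory."""
    current = source
    outputs = []
    for stage in stages:
        out_dir = work_root / stage.name
        out_dir.mkdir(parents=True, exist_ok=True)
        stage.run(current, out_dir)
        outputs.append(out_dir)
        current = out_dir
    return outputs


# Hypothetical wiring of the three stages named in this issue.
PIPELINE = [
    Stage("stage_in", copy_files),
    Stage("process", process),
    Stage("stage_out", copy_files),
]
```

Because every stage shares the same directory-in, directory-out signature, the processing step can be swapped without touching the data-movement steps.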

Please work with @ngachung and @brianlee731 for help with the Data Services functionality and overall EMIT design.

LucaCinquini added the enhancement (New feature or request) label on Feb 3, 2025
LucaCinquini moved this from Todo to In Progress in Unity Project Board on Feb 3, 2025
@nikki-t
Collaborator

nikki-t commented Feb 4, 2025

@LucaCinquini - Is there a GitHub repo that holds the Dockerfile for the container (godwinshen/emit-ghg:bc61e769) and the code that runs the EMIT processing?

@LucaCinquini
Collaborator Author

The Docker image is in DockerHub: https://hub.docker.com/r/godwinshen/emit-ghg
Maybe @GodwinShen can point you to the Dockerfile he used to create the image? Thanks, Godwin.

@GodwinShen

@LucaCinquini I did not write a Dockerfile myself. I simply ran the app pack generation software "locally" and then pushed the image to Docker Hub.

@nikki-t
Collaborator

nikki-t commented Feb 4, 2025

@GodwinShen - I am not familiar with how to use the app pack generation software. Just guessing, but maybe you ran https://github.com/unity-sds/unity-app-generator on the emit-ghg repo (https://github.com/emit-sds/emit-ghg)?

@GodwinShen

@nikki-t Yes, I used unity-app-generator on my fork of the emit-ghg repo: https://github.com/GodwinShen/emit-ghg.git

I followed the "manual" method in this tutorial: https://unity-sds.gitbook.io/docs/mdps-overview/tutorials/the-development-environment/packaging-an-algorithm

@nikki-t
Collaborator

nikki-t commented Feb 6, 2025

@brianlee731 and I were able to run the EMIT workflow using the modular DAG. This did require a few extra tasks:

Example logs

stage_in

[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] Stage in download directory: /data/stage_in/granules
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] total 5577656
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     108709152 Feb  6 20:54 EMIT_L1B_OBS_001_20230620T084426_2317106_011.nc
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     1852557979 Feb  6 20:54 EMIT_L1B_RAD_001_20230620T084426_2317106_011.nc
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root      48049971 Feb  6 20:55 EMIT_L2A_MASK_001_20230620T084426_2317106_011.nc
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     1851092306 Feb  6 20:55 EMIT_L2A_RFLUNCERT_001_20230620T084426_2317106_011.nc
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     1851092294 Feb  6 20:54 EMIT_L2A_RFL_001_20230620T084426_2317106_011.nc
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root          2375 Feb  6 20:55 G2721220118-LPCLOUD.stac.json
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root          2631 Feb  6 20:55 G2721699381-LPCLOUD.stac.json
[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           519 Feb  6 20:55 catalog.json
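
To check what stage_in actually downloaded, the `ls -l` entries can be pulled out of log lines like the ones above. This is a minimal sketch that assumes every line carries the same `[timestamp] {source:line} LEVEL - [base] ` prefix seen here and that file names contain no spaces; `LOG_PREFIX`, `strip_prefix`, and `file_sizes` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable

# Prefix as it appears in the logs above: "[time] {file:line} LEVEL - [base] ".
LOG_PREFIX = re.compile(r"^\[[^\]]+\] \{[^}]+\} [A-Z]+ - \[base\] ")


def strip_prefix(line: str) -> str:
    """Remove the logging prefix, leaving only the container's own output."""
    return LOG_PREFIX.sub("", line.rstrip("\n"))


def file_sizes(log_lines: Iterable[str]) -> Dict[str, int]:
    """Map file name to size in bytes for each regular-file `ls -l` entry."""
    sizes: Dict[str, int] = {}
    for line in log_lines:
        fields = strip_prefix(line).split()
        # A regular-file entry has 9 whitespace-separated fields and starts with "-".
        if len(fields) == 9 and fields[0].startswith("-") and fields[4].isdigit():
            sizes[fields[8]] = int(fields[4])
    return sizes


# A few lines copied from the stage_in log above.
SAMPLE = [
    "[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] Stage in download directory: /data/stage_in/granules",
    "[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] total 5577656",
    "[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     108709152 Feb  6 20:54 EMIT_L1B_OBS_001_20230620T084426_2317106_011.nc",
    "[2025-02-06, 20:55:53 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           519 Feb  6 20:55 catalog.json",
]

sizes = file_sizes(SAMPLE)
print(sum(sizes.values()), "bytes in", len(sizes), "files")
```

Header lines such as the download directory and the `total` line are skipped because they do not have the nine-field shape of a file entry.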

process

[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] + ls -l /data/process/l4h7c7z9
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] total 8820812
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           374 Feb  6 21:03 catalog.json
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root     7153047462 Feb  5 18:20 dataset_ch4_full.hdf5
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root       6359040 Feb  6 21:02 emit20230620T084426_ch4_mf
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           302 Feb  6 21:02 emit20230620T084426_ch4_mf.hdr
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root          1673 Feb  6 21:03 emit20230620T084426_ch4_mf.json
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root      18310464 Feb  6 21:03 emit20230620T084426_ch4_mf_ort
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           569 Feb  6 21:03 emit20230620T084426_ch4_mf_ort.aux.xml
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root           537 Feb  6 21:03 emit20230620T084426_ch4_mf_ort.hdr
[2025-02-06, 21:03:52 UTC] {pod_manager.py:471} INFO - [base] -rw-r--r--    1 root     root         10425 Feb  6 21:02 emit20230620T084426_ch4_target

stage_out

[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base] + successful_features='{
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]   "type": "FeatureCollection",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]   "features": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]     {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "type": "Feature",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "stac_version": "1.0.0",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "id": "urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "properties": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "datetime": "2025-02-06T21:03:07.720944Z",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "start_datetime": "2025-02-06T21:03:07.720944+00:00",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "end_datetime": "2025-02-06T21:03:07.720968+00:00",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "created": "2025-02-06T21:03:07.720973+00:00",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "updated": "2025-02-06T21:03:07.721271Z"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "geometry": null,
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "links": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "rel": "root",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "./catalog.json",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "type": "application/json"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "rel": "parent",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "./catalog.json",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "type": "application/json"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         }
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       ],
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "assets": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "emit20230620T084426_ch4_mf": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "s3://unity-bucket/urn:nasa:unity:emit:dev:emit_ghg_test___1/urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf/emit20230620T084426_ch4_mf",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "title": "ENVI file",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "description": "",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "roles": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]             "data"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "emit20230620T084426_ch4_mf.hdr": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "s3://unity-bucket/urn:nasa:unity:emit:dev:emit_ghg_test___1/urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf/emit20230620T084426_ch4_mf.hdr",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "title": "ENVI_hdr file",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "description": "",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "roles": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]             "data"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "emit20230620T084426_ch4_mf_ort": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "s3://unity-bucket/urn:nasa:unity:emit:dev:emit_ghg_test___1/urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf/emit20230620T084426_ch4_mf_ort",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "title": "ENVI file",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "description": "",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "roles": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]             "data"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "emit20230620T084426_ch4_mf_ort.hdr": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "s3://unity-bucket/urn:nasa:unity:emit:dev:emit_ghg_test___1/urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf/emit20230620T084426_ch4_mf_ort.hdr",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "title": "ENVI_hdr file",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "description": "",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "roles": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]             "data"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         "emit20230620T084426_ch4_mf.json": {
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "href": "s3://unity-bucket/urn:nasa:unity:emit:dev:emit_ghg_test___1/urn:nasa:unity:emit:dev:emit_ghg_test___1:emit20230620T084426_ch4_mf/emit20230620T084426_ch4_mf.json",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "title": "json file",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "description": "",
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           "roles": [
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]             "metadata"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]           ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]         }
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       },
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "stac_extensions": [],
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]       "collection": "urn:nasa:unity:emit:dev:emit_ghg_test___1"
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]     }
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base]   ]
[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base] }
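
The STAC FeatureCollection printed by stage_out can be recovered from the log and summarized by asset role. The sketch below assumes the same log prefix as above and that the JSON begins at the first `{` after the prefix is stripped; `extract_json` and `assets_by_role` are hypothetical helper names, and `SAMPLE` is an abridged version of the logged output.

```python
import json
import re
from collections import defaultdict
from typing import Dict, Iterable, List

LOG_PREFIX = re.compile(r"^\[[^\]]+\] \{[^}]+\} [A-Z]+ - \[base\] ")


def extract_json(log_lines: Iterable[str]) -> dict:
    """Strip the log prefix from each line and decode the first JSON object found."""
    text = "\n".join(LOG_PREFIX.sub("", line.rstrip("\n")) for line in log_lines)
    start = text.index("{")
    # raw_decode stops at the end of the object, so trailing shell output is ignored.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


def assets_by_role(feature_collection: dict) -> Dict[str, List[str]]:
    """Group asset names by each role listed in their "roles" array."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for feature in feature_collection.get("features", []):
        for name, asset in feature.get("assets", {}).items():
            for role in asset.get("roles", []):
                grouped[role].append(name)
    return dict(grouped)


# Abridged sample in the same shape as the stage_out log above.
PREFIX = "[2025-02-06, 21:04:02 UTC] {pod_manager.py:471} INFO - [base] "
SAMPLE = [PREFIX + body for body in (
    "+ successful_features='{",
    '  "type": "FeatureCollection",',
    '  "features": [{"assets": {',
    '    "emit20230620T084426_ch4_mf": {"roles": ["data"]},',
    '    "emit20230620T084426_ch4_mf.json": {"roles": ["metadata"]}',
    "  }}]",
    "}",
)]

print(assets_by_role(extract_json(SAMPLE)))
```

The prefix is stripped before searching for `{` because the prefix itself contains a brace (`{pod_manager.py:471}`).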

@LucaCinquini
Collaborator Author

This is great, thanks so much to you and Brian. Let's review and demo on Monday.

LucaCinquini self-assigned this on Feb 11, 2025
LucaCinquini moved this from In Progress to Done in Unity Project Board on Feb 18, 2025