Application Package standardization -stage in/out to platform Implementation #206

Open
3 tasks
mike-gangl opened this issue Sep 10, 2024 · 5 comments
Assignees
Labels
Feature Feature label used in Unity Project U-ADE U-ADS U-DS U-SPS

Comments

@mike-gangl
Contributor

mike-gangl commented Sep 10, 2024

Application Package standardization -stage in/out to platform Implementation

This ticket tracks the feature work to align the MDPS application package generator and platforms with the OGC Application Package best practices specification.

Design Ticket is located at #187

Note: this approach has many moving parts, and we must keep supporting the current approach while working towards the new application packages.

  • SPS should maintain the current CWL DAG while adding a new app-package DAG
  • U-DS must continue to support stage-in/out steps in CWL
  • The app package generator must still be able to build existing-style app packages, with a non-default flag for building new-style ones (e.g. don't bundle stage_in/out)
  • Each service must be documented as it progresses

Acceptance Criteria

Application Packages

  • The algorithm CWL specifies file/data inputs as Directory types, and in that directory it can read a catalog.json file to find the data it relies on (A.2.1). This changes the input type from a catalog.json location to a Directory.
  • The algorithm writes outputs to a Directory along with a STAC catalog (catalog.json) describing the files in that directory (A.3.1). Outputs have always been a Directory type and will continue to be one for a valid application package or workflow.
  • NEW Gherkin Test case added to System Test DURING THE PLANNING
  • NEW Gherkin Test case implementation added to System Test during PI
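
As a rough illustration of the two application package criteria above, an algorithm could discover its inputs from the catalog.json in its input Directory and write a catalog.json beside its outputs. This is a minimal standard-library sketch, not the MDPS implementation: the function names, the reliance on `item` links, and the assumption that asset hrefs are relative to the item file are all hypothetical.

```python
import json
from pathlib import Path


def list_input_assets(input_dir):
    """Return paths of all asset files referenced by catalog.json in input_dir."""
    input_dir = Path(input_dir)
    catalog = json.loads((input_dir / "catalog.json").read_text())
    assets = []
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue  # skip root/self/parent links
        item_path = input_dir / link["href"]
        item = json.loads(item_path.read_text())
        for asset in item.get("assets", {}).values():
            # Assumption: asset hrefs are relative to the item file.
            assets.append((item_path.parent / asset["href"]).resolve())
    return assets


def write_output_catalog(output_dir, item_files, catalog_id="outputs"):
    """Write a minimal STAC catalog.json that links to the given item files."""
    output_dir = Path(output_dir)
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": "Outputs of the application package run",
        "links": [{"rel": "item", "href": name} for name in item_files],
    }
    catalog_path = output_dir / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path
```

With this shape, the algorithm only ever receives a Directory and never needs to know where the data originally came from.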

Stage-in/Out

  • stage-in and out functionalities are made available and documented for the SPS team to use
  • stage-out functionality should use the collection identifier in a STAC item (https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#collection). If it is not present, the identifier in the catalog should be used.
  • if a non-conforming collection id is given (e.g. "my_cool_collection"), it should be 'converted' to a valid collection: urn:nasa:unity:<project>:<venue>:my_cool_collection___1
  • stage-in/out tooling should provide a way for explicit (as part of the call) or implicit (env var) setting of the project, venue, and bucket variables.
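
The collection-identifier rules and the explicit/implicit settings above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `UNITY_PROJECT` and `UNITY_VENUE` environment variable names, the helper names, and the regular expression used to decide whether an identifier "conforms" are all hypothetical and not taken from the U-DS tooling.

```python
import os
import re

# Assumed shape of a conforming identifier:
#   urn:nasa:unity:<project>:<venue>:<name>___<version>
_URN_PATTERN = re.compile(r"^urn:nasa:unity:[^:]+:[^:]+:[^:]+___[^:]+$")


def resolve_setting(name, explicit=None):
    """Prefer an explicit value, else fall back to a (hypothetical) env var."""
    if explicit is not None:
        return explicit
    env_name = f"UNITY_{name.upper()}"
    value = os.environ.get(env_name)
    if value is None:
        raise ValueError(f"{name} was not passed and {env_name} is not set")
    return value


def pick_collection_id(item, catalog):
    """Use the STAC item's 'collection' field, else the catalog's 'id'."""
    return item.get("collection") or catalog["id"]


def to_collection_urn(collection_id, project=None, venue=None):
    """Return collection_id unchanged if it conforms, else wrap it in a URN."""
    if _URN_PATTERN.match(collection_id):
        return collection_id
    project = resolve_setting("project", project)
    venue = resolve_setting("venue", venue)
    return f"urn:nasa:unity:{project}:{venue}:{collection_id}___1"
```

For example, `to_collection_urn("my_cool_collection", "proj", "dev")` would yield `urn:nasa:unity:proj:dev:my_cool_collection___1`, while an identifier that already matches the assumed URN shape is returned as-is.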

Platform

  • the platform will be able to take an 'input' parameter (a valid STAC document or URL to a stac document) and stage those inputs in a single directory for usage by an application package.
    • The platform shall be capable of retrieving URLs to stac documents from CMR or from the UDS Catalog
  • The platform shall generate a catalog.json file and associated item files or entries for all staged data products and add it to the directory containing the staged data.
  • After execution of a CWL workflow, the platform shall read data from a "process_output" directory and persist (stage-out) all data products specified in the catalog.json entry.
  • The platform can determine the S3 bucket to which files are persisted through environment or SSM parameters. These are not specified by the user (unless we want to override them?)
  • Credentials such as EDL and Unity passwords will default to SSM parameters, which must be created before the job run is invoked (e.g. a user can use the management console to create SSM parameters that map to EDL usernames/passwords).
  • NEW Gherkin Test case(s) added to System Test DURING THE PLANNING
  • NEW Gherkin Test case(s) implementation added to System Test during PI
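
The stage-out criterion above (read catalog.json from the "process_output" directory and persist every data product it names) might look something like the sketch below. It only plans the uploads and does not call S3. The `STAGING_BUCKET` environment variable, the `plan_stage_out` name, and the `<prefix>/<item id>/<file name>` key layout are assumptions made for illustration; a real platform would more likely resolve the bucket from SSM.

```python
import json
import os
from pathlib import Path


def plan_stage_out(process_output_dir, bucket=None, prefix="stage_out"):
    """List (local_path, s3_uri) pairs for every asset named in catalog.json."""
    out_dir = Path(process_output_dir)
    # The bucket is resolved by the platform, not the user.
    # Assumption: it is exposed through a STAGING_BUCKET env var.
    bucket = bucket or os.environ["STAGING_BUCKET"]
    catalog = json.loads((out_dir / "catalog.json").read_text())
    uploads = []
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue  # only item links describe data products
        item_path = out_dir / link["href"]
        item = json.loads(item_path.read_text())
        for asset in item.get("assets", {}).values():
            local = item_path.parent / asset["href"]
            # Assumed key layout: <prefix>/<item id>/<file name>
            key = f"{prefix}/{item['id']}/{local.name}"
            uploads.append((local, f"s3://{bucket}/{key}"))
    return uploads
```

Keeping the planning step separate from the actual upload makes it straightforward to assert on in a system test without touching a real bucket.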

Running the Example Application Package:

Current inputs required to run the Unity-example-application

input_json = """
    {{
      "stage_in": {{
        "stac_json": "https://raw.githubusercontent.com/unity-sds/unity-tutorial-application/main/test/stage_in/stage_in_results.json",
        "downloading_roles": "",
        "downloading_keys": "data",
        "download_type": "HTTP",
        "edl_username": null,
        "edl_password_type": "",
        "edl_password": "",
        "unity_client_id": "",
        "unity_stac_auth": "NONE"
      }},
      "parameters": {{
        "output_collection": "urn:nasa:unity:{0}:{1}:unity-tutorial___1",
        "summary_table_filename": "summary_table_{2}.txt"
      }},
      "stage_out": {{
        "staging_bucket": "{3}",
        "collection_id": "urn:nasa:unity:{0}:{1}:unity-tutorial___1",
        "result_path_prefix": "stage_out"
      }}
    }}
    """.format(
        project, venue, date, s3bucket
    )

After stage-in/out:

import json

# Commented-out keys are optional overrides or values the platform discovers itself.
input_json = json.dumps(
    {
        # Because this has the 'input' key, it will be staged.
        "input": "https://raw.githubusercontent.com/unity-sds/unity-tutorial-application/main/test/stage_in/stage_in_results.json",
        # "input_downloading_roles": "",     # for overriding default behaviors?
        # "input_downloading_keys": "data",  # for overriding default behaviors
        # "input_download_type": "HTTP",     # for overriding default behaviors
        # Inputs to the app package:
        "output_collection": f"urn:nasa:unity:{project}:{venue}:unity-tutorial___1",
        "summary_table_filename": f"summary_table_{date}.txt",
        # "staging_bucket": s3bucket,        # should be discovered by platform through SSM/env
    }
)
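
One way the platform could interpret the flattened inputs is to treat the `input` key as the thing to stage, keys prefixed with `input_` as stage-in overrides, and everything else as parameters for the app package. The sketch below assumes exactly that convention; the `split_job_inputs` name and the prefix rule are hypothetical and not confirmed by the ticket.

```python
def split_job_inputs(job_inputs, stage_key="input"):
    """Separate the value to stage in from parameters passed to the app package."""
    to_stage = job_inputs.get(stage_key)
    # Assumption: keys prefixed with "<stage_key>_" are stage-in overrides.
    overrides = {
        key: value
        for key, value in job_inputs.items()
        if key.startswith(stage_key + "_")
    }
    parameters = {
        key: value
        for key, value in job_inputs.items()
        if key != stage_key and key not in overrides
    }
    return to_stage, overrides, parameters
```

Called with the example inputs, this would return the STAC URL to stage, an empty override dict (unless `input_download_type` and friends are set), and `output_collection` plus `summary_table_filename` as app package parameters.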

Work Tickets

Link to work tickets required to implement the epic

  • TBC (to be created)

Dependencies

Other epics or outside tickets required for this to work

  • TBC

Associated Risks

Links to risk issues associated with this epic

  • TBC
@LucaCinquini

System Validation Criteria:

  • Gherkin tests for successful execution of each target use case (EMIT, SBG, ASPIS) using the decomposed stage-in/process/stage-out DAG, where (at least) the process Task is executed via a CWL workflow.

@LucaCinquini

A first prototype version of a modular stage-in / process / stage-out DAG is part of the 24.4 release. The team needs to create more examples of modularized CWL workflows which can be used for further testing.

@brianlee731
Contributor

@rtapella
Collaborator

What application is used to test these? Is https://github.com/unity-sds/unity-tutorial-application used for anything?

If not, we will also need to update unity-tutorial-application and the associated documentation/tutorials.

@brianlee731
Contributor

We can get rid of the unity-tutorial-application; I had forked it from unity-example-application in case we wanted to change anything for the tutorial. We ended up just using unity-example-application as is.

With that being said, I'll delete the repo unless anyone has concerns about removing it.

Projects
Status: In Progress

6 participants