Description
Application Package standardization - stage-in/out to platform implementation
This ticket tracks the feature work to align the MDPS application package generator and platforms with the OGC Application Package Best Practices specification.
The design ticket is located at #187.
Note - there are many moving parts with this approach, and we must be able to support the current approach while working towards these new application packages.
- SPS should maintain the current CWL DAG while adding a new app-package DAG
- U-DS must continue to support stage-in/out steps in CWL
- The app package generator must still be able to build existing app packages, with a non-default flag for building new ones (e.g. don't bundle stage_in/out)
- Documentation for each service as they progress
Acceptance Criteria
Application Packages
- The algorithm CWL specifies file/data inputs as Directory types and, in that directory, can read a catalog.json file to find the data it relies on (A.2.1). This changes the input type from a catalog.json location to a Directory.
- The algorithm writes outputs to a Directory along with a STAC catalog (catalog.json) describing the files in that directory (A.3.1). Outputs have always been a Directory type and will continue to be one for a valid application package or workflow.
- NEW Gherkin Test case added to System Test DURING THE PLANNING
- NEW Gherkin Test case implementation added to System Test during PI
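The Directory-based contract in the criteria above can be sketched roughly as follows. This is a minimal illustration, not the MDPS implementation: the function names are hypothetical, and it assumes the catalog uses `item` links and that asset hrefs are relative to the item file.

```python
import json
from pathlib import Path


def list_input_assets(input_dir):
    """Return the asset paths referenced by items in <input_dir>/catalog.json."""
    root = Path(input_dir)
    catalog = json.loads((root / "catalog.json").read_text())
    assets = []
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue
        item_path = root / link["href"]
        item = json.loads(item_path.read_text())
        for asset in item.get("assets", {}).values():
            # Assumption: asset hrefs are relative to the item file.
            assets.append((item_path.parent / asset["href"]).resolve())
    return assets


def write_output_catalog(output_dir, item_hrefs, catalog_id="outputs"):
    """Write a minimal STAC catalog.json that links to the given item files."""
    links = [{"rel": "root", "href": "./catalog.json", "type": "application/json"}]
    for href in item_hrefs:
        links.append({"rel": "item", "href": href, "type": "application/geo+json"})
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": "Outputs written by the algorithm",
        "links": links,
    }
    path = Path(output_dir) / "catalog.json"
    path.write_text(json.dumps(catalog, indent=2))
    return path
```

An algorithm would call `list_input_assets` on its input Directory and `write_output_catalog` on its output Directory once its item files are written.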
Stage-in/Out
- stage-in and out functionalities are made available and documented for the SPS team to use
- stage-out functionality should utilize the collection identifier in a STAC item (https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#collection). If this is not present, the identifier in the catalog should be used.
- if a non-conforming collection id is given (e.g. "my_cool_collection"), it should be 'converted' to a valid collection:
urn:nasa:unity:<project>:<venue>:my_cool_collection___1
- stage-in/out tooling should provide a way for explicit (as part of the call) or implicit (env var) setting of the project, venue, and bucket variables.
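One possible reading of the collection-id rules and the explicit/implicit settings above, as a hedged sketch. The environment variable names `UNITY_PROJECT` and `UNITY_VENUE`, the helper names, and the regular expression for a "conforming" id are all assumptions inferred from the example URN, not part of the existing tooling.

```python
import os
import re

# Assumed shape of a conforming id, inferred from the example URN above.
URN_PATTERN = re.compile(r"^urn:nasa:unity:[^:]+:[^:]+:[^:]+___[^:]+$")


def resolve_setting(explicit, env_name):
    """Prefer the explicit argument; otherwise fall back to an environment variable."""
    value = explicit if explicit is not None else os.environ.get(env_name)
    if not value:
        raise ValueError(f"{env_name} was not passed explicitly or set in the environment")
    return value


def pick_collection_id(item, catalog):
    """Use the item's 'collection' field if present, else the catalog's 'id'."""
    return item.get("collection") or catalog["id"]


def to_valid_collection_id(collection_id, project=None, venue=None):
    """Leave conforming ids untouched; wrap anything else in the URN form."""
    if URN_PATTERN.match(collection_id):
        return collection_id
    project = resolve_setting(project, "UNITY_PROJECT")  # hypothetical variable name
    venue = resolve_setting(venue, "UNITY_VENUE")  # hypothetical variable name
    return f"urn:nasa:unity:{project}:{venue}:{collection_id}___1"
```

For example, `to_valid_collection_id("my_cool_collection", "unity", "dev")` yields `urn:nasa:unity:unity:dev:my_cool_collection___1`, while an id that already matches the URN form is returned unchanged.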
Platform
- the platform will be able to take an 'input' parameter (a valid STAC document or URL to a stac document) and stage those inputs in a single directory for usage by an application package.
- The platform shall be capable of retrieving URLs to stac documents from CMR or from the UDS Catalog
- The platform shall generate a catalog.json file and associated item files or entries for all staged data products and add it to the directory containing staged data.
- After execution of a CWL workflow, the platform shall read data from a "process_output" directory and persist (stage-out) all data products specified in the catalog.json entry.
- The platform can determine the S3 bucket to which files are persisted through environment or SSM parameters. These are not specified by the user (unless we want to override them?)
- Items like EDL passwords and Unity passwords will default to SSM parameters, and those must be created prior to invocation of the job run (e.g. using the management console, a user can create SSM parameters that map to EDL username/passwords).
- NEW Gherkin Test case(s) added to System Test DURING THE PLANNING
- NEW Gherkin Test case(s) implementation added to System Test during PI
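The catalog-generation criterion above might look roughly like this on the platform side. It is a sketch under assumptions: the function name, the `<file>.item.json` naming, and the single `data` asset key are illustrative choices, and the items carry only the minimal STAC fields (null geometry, a `datetime` set to the staging time).

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_staged_catalog(staging_dir, file_names, catalog_id="staged-inputs"):
    """Write one STAC item per staged file, plus a catalog.json linking to them."""
    root = Path(staging_dir)
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    root_link = {"rel": "root", "href": "./catalog.json", "type": "application/json"}
    links = [root_link]
    for name in file_names:
        item = {
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": name,
            "geometry": None,
            "properties": {"datetime": now},
            "links": [root_link],
            # Assumption: each staged file becomes a single asset keyed "data".
            "assets": {"data": {"href": f"./{name}"}},
        }
        item_file = f"{name}.item.json"  # hypothetical naming convention
        (root / item_file).write_text(json.dumps(item, indent=2))
        links.append({"rel": "item", "href": f"./{item_file}", "type": "application/geo+json"})
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": "Data products staged for the application package",
        "links": links,
    }
    path = root / "catalog.json"
    path.write_text(json.dumps(catalog, indent=2))
    return path
```

The stage-out side could walk the catalog.json in the "process_output" directory the same way in reverse, following each `item` link to find the assets to persist.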
Running the Example Application Package:
Current inputs required to run the Unity-example-application:
input_json = """
{{
"stage_in": {{
"stac_json": "https://raw.githubusercontent.com/unity-sds/unity-tutorial-application/main/test/stage_in/stage_in_results.json",
"downloading_roles": "",
"downloading_keys": "data",
"download_type": "HTTP",
"edl_username": null,
"edl_password_type": "",
"edl_password": "",
"unity_client_id": "",
"unity_stac_auth": "NONE"
}},
"parameters": {{
"output_collection": "urn:nasa:unity:{0}:{1}:unity-tutorial___1",
"summary_table_filename": "summary_table_{2}.txt"
}},
"stage_out": {{
"staging_bucket": "{3}",
"collection_id": "urn:nasa:unity:{0}:{1}:unity-tutorial___1",
"result_path_prefix": "stage_out"
}}
}}
""".format(
project, venue, date, s3bucket
)
After stage-in/out:
input_json = """
{{
"input": "https://raw.githubusercontent.com/unity-sds/unity-tutorial-application/main/test/stage_in/stage_in_results.json", # -- because this has the 'input' key, this will be staged
# "input_downloading_roles": "" -- for overriding default behaviors?
# "input_downloading_keys": "data", -- for overriding default behaviors
# "input_download_type": "HTTP", -- for overriding default behaviors
"output_collection": "urn:nasa:unity:{0}:{1}:unity-tutorial___1", # -- inputs to the app package
"summary_table_filename": "summary_table_{2}.txt" # -- inputs to the app package
# "staging_bucket": "{3}", -- should be discovered by platform through SSM/env
}}
""".format(
project, venue, date, s3bucket
)
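To make the key convention in the proposed inputs concrete, here is a hedged sketch of how a platform might separate them. The function name and the exact routing rules are assumptions drawn from the inline comments: the `input` key is staged, `input_`-prefixed keys override staging defaults, and everything else is passed to the app package.

```python
def split_job_inputs(job_inputs):
    """Separate the STAC input, its optional overrides, and app-package parameters."""
    stac_input = job_inputs.get("input")
    overrides = {}
    parameters = {}
    for key, value in job_inputs.items():
        if key == "input":
            continue
        if key.startswith("input_"):
            # e.g. "input_download_type" becomes the override "download_type"
            overrides[key[len("input_"):]] = value
        else:
            parameters[key] = value
    return stac_input, overrides, parameters


stac_input, overrides, parameters = split_job_inputs(
    {
        "input": "https://raw.githubusercontent.com/unity-sds/unity-tutorial-application/main/test/stage_in/stage_in_results.json",
        "input_download_type": "HTTP",
        "summary_table_filename": "summary_table.txt",
    }
)
```

With this split, only `parameters` would reach the CWL workflow, while `stac_input` and `overrides` stay with the platform's stage-in step.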
Work Tickets
Link to work tickets required to implement the epic
- TBC (to be created)
Dependencies
Other epics or outside tickets required for this to work
- TBC
Associated Risks
Links to risk issues associated with this epic
- TBC