
[Bug]: docker_cwl_entrypoint.sh can't handle large number of inputs #326

Open
mike-gangl opened this issue Feb 18, 2025 · 6 comments
Labels: bug (Something isn't working), U-SPS

@mike-gangl

Checked for duplicates

Have you checked for duplicate issue tickets?

Yes - I've already checked

Describe the bug

The ASIPS team attempted to provide a stac_catalog containing 240 input files. It wasn't referenced as a URL or a file to download, but was sent as a string directly to the Airflow API. The error encountered was:

[2025-02-14, 20:12:22 UTC] {pod_manager.py:489} INFO - [base] exec /usr/share/cwl/docker_cwl_entrypoint.sh: argument list too long

It seems the STAC catalog "string" is simply too long.
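For context, this looks like the kernel's execve() limit (ARG_MAX) rather than anything specific to the entrypoint script, so any sufficiently large string passed as a command-line argument will fail the same way. A minimal sketch that should reproduce the failure mode on Linux:

```python
import os
import subprocess

# ARG_MAX caps the combined size of argv + environment passed to execve();
# it is typically on the order of 2 MB on Linux, far below a 240-item catalog.
print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))

huge_stac_json = "x" * (10 * 1024 * 1024)  # stand-in for a very large STAC catalog string

try:
    # Passing the catalog as a single argv entry, the way the entrypoint
    # receives it today, fails before the target program even starts.
    subprocess.run(["/bin/echo", huge_stac_json], check=True)
except OSError as exc:
    print(exc)  # [Errno 7] Argument list too long: '/bin/echo'
```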

Reproducible steps

Gathering more information on this right now; https://jpl-eosdis.slack.com/archives/C075D7F9EUD/p1739565358265049 contains the conversation.

Note: a workaround "kind of" exists for this, as we can query the U-DS catalog for the data files to pass in successfully. But if the data are not in U-DS, or we can't craft a query that selects the specific files required, then this fix will be required.

What is your environment?

The ASIPS-INT environment.

mike-gangl added the bug (Something isn't working) label on Feb 18, 2025
mike-gangl (Author) commented Feb 18, 2025

Input json file that caused the issue:

formatted.json

Workflow:
https://raw.githubusercontent.com/asips/mdps-prototype/main/workflows/mvcm_l3/mvcm_d3.workflow.cwl

Airflow request (everything):

d3_example.json

nikki-t (Collaborator) commented Feb 20, 2025

I think the issue is that the STAC JSON string passed into the entrypoint script is ~300 MB and bumps up against size limits imposed by the operating system and the bash shell. I was able to reproduce this error using the Airflow API, and when I tried submitting it via the Airflow UI I ran into a related error: "Request Entity Too Large".

It seems like there is interest in supporting passing the STAC JSON in as a string to the CWL DAG using the Airflow API. An initial thought (sketched after the list) is to:

  1. Take the STAC JSON string content and store it as a file in the CWL DAG code.
  2. Have the CWL DAG upload the file to an S3 bucket.
  3. Have the entrypoint script download it to local EBS storage.
  4. Have the entrypoint pass the file in as an argument to the process CWL.
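A minimal sketch of steps 1 and 2 from the DAG side (the bucket name and key are placeholders, not actual SPS configuration):

```python
import tempfile

import boto3

STAGING_BUCKET = "unity-sps-staging"              # hypothetical bucket name
STAGING_KEY = "cwl-dag/inputs/stac_catalog.json"  # hypothetical key

def stage_stac_json(stac_json: str) -> str:
    """Write the oversized STAC JSON to a local file, upload it to S3, and
    return the s3:// URI that the entrypoint script would later download."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        tmp.write(stac_json)
        local_path = tmp.name
    boto3.client("s3").upload_file(local_path, STAGING_BUCKET, STAGING_KEY)
    return f"s3://{STAGING_BUCKET}/{STAGING_KEY}"
```

The entrypoint side would then be the inverse (a download to EBS), and only the short s3:// URI or local path travels through argv, which keeps the argument list well under the limit.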

nikki-t (Collaborator) commented Feb 20, 2025

@LucaCinquini - I tried to run the CWL DAG on the inputs stored as a JSON file, taken from the cwl_args key of the d3_example file. They are stored here: https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/refs/heads/326-entrypoint-large-inputs/demos/asips-mvcm_d3.json

This is the CWL that I used: https://raw.githubusercontent.com/asips/mdps-prototype/main/workflows/mvcm_l3/mvcm_d3.workflow.cwl

I keep running into this error:

[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] ERROR Workflow error:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] Invalid job input record:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/refs/heads/326-entrypoint-large-inputs/demos/asips-mvcm_d3.json:3:3: 
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] the 'stac_json' field is not valid because tried File but Missing 'class' field
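For reference, cwltool raises "tried File but Missing 'class' field" when a parameter typed as File is handed a value without the "class": "File" discriminator, so the workflow presumably expects a File object rather than a raw string. A sketch of the two shapes (the path is a placeholder):

```python
# What a File-typed CWL input record needs to look like; "class" is the
# discriminator the error message says is missing. The path is a placeholder.
valid_inputs = {
    "stac_json": {
        "class": "File",
        "path": "/tmp/asips-mvcm_d3.json",  # or "location": "https://..."
    }
}

# What was effectively submitted: a bare string, which cwltool cannot
# coerce into a File object.
invalid_inputs = {"stac_json": "... large STAC catalog string ..."}
```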

Did I maybe grab the input incorrectly or am running the wrong workflow? @mike-gangl - Can we confirm that the CWL arguments passed in as a string are valid inputs to the workflow?

nikki-t (Collaborator) commented Feb 25, 2025

I was almost able to get the CWL DAG to execute the ASIPS workflow via the Airflow UI; it got as far as failing on stage-in credential authentication.

I had to set the stac_json input parameter type to string and pass the STAC JSON in as a string.

When I ran this via the API I got the same "argument list too long" error, so I think this is still a limitation of the shell/OS.
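Worth noting: the environment block counts against the same ARG_MAX budget as argv, so exporting the catalog as an environment variable would hit the identical limit. Piping it over stdin (or staging it as a file, as proposed above) does not. A hypothetical sketch, not how the entrypoint works today:

```python
import subprocess

huge_stac_json = "x" * (10 * 1024 * 1024)  # stand-in for the oversized catalog

# stdin is a pipe, not part of the execve() argument block, so no ARG_MAX
# limit applies; "wc -c" here stands in for an entrypoint that reads the
# catalog from stdin instead of argv.
subprocess.run(["wc", "-c"], input=huge_stac_json.encode(), check=True)
```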

LucaCinquini (Collaborator) commented

@nikki-t : could you please point me to the DAG run which you executed via the Airflow UI?

nikki-t (Collaborator) commented Feb 25, 2025

Here is the DAG run where I executed via the Airflow UI: http://unity-nikki-1-dev-httpd-alb-761067244.us-west-2.elb.amazonaws.com:8080/unity-nikki-1/dev/sps/dags/cwl_dag/grid?dag_run_id=manual__2025-02-25T14%3A29%3A42%2B00%3A00&task_id=cwl_task&tab=logs

It ran into a Docker issue because it tried to pull a data services container image from ECR, so maybe I just needed to log in to ECR? I'm not sure that ECR repo is hosted in unity-venue-dev, though.
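If it does turn out to be an ECR auth problem, a sketch of logging the Docker daemon in from Python (the region is an assumption, and the caller needs ecr:GetAuthorizationToken plus pull permissions on the repo):

```python
import base64
import subprocess

import boto3

# Fetch a temporary ECR credential (a base64-encoded "AWS:<password>" pair)
# and feed it to docker login; region_name is an assumption for this venue.
ecr = boto3.client("ecr", region_name="us-west-2")
auth = ecr.get_authorization_token()["authorizationData"][0]
username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)

subprocess.run(
    ["docker", "login", "--username", username, "--password-stdin", auth["proxyEndpoint"]],
    input=password.encode(),
    check=True,
)
```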
