[Bug]: docker_cwl_entrypoint.sh can't handle large number of inputs #326
Input JSON file that caused the issue:
Workflow:
Airflow request (everything): |
I think the issue is that the STAC JSON string passed into the entrypoint script is ~300 MB, which runs up against size limits imposed by the operating system and the bash shell. I was able to reproduce this error using the Airflow API, and when I tried submitting it via the Airflow UI I hit a related error: "Request Entity Too Large". There seems to be interest in supporting passing the STAC_JSON as a string to the CWL DAG via the Airflow API. An initial thought is to:
|
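For illustration only, one way to sidestep the argument-length limit is to have the DAG stage an inline catalog to a file before the entrypoint runs, so only a short path crosses the command line. A minimal Python sketch, assuming the DAG receives `stac_json` as a string parameter; `stage_stac_json` and `work_dir` are hypothetical names and the DAG's actual parameter handling may differ:

```python
import json
import os
import tempfile

def stage_stac_json(stac_json: str, work_dir: str = "/tmp") -> str:
    """Pass URLs and existing file paths through unchanged; write an inline
    JSON string to a file so the entrypoint receives a short path instead of
    a multi-hundred-MB command-line argument."""
    if stac_json.startswith(("http://", "https://")) or os.path.exists(stac_json):
        return stac_json
    json.loads(stac_json)  # fail fast on malformed input
    fd, path = tempfile.mkstemp(suffix=".json", dir=work_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(stac_json)
    return path
```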
@LucaCinquini - I tried to run the CWL DAG on the inputs stored as a JSON file taken from the
This is the CWL that I used: https://raw.githubusercontent.com/asips/mdps-prototype/main/workflows/mvcm_l3/mvcm_d3.workflow.cwl
I keep running into this error:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] ERROR Workflow error:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] Invalid job input record:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/refs/heads/326-entrypoint-large-inputs/demos/asips-mvcm_d3.json:3:3:
[2025-02-20, 20:46:31 UTC] {pod_manager.py:471} INFO - [base] the 'stac_json' field is not valid because tried File but Missing 'class' field
Did I maybe grab the input incorrectly, or am I running the wrong workflow? @mike-gangl - Can we confirm that the CWL arguments passed in as a string are valid inputs to the workflow? |
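For reference, cwltool rejects a File-typed input unless the value is an object carrying a "class": "File" field; a bare string (or an object without it) produces exactly the "tried File but Missing 'class' field" message above. A sketch of a job input record that would satisfy a File-typed `stac_json`, with an illustrative path:

```python
import json

# If the workflow declares stac_json with CWL type File, the job input must
# be an object carrying "class": "File"; a bare string fails validation.
job_inputs = {
    "stac_json": {
        "class": "File",
        "path": "/data/asips-mvcm_d3.json",  # illustrative local path;
        # remote files would use "location" with a URL instead
    }
}

with open("job.json", "w") as fp:
    json.dump(job_inputs, fp, indent=2)
```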
I was almost able to get the CWL DAG to execute the ASIPS workflow via the Airflow UI, where it was failing on stage-in credential authentication. I had to set the
When I ran this via the API I got the same, |
@nikki-t: could you please point me to the DAG run that you executed via the Airflow UI? |
Here is the DAG run that I executed via the Airflow UI: http://unity-nikki-1-dev-httpd-alb-761067244.us-west-2.elb.amazonaws.com:8080/unity-nikki-1/dev/sps/dags/cwl_dag/grid?dag_run_id=manual__2025-02-25T14%3A29%3A42%2B00%3A00&task_id=cwl_task&tab=logs
It ran into a Docker issue because it is trying to pull a data services container image from ECR, so maybe I just needed to log into ECR? I'm not sure whether that ECR repo is hosted in unity-venue-dev, though. |
Checked for duplicates
Yes - I've already checked
Describe the bug
The ASIPS team attempted to provide a stac_catalog of 240 input files. It was not referenced as a URL or a file to download, but sent as a string directly to the Airflow API. The error encountered was:
It seems the STAC catalog "string" is simply too long.
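For context, Linux caps the combined size of a process's argument list and environment at ARG_MAX bytes (commonly 2 MiB), and a single argument at MAX_ARG_STRLEN (commonly 128 KiB), so a ~300 MB string can never be passed as one shell argument. A quick check, assuming a local copy of the catalog (the filename is illustrative):

```python
import os

arg_max = os.sysconf("SC_ARG_MAX")  # total bytes allowed for argv + environ
catalog_bytes = os.path.getsize("asips-mvcm_d3.json")  # hypothetical local copy

print(f"ARG_MAX={arg_max:,} bytes, catalog={catalog_bytes:,} bytes")
print("fits on the command line:", catalog_bytes < arg_max)
```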
Reproducible steps
Gathering more information on this right now; https://jpl-eosdis.slack.com/archives/C075D7F9EUD/p1739565358265049 contains the conversation.
Note: a workaround "kind of" exists, since we can query the UDS catalog for the data files and pass them in successfully. But if the data are not in U-DS, or we can't craft a query that selects the specific files required, then this fix is necessary.
What is your environment?
The ASIPS-INT environment.