
Landsat Workflow Automation #7

Draft: wants to merge 3 commits into main
Conversation

@KastanDay (Collaborator) commented Jun 22, 2023

See the Landsat repo for the corresponding PR enabling these changes: initze/landsattrend#17

The full Landsat pipeline to automate:

  1. Google Earth Engine to Google Cloud Storage. helper.sh?
    a. Can take days, with poor observability. Produces a list of files, but not all of them show up in the storage bucket because irrelevant ones (like all-water images) are filtered out.
  2. GCS to HPC. helper.sh?
  3. HPC jobs (runtime: 2-3 hours per zone).
    a. Generate SLURM files: generate_slurm_for_site.py
    b. Run the SLURM files
    c. Port forward the Ray dashboard from the running nodes (from SLURM) to the user's laptop (see the sketch after this list).
  4. Upload to Clowder
    a. Input data to Clowder helper.sh?
    b. ✅ Results to Clowder upload_region_output.sh
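
A minimal sketch of step 3c, assuming Ray serves its dashboard on the default port 8265 on the node running the head process; the hostnames and username below are placeholders, not real cluster names:

# On your laptop: tunnel the Ray dashboard through the HPC login node.
# 'login.hpc.example.edu' and 'node001' are hypothetical; substitute your
# login node and whichever node SLURM assigned to the Ray head.
ssh -N -L 8265:node001:8265 user@login.hpc.example.edu
# Then open http://localhost:8265 in a local browser.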

@tcnichol can you help me identify the right helper.sh/py files for each step?

@KastanDay KastanDay requested review from lmarini and tcnichol June 22, 2023 19:28
@KastanDay KastanDay marked this pull request as draft June 22, 2023 19:28
@KastanDay KastanDay self-assigned this Jun 22, 2023
@KastanDay (Collaborator, Author) commented Jun 22, 2023

@tcnichol Hi Todd, maybe you could help here:

In generate_slurm_for_site.py what should I use for site_one and site_two?

Do I need to use your files like upload_missing.py and upload_larger_on_clowder.py?

It would be ideal if you could just put the bash commands in here:

# step 1: Google Earth to Google Cloud Storage.

# step 2: GCS to HPC.

# step 3: run HPC jobs. 
generate_slurm_for_site.py ??? 

# step 4a - upload results to clowder. 
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region_output.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4a: 'upload_region_output.sh'"

# step 4b - upload input regions to clowder. 
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4b: 'upload_region.sh'"

@tcnichol commented

For exporting to the cloud, there are Python scripts in another repo:

landsattrend-pipeline/export_to_cloud.py $region_name

This will export a single zone.

landsattrend-pipeline/export_all_to_cloud.py

That will export everything.
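
For example (a hedged usage sketch, assuming the scripts are run from the repo's parent directory):

# export a single zone:
python landsattrend-pipeline/export_to_cloud.py ALASKA
# or export every region at once:
python landsattrend-pipeline/export_all_to_cloud.py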

@tcnichol commented Jun 28, 2023

Kastan, here is what I think the answers are. I have added some new files that handle all regions and sites rather than just one at a time, which should be useful for automation.

  1. GEE to GOOGLE CLOUD

https://github.com/PermafrostDiscoveryGateway/landsattrend-pipeline/blob/minor_fixes_in/export_to_cloud.py $region

The $region is one of ALASKA, CANADA, EURASIA1, EURASIA2, EURASIA3, TEST.

https://github.com/PermafrostDiscoveryGateway/landsattrend-pipeline/blob/minor_fixes_in/export_all_to_cloud.py

This one takes no arguments, but exports all the regions to the bucket.

TODO for the future: add parameters for start and end year, since those are what will change over time.
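
If that TODO lands, the call might look like the line below; the flag names are illustrative only and do not exist in the script yet:

# hypothetical future CLI once year parameters are added:
python export_to_cloud.py ALASKA --start-year 2000 --end-year 2023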

  2. From GOOGLE CLOUD to DELTA or other HPC

https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta/import_export/cloud_download_all_regions.py $download_directory

The $download_directory is the location of the 'data' folder where the data should go; this downloads everything.

https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta/import_export/cloud_download_region.py $region $download_directory

This one also takes the region as an argument.
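
For example (the scratch path is a placeholder; point it at wherever your 'data' folder should live):

# download every region:
python import_export/cloud_download_all_regions.py /scratch/$USER/landsattrend/data
# or just one:
python import_export/cloud_download_region.py ALASKA /scratch/$USER/landsattrend/data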

  3. GENERATE ALL THE SLURM FILES

This new Python script will generate the SLURM files for all sites:

https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta/generate_slurm_for_all_sites.py
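
The generated files still have to be submitted; a minimal sketch, assuming the script writes .slurm files into the current directory (the glob pattern is an assumption):

# generate SLURM files for every site, then submit each one:
python generate_slurm_for_all_sites.py
for f in *.slurm; do sbatch "$f"; done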

  4. UPLOAD INPUT AND OUTPUT

https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta/import_export/upload_data.py $url $key $landsat_space_id $data_dir

This will upload all the input data.

https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta/import_export/upload_process.py $url $key $landsat_space_id $process_dir

This will upload all the results (the contents of process). This script does make assumptions about the structure of the folders under process; I should probably make it more generic.
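
Putting these together with Kastan's skeleton above, an end-to-end sketch might look like the following. $CLOWDER_URL, $CLOWDER_KEY, $LANDSAT_SPACE_ID, the repo paths, and the scratch directories are placeholders, and the SLURM loop assumes the generated filenames:

# step 1: Google Earth Engine to Google Cloud Storage (all regions).
python landsattrend-pipeline/export_all_to_cloud.py

# step 2: GCS to HPC (all regions).
python landsattrend/import_export/cloud_download_all_regions.py /scratch/$USER/landsattrend/data

# step 3: generate and submit the SLURM jobs.
python landsattrend/generate_slurm_for_all_sites.py
for f in *.slurm; do sbatch "$f"; done

# step 4: upload input data and results to Clowder.
python landsattrend/import_export/upload_data.py $CLOWDER_URL $CLOWDER_KEY $LANDSAT_SPACE_ID /scratch/$USER/landsattrend/data
python landsattrend/import_export/upload_process.py $CLOWDER_URL $CLOWDER_KEY $LANDSAT_SPACE_ID /scratch/$USER/landsattrend/process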
