Demonstrate SPS Scalability #213

Open
LucaCinquini opened this issue Sep 18, 2024 · 6 comments
Assignees
Labels
Feature Feature label used in Unity Project PSE U-SPS

Comments

@LucaCinquini

LucaCinquini commented Sep 18, 2024

Examples of Scalability:

  • ASIPS 1 day
  • ASIPS 1 month
  • EMIT 300 scenes
  • SBG 1 day
  • SBG 1 month

Risks to scaling:

  • NAT Gateway - anything we need to get "over the internet" (Docker containers, HTTP data). Use S3 and ECR where possible, and monitor gateway usage.
  • Shared Resources - U-DS might become a bottleneck at some point during data ingestion (TBD). What's the collective view into the system?
  • Costs - how will we monitor cost usage during this time?
@LucaCinquini LucaCinquini converted this from a draft issue Sep 18, 2024
@LucaCinquini LucaCinquini self-assigned this Sep 18, 2024
@LucaCinquini LucaCinquini added U-SPS Feature Feature label used in Unity Project PSE labels Sep 18, 2024
@LucaCinquini LucaCinquini changed the title Demonstrated Scalability Demonstrated SPS Scalability Sep 18, 2024
@LucaCinquini
Author

This goal could potentially be combined with the initiators to demonstrate scalability when triggering EDRgen/RDRgen for SRL.

@LucaCinquini LucaCinquini changed the title Demonstrated SPS Scalability Demonstrate SPS Scalability Sep 18, 2024
@mike-gangl
Contributor

mike-gangl commented Sep 25, 2024

features/sps/scale.feature:

Feature: Scaled SPS Testing
  Test the SPS by scaling to meet various workloads. This should be run occasionally, and not as part of the nightly tests.

@develop @test @scale
Scenario: The SPS shall be able to run a single day of ASIPS data within X hours
      Given a listing of 1 day of ASIPS input data
      When I request 12 runs of asips_workflow
      Then all workflows are submitted successfully
      # (or we can make this an env variable to see what the maximum number of jobs processed at once can be)
      And 12 nodes have spun up to process ASIPS Workflows
      And all workflows successfully complete
      #And the workflow data shows up in the data catalog # -- removed, not a part of SPS Scalability scope
      And the total test time was less than X hours

@develop @test @scale
Scenario: The SPS shall be able to run a single day of SBG Preprocess data within X hours
      Given a listing of 1 day of SBG input data
      When I request X runs of SBG_preprocess_workflow
      Then all workflows are submitted successfully
      # (or we can make this an env variable to see what the maximum number of jobs processed at once can be)
      And X nodes have spun up to process SBG Workflows
      And all workflows successfully complete
      # And the workflow data shows up in the data catalog # --  removed, not a part of SPS Scalability scope
      And the total test time was less than X hours

This could be run using:

behave features/sps/scale.feature -n "The SPS shall be able to run a single day of ASIPS data within X hours" #runs ASIPS test
behave features/sps/scale.feature -n "The SPS shall be able to run a single day of SBG Preprocess data within X hours" #runs SBG Preprocess test...
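
The "all workflows successfully complete" and "total test time was less than X hours" steps in the scenarios above could be backed by a small polling helper. The following is only a sketch: `wait_for_all`, the `get_status` callable, and the status strings "success" and "failed" are assumptions for illustration, not part of the SPS or any existing test code. The real status lookup would go through whichever API is chosen for job submission.

```python
import time
from typing import Callable, Iterable


def wait_for_all(
    job_ids: Iterable[str],
    get_status: Callable[[str], str],  # hypothetical: returns "success", "failed", or anything else while running
    max_seconds: float,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until every job succeeds; return the elapsed seconds.

    Raises RuntimeError if any job reports "failed" and TimeoutError
    if the time budget (the "X hours" of the scenario) is exceeded.
    """
    start = clock()
    pending = set(job_ids)
    while pending:
        for job_id in list(pending):
            status = get_status(job_id)
            if status == "failed":
                raise RuntimeError(f"workflow {job_id} failed")
            if status == "success":
                pending.discard(job_id)
        if not pending:
            break
        if clock() - start > max_seconds:
            raise TimeoutError(f"{len(pending)} workflows still running after {max_seconds} s")
        sleep(poll_seconds)
    return clock() - start
```

A step implementation could call this with `max_seconds = X * 3600` and then assert on the returned elapsed time. The `clock` and `sleep` parameters exist only so the helper can be exercised without real waiting.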

The code that implements the test would:

  • Decide how to parse the given listing of input products into something that can be submitted
  • Decide how to submit the jobs (OGC API, Airflow API)
  • Create the jobs, specifying the workflow and job inputs, in the "When I request ... runs" step
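
The three points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `JobRequest`, `split_listing`, `build_jobs`, and `submit_all` are hypothetical names, the listing is assumed to be a flat list of input product identifiers, and the submission mechanism (OGC API or Airflow API) is left as a pluggable callable because that choice has not been made here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobRequest:
    """One workflow run to submit (hypothetical shape, not the SPS API)."""
    workflow: str
    inputs: List[str]


def split_listing(listing: List[str], runs: int) -> List[List[str]]:
    """Split the input listing into `runs` contiguous, near-equal batches."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    base, extra = divmod(len(listing), runs)
    batches: List[List[str]] = []
    start = 0
    for i in range(runs):
        # The first `extra` batches take one additional item each.
        size = base + (1 if i < extra else 0)
        batches.append(listing[start:start + size])
        start += size
    return batches


def build_jobs(workflow: str, listing: List[str], runs: int) -> List[JobRequest]:
    """Create one JobRequest per run, as in "When I request N runs of <workflow>"."""
    return [JobRequest(workflow, batch) for batch in split_listing(listing, runs)]


def submit_all(jobs: List[JobRequest], submit: Callable[[JobRequest], str]) -> List[str]:
    """Submit every job through the supplied callable and collect the job IDs."""
    return [submit(job) for job in jobs]
```

Keeping `submit` as a parameter means the same step code could be pointed at either API by swapping in a different callable, without touching the batching logic.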

This is very similar to https://github.com/unity-sds/unity-monorepo/blob/main/tests/system-tests/features/sps/cwl.feature, and the implementations will be very similar; the differences are the mechanism used to submit the jobs and which workflows are run. The workflow can be defined in the code itself or in an environment variable. Alternatively, we could specify the workflow in the step itself (a link to Dockstore, GitHub, etc.), so the code is tied to that particular workflow until we change the test. That's not a terrible idea, since we'd know exactly what we're testing.
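
One way to support all three options for choosing the workflow is a small resolver with a fixed precedence. This is a sketch only: the function name, the environment variable `SPS_SCALE_WORKFLOW`, and the precedence order (step value, then environment, then a default set in code) are assumptions made for illustration, not an agreed design.

```python
import os
from typing import Mapping, Optional


def resolve_workflow(
    step_value: Optional[str] = None,
    code_default: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = "SPS_SCALE_WORKFLOW",  # hypothetical variable name
) -> str:
    """Pick the workflow reference to run.

    Precedence (an assumption): value written in the Gherkin step,
    then the environment variable, then the default defined in code.
    """
    if step_value:
        return step_value
    from_env = env.get(env_var)
    if from_env:
        return from_env
    if code_default:
        return code_default
    raise ValueError(f"no workflow given in the step, in ${env_var}, or in code")
```

For example, `resolve_workflow("asips_workflow")` returns the step's value unchanged, while `resolve_workflow(None, code_default="asips_workflow")` falls back to the code default when the environment variable is unset.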

@LucaCinquini
Author

System Validation Criteria:
o A Unity developer is able to scale up the execution of each of the target workspaces defined above
o Successful execution of the Gherkin tests

@GodwinShen

@LucaCinquini please list the SPS board work tickets in this ticket. Running EMIT jobs in parallel doubles the run time, possibly due to the large Docker container.

@GodwinShen GodwinShen moved this from Todo to In Progress in Unity Project Board Oct 22, 2024
@LucaCinquini
Author

SPS has successfully demonstrated scalability for these 2 use cases:
o EMIT, working with Jay
o SBG end-to-end, using a Docker image stored in ECR

This ticket could be either closed now, or moved to the next PI for further scalability testing.

Projects
Status: In Progress

3 participants