Docker container to execute the Unity App Generator Script #21

Closed
mcduffie opened this issue Jan 21, 2025 · 16 comments
@mcduffie
Collaborator

mcduffie commented Jan 21, 2025

Create a Docker-in-Docker container capable of running the unity-app-generator software.

Currently application packages are built in two manners:

  • Manually through direct use of the unity-app-generator
  • Automatically using the Application Package Generation API endpoint

This ticket aims to replace the latter. The current approach involves an API Gateway endpoint that calls a Lambda. That Lambda then triggers a process in the MCP self-hosted Gitlab instance. The Gitlab actions there run the unity-app-generator to build a package and place it within Dockerhub and Dockstore.

But that approach is opaque to users of the API endpoint because there is no feedback on success or failure. Due to the licensing nature of Gitlab, only 2 people have direct access to the build logs.

Instead we would like to replace the Gitlab portion of this process with one that uses Airflow for the building of packages using unity-app-generator. This will require in this order:

  1. A Docker image that uses Docker-in-Docker to call unity-app-generator:
       • build_ogc_app init
       • build_ogc_app build_docker
       • build_ogc_app push_docker
       • build_ogc_app build_cwl
       • build_ogc_app push_app_registry
  2. An Airflow DAG that calls item 1 - [New Feature]: Create DAG to execute the Application Package generator unity-sps#275
  3. Modifications to the existing Application Package Generation Lambda to call item 2 instead of Gitlab
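
The five generator steps in item 1 could be sequenced by a small wrapper like the one below. This is a minimal sketch, not the actual entrypoint script: it assumes `build_ogc_app` is on the PATH and takes the subcommand as its first argument, and it passes no further arguments because this issue does not list them. The names `plan_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Subcommands in the order listed in this issue.
STEPS = ["init", "build_docker", "push_docker", "build_cwl", "push_app_registry"]


def plan_commands(executable: str = "build_ogc_app") -> List[List[str]]:
    """Return one argv list per step; real invocations likely need more arguments."""
    return [[executable, step] for step in STEPS]


def run_pipeline(run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run each step in order and stop at the first non-zero exit code."""
    for argv in plan_commands():
        code = run(argv)
        if code != 0:
            return code
    return 0
```

Stopping at the first failure means a failed `push_docker` never reaches `push_app_registry`, and the exit code can be surfaced to whatever launched the container.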

Item 1 will require the ability to pass credentials for push_docker and push_app_registry to the Docker image.

Item 2 will require the ability to either store the credentials in a secret store within Airflow or pass them along from item 3.

Item 3 may or may not pull credentials from the AWS Parameter store.

Since the current way Airflow is exposed allows multiple users to look at jobs, credential obfuscation in logs is an essential element of the task. Why? Because in the future it is conceivable that the Application Package Generation API endpoint could allow users to pass their credentials to direct the storage of Docker images and CWL files.

The two credentials involved are one to a Docker registry and one to an application catalog. In the current configuration the Docker registry is Dockerhub and the application catalog is Dockstore.
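
One way to approach the credential obfuscation requirement is a logging filter that replaces known secret values before a record is written. The sketch below uses only the Python standard library and assumes the secret values are known to the process at startup; the class name `RedactSecrets` is hypothetical, and this is not a description of how Airflow itself masks values.

```python
import logging
from typing import Iterable


class RedactSecrets(logging.Filter):
    """Replace known secret values with a placeholder before a record is emitted."""

    def __init__(self, secrets: Iterable[str]) -> None:
        super().__init__()
        # Ignore empty strings so we never replace "" throughout a message.
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-style args applied
        for secret in self._secrets:
            message = message.replace(secret, "***")
        # Store the redacted text and clear args so it is not formatted again.
        record.msg, record.args = message, ()
        return True  # keep the record, only its text is changed
```

Attaching the filter to a handler (`handler.addFilter(RedactSecrets([...]))`) rather than to a single logger covers every record that passes through that handler. It only protects output that goes through Python logging, so text printed directly by a subprocess would need separate handling.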

@mcduffie
Collaborator Author

Follow up task will be: unity-sds/unity-sps#275

@grallewellyn
Contributor

Finished item 1 on the above ticket.
This Dockerfile runs this shell script as an entrypoint.
I got the base code for this from Mike, and Luca said this is acceptable because we normally run our DAGs in privileged mode (which is necessary for this approach because we are running Docker in Docker).
I ran through this process and confirmed it works.
These are the commands I am running to build and run the Docker image:

docker build --no-cache -t unity-app-gen -f docker/Dockerfile .
docker run --privileged --env GITHUB_REPO=https://github.com/unity-sds/unity-example-application --env DOCKER_USER=<DockerHub username> --env DOCKER_PAT=<DockerHub password> --env DOCKSTORE_TOKEN=<Dockstore token> unity-app-gen

To get the process.cwl generated by the above commands to work with our Modular DAG example and https://raw.githubusercontent.com/unity-sds/unity-example-application/refs/heads/main/test/stage_in/catalog.json, I had to build this on a Linux machine.

The next steps are to find a more secure way to pass the DockerHub password and Dockstore token, and to create the DAG.
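
As one possible interim step, the secret values can be kept off the command line by using the `--env NAME` form of `docker run`, which copies `NAME` from the environment of the docker client instead of taking a literal value. The sketch below assumes the same image and variable names as the commands above; `docker_run_argv` and `run_container` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, List

SECRET_NAMES = ["DOCKER_USER", "DOCKER_PAT", "DOCKSTORE_TOKEN"]


def docker_run_argv(image: str, github_repo: str) -> List[str]:
    """Build the docker run argv; secret values never appear in it."""
    argv = ["docker", "run", "--privileged", "--env", f"GITHUB_REPO={github_repo}"]
    for name in SECRET_NAMES:
        # "--env NAME" with no "=value" tells docker to copy NAME from the
        # environment of the docker client process.
        argv += ["--env", name]
    return argv + [image]


def run_container(image: str, github_repo: str, secrets: Dict[str, str]) -> int:
    """Start the container with the secrets supplied through the child environment."""
    env = {**os.environ, **secrets}
    return subprocess.call(docker_run_argv(image, github_repo), env=env)
```

This keeps the values out of shell history and process listings, but they remain visible to anyone who can inspect the running container, so it does not replace a proper secret store.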

cc @LucaCinquini

@mcduffie
Collaborator Author

In which Git repo do you plan to commit the Dockerfile? unity-app-generator?

@grallewellyn
Contributor

I was working in a branch off of unity-app-generator. I thought that repo made the most sense since the entrypoint.sh is putting together the commands outlined in the unity app gen README
https://github.com/unity-sds/unity-app-generator/blob/docker-container-appgen/docker/Dockerfile

@LucaCinquini

Hi @grallewellyn : great job with completing step 1) and finishing the Dockerfile and the shell script.
In order to use secure information for DOCKER_USER, DOCKER_PAT and DOCKSTORE_TOKEN, one option is the following:

  1. Store those values as SSM parameters in the specific venue. You can start by entering them manually in the AWS console, but eventually they would need to be added by Terraform when a venue is deployed (@jl-0 : this is probably a task for the Management Console?)

2a) Retrieve those parameters from within the shell script, with a command like:
aws ssm get-parameter --name <parameter_name> --with-decryption --region us-west-2

then use those values as needed. The AWS CLI (which provides the aws command) must be installed inside the Docker container that runs the shell script.
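
If the retrieval were done from Python rather than with the shell command above, the equivalent boto3 call is `get_parameter` on an SSM client. This is a minimal sketch under that assumption; the function name `get_ssm_parameter` and the optional `client` argument are hypothetical and exist only so the function can be exercised without AWS access.

```python
from typing import Any, Optional


def get_ssm_parameter(name: str, region: str = "us-west-2", client: Optional[Any] = None) -> str:
    """Return the decrypted value of one SSM parameter."""
    if client is None:
        import boto3  # imported lazily so a stub client can be used in tests

        client = boto3.client("ssm", region_name=region)
    # WithDecryption=True mirrors the --with-decryption flag of the CLI command.
    response = client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

The returned value should be handed straight to the command that needs it and never printed, since anything echoed by the script ends up in the task log.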

2b) Another option is to retrieve the SSM parameters from within the DAG, and pass them as env variables to the KubernetesPodOperator - see example here: https://github.com/unity-sds/unity-sps/blob/develop/airflow/dags/sbg_preprocess_no_cwl.py
But I am not sure if the values would be echoed in the logs, so I would try the other option first.
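
For option 2b, the DAG side could collect the values into a dictionary and hand it to the operator's `env_vars` argument. The sketch below only builds that dictionary. The SSM parameter names are placeholders, `build_env_vars` is a hypothetical helper, and `fetch` stands for whatever function reads one parameter. Whether the resulting values appear in the Airflow UI or logs would still need to be checked.

```python
from typing import Callable, Dict

# Hypothetical mapping from container environment variable to SSM parameter name.
PARAMETER_NAMES: Dict[str, str] = {
    "DOCKER_USER": "/example/appgen/docker_user",
    "DOCKER_PAT": "/example/appgen/docker_pat",
    "DOCKSTORE_TOKEN": "/example/appgen/dockstore_token",
}


def build_env_vars(
    fetch: Callable[[str], str],
    names: Dict[str, str] = PARAMETER_NAMES,
) -> Dict[str, str]:
    """Fetch each parameter once and key the result by environment variable name."""
    return {env_name: fetch(param_name) for env_name, param_name in names.items()}


# Intended use inside a DAG file (not executed here):
#   KubernetesPodOperator(..., env_vars=build_env_vars(fetch))
```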

@grallewellyn
Contributor

I will try the first option!
If we want to store DOCKER_USER, DOCKER_PAT and DOCKSTORE_TOKEN as SSM parameters, how should we get the users to enter them? Through a separate form for their account (i.e. do it once and it persists for your account)? Or via the DAG input parameters on each run?

@LucaCinquini

LucaCinquini commented Mar 19, 2025

Actually @grallewellyn I might have made a mistake... I thought those parameters would be the same for all runs, in which case it makes sense to store them as system-wide SSM parameters. But will different users instead want to push Docker images to different repositories? If that is the case, could you investigate the recommended ways to securely pass credentials from the UI (such that they are not echoed in the logs)?

@grallewellyn
Contributor

Would it be the case that different users want to push Docker images to different repositories? I am not very familiar with our user working group, but I think that would make sense as a use case.
If you agree with this, then I can investigate ways to securely pass credentials from the UI.

@LucaCinquini

Yes, I think we need to support different users using the same SPS installation to push Docker images to different repos. @mcduffie would you agree? Or would a project always push to the same Docker repo?

@mcduffie
Collaborator Author

I don't think we need to investigate user-supplied Docker registry credentials. This will be part of an automated system to push into the application catalog. The application catalog and Docker registry will in the future use the same credentials. But for now we need to save the Docker registry credentials as a system-wide SSM parameter. If a user needs to push somewhere else they will have to use the package generator manually.

@LucaCinquini

This ticket should be carried on to the next PI to demonstrate the full end-to-end functionality.

@grallewellyn
Contributor

Sorry, this ticket has taken me a while because I have been trying to get my Airflow deployment working. After giving up on fixing the unity-graceal-1 and graceal-1 projects (both had worked at some point, then randomly stopped), I created a new graceal-2 project, which worked and allowed me to post my DAG.
Now I am going to start a thread in Slack with James about how to get the Docker registry credentials and Dockstore token from SSM.

@grallewellyn
Contributor

grallewellyn commented Apr 9, 2025

I completed this ticket. Here are some notes about it:

DockerHub and Dockstore credentials are being stored in the AWS SSM Parameter Store. I uploaded them for Unity-venue-dev, Unity-venue-test and Unity-venue-ops.

  • For Unity-venue-ops, I added the access tokens as SecureString parameters because that seemed to follow the convention ops already uses. I don’t access the parameters any differently, but costs might apply (see the AWS Systems Manager Parameter Store documentation). If we don’t want to store tokens as secure strings, we should discuss that.
  • These DockerHub and Dockstore accounts were created with an internal email.

I published the Docker image that we are using in the DAG to our jplmdps account: https://hub.docker.com/r/jplmdps/unity-app-gen

I tested the generated process.cwl file with a CWL DAG Modular

My code for the Dockerfile is here: #31 and my code for the DAG is here: unity-sds/unity-sps#390

Note that the Dockerfile will need to change when we deploy and start using v1.1.0 of unity-app-generator and app-pack-generator (both need to be published so they can be installed with pip once the changes to these packages are finished).

The logs are not displaying any sensitive data such as the DockerHub and Dockstore tokens.

We can add "Application Package Generator DAG" as an option for users and close this ticket once my PRs are merged:
#31
unity-sds/unity-sps#390

@LucaCinquini

Great work @grallewellyn - we can discuss on Monday. The only question for now is whether we should push the Docker image jplmdps/unity-app-gen to Google Container Registry instead of DockerHub.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Unity Project Board Apr 22, 2025
@grallewellyn
Contributor

Reopening because bullet 3 was not completed: "3. Modifications to the existing Application Package Generation Lambda to call item 2 instead of Gitlab"

@grallewellyn grallewellyn reopened this Apr 24, 2025
@mcduffie
Collaborator Author

mcduffie commented May 1, 2025

Closing because "3. Modifications to the existing Application Package Generation Lambda to call item 2 instead of Gitlab" has been descoped.

@mcduffie mcduffie closed this as completed May 1, 2025
Status: Done

3 participants