Deployment failures 2025-01-03 and 2025-01-05 #1410
First one:
Second one:
It's not clear how to add packages to the environment or where the environment comes from. For example, there is no obvious place in
@slifty Yesterday you suggested you might take a look at this, marking it so here too.
My preferred solution is to build the docker image and have Digital Ocean deploy that image. That way there is no need for Digital Ocean to be running lint commands in the first place.
The goal is to restore the Digital Ocean (DO) pipeline while keeping the existing `lint` command that now includes `sqlfluff` calls. The assumption is that an Ubuntu Jammy image is being built and that when an Aptfile is present, DO will use it before or during a build. See https://docs.digitalocean.com/products/app-platform/reference/buildpacks/aptfile/ Hat tip to Dan Schultz for finding that documentation. Issue #1410
The goal is to restore the Digital Ocean (DO) deployment pipeline while keeping the existing `lint` command that now includes `sqlfluff` calls. The assumption is that an Ubuntu Jammy image is being built and that when an Aptfile is present, DO will use it before or during a build. The specific error we're getting at deploy time is this:

```
> postinstall
> [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt

The virtual environment was not created successfully because ensurepip is not
available. On Debian/Ubuntu systems, you need to install the python3-venv
package...
```

See https://docs.digitalocean.com/products/app-platform/reference/buildpacks/aptfile/

Hat tip to Daniel Schultz for finding that documentation. Issue #1410
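Under the Aptfile buildpack assumption, the likely fix is to list the missing Debian package in an `Aptfile` at the repository root. This is a sketch: the package name `python3-venv` is taken from the error message above, and the file placement is assumed from the DO documentation linked.

```
# Aptfile (repository root) -- one Debian/Ubuntu package per line.
# python3-venv provides ensurepip, which `python3 -m venv` requires.
python3-venv
```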
OK, #1416 partially worked in that I now see this in the build log:
But it still fails to find pip:
Perhaps we need
The previous attempt did not resolve the Digital Ocean deployment failure: `pip` is still not found. It appears the package from `Aptfile` was in fact installed, so explicitly install `pip` too. Issue #1410
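A sketch of the amended `Aptfile`, with `pip` added explicitly; the package name `python3-pip` is an assumption based on the Ubuntu Jammy repositories, not a confirmed fix.

```
# Aptfile -- install both the venv machinery and pip itself.
python3-venv
python3-pip
```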
#1417 was a more valiant effort, but it too failed:
See also https://askubuntu.com/questions/879437/ensurepip-is-disabled-in-debian-ubuntu-for-the-system-python specifically the comment about Debian 11 still having the issue years later.
Next step is to try deploying using a GH-built docker image, referencing GHCR from DO.
Prior to this commit, the Digital Ocean deployment pipeline would pull the new source code, re-run lint, re-run tests, build a docker image, and then deploy. This approach includes too much, does too much, and is entirely outside our CI/CD steps at the canonical source forge (GH). The problem is revealed when running the `lint` step at Digital Ocean: there is no Python environment. An attempt to add Python to the environment both fails and does not make much sense. There does not need to be a Python environment, nor any development environment for that matter, in the production deployment image. Only the already-linted, already-compiled, and already-tested code should go into production.

Instead of trying to remove steps from the DO pipeline, this commit restores a docker image build and push when a merge to main occurs. Note that a controversial token is not used in this step, and the inter-repository trigger that previously used such a token is not restored. The approach here is to build an artifact out of components that have already been linted, compiled, and tested, without including any development tools along the way. See the `npm prune --omit=dev` command. This does not perfectly meet the objective, but further improvements can be made in later commits, for example, making this step contingent on previous workflow steps and declaring (`uses`) the exact same deployment image in the lint and test workflows. On the DO App Platform side, there is an option to run a docker image rather than compiling and building one from source.

Other benefits come from this approach. The version of node used by DO was not clear until looking deep into logs or running a shell on the production runtime, and it could not be controlled. With the approach of running an existing docker image, the full runtime, including the node version, is known and controlled. A further benefit is that when an issue occurs in production that cannot easily be reproduced from source code, the exact image running in production can be pulled from the image repository and run in a local development environment. With this commit, the option of running a GHCR image on the DO App Platform becomes available because the image is publicly available.

Issue #1410 PhilanthropyDataCommons/deploy#118 PR #1236
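A minimal sketch of the build-and-push workflow described above, assuming GHCR as the registry and the standard `docker/login-action` and `docker/build-push-action` actions; the workflow filename and image tag are illustrative, not taken from the actual repository.

```yaml
# .github/workflows/build.yml (sketch)
name: build
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
```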
I created a new
Next step is to give permission and try again. After that, get it to automatically deploy new images. It succeeds with proper permission.
The new `deploy` workflow should trigger a deployment of the newly built (in the `build` workflow) image to Digital Ocean App Platform when several previous tasks such as `test` and `build` have succeeded. The `build` task is assumed to have updated the `latest` image tag, and therefore the deployment command called by `curl` here should deploy the image just created in most cases. There may still be a chance of interleaving workflow executions causing an unintended image to be deployed. Until a more robust solution is found, this may have to do. As of writing, no new code is delivered to production, so this commit is a step toward getting an automated deployment working in a test environment, which we expect to improve to deploy to production. Issue #1410
The correct form is like `${{ secrets.VARIABLE }}`. Issue #1410
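For reference, a sketch of the difference; the variable name is illustrative, not taken from the actual workflow.

```yaml
# Incorrect: the bare name is not expanded by GitHub Actions.
# env:
#   DO_API_TOKEN: secrets.DO_API_TOKEN

# Correct: use the ${{ }} expression syntax.
env:
  DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
```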
This one looks a bit more appropriate, actively developed: https://github.com/marketplace/actions/wait-other-jobs
This one looks more appropriate. Issue #1410
Set the environment more simply and correctly, pass secrets through the `env` as documented, and explicitly mark the URL in the `curl` call. Issue #1410
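A sketch of a deploy step under these constraints, passing secrets through `env` and marking the URL explicitly; the Digital Ocean API endpoint and all variable names here are assumptions for illustration, not taken from the actual workflow.

```yaml
- name: Deploy to test environment
  env:
    DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
    DO_APP_ID: ${{ secrets.DO_TEST_APP_ID }}
  run: |
    curl --fail --silent --show-error \
      --request POST \
      --header "Authorization: Bearer ${DO_API_TOKEN}" \
      --url "https://api.digitalocean.com/v2/apps/${DO_APP_ID}/deployments"
```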
Much, much closer now.
Do not attempt to hide the (non-secret) deployment URL; this makes it easier to troubleshoot. Also cause the script to fail when `curl` fails. Issue #1410
The variables should come through with this change. Issue #1410
This might make variable substitution work. Issue #1410
To limit the chances of interleaving workflow runs deploying an unintended version of PDC, condition the build/push step on successful completion of lint, test, and sdk tasks. Use names for steps for friendlier presentation in GH UIs. Use fuzzy versioning for actions to match our current practice. Issue #1410
Deploy to production when the call to deploy to test succeeds. Without this commit, the deployment actions only target the test environment. While this deployment pipeline does not (yet) validate that the test environment is working after deployment to test, it at least validates that the command to deploy succeeded via `needs: deploy-to-test-env`. Issue #1410
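The gating described above can be sketched with `needs:` in the workflow jobs; job names other than `deploy-to-test-env` are illustrative.

```yaml
jobs:
  build-and-push:
    needs: [lint, test, sdk]
    # build and push the image only after these jobs succeed
  deploy-to-test-env:
    needs: build-and-push
    # call the deployment API for the test environment
  deploy-to-production:
    needs: deploy-to-test-env
    # runs only if the test deployment command succeeded
```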
No trailing commas allowed. Issue #1410
The deployment pipeline is working again, with the improvement that Build is gated on the Lint and Test actions, Deploy is gated on the Lint, Test, and Build actions, and Deploy to production is gated on Deploy to test. I also confirmed that the fuzzy docker image version does seem to give us image upgrades when a new one is available. I compared
This reflects the recent change in the deployment style. The structure matches what is used as of this commit. Issue #1410
This ticket is to investigate the deployment failures on Jan 3 and Jan 5.