
Deployment failures 2025-01-03 and 2025-01-05 #1410

Closed
bickelj opened this issue Jan 6, 2025 · 16 comments

bickelj commented Jan 6, 2025

This ticket is to investigate the deployment failures on Jan 3 and Jan 5.

@bickelj bickelj self-assigned this Jan 6, 2025
@bickelj bickelj changed the title from "Deployment failure 2025-01-03" to "Deployment failures 2025-01-03 and 2025-01-05" Jan 6, 2025

bickelj commented Jan 6, 2025

First one:

[2025-01-03 21:28:04] │ -----> Installing dependencies
[2025-01-03 21:28:04] │        Installing node modules
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │        > postinstall
[2025-01-03 21:28:12] │        > [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │        The virtual environment was not created successfully because ensurepip is not
[2025-01-03 21:28:12] │        available.  On Debian/Ubuntu systems, you need to install the python3-venv
[2025-01-03 21:28:12] │        package using the following command.
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │            apt install python3.10-venv
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │        You may need to use sudo with that command.  After installing the python3-venv
[2025-01-03 21:28:12] │        package, recreate your virtual environment.
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │        Failing command: /workspace/venv/bin/python3
[2025-01-03 21:28:12] │        
[2025-01-03 21:28:12] │        sh: 1: ./venv/bin/pip: not found
[2025-01-03 21:28:12] │        npm error code 127
[2025-01-03 21:28:12] │        npm error path /workspace
[2025-01-03 21:28:12] │        npm error command failed
[2025-01-03 21:28:12] │        npm error command sh -c [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-03 21:28:12] │        npm error A complete log of this run can be found in: /tmp/npmcache.sUAS8/_logs/2025-01-03T21_28_04_967Z-debug-0.log

Second one:

[2025-01-06 02:53:42] │ -----> Installing dependencies
[2025-01-06 02:53:42] │        Installing node modules
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │        > postinstall
[2025-01-06 02:53:50] │        > [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │        The virtual environment was not created successfully because ensurepip is not
[2025-01-06 02:53:50] │        available.  On Debian/Ubuntu systems, you need to install the python3-venv
[2025-01-06 02:53:50] │        package using the following command.
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │            apt install python3.10-venv
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │        You may need to use sudo with that command.  After installing the python3-venv
[2025-01-06 02:53:50] │        package, recreate your virtual environment.
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │        Failing command: /workspace/venv/bin/python3
[2025-01-06 02:53:50] │        
[2025-01-06 02:53:50] │        sh: 1: ./venv/bin/pip: not found
[2025-01-06 02:53:50] │        npm error code 127
[2025-01-06 02:53:50] │        npm error path /workspace
[2025-01-06 02:53:50] │        npm error command failed
[2025-01-06 02:53:50] │        npm error command sh -c [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-06 02:53:50] │        npm error A complete log of this run can be found in: /tmp/npmcache.jgRzj/_logs/2025-01-06T02_53_42_706Z-debug-0.log
[2025-01-06 02:53:50] │ 
[2025-01-06 02:53:50] │ -----> Build failed
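
Side note on the failure shape: the postinstall hook joins its two commands with `;`, so the pip install runs even when venv creation fails, which is why the `ensurepip` error is immediately followed by `./venv/bin/pip: not found` and npm's exit code 127. A stricter variant (a sketch only, not what the repo uses) would stop at the first failure:

```sh
# Current hook, from the log above: the `;` means pip runs regardless.
[ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt

# Sketch: only run pip when the venv exists or was just created successfully.
[ -d "venv" ] || python3 -m venv venv && ./venv/bin/pip install -r sqlfluff/requirements.txt
```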


bickelj commented Jan 6, 2025

It appears we need an additional package in the DO build environment.

#321 #1399


bickelj commented Jan 6, 2025

It's not clear how to add packages to the environment or where the environment comes from in the first place. For example, `.do/deploy.template.yaml`, the obvious first place to look, has nothing relevant. Poking around the DO interfaces, such as Overview and Settings for the service app, also does not show where the environment comes from or how to add a package to it. I vaguely recall something about buildpacks. Looking through the docs next.


bickelj commented Jan 7, 2025

@slifty Yesterday you suggested you might take a look at this, so marking it that way here too.

@bickelj bickelj assigned slifty and unassigned bickelj Jan 7, 2025

bickelj commented Jan 7, 2025

My preferred solution is to build the Docker image ourselves and have Digital Ocean deploy that image. That way there is no need for Digital Ocean to be running lint commands in the first place.

@slifty slifty added this to Phase 5 Jan 7, 2025
@slifty slifty moved this to Todo in Phase 5 Jan 7, 2025
bickelj added a commit that referenced this issue Jan 7, 2025
The goal is to restore the Digital Ocean (DO) pipeline while keeping
the existing `lint` command that now includes `sqlfluff` calls. The
assumption is that an Ubuntu Jammy image is being built and that when
an Aptfile is present, DO will use it before or during a build.

See https://docs.digitalocean.com/products/app-platform/reference/buildpacks/aptfile/

Hat tip to Dan Schultz for finding that documentation.

Issue #1410
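
Per that documentation, the buildpack reads a file named `Aptfile` at the repository root, one Debian package name per line. A minimal sketch of creating such a file for this error (the actual contents live in the commit, not reproduced here):

```sh
# Hypothetical Aptfile: the DO Aptfile buildpack installs each listed
# Debian package before the build step runs.
cat > Aptfile <<'EOF'
python3.10-venv
EOF
```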
bickelj added a commit that referenced this issue Jan 7, 2025
The goal is to restore the Digital Ocean (DO) deployment pipeline
while keeping the existing `lint` command that now includes `sqlfluff`
calls. The assumption is that an Ubuntu Jammy image is being built and
that when an Aptfile is present, DO will use it before or during a build.

The specific error we're getting at deploy time is this:

> postinstall
> > [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
> The virtual environment was not created successfully because ensurepip is not
> available.  On Debian/Ubuntu systems, you need to install the python3-venv
> package...

See https://docs.digitalocean.com/products/app-platform/reference/buildpacks/aptfile/

Hat tip to Daniel Schultz for finding that documentation.

Issue #1410

bickelj commented Jan 7, 2025

OK, #1416 partially worked in that I now see this in the build log:

[2025-01-07 21:24:13] │ => Installing apt packages with dpkg
[2025-01-07 21:24:13] │ libpython3.10-minimal_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:24:13] │ libpython3.10-stdlib_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:24:13] │ python3-pip-whl_22.0.2+dfsg-1ubuntu0.5_all.deb
[2025-01-07 21:24:13] │ python3-setuptools-whl_59.6.0-1.2ubuntu0.22.04.2_all.deb
[2025-01-07 21:24:13] │ python3-venv_3.10.6-1~22.04.1_amd64.deb
[2025-01-07 21:24:13] │ python3.10-minimal_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:24:13] │ python3.10-venv_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:24:13] │ python3.10_3.10.12-1~22.04.7_amd64.deb

But it still fails to find pip:


[2025-01-07 21:24:25] │        > postinstall
[2025-01-07 21:24:25] │        > [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-07 21:24:25] │        
[2025-01-07 21:24:25] │        The virtual environment was not created successfully because ensurepip is not
[2025-01-07 21:24:25] │        available.  On Debian/Ubuntu systems, you need to install the python3-venv
[2025-01-07 21:24:25] │        package using the following command.
[2025-01-07 21:24:25] │        
[2025-01-07 21:24:25] │            apt install python3.10-venv
[2025-01-07 21:24:25] │        
[2025-01-07 21:24:25] │        You may need to use sudo with that command.  After installing the python3-venv
[2025-01-07 21:24:25] │        package, recreate your virtual environment.
[2025-01-07 21:24:25] │        
[2025-01-07 21:24:25] │        Failing command: /workspace/venv/bin/python3
[2025-01-07 21:24:25] │        
[2025-01-07 21:24:25] │        sh: 1: ./venv/bin/pip: not found
[2025-01-07 21:24:25] │        npm error code 127
[2025-01-07 21:24:25] │        npm error path /workspace
[2025-01-07 21:24:25] │        npm error command failed
[2025-01-07 21:24:25] │        npm error command sh -c [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-07 21:24:25] │        npm error A complete log of this run can be found in: /tmp/npmcache.hfCGp/_logs/2025-01-07T21_24_17_791Z-debug-0.log
[2025-01-07 21:24:25] │ 
[2025-01-07 21:24:25] │ -----> Build failed

Perhaps we need `python3-pip` too. Or perhaps that was all we needed in the first place, given that the error above is identical.
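
A couple of diagnostics that could narrow this down (hypothetical commands; we did not capture this output from the build container):

```sh
# Which python3 does the build use, and can it import ensurepip at all?
command -v python3
python3 -c 'import ensurepip; print(ensurepip.version())'

# Caveat: apt-style buildpacks often unpack packages under the app directory
# rather than /usr, so the system python3 may never see the new files.
```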

bickelj added a commit that referenced this issue Jan 7, 2025
The previous attempt did not resolve a Digital Ocean deployment
failure, `pip` is still not found. It appears the package from
`Aptfile` was in fact installed so explicitly install `pip` too.

Issue #1410

bickelj commented Jan 7, 2025

#1417 was a more valiant effort, but it too failed:

[2025-01-07 21:43:03] │ => Installing apt packages with dpkg
[2025-01-07 21:43:03] │ javascript-common_11+nmu1_all.deb
[2025-01-07 21:43:03] │ libjs-jquery_3.6.0+dfsg+~3.5.13-1_all.deb
[2025-01-07 21:43:03] │ libjs-sphinxdoc_4.3.2-1_all.deb
[2025-01-07 21:43:03] │ libjs-underscore_1.13.2~dfsg-2_all.deb
[2025-01-07 21:43:03] │ libpython3-dev_3.10.6-1~22.04.1_amd64.deb
[2025-01-07 21:43:03] │ libpython3.10-dev_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ libpython3.10-minimal_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ libpython3.10-stdlib_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ libpython3.10_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ python3-dev_3.10.6-1~22.04.1_amd64.deb
[2025-01-07 21:43:03] │ python3-pip-whl_22.0.2+dfsg-1ubuntu0.5_all.deb
[2025-01-07 21:43:03] │ python3-pip_22.0.2+dfsg-1ubuntu0.5_all.deb
[2025-01-07 21:43:03] │ python3-setuptools-whl_59.6.0-1.2ubuntu0.22.04.2_all.deb
[2025-01-07 21:43:03] │ python3-setuptools_59.6.0-1.2ubuntu0.22.04.2_all.deb
[2025-01-07 21:43:03] │ python3-venv_3.10.6-1~22.04.1_amd64.deb
[2025-01-07 21:43:03] │ python3-wheel_0.37.1-2ubuntu0.22.04.1_all.deb
[2025-01-07 21:43:03] │ python3.10-dev_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ python3.10-minimal_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ python3.10-venv_3.10.12-1~22.04.7_amd64.deb
[2025-01-07 21:43:03] │ python3.10_3.10.12-1~22.04.7_amd64.deb
...
[2025-01-07 21:43:08] │ -----> Installing dependencies
[2025-01-07 21:43:08] │        Installing node modules
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │        > postinstall
[2025-01-07 21:43:19] │        > [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │        The virtual environment was not created successfully because ensurepip is not
[2025-01-07 21:43:19] │        available.  On Debian/Ubuntu systems, you need to install the python3-venv
[2025-01-07 21:43:19] │        package using the following command.
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │            apt install python3.10-venv
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │        You may need to use sudo with that command.  After installing the python3-venv
[2025-01-07 21:43:19] │        package, recreate your virtual environment.
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │        Failing command: /workspace/venv/bin/python3
[2025-01-07 21:43:19] │        
[2025-01-07 21:43:19] │        sh: 1: ./venv/bin/pip: not found
[2025-01-07 21:43:19] │        npm error code 127
[2025-01-07 21:43:19] │        npm error path /workspace
[2025-01-07 21:43:19] │        npm error command failed
[2025-01-07 21:43:19] │        npm error command sh -c [ ! -d "venv" ] && python3 -m venv venv; ./venv/bin/pip install -r sqlfluff/requirements.txt
[2025-01-07 21:43:19] │        npm error A complete log of this run can be found in: /tmp/npmcache.oNLY8/_logs/2025-01-07T21_43_08_648Z-debug-0.log
[2025-01-07 21:43:19] │ 
[2025-01-07 21:43:19] │ -----> Build failed


bickelj commented Jan 7, 2025

See also https://askubuntu.com/questions/879437/ensurepip-is-disabled-in-debian-ubuntu-for-the-system-python, specifically the comment about Debian 11 still having the issue years later.
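
One workaround along the lines discussed there is to create the venv without pip and bootstrap pip into it afterward (a sketch, not something adopted here):

```sh
# Skip ensurepip entirely, then fetch pip from the official bootstrap script.
python3 -m venv --without-pip venv
curl -sS https://bootstrap.pypa.io/get-pip.py | ./venv/bin/python3
./venv/bin/pip install -r sqlfluff/requirements.txt
```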


bickelj commented Jan 7, 2025

Next step is to try deploying using a GH-built docker image, referencing GHCR from DO.
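
The rough shape of that flow, using the image name that appears later in this thread (the tag and login details are placeholders):

```sh
# In CI: build the image and push it to GHCR.
docker build -t ghcr.io/philanthropydatacommons/service:latest .
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
docker push ghcr.io/philanthropydatacommons/service:latest
```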

bickelj added a commit that referenced this issue Jan 8, 2025
Without this commit, the Digital Ocean deployment pipeline would pull
the new source code, re-run lint, re-run tests, build a docker image,
and then deploy. The problem with this approach is that it includes
too much, does too much, and is entirely outside our CI/CD steps at
the canonical source forge (GH). The problem is revealed when running
the `lint` step at Digital Ocean: no Python environment. But then an
attempt to add Python to the environment both fails and does not make
much sense. There doesn't need to be a Python environment, nor any
development environment for that matter, present in the production
deployment image. Only the already-linted, already-compiled, and
already-tested code should go into production. Instead of trying to
remove steps from the DO pipeline, this commit restores a docker image
build and push when a merge to main occurs. Note that a controversial
token is not used in this step, and the inter-repository trigger that
previously used such a token is not restored.

The approach here is to build an artifact out of components that have
already been linted, compiled, and tested, and do not include any
development tools along the way. See the `npm prune --omit=dev`
command. Note that this does not perfectly meet the objective but
further improvements can be done in later commits, for example, making
this step contingent on previous workflow steps and declaring, via
`uses`, the exact same deployment image in the lint and test workflows.

On the DO App Platform side, there is an option to run a docker image
rather than compiling and building one from source. Other benefits
come from this approach. The version of node used by DO was not clear
until looking deep into logs or running a shell on the production
runtime. The version of node was not able to controlled, either. With
the approach of running an existing docker image, the full runtime,
including node version, is known and controlled. A further benefit is
when an issue occurs in production that cannot be easily reproduced
with source code, the exact image running in production can be run on
a local development environment by pulling from the image repository.

Issue #1410
PhilanthropyDataCommons/deploy#118
PR #1236
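
The `npm prune --omit=dev` step mentioned above, in context (illustrative ordering, not the repo's actual Dockerfile):

```sh
# Install everything (including dev deps), build, then strip dev-only
# packages so they never reach the production image.
npm ci
npm run build
npm prune --omit=dev
```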
bickelj added a commit that referenced this issue Jan 8, 2025
bickelj added a commit that referenced this issue Jan 8, 2025
Prior to this commit, the Digital Ocean deployment pipeline would pull
the new source code, re-run lint, re-run tests, build a docker image,
and then deploy. The problem with this approach is that it includes
too much, does too much, and is entirely outside our CI/CD steps at
the canonical source forge (GH). The problem is revealed when running
the `lint` step at Digital Ocean: no Python environment. But then an
attempt to add Python to the environment both fails and does not make
much sense. There doesn't need to be a Python environment, nor any
development environment for that matter, present in the production
deployment image. Only the already-linted, already-compiled, and
already-tested code should go into production. Instead of trying to
remove steps from the DO pipeline, this commit restores a docker image
build and push when a merge to main occurs. Note that a controversial
token is not used in this step, and the inter-repository trigger that
previously used such a token is not restored.

The approach here is to build an artifact out of components that have
already been linted, compiled, and tested, and do not include any
development tools along the way. See the `npm prune --omit=dev`
command. Note that this does not perfectly meet the objective but
further improvements can be done in later commits, for example, making
this step contingent on previous workflow steps and declaring, via
`uses`, the exact same deployment image in the lint and test workflows.

On the DO App Platform side, there is an option to run a docker image
rather than compiling and building one from source. Other benefits
come from this approach. The version of node used by DO was not clear
without looking deep into logs or running a shell on the production
runtime, and the version of node could not be controlled, either. With
the approach of running an existing docker image, the full runtime,
including node version, is known and controlled. A further benefit is
that when an issue occurs in production that cannot be easily reproduced
from source code, the exact image running in production can be run in
a local development environment by pulling it from the image repository.

With this commit, the option of running a GHCR image on the DO App
Platform becomes available because the image is publicly accessible.

Issue #1410
PhilanthropyDataCommons/deploy#118
PR #1236
bickelj added a commit that referenced this issue Jan 8, 2025
bickelj added a commit that referenced this issue Jan 8, 2025
bickelj added a commit that referenced this issue Jan 8, 2025

bickelj commented Jan 8, 2025

I created a new pdc DB in our existing test database cluster on DO and added a new pdc user, but apparently I didn't grant ownership, so the first deploy failed:

[2025-01-08 18:37:29] {"level":30,"time":1736361449806,"pid":2,"hostname":"pdc-service-test-ghcr-image-558dd7b868-r9sxz","source":"/opt/philanthropy-data-commons/server/dist/database/migrate.js","msg":"Error while using lock: Migration failed. Reason: An error occurred running 'create-migrations-table'. Rolled back this migration. No further migrations were run. Reason: permission denied for database pdc"}

Next step is to grant the permission and try again. After that, get it to deploy new images automatically.

It succeeds with the proper permissions.
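
For the record, granting ownership amounts to something like this (the exact statement is an assumption; role and database names as above):

```sh
# Run as the cluster admin against the DO database cluster.
psql "$ADMIN_DATABASE_URL" -c 'ALTER DATABASE pdc OWNER TO pdc;'
```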

bickelj added a commit that referenced this issue Jan 9, 2025
The new `deploy` workflow should trigger a deployment of the newly
built (in the `build` workflow) image to Digital Ocean App Platform
when several previous tasks such as `test` and `build` succeeded.

The `build` task is assumed to have updated the `latest` image tag and
therefore the deployment command called by `curl` here should deploy
the image just created in most cases. There may still be a chance of
interleaving workflow executions causing an unintended image to be
deployed. Until a more robust solution is found, this may have to do.

The situation as of this writing is that no new code is being delivered
to production, so this commit is a step toward getting an automated
deployment working in a test environment; we expect to improve it to
deploy to production.

Issue #1410
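
The deployment command is a `curl` call against the DO App Platform API; a sketch of its shape (app ID and token are placeholders, and the exact flags evolved in the commits below):

```sh
# Trigger a new deployment of an existing App Platform app.
curl -fsS -X POST \
  -H "Authorization: Bearer ${DIGITALOCEAN_ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"force_build": false}' \
  "https://api.digitalocean.com/v2/apps/${APP_ID}/deployments"
```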
bickelj added a commit that referenced this issue Jan 9, 2025
The correct form is like `${{ secrets.VARIABLE }}`.

Issue #1410

bickelj commented Jan 9, 2025

This one looks a bit more appropriate and is actively developed: https://github.com/marketplace/actions/wait-other-jobs

bickelj added a commit that referenced this issue Jan 9, 2025
This one looks more appropriate.

Issue #1410
bickelj added a commit that referenced this issue Jan 9, 2025
bickelj added a commit that referenced this issue Jan 9, 2025
Set the environment more simply and correctly, pass secrets through
the `env` as documented, and explicitly mark the URL in the `curl` call.

Issue #1410

bickelj commented Jan 9, 2025

Much, much closer now.

bickelj added a commit that referenced this issue Jan 10, 2025
Do not attempt to hide the (non-secret) deployment URL. This makes it
easier to troubleshoot. Also cause the script to fail when curl fails.

Issue #1410
bickelj added a commit that referenced this issue Jan 10, 2025
The variables should come through with this change.

Issue #1410
bickelj added a commit that referenced this issue Jan 10, 2025
This might make variable substitution work.

Issue #1410
bickelj added a commit that referenced this issue Jan 13, 2025
To limit the chances of interleaving workflow runs deploying an
unintended version of PDC, condition the build/push step on successful
completion of lint, test, and sdk tasks.

Use names for steps for friendlier presentation in GH UIs.

Use fuzzy versioning for actions to match our current practice.

Issue #1410
bickelj added a commit that referenced this issue Jan 13, 2025
bickelj added a commit that referenced this issue Jan 13, 2025
Deploy to production when the call to deploy to test succeeds. Without
this commit, the deployment actions only target the test environment.

While this deployment pipeline does not (yet) validate that the test
environment is working after deployment to test, it at least validates
that the command to deploy succeeded via `needs: deploy-to-test-env`.

Issue #1410
bickelj added a commit that referenced this issue Jan 13, 2025
bickelj added a commit that referenced this issue Jan 13, 2025
bickelj added a commit that referenced this issue Jan 13, 2025
bickelj added a commit that referenced this issue Jan 14, 2025
No trailing commas allowed.

Issue #1410

bickelj commented Jan 14, 2025

The deployment pipeline is working again, with improvements: Build is gated on the Lint and Test actions, Deploy is gated on the Lint, Test, and Build actions, and Deploy to production is gated on Deploy to test.

I also confirmed that the fuzzy Docker image version does seem to give us image upgrades when a new one is available.

I compared `docker history ghcr.io/philanthropydatacommons/service:20250114-c51955d` with `docker history ghcr.io/philanthropydatacommons/service:20250109-7239bb3`, in light of the knowledge that "22-debian-12 Last pushed 3 days ago by bitnamirobot", and saw a difference in that base image.
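
For anyone repeating that check, a compact way to confirm the base image changed between the two tags (sketch; `-q` prints layer IDs only):

```sh
docker history -q ghcr.io/philanthropydatacommons/service:20250109-7239bb3 > old.txt
docker history -q ghcr.io/philanthropydatacommons/service:20250114-c51955d > new.txt
diff old.txt new.txt
```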

@bickelj bickelj closed this as completed Jan 14, 2025
@slifty slifty moved this from In Progress to Done in Phase 5 Jan 14, 2025
@slifty slifty moved this from Done to Done & Cleared in Phase 5 Jan 14, 2025
bickelj added a commit that referenced this issue Jan 15, 2025
This reflects the recent change in the deployment style. The structure
matches what is used as of this commit.

Issue #1410
bickelj added a commit that referenced this issue Jan 16, 2025