
chore(orchestrator): startup and remove containers based on dependencies #5509

Merged

merged 57 commits into aws:mainline from implement/graph/dependencies on Dec 19, 2023

Conversation

@KollaAdithya (Contributor) commented Nov 30, 2023

This is the final PR for Essential and DependsOn.
UX with the manifest config below:

name: wc2011
type: Load Balanced Web Service

image:
  build: ../../demo/web/Dockerfile

sidecars:
  nginx:
    essential: false
    depends_on:
      wc2011: healthy
  db:
    depends_on:
      wc2011: healthy
      nginx: success
[wc2011] WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[wc2011] 2023/12/06 23:33:00 Hello
[wc2011] 2023/12/06 23:33:00 Listening on port 80...
Waiting for container "nginx" dependencies: [wc2011->healthy]
✔ Successfully dependency container "dhoni-rfer-wc2011-wc2011" reached "healthy"
[nginx] WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[nginx] Exiting with code 0
Waiting for container "db" dependencies: [wc2011->healthy, nginx->success]
✔ Successfully dependency container "dhoni-rfer-wc2011-wc2011" reached "healthy"
✔ Successfully dependency container "dhoni-rfer-wc2011-nginx" reached "success"
[db] WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[db] 2023/12/06 23:34:03 Hello
[db] 2023/12/06 23:34:03 Listening on port 80...
[db] 2023/12/06 23:34:03 listen tcp :80: bind: address already in use

@KollaAdithya KollaAdithya requested a review from a team as a code owner November 30, 2023 01:43
@KollaAdithya KollaAdithya requested review from CaptainCarpensir and removed request for a team November 30, 2023 01:43
@KollaAdithya KollaAdithya marked this pull request as draft November 30, 2023 01:43
github-actions bot commented Nov 30, 2023

🍕 Here are the new binary sizes!

Name           New size (KiB)   Old size (KiB)   Delta (%)
macOS (amd)    56980            57312            ❤️ -0.58
macOS (arm)    58000            58332            ❤️ -0.57
linux (amd)    49964            50256            ❤️ -0.58
linux (arm)    49280            49540            ❤️ -0.52
windows (amd)  47088            47344            ❤️ -0.54

@KollaAdithya KollaAdithya changed the title chore: startup containers based on dependencies chore(orchestrator): startup containers based on dependencies Nov 30, 2023
@KollaAdithya KollaAdithya reopened this Nov 30, 2023
for {
	isRunning, err := o.docker.IsContainerRunning(ctx, id)
	switch {
	case err != nil:
		var errContainerExited *dockerengine.ErrContainerExited
		if errors.As(err, &errContainerExited) && !isEssential {
Contributor
do you know what ECS does when a non-essential container fails to start?

Contributor Author

If an essential container exits, the deployment circuit breaker will be triggered. But if a non-essential container exits, ECS will not trigger the circuit breaker and will continue to deploy the service.

This is what I see in the console for non-essential containers.

(Screenshot of the ECS console, 2023-12-15, 1:40 PM)

Contributor

yeah, I know ECS doesn't error out or anything if the non-essential container has started and exited. I am curious what happens if the container was never able to start. I think it is not super important for us though; just in case you knew off the top of your head (if you don't, don't bother testing it 😂!)

@@ -434,16 +464,19 @@ func (o *Orchestrator) stopTask(ctx context.Context, task Task) error {
}

// waitForContainerToStart blocks until the container specified by id starts.
-func (o *Orchestrator) waitForContainerToStart(ctx context.Context, id string) error {
+func (o *Orchestrator) waitForContainerToStart(ctx context.Context, id string, isEssential bool) error {
Contributor
ah, we take isEssential as a boolean parameter, instead of letting the callers swallow the exit error from a non-essential container - I assume - because all the callers of waitForContainerToStart want the same error handling, right? Essentially, we are passing a boolean parameter to avoid code duplication. Is this guess close to what you thought?

This makes sense to me, though I am wary of boolean parameters: among all the cons they have 😂, they make code harder to read. For example, it is not obvious what o.waitForContainerToStart(ctx, opts.ContainerName, true) means.
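
(A tiny hypothetical sketch of one way to make that call site self-describing - a named boolean type; containerKind and its constants are made-up names, not part of this PR:)

// containerKind makes the intent visible at the call site instead of a bare true/false.
type containerKind bool

const (
	essentialContainer    containerKind = true
	nonEssentialContainer containerKind = false
)

// o.waitForContainerToStart(ctx, opts.ContainerName, essentialContainer)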

Contributor Author

yeah, your guess is absolutely right 😆
I want to reduce the duplication where each caller has to handle the error accordingly.
I have two options in this case:

  1. Adjust the doc comment to make it clearer.
  2. Let each caller handle the error scenario of a non-essential container exiting.

Which one do you suggest among these two?

Contributor

After a brief search I think there are 3 calls to waitForContainerToStart, and only 2 of them need to handle the non-essential case, so the duplication is not bad. Therefore, if you ask me - I would probably have opted for 2. But I will leave the final decision to you 👍🏼

healthy, err := o.docker.IsContainerHealthy(ctx, ctrId)
if err != nil {
	if !isEssential {
		fmt.Printf("non-essential container %q is not healthy: %v\n", ctrId, err)
Contributor
nit nit: the container could be in a non-running state, or docker inspect could hit some unexpected error (neither of which indicates "unhealthiness" of the container):

Suggested change:
-fmt.Printf("non-essential container %q is not healthy: %v\n", ctrId, err)
+fmt.Printf("check health status for container %q: %v\n", ctrId, err)

The wrapped error returned from IsContainerHealthy is enough to tell the users whether it's because the container is indeed unhealthy, or it doesn't have health check, or other.
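
A rough sketch of that point, assuming IsContainerHealthy wraps its underlying cause with %w (illustrative only, not the exact code in this PR):

// Printing the wrapped error surfaces the real cause (unhealthy, no health
// check configured, or a docker inspect failure) without extra branching.
if _, err := o.docker.IsContainerHealthy(ctx, ctrId); err != nil {
	fmt.Printf("check health status for container %q: %v\n", ctrId, err)
}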

@Lou1415926 (Contributor) commented Dec 16, 2023

also... on second thought, if I have a manifest like this:

sidecars:
  container1:
    depends_on:
      container2: healthy
  container2:
    # not essential
    healthcheck: ...

And then container2 turns out to be unhealthy, should I really start container1? 🤔 It seems from the implementation that we will just log and start container1. Curious to hear your thoughts!

Contributor Author

I tested this scenario. ECS tries to deploy container2 and it becomes unhealthy. ECS does not try to start container1, nor is the circuit breaker triggered. CloudFormation just errors out after 3 hours:

Resource timed out waiting for completion

So essentiality and depends_on are two different concerns.
I changed the logic to completely stop and remove all the containers even if a non-essential container becomes unhealthy.

name, state := name, state
eg.Go(func() error {
	ctrId := o.containerID(name)
	isEssential := definitions[name].IsEssential
@Lou1415926 (Contributor) commented Dec 16, 2023

To me, it's a bit counter-intuitive that a container would care about whether the containers it depends on are essential 💭 For example, if depends_on specifies that container A can start only after container B has started, why would container A care whether container B is essential?

It seems like we are using isEssential here mainly to decide:

  1. whether to error out from waitForContainerToStart
  2. whether to log.Successf or to just fmt.Printf
  3. whether to error out from IsContainerHealthy

For 1,

return o.waitForContainerToStart(ctx, o.containerID(containerName), a.task.Containers[containerName].IsEssential)

would ⬆️ have been a better place to handle whether to error out or not depending on the essentiality. Then we would swallow the ErrContainerExited error here (regardless of whether the container is essential), because if the container has exited, that means it must have been started.

For 2, I think

return o.waitForContainerToStart(ctx, o.containerID(containerName), a.task.Containers[containerName].IsEssential)

again is the better place to have that logic.

For 3, I have the same question as in https://github.com/aws/copilot-cli/pull/5509/files#r1428611649.

Essentially, I feel like we are mixing two lines of logic here:

  1. To complain or just log when an essential/non-essential container is not functioning
  2. To determine whether to start a container based on its dependencies

The function that handles 2, I think, typically should not care about the essentiality of the containers. A sketch of what this could look like follows below.

Let me know what you think!
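
A rough sketch of that caller-side handling, assuming waitForContainerToStart drops the boolean and returns the exit error unchanged (illustrative only):

// The caller decides what an exit means based on the container's essentiality;
// waitForContainerToStart itself stays dependency-agnostic.
err := o.waitForContainerToStart(ctx, o.containerID(containerName))
var exited *dockerengine.ErrContainerExited
if errors.As(err, &exited) && !a.task.Containers[containerName].IsEssential {
	// A non-essential container that exited still counts as "started".
	fmt.Printf("non-essential container %q exited: %v\n", containerName, err)
	return nil
}
return err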

Contributor Author

Thank you for pointing this out. Yeah, essentiality and depends_on are different concerns.
I changed the logic to reflect the above.

@iamhopaul123 (Contributor) left a comment

Awesome! LGTM overall

@Lou1415926 (Contributor) left a comment

Two more questions and we are good to go!!!! Amazing work Adi!

@Lou1415926 (Contributor) left a comment

awesome!

@@ -475,6 +475,7 @@ func (o *Orchestrator) waitForContainerToStart(ctx context.Context, id string) e
 	case err != nil:
 		return fmt.Errorf("check if %q is running: %w", id, err)
 	case isRunning:
+		log.Successf("Successfully started container %s\n", id)
Contributor

Not for this PR, but some time later after this is merged -

I feel like it'd be easier to refactor the code in the future if we add a logger to Orchestrator:

type Orchestrator struct {
	logger log.Logger
}

func New() *Orchestrator {
	return &Orchestrator{logger: log.New()}
}

func (o *Orchestrator) waitForContainerToStart(ctx context.Context, id string) error {
	o.logger.Successf("...")
}

This would make it easier for us to a) mute the logger by creating a no-op logger in New(), b) create a logger that logs to an open file instead of stdout/stderr, and c) refactor the code, because we could easily locate all the logging via "Go to reference" on the logger (vs. now, where we need to do a global search on the keyword "log").
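
A minimal sketch of that idea, with a hypothetical Logger interface and no-op implementation (none of these names exist in the codebase today):

// Logger is the narrow interface the Orchestrator would log through.
type Logger interface {
	Successf(format string, args ...any)
}

// noopLogger discards all output; handy for muting the Orchestrator in tests.
type noopLogger struct{}

func (noopLogger) Successf(string, ...any) {}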

@mergify mergify bot merged commit be03ed1 into aws:mainline Dec 19, 2023
12 checks passed
@KollaAdithya KollaAdithya deleted the implement/graph/dependencies branch December 20, 2023 22:05