Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: polish api service shutdown and top-level context handling #237

Conversation

tychoish
Copy link
Contributor

@tychoish tychoish commented Jan 10, 2025

This makes shutdown, in response to SIGTERM, cleaner and more correct:

  • avoided creating a new context in several key places, notably, the base context handed to requests.
  • the context canceled when the signal is caught (SIGTERM).
  • the process will not exit until all request handlers have returned and any shutdown operations.

As an added bonus, I made some changes to error handling and logging which should remove the case of "process exited without logging anything useful".

There are still some aspects which could require more attention in a follow up PR:

  • I haven't looked at all the places where we either create a background context (should only really be done in tests (maybe) and in main(), but this requires more churn and should happen in a later PR.
  • I haven't audited all the places where we create go routines to make sure that we give them reasonable contexts and wait for them during shut down, if needed.

The changes for shutdown handling are pretty generic (though not implemented here. If needed later, we can either use stuff from one of my repos (if suitable) or factor the generic parts out to be used in other processes.

Description by Callstackai

This PR improves the API service shutdown handling and top-level context management. It ensures that the shutdown process is cleaner and more correct, avoiding unnecessary context creation and ensuring all request handlers complete before exiting. Additionally, it enhances error handling and logging.

Diagrams of code changes
sequenceDiagram
    participant Main
    participant APIStore
    participant Server
    participant Cleanup

    Main->>Main: Create root context
    Main->>Main: Setup signal handling (SIGTERM)
    Main->>APIStore: NewAPIStore(ctx)
    Main->>Server: NewGinServer(signalCtx, apiStore, swagger, port)
    
    par HTTP Server
        Server->>Server: ListenAndServe()
    and Signal Handling
        Main->>Main: Wait for shutdown signal
    end

    alt Signal Received
        Main->>Server: Shutdown()
        Main->>Cleanup: Execute cleanup operations
        Cleanup->>APIStore: Close()
        Note over APIStore: Close all clients:<br/>- Analytics<br/>- Posthog<br/>- Orchestrator<br/>- TemplateManager<br/>- Database
    end

    Main->>Main: Exit with status code
Loading
Files Changed
FileSummary
packages/api/internal/handlers/store.goRefactored the APIStore initialization to accept a context and improved error handling in the Close method.
packages/api/main.goUpdated the main function to handle shutdown signals more effectively and improved context management for the server.
packages/shared/pkg/db/client.goModified the Close method to return an error instead of ignoring it.
packages/shared/pkg/telemetry/otel.goUpdated the InitOTLPExporter function to accept a context and return a function that also accepts a context for cleanup.

This PR includes files in programming languages that we currently do not support. We have not reviewed files with the extensions .go. See list of supported languages.

@ValentaTomas ValentaTomas added the improvement Improvement for current functionality label Jan 13, 2025
@ValentaTomas ValentaTomas self-assigned this Jan 13, 2025
@jakubno
Copy link
Member

jakubno commented Jan 20, 2025

Hey @tychoish, I saw you started adding some changes in orchestrator and template-manager. Are those relevant to API part? Ideally we would have several smaller PRs rather than one PR.

@tychoish
Copy link
Contributor Author

@jakubno definitely can split it up: the change to the API wanted a changed to a shared package. There's no way to change the shared code and the calling code (in the API and orchestrator packages,) in different commits, but I can pull the shared change from the larger API changes. Regardless the orchestrator change is pretty small.

@jakubno
Copy link
Member

jakubno commented Jan 21, 2025

No problem with that, just making sure it has limited scope

Copy link
Contributor Author

@tychoish tychoish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR, is a superset of #248 (which can be merged at any time.)

Getting #248 out of the way will make this easier to review, but it's testable in its current form, will set this PR to draft while I test, but there's no further work to do here.

packages/api/main.go Show resolved Hide resolved
@tychoish tychoish marked this pull request as draft January 24, 2025 02:37
@ValentaTomas ValentaTomas assigned jakubno and unassigned ValentaTomas Jan 24, 2025
@tychoish tychoish marked this pull request as ready for review January 24, 2025 13:58
@tychoish
Copy link
Contributor Author

Status:

@tychoish tychoish requested a review from jakubno January 24, 2025 14:06
@tychoish
Copy link
Contributor Author

Update: I've thoroughly tested this and am confident that it's safe to merge and deploy.

@tychoish
Copy link
Contributor Author

Based on:

It seems like we should make this handle SIGINT in addition to SIGTERM.

SIGTERM is slightly more common for this in other non-nomad tools, so I think handling both as "graceful shutdown triggers" is correct.

packages/api/main.go Show resolved Hide resolved
@jakubno jakubno merged commit b4ca230 into e2b-dev:main Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement for current functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants