Production checklist v1 #691
base: master
✅ Deploy Preview for seqera-docs ready!
Signed-off-by: Justine Geffen <[email protected]>
@gavinelder, would really appreciate your input on the infra section. The goal is to keep iterating on this document and this is v1.
Stylistic and language changes
Signed-off-by: Justine Geffen <[email protected]>
Infrastructural requirements differ widely based on the workload you’re expecting.
To begin, build a proof of concept using the recommendations below to establish a baseline for your capacity requirements. Once you’re ready to move to production, account for the increased workload you expect. Here are some starting points for estimating compute and database requirements.
Do we have any examples we can provide as a good template for getting started?
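One possible starting template, sketched in Python: take the capacity measured during the proof of concept and scale it to the expected production workload. The linear scaling, the 25% headroom, the function name `scale_capacity`, and the example figures are all assumptions for illustration, not Seqera sizing guidance.

```python
import math


def scale_capacity(baseline_units: float, baseline_runs: int,
                   production_runs: int, headroom: float = 0.25) -> int:
    """Scale a measured proof-of-concept capacity figure to production.

    Assumes capacity grows linearly with the number of concurrent runs,
    which is a simplification; `headroom` adds a safety margin on top.
    """
    if baseline_runs <= 0:
        raise ValueError("baseline_runs must be positive")
    scaled = baseline_units * production_runs * (1 + headroom) / baseline_runs
    return math.ceil(scaled)


# Hypothetical figures: the PoC needed 8 vCPUs for 10 concurrent runs,
# and production is expected to handle 50 concurrent runs.
print(scale_capacity(8, 10, 50))
```

The same function can be applied to any measured unit (vCPUs, memory in GB, database connections), as long as the baseline and the result use the same unit.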
These autoscale for pipeline runs, but sizing depends on the workload and can vary significantly with the number of pipelines and concurrent processes you plan to run. See the [Azure autoscaling documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-get-started) for information about scaling in Azure.
## Spot instance retry strategy
Would we provide an example here or link to one?
The tutorial we link to does cover the retry strategy but you make a good point. Maybe this would be better as a list with links to sections within the tutorial.
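Until the tutorial links are in place, the general idea can be illustrated with a generic retry-with-exponential-backoff sketch in Python. This is not the configuration from the linked tutorial and not how Nextflow implements retries. The function name `run_with_retries` is hypothetical, and treating exit codes 137 and 143 (the conventional shell codes for SIGKILL and SIGTERM) as signs of a reclaimed spot instance is an assumption; the real codes depend on the executor and cloud provider.

```python
import time
from typing import Callable

# Assumption: exit codes treated as "instance was reclaimed, try again".
# 137 = 128 + SIGKILL, 143 = 128 + SIGTERM.
RETRYABLE_EXIT_CODES = {137, 143}


def run_with_retries(task: Callable[[], int], max_retries: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `task` and retry it while it exits with a retryable code.

    Waits base_delay * 2**attempt seconds between attempts and returns
    the last exit code once the task succeeds, fails with a
    non-retryable code, or the retry budget is used up.
    """
    attempt = 0
    while True:
        code = task()
        if code not in RETRYABLE_EXIT_CODES or attempt >= max_retries:
            return code
        sleep(base_delay * 2 ** attempt)
        attempt += 1
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. Errors that are not caused by interruption are returned immediately rather than retried, so genuine pipeline failures surface quickly.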
## Seqera Platform limitations
When cancelling large runs, check that all jobs were terminated in your compute environment. Large runs sometimes leave behind "zombie jobs" (jobs that keep running after the run is cancelled) because Nextflow is killed before it can cancel all of them, and these leaked jobs can lead to significant cost overruns.
How can I do this as a user? How do I know what a zombie job is?
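One way to make this concrete for users is a small check over the job list exported from the compute environment. The Python sketch below flags jobs that belong to a cancelled run but are still in an active state. The `Job` record, the `find_zombie_jobs` helper, and the idea that job names contain a run-specific tag are assumptions for illustration; the status names are modelled on AWS Batch and would differ on other platforms.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: statuses meaning a job is still queued or consuming resources
# (names modelled on AWS Batch job statuses).
ACTIVE_STATUSES = {"SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING"}


@dataclass(frozen=True)
class Job:
    job_id: str
    name: str
    status: str


def find_zombie_jobs(jobs: Iterable[Job], run_tag: str) -> List[Job]:
    """Return jobs from a cancelled run that are still active.

    `run_tag` is a hypothetical identifier assumed to appear in the
    name of every job submitted by the cancelled run.
    """
    return [j for j in jobs
            if run_tag in j.name and j.status in ACTIVE_STATUSES]


# Hypothetical job listing after cancelling the run tagged "run-abc".
jobs = [
    Job("1", "nf-run-abc-align", "RUNNING"),
    Job("2", "nf-run-abc-sort", "FAILED"),
    Job("3", "nf-run-xyz-align", "RUNNING"),
]
for job in find_zombie_jobs(jobs, "run-abc"):
    print(job.job_id, job.name, job.status)
```

Any job this reports would need to be terminated manually in the compute environment.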
WIP first draft of production checklist to cover infrastructure, retry strategy, security, and Platform limitations.
Outstanding
@netlify /platform-cloud/getting-started/production-checklist