Production checklist v1 #691

justinegeffen · 2025-07-03T21:22:02Z

WIP first draft of production checklist to cover infrastructure, retry strategy, security, and Platform limitations.

Outstanding

- IAM policies content (to be added to Platform docs and linked)
- Platform limitations section expansion
- Security recommendations

@netlify /platform-cloud/getting-started/production-checklist

netlify · 2025-07-03T21:22:06Z

✅ Deploy Preview for seqera-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `91a145a` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/seqera-docs/deploys/68779d67823188000807c74c |
| 😎 Deploy Preview | https://deploy-preview-691--seqera-docs.netlify.app/platform-cloud/getting-started/production-checklist |


justinegeffen · 2025-07-11T13:57:45Z

@gavinelder, would really appreciate your input on the infra section. The goal is to keep iterating on this document and this is v1.


MichaelTansiniSeqera · 2025-07-15T08:37:31Z

platform-cloud/docs/getting-started/production-checklist.md

+
+Infrastructural requirements differ widely based on the workload you’re expecting. 
+
+To begin with, build out a proof of concept using the below recommendations, to create a baseline of your capacity requirements. Once you’re ready to move to production, take into account the increased workload you’d expect. Here are some starting points for estimating compute and database requirements.


Do we have any examples we can provide as a good template for getting started?

MichaelTansiniSeqera · 2025-07-15T09:31:58Z

platform-cloud/docs/getting-started/production-checklist.md

+
+These autoscale for pipeline runs, but the sizing recommendation will be based on the workload and can vary significantly based on the number of pipelines, and number of concurrent processes, you have in mind. Consult the [Azure autoscaling documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-get-started) for information about scaling in Azure.
+
+## Spot instance retry strategy


Would we provide an example here or link to one?

justinegeffen · 2025-07-16

The tutorial we link to does cover the retry strategy, but you make a good point. Maybe this would be better as a list with links to sections within the tutorial.

MichaelTansiniSeqera · 2025-07-15T09:32:34Z

platform-cloud/docs/getting-started/production-checklist.md

+
+## Seqera Platform limitations
+
+When cancelling large runs, make sure to check that all jobs were killed in your compute environment (ZOMBIE JOBS). Large runs sometimes leak jobs because Nextflow is killed before it can cancel all of them, which can lead to significant cost overruns.


How can I do this as a user? How do I know what a zombie job is?
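One possible way to approach the question above, as a minimal sketch rather than a documented Platform feature: after cancelling a run, export the job listing from the compute environment and look for jobs from that run that are still in an active state. The status names below mirror AWS Batch job states, which is an assumption (the checklist does not name a scheduler here), and the `jobId`/`jobName`/`status` record shape, the `run_prefix` naming convention, and the `find_zombie_jobs` helper are all hypothetical.

```python
# Statuses in which a job may still be consuming (or about to consume) compute.
# Assumption: these mirror AWS Batch job states; adjust for your scheduler.
ACTIVE_STATUSES = {"SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING"}


def find_zombie_jobs(jobs, run_prefix):
    """Return jobs that belong to a cancelled run but are still active.

    jobs: iterable of dicts with "jobId", "jobName", and "status" keys
          (hypothetical shape, e.g. exported from your scheduler's job listing).
    run_prefix: hypothetical job-name prefix identifying the cancelled run.
    """
    return [
        job
        for job in jobs
        if job["jobName"].startswith(run_prefix)
        and job["status"] in ACTIVE_STATUSES
    ]


if __name__ == "__main__":
    # Illustrative records only; not real job data.
    sample_jobs = [
        {"jobId": "1", "jobName": "run-abc-task1", "status": "RUNNING"},
        {"jobId": "2", "jobName": "run-abc-task2", "status": "FAILED"},
        {"jobId": "3", "jobName": "other-run-task1", "status": "RUNNING"},
    ]
    for job in find_zombie_jobs(sample_jobs, "run-abc-"):
        print(job["jobId"], job["jobName"], job["status"])
```

Any job this returns some time after the cancellation would be a candidate zombie job to terminate manually. How jobs map back to a run depends on how your compute environment names or tags them, so the prefix match is only a placeholder.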


Draft of production checklist

43dba9d

justinegeffen added 6 commits July 7, 2025 22:11

Update overview.md

a2c1ad1

Signed-off-by: Justine Geffen <[email protected]>

Update production-checklist.md

75320e2

Signed-off-by: Justine Geffen <[email protected]>

Update production-checklist.md

2a2f4fc

Signed-off-by: Justine Geffen <[email protected]>

Update cloud-sidebar.json

6616781

Signed-off-by: Justine Geffen <[email protected]>

Fixed build break

0c907f0

Merge branch 'master' into justine-prod-checklist

1115c13

justinegeffen requested a review from llewellyn-sl July 11, 2025 13:49

justinegeffen added the "1. Editor review" (Needs a language review) and "1. Dev/PM/SME" (Needs a review by a Dev/PM/SME) labels Jul 11, 2025

justinegeffen requested review from drewdipalma and gavinelder July 11, 2025 13:56

justinegeffen added the draft/WIP label Jul 11, 2025

Merge branch 'master' into justine-prod-checklist

292da28

justinegeffen removed request for gavinelder, drewdipalma and llewellyn-sl July 14, 2025 07:05

justinegeffen added 3 commits July 14, 2025 10:27

Update production-checklist.md

86bb212

Stylistic and language changes Signed-off-by: Justine Geffen <[email protected]>

Update production-checklist.md

abaa2b3

Signed-off-by: Justine Geffen <[email protected]>

Update production-checklist.md

72859ce

Signed-off-by: Justine Geffen <[email protected]>

MichaelTansiniSeqera reviewed Jul 15, 2025


justinegeffen commented Jul 16, 2025


Update platform-cloud/docs/getting-started/production-checklist.md

91a145a

Signed-off-by: Justine Geffen <[email protected]>


