Helm charts for the seqr platform
This repo consists of helm charts defining the seqr platform. Helm is a package manager for Kubernetes, an open source system for automating deployment and management of containerized applications.
- The seqr application chart consists of deployments for the seqr application, the redis cache and postgresql relational database. The redis and postgresql services may be disabled if seqr is running in a cloud environment with access to managed services. Note that this deployment does not include support for elasticsearch.
- The hail-search application chart contains a deployment of the service powering variant search within seqr.
- The pipeline-runner application chart contains the multiple services that make up the seqr loading pipeline. This chart also runs the luigi scheduler user interface to view running pipeline tasks.
- A lib library chart for resources shared between the other charts.
- The seqr-platform umbrella chart that bundles the composing charts into a single installable.
The Kubernetes ecosystem contains many standardized and custom solutions across a wide range of cloud and on-premises environments. To avoid the complexity of a full-fledged production environment and to achieve parity with the existing docker-compose, we recommend setting up a simple local Kubernetes cluster on an on-premises server or a cloud Virtual Machine with at least 32GB of memory and 750GB of disk space. While there is no requirement for the minimum number of CPUs, having more available will significantly speed up data loading and some searches.
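The memory and disk recommendations above can be checked before installing anything. The following is a minimal sketch for a Linux host, not part of the seqr tooling: the function names are hypothetical, and it assumes /proc/meminfo and df are available. Note that a machine sold as 32GB usually reports slightly less usable memory, so treat a near miss as a prompt to look closer rather than a hard failure.

```shell
# Minimums taken from the recommendation above.
REQUIRED_MEM_GB=32
REQUIRED_DISK_GB=750

# Succeeds when the available amount ($1) is at least the required amount ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total memory in whole GB, read from /proc/meminfo (Linux only).
total_mem_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free disk space in whole GB on the filesystem holding the given path.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Print one OK/LOW line. $1 = label, $2 = available GB, $3 = required GB.
report() {
  if meets_minimum "$2" "$3"; then
    echo "OK: $1 ${2}GB (need ${3}GB)"
  else
    echo "LOW: $1 ${2}GB (need ${3}GB)"
  fi
}

# Example invocation (checks /var because /var/seqr may not exist yet):
# report memory "$(total_mem_gb)" "$REQUIRED_MEM_GB"
# report disk "$(free_disk_gb /var)" "$REQUIRED_DISK_GB"
```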
Install the four required kubernetes infrastructure components:
- The docker container engine.
  - If running Docker Desktop on a laptop, make sure to set your CPU and Memory limits under Settings > Resources > Advanced.
  - If running on linux, make sure docker can be run without sudo (https://docs.docker.com/engine/install/linux-postinstall/).
- The kubectl command line client.
- The kind local cluster manager.
- The helm package manager.
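A quick way to confirm that the four components are installed is to look each one up on PATH. This is a small sketch with hypothetical helper names. It only checks that the commands exist, not that their versions are suitable.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Warn when running as root, since these tools should be run without sudo.
warn_if_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "warning: running as root; run kubectl, kind and helm as a regular user"
  fi
}

# Example invocation:
# missing_tools docker kubectl kind helm
# warn_if_root
# docker info >/dev/null 2>&1 || echo "docker is not usable by this user without sudo"
```

An empty result from missing_tools means all four commands were found.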
Then:
- Create a local /var/seqr directory to be mounted into the Kubernetes cluster. This will host all seqr application data:
  sudo mkdir -p /var/seqr
  sudo chmod 777 /var/seqr
- Start a kind cluster:
  curl https://raw.githubusercontent.com/broadinstitute/seqr-helm/refs/heads/main/kind.yaml > kind.yaml
  kind create cluster --config kind.yaml
  Note that kubernetes can have unexpected behavior when run with sudo. Make sure to run this and all other kubectl/kind/helm commands without it.
- Create the Required Secrets in your cluster using kubectl.
- Migrate any existing application data.
- Install the seqr-platform chart with any override values:
  helm repo add seqr-helm https://broadinstitute.github.io/seqr-helm
  helm install YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform
After install you should expect to see something like:
helm install YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform
NAME: YOUR_INSTITUTION_NAME-seqr
LAST DEPLOYED: Wed Oct 16 14:50:22 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
The first deployment will include a download of all of the genomic reference data (400GB+). It is likely to be slow, but can be monitored by checking the contents of /var/seqr/seqr-reference-data. Additionally, you may check the status of the services with:
kubectl get pods
NAME                                        READY   STATUS      RESTARTS   AGE
hail-search-7678986f7-n8655                 1/1     Running     0          22m
pipeline-runner-api-5557bbc7-vrtcj          2/2     Running     0          22m
pipeline-runner-ui-749c94468f-62rtv         1/1     Running     0          22m
seqr-68d7b855fb-bjppn                       1/1     Running     0          22m
seqr-check-new-samples-job-28818190-vlhxj   0/1     Completed   0          22m
seqr-postgresql-0                           1/1     Running     0          22m
seqr-redis-master-0                         1/1     Running     0          22m
While the reference data is downloading, the pipeline-runner-api pod should be in the Init state:
pipeline-runner-api-5557bbc7-vrtcj 0/2 Init:0/4 0 8m51s
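To follow the reference data download without retyping commands, the directory size can be printed at a fixed interval. This is a sketch with a hypothetical helper name. The interval and number of checks are arbitrary choices.

```shell
# Print the size of a directory a fixed number of times, pausing between checks.
# $1 = directory, $2 = seconds between checks, $3 = number of checks
poll_dir_size() {
  checks_done=0
  while [ "$checks_done" -lt "$3" ]; do
    du -sh "$1"
    checks_done=$((checks_done + 1))
    if [ "$checks_done" -lt "$3" ]; then
      sleep "$2"
    fi
  done
}

# Example invocation: check every 5 minutes for an hour.
# poll_dir_size /var/seqr/seqr-reference-data 300 12
# In another terminal, `kubectl get pods --watch` streams pod status changes.
```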
Once services are healthy, you may create a seqr admin user using the pod name from the above output:
kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py createsuperuser
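Pod names change on every deployment, so copying the name by hand can be replaced with a lookup. The sketch below uses a hypothetical helper and assumes the application pod is named seqr-<hash>-<5 character suffix>, as in the sample output above. If your release names pods differently, adjust the pattern.

```shell
# Read pod names on stdin (one per line, with or without the "pod/" prefix that
# `kubectl get pods -o name` emits) and print the first seqr application pod.
# Assumption: the pod is named seqr-<hash>-<5 character suffix>, which excludes
# seqr-postgresql-0, seqr-redis-master-0 and the seqr-check-new-samples job pods.
pick_seqr_pod() {
  sed 's|^pod/||' | grep -E '^seqr-[a-z0-9]+-[a-z0-9]{5}$' | head -n 1
}

# Example invocation:
# SEQR_POD="$(kubectl get pods -o name | pick_seqr_pod)"
# kubectl exec "$SEQR_POD" -c seqr -it -- python3 /seqr/manage.py createsuperuser
```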
The seqr application expects a few secrets to be defined for the services to start. The default expected secrets are declared in the default values.yaml file of the seqr application chart. You should create these secrets in your kubernetes cluster prior to attempting to install the chart.
- A secret containing a password field for the postgres database password. By default this secret is named postgres-secrets.
- A secret containing a django_key field for the django security key. By default this secret is named seqr-secrets.
Here's how you might create the secrets:
kubectl create secret generic postgres-secrets \
  --from-literal=password='super-secure-password'
kubectl create secret generic seqr-secrets \
  --from-literal=django_key='securely-generated-key'
Alternatively, you can use your preferred method for defining secrets in kubernetes. For example, you might use External Secrets to synchronize secrets from your cloud provider into your kubernetes cluster.
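If you take the kubectl route, the placeholder values can be replaced with generated ones. The sketch below is one possible way to produce random strings from /dev/urandom. The helper name and the chosen lengths are assumptions, not requirements of seqr.

```shell
# Print a random alphanumeric string of the requested length (default 50).
random_secret() {
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "${1:-50}"
}

# Example invocation:
# kubectl create secret generic postgres-secrets \
#   --from-literal=password="$(random_secret 32)"
# kubectl create secret generic seqr-secrets \
#   --from-literal=django_key="$(random_secret 50)"
```

The generated values are not printed or saved anywhere else, so they live only in the cluster secrets.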
- If you wish to preserve your existing application state in postgresql, you may move your existing ./data/postgres to /var/seqr/postgresql-data. You should see:
cat /var/seqr/postgresql-data/PG_VERSION
12
- To migrate static files, you may move your existing ./data/seqr_static_files to /var/seqr/seqr-static-media.
- To migrate readviz, you may move your existing ./data/readviz directory to /var/seqr/seqr-static-media and additionally run the update_igv_location.py manage.py command:
kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py update_igv_location old_prefix new_prefix
Note that you do not need to migrate any elasticsearch data.
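The PG_VERSION check from the postgresql migration step can be scripted so that a misplaced data directory is caught before installing the chart. This is a sketch with a hypothetical helper name. The default of 12 is taken from the sample output above.

```shell
# Succeed when the directory ($1) contains a PG_VERSION file whose content
# matches the expected major version ($2, default 12).
check_pg_version() {
  version_file="$1/PG_VERSION"
  [ -f "$version_file" ] && [ "$(cat "$version_file")" = "${2:-12}" ]
}

# Example invocation:
# check_pg_version /var/seqr/postgresql-data && echo "postgres data is in place"
```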
All default values in the seqr-platform chart may be overridden with helm's Values file functionality. For example, to disable the postgresql deployment, you might create a file my-values.yaml with the contents:
seqr:
  postgresql:
    enabled: false
This is also the recommended pattern for overriding any seqr environment variables:
seqr:
  environment:
    GUNICORN_WORKER_THREADS: "8"
A more comprehensive example of what this may look like, and how the different values are formatted in practice, is found in the seqr unit tests.
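An override file is passed to helm with the -f (or --values) flag. The sketch below writes a file that combines the two overrides above under a single seqr key and shows, in comments, how it might be passed to helm install or helm upgrade. The helper name is hypothetical, and whether you want both overrides depends on your environment.

```shell
# Write a values file combining the two example overrides into the given path.
write_values_file() {
  cat > "$1" <<'EOF'
seqr:
  postgresql:
    enabled: false
  environment:
    GUNICORN_WORKER_THREADS: "8"
EOF
}

# Example invocation:
# write_values_file my-values.yaml
# helm install YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform -f my-values.yaml
# The same flag applies on later upgrades:
# helm upgrade YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform -f my-values.yaml
# To list the chart's default values:
# helm show values seqr-helm/seqr-platform
```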
To fetch the latest versions of the helm infrastructure and seqr application code, you may run:
helm repo update
helm upgrade YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform
To update reference data in seqr, such as OMIM, HPO, etc., run the following:
kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py update_all_reference_data --use-cached-omim --skip-gencode
- How do I uninstall seqr and remove all application data?
helm uninstall YOUR_INSTITUTION_NAME-seqr
kind delete cluster
rm -rf /var/seqr
- How do I view seqr's disk utilization? You may check the size of each of the on-disk components with:
du -sh /var/seqr/*
- How do I tail logs? To tail the logs of the pipeline worker after you have started a pipeline run, for example:
kubectl get pods -o name | grep pipeline-runner-api
pipeline-runner-api-5557bbc7-vrtcj
kubectl logs pipeline-runner-api-5557bbc7-vrtcj -c pipeline-runner-api-sidecar
2024-10-16 18:24:27 - pipeline_worker - INFO - Waiting for work
2024-10-16 18:24:28 - pipeline_worker - INFO - Waiting for work
2024-10-16 18:24:29 - pipeline_worker - INFO - Waiting for work
....
base_hail_table - INFO - UpdatedCachedReferenceDatasetQuery(reference_genome=GRCh37, dataset_type=SNV_INDEL, crdq=CLINVAR_PATH_VARIANTS) start
[Stage 42:========>
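When following the pipeline worker logs, the repeated "Waiting for work" lines can drown out real activity. The sketch below filters them out. The helper name is hypothetical and the match string is taken from the sample log lines above.

```shell
# Drop the idle heartbeat lines so that only pipeline activity remains.
drop_idle_lines() {
  grep -v 'pipeline_worker - INFO - Waiting for work'
}

# Example invocation; -f streams new log lines as they arrive:
# kubectl logs -f pipeline-runner-api-5557bbc7-vrtcj \
#   -c pipeline-runner-api-sidecar | drop_idle_lines
```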