seqr-helm

Helm charts for the seqr platform

Overview

This repo consists of helm charts defining the seqr platform. Helm is a package manager for Kubernetes, an open source system for automating deployment and management of containerized applications.

  1. The seqr application chart consists of deployments for the seqr application, the redis cache and postgresql relational database. The redis and postgresql services may be disabled if seqr is running in a cloud environment with access to managed services. Note that this deployment does not include support for elasticsearch.
  2. The hail-search application chart contains a deployment of the service powering variant search within seqr.
  3. The pipeline-runner application chart contains the multiple services that make up the seqr loading pipeline. This chart also runs the luigi scheduler user interface to view running pipeline tasks.
  4. A lib library chart for resources shared between the other charts.
  5. The seqr-platform umbrella chart that bundles the composing charts into a single installable.

Instructions for Initial Deployment

The Kubernetes ecosystem contains many standardized and custom solutions across a wide range of cloud and on-premises environments. To avoid the complexity of a full-fledged production environment and to achieve parity with the existing docker-compose deployment, we recommend setting up a simple local Kubernetes cluster on an on-premises server or a cloud Virtual Machine with at least 32GB of memory and 750GB of disk space. There is no minimum CPU requirement, but additional CPUs will significantly speed up data loading and some searches.
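As a rough preflight check, the sizing guidance above can be compared against the current machine. This is a minimal sketch for Linux only: the function names are illustrative, and checking free space under /var is an assumption based on the /var/seqr data directory used later in this README.

```shell
# Thresholds mirror the guidance above (32GB memory, 750GB disk).
# All function names here are illustrative, not part of seqr-helm.

# meets_minimum ACTUAL REQUIRED -> status 0 when ACTUAL >= REQUIRED
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total memory in whole GiB, read from /proc/meminfo (Linux only).
total_memory_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free space in whole GiB on the filesystem holding the given path.
free_disk_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# report LABEL ACTUAL REQUIRED -> prints one ok/warning line
report() {
  if meets_minimum "$2" "$3"; then
    echo "ok: $1 ${2}GiB (recommended ${3}GiB)"
  else
    echo "warning: $1 ${2}GiB is below the recommended ${3}GiB"
  fi
}

report "memory" "$(total_memory_gib)" 32
report "free disk under /var" "$(free_disk_gib /var)" 750
```

Values are truncated to whole GiB, so a machine sold as 32GB may report 31 and trigger the warning; treat the output as a hint rather than a hard gate.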

Install the four required kubernetes infrastructure components:

  1. The docker container engine.
  2. The kubectl command line client.
  3. The kind local cluster manager.
  4. The helm package manager.
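Before continuing, it may help to confirm that all four tools are on your PATH. The helper below is a small sketch; `missing_tools` is a hypothetical name, not something shipped with seqr-helm.

```shell
# missing_tools TOOL... -> prints each tool that is not found on PATH
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools docker kubectl kind helm)
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on a single line
  echo "Not found on PATH:" $missing
else
  echo "docker, kubectl, kind and helm are all installed"
fi
```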

Then:

  1. Create a local /var/seqr directory to be mounted into the Kubernetes cluster. This will host all seqr application data:
    sudo mkdir -p /var/seqr
    sudo chmod 777 /var/seqr 
    
  2. Start a kind cluster:
    curl https://raw.githubusercontent.com/broadinstitute/seqr-helm/refs/heads/main/kind.yaml > kind.yaml
    kind create cluster --config kind.yaml
    
    Note that Kubernetes can behave unexpectedly when run with sudo. Run this and all other kubectl/kind/helm commands without it.
  3. Create the Required Secrets in your cluster using kubectl.
  4. Migrate any existing application data.
  5. Install the seqr-platform chart with any override values:
    helm repo add seqr-helm https://broadinstitute.github.io/seqr-helm
    helm install YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform
    

After installation, you should expect to see something like:

helm install YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform 
NAME: YOUR_INSTITUTION_NAME-seqr
LAST DEPLOYED: Wed Oct 16 14:50:22 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

The first deployment will include a download of all of the genomic reference data (400GB+). It is likely to be slow, but can be monitored by checking the contents of /var/seqr/seqr-reference-data. Additionally, you may check the status of the services with:

kubectl get pods
NAME                                        READY   STATUS      RESTARTS      AGE
hail-search-7678986f7-n8655                 1/1     Running     0             22m
pipeline-runner-api-5557bbc7-vrtcj          2/2     Running     0             22m
pipeline-runner-ui-749c94468f-62rtv         1/1     Running     0             22m
seqr-68d7b855fb-bjppn                       1/1     Running     0             22m
seqr-check-new-samples-job-28818190-vlhxj   0/1     Completed   0             22m
seqr-postgresql-0                           1/1     Running     0             22m
seqr-redis-master-0                         1/1     Running     0             22m

While the reference data is downloading, the pipeline-runner-api pod should be in the Init state:

pipeline-runner-api-5557bbc7-vrtcj        0/2     Init:0/4    0             8m51s
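To follow the reference data download described above, one option is to report the size of /var/seqr/seqr-reference-data periodically. The sketch below uses a hypothetical `dir_size_gib` helper and prints 0 until the directory exists.

```shell
# dir_size_gib DIR -> whole GiB used by DIR (0 if DIR does not exist yet)
dir_size_gib() {
  if [ -d "$1" ]; then
    du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 / 1024 }'
  else
    echo 0
  fi
}

echo "reference data so far: $(dir_size_gib /var/seqr/seqr-reference-data)GiB"
```

Re-running this every few minutes and comparing the result against the 400GB+ total gives a rough sense of progress.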

Once services are healthy, you may create a seqr admin user using the pod name from the above output:

kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py createsuperuser
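Because the pod name changes whenever the deployment is recreated, it can be convenient to look it up rather than copy it by hand. The sketch below assumes the seqr pod follows the usual Deployment naming pattern (`seqr-<hash>-<5 character suffix>`), as in the listing above; `pick_seqr_pod` and `seqr_manage` are hypothetical helpers, not part of the chart.

```shell
# Reads `kubectl get pods -o name` output on stdin and prints the first
# name that looks like the seqr application pod. The pattern excludes
# seqr-postgresql-0, seqr-redis-master-0 and the check-new-samples job.
pick_seqr_pod() {
  grep -E '^(pod/)?seqr-[a-z0-9]+-[a-z0-9]{5}$' | head -n 1
}

# Hypothetical wrapper: run a manage.py command in the seqr container.
seqr_manage() {
  pod=$(kubectl get pods -o name | pick_seqr_pod)
  kubectl exec "$pod" -c seqr -it -- python3 /seqr/manage.py "$@"
}
```

With these defined, `seqr_manage createsuperuser` would run the same command as above without the interactive bash step.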

Required Secrets

The seqr application expects a few secrets to be defined for the services to start. The default expected secrets are declared in the default values.yaml file of the seqr application chart. You should create these secrets in your kubernetes cluster prior to attempting to install the chart.

  1. A secret containing a password field for the postgres database password. By default this secret is named postgres-secrets.
  2. A secret containing a django_key field for the django security key. By default this secret is named seqr-secrets.

Here's how you might create the secrets:

kubectl create secret generic postgres-secrets \
  --from-literal=password='super-secure-password'

kubectl create secret generic seqr-secrets \
  --from-literal=django_key='securely-generated-key'
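Rather than typing literal values, the secrets can be filled with randomly generated strings. This is a sketch under stated assumptions: `random_hex` and `create_seqr_secrets` are hypothetical helpers, the byte counts are arbitrary choices, and the secret and field names are the defaults described above.

```shell
# random_hex BYTES -> prints 2*BYTES random hexadecimal characters
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: creates both default secrets with generated values.
# It is only defined here; call it once kubectl points at your cluster.
create_seqr_secrets() {
  kubectl create secret generic postgres-secrets \
    --from-literal=password="$(random_hex 24)"
  kubectl create secret generic seqr-secrets \
    --from-literal=django_key="$(random_hex 32)"
}
```

If you migrate an existing postgresql data directory, the database password should match the one that database was created with, so a freshly generated password may not be appropriate in that case.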

Alternatively, you can use your preferred method for defining secrets in kubernetes. For example, you might use External Secrets to synchronize secrets from your cloud provider into your kubernetes cluster.

Migrating Application Data from docker-compose.yaml

  • If you wish to preserve your existing application state in postgresql, you may move your existing ./data/postgres to /var/seqr/postgresql-data. You should see:
cat /var/seqr/postgresql-data/PG_VERSION
12
kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py update_igv_location old_prefix new_prefix

Note that you do not need to migrate any elasticsearch data.
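The PG_VERSION check shown above for the postgresql data directory can also be scripted. In this sketch `pg_version_matches` is a hypothetical helper, and the expected version 12 is taken from the example output in this section.

```shell
# pg_version_matches DATA_DIR EXPECTED -> status 0 when DATA_DIR/PG_VERSION
# exists and contains exactly EXPECTED
pg_version_matches() {
  [ -f "$1/PG_VERSION" ] && [ "$(cat "$1/PG_VERSION")" = "$2" ]
}

if pg_version_matches /var/seqr/postgresql-data 12; then
  echo "postgresql data directory is in place (version 12)"
else
  echo "no version 12 PG_VERSION file found in /var/seqr/postgresql-data"
fi
```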

Values/Environment Overrides

All default values in the seqr-platform chart may be overridden with helm's Values file functionality. For example, to disable the postgresql deployment, you might create a file my-values.yaml with the contents:

seqr:
  postgresql:
    enabled: false

This is also the recommended pattern for overriding any seqr environment variables:

seqr:
  environment:
    GUNICORN_WORKER_THREADS: "8"

A more comprehensive example of what this may look like, and how the different values are formatted in practice, is found in the seqr unit tests.
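When combining several overrides, they need to share a single top-level `seqr:` key rather than repeating it. The sketch below writes both overrides from this section into one my-values.yaml and defines a hypothetical `apply_overrides` helper that passes the file to helm with the `-f` flag.

```shell
# Write both overrides from this section under one `seqr:` key.
cat > my-values.yaml <<'EOF'
seqr:
  postgresql:
    enabled: false
  environment:
    GUNICORN_WORKER_THREADS: "8"
EOF

# Hypothetical helper: apply the values file to an existing release.
# It is only defined here, not invoked.
apply_overrides() {
  helm upgrade YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform -f my-values.yaml
}
```

The same `-f my-values.yaml` flag can be added to the initial `helm install` command.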

Updating seqr

To fetch the latest versions of the helm infrastructure and seqr application code, you may run:

helm repo update
helm upgrade YOUR_INSTITUTION_NAME-seqr seqr-helm/seqr-platform

To update reference data in seqr, such as OMIM, HPO, etc., run the following:

kubectl exec seqr-68d7b855fb-bjppn -c seqr -it -- bash
python3 /seqr/manage.py update_all_reference_data --use-cached-omim --skip-gencode

Debugging FAQ

  • How do I uninstall seqr and remove all application data?
helm uninstall YOUR_INSTITUTION_NAME-seqr
kind delete cluster
rm -rf /var/seqr
  • How do I view seqr's disk utilization? You may check the size of each of the on-disk components with:
du -sh /var/seqr/*
  • How do I tail logs? For example, to view the logs of the pipeline worker after you have started a pipeline run:
kubectl get pods -o name | grep pipeline-runner-api
pod/pipeline-runner-api-5557bbc7-vrtcj
kubectl logs pipeline-runner-api-5557bbc7-vrtcj -c pipeline-runner-api-sidecar
2024-10-16 18:24:27 - pipeline_worker - INFO - Waiting for work
2024-10-16 18:24:28 - pipeline_worker - INFO - Waiting for work
2024-10-16 18:24:29 - pipeline_worker - INFO - Waiting for work
....
base_hail_table - INFO - UpdatedCachedReferenceDatasetQuery(reference_genome=GRCh37, dataset_type=SNV_INDEL, crdq=CLINVAR_PATH_VARIANTS) start
[Stage 42:========>