DOC-136 #129

Open · wants to merge 13 commits into `main`

@@ -0,0 +1,279 @@
# Deploy Cortex on Kubernetes

Deploying Cortex on Kubernetes improves scalability, reliability, and resource management. Kubernetes handles automated deployment, dynamic resource allocation, and isolated execution of analyzers and responders, boosting performance and security. This setup simplifies the management of large workloads.

This guide provides step-by-step instructions for deploying Cortex on a Kubernetes cluster.

You will learn how to:

* [Configure a shared filesystem](#configure-a-shared-filesystem) so that Cortex and its jobs can exchange data, letting different pods share input files and store job results while ensuring consistent and reliable data access across the Kubernetes cluster

* [Set up a Kubernetes service account (SA)](#set-up-a-kubernetes-service-account) with the necessary permissions for Cortex to communicate with the Kubernetes API and create jobs for running analyzers and responders

## Configure a shared filesystem

!!! warning "Configuration errors"
    Improperly configured shared filesystems can cause errors when running jobs with Cortex.

When running on Kubernetes, Cortex launches a new pod for each analyzer or responder execution. After the job completes and Cortex retrieves the result, the pod is terminated. A shared filesystem allows these jobs to share input data, store and retrieve results, ensure consistency across pods, and enable concurrent access.

Kubernetes supports several methods for sharing filesystems between pods, including:

* [PersistentVolume (PV) using an NFS server](https://kubernetes.io/docs/concepts/storage/volumes/#nfs)
* Dedicated storage solutions like [Longhorn](https://longhorn.io/) or [Rook](https://rook.io/)

This guide focuses on configuring a PV using an NFS server, with an example for [AWS Elastic File System (EFS)](https://aws.amazon.com/efs/).

### Step 1: Ensure all users can access files on the shared filesystem

At runtime, Cortex and its jobs run on different pods and may use different user IDs (UIDs) and group IDs (GIDs):

* Cortex defaults to uid:gid `1001:1001`.
* Analyzers and responders may use a different uid:gid, such as `1000:1000`, or `0:0` if running as root.

To prevent permission errors when reading or writing files on the shared filesystem, [configure the NFS server](https://manpages.ubuntu.com/manpages/noble/man5/exports.5.html) with the `all_squash` parameter. This ensures all filesystem operations use uid:gid `65534:65534`, regardless of the user's actual UID and GID.
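
As an illustration, an `/etc/exports` entry for the `/srv/cortex` export used later in this guide might look like the following. The client subnet is a placeholder matching the example NFS server address; adjust it to your network:

```
# /etc/exports -- export /srv/cortex to the cluster subnet and squash all
# client UIDs/GIDs to the anonymous user (65534:65534)
/srv/cortex 172.31.0.0/16(rw,sync,all_squash,anonuid=65534,anongid=65534)
```

After editing `/etc/exports`, re-export the filesystems with `exportfs -ra` for the change to take effect.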

### Step 2: Define a PersistentVolume for the NFS server

A [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) represents a piece of storage provisioned by an administrator or dynamically provisioned using storage classes. When using an NFS server, the PV allows multiple pods to access shared storage concurrently.

To define a PV for your NFS server:

1. Ensure NFS server accessibility.

    Confirm that your NFS server is running and accessible from the Kubernetes cluster.

    !!! note "Using a different storage solution?"
        If you're using another storage system, create a ReadWriteMany PV following your tool’s documentation, then continue with the next step.

2. Create a PersistentVolume manifest.

    This manifest indicates how pods should connect to your NFS server:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cortex-shared-fs
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi # Allocate enough storage to fit your observables
  accessModes:
    - ReadWriteMany # Allows multiple pods to read and write simultaneously
  nfs:
    server: 172.31.0.1 # IP address of the NFS server
    path: "/srv/cortex" # Path on the NFS server used as the PV root
  mountOptions:
    - nfsvers=4.2 # Specify the NFS client version
```
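
After saving the manifest (for example as `cortex-pv.yaml`; the filename is arbitrary), apply it and confirm the PV was created:

```bash
kubectl apply -f cortex-pv.yaml
# STATUS should read Available until a PVC binds the volume
kubectl get pv cortex-shared-fs
```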

### Step 3: Create a PersistentVolumeClaim

A [PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) is a request for storage by a pod. It binds to an existing PV, or triggers dynamic provisioning of a new one, and specifies the required storage capacity.

This manifest references the previously defined PV:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cortex-shared-fs-claim
spec:
  # Ensure the following parameters match your previously defined PV
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

### Step 4: Edit your Cortex deployment

Edit your Cortex deployment manifest to configure how Cortex runs within your Kubernetes cluster. This configuration connects Cortex to the shared filesystem by mounting the PVC, enabling Cortex to access and store job data.

!!! warning "Partial deployment manifest"
    The following manifest is only a snippet of the full deployment. It highlights the relevant parameters, and you should integrate them into your complete deployment configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cortex
spec:
  replicas: 1 # Number of pods to run
  selector:
    matchLabels:
      app.kubernetes.io/name: cortex
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cortex
    spec:
      containers:
        - name: cortex
          # Command configuration can also be set using environment variables or an application.conf file
          command:
            - /opt/cortex/entrypoint
            # Directory for storing job data; must match the mountPath defined below
            - --job-directory
            - /tmp/cortex-jobs
            # Reference the name of the ReadWriteMany PVC you created
            - --kubernetes-job-pvc
            - cortex-shared-fs-claim
          volumeMounts:
            # Must match the name defined in the volumes section
            - name: cortex-job-pvc
              mountPath: /tmp/cortex-jobs
          # (...)
      volumes:
        - name: cortex-job-pvc
          persistentVolumeClaim:
            # Reference the previously defined PVC
            claimName: cortex-shared-fs-claim
```
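
Once you have merged these parameters into your full manifest (saved here as `cortex-deployment.yaml`, a placeholder name), apply it and check that the deployment rolls out cleanly:

```bash
kubectl apply -f cortex-deployment.yaml
kubectl rollout status deployment/cortex
```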

### Example: Deploy Cortex on Kubernetes using AWS EFS

#### Prerequisites

Before setting up the PV for AWS EFS, complete the following steps:

1. [Create an IAM role](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) to allow the EFS CSI driver to interact with EFS.
2. Install the EFS CSI driver on your Kubernetes cluster using one of the following methods:
* [EKS add-ons](https://www.eksworkshop.com/docs/fundamentals/storage/efs/efs-csi-driver) (recommended)
* [Official Helm Chart](https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases?q=helm-chart&expanded=true)
3. [Create an EFS filesystem](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md) and note the associated EFS filesystem ID.

#### Step 1: Create a StorageClass for EFS

!!! note "Reference example"
    The following manifests are based on the [EFS CSI driver multiple pods example](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods).

Create a StorageClass that references your EFS filesystem:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
# https://github.com/kubernetes-sigs/aws-efs-csi-driver?tab=readme-ov-file#storage-class-parameters-for-dynamic-provisioning
parameters:
  provisioningMode: efs-ap # EFS access point provisioning mode
  fileSystemId: fs-01234567 # Replace with your EFS filesystem ID
  directoryPerms: "700" # Permissions for newly created directories
  uid: "1001" # User ID to set file permissions
  gid: "1001" # Group ID to set file permissions
  ensureUniqueDirectory: "false" # Set to false to allow shared folder access between Cortex and job containers
  subPathPattern: "${.PVC.namespace}/${.PVC.name}" # Optional subfolder structure inside the NFS filesystem
```

#### Step 2: Create a PVC using the EFS StorageClass

Kubernetes will automatically create a PV when a PVC is defined using the EFS StorageClass.

Define the PVC as follows:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cortex-shared-fs-claim
spec:
  storageClassName: efs-sc # References the EFS StorageClass
  # (...)
```

## Set up a Kubernetes service account

!!! warning "Service account configuration required"
    If the Kubernetes service account isn't configured properly, Cortex can't create the Kubernetes jobs that run analyzers and responders.

In Kubernetes, a service account (SA) allows a pod to authenticate and interact with the Kubernetes API, enabling it to perform specific actions within the cluster.

When deploying Cortex, a dedicated SA is essential for creating and managing Kubernetes jobs that run analyzers and responders. Without proper configuration, Cortex can't execute these jobs.

### Step 1: Create a SA for Cortex

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cortex
```

### Step 2: Define a role for Cortex job execution

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cortex-job-runner
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "watch"]
```

### Step 3: Create a RoleBinding to link the SA and role

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cortex-job-runner-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cortex-job-runner
subjects:
  - kind: ServiceAccount
    name: cortex
```

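With the Role and RoleBinding applied, you can confirm the permissions took effect before deploying Cortex. The example below assumes the resources live in the `default` namespace; adjust the namespace to match yours:

```bash
# Should print "yes" once the RoleBinding is active
kubectl auth can-i create jobs.batch \
  --as=system:serviceaccount:default:cortex
```
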
### Step 4: Assign the SA in the Cortex deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cortex
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: cortex
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cortex
    spec:
      containers:
        # (...)
      volumes:
        # (...)
      serviceAccountName: cortex
```

## Verify the deployment and service account

Run the following command to ensure the deployment is running with the correct service account (SA):

```bash
kubectl get deployment cortex -o=jsonpath='{.spec.template.spec.serviceAccountName}'
```

Verify the PVC is bound:

```bash
kubectl get pvc cortex-shared-fs-claim
```
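
To confirm that the shared filesystem is actually writable from a pod, you can run a short-lived test pod that mounts the PVC and writes a file. The pod name, image, and mount path below are arbitrary choices for the test:

```bash
kubectl run pvc-write-test --rm -it --restart=Never --image=busybox \
  --overrides='{"spec":{"containers":[{"name":"pvc-write-test","image":"busybox","command":["sh","-c","touch /mnt/write-test && echo ok"],"volumeMounts":[{"name":"fs","mountPath":"/mnt"}]}],"volumes":[{"name":"fs","persistentVolumeClaim":{"claimName":"cortex-shared-fs-claim"}}]}}'
```

If the pod prints `ok` and exits, the PVC is mounted and writable; permission errors here usually point back to the `all_squash` configuration described earlier.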

## Next steps

* [Analyzers & Responders](analyzers-responders.md)
* [Advanced Configuration](advanced-configuration.md)
5 changes: 2 additions & 3 deletions docs/cortex/installation-and-configuration/index.md
@@ -1,4 +1,4 @@
-# Installation & configuration guides
+# Installation and Configuration Guides

## Overview
Cortex relies on Elasticsearch to store its data. A basic setup installs Elasticsearch, then Cortex, on a standalone, dedicated server (physical or virtual).
@@ -38,10 +38,9 @@ Cortex has been tested and is supported on the following operating systems:
3. Install Cortex and all its dependencies to run Analyzers & Responders as Docker images
4. Install Cortex and all its dependencies to run Analyzers & Responders on the host (Debian and Ubuntu **ONLY**)

For each release, DEB, RPM and ZIP binary packages are built and provided.

For deploying Cortex on a Kubernetes cluster, refer to our detailed [Kubernetes deployment guide](deploy-cortex-on-kubernetes.md).

The [following guide](step-by-step-guide.md) lets you **prepare**, **install**, and **configure** Cortex and its prerequisites on Debian- and RPM-based operating systems, as well as on other systems using our binary packages.

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -637,6 +637,7 @@ nav:
- ./cortex/installation-and-configuration/proxy-settings.md
- ./cortex/installation-and-configuration/docker.md
- ./cortex/installation-and-configuration/database.md
- ./cortex/installation-and-configuration/deploy-cortex-on-kubernetes.md
- 'User Guides':
- 'First start' : 'cortex/user-guides/first-start.md'
- 'User roles' : 'cortex/user-guides/roles.md'