diff --git a/docs/cortex/installation-and-configuration/deploy-cortex-on-kubernetes.md b/docs/cortex/installation-and-configuration/deploy-cortex-on-kubernetes.md
new file mode 100644
index 000000000..839f7a38f
--- /dev/null
+++ b/docs/cortex/installation-and-configuration/deploy-cortex-on-kubernetes.md
@@ -0,0 +1,279 @@
+# Deploy Cortex on Kubernetes
+
+Deploying Cortex on Kubernetes improves scalability, reliability, and resource management. Kubernetes handles automated deployment, dynamic resource allocation, and isolated execution of analyzers and responders, boosting performance and security. This setup simplifies the management of large workloads.
+
+This guide provides step-by-step instructions for deploying Cortex on a Kubernetes cluster.
+
+You will learn how to:
+
+* [Configure a shared filesystem](#configure-a-shared-filesystem) so that Cortex and its job pods can exchange input files and job results, with consistent and reliable data access across the Kubernetes cluster
+
+* [Set up a Kubernetes service account (SA)](#set-up-a-kubernetes-service-account) with the necessary permissions for Cortex to communicate with the Kubernetes API and create jobs for running analyzers and responders
+
+## Configure a shared filesystem
+
+!!! warning "Configuration errors"
+    Improperly configured shared filesystems can cause errors when running jobs with Cortex.
+
+When running on Kubernetes, Cortex launches a new pod for each analyzer or responder execution. After the job completes and Cortex retrieves the result, the pod is terminated. A shared filesystem lets Cortex and its job pods exchange input data and results, and gives all pods consistent, concurrent access to the same files.
+
+Kubernetes supports several methods for sharing filesystems between pods, including:
+
+* [PersistentVolume (PV) using an NFS server](https://kubernetes.io/docs/concepts/storage/volumes/#nfs)
+* Dedicated storage solutions like [Longhorn](https://longhorn.io/) or [Rook](https://rook.io/)
+
+This guide focuses on configuring a PV using an NFS server, with an example for [AWS Elastic File System (EFS)](https://aws.amazon.com/efs/).
+
+### Step 1: Ensure all users can access files on the shared filesystem
+
+At runtime, Cortex and its jobs run on different pods and may use different user IDs (UIDs) and group IDs (GIDs):
+
+* Cortex defaults to uid:gid `1001:1001`.
+* Analyzers may use different uid:gid, such as `1000:1000` or `0:0` if running as root.
+
+To prevent permission errors when reading or writing files on the shared filesystem, [configure the NFS server](https://manpages.ubuntu.com/manpages/noble/man5/exports.5.html) with the `all_squash` parameter. This maps every client UID and GID to the anonymous user, so all filesystem operations use uid:gid `65534:65534` (unless `anonuid` and `anongid` are set), regardless of the user's actual UID and GID.
+
+### Step 2: Define a PersistentVolume for the NFS server
+
+A [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) represents a piece of storage provisioned by an administrator or dynamically provisioned using storage classes. When using an NFS server, the PV allows multiple pods to access shared storage concurrently.
+
+To define a PV for your NFS server:
+
+1. Ensure NFS server accessibility.
+
+    Confirm that your NFS server is running and accessible from the Kubernetes cluster.
+
+    !!! note "Using a different storage solution?"
+        If you're using another storage system, create a ReadWriteMany PV following your tool’s documentation, then continue with the next step.
+
+2. Create a PersistentVolume manifest.
+
+    This manifest indicates how pods should connect to your NFS server:
+
+    ```yaml
+    apiVersion: v1
+    kind: PersistentVolume
+    metadata:
+      name: cortex-shared-fs
+    spec:
+      storageClassName: standard
+      capacity:
+        storage: 10Gi # Allocate enough storage to fit your observables
+      accessModes:
+        - ReadWriteMany # Allows multiple pods to read and write simultaneously
+      nfs:
+        server: 172.31.0.1 # IP address of the NFS server
+        path: "/srv/cortex" # Path on the NFS server used as the PV root
+      mountOptions:
+        - nfsvers=4.2 # NFS protocol version used for the mount
+    ```
+
+### Step 3: Create a PersistentVolumeClaim
+
+A [PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) is a request for storage that pods consume as a volume. It binds to an existing PV or triggers dynamic provisioning of one, and specifies the required storage capacity.
+
+This manifest binds to the previously defined PV by requesting a matching storage class, access mode, and capacity:
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: cortex-shared-fs-claim
+spec:
+  # Ensure the following parameters match your previously defined PV
+  storageClassName: standard
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: 10Gi
+```
+
+### Step 4: Edit your Cortex deployment
+
+Edit your Cortex deployment manifest to configure how Cortex runs within your Kubernetes cluster. This configuration connects Cortex to the shared filesystem by mounting the PVC, enabling Cortex to access and store job data.
+
+!!! warning "Partial deployment manifest"
+    The following manifest is only a snippet of the full deployment. It highlights the relevant parameters, and you should integrate them into your complete deployment configuration.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: cortex
+spec:
+  replicas: 1 # Number of pods to run
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: cortex
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: cortex
+    spec:
+      containers:
+        - name: cortex
+          # Command configuration can also be set using environment variables or an application.conf file
+          command:
+            - /opt/cortex/entrypoint
+            # Directory for storing job data, should match the mountPath defined below
+            - --job-directory
+            - /tmp/cortex-jobs
+            # Reference the name of the ReadWriteMany PVC you created
+            - --kubernetes-job-pvc
+            - cortex-shared-fs-claim
+          volumeMounts:
+            # Must match the name defined in the volumes section
+            - name: cortex-job-pvc
+              mountPath: /tmp/cortex-jobs
+      # (...)
+      volumes:
+        - name: cortex-job-pvc
+          persistentVolumeClaim:
+            # Reference the previously defined PVC
+            claimName: cortex-shared-fs-claim
+```
+
+### Example: Deploy Cortex on Kubernetes using AWS EFS
+
+#### Prerequisites
+
+Before setting up the PV for AWS EFS, complete the following steps:
+
+1. [Create an IAM role](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) to allow the EFS CSI driver to interact with EFS.
+2. Install the EFS CSI driver on your Kubernetes cluster using one of the following methods:
+    * [EKS add-ons](https://www.eksworkshop.com/docs/fundamentals/storage/efs/efs-csi-driver) (recommended)
+    * [Official Helm Chart](https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases?q=helm-chart&expanded=true)
+3. [Create an EFS filesystem](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md) and note the associated EFS filesystem ID.
+
+#### Step 1: Create a StorageClass for EFS
+
+!!! note "Reference example"
+    The following manifests are based on the [EFS CSI driver multiple pods example](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods).
+
+Create a StorageClass that references your EFS filesystem:
+
+```yaml
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: efs-sc
+provisioner: efs.csi.aws.com
+# https://github.com/kubernetes-sigs/aws-efs-csi-driver?tab=readme-ov-file#storage-class-parameters-for-dynamic-provisioning
+parameters:
+  provisioningMode: efs-ap # EFS access point provisioning mode
+  fileSystemId: fs-01234567 # Replace with your EFS filesystem ID
+  directoryPerms: "700" # Permissions for newly created directories
+  uid: "1001" # User ID to set file permissions (quoted: StorageClass parameter values must be strings)
+  gid: "1001" # Group ID to set file permissions (quoted: StorageClass parameter values must be strings)
+  ensureUniqueDirectory: "false" # Set to false to allow shared folder access between Cortex and job containers
+  subPathPattern: "${.PVC.namespace}/${.PVC.name}" # Optional subfolder structure inside the NFS filesystem
+```
+
+#### Step 2: Create a PVC using the EFS StorageClass
+
+Kubernetes will automatically create a PV when a PVC is defined using the EFS StorageClass.
+
+Define the PVC as follows:
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: cortex-shared-fs-claim
+spec:
+  storageClassName: efs-sc # References the EFS StorageClass
+  # (...)
+```
+
+## Set up a Kubernetes service account
+
+!!! warning "Service account configuration required"
+    If you don't configure the Kubernetes service account properly, Cortex won't be able to create Kubernetes jobs to run analyzers or responders.
+
+In Kubernetes, a service account (SA) allows a pod to authenticate and interact with the Kubernetes API, enabling it to perform specific actions within the cluster.
+
+When deploying Cortex, a dedicated SA is essential for creating and managing Kubernetes jobs that run analyzers and responders. Without proper configuration, Cortex can't execute these jobs.
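+
+Before changing anything, it can help to check whether the service account that Cortex currently runs under is already allowed to create jobs. The following is a minimal sketch, not part of the official procedure: it assumes Cortex runs in a namespace named `cortex` (hypothetical, replace it with yours), that the pod still uses that namespace's `default` service account, and that your own user is allowed to impersonate service accounts.
+
+```bash
+# Hypothetical values: adjust the namespace and service account name to your cluster.
+NAMESPACE=cortex
+SERVICE_ACCOUNT=default
+
+# Ask the API server whether this service account may create Jobs in the batch API group.
+# The command prints "yes" or "no".
+kubectl auth can-i create jobs.batch \
+  --as="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}" \
+  --namespace "${NAMESPACE}"
+```
+
+A `no` answer means the service account has no permission to create jobs yet, which is what the following steps set up. Running the same check again with `SERVICE_ACCOUNT=cortex` after completing them is one way to confirm that the role and role binding took effect.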
+
+### Step 1: Create an SA for Cortex
+
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: cortex
+```
+
+### Step 2: Define a role for Cortex job execution
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: cortex-job-runner
+rules:
+  - apiGroups: [""]
+    resources: ["pods"]
+    verbs: ["get", "list"]
+  - apiGroups: ["batch"]
+    resources: ["jobs"]
+    verbs: ["create", "delete", "get", "list", "watch"]
+```
+
+### Step 3: Create a RoleBinding to link the SA and role
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: cortex-job-runner-binding
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: cortex-job-runner
+subjects:
+  - kind: ServiceAccount
+    name: cortex
+```
+### Step 4: Assign the SA in the Cortex deployment
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: cortex
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: cortex
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: cortex
+    spec:
+      containers:
+        # (...)
+      volumes:
+        # (...)
+
+      serviceAccountName: cortex
+```
+
+## Verify the deployment and service account
+
+Run the following command to confirm that the deployment uses the correct service account (SA). It should print `cortex`:
+
+```bash
+kubectl get deployment cortex -o=jsonpath='{.spec.template.spec.serviceAccountName}'
+```
+
+Verify that the PVC is bound. The `STATUS` column should show `Bound`:
+
+```bash
+kubectl get pvc cortex-shared-fs-claim
+```
+
+## Next steps
+
+* [Analyzers & Responders](analyzers-responders.md)
+* [Advanced Configuration](advanced-configuration.md)
\ No newline at end of file
diff --git a/docs/cortex/installation-and-configuration/index.md b/docs/cortex/installation-and-configuration/index.md
index a81bd974d..0ce7ab5d1 100644
--- a/docs/cortex/installation-and-configuration/index.md
+++ b/docs/cortex/installation-and-configuration/index.md
@@ -1,4 +1,4 @@
-# Installation & configuration guides
+# Installation and Configuration Guides
 ## Overview
 Cortex relies on Elasticsearch to store its data. A basic setup to install Elasticsearch, then Cortex on a standalone and dedicated server (physical or virtual).
@@ -38,10 +38,9 @@ Cortex has been tested and is supported on the following operating systems:
 3. Install Cortex and all its dependencies to run Analyzers & Responders as Docker Iiages
 4. Install Cortex and all its dependencies to run Analyzers & Responders on the host (Debian and Ubuntu **ONLY**)
-
-
 For each release, DEB, RPM and ZIP binary packages are built and provided.
+To deploy Cortex on a Kubernetes cluster, refer to the [Kubernetes deployment guide](deploy-cortex-on-kubernetes.md).
 The [following Guide](step-by-step-guide.md) let you **prepare**, **install** and **configure** Cortex and its prerequisites for Debian and RPM packages based Operating Systems, as well as for other systems and using our binary packages.
diff --git a/mkdocs.yml b/mkdocs.yml
index 06eab476a..76a3a83f0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -637,6 +637,7 @@ nav:
       - ./cortex/installation-and-configuration/proxy-settings.md
       - ./cortex/installation-and-configuration/docker.md
       - ./cortex/installation-and-configuration/database.md
+      - ./cortex/installation-and-configuration/deploy-cortex-on-kubernetes.md
     - 'User Guides':
       - 'First start' : 'cortex/user-guides/first-start.md'
       - 'User roles' : 'cortex/user-guides/roles.md'