Commit df3ff21

Update etcd-druid documentation and enhance kind-up script to start a local kind registry container (#889)
Improved etcd-druid documentation and enhanced hack/kind-up.sh script
1 parent ac9753e commit df3ff21

48 files changed

+1594
-615
lines changed

.gitignore

+1
@@ -1,6 +1,7 @@
11
/args
22
/bin
33
/hack/tools/bin
4+
/hack/kind/*
45
/.kube-secrets
56
/tmp/*
67
/dev

CONTRIBUTING.md

+1-1
@@ -1 +1 @@
1-
Please refer to the [Gardener contributor guide](https://gardener.cloud/docs/contribute).
1+
Please refer to the [etcd-druid contributor guide](docs/development/contribution.md).

Makefile

+5-5
@@ -15,9 +15,9 @@ PLATFORM ?= $(shell docker info --format '{{.OSType}}/{{.Architecture
1515
BUILD_DIR := build
1616
PROVIDERS := ""
1717
BUCKET_NAME := "e2e-test"
18-
KUBECONFIG_PATH := $(HACK_DIR)/e2e-test/infrastructure/kind/kubeconfig
19-
TEST_COVER := "true"
2018
IMG ?= ${IMAGE_REPOSITORY}:${IMAGE_BUILD_TAG}
19+
TEST_COVER := "true"
20+
KUBECONFIG_PATH := $(HACK_DIR)/kind/kubeconfig
2121

2222
# Tools
2323
# -------------------------------------------------------------------------
@@ -123,7 +123,7 @@ test-e2e: $(KUBECTL) $(HELM) $(SKAFFOLD) $(KUSTOMIZE)
123123
@VERSION=$(VERSION) GIT_SHA=$(GIT_SHA) $(HACK_DIR)/e2e-test/run-e2e-test.sh $(PROVIDERS)
124124

125125
.PHONY: ci-e2e-kind
126-
ci-e2e-kind: $(GINKGO)
126+
ci-e2e-kind: $(GINKGO) $(YQ) $(KIND)
127127
@BUCKET_NAME=$(BUCKET_NAME) $(HACK_DIR)/ci-e2e-kind.sh
128128

129129
.PHONY: ci-e2e-kind-azure
@@ -165,12 +165,12 @@ kind-up kind-down ci-e2e-kind ci-e2e-kind-azure deploy-localstack deploy-azurite
165165

166166
.PHONY: kind-up
167167
kind-up: $(KIND)
168-
@printf "\n\033[0;33m📌 NOTE: To target the newly created KinD cluster, please run the following command:\n\n export KUBECONFIG=$(KUBECONFIG_PATH)\n\033[0m\n"
169168
@$(HACK_DIR)/kind-up.sh
169+
@printf "\n\033[0;33m📌 NOTE: To target the newly created KinD cluster, please run the following command:\n\n export KUBECONFIG=$(KUBECONFIG_PATH)\n\033[0m\n"
170170

171171
.PHONY: kind-down
172172
kind-down: $(KIND)
173-
$(KIND) delete cluster --name etcd-druid-e2e
173+
@$(HACK_DIR)/kind-down.sh
174174

175175
# Install CRDs into a cluster
176176
.PHONY: install

README.md

+3-6
@@ -1,10 +1,8 @@
1-
# etcd-druid
2-
3-
<image src="logo/etcd-druid-logo.png" style="width:300px"></image>
1+
<img src="docs/assets/logo/etcd-druid-with-tagline.png" style="width:120%"></img>
42

53
[![REUSE status](https://api.reuse.software/badge/github.com/gardener/etcd-druid)](https://api.reuse.software/info/github.com/gardener/etcd-druid) [![CI Build status](https://concourse.ci.gardener.cloud/api/v1/teams/gardener/pipelines/etcd-druid-master/jobs/master-head-update-job/badge)](https://concourse.ci.gardener.cloud/teams/gardener/pipelines/etcd-druid-master/jobs/master-head-update-job) [![Go Report Card](https://goreportcard.com/badge/github.com/gardener/etcd-druid)](https://goreportcard.com/report/github.com/gardener/etcd-druid) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](LICENSE) [![Release](https://img.shields.io/github/v/release/gardener/etcd-druid.svg?style=flat)](https://github.com/gardener/etcd-druid) [![Go Reference](https://pkg.go.dev/badge/github.com/gardener/etcd-druid.svg)](https://pkg.go.dev/github.com/gardener/etcd-druid)
64

7-
`etcd-druid` is an [etcd](https://github.com/etcd-io/etcd) [operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) which makes it easy to configure, provision, reconcile and monitor etcd clusters. It enables management of an etcd cluster through [declarative Kubernetes API model](config/crd/bases/crd-druid.gardener.cloud_etcds.yaml).
5+
`etcd-druid` is an [etcd](https://github.com/etcd-io/etcd) [operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) which makes it easy to configure, provision, reconcile, monitor and delete etcd clusters. It enables management of etcd clusters through [declarative Kubernetes API model](config/crd/bases/crd-druid.gardener.cloud_etcds.yaml).
86

97
In every etcd cluster managed by `etcd-druid`, each etcd member is a two container `Pod` which consists of:
108

@@ -23,7 +21,6 @@ In every etcd cluster managed by `etcd-druid`, each etcd member is a two contain
2321
- Offers an asynchronous and threshold based capability to process backed up snapshots to:
2422
- Potentially minimize the recovery time by leveraging restoration from backups followed by [etcd's compaction and defragmentation](https://etcd.io/docs/v3.4/op-guide/maintenance/).
2523
- Indirectly assert integrity of the backed up snapshots.
26-
2724
- Allows seamless copy of backups between any two object store buckets.
2825

2926
## Start using or developing `etcd-druid` locally
@@ -36,7 +33,7 @@ For detailed documentation, see our `/docs` folder. Please find the [index](docs
3633

3734
## Contributions
3835

39-
If you wish to contribute then please see our [guidelines](https://github.com/gardener/etcd-druid/blob/4e9971aba3c3880a4cb6583d05843eabb8ca1409/CONTRIBUTING.md).
36+
If you wish to contribute then please see our [contributor guidelines](docs/development/contribution.md).
4037

4138
## Feedback and Support
4239

docs/README.md

+26-19
@@ -1,31 +1,42 @@
11
# Documentation Index
22

3-
4-
53
## Concepts
64

7-
* [Controllers](concepts/controllers.md)
8-
* [Webhooks](concepts/webhooks.md)
5+
* [Etcd Cluster Components](concepts/etcd-cluster-components.md)
6+
* [Protecting Etcd Cluster Resources](concepts/etcd-cluster-resource-protection.md)
97

108
## Development
119

12-
* [Testing(Unit, Integration and E2E Tests)](development/testing.md)
13-
* [etcd Network Latency](development/etcd-network-latency.md)
14-
* [Getting started locally using azurite emulator](development/getting-started-locally-azurite.md)
15-
* [Getting started locally using localstack emulator](development/getting-started-locally-localstack.md)
10+
* [Prepare Dev Environment](development/prepare-dev-environment.md)
1611
* [Getting started locally](development/getting-started-locally.md)
17-
* [Local End-To-End Tests](development/local-e2e-tests.md)
12+
* [Dependency Management](development/dependency-management.md)
13+
* [Changing the API](development/changing-api.md)
14+
* [Controllers](development/controllers.md)
15+
* [Add a new Etcd Cluster Component](development/add-new-etcd-cluster-component.md)
16+
* [Raising a Pull Request](development/raising-a-pr.md)
17+
* [Testing (Unit, Integration and E2E Tests)](development/testing.md)
1818

1919
## Deployment
2020

21-
* [etcd-druid CLI Flags](deployment/cli-flags.md)
21+
* [Getting started locally](deployment/getting-started-locally/getting-started-locally.md)
22+
* [Configure etcd-druid](deployment/configure-etcd-druid.md)
2223
* [Feature Gates](deployment/feature-gates.md)
24+
* [Recommendations for productive setup](deployment/production-setup-recommendations.md)
25+
* [Version Compatibility Matrix](deployment/version-compatibility-matrix.md)
26+
27+
## Monitoring
28+
29+
* [Metrics](monitoring/metrics.md)
30+
31+
## Benchmarks
2332

24-
## Operations
33+
* [etcd Network Latency](benchmark/etcd-network-latency.md)
2534

26-
* [Metrics](operations/metrics.md)
27-
* [Recovery from Permanent Quorum Loss in etcd cluster](operations/recovery-from-permanent-quorum-loss-in-etcd-cluster.md)
28-
* [Restoring single member in a Multi-Node etcd cluster](operations/restoring-single-member-in-multi-node-etcd-cluster.md)
35+
## Usage
36+
37+
* [Managing Etcd Clusters](usage/managing-etcd-clusters.md)
38+
* [Securing Etcd Clusters](usage/securing-etcd-clusters.md)
39+
* [Recovering Etcd Clusters](usage/recovering-etcd-clusters.md)
2940

3041
## Proposals
3142

@@ -34,8 +45,4 @@
3445
* [DEP-2: Snapshot compaction](proposals/02-snapshot-compaction.md)
3546
* [DEP-3: Scaling up an Etcd cluster](proposals/03-scaling-up-an-etcd-cluster.md)
3647
* [DEP-4: Etcd Member custom resource](proposals/04-etcd-member-custom-resource.md)
37-
* [DEP-5: Etcd Operator Tasks](proposals/05-etcd-operator-tasks.md)
38-
39-
## Usage
40-
41-
* [Supported K8S versions](usage/supported_k8s_versions.md)
48+
* [DEP-5: Etcd Operator Tasks](proposals/05-etcd-operator-tasks.md)
File renamed without changes.
+77
@@ -0,0 +1,77 @@
1+
# Etcd Cluster Components
2+
3+
For every `Etcd` cluster that it provisions, `etcd-druid` deploys a set of resources. The following sections provide information on, and code references for, each such resource.
4+
5+
## StatefulSet
6+
7+
[StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) is the primary Kubernetes resource that gets provisioned for an etcd cluster.
8+
9+
* Replicas for the StatefulSet are derived from `Etcd.Spec.Replicas` in the custom resource.
10+
11+
* Each pod comprises two containers:
12+
  * `etcd-wrapper`: This is the main container, which runs an etcd process.
13+
14+
  * `etcd-backup-restore`: This is a sidecar container which does the following:
15+
16+
    * Orchestrates the initialization of etcd. This includes validating any existing etcd data directory and, for a single-member etcd cluster, restoring it if the data directory files are corrupt.
17+
    * Periodically renews the member lease.
18+
    * Optionally takes schedule-based and threshold-based delta and full snapshots and pushes them to a configured object store.
19+
    * Orchestrates scheduled etcd-db defragmentation.
20+
21+
> NOTE: This is not a complete list of the functionality offered by `etcd-backup-restore`.
22+
23+
**Code reference:** [StatefulSet-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/statefulset)
24+
25+
> For detailed information on each container, see the [etcd-wrapper](https://github.com/gardener/etcd-wrapper) and [etcd-backup-restore](https://github.com/gardener/etcd-backup-restore) repositories.
26+
27+
## ConfigMap
28+
29+
Every `etcd` member must be started with a [configuration](https://etcd.io/docs/v3.4/op-guide/configuration/). `etcd-druid` creates a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) which gets mounted onto the `etcd-backup-restore` container. The `etcd-backup-restore` container modifies the etcd configuration and serves it to the `etcd-wrapper` container upon request.
30+
31+
**Code reference:** [ConfigMap-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/configmap)
32+
33+
## PodDisruptionBudget
34+
35+
An etcd cluster requires quorum for all write operations. Clients can additionally configure quorum-based reads to ensure [linearizable](https://jepsen.io/consistency/models/linearizable) reads (kube-apiserver's etcd client is configured for linearizable reads and writes). [Failure tolerance](https://etcd.io/docs/v3.3/faq/#what-is-failure-tolerance) for an etcd cluster with `n` replicas is computed as `(n-1)/2`, rounded down. In a cluster of size 3, for example, only 1 member failure is tolerated.
36+
37+
To ensure that no more etcd pods are evicted than the cluster's failure tolerance allows, `etcd-druid` creates a [PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets).
38+
39+
> **NOTE:** For a single-node etcd cluster a `PodDisruptionBudget` is still created, but `pdb.spec.minAvailable` is set to 0, which effectively disables it.
40+
41+
**Code reference:** [PodDisruptionBudget-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/poddistruptionbudget)
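The failure-tolerance arithmetic described above can be sketched as follows. This is a minimal illustration, not code from `etcd-druid`; the function names `quorum` and `failure_tolerance` are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Smallest number of members that forms a majority of the cluster."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return replicas // 2 + 1


def failure_tolerance(replicas: int) -> int:
    """Members that may fail while quorum is kept; equals (n-1)/2 rounded down."""
    return replicas - quorum(replicas)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"replicas={n} quorum={quorum(n)} tolerance={failure_tolerance(n)}")
```

Note that an even-sized cluster tolerates no more failures than the next smaller odd-sized one, which is why odd replica counts are the usual choice.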
42+
43+
## ServiceAccount
44+
45+
The `etcd-backup-restore` container, running as a sidecar in every etcd member, requires permissions to access resources like `Lease`, `StatefulSet` etc. A dedicated [ServiceAccount](https://kubernetes.io/docs/concepts/security/service-accounts/) is created per `Etcd` cluster for this purpose.
46+
47+
**Code reference:** [ServiceAccount-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/serviceaccount)
48+
49+
## Role & RoleBinding
50+
51+
The `etcd-backup-restore` container, running as a sidecar in every etcd member, requires permissions to access resources like `Lease`, `StatefulSet` etc. A dedicated `Role` and `RoleBinding` are created and linked to the [ServiceAccount](https://kubernetes.io/docs/concepts/security/service-accounts/) created per `Etcd` cluster.
52+
53+
**Code reference:** [Role-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/role) & [RoleBinding-Component](https://github.com/gardener/etcd-druid/tree/master/internal/component/rolebinding)
54+
55+
## Client & Peer Service
56+
57+
To enable clients to connect to an etcd cluster, a ClusterIP `Client` [Service](https://kubernetes.io/docs/concepts/services-networking/service/) is created. To enable `etcd` members to talk to each other (for discovery, leader election, raft consensus etc.), `etcd-druid` also creates a [Headless Service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services).
58+
59+
**Code reference:** [Client-Service-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/clientservice) & [Peer-Service-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/peerservice)
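To illustrate why a headless service suits peer communication: Kubernetes gives every StatefulSet pod a stable DNS name of the form `<pod>.<headless-service>.<namespace>.svc`. The sketch below derives peer URLs from that rule. The resource names (`etcd-main`, `etcd-main-peer`), the `http` scheme and the use of etcd's conventional peer port 2380 are assumptions for illustration and are not taken from `etcd-druid`.

```python
from typing import List


def peer_urls(
    statefulset: str,
    peer_service: str,
    namespace: str,
    replicas: int,
    port: int = 2380,      # etcd's conventional peer port (assumed here)
    scheme: str = "http",  # assumed; a TLS-enabled cluster would use https
) -> List[str]:
    """Build one stable peer URL per StatefulSet pod ordinal."""
    return [
        f"{scheme}://{statefulset}-{ordinal}.{peer_service}.{namespace}.svc:{port}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical names, chosen only for the example.
    for url in peer_urls("etcd-main", "etcd-main-peer", "default", 3):
        print(url)
```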
60+
61+
## Member Lease
62+
63+
Every member in an `Etcd` cluster has a dedicated [Lease](https://kubernetes.io/docs/concepts/architecture/leases/), which signifies that the member is alive. It is the responsibility of the `etcd-backup-restore` sidecar container to periodically renew the lease.
64+
65+
> Today the lease object is also used to indicate the member ID and the role of the member in an etcd cluster. Possible roles are `Leader` and `Member` (which denotes that this is a member but not a leader). This will change in the future with the [EtcdMember resource](https://github.com/gardener/etcd-druid/blob/3383e0219a6c21c6ef1d5610db964cc3524807c8/docs/proposals/04-etcd-member-custom-resource.md).
66+
67+
**Code reference:** [Member-Lease-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/memberlease)
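The text above says the lease carries the member ID and role but does not say how they are encoded. As a sketch only, assume they are stored as a single `<member-id>:<role>` string; that format, the example ID and the names `MemberInfo` and `parse_member_identity` are hypothetical.

```python
from typing import NamedTuple

# Roles named in the text above.
VALID_ROLES = frozenset({"Leader", "Member"})


class MemberInfo(NamedTuple):
    member_id: str
    role: str


def parse_member_identity(identity: str) -> MemberInfo:
    """Split an assumed '<member-id>:<role>' string and validate the role."""
    member_id, separator, role = identity.partition(":")
    if not separator or not member_id or role not in VALID_ROLES:
        raise ValueError(f"unexpected member identity: {identity!r}")
    return MemberInfo(member_id, role)


if __name__ == "__main__":
    print(parse_member_identity("272e204152:Leader"))
```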
68+
69+
## Delta & Full Snapshot Leases
70+
71+
One of the responsibilities of the `etcd-backup-restore` container is to take periodic or threshold-based snapshots (delta and full) of the etcd DB. Today `etcd-backup-restore` communicates the end-revision of the latest full and delta snapshots to the `etcd-druid` operator via leases.
72+
73+
`etcd-druid` creates two [Lease](https://kubernetes.io/docs/concepts/architecture/leases/) resources, one for delta snapshots and another for full snapshots. This information is used by the operator to trigger [snapshot-compaction](../proposals/02-snapshot-compaction.md) jobs. Snapshot leases are also used to derive the health of backups, which is reflected in the `Status` subresource of every `Etcd` resource.
74+
75+
> In the future these leases will be replaced by the [EtcdMember resource](https://github.com/gardener/etcd-druid/blob/3383e0219a6c21c6ef1d5610db964cc3524807c8/docs/proposals/04-etcd-member-custom-resource.md).
76+
77+
**Code reference:** [Snapshot-Lease-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/snapshotlease)
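One way the operator could use the two end-revisions to decide when a snapshot-compaction job is due is to compare the revisions accumulated since the last full snapshot against a threshold. The sketch below assumes exactly that rule; the function name, the parameter names and the threshold semantics are hypothetical and are not confirmed by the text above.

```python
def should_trigger_compaction(
    full_snapshot_revision: int,
    delta_snapshot_revision: int,
    events_threshold: int,
) -> bool:
    """Return True once enough revisions have accumulated since the last full snapshot."""
    if events_threshold <= 0:
        raise ValueError("events_threshold must be positive")
    accumulated = delta_snapshot_revision - full_snapshot_revision
    return accumulated >= events_threshold


if __name__ == "__main__":
    # Revisions as they might be read from the full and delta snapshot leases.
    print(should_trigger_compaction(1_000, 1_500, events_threshold=1_000))
    print(should_trigger_compaction(1_000, 2_500, events_threshold=1_000))
```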
@@ -0,0 +1,24 @@
1+
# Etcd Cluster Resource Protection
2+
3+
`etcd-druid` provisions and manages [Kubernetes resources (a.k.a. components)](etcd-cluster-components.md) for each `Etcd` cluster. To ensure that each component's specification stays in line with the attributes configured in the `Etcd` custom resource, and to protect these *managed components* against unintended changes, a [Validating Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) is employed.
4+
5+
[Etcd Components Webhook](https://github.com/gardener/etcd-druid/tree/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/internal/webhook/etcdcomponents) is the *validating webhook* which prevents unintended *UPDATE* and *DELETE* operations on all managed resources. The following sections describe what is prohibited and under which specific conditions changes are permitted.
6+
7+
## Configure Etcd Components Webhook
8+
9+
A prerequisite for enabling the validating webhook is to [configure the Webhook Server](../deployment/configure-etcd-druid.md#webhook-server). Additionally, you need to enable the `Etcd Components` validating webhook and optionally configure its other options. All available options are listed [here](../deployment/configure-etcd-druid.md#etcd-components-webhook).
10+
11+
## What is allowed?
12+
13+
Modifications to managed resources under the following circumstances will be allowed:
14+
15+
* `Create` and `Connect` operations are allowed and no validation is done.
16+
* Changes to a Kubernetes resource (e.g. StatefulSet, ConfigMap etc.) not managed by etcd-druid are allowed.
17+
* Changes to a resource whose Group-Kind is amongst the resources managed by etcd-druid but does not have a parent `Etcd` resource are allowed.
18+
* An operator may wish to explicitly disable etcd-component protection. This can be done by setting the `druid.gardener.cloud/disable-etcd-component-protection` annotation on an `Etcd` resource. If this annotation is present, then changes to managed components will be allowed.
19+
* If the `Etcd` resource has a deletion timestamp set, indicating that it is marked for deletion and is awaiting etcd-druid to delete all managed resources, then deletion requests for all managed resources of this etcd cluster will be allowed if either:
20+
  * The deletion request has come from a `ServiceAccount` associated with etcd-druid. If not explicitly specified via `--reconciler-service-account`, then a [default-reconciler-service-account](https://github.com/gardener/etcd-druid/blob/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/internal/webhook/etcdcomponents/config.go#L23) will be assumed.
21+
  * The deletion request has come from a `ServiceAccount` configured via `--etcd-components-webhook-exempt-service-accounts`.
22+
* `Lease` objects are periodically updated by each etcd member pod. A single `ServiceAccount` is created for all members. `Update` operation on `Lease` objects from [this ServiceAccount](https://github.com/gardener/etcd-druid/blob/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/api/v1alpha1/helper.go#L28) is allowed.
23+
* If an active reconciliation is in progress, then only operations initiated by etcd-druid are allowed.
24+
* If no active reconciliation is currently in progress, then updates to managed resources from `ServiceAccounts` configured via `--etcd-components-webhook-exempt-service-accounts` are allowed.
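
The rules listed above can be read as a decision function. The following is a minimal sketch under stated assumptions, not the webhook's actual implementation: the types `EtcdInfo` and `Request`, the function `is_allowed`, the order in which the rules are checked, and the choice to deny anything the list does not explicitly allow are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

DISABLE_PROTECTION_ANNOTATION = "druid.gardener.cloud/disable-etcd-component-protection"


@dataclass
class EtcdInfo:
    """Hypothetical view of the parent Etcd resource."""
    annotations: Dict[str, str] = field(default_factory=dict)
    being_deleted: bool = False       # deletion timestamp is set
    reconciling: bool = False         # an active reconciliation is in progress
    member_service_account: str = ""  # ServiceAccount shared by all member pods


@dataclass
class Request:
    """Hypothetical view of an admission request."""
    operation: str                    # "CREATE", "CONNECT", "UPDATE" or "DELETE"
    kind: str                         # e.g. "StatefulSet", "Lease"
    user: str                         # requesting ServiceAccount
    managed_kind: bool = True         # Group-Kind is one that etcd-druid manages
    etcd: Optional[EtcdInfo] = None   # parent Etcd resource, if any


def is_allowed(req: Request, reconciler_sa: str, exempt_sas: FrozenSet[str]) -> bool:
    # Create and Connect are never validated.
    if req.operation in ("CREATE", "CONNECT"):
        return True
    # Resources of unmanaged kinds, or without a parent Etcd, are not protected.
    if not req.managed_kind or req.etcd is None:
        return True
    etcd = req.etcd
    # Protection was explicitly disabled on the Etcd resource.
    if DISABLE_PROTECTION_ANNOTATION in etcd.annotations:
        return True
    # During Etcd deletion, only etcd-druid or exempt accounts may delete.
    if etcd.being_deleted and req.operation == "DELETE":
        return req.user == reconciler_sa or req.user in exempt_sas
    # Member pods renew their Lease objects.
    if req.kind == "Lease" and req.operation == "UPDATE" and req.user == etcd.member_service_account:
        return True
    # While reconciling, only etcd-druid itself may change managed resources.
    if etcd.reconciling:
        return req.user == reconciler_sa
    # Otherwise only updates from exempt accounts pass (assumed default: deny).
    return req.operation == "UPDATE" and req.user in exempt_sas
```

For example, with `reconciler_sa="druid-sa"` and `exempt_sas=frozenset({"ops-sa"})`, an `UPDATE` to a managed `StatefulSet` from any other account is rejected unless the parent `Etcd` carries the disable-protection annotation.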
