Improvements to e2e tests and documentation #2233

Open
wants to merge 7 commits into
base: main
2 changes: 2 additions & 0 deletions e2e/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ STORAGE_CLASS?=
STORAGE_ENGINE?=
DUMP_OPERATOR_STATE?=true
SEAWEEDFS_IMAGE?=chrislusf/seaweedfs:3.73
NODE_SELECTOR?=
# Defines the cloud provider used for the underlying Kubernetes cluster. Currently only kind is supported; other cloud providers
# should still work, but this test framework has no special cases for them.
CLOUD_PROVIDER?=
Expand Down Expand Up @@ -183,4 +184,5 @@ endif
--unified-fdb-image=$(UNIFIED_FDB_IMAGE) \
--feature-server-side-apply=$(FEATURE_SERVER_SIDE_APPLY) \
--seaweedfs-image=$(SEAWEEDFS_IMAGE) $(FEATURE_LOCALITIES_FLAG) $(FEATURE_DNS) \
--node-selector="$(NODE_SELECTOR)" \
| grep -v 'constructing many client instances from the same exec auth config can cause performance problems during cert rotation' &> $(BASE_DIR)/../logs/$<.log
84 changes: 77 additions & 7 deletions e2e/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,22 @@
These tests must run on a Kubernetes cluster that has `amd64` Linux nodes, as FoundationDB currently has no builds for `arm64`.
Every test suite has a header in the `*_test.go` file that describes the test cases and the targeted scenarios.

## Setup

The e2e tests expect certain customizations in the Kubernetes cluster. To deploy those in the cluster, run:

```bash
make install
```

This is only necessary once.

To cover storage class migrations in tests, mark your cluster's storage classes as follows. Alternatively, you can skip those tests and provide the default storage class (see "StorageClass selection" below).

* Annotate the `StorageClass` to use for most tests with the annotation `storageclass.kubernetes.io/is-default-class=true`.
* Label at least two `StorageClasses` with `foundationdb.org/operator-testing=true`. If the test suite cannot find at least two different `StorageClasses`, the migration test will be skipped.


## Running the e2e tests

The following command will run all the operator related tests with the default values:
Expand All @@ -19,6 +35,40 @@ make -C e2e test_operator.run

Every test suite will create at least one namespace, HA cluster tests will create all the required namespaces.

### Running against a custom operator version

By default, the tests will test the "latest" official version (`docker.io/fdb-kubernetes-operator:latest`). To test
against a custom-built operator, there are three approaches:

#### Specify the operator container image

You can specify which container image to use via environment variable:

```bash
OPERATOR_IMAGE=fdb-kubernetes-operator:v1.54.0
```

#### Use a private container registry

You can specify a different container registry than `docker.io`:

```bash
REGISTRY=12345.dkr.ecr.us-east-1.amazonaws.com
```

The e2e test will now read all images from there, so make sure to have all the necessary FoundationDB containers
in this registry, too.

#### Specify registry per container

It is possible to use different registries for the operator and system under test by including the registry in the container image specification, e.g.:

```bash
REGISTRY=
OPERATOR_IMAGE=12345.dkr.ecr.us-east-1.amazonaws.com/fdb-kubernetes-operator:latest
UNIFIED_FDB_IMAGE=docker.io/foundationdb/fdb-kubernetes-monitor
```

### Reusing an existing test cluster

A test cluster can be reused if wanted, e.g. for test cases that load a large amount of data into the cluster.
Expand All @@ -39,12 +89,16 @@ You can provide the targeted `StorageClass` as an environment variable:
STORAGE_CLASS='my-fancy-storage' make -kj -C e2e test_operator.run
```

If the `STORAGE_CLASS` is not set, the operator will take the default `StorageClass` in this cluster.
The default `StorageClass` will be identified based on the annotation: `"storageclass.kubernetes.io/is-default-class=true`.
If the `STORAGE_CLASS` is not set, the operator will pick storage classes based on the annotation and labels described in "Setup" above.

### Using a custom nodeSelector

The e2e test suite has some tests, that will test a migration from one `StorageClass` to another.
To prevent potential issues, the e2e test suite will only select `StorageClasses` that have the label `foundationdb.org/operator-testing=true`.
If the test suite is not able to get at least 2 different `StorageClasses` the migration test will be skipped.
To start the FDB cluster on nodes matching a particular label (e.g. a particular node pool), you can provide a single
key-value pair in an environment variable that is added to the nodeSelector:

```bash
NODE_SELECTOR="my-label=true"
```

### Customize the e2e test runs

Expand Down Expand Up @@ -88,7 +142,7 @@ make -C e2e test_operator_upgrades.run

### Running e2e tests in kind

_NOTE_ This setup is currently not used by our CI.
_NOTE_ This setup is currently not used by anyone and is not being maintained. Some tests currently do not pass in `kind`.

[kind](https://kind.sigs.k8s.io) provides an easy way to run a local Kubernetes cluster.
For running tests on a `kind` cluster you should set the `CLOUD_PROVIDER=kind` environment variable to make sure the test framework is creating clusters with smaller resource requirements.
Expand All @@ -98,7 +152,18 @@ The following steps assume that `kind` and `helm` are already installed.
make -C e2e kind-setup
```

This will call the [setup_e2e.sh](./scripts/setup_e2e.sh) script to setup `kind` and install chaos-mesh.
This will call the [setup_e2e.sh](./scripts/setup_e2e.sh) script to set up `kind` and install chaos-mesh. Kind clusters do not load images
from a container registry; images need to be loaded explicitly, and `kind-setup` doesn't do this correctly right now. So
you may need to explicitly load missing images into the kind cluster, like:

```bash
kind load docker-image -n e2e-tests docker.io/foundationdb/fdb-kubernetes-monitor:7.1.57
kind load docker-image -n e2e-tests docker.io/foundationdb/fdb-kubernetes-monitor:7.3.59
```

If you forget one, the test will not be able to schedule FoundationDB nodes. You can see what is missing by looking
at the events section of the `kubectl describe pod` output.

After testing you can run the following command to remove the kind cluster:

```bash
Expand Down Expand Up @@ -155,3 +220,8 @@ You can run all tests with `make -kj -C e2e run`
All tests will be logging to the `logs` folder in the root of this repository.
If you want to see the current state of a running test, you can use `tail`, e.g. `tail -f ./logs/test_operator.log`, to follow the progress of the operator tests. The command assumes you are running it from the project directory.
All tests that are started by our CI pipelines will report in the PR with the test status.

The e2e tests start new pods frequently. Tests will fail if scheduling or provisioning is slow. In environments using
node provisioners such as `karpenter` (like AWS EKS), it is advisable to ensure there is enough spare capacity, by configuring a
minimum cluster size and/or by using fewer larger nodes. On the other hand, the cluster should have at least a handful
of nodes (say 5) to make it less likely that a majority of coordinators get scheduled on the same node.
5 changes: 5 additions & 0 deletions e2e/fixtures/cluster_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -172,6 +172,11 @@ func (config *ClusterConfig) SetDefaults(factory *Factory) {
	if config.TLSPeerVerification == "" {
		config.TLSPeerVerification = "I.CN=localhost,I.O=Example Inc.,S.CN=localhost,S.O=Example Inc."
	}
	nodeSelector := factory.GetNodeSelector()
	if nodeSelector != "" {
		splitSelector := strings.Split(nodeSelector, "=")
		config.NodeSelector = map[string]string{splitSelector[0]: splitSelector[1]}
	}
}

// getVolumeSize returns the volume size as a string. If no volume size is defined a default will be set based on
Expand Down
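Review note on the `SetDefaults` hunk above: the new code indexes `splitSelector[1]` directly and therefore relies on `validateNodeSelector` having run first. A minimal sketch of a more defensive parse follows. `parseNodeSelector` is a hypothetical helper that is not part of this PR, and the sketch assumes Go 1.18 or newer for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodeSelector is a hypothetical helper (not part of this PR). It turns a
// "key=value" string into a nodeSelector map and returns an error for
// malformed input instead of indexing into a split result.
func parseNodeSelector(selector string) (map[string]string, error) {
	if selector == "" {
		// No selector requested: leave the nodeSelector unset.
		return nil, nil
	}

	// strings.Cut splits around the first "=" only.
	key, value, found := strings.Cut(selector, "=")
	if !found || key == "" || strings.Contains(value, "=") {
		return nil, fmt.Errorf("node selector must have format key=value, got: %q", selector)
	}

	return map[string]string{key: value}, nil
}

func main() {
	selector, err := parseNodeSelector("my-label=true")
	fmt.Println(selector, err)
}
```

With a helper of this shape, `SetDefaults` could assign the returned map to `config.NodeSelector` and would not panic if it were ever called with an unvalidated string. Whether that extra safety is worth the additional error handling is a judgment call for the maintainers.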
12 changes: 8 additions & 4 deletions e2e/fixtures/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -584,7 +584,7 @@ func writePodInformation(pod corev1.Pod) string {
buffer.WriteString(strconv.Itoa(containers))
buffer.WriteString("\t")
buffer.WriteString(string(pod.Status.Phase))

buffer.WriteString("\t")
if pod.Status.Phase == corev1.PodPending {
for _, condition := range pod.Status.Conditions {
// Only check the PodScheduled condition.
Expand All @@ -594,20 +594,19 @@ func writePodInformation(pod corev1.Pod) string {

// If the Pod is scheduled we can ignore this condition.
if condition.Status == corev1.ConditionTrue {
buffer.WriteString("\t-")
buffer.WriteString("-")
continue
}

// Printout the message, why the Pod is not scheduling.
buffer.WriteString("\t")
if condition.Message != "" {
buffer.WriteString(condition.Message)
} else {
buffer.WriteString("-")
}
}
} else {
buffer.WriteString("\t-")
buffer.WriteString("-")
}

buffer.WriteString("\t")
Expand Down Expand Up @@ -879,3 +878,8 @@ func (factory *Factory) getStorageEngine() fdbv1beta2.StorageEngine {
func (factory *Factory) Intn(n int) int {
	return factory.randomGenerator.Intn(n)
}

// GetNodeSelector returns the node selector, which is an empty string or has the format key=value.
func (factory *Factory) GetNodeSelector() string {
	return factory.options.nodeSelector
}
18 changes: 18 additions & 0 deletions e2e/fixtures/options.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ type FactoryOptions struct {
featureOperatorUnifiedImage bool
featureOperatorServerSideApply bool
dumpOperatorState bool
nodeSelector string
}

// BindFlags binds the FactoryOptions flags to the provided FlagSet. This can be used to extend the current test setup
Expand Down Expand Up @@ -217,6 +218,7 @@ func (options *FactoryOptions) BindFlags(fs *flag.FlagSet) {
"chrislusf/seaweedfs:3.73",
"defines the seaweedfs image that should be used for testing. SeaweedFS is used for backup and restore testing to spin up a S3 compatible blobstore.",
)
fs.StringVar(&options.nodeSelector, "node-selector", "", "if defined, specifies a Kubernetes node selector for the FDB cluster in the format key=value")
}

func (options *FactoryOptions) validateFlags() error {
Expand Down Expand Up @@ -281,6 +283,11 @@ func (options *FactoryOptions) validateFlags() error {
options.cloudProvider = strings.ToLower(options.cloudProvider)
}

	err := options.validateNodeSelector()
	if err != nil {
		return err
	}

	return options.validateFDBVersionTagMapping()
}

Expand Down Expand Up @@ -321,6 +328,17 @@ func (options *FactoryOptions) validateFDBVersionTagMapping() error {
return nil
}

func (options *FactoryOptions) validateNodeSelector() error {
	if options.nodeSelector == "" {
		return nil
	}
	splitSelector := strings.Split(options.nodeSelector, "=")
	if len(splitSelector) != 2 {
		return fmt.Errorf("node selector must have format key=value, got: %s", options.nodeSelector)
	}
	return nil
}

// getTagSuffix returns "-1" if the tag suffix should be used for a sidecar image.
func getTagSuffix(isSidecar bool) string {
if isSidecar {
Expand Down
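Review note on `validateNodeSelector` above: the `len(splitSelector) != 2` check accepts any string with exactly one `=`, including one with an empty key such as `=true`. The sketch below illustrates this. `splitCheck` restates the check from this PR as a free function, and `strictCheck` is a hypothetical stricter variant. Both names are illustrative and do not exist in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCheck mirrors the check in validateNodeSelector as a free function.
func splitCheck(selector string) error {
	if selector == "" {
		return nil
	}
	if len(strings.Split(selector, "=")) != 2 {
		return fmt.Errorf("node selector must have format key=value, got: %s", selector)
	}
	return nil
}

// strictCheck is a hypothetical variant that additionally rejects an empty key.
func strictCheck(selector string) error {
	if err := splitCheck(selector); err != nil {
		return err
	}
	if strings.HasPrefix(selector, "=") {
		return fmt.Errorf("node selector key must not be empty, got: %s", selector)
	}
	return nil
}

func main() {
	inputs := []string{"", "my-label=true", "my-label", "a=b=c", "=true", "my-label="}
	for _, selector := range inputs {
		// Print whether each check accepts the input.
		fmt.Printf("%q split=%v strict=%v\n", selector, splitCheck(selector) == nil, strictCheck(selector) == nil)
	}
}
```

An empty value (`my-label=`) is left valid in both variants, since Kubernetes label values may be empty.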
10 changes: 6 additions & 4 deletions e2e/test_operator/operator_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1250,10 +1250,12 @@ var _ = Describe("Operator", Label("e2e", "pr"), func() {
})

AfterEach(func() {
Expect(fdbCluster.UpdateStorageClass(
defaultStorageClass,
fdbv1beta2.ProcessClassLog,
)).NotTo(HaveOccurred())
if defaultStorageClass != "" {
Expect(fdbCluster.UpdateStorageClass(
defaultStorageClass,
fdbv1beta2.ProcessClassLog,
)).NotTo(HaveOccurred())
}
})
})

Expand Down