Quick subsequent reconciliations cause errors caused by using cached client #733

pmalek · 2024-10-10T12:36:11Z

Problem statement

controller-runtime (use by the operator) uses a cached client.

This causes issues when quick, successive reconciliations of a resource happen and the subsequent iterations do not get an up to date view on an updated/patched object. For example:

calls an API (either external or kubeapi-server's) to create a resource
the request was successfull, object's status is patched with ID, controller returns from reconciliation loop (update/patch will cause a requeue)
another reconciliation happens
controller (using controller-runtime's client) gets the object and sees that there is no ID in status, attempts to call create again
this can result in
- an error if the call specificies unique createria (like ID, or name which has to be unique)
- or duplicated resources

Up until now, this has been taken care of by calling one of the Reduce*() helper functions which are aware of the objects' schema and their priorities so that surplus objects can be removed without consequences.

With integration with external APIs (e.g. Konnect API in #370) this does not work as the resource is created in external API and more importantly the create API calls will result in HTTP 409 Conflicts.

Exemplar logs

request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
response:
HTTP/2.0 201 Created
Connection: close
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:14 GMT
Etag: W/"2a5-ebpzw2GSE0sps7ves89bnmx3ey4"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4920
Ratelimit-Reset: 47
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5c13d5952748e800
X-B3-Traceid: 000000000000000039ca0f476805b600
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6634881512632804896
X-Datadog-Trace-Id: 4164157604819744060
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 940
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 941
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4920
X-Xss-Protection: 0

{"id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc","name":"test1","description":"","labels":{"app":"test1","key1":"test1","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","session":"eng-demo","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-group":"konnect.konghq.com","k8s-version":"v1alpha1","k8s-namespace":"default","k8s-generation":"1"},"config":{"control_plane_endpoint":"https://25d04d0909.eu.cp0.konghq.tech","telemetry_endpoint":"https://25d04d0909.eu.tp0.konghq.tech","cluster_type":"CLUSTER_TYPE_CONTROL_PLANE","auth_type":"pinned_client_certs","cloud_gateway":false,"proxy_urls":[]},"created_at":"2024-10-10T12:22:13.692Z","updated_at":"2024-10-10T12:22:13.692Z"}
{"level":"info","ts":"2024-10-10T14:22:14.528+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API complete","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"8fbd8dcf-53a9-47b8-ac2f-790815e089ed","op":"create","duration":"990.068708ms","konnect_id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc"}
{"level":"debug","ts":"2024-10-10T14:22:14.558+0200","logger":"KonnectGatewayControlPlane","msg":"reconciling","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265"}
request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
Error: <nil>
response:
HTTP/2.0 409 Conflict
Content-Length: 169
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/problem+json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:15 GMT
Etag: W/"a9-+hkYkS5+BfEhKzaetYcn4sNF9XU"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4919
Ratelimit-Reset: 46
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5fe637d7c66f5c00
X-B3-Traceid: 00000000000000004793b70701333800
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6910272078133288161
X-Datadog-Trace-Id: 5157667238982137886
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 360
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 361
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4919
X-Xss-Protection: 0

{"status":409,"title":"Conflict","instance":"kong:trace:5157667238982137886","detail":"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists."}
{"level":"error","ts":"2024-10-10T14:22:14.962+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API failed","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265","op":"create","duration":"403.763875ms","error":"failed to create KonnectGatewayControlPlane default/test1: {\"status\":409,\"title\":\"Conflict\",\"instance\":\"kong:trace:5157667238982137886\",\"detail\":\"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists.\"}"}

Proposed solution

Make use of expectations https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go

Related material

https://www.youtube.com/watch?v=wMqzAOp15wo&t=748s

Acceptance criteria

Subsequent reconciliations of a resource should be idempotent (e.g. do not attempt to create a resource more than once)

The text was updated successfully, but these errors were encountered:

pmalek · 2024-10-11T08:38:09Z

As a follow up to the conversation on Kubernetes slack: https://kubernetes.slack.com/archives/C02MRBMN00Z/p1728554873783249.

A possible way forward (at least for the resources that are created against an external API like Konnect) we can implement a fallback lookup when 409 Conflict is returned. We could add filtering based on Kubernetes UID so that we're not "adopting" (separate issue #460) resources mapped from elsewhere. This could look like this (example for KonnectGatewayControlPlane):

	var sdkConflictError *sdkkonnecterrs.ConflictError
		if errors.As(err, &sdkConflictError) {
			reqList := operations.ListControlPlanesRequest{
				FilterNameEq: lo.ToPtr(cp.Spec.Name),
				Labels:       lo.ToPtr(KubernetesUIDLabelKey + ":" + string(cp.GetUID())),
			}
			if cp.Spec.ClusterType != nil {
				reqList.FilterClusterTypeEq = lo.ToPtr(string(*cp.Spec.ClusterType))
			}

			respList, err := sdk.ListControlPlanes(ctx, reqList)
			if err != nil {
				return err
			}
			for _, listCP := range respList.ListControlPlanesResponse.Data {
				if listCP.Name != nil && *listCP.Name == req.Name {
					cp.Status.SetKonnectID(*listCP.ID)
					break
				}
			}

pmalek added this to the KGO v1.5.x milestone Oct 10, 2024

pmalek changed the title ~~Quick subsequent reconciliations cause errors~~ Quick subsequent reconciliations cause errors caused by using cached client Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quick subsequent reconciliations cause errors caused by using cached client #733

Quick subsequent reconciliations cause errors caused by using cached client #733

pmalek commented Oct 10, 2024

pmalek commented Oct 11, 2024

Quick subsequent reconciliations cause errors caused by using cached client #733

Quick subsequent reconciliations cause errors caused by using cached client #733

Comments

pmalek commented Oct 10, 2024

Problem statement

Proposed solution

Related material

Acceptance criteria

pmalek commented Oct 11, 2024