Add support for internal-frontend #602

Open · wants to merge 2 commits into base: main
8 changes: 8 additions & 0 deletions charts/temporal/templates/admintools-deployment.yaml
@@ -33,9 +33,17 @@ spec:
env:
# TEMPORAL_CLI_ADDRESS is deprecated, use TEMPORAL_ADDRESS instead
- name: TEMPORAL_CLI_ADDRESS
{{- if index $.Values.server "internal-frontend" "enabled" }}


@adamko147 Just a question about security:

If we use the internal frontend here, then everyone who has access to the admin-tools pod will be able to execute commands without any additional auth.

Is this expected?

@adamko147 (Nov 1, 2024):

Well, that's how we use it in our use case. Temporal runs in a dedicated cluster/namespace and only the frontend is exposed to the outside world. Access to the k8s cluster is controlled via IAM, so unless you have cluster access, you don't have admin-tools access.
It also works better for an ops person troubleshooting, who logs into admin tools using kubectl and from there has access without needing to generate an access token. And as I said, kubectl access is controlled by IAM connected to our internal company IDP, where we also configure the auth access.

Author:

@adamko147 what you described matches our use case as well


My internal "deny all" rule is screaming that this will be a problem for some people who have stricter policies on this topic. But if the code owners are OK with it, then I'm OK too.

value: {{ include "temporal.fullname" $ }}-internal-frontend:{{ index $.Values.server "internal-frontend" "service" "port" }}
{{- else }}
value: {{ include "temporal.fullname" $ }}-frontend:{{ .Values.server.frontend.service.port }}
{{- end }}
- name: TEMPORAL_ADDRESS
{{- if index $.Values.server "internal-frontend" "enabled" }}
value: {{ include "temporal.fullname" $ }}-internal-frontend:{{ index $.Values.server "internal-frontend" "service" "port" }}
{{- else }}
value: {{ include "temporal.fullname" $ }}-frontend:{{ .Values.server.frontend.service.port }}
{{- end }}
{{- if .Values.admintools.additionalEnv }}
{{- toYaml .Values.admintools.additionalEnv | nindent 12 }}
{{- end }}
11 changes: 11 additions & 0 deletions charts/temporal/templates/server-configmap.yaml
@@ -119,6 +119,15 @@ data:
membershipPort: {{ $server.frontend.service.membershipPort }}
bindOnIP: "0.0.0.0"
{{- if index $.Values.server "internal-frontend" "enabled" }}
internal-frontend:
rpc:
grpcPort: {{ index $server "internal-frontend" "service" "port" }}
httpPort: {{ index $server "internal-frontend" "service" "httpPort" }}
membershipPort: {{ index $server "internal-frontend" "service" "membershipPort" }}
bindOnIP: "0.0.0.0"
{{- end }}
history:
rpc:
grpcPort: {{ $server.history.service.port }}
@@ -173,8 +182,10 @@ data:
{{- toYaml . | nindent 6 }}
{{- end }}
{{- if not (index $.Values.server "internal-frontend" "enabled") }}


Did you test it with authorization?

When I was preparing similar changes, I had to add a condition so that "authorization" + "publicClient" apply only to the frontend. Otherwise internalFrontend will try to do auth and fail. Also, if we skip the auth section for the frontend, then users will be able to access the UI without permission.

I ended up creating a loop to render a separate configmap per service:

{{- range $originalService := (list "frontend" "internalFrontend" "history" "matching" "worker") }}
{{ $serviceValues := index $.Values.server $originalService }}
{{ $service := include "serviceName" (list $originalService) }}
...
      {{- if eq $service "frontend" }}
      {{- with $server.config.authorization }}
      authorization:
        {{- toYaml . | nindent 10 }}
      {{- end }}
      {{- end }}
...
      {{- if ne $service "frontend"}}
      internal-frontend:
        rpc:
            grpcPort: {{ $server.internalFrontend.service.port }}
            httpPort: {{ $server.internalFrontend.service.httpPort }}
            membershipPort: {{ $server.internalFrontend.service.membershipPort }}
            bindOnIP: "0.0.0.0"
      {{- end }}

Author:

@TheGeniesis We have tested it with authorization on the frontend (in our case using OIDC), which is working, but I haven't specifically tested auth for the internal-frontend. I was under the impression (possibly mistakenly?) that the purpose of internal-frontend was to bypass authorization for internal access, but perhaps there's a middle ground where it has a different level of auth configured. If you have insights or ideas here, I would appreciate any contributions to this PR, as it's a bit outside of my wheelhouse.

@adamko147 (Nov 3, 2024):

@TheGeniesis I don't think you're mistaken about the internal frontend purpose. It was introduced at temporalio/temporal#3706

The server worker needs to make connections to the frontend with effective "admin" access (to all namespaces). With pluggable claim mappers and authorizers, we can't control the logic in them, and requiring all users to set up mTLS with special claims is a large burden. Adding internal frontends as an alternative is easier and more reliable.

In our use case we have a custom authorizer/claim mapper (https://docs.temporal.io/self-hosted-guide/security#plugins) for JWT tokens issued by our auth server, and there is no way other Temporal components (e.g. the server worker) could get/refresh the token required for frontend access, so internal-frontend solves exactly this case.


I've successfully tested this PR with authorization enabled. Currently we're enabling auth using the built-in authorizer and claim-mapper plugins, and it's working without issue.

I haven't exercised the features that were failing when we didn't have the internal frontend enabled (e.g. scheduled jobs), but the system did come up without errors and seemed to be working properly.

I agree with @adamko147 about the internal frontend's purpose.


I've been trying to find time in between meetings to test this, so I appreciate that confirmation, @dleblanc.


Maybe I wasn't clear enough.

About definitions
As you mentioned, internal-frontend shouldn't have any authorization and should be used by internal pods (like the worker).

Frontend should have authorization and should be used by users (direct calls/web).

My auth scenario
I use Azure EntraID to manage access, with the default Temporal authorizer + claim mapper.

EntraID was configured by following a Medium article. The only difference is in the system namespace, which in the newer version is temporal-system instead of system.

The problem I faced
During tests, services were failing because internal-frontend had the "authorization" part in the config. I had to disable it for the internal-frontend so that it stopped trying to do auth for internal services.

The frontend needs the "authorization" part and publicClient (since it's public).

This MR
The current config uses the same configmap for "frontend" and "internal-frontend", so it's not possible to have a different setup for each (internal-frontend without authorization, frontend with authorization).

How I solved the issue in my repo
I have separate configmaps for frontend and internal-frontend, with the enabled/disabled options described above.

I see it's working for you. There might be one more factor I'm not aware of that explains why it didn't fully work for me. Unfortunately, I'm behind on my work schedule and don't have time to test deployments again with this MR to confirm I can reproduce the same problem :/
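For concreteness, the per-service split described above could render configs roughly like this (a hypothetical sketch only — the keys under global.authorization and the port values are illustrative assumptions, not this chart's actual output):

```yaml
# Sketch: the frontend configmap keeps authorization + publicClient
global:
  authorization:
    authorizer: default      # built-in authorizer (illustrative value)
    claimMapper: default     # built-in claim mapper (illustrative value)
publicClient:
  hostPort: "temporal-frontend:7233"
---
# Sketch: the internal-frontend configmap omits the authorization block,
# so internal components (worker, history, matching) connect without tokens
services:
  internal-frontend:
    rpc:
      grpcPort: 7236
      membershipPort: 6936
      bindOnIP: "0.0.0.0"
```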

publicClient:
hostPort: "{{ include "temporal.componentname" (list $ "frontend") }}:{{ $server.frontend.service.port }}"
{{- end }}
dynamicConfigClient:
filepath: "/etc/temporal/dynamic_config/dynamic_config.yaml"
10 changes: 8 additions & 2 deletions charts/temporal/templates/server-deployment.yaml
@@ -1,6 +1,7 @@
{{- if $.Values.server.enabled }}
{{- range $service := (list "frontend" "history" "matching" "worker") }}
{{- range $service := (list "frontend" "internal-frontend" "history" "matching" "worker") }}
{{ $serviceValues := index $.Values.server $service }}
{{- if or (not (hasKey $serviceValues "enabled")) $serviceValues.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -81,6 +82,10 @@ spec:
secretKeyRef:
name: {{ include "temporal.persistence.secretName" (list $ "visibility") }}
key: {{ include "temporal.persistence.secretKey" (list $ "visibility") }}
{{- if index $.Values.server "internal-frontend" "enabled" }}
- name: USE_INTERNAL_FRONTEND
value: "1"
{{- end }}
{{- if $.Values.server.versionCheckDisabled }}
- name: TEMPORAL_VERSION_CHECK_DISABLED
value: "1"
@@ -94,7 +99,7 @@ spec:
containerPort: {{ $serviceValues.service.port }}
protocol: TCP
{{- end }}
{{- if eq $service "frontend" }}
{{- if or (eq $service "frontend") (eq $service "internal-frontend") }}
- name: http
containerPort: {{ $serviceValues.service.httpPort }}
protocol: TCP
@@ -162,3 +167,4 @@ spec:
---
{{- end }}
{{- end }}
{{- end }}
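For reference, opting into the feature added by this diff is a one-key values override (a sketch based on this PR's values.yaml; note the hyphenated key, which is why the templates use index rather than dot access):

```yaml
# Sketch: enables the internal-frontend deployment and service, and sets
# USE_INTERNAL_FRONTEND=1 on the server pods per the deployment template
server:
  internal-frontend:
    enabled: true
```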
4 changes: 4 additions & 0 deletions charts/temporal/templates/server-job.yaml
@@ -142,7 +142,11 @@ spec:
args: ['temporal operator namespace describe -n {{ $namespace.name }} || temporal operator namespace create -n {{ $namespace.name }}{{- if hasKey $namespace "retention" }} --retention {{ $namespace.retention }}{{- end }}']
env:
- name: TEMPORAL_ADDRESS
{{- if index $.Values.server "internal-frontend" "enabled" }}
value: {{ include "temporal.fullname" $ }}-internal-frontend.{{ $.Release.Namespace }}.svc:{{ index $.Values.server "internal-frontend" "service" "port" }}
{{- else }}
value: "{{ include "temporal.fullname" $ }}-frontend.{{ $.Release.Namespace }}.svc:{{ $.Values.server.frontend.service.port }}"
{{- end }}
{{- with $.Values.server.additionalVolumeMounts }}
volumeMounts:
{{- toYaml . | nindent 12 }}
4 changes: 3 additions & 1 deletion charts/temporal/templates/server-pdb.yaml
@@ -1,6 +1,7 @@
{{- if $.Values.server.enabled }}
{{- range $service := (list "frontend" "history" "matching" "worker") }}
{{- range $service := (list "frontend" "internal-frontend" "history" "matching" "worker") }}
{{- $serviceValues := index $.Values.server $service -}}
{{- if or (not (hasKey $serviceValues "enabled")) $serviceValues.enabled }}
{{- if and (gt ($serviceValues.replicaCount | int) 1) ($serviceValues.podDisruptionBudget) }}
apiVersion: policy/v1
kind: PodDisruptionBudget
@@ -19,3 +20,4 @@ spec:
---
{{- end }}
{{- end }}
{{- end }}
4 changes: 3 additions & 1 deletion charts/temporal/templates/server-service-monitor.yaml
@@ -1,6 +1,7 @@
{{- if $.Values.server.enabled }}
{{- range $service := (list "frontend" "matching" "history" "worker") }}
{{- range $service := (list "frontend" "internal-frontend" "matching" "history" "worker") }}
{{- $serviceValues := index $.Values.server $service -}}
{{- if or (not (hasKey $serviceValues "enabled")) $serviceValues.enabled }}
{{- if (default $.Values.server.metrics.serviceMonitor.enabled $serviceValues.metrics.serviceMonitor.enabled) }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
@@ -33,3 +34,4 @@ spec:
{{- end }}
{{- end }}
{{- end }}
{{- end }}
36 changes: 35 additions & 1 deletion charts/temporal/templates/server-service.yaml
@@ -29,8 +29,41 @@ spec:
app.kubernetes.io/component: frontend

---
{{- range $service := (list "frontend" "matching" "history" "worker") }}
{{- if index $.Values.server "internal-frontend" "enabled" }}
apiVersion: v1
kind: Service
metadata:
name: {{ include "temporal.componentname" (list $ "internal-frontend") }}
labels:
{{- include "temporal.resourceLabels" (list $ "internal-frontend" "") | nindent 4 }}
{{- if hasKey (index .Values.server "internal-frontend" "service") "annotations" }}
annotations: {{- include "common.tplvalues.render" ( dict "value" (index .Values.server "internal-frontend" "service" "annotations") "context" $) | nindent 4 }}
{{- end }}
spec:
type: {{ index .Values.server "internal-frontend" "service" "type" }}
ports:
- port: {{ index .Values.server "internal-frontend" "service" "port" }}
targetPort: rpc
protocol: TCP
name: grpc-rpc
{{- if hasKey (index .Values.server "internal-frontend" "service") "nodePort" }}
nodePort: {{ index .Values.server "internal-frontend" "service" "nodePort" }}
{{- end }}
- port: {{ index .Values.server "internal-frontend" "service" "httpPort" }}
targetPort: http
protocol: TCP
name: http
# TODO: Allow customizing the node HTTP port
selector:
app.kubernetes.io/name: {{ include "temporal.name" $ }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: internal-frontend

---
{{- end }}
{{- range $service := (list "frontend" "internal-frontend" "matching" "history" "worker") }}
{{ $serviceValues := index $.Values.server $service }}
{{- if or (not (hasKey $serviceValues "enabled")) $serviceValues.enabled }}
apiVersion: v1
kind: Service
metadata:
@@ -70,3 +103,4 @@ spec:
---
{{- end }}
{{- end }}
{{- end }}
29 changes: 29 additions & 0 deletions charts/temporal/values.yaml
@@ -242,6 +242,35 @@ server:
containerSecurityContext: {}
topologySpreadConstraints: []
podDisruptionBudget: {}
internal-frontend:


It seems to me the helm chart uses the camelCase convention; would it be OK if we renamed internal-frontend to internalFrontend? It would also make the usage a bit easier, as we could use {{ if $.Values.server.internalFrontend.enabled }} directly.

Author:

@adamko147 I had initially done this as internalFrontend, but recall that these identifiers are actually used to generate the Kubernetes services, which don't allow uppercase letters; hence internal-frontend, which corresponds to what the actual Temporal feature is named.


Huh, thanks, I did not realize that.


Just as a suggestion: you can create a simple helper to use internalFrontend as the values key and internal-frontend in names when needed:

{{- define "serviceName" -}}
    {{- $service := index . 0 -}}
    {{- if eq $service "internalFrontend" }}
        {{- print "internal-frontend" }}
    {{- else }}
        {{- print $service }}
    {{- end }}
{{- end -}}

Usage (inside a per-service loop):

{{- range $originalService := (list "frontend" "internalFrontend" "matching" "history" "worker") }}
{{ $service := include "serviceName" (list $originalService) }}


Also, to be backwards compatible, should we disable the internal frontend by default?

Author:

@adamko147 agreed, that was meant to be disabled by default; I'll make this update.

# Enable this to create internal-frontend
enabled: false
service:
# Evaluated as template
annotations: {}
type: ClusterIP
port: 7233
membershipPort: 6933
httpPort: 7243


Should these ports be 6936 and 7236 by default to match https://github.com/temporalio/temporal/blob/main/docker/config_template.yaml#L283-L284


Ports 6936 and 7236 also match the suggestion in the release notes for version 1.20 (which introduced the internal-frontend).
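Until the chart defaults change, aligning with the upstream defaults is a small values override (a sketch; 7236 and 6936 are the grpc and membership ports from the linked config_template.yaml):

```yaml
# Sketch: align internal-frontend ports with upstream Temporal defaults
server:
  internal-frontend:
    service:
      port: 7236            # grpc port, per the upstream config template
      membershipPort: 6936  # membership port, per the upstream config template
```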

metrics:
annotations:
enabled: true
serviceMonitor: {}
# enabled: false
prometheus: {}
# timerType: histogram
deploymentLabels: {}
deploymentAnnotations: {}
podAnnotations: {}
podLabels: {}
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
additionalEnv: []
containerSecurityContext: {}
topologySpreadConstraints: []
podDisruptionBudget: {}
history:
service:
# type: ClusterIP