
Commit d152acd

Merge pull request #388 from appuio/exoscale/how-to/instancepool
Add how-tos for managing nodes for Exoscale clusters with instance pools
2 parents 72799a5 + 05362c7

8 files changed: +383 -5 lines changed
@@ -0,0 +1,228 @@
= Change worker node type (instance pool)

:cloud_provider: exoscale
:kubectl_extra_args: --as=cluster-admin
:needs_hieradata_edit: no

:node-delete-list: ${NODES_TO_REMOVE}

[abstract]
--
Steps to change the instance type of an OpenShift 4 cluster on https://www.exoscale.com[Exoscale] with instance pools.
--

== Starting situation

* You already have an OpenShift 4 cluster on Exoscale
* Your cluster uses Exoscale instance pools for the worker and infra nodes
* You have admin-level access to the cluster
* Your `kubectl` context points to the cluster you're modifying
* You want to change the node type (size) of the worker or infra nodes

== High-level overview

* Update the instance pool with the new desired type
* Replace each existing node with a new node

== Prerequisites

include::partial$exoscale/prerequisites.adoc[]

== Prepare local environment

include::partial$exoscale/setup-local-env.adoc[]

== Update Cluster Config

. Set new desired node type
+
[source,bash]
----
new_type=<exoscale instance type> <1>
----
<1> An Exoscale instance type, for example `standard.huge`.
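+
[TIP]
====
If you're unsure which instance types exist, recent versions of the Exoscale CLI can list them (availability of the subcommand depends on your `exo` version):

[source,bash]
----
exo compute instance-type list
----
====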

. Update cluster config
+
[source,bash]
----
pushd "inventory/classes/${TENANT_ID}/"

yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_type = \"${new_type}\"" \
  ${CLUSTER_ID}.yml
----
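+
As a quick sanity check before committing, you can print the value back out of the config (optional, not part of the documented procedure):
+
[source,bash]
----
yq eval '.parameters.openshift4_terraform.terraform_variables.worker_type' ${CLUSTER_ID}.yml
----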

. Review and commit
+
[source,bash]
----

# Have a look at the file ${CLUSTER_ID}.yml.

git commit -a -m "Update worker nodes of cluster ${CLUSTER_ID} to ${new_type}"
git push

popd
----

. Compile and push cluster catalog
+
[source,bash]
----
commodore catalog compile ${CLUSTER_ID} --push -i
----

== Run Terraform

include::partial$exoscale/configure-terraform-secrets.adoc[]

include::partial$setup_terraform.adoc[]

. Run Terraform
+
[NOTE]
====
This doesn't make changes to existing instances.
However, after this step, any new instances created for the instance pool will use the new configuration.
====
+
[source,bash]
----
terraform apply
----
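+
If you prefer to review the pending changes before applying them, you can run a plan first (standard Terraform workflow; the output should be limited to the instance pool configuration):
+
[source,bash]
----
terraform plan
----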

== Apply new instance pool configuration

[IMPORTANT]
====
Double-check that your `kubectl` context points to the cluster you're working on.
====
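
A quick way to check the active context (the context name itself depends on your local kubeconfig):

[source,bash]
----
kubectl config current-context
----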

[TIP]
====
Depending on the number of nodes you're updating, you may want to execute the steps in this section for a subset of the nodes at a time.

On clusters with dedicated hypervisors, you'll need to execute the steps for each `worker` instance pool.
You can list the worker instance pools with

[source,bash]
----
exo compute instance-pool list -Ojson | jq -r '.[]|select(.name|contains("worker"))|.name'
----
====

[IMPORTANT]
====
If you're using this how-to to change the instance type of the infra nodes, you must run Terraform again after replacing the nodes to ensure that the LB hieradata is updated with the new infra node IPs.

When replacing infra nodes, we strongly recommend doing so in two batches to ensure availability of the cluster ingress.
====

. Select the instance pool
+
[source,bash]
----
pool_name="${CLUSTER_ID}_worker-0" <1>
----
<1> Adjust the pool name if your cluster has multiple worker instance pools (see the tip above).

. Compute the new instance count
+
[source,bash]
----
new_count=$(exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.size * 2')
----
+
[TIP]
====
For larger clusters, you'll probably want to do something like the following to replace nodes in batches.
If you do this, you'll need to repeat the steps below this one for each batch.

[source,bash]
----
batch_size=3 <1>
new_count=$(exo compute instance-pool show "${pool_name}" -Ojson | \
  jq --argjson batch "$batch_size" -r '.size + $batch')
----
<1> Replace with the desired batch size.
Please ensure that you adjust the last batch size so that you don't provision extra nodes if your node count isn't divisible by your selected batch size.
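
For example, on a pool of seven worker nodes with `batch_size=3`, you'd run two batches of three and a final batch with `batch_size=1`.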
====

. Get the list of old nodes
+
[source,bash]
----
NODES_TO_REMOVE=$(exo compute instance-pool show "${pool_name}" -Ojson | \
  jq -r '.instances|join(" ")')
----
+
[TIP]
====
If you're replacing nodes in batches, save the list of old nodes in a file:

[source,bash]
----
exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.instances' > old-nodes.json <1>
----
<1> Run this *only once* before starting to replace nodes.

Compute a batch of old nodes to remove and drop those from the file:

[source,bash]
----
NODES_TO_REMOVE=$(jq --argjson batch "$batch_size" -r '.[:$batch]|join(" ")' old-nodes.json)
jq --argjson batch "$batch_size" -r '.[$batch:]' old-nodes.json > old-nodes-rem.json && \
  mv old-nodes-rem.json old-nodes.json
----
====

. Scale up the instance pool to create new instances with the new desired type
+
[source,bash]
----
exo compute instance-pool scale "${pool_name}" "${new_count}" -z "${EXOSCALE_ZONE}"
----
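+
To follow the scale-up, you can re-check the pool size and instance list while the new instances are provisioned (same fields as used in the previous steps):
+
[source,bash]
----
exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.size, .instances[]'
----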

. Approve CSRs of new nodes
+
include::partial$install/approve-node-csrs.adoc[]

. Label nodes
+
[source,bash]
----
kubectl get node -ojson | \
  jq -r '.items[] | select(.metadata.name | test("infra|master|storage-")|not).metadata.name' | \
  xargs -I {} kubectl label node {} node-role.kubernetes.io/app=
----
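+
To verify that all new worker nodes now carry the app role label:
+
[source,bash]
----
kubectl get nodes -l node-role.kubernetes.io/app=
----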

. Drain and remove old nodes
+
* If you are working on a production cluster, you need to *schedule the node drain for the next maintenance.*
+
.Schedule node drain (production clusters)
[%collapsible]
====
include::partial$drain-node-scheduled.adoc[]
====
* If you are working on a non-production cluster, you may *drain and remove the nodes immediately.*
+
.Drain and remove node immediately
[%collapsible]
====
include::partial$drain-node-immediately.adoc[]
====

. Remove old VMs from instance pool
+
[IMPORTANT]
====
Only do this after the previous step is completed.
On production clusters this must happen *after the maintenance*.
====
+
[source,bash]
----
for node in $NODES_TO_REMOVE; do
  exo compute instance-pool evict "${pool_name}" "${node}" -z "${EXOSCALE_ZONE}"
done
----
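+
Afterwards, the pool should be back at its original size and list only the new instances (optional check):
+
[source,bash]
----
exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.size, .instances[]'
----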

docs/modules/ROOT/pages/how-tos/exoscale/remove_node.adoc

+3 -2 lines changed
@@ -1,4 +1,4 @@
-= Remove a worker node
+= Remove a worker node (no instance pool)
 
 :cloud_provider: exoscale
 :kubectl_extra_args: --as=cluster-admin
@@ -10,12 +10,13 @@
 
 [abstract]
 --
-Steps to remove a worker node of an OpenShift 4 cluster on https://www.exoscale.com[Exoscale].
+Steps to remove a worker node of an OpenShift 4 cluster on https://www.exoscale.com[Exoscale] without instance pools.
 --
 
 == Starting situation
 
 * You already have a OpenShift 4 cluster on Exoscale
+* Your cluster doesn't use Exoscale instance pools
 * You have admin-level access to the cluster
 * You want to remove an existing worker node in the cluster

@@ -0,0 +1,98 @@
= Remove a worker node (instance pool)

:cloud_provider: exoscale
:kubectl_extra_args: --as=cluster-admin
:delabel_app_nodes: yes

:node-delete-list: ${NODE_TO_REMOVE}
:instance-pool-group: worker
:delete-pvs: old_pv_names

[abstract]
--
Steps to remove a worker node of an OpenShift 4 cluster on https://www.exoscale.com[Exoscale] which uses instance pools.
--

== Starting situation

* You already have an OpenShift 4 cluster on Exoscale
* Your cluster uses instance pools for the worker and infra nodes
* You have admin-level access to the cluster
* You want to remove an existing worker node in the cluster

== High-level overview

* We drain the node.
* Then we remove it from Kubernetes.
* Finally we remove the associated VM from the instance pool.

== Prerequisites

include::partial$exoscale/prerequisites.adoc[]

== Prepare local environment

include::partial$exoscale/setup-local-env.adoc[]

== Prepare Terraform environment

include::partial$exoscale/configure-terraform-secrets.adoc[]

include::partial$setup_terraform.adoc[]

== Drain and Remove Node

* Select a node to remove.
With instance pools, we can remove any node.
+
[source,bash]
----
export NODE_TO_REMOVE=<node name>
----
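+
To list the candidate worker nodes (selector assumed from the `node-role.kubernetes.io/app` label that's applied to app nodes on these clusters):
+
[source,bash]
----
kubectl get nodes -l node-role.kubernetes.io/app=
----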

* If you are working on a production cluster, you need to *schedule the node drain for the next maintenance.*
* If you are working on a non-production cluster, you may *drain and remove the node immediately.*

=== Schedule node drain (production clusters)

include::partial$drain-node-scheduled.adoc[]

=== Drain and remove node immediately

include::partial$drain-node-immediately.adoc[]

== Update Cluster Config

. Update cluster config.
+
[source,bash]
----
pushd "inventory/classes/${TENANT_ID}/"

yq eval -i ".parameters.openshift4_terraform.terraform_variables.worker_count -= 1" \
  ${CLUSTER_ID}.yml
----
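+
If you want to double-check the new count before committing (optional):
+
[source,bash]
----
yq eval '.parameters.openshift4_terraform.terraform_variables.worker_count' ${CLUSTER_ID}.yml
----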

. Review and commit
+
[source,bash]
----

# Have a look at the file ${CLUSTER_ID}.yml.

git commit -a -m "Remove worker node from cluster ${CLUSTER_ID}"
git push

popd
----

. Compile and push cluster catalog
+
[source,bash]
----
commodore catalog compile ${CLUSTER_ID} --push -i
----

== Remove VM

include::partial$exoscale/delete-node-vm-instancepool.adoc[]

docs/modules/ROOT/partials/drain-node-scheduled.adoc

+1 -1 lines changed
@@ -3,7 +3,7 @@
 [source,bash,subs="attributes+"]
 ----
 pushd "../../../inventory/classes/$TENANT_ID"
-cat > manifests/$CLUSTER_ID/drain_node_hook <<EOF
+cat > manifests/$CLUSTER_ID/drain_node_hook.yaml <<EOF
 ---
 apiVersion: managedupgrade.appuio.io/v1beta1
 kind: UpgradeJobHook
@@ -0,0 +1,39 @@

. Evict the VM(s) from the instance pool
+
[NOTE]
====
We're going through all {instance-pool-group} instance pools to find the pool containing the node(s) to remove.
This ensures that we can apply the step as-is on clusters on dedicated hypervisors, which may have multiple {instance-pool-group} instance pools.
====
+
[source,bash,subs="attributes+"]
----
instancepool_names=$(exo compute instance-pool list -Ojson | \
  jq --arg ip_group "{instance-pool-group}" -r \
    '.[]|select(.name|contains($ip_group))|.name')

for node in $(echo -n {node-delete-list}); do
  for pool_name in ${instancepool_names}; do
    has_node=$(exo compute instance-pool show "${pool_name}" -Ojson | \
      jq --arg node "${node}" -r '.instances|index($node)!=null')
    if [ "$has_node" == "true" ]; then
      exo compute instance-pool evict "${pool_name}" "${node}" -z "$EXOSCALE_ZONE"
      break
    fi
  done
done
----
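+
To confirm the eviction, you can check that the removed node(s) no longer appear in any of the instance pools queried above (optional check, reusing the same fields):
+
[source,bash]
----
for pool_name in ${instancepool_names}; do
  exo compute instance-pool show "${pool_name}" -Ojson | jq -r '.instances[]'
done
----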

. Run Terraform to update the state with the new instance pool size
+
NOTE: There shouldn't be any changes since `instance-pool evict` reduces the instance pool size by one.
+
NOTE: Ensure that you're still in the directory `${WORK_DIR}/catalog/manifests/openshift4-terraform` before executing this command.
+
[source,bash]
----
terraform apply
----

endif::[]
