
kubekey feature request: ability to clear the existing cluster configuration and redeploy the cluster #2241

Closed
LiShuang-codes opened this issue May 12, 2024 · 4 comments


@LiShuang-codes

LiShuang-codes commented May 12, 2024

Your current KubeKey version

kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.13", GitCommit:"ac75d3ef3c22e6a9d999dcea201234d6651b3e72", GitTreeState:"clean", BuildDate:"2023-10-30T11:15:14Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

Describe this feature

  1. Why is this feature needed?
    After an unexpected power outage, the etcd service on my experimental cluster would no longer start, and the cluster could not be restored to its previous state. I therefore decided to remove all installed files and configuration and redeploy the cluster with kk. To my surprise, no matter how thoroughly I deleted things, including the installed kubectl, kubelet, and kubeadm plus configuration files under /var and /etc, the cluster was never removed cleanly: whenever I redeployed with kk, it assumed it was deploying onto an existing cluster, and etcd was never reinstalled or redeployed. I did not want to reinstall the operating system.
  2. The commands I used to attempt the cleanup (see the note after the error log below for the pieces this list appears to miss):
systemctl stop kubelet
systemctl disable kubelet
systemctl stop etcd
systemctl stop docker
systemctl stop containerd
systemctl disable etcd

kubeadm reset -f
sudo apt-mark unhold kubeadm kubectl kubelet kubernetes-cni
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni -y
apt list | grep kube # find remaining installed packages and continue purging them
rm -rvf $HOME/.kube
rm -rvf ~/.kube/
rm -rvf /etc/kubernetes/
rm -rvf /etc/systemd/system/kubelet.service.d
rm -rvf /etc/systemd/system/kubelet.service
rm -rvf /usr/bin/kube*
rm -rvf /etc/cni
rm -rvf /opt/cni
rm -rvf /var/lib/etcd
rm -rvf /var/lib/kubeedge
rm -rf $(find / -name "kubeedge*")
rm -rvf /var/etcd
rm -rf /var/lib/kubelet /var/lib/docker/overlay2/* \
    /var/lib/dpkg/info/kubelet.* \
    /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/kube-* \
    /var/lib/systemd/deb-systemd-helper-masked/kubelet.service \
    /var/lib/systemd/deb-systemd-helper-enabled/* \
    /usr/libexec/kubernetes \
    /etc/systemd/system/kubelet.service \
    /etc/systemd/system/multi-user.target.wants/kubelet.service \
    /var/log/pods/ /var/log/containers/ /sys/fs/cgroup/
systemctl status kubelet
rm -rf $(find / -name "*kube*" | grep -v "/root") # 查看残留
rm -rf $(find / -name "etcd" | grep -v "/root")
rm -rf $(find / -name "calico" | grep -v /root)
sudo iptables -F && sudo iptables -X && sudo iptables -F -t nat && sudo iptables -X -t nat
# Unit kubelet.service could not be found.
ss -tulp # find the PID of the kube-proxy process:
kill -9 <pid>
  3. The error when reinstalling:
root@master:~# kk create cluster --with-kubernetes v1.23.17 --container-manager docker --with-kubesphere v3.4.0


 _   __      _          _   __
| | / /     | |        | | / /
| |/ / _   _| |__   ___| |/ /  ___ _   _
|    \| | | | '_ \ / _ \    \ / _ \ | | |
| |\  \ |_| | |_) |  __/ |\  \  __/ |_| |
\_| \_/\__,_|_.__/ \___\_| \_/\___|\__, |
                                    __/ |
                                   |___/

18:38:31 CST [GreetingsModule] Greetings
18:38:31 CST message: [master]
Greetings, KubeKey!
18:38:31 CST success: [master]
18:38:31 CST [NodePreCheckModule] A pre-check on nodes
18:38:31 CST success: [master]
18:38:31 CST [ConfirmModule] Display confirmation form
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| name   | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker   | containerd | nfs client | ceph client | glusterfs client | time         |
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| master | y    | y    | y       | y        | y     | y     | y       | y         | y      | 20.10.24 | 1.6.31     |            |             |                  | CST 18:38:31 |
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+

This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations

Continue this installation? [yes/no]: yes
18:38:32 CST success: [LocalHost]
18:38:32 CST [NodeBinariesModule] Download installation binaries
18:38:32 CST message: [localhost]
downloading amd64 kubeadm v1.23.17 ...
18:38:33 CST message: [localhost]
kubeadm is existed
18:38:33 CST message: [localhost]
downloading amd64 kubelet v1.23.17 ...
18:38:33 CST message: [localhost]
kubelet is existed
18:38:33 CST message: [localhost]
downloading amd64 kubectl v1.23.17 ...
18:38:33 CST message: [localhost]
kubectl is existed
18:38:33 CST message: [localhost]
downloading amd64 helm v3.9.0 ...
18:38:33 CST message: [localhost]
helm is existed
18:38:33 CST message: [localhost]
downloading amd64 kubecni v1.2.0 ...
18:38:34 CST message: [localhost]
kubecni is existed
18:38:34 CST message: [localhost]
downloading amd64 crictl v1.24.0 ...
18:38:34 CST message: [localhost]
crictl is existed
18:38:34 CST message: [localhost]
downloading amd64 etcd v3.4.13 ...
18:38:34 CST message: [localhost]
etcd is existed
18:38:34 CST message: [localhost]
downloading amd64 docker 24.0.6 ...
18:38:34 CST message: [localhost]
docker is existed
18:38:34 CST message: [localhost]
downloading amd64 calicoctl v3.26.1 ...
18:38:34 CST message: [localhost]
calicoctl is existed
18:38:34 CST success: [LocalHost]
18:38:34 CST [ConfigureOSModule] Get OS release
18:38:34 CST success: [master]
18:38:34 CST [ConfigureOSModule] Prepare to init OS
18:38:35 CST success: [master]
18:38:35 CST [ConfigureOSModule] Generate init os script
18:38:35 CST success: [master]
18:38:35 CST [ConfigureOSModule] Exec init os script
18:38:36 CST stdout: [master]
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
net.core.netdev_max_backlog = 65535
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 1048576
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.tcp_max_orphans = 65535
net.ipv4.udp_rmem_min = 131072
net.ipv4.udp_wmem_min = 131072
net.ipv4.conf.all.arp_accept = 1
net.ipv4.conf.default.arp_accept = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_ignore = 1
vm.max_map_count = 262144
vm.swappiness = 0
vm.overcommit_memory = 0
fs.inotify.max_user_instances = 524288
fs.inotify.max_user_watches = 524288
fs.pipe-max-size = 4194304
fs.aio-max-nr = 262144
kernel.pid_max = 65535
kernel.watchdog_thresh = 5
kernel.hung_task_timeout_secs = 5
18:38:36 CST success: [master]
18:38:36 CST [ConfigureOSModule] configure the ntp server for each node
18:38:36 CST skipped: [master]
18:38:36 CST [KubernetesStatusModule] Get kubernetes cluster status
18:38:36 CST success: [master]
18:38:36 CST [InstallContainerModule] Sync docker binaries
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Generate docker service
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Generate docker config
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Enable docker
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Add auths to container runtime
18:38:37 CST skipped: [master]
18:38:37 CST [PullModule] Start to pull images on all nodes
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.6
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver:v1.23.17
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager:v1.23.17
18:38:38 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler:v1.23.17
18:38:38 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.23.17
18:38:39 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.6
18:38:39 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.26.1
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/cni:v3.26.1
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.26.1
18:38:41 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pod2daemon-flexvol:v3.26.1
18:38:41 CST success: [master]
18:38:41 CST [ETCDPreCheckModule] Get etcd status
18:38:41 CST stdout: [master]
ETCD_NAME=etcd-master
18:38:41 CST success: [master]
18:38:41 CST [CertsModule] Fetch etcd certs
18:38:41 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:41 CST retry: [master]
18:38:46 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:46 CST retry: [master]
18:38:51 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:51 CST failed: [master]
error: Pipeline[CreateClusterPipeline] execute failed: Module[CertsModule] exec failed:
failed: [master] [FetchETCDCerts] exec failed after 3 retries: failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
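The log above points to the root cause: the [ETCDPreCheckModule] step still reads ETCD_NAME=etcd-master, so kk concludes that a binary etcd deployment already exists and then tries to fetch its certificates from /etc/ssl/etcd/ssl/, which the cleanup above has already deleted. Below is a minimal sketch of the extra cleanup covering exactly the state kk checks, assuming a standard KubeKey binary etcd install; the paths are the usual KubeKey locations and were not verified on this node.

systemctl stop etcd || true
systemctl disable etcd || true
rm -f /etc/systemd/system/etcd.service /etc/etcd.env # etcd.env is what makes the pre-check report ETCD_NAME=etcd-master
systemctl daemon-reload
rm -rf /etc/ssl/etcd /var/lib/etcd # certificates and data generated by kk
rm -f /usr/local/bin/etcd /usr/local/bin/etcdctl
rm -rf ./kubekey # kk's local working directory next to the kk binary; cached state from the previous run lives here

With those gone, the [ETCDPreCheckModule] step should come back empty, and kk should generate fresh certificates instead of trying to fetch the old ones.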

Describe the solution you'd like

kk should offer a fresh-start option, that is, an option to discard everything belonging to the existing cluster and deploy from scratch.
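For what it's worth, KubeKey already ships a delete subcommand that tears down a cluster it deployed; whether it fully cleans a node that was half-wiped by hand, as above, is a separate question. Usage as documented upstream, where config-sample.yaml stands for whatever spec file was used at create time:

./kk delete cluster -f config-sample.yaml # cluster created from a spec file
./kk delete cluster # cluster created with the default all-in-one settings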

Additional information

@LiuG-lynx

I've run into the same problem: failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem". How can this error be fixed on its own? @LiShuang-codes I'm also rebuilding a cluster environment, and it still has other container workloads running on it. How should I handle this?
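A guess at a targeted fix for just this error, assuming the node is being rebuilt anyway: the pre-check decides etcd exists because /etc/etcd.env survived, while the certificates it then looks for are gone. Either restore /etc/ssl/etcd/ssl/ from a backup, or move the stale etcd state aside so kk regenerates everything. The paths are assumed from a standard KubeKey binary etcd install; keep the copies until the new cluster is up.

mkdir -p /root/etcd-leftovers
mv /etc/etcd.env /root/etcd-leftovers/ 2>/dev/null # removes the state that triggers the pre-check
mv /etc/ssl/etcd /root/etcd-leftovers/ssl 2>/dev/null
# rerun kk afterwards; the etcd pre-check should now come back empty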

@LiShuang-codes
Author

LiShuang-codes commented May 15, 2024 via email

@xiaopengpeng

If you backed up etcd, just restore it; restore tutorials are easy to find online. If not, the only option is to delete everything and redeploy.

Hello, when you say delete and redeploy, what exactly should be deleted? I've run into this problem too and it's a real headache: no matter how I uninstall and redeploy, it never succeeds.

@xiaopengpeng

If you backed up etcd, just restore it; restore tutorials are easy to find online. If not, the only option is to delete everything and redeploy.

@LiShuang-codes
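For completeness, the restore alluded to in the quote is the standard etcdctl v3 snapshot restore. A minimal sketch, assuming a single-node KubeKey binary etcd named etcd-master and a snapshot at /root/etcd-backup.db; the snapshot path and the peer URL are placeholders and must match the values in /etc/etcd.env:

systemctl stop etcd
mv /var/lib/etcd /var/lib/etcd.broken # keep the corrupt data aside
ETCDCTL_API=3 etcdctl snapshot restore /root/etcd-backup.db \
    --name etcd-master \
    --data-dir /var/lib/etcd \
    --initial-cluster etcd-master=https://192.168.0.2:2380 \
    --initial-advertise-peer-urls https://192.168.0.2:2380
systemctl start etcd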
