
kubekey feature request: ability to clear the existing cluster configuration and redeploy the cluster #2241

Closed
LiShuang-codes opened this issue May 12, 2024 · 4 comments


@LiShuang-codes

LiShuang-codes commented May 12, 2024

Your current KubeKey version

kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.13", GitCommit:"ac75d3ef3c22e6a9d999dcea201234d6651b3e72", GitTreeState:"clean", BuildDate:"2023-10-30T11:15:14Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

Describe this feature

  1. Why is this feature needed?
    After an unexpected power outage, the etcd service on my experimental cluster would no longer start, and the cluster could not be restored to its previous state. I therefore decided to remove all installed files and configuration and redeploy the cluster with kk. To my surprise, no matter how thoroughly I deleted things, including the installed kubectl, kubelet, and kubeadm plus configuration files under /var and /etc, the cluster was never removed cleanly: whenever I redeployed with kk, it assumed it was deploying onto an existing cluster, and etcd was never reinstalled or redeployed. I did not want to reinstall the operating system.
  2. The commands I used to attempt the cleanup (see the note after the error log below for the pieces this list appears to miss):
systemctl stop kubelet
systemctl disable kubelet
systemctl stop etcd
systemctl stop docker
systemctl stop containerd
systemctl disable etcd

kubeadm reset -f
sudo apt-mark unhold kubeadm kubectl kubelet kubernetes-cni
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni -y
apt list | grep kube # find remaining installed packages and continue purging them
rm -rvf $HOME/.kube
rm -rvf ~/.kube/
rm -rvf /etc/kubernetes/
rm -rvf /etc/systemd/system/kubelet.service.d
rm -rvf /etc/systemd/system/kubelet.service
rm -rvf /usr/bin/kube*
rm -rvf /etc/cni
rm -rvf /opt/cni
rm -rvf /var/lib/etcd
rm -rvf /var/lib/kubeedge
rm -rf $(find / -name "kubeedge*")
rm -rvf /var/etcd
rm -rf /var/lib/kubelet /var/lib/docker/overlay2/* \
    /var/lib/dpkg/info/kubelet.* \
    /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/kube-* \
    /var/lib/systemd/deb-systemd-helper-masked/kubelet.service \
    /var/lib/systemd/deb-systemd-helper-enabled/* \
    /usr/libexec/kubernetes \
    /etc/systemd/system/kubelet.service \
    /etc/systemd/system/multi-user.target.wants/kubelet.service \
    /var/log/pods/ /var/log/containers/ /sys/fs/cgroup/
systemctl status kubelet
rm -rf $(find / -name "*kube*" | grep -v "/root") # 查看残留
rm -rf $(find / -name "etcd" | grep -v "/root")
rm -rf $(find / -name "calico" | grep -v /root)
sudo iptables -F && sudo iptables -X && sudo iptables -F -t nat && sudo iptables -X -t nat
# Unit kubelet.service could not be found.
ss -tulp # find the PID of the kube-proxy process:
kill -9 <pid>
  3. The error when reinstalling:
root@master:~# kk create cluster --with-kubernetes v1.23.17 --container-manager docker --with-kubesphere v3.4.0


 _   __      _          _   __
| | / /     | |        | | / /
| |/ / _   _| |__   ___| |/ /  ___ _   _
|    \| | | | '_ \ / _ \    \ / _ \ | | |
| |\  \ |_| | |_) |  __/ |\  \  __/ |_| |
\_| \_/\__,_|_.__/ \___\_| \_/\___|\__, |
                                    __/ |
                                   |___/

18:38:31 CST [GreetingsModule] Greetings
18:38:31 CST message: [master]
Greetings, KubeKey!
18:38:31 CST success: [master]
18:38:31 CST [NodePreCheckModule] A pre-check on nodes
18:38:31 CST success: [master]
18:38:31 CST [ConfirmModule] Display confirmation form
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| name   | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker   | containerd | nfs client | ceph client | glusterfs client | time         |
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| master | y    | y    | y       | y        | y     | y     | y       | y         | y      | 20.10.24 | 1.6.31     |            |             |                  | CST 18:38:31 |
+--------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+

This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations

Continue this installation? [yes/no]: yes
18:38:32 CST success: [LocalHost]
18:38:32 CST [NodeBinariesModule] Download installation binaries
18:38:32 CST message: [localhost]
downloading amd64 kubeadm v1.23.17 ...
18:38:33 CST message: [localhost]
kubeadm is existed
18:38:33 CST message: [localhost]
downloading amd64 kubelet v1.23.17 ...
18:38:33 CST message: [localhost]
kubelet is existed
18:38:33 CST message: [localhost]
downloading amd64 kubectl v1.23.17 ...
18:38:33 CST message: [localhost]
kubectl is existed
18:38:33 CST message: [localhost]
downloading amd64 helm v3.9.0 ...
18:38:33 CST message: [localhost]
helm is existed
18:38:33 CST message: [localhost]
downloading amd64 kubecni v1.2.0 ...
18:38:34 CST message: [localhost]
kubecni is existed
18:38:34 CST message: [localhost]
downloading amd64 crictl v1.24.0 ...
18:38:34 CST message: [localhost]
crictl is existed
18:38:34 CST message: [localhost]
downloading amd64 etcd v3.4.13 ...
18:38:34 CST message: [localhost]
etcd is existed
18:38:34 CST message: [localhost]
downloading amd64 docker 24.0.6 ...
18:38:34 CST message: [localhost]
docker is existed
18:38:34 CST message: [localhost]
downloading amd64 calicoctl v3.26.1 ...
18:38:34 CST message: [localhost]
calicoctl is existed
18:38:34 CST success: [LocalHost]
18:38:34 CST [ConfigureOSModule] Get OS release
18:38:34 CST success: [master]
18:38:34 CST [ConfigureOSModule] Prepare to init OS
18:38:35 CST success: [master]
18:38:35 CST [ConfigureOSModule] Generate init os script
18:38:35 CST success: [master]
18:38:35 CST [ConfigureOSModule] Exec init os script
18:38:36 CST stdout: [master]
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
net.core.netdev_max_backlog = 65535
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 1048576
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.tcp_max_orphans = 65535
net.ipv4.udp_rmem_min = 131072
net.ipv4.udp_wmem_min = 131072
net.ipv4.conf.all.arp_accept = 1
net.ipv4.conf.default.arp_accept = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_ignore = 1
vm.max_map_count = 262144
vm.swappiness = 0
vm.overcommit_memory = 0
fs.inotify.max_user_instances = 524288
fs.inotify.max_user_watches = 524288
fs.pipe-max-size = 4194304
fs.aio-max-nr = 262144
kernel.pid_max = 65535
kernel.watchdog_thresh = 5
kernel.hung_task_timeout_secs = 5
18:38:36 CST success: [master]
18:38:36 CST [ConfigureOSModule] configure the ntp server for each node
18:38:36 CST skipped: [master]
18:38:36 CST [KubernetesStatusModule] Get kubernetes cluster status
18:38:36 CST success: [master]
18:38:36 CST [InstallContainerModule] Sync docker binaries
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Generate docker service
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Generate docker config
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Enable docker
18:38:36 CST skipped: [master]
18:38:36 CST [InstallContainerModule] Add auths to container runtime
18:38:37 CST skipped: [master]
18:38:37 CST [PullModule] Start to pull images on all nodes
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.6
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver:v1.23.17
18:38:37 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager:v1.23.17
18:38:38 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler:v1.23.17
18:38:38 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.23.17
18:38:39 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.6
18:38:39 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.26.1
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/cni:v3.26.1
18:38:40 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.26.1
18:38:41 CST message: [master]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pod2daemon-flexvol:v3.26.1
18:38:41 CST success: [master]
18:38:41 CST [ETCDPreCheckModule] Get etcd status
18:38:41 CST stdout: [master]
ETCD_NAME=etcd-master
18:38:41 CST success: [master]
18:38:41 CST [CertsModule] Fetch etcd certs
18:38:41 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:41 CST retry: [master]
18:38:46 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:46 CST retry: [master]
18:38:51 CST message: [master]
failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
18:38:51 CST failed: [master]
error: Pipeline[CreateClusterPipeline] execute failed: Module[CertsModule] exec failed:
failed: [master] [FetchETCDCerts] exec failed after 3 retries: failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem"
: Process exited with status 1
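The log above points to the root cause: the [ETCDPreCheckModule] step still reads ETCD_NAME=etcd-master, so kk concludes that a binary etcd deployment already exists and then tries to fetch its certificates from /etc/ssl/etcd/ssl/, which the cleanup above has already deleted. Below is a minimal sketch of the extra cleanup covering exactly the state kk checks, assuming a standard KubeKey binary etcd install; the paths are the usual KubeKey locations and were not verified on this node.

systemctl stop etcd || true
systemctl disable etcd || true
rm -f /etc/systemd/system/etcd.service /etc/etcd.env # etcd.env is what makes the pre-check report ETCD_NAME=etcd-master
systemctl daemon-reload
rm -rf /etc/ssl/etcd /var/lib/etcd # certificates and data generated by kk
rm -f /usr/local/bin/etcd /usr/local/bin/etcdctl
rm -rf ./kubekey # kk's local working directory next to the kk binary; cached state from the previous run lives here

With those gone, the [ETCDPreCheckModule] step should come back empty, and kk should generate fresh certificates instead of trying to fetch the old ones.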

Describe the solution you'd like

kk should offer a fresh-start option, that is, an option to discard everything belonging to the existing cluster and deploy from scratch.
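For what it's worth, KubeKey already ships a delete subcommand that tears down a cluster it deployed; whether it fully cleans a node that was half-wiped by hand, as above, is a separate question. Usage as documented upstream, where config-sample.yaml stands for whatever spec file was used at create time:

./kk delete cluster -f config-sample.yaml # cluster created from a spec file
./kk delete cluster # cluster created with the default all-in-one settings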

Additional information

@LiuG-lynx

I've run into the same problem: failed to find certificate files: Failed to exec command: sudo -E /bin/bash -c "ls /etc/ssl/etcd/ssl/ | grep .pem". How can this error be fixed on its own? @LiShuang-codes I'm also rebuilding a cluster environment, and it still has other container workloads running on it. How should I handle this?
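A guess at a targeted fix for just this error, assuming the node is being rebuilt anyway: the pre-check decides etcd exists because /etc/etcd.env survived, while the certificates it then looks for are gone. Either restore /etc/ssl/etcd/ssl/ from a backup, or move the stale etcd state aside so kk regenerates everything. The paths are assumed from a standard KubeKey binary etcd install; keep the copies until the new cluster is up.

mkdir -p /root/etcd-leftovers
mv /etc/etcd.env /root/etcd-leftovers/ 2>/dev/null # removes the state that triggers the pre-check
mv /etc/ssl/etcd /root/etcd-leftovers/ssl 2>/dev/null
# rerun kk afterwards; the etcd pre-check should now come back empty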

@LiShuang-codes
Author

LiShuang-codes commented May 15, 2024 via email

@xiaopengpeng

If you backed up etcd, just restore it; restore tutorials are easy to find online. If not, the only option is to delete everything and redeploy.

Hello, when you say delete and redeploy, what exactly should be deleted? I've run into this problem too and it's a real headache: no matter how I uninstall and redeploy, it never succeeds.

@xiaopengpeng

If you backed up etcd, just restore it; restore tutorials are easy to find online. If not, the only option is to delete everything and redeploy.

@LiShuang-codes
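For completeness, the restore alluded to in the quote is the standard etcdctl v3 snapshot restore. A minimal sketch, assuming a single-node KubeKey binary etcd named etcd-master and a snapshot at /root/etcd-backup.db; the snapshot path and the peer URL are placeholders and must match the values in /etc/etcd.env:

systemctl stop etcd
mv /var/lib/etcd /var/lib/etcd.broken # keep the corrupt data aside
ETCDCTL_API=3 etcdctl snapshot restore /root/etcd-backup.db \
    --name etcd-master \
    --data-dir /var/lib/etcd \
    --initial-cluster etcd-master=https://192.168.0.2:2380 \
    --initial-advertise-peer-urls https://192.168.0.2:2380
systemctl start etcd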
