air-gapped environment install failed #2433

Open
Thatwho opened this issue Oct 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments


Thatwho commented Oct 19, 2024

Which version of KubeKey has the issue?

v3.1.6

What is your OS environment?

Ubuntu 22.04

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 192.168.120.3, internalAddress: 192.168.120.3, user: user, password: "123456"}
  roleGroups:
    etcd:
    - node1
    control-plane: 
    - node1
    worker:
    - node1
    registry:
    - node1
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.28.13
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    type: harbor
    auths:
      "reg.kubekey.local":
        username: admin
        password: Harbor12345
        certsPath: /etc/docker/certs.d/reg.kubekey.local
    privateRegistry: "reg.kubekey.local"
    namespaceOverride: "kubesphere"
    registryMirrors: []
    insecureRegistries: []
  addons: []

A clear and concise description of what happened.

I'm facing an issue when installing Kubernetes in an air-gapped environment using KubeKey. The installation fails, and journalctl -xfeu kubelet shows the following error messages:

Oct 19 13:13:54 node1 kubelet[647]: E1019 13:13:54.628298     647 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-scheduler-node1.17ffcfa08ae2e7f3", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-node1", UID:"6a9443ee7c86d00dc2d503017a48ce21", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"FailedCreatePodSandBox", Message:"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable", Source:v1.EventSource{Component:"kubelet", Host:"node1"}, FirstTimestamp:time.Date(2024, time.October, 19, 9, 11, 10, 365403123, time.Local), LastTimestamp:time.Date(2024, time.October, 19, 9, 15, 52, 951162615, time.Local), Count:22, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"node1"}': 'Patch "https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/events/kube-scheduler-node1.17ffcfa08ae2e7f3": dial tcp 192.168.120.3:6443: connect: connection refused'(may retry after sleeping)
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.185117     647 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://lb.kubesphere.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s\": dial tcp 192.168.120.3:6443: connect: connection refused" interval="7s"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.983785     647 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.983905     647 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable" pod="kube-system/kube-scheduler-node1"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.983933     647 kuberuntime_manager.go:1181] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable" pod="kube-system/kube-scheduler-node1"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.983991     647 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-node1_kube-system(6a9443ee7c86d00dc2d503017a48ce21)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-node1_kube-system(6a9443ee7c86d00dc2d503017a48ce21)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": failed to resolve reference \\\"registry.k8s.io/pause:3.8\\\": failed to do request: Head \\\"https://registry.k8s.io/v2/pause/manifests/3.8\\\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable\"" pod="kube-system/kube-scheduler-node1" podUID="6a9443ee7c86d00dc2d503017a48ce21"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.986432     647 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.986460     647 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable" pod="kube-system/kube-controller-manager-node1"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.986475     647 kuberuntime_manager.go:1181] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable" pod="kube-system/kube-controller-manager-node1"
Oct 19 13:13:55 node1 kubelet[647]: E1019 13:13:55.986508     647 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node1_kube-system(8e63188c7b866f30b73a171d356edc93)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-controller-manager-node1_kube-system(8e63188c7b866f30b73a171d356edc93)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": failed to resolve reference \\\"registry.k8s.io/pause:3.8\\\": failed to do request: Head \\\"https://registry.k8s.io/v2/pause/manifests/3.8\\\": dial tcp: lookup registry.k8s.io on 114.114.114.114:53: dial udp 114.114.114.114:53: connect: network is unreachable\"" pod="kube-system/kube-controller-manager-node1" podUID="8e63188c7b866f30b73a171d356edc93"

It seems like the kubelet is trying to pull the pause image from the internet, but since this is an air-gapped environment, the network is unreachable.

Did I miss any steps during the air-gapped setup that could have caused this? How should I proceed to ensure the kubelet pulls the required images from the local repository instead of the internet?
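
One way to confirm which images the kubelet is trying to fetch from the internet is to filter the kubelet log for failed pulls. The snippet below is a minimal sketch and not part of KubeKey: `failed_images` is a hypothetical helper name, and it assumes a `grep` that supports `-o` (GNU or BSD).

```shell
#!/bin/sh
# failed_images: hypothetical helper (not a KubeKey command).
# Reads kubelet log lines on stdin and prints each distinct image
# reference that follows the text "failed to pull image".
failed_images() {
  # The log escapes quotes as \" (or \\\" when nested), so accept any
  # run of backslashes and quotes before the image reference.
  grep -oE 'failed to pull image [\"]+[^\"]+' |
    grep -oE '[^\" ]+$' |
    sort -u
}

# Example use on the node:
#   journalctl -u kubelet --no-pager | failed_images
```

Against the log lines in this issue, the only reference this should report is registry.k8s.io/pause:3.8, which suggests the container runtime is still using the upstream sandbox (pause) image rather than one from reg.kubekey.local. If so, the runtime's sandbox image setting (for containerd, `sandbox_image` in its CRI plugin configuration) is probably worth checking.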


Additional information

No response

Thatwho added the bug label Oct 19, 2024

lbrigman124 commented Oct 22, 2024

A few steps are needed for an air-gapped installation to succeed:
1. Install a cluster in an environment that is not air-gapped.
2. Use that cluster to create an artifact manifest. This is different from the cluster install manifest.
3. Sanity-check the manifest to make sure all needed container images are included. Manually add any missing images.
   Note: the docker registry tarball is not included and will need to be added.
4. Create an artifact file from that artifact manifest.

Note that for an HA install, the registry server that the air-gapped system pulls from needs to be separate from your cluster.

See: https://github.com/kubesphere/kubekey/blob/master/docs/manifest_and_artifact.md
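
The artifact-building steps above can be sketched as a small script for the machine that has internet access. This is a hedged outline and not an official procedure: the file names are placeholders, the `run` and `build_artifact` helpers are hypothetical, and the `kk create manifest` and `kk artifact export -m ... -o ...` invocations are taken from the KubeKey manifest and artifact documentation linked above, so check them against your KubeKey version. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: build an offline artifact on a machine WITH internet access
# and an existing cluster. File names below are placeholders.
set -eu

MANIFEST="${MANIFEST:-manifest-sample.yaml}"     # assumed manifest file name
ARTIFACT="${ARTIFACT:-kubekey-artifact.tar.gz}"  # assumed artifact file name
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute

# run: hypothetical helper that prints a command, or executes it
# when DRY_RUN=0.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_artifact() {
  # Generate an artifact manifest from the running cluster.
  run kk create manifest
  # Review the manifest by hand here (missing images, registry tarball)
  # before exporting it into a single artifact file.
  run kk artifact export -m "$MANIFEST" -o "$ARTIFACT"
}

build_artifact
```

Keeping the manual review between the two commands matters, since the manifest generated from the cluster may not list everything the offline install needs.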
Additional steps after getting the artifact file:
kk init registry -f <config_file> -a <artifacts_file>
kk artifact image push -f <config_file> -a <artifacts_file>

Then, before installing the cluster, run:
kk init os -f <config_file> -a <artifacts_file>
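
Put together, the commands above could be run in the following order on the air-gapped side. This is a sketch under assumptions: the file names are placeholders, `run` and `deploy_airgapped` are hypothetical helpers, and the final `kk create cluster -f ... -a ...` step is not mentioned in this comment; it is assumed from the KubeKey offline installation documentation. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: deploy from an artifact inside the air-gapped environment.
set -eu

CONFIG="${CONFIG:-config-sample.yaml}"           # assumed cluster config file
ARTIFACT="${ARTIFACT:-kubekey-artifact.tar.gz}"  # assumed artifact file name
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute

# run: hypothetical helper that prints a command, or executes it
# when DRY_RUN=0.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_airgapped() {
  # 1. Set up the private registry on the node in the "registry" role group.
  run kk init registry -f "$CONFIG" -a "$ARTIFACT"
  # 2. Push the images from the artifact into that registry.
  run kk artifact image push -f "$CONFIG" -a "$ARTIFACT"
  # 3. Prepare the OS on the nodes before installing the cluster.
  run kk init os -f "$CONFIG" -a "$ARTIFACT"
  # 4. Assumption (not in the comment above): install the cluster
  #    from the same config and artifact.
  run kk create cluster -f "$CONFIG" -a "$ARTIFACT"
}

deploy_airgapped
```

Setting `DRY_RUN=0` executes the commands instead of printing them; with `set -e` the script stops at the first step that fails, so a registry or push failure is not hidden by later steps.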
