After running for a year, creating a new pod fails with: failed bind with extender at URL http://127.0.0.1:32766/gpushare-scheduler/bind, code 500 #229

Open
klvchen opened this issue Aug 26, 2024 · 0 comments

klvchen commented Aug 26, 2024

This is a self-hosted Kubernetes cluster, version v1.24.6. I changed the certificates myself so that they are valid for 10 years.
The gpushare-device-plugin image is registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-plugin:v2-1.11-aff8a23
The k8s-gpushare-schd-extender image is registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a

Today I updated a service and found that no pod could be created. I then tried the official test example, and it fails with the same error:
binding rejected: failed bind with extender at URL http://127.0.0.1:32766/gpushare-scheduler/bind, code 500

# YAML of the test example used
cat test.yaml
apiVersion: apps/v1 
kind: StatefulSet

metadata:
  name: binpack-1
  labels:
    app: binpack-1

spec:
  replicas: 2
  serviceName: "binpack-1"
  podManagementPolicy: "Parallel"
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1

  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-1

    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # GiB
            aliyun.com/gpu-mem: 1

# Checks after the pod failed to start
kubectl describe pod binpack-1-0

I also checked the output of kubectl -n kube-system get pod.

The logs of the gpushare-schd-extender-6cf7d6cdd9-nb4ph pod contain many "Unauthorized" messages. I am not sure whether this is related:

[  warn ] 2024/08/26 09:39:13 gpushare-bind.go:25: Failed to handle pod binpack-1-0 in ns default due to error Unauthorized
[  info ] 2024/08/26 09:39:13 routes.go:137: extenderBindingResult = {"Error":"Unauthorized"}
[ debug ] 2024/08/26 09:39:13 routes.go:162: /gpushare-scheduler/bind response=&{0xc420198780 0xc420395800 0xc42089f400 0x565b70 true false false false 0xc420d72740 {0xc420e0e540 map[Content-Type:[application/json]] false false} map[Content-Type:[application/json]] true 24 -1 500 false false [] 0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0] [0 0 0] 0xc4203699d0 0}
E0826 09:39:14.488506       1 reflector.go:205] github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Unauthorized
E0826 09:39:14.489290       1 reflector.go:205] github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Node: Unauthorized
[ debug ] 2024/08/26 09:39:14 routes.go:160: /gpushare-scheduler/filter request body = &{0xc420627940 <nil> <nil> false true {0 0} false false false 0x69bfd0}

How can I resolve this problem? Thanks.
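
One thing that may be worth checking, as an assumption and not something confirmed in this issue: the "Unauthorized" errors appear after about a year of uptime, which would be consistent with the extender still using a service account token that has since expired. Service account tokens are JWTs, so their `exp` claim can be inspected offline. Below is a minimal Python sketch that does this. The helper names (`jwt_claims`, `token_expiry`, `is_expired`) are hypothetical, and the signature is deliberately not verified because only the expiry is of interest.

```python
import base64
import json
import sys
from datetime import datetime, timezone


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def token_expiry(token: str):
    """Return the expiry as an aware UTC datetime, or None if there is no exp claim."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)


def is_expired(token: str, now=None) -> bool:
    """True if the token carries an exp claim that is not later than `now`."""
    expiry = token_expiry(token)
    if expiry is None:
        return False  # no exp claim: the token itself does not expire
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_token.py token.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        tok = fh.read()
    print("expires:", token_expiry(tok), "| expired:", is_expired(tok))
```

The token is normally mounted in the container at /var/run/secrets/kubernetes.io/serviceaccount/token. Whether it can be copied out with kubectl exec depends on the tools available in the extender image, which is not known here. If the token does turn out to be expired, deleting the gpushare-schd-extender pod so that it is recreated with a fresh token would be the obvious thing to try. This is a guess based on the symptoms, not a confirmed fix.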
