linux – When using Cilium as the Kubernetes network CNI, coredns is running but not ready: the healthcheck fails and plugin/errors logs HINFO: read udp i/o timeout

I am running into some errors when using Cilium as the Kubernetes network CNI.

  • Cilium: 1.11.6
  • Kubernetes: 1.23.0

Description

The steps to create the cluster are as follows:

  1. Initialize the cluster with kubeadm --config kubeadm.conf. The kubeadm.conf is:
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.153.21
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: nm
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.5.0.0/16
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf

Run kubeadm init --config kubeadm.conf to initialize the cluster.
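
To double-check that the cluster came up with the intended settings (in particular the custom podSubnet), the nodes and the stored ClusterConfiguration can be inspected; kubeadm keeps the latter in the kubeadm-config ConfigMap in kube-system. Output omitted here, these are just the checks:

kubectl get nodes -o wide
kubectl -n kube-system get cm kubeadm-config -o yaml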

  2. Join the worker nodes (the join command is sketched after the node list below):
  • master: 192.168.153.21
  • worker1: 192.168.153.22
  • worker2: 192.168.153.23
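
On each worker I run the join command that kubeadm init prints at the end; it has roughly this shape (the token and CA cert hash are placeholders taken from the kubeadm init output):

kubeadm join 192.168.153.21:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash>
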
  3. Install Cilium. I use cilium install to install Cilium. The status of coredns goes from Pending to Running, but it is not ready!
root@nm:/work-place/kubernetes/create-cluster# kubectl get pods -A -o wide
NAMESPACE     NAME                               READY   STATUS             RESTARTS        AGE   IP               NODE   NOMINATED NODE   READINESS GATES
kube-system   cilium-99lxc                       1/1     Running            0               18m   192.168.153.22   na     <none>           <none>
kube-system   cilium-ct5s7                       1/1     Running            0               18m   192.168.153.21   nm     <none>           <none>
kube-system   cilium-drtlh                       1/1     Running            0               18m   192.168.153.23   nb     <none>           <none>
kube-system   cilium-operator-5d67fc458d-zxgdd   1/1     Running            0               18m   192.168.153.22   na     <none>           <none>
kube-system   coredns-6d8c4cb4d-jkssb            0/1     Running            8 (2m55s ago)   19m   10.0.0.240       na     <none>           <none>
kube-system   coredns-6d8c4cb4d-psxvw            0/1     CrashLoopBackOff   8 (83s ago)     19m   10.0.2.176       nb     <none>           <none>
kube-system   etcd-nm                            1/1     Running            2               25m   192.168.153.21   nm     <none>           <none>
kube-system   kube-apiserver-nm                  1/1     Running            2               25m   192.168.153.21   nm     <none>           <none>
kube-system   kube-controller-manager-nm         1/1     Running            2               25m   192.168.153.21   nm     <none>           <none>
kube-system   kube-proxy-hv5nc                   1/1     Running            0               24m   192.168.153.22   na     <none>           <none>
kube-system   kube-proxy-pbzlx                   1/1     Running            0               24m   192.168.153.23   nb     <none>           <none>
kube-system   kube-proxy-rqpxw                   1/1     Running            0               25m   192.168.153.21   nm     <none>           <none>
kube-system   kube-scheduler-nm                  1/1     Running            2               25m   192.168.153.21   nm     <none>           <none>
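
The Cilium agent pods themselves are 1/1 Running. For completeness, the cilium CLI (the same one used for the install) can also be used to check agent health and run end-to-end checks; I am only listing the commands here, not their output:

cilium status --wait
cilium connectivity test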

Here are the logs and the describe output of coredns.

root@nm:/work-place/kubernetes/create-cluster# kubectl describe pod  coredns-6d8c4cb4d-jkssb -n kube-system
Name:                 coredns-6d8c4cb4d-jkssb
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 na/192.168.153.22
Start Time:           Wed, 13 Jul 2022 00:37:51 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=6d8c4cb4d
Annotations:          <none>
Status:               Running
IP:                   10.0.0.240
IPs:
  IP:           10.0.0.240
Controlled By:  ReplicaSet/coredns-6d8c4cb4d
Containers:
  coredns:
    Container ID:  docker://cc35b97903b120cb54765641da47c69ea8c833e6c72958407c7e605a5aa001b4
    Image:         registry.aliyuncs.com/google_containers/coredns:v1.8.6
    Image ID:      docker-pullable://registry.aliyuncs.com/google_containers/coredns@sha256:5b6ec0d6de9baaf3e92d0f66cd96a25b9edbce8716f5f15dcd1a616b3abd590e
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 13 Jul 2022 00:57:12 +0800
      Finished:     Wed, 13 Jul 2022 00:59:06 +0800
    Ready:          False
    Restart Count:  8
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v8hzn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-v8hzn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               21m   default-scheduler  Successfully assigned kube-system/coredns-6d8c4cb4d-jkssb to na
  Warning  FailedCreatePodSandBox  20m   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8ae4c118e4c3ff1c0bd2c601c808cae2c17cbc27552fb148b755b7d798f0bb71" network for pod "coredns-6d8c4cb4d-jkssb": networkPlugin cni failed to set up pod "coredns-6d8c4cb4d-jkssb_kube-system" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  Normal   SandboxChanged  20m                   kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          20m                   kubelet  Container image "registry.aliyuncs.com/google_containers/coredns:v1.8.6" already present on machine
  Normal   Created         20m                   kubelet  Created container coredns
  Normal   Started         20m                   kubelet  Started container coredns
  Warning  Unhealthy       20m (x2 over 20m)     kubelet  Readiness probe failed: Get "http://10.0.0.240:8181/ready": dial tcp 10.0.0.240:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       18m (x13 over 20m)    kubelet  Readiness probe failed: Get "http://10.0.0.240:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       15m (x12 over 19m)    kubelet  Liveness probe failed: Get "http://10.0.0.240:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing         14m                   kubelet  Container coredns failed liveness probe, will be restarted

We can see that the coredns healthcheck fails. In the coredns logs below, it cannot reach udp 192.168.153.2:53 (i/o timeout). 192.168.153.2 is the gateway of my VNet8. A quick reachability check from a test pod is sketched after the log output.

root@nm:/work-place/kubernetes/create-cluster# kubectl logs  coredns-6d8c4cb4d-jkssb -n kube-system
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:39983->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:53240->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:49802->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:54428->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:43974->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:37821->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:36545->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:56785->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:47913->192.168.153.2:53: i/o timeout
[ERROR] plugin/errors: 2 7607030484537686268.4300248127207674545. HINFO: read udp 10.0.0.240:38162->192.168.153.2:53: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

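As mentioned above, one way to narrow this down is to test whether any pod (not just coredns) can reach the upstream resolver. A throwaway busybox pod can query 192.168.153.2 directly (the pod name and image tag are just examples):

kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 \
        -- nslookup kubernetes.io 192.168.153.2

If this times out as well, the problem is pod egress to the node network rather than coredns itself.
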
Try

I have already tried this.

Other Info

root@nm:/work-place/kubernetes/create-cluster# kubectl describe cm coredns -n kube-system
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}


BinaryData
====

Events:  <none>
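
For context on where the 192.168.153.2 target comes from: the Corefile above forwards to /etc/resolv.conf, and kubelet populates that file from /run/systemd/resolve/resolv.conf (see the resolvConf setting below), so the resolv.conf seen by coredns effectively contains something like:

nameserver 192.168.153.2

i.e. the VNet8 gateway mentioned above.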

  • kubelet config:
root@nm:/home/lzl# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
  • kubeadm flags of kubelet:
root@nm:/home/lzl# cat /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6"
