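Does initialDelaySeconds take effect on a k8s startupProbe?

As most readers know, k8s currently has three kinds of probes. The official documentation describes them as follows: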

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.

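The overview above is clear. Liveness probes decide when a container gets restarted, which matches the literal meaning of "live": fail the liveness check and the container is restarted. Readiness probes are the familiar load-balancing concept: pass the check and the Pod joins the service pool. Anyone who knows nginx can compare this to the health checks configured on an upstream: healthy backends go in, unhealthy ones come out, and the container itself keeps running. Finally there is the more recently added startup probe. As I understand it, the use case is a slow-starting container: while it is still starting up you do not want the other two probes to act on it, so they are held back until the startup probe passes, after which they take over. The benefit is an extra gate that protects slow-starting containers from checks that run before they are ready and cause unpredictable problems.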
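The hand-off between the three probes can be sketched as a small truth table. This is a minimal illustration of the behavior described in the quoted documentation, not the kubelet's code; `ProbeType` and `ShouldRun` are hypothetical names made up for this example.

```go
package main

import "fmt"

// ProbeType names the three probe kinds discussed above.
type ProbeType string

const (
	Liveness  ProbeType = "liveness"
	Readiness ProbeType = "readiness"
	Startup   ProbeType = "startup"
)

// ShouldRun is a hypothetical helper (not a kubelet function). It reports
// whether a probe of the given type is active, given whether a startup probe
// is configured and whether that startup probe has already succeeded.
func ShouldRun(p ProbeType, hasStartupProbe, started bool) bool {
	if !hasStartupProbe {
		// No startup probe configured: liveness and readiness run as usual.
		return p != Startup
	}
	if started {
		// The startup probe has succeeded; the other two take over.
		return p != Startup
	}
	// Still starting: only the startup probe runs.
	return p == Startup
}

func main() {
	for _, started := range []bool{false, true} {
		for _, p := range []ProbeType{Startup, Liveness, Readiness} {
			fmt.Printf("started=%-5v %-9s active=%v\n", started, p, ShouldRun(p, true, started))
		}
	}
}
```

The point of the sketch is that liveness and readiness stay silent for as long as the startup probe has not succeeded, which is the protection for slow-starting containers.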

With that background in place, the initialDelaySeconds parameter enters the picture. Again, here is the official documentation:

initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.

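Read that closely and a question comes up: does initialDelaySeconds apply to startup probes at all? I searched around online and, unfortunately, found no definitive answer. Another official document says: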

initialDelaySeconds integer Number of seconds after the container has started before liveness probes are initiated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

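Now even readiness probes have dropped out of the description. Since nobody else could settle it, I had to do the work myself: read the source for a first indication, then verify with an experiment.

1. Search the k8s source on GitHub. A probe is a Go struct: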

    type Probe struct {
        // The action taken to determine the health of a container
        ProbeHandler
        // Length of time before health checking is activated.  In seconds.
        // +optional
        InitialDelaySeconds int32
        // Length of time before health checking times out.  In seconds.
        // +optional
        TimeoutSeconds int32
        // How often (in seconds) to perform the probe.
        // +optional
        PeriodSeconds int32
        // Minimum consecutive successes for the probe to be considered successful after having failed.
        // Must be 1 for liveness and startup.
        // +optional
        SuccessThreshold int32
        // Minimum consecutive failures for the probe to be considered failed after having succeeded.
        // +optional
        FailureThreshold int32
        // Optional duration in seconds the pod needs to terminate gracefully upon probe failure.
        // The grace period is the duration in seconds after the processes running in the pod are sent
        // a termination signal and the time when the processes are forcibly halted with a kill signal.
        // Set this value longer than the expected cleanup time for your process.
        // If this value is nil, the pod's terminationGracePeriodSeconds will be used. Otherwise, this
        // value overrides the value provided by the pod spec.
        // Value must be non-negative integer. The value zero indicates stop immediately via
        // the kill signal (no opportunity to shut down).
        // This is a beta field and requires enabling ProbeTerminationGracePeriod feature gate.
        // +optional
        TerminationGracePeriodSeconds *int64
    }

Nothing here says that startup probes cannot use InitialDelaySeconds.

2. Run an experiment. That is how Edison did it, after all.

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-exec
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
        resources:
          limits:
            cpu: 1000m
            memory: 2G
          requests:
            cpu: 1000m
            memory: 2G
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5

        startupProbe:
          exec:
            command:
            - cat
            - /tmp/helloworld   # this file never exists, so the probe always fails
          failureThreshold: 3
          periodSeconds: 10
          initialDelaySeconds: 8  # the parameter under test

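Save the file above as exec-liveness.yaml and start the pod:

    kubectl -n xxxx-test apply -f exec-liveness.yaml

Run the following repeatedly in a terminal. At the bottom of the output, an event shows that the startup probe failed:

    kubectl -n xxxx-test describe pod liveness-exec

Look at the 9s in the Age column: the container started 9s ago and the startup probe has only just failed (0s), which is very close to the configured 8s.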

    Events:
    Type     Reason     Age        From                    Message
    ----     ------     ----       ----                    -------
    Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.135.31
    Normal   Pulled     9s         kubelet, 10.132.135.31  Container image "xxx/docker/busybox:1.24" already present on machine
    Normal   Created    9s         kubelet, 10.132.135.31  Created container liveness
    Normal   Started    9s         kubelet, 10.132.135.31  Started container liveness
    Warning  Unhealthy  0s         kubelet, 10.132.135.31  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory

Change the value to initialDelaySeconds: 15, delete the pod, recreate it and look again. The Age value now reads 16s:

    Events:
    Type     Reason     Age        From                    Message
    ----     ------     ----       ----                    -------
    Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.133.16
    Normal   Pulling    16s        kubelet, 10.132.133.16  Pulling image "xxx/docker/busybox:1.24"
    Normal   Pulled     16s        kubelet, 10.132.133.16  Successfully pulled image "xxxx/docker/busybox:1.24"
    Normal   Created    16s        kubelet, 10.132.133.16  Created container liveness
    Normal   Started    16s        kubelet, 10.132.133.16  Started container liveness
    Warning  Unhealthy  0s         kubelet, 10.132.133.16  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory

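What does this show? In the version tested here, InitialDelaySeconds does take effect on startup probes, which is consistent with the comment in the source. Unfortunately the official documentation is somewhat misleading on this point. I hope everyone uses these probes sensibly, catches problems early, and sleeps soundly.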

Last modified: Monday, February 26, 2024