Does initialDelaySeconds take effect for the k8s startupProbe?

As is well known, k8s currently has three kinds of probes. The official documentation describes them as follows:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.

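The English overview above is clear. A liveness probe decides when a container is restarted, which matches the literal meaning of "live": if the liveness check fails, the container is restarted. A readiness probe is the familiar concept from traditional load balancing: a container that passes the check joins the service pool. Anyone who knows nginx can compare it to the health checks configured for an upstream: healthy backends are put in, unhealthy ones are taken out, and the container itself keeps running. Finally there is the newer startup probe. As I understand it, the use case is a slow-starting container: for some time after it starts you do not want the other two probes to act on it, and only once the startup probe passes do the other two take over. The benefit is an extra gate that protects slow-starting containers, so that checks run before the container is ready do not cause unpredictable problems.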
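The gating rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not kubelet code; the function name `active_probes` and its parameters are made up for this example, and it assumes the container configures both a liveness and a readiness probe.

```python
from typing import Set


def active_probes(has_startup_probe: bool, startup_succeeded: bool) -> Set[str]:
    """Return which probe types would run under the gating rule quoted above.

    Hypothetical helper: assumes liveness and readiness probes are both configured.
    """
    if has_startup_probe and not startup_succeeded:
        # Liveness and readiness checks are held back until startup succeeds.
        return {"startup"}
    # No startup probe, or it has already passed: the other two take over.
    return {"liveness", "readiness"}


# While the startup probe has not passed, only it is active.
print(sorted(active_probes(has_startup_probe=True, startup_succeeded=False)))
# Once it passes, liveness and readiness take over.
print(sorted(active_probes(has_startup_probe=True, startup_succeeded=True)))
```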

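With that background in place, the initialDelaySeconds parameter comes into the picture. Here again is the explanation from the official documentation: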

initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.

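Reading the text above closely, you may wonder: does initialDelaySeconds actually apply to startup probes? I searched online and, unfortunately, found no definitive answer. Another official document says: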

initialDelaySeconds integer Number of seconds after the container has started before liveness probes are initiated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

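Now even the mention of readiness probes is gone. Since nobody else had the answer, I had to find it myself: first read the source code for a preliminary answer, then run an experiment to verify it.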

1. Search the k8s source code on GitHub. A probe is defined as a Go struct:

    type Probe struct {
        // The action taken to determine the health of a container
        ProbeHandler
        // Length of time before health checking is activated.  In seconds.
        // +optional
        InitialDelaySeconds int32
        // Length of time before health checking times out.  In seconds.
        // +optional
        TimeoutSeconds int32
        // How often (in seconds) to perform the probe.
        // +optional
        PeriodSeconds int32
        // Minimum consecutive successes for the probe to be considered successful after having failed.
        // Must be 1 for liveness and startup.
        // +optional
        SuccessThreshold int32
        // Minimum consecutive failures for the probe to be considered failed after having succeeded.
        // +optional
        FailureThreshold int32
        // Optional duration in seconds the pod needs to terminate gracefully upon probe failure.
        // The grace period is the duration in seconds after the processes running in the pod are sent
        // a termination signal and the time when the processes are forcibly halted with a kill signal.
        // Set this value longer than the expected cleanup time for your process.
        // If this value is nil, the pod's terminationGracePeriodSeconds will be used. Otherwise, this
        // value overrides the value provided by the pod spec.
        // Value must be non-negative integer. The value zero indicates stop immediately via
        // the kill signal (no opportunity to shut down).
        // This is a beta field and requires enabling ProbeTerminationGracePeriod feature gate.
        // +optional
        TerminationGracePeriodSeconds *int64
    }

Nothing in the comments above says that startup probes cannot use the InitialDelaySeconds parameter.
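One more structural point supports this reading: in the Kubernetes API, a container's livenessProbe, readinessProbe and startupProbe fields all use this single Probe type, so the field is available on all three. The Python mirror below is a rough sketch of that shape only. The class and attribute names are illustrative, just a subset of fields is modeled, and the presence of a field says nothing about whether the kubelet honors it, which is why the experiment is still needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    # Illustrative subset of the Go struct; defaults follow the documented API defaults.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


@dataclass
class Container:
    # All three probe slots share the same Probe type.
    name: str
    liveness_probe: Optional[Probe] = None
    readiness_probe: Optional[Probe] = None
    startup_probe: Optional[Probe] = None


container = Container(
    name="liveness",
    liveness_probe=Probe(initial_delay_seconds=5, period_seconds=5),
    startup_probe=Probe(initial_delay_seconds=8, period_seconds=10, failure_threshold=3),
)

# The delay field can be set on the startup probe just like on the liveness probe.
print(container.startup_probe.initial_delay_seconds)
```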

2. Run an experiment. Edison did the same, so there is nothing wrong with that.

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-exec
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
        resources:
          limits:
            cpu: 1000m
            memory: 2G
          requests:
            cpu: 1000m
            memory: 2G
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5

        startupProbe:
          exec:
            command:
            - cat
            - /tmp/helloworld   # this file definitely does not exist
          failureThreshold: 3
          periodSeconds: 10
          initialDelaySeconds: 8  # this is the parameter under test

Save the manifest above as exec-liveness.yaml and start the pod to test it:

    kubectl -n xxxx-test apply -f exec-liveness.yaml
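Before looking at the output, it helps to work out what to expect. If initialDelaySeconds is honored for the startup probe, the settings in the manifest put a lower bound on when each failure can be reported. The sketch below computes those earliest times; `earliest_failure_times` is a hypothetical helper, and it assumes each probe fails immediately and ignores how the kubelet actually schedules its checks, so real events can only arrive at or after these offsets.

```python
from typing import List


def earliest_failure_times(initial_delay: int, period: int, failure_threshold: int) -> List[int]:
    """Earliest seconds after container start at which each probe failure could occur.

    Hypothetical helper: a lower bound only, not a model of kubelet scheduling.
    """
    if initial_delay < 0 or period < 1 or failure_threshold < 1:
        raise ValueError("initial_delay must be >= 0; period and failure_threshold >= 1")
    # The first probe cannot run before the initial delay; later ones follow every period.
    return [initial_delay + i * period for i in range(failure_threshold)]


# Values taken from the startupProbe in the manifest.
times = earliest_failure_times(initial_delay=8, period=10, failure_threshold=3)
print(times)  # [8, 18, 28]
```

The first entry is the value the experiment is about: no "Startup probe failed" event should appear earlier than that many seconds after the container starts.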

Run the following command repeatedly in a terminal. At the very bottom of its output, an event reports that the startup probe failed:

    kubectl -n xxxx-test describe pod liveness-exec

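Look at the 9s in the Age column: the container started 9 seconds before the first probe failure (Age 0s), which is very close to the configured 8s.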

Events:
Type     Reason     Age        From                    Message
----     ------     ----       ----                    -------
Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.135.31
Normal   Pulled     9s         kubelet, 10.132.135.31  Container image "xxx/docker/busybox:1.24" already present on machine
Normal   Created    9s         kubelet, 10.132.135.31  Created container liveness
Normal   Started    9s         kubelet, 10.132.135.31  Started container liveness
Warning  Unhealthy  0s         kubelet, 10.132.135.31  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory

Change the setting to initialDelaySeconds: 15, delete the pod, create it again and check once more. The value is now 16s:

Events:
Type     Reason     Age        From                    Message
----     ------     ----       ----                    -------
Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.133.16
Normal   Pulling    16s        kubelet, 10.132.133.16  Pulling image "xxx/docker/busybox:1.24"
Normal   Pulled     16s        kubelet, 10.132.133.16  Successfully pulled image "xxxx/docker/busybox:1.24"
Normal   Created    16s        kubelet, 10.132.133.16  Created container liveness
Normal   Started    16s        kubelet, 10.132.133.16  Started container liveness
Warning  Unhealthy  0s         kubelet, 10.132.133.16  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory
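The comparison made by eye in the two event listings can also be done with a small script. The sketch below is only a convenience for reading the Age column: the helper names are made up, the parser handles just the h/m/s units, and the Age values are rounded by kubectl, so the result is approximate.

```python
import re


def age_seconds(age: str) -> int:
    """Convert an Age value such as '9s' or '1m5s' into seconds (h/m/s units only)."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", age)
    # Reject anything that is not made up entirely of number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unsupported age format: {age!r}")
    return sum(int(n) * units[u] for n, u in parts)


def delay_before_first_failure(started_age: str, unhealthy_age: str) -> int:
    """Seconds between the Started event and the first Unhealthy event."""
    return age_seconds(started_age) - age_seconds(unhealthy_age)


# (configured initialDelaySeconds, Age of Started, Age of Unhealthy) from the two runs.
for configured, started, unhealthy in [(8, "9s", "0s"), (15, "16s", "0s")]:
    observed = delay_before_first_failure(started, unhealthy)
    print(f"initialDelaySeconds={configured}: first failure about {observed}s after start")
```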

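What does this show? In the current version, the InitialDelaySeconds parameter does take effect for startup probes, which matches the comments in the source code. Unfortunately, the official documentation is somewhat misleading on this point. Finally, I hope everyone uses these probes sensibly, catches problems early, and sleeps soundly.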

Last modified: Monday, August 28, 2023
