Does initialDelaySeconds take effect for a startupProbe in k8s?

k8s currently has three types of probes. The official documentation describes them as follows:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.

The overview above is clear. Liveness probes decide when to restart a container, which matches the literal meaning of "live": if the liveness check fails, the container is restarted. Readiness probes correspond to a familiar concept from traditional load balancing: once the check passes, the Pod is added to the Service's backend pool. If you know nginx, you can compare this with the health checks configured for an upstream. Finally, there is the more recently added startup probe. As I understand it, the use case is a slow-starting container that needs some time after launch before it can respond. During that time the other two probes should stay out of the way; only after the startup probe passes do they take over. This adds a checkpoint that protects slow-starting containers from unpredictable problems caused by probing them before they are ready.
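The gating behavior described above can be sketched in a few lines of Go. This is only an illustrative model of the documented rule, not kubelet code; the names ProbeState and ActiveProbes are made up for this example.

```go
package main

import "fmt"

// ProbeState is a hypothetical, simplified view of one container's probes.
type ProbeState struct {
	StartupConfigured bool // a startupProbe is defined for the container
	StartupSucceeded  bool // the startupProbe has already passed
}

// ActiveProbes returns which probe kinds are allowed to run in this model.
func ActiveProbes(s ProbeState) []string {
	if s.StartupConfigured && !s.StartupSucceeded {
		// Liveness and readiness checks are held back until startup succeeds.
		return []string{"startup"}
	}
	return []string{"liveness", "readiness"}
}

func main() {
	fmt.Println(ActiveProbes(ProbeState{StartupConfigured: true}))
	fmt.Println(ActiveProbes(ProbeState{StartupConfigured: true, StartupSucceeded: true}))
	fmt.Println(ActiveProbes(ProbeState{}))
}
```

Only the first call reports the startup probe alone; once it has succeeded, or when no startup probe is configured, the other two probes run as usual.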

With that background in place, consider the initialDelaySeconds parameter. The official documentation explains it as follows:

initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.

Read that text carefully and a question arises: does initialDelaySeconds also apply to startup probes? I searched all kinds of material online, but unfortunately nobody gave a definitive answer. Another official document says:

initialDelaySeconds integer Number of seconds after the container has started before liveness probes are initiated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

Is that a mistake? Now even readiness probes are missing from the description. Since I could not rely on anyone else's answer, I had to work it out myself: first check the source code for a preliminary answer, then verify it experimentally.

1. Search the k8s source code on GitHub. The probe is defined as a Go struct:

    type Probe struct {
        // The action taken to determine the health of a container
        ProbeHandler
        // Length of time before health checking is activated. In seconds.
        // +optional
        InitialDelaySeconds int32
        // Length of time before health checking times out. In seconds.
        // +optional
        TimeoutSeconds int32
        // How often (in seconds) to perform the probe.
        // +optional
        PeriodSeconds int32
        // Minimum consecutive successes for the probe to be considered successful after having failed.
        // Must be 1 for liveness and startup.
        // +optional
        SuccessThreshold int32
        // Minimum consecutive failures for the probe to be considered failed after having succeeded.
        // +optional
        FailureThreshold int32
        // Optional duration in seconds the pod needs to terminate gracefully upon probe failure.
        // The grace period is the duration in seconds after the processes running in the pod are sent
        // a termination signal and the time when the processes are forcibly halted with a kill signal.
        // Set this value longer than the expected cleanup time for your process.
        // If this value is nil, the pod's terminationGracePeriodSeconds will be used. Otherwise, this
        // value overrides the value provided by the pod spec.
        // Value must be non-negative integer. The value zero indicates stop immediately via
        // the kill signal (no opportunity to shut down).
        // This is a beta field and requires enabling ProbeTerminationGracePeriod feature gate.
        // +optional
        TerminationGracePeriodSeconds *int64
    }

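Nothing in the struct or its comments says that startup probes cannot use InitialDelaySeconds. The same Probe type is shared by all three probe kinds, and the comment on InitialDelaySeconds speaks of health checking in general.

2. Run an experiment. Edison worked the same way, so there is nothing wrong with that.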

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-exec
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
        resources:
          limits:
            cpu: 1000m
            memory: 2G
          requests:
            cpu: 1000m
            memory: 2G
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5

        startupProbe:
          exec:
            command:
            - cat
            - /tmp/helloworld # This file definitely does not exist
          failureThreshold: 3
          periodSeconds: 10
          initialDelaySeconds: 8 # the value under test

Save the file above as exec-liveness.yaml and create the Pod:

    kubectl -n xxxx-test apply -f exec-liveness.yaml

Run the following command repeatedly in a terminal. At the bottom of the output, an event reports that the startup probe failed.

    kubectl -n xxxx-test describe pod liveness-exec

Look at the Age column: the container started 9s ago and the probe failure is 0s old, so the first startup probe ran about 9s after the container started. That is very close to the configured 8s, right?

 Events:
   Type     Reason     Age        From                    Message
   ----     ------     ----       ----                    -------
   Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.135.31
   Normal   Pulled     9s         kubelet, 10.132.135.31  Container image "xxx/docker/busybox:1.24" already present on machine
   Normal   Created    9s         kubelet, 10.132.135.31  Created container liveness
   Normal   Started    9s         kubelet, 10.132.135.31  Started container liveness
   Warning  Unhealthy  0s         kubelet, 10.132.135.31  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory

Change the setting to initialDelaySeconds: 15, delete the Pod, create it again, and check once more. The Age value now becomes 16s:

 Events:
   Type     Reason     Age        From                    Message
   ----     ------     ----       ----                    -------
   Normal   Scheduled  <unknown>  default-scheduler       Successfully assigned xxxx-test/liveness-exec to 10.132.133.16
   Normal   Pulling    16s        kubelet, 10.132.133.16  Pulling image "xxx/docker/busybox:1.24"
   Normal   Pulled     16s        kubelet, 10.132.133.16  Successfully pulled image "xxxx/docker/busybox:1.24"
   Normal   Created    16s        kubelet, 10.132.133.16  Created container liveness
   Normal   Started    16s        kubelet, 10.132.133.16  Started container liveness
   Warning  Unhealthy  0s         kubelet, 10.132.133.16  Startup probe failed: cat: can't open '/tmp/helloworld': No such file or directory

What does this show? In the current version, the initialDelaySeconds parameter does take effect for startup probes, which is consistent with the comments in the source code. Unfortunately, the official documentation is somewhat misleading on this point. Finally, I hope you can put these probes to good use, catch problems early, and rest easy.

Lastmod: Monday, August 28, 2023
