Ad hoc observability tools on Kubernetes nodes
The following notes cover a variety of tools and methods for capturing and analyzing ad hoc observability data (e.g. resource usage and performance characteristics) on a GKE node.
Some tools are not container-aware and have unintuitive scoping behavior. Known gotchas are noted in the relevant sections.
Helper scripts overview
Some helper scripts to facilitate ad hoc observations and exploratory work are available in this repo’s scripts/gke directory.
When running these scripts, you must explicitly run bash <script>, because our home directories on the GKE nodes are subject to the noexec mount option. To run shell scripts, we must explicitly invoke the shell binary (in this case /bin/bash).
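For example (illustrative only; the exact error text varies by shell):

$ ./perf_flamegraph_for_all_running_processes.sh
bash: ./perf_flamegraph_for_all_running_processes.sh: Permission denied
$ bash ./perf_flamegraph_for_all_running_processes.sh    # works, because the shell binary itself lives on an exec-able filesystem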
The sections below cover more details, but as a brief summary:
Get the helper scripts:
$ git clone https://gitlab.com/gitlab-com/runbooks.git
$ cd runbooks/scripts/gke/
CPU profiling:
$ bash perf_flamegraph_for_all_running_processes.sh
$ bash perf_flamegraph_for_container_id.sh [container_id]
$ bash perf_flamegraph_for_container_of_pid.sh [pid]
$ bash perf_flamegraph_for_pod_id.sh [pod_id]
Mapping between IDs for PIDs, pods, containers, and cgroups:
$ bash container_id_for_pid.sh [pid]
$ bash container_id_for_cgroup.sh [cgroup_path]
$ bash pod_id_for_pid.sh [pid]
$ bash pod_id_for_cgroup.sh [cgroup_path]
$ bash cgroup_for_container_id.sh [container_id]
$ bash cgroup_for_pid.sh [pid]
$ bash cgroup_for_pod_id.sh [pod_id]
Packet captures:
$ bash tcpdump_on_gke_node.sh [max_duration_seconds] [tcpdump_options...]
$ bash tcpdump_on_gke_node_for_pod_id.using_pod_iface.sh [pod_id] [max_duration_seconds] [tcpdump_options...]
$ bash tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh [pod_id] [max_duration_seconds] [tcpdump_options...]
$ bash ip_addr_for_pod_id.sh [pod_id]
Picking a GKE node
All of these tools and methods are meant to be run on a specific GKE node.
If you have not already chosen a target GKE node to observe, here are some ways to list the currently running nodes.
List GKE nodes:
➜ ~ gcloud --project gitlab-pre compute instances list --filter 'name:gke-pre-gitlab-gke-redis'
NAME                                                 ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP   EXTERNAL_IP  STATUS
gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc  us-east1-b  c2-standard-8               10.232.20.65               RUNNING
gke-pre-gitlab-gke-redis-ratelimiting-b9fd9cef-42t5  us-east1-c  c2-standard-8               10.232.20.41               RUNNING
gke-pre-gitlab-gke-redis-ratelimiting-3378c320-7uwa  us-east1-d  c2-standard-8               10.232.20.16               RUNNING
gke-pre-gitlab-gke-redis-ratelimiting-3378c320-h1j3  us-east1-d  c2-standard-8               10.232.20.67               RUNNING
or
$ kubectl get nodes | grep -e 'NAME' -e 'redis-ratelimiting' | head -n 3
NAME                                                 STATUS  ROLES   AGE    VERSION
gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-l90m  Ready   <none>  3d17h  v1.21.11-gke.900
gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh  Ready   <none>  3d17h  v1.21.11-gke.900
SSH into node:
➜ ~ gcloud --project gitlab-pre compute ssh gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc
CPU profiling
Once you SSH onto a GKE node, you can capture a CPU profile for a container, a pod, or the whole host.
All of the following helper scripts use the same approach and output format; they differ only in whether the capture is scoped to the whole host, a single container, or a single pod:
bash perf_flamegraph_for_all_running_processes.sh
bash perf_flamegraph_for_container_id.sh [container_id]
bash perf_flamegraph_for_container_of_pid.sh [pid]
bash perf_flamegraph_for_pod_id.sh [pod_id]
Profiling the whole GKE node
Download and run perf_flamegraph_for_all_running_processes.sh:
igor@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ wget https://gitlab.com/gitlab-com/runbooks/-/raw/master/scripts/gke/perf_flamegraph_for_all_running_processes.sh
igor@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ bash perf_flamegraph_for_all_running_processes.sh
...
Starting capture for 60 seconds.
[ perf record: Woken up 33 times to write data ]
[ perf record: Captured and wrote 9.275 MB perf.data (47520 samples) ]
Please do not use --share-system anymore, use $SYSTEMD_NSPAWN_SHARE_* instead.
Spawning container igor-gcr.io_google-containers_toolbox-20201104-00 on /var/lib/toolbox/igor-gcr.io_google-containers_toolbox-20201104-00.
Press ^] three times within 1s to kill container.
Container igor-gcr.io_google-containers_toolbox-20201104-00 exited successfully.
Results:
Flamegraph: /tmp/perf-record-results.6VUGMYa4/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc.20220331_144401_UTC.all_cpus.flamegraph.svg
Raw stack traces: /tmp/perf-record-results.6VUGMYa4/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc.20220331_144401_UTC.all_cpus.perf-script.txt.gz
Now you can scp the flamegraph to your local machine:
➜ ~ gcloud --project gitlab-pre compute scp gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc:/tmp/perf-record-results.6VUGMYa4/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc.20220331_144401_UTC.all_cpus.flamegraph.svg .
Profiling a specific container
Similar to the above, you can profile a single container rather than the whole host.
Find and ssh into a GKE node:
gcloud compute instances list --project gitlab-pre --filter 'name:gke-pre-gitlab-gke-redis'
gcloud compute ssh --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc
On the GKE node, fetch the helper script:
git clone https://gitlab.com/gitlab-com/runbooks.git
cd runbooks/scripts/gke/
Choose your target PID, and run the profiler on just its container:
TARGET_PID=$( pgrep -n -f 'redis-server .*:6379' )
bash ./perf_flamegraph_for_container_of_pid.sh $TARGET_PID
Alternatively, you can specify the target container by ID:
CONTAINER_ID=$( crictl ps --latest --quiet --name 'redis' )
bash ./perf_flamegraph_for_container_id.sh $CONTAINER_ID
On your laptop, download the flamegraph (and optionally the perf-script output for fine-grained analysis):
gcloud compute scp --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc:/tmp/perf-record-results.oI3YR698/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc.20220414_010939_UTC.container_of_pid_7154.flamegraph.svg .
gcloud compute scp --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc:/tmp/perf-record-results.oI3YR698/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc.20220414_010939_UTC.container_of_pid_7154.perf-script.txt.gz .
Finding relationships between processes, containers, and pods
After SSHing to a GKE node, you can use crictl to view containers and pods running on that node.
To find which container or pod owns a process, you can use the utilities in the runbooks repo’s scripts/gke directory, as shown below.
This is all run from a normal shell on the GKE node (outside the target container, in the GKE node’s root process namespace).
Choose a process to inspect.
Note: This is the process’s PID from the host’s root namespace, not its PID from within the container’s namespace.
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ TARGET_PID=$( pgrep -f 'redis-server.*:6379' )
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ ps uwf $TARGET_PID
USER       PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
chronos+  7154  1.5  0.0 139808 10200 ?    Ssl  Mar15 720:01 redis-server *:6379
Find the process’s container id, and use crictl to display the container’s summary info:
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ crictl ps --id $( bash container_id_for_pid.sh "$TARGET_PID" )
CONTAINER      IMAGE          CREATED      STATE    NAME   ATTEMPT  POD ID
89e7d6da5f404  be0431d8c1328  4 weeks ago  Running  redis  0        bb2d2ec8e1499
Find the process’s pod id, and use crictl to display all containers in that pod:
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-mabc ~ $ crictl ps --pod $( bash pod_id_for_pid.sh "$TARGET_PID" )
CONTAINER      IMAGE          CREATED      STATE    NAME      ATTEMPT  POD ID
f70ffec5e32e7  2600417bb7548  4 weeks ago  Running  metrics   0        bb2d2ec8e1499
ad8d09b753bd6  de6f7fadcaf3d  4 weeks ago  Running  sentinel  0        bb2d2ec8e1499
89e7d6da5f404  be0431d8c1328  4 weeks ago  Running  redis     0        bb2d2ec8e1499
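The cgroup helpers follow the same pattern. For example (illustrative usage; the output format depends on the helper scripts themselves):

$ bash cgroup_for_pid.sh "$TARGET_PID"
$ bash cgroup_for_pod_id.sh $( bash pod_id_for_pid.sh "$TARGET_PID" )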
Quick reference: Explore pods, containers, and images using crictl
All GKE nodes have the crictl utility installed by default.
This tool queries the container runtime (in this case runc/containerd) to expose details of the running pods and containers.
The following quick reference covers some common uses:
# List running containers, optionally limited to a single pod.
$ crictl ps
$ crictl ps --pod $POD_ID

# List pods, optionally filtered by Kubernetes namespace or label.
$ crictl pods
$ crictl pods --namespace redis
$ crictl pods --label 'app.kubernetes.io/instance=redis-ratelimiting'

# Capture a pod id or container id for use in later commands.
$ POD_ID=$( crictl pods --latest --quiet --namespace redis )
$ CONTAINER_ID=$( crictl ps --latest --quiet --name sentinel )

# Show detailed info for a pod, container, or image.
$ crictl inspectp $POD_ID
$ crictl inspect $CONTAINER_ID
$ crictl inspecti $IMAGE_ID

# Run a command inside a running container.
$ crictl exec -it $CONTAINER_ID ps -ef
$ crictl exec -it $CONTAINER_ID which redis-cli

# Built-in help.
$ crictl --help
$ crictl ps --help
Packet captures on a GKE node
Capturing a small sample of network traffic can be diagnostically useful.
The following helper scripts facilitate this:
tcpdump_on_gke_node.sh [max_duration_seconds] [tcpdump_options]
tcpdump_on_gke_node_for_pod_id.using_pod_iface.sh [pod_id] [max_duration_seconds] [tcpdump_options]
tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh [pod_id] [max_duration_seconds] [tcpdump_options]
GKE nodes do not have tcpdump installed by default. These scripts install it via toolbox, choose a network interface to capture, run tcpdump with whatever additional options you specify, and write the resulting compressed capture file to /var/pcap/. From there, you can download the pcap file for analysis.
Quick reference: Packet capture commands
All of the following commands are run on a GKE node, outside of any container.
$ git clone https://gitlab.com/gitlab-com/runbooks.git
$ cd runbooks/scripts/gke/
# Capture host-wide for up to 30 seconds, limited to 10000 packets.
$ bash ./tcpdump_on_gke_node.sh 30 -c 10000

# Capture host-wide traffic on port 6379 for up to 10 seconds.
$ bash ./tcpdump_on_gke_node.sh 10 'port 6379'

# Choose a pod to capture.
$ crictl pods
$ POD_ID=$( crictl pods --latest --quiet --namespace redis )

# Look up the pod's IP address (handy during analysis).
$ bash ./ip_addr_for_pod_id.sh $POD_ID

# Capture for up to 10 seconds from within the pod's network namespace, or on the pod's network interface.
$ bash ./tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh "$POD_ID" 10
$ bash ./tcpdump_on_gke_node_for_pod_id.using_pod_iface.sh "$POD_ID" 10

# Same, but limited to traffic destined for port 6379.
$ bash ./tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh "$POD_ID" 10 'dst port 6379'
Gotchas
Before running large or long-running packet captures, be aware of the following gotchas:
- Disk space:
  - The pcap files are saved to /var/pcap on the GKE host.
  - This is a disk-backed filesystem, so writing there is safer than writing to a tmpfs filesystem (i.e. it does not consume memory).
  - However, its space is limited, and the same single filesystem holds container images and logs, so be frugal.
  - If you generate a large pcap file, delete it when done (see the example after this list).
- Cannot Ctrl-C a running capture:
  - Because we must run tcpdump inside the toolbox container, we cannot interactively interrupt the capture by pressing Ctrl-C.
  - Choose a reasonable max duration when running the tcpdump wrapper scripts. Optionally also specify a max packet count (-c 10000).
  - If you need to quickly kill a capture, start a new shell on the GKE node and kill the tcpdump process: sudo pkill tcpdump
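For example, to check free space and clean up old captures when you are done (the filename below is a placeholder):

$ df -h /var
$ ls -lh /var/pcap/
$ sudo rm /var/pcap/<node_name>.pod_<pod_id>.<timestamp>.pcap.gz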
Clone the tools onto your target GKE node
Find and ssh into a GKE node:
gcloud --project gitlab-pre compute instances list --filter 'name:gke-pre-gitlab-gke-redis'
gcloud compute ssh --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh
On the GKE node, fetch the helper script:
git clone https://gitlab.com/gitlab-com/runbooks.git
cd runbooks/scripts/gke/
Capturing traffic at the host or pod level
Note: Treat packet captures as potentially sensitive data.
Typically each pod runs in its own network namespace. Multiple pods may have processes listening on the same port.
To capture traffic on that port for all pods, capture at the host level. The following example captures all traffic on port 6379, in both directions, for all pods, for 10 seconds.
bash ./tcpdump_on_gke_node.sh 10 'port 6379'
To do the same capture for a specific single pod, choose a pod id, and run:
POD_ID=$( crictl pods --latest --quiet --namespace redis )
bash ./tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh "$POD_ID" 10 'port 6379'
For reference during analysis, it is often helpful to know the pod’s IP address:
$ bash ./ip_addr_for_pod_id.sh $POD_ID
10.235.8.8
When the capture finishes, it will show the full path to the output pcap file.
On your laptop, download the file using gcloud compute scp:
$ gcloud compute scp --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh:/var/pcap/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz /tmp/
No zone specified. Using zone [us-east1-b] for instance: [gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh].
External IP address was not found; defaulting to using IAP tunneling.
gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa79 100%   23KB 152.1KB/s   00:00
Capture
Choose a GKE host that is running a pod we want to observe.
Here we use kubectl get pods to find a node that is currently running a redis pod.
msmiley@saoirse:~$ kubectl get pods -n redis -o wide | grep 'redis'
redis-ratelimiting-node-0   4/4   Running   2   30h   10.235.8.8   gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh   <none>   <none>
SSH to the GKE node.
msmiley@saoirse:~$ gcloud compute ssh --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh
Choose a pod.
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh ~/runbooks/scripts/gke $ POD_ID=$( crictl pods --latest --quiet --namespace redis )
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh ~/runbooks/scripts/gke $ echo $POD_ID
7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh ~/runbooks/scripts/gke $ crictl ps --pod $POD_ID
CONTAINER      IMAGE          CREATED       STATE    NAME              ATTEMPT  POD ID
f090b746ef29c  9319ef6fd72b0  31 hours ago  Running  sentinel          2        7373dfce7087b
35c62fcef2873  6847317f2c777  31 hours ago  Running  process-exporter  0        7373dfce7087b
f2927ce7eafd3  a9ef660d48b40  31 hours ago  Running  metrics           0        7373dfce7087b
8ad5f5213991b  c7504ce63cbcb  31 hours ago  Running  redis             0        7373dfce7087b
Run the capture.
msmiley@gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh ~/runbooks/scripts/gke $ bash ./tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh "$POD_ID" 10 'port 6379'
Setting up toolbox. (May take up to a minute during first run.)
tcpdump is already the newest version (4.9.3-1~deb10u1).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
jq is already the newest version (1.5+dfsg-2+b1).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
Capturing to pcap file: gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap
Please do not use --share-system anymore, use $SYSTEMD_NSPAWN_SHARE_* instead.
Spawning container msmiley-gcr.io_google-containers_toolbox-20201104-00 on /var/lib/toolbox/msmiley-gcr.io_google-containers_toolbox-20201104-00.
Press ^] three times within 1s to kill container.
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
644 packets captured
847 packets received by filter
0 packets dropped by kernel
Container msmiley-gcr.io_google-containers_toolbox-20201104-00 exited successfully.
Compressing pcap file.
/media/root/var/pcap/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap: 81.5% -- replaced with /media/root/var/pcap/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz
Results:
-rw-r--r-- 1 root root 23K May 20 22:45 /var/pcap/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz
Download the pcap file for analysis.
msmiley@saoirse:~$ gcloud compute scp --project gitlab-pre gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh:/var/pcap/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz /tmp/
No zone specified. Using zone [us-east1-b] for instance: [gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh].
External IP address was not found; defaulting to using IAP tunneling.
gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa79 100%   23KB 152.1KB/s   00:00
Analysis
Analyze the pcap as you normally would. Example:
$ PCAP_FILE=/tmp/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz
$ export TZ=UTC
$ capinfos -aceuxs $PCAP_FILE
File name:           /tmp/gke-pre-gitlab-gke-redis-ratelimiting-231e75c5-rpfh.pod_7373dfce7087b83d7d9040dd3149a3a81ff14c946f572aa7949a39a694eb69b6.20220520_224447.pcap.gz
Number of packets:   644
File size:           23 kB
Capture duration:    8.550298 seconds
First packet time:   2022-05-20 22:44:50.497803
Last packet time:    2022-05-20 22:44:59.048101
Average packet rate: 75 packets/s
$ tshark -r $PCAP_FILE -q -z conv,ip
================================================================================
IPv4 Conversations
Filter:<No Filter>
                                |       <-      |       ->      |     Total     |    Relative    |   Duration   |
                                | Frames  Bytes | Frames  Bytes | Frames  Bytes |      Start     |              |
10.232.4.103 <-> 10.235.8.8        105     19 kB    105     21 kB    210     40 kB    0.114044000         8.3809
10.232.4.101 <-> 10.235.8.8         92 7,999 bytes    79     20 kB    171     28 kB    0.000000000         8.5503
10.235.8.8   <-> 10.232.4.102       49     16 kB      62 5,581 bytes  111     22 kB    0.138301000         8.3566
10.235.8.8   <-> 10.235.8.8          0   0 bytes      96     18 kB     96     18 kB    0.138327000         8.3566
127.0.0.1    <-> 127.0.0.1           0   0 bytes      56 4,204 bytes   56 4,204 bytes  0.463964000         5.0367
================================================================================
$ tshark -r $PCAP_FILE -q -z conv,tcp | head
================================================================================
TCP Conversations
Filter:<No Filter>
                                          |       <-      |       ->      |     Total     |    Relative    |   Duration   |
                                          | Frames  Bytes | Frames  Bytes | Frames  Bytes |      Start     |              |
10.232.4.101:6379 <-> 10.235.8.8:37806       48 3,642 bytes    48 9,609 bytes    96     13 kB    0.000000000         8.4949
10.232.4.102:6379 <-> 10.235.8.8:49648       36 2,448 bytes    36     10 kB      72     13 kB    0.143919000         8.3510
10.232.4.103:6379 <-> 10.235.8.8:38996       36 2,448 bytes    36     10 kB      72     13 kB    0.143965000         8.3510
10.235.8.8:6379   <-> 10.235.8.8:52978       27 1,836 bytes    27 8,230 bytes    54     10 kB    0.143990000         8.3509
10.235.8.8:6379   <-> 10.232.4.103:43333     27 1,836 bytes    27 8,230 bytes    54     10 kB    0.144002000         8.3510
$ tshark -r $PCAP_FILE -T fields -e tcp.flags.str | sort | uniq -c
    284 ·······A····
      8 ·······A···F
    344 ·······AP···
      4 ·······A··S·
      4 ··········S·
$ tshark -r $PCAP_FILE -Y 'tcp.flags.syn == 1' -T fields -e frame.time -e tcp.flags.str -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport | head -n4
May 20, 2022 22:44:50.961767000 UTC  ··········S·  127.0.0.1  127.0.0.1  39114  6379
May 20, 2022 22:44:50.961774000 UTC  ·······A··S·  127.0.0.1  127.0.0.1  6379   39114
May 20, 2022 22:44:50.994887000 UTC  ··········S·  127.0.0.1  127.0.0.1  39116  6379
May 20, 2022 22:44:50.994896000 UTC  ·······A··S·  127.0.0.1  127.0.0.1  6379   39116
$ wireshark -r $PCAP_FILE -Y 'ip.addr == 127.0.0.1 && tcp.port == 39114 && tcp.port == 6379'
Misc other resource usage observability tools on a GKE node
Sometimes we need to run ad hoc observations of a running container, but its image may lack the required tooling. In such cases we can either install and run the tool outside of the target container or add it to the container.
Both of those options have trade-offs and depending on the circumstances either may be impractical or undesirable. For example, adding a tool to a container implicitly gives it the same context and isolation properties as the processes to be observed, but the tool may also have conflicting library dependencies or require elevated privileges. Conversely, installing and running a tool from outside the target container means it lacks visibility into the container’s mount namespace, and consequently cannot resolve paths or inspect files within the container. There are sometimes work-arounds for these problems (e.g. statically linking the tool before copying it into the container to avoid library conflicts; using tools that natively switch namespaces when needed, etc.), but these solutions are not always convenient or supported.
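As a minimal sketch of one such namespace-switching work-around (assuming nsenter is available on the node and $TARGET_PID is the target process's host-namespace PID), a host-side tool can enter just the target's mount namespace to resolve paths inside the container:

# Run host binaries, but resolve filesystem paths inside the target container's mount namespace.
$ sudo nsenter --target "$TARGET_PID" --mount cat /etc/os-release
$ sudo nsenter --target "$TARGET_PID" --mount ls -l /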
For each of the following tools, we show installation, usage, and known gotchas for both running the tool and interpreting its output.
Using vmstat
vmstat reports host-level resource usage statistics. vmstat is not container-aware. Typically it reports host-level utilization metrics whether or not it is run inside a container.
Usually you want to ignore its first line of output, since that represents averages since the last reboot. All subsequent measurements are real-time averages over the requested interval (in seconds).
To install and run vmstat using toolbox, run the following on the GKE node:
# Install vmstat.
$ toolbox apt update -y
$ toolbox apt install -y procps
$ toolbox which vmstat
# Run vmstat via toolbox, using a 1-second interval 3 times.
$ toolbox vmstat --wide 1 3
Using pidstat
Section titled “Using pidstat”pidstat
gathers and reports process-level statistics, covering many of the same metrics as sar
.
To install and run pidstat using toolbox, run the following on the GKE node:
# Install pidstat.
$ toolbox apt update -y
$ toolbox apt install -y sysstat
$ toolbox which pidstat
# Choose a target host PID (not a container PID).
# This can optionally be a comma-delimited list of PIDs, as shown here.
$ TARGET_PID_LIST=$( pgrep -f 'puma: cluster worker' | tr '\n' ',' )
# Run pidstat via toolbox for the chosen PIDs, using a 1-second interval 3 times.
$ toolbox pidstat -p $TARGET_PID_LIST 1 3
# Run pidstat, matching processes on their full command line (rather than specifying PIDs).
$ toolbox pidstat -l -G 'puma: cluster worker'
# Show all threads for a matching PID.
$ toolbox pidstat -t -p $TARGET_PID 1 3
Tips for interpreting pidstat output
- pidstat is not aware of cgroup limits.
- CPU utilization percentage represents mean CPU time / wallclock time during the interval.
- CPU utilization can exceed 100% if a process has multiple threads concurrently running on CPUs.
- Even a completely CPU-bound process may not reach 100% utilization if it is starved for CPU time by:
  - the container’s cgroup limit being budgeted less than 1 CPU core (see the example after this list)
  - competing processes within the container using CPU time
  - competing processes outside the container using CPU time, if the host is oversubscribed (which is common)
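For example, to check whether the target container is budgeted less than 1 CPU core, you can read its CFS quota from the cgroup filesystem. This is a sketch that assumes cgroup v1 paths and that the cgroup_for_pid.sh helper prints the process's cgroup path relative to the cgroup root:

$ CGROUP_PATH=$( bash cgroup_for_pid.sh "$TARGET_PID" )
$ cat /sys/fs/cgroup/cpu${CGROUP_PATH}/cpu.cfs_quota_us /sys/fs/cgroup/cpu${CGROUP_PATH}/cpu.cfs_period_us

If quota divided by period is, say, 0.5, then a fully CPU-bound single-threaded process will plateau near 50% CPU in pidstat, with the shortfall showing up as %wait.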
Background and demo: Which PID to use for a process?
pidstat can be run either inside or outside of the target container, but since it is usually not included by default, here we show how to run it from outside of the target container using toolbox.
The same process can have a different identifier (PID) in each PID namespace, and for tools like pidstat (or ps, etc.) that take a PID as an argument, we must give them the PID that the target process uses in the namespace where we are running the tool.
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ CONTAINER_ID=$( crictl ps --name webservice --latest --quiet )
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ crictl exec $CONTAINER_ID sleep 123 &
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ crictl exec $CONTAINER_ID pgrep -n -a -f 'sleep 123'
1089 sleep 123
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ pgrep -n -a -f 'sleep 123'
789724 sleep 123
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ toolbox pidstat -p 1089 1 3
Linux 5.4.170+ (gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl)  05/24/22  _x86_64_  (30 CPU)
23:25:37 UID PID %usr %system %guest %wait %CPU CPU Command
msmiley@gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl ~ $ toolbox pidstat -p 789724 1 3
Linux 5.4.170+ (gke-gprd-us-east1-b-api-2-c1ce60b7-mfxl)  05/24/22  _x86_64_  (30 CPU)
23:25:44      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
23:25:45     1000    789724    0.00    0.00    0.00    0.00    0.00     1  sleep
23:25:46     1000    789724    0.00    0.00    0.00    0.00    0.00     1  sleep
23:25:47     1000    789724    0.00    0.00    0.00    0.00    0.00     1  sleep
Average:     1000    789724    0.00    0.00    0.00    0.00    0.00     -  sleep
How does pidstat represent container CPU limits?
pidstat is not aware of container cgroup limits. It reports CPU utilization as used CPU seconds divided by wallclock seconds (not divided by quota seconds). When the container cgroup is using a hard CPU limit (a CFS quota) and it runs out of quota, the wallclock time spent stalled waiting for the quota to refill is reported by pidstat as wait time (%wait).
Example:
This demo shows that when a container’s CPU quota is exhausted, its processes do not actually get scheduled onto a CPU until the quota refills again (every 100 ms), and “pidstat” reflects this starvation as “%wait”.
$ docker run -it --rm --name test_cpu_quota_starvation ubuntu:latest /bin/bash -c 'while true ; do foo=1 ; done'
$ CONTAINER_ID=$( docker ps --quiet --no-trunc --filter 'name=test_cpu_quota_starvation' )
$ TARGET_PID=$( cat /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cgroup.procs | head -n1 )
$ pidstat -p $TARGET_PID 1 3
Linux 5.13.0-41-generic (saoirse)  05/25/2022  _x86_64_  (8 CPU)
10:51:39 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:51:40 PM     0    161603  100.00    1.00    0.00    1.00  101.00     3  bash
10:51:41 PM     0    161603   99.00    0.00    0.00    0.00   99.00     3  bash
10:51:42 PM     0    161603  100.00    0.00    0.00    0.00  100.00     3  bash
Average:        0    161603   99.67    0.33    0.00    0.33  100.00     -  bash
$ grep '' /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.cfs_*
/sys/fs/cgroup/cpu/docker/805b4538c8236665bce5de1cce08ef2f27b3b6b5238fadb68597016c5a06e45e/cpu.cfs_period_us:100000
/sys/fs/cgroup/cpu/docker/805b4538c8236665bce5de1cce08ef2f27b3b6b5238fadb68597016c5a06e45e/cpu.cfs_quota_us:-1
$ echo 70000 | sudo tee /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.cfs_quota_us
70000
$ pidstat -p $TARGET_PID 1 3
Linux 5.13.0-41-generic (saoirse)  05/25/2022  _x86_64_  (8 CPU)
10:54:35 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:54:36 PM     0    161603   70.00    0.00    0.00   30.00   70.00     6  bash
10:54:37 PM     0    161603   70.00    0.00    0.00   30.00   70.00     6  bash
10:54:38 PM     0    161603   70.00    0.00    0.00   31.00   70.00     6  bash
Average:        0    161603   70.00    0.00    0.00   30.33   70.00     -  bash
$ for i in $( seq 100 ) ; do sleep 0.01 ; ps -o s= $TARGET_PID ; done | sort | uniq -c
    100 R
$ docker stop test_cpu_quota_starvation