参考:如何解决容器中nginx worker process自动设置的问题
参考:nginx start can't get cpu limit and change worker_processes numbers
结论写在前面
启动nginx容器时候设置一个NGINX_ENTRYPOINT_WORKER_PROCESSES_AUTOTUNE=true
的变量,可以解决nginx进程数不准的情况。
问题描述
nginx容器化时,有一个普遍会遇到的问题:如何自动设置nginx worker process的数量?
nginx官方容器镜像的nginx.conf配置文件中,会有一条worker process配置:
worker_processes 1;
它会配置nginx仅启动1个worker。这在nginx容器为1核时,可以良好的工作。
当我们希望nginx给更高的配置,例如4c或者16c,我们需要确保nginx也能启动响应个数的worker process。有两个办法:
- 修改nginx.conf,将worker_processes的个数调整为对应的cpu核数。
- 修改nginx.conf,将worker_processes修改为auto。
第一个方法可行,但是需要修改配置文件,nginx需要reload。实际部署时,必须将nginx.conf作为配置文件挂载,对一些不太熟悉nginx的用来说,心智负担会比较重。
第二个方法,在Kubernetes上会遇到一些问题。通过在容器中观察可以发现,nginx启动的worker process,并没有遵循我们给Pod设置的limit,而是与Pod所在node的cpu核数保持一致。
这在宿主机的cpu核数比较多,而Pod的cpu配置较小时,会因为每个worker分配的时间片比较少,带来明显的响应慢的问题。
问题原因
我们知道,在Kubernetes为容器配置cpu的limits为2时,容器其实并不是真正的“分配了”2个cpu,而是通过cgroup进行了限制。
resources:
limits:
cpu: 500m
memory: 256Mi
requests:
cpu: 500m
memory: 256Mi
我们到这个Pod所在宿主机上去查看相关信息。
# docker inspect 17f5f35c3500|grep -i cgroup
"Cgroup": "",
"CgroupParent": "/kubepods/burstable/podb008ccda-9396-11ea-bc20-ecf4bbd63ee8",
"DeviceCgroupRules": null,
# cd /sys/fs/cgroup/cpu/kubepods/burstable/podb008ccda-9396-11ea-bc20-ecf4bbd63ee8
# cat cpu.cfs_quota_us
200000
# cat cpu.cfs_period_us
100000
可以看到,实际是通过 cpu.cfs_quota_us/cpu.cfs_period_us
来限制该Pod能使用的cpu核数的。
但是nginx的worker_processes,是通过 sysconf(_SC_NPROCESSORS_ONLN)
来查询宿主机上的cpu个数的(getconf _NPROCESSORS_ONLN),我们通过strace来观察下这个过程。
# strace getconf _NPROCESSORS_ONLN
execve("/bin/getconf", ["getconf", "_NPROCESSORS_ONLN"], [/* 23 vars */]) = 0
brk(0) = 0x606000
...
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-31\n", 8192) = 5
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a922a0000
write(1, "32\n", 332
可见,getconf _NPROCESSORS_ONLN
实际是通过读取文件/sys/devices/system/cpu/online
来获取cpu个数的。
默认Kubernetes上,/sys/devices/system/cpu/online
文件实际就是宿主机的,因此,nginx启动的worker process个数,与宿主机cpu个数一致,也就不奇怪了。
解决方案
nginx官方已经在两年前帮我们把这个问题解决了。 办法就是启动nginx容器时候设置一个NGINX_ENTRYPOINT_WORKER_PROCESSES_AUTOTUNE=true
的变量
参考:nginx start can't get cpu limit and change worker_processes numbers
原理是什么
参考:docker-nginx/30-tune-worker-processes.sh at master · nginxinc/docker-nginx · GitHub
先是定义了5个变量:
ncpu_online
:getconf _NPROCESSORS_ONLN
可以获取到当前宿主机上的CPU核心数ncpu_cpuset
:判断是否使用的cgroup v1的cpuset,没有的话设置值为ncpu_online
ncpu_quota
:判断是否使用了cgroup v1的cpu quota,没有的话设置值为ncpu_online
ncpu_cpuset_v2
:判断是否使用的cgroup v2的cpuset,没有的话设置值为ncpu_online
ncpu_quota_v2
:判断是否使用了cgroup v2的cpu quota,没有的话设置值为ncpu_online
最后将这5个变量的大小进行排序,取最小的那个变量赋值给ncpu
然后通过sed命令将ncpu
值写到到nginx.conf
配置文件的work_processes值
#!/bin/sh
# vim:sw=2:ts=2:sts=2:et
set -eu
LC_ALL=C
ME=$( basename "$0" )
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
[ "${NGINX_ENTRYPOINT_WORKER_PROCESSES_AUTOTUNE:-}" ] || exit 0
touch /etc/nginx/nginx.conf 2>/dev/null || { echo >&2 "$ME: error: can not modify /etc/nginx/nginx.conf (read-only file system?)"; exit 0; }
ceildiv() {
num=$1
div=$2
echo $(( (num + div - 1) / div ))
}
get_cpuset() {
cpusetroot=$1
cpusetfile=$2
ncpu=0
[ -f "$cpusetroot/$cpusetfile" ] || return 1
for token in $( tr ',' ' ' < "$cpusetroot/$cpusetfile" ); do
case "$token" in
*-*)
count=$( seq $(echo "$token" | tr '-' ' ') | wc -l )
ncpu=$(( ncpu+count ))
;;
*)
ncpu=$(( ncpu+1 ))
;;
esac
done
echo "$ncpu"
}
get_quota() {
cpuroot=$1
ncpu=0
[ -f "$cpuroot/cpu.cfs_quota_us" ] || return 1
[ -f "$cpuroot/cpu.cfs_period_us" ] || return 1
cfs_quota=$( cat "$cpuroot/cpu.cfs_quota_us" )
cfs_period=$( cat "$cpuroot/cpu.cfs_period_us" )
[ "$cfs_quota" = "-1" ] && return 1
[ "$cfs_period" = "0" ] && return 1
ncpu=$( ceildiv "$cfs_quota" "$cfs_period" )
[ "$ncpu" -gt 0 ] || return 1
echo "$ncpu"
}
get_quota_v2() {
cpuroot=$1
ncpu=0
[ -f "$cpuroot/cpu.max" ] || return 1
cfs_quota=$( cut -d' ' -f 1 < "$cpuroot/cpu.max" )
cfs_period=$( cut -d' ' -f 2 < "$cpuroot/cpu.max" )
[ "$cfs_quota" = "max" ] && return 1
[ "$cfs_period" = "0" ] && return 1
ncpu=$( ceildiv "$cfs_quota" "$cfs_period" )
[ "$ncpu" -gt 0 ] || return 1
echo "$ncpu"
}
get_cgroup_v1_path() {
needle=$1
found=
foundroot=
mountpoint=
[ -r "/proc/self/mountinfo" ] || return 1
[ -r "/proc/self/cgroup" ] || return 1
while IFS= read -r line; do
case "$needle" in
"cpuset")
case "$line" in
*cpuset*)
found=$( echo "$line" | cut -d ' ' -f 4,5 )
break
;;
esac
;;
"cpu")
case "$line" in
*cpuset*)
;;
*cpu,cpuacct*|*cpuacct,cpu|*cpuacct*|*cpu*)
found=$( echo "$line" | cut -d ' ' -f 4,5 )
break
;;
esac
esac
done << __EOF__
$( grep -F -- '- cgroup ' /proc/self/mountinfo )
__EOF__
while IFS= read -r line; do
controller=$( echo "$line" | cut -d: -f 2 )
case "$needle" in
"cpuset")
case "$controller" in
cpuset)
mountpoint=$( echo "$line" | cut -d: -f 3 )
break
;;
esac
;;
"cpu")
case "$controller" in
cpu,cpuacct|cpuacct,cpu|cpuacct|cpu)
mountpoint=$( echo "$line" | cut -d: -f 3 )
break
;;
esac
;;
esac
done << __EOF__
$( grep -F -- 'cpu' /proc/self/cgroup )
__EOF__
case "${found%% *}" in
"/")
foundroot="${found##* }$mountpoint"
;;
"$mountpoint")
foundroot="${found##* }"
;;
esac
echo "$foundroot"
}
get_cgroup_v2_path() {
found=
foundroot=
mountpoint=
[ -r "/proc/self/mountinfo" ] || return 1
[ -r "/proc/self/cgroup" ] || return 1
while IFS= read -r line; do
found=$( echo "$line" | cut -d ' ' -f 4,5 )
done << __EOF__
$( grep -F -- '- cgroup2 ' /proc/self/mountinfo )
__EOF__
while IFS= read -r line; do
mountpoint=$( echo "$line" | cut -d: -f 3 )
done << __EOF__
$( grep -F -- '0::' /proc/self/cgroup )
__EOF__
case "${found%% *}" in
"")
return 1
;;
"/")
foundroot="${found##* }$mountpoint"
;;
"$mountpoint" | /../*)
foundroot="${found##* }"
;;
esac
echo "$foundroot"
}
ncpu_online=$( getconf _NPROCESSORS_ONLN )
ncpu_cpuset=
ncpu_quota=
ncpu_cpuset_v2=
ncpu_quota_v2=
cpuset=$( get_cgroup_v1_path "cpuset" ) && ncpu_cpuset=$( get_cpuset "$cpuset" "cpuset.effective_cpus" ) || ncpu_cpuset=$ncpu_online
cpu=$( get_cgroup_v1_path "cpu" ) && ncpu_quota=$( get_quota "$cpu" ) || ncpu_quota=$ncpu_online
cgroup_v2=$( get_cgroup_v2_path ) && ncpu_cpuset_v2=$( get_cpuset "$cgroup_v2" "cpuset.cpus.effective" ) || ncpu_cpuset_v2=$ncpu_online
cgroup_v2=$( get_cgroup_v2_path ) && ncpu_quota_v2=$( get_quota_v2 "$cgroup_v2" ) || ncpu_quota_v2=$ncpu_online
ncpu=$( printf "%s\n%s\n%s\n%s\n%s\n" \
"$ncpu_online" \
"$ncpu_cpuset" \
"$ncpu_quota" \
"$ncpu_cpuset_v2" \
"$ncpu_quota_v2" \
| sort -n \
| head -n 1 )
sed -i.bak -r 's/^(worker_processes)(.*)$/# Commented out by '"$ME"' on '"$(date)"'\n#\1\2\n\1 '"$ncpu"';/' /etc/nginx/nginx.conf
可以启动一个nginx容器验证一下:可以看到worker_processes被改为3了
~]# docker run -d --name nginx -e NGINX_ENTRYPOINT_WORKER_PROCESSES_AUTOTUNE=true --cpus 3 nginx:1.22-alpine
cb8cfcec8e3b34efa318dd236f247d2f0c0a64876141bd6cacfb39329f51b0d0
~]# docker exec -it nginx cat /etc/nginx/nginx.conf
user nginx;
# Commented out by 30-tune-worker-processes.sh on Thu Nov 10 13:42:59 UTC 2022
#worker_processes auto;
worker_processes 3;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
......