Installing k8s with kubeadm

Environment Preparation

Start three Debian virtual machines; no desktop environment is needed, which keeps resource usage low. The image used is debian-11.7.0-amd64-DVD, and all subsequent operations are performed as the root user.

HOSTNAME  ROLE    INTERFACE  IP          CPU/MEM/DISK
debian-1  master  ens33      10.10.10.5  2c/3G/40G
debian-2  node    ens33      10.10.10.6  2c/3G/40G
debian-3  node    ens33      10.10.10.7  2c/3G/40G

To keep the network unobstructed, clash TUN mode has been enabled on the host machine, and the USTC mirror is configured inside the virtual machines

$ cat /etc/apt/sources.list

deb http://mirrors.ustc.edu.cn/debian/ bullseye main
deb-src http://mirrors.ustc.edu.cn/debian/ bullseye main

deb http://security.debian.org/debian-security bullseye-security main contrib
deb-src http://security.debian.org/debian-security bullseye-security main contrib

# bullseye-updates, to get updates before a point release is made;
# see https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#_updates_and_backports
deb http://mirrors.ustc.edu.cn/debian/ bullseye-updates main contrib
deb-src http://mirrors.ustc.edu.cn/debian/ bullseye-updates main contrib

Packages

Install kubeadm, kubelet and kubectl

# Update the apt package index and install the packages needed to use the Kubernetes apt repository
apt update
apt install -y apt-transport-https ca-certificates curl

# Download the Google Cloud public signing key
mkdir -p /etc/apt/keyrings
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg

# Add the Kubernetes apt repository (USTC mirror)
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.ustc.edu.cn/kubernetes/apt/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list

# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their versions
apt update
apt install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Install containerd as the container runtime. The containerd version in the debian/ubuntu repositories is too old to satisfy kubeadm's requirements; installing it directly fails with the error validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

Instead, install it manually, choosing a suitable package from the Docker documentation

wget https://download.docker.com/linux/debian/dists/bullseye/pool/stable/amd64/containerd.io_1.6.21-1_amd64.deb
dpkg -i containerd.io_1.6.21-1_amd64.deb

systemctl start containerd
systemctl enable containerd

Check the installed versions

$ apt list --installed | grep "kube\|containerd"

containerd.io/now 1.6.21-1 amd64 [installed,local]
cri-tools/kubernetes-xenial,now 1.26.0-00 amd64 [installed,automatic]
kubeadm/kubernetes-xenial,now 1.27.2-00 amd64 [installed]
kubectl/kubernetes-xenial,now 1.27.2-00 amd64 [installed]
kubelet/kubernetes-xenial,now 1.27.2-00 amd64 [installed]
kubernetes-cni/kubernetes-xenial,now 1.2.0-00 amd64 [installed,automatic]

System Configuration

CRI Interface

containerd needs the CRI plugin enabled, and systemd configured as the cgroup driver

Edit the config file to enable them: vim /etc/containerd/config.toml

# Comment out this line, or remove "cri" from the list
disabled_plugins = ["cri"]

# Set the cgroup driver
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

Then restart containerd: systemctl restart containerd
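If you prefer regenerating the whole file instead of patching it by hand, a common alternative is the sketch below. Note it overwrites any existing edits in config.toml; `containerd config default` emits the full default configuration, which has the CRI plugin enabled.

```shell
# Regenerate the full default config (CRI enabled by default) ...
containerd config default > /etc/containerd/config.toml
# ... flip runc to the systemd cgroup driver, then restart
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd
```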

Adding Firewall Rules

# master node
iptables -A INPUT -p tcp -m multiport --dports 6443,2379,2380,10250,10259,10257 -j ACCEPT

# node nodes (the kubelet API port and the NodePort range, per the Kubernetes ports list)
iptables -A INPUT -p tcp -m multiport --dports 10250,30000:32767 -j ACCEPT

Kernel Configuration

bridge-nf-call-iptables and ip_forward must also be enabled for the k8s network forwarding features; otherwise kubeadm reports errors like:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1

Load the br_netfilter and overlay kernel modules with modprobe br_netfilter && modprobe overlay, then write the following settings into /etc/sysctl.conf

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1

Run sysctl -p to apply the settings; if it just prints them back without errors, they took effect
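Note that modprobe only loads the modules for the current boot. To have them loaded automatically on every boot, they can be listed in a file under /etc/modules-load.d/ (the file name k8s.conf below is an arbitrary choice):

```shell
# Persist the module loading across reboots
cat <<EOF > /etc/modules-load.d/k8s.conf
br_netfilter
overlay
EOF
```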

Initializing the Cluster

Initializing the control plane

To simplify the CNI plugin setup later, set the pod network CIDR to 10.100.0.0/24 up front

kubeadm init --pod-network-cidr=10.100.0.0/24

Initializing the worker nodes

Once the command on the control plane succeeds, it yields a command that the other nodes use to join the current cluster

kubeadm join 10.10.10.5:6443 --token 66d7jo.qn6hzvskoueshe0v \
        --discovery-token-ca-cert-hash sha256:d5bfe27c8bb6ff6cca1c90e7ab620962f33de2d5b3db5481e814831c290cdfe1

Checking the Cluster

On the master node, view all nodes in the cluster

$ kubectl get node

NAME       STATUS     ROLES           AGE    VERSION
debian-1   NotReady   control-plane   111m   v1.27.2
debian-2   NotReady   <none>          88m    v1.27.2
debian-3   NotReady   <none>          62m    v1.27.2

View all pods

$ kubectl get pod -A

NAMESPACE     NAME                               READY   STATUS             RESTARTS         AGE
kube-system   coredns-5d78c9869d-fgvn9           0/1     Pending            0                111m
kube-system   coredns-5d78c9869d-v8qr8           0/1     Pending            0                111m
kube-system   etcd-debian-1                      1/1     Running            34               110m
kube-system   kube-apiserver-debian-1            1/1     Running            32               111m
kube-system   kube-controller-manager-debian-1   1/1     Running            37               110m
kube-system   kube-proxy-mggbb                   0/1     CrashLoopBackOff   17 (5m5s ago)    88m
kube-system   kube-proxy-nkj64                   0/1     CrashLoopBackOff   13 (4m33s ago)   62m
kube-system   kube-proxy-s25pp                   1/1     Running            39               111m
kube-system   kube-scheduler-debian-1            1/1     Running            32               110m

Because no network plugin has been installed yet, the Crash / Pending / NotReady states above are all expected

Installing a Network Plugin

flannel was chosen as the network plugin this time. Because the cluster was deployed entirely with defaults, the pod network CIDR has to be assigned to the cluster first (this is a remedial measure; it should have been set when the cluster was initialized)

# If the CIDR was not set at init time, set it to 10.100.0.0/24 first
kubectl get nodes -o name | xargs -I {} kubectl patch node {} -p '{"spec":{"podCIDR":"10.100.0.0/24"}}'

# Edit the network in the flannel manifest to match the cluster configuration
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
#  net-conf.json: |
#   ...
#     "Network": "10.100.0.0/24",
#   ...
kubectl apply -f kube-flannel.yml

At this point, checking the nodes and pods again shows everything in a normal state

Problems Encountered

etcd restarts endlessly

kubectl get node reported that port 6443 could not be connected to. Checking the apiserver logs via crictl showed that etcd was unreachable; continued observation showed that the etcd container would crash and restart after running for a while

Cause: containerd was not configured with systemd as the cgroup driver; the fix is described in the containerd configuration section above. Reference: k8s集群部署时etcd容器不停重启问题及处理 - CSDN

Renewing cluster certificates

After the host machine was suspended, the k8s cluster certificates became invalid; check with kubeadm certs check-expiration

If renewal is needed, per the official documentation, the method is

kubeadm certs renew all
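Per the official docs, after the renewal the control-plane static pods must be restarted to pick up the new certificates, and the refreshed admin.conf re-copied for kubectl. A sketch, assuming the default kubeadm paths (the /tmp staging directory is an arbitrary choice):

```shell
# Restart the control-plane static pods by moving the manifests away and back
mkdir -p /tmp/k8s-manifests
mv /etc/kubernetes/manifests/*.yaml /tmp/k8s-manifests/
sleep 20   # give kubelet time to stop the pods
mv /tmp/k8s-manifests/*.yaml /etc/kubernetes/manifests/

# Re-copy the renewed kubeconfig for kubectl
cp /etc/kubernetes/admin.conf $HOME/.kube/config
```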

Checking the certificates again gives the following output

[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 10, 2024 13:03 UTC   364d            ca                      no
apiserver                  Jun 10, 2024 13:03 UTC   364d            ca                      no
apiserver-etcd-client      Jun 10, 2024 13:03 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 10, 2024 13:03 UTC   364d            ca                      no
controller-manager.conf    Jun 10, 2024 13:03 UTC   364d            ca                      no
etcd-healthcheck-client    Jun 10, 2024 13:03 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 10, 2024 13:03 UTC   364d            etcd-ca                 no
etcd-server                Jun 10, 2024 13:03 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 10, 2024 13:03 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 10, 2024 13:03 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 08, 2033 06:03 UTC   9y              no
etcd-ca                 Jun 08, 2033 06:03 UTC   9y              no
front-proxy-ca          Jun 08, 2033 06:03 UTC   9y              no

kubectl: connection refused on port 8080

Running any kubectl command gives the error The connection to the server localhost:8080 was refused - did you specify the right host or port?

The environment variable (set by the commands printed after kubeadm init) was not loaded after the machine rebooted; this is also covered in the official documentation

# To let a non-root user run kubectl, run the following commands, which are also part of the kubeadm init output:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Or, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf

Since the root user is used here, the export is written straight into /etc/profile so it does not have to be repeated after each reboot
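For example, something like the following (assuming root logs in through a shell that sources /etc/profile; the grep guard just keeps the line from being appended twice):

```shell
# Append the export once so every root login shell picks it up
grep -q 'KUBECONFIG=/etc/kubernetes/admin.conf' /etc/profile || \
  echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> /etc/profile
```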