Deploying a Kubernetes Cluster with Kubespray

Kubespray

Posted by BlueFat on Tuesday, May 11, 2021

Introduction

Kubespray is a tool for installing Kubernetes clusters. Compared with using kubeadm directly, Kubespray is more streamlined: it integrates kubeadm with Ansible and uses ansible-playbook to define the tasks that prepare the hosts and deploy the Kubernetes cluster.

In version 2.9 Kubespray removed kubeadm_enabled (the switch that controlled whether kubeadm deploys kube-apiserver and the other control-plane components), which means kube-apiserver and friends can no longer be deployed as plain binaries. etcd still defaults to binary deployment, with kubeadm-managed deployment as an option.

From version 2.9 onward, Kubespray uses kubeadm to deploy kube-apiserver, kube-scheduler, and kube-controller-manager as static pods.
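
You can confirm this on a control-plane node after deployment: the static pod manifests live in the standard kubeadm location. A quick check (paths are kubeadm defaults):

# On a control-plane node: static pod manifests written by kubeadm
ls /etc/kubernetes/manifests/
# kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml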

Preparing the Deployment Environment

Dedicated Kubespray deployment machine: 192.168.10.220

Option 1: Install on the host

# Ansible requires Python 3.8 or newer
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc make
wget https://www.python.org/ftp/python/3.9.15/Python-3.9.15.tar.xz
tar xf Python-3.9.15.tar.xz 
cd Python-3.9.15/
./configure --enable-optimizations --prefix=/usr/local/python39
make install

# ansible will be installed under /usr/local/python39/bin/
cat << \EOF > /etc/profile.d/python39.sh
export PATH=$PATH:/usr/local/python39/bin
EOF

ln -sv /usr/local/python39/bin/python3 /usr/local/bin/
ln -sv /usr/local/python39/bin/pip3 /usr/local/bin/

# Upgrade pip and setuptools
python3 -m pip install --upgrade pip
pip3 install --upgrade setuptools
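
A quick sanity check that the new toolchain is on PATH before continuing:

source /etc/profile.d/python39.sh
python3 -V   # should print Python 3.9.15
pip3 -V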

Option 2: Run from a Docker container

docker pull quay.io/kubespray/kubespray:v2.20.0
docker run --rm -it --mount type=bind,source="$(pwd)"/inventory/sample,dst=/inventory \
  --mount type=bind,source="${HOME}"/.ssh/id_rsa,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.20.0 bash
# Inside the container, run the kubespray playbooks:
ansible-playbook -i /inventory/inventory.ini --private-key /root/.ssh/id_rsa cluster.yml
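
Once you have customized your own inventory (see "Customizing the Cluster" below), mount that directory instead of the sample. A sketch, assuming your inventory lives at inventory/mycluster:

docker run --rm -it \
  --mount type=bind,source="$(pwd)"/inventory/mycluster,dst=/inventory \
  --mount type=bind,source="${HOME}"/.ssh/id_rsa,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.20.0 \
  ansible-playbook -i /inventory/inventory.ini --private-key /root/.ssh/id_rsa cluster.yml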

Configuring Kubespray

Download and Dependencies

https://github.com/kubernetes-sigs/kubespray/releases/tag/v2.20.0

Kubespray v2.20.0 deploys Kubernetes 1.24.x.

Note: pin a release tag; do not pull master directly, as it may contain unknown bugs.

git clone https://github.com/kubernetes-sigs/kubespray.git -b v2.20.0
cd kubespray
pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
# pip3 install -r requirements.txt 
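
Verify that the Ansible version pinned by requirements.txt is now the one in use:

ansible --version
ansible-playbook --version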

Customizing the Cluster

# Copy the sample inventory
cp -rfp inventory/sample inventory/mycluster

Configuration files to edit:

  • inventory/mycluster/group_vars/all/*.yml
  • inventory/mycluster/group_vars/k8s_cluster/*.yml

Cluster Network

vim inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

# Choose the network plugin; cilium, calico, weave, and flannel are supported
kube_network_plugin: cilium

# Service CIDR
kube_service_addresses: 10.233.0.0/18

# Pod CIDR
kube_pods_subnet: 10.233.64.0/18
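
A sanity check on these defaults, assuming Kubespray's default kube_network_node_prefix of 24 (as in the sample vars):

# kube_service_addresses: 10.233.0.0/18 -> room for ~16k Service ClusterIPs
# kube_pods_subnet: 10.233.64.0/18, carved into one /24 per node
# -> at most 2^(24-18) = 64 nodes can receive a pod subnet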

Container Runtime Configuration

Related configuration files:

  • inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
  • inventory/mycluster/group_vars/all/containerd.yml
  • inventory/mycluster/group_vars/all/cri-o.yml
  • inventory/mycluster/group_vars/all/docker.yml

cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

# docker, crio, and containerd are supported; containerd is recommended.
container_manager: containerd

# Whether to enable Kata Containers
kata_containers_enabled: false

Changing the Container Data Directory

vim ./inventory/mycluster/group_vars/all/containerd.yml
containerd_storage_dir: "/data/containerd"

Configuring Registry Mirrors

vim ./inventory/mycluster/group_vars/all/containerd.yml

containerd_registries:
  "docker.io":
    - "http://hub-mirror.c.163.com"
    - "https://mirror.aliyuncs.com"

On CentOS 7 you must enable containerd_snapshotter: "native", otherwise kubelet fails to start.

sed -i 's@# containerd_snapshotter: "native"@containerd_snapshotter: "native"@g' inventory/mycluster/group_vars/all/containerd.yml
# After the change
cat inventory/mycluster/group_vars/all/containerd.yml
containerd_snapshotter: "native"

Changing the etcd Data Directory

vim inventory/mycluster/group_vars/all/etcd.yml

etcd_data_dir: /data/etcd

Cluster Certificates (default validity: one year)

sed -i 's@auto_renew_certificates: false@auto_renew_certificates: true@g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# After the change
cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Whether to auto-renew certificates; enabling this is recommended.
auto_renew_certificates: true
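
Kubespray implements this with a systemd timer on the control-plane nodes. After deployment you can confirm it is scheduled (the k8s-certs-renew name is what recent releases use; verify on your version):

systemctl list-timers | grep -i certs
# e.g. k8s-certs-renew.timer -> k8s-certs-renew.service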

Enabling Logs for Troubleshooting

vim inventory/mycluster/group_vars/all/all.yml

unsafe_show_logs: true

Using an External Load Balancer

By default, Kubespray does not set up a highly available load balancer in front of the kube-apiserver HTTPS endpoint. Here we use an external HAProxy to load-balance kube-apiserver.

Example HAProxy configuration (listening on 8443 to match the loadbalancer_apiserver settings below):

listen kubernetes-apiserver-https
  bind 0.0.0.0:8443
  mode tcp
  option tcplog
  option log-health-checks
  balance roundrobin
  timeout client 3h
  timeout server 3h
  server k8s-master01 192.168.10.221:6443 check check-ssl verify none inter 10000
  server k8s-master02 192.168.10.222:6443 check check-ssl verify none inter 10000
  server k8s-master03 192.168.10.223:6443 check check-ssl verify none inter 10000

Defining loadbalancer_apiserver automatically disables loadbalancer_apiserver_localhost.

vim ./inventory/mycluster/group_vars/all/all.yml

apiserver_loadbalancer_domain_name: "apiserver.sundayhk.com"
loadbalancer_apiserver:
  address: 192.168.10.220
  port: 8443
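
Once the cluster is up, you can confirm HAProxy is forwarding to the apiservers. /healthz is readable anonymously under Kubernetes' default RBAC, and -k skips certificate verification since we query the VIP by IP:

curl -k https://192.168.10.220:8443/healthz
# ok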

Configuring the Host Inventory

vim inventory/mycluster/inventory.ini

[all]
master1 ansible_host=192.168.10.221  # ip=10.3.0.1 etcd_member_name=etcd1
master2 ansible_host=192.168.10.222  # ip=10.3.0.2 etcd_member_name=etcd2
master3 ansible_host=192.168.10.223  # ip=10.3.0.3 etcd_member_name=etcd3
node1 ansible_host=192.168.10.224  # ip=10.3.0.4 etcd_member_name=etcd4
node2 ansible_host=192.168.10.225  # ip=10.3.0.5 etcd_member_name=etcd5
node3 ansible_host=192.168.10.226  # ip=10.3.0.6 etcd_member_name=etcd6

[kube_control_plane]
master1
master2
master3

[etcd]
master1
master2
master3

[kube_node]
master1
master2
master3
node1
node2
node3

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

Reviewing the Deployment Configuration

ansible-inventory -i inventory/mycluster/inventory.ini --list
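
Before deploying, it is worth confirming Ansible can reach every host (same user and key as the deploy command used later):

ansible -i inventory/mycluster/inventory.ini all \
  -m ping -b --user=ubuntu --private-key=id_rsa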

Installing from Mainland China

Installs run from mainland China often fail because the GFW blocks the default download sources.

Here we go with option 2: the DaoCloud mirrors hosted in China.

Edit offline.yml

# Back up the original
cp inventory/mycluster/group_vars/all/offline.yml{,.bak}
# Set files_repo
sed -i 's@^# files_repo: .*@files_repo: "https://files.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# Set the image registry mirrors
sed -i 's@^# kube_image_repo: .*@kube_image_repo: "k8s.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# gcr_image_repo: .*@gcr_image_repo: "gcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# github_image_repo: .*@github_image_repo: "ghcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# docker_image_repo: .*@docker_image_repo: "docker.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# quay_image_repo: .*@quay_image_repo: "quay.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# Uncomment the download URLs that reference files_repo and registry_host
sed -i -E '/# .*\{\{ files_repo/s/^# //g' inventory/mycluster/group_vars/all/offline.yml
sed -i -E '/# .*\{\{ registry_host/s/^# //g' inventory/mycluster/group_vars/all/offline.yml

After the changes (alternatively, copy the block below straight into offline.yml):

cat inventory/mycluster/group_vars/all/offline.yml

files_repo: "https://files.m.daocloud.io"

## Container Registry overrides
kube_image_repo: "k8s.m.daocloud.io" 
gcr_image_repo: "gcr.m.daocloud.io"
github_image_repo: "ghcr.m.daocloud.io"
docker_image_repo: "docker.m.daocloud.io"
quay_image_repo: "quay.m.daocloud.io"

## Kubernetes components
kubeadm_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"
kubectl_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"
kubelet_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet"

## CNI Plugins
cni_download_url: "{{ files_repo }}/github.com/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"

## cri-tools
crictl_download_url: "{{ files_repo }}/github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"

## [Optional] etcd: only if you **DON'T** use etcd_deployment=host
etcd_download_url: "{{ files_repo }}/github.com/etcd-io/etcd/releases/download/{{ etcd_version }}/etcd-{{ etcd_version }}-linux-{{ image_arch }}.tar.gz"

# [Optional] Calico: If using Calico network plugin
calicoctl_download_url: "{{ files_repo }}/github.com/projectcalico/calico/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
calicoctl_alternate_download_url: "{{ files_repo }}/github.com/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
# [Optional] Calico with kdd: If using Calico network plugin with kdd datastore
calico_crds_download_url: "{{ files_repo }}/github.com/projectcalico/calico/archive/{{ calico_version }}.tar.gz"

# [Optional] Cilium: If using Cilium network plugin
ciliumcli_download_url: "{{ files_repo }}/github.com/cilium/cilium-cli/releases/download/{{ cilium_cli_version }}/cilium-linux-{{ image_arch }}.tar.gz"

# [Optional] Flannel: If using Flannel network plugin
flannel_cni_download_url: "{{ files_repo }}/kubernetes/flannel/{{ flannel_cni_version }}/flannel-{{ image_arch }}"

# [Optional] helm: only if you set helm_enabled: true
helm_download_url: "{{ files_repo }}/get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz"

# [Optional] crun: only if you set crun_enabled: true
crun_download_url: "{{ files_repo }}/github.com/containers/crun/releases/download/{{ crun_version }}/crun-{{ crun_version }}-linux-{{ image_arch }}"

# [Optional] kata: only if you set kata_containers_enabled: true
kata_containers_download_url: "{{ files_repo }}/github.com/kata-containers/kata-containers/releases/download/{{ kata_containers_version }}/kata-static-{{ kata_containers_version }}-{{ ansible_architecture }}.tar.xz"

# [Optional] cri-dockerd: only if you set container_manager: docker
cri_dockerd_download_url: "{{ files_repo }}/github.com/Mirantis/cri-dockerd/releases/download/v{{ cri_dockerd_version }}/cri-dockerd-{{ cri_dockerd_version }}.{{ image_arch }}.tgz"

# [Optional] cri-o: only if you set container_manager: crio
# crio_download_base: "download.opensuse.org/repositories/devel:kubic:libcontainers:stable"
# crio_download_crio: "http://{{ crio_download_base }}:/cri-o:/"

# [Optional] runc,containerd: only if you set container_runtime: containerd
runc_download_url: "{{ files_repo }}/github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}"
containerd_download_url: "{{ files_repo }}/github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
nerdctl_download_url: "{{ files_repo }}/github.com/containerd/nerdctl/releases/download/v{{ nerdctl_version }}/nerdctl-{{ nerdctl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"

Deploying the Cluster

ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b -v cluster.yml

Deployment Complete

[root@master1 ~]# kubectl get node
NAME      STATUS   ROLES           AGE   VERSION
master1   Ready    control-plane   64m   v1.24.6
master2   Ready    control-plane   64m   v1.24.6
master3   Ready    control-plane   64m   v1.24.6
node1     Ready    <none>          62m   v1.24.6
node2     Ready    <none>          62m   v1.24.6
node3     Ready    <none>          62m   v1.24.6

Fetching the Kubeconfig

After deployment the kubeconfig is available at /root/.kube/config on the master nodes. Here we use Ansible's fetch module to copy it down:

# Fetch from master1
ansible -i inventory/mycluster/inventory.ini master1 \
	-m fetch  -a 'src=/root/.kube/config dest=kubeconfig flat=yes' \
	-b  --user=ubuntu --private-key id_rsa 

master1 | CHANGED => {
    "changed": true,
    "checksum": "bbe7e6462702d1bd4a0414a3e97053fa63eaab62",
    "dest": "/root/kubespray/kubeconfig",
    "md5sum": "859683a1484a802d4859db115bb42a16",
    "remote_checksum": "bbe7e6462702d1bd4a0414a3e97053fa63eaab62",
    "remote_md5sum": null
}

$ ls -l kubeconfig 
-rw------- 1 root root 5645 Nov 12 22:11 kubeconfig

After obtaining the kubeconfig, change https://127.0.0.1:6443 to the address:port of the kube-apiserver load balancer, or to one of the masters.
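
For example, pointing it at the HAProxy endpoint configured earlier (a sketch, assuming the fetched file still points at https://127.0.0.1:6443):

sed -i 's@server: https://127.0.0.1:6443@server: https://192.168.10.220:8443@' kubeconfig
kubectl --kubeconfig=kubeconfig get node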

Adding Nodes

https://kubespray.io/#/docs/nodes

To add nodes, prepare the new nodes' internal IPs, append them to the existing inventory file, and run ansible-playbook again. The only difference is that cluster.yml is replaced with scale.yml:

ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  scale.yml --limit=NEW_NODE_NAME
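
For example, to add a hypothetical node4 at 192.168.10.227, register it in the inventory first and then run scale.yml with --limit=node4:

[all]
# ...existing hosts...
node4 ansible_host=192.168.10.227

[kube_node]
# ...existing hosts...
node4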

You can use --limit=NODE_NAME to restrict Kubespray to the new node and avoid disturbing the rest of the cluster. Without --limit, the playbook first runs facts.yml to refresh the fact cache for all nodes.

Removing Nodes

If a node is no longer needed, we can remove it from the cluster. The usual steps are:

  • 1. Run kubectl drain NODE to evict the node's pods so workloads reschedule onto other nodes (kubectl cordon alone only marks the node unschedulable); see the docs on safely draining a node.
  • 2. Stop the Kubernetes components on the node (kubelet, kube-proxy, etc.).
  • 3. Run kubectl delete node NODE to remove the node from the cluster.
  • 4. If the node is a VM and no longer needed, simply destroy it.

The first three steps can be done in one shot with the remove-node.yml playbook that Kubespray provides:

ansible-playbook \
  -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  -e "node=node2,node3" \
  remove-node.yml

List the nodes to remove in -e (comma-separated). If a node you want to remove is offline, you should add reset_nodes=false and allow_ungraceful_removal=true to your extra vars.
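
A sketch for removing a node that is already offline:

ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  -e "node=node3" -e reset_nodes=false -e allow_ungraceful_removal=true \
  remove-node.yml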

Upgrading

https://kubespray.io/#/docs/upgrades
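
Kubespray performs graceful upgrades through the upgrade-cluster.yml playbook. A minimal sketch (kube_version must be one supported by your Kubespray tag):

ansible-playbook -i inventory/mycluster/inventory.ini \
  --private-key=id_rsa --user=ubuntu -b \
  upgrade-cluster.yml -e kube_version=v1.24.6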

Troubleshooting

FAILED - RETRYING: [master1]: download_file | Validate mirrors (4 retries left).

failed: [master1] (item=None) => {"attempts": 4, "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
fatal: [master1 -> {{ download_delegate if download_force_cache else inventory_hostname }}]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

vim inventory/mycluster/group_vars/all/all.yml
# Enable verbose log output to see the real error
unsafe_show_logs: true

"Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

See https://github.com/containerd/containerd/issues/4581

# Enable containerd_snapshotter: "native"
sed -i 's@# containerd_snapshotter: "native"@containerd_snapshotter: "native"@g' inventory/mycluster/group_vars/all/containerd.yml

cat inventory/mycluster/group_vars/all/containerd.yml
containerd_snapshotter: "native" 

Then re-run Kubespray.

ModuleNotFoundError: No module named '_ctypes'

Run yum install -y libffi-devel, then rebuild Python 3.9 (re-run make install).

References:

  • https://github.com/kubernetes-sigs/kubespray/blob/master/docs/mirror.md
  • https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubespray/ (Installing Kubernetes with Kubespray)