Kubernetes Monitoring with kube-prometheus

Kube Prometheus

Posted by BlueFat on Friday, November 25, 2022

kube-prometheus on GitHub: https://github.com/prometheus-operator/kube-prometheus

Deployment

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
git checkout v0.11.0 # switch to a release compatible with your Kubernetes version

First create the required namespace and CRDs, and wait for them to become available before creating the remaining resources:

kubectl apply --server-side -f manifests/setup
kubectl wait \
	--for condition=Established \
	--all CustomResourceDefinition \
	--namespace=monitoring
kubectl apply -f manifests/

Note: the setup manifests must be created with kubectl create or kubectl apply --server-side. A plain client-side kubectl apply fails with "Too long: must have at most 262144 bytes", because the large CRDs exceed the size limit of the last-applied-configuration annotation that client-side apply writes.
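If your kubectl does not support server-side apply, kubectl create achieves the same result on a first install (it fails if the objects already exist):

kubectl create -f manifests/setup

Verify that everything is running: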

kubectl get servicemonitors --all-namespaces
kubectl get pod -n monitoring
kubectl get svc -n monitoring

Exposing the dashboards

Release 0.11 introduced NetworkPolicy resources that, by default, only allow access from pods inside kube-prometheus. You must modify or delete these NetworkPolicy rules before you can reach the Services via ClusterIP, NodePort, or Ingress.

Modifying the NetworkPolicy rules

kubectl get networkpolicy -n monitoring

cd /root/kube-prometheus/manifests

vim grafana-networkPolicy.yaml
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    - ipBlock: # add a trusted source network
        cidr: 192.168.10.0/24 # trusted CIDR block
    ports:
    - port: 3000
      protocol: TCP
      
vim prometheus-networkPolicy.yaml
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    - ipBlock:
        cidr: 192.168.10.0/24
    ports:
    - port: 9090
      protocol: TCP
    - port: 8080
      protocol: TCP
      
vim alertmanager-networkPolicy.yaml 
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    - ipBlock:
        cidr: 192.168.10.0/24
    ports:
    - port: 9093
      protocol: TCP
    - port: 8080
      protocol: TCP
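After editing, re-apply the three policies so the new ingress rules take effect:

kubectl apply -f grafana-networkPolicy.yaml \
  -f prometheus-networkPolicy.yaml \
  -f alertmanager-networkPolicy.yaml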

Option 1: change the Services to NodePort

kubectl patch svc grafana -p '{ "spec":{"type": "NodePort"} }' -n monitoring
kubectl patch svc alertmanager-main -p '{ "spec":{"type": "NodePort"} }' -n monitoring
kubectl patch svc prometheus-k8s -p '{ "spec":{"type": "NodePort"} }' -n monitoring
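Kubernetes assigns each patched Service a port in the 30000-32767 range; look the ports up and browse to http://<node-ip>:<node-port>:

kubectl get svc -n monitoring grafana prometheus-k8s alertmanager-main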

Option 2: configure an Ingress

cat > prometheus-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitoring
  name: prometheus-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.example.com
    http:
      paths:
      - backend:
          service:
            name: grafana
            port:
              number: 3000
        path: /
        pathType: Prefix
  - host: prometheus.example.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
        path: /
        pathType: Prefix
  - host: alertmanager.example.com
    http:
      paths:
      - backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
        path: /
        pathType: Prefix
EOF
kubectl apply -f prometheus-ingress.yaml
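The three hostnames must resolve to your ingress controller. For a quick test you can add hosts entries on your workstation; 192.168.10.100 below is only a placeholder for your controller's address:

cat >> /etc/hosts << EOF
192.168.10.100 grafana.example.com prometheus.example.com alertmanager.example.com
EOF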

Accessing the dashboards

Prometheus

Prometheus: http://prometheus.example.com

[root@master1 manifests]# kubectl get pod -n monitoring | grep prometheus-k8s
prometheus-k8s-0                       2/2     Running   0          128m
prometheus-k8s-1                       2/2     Running   0          128m

There are two Prometheus replicas behind this Service, but requests are always routed to the same backend instance: the Service is created with sessionAffinity: ClientIP, which pins each client IP to one replica, so we do not need to worry about consecutive requests landing on different replicas.
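You can confirm the setting directly on the Service:

kubectl get svc prometheus-k8s -n monitoring -o jsonpath='{.spec.sessionAffinity}'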

AlertManager

AlertManager: http://alertmanager.example.com

Grafana

Grafana: http://grafana.example.com (the default username and password are both admin)

The grafana-kubernetes-app plugin

$ kubectl exec -it $(kubectl get pod -n monitoring -l app.kubernetes.io/name=grafana \
-o jsonpath='{.items[*].metadata.name}') -n monitoring -- sh

/usr/share/grafana $ grafana-cli plugins install grafana-kubernetes-app
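Note that a plugin installed this way lives only in the container filesystem and is lost when the pod is recreated. A more durable option, supported by the official Grafana image, is to set the GF_INSTALL_PLUGINS environment variable on the deployment, for example:

kubectl set env deployment/grafana -n monitoring GF_INSTALL_PLUGINS=grafana-kubernetes-app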

The kube-prometheus Grafana dashboards default to the UTC timezone, eight hours behind Beijing time, which is inconvenient for day-to-day monitoring. You can change this:

cd /root/kube-prometheus/manifests
sed -i 's/UTC/UTC+8/g'  grafana-dashboardDefinitions.yaml
sed -i 's/utc/utc+8/g'  grafana-dashboardDefinitions.yaml
kubectl apply -f grafana-dashboardDefinitions.yaml 
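Grafana re-reads the provisioned dashboards on an interval; if the change does not show up on its own, restarting Grafana forces a reload:

kubectl rollout restart deployment grafana -n monitoring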

AlertManager alerting configuration

AlertManager email alerts

A trimmed-down configuration is shown here; see kube-prometheus/manifests/alertmanager-secret.yaml for the full version.

cat << EOF > alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.24.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
      smtp_smarthost: "smtp.qq.com:465"
      smtp_from: "xxx@qq.com"     # sender address
      smtp_auth_username: "xxx@qq.com"    # SMTP auth user (the sender address)
      smtp_auth_password: "xxxbnmqfesbdhj"  # QQ mail authorization code, not the login password
      smtp_hello: "qq.com"
      smtp_require_tls: false
    route:
      group_by: ["alertname"]
      group_interval: 5m
      group_wait: 30s
      receiver: default-receiver
      repeat_interval: 12h
      routes:
      - receiver: "example-project"
        match_re:
          namespace: "^(example-project1|example-project2).*$"    # route alerts by namespace
    receivers:
    - name: "default-receiver"
      email_configs:
      - to: "xxx@example.com"    # recipient; add more entries for more recipients
        send_resolved: true
      - to: "xxx@example.com"
        send_resolved: true
    - name: "example-project"    # must match the receiver name referenced in the route above
      email_configs:
      - to: "xxx@example.com"    # recipient; more can be added
        send_resolved: true
      - to: "xxx@example.com"
        send_resolved: true    
type: Opaque
EOF
kubectl apply -f alertmanager-secret.yaml
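To confirm the new configuration was stored (the config-reloader sidecar picks it up automatically after a short delay), decode the Secret:

kubectl get secret alertmanager-main -n monitoring \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d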

AlertManager DingTalk alerts

In DingTalk, create a robot under robot management and copy the generated webhook URL.

DingTalk webhook deployment

cat << EOF > dingtalk-webhook.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      run: dingtalk
  template:
    metadata:
      labels:
        run: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v1.4.0
        imagePullPolicy: IfNotPresent
        args:
          - --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=<replace-with-your-token>
        ports:
        - containerPort: 8060
          protocol: TCP 
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    run: dingtalk
  sessionAffinity: None
EOF

kubectl apply -f dingtalk-webhook.yaml
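Confirm the webhook pod is up before pointing AlertManager at it:

kubectl get pod -n monitoring -l run=dingtalk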

Again, a trimmed-down configuration; see kube-prometheus/manifests/alertmanager-secret.yaml for the full version.

cat << EOF > alertmanager-dingtalk.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.24.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_interval: 5m
      group_wait: 30s
      receiver: "webhook"
      repeat_interval: 12h
    receivers:
    # webhook receiver that forwards alerts to DingTalk
    - name: 'webhook'
      webhook_configs:
      - url: 'http://webhook-dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook1/send'
        send_resolved: true
type: Opaque
EOF
kubectl apply -f alertmanager-dingtalk.yaml

Verify that DingTalk receives alerts

# start a pod that will crash-loop (busybox exits immediately, so Kubernetes keeps restarting it)
kubectl run busybox --image=busybox 

The corresponding alert rule takes about five minutes to fire, so be patient.
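While you wait, you can watch the pod enter CrashLoopBackOff and the alert move from pending to firing on the Prometheus Alerts page; deleting the pod afterwards triggers the resolved notification:

kubectl get pod busybox -w
kubectl delete pod busybox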

In Grafana, import dashboard template ID 3070.
