Velero Backup and Restore in Practice

Background

The project required migrating resources from several AKS clusters into a new cluster, with the following requirements:

  • Applications mount persistent volumes backed by azurefile (file storage) and azuredisk (block storage), all dynamically provisioned via StorageClass. The volume data needs to be migrated as well, to speed up the overall migration.
  • Namespaces must be renamed. The old project namespaces were not named consistently and need to be adjusted after migration.
  • LoadBalancer- and NodePort-type Services should be shut off, and all services exposed uniformly through Ingress.

For this migration I reviewed the relevant AKS documentation: azurefile and azuredisk can be migrated to a different subscription, but we did not have permissions that high.

Azure officially supports migrating data with Velero, and the backup data can be stored in Azure Blob (object storage), but we did not have Azure Blob permissions either. The internal network also sits behind a firewall, so Alibaba Cloud object storage was not an option.

The original plan was to export the resources as YAML files and migrate those, but a few problems surfaced:

  • Many services mount persistent volumes; without migrating the data those services cannot start, and waiting for developers to migrate the data by hand is slow.
  • Namespaces need to be renamed. Rewriting the namespace in the YAML with Python required converting to JSON, editing, and converting back to YAML, and some manifests came out mangled in the process (perhaps my approach was wrong?).
  • Service resources of the previous LoadBalancer/NodePort types would still need to be reworked to ClusterIP plus Ingress afterwards.

All things considered, Velero covers the requirements. The only catch is having to self-host MinIO as object storage, but since it is only used for the migration, a small deployment is no big deal.

Environment

Cluster versions: the two clusters differ, but it is not a big issue; overall the API differences are minor.

Components: Velero must be deployed in each cluster; the MinIO object storage is deployed in the new cluster.

Storage classes: no manual creation is needed. AKS automatically creates the azurefile and azuredisk storage classes when the cluster is provisioned, and all the workload PVCs are created on top of them.

                   Old cluster                       New cluster
Cluster version    v1.25.11                          v1.28.10
Components         velero                            velero, minio
StorageClass       azurefile / azuredisk (auto)      azurefile / azuredisk (auto)
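
For reference, the storage classes that AKS provisions automatically can be listed on either cluster; this is just a sanity check, not a required step:

kubectl get storageclass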

1. Object Storage Deployment

MinIO object storage is deployed in the new cluster.

Resource manifest

---
apiVersion: v1
kind: Namespace
metadata:
  name: minio
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata: 
  name: minio-pv-claim 
  namespace: minio
spec: 
  storageClassName: default 
  accessModes: 
    - ReadWriteOnce 
  resources: 
    requests: 
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio 
  namespace: minio
  labels:
    app: minio
spec: 
  selector: 
    matchLabels: 
      app: minio 
  strategy: 
    type: Recreate 
  template: 
    metadata: 
      labels: 
        app: minio 
    spec: 
      volumes: 
      - name: storage 
        persistentVolumeClaim: 
          claimName: minio-pv-claim 
      containers: 
      - name: minio 
        image: xxxxxx.azurecr.cn/minio/minio:latest
        args:
        - server 
        - /storage 
        - --config-dir=/config
        - --console-address=:9001
        env: 
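        # Note: newer MinIO releases prefer MINIO_ROOT_USER / MINIO_ROOT_PASSWORD;
        # the legacy MINIO_ACCESS_KEY / MINIO_SECRET_KEY names are kept here as in the original setup.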
        - name: MINIO_ACCESS_KEY 
          value: minio
        - name: MINIO_SECRET_KEY 
          value: minio123
        ports: 
        - containerPort: 9000 
          hostPort: 9000 
        volumeMounts: 
        - name: storage  
          mountPath: "/storage"
---
apiVersion: v1
kind: Service
metadata:
  namespace: minio
  name: minio
  labels:
    app: minio
spec:
  type: NodePort
  ports:
    - name: api
      port: 9000
      targetPort: 9000
    - name: console
      port: 9001
      targetPort: 9001
  selector:
    app: minio
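
Assuming the manifest above is saved as minio.yaml (the filename is just for illustration), deploy it and wait for the pod to come up:

kubectl apply -f minio.yaml
# Wait until the minio pod is Running
kubectl -n minio get pods -w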

Create the bucket

565949d24c71:/# kubectl -n minio exec -it minio-6b494b99f6-47hfw -- bash
bash-5.1# mc alias set myminio http://10.244.2.158:9000 minio minio123
Added `myminio` successfully.
bash-5.1# mc ls myminio
bash-5.1# mc mb myminio/velero
Bucket created successfully `myminio/velero`.
bash-5.1# mc ls myminio
[2024-09-24 06:43:43 UTC]     0B velero/

Note: MinIO above is exposed via NodePort so that the other cluster can reach it.

565949d24c71:~# kubectl -n minio get svc
NAME    TYPE       CLUSTER-IP   EXTERNAL-IP   PORT(S)                         AGE
minio   NodePort   10.2.0.187   <none>        9000:30198/TCP,9001:30207/TCP   2d16h
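
The s3Url used in the next section (http://10.182.107.37:30198) is simply a node IP of the new cluster plus the API NodePort shown above; both can be looked up with:

# Node internal IPs
kubectl get nodes -o wide
# NodePorts assigned to the minio Service
kubectl -n minio get svc minio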

2. Velero Server Deployment

The Velero server must be deployed in each cluster, with s3Url pointing at the MinIO address set up for this project.

# Download the velero CLI
wget https://www.ghproxy.cn/https://github.com/vmware-tanzu/velero/releases/download/v1.14.1/velero-v1.14.1-linux-amd64.tar.gz
tar -zxvf velero-v1.14.1-linux-amd64.tar.gz
mv velero-v1.14.1-linux-amd64/velero /usr/local/bin/
# Set up bash completion
apk add bash-completion
source <(velero completion bash)
echo 'source <(velero completion bash)' >>~/.bashrc
# Deploy the velero server
# Prepare the velero credentials file with the MinIO access key / secret key
# cat velero-auth.txt 
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
# Install the velero server with velero install
velero install \
  --provider aws \
  --plugins xxxxxx.azurecr.cn/velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero \
  --secret-file ./velero-auth.txt \
  --image xxxxxx.azurecr.cn/velero/velero:v1.14.1 \
  --use-node-agent \
  --use-volume-snapshots=false \
  --namespace velero \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.182.107.37:30198
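
After the install it is worth confirming that the velero server and node-agent pods are running and that the backup storage location can actually reach MinIO (generic checks, not specific to this setup):

# velero server and node-agent pods should be Running
kubectl -n velero get pods
# The backup storage location should report Available once MinIO is reachable
velero backup-location get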

3. Run the Backup

Run the backup on the old cluster. The --include-namespaces option used below backs up a specific namespace; alternatively you can back up the whole cluster and use --exclude-namespaces to skip the namespaces you do not need, as in the sketch below.
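
For reference, a whole-cluster variant with --exclude-namespaces might look like this (the backup name and excluded namespaces are only examples):

# Back up everything except the listed namespaces
velero backup create full-backup \
  --exclude-namespaces kube-system,velero \
  --default-volumes-to-fs-backup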

# Switch to the old cluster
~# kubectl config use-context old-cluster
# Create the backup
~# velero backup create dp-test-backup \
  --include-namespaces disc-test \
  --exclude-resources statefulsets \
  --default-volumes-to-fs-backup
Backup request "dp-test-backup" submitted successfully.
Run `velero backup describe dp-test-backup` or `velero backup logs dp-test-backup` for more details.
# List backups
~# velero backup get
NAME             STATUS       ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
dp-test-backup   InProgress   0        0          2024-09-24 07:19:28 +0000 UTC   29d       default            <none>
# Show backup details
~# velero backup describe dp-test-backup --details
Name:         dp-test-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.25.11
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=25

Phase:  Completed

Namespaces:
  Included:  disc-test
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        statefulsets
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-09-24 07:19:28 +0000 UTC
Completed:  2024-09-24 07:19:50 +0000 UTC

Expiration:  2024-10-24 07:19:27 +0000 UTC

Total items to be backed up:  110
Items backed up:              110

Resource List:
  apps/v1/Deployment:
    - disc-test/disc-eureka
    - disc-test/disc-file
    - disc-test/disc-gateway
    - disc-test/disc-service
    - disc-test/disc-ui
    - disc-test/disc-workflow
    - disc-test/postgres
  apps/v1/ReplicaSet:
    - disc-test/disc-eureka-674b4b54f8
    - disc-test/disc-eureka-7b48cf87d5
    - disc-test/disc-eureka-847b44b775
    - disc-test/disc-eureka-89fc89c9b
    - disc-test/disc-eureka-945b54cd7
    - disc-test/disc-eureka-f67fcf8c
    - disc-test/disc-file-58d9959589
    - disc-test/disc-file-5c799cb8cb
    - disc-test/disc-file-6cdb85586c
    - disc-test/disc-file-d5479fb4
    - disc-test/disc-gateway-5bcfd85cf8
    - disc-test/disc-gateway-5dc78c559
    - disc-test/disc-gateway-645f7c4cbd
    - disc-test/disc-gateway-647565d5b5
    - disc-test/disc-gateway-65cdffdc94
    - disc-test/disc-gateway-69465d5844
    - disc-test/disc-gateway-764bc7ffd9
    - disc-test/disc-gateway-7dffd485f9
    - disc-test/disc-gateway-868b6d47d7
    - disc-test/disc-gateway-86c594985b
    - disc-test/disc-gateway-d6fcb7784
    - disc-test/disc-service-5bd7d95f44
    - disc-test/disc-service-64df9758dd
    - disc-test/disc-service-6575c976b5
    - disc-test/disc-service-65b55857d4
    - disc-test/disc-service-769845d97f
    - disc-test/disc-service-76ffc4765d
    - disc-test/disc-service-7c9487b895
    - disc-test/disc-service-7c9d7dfcc
    - disc-test/disc-service-86d8bb7d69
    - disc-test/disc-service-b6594c76
    - disc-test/disc-service-ccff659b4
    - disc-test/disc-ui-5ffdcc67c
    - disc-test/disc-ui-644894785c
    - disc-test/disc-ui-65cd5ff847
    - disc-test/disc-ui-66b6955bb9
    - disc-test/disc-ui-68bf6dbf7b
    - disc-test/disc-ui-69959c546d
    - disc-test/disc-ui-7597cdb5b4
    - disc-test/disc-ui-769cd846bb
    - disc-test/disc-ui-7cbb85f99c
    - disc-test/disc-ui-84fbb947d6
    - disc-test/disc-ui-fcf99db9f
    - disc-test/disc-workflow-5466bbddc5
    - disc-test/disc-workflow-556c648c79
    - disc-test/disc-workflow-5655576b9d
    - disc-test/disc-workflow-569bc59c86
    - disc-test/disc-workflow-56b5b57fc
    - disc-test/disc-workflow-57cc5757b8
    - disc-test/disc-workflow-5d679b4bdc
    - disc-test/disc-workflow-5f7ff45485
    - disc-test/disc-workflow-6ddd45b688
    - disc-test/disc-workflow-6df7864678
    - disc-test/disc-workflow-8cf8896c7
    - disc-test/postgres-56fc595864
  discovery.k8s.io/v1/EndpointSlice:
    - disc-test/disc-eureka-z68w7
    - disc-test/disc-file-4pnqj
    - disc-test/disc-gateway-p4zbl
    - disc-test/disc-service-xrgpz
    - disc-test/disc-ui-vt82h
    - disc-test/disc-workflow-qks5n
    - disc-test/postgres-9pxcf
  v1/ConfigMap:
    - disc-test/kube-root-ca.crt
  v1/Endpoints:
    - disc-test/disc-eureka
    - disc-test/disc-file
    - disc-test/disc-gateway
    - disc-test/disc-service
    - disc-test/disc-ui
    - disc-test/disc-workflow
    - disc-test/postgres
  v1/Namespace:
    - disc-test
  v1/PersistentVolume:
    - pvc-01088f44-0bc2-42d6-9de7-aa314662097a
    - pvc-22f535f7-1dc5-4266-827e-d9a693b45932
    - pvc-85e83822-0615-4b31-becb-44a83727290a
    - pvc-b1d42355-5739-48bd-8c09-fce53beafc3d
    - pvc-d7d21438-83c4-45d9-8a42-2ebf22989cd0
    - pvc-db8e43e1-7def-4e83-8c36-e3fc1bbfbeca
    - pvc-ee464b26-7c93-438f-b02e-969f5bac5255
  v1/PersistentVolumeClaim:
    - disc-test/disc-backup-dataset-pvc
    - disc-test/disc-filestore-pvc
    - disc-test/disc-log-pvc-file
    - disc-test/disc-log-pvc-gateway
    - disc-test/disc-log-pvc-service
    - disc-test/disc-log-pvc-workflow
    - disc-test/disc-pg-pvc
  v1/Pod:
    - disc-test/disc-eureka-f67fcf8c-vjkld
    - disc-test/disc-file-6cdb85586c-k668z
    - disc-test/disc-gateway-5dc78c559-l668z
    - disc-test/disc-service-76ffc4765d-hmdw2
    - disc-test/disc-ui-68bf6dbf7b-z64ml
    - disc-test/disc-workflow-8cf8896c7-lb4b5
    - disc-test/postgres-56fc595864-mzxwl
  v1/Secret:
    - disc-test/acr.secret
    - disc-test/azure-storage-account-fe8963d6e59184c25a5f078-secret
    - disc-test/dpsecret
  v1/Service:
    - disc-test/disc-eureka
    - disc-test/disc-file
    - disc-test/disc-gateway
    - disc-test/disc-service
    - disc-test/disc-ui
    - disc-test/disc-workflow
    - disc-test/postgres
  v1/ServiceAccount:
    - disc-test/default

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots: <none included>

  Pod Volume Backups - kopia:
    Completed:
      disc-test/disc-file-6cdb85586c-k668z: file-volume, log-volume
      disc-test/disc-gateway-5dc78c559-l668z: log-volume
      disc-test/disc-service-76ffc4765d-hmdw2: backup-dataset-volume, log-volume
      disc-test/disc-workflow-8cf8896c7-lb4b5: log-volume
      disc-test/postgres-56fc595864-mzxwl: dev-shm, postgres-data

HooksAttempted:  0
HooksFailed:     0

4. Run the Restore

Run the restore on the new cluster. A ConfigMap is created on the new cluster to specify the image and permissions used for the restore. Note: Velero's file-system restore works by injecting an init container into the application pod, which copies the backed-up data into the volume. This happens only once.

The restore uses --namespace-mappings to remap the namespace, which is very convenient.

# Switch to the new cluster
kubectl config use-context new-cluster
# Create the ConfigMap that configures the restore helper container
~# cat cm_velero.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: fs-restore-action-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/pod-volume-restore: RestoreItemAction
data:
  image: "xxxxxx.azurecr.cn/velero/velero-restore-helper:v1.14.1"
  secCtxRunAsUser: "0"
  secCtxRunAsGroup: "0"
  secCtxAllowPrivilegeEscalation: "false"
# Apply the ConfigMap and restart the velero pod so the config takes effect
~# kubectl apply -f cm_velero.yaml
~# kubectl -n velero delete pod velero-659d579c46-4nhww
# Create the restore
~# velero restore create dp-test-restore \
  --from-backup dp-test-backup \
  --namespace-mappings disc-test:dp-test \
  --preserve-nodeports=false \
  --existing-resource-policy=none
# Check restore status and details
velero restore get
velero restore describe dp-test-restore
# Check restore logs
velero restore logs dp-test-restore
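
Once the restore finishes, the workloads should show up under the remapped namespace; a quick check:

# Verify the restored workloads and volumes in the new namespace
kubectl -n dp-test get pods,pvc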

All services came up normally and the data was intact. Cross-namespace service calls and NodePort-based calls only needed minor adjustments from the developers, so the overall effort was small.

Finally, the Service resources need to be cleaned up: LoadBalancer- and NodePort-type Services are changed to ClusterIP.
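
A minimal sketch of that cleanup, assuming every Service in the restored namespace should become ClusterIP (review each Service before patching in a real run):

# Switch all Services in the restored namespace to ClusterIP
for svc in $(kubectl -n dp-test get svc -o name); do
  kubectl -n dp-test patch "$svc" -p '{"spec":{"type":"ClusterIP"}}'
done
# On older clusters the nodePort fields may also need to be removed manually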

Problems Encountered

velero restore delete xxx fails and stays in Deleting forever?

Cause: the restore may not have completed successfully for some reason; try restarting the velero server: kubectl -n velero delete pod velero-659d579c46-4nhww. It may also be that the MinIO object storage is unreachable, which needs to be fixed first. If the MinIO object storage was accidentally deleted, the restore cannot be deleted either; in that case it can be removed with kubectl -n velero patch restores/dp-test-restore --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'