Velero Backup and Restore in Practice
Background
For a project we needed to migrate the resources of several AKS clusters into new clusters, with the following requirements:
- Applications mount persistent volumes backed by azurefile (file storage) and azuredisk (block storage), all created dynamically through StorageClasses. The volume data has to be migrated as well, to speed up the overall migration.
- Namespaces have to be renamed. The old project namespaces were not named consistently, so they need to be adjusted as part of the move.
- LoadBalancer and NodePort Services are to be retired; everything will be exposed through ingress instead.
According to the AKS documentation, azurefile and azuredisk can be migrated across subscriptions, but we did not have the permissions required for that.
Azure officially supports migrating data with velero and storing it in Azure Blob (object storage), but we had no Azure Blob access either, and the internal firewall ruled out using Alibaba Cloud object storage.
The original plan was to export the resources as YAML files and migrate those, but this approach has a few problems:
- Many services mount volumes and will not start without their data; waiting for the development teams to migrate the data themselves is slow.
- The namespaces have to be renamed. Rewriting them with Python meant converting the YAML to JSON, editing it, and converting it back to YAML, and some manifests came out broken in the process (possibly I was doing it wrong).
- The LoadBalancer and NodePort Services would also have to be reworked by hand before they could be applied to the new cluster.
All things considered, velero covers the requirements; the only extra work is self-hosting a minio object store, but since it is only needed for the migration, a small throwaway deployment is fine.
Environment
- Cluster versions: the two clusters run different versions, but that is not a problem; the API differences are minor overall.
- Components: velero has to be installed in every cluster; the minio object store runs in the new cluster.
- StorageClasses: nothing to create by hand; AKS creates the azurefile and azuredisk StorageClasses automatically, and all of the application PVCs are built on top of them.

| | Old cluster | New cluster |
|---|---|---|
| Cluster version | v1.25.11 | v1.28.10 |
| Components deployed | velero | velero, minio |
| StorageClass | azurefile / azuredisk (created automatically by AKS) | azurefile / azuredisk (created automatically by AKS) |

1. Deploying the object storage
minio is deployed into the new cluster.
Resource manifests:
---
apiVersion: v1
kind: Namespace
metadata:
  name: minio
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: minio-pv-claim
  namespace: minio
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: minio
  labels:
    app: minio
spec:
  selector:
    matchLabels:
      app: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: minio
    spec:
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: minio-pv-claim
      containers:
        - name: minio
          image: xxxxxx.azurecr.cn/minio/minio:latest
          args:
            - server
            - /storage
            - --config-dir=/config
            - --console-address=:9001
          env:
            - name: MINIO_ACCESS_KEY
              value: minio
            - name: MINIO_SECRET_KEY
              value: minio123
          ports:
            - containerPort: 9000
              hostPort: 9000
          volumeMounts:
            - name: storage
              mountPath: "/storage"
---
apiVersion: v1
kind: Service
metadata:
  namespace: minio
  name: minio
  labels:
    app: minio
spec:
  type: NodePort
  ports:
    - name: api
      port: 9000
      targetPort: 9000
    - name: console
      port: 9001
      targetPort: 9001
  selector:
    app: minio
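Assuming the manifest above is saved as minio.yaml (the filename is just an example), apply it and check that the pod, PVC, and Service come up:
#apply the manifest and verify minio is running
kubectl apply -f minio.yaml
kubectl -n minio get pods,pvc,svc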
Create a bucket
565949d24c71:/# kubectl -n minio exec -it minio-6b494b99f6-47hfw -- bash
bash-5.1# mc alias set myminio http://10.244.2.158:9000 minio minio123
Added `myminio` successfully.
bash-5.1# mc ls myminio
bash-5.1# mc mb myminio/velero
Bucket created successfully `myminio/velero`.
bash-5.1# mc ls myminio
[2024-09-24 06:43:43 UTC] 0B velero/
Note: minio is exposed with a NodePort Service so that the other cluster can reach it.
565949d24c71:~# kubectl -n minio get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
minio NodePort 10.2.0.187 <none> 9000:30198/TCP,9001:30207/TCP 2d16h
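Because the old cluster will talk to minio over a node IP and this NodePort, a quick connectivity check from the old cluster before installing velero there can save some debugging (a sketch; 10.182.107.37 is the node IP used in the s3Url below, and /minio/health/live is minio's liveness endpoint):
#expect an HTTP 200 if the old cluster can reach minio through the NodePort
curl -s -o /dev/null -w '%{http_code}\n' http://10.182.107.37:30198/minio/health/live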
2. Deploying the velero server
The velero server has to be installed in every cluster, with s3Url pointing at the minio instance above.
#download the velero CLI
wget https://www.ghproxy.cn/https://github.com/vmware-tanzu/velero/releases/download/v1.14.1/velero-v1.14.1-linux-amd64.tar.gz
tar -zxvf velero-v1.14.1-linux-amd64.tar.gz
mv velero-v1.14.1-linux-amd64/velero /usr/local/bin/
#enable bash completion
apk add bash-completion
source <(velero completion bash)
echo 'source <(velero completion bash)' >>~/.bashrc
#install the velero server
#prepare the credentials file with the minio access key / secret key
# cat velero-auth.txt
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
#install the server with velero install
velero install \
  --provider aws \
  --plugins xxxxxx.azurecr.cn/velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero \
  --secret-file ./velero-auth.txt \
  --image xxxxxx.azurecr.cn/velero/velero:v1.14.1 \
  --use-node-agent \
  --use-volume-snapshots=false \
  --namespace velero \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.182.107.37:30198
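After the install finishes, a quick optional check confirms the velero and node-agent pods are running and the backup storage location backed by minio shows Available:
#verify the deployment and the connection to the object storage
kubectl -n velero get pods
velero backup-location get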
3. Running the backup
Run the backup on the old cluster. The command below uses --include-namespaces to back up a single namespace; alternatively, --exclude-namespaces can be used to skip the namespaces you don't need and back up the rest of the cluster.
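For reference, a whole-cluster backup that skips namespaces instead would look roughly like this (a sketch; the excluded namespace list is illustrative):
#back up everything except the listed namespaces
velero backup create full-cluster-backup \
  --exclude-namespaces kube-system,kube-node-lease,kube-public,velero \
  --default-volumes-to-fs-backup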
#switch to the old cluster
~# kubectl config use-context old-cluster
#run the backup
~# velero backup create dp-test-backup \
--include-namespaces disc-test \
--exclude-resources statefulsets \
--default-volumes-to-fs-backup
Backup request "dp-test-backup" submitted successfully.
Run `velero backup describe dp-test-backup` or `velero backup logs dp-test-backup` for more details.
#list backups
~# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
dp-test-backup InProgress 0 0 2024-09-24 07:19:28 +0000 UTC 29d default <none>
#show backup details
~# velero backup describe dp-test-backup --details
Name: dp-test-backup
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/resource-timeout=10m0s
velero.io/source-cluster-k8s-gitversion=v1.25.11
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=25
Phase: Completed
Namespaces:
Included: disc-test
Excluded: <none>
Resources:
Included: *
Excluded: statefulsets
Cluster-scoped: auto
Label selector: <none>
Or label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
Snapshot Move Data: false
Data Mover: velero
TTL: 720h0m0s
CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 4h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2024-09-24 07:19:28 +0000 UTC
Completed: 2024-09-24 07:19:50 +0000 UTC
Expiration: 2024-10-24 07:19:27 +0000 UTC
Total items to be backed up: 110
Items backed up: 110
Resource List:
apps/v1/Deployment:
- disc-test/disc-eureka
- disc-test/disc-file
- disc-test/disc-gateway
- disc-test/disc-service
- disc-test/disc-ui
- disc-test/disc-workflow
- disc-test/postgres
apps/v1/ReplicaSet:
- disc-test/disc-eureka-674b4b54f8
- disc-test/disc-eureka-7b48cf87d5
- disc-test/disc-eureka-847b44b775
- disc-test/disc-eureka-89fc89c9b
- disc-test/disc-eureka-945b54cd7
- disc-test/disc-eureka-f67fcf8c
- disc-test/disc-file-58d9959589
- disc-test/disc-file-5c799cb8cb
- disc-test/disc-file-6cdb85586c
- disc-test/disc-file-d5479fb4
- disc-test/disc-gateway-5bcfd85cf8
- disc-test/disc-gateway-5dc78c559
- disc-test/disc-gateway-645f7c4cbd
- disc-test/disc-gateway-647565d5b5
- disc-test/disc-gateway-65cdffdc94
- disc-test/disc-gateway-69465d5844
- disc-test/disc-gateway-764bc7ffd9
- disc-test/disc-gateway-7dffd485f9
- disc-test/disc-gateway-868b6d47d7
- disc-test/disc-gateway-86c594985b
- disc-test/disc-gateway-d6fcb7784
- disc-test/disc-service-5bd7d95f44
- disc-test/disc-service-64df9758dd
- disc-test/disc-service-6575c976b5
- disc-test/disc-service-65b55857d4
- disc-test/disc-service-769845d97f
- disc-test/disc-service-76ffc4765d
- disc-test/disc-service-7c9487b895
- disc-test/disc-service-7c9d7dfcc
- disc-test/disc-service-86d8bb7d69
- disc-test/disc-service-b6594c76
- disc-test/disc-service-ccff659b4
- disc-test/disc-ui-5ffdcc67c
- disc-test/disc-ui-644894785c
- disc-test/disc-ui-65cd5ff847
- disc-test/disc-ui-66b6955bb9
- disc-test/disc-ui-68bf6dbf7b
- disc-test/disc-ui-69959c546d
- disc-test/disc-ui-7597cdb5b4
- disc-test/disc-ui-769cd846bb
- disc-test/disc-ui-7cbb85f99c
- disc-test/disc-ui-84fbb947d6
- disc-test/disc-ui-fcf99db9f
- disc-test/disc-workflow-5466bbddc5
- disc-test/disc-workflow-556c648c79
- disc-test/disc-workflow-5655576b9d
- disc-test/disc-workflow-569bc59c86
- disc-test/disc-workflow-56b5b57fc
- disc-test/disc-workflow-57cc5757b8
- disc-test/disc-workflow-5d679b4bdc
- disc-test/disc-workflow-5f7ff45485
- disc-test/disc-workflow-6ddd45b688
- disc-test/disc-workflow-6df7864678
- disc-test/disc-workflow-8cf8896c7
- disc-test/postgres-56fc595864
discovery.k8s.io/v1/EndpointSlice:
- disc-test/disc-eureka-z68w7
- disc-test/disc-file-4pnqj
- disc-test/disc-gateway-p4zbl
- disc-test/disc-service-xrgpz
- disc-test/disc-ui-vt82h
- disc-test/disc-workflow-qks5n
- disc-test/postgres-9pxcf
v1/ConfigMap:
- disc-test/kube-root-ca.crt
v1/Endpoints:
- disc-test/disc-eureka
- disc-test/disc-file
- disc-test/disc-gateway
- disc-test/disc-service
- disc-test/disc-ui
- disc-test/disc-workflow
- disc-test/postgres
v1/Namespace:
- disc-test
v1/PersistentVolume:
- pvc-01088f44-0bc2-42d6-9de7-aa314662097a
- pvc-22f535f7-1dc5-4266-827e-d9a693b45932
- pvc-85e83822-0615-4b31-becb-44a83727290a
- pvc-b1d42355-5739-48bd-8c09-fce53beafc3d
- pvc-d7d21438-83c4-45d9-8a42-2ebf22989cd0
- pvc-db8e43e1-7def-4e83-8c36-e3fc1bbfbeca
- pvc-ee464b26-7c93-438f-b02e-969f5bac5255
v1/PersistentVolumeClaim:
- disc-test/disc-backup-dataset-pvc
- disc-test/disc-filestore-pvc
- disc-test/disc-log-pvc-file
- disc-test/disc-log-pvc-gateway
- disc-test/disc-log-pvc-service
- disc-test/disc-log-pvc-workflow
- disc-test/disc-pg-pvc
v1/Pod:
- disc-test/disc-eureka-f67fcf8c-vjkld
- disc-test/disc-file-6cdb85586c-k668z
- disc-test/disc-gateway-5dc78c559-l668z
- disc-test/disc-service-76ffc4765d-hmdw2
- disc-test/disc-ui-68bf6dbf7b-z64ml
- disc-test/disc-workflow-8cf8896c7-lb4b5
- disc-test/postgres-56fc595864-mzxwl
v1/Secret:
- disc-test/acr.secret
- disc-test/azure-storage-account-fe8963d6e59184c25a5f078-secret
- disc-test/dpsecret
v1/Service:
- disc-test/disc-eureka
- disc-test/disc-file
- disc-test/disc-gateway
- disc-test/disc-service
- disc-test/disc-ui
- disc-test/disc-workflow
- disc-test/postgres
v1/ServiceAccount:
- disc-test/default
Backup Volumes:
Velero-Native Snapshots: <none included>
CSI Snapshots: <none included>
Pod Volume Backups - kopia:
Completed:
disc-test/disc-file-6cdb85586c-k668z: file-volume, log-volume
disc-test/disc-gateway-5dc78c559-l668z: log-volume
disc-test/disc-service-76ffc4765d-hmdw2: backup-dataset-volume, log-volume
disc-test/disc-workflow-8cf8896c7-lb4b5: log-volume
disc-test/postgres-56fc595864-mzxwl: dev-shm, postgres-data
HooksAttempted: 0
HooksFailed: 0
4. Running the restore
Run the restore on the new cluster. A ConfigMap is created on the new cluster first to specify the image and the permissions used for the restore. Note: velero's file-system restore works by injecting an init container into the application pod, which copies the backed-up data into the volume; this happens only once. The restore itself uses --namespace-mappings to remap the namespace, which is very convenient.
#switch to the new cluster
kubectl config use-context new-cluster
#create the ConfigMap that configures the restore helper container
~# cat cm_velero.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fs-restore-action-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/pod-volume-restore: RestoreItemAction
data:
  image: "xxxxxx.azurecr.cn/velero/velero-restore-helper:v1.14.1"
  secCtxRunAsUser: "0"
  secCtxRunAsGroup: "0"
  secCtxAllowPrivilegeEscalation: "false"
#apply the ConfigMap and restart the velero pod so it picks up the new configuration
~# kubectl apply -f cm_velero.yaml
~# kubectl -n velero delete pod velero-659d579c46-4nhww
#create the restore
~# velero restore create dp-test-restore \
--from-backup dp-test-backup \
--namespace-mappings disc-test:dp-test \
--preserve-nodeports=false \
--existing-resource-policy=none
#check the restore status and details
velero restore get
velero restore describe dp-test-restore
#check the restore logs
velero restore logs dp-test-restore
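Once the restore shows Completed, inspecting the remapped namespace directly is a quick way to confirm that workloads, volumes, and services came back (dp-test is the target namespace from the mapping above):
#inspect the restored resources in the new namespace
kubectl -n dp-test get pods,pvc,svc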
All of the services came up and the data was intact. Cross-namespace service calls and NodePort-based calls only needed small adjustments from the development team, so the overall effort was minimal.
The last step is to clean up the Service resources: change the LoadBalancer and NodePort Services to ClusterIP and expose them through ingress.
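A patch along these lines handles the type change (a sketch; the Service name is illustrative, and on older clusters you may also need to remove the nodePort fields from spec.ports by hand):
#switch an example Service from NodePort/LoadBalancer to ClusterIP
kubectl -n dp-test patch svc disc-gateway -p '{"spec":{"type":"ClusterIP"}}'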
Problems encountered
velero restore delete xxx fails and the restore stays in Deleting?
Cause: the restore may not have completed successfully for some reason; try restarting the velero server with kubectl -n velero delete pod velero-659d579c46-4nhww. It can also happen when the minio object storage is unreachable, which has to be fixed first. If the minio object storage was deleted by accident, the restore cannot be removed either; in that case strip the finalizer with kubectl -n velero patch restores/dp-test-restore --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'.