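Velero (formerly Heptio Ark) is an open-source tool for safely backing up and restoring, performing disaster recovery for, and migrating Kubernetes cluster resources and persistent volumes. Velero can be deployed on a self-managed Kubernetes cluster or in a managed Kubernetes environment on a public cloud, such as QKE (KubeSphere). Velero can be used to:
- Back up cluster resources.
- Restore those resources if they are lost.
- Migrate cluster resources to other clusters.
- Replicate production cluster resources to development and test clusters.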
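Velero offers two backup methods:
- Restic backup: a file-system-level backup that reads the persistent volume data and sends it to Velero's object storage. Its speed depends on local I/O capacity, network bandwidth, and object storage performance, so it is slower than a snapshot backup. However, because all resources and data are stored in remote object storage, the application can easily be restored from a restic backup even if the current cluster or its storage fails.
- Snapshot backup: Velero uses a set of BackupItemAction plugins to back up PersistentVolumeClaims, which is fast. For each PersistentVolumeClaim, Velero creates a VolumeSnapshot object with the claim as its source; the VolumeSnapshot lives in the same namespace as the source PersistentVolumeClaim. The corresponding VolumeSnapshotContent object is a cluster-scoped resource that points to the actual disk-based snapshot in the storage system. During a backup Velero uploads all VolumeSnapshot and VolumeSnapshotContent objects to object storage, but the snapshot data itself stays on the cluster's storage. Data availability therefore depends on the high availability of the local storage: if the application problem is caused by a storage failure, Velero's snapshot backup cannot recover the application data.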
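This article works around that limitation of Velero snapshot backups by manually exporting the backed-up application and its data to AWS S3-compatible object storage, such as MinIO in a private environment or QingStor on QingCloud's public cloud. QingStor is used as the example here.
In this experiment we deploy a WordPress application in the wordpress project (namespace) of a KubeSphere cluster. We first use a Velero snapshot backup to back up the application and data in that namespace to QingStor, then manually export the data from primary storage to QingStor, then simulate a failure of the primary storage, and finally restore the data and the application from QingStor to another cluster. The detailed steps follow.
Environment and prerequisites:
Velero is installed and configured with the corresponding object storage. The wordpress namespace is created on rook-ceph storage and runs the WordPress and MySQL applications.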
root$ kubectl -n wordpress get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-8a3b3541-1718-4af5-94fc-e24ebe026172 10Gi RWO rook-ceph-block 47m
wp-pv-claim Bound pvc-355352e5-0ecf-4b40-9056-5514015eb392 2Gi RWO rook-ceph-block 47m
root$ kubectl -n wordpress get pods
NAME READY STATUS RESTARTS AGE
wordpress-589f976cd5-4ns55 1/1 Running 0 45m
wordpress-mysql-d9b8d8884-2kmtb 1/1 Running 0 45m
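Backing up the data
- To prove that the data can be restored, first publish a new post in WordPress. After the whole backup-and-restore flow, check whether the post has been restored.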
<img src="https://gitee.com/jibutech/tech-docs/raw/master/images/wordpress-demo.png" style="zoom:50%;" />
- Use Velero to take a snapshot backup of the wordpress project. In the velero namespace, where Velero runs, create the Backup CR that Velero uses for this backup.
# wp-snap-manual.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.19.5
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "19"
  namespace: velero
  name: wp-snap-manual
spec:
  defaultVolumesToRestic: false
  hooks: {}
  snapshotVolumes: true
  includedNamespaces:
    - wordpress
  storageLocation: qingstor-vbbf8
  volumeSnapshotLocations:
    - qingstor-bd0a9b2b-7add-4b97-ba26-d8182d1a2d8e
  ttl: 2h0m0s
Create the backups.velero.io CR:
root$ kubectl apply -f wp-snap-manual.yaml
backup.velero.io/wp-snap-manual created
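For readers who prefer the velero CLI to a hand-written CR, a roughly equivalent snapshot backup can be requested from the command line. This is a minimal sketch, not the procedure used in this article: the wrapper name `create_snapshot_backup` is hypothetical, the volume snapshot location is left out, and it assumes a Velero version in which these flags are available.

```shell
#!/bin/sh
# Hypothetical wrapper around "velero backup create" for a snapshot backup.
# Arguments: backup name, namespace to back up, backup storage location.
create_snapshot_backup() {
  name="$1"; ns="$2"; bsl="$3"
  velero backup create "$name" \
    --include-namespaces "$ns" \
    --snapshot-volumes \
    --storage-location "$bsl" \
    --ttl 2h0m0s
}
```

For example, `create_snapshot_backup wp-snap-manual wordpress qingstor-vbbf8` would request a backup with the same name, namespace, storage location, and TTL as the manifest above.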
The VolumeSnapshot resources generated in the wordpress namespace are now visible, along with the name of the VolumeSnapshotContent bound to each of them:
root$ kubectl -n wordpress get volumesnapshot
NAME AGE
velero-mysql-pv-claim-hmthh 58m
velero-wp-pv-claim-lgmh5 58m
root$ kubectl -n wordpress get volumesnapshot velero-mysql-pv-claim-hmthh -o yaml | grep bound
boundVolumeSnapshotContentName: snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
root$ kubectl -n wordpress get volumesnapshot velero-wp-pv-claim-lgmh5 -o yaml | grep bound
boundVolumeSnapshotContentName: snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
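The grep above works interactively, but the same field can be read directly with a JSONPath query, which is easier to reuse in a script. The helper below is a minimal sketch; the function name `bound_content` is an illustrative assumption and not part of kubectl or Velero.

```shell
#!/bin/sh
# Print the name of the VolumeSnapshotContent bound to one VolumeSnapshot.
# bound_content is a hypothetical helper used only for this sketch.
bound_content() {
  ns="$1"; snap="$2"
  kubectl -n "$ns" get volumesnapshot "$snap" \
    -o jsonpath='{.status.boundVolumeSnapshotContentName}'
}
```

Calling `bound_content wordpress velero-wp-pv-claim-lgmh5` prints only the content name, without the field label, so it can be captured in a shell variable.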
- Delete the VolumeSnapshots in the wordpress namespace.
root$ kubectl -n wordpress delete volumesnapshot velero-mysql-pv-claim-hmthh velero-wp-pv-claim-lgmh5
volumesnapshot.snapshot.storage.k8s.io "velero-mysql-pv-claim-hmthh" deleted
volumesnapshot.snapshot.storage.k8s.io "velero-wp-pv-claim-lgmh5" deleted
- Create a new namespace, poc, and create VolumeSnapshots in it whose spec.source.volumeSnapshotContentName is set to the VolumeSnapshotContent names obtained above.
# velero-wp-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  labels:
    velero.io/backup-name: wp-snap-manual
    manager: snapshot-controller
  name: velero-wp-snapshot
  namespace: poc
spec:
  source:
    volumeSnapshotContentName: snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
  volumeSnapshotClassName: csi-rbdplugin-snapclass
# velero-mysql-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  labels:
    velero.io/backup-name: wp-snap-manual
    manager: snapshot-controller
  name: velero-mysql-snapshot
  namespace: poc
spec:
  source:
    volumeSnapshotContentName: snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
  volumeSnapshotClassName: csi-rbdplugin-snapclass
root$ kubectl create ns poc
root$ kubectl -n poc apply -f velero-mysql-snapshot.yaml -f velero-wp-snapshot.yaml
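The two VolumeSnapshot manifests differ only in the snapshot name and the content name, so they can be generated from one template. The following is a minimal sketch that assumes the same API version and snapshot class as above; `render_snapshot` is a hypothetical helper name, and the sketch leaves out the finalizer and the `manager` label on the assumption that they are not needed for binding.

```shell
#!/bin/sh
# Render a pre-provisioned VolumeSnapshot that binds to an existing
# VolumeSnapshotContent. render_snapshot is a hypothetical helper.
# Arguments: snapshot name, content name, namespace (defaults to poc).
render_snapshot() {
  name="$1"; content="$2"; ns="${3:-poc}"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: ${name}
  namespace: ${ns}
  labels:
    velero.io/backup-name: wp-snap-manual
spec:
  source:
    volumeSnapshotContentName: ${content}
  volumeSnapshotClassName: csi-rbdplugin-snapclass
EOF
}
```

The output can be piped straight to kubectl, for example `render_snapshot velero-wp-snapshot snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958 | kubectl apply -f -`.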
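- Edit both VolumeSnapshotContents so that their volumeSnapshotRef points to the matching VolumeSnapshot in the new namespace.
root$ kubectl edit volumesnapshotcontent snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
root$ kubectl edit volumesnapshotcontent snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
In each object, find the volumeSnapshotRef field and update it with the name, namespace, and uid of the corresponding VolumeSnapshot in poc.
# in snapcontent-6f2ca29b-75a5-46b4-89a3-2f7a4eeff958
  volumeSnapshotRef:
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    name: velero-wp-snapshot
    namespace: poc
    uid: 4c1a4a4a-9949-425a-a3a9-1970f494aaca
# in snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa
  volumeSnapshotRef:
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    name: velero-mysql-snapshot
    namespace: poc
    uid: 4c1a4a4a-9949-4277-a3a9-1970f494aaff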
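The interactive edit can also be expressed as a merge patch, which avoids copying the uid by hand. This is a sketch under assumptions: `snapshot_ref_patch` and `rebind_content` are hypothetical helper names, and it presumes the cluster accepts changes to this field in the same way it accepted the kubectl edit above.

```shell
#!/bin/sh
# Build a merge patch that re-points a VolumeSnapshotContent at a new
# VolumeSnapshot. Both functions are hypothetical helpers for this sketch.
snapshot_ref_patch() {
  name="$1"; ns="$2"; uid="$3"
  printf '{"spec":{"volumeSnapshotRef":{"name":"%s","namespace":"%s","uid":"%s"}}}' \
    "$name" "$ns" "$uid"
}

# Arguments: content name, new snapshot name, new snapshot namespace.
rebind_content() {
  content="$1"; snap="$2"; ns="$3"
  # Read the UID of the new VolumeSnapshot so the reference matches it.
  uid="$(kubectl -n "$ns" get volumesnapshot "$snap" -o jsonpath='{.metadata.uid}')"
  kubectl patch volumesnapshotcontent "$content" --type=merge \
    -p "$(snapshot_ref_patch "$snap" "$ns" "$uid")"
}
```

With these helpers, `rebind_content snapcontent-428c9f1d-69e1-46b0-93d5-dac44b795aaa velero-mysql-snapshot poc` would perform the second of the two edits.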
- In the new namespace, create PVCs whose data source is the two VolumeSnapshots in that namespace.
# mysql-pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-ceph-block
  dataSource:
    name: velero-mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
# wp-pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
spec:
  storageClassName: rook-ceph-block
  dataSource:
    name: velero-wp-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
root$ kubectl -n poc apply -f wp-pv-claim.yaml -f mysql-pv-claim.yaml
root$ kubectl -n poc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-7fcf6a03-abd5-4692-92d5-63504891de57 10Gi RWO rook-ceph-block 4s
wp-pv-claim Bound pvc-a1234890-18a0-4d5a-9435-1b74632e8f17 2Gi RWO rook-ceph-block 4s
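- Once the PVCs are created, the cluster provisions new PVs for them. Change the reclaim policy of these PVs to Retain so that the PVs, and the data on them, are kept after the PVCs are deleted.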
kubectl patch pv pvc-7fcf6a03-abd5-4692-92d5-63504891de57 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
kubectl patch pv pvc-a1234890-18a0-4d5a-9435-1b74632e8f17 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
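- Delete the PVCs mysql-pv-claim and wp-pv-claim in poc, then recreate them with volumeName set to the names of the PVs created above so that the new PVCs bind to those PVs. A retained PV stays in the Released state and still references the deleted claim, so that reference (spec.claimRef) has to be cleared before the new PVC can bind. After these steps (volumesnapshotcontent -> PVC -> PV -> PVC2), the data behind the PVCs in poc is identical to the WordPress snapshot data.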
# mysql-pv-claim-2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-7fcf6a03-abd5-4692-92d5-63504891de57
# wp-pv-claim-2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  volumeName: pvc-a1234890-18a0-4d5a-9435-1b74632e8f17
root$ kubectl -n poc apply -f wp-pv-claim-2.yaml -f mysql-pv-claim-2.yaml
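This step deletes the original PVCs before the new ones are applied, and the automated run later in this article logs a "remove reference" update on each PV at the same point. The sketch below shows one way to do that delete-and-release part before applying the new PVC manifests. The helper names `claimref_remove_patch` and `release_pv` are hypothetical, and the sketch assumes the PV has already been switched to Retain.

```shell
#!/bin/sh
# JSON patch that drops the stale claim reference from a Released PV.
claimref_remove_patch() {
  printf '%s' '[{"op":"remove","path":"/spec/claimRef"}]'
}

# Delete a PVC and make its retained PV bindable again.
# release_pv is a hypothetical helper; it assumes the PV is set to Retain,
# otherwise deleting the PVC would delete the volume as well.
# Arguments: namespace, PVC name. Prints the PV name on stdout.
release_pv() {
  ns="$1"; pvc="$2"
  pv="$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}')"
  # kubectl status messages go to stderr so stdout carries only the PV name.
  kubectl -n "$ns" delete pvc "$pvc" >&2
  kubectl patch pv "$pv" --type=json -p "$(claimref_remove_patch)" >&2
  printf '%s\n' "$pv"
}
```

For example, `pv=$(release_pv poc mysql-pv-claim)` leaves the PV name in `$pv`, which is the value to put in volumeName of the recreated PVC.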
- Change the reclaim policy of both PVs back to Delete.
kubectl patch pv pvc-7fcf6a03-abd5-4692-92d5-63504891de57 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}' --type=merge
kubectl patch pv pvc-a1234890-18a0-4d5a-9435-1b74632e8f17 -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}' --type=merge
- Create temporary pods in the new namespace and mount the PVCs in them.
root$ kubectl -n poc get pods
NAME READY STATUS RESTARTS AGE
stage-wordpress-589f976cd5-4ns55-krjxs 1/1 Running 0 56s
stage-wordpress-mysql-d9b8d8884-2kmtb-q9qnf 1/1 Running 0 56s
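The article does not show the manifests of these temporary pods. As a rough illustration of what such a staging pod could look like, the sketch below renders a pod that only mounts a PVC and sleeps, so that a file-system-level backup has a running pod to read the volume through. Everything specific here is an assumption: the helper name `render_stage_pod`, the busybox image, the sleep command, and the /data mount path.

```shell
#!/bin/sh
# Render a throwaway pod that mounts one PVC. render_stage_pod is a
# hypothetical helper; image, command, and mount path are placeholders.
# Arguments: pod name, PVC name, namespace (defaults to poc).
render_stage_pod() {
  name="$1"; pvc="$2"; ns="${3:-poc}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  containers:
    - name: stage
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${pvc}
EOF
}
```

A pod for the MySQL volume could then be created with `render_stage_pod stage-mysql mysql-pv-claim | kubectl apply -f -`, where `stage-mysql` is again just an example name.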
- Run a Velero file-system-level backup of the poc namespace. Once it succeeds, all of the WordPress data has been exported to the remote QingStor.
# poc-filesystem-manual.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.19.5
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "19"
  namespace: velero
  name: poc-filesystem-manual
spec:
  defaultVolumesToRestic: true
  hooks: {}
  snapshotVolumes: false
  includedNamespaces:
    - poc
  storageLocation: qingstor-vbbf8
  ttl: 2h0m0s
root$ kubectl -n velero apply -f poc-filesystem-manual.yaml
root$ kubectl -n velero get backups.velero.io poc-filesystem-manual
NAME AGE
poc-filesystem-manual 95s
root$ kubectl -n velero describe backups.velero.io poc-filesystem-manual
...
Status:
Completion Timestamp: 2021-11-18T05:49:18Z
Expiration: 2021-12-18T05:47:35Z
Format Version: 1.1.0
Phase: Completed
Progress:
Items Backed Up: 31
Total Items: 31
Start Timestamp: 2021-11-18T05:47:35Z
Version: 1
Events: <none>
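Instead of reading the describe output by eye, the Phase field can be polled until the backup finishes. The sketch below works for both Backup and Restore objects; `wait_for_phase` is a hypothetical helper, and the 5-second interval and default of 60 attempts are arbitrary choices.

```shell
#!/bin/sh
# Poll a Velero Backup or Restore in the velero namespace until it reaches a
# final phase. wait_for_phase is a hypothetical helper for this sketch.
# Arguments: resource kind, object name, number of attempts (default 60).
wait_for_phase() {
  kind="$1"; name="$2"; tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    phase="$(kubectl -n velero get "$kind" "$name" -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed) echo "$phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation) echo "$phase"; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 5
  done
  echo "timeout"
  return 1
}
```

For the backup above this would be called as `wait_for_phase backups.velero.io poc-filesystem-manual`; the same function can be pointed at `restores.velero.io` objects during the restore.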
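Restoring the data
After the backup succeeds, all data has been transferred to the remote object storage. If a failure now takes out the storage, the namespace and its data can be restored from remote object storage as follows.
- Simulate a disaster by deleting the wordpress namespace.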
root$ kubectl delete ns wordpress
- Create a Velero Restore CR that restores the data backed up earlier into the wordpress namespace. After the restore succeeds, delete the two temporary pods.
# poc-restore-manual.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: poc-restore
  namespace: velero
spec:
  backupName: poc-filesystem-manual
  excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
  hooks: {}
  namespaceMapping:
    poc: wordpress
  restorePVs: true
root$ kubectl -n velero apply -f poc-restore-manual.yaml
- Then use Velero to restore the CRs and other resources from the WordPress snapshot backup into the wordpress namespace, excluding PV and PVC resources. Wait for the pods to run.
# wp-restore-manual.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: wp-restore
  namespace: velero
spec:
  backupName: wp-snap-manual
  excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
    - persistentvolume
    - persistentvolumeclaim
  hooks: {}
  namespaceMapping:
    wordpress: wordpress
  restorePVs: true
root$ kubectl -n velero apply -f wp-restore-manual.yaml
root$ kubectl -n wordpress get pods
NAME READY STATUS RESTARTS AGE
wordpress-mysql-d9b8d8884-2kmtb 1/1 Running 0 18s
wordpress-589f976cd5-4ns55 1/1 Running 0 18s
- Now verify that the WordPress data has been fully restored. Open WordPress: the post shown below is still there, which confirms that the restore succeeded.
<img src="https://gitee.com/jibutech/tech-docs/raw/master/images/wordpress-demo-2.png" style="zoom:50%;" />
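Automation
The author has written a small tool, data-mover, that automates the manual backup and restore flow above. You are welcome to try it and share your suggestions.
The program output looks like this:
1. Backing up the data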
root data-mover % go run main.go --action backup --backupName wp-backup-snap-76mxp-hzb2f --namespace wordpress
=== Step 0. Create temporay namespace + dm-wp-backup-snap-76mxp-hzb2f
=== Step 1. Create new volumesnapshot in temporary namespace
name: velero-mysql-pv-claim-q6jgv, uid: 532b6050-1bd7-4a6f-abfc-1a900bb52fc1, pvc: mysql-pv-claim, content_name: snapcontent-532b6050-1bd7-4a6f-abfc-1a900bb52fc1
Deleted volumesnapshot: velero-mysql-pv-claim-q6jgv in namesapce wordpress
Created volumesnapshot: velero-mysql-pv-claim-q6jgv in dm-wp-backup-snap-76mxp-hzb2f
name: velero-wp-pv-claim-p4lhl, uid: ada383d6-c23d-48fc-93fd-cad20f863cf4, pvc: wp-pv-claim, content_name: snapcontent-ada383d6-c23d-48fc-93fd-cad20f863cf4
Deleted volumesnapshot: velero-wp-pv-claim-p4lhl in namesapce wordpress
Created volumesnapshot: velero-wp-pv-claim-p4lhl in dm-wp-backup-snap-76mxp-hzb2f
=== Step 2. Update volumesnapshot content to new volumesnapshot in temporary namespace
Update volumesnapshotcontent snapcontent-532b6050-1bd7-4a6f-abfc-1a900bb52fc1 to remove snapshot reference
Update volumesnapshotcontent snapcontent-ada383d6-c23d-48fc-93fd-cad20f863cf4 to remove snapshot reference
=== Step 3. Create pvc reference to the new volumesnapshot in temporary namespace
Created pvc mysql-pv-claim in dm-wp-backup-snap-76mxp-hzb2f
Created pvc wp-pv-claim in dm-wp-backup-snap-76mxp-hzb2f
=== Step 4. Recreate pvc to reference pv created in step 3
Get pvc mysql-pv-claim and pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346
Patch pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 with retain option
Deleted pvc mysql-pv-claim
Update pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Update pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Create pvc mysql-pv-claim in dm-wp-backup-snap-76mxp-hzb2f with pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346
Patch pv pvc-7fb33118-02a7-42db-9b18-2ba2a88c1346 with delete option
Get pvc wp-pv-claim and pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a
Patch pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a with retain option
Deleted pvc wp-pv-claim
Update pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Update pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a to remove reference in dm-wp-backup-snap-76mxp-hzb2f
Create pvc wp-pv-claim in dm-wp-backup-snap-76mxp-hzb2f with pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a
Patch pv pvc-297cb6ad-322b-4a9a-80a8-e51057d0e28a with delete option
=== Step 5. Create pod with pvc created in step 4
build stage pod wordpress-589f976cd5-vbj5z
build stage pod wordpress-mysql-d9b8d8884-9g4r5
=== Step 6. Invoke velero to backup the temporary namespace using file system copy
Get velero backup plan wp-backup-snap-76mxp-hzb2f
Created velero backup plan generate-backup-kql6f
2. Restoring the data
root data-mover % go run main.go --action restore --backupName wp-backup-snap-76mxp-hzb2f --namespace wordpress
=== Step 1. Get filesystem copy backup
generate-backup-kql6f
=== Step 2. Delete namespace
=== Step 3. Invoke velero to restore the temporary namespace to given namespace
Created velero restore plan generate-restore-ppdmp
=== Step 4. Delete pod in given namespace
Deleted pod stage-wordpress-589f976cd5-vbj5z-d4zg7
Deleted pod stage-wordpress-mysql-d9b8d8884-9g4r5-xzr8r
=== Step 5. Invoke velero to restore original namespace
Created velero restore plan generate-restore-tfqhz
References
Container Storage Interface Snapshot Support in Velero
https://velero.io/docs/v1.7/csi/#docs
Backup Storage Locations and Volume Snapshot Locations
https://velero.io/docs/v1.7/locations/#limitations--caveats
Which resources Velero backs up
https://velero.cn/d/8-velero
Safeguarding critical cloud-native workloads: Velero backup and disaster recovery best practices
https://mp.weixin.qq.com/s/9KZmH_pT6p5NmtzqdoQ_gg