AKS – Quickstart to Backups & Restore with Velero

In this short blog post I would like to show you how easily you can back up the state of your cluster, create snapshots of your persistent volumes (Azure Disks) and restore them with Velero. In the Velero setup I will use a plugin that lets Velero work with Azure.

Requirements

To follow the instructions below there are a couple of requirements:

  • You have an Azure account. If needed, you can create a free account, which comes with $200 in credit for the first 30 days and 12 months of selected free services
  • You have git installed
  • You have the Azure CLI, Helm and jq installed
    • We will deploy Velero via its Helm chart
  • You have VS Code (or another code editor) installed
  • You have experience with git, working from the command line and Azure

Setup Azure Services

For the whole setup we need some Azure resources: resource groups, a Kubernetes cluster, a storage account and a service principal. We will create them step by step.

AKS cluster

First we need to set up an Azure Kubernetes Service cluster:

# Set variables
TENANT_ID=<<COPY TENANT_ID FROM AZURE PORTAL>>

# Log in to Azure
az login -t "${TENANT_ID}"

# Set variables
export SUBSCRIPTION_ID=$(az account show -o json | jq -r '.id')
export AKS_RESOURCE_GROUP='velero-rg'
export AKS_NAME='velero'
export BACKUP_RESOURCE_GROUP='velero-backup-rg'
export STORAGE_ACCOUNT_NAME=$(openssl rand -hex 12)
export STORAGE_ACCOUNT_CONTAINER_NAME='velero'

# Create AKS cluster
az group create --name "${AKS_RESOURCE_GROUP}" --location westeurope
az aks create --name "${AKS_NAME}" \
    --resource-group "${AKS_RESOURCE_GROUP}" \
    --enable-cluster-autoscaler \
    --min-count 2 \
    --max-count 3 \
    --no-ssh-key \
    --network-plugin azure \
    --network-policy azure \
    --location westeurope

# Connect to cluster
az aks install-cli
az aks get-credentials --name "${AKS_NAME}" \
    --resource-group "${AKS_RESOURCE_GROUP}" \
    --admin \
    --overwrite-existing

# Check cluster works
kubectl get pod -A

# Retrieve the AKS node resource group, you will need it later on
# This resource group is created automatically with AKS and holds the VMs and disks
AZURE_RESOURCE_GROUP=$(az aks show --name "${AKS_NAME}" --resource-group "${AKS_RESOURCE_GROUP}" -o json | jq -r '.nodeResourceGroup')

Service Principal

After the cluster has been created we create a Service Principal that will be used by Velero:

# Create Service Principal and capture its generated secret
AZURE_CLIENT_SECRET=$(az ad sp create-for-rbac --name "velero" --role "Contributor" \
  --scopes "/subscriptions/${SUBSCRIPTION_ID}" --query 'password' -o tsv)

# Look up the application (client) ID of the Service Principal
AZURE_CLIENT_ID=$(az ad sp list --display-name "velero" --query '[0].appId' -o tsv)

You can check here for more fine-grained permissions than Contributor at subscription level.

Storage Account

We also need a separate resource group with a storage account in which the backups will be stored. The snapshots of the Azure Disks will be stored in that backup resource group as well.

# Create Backup Resource Group
az group create --name "${BACKUP_RESOURCE_GROUP}" --location westeurope

# Create storage account
az storage account create --name "${STORAGE_ACCOUNT_NAME}" \
  --resource-group "${BACKUP_RESOURCE_GROUP}" \
  --location westeurope

# Create container
az storage container create \
  --name "${STORAGE_ACCOUNT_CONTAINER_NAME}" \
  --account-name "${STORAGE_ACCOUNT_NAME}" \
  --public-access off
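
Storage account names must be 3 to 24 characters long and may contain only lowercase letters and digits, which the 24 hex characters from openssl rand -hex 12 satisfy. If you prefer to pick your own name, a small check like the one below can catch an invalid name before the account is created. This is a minimal sketch; the function name is hypothetical and not part of the Azure CLI, and the fallback name is only an example value.

```shell
# Hypothetical helper: checks the Azure storage account naming rules
# (3-24 characters, lowercase letters and digits only).
# The body runs in a subshell so the locale change does not leak out.
is_valid_storage_account_name() (
  LC_ALL=C   # keep the a-z range ASCII-only regardless of the user's locale
  name="$1"
  case "${name}" in
    ''|*[!a-z0-9]*) exit 1 ;;   # empty, or contains a disallowed character
  esac
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ]
)

# Example: validate the name before calling az storage account create.
candidate="${STORAGE_ACCOUNT_NAME:-velerobackup01}"
if is_valid_storage_account_name "${candidate}"; then
  echo "ok: ${candidate}"
else
  echo "invalid storage account name: ${candidate}" >&2
fi
```

The check only covers the format; whether the name is still globally available is something only Azure can tell you.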

Persistent Volume Claim

Now that we have our Azure resources in place we can deploy a simple pod with a persistent volume claim. We will deploy it in the namespace test.

You can deploy it by saving the example below locally to a yaml file and applying it, or you can just use kubectl apply -f https://gist.githubusercontent.com/bramvdklinkenberg/57d7bc663d5e56d50dc776a6c09a331b/raw/8dd1fec6a5d27e8fb0c68c1eec3554b56ade6dab/pod_pvc.yaml.
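
For orientation, a manifest along the following lines would lead to the pod and pvc listed in this section. It is a sketch reconstructed from that output (namespace test, pod mypod, claim azure-managed-disk, 5Gi, managed-premium, ReadWriteOnce) and not a copy of the gist; the file name, the container image and the mount path are assumptions.

```shell
# Write a sketch of the manifest to a local file (hypothetical file name).
cat > pod_pvc_sketch.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
  namespace: test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: test
spec:
  containers:
    - name: mypod
      image: nginx                  # assumption: any long-running image will do
      volumeMounts:
        - mountPath: /mnt/azure     # assumption: illustrative mount path
          name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: azure-managed-disk
EOF

# Apply it once the cluster credentials are in place:
#   kubectl apply -f pod_pvc_sketch.yaml
```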

When you check the pod and pvc you will see a result similar to the one below:

❯ kubectl get pod,pvc --namespace test
NAME        READY   STATUS    RESTARTS   AGE
pod/mypod   1/1     Running   0          40s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/azure-managed-disk   Bound    pvc-a53a067d-9ccb-407a-bf8a-5f1c697392ea   5Gi        RWO            managed-premium   40s

Setup Velero

We will install Velero to the cluster via Helm.

Velero Helm Chart

First we need to clone the repository with the Velero helm chart.

# Clone helm chart
git clone https://github.com/vmware-tanzu/helm-charts.git
cd helm-charts/charts/velero/

Once we have cloned the repository we have to create the credentials file that Velero needs.

Replace the <<VALUES>> with your own data!

The AZURE_RESOURCE_GROUP value must be the node resource group of the cluster!


cat << EOF  > ./credentials-velero
AZURE_SUBSCRIPTION_ID=<<SUBSCRIPTION_ID>>
AZURE_TENANT_ID=<<TENANT_ID>>
AZURE_CLIENT_ID=<<AZURE_CLIENT_ID>>
AZURE_CLIENT_SECRET=<<AZURE_CLIENT_SECRET>>
AZURE_RESOURCE_GROUP=<<AZURE_RESOURCE_GROUP>>
AZURE_CLOUD_NAME=AzurePublicCloud
EOF
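
If the variables from the earlier steps are still set in your shell, the file can also be generated from them instead of being edited by hand. The following is a minimal sketch, assuming TENANT_ID, SUBSCRIPTION_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_RESOURCE_GROUP (the node resource group) were set as shown above; the function name is hypothetical.

```shell
# Hypothetical helper: writes the Velero credentials file from shell variables.
# Returns 1 if one of the required variables is empty or unset.
write_velero_credentials() {
  target="${1:-./credentials-velero}"
  for var in SUBSCRIPTION_ID TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_RESOURCE_GROUP; do
    eval "value=\${${var}:-}"
    if [ -z "${value}" ]; then
      echo "missing variable: ${var}" >&2
      return 1
    fi
  done
  # The file holds a secret: a newly created file is readable by the current user only.
  ( umask 077
    printf '%s\n' \
      "AZURE_SUBSCRIPTION_ID=${SUBSCRIPTION_ID}" \
      "AZURE_TENANT_ID=${TENANT_ID}" \
      "AZURE_CLIENT_ID=${AZURE_CLIENT_ID}" \
      "AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}" \
      "AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}" \
      "AZURE_CLOUD_NAME=AzurePublicCloud" > "${target}" )
}

# Usage, from the chart directory:
#   write_velero_credentials ./credentials-velero
```

Failing on an empty variable is deliberate: a credentials file with a blank value would only surface later as a failed backup.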

Lastly, we need to do a bit of editing in the values file. I took the default values file and edited the following settings:

  • initContainers
  • configuration
    • configuration.backupStorageLocation
    • configuration.volumeSnapshotLocation
  • schedules

The initContainers setting makes sure we deploy the plugin that Velero needs to work on Azure.

In the configuration section we specify the Azure resources, so Velero knows in which storage account to save the backups and in which resource group to store the snapshots.

You can find the adjusted values file here, or you can copy the example below.

Again, replace the <<VALUES>>!!
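
As a sketch of what these settings can look like, the snippet below writes a small override file. It assumes the 2.x layout of the Velero chart (the release shown in this post is velero-2.27.1); newer chart versions structure the configuration block differently. The file name, the plugin image tag and the ttl value are assumptions and are not taken from the author's values file. The schedule name mybackup is inferred from the backup names shown later in this post.

```shell
# Write a sketch of the chart overrides (hypothetical file name).
# The <<VALUES>> placeholders still have to be replaced by hand.
cat > values-azure-sketch.yaml << 'EOF'
initContainers:
  - name: velero-plugin-for-microsoft-azure
    image: velero/velero-plugin-for-microsoft-azure:v1.3.1   # assumption: tag matching Velero 1.7
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins

configuration:
  provider: azure
  backupStorageLocation:
    name: default
    bucket: <<STORAGE_ACCOUNT_CONTAINER_NAME>>
    config:
      resourceGroup: <<BACKUP_RESOURCE_GROUP>>
      storageAccount: <<STORAGE_ACCOUNT_NAME>>
      subscriptionId: <<SUBSCRIPTION_ID>>
  volumeSnapshotLocation:
    name: default
    config:
      resourceGroup: <<BACKUP_RESOURCE_GROUP>>
      subscriptionId: <<SUBSCRIPTION_ID>>

schedules:
  mybackup:
    schedule: "0 1 * * *"   # every day at 01:00
    template:
      ttl: "120h"           # assumption: how long a backup is kept
EOF
```

Because Helm merges files passed with -f over the chart defaults, a short override file like this can be used in place of an edited copy of the full values.yaml.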

Deploy Velero Helm Chart

We are now ready to deploy the Velero helm chart!

# Deploy the velero helm chart
helm upgrade velero --install --create-namespace --namespace velero --set-file credentials.secretContents.cloud=./credentials-velero . -f values.yaml

Once it has been deployed you can check it; you should see results similar to those below:

❯ helm -n velero list
NAME  	NAMESPACE	REVISION	UPDATED                            	STATUS  	CHART        	APP VERSION
velero	velero   	1       	2022-01-04 23:14:39.96287 +0100 CET	deployed	velero-2.27.1	1.7.1
❯ kubectl -n velero get deployment
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
velero   1/1     1            1           9m

Backups & Snapshots

In the values file we configured a schedule that runs every day at 1 AM. Right after you first deploy (or update) the Helm chart, Velero also runs an initial backup and creates a first snapshot of the disks found in the AKS node resource group.

Azure Portal

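Check the Azure portal and you will see the following:

Backup: the backup files are stored in the velero container of the storage account.

Snapshots: the disk snapshots are stored in the backup resource group.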

Velero CLI

You can also use the Velero CLI to check your backups.

❯ velero backup get
NAME                             STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
velero-mybackup-20220104224439   Completed         0        0          2022-01-04 23:44:39 +0100 CET   4d        default            <none>
velero-mybackup-20220104224242   Completed         0        0          2022-01-04 23:42:42 +0100 CET   4d        default            <none>
velero-mybackup-20220104224039   Completed         0        0          2022-01-04 23:40:44 +0100 CET   4d        default            <none>
velero-mybackup-20220104223706   Completed         0        0          2022-01-04 23:37:06 +0100 CET   4d        default            <none>

If you see failed backups then you can use velero backup describe <backup name> and velero backup logs <backup name> to check for more information on why the backup has failed.

Kubectl

You can also use kubectl to get an overview of your backups.

❯ kubectl -n velero get backups
NAME                             AGE
velero-mybackup-20220104223706   25m
velero-mybackup-20220104224039   25m
velero-mybackup-20220104224242   23m
velero-mybackup-20220104224439   21m

For more information about the backups or for troubleshooting purposes you can use kubectl -n velero describe backup <backup name> and kubectl -n velero logs <velero pod name>.

Restore

So, now the cool part. We will delete the test namespace, which also deletes the pod and its pvc, and then restore it from a backup and snapshot that Velero created at the time we configured in the schedule.

We can now adjust the schedule and update the helm release. Set the schedule 2 minutes ahead of your current time and save the values file.
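
Velero schedules take a five-field cron expression. The backup names shown earlier carry a timestamp one hour behind the CET creation time (22:44 in the name, 23:44 +0100 in the CREATED column), which suggests the schedule is evaluated in UTC rather than in your local time zone. Under that assumption, a minimal sketch for working out "two minutes from now" could look like this; the function name is hypothetical.

```shell
# Hypothetical helper: prints a daily cron expression (UTC) for the moment
# that lies OFFSET minutes after EPOCH (seconds since 1970-01-01 00:00 UTC).
cron_after_minutes() {
  epoch="$1"
  offset="$2"
  total_minutes=$(( epoch / 60 + offset ))
  minute=$(( total_minutes % 60 ))
  hour=$(( total_minutes / 60 % 24 ))
  echo "${minute} ${hour} * * *"
}

# Two minutes from now, expressed in UTC.
cron_after_minutes "$(date +%s)" 2
```

The printed value can be pasted into the schedule field of the values file before the release is updated.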

# Update the release after the schedule change
helm upgrade velero --install --create-namespace --namespace velero --set-file credentials.secretContents.cloud=./credentials-velero . -f values.yaml

Check with velero get backups whether the latest backup has completed and copy the name of that backup! We will then remove the namespace.
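
To avoid deleting the namespace before the backup has finished, the check can also be scripted. The sketch below polls the phase of the Backup resource with kubectl until it reaches a final state. The function name and the default retry values are hypothetical, and the list of failure phases covers the ones I am aware of rather than an exhaustive set.

```shell
# Hypothetical helper: polls a Velero Backup resource until it reaches a final phase.
# Usage: wait_for_backup <backup name> [attempts] [seconds between attempts]
wait_for_backup() {
  name="$1"
  attempts="${2:-30}"
  interval="${3:-10}"
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    phase="$(kubectl -n velero get backup "${name}" -o jsonpath='{.status.phase}')"
    case "${phase}" in
      Completed)
        echo "backup ${name} completed"
        return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup ${name} ended in phase ${phase}" >&2
        return 1 ;;
    esac
    # Any other phase (for example InProgress): wait and try again.
    i=$(( i + 1 ))
    sleep "${interval}"
  done
  echo "timed out waiting for backup ${name}" >&2
  return 1
}

# Usage:
#   wait_for_backup <backup name> && kubectl delete ns test
```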

# Delete namespace
kubectl delete ns test

# Check if the namespace has been deleted
kubectl get ns

Now we can restore the namespace, including the pod and its pvc with the velero cli:

❯ velero restore create --from-backup velero-mybackup-20220104233139
Restore request "velero-mybackup-20220104233139-20220105003756" submitted successfully.
Run `velero restore describe velero-mybackup-20220104233139-20220105003756` or `velero restore logs velero-mybackup-20220104233139-20220105003756` for more details.

Once the restore is done you can see that your workload is back as well.

❯ kubectl -n test get pod,pvc
NAME        READY   STATUS    RESTARTS   AGE
pod/mypod   1/1     Running   0          70s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/azure-managed-disk   Bound    pvc-a53a067d-9ccb-407a-bf8a-5f1c697392ea   5Gi        RWO            managed-premium   70s

When you check the AKS node resource group you will see that the disk of the restored pvc now has a name that starts with restore- instead of pvc-.

Velero adds some tags to the disk, which show the new pvc name on the AKS cluster and the backup it was created from.
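
The same check can be done from the command line instead of the portal. This is a small sketch, assuming AZURE_RESOURCE_GROUP still holds the node resource group retrieved earlier; the function name is hypothetical.

```shell
# Hypothetical helper: lists the disks in a resource group whose name starts
# with "restore-", together with their tags.
list_restored_disks() {
  az disk list --resource-group "$1" \
    --query "[?starts_with(name, 'restore-')].{name: name, tags: tags}" \
    -o json
}

# Usage:
#   list_restored_disks "${AZURE_RESOURCE_GROUP}"
```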

Conclusion

As you can see, it is fairly easy to implement and configure Velero to create backups of your cluster state and snapshots of your disks.

The example in this blog is of course a simplified setup to get you up and running quickly. For more specific configuration options (like scheduling, backup storage and volume snapshot locations) I would refer you to the Velero docs and the Azure Plugin Repository.
