Situation: we are in a cloud environment using bare-metal servers running VMware vSphere, VMware vSAN, and VMware NSX.
Cloud backup is the topic of this article. The architecture decision was to make it possible to restore servers immediately if VMware vSAN fails.
To make this possible, I moved the critical restore components to a cloud-provided NFS datastore:
- vCenter Server
- Backup Server
- Backup proxy
If the primary datastore (here, VMware vSAN) fails, the goal is to start the restore immediately rather than first installing vCenter or a backup proxy, which could push you outside the restore window.
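The placement rule above can be checked mechanically. Here is a minimal sketch, assuming a hand-maintained inventory: the `Vm` type, the VM names, and the role and datastore labels are all hypothetical, and nothing here queries vCenter.

```python
from dataclasses import dataclass


# Hypothetical inventory model; names and labels are illustrative only.
@dataclass(frozen=True)
class Vm:
    name: str
    role: str            # "vcenter", "backup-server", "backup-proxy", "workload"
    datastore_type: str  # "vsan" or "nfs"


# The three components listed above that must survive a vSAN failure.
RESTORE_CRITICAL_ROLES = {"vcenter", "backup-server", "backup-proxy"}


def misplaced_restore_components(inventory):
    """Return names of restore-critical VMs that still live on vSAN."""
    return [
        vm.name
        for vm in inventory
        if vm.role in RESTORE_CRITICAL_ROLES and vm.datastore_type == "vsan"
    ]


inventory = [
    Vm("vc01", "vcenter", "nfs"),
    Vm("bkp01", "backup-server", "nfs"),
    Vm("proxy01", "backup-proxy", "vsan"),
    Vm("app01", "workload", "vsan"),
]
print(misplaced_restore_components(inventory))
```

An empty result means every restore component would still be reachable if vSAN were lost.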
Up to this point everything looked straightforward and logical, but then I ran into strange behavior.
The backup proxy powers off while mounting VMware vSAN disks.
The issue is strictly related to VMware vSAN: other VMs that were not on VMware vSAN continued to back up normally, and a VM copied from vSAN to a local datastore backs up normally too. The behavior is bad because it interrupts all backups in progress and prevents new backups from starting. Basically, you lose your backups for that day. I would expect the backup of the affected VM to fail with errors while the backups of other VMs continue to run.
The vSphere logs show: (vmx) Unexpected signal 11
At this point I opened an SR with VMware and a PMR for the backup product. Nobody was able to identify the problem, and support kept asking for logs from various components that were not affected.
So I kept searching for the symptoms, and one VMware KB article, 2146829, helped me work around the issue.
Resolution: the backup proxy needs to be on the VMware vSAN datastore to back up VMs on VMware vSAN correctly.
But wait, this violates the architecture decision for the restore. Creating a second backup proxy used only for restores and placing it outside the vSAN datastore seems to be the only option here.
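The two-proxy workaround can be sketched as a simple selection rule. This is only an illustration under assumptions: the `Proxy` type, the proxy names, and the `pick_proxy` helper are hypothetical and do not correspond to any backup product's API.

```python
from dataclasses import dataclass


# Hypothetical model of the two proxies; not tied to any backup product.
@dataclass(frozen=True)
class Proxy:
    name: str
    datastore_type: str  # "vsan" or "nfs"
    purpose: str         # "backup" or "restore"


def pick_proxy(proxies, job):
    """Pick a proxy for a job, which is either "backup" or "restore".

    Backups of vSAN VMs use the proxy that lives on vSAN.
    Restores use the proxy kept outside vSAN, so it survives a vSAN failure.
    """
    if job not in ("backup", "restore"):
        raise ValueError(f"unknown job type: {job}")
    for proxy in proxies:
        on_vsan = proxy.datastore_type == "vsan"
        if proxy.purpose == job and on_vsan == (job == "backup"):
            return proxy
    raise LookupError(f"no suitable proxy for {job}")


proxies = [
    Proxy("proxy-backup", "vsan", "backup"),
    Proxy("proxy-restore", "nfs", "restore"),
]
print(pick_proxy(proxies, "backup").name)
print(pick_proxy(proxies, "restore").name)
```

The point of the split is that losing vSAN takes out only the backup proxy, while the restore proxy, vCenter, and the backup server remain available on NFS.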
Have any of you run into this issue, or do you know of another solution? Post a comment!