Wednesday, October 5, 2011

VMware VCB backup problem (with iSCSI LUN)

So you're using VMware Consolidated Backup (VCB) with iSCSI disks and the backup is no longer working?
First, check the output of a failed vcbMounter backup job:

[2011-06-30 10:17:24.315 'App' 5664 error] No path to device LVID:2bedb412-a987b654-1234-012a345b6cde/2bedb412-9bc87d6e-abcd-012a345b6cde/1 found.
[2011-06-30 10:17:24.315 'BlockList' 5664 error]
[2011-06-30 10:17:24.529 'vcbMounter' 5664 error] Error: Failed to open the disk: Cannot access a SAN/iSCSI LUN backing this virtual disk. (Hint: If you are using vcbMounter you can use the option "-m nbd" to switch to network based disk access if this is what you want.) If you were attempting file-level access, stop the vmount Service by typing "net stop vmount2" on a command prompt to force vmount to re-scan for SAN LUNs and re-try the command.
[2011-06-30 10:17:24.529 'vcbMounter' 5664 error] An error occurred, cleaning up

Running the VCB SAN debug tool (vcbsandbg) is another good way to get more information:

C:\Program Files\VMware\VMware Consolidated Backup Framework>vcbsandbg
[2011-06-30 10:20:40.300 'App' 848 info] Current working directory: C:\Program Files\VMware\VMware Consolidated Backup Framework
[2011-06-30 10:20:40.316 'BaseLibs' 848 info] HOSTINFO: Seeing Intel CPU, numCoresPerCPU 1 numThreadsPerCore 2.
[2011-06-30 10:20:40.316 'BaseLibs' 848 info] HOSTINFO: This machine has 1 physical CPUS, 1 total cores, and 2 logical CPUs.
[2011-06-30 10:20:40.316 'App' 848 verbose] Building SCSI Device List...
[2011-06-30 10:20:40.378 'App' 848 trivia] Evaluating 1 paths.
[2011-06-30 10:20:40.378 'App' 848 trivia] Trying to open path \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}.
[2011-06-30 10:20:40.378 'App' 848 info] Now using Path \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}.
[2011-06-30 10:20:40.378 'App' 848 trivia] Reading 32256 bytes from offset 0.
[2011-06-30 10:20:40.394 'App' 848 trivia] Found 1 partition(s) on this device.
[2011-06-30 10:20:40.394 'App' 848 trivia] Evaluating 1 paths.
[2011-06-30 10:20:40.394 'App' 848 trivia] Trying to open path \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}.
[2011-06-30 10:20:40.394 'App' 848 info] Now using Path \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}.
[2011-06-30 10:20:40.394 'App' 848 trivia] Reading 32256 bytes from offset 0.
[2011-06-30 10:20:40.394 'App' 848 trivia] Found 1 partition(s) on this device.
[2011-06-30 10:20:40.394 'App' 848 error] Dumping SCSI Device/LUN List.
[2011-06-30 10:20:40.394 'App' 848 info] **** Begin SCSI Device LIst ****
[2011-06-30 10:20:40.394 'App' 848 info] Found SCSI Device: NAA:70a9800070654321098765432109876b5c432d102030
[2011-06-30 10:20:40.394 'App' 848 info] Visible on 1 paths:
[2011-06-30 10:20:40.394 'App' 848 info] Device Name: \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}, Bus: 0 Target: 0 Lun: 2
[2011-06-30 10:20:40.409 'App' 848 info] Lun does not contain any VMFS/LVM signatures.
[2011-06-30 10:20:40.409 'App' 848 info] Found SCSI Device: NAA:70a9800070654321098765432109876b5c432d102030
[2011-06-30 10:20:40.409 'App' 848 info] Visible on 1 paths:
[2011-06-30 10:20:40.409 'App' 848 info] Device Name: \\?\scsi#disk&ven_netapp__&prod_lun_____________&rev_7654#1&2abc3d45&6&000001#{23e45678-f9ab-12c3-45d6-01a2b34cde5f}, Bus: 0 Target: 0 Lun: 3
[2011-06-30 10:20:40.409 'App' 848 info] Lun does not contain any VMFS/LVM signatures.
[2011-06-30 10:20:40.409 'App' 848 info] **** End SCSI Device LIst ****
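
With more than a handful of LUNs, the interesting lines are easy to miss. The following is a minimal sketch for filtering them out, assuming the vcbsandbg output has been captured in a text file and copied to a machine with a POSIX shell and awk. The function name summarize_luns is made up for this example, and it only matches the "does not contain" message seen above, since that is the only wording the log shows.

```shell
#!/bin/sh
# Sketch: list the LUNs that vcbsandbg reports as having no VMFS/LVM signature.
# Usage: summarize_luns <saved-vcbsandbg-output>
summarize_luns() {
  awk '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    /Device Name:/ {
      # keep only the "Bus: x Target: y Lun: z" part of the line
      sub(/.*, Bus:/, "Bus:")
      dev = $0
    }
    /Lun does not contain any VMFS\/LVM signatures/ {
      print dev " -> no VMFS signature"
      missing++
    }
    END { print "LUNs without VMFS signature: " missing + 0 }
  ' "$1"
}

# Run directly when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
  summarize_luns "$1"
fi
```

For the output above, this would report both LUNs (Lun: 2 and Lun: 3) as missing a VMFS signature.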

So VCB is no longer able to find the VMFS disk.
In Windows Disk Management, the VMware disk shows up in an Unallocated state.

In the vSphere Client (select a host, choose "Configuration", then "Storage", and change the view to "Devices"), no VMFS partition shows up either.

What happened?
Most probably, the Windows diskpart automount feature (which is enabled by default) has written its own signature to the VMware disks.
http://technet.microsoft.com/en-us/library/cc753703(WS.10).aspx
By the way, it is generally a good idea to disable this feature (in diskpart: automount disable) on a server that is connected to iSCSI LUNs.

Solution: To change the disk signature back to VMFS, connect to the console of a vSphere host.
Log in as 'root' and run the following commands
(first, look for a disk that has no partition details listed):
[root@host /]# fdisk -l
...
Disk /dev/sdd: 408.0 GB, 408063836160 bytes
255 heads, 63 sectors/track, 49610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 Device Boot      Start         End      Blocks   Id  System
                                                                                               <= missing information
[root@host /]#
[root@host /]# fdisk -u /dev/sdd
The number of cylinders for this disk is set to 49610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK):

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-614465535, default 63): 128
Last sector or +size or +sizeM or +sizeK (128-614465535, default 614465535):
Using default value 614465535

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (VMware VMFS)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@host /]# vmkfstools -V
[root@host /]# 
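
On a host with many LUNs, spotting the disk with the empty partition list by eye is tedious. The following is a minimal sketch that filters the fdisk -l output for disks without any partition entries, assuming the classic output format shown above ("Disk /dev/..." header lines, partition lines starting with /dev/). The function name find_empty_disks is made up for this example. An empty partition table does not prove that the disk ever held VMFS, so check each hit before repartitioning.

```shell
#!/bin/sh
# Sketch: read `fdisk -l` output on stdin and print every disk
# that has a "Disk /dev/..." header but no partition lines below it.
find_empty_disks() {
  awk '
    /^Disk \/dev\// {
      # a new disk starts: report the previous one if it had no partitions
      if (disk != "" && parts == 0) print disk
      disk = $2
      sub(/:$/, "", disk)              # "/dev/sdd:" -> "/dev/sdd"
      parts = 0
      next
    }
    /^\/dev\// { parts++ }             # partition entry, e.g. /dev/sda1
    END { if (disk != "" && parts == 0) print disk }
  '
}
```

Usage on the host console would be: fdisk -l 2>/dev/null | find_empty_disks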

For more information, visit the following site:
http://kb.vmware.com/kb/1002281

Attention: If you have several disks with missing VMFS signatures, change all disk signatures at the same time.
If you repair only one disk (e.g. for testing), you may have problems connecting to this 'repaired' disk.
