
Quiesced Snapshot Failed on Windows Server 2012 STD

Started by bananaC , 04 June 2014 - 08:39 AM
5 replies to this topic

Best Answer: James Cannon, 05 June 2014 - 05:08 PM (see the full reply below)

bananaC Members

Banana Chalie
  • 18 posts

Posted 04 June 2014 - 08:39 AM

I have a Windows VM (Windows Server 2012 64-bit) that I could take a quiesced snapshot of successfully about a week ago.

But today, when I tried to take a quiesced snapshot of the same VM, I got the following error message from XenCenter:

Jun 4, 2014 1:59:21 PM Error: Snapshotting VM 'Windows Server 2012 STD'... - The quiesced-snapshot operation failed for an unexpected reason

(I have checked that the Xen VSS provider is installed properly on that VM.)
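
In case it helps, the way I verified the provider (run from inside the guest) was roughly the following; the output should list the Xen/Citrix VSS provider and show the writers as stable:

vssadmin list providers
vssadmin list writers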

 

I tried restarting the toolstack, but it made things worse: XenCenter reports that the operation failed, yet when I look at the snapshot panel a new snapshot has been created (see the attached screenshot).

 

Has anyone run into the same problem and found a way to solve it?

Thanks in advance.

 

P.S. The server is running XenServer 6.2 with SP1, and the VM has the latest XenServer Tools installed.

 

 

 

Attached Thumbnails

  • snap.PNG


James Cannon Citrix Employees

James Cannon
  • 4,402 posts

Posted 04 June 2014 - 02:09 PM

You will want to look at the end of the Storage Manager log (/var/log/SMlog) shortly after the error occurs to determine the cause of the issue. Once done, perhaps you can post the error from SMlog to the forum.
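
For example, something along these lines should pull back the most recent entries (the line counts are just a starting point; adjust as needed):

tail -n 200 /var/log/SMlog
grep -iE "error|exception|failed" /var/log/SMlog | tail -n 50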



bananaC Members

Banana Chalie
  • 18 posts

Posted 05 June 2014 - 02:28 AM

These look like the snapshot-related entries I got from /var/log/SMlog.

(The quiesced snapshot was taken at 10:04:00.)
 
Line 144271: Jun  5 10:04:01 xenserver6.2 snapwatchd: [6869] STATUS: provider-initialized
Line 144272: Jun  5 10:04:02 xenserver6.2 snapwatchd: [6869] STATUS: create-snapshots
Line 144275: Jun  5 10:04:03 xenserver6.2 snapwatchd: [6869] DevUUID requested: d2e161cd-53b1-4cdd-9cb4-1844f6899e52
Line 144276: Jun  5 10:04:03 xenserver6.2 snapwatchd: [6869] Generating snap: Snapshot of 186efb51-02cb-9002-85df-7fddbe0d0836 [2014-6-5:10:4:3]
Line 145047: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] Adding task to the cleanup queue...
Line 145048: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] Generated the task cleanup GC file name: /tmp/SNAPGC:VM:655b794b-5b48-456d-0ba3-203b3d789e67
Line 145049: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] Asynch VM.snapshot was timed out
Line 145050: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] Logging out from xapi session.
Line 145051: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] Unlocking the VM.
Line 145052: Jun  5 10:04:13 xenserver6.2 snapwatchd: [6869] COMPLETION STATUS: snapshots-failed

 

 

Thanks for your help.


Alan Lantz Members

Alan Lantz
  • 6,985 posts

Posted 05 June 2014 - 04:08 AM

Failed snapshots are more than likely a storage issue. Do you have 3x the VM's disk size in free space on the SR? Are there any other existing snapshots on the VM? If you migrate the VM to another host, does it still fail?
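
To check, something like the following from the pool master's console should list SR usage and any existing snapshots (the <vm-uuid> below is a placeholder for the VM's UUID):

xe sr-list params=uuid,name-label,physical-size,physical-utilisation
xe snapshot-list snapshot-of=<vm-uuid> params=uuid,name-label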

 

Alan Lantz

SysAdmin

City of Rogers, AR

 



bananaC Members

Banana Chalie
  • 18 posts

Posted 05 June 2014 - 05:20 AM

Hi

 

I can take a disk snapshot (xe vm-snapshot) and a disk-and-memory snapshot (xe vm-checkpoint) successfully; only the quiesced snapshot (xe vm-snapshot-with-quiesce) fails. The VM is on the local storage of the pool master. After copying it (xe vm-copy) to another local SR in the same pool, the quiesced snapshot still fails.
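
For reference, the quiesced snapshot command I ran was along these lines (the name label is just an example):

xe vm-snapshot-with-quiesce vm="Windows Server 2012 STD" new-name-label="ws2012-quiesced-test"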

 

The storage information in XenCenter shows 7128.4 GB used of 22342 GB total (6947.1 GB allocated).

 

Thanks.



James Cannon Citrix Employees

James Cannon
  • 4,402 posts

Posted 05 June 2014 - 05:08 PM

Free space does not seem to be an issue. Here is your issue:

 

Jun  5 10:04:08 xenserver6.2 SM: [10350] FAILED in util.pread: (rc 22) stdout: 'primary footer invalid: invalid cookie
Jun  5 10:04:08 xenserver6.2 SM: [10350] /dev/VG_XenStorage-8cd473b0-9dcc-f705-e6c8-980edf4a17ed/VHD-5ddaea0f-e079-4c5f-8cde-12a913db46de appears invalid; dumping metadata
Jun  5 10:04:08 xenserver6.2 SM: [10350] VHD Footer Summary:
Jun  5 10:04:08 xenserver6.2 SM: [10350] -------------------
Jun  5 10:04:08 xenserver6.2 SM: [10350] Cookie              : conectix
Jun  5 10:04:08 xenserver6.2 SM: [10350] Features            : (0x00000002) <RESV>
Jun  5 10:04:08 xenserver6.2 SM: [10350] File format version : Major: 1, Minor: 0
Jun  5 10:04:08 xenserver6.2 SM: [10350] Data offset         : 512
Jun  5 10:04:08 xenserver6.2 SM: [10350] Timestamp           : Thu May 22 09:57:01 2014
Jun  5 10:04:08 xenserver6.2 SM: [10350] Creator Application : 'tap'
Jun  5 10:04:08 xenserver6.2 SM: [10350] Creator version     : Major: 1, Minor: 3
Jun  5 10:04:08 xenserver6.2 SM: [10350] Creator OS          : Unknown!
Jun  5 10:04:08 xenserver6.2 SM: [10350] Original disk size  : 61440 MB (64424509440 Bytes)
Jun  5 10:04:08 xenserver6.2 SM: [10350] Current disk size   : 61440 MB (64424509440 Bytes)
Jun  5 10:04:08 xenserver6.2 SM: [10350] Geometry            : Cyl: 30840, Hds: 16, Sctrs: 255
Jun  5 10:04:08 xenserver6.2 SM: [10350]                     : = 61439 MB (64423526400 Bytes)
Jun  5 10:04:08 xenserver6.2 SM: [10350] Disk type           : Differencing hard disk
Jun  5 10:04:08 xenserver6.2 SM: [10350] Checksum            : 0xffffef9c|0xffffef9c (Good!)
Jun  5 10:04:08 xenserver6.2 SM: [10350] UUID                : fa02b090-6d3f-462a-bdcf-9ad744c328f4
Jun  5 10:04:08 xenserver6.2 SM: [10350] Saved state         : No
Jun  5 10:04:08 xenserver6.2 SM: [10350] Hidden              : 1
Jun  5 10:04:08 xenserver6.2 SM: [10350] 
Jun  5 10:04:08 xenserver6.2 SM: [10350] VHD Header Summary:
Jun  5 10:04:08 xenserver6.2 SM: [10350] -------------------
Jun  5 10:04:08 xenserver6.2 SM: [10350] Cookie              : cxsparse
Jun  5 10:04:08 xenserver6.2 SM: [10350] Data offset (unusd) : 18446744073709
Jun  5 10:04:08 xenserver6.2 SM: [10350] Table offset        : 1536
Jun  5 10:04:08 xenserver6.2 SM: [10350] Header version      : 0x00010000
Jun  5 10:04:08 xenserver6.2 SM: [10350] Max BAT size        : 1048576
Jun  5 10:04:08 xenserver6.2 SM: [10350] Block size          : 2097152 (2 MB)
Jun  5 10:04:08 xenserver6.2 SM: [10350] Parent name         : VG_XenStorage--8cd473b0--9dcc--f705--e6c8--980edf4a17ed-VHD--23da75f6--c704--4623--9741--33c6d7945ed8
Jun  5 10:04:08 xenserver6.2 SM: [10350] Parent UUID         : 4abe4d4d-d8d6-44d6-b49e-60ded40adb27
Jun  5 10:04:08 xenserver6.2 SM: [10350] Parent timestamp    : Thu May 22 09:57:00 2014
Jun  5 10:04:08 xenserver6.2 SM: [10350] Checksum            : 0xffffc83a|0xffffc83a (Good!)
Jun  5 10:04:08 xenserver6.2 SM: [10350] 
Jun  5 10:04:08 xenserver6.2 SM: [10350] VHD Parent Locators:
Jun  5 10:04:08 xenserver6.2 SM: [10350] --------------------
Jun  5 10:04:08 xenserver6.2 SM: [10350] locator:            : 0
Jun  5 10:04:08 xenserver6.2 SM: [10350]        code         : PLAT_CODE_MACX
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_space   : 512
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_length  : 110
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_offset  : 4327424
Jun  5 10:04:08 xenserver6.2 SM: [10350]        decoded name : ./VG_XenStorage--8cd473b0--9dcc--f705--e6c8--980edf4a17ed-VHD--23da75f6--c704--4623--9741--33c6d7945ed8
Jun  5 10:04:08 xenserver6.2 SM: [10350] 
Jun  5 10:04:08 xenserver6.2 SM: [10350] locator:            : 1
Jun  5 10:04:08 xenserver6.2 SM: [10350]        code         : PLAT_CODE_W2KU
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_space   : 512
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_length  : 206
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_offset  : 4327936
Jun  5 10:04:08 xenserver6.2 SM: [10350]        decoded name : ./VG_XenStorage--8cd473b0--9dcc--f705--e6c8--980edf4a17ed-VHD--23da75f6--c704--4623--9741--33c6d7945ed8
Jun  5 10:04:08 xenserver6.2 SM: [10350] 
Jun  5 10:04:08 xenserver6.2 SM: [10350] locator:            : 2
Jun  5 10:04:08 xenserver6.2 SM: [10350]        code         : PLAT_CODE_W2RU
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_space   : 512
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_length  : 206
Jun  5 10:04:08 xenserver6.2 SM: [10350]        data_offset  : 4328448
Jun  5 10:04:08 xenserver6.2 SM: [10350]        decoded name : ./VG_XenStorage--8cd473b0--9dcc--f705--e6c8--980edf4a17ed-VHD--23da75f6--c704--4623--9741--33c6d7945ed8
Jun  5 10:04:08 xenserver6.2 SM: [10350] 
Jun  5 10:04:08 xenserver6.2 SM: [10350] VHD Batmap Summary:
Jun  5 10:04:08 xenserver6.2 SM: [10350] -------------------
Jun  5 10:04:08 xenserver6.2 SM: [10350] Batmap offset       : 4196352
Jun  5 10:04:08 xenserver6.2 SM: [10350] Batmap size (secs)  : 256
Jun  5 10:04:08 xenserver6.2 SM: [10350] Batmap version      : 0x00010002
Jun  5 10:04:08 xenserver6.2 SM: [10350] Checksum            : 0xffffffff|0xffffffff (Good!)
 

We need to know more about the VDI:

xe vbd-list vdi-uuid=5ddaea0f-e079-4c5f-8cde-12a913db46de
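
For example, something along these lines should show which VM the VDI belongs to, and let you look at the flagged VHD directly (the device path comes from the log above; on an LVM-based SR the logical volume may need to be activated first, and the exact vhd-util options can vary by version):

xe vbd-list vdi-uuid=5ddaea0f-e079-4c5f-8cde-12a913db46de params=vm-uuid,vm-name-label,device,currently-attached
xe vdi-list uuid=5ddaea0f-e079-4c5f-8cde-12a913db46de params=all
lvchange -ay /dev/VG_XenStorage-8cd473b0-9dcc-f705-e6c8-980edf4a17ed/VHD-5ddaea0f-e079-4c5f-8cde-12a913db46de
vhd-util check -n /dev/VG_XenStorage-8cd473b0-9dcc-f705-e6c8-980edf4a17ed/VHD-5ddaea0f-e079-4c5f-8cde-12a913db46de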

 

If we look further up the log we can see the parents of this VDI. Here is the section of the log showing the chain that contains the problem VDI. The chain starts with the VDI whose ID begins with afa31417... and ends with the VDI whose ID begins with 48e5efb0. The VDIs listed after 48e5efb0 are other branches, but I include them for completeness.

 

Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                         *afa31417[VHD](60.000G//880.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                             *2bf70e4e[VHD](60.000G//688.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                 *b55facdc[VHD](60.000G//176.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                     a46a02e6[VHD](60.000G//8.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                     *929af42d[VHD](60.000G//384.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                         *9a700f9b[VHD](60.000G//212.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                             *b63680c4[VHD](60.000G//156.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                 *23da75f6[VHD](60.000G//200.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                     770f0f2c[VHD](60.000G//60.125G|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                     *5ddaea0f[VHD](60.000G//176.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                         *4692279a[VHD](60.000G//196.000M|a)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                             *abe947bd[VHD](60.000G//152.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                                 *784283e1[VHD](60.000G//164.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                                     *a57708d7[VHD](60.000G//276.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                                         48e5efb0[VHD](60.000G//216.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                                     6c1df35d[VHD](60.000G//8.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                                 4a35478e[VHD](60.000G//8.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                         be55dce7[VHD](60.000G//8.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                                 8736b79d[VHD](60.000G//8.000M|n)
Jun  5 10:04:05 xenserver6.2 SMGC: [10350]                                    

 

I see a lot of nested VDIs. Perhaps you performed a number of snapshots, created VMs from snapshots, did fast copies, or ran backup software that uses snapshots. Since you already did a copy, it should have been a full copy, meaning the new VM should not be chain-linked to any other VDIs; the tree above, however, shows that we are still chain-linked. Further on I see a really bad exception error ...

 

Jun  5 10:04:23 xenserver6.2 SMGC: [10350] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]          ***********************
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]          *  E X C E P T I O N  *
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]          ***********************
Jun  5 10:04:23 xenserver6.2 SMGC: [10350] coalesce: EXCEPTION util.SMException, VHD *5ddaea0f[VHD](60.000G//176.000M|n) corrupted
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]   File "/opt/xensource/sm/cleanup.py", line 1414, in coalesce
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]     self._coalesce(vdi)
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]   File "/opt/xensource/sm/cleanup.py", line 1604, in _coalesce
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]     vdi._doCoalesce()
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]   File "/opt/xensource/sm/cleanup.py", line 1063, in _doCoalesce
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]     self.parent.validate()
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]   File "/opt/xensource/sm/cleanup.py", line 1056, in validate
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]     VDI.validate(self, fast)
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]   File "/opt/xensource/sm/cleanup.py", line 646, in validate
Jun  5 10:04:23 xenserver6.2 SMGC: [10350]     raise util.SMException("VHD %s corrupted" % self)
Jun  5 10:04:23 xenserver6.2 SMGC: [10350] 
Jun  5 10:04:23 xenserver6.2 SMGC: [10350] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 10:04:23 xenserver6.2 SMGC: [10350] Coalesce failed, skipping
 

So, we need to start making full copies of a number of VMs, because the coalesce of these VDIs is failing. If coalescing fails for one VDI chain it may still proceed with other chains, but we must resolve the exception above. Hopefully we can make a full copy of the VM before deleting it, because we will have to delete the bad VDIs noted in the log. :(
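
As a rough sketch, the full copy from the CLI would look something like this (the new name label and target SR UUID are placeholders; verify the copy boots cleanly before removing anything):

xe vm-copy vm="Windows Server 2012 STD" new-name-label="Windows Server 2012 STD - full copy" sr-uuid=<target-sr-uuid>

Only once the copy has been verified would you delete the original VM and the bad VDIs it references.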

