Quiesced Snapshots Fail

Started by lepphce1, 21 April 2014 - 04:16 PM
19 replies to this topic

lepphce1 Members

Ladd Epp
  • 151 posts

Posted 21 April 2014 - 04:16 PM

XenServer 6.2 SP1+update2+update4.

 

All of my Windows 2008+ VMs fail to take quiesced snapshots every time, while non-quiesced snapshots work OK. According to the documentation this should work, so any ideas? Thanks!

 

I'm attaching the XenServer message and source logs and the XenCenter message

 

... and the Windows logs:

"Apr 21 09:23:16 Host.domain MSWinEventLog    1       Application    85      Mon Apr 21 09:23:15 2014  8194    VSS     N/A     N/A     Error   Host.domain    None            Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.  . This is often caused by incorrect security settings in either the writer or requestor process.   Operation:    Gathering Writer Data  Context:    Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}    Writer Name: System Writer    Writer Instance ID: {edeb806a-51ab-417f-9f74-0aa09aa13dba}        40"

"Apr 21 09:23:16 Host.domain MSWinEventLog  1              Application         86           Mon Apr 21 09:23:15 2014            8194                VSS        N/A        N/A        Error      Host.domain      None                     Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.  . This is often caused by incorrect security settings in either the writer or requestor process.   Operation:    Gathering Writer Data  Context:    Writer Class Id: {35e81631-13e1-48db-97fc-d5bc721bb18a}    Writer Name: NPS VSS Writer    Writer Instance ID: {0fa63b89-c698-4684-9739-40727eda930b}  41"

 

"Apr 21 09:23:16 Host.domain MSWinEventLog  1              Application         87           Mon Apr 21 09:23:15 2014            8194                VSS        N/A        N/A        Error      Host.domain      None                     Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.  . This is often caused by incorrect security settings in either the writer or requestor process.   Operation:    Gathering Writer Data  Context:    Writer Class Id: {be9ac81e-3619-421f-920f-4c6fea9e93ad}    Writer Name: Dhcp Jet Writer    Writer Instance ID: {209fbf0b-8f58-4fd3-a09c-21d22ac6c526}     42"

 

Apr 21 09:23:28 Host.domain MSWinEventLog    1              System 88           Mon Apr 21 09:23:27 2014            8                volsnap N/A        N/A        Error      Host.domain      None                     The flush and hold writes operation on volume \\?\Volume{c08e4b84-801e-11e0-b937-806e6f6e6963} timed out while waiting for a release writes command.          29

 

Apr 21 09:23:28 Host.domain MSWinEventLog    1              System 89           Mon Apr 21 09:23:27 2014            8                volsnap N/A        N/A        Error      Host.domain      None                     The flush and hold writes operation on volume C: timed out while waiting for a release writes command.            30

 

"Apr 21 09:23:30 Host.domain MSWinEventLog  1              Application         90           Mon Apr 21 09:23:28 2014            12293                VSS        N/A        N/A        Error      Host.domain      None                     Volume Shadow Copy Service error: Error calling a routine on a Shadow Copy Provider {00000000-0000-0000-0000-000000000000}. Routine details CommitSnapshots [hr = 0x80004005, Unspecified error  ].   Operation:    Executing Asynchronous Operation  Context:    Current State: DoSnapshotSet   43"

 

"Apr 21 09:23:30 Host.domain MSWinEventLog  1              Application         91           Mon Apr 21 09:23:28 2014            12298                VSS        N/A        N/A        Error      Host.domain      None                     Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume \\?\Volume{c08e4b84-801e-11e0-b937-806e6f6e6963}\. The volume index in the shadow copy set is 0. Error details: Open[0x00000000, The operation completed successfully.  ], Flush[0x00000000, The operation completed successfully.  ], Release[0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied. This is probably due to excessive activity on the volume by an application or a system service. Try again later when activity on the volume is reduced.  ], OnRun[0x00000000, The operation completed successfully.  ].   Operation:    Executing Asynchronous Operation  Context:    Current State: DoSnapshotSet       44"




James Cannon Citrix Employees

James Cannon
  • 4,402 posts

Posted 21 April 2014 - 09:05 PM

Hi Ladd.

 

The VM installation guide has the following note in the known issues section for Windows Server 2008:

 

A.1.2. Windows Server 2008
Quiesced snapshots taken on Windows Server 2008 guests will not be directly bootable. Attach the snapshot disk
to an existing Windows Server 2008 VM to access files for restoration purposes.
 

http://support.citrix.com/servlet/KbServlet/download/34971-102-704221/guest.pdf



lepphce1 Members

Ladd Epp
  • 151 posts

Posted 22 April 2014 - 01:32 PM

That's good to know, as I was trying to get this working with the CommVault XenServer agent (a non-bootable image is NOT good!)

 

Does this apply to Windows 2008R2 and Windows 2012 / R2?

 

It's unfortunate that there isn't a better mechanism to get clean DR snaps of VMs.

 

In any case, the limitation is somewhat irrelevant because I can't even get the thing to take a quiesced snapshot to begin with.

 

Thanks,

Ladd



James Cannon Citrix Employees

James Cannon
  • 4,402 posts

Posted 22 April 2014 - 07:18 PM

Hi Ladd,

 

Yes, it does apply to Win2k8R2. Win2k12 is likely to behave the same, but I do not see any documentation for it. Win2k12 is closer to Win2k8 than to the older Win2k3.

 

Also, did you follow the steps in the VM guide?

 

9.4. Windows Volume Shadow Copy Service (VSS) provider
The Windows tools also include a XenServer VSS provider that is used to quiesce the guest filesystem in preparation for a VM snapshot. The VSS provider is installed as part of the PV driver installation, but is not enabled by default.
 
To enable the Windows XenServer VSS provider
 
1. Install the Windows PV drivers.
2. Navigate to the directory where the drivers are installed (by default c:\Program Files\Citrix\XenTools, or the value of HKEY_LOCAL_MACHINE\Software\Citrix\XenTools\Install_dir in the Windows Registry).
3. Double-click the install-XenProvider.cmd command to activate the VSS provider.
 
Note:
 
The VSS provider is automatically uninstalled when the PV drivers are uninstalled, and need
to be activated again upon re-installation. They can be uninstalled separately
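
Step 2 of the quoted procedure can be sketched in Python as follows. This is only an illustration under assumptions: the helper names read_install_dir and provider_script_path are hypothetical, the registry key and default directory are the ones named in the guide excerpt above, and the registry lookup only runs on Windows (elsewhere the documented default is used).

```python
import ntpath

# Default install location named in the guide excerpt.
DEFAULT_INSTALL_DIR = r"C:\Program Files\Citrix\XenTools"


def read_install_dir():
    """Return Install_dir from the registry, or the documented default."""
    try:
        import winreg  # standard library, available on Windows only
    except ImportError:
        return DEFAULT_INSTALL_DIR
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            r"Software\Citrix\XenTools") as key:
            value, _value_type = winreg.QueryValueEx(key, "Install_dir")
            return value
    except OSError:
        # Key or value missing: fall back to the documented default.
        return DEFAULT_INSTALL_DIR


def provider_script_path(install_dir=None):
    """Build the full path of the script that activates the VSS provider."""
    return ntpath.join(install_dir or read_install_dir(),
                       "install-XenProvider.cmd")


print(provider_script_path())
```

The script itself still has to be run inside the guest with administrative rights; the sketch only locates it.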


lepphce1 Members

Ladd Epp
  • 151 posts

Posted 23 April 2014 - 08:39 PM

OK, will assume same issue with Win2k12.

 

Yes, the VSS provider is installed as instructed. (Actually you do not get the option for quiesced snapshots without having the service/provider installed on the guest).

 

Thanks

Ladd


Edited by Ladd Epp, 23 April 2014 - 08:40 PM.


James Cannon Citrix Employees

James Cannon
  • 4,402 posts

Posted 23 April 2014 - 10:41 PM

Hi Ladd,

 

Good deal. I don't use the quiesce feature in my lab, but I did validate that you are correct (the script must be run). Looking back at the error, it looks like your storage is busy, or perhaps other I/O writes are happening inside Windows. I have no issue on my test VM. Perhaps you can create a test VM on a different storage repository?



lepphce1 Members

Ladd Epp
  • 151 posts

Posted 24 April 2014 - 09:29 PM

Well, I don't get any errors in XenCenter in my lab (local storage), but I still get the following VSS error in event viewer/application.

 

Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied. This is often caused by incorrect security settings in either the writer or requestor process.
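
For readers trying to interpret the hr values quoted in this thread, here is a minimal Python sketch that splits an HRESULT into its standard fields (severity bit, 11-bit facility, 16-bit code). The function name decode_hresult is just an illustrative choice. For 0x80070005 it yields facility 7 (the Win32 facility) and code 5, which is the Win32 "access denied" error and matches the text of the event.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool((hr >> 31) & 1)   # severity bit: 1 means failure
    facility = (hr >> 16) & 0x7FF   # 11-bit facility field
    code = hr & 0xFFFF              # low 16 bits: facility-specific code
    return failed, facility, code


# The three hr values that appear in the event logs in this thread.
for hr in (0x80070005, 0x80004005, 0x80042314):
    failed, facility, code = decode_hresult(hr)
    print(f"0x{hr:08X}: failed={failed} facility={facility} code=0x{code:04X}")
```

The decoding only tells you which subsystem raised the error, not why; 0x80042314 is the VSS hold-writes timeout that the later events spell out in words.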

 

Additionally, when I run vssadmin list writers, all writers come back with no error. So I'm not sure if I can ignore the above error or not...?

 

The two pools where it is failing use the eql storage type. I wonder if there is a problem with how that connector works (one VM per LUN, native LUN-level snapshots).

 

Thanks

Ladd



Konrad Ruess Members

Konrad Ruess
  • 2,925 posts

Posted 28 July 2015 - 11:52 AM

XS6.5sp1 and this is still not sorted out... :-( We also constantly fail to take quiesced snapshots of our guests (we mainly have W2012R2). We usually have 3 disks attached to each guest, with one or more NTFS partitions on each, and it looks to me as if the VSS process takes longer the more disks / partitions you have. Windows Backup works perfectly, but the time it reports for taking the snapshots is definitely much longer than XenServer waits before throwing the error.

 

We also see the timeouts in the windows event logs (ID 8194, 12293, 12298): 0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied.

 

I don't really think that it's due to heavy I/O, as this even happens on idle test systems. It looks like XenServer just does not wait long enough for VSS to do its work; otherwise Windows Backup should fail, too. There must be some parameter to tell VSS to give itself more time to complete (e.g. /AutoRetry?).

 

After a failed quiesced snapshot, some VSS Writers (ASR, COM+, IIS, Registry, WMI) will show:

  State: [9] Failed

  Last error: Timed out
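
To check writer health after each failed attempt without reading the list by eye, something like the following Python sketch could be used. It assumes the usual layout of vssadmin list writers output (a "Writer name:" line followed by indented "State:" and "Last error:" lines). SAMPLE is an illustrative fragment built from the state lines quoted above, not captured output, and unhealthy_writers is a hypothetical helper name.

```python
import re

# Illustrative fragment only: real output also carries Writer Id and
# Writer Instance Id lines, which the parser below simply ignores.
SAMPLE = """\
Writer name: 'ASR Writer'
   State: [9] Failed
   Last error: Timed out

Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error
"""


def unhealthy_writers(text):
    """Return (name, state, last_error) for each writer not in '[1] Stable'."""
    results = []
    name = state = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.*)'", line)
        if match:
            # A new writer block starts; reset the state seen so far.
            name, state = match.group(1), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            error = line.split(":", 1)[1].strip()
            if state != "[1] Stable":
                results.append((name, state, error))
    return results


print(unhealthy_writers(SAMPLE))
```

In practice the text would come from running vssadmin list writers in an elevated prompt inside the guest and feeding its output to the function.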

 

Our MSA P2000 storage is being attached through 1Gbit iSCSI and does quite a good job, performance-wise.

 

BfN, Konrad



Andreas Becker Members

Andreas Becker
  • 66 posts

Posted 09 October 2015 - 10:45 AM

Hi Konrad,

 

Have you tried this?

 

From the Start Menu, select Run
The Run dialog opens.

In the Open field, input dcomcnfg and click OK.
The Component Services dialog opens.

Expand Component Services, Computers, and My Computer.
Right-click My Computer and click Properties on the pop-up menu.

The My Computer Properties dialog opens.

Click the COM Security tab.
Under Access Permission click Edit Default.

The Access Permissions dialog opens.

From the Access Permissions dialog, add the "Network Service" account with Local Access allowed.
Close all open dialogs.
Restart the computer.



Konrad Ruess Members
  • #10

Konrad Ruess
  • 2,925 posts

Posted 13 October 2015 - 06:39 PM

Hello Andreas,

 

Thanks for sharing. I've made the changes and rebooted the machine. It 'feels' like it now takes much longer before it throws the "quiesced-snapshot operation failed" message (I'd say 30s instead of 10s before), but the error persists. The event logs in the Windows guest also still show the same "VSS timed out" errors.

 

BfN, Konrad



Alan Lantz Members
  • #11

Alan Lantz
  • 6,985 posts

Posted 13 October 2015 - 07:29 PM

Things like available free space, time sync and antivirus software can also interfere with the quiesce process.

 

--Alan--



Tobias Kreidl CTP Member
  • #12

Tobias Kreidl
  • 18,283 posts

Posted 13 October 2015 - 07:31 PM

Is there a coalesce process going on, by any chance? That can take many hours in some cases.



Konrad Ruess Members
  • #13

Konrad Ruess
  • 2,925 posts

Posted 15 October 2015 - 03:57 PM

Let's see...

 

@Alan: I did some additional tests with a plain vanilla W2012R2: no antivirus, only one disk with the usual two partitions (boot & system), plenty of disk space (guest & SR), and time in sync on guest and hosts. No success.

 

@Tobias: No coalesce, I've even moved a test system to local SSD-based SR on a XS host (which is not being used at all), but it still fails.

 

Source:        VSS
Date:          15.10.2015 17:34:42
Event ID:      12298
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      W63testSSD2
Description:
Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume \\?\Volume{b7ac0b9b-3e7a-11e5-80b5-806e6f6e6963}\. The volume index in the shadow copy set is 0. Error details: Open[0x00000000, The operation completed successfully.
], Flush[0x00000000, The operation completed successfully.
], Release[0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied. This is probably due to excessive activity on the volume by an application or a system service. Try again later when activity on the volume is reduced.
], OnRun[0x00000000, The operation completed successfully.
].

Operation:
   Executing Asynchronous Operation

Context:
   Current State: DoSnapshotSet
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="VSS" />
    <EventID Qualifiers="0">12298</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2015-10-15T15:34:42.000000000Z" />
    <EventRecordID>1507</EventRecordID>
    <Channel>Application</Channel>
    <Computer>W63testSSD2</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\\?\Volume{b7ac0b9b-3e7a-11e5-80b5-806e6f6e6963}\</Data>
    <Data>0</Data>
    <Data>0x00000000, The operation completed successfully.
</Data>
    <Data>0x00000000, The operation completed successfully.
</Data>
    <Data>0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied. This is probably due to excessive activity on the volume by an application or a system service. Try again later when activity on the volume is reduced.
</Data>
    <Data>0x00000000, The operation completed successfully.
</Data>
    <Data>

Operation:
   Executing Asynchronous Operation

Context:
   Current State: DoSnapshotSet</Data>
    <Binary>2D20436F64653A20434F524C4F564C4330303030313330392D2043616C6C3A20434F524C4F564C4330303030313139392D205049443A202030303030323530302D205449443A202030303030323837362D20434D443A2020433A5C57696E646F77735C73797374656D33325C76737376632E6578652020202D20557365723A204E616D653A204E5420415554484F524954595C53595354454D2C205349443A532D312D352D313820</Binary>
  </EventData>
</Event>

Log Name:      Application
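
The Binary element in the event XML above is just hex-encoded ASCII. A short Python sketch to make it readable (the hex string is copied from the event; decode_event_binary is an illustrative helper name):

```python
# Hex payload copied from the <Binary> element of event 12298 above.
BINARY_HEX = (
    "2D20436F64653A20434F524C4F564C4330303030313330392D2043616C6C3A20"
    "434F524C4F564C4330303030313139392D205049443A20203030303032353030"
    "2D205449443A202030303030323837362D20434D443A2020433A5C57696E646F"
    "77735C73797374656D33325C76737376632E6578652020202D20557365723A20"
    "4E616D653A204E5420415554484F524954595C53595354454D2C205349443A53"
    "2D312D352D313820"
)


def decode_event_binary(hex_text):
    """Turn the hex payload of an event's Binary element into text."""
    return bytes.fromhex(hex_text).decode("ascii")


print(decode_event_binary(BINARY_HEX))
```

Decoded, the payload carries internal code and call markers plus the reporting process: PID, TID, the command line C:\Windows\system32\vssvc.exe and the user NT AUTHORITY\SYSTEM. So the event is logged by the VSS service itself and adds no further detail about the cause of the timeout.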

 

and:

Log Name:      System
Source:        volsnap
Date:          15.10.2015 17:34:42
Event ID:      8
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      W63testSSD2
Description:
The flush and hold writes operation on volume C: timed out while waiting for a release writes command.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="volsnap" />
    <EventID Qualifiers="49158">8</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2015-10-15T15:34:42.620240000Z" />
    <EventRecordID>3782</EventRecordID>
    <Channel>System</Channel>
    <Computer>W63testSSD2</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>C:</Data>
    <Binary>000000000200300000000000080006C0000000000000000001000000000000000000000000000000</Binary>
  </EventData>
</Event>

 

With "Windows Server Backup", VSS still needs considerably longer to finish taking the snapshots successfully (~1min) than XenVSS waits before it throws the error (~30s).



Tobias Kreidl CTP Member
  • #14

Tobias Kreidl
  • 18,283 posts

Posted 15 October 2015 - 04:33 PM

Konrad,

The last resort is probably to try an export, delete, import. Also, is there enough free space on the SR?

Servus,

-=Tobias



Konrad Ruess Members
  • #15

Konrad Ruess
  • 2,925 posts

Posted 15 October 2015 - 05:07 PM

It's a brand-new guest, so there's no snapshot chain:

# vhd-util scan -f -m "VHD-*" -l VG_XenStorage-55f3783b-fd59-f57b-357b-a3fc4d25662c -p

vhd=VHD-ee49951a-bf5d-46a4-b2cc-79c0f654f5d2 capacity=34359738368 size=34435235840 hidden=0 parent=none

 

and there's plenty of storage available:

# xe sr-param-list

                  VDIs (SRO): ee49951a-bf5d-46a4-b2cc-79c0f654f5d2
                  PBDs (SRO): 0f36af8f-ae8f-ec40-78f1-4bed4e932d87
    virtual-allocation ( RO): 34435235840
  physical-utilisation ( RO): 34439430144
         physical-size ( RO): 118396813312
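
As a quick check of the free space these figures imply, here is a small Python sketch that parses the numeric lines and subtracts utilisation from size. SR_PARAMS repeats the three values from the output above; parse_sr_params is an illustrative helper, not part of any XenServer tooling.

```python
# The three numeric parameters from the xe sr-param-list output above.
SR_PARAMS = """\
virtual-allocation ( RO): 34435235840
physical-utilisation ( RO): 34439430144
physical-size ( RO): 118396813312
"""


def parse_sr_params(text):
    """Map parameter names to integer byte counts, skipping other lines."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        name = key.split("(")[0].strip()  # drop the "( RO)" access marker
        value = value.strip()
        if value.isdigit():
            params[name] = int(value)
    return params


params = parse_sr_params(SR_PARAMS)
free = params["physical-size"] - params["physical-utilisation"]
print(f"{free} bytes free ({free / 2**30:.1f} GiB)")
```

That leaves roughly 78 GiB free on the SR against a 32 GiB virtual disk, so a lack of space for the snapshot looks unlikely here.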

 

Hmmm...

 

So, for you, quiesced snapshotting works fine and reliably, even with multi-VBD, multi-partition Windows 2012 R2 guests?

 

BfN, Konrad



lepphce1 Members
  • #16

Ladd Epp
  • 151 posts

Posted 15 October 2015 - 05:21 PM

Can you confirm your SR type? If it's EqualLogic, in my experience it simply does not work. I think it's how the connector handles the LUN-level snapshots. Try LVM over iSCSI.



Konrad Ruess Members
  • #17

Konrad Ruess
  • 2,925 posts

Posted 15 October 2015 - 05:34 PM

In general, my guests are running on LVMoISCSI (HP P2000).

 

The tests with the plain-vanilla W2012R2 were done on LVM on a locally attached SSD.



Alan Lantz Members
  • #18

Alan Lantz
  • 6,985 posts

Posted 15 October 2015 - 05:35 PM

As long as XenTools is installed, you ran install-xenprovider.cmd and the XenVSS service shows up, I would try increasing the timeout for VSS.

 

HKLM\Software\Microsoft\Windows NT\CurrentVersion\SPP

dword: CreateTimeout 1200000

1200000 (decimal, in milliseconds) = 20 minutes; the default is 10 minutes.
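
For anyone picking their own value, the conversion is simple arithmetic. The sketch below assumes CreateTimeout is expressed in milliseconds; minutes_to_ms is just an illustrative helper.

```python
def minutes_to_ms(minutes):
    """Convert a timeout in minutes to milliseconds."""
    return minutes * 60 * 1000


# The 10-minute default and the 20-minute value suggested above.
for minutes in (10, 20):
    print(f"{minutes} min = {minutes_to_ms(minutes)} ms")
```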

 

--Alan--



Konrad Ruess Members
  • #19

Konrad Ruess
  • 2,925 posts

Posted 18 November 2015 - 01:55 PM

Hi Alan,

 

Thanks for the hint. It took me some time to find a slot for new tests, but now I've performed several tests around your reg keys.

 

Unfortunately, it did not help to get this sorted. The snapshot task fails after approx 45 seconds whenever it's being triggered by Xen VSS Provider. It works fine if triggered by MS VSS Provider.

 

So it comes nowhere near the 10 or 20 or whatever minutes configured in that reg value.

 

 :-(



Alan Lantz Members
  • #20

Alan Lantz
  • 6,985 posts

Posted 18 November 2015 - 04:52 PM

Strange, I haven't thought about this for a while. I still have one 6.2 pool left. I will see what I can figure out on mine and if I have any other ideas.

 

--Alan--