


Worker servers hanging up randomly

Started by Lukasz Slemp, 17 February 2017 - 09:27 AM

Lukasz Slemp Members


Posted 17 February 2017 - 09:27 AM

Hello,

I'm having an issue with worker servers that boot from a PVS vDisk. There are about 20 of them, and for the last few days they have been acting strangely. At random, servers enter the "Initializing" state (while 20-30 users are working on them) and stay in that state.

Nobody can start a new session, and the Citrix DCs happily keep trying to start new sessions on exactly these failing servers.

 

The only thing I can do is connect to these machines with a remote MMC (RDP won't work), so I fetched the event logs, and it seems to always start with the same events:

 

Error 2/17/2017 9:46:27 AM Microsoft-Windows-DistributedCOM 10010 None The server {AB8902B4-09CA-4BB6-B78D-A8F59079A8D5} did not register with DCOM within the required timeout.

Error 2/17/2017 9:55:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the iphlpsvc service.

Error 2/17/2017 9:58:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the NlaSvc service.

Error 2/17/2017 9:58:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ShellHWDetection service.

Error 2/17/2017 10:01:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the Schedule service.

Error 2/17/2017 10:01:25 AM Microsoft-Windows-DistributedCOM 10010 None The server {75DFF2B7-6936-4C06-A8BB-676A7B00B24B} did not register with DCOM within the required timeout.

Error 2/17/2017 10:04:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ServicesManager service.

Error 2/17/2017 10:07:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the SessionEnv service.

Error 2/17/2017 10:10:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ShellHWDetection service.
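
To see the pattern in these entries at a glance, here is a minimal Python sketch that parses the lines above and counts which services hit the 7011 timeout. It assumes each line has the layout level, timestamp, source, event ID, task category ("None"), message, as in the excerpt; the names `parse_events`, `timed_out_services` and `gaps_seconds` are made up for illustration and are not part of any Citrix or Microsoft tool.

```python
import re
from collections import Counter
from datetime import datetime

# Event log lines copied verbatim from the excerpt above.
LOG = """\
Error 2/17/2017 9:46:27 AM Microsoft-Windows-DistributedCOM 10010 None The server {AB8902B4-09CA-4BB6-B78D-A8F59079A8D5} did not register with DCOM within the required timeout.
Error 2/17/2017 9:55:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the iphlpsvc service.
Error 2/17/2017 9:58:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the NlaSvc service.
Error 2/17/2017 9:58:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ShellHWDetection service.
Error 2/17/2017 10:01:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the Schedule service.
Error 2/17/2017 10:01:25 AM Microsoft-Windows-DistributedCOM 10010 None The server {75DFF2B7-6936-4C06-A8BB-676A7B00B24B} did not register with DCOM within the required timeout.
Error 2/17/2017 10:04:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ServicesManager service.
Error 2/17/2017 10:07:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the SessionEnv service.
Error 2/17/2017 10:10:19 AM Service Control Manager 7011 None A timeout (180000 milliseconds) was reached while waiting for a transaction response from the ShellHWDetection service.
"""

# Assumed layout: level, timestamp, source, event ID, task category "None", message.
PATTERN = re.compile(
    r"^(?P<level>\w+) (?P<ts>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) "
    r"(?P<source>.+?) (?P<event_id>\d+) None (?P<message>.*)$"
)
SERVICE = re.compile(r"from the (\S+) service")


def parse_events(text):
    """Turn each matching log line into a dict; lines that do not match are skipped."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue
        events.append({
            "time": datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p"),
            "source": m["source"],
            "event_id": int(m["event_id"]),
            "message": m["message"],
        })
    return events


def timed_out_services(events):
    """Count how often each service is named in a 7011 timeout event."""
    names = []
    for e in events:
        if e["event_id"] == 7011:
            m = SERVICE.search(e["message"])
            if m:
                names.append(m.group(1))
    return Counter(names)


def gaps_seconds(events, event_id):
    """Seconds between consecutive events that carry the given event ID."""
    times = [e["time"] for e in events if e["event_id"] == event_id]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    events = parse_events(LOG)
    print(timed_out_services(events))
    print(gaps_seconds(events, 7011))
```

Run against this excerpt, the gaps between consecutive 7011 entries come out as either 0 or 180 seconds, which matches the 180000 ms timeout quoted in the messages. That fits the services being queried one after another and each request timing out, though the log alone does not say what is blocking them.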

 

At this point the server is almost unresponsive and stuck in the Initializing state, and only a forced reboot helps. Even then, I can't be sure the server will stay healthy for long after the reboot. It sometimes fails again after half an hour or 2 hours. It's driving me crazy because it's totally random.

 

Has anybody had such issues and knows what this could be?

The vDisk has not been changed for a long time. Yesterday I thought that updating VMware Tools might somehow help, but it didn't.

 

XenDesktop/App/PVS/DC - 7.11; WS2012 R2 U4

 

Any hint will be appreciated.