
Asynchronous I/O in PVS Creates Extreme Slowdown


Question

So I saw there's now a checkbox for asynchronous I/O support in the latest version of PVS. I wasn't able to find much on this other than it supposedly improves both latency and networking throughput.

 

Activating this pretty much brings our entire environment to a crawl; it is totally unusable. Just booting into Windows takes about 20 minutes, and I see hundreds and hundreds of excessive retries in the PVS console.

 

This is all on a relatively new install: all Server 2019, version 1903 across the board (2 load-balanced DCs, 2 load-balanced StoreFront servers, 2 load-balanced PVS servers, UPM in use, no WEM quite yet, EDT / DTLS enabled). With the checkbox off, everything performs well (Windows 10 VM).

 

What can I troubleshoot to get this working?


Async I/O only works on the target when it is in standard image mode and RAM cache with overflow is the cache type. It has nothing to do with the network and does not do anything for any other cache mode. I would suggest taking a network trace from the target and the PVS servers during boot. I would also recommend not having a load balancer in front of your PVS servers. This can cause issues, but they should be fairly consistent, so it is probably not the cause if there really is a difference with the setting enabled. Once a target is connected to a PVS server, it expects to communicate only with that PVS server and no other, unless it has to fail over due to connectivity issues with that PVS server.

3 hours ago, Carl Fallis said:

Async I/O only works on the target when it is in standard image mode and RAM cache with overflow is the cache type. It has nothing to do with the network and does not do anything for any other cache mode. I would suggest taking a network trace from the target and the PVS servers during boot. I would also recommend not having a load balancer in front of your PVS servers. This can cause issues, but they should be fairly consistent, so it is probably not the cause if there really is a difference with the setting enabled. Once a target is connected to a PVS server, it expects to communicate only with that PVS server and no other, unless it has to fail over due to connectivity issues with that PVS server.

 

We definitely have it in standard image mode with RAM cache with overflow. I will take a network trace.

 

I don't believe we have a load balancer in front of PVS (load balancing is enabled within PVS itself).

 

What real-world advantages have you seen with async I/O? I'm wondering how much troubleshooting effort this is worth.


Still an issue with LTSR 1912.


I went to test it, and it took ~3 minutes to boot/shut down and was extremely slow. I don't really understand; the documentation states it should only apply in private or maintenance mode.

 

Anyway, I reverted the change and they're running like a Ferrari again.

 

I've never had a performance issue running PVS with similar specs:

VM specs: 6 vCPU, 32 GB RAM

8GB of Write Cache with overflow to HDD. (all flash)

User density ~15 but 30+ when needed.


@Derek Black The documentation is trying to be precise but could cause confusion. Below "private or maintenance mode", the next two bullets actually describe cache mode options that are supported in standard mode. Please reach out to our support team; they should be able to help you resolve this issue, with engineering assistance if needed. We will update the documentation to avoid confusion.

On 2/28/2020 at 10:57 PM, Yuhua Lu said:

@Derek Black The documentation is trying to be precise but could cause confusion. Below "private or maintenance mode", the next two bullets actually describe cache mode options that are supported in standard mode. Please reach out to our support team; they should be able to help you resolve this issue, with engineering assistance if needed. We will update the documentation to avoid confusion.

Here are the next two bullet points:

The following vDisk cache modes support asynchronous I/O:

  • Cache in device RAM with overflow on hard drive
  • Cache on server persistent

 

Our environment is ONLY using Cache in device RAM with overflow on hard drive, yet enabling async I/O crushes the performance of our VMs.
 

8GB of Write Cache with overflow to HDD. (all flash)


Does anyone have this working and if so, what performance benefits have you seen?

On 6/8/2020 at 10:15 AM, Derek Black said:

Here are the next two bullet points:

The following vDisk cache modes support asynchronous I/O:

  • Cache in device RAM with overflow on hard drive
  • Cache on server persistent

 

Our environment is ONLY using Cache in device RAM with overflow on hard drive, yet enabling async I/O crushes the performance of our VMs.
 

8GB of Write Cache with overflow to HDD. (all flash)


Does anyone have this working and if so, what performance benefits have you seen?

It looks like the issue we found is not related to performance. Please reach out to Citrix support.

On 6/9/2020 at 12:15 AM, Derek Black said:

Our environment is ONLY using Cache in device RAM with overflow on hard drive, yet enabling async I/O crushes the performance of our VMs.
 

8GB of Write Cache with overflow to HDD. (all flash)


Does anyone have this working and if so, what performance benefits have you seen?

Did you ever determine the cause of this?

We tried PVS 1912 CU1 on both the server and the target device, with 5 GB cache in RAM with overflow and async enabled. We had it on 8 servers, and it worked fine with no reports of issues.

We then rolled it out to production (~130 servers) and saw client connection failures, slow logons, and slow app performance.

This is a VMware environment with VMware Tools 11.0.5.

Guests were using low CPU, hosts were at around 15% CPU, and the PVS servers were showing lower CPU usage than normal.

 

We did make a change of antivirus vendor as well (following best practices) so we are separating out the async change and AV change to get a definitive culprit.
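
Separating two changes that went out together amounts to testing each one on its own against a known-good baseline. A minimal Python sketch of that test matrix follows. The factor names (`async_io`, `new_antivirus`) and the `is_healthy` callback are hypothetical placeholders for whatever boot-time or logon check is used; nothing here is a PVS API.

```python
from itertools import product

# Hypothetical factor names: the two changes rolled out together.
FACTORS = {
    "async_io": (False, True),
    "new_antivirus": (False, True),
}


def build_test_matrix(factors):
    """Return one dict per combination of factor settings (full factorial)."""
    names = list(factors)
    combos = product(*(factors[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def implicated_factors(factors, is_healthy):
    """Name the factors that break a healthy baseline when enabled alone.

    is_healthy is a caller-supplied (hypothetical) check that takes a
    configuration dict and returns True if that test group behaves normally.
    """
    names = list(factors)
    baseline = {name: False for name in names}
    if not is_healthy(baseline):
        # Baseline is already broken, so single-factor comparison is meaningless.
        return []
    implicated = []
    for name in names:
        trial = dict(baseline, **{name: True})
        if not is_healthy(trial):
            implicated.append(name)
    return implicated


if __name__ == "__main__":
    for config in build_test_matrix(FACTORS):
        print(config)
```

Each of the four configurations would be applied to its own small group of target devices. The combined case (both changes on) is still worth keeping in the matrix, since an interaction between the two changes would not show up in the single-factor groups.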

On 6/9/2020 at 12:05 PM, Yuhua Lu said:

It looks like the issue we found is not related to performance. Please reach out to Citrix support.

So if it isn't related to performance, is it related to something else (misconfiguration, bug, etc.) that manifests itself as a performance issue? By performance issue, I am specifically talking about the amount of time taken for the VMs to boot. Once the VMs were booted into Windows, we did not see performance issues.

On 7/8/2020 at 3:23 AM, Justin Annand said:

Did you ever determine the cause of this?

We tried PVS 1912 CU1 on both the server and the target device, with 5 GB cache in RAM with overflow and async enabled. We had it on 8 servers, and it worked fine with no reports of issues.

We then rolled it out to production (~130 servers) and saw client connection failures, slow logons, and slow app performance.

This is a VMware environment with VMware Tools 11.0.5.

Guests were using low CPU, hosts were at around 15% CPU, and the PVS servers were showing lower CPU usage than normal.

 

We did make a change of antivirus vendor as well (following best practices) so we are separating out the async change and AV change to get a definitive culprit.


I did not open a support case, but we've since upgraded to LTSR 1912 CU1, so I'll work on testing that again and post my results.

Could you expand on the change of antivirus vendor? Were there also policy, feature, or other changes? Just curious, as we are running SEP.
