Load Balancing methods do not work as expected between WEB and APP servers.


Hi Guys,

 

I have a customer complaining that the least connection load balancing method does not work as it is supposed to. When I looked at the vServer configuration, I noticed that the current method is shown as Round Robin, with the reason "Bound service's state changed to UP", so at first I suspected Slow-Start mode, by which NetScaler protects newly enabled or newly UP servers from a sudden overload. I then ruled it out as the root cause, because the servers' state does not change often enough for Slow-Start mode to keep switching the load balancing method from the configured one to Round Robin.

 

Regarding Slow-Start Mode:

 

https://support.citrix.com/article/CTX108886?_ga=2.163941951.1802433873.1553506206-1079685509.1515480303

 

Anyway, although the least connection method is configured, we observe that some servers have 11 connections whereas others have 3, so there is always a large gap between the server with the most connections and the server with the fewest. The vServer in question is of type SSL_BRIDGE. Connection multiplexing is not in use, which is evident from comparing "Current Client Connections" and "Current Server Connections" in the vServer statistics: these values are identical for each service bound to the vServer.

 

I attached the vServer and servicegroup configuration, the connection counts at the vServer, and two graphs taken from an APM tool: one shows the number of connection requests to the servers and the other shows the number of active connections at the servers. The NetScaler version is 11.1 Build 58.13.

 

The customer changed the load balancing method to Round Robin and they are still not satisfied. Is such an imbalance normal, given that some connections may last longer than others, or is there something I am missing? This is traffic between WEB servers and APP servers, which is something I am not so familiar with.

 

Any help or ideas would be great, as I really don't know what to do. Load balancing is the main task this appliance performs, so I think this shouldn't be a problem :)

 

Thanks,

Hakan

 

lb-method.rar


Hi!

 

So the last time the vServer went into Round Robin mode was 2 days ago. Service 1, MOBAPP02, is the one with the lowest uptime.

The article says that it will leave the Round Robin method only when the number of hits reaches

request rate x number of packet engines x bound services

 

I don't know what your request rate is/was or how many packet engines you have.

So if we say you had 100 requests/s x 7 packet engines x 13 services = 9100,

then only after 9100 client hits (TCP connections in your case) will it leave the Round Robin method.
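
The arithmetic above can be written out as a small Python sketch. It only restates the formula as described in this post; the function name `startup_rr_hits` is made up for illustration and is not a NetScaler command or API.

```python
def startup_rr_hits(request_rate: float, packet_engines: int, bound_services: int) -> float:
    """Hits needed before startup round robin ends, per the formula quoted in this thread."""
    return request_rate * packet_engines * bound_services


# Example numbers from the post: 100 requests/s, 7 packet engines, 13 services.
example = startup_rr_hits(100, 7, 13)
print(example)  # 9100
```

The three inputs are the ones to confirm on the appliance before trusting the result; the 100 requests/s and 7 packet engines here are guesses, as stated in the post.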

 

I am guessing your vServer did not hit that limit, so it stays with Round Robin.

 

I found this article that might help you:

https://docs.citrix.com/en-us/netscaler/12/load-balancing/load-balancing-advanced-settings/slow-start-service.html

 

"With automated slow start, a service is taken out of the slow start phase when one of the following conditions applies:

The actual request rate is less than the new service request rate.

The service does not receive traffic for three successive increment intervals.

The request rate has been incremented 200 times.

The percentage of traffic that the new service must receive is greater than or equal to 100.

"

Give this a try.


In reply to the post by Mihai Cziraki1709160741 of 04.04.2019 at 11:14 AM:


Hi Mihai,

 

Thanks for the comment which encouraged me to take a deep dive into this. I have read the command reference as well as the link you sent.

 

When looking into the command reference:

 

https://developer-docs.citrix.com/projects/netscaler-command-reference/en/11.0/lb/lb-vserver/lb-vserver/

 

newServiceRequestIncrementInterval

Interval, in seconds, between successive increments in the load on a new service or a service whose state has just changed from DOWN to UP. A value of 0 (zero) specifies manual slow start. Default value: 0 Minimum value: 0 Maximum value: 3600

 

-My configuration is set to 0 so manual slow start is in use.

 

newServiceRequest

Number of requests, or percentage of the load on existing services, by which to increase the load on a new service at each interval in slow-start mode. A non-zero value indicates that slow-start is applicable. A zero value indicates that the global RR startup parameter is applied. Changing the value to zero will cause services currently in slow start to take the full traffic as determined by the LB method. Subsequently, any new services added will use the global RR factor. Default value: 0 Minimum value: 0

 

-My configuration is set to 0 so global RR startup parameter is in use.

 

I have 6 cores and 13 services bound to this vServer. I also suspected that my vServer remains in slow start for a long time, but I thought that for this to happen the services' state would have to change frequently. Therefore, when I noticed that the last state change of any server was 2 days ago, I concluded that this was not the case. However, as you said, I may need to look at the vServer hits.

 

So my vServer exits slow start mode when it reaches:

 

100 requests/s x 6 packet engines x 13 services = 7800 hits.

 

That is, between the time MOBAPP02's state changed to UP and the time this servicegroup configuration was captured, my vServer might not have received 7800 hits. So I will check the vServer hits counter between Fri Mar 29 15:30:25 and Mon Apr 01 12:23:37 (Mar 29 15:30:25 + 2 days, 20:53:12).

 

       1)  1.1.1.1:443  State: UP       Server Name: MOBAPP02   Server ID: None Weight: 1
             Last state change was at Fri Mar 29 15:30:25 2019 
             Time since last state change: 2 days, 20:53:12.550
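
As a quick sanity check of the time window, the end timestamp can be recomputed from the two values in the servicegroup output above. This is just a sketch using Python's standard `datetime` module; the variable names are illustrative and the fractional `.550` seconds are dropped.

```python
from datetime import datetime, timedelta

# "Last state change was at Fri Mar 29 15:30:25 2019"
last_state_change = datetime(2019, 3, 29, 15, 30, 25)

# "Time since last state change: 2 days, 20:53:12.550" (milliseconds ignored)
time_since_change = timedelta(days=2, hours=20, minutes=53, seconds=12)

# Moment at which the servicegroup configuration was captured.
window_end = last_state_change + time_since_change
print(window_end.isoformat(sep=" "))  # 2019-04-01 12:23:37
```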

 

nsconmsg -K /var/nslog/newnslog.xx -d current -g vsvr_tot_Hits -T 7 | grep vserver_ip

 

Or do you know a direct way of seeing how long this vServer had been using Round Robin during these 2 days and 21 hours?

 

Thanks,

Hakan

 

 

 


Update:

 

By analyzing newnslog, I found that it takes 21 hours for the vServer to receive 7800 hits, but this turned out to be irrelevant because I had missed something: the default value of startupRRFactor given in the link below applies to a newly configured vServer.

 

https://support.citrix.com/article/CTX108886?_ga=2.243877189.1561745537.1554466477-1079685509.1515480303

By default the newly configured virtual server remains in a Slow Start mode for Startup RR Factor of 100.

 

When I looked at the command reference for a description of the "set lb parameter startupRRFactor" command, I found this:

https://developer-docs.citrix.com/projects/netscaler-command-reference/en/12.0/lb/lb-parameter/lb-parameter/

For an existing virtual server, if one or more services are newly bound or newly enabled, or if the load balancing method is changed, the appliance dynamically computes the number of requests for which to implement startup round robin. It obtains this number by multiplying the request rate by the number of bound services (it includes services that are marked as DOWN). For example, if the current request rate is 20 requests/s and ten services are bound to the virtual server, the appliance performs startup round robin for 200 requests.

 

Most of the time the request rate of the vServer in question is between 0 and 1 (it usually receives fewer than 7 requests in any 7-second period, and when the delta is less than 7, the request rate is shown as 0). So in this case, for this vServer to exit slow start, it must receive the following number of requests:

 

1 (req rate) x 6 (cores) x 13 (services bound to the vServer) = 78 hits.

 

If it uses fractional rates in the calculation, the possible values are:

 

(2/7) x (6) x (13) ≈ 22 hits.

(19/7) x (6) x (13)  ≈ 212 hits

 

Assuming the worst case occurred, that is, a service's state changed to UP when the request rate was 2 (delta = 19), then even if the vServer received only 1 request every 7 seconds from the moment of the state change, it would exit slow start after 1484 seconds ((212/1) x 7), which is roughly 25 minutes. That is nowhere near as long as we predicted.
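
The worst-case estimate can be reproduced with a short Python sketch. It assumes the calculation used in this post (fractional rate x 6 cores x 13 services, one sample every 7 seconds); the function names are made up for the example, and whether the appliance really rounds this way is not confirmed.

```python
import math

SAMPLE_INTERVAL_S = 7   # seconds between nsconmsg records in the log
PACKET_ENGINES = 6      # cores on this appliance, as stated in the thread
BOUND_SERVICES = 13     # services bound to the vServer


def hits_to_exit(delta: int) -> int:
    """Hits needed to leave startup round robin for a given 7-second delta."""
    rate = delta / SAMPLE_INTERVAL_S
    return round(rate * PACKET_ENGINES * BOUND_SERVICES)


def seconds_to_exit(hits_needed: int, hits_per_interval: int) -> int:
    """Time to collect hits_needed when hits_per_interval arrive every 7 seconds."""
    intervals = math.ceil(hits_needed / hits_per_interval)
    return intervals * SAMPLE_INTERVAL_S


worst = hits_to_exit(19)
print(hits_to_exit(2), worst, seconds_to_exit(worst, 1))  # 22 212 1484
```

1484 seconds is a little under 25 minutes, which matches the estimate above.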

 

Log analysis sample:

nsconmsg -K /var/nslog/newnslog.xx.tar.gz -d current -g vsvr_tot_Hits

 

reltime:mili second between two records Fri Apr  5 15:08:01 2019
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no
  1412       0          99121          2        0 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)


reltime:mili second between two records Fri Apr  5 15:24:35 2019
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no
   2052       0      199757292       3265      466 vsvr_tot_Hits vserver_cs_0.0.0.0:443(COMOBWEB_BANK_HTTPS_CS_VIP)  

   2054       0          99268          1        0 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)  
   2055    7000      199748605       3091      441 vsvr_tot_Hits vserver_lb_0.0.0.0:0(COMOBWEB_BANK_HTTPS_VIP)  
   2058       0          99270          2        0 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)  
   2059       0         466040          6        0 vsvr_tot_Hits vserver_cs_8.8.8.8:80(COMOBWEB_BANK_HTTP_CS_VIP)  
   2072    7000      199761849       3271      467 vsvr_tot_Hits vserver_lb_0.0.0.0:0(COMOBWEB_BANK_HTTPS_VIP)  

   2076       0          99277          7        1 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)  

   2077    7000      199765213       3364      480 vsvr_tot_Hits vserver_lb_0.0.0.0:0(COMOBWEB_BANK_HTTPS_VIP)  
   2078       0          99296         19        2 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)  
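
To pull only one vServer's samples out of output like this, a small parser along the following lines could be used. It is a sketch that assumes the seven whitespace-separated columns shown in the sample above (index, rtime, totalcount-val, delta, rate/sec, symbol name, device); the `Sample` type and `parse_counter_lines` function are illustrative names, not part of any Citrix tooling.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    index: int
    total: int   # totalcount-val column
    delta: int   # hits since the previous record
    rate: int    # rate/sec column
    device: str  # e.g. vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)


def parse_counter_lines(text: str, counter: str = "vsvr_tot_Hits") -> List[Sample]:
    """Keep only data rows for the given counter; skip headers and blank lines."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[5] != counter or not fields[0].isdigit():
            continue
        samples.append(Sample(int(fields[0]), int(fields[2]), int(fields[3]),
                              int(fields[4]), fields[6]))
    return samples


# A few rows copied from the log sample above.
LOG = """\
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no
   2054       0          99268          1        0 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)
   2055    7000      199748605       3091      441 vsvr_tot_Hits vserver_lb_0.0.0.0:0(COMOBWEB_BANK_HTTPS_VIP)
   2058       0          99270          2        0 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)
   2076       0          99277          7        1 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)
   2078       0          99296         19        2 vsvr_tot_Hits vserver_lb_9.9.9.9:443(MOBAPP_PX12_443_VIP)
"""

mobapp = [s for s in parse_counter_lines(LOG) if "MOBAPP_PX12_443_VIP" in s.device]
for s in mobapp:
    print(s.total, s.delta, s.rate)
```

Filtering on the vServer name rather than the IP avoids mixing up the LB and CS vServers that share an address.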

 

 

To sum up, the problem does not seem to be related to slow start. Or am I missing something again? What can I do to make the least connection method work as expected? TCP multiplexing is not even in use, because the vServer is SSL_BRIDGE. I have no idea what else could be the reason.

 

Thanks,

Hakan

 

 


You probably need to open a ticket with Citrix; they might be able to advise on this.

 

What is not clear to me is this:

"For a virtual server that is already configured and is serving the production traffic, when the services are enabled or the services are UP, the time to exit Slow Start is calculated using the following calculation:
Request rate = current instance value - previous instance value (before 7 seconds)
"

 

I think it will be best to ask Citrix. Or maybe somebody else knows better and will reply on this thread.


Thanks for your comments Mihai.

You're right. I asked the customer to open a ticket for this.

 

I assume "current instance value" corresponds to the value in the totalcount-val column of nslog, and accordingly "previous instance value" is the value 7 seconds earlier. But this calculation does not give you a rate; it gives you a number of hits, which you must divide by 7 to get the rate.

 

I think the correct notation should be like this:

 

Request rate = (current instance value - previous instance value) / 7

where "current instance value - previous instance value" corresponds to delta in nslog.
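
Here is that corrected notation as a minimal Python sketch, checked against consecutive totalcount-val values for MOBAPP_PX12_443_VIP from the log sample earlier in the thread. Using integer division is an assumption: it happens to reproduce the rate/sec values shown in the log, but how the appliance rounds internally is not confirmed. The function name is illustrative.

```python
SAMPLE_INTERVAL_S = 7  # seconds between two records in newnslog


def request_rate(current_total: int, previous_total: int,
                 interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Rate per second derived from two consecutive totalcount-val readings."""
    delta = current_total - previous_total
    return delta // interval_s


print(request_rate(99296, 99277))  # delta 19 -> 2
print(request_rate(99277, 99270))  # delta 7  -> 1
print(request_rate(99270, 99268))  # delta 2  -> 0
```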

 

But this explanation from the command reference makes more sense. I think "current request rate" corresponds to the value in the rate/sec column of nslog:

https://developer-docs.citrix.com/projects/netscaler-command-reference/en/12.0/lb/lb-parameter/lb-parameter/

For an existing virtual server, if one or more services are newly bound or newly enabled, or if the load balancing method is changed, the appliance dynamically computes the number of requests for which to implement startup round robin. It obtains this number by multiplying the request rate by the number of bound services (it includes services that are marked as DOWN). For example, if the current request rate is 20 requests/s and ten services are bound to the virtual server, the appliance performs startup round robin for 200 requests.

 

 
