NS VPX 10.0 Build 71.6.nc Apparent DNS / Name Server issues

Started by Adrian Walker, 05 November 2012 - 08:56 AM
21 replies to this topic

Adrian Walker Members

Posted 05 November 2012 - 08:56 AM

Hi,
Last week we upgraded our NetScalers from 10.0 build 54.7.nc to 71.6.nc. We used the Upgrade Wizard and the upgrade was installed without error.

We have two NSs in an Active / Passive HA pair. We use the free license for load balancing our Exchange 2010 CAS array.

When the upgrade was complete, the Service Group for the Exchange 2010 CAS servers had an Effective State of down. None of the monitors were down. On closer inspection of the two Configured Members for the Service Group, I noticed the IP Address/Domain were both listed as 0.0.0.0.

The two Exchange 2010 CAS servers were added using their Domain Name (FQDN) and were both Enabled.
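For reference, here is a minimal NetScaler CLI sketch of the kind of FQDN-based setup described above. The server names, FQDNs, service group name, and the SSL/443 service type are hypothetical placeholders and are not taken from the original configuration:

```
add server SRV-EXAMPLE-CAS1 cas1.example.local
add server SRV-EXAMPLE-CAS2 cas2.example.local
add serviceGroup SG-EXAMPLE-CAS SSL
bind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS1 443
bind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS2 443
show serviceGroup SG-EXAMPLE-CAS
```

In the broken state described in this thread, the affected members show 0.0.0.0 instead of a resolved address.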

Looking at DNS --> Name Servers, the four Active Directory DNS servers were listed with an Effective State of Up. I attempted to disable one, only to get a message that the "Resource could not be found" (or words to that effect). I got out the documentation on how I originally set them up, deleted the four DNS servers and re-added them; however, this did not resolve the issue.

We got around the issue by re-adding the Exchange 2010 servers using their IP addresses; however, I'm still trying to fathom what is going on. I have recreated the Virtual Server for Exchange from the ground up with no change.
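The IP-based workaround looks roughly like this in the CLI. The object names and the 192.0.2.x documentation addresses are hypothetical placeholders. The FQDN-based members are unbound first and then replaced by IP-based server objects:

```
unbind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS1 443
unbind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS2 443
add server SRV-EXAMPLE-CAS1-IP 192.0.2.11
add server SRV-EXAMPLE-CAS2-IP 192.0.2.12
bind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS1-IP 443
bind serviceGroup SG-EXAMPLE-CAS SRV-EXAMPLE-CAS2-IP 443
save ns config
```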

What is interesting is that we have configured the NS to use Active Directory logins. That includes a load-balanced config for LDAP, whose service group contains all four Active Directory servers. This still works in Live; however, when I created it in Test using exactly the same settings, it has the same issue as Exchange.

The other thing I noticed was that after the upgrade, the Load Balancing feature was disabled. Enabling it didn't help. I enabled it in Test last week and saved the config; however, this morning I had to re-enable it. I am working on the Active node.
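If the LB feature is found disabled, it can be checked and re-enabled from the CLI. The setting only survives a reboot once the running configuration has been saved. Here is a minimal sketch, run on the primary node:

```
show ns feature
enable ns feature LB
save ns config
```

`show ns feature` lists each feature with its current status, which makes it easy to confirm whether LB has been switched off again.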

I've read the 'What's New' change list but there doesn't appear to be anything that would cause our issues. If anyone could shed some light on what may be going on in our environment, I'd be most grateful.

Many thanks

W.



Adrian Walker Members

Posted 12 November 2012 - 10:57 AM

Bump.

Sorry, but I have still not found a solution.

Thanks

W.



SIMON GOTTSCHLAG CTP Member

Posted 12 November 2012 - 11:08 AM

Hi,

This is a long shot:
Have you removed all your temporary Java files? http://www.java.com/en/download/help/plugin_cache.xml

Another question:
Do you use the hostname or the IP address for the AD Domain Controllers? If you are using their hostnames/DNS entries, perhaps that's the problem?

A third question:
Have you configured the NetScaler DNS Name Servers? Are you using a load balancer or the IP addresses of the DNS servers?



Adrian Walker Members

Posted 12 November 2012 - 01:52 PM

Hi Simon,

Thanks for your reply. All java cache has been cleared.

Before the upgrade everything was working fine and it's only when we performed the upgrade that the issue occurred.

Yes, we are using the FQDN for the domain controllers (Load Balancing --> Servers). This is indeed where the problem lies. Before the upgrade we were using FQDNs for the Exchange CAS servers without issue; now we have had to add them using their IPs. However, the AD Domain Controllers are still in the list with their FQDNs, and logging in with AD credentials still works. Have Citrix changed something here?

Under DNS-->Name Servers, we have added all four of our AD domain controllers / DNS servers individually using their IP. All DNS servers are up.
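Adding the name servers individually by IP address corresponds to one `add dns nameServer` command per server. The addresses below are hypothetical placeholders. The optional `-type` argument (UDP, TCP or UDP_TCP) selects the transport used for lookups, and `show dns nameServer` lists each server with its state:

```
add dns nameServer 192.0.2.53
add dns nameServer 192.0.2.54
add dns nameServer 192.0.2.55
add dns nameServer 192.0.2.56
show dns nameServer
```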

Many thanks

W.



SIMON GOTTSCHLAG CTP Member

Posted 12 November 2012 - 02:10 PM

Hi,

If you look at the service's monitor details, what does it say? Does it say "domain name not resolved"?

Could you post the configuration for the servers?
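The monitor details and server configuration asked for here can be pulled from the CLI. This is a sketch with hypothetical object names. `show serviceGroup` should list each member with its address, port and state alongside the bound monitors, and that is where an unresolved member would be expected to show up:

```
show serviceGroup SG-EXAMPLE-CAS
show server SRV-EXAMPLE-CAS1
show ns runningConfig | grep EXAMPLE-CAS
```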



Adrian Walker Members

Posted 12 November 2012 - 02:49 PM

Could this be it?

I neglected to say that we run the VPX Express edition. My next move was to rebuild the NS, and I couldn't find a download of 71.6. I have now realised that there are two categories for the VPX download: VPX Release 10 and VPX Express. Under VPX Express the latest version is 70.7.nc, whereas the paid version is at 71.6.

If this is the cause of all my headaches, I don't know whether to scream at Citrix for allowing me to do this or at myself for only just figuring this out.

VPX Express Download pages:
https://www.citrix.com/downloads/netscaler-adc/virtual-appliances/netscaler-vpx-express.html

VPX Release 10 (Full) Download pages:
https://www.citrix.com/downloads/netscaler-adc/virtual-appliances/netscaler-vpx-release-10.html

Thanks

W.



George Gildenhuys Members

Posted 04 December 2012 - 09:46 AM

This is a common issue *(driving me mad)*. I currently have a support case open for this, and it has been confirmed as a bug. I will let you know once I hear back from Citrix support.


Adrian Walker Members

Posted 04 December 2012 - 11:49 AM

Hi George,

Thanks for the post. Good to know it's not just me!! Please let me know the outcome. Probably yet another upgrade!

I hate Citrix: one of the largest software houses in the world and, in my 8 years' experience, they release nothing but bugs. Not a lot to choose between Adobe and Citrix really!! I guess that's why Citrix consultants get paid more than MS ones!

Rant over.

Thanks again for the info.

W.

Edited by: awalker30 on 04-Dec-2012 11:58



David Kirby Members

Posted 04 December 2012 - 11:58 AM

You have absolutely summed it up. I couldn't agree more. I have been working on an AGEE/NetScaler implementation for the past couple of months and have logged several calls with Citrix through our resellers. It is so buggy and very frustrating.



George Gildenhuys Members

Posted 04 December 2012 - 12:05 PM

I thought it was just me discovering all these bugs. In two months I think we are on our third bug-related issue.

So far I have had the following issues:
DNS resolution not working.
Kerberos Constrained Delegation crashes the appliance
Intermittently the source IP would use a different SNIP when connecting to a backend server

All bugs!

This is getting silly now.

I am thinking F5 next time.



Adrian Walker Members

Posted 04 December 2012 - 12:30 PM

Maybe I've missed something, but I always thought that when software houses released products to the general populace for beta testing, they were free, not £14,000 a pop!!! And they normally came with a big disclaimer.

It makes me laugh that one has to change the URL to /guia to administer the things. They've known about that for at least the last four releases. How difficult is it to update the URL?

CITRIX, for goodness' sake SORT IT OUT!!! We've had to take our NetScalers out of load balancing Exchange 2010 because of your poor workmanship. Perhaps you should replace your very busy product rebranding/renaming team with a test team? I suggest you read these forums: the posts are littered with disgruntled users because you keep releasing products full of bugs and charge top dollar. If you bought a TV that didn't work, what would you do? Take it back and buy a different make?



David Kirby Members

Posted 04 December 2012 - 12:56 PM

VMware and Cisco, anyone?



Peter Carter Members

Posted 14 October 2013 - 08:48 PM

I have the same problem too, intermittently. It's something buggy with the GUI and monitors/service groups.

"Bandage" / temp fixes for us are:

1) Connect to the servers by direct IP, don't use DNS
2) If you MUST use DNS I've found this worked: create a new monitor with the same settings as the one that's not resolving DNS (highlight the monitor and click "new" so it carries over all the settings from the one that's not working correctly). Specify the PORT on your "new" monitor instead of leaving it blank (or maybe change it to something else if you already had specified it before). Save the "new" monitor as something else. Bind your new monitor to the service group using the DNS hostnames and whatever port they listen on. Save, apply, refresh, and re-open the service group to see if it resolved the IP. If so, remove the "new" monitor and re-apply your old one. Hopefully DNS should still be resolving. Now leave it alone ;)
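Step 2 can be approximated in the CLI as below. The monitor names, the HTTP monitor type, port 80 and the service group name are all hypothetical placeholders. The sequence binds a freshly created copy of the monitor with an explicit destination port, checks whether the members resolve, and then puts the original monitor back and removes the temporary one:

```
add lb monitor MON-EXAMPLE-TMP HTTP -destPort 80
unbind serviceGroup SG-EXAMPLE-CAS -monitorName MON-EXAMPLE-HTTP
bind serviceGroup SG-EXAMPLE-CAS -monitorName MON-EXAMPLE-TMP
show serviceGroup SG-EXAMPLE-CAS
unbind serviceGroup SG-EXAMPLE-CAS -monitorName MON-EXAMPLE-TMP
bind serviceGroup SG-EXAMPLE-CAS -monitorName MON-EXAMPLE-HTTP
rm lb monitor MON-EXAMPLE-TMP HTTP
save ns config
```

The swap back (the last four commands) is only worth doing once `show serviceGroup` lists the members with real IP addresses rather than 0.0.0.0.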

If #2 doesn't work for you and you must use DNS, just know that it is a GUI bug and you have to sort of trick it into resolving DNS. It's as if it gets stuck in one state and can't resolve the name. The steps in #2 worked for us but may be different for you. Just keep tweaking/rebuilding things and trying to trick it into resolving again. I wish Citrix would fix this bug; I'm on version 76.7 and still see the issue. Don't upgrade to this version (or anything earlier) if you are trying to eliminate this bug. It might be fixed in future patches, but honestly I haven't checked.

Edited by: Peter Carter on Oct 14, 2013 4:49 PM



Neil Burton Members

Posted 02 July 2015 - 01:52 PM

Well Adrian, 2.5 years later I'm still having exactly the same problem, now with 10.5-57.5!

Two servers, both added identically by FQDN, which are resolvable and online. Both are bound to a service group. Looking at the members of the SG, one displays its IP and is available; the other displays 0.0.0.0 and "domain name not resolved".

Both names resolve successfully from the NS console. I also tried eliminating the local LB DNS configuration and specified an AD domain controller directly as a UDP_TCP name server.

Messed around exactly as per Peter's advice and, lo and behold, after binding a ping monitor to the SG it suddenly resolved and started working!
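Binding a ping monitor to the service group takes a single command. Here `ping` is assumed to be the built-in ICMP monitor, and the service group name is a hypothetical placeholder:

```
bind serviceGroup SG-EXAMPLE-CAS -monitorName ping
show serviceGroup SG-EXAMPLE-CAS
```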

It would be nice to think that this issue can be identified and fixed in a forthcoming release...



Inigo Alonso Members

Posted 17 February 2016 - 04:35 PM

Hi

I have just come across this issue on the NetScaler and found this thread. After some testing I think the problem is a race condition, and it can easily be avoided by configuring things in a precise order. I'm sharing the details below in case they are of use to others. The NetScaler version is 11.0-64.34.

The root cause of the problem seems to be this: if health monitoring is enabled for the service/service group to which the domain-based server is bound, the probe fails immediately because the server's domain name has not been resolved yet. The actual issue is that the NetScaler stays stuck in this state. Even once the name can be resolved, the probe never succeeds.

To avoid getting into this state, first configure the service/service group with healthMonitor set to "NO" and bind it to the domain-based server. Then let the NetScaler resolve the server name for the first time and update the service/service group. Once the name is resolved, health monitoring can be enabled on the service/service group.

Below is an example configuration. I've tested this several times and it has worked OK.

BTW, an additional caveat I've found: if the domain-based server and the service are configured in a traffic domain, the probes to the server are still sent using the default TD (!). The workaround is to use a service group. With a service group and domain-based servers in a non-default TD, I'm seeing the probes sent out using the TD configured for the service group.

> add server SVR-WIKIPEDIA-WEB en.wikipedia.org
Done
> add service SVC-POLL-WIKIPEDIA-WEB SVR-WIKIPEDIA-WEB TCP * -appflowLog DISABLED -healthMonitor NO
Done
> show service SVC-POLL-WIKIPEDIA-WEB
        SVC-POLL-WIKIPEDIA-WEB (198.35.26.96:*) - TCP                <<<< server domain name has been resolved
        State: UP
(...)

> bind service SVC-POLL-WIKIPEDIA-WEB -monitorName PROBE-PING
Done
> set service SVC-POLL-WIKIPEDIA-WEB -healthMonitor YES
Done

> show service SVC-POLL-WIKIPEDIA-WEB
        SVC-POLL-WIKIPEDIA-WEB (198.35.26.96:*) - TCP
        State: UP
(..)
1)      Monitor Name: PROBE-PING
                State: UP       Weight: 1       Passive: 0
                Probes: 1       Failed [Total: 0 Current: 0]
                Last response: Success - ICMP echo reply received.
                Response Time: 0.0 millisec
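The same ordering can be applied to a service group instead of a standalone service, which also covers the traffic domain caveat. The sketch below is a hypothetical adaptation. The object names, traffic domain ID 10, the FQDN and port 443 are placeholders, the built-in `ping` monitor stands in for PROBE-PING, and whether `-td` is accepted on each command should be confirmed against the CLI reference for the build in use:

```
add ns trafficDomain 10
add server SRV-EXAMPLE-APP1 app1.example.local -td 10
add serviceGroup SG-EXAMPLE-APP1 TCP -healthMonitor NO -td 10
bind serviceGroup SG-EXAMPLE-APP1 SRV-EXAMPLE-APP1 443
show serviceGroup SG-EXAMPLE-APP1
bind serviceGroup SG-EXAMPLE-APP1 -monitorName ping
set serviceGroup SG-EXAMPLE-APP1 -healthMonitor YES
save ns config
```

As in the service example, wait until `show serviceGroup` lists the member with its resolved address before running the `bind ... -monitorName` and `set ... -healthMonitor YES` steps.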

HTH

Inigo Alonso



Torben Aagaard Members

Posted 04 October 2016 - 08:33 PM

Thanks, Inigo.

I came across the exact same error on the latest 11.0 build, 68.10; it seems Citrix have not yet fixed this error :(



Torben Aagaard Members

Posted 05 October 2016 - 09:36 AM

Oh, BTW: the workaround above does not survive an NS reboot.

After rebooting, you have to disable and re-enable monitoring. I guess the same goes for an HA failover :(
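The disable/re-enable step can be done from the CLI. The service group name below is a hypothetical placeholder; for a standalone service, the equivalent would be `set service ... -healthMonitor`. The `show` in the middle is there to confirm that the members have resolved before monitoring is switched back on:

```
set serviceGroup SG-EXAMPLE-CAS -healthMonitor NO
show serviceGroup SG-EXAMPLE-CAS
set serviceGroup SG-EXAMPLE-CAS -healthMonitor YES
```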

Citrix, please fix!



Claus Jan Harms Members

Posted 09 January 2017 - 09:57 AM

The bug still exists in NetScaler 11.1 Build 51.21, but it can be temporarily fixed with the workaround mentioned above: disabling and re-enabling Health Monitoring on the Service/Service Group.



Citrix Administrators Members

Posted 13 April 2017 - 11:09 PM

Having the same issue, and I did manage to get mine up after disabling and re-enabling health monitoring. But isn't there an automated way to do this? If we have a power failure and it restarts, no one can log in, and it's a long drive to the office to get this fixed.



Valeri Bonchev Citrix Employees

Posted 18 April 2017 - 02:54 PM

Hi All,

The issue is very hard to reproduce and seems to be environment-specific. I encourage any of you who are experiencing the issue to open a support case. This will give us more data points and help with faster diagnosis of the issue.