Monitoring HTTPS gives error - Time out during SSL handshake stage


Jonathan Hoppe

We monitor hundreds of websites with HTTP monitors that have the secure option enabled on port 443, essentially checking each website over HTTPS and looking for a 200 response code. Pretty typical stuff. Several of these sites have the monitor consistently fail, and when we look at the servicegroup to see why, the monitor says "Last response: failure - Time out during SSL handshake stage". This is cropping up more and more, and we can't figure out why.

 

We've looked at the typical settings like the response time-out to be sure that wasn't too low, SSL certificate errors, and we tried "GET /" instead of "HEAD /" as well, all without success.

 

To dig deeper, I SSH'd into the Netscaler, dropped to the shell, and ran curl -IL -k "https://www.example.com" to see what would result (example.com is not the real website, of course) in an effort to mimic what the Netscaler's monitor might be doing. I got back this: "curl: (35) Unknown SSL protocol error in connection to www.example.com:443". Okay, this is somewhat helpful, although not entirely. But for other sites that give the same "Time out during SSL handshake stage" error, curl is no help, because it shows "HTTP/1.1 200 OK" in the response, which makes it even more perplexing as to why the monitor is down.
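A curl test only shows whether some handshake succeeds, not which protocol version was negotiated or how long it took. Below is a rough Python sketch that tries one handshake per TLS version and reports the result and timing. It is only an approximation of what a monitor might do, not the Netscaler's actual probe logic. The names make_context and probe are made up for illustration, certificate checks are skipped (like curl -k), and the local OpenSSL build may refuse older protocol versions regardless of what the server supports.

```python
import socket
import ssl
import time


def make_context(version=None):
    """Build a client context that skips certificate checks (like curl -k).

    If version is an ssl.TLSVersion member, pin the handshake to that version.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    if version is not None:
        ctx.minimum_version = version
        ctx.maximum_version = version
    return ctx


def probe(host, port=443, version=None, timeout=5.0, send_sni=True):
    """Attempt one TLS handshake; return (ok, detail, seconds)."""
    ctx = make_context(version)
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            name = host if send_sni else None
            with ctx.wrap_socket(raw, server_hostname=name) as tls:
                ok, detail = True, f"{tls.version()} {tls.cipher()[0]}"
    except OSError as exc:              # ssl.SSLError and timeouts are OSErrors
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, detail, time.monotonic() - start


if __name__ == "__main__":
    # www.example.com is a placeholder, as in the post above.
    for version in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
        ok, detail, secs = probe("www.example.com", version=version)
        print(f"{version.name}: ok={ok} {detail} ({secs:.2f}s)")
```

Running it once with send_sni=True and once with send_sni=False may also show whether the server behaves differently when no server name is sent.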

 

I'm posting today with the hope that someone has encountered a similar issue in the past and may have some ideas and/or troubleshooting strategies to see if we can get to the bottom of this. Right now we've had to use simple PING monitors for these sites instead, but that is far from accurate since PING might work when Apache/IIS is down, as we all know.

 

Thanks!


Why not run a TCP trace on the Netscaler, and see what the actual monitors are doing, rather than trying to mimic them?

 

Not only will you be able to see the packet details, you'll see the timings. If you load your SSL private key into Wireshark, it will (if recent) also decrypt the SSL packets.

 

Also worth remembering that HTTP 1.1 says you should have a Host header, which the default monitors don't actually have...
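To see whether a missing Host header changes what the server returns, the two requests can be compared side by side. This is a minimal Python sketch, not the monitor's real request. The function names build_request and status_line are hypothetical, www.example.com is a placeholder, and certificate verification is turned off for testing only.

```python
import socket
import ssl


def build_request(method, path, host=None):
    """Return raw HTTP/1.1 request bytes, with or without a Host header."""
    lines = [f"{method} {path} HTTP/1.1"]
    if host:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(host, request, port=443, timeout=5.0):
    """Send the raw request over TLS and return the first response line."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # testing only: skip verification
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            reply = tls.recv(4096)
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    target = "www.example.com"
    print("with Host:   ", status_line(target, build_request("HEAD", "/", target)))
    print("without Host:", status_line(target, build_request("HEAD", "/")))
```

If the two status lines differ, the monitor's request probably needs a Host header added before the check can pass.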


We are having the same problem. We have found that the monitors work fine with SHA-1 certificates, but they fail with SHA256 certificates. Are you using SHA256 certificates? We are having the issue when attempting to monitor two Web Interface servers running on Windows Server 2008 R2. We are using VPX 3000s on 10.5. Did you find a solution? We have been working with Citrix support, thus far to no avail. I am on the line with them now and just found your post.


John, I get the error with SHA1 certs too, so that doesn't seem to be the case for me. I also have successful HTTP monitoring with both types of certs, so it is something unique. It could be an IPS, firewall or something else on the other end too, for all I know.

 

I'm working on setting up a test VPX device so I can run a trace like Paul suggests above. My production netscaler appliance is so busy that running a trace for 1 minute makes Gigs and Gigs of data. I'll post any results.


Re live box: can't you use a filter on the trace to pre-limit the data you capture?

 

But yeah, a lab VPX would work well (although do remember that VPX doesn't do TLS 1.1 or 1.2, which could be part of the issue!)

 

I'm assuming that you have "normal" timings for the monitor, so it's not a REAL timeout issue?


Possible reasons:

1) If a firewall between the Netscaler and these servers has been patched to block SSLv3 (a "POODLE SSLv3 block"), it's possible that the firewall drops the packets when the Netscaler uses SSLv3 for the SSL handshake. Better to disable SSLv3 on the services, which forces the service monitors onto TLSv1.

2) The backend servers are short on resources and are rejecting some SSL connections.

3) The backend servers have multiple interfaces, and some return traffic is not routed back to the Netscaler because it leaves through a different interface and loops in your network.


  • 3 months later...

I can't believe that 3 months have gone by, but I finally had a couple hours to spare today, so I ran a trace and captured what the monitor was doing. This was the result:

 

TLSv1 Record Layer: Alert (Level: Fatal, Description: Unsupported Certificate)
Content Type: Alert (21)

 

So I dug further to find the difference between this monitored device and others and found that this device has a certificate with a 4096-bit key. So I did some more testing and indeed, 4096-bit keys are not supported. 2048, no problem. Maybe if I have extra time in the next week, I'll try a 3072 to see how that goes.
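For anyone wanting to find which backends present larger keys before the monitor fails, here is a rough Python sketch that reads the RSA modulus size straight out of a server certificate using only the standard library. It assumes an RSA certificate and does a very minimal DER walk. The helper names (read_tlv, children, rsa_key_bits) are made up for illustration, and www.example.com is a placeholder. It inspects only what the server presents and says nothing about what the Netscaler itself accepts.

```python
import ssl


def read_tlv(data, offset=0):
    """Return (tag, value_bytes, next_offset) for the DER element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:                   # long-form length
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length


def children(value):
    """Split the contents of a constructed DER element into (tag, value) pairs."""
    items, pos = [], 0
    while pos < len(value):
        tag, val, pos = read_tlv(value, pos)
        items.append((tag, val))
    return items


def rsa_key_bits(cert_der):
    """Return the RSA modulus size in bits for a DER-encoded X.509 certificate."""
    _, cert_body, _ = read_tlv(cert_der)            # Certificate SEQUENCE
    fields = children(children(cert_body)[0][1])    # tbsCertificate fields
    # v3 certificates start with an explicit [0] version element (tag 0xA0).
    spki_index = 6 if fields[0][0] == 0xA0 else 5
    bit_string = children(fields[spki_index][1])[1][1]
    tag, rsa_seq, _ = read_tlv(bit_string[1:])      # skip the unused-bits byte
    if tag != 0x30:
        raise ValueError("public key does not look like an RSA key")
    modulus = children(rsa_seq)[0][1]
    return int.from_bytes(modulus, "big").bit_length()


if __name__ == "__main__":
    pem = ssl.get_server_certificate(("www.example.com", 443))
    print(rsa_key_bits(ssl.PEM_cert_to_DER_cert(pem)), "bit RSA key")
```

Looping this over the list of monitored hosts would give a quick inventory of key sizes to compare against the monitors that report the handshake timeout.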

 

So the next question to the folks at Citrix is WHY!!! This will become a huge problem in a very short amount of time. Hopefully it is on the roadmap for support.


  • 1 year later...

The 1st option (disabling SSLv3 on the services) fixed my issue. Thanks.

