Machine catalog not creating on AWS

Started by Joris Limousin , 07 March 2016 - 05:43 PM
21 replies to this topic

Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 07 March 2016 - 05:43 PM

Hi there,

 

I am at the last step of setting up a XenDesktop 7.7 infrastructure on Amazon Web Services. I have followed this guide (which, by the way, contains a lot of errors and broken hyperlinks) and have finally reached the last step, which is the creation of the Workers.

 

I created a Master AMI (with magnetic storage, as SSD storage may be the source of a few problems) and tried to create a machine catalog on my Delivery Controller.

 

I get the following error : "No facility available for disk upload. No facility available for disk upload. Unable to create any functioning volume service VMs."

 

I have gone through every blog post I could find about this error, but none of them helped me resolve the issue. I have changed my Master AMI storage type from SSD to magnetic, added "All traffic" rules on the involved security groups, and tried a lot of other fixes that should have solved my problem, but none of them worked.

 

Could you please advise on how to make this work?

 

Thank you in advance and have a nice day!

 

Joris Limousin



Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 07 March 2016 - 05:47 PM

I also want to add that the error happens just after the AMI becomes available. Two instances are created ("Preparation-Xen-..." and "Something-Temp") and then stopped, and then an AMI is created. Once this AMI becomes available, the installation process fails.



Paul Howard Citrix Employees

Paul Howard
  • 65 posts

Posted 08 March 2016 - 09:43 AM

Hello, Joris.

 

Does the provisioning process fail immediately at the point the volume worker AMI is finalized? Do you not see any EC2 instances with names like "Citrix.XD.VolumeWorker-xxxxx" being spawned from that AMI, however briefly?

 

The point at which you are seeing the process fail corresponds exactly to a known issue with XenDesktop 7.6. However, if you are using 7.7 (or 7.6 with Controller Update 3), then that issue should be fixed, in which case I'm not sure what's wrong. You said you're using 7.7, but I do just want to double check this because of the nature of this symptom being identical to an older known issue.

 

Which AWS region are you using?

 

Does the error message include any additional details?

 

Thanks,

Paul.



Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 08 March 2016 - 10:03 AM

Hi Paul,

 

Thank you for the quick answer.

 

Yep the process fails immediately after the AMI is finalized, and the only instances that I see are the "Preparation..." and the "...-Temp" ones.

 

I just checked the XenDesktop version and it seems to be version 7.5... I have been following this guide and selected XenDesktop 7.7 when filling in the CloudFormation templates, so I assumed it installed version 7.7, but apparently not.

 

So what do you suggest? Should I upgrade my StoreFront and Delivery Controller instances to version 7.7 or 7.8? Is that a hard process or is it straightforward?

 

I am also in the eu-west-1 region.

 

Also, can you confirm whether having an AMI with SSD storage makes this process fail, or has that been fixed in newer releases?

 

Finally, the guide I talked about is full of bugs/outdated links, so if you want to update this guide I think this would be beneficial to a lot of customers!

 

Thank you very much for your help and have a nice day!

 

Kind regards,

 

Joris Limousin



Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 08 March 2016 - 10:11 AM

Here is the script you use in one of the CloudFormation templates you provide. I guess I now better understand why it does not install the 7.7 version :P

 

if ($XDVersion -eq "XD76") {
    Start-BitsTransfer -Source "https://s3.amazonaws.com/cf-XenDesktop/ISO/XenApp_and_XenDesktop_7_6.iso" -Destination "\\$DestServer\$DestShare\"
} else {
    Start-BitsTransfer -Source "https://s3.amazonaws.com/cf-XenDesktop/ISO/XenApp_and_XenDesktop_7_5.iso" -Destination "\\$DestServer\$DestShare\"
}
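
Note that the else branch above downloads the 7.5 ISO for every value other than "XD76", which would explain a 7.7 selection ending up as 7.5. As an illustration only, and in Python rather than the template's PowerShell, a lookup that rejects unknown version codes would surface the problem at launch time instead of silently falling back. The function name and the "XD75" code are assumptions made for this sketch; only "XD76" and the two URLs come from the script quoted above.

```python
# Hypothetical sketch: fail loudly on an unknown version code instead of
# silently falling back to the 7.5 ISO. "XD75" is an assumed code name.
ISO_URLS = {
    "XD76": "https://s3.amazonaws.com/cf-XenDesktop/ISO/XenApp_and_XenDesktop_7_6.iso",
    "XD75": "https://s3.amazonaws.com/cf-XenDesktop/ISO/XenApp_and_XenDesktop_7_5.iso",
}


def iso_url_for(version: str) -> str:
    """Return the ISO URL for a known version code, or raise ValueError."""
    try:
        return ISO_URLS[version]
    except KeyError:
        raise ValueError(f"No ISO known for version {version!r}") from None
```

With this shape, asking for an unlisted code such as "XD77" raises an error rather than installing an older release.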


Paul Howard Citrix Employees

Paul Howard
  • 65 posts

Posted 08 March 2016 - 10:17 AM

Hi, Joris.

 

Thanks for the update. Clearly these scripts have not kept up with the pace of change! I can only say that I'm very sorry, and thank you for bringing this to my attention. Let me follow up internally and have the scripts audited and updated. I will also request an audit of the hyperlinks in the reference architecture document. You mentioned that you discovered other inaccuracies in the guide. Please feel free to share further details of these with me, and I will forward them on internally.

 

For now, I would recommend upgrading your existing infrastructure instances to 7.8. This should be a straightforward process. The delivery controllers can be upgraded in-place.

 

Paul.



Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 08 March 2016 - 10:59 AM

Hi Paul,

 

Thank you for your attention. I will upgrade my infrastructure to version 7.8 asap and keep you posted if I still have problems.

 

For the guide, here is what I remember.

 

I think the step 1 on configuring the AD instances is just fine.

 

In the step 2 about NetScalers, a CloudFormation template is missing. You provide a CloudFormation template that launches a NetScaler, and another one that launches 2 NetScalers (One in each AZ) and optionally 2 CloudBridges. This second template is not available, the link is outdated.

 

Something else that your guide does not mention is the need to add 10.16.9.11 and 10.16.1.10 as Subnet IPs for NS1, and 10.16.10.11 and 10.16.2.10 as Subnet IPs for NS2. I struggled with that.

 

For step 3, I also struggled a lot with the CloudFormation template. I remember that the section about striping the disks does not work because you are starting at index 0 instead of 1 with a "select disk=0". (See below the modified version I created)

 

"select disk=1 ",
"\n",
"select partition 1 ",
"\n",
"delete partition ",
"\n",
"select disk=2 ",
"\n",
"select partition 1 ",

...

 

So here you just need to increment the disk numbers by 1.
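
As a rough sketch of the same fix, the diskpart commands could be generated with the disk numbering starting at 1. Only the three commands quoted above are used; the function name, the first_disk parameter, and the disk count are illustrative assumptions, since the rest of the template is elided here.

```python
from typing import List


def diskpart_cleanup_lines(disk_count: int, first_disk: int = 1) -> List[str]:
    """Build the select/delete sequence for each data disk.

    Numbering starts at first_disk (1 by default) rather than 0,
    matching the modified template fragment quoted above.
    """
    lines = []
    for disk in range(first_disk, first_disk + disk_count):
        lines.append(f"select disk={disk}")
        lines.append("select partition 1")
        lines.append("delete partition")
    return lines
```

Calling diskpart_cleanup_lines(2) yields the commands for disks 1 and 2, in the same order as the fragment above.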

 

In my own implementation, I did not want to use an Enterprise version of SQL Server as this is quite costly, so I guess it could be good if you can provide an option to use SQL Server Standard with database mirroring or something else instead of AlwaysOn Availability Groups (which requires enterprise version), for small deployments like mine who cannot afford an Enterprise version. (I think I saw somewhere that the 2016 version will provide AlwaysOn for 2 nodes in the standard version, so that could solve the problem once it is released)

 

Finally for this step, I also remember that this is not a really stable deployment as it sometimes "randomly" fails. I needed to launch the CloudFormation template a few times before it worked. I do not exactly remember why, but I think it was also related to the disks, and I think I increased the waitAfterCompletion value somewhere to make it work more often.

 

For the step 4, I just remember having to manually join the second Delivery Controller to the first Delivery Controller site, because the CloudFormation template did not do it correctly. I think that's all for this step.

 

For step 5, I remember that I struggled with the Security Groups not being set up correctly.

 

And finally for the step 6, my issue is described previously in this post. :P

 

More generally, the default values/descriptions for CloudFormation parameters provided in the tables of your guide do not always match the actual parameters of the scripts. I remember that I struggled quite a lot with the right IP addresses to assign to the NetScaler instances, as they were different in your guide and in the templates.

 

Some of the Security Groups that are created are missing some rules; I am sorry, I do not remember each one. I wrote this down somewhere in case it is useful -> "Add 10.16.7.0/24 and 10.16.8.0/24 to the StoreFront Security group for port 808 (to enable replication between the 2 StoreFront instances). Add the 3008-3011 TCP port range to destination Anywhere to the NetScaler public security group, to enable synchronisation between NetScaler appliances."

 

I think that is pretty much most of the issues I encountered during the deployment. Most of them are easy fixes, but when you are a newbie like me, it can take a loooooot of time to fix them. I probably wasted a lot of time and money in this deployment because of the issues found in this guide and that's not cool but at least it made me understand every component of the infrastructure perfectly which is cool.

 

So if you need any other details please do not hesitate to ask me and I will try to help as much as I can.

 

Thank you again.

 

Joris Limousin



Paul Howard Citrix Employees

Paul Howard
  • 65 posts

Posted 08 March 2016 - 11:14 AM

Hi, Joris.

 

Thank you, and I really appreciate you taking the time to share all of these details. I will ensure that they are passed on to the appropriate people.

 

On the subject of the CloudFormation scripts, I should probably also draw your attention to Citrix Lifecycle Management as a newer alternative to these scripts:

 

https://www.citrix.com/products/citrix-lifecycle-management/overview.html

 

This recent product offering gives you, amongst other things, a push-button web interface for deploying an entire XenDesktop site within AWS. This would be a more reliable and better-maintained route to creating XenApp and XenDesktop infrastructure inside the public cloud. The scripts that you are using were created a couple of years ago, before the Lifecycle Management product was developed.

 

I will investigate the possibility of getting the existing cloud formation scripts updated.

 

I'll continue monitoring this thread in case you have any problems upgrading your controllers and getting MCS provisioning to work.

 

Thanks once again for your patience and assistance.

 

Paul.



Joris Limousin Members

Joris Limousin
  • 46 posts

Posted 08 March 2016 - 05:11 PM

Hi Paul,

 

I have been working on another problem that I am not able to solve. I was wondering if you could help.

 

So I have configured everything apart from the Workers.

 

When I type desktop.domain.net in my external browser on my machine it redirects me to the NetScaler Unified Gateway login page, which is fine.

 

When I type storefront.domain.net in an internal browser (For example on the browser of the AD instance, or even on any other instance) it just gives me "This webpage is not available".

 

From what I understood, the external beacon (desktop.xxx) should redirect me to the Citrix receiver page if I request it from an external device, and the internal beacon (storefront.xxx) should do the same but only if I am requesting from an internal device.

 

So why is the external beacon working but not the internal one? I should be able to access the interface from the internal beacon as well shouldn't I?

 

storefront.xxx resolves to my storefront vServer on my NetScaler (10.16.3.7 and 10.16.4.7)

 

Would you be able to help me on this because I am completely lost. :/

 

Kind regards,

 

Joris Limousin



Paul Howard Citrix Employees
  • #10

Paul Howard
  • 65 posts

Posted 09 March 2016 - 08:42 AM

Hi, Joris.

 

Yes, I would expect both the internal and external storefront sites to work. It might be worth an audit of your security group rules, since a blocking SG is the most common cause of issues like this. For example, which SGs are associated with your storefront servers, and are they permitting http(s) ingress from the instances in your internal network? Remember that security ingress rules also include a specific source, so seeing port 80 or 443 being open does not necessarily mean that it's open to anyone. I didn't create the CloudFormation script that was used here, I'm afraid, so I don't have any instinctive knowledge of how the security groups wire together. It could be that they were set up with an assumption that end users would always be connecting from outside the cloud, rather than within. It might just need one or two additional HTTP(s) ingress rules setting up somewhere.
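
To make the point about rule sources concrete, here is a minimal sketch using only Python's standard library. The rule layout (from_port, to_port, cidr) is a simplified, hypothetical shape and not the format returned by the AWS API, and the example rule is invented for illustration; the subnets merely reuse address ranges mentioned earlier in this thread.

```python
import ipaddress


def ingress_allows(rules, port, source_ip):
    """Return True if any rule admits traffic to `port` from `source_ip`."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        port_ok = rule["from_port"] <= port <= rule["to_port"]
        source_ok = addr in ipaddress.ip_network(rule["cidr"])
        if port_ok and source_ok:
            return True
    return False


# Hypothetical rule set: 443 is "open", but only to one subnet.
example_rules = [
    {"from_port": 443, "to_port": 443, "cidr": "10.16.7.0/24"},
]
```

With these example rules, a client at 10.16.7.5 is admitted on 443 while a client at 10.16.1.10 is not, even though the port appears open in the rule list.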

 

If that doesn't yield anything, then it might also be worth cross-posting this part of the question to the Storefront discussion forum as well.

 

Did you manage to create your MCS machine catalogs now?

 

Paul.



Joris Limousin Members
  • #11

Joris Limousin
  • 46 posts

Posted 09 March 2016 - 10:34 AM

Hi Paul,

 

I have successfully upgraded my 2 StoreFront instances to version 7.8, and they work as they did before the upgrade.

 

However, I upgraded the Delivery Controller 1 and I cannot launch Citrix Studio. When trying, it just opens a window saying "LaunchConsole has stopped working". I tried to restart but it didn't change anything.

 

What am I supposed to do?

 

Kind regards,

 

Joris Limousin



Joris Limousin Members
  • #12

Joris Limousin
  • 46 posts

Posted 09 March 2016 - 11:01 AM

Okay, I have decided to uninstall and reinstall XenDesktop on my Delivery Controllers; it will be easier.

 

I will keep you posted.



Paul Howard Citrix Employees
  • #13

Paul Howard
  • 65 posts

Posted 09 March 2016 - 01:23 PM

Hi, Joris.

 

Sorry for the radio silence - I had to set up a 7.5 environment in my lab to test the upgrade to 7.8. Sure enough, I can reproduce your issue where the Studio console fails to launch. It's quite a low-level failure and I haven't triaged the cause yet.

 

Unfortunately, I think this is a bug affecting the upgrade between those specific versions. On that basis, I think you have probably chosen the best course of action by deploying new controllers.

 

I will log this in our issue tracking system.

 

Thanks,

Paul.



Joris Limousin Members
  • #14

Joris Limousin
  • 46 posts

Posted 10 March 2016 - 11:47 AM

Hi Paul,

 

I have reinstalled the Delivery Controllers on both instances after cleaning the existing users and databases out of the SQL Server instance.

 

The first DC instance worked fine. I created the site, the data was successfully inserted into the database, and Studio shows that everything is working.

 

Then I tried to join the second Delivery Controller to the site. I pressed "Yes" when asked whether I wanted to update the database automatically, and an error came up. I ran the wizard to join the site again, pressed "No" this time, and executed the scripts manually on the database. They executed successfully.

 

The wizard on the second DC continued and then an error, "Cannot communicate to the database", came up. And guess what: if I go back to Studio on my first DC, I now get the same error when opening "Controllers" or "Zones".

 

I'm starting to get really tired of all these issues, and I'm wasting a hell of a lot of money hourly, as AWS is charging me for running the instances and you are charging me for using the NetScaler instances...

 

Could you please assist me to finally have something that is working?

 

And do you know whom I should ask about a discount when I buy all your licenses? After all this time fighting to make it work and paying AWS for the instances, I think I deserve something from your side.

 

Anyway, thank you very much for your help, Paul, and have a nice day!

 

Kind regards,

 

Joris Limousin



Joris Limousin Members
  • #15

Joris Limousin
  • 46 posts

Posted 10 March 2016 - 02:07 PM

All right, I fixed this issue. I now have a working XenDesktop 7.8 environment.

 

I will work on MCS this afternoon and keep you posted.



Paul Howard Citrix Employees
  • #16

Paul Howard
  • 65 posts

Posted 10 March 2016 - 02:25 PM

Hi, Joris.

 

Thanks for the update. I was following up internally on some of your other questions, but in the meantime I'm glad that you've been able to restore the site.

 

Incidentally, the thing we stumbled across with the upgrade turned out to be one of the listed known issues with Studio in 7.8:-

 

http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-8/whats-new/known-issues.html

 

The issue we had with Studio was actually issue #617897 from that list. There was, in fact, a workaround, but of course it's too late to apply it in this case. I wasn't aware of the issue myself, so I lost some time reproducing it and reporting the bug internally. By the time I discovered it was a known issue, the workaround was no longer relevant for us.

 

I'll continue to stand by for any MCS issues that you might have.

 

Thanks,

Paul.



Joris Limousin Members
  • #17

Joris Limousin
  • 46 posts

Posted 10 March 2016 - 03:31 PM

Hi Paul,

 

I just tried to set up a Machine Catalog using the AMI I created, with SSD storage. It failed, but at least it created the "***Volumeworker***" instance, so that's a good sign and shows that there is progress.

 

I will try with a Magnetic storage and tell you if it works.



Joris Limousin Members
  • #18

Joris Limousin
  • 46 posts

Posted 10 March 2016 - 03:56 PM

Hi Paul,

 

So I tried again with an AMI on magnetic storage and it failed again.

 

What am I supposed to do in order for this to work?

 

Thank you in advance.



Paul Howard Citrix Employees
  • #19

Paul Howard
  • 65 posts

Posted 10 March 2016 - 04:46 PM

Hi, Joris.

 

If the VolumeWorker instance is being spawned correctly, then the most likely remaining issue is a failure of the SSL handshake that needs to take place between that instance and the delivery controller.

 

As part of your catalog creation process, you will have made some selections that could potentially impact this process. Specifically:

 

(1) You will have chosen a network to provision onto.

(2) You will have chosen one or more security groups to provision the instances into.

 

You will have chosen these settings for the catalog instances that you wish to deploy with MCS. However, we do use these exact same settings to deploy these temporary volume worker instances.

 

The controller needs to be able to communicate with the volume worker instance on port 443.

 

So, which network have you chosen to deploy onto? And is this network routable to the delivery controller? Is it part of the same VPC, for example?

 

And, which security group(s) are you choosing? Do these choices permit ingress on 443 from the delivery controller's security group(s)? A common error here is to just choose a "default" or "quicklaunch" security group, which may not have the required ingress. The CloudFormation script will have built security groups that are appropriate for the VDAs inside your VPC. These will probably have names like "Private Group" and "Domain Members Group" for example. The names don't really matter. You just need to look over the security groups that you are choosing and make sure that 443 ingress is permitted from a source security group where your delivery controller is running.
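
The check described above can be sketched as follows. This is a simplified model, not the AWS API: the dictionary layout, the function name, and the group IDs in the example are all hypothetical, and a real audit would read the rules from the EC2 console or API.

```python
def permits_port_from_groups(chosen_groups, port, source_group_ids):
    """Return True if any chosen group admits `port` from one of the source groups."""
    sources = set(source_group_ids)
    for group in chosen_groups:
        for rule in group["ingress"]:
            in_range = rule["from_port"] <= port <= rule["to_port"]
            if in_range and sources & set(rule["source_groups"]):
                return True
    return False


# Hypothetical groups: one admits 443 from the controller's group, one does not.
vda_group = {
    "id": "sg-vda-example",
    "ingress": [
        {"from_port": 443, "to_port": 443, "source_groups": ["sg-controller-example"]},
    ],
}
default_group = {
    "id": "sg-default-example",
    "ingress": [
        {"from_port": 3389, "to_port": 3389, "source_groups": ["sg-admin-example"]},
    ],
}
```

Choosing only default_group for the catalog would fail this check for port 443, which mirrors the "default" or "quicklaunch" mistake described above.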

 

How quickly is the process failing? Is the volume worker instance running for several minutes before it fails? If so, then a network or security group issue of some kind is very likely to be the problem.

 

Try pinging the volume worker instance from your delivery controller while it is running. If this doesn't work, then it confirms that there is some kind of network or firewalling issue that you will need to resolve.
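
Since security groups filter ICMP and TCP separately, a direct TCP connection attempt to port 443 can complement the ping test. Below is a minimal sketch using only Python's standard library; the function name and timeout are arbitrary choices, and the address in the usage comment is a placeholder rather than a value from this thread.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example usage from the delivery controller (placeholder address):
# print(can_connect("10.0.0.25", 443))
```

A False result while the volume worker instance is running would point to a routing or security group problem between the controller and the instance.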

 

Paul.



Joris Limousin Members
  • #20

Joris Limousin
  • 46 posts

Posted 10 March 2016 - 05:26 PM

Hi Paul,

 

The problem was indeed the Security Group. I now have a working machine catalog.

 

My last issue will be with StoreFront, but I will post in the appropriate section of your forum.

 

Thank you very much for all your attention to help me resolve this issue. This is greatly appreciated.

 

Have a nice day!

 

Kind regards,

 

Joris Limousin