Shutdown script for pool and NAS

Emanuele Chionetti · September 3, 2015

Dear all.

I'm trying to set up the last few things on my new pool.

At the moment I am working on a script that should perform the following actions when the UPS detects a power failure:

- shut down all the VMs in the pool (all of them have xs-tools installed and working)

- shut down all the hosts in the pool except the pool master (currently there is only one slave, but I could add more in the future and I want my script to be smart enough to handle that)

- shut down the NAS containing the main SR for the pool (connected via iSCSI)

- shut down the pool master

I wrote the following script but, this being a production environment, I've not been able to test it yet. In any case, before doing so I'd need some clarification, as per my questions below.

------------------------------

#!/bin/bash

# THIS HOST HAS TO BE THE POOL MASTER

# Obtain the UUIDs of all the VMs in the pool
vm_uuid=$(xe vm-list --minimal)

# Shut down all the VMs in the pool
xe vm-shutdown uuid=$vm_uuid --multiple

# Obtain the name of this host
this_host=$(hostname)

# Obtain the UUID of this host
this_host_uuid=$(xe host-list name-label=$this_host --minimal)

# Obtain the UUIDs of all the hosts in the pool but this one
other_hosts_uuid=$(xe host-list --minimal | tr ',' '\n' | grep -v $this_host_uuid)

# Disable and shut down all the hosts in this pool but this one
xe host-disable uuid=$other_hosts_uuid --multiple
xe host-shutdown uuid=$other_hosts_uuid --multiple

# Disable this host
xe host-disable uuid=$this_host_uuid

# Shut down the NAS
ssh root@192.168.0.181 'shutdown'

# Shut down this host
xe host-shutdown uuid=$this_host_uuid

------------------------------

Some questions.

1. How can I be sure one command has completed before the next one is processed? In particular for the NAS: when I send the shutdown command I have to be sure the VMs are switched off.

2. Is it safe to shut down all the VMs together, or should I at least filter out the Dom0 VMs using the param is-control-domain and shut them down separately?

3. Do you think the sequence is correct? In particular for the NAS: do I have to wait a moment before shutting down the master (as in my script), or could I send the shutdown command immediately after the VMs are off?

Many thanks for your help.

EC

Tobias Kreidl · September 3, 2015

Hi, Emanuele:

A couple of things that might help. The master/slave role of a given host can be checked by looking at the file /etc/xensource/pool.conf on a XenServer. Also, "xe pool-list" will tell you the UUID of the host that is the pool master.
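The pool.conf check could be wrapped in a small helper. This is a minimal sketch, assuming the file contains the single word "master" on the pool master and a "slave:<address>" entry on the other hosts; the function name is_pool_master and its optional path argument are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) only if the given pool.conf
# marks this host as the pool master.
# Assumption: the file holds "master" on the master and "slave:<address>"
# on every other host.
is_pool_master() {
    local conf="${1:-/etc/xensource/pool.conf}"
    [ -r "$conf" ] && [ "$(tr -d '[:space:]' < "$conf")" = "master" ]
}
```

A shutdown script could then start with something like "is_pool_master || exit 1" so that it refuses to run on a slave.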

You only need to shut down running VMs, so I'd use something like this to get the list:

xe vm-list power-state=running --minimal

As to commands, if strung together in a script they will execute in sequence, each waiting for the previous line to finish before moving on. You can typically check the return code in the variable "$?" to see whether the command just executed was successful (0 indicates full success). In particular, you may have issues any time a VM doesn't shut down cleanly, so you may want to create a loop that queries whether any VMs are still up and check that it completes cleanly before proceeding. Alternatively, you may just want to give up and go ahead, as shutting down the host is better than having the power suddenly turned off. For unresponsive VMs, consider the suggestions listed here: http://support.citrix.com/article/CTX131421

A forced shutdown or power state reset may be required for some VMs.

If you do not have a very large number of VMs on each server, doing the shutdown individually instead of using the multiple flag would give you more control, but would also be slower.

As with anything like this, it's always best to try it out on a test system first.

Regards,

-=Tobias
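The two suggestions in this reply (listing only the running VMs and looking at "$?" after each command) could be sketched as follows. The function names running_vms and shutdown_vm are made up for illustration, and the is-control-domain=false filter is an added assumption to keep Dom0 out of the list; treat this as a starting point to try on a test system, not a finished script.

```shell
#!/bin/bash
# Print the UUIDs of the running guest VMs, one per line.
# Assumption: is-control-domain=false is used to exclude Dom0.
running_vms() {
    xe vm-list power-state=running is-control-domain=false --minimal \
        | tr ',' '\n' | sed '/^$/d'
}

# Try a clean shutdown of one VM and report failure through the return code.
shutdown_vm() {
    local uuid="$1"
    xe vm-shutdown uuid="$uuid"
    local rc=$?                     # return code of the xe command just run
    if [ "$rc" -ne 0 ]; then
        echo "clean shutdown of $uuid failed (rc=$rc)" >&2
        return 1
    fi
    return 0
}
```

Shutting VMs down one at a time this way gives per-VM control, at the cost of being slower than a single --multiple call.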

Emanuele Chionetti · September 8, 2015

I promised to post the final procedure and here I am.

Attached you will find the script, which I have tested and which is currently running on my pool master.

The script is meant to run on the pool master so that it is the last host to be shut down.

If the host is not the pool master the script will exit.

At the moment it is not taking into account HA or WLB.

In essence, these are the operations it performs:

1) gracefully shut down all the VMs, all together and in the background (be careful when using it with a high number of VMs)

2) monitor the shutdown for a maximum amount of time; if the shutdown command above failed for some VMs, it is issued again with the "force" option enabled

3) if the maximum time has elapsed and some VMs are still running, issue a "reset-powerstate" command to allow the host to shut down

4) disable and shut down, in the background, all the hosts in the pool except the one on which the script is running (that is, the pool master)

5) monitor the hosts by pinging them until they are no longer reachable

6) disable the pool master

7) shut down the NAS(es) (Linux based) by issuing a shutdown command via ssh

8) shut down the pool master
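Steps 2 and 3 of the list above could look roughly like this for a single VM. This is only a sketch and not the content of the attached script; the function name escalate_vm is made up for illustration. Keep in mind that vm-reset-powerstate is generally described as changing only the recorded power state of the VM, which is why it is tried last here.

```shell
#!/bin/bash
# Hypothetical helper: escalate on one VM that is still running after the
# deadline. A forced shutdown is tried first; the power-state reset is used
# only if that also fails. Prints which step took effect.
escalate_vm() {
    local uuid="$1"
    if xe vm-shutdown uuid="$uuid" force=true >/dev/null 2>&1; then
        echo "forced"
    elif xe vm-reset-powerstate uuid="$uuid" --force >/dev/null 2>&1; then
        echo "reset"
    else
        echo "failed"
        return 1
    fi
}
```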

You can choose the best way for your environment to call the script at the right moment.

In my case apcupsd calls it through apccontrol.

I want to thank Tobias for his help.

Hope this will be helpful for someone else.

Regards.

EC

doshutdown.zip

Emanuele Chionetti · September 3, 2015

Tobias, thank you.

As usual, your suggestions are far more than welcome.

Maybe I could issue a shutdown with --multiple as a first step to speed things up, and then loop through all the previously running VMs to check whether they have shut down. If any are still running I could force the shutdown. If even the forced shutdown fails, I'd give up and continue with shutting down the host.

In any case, once "xe vm-shutdown" finishes with error code 0 I can be sure the VM is stopped, right? My concern is the eventuality of shutting down the NAS while the VMs are still shutting down.

Finally: do you think it's better to unplug the iSCSI SR PBD before shutting down the NAS?

Thanks again.

EC

Tobias Kreidl · September 3, 2015

A return code of 0 just indicates that the command executed correctly, not necessarily that the shutdown itself succeeded. Hence checking the status of all your VMs in a loop is a good way to make sure all VMs are properly dealt with.

Try it on a VM that fails (maybe a non-existent one), or on one that's already shut down, and see what the return code is.

If there is no iSCSI activity you probably don't need to unplug the NAS would be my thought.

Best,

-=Tobias
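Checking the actual power state instead of relying on the return code could be sketched like this. The function names vm_is_halted and count_not_halted are made up for illustration; "halted" is assumed to be the power-state value reported for a VM that is fully off.

```shell
#!/bin/bash
# Succeeds only if the VM's recorded power-state is "halted".
vm_is_halted() {
    local state
    state=$(xe vm-param-get uuid="$1" param-name=power-state 2>/dev/null)
    [ "$state" = "halted" ]
}

# Print how many of the given UUIDs are not halted yet.
count_not_halted() {
    local n=0 uuid
    for uuid in "$@"; do
        vm_is_halted "$uuid" || n=$((n + 1))
    done
    echo "$n"
}
```

The NAS shutdown would then be sent only once count_not_halted prints 0 for the list of VMs that were running at the start.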

Emanuele Chionetti · September 3, 2015

Tobias. I worked a bit on the VM shutdown script. What do you think of this?

Basically, I first try to gracefully shut down all the VMs. If the command fails, I try again with the "force" option.

Then I begin monitoring the VMs, waiting until either a maximum time has passed or all the VMs have shut down successfully, whichever occurs first.

#!/bin/bash

# This will try to shut down all the running VMs in the pool

# Max time [seconds] before the script gives up, passed as the first argument
max_shutdown_time=${1:?usage: $0 <max_shutdown_time_in_seconds>}

# Obtain the UUIDs of the running VMs (Dom0 is filtered out, it cannot be shut down this way)
vms_uuid=$(xe vm-list power-state=running is-control-domain=false --minimal)

for vm_uuid in $(echo "$vms_uuid" | tr "," " ")    # For each running VM
do
    # Gracefully shut down; if the command exits with an error code,
    # the shutdown is issued again with the "force" option
    if ! xe vm-shutdown uuid="$vm_uuid"
    then
        xe vm-shutdown uuid="$vm_uuid" force=true
    fi
done

vms_status=99999       # Number of running VMs, initialized with a huge number
start_time=$SECONDS    # Start of the waiting time

# Loop until all the VMs are shut down or the max waiting time has passed
until [ "$vms_status" -eq 0 ] || [ $(( SECONDS - start_time )) -gt "$max_shutdown_time" ]
do
    vms_status=0       # Reset the number of running VMs
    for vm_uuid in $(echo "$vms_uuid" | tr "," " ")    # For each initially running VM
    do
        # Obtain its power state
        vm_status=$(xe vm-param-get uuid="$vm_uuid" param-name=power-state)
        if [ "$vm_status" = "running" ]
        then
            vms_status=$((vms_status+1))    # Increase the number of running VMs by one
        fi
    done
    sleep 5            # Wait 5 seconds
done

exit 0

Thanks in advance for your thoughts.

EC

Tobias Kreidl · September 3, 2015

Looks nice from just a quick look.

Depending on the amount of time on your UPS, you may wish to figure out what the longest acceptable time is before you have to force the host to shut down.

Also, you'd want your master to go down last of all the hosts, so what is the plan to coordinate that? Have it ping the others and not shut itself down until they are no longer reachable? One last thought for now: you'd of course want to turn off HA first thing, if you have it enabled, as well as WLB, if either of those is implemented within your pool.

-=Tobias
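The idea of pinging the other hosts until they stop answering could be sketched as below. The function name wait_hosts_down and the POLL_INTERVAL variable are made up for illustration, the addresses are supplied by the caller, and the ping options assume the usual Linux ping (-c for the packet count, -W for the reply timeout in seconds).

```shell
#!/bin/bash
# Wait until none of the given addresses answers ping, or until the timeout
# (in seconds, first argument) has passed.
# Returns 0 if all hosts are down, 1 if the timeout was reached first.
wait_hosts_down() {
    local timeout="$1"; shift
    local start now host alive
    start=$(date +%s)
    while :; do
        alive=0
        for host in "$@"; do
            ping -c 1 -W 1 "$host" >/dev/null 2>&1 && alive=$((alive + 1))
        done
        [ "$alive" -eq 0 ] && return 0
        now=$(date +%s)
        [ $((now - start)) -ge "$timeout" ] && return 1
        sleep "${POLL_INTERVAL:-2}"    # pause between polling rounds
    done
}
```

The timeout matters here for the same reason as with the VMs: the UPS runtime is limited, so the master should eventually go down even if a slave keeps answering.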

Emanuele Chionetti · September 4, 2015

OK, I've dug deeper into this matter and found some interesting things.

First of all, I've been able to test what I did yesterday and, apart from some scripting errors, it worked well.

All the VMs were shut down, then the slave host(s), the NAS, and finally the master.

I followed Tobias' suggestion and added a loop checking whether the slave hosts are still reachable, so as to shut down the master only when the slaves are off (I will post my final script as soon as it is working, in case someone is interested in something similar).

Now the problem.

Contrary to my initial concerns I found out that "xe vm-shutdown" waits until the VM is completely switched off.

Although in general this is a desirable behavior, in this case I need to speed up the shutdown procedure.

What if some VM becomes unresponsive or locks up for some reason during the procedure? At some point the UPS will need to turn off and we could still be waiting for some VMs to shut down.

Furthermore, I have a VM taking ages to turn off. I need to begin its shutdown as soon as possible and I do not want to wait until it is turned off to begin shutting down the others.

At the moment the loop that verifies whether all the VMs are turned off is completely useless: if the script reaches the loop, it means all the VMs are already shut down.

Do you know a way of starting the shutdown process and going on without waiting for the command to complete?

@Tobias: I surely agree HA and WLB should be taken into account; at the moment neither of them is enabled on my pool; something to work on in the future.

Many thanks to anyone wishing to help.

EC

Tobias Kreidl · September 4, 2015

Hi, Emanuele:
You can always use the command

xe vm-reset-powerstate uuid=<UUID_of_the_VM_to_recover> --force

to force it to be off. As to other processes, if you background them (by adding an "&" at the end of the command line), the process will run in the background and can be monitored with the "jobs" command to see if it is still running. So instead you'd run:

xe vm-shutdown uuid=$vm_uuid force=true &

That way, you are not forced to wait if one VM gets stuck and can still keep track of all their processes. That might overload the server if you have a huge number of VMs running, so I'd be aware of that. Staggering the shutdowns somewhat in time would help.

--Tobias
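Backgrounding the shutdowns and staggering them could be sketched as follows. Instead of the "jobs" command, this sketch records the PID of each background command via "$!" and probes it with "kill -0", which is a swapped-in way of tracking the same processes; the names start_shutdowns, pending_shutdowns, SHUTDOWN_PIDS and STAGGER are made up for illustration.

```shell
#!/bin/bash
SHUTDOWN_PIDS=""    # PIDs of the background xe commands started so far

# Start a clean shutdown of every given VM in the background, pausing
# STAGGER seconds between VMs, without waiting for any of them to finish.
start_shutdowns() {
    local uuid
    for uuid in "$@"; do
        xe vm-shutdown uuid="$uuid" >/dev/null 2>&1 &
        SHUTDOWN_PIDS="$SHUTDOWN_PIDS $!"
        sleep "${STAGGER:-1}"
    done
}

# Print how many of the background shutdown commands are still running.
pending_shutdowns() {
    local pid n=0
    for pid in $SHUTDOWN_PIDS; do
        kill -0 "$pid" 2>/dev/null && n=$((n + 1))
    done
    echo "$n"
}
```

A monitoring loop can then poll pending_shutdowns together with the VMs' power state until a deadline, so one stuck VM no longer blocks the others.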

Robert Madrian · August 6, 2019

Hello can you repost the script because I cannot download it?

robert

Emanuele Chionetti · August 7, 2019

15 hours ago, Robert Madrian said:

Hello can you repost the script because I cannot download it?

Once signed in you will be able to download the attachment.

I just tried and it worked. For some reason it saved the file with a different name and extension. Just rename the file and you're done.

emanuele
