Had an issue with RAID and now XenServer 6.5 is in a reboot loop

Started by warrentack , 06 January 2017 - 04:53 PM
31 replies to this topic

warrentack Members

Warren Tack
  • 17 posts

Posted 06 January 2017 - 04:53 PM

Hi all.

My first post and I think it's a biggie. I've got an issue with a XenServer. The server had a hardware fault and, in short, lost its RAID configuration. The array has been rebuilt (initially with a dodgy drive, which has now been replaced; the RAID is reporting everything is OK). XenServer attempts to boot, gets to the splash screen and then restarts itself. I've pressed ESC on the splash screen and grabbed the attached screenshot.

 

I am happy to rebuild the XenServer if there's no alternative, but I need to be able to rescue 2 of the VMs that are on there. Do you think it's possible?

 

Any help is greatly appreciated, this server has been out of action since the 23rd of Dec!

Attached Thumbnails

  • rebootloop.jpg


Tobias Kreidl CTP Member

Tobias Kreidl
  • 18,287 posts

Posted 06 January 2017 - 07:19 PM

First off, welcome to the forum, and sorry this is the nature of your first post!

 

Your best hope would be to boot in single-user mode (google for how to do this, depending on the version of XS you're running) and see if you can even see the native storage drives and if perhaps they need a thorough file system check (fsck) performed. Are the VMs in question on local storage?

 

-=Tobias



warrentack Members

Warren Tack
  • 17 posts

Posted 07 January 2017 - 03:10 PM

Thanks for your help.

 

They are on local storage, yes. The server did actually boot and ran for about 5 hours after the RAID was first rebuilt, then the VMs froze, so I rebooted (a hard reboot, as the console wasn't able to reboot; it said it couldn't suspend the VMs). The server then decided one of the disks had failed, so I replaced it and the RAID started to recover. The server started to boot into XS and reported an error telling me to run fsck, so I did. I repaired lots of errors (however, I think I ran fsck whilst the RAID was still recovering), and after the next reboot the reboot loop appeared.

 

I'll look into single-user mode on Monday. Do you think I've caused a bigger problem by what I did above?



Alan Lantz Members

Alan Lantz
  • 6,985 posts

Posted 07 January 2017 - 04:13 PM

Welcome to the forums. The controller should totally isolate you from the drive rebuild, so running fsck may have been slower, but it shouldn't have affected anything one way or the other. Power cycling the server is of course frowned upon, but hey, it happens to all of us. Hopefully single-user mode and running fsck again will clear things up. If not, there is a CTX article somewhere on how to reinstall XenServer while preserving VMs. You have to be very careful in that area, as by default the XenServer installer will wipe local storage.

I seem to remember reading recently that if you can get the server to the point of IP connectivity, you can also connect with WinSCP and pull your VHD files from the server.

 

 

--Alan--



warrentack Members

Warren Tack
  • 17 posts

Posted 09 January 2017 - 11:57 AM

Ok. I tried booting into single-user mode (using these instructions) and got there, but it then rebooted automatically, so I decided to boot from the installation CD and try fsck from there (using these instructions).

 

fsck now responds with the output in the attached screenshots. Any ideas? Have I screwed up the filesystem?

 

 

Attached Thumbnails

  • 1.jpg
  • 2.jpg
  • 3.jpg


warrentack Members

Warren Tack
  • 17 posts

Posted 09 January 2017 - 12:16 PM

Also attaching an output of fdisk -l /dev/sda1

 

 

 

 

Attached Thumbnails

  • 4.jpg
  • 5.jpg


Alan Lantz Members

Alan Lantz
  • 6,985 posts

Posted 09 January 2017 - 05:12 PM

I'm not Linux-savvy enough to answer that. The filesystem is scrambled for sure, but I don't know how recoverable it is.

 

--Alan--



warrentack Members

Warren Tack
  • 17 posts

Posted 09 January 2017 - 05:29 PM

So I know that I'm using LVM, and I have dug a little deeper, but I'm still unable to run fsck. This is where I am now: I have the volumes active but still can't scan them. Any ideas?

 

 

 

 

Attached Thumbnails

  • 6.jpg


Jiri Cerny Members

Jiri Cerny
  • 126 posts

Posted 11 January 2017 - 10:37 AM

Hello Warren,

What type of RAID are you using? How many HDDs?

 

Jiri 



warrentack Members
  • #10

Warren Tack
  • 17 posts

Posted 11 January 2017 - 11:12 AM

Hi Jiri,

 

It's an HP P410 using RAID 5 across 14 disks.

 

Can I install a fresh copy of XS on a different, SATA hdd (disabling the RAID completely during install) and then mount the images when XS is installed and the RAID is reconnected? Will it see them? I'm getting quite desperate now!



Dan Pollak Members
  • #11

Dan Pollak
  • 22 posts

Posted 11 January 2017 - 12:47 PM

Have you tried the steps from this article?


Helpful Answer

Jiri Cerny Members
  • #12

Jiri Cerny
  • 126 posts

Posted 11 January 2017 - 12:51 PM

I don't know, maybe...

 

Instead of reinstalling, you should download the GParted ISO and try booting the server from it. There you'll clearly see which partitions are on your array.



warrentack Members
  • #13

Warren Tack
  • 17 posts

Posted 11 January 2017 - 01:18 PM

Right, thank you Dan. I did the steps (pictured) and it has now booted to the console, however it has not detected the SR....

 

The SR is on the same RAID as the XS install, but when looking in 'Virtual Machines' and then 'All VMs' it says <No Virtual Machines Present>.

 

Looking in 'Disks and Storage Repositories' I have <No Storage Repositories Present>, and choosing 'Attach Existing Storage Repository' and then 'Hardware HBA' it is unable to find anything, reporting <No Devices Detected>. The device must be there, otherwise it wouldn't boot XS at all.

 

I also have no network interfaces displaying....

 

Does this boot-up screen look normal? It's attached as 7.png.

Attached Thumbnails

  • 8.png
  • 7.png


warrentack Members
  • #14

Warren Tack
  • 17 posts

Posted 11 January 2017 - 01:21 PM

Jiri,

 

Regarding the GParted suggestion: I'm going to keep that one in my back pocket for now. Hopefully the state the server is in now makes it recoverable. I really, really have my fingers crossed!



Jiri Cerny Members
  • #15

Jiri Cerny
  • 126 posts

Posted 11 January 2017 - 01:39 PM

So, if you can boot XS now and you are sure that the SR was LVM, you should read this:
https://support.citrix.com/article/CTX116017 or another how-to on the web on how to scan and possibly recover LVM.

 

Be very careful...

 

Jiri 



warrentack Members
  • #16

Warren Tack
  • 17 posts

Posted 11 January 2017 - 01:55 PM

Jiri,

 

They all look like they are there though? I've attached the output of lvscan, vgscan and pvscan.

 

Do you have any idea about the lack of network adapters issue? I was SO hopeful when the machine started booting up, I didn't expect this as well!

 

Thank you so much for all your help everyone.

Attached Thumbnails

  • 9.jpg


Jiri Cerny Members
  • #17

Jiri Cerny
  • 126 posts

Posted 11 January 2017 - 02:16 PM

Yes, they are.

The second thing is to export them off the server somehow.

 

To me, the lack of NICs and SRs suggests that all of your pool metadata has been destroyed. I think the right way forward is to rescue the VDIs and rebuild the server (RAID and XS) from scratch.

 

Can you connect to the server via XenCenter? Is the XAPI service running? I suppose not.

 

So, the question is how to copy/export an LVM volume to another system...
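
As a rough illustration of the copy/export step, here is a minimal sketch that copies one volume byte for byte to an image file and records a SHA-256 checksum so the copy can be verified later. Several things here are assumptions, not something confirmed in this thread: that the volume is already active and readable, that on an LVM-based local SR each VDI is a logical volume with a path of the form /dev/VG_XenStorage-<sr-uuid>/VHD-<vdi-uuid>, and that a rescue environment with Python 3 is available (on the host itself, dd would be the usual tool). The function names and paths are hypothetical.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time


def copy_volume(src_path, dst_path, chunk_size=CHUNK):
    """Copy src_path to dst_path byte for byte.

    Returns (bytes_copied, sha256_hex) for the data that was read.
    A read error on a damaged volume raises OSError and stops the copy.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()


def sha256_of(path, chunk_size=CHUNK):
    """Re-read a file and return its SHA-256, to verify a finished copy."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Hypothetical usage (placeholder paths, not taken from this server):
#   size, checksum = copy_volume("/dev/VG_XenStorage-<sr-uuid>/VHD-<vdi-uuid>",
#                                "/mnt/usb/vm-disk.vhd")
#   assert sha256_of("/mnt/usb/vm-disk.vhd") == checksum
```

The destination should be on a separate disk (for example the USB or SATA drive), never on the damaged array itself.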



Dan Pollak Members
  • #18

Dan Pollak
  • 22 posts

Posted 11 January 2017 - 02:18 PM


What RAID controller? Or is this software RAID?

 

P.S. I hope that these are SSD drives as RAID 5 is considered very risky with this many drives.



warrentack Members
  • #19

Warren Tack
  • 17 posts

Posted 11 January 2017 - 02:35 PM

Nope... They are 600 GB 3.5" SAS drives. One of the drives is a hot spare.

 

I will change the RAID level when I've rebuilt the system. I did think RAID 5 was a bad choice, but it wasn't my decision at the time. And it's an HP P410; that's the controller.
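
To put a rough number on the RAID 5 concern, the sketch below estimates the usable capacity of the array and the chance of hitting at least one unrecoverable read error (URE) while rebuilding after a single disk failure. The assumptions are mine, not from the thread: 13 of the 14 disks are array members (the 14th being the hot spare), a rebuild has to read every surviving member in full, bit errors are independent, and the URE rates tried are typical datasheet figures rather than values for these specific drives. It is a simplified model, not a prediction for this controller.

```python
import math


def raid5_usable_gb(members, disk_gb):
    """RAID 5 spends one disk's worth of capacity on parity."""
    return (members - 1) * disk_gb


def rebuild_ure_probability(members, disk_gb, ure_per_bit):
    """Probability of at least one URE while reading all surviving disks.

    Simplified model: every bit read fails independently with
    probability ure_per_bit.
    """
    bits_read = (members - 1) * disk_gb * 1e9 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


members = 13   # assumption: 14 disks minus one hot spare
disk_gb = 600  # drive size mentioned in the thread

print("usable capacity (GB):", raid5_usable_gb(members, disk_gb))
for rate in (1e-14, 1e-15, 1e-16):  # assumed datasheet-style URE rates
    p = rebuild_ure_probability(members, disk_gb, rate)
    print("URE rate", rate, "-> rebuild risk", round(p, 4))
```

Under these assumptions the risk rises steeply as the assumed URE rate gets worse, which is the usual argument for RAID 6 or RAID 10 on arrays with this many members.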



warrentack Members
  • #20

Warren Tack
  • 17 posts

Posted 11 January 2017 - 02:43 PM

I can't connect to it through anything; I can't even ping it....

 

I'm more than happy to rebuild the RAID and XS from scratch and import the VMs somehow; the trouble is getting them off. I can get a USB or SATA HDD onto the machine no problem. It's getting the VMs off, then back onto a new instance and running properly, that I'm worried about.

 

Can I reinstall XS over the top and somehow point it to the SR? Or doesn't it work like that?