


XAPI fails to restart after / (root) partition became full

Started by Ors Apor Horvath , 10 October 2011 - 05:34 PM
10 replies to this topic

Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 10 October 2011 - 05:34 PM

*Xenserver version:* 5.6 SP2

Background: The root (/) partition of the server became full. Since then we have been unable to connect via XenCenter, and xsconsole does not run either. I tried to restart the xapi daemon, but the restart hung, so in the end I killed xapi and tried to start it again... no luck.

root@myserver ~# xe-toolstack-restart
Stopping xapi: cannot stop xapi: xapi is not running. FAILED
Stopping the v6 licensing daemon: cannot stop v6d: v6d is not running. FAILED
Stopping the memory ballooning daemon: cannot stop squeezed: squeezed is not running. FAILED
Stopping perfmon: OK
Stopping the fork/exec daemon: OK
Stopping the multipath alerting daemon: OK
Starting the multipath alerting daemon: OK
Starting the fork/exec daemon: OK
Starting perfmon: OK
Starting the memory ballooning daemon: ....................FAILED..failed to start squeezed.

root@myserver ~# xe pif-list params=all
Error: Connection refused (calling connect )

[root@myserver ~]# tail -n 15 /var/log/messages
Oct 10 19:14:28 myserver python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Oct 10 19:19:14 myserver python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Oct 10 19:22:04 myserver xsconsole: Started as /usr/lib/xsconsole/XSConsole.py
Oct 10 19:22:04 myserver xsconsole: UpdateFromPatchVersions failed:
Oct 10 19:22:04 myserver xsconsole: [Errno 2] No such file or directory
Oct 10 19:22:04 myserver xsconsole: Loaded initial xapi and system data in 0.027 seconds
Oct 10 19:22:04 myserver xsconsole: Displaying 'xapi is not running' dialogue
Oct 10 19:23:55 myserver python: PERFMON: Caught signal 15 - exiting
Oct 10 19:23:55 myserver python: PERFMON: 11 Resource temporarily unavailable
Oct 10 19:23:55 myserver python: PERFMON: Traceback (most recent call last):
Oct 10 19:23:55 myserver python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Oct 10 19:23:55 myserver python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Oct 10 19:23:55 myserver python: PERFMON: error: (11, 'Resource temporarily unavailable')
Oct 10 19:23:56 myserver python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Oct 10 19:29:08 myserver python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session

Let me know what else I should provide to identify the problem. The server is a host for a production environment, so server restart is not an option right now...



Tobias Kreidl Members

Tobias Kreidl
  • 12,758 posts

Posted 10 October 2011 - 05:52 PM

If it's full, many services will not run properly because they require logging. Go to /var/log and delete some of your old logs. Of course, unless you identify why they filled up so fast, the problem is in danger of happening again.
You may need to consider at least temporarily reducing the number of retained logs, as well as turning on compression. These can be set in /etc/logrotate.conf.
--Tobias
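
As a rough sketch of the check described above, the following reports how full the root filesystem is and which entries under /var/log take the most space. The helper names `fs_usage_pct` and `largest_entries` are made up for illustration and are not part of XenServer; only the standard `df`, `du`, `sort`, `head` and `awk` tools are assumed. Nothing is deleted; the script only reads.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with XenServer); read-only diagnostics.

# Print the usage percentage (digits only) of the filesystem that holds $1.
fs_usage_pct() {
    # -P forces the one-line-per-filesystem POSIX output format.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the N largest entries directly under directory $1, sizes in KiB.
largest_entries() {
    dir=$1
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

echo "root filesystem usage: $(fs_usage_pct /)%"
largest_entries /var/log 10
```

If the largest entries turn out to be rotated logs, that would point at the retention and compression settings in /etc/logrotate.conf mentioned above; a single large file elsewhere on / would need a separate look.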



Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 10 October 2011 - 05:58 PM

Thanks for the reply! It's not full anymore; it filled up because of a mistaken ISO image upload. Otherwise we have no problems with disk space, so the issue still exists...



Tobias Kreidl Members

Tobias Kreidl
  • 12,758 posts

Posted 11 October 2011 - 01:17 PM

Did you restart XenCenter and/or try a reboot of the server?



Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 11 October 2011 - 01:28 PM

> tjkreidl wrote:
> Did you restart XenCenter and/or try a reboot of the server?
XenCenter yes, sure, but without a running XAPI, afaik, that won't help much...
The server, no: "The server is a host for a production environment, so server restart is not an option right now..."



Tobias Kreidl Members

Tobias Kreidl
  • 12,758 posts

Posted 11 October 2011 - 01:40 PM

Oh, right. It's not a pool, is it, but a standalone host? How about a network restart -- "service network restart"?
That should only temporarily stop the networking.



Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 11 October 2011 - 06:09 PM

It's a standalone host; unfortunately, that didn't help. Anyway, thanks Tobias! ;)
During a scheduled downtime I restarted the server, which apparently solved the problem, although I would be very interested in how this can be solved without having to restart the host machine itself...
If anyone runs into this, don't hesitate to update the thread!



Joseph Hom Members

Joseph Hom
  • 144 posts

Posted 11 October 2011 - 09:20 PM

xe-toolstack-restart should fix the issue after clearing the space. If not then it's xapissl that has zombied out.

ps -ef | grep stunnel

will list your xapissl sessions. Kill them and restart xapissl, then do a toolstack restart.
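
The sequence above could be sketched roughly as follows. This is an illustration under assumptions, not a verified procedure: it assumes xapissl is managed as an init service named `xapissl`, and that killing every `stunnel` process is acceptable on the host. The `run` wrapper and `recover_xapissl` are hypothetical names. By default the script only prints what it would do; set `DRY_RUN=0` to actually execute the commands.

```shell
#!/bin/sh
# Sketch of the kill-and-restart sequence. DRY_RUN=1 (the default here)
# only prints the commands; DRY_RUN=0 executes them.
# Assumption: xapissl is an init service named "xapissl".

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_xapissl() {
    # The [s] keeps grep from matching its own process entry.
    ps -ef | grep '[s]tunnel' || echo "no stunnel processes found"
    run killall stunnel
    run service xapissl restart
    run xe-toolstack-restart
}

recover_xapissl
```

Reviewing the dry-run output first gives a chance to confirm that the listed stunnel processes really belong to xapissl before anything is killed.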



Tim Waring Members

Tim Waring
  • 252 posts

Posted 14 October 2011 - 12:20 AM

We get this issue often, due to a network card issue with XS5.6fp1 that fills our disk with logs. I am assuming this would also work on sp2.

After clearing out space:
run 'killall -9 xapi'
run 'xe-toolstack-restart'
This gets everything running properly again and allows XenCenter access etc.
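
After the two commands above, one way to confirm that xapi is answering again is to retry a harmless `xe` call for a while instead of checking once. The sketch below is an assumption-laden illustration: `wait_for` is a hypothetical helper, and the attempt count and delay are arbitrary values, not recommendations from this thread.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Only meaningful on a XenServer host, where the xe CLI is installed.
if command -v xe >/dev/null 2>&1; then
    if wait_for 12 5 xe host-list; then
        echo "xapi is answering CLI requests"
    else
        echo "xapi still not reachable after restart" >&2
    fi
fi
```

If the loop runs out of attempts, the "Connection refused" state from the first post is still in effect and the restart did not bring xapi back.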



Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 14 October 2011 - 09:21 AM

> jwhom01 wrote:
> xe-toolstack-restart should fix the issue after clearing the space. If not then it's xapissl that has zombied out.
>
> ps -ef | grep stunnel
>
> will list your xapissl sessions. Kill them and restart xapissl, then do a toolstack restart.
I tried this as well, as a last resort. Killing xapi worked and the xapissl "restart" worked well; only xe-toolstack-restart didn't (as quoted in my very first post)...



Ors Apor Horvath Members

Ors Apor Horvath
  • 6 posts

Posted 14 October 2011 - 09:24 AM

> waringt wrote:
> We get this issue often, due to a network card issue with XS5.6fp1 that fills our disk with logs. I am assuming this would also work on sp2.
>
> After clearing out space:
> run 'killall -9 xapi'
> run 'xe-toolstack-restart'
> This gets everything running properly again and allows XenCenter access etc.
Thanks, but this is exactly the state I was in when I posted to the forum. My very first post (as mentioned) shows the state after killing all xapi processes, once the space had been cleaned up.