[Mageia-sysadm] Error on LO side this night

Michael Scherer misc at zarb.org
Thu Aug 18 11:24:57 CEST 2011


We received various error messages from alamut (the webserver) this
morning, around 7h (CEST).

According to the Lost Oasis IRC log, they had a problem at that time, with
some other servers down and running fsck. All our servers (and those of
zarb.org too) got rebooted at that time.

On our side, that caused:
- a failure of ryu.zarb.org (it was rebooted again by Guillaume Rousse at
10h15 CEST). The last error message was about "pci hotplug poweroff"
around 7h15, which IMHO corresponds to someone pushing the button to
start it or shut it down.
- an error between alamut and valstar. I suspect that alamut, being
slightly faster to boot than valstar, came up before valstar was
ready and was thus unable to reach it, hence the error message
with "no route to host".

So besides fixing the BIOS to boot fast on all servers, and getting our own
redundant power supply and datacenter, there is nothing we can do :)

Nice thing: all our servers boot fine.
Another nice thing: hobbit works fine for sending us alerts.

Michael Scherer
