[Mageia-dev] [ANN] Power outage at Marseille DC, servers impact

Michael Scherer misc at zarb.org
Sat May 28 10:54:46 CEST 2011


Hi,

as some people may have seen, we suffered a severe outage
yesterday, around 00h05 CET. It seems that an electrical problem
stopped some servers at the Lost Oasis datacenter, with the net effect
of stopping valstar, alamut, jonund and ecosse, as well as the virtual
machine running on them ( friteuse_tmp ). It also impacted all servers
of zarb.org, especially ryu, which still provides support for some
services ( like www, mailing lists, secondary dns, smtp, etc. ).

Perenoel, from LO, went to the building to take care of the issue, so
our servers got power back and likely restarted.

Everything except 2 servers is back. The problem is that the 2 servers
that didn't start are valstar and jonund.

Jonund is just a builder; we have a 2nd one and we are in freeze, so we
can cope with the failure.

Valstar is the main svn and ldap server, so everything depends on it.

On the zarb.org side, it seems things went smoothly ( contrary to my
expectations ).

On our side, we are still impacted, as valstar is not up for an unknown
reason.


Impacted services :
- ldap ( so likely all access )
  - forum, no access, no one can log in
  - identity, no access ( no account creation )
  - bugzilla, no one can log in, but people already logged in should be ok
  - transifex, same as bugzilla
  - most @mageia.org aliases ( mails are still queued on zarb )
  - shell access ( rabbit, champagne )
  - some sympa lists ( @ml.mageia.org ), mostly council 
- svn 
- buildsystem ( no scheduler, no mirror )
- administration of all servers ( no puppetmaster )

The rest ( website, blog, xymon ) should be ok. 

I am looking for someone to fix everything ( i.e., start valstar ),
and will send an email later when things are back. No ETA at the moment.

Sysadmins will also be looking at making the infrastructure more
resilient to such problems ( for example, a 2nd ldap would have solved
most issues, and this is already planned:
https://bugs.mageia.org/show_bug.cgi?id=861 ).

If people have questions, please direct them to the sysadmin mailing
list, where we will be happy to answer them.

-- 
Michael Scherer


