[Mageia-sysadm] questions about our infrastructure setup & costs

Mon Apr 2 17:49:03 CEST 2012

On Mon, 02 Apr 2012, Romain d'Alverny wrote:

> Hi,
> 
> following last weekend's incident, and knowing that there are already
> some reflections and discussions about it, I'm posting the following
> questions/needs, with my treasurer/board hat on; some of these may
> already have answers, so please just link me to them.
> 
> It comes down to:
>  - board needs to have an up-to-date view of how much our
> infrastructure costs, and would cost in different setups; and this,
> split in separate, functional chunks;
>  - how can we change our setup to: 1) reduce the impact of having one
> chunk (here a faulty RJ45 in Marseille) shut down so much of the
> project for such a long time and 2) have a quick, automatic report
> about this (not only for sysadmins, but for all users of our
> infrastructure).
> 
> So here is how I would put it:
> 
>  A. could you, as sysadmin, draw (graphically) the dependencies
> between services, at a certain functional scale + their current
> location/host;
>    * goal: have an overview of Mageia infrastructure, from the outside
> of sysadmin team (and yes, again, that is needed);
>    * can we get it produced from the puppet conf? => the goal being
> for now to have such a visual overview first, not to have it
> automated.
>    * the function blocks I can think of would be (but add/split/fix
> accordingly):
>      + core for communication & doc:
>        - user accounts (LDAP, identity.m.o)
>        - communications (mailing-lists, mail server)
>        - documentation (Wiki, Bugzilla)
>        - a specific code repository (not related to the build system)
> for adm and/or one dedicated to organization (paperwork, reports,
> constitution, etc.)
>      + Web hosts (www, blog, planet, forums, security notifs, etc.)
>      + core for building the distribution
>        - code repo
>        - buildsystem
>        - translation tools
>        - other?
>      + core for distribution software
>        - primary mirror
>      + other?
> 
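
As a rough illustration of point A, here is a minimal sketch in Python
of how a hand-maintained service map could be turned into a Graphviz
graph, and how one could list what goes down when one location fails.
Everything in it is an assumption made for the example: the
dependencies between services, the "site-a"/"site-b" locations, and the
names SERVICES, to_dot and affected_by are hypothetical, not our real
setup and not something generated from the puppet conf.

```python
# Hypothetical data: the dependencies and locations below are placeholders
# chosen for illustration, not the real Mageia infrastructure.
SERVICES = {
    "ldap": {"location": "site-a", "needs": []},
    "identity": {"location": "site-a", "needs": ["ldap"]},
    "mailing-lists": {"location": "site-a", "needs": ["ldap"]},
    "wiki": {"location": "site-b", "needs": ["ldap"]},
    "bugzilla": {"location": "site-b", "needs": ["ldap"]},
    "code-repo": {"location": "site-a", "needs": ["ldap"]},
    "buildsystem": {"location": "site-a", "needs": ["code-repo", "ldap"]},
    "primary-mirror": {"location": "site-a", "needs": ["buildsystem"]},
}


def to_dot(services):
    """Render the service map as Graphviz DOT text, one cluster per location."""
    by_location = {}
    for name, info in services.items():
        by_location.setdefault(info["location"], []).append(name)

    lines = ["digraph infrastructure {"]
    for index, (location, names) in enumerate(sorted(by_location.items())):
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{location}";')
        for name in sorted(names):
            lines.append(f'    "{name}";')
        lines.append("  }")
    # An edge "a -> b" means "a needs b to work".
    for name in sorted(services):
        for dep in services[name]["needs"]:
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


def affected_by(services, failed_location):
    """Return the services that stop working if failed_location is down.

    A service is affected if it is hosted there, or if it needs
    (directly or indirectly) a service that is affected.
    """
    down = {n for n, i in services.items() if i["location"] == failed_location}
    changed = True
    while changed:
        changed = False
        for name, info in services.items():
            if name not in down and any(d in down for d in info["needs"]):
                down.add(name)
                changed = True
    return sorted(down)


if __name__ == "__main__":
    print(to_dot(SERVICES))
```

If Graphviz is installed, the output can be saved to a file and
rendered with something like "dot -Tpng infra.dot -o infra.png". The
affected_by helper gives a first idea of how much of the project one
failing location would take down with it.
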
>  B. based on these functional chunks, for each, could you:
>   * document what is needed for them: storage, bandwidth, what it
> represents in actual hardware today, what it should grow to. Goals are:
>     - to have a clear idea of how much it represents/costs: today, or
> if we would move to other hosting solutions (paid or not, hardware or
> virtual);
>     - to know how much we need to budget in security for these services;
>     - to know what our options (and needs) are for migrating some
> services to an architecture or a paid solution that would improve
> their availability (and accessibility in case of failure).

Using paid hosting will not eliminate problems such as a bad RJ45 or a
switch that stops working. If we want good availability, we need more
servers in different places.

I think it would help to have a few other servers in a different
datacenter, and to duplicate services when it is easy to do so (like
ldap, the mirrors list, etc.). In other cases, like the build system
scheduler or the main mirror, it is difficult to duplicate without
conflicts between the different servers, and a migration cannot be done
in a few minutes (or hours). So I think we should migrate those only in
case of a very long outage (when we know that it will last more than a
few days). Having a few more servers available would help if we need to
migrate some things in case of a big problem on one of the servers.

>  C. various questions:
>   * could both documents above (A and B) be kept up to date as things
> change?
>   * would it be possible for the systems hosting our services to
> have a prefix in their FQDN with the city/country they are located in?
> Goal: being more explicit about where a service is located at this
> time, so that a $ host www.mageia.org can answer me something like
> champagne.paris.fr.mageia.org - for instance. I don't mean to change
> all that, but I'm wondering about the opportunity.

I don't think having the location of servers in the hostname is very
useful. It's not very difficult for someone interested to find out the
location of a server.

>   * what do you think about maintaining a separate blog (for
> opening/closing tickets + a global summary of what xymon provides
> already) under status.mageia.org (or maybe a different domain, for
> that matter)? (something similar to status.twitter.com)

We are already posting on the official blog and sending mails to the
mailing list in case of problems, so what would be the use of another
separate blog for this?