[Mageia-sysadm] questions about our infrastructure setup & costs

Mon Apr 2 16:59:59 CEST 2012

On Monday 2 April 2012 at 15:23 +0200, Romain d'Alverny wrote:
> Hi,
> 
> following past week-end incident, and I know that there are already
> some reflexions and discussions about that, I'm posting the following
> questions/needs, with my treasurer/board hat; some of these may
> already have answers, so please just link me to them.
> 
> It comes down to:
>  - board needs to have an up-to-date view of how much our
> infrastructure costs, and would cost in different setups; and this,
> split in separate, functional chunks;

That's a rather odd question: with your treasurer hat on, you should
already have all that information, so I do not really see what we can
add to it.

The list of servers is in puppet:
http://svnweb.mageia.org/adm/puppet/manifests/nodes/

Each server has some modules assigned to it; take these as the
functional chunks. Unfortunately, the servers all do more than one
task, so splitting them into functional chunks does not mean much.

>  - how can we change our setup to: 1) reduce the impact of having one
> chunk (here a faulty RJ45 in Marseille) shut down so much of the
> project for such a long time and

That's easy to explain.

You identify each single point of failure (SPOF) and you remove the
'single' from SPOF by making it redundant.

For example, have 2 redundant power supplies, 2 redundant LDAP servers
(we already do that), and 2 redundant network connections.

Of course, the downside is that it costs at least twice the price, and
it is more complex.

Another solution is to try to reduce the MTTR (mean time to repair).
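
As a rough illustration of the trade-off between redundancy and faster
repair, here is a minimal sketch using the usual steady-state
availability formula. The MTBF and MTTR figures are invented for the
example and are not measurements of the Mageia infrastructure; the
function names are hypothetical too.

```python
# Illustrative only: all figures below are assumptions, not Mageia data.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant(avail: float, copies: int = 2) -> float:
    """Availability of `copies` independent components when one suffices."""
    return 1.0 - (1.0 - avail) ** copies


# One network link that fails every ~1000 hours and takes 48 hours to fix.
single = availability(mtbf_hours=1000.0, mttr_hours=48.0)
# Same link, but repaired within 4 hours.
faster_repair = availability(mtbf_hours=1000.0, mttr_hours=4.0)
# Two independent links with the slow repair time.
doubled = redundant(single, copies=2)

print(f"single link, slow repair: {single:.4f}")
print(f"single link, fast repair: {faster_repair:.4f}")
print(f"two links, slow repair:   {doubled:.4f}")
```

The redundancy formula assumes the two copies fail independently, which
does not hold if they share a rack, a switch or a cable path.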

>  2) have a quick report, automatic
> about this (not only for sysadmin, but for all users of our
> infrastructure).

I think the current xymon reports are sufficient for that.

> So here is how I would put it:
> 
>  A. could you, as sysadmin, draw (graphically) the dependencies
> between services, at a certain functional scale + their current
> location/host;
>    * goal: have an overview of Mageia infrastructure, from the outside
> of sysadmin team (and yes, again, that is needed);
>    * can we get it produced from the puppet conf? => the goal being
> for now to have such a visual overview first, not to have it
> automated.
>    * the function blocks I can think of would be (but add/split/fix
> accordingly):
>      + core for communication & doc:
>        - user accounts (LDAP, identity.m.o)
>        - communications (mailing-lists, mail server)
>        - documentation (Wiki, Bugzilla)
>        - a specific code repository (not related to the build system)
> for adm and/or one dedicated to organization (paperwork, reports,
> constitution, etc.)
>      + Web hosts (www, blog, planet, forums, security notifs, etc.)
>      + core for building the distribution
>        - code repo
>        - buildsystem
>        - translation tools
>        - other?
>      + core for distribution software
>        - primary mirror
>      + other?
> 
>  B. based on these functional chunks, for each, could you:
>   * document what is needed for them: storage, bandwidth, what it
> represent in full hardware today, what it should grow to. Goals are:
>     - to have a clear idea of how much it represents/costs: today, or
> if we would move to other hosting solutions (paid or not, hardware or
> virtual);
>     - to know how much we need to budget in security for these services;
>     - to know what our options (and needs) are for migrating some
> services to an architecture or a paid solution that would improve
> their availability (and accessibility in case of failure).

So basically, if I take the prices from OVH (they have a lot of choice
and are rather cheap):

- alamut would cost around 84 euros per month at ovh.fr. That's the
closest server we can find in their offer.

- valstar has many more processors (16 cores) and less RAM, so let's
estimate it at 100 to 110 euros per month (processors are more
expensive than memory).

- ecosse would be around the same as alamut, but with less RAM, so 70
to 80 euros per month.

- jonund has more processors, so let's also say around 100 to 110 euros
per month.

- fiona would likely be 30 to 40 euros per month, given the price of
Kimsufi (cheaper servers from OVH).

- I cannot connect to sukuc from my bastion, so I do not know, but since
that's a brand new server, let's say 80 euros per month.

As we cannot rent arm boards, let's assume that we will rent the space
to host them. 

Housing can be found in Paris for 300 euros:
http://www.online.net/serveur-dedie/offre-dedibox-housing-dedirack.xhtml

Since that's too much space for 2 ARM boards, I found a cheaper
alternative at 99 euros:
https://www.ovh.com/fr/housing/location_baie_1_a_3U.xml

That makes around 570 to 600 euros per month to replace the free
hosting in LO with paid servers, hosted at one of the cheapest
providers in the world. And for this price, we of course have no SSD on
the builders (there are some offers with a small SSD, count 10 euros
more per month and per server), etc.
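
The total can be re-derived from the per-server figures above. This is
a minimal sketch that uses only the estimates quoted in this message;
the dictionary keys are just labels chosen for the example.

```python
# Monthly estimates in euros as (low, high), copied from the list above.
estimates = {
    "alamut": (84, 84),
    "valstar": (100, 110),
    "ecosse": (70, 80),
    "jonund": (100, 110),
    "fiona": (30, 40),
    "sukuc": (80, 80),
    "arm boards housing (1 to 3U)": (99, 99),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())

print(f"{low} to {high} euros per month")  # prints "563 to 603 euros per month"
```

This is consistent with the rounded figure of 570 to 600 euros per
month, before any SSD option.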

If we just want to house our own servers in Paris, I think that would
come to about 600 euros per month for the housing alone, since we would
use more than 3U (I do not know exactly how much).

People can feel free to redo the cost analysis for Amazon EC2 or
Rackspace. I was not able to work out how much alamut would cost at
Rackspace (nor even whether it is possible to have a server there that
we are in charge of), and Amazon EC2 pricing is to hosting what Java is
to my abacus.

And to be complete, I also looked at random hosters around the world:

I found this:
http://www.razorservers.com/solutions/dedicated-servers/pricing/
so a server with the same spec as alamut is around 200$ at a more
classic provider.

I found this:
http://www.server4you.com/root-server/server-details.php?products=3
which would make 85$ (since there is a setup fee for each month).
Server4you is more like OVH.

And several others where the price is closer to 150$ than to 100$.

And of course, most of them have metered network connections, which
may not be suitable for something like valstar, which acts as a primary
mirror. For reference, since the server was started:

RX bytes:453228974131 (422.1 GiB)  
TX bytes:9311461347504 (8.4 TiB)

Uptime is 60 days.
That's around 4 TiB of transfer per month.
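
The monthly figure follows directly from the counters above. This is a
minimal sketch; the 30-day month and the variable names are assumptions
made for the calculation.

```python
# Interface counters quoted above, and the 60 days of uptime.
tx_bytes = 9311461347504
rx_bytes = 453228974131
uptime_days = 60

TIB = 2 ** 40
DAYS_PER_MONTH = 30  # assumption: a 30-day month

# Outgoing volume per month, in TiB.
tx_per_month_tib = tx_bytes / TIB / uptime_days * DAYS_PER_MONTH

# Average outgoing rate over the whole uptime, in Mbit/s.
avg_mbit_s = tx_bytes * 8 / (uptime_days * 86400) / 1e6

print(f"{tx_per_month_tib:.2f} TiB sent per month")
print(f"{avg_mbit_s:.1f} Mbit/s sent on average")
```

The average rate hides peaks (for example around a release), so a
metered offer would have to be checked against peak usage as well.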

For comparison, here are the figures for alamut:
RX bytes:30792994686 (28.6 GiB)  
TX bytes:215624995862 (200.8 GiB)

While hosters often advertise "unlimited transfer", many do not offer
it at all, and most of those that do use "unlimited" in the same way
that phone providers do. So we need to be wary on this point if we want
to go further in the cost analysis.

>  C. various questions:
>   * could both above documentation (A and B) be maintained through changes;

That depends on how they are done, but I do not foresee anyone
volunteering for that. And since the information in puppet is not
sufficient to express it in an automated manner (there is support for
graphing dependencies between modules, but not between servers), I
doubt it will be written soon.
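
If someone did volunteer, one low-tech option would be a small
hand-maintained map of services rendered with Graphviz. The following
is only a sketch: the host names and the dependency edges are
placeholders, not the real Mageia layout, and `to_dot` is a
hypothetical helper.

```python
# Hypothetical, hand-maintained map: service -> (host, services it needs).
# Hosts and edges are placeholders, NOT the actual Mageia setup.
services = {
    "ldap": ("host-a", []),
    "identity": ("host-a", ["ldap"]),
    "mailing-lists": ("host-a", ["ldap"]),
    "wiki": ("host-b", ["ldap"]),
    "bugzilla": ("host-b", ["ldap"]),
}


def to_dot(services):
    """Render the map as a Graphviz DOT digraph (one node per service)."""
    lines = ["digraph infra {"]
    for name, (host, deps) in sorted(services.items()):
        # "\\n" emits a literal backslash-n, which DOT shows as a line break.
        lines.append(f'  "{name}" [label="{name}\\n({host})"];')
        for dep in deps:
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(services))
```

The output can be saved to a file and rendered with the Graphviz `dot`
tool, for example `dot -Tpng infra.dot -o infra.png`.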

Nagios does support producing some form of graphs, but we already have
a working monitoring system, and there is more important stuff to do
before changing it (for example, making sure that the current one is
read by people, by reducing the amount of crap sent to the ml, which
would require someone to fix #4591, among others).

>   * would it be possible to have the systems hosting our services to
> have a prefix in their fqdn with the city/country they are located in?
> Goal: being more explicit about where a service is located at this
> time, so that a $ host www.mageia.org can answer me something like
> champagne.paris.fr.mageia.org - for instance. I don't mean to change
> all that, but I'm wondering about the opportunity.

What problem would it solve ?

The grouping of servers is already visible on xymon.mageia.org :
http://xymon.mageia.org/xymon/servers/servers.html

I considered adding support for this in puppet, but in the end I did
not find any good reason to do it for now (it would help if we had
enough servers, to set up ntp per datacenter, bastion server ACLs, etc.,
but we are not there yet).
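
For what it is worth, if such a naming scheme were adopted, extracting
the location from a name would be simple. This is a sketch only: the
`host.city.country.mageia.org` layout is the one suggested in the
question, and `location_from_fqdn` is a hypothetical helper, not
something that exists in our puppet modules.

```python
def location_from_fqdn(fqdn, domain="mageia.org"):
    """Return (host, city, country) for host.city.cc.<domain>, else None."""
    fqdn = fqdn.rstrip(".").lower()
    suffix = "." + domain
    if not fqdn.endswith(suffix):
        return None
    labels = fqdn[: -len(suffix)].split(".")
    if len(labels) != 3:
        # Not in the host.city.country form (e.g. www.mageia.org).
        return None
    host, city, country = labels
    return host, city, country


print(location_from_fqdn("champagne.paris.fr.mageia.org"))
print(location_from_fqdn("www.mageia.org"))
```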

>   * what do you think about maintaining a separate blog (for
> opening/closing tickets + a global summary of what xymon provides
> already) under status.mageia.org (or maybe a different domain, for
> that matter)? (something similar to status.twitter.com)

Again, that solves none of our problems at all.

It solves a problem for a startup that wants to say "we care about our
customers, we give access to some form of monitoring", but we already
give full access to our monitoring, so that would be redundant.

Now, maybe the current access is not nice enough, and I am sure we can
do some CSS work to improve that, but as an aesthetic issue, I would
not make this a priority.

And I have seen no one saying that the current blog is not enough. If
people do not read it, they will not read another web site.
-- 
Michael Scherer