[Mageia-webteam] Forum VM needs

Michael Scherer misc at zarb.org
Thu Jan 13 13:29:04 CET 2011


On Wednesday 12 January 2011 at 21:27 +0100, Maât wrote:
> Hi there,
> 
> As it seems VM creation takes a little bit of time due 
> to people being under heavy load at work, Anne and misc 
> considered the option of creating the Xen VM on one of 
> our servers (we could migrate the VM to atalante later)

The exact technology should not matter much; that's also what puppet is
made for. I.e., unless we plan to do a migration at the system image
level, we could simply install the 2nd VM, set up puppet on it, clone the
machine, then migrate the DB and the IP address.

( not that I do not like xen, but I would prefer something else ).

> For that, misc asked about the forum's needs...

I think I didn't make myself clear. I wanted information to deploy it,
like where the git repository is stored ( a URL, not "it is on a server" ),
who will need what access, etc. But the information you gave is also
important ( and raises lots of questions, as you can see ).

> For the beginning I'll consider that we are going to put everything on the same machine 
> (DB and PHP). This is not really brilliant to virtualize DB servers but I guess this will 
> not kill the VM in the first months as the tables will not be big.

AFAIK, using virtio and a proper cache mode, this should not be much of a problem.
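For reference, a minimal sketch of what that means in libvirt terms, assuming a KVM guest with the DB on an LVM block device (the volume path /dev/vg0/forum-db is hypothetical): the disk uses the paravirtualized virtio bus and cache='none', a mode often recommended for database workloads since it avoids caching the same data twice in the host page cache.

```python
import xml.etree.ElementTree as ET


def virtio_disk_xml(source_dev, target_dev="vda", cache="none"):
    """Render a libvirt <disk> element using virtio and an explicit cache mode."""
    disk = ET.Element("disk", type="block", device="disk")
    # cache='none' bypasses the host page cache, so MySQL's own buffers
    # are the only cache layer for the database volume.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache=cache)
    ET.SubElement(disk, "source", dev=source_dev)
    # bus='virtio' selects the paravirtualized block driver instead of emulated IDE.
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical LVM volume; adjust to the real volume group.
    print(virtio_disk_xml("/dev/vg0/forum-db"))
```

The resulting element would go into the <devices> section of the domain definition.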

> So phpBB needs a LAMP stack: Apache + PHP5 + MySQL5 (it prefers to have the MySQLi extension)

No specific requirement in terms of version; using the 2010.1 RPMs should be
OK, I assume?

> And with PHP we'll need the optional:
> -- zlib compression (better having it)
> -- remote ftp support (well... I'm not in favor even if the documentation asks for it)
We could drop outgoing connections if needed.

( yes, PHP makes me paranoid in terms of security )

> -- XML support (better having it)
> -- Image Magick support (better having it)

php-image-magick. I do think there is a conspiracy to give me a
stroke. Security research on ImageMagick by a friend of mine does not make
me feel safe knowing we will use it, but if this is required, we have no
choice.

Just out of curiosity, what will it be used for ( I assume it will be used to
resize avatars )?

> -- GD support (same as Image magick)

Does the forum support Suhosin, or other PHP hardening measures?

Did you do any testing with a hardened configuration, with dangerous
calls disabled ( mainly remote URL access for a start, but I also think
we can use open_basedir restrictions, etc. )?

Does it have non-regression tests ( so we can enable stuff and see if
anything breaks )?
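As a starting point for that testing, a rough sketch of a php.ini audit. The directive list (allow_url_fopen, allow_url_include, expose_php, open_basedir) is my assumption of a minimal baseline, not an official hardening guide, and the parser only understands simple "key = value" lines.

```python
import re

# Assumed minimal baseline, not an official hardening guide: directives that
# should be turned off on the forum server.
MUST_BE_OFF = ("allow_url_fopen", "allow_url_include", "expose_php")
OFF_VALUES = {"0", "off", "false", "no", "none", ""}


def parse_php_ini(text):
    """Map lower-cased directive names to their last assigned value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, ';' comments and [section] headers.
        if not line or line.startswith((";", "[")):
            continue
        match = re.match(r"([\w.]+)\s*=\s*(.*)", line)
        if match:
            # Drop a trailing '; comment', then surrounding quotes.
            value = match.group(2).split(";", 1)[0].strip().strip('"')
            settings[match.group(1).lower()] = value.lower()
    return settings


def audit_php_ini(text):
    """Return human-readable problems found in the given php.ini text."""
    settings = parse_php_ini(text)
    problems = []
    for key in MUST_BE_OFF:
        # A missing directive is flagged too: PHP's defaults leave
        # allow_url_fopen and expose_php on, so we want them explicit.
        if settings.get(key) not in OFF_VALUES:
            problems.append(f"{key} should be Off")
    if not settings.get("open_basedir"):
        problems.append("open_basedir is not set")
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for problem in audit_php_ini(f.read()):
            print(problem)
```

Running it against the php.ini of a test box before and after enabling the forum would at least tell us which restrictions the forum survives.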

> For source management git will be used... so we'll need it too :)

Just a git clone?
I have a puppet module for this; it just needs testing before I commit it.

For git hosting, again, while I am in favor, there are a few questions to
answer and things to prepare; see my previous mail about what is needed.

> As forums often have to face brute-force attacks, having Fail2ban would be really great... 
> for every open service like ssh 

At the ssh level, for me, that's a vote in favor of "no". We use ssh
keys only for admins, so fail2ban would just cause trouble.

> but also for forums... I'd like to have Fail2ban 
> parse a file of phpBB failed logins to trigger a low-level IP ban during a 
> few hours or more...

Well, if you give us the configuration, we can take a look.
We can also use the trick that Olivier deployed on d-c to avoid numerous
connections from the same IP ( in case someone decides to be smart and makes
simultaneous login attempts ).
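To make the idea concrete, a sketch of the logic fail2ban would apply, assuming phpBB failed logins end up in a log we can parse (the log format is unknown here, so parsing is left out). The thresholds are placeholders; in fail2ban they would map to the maxretry and findtime jail options, with bantime controlling how long the ban lasts.

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """Sketch of fail2ban-style logic: ban an IP after max_failures failed
    logins within `window` seconds. Both thresholds are assumptions."""

    def __init__(self, max_failures=5, window=600):
        self.max_failures = max_failures
        self.window = window
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip, timestamp):
        """Record one failed login; return True if the IP should now be banned."""
        recent = self.failures[ip]
        recent.append(timestamp)
        # Drop failures that fell out of the sliding window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) >= self.max_failures


if __name__ == "__main__":
    tracker = FailedLoginTracker(max_failures=3, window=60)
    for ts in (0, 10, 20):
        banned = tracker.record_failure("192.0.2.1", ts)
    print(banned)
```

Failures spread out over time never accumulate, so a user mistyping a password now and then is not banned.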

> For forum management we'll need :

>>> we
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'we' is not defined

or for those who are not CxO-fluent ( private joke ), who is 'we' in
terms of organisation ( i.e., do we need to create an LDAP group, etc. )?

> -- access to sources (read/write)
I would rather keep this automated from git, for security reasons and to avoid
human errors. I would even add a cron job that runs a git diff or
something similar, to detect whether someone uploaded a file manually or
modified one through apache.
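A sketch of such a cron job, assuming the deployed tree is a plain git checkout (the path /srv/forum is hypothetical). git status --porcelain lists both modified and untracked files, so it also catches newly uploaded files, which git diff alone would miss.

```python
import subprocess
import sys


def parse_porcelain(output):
    """Split `git status --porcelain` output into (status, path) pairs."""
    entries = []
    for line in output.splitlines():
        if len(line) > 3:
            # Two status columns (index, worktree), a space, then the path.
            entries.append((line[:2], line[3:]))
    return entries


def detect_drift(checkout_dir):
    """Return files in the deployed checkout that differ from the committed tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=checkout_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


if __name__ == "__main__":
    # Hypothetical deployment path; cron mails any printed output to the admins.
    drift = detect_drift(sys.argv[1] if len(sys.argv) > 1 else "/srv/forum")
    for status, path in drift:
        print(f"{status} {path}")
    sys.exit(1 if drift else 0)
```

Since cron only sends mail when the job prints something, a clean checkout stays silent.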

In fact, as a security measure, I think the user that writes the
source should make it read-only for apache, i.e. use a separate system
user for that.
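A possible periodic check for that, assuming a hypothetical dedicated deploy user ("forumdeploy") owns the tree at a hypothetical /srv/forum: flag anything not owned by that user, or writable by group or others, since apache could then potentially modify it.

```python
import os
import stat


def is_suspicious(mode, uid, owner_uid):
    """True if a path is not owned by the deploy user or is group/world-writable."""
    if stat.S_ISLNK(mode):
        # Symlink permission bits are meaningless on Linux; only check the owner.
        return uid != owner_uid
    return uid != owner_uid or bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def find_suspicious(root, owner_uid):
    """Walk the source tree and return every path apache might be able to modify."""
    suspicious = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # inspect symlinks themselves, not their targets
            if is_suspicious(st.st_mode, st.st_uid, owner_uid):
                suspicious.append(path)
    return suspicious


if __name__ == "__main__":
    import pwd
    # Hypothetical names: deploy user "forumdeploy", source tree /srv/forum.
    owner = pwd.getpwnam("forumdeploy").pw_uid
    for path in find_suspicious("/srv/forum", owner):
        print(path)
```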

> -- access to data zones (avatars and uploaded things) (read/write)
You mean apache will need it, no?

Direct access seems to me a pretty rare need; we can grant it if
there are really lots of requests, i.e. if you annoy the admins enough that
they give you access rather than doing it themselves.

> -- access to accesslogs and errorlogs (read)
Then this should be merged with the webmasters concept that Romain
explained. For now, we haven't set up anything ( we haven't even split the log
files on alamut, even though this should be trivial ).

> -- ability to change php log levels
This can be done from PHP itself ( e.g. with ini_set() ), I think.

> -- access to php logs (read)
same as accesslogs

> -- console access to database(s) (I'd prefer to completely avoid phpMyAdmin on the forum server)
I would prefer to avoid giving console access until there is a real need. I would then favor
remote mysql access, forcing SSL, maybe even limited to fixed IP addresses if you wish to avoid brute force.
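If we go the remote route, the account could be locked down on the MySQL side along these lines. This is a sketch assuming MySQL 5.x syntax and that the account was created beforehand with CREATE USER ... IDENTIFIED BY ...; the database, user name and address are hypothetical.

```python
def grant_remote_admin(database, user, host):
    """Build a GRANT that only accepts `user` from `host`, and only over SSL."""
    # Naive string building: reject quoting characters so the values cannot
    # break out of the statement. Inputs are meant to come from admins only.
    for value in (database, user, host):
        if any(ch in value for ch in "'`\\\";"):
            raise ValueError(f"unsafe character in {value!r}")
    return (
        f"GRANT ALL PRIVILEGES ON `{database}`.* "
        f"TO '{user}'@'{host}' REQUIRE SSL;"
    )


if __name__ == "__main__":
    # Hypothetical values; 192.0.2.10 is a documentation address.
    print(grant_remote_admin("phpbb", "forumadmin", "192.0.2.10"))
```

Binding the account to one host means a leaked password alone is not enough, which also makes brute force from random addresses pointless.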

( I will not go as far as proposing a VPN too, but
almost ).

Maybe we could think of some kind of ssh bastion for such access ( or
maybe that's overkill too ).

> For performance questions: I guess the forum opening will trigger a rather large 
> number of people coming (at least to register their nicks)... I'd be happy to 
> avoid the server being loaded to death.

Registration will be done on catdap from what I think we agreed on, no ?
( correct me if I am wrong ). 

So we need to work on that part ( starting more processes, and letting us
tune that with puppet, since this is hardcoded now, AFAIK ). So, depending
on where we host the forum, we can surely avoid this effect.

> So I'm targeting at least one thousand simultaneous users being active on the 
> forum... that will do for apache tuning.

OK, so let's say 120 simultaneous processes for apache, which also means we
need to keep apache processes as lean as possible ( i.e., no unused modules
loaded ). I assume there is no guarantee that PHP and its associated
libraries are thread-safe, so we will use mpm-prefork.
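A back-of-the-envelope check of those numbers, using Little's law (requests in flight = arrival rate × time per request). Every input is an assumption to be replaced with measurements: one click every 10 seconds per active user, about 1 second per request including slow clients, and 25 MB per mod_php child.

```python
import math


def concurrent_requests(active_users, seconds_between_clicks, seconds_per_request):
    """Little's law: requests in flight = arrival rate * time spent per request."""
    arrival_rate = active_users / seconds_between_clicks  # requests per second
    return math.ceil(arrival_rate * seconds_per_request)


def prefork_memory_mb(max_clients, mb_per_child):
    """Worst-case RAM when all prefork children (each with mod_php loaded) are alive."""
    return max_clients * mb_per_child


if __name__ == "__main__":
    # All inputs are guesses, to be replaced by real measurements.
    print(concurrent_requests(1000, 10, 1.0))  # requests in flight
    print(prefork_memory_mb(120, 25))          # MB for apache alone
```

With those guesses, 1000 active users give about 100 requests in flight, so 120 children leaves some margin, and 120 × 25 MB is about 3 GB for apache alone, which fits the 4-6 GB estimate once MySQL is added. With Apache 2.2 prefork, the cap is the MaxClients directive.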

Since the server is isolated and serves only for PHP hosting, I guess
using FastCGI will not bring much to the equation compared to
mod_php.

Let's also assume 30 processes for forum registration on catdap ( if I
am not wrong on that part, of course )? We could surely mitigate the
potential overload by not announcing this on every possible channel at
the same time ( i.e., first a mail, then a blog post, then
identi.ca/twitter ).

Do we also need to tune LDAP?

> For the database that will mean 800 to 1200 requests per second...
> 
> We'll have 2 - 3 months to see the tables grow and tune the indexes and the memory accordingly.

That means we will have to deploy some monitoring, and we haven't
decided anything yet ( buchan proposed hobbit, I proposed munin, purely out of
familiarity ).

What metrics would you need, so we can work on them first ( once we
start to set something up )?
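As an example of the kind of metric I have in mind, a sketch of a munin plugin graphing busy and idle apache workers, which is exactly what would tell us whether 120 processes is enough. It assumes mod_status is enabled and reachable from localhost (the URL is hypothetical); the ?auto variant of the status page gives machine-readable lines including BusyWorkers and IdleWorkers.

```python
import sys
import urllib.request

# Hypothetical status URL: assumes mod_status is enabled and allowed from localhost.
STATUS_URL = "http://127.0.0.1/server-status?auto"


def parse_workers(text):
    """Extract BusyWorkers/IdleWorkers counts from mod_status '?auto' output."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("BusyWorkers", "IdleWorkers"):
            values[key.strip()] = int(value.strip())
    return values


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        # munin runs the plugin with "config" to learn how to draw the graph.
        print("graph_title Apache prefork workers")
        print("graph_category apache")
        print("busy.label busy workers")
        print("idle.label idle workers")
        return
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        values = parse_workers(resp.read().decode("ascii", "replace"))
    # "U" tells munin the value is unknown.
    print(f"busy.value {values.get('BusyWorkers', 'U')}")
    print(f"idle.value {values.get('IdleWorkers', 'U')}")


if __name__ == "__main__":
    main(sys.argv)
```

Munin plugins can be written in any language, so the same shape works for MySQL queries per second once we know which counters matter.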

> But I think our needs will stabilize around 4-6 GB of RAM if the forum gets really 
> used (we'll have to tune mysql to keep many requests in cache), apache+mysql all 
> included... if we later split apache and mysql onto separate machines, the needs on 
> each machine will obviously be lower.

No cache ( squid, varnish )?
No PHP-level cache either?

( not that it is needed right now )

> For app disk space, the code is under 50 MB... and with hundreds of avatar uploaders 
> we will not grow above 1 GB
> 
> For database disk space, even after years of activity we'll remain under 5 GB

OK, so let's allocate a 10 GB partition on LVM for the DB and the rest.
We should take into account logs, and log backups ( French law requires
keeping 1 year of logs ).

How many logs are to be expected per day?
The only busy webserver I can think of is d-c, but Nanar and I just
discovered that its configuration was not good.
As it stands, that's 5 GB of logs, uncompressed, per month.
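Rough sizing for the one-year retention, assuming (my guess, to be checked on real logs) that gzip shrinks access logs about 10:1 and that the current month is kept uncompressed: with 5 GB per month and 12 months retained, that is 5 + 60/10 = 11 GB, so logs would need their own space on top of the DB partition.

```python
def log_storage_gb(gb_per_month, retention_months=12, compression_ratio=10.0):
    """Disk needed for log retention: the current month raw, plus
    retention_months compressed. compression_ratio = raw size / compressed
    size, and the 10:1 default is an assumption."""
    current_month = gb_per_month
    archived = gb_per_month * retention_months / compression_ratio
    return current_month + archived


if __name__ == "__main__":
    print(log_storage_gb(5))                         # with the assumed 10:1 ratio
    print(log_storage_gb(5, compression_ratio=1.0))  # no compression at all
```

The uncompressed case (65 GB) shows why compressing rotated logs matters here.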


> We'll need to set up some tables with heavy read and write access with InnoDB (not all): 
> it would be great to have the InnoDB file-per-table option enabled

OK, I guess it should be safe to enable it for all MySQL databases.

> Note: I'd like to use https (at least for admin access)... so that will mean enabling 
> SSL and opening port 443 too

We did not plan to let people send their password in cleartext at all.
Centralized auth has been set up ( and should be used for the forum too ),
so people will reuse their password, the same one used in other parts of the
infrastructure, and that means svn, bugzilla, etc. Since people with
access will use it, no cleartext at all when the password is sent ( or
over my dead body, after fighting my ghost ).

I guess we can make an exception for the cookie, as long as it is not
shared ( i.e., we will have to rethink the scheme if we deploy SSO ).

That also means people will complain because of Firefox warnings if we do not
buy a certificate.


> That's all for system level... I think directory structures (which concern the apache web root config) can be dealt with later...
> 
> Tell me if you have got everything you need for VM creation...

What I needed was more information for forum deployment, not VM
requirements; I guess I didn't express myself clearly. The VM requirements
in terms of memory/disk have been roughly drafted before. What
I would prefer is a deployment document.

Ideally, I would also prefer that discussion regarding forum development
happen on the public ml ( this one, for the webteam ) rather than in private
mail exchanges.

So @sysadmins, where do we host this ( for temporary creation and setup
until MLO does it ):

10 GB of disk ( let's say 20 )
4 to 6 GB of RAM
( and I guess 1 or cpu ).

Alamut ?


In terms of network, do we use a reverse proxy, or do we ask LO for an IP?
I would suggest a reverse proxy, easier to set up ( and no fiddling
with bridges ).


What virt technology ?

I am quite biased in favor of libvirtd + kvm + virt-manager, as this is
supported by upstream and Red Hat, but maybe others have different
experiences.

-- 
Michael Scherer



