[Mageia-sysadm] [LONG] A not so modest proposal

Sun Oct 24 01:55:18 CEST 2010

Hi,

So, as I said in the previous mail ( and before ), I propose that we use
puppet+svn to manage the servers. Since this may not be obvious to
everybody, here is a long mail with a long explanation of each point
( spoiler : there is a bold proposal in the middle ).

So first, what is puppet ?
--------------------------

As I didn't find a better explanation than the one on their website,
let's cut and paste :

"Puppet is an open source data center automation and configuration
management framework. Puppet provides system administrators with a
simplified platform that allows for consistent, transparent, and
flexible systems management."

If you have heard of cfengine ( used at mdv ) or bcfg2, puppet is
similar. It allows you to describe your computers in a configuration
file, and takes care of installing and setting them up according to
that file. For example, you can say 'for servers in the web server
group, install apache and php, use this config file, and start this
process'. Usually, such systems are combined with a regular vcs like
svn, for reasons that I will outline later.

To quickly explain how it works, there is a central server ( called
"puppetmaster" for puppet ) and an agent on each computer. Agents fetch
the config from the puppetmaster at a configurable interval and apply
it ( ie, install rpms, change config files, reload software, run tasks,
etc ).
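
As a rough illustration of the "apply" step ( this is not puppet's real
code, just a Python sketch of the idea ), an agent compares the desired
state with what is on disk and only acts when they differ. The function
name `ensure_file` is made up for the example.

```python
import os
import tempfile


def ensure_file(path, desired_content):
    """Make sure `path` holds `desired_content`; return True if a change was made."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            if handle.read() == desired_content:
                return False  # already in the desired state, nothing to do
    except FileNotFoundError:
        pass  # the file is missing, so it has to be created

    # Write to a temporary file first, then rename it into place, so that
    # nobody ever reads a half-written config file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(desired_content)
    os.replace(tmp_path, path)
    return True
```

Running it twice with the same content changes nothing the second time,
which is what makes it safe for the agent to re-apply the configuration
at every interval.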

So why use a configuration management system and a vcs ?
--------------------------------------------------------

First, it provides us with an audit trail. Like for code, we know who
changed what and why. The goal is not to distribute blame, but to have
more information about a change ( ie, if 2 years ago I changed some php
variable that later turned out to break some web software, the
changelog will tell us why it was changed in the first place, maybe to
fix another important piece of software ). It allows us to repeat a
configuration ( useful to migrate a service or to reinstall a server ),
to roll back to previous versions of the configuration, and to work
concurrently.

It also ensures that processes are running, which adds an extra safety
layer in case of problems. And we can use hooks to check files before
applying them automatically, or to send mail to the admins when there
is a change. For example, you can no longer commit a broken dns zone,
since zones are checked automatically before being applied ( provided
you write the proper script, of course ).
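
To give an idea of what such a script could look like, here is a sketch
of a svn pre-commit hook in Python. It assumes a hypothetical repository
layout where zone files live under `dns/zones/` and are named after
their zone, and that `svnlook` and bind's `named-checkzone` are
installed on the svn server. Treat it as a starting point, not a
finished hook.

```python
#!/usr/bin/env python3
import subprocess
import sys
import tempfile

ZONE_PREFIX = "dns/zones/"  # hypothetical repository layout


def zones_to_check(changed_output):
    """Return the zone files added or modified, from `svnlook changed` output."""
    paths = []
    for line in changed_output.splitlines():
        # The first columns hold the status flags, the path starts at column 5.
        status, path = line[:4].strip(), line[4:]
        if status.startswith("D") or path.endswith("/"):
            continue  # deleted files and directories need no check
        if path.startswith(ZONE_PREFIX):
            paths.append(path)
    return paths


def svnlook(subcommand, repo, txn, *args):
    """Run svnlook against the pending transaction and return its raw output."""
    return subprocess.run(
        ["svnlook", subcommand, "-t", txn, repo, *args],
        check=True, stdout=subprocess.PIPE,
    ).stdout


def main(repo, txn):
    changed = svnlook("changed", repo, txn).decode("utf-8")
    for path in zones_to_check(changed):
        zone_name = path[len(ZONE_PREFIX):]  # assumes the file is named after the zone
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(svnlook("cat", repo, txn, path))
            tmp.flush()
            result = subprocess.run(
                ["named-checkzone", zone_name, tmp.name],
                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            )
        if result.returncode != 0:
            # svn shows the stderr of a failing pre-commit hook to the committer.
            sys.stderr.write(result.stdout.decode("utf-8", "replace"))
            return 1  # a non-zero exit status rejects the commit
    return 0


# svn calls the hook with the repository path and the transaction name.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```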

Why use puppet and not $FOO ?
-----------------------------

That's a good question. First, let's be honest, I take care of puppet at
mdv, so of course, I am biased. In terms of software, there is a lot of
choice
( http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software ),
but from reading about the topic on the web, we can safely restrict our
discussion to cfengine 2, cfengine 3, puppet, bcfg2 and chef.

Chef is not packaged, nor easy to set up ( imho ) unless you use gems
( which is a bad idea ). The configuration is basically written in
ruby. It seems nice, but no one around me has played with it ( except
people at CERN ).

Cfengine 2 is what is used at zarb, and also lightly used at mandriva. I
took a look at the mandriva configuration, it is mainly used to prevent
services from starting and to manage fstab. So there is nothing that
could prevent a migration and, in fact, almost nothing to reuse.
Version 2 is also the legacy branch, and I am not sure it is maintained
( guillomovitch may know better than me ).
So far, I think boklm, blino, me and maybe nanar know it.

Bcfg2 is not packaged, and uses xml for its config files, which is imho
a bad point for it. I have some friends who use it, and they do not seem
unhappy with it. Yet, I found little information about it besides the
web site.

Cfengine3 is not packaged either. And the configuration language is
different from cfengine 2, which would mean that we will likely have to
re-learn it, for those that know the older version. 

Puppet is packaged ( by me ). It is used by several free software groups
( afaik, redhat and mozilla among others ), maintained by a company,
and there is a healthy community providing modules and software around
it. From what I know, Nanar and I know how to use it. Boklm also started
to look at it.

So I think we should first use packaged software, for various obvious
reasons. So the choice is basically between puppet and cfengine 2. I
asked guillomovitch, our zarb.org cfengine expert, about the migration
to cfengine 3, and he told me he was planning to use puppet instead. And
having used both in production ( either at zarb.org or on my own
server ), I think puppet is nicer, provides more high level components
than cfengine 2, and is maintained. And since there will be almost no
configuration reuse from Mandriva, I do not think it is worth keeping
cfengine.

Why use svn and not $FOO ?
---------------------------

Again a good question. At zarb, we use svn. At mdv, afaik, we used
nothing, cfengine was just there to automate the cluster setup, which is
a similar yet different task from the one we are discussing now. So, as
I prefer DVCS, I tested git + puppet for you, 6 months ago. And after 3
weeks, I migrated the whole system to svn. The git hooks are just too
complex for my poor mind ( ie, checking the syntax of puppet, bind, etc
before applying was not easy ). So since svn is quite straightforward
in that regard, and since we can always use git-svn to have the best of
both worlds, I suggest using svn.

( yes, hg, bzr, cvs and the gazillion others were not suggested ).

Now, if someone is a git-hooks master, I have no problem with git.

What would be the work flow ?
-----------------------------

So the idea is :
- an admin commits some config change to the svn repository,
- svn checks the syntax if needed ( pre-commit hook ),
- svn extracts the configuration with a post-commit hook ( and sends an
email ),
- puppetmaster notices the change,
- puppetmaster reloads the configuration,
- each server gets the configuration after some time,
- each server applies the configuration, and sends mail in case of
problems.
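
The post-commit step of the list above could be sketched like this in
Python. The checkout path, the mail addresses and the use of a local
mail server are all assumptions made for the example ; the real hook
would depend on how the puppetmaster is set up.

```python
#!/usr/bin/env python3
import smtplib
import subprocess
import sys
from email.message import EmailMessage

CHECKOUT = "/etc/puppet"                 # hypothetical working copy read by the puppetmaster
MAIL_FROM = "svn@example.org"            # placeholder addresses
MAIL_TO = "sysadm-commits@example.org"


def build_mail(author, revision, log, changed):
    """Build the notification mail for one commit."""
    msg = EmailMessage()
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg["Subject"] = f"r{revision} by {author}"
    msg.set_content(f"{log.strip()}\n\nChanged paths:\n{changed}")
    return msg


def svnlook(subcommand, repo, revision):
    """Ask svnlook about a committed revision and return its output as text."""
    return subprocess.run(
        ["svnlook", subcommand, "-r", revision, repo],
        check=True, stdout=subprocess.PIPE,
    ).stdout.decode("utf-8")


def main(repo, revision):
    # Refresh the checkout that the puppetmaster reads its configuration from.
    subprocess.run(["svn", "update", "--non-interactive", CHECKOUT], check=True)
    msg = build_mail(
        svnlook("author", repo, revision).strip(),
        revision,
        svnlook("log", repo, revision),
        svnlook("changed", repo, revision),
    )
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail server
        smtp.send_message(msg)


# svn calls the hook with the repository path and the new revision number.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])
```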

We can also add ACLs if needed.

Could we also use svn to manage the documentation ?
---------------------------------------------------

Indeed, that's a good suggestion. I have learned that the documentation
is something that I never find when I need it, and I have often been
frustrated by the lack of useful tools to exploit it ( like grep ). So,
to me, it makes sense to have the documentation alongside the
configuration, since I always have a checkout of the configuration in
my home, and since I often update it ( which is usually not the case
with most wikis, as this requires an extraneous step ).

Where is the bold proposal ?
-----------------------------

And now, for the very very bold part of the proposal :

this svn repository should be public. Ie, people could browse it, the
changelog should be sent to this ml ( which is public ), and people
could even do anonymous checkouts.

So I expect three types of reactions to this part of the mail :

People who say "mhh, I do not understand what he is speaking about".

People who say "this makes sense, we are doing free software".

And admins who are screaming "argh, what about the passwords and
security !"

So for people who do not understand, well, either reread or ask me on
irc, or by mail. There is no shame in asking questions.

For people who are screaming, first, no need to scream, I cannot hear
you. And I have of course thought of that, and we can either :

1) have the svn hook that does the checkout search and replace
placeholders in the puppet config with passwords stored in a secure
location

2) use extlookup, a puppet feature that does this, provided we use it
properly
( http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php ).

Ie, we can store the passwords in a private CSV file somewhere, while
publishing everything else. Obviously, stuff that counts as data ( such
as private certificates, gpg keys, etc ) should not be stored in svn, or
at least not in the public one. And everything that would be considered
personal information should not be stored in the public svn either.
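
To illustrate the first option ( search and replace done by the hook ),
here is a small Python sketch. The `@@name@@` marker syntax and the
two-column `name,value` CSV layout are made up for the example ;
extlookup has its own conventions, which are not shown here.

```python
import csv
import io
import re

# Made-up marker syntax, chosen so it cannot clash with puppet's $variables.
PLACEHOLDER = re.compile(r"@@([A-Za-z0-9_]+)@@")


def load_secrets(csv_file):
    """Read `name,value` rows from an open CSV file into a dict."""
    return {row[0]: row[1] for row in csv.reader(csv_file) if len(row) >= 2}


def render(public_text, secrets):
    """Replace every @@name@@ marker ; raises KeyError if a secret is missing."""
    return PLACEHOLDER.sub(lambda match: secrets[match.group(1)], public_text)


# Usage, with in-memory data standing in for the private CSV file :
secrets = load_secrets(io.StringIO("db_password,s3cret\n"))
print(render("password = @@db_password@@", secrets))  # prints: password = s3cret
```

Failing loudly on a missing secret is deliberate : it is better for the
hook to stop than to deploy a config file that still contains a marker.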

And so for those who think "this makes sense" : yes, it makes sense, but
more than simply making sense, it brings some advantages :

Sharing is the basis of the free software ethic. And so this shows our
commitment to free software.

Publishing our configuration is also perfectly in line with the values
of the project : 

"We will empower our user base by demystifying advanced technologies"
-> publishing our work as sysadmin is a step toward demystifying

"Mageia will always adhere to high security and privacy
standards/technologies to protect our users' data." 

-> letting everybody audit is IMHO a high security standard, while
security by obscurity is not. We always say that free software is better
for security for this reason, so we are consistent with this idea.

"We will cooperate with other OSS distributions and core and kernel
developers with code contribution."

-> if the configuration is treated like code, then we are cooperating by
giving it away : to other distributions that may want to know how we do
things, be they big distros or smaller ones, and to various projects
that want to know how we handle specific parts of the infrastructure.

"We will maintain the vibrancy within our Community, always aiming to
lead the way in collaborative development."

-> people are eager to help us as sysadmins. I received lots of offers
of help. However, for security reasons, I do not think we should have a
group of 20 persons, nor should we have too complex an organisation for
admins. Publishing the config allows us to collaborate with others by
letting them :
  - review our changes, as with the svn changelog on cooker
  - directly send patches, which can be much clearer to us
  - directly search when they see a problem ( for very simple problems,
of course )
  - help us to communicate with everybody when we fix problems or make
some changes.

Moreover, this allows us to see who is motivated to help us and who is
able to understand our infrastructure, and therefore it can help us to
recruit new admins in case of need, and to manage the erosion of our
group. It also gives a gentle introduction to newcomers, who can see
what happens.

Finally, using this will allow us to have a forkable infrastructure. And
this would be a real innovation, I think ( at least, an innovation good
enough to be communicated ).

I do not plan to let the project go bad, of course, but I didn't intend
Mandriva to disappear either.

So in the future, if something goes wrong for any kind of reason, or if
some people prefer to work without us, we are offering them what
Mandriva didn't offer : an easy way to fork. And while I do hope it will
not matter much in practice, I think that's a strong statement to
demonstrate that we learned from our errors and that we have evolved.

So, WDYT ?
( please do not let me be warnocked
http://en.wikipedia.org/wiki/Warnocked )

PS : baud, as I know you will read this mail and ask me later : yes, you
can use it under CC-BY-SA or whatever you want.

-- 
Michael Scherer