[Mageia-sysadm] Valstar upgrades (was: Re: [sysadmin-reports] Puppet Report for jonund.mageia.org)

Tue Oct 23 09:57:57 CEST 2012

Pascal Terjan wrote on 23.10.2012 01:23:
> On Mon, Oct 22, 2012 at 11:20 PM,  <root at mageia.org> wrote:
>> Tue Oct 23 00:20:52 +0200 2012 Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse YAML data for node jonund.mageia.org: syntax error on line 100, col 45: `  time: 2012-10-21 23:04:15.975319 +02:00ert: "false"'
>> Tue Oct 23 00:20:52 +0200 2012 Puppet (err): Could not retrieve catalog; skipping run
>
> This had been happening for over a month, I "fixed" it with rm
> /var/lib/puppet/yaml/node/jonund.mageia.org.yaml on valstar

Yep.

I did the same last weekend and it was happy for a few days.
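
To spot the problem before the agent run fails, something like the
sketch below could scan a cached node file for the symptom in the
report above: a timestamp value with stray text glued to its end.
This is only a heuristic for that one symptom, not a YAML validator,
and the function name and regex are my own, not anything Puppet ships.

```python
import re

# A "key: YYYY-MM-DD HH:MM:SS[.frac] +HH:MM" line, capturing whatever follows
# the UTC offset. On a healthy line nothing follows it.
TIMESTAMP_LINE = re.compile(
    r"^\s*[\w-]+:\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? "
    r"[+-]\d{2}:\d{2}(?P<rest>.*)$"
)


def find_garbled_timestamps(text):
    """Return (line_number, line) pairs where a timestamp has trailing junk."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TIMESTAMP_LINE.match(line)
        if match and match.group("rest").strip():
            hits.append((number, line))
    return hits
```

It would be run on the contents of
/var/lib/puppet/yaml/node/jonund.mageia.org.yaml, e.g.
find_garbled_timestamps(open(path).read()); an empty list means this
particular kind of damage was not found.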

puppetmaster is failing more often now that valstar is
more heavily loaded, and I think we may also have to move it
from sqlite to a full SQL server. sqlite tends to be "locked"
too often.
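
For anyone who has not run into the sqlite "locked" behaviour, here is
a minimal sketch of it, assuming nothing about how puppetmaster itself
uses the database: sqlite allows only one writer at a time, so a second
writer that cannot wait long enough gets an error. The table name and
the short timeout are made up for the demonstration.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention(db_path):
    """Return the error a second writer sees while the first holds the lock.

    Returns None if the second writer unexpectedly got the lock.
    """
    # isolation_level=None means autocommit; transactions are managed by hand.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS reports (host TEXT)")
    writer.execute("BEGIN IMMEDIATE")  # take the database write lock
    writer.execute("INSERT INTO reports VALUES ('jonund')")

    # The second connection gives up after 0.1 s instead of the default 5 s.
    other = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
    message = None
    try:
        other.execute("BEGIN IMMEDIATE")
        other.execute("ROLLBACK")
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        other.close()
        writer.execute("ROLLBACK")
        writer.close()
    return message


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    print(demo_lock_contention(path))
```

A client/server database queues or interleaves concurrent writers
instead of locking the whole file, which is why moving off sqlite
should help when several agents report at once.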

I think we also need to add more disks on valstar asap,
so we can distribute i/o better on it.

We discussed adding the following with boklm some weeks ago:

2 x 2TB disks in raid1, and move the /distrib tree onto them.
- this will move the heavy rsyncs onto i/o separate
   from svn & bin repo (current distrib tree
   is 886G, of which some will be needed when adding
   more space to /svn and /binrepo)
- this will have to be done anyway when we fork mga3
   tree at the latest, as we are running out of free
   space on the lvm
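
As a rough check of the headroom this gives, here is a small
calculation. It assumes the 886G figure is in binary units (GiB), as
du/df -h would report it, and ignores filesystem and raid metadata
overhead, so treat the result as an upper bound.

```python
def raid1_headroom_gib(disk_tb, used_gib):
    """Usable RAID1 capacity minus current usage, in GiB.

    RAID1 mirrors the disks, so usable capacity equals one disk.
    Disk vendors quote decimal terabytes (10**12 bytes).
    """
    usable_gib = disk_tb * 10**12 / 2**30
    return usable_gib - used_gib


if __name__ == "__main__":
    headroom = raid1_headroom_gib(disk_tb=2, used_gib=886)
    print(f"{headroom:.0f} GiB free")
```

With the numbers above that comes to roughly 977 GiB of headroom on
the new mirror before the mga3 fork.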

1 240GB SSD to stick /var/lib/schedbot and
   /tmp on.
- these are heavy on i/o for queue management,
   and rpmlint / rpm unpacking ...

This way valstar will behave better even
on busier days.

Comments?

--

Thomas