[Mageia-sysadm] Update procedure on sysadmin side

Fri Jul 22 23:37:59 CEST 2011

On Fri, Jul 22, 2011 at 05:53:57PM +0300, Thomas Backlund wrote:
> Michael Scherer wrote on 20.7.2011 01:26:
> >On Tuesday 19 July 2011 at 22:50 +0200, Michael Scherer wrote:
> >>Hi,
> >>
> >>while trying to see how to do update ( since the list is quite long :
> >>https://bugs.mageia.org/buglist.cgi?query_format=advanced&emailassigned_to1=1&order=Importance&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=qa-bugs%40ml.mageia.org&product=Mageia&emailtype1=substring ), I searched bash history to find the procedure.
> >>
> >>So here is what should be used :
> >>
> >># /root/tmp/mgatools/mga-move-update 1 core me-tv
> 
> Thinking a little about updates...
> 
> Should we hardlink the rpms instead of moving them,
> and postpone the removal from *_testing for a few days (or ~a week)
> so all mirrors catch up...
> 
> That would save the mirrors from re-downloading the stuff, and make
> the updates available faster...

> It does not matter much for a simple update like logrotate, but a
> kernel update or a full kde update would definitely benefit from it.

I think that's negligible compared to the size of cauldron updates,
and it would be slightly more complex to develop.

This would also mean we keep them in updates_testing, and thus have bigger
hdlists, which means more rpms to parse and more data for mirrors to download
( hdlists are redownloaded in full each time, since they are compressed and
I am doubtful about the rsync-friendliness of the whole compression ).

And since hdlists are downloaded by every user doing testing, it may have a
bigger impact.

Let's check some numbers ( likely bogus, but they are only there to illustrate ).

Imagine that an hdlist is 1.3 MB bigger because a kernel is kept in updates_testing
( as said on http://permalink.gmane.org/gmane.linux.mageia.devel/1392 ).
Let's say 1 update per day, and maybe more depending on when we clean updates_testing,
so that's about 10 different hdlists to download in the week, as we recreate it for
each update. Since the hdlist is compressed, I suspect that rsync may not
be very efficient when downloading it ( ie, it starts from 0 each time ).

On the mirror, the hdlist will be downloaded by each mirror or user syncing from it,
let's say 10 people and 2 mirrors, which makes 12.

With each of them syncing once per day, that makes around 100 MB of bandwidth per
week used just for the hdlist, on one single mirror ( 1.3 MB x 12 x 7 days ).
Compare that to the size of the update itself, ie 30 MB for the kernel example.
You can do the same math for -debug.
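
To make the estimate easy to redo with other numbers, here is a small sketch of
the same back-of-the-envelope calculation. All values are the illustrative
guesses from this mail, not measurements, and the variable names are made up
for the example; I assume the count is over one week.

```python
# Back-of-the-envelope estimate of extra hdlist traffic on one mirror.
# Every number below is an illustrative guess, not measured data.
hdlist_growth_mb = 1.3   # extra hdlist size from keeping a kernel in updates_testing
testers = 10             # people syncing updates_testing from this mirror
mirrors = 2              # downstream mirrors syncing from this mirror
syncs_per_day = 1        # each client syncs once per day
days = 7                 # assumed delay before updates_testing is cleaned

clients = testers + mirrors
hdlist_overhead_mb = hdlist_growth_mb * clients * syncs_per_day * days

kernel_update_mb = 30    # size of the update itself in the kernel example
ratio = hdlist_overhead_mb / kernel_update_mb

print(f"extra hdlist traffic: {hdlist_overhead_mb:.0f} MB per week")
print(f"that is {ratio:.1f}x the size of the update itself")
```

Changing `testers` or `days` shows quickly how sensitive the result is to the
assumptions discussed below.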

And as Thomas said on irc, the hdlist issue will likely be worse due to the kmod()
provides in rpm 4.9.

Now, there are a few assumptions that can be discussed, and that would change
everything :
- hdlist and rsync. rsync and gzip do not mix well, but maybe we do have provisions
against that. I asked teuf but he was not sure, maybe nanar or tv can shed a
different light. Maybe we also no longer use hdlists.
- number of users. Given the size of the QA team, maybe 10 is beyond our
wildest dreams.
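
The first assumption could be checked rather than guessed. Here is a rough
sketch that approximates what rsync can reuse, by counting how many fixed-size
blocks of the old file still appear somewhere in the new one, before and after
compression. It uses zlib ( the same deflate algorithm as gzip ) on a synthetic
file list; the file content, the block size and the function names are all
assumptions made for the example, and real hdlists may behave differently.

```python
import random
import zlib


def make_fake_hdlist(n_entries: int, seed: int = 0) -> bytes:
    """Build a synthetic, text-like package list (NOT the real hdlist format)."""
    rng = random.Random(seed)
    lines = [
        f"pkg{i}-{rng.randint(0, 9)}.{rng.randint(0, 99)}-1.mga1.noarch.rpm\n"
        for i in range(n_entries)
    ]
    return "".join(lines).encode()


def block_reuse(old: bytes, new: bytes, block: int = 512) -> float:
    """Fraction of fixed-size blocks of `old` found anywhere in `new`.

    This only approximates rsync's block matching; it is not rsync itself.
    """
    blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    if not blocks:
        return 0.0
    return sum(1 for b in blocks if b in new) / len(blocks)


old = make_fake_hdlist(5000)
# Simulate one new package being added at the top of the list.
new = b"example-kernel-1.0-1.mga1.x86_64.rpm\n" + old

old_gz = zlib.compress(old)
new_gz = zlib.compress(new)

raw_reuse = block_reuse(old, new)
gz_reuse = block_reuse(old_gz, new_gz)

print(f"reusable blocks, uncompressed: {raw_reuse:.0%}")
print(f"reusable blocks, compressed:   {gz_reuse:.0%}")
```

Running the same comparison on two consecutive real hdlists from the mirror
would tell us whether the "from 0 each time" suspicion holds.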

And of course, I used a worst case, and for stuff like tetex or vegastrike-data,
it would surely help to do a hardlink. But the hdlist issue would arise mostly 
for rpms with lots of files ( kernel, kde ?, various documentation ? ), so that's
something that we cannot decide without at least doing some stats.

IE, while it would help for vegastrike ( best case ), we never updated it on
Mandriva, while there were lots of kernel updates ( worst case, and on steroids due
to the multiplication of kernels ).

My point is not that we should or should not do this, but rather that we should
first measure the impact, based on real data rather than trying to optimize
based on guesses, because it may make things worse for mirrors ( more bandwidth used
in the long run ) and for us ( more complexity in the setup ).

> the downside is of course more complex code on staging/primary mirror

Yes, and I fear this may not be so trivial to do ( ie, how do we know when
we need to remove an rpm from updates_testing, since we cannot rely on the creation
date due to the hardlink ).
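
To illustrate the timestamp problem, here is a small sketch using a temporary
directory. The directory layout and the package file name are placeholders,
not the real mirror tree. It shows that both names of a hardlinked file share
one inode, so they report the same modification time, while the link count
does show that a second name exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Placeholder layout, not the real mirror tree.
    testing = os.path.join(root, "updates_testing")
    updates = os.path.join(root, "updates")
    os.makedirs(testing)
    os.makedirs(updates)

    # Placeholder package file standing in for a real rpm.
    src = os.path.join(testing, "example-1.0-1.mga1.noarch.rpm")
    with open(src, "wb") as f:
        f.write(b"fake rpm payload")

    # "Move" the update by hardlinking instead of renaming.
    dst = os.path.join(updates, os.path.basename(src))
    os.link(src, dst)

    st_src = os.stat(src)
    st_dst = os.stat(dst)

    # Both names point to the same inode, so timestamps are shared.
    same_inode = st_src.st_ino == st_dst.st_ino
    same_mtime = st_src.st_mtime == st_dst.st_mtime
    # The link count tells us the file has a second name somewhere.
    link_count = st_src.st_nlink

    print(f"same inode: {same_inode}, same mtime: {same_mtime}, links: {link_count}")
```

So a cleanup script could perhaps tell that an rpm was pushed ( link count above
1 ), but not when, unless the push date is recorded somewhere else. That is
only a thought, not something mga-move-update is known to do.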

-- 
Michael Scherer