[Mageia-dev] Mirror layout, round two

Michael scherer misc at zarb.org
Mon Nov 29 01:24:42 CET 2010

On Sat, Nov 27, 2010 at 08:00:17PM +0200, Thomas Backlund wrote:
> Michael scherer skrev 27.11.2010 10:43:
> >On Fri, Nov 26, 2010 at 10:29:14PM +0200, Thomas Backlund wrote:
> [...]
> > >
> > > Then we come to the "problematic" part:
> >
> >This part look really too complex to me.
> >
> > > ------
> > > /x86_64/
> > >        /media/
> > >              /codecs/ (disabled by default)
> >
> > so, ogg, webm, being codec, should go there or not ?
> > What about patents problem about something else than codec ?
> > ( freetype, image such as gif, DRM stuff )
> >
> Actually this is the "maybe_legal_greyzone" repo,
> but since flagging it as "codecs" would really make people
> react, I named it so for now...

Sorry to be so direct, but that doesn't answer the question :/
> > >              /core/ (old main+contrib)
> > >                   /backports/ (disabled by default)
> > >                   /backports_testing/ (disabled by default)
> > >                   /release/
> > >                   /testing/ (disabled by default)

May I suggest naming this one "updates_testing", for consistency ?
( consistency with backports_testing, and because it explains more clearly
what goes in. It also looks simpler ).

> > >                   /updates/
> > >              /extra/ (unmaintained, disabled by default)
> >
> > If used by people, then why no one step to maintain anything ?
> Yeah, thats the problem.

If this is the problem, how does a separate media help to get people to
maintain the applications ?

So far, the only way that really works is
"someone takes care of it or we shoot the do^W rpm".
So maybe we could just be more active with cleaning ?

> And reality shows we have a lot of packages assigned to nomaintainer@ ...
> > >              /firmware/ (disabled by default)
> >
> > Why separate firmware from non_free ? What does it bring ?
> > Since both of them are disabled by default, they can be simply merged.
> >
> Well, this suggestion is partly based on the fact that we have users
> that want a firmware free install, wich this would satisfy...

I do not think this warrants a full media, maybe just a way to filter packages.

Using a media seems overkill to me, since it brings complexity to the dialog boxes, from
easyurpmi to rpmdrake and the installer, and since it brings complexity to the mirrors, to
the build system ( BS ) and to our policy.
Maybe we could find a way to tag them "firmware", like an rpm group.

The benefit is that the complexity will only be on the rpmdrake side, not on mirroring and the BS.

Moreover, this would be much more flexible ( ie, see the games option I propose later ).
> But yes, if we ignore those suggestions, we split the firmwares in
> GPL -> /core/ and the rest to /non-free/
> > >              /games/ (disabled by default)
> >
> > That's a simplification that make no sense.
> > Not all games are big, not all big packages are games ( tetex, openoffice ).
> It's not only a size question, its also a nice option for companies
> to not mirror games ("employees should work, not play...")

Such companies likely already have admins to prevent users from installing games.
Maybe we could add a feature in rpmdrake for that ( like "do not show packages
that match such conditions : group =~ games/, maintainer =~ nomaintainer@, requires =~ python" ).
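
To make the idea a bit more concrete, here is a rough sketch of such a filter in Python. Everything in it is made up for illustration ( the Package record, the rule list, the function names ); it is not how urpmi or rpmdrake store or filter packages today, just one way the conditions above could be expressed.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    # Hypothetical, simplified package record; not urpmi's real data model.
    name: str
    group: str
    maintainer: str
    requires: List[str] = field(default_factory=list)


# (field, regex) pairs; a package is hidden as soon as one rule matches.
HIDE_RULES = [
    ("group", r"^Games/"),
    ("maintainer", r"^nomaintainer@"),
    ("requires", r"^python"),
]


def is_hidden(pkg, rules=HIDE_RULES):
    """Return True if any rule matches the package metadata."""
    for attr, pattern in rules:
        value = getattr(pkg, attr)
        # "requires" is a list, the other fields are plain strings.
        values = value if isinstance(value, list) else [value]
        if any(re.search(pattern, v) for v in values):
            return True
    return False


def visible_packages(packages, rules=HIDE_RULES):
    """Keep only the packages that no rule hides."""
    return [p for p in packages if not is_hidden(p, rules)]


# Made up sample data, only to exercise the three rules.
packages = [
    Package("wesnoth", "Games/Strategy", "someone@example.org"),
    Package("oldtool", "System/Base", "nomaintainer@example.org"),
    Package("pytool", "Development/Python", "someone@example.org", ["python-base"]),
    Package("vim", "Editors", "someone@example.org", ["libc"]),
]
print([p.name for p in visible_packages(packages)])
```

With this sample data only "vim" stays visible. The point is that the rules live on the user's system, so nothing has to change on the mirrors or on the BS.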

The problems of companies' private internal mirrors are really not our concern.
And their software policy, even if they may decide to apply it on a public mirror,
should not leak onto our side.

> And we have some contributors that already have stated that they
> plan to add all possible games so it will grow.
> and we all know games are the fastest growing /space demanding...

Well, so either that will cause a problem on our side, in which case the split will
not help on our primary mirrors, or it will only cause issues on some mirrors,
and in this case, there are lots of other things that can take space that we do not
take into account :
- debug
- source code ( except that it is a GPL requirement )
- adding another arch ( like arm/mips )
- adding more isos ( something that is asked for each time, like a 64 bits one, etc )

So if we decide "mirrors will not handle the load, so we need to split games", then we
should also say "mirrors will not handle the load, so we need to do fewer isos/offer to not
mirror debug/offer to not mirror some architectures", and we end up with an inconsistent
network of mirrors, with lots of complexity on our side to handle the possible choices
made by mirrors. I am not sure that users will truly benefit from this. And I am sure
that we will not benefit from the complexity.

If space is an issue ( and I think that's one of the main ones ), then we should decide
based on metrics. Ie, we plan to have no more than X% growth in mirror size over 1 year.
If we hit some soft limit, then we investigate and decide ( ie, stop adding big backports,
stop adding new packages, etc ).
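
As a trivial sketch of what such a check could look like ( Python, with made up function names and made up numbers; the real X and the real sizes have to come from the mirror admins ):

```python
def growth_percent(size_before_gb, size_now_gb):
    """Growth of the mirror tree between two measurements, in percent."""
    return (size_now_gb - size_before_gb) * 100.0 / size_before_gb


def over_soft_limit(size_before_gb, size_now_gb, soft_limit_percent):
    """Return (growth, exceeded) so that we know when to investigate."""
    growth = growth_percent(size_before_gb, size_now_gb)
    return growth, growth > soft_limit_percent


# Purely illustrative figures: 100 GB a year ago, 112 GB now, 10% allowed.
growth, exceeded = over_soft_limit(100.0, 112.0, 10.0)
print(growth, exceeded)
```

Here the growth is 12%, above the 10% soft limit, so that would be the moment to investigate rather than to add a new media.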

And decide the metrics based on mirrors' input, and based on packagers' input.
But so far, apart from Olivier and Wolfgang, we do not have many metrics or
requirements :/

> > >              /non-free/ (disabled by default)
> > >              /debug_*/ (disabled by default)
> >
> >
> > And what are the relation of requirements ?
> > Ie, what can requires non_free, codecs, games, etc ?
> >
> IMHO /core/ should be selfcontained.
> We are promoting open source after all.

Yes, but what about the others ?
Ie, can a game require a codec or not ? Or a package in extra ?
If we remove a package from extra, do we remove everything 
that requires it ?
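
Whatever the answer is, it should be something we can check automatically. A small Python sketch of such a check, where only the "core is selfcontained" rule comes from this thread and the "games" rule is a made up answer, just to have something to test ( all names are hypothetical ):

```python
# Which media a package may pull its requirements from.
# Only the "core" line reflects what was said in the thread; the "games"
# line is an invented policy, the real one is still an open question.
ALLOWED = {
    "core": {"core"},
    "games": {"core", "games"},
}


def violations(media_of, requires, allowed=ALLOWED):
    """List (package, requirement) pairs that cross a forbidden boundary."""
    bad = []
    for pkg in sorted(requires):
        # A media with no rule at all is allowed to require nothing.
        allowed_media = allowed.get(media_of[pkg], set())
        for dep in sorted(requires[pkg]):
            if media_of[dep] not in allowed_media:
                bad.append((pkg, dep))
    return bad


# Made up packages: a game and a core player that both want a codec.
media_of = {"somegame": "games", "libcodec": "codecs",
            "libfoo": "core", "player": "core"}
requires = {"somegame": {"libfoo", "libcodec"}, "player": {"libcodec"}}
print(violations(media_of, requires))
```

Both "player" and "somegame" are reported here, because neither core nor games is allowed to require something from codecs in this invented policy. Every media we add means another line to decide in that table.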

> > And what about something that can goes in both media, ie a non_free
> > game goes where ? A unmaintained codecs goes where ?
> Yeah, to be precise, that would need a games_non-free

Another media ? Really, I think most users are already lost with the
current media selection.
For core, we have 15 to 20 medias ( ( src + debug + binary ( 1 or 2 ) ) * ( update/release/testing/
backport/backport testing ) ). Each media we add at the level of core will therefore add 15 to 20
medias too. So with firmware, games, extra, codecs and non_free, that would make the total around
80 to 90 medias for a single arch ( I assume that firmware may not have debug_* ).
While this can be partially solved with a better interface for selecting media,
we cannot do miracles if there are too many things :/
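
For those who want to redo the count, here is the arithmetic as a small Python sketch. The function name and parameters are mine, and the media names are the ones from the proposal being discussed:

```python
SUBDIVISIONS = ["release", "updates", "testing", "backports", "backports_testing"]


def media_count(top_level, binary_medias, no_debug=()):
    """Count medias for one arch: (binary + src + debug) * subdivisions."""
    total = 0
    for name in top_level:
        kinds = binary_medias + 1      # binary medias plus src
        if name not in no_debug:
            kinds += 1                 # plus debug, unless excluded
        total += kinds * len(SUBDIVISIONS)
    return total


top_level = ["core", "firmware", "games", "extra", "codecs", "non_free"]

print(media_count(["core"], 1), media_count(["core"], 2))
print(media_count(top_level, 1, no_debug=("firmware",)))
print(media_count(top_level, 2, no_debug=("firmware",)))
```

Core alone gives 15 or 20. With the 6 upper level medias and no debug for firmware, this gives 85 with 1 binary media per subdivision, which is where "80 to 90" comes from, and 115 with 2.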

So let's try to think how we can reduce the number of media.

We have 2 kinds of issues we try to solve at mirror level, both with an
impact on the BS and on packagers :
- the concerns of mirror admins
- the concerns of users

Mirror admins are concerned by :
- size and growth ( see Wobo's mail in the previous thread )
- content ( or at least, we think )

The content part is mainly a legal matter, but I haven't heard any admin
saying "we can't do that", so that's my interpretation. The concern is
mainly around the DMCA and the EUCD, even if lesser known laws also exist around
the world ( like Paragraph 202c of German law, which bans "hacking tools" ).
For the DMCA, there is some protection for them :
http://www.benedict.com/Digital/Internet/DMCA/DMCA-SafeHarbor.aspx .
For the EUCD and the rest, I do not know.

Users are concerned with a wide range of issues, some contradictory :
- some want newer stuff, some don't
- some want stable stuff, some do not care as much
- some want non_free, some don't want it
- some want firmware, some don't
- etc

Yet, the users' concerns mainly revolve around 2 things :
- package availability
- package filtering, based on packages content

The first part is already solved by the subdivision ( release, etc ). We
need to split them for build reasons, so we can't really avoid adding
medias on this part.
The second part is more tricky. And in fact, I think we can avoid creating medias
for this. Ie, do not let the concern of filtering appear on
the BS and the mirrors, and push it to the endusers' systems.
People who do not want firmware on their system do not really care about
the firmware being in a separate directory on mirrors, as long as they can
easily remove it from the list of packages they can install ( at
perl-urpm level, IMHO ).

Same goes for non_free, or for unmaintained software. Or even games.

So if we push the users' issues to the endusers' systems, we only have to manage the
mirror admins' issues on the mirrors.

And so here is a proposal that starts with the size issue :

- discuss with mirror admins, and decide on a size that everybody would agree to mirror
for core/ for the next release, or the next 2. Ie, every year or every 6 months,
we do a survey of our mirrors, to see if everything goes well for them.
- discuss also the growth of core in terms of size
- decide on a size limit
- if anything goes over the limit for mirrors, add an overflow/ to hold the packages
that will not be mirrored by everybody. Overflow will be treated like core, in all respects.
The only difference is that mirroring it is optional ( but strongly encouraged ).
- put everything in core, except what goes to overflow.
- let users filter on their system, with something on the urpmi side ( I suggest filtering
when we do urpmi.update, but the exact details of how to do it are not relevant now ).

Overflow will be filled with packages that : 
1) are not required by anything else ( thus games data would likely fit, 
but not only )
2) have pushed core over the size limit
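
To show that the selection can be mechanical, here is a rough Python sketch. The two criteria are the ones above; moving the biggest packages first is my own assumption, as are all the names and sizes. It also ignores that a requirement may become a leaf once the package needing it has moved.

```python
def pick_overflow(sizes, requires, core_limit):
    """Split packages into (core, overflow) so that core fits in core_limit.

    sizes:    {package name: size}
    requires: {package name: set of required package names}
    """
    # Criterion 1: only packages that nothing else requires may move.
    required = set()
    for deps in requires.values():
        required |= set(deps)
    candidates = sorted(
        (name for name in sizes if name not in required),
        key=lambda name: (-sizes[name], name),   # biggest first (assumption)
    )

    core = dict(sizes)
    overflow = []
    # Criterion 2: move packages only while core is over the limit.
    for name in candidates:
        if sum(core.values()) <= core_limit:
            break
        overflow.append(name)
        del core[name]
    return sorted(core), overflow


# Made up packages and sizes, in whatever unit the mirrors prefer.
sizes = {"libfoo": 10, "editor": 30, "hugegame": 500, "bigdoc": 200}
requires = {"editor": {"libfoo"}, "hugegame": {"libfoo"}}
core, overflow = pick_overflow(sizes, requires, core_limit=300)
print(core, overflow)
```

With a limit of 300, only "hugegame" moves to overflow and the rest stays in core. No packager has to argue about where a package belongs.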

After the limit of core size is raised ( ie after all mirrors have agreed ), we can re-add packages
from overflow to core, based on criteria not defined yet ( first come first served, try to
put the most useful first ? or some wild guesstimate based on some mirrors stats ? ). But being
in core or overflow should not change anything for either endusers or packagers. This is
a mirror only concern, and so should be kept there only.
And this should avoid discussions among packagers about the location of packages.

This means that both core and overflow should be enabled by default on users' systems
( and I would not be against a better name, but I didn't find one ).

In order to reduce number of media, another question is :
- should non_free have its own media ?

Having it in core would simplify the BS, the upload and the mirroring.

Having it separated would be better from various points of view ( political, 
communication, etc ). Maybe some people will refuse to help us if we don't, 
maybe there is some further restriction on some non-free software leading us
to create another media whatever we do, I do not know.
To me, as long as we can filter on user side, it would be ok.

I cannot really tell what I prefer for that :/

So the only important mirror issue left to solve is the greyzone area. 
And well, that's quite complex.

So we can either :

1) decide to not care ( ie everything in core )
2) decide to not offer them at all ( aka offload to PLF )
3) decide to add a media ( aka the "codecs" media )

1 is the simplest. But maybe not really a good idea.

If we care, then what indeed should be done is another media, and let admins
choose to mirror it or not. I would even propose to revisit the idea of
separation every year, because if all mirrors have the
2 medias, there is no need to split in reality ( I doubt it will happen, but
at least this would show that we try to revisit our foundations on a regular
basis ). And at least, we should review the packages present in such medias.
If there are some packages that can be moved to core,
then they should be.

We could also simplify the BS a bit by placing non-free packages there
( instead of either having a non_free media, or the non_free packages in core ).
It would sadden me a little to blur the line between "free with patents problems"
and "non free", but my PLF experience showed that most people do not care, and that
it requires more than a media separation.

So, in the end, we would have :

core/          <- everything else
"overflow"/    <- big packages, just for mirroring issues
restricted/    <- with non_free, firmware, "codecs"

with the 5 directories under them, and with src, debug, binary.
Imho, 3 upper medias is the simplest we can have ( besides debug/src, that 
I would also place on the same level as the binaries, but my
mail is already long enough :/ )

> For codecs either a extra_codecs or simply drop after a grace period.
> but I guess codecs are important to people, so hopefully they wont
> get orphaned...

Unfortunately, there is not always a relation between "being important
to users" and "someone wants to take on the burden of maintaining it" :/
For example, something like etherpad would be nice for users,
yet no one will take the time to maintain it.

Michael Scherer
