[Mageia-dev] Proposal for Mageia: implement bittorrent protocol to allow updates download

andre999 andr55 at laposte.net
Wed Jan 12 09:57:24 CET 2011


Michael Scherer wrote:
>
> On Tuesday 11 January 2011 at 21:45 -0500, andre999 wrote:
>> Michael Scherer wrote:
>>>
>>> On Tuesday 11 January 2011 at 20:03 +0100, Marcello Anni wrote:
>>>> hi all,
>>>> i have one question (maybe it can be a proposal): is it possible to implement
>>>> the torrent protocol to download the updates of the distro faster? it could be
>>>> an interesting feature for the coming Mageia releases
>>>
>>> I think the issue of faster downloads could simply be taken care of by
>>> having more mirrors, or faster ones.
>>>
>>> I was under the impression that some ISPs throttle bittorrent, and
>>> that it may not be very NAT and firewall friendly.
>>
>> Some suggestions for faster downloads without bittorrent.
>> 1) use aria2c (or a similar application), which uses multiple
>> connections, defaulting to 5, and allows multiple mirrors.
>> By default it starts by allocating space for the file to be downloaded,
>> which allows non-sequential downloading of the file, facilitating faster
>> downloading from multiple sites.
>>
>> 2) use mirrors which allow multiple connections.
>> (Of course, with download software that takes advantage of this.)
>>
>> 3) use multiple mirrors.
>> (Again, according to download software.)
>
> These 3 suggestions basically put X times the load on the mirror for
> each client (or on more mirrors, for that matter).
>
> And that's quite bad from the point of view of a mirror manager.

Why would it make _any_ difference to the mirrors, besides spreading the
download over multiple mirrors?
We download the same amount of data from the mirrors in any case; we just
want to accelerate the download.  Some mirrors might want to throttle the
download, or limit the connections, to spread the load over a longer
period of time, and they still can.
This approach just works around such limitations for the end user.

BTW, aria2c monitors the download speed, to choose the faster mirrors
for multiple connections.  (From the URLs given on the command line.)
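
To illustrate the point about non-sequential downloading, here is a
minimal sketch of how one file can be cut into byte ranges and handed to
several connections over two mirrors.  It is not aria2c's actual
algorithm (which adapts to measured speed); the function names, mirror
names and the 700 MiB size are assumptions made up for the example.  It
also shows that the ranges always add up to exactly one copy of the file.

```python
def split_ranges(file_size, connections):
    """Divide [0, file_size) into up to `connections` contiguous byte ranges."""
    base, extra = divmod(file_size, connections)
    ranges = []
    start = 0
    for i in range(connections):
        # The first `extra` ranges get one more byte, so nothing is lost.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        # Inclusive end offset, the convention used by HTTP Range headers.
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def assign_to_mirrors(ranges, mirrors):
    """Hand ranges to mirrors round-robin (a hypothetical, static policy)."""
    return [(mirrors[i % len(mirrors)], r) for i, r in enumerate(ranges)]


iso_size = 700 * 1024 * 1024          # example size only
ranges = split_ranges(iso_size, 5)    # 5 connections
plan = assign_to_mirrors(ranges, ["mirror-a.example", "mirror-b.example"])

# Total bytes requested over all connections equals the file size.
total = sum(end - start + 1 for start, end in ranges)

for mirror, (start, end) in plan:
    print(f"{mirror}: bytes={start}-{end}")
```

Because each range is written at its own offset, the space for the file
has to exist beforehand, which is why allocating it first helps.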

> For example, distrib-coffee could blacklist you if you do this, if you
> are not alone on your network connection. And when we deployed this
> measure to protect the server, the limit was 2 connections per address,
> since this was taking too many resources on the old server (each http
> request taking 1 process, and so memory). Fortunately, the hardware was
> upgraded, but not everybody can afford 32 GB of RAM and 8*2 GHz CPU.

Such restrictions are perfectly legitimate for a mirror.

> [root@distrib-coffee ~]# grep -B 6 -A 3 MaxConnPerIP /etc/httpd/conf.d/distrib-coffee.conf
>
> <Directory /var/ftp/pub>
>     order allow,deny
>     Allow from all
>     Options +Indexes +MultiViews +SymLinksIfOwnerMatch
>     <IfModule mod_limitipconn.c>
>         MaxConnPerIP 5
>         ErrorDocument 503 "5 connections at the same time only allowed."
>     </IfModule>
> </Directory>
>
> So I think pissing off mirror maintainers is likely the wrong way of
> solving the problem (which was neither properly explained nor looked at
> beyond "it should be faster").

Again, the same amount of data is downloaded collectively from the 
mirrors used.
And I agree totally with mirrors using whatever download speed controls 
they wish.  (It is strongly advisable for mirrors with limited resources 
to put such controls in place.)
Just as downloaders are free to work within the limits of such controls 
to their advantage.
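
To make the per-address limit concrete, here is a rough sketch of the
arithmetic.  The only value taken from the configuration quoted above is
the limit of 5; the function names and the number of users sharing one
address are assumptions for illustration.

```python
MAX_CONN_PER_IP = 5  # MaxConnPerIP from the quoted distrib-coffee config


def connections_seen_by_mirror(clients_behind_ip, connections_per_client):
    """Connections the mirror sees from one address shared by several users."""
    return clients_behind_ip * connections_per_client


def connections_allowed_per_client(max_conn_per_ip, clients_behind_ip):
    """Largest per-client connection count that fits under the limit.

    Never returns less than 1, so with more clients than the limit
    allows, even one connection each would still exceed it.
    """
    return max(1, max_conn_per_ip // clients_behind_ip)


# Two users behind one address, each using 5 connections: 10, over the limit.
seen = connections_seen_by_mirror(2, 5)
print(seen, "connections; over the limit:", seen > MAX_CONN_PER_IP)

# To stay within the limit, each of the two could use at most 2.
print(connections_allowed_per_client(MAX_CONN_PER_IP, 2))
```

So a single user at 5 connections fits exactly, while a shared address
needs a lower setting per user to avoid the 503 error.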

>> 4) use ftp instead of http
>
> Based on ?

My experience.  Larger files generally download noticeably faster via
ftp.  (Not just Mandriva downloads, by any means.)
And not just with my current provider, either.  I use a very fast
internet connection for most of my ISO downloads.  (The local library.)

> If this is based on using d-c, again, that's our custom QoS rules. If
> this is because of throttling by your provider, not everybody has the
> same provider, and so the same throttling.
> The only difference between http and ftp is that an ftp server will
> likely scale better server side.

That is one explanation ...
And maybe at least some http servers don't allow out-of-order packet
requests?

>> 5) use closer mirrors.  (less delay in handshaking, etc.)
>
> I think the tcp handshake is not much of a problem, given that it
> happens once per rpm, compared to the number of tcp packets for the rest
> of the download. Use wireshark to see.

Again, based on my experience.
BTW, this is mostly ISOs, since the download time of smaller files --
like typical .rpms -- is relatively short.
Isn't there some sort of handshaking and verification for each tcp packet?
Also, propagation time in the event of errors should explain at least
part of the difference.
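
As a rough illustration of how distance could matter for a large file:
tcp data is acknowledged by the receiver, and a single connection cannot
have more than one window of unacknowledged data in flight, so its speed
is bounded by window size / round-trip time.  The sketch below assumes a
fixed 64 KiB window (no window scaling), no packet loss, and a link that
is not otherwise the bottleneck; the round-trip times and file size are
made-up example values.

```python
def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound for one tcp connection: one window per round trip."""
    return window_bytes / rtt_s


def download_time_s(file_bytes, window_bytes, rtt_s):
    """Handshake (one round trip) plus transfer at the window-limited rate."""
    handshake = rtt_s
    return handshake + file_bytes / max_throughput_bytes_per_s(window_bytes, rtt_s)


WINDOW = 64 * 1024            # assumed fixed window, in bytes
ISO = 700 * 1024 * 1024       # example file size, in bytes

near = download_time_s(ISO, WINDOW, 0.020)   # 20 ms round trip
far = download_time_s(ISO, WINDOW, 0.200)    # 200 ms round trip

print(f"near mirror: {near:.0f} s, far mirror: {far:.0f} s")
```

Under these assumptions the handshake itself is negligible (one round
trip out of thousands), but the round-trip time paid on every window
makes the far mirror about ten times slower per connection.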

I could have added download to the fastest disk available, but that 
seemed obvious.  (e.g. one could download to a usb disk, but that would 
be painfully slow, unless one has a usb3 disk on a usb3 port.)

>> In my case, using aria2c with 2 mirrors and the default 5 connections is
>> at least 3 times as fast as a single connection (to my closest mirror).
>> And a much greater improvement over other download options I've tried.
>>
>> I also have configured rpmdrake to use aria2c -- it seems to give me
>> faster and more reliable updating, but I don't have any figures.
>>
>> aria2c is a console app, but it works well enough for me that I haven't
>> (yet) bothered to install the available GUI frontend.
>
> Just because it worked in your case does not mean it would work in
> every possible case, especially without a proper analysis of the issue
> on your side.

True.  These were "suggestions" based on my experience, in response to a
request for faster downloads.
My figures were based on the same very fast internet access point (the
same evening, in fact), so the differences have to be attributed to the
server side.  The ratio was actually about 3.5 : 1 (according to aria2c).
The closest mirror allows only 1 connection.  Tests in the past have
shown that other mirrors (not much further away) have about the same
speed for a single connection.  But they allow multiple connections,
which is the default for aria2c.
In case you haven't used aria2c, it prints the download speed with the
estimated finishing time and current number of connections, once every
minute.  (In this comparison, the number of connections fluctuated
between 4 and 5.)

Agreed, I could have provided more suppositions as to why.

Regards :)

-- 
André

