Issue147

Project: smart
Title: smart's non-pycurl fetcher.py needs keep-alive and serialization support
Priority: bug
Status: done-review
Superseder:
Nosy List: mvo, niemeyer, rasker, thimm
Assigned To: rasker
Topics:

Created on 2006-05-08.09:03:15 by thimm, last changed 2008-07-01.10:21:34 by rasker.

Messages
msg1421 Author: rasker Date: 2008-07-01.10:21:34
Retired

Reason for Retirement: Please confirm if this is still a problem in the latest
version of Smart.

Please reopen this issue in the new bugtracker if it is still an issue.
New bugtracker: http://bugs.launchpad.net/smart
Further details:
https://blueprints.launchpad.net/smart/+spec/bug-reporting-migration
msg532 Author: thimm Date: 2006-06-12.18:25:48
I haven't yet verified whether the presence of pycurl fixes it. If you are aware
of the issue and know this fixes it, go ahead and close it; should my check fail,
I will reopen it.
msg531 Author: mvo Date: 2006-06-12.16:58:48
Thimm, do you think we can close this bug? If pycurl is used, "Connection:
keep-alive" is sent automatically, so this is more or less a
packaging/documentation issue (packaging smart with "Depends: pycurl" should fix
it for most users).
msg486 Author: niemeyer Date: 2006-05-11.20:14:40
> > Opening new connections for each package, while not ideal, shouldn't
> > kill a web server, should it?
> 
> If (when) 60 concurrent smart users open 10-20 connections
> simultaneously it runs oom.

Yes, that's what I meant. It's not opening new connections for
each package that kills the server (that's what keep-alive would
handle), but opening them in parallel.

> I think per IP it should be exactly one. Anything else is abusing the
> server's resources for no gain. The client does get the bits faster
> than other concurrent clients, though, but at the cost of the total
> bandwidth and server resources.

Indeed.

> No, I haven't. It looks like pycurl is automatically detected at
> run-time, so all I need to do is package pycurl?

Yes, it is. Packaging it should be all that is needed.
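
For illustration, run-time detection of an optional module is usually written as a guarded import. This is only a minimal sketch in modern Python, not Smart's actual code; `HAVE_PYCURL` and `choose_backend` are hypothetical names.

```python
# Hypothetical sketch of optional-dependency detection; not taken from Smart.
try:
    import pycurl  # optional: importable only if the pycurl package is installed
    HAVE_PYCURL = True
except ImportError:
    pycurl = None
    HAVE_PYCURL = False


def choose_backend():
    """Return the name of the HTTP backend to use (names are illustrative)."""
    return "pycurl" if HAVE_PYCURL else "urllib"
```

With this pattern, installing the pycurl package is enough to switch backends; no configuration change is involved.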
msg485 Author: thimm Date: 2006-05-11.19:08:50
> Opening new connections for each package, while not ideal, shouldn't
> kill a web server, should it?

If (when) 60 concurrent smart users open 10-20 connections simultaneously it
runs oom.

> If you believe that 5 is really a bad idea, let's talk. I'm open to reducing
> this limit.

I think per IP it should be exactly one. Anything else is abusing the server's
resources for no gain. The client does get the bits faster than other concurrent
clients, though, but at the cost of the total bandwidth and server resources.

> Have you tested Smart with pycurl?  It should reuse connections
> automatically, like you're suggesting.

No, I haven't. It looks like pycurl is automatically detected at run-time, so
all I need to do is package pycurl?
msg483 Author: niemeyer Date: 2006-05-11.14:11:28
> W/o keep-alive smart's concurrent downloading kills a web server, as
> each package/metadata file opens up a new concurrent httpd process on
> the other side.

Opening new connections for each package, while not ideal, shouldn't
kill a web server, should it?

> A typical smart session on a rather often updated system shows up to
> 15 httpd processes simultaneously serving this one IP.

When using urllib, Smart is currently limited to 5 active connections.
If you want to make tests, or even patch your local Smart, you can easily
change the constant in fetcher.py (MAXACTIVE). If you believe that 5 is
really a bad idea, let's talk. I'm open to reducing this limit.

> There are keep-alive solutions for urllib2, for instance urlgrabber's
> keepalive.py that can be used as a handler for urllib2. This looks
> easy enough for me to try patching up smart with it, if it's
> considered useful.

There's already a urllib2 fetcher, but it's commented out because it's
not thread safe. I'm not sure if we can do the same thing in urllib.

Have you tested Smart with pycurl?  It should reuse connections
automatically, like you're suggesting.

> But most probably keep-alive is not enough as smart deliberately fires
> up package retrievals in parallel, and keep-alive is only of help for
> reusing connections. If a connection is still in use, you end up
> creating a new one.  Therefore a serialization procedure is necessary
> when the packages come from the same host (or at least the same
> channel).

Smart does limit the number of active connections already. Improving
that limit shouldn't be hard. OTOH, since you say that you have 15
open connections, there must be something else wrong.
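
As a rough illustration of what a cap on active connections means, the sketch below bounds concurrent downloads with a semaphore. It is not how fetcher.py implements MAXACTIVE; only the constant name and the value 5 come from this thread, while `fetch_limited` and the `download` callable are hypothetical.

```python
import threading

MAXACTIVE = 5  # value quoted in this thread for the urllib fetcher

# One slot per allowed active connection.
_slots = threading.BoundedSemaphore(MAXACTIVE)


def fetch_limited(url, download):
    """Run download(url) while holding one of the MAXACTIVE slots.

    `download` is a hypothetical callable supplied by the caller.
    """
    with _slots:  # blocks while MAXACTIVE downloads are already running
        return download(url)
```

Lowering MAXACTIVE to 1 would serialize all downloads, regardless of which host they come from.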
msg480 Author: thimm Date: 2006-05-08.09:03:13
W/o keep-alive smart's concurrent downloading kills a web server, as each
package/metadata file opens up a new concurrent httpd process on the other side.

A typical smart session on a rather often updated system shows up to 15 httpd
processes simultaneously serving this one IP.

There are keep-alive solutions for urllib2, for instance urlgrabber's
keepalive.py that can be used as a handler for urllib2. This looks easy enough
for me to try patching up smart with it, if it's considered useful.
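
To show what connection reuse looks like, here is a minimal sketch using Python 3's standard `http.client` rather than urlgrabber's keepalive.py or Smart's own fetcher. `fetch_many` is a hypothetical helper; it assumes the server allows persistent connections.

```python
import http.client


def fetch_many(host, paths, port=80):
    """Fetch several paths from one host over a single connection object."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read fully before the connection can be reused.
            bodies.append(resp.read())
    finally:
        conn.close()
    return bodies
```

Because the requests run one after another, this also avoids the parallel connections discussed in the next paragraph, at the cost of not overlapping downloads.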

But most probably keep-alive is not enough as smart deliberately fires up
package retrievals in parallel, and keep-alive is only of help for reusing
connections. If a connection is still in use, you end up creating a new one.
Therefore a serialization procedure is necessary when the packages come from
the same host (or at least the same channel).
History
Date                 User      Action  Args
2008-07-01 10:21:34  rasker    set     status: chatting -> done-review
                                       nosy: + rasker
                                       messages: + msg1421
                                       assignedto: rasker
2006-06-14 17:13:49  mvo       set     title: smart's fetcher.py needs keep-alive and serialization support -> smart's non-pycurl fetcher.py needs keep-alive and serialization support
2006-06-12 18:25:59  thimm     set     nosy: mvo, thimm, niemeyer
                                       messages: + msg532
2006-06-12 16:58:49  mvo       set     nosy: + mvo
                                       messages: + msg531
2006-05-11 20:14:41  niemeyer  set     messages: + msg486
2006-05-11 19:08:51  thimm     set     nosy: thimm, niemeyer
                                       messages: + msg485
2006-05-11 14:11:31  niemeyer  set     status: unread -> chatting
                                       nosy: + niemeyer
                                       messages: + msg483
2006-05-08 09:03:17  thimm     create