python - Threading HTTP requests (with proxies)
I've looked at similar questions, and there seems to be a whole lot of disagreement on the best way to handle threaded HTTP requests.
What I want to do: I'm using Python 2.7, and I want to try to thread HTTP requests (specifically, POSTing something), with a SOCKS5 proxy for each one. The code I have works, but it is rather slow since it waits for each request (to the proxy server, then to the web server) to finish before starting another. Each thread would be making a different request through a different SOCKS proxy.
So far I've purely been using urllib2. I looked at modules like PycURL, but it is extremely difficult to install with Python 2.7 on Windows, which I want to support and am coding on. I'd be willing to use any other module though.
I've looked at these questions in particular:
Python urllib2.urlopen() is slow, need a better way to read several URLs
Python - Example of urllib2 asynchronous / threaded request using HTTPS
Many of the examples there received downvotes and arguing. Assuming the commenters are correct, making a client with an asynchronous framework like Twisted sounds like it would be the fastest thing to use. However, I Googled ferociously, and it does not provide any sort of support for SOCKS5 proxies. I'm currently using the SocksiPy module, and I could try something like:
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, ip, port)
socks.wrapmodule(twisted.web.client)
I have no idea if that would work though, and I also don't know if Twisted is really what I want to use. I could just go with the threading module and work it into my current urllib2 code, but if that is going to be much slower than Twisted, I may not want to bother. Does anyone have any insight?
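For illustration, a rough sketch of what the plain threading fallback could look like; post_via_proxy is a hypothetical stand-in for my current urllib2/SocksiPy request code, and the URL, POST data, and proxy address are placeholders. Note that SocksiPy's setdefaultproxy() is module-global, so actually using a different proxy per thread would need per-connection setup rather than what is shown here:

import threading
import urllib2

def post_via_proxy(url, data, proxy):
    # hypothetical stand-in: the real version would route this connection
    # through the given SOCKS5 proxy before POSTing
    return urllib2.urlopen(url, data).read()

def worker(url, data, proxy):
    try:
        print('%s: %r' % (url, post_via_proxy(url, data, proxy)[:50]))
    except Exception as exc:
        print('%s failed: %s' % (url, exc))

# one (url, post_data, proxy) tuple per request -- placeholder values
requests = [('http://example.com/post', 'key=value', ('1.2.3.4', 1080))]

threads = [threading.Thread(target=worker, args=req) for req in requests]
for t in threads:
    t.start()
for t in threads:
    t.join()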
Perhaps an easier way is to rely on gevent (or eventlet) to let you open lots of connections to the server. These libs monkey-patch urllib to make it async, whilst still letting you write code that is sync-ish. Their smaller overhead vs threads also means you can spawn lots more (1000s is not unusual).
I've used something like this loads (plagiarized from here):
urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()

import urllib2

def print_head(url):
    print('Starting %s' % url)
    data = urllib2.urlopen(url).read()
    print('%s: %s bytes: %r' % (url, len(data), data[:50]))

jobs = [gevent.spawn(print_head, url) for url in urls]
gevent.joinall(jobs)  # wait for all the requests to finish
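One addition not in the snippet above: if you really do spawn thousands of greenlets, gevent.pool.Pool is a simple way to cap how many requests are in flight at once. A minimal sketch, where the pool size of 100 is an arbitrary example:

from gevent.pool import Pool

pool = Pool(100)  # allow at most 100 requests in flight at a time
for url in urls:
    pool.spawn(print_head, url)
pool.join()  # block until every greenlet has finished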