python - Decoding response while opening a URL
I am using the following code to open a URL and retrieve its response:
    import urllib2

    def get_issue_report(query):
        request = urllib2.Request(query)
        response = urllib2.urlopen(request)
        response_headers = response.info()
        print response.read()

The response is as follows:
<?xml version='1.0' encoding='utf-8'?><entry xmlns='http://www.w3.org/2005/atom' xmlns:gd='http://schemas.google.com/g/2005' xmlns:issues='http://schemas.google.com/projecthosting/issues/2009' gd:etag='w/"duufqh47ecl7ima9wxbbfeg."'><id>http://code.google.com/feeds/issues/p/chromium/issues/full/2</id><published>2008-08-30t16:00:21.000z</published><updated>2010-03-13t05:13:31.000z</updated><title>testing if chromium id works</title><content type='html'>&lt;b&gt;what steps reproduce problem?&lt;/b&gt; &lt;b&gt;1.&lt;/b&gt; &lt;b&gt;2.&lt;/b&gt; &lt;b&gt;3.&lt;/b&gt; &lt;b&gt;what expected output? see instead?&lt;/b&gt; &lt;b&gt;please use labels , text provide additional information.&lt;/b&gt; </content><link rel='replies' type='application/atom+xml' href='http://code.google.com/feeds/issues/p/chromium/issues/2/comments/full'/><link rel='alternate' type='text/html' href='http://code.google.com/p/chromium/issues/detail?id=2'/><link rel='self' type='application/atom+xml' href='https://code.google.com/feeds/issues/p/chromium/issues/full/2'/><author><name>rah...@google.com</name><uri>/u/@vbjvrvdxdhzcvgj%2ff3tbuv5saw%3d%3d/</uri></author><issues:closeddate>2008-08-30t20:48:43.000z</issues:closeddate><issues:id>2</issues:id><issues:label>type-bug</issues:label><issues:label>priority-medium</issues:label><issues:owner><issues:uri>/u/kuchhal@chromium.org/</issues:uri><issues:username>kuchhal@chromium.org</issues:username></issues:owner><issues:stars>4</issues:stars><issues:state>closed</issues:state><issues:status>invalid</issues:status></entry>
I want to get rid of characters like &lt;, &gt; etc. I tried using
response.read().decode('utf-8')
but that doesn't help much.
Just in case, response.info() prints the following:
    content-type: application/atom+xml; charset=utf-8; type=entry
    expires: fri, 01 jul 2011 11:15:17 gmt
    date: fri, 01 jul 2011 11:15:17 gmt
    cache-control: private, max-age=0, must-revalidate, no-transform
    vary: accept, x-gdata-authorization, gdata-version
    gdata-version: 1.0
    etag: w/"duufqh47ecl7ima9wxbbfeg."
    last-modified: sat, 13 mar 2010 05:13:31 gmt
    x-content-type-options: nosniff
    x-frame-options: sameorigin
    x-xss-protection: 1; mode=block
    server: gse
    connection: close

Here's the URL: https://code.google.com/feeds/issues/p/chromium/issues/full/2
    from HTMLParser import HTMLParser
    import urllib2

    query = "http://code.google.com/feeds/issues/p/chromium/issues/full/2"

    def get_issue_report(query):
        request = urllib2.Request(query)
        response = urllib2.urlopen(request)
        response_headers = response.info()
        return response.read()

    s = get_issue_report(query)
    p = HTMLParser()
    # unescape() converts entity references such as &lt; and &gt; back to < and >
    print p.unescape(s)
    p.close()
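For anyone on Python 3, here is a rough sketch of the same idea, assuming `html.unescape` from the standard library stands in for `HTMLParser.unescape`. The helper name `unescape_body` is made up for illustration and is not part of the original answer. It also shows why `decode('utf-8')` alone did not help: decoding turns bytes into text but leaves entity references as they are.

```python
import html
import urllib.request


def unescape_body(raw, charset="utf-8"):
    """Decode raw response bytes, then turn entity references into characters.

    Hypothetical helper, not taken from the original answer.
    """
    text = raw.decode(charset)   # bytes -> str; entities such as &lt; are untouched
    return html.unescape(text)   # &lt; -> <, &gt; -> >, &amp; -> &


def get_issue_report(query):
    """Fetch `query` and return the unescaped body (assumed Python 3 port)."""
    with urllib.request.urlopen(query) as response:
        # Use the charset announced in the Content-Type header, defaulting to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return unescape_body(response.read(), charset)


if __name__ == "__main__":
    sample = b"<content type='html'>&lt;b&gt;1.&lt;/b&gt;</content>"
    print(sample.decode("utf-8"))   # decoding alone keeps the entity references
    print(unescape_body(sample))    # entity references replaced by < and >
```

Note that unescaping the whole document also unescapes entities outside the content element, so the result may no longer be well-formed XML. If the XML structure matters, parsing the feed first and unescaping only the text you need is probably safer.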